Experimental phasing opportunities for macromolecular crystallography at very long wavelengths
Protein sulfur content for S-SAD
To consider S-SAD as a generic method for phasing, we have analyzed the proteome of the different domains of life (Fig. 1c). The average sulfur content, defined as the percentage of the sulfur-containing amino acids cysteine and methionine, is about 3.5% (median 3.2%) in archaea and bacteria, whereas in eukaryotes it is slightly higher, with a mean of about 4.4% (median 4.1%). The sulfur content naturally varies; extracellular proteins tend to be rich in cysteines and disulfides since they are situated in an oxidizing environment. Often domains rather than full-length proteins are studied by X-ray crystallography, making them easy targets. The Bijvoet ratio16 has been used as a measure for the expected anomalous signal, and a value of 0.6% was originally proposed as a lower limit to predict the success of SAD phasing experiments17. Based on this, a sulfur content of 2% would be enough to measure a useful anomalous signal for S-SAD at λ = 2.06 Å, and even a sulfur content of only 0.25% would be sufficient at wavelengths close to the sulfur K-edge (λ = 5.02 Å). However, the sulfur content on its own is not a reliable indicator for a successful S-SAD experiment. Terwilliger et al. studied the important parameters contributing to SAD analyses and found that a low number of unique acentric reflections and a high number of anomalous scatterers are detrimental to the success of SAD phasing18. Based on 52 S-SAD projects on beamline I23, we simplified the resulting formula to the ratio between the number of unique reflections and the number of anomalous scatterers. For successful S-SAD phasing of data collected at a wavelength of λ = 2.75 Å on beamline I23, this ratio typically needs to be over 1000, a condition met by 89% of deposited structures in the PDB (Fig. 1d). Of the 52 S-SAD projects collected on I23, 41 structures were solved successfully. Among these 41 structures, only 4 had a ratio below 1000, accounting for just under 10% of the successful cases. Conversely, out of the 11 failed projects, 8 had a ratio below 1000, representing approximately 73% of the failures. For a protein of a given size, a larger number of sulfur atoms will require a correspondingly higher number of unique acentric reflections, hence higher resolution. We have developed a web app that can predict the resolution required for successful S-SAD on the I23 beamline at λ = 2.75 Å based on the space group, unit cell parameters, and sulfur content (Yosoku-I23 Resolution Requirement for Phasing, https://diamondi23.anvil.app/).
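As a rough illustration of the ratio of unique reflections to anomalous scatterers, the following Python sketch estimates the number of unique reflections from the unit cell volume and the resolution, and inverts that estimate to find the resolution at which the ratio reaches 1000. It is not the algorithm behind the Yosoku web app, which is not described here. The reciprocal-space counting approximation, the function names, and the example values (a tetragonal cell with 8 symmetry operators and 10 sulfur atoms per asymmetric unit) are all assumptions chosen for illustration.

```python
import math


def cell_volume(a, b, c, alpha=90.0, beta=90.0, gamma=90.0):
    """Unit cell volume in cubic angstroms from lengths (A) and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)


def estimate_unique_reflections(volume, d_min, n_symops):
    """Approximate number of unique reflections to resolution d_min (A).

    Assumption: reciprocal lattice points inside a sphere of radius 1/d_min,
    divided by 2 * n_symops (Friedel mates merged). n_symops counts every
    operator of a chiral space group, centring translations included.
    """
    return 2.0 * math.pi * volume / (3.0 * n_symops * d_min**3)


def reflections_per_scatterer(volume, d_min, n_symops, n_scatterers):
    """Ratio of unique reflections to anomalous scatterers in the asymmetric unit."""
    return estimate_unique_reflections(volume, d_min, n_symops) / n_scatterers


def required_resolution(volume, n_symops, n_scatterers, target_ratio=1000.0):
    """Resolution (A) at which the estimated ratio reaches target_ratio."""
    n_needed = target_ratio * n_scatterers
    return (2.0 * math.pi * volume / (3.0 * n_symops * n_needed)) ** (1.0 / 3.0)


if __name__ == "__main__":
    # Hypothetical example: tetragonal cell, 8 operators, 10 sulfur atoms.
    volume = cell_volume(79.0, 79.0, 38.0)
    d_req = required_resolution(volume, n_symops=8, n_scatterers=10)
    print(f"Estimated resolution needed for a ratio of 1000: {d_req:.2f} A")
```

The estimate ignores systematic absences, incomplete data, and the distinction between acentric and centric reflections, so it should be read as an order-of-magnitude guide only.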
S-SAD measurements at long wavelengths
As stated previously, for S-SAD the anomalous signal increases towards the sulfur K-edge. However, due to sample absorption effects, collecting datasets at very long wavelengths is detrimental to the data quality, compromising the anomalous signal. We have used test crystals (insulin, lysozyme, proteinase K, and thaumatin) to determine the optimum wavelength for S-SAD experiments on the I23 beamline (Supplementary Fig. 1 and Supplementary Tables 1–4). A wavelength of λ = 2.75 Å (f″ = 1.6 e⁻) was found to be a good compromise, and if the crystals have uniform dimensions, it is possible to select a longer wavelength. At longer wavelengths, X-ray absorption from the crystals, sample holder, and surrounding materials is difficult to account for with current data reduction software packages, especially when the shape of the crystal is not uniform. Once the wavelength is set, the typical I23 data collection strategy for native-SAD phasing requires 3 × 360° of data from a single crystal taken at multiple orientations with a multi-axis goniometer, with a flux of ~2 × 10¹⁰ photons s⁻¹ from a non-focused beam matching the crystal size. The number of datasets required for structure solution and electron density map interpretation is determined in real time by evaluating the success of substructure solution and secondary structure identification in electron density maps after each collected dataset. This data collection strategy allows the assessment of radiation damage after each low-dose dataset. Successful experimental phasing can be considered as proof that radiation damage has been minimized during data collection.
We also used test crystals containing sulfur (thaumatin collected at λ = 2.75 Å) and zinc (LMO4 collected at λ = 1.28 Å) as anomalous scatterers to assess the benefit of our experimental setup. A number of factors might contribute to enhancing the anomalous signal, such as beam stability, the absence of a cryo-stream responsible for sample vibration, or the decreased parallax effect due to the cylindrical shape of the detector, but we focused on the reduced background noise of the detector, which is a result of the in-vacuum environment. The sulfur and zinc anomalous peak heights were calculated for different levels of artificially added noise. The results demonstrate that the presence of added background noise diminishes the heights of anomalous peaks for both crystals (Supplementary Fig. 2). This effect is more prominent in the LMO4 datasets since the LMO4 crystals diffract more weakly than the thaumatin ones (LMO4 I/σI = 16.6 vs. thaumatin I/σI = 42.2) (Supplementary Tables 5–6). These results show that reduced background noise clearly benefits the anomalous signal, and that this benefit is even more pronounced for weakly diffracting crystals.
S-SAD as a routine method
Using this protocol, we selected 10 projects solved by S-SAD phasing at the beamline, including soluble and membrane proteins (Fig. 2 and Supplementary Table 7). Structures varied in size with molecular weights ranging from 14 to 114 kDa, sulfur contents varied between 0.9 and 6.5%, and diffraction resolution from 1.8 (detector limited) to 3.4 Å. The use of native SAD has also been extended to 9 additional projects where potassium, chlorine, phosphorus, calcium, vanadium, cadmium, and iodine were exploited as main anomalous scatterers. These elements were present as co-factors or in the crystallization conditions (Fig. 3 and Supplementary Table 8). We also report the structure solution of two laser-shaped protein crystals that allow anomalous data collected at wavelengths longer than 4 Å to be improved (Fig. 4 and Supplementary Table 9). Finally, to assess and compare the anomalous signal from these projects, anomalous peak heights from phased anomalous difference Fourier maps are reported in Supplementary Tables 10 and 11.
Soluble proteins such as RNaseA, Petase19, ThcOx20, or Ssek321 (Fig. 2a–d) represent typical examples that were solved by S-SAD following our protocol, despite ThcOx diffracting to a resolution of 3.1 Å and Ssek3 showing signs of pseudo-translation. The overall multiplicity for these datasets was below 26, which is four times lower than the multiplicity typically needed for S-SAD at shorter wavelengths (100)10 (Supplementary Table 7).
As mentioned before, most proteins have a sulfur content sufficient for S-SAD phasing. Amongst the structures solved at the beamline, the lowest sulfur content was found in the PAS domain of the protein encoded by the locus tag LIC_11128 from Leptospira interrogans serovar Copenhageni Fiocruz L1-130 (Fig. 2e). The domain contains only one sulfur in 116 amino acids (0.9% sulfur content); nevertheless, the substructure solution was straightforward in SHELXD22 with 360° of data, a resolution of 2.5 Å, completeness of 83%, and an overall multiplicity of 11. For refinement purposes, additional datasets were included, resulting in a final multiplicity of 49.2. However, these additional datasets were not necessary for the structure solution. This example shows that low sulfur content is not necessarily a limiting factor for phasing if the anomalous signal can be enhanced at longer wavelengths, here λ = 3.09 Å.
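The sulfur content used throughout, the percentage of residues that are cysteine or methionine, is simple to compute from a one-letter amino acid sequence. The sketch below is a minimal illustration; the function name is an assumption, and the example sequence is a placeholder with one methionine in 116 residues, not the actual sequence of the PAS domain.

```python
def sulfur_content(sequence: str) -> float:
    """Percentage of residues that are cysteine (C) or methionine (M)."""
    # Keep letters only, so whitespace or line breaks in the input are ignored.
    residues = [r for r in sequence.upper() if r.isalpha()]
    if not residues:
        raise ValueError("sequence contains no residues")
    n_sulfur = sum(1 for r in residues if r in "CM")
    return 100.0 * n_sulfur / len(residues)


if __name__ == "__main__":
    # Placeholder sequence: one Met followed by 115 Ala (116 residues in total).
    placeholder = "M" + "A" * 115
    print(f"Sulfur content: {sulfur_content(placeholder):.2f}%")
```

This counts residues rather than atoms, which is equivalent here because cysteine and methionine each carry a single sulfur atom. Bound sulfate ions or other sulfur-containing ligands are not captured.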
To further showcase the capability of in-vacuum long-wavelength beamlines, we determined the structure of the RNA recognition motif (RRM) of Seb1 (Fig. 2f) that was originally solved by S-SAD after collecting datasets from 16 crystals at a wavelength of 1.77 Å on beamline I03 at the Diamond Light Source23. A merged dataset with an overall multiplicity of 167 was obtained and led to the sulfur substructure being solved in SHELXD22; nevertheless, phase extension to a native resolution of 1.0 Å was needed to get interpretable experimental electron density maps. To confirm the benefit of collecting at long wavelengths, we collected data from Seb1-RRM crystals on beamline I23. A single dataset of 360° from a crystal diffracting to 1.8 Å was sufficient not only to solve the substructure but also to solve the structure without any phase extension, despite the low symmetry space group (C2), low overall completeness (65%), and the 27-fold lower multiplicity of only 6. This case clearly shows that data collection at longer wavelengths requires substantially reduced overall multiplicity. Solving a structure from a single crystal with low multiplicity is advantageous for multiple reasons: the number of crystals might be limited; crystals may not be isomorphous and so cannot be merged; or they may be especially susceptible to radiation damage.
Since integral membrane proteins are notoriously difficult to study, we show here that S-SAD (with sulfur as the only anomalous scatterer) was successful in phasing the α-helical membrane proteins AcrB24, McjD25, A2AR, and mPGES (Fig. 2g–j). mPGES, diffracting to 1.77 Å resolution, was straightforward to solve even with a low overall multiplicity of 17.5. In contrast, AcrB, McjD, and A2AR diffracted to only modest resolution: 3.4, 2.8, and 2.9 Å respectively (with corresponding molecular weights of 114, 65, and 50 kDa). A2AR was previously solved with long-wavelength native phasing at X-ray free-electron lasers at 2.65 Å resolution (detector limited)26, using a serial crystallographic approach with samples delivered by a high viscosity injector. In total, 199,136 images from crystals with an average size of 35 × 35 × 5 µm were collected at the SwissFEL at a wavelength of λ = 2.71 Å, and automatic phasing was successful using data from 50,000 crystals. On beamline I23, a single A2AR crystal of 20 × 20 × 5 µm (18,000 images) was sufficient for successful S-SAD phasing. Long-wavelength native phasing at X-ray free-electron lasers may be useful if crystals cannot be optimized to larger sizes, are very susceptible to radiation damage, or need to be studied at room temperature; otherwise, it is easier and more practical to collect at long wavelength at beamline I23. AcrB was possibly one of the most challenging projects due to its large sulfur substructure (45 sulfur atoms) and low resolution (3.4 Å), resulting in a ratio of unique reflections to the number of anomalous scatterers of 650. It required the collection of five datasets of 360° each, reaching an overall multiplicity of 86.9, all from a single crystal because of non-isomorphism. For McjD, two crystals were needed, totaling six datasets of 360° and an overall multiplicity of 54. These examples show that S-SAD is not necessarily limited to well-diffracting crystals and that the method is applicable even to more difficult membrane proteins.
Other-SAD at long wavelengths
About a third of proteins are bound to a metal cofactor, with zinc and iron regularly used for phasing. Calcium is used less often because its K-edge is located at a longer wavelength (λ = 3.07 Å), typically not accessible at standard MX beamlines. A domain of the accessory Sec-dependent serine-rich glycoprotein adhesin from Streptococcus oralis (Fap1) was solved by Ca-SAD on beamline I23 from two calcium ions and 1080° of data to a resolution of 2 Å (Fig. 3a). Compared to sulfur, the calcium anomalous signal is stronger and more efficient for phasing (Supplementary Table 11). Calcium plays an important role in many cellular processes, so it can be easily used instead of, or combined with sulfur for phasing, as in the case of the β-barrel membrane protein AlgE (Fig. 3b). AlgE crystallized in a low symmetry space group (C2) and contained 6 sulfur and 2 calcium atoms. For successful phasing, it was necessary to merge datasets from 2 crystals (a total of 7 × 360°). The substructure could be found readily but increasing the multiplicity to 26.7 was necessary to obtain interpretable electron density maps. From our experience, proteins with secondary structures mainly composed of β-strands tend to need more data to obtain interpretable initial maps.
Where sulfur is not present in the protein, other absorption edges can be reached on beamline I23. We have solved the K+ selective transporter NaK2K using K-SAD (potassium K-edge = 3.43 Å) with four K+ ions located within the selectivity filter of the membrane protein with strong anomalous peak heights >30 σ27 (Supplementary Table 11) (Fig. 3c). A second example of a protein without any sulfur atoms is the streptavidin mutant Streptactin XT28 (Fig. 3d). Data were collected at λ = 2.75 Å from crystals that diffracted to better than 1.8 Å resolution. The structure could easily be solved with seven chloride ions bound to the protein, although the data were not collected at the chlorine absorption edge (Cl K-edge = 4.39 Å). To the best of our knowledge, this is the first example of Cl-SAD.
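Because absorption edges are quoted here as wavelengths while many tables list them as energies, a small conversion helper can be convenient. The sketch below uses the standard relation E (keV) = 12.398 / λ (Å); the function names are assumptions, and the wavelengths are simply those quoted in this article.

```python
HC_KEV_ANGSTROM = 12.398419843  # Planck constant times speed of light, in keV * angstrom


def wavelength_to_kev(wavelength_angstrom: float) -> float:
    """Photon energy in keV for a wavelength given in angstroms."""
    if wavelength_angstrom <= 0:
        raise ValueError("wavelength must be positive")
    return HC_KEV_ANGSTROM / wavelength_angstrom


def kev_to_wavelength(energy_kev: float) -> float:
    """Wavelength in angstroms for a photon energy given in keV."""
    if energy_kev <= 0:
        raise ValueError("energy must be positive")
    return HC_KEV_ANGSTROM / energy_kev


if __name__ == "__main__":
    # Wavelengths as quoted in the text (angstroms).
    quoted = {
        "standard S-SAD wavelength": 2.75,
        "Ca K-edge": 3.07,
        "K K-edge": 3.43,
        "Cl K-edge": 4.39,
        "S K-edge": 5.02,
    }
    for label, lam in quoted.items():
        print(f"{label}: {lam:.2f} A = {wavelength_to_kev(lam):.3f} keV")
```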
Within the I23 accessible wavelength range, absorption edges of non-physiological elements can also be used. These elements are either incorporated in protein ligands, such as vanadium in vanadate, or are present in the crystallization buffer. We have previously shown that V-SAD is a rapid method to obtain experimental phases for protein structure determination29. A single vanadium, bound as a reaction mechanism inhibitor (VO3−), was enough to solve the 110 kDa membrane protein SERCA diffracting to 3.1 Å resolution (Fig. 3e). Iodine has been used to phase protein structures, for example with the magic triangle30, and is sometimes found in crystallization conditions, but the three iodine L-edges are located between λ = 2.38 and 2.72 Å (Fig. 1b). Hence a beamline that can access longer wavelengths is more suitable to get the optimum iodine anomalous signal (f″ > 10 e⁻). The protein TauA was crystallized in a solution containing 200 mM NaI; 14 iodine ions were bound to the two TauA molecules present in the asymmetric unit, allowing the structure to be swiftly solved by I-SAD31 (Fig. 3f). Another electron-rich element that was used on the beamline is cadmium, again found in the crystallization condition, in this case of the Loei River virus GP1 glycoprotein (Fig. 3g). Data from GP1 crystals were collected at λ = 2.75 Å (Cd L-edges = 3.08–3.53 Å) giving diffraction to 3 Å resolution, with four Cd atoms bound to the protein32. With an f″ > 10 e⁻, cadmium provides a strong anomalous signal even at low resolution.
Finally, phosphorus is an essential element in biology and is found in nucleic acids and some protein co-factors such as NADPH and FADH. There are very few nucleic acid structures solved by P-SAD, which can be explained by two main reasons. Firstly, phosphorus atoms in the nucleic acid backbone tend to have higher B-factors than, for example, main chain protein atoms, as they are exposed on the surface of the molecule, and larger atomic displacements make substructure determination more difficult. Secondly, nucleic acid crystals usually have small unit cells, hence a low number of unique reflections, and a high number of anomalous scatterers (one per nucleotide). From our experience of solving nucleic acid structures at medium resolution, an additional anomalous scatterer is needed, even if a strong phosphorus anomalous signal is measured. The Pseudorabies virus RNA G-quadruplex and the i-motif of the human telomeric sequence were solved with potassium and bromine, respectively33 (Fig. 3h). These initial phases were used to locate the phosphorus atoms and improve the experimental phases for model building. Being able to solve the structure with potassium and not phosphorus shows again that the ratio of unique reflections to the number of anomalous scatterers is critical for structure determination. In the case of nucleic acid-protein complexes, asymmetric units are larger because of the presence of proteins, yielding a larger number of unique reflections per anomalous scatterer. Hence, the structure of such complexes can be elucidated, as in the case of IRF4, where both phosphorus and sulfur atoms contributed to the phasing power34 (Fig. 3i).
Managing X-ray absorption at long wavelengths
Despite the evident benefit of long wavelengths for experimental phasing, the protocol can be further improved by managing the effect of X-ray absorption from the samples, either by introducing analytical sample absorption corrections or by machining the crystal sample to a uniform shape, such as a sphere or a cylinder. The first method requires an accurate measurement of the sample shape, which can be obtained from X-ray tomography experiments. An absorption factor, based on the path length through the different sample materials (crystal, sample holder, and surrounding materials) and their absorption coefficients, can be applied for each reflection as the basis of the analytical absorption correction. The second method applies laser shaping of the crystal to remove all non-diffracting materials and define more regular path lengths through the crystals35. This method has the added advantage of allowing the crystal size to be chosen, as smaller crystals absorb fewer X-rays at longer wavelengths. Crystals of BphA4 complexed with FAD were shaped as spheres at SPring-8 (Japan), and data were collected at the phosphorus edge (λ = 5.76 Å) on beamline I23 (Fig. 3j and Supplementary Table 9). At this wavelength, the anomalous signal from sulfur is negligible. Only the anomalous signal from the two phosphorus atoms of the FAD molecule was present to successfully phase BphA4, despite the low resolution of 3.7 Å (detector limited) and a low overall multiplicity of 12. Since the two phosphorus atoms are close to each other, they behave as a super-phosphorus with an anomalous peak height of 39.7 σ (Supplementary Table 11) (Fig. 4a). We also laser-shaped crystals of the β-barrel membrane protein Ompk3636 as cylinders. Ompk36 crystallized as a trimer, with 2 sulfur atoms per monomer and 3 additional sulfate ions bound to the trimer. Six datasets of 360° each (multiplicity of 22.3) were collected from a single crystal at a wavelength of 4.13 Å and were sufficient for S-SAD phasing (Fig. 4b and Supplementary Tables 9–10). The resolution of the structure was limited to 2.7 Å by the detector geometry at this wavelength. As mentioned before, β-barrel membrane proteins are more challenging to solve, and our attempts to solve the structure with our standard protocol (i.e., not laser-shaped) were unsuccessful. This example shows how long wavelengths can be crucial in solving a difficult structure. The full potential of the beamline can be achieved if the absorption effects are corrected or equalized and the crystal size is decreased. Laser shaping provides an enormous advantage for long-wavelength phasing and helps with extracting the optimum anomalous signal from sulfur, phosphorus, and other elements. Laser shaping can also be applied to determine the identity and possibly the oxidation state of light metals. At the longest wavelengths available, even a large detector like the Pilatus 12 M is limited to about 3.7 Å resolution (Supplementary Fig. 3), but since the multiplicity required for phasing is very low, the same crystal can also be measured at shorter wavelengths to obtain higher resolution for structure refinement.
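The idea behind an absorption factor built from path lengths and absorption coefficients can be illustrated with the Beer-Lambert law, T = exp(−Σ μᵢtᵢ), summed over the materials a ray crosses. The Python sketch below is a single-ray simplification, not the correction implemented on the beamline: a real analytical correction would evaluate the incident and diffracted paths for every reflection from the tomographic sample model. The function names and the attenuation coefficients in the example are hypothetical placeholders, not measured values.

```python
import math


def transmission(segments):
    """Fraction of X-rays transmitted along one ray.

    segments: iterable of (mu, path_length) pairs, one per material crossed,
    with mu in 1/um and path_length in um (any consistent units work).
    """
    total = 0.0
    for mu, path_length in segments:
        if mu < 0 or path_length < 0:
            raise ValueError("mu and path_length must be non-negative")
        total += mu * path_length
    return math.exp(-total)


def absorption_correction(segments):
    """Multiplicative factor that scales a measured intensity back up."""
    return 1.0 / transmission(segments)


if __name__ == "__main__":
    # Hypothetical placeholder coefficients (1/um) and path lengths (um).
    ray = [
        (0.004, 50.0),   # crystal
        (0.003, 20.0),   # surrounding solvent
        (0.006, 10.0),   # sample holder
    ]
    print(f"Transmission: {transmission(ray):.3f}")
    print(f"Correction factor: {absorption_correction(ray):.3f}")
```

Because the exponent scales with path length, halving the crystal thickness along the beam reduces the correction substantially, which is consistent with the advantage of smaller, uniformly shaped crystals described above.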
In conclusion, the beamline I23 at Diamond Light Source significantly extends the available wavelength range (λ = 1.1–5.9 Å) for anomalous experiments on macromolecular crystals. A variety of absorption edges can be utilized for experimental phasing and element identification, including K-edges (Zn, Cu, Ni, Co, Fe, Mn, Cr, V, Ca, K, Cl, S, P), L-edges (I, Cd, Ag, Pb), and M-edges (Pb, Hg, Au, Pt). While certain experiments, such as Zn-SAD, can be performed on standard beamlines, conducting the same experiment on I23 would yield superior results due to its significantly lower background, resulting in data of higher quality. We demonstrate that long-wavelength native-SAD has become a very compelling technique, where a low multiplicity dataset from a single crystal can be sufficient for successful structure determination, making it a general vehicle of choice for experimental phasing. This capability is particularly advantageous for projects with limited crystal availability that require experimental structure solutions. Crystals sensitive to radiation damage also benefit from the I23 beamline instrument due to its low standard flux and non-focused beam, allowing for the collection of multiple sweeps of 360° data. Moreover, the large beam size (up to 500 × 500 μm) enables the entire crystal to be exposed, resulting in increased recorded intensity. Typically, once the multiple datasets have been merged, the recorded resolution on the I23 beamline is comparable to that of standard beamlines. However, it is important to note that at long wavelengths, the resolution can be limited by the detector, as illustrated in Supplementary Fig. 3. Se-SAD (λ = 0.97 Å, f″ = 4 e⁻) is a widely used experimental phasing method, and although the I23 beamline cannot reach such short wavelengths, the same f″ value can be obtained at 3.1 Å wavelength (close to the Se L-edge: λ = 7.49 Å). However, this does not account for the Se K-edge white line. Hence, a standard beamline is more suitable, especially for Se-MAD experiments. Nevertheless, one must consider if Se labeling is necessary when S-SAD can be performed with native crystals. In addition, native-SAD data provides accurate locations of anomalous scatterers: sulfur positions can assist with assigning the protein sequence with the help of cysteines and methionines, and positions of other scatterers can help with the identification of co-factors or metal ions to improve the quality of deposited models. In this study, we show that the increased wavelength range not only boosts the anomalous signal for sulfur and phosphorus, but additional elements, like calcium, potassium, vanadium, or chlorine can now be routinely considered for experimental phasing experiments. We are currently establishing protocols to use analytical absorption corrections and laser shaping to deal with the increased sample absorption. This will improve data quality at the longest wavelengths further to exploit the full potential of this method.