Impact of the Conformational Variability of Oligopeptides on the Computational Prediction of their CD Spectra

While successful in the structural determination of ordered biomolecules, the spectroscopic investigation of oligopeptides in solution is hindered by their complex and rapidly changing conformational ensemble. The measured circular dichroism (CD) spectrum of an oligopeptide is an average of the signals coming from all the ensemble of microstates, severely limiting its interpretation, in contrast to the successful structural determination of ordered biomolecules. Spectral deconvolution methods to estimate the secondary structure contributions in the ensemble are still mostly based on databases of larger ordered proteins. Here we establish how the interpretation of CD spectra of oligopeptides can be enhanced by the ability to compute the same observable from a set of atomic coordinates. Focusing on two representative oligopeptides featuring a known propensity towards an α -helical and β -hairpin motif, respectively, we compare and cross-validate the structural information coming from: deconvolution of the experimental CD spectra, sequence-based de novo structure prediction and molecular dynamics simulations based on enhanced sampling methods. We ﬁnd that small conformational variations can give rise to signiﬁcant changes in the CD signals. Therefore, while for the simpler conformational landscape of the α -helical peptide de novo structure prediction can already give reasonable agreement with the experiment, an extended ensemble of conformers needs to be considered for the β -hairpin sequence.


Introduction
Oligopeptides and disordered proteins are characterized by an ensemble distribution of dynamic and rapidly changing conformers (microstates). 1 Fundamental understanding of such conformational ensembles is key to the basic principles of protein folding and misfolding, with important implications for the aggregation of proteins and their related diseases. 2,3 In these cases, solid-state based techniques such as X-ray diffraction, while extremely successful in the structural determination of ordered proteins, are not informative, because they miss a full consideration of the ensemble. Solution-based experimental structural investigation methods, e.g. NMR or optical spectroscopies, also deliver incomplete information, since they output a signal representative of the conformational macrostate, i.e. the Boltzmann average of all ensemble microstates. However, the measured signals can be commonly analysed to estimate of the relative population of typical secondary structure elements within this ensemble using deconvolution approaches. [4][5][6][7] Among other solution-based methods, electronic circular cichroism (CD) spectroscopy is widely used for identifying and quantifying the secondary structures of solvated biomolecules, using regression techniques based on datasets of proteins with both accessible CD spectra and crystal structures. 8 A more direct observation of the conformations of oligopeptides can be achieved via computational approaches such as all-atom molecular dynamics (MD) based on classical force fields, given their proven ability to predict the dynamics of biomolecular systems with high accuracy. However, so-called enhanced-sampling MD methods are required to capture with statistical relevance all rare events contributing to intramolecular conformational changes, and thus to explore the conformational ensemble of oligopetides. 9 In particular, Replica Exchange with Solute Tempering (REST) 10 in combination with Meta-Dynamics (MetaD) 11 is an advanced-sampling MD method that has been successfully used for this purpose. [12][13][14][15] In this context, the ability to compute experimental observables from RESTMetaD trajectories constitutes a valuable tool of cross-validation and can strengthen the interpretation of the experimental data. The combination of CD spectroscopy and atomistic simulations has been indeed successfully employed to gain some atomistic understanding of conformational ensembles. 14,15 This is due to the fact that a CD spectrum can be computationally predicted based on the atomic positions of a given biomolecular conformation. [16][17][18] When dealing with ensembles, however, the need to consider thousands of different conformations requires a computationally efficient approach for predicting the CD spectra of the corresponding microstates. A popular approach is provided by the DichroCalc software platform, 19 which employs the so-called matrix method to calculate CD ellipticities from first principles using exciton theory.
While promising in general, this approach has been proven to work best for oligopeptides and proteins with high α-helical secondary structure content. 14,15 This is due to the nature of the electronic transitions in CD, which can be well captured by only a few matrix elements in the case of α-helices. 19 Moreover, the regularity of the α-helical hydrogen-bond pattern produces intense spectral features in well-defined wavelength regions, facilitating the computational prediction. 6 On the quest towards a generalization of this approach, it would be desirable to tackle the higher structural variability of conformations associated with other secondary structure motifs, in particular β-sheets and turns. 4,5 The aim of this paper is to move in this direction, by choosing two oligopetides featuring, respectively, a preferential α-helical and a β-hairpin conformation, and determining both experimentally and theoretically their CD spectra in order to evaluate and compare their conformational ensembles. For the helical model peptide, we choose the 4DAR5 sequence, 20,21 which some of us investigated previously, 14 demonstrating for the first time the possibility of predicting experimental CD spectra from enhanced-sampling MD simulations.
For the β-hairpin model, we choose the C-terminal fragment of the B1 domain of protein G (called GB1 throughout this paper), which has been intensely studied experimentally and theoretically. [22][23][24][25][26][27][28][29][30] Our analysis will focus both on rapid de novo secondary structure prediction tools and more demanding RESTMetaD simulations, from which CD spectra will be calculated and compared with our own experiments.
In doing so, we will attempt to separately assess the predictive power and limitations of two different key elements involved in the theoretical CD prediction, namely (i) the pa-rameter sets of the DichroCalc method, and (ii) the influene of the force fields used in the MD simulations. The latter is crucial to predict the correct equilibrium distribution of conformational microstates, a task that has proven very hard especially for short peptides with non-helical structures. The former should lead to quantitatively correct spectra when applied to the correct ensemble of microstates. The consideration of peptides with known conformational phase spaces will allow us to distinguish between these two potential sources of errors and thus help us apply the same methodology to new oligopeptide sequences with unknown conformations in future studies.

Circular dichroism (CD) experiments
CD experiments were performed with oligopeptides with sequences Ac-DDDDAAAAARRRR-Am (4DAR5), GEWTYDDATKTFTVTE (GB1-uncapped) and Ac-GEWTYDDATKTFTVTE-Am (GB1) synthezised by JPT (purity ≥ 95 % Berlin, Germany). The as-purchased oligopeptide 4DAR5 was dissolved in ddH 2 O (18 MΩ cm −1 , MilliQ, Synergy, Millipore, Germany) at a stock concentration of 1 mg ml −1 , as quantified by absorbance at 205 and 214 nm. 31,32 The as-purchased oligopeptides GB1 and GB1-capped were dissolved in a 3 mM PBS solution (41 mM NaCl; P4417, Sigma Aldrich, Germany) at a concentration of 1 mg ml −1 , as quantified by absorbance at 205 and 214 nm. CD spectra were recorded with an Applied Photophysics Chirascan spectrometer running the Pro-Data Chirascan software (v4.2.22). At least three repeat scans for each sample were measured at 25 • C, over the wavelength range of 190 to 250 nm using intervals of 1 nm, in Suprasil quartz cells (Hellma UK Ltd.) with a pathlength of 0.05 cm. The scans were averaged, the respective baseline subtracted and the resulting net spectra smoothed with a Savitsky-Golay Filter using smoothing windows of 5 to 10 data points.
The mean residue ellipticity (Θ M RE ) was defined as 33 where Θ is the raw CD ellipticity (mdeg), n is the number of amino acids in the solvated peptide, l is the pathlength of the used quartz cuvette (mm) and c the molar concentration of the peptides. To estimate the relative amount of specific secondary conformational elements in the samples, the CD spectra were analysed using the BeStSel webserver. 4,5 This analysis decomposes the whole CD spectrum in terms of individual secondary structure contributions using an extended set of eight independent elements with special emphasis on the different types of β-sheets defined by different twist angles in the polypeptide backbone. 4,5 De novo secondary structure prediction The intrinsic conformational propensities of the two investigated peptides are evaluated with the PEP-Fold secondary structure prediction server. 34-36 PEP-Fold first predicts Structural Alphabet (SA) letters from the primary peptide sequence using a hidden Markov model approach. The letter fragments are then assembled by a Greedy procedure driven by a modified OPEP coarse-grained force field energy score followed by a clustering procedure, leading to a maximum of five most-probable 3D structures.

Enhanced sampling Molecular Dynamics (MD) simulations
MD simulations were performed with LAMMPS, 37 employing the Amber03 force field, 38,39 following Meissner et al. 14 In the GB1 case, a further simulation employing the Amber14SB force field was performed, for comparison. 40 Visualization and analysis of the trajectories were performed with VMD. 41 Free energy profiles for both peptides, in capped form, along the respectively chosen collective variables were computed with Replica Exchange with Solute Tempering (REST) 10,42,43 combined with Metadynamics (MetaD), 11 as employed in previous works. 13,14 Both peptides were defined as the 'solute' in the REST simulations, whose temperature was scaled in the different system replicas ('hot' system region). The water and the salt ions remained at the base temperature T 0 = 300 K ('cold' system region).
with N corresponding to the total number of amino acids in the helix (N = 13) and the index i spanning between 11 residues. An ideal helix assumes a value H = 100 %, whereas a completely unordered structure assumes a value H = 0 %. In accordance with Bussi et al., 12 for GB1 the radius of gyration R Gyr and the number of hydrogen bonds N H were used as the CVs. They are defined in PLUMED 1 44 as follows: where m i and m i are the positions and masses of the heavy atoms of the backbone of the peptide, respectively, and r COM is the position of the center of mass;

Structural and cluster analysis
Cluster analysis of the conformational microstates sampled by the RESTmetaD trajectories in selected regions of the free energy landscapes was performed using the GROMOS algorithm, 45 as implemented in Gromacs, according to the differences in the root-mean square displacement (RMSD) values of individual conformers, using an RMSD cutoff of 3 Å applied to the atoms of the backbone. The conformers were aligned by a rigid rotation and translation to minimize the RMSD variation prior to clustering. For comparison, a hierarchical clustering was performed, on selected regions, using the distance in inter-residue contact maps. For further validation, average inter-residue contact maps were also computed for the most populated clusters obtained by backbone RMSD clustering. The secondary structures of the most populated clusters were also analyzed via the STRIDE algorithm. 46 The Φ and Ψ dihedrals of the peptides backbones along the trajectories were computed via the 'chi' program in Gromacs to create Ramachandran plots.

Calculation of theoretical CD spectra and analysis
Theoretical CD spectra of individual conformational microstates were calculated using the DichroCalc web interface. 17,19 DichroCalc estimates the mutual interactions between electronic transitions with a model exciton Hamiltonian matrix; 19 the sign and magnitude of the interactions is dictated by the relative orientation and separation of chromophores, i.e., by the precise structure of the protein/peptide. The matrix elements were computed using three different parameter sets modelling the n → π * and π nb → π * transitions of the peptide bond chromophore, namely: a set derived from semi-empirical calculations, 47 a set based on ab initio calculations, 48 and another ab initio set in which the vibrational structure of the π nb → π * transition is explicitly included. 49 The vibrational structure, arising from the Franck Condon progression of the π nb → π * transition, contributes to the broadening of CD bands and is thus important to improve the agreement with experimental spectra. In all calculations, neither the π b → π * transition nor charge-transfer transitions were included. The calculated line spectra were convoluted with Gaussian bands of bandwidth 12.5 nm, except for the spectra calculated with the parameters incorporating vibrational structure, where a narrower bandwidth of 10 nm was applied, to avoid double-counting.

Circular dichroism experiments
We first report the experimentally measured CD spectra of the two model peptides 4DAR5 and GB1. In agreement with previous investigations, 20,21 the CD spectrum of 4DAR5 is characterized by two minima at 208 and 222 nm and a maximum at 190 nm, an overall shape which is commonly associated with helical conformations (Figure 1 a). However, the BestSel analysis of the spectrum reveals a helical content of only 24.5 ± 0.8 % and a major contribution of unordered conformations ('other' elements). The CD spectrum of GB1 is reported in Figure 1 b. Since a capping of the end groups has been reported to stabilize the β-hairpin conformation of the peptide, 50 we have also measured the CD signal for the native form of GB1 (GB1-uncapped, Figure S1 in Supporting Information). However, we do not report any difference between the two forms, suggesting that the acetyl-amidation does not have a significant effect on the conformational ensemble. In the following, we will only refer components (c.f. Figure 1 b)). Indeed, NMR experiments on GB1 in solution also reveal that the peptide is only 30 % folded in water at 25 • C. 22

De novo structural predictions
We now evaluate the intrinsic conformational propensity of 4DAR5 and GB1 with the Pep-Fold de novo secondary structure prediction server. [34][35][36] All five most-probable structures for 4DAR5 (labelled PF1 to PF5) present a helical con-formation with different extent of disorder in their termini (Figure 2 (a)). For all structures, we compute the corresponding CD spectra using DichroCalc and three different parameter sets, as described in the Methods section (Figure 2 (b)). The semi-empirical parameter set gives a very good qualitative agreement to the experimental spectrum, predicting the two minima at 206 and 222 nm for all structures. The structures in best agreement (PF1 and PF3) present a helicity value H of 65% and 60%, respectively, as computed with equation 2.
The ab initio parameter set is not able to reproduce the experimental CD spectrum for any  The predicted structures for GB1 all present β-hairpin configurations, differing mostly in the number of residues within the antiparallel β-strands and in their mutual twisting ( Figure   3 (a)). Irrespectively of the employed parameter set, a large variability can be observed in the corresponding DichroCalc CD spectra (Figure 3 (b)). The spectra of PF1, PF3 and PF4 present maxima in the region between 200 and 220 nm, which are entirely absent in the experiments. Only the 'ab initio + vib' parameter set predicts spectra for PF2 and PF5 that follow at least qualitatively the experimental spectral shape. A further validation of the parameter sets was also performed on selected results of the enhanced sampling simulations, vide infra and Supporting Information.

Enhanced sampling simulations
The whole conformational ensembles of the two peptides are explored with RESTmetaD, following our previous work. 14 The free-energy profile associated with the folding/unfolding of the 4DAR5 peptide in TIP3P water along the collective variable H is reported in Figure 4 ( The resulting two-dimensional free energy profile is reported in Figure 5,

Structural cluster analysis and calculation of CD spectra
The RESTmetaD ensembles presented above are now used to compute averaged CD spectra, which are compared with the experimental ones. In the present work, rather than considering the entire sampled phase space, we first select constrained regions of interest (delimited by black lines in Figures 4 and 5). We then apply a clustering algorithm based on the  plots (c.f. Figure 6 b)) confirm the STRIDE secondary structure assignments. However, at variance with 4DAR5, the residues are spread in a wider Φ/Ψ range. Noteworthy, only in the most populated cluster of region 5 the K10 residue of GB1 adopts torsional angles characteristic of the α L region of the Ramachandran plot, which is usually populated by residues in a turn conformation. 23 Inter-residue average contact maps for the most populated clusters in each region of the GB1 free energy landscape ( Figure S4 in Supporting Information) also confirm the previous structural assignment, with typical α-helical signatures in regions 2, 3 and 4 and β-hairpin signatures in regions 5 and 6.
The observed conformational richness within the computed free energy landscape makes the GB1 system an interesting playground to test the predictive capability of the DichroCalc parameter sets. The average spectrum corresponding to the most extended and energetically unfavored conformation (region 1) is qualitatively not dissimilar from the average spectrum of unfolded 4DAR5, although with larger spectral intensities. The spectra in regions 2, 3 and 4 do reveal α-helical signatures (e.g. local minima at about 220 and 225 nm), in agreement with the cluster and STRIDE analyses ( Figure 5 and 6). However, the spectral variability among the single microstates within these clusters is very high, consistent with the overall poor degree of order therein. A high variability of CD spectra is found also in region 5, even though the structure of the conformers inside the most populated cluster are tightly overlaid. We note that a similar variability was observed in the computation of CD spectra for the de novo predicted GB1 structures (c.f. Figure 3), that, indeed, locate themselves within the same region of the CV space (stars in Figure 5) This situation is even more evident in the case of the GB1 peptide. For instance, although the five most probable structures predicted by the de novo server all present a characteristic β-hairpin shape (see Figure 3 (a)), their computed CD spectra vary by a very large extent (see Figure 3 (b)) irrespectively of the employed DichroCalc parameter set. This finding confirms the sensitivity of CD spectra towards minor differences in β-sheet conformations.
A similar picture emerges from analysis of the CD spectra associated will all ensemble microstates within the β-hairpin region of the free-energy landscape of GB1 (region 5 and 6 in Figure 5). Besides a shift in the minimum position, the larger disagreement is found in the wavelength region between 223 and 227 nm (c.f. Figure 5, CD spectra of region 5 and 6), where the experimental spectrum presents a shallow negative shoulder, whereas the theoretical spectrum presents a shallow positive maximum. This possibly indicates that the conformational ensemble of GB1 comprises not entirely of microstates with β-hairpin propensity, but also of a substantial amount of microstates with (partially) unordered conformers. This is indeed in agreement with experimental NMR investigations, proposing that GB1 is only 30, % folded as a β-hairpin in water. 22 Indeed the STRIDE analysis performed in this regions assigns 4 out of 16 residues partially to a random coil conformation and 4 out of 16 to a turn conformation. In terms of CD reference spectra, both of these conformations would be charaterised by positive values in the wavelength region between 222 and 227 nm.
As a matter of fact, the BeStSel basis set is constructed from CD spectra of larger proteins, whose characteristic secondary structure elements are well characterised by X-ray diffraction or NMR measurements. 4,5 The protein environment exerts a strong stabilizing action over the atomic positions of such structure elements, even if they occupy terminal positions, 15 which is the case of the GB1 fragment. Solvated oligopeptides, even if carrying the same sequence and possessing the same intrinsic conformational propensity in the folded state, are subjected to larger structural fluctuations, and thus, to a considerably magnified extent, to large CD spectral variability. The CD community has recognised this fact, especially for the case of β-sheets elements and their sensitivity to mutual strand twisting. 4,5,55 BeStSel, for instance, considers an extended parameter set associated with various kinds of β-sheets, which makes its secondary structure analysis generally more robust than other approaches, especially if applied to oligopeptides. 56 We suggest that further improvements could be achieved if the spectral data set was extended in a way that is able to capture the large thermal fluctuations of oligopeptides within confined regions of the peptide's conformational phase space. To reach this aim, the inclusion of theoretically predicted CD spectral variability in the spirit of the approach presented in this paper could be highly beneficial. Taking into account not only the intrinsic structural variability of oligopeptides but also the CD spectral variability for each representative structural motif (as identified e.g. via cluster analysis techniques) will be essential especially to rationalise the conformational phase space of intrinsically disordered peptides and proteins. [57][58][59] A further crucial point regarding the prediction of structural propensities of oligopeptides via combined experimental CD analysis and theoretical exploration of their conformational phase space is the choice of the force field. 40 When no knowledge about the structure is available a priori, it is not guaranteed that the employed force field predicts correctly the relative energy levels of the various local minima of the free energy surface, no matter which collective variable are chosen for the analysis. For instance, the Amber03 force field employed here notoriously overestimates the stability of helical motifs, as observed in the case of GB1.
As a direct comparison, however, we found that also the more recent Amber14SB force field fails to capture the correct free energy landscape for the peptide ( Figure S5 in Supporting Information). Recent developments for force fields aim at an accurate description of the ensembles of intrinsically disordered proteins, 60-62 which could also be very promising for oligopeptides. However, it is very reassuring to learn that the cluster-averaged CD spectrum predicted by the 'ab initio + vib' parameter set of DichroCalc for confined regions of the free energy surfaces with clear structural ordering (e.g. around the minimum corresponding to a β-hairpin fold) does agree well with the experimental measurement.

Conclusions
We have shown that conformational ensembles can be described with atomistic precision by means of Circular Dichroism spectroscopy coupled with all-atom molecular dynamics simulations based on enhanced sampling methods. In particular, the recently introduced 'ab initio + vib' parameter set for the calculation of CD spectra within DichroCalc was proven capable of reproducing the spectral features of peptides with both helical and β-sheet intrinsic conformational propensities. This is despite the observed sensitivity of CD spectra prediction with respect to the geometry of the individual conformers composing the ensemble. The effect of this sensitivity is that de novo structure prediction can give reasonable results in term of computed CD spectra in the case of extremely simple conformational phase spaces, such as pure helices, while an extended ensemble of conformers already needs to be considered for β-sheets. Aiming at a generalization of the employed approach, we envisage that for preferentially unordered oligopeptides and proteins a sequence-based structure prediction with a limited amount of conformers will be even more inappropriate. This makes essential both the usage of advanced sampling simulations combined with accurate cluster analysis and the experimental acquisition of CD spectra also in the wavelength region below 190 nm, as accessible by Synchrotron techniques.