CoCo-MD: A Simple and Effective Method for the Enhanced Sampling of Conformational Space

CoCo ("complementary coordinates") is a method for ensemble enrichment based on principal component analysis (PCA) that was developed originally for the investigation of NMR data. Here we investigate the potential of the CoCo method, in combination with molecular dynamics simulations (CoCo-MD), to be used more generally for the enhanced sampling of conformational space. Using the alanine penta-peptide as a model system, we find that an iterative workflow, interleaving short multiple-walker MD simulations with long-range jumps through conformational space informed by CoCo analysis, can increase the rate of sampling of conformational space up to 10 times for the same computational effort (total number of MD timesteps). Combined with the reservoir-REMD method, free energies can be readily calculated. An alternative, approximate but fast and practically useful, alternative approach to unbiasing CoCo-MD generated data is also described. Applied to cyclosporine A, we can achieve far greater conformational sampling than has been reported previously, using a fraction of the computational resource. Simulations of the maltose binding protein, begun from the "open" state, effectively sample the "closed" conformation associated with ligand binding. The PCA-based approach means that optimal collective variables to enhance sampling need not be defined in advance by the user but are identified automatically and are adaptive, responding to the characteristics of the developing ensemble. In addition, the approach does not require any adaptations to the associated MD code and is compatible with any conventional MD package.


d) Development and evaluation of the reweighting/resampling method for CoCo-MD ensembles.
The procedure to correct, approximately, for the biased sampling produced by the CoCo-MD method is as follows: 1. For each member of the ensemble, we obtain the coordinates in a principal component subspace, and the potential energy. 2. By binning coordinate and energy data, we assign each structure to a microstate. 3. A histogram of the potential energy distribution from a short but equilibrated (c 10 ns) conventional MD simulation of the same system is generated. A target number of counts for each bin in the histogram is set: this is what the resampling process will seek to satisfy. 4. A random microstate is selected. 5. A random member of the biased ensemble that is in that microstate is selected. 6. If it has an energy that corresponds to a bin in the potential energy histogram that has a count less than the target value, the member is added to the new ensemble, and the population count in that bin is increased by one. 7. We return to step 4, until there are no bins left that need more samples.
The final result is a new ensemble, formed by sampling with replacement from the original, that has a potential energy distribution that matches a conventional MD simulation of the same system.
We used a simple Metropolis Monte-Carlo approach to generate a Boltzmann weighted sample on the surface (100000 trials, 22000 samples, acceptance ratio 0.22). From this we generated reference 2D and 1D free energy surfaces ( Figure S2b and S2c).
We then repeated the procedure, in the presence of a bias potential that was added to the "true" potential ( Figure S2d): The apparent 2D and 1D free energy surfaces from this biased sampling is shown in Figure  S2e and S2f Since in the CoCo-MD method we obtain a biased sampling, but have an unbiased potential, we calculated the true (unbiased) energy for each member of the biased sample. For this toy system, the microstate of each sample is defined precisely by the indices (i, j). Therefore to mimic a more realistic scenario, we assumed that only coordinate i was known, plus the potential energy. We then explored how well our reweighting method could recover the true one-dimensional free energy surface (along i), from the biased sample. Without including potential energy in the microstate assignments, all samples are assigned to one of just twenty microstates according to the value of i, and the method over-corrects for the bias: the two minima are of equal energy ( Figure S2h). As we add potential energy into the microstate definition the performance of the method improves dramatically. With increasing resolution in the partitioning of the energy (greater number of bins), the number of microstates increases ( Figure S2g) and the accuracy with which the 1D free energy surface can be recovered increases. However, the performance with a quite modest discretisation of the energy term is very reasonable ( Figure S2h). Figure S2: Application of the CoCo-MD ensemble reweighting method to a simple doublewell potential. See text for details. Figure S3. Kullback-Leibler divergence in the histograms generated from the first and second halves of the merged two hundred independent 10ns CMD simulations of alanine pentapeptide, as a function of the amount of each initial trajectory that was trimmed off, as potentially biased by the starting configuration. Figure S4: Convergence in the estimated value of the free energy difference between the extended and alpha-helical states of alanine pentapeptide for CMD simulations begun from the alpha-helical state. Each estimation used ten of the one hundred independent 10 ns simulations.

Analysis of MBP conformational sampling.
List of crystal and NMR structures (PDB codes) used for the PCA analysis. In the case of NMR ensembles, just the first submitted model was used.