Molecular Simulations of Mg2+-induced Folding of the Central Domain of the 16S Ribosomal RNA

The central domain of the 16S ribosomal RNA folds independently driven either by Mg2+ ions or by interaction with a ribosomal protein. In order to provide a quantitative description of ion-induced folding of the ∼350 nucleotide RNA, we carried out extensive simulations using the coarse-grained Three Interaction Site (TIS) model with explicit Mg2+ ions at a fixed K+ ion concentration typically used in the Tris buffer. The Mg2+ dependence of the radius of gyration shows that globally the RNA folds cooperatively. However, various structural elements order at different Mg2+ concentrations, indicative of the hierarchical assembly even within a single domain of the rRNA. Binding of Mg2+ ions is highly specific with successive ion condensation resulting in increasing tertiary structure formation. We also predict the Mg2+-dependent protection factors, measurable in hydroxyl radical footprinting experiments, which corroborate the specificity of Mg2+-induced folding. As a byproduct of the simulations, we provide a molecular basis of the structural rearrangements in the three-way junction in the intact central domain whose folding in isolation has been studied experimentally. In accord with experiments, we find that the three coplanar helices (h20, h21, and h22) with ∼120° interhelical angles at low Mg2+ concentration undergo a large conformational transition to a state in which h21 and h22 are coaxially stacked, and h20 forms an acute angle with h21 at high Mg2+ concentration. The angle distributions are broad even at high Mg2+ concentrations attesting to the assembly heterogeneity. Most of our results are predictions that could be tested. Graphical Abstract


INTRODUCTION
The determination of the spectacular ribosome structures has galvanized great interest in dissecting how such a complex structure could assemble so rapidly in vivo. Since ribosomes synthesize proteins in all living organisms, considerable energy and other regulatory mechanisms are used to generate and maintain their homeostasis. Although there have been vast experimental e↵orts to understand the molecular mechanism of ribosome assembly, which led to some important early discoveries, such as Nomura and Nierhaus maps [1, 2], ⇤ dave.thirumalai@gmail.com the general principles associated with the assembly process of the ribosome have not been fully resolved. [3,4]. In order to solve the assembly problem, many beautiful experiments have been systematically performed initially by investigating how the various domains of the rRNA fold and subsequently by the e↵ect of rRNA proteins in reshaping the assembly landscape. These studies, not unexpectedly, have produced an evolving and a complex picture of the assembly process. Along the way it has been pointed out, especially by Woodson [5], that some of the principles of ribozyme folding might form a useful framework for producing a quantitative theoretical model for ribosome assembly. After all rRNA folding problem has to be solved to produce an intact ribosome capable of protein synthesis.
In bacteria, the three ribosomal rRNA chains (⇠4,500 nucleotides in total) fold and assemble together with over 50 r-proteins in order to build one functional ribosome. The small subunit, 30S particle, is itself a large ribonucleoprotein complex consisting of a single 16S rRNA chain (approximately) 1500 nucleotides and about 20 proteins. The rRNA chain could be further decomposed into the 5 0 (⇠ 560 nucleotides), central (⇠ 350 nucleotides), and 3 0 (⇠ 625 nucleotides) domains. It is known that the three domains fold independently in the absence of r-proteins. Therefore, a logical approach is to understand how the individual components, especially the various rRNA domains, might fold independent of each other.
Previous simulation studies, focusing on the nuances of the protein (S4) induced structural transitions in the 5 0 domain [6][7][8] have been most insightful, especially when combined with experiments [8]. These studies showed that S4-guided assembly results in the 5 0 domain reaching the folded state by navigating through multiple metastable states, revealing the rugged nature of the folding landscape of RNA [9,10]. Inspired by the studies on the complex assembly process, here we investigated using simulations of an accurate model the folding of Mg 2+ -induced folding of the central domain of the 16S rRNA. We view this as a first step in the computational leading to protein-rRNA interactions. Although this work provides no insights into any aspect of ribosome assembly, we hope to use this study as a spring board to a more detailed description of the r-protein interactions with ribosomal RNA.
A major hurdle that has prevented computations of rRNA folding and ribosome assembly is the complexity and the size of the ribosome, in addition to the unsatisfactory quality of force fields for RNA and metal ions. We tackle these issues using coarse grained (CG) models, which we have used with some success in describing ion-induced folding of RNA and the e↵ects of denaturants on folding transitions in a number of proteins. In particular, we have recently created a model for RNA, which quantitatively captures the folding thermodynamics of group I intron RNA and other RNA constructs, with particular focus on the e↵ects of divalent cations [11].
Divalent cations, Mg 2+ in particular, are essential in stabilizing tertiary structures of RNA. In the context of ribosome assembly, the importance of Mg 2+ has been recognized from various kinds of experiments since the 1950s [12,13]. For instance, it has been recently shown that the entire 23S rRNA from the large subunit of the bacterial ribosome forms a near-native conformation in the presence of Mg 2+ without any r-proteins [14]. In the central domain of the small subunit, in vitro studies have shown that a three-way junction (3WJ) (Fig. 1) is in dynamic equilibrium between an open and a closed conformation, whose populations depends on the Mg 2+ concentration. Experiments have shown that the 3WJ undergoes substantial rearrangements upon addition of Mg 2+ or S15 [15][16][17].
Here we study Mg 2+ -dependent folding of the central domain of the small subunit of bacterial ribosome. We chose this 346-nt RNA fragment as our starting point because it has been used in experimental studies on the early stage of RNA assembly process involving r-proteins [16][17][18][19][20]. In this paper, we focus on the Mg 2+ -dependent folding of the isolated RNA, aiming to establish a basis for subsequent studies including assembly mediated by r-proteins.

MODEL AND METHODS
Three-Interaction-Site model: We used the Three-Interaction-Site (TIS) coarsegrained RNA model with explicit ions [11] to simulate the folding of the ribosomal RNA. In our previous study we establish that the model quantitatively reproduces thermodynamics of folding of RNA hairpins and pseudoknots as well as folding of the 195 nucleotide Azoarcus ribozyme [11]. Because the details of the model have been reported previously [11], we only provide a brief description here. In the TIS model [21], each nucleotide is represented by three coarse-grained spherical beads corresponding to phosphate, ribose sugar, and a base.
Ions in the solution, Mg 2+ , K + , and Cl -, are explicitly treated whereas water is modeled implicitly with temperature-dependent dielectric constant. Briefly, the e↵ective potential energy is taken to be, U TIS = U L + U EV + U ST + U HB + U EL , where U L accounts for chain connectivity and bending sti↵ness of the polynucleic acids, U EV accounts for excluded volume interactions of each chemical group including interactions between RNA sites and ions, U ST and U HB are the base-stacking and hydrogen-bond interactions, respectively. All the consecutive bases have the base-stacking interactions, of which the strength depends on the sequence. Any pair of the canonical Watson-Click base pairs (A-U and G-C) and the Wobble base pair (G-U), separated by at least four nucleotides along the chain, can form hydrogen bonds thus contribute U HB . In other words, the model accounts for certain non-native interactions as well. Besides those general stacking and hydrogen-bonding interactions, we generate a list of tertiary stacking and hydrogen-bonding appear in the specific RNA based on the crystal structure (see the next paragraph). The interactions between the charged moieties (phosphate groups and ions), U EL , interact via the standard Coulomb potential.
The values of the charges on phosphate groups and ions are 1 (phosphate), +2 (Mg), +1 (K) and 1 (Cl). All the force-field parameters used here are the same as in our earlier study [11]. Applications to a wide variety of RNA molecules show that the TIS model is transferable, quantitatively accounting for many aspects of RNA folding, although it would require exhaustive testing to be sure that this is so. consists of ⇠350 nucleotides (C562-A914 in T. thermophilus, PDB entry 1J5E [22]). The secondary structure map and the tertiary structure are displayed in Figure.1C. The list of hydrogen bonds was generated by WHAT-IF server based on the crystal structure [23].
The secondary structure diagram was generated using RNApdbee [24] and Forrna [25]. All three-dimensional graphics of this paper were generated with VMD [26].
Simulations: In this study, we examined the e↵ects of Mg 2+ ions by varying the concentration from 0 to 30 mM in the presence of 50 mM K + , a typical value contained in the Tris bu↵er. All simulations were conducted using periodic cubic box (each side is 35 nm) containing di↵erent number of Mg 2+ , determined by the concentration of divalent ions, and a fixed number of K + ions. An appropriate number of anions (Cl -) was added to neutralize the entire system. We performed low friction Langevin dynamics simulations at 37 C in order to sample conformations of the system containing the ribosomal RNA and ions [27].
For each condition (Mg 2+ concentrations), at least 60,000 conformations were collected from the equilibrated trajectories.
Calculation of contact Mg 2+ concentration: To quantify the a nity of Mg 2+ at each nucleotide site, we computed a contact ion concentration around the i th nucleotide using [11], where ⇢ i (r) is number density of the ion at the distance r from the phosphate of the i th nucleotide, V c is the spherical volume of radius r c , and N A is the Avogadro's number to represent c ⇤ i in molar units. In order to count only tightly bound Mg 2+ , we used a cuto↵ distance, r c = R Mg + R P + r where R Mg and R P are radii of Mg 2+ and phosphate sites, respectively, and r = 0.15 nm is a margin for contact formation. Since r is small, the quantity c ⇤ i corresponds the local molar concentration of Mg 2+ at the surface of the phosphate groups.
Calculation of Mg 2+ e↵ects on conformational changes: We computed the free energy contribution of Mg 2+ binding to conformational changes as G ↵ , where ↵ represents a certain conformational change (e.g. formation of certain tertiary contacts, TCs). This quantity can be defined for each Mg 2+ binding site, namely G ↵ (i) for the i th nucleotide.
The changes in the stability upon specific Mg 2+ binding is, where the superscripts Mg and indicate that Mg 2+ is bound and unbound to the i th nucleotide, respectively. Each term on the right hand side gives the stability due to contact formation, given that Mg 2+ is bound or unbound to the i th nucleotide. In other words, where G Mg ↵F and G Mg ↵U are the free energies of the states where the contact ↵ is formed and disrupted, respectively, given that Mg 2+ is bound to the i th nucleotide. Similarly, is the stability of the contact given that Mg 2+ is unbound. In our simulations, we calculated the di↵erence in the free energy as a combination of the joint probabilities computed from the ensembles of conformations, where each P ↵, i is a joint probability. For instance, P ↵, i (F, Mg) is the joint probability that the contact ↵ is formed, and Mg 2+ is not bound to the i th nucleotide.
Angles between helices: Our previous work had shown that a key element in the self-assembly of Azoarcus ribozyme is the establishment of an angle between certain helices, which due to topological frustration, occurs only at high Mg 2+ . To determine if this is the case for this piece of the rRNA we computed the Mg 2+ -dependent angles between certain helices. In the three-way junction ( Figure 1C), angles (⇥) between helices h20, h21, and h22 were calculated using the axis of each helix. The helix axes were computed using Kahn's algorithm [28] using nucleotides 577-586 and 755-764 for h20, 588-597 and 643-651 for h21, and 655-672 and 734-751 for h22.
Footprinting Protecting Factors (FPFs): In order to calculate the solvent accessible surface area (SASA), we first reconstructed atomistic structures from simulations based on coarse-grained coordinates using a tool developed in-house that employs a fragment-assembly approach and energy minimization by AmberTools [29][30][31]. Using the reconstructed atomically detailed structures, SASA was computed with Free-SASA version 2.0 [32]. It is known that experimental footprinting data, obtained using hydroxyl radicals, is highly correlated with SASA of the sugar backbone [33,34]. Considering that hydroxyl radicals preferably cleave C4' and C5' atoms of the RNA backbone [34], we assigned the larger SASA values of C4' and C5' atoms to each nucleotide. From the SASA data, we computed the protection factor of i th nucleotides as, where the bracket indicates an ensemble average [35]. We used the conformations obtained in the absence of Mg 2+ to compute the average SASA of the unfolded state, hSASAi Unfolded .
Hill equation for R g : The global transition of the rRNA size, measured by R g as a function of Mg 2+ concentration ([Mg 2+ ]), was fit to the Hill equation, The two parameters, the Hill coe cient (n) and the midpoint of Mg 2+ concentration ([Mg 2+ ] m ), were obtained by fitting the simulation data to Eq. 7. Before fitting, we converted the average R g values at each [Mg 2+ ] to ✓ using, where max (R g ) and min (R g ) were taken from the average R g values at the lowest and highest Mg 2+ concentrations, respectively. may be needed to fully drive the conformation to that found in the crystal structure. To see the similarity between the structures found in the simulations and the crystal structure, we also computed the root-mean-square deviation (RMSD, Figure 2A inset). The RMSD of the entire rRNA fragment is 1.5 nm at 30 mM Mg 2+ , whereas RMSD computed excluding h21 and h27 is 0.7 nm. From this data, we conclude that fluctuations of h21 and h27 contribute to the slightly larger R g found in simulations. The larger R g compared to the crystal structure notwithstanding, we conclude that the structure is correctly folded at high [Mg 2+ ], as visually illustrated in Figure 2. Figure 2(B, C) shows the distance distribution functions at several Mg 2+ concentrations, which may be obtained as the inverse Fourier transform of the scattering function that is measurable by SAXS experiments, thus serving as a testable prediction. We also calculated the shape parameters, asphericity (0   1) and prolateness S ( 0.25  S  2) [36,37]. Both and S are unity for rods, whereas = S = 0 for spheres. The calculated distributions show that the shape of the RNA is shifted from being spherical to a prolate ellipsoid as the [Mg 2+ ] concentration increased.

Mg
Helix h19 forms at high Mg 2+ : In the rRNA fragment, there are nine distinct he-lices, that are conventionally named h19 through h27 (see Figure 1C). We calculated the probability of formation of the helices as a function of [Mg 2+ ] ( Figure 3). All the helices except the short h19 are stable over the entire range of Mg 2+ concentration in the presence of 50 mM KCl at the simulation temperature (37 C). These helices, which are stable at very low [Mg 2+ ], contain at least 8 canonical Watson-Crick base pairs. In contrast, h19, connecting the central junction to h27, has only three G-C base pairs. Thus, it is likely that the stability of individual helices determine the order of their formation, as previously suggested [38]. More importantly, as shown in Figure 1C, h19, h25, h24 and h20, shown in Figure 1C).
[Mg 2+ ] midpoints of tertiary contacts formation vary: In the crystal structure of the rRNA fragment, there are seven regions with major tertiary contacts (TC), where two or more secondary structural elements contact each other ( Figure 4A). These contacts are typically clusters of hydrogen-bonding interactions. We label them as in Figure 4, TC1 through TC7. Since we have already seen that all the secondary structures except h19 are stable even in the near absence of Mg 2+ , the formation of the TCs must drive the folding, and consequently compaction (decrease in R g ) of the RNA upon addition of Mg 2+ ions.
In Figure 4B (Figure 2). The results confirm that the R g change, which depends on [Mg 2+ ], is a consequence of the tertiary contact formation.
The six TCs can be classified into three pairs in terms of the Mg 2+ concentration requirements. From the titration curve ( Figure 4B), the midpoints of [Mg 2+ ] are ⇠2.5 mM for TC1 and TC2, ⇠4 mM for TC3 and TC4, and ⇠5 mM for TC6 and TC7. By mapping the TCs onto the secondary structure (see Figure 4A) Mg 2+ binding to specific sites drives tertiary contact formations: How do Mg 2+ ions facilitate the folding of the RNA fragment, in particular, the formation of the TCs?
To begin the investigation, we computed the fingerprints of Mg 2+ binding at the nucleotide resolution as contact Mg 2+ concentration (see Methods). In Figure 5A, the fingerprints are shown at three bulk Mg 2+ concentrations at which the major conformational changes occur. The data clearly reveal that there are distinct positions where Mg 2+ ions bind with substantial probability. Although the contact concentrations increase as the bulk Mg 2+ concentration increases from 2.5 mM to 5.0 mM, many of the distinctive peaks exist even at 2.5 mM. This is a non-trivial finding because, at [Mg 2+ ] = 2.5 mM, only TC1 and TC2 form with ⇠50% probability but all other contacts are either absent or formed with low probability ( Figure 4). This shows that coordination of Mg 2+ to rRNA is nucleotide specific, and does not occur in a random di↵usive manner as often assumed.
We further analyzed the relationship between TC formations and Mg 2+ binding to spe-  Figure 4, we conclude that the contact formations are associated with specific Mg 2+ bindings to the same nucleotides.
Tertiary stacking nucleates folding of the central junction: The central junction is stabilized by several tertiary base stacking between non-consecutive nucleotide (tertiary stacking, TST). From the crystal structure, we detected six TST interactions around the central junction ( Figure 6A, B), that are supposed to contribute the correct folding of the junction. Figure  Three-way junction folds upon tertiary-contact formation of constituent helices: The three-way junction (3WJ) consisting of h20-22 has been used as a representative folding motif in earlier experimental studies of the rRNA folding [39]. In the unfolded state, because of the electrostatic repulsion, the three helices are expected to be well separating without having major interactions with each other. Indeed, an experimental study using transient electric birefringence indicated that the three angles between helices are nearly equal (roughly 120 ) in the absence of Mg 2+ and r-protein S15 [17]. On the other hand, in the folded state, the two helices h21 and h22 are coaxially stacked, while the other, h20, forms an acute angle with h22 ( Figure 1). Although, in the ribosome, S15 binds to the center of the junction and presumably stabilizes the folded form, experiments demonstrated that Mg 2+ ions alone are su cient to stabilize the native form [17,18]. In addition, further analyses showed that the folding of the junction is determined solely by the RNA sequence rather than by binding of S15 [16].
We calculated the three angles between h20, h21 and h22 as a function of Mg 2+ concentration (Figure 7). At Mg 2+ concentration below 4 mM, the average of the angle ⇥ h20-h22 is about 80 , whereas the values of the other two angles are around 110 . On an average, the sum of the three angles is smaller than 360 indicating that the helices are not entirely confined to a plane. All the three angles fluctuate in the ensemble of simulated conformations, which is shown as ⇡ 20 in the standard deviations ( Figure 7). Nonetheless, the three helices are aligned roughly in a radial manner, in qualitative accord with the experiments [17]. At [Mg 2+ ] ⇠5 mM or above, the angles change dramatically. Angles ⇥ h21-h22 and ⇥ h20-h21 are around 150 indicating that two helices are coaxially stacked. Angle ⇥ h20-h22 adopts an acute value around 15 , showing that h20 is aligned toward h22, as in the crystal structure. As shown as the shaded regions in Figure 7, the fluctuations in the angles also much less at the higher Mg 2+ concentrations. The Mg 2+ concentration at which the 3WJ folds corresponds to the concentration at which TC3 forms, which is reasonable because TC3 is the tertiary interactions between the two ends of h20 and h22.
Prediction for hydroxyl radical footprinting: Lastly, we provide predictions for hydroxyl-radical footprinting based on solvent accessible surface area (SASA) of the simulated ensembles at each Mg 2+ concentration. Footprinting is a powerful experimental technique to probe RNA structures in vitro and in vivo [5,40]. Much like hydrogen-deuterium exchange experiments using NMR in the context of protein folding, hydroxyl-radical footprinting is used to assess the extent to which each nucleotide is exposed to the solvent. The solvent exposure has been demonstrated to be highly correlated with the SASA of sugar backbone [33,34]. We first calculated the SASA of each structure generated in the simulations, and then determined the protection factor (P F ) of each nucleotide site (see Method) by averaging over the simulated conformational ensemble at each Mg 2+ concentration. where only the secondary structures are intact. As Mg 2+ concentration is increased, tertiary interactions form in a hierarchical manner in three distinct stages. Considering that the typical Mg 2+ concentration of bacterial cytoplasm is ⇠ 1 mM [41], our data suggest that the rRNA should not spontaneously form a stable compact structure in the absence of r-proteins ( Figure 2). This conclusion is not definitive because folding in vivo occurs in a crowded milieu, which would lower the e↵ective midpoint for folding [42][43][44]. The detailed simulations, which currently cannot be done using any other computational method for a large RNA construct studied here, allow us to provide a few additional remarks.
Fate of the three-way junction: The independent folding of the 3WJ and related constructs in the central domain (Fig.1) has been studied extensively, principally by Williamson and coworkers, over two decades ago by a variety of experimental methods [16][17][18]. In a quest to understand the influence of r-proteins on the assembly of 30S particle they focused initially on the folding of the 3WJ, which folds either in the presence of S15 or Mg 2+ [17].
Using transient electric birefringence and model to analyze the data [17] they inferred that the angles between the three helices (h20, h21, and h22) are roughly 120 , which implies that they adopt a planar structure. Upon addition of 1.5 mM Mg 2+ (or S15) the 3WJ is structured in which h21 is coaxially stacked with helix 22 and h20 forms an acute angle with h22. The results of our simulations, carried using the entire central domain, are in excellent agreement with experiments (Fig.7). We find that 3WJ folds cooperatively with the formation of tertiary interactions between h20 and h22 (TC3). The good agreement between simulations and experiments not only validates the model, which has been adequately justified previously [11,45], but also shows that the global structure of the 3WJ does not change significantly when embedded in the intact central domain. concentrations, which is reminiscent of the FRET e ciency distributions [15]. The substantial width of the angle distributions, which are di cult to directly measure experimentally, also show that upon folding there are substantial conformations in the 3WJ even in the intact central domain.
Specificity of Mg 2+ binding: The role of Mg 2+ ions is particularly di cult to probe experimentally because monitoring the binding to RNA is likely to be cooperative. Indeed, we find that the Hill coe cient extracted from the dependence of R g on Mg 2+ condensation is roughly three, an indication of cooperative binding ( Fig. 2A). Our simulations show in no unequivocal terms that Mg 2+ ion binding is discrete and highly specific. Even at the lowest concentration of Mg 2+ there are specific nucleotides where the local concentrations of the divalent ions is significantly higher than the bulk value. It is possible that these discrete and specific binding nucleates the folding of the RNA, which was an idea that was proposed long ago based on the crystal structure of the P4-P6 domain of the Tetrahymena ribozyme [47]. Indeed, a new theme that is emerging through simulations that we have performed is that Mg 2+ ion binding, which can occur directly or mediated by a single water molecule [48], is specific and is dictated by the architecture of the folded state [11,45]. The present study adds one more example of this new concept.
Transition midpoint is not unique: A corollary of specificity of Mg 2+ ion binding implies that the midpoint concentration of Mg 2+ at which the RNA folds is not unique. This is evident in Fig. 4 which shows that the important tertiary interactions that stabilize RNA order at di↵erent Mg 2+ concentrations. The variations likely suggests the di↵erences in stabilities in the di↵erent regions with the least stable one requiring higher Mg 2+ concentration. A recent study has explicitly demonstrated this idea in temperature induced folding of several RNA pseudoknots [38]. The non-uniqueness of ordering temperature or denaturant concentration has previously been predicted for proteins [49] in which (typically) the secondary structural elements are not stable, this making it di cult to validate the predictions experimentally. CONCLUSION Currently there appears to be no alternative than to use the TIS model with explicit ions for simulating Mg 2+ -dependent folding of large RNA molecules. The present study on the central domain of rRNA, which yields results that are consistent with experiments on the 3WJ, has produced a number of predictions that could be tested. We are encouraged that the very general framework could treat the folding the full length rRNA. We have also created a protein model, which has produced results for the thermodynamics and kinetics of folding that are in near quantitative agreement with experiments for several proteins (see for example [52][53][54][55]). Naturally, our immediate aim is to combine these models to simulate the assembly of RNA-protein complexes, and examine how the ribosomal proteins modulate the shape fluctuations of the rRNA.   Figure 1C).   B-D).