Condensates in RNA repeat sequences are heterogeneously organized and exhibit reptation dynamics

Although it is known that RNA undergoes liquid–liquid phase separation, the interplay between the molecular driving forces and the emergent features of the condensates, such as their morphologies and dynamic properties, is not well understood. We introduce a coarse-grained model to simulate phase separation of trinucleotide repeat RNAs, which are implicated in neurological disorders. After establishing that the simulations reproduce key experimental findings, we show that once recruited inside the liquid droplets, the monomers transition from hairpin-like structures to extended states. Interactions between the monomers in the condensates result in the formation of an intricate and dense intermolecular network, which severely restrains the fluctuations and mobilities of the RNAs inside large droplets. In the largest densely packed high-viscosity droplets, the mobility of RNA chains is best characterized by reptation, reminiscent of the dynamics in polymer melts. Our work provides a microscopic framework for understanding liquid–liquid phase separation in RNA, which is not easily discernible in current experiments. The molecular driving forces underlying the liquid–liquid phase separation (LLPS) of RNA are not well understood. Now simulations show that low-complexity RNA sequences undergo LLPS at high RNA concentrations, driven by the formation of Watson–Crick base pairs between distinct RNA polymers. LLPS occurs by merger of small droplets into larger ones and RNA chains in the large droplets exhibit reptation dynamics.

Nucleotide repeat expansion is known to cause several neurological and neuromuscular disorders such as Huntington disease, muscular dystrophy and amyotrophic lateral sclerosis [37][38][39][40] . The disease-associated repeat sequences have been found in both the coding and non-coding regions of the transcripts 38 . However, determining exactly how these repeat expansion sequences cause diseases is still elusive. Recently, Jain and Vale 33 showed that the trinucleotide repeat sequences (CAG) n , (CUG) n and repeats of the hexanucleotide (G 4 C 2 ) form biomolecular condensates. The main findings of the Jain and Vale study are: (1) the high GC content sequences (CAG) n , (CUG) n and G 4 C 2 repeats form droplets, which over time transform to a gel-like state at high RNA concentrations; (2) phase separation occurs only if the number of repeats exceeds a critical value, which is similar to the value where diseases appear 33,41 ; (3) in vivo, the gel-like state is abolished (except in the repeats of G 4 C 2 ), and only a liquid-like state is found. Presumably, this is due to active forces generated by an energy source (ATP binding or hydrolysis, for example). Although RNA transcripts are usually exported to the cytoplasm, the (CAG) n foci, once formed, are preferentially retained in the nucleus and co-localize with nuclear speckles that sequester splicing factors. These findings provide the required impetus to investigate the relationship between the tendency to undergo LLPS and the neurotoxicity of the repeat RNA sequences.
To provide insights into the mechanism of condensate formation of the RNA repeat sequences, the organization of the RNA chains in the condensates and the associated dynamics, we created a minimal coarse-grained model, representing each nucleotide by a single interaction site (SIS), to investigate the molecular mechanism of LLPS in RNA repeat sequences. We incorporated only features that are essential for intra-and intermolecular interactions of RNA. In creating the model, we adopted a top-down approach with the goal that it is sufficiently simple to make multichain simulations feasible. The resulting SIS model has only one parameter that sets the energy scale for base-pair interactions, which was chosen to reproduce the known structures of a short (CAG) 2 duplex 42 .
RNA structure changes dramatically in the condensates. We then characterized the RNA conformations inside the condensates containing multiple chains utilizing numerous quantities widely used in polymer physics. The conformational changes of (CAG) n relative to the monomers is dramatic. (1) First, we calculate the mean distance between two nucleotides separated by s = |i − j| (where i and j are the indices of these nucleotides along the chain). The R(s) curve (shown in Fig. 2a) increases at large s, implying that the two ends are not in proximity, as in the isolated chain (Extended Data Fig. 2).
(2) Interestingly, R(s) versus s is independent of the droplet sizes. (Oligomers, 2-4 monomers; medium droplets, 5-10 monomers; large droplets, >10 RNA molecules.) (3) The bond orientational correlation, cos θ(s) = ⟨b i · b i+s ⟩ /l 2 b (where b i is the i th bond vector and l b is the bond length), of the individual chains lacks periodicity inside the droplets, which is prominent in the monomers and oligomers (Fig. 2b). (4) The end-to-end distance (R ee ) distribution for chains inside the droplets shifts to higher values (Fig. 2c), signalling a disruption of the hairpin structures adopted by the isolated chains. Snapshots from the simulations (Fig. 2e) show that the RNA polymers populate extended conformations.
The form factors (equation (4) in Methods), which characterize the overall shape of the polymer in the reciprocal space, show that the chains inside the droplets are markedly different from the monomeric RNAs (Fig. 2d). For the chains in the condensates, at small scattering vector the Guinier regime. At higher q, there is a crossover to a power law, S(q) ≈ q −1/ν , with ν ≈ 0.5, suggesting that the RNAs may be characterized as ideal chains just as in polymer melts. This is because from the perspective of a single chain inside dense condensates, it is irrelevant whether the interactions arise intramolecularly or from other chains in the droplet. A similar reasoning led to the 'Flory ideality hypothesis' proposed for polymer melts (see Fig. 3 for the analyses of intra-and intermolecular interactions). At small spatial length scale, or high q, S(q) ≈ q −1 is recovered.

Intermolecular base-pairing drives RNA condensate formation.
We calculated the fraction of formed base pairs, f bp = N bp /N CAG where N bp is the number of base pairs and N CAG is the number of CAG units in the simulations, to decipher the molecular details of the transition from low-order oligomers to condensates (see Methods for details). The results were decomposed into intra-and intermolecular base pairs (shown in Fig. 3a). Initially, the chains mostly adopt hairpin-like structures by forming exclusively intramolecular base pairs. Once small and medium-sized droplets form, intermolecular interactions increase at the expense of intramolecular interactions. For RNA chains to form inter-chain base pairs, the C and G nucleotides from two chains have to satisfy both the distance and orientation criteria, thus requiring population of extended conformations (Fig. 2). In this process, the self-interactions are replaced by the intermolecular interactions without having to compensate for bending on a short length scale, s. We find that intermolecular base-pair interaction is the driving force in the coalescence of the RNA repeats 18,33,44,45 . For short chains (n = 20) or at low C RNA , the number of intermolecular interactions is small ( Supplementary Fig. 4). Thus, most of the molecules remain as monomers or form relatively small droplets with fewer than eight chains. At high concentrations or for longer chains (n = 47 and n = 31), the increase in the number of intermolecular base pairs leads to phase separation, resulting in the formation of two discrete phases with vastly different RNA densities. The number of base pairs in the high-density phase is much higher than that in the low-density phase (Fig. 3b).
Intra-and intermolecular interactions depend on the droplet size (Fig. 3c). Unlike the self-interactions in the isolated RNA chain, which feature extensive Watson-Crick (WC) base-pairing along the anti-diagonal to form hairpin structures (Extended Data Fig. 2), those inside the droplets are structurally more diverse. The propensity to form interactions along the chains increases, beyond  those that form along the anti-diagonal. The total number of self-interactions of RNAs inside the droplets is smaller than in the monomers ( Fig. 3a) due to the formation of extended conformations. Because the RNA conformations are considerably stretched inside the condensates, the interactions between different chains ought to increase. Intermolecular interactions between RNA molecules in droplets are also position-independent, reflecting the repeat nature of the sequence. However, for oligomers, specific 5′ to 3′ interactions still dominate, similar to what is found in a hybridized duplex.

RNA conformations in the droplets are highly heterogeneous.
Due to extensive intermolecular base-pair formation inside the droplets, we expect the RNA mobility to be seriously hindered. Each RNA could be kinetically trapped, and only sample conformations around a local minimum in the free energy landscape, much like in a glass or a jammed system 46 . Figure 4 reveals that this is indeed the case. The distributions of R ee and R g of the individual chains have large dispersions, exceeding their usual values, although the histogram averaged over all the chains (shown in the inset) shows behaviour resembling that expected for ideal chains (red curves). Therefore, averaging over the ensemble of structures and over the number of RNA monomers in the droplet conceals the conformational heterogeneity. Some of the chains sample conformations with R ee exceeding 200 Å, which is much greater than the dimensions expected for a random coil. As a consequence, there is a great degree of heterogeneity in the conformations that are sampled in the droplets. The unusual conformations that are accessed could arise because of the droplet is arrested in a non-ergodic phase with long overall relaxation times.
Condensates are dynamic. A hallmark of LLPS is that the droplets (or foci) interact with one another, which could result in two (or more) droplets fusing to form a larger droplet by the Ostwald ripening mechanism. The reverse process in which a large droplet disintegrates into smaller ones could also occur. Our simulations capture both these processes in the phase separation of (CAG) n . Figure 5b shows the time evolution of each droplet size in the system, illustrating that the condensates formed in the simulations are dynamic, undergoing continuous growth, fusion and fission. The small droplets frequently interact and fuse with one another to form a larger droplet (snapshots in Fig. 5a and Supplementary Movie 2). However, not all fusion events are successful. There are instances of failed events in which the two droplets come together, interact for a while but subsequently dissociate (Fig. 5a). Such fusion-fission processes occur frequently, suggesting that the droplets are dynamic. The dynamic nature of the droplets is also highlighted by the lack of persistent internal order, measured by the nematic order parameter (Fig. 5c). For small droplets, the internal RNA molecules need to position themselves in a preferred orientation, which would result in a loss of entropy. An extreme example is a dimer, in which the two strands are parallel to maximize the interaction energy (illustrated in Fig. 3c). However, in large droplets the RNA could form base-pair interactions with other chains without sacrificing orientational entropy, which results in a decrease in the nematic order parameter. Figure 5a shows exquisitely the dynamic nature of the condensate formation. Each row corresponds to a single chain, with the colour denoting the droplet to which it is recruited. Each colour represents a unique droplet in Fig. 5b. At the early stage (τ < 2 × 10 7 ), most RNA molecules, at one instance or another, interact with almost all the droplets. Their residence times within a single droplet are relatively short. The monomers exchange with the bulk and are subsequently recruited by another (or the same) droplet. In contrast,   Supplementary Fig. 4). Different colours indicate if the molecule is a monomer (orange) or in different sized droplets: oligomer (green), medium (blue) or large (purple) droplets. Decomposition into intra-and intermolecular contributions (top and bottom panels, respectively). b, Concentration versus the number of base pairs formed in the two coexisting phases. c, Probability of contact formation due to intramolecular (top) and intermolecular (bottom) interactions. The contact between the two nucleotides is formed whenever the distance between them is less than 20 Å. In oligomers, intermolecular interactions between the RNAs generate structures similar to RNA duplexes, with signals along the anti-diagonal (from the bottom left to the top right) in the contact map. In larger droplets, the specific interactions decrease, and non-specific interactions predominate. Representative snapshots for different droplets with different sizes are at the bottom.
at later stages, the chains are mostly restricted inside large droplets (orange and blue droplets in Fig. 5a). Despite the restricted movement, the interactions between the RNA chains inside large droplets are nevertheless transient, as illustrated in Fig. 5d, which shows the contact lifetime of one molecule with all other chains inside a single droplet. Although some of the contacts are stable, most of them are intermittent and are disrupted over time. The lifetime of contacts decays following a power law (Fig. 5e). This is because the chains shift their internal positions frequently, and therefore interact with different chains at various times (Supplementary Movie 2). Due to the absence of droplet fusion and lack of monomer exchange as time progresses, we surmise that the large droplets in our simulations could be near the onset of coarsening into a rigid gel-like state, which is in accord with in vitro experiments. 33 A major advantage of our simulations is that it is possible to describe in vivid detail the behaviour of the chains during the phase-transition process where the monomers become part of the droplets. We found that there is no average or typical way a chain is integrated into a droplet, which underscores the importance of conformational and dynamic heterogeneity. Figure 6 illustrates one possibility (out of a large number) of how a droplet could grow, and what happens to the RNA chains during the process. Figure 6a shows the number of base pairs for a particular chain in the (CAG) 47 simulation. As the chain enters the droplet, the number of base pairs it forms with other chains increases, while the number of intramolecular base pairs diminishes. The recruitment of the monomer to a droplet started from a dangling end or an overhang of the hairpin (Fig. 6d,e). The end of the chain then hybridized with other chains on the surface of the droplet, and gradually penetrated into the droplet interior. Interestingly, the penetration of the monomer into the droplet was associated with slippage events of the stem region of the monomer hairpin. It is possible that the nature of the repeat sequences makes this kind of integration energetically feasible, as the monomer could retain intra-base-pairs that stabilize the stem on one side, while the other end hybridizes with other chains in the droplet. We wish to emphasize that the mechanism of incorporation is different for other chains. In addition, the droplet could grow by recruiting not only monomers but also oligomers and even smaller droplets (Extended Data Fig. 3 and Supplementary Movies 1 and 2). The growth mechanism is, therefore, probably highly heterogeneous.
Monomer dynamics inside the condensates is sluggish: evidence of reptation. We quantify the dynamics of RNA movement using the mean squared displacement (MSD) of the RNA centre-of-mass (Fig. 7a). The average MSDs for all the RNA chains increase linearly with time regardless of C RNA , implying that the overall motion is diffusive. However, the dynamics are highly heterogeneous upon undergoing LLPS. The RNA chains in the aqueous phase undergo normal diffusion, but the movement of the polymers in the droplets is strongly affected, especially in large droplets. The MSD exponents are less than unity, and decrease as the droplet size increases (Fig. 7b), which is suggestive of subdiffusive behaviour. Indeed, the slowing down of the RNA dynamics inside large droplets could signal the onset of formation of a gel-like state. Given sufficient time, the large droplets could potentially undergo subsequent maturation through a sol-gel transition. 20,[46][47][48][49][50][51][52][53][54][55] To probe the jamming dynamics of RNA chains inside large droplets, we calculate the MSDs for each nucleotide in the RNA chain (Fig. 7c). The average values (the solid line in Fig. 7c) show that the MSD scales as Δ (τ) ∝ τ 1/4 at long times. In monomers, we find that Δ (τ) ∝ τ 0.78 ( Supplementary Fig. 1). The τ 1/4 behaviour, which was first predicted theoretically 43 and observed in simulations of polymer melts 43,56,57 , is suggestive of reptation-like movement. The motion of these nucleotides is physically constrained along the contour length of the RNA, i.e. the chain movement is thought to occur in a low (one)-dimensional 'tube' generated by topological constraints imposed by other chains (see Supplementary Movie 3 for illustration). To exhibit reptation-like dynamics, the RNAs should form an intricate and entangled network of interactions, which arises due to intermolecular base-pair interactions. Interestingly, the formation of such a network has recently been demonstrated in AU-rich mRNAs 27,58 . The network of intermolecular interactions between the RNAs, therefore, plays a crucial role in dictating the slow heterogeneous dynamics of RNAs inside the droplets.

Discussion and conclusion
We developed a minimal coarse-grained SIS model that captures many of the observed features in the LLPS of repeat RNA sequences. In accord with experiments, we find that long (CAG) n repeats (n = 47, 31) phase separate at relatively low concentrations, whereas short chains (n = 20) remain in a single liquid phase even at elevated RNA concentrations. It is worth noting that there is only a single parameter in the SIS model that was determined using the structures of the (CAG) 2 duplex. In probing condensate formation in CAG repeats, the SIS model is essentially parameter-free, and hence is transferable to probe LLPS in other RNA sequences as well. As an example, we show in Extended Data Fig. 4 that the tendency to undergo LLPS in one scrambled sequence is diminished compared to the CAG repeat sequence, which is in qualitative agreement with experiments where there are no droplets observed. 33 There could be two reasons-one thermodynamic and another kinetic-that explain the quantitative difference between our simulations and experiments. One possibility is that a much higher concentration is needed in the experiment to induce phase separation in the   47 . This is merely one scenario that illustrates the changes in the dynamics of the monomer as it is recruited into the droplet. See Supplementary Movies 1 and 2 for additional scenarios. a, Number of base pairs of one specific chain as the recruitment process occurs. The base pairs are classified as either intra-chain (blue) or inter-chain (orange). In the bottom panel, the time is enlarged with arrows indicating time points of snapshots (b-h). b-h, Snapshots taken from the monomer recruiting event, accompanied by the RNA secondary structure of the recruited chain. Initially, the chain (in red) is in the hairpin conformation (b). The chain approaches a droplet from the tip of the loop. The attempt to coalesce is unsuccessful, and the chain eventually dissociates from the droplet (c). The 5′ and 3′ ends open, creating overhangs (d). The opened ends start interacting with other chains on the surface of the droplet (e). The 5′ end starts expanding and hybridizes with another chain (cyan) in the droplet. The base pairs of the hairpin stem slip due to the nature of the repeat sequence. As a result, the 3′ end shortens (f). At this stage, the 5′ end has been entirely integrated into the droplet. The 3′ end appears at the periphery, and transiently forms intramolecular base pairs (g). The RNA chain completely penetrates into the droplet from one side to the other. The 3′ end still has a short stem that is stabilized by intra-chain base pairs. This residual structure remains for the rest of the simulation (h). The secondary-structure diagrams were generated with Forna 79 .
scrambled sequence. Another reason follows from the observation in the simulations that the scrambled sequence tends to form multiple smaller droplets instead of a few relatively large ones as in the CAG repeat sequence. It is conceivable that due to the smaller droplet sizes in the scrambled sequence, they are below the critical threshold detectable in the experiment (or one would need to wait longer for the droplets to grow to detectable sizes).

Recruitment of monomers to condensates involves unwinding of hairpin-like structure.
Due to the high GC content of the sequence, the monomeric RNAs adopt an ensemble of hairpin-like structures with a small end-to-end distance (Extended Data Fig. 2), as is the case in a number of RNA molecules 42,59,60 . In sharp contrast, RNA chains inside the condensates are extended with large 5′-3′ distances. To form intermolecular interactions with other chains, the RNA monomer has to unwind, exposing the bases to form WC base pairs (or π-π stacking and non-canonical base pairs depending on the sequence 24,27,34 ). We expect this finding to be a general feature of not only repeat RNAs, but also in any binary biomolecular system that undergoes phase separation (with solvent being another component), as was shown previously in oligomer formation in a fragment of Aβ peptides. 61 Topological constraints, conformational and dynamical heterogeneity. As a result of a large number of intermolecular interactions between the RNA chains within the condensates, a given RNA molecule is topologically constrained by the neighbouring chains. The local environment of individual RNA chains differs greatly, which results in wide variability in the individual distributions of various shape parameters ( Supplementary Fig. 6), even though the average distribution could be explained using polymer theory. Because of such topological constraints, the dynamics of RNA chains is also strongly affected. The mobility of the RNA chains is greatly diminished even though diffusion of the system as a whole is liquid-like 62,63 .
In large droplets in which long RNA polymers are tightly packed with elevated density and viscosity, the movement of RNA molecules is reminiscent of reptation in polymer melts. RNA chains slither along their contour length due to translational inhibition in other dimensions. Strong entanglement in large droplets could further facilitate the maturation process, eventually driving the system to a gel-like state at long times under in vitro conditions. Under cellular conditions, it is likely that active processes (ATP binding and/ or hydrolysis) regulate the formation of gel-like states by remodelling the RNA structures. [64][65][66][67] For instance, the recruitment of additional protein clients 29 or the presence of ATP-consuming enzymes (for example, helicases or RNA chaperones) in the nucleoplasm that reorganize the RNA base pairings 33,67 could play a role in maintaining the condensates in a liquid-like state.
Limitations of the study. The simulations, which use the SIS model without any parameter to fit the experiments, explain the findings on condensate formation in the repeat RNA sequence. However, the SIS model has limitations.
(1) Counterions, which play an important role in driving condensate formation 33,68 , are not included explicitly. We assume that the cumulative effects of the Tris buffer, and the added monovalent and divalent ions used in the experiments effectively screen the electrostatic repulsions between the nucleotides. It has been shown by Oosawa and Manning and confirmed by experiments that the polyelectrolyte charge is neutralized by nearly 88% whenever divalent ions are present (compared to 76% for monovalent ions alone) [69][70][71] . Under these conditions, the electrostatic repulsion between the phosphate groups is effectively 1% of its original magnitude, justifying our assumption of complete neutralization of RNA charge by counterions. Our simulations are valid at high ion concentrations at which the charge on the phosphate is negligible. The impact of ions, especially divalent cations, has to be examined, possibly using a theory proposed recently 72 , for a more complete theoretical understanding of the RNA phase separation.
(2) Our simulations only consider the canonical WC base-pair formation, and completely neglect the possibility that non-canonical base pairs can form. This assumption is rooted in the observation that only 20-30% of nucleotides form non-canonical base pairs in the RNA structures 73,74 . Another reason is that the difference between these types of base pairs mostly arises from the alternative conformations of the base, sugar and phosphate groups within the same nucleotide. Since the SIS model represents a nucleotide by a single bead, any distinction at the single nucleotide resolution cannot be modelled explicitly. To take into account the possibility of non-canonical base-pair formation, one needs to utilize the more detailed three-interaction-site model in which a nucleotide is represented by three beads for the base, sugar and phosphate groups 75,76 . Incorporation of such non-canonical base pairs in the simulations could introduce alternative conformations and intermediate structures during the phase-transition process (Extended Data Fig. 5), altering the kinetics of hairpin unwinding and the timescale of conversion from intra-to intermolecular base pairs. However, we anticipate that such modifications would not remarkably change the outcomes of the simulations. 77 (3) It is unclear whether the in-silico-generated condensates adopt gel-like states as has been suggested using flourescence recovery after photobleaching experiments, which we should point out are not always accurate. We did provide evidence that the condensates start the transition from liquid-like to gel-like at long times. For instance, as time progresses, we have shown that the droplets fail to fuse once they are close and the monomers inside droplets stop exchanging with the diluted phase. These criteria are qualitatively similar to how gel-like or liquid-like natures of the condensates are characterized in experiments. Our simulations, therefore, suggest that the condensates first form via LLPS, and then subsequently coarsen into 'gels' , which could be the state observed in the Jain-Vale experiment. The time scale of the maturation process is likely to be long compared with the simulation time scale presented here. Because of the difficulty in distinguishing between the sol state with very slow dynamics and a gel-like state, material properties (by applying shear, for example) have to be determined before further characterization of the RNA condensates. 46 Final remarks. We have used molecular simulations to probe condensate formation in repeat RNA sequences, and have uncovered general concepts that are difficult to obtain using experiments alone. Our study establishes that in the process of self-assembly of low-complexity RNA sequences into droplets, RNA chain makes a transition from hairpin-like structures to extended conformations, thereby maximizing the number of intermolecular base-pair interactions. The droplets are stabilized by a network of soft intermolecular interactions, which repeatedly break and reform as the chains move within the confines of the droplets. The dense network, which is most prominent in large, high-density droplets, restricts the mobility of the chain mostly along the contour. As a result, motion occurs by a process that is similar to the snake-like movement envisioned in the context of polymer melts 43 . Taken together, our results show that the structural heterogeneity and the pinning of RNA chains by their neighbours makes the dynamics sufficiently sluggish that the droplet may be in a non-ergodic phase.
The entanglement of RNA chains arises in part because the CXG sequence could engage in both intra-and intermolecular WC base-pair formation. Because such specific interactions are probably not possible in poly(rU) sequences 78 , which also form droplets stabilized possibly by π-π stacking interactions, we believe that the only requirement for droplet formation in low-complexity RNA sequences is the presence of many weak inter-chain interactions. This would suggest that the length dependence for condensate formation is likely to be different in poly(rU) sequences to that in CAG or CUG polymers. Future studies are needed to provide definite answers about the effect of base stacking and non-canonical base-pair formation on RNA phase separation. Finally, it is worth emphasizing that our computational framework is not only well suited to examine the mechanism of self-assembly in other RNA sequences but also is general enough to account for ion effects and other cellular factors.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/ s41557-022-00934-z.

RNA energy function.
To describe the organization and dynamics of RNA condensates, we develop a new coarse-grained model that is sufficiently simple to simulate multiple chains while retaining sequence information. Such an approach is needed to investigate the mechanism of RNA condensation in arbitrary sequences. Because condensate formation requires multiple simulations involving many chains for sufficiently long times, we resorted to a low-resolution, coarse-grained description for RNA. We have shown previously that simulations based on related models are efficacious in predicting the outcomes of single-molecule pulling experiments 80,81 .
We introduce the SIS model in which each nucleotide is represented by a single bead (Extended Data Fig. 6c). The energy function for an isolated RNA chain has the following form: where U BA accounts for chain connectivity, consisting of bond and angle restraints between the connected beads. We used harmonic potentials to keep the bond lengths and the angles close to the A-form helix: k bond = 15.0 kcal mol −1 Å −2 , k angle = 10.0 kcal mol −1 rad 2 , r 0 = 5.9 Å, α 0 = 2.618 rad. In the simulations, the angles fluctuate due to soft restraints, and deviate from the values in the A-form helix. The excluded volume interaction U EV is given by the Weeks-Chandler-Andersen potential 82 : where Θ is the Heavyside step function, σ = 10.0 Å and ε = 2.0 kcal mol −1 . The excluded term is only computed between two beads that are not involved in the bond or angle restraints, and are separated at least by two other beads (nucleotides) along the chain. The base-pairing term U BP is a many-body and short-range potential that mimics the canonical WC base-pairing between A-U and G-C. The base-pairing potential between the two beads i, j is taken to be: where θ a,b,c is the angle formed between beads a, b and c; ϕ a,b,c,d is the dihedral angle formed between beads a, b, c and d; r bp,0 = 13.8 Å, k r = 3.0 Å −2 , k θ = 1.5 rad −2 , k ϕ = 0.5, θ 1 = 1.8326 rad, θ 2 = 0.9425 rad, ϕ 1 = 1.8326 rad, ϕ 2 = 1.1345 rad. The functional form of U BP is designed to capture both the WC base-pairing and the helical nature of the A-form RNA at single-nucleotide resolution. The only unknown parameter, the strength of the base-pairing interactions, is adjusted to reproduce the structure of a small CAG repeat sequence 42  Inter-chain interactions. Interactions between different chains also involve excluded volume and base-pairing interactions, which are treated identically as in intramolecular interactions. Thus, the SIS model when used to simulate multiple RNA chains does not distinguish between intra-and intermolecular base-pair interactions. A nucleotide could form WC base-pair interactions with others either within the same chain or in a different chain. Therefore, the SIS model is general enough for studying condensate formation in a generic RNA, provided that single-bead resolution suffices. As always, the validity of the model, like all others, can only be assessed by comparison to experiments.
Counterion effects. Although counterions play an important role in driving RNA condensate formation (Extended Data Fig. 1h in ref. 33 ), we do not include them explicitly in the current version of the model. One reason for neglecting them is that the charge on the phosphate ions has to be effectively neutralized by counterions for them to coalesce into droplets. Upon counterion condensation, electrostatic repulsion between phosphate groups is greatly diminished. A similar situation occurs in Ψ-condensation, where it has been shown that roughly 90% of the DNA charge needs to be neutralized for the DNA to condense 69,70 .
The electrostatic repulsion between phosphate groups is around 1% of its original magnitude. Thus, it is not unreasonable to assume that the effective charge on the phosphate group is sufficiently small that it can be neglected. As a result, the simulations reported here are valid only at high monovalent (for example, Na + ) and divalent (for example, Mg 2+ ) concentrations. Future works, using a recently proposed theory 72 , could incorporate ion effects more precisely to calculate the ion-dependent phase diagram.
Simulation details and analyses. Simulation details. We placed 64 (CAG) n (n = 20, 31 or 47) molecules evenly spaced in a cubic box, whose size varied from 50 to 200 nm depending on the RNA concentration. The initial conformations of the RNAs are deliberately expanded to minimize possible conformational biases. The initial conditions used in the simulations are a mimic of the experimental protocol in which the RNA molecules are denatured at an elevated temperature for a certain duration, and then the temperature is decreased to room temperature 33 . To ensure that our results are not affected by initial conditions, we also performed simulations where the temperature is decreased slowly starting from a high temperature. We observed little difference in the phase behaviour (Extended Data Fig. 7), which shows that the results do not depend on the initial conditions as long as the RNA molecules are unfolded. Simulations were performed on a graphics processing unit using the custom OpenMM code to speed up sampling of the conformational space 83 . We used low-friction Langevin dynamics in which the viscosity of water was reduced by a factor of 100 to further speed up the sampling efficiency 84 . Even using the simple SIS model for RNA, the simulations are computationally intensive because there are many chains in the system. Simulations were conducted for ∼100 days for each trajectory using an NVIDIA Quadro RTX 5000 graphics card on the Frontera supercomputer (Texas Advanced Computing Center). Snapshots were recorded every 10,000 steps, which were subsequently used to calculate several quantities of interest.
Clustering. For each snapshot, we grouped the monomers that are within 20 Å of each other to the same cluster. A monomer belongs to a cluster if the distance between any of its nucleotide to any nucleotide in the cluster is less than 20 Å. For comparison, the equilibrium base-pairing distance in the SIS model is r bp,0 = 13.8 Å. Thus, the cut-off distance for clustering is about ∼40% larger than r bp,0 .
Form factor. The form factor of RNAs was calculated as: where q is the momentum transfer and r is the position of RNA beads. The brackets denote averaging over the chains at different orientations.
Radius of gyration and shape parameters. R g and the shape parameters were calculated using the gyration tensor, with each element defined as: R 2 g is then calculated by summing over the three eigenvalues of the gyration tensor R 2 g = λ 2 1 + λ 2 2 + λ 2 3 . To characterize the shape of RNAs, we calculated the relative shape anisotropy κ 2 and the shape parameter S using: where λ2 = 1 . κ 2 is bounded between 0 and 1; κ 2 = 0 implies that the molecule is perfectly spherical, and κ 2 = 1 if every point lies on a straight line. S satisfies −1/4 ≤S ≤ 2. Negative values of S correspond to oblate ellipsoids, while prolate ellipsoids have positive S.

Concentrations of the two phases.
To determine the concentration inside a droplet, we first calculate the volume of a droplet. We assume the droplet is an ellipsoid with the three effective radii, characterized by the three eigenvalues of the gyration tensor. Thus, the volume of the ith droplet is: Note that the gyration tensor is calculated for the whole droplet, and not individual RNA molecules as the previous section. The concentration inside the droplets is computed by averaging all the concentrations inside medium and large droplets: where C i = N i /V i is the concentration of the i th droplet with N i RNA molecules and volume V i . We calculate C i only for N i ≥ 5. The concentration of the aqueous solution, consisting of monomers and oligomers outside the droplets, is then calculated using: Mean squared displacement. The MSD of the centre-of-mass of a chain is given by: where τ 0 is the initial time.
The time-averaged MSD is calculated using: We also computed the MSD of individual nucleotides to probe the dynamics of the RNA chains in large droplets (Fig. 7c). For this purpose, we chose the largest droplet formed in the simulations. Before computing the nucleotide MSD, the position of the droplet in each snapshot was sequentially superimposed to the one in the previous time frame, starting from the instance the droplet is formed (τ 0 ). This procedure ensures that the MSD accounts solely for the dynamics of chains within the droplet, but not for the collective translational and rotational motion of the entire droplet in the simulation box. The MSD was then calculated in the same way using equation (11) by averaging over all the nucleotides except for 10 nucleotides at both the 5′ and 3′ ends. Note that τ 0 depends on when the droplet forms.
Criterion for base-pair formation. To assess if a base pair is formed between G and C, we rely on the base-pair energy U BP defined in equation (3). If U BP < −3k B T, where k B T is the thermal energy, between two nucleotides, then we consider that they form a base pair. Since the energy function consists of several terms that depend on the distance, angles and dihedral angles around the two nucleotides (equations (3b), (3c) and (3d)), the energy-based definition naturally captures the geometric criteria that base pairs satisfy in the ideal A-form RNA.

Isolated repeat RNAs form hairpin-like structures.
We first characterized the structural ensembles of an isolated (CAG) n , which serves as a reference when comparisons to RNAs within the condensates are made. Interestingly, the mean end-to-end distance, R ee , is independent of n (Extended Data Fig. 2a), which accords well with experiments and theoretical predictions for mRNA and long non-coding RNA 60,85-87 . The peak value in the distributions of R ee is around 35 Å. The R ee distributions for the three RNA chains with different lengths deviate from the Gaussian distribution for ideal chains, exhibiting long tails. This is somewhat surprising because the distribution of the radius of gyration, R g (shown in Extended Data Fig. 2b), shows little deviation from the ideal chain behaviour.
Unstructured RNAs (poly(rA) or poly(rC)) that do not favour base-pair formation follow the expected R ee distribution for a random coil 60,78 . In contrast, CAG repeats form intramolecular base-pair interactions, bringing the 5′ and 3′ ends into proximity. Experiments have reported that (CAG) n repeat sequences with small n could form stable hairpins containing GC base pairs with A:A mismatches 42,59 . Formation of GC base pairs in longer repeats also requires the chain to fold upon itself. Folding would be favoured if the overall free energy gain by forming GC base pairs compensates for the chain-bending penalty. For the CAG repeat sequences, extensive GC base-pair formation could mitigate the unfavourable interactions, leading to the formation of hairpin-like structures 42,44 . The probabilistic contact map (Extended Data Fig. 2d) shows that the majority of interactions are along the anti-diagonal, consistent with the formation of a hairpin structure. The two ends have the highest probability of being in proximity. Representative snapshots from the simulations, shown in Extended Data Fig. 2a, confirm that the RNA monomer mostly samples a set of hairpin-like structures.
It is worth emphasizing that we did not adjust any parameter in the SIS model to constrain (CAG) n to form hairpins. The structures found in the simulations are the result of the generic tendency of C and G to form WC base pairs. Due to the repeat nature of the sequence, the RNA maintains an ensemble of hairpin-like structures by sliding one end over another without paying a large energetic penalty.
The formation of helical structures is also reflected in the bond-bond orientational correlation function, where bi is the vector of bond i th , with the bond length l b . Cosθ(s) shows periodicity at small s = |i − j|, where i and j are the indices of the nucleotides in the (CAG) n sequence (Extended Data Fig. 2c). At a short length scale (up to five or six nucleotides), the structure of CAG is roughly rigid, with R(s) scaling almost linearly with s (inset of Extended Data Fig. 2c). At larger separations (s ≥ 6), R (s) ∝ s 0.56 , suggesting that the RNA is flexible, and hence could fold upon itself to generate hairpin-like structures. At a separation corresponding to the end-to-end distance, there is an abrupt decrease in R(s) as a function of s because the 5′ and 3′ ends are close. Similar results are also observed for (CUG) n (Extended Data Fig. 8). Our simulations show that the hairpin-like structures in (CXG) n (X = A or U) are a consequence of multiple canonical WC base-pair formation.
Non-canonical base-pairing has a minimal effect on the conformations of the repeat monomers. In our model, only canonical WC base pairs are allowed; that is, G only pairs with C, and A only pairs with U. It is known that RNA bases do form a wide variety of other pairings, which may be classified as non-canonical base pairs 88 . In principle, it is possible to include non-canonical base pairs within our framework. However, by analysing the RNA structures in the PDB, it is found that the frequency of non-canonical base-pairs is small compared to WC base pairs; 70-80% base pairs in the PDB are WC, and the remaining fraction forms non-canonical base pairs 73,74 . Furthermore, while there is only one potential WC base pair between two bases (cis-WC-WC, using the Leontis-Westhof notation), there are 11 additional ways to form non-canonical base pairs by orienting different edges of the base (WC, Hoogsteen or sugar). Therefore, one could infer that the non-canonical base pairs are much less stable compared to the WC base pairs, which is the basis for our assumption that WC base-pairing is the dominant force in driving phase separation of the RNA repeats. In addition, the difference between these types of base pairs mostly arises from the alternative conformations of the base, sugar and phosphate groups within the same nucleotide. Since the SIS model represents a nucleotide by a single bead, any distinction within the nucleotide resolution cannot be modelled explicitly. Therefore, we believe that in order to quantitatively account for the effects of non-canonical base-pair formation, one needs the three-interaction-site model in which a nucleotide is represented by three beads for the base, sugar and phosphate groups 72,75,76,89 . The drawback is that a higher resolution naturally raises the computational demand, and making it difficult to simulate multi-chains to probe LLPS using the currently available computing resources.
Having given the justification for neglecting non-canonical base pairs, here we explored a revised SIS model that includes non-canonical base pairs between any two bases that cannot form WC base pairs. The base-pairing energy function in the revised model is the same as in the SIS model (equation (3)), except we include all other possible combinations of base pairs with a smaller U • bp = −1.67 kcal mol −1 (which is a third of the value used for a GC base pair). When used the revised model to the CAG repeats, this means that A could form non-canonical base pairs with C, A or G. Even though the modification is a minor addition, it takes much longer time to perform simulations compared with the SIS model. Thus, in this preliminary calculation, we report results for the monomer (CAG) n , shown in Extended Data Fig. 5. The chain is somewhat more compact (R ee and R g are smaller than in the SIS model) due to the ability of A nucleotides to form non-canonical base pairs. However, the conformations are globally hairpin-like, which is exactly the same as in the model that neglects non-canonical base-pair formation. Therefore, introduction of the non-canonical base-pair formation in the phase-separation simulations could alter the kinetics of hairpin unwinding and the time scale of conversion from intrato intermolecular base pairs. However, we anticipate that such modifications would not remarkably change the major findings in our work.
Effect of cooling rate on droplet formation. An annealing protocol was employed in the in vitro phase-separation experiments of the CAG repeats 33 . RNA molecules were denatured at 95 °C for 3 min and cooled at the rate of 1-4 °C min −1 to 37 °C. In our simulations, the initial conformations of the RNA molecules are deliberately expanded to eliminate biases in the structures that the RNA molecules adopt. The expanded initial condition used in the simulations is a mimic of the experimental protocol in which the RNAs are denatured at an elevated temperature for a short duration. Subsequently, the temperature is lowered to the desired value. To ensure that our findings are not affected by the initial conditions, we also performed simulations in which we slowly decreased the temperature, as in the experiments. We first ran simulations at 100 °C, which is high enough such that all the RNA molecules are unfolded. Then, after every δt, we lowered the temperature by 10 °C until the final temperature reached 20 °C. We could explore the cooling rate by varying δt. We hasten to add that limitations in the simulation times prevent us from reaching the cooling rates achieved in experiments. Extended Data Fig.  7 shows that the overall picture of phase separation is not qualitatively altered, regardless of the procedures followed in initiating the simulations. This indicates that our simulations are robust with respect to the initial conditions and simulation protocols, and that the phase separation of the repeat RNAs occurs spontaneously, just as in experiments. It should be noted that experimental studies from other groups also showed that RNA molecules alone undergo phase separation without using the aforementioned annealing method, 27,34 or using a rapid cooling rate as we did here 24 . Thus, it appears that initial conditions do not affect the mechansism of LLPS in RNA repeat sequences.

Droplet stability at low salt concentrations.
To probe the droplet formation in low salt conditions, we included electrostatic effects. In addition to the bond, angle, excluded volume and base-pairing terms in equation (1), each nucleotide now carries a charge of −q (0 < q < 1) to account for counterion condensation 69 . The electrostatic repulsion between them is modelled using the Debye-Hückel theory, accounting for the screening effect of the buffer solution. This approach has been successfully used to calculate accurately the thermodynamics of RNA folding 72,75 . To test if the phase separation occurs at low salt conditions, we performed simulations using a model that includes ion effects. The initial conditions correspond to a preformed condensate. The simulations show that the RNA droplets start dissociating into smaller droplets to form oligomers and monomers (Extended Data Fig. 9). Thus, in the presence of monovalent ions alone, the droplet is unstable, which recapitulates qualitatively the experimental findings.
Of course, these simulations using the Debye-Hückel theory only account for the effect of monovalent ions (Na + , K + , and so on). This approach could be extended to include divalent cations (Mg 2+ , Ca 2+ and so on) explicitly, while keeping monovalent ions at the continuum level, as we have shown previously for the RNA folding problem 72 . Although simulations using such a model would be computationally demanding, they would provide the complete phase diagram. The SIS model is sufficiently general to accommodate ion explicitly, as shown elsewhere 72,75 .
Scrambled sequences are less likely to undergo LLPS. We then tried a different sequence to test the model transferability. We shuffled the sequence of (CAG) 47 to maintain the GC content and chose the sequence (out of an astronomically large number of s eq ue nc es) C CG GGAAGAGACCGCAACAGAAGCAGCCG CGAGCGCGACAGCGACGAGCACGCGGCACAGACAGCAAGAGAAGGG AAGACAAAGAGCCGACGGAAGCCACGCAGAGCAAAAGGACGCCGGC GGAACGACAAGGAAAGAGAG. To the best of our knowledge, Jain and Vale did not report the order of nucleotides in their scrambled sequence, which forced us to create one for computational purposes. We then repeated the simulations at different bulk concentrations 200, 100, 50 and 20 μM. We observed that this particular scrambled sequence undergoes phase separation, which seems to be in apparent contradiction to the findings of the Jain-Vale experiment (Extended Data Fig. 4). However, in our simulations, the fraction of chains that are in the droplet state for the scrambled sequence is lower than in (CAG) 47 (Extended Data Fig. 4a). As a result, the concentration of the diluted solution in the case of the scrambled sequence is larger (45 μM, compared to 25 μM for (CAG) 47 ) (Extended Data Fig. 4c). We emphasize that the concentration of the diluted phase is the saturated concentration C sat , above which phase separation occurs. The observation of C scrambled sat > C (CAG) 47 sat indicates that the propensity to undergo phase separation of the scrambled sequence is lower than in (CAG) 47 . On the other hand, the concentration inside the droplets of the scrambled sequence is lower than in droplets composed of (CAG) 47 (5 versus 10 mM), suggesting less compaction of the scrambled sequence droplets. This decreased compaction is probably due to the random nature of the sequence, leading to a decreased probability of finding similarly patterned neighbouring chains to form base pairs. We also observed that the droplet sizes in the case of the scrambled sequence are smaller than (CAG) 47 , and the system tends to form many smaller droplets instead of a few relatively large droplets (Extended Data Fig. 4).
Taken together, the simulations suggest that the propensity to undergo LLPS of the scrambled sequence is less than that of the repeat sequence, which may be in qualitative agreement with the experiment 33 . It is possible that a much higher concentration of the scrambled sequence is needed to induce phase separation. Even if the concentration is higher than C sat , it is conceivable that due to the smaller droplet sizes in the scrambled sequence compared to the repeat sequence, it is below the critical threshold detectable in the experiment (or one would need to wait longer for the droplets to grow to detectable sizes).
Local behaviour of RNA chains growing from monomer to oligomer to droplet. A major advantage of simulations is that the microscopic behaviour of the RNA chains can be visualized and investigated if the simulations are validated against experiments. In particular, it is interesting to probe the sequences of events, which are not currently accessible in experiments, that occur as the monomer becomes part of the condensate. One difficulty is that the fate of each chain could be different, which implies that there may not be typical or most probable changes in the dynamics as the monomer becomes part of the droplet. With this caveat, we performed several analyses focusing on the local behaviour of chains.
Extended Data Fig. 3 illustrates examples of local dynamics while multiple chains form a single droplet. In the series of snapshots, we show a process of coalescence by 11 RNA chains that eventually form an 11-mer in the middle of the simulation at 200 μM (CAG) 47 . We observed that chains often form small oligomers first, then they coalesce to form larger oligomers, leading eventually to a droplet. Oligomers can also grow by merging with monomers and dimers. In the sample trajectory, we show that a trimer and a tetramer coalesce into a heptamer, followed by integrating other dimers to finally form an 11-mer. A more detailed description of the event is provided in the caption of Extended Data Fig. 3. Supplementary Fig. 1 shows MSD for nucleotides in an RNA monomer in the diluted phase. The MSD scales as τ 0.78 , which is substantially larger than τ 1/4 expected for reptation. This proves that reptation dynamics only occurs inside the droplets.

Data availability
All data are included in the paper and the Supplementary Information. The raw data are available on Zenodo at https://zenodo.org/record/5794441. 90

Code availability
The codes to perform simulations and analyses are available at GitHub (https://github.com/tienhungf91/RNA_llps). Fig. 2 | structures of isolated (CAG) n monomers. a, Distribution of the end-to-end distance R ee and b, radius of gyration R g of (CAG) n with n=47, 31 and 20. The vertical dash lines in a indicate mean values for self-avoiding random walk chains with the same n. Snapshots are for (CAG) 47 . Cytosine is in cyan, adenine is in red and guanine is in black. c, Bond-bond orientational correlation function cos θ (s) as a function of the sequence distance s. The periodicity, as indicated by the vertical lines, is unmistakable. The inset shows average inter-nucleotide distances R(s) vs. s. The dashed line shows R(s) for a self-avoiding polymer (R(s) ∝ s 0.588 ). At large s, there is an abrupt drop in R(s) because the two ends strongly interact with each other, thus bringing them to proximity. d, Contact map for (CAG) 47 shows that the majority of interactions occur along the anti-diagonal, indicating the formation of hairpin structures.