One of the most critical aspects of our project is the design of the toehold riboswitches to detect our target sequences. In order to do this, we have come up with a comprehensive in silico model to study their properties and how they relate to their sequence. While toehold riboswitches can be used to detect both DNA and RNA targets, we focused primarily on RNA targets for two important reasons:
- mRNA molecules are found in higher concentrations in cells than DNA
- Available models are better suited to model RNA-RNA interactions
A tool to design toehold riboswitches
We noticed that the methods used to design the switches in the original article (Green et al., 2014) were very lengthy. We looked for computational tools we could use for their design. We found that the 2017 iGEM team from the Chinese University of Hong Kong had already produced a similar online tool (To et al., 2018). However, we tried submitting a query to the server and the results were not generated in a reasonable amount of time (>2 months). Therefore, we decided the best way to move forward would to write our own software to produce riboswitches, especially since we could have control over all the parameters and make it as versatile as possible to suit any user’s needs, while also providing rigorous tests to base the choice of the toehold sequence (see Software).We are currently hosting this tool in our Github repository at see https://github.com/igem-ulaval/toehold_designand considering hosting it in a web server with a previously designed structure (Barbeau et al., 2018). The following sections explain the considerations and tests we perform.
RNA molecules fold into secondary structures
RNA folding allows toehold riboswitches to fold into their secondary structure. However, this can also be an obstacle because the target RNA molecule can also fold (Figure 1). As a result, many potential binding sites for our riboswitches will not be accessible because they are already bound to another sequence within the target molecule. We reasoned that our riboswitches would need to bind more favorably to their target than the competing sequences to be successful. Therefore, we used the NUPACK suite (Zadeh et al., 2011) to model the secondary structure of the target molecule and use the number of unpaired positions as a parameter that could affect binding efficiency.
Once we have selected a suitable trigger sequence, the next step is to add the other sequence elements and calculate the most stable secondary structure. These include:
- A loop at the top of the hairpin with the ribosome binding sequence (RBS).
- Closing the hairpin with the complement of a part of the recognition sequence.
- Adding the start codon (AUG) within said complementary sequence.
- Including a linker between the start codon and the reporter gene. Such a linker should not contain stop codons.
- The reporter gene or a flag to simulate it.
Several of these parameters can be specified by the user, including:
- The target gene from which candidate triggers will be tested
- The number of bases of the recognition sequence that contribute to the hairpin.
- The reporter gene or tag to be used.
Thermodynamic in silico tests
Each of our candidate toehold riboswitches goes through the following series of thermodynamic tests to rank them:
- Test 1 - Free energy of the secondary structure: The change of Gibbs free energy (ΔG) allows us to predict if a process will happen spontaneously or not. In this sense, it is desirable that our toehold riboswitches fold spontaneously into the appropriate secondary structure and stay in a folded state. The more negative the ΔG of folding, the more favorable the folded state is.
- Test 2 - Differences of free energy between the bound and unbound state: This test compares the ΔG of folding of the unbound toehold switch and the target mRNA to that of the bound state. The bound state of the toehold switch should be more favorable than the unbound state. Since a more negative ΔG indicates a more favorable process, suitable toeholds will satisfy the following condition:
ΔGbound < ΔGtoehold + ΔGmRNA
- Test 3 - Accurate binding to the target: Toeholds are designed to recognize a specific trigger sequence within a target gene. Thus, we can use the simulated bound state to look at the list of paired positions and evaluate if the toehold can successfully bind to the intended target. Thus, we prioritize toehold switches that are predicted to bind exactly to all the intended positions over the ones that are missing one or more pairs. The failure to bind the intended target might cause unspecific binding, which could lead to false positives. However, we could use it to learn about sequence properties related to unspecific binding and improve our tool to avoid them.
- Test 4 - Alignment to reference genomes: A final test for the specificity of our recognition sequence is to align them to user-defined genomes (Camacho et al., 2009)(Camacho et al., 2009). This allows us to evaluate whether there are sequences in the other reference genomes that could produce false positives in a real setting. For example, in a hospital, there could be potential human cells in air samples. Therefore, it is important to make sure that the target sequences in the pathogen’s genome do not appear in the human genome and those of other organisms that may be realistically present. However, it could also be used to select riboswitches that could target several strains of the same species. For instance, we used this function to produce toeholds that could target six different strains of measles virus whose genomes were recently sequenced (Phan et al., 2018) (see Parts).
By doing these tests, we observed that the number of available positions in the candidate trigger does not seem to affect the binding accuracy or free energy of toehold switches (Figure 2). If anything, triggers with fewer accessible positions seem to provide marginally greater efficiency and a more favorable binding energy. However, the correlation between these two variables is very low, as implied by the low R2 value. This observation is contrary to our expectations, but factors involved might include the small sample size for these regions with high accessibility or sequence properties like low GC content that cause binding to them to be weaker. We also observed that between 70-79% of the toeholds our tool produces are predicted to bind perfectly to the target, while the remaining ones have at mismatches or shifts in the binding position or produced stop codons (see Software).
3D model and molecular dynamics
Molecules are not rigid entities, but actually dynamic structures. They interact with their 3D environment, which causes them to vibrate and potentially change their conformation. In the case of our toehold riboswitches, while we had previously evaluated whether the secondary structure is expected to form spontaneously, the following step was to examine if this structure remains stable in time. To study this, we first extended our secondary structure model of the most effective toehold from the original reference (Green et al., 2014) to two 3D models. We produced the first one using the RNAComposer online server (Purzycka et al., 2015; Popenda et al., 2012) (Figure 3A) and the second one through a collaboration with Team iGEM Concordia using Rosetta (Das et al., 2010; Cheng et al., 2015; Schrödinger, 2015) (Figure 3B). We validated both structures using the MOLProbity server (Chen et al., 2010), which showed that both models have an overall high quality (Tables 1 and 2). Indeed, the two models show a very similar structure for the hairpin, which is the most important part of the structure because of its role in blocking the ribosome’s access to the ribosome binding sequence. However, they showed different conformations in the ends of the molecule. We reasoned that those molecules are inherently more flexible because of a lack of hydrogen bonds, which could explain why they might have access to different conformations.
Figure 3: Alternative 3D structures of a toehold riboswitch. A) Model dervied from the predicted secondary structure using the RNAComposer server. B) Model derived from the predicted secondary structure using Rosetta. Matthew Tiranardi from Team iGEM Concordia contributed to this modeling effort. Both structures were visualized with Pymol (Schrodinger LLC, 2015).
|All-Atom Contacts||Clashscore, all atoms:||2.83||98th percentile* (N=1784, all resolutions)|
|Clashscore is the number of serious steric overlaps (>0.4 Å) per 1000 atoms.|
|Nucleic Acid Geometry||Probably wrong sugar puckers:||2||2.02%||
|Bad backbone conformations#:||20||20.20%||Goal:<= 5%|
|Bad bonds:||0 / 2364||0.00%||Goal: 0%|
|Bad bonds:||0 / 3683||0.00%||
|All-Atom Contacts||Clashscore,all atoms:||14.47||51st percentile* (N=1784, all resolutions)|
|Clashscore is the number of serious steric overlaps (>0.4 Å) per 1000 atoms.|
|Nucleic Acid Geometry||Probably wrong sugar puckers:||1||1.01%||
Goal: <= 5%
|Bad backbone conformations#:||11||11.11%||Goal: 0|
|Bad bonds:||0 / 2365||0.00%||Goal: 0%|
|Bad bonds:||0 / 3684||0.00%||
Once we had 3D structures for the toehold riboswitch, we mounted a system for molecular dynamics simulations with the CHARMM-GUI server (Lee et al., 2016; Jo et al., 2008) and the NAMD simulation engine (Phillips et al., 2005). This would allow us to explore the dynamics of our structural model and to evaluate the flexibility of the ends of the molecule. We ran two simulations starting from our RNAComposer model: a computationally tractable one using an implicit solvent and a computationally intensive one using explicit solvent. Briefly, the simulation with implicit solvent uses the Born generalization to derive an approximation of the effect of solvation on each atom in the toehold’s structure, while the simulation with explicit solvent actually uses distance measurements to water molecules to calculate the magnitude of their interactions with the toehold (Anandakrishnan et al., 2015). In the simulations, we observe qualitatively that under an implicit solvent the toehold riboswitch starts unfolding after the first 40 ns (Movie 1) but not in the explicit solvent simulation (Movie 2). We analyzed the number of hydrogen bonds formed in the hairpin throughout the simulations using VMD built-in tools (Humphrey et al., 1996). In the explicit solvent simulation, the number of hydrogen bonds stays closer to the theoretical optimal expectation, which suggests a stable hairpin (Figure 4). This confirms the advantages of using an explicit solvent to obtain more accurate results from simulations. However, it also suggests that there could be potential for design improvements to prevent leaky expression caused by a spontaneous unfolding of the toehold riboswitch. We studied which hydrogen bonds were detected in most of the simulation and which were lost to find potential sites for design improvement (Figure 5). We found that the most stable hydrogen bonds were mediated by GC pairings at the base and the top of the hairpin, while the least stable ones were located closer to the AUG mismatch near the center of the hairpin. This result suggests that even more stable structures could be obtained by selecting sequences that have a higher GC content or ensuring that the last pairing at the top of the hairpin is a GC pair. Nevertheless, an excessively stable structure could bind inefficiently to the trigger sequence, so this potential trade-off between preventing leaky expression and target recognition. Further simulations will allow us to provide insight into this design aspect and further improvements to our workflow.
Anandakrishnan, R., A. Drozdetski, R.C. Walker, and A.V. Onufriev. 2015. Speed of conformational change: comparing explicit and implicit solvent molecular dynamics simulations. Biophys. J. 108:1153–1164.
Barbeau, X., A.T. Vincent, and P. Lagüe. 2018. ConfBuster: Open-Source Tools for Macrocycle Conformational Search and Analysis. Journal of Open Research Software. 6.
Camacho, C., G. Coulouris, V. Avagyan, N. Ma, J. Papadopoulos, K. Bealer, and T.L. Madden. 2009. BLAST+: architecture and applications. BMC Bioinformatics. 10:421.
Cheng, C.Y., F.-C. Chou, and R. Das. 2015. Modeling complex RNA tertiary folds with Rosetta. Methods Enzymol. 553:35–64.
Chen, V.B., W.B. Arendall 3rd, J.J. Headd, D.A. Keedy, R.M. Immormino, G.J. Kapral, L.W. Murray, J.S. Richardson, and D.C. Richardson. 2010. MolProbity: all-atom structure validation for macromolecular crystallography. Acta Crystallogr. D Biol. Crystallogr. 66:12–21.
Das, R., J. Karanicolas, and D. Baker. 2010. Atomic accuracy in predicting and designing noncanonical RNA structure. Nat. Methods. 7:291–294.
Green, A.A., P.A. Silver, J.J. Collins, and P. Yin. 2014. Toehold switches: de-novo-designed regulators of gene expression. Cell. 159:925–939.
Humphrey, W., A. Dalke, and K. Schulten. 1996. VMD: visual molecular dynamics. J. Mol. Graph. 14:33–8, 27–8.
Jo, S., T. Kim, V.G. Iyer, and W. Im. 2008. CHARMM-GUI: a web-based graphical user interface for CHARMM. J. Comput. Chem. 29:1859–1865.
Lee, J., X. Cheng, J.M. Swails, M.S. Yeom, P.K. Eastman, J.A. Lemkul, S. Wei, J. Buckner, J.C. Jeong, Y. Qi, S. Jo, V.S. Pande, D.A. Case, C.L. Brooks 3rd, A.D. MacKerell Jr, J.B. Klauda, and W. Im. 2016. CHARMM-GUI Input Generator for NAMD, GROMACS, AMBER, OpenMM, and CHARMM/OpenMM Simulations Using the CHARMM36 Additive Force Field. J. Chem. Theory Comput. 12:405–413.
Phan, M.V.T., C.M.E. Schapendonk, B.B. Oude Munnink, M.P.G. Koopmans, R.L. de Swart, and M. Cotten. 2018. Complete Genome Sequences of Six Measles Virus Strains. Genome Announc. 6. doi:10.1128/genomeA.00184-18.
Phillips, J.C., R. Braun, W. Wang, J. Gumbart, E. Tajkhorshid, E. Villa, C. Chipot, R.D. Skeel, L. Kalé, and K. Schulten. 2005. Scalable molecular dynamics with NAMD. J. Comput. Chem. 26:1781–1802.
Popenda, M., M. Szachniuk, M. Antczak, K.J. Purzycka, P. Lukasiak, N. Bartol, J. Blazewicz, and R.W. Adamiak. 2012. Automated 3D structure composition for large RNAs. Nucleic Acids Res. 40:e112.
Purzycka, K.J., M. Popenda, M. Szachniuk, M. Antczak, P. Lukasiak, J. Blazewicz, and R.W. Adamiak. 2015. Chapter One - Automated 3D RNA Structure Prediction Using the RNAComposer Method for Riboswitches1. In Methods in Enzymology. S.-J. Chen and D.H. Burke-Aguero, editors. Academic Press. 3–34.
Schrödinger, L.L.C. 2015. The PyMOL Molecular Graphics System, Version 1.8.
To, A.C.-Y., D.H.-T. Chu, A.R. Wang, F.C.-Y. Li, A.W.-O. Chiu, D.Y. Gao, C.H.J. Choi, S.-K. Kong, T.-F. Chan, K.-M. Chan, and K.Y. Yip. 2018. A comprehensive web tool for toehold switch design. Bioinformatics. 34:2862–2864.
Zadeh, J.N., C.D. Steenberg, J.S. Bois, B.R. Wolfe, M.B. Pierce, A.R. Khan, R.M. Dirks, and N.A. Pierce. 2011. NUPACK: Analysis and design of nucleic acid systems. J. Comput. Chem. 32:170–173.