Software
Toehold riboswitches are at the heart of our project. However, designing them is not straightforward, and we were not able to obtain results in a reasonable amount of time with an available tool (CUHK 2017). Thus, we decided to write Toeholder, our own tool to design toeholds. Toeholder is available on GitHub under the MIT license and as a web server at https://toeholder.ibis.ulaval.ca. Its availability will allow the synthetic biology community to use it to design toeholds, customize it, and provide comments for further development. The documentation in our repository lists the libraries and dependencies of our tool and explains how to install them:
- Biopython (Cock et al. 2009. Bioinformatics)
- Numpy
- Pandas
- NUPACK (Zadeh et al. 2011. Journal of Computational Chemistry)
- BLAST+ (Camacho et al. 2009. BMC Bioinformatics)
A general scheme of the workflow implemented in Toeholder is presented in Figure 1. Briefly, the code uses a moving window to look for candidate trigger sequences within a gene provided by the user. Following Green et al. (2014), suitable candidates should have a triad of nucleotides with two weak (A-T) pairings at the base of the hairpin. For each candidate, a toehold riboswitch is built using the complement of the candidate trigger (the recognition sequence) together with the RBS, loop, and linker sequences, as shown in Figure 2. Toeholds are then examined to ensure they do not have stop codons in the translated region, and the NUPACK suite (Zadeh et al., 2011) is used to test their secondary structure and the free energy of binding to the target. The optimal secondary structure was considered to be that of the toehold riboswitch from Green et al. (2014) that had the highest ON/OFF ratio, that is, the greatest difference between the activated and the background states (Figure 3).

To characterize the secondary structures produced by our software, we derived consensus structures for all riboswitches produced for several target genes. The consensus structures were determined by assigning each position the dot-parenthesis symbol that was observed with the highest frequency at that position. As shown in Table 1, the consensus structures produced with our tool always contain the main elements required for the optimal secondary structure: the main hairpin, the loop with the ribosome binding sequence, and the small hairpin at the end.

Afterward, the recognition sequences are aligned to user-defined genomes in order to test their specificity. This allows us to evaluate whether other reference genomes contain sequences that could produce false positives in a real setting. For example, air samples collected in a hospital could contain human cells. However, this alignment step can also be used to select riboswitches that target several strains of the same species.
For instance, we used this function to produce toeholds that could target six different strains of measles virus whose genomes were recently sequenced (Phan et al., 2018) (see Parts). Finally, results are saved in a library of toeholds that the user can consult to decide which toehold to use based on the accuracy and favorability of its binding to the target.
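
The candidate search and the stop-codon check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Toeholder source code: the function names, the window length, the position of the triad within the window (`triad_start`), and the reading of "2 weak pairings" as exactly two A or T/U bases in the triad are all hypothetical choices made for the example.

```python
# Illustrative sketch only; names and default values are assumptions,
# not taken from the Toeholder repository.
WEAK_BASES = set("ATU")  # bases that form weak pairs (A-T in DNA, A-U in RNA)
STOP_CODONS = {"TAA", "TAG", "TGA"}


def candidate_triggers(gene, window=30, triad_start=0, weak_required=2):
    """Slide a window along the gene and yield (start, candidate) pairs.

    window, triad_start and weak_required are hypothetical parameters:
    the triad is taken as the three bases at triad_start within the
    window, and a candidate is kept when exactly weak_required of those
    bases are A or T/U.
    """
    gene = gene.upper()
    for start in range(len(gene) - window + 1):
        candidate = gene[start:start + window]
        triad = candidate[triad_start:triad_start + 3]
        if sum(base in WEAK_BASES for base in triad) == weak_required:
            yield start, candidate


def has_in_frame_stop(translated_region):
    """Return True if any complete in-frame codon is a stop codon."""
    seq = translated_region.upper().replace("U", "T")
    return any(seq[i:i + 3] in STOP_CODONS for i in range(0, len(seq) - 2, 3))
```

A call such as `list(candidate_triggers(gene_sequence, window=30))` would return every window that passes the triad filter, and `has_in_frame_stop` could then be applied to the translated region of each toehold built from those candidates.
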
Target | Number of toeholds | Consensus secondary structure |
---|---|---|
Best toehold from Green et al. (2014) | 1 | .................(((((((((((..((((((...............))))))..)))))))))))..(((....)))((((.....)))).... |
Consensus from toeholds for the oxyR gene from E. coli | 175 | .........((((((..(((((((((((..((((((...............))))))..)))))))))))..))))))))..((((.....)))).... |
Consensus from toeholds for a putative CDS from the Phi6 bacteriophage | 386 | ..........((.....((((((((((...((((((...............))))))...))))))))))..))))).))..((((.....)))).... |
Consensus from toeholds for a second putative CDS from the Phi6 bacteriophage | 96 | ...........((((..((((((((((((.((((((...............)))))).))))))))))))...)))).))..((((.....)))).... |
Consensus from toeholds for an ORF from the PR772 bacteriophage | 267 | ........((((((...((((((((((((.((((((...............))))))..)))))))))))...)))))))..((((.....)))).... |
Consensus from toeholds for the FQV34_gp2 ORF from norovirus GII/Hu/JP/2007/GII.P15_GII.15/Sapporo/HK299 | 379 | ...........((....(((((((((((..((((((...............))))))..)))))))))))..))))).))..((((.....)))).... |
Consensus from toeholds for the MG912594.1 ORF from measles virus H1 | 406 | ............((...(((((((((((..((((((...............))))))..)))))))))))...))).)))..((((.....)))).... |
Consensus from toeholds for the BAC44882.1 gene from the human alphaherpesvirus 3 (chickenpox) | 233 | .........((((....(((((((((((..((((((...............))))))..)))))))))))...)))))))..((((.....)))).... |
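
The per-position consensus used for the table above can be illustrated with a short Python sketch. The function name and the tie-breaking rule (the symbol seen first wins a tie) are assumptions made for this example; the text only specifies that the most frequent symbol is kept at each position.

```python
from collections import Counter


def consensus_structure(structures):
    """Return the per-position majority symbol of dot-parenthesis strings.

    Hypothetical helper for illustration. All structures must have the
    same length. Ties are resolved in favor of the symbol that appears
    first in the input, which is an assumption of this sketch.
    """
    if not structures:
        raise ValueError("at least one structure is required")
    length = len(structures[0])
    if any(len(s) != length for s in structures):
        raise ValueError("all structures must have the same length")
    # zip(*structures) iterates over columns, i.e. over positions.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*structures)
    )
```

Because each position is decided independently, a consensus obtained this way summarizes the most common state of every position but is not guaranteed to be a balanced dot-parenthesis string.
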
Our tool produces toeholds that are predicted to bind successfully to the trigger sequence, as shown in Figure 4. First, we evaluated whether the binding of the toehold to its target was more stable than the unbound state. To assess this, we calculated the free energy of the unbound state as the sum of the separate free energies of the toehold riboswitch and the target mRNA. If the bound state is more favorable than the unbound state, then subtracting the free energy of the unbound state from the free energy of the bound state should yield a negative value. Indeed, Figure 4A shows the distribution of free energy differences, with all the values being negative. This suggests that all our toeholds would be favored to bind to their trigger sequences.

We then evaluated the accuracy of binding, that is, whether the toehold bound the trigger sequence exactly at the intended position. To do this, we compared the list of paired positions in the bound structure we simulated with NUPACK to the list of intended paired positions, while keeping track of the toeholds that produced stop codons. We classified matches as perfect if all paired positions within the recognition sequence were the same as the intended ones and as imperfect if there was at least one unintended paired position. Figure 4B shows that most of our toeholds (78%) produced perfect matches with their trigger sequences, while a minority of them should be discarded because they form stop codons or produced imperfect matches. This confirms that our tool produces sets of toeholds that are predicted to bind correctly to their trigger sequences within the target gene. All results shown in Figure 4 were obtained when producing 175 toehold riboswitches for the oxyR gene of Escherichia coli, but binding accuracies in the 70-79% range were also obtained for coding sequences from the Phi6 and PR772 bacteriophages.
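
The two checks described above reduce to a subtraction and a set comparison, as the following Python sketch shows. It assumes the free energies (in kcal/mol) and the paired positions have already been computed, for example with NUPACK; no NUPACK call is made here, and the function names and the representation of base pairs as `(toehold_position, target_position)` tuples are hypothetical.

```python
def binding_free_energy_difference(g_bound, g_toehold, g_target):
    """Free energy of the bound state minus that of the unbound state.

    The unbound state is the sum of the separate free energies of the
    toehold riboswitch and the target mRNA. A negative result means the
    bound state is predicted to be more favorable.
    """
    return g_bound - (g_toehold + g_target)


def classify_binding(predicted_pairs, intended_pairs, has_stop_codon=False):
    """Classify a toehold as 'stop codon', 'perfect' or 'imperfect'.

    predicted_pairs and intended_pairs are collections of base pairs
    restricted to the recognition sequence. Treating any difference
    between the two sets as imperfect is an assumption of this sketch.
    """
    if has_stop_codon:
        return "stop codon"
    if set(predicted_pairs) == set(intended_pairs):
        return "perfect"
    return "imperfect"
```

For instance, a complex with a free energy of -50 kcal/mol formed from a toehold at -20 kcal/mol and a target at -10 kcal/mol gives a difference of -20 kcal/mol, which would count as favorable binding.
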
References
Camacho, C., G. Coulouris, V. Avagyan, N. Ma, J. Papadopoulos, K. Bealer, and T.L. Madden. 2009. BLAST+: architecture and applications. BMC Bioinformatics. 10:421.
Cock, P.J.A., T. Antao, J.T. Chang, B.A. Chapman, C.J. Cox, A. Dalke, I. Friedberg, T. Hamelryck, F. Kauff, B. Wilczynski, and M.J.L. de Hoon. 2009. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 25:1422–1423.
Green, A.A., P.A. Silver, J.J. Collins, and P. Yin. 2014. Toehold switches: de-novo-designed regulators of gene expression. Cell. 159:925–939.
Phan, M.V.T., C.M.E. Schapendonk, B.B. Oude Munnink, M.P.G. Koopmans, R.L. de Swart, and M. Cotten. 2018. Complete Genome Sequences of Six Measles Virus Strains. Genome Announc. 6. doi:10.1128/genomeA.00184-18.
Zadeh, J.N., C.D. Steenberg, J.S. Bois, B.R. Wolfe, M.B. Pierce, A.R. Khan, R.M. Dirks, and N.A. Pierce. 2011. NUPACK: Analysis and design of nucleic acid systems. J. Comput. Chem. 32:170–173.