Team:HUBU-WUHAN/Model

Model

Modeling of intrinsic terminator

The termination of transcription is a necessary process in all organisms and occurs at the end of every transcriptional unit. In synthetic gene circuits, strong promoters are often required to increase expression levels. In our project, several genes need to be introduced into the plasmid. All of these genes are preceded by strong promoters, such as Pgap of Zymomonas mobilis, which generate a high RNAP flux that may interfere with the next transcription unit. The unregulated expression of downstream genes may be toxic to cells. In the future, we will integrate these biological parts into the genome, which will increase the number of genomic transcriptional events. In addition, encounters between RNAP and DNAP may result in clashes during replication [1], which increase genomic instability and can cause double-stranded breaks [2].

...

Figure 1. Unexpected events during the transcription process.

Types of termination

Bacteria use two mechanisms to terminate transcription in response to the sequence signal of a terminator. Rho-dependent termination requires the Rho factor, which binds to specific sites in the single-stranded RNA and moves along the nascent RNA towards the transcription complex. When it catches up with the elongation complex (EC), the Rho factor causes the polymerase to terminate transcription. Rho-independent termination, or intrinsic termination, induces EC disassembly via a signal encoded by the DNA and RNA, which comprises a GC-rich hairpin immediately followed by an 8-nt U-rich tract (U-tract) [3]. The U-tract causes transcription to pause, allowing the hairpin to form and disrupt the binding between the U-tract RNA and the template DNA [4]. In Zymomonas mobilis, we use intrinsic terminators at the end of each gene that we introduce.

Parameters to evaluate terminator strength

Termination occurs in three steps: (1) RNAP pauses at the U-tract; (2) the hairpin forms; (3) the RNA U-tract is pulled from the DNA. Based on this mechanism, we selected the parameters associated with termination.

(1) The biochemical role of the U-tract is to cause transcription to pause and to provide weak base pairing to the template DNA, which favors dissociation. The free energy of the binding between the U-tract and the template DNA, ΔG_U, can be calculated using [5]:

...

where Nu = 8 is the length of the U-tract, ΔG_init is the initiation term for RNA/DNA hybridization, and ΔG_{i,i+1} is the free-energy contribution of the RNA/DNA hybridization from the dinucleotide pair at positions i and i + 1. The parameters are from [6].
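The calculation described above (an initiation term plus one contribution per adjacent dinucleotide pair) can be sketched in Python as follows. This is a minimal illustration, not the team's script: the function name `utract_hybrid_energy` is hypothetical, and the numbers in `demo_params` and `demo_init` are placeholders chosen for easy arithmetic, not the published RNA/DNA hybrid parameters from [6], which would need to be substituted.

```python
def utract_hybrid_energy(utract, nn_params, init_term):
    """Nearest-neighbour free energy of an RNA/DNA hybrid.

    utract    -- RNA sequence of the U-tract (T is read as U)
    nn_params -- dict mapping each RNA dinucleotide to its contribution
    init_term -- initiation term for RNA/DNA hybridization
    """
    rna = utract.upper().replace("T", "U")
    if len(rna) < 2:
        raise ValueError("need at least two nucleotides")
    total = init_term
    # One term per dinucleotide pair at positions i and i + 1.
    for i in range(len(rna) - 1):
        total += nn_params[rna[i:i + 2]]
    return total


# PLACEHOLDER values for illustration only (not the parameters of [6]).
demo_params = {a + b: -1.0 for a in "ACGU" for b in "ACGU"}
demo_params["UU"] = -0.25
demo_init = 3.0

perfect = utract_hybrid_energy("UUUUUUUU", demo_params, demo_init)
interrupted = utract_hybrid_energy("UUUUGUUU", demo_params, demo_init)
print(perfect, interrupted)
```

With these placeholder numbers, the uninterrupted U-tract gives the weaker (less negative) hybrid, which is the qualitative behavior the parameter is meant to capture.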

(2) ΔG_L is the free energy for the closure of the hairpin loop. ΔG_B is the free energy for the stacking of the last three base stacks.

(3) There was also a correlation between terminator strength (Ts) and the presence of an A-rich tract upstream of the hairpin. The role of the A-tract could be to extend the hairpin stem, which contributes to the ratcheting of the U-tract from the DNA.

ΔG_HA is the free energy of the folded RNA beginning 8 nucleotides upstream of the hairpin and ending 8 nucleotides downstream of the hairpin. ΔG_H is the free energy of hairpin folding.
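The extended region described above (the hairpin plus 8 nucleotides on each side) can be cut out of the terminator sequence before folding. The sketch below assumes 0-based coordinates with an exclusive end; the function name `hairpin_window` and the example sequence are hypothetical and serve only as illustration.

```python
def hairpin_window(seq, hairpin_start, hairpin_end, flank=8):
    """Return the hairpin plus up to `flank` nt on each side.

    hairpin_start is inclusive and hairpin_end is exclusive (0-based).
    The window is clipped at the sequence boundaries.
    """
    if not 0 <= hairpin_start < hairpin_end <= len(seq):
        raise ValueError("hairpin coordinates outside sequence")
    left = max(0, hairpin_start - flank)
    right = min(len(seq), hairpin_end + flank)
    return seq[left:right]


# Made-up example: 10 A, a 12-nt hairpin, then 10 U.
example = "A" * 10 + "GGGCGAAAGCCC" + "U" * 10
window = hairpin_window(example, 10, 22)
print(window)
```

The returned window would be the input for folding the extended RNA, while the hairpin alone (`example[10:22]` here) would be folded separately.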

Feature identification and parameter calculation

Correctly identifying the hairpin and U-tract is essential for the subsequent parameter calculation. We wrote a Python script to find these sequence features of the terminator.

First, we use KineFold [7] to simulate the structure that the RNA forms during transcription. For every terminator, we repeat the simulation 20 times with different random seeds. From these results we select the structure with the highest frequency or, if several structures are equally frequent, the one with the lowest free energy. We then scan the text output of the software and determine the hairpin and U-tract according to heuristic rules that we set. ΔG_U is calculated while locating the 8-nt U-tract. The U-tract is searched for in a range from 6 nt upstream to 8 nt downstream of the end of the hairpin (because the U-tract may pair with the A-tract). Within the first 6 nt, if the sequence starts with a U, it is taken as the U-tract; otherwise, the U-tract is the sequence with the lowest RNA/DNA hybrid thermodynamic parameters.
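The selection rule for the repeated simulations (most frequent structure, ties broken by lower free energy) can be sketched as below. The sketch assumes that each run has already been parsed into a `(structure, free_energy)` tuple; the dot-bracket strings, the energies, and the function name `pick_consensus` are illustrative assumptions and do not reproduce KineFold's actual output format.

```python
from collections import Counter


def pick_consensus(runs):
    """Pick the most frequent structure; break ties by lower free energy.

    runs -- list of (structure, free_energy) tuples, one per simulation.
    """
    if not runs:
        raise ValueError("no simulation results")
    counts = Counter(structure for structure, _ in runs)
    # Lowest free energy observed for each distinct structure.
    best_energy = {}
    for structure, energy in runs:
        best_energy[structure] = min(energy, best_energy.get(structure, energy))
    # Sort key: higher frequency first, then lower free energy.
    return min(counts, key=lambda s: (-counts[s], best_energy[s]))


# Made-up results from five runs (the text above uses 20 per terminator).
demo_runs = [
    ("((..))", -5.0),
    ("((..))", -5.2),
    ("(....)", -3.0),
    ("(....)", -6.0),
    ("......", 0.0),
]
print(pick_consensus(demo_runs))
```

In this toy input two structures occur twice each, so the tie is resolved in favor of the one whose lowest observed free energy is more negative.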

Then, we use the ViennaRNA Package [8] to calculate the free energy of each structural feature. RNAfold is used to calculate ΔG_HA and the free energy of the hairpin, ΔG_H. RNAeval is used to calculate ΔG_L and ΔG_B.

Prediction and verification

After obtaining the above five parameters, we used the Statistics and Machine Learning Toolbox in MATLAB to predict the strength of terminators. After testing the tools in the toolbox, we decided to use Gaussian Process Regression (GPR) for the prediction. The model is trained on data from the published literature [5]. Some terminators in the dataset have a strong termination strength that is caused by an irregular structure or by interactions with plasmid sequences. These unusual data points were excluded. The training results are shown in Figure 2.

Finally, we introduced 15 endogenous terminators from Zymomonas mobilis into the dual-reporter gene system (Figure 3) and measured the intensity ratio of the upstream and downstream fluorescent proteins. The correlation between the ratio of GFP to mCherry and the predicted Ts is shown below (Figure 4). Our model performs better than the published model (Figure 5) [5] because GPR lets us better associate terminator sequence characteristics with strength: it calculates the probability distribution over all admissible functions that fit the data rather than fitting one specific function. This makes it well suited to complex regression problems with many dimensions and few samples.
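To make the idea of GPR concrete, the following is a minimal pure-Python sketch of the posterior mean and variance of a zero-mean Gaussian process. It is not the MATLAB toolbox used in this work: the squared-exponential kernel, the fixed hyperparameters (`length`, `var`, `noise`), and the one-dimensional toy data are all assumptions made for illustration, and no terminator data are used.

```python
import math


def rbf(a, b, length=1.0, var=1.0):
    """Squared-exponential kernel between two feature vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return var * math.exp(-0.5 * d2 / length ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def gpr_predict(X, y, x_new, noise=1e-2, length=1.0, var=1.0):
    """Posterior mean and variance at x_new for a zero-mean GP."""
    n = len(X)
    # Kernel matrix of the training inputs, with noise on the diagonal.
    K = [[rbf(X[i], X[j], length, var) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_star = [rbf(x, x_new, length, var) for x in X]
    mean = sum(k * a for k, a in zip(k_star, solve(K, y)))
    variance = var - sum(k * w for k, w in zip(k_star, solve(K, k_star)))
    return mean, variance


# Toy one-dimensional data (not terminator measurements).
X_toy = [[0.0], [1.0], [2.0]]
y_toy = [0.0, 1.0, 0.0]
mean_near, var_near = gpr_predict(X_toy, y_toy, [1.0])
mean_far, var_far = gpr_predict(X_toy, y_toy, [100.0])
print(mean_near, var_near, mean_far, var_far)
```

Near a training point the predicted variance is small, while far from the data the prediction falls back to the prior (mean 0, variance `var`). This per-prediction uncertainty is what distinguishes GPR from fitting a single function.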

...

Figure 2. Correlation between true data and predicted data (R-squared = 0.46).

...

Figure 3. The schematic diagram of the dual-reporter gene system.

...

Figure 4. Correlation between the ratios of GFP to mCherry and predicted Ts (terminator strength).

...

Figure 5. Correlation between the measured strength and predicted Ts in data set.

Conclusion

We built a workflow (Figure 6) to predict the strength of terminator sequences. A sequence is taken as input, its structure is predicted with KineFold, and the structural features are identified by the Python script we wrote. The features found by the script are used to calculate the free energies, which then serve as inputs to the prediction model. Finally, we obtain a predicted Ts to choose a stronger terminator, and we use flow cytometry to verify the real strength of the terminators.

...

Figure 6. Workflow from terminator sequence to real strength.

References

1. Helmrich A, Ballarino M, Nudler E, Tora L: Transcription-replication encounters, consequences and genomic instability. Nat Struct Mol Biol 2013, 20(4):412-418.

2. Bellecourt MJ, Ray-Soni A, Harwig A, Mooney RA, Landick R: RNA Polymerase Clamp Movement Aids Dissociation from DNA but Is Not Required for RNA Release at Intrinsic Terminators. J Mol Biol 2019, 431(4):696-713.

3. d'Aubenton Carafa Y, Brody E, Thermes C: Prediction of rho-independent Escherichia coli transcription terminators. A statistical analysis of their RNA stem-loop structures. J Mol Biol 1990, 216(4):835-858.

4. Gusarov I, Nudler E: The mechanism of intrinsic transcription termination. Mol Cell 1999, 3(4):495-504.

5. Chen YJ, Liu P, Nielsen AAK, Brophy JAN, Clancy K, Peterson T, Voigt CA: Characterization of 582 natural and synthetic terminators and quantification of their design constraints. Nat Methods 2013, 10(7):659-664.

6. Sugimoto N, Nakano S, Katoh M, Matsumura A, Nakamuta H, Ohmichi T, Yoneyama M, Sasaki M: Thermodynamic parameters to predict stability of RNA/DNA hybrid duplexes. Biochemistry 1995, 34(35):11211-11216.

7. Xayaphoummine A, Bucher T, Isambert H: Kinefold web server for RNA/DNA folding path and structure prediction including pseudoknots and knots. Nucleic Acids Res 2005, 33(Web Server issue):W605-610.

8. Lorenz R, Bernhart SH, Höner zu Siederdissen C, Tafer H, Flamm C, Stadler PF, Hofacker IL: ViennaRNA Package 2.0. Algorithms Mol Biol 2011, 6:26.