Team:OUC-China/Model

kkkkk
Overview:

This year OUC-China are working on post-transcriptional regulation. At present, post-transcriptional regulation is still a hot topic of research, and various regulation methods and levels are what most people pursue. In our work, we want the modular riboswitch to have a wider level of regulation. Therefore, in order to explore what parameters play a crucial role in the translation level of our RiboLego system, we carried out sensitivity analysis on the parameters obtained by ordinary differential equation(ODE). And the result of the ODE model is that translation rate plays an important role in different translation levels.

So in order to change the translation rate, the thermodynamic model was introduced . By this model, we know that riboswitch and stabilizer is how to affect the downstream gene translation level.What’s more, this sequence-to-function thermodynamic model can predict expression level ratios in natural operons and to design synthetic operons with desired expression level ratios.Depending on the thermodynamic model, a variety of tuners were successfully designed.

The thermodynamic model shows that different stabilizers will affect downstream gene translation rate. So how to select the suitable stabilizer become crucial. We get the docking matrix corresponding score function, and compared these score values, so as to obtain the suitable to stabilizer sequence.

Through various tuners, we have more translation levels in our RiboLego system. But our aim is to have more diverse regulation, so we introduced asRNA to re-activate or re-repress the riboswitch by hiding or exposing the RBS. According to the following design factors: an Hfq binding site, thermodynamics, the asRNA length, the number of mismatches, and the binding location. Finally, we successfully designed the corresponding asRNA sequence. And the repressing or activating effect of the designed asRNA sequence were predicted by the sequence-effect function.

Finally, in order to explore the whole process of the riboswitch from off to on at the molecular level with or without ligands, we used molecular dynamics to simulate the whole process and explored how the riboswitch function in more detail by distinguishing its on ,off and transition states by molecular distance. And depend on it, we can understand the molecular mechanism which the riboswitch is turned on.


Fig A. Model overview.




PART 1: ODE

In our work, our aim is to have more different translation’s levels. Therefore, in order to understand which parameters play a key role in the RiboLego system, we used ODE model to simulate the whole process of the RiboLego system.

REACTION:

We can describe our RiboLego system to be followings:

① null -> Riboswitch_off

② Riboswitch_off + ligand <-> Riboswitch_on

③ Riboswitch_on -> STA + Riboswitch_on

④ Riboswitch_on -> GOI + Riboswitch_on

⑤ Riboswitch_off -> null

⑥ Riboswitch_on -> ligand

⑦ STA -> null

⑧ GOI -> null

ODES:

To simulate the dynamics of sfGFP, we use ordinary differential equations to model the reactions above. And ODEs are given as follows:

d[ GOI ] dt =kp2[ Riboswitch_on ]-kd3[ GOI ] MathType@MTEF@5@5@+= feaagKart1ev2aaaMvivLHfij5gC1rhimfMBNvxyNvgaCzMCHn2E7T hxY12EK1xFCXwzMr3wGS3rpLuFGWLCPDgA01vF9T3EKrxF9bspGSgC YWfBLzgDBbYEsLMyVn3DPr3yOX1xV5wFGWLCPDgA011ECjxB7bslGS gzZ0xFCXwzMr3wGS3rpLuFGWLCPDgA01fatCvAUfeBSjuyZL2yd9gz LbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYL wzYbItLDharqqtubsr4rNCHbWexLMBbXgBd9gzLbvyNv2CaeHbl7mZ LdGeaGqiVCI8FfYJH8YrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8 WqFfeaY=biLkVcLq=JHqpepeea0=as0Fb9pgeaYRXxe9vr0=vr0=vq pWqaaeaabiGaciaacaqabeaadaqaaqaafaGcbaaeaaaaaaaaa8qada WcaaWdaeaapeGaaeizamaadmaapaqaa8qacaWGhbGaam4taiaadMea aiaawUfacaGLDbaaa8aabaWdbiaadsgacaWG0baaaiabg2da9iaadU gacaWGWbGaaGOmamaadmaapaqaa8qacaWGsbGaamyAaiaadkgacaWG VbGaam4CaiaadEhacaWGPbGaamiDaiaadogacaWGObGaai4xaiaad+ gacaWGUbaacaGLBbGaayzxaaGaaeylaiaabUgacaqGKbGaae4mamaa dmaapaqaa8qacaWGhbGaam4taiaadMeaaiaawUfacaGLDbaaaaa@92F2@

For the readability, the complex symbol is simplified as:


SPECIES, SYMBOLS AND PARAMETERS


FIG A.Species, symbols and parameters.


The parament we used in the ODEs is listed in the following table:


FIG B.The parament we used in the ODEs.


SIMULATION

With the help of Simbiology toolbox in Matlab,we simulate our RiboLego system for 16h, the result can be seen in the Fig.A.


Fig.A The dynamics of sfGFP by model prediction


We compare the experimental data to the simulation:




Fig.B The comparison between experimental data and simulation data


SENSITIVITIES ANALYSIS

In order to explore what parameters play a crucial role in the translation level of our RiboLego system, we carried out sensitivity analysis on the parameters obtained by ordinary differential equation.


Fig C. The numerical integration of sensitivities of parameters in 14h


Through sensitivity analysis, we found that parameters k1, kp2, kdc1 and kd2 had an effect on the expression level of GOI. Since our work focuses on post-transcriptional regulation, we want to keep the riboswitch on-state and GOI translation level stable. Therefore, it is our best choice to change the translation rate to achieve the diversification of the regulation level of RiboLego.

To verify that kp2 has an effect on translation rate in the riboswitch systems.We went further. We did the following exploration:

The above differential equations are nonlinear equations. It is known from the theorem that there is a solution. However, after calculation by MATLAB program, there is no explicit solution for this nonlinear ordinary differential equations, so the model is reasonably linearized.

Since the amount of the ligand does hardly change during the reaction, we assume that the ligand is a constant.

This system of equations is a linear system of equations. By the MATLAB program , we know that the symbolic solution of GOI(E) with respect to time t:(a1= 289.5845,a2= 4.2734e-07,a3= 7.3077e+19,a4= -0.1561,a5= -90.2011,a6=- 2.9988e+05,a7= 1.5465e-49,a8= -7.7220e+41,a9= 8.7493e-06,C1 C2 C3 C4 are consants.)


The translation rate corresponds to Kp2 in our model. So, we only need to do local sensitivity analysis of Kp2. Using following model we will know how the Kp2 effect the value of GOI.


The result shows when we increase kp2, the GOI increase, which means we can use the translation rate to change the GOI expression level in following models.



PART 2: NEW THERMODYNAMIC MODEL

FIRST PART: THERMODYNAMIC MODEL WITH PROMOTER

By performing sensitivity analysis of all parameters in this process, we find that translation rate plays an important role in different translation levels. So how to change the translation rate is a big problem for us. Fortunately, from last year’s work, we found that the rate of translation initiation is positively proportional to the translation rate. And the thermodynamic model can calculate the rate of translation initiation.So we try to use thermodynamic model to design different tuners to achieve our goal.

As we all know, in bacteria,translational coupling provides a genetically encoded mechanism to control expression level ratios within multi-cistronic operons.From picture A, we can know the coupled translation of a bi-cistronic operon.The schematic shows the promoter,the (yellow box) 5' UTR, the (red) upstream coding sequence, the (orange box) intergenic region, the (green) downstream coding sequence, and the transcriptional terminator. The intergenic region contains an inhibitory RNA structure composed of the common RBS, repressing region and coupling region . We named it tuner. When the upstream coding sequence is translated, the ribosome unbinds tuner’s inhibitory structure to facilitate downstream coding sequence.

Fig A. Translational coupling encoded mechanism.


Therefore, the inhibitory structure of tuner directly affects the translation rate of downstream coding sequence. The thermodynamic model takes advantage of this principle to transform the inhibitory structure into translation rate. Finally, We have developed a sequence-to-function thermodynamic model of translational coupling to predict expression level ratios in natural operons and to design synthetic operons with desired expression level ratios. You only give the sequences, the model will use this sequences to calculate the sequences' translation initiation rate .

Now we have known the coupled translation model for bicistronic. So we should know how to use this thermodynamic model to predict GOI’s translation level. In the process of translation, the initial rate of translation determines the rate of translation process. This thermodynamic model will predict the GOI’s initial rate of translation in two parts. The first part is ribosome re-initiation. The second part is ribosome de novo initiation. Ribosome re-initiation occurs when an upstream elongating ribosome dissociates and reassembles at an intergenic region, followed by translation initiation of the downstream coding sequence. Ribosome de novo initiation occurs when cytosolic ribosomes assemble onto the mRNA at intergenic regions and initiate translation of the downstream coding sequence.


Fig B. Ribosome re-initiation and de novo ribosome initiation.


From this model, we get a formula to calculate the GOI’S initial rate of translation.The proposed thermodynamic model of translational coupling quantifies the molecular interactions that control both ribosome re-initiation and de novo initiation rates according to an inputted mRNA sequence. Model components and their corresponding sequence region are color coded(Fig C), including the free energies of inhibitory RNA structures (coupled and non-coupled), the intergenic distance, the free energy of the ribosome-bound state, and the upstream coding sequence’s translation rate.



Fig C. The proposed biophysical model of translational coupling quantifies the molecular interactions that control both ribosome re-initiation and de novo initiation rates according to an inputted mRNA sequence(tuner).



The ΔG total,upstream refers to the total binding free energy between the ribosome and 5’UTR, according to the equation:

Δ G total,upstream =Δ G mRNArRNA +Δ G spacing +Δ G start +Δ G standby Δ G mRNA MathType@MTEF@5@5@+= feaahqart1ev3aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbWexLMBbXgBd9gzLbvyNv2CaeHbl7mZLdGeaGqiVu0Je9sqqr pepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs 0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaai aabeqaamaabaabauaakqaabeqaaiabfs5aejaadEeadaWgaaWcbaGa amiDaiaad+gacaWG0bGaamyyaiaadYgacaGGSaGaamyDaiaadchaca WGZbGaamiDaiaadkhacaWGLbGaamyyaiaad2gaaeqaaOGaeyypa0Ja euiLdqKaam4ramaaBaaaleaacaWGTbGaamOuaiaad6eacaWGbbGaey OeI0IaamOCaiaadkfacaWGobGaamyqaaqabaGccqGHRaWkcqqHuoar caWGhbWaaSbaaSqaaiaadohacaWGWbGaamyyaiaadogacaWGPbGaam OBaiaadEgaaeqaaOGaey4kaSIaeuiLdqKaam4ramaaBaaaleaacaWG ZbGaamiDaiaadggacaWGYbGaamiDaaqabaaakeaacqGHRaWkcqqHuo arcaWGhbWaaSbaaSqaaiaadohaciGG0bGaaiyyaiaac6gacaWGKbGa amOyaiaadMhaaeqaaOGaeyOeI0IaeuiLdqKaam4ramaaBaaaleaaca WGTbGaamOuaiaad6eacaWGbbaabeaaaaaa@7C28@

The translation initiation rate of the first CDs(upstream) is calculated according to r 1 exp( βΔ G total ) MathType@MTEF@5@5@+= feaagKart1ev2aaaMfgvLHfij5gC1rhimfMBNvxyNvga7ThxY12EY1 xFFftFGWfCY9gC09giCvgEWbcxSvMz0Hci7bslGWLyLrxyGWfrLXgD HbYEh91E7XLCTThDVrxyS1xF91xFGWLCPDgA0LcatCvAUfeBSjuyZL 2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerb d9wDYLwzYbItLDharqqtubsr4rNCHbWexLMBbXgBd9gzLbvyNv2Cae Hbl7mZLdGeaGqiVCI8FfYJH8YrFfeuY=Hhbbf9v8qqaqFr0xc9pk0x bba9q8WqFfeaY=biLkVcLq=JHqpepeea0=as0Fb9pgeaYRXxe9vr0= vr0=vqpWqaaeaabiGaciaacaqabeaadaqaaqaafaGcbaaeaaaaaaaa a8qacaqGYbWdamaaBaaaleaapeGaaGymaaWdaeqaaOWdbiabg2Hi1k GacwgacaGG4bGaaiiCamaabmaapaqaa8qacqGHsislcqaHYoGycqqH uoarcaWGhbWdamaaBaaaleaapeGaaeiDaiaab+gacaqG0bGaaeyyai aabYgaa8aabeaaaOWdbiaawIcacaGLPaaaaaa@7598@
The ΔG mRNA-rRNA refers to the free energy of folding for mRNA-rRNA complex, which is negative.

The ΔG spacing refers to an energetic penalty for ribosomal stretching or compression caused by a long or short spacer region between the 16S rRNA binding site and start codon, which is negative.

The ΔG start refers to the free energy for fMet-tRNA and start codon complex, which is negative.

The ΔG standby refers to an energy penalty determined by the mRNA standby site’s interactions with the ribosome’s platform domain, which is positive.

The ΔG mRNA refers to the free energy of folding for 5’UTR,which is negative.


Fig D. The regions that consisted of tuner and their corresponding free energy.


To sum up, each tuner sequence has its corresponding free energy, which represents the regulation level of each tuner. In order to accurately calculate the magnitude of each free energy, it is necessary to learn to distinguish the regions of the sequence.The reason why tuner is consisting of 35 nucleotides before and after the start codon,is that the model predictions do not improve when longer subsequences are considered. The tuner subsequence S1 consists of the max(1, nstart-35) to nstart nucleotides and the subsequence S2 consists of the max(1,nstart-35) to nstart+35 nucleotides, where nstart is the position of a start codon.

To calculate the ΔGmRNA:rRNA, ΔGstart, ΔGmRNA, and ΔGstandby free energies, we use the NUPACK suite of algorithms.And five energy can be seen vividly in the Fig D.These free energy calculations do not have any additional fitting or training parameters and explicitly depend on the mRNA sequence. In addition, the free energy terms are not orthogonal; changing a single nucleotide can potentially affect multiple energy terms.

ΔGmRNA.Using the NuPACK ‘mfe’ algorithm and Mfold parameters, the mfe configuration of

sequence S2 is calculated and its free energy is designated ΔGmRNA.

ΔG standby.The standby site is the 4 nt region upstream of the 16S rRNA binding site. The energy required to unfold the standby site is determining by calculating the mfe of sequence S2 with and without preventing the standby site from forming base pairs. The difference between these mfe's is

designated ΔGstandby. The mfe of sequence S2 without a standby site is considered as △GmRNA. To calculate the mfe of sequence S2 with a standby site that is constrained to be single-stranded, the sequence is first split into two subsequences, their mfes are each calculated, and then summed together. The two subsequences are the nucleotides nstart-35 to n3-4 and n3 to nstart+35, where n3 is the most 5′ base pair in the 16S rRNA binding site and 4 is the standby site length.

ΔGmRNA:rRNA.Using the NuPACK ‘subopt’ algorithm with Mfold 3.0 parameters at 37°C base pair configurations of the folded 16S rRNA and sequence S1 are enumerated, starting with the minimum free energy (mfe) configuration and continuing with suboptimal configurations, each with a corresponding ΔGmRNA:rRNA. In our work, we choose the mfe configuration's ΔGmRNA:rRNA .

ΔGspacing .The spacing(s) is a distance between SD sequence’s 3’ terminal and the start codon’s first base .When the 30S complex is stretched (s > 5 nt), the ΔGspacing is calculated according to the quadratic equation,

Δ G spacing = C 1 ( S S opt ) 2 + C 2 ( S S opt ) MathType@MTEF@5@5@+= feaagKart1ev2aaaMrjvLHfij5gC1rhimfMBNvxyNvgaCruzSrxyGS 3rFT3ECjxB7nhCHnwAUDwF91xF7XLCTThi9asF9T3qFftF7XfBLzgD OaYEtbslGS3uFT3ECjxB79gC01xF91xFGWLCPDgA0LIxY0hiRaYEd9 Lm9XfBLzgDOaYEtbslGS3uFT3ECjxB79gC01xF91xFGWLCPDgA0Lca tCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBae XatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbWexLMBbXgB d9gzLbvyNv2CaeHbl7mZLdGeaGqiVCI8FfYJH8YrFfeuY=Hhbbf9v8 qqaqFr0xc9pk0xbba9q8WqFfeaY=biLkVcLq=JHqpepeea0=as0Fb9 pgeaYRXxe9vr0=vr0=vqpWqaaeaabiGaciaacaqabeaadaqaaqaafa Gcbaaeaaaaaaaaa8qacqqHuoarcaWGhbWdamaaBaaaleaapeGaae4C aiaabchacaqGHbGaae4yaiaabMgacaqGUbGaae4zaaWdaeqaaOWdbi aab2dacaWGdbWdamaaBaaaleaapeGaaGymaaWdaeqaaOWdbmaabmaa paqaa8qacaWGtbGaeyOeI0Iaam4ua8aadaWgaaWcbaWdbiaab+gaca qGWbGaaeiDaaWdaeqaaaGcpeGaayjkaiaawMcaa8aadaahaaWcbeqa a8qacaaIYaaaaOGaey4kaSIaam4qa8aadaWgaaWcbaWdbiaaikdaa8 aabeaak8qadaqadaWdaeaapeGaam4uaiabgkHiTiaadofapaWaaSba aSqaa8qacaqGVbGaaeiCaiaabshaa8aabeaaaOWdbiaawIcacaGLPa aaaaa@95B4@

where sopt = 5 nt, c1 = 0.048 kcal/mol/nt2, and c2 = 0.24 kcal/mol/nt. When the 30S complex

is compressed (s < 5 nt), the ΔGspacing is calculated according to the sigmoidal function,

Δ G spacing = C 1 [ 1+exp( C 2 ( S S opt +2 ) ) ] 3 MathType@MTEF@5@5@+= feaagKart1ev2aaaMPkvLHfij5gC1rhimfMBNvxyNvgaCruzSrxyGS 3rFT3ECjxB7nhCHnwAUDwF91xF7XLCTThi9asF9XLzYf2y7T3Ed9vm 91xF7T3E7XfBLzgDBbYEXaYkGWvz4bhiCXwzMrhkGS3Ed9Lm9XfBLz gDOaYEtbslGS3uFT3ECjxB79gC01xF91hiRaIm9bcxYL2zOrxk9bcx YL2zOrxk9bcxYL2zOrxx951m91xFamXvP5wqSXMqHnxAJn0BKvguHD wzZbqefqvATv2CG4uz3bIuV1wyUbqedmvETj2BSbqefm0B1jxALjhi ov2DaebbnrfifHhDYfgatCvAUfeBSn0BKvguHDwzZbqegSSZmxoasa acH8YjY=vipgYlh9vqqj=hEeeu0xXdbba9frFj0=OqFfea0dXdd9vq ai=hGuQ8kuc9pgc9q8qqaq=dir=f0=yqaiVgFr0xfr=xfr=xb9adba qaaeGaciGaaiaabeqaamaabaabauaakeaaqaaaaaaaaaWdbiabfs5a ejaadEeapaWaaSbaaSqaa8qacaqGZbGaaeiCaiaabggacaqGJbGaae yAaiaab6gacaqGNbaapaqabaGcpeGaaeypamaalaaapaqaa8qacaWG dbWdamaaBaaaleaapeGaaGymaaWdaeqaaaGcbaWdbmaadmaapaqaa8 qacaaIXaGaey4kaSIaciyzaiaacIhacaGGWbWaaeWaa8aabaWdbiaa doeapaWaaSbaaSqaa8qacaaIYaaapaqabaGcpeWaaeWaa8aabaWdbi aadofacqGHsislcaWGtbWdamaaBaaaleaapeGaae4BaiaabchacaqG 0baapaqabaGcpeGaey4kaSIaaGOmaaGaayjkaiaawMcaaaGaayjkai aawMcaaaGaay5waiaaw2faa8aadaahaaWcbeqaa8qacaaIZaaaaaaa aaa@9F41@

where c1 = 12.2 kcal/mol and c2 = 2.5 nt -1.

ΔGstart.The ΔGstart is -1.19 and -0.075 kcal/mol for AUG and GUG start codons, respectively


The translation initiation rate of the second CDs is then calculated by summing together two sources of translational coupling: ribosome re-initiation and ribosome de novo initiation according to

r 2 α r reinitiation ( 2 ) +exp( βΔ G total (2) ) MathType@MTEF@5@5@+= feaagKart1ev2aaaMDjvLHfij5gC1rhimfMBNvxyNvga7ThxY12EY1 xFFjtFCfgBWHwyGShxY12EY1xFFT3ECjxB7jxzP5wA0Lwy0L2BU1xF 951ECXwzMrhkGidiCjxANHgDP0hiRacxLHhCGWfBLzgDOaYEG0ciCj wz0fgiCruzSrxyGC0x7ThxY12E09gDHXwF91Nx7HImP0xFGWLCPDgA 0LcatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTf MBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbWexLMB bXgBd9gzLbvyNv2CaeHbl7mZLdGeaGqiVCI8FfYJH8YrFfeuY=Hhbb f9v8qqaqFr0xc9pk0xbba9q8WqFfeaY=biLkVcLq=JHqpepeea0=as 0Fb9pgeaYRXxe9vr0=vr0=vqpWqaaeaabiGaciaacaqabeaadaqaaq aafaGcbaaeaaaaaaaaa8qacaqGYbWdamaaBaaaleaapeGaaGOmaaWd aeqaaOWdbiabeg7aHjaabkhapaWaa0baaSqaa8qacaqGYbGaaeyzai aabMgacaqGUbGaaeyAaiaabshacaqGPbGaaeyyaiaabshacaqGPbGa ae4Baiaab6gaa8aabaWdbmaabmaapaqaa8qacaaIYaaacaGLOaGaay zkaaaaaOGaey4kaSIaciyzaiaacIhacaGGWbWaaeWaa8aabaWdbiab gkHiTiabek7aIjabfs5aejaadEeapaWaa0baaSqaa8qacaqG0bGaae 4BaiaabshacaqGHbGaaeiBaaWdaeaapeGaaiikaiaaikdacaGGPaaa aaGccaGLOaGaayzkaaaaaa@9C3E@

The first part in formula can be calculated by the following formula:


Where the kreinitiation(d1,2) refers to the intergenic distance dependence and the kp refers to the proportionality constant between the ribosome assemble rate and the translation initiation rate.

For the kreinitiation(d1,2) is proved that can be calculate by the formula following:


Where the d=xstart-xstop-3, xstart refers to the first nucleotides in jth CDs’s start codon while the xstop refers to first nucleotides in i th CDs's stop codon.



Fig E. d>0 and d<0.


And it also points that the kp=10 .

The second part in formula can be calculated by the following formula:


Where the ΔG coupling refers to the free energy of folding for all inhibitory RNA structure that block the standby site, overlap with SD sequence, spacing region or the downstream footprint region of ribosome.(If we assume that elongating ribosomes are uniformly distributed across a coding sequence with a footprint of 30 nucleotides (fp), then each one will occupy a fraction of the coding sequence’s length (fp/L).)

The ΔG noncoupling refers to the free energy of all the other RNA structure except the inhibitory RNA structure. And the Fcoupling can be calculated by the following formula:

Fcoupling=1/[ 1+c*exp( β*ΔGtotal ) ] MathType@MTEF@5@5@+= feaagKart1ev2aaaM1gvLHfij5gC1rhimfMBNvxyNvga7XLCTTNrJ9 wDWXwAUDgi9asmV0xFCXwzMr3wGSxmGSciJPcxLHhCGWfBLzgDOaYE G0ciCjwz0fgiQWfrLXgDHbYr09gDHXwFGWLCPDgA0LsFGWLCPDgA01 fatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMB aeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbWexLMBbX gBd9gzLbvyNv2CaeHbl7mZLdGeaGqiVCI8FfYJH8YrFfeuY=Hhbbf9 v8qqaqFr0xc9pk0xbba9q8WqFfeaY=biLkVcLq=JHqpepeea0=as0F b9pgeaYRXxe9vr0=vr0=vqpWqaaeaabiGaciaacaqabeaadaqaaqaa faGcbaaeaaaaaaaaa8qacaqGgbGaae4yaiaab+gacaqG1bGaaeiCai aabYgacaqGPbGaaeOBaiaabEgacaqG9aGaaeymaiaab+cadaWadaWd aeaapeGaaGymaiabgUcaRiaadogacaGGQaGaciyzaiaacIhacaGGWb WaaeWaa8aabaWdbiabgkHiTiabek7aIjaacQcacqqHuoarcaWGhbGa amiDaiaad+gacaWG0bGaamyyaiaadYgaaiaawIcacaGLPaaaaiaawU facaGLDbaaaaa@8384@

Which the C is the ribosome-assisted unfolding coefficient. C=0.81 in our study.

We’ve understood about this thermodynamic model, but this model doesn’t apply for riboswitch. Because compared to normal 5’ UTR, the riboswitch’s structure will affect the upstream gene’s translation level in different ways. So we should build a new coupled translation model for bicistronic with riboswitch and use this new thermodynamic model to describe how the riboswitch affects the upstream and downstream gene’s translation level.



SECOND PART: THERMODYNAMIC MODEL WITH RIBOSWITCH


Fig F. The reaction coordinate diagram showing the states, energies, ligand-binding and translation initiation. Without ligand, the ribosome binds to a folded mRNA (state 1) with free energy ΔG total,OFF . When excess ligand is present, the co-transcriptional folding of mRNA and ligand through paths a ends with a mRNA–ligand complex (state 3) that binds the ribosome with free energy ΔG total,ON . The stability of the mRNA–ligand complex is controlled by the switching free energy (ΔG mRNA + ΔG ligand ).



As it shows in the Fig F, when there is no ligand, the conformational change leads to ΔG total,OFF, while ligands are excess, the conformational change leads to ΔG total,ON. In the later cases, it could be divided into three conditions. First, excessive concentration of intracellular ligands and stable binding of mRNA molecules to ligands. Second, the concentration of ligand is small, and the mRNA molecule can bind to ligand stably. Third, mRNA molecules cannot bind to ligands stably.

First situation: Excessive concentration of intracellular ligands and stable binding of mRNA molecules to ligands


Fig G. The change of free energies during the ligand binding and ribosome 30S subunit binding to the mRNA:No ligand binds,Ligand binds.


As it shows in the Fig G, conformational changes lead to the changes of free energies. The initial state is the minimum energy state, when ribosome 30S subunit binds to mRNA, it will change to the final state. We focus on the energy difference of final and initial state, that is, ΔG total, because it is proportionable to our desired translation initiation rate(ron and roff).


When there is no lgand:

Δ G totalOFF =Δ G mRNArRNA +Δ G start +Δ G standby Δ G mRNA MathType@MTEF@5@5@+= feaagKart1ev2aaaMzivLHfij5gC1rhimfMBNvxyNvgaCruzSrxyGS 3rFThDVrxySbslG8Krg1xFG0diCruzSrxyGS3rFTxBs5uqYjLtb1xF GSciCruzSrxyGS3rFT3C0fMC01xFGSciCruzSrxyGS3rFT3CCrxyUb czILxF9bslGWfrLXgDHbYEh91ETjLtb1xFamXvP5wqSXMqHnxAJn0B KvguHDwzZbqefqvATv2CG4uz3bIuV1wyUbqedmvETj2BSbqefm0B1j xALjhiov2DaebbnrfifHhDYfgatCvAUfeBSn0BKvguHDwzZbqegSSZ mxoasaacH8YjY=vipgYlh9vqqj=hEeeu0xXdbba9frFj0=OqFfea0d Xdd9vqai=hGuQ8kuc9pgc9q8qqaq=dir=f0=yqaiVgFr0xfr=xfr=x b9adbaqaaeGaciGaaiaabeqaamaabaabauaakeaaqaaaaaaaaaWdbi abfs5aejaadEeapaWaaSbaaSqaa8qacaWG0bGaam4BaiaadshacaWG HbGaamiBaiabgkHiTiaad+eacaWGgbGaamOraaWdaeqaaOWdbiabg2 da9iabfs5aejaadEeapaWaaSbaaSqaa8qacaWGTbGaamOuaiaad6ea caWGbbGaamOCaiaadkfacaWGobGaamyqaaWdaeqaaOWdbiabgUcaRi abfs5aejaadEeapaWaaSbaaSqaa8qacaWGZbGaamiDaiaadggacaWG YbGaamiDaaWdaeqaaOWdbiabgUcaRiabfs5aejaadEeapaWaaSbaaS qaa8qacaWGZbGaciiDaiaacggacaGGUbGaamizaiaadkgacaWG5baa paqabaGcpeGaeyOeI0IaeuiLdqKaam4ra8aadaWgaaWcbaWdbiaad2 gacaWGsbGaamOtaiaadgeaa8aabeaaaaa@9CC5@

When ligand is binding:


The corresponding translation initiation rates r are:

r OFF exp( βΔ G total,OFF ) MathType@MTEF@5@5@+= feaagKart1ev2aaaMLgvLHfij5gC1rhimfMBNvxyNvga7ThxY12EY1 xFFT3tgzuF9bcxWj3BWr3BGWvz4bhiCXwzMrhkGShiTacxIvgDHbcx evgB0fgi7D0x7ThxY12E09gDHXglpzKr91xF91hiCjxANHgDPaWexL MBbXgBcf2CPn2qVrwzqf2zLnharuavP1wzZbItLDhis9wBH5garmWu 51MyVXgaruWqVvNCPvMCG4uz3bqee0evGueE0jxyamXvP5wqSX2qVr wzqf2zLnharyWYoZC5aibaieYlNi=xH8yiVC0xbbL8F4rqqrFfpeea 0xe9Lq=Jc9vqaqpepm0xbbG8FasPYRqj0=yi0dXdbba9pGe9xq=Jbb G8A8frFve9Fve9Ff0dmeaabaqaciGacaGaaeqabaWaaeaaeaqbaOqa aabaaaaaaaaapeGaaeOCa8aadaWgaaWcbaWdbiaad+eacaWGgbGaam OraaWdaeqaaOWdbiabg2Hi1kGacwgacaGG4bGaaiiCamaabmaapaqa a8qacqGHsislcqaHYoGycqqHuoarcaWGhbWdamaaBaaaleaapeGaae iDaiaab+gacaqG0bGaaeyyaiaabYgacaGGSaGaae4taiaabAeacaqG gbaapaqabaaak8qacaGLOaGaayzkaaaaaa@7D0C@

r ON exp( βΔ G total,ON ) MathType@MTEF@5@5@+= feaagKart1ev2aaaMDgvLHfij5gC1rhimfMBNvxyNvga7ThxY12EY1 xFFT3to1xFGWfCY9gC09giCvgEWbcxSvMz0Hci7bslGWLyLrxyGWfr LXgDHbYEh91E7XLCTThDVrxySXsF99Kt91xFGWLCPDgA0LcatCvAUf eBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxB I9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbWexLMBbXgBd9gzLb vyNv2CaeHbl7mZLdGeaGqiVCI8FfYJH8YrFfeuY=Hhbbf9v8qqaqFr 0xc9pk0xbba9q8WqFfeaY=biLkVcLq=JHqpepeea0=as0Fb9pgeaYR Xxe9vr0=vr0=vqpWqaaeaabiGaciaacaqabeaadaqaaqaafaGcbaae aaaaaaaaa8qacaqGYbWdamaaBaaaleaapeGaam4taiaad6eaa8aabe aak8qacqGHDisTciGGLbGaaiiEaiaacchadaqadaWdaeaapeGaeyOe I0IaeqOSdiMaeuiLdqKaam4ra8aadaWgaaWcbaWdbiaabshacaqGVb GaaeiDaiaabggacaqGSbGaaiilaiaad+eacaWGobaapaqabaaak8qa caGLOaGaayzkaaaaaa@7B0E@

In the riboswitch model, the calculating method for ΔGstart and ΔGspacing remains the same, however, there are quite a lot of changes for ΔGstandby, ΔGmRNA and ΔGmRNA:rRNA. The specific algorithm are listed as follows:

ΔGmRNA:rRNA and ΔGmRNA:ligand:rRNA: Using NuPACK ‘mfe’ algorithm to calculate the free energy when 16rRNA binds to mRNA.

ΔGmRNA : Using NuPACK ‘mfe’ algorithm to calculate the free energy of the initial state of riboswitch.

ΔGstandby: we use formula below to calculate this free energy:

Δ G standby =Δ G distortion +Δ G unfolding +Δ G sliding MathType@MTEF@5@5@+= feaagKart1ev2aaaM5jvLHfij5gC1rhimfMBNvxyNvgaCruzSrxyGS 3rFT3ECjxB7nxF9XfDH5gi7XLCTThzILxF91xF7XLCTThi9asF9Xfr LXgDHbYEh91E7XLCTThzPnhDVjhDP9MB91xF9bYkGWfrLXgDHbYEh9 1E7XLCTTxDUz2BSrwAUDwF91xFGSciCruzSrxyGS3rFT3ECjxB7nhB PrwAUDwF91xFamXvP5wqSXMqHnxAJn0BKvguHDwzZbqefqvATv2CG4 uz3bIuV1wyUbqedmvETj2BSbqefm0B1jxALjhiov2DaebbnrfifHhD YfgatCvAUfeBSn0BKvguHDwzZbqegSSZmxoasaacH8YjY=vipgYlh9 vqqj=hEeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9 q8qqaq=dir=f0=yqaiVgFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabe qaamaabaabauaakeaaqaaaaaaaaaWdbiabfs5aejaadEeapaWaaSba aSqaa8qacaqGZbGaciiDaiaacggacaGGUbGaaeizaiaabkgacaqG5b aapaqabaGcpeGaaeypaiabfs5aejaadEeapaWaaSbaaSqaa8qacaqG KbGaaeyAaiaabohacaqG0bGaae4BaiaabkhacaqG0bGaaeyAaiaab+ gacaqGUbaapaqabaGcpeGaey4kaSIaeuiLdqKaam4ra8aadaWgaaWc baWdbiaabwhacaqGUbGaaeOzaiaab+gacaqGSbGaaeizaiaabMgaca qGUbGaae4zaaWdaeqaaOWdbiabgUcaRiabfs5aejaadEeapaWaaSba aSqaa8qacaqGZbGaaeiBaiaabMgacaqGKbGaaeyAaiaab6gacaqGNb aapaqabaaaaa@A952@

Here, ΔG distortion describes ribosomal platform’s distortion energy penalty, and can be calculate by this:

Δ G distortion = C 1 ( A S ) 2 + C 2 ( A 2 )+ C 3 MathType@MTEF@5@5@+= feaagKart1ev2aaaMfivLHfij5gC1rhimfMBNvxyNvgaCruzSrxyGS 3rFT3ECjxB7rwAZr3BYrxAV5wF91xF7XLCTThi9asF9T3qFftF7XfB LzgDOaYE7f0xt1xFGWLCPDgA0LIxY0hiRaYEd9Lm9XfBLzgDOaYE7f 0xY0xFGWLCPDgA0LciRaYEd91m9bWexLMBbXgBcf2CPn2qVrwzqf2z LnharuavP1wzZbItLDhis9wBH5garmWu51MyVXgaruWqVvNCPvMCG4 uz3bqee0evGueE0jxyamXvP5wqSX2qVrwzqf2zLnharyWYoZC5aiba ieYlNi=xH8yiVC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbb G8FasPYRqj0=yi0dXdbba9pGe9xq=JbbG8A8frFve9Fve9Ff0dmeaa baqaciGacaGaaeqabaWaaeaaeaqbaOqaaabaaaaaaaaapeGaeuiLdq Kaam4ra8aadaWgaaWcbaWdbiaabsgacaqGPbGaae4CaiaabshacaqG VbGaaeOCaiaabshacaqGPbGaae4Baiaab6gaa8aabeaak8qacaqG9a Gaam4qa8aadaWgaaWcbaWdbiaaigdaa8aabeaak8qadaqadaWdaeaa peGaamyqa8aadaWgaaWcbaWdbiaadofaa8aabeaaaOWdbiaawIcaca GLPaaapaWaaWbaaSqabeaapeGaaGOmaaaakiabgUcaRiaadoeapaWa aSbaaSqaa8qacaaIYaaapaqabaGcpeWaaeWaa8aabaWdbiaadgeapa WaaSbaaSqaa8qacaaIYaaapaqabaaak8qacaGLOaGaayzkaaGaey4k aSIaam4qa8aadaWgaaWcbaWdbiaaiodaa8aabeaaaaa@8B1A@

As=15+D+PH MathType@MTEF@5@5@+= feaagKart1ev2aaaMDdvLHfij5gC1rhimfMBNvxyNvga7f0C7XLCTT hi9asF9ftn7XLCTThiRasF9reiRacuG0cii1hatCvAUfeBSjuyZL2y d9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9 wDYLwzYbItLDharqqtubsr4rNCHbWexLMBbXgBd9gzLbvyNv2CaeHb l7mZLdGeaGqiVCI8FfYJH8YrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbb a9q8WqFfeaY=biLkVcLq=JHqpepeea0=as0Fb9pgeaYRXxe9vr0=vr 0=vqpWqaaeaabiGaciaacaqabeaadaqaaqaafaGcbaaeaaaaaaaaa8 qacaWGbbGaam4Caiaab2dacaaIXaGaaGynaiaabUcacaWGebGaey4k aSIaamiuaiabgkHiTiaadIeaaaa@5A3C@

As means the available single-stranded surface area; H means the hairpin’s height; D&P stand for distal and proximal binding sites, while C1 , C2 and C3 are the fitted parameter values and equal to 0.038kcal/mol/nt2 ,-1.629kcal/mol/nt and 17.359kcal/mol.

ΔG sliding quantize the presence of a sliding mechanism for the ribosomal platform. It’s the energy penalty the intervening mRNA structures are not unfolded, but are pushed aside with an energetic cost. To calculate this energy, we sum up the heights of all downstream RNA structures, and multiplied it by the coefficient (c = 0.2 kcal/mol/nt) to yield ΔG sliding.

ΔG unfolding describes the free energy released by unfolding base pairs. According to Amin Espah Borujeni[1], Unfolding 1 bp increases the standby site module’s surface area by 3 nt and increases the unfolding energy penalty to 0.90 kcal/mol.

ΔGmRNA:ligand: In the case that we know the secondary structure of initial state of riboswitch in “ON” state, we use the RNAeval web server to calculates the energy of the ligand-binding riboswitch.


Second situation: the concentration of ligand is small, and the mRNA molecule can bind to ligand stably

During the ligand binding event, the mRNA folding conformation does switch from its minimum free energy at the OFF state to a more positive free energy at the ON state, resulting in a positive energy penalty ΔΔG mRNA

ΔΔ G mRNA =Δ G mRNA,ON Δ G mRNA,OFF MathType@MTEF@5@5@+= feaagKart1ev2aaaMbhvLHfij5gC1rhimfMBNvxyNvgaCruzSrxyGW frLXgDHbYEh91E7XLCTTxB91NuofuF9bspGWfrLXgDHbYEh91E7XLC TTxB91Nuofelp5uF9bslGWfrLXgDHbYEh91E7XLCTTxB91Nuofelpz Kr91hatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1B TfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbWexL MBbXgBd9gzLbvyNv2CaeHbl7mZLdGeaGqiVCI8FfYJH8YrFfeuY=Hh bbf9v8qqaqFr0xc9pk0xbba9q8WqFfeaY=biLkVcLq=JHqpepeea0= as0Fb9pgeaYRXxe9vr0=vr0=vqpWqaaeaabiGaciaacaqabeaadaqa aqaafaGcbaaeaaaaaaaaa8qacqqHuoarcqqHuoarcaWGhbWdamaaBa aaleaapeGaaeyBaiaadkfacaWGobGaamyqaaWdaeqaaOWdbiabg2da 9iabfs5aejaadEeapaWaaSbaaSqaa8qacaqGTbGaamOuaiaad6eaca WGbbGaaiilaiaad+eacaWGobaapaqabaGcpeGaeyOeI0IaeuiLdqKa am4ra8aadaWgaaWcbaWdbiaab2gacaWGsbGaamOtaiaadgeacaGGSa Gaam4taiaadAeacaWGgbaapaqabaaaaa@82D2@

This energy penalty can be compensated by the free energy input from the binding of ligand to the mRNA, ΔG ligand (negative value)

Δ G ligand =RTIn( K D ) MathType@MTEF@5@5@+= feaagKart1ev2aaaMLfvLHfij5gC1rhimfMBNvxyNvgaCruzSrxyGS 3rFT3ECjxB7XwANfMBK1xF913ECjxB7bspG0xFsrvs7XLCTTNB91hx SvMz0Hci7T3sFruF9bcxYL2zOrxkamXvP5wqSXMqHnxAJn0BKvguHD wzZbqefqvATv2CG4uz3bIuV1wyUbqedmvETj2BSbqefm0B1jxALjhi ov2DaebbnrfifHhDYfgatCvAUfeBSn0BKvguHDwzZbqegSSZmxoasa acH8YjY=vipgYlh9vqqj=hEeeu0xXdbba9frFj0=OqFfea0dXdd9vq ai=hGuQ8kuc9pgc9q8qqaq=dir=f0=yqaiVgFr0xfr=xfr=xb9adba qaaeGaciGaaiaabeqaamaabaabauaakeaaqaaaaaaaaaWdbiabfs5a ejaadEeapaWaaSbaaSqaa8qacaqGSbGaaeyAaiaabEgacaqGHbGaae OBaiaabsgaa8aabeaak8qacaqG9aGaamOuaiaadsfacaWGjbGaaeOB amaabmaapaqaa8qacaWGlbWdamaaBaaaleaapeGaamiraaWdaeqaaa GcpeGaayjkaiaawMcaaaaa@713B@

Here, R is the universal gas constant, while KD is the ligand’s dissociation constant

K D = C ligandfree C mRNAfree C mRNAligand C ligand,total C mRNA,free C mRNAligand MathType@MTEF@5@5@+= feaagKart1ev2aaaMHovLHfij5gC1rhimfMBNvxyNvga7T0xe1hi9a cxMjxyJT3E7n0x7ThxY12ESL2zH5gzMjxzL1xF913Ed91E7XLCTTxB 91Nuof0ECjxB7zMCLvwF91xF913E7T3qFT3ECjxB71wF9jLtbbslGS hxY12ESL2zH5gz91xF91xFGW1yV52zGS3qFT3ECjxB7XwANfMBKXcD VrxyS1xF91hxMjxyJT3E7n0x7ThxY12ET1xFs5uqSShxY12EMjxzL1 xF91xF9T3E7n0x7ThxY12ET1xFs5uqG0ci7XLCTThBPDwyUrwF91xF 91hatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTf MBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbWexLMB bXgBd9gzLbvyNv2CaeHbl7mZLdGeaGqiVCI8FfYJH8YrFfeuY=Hhbb f9v8qqaqFr0xc9pk0xbba9q8WqFfeaY=biLkVcLq=JHqpepeea0=as 0Fb9pgeaYRXxe9vr0=vr0=vqpWqaaeaabiGaciaacaqabeaadaqaaq aafaGcbaaeaaaaaaaaa8qacaWGlbWdamaaBaaaleaapeGaamiraaWd aeqaaOWdbiabg2da9maalaaapaqaa8qacaWGdbWdamaaBaaaleaape GaaeiBaiaabMgacaqGNbGaaeyyaiaab6gacaqGKbGaaeOzaiaabkha caqGLbGaaeyzaaWdaeqaaOWdbiaadoeapaWaaSbaaSqaa8qacaqGTb GaamOuaiaad6eacaWGbbGaaeOzaiaabkhacaqGLbGaaeyzaaWdaeqa aaGcbaWdbiaadoeapaWaaSbaaSqaa8qacaqGTbGaamOuaiaad6eaca WGbbGaeyOeI0IaaeiBaiaabMgacaqGNbGaaeyyaiaab6gacaqGKbaa paqabaaaaOWdbiabgwKiajaadoeapaWaaSbaaSqaa8qacaqGSbGaae yAaiaabEgacaqGHbGaaeOBaiaabsgacaGGSaGaaeiDaiaab+gacaqG 0bGaaeyyaiaabYgaa8aabeaak8qadaWcaaWdaeaapeGaam4qa8aada WgaaWcbaWdbiaab2gacaWGsbGaamOtaiaadgeacaGGSaGaaeOzaiaa bkhacaqGLbGaaeyzaaWdaeqaaaGcbaWdbiaadoeapaWaaSbaaSqaa8 qacaqGTbGaamOuaiaad6eacaWGbbGaeyOeI0IaaeiBaiaabMgacaqG NbGaaeyyaiaab6gacaqGKbaapaqabaaaaaaa@E016@

When the switching free energy ( ΔΔG mRNA +ΔG ligand ) is negative, the mRNA-ligand binding is stable (first situation), and when the summation is positive the binding becomes metastable (second situation).

For the second situation, ON state translation initiation rate, rON,conc is:


Third situation , mRNA molecules cannot bind to ligands stably.

For the third situation, ON state translation initiation rate, rON,actual is

r ON,actual = r OFF + r ON,CONC exp( β( ΔΔ G mRNA +Δ G ligand ) ) 1+exp( β( ΔΔ G mRNA +Δ G ligand ) ) MathType@MTEF@5@5@+= feaagKart1ev2aaaM9=sbqvzybssUbxD0bctH52z1f2zLbWE7XLCTT NC913x79KtSShxY12EHngD1fgB91xF9ThxY12EG0di91hxMjxyJT3E 7ThxY12EY1xFFT3tgzuF9bYkGS3ECjxB7jxF991Ep5eld9Ktd1xFCv gEWbcxSvMz0Hci7bslGWLyLrxyGWfBLzgDOaYECruzSrxyGWfrLXgD HbYEh91E7XLCTTxB91NuofuF9bYkGWfrLXgDHbYEh91E7XLCTThBPD wyUrwF91xF9bcxYL2zOrxk9bcxYL2zOrxk913E7fdiRacxLHhCGWfB LzgDOaYEG0ciCjwz0fgiCXwzMrhkGShxevgB0fgiCruzSrxyGS3rFT 3ECjxB71wF9jLtb1xFGSciCruzSrxyGS3rFT3ECjxB7XwANfMBK1xF 91xFGWLCPDgA0LsFGWLCPDgA0LsF9bWexLMBbXgBcf2CPn2qVrwzqf 2zLnharuavP1wzZbItLDhis9wBH5garmWu51MyVXgaruWqVvNCPvMC G4uz3bqee0evGueE0jxyamXvP5wqSX2qVrwzqf2zLnharyWYoZC5ai baieYlNi=xH8yiVC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0x bbG8FasPYRqj0=yi0dXdbba9pGe9xq=JbbG8A8frFve9Fve9Ff0dme aabaqaciGacaGaaeqabaWaaeaaeaqbaOqaaabaaaaaaaaapeGaaeOC a8aadaWgaaWcbaWdbiaad+eacaWGobGaaiilaiaabggacaqGJbGaae iDaiaabwhacaqGHbGaaeiBaaWdaeqaaOWdbiaab2dadaWcaaWdaeaa peGaaeOCa8aadaWgaaWcbaWdbiaad+eacaWGgbGaamOraaWdaeqaaO WdbiabgUcaRiaabkhapaWaaSbaaSqaa8qacaWGpbGaamOtaiaacYca caWGdbGaam4taiaad6eacaWGdbaapaqabaGcpeGaciyzaiaacIhaca GGWbWaaeWaa8aabaWdbiabgkHiTiabek7aInaabmaapaqaa8qacqqH uoarcqqHuoarcaWGhbWdamaaBaaaleaapeGaaeyBaiaadkfacaWGob GaamyqaaWdaeqaaOWdbiabgUcaRiabfs5aejaadEeapaWaaSbaaSqa a8qacaqGSbGaaeyAaiaabEgacaqGHbGaaeOBaiaabsgaa8aabeaaaO WdbiaawIcacaGLPaaaaiaawIcacaGLPaaaa8aabaWdbiaaigdacqGH RaWkciGGLbGaaiiEaiaacchadaqadaWdaeaapeGaeyOeI0IaeqOSdi 2aaeWaa8aabaWdbiabfs5aejabfs5aejaadEeapaWaaSbaaSqaa8qa caqGTbGaamOuaiaad6eacaWGbbaapaqabaGcpeGaey4kaSIaeuiLdq Kaam4ra8aadaWgaaWcbaWdbiaabYgacaqGPbGaae4zaiaabggacaqG UbGaaeizaaWdaeqaaaGcpeGaayjkaiaawMcaaaGaayjkaiaawMcaaa aaaaa@0567@

ΔΔ G mRNA =Δ G mRNA,ON Δ G mRNA,OFF MathType@MTEF@5@5@+= feaagKart1ev2aaaMbhvLHfij5gC1rhimfMBNvxyNvgaCruzSrxyGW frLXgDHbYEh91E7XLCTTxB91NuofuF9bspGWfrLXgDHbYEh91E7XLC TTxB91Nuofelp5uF9bslGWfrLXgDHbYEh91E7XLCTTxB91Nuofelpz Kr91hatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1B TfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbWexL MBbXgBd9gzLbvyNv2CaeHbl7mZLdGeaGqiVCI8FfYJH8YrFfeuY=Hh bbf9v8qqaqFr0xc9pk0xbba9q8WqFfeaY=biLkVcLq=JHqpepeea0= as0Fb9pgeaYRXxe9vr0=vr0=vqpWqaaeaabiGaciaacaqabeaadaqa aqaafaGcbaaeaaaaaaaaa8qacqqHuoarcqqHuoarcaWGhbWdamaaBa aaleaapeGaaeyBaiaadkfacaWGobGaamyqaaWdaeqaaOWdbiabg2da 9iabfs5aejaadEeapaWaaSbaaSqaa8qacaqGTbGaamOuaiaad6eaca WGbbGaaiilaiaad+eacaWGobaapaqabaGcpeGaeyOeI0IaeuiLdqKa am4ra8aadaWgaaWcbaWdbiaab2gacaWGsbGaamOtaiaadgeacaGGSa Gaam4taiaadAeacaWGgbaapaqabaaaaa@82D2@

After figuring about all the three situations, we decide to combine the riboswitch part and bicistronic part together to yield the translation initiation rate (TIR) for our own GOI.

As we talk about in the first part, r2 can be calculated like this:

r 2 =r reintiation ( 2 ) + r 1 MathType@MTEF@5@5@+= feaagKart1ev2aaaMLhvLHfij5gC1rhimfMBNvxyNvga7ThxY12EY1 xFFjtF7XLCTThi9aIC913x7ThxY12EYvwF9ThxTfgDO9gC7XLCTbsA UrxF9bYECjxB7Lwy0L2BU1xF951ECXwzMrhkGidiCjxANHgDP0hiRa YE7XLCTTNC913xX0hatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwB Lnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtub sr4rNCHbWexLMBbXgBd9gzLbvyNv2CaeHbl7mZLdGeaGqiVCI8FfYJ H8YrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfeaY=biLkVcLq =JHqpepeea0=as0Fb9pgeaYRXxe9vr0=vr0=vqpWqaaeaabiGaciaa caqabeaadaqaaqaafaGcbaaeaaaaaaaaa8qacaqGYbWdamaaBaaale aapeGaaGOmaaWdaeqaaOWdbiaab2dacaqGYbWdamaaDaaaleaapeGa aeOCaiaabwgacaqGPbGaaeOBaiaabshacaqGPbGaaeyyaiaabshaca qGPbGaae4Baiaab6gaa8aabaWdbmaabmaapaqaa8qacaaIYaaacaGL OaGaayzkaaaaaOGaey4kaSIaaeOCa8aadaWgaaWcbaWdbiaaigdaa8 aabeaaaaa@828D@

r 1 =exp( βΔ G total ( 2 ) ) MathType@MTEF@5@5@+= feaagKart1ev2aaaMnhvLHfij5gC1rhimfMBNvxyNvga7ThxY12EY1 xFFftF7XLCTThi9asF9Xvz4bhiCXwzMrhkGShiTacxIvgDHbcxevgB 0fgih91E7XLCTThDVrxyS1xF951ECXwzMrhkGidiCjxANHgDP0xFGW LCPDgA0LcatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2D Gi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHb WexLMBbXgBd9gzLbvyNv2CaeHbl7mZLdGeaGqiVCI8FfYJH8YrFfeu Y=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfeaY=biLkVcLq=JHqpepe ea0=as0Fb9pgeaYRXxe9vr0=vr0=vqpWqaaeaabiGaciaacaqabeaa daqaaqaafaGcbaaeaaaaaaaaa8qacaqGYbWdamaaBaaaleaapeGaaG ymaaWdaeqaaOWdbiaab2daciGGLbGaaiiEaiaacchadaqadaWdaeaa peGaeyOeI0IaeqOSdiMaeuiLdqKaam4ra8aadaqhaaWcbaWdbiaabs hacaqGVbGaaeiDaiaabggacaqGSbaapaqaa8qadaqadaWdaeaapeGa aGOmaaGaayjkaiaawMcaaaaaaOGaayjkaiaawMcaaaaa@7D59@

r reinitiation (2) =k p k reinitiati on ( d 1,2 )exp(βΔ C total (1) ) MathType@MTEF@5@5@+= feaagKart1ev2aaaMDlvLHfij5gC1rhimfMBNvxyNvga7XLCTTNC91 3x7ThxY12EYvwAULgDPfgDP9MB91xFEThkYKsF7XLCTThi9asF9T3E CjxB7TwF991ECjxB7bxF913E7XLCTT3A913x7ThxY12EYvwAULgDPf gDP1xF913x7ThxY12EV5wF91hk7ThxY12EK1xFFTxmSitF9LcxLHhC GGciTacxIvgDHbcxevgB0fgid91E09gDHXwFEThkXKsFPaWexLMBbX gBcf2CPn2qVrwzqf2zLnharuavP1wzZbItLDhis9wBH5garmWu51My VXgaruWqVvNCPvMCG4uz3bqee0evGueE0jxyamXvP5wqSX2qVrwzqf 2zLnharyWYoZC5aibaieYlNi=xH8yiVC0xbbL8F4rqqrFfpeea0xe9 Lq=Jc9vqaqpepm0xbbG8FasPYRqj0=yi0dXdbba9pGe9xq=JbbG8A8 frFve9Fve9Ff0dmeaabaqaciGacaGaaeqabaWaaeaaeaqbaOqaaaba aaaaaaaapeGaaeOCa8aadaqhaaWcbaWdbiaabkhacaqGLbGaaeyAai aab6gacaqGPbGaaeiDaiaabMgacaqGHbGaaeiDaiaabMgacaqGVbGa aeOBaaWdaeaapeGaaiikaiaaikdacaGGPaaaaOGaaeypaiaabUgapa WaaSbaaSqaa8qacaqGWbaapaqabaGcpeGaae4Aa8aadaWgaaWcbaWd biaabkhacaqGLbGaaeyAaiaab6gacaqGPbGaaeiDaiaabMgacaqGHb GaaeiDaiaabMgaa8aabeaakmaaBaaaleaapeGaae4Baiaab6gaa8aa beaak8qacaGGOaGaaeiza8aadaWgaaWcbaWdbiaaigdacaGGSaGaaG OmaaWdaeqaaOWdbiaacMcaciGGLbGaaiiEaiaacchacaGGOaGaeyOe I0IaeqOSdiMaeuiLdqKaam4qa8aadaqhaaWcbaWdbiaadshacaWGVb GaamiDaiaadggacaWGSbaapaqaa8qacaGGOaGaaGymaiaacMcaaaGc caGGPaaaaa@BBCB@

Here, we regard the TIR of riboswitch (rON) as r1, so as to build a connection between the riboswitch model and bicistronic thermodynamic model.

Depend on the riboswitch model and bicistronic thermodynamic model , we can design different tuners and predict different tuners' functions under the riboswitch easily. you only give the sequences, the thermodynamic model will use this sequences to calculate the sequences' energy. This energy will predict the GOI's initial rate of translation.

The following calculation example is given:


Fig H. Division of tunerE region.


From Fig H,we can see that there are a lot of regions from the tunerE which are mentioned before.

The results calculated by the program are shown in the following Fig I:



Fig I. Individual free energy values associated with tunerE

We compared the predicted translation rate of all tuner with the experimental results, and the results are as follows:



Fig J.Comparison between experimental data and model results



Now, we know how to calculate the energy of the tuner, but how to get the suitable tuner’s sequence is a important problem for us. In our work, we use python to get the target sequence. First, we generated a random sequence which consists of the max(x, nstart-35) to nstart+35 nucleotides. The nstart is the position of a start codon. The x is the first base of the sequence. The reason why we choose a sequence longer than nstart-35 is that we need to generate a stable secondary structure to hide the RBS. It so happened that this generated structure’s length is longer than nstart- 35 to nstart+35 nucleotides.Second, we set a target energy of the tuner. The software will defined the different regions of tuner which we mentioned before. And then the software will use ‘NuPACK’ algorithm to calculate the corresponding energy. This energy will compared with the goal energy. If they are too far apart, the software will change the common RBS, repressing region and coupling region’s sequences to keep approaching the target energy until the gap is small.

However, this thermodynamic model can not accurately predict the translation level. The translation rate is indeed related to the translation initiation rate, but the translation elongation rate also has great influence on the translation level. Therefore, to solve this problem, we introduce model relative codon bias(RCB).

There are a number of measures currently in use that quantify codon usage in genes. Based on the hypothesis that gene expressivity and codon composition is strongly correlated, RCB has been defined to provide an intuitively meaningful measure of an extent of the codon preference in a gene. This model was confirmed in Escherichia coli. Depend on this model we can assess the strength of RCB(RCBS) in genes as a guide to the gene’s expression level. RCBS is the overall score of a gene indicating the influence of RCB of each codon in a gene.

The RCBS in estimating gene expression level is related to codon usage difference of a gene with respect to biased nucleotide composition at the three codin sites. Let f(x,y,z) be the normalized codon frequency for the codon triplet(x,y,z) of a gene. Then the RCB of a codon triplet (x,y,z) in a gene is defined as:

d xyz = f( x,y,z ) f 1 ( x ) f 2 ( y ) f 3 ( z ) f 1 ( x ) f 2 ( y ) f 3 ( z ) MathType@MTEF@5@5@+= feaagKart1ev2aaaMvmvLHfij5gC1rhimfMBNvxyNvga7r2x7HxE61 xFG0diCzMCHn2E7zgxSvMz0Hci7Hhl5XIE9bcxYL2zOrxkG0ci7z2x X0hxSvMz0Hci4bcxYL2zOrxk7z2xY0hxSvMz0Hci5bcxYL2zOrxk7z 2xZ0hxSvMz0Hci6bcxYL2zOrxk913E7TNzFftFCXwzMrhkGGhiCjxA NHgDPSNzFjtFCXwzMrhkGKhiCjxANHgDPSNzFntFCXwzMrhkGOhiCj xANHgDP0xFamXvP5wqSXMqHnxAJn0BKvguHDwzZbqefqvATv2CG4uz 3bIuV1wyUbqedmvETj2BSbqefm0B1jxALjhiov2DaebbnrfifHhDYf gatCvAUfeBSn0BKvguHDwzZbqegSSZmxoasaacH8YjY=vipgYlh9vq qj=hEeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9q8 qqaq=dir=f0=yqaiVgFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqa amaabaabauaakeaaqaaaaaaaaaWdbiaadsgapaWaaSbaaSqaa8qaca WG4bGaamyEaiaadQhaa8aabeaak8qacqGH9aqpdaWcaaWdaeaapeGa amOzamaabmaapaqaa8qacaWG4bGaaiilaiaadMhacaGGSaGaamOEaa GaayjkaiaawMcaaiabgkHiTiaadAgapaWaaSbaaSqaa8qacaaIXaaa paqabaGcpeWaaeWaa8aabaWdbiaadIhaaiaawIcacaGLPaaacaWGMb WdamaaBaaaleaapeGaaGOmaaWdaeqaaOWdbmaabmaapaqaa8qacaWG 5baacaGLOaGaayzkaaGaamOza8aadaWgaaWcbaWdbiaaiodaa8aabe aak8qadaqadaWdaeaapeGaamOEaaGaayjkaiaawMcaaaWdaeaapeGa amOza8aadaWgaaWcbaWdbiaaigdaa8aabeaak8qadaqadaWdaeaape GaamiEaaGaayjkaiaawMcaaiaadAgapaWaaSbaaSqaa8qacaaIYaaa paqabaGcpeWaaeWaa8aabaWdbiaadMhaaiaawIcacaGLPaaacaWGMb WdamaaBaaaleaapeGaaG4maaWdaeqaaOWdbmaabmaapaqaa8qacaWG 6baacaGLOaGaayzkaaaaaaaa@B0E0@

where f1(x) is the normalized frequency of x at the first codon position, f2(y) is the normalized frequency of y at the second codon position, and f3(z) is the normalized frequency of z at the third codon position of the gene. The frequencies f1, f2, f3 have been derived from the set of codon samples of a gene and the normalization of frequency is done over the gene length in codons, in an attempt to compensate for the expected increase of RCB with the total number of codons. We quantify the degree of codon bias of a gene in such a way that comparisons can be made both within and between genomes. As defined earlier, dxyz contains somewhat more quantitative information than others, since it considers codon usage as well as the base compositional bias. Then the expression measure of a gene is

RCBS= ( i=1 L ( 1+ d xyz i ) ) 1/L 1 MathType@MTEF@5@5@+= feaagKart1ev2aaaMbhvLHfij5gC1rhimfMBNvxyNvgasnKqtbspGS hxSvMz0Hci7XfCY9gzCXwATLgDZ91EPbspGetFEXei7XfBLzgDOaYE XaYkGq2x7HxE61NxP1hiCjxANHgDP0hi9bcxYL2zOrxkETxmVWuF9b slGedatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1B TfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbWexL MBbXgBd9gzLbvyNv2CaeHbl7mZLdGeaGqiVCI8FfYJH8YrFfeuY=Hh bbf9v8qqaqFr0xc9pk0xbba9q8WqFfeaY=biLkVcLq=JHqpepeea0= as0Fb9pgeaYRXxe9vr0=vr0=vqpWqaaeaabiGaciaacaqabeaadaqa aqaafaGcbaaeaaaaaaaaa8qacaWGsbGaam4qaiaadkeacaWGtbGaey ypa0ZaaeWaa8aabaWdbmaarahapaqaa8qadaqadaWdaeaapeGaaGym aiabgUcaRiaadsgapaWaa0baaSqaa8qacaWG4bGaamyEaiaadQhaa8 aabaWdbiaadMgaaaaakiaawIcacaGLPaaaaSWdaeaapeGaamyAaiab g2da9iaaigdaa8aabaWdbiaadYeaa0Gaey4dIunaaOGaayjkaiaawM caa8aadaahaaWcbeqaa8qacaaIXaGaai4laiaadYeaaaGccqGHsisl caaIXaaaaa@7E88@

where d i xyz MathType@MTEF@5@5@+= feaagKart1ev2aaaMzcvLHfij5gC1rhimfMBNvxyNvga7rMx7XLCTT xA91xFFThE5PxFamXvP5wqSXMqHnxAJn0BKvguHDwzZbqefqvATv2C G4uz3bIuV1wyUbqedmvETj2BSbqefm0B1jxALjhiov2DaebbnrfifH hDYfgatCvAUfeBSn0BKvguHDwzZbqegSSZmxoasaacH8YjY=vipgYl h9vqqj=hEeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pg c9q8qqaq=dir=f0=yqaiVgFr0xfr=xfr=xb9adbaqaaeGaciGaaiaa beqaamaabaabauaakeaaqaaaaaaaaaWdbiaadsgapaWaaWbaaSqabe aapeGaaeyAaaaak8aadaWgaaWcbaWdbiaadIhacaWG5bGaamOEaaWd aeqaaaaa@53D6@ is the codon usage difference of ith codon of a gene. L is the number of codons in the gene.

Our analysis is based on the hypothesis that RCB reflects the level of gene expression. The expression measure of a gene in this approach is denoted by RCBS. RCBS s thus useful for comparing different sets of genes. In most of the case, highly expressed genes have RCBS >0.5,the majority of gene (63%) have RCBS values lying between 0.2 and 0.4 and RCBS value close to 0 indicates a lack of bias for the codons.

Here is a example:

GOI sequence: AAA CGA CAC AGG AAA, L=5


Finally, by combing the thermodynamic model with RCBS, a new formula describing the translation rate is obtained:

R( translationrate )=r2*RCBS MathType@MTEF@5@5@+= feaagKart1ev2aaaM9dvLHfij5gC1rhimfMBNvxyNvgasXfBLzgDOa YE0jxyUnhBHrxAV5MCHrxz9bcxYL2zOrxkG0diYjJksnKqtbWexLMB bXgBcf2CPn2qVrwzqf2zLnharuavP1wzZbItLDhis9wBH5garmWu51 MyVXgaruWqVvNCPvMCG4uz3bqee0evGueE0jxyamXvP5wqSX2qVrwz qf2zLnharyWYoZC5aibaieYlNi=xH8yiVC0xbbL8F4rqqrFfpeea0x e9Lq=Jc9vqaqpepm0xbbG8FasPYRqj0=yi0dXdbba9pGe9xq=JbbG8 A8frFve9Fve9Ff0dmeaabaqaciGacaGaaeqabaWaaeaaeaqbaOqaaa baaaaaaaaapeGaamOuamaabmaapaqaa8qacaWG0bGaamOCaiaadgga caWGUbGaam4CaiaadYgacaWGHbGaamiDaiaadMgacaWGVbGaamOBai aadkhacaWGHbGaamiDaiaadwgaaiaawIcacaGLPaaacqGH9aqpcaWG YbGaaGOmaiaacQcacaWGsbGaam4qaiaadkeacaWGtbaaaa@6CD6@

Because of this new formula, we can predict the translation rate of the target gene more accurately that the thermodynamic model.





PART 3: STABILIZER

After we overcome the problem of designing tuner, we also need to select the appropriate stabilizer clearly.

As we all konw, different riboswitches need different stabilizers to stable its second structure. If someone want to use a new riboswitch or they want to find out the shortest stabilizer’s length, we can use the docking matrix to get target stabilizer.

The two sides of the docking matrix consist of sequences of riboswitch. If the bases on both sides are paired, then we put a 1 in the matrix. If the bases on both sides are not paired, then we put a 0 in the matrix. By predicting the secondary structure of a riboswitch in a sequence of only riboswitch. Thus, we have a matrix of 0, 1 and riboswitch sequences. We call this matrix the Ribo. From this matrix we get a total target value i(that is, all zeros and ones added together).


Fig A. The types and composition of docking matrix.


Next, we should find the appropriate GOI downstream of the riboswitch by blast or paper. We added this GOI to the riboswitch sequence. On this basis, we predict the secondary structure of the riboswitch. We can get a new matrix called Ribo-GOI. From this matrix we get a total target value j(that is, all zeros and ones added together).Finally,We added this GOI in multiples of three to the riboswitch sequence to predict the sequence pairing of riboswitch by adding the stabilizer. This new matrix called Ribo-STA.From this matrix we get a total target value k(that is, all zeros and ones added together). If the values of the same position of the matrix are the same(i.e., both are 0 or 1), we consider the position to be structurally correct.If the value are different at the same place in the matrix(i.e., not all of them are 0 or 1), we consider the riboswitch sequence to be mismatched. So, in the corresponding position in this matrix, we’re going to give it very high Penalty value to alert us.Therefore we also get a new total value k· from the Ribo-STA.

In conclusion, we get there matrices, Ribo, Ribo-GOI and Ribo-STA. By comparing three kinds of matrix, we select the most suitable stabilizer.We will discuss it in two ways:

If i-j≠0:

This represents a change in the secondary structure of the riboswitch by adding the GOI.However, STA of blast or paperbase has been proved to be successful in stabilizing secondary structure of riboswitch.Therefore, although its structure is not the best, it is relatively stable. So we calculate the different between i and k and i and j by the residual sum of squares . We want GOAL2/GOAL1 to be close to1 or 0. Close to 1 represents that the introduction of stabilizer’s stable effect is close to blast or paperbase. Close to 0 represents that the introduction of stabilizer’s stable effect is better than blast or paperbase. So on this comparison method, we can choose the relatively suitable stabilizer.

GOAL1=( i-j )**2,GOAL2=( i-k )**2 MathType@MTEF@5@5@+= feaagKart1ev2aaaMvgvLHfij5gC1rhimfMBNvxyNvgah9uqmfdi9a cxSvMz0Hci7L2ECjxB7bslGOwF91hiCjxANHgDPOIkYWYrpfetYasp GWfBLzgDOaYE7XLCTTxAG0ciR1xF9bcxYL2zOrxkQOImamXvP5wqSX MqHnxAJn0BKvguHDwzZbqefqvATv2CG4uz3bIuV1wyUbqedmvETj2B Sbqefm0B1jxALjhiov2DaebbnrfifHhDYfgatCvAUfeBSn0BKvguHD wzZbqegSSZmxoasaacH8YjY=vipgYlh9vqqj=hEeeu0xXdbba9frFj 0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9q8qqaq=dir=f0=yqaiVgFr 0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabauaakeaaqaaa aaaaaaWdbiaadEeacaWGpbGaamyqaiaadYeacaaIXaGaeyypa0Zaae Waa8aabaWdbiaadMgacaqGTaGaaeOAaaGaayjkaiaawMcaaiaacQca caGGQaGaaGOmaiaacYcacaWGhbGaam4taiaadgeacaWGmbGaaGOmai abg2da9maabmaapaqaa8qacaqGPbGaaeylaiaabUgaaiaawIcacaGL PaaacaGGQaGaaiOkaiaaikdaaaa@7782@


If i-j=0:

This indicates that the introduction of GOI stabilizes the secondary structure of the riboswitch. So

We divide i by k, and if it’s 1, that means the Ribo-STA corresponds to adding sequence is the most suitable stabilizer. If not, we’re going to pick the one that approaches 1.

Therefore, we translate the above procedure into code. with the help of python, we can get the suitable stabilizer sequence.we called them STA9, STA21, STA81 and STA129. Among them, the program tells us that STA9 and STA21 stabilizer changed the secondary structure of riboswitch, they all have high levels of punishment , in contrast, STA81 and STA129 both close to 1 ,STA81 and STA129 stable the secondary structure of riboswitch. From the following experimental result, we can successfully verify the correctness of the theory and program.












PART 4: ASRNA

In order to regulate the riboswitch’s function more plentiful , we want to use asRNA to re-activate or re-repress the riboswitch by hiding or exposing the RBS. The mechanism is shown in the figure below. And we can also discover that asRNA contains a hfq binding site. The hfe binding site will recruit a hfq protein to stable the asRNA. Under the hfq protein’s protection, the asRNA prevents degradation.


Fig A. Principle of activation and repressing of asRNA.



FIRST PART : REPRESSING ASRNA

A challenge in developing synthetic repressing asRNA design rules is a wide range of parameters that need to be considered: an Hfq binding site, thermodynamics, the asRNA length, the number of mismatches, and the binding location. Studying one parameter in isolation can be difficult because of the interdependence of each of these parameters. For example, changing the asRNA length or the number of mismatches can affect the thermodynamics of asRNA-mRNA interaction.


Fig B. Synthetic repressing asRNA design rules.


From this picture, we can know the asRNA design rules. The first rule is choose the best Hfq binding site. There are four Hfq binding sites, and the Spot42 had the highest overall repression with an average of 76.9% repression for all five TBRs. The MicC,MicF, and MicF M7.4 binding sites also performed well, with average repressions of 67.2%, 71.6%, and 69.8%, respectively. Because orthogonality is critical to the function of complex gene circuits and the MicF M7.4 had the lowest off-target effects.Finally we choose the Micf M7.4 Hfq binding site.

The second rule is about target binding region design.The asRNAs are composed of two regions: the TBR and the Hfq binding site.The goal is to develop a list of design rules that will allow for the de novo design of asRNAs that can achieve high levels of target repression, high orthogonality, and generally predictable behavior. The following categories of design parameters were investigated: (1) target location, (2)mismatch, (3) length, (4) thermodynamics.

target location.The first design characteristic that was varied was the binding location of the asRNA. Several previous studies have shown that asRNAs are most effective when they target the translation initiation region (TIR).And the TIR was divided into four regions (USD, SD,AUG, and C2-8).The first region is the portion of the 5′ untranslated region (5′UTR) that is upstream of the Shine Dalgarno sequence, and is hereafter referred to as the Upstream Shine Dalgarno (USD).The second region contains the Shine Dalgarno (SD) sequence in addition to the nucleotides between the SD sequence and the start codon. This region is simply referred to as the Shine Dalgarno region (SD),to indicate that it includes the SD sequence.The third region is the three-nucleotide start codon (AUG), and the fourth region is codons 2-8 in the coding region (C2-8).and then each TBR was designed to target one of seven different combinations of these four regions.TBRs were designed to target one of seven mRNA locations.The binding location has no impact on the repression efficiency of the asRNA, as each location showed approximately equal percent repression.


Fig C. Repressing asRNA's target region.


Mismatch. Though most engineered TBRs are perfectly complementary to their target gene, asRNAs in natural systems bind imperfectly to their target, resulting in several shorter (8-9 nt) regions of dsRNA. Because many naturally occurring asRNAs have some degree of mismatch,which raises the question as to what the maximum mismatch tolerance is for effective repression. The knowledge that TBRs with up to 15% mismatch can still effectively repress gene expression will be a helpful design guideline in preventing off-target effects.We introduced the mismatch by deleting nucleotides from the TBR sequence in order to introduce mismatches into the asRNA-mRNA complex.

Length. AsRNAs that had at least 15 nt of dsRNA had significantly higher target gene repression than asRNAs with less than 15 nt of dsRNA. Someone showing the correlation between percent repression and maximum dsRNA length in the asRNA-mRNA complex shows a sharp increase in repression, followed by a plateau in repression after approximately 15 nt of dsRNA.


Fig D. Repressing asRNA's length.


Thermodynamics. Thermodynamics were found to have a significant impact on repressing asRNA function. This result is expected, since several studies have already identified the importance of thermodynamics to the function of RNA regulators. In this study, two thermodynamic parameters were considered. First,the ΔG of the asRNA-mRNA complex was taken into consideration (ΔG Complex). This accounted for intermolecular forces between the asRNA and mRNA, and because this structure was designed to be stable, this value was very negative.This parameter was estimated using Mfold, as described previously.Second, the difference between the ΔG of the TBR (ΔG TBR), which was unstructured and generally very close to zero, and the ΔG of the asRNA-mRNA complex was calculated and called ΔG Complex Formation (ΔG CF). Lower ΔG values, which indicate more thermodynamically favorable structures, were correlated with higher repression, as expected.AsRNAs with ΔG CF values that were less than -40 kcal/mol had significantly higher repression than asRNAs whose ΔG CF values were higher than -40 kcal/mol.

In order to predict the asRNA’s repression efficiency, we use a multivariate model.

F( X 1 X 2 )=[ 0.38480.0068 X 1 0.0125 X 2 +ε ] MathType@MTEF@5@5@+= feaagKart1ev2aaaMHhvLHfij5gC1rhimfMBNvxyNvgagXfBLzgDOa YE7H1xX0hiTaYEy9Lm91hiCjxANHgDPaspGWfBLzgDBbYEW4Ym4qdo G0ciW4cmWydo7H1xX0hiTacmUatmYuZEy9Lm9bYkGWLDHjxzWnxAS9 MBG0hiCjxANHgDDbWexLMBbXgBcf2CPn2qVrwzqf2zLnharuavP1wz ZbItLDhis9wBH5garmWu51MyVXgaruWqVvNCPvMCG4uz3bqee0evGu eE0jxyamXvP5wqSX2qVrwzqf2zLnharyWYoZC5aibaieYlNi=xH8yi VC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbbG8FasPYRqj0= yi0dXdbba9pGe9xq=JbbG8A8frFve9Fve9Ff0dmeaabaqaciGacaGa aeqabaWaaeaaeaqbaOqaaabaaaaaaaaapeGaamOramaabmaapaqaa8 qacaWGybWdamaaBaaaleaapeGaaGymaaWdaeqaaOWdbiabgkHiTiaa dIfapaWaaSbaaSqaa8qacaaIYaaapaqabaaak8qacaGLOaGaayzkaa Gaeyypa0ZaamWaa8aabaWdbiaaicdacaGGUaGaaG4maiaaiIdacaaI 0aGaaGioaiabgkHiTiaaicdacaGGUaGaaGimaiaaicdacaaI2aGaaG ioaiaadIfapaWaaSbaaSqaa8qacaaIXaaapaqabaGcpeGaeyOeI0Ia aGimaiaac6cacaaIWaGaaGymaiaaikdacaaI1aGaamiwa8aadaWgaa WcbaWdbiaaikdaa8aabeaak8qacqGHRaWkcqaH1oqzaiaawUfacaGL Dbaaaaa@8586@

where F is the predicted repression efficiency, X1 is ΔGCF (in kcal/mol), X2 is percent mismatch (in %), and ε is the standard error (ε = 0.123). All coefficients had p values less than 0.001.



Depend on these design principles, we can design a lot of applicable asRNAs by python.First, we get the sequence which matches TIR 100%. The TIR is located in the USD, SD,AUG, and C2-8 region of the adda riboswitch in our work. Second, We introduced the mismatch by deleting nucleotides from the TBR sequence in order to introduce mismatches into the asRNA-mRNA complex. This mismatches is less than 15%. Third, we calculate the energy ΔGCF and made sure it was less than -40 kcal/mol. Finally, we get a lot of feasible asRNAs. We use the a multivariate model to predict the asRNA’s repression efficiency. And we choose the best one.

Here is a design example:

Repressing asRNA sequence:


Target sequence:

TTATAAAGAGAAGACTATGAATTACTTTGACCTGCCGAAG

Mismatch:1/39

ΔG Complex Formation=-54.381+1.1=-53.281

The results of repressing asRNA designed by the model are given as follows:






SECOND PART : ACTIVATING ASRNA

Compare to the repressing asRNA, there are no mature design principles in activating asRNA. On the present paper introduction, the design of an activating asRNA requires consideration of an Hfq binding site, thermodynamics, and the binding location.

As we mentioned before, we choose the Micf M7.4 Hfq binding site.

Thermodynamics were also found to have a significant impact on activating asRNA function. Here is a formula that describe the effect of thermodynamics on the activating asRNA.

Δ G kin ( A,B )=ΔG+α G p MathType@MTEF@5@5@+= feaagKart1ev2aaaMbgvLHfij5gC1rhimfMBNvxyNvgaCruzSrxyGS 3rFT3ECjxB7TwAU1xF91hxSvMz0Hci7felc1hiCjxANHgDPaspGWfr LXgDHbYrGSciCfgBWHwyGS3rFThxY12EW1xF9bWexLMBbXgBcf2CPn 2qVrwzqf2zLnharuavP1wzZbItLDhis9wBH5garmWu51MyVXgaruWq VvNCPvMCG4uz3bqee0evGueE0jxyamXvP5wqSX2qVrwzqf2zLnhary WYoZC5aibaieYlNi=xH8yiVC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vq aqpepm0xbbG8FasPYRqj0=yi0dXdbba9pGe9xq=JbbG8A8frFve9Fv e9Ff0dmeaabaqaciGacaGaaeqabaWaaeaaeaqbaOqaaabaaaaaaaaa peGaeuiLdqKaam4ra8aadaWgaaWcbaWdbiaabUgacaqGPbGaaeOBaa WdaeqaaOWdbmaabmaapaqaa8qacaWGbbGaaiilaiaadkeaaiaawIca caGLPaaacqGH9aqpcqqHuoarcaWGhbGaey4kaSIaeqySdeMaam4ra8 aadaWgaaWcbaWdbiaabchaa8aabeaaaaa@731B@

The A and B represent asRNA and TIR, respectively. The ΔGc represents the energy of A-B complex which is calculated by NUPACK_mfe_multi.(ΔGc Complex). The ΔG is ΔGc Complex-ΔGA-△GB. The ΔGA and ΔGB represent the energy of A and B which are calculated by NUPACK_mfe,respectively. The α is a length of the toehold of all targeted pairwise interactions AB. In order to design activated asRNA easily, α is a sequence length independent of the TBR stem ring which is considered to be a sequence length perfectly complementary to TIR(Fig E). And the α’s saturation levels is 6 (αsat=6 ). Gp=-1.28 Kcal/mol is the average contribution of a nucleotide in the toehold to the free energy of the initiation process.


Fig E.The definition of alpha.


The binding location are also have some difference between repressing asRNA and activated asRNA.No studies have clearly shown the effect of different binding location on activated asRNA.In order to ensure that displace the cis-repressive sequences easily,We don’t introduce the mismatch but make the activated asRNA’s TBR matches the cis-repressive sequences 100%.To maximize the translation rate, we also enforce that the four nucleotides upstream the RBS(standby site) and all downstream nucleotides be unpaired in the final hybridized structure.



Fig F. Principle of of activating asRNA.


In order to predict the activation, We used the fitting formula :


This formula is derived from experimental date.According to the formula, if we want to have a good activation effect of asRNA, the objective function △GKin needs to be less than -20 kal/mol.


Fig G. The activation times of △Gkin.

With these guidelines, we were able to design activated asRNA.

Here is a design example:

Activating asRNA sequence


△G= -34.500 kal/mol

Target sequence:

CCUCGCAUAAUUUCACUUCUUCAAUCCUCCCGUUAAAGAGGAGAAAUUAUGAAUG

△GA=-14.800 kal/mol

Complex sequence:


△GB=-62.581 kal/mol

As a result, △Gkin=-62.581+34.5+14.8-6*1.28=-20.961 kal/mol

The results of activating asRNA designed by the model are given as follows:


PART5:MOLECULAR DYNAMICS

This year, we want to use molecular dynamics method to explore conformational change of riboswitch between ON state and OFF state. First, we find the secondary structure of addA in OLOF ALLNéR’s research, we know the sequence of addA riboswitch can be divided into several parts: stem P1, P2 and P3, loop L2 and L3, junction-connecting segments J1-2, J2-3, and J3-1, as it shows in Fig A Then we get tertiary structure file of addA riboswitch from RCSB PDB, which is a protein data bank. The PDB ID we search for is 1Y26, and we use PyMOL to visualize it. The result is shown in Fig B.


Fig A. 2-D secondary structure of addA riboswitch.


Fig B. Using PyMOL to visualize addA tertiary structure. Green section represents P1 stem, Blue section represents junction area, red section represents the ligand 2-AP, yellow section represents P2 stem, orange section represents P3 stem, purple section represents L2 stem , cyan section represents L3 stem and five pink dots represents Mg2+.

According to OLOF ALLNéR’s study, the distance between loop L2 and L3 is extremely important in the conformational change of addA riboswitch between ‘ON’ state and ‘OFF’ state. When such distance is larger than 27 Å, we can consider the ribowitch as an ‘ON’ state. As it shows in Fig C, with the distance between loop L2 and L3 increasing, the structure of addA becomes more and more loose, which provides a chance for RBS to expose, leading to slow change of riboswitch conformation from ‘OFF’ state to ‘ON’ state.


Fig C. Riboswitch aptamer in complex with adenine. Showing the relationship between the L2-L3 distnce and addA riboswitch ON/OFF state.


Also, in another study of adenine riboswitch structure, we find the ligand is essential in stabilizing the addA riboswitch’s structure. When ligand adenine binds to the addA riboswitch, the whole system becomes stable, so the classical structure such as L2, L3 and P1 come into being (Fig D). However, when adenine disassociates with the addA riboswitch, riboswitch suffers a heavy loss of stable binding of ligand covalent bonds, so the whole systems start to collapse. The bonds between the ribonucleotide bases begin to break, leading to disappearance of the stems and loops. In the end, the structure of addA riboswitch totally vanishes and becomes a straight chain, which couldn’t accept ligand-binding to initiate the translation of downstream gene.


Fig D. With or without ligand-binding, the conformational change of addA riboswitch.


Inspired by their previous work, we decide to simulate the whole process of how the addA ribswitch opens and closes. We choose GROMACS as our molecular dynamics simulations program because of its widely-use and great manipulational convenience compared to other programs. Before using it, in order to add the missing atoms to addA PDB file, we use PDBfixer to fix it, so as to get the proper input for the next move. We use the output of PDBfixer as input for GROMACS and start the simulation. This simulation refers to the GROMACS Tutorial: Lysozyme in Water to make the simulation more accurate.

We choose CHARMM27 all-atom force field as this simulation’s force field. We use a cubic water box, put the RNA in the center, and add Na+ and Cl- to neutralize the charge of the system. For the energy minimization process, we simulate 50000 steps, while for the NVT and NPT equilibration process, we simulate 100ps. For the last step of Production MD, simulation lasts 10ns. Except the situation of ligand-binding, we also probe the situation of no ligand-binding. We use PyMOL to delete the ligand of addA, and use the same method to run the simulation for 30ns.


Fig E. RMSD analysis of addA with or without ligand-binding. (a)RMSD of addA with ligand-binding. (b)RMSD of addA without ligand-binding.


First, we detect root-mean-square deviation(RMSD) of the addA during the whole simulation. As it shows in Fig E, with time passing by, RMSD of addA with ligand-binding doesn’t vary too much. It fluctuates around 1nm, indicating during the whole process of simulation, the structure of addA riboswitch is very stable. Due to ligand-binding, 2-AP help stabilize the structure of addA, so structure remains almost the same as the original state. However, as for addA without ligand-binding, RMSD becomes larger and larger as times goes by, meaning the structure changes a lot, which proves that without ligand-binding, the structure of addA tends to becomes loose and lose its function gradually.



Fig F. Average distance of L2 and L3.(a)The distance for addA with ligand-binding. (b) The distance for addA without ligand-binding. (c) Schematically shows the distance between A35 and U63 using PyMOL.


Then we explores how the distance between of loop L2 and L3 varies with time. We difines the distance of A35(in loop L2) and U63(in loop L3) as the distance between L2 and L3. We see the same regulation as we find in the RMSD(Fig F): the distance doesn’t change much for addA with ligand-binding, but changes a lot for addA without ligand-binding, which further proves our previous assumption.

Then, we further explore how the length of stabilizer affects addA riboswitch’s structure. We take 9bp and 150bp stabilizer as example to demonstrate such influence. First, we use SimRNA to predict the structure of addA riboswitch with 9bp stabilizer. As we can see in Fig G, the structure becomes linear, losing all the classic structure of P1, L2 and L3. And most of all, it loses ligand-binding site, which indicates it can’t correspond to ligand-bind, not to speak of the “ON” state or “OFF” state of riboswitch.


Fig G. The predict structure of addA riboswitch with 9bp stabilizer.


Next, we use RNAcompser to predict the structure of addA riboswitch with 150bp stabilizer. The certain sequence and secondary structure we use are given in the Fig H. The result of RNAcomposer are shown in Fig I.


Fig H. The sequence and secondary structure (in dot and bracket) of addA riboswitch with 150bp stabilizer.


Fig I. The predict tertiary structure of addA riboswitch with 150bp stabilizer. The area with color is addA riboswitch, while the white area represents 150bp stabilizer.

We then use autodock to dock 2-AP to the structure, to see if it can get the right conformation. We get 50 results of docking, and we choose the conformation where 2-AP exists between P1 stem and junction area, which is consistent with crystal structure. (Fig J)


Fig J.The docking result of addA riboswitch with 150bp stabilizer. The area with color is addA riboswitch, the red molecule represents ligand 2-AP, the white area represents 150bp stabilizer.


For the last step, we put the docking result into GROMACS, to see the conformational change over time. The simulation lasts for 1ns. The result can be seen in the following video. From the video, we can see The structure is gradually elongated, but the ligand stays at the proper place. So the whole structure doesn’t collapse, indicating 150bp might stablize the structure of addA riboswitch.

Click hereto watch the vedio.
The docking result of addA riboswitch with 150bp stabilizer by autodock. Using ApowerREC to do the screen shot.

For the future work, we will change the simulation conditions, so as to see when the distance between L2 and L3 increases, the structure changes from “ON” state to “OFF” state and visualize it with PyMOL.





Reference:

1. Tian Tian and Howard M. Salis.A predictive biophysical model of translational coupling to coordinate and control protein expression in bacterial operons.Nucleic Acids Research, 2015, Vol. 43, No. 14 7137–7151.

2. Howard M. Salis, Ethan A. Mirsky, and Christopher A. Voigt.Automated Design of Synthetic Ribosome Binding Sites to Precisely Control Protein Expression.Nat Biotechnol. 2009 October ; 27(10): 946–950.

3. Nicholas J. Porubsky, Brian R. Wolfe, Justin S. Bois, and Niles A. Pierce.NUPACK 3.2 User Guide

Analysis and Design of Nucleic Acid Structures, Devices, and Systems.California Institute of Technology March 7, 2017.

4. Amin Espah Borujeni, Dennis M. Mishler, Jingzhi Wang,etc.Automated physics-based design of synthetic riboswitches from diverse RNA aptamers.Nucleic Acids Research, 2015.

5. Amin Espah Borujeni, Anirudh S. Channarasappa and Howard M. Salis.Translation rate is controlled by coupled trade-offs between site accessibility, selective RNA unfolding and sliding at upstream standby sites.Nucleic Acids Research, 2014, Vol. 42, No. 4.

6. Uttam ROYMONDAL, Shibsankar DAS, and Satyabrata SAHOO.Predicting Gene Expression Level from Relative Codon Usage Bias: An Application to Escherichia coli Genome.DNA RESEARCH 16, 13–30, (2009).

7. Serganov A, Yuan YR, Pikovskaya O, et al. Structural basis for discriminative regulation of gene expression by adenine- and guanine-sensing mRNAs. Chem Biol. 2004;11(12):1729–1741.

8. James E. Johnson, Jr., Francis E. Reyes,1 Jacob T. Polaski,etc.B12 cofactors directly stabilize an mRNA regulatory switch.Nature. 2012 Dec 6; 492(7427): 133–137.

9. Jacob T. Polaski,1 Samantha M. Webster,1 James E. Johnson, Jr,etc. Cobalamin riboswitches exhibit a broad range of ability to discriminate between methylcobalamin and adenosylcobalamin.J Biol Chem. 2017 Jul 14; 292(28): 11650–11658.

10. Antony Lussier, Laurène Bastet, Adrien Chauvier,etc.A Kissing Loop Is Important for btuB Riboswitch Ligand Sensing and Regulatory Control.J Biol Chem. 2015 Oct 30; 290(44): 26739–26751.

11. Jörg Vogel and Ben F. Luisi.Hfq and its constellation of RNA.Nature.2011.

12. Young Je Lee, Tae Seok Moon.Design rules of synthetic non-coding RNAs in bacteria.Methods 143 (2018) 58–69.

13. Allison Hoynes-O’Connor and Tae Seok Moon.Development of Design Rules for Reliable Antisense RNA Behavior in E. coli.ACS Synth. Biol. 2016, 5, 1441-1454.

14. Young Je Lee, Soo-Jung Kim, Matthew B. Amrofell,etc.Establishing a Multivariate Model for Predictable Antisense RNA Mediated Repression.ACS Synth. Biol. 2019, 8, 45-56.

15. Rodrigo G, Landrain TE, Majer E, Daro`s J-A, Jaramillo A (2013) Full Design Automation of Multi-State RNA Devices to Program Gene Expression Using Energy-Based Optimization. PLoS Comput Biol 9(8): e1003172.

16. Guillermo Rodrigo, Thomas E. Landrain, and Alfonso Jaramillo.De novo automated design of small RNA circuits for engineering synthetic riboregulation in living cells.PNAS ∣ September 18, 2012 ∣ vol. 109 ∣ no. 38 ∣ 15271–15276.

17. Yuta Sakai,Koichi Abe, Saki Nakashima,etc.Improving the Gene-Regulation Ability of Small RNAs by Scaffffold. Engineering in Escherichia coli .ACS Synth. Biol. 2014, 3, 152-162

18. Yin P, Choi HM, Calvert CR, Pierce NA (2008) Programming biomolecular self-assembly

pathways. Nature 451:318–322.

19. OLOF ALLNéR, LENNART NILSSON, and ALESSANDRA VILLA.Loop–loop interaction in an adenine-sensing riboswitch: A molecular dynamics study.RNA 19:916–926; 2013.

20. Priyakumar U D , Jr M K . Role of the Adenine Ligand on the Stabilization of the Secondary and Tertiary Interactions in the Adenine Riboswitch[J]. Journal of Molecular Biology, 2010, 396(5):0-1438.

21. Rolf Lutz and Hermann Bujard. Independent and tight regulation of transcriptional units in Escherichia coli via the LacR/O, the TetR/O and AraC/I1-I2 regulatory elements. Nucleic Acids Research, 1997, Vol. 25, No. 6 1203–1210.