Team:OUC-China/Model

PROJECT

Description

Design

Results

Demonstrate
LAB WORK

Experiment

Safety
MODEL

Overview

ODE

Thermodynamic model

Stablizer

Antisense RNA

Molecular Dynamics
PARTS

List

Basic Part

Composite Part
HP

Human Practices

Public Engagement
TEAM

Members

Attributions

Medal requirement
COLLABORATIONS
SOFTWARE

Overview：

This year OUC-China are working on post-transcriptional regulation. At present, post-transcriptional regulation is still a hot topic of research, and various regulation methods and levels are what most people pursue. In our work, we want the modular riboswitch to have a wider level of regulation. Therefore, in order to explore what parameters play a crucial role in the translation level of our RiboLego system, we carried out sensitivity analysis on the parameters obtained by ordinary differential equation(ODE). And the result of the ODE model is that translation rate plays an important role in different translation levels.

So in order to change the translation rate, the thermodynamic model was introduced . By this model, we know that riboswitch and stabilizer is how to affect the downstream gene translation level.What’s more, this sequence-to-function thermodynamic model can predict expression level ratios in natural operons and to design synthetic operons with desired expression level ratios.Depending on the thermodynamic model, a variety of tuners were successfully designed.

The thermodynamic model shows that different stabilizers will affect downstream gene translation rate. So how to select the suitable stabilizer become crucial. We get the docking matrix corresponding score function, and compared these score values, so as to obtain the suitable to stabilizer sequence.

Through various tuners, we have more translation levels in our RiboLego system. But our aim is to have more diverse regulation, so we introduced asRNA to re-activate or re-repress the riboswitch by hiding or exposing the RBS. According to the following design factors: an Hfq binding site, thermodynamics, the asRNA length, the number of mismatches, and the binding location. Finally, we successfully designed the corresponding asRNA sequence. And the repressing or activating effect of the designed asRNA sequence were predicted by the sequence-effect function.

Finally, in order to explore the whole process of the riboswitch from off to on at the molecular level with or without ligands, we used molecular dynamics to simulate the whole process and explored how the riboswitch function in more detail by distinguishing its on ,off and transition states by molecular distance. And depend on it, we can understand the molecular mechanism which the riboswitch is turned on.

Fig A. Model overview.

PART 1: ODE

In our work, our aim is to have more different translation’s levels. Therefore, in order to understand which parameters play a key role in the RiboLego system, we used ODE model to simulate the whole process of the RiboLego system.

REACTION:

We can describe our RiboLego system to be followings:

① null -> Riboswitch_off

② Riboswitch_off + ligand <-> Riboswitch_on

③ Riboswitch_on -> STA + Riboswitch_on

④ Riboswitch_on -> GOI + Riboswitch_on

⑤ Riboswitch_off -> null

⑥ Riboswitch_on -> ligand

⑦ STA -> null

⑧ GOI -> null

ODES:

To simulate the dynamics of sfGFP, we use ordinary differential equations to model the reactions above. And ODEs are given as follows:

\frac{d [G O I]}{d t} = k p 2 [R i b o s w i t c h_o n] -kd3 [G O I]

For the readability, the complex symbol is simplified as:

SPECIES, SYMBOLS AND PARAMETERS

FIG A.Species, symbols and parameters.

The parament we used in the ODEs is listed in the following table:

FIG B.The parament we used in the ODEs.

SIMULATION

With the help of Simbiology toolbox in Matlab，we simulate our RiboLego system for 16h, the result can be seen in the Fig.A.

Fig.A The dynamics of sfGFP by model prediction

We compare the experimental data to the simulation:

Fig.B The comparison between experimental data and simulation data

SENSITIVITIES ANALYSIS

In order to explore what parameters play a crucial role in the translation level of our RiboLego system, we carried out sensitivity analysis on the parameters obtained by ordinary differential equation.

Fig C. The numerical integration of sensitivities of parameters in 14h

Through sensitivity analysis, we found that parameters k1, kp2, kdc1 and kd2 had an effect on the expression level of GOI. Since our work focuses on post-transcriptional regulation, we want to keep the riboswitch on-state and GOI translation level stable. Therefore, it is our best choice to change the translation rate to achieve the diversification of the regulation level of RiboLego.

To verify that kp2 has an effect on translation rate in the riboswitch systems.We went further. We did the following exploration:

The above differential equations are nonlinear equations. It is known from the theorem that there is a solution. However, after calculation by MATLAB program, there is no explicit solution for this nonlinear ordinary differential equations, so the model is reasonably linearized.

Since the amount of the ligand does hardly change during the reaction, we assume that the ligand is a constant.

This system of equations is a linear system of equations. By the MATLAB program , we know that the symbolic solution of GOI(E) with respect to time t:(a1= 289.5845,a2= 4.2734e-07,a3= 7.3077e+19,a4= -0.1561,a5= -90.2011,a6=- 2.9988e+05,a7= 1.5465e-49,a8= -7.7220e+41,a9= 8.7493e-06,C1 C2 C3 C4 are consants.)

The translation rate corresponds to Kp2 in our model. So, we only need to do local sensitivity analysis of Kp2. Using following model we will know how the Kp2 effect the value of GOI.

The result shows when we increase kp2, the GOI increase, which means we can use the translation rate to change the GOI expression level in following models.

PART 2: NEW THERMODYNAMIC MODEL

FIRST PART: THERMODYNAMIC MODEL WITH PROMOTER

By performing sensitivity analysis of all parameters in this process, we find that translation rate plays an important role in different translation levels. So how to change the translation rate is a big problem for us. Fortunately, from last year’s work, we found that the rate of translation initiation is positively proportional to the translation rate. And the thermodynamic model can calculate the rate of translation initiation.So we try to use thermodynamic model to design different tuners to achieve our goal.

As we all know, in bacteria,translational coupling provides a genetically encoded mechanism to control expression level ratios within multi-cistronic operons.From picture A, we can know the coupled translation of a bi-cistronic operon.The schematic shows the promoter,the (yellow box) 5' UTR, the (red) upstream coding sequence, the (orange box) intergenic region, the (green) downstream coding sequence, and the transcriptional terminator. The intergenic region contains an inhibitory RNA structure composed of the common RBS, repressing region and coupling region . We named it tuner. When the upstream coding sequence is translated, the ribosome unbinds tuner’s inhibitory structure to facilitate downstream coding sequence.

Fig A. Translational coupling encoded mechanism.

Therefore, the inhibitory structure of tuner directly affects the translation rate of downstream coding sequence. The thermodynamic model takes advantage of this principle to transform the inhibitory structure into translation rate. Finally, We have developed a sequence-to-function thermodynamic model of translational coupling to predict expression level ratios in natural operons and to design synthetic operons with desired expression level ratios. You only give the sequences, the model will use this sequences to calculate the sequences' translation initiation rate .

Now we have known the coupled translation model for bicistronic. So we should know how to use this thermodynamic model to predict GOI’s translation level. In the process of translation, the initial rate of translation determines the rate of translation process. This thermodynamic model will predict the GOI’s initial rate of translation in two parts. The first part is ribosome re-initiation. The second part is ribosome de novo initiation. Ribosome re-initiation occurs when an upstream elongating ribosome dissociates and reassembles at an intergenic region, followed by translation initiation of the downstream coding sequence. Ribosome de novo initiation occurs when cytosolic ribosomes assemble onto the mRNA at intergenic regions and initiate translation of the downstream coding sequence.

Fig B. Ribosome re-initiation and de novo ribosome initiation.

From this model, we get a formula to calculate the GOI’S initial rate of translation.The proposed thermodynamic model of translational coupling quantifies the molecular interactions that control both ribosome re-initiation and de novo initiation rates according to an inputted mRNA sequence. Model components and their corresponding sequence region are color coded(Fig C), including the free energies of inhibitory RNA structures (coupled and non-coupled), the intergenic distance, the free energy of the ribosome-bound state, and the upstream coding sequence’s translation rate.

Fig C. The proposed biophysical model of translational coupling quantifies the molecular interactions that control both ribosome re-initiation and de novo initiation rates according to an inputted mRNA sequence(tuner).

The ΔG total,upstream refers to the total binding free energy between the ribosome and 5’UTR, according to the equation:

\begin{array}{l} Δ G_{t o t a l, u p s t r e a m} = Δ G_{m R N A - r R N A} + Δ G_{s p a c i n g} + Δ G_{s t a r t} \\ + Δ G_{s \tan d b y} - Δ G_{m R N A} \end{array}

The translation initiation rate of the first CDs(upstream) is calculated according to

r_{1} \propto \exp (- β Δ G_{total})

The ΔG mRNA-rRNA refers to the free energy of folding for mRNA-rRNA complex, which is negative.

The ΔG spacing refers to an energetic penalty for ribosomal stretching or compression caused by a long or short spacer region between the 16S rRNA binding site and start codon, which is negative.

The ΔG start refers to the free energy for fMet-tRNA and start codon complex, which is negative.

The ΔG standby refers to an energy penalty determined by the mRNA standby site’s interactions with the ribosome’s platform domain, which is positive.

The ΔG mRNA refers to the free energy of folding for 5’UTR,which is negative.

Fig D. The regions that consisted of tuner and their corresponding free energy.

To sum up, each tuner sequence has its corresponding free energy, which represents the regulation level of each tuner. In order to accurately calculate the magnitude of each free energy, it is necessary to learn to distinguish the regions of the sequence.The reason why tuner is consisting of 35 nucleotides before and after the start codon,is that the model predictions do not improve when longer subsequences are considered. The tuner subsequence S1 consists of the max(1, nstart-35) to nstart nucleotides and the subsequence S2 consists of the max(1,nstart-35) to nstart+35 nucleotides, where nstart is the position of a start codon.

To calculate the ΔGmRNA:rRNA, ΔGstart, ΔGmRNA, and ΔGstandby free energies, we use the NUPACK suite of algorithms.And five energy can be seen vividly in the Fig D.These free energy calculations do not have any additional fitting or training parameters and explicitly depend on the mRNA sequence. In addition, the free energy terms are not orthogonal; changing a single nucleotide can potentially affect multiple energy terms.

ΔGmRNA.Using the NuPACK ‘mfe’ algorithm and Mfold parameters, the mfe configuration of

sequence S2 is calculated and its free energy is designated ΔGmRNA.

ΔG standby.The standby site is the 4 nt region upstream of the 16S rRNA binding site. The energy required to unfold the standby site is determining by calculating the mfe of sequence S2 with and without preventing the standby site from forming base pairs. The difference between these mfe's is

designated ΔGstandby. The mfe of sequence S2 without a standby site is considered as △GmRNA. To calculate the mfe of sequence S2 with a standby site that is constrained to be single-stranded, the sequence is first split into two subsequences, their mfes are each calculated, and then summed together. The two subsequences are the nucleotides nstart-35 to n3-4 and n3 to nstart+35, where n3 is the most 5′ base pair in the 16S rRNA binding site and 4 is the standby site length.

ΔGmRNA:rRNA.Using the NuPACK ‘subopt’ algorithm with Mfold 3.0 parameters at 37°C base pair configurations of the folded 16S rRNA and sequence S1 are enumerated, starting with the minimum free energy (mfe) configuration and continuing with suboptimal configurations, each with a corresponding ΔGmRNA:rRNA. In our work, we choose the mfe configuration's ΔGmRNA:rRNA .

ΔGspacing .The spacing(s) is a distance between SD sequence’s 3’ terminal and the start codon’s first base .When the 30S complex is stretched (s > 5 nt), the ΔGspacing is calculated according to the quadratic equation,

Δ G_{spacing} = C_{1} {(S - S_{opt})}^{2} + C_{2} (S - S_{opt})

where sopt = 5 nt, c1 = 0.048 kcal/mol/nt2, and c2 = 0.24 kcal/mol/nt. When the 30S complex

is compressed (s < 5 nt), the ΔGspacing is calculated according to the sigmoidal function,

Δ G_{spacing} = \frac{C_{1}}{{[1 + \exp (C_{2} (S - S_{opt} + 2))]}^{3}}

where c1 = 12.2 kcal/mol and c2 = 2.5 nt -1.

ΔGstart.The ΔGstart is -1.19 and -0.075 kcal/mol for AUG and GUG start codons, respectively

The translation initiation rate of the second CDs is then calculated by summing together two sources of translational coupling: ribosome re-initiation and ribosome de novo initiation according to

r_{2} α r_{reinitiation}^{(2)} + \exp (- β Δ G_{total}^{(2)})

The first part in formula can be calculated by the following formula:

Where the kreinitiation(d1,2) refers to the intergenic distance dependence and the kp refers to the proportionality constant between the ribosome assemble rate and the translation initiation rate.

For the kreinitiation(d1,2) is proved that can be calculate by the formula following:

Where the d=xstart-xstop-3, xstart refers to the first nucleotides in jth CDs’s start codon while the xstop refers to first nucleotides in i th CDs's stop codon.

Fig E. d>0 and d<0.

And it also points that the kp=10 .

The second part in formula can be calculated by the following formula:

Where the ΔG coupling refers to the free energy of folding for all inhibitory RNA structure that block the standby site, overlap with SD sequence, spacing region or the downstream footprint region of ribosome.(If we assume that elongating ribosomes are uniformly distributed across a coding sequence with a footprint of 30 nucleotides (fp), then each one will occupy a fraction of the coding sequence’s length (fp/L).)

The ΔG noncoupling refers to the free energy of all the other RNA structure except the inhibitory RNA structure. And the Fcoupling can be calculated by the following formula:

Fcoupling=1/ [1 + c * \exp (- β * Δ G t o t a l)]

Which the C is the ribosome-assisted unfolding coefficient. C=0.81 in our study.

We’ve understood about this thermodynamic model, but this model doesn’t apply for riboswitch. Because compared to normal 5’ UTR, the riboswitch’s structure will affect the upstream gene’s translation level in different ways. So we should build a new coupled translation model for bicistronic with riboswitch and use this new thermodynamic model to describe how the riboswitch affects the upstream and downstream gene’s translation level.

SECOND PART: THERMODYNAMIC MODEL WITH RIBOSWITCH

Fig F. The reaction coordinate diagram showing the states, energies, ligand-binding and translation initiation. Without ligand, the ribosome binds to a folded mRNA (state 1) with free energy ΔG total,OFF . When excess ligand is present, the co-transcriptional folding of mRNA and ligand through paths a ends with a mRNA–ligand complex (state 3) that binds the ribosome with free energy ΔG total,ON . The stability of the mRNA–ligand complex is controlled by the switching free energy (ΔG mRNA + ΔG ligand ).

As it shows in the Fig F, when there is no ligand, the conformational change leads to ΔG total,OFF, while ligands are excess, the conformational change leads to ΔG total,ON. In the later cases, it could be divided into three conditions. First, excessive concentration of intracellular ligands and stable binding of mRNA molecules to ligands. Second, the concentration of ligand is small, and the mRNA molecule can bind to ligand stably. Third, mRNA molecules cannot bind to ligands stably.

First situation: Excessive concentration of intracellular ligands and stable binding of mRNA molecules to ligands

Fig G. The change of free energies during the ligand binding and ribosome 30S subunit binding to the mRNA:No ligand binds,Ligand binds.

As it shows in the Fig G, conformational changes lead to the changes of free energies. The initial state is the minimum energy state, when ribosome 30S subunit binds to mRNA, it will change to the final state. We focus on the energy difference of final and initial state, that is, ΔG total, because it is proportionable to our desired translation initiation rate(ron and roff).

When there is no lgand:

Δ G_{t o t a l - O F F} = Δ G_{m R N A r R N A} + Δ G_{s t a r t} + Δ G_{s \tan d b y} - Δ G_{m R N A}

When ligand is binding:

The corresponding translation initiation rates r are:

r_{O F F} \propto \exp (- β Δ G_{total, OFF})

r_{O N} \propto \exp (- β Δ G_{total, O N})

In the riboswitch model, the calculating method for ΔGstart and ΔGspacing remains the same, however, there are quite a lot of changes for ΔGstandby， ΔGmRNA and ΔGmRNA:rRNA. The specific algorithm are listed as follows:

ΔGmRNA:rRNA and ΔGmRNA:ligand:rRNA: Using NuPACK ‘mfe’ algorithm to calculate the free energy when 16rRNA binds to mRNA.

ΔGmRNA : Using NuPACK ‘mfe’ algorithm to calculate the free energy of the initial state of riboswitch.

ΔGstandby: we use formula below to calculate this free energy:

Δ G_{s \tan dby} = Δ G_{distortion} + Δ G_{unfolding} + Δ G_{sliding}

Here, ΔG distortion describes ribosomal platform’s distortion energy penalty, and can be calculate by this:

Δ G_{distortion} = C_{1} {(A_{S})}^{2} + C_{2} (A_{2}) + C_{3}

A s = 15 + D + P - H

As means the available single-stranded surface area; H means the hairpin’s height; D&P stand for distal and proximal binding sites, while C1 , C2 and C3 are the fitted parameter values and equal to 0.038kcal/mol/nt2 ,-1.629kcal/mol/nt and 17.359kcal/mol.

ΔG sliding quantize the presence of a sliding mechanism for the ribosomal platform. It’s the energy penalty the intervening mRNA structures are not unfolded, but are pushed aside with an energetic cost. To calculate this energy, we sum up the heights of all downstream RNA structures, and multiplied it by the coefficient (c = 0.2 kcal/mol/nt) to yield ΔG sliding.

ΔG unfolding describes the free energy released by unfolding base pairs. According to Amin Espah Borujeni[1], Unfolding 1 bp increases the standby site module’s surface area by 3 nt and increases the unfolding energy penalty to 0.90 kcal/mol.

ΔGmRNA:ligand： In the case that we know the secondary structure of initial state of riboswitch in “ON” state, we use the RNAeval web server to calculates the energy of the ligand-binding riboswitch.

Second situation: the concentration of ligand is small, and the mRNA molecule can bind to ligand stably

During the ligand binding event, the mRNA folding conformation does switch from its minimum free energy at the OFF state to a more positive free energy at the ON state, resulting in a positive energy penalty ΔΔG mRNA

Δ Δ G_{m R N A} = Δ G_{m R N A, O N} - Δ G_{m R N A, O F F}

This energy penalty can be compensated by the free energy input from the binding of ligand to the mRNA, ΔG ligand (negative value)

Δ G_{ligand} = R T I n (K_{D})

Here, R is the universal gas constant, while KD is the ligand’s dissociation constant

K_{D} = \frac{C_{ligandfree} C_{m R N A free}}{C_{m R N A - ligand}} ≅ C_{ligand, total} \frac{C_{m R N A, free}}{C_{m R N A - ligand}}

When the switching free energy ( ΔΔG mRNA +ΔG ligand ) is negative, the mRNA-ligand binding is stable (first situation), and when the summation is positive the binding becomes metastable (second situation).

For the second situation, ON state translation initiation rate, rON,conc is:

Third situation , mRNA molecules cannot bind to ligands stably.

For the third situation, ON state translation initiation rate, rON,actual is

r_{O N, actual} = \frac{r_{O F F} + r_{O N, C O N C} \exp (- β (Δ Δ G_{m R N A} + Δ G_{ligand}))}{1 + \exp (- β (Δ Δ G_{m R N A} + Δ G_{ligand}))}

Δ Δ G_{m R N A} = Δ G_{m R N A, O N} - Δ G_{m R N A, O F F}

After figuring about all the three situations, we decide to combine the riboswitch part and bicistronic part together to yield the translation initiation rate (TIR) for our own GOI.

As we talk about in the first part, r2 can be calculated like this:

r_{2} {=r}_{reintiation}^{(2)} + r_{1}

r_{1} = \exp (- β Δ G_{total}^{(2)})

r_{reinitiation}^{(2)} {=k}_{p} k_{reinitiati}_{on} (d_{1, 2}) \exp (- β Δ C_{t o t a l}^{(1)})

Here, we regard the TIR of riboswitch (rON) as r1, so as to build a connection between the riboswitch model and bicistronic thermodynamic model.

Depend on the riboswitch model and bicistronic thermodynamic model , we can design different tuners and predict different tuners' functions under the riboswitch easily. you only give the sequences, the thermodynamic model will use this sequences to calculate the sequences' energy. This energy will predict the GOI's initial rate of translation.

The following calculation example is given:

Fig H. Division of tunerE region.

From Fig H，we can see that there are a lot of regions from the tunerE which are mentioned before.

The results calculated by the program are shown in the following Fig I:

Fig I. Individual free energy values associated with tunerE

We compared the predicted translation rate of all tuner with the experimental results, and the results are as follows:

Fig J.Comparison between experimental data and model results

Now, we know how to calculate the energy of the tuner, but how to get the suitable tuner’s sequence is a important problem for us. In our work, we use python to get the target sequence. First, we generated a random sequence which consists of the max(x, nstart-35) to nstart+35 nucleotides. The nstart is the position of a start codon. The x is the first base of the sequence. The reason why we choose a sequence longer than nstart-35 is that we need to generate a stable secondary structure to hide the RBS. It so happened that this generated structure’s length is longer than nstart- 35 to nstart+35 nucleotides.Second, we set a target energy of the tuner. The software will defined the different regions of tuner which we mentioned before. And then the software will use ‘NuPACK’ algorithm to calculate the corresponding energy. This energy will compared with the goal energy. If they are too far apart, the software will change the common RBS, repressing region and coupling region’s sequences to keep approaching the target energy until the gap is small.

However, this thermodynamic model can not accurately predict the translation level. The translation rate is indeed related to the translation initiation rate, but the translation elongation rate also has great influence on the translation level. Therefore, to solve this problem, we introduce model relative codon bias(RCB).

There are a number of measures currently in use that quantify codon usage in genes. Based on the hypothesis that gene expressivity and codon composition is strongly correlated, RCB has been defined to provide an intuitively meaningful measure of an extent of the codon preference in a gene. This model was confirmed in Escherichia coli. Depend on this model we can assess the strength of RCB(RCBS) in genes as a guide to the gene’s expression level. RCBS is the overall score of a gene indicating the influence of RCB of each codon in a gene.

The RCBS in estimating gene expression level is related to codon usage difference of a gene with respect to biased nucleotide composition at the three codin sites. Let f(x,y,z) be the normalized codon frequency for the codon triplet(x,y,z) of a gene. Then the RCB of a codon triplet (x,y,z) in a gene is defined as:

d_{x y z} = \frac{f (x, y, z) - f_{1} (x) f_{2} (y) f_{3} (z)}{f_{1} (x) f_{2} (y) f_{3} (z)}

where f1(x) is the normalized frequency of x at the first codon position, f2(y) is the normalized frequency of y at the second codon position, and f3(z) is the normalized frequency of z at the third codon position of the gene. The frequencies f1, f2, f3 have been derived from the set of codon samples of a gene and the normalization of frequency is done over the gene length in codons, in an attempt to compensate for the expected increase of RCB with the total number of codons. We quantify the degree of codon bias of a gene in such a way that comparisons can be made both within and between genomes. As defined earlier, dxyz contains somewhat more quantitative information than others, since it considers codon usage as well as the base compositional bias. Then the expression measure of a gene is

R C B S = {(\prod_{i = 1}^{L} (1 + d_{x y z}^{i}))}^{1 / L} - 1

where

d^{i}_{x y z}

is the codon usage difference of ith codon of a gene. L is the number of codons in the gene.

Our analysis is based on the hypothesis that RCB reflects the level of gene expression. The expression measure of a gene in this approach is denoted by RCBS. RCBS s thus useful for comparing different sets of genes. In most of the case, highly expressed genes have RCBS ＞0.5，the majority of gene (63%) have RCBS values lying between 0.2 and 0.4 and RCBS value close to 0 indicates a lack of bias for the codons.

Here is a example:

GOI sequence: AAA CGA CAC AGG AAA, L=5

Finally, by combing the thermodynamic model with RCBS, a new formula describing the translation rate is obtained:

R (t r a n s l a t i o n r a t e) = r 2 * R C B S

Because of this new formula, we can predict the translation rate of the target gene more accurately that the thermodynamic model.

PART 3: STABILIZER

After we overcome the problem of designing tuner, we also need to select the appropriate stabilizer clearly.

As we all konw, different riboswitches need different stabilizers to stable its second structure. If someone want to use a new riboswitch or they want to find out the shortest stabilizer’s length, we can use the docking matrix to get target stabilizer.

The two sides of the docking matrix consist of sequences of riboswitch. If the bases on both sides are paired, then we put a 1 in the matrix. If the bases on both sides are not paired, then we put a 0 in the matrix. By predicting the secondary structure of a riboswitch in a sequence of only riboswitch. Thus, we have a matrix of 0, 1 and riboswitch sequences. We call this matrix the Ribo. From this matrix we get a total target value i(that is, all zeros and ones added together).

Fig A. The types and composition of docking matrix.

Next, we should find the appropriate GOI downstream of the riboswitch by blast or paper. We added this GOI to the riboswitch sequence. On this basis, we predict the secondary structure of the riboswitch. We can get a new matrix called Ribo-GOI. From this matrix we get a total target value j(that is, all zeros and ones added together).Finally,We added this GOI in multiples of three to the riboswitch sequence to predict the sequence pairing of riboswitch by adding the stabilizer. This new matrix called Ribo-STA.From this matrix we get a total target value k(that is, all zeros and ones added together). If the values of the same position of the matrix are the same(i.e., both are 0 or 1), we consider the position to be structurally correct.If the value are different at the same place in the matrix(i.e., not all of them are 0 or 1), we consider the riboswitch sequence to be mismatched. So, in the corresponding position in this matrix, we’re going to give it very high Penalty value to alert us.Therefore we also get a new total value k· from the Ribo-STA.

In conclusion, we get there matrices, Ribo, Ribo-GOI and Ribo-STA. By comparing three kinds of matrix, we select the most suitable stabilizer.We will discuss it in two ways:

If i-j≠0：

This represents a change in the secondary structure of the riboswitch by adding the GOI.However, STA of blast or paperbase has been proved to be successful in stabilizing secondary structure of riboswitch.Therefore, although its structure is not the best, it is relatively stable. So we calculate the different between i and k and i and j by the residual sum of squares . We want GOAL2/GOAL1 to be close to1 or 0. Close to 1 represents that the introduction of stabilizer’s stable effect is close to blast or paperbase. Close to 0 represents that the introduction of stabilizer’s stable effect is better than blast or paperbase. So on this comparison method, we can choose the relatively suitable stabilizer.

G O A L 1 = (i -j) * * 2, G O A L 2 = (i-k) * * 2

If i-j=0:

This indicates that the introduction of GOI stabilizes the secondary structure of the riboswitch. So

We divide i by k, and if it’s 1, that means the Ribo-STA corresponds to adding sequence is the most suitable stabilizer. If not, we’re going to pick the one that approaches 1.

Therefore, we translate the above procedure into code. with the help of python, we can get the suitable stabilizer sequence.we called them STA9, STA21, STA81 and STA129. Among them, the program tells us that STA9 and STA21 stabilizer changed the secondary structure of riboswitch, they all have high levels of punishment , in contrast, STA81 and STA129 both close to 1 ,STA81 and STA129 stable the secondary structure of riboswitch. From the following experimental result, we can successfully verify the correctness of the theory and program.

PART 4: ASRNA

In order to regulate the riboswitch’s function more plentiful , we want to use asRNA to re-activate or re-repress the riboswitch by hiding or exposing the RBS. The mechanism is shown in the figure below. And we can also discover that asRNA contains a hfq binding site. The hfe binding site will recruit a hfq protein to stable the asRNA. Under the hfq protein’s protection, the asRNA prevents degradation.

Fig A. Principle of activation and repressing of asRNA.

FIRST PART : REPRESSING ASRNA

A challenge in developing synthetic repressing asRNA design rules is a wide range of parameters that need to be considered: an Hfq binding site, thermodynamics, the asRNA length, the number of mismatches, and the binding location. Studying one parameter in isolation can be difficult because of the interdependence of each of these parameters. For example, changing the asRNA length or the number of mismatches can affect the thermodynamics of asRNA-mRNA interaction.

Fig B. Synthetic repressing asRNA design rules.

From this picture, we can know the asRNA design rules. The first rule is choose the best Hfq binding site. There are four Hfq binding sites, and the Spot42 had the highest overall repression with an average of 76.9% repression for all five TBRs. The MicC,MicF, and MicF M7.4 binding sites also performed well, with average repressions of 67.2%, 71.6%, and 69.8%, respectively. Because orthogonality is critical to the function of complex gene circuits and the MicF M7.4 had the lowest off-target effects.Finally we choose the Micf M7.4 Hfq binding site.

The second rule is about target binding region design.The asRNAs are composed of two regions: the TBR and the Hfq binding site.The goal is to develop a list of design rules that will allow for the de novo design of asRNAs that can achieve high levels of target repression, high orthogonality, and generally predictable behavior. The following categories of design parameters were investigated: (1) target location, (2)mismatch, (3) length, (4) thermodynamics.

target location.The first design characteristic that was varied was the binding location of the asRNA. Several previous studies have shown that asRNAs are most effective when they target the translation initiation region (TIR).And the TIR was divided into four regions (USD, SD,AUG, and C2-8).The first region is the portion of the 5′ untranslated region (5′UTR) that is upstream of the Shine Dalgarno sequence, and is hereafter referred to as the Upstream Shine Dalgarno (USD).The second region contains the Shine Dalgarno (SD) sequence in addition to the nucleotides between the SD sequence and the start codon. This region is simply referred to as the Shine Dalgarno region (SD),to indicate that it includes the SD sequence.The third region is the three-nucleotide start codon (AUG), and the fourth region is codons 2-8 in the coding region (C2-8).and then each TBR was designed to target one of seven different combinations of these four regions.TBRs were designed to target one of seven mRNA locations.The binding location has no impact on the repression efficiency of the asRNA, as each location showed approximately equal percent repression.

Fig C. Repressing asRNA's target region.

Mismatch. Though most engineered TBRs are perfectly complementary to their target gene, asRNAs in natural systems bind imperfectly to their target, resulting in several shorter (8-9 nt) regions of dsRNA. Because many naturally occurring asRNAs have some degree of mismatch,which raises the question as to what the maximum mismatch tolerance is for effective repression. The knowledge that TBRs with up to 15% mismatch can still effectively repress gene expression will be a helpful design guideline in preventing off-target effects.We introduced the mismatch by deleting nucleotides from the TBR sequence in order to introduce mismatches into the asRNA-mRNA complex.

Length. AsRNAs that had at least 15 nt of dsRNA had significantly higher target gene repression than asRNAs with less than 15 nt of dsRNA. Someone showing the correlation between percent repression and maximum dsRNA length in the asRNA-mRNA complex shows a sharp increase in repression, followed by a plateau in repression after approximately 15 nt of dsRNA.

Fig D. Repressing asRNA's length.

Thermodynamics. Thermodynamics were found to have a significant impact on repressing asRNA function. This result is expected, since several studies have already identified the importance of thermodynamics to the function of RNA regulators. In this study, two thermodynamic parameters were considered. First,the ΔG of the asRNA-mRNA complex was taken into consideration (ΔG Complex). This accounted for intermolecular forces between the asRNA and mRNA, and because this structure was designed to be stable, this value was very negative.This parameter was estimated using Mfold, as described previously.Second, the difference between the ΔG of the TBR (ΔG TBR), which was unstructured and generally very close to zero, and the ΔG of the asRNA-mRNA complex was calculated and called ΔG Complex Formation (ΔG CF). Lower ΔG values, which indicate more thermodynamically favorable structures, were correlated with higher repression, as expected.AsRNAs with ΔG CF values that were less than -40 kcal/mol had significantly higher repression than asRNAs whose ΔG CF values were higher than -40 kcal/mol.

In order to predict the asRNA’s repression efficiency, we use a multivariate model.

F (X_{1} - X_{2}) = [0.3848 - 0.0068 X_{1} - 0.0125 X_{2} + ε]

where F is the predicted repression efficiency, X1 is ΔGCF (in kcal/mol), X2 is percent mismatch (in %), and ε is the standard error (ε = 0.123). All coefficients had p values less than 0.001.

Depend on these design principles, we can design a lot of applicable asRNAs by python.First, we get the sequence which matches TIR 100%. The TIR is located in the USD, SD,AUG, and C2-8 region of the adda riboswitch in our work. Second, We introduced the mismatch by deleting nucleotides from the TBR sequence in order to introduce mismatches into the asRNA-mRNA complex. This mismatches is less than 15%. Third, we calculate the energy ΔGCF and made sure it was less than -40 kcal/mol. Finally, we get a lot of feasible asRNAs. We use the a multivariate model to predict the asRNA’s repression efficiency. And we choose the best one.

Here is a design example:

Repressing asRNA sequence：

Target sequence：

TTATAAAGAGAAGACTATGAATTACTTTGACCTGCCGAAG

Mismatch:1/39

ΔG Complex Formation=-54.381+1.1=-53.281

The results of repressing asRNA designed by the model are given as follows:

SECOND PART : ACTIVATING ASRNA

Compare to the repressing asRNA, there are no mature design principles in activating asRNA. On the present paper introduction, the design of an activating asRNA requires consideration of an Hfq binding site, thermodynamics, and the binding location.

As we mentioned before, we choose the Micf M7.4 Hfq binding site.

Thermodynamics were also found to have a significant impact on activating asRNA function. Here is a formula that describe the effect of thermodynamics on the activating asRNA.

Δ G_{kin} (A, B) = Δ G + α G_{p}

The A and B represent asRNA and TIR, respectively. The ΔGc represents the energy of A-B complex which is calculated by NUPACK_mfe_multi.(ΔGc Complex). The ΔG is ΔGc Complex-ΔGA-△GB. The ΔGA and ΔGB represent the energy of A and B which are calculated by NUPACK_mfe,respectively. The α is a length of the toehold of all targeted pairwise interactions AB. In order to design activated asRNA easily, α is a sequence length independent of the TBR stem ring which is considered to be a sequence length perfectly complementary to TIR(Fig E). And the α’s saturation levels is 6 (αsat=6 ). Gp=-1.28 Kcal/mol is the average contribution of a nucleotide in the toehold to the free energy of the initiation process.

Fig E.The definition of alpha.

The binding location are also have some difference between repressing asRNA and activated asRNA.No studies have clearly shown the effect of different binding location on activated asRNA.In order to ensure that displace the cis-repressive sequences easily,We don’t introduce the mismatch but make the activated asRNA’s TBR matches the cis-repressive sequences 100%.To maximize the translation rate, we also enforce that the four nucleotides upstream the RBS(standby site) and all downstream nucleotides be unpaired in the final hybridized structure.

Fig F. Principle of of activating asRNA.

In order to predict the activation, We used the fitting formula :

This formula is derived from experimental date.According to the formula, if we want to have a good activation effect of asRNA, the objective function △GKin needs to be less than -20 kal/mol.

Fig G. The activation times of △Gkin.

With these guidelines, we were able to design activated asRNA.

Here is a design example:

Activating asRNA sequence

△G= -34.500 kal/mol

Target sequence：

CCUCGCAUAAUUUCACUUCUUCAAUCCUCCCGUUAAAGAGGAGAAAUUAUGAAUG

△GA=-14.800 kal/mol

Complex sequence:

△GB=-62.581 kal/mol

As a result, △Gkin=-62.581+34.5+14.8-6*1.28=-20.961 kal/mol

The results of activating asRNA designed by the model are given as follows:

PART5:MOLECULAR DYNAMICS

This year, we want to use molecular dynamics method to explore conformational change of riboswitch between ON state and OFF state. First, we find the secondary structure of addA in OLOF ALLNéR’s research, we know the sequence of addA riboswitch can be divided into several parts: stem P1, P2 and P3, loop L2 and L3, junction-connecting segments J1-2, J2-3, and J3-1, as it shows in Fig A Then we get tertiary structure file of addA riboswitch from RCSB PDB, which is a protein data bank. The PDB ID we search for is 1Y26, and we use PyMOL to visualize it. The result is shown in Fig B.

Fig A. 2-D secondary structure of addA riboswitch.

Fig B. Using PyMOL to visualize addA tertiary structure. Green section represents P1 stem, Blue section represents junction area, red section represents the ligand 2-AP, yellow section represents P2 stem, orange section represents P3 stem, purple section represents L2 stem , cyan section represents L3 stem and five pink dots represents Mg2+.

According to OLOF ALLNéR’s study, the distance between loop L2 and L3 is extremely important in the conformational change of addA riboswitch between ‘ON’ state and ‘OFF’ state. When such distance is larger than 27 Å, we can consider the ribowitch as an ‘ON’ state. As it shows in Fig C, with the distance between loop L2 and L3 increasing, the structure of addA becomes more and more loose, which provides a chance for RBS to expose, leading to slow change of riboswitch conformation from ‘OFF’ state to ‘ON’ state.

Fig C. Riboswitch aptamer in complex with adenine. Showing the relationship between the L2-L3 distnce and addA riboswitch ON/OFF state.

Also, in another study of adenine riboswitch structure, we find the ligand is essential in stabilizing the addA riboswitch’s structure. When ligand adenine binds to the addA riboswitch, the whole system becomes stable, so the classical structure such as L2, L3 and P1 come into being (Fig D). However, when adenine disassociates with the addA riboswitch, riboswitch suffers a heavy loss of stable binding of ligand covalent bonds, so the whole systems start to collapse. The bonds between the ribonucleotide bases begin to break, leading to disappearance of the stems and loops. In the end, the structure of addA riboswitch totally vanishes and becomes a straight chain, which couldn’t accept ligand-binding to initiate the translation of downstream gene.

Fig D. With or without ligand-binding, the conformational change of addA riboswitch.

Inspired by their previous work, we decide to simulate the whole process of how the addA ribswitch opens and closes. We choose GROMACS as our molecular dynamics simulations program because of its widely-use and great manipulational convenience compared to other programs. Before using it, in order to add the missing atoms to addA PDB file, we use PDBfixer to fix it, so as to get the proper input for the next move. We use the output of PDBfixer as input for GROMACS and start the simulation. This simulation refers to the GROMACS Tutorial: Lysozyme in Water to make the simulation more accurate.

We choose CHARMM27 all-atom force field as this simulation’s force field. We use a cubic water box, put the RNA in the center, and add Na+ and Cl- to neutralize the charge of the system. For the energy minimization process, we simulate 50000 steps, while for the NVT and NPT equilibration process, we simulate 100ps. For the last step of Production MD, simulation lasts 10ns. Except the situation of ligand-binding, we also probe the situation of no ligand-binding. We use PyMOL to delete the ligand of addA, and use the same method to run the simulation for 30ns.

Fig E. RMSD analysis of addA with or without ligand-binding. (a)RMSD of addA with ligand-binding. (b)RMSD of addA without ligand-binding.

First, we detect root-mean-square deviation(RMSD) of the addA during the whole simulation. As it shows in Fig E, with time passing by, RMSD of addA with ligand-binding doesn’t vary too much. It fluctuates around 1nm, indicating during the whole process of simulation, the structure of addA riboswitch is very stable. Due to ligand-binding, 2-AP help stabilize the structure of addA, so structure remains almost the same as the original state. However, as for addA without ligand-binding, RMSD becomes larger and larger as times goes by, meaning the structure changes a lot, which proves that without ligand-binding, the structure of addA tends to becomes loose and lose its function gradually.

Fig F. Average distance of L2 and L3.(a)The distance for addA with ligand-binding. (b) The distance for addA without ligand-binding. (c) Schematically shows the distance between A35 and U63 using PyMOL.

Then we explores how the distance between of loop L2 and L3 varies with time. We difines the distance of A35(in loop L2) and U63(in loop L3) as the distance between L2 and L3. We see the same regulation as we find in the RMSD(Fig F): the distance doesn’t change much for addA with ligand-binding, but changes a lot for addA without ligand-binding, which further proves our previous assumption.

Then, we further explore how the length of stabilizer affects addA riboswitch’s structure. We take 9bp and 150bp stabilizer as example to demonstrate such influence. First, we use SimRNA to predict the structure of addA riboswitch with 9bp stabilizer. As we can see in Fig G, the structure becomes linear, losing all the classic structure of P1, L2 and L3. And most of all, it loses ligand-binding site, which indicates it can’t correspond to ligand-bind, not to speak of the “ON” state or “OFF” state of riboswitch.

Fig G. The predict structure of addA riboswitch with 9bp stabilizer.

Next, we use RNAcompser to predict the structure of addA riboswitch with 150bp stabilizer. The certain sequence and secondary structure we use are given in the Fig H. The result of RNAcomposer are shown in Fig I.

Fig H. The sequence and secondary structure (in dot and bracket) of addA riboswitch with 150bp stabilizer.

Fig I. The predict tertiary structure of addA riboswitch with 150bp stabilizer. The area with color is addA riboswitch, while the white area represents 150bp stabilizer.

We then use autodock to dock 2-AP to the structure, to see if it can get the right conformation. We get 50 results of docking, and we choose the conformation where 2-AP exists between P1 stem and junction area, which is consistent with crystal structure. (Fig J)

Fig J.The docking result of addA riboswitch with 150bp stabilizer. The area with color is addA riboswitch, the red molecule represents ligand 2-AP, the white area represents 150bp stabilizer.

For the last step, we put the docking result into GROMACS, to see the conformational change over time. The simulation lasts for 1ns. The result can be seen in the following video. From the video, we can see The structure is gradually elongated, but the ligand stays at the proper place. So the whole structure doesn’t collapse, indicating 150bp might stablize the structure of addA riboswitch.

Click hereto watch the vedio.

The docking result of addA riboswitch with 150bp stabilizer by autodock. Using ApowerREC to do the screen shot.

For the future work, we will change the simulation conditions, so as to see when the distance between L2 and L3 increases, the structure changes from “ON” state to “OFF” state and visualize it with PyMOL.

Reference:

1. Tian Tian and Howard M. Salis.A predictive biophysical model of translational coupling to coordinate and control protein expression in bacterial operons.Nucleic Acids Research, 2015, Vol. 43, No. 14 7137–7151.

2. Howard M. Salis, Ethan A. Mirsky, and Christopher A. Voigt.Automated Design of Synthetic Ribosome Binding Sites to Precisely Control Protein Expression.Nat Biotechnol. 2009 October ; 27(10): 946–950.

3. Nicholas J. Porubsky, Brian R. Wolfe, Justin S. Bois, and Niles A. Pierce.NUPACK 3.2 User Guide

Analysis and Design of Nucleic Acid Structures, Devices, and Systems.California Institute of Technology March 7, 2017.

4. Amin Espah Borujeni, Dennis M. Mishler, Jingzhi Wang,etc.Automated physics-based design of synthetic riboswitches from diverse RNA aptamers.Nucleic Acids Research, 2015.

5. Amin Espah Borujeni, Anirudh S. Channarasappa and Howard M. Salis.Translation rate is controlled by coupled trade-offs between site accessibility, selective RNA unfolding and sliding at upstream standby sites.Nucleic Acids Research, 2014, Vol. 42, No. 4.

6. Uttam ROYMONDAL, Shibsankar DAS, and Satyabrata SAHOO.Predicting Gene Expression Level from Relative Codon Usage Bias: An Application to Escherichia coli Genome.DNA RESEARCH 16, 13–30, (2009).

7. Serganov A, Yuan YR, Pikovskaya O, et al. Structural basis for discriminative regulation of gene expression by adenine- and guanine-sensing mRNAs. Chem Biol. 2004;11(12):1729–1741.

8. James E. Johnson, Jr., Francis E. Reyes,1 Jacob T. Polaski,etc.B12 cofactors directly stabilize an mRNA regulatory switch.Nature. 2012 Dec 6; 492(7427): 133–137.

9. Jacob T. Polaski,1 Samantha M. Webster,1 James E. Johnson, Jr,etc. Cobalamin riboswitches exhibit a broad range of ability to discriminate between methylcobalamin and adenosylcobalamin.J Biol Chem. 2017 Jul 14; 292(28): 11650–11658.

10. Antony Lussier, Laurène Bastet, Adrien Chauvier,etc.A Kissing Loop Is Important for btuB Riboswitch Ligand Sensing and Regulatory Control.J Biol Chem. 2015 Oct 30; 290(44): 26739–26751.

11. Jörg Vogel and Ben F. Luisi.Hfq and its constellation of RNA.Nature.2011.

12. Young Je Lee, Tae Seok Moon.Design rules of synthetic non-coding RNAs in bacteria.Methods 143 (2018) 58–69.

13. Allison Hoynes-O’Connor and Tae Seok Moon.Development of Design Rules for Reliable Antisense RNA Behavior in E. coli.ACS Synth. Biol. 2016, 5, 1441-1454.

14. Young Je Lee, Soo-Jung Kim, Matthew B. Amrofell,etc.Establishing a Multivariate Model for Predictable Antisense RNA Mediated Repression.ACS Synth. Biol. 2019, 8, 45-56.

15. Rodrigo G, Landrain TE, Majer E, Daro`s J-A, Jaramillo A (2013) Full Design Automation of Multi-State RNA Devices to Program Gene Expression Using Energy-Based Optimization. PLoS Comput Biol 9(8): e1003172.

16. Guillermo Rodrigo, Thomas E. Landrain, and Alfonso Jaramillo.De novo automated design of small RNA circuits for engineering synthetic riboregulation in living cells.PNAS ∣ September 18, 2012 ∣ vol. 109 ∣ no. 38 ∣ 15271–15276.

17. Yuta Sakai,Koichi Abe, Saki Nakashima,etc.Improving the Gene-Regulation Ability of Small RNAs by Scaffffold. Engineering in Escherichia coli .ACS Synth. Biol. 2014, 3, 152-162

18. Yin P, Choi HM, Calvert CR, Pierce NA (2008) Programming biomolecular self-assembly

pathways. Nature 451:318–322.

19. OLOF ALLNéR, LENNART NILSSON, and ALESSANDRA VILLA.Loop–loop interaction in an adenine-sensing riboswitch: A molecular dynamics study.RNA 19:916–926; 2013.

20. Priyakumar U D , Jr M K . Role of the Adenine Ligand on the Stabilization of the Secondary and Tertiary Interactions in the Adenine Riboswitch[J]. Journal of Molecular Biology, 2010, 396(5):0-1438.

21. Rolf Lutz and Hermann Bujard. Independent and tight regulation of transcriptional units in Escherichia coli via the LacR/O, the TetR/O and AraC/I1-I2 regulatory elements. Nucleic Acids Research, 1997, Vol. 25, No. 6 1203–1210.