1. Protein-split Model

Introduction

Proteins are composed of relatively independent subunits and secondary structures. Since subtilisin-modified ribonuclease was split and reassembled in 1958, many proteins have been split successfully, as confirmed by experiments [1]. However, different split sites produce different pairs of fragments, and their reassembly efficiency differs. In our experiment we split Renilla luciferase, which is widely used in reporter systems; the two parts of the split Renilla luciferase must self-assemble under the guidance of an optical protein switch. The split site of Renilla luciferase we chose was taken from the experimental results of Paulmurugan R and Gambhir S S [2]. Finding such sites experimentally is time-consuming and laborious, so can we predict better sites computationally? We surveyed a large number of previous experimental results, which hint at some regularities (Table.1). The split site is closely related to the secondary structure of the protein, and a good site is generally located in a loop region. In addition, the location of split sites appears to be related to factors such as sequence conservation, length of the coil region, and center distance. Based on these rules, we established the Protein-split Model to predict better protein-split positions. We chose the nanoLuc protein from the iGEM library to verify our model. Our experiment showed that the two nanoLuc parts obtained at the split position predicted by our model combine better than those obtained at the original split position, which supports the model (Table.2 and Table.3).

Design

We hypothesize that protein-split sites are related to three factors: secondary structure (factor 1), length of the coil region (factor 2), and center distance (factor 3) (Fig.1). From previous studies we collected 65 split sites (Table.1).

Fig.1 Factor 1: a cut within a coil region is classified as index 1; likewise, a cut at the boundary between a coil region and an α-helix is classified as index 2, a cut at the boundary between a coil region and a β-strand as index 3, and a cut within an α-helix or β-strand as index 4. Factor 2: if the protein-split site is one of the first three types, the factor value is defined as the length of the coil region; if the site is of the fourth type, the coil length is 0. Factor 3: center distance is defined as the ratio of the shorter fragment length to half the full length of the protein: center distance = sublength/(0.5 × fulllength).
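The three factors can be computed for every candidate split position once the secondary structure is known. The sketch below is a minimal illustration, assuming the secondary structure has already been reduced to a three-letter string ('H' helix, 'E' strand, 'C' coil) and that a split at position k cuts between residues k-1 and k (0-based). The function names, the treatment of a direct helix/strand boundary as index 4, and the choice of the coil run adjacent to the cut for Factor 2 are our assumptions, not the team's code.

```python
def coil_run_length(ss, i):
    """Length of the contiguous run of coil residues ('C') that contains index i."""
    if ss[i] != "C":
        return 0
    start = i
    while start > 0 and ss[start - 1] == "C":
        start -= 1
    end = i
    while end < len(ss) - 1 and ss[end + 1] == "C":
        end += 1
    return end - start + 1


def split_factors(ss, k):
    """(Factor1, Factor2, Factor3) for a cut between residues k-1 and k of ss."""
    n = len(ss)
    if not 1 <= k <= n - 1:
        raise ValueError("split position must leave two non-empty fragments")
    left, right = ss[k - 1], ss[k]
    pair = {left, right}
    if pair == {"C"}:
        factor1 = 1          # cut inside a coil region
    elif pair == {"C", "H"}:
        factor1 = 2          # coil / alpha-helix boundary
    elif pair == {"C", "E"}:
        factor1 = 3          # coil / beta-strand boundary
    else:
        factor1 = 4          # inside a helix or strand (assumed also for H/E)
    if factor1 == 4:
        factor2 = 0
    else:
        # Length of the coil run touching the cut (hypothetical choice).
        coil_index = k - 1 if left == "C" else k
        factor2 = coil_run_length(ss, coil_index)
    # Center distance: shorter fragment length over half the full length.
    factor3 = min(k, n - k) / (0.5 * n)
    return factor1, factor2, factor3


def all_split_factors(ss):
    """Factor triples for every possible split position of the sequence."""
    return [split_factors(ss, k) for k in range(1, len(ss))]
```

For example, `all_split_factors("HHHHCCCEEEE")` returns ten triples, one per possible cut, which could then be fed to the classifier.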

We use a typical back-propagation (BP) neural network as a binary (dichotomous) classifier to find good split sites. The inputs are (I1, I2, I3) = (Factor1, Factor2, Factor3), and the output (O1, O2) = (1, 0) marks an appropriate site; if we can find a suitable set of weights {w1, w2}, the selection of protein-split sites is solved. For any protein, we only need to download its secondary structure from the PDB database; each amino acid residue then corresponds to a set of factors. Once the factors are fed into the neural network, it outputs a label: T (1): [1, 0]; F (0): [0, 1] (Fig. 2). We obtain a suitable set of weights {w1, w2} by training on the samples, and then carry out sensitivity and specificity analysis.
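A minimal back-propagation sketch of such a 3-input, 2-output classifier follows. The hidden-layer size of 4 matches the number of rows of w1 in the Result section; the tanh hidden activation, softmax output, cross-entropy loss, learning rate and the class name `TinyBPNet` are assumptions for illustration, and the toy samples used to exercise it are hypothetical, not the data in Table.1.

```python
import math
import random


def softmax(z):
    m = max(z)                       # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


class TinyBPNet:
    """3 inputs -> 4 tanh hidden units -> 2 softmax outputs, trained by back-propagation."""

    def __init__(self, n_in=3, n_hidden=4, n_out=2, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_hidden)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        z = [sum(h[j] * self.w2[j][k] for j in range(len(h))) + self.b2[k]
             for k in range(len(self.b2))]
        return h, softmax(z)

    def train_step(self, x, target, lr=0.1):
        """One SGD step on cross-entropy; target is [1, 0] (T) or [0, 1] (F).
        Returns the loss measured before the update."""
        h, y = self.forward(x)
        # Gradient of softmax + cross-entropy w.r.t. the output pre-activations.
        dz = [yk - tk for yk, tk in zip(y, target)]
        # Back-propagate through the old w2 and the tanh derivative 1 - h^2.
        dh = [(1.0 - h[j] ** 2) * sum(dz[k] * self.w2[j][k] for k in range(len(dz)))
              for j in range(len(h))]
        for j in range(len(h)):
            for k in range(len(dz)):
                self.w2[j][k] -= lr * dz[k] * h[j]
            for i in range(len(x)):
                self.w1[j][i] -= lr * dh[j] * x[i]
            self.b1[j] -= lr * dh[j]
        for k in range(len(dz)):
            self.b2[k] -= lr * dz[k]
        return -math.log(y[target.index(1)])
```

In practice the inputs would usually be scaled to comparable ranges before training, since Factor 2 (a coil length) can be much larger than Factor 3.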

Fig.2 Feedforward neural network (FNN)

Algorithm

See the linked file protein-split_supplement1.

Database

For the input data, see the linked file protein-split_supplement2; for the output data, see protein-split_supplement3.

Result

Weight matrix:

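w1 (rows: hidden neurons 1-4; columns: inputs I1, I2, I3)

I1	I2	I3
-0.0306	0.233	0.5924
0.6037	-0.6431	-0.5429
1.3603	-0.6622	0.5412
0.6023	-0.3277	-0.5616

b1 (hidden-layer biases)

0.7124	-0.464	-0.9178	-0.5768

w2 (rows: hidden neurons 1-4; columns: outputs O1, O2)

O1	O2
0.3485	-0.1632
-0.2581	0.1562
-0.8925	0.9408
-0.2699	-0.1891

b2 (output-layer biases)

0.7532	0.1274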
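To show how the reported parameters would be used, the sketch below runs one forward pass with the weights and biases listed above. It assumes w1 maps the three inputs to four hidden neurons and w2 maps those four neurons to the two outputs (as the matrix shapes suggest), with tanh hidden units and a softmax output. The activation functions and any input normalization used during training are not stated in the text, so the probabilities it produces are illustrative only; `predict` is a hypothetical name.

```python
import math

# Reported parameters: w1 is 4 hidden x 3 inputs, w2 is 4 hidden x 2 outputs.
W1 = [[-0.0306, 0.233, 0.5924],
      [0.6037, -0.6431, -0.5429],
      [1.3603, -0.6622, 0.5412],
      [0.6023, -0.3277, -0.5616]]
B1 = [0.7124, -0.464, -0.9178, -0.5768]
W2 = [[0.3485, -0.1632],
      [-0.2581, 0.1562],
      [-0.8925, 0.9408],
      [-0.2699, -0.1891]]
B2 = [0.7532, 0.1274]


def predict(factors):
    """(p_T, p_F) for one (Factor1, Factor2, Factor3) input.

    Assumes tanh hidden units and a softmax output; any input scaling
    applied during training is NOT reproduced here.
    """
    hidden = [math.tanh(sum(w * x for w, x in zip(row, factors)) + b)
              for row, b in zip(W1, B1)]
    logits = [sum(hidden[j] * W2[j][k] for j in range(4)) + B2[k] for k in range(2)]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return tuple(e / total for e in exps)
```

A site would be labelled T when the first probability exceeds the second.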

We selected 20 samples as the test set; the results are as follows.

Fig.3 Confusion matrix

Performance Evaluation Criteria
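The standard criteria for a binary classifier can be computed directly from the confusion matrix. This sketch treats "appropriate split site" as the positive class; the function name is hypothetical and the counts used to check it are made-up examples, not the values in Fig.3.

```python
def classification_metrics(tp, fn, fp, tn):
    """Confusion-matrix metrics; 'positive' = appropriate split site."""
    return {
        "sensitivity": tp / (tp + fn),            # true positive rate
        "specificity": tn / (tn + fp),            # true negative rate
        "false_negative_rate": fn / (tp + fn),
        "false_positive_rate": fp / (tn + fp),
        "precision": tp / (tp + fp) if tp + fp else float("nan"),
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }
```

When positives are rare, high sensitivity combined with moderate specificity still yields low precision, which is the situation discussed in the Conclusion.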

Conclusion

The model is highly sensitive but has low specificity, which produces a high false-positive rate: many inappropriate protein-split sites are classified as appropriate (Fig.3). However, in a protein with hundreds of amino acid residues, appropriate split sites are far fewer than inappropriate ones, so low specificity is hard to avoid. What matters is that very few appropriate sites are mistaken for inappropriate ones. The model provides several predicted appropriate sites to guide experiments, and it is useful as long as at least one of them is truly appropriate. Because our model has a very low false-negative rate, it is well suited to practical applications.

Experimental Verification

We chose nanoLuc from the iGEM library to split and test our model. Using the Protein-split Model, we calculated an optimal split site and predicted that the original split site is not ideal (Fig.4). The experimental results showed that the two parts of nanoLuc cut at our predicted split site (st-1 and sc-1) self-assemble much better than those cut at the original split site (st-2 and sc-2) (Table.2 and Table.3).

Fig.4 Predicted split site and the original split site in the sequence of nanoLuc.

Reference

[1]Richards F M. On the enzymic activity of subtilisin-modified ribonuclease[J]. Proceedings of the National Academy of Sciences of the United States of America, 1958, 44(2): 162.
[2]Paulmurugan R, Gambhir S S. Monitoring protein−protein interactions using split synthetic renilla luciferase protein-fragment-assisted complementation[J]. Analytical chemistry, 2003, 75(7): 1584-1589.
[3]Han T, Chen Q, Liu H. Engineered photoactivatable genetic switches based on the bacterium phage T7 RNA polymerase[J]. ACS synthetic biology, 2016, 6(2): 357-366.
[4]Ghosh I, Hamilton A D, Regan L. Antiparallel leucine zipper-directed protein reassembly: application to the green fluorescent protein[J]. Journal of the American Chemical Society, 2000, 122(23): 5658-5659.
[5]Taniuchi H, Anfinsen C B, Sodja A. Nuclease-T: an active derivative of staphylococcal nuclease composed of two noncovalently bonded peptide fragments[J]. Proceedings of the National Academy of Sciences of the United States of America, 1967, 58(3): 1235.
[6]de Prat Gay G, Ruiz-Sanz J, Davis B, et al. The structure of the transition state for the association of two fragments of the barley chymotrypsin inhibitor 2 to generate native-like protein: implications for mechanisms of protein folding[J]. Proceedings of the National Academy of Sciences, 1994, 91(23): 10943-10946.
[7]Bae C, Suchyna T M, Ziegler L, et al. Human PIEZO1 ion channel functions as a split protein[J]. PloS one, 2016, 11(3): e0151289.
[8]Johnsson N, Varshavsky A. Split ubiquitin as a sensor of protein interactions in vivo[J]. Proceedings of the National Academy of Sciences, 1994, 91(22): 10340-10344.
[9]Pelletier J N, Campbell-Valois F X, Michnick S W. Oligomerization domain-directed reassembly of active dihydrofolate reductase from rationally designed fragments[J]. Proceedings of the National Academy of Sciences, 1998, 95(21): 12141-12146.
[10]Galarneau A, Primeau M, Trudeau L E, et al. β-Lactamase protein fragment complementation assays as in vivo and in vitro sensors of protein–protein interactions[J]. Nature biotechnology, 2002, 20(6): 619.
[11]Massoud T F, Paulmurugan R, Gambhir S S. A molecularly engineered split reporter for imaging protein-protein interactions with positron emission tomography[J]. Nature medicine, 2010, 16(8): 921.
[12]Martell J D, Yamagata M, Deerinck T J, et al. A split horseradish peroxidase for the detection of intercellular protein–protein interactions and sensitive visualization of synapses[J]. Nature biotechnology, 2016, 34(7): 774.
[13]Kaihara A, Kawai Y, Sato M, et al. Locating a Protein−Protein Interaction in Living Cells via Split Renilla Luciferase Complementation[J]. Analytical chemistry, 2003, 75(16): 4176-4181.
[14]Paulmurugan R, Gambhir S S. Novel fusion protein approach for efficient high-throughput screening of small molecule–mediating protein-protein interactions in cells and living animals[J]. Cancer research, 2005, 65(16): 7413-7420.
[15]Kim S B, Kanno A, Ozawa T, et al. Nongenomic activity of ligands in the association of androgen receptor with SRC[J]. ACS chemical biology, 2007, 2(7): 484-492.
[16]Luker K E, Smith M C P, Luker G D, et al. Kinetics of regulated protein–protein interactions revealed with firefly luciferase complementation imaging in cells and living animals[J]. Proceedings of the National Academy of Sciences, 2004, 101(33): 12288-12293.
[17]Paulmurugan R, Umezawa Y, Gambhir S S. Noninvasive imaging of protein–protein interactions in living subjects by using reporter protein complementation and reconstitution strategies[J]. Proceedings of the National Academy of Sciences, 2002, 99(24): 15608-15613.
[18]Wehr M C, Laage R, Bolz U, et al. Monitoring regulated protein-protein interactions using split TEV[J]. Nature methods, 2006, 3(12): 985.

2 Microalgae movement model

Introduction

Our project aims to exploit the phototaxis of microalgae to build a drug carrier that can be directed to diseased cells under the guidance of light. We first analyzed the movement of microalgae in water using the Langevin equation. Based on our observation that microalgae pause periodically while swimming (movie), a pattern we call "breaststroke", we established the microalgae "breaststroke" model. In the Langevin equation, the viscosity of water can be replaced by that of blood to imitate movement in blood. In addition, since red blood cells account for about 45% of the total blood volume, collisions between microalgae and red blood cells change the energy of the microalgae. Because these collisions are random and complex, we assume that both red blood cells and microalgae are spheres; under this assumption, the energy loss is distributed according to a three-dimensional trigonometric function, from which we established the microalgae movement model. To our knowledge, our model is the first to systematically describe the movement of microalgae in both water and blood, which gives it originality and application value.

Ⅰ. The movement of microalgae in water

Design

2.1.1 Parameter

According to the Langevin equation, the motion of microalgae in water can be described as (Fig.1):

Rearranging, (1) can be simplified to:

We define:

Fig.1 The movement of microalgae in water

In our experiments we observed that microalgae do not move at a constant speed but pause periodically during movement (movie). We assume that the flagella generate a large thrust that gives the microalga an initial velocity within a very short time. The motion of microalgae in water is then:

Combining the equations:

Therefore the motion curve of the microalgae in the water can be determined.
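One way to write the motion curve explicitly is the derivation below. It assumes that the random force in the Langevin equation averages out, that the drag is linear in velocity with coefficient γ, and that each flagellar stroke sets the speed to v₀ at the start of a period T. The symbols γ, τ, T and D are introduced here for illustration and may differ from the team's own notation.

```latex
m\,\frac{dv}{dt} = -\gamma v, \qquad v(0) = v_0
\quad\Longrightarrow\quad
v(t) = v_0\, e^{-t/\tau}, \qquad \tau = \frac{m}{\gamma}

x(t) = \int_0^{t} v(u)\,du = v_0 \tau \left(1 - e^{-t/\tau}\right), \qquad 0 \le t < T

D = x(T) = v_0 \tau \left(1 - e^{-T/\tau}\right), \qquad \bar{v} = \frac{D}{T}
```

Repeating this curve every period T gives the periodic "breaststroke" pattern.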

2.1.2 Calculate

(1) Solve: The average velocity of the microalgae is measured experimentally, from which the distance travelled in one cycle can be obtained:

(2) Solve the expression for v₀: v₀ = 1.96 mm/s.
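Under exponential-decay assumptions (v(t) = v₀·e^(−t/τ) within each stroke period T, with τ = m/γ), the measured average speed can be inverted for v₀. The sketch below is illustrative: the function names are hypothetical, the Stokes-drag expression γ = 6πμr assumes a spherical cell, and none of the team's experimental parameter values are reproduced.

```python
import math


def stokes_relaxation_time(mass, radius, viscosity):
    """tau = m / gamma with Stokes drag gamma = 6*pi*mu*r (spherical cell assumed)."""
    return mass / (6.0 * math.pi * viscosity * radius)


def distance_per_cycle(v0, period, tau):
    """Distance covered in one stroke period when v(t) = v0 * exp(-t / tau)."""
    return v0 * tau * (1.0 - math.exp(-period / tau))


def initial_velocity(v_avg, period, tau):
    """Invert v_avg = distance_per_cycle(v0, period, tau) / period for v0."""
    return v_avg * period / (tau * (1.0 - math.exp(-period / tau)))
```

With the measured average speed, the observed period, and τ in consistent units, `initial_velocity` gives v₀; when τ is much longer than T, v₀ approaches the average speed.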

Fig.2 Periodic curve of microalgae movement in water

Ⅱ. The movement of microalgae in the blood

2.2.1 Parameter

2.2.2 Design

Because the swimming speed of microalgae depends on light intensity, we can control their speed in the blood by controlling the light intensity. We measured the average swimming speed of microalgae in water under different light intensities (Fig.3):

Fig.3 The relationship between the microalgae movement speeds and light intensities.

The equation obtained through experimental fitting:

The energy of microalgae in the blood without collision:

Next, we consider collisions in the blood (Fig.4):

Fig.4 Microalgae as a drug carrier in the blood against the flow of blood.

As shown in the figure above, microalgae swimming upstream in the blood collide with red blood cells, so their speed may change at any time. Red blood cells make up about 45% of the blood volume, which means that in any cross-section of a blood vessel, red blood cells also occupy about 45% of the area (Fig.5).

Fig.5 Cross-section of a blood vessel

Since red blood cells are randomly distributed in blood vessels, we assume that they are also randomly distributed in every small area of every cross section.

Thus, the equivalent number of red blood cells colliding with the cross-section of one microalga is:

The worst case is a head-on collision between a microalga and a red blood cell, in which the microalga loses the most energy. We assume that the red blood cell's energy is 0 after the collision; by energy conservation, the energy lost by the microalga equals the energy lost by the red blood cell. During movement, however, the collision angle between microalgae and red blood cells is random. Therefore, we use three-dimensional trigonometric functions to describe the energy loss of microalgae and red blood cells (Fig.6).

Fig.6 The energy loss of microalgae
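As a sketch of the angular averaging, assume (our assumption, not stated in the text) that the fraction of energy lost in a collision at angle θ from head-on is cos²θ, so a head-on collision (θ = 0) loses the most and a grazing one loses none, and that incoming directions are uniform over a hemisphere. The double integral over θ and φ then gives the mean loss fraction, which is 1/3 for this choice. The function names are hypothetical.

```python
import math


def hemisphere_average(f, n_theta=200, n_phi=64):
    """Solid-angle average of f(theta, phi) over a hemisphere of directions.

    theta is measured from the head-on direction (0 .. pi/2), phi is the
    azimuth (0 .. 2*pi). The double integral of f * sin(theta) is evaluated
    with the midpoint rule and divided by the hemisphere's solid angle 2*pi.
    """
    d_theta = (math.pi / 2) / n_theta
    d_phi = (2 * math.pi) / n_phi
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        weight = math.sin(theta) * d_theta * d_phi
        for j in range(n_phi):
            phi = (j + 0.5) * d_phi
            total += f(theta, phi) * weight
    return total / (2 * math.pi)


def energy_loss_fraction(theta, phi):
    """Assumed loss fraction: only the head-on velocity component is lost."""
    return math.cos(theta) ** 2
```

Replacing `energy_loss_fraction` with a different angular law only changes the integrand; the averaging step stays the same.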

Using a double integral, we obtained the energy lost at each cross-section:

The energy of microalgae after collision in the blood:

For the microalgae to move against the blood flow, the following must hold:

2.2.3 Calculate

(1) Calculate N: substituting the radii of red blood cells and microalgae gives the following equation:
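A minimal sketch of this step, assuming (as in the text) spherical cells and a red-blood-cell area fraction equal to the 45% volume fraction: N is the area fraction times the ratio of the microalga's cross-sectional area to that of one red blood cell. The function name is hypothetical, and the radii must be supplied by the user because they are not given here.

```python
def equivalent_rbc_count(rbc_area_fraction, alga_radius, rbc_radius):
    """Equivalent number of red blood cells in one microalga cross-section.

    N = phi * (pi * R**2) / (pi * r**2) = phi * (R / r)**2,
    where phi is the area fraction occupied by red blood cells,
    R the microalga radius and r the red blood cell radius (same units).
    """
    if alga_radius <= 0 or rbc_radius <= 0:
        raise ValueError("radii must be positive")
    return rbc_area_fraction * (alga_radius / rbc_radius) ** 2
```

For example, `equivalent_rbc_count(0.45, R, r)` evaluates N for chosen radii R and r.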

(2) Calculate Q, the energy lost per unit time at each blood-vessel cross-section:

(3) We obtain the relationship between the light intensity and the different blood flow velocities we set (Fig.7):

Fig.7 Light-intensity thresholds that allow microalgae to swim upstream in the blood at different blood flow velocities.
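Once the fitted speed-light relation and the collision losses are known, the minimum light intensity for a given blood flow velocity can be found by bisection, provided the fitted speed increases with light intensity. The sketch below is hypothetical: `speed_at_intensity` is a placeholder for the fitted equation (not reproduced here), the search interval is arbitrary, and the energy loss is taken as a single fraction, with the remaining speed scaling as the square root of the remaining kinetic energy.

```python
import math


def min_light_intensity(speed_at_intensity, blood_velocity, loss_fraction,
                        lo=0.0, hi=1e4, tol=1e-9):
    """Smallest light intensity in [lo, hi] at which the microalga's
    post-collision speed still exceeds the blood flow velocity.

    speed_at_intensity(I): fitted swimming speed (assumed increasing in I).
    loss_fraction: fraction of kinetic energy lost to collisions.
    Returns None if even `hi` is not enough.
    """
    def can_swim_upstream(intensity):
        remaining = speed_at_intensity(intensity) * math.sqrt(1.0 - loss_fraction)
        return remaining > blood_velocity

    if not can_swim_upstream(hi):
        return None          # no threshold inside the search interval
    if can_swim_upstream(lo):
        return lo
    while hi - lo > tol:     # invariant: lo fails, hi succeeds
        mid = 0.5 * (lo + hi)
        if can_swim_upstream(mid):
            hi = mid
        else:
            lo = mid
    return hi
```

Evaluating this for each blood flow velocity of interest would trace out a threshold curve of the kind shown in Fig.7.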

2.2.4 Conclusion

(1) To our knowledge, we are the first to describe the periodic motion of microalgae in water with a precise mathematical expression.

(2) We obtained the minimum light-intensity thresholds required to make microalgae swim as intended at different blood flow velocities. In the future, microalgae may act as drug carriers, and our model may provide useful data and rules for clinical trials.

3 Protein-interact model

Introduction

Protein interactions are widespread in all living systems, and their importance is self-evident. In our design, red light stimulates the binding of N-hrluc-PhyB and C-hrluc-PIF3, which then produces blue light that guides the microalgae (Fig.1). Therefore, whether our design is reasonable and whether the experiment succeeds depend on the degree of binding between N-hrluc-PhyB and C-hrluc-PIF3. Based on chemical thermodynamics, we establish, in two steps, the relationship between the binding rate, the protein concentrations, and the affinity constant. In addition, because PhyB/PIF3 is a commonly used light-controlled protein dimerization switch, this model can not only guide our own experiment but also provide a quantitative model for the field of optogenetics.

Fig.1 A: Binding of N-hrluc and C-hrluc without the assistance of the optical protein switch (PhyB & PIF3). B: Binding of N-hrluc and C-hrluc with the assistance of the optical protein switch.

Design

The molecular reaction in solution can be divided into two steps: first, red light stimulates the binding of PhyB and PIF3; second, the binding of PhyB and PIF3 facilitates the binding of the two parts (N-hrluc and C-hrluc) of Renilla luciferase. Our model is divided into the same two steps.

3.1 The first part: only the binding of PhyB protein and PIF3 protein is considered.

To find out how the initial concentrations of PhyB and PIF3 affect their binding rate, we considered only the binding of PhyB and PIF3 and, from the binding equilibrium of the two proteins, derived an explicit equation relating the initial protein concentrations to the binding rate.

3.1.1 Parameter

Symbol	Meaning
A	Free concentration of PhyB
B	Free concentration of PIF3
K_a	Affinity constant
K_d	Dissociation constant
η	Binding rate of PhyB and PIF3
A₀	Initial concentration of PhyB
B₀	Initial concentration of PIF3
AB	Concentration of the PhyB-PIF3 complex

3.1.2 Hypothesis:

(1) After binding, the solution contains three species: free PhyB, free PIF3, and the PhyB-PIF3 complex. That is, mass is conserved and no protein is lost during the reaction.
(2) Apart from the initial protein concentrations, all other factors affect the binding of the two proteins equally.

3.1.3 Modeling

Considering the binding of PhyB and PIF3 in solution, the reversible reaction is A + B ⇌ AB.

Let the free concentration of PhyB be A, the free concentration of PIF3 be B, and the concentration of the PhyB-PIF3 complex be AB. The binding (affinity) constant is K_a = AB/(A·B).

The dissociation constant is K_d = 1/K_a = (A·B)/AB.

Then the expression of the binding rate is

Combining (2), (3) and (4):

We assume that the initial concentration of PhyB is higher than that of PIF3, that is,

Combining (5) and (6):

This gives an explicit equation relating the initial concentrations of the two proteins to their binding rate. From (7):

Under light with a wavelength of 650 nm at a frequency of 20 Hz, K_d = 500 nM [1] (Fig.1):

Fig.1 Binding Affinity(Kd) of PIF3-PhyB

The main results are as follows: (1) Similar to the barrel effect (Liebig's law of the minimum), the binding rate is determined by the protein with the lower concentration; from the diagram, the minimum threshold concentration required for both proteins can be estimated to be mol/L.

Fig.2 The binding rate of PhyB and PIF3 is related to the concentration of PhyB and PIF3

Fig.3 The trend of binding rate when PhyB and PIF3 are at the same concentration.
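The equilibrium behind these curves can be sketched numerically. This sketch assumes the simple 1:1 reaction A + B ⇌ AB with K_d = A·B/AB and mass conservation A = A₀ − AB, B = B₀ − AB, and it defines the binding rate as the bound fraction of the lower-concentration protein; that definition and the function names are our assumptions, since equations (4) to (7) are not reproduced here.

```python
import math


def complex_concentration(a0, b0, kd):
    """Equilibrium concentration of AB for A + B <-> AB.

    Uses K_d = A * B / AB with A = a0 - AB and B = b0 - AB, which gives
    AB**2 - (a0 + b0 + kd) * AB + a0 * b0 = 0. The smaller root is the
    physical one; it is written as 2*a0*b0 / (s + sqrt(disc)) to avoid
    cancellation when a0*b0 is small.
    """
    s = a0 + b0 + kd
    disc = (a0 - b0) ** 2 + kd * (2.0 * (a0 + b0) + kd)  # equals s**2 - 4*a0*b0
    return 2.0 * a0 * b0 / (s + math.sqrt(disc))


def binding_rate(a0, b0, kd):
    """Bound fraction of the lower-concentration protein (assumed definition)."""
    return complex_concentration(a0, b0, kd) / min(a0, b0)
```

With K_d = 500 nM as quoted above and concentrations in nM, `binding_rate(A0, B0, 500.0)` evaluates the binding rate for a chosen pair of initial concentrations; the result depends mostly on the smaller of the two, consistent with the barrel effect.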

(2) If we consider the binding rate of A and B without red light (in the dark), then

Under red-light excitation, this value decreases drastically because of the conformational change of the protein; only when the value under red light is much smaller than the value in the dark can the binding rate in the dark be ignored.

Fig.4 The net increase of the binding rate under red light eliminated the dark background effect.
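The net increase can be sketched as the equilibrium binding rate under red light minus the same quantity in the dark. This assumes the 1:1 equilibrium A + B ⇌ AB with mass conservation, the binding rate defined as the bound fraction of the lower-concentration protein, and separate dissociation constants for the two light states; the dark-state value is not given in the text, so any number used for it is hypothetical, as are the function names.

```python
import math


def binding_rate(a0, b0, kd):
    """Bound fraction of the limiting protein at equilibrium (assumed definition)."""
    s = a0 + b0 + kd
    disc = (a0 - b0) ** 2 + kd * (2.0 * (a0 + b0) + kd)  # = s**2 - 4*a0*b0
    return 2.0 * a0 * b0 / (s + math.sqrt(disc)) / min(a0, b0)


def net_light_induced_binding(a0, b0, kd_red, kd_dark):
    """Binding rate under red light minus the dark background."""
    return binding_rate(a0, b0, kd_red) - binding_rate(a0, b0, kd_dark)
```

When the dark-state constant is much larger than the red-light one, the dark term becomes negligible and the net increase approaches the red-light binding rate, which is the condition stated above.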

3.2 The second part: the combination of N-hrluc and C-hrluc.

When the optical switch is engaged, N-hrluc and C-hrluc are brought closer together, which is equivalent to increasing their concentration. We explore the effect of the optical switch on protein binding by replacing the switch with an equivalent concentration.

3.2.1 Parameter

Symbol	Meaning
B_i,i=1,2	Represent N-hrluc and C-hrluc protein parts respectively.
A_i,i=1,2	Represent PhyB and PIF3 respectively.
d	Distance between B₁ and B₂

3.2.2 Hypothesis:

(1) Adding the optical switch does not affect the conformation or properties of the original protein.
(2) After the optical switch binds, the original proteins are pulled closer together, that is, the distance between them decreases.

3.2.3 Modeling:

As shown in the following figure, optical switches are fused to the N-hrluc and C-hrluc proteins. Under illumination, the optical switches bind to each other. Because the conformations of B1 and B2 do not change, their affinity constant is unchanged; however, B1 and B2 are brought closer together, which is equivalent to an increase in concentration.

Fig.5 A: Binding of N-hrluc and C-hrluc without the assistance of the protein switch (PhyB & PIF3). B: Binding of N-hrluc and C-hrluc with the assistance of the protein switch.

The equivalent concentration of a particle can be obtained as follows:

The volume can be defined by the spacing between proteins, that is,

So:

Based on the size of the proteins, in the least ideal case the distance between N-hrluc and C-hrluc is 16 nm. Assuming spacings of 15 nm, 10 nm and 5 nm, the equivalent concentrations are mol/L, mol/L and mol/L, respectively.

Fig.6 N-Luc and C-Luc are held within the spatial scale of the protein switch, which means a higher binding rate than when they are dispersed in solution.

As long as the affinity constant between the two parts of the luciferase is less than , the binding can be considered to be 100%.
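The equivalent-concentration argument can be checked numerically. Because the volume formula itself is not shown here, this sketch assumes that one molecule is confined to a cube whose edge equals the spacing d, so c = 1/(N_A·d³); it also assumes that the bound fraction for such an effective concentration is c/(c + K_d), a common effective-molarity treatment. Both are illustrative assumptions, and the function names are hypothetical.

```python
AVOGADRO = 6.02214076e23  # 1/mol (exact SI value)


def equivalent_concentration(spacing_m):
    """Concentration (mol/L) of one molecule confined to a cube of edge spacing_m metres."""
    volume_litres = spacing_m ** 3 * 1000.0  # 1 m^3 = 1000 L
    return 1.0 / (AVOGADRO * volume_litres)


def fraction_bound(c_eq, kd):
    """Bound fraction for an effective concentration c_eq (same units as kd)."""
    return c_eq / (c_eq + kd)
```

Halving the spacing raises the equivalent concentration eightfold under the cube assumption, which is why a closer switch pushes the bound fraction toward 100%.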

3.3 Conclusion:

Our system is mainly limited by the optical switch, so Chlorella vulgaris needs to express enough PhyB and PIF3.

3.4 Sensitivity analysis:

If, instead of introducing plasmids, we add Renilla luciferase and its substrate (Renilla luciferin) directly to the culture medium, we can stimulate the movement of the microalgae under red light and determine the threshold concentration of Renilla luciferase. This is what we want to verify through experiments.

3.5 Prediction:

If mol/L of Renilla luciferase can make the microalgae move, then according to our model, assuming that under the same light conditions the value under red light is much smaller than that in the dark, and that the affinity constant between the two parts of Renilla luciferase is less than , this is equivalent to simultaneously expressing mol/L N-hrluc-PhyB and mol/L C-hrluc-PIF3 in the microalgae.

Reference

[1]Levskaya A, Weiner O D, Lim W A, et al. Spatiotemporal control of cell signalling using a light-switchable protein interaction[J]. Nature, 2009, 461(7266): 997.

Team:DUT China B/Model
