1. Protein-split Model

Introduction

Proteins are composed of relatively independent subunits and secondary-structure elements. Since subtilisin-modified ribonuclease was split and reassembled in 1958, many proteins have been split successfully, as confirmed by experiment[1]. However, different split sites produce different pairs of fragments, and the reassembly efficiency differs accordingly. Renilla luciferase, which is widely used in reporter systems, is split in our experiment, and its two fragments must self-assemble under the guidance of the optical protein switch. The split site of Renilla luciferase was chosen from the experimental results of Paulmurugan R and Gambhir S S[2]. Finding split sites by experiment is time-consuming and laborious, so can better sites be predicted computationally? We surveyed a large number of previous experimental results, which hint at some regularities (Table.1). The split site is closely related to the secondary structure of the protein, and suitable sites are generally located in loop regions. In addition, the location of the split site seems to be related to factors such as sequence conservation, length of coil regions, and center distance. Based on these rules, the Protein-split Model was established to predict better protein-split positions. We chose the nanoLuc protein from the iGEM library to verify our model. In our experiment, the two nanoLuc fragments produced at the split position predicted by our model recombined better than those produced at the original split position, which supports the model (Fig.4, Table.2, Table.3).

Design

We propose that protein-split sites are related to three factors: secondary structure (Factor 1), length of the coil region (Factor 2) and center distance (Factor 3) (Fig.1). We collected 65 split sites (Table.1).

Factor 1: A cut inside a coil region is classified as index 1, a cut at the boundary between a coil region and an α-helix as index 2, a cut at the boundary between a coil region and a β-strand as index 3, and a cut inside an α-helix or β-strand as index 4.

Factor 2: If the split site is of index 1, 2 or 3, the factor value is defined as the length of the coil region. If the split site is of index 4, the length of the coil region is taken as 0.

Factor 3: Center distance is defined as the ratio of the shorter fragment length to half of the full length of the protein (center distance = sublength / (0.5 × full length)).
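The three factors can be computed mechanically from a secondary-structure string. The sketch below is one possible reading of the definitions above, not our training code: it assumes a per-residue string in which 'C' marks coil, 'H' α-helix and 'E' β-strand, treats a split site as the cut between residues i and i+1, and takes the coil length at a boundary to be that of the adjacent coil run. The function names and the encoding are illustrative assumptions.

```python
from itertools import groupby


def coil_runs(ss):
    """For every residue, the length of the coil run it belongs to (0 if not coil)."""
    lengths = []
    for label, run in groupby(ss):
        n = len(list(run))
        lengths.extend([n if label == "C" else 0] * n)
    return lengths


def split_site_factors(ss, i):
    """Return (Factor 1, Factor 2, Factor 3) for a cut between residues i and i+1.

    ss is a secondary-structure string using the assumed encoding
    'C' = coil, 'H' = alpha-helix, 'E' = beta-strand; i is 1-based.
    """
    if not 1 <= i < len(ss):
        raise ValueError("cut position must lie strictly inside the sequence")
    pair = {ss[i - 1], ss[i]}  # the two residues flanking the cut

    # Factor 1: secondary-structure class of the cut position.
    if pair == {"C"}:
        factor1 = 1            # inside a coil region
    elif pair == {"C", "H"}:
        factor1 = 2            # coil / alpha-helix boundary
    elif pair == {"C", "E"}:
        factor1 = 3            # coil / beta-strand boundary
    else:
        factor1 = 4            # inside a helix or strand (helix/strand junctions
                               # are also placed here, which is an assumption)

    # Factor 2: length of the coil region touching the cut, 0 for index 4.
    runs = coil_runs(ss)
    factor2 = max(runs[i - 1], runs[i]) if factor1 != 4 else 0

    # Factor 3: shorter fragment length over half of the full length.
    shorter = min(i, len(ss) - i)
    factor3 = shorter / (0.5 * len(ss))
    return factor1, factor2, factor3
```

Calling `split_site_factors` for every cut position of a protein gives one factor triple per candidate site, which is the input format the neural network expects.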

We use a typical back-propagation (BP) neural network as a binary classifier to help us find split sites. The inputs are (I1, I2, I3) = (Factor1, Factor2, Factor3), and a suitable site should give the output (O1, O2) = (1, 0). If we can find a set of weights {w1, w2} that achieves this, the selection of protein-split sites is solved. For any protein, we only need to download its secondary structure from the PDB database; each amino acid residue then corresponds to one set of factors. When the factors are fed into the neural network, it outputs a label, T: 1: [1, 0] or F: 0: [0, 1] (Fig.2). We find an appropriate set of weights {w1, w2} by training on the samples, and then analyze the sensitivity and specificity of the model.

Fig.2 Feedforward neural network(FNN)

Algorithm

See the linked protein-split_supplement1.

Database

Input: see the linked protein-split_supplement2. Output: see the linked protein-split_supplement3.

Result

Weight matrix:

w1

1	2	3
-0.0306	0.233	0.5924
0.6037	-0.6431	-0.5429
1.3603	-0.6622	0.5412
0.6023	-0.3277	-0.5616

b1

0.7124	-0.464	-0.9178	-0.5768

w2

1	2
0.3485	-0.1632
-0.2581	0.1562
-0.8925	0.9408
-0.2699	-0.1891

b2

0.7532	0.1274
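To show how the reported weights could be used, here is a minimal forward pass through a 3-4-2 network. Several details are assumptions, since they are not stated above: each row of w1 is read as one hidden unit with columns for inputs 1 to 3, each row of w2 as one hidden unit with columns for outputs 1 and 2, the hidden activation is taken to be a sigmoid, the output layer a softmax, and no input normalization is applied. If the trained network used different choices, the predicted labels will differ.

```python
import math

# Weights copied from the tables above; the orientation is an assumption.
W1 = [  # 4 hidden units x 3 inputs (Factor1, Factor2, Factor3)
    [-0.0306, 0.2330, 0.5924],
    [0.6037, -0.6431, -0.5429],
    [1.3603, -0.6622, 0.5412],
    [0.6023, -0.3277, -0.5616],
]
B1 = [0.7124, -0.4640, -0.9178, -0.5768]
W2 = [  # 4 hidden units x 2 outputs (O1, O2)
    [0.3485, -0.1632],
    [-0.2581, 0.1562],
    [-0.8925, 0.9408],
    [-0.2699, -0.1891],
]
B2 = [0.7532, 0.1274]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(factors):
    """Return [O1, O2] for one (Factor1, Factor2, Factor3) triple."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(row, factors)) + b)
        for row, b in zip(W1, B1)
    ]
    logits = [
        sum(hidden[j] * W2[j][k] for j in range(len(hidden))) + B2[k]
        for k in range(len(B2))
    ]
    # Softmax (assumed output activation), shifted for numerical stability.
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(factors):
    """Label 'T' (appropriate site, [1, 0]) or 'F' (inappropriate site, [0, 1])."""
    o1, o2 = forward(factors)
    return "T" if o1 >= o2 else "F"
```

For example, `classify((1, 4, 1.0))` labels a cut in the middle of a four-residue coil at the center of the protein.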

We drew 20 samples as the test set, and the results are as follows.

Fig.3 Confusion matrix

Performance Evaluation Criteria

Conclusion

The model is highly sensitive but has low specificity, which produces a high rate of false positives: many inappropriate protein-split sites are classified as appropriate (Fig.3). However, in a protein with hundreds of amino acid residues, appropriate split sites are far fewer than inappropriate ones, so low specificity is hard to avoid. What matters is that very few appropriate sites are mistaken for inappropriate ones. The model provides several predicted sites to guide experiments, and it is useful as long as at least one of them is truly appropriate. Because the model has a very low false-negative rate, it is well suited to practical applications.
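The terms used in this conclusion follow the usual confusion-matrix definitions. A small sketch of how they can be computed from test labels is given below; the label vectors in the usage note are hypothetical and are not our test results.

```python
def confusion_counts(y_true, y_pred):
    """Count (TP, FP, TN, FN) for binary labels: 1 = appropriate site, 0 = inappropriate."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """Fraction of truly appropriate sites that the model recovers."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of truly inappropriate sites that the model rejects."""
    return tn / (tn + fp)
```

With hypothetical labels `y_true = [1, 1, 0, 0, 0, 0]` and `y_pred = [1, 1, 1, 1, 0, 0]`, no appropriate site is missed (sensitivity 1.0) while half of the inappropriate sites are accepted (specificity 0.5), which is the pattern described above.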

Experimental Verification

We chose nanoLuc from the iGEM library to split and to test our model. Using the protein-split model, we calculated an optimal split site and predicted that the original split site is not ideal (Fig.4). The experimental results showed that the two nanoLuc fragments cut at our predicted split site (st-1 and sc-1) self-assemble much better than those cut at the original split site (st-2 and sc-2) (Table.2 and Table.3).

Fig.4 Predicted split site and the original split site in the sequence of nanoLuc.

Reference

[1]Richards F M. On the enzymic activity of subtilisin-modified ribonuclease[J]. Proceedings of the National Academy of Sciences of the United States of America, 1958, 44(2): 162.
[2]Paulmurugan R, Gambhir S S. Monitoring protein–protein interactions using split synthetic renilla luciferase protein-fragment-assisted complementation[J]. Analytical chemistry, 2003, 75(7): 1584-1589.
[3]Han T, Chen Q, Liu H. Engineered photoactivatable genetic switches based on the bacterium phage T7 RNA polymerase[J]. ACS synthetic biology, 2016, 6(2): 357-366.
[4]Ghosh I, Hamilton A D, Regan L. Antiparallel leucine zipper-directed protein reassembly: application to the green fluorescent protein[J]. Journal of the American Chemical Society, 2000, 122(23): 5658-5659.
[5]Taniuchi H, Anfinsen C B, Sodja A. Nuclease-T: an active derivative of staphylococcal nuclease composed of two noncovalently bonded peptide fragments[J]. Proceedings of the National Academy of Sciences of the United States of America, 1967, 58(3): 1235.
[6]de Prat Gay G, Ruiz-Sanz J, Davis B, et al. The structure of the transition state for the association of two fragments of the barley chymotrypsin inhibitor 2 to generate native-like protein: implications for mechanisms of protein folding[J]. Proceedings of the National Academy of Sciences, 1994, 91(23): 10943-10946.
[7]Bae C, Suchyna T M, Ziegler L, et al. Human PIEZO1 ion channel functions as a split protein[J]. PloS one, 2016, 11(3): e0151289.
[8]Johnsson N, Varshavsky A. Split ubiquitin as a sensor of protein interactions in vivo[J]. Proceedings of the National Academy of Sciences, 1994, 91(22): 10340-10344.
[9]Pelletier J N, Campbell-Valois F X, Michnick S W. Oligomerization domain-directed reassembly of active dihydrofolate reductase from rationally designed fragments[J]. Proceedings of the National Academy of Sciences, 1998, 95(21): 12141-12146.
[10]Galarneau A, Primeau M, Trudeau L E, et al. β-Lactamase protein fragment complementation assays as in vivo and in vitro sensors of protein–protein interactions[J]. Nature biotechnology, 2002, 20(6): 619.
[11]Massoud T F, Paulmurugan R, Gambhir S S. A molecularly engineered split reporter for imaging protein-protein interactions with positron emission tomography[J]. Nature medicine, 2010, 16(8): 921.
[12]Martell J D, Yamagata M, Deerinck T J, et al. A split horseradish peroxidase for the detection of intercellular protein–protein interactions and sensitive visualization of synapses[J]. Nature biotechnology, 2016, 34(7): 774.
[13]Kaihara A, Kawai Y, Sato M, et al. Locating a Protein–Protein Interaction in Living Cells via Split Renilla Luciferase Complementation[J]. Analytical chemistry, 2003, 75(16): 4176-4181.
[14]Paulmurugan R, Gambhir S S. Novel fusion protein approach for efficient high-throughput screening of small molecule–mediating protein-protein interactions in cells and living animals[J]. Cancer research, 2005, 65(16): 7413-7420.
[15]Kim S B, Kanno A, Ozawa T, et al. Nongenomic activity of ligands in the association of androgen receptor with SRC[J]. ACS chemical biology, 2007, 2(7): 484-492.
[16]Luker K E, Smith M C P, Luker G D, et al. Kinetics of regulated protein–protein interactions revealed with firefly luciferase complementation imaging in cells and living animals[J]. Proceedings of the National Academy of Sciences, 2004, 101(33): 12288-12293.
[17]Paulmurugan R, Umezawa Y, Gambhir S S. Noninvasive imaging of protein–protein interactions in living subjects by using reporter protein complementation and reconstitution strategies[J]. Proceedings of the National Academy of Sciences, 2002, 99(24): 15608-15613.
[18]Wehr M C, Laage R, Bolz U, et al. Monitoring regulated protein-protein interactions using split TEV[J]. Nature methods, 2006, 3(12): 985.

2. Microalgae movement model

Introduction

Our project aims to exploit the phototaxis of microalgae to construct a drug carrier that can be directed to diseased cells under the guidance of light. We first analyzed the movement of microalgae in water, using the Langevin equation to describe it. Based on our observation that the microalgae pause periodically while moving (movie), which we call "breaststroke", we established the microalgae "breaststroke" model. In the Langevin equation, the viscosity of water can be replaced by the viscosity of blood to describe the movement in blood. In addition, since red blood cells account for about 45% of the total blood volume, collisions between microalgae and red blood cells change the energy of the microalgae. Because these collisions are random and complex, we assume that both red blood cells and microalgae are spheres. Under this assumption, the energy loss follows a three-dimensional trigonometric distribution, and on this basis we established the microalgae movement model. To our knowledge, our model is the first to systematically describe the movement of microalgae in water and in blood, which gives it both originality and practical value.

Ⅰ. The movement of microalgae in water

Design

Parameters

According to the Langevin equation, the motion of microalgae in water can be described as (Fig.1):

By formula transformation, (1) can be simplified as:

Define:

Fig.1 The movement of microalgae in water

In the experiment we observed that microalgae do not move at a constant speed but pause periodically during movement (movie). We assume that the flagella of the microalgae generate a large thrust, which gives the microalgae an initial velocity within a very short time. The motion of microalgae in water:

Solving the equations simultaneously:

Therefore the motion curve of the microalgae in the water can be determined.
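One way to make the pause-and-glide picture concrete is sketched here. It assumes the deterministic part of the Langevin equation with a linear (Stokes-like) drag, an impulsive flagellar stroke that resets the velocity to v_0 at the start of every cycle, and a negligible random force over one cycle. The symbols m, γ, τ, T, s and \bar{v} are introduced for this illustration and may not match the notation of the equations on this page.

```latex
\begin{aligned}
m\,\frac{dv}{dt} &= -\gamma v, \qquad v(0) = v_0
  && \text{(glide phase with drag coefficient } \gamma\text{)} \\
v(t) &= v_0\, e^{-t/\tau}, \qquad \tau = \frac{m}{\gamma} \\
s &= \int_0^{T} v(t)\,dt = v_0\,\tau\left(1 - e^{-T/\tau}\right)
  && \text{(distance in one cycle of period } T\text{)} \\
\bar{v} &= \frac{s}{T}
  \;\;\Longrightarrow\;\;
  v_0 = \frac{\bar{v}\,T}{\tau\left(1 - e^{-T/\tau}\right)}
\end{aligned}
```

Under these assumptions, a measured average velocity and cycle period fix the initial velocity, and the velocity curve within each cycle is an exponential decay that restarts at every stroke.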

Calculation

(1) The average velocity of microalgae can be obtained from the experiment, and from it the distance travelled in one cycle:

(2) Solving the expression gives v⁰ = 1.96 mm/s.

Fig.2 Periodic curve of microalgae movement in water

Ⅱ. The movement of microalgae in the blood

Parameters

Design

Since the swimming speed of microalgae depends on light intensity, we can control their speed in the blood by controlling the light intensity. Through experiments, we measured the average speed of microalgae in water under different light conditions (Fig.3):

Fig.3 The relationship between the microalgae movement speeds and light intensities.

The equation obtained through experimental fitting:

The energy of microalgae in the blood without collision:

Next we consider collisions in the blood (Fig.4):

Fig.4 Microalgae as a drug carrier in the blood against the flow of blood.

As shown in the figure above, microalgae swimming upstream in the blood collide with red blood cells, so their speed may change at any time. Red blood cells make up about 45% of the blood by volume, which means that in any cross-section of a blood vessel, red blood cells occupy about 45% of the area (Fig.5).

Fig.5 Cross-section of the blood vessels

Since red blood cells are randomly distributed in blood vessels, we assume that they are also randomly distributed in every small area of every cross section.

Thus, the equivalent number of red blood cells colliding with the cross-section of one microalga is:
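One plausible reading of this step, given only as a sketch, is that the equivalent number N equals the red-blood-cell area fraction multiplied by the ratio of the projected areas of a microalga and a red blood cell, both treated as spheres. That reading and the function name are assumptions, and the radii in the usage note are illustrative values rather than the ones used in our calculation.

```python
def equivalent_collision_count(r_algae_um, r_rbc_um, area_fraction=0.45):
    """Equivalent number of red blood cells met by one microalga cross-section.

    Assumed model: N = area_fraction * (pi * r_algae^2) / (pi * r_rbc^2),
    with both cells treated as spheres and radii given in micrometres.
    """
    if r_algae_um <= 0 or r_rbc_um <= 0:
        raise ValueError("radii must be positive")
    return area_fraction * (r_algae_um / r_rbc_um) ** 2
```

For instance, `equivalent_collision_count(5.0, 4.0)` evaluates the estimate for a hypothetical microalga of radius 5 µm and a red blood cell of radius 4 µm.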

The worst case is a head-on collision between a microalga and a red blood cell, in which the microalga loses the most energy. We assume that the energy of the red blood cell is 0 after the collision; by energy conservation, the microalga then loses an amount of energy equal to that of the red blood cell. However, the collision angle between microalgae and red blood cells is random during movement. Therefore, we use three-dimensional trigonometric functions to describe the energy loss of microalgae and red blood cells (Fig.6).

Fig.6 The energy loss of microalgae

By evaluating a double integral, we obtain the energy lost in each cross-section:

The energy of microalgae after collision in the blood:

If the microalgae need to swim upstream in the blood, then:

Calculation

(1) Calculate N: substituting the radii of red blood cells and microalgae gives:

(2) Calculate Q: the energy lost per unit time in each blood-vessel cross-section:

(3) We obtain the relationship between the required light intensity and the blood flow velocities we set (Fig.7):

Fig.7 Luminous intensity threshold values that allow microalgae to swim upstream in the blood at different blood flow velocities.

Conclusion

(1) We are the first to describe the microalgae periodic motion phenomenon in water using a precise mathematical expression.

(2) We obtained the minimum light intensity thresholds required to control microalgae to swim as intended at different blood flow velocities. In the future, microalgae may act as a drug carrier. Our model may provide some valuable data and rules for clinical trials.

3. Protein-protein interaction model

Introduction

Protein-protein interactions are ubiquitous in living systems, and their importance is self-evident. In our design, red light induces the binding of N-hrluc-PhyB and C-hrluc-PIF3, which then produces blue light to guide the movement of microalgae (Fig.1). Whether our design is reasonable and the experiment succeeds therefore depends on the degree of binding between N-hrluc-PhyB and C-hrluc-PIF3. Based on chemical thermodynamics, we establish the relationship between the binding rate, the concentrations and the affinity constant in two steps. In addition, because PhyB-PIF3 is a commonly used light-controlled protein dimerization actuator, this model can not only guide our own experiment but also provide a quantitative model for the field of optogenetics.

Fig.1 A: The combination of N-hrluc and C-hrluc without the assistance of the optical protein switch (PhyB&PIF3). B: The combination of N-hrluc and C-hrluc with the assistance of the optical protein switch.

Design

The molecular reaction in solution can be described as follows: red-light-induced binding of PhyB and PIF3 facilitates the binding of the two fragments of Renilla luciferase (N-hrluc and C-hrluc). The reaction can therefore be divided into two stages:

Stage Ⅰ: only the binding of PhyB and PIF3 is considered.

To determine how the initial concentrations of PhyB and PIF3 affect their binding rate, we derive an explicit equation relating the initial protein concentrations to the binding rate through the dissociation constant Kd.

Parameters

Symbol	Meaning
A	Concentration of PhyB
B	Concentration of PIF3
K_a	Affinity constant
K_d	Dissociation constant
η	Binding rate
A₀	Initial concentration of PhyB
B₀	Initial concentration of PIF3
AB	Concentration of PhyB-PIF3

Hypothesis:

After the two proteins bind, the solution contains three species: free PhyB, free PIF3 and their complex. That is, mass is conserved and no protein is lost during the reaction.

Calculation

Consider the binding of PhyB and PIF3 in solution:

If the free concentration of PhyB is A, the free concentration of PIF3 is B, and the concentration of the PhyB-PIF3 complex is AB, the binding constant is:

The dissociation constant is:

The binding rate is then:

Combining (2), (3) and (4):

We assume that the initial concentration of PhyB is higher than that of PIF3, that is,

Combining (5) and (6):

This gives the explicit equation relating the initial concentrations of the two proteins to their binding rate.
From (7):

Under light with a wavelength of 650 nm and a frequency of 20 Hz, Kd = 500 nM [1] (Fig.2):

Fig.2 Binding affinity (Kd) of PIF3&PhyB
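The Stage Ⅰ relations can be written out as follows. This is a sketch under stated assumptions rather than a transcription of equations (1) to (7): x denotes the complex concentration AB, the binding rate η is taken to be the bound fraction of the limiting partner PIF3 (consistent with assuming A_0 ≥ B_0), and mass conservation follows the hypothesis stated earlier.

```latex
\begin{aligned}
\mathrm{PhyB} + \mathrm{PIF3} &\rightleftharpoons \mathrm{PhyB{\cdot}PIF3} \\
K_a &= \frac{x}{A\,B}, \qquad K_d = \frac{1}{K_a} = \frac{A\,B}{x} \\
A &= A_0 - x, \qquad B = B_0 - x \\
0 &= x^2 - (A_0 + B_0 + K_d)\,x + A_0 B_0 \\
x &= \frac{(A_0 + B_0 + K_d) - \sqrt{(A_0 + B_0 + K_d)^2 - 4 A_0 B_0}}{2} \\
\eta &= \frac{x}{B_0}, \qquad A_0 \ge B_0
\end{aligned}
```

The smaller root of the quadratic is taken because the complex concentration cannot exceed B_0.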

Calculation

(1) As in the barrel effect, the binding rate is limited by the protein present at the lower concentration. From the plots, the minimum threshold concentration of the two proteins can be estimated as 10^-6 mol/L when the two proteins are at equal concentration (Fig.3 and Fig.4).

Fig.3 The binding rate of PhyB and PIF3 related to the concentration.

Fig.4 The trend of binding rate when PhyB and PIF3 are at the same concentration

(2) If the binding of A and B in the absence of red light (dark) is to be neglected, then

Under red-light excitation, K_d is drastically reduced because of the change in protein conformation. Only when K_d(dark)/K_d(hv) > 10000 can the binding rate in the dark be ignored (Fig.5).

Fig.5 The net increase of the binding rate under red light eliminated the dark background effect.
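The two calculations above can be reproduced numerically with a few lines of code. This is a minimal sketch, assuming simple 1:1 binding with mass conservation (Kd · x = (A₀ − x)(B₀ − x), where x is the complex concentration) and taking the binding rate to be the bound fraction of the less concentrated partner; the function names are illustrative.

```python
import math


def complex_concentration(a0, b0, kd):
    """Equilibrium concentration of the A-B complex (same units as the inputs).

    Physical (smaller) root of x^2 - (a0 + b0 + kd) x + a0 b0 = 0, written in a
    form that avoids loss of precision when kd is much larger than a0 and b0.
    """
    s = a0 + b0 + kd
    return 2.0 * a0 * b0 / (s + math.sqrt(s * s - 4.0 * a0 * b0))


def binding_rate(a0, b0, kd):
    """Bound fraction of the limiting partner (assumed definition of the binding rate)."""
    return complex_concentration(a0, b0, kd) / min(a0, b0)


def net_increase(a0, b0, kd_light, kd_ratio):
    """Binding rate under red light minus the dark background.

    kd_ratio is K_d(dark) / K_d(hv), so the dark constant is kd_ratio * kd_light.
    """
    return binding_rate(a0, b0, kd_light) - binding_rate(a0, b0, kd_ratio * kd_light)


KD_LIGHT = 500e-9  # mol/L, the Kd quoted above for red light

if __name__ == "__main__":
    # Equal concentrations of both proteins, scanned over four decades.
    for conc in (1e-8, 1e-7, 1e-6, 1e-5):
        print(conc, binding_rate(conc, conc, KD_LIGHT),
              net_increase(conc, conc, KD_LIGHT, 1e4))
```

With both proteins at 10^-6 mol/L and Kd = 500 nM, `binding_rate` returns 0.5, which matches the threshold read from the plots; at much lower concentrations the bound fraction falls off quickly.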

Stage Ⅱ: the combination of N-hrluc and C-hrluc.

Under the action of the "optical switch" (PhyB&PIF3), N-hrluc and C-hrluc are brought closer together, which is equivalent to an increase in concentration. The effect of the "optical switch" on protein binding is thus converted into an equivalent concentration.

Parameters

Parameter	Description
B_i,i=1,2	N-hrluc and C-hrluc protein parts respectively
A_i,i=1,2	PhyB and PIF3 respectively
d	Distance between B₁ and B₂

Hypothesis:

(1) The addition of the optical switch has no effect on the conformation of hrluc.
(2) After the optical switch binds, the attached proteins are pulled closer together, that is, the distance between them is shortened.

Calculation

Fig.6 A: The combination of N-hrluc and C-hrluc without the assistance of the protein switch (PhyB&PIF3). B: The combination of N-hrluc and C-hrluc with the assistance of the protein switch

As shown in the figure above (Fig.6), the optical switches A_1 and A_2 are fused to the N-hrluc and C-hrluc proteins. Under illumination, A_1 and A_2 begin to bind. Because we assume that the conformations of B₁ and B₂ do not change, the affinity constant is unchanged, while the distance between B₁ and B₂ is reduced, which is equivalent to an increase in concentration.

The concentration of a particle can be obtained as follows:

The volume can be defined by the spacing between proteins, that is,

So:

According to the size of the protein switch (PhyB&PIF3), in the least ideal case the distance between N-hrluc and C-hrluc is 16 nm. Assuming spacings of 15 nm, 10 nm and 5 nm, the equivalent concentrations can be calculated as 3×10^-3 mol/L, 1×10^-2 mol/L and 8×10^-2 mol/L according to formula (12).
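The idea of converting a spacing into a concentration can be sketched in code. The example assumes one molecule confined to a sphere whose diameter equals the spacing d, so that c = 1 / (N_A · V); this volume convention is an assumption and need not be the one behind formula (12), so the absolute values may differ from those quoted above by a constant factor, while the d^-3 scaling is the same.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def equivalent_concentration(d_nm):
    """Molar concentration (mol/L) of one molecule in a sphere of diameter d_nm.

    Assumed convention: V = (pi / 6) * d^3, with d converted from nm to dm
    so that the volume is obtained directly in litres.
    """
    if d_nm <= 0:
        raise ValueError("spacing must be positive")
    d_dm = d_nm * 1e-8                      # 1 nm = 1e-8 dm, and 1 dm^3 = 1 L
    volume_l = math.pi / 6.0 * d_dm ** 3
    return 1.0 / (AVOGADRO * volume_l)


if __name__ == "__main__":
    for spacing in (15.0, 10.0, 5.0):
        print(spacing, equivalent_concentration(spacing))
```

Halving the spacing raises the equivalent concentration eightfold, which is why a compact optical switch favors reassembly of the two luciferase fragments.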

Fig.7 N-Luc and C-Luc are held close together in space by the optical protein switch, which means a higher binding rate than when they are dispersed in solution

As long as the affinity constant of the two parts of fluorescein enzyme is less than ,it can be considered to be completely binding(Fig.7).

Conclusion

(1) According to our model, the optical switch proteins (PhyB and PIF3) bind only when their concentration reaches 10^-6 mol/L (Fig.4).

(2) The equivalent concentration of Renilla luciferase depends on the structural size of the optical protein switch (Fig.7).

Reference

[1]Levskaya A, Weiner O D, Lim W A, et al. Spatiotemporal control of cell signalling using a light-switchable protein interaction[J]. Nature, 2009, 461(7266): 997.

Team:DUT China B/Model
