In our project, we applied several types of modeling, including structural 3D modeling, mathematical modeling and growth-curve modeling, to support our decision-making.
First, we created a model that predicts the ideal split point for intein-mediated splicing with the split intein Nostoc punctiforme DnaE (Npu DnaE). We split antibiotic resistance genes and used them as selection markers that only become functional when the enzyme is reconstituted through intein-mediated splicing. The predicted split points for chloramphenicol acetyltransferase and aminoglycoside phosphotransferase APH(3')-Ia, which confer resistance to chloramphenicol and kanamycin, respectively, were used and successfully validated in wet-lab experiments. They substantially advanced our project with respect to biosafety and cost efficiency.
Additionally, we modeled the growth of Saccharomyces cerevisiae under the influence of Cas13a as applied by our Cell Death Inducing System (CeDIS). This allows us to predict the optimal number of gRNAs needed for specific yet robust Cas13a activation in the presence of certain RNA sequences, and to estimate how long it takes until resistance against the Cas system evolves in a fungal population. This prediction considers the probabilities of various mutation types within the gRNA target sites and their possible consequences.
Modeling is an important part of synthetic biology. It can be used to understand biological processes, simplify experimental setups and thus speed up the progress of a project. Modeling is especially important when testing every option experimentally is not feasible. Because experiments are costly in terms of money, resources and time, only experiments that are expedient should be conducted. During our project, we constructed two models and integrated their results into our subsequent experimental setups and overall project design.

Split Selection Markers

To assemble our Troygenics, we need E. coli to maintain two plasmids in the cell. The conventional way to maintain multiple plasmids in the same cell is by encoding different antibiotic resistances on each plasmid and growing the bacteria on selection plates with all the respective antibiotics.
However, any use of antibiotic resistance genes is problematic because of the risk of spreading them into the environment. Additionally, the selection pressure exerted on the microbiomes of all users leads to a higher proportion of tolerant or resistant species. Finally, using many antibiotics is also a financial burden, as they are high-value products.
One way to effectively minimize the use of antibiotic resistances is to split the resistance proteins and place their partial coding sequences on different plasmids. This not only enables the use of 50% fewer antibiotics (one instead of two) but also increases the biosafety of the engineered system. If one of the plasmids, or a strain carrying it, were accidentally released into the environment, organisms taking up the DNA would not gain a direct fitness advantage from the partial resistance gene. Consequently, the probability of genetically modified organisms (GMOs) outgrowing natural populations is substantially reduced. Replacing antibiotic resistance genes with split antibiotic resistance genes therefore increases the safety of applications involving GMOs both inside and outside the lab.
Since some intended applications of our Troygenics will require their release into the environment, we integrated split antibiotic resistances into the Troygenics production process as one layer of our biosafety system, to contain the risk of accidentally spreading antibiotic resistances to other organisms. Troygenics released into the environment would contain only one of the two plasmids needed for production and would therefore never carry a complete resistance marker. This prevents them from spreading resistance traits to other organisms during future applications of Troygenics in the environment.

Split-Intein: Npu DnaE

Our biosafety system is based on protein splicing mediated by the split intein Npu DnaE. The N-terminal half of Npu DnaE is fused to the C-terminus of the N-terminal fragment of the resistance protein, and the C-terminal half of Npu DnaE is fused to the N-terminus of the C-terminal fragment (Stevens et al., 2017). When both fusion proteins are present in the same cell, the intein halves associate, excise themselves and join the two fragments with a peptide bond, reconstituting the functional enzyme.
However, a protein of interest cannot be split at an arbitrary position within its amino acid chain. There is only a limited number of permissive positions, and some are more suitable than others (Amitai, Callahan, Stanger, Belfort, & Belfort, 2009). This depends on the chemical characteristics of the amino acids at these positions, since the splicing process includes a nucleophilic attack involving one of them (Shah & Muir, 2011). To avoid testing split points by brute force, we developed a model that predicts the optimal split points.
Our model takes both the structure and the sequence around potential split points into account. First, split points in regions that form an alpha-helix or a beta-sheet are avoided. Second, the amino acids neighboring the splicing site are inspected: certain amino acids have been shown to be more favorable at these positions, leading to more correct splicing events and to an overall higher splicing rate. Based on previously reported data (Cheriyan, Pedamallu, Tori, & Perler, 2013), each amino acid at each position was assigned a score between 0 and 50. The higher the value, the more favorable the amino acid is at that position for fast and specific splicing. The relevant positions are the three residues immediately before and the three immediately after the insertion site of the split intein (Fig. 1).
The three amino acid positions most relevant for successful splicing are highlighted with blue circles. These positions have been reported to have the greatest impact on the functionality, specificity and rate of the splicing process (Cheriyan, Pedamallu, Tori, & Perler, 2013).
While it is sometimes possible to find split points whose neighboring amino acids are already suitable in the native protein, the likelihood of finding a good splicing site increases if amino acids may be introduced. This is especially important for the +1 position at the C-terminal extein, which must be a cysteine in almost all cases for catalytic reasons.
Relative favorability of each amino acid i (rF(Ai)) at each relevant position j neighboring the intein. Adapted from Cheriyan, Pedamallu, Tori, & Perler, 2013. In that study, the amino acids neighboring a splicing site were randomly mutated, with +1 set to cysteine. Mutants that spliced successfully were selected, and their six neighboring amino acids were determined. To obtain the rF(Ai) values, the occurrence of each amino acid at each position was counted and divided by the natural frequency of that amino acid. For our model, cysteine at +1 was set to 50 to ensure that it was picked over any other combination. Additionally, methionine at +1 was set to 20, as other sources state that it might be an appropriate substitute for cysteine (Brenzel, 2009).
rF(Ai) at the N-terminal extein (positions -3 to -1) and at the C-terminal extein (positions +1 to +3):
Amino acid | -3 | -2 | -1 | +1 | +2 | +3
D | 1.39 | 1.52 | 0.00 | 0.00 | 0.00 | 1.26
E | 1.26 | 3.51 | 0.07 | 0.00 | 0.00 | 5.50
N | 0.93 | 1.26 | 2.19 | 0.00 | 0.00 | 8.61
Q | 1.39 | 0.86 | 0.33 | 0.00 | 0.00 | 0.27
H | 1.06 | 1.26 | 2.19 | 0.00 | 0.00 | 8.61
K | 1.59 | 0.93 | 4.44 | 0.00 | 0.07 | 0.00
R | 1.81 | 0.20 | 0.86 | 0.00 | 0.00 | 0.00
S | 0.99 | 1.24 | 1.37 | 0.00 | 0.00 | 0.07
C | 0.46 | 0.40 | 0.80 | 50 | 0.07 | 0.07
T | 0.63 | 0.89 | 1.59 | 0.00 | 0.00 | 0.96
P | 0.80 | 1.52 | 0.00 | 0.00 | 0.00 | 0.00
G | 4.67 | 2.68 | 0.17 | 0.00 | 0.00 | 0.03
A | 0.63 | 0.83 | 1.92 | 0.00 | 0.00 | 0.00
V | 0.56 | 0.50 | 0.23 | 0.00 | 0.03 | 0.10
I | 0.07 | 0.53 | 0.00 | 0.00 | 0.00 | 0.40
L | 0.11 | 0.57 | 0.75 | 0.00 | 0.00 | 0.82
M | 0.13 | 0.73 | 1.26 | 20 | 0.07 | 2.25
F | 0.13 | 0.73 | 1.26 | 0.00 | 0.07 | 2.25
Y | 0.20 | 0.60 | 2.05 | 0.00 | 0.13 | 5.23
W | 0.07 | 0.40 | 0.99 | 0.00 | 29.42 | 0.07
To identify the optimal split point for intein-mediated splicing of a protein, a Python script assessed all possible six-amino-acid fragments of its sequence. For each fragment, the sum of the rF(Ai)j values was calculated, indicating how well the center of the fragment would act as a split point.
The generation of all possible 6 amino acid long fragments of the peptide sequence of interest.

Formula used to calculate how beneficial a certain combination of amino acids is for Npu DnaE mediated protein splicing.

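Written out from the description above, as a plain sum of the six position-specific rF values around each candidate split site, the score reads as follows. The indexing is our own notation: a_1 to a_L is the protein sequence, seq_n is the n-th six-residue fragment, and the candidate split lies between a_{n+2} and a_{n+3}.

```latex
\mathrm{seq}_n = a_n\, a_{n+1}\, a_{n+2}\, a_{n+3}\, a_{n+4}\, a_{n+5}, \qquad n = 1, \dots, L - 5
B(\mathrm{seq})_n = \sum_{k=0}^{5} \mathrm{rF}\left(a_{n+k}\right)_{j_k}, \qquad (j_0, \dots, j_5) = (-3, -2, -1, +1, +2, +3)
```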
Using this approach, a list with all possible split points sorted by B(seq)n is created, ranging from the best to the worst possible split point in the protein.
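A minimal Python sketch of the window scoring and ranking, assuming B(seq)n is the plain sum of the rF values transcribed from the table above. The insert_cys option is our assumption of how the cysteine-insertion variant was scored: a five-residue native window with a cysteine inserted as the new +1 residue. The names RF, window_score and rank_split_points are hypothetical and not taken from the original script.

```python
# rF(Ai) values per position (-3, -2, -1, +1, +2, +3), transcribed from the table.
RF = {
    "D": (1.39, 1.52, 0.00, 0.00, 0.00, 1.26),
    "E": (1.26, 3.51, 0.07, 0.00, 0.00, 5.50),
    "N": (0.93, 1.26, 2.19, 0.00, 0.00, 8.61),
    "Q": (1.39, 0.86, 0.33, 0.00, 0.00, 0.27),
    "H": (1.06, 1.26, 2.19, 0.00, 0.00, 8.61),
    "K": (1.59, 0.93, 4.44, 0.00, 0.07, 0.00),
    "R": (1.81, 0.20, 0.86, 0.00, 0.00, 0.00),
    "S": (0.99, 1.24, 1.37, 0.00, 0.00, 0.07),
    "C": (0.46, 0.40, 0.80, 50.0, 0.07, 0.07),
    "T": (0.63, 0.89, 1.59, 0.00, 0.00, 0.96),
    "P": (0.80, 1.52, 0.00, 0.00, 0.00, 0.00),
    "G": (4.67, 2.68, 0.17, 0.00, 0.00, 0.03),
    "A": (0.63, 0.83, 1.92, 0.00, 0.00, 0.00),
    "V": (0.56, 0.50, 0.23, 0.00, 0.03, 0.10),
    "I": (0.07, 0.53, 0.00, 0.00, 0.00, 0.40),
    "L": (0.11, 0.57, 0.75, 0.00, 0.00, 0.82),
    "M": (0.13, 0.73, 1.26, 20.0, 0.07, 2.25),
    "F": (0.13, 0.73, 1.26, 0.00, 0.07, 2.25),
    "Y": (0.20, 0.60, 2.05, 0.00, 0.13, 5.23),
    "W": (0.07, 0.40, 0.99, 0.00, 29.42, 0.07),
}


def window_score(window):
    """B(seq): sum of the position-specific rF values of a 6-residue window."""
    return sum(RF[aa][j] for j, aa in enumerate(window))


def rank_split_points(protein, insert_cys=False):
    """Score every candidate split site and return them best-first.

    Native mode: slide a 6-residue window; the split lies between window[2] and window[3].
    insert_cys=True (assumption): slide a 5-residue window and insert a Cys as the +1 residue.
    Returns (score, split_index, label) tuples, where split_index is the 0-based index
    of the first native residue of the C-terminal fragment.
    """
    protein = protein.upper()
    size = 5 if insert_cys else 6
    ranked = []
    for i in range(len(protein) - size + 1):
        native = protein[i:i + size]
        window = native[:3] + "C" + native[3:] if insert_cys else native
        if any(aa not in RF for aa in window):
            continue  # skip stop symbols, ambiguous residues, etc.
        ranked.append((window_score(window), i + 3, f"{window[:3]}-{window[3:]}"))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked
```

For example, rank_split_points("MDDAWLK", insert_cys=True) ranks DDA-CWL first. Keeping the table as a plain dictionary makes it easy to swap in other favorability data, such as the manual +1 values for cysteine and methionine.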
In addition to the sequence-based determination of potential split points, the desired structure of the final protein has to be considered. Positions in important structural features of the protein were determined using the protein structure viewer Chimera (Pettersen et al., 2004). Sequences involved in relevant structures were integrated into the analysis described above to prevent the identification of split points in these regions. Finally, the remaining split points were sorted from best to worst, and the best split sequences were written to FASTA files. Using the MODELLER software (Sali & Webb, 1989), we then determined the most likely structure of the N- or C-terminal part of the protein fused to its split-intein half, before it encounters the other intein half in the cell. This homology modeling required proteins with similar structures as templates. They were chosen as follows:

Template 1: Structure of the protein conferring antibiotic resistance
Template 2: Crystal structure of native Npu DnaE split intein (PDB: 4QFQ)
Template 3: Crystal structure of an inactivated Npu SICLOPPS intein with CAFHPQ extein (PDB: 5OL6)
Template 4 and 5: The two BLASTp hits most similar to the Npu DnaE-protein combination
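The structure filter and FASTA export described above can be sketched as follows. This assumes the secondary structure is available as a per-residue string using DSSP's main codes ('H' for helix, 'E' for strand, anything else treated as coil), for example read off the structure in Chimera. Candidates are (score, split_index, label) tuples, where split_index is the 0-based index of the first residue of the C-terminal fragment. The function names, the six-residue exclusion window and the int_n/int_c placeholders for the Npu DnaE halves are hypothetical; no intein sequences are filled in.

```python
def filter_by_structure(candidates, ss, protected="HE"):
    """Drop split sites whose six flanking residues (-3..+3) touch a helix or strand.

    candidates: iterable of (score, split_index, label) tuples.
    ss: per-residue secondary-structure string (hypothetical input format).
    """
    kept = [
        (score, split, label)
        for score, split, label in candidates
        if not any(code in protected for code in ss[max(split - 3, 0):split + 3])
    ]
    return sorted(kept, key=lambda item: item[0], reverse=True)


def to_fasta(name, seq, width=60):
    """Format one sequence as a FASTA record with lines of at most `width` residues."""
    lines = [f">{name}"] + [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


def split_fragments_fasta(protein, split, name, insert_cys=False, int_n="", int_c=""):
    """FASTA text for both fusion parts: N-extein + IntN and IntC + (+1 Cys) + C-extein.

    int_n / int_c are placeholders for the Npu DnaE intein halves.
    """
    part1 = protein[:split] + int_n
    part2 = int_c + ("C" if insert_cys else "") + protein[split:]
    return to_fasta(f"{name}_N", part1) + to_fasta(f"{name}_C", part2)
```

The intein halves are appended on the side facing the split site (IntN after the N-terminal fragment, IntC before the C-terminal fragment), which is the arrangement the homology models of the two parts represent.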

Split Kanamycin Resistance

The amino acid sequence of the protein conferring kanamycin resistance, aminoglycoside phosphotransferase APH(3')-Ia (Fong & Berghuis, 2002), was obtained by translating its coding sequence from the iGEM Registry plasmid pSB1K3 with ExPASy.
The 3D structure of aminoglycoside phosphotransferase APH(3')-Ia as obtained from the PDB (4EJ7). This protein confers resistance to kanamycin.
The final protein sequence can be found here.
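The translation step done with ExPASy can also be reproduced locally with the standard genetic code. The sketch below is an illustration, not the tool we used: it translates only the first forward reading frame (ExPASy Translate shows all six frames) and stops at the first stop codon. The helper name translate is hypothetical; in practice, Biopython's Seq.translate(to_stop=True) does the same job.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated as
# TTT, TTC, TTA, TTG, TCT, ... (first base varies slowest), matching this string.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna, stop_at_stop=True):
    """Translate the first forward reading frame of a DNA (or RNA) coding sequence."""
    seq = dna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # 'X' for codons with ambiguous bases
        if aa == "*" and stop_at_stop:
            break
        protein.append(aa)
    return "".join(protein)
```

For example, translate("ATGGCCTAAGGG") returns "MA", because translation stops at the TAA stop codon.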

Using the Python script, the ideal split point within the native sequence was identified as ETS-CSR.

However, this split point lies rather close to the end of the polypeptide chain. Moreover, the 3D structure of aminoglycoside phosphotransferase APH(3')-Ia shows that this position is part of a beta-strand, making it less likely that splicing would result in the required protein structure.
The 3D structure of aminoglycoside phosphotransferase APH(3')-Ia. The residues colored in magenta mark the split point identified as the most feasible one for Npu DnaE-mediated splicing: ETS-CSR.
In addition to finding the optimal split point within the native sequence, we also considered adding one amino acid to broaden the possible combinations. We decided to introduce a cysteine at the +1 position, since this is the most relevant position for the catalytic splicing mechanism; a cysteine there greatly increases the likelihood of successful protein splicing.
The 3D-structure of the Aminoglycoside Phosphotransferase APH(3')-Ia. The peptides colored in magenta depict the position of two possible split points DDA-CWL (Fig. A) and GYK-CWA (Fig. B).
When we allowed a cysteine to be introduced at the +1 position, our model identified GYK-WA at positions 25-29 and DDA-WL at positions 91-95 as the best split points. The split point GYK-WA, with a B(GYKCWA) of 89.45, lies rather close to the N-terminal end of the polypeptide chain. Moreover, this position is located at the beginning of a beta-sheet.
The DDA-WL split point, with a B(DDACWL) of 85.07, is still partly located in a beta-sheet. However, it lies further away from the end of the polypeptide chain than the GYK-WA position. Since the B values are in general higher for split points with a newly introduced cysteine, we decided to choose one of these. As the best and second-best predicted split points had similar B values and were both located at the edges of beta-sheets, we chose between them based on their position in the polypeptide chain. In contrast to GYK-WA, the more central DDA-WL position increases the chance that splitting the protein abolishes the resistance activity of the individual subparts. Using only a few terminal amino acids as one split partner would pose the risk that the larger subpart remains active on its own, which would defeat our split resistance approach. Therefore, we chose DDA-WL as our split point for kanamycin resistance.
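As a check on the arithmetic, the score of the chosen split point can be recomputed directly from the rF table, assuming B is the plain sum of the six position-specific values:

```latex
B(\mathrm{DDACWL}) = \mathrm{rF}(\mathrm{D})_{-3} + \mathrm{rF}(\mathrm{D})_{-2} + \mathrm{rF}(\mathrm{A})_{-1} + \mathrm{rF}(\mathrm{C})_{+1} + \mathrm{rF}(\mathrm{W})_{+2} + \mathrm{rF}(\mathrm{L})_{+3}
                   = 1.39 + 1.52 + 1.92 + 50 + 29.42 + 0.82 = 85.07
```

The fixed +1 cysteine alone contributes 50 and the tryptophan at +2 another 29.42, which is why split points with an introduced cysteine followed by a tryptophan rank so highly.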

After introducing this split point, the protein parts fused to the intein halves were:
Part 1:

Part 2:

Using homology modeling with MODELLER (Sali & Webb, 1989), the likely structure of each split part was determined. The two protein parts were then placed adjacent to each other in Chimera (Pettersen et al., 2004), representing the protein structure prior to splicing.

The templates given were:
Part 1 | Part 2
4QFQ, chain A (Npu DnaE) | 4QFQ, chain B (Npu DnaE)
4EJ7, chain B (Aminoglycoside phosphotransferase APH(3')-Ia) | 4EJ7, chain B (Aminoglycoside phosphotransferase APH(3')-Ia)
5OL6, chain B (inactivated Npu SICLOPPS intein with CAFHPQ extein) | 5OL6, chain A (inactivated Npu SICLOPPS intein with CAFHPQ extein)
5OL7, chain A (inactivated Npu SICLOPPS intein with CFAHPQ extein) | 4FEV, chain A (inactivated Npu SICLOPPS intein with CFAHPQ extein)
4KL5, chain B (Npu DnaE) | 4FEU, chain A (Aminoglycoside phosphotransferase APH(3')-Ia)
Based on these templates, MODELLER (Sali & Webb, 1989) was run with its default settings to calculate a homology model from multiple templates.
The predicted 3D structure of each subunit of the split aminoglycoside phosphotransferase APH(3')-Ia prior to splicing. The split point DDA-CWL was chosen based on our model. The 3D model was generated using the homology modeling software MODELLER (Sali & Webb, 1989).

Split Chloramphenicol Resistance

We aimed to split the protein conferring resistance to chloramphenicol, chloramphenicol acetyltransferase (BBa_J3105). This resistance protein is commonly used within iGEM, which makes a split chloramphenicol acetyltransferase a valuable addition to the iGEM Registry. The protein sequence of this part can be found here. According to our model, the best split point for introducing Npu DnaE was VAQ-CTY at positions 28-33, with a B(VAQCTY) of 56.95.
Even though this split point is located at the second amino acid of a beta-sheet, we decided to test whether splitting at this position would yield chloramphenicol acetyltransferase fragments that can be reconstituted by intein-mediated protein splicing.
Similar to our modeling of the split kanamycin resistance, we used the homology modeling software MODELLER (Sali & Webb, 1989) to visualize each part of the chloramphenicol acetyltransferase fused to the Npu DnaE intein.
The amino acid sequences given to the software were:
Part 1:

Part 2:

The templates used for homology modeling were:

Part 1 | Part 2
1NOC, chain B (Chloramphenicol acetyltransferase) | 1NOC, chain B (Chloramphenicol acetyltransferase)
4QFQ, chain A (Npu DnaE) | 4QFQ, chain B (Npu DnaE)
5OL6, chain A (inactivated Npu SICLOPPS intein with CAFHPQ extein) | 5OL6, chain B (inactivated Npu SICLOPPS intein with CAFHPQ extein)
4KL5, chain A (Npu DnaE) | 1QCA, chain A (Type III Chloramphenicol acetyltransferase)
2KEQ, chain A (DnaE intein from Nostoc punctiforme) | 3CLA, chain A (Type III Chloramphenicol acetyltransferase)
After running the MODELLER software (Sali & Webb, 1989), two protein structures were calculated and placed adjacent to each other using Chimera (Pettersen et al., 2004).
The predicted 3D structure of each subunit of chloramphenicol acetyltransferase split at the optimal predicted split point VAQ-CTY, prior to splicing. It was generated using the homology modeling software MODELLER (Sali & Webb, 1989).
As shown in growth experiments on plates (Fig. 9), implementing split antibiotic resistances is feasible: using the split points predicted by our model, we successfully constructed functional split antibiotic resistances.

Wet lab experiments

We verified the predictions of our model in wet-lab experiments. From the upper left to the lower right, the plates contain chloramphenicol concentrations of 0.1, 1, 7, 10, 20, 40, 60, 80 and 100 µg/ml. Every plate is divided into three sections: the upper left section carries DH5α as the negative control, the upper right section carries colonies containing the two plasmids with the split chloramphenicol acetyltransferase parts, and the lower section carries cells containing one plasmid with the full-length chloramphenicol acetyltransferase.

Resistance Development

As a major part of the practical application of our Troygenics focuses on combating eukaryotic pathogens using our Cell Death Inducing System (CeDIS), we wanted to know how likely it is for a target organism to gain resistance against the CeDIS. The CeDIS employs Cas13a, a CRISPR/Cas system that induces collateral RNA cleavage upon detecting specific RNAs complementary to its guide RNA sequences. In our project, we use our Troygenics to infiltrate the cells of unicellular eukaryotic pathogens, such as pathogenic fungi, and introduce our CeDIS into them. When Cas13a and its associated guide RNAs recognize RNAs specific to the pathogen, Cas13a is activated and cleaves RNA collaterally. This should kill the pathogen, as the cell can no longer sustain its metabolism. The Cas13a/gRNA recognition system itself is highly specific and can, under certain circumstances, discriminate between single-nucleotide polymorphisms (Gootenberg et al., 2017). While we consider this crucial for the biosafety of our system, we also learned that it would facilitate the development of resistance, as even a few changes in the nucleotide sequence of the genetic targets of our CeDIS would abolish their recognition and therefore its killing ability. We discussed this with many experts; Dr. Mehl and Dr. Derpmann of the Fungicide Resistance Action Committee were two of the most valuable in this area. They confirmed that very specific methods might have biosafety advantages but make it very likely for fungi to develop resistance against our Troygenics, as only a small number of genetic or metabolic changes are sufficient to overcome very specific approaches.
A specific mode-of-action increases the likelihood of fungi gaining resistances against fungicides.
Dr. Mehl and Dr. Derpmann, Fungicide Resistance Action Committee
Taking this information into consideration, we developed a mechanism to decrease the probability of resistance emerging in fungal populations. To do so, we aimed to adjust the number of gRNAs in our CeDIS, thereby influencing the number of mutations a fungus needs to escape detection. The general model framework was first developed for S. cerevisiae acquiring resistance under laboratory cultivation conditions. First, the general growth of S. cerevisiae had to be modeled, for which we used the logistic growth model. It describes growth as a sigmoid curve: an initially slow increase in cell number, exponential growth, and a plateau phase when nutrients limit growth.
The formula used to describe this kind of growth is:
Differential equation describing logistic growth. It gives the growth rate of a population of size P with growth rate r, up to a maximum population size, or carrying capacity, K. This growth model can be used to predict the growth of organisms with limited access to nutrients; we used it to predict the growth of S. cerevisiae in liquid medium.
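In standard notation, with P, r and K as defined above and P_0 denoting the initial population size (a symbol we add here), the logistic equation and its closed-form solution are:

```latex
\frac{dP}{dt} = r\,P\left(1 - \frac{P}{K}\right)
P(t) = \frac{K}{1 + \dfrac{K - P_0}{P_0}\, e^{-r t}}
```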
When the number of cells in the plateau phase was too low for resistance to be likely, we modeled washing the cells and resuspending them in the same volume of fresh medium to keep the culture from dying. This was implemented by doubling the carrying capacity K.
Formula used to adjust the carrying capacity during the simulated growth of S. cerevisiae. Whenever the carrying capacity within a regular experiment was reached, resuspending the cells in fresh medium was modeled.
To obtain a non-conservative estimate, we considered a probability of 75% high enough to expect the first cells to develop resistance.
Considering the mutation frequency and spectrum of S. cerevisiae, we calculated the probability of mutations occurring within the regions targeted by the gRNAs. We included the frequencies of point mutations, short insertions or deletions, and longer (> 1 kb) insertions or deletions (table 2).
To calculate the mutation probability of each gRNA we constructed the following formula:
The formula designed to calculate the mutation probability of a single gRNA. gc_i is the GC content, mf_gc the mutation frequency at GC positions and mf_at the mutation frequency at AT positions.
In addition to point mutations, the yeast genome suffers strongly from segment- or even chromosome-level gain or loss.
Prof. He Xionglei, State Key Laboratory of Biocontrol
Prof. He Xionglei from the State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, China, explained how relevant insertions, deletions and segment losses are in the genome of S. cerevisiae. Therefore, we also included the probabilities of short insertions and deletions, as well as of larger sequence losses, in our calculations.
The formula used to calculate the mutation probability of a single gRNA. gc_i is the GC content, mf_gc the mutation frequency at GC positions and mf_at the equivalent for AT positions. mf_indel represents the probability of short insertions or deletions (< 50 bp) and mf_loss that of losses of larger fragments (> 1000 bp). len_CeDIS represents the length of our entire CeDIS.
To calculate the mutation probability of each construct, we multiplied the previously calculated probabilities of all gRNAs it includes (figure XX).
The formula used to calculate the mutation probability of a construct as the product of the mutation probabilities of all gRNAs it includes.
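Written out, with p_i the mutation probability of gRNA i and n the number of gRNAs in the construct (our notation), the multiplication described above is the following. Taking the product assumes that mutations at different target sites occur independently, so a cell escapes the CeDIS only if every target site is altered.

```latex
P_{\mathrm{construct}} = \prod_{i=1}^{n} p_i
```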
All formulas above were implemented in a Python script. The constants used were collected from the literature or determined in our own experiments (Tab. 1). The probability of the yeast gaining resistance against our CeDIS with different numbers of gRNAs is depicted in Fig. 15. The time necessary for resistance to occur with a probability of 75% is shown in Fig. 15.
Constants used for modeling the resistance development.
Constant | Value | Source
mf_gc | 2.28456 × 10^-12 | (Zhu, Siegal, Hall, & Petrov, 2014)
mf_at | 1.058878 × 10^-12 | (Zhu, Siegal, Hall, & Petrov, 2014)
mf_indel | 5.03 × 10^-12 | (Zhu, Siegal, Hall, & Petrov, 2014)
Genome size | 11,556,150 bp | (Engel et al., 2013)
K | 2.3 × 10^7 | Own experiments
r_Cas | 0.12 | Own experiments
r_WT | 1.06 | Own experiments
Probability of gaining resistance against our CeDIS depending on the number of gRNAs. The probability of escaping recognition by the Cas13a of the CeDIS is calculated for different numbers of gRNAs.
Predicted time required for gaining resistance against our CeDIS depending on the number of gRNAs. The time until resistance against a CeDIS with a given number of gRNAs arises is calculated. The model combines the individual probabilities of resistance emerging with logistic growth, and thereby yields the probability that a resistant cell is present after a given period of time. The figure shows the time point at which this probability exceeds 75%. The dark blue and violet marks indicate the time points at which a cell resistant against a CeDIS with 3 or 7 different gRNAs first arises with 75% probability.
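The probability-over-time part of the model can be sketched as follows. This is not the original script: we assume explicit Euler integration of the logistic equation, treat every newly formed cell as an independent chance to carry an escape mutation with probability p_escape (the per-construct probability), ignore cell death, and model each transfer to fresh medium as a doubling of K whenever P reaches 99% of it. The function name time_to_resistance and all parameter names are hypothetical.

```python
import math


def time_to_resistance(p_escape, r, k0, p0, threshold=0.75, dt=0.01,
                       renew_at=0.99, t_max=10_000.0):
    """Return the time (same unit as 1/r) at which P(>= 1 resistant cell) reaches
    `threshold`, or None if that does not happen before t_max.

    Assumptions of this sketch (not taken from the original script):
    - logistic growth dP/dt = r*P*(1 - P/K), integrated with explicit Euler steps;
    - every newly formed cell independently escapes all gRNAs with probability p_escape;
    - no cell death, so the number of new cells is simply P(t) - P(0);
    - whenever P reaches renew_at*K, fresh medium is modeled by doubling K.
    """
    pop, k, t = float(p0), float(k0), 0.0
    log_no_escape = math.log1p(-p_escape)  # ln(1 - p), accurate even for tiny p
    while t < t_max:
        pop += r * pop * (1.0 - pop / k) * dt
        t += dt
        if pop >= renew_at * k:
            k *= 2.0  # transfer to fresh medium
        new_cells = pop - p0
        # P(at least one escape) = 1 - (1 - p)^new_cells, evaluated stably
        if -math.expm1(new_cells * log_no_escape) >= threshold:
            return t
    return None
```

In this sketch, the population roughly doubles per medium renewal, so the number of renewals needed grows with log(1/p_escape). Because the construct probability is a product over gRNAs, log(1/p_escape) is a sum over gRNAs, which is one way to see why the predicted time grows approximately linearly with the number of gRNAs. Growth rates such as r_Cas = 0.12 are assumed to be per hour; the sketch is not calibrated to reproduce the reported 421 h.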
The time until a resistant cell arises increases linearly with the number of gRNA sequences used. We wanted to use this result to determine how many gRNAs our Troygenics need to produce as many or even fewer resistant cells than conventional fungicides. To make this estimate, we looked into the conditions fungi experience when growing in a field. Owing to the limited growth rate in the field, one year of fungal growth in a field corresponds to roughly 250 hours of growth in the lab (Degani & Cernica, 2014; Meletiadis, Meis, Mouton, & Verweij, 2001). Furthermore, the carrying capacity depends on the field size, and the mutation rate largely depends on the specific fungus and its environment. Experts in fungicide resistance told us that they usually expect fungi to develop resistance against a fungicide after approximately three years, so we aimed for the same time until resistance against our CeDIS emerges. Therefore, we decided to use seven gRNAs in our system for future applications of the Troygenics as an alternative to fungicides on crop fields. For our lab experiments, however, we used a combination of only three gRNAs in growth experiments, since our model predicts that S. cerevisiae will not develop resistance within the first 421 hours. This time frame is sufficient for recording growth curves and observing the effects of our CeDIS with three gRNAs on the cells.
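Combining the two figures quoted above gives the lab-equivalent time that the CeDIS should withstand before resistance emerges:

```latex
t_{\mathrm{target}} \approx 3\ \mathrm{years} \times 250\ \frac{\mathrm{h}}{\mathrm{year}} = 750\ \mathrm{h}
```

The 421 h predicted for three gRNAs lies below this target, which is why more gRNAs are needed for field applications than for the lab experiments.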

Amitai, G., Callahan, B. P., Stanger, M. J., Belfort, G., & Belfort, M. (2009). Modulation of intein activity by its neighboring extein substrates. Proceedings of the National Academy of Sciences, 106(27), 11005–11010.

Brenzel, S. (2009). Künstlich gespaltene Inteine für die Protein-Semisynthese – Charakterisierung des Ssp DnaB Inteins und Modifikation des Ionenkanals OmpF mit Hilfe des Psp-GBD Pol Inteins. 148.

Cheriyan, M., Pedamallu, C. S., Tori, K., & Perler, F. (2013). Faster Protein Splicing with the Nostoc punctiforme DnaE Intein Using Non-native Extein Residues. Journal of Biological Chemistry, 288(9), 6202–6211.

Degani, O., & Cernica, G. (2014). Diagnosis and Control of Harpophora maydis, the Cause of Late Wilt in Maize. Advances in Microbiology, 04(02), 94–105.

Fong, D. H., & Berghuis, A. M. (2002). Substrate promiscuity of an aminoglycoside antibiotic resistance enzyme via target mimicry. The EMBO Journal, 21(10), 2323–2331.

Gootenberg, J. S., Abudayyeh, O. O., Lee, J. W., Essletzbichler, P., Dy, A. J., Joung, J., … Zhang, F. (2017). Nucleic acid detection with CRISPR-Cas13a/C2c2. Science, 356(6336), 438–442.

Meletiadis, J., Meis, J. F. G. M., Mouton, J. W., & Verweij, P. E. (2001). Analysis of Growth Characteristics of Filamentous Fungi in Different Nutrient Media. Journal of Clinical Microbiology, 39(2), 478–484.

Pettersen, E. F., Goddard, T. D., Huang, C. C., Couch, G. S., Greenblatt, D. M., Meng, E. C., & Ferrin, T. E. (2004). UCSF Chimera--A visualization system for exploratory research and analysis. Journal of Computational Chemistry, 25(13), 1605–1612.

Molecular graphics and analyses performed with UCSF Chimera, developed by the Resource for Biocomputing, Visualization, and Informatics at the University of California, San Francisco, with support from NIH P41-GM103311.

Sali, A., & Webb, B. (1989). MODELLER (Version 9.22). Retrieved from

Stevens, A. J., Sekar, G., Shah, N. H., Mostafavi, A. Z., Cowburn, D., & Muir, T. W. (2017). A promiscuous split intein with expanded protein engineering applications. Proceedings of the National Academy of Sciences, 201701083.