Team:TAU Israel/Computational Work

As part of our project, we carried out several bioinformatics analyses.

Tail-or Swift

Tail-or Swift is a comprehensive software tool that aims to find and design novel tail proteins for use in our antibacterial pyocin system. It combines several algorithms and methods to identify novel tails suitable for the system. Using it requires no prior knowledge of coding or bioinformatics, so it is accessible to everyone. You can learn more about how it works and how to use it on our Software page.


Part Improvement

The improvement of an existing part is an important aspect of our iGEM project. To this end, we took a part that has been used by many teams and set out to increase its expression. We used bioinformatics tools to find synonymous mutations that improve expression without altering the amino acid sequence. Previous work has modeled and designed the complex biophysics of the translation process[1,2]; however, to avoid potential over-fitting, to be able to design each relevant variable separately, and to keep full understanding of and control over the process and its results, we decided to develop our own methods and algorithms.
The methods that we used are:

1. Improving the Shine-Dalgarno sequence: In prokaryotes, the Shine-Dalgarno (SD) sequence allows the small ribosomal subunit to recognize the mRNA (by hybridizing to the anti-SD sequence in the ribosomal RNA), and therefore directly influences translation initiation and rate[3]. To improve the translation rate, we replaced the existing SD sequence with the optimal canonical SD sequence (AGGAGGU), which matches the anti-SD sequence in the ribosomal RNA. In addition, we changed the distance between the SD sequence and the start codon so that it, too, is optimal, based on the typical distances in E. coli genes. We did so by inserting two random nucleotides upstream of the start codon.
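The sequence edit described above can be sketched in a few lines of Python. This is a minimal illustration, not the team's code: it assumes the 5' UTR is a DNA string that ends immediately before the ATG and that the coordinates of the native SD (`sd_start`, `sd_len`) are already known. The function names and the example UTR are hypothetical. It also checks that the canonical SD (AGGAGGU, written AGGAGGT in DNA letters) is the reverse complement of the ACCUCCU motif near the 3' end of the 16S rRNA.

```python
import random

# Canonical Shine-Dalgarno sequence AGGAGGU, written in DNA letters.
CANONICAL_SD = "AGGAGGT"


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))


def optimize_rbs(utr: str, sd_start: int, sd_len: int, n_insert: int = 2, seed: int = 0) -> str:
    """Swap the native SD (utr[sd_start:sd_start + sd_len]) for the canonical one and
    lengthen the SD-to-ATG spacer by n_insert random nucleotides.
    Assumes (hypothetically) that `utr` ends immediately before the start codon."""
    rng = random.Random(seed)
    new_utr = utr[:sd_start] + CANONICAL_SD + utr[sd_start + sd_len:]
    filler = "".join(rng.choice("ACGT") for _ in range(n_insert))
    return new_utr + filler


def sd_spacing(utr: str, sd_start: int, sd_len: int) -> int:
    """Number of nucleotides between the end of the SD and the start codon."""
    return len(utr) - (sd_start + sd_len)


# Hypothetical UTR whose native SD "AAGGAG" starts at index 6.
utr = "GAATTCAAGGAGAAATAC"
new_utr = optimize_rbs(utr, sd_start=6, sd_len=6)
print(reverse_complement(CANONICAL_SD))                          # ACCTCCT
print(sd_spacing(utr, 6, 6), sd_spacing(new_utr, 6, len(CANONICAL_SD)))  # 6 8
```

The inserted nucleotides are random here, as in the text; the spacer length is a parameter rather than a claimed optimum.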

2. Improving the mRNA's folding strength: The folding free energy of the mRNA determines how difficult it is for the small ribosomal subunit to recognize the start codon and initiate translation[4]. Our goal was therefore to weaken the folding (bring the folding free energy closer to zero). We used ViennaRNA[5] to predict the free energy of each 40-nucleotide sliding window and averaged these values over the entire mRNA. We then looked for synonymous mutations that weaken the folding (raise the average free energy toward zero) without altering the amino acid sequence. Because the number of possible combinations of synonymous mutations is enormous (it grows exponentially with the length of the protein), a heuristic was needed. To explore these possibilities and find the sequence with the best folding energy, we used a genetic algorithm, which imitates the process of natural selection.
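The two quantities in this paragraph, the windowed average free energy and the size of the synonymous search space, can be illustrated with a short Python sketch. It assumes a window step of one nucleotide and takes the folding-energy predictor as a parameter, so it runs without ViennaRNA installed; the function names are hypothetical.

```python
import itertools
import math
from collections import Counter
from typing import Callable

# Standard genetic code; codons enumerated in TCAG order ('*' = stop).
_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {"".join(c): aa for c, aa in zip(itertools.product("TCAG", repeat=3), _AA)}


def mean_window_energy(seq: str, fold_energy: Callable[[str], float], window: int = 40) -> float:
    """Average folding free energy over every `window`-nt window (step of 1 nt)."""
    if len(seq) <= window:
        return fold_energy(seq)
    energies = [fold_energy(seq[i:i + window]) for i in range(len(seq) - window + 1)]
    return sum(energies) / len(energies)


def synonymous_variant_count(protein: str) -> int:
    """Number of coding sequences for `protein`: the product of codon degeneracies."""
    degeneracy = Counter(CODON.values())
    return math.prod(degeneracy[aa] for aa in protein)


# Even a 4-residue peptide already has 1 * 6 * 6 * 6 synonymous encodings.
print(synonymous_variant_count("MLSR"))  # 216
```

With the ViennaRNA Python bindings installed, `fold_energy` could be `lambda s: RNA.fold(s)[1]`, since `RNA.fold` returns a (structure, minimum free energy) pair.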
The genetic algorithm has the following stages:

- Stage 0 (initialization): We created 10,000 copies of our original mRNA sequence and measured its average folding energy. We then repeated the following stages for 50 generations:
- Stage 1 (selection): Out of the 10,000 variants, we kept only the best 400 sequences (those with the weakest folding, i.e. average free energy closest to zero) plus another 100 randomly selected sequences. This leaves 500 of the 10,000 sequences; the randomly selected ones help the algorithm avoid getting stuck in a local optimum.
- Stage 2 (crossover): We imitated natural reproduction. 10,000 times, we chose two random "parents" from the survivors of Stage 1, picked a random location in the sequence, and combined the first part of the first parent (from the beginning up to that location) with the second part of the second parent (from that location to the end). We verified that the amino acid sequence was unchanged, and continued with these 10,000 new "offspring" sequences.
- Stage 3 (mutation): For each of the 10,000 new sequences, we went over the nucleotides up to the thirteenth codon and introduced a point mutation (SNP) at each position with a probability of 1/1000. We made sure that the mutations did not change the Shine-Dalgarno sequence, the start codon, or the amino acid sequence.
- Stage 4 (evaluation and ranking): We computed the average free energy of the mutated sequences, ranked them accordingly, and returned to Stage 1.
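The stages above can be sketched as a scaled-down genetic algorithm in Python. This is an illustration under stated assumptions, not the team's implementation: it works on the coding sequence only (so the Shine-Dalgarno region is never touched), assumes `score` is higher for weaker folding (e.g. the windowed average free energy), and restricts crossover to codon boundaries, which is one simple way to guarantee the protein is unchanged. It also tracks the best sequence seen so far. All function names and the toy A/T score used in the demo are hypothetical.

```python
import itertools
import random
from typing import Callable, List, Tuple

# Standard genetic code; codons enumerated in TCAG order ('*' = stop).
_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {"".join(c): aa for c, aa in zip(itertools.product("TCAG", repeat=3), _AA)}


def translate(cds: str) -> str:
    return "".join(CODON[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def mutate(seq: str, rng: random.Random, rate: float, mutable_codons: int) -> str:
    """Stage 3: per-nucleotide SNPs in codons 2..mutable_codons, kept only if synonymous."""
    s = list(seq)
    for i in range(3, min(3 * mutable_codons, len(s))):  # positions 0-2 are the start codon
        if rng.random() < rate:
            start = i - i % 3
            old = "".join(s[start:start + 3])
            base = rng.choice([b for b in "ACGT" if b != s[i]])
            new = old[:i - start] + base + old[i - start + 1:]
            if CODON[new] == CODON[old]:  # reject anything that changes the amino acid
                s[i] = base
    return "".join(s)


def genetic_optimize(cds: str, score: Callable[[str], float], generations: int = 50,
                     pop_size: int = 10_000, n_best: int = 400, n_random: int = 100,
                     rate: float = 1e-3, mutable_codons: int = 13,
                     seed: int = 0) -> Tuple[str, List[float]]:
    """Return the best sequence found and the best score of each generation."""
    rng = random.Random(seed)
    protein = translate(cds)
    population = [cds] * pop_size                                      # Stage 0
    best, best_score, history = cds, score(cds), []
    for _ in range(generations):
        # Stage 4 (Stage 0 on the first pass): score and rank, weakest folding first.
        scored = sorted(((score(s), s) for s in population), reverse=True)
        history.append(scored[0][0])
        if scored[0][0] > best_score:
            best_score, best = scored[0]
        ranked = [s for _, s in scored]
        # Stage 1: keep the top n_best plus n_random others for diversity.
        survivors = ranked[:n_best] + rng.sample(ranked[n_best:], n_random)
        # Stage 2: single-point crossover at a codon boundary preserves the protein.
        offspring = []
        for _ in range(pop_size):
            p1, p2 = rng.sample(survivors, 2)
            cut = 3 * rng.randrange(1, len(cds) // 3)
            offspring.append(p1[:cut] + p2[cut:])
        population = [mutate(s, rng, rate, mutable_codons) for s in offspring]  # Stage 3
    for s in population:  # evaluate the final generation as well
        if score(s) > best_score:
            best, best_score = s, score(s)
    assert translate(best) == protein
    return best, history


# Toy stand-in score (hypothetical): A/T count in the first 39 nt as a proxy for weak folding.
def at_score(s: str) -> float:
    return float(sum(b in "AT" for b in s[:39]))


cds = "ATG" + "GGC" * 12 + "TAA"
best, history = genetic_optimize(cds, at_score, generations=15, pop_size=60,
                                 n_best=10, n_random=5, rate=0.05)
print(translate(best) == translate(cds), at_score(best) > at_score(cds))
```

The demo uses a tiny population and a high mutation rate so that it runs instantly; the parameters stated in the text are the function defaults.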


The following graph shows the sequence with the weakest folding in every generation. As you can see, using this method we were able to create variants with weakened folding, without altering the amino acid sequence:


Finally, we tried to merge these two methods (Shine-Dalgarno and folding optimization) in order to create the ultimate gene. We created different variants and checked their free energy at different locations along the gene. Each point in the following graph represents the free energy of the 40-nucleotide window starting at that position. The goal of this test was to verify that the folding near the start codon stays weak (free energy close to zero). The variants that we checked are the following:
- Original rfp gene
- Folding optimization (without Shine-Dalgarno optimization)
- Shine-Dalgarno optimization (without folding optimization)
- Shine-Dalgarno optimization, with folding optimization only of the new nucleotides added during the Shine-Dalgarno optimization (we went through every possible combination of these two new nucleotides and took the ones with the best folding)
- Shine-Dalgarno optimization and folding optimization
- A hybrid: the previous variant (Shine-Dalgarno and folding optimization) from the beginning up to the start codon, followed by the folding-only optimization (the second variant)
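The windowed energy profile and the exhaustive search over the two inserted nucleotides can be sketched as follows. This is a hypothetical illustration: the text does not say which window was used to judge each nucleotide pair, so the sketch assumes a single window roughly centered on the start codon, and the folding-energy predictor is a parameter (ViennaRNA in the original work). The demo uses a toy GC-count energy as a stand-in.

```python
import itertools
from typing import Callable, List, Tuple


def energy_profile(seq: str, fold_energy: Callable[[str], float], window: int = 40) -> List[float]:
    """Free energy of the window starting at each position (one graph point per position)."""
    return [fold_energy(seq[i:i + window]) for i in range(len(seq) - window + 1)]


def rank_fillers(prefix: str, suffix: str, fold_energy: Callable[[str], float],
                 n: int = 2, window: int = 40) -> List[Tuple[float, str]]:
    """Try all 4**n fillers between `prefix` (UTR up to the insertion point) and
    `suffix` (CDS from the ATG); rank them by the free energy of a window around
    the start codon, weakest folding (closest to zero) first."""
    results = []
    for combo in itertools.product("ACGT", repeat=n):
        filler = "".join(combo)
        seq = prefix + filler + suffix
        start = max(0, len(prefix) + n - window // 2)  # assumed: window centred on the ATG
        results.append((fold_energy(seq[start:start + window]), filler))
    return sorted(results, reverse=True)


def gc_energy(s: str) -> float:
    """Toy stand-in for a real predictor: more G/C means stronger (more negative) folding."""
    return -float(sum(b in "GC" for b in s))


ranked = rank_fillers("AGGAGGTAAAA", "ATGGCC", gc_energy)
print(ranked[0])  # (-8.0, 'TT')
```

Ties are broken by the filler string, so with a real predictor it may be worth keeping all equally good pairs, as the text does ("took the ones with the best folding").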

After reviewing these results, we have decided to test three variants in the lab:
- Shine-Dalgarno and folding optimization (green)
- Folding optimization (black)
- Shine-Dalgarno optimization with folding optimization only of the new nucleotides (yellow)
The wet lab experiment showed that the Shine-Dalgarno and folding optimization (green) variant performed better than the original plasmid(!), so we succeeded in improving the part. You can read more about this test on our Part Improvement page.


Promoter Binding Detection

To build our plasmids, we needed to find the location of the pyocin cluster's promoter. We used bioinformatics tools to do so, as follows:

1. We retrieved the genomes of more than 300 Pseudomonas strains, expecting the promoter to be conserved across different strains[6].
2. Because the annotations of these genomes were problematic, we had to locate the pyocin cluster ourselves: we took the first 10 nucleotides of prf5, the first gene in the cluster, and searched every genome for sequences that differ from them by at most one change. We found such sequences in 116 genomes and proceeded with them.
3. From each genome, we took the 1000 nucleotides upstream of the beginning of prf5 (as found in the previous step) and the first 100 nucleotides of the gene.
4. We performed a multiple sequence alignment (MSA) using MAFFT[7]. The results showed extremely high similarity between the sequences.
5. We built a consensus sequence from the MSA, composed of the most common nucleotide at each position.
6. In prokaryotes, RNA polymerase uses two elements to bind correctly to the promoter: the -35 element and the -10 element[8]. These elements have consensus sequences (TTGACA and TATAAT, respectively), but they occur in many variants close to these consensus sequences. In addition, the average spacing between these elements in E. coli is 17 nucleotides[9]. We therefore looked upstream of the cluster for sequences similar to these consensus sequences (up to one change) with a suitable distance between them. We found a suitable sequence 94 nucleotides upstream of the start codon: TTGACG followed by TGTAAT exactly 17 nucleotides later, which matches the literature perfectly. The following graphs show the suspected promoter and its distances from the next and previous genes (X axis); the Y axis represents how well conserved each position is.
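Steps 2, 3, 5 and 6 can be sketched in plain Python; the MAFFT alignment of step 4 is not reimplemented. This is a hypothetical illustration with these assumptions: "one change" is treated as one substitution (Hamming distance, no indels), only the forward strand is searched, and the 17-nucleotide distance is taken as the spacer between the end of the -35 box and the start of the -10 box. The function names and the demo sequence are illustrative only.

```python
from collections import Counter
from typing import List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def approx_find(genome: str, probe: str, max_mismatches: int = 1) -> List[int]:
    """Step 2: start positions where `probe` matches with <= max_mismatches substitutions."""
    k = len(probe)
    return [i for i in range(len(genome) - k + 1)
            if hamming(genome[i:i + k], probe) <= max_mismatches]


def extract_region(genome: str, gene_start: int, upstream: int = 1000, downstream: int = 100) -> str:
    """Step 3: `upstream` nt before the gene start plus the first `downstream` nt of the gene."""
    return genome[max(0, gene_start - upstream):gene_start + downstream]


def consensus(aligned: List[str]) -> str:
    """Step 5: most common character in each MSA column (gap characters count too)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))


def find_promoters(seq: str, box35: str = "TTGACA", box10: str = "TATAAT",
                   spacer: int = 17, max_mismatches: int = 1) -> List[Tuple[int, int]]:
    """Step 6: (start of -35 box, start of -10 box) pairs where both boxes match
    within max_mismatches and exactly `spacer` nt separate them."""
    hits = []
    for i in range(len(seq) - len(box35) - spacer - len(box10) + 1):
        j = i + len(box35) + spacer
        if (hamming(seq[i:i + len(box35)], box35) <= max_mismatches
                and hamming(seq[j:j + len(box10)], box10) <= max_mismatches):
            hits.append((i, j))
    return hits


# Synthetic example built around the motifs reported above (TTGACG ... TGTAAT).
demo = "CCCC" + "TTGACG" + "A" * 17 + "TGTAAT" + "CCCC"
print(find_promoters(demo))  # [(4, 27)]
```

Both reported boxes differ from their consensus by exactly one base, so a threshold of one mismatch is the smallest that recovers them.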

Zooming in on the suspected promoter itself:

This bioinformatics research helped us to include the promoter in our plasmids.


References

[1] Shaham, G. & Tuller, T. Genome scale analysis of Escherichia coli with a comprehensive prokaryotic sequence-based biophysical model of translation initiation and elongation. DNA Research 25, 195–205 (2018).
[2] Salis, H. M., Mirsky, E. A. & Voigt, C. A. Automated design of synthetic ribosome binding sites to control protein expression. Nature Biotechnology 27, 946–950 (2009).
[3] Shine, J. & Dalgarno, L. The 3'-terminal sequence of Escherichia coli 16S ribosomal RNA: complementarity to nonsense triplets and ribosome binding sites. Proc. Natl Acad. Sci. USA 71, 1342–1346 (1974). doi:10.1073/pnas.71.4.1342
[4] Tuller, T., Waldman, Y. Y., Kupiec, M. & Ruppin, E. Translation efficiency is determined by both codon bias and folding energy. Proc. Natl Acad. Sci. USA 107, 3645 (2010).
[5] Lorenz, R. et al. ViennaRNA Package 2.0. Algorithms for Molecular Biology 6, 26 (2011).
[6] Winsor, G. L., Griffiths, E. J., Lo, R., Dhillon, B. K., Shay, J. A. & Brinkman, F. S. Enhanced annotations and features for comparing thousands of Pseudomonas genomes in the Pseudomonas genome database. Nucleic Acids Research 44, D646–D653 (2016). doi:10.1093/nar/gkv1227
[7] Katoh, K., Rozewicki, J. & Yamada, K. D. MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization. Briefings in Bioinformatics 20, 1160–1166 (2017).
[8] Browning, D. F. & Busby, S. J. W. The regulation of bacterial transcription initiation. Nature Reviews Microbiology 2, 57–65 (2004).
[9] Lisser, S. & Margalit, H. Compilation of E. coli mRNA promoter sequences. Nucleic Acids Research 21, 1507–1516 (1993).
