Team:NAWI Graz/Model

Beesensor

1 Summary



For a better understanding of our project, we used computational modeling for specific parts of it. On the engineering side, we modeled cyclic voltammetry, electrochemical impedance and phage-spore binding to identify suitable parameter values for our biosensor configuration.


On the biological side, we predicted possible tertiary and quaternary structures of the protein that we assume to be crucial for binding to Paenibacillus larvae cells and spores. The attempt to model g17 as a monomer suggested that the protein relies heavily on the two other subunits, with all three stabilizing each other to form a functional receptor-binding protein. Although the modeling produced a lot of data, additional steps would be necessary to verify the results.


2 Introduction


2.1 General idea/reason for structure prediction


Phage genes are clustered, and this also holds for the Siphoviridae, the family of HB10c2. The first cluster contains the structural proteins, and the rough order of its genes is conserved across many species. The phages do not always share the same genes, but the functions are the same. The genes coding for the baseplate structure are located between the tail length tape measure protein (tmp) gene and the holin genes. Two genes regularly found there are Dit and Tal, which build the core of the baseplate; they are followed by further baseplate proteins, including the receptor-binding protein (RBP). Many RBPs are trimers and share the same functional architecture: a region linked to the phage, a middle part, and the binding site, which contains the specific loop regions. In any case, the RBP can be expected to be a rather large protein.
The genome of HB10c2 also has a cluster of structural genes, and gene 14 has been identified as a putative tmp. Although only little data is available, Phyre2 modeling of genes 15 and 16 showed similarities to a Dit and a Tal protein, respectively, and gene 17 showed similarities to the RBP of the lactococcal phage 1358. The following genes 18, 19 and 20 are small, and according to the Phyre2 models their structures do not resemble any known RBP. Gene 21 is a putative lysin and should therefore end the baseplate coding region [1-4].



2.2 Software descriptions


The following sections give a short overview of all software suites used for structure modeling. This part of the modeling page is intended to be read, understood and used by molecular biologists and researchers in related fields with basic bioinformatics background knowledge.



2.2.1 T-Coffee Expresso

T-Coffee is a multiple sequence alignment package that can be used for alignment alone or to combine different alignment methods. It gives hints on secondary structure similarities between the submitted sequences based on amino acid properties and not on identity alone. It is open source and distributed under the GNU General Public License [5]. We used T-Coffee Expresso to obtain information about the general secondary structure of the protein and to compare it with the proteins suggested by other algorithms.

It is accessible under http://tcoffee.crg.cat/apps/tcoffee/index.html



2.2.2 Clustal Omega

Like T-Coffee, Clustal Omega can be used to align several amino acid sequences. Instead of comparing local structure information, the software uses seeded guide trees and a new HMM engine to divide the sequences into small groups of similar sequences that are then aligned [6]. We used Clustal Omega to prepare Grishin files, an obligatory step for the RosettaCM comparative modeling algorithm of Rosetta3.

It is accessible under https://www.ebi.ac.uk/Tools/msa/clustalo/



2.2.3 PSIPRED & pGenTHREADER

PSIPRED, offered by the University College London (UCL) Bioinformatics Group, predicts secondary structure through neural network analysis of Position-Specific Iterated BLAST (PSI-BLAST) results. pGenTHREADER is used for fold recognition and for the identification of distant homologues, and it makes use of the data generated by PSIPRED [7]. We used both algorithms for sequence analysis and to find comparable PDB structures for our target protein.

Both are accessible under http://bioinf.cs.ucl.ac.uk/psipred/



2.2.4 Robetta

Robetta is a continually evaluated web interface that offers ab initio and comparative modeling of protein domains for non-commercial users. If no PDB homologues are detected, modeling is done with the Rosetta de novo protocol [8]. For comparative modeling, PDB structures are detected, aligned using HHSEARCH/HHpred, RaptorX and Sparks-X, and clustered; models are then generated automatically by the Rosetta comparative modeling protocol (RosettaCM [9]). We used Robetta to generate fragment libraries, an obligatory step for Rosetta3 comparative modeling, and for preliminary structure predictions.

Robetta is available for registered users under http://new.robetta.org/



2.2.5 SWISS-MODEL

With SWISS-MODEL, the Schwede Group of the Swiss Institute of Bioinformatics offers a web interface for guided protein structure homology modeling. Based on the initial information obtained through alignment, the comparative modeling engine searches structural databases for viable template candidates. If no candidates are found, Monte Carlo techniques are used to sample the conformational space. Side chains are built from a backbone-dependent rotamer library using the graph-based TreePack algorithm and the SCWRL4 energy function, and remaining unfavorable interactions are resolved by energy minimization with the OpenMM library [10]. We used this algorithm to find homologous proteins with verified structures.

The application is available under https://swissmodel.expasy.org/interactive



2.2.6 Phyre2

Phyre2, the Protein Homology/analogy Recognition Engine V 2.0 of the Structural Bioinformatics Group at Imperial College London, is a web interface that uses PSI-BLAST to search for sequences homologous to a supplied amino acid sequence. These sequences are converted into a hidden Markov model (HMM) that captures the mutation patterns among the homologous sequences and thus represents the evolutionary history of the supplied protein. Phyre2 maintains an HMM database of all known protein structures; the HMM of the supplied sequence is scanned against this database to detect similarities through alignments and to build 3D structures. This procedure is reported to work even if sequence identity is below 15% [11, 12]. We used Phyre2 for initial structure prediction.

The application is available under http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index



2.2.7 GalaxyHomomer

This application is provided by the Computational Biology Lab of the Department of Chemistry of Seoul National University. It predicts homo-oligomer structures from an amino acid sequence or a monomer structure, with additional input on how many monomers are expected in the final oligomer. It generates five models through sequence similarity-based, structure similarity-based and ab initio docking approaches. Less reliable loop or terminus regions are detected automatically and remodeled, and the structures are relaxed with the GalaxyRefineComplex protocol [13]. We used GalaxyHomomer to compare the predicted homotrimer with the already verified homotrimeric baseplate proteins of lactococcal phages.

GalaxyHomomer is available under http://galaxy.seoklab.org/cgi-bin/submit.cgi?type=HOMOMER



2.2.8 Rosetta3

Rosetta is a software suite for the prediction and design of protein structures, protein folding mechanisms and protein interactions. It was developed and is maintained mainly by the Baker lab at the University of Washington together with associated developers all over the world. It began in the late 1990s as a protein structure prediction tool and is now used by over 10,000 licensed users in a wide range of life science fields, such as the development of new vaccines, materials, targeted protein binders and enzyme design. It generally works on amino acid sequences and protein structure files as input. Unlike many of the web-based applications named above, Rosetta3 has to be set up on local machines or clusters, and new users may need some time before they can use its full potential for their own projects [14]. We used Rosetta3 comparative modeling (RosettaCM [9]) according to the tutorial to produce structures of our target protein g17 with the help of threaded models built on protein structures suggested by the other applications.

Free licenses for academic usage only and the software suite are available under https://www.rosettacommons.org/software/license-and-download



2.3 Method descriptions and inputs


2.3.1 Basic Local Alignment Search Tool - BLAST

BLAST algorithms such as NCBI BLAST compare a supplied query against a wide range of databases to identify similar sequences and lead users to further information, such as publications [15]. We used blastp for an initial comparison of our target protein with the NCBI databases and the Protein Data Bank (PDB) to find further publications and proteins with solved structures and high sequence identity.

NCBI blastp is available under https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Proteins



2.3.2 Alignment


Alignment algorithms compare sequence identities and, depending on the software, also amino acid properties [5]. In our case, T-Coffee and Clustal Omega were supplied with FASTA-format amino acid sequence files of our target protein g17 and of several proteins with solved structures suggested by SWISS-MODEL and by PSIPRED & pGenTHREADER. The resulting alignments were converted into the Grishin files required to prepare the RosettaCM runs.
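
The conversion of a pairwise alignment into a Grishin file can be sketched in a few lines of Python. This is only an illustration: the block layout follows the Grishin format as it is commonly shown in the RosettaCM documentation and should be checked against it, and the function name, identifiers and toy sequences are hypothetical and not taken from our actual runs.

```python
def to_grishin(target_id: str, template_id: str,
               target_aln: str, template_aln: str) -> str:
    """Format one aligned target/template pair as a Grishin block.

    Assumption: both rows come from the same alignment (gaps as '-')
    and both sequences start at their first residue (offset 0).
    """
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    # Columns that are gaps in both rows are left over from a multiple
    # alignment with other sequences and carry no information here.
    cols = [(t, s) for t, s in zip(target_aln, template_aln)
            if not (t == "-" and s == "-")]
    target_row = "".join(t for t, _ in cols)
    template_row = "".join(s for _, s in cols)
    lines = [
        f"## {target_id} {template_id}",
        "#",
        "scores_from_program: 0",
        f"0 {target_row}",
        f"0 {template_row}",
        "--",
    ]
    return "\n".join(lines) + "\n"


# Toy example with made-up sequences (not the real g17 alignment).
block = to_grishin("g17", "4l9b.pdb", "MK-TA-", "MR-S-L")
print(block)
```

One such block is written per template, so an alignment of g17 against ten templates yields ten Grishin files.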

2.3.3 Threading


Threading is a method for producing protein structures and is comparable to homology modeling. Instead of relying on sequence information only, threading also makes use of structural information and can therefore be used for protein structure prediction even when no previously described proteins with closely matching sequences are available [16]. In our case, this was done with the partial_thread.default.linuxgccrelease application of the Rosetta3 software suite, supplied with the prepared Grishin alignment files and the FASTA file of our target.
As input, we used a g17 FASTA file plus protein structures suggested either by PSIPRED & pGenTHREADER or by SWISS-MODEL.



FASTA file: g17 FASTA file
Suggested by pGenTHREADER (Grishin files): g17_on_4l9b.pdb, g17_on_5m5z.pdb, g17_on_5xqgA.pdb, g17_on_6gw0.pdb
Suggested by SWISS-MODEL (Grishin files): g17_on_2f0c.pdb, g17_on_3d8m.pdb, g17_on_3da0.pdb, g17_on_3ejc.pdb, g17_on_3hg0.pdb, g17_on_3u6x.pdb

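
The threading step has to be repeated once per template, which is easy to script. The following Python sketch only builds the command lines and does not run them. The input flags are written as we recall them from the RosettaCM tutorial and should be verified against the installed Rosetta version; the input file names (g17.fasta, g17_<template>.grishin, <template>.pdb) are hypothetical.

```python
import shlex

# Template identifiers as listed in the input overview above.
TEMPLATES = ["4l9b", "5m5z", "5xqgA", "6gw0",
             "2f0c", "3d8m", "3da0", "3ejc", "3hg0", "3u6x"]


def partial_thread_cmd(fasta, grishin, template_pdb,
                       exe="partial_thread.default.linuxgccrelease"):
    """Return the argument list for one threading run.

    Assumption: flag names follow the RosettaCM tutorial.
    """
    return [
        exe,
        "-in:file:fasta", fasta,
        "-in:file:alignment", grishin,
        "-in:file:template_pdb", template_pdb,
    ]


# Hypothetical file naming scheme, one command per template.
commands = [
    partial_thread_cmd("g17.fasta", f"g17_{t}.grishin", f"{t}.pdb")
    for t in TEMPLATES
]

for cmd in commands:
    print(shlex.join(cmd))
```

Each argument list can be passed to subprocess.run once the paths have been adapted to the local Rosetta installation.
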
2.3.4 Homology modeling

Homology modeling, also known as comparative modeling, is a method for protein structure prediction that makes use of homologous proteins with high sequence identity and already solved structures [16]. We used the Robetta comparative modeling web interface and the RosettaCM protocol [17, 18] of the Rosetta suite to model g17 as a monomer, and GalaxyHomomer to model a potential homotrimer built from g17 units.



PHYRE2: g17 FASTA
Robetta (comparative modeling & ab initio): g17 FASTA
GalaxyHomomer: g17 FASTA, homotrimer option
RosettaCM: g17 FASTA, rosetta_cm.xml, rosetta_cm.options, relax.options, hybridize command via rosetta_scripts, relax command

2.3.5 Ab initio

Ab initio modeling is the part of protein structure prediction that is based on the amino acid sequence alone; it becomes relevant if no homologous protein structures exist or have been identified so far. Ab initio predictions therefore rely only on atomic interactions that are calculated and weighted by complex energy functions [19]. We used the Robetta ab initio protocol for initial structure prediction.



3 Results & Discussion



3.1 NCBI blastp

The blastp search against the non-redundant protein sequences (nr) database returned 17 entries with a sequence identity above 90%, but most of them were annotated as hypothetical proteins and none had a verified structure. When the database was switched to the Protein Data Bank proteins (pdb), the results contained only one entry, chain A of topoisomerase V of Methanopyrus kandleri (5HM5_A), with a sequence identity of 27.27%. Therefore, other search algorithms had to help us find comparable structures.
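
For readers unfamiliar with the identity values reported above, the following Python sketch shows one common way to compute percent identity from two aligned sequences: identical columns divided by the alignment length. The function name and the toy sequences are hypothetical, and other tools may use a different denominator (for example the length of the shorter sequence).

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent of alignment columns with identical residues.

    Gaps are written as '-'. Columns that are gaps in both rows are
    ignored; every remaining column counts towards the length.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aln_a, aln_b)
               if not (a == "-" and b == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# Toy alignment: 4 identical columns out of 6.
identity = percent_identity("ACDE-G", "ACQEFG")
print(f"{identity:.2f}%")
```
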



3.2 PHYRE2

Because of the taxonomic relationship between our Paenibacillus larvae phage and the lactococcal phage described in 2.1, we submitted several genes of the cluster that presumably contains the baseplate protein responsible for the phage-bacteria binding event. As a result, our research concentrated on the protein g17, because the predicted monomer showed a structure similar to that of the lactococcal phage protein responsible for the binding event. Both structures are displayed in Figure 1.



3.3 PSIPRED & pGenTHREADER

The PSIPRED secondary structure predictions are displayed in Figure 1. The pGenTHREADER results are displayed in Figure 2 and led to the first sequences used for the preparation of Grishin files and later as threaded template structures for RosettaCM comparative modeling. Because the RosettaCM protocol gave poor results with these templates, new template structures had to be found, which was done using the SWISS-MODEL algorithm.



3.4 SWISS-MODEL

The SWISS-MODEL sequence alignment results led to six phage proteins with solved structures and a sequence similarity of 19 to 21%, all of which were downloaded from the Protein Data Bank Japan. These sequences and structures were used to prepare additional Grishin files and RosettaCM calculations.



3.5 T-Coffee & Clustal Omega

The T-Coffee alignment results for g17 and the proteins suggested by pGenTHREADER are displayed in Figure 1, and those for g17 and the proteins suggested by SWISS-MODEL are displayed in Figure 1. The Clustal Omega alignment results for g17 and the proteins suggested by pGenTHREADER are displayed in Figure 1, and those for g17 and the proteins suggested by SWISS-MODEL are displayed in Figure 1. These results were used to produce the Grishin files for the RosettaCM partial_thread.default.linuxgccrelease application.



3.6 Robetta

Robetta produced five comparative modeling models with an overall confidence of 0.08, which are displayed with their estimated error in Å per amino acid position in Figure 1-5. The five structures show a high diversity in tertiary structure, and none of them resembled the PHYRE2 prediction. The estimated error per position varied widely, between 1 and 25 Å, but a good number of positions were at least …

Robetta also produced five ab initio models with an overall confidence of 0.11, which are displayed with their estimated error in Å per amino acid position in Figure 1-5. Again, the five structures show a high diversity in tertiary structure, and none of them resembled the PHYRE2 prediction. The estimated error per position varied between 5 and 45 Å, which is not reliable enough for further analysis.

Overall, the predicted structures did not resemble the PHYRE2 structure at all. This is most likely because Robetta builds the protein as a self-stabilizing monomer, whereas in nature the receptor-binding proteins of phages solved so far form homotrimers, presumably to stabilize each other. At the C-terminal end, however, where the lactococcal proteins form individual beta barrels, some Robetta structures hinted at this tertiary structure too, especially cm_model5 around amino acid position 325, where the estimated error per position was around 2 Å, which is a good value.
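
Regions with a low estimated error, such as the one around position 325, can also be located automatically. The Python sketch below scans a list of per-position error estimates for stretches that stay at or below a cutoff. The error values, the cutoff of 2 Å and the minimum stretch length are illustrative assumptions; no particular Robetta output format is assumed.

```python
def reliable_segments(errors, cutoff=2.0, min_len=5):
    """Return (start, end) ranges, 1-based and inclusive, in which the
    estimated error stays <= cutoff for at least min_len positions."""
    segments = []
    start = None
    for pos, err in enumerate(errors, start=1):
        if err <= cutoff:
            if start is None:
                start = pos          # a low-error stretch begins here
        else:
            if start is not None and pos - start >= min_len:
                segments.append((start, pos - 1))
            start = None             # the stretch is interrupted
    # Close a stretch that runs to the end of the sequence.
    if start is not None and len(errors) - start + 1 >= min_len:
        segments.append((start, len(errors)))
    return segments


# Made-up error estimates in Å for a ten-residue toy protein.
toy_errors = [10.0, 10.0, 1.0, 1.5, 2.0, 1.0, 1.0, 12.0, 1.0, 1.0]
segments = reliable_segments(toy_errors)
print(segments)
```

Applied to each of the five models, such a scan makes it easier to compare which parts of the chain are predicted consistently.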



3.7 RosettaCM

We used the RosettaCM protocol with the corresponding commands and files to produce around 100 structures; a selection of them is displayed in Figure 4. The scores became more and more negative over the predicted structures, down to a minimum score of -600 for the structure __0100. This score can be compared with the energy that would be released by burying the hydrophobic residues in the protein interior, also known as hydrophobic collapse [20]. Even so, this method of structure prediction is not sufficient for our protein, as the monomers seem to rely on stabilization by the two other units to form their characteristic C-terminal beta barrels.
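
With around 100 output structures, ranking by score is best done with a script. The following Python sketch reads the text of a Rosetta score file and returns the lowest-scoring models. It assumes the usual layout in which every data line starts with "SCORE:" and a header line names the columns, including total_score and description; the sample values and model names below are made up and are not our results.

```python
def best_models(score_text: str, n: int = 5):
    """Return the n lowest (total_score, description) pairs."""
    header = None
    rows = []
    for line in score_text.splitlines():
        if not line.startswith("SCORE:"):
            continue                      # e.g. the SEQUENCE: line
        fields = line.split()[1:]
        if "total_score" in fields:
            header = fields               # (re)read the column names
            continue
        if header is None or len(fields) != len(header):
            continue                      # skip malformed lines
        row = dict(zip(header, fields))
        rows.append((float(row["total_score"]), row["description"]))
    return sorted(rows)[:n]


# Hypothetical score file content for illustration only.
sample = """SEQUENCE:
SCORE: total_score fa_atr description
SCORE: -512.3 -900.1 S_0001
SCORE: -600.0 -950.2 S_0100
SCORE: -480.7 -870.4 S_0042
"""
top = best_models(sample, n=2)
print(top)
```

More negative scores indicate more favorable models within one run, so the first entry of the returned list is the candidate to inspect first.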

3.8 GalaxyHomomer

GalaxyHomomer initially produced five models, which are displayed in Figure 1-5. Of these, only model_3 showed interesting folding behavior: after T211 it adopts the same tertiary structures as the lactococcal phage homotrimer protein, which let the monomer units stabilize each other, and the same beta barrels, which are presumably important for the binding event. The long helices, which could not be modeled with sufficient precision, presumably anchor the protein in the baseplate, as has been modeled for the baseplate of phage A118 [21]. For this reason, model_3 was resubmitted to GalaxyHomomer for additional refinement, which produced 10 more structures but no real improvement in folding; in particular, the N-terminal fold before T211 did not change. The model may look impressive, but its stability is probably poor, and the same mechanism may be at work here as for the monomer: only the presence of the other proteins that form the complete baseplate ensures correct tertiary structure folding. Still, as a preliminary result and as a hint that the structure of our BioBrick could indeed look as we expected, this structure can be used for further calculations.



4 Conclusion


In conclusion, many of the preparation steps for structure prediction led to poor results in the beginning. In particular, algorithms that use the same mechanisms gave very different results, which made this modeling work quite a challenge. Luckily, some people at our institute could help us a lot, and we want to acknowledge the support of Gustav Oberdorfer and Julia Messenlehner. Without their feedback, this subproject would not have been possible at all. With more work on information processing, refinement and deciding which information is useful, we want to take this biological modeling further and predict a reliable structure of a potentially active g17. This would strongly support the lab experiments and help the Beesensor project to succeed beyond iGEM.

5 References



[1] Notredame, C., Higgins, D. G., and Heringa, J. (2000) T-Coffee: A novel method for fast and accurate multiple sequence alignment, Journal of molecular biology 302, 205-217.

[2] Madeira, F., Park, Y. M., Lee, J., Buso, N., Gur, T., Madhusoodanan, N., Basutkar, P., Tivey, A. R. N., Potter, S. C., Finn, R. D., and Lopez, R. (2019) The EMBL-EBI search and sequence analysis tools APIs in 2019, Nucleic Acids Res 47, W636-W641.

[3] Lobley, A., Sadowski, M. I., and Jones, D. T. (2009) pGenTHREADER and pDomTHREADER: new methods for improved protein fold recognition and superfamily discrimination, Bioinformatics (Oxford, England) 25, 1761-1767.

[4] Raman, S., Vernon, R., Thompson, J., Tyka, M., Sadreyev, R., Pei, J., Kim, D., Kellogg, E., DiMaio, F., Lange, O., Kinch, L., Sheffler, W., Kim, B.-H., Das, R., Grishin, N. V., and Baker, D. (2009) Structure prediction for CASP8 with all-atom refinement using Rosetta, Proteins 77 Suppl 9, 89-99.

[5] Notredame, C., Higgins, D. G., and Heringa, J. (2000) T-Coffee: A novel method for fast and accurate multiple sequence alignment, Journal of molecular biology 302, 205-217.

[6] Madeira, F., Park, Y. M., Lee, J., Buso, N., Gur, T., Madhusoodanan, N., Basutkar, P., Tivey, A. R. N., Potter, S. C., Finn, R. D., and Lopez, R. (2019) The EMBL-EBI search and sequence analysis tools APIs in 2019, Nucleic Acids Res 47, W636-W641.

[7] Lobley, A., Sadowski, M. I., and Jones, D. T. (2009) pGenTHREADER and pDomTHREADER: new methods for improved protein fold recognition and superfamily discrimination, Bioinformatics (Oxford, England) 25, 1761-1767.

[8] Raman, S., Vernon, R., Thompson, J., Tyka, M., Sadreyev, R., Pei, J., Kim, D., Kellogg, E., DiMaio, F., Lange, O., Kinch, L., Sheffler, W., Kim, B.-H., Das, R., Grishin, N. V., and Baker, D. (2009) Structure prediction for CASP8 with all-atom refinement using Rosetta, Proteins 77 Suppl 9, 89-99.

[9] Song, Y., DiMaio, F., Wang, R. Y.-R., Kim, D., Miles, C., Brunette, T., Thompson, J., and Baker, D. (2013) High-resolution comparative modeling with RosettaCM, Structure 21, 1735-1742.

[10] Waterhouse, A., Bertoni, M., Bienert, S., Studer, G., Tauriello, G., Gumienny, R., Heer, F. T., de Beer, T. A P., Rempfer, C., Bordoli, L., Lepore, R., and Schwede, T. (2018) SWISS-MODEL: homology modelling of protein structures and complexes, Nucleic Acids Res 46, W296-W303.

[11] Kelley, L. A., Mezulis, S., Yates, C. M., Wass, M. N., and Sternberg, M. J. E. (2015) The Phyre2 web portal for protein modeling, prediction and analysis, Nature Protocols 10, 845.

[12] Soding, J. (2005) Protein homology detection by HMM-HMM comparison, Bioinformatics (Oxford, England) 21, 951-960.

[13] Baek, M., Park, T., Heo, L., Park, C., and Seok, C. (2017) GalaxyHomomer: a web server for protein homo-oligomer structure prediction from a monomer sequence or structure, Nucleic Acids Res 45, W320-W324.

[14] Leaver-Fay, A., Tyka, M., Lewis, S. M., Lange, O. F., Thompson, J., Jacak, R., Kaufman, K., Renfrew, P. D., Smith, C. A., Sheffler, W., Davis, I. W., Cooper, S., Treuille, A., Mandell, D. J., Richter, F., Ban, Y. E., Fleishman, S. J., Corn, J. E., Kim, D. E., Lyskov, S., Berrondo, M., Mentzer, S., Popovic, Z., Havranek, J. J., Karanicolas, J., Das, R., Meiler, J., Kortemme, T., Gray, J. J., Kuhlman, B., Baker, D., and Bradley, P. (2011) ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules, Methods in enzymology 487, 545-574.

[15] Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990) Basic local alignment search tool, Journal of molecular biology 215, 403-410.

[16] Peng, J., and Xu, J. (2010) Low-homology protein threading, Bioinformatics (Oxford, England) 26, i294-i300.

[17] Frank DiMaio, S. R., Jared Adolf-Bryfogle. RosettaCM - Comparative Modeling with Rosetta, Rosetta Commons.

[18] Frank DiMaio, S. R., Jared Adolf-Bryfogle. (2019) Comparative Modeling: Multi-template modeling with RosettaCM, Rosetta Commons.

[19] Huang, P.-S., Boyken, S. E., and Baker, D. (2016) The coming of age of de novo protein design, Nature 537, 320.

[20] Fraga, S., Parker, J. M., and Pocock, J. M. (1995) Computer Simulations of Protein Structures and Interactions, Springer-Verlag.

[21] Cambillau, C. (2015) Bacteriophage module reshuffling results in adaptive host range as exemplified by the baseplate model of listerial phage A118, Virology 484, 86-92.