Team:Wageningen UR/Model





Modeling has become crucial to every scientific field, and synthetic biology is no exception. Computational methods can answer questions that cannot be answered in the lab. With Xylencer, we wanted to leverage this power to answer questions that are key to developing our phage therapy. We combined a broad array of computational techniques: temporal and spatial models, a custom physical protein modeling workflow, comparative genomics, and machine learning. This page gives an overview of the questions we answered with these techniques; links are provided for more in-depth information.

Spatial-temporal modeling

How will our therapy function in a real-world setting?

Testing phage therapy on a single field, let alone a larger area, is far beyond the scope of an iGEM project. Still, this is one of the crucial stages in the Xylencer project. To gain a better understanding of how our therapy would spread and how efficient it would be at curing an X. fastidiosa infection, we employed spatial-temporal modeling. Since X. fastidiosa poses a major threat to the European continent, the EU has already created models that can be used to assess the efficiency of different biocontainment approaches [1]. We took these models as a starting point and incorporated the Xylencer phages to assess the efficiency of our solution. This first required constructing a model of the interaction between the Xylencer phage and an X. fastidiosa colony inside the plant. Combining this model with the latest EU model yielded our final spatial spread model. From this model we could conclude that the Xylencer phages can be effective at combating X. fastidiosa, finally providing the much-needed cure.

Figure 1: An illustration of the method behind the spatial model. A mesh grid is created in which each mesh point represents a tree. For each tree, V (the abundance of vectors colonized by X. fastidiosa) and D (the disease level of the plant) are computed.
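To make the grid idea in Figure 1 concrete, here is a minimal toy sketch in which every grid cell is a tree carrying a value V and a value D. The update rules, the parameter names (`dispersal`, `acquire`, `infect`, `cure`) and their values are illustrative assumptions only; they are not the equations of the EU model [1] or of our final spatial spread model.

```python
def step(V, D, dispersal=0.1, acquire=0.3, infect=0.5, cure=0.0):
    """Advance the toy grid by one time step and return the new (V, D) grids.

    V[r][c]: abundance of colonized vectors on a tree, scaled to [0, 1].
    D[r][c]: disease level of the tree, scaled to [0, 1].
    All rates are hypothetical placeholders.
    """
    n_rows, n_cols = len(V), len(V[0])
    new_V = [[0.0] * n_cols for _ in range(n_rows)]
    new_D = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            # Neighbouring trees (4-neighbourhood, no wrap-around at the edges).
            nbrs = [(r + dr, c + dc)
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                    if 0 <= r + dr < n_rows and 0 <= c + dc < n_cols]
            mean_nbr_V = (sum(V[i][j] for i, j in nbrs) / len(nbrs)
                          if nbrs else V[r][c])
            # Vectors disperse between neighbouring trees and become
            # colonized when feeding on diseased trees.
            v = (V[r][c]
                 + dispersal * (mean_nbr_V - V[r][c])
                 + acquire * D[r][c] * (1.0 - V[r][c]))
            # Disease increases with local colonized vectors and decreases
            # at a (hypothetical) phage-driven cure rate.
            d = D[r][c] + infect * V[r][c] * (1.0 - D[r][c]) - cure * D[r][c]
            new_V[r][c] = min(1.0, max(0.0, v))
            new_D[r][c] = min(1.0, max(0.0, d))
    return new_V, new_D


def simulate(size=5, steps=20, cure=0.0):
    """Start with colonized vectors on the central tree and run the toy model."""
    V = [[0.0] * size for _ in range(size)]
    D = [[0.0] * size for _ in range(size)]
    V[size // 2][size // 2] = 1.0
    for _ in range(steps):
        V, D = step(V, D, cure=cure)
    return V, D


def mean(grid):
    """Average value over all trees in the grid."""
    return sum(map(sum, grid)) / (len(grid) * len(grid[0]))


untreated_D = mean(simulate(cure=0.0)[1])
treated_D = mean(simulate(cure=0.5)[1])
print(f"mean disease level untreated: {untreated_D:.2f}, treated: {treated_D:.2f}")
```

Comparing the two runs shows only how a per-tree cure term would be plugged into such a grid; the numbers themselves carry no biological meaning.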

Genome-scale machine learning

How do we ensure our phage delivery bacterium is safe?

We revolutionize phage therapy with the phage delivery bacterium (PDB). However, selecting a compatible and safe strain for this role is complicated. Phage replication requires a high similarity in cell metabolism between X. fastidiosa and the PDB. This is problematic, as most species closely related to Xylella are known to be phytopathogenic, making them unsafe to use. For a few species, non-pathogenic strains have been reported, but some of these strains were later shown to be pathogenic under different testing conditions [2]. This makes the classification of the other non-pathogens doubtful. By combining comparative genomics with machine learning, we found a genetic basis for non-pathogenicity and selected a set of non-pathogens that conform to this genetic basis. With this information, we selected Xanthomonas arboricola strain CITA 44 as the prime candidate for our PDB. By analyzing this model, we also identified the lack of a Type III secretion system and the lack of transposons linked to specific pathogenicity islands as important factors for non-pathogenicity in the Xanthomonas genus.

Figure 2: Average prediction of 100 machine learning models. Dots represent individual strains from a manually curated pathogenicity dataset. Pathogenicity is indicated by color. Strains that were mispredicted by more than 5% of the models are labeled as “missed” and are shown as crossed-out dots. The figure shows that overall performance is strong, but that there are three main areas where reports on pathogenicity conflict, so strains in these areas cannot be reliably labeled as either pathogen or non-pathogen.
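The aggregation described in the caption of Figure 2 can be sketched as follows. This is a minimal illustration, assuming each model outputs a binary label per strain (1 = pathogen, 0 = non-pathogen); the function name `summarize_ensemble` and the example data are hypothetical and do not come from our actual pipeline.

```python
def summarize_ensemble(predictions, labels, miss_threshold=0.05):
    """Average an ensemble's predictions and flag unreliable strains.

    predictions: one list per model, each holding a 0/1 prediction per strain.
    labels: curated 0/1 label per strain.
    A strain is "missed" when more than `miss_threshold` of the models
    disagree with its curated label.
    """
    n_models = len(predictions)
    summary = []
    for i, label in enumerate(labels):
        votes = [model[i] for model in predictions]
        miss_fraction = sum(vote != label for vote in votes) / n_models
        summary.append({
            "mean_prediction": sum(votes) / n_models,
            "miss_fraction": miss_fraction,
            "missed": miss_fraction > miss_threshold,
        })
    return summary


# Made-up example: 100 models, 3 strains with curated labels 1, 0, 1.
labels = [1, 0, 1]
predictions = []
for k in range(100):
    predictions.append([
        1,                    # strain 0: every model agrees with the label
        1 if k < 3 else 0,    # strain 1: 3 of 100 models disagree
        0 if k < 10 else 1,   # strain 2: 10 of 100 models disagree
    ])

summary = summarize_ensemble(predictions, labels)
for strain, row in enumerate(summary):
    print(strain, row)
```

In this made-up example only the third strain exceeds the 5% threshold and would be drawn as a crossed-out dot.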

Physical Modeling

Would our protein fusions be theoretically possible?

An important part of our therapy is the enhanced Xylencer phages. The creation of these phages hinges on a fusion of the phage capsid protein and an adhesion protein. The capsid proteins form complex multimers to construct the phage capsid, and a fusion protein must not restrict the capsid’s ability to assemble. To assess whether the capsid is sterically hindered by the fusion, we make use of protein modeling.

Regular proteins can nowadays be modeled easily with a single web server. However, fusion proteins cannot be aligned to a single template, since they are composed of multiple proteins. This makes it hard to model them in high quality. We developed a workflow (described below), based on freely accessible web services, that allows anyone with basic knowledge of protein modeling to model and visualize fusion proteins. Using this workflow, we were able to show that either the chitin-binding domain of chitinase A1 (Bacillus circulans) or a GFP can be fused to the decorator protein of phage lambda, our model phage, without obstructing the trimer interface. We also used the same workflow to visualize the integration of the quorum sensing fusion protein RpfCch into the outer membrane of the phage delivery bacterium.

Figure 3: Overview of the fusion protein modeling workflow. The workflow starts with predicting domains; those domains are individually modeled and trimmed. Then the full sequence is threaded through the created models and the linkers are modeled ab initio to yield a high-quality model.
  • Fusion protein modeling workflow

    A prevalent hypothesis on how proteins fold is that protein domains, the smallest functional units of proteins, fold first and do so individually. Only after the domains have folded do the remaining parts of the protein fold in such a way as to minimize the free energy of the entire protein. By closely following this biological process, our workflow aims to achieve the best possible model. The workflow starts with detecting the different protein domains and modeling each detected domain individually. Finally, the full sequence is threaded through the domain models and the connecting linkers are modeled ab initio (Figure 3).


    Protein domains are detected by running the entire sequence through the ThreaDomEx [3] web server. Detected domains are modeled using the award-winning I-TASSER [4] web server. Generated models are manually trimmed to remove linker sequences using UCSF ChimeraX [5], by removing approximately 5 to 20 amino acids at the start and end of the domains, allowing for more flexible modeling. The trimmed models are renumbered and assembled by the AIDA [6] web server. Additionally, for RpfCch the positioning in the outer membrane was calculated using the OPM [7] web server. The final models were visualized using ChimeraX.
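The trimming and renumbering step was done by hand in ChimeraX. For readers who prefer to script it, the sketch below shows one possible way to do the same on the ATOM records of a single-chain PDB file. The function names and the toy input are hypothetical; the code relies only on the fixed-width PDB column layout (residue number in columns 23 to 26, insertion code in column 27).

```python
def trim_and_renumber(pdb_lines, trim_start, trim_end):
    """Drop residues from both ends of a single-chain model and renumber from 1.

    pdb_lines: lines of a PDB file; only ATOM records are kept.
    trim_start / trim_end: number of residues to remove at each terminus.
    """
    atom_lines = [ln for ln in pdb_lines if ln.startswith("ATOM")]
    # Collect residue identifiers (number + insertion code) in file order.
    residue_ids = []
    for ln in atom_lines:
        res_id = ln[22:27]
        if not residue_ids or residue_ids[-1] != res_id:
            residue_ids.append(res_id)
    kept = residue_ids[trim_start:len(residue_ids) - trim_end]
    new_number = {res_id: i for i, res_id in enumerate(kept, start=1)}
    trimmed = []
    for ln in atom_lines:
        res_id = ln[22:27]
        if res_id in new_number:
            # Rewrite columns 23-26 and blank the insertion code in column 27.
            trimmed.append(f"{ln[:22]}{new_number[res_id]:>4d} {ln[27:]}")
    return trimmed


def make_ca_line(serial, res_name, res_num):
    """Build a fixed-width ATOM record for a C-alpha atom (toy coordinates)."""
    return (f"ATOM  {serial:>5d}  CA  {res_name:>3s} A{res_num:>4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{0.0:6.2f}")


# Toy domain model: 30 alanine residues numbered 101 to 130.
model = [make_ca_line(i + 1, "ALA", 101 + i) for i in range(30)]
trimmed_model = trim_and_renumber(model, trim_start=5, trim_end=5)
print(trimmed_model[0])
print(trimmed_model[-1])
```

How many residues to trim at each end still has to be judged per domain, as in the manual procedure; the script only automates the bookkeeping.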

Model Examples

Chimeric DSF sensor RpfCch positioned in the outer membrane. The protein is rainbow colored with the N-terminus blue and C-terminus red. The outer membrane is visualized in red and the inner membrane in blue.
Lambda decorator protein GpD fused to the chitin-binding domain of adhesin ChitA. Blue = GpD, yellow = linker, red = ChitA domain.
Lambda decorator protein GpD fused to GFP. Blue = GpD, yellow = linker, green = GFP.
  • References
    1. Bragard, C., et al. (2019). Update of the Scientific Opinion on the risks to plant health posed by Xylella fastidiosa in the EU territory. EFSA Journal, 17(5).
    2. Ferrante, P., & Scortichini, M. (2018). Xanthomonas arboricola pv. fragariae: a confirmation of the pathogenicity of the pathotype strain. European journal of plant pathology, 150(3), 825-829.
    3. Wang, Y., Wang, J., Li, R., Shi, Q., Xue, Z., & Zhang, Y. (2017). ThreaDomEx: a unified platform for predicting continuous and discontinuous protein domains by multiple-threading and segment assembly. Nucleic acids research, 45(W1), W400-W407.
    4. Roy, A., Kucukural, A., & Zhang, Y. (2010). I-TASSER: a unified platform for automated protein structure and function prediction. Nature protocols, 5(4), 725.
    5. Goddard, T. D., Huang, C. C., Meng, E. C., Pettersen, E. F., Couch, G. S., Morris, J. H., & Ferrin, T. E. (2018). UCSF ChimeraX: Meeting modern challenges in visualization and analysis. Protein Science, 27(1), 14-25.
    6. Xu, D., Jaroszewski, L., Li, Z., & Godzik, A. (2015). AIDA: ab initio domain assembly for automated multi-domain protein structure prediction and domain–domain interaction prediction. Bioinformatics, 31(13), 2098-2105.
    7. Lomize, M. A., Pogozheva, I. D., Joo, H., Mosberg, H. I., & Lomize, A. L. (2011). OPM database and PPM web server: resources for positioning of proteins in membranes. Nucleic acids research, 40(D1), D370-D376.