Team:SUIS Shanghai/Model

Modeling

Overview

We applied modelling to inform the design of many of our project’s parts. This included using bioinformatic tools and molecular visualization software such as PyMOL to make predictions about part sequences and guide our designs.


Our modelling for this project can be divided into two sections. In the main section, we applied a large amount of statistical data, together with molecular visualization and predictive tools, to inform the improvement of an oxidative stress (H2O2) sensor. This new part (BBa_K3031018) belonged to the “improvement” portion of our project: it aimed to improve the basic part BBa_K1104200 and to add characterization for parts BBa_K1104201, BBa_K1104241 and BBa_K1104200. In the second section, we applied bioinformatics to predict the cleavage sites of signal peptides in order to create a catalogue of signal peptides from the genome of Lactobacillus plantarum.


OxyR new part

H2O2 sensor design rationale

Because our project concerns pathogenic infection of fish and the resulting immune response, our team, in addition to designing orally administered vaccines, also explored biomarkers associated with infection in fish. The production of Reactive Oxygen Species (ROS) is a well-known non-specific immune response of many organisms to stressors.


Leukocyte respiratory burst activity: neutrophils, monocytes, macrophages, dendritic cells and B-lymphocytes produce ROS as an immune strategy for destroying pathogens, activated once a microbial agent is detected. The burst produces strong oxidizing compounds that destroy microorganisms.


The fish immune system is composed of innate and acquired defense mechanisms. Phagocytosis is an innate process that connects these two systems, since the processing of pathogens by professional phagocytes is a fundamental stage in antibody production. During phagocytosis, ROS are produced, leading to the formation of potent bactericidal compounds that combat microorganisms.


OxyR

OxyR is an Escherichia coli transcription factor that senses H2O2 and activates transcription of downstream genes. It is activated when the OxyR protein reacts with a ROS, resulting in oxidation of a reactive cysteine and the formation of an intramolecular disulfide bond. Cysteine-199 is oxidized and forms an internal disulfide bond with Cys-208, resulting in a conformational change. The active OxyR can then bind to transcription factor binding sites and activate many promoters.


Figure 1 - OxyR Transcription Factor protein in its reduced form (left) and oxidized form (right) showing a disulfide bond between Cys-199 and Cys-208.

Our objective was to improve the biobrick part for the OxyR transcription factor (Part:BBa_K1104200), which can activate ROS-sensing promoters. Our approach was to analyze site-specific structural information from a wide range of proteins containing a reactive cysteine and apply what we found to mutate and improve the existing biobrick part.


The first stage of disulfide bond formation, a post-translational modification, is the oxidation of a cysteine to form a cysteine sulfenic acid (Cys-SOH). This modification is reversible, but further oxidation of the sulfenic acid can cause irreversible modifications, forming sulfinic and sulfonic acids and leading to a loss of protein function. Some proteins, like OxyR, can recover function by forming an intramolecular disulfide bond with a neighboring cysteine thiol.


Figure 2 - 2D structure of S-sulfenylated Cysteine.

We decided to explore the structural environments surrounding cysteine sulfenic acids as an indicator of cysteine susceptibility to oxidation. This post-translational modification has been identified as a redox sensor in an increasing number of proteins (Beedle et al., 2016). We planned to compare the site-specific structural features of cysteine sulfenic acids with those of unmodified cysteines within the same proteins. The data collected informed us of potential structural features we could incorporate into OxyR to increase its sensitivity to H2O2.


Background Site-Specific Structural Data

Our approach used site-specific structural data gathered from high-resolution crystal structures of proteins shown to contain an oxidized cysteine (i.e. one that has formed a sulfenic acid). The environments of these residues were analyzed and compared to those of non-oxidized cysteines within the same proteins. Differences between the environments of oxidized and reduced cysteines were taken as indicators of features that facilitate oxidation, and which could therefore potentially be used to modify the OxyR protein and make it more sensitive to oxidative stress.


High-resolution crystal structures of proteins (2.2 Angstroms resolution or better) containing S-sulfenylated cysteine sites were first obtained from the PDB (www.pdb.org), and multiple sequence alignment was used to ensure that only unique protein microenvironments would be analyzed. A representative list of unmodified cysteine sites was identified for comparison with the modified sites by first profiling the proteins containing S-sulfenylated sites by UniProtKB GO annotations (https://www.uniprot.org/uploadlists/) and then choosing 40% of the proteins from the original S-sulfenylated cysteine protein list that also contained unmodified cysteines. In total, 373 unique S-sulfenylated cysteine environments from 322 proteins were analyzed. Using the selection criteria outlined above, a total of 426 unmodified cysteine sites were analyzed for comparison. Refer to the workflow below for an outline of data selection.


Figure 3 - Workflow showing the selection of PDB files for site specific analysis of S-sulfenylated cysteines and unmodified cysteines.

Spatial Environment of S-sulfenylated cysteine sites.


The amino acid, solvent, and ligand contacts for each atom contained within all Cys-SOH and unmodified cysteine sites in the dataset were further analyzed using PISA (Proteins, Interfaces, Structures and Assemblies) (Krissinel & Henrick, 2007). The CCP4 Graphical User Interface (CCP4i) (Potterton, Briggs, Turkenburg, & Dodson, 2003) was used to perform and visualize PISA calculations. Crystal structures, in PDB format, of proteins containing sites of interest were used as inputs to the interface. Default parameters were used, and only contacts between 2.0 and 3.2 Angstroms within the chain containing the residue of interest were retained.
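The distance filter described above can be illustrated with a minimal sketch. This is not how PISA works internally; it only shows the idea of keeping same-chain atoms that lie between 2.0 and 3.2 Angstroms from any atom of the residue of interest. The function name `contacts_within`, the tuple layout, and all coordinates are hypothetical and chosen only for illustration.

```python
import math

def contacts_within(atoms, target, d_min=2.0, d_max=3.2):
    """Map (resname, resnum) -> shortest contact distance to the target residue.

    atoms  : list of (chain, resname, resnum, atom_name, x, y, z) tuples
    target : (chain, resnum) of the residue of interest
    Only atoms in the same chain as the target are considered.
    """
    chain, resnum = target
    target_atoms = [a for a in atoms if a[0] == chain and a[2] == resnum]
    hits = {}
    for t in target_atoms:
        for a in atoms:
            if a[0] != chain or a[2] == resnum:
                continue  # skip other chains and the target residue itself
            d = math.dist(t[4:7], a[4:7])
            if d_min <= d <= d_max:
                key = (a[1], a[2])
                hits[key] = min(d, hits.get(key, d))
    return hits

# Invented coordinates (Angstroms), for illustration only.
example_atoms = [
    ("A", "CYS", 50, "SG", 0.0, 0.0, 0.0),
    ("A", "CYS", 50, "CB", -1.8, 0.0, 0.0),
    ("A", "VAL", 12, "CG1", 3.0, 0.0, 0.0),   # 3.0 from SG: kept
    ("A", "GLN", 54, "NE2", 0.0, 4.0, 0.0),   # 4.0 from SG: too far
    ("B", "SER", 10, "OG", 1.0, 2.0, 0.0),    # different chain: ignored
]
result = contacts_within(example_atoms, ("A", 50))
print(result)
```

Counting the residue types returned for every site in a dataset would give proportions of the kind summarized per amino acid type in a neighborhood plot.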


Figure 4 - Proportion of amino acid type in the 3D spatial neighborhood (within 2.0 to 3.5 Angstroms) of S-sulfenylated cysteines (left) and unmodified cysteines (right). The total number of S-sulfenylated environments was 373 and the total number of unmodified cysteine sites after selection was 426.


Linear Sequence surrounding cysteine sites.


In order to investigate the effect of primary amino acid sequence on cysteine oxidation, sequence analysis was performed on 21-mer amino acid sequences centred on S-sulfenylated sites (+/- 10 residues from the cysteine) and compared to the equivalent sequences for all unmodified cysteine sites. The frequency of each amino acid within the 20 flanking positions surrounding both the central Cys-SOH and the unmodified cysteine was determined. Sequence logos visualizing the most abundant residues for each environment were created using WebLogo (http://weblogo.berkeley.edu/logo.cgi) (Crooks, Hon, Chandonia, & Brenner, 2004).
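The frequency calculation behind a sequence logo can be sketched as follows. This is a minimal illustration, not WebLogo's implementation (WebLogo additionally scales letters by information content). The function name `position_frequencies` and the toy windows are hypothetical; the toy example uses +/- 2 residues instead of +/- 10 to stay readable.

```python
from collections import Counter

def position_frequencies(windows, flank=10):
    """Per-position residue frequencies for windows centred on a cysteine.

    windows : list of equal-length strings, each 2 * flank + 1 residues long
    Returns a dict mapping relative position (-flank .. +flank) to
    {residue: fraction of windows with that residue at that position}.
    """
    width = 2 * flank + 1
    for w in windows:
        if len(w) != width or w[flank] != "C":
            raise ValueError(f"expected a {width}-mer centred on C, got {w!r}")
    freqs = {}
    for i in range(width):
        counts = Counter(w[i] for w in windows)
        freqs[i - flank] = {aa: n / len(windows) for aa, n in counts.items()}
    return freqs

# Invented 5-mers (flank = 2), for illustration only.
toy_windows = ["AKCSD", "GKCSE", "AKCTD", "LRCSD"]
toy_freqs = position_frequencies(toy_windows, flank=2)
print(toy_freqs[1])   # residue frequencies at position +1
```

With real data, the same function would be called with the default `flank=10` on the 21-mers for the S-sulfenylated set and again on the unmodified set.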


Figure 5 - S-Sulfenylated Cysteines +/- 10 AAs

Figure 6 - Unmodified Cysteines +/- 10 AAs

To investigate position-specific differences between the amino acid sequences surrounding Cys-SOH and unmodified cysteine sites, the pLogo algorithm (https://plogo.uconn.edu/) (O’Shea et al., 2013) was employed. This computational tool identifies both over- and underrepresented residues at each position along a sequence.


Residues at each position within an input sequence alignment are scaled according to their statistical significance rather than their overall frequency, after comparison to a user-defined background dataset. The residue at each position is assigned a value based on the statistical significance of its frequency in the foreground dataset, given the probability of that residue in the background data (O’Shea et al., 2013). In pLogo, each value is represented visually by the height of the residue at that position, either above (overrepresented) or below (underrepresented) the x-axis. A significance threshold of p = 0.05 was used.
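To make the scaling concrete, here is a minimal sketch assuming the binomial log-odds formulation described by O’Shea et al. (2013): the height of a residue is the negative log10 ratio of the probability of seeing at least the observed count to the probability of seeing at most the observed count, given the background frequency. The function names and the example numbers are hypothetical, and the exact formula and multiple-testing correction used by the pLogo server should be checked against the paper.

```python
from math import comb, log10

def binom_pmf(i, n, p):
    """Probability of exactly i successes in n trials with success rate p."""
    return comb(n, i) * p**i * (1.0 - p) ** (n - i)

def log_odds_height(k, n, p):
    """Binomial log-odds of over- versus underrepresentation.

    k : foreground sequences with the residue at this position
    n : total foreground sequences
    p : background probability of the residue at this position (0 < p < 1)
    Positive values mean overrepresentation, negative underrepresentation.
    """
    at_least_k = sum(binom_pmf(i, n, p) for i in range(k, n + 1))
    at_most_k = sum(binom_pmf(i, n, p) for i in range(0, k + 1))
    return -log10(at_least_k / at_most_k)

# Invented example: a residue seen in 40 of 373 foreground windows
# when the background frequency at that position is 6%.
print(round(log_odds_height(40, 373, 0.06), 2))
```

A residue observed exactly as often as the background predicts gets a height near zero, which is why pLogo heights reflect significance rather than raw frequency.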


Figure 7 - Over- and underrepresented residues flanking a central S-sulfenylated cysteine when compared to unmodified cysteine residues.

Determining the Solvent Accessible Surface Area


The solvent accessible surface area (SASA) of Cys-199 was determined using AREAIMOL. AREAIMOL is a tool from the Collaborative Computational Project No. 4 (CCP4) software suite, version 7.0 (Winn et al., 2011). It finds the solvent accessible area of atoms in a PDB coordinate file by rolling a probe sphere over the van der Waals surface of each atom and calculating the area the probe can reach (Lee & Richards, 1971). The default probe radius of 1.4 Angstroms was used to determine SASA. The CCP4 Graphical User Interface (CCP4i) (Potterton et al., 2003) was used to perform AREAIMOL calculations.
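AREAIMOL's own algorithm is not reproduced here. As a rough illustration of what a probe-sphere SASA calculation does, the sketch below uses the Shrake-Rupley point-sampling approximation instead: test points are placed on each atom's expanded sphere (van der Waals radius plus probe radius), and a point counts as accessible if it does not fall inside any other atom's expanded sphere. The function names, atomic radii, and coordinates are assumptions made for illustration.

```python
import math

def sphere_points(n=200):
    """Roughly evenly spaced points on a unit sphere (golden-spiral layout)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        phi = golden_angle * i
        points.append((r * math.cos(phi), r * math.sin(phi), z))
    return points

def sasa_per_atom(atoms, probe=1.4, n_points=200):
    """Approximate SASA (square Angstroms) for each atom.

    atoms : list of (x, y, z, vdw_radius) tuples, all in Angstroms
    probe : probe sphere radius (1.4 Angstroms approximates water)
    """
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + big_r * ux, yi + big_r * uy, zi + big_r * uz
            buried = any(
                (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 < (rj + probe) ** 2
                for j, (xj, yj, zj, rj) in enumerate(atoms)
                if j != i
            )
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * big_r**2 * exposed / n_points)
    return areas

# Invented atoms: one isolated, then the same atom with a close neighbour.
isolated = sasa_per_atom([(0.0, 0.0, 0.0, 1.8)])
paired = sasa_per_atom([(0.0, 0.0, 0.0, 1.8), (2.0, 0.0, 0.0, 1.8)])
print(isolated[0], paired[0])
```

The isolated atom exposes its full expanded sphere, while the neighbour shields part of the surface in the paired case. Removing a bulky neighbouring side chain is expected to raise the SASA of a nearby atom for the same reason.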


SASA of cysteine-199 in reduced form (PDB ID: 1i69) = 21.5 square Angstroms

Relative Solvent Accessibility


Figure 8 - Quantile-quantile plot showing that the relative solvent accessibility (RSA) values of S-sulfenylated cysteines are skewed towards larger solvent accessibility (i.e. higher RSA) when compared to unmodified cysteines. This plot was constructed by sorting the RSA values of S-sulfenylated cysteines in our data set (n = 373) and of unmodified, reduced cysteines (n = 426) into percentiles and plotting them against each other. The diagonal represents the line on which all points would fall if the RSA values followed the same distribution at both types of site.

Relative solvent accessibility (RSA) was determined by dividing the SASA by the max accessible surface area (maxASA).


RSA = SASA/maxASA
where: RSA = relative Solvent Accessibility
SASA = Solvent Accessible Surface Area
maxASA = maximum Accessible Surface Area

The maxASA value used for cysteine was 167.0 square Angstroms (Tian et al., 2013). An RSA of 10% was used as the cut-off for classifying a residue as surface accessible (RSA > 10%) or buried (RSA ≤ 10%). To determine whether Cys-SOH environments were enriched for exposed or buried cysteines relative to unmodified sites, the RSA values of modified and unmodified cysteine residues were compared using a quantile-quantile probability plot (Q-Q plot): the RSA values of each group were sorted, and the values at each quantile (from 0.01 to 1.0) were plotted against each other.
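The construction of the Q-Q plot can be sketched as below. This is a minimal illustration that assumes linearly interpolated quantiles. The interpolation scheme, the function names, and the RSA values are assumptions, and which group goes on which axis is a free choice.

```python
def quantile(sorted_values, q):
    """Linearly interpolated quantile (0 <= q <= 1) of an ascending list."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac

def qq_pairs(sample_x, sample_y, n_quantiles=100):
    """Pair the quantiles 0.01, 0.02, ..., 1.00 of two samples.

    The samples may differ in size (e.g. 426 unmodified and 373 modified
    sites); each returned (x, y) tuple is one point on the Q-Q plot.
    """
    xs, ys = sorted(sample_x), sorted(sample_y)
    qs = [(i + 1) / n_quantiles for i in range(n_quantiles)]
    return [(quantile(xs, q), quantile(ys, q)) for q in qs]

# Invented RSA values, for illustration only.
unmodified_rsa = [0.00, 0.02, 0.05, 0.08, 0.10, 0.15, 0.30]
modified_rsa = [0.03, 0.09, 0.12, 0.20, 0.35, 0.50]
points = qq_pairs(unmodified_rsa, modified_rsa)

# Points with y > x lie above the diagonal: the second sample has the
# larger value at that quantile.
above = sum(1 for x, y in points if y > x)
print(f"{above} of {len(points)} quantile points lie above the diagonal")
```

If both samples followed the same distribution, the points would scatter around the diagonal. A consistent shift to one side indicates that one group tends towards higher RSA.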


Conclusions

The data above informed our decisions for improving the OxyR part. We wanted to make OxyR more sensitive to ROS, which we concluded meant making the reactive cysteine more susceptible to oxidation. From the data we inferred that solvent accessibility is an important factor in determining a cysteine's susceptibility to oxidation. This is illustrated in the quantile-quantile plot shown in Figure 8, in which the RSA values of modified and unmodified cysteine residues were sorted and the values for each percentile were plotted against each other. The points deviate clearly from the diagonal towards the S-sulfenylated cysteines, indicating that Cys-SOH residues are more often found in surface-accessible areas than the unmodified cysteine residues within the same proteins.


It was difficult to draw conclusions about the amino acid profile of the surrounding environment, and we were wary of making too many structural changes to the protein itself. We therefore decided to mutate residues only to others with similar properties, with the main objective of increasing the solvent accessible surface area of the reactive cysteine-199. However, Serine was found to be overrepresented at position +4, and we decided to incorporate it into our modified protein because the original sequence also contains a polar amino acid there (Glutamine).


Mutations made to OxyR:

1. Residue 203: Glutamine (wild type) mutated to Serine


Reasoning: Serine is overrepresented at position +4 in the amino acid sequences flanking the S-sulfenylated cysteines (Figure 7). In OxyR this position is occupied by a Glutamine residue. We reasoned that placing a Serine there may improve sensitivity to oxidation. The literature, together with our modelling, suggests that Serine, as an uncharged H-bond donor, likely plays an important role in the activation of Cysteine, mainly by lowering its pKa (Marino & Gladyshev, 2012). These two pieces of evidence led us to make this mutation.


2. Residue 149: Valine (wild type) mutated to Glycine


Reasoning: We replaced one hydrophobic residue with a smaller one in the hope of increasing the solvent accessibility of the cysteine’s thiol. Val-149 is in close contact with Cys-199, as described below.


Using PyMOL to visualize, mutate, and analyze the Cysteine environment of OxyR

The team used PyMOL to visualize and explore the environment surrounding the reactive cysteine of the OxyR protein. We obtained the 3D crystal structure of the reduced form of OxyR from E. coli from the PDB (PDB ID: http://www.rcsb.org/structure/1I69) and used it to analyze the 3D spatial environment and view the residues making contact with Cys-199. We observed all residues within a 5 Angstrom distance of Cys-199 to get an initial view of the site environment.


Command: select all within 4 of (chain A and resi 199)

Figure 9 - Environment surrounding the reactive cysteine-199 in the OxyR crystal structure (PDB ID: 1i69). Cysteine-199 is colored in blue and the residues chosen for mutagenesis are highlighted in magenta (Val-147 and Gln-203).

Figure 10 - Environment surrounding the reactive cysteine-199 of the OxyR protein crystal structure in reduced form (PDB ID: 1i69) after mutagenesis. New residues (Gly-147 and Ser-203) are colored in light grey.

Once the team had decided upon the mutations, we used PyMOL to visualize the protein and mutate the individual residues. The most stable rotamers were chosen, and the 3D file was then saved for later calculations of the solvent accessible surface area.


PyMOL provides a mutagenesis function. We used it to mutate the 3D structure according to the predictions inferred from our data, and saved the new PDB file for further analysis of solvent accessible surface area using PISA.


Before Mutagenesis RSA of cysteine-199:


RSA = SASA/maxASA
RSA = 21.5/167.0

Relative solvent accessibility (RSA) of cysteine-199 in reduced form (PDB ID: 1i69) ≈ 12.9%


After Mutagenesis RSA of cysteine-199:


SASA of cysteine-199 after mutagenesis = 37.0 square Angstroms
RSA = SASA/maxASA
RSA = 37.0/167.0

Relative solvent accessibility (RSA) of cysteine-199 after mutagenesis (model built from PDB ID: 1i69) = 22.2%
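The two RSA calculations above can be reproduced with a few lines of code. This is only a worked restatement of RSA = SASA/maxASA using the values reported on this page (21.5 and 37.0 square Angstroms, maxASA = 167.0, 10% cut-off); the function and constant names are hypothetical.

```python
MAX_ASA_CYS = 167.0      # maximum accessible surface area used for cysteine
SURFACE_CUTOFF = 0.10    # RSA above this is treated as surface accessible

def relative_solvent_accessibility(sasa, max_asa=MAX_ASA_CYS):
    """RSA = SASA / maxASA, returned as a fraction between 0 and 1."""
    return sasa / max_asa

def classify(rsa, cutoff=SURFACE_CUTOFF):
    """Apply the 10% cut-off to label a residue."""
    return "surface accessible" if rsa > cutoff else "buried"

sasa_before = 21.5   # Cys-199, reduced OxyR (PDB ID: 1i69)
sasa_after = 37.0    # Cys-199 after in silico mutagenesis

for label, sasa in (("before mutagenesis", sasa_before),
                    ("after mutagenesis", sasa_after)):
    rsa = relative_solvent_accessibility(sasa)
    print(f"{label}: RSA = {rsa:.1%} ({classify(rsa)})")

print(f"fold change in SASA: {sasa_after / sasa_before:.2f}")
```

Both values fall above the 10% cut-off, so Cys-199 is classified as surface accessible in each case. The mutations increase how exposed it is rather than changing its class.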


References:

Beedle, A., Lynham, S., & Garcia-Manyes, S. (2016). Protein S-sulfenylation is a fleeting molecular switch that regulates non-enzymatic oxidative folding. Nature Communications, 7(1), 12490.


Crooks, G., Hon, G., Chandonia, J., & Brenner, S. (2004). WebLogo: A sequence logo generator. Genome Research, 14(6), 1188-90.


Krissinel, E., & Henrick, K. (2007). Inference of Macromolecular Assemblies from Crystalline State. Journal of Molecular Biology, 372(3), 774-797.


Lee, B., & Richards, F. M. (1971). The interpretation of protein structures: Estimation of static accessibility. Journal of Molecular Biology, 55(3), 379-400.


Marino, S., & Gladyshev, V. (2012). Analysis and functional prediction of reactive cysteine residues. The Journal of Biological Chemistry, 287(7), 4419-25.


O'Shea, J. P., Chou, M. F., Quader, S. A., Ryan, J. K., Church, G. M., & Schwartz, D. (2013). pLogo: A probabilistic approach to visualizing sequence motifs. Nature Methods, 10(12), 1211-2.


Potterton, E., Briggs, P., Turkenburg, M., & Dodson, E. (2003). A graphical user interface to the CCP 4 program suite. Acta Crystallographica Section D, 59(7), 1131-1137.


Winn, M., Ballard, C., Cowtan, K., Dodson, E., Emsley, P., Evans, P., . . . Wilson, K. (2011). Overview of the CCP4 suite and current developments. Acta Crystallographica Section D, 67(4), 235-242.


Signal Peptides Design

As outlined on our project design page, we intended to create and characterize new biobricks involved in transporting proteins to the cell periphery. Our application of these parts would be to design surface-anchored proteins for use in live recombinant bacterial vaccines. Our chosen chassis, Lactobacillus plantarum, has shown promise as a useful species for the delivery of such vaccines. The registry now contains a few biobricks that have been characterized as excellent signal peptides for secretion or for cell wall or cell membrane display of recombinant proteins. Usp45 (BBa_K183002) is a commonly used biobrick. It has been noted that


The complete genome of L. plantarum WCFS1 has been determined, and Mathiesen et al. (2009) have performed a genome-wide study of signal peptide functionality for this species. Given that this lactic acid bacterium holds Generally Recognized as Safe (GRAS) status, it is a promising vector for the delivery of biomolecules. We aimed to use information from Mathiesen et al. (2009) to identify potentially high-performing signal peptides (SPs) for secretion. It is also important to note that signal peptide functionality has been reported to be very difficult to predict, so by characterizing a large catalogue of SPs the synthetic biology community can advance the field of recombinant bacterial delivery systems. A potential application is the development of live vaccine delivery vehicles.


We first identified four promising SPs from the study mentioned above and obtained the full protein sequences from the L. plantarum genome in the NCBI database. To define our signal peptides we used the bioinformatic tool SignalP version 5.0 (http://www.cbs.dtu.dk/services/SignalP/) (Almagro Armenteros et al., 2019). This tool predicted the cleavage site of the secretory signal peptide within each full-length sequence, and the resulting signal peptide amino acid sequence was used to create our biobricks through the codon optimization service from GenScript. Users of SignalP 5.0 should remember to select the correct organism group for their input sequences (Gram-positive for L. plantarum).


Each signal peptide sequence obtained using this tool was used to make a protein domain biobrick by adding a common RBS upstream.


SIGNAL PEPTIDE (LP_2578): part BBa_K3031003

Amino acid sequence = MRRKLVGYMLSMLTVILALFMLGSTAHAK
Nucleotide sequence (after codon optimization) = ATGCGAAGAAAGCTAGTTGGGTATATGCTAAGCATGTTGACGGTAATATTAGCGCTATTTATGCTGGGGAGTACTGCACATGCTAAA


SIGNAL PEPTIDE (LP_2145):


Amino acid sequence = MKKINKLMILGMLVFGVTGATMINPEMTTAAHASAN
Nucleotide sequence (after codon optimization) = ATGCGAAGAAAGCTAGTTGGGTATATGCTAAGCATGTTGACGGTAATATTAGCGCTATTTATGCTGGGGAGTACTGCACATGCTAAA


SIGNAL PEPTIDE (LP_0373): part BBa_K3031004

Amino acid sequence = MYTENTGKHHRNGLPVWLLPLLVVISFWGVSQNIMVVDASS
Nucleotide sequence (after codon optimization) = ATGTATACGGAAAACACGGGGAAACATCATCGTAATGGTCTGCCAGTTTGGTTGTTGCCATTACTGGTCGTCATCTCTTTCTGGGGTGTGTCACAAAATATCATGGTGGTTGATGCTTCATCA


SIGNAL PEPTIDE (LP_3050): part BBa_K3031006

Amino acid sequence = MKKFNFKTMLLLVLASCVFGVVVNVTTSLGPQTAITTAQASK
Nucleotide sequence (after codon optimization) = ATGAAAAAATTTAACTTTAAAACCATGTTGCTATTAGTTTTGGCTAGTTGTGTCTTCGGGGTCGTCGTTAACGTGACTACTAGTCTTGGACCACAAACCGCAATCACCGCCCAGGCCTCCAAG
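A simple way to check that a codon-optimized nucleotide sequence still encodes the intended signal peptide is to translate it back and compare. The sketch below is one possible check using the standard genetic code; the function names are hypothetical, and the example uses the LP_2578 entry listed above.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def translate(dna):
    """Translate a coding DNA sequence (reading frame 1) into one-letter amino acids."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))

def encodes(dna, peptide):
    """True if the DNA sequence translates exactly to the given peptide."""
    return translate(dna) == peptide

lp_2578_peptide = "MRRKLVGYMLSMLTVILALFMLGSTAHAK"
lp_2578_dna = (
    "ATGCGAAGAAAGCTAGTTGGGTATATGCTAAGCATGTTGACGGTAATATTAGCGCTATTT"
    "ATGCTGGGGAGTACTGCACATGCTAAA"
)
print(encodes(lp_2578_dna, lp_2578_peptide))
```

Running the same check on every entry in a catalogue is a quick way to catch copy-and-paste or truncation errors before sequences are sent for synthesis.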


These signal peptides are relevant to our project, and to future projects that may want to use Lactobacillus plantarum as a chassis, because they play an essential role in the secretion of recombinant proteins and in the transport of proteins to the cell periphery for cell wall anchoring. This system can be seen in our design below. The signal peptides listed here were used to make composite parts containing a cell wall anchoring protein from L. plantarum and an antigen from a virus that affects fish of the cyprinid family. Similar systems have been developed for the delivery of antigens targeting cancer and have shown promise in mouse studies. We hope our parts are useful to those seeking to use L. plantarum for similar ventures in the future.


References:

Almagro Armenteros, J., Tsirigos, K., Sønderby, C., Petersen, T., Winther, O., Brunak, S., . . . Nielsen, H. (2019). SignalP 5.0 improves signal peptide predictions using deep neural networks. Nature Biotechnology, 37(4), 420-423.


Mathiesen, G., Sveen, A., Brurberg, M., Fredriksen, L., Axelsson, L., & Eijsink, V. (2009). Genome-wide analysis of signal peptide functionality in Lactobacillus plantarum WCFS1. BMC Genomics, 10(1), 425.


IGEM   |   WANYUAN   |   SUIS

Website: wanyuan.suis.com.cn

Mail Box: suisigem@outlook.com

No.509, Pingji Rd, Minhang District, Shanghai

Copyright © 2019 Shanghai United International School. All Rights Reserved.