Team:Calgary/Model/DirectedProteinModification

MODELLING

Directed Protein Modification

Inspiration

What inspired our protein modification?

After successfully using the 6GIX water-soluble chlorophyll-binding protein to purify canola oil, we looked deeper into the industrial application of our solution. Industrializing the solution meant optimizing our system at every level, down to the smallest interactions within it, primarily those involving our protein. To optimize 6GIX we looked to make informed modifications that would have as large an impact as possible while maintaining the protein's affinity for chlorophyll. To accomplish this we leaned on molecular dynamics simulation, which was essential to our understanding of the 6GIX protein.

Methodology

Steps taken to generate this model

When developing ModGIX we went back to the molecular dynamics models generated for 6GIX to identify areas where modifications may have an impact. This was conducted in six key data collection and categorization stages.

Step 1. Molecular Dynamics Simulation. To develop a starting point for our model we ran a one-nanosecond dynamics simulation of a single 6GIX monomer. This simulation was conducted with the same methodology as the other molecular dynamics models detailed in our In Silico Emulsion Verification models.

Step 2. Characterize the Protein's Dynamics. From this simulation we used Root Mean Square Fluctuation (RMSF) curves to characterize the dynamics of the protein's individual amino acids. This resulted in one curve per amino acid that quantifies that amino acid's dynamics over time.
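As an illustration of Step 2, the following is a minimal sketch of how per-residue RMSF values, and RMSF curves over time, could be computed from a trajectory. It is not the team's pipeline: the function names, the use of one coordinate per residue, and the non-overlapping time windows are all assumptions made for the example, and the frames are assumed to be already superimposed on a reference structure.

```python
import math

def rmsf_per_residue(trajectory):
    """RMSF of each residue over the given frames.

    trajectory: list of frames; each frame is a list of (x, y, z) tuples,
    one per residue. Frames are assumed to be pre-aligned (assumption).
    """
    n_frames = len(trajectory)
    n_res = len(trajectory[0])
    result = []
    for i in range(n_res):
        # Time-averaged position of residue i.
        mean = [sum(frame[i][d] for frame in trajectory) / n_frames
                for d in range(3)]
        # Mean squared deviation from that average position.
        msd = sum(
            sum((frame[i][d] - mean[d]) ** 2 for d in range(3))
            for frame in trajectory
        ) / n_frames
        result.append(math.sqrt(msd))
    return result

def rmsf_curves(trajectory, window):
    """One RMSF value per residue per non-overlapping window of frames,
    giving each residue a curve over time (hypothetical windowing scheme)."""
    n_res = len(trajectory[0])
    curves = [[] for _ in range(n_res)]
    for start in range(0, len(trajectory) - window + 1, window):
        values = rmsf_per_residue(trajectory[start:start + window])
        for i, value in enumerate(values):
            curves[i].append(value)
    return curves

# Toy trajectory: residue 0 is static, residue 1 oscillates along x.
toy = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (-2.0, 0.0, 0.0)],
]
curves = rmsf_curves(toy, window=2)
```

In practice a trajectory analysis library would normally handle the alignment and coordinate extraction; the sketch only shows the arithmetic behind the curves.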

Step 3. Perform Functional Principal Component Analysis on the RMSF Data. After generating the dynamics data, functional principal component analysis (fPCA) was conducted on it. This provided a series of principal components able to represent the data in a finite-dimensional space. It also generated principal component scores for each amino acid, together with the proportion of total variance explained (PVE) by each component.
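To make Step 3 concrete, here is a simplified sketch of PCA applied to curves sampled on a shared time grid. It is a discretized stand-in for fPCA, which would normally smooth the curves with a basis expansion first. The function names, the power-iteration eigen-solver, and the toy data are assumptions for illustration and are not taken from the team's analysis.

```python
import math
import random

def center_curves(curves):
    """Subtract the mean curve from every curve (shared time grid assumed)."""
    n, t = len(curves), len(curves[0])
    mean = [sum(c[j] for c in curves) / n for j in range(t)]
    return [[c[j] - mean[j] for j in range(t)] for c in curves]

def covariance(centered):
    """Sample covariance between time points."""
    n, t = len(centered), len(centered[0])
    return [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(t)]
            for a in range(t)]

def top_eigenpair(mat, iters=500, seed=0):
    """Largest eigenvalue and eigenvector of a symmetric PSD matrix
    by power iteration."""
    rng = random.Random(seed)
    t = len(mat)
    v = [rng.random() + 0.1 for _ in range(t)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(mat[a][b] * v[b] for b in range(t)) for a in range(t)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # nothing left to explain
            return 0.0, v
        v = [x / norm for x in w]
    lam = sum(v[a] * sum(mat[a][b] * v[b] for b in range(t)) for a in range(t))
    return lam, v

def discretized_fpca(curves, n_components=2):
    """Return (components, scores, pve) for curves on a common grid."""
    centered = center_curves(curves)
    cov = covariance(centered)
    t = len(cov)
    total_var = sum(cov[a][a] for a in range(t))
    components, pve = [], []
    for _ in range(n_components):
        lam, v = top_eigenpair(cov)
        components.append(v)
        pve.append(lam / total_var if total_var > 0 else 0.0)
        # Deflate: remove the variance captured by this component.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(t)]
               for a in range(t)]
    # Score of each curve on each component (its projection).
    scores = [[sum(r[j] * v[j] for j in range(t)) for v in components]
              for r in centered]
    return components, scores, pve

# Toy curves that are all multiples of one shape, so a single
# component should explain essentially all of the variance.
toy_curves = [[a * 1.0, a * 2.0, a * 3.0] for a in (1, 2, 3, 4)]
components, scores, pve = discretized_fpca(toy_curves)
```

Each row of `scores` is what Step 4 would cluster; `pve` reports how much of the total variance each component captures.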

Step 4. Use Clustering Algorithms on the Principal Component Scores. We clustered the newly generated principal component scores by fitting a Gaussian mixture model with the Expectation-Maximization algorithm. This produced tight, representative clusters of amino acids. Clustering resulted in four distinct clusters, each defined by the proportion it contributes to the overall variance from the crystalline structure.
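The clustering in Step 4 can be sketched as an Expectation-Maximization fit of a Gaussian mixture. For brevity the example below works on a single score per amino acid (one dimension), whereas the real analysis may have used several components. The initialization scheme, variance floor, and toy data are assumptions, not details from the source.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def e_step(data, weights, means, variances):
    """Responsibility of each mixture component for each data point."""
    k = len(weights)
    resp = []
    for x in data:
        dens = [weights[j] * gaussian_pdf(x, means[j], variances[j])
                for j in range(k)]
        total = sum(dens)
        if total == 0.0:  # all densities underflowed; fall back to uniform
            resp.append([1.0 / k] * k)
        else:
            resp.append([d / total for d in dens])
    return resp

def fit_gmm_1d(data, k, iters=200, min_var=1e-6):
    """Fit a k-component 1D Gaussian mixture with EM.

    Returns (weights, means, variances, labels)."""
    n = len(data)
    ordered = sorted(data)
    # Deterministic start: means spread over the sorted data (assumption).
    means = [ordered[int((i + 0.5) * n / k)] for i in range(k)]
    overall_mean = sum(data) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in data) / n, min_var)
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        resp = e_step(data, weights, means, variances)
        # M-step: re-estimate each component from its weighted points.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # empty component, leave its parameters alone
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj,
                min_var,
            )
    resp = e_step(data, weights, means, variances)
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, means, variances, labels

# Toy scores forming two well-separated groups.
toy_scores = [0.0, 0.1, -0.1, 0.05, 10.0, 10.1, 9.9, 10.05]
weights, means, variances, labels = fit_gmm_1d(toy_scores, k=2)
```

With real data the number of components would be chosen by a model selection criterion or by inspection; the source reports four clusters.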

Step 5. Use Hotspot Wizard to Avoid Inhibiting Binding. After identifying the amino acids that contribute the most to structural variance, the team utilized Hotspot Wizard. Hotspot Wizard is a free online tool that identifies key amino acids responsible for structural and binding functions. Through the use of this tool we ensured that any further modifications would not cause a loss in the form or function of 6GIX.

Step 6. Use a Genetic Algorithm to Optimize the Amino Acid Sequence. Once the problematic amino acids had been identified and cleared by Hotspot Wizard, it was time to make modifications. The team developed and used iGAM, an R-based genetic algorithm that optimizes portions of an amino acid sequence in the context of the entire sequence. This software package is available here. After a five-hundred-generation run of the iGAM algorithm the team was greeted by the final ModGIX sequence.
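The general idea of Step 6, optimizing selected positions while scoring the whole sequence, can be sketched with a small genetic algorithm. This is not iGAM, which is written in R and whose internals are not described here. The function name, the operators, the parameters, and especially the toy fitness function are hypothetical placeholders.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def evolve(wild_type, mutable_positions, fitness,
           generations=100, pop_size=30, mutation_rate=0.2, seed=0):
    """Evolve only the mutable positions; all others stay wild type.

    fitness: callable scoring a full sequence (higher is better).
    Returns (best_sequence, best_fitness_per_generation)."""
    rng = random.Random(seed)

    def mutate(seq):
        chars = list(seq)
        for pos in mutable_positions:
            if rng.random() < mutation_rate:
                chars[pos] = rng.choice(AMINO_ACIDS)
        return "".join(chars)

    def crossover(a, b):
        # Uniform crossover restricted to the mutable positions.
        chars = list(a)
        for pos in mutable_positions:
            if rng.random() < 0.5:
                chars[pos] = b[pos]
        return "".join(chars)

    # Start from the wild type plus mutated copies of it.
    population = [wild_type] + [mutate(wild_type) for _ in range(pop_size - 1)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents)))
            for _ in range(pop_size - 1)
        ]
        # Elitism: the best sequence always survives unchanged.
        population = [ranked[0]] + children
    best = max(population, key=fitness)
    return best, history

# Toy example: a made-up sequence, made-up positions, and a placeholder
# objective that simply rewards tryptophan at the mutable positions.
wild = "MKTAYIAKQR"
mutable = [2, 5, 8]
toy_fitness = lambda seq: sum(seq[p] == "W" for p in mutable)
best, history = evolve(wild, mutable, toy_fitness)
```

Because the best sequence is carried over each generation, `history` never decreases, which is the kind of progression curve a run of such an algorithm produces. A real objective would score the whole sequence, with protected hotspot positions simply left out of `mutable_positions`.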

Results

Deliverables Generated

The RMSF curves from the nanosecond simulation were generated and functional principal component analysis was conducted resulting in the principal components and their scores. Clustering was then conducted on these results to generate the following clusters:




After generating the list of amino acids for modification, the iGAM algorithm was used to determine suitable replacements. After 101 generations the iGAM algorithm identified ideal replacements. The progression of the algorithm towards its final maximum is seen below.



Ultimately this resulted in the following sequence being generated.


The amino acids in red represent modifications introduced by the algorithm at positions outside the hotspots identified by Hotspot Wizard. The blue amino acids lie within hotspots; for these two the algorithm retained the amino acid present in the wild type, and therefore does not interfere with the protein's hotspots. The green amino acid is located in a hotspot, but it was replaced by an amino acid that is already commonly observed as a mutation at that position. We have therefore generated a sequence for a modified 6GIX (ModGIX) that does not show direct inhibition of its chlorophyll binding. However, to justify the high costs associated with purifying proteins in the lab, we had to generate a proof of concept to earn the confidence of the team.
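The colour coding described above amounts to a comparison between the wild-type and modified sequences against the set of hotspot positions. A minimal sketch of that bookkeeping follows. The function name, the extra "conflict" label, and all example sequences and positions are hypothetical; the table of commonly observed substitutions stands in for whatever Hotspot Wizard reported.

```python
def classify_positions(wild_type, modified, hotspots, tolerated):
    """Label positions using the red/blue/green scheme.

    hotspots:  set of 0-based positions flagged as hotspots.
    tolerated: dict mapping a hotspot position to the residues already
               commonly observed there (hypothetical input).
    Returns a dict {position: label}; unchanged non-hotspot positions
    are omitted."""
    if len(wild_type) != len(modified):
        raise ValueError("sequences must have the same length")
    labels = {}
    for i, (wt, new) in enumerate(zip(wild_type, modified)):
        if i in hotspots:
            if wt == new:
                labels[i] = "blue"      # hotspot, wild-type residue kept
            elif new in tolerated.get(i, ""):
                labels[i] = "green"     # hotspot, commonly seen substitution
            else:
                labels[i] = "conflict"  # hotspot changed to an unusual residue
        elif wt != new:
            labels[i] = "red"           # modification outside any hotspot
    return labels

# Made-up sequences and hotspot data, for illustration only.
example = classify_positions(
    wild_type="MKTAYI",
    modified="MRTAFI",
    hotspots={3, 4},
    tolerated={4: "FW"},
)
```

A sequence passes this check when no position is labelled "conflict", which matches the reasoning that ModGIX should not directly inhibit chlorophyll binding.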

To gain this confidence, mutagenesis was performed on the PDB structure of 6GIX to generate a usable structural starting point for a dynamics simulation. After a six-nanosecond simulation, an estimate of ModGIX's structure was complete. The resulting structure was then aligned with the original 6GIX structure to identify any catastrophic differences between the two.




From this we observed that after simulation the ModGIX protein maintained a tetramer structure and also maintained the chlorophyll binding pocket at its core. With the Hotspot Wizard and dynamics verification complete, ModGIX was sent to the wet lab for their experiments. It is currently in the process of being cloned into E. coli.


ModGIX can be found in the registry under part BBa_K3114022.