Difference between revisions of "Team:Calgary/Model/DirectedProteinModification"

 

Latest revision as of 04:33, 14 December 2019

Modelling

Directed Protein Modification

Inspiration

What inspired our protein modification?

After successfully using the 6GIX water-soluble chlorophyll-binding protein to purify canola oil, we looked deeper into the industrial application of our solution. Industrializing it meant optimizing our system at every level, down to its smallest interactions: those between our protein, its environment, and chlorophyll. To optimize 6GIX, we set out to make informed modifications that increase its stability while maintaining its affinity for chlorophyll. To accomplish this, we used molecular dynamics simulations to create a modified 6GIX protein called ModGIX.

Methodology

Steps taken to generate this model

When developing ModGIX, we went back to the molecular dynamics models generated for 6GIX to identify areas where modifications might have an impact. We employed a six-step process to collect this data.

Step 1: Molecular Dynamics Simulation.

To establish a starting point for our model, we ran a one-nanosecond molecular dynamics simulation of a single 6GIX monomer. This simulation followed the same methodology as the other molecular dynamics models detailed on our In Silico Emulsion Verification page (https://2019.igem.org/Team:Calgary/Model/InSilicoEmulsionSystemVerification).

Step 2: Characterize the Protein's Dynamics.

From this simulation, we used root mean square fluctuation (RMSF) curves to characterize the dynamics of the protein's individual amino acids. This produced one curve per amino acid, quantifying that amino acid's dynamics over time.
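As a rough illustration of what an RMSF curve measures, the sketch below computes the fluctuation of each residue about its mean position, first over a whole trajectory and then over consecutive windows of frames so that each residue gets a short curve. It is a minimal Python sketch, not the team's pipeline: the function names, the windowing scheme, and the toy coordinates are all assumptions made for illustration, and a real trajectory would first be fitted to a reference structure.

```python
import math

def rmsf_per_residue(frames):
    """frames[t][i] is the (x, y, z) position of residue i in frame t.
    Returns one RMSF value per residue (hypothetical helper)."""
    n_frames = len(frames)
    n_residues = len(frames[0])
    result = []
    for i in range(n_residues):
        # Time-averaged position of residue i
        mean = [sum(frame[i][k] for frame in frames) / n_frames for k in range(3)]
        # Mean squared deviation from that average position
        msd = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
            for frame in frames
        ) / n_frames
        result.append(math.sqrt(msd))
    return result

def rmsf_curves(frames, window):
    """One RMSF value per residue for each consecutive, non-overlapping
    window of frames (an assumed way to obtain a curve over time)."""
    curves = [[] for _ in frames[0]]
    for start in range(0, len(frames) - window + 1, window):
        values = rmsf_per_residue(frames[start:start + window])
        for i, value in enumerate(values):
            curves[i].append(value)
    return curves

# Toy trajectory: residue 0 never moves, residue 1 moves along x.
toy_frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
]
toy_curves = rmsf_curves(toy_frames, window=2)
```

In this toy case the stationary residue has a flat curve at zero, while the moving residue's curve rises as its displacements grow.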

Step 3: Perform Functional Principal Component Analysis on the RMSF data.

After generating the dynamics data, we conducted functional principal component analysis (fPCA) on it. This provided a series of principal components able to represent the data in a finite-dimensional space. It also generated principal component scores, which represent the proportion of total variance explained (PVE) for the parameter.
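To make the terms concrete, here is a minimal Python sketch of the idea behind this step, under a strong simplifying assumption: when every curve is sampled on the same grid, fPCA reduces to ordinary PCA on the sampled values. The sketch extracts only the first component by power iteration. In ordinary PCA the scores are the projections of each centered curve onto a component, and the PVE is that component's variance divided by the total variance. The function name and toy curves are hypothetical, and the team's actual analysis used an R package rather than this code.

```python
import math

def first_principal_component(curves, iterations=200):
    """curves[i][j] is curve i sampled at grid point j.
    Returns (component, scores, pve) for the leading component."""
    n = len(curves)
    m = len(curves[0])
    mean = [sum(c[j] for c in curves) / n for j in range(m)]
    centered = [[c[j] - mean[j] for j in range(m)] for c in curves]
    # Sample covariance between grid points
    cov = [
        [sum(row[a] * row[b] for row in centered) / (n - 1) for b in range(m)]
        for a in range(m)
    ]
    total_variance = sum(cov[j][j] for j in range(m))
    # Power iteration: repeatedly apply the covariance and renormalize
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[a] * sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)
    )
    # Score of each curve = projection of its centered values onto v
    scores = [sum(row[j] * v[j] for j in range(m)) for row in centered]
    return v, scores, eigenvalue / total_variance

# Toy curves that differ only by a scale factor, so a single
# component explains all of the variance.
toy_curves = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],
    [3.0, 6.0, 9.0],
    [4.0, 8.0, 12.0],
]
component, scores, pve = first_principal_component(toy_curves)
```

Power iteration is used here only to keep the example dependency-free; a full decomposition would return every component and its PVE.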

Step 4: Use Clustering Algorithms on the Principal Component Scores.

On the newly generated principal component scores, we performed clustering using an expectation-maximization (EM) algorithm to fit the parameters of a Gaussian mixture model. This ensured tight, representative clusters of amino acids. Clustering resulted in four distinct clusters, each defined by the proportion it contributes to the overall variance from the crystal structure.
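The sketch below shows how EM fits a Gaussian mixture, assuming for simplicity that each amino acid is described by a single score (one dimension) and using two components on toy data rather than the four clusters reported here. The function name, the initialization, and the variance floor are illustrative assumptions, not details taken from the team's analysis.

```python
import math

def fit_gmm_1d(values, k, iterations=100):
    """Fit a k-component 1-D Gaussian mixture with EM.
    Returns (means, variances, weights, labels)."""
    lo, hi = min(values), max(values)
    # Assumed initialization: means spread evenly over the data range
    means = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    variances = [max(((hi - lo) / (2 * k)) ** 2, 1e-6)] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each value
        resp = []
        for x in values:
            dens = [
                weights[j]
                * math.exp(-((x - means[j]) ** 2) / (2 * variances[j]))
                / math.sqrt(2 * math.pi * variances[j])
                for j in range(k)
            ]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / len(values)
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, values))
            variances[j] = max(spread / nj, 1e-6)  # floor avoids collapse
    # Hard assignment: most responsible component for each value
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return means, variances, weights, labels

# Toy scores forming two well-separated groups
toy_scores = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
means, variances, weights, labels = fit_gmm_1d(toy_scores, k=2)
```

Each value ends up labelled with the component most responsible for it, which is how mixture-model output is usually turned into clusters.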

Step 5: Use Hotspot Wizard to Avoid Inhibiting Binding.

After identifying the amino acids that contribute the most to structural variance, the team used Hotspot Wizard to identify key amino acids responsible for structural and binding functions. Using this tool, we ensured that further modifications would not cause a loss of form or function in 6GIX.

Step 6: Use a Genetic Algorithm to Optimize the Amino Acid Sequence.

Once the problematic amino acids had been identified and cleared by Hotspot Wizard, we modified the amino acid sequence of 6GIX. The team developed and used iGAM, an R-based genetic algorithm that optimizes portions of an amino acid sequence in the context of the entire sequence. This software package is available at https://2019.igem.org/Team:Calgary/iGAM. After a five-hundred-generation run of the iGAM algorithm, our team arrived at our final ModGIX sequence.
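The general shape of such a genetic algorithm can be sketched as follows. This is not iGAM itself, which is written in R and whose fitness function is not described on this page. It is a minimal Python illustration in which only chosen positions may change, fitness is scored on the whole sequence, and the best individual is carried into the next generation unchanged (elitism). The fitness function, the starting sequence, the mutable positions, and all parameter values are placeholders invented for the example.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def toy_fitness(sequence):
    """Placeholder fitness (NOT iGAM's): counts hydrophobic residues
    over the entire sequence."""
    return sum(1 for aa in sequence if aa in "AILMV")

def evolve(sequence, mutable_positions, generations=50,
           population_size=20, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)

    def mutate(seq):
        chars = list(seq)
        for pos in mutable_positions:  # only selected positions change
            if rng.random() < mutation_rate:
                chars[pos] = rng.choice(AMINO_ACIDS)
        return "".join(chars)

    def crossover(a, b):
        chars = list(a)
        for pos in mutable_positions:
            if rng.random() < 0.5:
                chars[pos] = b[pos]
        return "".join(chars)

    population = [sequence] + [mutate(sequence) for _ in range(population_size - 1)]
    history = []
    for _ in range(generations):
        population.sort(key=toy_fitness, reverse=True)
        history.append(toy_fitness(population[0]))
        parents = population[: population_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents)))
            for _ in range(population_size - 1)
        ]
        # Elitism: the current best survives unchanged
        population = [population[0]] + children
    return max(population, key=toy_fitness), history

start = "MKTAYIAKQR"   # hypothetical sequence, not 6GIX
mutable = [2, 5, 8]    # hypothetical positions open to change
best, history = evolve(start, mutable)
```

Because the best individual always survives, the maximum fitness recorded per generation can never decrease, and every position outside `mutable` is left exactly as it was in the starting sequence.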


Assumptions
For the generation of ModGIX, several assumptions were made and validated.
Assumption 1. To use principal component analysis, we assume correlation between the parameters of the data. We consider this assumption valid because the protein is a continuous strand, which ensures correlation among its amino acids.

Assumption 2. For the clustering algorithm, we assumed normality and an even number of clusters to ensure an appropriate fit. We validated this assumption by fixing the number of clusters at 4; the clusters obtained from the data were distinct enough to dispel doubt about these assumptions.

Assumption 3. For data collection, we used data from a one-nanosecond simulation and assume that it is a representative sample of the protein's dynamics. We made this assumption because longer simulations would require additional computational load, and the increased simulation time would create a roadblock for other teams attempting to replicate our modelling schema in their own projects.

Results

Deliverables Generated

We generated the RMSF curves from the one-nanosecond simulation and conducted functional principal component analysis on them, yielding the principal components and their scores. Clustering was then conducted on these results to generate the following clusters:


Figure 1. Root mean square fluctuation characterization of our 6GIX monomer. More on the generation of this figure can be found on our Measurement page (https://2019.igem.org/Team:Calgary/Measurement).

After generating the list of amino acids for modification, we used the iGAM algorithm to determine suitable replacements. After 101 generations, the algorithm identified ideal replacements. Its progression towards the final maximum is shown below.

Figure 2. Monotonic increase of max fitness value per generation.

Ultimately, this produced the following sequence.

Figure 3. ModGIX sequence

The amino acids in red represent modifications introduced by the algorithm that were not in the hotspots identified by Hotspot Wizard. The blue amino acids indicate hotspots that were not changed. The green amino acid was also located in a hotspot but was replaced by an amino acid commonly substituted in scientific applications of this protein. With these changes in place, our sequence for a modified 6GIX (ModGIX) was complete. ModGIX does not show direct inhibition of its chlorophyll binding. However, to justify the high cost of purifying proteins in the lab, we had to generate a proof of concept to earn the confidence of the team.

To gain this confidence, mutagenesis was performed on the PDB file of 6GIX to generate a usable structural starting point for a dynamics simulation. After a six-nanosecond simulation, we had an estimate of ModGIX's structure. We then aligned the resulting structure with the original 6GIX structure to identify any catastrophic differences between the two.


Figure 4. ModGIX aligned by proline to 6GIX.

From this, we observed that after simulation the ModGIX protein maintained a tetramer structure and also maintained the chlorophyll-binding pocket at its core. With the Hotspot Wizard and dynamics verification complete, ModGIX was sent to the wet lab for experiments. It is currently being cloned into E. coli. We hope to purify this protein soon and compare its chlorophyll-binding efficiency with that of 6GIX.


ModGIX can be found in the registry under part BBa_K3114007 (http://parts.igem.org/Part:BBa_K3114007).

Future Directions

The key next step for ModGIX and its development strategy is the successful purification and characterization of the ModGIX protein. Once characterization is complete, we will re-evaluate the modification strategy to address any potential weaknesses. Beyond refining the strategy, applying a ModGIX-style approach to the other proteins in our project will be integral to developing it into a potent solution for protein engineering.

References

1. Palm, D. M., Agostini, A., Averesch, V., Girr, P., Werwie, M., Takahashi, S., . . . Paulsen, H. (2018). Chlorophyll a/b binding-specificity in water-soluble chlorophyll protein. Nature Plants, 4(11), 920-929.
2. Abraham M.J., van der Spoel D., Lindahl E., Hess B., and the GROMACS development team (2018). GROMACS User Manual version, www.gromacs.org(2018)
3. Tran, N. M. (2008). AN INTRODUCTION TO THEORETICAL PROPERTIES OF FUNCTIONAL PRINCIPAL COMPONENT ANALYSIS. Department of Mathematics and Statistics, The University of Melbourne.
4. Lemkul J.A. (2018). "From Proteins to Perturbed Hamiltonians: A Suite of Tutorials for the GROMACS-2018 Molecular Simulation Package, v1.0" Living J. Comp. Mol. Sci. In Press.
5. Osorio, D., Rondon-Villarreal, P. & Torres, R. (2015). Peptides: A package for data mining of antimicrobial peptides. The R Journal, 7(1), 4-14.
6. Xiongtao D, Pantelis Z. Hadjipantelis, Kynghee H & Hao J (2019). Fdapace: Functional Data Analysis and Empirical Dynamics. R package version 0.4.1. https://CRAN.R-project.org/package=fdapace
7. Dempster, A., Laird, N., & Rubin, D. (1977). Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society. Serie
8. Páll, S., Abraham, M. J., Kutzner, C., Hess, B., Lindahl, E. (2015). Tackling exascale software challenges in molecular dynamics simulations with GROMACS. In: Solving Software Challenges for Exascale. Markidis, S., Laure, E. eds. Vol. 8759. Springer International Publishing Switzerland London 3–27.
9. R Core Team (2019). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
10. Sumbalova, L., Stourac, J., Martinek, T., Bednar, D., Damborsky, J., (2018). HotSpot Wizard 3.0: Web Server for Automated Design of Mutations and Smart Libraries based on Sequence Input Information. Nucleic Acids Research 46 (W1): W356-W362.
11. https://github.com/iGEMCalgary/iGAM