Difference between revisions of "Team:TU Darmstadt/Model"

 
(39 intermediate revisions by 3 users not shown)
Line 34: Line 34:
 
                     <div class="row">
 
                     <div class="row">
 
                         <div class="col mx-2">
 
                         <div class="col mx-2">
                             <p> In synthetic biology, theoretical models are often used to gain insights, predict and
+
                             <p> In synthetic biology, theoretical models are often used to gain insights, to predict and
                                 improve
+
                                 to improve
 
                                 experiments. In our project we are modifying Virus-like particles (VLPs) by attaching
 
                                 experiments. In our project we are modifying Virus-like particles (VLPs) by attaching
 
                                 proteins to the
 
                                 proteins to the
                                 surface of the P22 capsid
+
                                 surface of the <a href="https://2019.igem.org/Team:TU_Darmstadt/Project/P22_VLP"
                                <!-- Link to the Background or Project overview  --> through a linker. The linking is
+
                                    target="_blank">P22</a> capsid through a linker. The linking is catalyzed using
                                catalyzed using
+
                                 the enzyme <a href="https://2019.igem.org/Team:TU_Darmstadt/Project/Sortase"
                                 the enzyme Sortase A7M, which is a calcium independent mutant of the wild type Sortase A
+
                                    target="_blank">Sortase&nbsp;A7M</a>, which is a calcium-independent mutant of the
                                 <!-- Link to the Sortase Background --> from <i>Staphylococcus aureus</i>. We performed
+
                                wild type
 +
                                 Sortase&nbsp;A from <i>Staphylococcus aureus</i>. We performed
 
                                 modeling to predict the unknown structure of the
 
                                 modeling to predict the unknown structure of the
                                 Sortase A7M, to improve the linker between proteins and therefore optimizing the
+
                                 Sortase&nbsp;A7M to improve the linking between proteins and therefore to optimize the
 
                                 modification
 
                                 modification
 
                                 efficiency of our platform. <br>
 
                                 efficiency of our platform. <br>
                                 Two different modeling approaches were used to determine the structure of Sortase A7M.
+
                                 Two different modeling approaches were used to determine the structure of the
 +
                                Sortase&nbsp;A7M.
 
                                 We compared
 
                                 We compared
                                 machine learning approaches to traditional comparative, Monte-Carlo based modeling
+
                                 machine learning approaches to traditional, comparative Monte-Carlo based modeling
 
                                 methods. The
 
                                 methods. The
                                 results were evaluated using an energy-scoring function and molecular dynamics (MD)
+
                                 results were evaluated using an energy-scoring function and molecular&nbsp;dynamics (MD)
 
                                 simulations. The
 
                                 simulations. The
                                 most promising Sortase A7M structures were used to perform a docking simulation to
+
                                 most promising Sortase&nbsp;A7M structures were used to perform a
                                 screen for
+
                                docking&nbsp;simulation to
                                optimal linkers.
+
                                 investigate binding with linkers.
 
                             </p>
 
                             </p>
 
                         </div>
 
                         </div>
Line 87: Line 89:
  
 
                                     <p>
 
                                     <p>
                                         <i>In silico</i> modeling and simulation of proteins requires a 3D structure,
+
                                         <i>In&nbsp;silico</i> modeling and simulation of proteins requires a
 +
                                        3D&nbsp;structure,
 
                                         which can be
 
                                         which can be
 
                                         obtained from the <a href="https://www.rcsb.org/" target="_blank">RCSB Protein
 
                                         obtained from the <a href="https://www.rcsb.org/" target="_blank">RCSB Protein
 
                                             Data
 
                                             Data
                                             Bank</a>. However, if no 3D structures are annotated, as it is the case with
+
                                             Bank</a>. However, if no 3D structures are annotated, as is the case with
                                         sortase
+
                                         Sortase&nbsp;A7M, the structure has to be determined by other means. The
                                        A7M, the structure has to be determined by other means. The structure prediction
+
                                        structure prediction
                                         of sortase A7M was done using two different approaches.
+
                                         of Sortase&nbsp;A7M was done using two different approaches.
 
                                     </p>
 
                                     </p>
  
Line 103: Line 106:
 
                                         datasets. This is commonly
 
                                         datasets. This is commonly
 
                                         done by
 
                                         done by
                                         presenting the algorithm with training data as well as a scoring function to
+
                                         presenting the algorithm with training data, as well as a scoring function to
 
                                         measure its
 
                                         measure its
 
                                         success at processing the
 
                                         success at processing the
 
                                         input data. During training a feedback loop is used to allow the algorithm to
 
                                         input data. During training a feedback loop is used to allow the algorithm to
 
                                         automatically
 
                                         automatically
                                         find a function to fit
+
                                         find a function that fits
 
                                         the data. In contrast, classical
 
                                         the data. In contrast, classical
 
                                         algorithms are often
 
                                         algorithms are often
Line 119: Line 122:
 
                                         weights, which are adjusted during its training. Nodes in neural networks are
 
                                         weights, which are adjusted during its training. Nodes in neural networks are
 
                                         linked
 
                                         linked
                                         together: One neuron processes
+
                                         together: one neuron processes
                                         the inputs of other neurons, loosely mimicking the structure of biological
+
                                         the input of other neurons, loosely mimicking the structure of biological
 
                                         brains. While one
 
                                         brains. While one
 
                                         usually has a fixed
 
                                         usually has a fixed
                                         amount of input and output neurons limited by the data one wishes to classify,
+
                                         amount of input and output neurons, limited by the data one wishes to classify,
 
                                         adding layers of hidden neurons can improve the classification.
 
                                         adding layers of hidden neurons can improve the classification.
 
                                         This is often referred to as
 
                                         This is often referred to as
Line 130: Line 133:
 
                                     </p>
 
                                     </p>
  
                                     <p>Using Machine Learning to predict protein structures has many advantages compared
+
                                     <p>Using Machine Learning to predict protein structures has many advantages,
 +
                                        compared
 
                                         to
 
                                         to
                                         conventional methods especially
+
                                         conventional methods, especially
 
                                         for iGEM teams who often only have limited access to resources. After training a
 
                                         for iGEM teams who often only have limited access to resources. After training a
 
                                         neural
 
                                         neural
 
                                         network, which is a
 
                                         network, which is a
                                         computationally expensive process and often done in centralized data centers, it
+
                                         computationally expensive process and is often done in centralized data centers,
 +
                                        it
 
                                         can be used
 
                                         can be used
 
                                         to predict the
 
                                         to predict the
Line 162: Line 167:
 
                                         AlQuraishi demonstrated a complete deep learning approach that is able to make
 
                                         AlQuraishi demonstrated a complete deep learning approach that is able to make
 
                                         predictions
 
                                         predictions
                                         within 1-2 Å of other
+
                                         within 1-2&nbsp;Å of other
                                         approaches
+
                                         approaches,
 
                                         <sup id="cite_ref-1" class="reference">
 
                                         <sup id="cite_ref-1" class="reference">
 
                                             <a href="#cite_note-1">[2] </a>
 
                                             <a href="#cite_note-1">[2] </a>
                                         </sup>
+
                                         </sup> while only using a fraction of the computational power. This enables
                                        , while only using a fraction of the computational power. This enables accurate
+
                                        accurate
 
                                         structural
 
                                         structural
 
                                         prediction with less
 
                                         prediction with less
                                         powerful as well as less expensive hardware and thus significantly reduces the
+
                                         powerful, as well as less expensive, hardware and thus significantly reduces the
 
                                         cost of
 
                                         cost of
 
                                         structural modeling.</p>
 
                                         structural modeling.</p>
Line 176: Line 181:
 
                                     <p> We used AlQuraishi’s approach in combination with his pretrained model, which
 
                                     <p> We used AlQuraishi’s approach in combination with his pretrained model, which
 
                                         was trained on
 
                                         was trained on
                                         the Proteinnet database
+
                                         the ProteinNet database
                                         containing all structures released prior to the start of CASP12 (12th Critical
+
                                         containing all structures released prior to the start of CASP12&nbsp;(12th
 +
                                        &nbsp;Critical
 
                                         Assessment of
 
                                         Assessment of
 
                                         Techniques for Protein
 
                                         Techniques for Protein
 
                                         <!-- Always write RMSD the same way, hyphens and so on -->
 
                                         <!-- Always write RMSD the same way, hyphens and so on -->
                                         Structure Prediction – 2016). The results were tested against the CASP12
+
                                         Structure Prediction – 2016), after which our generated candidate structure
 +
                                        <i>CASP12</i> is named. The results were tested against the CASP12
 
                                         datasets and
 
                                         datasets and
                                         reached distance root-mean-square deviation (RMSD) values between
+
                                         reached distance root-mean-square deviation&nbsp;(RMSD) values between
 
                                         10 and 13 &#8491;. The RMSD is defined as root-mean-square deviation of all atom
 
                                         10 and 13 &#8491;. The RMSD is defined as root-mean-square deviation of all atom
 
                                         positions compared to a template structure.
 
                                         positions compared to a template structure.
                                         It is defined as:
+
                                         It is defined as: </p>
  
                                        <div class="row"
+
                                    <div class="row" style="height: 4em;">
                                            style="height: 3em;">
+
                                        <img class="img-fluid center"
                                            <img class="img-fluid center"
+
                                            src="https://2019.igem.org/wiki/images/3/31/T--TU_Darmstadt--RMSD.png"
                                                src="https://2019.igem.org/wiki/images/b/be/T--TU_Darmstadt--RMSD.jpeg"
+
                                            style="max-height: 100%; width: auto; margin: 0 auto;">
                                                style="max-height: 100%; width: auto; margin: 0 auto;">
+
  
                                        </div>
+
                                    </div>
                                        where v_i is a vector of all
+
                                    <p>where x<sub>i</sub> is a vector of the atomic coordinates of the i-th atom.
                                        <!-- change here -->
+
                                         All proteins in the CASP&nbsp;datasets were not published until after the
                                         All proteins in the CASP datasets were not published until after the competition
+
                                         competition, and thus represent an
                                         and thus represent an
+
 
                                         assessment with only little bias.
 
                                         assessment with only little bias.
 
                                         <sup id="cite_ref-4" class="reference">
 
                                         <sup id="cite_ref-4" class="reference">
 
                                             <a href="#cite_note-4">[4] </a>
 
                                             <a href="#cite_note-4">[4] </a>
 
                                         </sup>
 
                                         </sup>
                                         We used these pretrained datasets to make structural predictions for our Sortase
+
                                         We used these pretrained datasets to make structural predictions for our
                                         A7M. The
+
                                         Sortase&nbsp;A7M. The
                                         predicted structure was then relaxed in a Molecular Dynamics Simulation using
+
                                         predicted structure was then relaxed in a
                                         GROMACS.
+
                                         molecular dynamics simulation.
 +
 
 
                                     </p>
 
                                     </p>
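                                     <p>The RMSD described above can be illustrated with a short Python sketch. It
                                         assumes the common form RMSD = sqrt(1/N &middot; &Sigma; |x<sub>i</sub> &minus;
                                         y<sub>i</sub>|<sup>2</sup>), where y<sub>i</sub> is the matching atom of the
                                         template, and it assumes that both structures are already superimposed and list
                                         their atoms in the same order. The function name and the example coordinates are
                                         hypothetical and are not taken from our pipeline.</p>

```python
import math


def rmsd(coords, reference):
    """Root-mean-square deviation between two equally long lists of (x, y, z) tuples.

    Assumption: both structures are already superimposed and atom i in
    `coords` corresponds to atom i in `reference`.
    """
    if not coords or len(coords) != len(reference):
        raise ValueError("both structures need the same, non-zero number of atoms")
    squared_sum = 0.0
    for atom, ref_atom in zip(coords, reference):
        # squared Euclidean distance between the two matching atoms
        squared_sum += sum((a - b) ** 2 for a, b in zip(atom, ref_atom))
    return math.sqrt(squared_sum / len(coords))


# Illustrative coordinates in Angstrom (made up): every atom is shifted by 1 A along z.
model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
template = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0)]
print(rmsd(model, template))
```

                                     <p>In practice the two structures would first be aligned, for example with a
                                         least-squares superposition, before the deviation is calculated.</p>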
  
Line 216: Line 222:
 
                                     <ol>
 
                                     <ol>
  
                                         <li>We used the amino acid sequence of the Sortase A7M in the FASTA format to
+
                                         <li>We used the amino acid sequence of the Sortase&nbsp;A7M in the FASTA format
 +
                                            to
 
                                             predict the
 
                                             predict the
 
                                             tertiary structure of the
 
                                             tertiary structure of the
                                             amino acid backbone using AlQuraishi’s TensorFlow implementation of his
+
                                             amino acid backbone using AlQuraishi’s implementation of his
 
                                             end-to-end
 
                                             end-to-end
 
                                             differentiable learning of
 
                                             differentiable learning of
                                             protein structure with the pretrained preCASP Proteinnet database. The
+
                                             protein structure with the pretrained preCASP ProteinNet database.
                                             Output file was a
+
 
 +
 
 +
                                            The
 +
                                             output file was a
 
                                             <i>.tertiary</i> file which
 
                                             <i>.tertiary</i> file which
 
                                             contains a sequential 3x3 Matrix with atomic coordinates from each amino
 
                                             contains a sequential 3x3 Matrix with atomic coordinates from each amino
 
                                             acid backbone
 
                                             acid backbone
 
                                             starting at the
 
                                             starting at the
                                             N-Terminus.</li>
+
                                             N-Terminus. The raw backbone is depicted in <b>Animation 1</b>.</li>
  
                                         <li>As the standard format for protein structure information is the PDB file
+
                                         <li>As the standard format for protein structure information is the PDB (Protein
 +
                                            Data Bank) file
 
                                             format, we
 
                                             format, we
 
                                             wrote a Python script to
 
                                             wrote a Python script to
                                             combine the structural information from the FASTA and .tertiary files into a
+
                                             combine the structural information from the FASTA and <i>.tertiary</i> files
 +
                                            into a
 +
                                            single
 
                                             PDB file.
 
                                             PDB file.
 
                                             For ease of use we
 
                                             For ease of use we
Line 241: Line 254:
 
                                         </li>
 
                                         </li>
  
                                         <li>Using Rosetta's fixed backbone design program 'fixbb' with the 'hpatch', the
+
                                         <li>Using <i>Rosetta</i>'s fixed backbone design program 'fixbb' with the
 +
                                            'hpatch', the
 
                                             optimal
 
                                             optimal
 
                                             position of the side-chains
 
                                             position of the side-chains
Line 248: Line 262:
 
                                             corresponding
 
                                             corresponding
 
                                             side-chains and optimizes
 
                                             side-chains and optimizes
                                             their conformation. The Hpatch database ensures that hydrophilic side-chains
+
                                             their conformation. The hpatch database ensures that hydrophilic side-chains
 
                                             are to be
 
                                             are to be
 
                                             preferred on the surface
 
                                             preferred on the surface
                                             of the protein as our sortase is present in an aqueous environment.</li>
+
                                             of the protein as our sortase is present in an aqueous environment. The
 
+
                                            resulting structure is depicted in <b>Animation 2.</b></li>
 
+
 
                                     </ol>
 
                                     </ol>
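                                     <p>The conversion in step 2 can be sketched as follows. This is not our original
                                         script but a minimal illustration. It assumes that the backbone has already
                                         been read from the <i>.tertiary</i> file into one (N, CA, C) coordinate triple
                                         per residue, given in &#8491;. The function name, the chain identifier and the
                                         example coordinates are hypothetical. Only the fixed-column layout of the PDB
                                         ATOM record is taken from the PDB file format.</p>

```python
# One-letter to three-letter residue names for the 20 standard amino acids.
THREE_LETTER = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}


def backbone_to_pdb(sequence, backbone, chain="A"):
    """Return PDB ATOM lines for a backbone-only structure.

    sequence: one-letter amino acid string (e.g. taken from the FASTA file)
    backbone: one ((x, y, z), (x, y, z), (x, y, z)) triple per residue for the
              N, CA and C atoms (assumed layout, coordinates in Angstrom)
    """
    if len(sequence) != len(backbone):
        raise ValueError("sequence and backbone must have the same length")
    lines = []
    serial = 1
    for res_number, (letter, atoms) in enumerate(zip(sequence, backbone), start=1):
        res_name = THREE_LETTER[letter]
        for atom, (x, y, z) in zip(("N", "CA", "C"), atoms):
            name = f" {atom:<3s}"   # columns 13-16, one-letter elements start in column 14
            element = atom[0]       # N or C
            occupancy, b_factor = 1.0, 0.0
            lines.append(
                f"ATOM  {serial:5d} {name} {res_name:3s} {chain:1s}{res_number:4d}    "
                f"{x:8.3f}{y:8.3f}{z:8.3f}{occupancy:6.2f}{b_factor:6.2f}"
                f"          {element:>2s}"
            )
            serial += 1
    return lines


# Made-up coordinates for a two-residue example.
example_backbone = [
    ((0.000, 0.000, 0.000), (1.458, 0.000, 0.000), (2.009, 1.420, 0.000)),
    ((3.332, 1.536, 0.000), (3.988, 2.839, 0.000), (5.502, 2.692, 0.000)),
]
pdb_lines = backbone_to_pdb("MA", example_backbone)
print("\n".join(pdb_lines + ["END"]))
```

                                     <p>A file written this way contains only backbone atoms, which is why the
                                         side-chains still have to be added in the following step.</p>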
  
 
                                     <p>In order to evaluate the structure obtained, we constructed a Ramachandran plot
 
                                     <p>In order to evaluate the structure obtained, we constructed a Ramachandran plot
 
                                         by
 
                                         by
                                         calculating the dihedral angles of the amino acid backbone as depicted in Figure
+
                                         calculating the dihedral angles of the amino acid backbone, as depicted in
                                         4. These can then be compared to the typical dihedral angles for specific
+
                                         <b>Fig.&nbsp;1</b>. These can then be compared to the typical dihedral angles
                                         secondary structures such as Alpha-Helices and Beta-Sheets. The typical angles
+
                                        for specific
 +
                                         secondary structures such as &alpha;-Helices and &beta;-Sheets. The typical
 +
                                        angles
 
                                         from a randomly sampled dataset
 
                                         from a randomly sampled dataset
                                         are depicted in Figure 5.</p>
+
                                         are depicted in <b>Fig.&nbsp;2</b>.</p>
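                                     <p>The dihedral angles behind such a plot can be calculated from four consecutive
                                         backbone atoms: &phi; from C(i-1), N, CA and C, and &psi; from N, CA, C and
                                         N(i+1). The following Python sketch shows one common way to compute a single
                                         dihedral angle with <code>atan2</code>. The function names and the test points
                                         are hypothetical, and this is not necessarily the routine used to create our
                                         plot.</p>

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees (range -180 to 180) defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # normalise the central bond so that the projections below are correct
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # components of the outer bonds perpendicular to the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Simple check geometries (made-up points, not protein coordinates).
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(cis, trans)
```

                                     <p>Applying such a function to every residue yields one (&phi;, &psi;) pair per
                                         residue, which can then be drawn as a point in the Ramachandran plot.</p>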
 
+
 
                                     <div class="row">
 
                                     <div class="row">
                                         <div class="figurcolumn column" style="width: 50%; float: left; padding: 3em;">
+
                                         <div class="figurcolumn column" style="width: 50%; float: left; padding: 1em;">
                                             <img class="img-fluid center"
+
                                             <div class="row">
                                                 src="https://2019.igem.org/wiki/images/8/8d/T--TU_Darmstadt--dihedral.png"
+
                                                 <a  target="_blank"
                                                style="width:90%; height: 90%;">
+
                                                    href="https://2019.igem.org/wiki/images/8/8d/T--TU_Darmstadt--dihedral.png">
                                        </div>
+
                                                    <img class="img-fluid center"
                                        <div class="figurcolumn column"
+
                                                        src="https://2019.igem.org/wiki/images/8/8d/T--TU_Darmstadt--dihedral.png"
                                            style="width: 50%; float: right;  padding: 1em;">
+
                                                        style="width:90%; height: 90%;"></a>
                                            <img class="img-fluid center"
+
                                            </div>
                                                src="https://2019.igem.org/wiki/images/7/73/T--TU_Darmstadt--Comp_Ramachandran.PNG"
+
                                            <div class="row">
                                                style="width:100%">
+
                                                <div class="caption">
 +
                                                    <p><b>Figure 1:</b> The dihedral angles of amino acids can be
 +
                                                        calculated to create a Ramachandran plot. </p>
 +
                                                </div>
  
 +
                                            </div>
 
                                         </div>
 
                                         </div>
                                    </div>
+
                                         <div class="figurcolumn column" style="width: 50%; float: right; padding: 1em;">
                                    <div class="row">
+
                                             <div class="row">
                                         <div class="figurcolumn column" style="width: 50%; float: left; padding: 1em;">
+
                                                 <a target="_blank"
                                             <p><b>Figure 4:</b> The dihedral angles of amino acids can be
+
                                                    href="https://2019.igem.org/wiki/images/7/73/T--TU_Darmstadt--Comp_Ramachandran.PNG">
                                                 calculated to create a Ramachandran plot. </p>
+
                                                    <img class="img-fluid center"
                                        </div>
+
                                                        src="https://2019.igem.org/wiki/images/7/73/T--TU_Darmstadt--Comp_Ramachandran.PNG"
                                        <div class="figurcolumn column"
+
                                                        style="width:100%; height: 100% !important;"></a>
                                            style="width: 50%; float: right;  padding: 1em;">
+
                                             </div>
                                             <p><b>Figure 5:</b> The dihedral angles over a range of randomly sampled
+
                                            <div class="row">
                                                proteins.</p>
+
                                                <div class="caption">
 +
                                                    <p><b>Figure 2:</b> The dihedral angles over a range of randomly
 +
                                                        sampled
 +
                                                        proteins.</p>
 +
                                                </div>
 +
                                            </div>
 
                                         </div>
 
                                         </div>
 
                                     </div>
 
                                     </div>
Line 294: Line 317:
 
                                     <div class="row">
 
                                     <div class="row">
 
                                         <div class="figurcolumn column" style="width: 50%; float: left; padding: 1em;">
 
                                         <div class="figurcolumn column" style="width: 50%; float: left; padding: 1em;">
                                             <img class="img-fluid center"
+
                                             <a  target="_blank"
                                                src="https://2019.igem.org/wiki/images/5/57/T--TU_Darmstadt--CASP_Nochains.gif"
+
                                                href="https://2019.igem.org/wiki/images/5/57/T--TU_Darmstadt--CASP_Nochains.gif">
                                                style="width:100%">
+
                                                <img class="img-fluid center"
                                             <p><b>Animation 1: </b>The raw PDB File converted from the .tertiary file.
+
                                                    src="https://2019.igem.org/wiki/images/5/57/T--TU_Darmstadt--CASP_Nochains.gif"
                                            </p>
+
                                                    style="width:100%"></a>
 +
                                             <div class="caption">
 +
                                                <p><b>Animation 1: </b>The raw PDB File converted from the
 +
                                                    <i>.tertiary</i>
 +
                                                    file.
 +
                                                </p>
 +
                                            </div>
 
                                         </div>
 
                                         </div>
 
                                         <div class="figurcolumn column" style="width: 50%; float: right; padding: 1em;">
 
                                         <div class="figurcolumn column" style="width: 50%; float: right; padding: 1em;">
                                             <img class="img-fluid center"
+
                                             <a  target="_blank"
                                                src="https://2019.igem.org/wiki/images/c/c1/T--TU_Darmstadt--CASP_Chains.gif"
+
                                                href="https://2019.igem.org/wiki/images/c/c1/T--TU_Darmstadt--CASP_Chains.gif">
                                                style="width:100%">
+
                                                <img class="img-fluid center"
                                             <p><b>Animation 2: </b>The PDB-File after Step 3.</p>
+
                                                    src="https://2019.igem.org/wiki/images/c/c1/T--TU_Darmstadt--CASP_Chains.gif"
 +
                                                    style="width:100%"></a>
 +
                                             <div class="caption">
 +
                                                <p><b>Animation 2: </b>The PDB file after step 3.</p>
 +
                                            </div>
 
                                         </div>
 
                                         </div>
 
                                     </div>
 
                                     </div>
                                     <p>For analysis the Strucure was viewed in Pymol
+
                                     <p>For analysis, the structure was viewed in <i>PyMOL</i>
                                         <!-- PYMOL REFERENZ -->. As can be seen in the pictures below,
+
                                         <!-- PyMOL reference -->. As can be seen in <b>Animation 3</b> below,
                                         no secondary structures could be recognized by Pymol. Thus, a Ramachandran Plot
+
                                         no secondary structures could be recognized by PyMOL. Thus, a Ramachandran plot
 
                                         was used to
 
                                         was used to
                                         evaluate the dihedral angles of the backbone. It was found that the angles do
+
                                         evaluate the dihedral angles of the backbone. This is depicted in <b>Fig. 3</b>.
                                        not match with
+
                                        It was found that the angles do&nbsp;not match with
 
                                         the typical angles for &alpha;-helices and &beta;-sheets.</p>
 
                                         the typical angles for &alpha;-helices and &beta;-sheets.</p>
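                                     <p>The backbone dihedral check described above can be sketched in a few lines of Python. This is a minimal illustration, not the script used in the project: the function names, the reference angles (roughly &phi;&nbsp;=&nbsp;&minus;60&deg;, &psi;&nbsp;=&nbsp;&minus;45&deg; for &alpha;-helices and &phi;&nbsp;=&nbsp;&minus;120&deg;, &psi;&nbsp;=&nbsp;+130&deg; for &beta;-sheets) and the 40&deg; tolerance are assumptions chosen for illustration.</p>

```python
import math


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four 3D points."""
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    norm = math.sqrt(dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond b1.
    v = sub(b0, tuple(dot(b0, b1) * x for x in b1))
    w = sub(b2, tuple(dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(dot(cross(b1, v), w), dot(v, w)))


# Assumed approximate (phi, psi) reference angles; illustrative values only.
REFERENCE = {"alpha-helix": (-60.0, -45.0), "beta-sheet": (-120.0, 130.0)}


def nearest_region(phi, psi, tolerance=40.0):
    """Name of the closest reference region, or 'other' if none is within tolerance."""
    def circular_diff(a, b):
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)

    best_name, best_dist = "other", tolerance
    for name, (ref_phi, ref_psi) in REFERENCE.items():
        dist = math.hypot(circular_diff(phi, ref_phi), circular_diff(psi, ref_psi))
        if dist <= best_dist:
            best_name, best_dist = name, dist
    return best_name
```

                                     <p>For a residue, &phi; would be computed from the atoms C(i&minus;1), N, C<sub>&alpha;</sub>, C and &psi; from N, C<sub>&alpha;</sub>, C, N(i+1). A structure in which most residues fall into "other" shows the pattern described above.</p>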
 
                                     <div class="row">
 
                                     <div class="row">
 
                                         <div class="figurcolumn column" style="width: 50%; float: left;  padding: 1em;">
 
                                         <div class="figurcolumn column" style="width: 50%; float: left;  padding: 1em;">
                                             <img class="img-fluid center"
+
                                             <a  target="_blank"
                                                src="https://2019.igem.org/wiki/images/8/82/T--TU_Darmstadt--CASP_CHAINSCartoon.gif"
+
                                                href="https://2019.igem.org/wiki/images/8/82/T--TU_Darmstadt--CASP_CHAINSCartoon.gif">
                                                style="width:100%">
+
                                                <img class="img-fluid center"
                                             <p><b>Animation 3: </b>The cartoon view in Pymol.</p>
+
                                                    src="https://2019.igem.org/wiki/images/8/82/T--TU_Darmstadt--CASP_CHAINSCartoon.gif"
 +
                                                    style="width:100%"></a>
 +
                                             <div class="caption">
 +
                                                <p><b>Animation 3: </b>The cartoon view in PyMOL.</p>
 +
                                            </div>
 
                                         </div>
 
                                         </div>
 
                                         <div class="figurcolumn column"
 
                                         <div class="figurcolumn column"
 
                                             style="width: 50%; float: right;  padding: 1em;">
 
                                             style="width: 50%; float: right;  padding: 1em;">
                                             <img class="img-fluid center"
+
                                             <a  target="_blank"
                                                src="https://2019.igem.org/wiki/images/1/18/T--TU_Darmstadt--Ramachandran_Plot.png"
+
                                                href="https://2019.igem.org/wiki/images/1/18/T--TU_Darmstadt--Ramachandran_Plot.png">
                                                style="width:100%">
+
                                                <img class="img-fluid center"
                                             <p> <b>Figure 1: </b>Ramachandran plot of the predicted structure.</p>
+
                                                    src="https://2019.igem.org/wiki/images/1/18/T--TU_Darmstadt--Ramachandran_Plot.png"
 +
                                                    style="width:100%"></a>
 +
                                             <div class="caption">
 +
                                                <p> <b>Figure 3: </b>Ramachandran plot of the predicted structure.</p>
 +
                                            </div>
 
                                         </div>
 
                                         </div>
 
                                     </div>
 
                                     </div>
 
                                     <p>During training, the predictions of AlQuraishi’s model were optimized for their
 
                                     <p>During training, the predictions of AlQuraishi’s model were optimized for their
                                         RMSD which is
+
                                         RMSD, which is
 
                                         the root-mean-square deviation of the distance between the atoms of the
 
                                         the root-mean-square deviation of the distance between the atoms of the
 
                                         prediction and
 
                                         prediction and
Line 336: Line 377:
 
                                         structure. Thus, even though the predictions are expected to have a similar
 
                                         structure. Thus, even though the predictions are expected to have a similar
 
                                         shape to
 
                                         shape to
                                         the physical structure, they may not be in the energy minimum. Hence, we applied
+
                                         the physical structure, they may not be in the energy minimum.</p>
                                         a
+
                                         <h2>Rosetta Comparative Modeling</h2>
                                        GROMACS molecular
+
                                        <h3 class="ausfahrbarer-boi">Background</h3>
                                        dynamics in order to relax the structure obtained by AlQuarashi’s deep learning
+
                                        model.</p>
+
                                    <h2>RosettaCM</h2>
+
                                    <h3 class="ausfahrbarer-boi">Background</h3>
+
  
                                    <p>In our second approach we used the <a href="rosettacommons.org"
+
                                        <p>In our second approach we used the <i><a href="https://www.rosettacommons.org"
                                            target="_blank"><i>RosettaCommons</a> comparative modeling
+
                                                target="_blank">RosettaCommons</a> comparative modeling
                                        (<a>RosettaCM</a>)</i>, which
+
                                            (<a>RosettaCM</a>)</i>, which
                                        is based on homology modeling. <i>Homology modeling</i> is a protein modeling
+
                                            is based on homology modeling
                                        method, which
+
                                            <sup id="cite_ref-5" class="reference">
                                        requires one or more template structures as base the protein to be modeled on.
+
                                                <a href="#cite_note-5">[5] </a>
                                        The protein
+
                                            </sup>
                                        sequences are aligned with the sequence of the target protein. Unaligned
+
                                            . Homology modeling is a protein
                                        sections are
+
                                            modeling
                                        modeled using fragment or protein libraries, which leads to creating
+
                                            method, which
                                        <!-- ästhetik --> protein structures based
+
                                            requires one or more template structures as a base for the protein to be
                                        on different sequence homologues of the protein of interest.
+
                                            modeled.
                                        <i>Ab-initio</i> or <i>de novo</i> modeling on the other hand attempts to find
+
                                            The protein
                                        protein
+
                                            sequences are aligned with the sequence of the target protein. Unaligned
                                        structures solely based on physicochemical principles applied to the primary
+
                                            sections are
                                        sequence, which
+
                                            modeled using fragment or protein libraries, which leads to creating
                                        can be compared to the refolding of a denaturated protein.</p>
+
                                            protein structures based
 +
                                            on different sequence homologues of the protein of interest.
 +
                                            <i>Ab initio</i> or <i>de&nbsp;novo</i> modeling, on the other hand, attempts
 +
                                            to find protein
 +
                                            structures solely based on physicochemical principles applied to the primary
 +
                                            sequence, which
 +
                                            can be compared to the refolding of a denatured protein.</p>
  
                                    <p>RosettaCM combines <i>ab-initio modeling</i> with <i>homology modeling</i>. The
+
                                        <p>RosettaCM combines <i>ab-initio&nbsp;modeling</i> with
                                        homologus structures for which a resolved 3D structure with sufficiently similar
+
                                            <i>homology&nbsp;modeling</i>. The
                                        sequence exists are generated using homology modeling. Afterwards the unaligned
+
                                            homologous structures, for which a resolved 3D structure with sufficiently
                                        sequences are modeled de novo. By combining the two methods RosettaCM
+
                                            similar
                                        represents a precise and resource efficient tool for protein structure
+
                                            sequence exists, are generated using homology modeling. Afterwards the
                                        prediction.
+
                                            unaligned
                                        Rosetta applications rely on the Monte-Carlo Optimization, which is a
+
                                            sequences are modeled <i>de&nbsp;novo</i>. By combining the two methods,
                                        probabilistic
+
                                            RosettaCM
                                        approach to finding a local minimum in the energy landscape of protein
+
                                            represents a precise and resource-efficient tool for protein structure
                                        conformations. The
+
                                            prediction.
                                        underlying equation serving as the fundament of the statistical Monte-Carlo
+
                                            Rosetta applications rely on Monte-Carlo optimization, which is a
                                        <!-- ref original paper --> method is the Metropolis acceptance criterion:
+
                                            probabilistic
                                         $$p = \min\left(1, \exp\left[-\frac{\Delta E}{k_{B} \cdot T}\right]\right),$$
+
                                            approach to finding a local minimum in the energy landscape of protein
                                         <br> where k<sub>B</sub> is the Boltzmann constant, &Delta;E the difference in
+
                                            conformations. The
                                        energy of the two states and T the temperature. The term k<sub>B</sub>T can also
+
                                            underlying equation serving as the foundation of the statistical Monte-Carlo
                                        be written as a single factor &beta;.</p>
+
                                            <sup id="cite_ref-6" class="reference">
 +
                                                <a href="#cite_note-6">[6] </a>
 +
                                            </sup>
 +
                                            method is the Metropolis acceptance criterion:
 +
                                         </p>
 +
                                        <div class="row" style="height: 4em;">
 +
                                            <img class="img-fluid center"
 +
                                                src="https://2019.igem.org/wiki/images/3/32/T--TU_Darmstadt--MetAcc.png"
 +
                                                style="max-height: 100%; width: auto; margin: 0 auto;">
 +
                                         </div>
 +
                                        <p>
 +
                                            <br> where k<sub>B</sub> is the Boltzmann constant, &Delta;E the difference
 +
                                            in
 +
                                            energy of the two states and T the temperature. The term 1/(k<sub>B</sub>T) can
 +
                                            also
 +
                                            be written as a single factor &beta;.</p>
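                                        <p>As a minimal sketch of how the acceptance criterion drives a Monte-Carlo search, the following Python example applies it to a toy one-dimensional energy function. It is not Rosetta code: the function names, the quadratic energy, the step size and the value of <code>kt</code> (the product k<sub>B</sub>T, in the same units as the energy) are all assumptions made for illustration.</p>

```python
import math
import random


def metropolis_probability(delta_e, kt):
    """Acceptance probability p = min(1, exp(-delta_e / kt))."""
    if delta_e <= 0:
        return 1.0  # downhill or neutral moves are always accepted
    return math.exp(-delta_e / kt)


def metropolis_accept(delta_e, kt, rng=random):
    """Draw a uniform number in [0, 1) and accept if it falls below p."""
    return rng.random() < metropolis_probability(delta_e, kt)


def toy_monte_carlo(energy, x0, steps=1000, step_size=0.5, kt=1.0, seed=0):
    """Random-walk search on a toy 1D energy function (illustrative only)."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(steps):
        candidate = x + rng.uniform(-step_size, step_size)  # small random perturbation
        e_new = energy(candidate)
        if metropolis_accept(e_new - e, kt, rng):
            x, e = candidate, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


if __name__ == "__main__":
    print(toy_monte_carlo(lambda x: x * x, x0=5.0))
```

                                        <p>Because uphill moves are accepted with a non-zero probability, the search can leave shallow local minima. A smaller <code>kt</code> makes such moves rarer.</p>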
  
                                    <p>
+
                                        <p>
                                        During the statistical protein folding based on the Monte-Carlo method, the
+
                                            During the statistical protein folding based on the Monte-Carlo method, the
                                        initial
+
                                            initial
                                        structure is changed by small random perturbations of the atom locations.
+
                                            structure is changed by small random perturbations of the atom locations.
                                        Whether the structure is accepted or
+
                                            Whether the structure is accepted, or
                                        not is decided by the Metropolis acceptance criterion.
+
                                            not, is decided by the Metropolis acceptance criterion.
                                        If &Delta;E < 0, the structure is accepted, otherwise the newly proposed
+
                                            If &Delta;E &lt; 0, the structure is accepted; otherwise, the newly proposed
                                            structure is accepted with probability p as described in the Metropolis
+
                                                structure is accepted with probability p as described in the Metropolis
                                            acceptance criterion. </p> <h3 class="ausfahrbarer-boi">Procedure</h3>
+
                                                acceptance criterion. </p> <h3 class="ausfahrbarer-boi">Procedure</h3>
                                            <p>
+
                                                <p>
                                                The RosettaCM protocol requires evolutionary related structures and
+
                                                    The <i>RosettaCM</i> protocol requires evolutionarily related
                                                sequences,
+
                                                    structures and
                                                as well as fragment files of the target structure.
+
                                                    sequences,
                                                The fragment files serve as a structure template for the proteins and
+
                                                    as well as fragment files of the target structure.
                                                they
+
                                                    The fragment files serve as a structure template for the proteins
                                                consist of peptide fragments of sizes 3 and 9.
+
                                                    and
                                                We gathered five evolutionary related structures from the RCBS PDB with
+
                                                    consist of peptide fragments 3 and 9 amino acids in length.
                                                the
+
                                                    We gathered five evolutionarily related structures from the PDB with
                                                accession numbers:</p>
+
                                                    the
                                            <ul>
+
                                                    accession numbers:</p>
                                                <!-- LINKS FÜR ALLE STRUKTUREN EINFÜGEN -->
+
                                                <ul>
                                                <li>1ija</li>
+
                                                    <li><a href="http://www.rcsb.org/structure/1IJA"
                                                <li>1itw</li>
+
                                                            target="_blank">1ija</a></li>
                                                <li>1itp</li>
+
                                                    <li><a href="http://www.rcsb.org/structure/1ITW"
                                                <li>1ito</li>
+
                                                            target="_blank">1itw</a></li>
                                                <li>2mlm</li>
+
                                                    <li><a href="http://www.rcsb.org/structure/1ITP"
                                            </ul>
+
                                                            target="_blank">1itp</a></li>
                                            <br>
+
                                                    <li><a href="http://www.rcsb.org/structure/1ITO"
                                            <p>
+
                                                            target="_blank">1ito</a></li>
                                                The five RCBS entries represent different structures of sortases from
+
                                                    <li><a href="http://www.rcsb.org/structure/2MLM"
                                                <i>Staphylococcus aureus</i>.
+
                                                            target="_blank">2mlm</a></li>
                                                Fragment files can be created with the Robetta <a
+
                                                </ul>
                                                    href="robetta.bakerlab.http://robetta.bakerlab.org/org"
+
                                                <br>
                                                    target="_blank">online server</a> or with the Rosetta FragmentPicker
+
                                                <p>
                                                application.
+
                                                    The five PDB entries represent different structures of sortases
                                            </p>
+
                                                    from
                                            <p>The RosettaCM procedure is best described in the following steps:</p>
+
                                                    <i>Staphylococcus&nbsp;aureus</i>.
                                            <!-- quelle auf rosetta cm seite-->
+
                                                    Fragment files can be created with the Robetta <a
                                            <ol>
+
                                                        href="http://robetta.bakerlab.org/"
                                                 <li>sequence and structural alignment of templates</li>
+
                                                        target="_blank">online server</a> or with the
                                                <li>fragment insertion in unaligned sections</li>
+
                                                    <i>Rosetta FragmentPicker</i>
                                                <li>replacement of random segment with segment from a different template
+
                                                    application.
                                                    structure</li>
+
                                                </p>
                                                <li>energy minimization</li>
+
                                                <p>The <i>RosettaCM</i> procedure is best described in the following
                                                <li>all-atom optimization</li>
+
                                                    steps:
 +
                                                    <sup id="cite_ref-5" class="reference">
 +
                                                        <a href="#cite_note-5">[5] </a>
 +
                                                    </sup>
 +
                                                 </p>
 +
                                                <ol>
 +
                                                    <li>sequence and structural alignment of templates</li>
 +
                                                    <li>fragment insertion in unaligned sections</li>
 +
                                                    <li>replacement of random segment with segment from a different
 +
                                                        template
 +
                                                        structure</li>
 +
                                                    <li>energy minimization</li>
 +
                                                    <li>all-atom optimization</li>
  
                                            </ol>
+
                                                </ol>
                                            <br>
+
                                                <br>
                                            <p>
+
                                                <p>
                                                The alignment can be performed with various tools. We used <a
+
                                                    The alignment can be performed with various tools. We used <a
                                                    href="https://mafft.cbrc.jp/alignment/server/"
+
                                                        href="https://mafft.cbrc.jp/alignment/server/"
                                                    target="_blank">MAFFT</a> to
+
                                                        target="_blank">MAFFT</a> to
                                                generate the multiple sequence alignments.
+
                                                    generate the multiple sequence alignments.
                                                Prior to using the alignments as an input, they were converted to the
+
                                                    Prior to using the alignments as an input, they were converted to
                                                grishin
+
                                                    the
                                                alignment format as RosettaCM requires the alignments to be in said
+
                                                    Grishin
                                                format.
+
                                                    alignment format, as <i>RosettaCM</i> requires the alignments to be
                                                The minimization is performed using the Rosetta controid energy
+
                                                    in said
                                                function. For
+
                                                    format.
                                                the centroid function to be applied, the protein is converted to the
+
                                                    The minimization is performed using the Rosetta centroid energy
                                                centroid
+
                                                    function. For
                                                representation. A protein in centroid representation consists of the
+
                                                    the centroid function to be applied, the protein is converted to the
                                                backbone
+
                                                    centroid
                                                atoms N, C<sub>&alpha;</sub>;, O<sub>Carbonyl</sub> and an atom of
+
                                                    representation. A protein in centroid representation consists of the
                                                varying size representing the
+
                                                    backbone
                                                side chain. The advantage of using the centroid representation is that
+
                                                    atoms N, C<sub>&alpha;</sub>, O<sub>Carbonyl</sub> and an atom of
                                                the
+
                                                    varying size representing the
                                                energy landscape can be traversed easier due to the smoother nature of
+
                                                    side chain. The advantage of using the centroid representation is
                                                the
+
                                                    that
                                                centroid energy landscape.
+
                                                    the
                                                Finally the generated structure undergoes a second minimization in an
+
                                                    energy landscape can be traversed more easily due to the smoother nature
                                                all-atom model by
+
                                                    of the centroid energy landscape.
                                                means of Monte-Carlo optimization. This is similar to the energy
+
                                                    Finally, the generated structure undergoes a second minimization in
                                                minimization but without the amino acids being
+
                                                    an
                                                represented as centroids of their functional groups. Structures computed
+
                                                    all-atom model by
                                                through
+
                                                    means of Monte-Carlo optimization. This is similar to the energy
                                                all-atom optimizations can reach atomic resolutions
+
                                                    minimization, but without the amino acids being
                                                {{Quelle rosetta paper}}
+
                                                    represented as centroids of their functional groups. Structures
                                                which is crucial for a model meant to be used to estimate atomic
+
                                                    computed
                                                interactions.
+
                                                    through
                                            </p>
+
                                                    all-atom optimizations can reach atomic resolution,
 +
                                                    <sup id="cite_ref-5" class="reference">
 +
                                                        <a href="#cite_note-5">[5] </a>
 +
                                                    </sup>
 +
                                                    which is crucial for a model meant to be used to estimate atomic
 +
                                                    interactions.
 +
                                                </p>
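                                                <p>The conversion of a pairwise alignment into the Grishin format mentioned above could look roughly like the following Python sketch. The record layout (a <code>##</code> line with target and template name, a <code>#</code> line, a <code>scores_from_program</code> line, two sequence rows prefixed with a start offset, and a closing <code>--</code>) is written from our reading of the Rosetta documentation and should be verified against it. The function name and the example sequences are hypothetical.</p>

```python
def to_grishin(target_name, target_aln, template_name, template_aln):
    """Format one target/template pair of aligned rows as a Grishin record.

    Both rows must come from the same alignment and therefore have equal
    length; '-' marks a gap.
    """
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")

    # Columns that are gaps in both rows can appear when a pair is cut out of
    # a multiple sequence alignment; they carry no information and are dropped.
    columns = [(a, b) for a, b in zip(target_aln, template_aln)
               if not (a == "-" and b == "-")]
    target = "".join(a for a, _ in columns)
    template = "".join(b for _, b in columns)

    lines = [
        f"## {target_name} {template_name}",
        "#",
        "scores_from_program: 0",
        f"0 {target}",    # assumed: leading number is the start offset
        f"0 {template}",
        "--",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up sequences, for illustration only.
    print(to_grishin("target", "AC-D-", "1abc", "A-GD-"))
```

                                                <p>One such record would be written per template, so five templates yield five records.</p>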
  
                                            <h3>Results</h3>
+
                                                <h3>Results</h3>
                                            <p>
+
                                                <p>
                                                The run yielded 15,000 structures which have been compared using the
+
                                                    The run yielded 15,000 structures, which were compared using
                                                Rosetta
+
                                                    the
                                                scoring functions (talaris2013).
+
                                                    Rosetta
                                                <!-- scoring -->
+
                                                    scoring function (talaris2013).
                                                From the 15,000 structures generated, we inspected the ten best scoring
+
                                                    <!-- scoring -->
                                                structures. </p>
+
                                                    From the 15,000 structures generated, we inspected the ten best
 +
                                                    scoring
 +
                                                    structures. </p>
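                                                <p>Selecting the best-scoring models from a large run can be sketched as follows. The example assumes a plain-text score file in which every relevant line starts with <code>SCORE:</code>, the first such line names the columns, and the columns <code>total_score</code> and <code>description</code> are present. The function name and the sample values in the usage block are made up for illustration and are not results from our run.</p>

```python
def top_models(score_lines, n=10, column="total_score"):
    """Return the n lowest-scoring (score, description) pairs.

    Lower Rosetta energies are better, so the list is sorted ascending.
    """
    header = None
    rows = []
    for line in score_lines:
        if not line.startswith("SCORE:"):
            continue  # skip other lines such as a SEQUENCE: line
        fields = line.split()[1:]
        if header is None:
            header = fields  # first SCORE: line holds the column names
            continue
        if fields == header or len(fields) != len(header):
            continue  # repeated header or malformed row
        record = dict(zip(header, fields))
        try:
            rows.append((float(record[column]), record["description"]))
        except (KeyError, ValueError):
            continue  # missing column or non-numeric score
    rows.sort()
    return rows[:n]


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        "SEQUENCE: ",
        "SCORE: total_score rms description",
        "SCORE: -10.5 1.0 S_0001",
        "SCORE: -30.0 2.0 S_0002",
        "SCORE: -20.0 1.5 S_0003",
    ]
    for score, name in top_models(sample, n=2):
        print(name, score)
```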
  
                                            <p>As can be seen in figure 5, the most prominent differences can
+
                                                <p>As can be seen in <b>Fig.&nbsp;4</b>, the most prominent differences can
                                                be found in the regions close to the N- and C-terminus. As
+
                                                    be found in the regions close to the N- and C-terminus. As
                                                fluctuations in those
+
                                                    fluctuations in those
                                                regions are not untypical, we decided to use the best scoring
+
                                                    regions are not unusual, we decided to use the best scoring
                                                structure, candidate S_14771 (figure 6), as the input for the
+
                                                    structure, candidate <i>S_14771</i> (<b>Animation&nbsp;4</b>), as
                                                simulations to follow.</p>
+
                                                    the input
 +
                                                    for
 +
                                                    the
 +
                                                    simulations to follow.
 +
                                                </p>
  
  
                                            <div class="row">
+
                                                <div class="row">
                                                <div class="figurcolumn column"
+
                                                    <div class="figurcolumn column"
                                                    style="width: 50%; float: left;  padding: 1em;">
+
                                                        style="width: 50%; float: left;  padding: 1em;">
                                                    <img class="img-fluid center"
+
                                                        <a  target="_blank"
                                                        src="https://2019.igem.org/wiki/images/4/40/T--TU_Darmstadt--top10_corporate.png"
+
                                                            href="https://2019.igem.org/wiki/images/4/40/T--TU_Darmstadt--top10_corporate.png">
                                                         style="width:100%">
+
                                                            <img class="img-fluid center"
 +
                                                                src="https://2019.igem.org/wiki/images/4/40/T--TU_Darmstadt--top10_corporate.png"
 +
                                                                style="width:100%"></a>
 +
                                                        <div class="caption">
 +
                                                            <p><b>Figure 4</b>: The structural alignment of the ten best
 +
                                                                scoring
 +
                                                                sortase structures
 +
                                                                displaying minor differences with the exception of the
 +
                                                                C-
 +
                                                                and
 +
                                                                N-terminal
 +
                                                                regions. N- and C-terminal regions tend to show strong
 +
                                                                fluctuations, thus it is
 +
                                                                unsurprising to find the terminal regions to be
 +
                                                                unaligned.
 +
                                                            </p>
 +
                                                        </div>
 +
                                                    </div>
 +
                                                    <div class="figurcolumn column"
 +
                                                         style="width: 50%; float: right;  padding: 1em;">
 +
                                                        <a  target="_blank"
 +
                                                            href="https://2019.igem.org/wiki/images/b/b3/T--TU_Darmstadt--s14771.gif">
 +
                                                            <img class="img-fluid center"
 +
                                                                src="https://2019.igem.org/wiki/images/b/b3/T--TU_Darmstadt--s14771.gif"
 +
                                                                style="width:100%"></a>
 +
                                                        <div class="caption">
 +
                                                            <p><b>Animation 4</b>: Sortase&nbsp;A7M candidate
 +
                                                                <i>S_14771</i>
 +
                                                                created
 +
                                                                through
 +
                                                                <i>RosettaCM</i>.</p>
 +
                                                        </div>
 +
                                                    </div>
 
                                                 </div>
 
                                                 </div>
                                                <div class="figurcolumn column"
 
                                                    style="width: 50%; float: right;  padding: 1em;">
 
                                                    <img class="img-fluid center"
 
                                                        src="https://2019.igem.org/wiki/images/b/b3/T--TU_Darmstadt--s14771.gif"
 
                                                        style="width:100%">
 
  
                                                 </div>
+
                                                 <p>To evaluate the secondary structure of the Sortase&nbsp;A7M
                                            </div>
+
                                                     candidate <i>S_14771</i>, a Ramachandran plot was created and
                                            <div class="row">
+
                                                    compared to
                                                <div class="figurcolumn column"
+
                                                    the five sortases used as input for the comparative modeling.
                                                    style="width: 50%; float: left;  padding: 1em;">
+
                                                    Comparisons were also drawn with the sortase predicted by Deep
                                                     <p><b>Figure 2</b>: The structural alignment of the ten best scoring
+
                                                    Learning,
                                                        sortase structures
+
                                                     as well as a database of randomly sampled proteins.
                                                        displaying minor differences with the exception of the C- and
+
                                                     Ramachandran plots of dihedral angles (<b>Fig.&nbsp;5</b>) can be a
                                                        N-terminal
+
                                                    first indicator
                                                        regions. N- and C-terminal regions tend to show strong
+
                                                    of whether the computed structures are valid.</p>
                                                        fluctuations, thus it is
+
                                                        unsurprising to find the terminal regions to be unaligned.</p>
+
                                                </div>
+
                                                <div class="figurcolumn column"
+
                                                     style="width: 50%; float: right;  padding: 1em;">
+
                                                     <p><b>Figure 3</b>: Sortase A7M candidate S_14771 created through
+
                                                        RosettaCM.</p>
+
                                                </div>
+
                                            </div>
+
  
                                            <!-- needs to be revised -->
+
                                                <div class="row">
 
+
                                                    <div class="figurcolumn column"
                                            <p>To evaluate the secondary structure as done with the structure acquired
+
                                                        style="width: 50%; float: left;  padding: 1em;">
                                                through Deep Learning bla bla a ramachandran plot of the dihedral angle
+
                                                        <a target="_blank"
                                                of the five sortases used as inputs has been made.
+
                                                            href="https://2019.igem.org/wiki/images/2/28/T--TU_Darmstadt--ramachandran_s14711.png">
                                                Ramachandran plots of dihedral angles (fig x) can be a first indicator
+
                                                            <img class="img-fluid center"
                                                whether the structures computed are valid.</p>
+
                                                                src="https://2019.igem.org/wiki/images/2/28/T--TU_Darmstadt--ramachandran_s14711.png"
 
+
                                                                style="width:100%"></a>
                                            <div class="row">
+
                                                    </div>
                                                <div class="figurcolumn column"
+
                                                    <div class="figurcolumn column"
                                                    style="width: 50%; float: left;  padding: 1em;">
+
                                                        style="width: 50%; float: right;  padding: 1em;">
                                                    <img class="img-fluid center"
+
                                                        <a  target="_blank"
                                                        src="https://2019.igem.org/wiki/images/2/28/T--TU_Darmstadt--ramachandran_s14711.png"
+
                                                            href="https://2019.igem.org/wiki/images/1/18/T--TU_Darmstadt--Ramachandran_Plot.png">
                                                        style="width:100%">
+
                                                            <img class="img-fluid center"
 +
                                                                src="https://2019.igem.org/wiki/images/1/18/T--TU_Darmstadt--Ramachandran_Plot.png"
 +
                                                                style="width:100%"></a>
 +
                                                    </div>
 
                                                 </div>
 
                                                 </div>
                                                 <div class="figurcolumn column"
+
                                                 <div class="row">
                                                    style="width: 50%; float: right;  padding: 1em;">
+
                                                    <div class="figurcolumn column"
                                                    <img class="img-fluid center"
+
                                                        style="width: 50%; float: left;  padding: 1em;">
                                                        src="https://2019.igem.org/wiki/images/1/18/T--TU_Darmstadt--Ramachandran_Plot.png"
+
                                                        <a  target="_blank"
                                                        style="width:100%">
+
                                                            href="https://2019.igem.org/wiki/images/e/ee/T--TU_Darmstadt--ramachandran_five_sortases.png">
 +
                                                            <img class="img-fluid center"
 +
                                                                src="https://2019.igem.org/wiki/images/e/ee/T--TU_Darmstadt--ramachandran_five_sortases.png"
 +
                                                                style="width:100%"></a>
 +
                                                    </div>
 +
                                                    <div class="figurcolumn column"
 +
                                                        style="width: 50%; float: right;  padding: 1em;">
 +
                                                        <a  target="_blank"
 +
                                                            href="https://2019.igem.org/wiki/images/7/73/T--TU_Darmstadt--Comp_Ramachandran.PNG">
 +
                                                            <img class="img-fluid center"
 +
                                                                src="https://2019.igem.org/wiki/images/7/73/T--TU_Darmstadt--Comp_Ramachandran.PNG"
 +
                                                                style="height:82.5%; padding-top: 1.8em; padding-bottom: 2.5em;"></a>
 +
                                                    </div>
 
                                                 </div>
 
                                                 </div>
                                            </div>
+
                                                <div class="row">
                                            <div class="row">
+
                                                    <div class="caption">
                                                <div class="figurcolumn column"
+
                                                        <p><b>Figure 5</b>: The Ramachandran plot of randomly sampled
                                                    style="width: 50%; float: left;  padding: 1em;">
+
                                                            proteins
                                                    <img class="img-fluid center"
+
                                                            <sup id="cite_ref-1" class="reference">
                                                        src="https://2019.igem.org/wiki/images/e/ee/T--TU_Darmstadt--ramachandran_five_sortases.png"
+
                                                                <a href="#cite_note-1">[1] </a>
                                                        style="width:100%">
+
                                                            </sup>
 +
                                                            and the input structures of the <i>comparative
 +
                                                                modeling</i>
 +
                                                            show
 +
                                                            similar secondary structures. Secondary structure analysis
 +
                                                            of
 +
                                                            both
 +
                                                            sortase candidates reveals an absence
 +
                                                            of secondary structures for candidate <i>CASP12</i>. This is
 +
                                                            not
 +
                                                            the
 +
                                                            case
 +
                                                            with candidate <i>S_14771</i> as the Ramachandran plot shows
 +
                                                            all
 +
                                                            relevant
 +
                                                            structures.</p>
 +
                                                    </div>
 
                                                 </div>
 
                                                 </div>
                                                 <div class="figurcolumn column"
+
                                                 <p>The Ramachandran plot (<b>Fig.&nbsp;5</b>) showing &alpha;-helices
                                                    style="width: 50%; float: right;  padding: 1em;">
+
                                                    and
                                                    <img class="img-fluid center"
+
                                                    &beta;-sheets is a
                                                        src="https://2019.igem.org/wiki/images/7/73/T--TU_Darmstadt--Comp_Ramachandran.PNG"
+
                                                    strong indicator of a successful structure determination, as those
                                                        style="height:82.5%; padding-top: 1.8em; padding-bottom: 2.5em;">
+
                                                    secondary
                                                </div>
+
                                                    structures are crucial for the functionality of sortases.</p>
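                                                <p>For readers unfamiliar with how the points of a Ramachandran plot
                                                    arise, the following sketch computes a backbone dihedral angle from
                                                    four atom positions. It is an illustration under stated
                                                    assumptions, not the code used to produce <b>Fig.&nbsp;5</b>: the
                                                    helper names and the example coordinates are hypothetical, and the
                                                    standard atan2 formulation of the torsion angle is used. &phi; is
                                                    the dihedral over C(i-1), N(i), C&alpha;(i), C(i) and &psi; the
                                                    dihedral over N(i), C&alpha;(i), C(i), N(i+1).</p>

```python
import math
from typing import Sequence, Tuple

Vec = Sequence[float]


def _sub(a: Vec, b: Vec) -> Tuple[float, ...]:
    return tuple(x - y for x, y in zip(a, b))


def _dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def _cross(a: Vec, b: Vec) -> Tuple[float, float, float]:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle in degrees defined by four points, in (-180, 180]."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi(c_prev: Vec, n: Vec, ca: Vec, c: Vec) -> float:
    """Phi of residue i: C(i-1), N(i), CA(i), C(i)."""
    return dihedral(c_prev, n, ca, c)


def psi(n: Vec, ca: Vec, c: Vec, n_next: Vec) -> float:
    """Psi of residue i: N(i), CA(i), C(i), N(i+1)."""
    return dihedral(n, ca, c, n_next)


# Hypothetical toy coordinates, chosen so the angle is easy to check by hand.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

                                                <p>Plotting the (&phi;, &psi;) pair of every residue yields the
                                                    Ramachandran plot; clusters in the regions typical for
                                                    &alpha;-helices and &beta;-sheets are what the comparison above
                                                    relies on.</p>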
                                            </div>
+
  
                                            <p><b>Figure 5: </b> The comparison of the ramachandran plot of
+
                                                <h2>Conclusion</h2>
                                                structure S_14771 and the ramachandran plot found on <a
+
                                                <p>
                                                     href="https://proteopedia.org/wiki/images/9/90/Ramachandran_plot_general_100K.jpg">Proteopedia</a>
+
                                                    We used machine learning methods as well as Monte Carlo simulations
                                                suggests that secondary structures are present. Hence the structure
+
                                                    to
                                                appears
+
                                                    determine the structure of the mutated transpeptidase
                                                to contain &alpha;-helices, &beta;-sheets and a small amount of
+
                                                    Sortase&nbsp;A7M.
                                                lefthanded
+
                                                    The machine
                                                &alpha;-helices. </p>
+
                                                    learning approach using AlQuraishi's Deep Neural Network yielded a
                                            The Ramachandran plot (Figure xzy) showing &alpha;-helices and
+
                                                    structure which seemed to
                                            &beta;-sheets is a
+
                                                    lack any secondary structures. To exclude the possibility of an
                                            strong indicator of a successful structure determination, as those
+
                                                     error in the
                                            secondary
+
                                                    PyMOL visualization software by Schrödinger,
                                            structures are crucial for the functionality of sortases.
+
                                                    <sup id="cite_ref-7" class="reference">
 +
                                                        <a href="#cite_note-7">[7] </a>
 +
                                                    </sup>a Ramachandran plot
 +
                                                    (<b>Fig.&nbsp;3</b>)
 +
                                                    was created. The plot shows that no typical secondary structures are
 +
                                                    present, which is a strong indicator of a failed approach to
 +
                                                    determine a
 +
                                                    structure.
 +
                                                    The approach, using <i>Rosetta&nbsp;Comparative&nbsp;Modeling</i>,
 +
                                                    yielded
 +
                                                    15,000
 +
                                                    structures scored with the talaris2013 scoring function. The ten
 +
                                                    best structures
 +
                                                    were aligned and exhibited almost identical secondary structures
 +
                                                    (<b>Fig.&nbsp;4</b>).
 +
                                                    The greatest structural differences are present in the N- and
 +
                                                    C-terminal
 +
                                                    regions. Since terminal regions tend to fluctuate more strongly than
 +
                                                    non-terminal segments of the protein, we deemed those fluctuations
 +
                                                    irrelevant
 +
                                                    for the protein's functionality.
 +
                                                    <br>
 +
                                                    Being the best scoring candidate, structure <i>S_14771</i> was
 +
                                                    analyzed
 +
                                                    structurally
 +
                                                    using a Ramachandran plot (<b>Fig.&nbsp;5</b>). The plot shows all
 +
                                                    the
 +
                                                    relevant and
 +
                                                    typical structures that sortases exhibit and serves as an indicator of
 +
                                                    a
 +
                                                    successful structure prediction.
 +
                                                    <br>
 +
                                                    In the steps to follow, a molecular&nbsp;dynamics&nbsp;(MD)
 +
                                                    simulation will be performed on both structures. Even though
 +
                                                    structure <i>CASP12</i>
 +
                                                    does not seem to be a valid structure, refolding processes during an
 +
                                                    MD
 +
                                                    simulation might lead to a relaxation of the protein and allow for a
 +
                                                    promising
 +
                                                    prediction of the Sortase&nbsp;A7M structure.
 +
                                                </p>
  
                                            <h2>Conclusion</h2>
 
                                            <p>
 
                                                We used machine learning methods, as well as monte-carlo simulations
 
                                                to
 
                                                determine the structure of the mutated transpeptidase Sortase A7M.
 
                                                The machine
 
                                                learning approach using AlQuraishi's Deep Neural Network yielded a
 
                                                structure which seemed to
 
                                                not have any secondary structures. To exclude the possibility of an
 
                                                error in the
 
                                                PyMOL visualization software by Schroedinger, a Ramachandran plot
 
                                                (figure xyz)
 
                                                was created. The plot shows that no typical secondary structures are
 
                                                present
 
                                                which is a strong indicator of a failed approach to determine a
 
                                                structure.
 
                                                The approach, using <i>Rosetta Comparative Modeling</i>, yielded
 
                                                15,000
 
                                                structures scored with the talaris2013 scoring function. The ten
 
                                                best structures
 
                                                were aligned and exhibited almost identical secondary structures
 
                                                (figure xzy).
 
                                                The greatest structural differences are present in the N- and
 
                                                C-terminal
 
                                                regions. Since terminal regions tend to fluctuate more strongly than
 
                                                non-terminal segments of the protein, we deemed those fluctuations
 
                                                non-relevant
 
                                                for the protein's functionality.
 
                                                <br>
 
                                                Being the best scoring candidate, structure S_14771 was analyzed
 
                                                structurally
 
                                                using a Ramachandran plot (figure xyz). The plot shows all the
 
                                                relevant and
 
                                                typical structures that sortases exhibit and serves as an indicator of
 
                                                a
 
                                                successful structure prediction.
 
                                                <br>
 
                                                In the steps to follow, a molecular dynamics (MD)
 
                                                simulation will be performed on both structures. Even though
 
                                                structure CASP12
 
                                                does not seem to be a valid structure, refolding processes during a
 
                                                MD
 
                                                simulation might lead to a relaxation of the protein and allow for a
 
                                                promising
 
                                                prediction of the sortase A7M structure.
 
                                            </p>
 
                                            <h2>References</h2>
 
                                            <ol class="references">
 
                                                <li id="cite_note-1">
 
                                                    <span class="mw-cite-backlink">
 
                                                        <a href="#cite_ref-1">↑</a>
 
                                                    </span>
 
                                                    <span class="reference-text">
 
                                                        Bishop, C. M., Neural Networks for Pattern Recognition.
 
                                                        Oxford University
 
                                                        Press,
 
                                                        1995.
 
                                                        <a rel="nofollow" class="external autonumber"
 
                                                            href="https://www.biorxiv.org/content/10.1101/265231v1"
 
                                                            target="_blank">[1] </a>
 
                                                    </span>
 
                                                </li>
 
                                                <li id="cite_note-2">
 
                                                    <span class="mw-cite-backlink">
 
                                                        <a href="#cite_ref-2">↑</a>
 
                                                    </span>
 
                                                    <span class="reference-text">
 
                                                        AlQuraishi, M., End-to-End Differentiable Learning of
 
                                                        Protein
 
                                                        Structure. Cell Systems, 2019. 8: 1–10.
 
                                                        <a rel="nofollow" class="external autonumber"
 
                                                            href="https://www.biorxiv.org/content/10.1101/265231v1"
 
                                                            target="_blank">[2] </a>
 
                                                    </span>
 
                                                </li> <!-- dihedral junge shrinken -->
 
                                                <li id="cite_note-3">
 
                                                    <span class="mw-cite-backlink">
 
                                                        <a href="#cite_ref-3">↑</a>
 
                                                    </span>
 
                                                    <span class="reference-text">
 
                                                        Leaver-Fay, A. et al., ROSETTA3: an object-oriented software
 
                                                        suite for
 
                                                        the
 
                                                        simulation and design of
 
                                                        macromolecules. Methods Enzymol, 2011. 487:545-74.
 
                                                        <a rel="nofollow" class="external autonumber"
 
                                                            href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4083816/"
 
                                                            target="_blank">[3]
 
                                                        </a>
 
                                                    </span>
 
                                                </li>
 
                                                <li id="cite_note-4">
 
                                                    <span class="mw-cite-backlink">
 
                                                        <a href="#cite_ref-4">↑</a>
 
                                                    </span>
 
                                                    <span class="reference-text">
 
                                                        Moult, J. et al., Critical assessment of methods of protein
 
                                                        structure
 
                                                        prediction
 
                                                        (CASP)—Round 6. PROTEINS:
 
                                                        Structure, Function, and Bioinformatics, 2005. Suppl 7:3–7.
 
                                                        <a rel="nofollow" class="external autonumber"
 
                                                            href="https://onlinelibrary.wiley.com/doi/pdf/10.1002/prot.20716"
 
                                                            target="_blank">[4] </a>
 
                                                    </span>
 
                                                </li>
 
                                            </ol>
 
 
                                 </div>
 
                                 </div>
 
                             </div>
 
                             </div>
Line 675: Line 738:
 
                     </div>
 
                     </div>
 
                 </div>
 
                 </div>
 +
  
 
                 <div class="tab my-3">
 
                 <div class="tab my-3">
Line 693: Line 757:
 
                             <div class="col-xs-12 col-sm-12 col-md-10">
 
                             <div class="col-xs-12 col-sm-12 col-md-10">
 
                                 <div class="flex-center">
 
                                 <div class="flex-center">
                                     <h2>Introduction</h2>
+
                                     <h2>Background</h2>
  
 
                                     <p>The structure predictions made so far were based on statistical methods with
 
                                     <p>The structure predictions made so far were based on statistical methods with
Line 699: Line 763:
 
                                         constraints. The Deep
 
                                         constraints. The Deep
 
                                         Learning algorithm uses a neural network trained to find a function associating
 
                                         Learning algorithm uses a neural network trained to find a function associating
                                         the
+
                                         the amino acid sequence and
                                        amino acid sequence and
+
 
                                         the final 3D positions of the atoms within the protein. On the other hand,
 
                                         the final 3D positions of the atoms within the protein. On the other hand,
 
                                         predictions
 
                                         predictions
 
                                         were made with Rosetta
 
                                         were made with Rosetta
                                         using the Monte Carlo Method. Here random movement of individual atoms occurs,
+
                                         using the Monte-Carlo Method. Here random movement of individual atoms occurs,
 
                                         and the
 
                                         and the
 
                                         energy is estimated after
 
                                         energy is estimated after
                                         each step.</p>
+
                                         each step.
 +
                                    </p>
  
                                    <div class="row">
 
                                        <div class="figurcolumn column"
 
                                            style="width: 80%; float: right;  padding: 1em;">
 
                                            <img class="img-fluid center"
 
                                                src="https://2019.igem.org/wiki/images/0/08/T--TU_Darmstadt--MoleculeInWater.png"
 
                                                style="width:100%">
 
                                            <p><b>Figure 6: </b>Sortase A7M in a force field surrounded by discrete
 
                                                water molecules. Image was made with …. </p>
 
                                        </div>
 
                                    </div>
 
  
 
                                     <p>
 
                                     <p>
Line 732: Line 786:
 
                                         discrete molecules, creating a solvated protein. This step is crucial to
 
                                         discrete molecules, creating a solvated protein. This step is crucial to
 
                                         validate the structures, as the interaction with water is one of the primary
 
                                         validate the structures, as the interaction with water is one of the primary
                                         mechamism for protein folding.
+
                                         mechanisms for protein folding.
                                         Since neither candidate CASP12 nor S_14771 has been modeled with explicit water
+
                                         Since neither candidate <i>CASP12</i> nor <i>S_14771</i> has been modeled with
 +
                                        explicit water
 
                                         a corresponding MD simulation is required to
 
                                         a corresponding MD simulation is required to
 
                                         verify the correctness of each candidate's conformation.
 
                                         verify the correctness of each candidate's conformation.
Line 739: Line 794:
 
                                         the protein has to be placed in a simulation box
 
                                         the protein has to be placed in a simulation box
 
                                         and said box is filled with water molecules. This is called solvation and is
 
                                         and said box is filled with water molecules. This is called solvation and is
                                         visualized for candidate S_14771 in figure eeeeee.
+
                                         visualized for candidate <i>S_14771</i> in <b>Fig. 6</b>.
 
                                     </p>
 
                                     </p>
 
+
                                    <div class="row">
 +
                                        <div class="figurcolumn column"
 +
                                            style="width: 80%; float: right;  padding: 1em; margin: 0 auto;">
 +
                                            <a  target="_blank"
 +
                                                href="https://2019.igem.org/wiki/images/0/08/T--TU_Darmstadt--MoleculeInWater.png">
 +
                                                <img class="img-fluid center"
 +
                                                    src="https://2019.igem.org/wiki/images/0/08/T--TU_Darmstadt--MoleculeInWater.png"
 +
                                                    style="width:100%"></a>
 +
                                            <div class="caption">
 +
                                                <p><b>Figure 6: </b>Sortase A7M in a force field surrounded by discrete
 +
                                                    water molecules and ions.</p>
 +
                                            </div>
 +
                                        </div>
 +
                                    </div>
  
 
                                     <p>
 
                                     <p>
                                         We used GROMACS (GROningen MAchine for Chemical Simulations)
+
                                         We used <i>GROMACS</i> (GROningen MAchine for Chemical Simulations)
                                         <!-- cite --> as the tool for our molecular dynamics simulations. GROMACS solves
+
                                         <!-- cite --> as the tool for our molecular dynamics simulations. <i>GROMACS</i>
 +
                                        solves
 
                                         Newton's
 
                                         Newton's
 
                                         equations of motion for
 
                                         equations of motion for
 
                                         individual atoms
 
                                         individual atoms
                                         <sup id="cite_ref-1" class="reference">
+
                                         <sup id="cite_ref-8" class="reference">
                                             <a href="#cite_note-1">[1] </a>
+
                                             <a href="#cite_note-8">[8] </a>
 
                                         </sup>
 
                                         </sup>
 
                                         . While this classical simulation is much more accurate than predictions made by
 
                                         . While this classical simulation is much more accurate than predictions made by
Line 758: Line 827:
 
                                         the system
 
                                         the system
 
                                         size is quite small.
 
                                         size is quite small.
                                         <sup id="cite_ref-1" class="reference">
+
                                         <sup id="cite_ref-8" class="reference">
                                             <a href="#cite_note-1">[1] </a>
+
                                             <a href="#cite_note-8">[8] </a>
 
                                         </sup>
 
                                         </sup>
 
                                         Additionally, atoms are assumed to be classical particles, which is not the
 
                                         Additionally, atoms are assumed to be classical particles, which is not the
Line 773: Line 842:
 
                                         To perform the molecular dynamics simulations we mostly followed the <a
 
                                         To perform the molecular dynamics simulations we mostly followed the <a
 
                                             href="http://www.mdtutorials.com/gmx/lysozyme/01_pdb2gmx.html"
 
                                             href="http://www.mdtutorials.com/gmx/lysozyme/01_pdb2gmx.html"
                                             target="_blank">GROMACS Lysosome tutorial</a> as it serves our purpose
+
                                             target="_blank"><i>GROMACS</i>
 +
                                            Lysozyme tutorial</a> as it serves our purpose
 
                                         perfectly. We used a dodecahedral simulation box with a distance of 0.7
 
                                         perfectly. We used a dodecahedral simulation box with a distance of 0.7
 
                                         nm between the solute and the box borders. We used periodic boundary
 
                                         nm between the solute and the box borders. We used periodic boundary
 
                                         conditions and a Na<sup>+</sup> Cl<sup>-</sup> concentration of 0.012 mol/L. The
 
                                         conditions and a Na<sup>+</sup> Cl<sup>-</sup> concentration of 0.012 mol/L. The
 
                                         main difference of our approach was that we used the CHARMM36
 
                                         main difference of our approach was that we used the CHARMM36
                                         <!-- cite --> force field instead of the OPLS-AA/L force field and have adjusted
+
                                         <sup id="cite_ref-9" class="reference">
 +
                                            <a href="#cite_note-9">[9] </a>
 +
                                        </sup>
 +
                                        force field instead of the OPLS-AA/L force field and have adjusted
 
                                         our molecular dynamics parameters <a
 
                                         our molecular dynamics parameters <a
                                             href="http://www.gromacs.org/Documentation/Terminology/Force_Fields/CHARMM"
+
                                             href="http://www.GROMACS.org/Documentation/Terminology/Force_Fields/CHARMM"
 
                                             target="_blank">accordingly</a>.
 
                                             target="_blank">accordingly</a>.
 
                                         The simulation was performed on an NVIDIA GTX 760 graphics card, allowing us to
 
                                         The simulation was performed on an NVIDIA GTX 760 graphics card, allowing us to
Line 789: Line 862:
 
                                         To analyse the MD simulation we used the Python programming language and the <a
 
                                         To analyse the MD simulation we used the Python programming language and the <a
 
                                             href="https://www.biotite-python.org/" target="_blank">Biotite package</a>
 
                                             href="https://www.biotite-python.org/" target="_blank">Biotite package</a>
                                         <!-- cite --> as well as GROMACS analysis tools such as
+
                                         <sup id="cite_ref-10" class="reference">
                                         <!-- links zu den jungs--> <a>covar</a> and anaeig.
+
                                            <a href="#cite_note-10">[10] </a>
 +
                                        </sup>
 +
                                        as well as <i>GROMACS</i> analysis tools such as
 +
                                         <a href="http://manual.gromacs.org/documentation/2018/onlinehelp/gmx-covar.html"
 +
                                            target="_blank">covar</a> and
 +
                                        <a href="http://manual.gromacs.org/documentation/2018/onlinehelp/gmx-anaeig.html"
 +
                                            target="_blank">anaeig</a>.
 
                                         The first analyses are a root-mean-square deviation (RMSD), a root-mean-square
 
                                         The first analyses are a root-mean-square deviation (RMSD), a root-mean-square
 
                                         fluctuation (RMSF) and a gyration radius analysis.
 
                                         fluctuation (RMSF) and a gyration radius analysis.
 
                                         RMSD calculations have been described in the structure prediction section. To
 
                                         RMSD calculations have been described in the structure prediction section. To
 
                                         compute the RMSF the movement distance of each
 
                                         compute the RMSF the movement distance of each
                                         residue is computed as a root-mean-square over time as:
+
                                         residue is computed as a root-mean-square over time as: </p>
                                         $$ RMSF(t) = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( v_i(t) - v_i(0) \right)^2 }, $$
+
                                    <div class="row" style="height: 4em;">
                                        where v<sub>i</sub>(t) is the position of atom i at time t. The radius of
+
                                         <img class="img-fluid center"
                                         gyration is
+
                                            src="https://2019.igem.org/wiki/images/d/d3/T--TU_Darmstadt--RMSF.png"
                                         <!-- überarbeiten -->
+
                                            style="max-height: 100%; width: auto; margin: 0 auto;">
 +
                                    </div>
 +
                                    <p>where v<sub>i</sub>(t) is the position of atom i at time t. The radius of
 +
                                         gyration is a quantity used to describe the expansion or compression of
 +
                                         a particle.
 
                                         The final analysis performed on the MD simulation is called Principal Component
 
                                         The final analysis performed on the MD simulation is called Principal Component
 
                                         Analysis (PCA).
 
                                         Analysis (PCA).
 
                                         By applying PCA to a protein it is possible to gain insights into the relevant
 
                                         By applying PCA to a protein it is possible to gain insights into the relevant
                                         vibrational motions and thereby the physical mechanism of the protein
+
                                         vibrational motions and thereby the physical mechanism of the protein.
                                         <!-- zitat -->.
+
                                         <sup id="cite_ref-11" class="reference">
 +
                                            <a href="#cite_note-11">[11] </a>
 +
                                        </sup>
 
                                     </p>
 
                                     </p>
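                                     <p>
                                         The RMSF and the radius of gyration described above can be sketched in a
                                         few lines of plain Python. This is a minimal illustration, not the analysis
                                         script used for the figures: it assumes the common textbook definition of
                                         the RMSF (per-atom deviation from the time-averaged position) and a
                                         trajectory that is already superimposed onto a reference. The function
                                         names <code>rmsf</code> and <code>radius_of_gyration</code> and the nested
                                         list layout are chosen here for illustration only.
                                     </p>

```python
import math


def rmsf(trajectory):
    """Per-atom RMSF for a trajectory given as trajectory[t][i] = (x, y, z).

    Assumes every frame has already been superimposed onto a common reference.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for i in range(n_atoms):
        # Time-averaged position of atom i
        mean = [
            sum(frame[i][k] for frame in trajectory) / n_frames
            for k in range(3)
        ]
        # Mean squared displacement of atom i from its average position
        msd = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
            for frame in trajectory
        ) / n_frames
        result.append(math.sqrt(msd))
    return result


def radius_of_gyration(frame, masses=None):
    """Radius of gyration of one frame; all masses default to 1 (assumption)."""
    if masses is None:
        masses = [1.0] * len(frame)
    total_mass = sum(masses)
    # Mass-weighted center of the structure
    center = [
        sum(m * atom[k] for m, atom in zip(masses, frame)) / total_mass
        for k in range(3)
    ]
    # Mass-weighted mean squared distance from the center
    weighted_sq = sum(
        m * sum((atom[k] - center[k]) ** 2 for k in range(3))
        for m, atom in zip(masses, frame)
    )
    return math.sqrt(weighted_sq / total_mass)
```

                                     <p>
                                         Evaluating <code>radius_of_gyration</code> for every frame gives the time
                                         series whose convergence is discussed below, while <code>rmsf</code>
                                         returns one value per atom for the whole trajectory.
                                     </p>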
  
Line 810: Line 895:
 
                                     <h3>First indicators</h3>
 
                                     <h3>First indicators</h3>
 
                                     <p>
 
                                     <p>
                                         The first possible indicators of a stable protein structure are converging RMSD,
+
                                         The first possible indicators of a stable protein structure are converging
                                         small RMSF values
+
                                        root-mean-square deviation (RMSD),
 +
                                         small root-mean-square
 +
                                        fluctuation (RMSF) values
 
                                         as well as converging radii of gyration. Using Python and
 
                                         as well as converging radii of gyration. Using Python and
 
                                         the Biotite package, we calculated
 
                                         the Biotite package, we calculated
                                         these quantities and plotted the results for both candidate S_14771 and
+
                                         these quantities and plotted the results for both candidate <i>S_14771</i> and
                                         candidate CASP12.
+
                                         candidate <i>CASP12</i>.
 
                                     </p>
 
                                     </p>
 
                                     <div class="row">
 
                                     <div class="row">
 
                                         <div class="figurcolumn column" style="width: 50%; float: left; padding: 1em;">
 
                                         <div class="figurcolumn column" style="width: 50%; float: left; padding: 1em;">
                                             <img class="img-fluid center"
+
                                             <a  target="_blank"
                                                src="https://2019.igem.org/wiki/images/4/4f/T--TU_Darmstadt--rmsd_s14771.png"
+
                                                href="https://2019.igem.org/wiki/images/4/4f/T--TU_Darmstadt--rmsd_s14771.png"><img
                                                style="width:100%">
+
                                                    class="img-fluid center"
                                             <p>
+
                                                    src="https://2019.igem.org/wiki/images/4/4f/T--TU_Darmstadt--rmsd_s14771.png"
                                                 <b>Figure 7: </b> The RMSD is one of three main indicators of a stable
+
                                                    style="width:100%"></a>
                                                protein structure of the MD simulation of
+
                                             <div class="caption">
                                                S_14771 over the period of 200,000 ps. As time progressed the RMSD
+
                                                 <p>
                                                increased with a smaller slope.
+
                                                    <b>Figure 7: </b> The RMSD is one of three main indicators of a
                                                The value stabilizes at a time of 110,000 ps and fluctuated around the
+
                                                    stable
                                                value of 6 &#8491;.
+
                                                    protein structure of the MD simulation of
                                            </p>
+
                                                    <i>S_14771</i> over the period of 200,000 ps. As time progressed the
 +
                                                    RMSD
 +
                                                    increased with a decreasing slope.
 +
                                                    The value stabilized at about 110,000 ps and then fluctuated around
 +
                                                    the
 +
                                                    value of 6 &#8491;.
 +
                                                </p>
 +
                                            </div>
 
                                         </div>
 
                                         </div>
  
 
                                         <div class="figurcolumn column" style="width: 50%; float: right; padding: 1em;">
 
                                         <div class="figurcolumn column" style="width: 50%; float: right; padding: 1em;">
                                             <img class="img-fluid center"
+
                                             <a  target="_blank"
                                                src="https://2019.igem.org/wiki/images/a/aa/T--TU_Darmstadt--rmsd_casp.png"
+
                                                href="https://2019.igem.org/wiki/images/a/aa/T--TU_Darmstadt--rmsd_casp.png"><img
                                                style="width:100%">
+
                                                    class="img-fluid center"
                                             <p>
+
                                                    src="https://2019.igem.org/wiki/images/a/aa/T--TU_Darmstadt--rmsd_casp.png"
                                                 <b>Figure 8: </b> At t = 40,000 ps already the RMSD has arrived at a
+
                                                    style="width:100%"></a>
                                                stable value, while at the same time
+
                                             <div class="caption">
                                                the gyration (fig x) radius decreases over time continuously. This
+
                                                 <p>
                                                information suggests the protein
+
                                                    <b>Figure 8: </b> At t = 40,000 ps already the RMSD has arrived at a
                                                might be folding and potentially develpoing secondary structures not
+
                                                    stable value, while at the same time
                                                present previously.
+
                                                    the radius of gyration (<b>Fig. 10</b>) decreases over time
                                            </p>
+
                                                    continuously.
 +
                                                    This
 +
                                                    information suggests the protein
 +
                                                    might be folding and potentially developing secondary structures not
 +
                                                    present previously.
 +
                                                </p>
 +
                                            </div>
 
                                         </div>
 
                                         </div>
 
                                         <div class="figurcolumn column" style="width: 50%; float: left; padding: 1em;">
 
                                         <div class="figurcolumn column" style="width: 50%; float: left; padding: 1em;">
                                             <img class="img-fluid center"
+
                                             <a  target="_blank"
                                                src="https://2019.igem.org/wiki/images/9/94/T--TU_Darmstadt--gyration_s14771.png"
+
                                                href="https://2019.igem.org/wiki/images/9/94/T--TU_Darmstadt--gyration_s14771.png"><img
                                                style="width:100%">
+
                                                    class="img-fluid center"
                                             <p>
+
                                                    src="https://2019.igem.org/wiki/images/9/94/T--TU_Darmstadt--gyration_s14771.png"
                                                 <b>Figure 9: </b> The prominent fluctuations of the residues from ranges
+
                                                    style="width:100%"></a>
                                                105 to 115 might
+
                                             <div class="caption">
                                                indicate a binding site or another form of functional structure. The
+
                                                 <p>
                                                radius of gyration, just as
+
                                                    <b>Figure 9: </b> The prominent fluctuations of the residues from
                                                the RMSD fig xyz, stabilizes around a simulation time of 110,000 ps
+
                                                    ranges
                                                and converges towards a value of
+
                                                    105 to 115 might
                                                16.7 &#8491;.
+
                                                    indicate a binding site or another form of functional structure. The
                                            </p>
+
                                                    radius of gyration, just as
 +
                                                    the RMSD (<b>Fig. 7</b>), stabilizes around a simulation time of
 +
                                                    110,000 ps
 +
                                                    and converges towards a value of
 +
                                                    16.7 &#8491;.
 +
                                                </p>
 +
                                            </div>
 
                                         </div>
 
                                         </div>
  
 
                                         <div class="figurcolumn column" style="width: 50%; float: right; padding: 1em;">
 
                                         <div class="figurcolumn column" style="width: 50%; float: right; padding: 1em;">
                                             <img class="img-fluid center"
+
                                             <a  target="_blank"
                                                src="https://2019.igem.org/wiki/images/0/03/T--TU_Darmstadt--gyration_casp.png"
+
                                                href="https://2019.igem.org/wiki/images/0/03/T--TU_Darmstadt--gyration_casp.png"><img
                                                style="width:100%">
+
                                                    class="img-fluid center"
                                             <p>
+
                                                    src="https://2019.igem.org/wiki/images/0/03/T--TU_Darmstadt--gyration_casp.png"
                                                 <b>Figure 10: </b> As from t = 40,000 ps the radius of gyration
+
                                                    style="width:100%"></a>
                                                decreases constantly. At the end of the simulation the gyration radius
+
                                             <div class="caption">
                                                reaches a value of 17 &#8491;.
+
                                                 <p>
                                                This behavior indicates folding of the protein structure.
+
                                                    <b>Figure 10: </b> From t = 40,000 ps onward, the radius of gyration
                                            </p>
+
                                                    decreases constantly. At the end of the simulation the gyration
 +
                                                    radius
 +
                                                    reaches a value of 17 &#8491;.
 +
                                                    This behavior indicates folding of the protein structure.
 +
                                                </p>
 +
                                            </div>
 
                                         </div>
 
                                         </div>
  
 
                                         <div class="figurcolumn column" style="width: 50%; float: left; padding: 1em;">
 
                                         <div class="figurcolumn column" style="width: 50%; float: left; padding: 1em;">
                                             <img class="img-fluid center"
+
                                             <a  target="_blank"
                                                src="https://2019.igem.org/wiki/images/f/f4/T--TU_Darmstadt--rmsf_s14771.png"
+
                                                href="https://2019.igem.org/wiki/images/f/f4/T--TU_Darmstadt--rmsf_s14771.png">
                                                style="width:100%">
+
                                                <img class="img-fluid center"
                                             <p>
+
                                                    src="https://2019.igem.org/wiki/images/f/f4/T--TU_Darmstadt--rmsf_s14771.png"
                                                 <b>Figure 11: </b> The fluctuations
+
                                                    style="width:100%"></a>
                                                (RMSF) of most residues appear insignificant compared to the first, the
+
                                             <div class="caption">
                                                last residues and
+
                                                 <p>
                                                the residues close to residue 110. Typically the N- and C-termini tend
+
                                                    <b>Figure 11: </b> The fluctuations
                                                to fluctuate more intensively due to the lack of
+
                                                    (RMSF) of most residues appear insignificant compared to the first,
                                                stabilizing structures. The prominent fluctuations in the range of
+
                                                    the
                                                residue 105 to 115
+
                                                    last residues and
                                                can indicate a binding site or another form of functional structure.
+
                                                    the residues close to residue 110. Typically the N- and C-terminus
                                            </p>
+
                                                    tend
 +
                                                    to fluctuate more intensively due to the lack of
 +
                                                    stabilizing structures. The prominent fluctuations in the range of
 +
                                                    residue 105 to 115
 +
                                                    can indicate a binding site or another form of functional structure.
 +
                                                </p>
 +
                                            </div>
 
                                         </div>
 
                                         </div>
  
 
                                         <div class="figurcolumn column" style="width: 50%; float: right; padding: 1em;">
 
                                         <div class="figurcolumn column" style="width: 50%; float: right; padding: 1em;">
                                             <img class="img-fluid center"
+
                                             <a  target="_blank"
                                                src="https://2019.igem.org/wiki/images/a/aa/T--TU_Darmstadt--rmsf_casp.png"
+
                                                href="https://2019.igem.org/wiki/images/a/aa/T--TU_Darmstadt--rmsf_casp.png">
                                                style="width:100%">
+
                                                <img class="img-fluid center"
                                             <p>
+
                                                    src="https://2019.igem.org/wiki/images/a/aa/T--TU_Darmstadt--rmsf_casp.png"
                                                 <b>Figure 12: </b> The prominent fluctuations of the residues from
+
                                                    style="width:100%"></a>
                                                ranges 105 to 115 might
+
                                             <div class="caption">
                                                indicate a binding site or another form of functional structure. The
+
                                                 <p>
                                                radius of gyration, just as
+
                                                    <b>Figure 12: </b> The prominent fluctuations of the residues from
                                                the RMSD fig xyz, stabilizes around a simulation time of 110,000 ps
+
                                                    ranges 105 to 115 might
                                                and converges towards a value of
+
                                                    indicate a binding site or another form of functional structure. The
                                                16.7 &#8491;.
+
                                                    radius of gyration, just as
                                            </p>
+
                                                    the RMSD (<b>Fig. 8</b>), stabilizes around a simulation time of
 +
                                                    110,000 ps
 +
                                                    and converges towards a value of
 +
                                                    16.7 &#8491;.
 +
                                                </p>
 +
                                            </div>
 
                                         </div>
 
                                         </div>
 
                                     </div>
 
                                     </div>
Line 909: Line 1,031:
 
                                         protein. Convergence of those quantities can be interpreted as a stable state of
 
                                         protein. Convergence of those quantities can be interpreted as a stable state of
 
                                         the protein
 
                                         the protein
                                         structure. As can be seen in Figures x and y, both the RMSD and the radius of
+
                                         structure. As can be seen in <b>Fig. 7</b> and <b>Fig. 9</b>, both the RMSD
 +
                                        and
 +
                                        the radius of
 
                                         gyration
 
                                         gyration
 
                                         stabilize at the same time as the simulation reaches 110,000 ps (110 ns),
 
                                         stabilize at the same time as the simulation reaches 110,000 ps (110 ns),
 
                                         suggesting a now
 
                                         suggesting a now
                                         stabilized structure of candidate S_14771 solvated in water. Another indicator
+
                                         stabilized structure of candidate <i>S_14771</i> solvated in water. Another
 +
                                        indicator
 
                                         of a
 
                                         of a
 
                                         functional protein is the RMSF. Instead of being averaged over all atoms, the
 
                                         functional protein is the RMSF. Instead of being averaged over all atoms, the
Line 919: Line 1,044:
 
                                         averaged over time with respect to each amino acid. It provides insights into both
 
                                         averaged over time with respect to each amino acid. It provides insights into both
 
                                         protein
 
                                         protein
                                         stability and functionality. Fig xzf reveals the RMSF of residues 105 to 115 to
+
                                         stability and functionality. <b>Fig. 11</b> reveals the RMSF of residues 105 to
 +
                                        115 to
 
                                         be
 
                                         be
 
                                         significantly higher than that of other residues. This hints at the presence of
 
                                         significantly higher than that of other residues. This hints at the presence of
Line 926: Line 1,052:
 
                                         describing our structure prediction approaches, the N-
 
                                         describing our structure prediction approaches, the N-
 
                                         and C-terminal regions tend to fluctuate more strongly as a result of the
 
                                         and C-terminal regions tend to fluctuate more strongly as a result of the
                                         absence of
+
                                         absence of stabilizing structures.
                                        stabilizing structures.
+
 
                                     </p>
 
                                     </p>
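The three quantities discussed above can be illustrated with a minimal sketch. It is not the analysis pipeline used for the figures (those values were obtained from the MD simulation); it assumes a toy trajectory stored as plain Python lists, equal masses for all atoms, and no structural fitting of frames onto the reference. The function names and the `trajectory` variable are hypothetical.

```python
import math

def rmsd(frame, reference):
    """RMSD of one frame from a reference frame (assumption: no fitting step)."""
    n_atoms = len(frame)
    total = sum((a - b) ** 2
                for p, q in zip(frame, reference)
                for a, b in zip(p, q))
    return math.sqrt(total / n_atoms)

def radius_of_gyration(frame):
    """Radius of gyration of one frame (assumption: all atoms have equal mass)."""
    n_atoms = len(frame)
    center = [sum(p[k] for p in frame) / n_atoms for k in range(3)]
    total = sum((p[k] - center[k]) ** 2 for p in frame for k in range(3))
    return math.sqrt(total / n_atoms)

def rmsf(trajectory):
    """Per-atom fluctuation around the time-averaged position of that atom."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for i in range(n_atoms):
        mean = [sum(f[i][k] for f in trajectory) / n_frames for k in range(3)]
        squared = sum((f[i][k] - mean[k]) ** 2
                      for f in trajectory for k in range(3))
        result.append(math.sqrt(squared / n_frames))
    return result

# Toy trajectory with two frames: atom 0 is fixed, atom 1 moves along x.
trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
```

In this toy case only the moving atom has a non-zero RMSF. The same reasoning applies to a residue range such as 105 to 115, which stands out against a rigid core because RMSD and radius of gyration are averaged over atoms, whereas RMSF is averaged over time per atom.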
 
                                     <p>
 
                                     <p>
                                         RMSD and radius of gyration calculations of candidate CASP12 (figures x and y)
+
                                         RMSD and radius of gyration calculations of candidate <i>CASP12</i> (<b>Fig.
 +
                                            8</b>
 +
                                        and <b>Fig. 10</b>)
 
                                         provide evidence of folding.
 
                                         provide evidence of folding.
 
                                         However, the RMSF values are significantly higher, an
 
                                         However, the RMSF values are significantly higher, an
Line 937: Line 1,064:
 
                                         residue 105 to
 
                                         residue 105 to
 
                                         115. This insight consolidates the theory that residues 105 to 115 might be a
 
                                         115. This insight consolidates the theory that residues 105 to 115 might be a
                                         part of a
+
                                         part of a functional unit.
                                        functional unit.
+
 
                                     </p>
 
                                     </p>
 
                                     <p>
 
                                     <p>
                                         We were unsure whether candidate CASP12 can be considered a plausible structure
+
                                         We were unsure whether candidate <i>CASP12</i> can be considered a plausible
 +
                                        structure
 
                                         and
 
                                         and
 
                                         how to interpret the findings concerning the prominent fluctuations. Therefore,
 
                                         how to interpret the findings concerning the prominent fluctuations. Therefore,
Line 951: Line 1,078:
 
                                     <p>
 
                                     <p>
 
                                         To analyze our system further, Principal Component Analysis (PCA) was performed
 
                                         To analyze our system further, Principal Component Analysis (PCA) was performed
                                         using GROMACS.
+
                                         using <i>GROMACS</i>. By applying PCA to an MD simulation of a protein, it is
 +
                                        possible
 +
                                        to extract the most relevant motions of the protein.
 
                                     </p>
 
                                     </p>
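The idea behind PCA on a trajectory can be sketched as follows: each frame is flattened into one coordinate vector, the covariance matrix of these vectors is built, and its eigenvector with the largest eigenvalue describes the most relevant collective motion. The sketch below is only an illustration in plain Python using power iteration; it is not the GROMACS implementation, and the function names and the `samples` data are hypothetical.

```python
import math

def covariance_matrix(samples):
    """Covariance of flattened coordinate vectors (one vector per frame)."""
    n = len(samples)
    d = len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    centered = [[s[j] - mean[j] for j in range(d)] for s in samples]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def leading_component(cov, iterations=200):
    """Largest eigenvalue and its eigenvector, found by power iteration.

    Assumption: the start vector is not orthogonal to the leading
    eigenvector and the covariance matrix is not all zeros.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the normalized vector gives the eigenvalue.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return eigenvalue, v

# Toy data: coordinate 0 varies strongly, coordinate 1 only slightly.
samples = [(-2.0, 0.1), (-1.0, -0.1), (1.0, 0.1), (2.0, -0.1)]
eigenvalue, direction = leading_component(covariance_matrix(samples))
```

For the toy data the leading component points almost entirely along coordinate 0, the direction with the largest variance. In a real trajectory the corresponding eigenvector would be reshaped back into per-atom displacement vectors to visualize the motion.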
 
                                     <div class="figurcolumn column" style="width: 50%; float: left; padding: 1em;">
 
                                     <div class="figurcolumn column" style="width: 50%; float: left; padding: 1em;">
                                         <img class="img-fluid center"
+
                                         <a  target="_blank"
                                            src="https://2019.igem.org/wiki/images/d/db/T--TU_Darmstadt--modes_s14771.gif"
+
                                            href="https://2019.igem.org/wiki/images/d/db/T--TU_Darmstadt--modes_s14771.gif"><img
                                            style="width:100%">
+
                                                class="img-fluid center"
                                         <p><b>Animation 4: </b> A Principal Component Analysis of a fast (blue) and a
+
                                                src="https://2019.igem.org/wiki/images/d/db/T--TU_Darmstadt--modes_s14771.gif"
                                            slow (red) mode showing the most prominent movements of the C&alpha;-chain
+
                                                style="width:100%"></a>
                                            of candidate S_14771. Both modes show movement of the &beta;6&#47;&beta;7
+
                                         <div class="caption">
                                            loop consisting of residues 105 to 115 towards the active site . Thus we can
+
                                            <p><b>Animation 5: </b> A Principal Component Analysis of a less (blue) and
                                            assume that the closing &beta;6&#47;&beta;7 loop is involved in the reaction
+
                                                a
                                            mechanism. </p>
+
                                                more relevant (red) principal movement, showing the most prominent
 +
                                                movements of the
 +
                                                C&alpha;-chain
 +
                                                of candidate <i>S_14771</i>. Both principal components show movement of
 +
                                                the
 +
                                                &beta;6&#47;&beta;7
 +
                                                loop consisting of residues 105 to 115 towards the active site. Thus we
 +
                                                can
 +
                                                assume that the closing &beta;6&#47;&beta;7 loop is involved in the
 +
                                                reaction
 +
                                                mechanism. </p>
 +
                                        </div>
 
                                     </div>
 
                                     </div>
  
 
                                     <div class="figurcolumn column" style="width: 50%; float: right; padding: 1em;">
 
                                     <div class="figurcolumn column" style="width: 50%; float: right; padding: 1em;">
                                         <img class="img-fluid center"
+
                                         <a  target="_blank"
 +
                                            href="https://2019.igem.org/wiki/images/1/17/T--TU_Darmstadt--modes_casp.gif"><img class="img-fluid center"
 
                                             src="https://2019.igem.org/wiki/images/1/17/T--TU_Darmstadt--modes_casp.gif"
 
                                             src="https://2019.igem.org/wiki/images/1/17/T--TU_Darmstadt--modes_casp.gif"
                                             style="width:100%">
+
                                             style="width:100%"></a>
                                         <p><b>Animation 5: </b> The modes of candidate CASP appear similar to each other
+
                                         <div class="caption">
                                            and no strong single movement can be specified. This makes the slow (red)
+
                                            <p><b>Animation 6: </b> The principal movements of candidate <i>CASP</i>
                                            and fast (blue) mode indistinguishable from one another. Moreover the active
+
                                                appear
                                            site amino acids do not appear to be in close proximity, which would make a
+
                                                similar to each
                                            reaction catalyzed by candidate CASP12 impossible. </p>
+
                                                other
 +
                                                and no strong single movement can be specified. This makes the most
 +
                                                relevant
 +
                                                (red)
 +
                                                and less relevant (blue) principal component indistinguishable from one
 +
                                                another. Moreover the
 +
                                                active
 +
                                                site amino acids do not appear to be in close proximity, which would
 +
                                                make a
 +
                                                reaction catalyzed by candidate <i>CASP12</i> impossible. </p>
 +
                                        </div>
 
                                     </div>
 
                                     </div>
  
 
                                     <p>
 
                                     <p>
                                         The results from the Principal Component Analysis of candidate S_14771
+
                                         The results from the Principal Component Analysis of candidate <i>S_14771</i>
                                         (animation xy) show a movement of the residues 105 to 115 towards the active
+
                                         (<b>Animation 5</b>) show a movement of the residues 105 to 115 towards the
                                         site, supporting our theory that residues 105 to 115 are important for the
+
                                        active
                                         reaction mechanism. Since the slow mode (red), which shows the most relevant
+
                                         site,
                                         movement of the sortase, moves further towards the active site, it is possible
+
                                        <sup id="cite_ref-12" class="reference">
 +
                                            <a href="#cite_note-12">[12] </a>
 +
                                        </sup>
 +
                                        supporting our theory that residues 105 to 115 are important for the
 +
                                         reaction mechanism. Since the most relevant vibrational
 +
                                         movement of the sortase (red) is directed towards the active site, it is
 +
                                        possible
 
                                         that the &beta;6&#47;&beta;7 loop either closes the binding site of the ligand
 
                                         that the &beta;6&#47;&beta;7 loop either closes the binding site of the ligand
 
                                         peptides or even transports one peptide towards the other.
 
                                         peptides or even transports one peptide towards the other.
Line 987: Line 1,144:
  
 
                                     <p>
 
                                     <p>
                                         Animation xyz shows the results of the Principal Component Analysis of candidate
+
                                         <b>Animation 6</b> shows the results of the Principal Component Analysis of
                                         CASP12. As the RMSF calculations suggested (fig xyz), the whole protein seems to
+
                                         candidate <i>CASP12</i>. As the RMSF calculations suggested (<b>Fig. 12</b>),
 +
                                        the whole protein seems to
 
                                         be moving randomly with no directed movement.
 
                                         be moving randomly with no directed movement.
                                         In addition the active site amino acids
+
                                         In addition the active site residues
                                         <!-- ref --> are spread across the protein, confirming our assumption that the
+
                                         <sup id="cite_ref-12" class="reference">
 +
                                            <a href="#cite_note-12">[12] </a>
 +
                                        </sup>
 +
                                        are spread across the protein, confirming our assumption that the
 
                                         protein is not in a stable or plausible conformation.
 
                                         protein is not in a stable or plausible conformation.
 
                                     </p>
 
                                     </p>
Line 997: Line 1,158:
 
                                     <h2>Conclusion</h2>
 
                                     <h2>Conclusion</h2>
 
                                     <p>
 
                                     <p>
                                         We gained evidence that at least on of our Sortase A7M models is a valid and
+
                                         We gained evidence that at least one of our Sortase A7M models is a valid and
 
                                         stable candidate by applying various methods to analyse the structural
 
                                         stable candidate by applying various methods to analyse the structural
                                         stability and validity of our two Sortase A7M candidates. The candidate S_14771
+
                                         stability and validity of our two Sortase A7M candidates. The candidate
 +
                                        <i>S_14771</i>
 
                                         that was generated using <i>RosettaCM</i> appears to be a fitting candidate not
 
                                         that was generated using <i>RosettaCM</i> appears to be a fitting candidate not
 
                                         only due to successful analyses, but also since the residues of the active site
 
                                         only due to successful analyses, but also since the residues of the active site
                                         <!-- ref --> are close enough to each other to catalyze a ligation reaction.
+
                                         <sup id="cite_ref-12" class="reference">
 +
                                            <a href="#cite_note-12">[12] </a>
 +
                                        </sup>
 +
                                        are close enough to each other to catalyze a ligation reaction.
 
                                         Our model created through deep learning excelled only in terms of RMSD and
 
                                         Our model created through deep learning excelled only in terms of RMSD and
 
                                         gyration radius calculations. Not only the RMSF and Principal Component Analysis
 
                                         gyration radius calculations. Not only the RMSF and Principal Component Analysis
                                         but also the conformation of the active site have proven candidate CASP12 to be
+
                                         but also the conformation of the active site have proven candidate <i>CASP12</i>
 +
                                        to be
 
                                         of no use for further calculations as it does not portray a valid conformation
 
                                         of no use for further calculations as it does not portray a valid conformation
 
                                         of Sortase A7M.
 
                                         of Sortase A7M.
 
                                     </p>
 
                                     </p>
 
                                     </p>
 
                                     </p>
 
 
                                    <h2>References</h2>
 
                                    <ol class="references">
 
                                        <li id="cite_note-1">
 
                                            <span class="mw-cite-backlink">
 
                                                <a href="#cite_ref-1">↑</a>
 
                                            </span>
 
                                            <span class="reference-text">
 
                                                Apol, E. et al. GROMACS
 
                                                USER MANUAL. Department of Biophysical Chemistry, University of
 
                                                Groningen.
 
                                                2015.
 
                                                <a rel="nofollow" class="external autonumber"
 
                                                    href="https://www.biorxiv.org/content/10.1101/265231v1"
 
                                                    target="_blank">[1] </a>
 
                                            </span>
 
                                        </li>
 
                                    </ol>
 
 
 
                                 </div>
 
                                 </div>
 
                             </div>
 
                             </div>
Line 1,036: Line 1,182:
 
                     </div>
 
                     </div>
 
                 </div>
 
                 </div>
 +
 +
 
                 <div class="tab my-3">
 
                 <div class="tab my-3">
 
                     <button class="btn btn-block" id="tab1" data-toggle="collapse" data-target="#body4"
 
                     <button class="btn btn-block" id="tab1" data-toggle="collapse" data-target="#body4"
Line 1,058: Line 1,206:
 
                                         The procedure of choice
 
                                         The procedure of choice
 
                                         for the introduction of a ligand into the binding site of a protein is called
 
                                         for the introduction of a ligand into the binding site of a protein is called
                                         <i>docking</i>. In the
+
                                         docking. In the
 
                                         following sections, we will present the protocol and methods we used as well as
 
                                         following sections, we will present the protocol and methods we used as well as
 
                                         the results they yielded.
 
                                         the results they yielded.
Line 1,071: Line 1,219:
 
                                         affinity.
 
                                         affinity.
 
                                         To determine the best possible binding conformation of the protein-ligand
 
                                         To determine the best possible binding conformation of the protein-ligand
                                         complex, we use FlexPepDock, an algorithm provided by the RosettaCommons
+
                                         complex, we use <i>FlexPepDock</i>,
 +
                                        <sup id="cite_ref-13" class="reference">
 +
                                            <a href="#cite_note-13">[13] </a>
 +
                                        </sup>
 +
                                        an algorithm provided by the <i>RosettaCommons</i>
 
                                         software package.
 
                                         software package.
 
                                     </p>
 
                                     </p>
Line 1,078: Line 1,230:
 
                                     <p>
 
                                     <p>
 
                                         The ab-initio FlexPepDock protocol consists of multiple steps and is documented
 
                                         The ab-initio FlexPepDock protocol consists of multiple steps and is documented
                                         in the RosettaCommons <a href="">online documentation</a>. We modified the
+
                                         in the <i>RosettaCommons</i> <a
 +
                                            href="https://www.rosettacommons.org/docs/latest/application_documentation/docking/flex-pep-dock"
 +
                                            target="_blank">online documentation</a>.
 +
                                        We modified the
 
                                         protocol as the one provided did not work with our approach.
 
                                         protocol as the one provided did not work with our approach.
 
                                         The modified protocol has the following form:
 
                                         The modified protocol has the following form:
Line 1,085: Line 1,240:
 
                                         <li>secondary structure determination</li>
 
                                         <li>secondary structure determination</li>
 
                                         <li>complex creation</li>
 
                                         <li>complex creation</li>
                                         <li>FlexPepDock refinement</li>
+
                                         <li><i>FlexPepDock</i> refinement</li>
 
                                     </ol>
 
                                     </ol>
 
                                     <br>
 
                                     <br>
 
                                     <p>
 
                                     <p>
                                         To determine the secondary structure of the peptide, fragment files (3- and
+
                                         To determine the secondary structure of the peptide, fragment files (3-, 5- and
                                         5-mers) had to be generated and a PSIPRED secondary structure prediction had to
+
                                         9-mers)
 +
                                        had to be generated and a PSIPRED secondary structure prediction
 +
                                        <sup id="cite_ref-15" class="reference">
 +
                                            <a href="#cite_note-15">[15] </a>
 +
                                        </sup>
 +
                                        had to
 
                                         be performed. As the peptides had a sequence length of less than 20 amino acids, we
 
                                         be performed. As the peptides had a sequence length of less than 20 amino acids, we
 
                                         were not able to use the online services such as <a
 
                                         were not able to use the online services such as <a
                                             href="http://robetta.bakerlab.org/">Robetta</a> and the <a
+
                                             href="http://robetta.bakerlab.org/" target="_blank">Robetta</a> and the <a
                                             href="http://bioinf.cs.ucl.ac.uk/psipred/">PSIPRED online service</a>.
+
                                             href="http://bioinf.cs.ucl.ac.uk/psipred/" target="_blank">PSIPRED online service</a>.
                                         Instead we used the Rosetta <a
+
                                         Instead we used the Rosetta <a target="_blank"
 
                                             href="https://www.rosettacommons.org/docs/latest/application_documentation/utilities/app-fragment-picker">FragmentPicker
 
                                             href="https://www.rosettacommons.org/docs/latest/application_documentation/utilities/app-fragment-picker">FragmentPicker
 
                                             application</a> and the PSIPRED <a
 
                                             application</a> and the PSIPRED <a
                                             href="https://github.com/psipred/psipred">command line tool</a>.
+
                                             href="https://github.com/psipred/psipred" target="_blank">command line tool</a>.
 
                                         The generated structures serve as the input for the refinement protocol.
 
                                         The generated structures serve as the input for the refinement protocol.
 
                                         <br>
 
                                         <br>
Line 1,110: Line 1,270:
 
                                     <br>
 
                                     <br>
 
                                     <p>
 
                                     <p>
                                         The peptide structure was created through ab-initio modeling.
+
                                         The peptide structure was created through <i>ab-initio</i> modeling.
 
                                         Initial creation of the peptide was followed by insertion of the peptide into
 
                                         Initial creation of the peptide was followed by insertion of the peptide into
 
                                         the sortase binding site. This led to a coarse model of the peptide-sortase
 
                                         the sortase binding site. This led to a coarse model of the peptide-sortase
 
                                         complex. Here we used insight gained from the molecular dynamics simulation to
 
                                         complex. Here we used insight gained from the molecular dynamics simulation to
 
                                         place the peptide close to the binding site.
 
                                         place the peptide close to the binding site.
                                         <!-- vielleicht hier schon biotite erwähnen -->
+
                                        This operation was performed using Biotite.
 +
                                         <sup id="cite_ref-10" class="reference">
 +
                                            <a href="#cite_note-10">[10] </a>
 +
                                        </sup>
 
                                         <br>
 
                                         <br>
                                         In the final step the FlexPepDock refinement protocol is executed and 50,000
+
                                         In the final step the <i>FlexPepDock</i> refinement protocol is executed and
 +
                                        50,000
 
                                         complex structures are generated. We used the inputs as described in
 
                                         complex structures are generated. We used the inputs as described in
                                         {{fuhrman paper}}, written by the authors of the FlexPepDock documentation.
+
                                         <span id="cite_ref-13" class="reference">
 +
                                            <a href="#cite_note-13">[13] </a>
 +
                                        </span>, written by the authors of the <i>FlexPepDock</i> documentation.
 
                                         <br>
 
                                         <br>
 
                                         To get a better overview of our data, we performed a clustering in Python,
 
                                         To get a better overview of our data, we performed a clustering in Python,
                                         using the scikit-learn package. We clustered the structures with respect to:
+
                                         using the scikit-learn
 +
                                        <sup id="cite_ref-14" class="reference">
 +
                                            <a href="#cite_note-14">[14] </a>
 +
                                        </sup>
 +
                                        package. We clustered the structures with respect to:
 
                                     </p>
 
                                     </p>
 
                                     <ul>
 
                                     <ul>
Line 1,137: Line 1,307:
 
                                     <br>
 
                                     <br>
 
                                     <p>
 
                                     <p>
                                         Here clustering is used to group the docking results and thereby descrease the
+
                                         Here clustering is used to group the docking results and thereby decrease the
 
                                         sample size.
 
                                         sample size.
 
                                         From the 50,000 results we picked the results with the 500 best total scores,
 
                                         From the 50,000 results we picked the results with the 500 best total scores,
Line 1,155: Line 1,325:
 
                                     </p>
 
                                     </p>
  
                                     <div class="figurcolumn column" style="width: 50%; float: center; padding: 1em;">
+
                                     <div class="figurcolumn column"
                                         <img class="img-fluid center"
+
                                        style="width: 50%; float: center; padding: 1em; margin: 0 auto;">
 +
                                         <a  target="_blank"
 +
                                            href="https://2019.igem.org/wiki/images/7/78/T--TU_Darmstadt--dock_lpetgg.png"><img class="img-fluid center"
 
                                             src="https://2019.igem.org/wiki/images/7/78/T--TU_Darmstadt--dock_lpetgg.png"
 
                                             src="https://2019.igem.org/wiki/images/7/78/T--TU_Darmstadt--dock_lpetgg.png"
                                             style="width:100%">
+
                                             style="width:100%"></a>
                                         <p><b>Figure 13: </b> The three best scoring structures (total score, interface
+
                                         <div class="caption">
                                            score, reweighted score) of the LPETGG-tag are shown. Only two results are
+
                                            <p><b>Figure 13: </b> The three best scoring structures (total score,
                                            visible as the best reweighted score candidate is identical to the best
+
                                                interface
                                            interface score candidate. The reacting section of the LPETGG-tag namely
+
                                                score, reweighted score) of the LPETGG-tag are shown. Only two results
                                            glycine is colored yellow as is the active site. The glycine of both ligand
+
                                                are
                                            peptides is facing the active site. </p>
+
                                                visible as the best reweighted score candidate is identical to the best
 +
                                                interface score candidate. The reacting section of the LPETGG-tag namely
 +
                                                glycine is colored yellow as is the active site. The glycine of both
 +
                                                ligand
 +
                                                peptides is facing the active site. </p>
 +
                                        </div>
 
                                     </div>
 
                                     </div>
 
                                     <p>
 
                                     <p>
Line 1,174: Line 1,351:
 
                                     <div class="row">
 
                                     <div class="row">
 
                                         <div class="figurcolumn column" style="width: 50%; float: left; padding: 1em;">
 
                                         <div class="figurcolumn column" style="width: 50%; float: left; padding: 1em;">
                                             <img class="img-fluid center"
+
                                             <a  target="_blank"
 +
                                                href="https://2019.igem.org/wiki/images/8/8d/T--TU_Darmstadt--dock_polyg.png"><img class="img-fluid center"
 
                                                 src="https://2019.igem.org/wiki/images/8/8d/T--TU_Darmstadt--dock_polyg.png"
 
                                                 src="https://2019.igem.org/wiki/images/8/8d/T--TU_Darmstadt--dock_polyg.png"
                                                 style="width:100%">
+
                                                 style="width:100%"></a>
                                             <p><b>Figure 14: </b>The three best scoring structures (total score,
+
                                             <div class="caption">
                                                interface score, reweighted score) of the poly-g peptide are shown. Only
+
                                                <p><b>Figure 14: </b>The three best scoring structures (total score,
                                                two results are visible as the best reweighted score candidate is
+
                                                    interface score, reweighted score) of the poly-g peptide are shown.
                                                identical to the best interface score candidate. Instead of facing the
+
                                                    Only
                                                active site (yellow) the reacting glycines (yellow) appear to interact
+
                                                    two results are visible as the best reweighted score candidate is
                                                with the &beta;6&#47;&beta;7 loop of the sortase. </p>
+
                                                    identical to the best interface score candidate. Instead of facing
 +
                                                    the
 +
                                                    active site (yellow) the reacting glycines (yellow) appear to
 +
                                                    interact
 +
                                                    with the &beta;6&#47;&beta;7 loop of the sortase. </p>
 +
                                            </div>
 
                                         </div>
 
                                         </div>
 
                                         <div class="figurcolumn column" style="width: 50%; float: right; padding: 1em;">
 
                                         <div class="figurcolumn column" style="width: 50%; float: right; padding: 1em;">
                                             <img class="img-fluid center"
+
                                             <a  target="_blank"
 +
                                                href="https://2019.igem.org/wiki/images/9/92/T--TU_Darmstadt--dock_mpolyg.png"><img class="img-fluid center"
 
                                                 src="https://2019.igem.org/wiki/images/9/92/T--TU_Darmstadt--dock_mpolyg.png"
 
                                                 src="https://2019.igem.org/wiki/images/9/92/T--TU_Darmstadt--dock_mpolyg.png"
                                                 style="width:100%">
+
                                                 style="width:100%"></a>
                                             <p><b>Figure 15: </b>The three best scoring structures (total score,
+
                                             <div class="caption">
                                                interface score, reweighted score) of the poly-g peptide are shown. Only
+
                                                <p><b>Figure 15: </b>The three best scoring structures (total score,
                                                two results are visible as the best reweighted score candidate is
+
                                                    interface score, reweighted score) of the M-polyG peptide are
                                                identical to the best interface score candidate.
+
                                                    shown.
                                                Concerning the M-poly-G peptide no uniform directional orientation can
+
                                                    Only two results are visible as the best reweighted score candidate
                                                be observed.
+
                                                    is
                                                The structure with the best interface score (light blue) is oriented
+
                                                    identical to the best interface score candidate.
                                                towards the loop while the structure with the best total/reweighted score
+
                                                    Concerning the M-polyG peptide no uniform directional orientation
                                                (dark blue) is oriented towards the &beta;-sheets.</p>
+
                                                    can
 +
                                                    be observed.
 +
                                                    The structure with the best interface score (light blue) is
 +
                                                    oriented
 +
                                                    towards the &beta;6&#47;&beta;7 loop while the structure with the
 +
                                                    best total/reweighted score
 +
                                                    (dark blue) is oriented towards the &beta;-sheets.</p>
 +
                                            </div>
 
                                         </div>
 
                                         </div>
 
                                     </div>
 
                                     </div>
 
                                     <!-- see more button instead oben halt und so -->
 
                                     <!-- see more button instead oben halt und so -->
 
                                     <p>
 
                                     <p>
                                        Figure lpetgg
+
                                         <b>Fig. 13</b> shows the docking result of the LPETGG peptide to
                                         <!-- das auch noch ändern --> shows the docking result of the LPETGG peptide to
+
 
                                         the sortase. The results shown are the best scoring structures of the clustering
 
                                         the sortase. The results shown are the best scoring structures of the clustering
 
                                         with respect to the total score, interface score and reweighted score. As the
 
                                         with respect to the total score, interface score and reweighted score. As the
 
                                         best scoring structure is the same for the total score and the reweighted score
 
                                         best scoring structure is the same for the total score and the reweighted score
                                         only two peptides are shown. This also applies to figures x and y. For both
+
                                         only two peptides are shown. This also applies to <b>Fig. 14</b> and <b>Fig.
 +
                                            15</b>. For both
 
                                         results the reacting glycine residues (yellow) are facing the active site.
 
                                         results the reacting glycine residues (yellow) are facing the active site.
 
                                         Additionally, the same residues are in close proximity to the active site.
 
                                         Additionally, the same residues are in close proximity to the active site.
 
                                     </p>
 
                                     </p>
 
                                     <p>
 
                                     <p>
                                         The figures x ad y show the docking of the both polyG and M-polyG. While polyG
+
                                         <b>Fig. 14</b> and <b>Fig. 15</b> show the docking of both polyG and
 +
                                        M-polyG. While polyG
 
                                         results align well and seem to be interacting with the &beta;6&#47;&beta;7 loop
 
                                         results align well and seem to be interacting with the &beta;6&#47;&beta;7 loop
 
                                         rather than with the active site, this does not seem to be the case for M-polyG.
 
                                         rather than with the active site, this does not seem to be the case for M-polyG.
Line 1,222: Line 1,414:
 
                                     <div class="row">
 
                                     <div class="row">
 
                                         <div class="figurcolumn column" style="width: 50%; float: left; padding: 1em;">
 
                                         <div class="figurcolumn column" style="width: 50%; float: left; padding: 1em;">
                                             <img class="img-fluid center"
+
                                             <a target="_blank" href="https://2019.igem.org/wiki/images/7/76/T--TU_Darmstadt--dock_zoom_active.png"><img class="img-fluid center"
 
                                                 src="https://2019.igem.org/wiki/images/7/76/T--TU_Darmstadt--dock_zoom_active.png"
 
                                                 src="https://2019.igem.org/wiki/images/7/76/T--TU_Darmstadt--dock_zoom_active.png"
                                                 style="width:100%">
+
                                                 style="width:100%"></a>
                                             <p><b>Figure 16: </b>The close up of the M-polyG peptide (best
+
                                             <div class="caption">
                                                total/reweighted score) indicates an interaction of methionine with
+
                                                <p><b>Figure 16: </b>The close up of the M-polyG peptide (best
                                                arginine<sub>139</sub> and cysteine<sub>126</sub>. </p>
+
                                                    total/reweighted score) indicates an interaction of methionine with
 +
                                                    arginine<sub>139</sub> and cysteine<sub>126</sub>. </p>
 +
                                            </div>
 
                                         </div>
 
                                         </div>
 
                                         <div class="figurcolumn column" style="width: 50%; float: right; padding: 1em;">
 
                                         <div class="figurcolumn column" style="width: 50%; float: right; padding: 1em;">
                                             <img class="img-fluid center"
+
                                             <a  target="_blank"
 +
                                                href="https://2019.igem.org/wiki/images/4/48/T--TU_Darmstadt--dock_zoom_loop.png"><img class="img-fluid center"
 
                                                 src="https://2019.igem.org/wiki/images/4/48/T--TU_Darmstadt--dock_zoom_loop.png"
 
                                                 src="https://2019.igem.org/wiki/images/4/48/T--TU_Darmstadt--dock_zoom_loop.png"
                                                 style="width:100%">
+
                                                 style="width:100%"></a>
                                             <p><b>Figure 17: </b> Methionine of the result with the best interface score
+
                                             <div class="caption">
                                                interacted with the &beta;6&#47;&beta;7 loop rather than the active
+
                                                <p><b>Figure 17: </b> Methionine of the result with the best interface
                                                site. Still the reactive glycine residues appear to be bound to the
+
                                                    score
                                                &beta;6&#47;&beta;7 loop. </p>
+
                                                    interacted with the &beta;6&#47;&beta;7 loop rather than the active
 +
                                                    site. Still the reactive glycine residues appear to be bound to the
 +
                                                    &beta;6&#47;&beta;7 loop. </p>
 +
                                            </div>
 
                                         </div>
 
                                         </div>
 
                                     </div>
 
                                     </div>
  
 
                                     <p>
 
                                     <p>
                                         As can be seen in figure 16 visualizing the result of the docking simulation
+
                                         As can be seen in <b>Fig. 16</b>, visualizing the result of the docking
                                         total/reweighted score) suggests an interaction of methionine and two of the
+
                                        simulation
 +
                                         (total/reweighted score) suggests an interaction of methionine and two of the
 
                                         active site residues, namely arginine<sub>139</sub> and cysteine<sub>126</sub>.
 
                                         active site residues, namely arginine<sub>139</sub> and cysteine<sub>126</sub>.
                                         <!-- metionin erwähnen -->
+
                                         <b>Fig. 17</b> shows the interaction of M-polyG with the &beta;6&#47;&beta;7
                                        Visualizing the result of the according docking simulation, as can be seen in
+
                                        loop.
                                        figure 16, suggests an interaction between methionine and two active site
+
                                        residues, namely arginine<sub>139</sub> and cysteine<sub>126</sub>.
+
                                        Figure 17 shows the interaction of M-polyG with the &beta;6&#47;&beta;7 loop.
+
 
                                         The glycines still interact with the &beta;6&#47;&beta;7 loop.
 
                                         The glycines still interact with the &beta;6&#47;&beta;7 loop.
 
                                         Instead of binding above the &beta;6&#47;&beta;7 loop, which is the case for
 
                                         Instead of binding above the &beta;6&#47;&beta;7 loop, which is the case for
                                         polyG as illustrated in fig z,
+
                                         polyG as illustrated in <b>Fig. 14</b>,
 
                                         the interaction seems to be influenced by methionine. By interacting with the
 
                                         the interaction seems to be influenced by methionine. By interacting with the
 
                                         residues in the &beta;-helix
 
                                         residues in the &beta;-helix
Line 1,276: Line 1,472:
 
                                         structures regarding the total score, the interface score and the reweighted
 
                                         structures regarding the total score, the interface score and the reweighted
 
                                         score using PyMOL.
 
                                         score using PyMOL.
 +
                                        <sup id="cite_ref-7" class="reference">
 +
                                            <a href="#cite_note-7">[7] </a>
 +
                                        </sup>
 
                                         Since the best structures with respect to total score and reweighted score were
 
                                         Since the best structures with respect to total score and reweighted score were
 
                                         the same for all simulations,
 
                                         the same for all simulations,
Line 1,281: Line 1,480:
 
                                         to all the peptides to simulate
 
                                         to all the peptides to simulate
 
                                         the modification of the VLPs with a small peptide.
 
                                         the modification of the VLPs with a small peptide.
                                        <!-- GRoß helices etc erwähnen als begründung -->
 
 
                                     </p>
 
                                     </p>
 
                                     <p>
 
                                     <p>
Line 1,299: Line 1,497:
 
                                         the interface score shows the M-polyG facing the mobile &beta;6&#47;&beta;7
 
                                         the interface score shows the M-polyG facing the mobile &beta;6&#47;&beta;7
 
                                         loop.
 
                                         loop.
                                         In contrast to the polyG peptide the lacking the methionine, the M-polyG peptide
+
                                         In contrast to the polyG peptide lacking the methionine, the M-polyG peptide
 
                                         is pulled down below the &beta;6&#47;&beta;7 loop by the methionine interacting
 
                                         is pulled down below the &beta;6&#47;&beta;7 loop by the methionine interacting
 
                                         with one of the &beta;-sheets leading to the active site. This is not the case
 
                                         with one of the &beta;-sheets leading to the active site. This is not the case
Line 1,330: Line 1,528:
 
                                 <div class="flex-center">
 
                                 <div class="flex-center">
  
                                     ª{•̃̾_•̃̾}ª
+
                                     <p>
 +
                                        For our project it was key to understand and characterize Sortase A7M.
 +
                                        As there is no annotated 3D structure for this specific Sortase, an <i>in
 +
                                            silico</i>
 +
                                        structure determination
 +
                                        was performed. This problem was tackled using two different approaches. The Deep
 +
                                        Learning approach did not yield a promising model as later analysis also
 +
                                        confirmed.
 +
                                        However, comparative modeling with <i>Rosetta</i> produced valid structures. We
 +
                                        used
 +
                                        the best structure, candidate <i>S_14771</i>, for extensive characterization.
 +
                                        We evaluated the model with regard to its secondary structure using Ramachandran
 +
                                        plots which suggested plausible secondary structures.
 +
                                    </p>
  
                                </div>
+
                                    <p>
                            </div>
+
                                        Molecular Dynamics simulations were used to investigate stability and dynamic
                        </div>
+
                                        properties of the candidate.
 +
                                        The RMSD and radius of gyration stabilized over the course of the simulation, a
 +
                                        first indicator of an equilibrated structure.
 +
                                        Interestingly, RMSF analysis showed strong fluctuations of residues 105 to 115.
 +
                                        We further investigated this by performing
 +
                                        Principal Component Analysis. Doing so, we extracted the principal movements of
 +
                                        the model. We could observe movement
 +
                                        of the &beta;6&#47;&beta;7 loop towards the active site, suggesting the presence
 +
                                        of a binding site.
 +
                                        Consequently, we performed docking simulations.
 +
                                    </p>
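                                    <p>
                                        The RMSF analysis mentioned above can be illustrated with a minimal sketch.
                                        It is not the code used for the project: the function name
                                        <code>rmsf_per_residue</code>, the toy coordinates and the assumption that all
                                        frames were already superimposed onto a common reference are ours, chosen for
                                        illustration only.
                                    </p>

```python
import math


def rmsf_per_residue(trajectory):
    """Return the root mean square fluctuation of every residue.

    `trajectory` is a list of frames; each frame is a list of (x, y, z)
    tuples, one per residue (for example the C-alpha positions).
    Assumption: the frames are already superimposed on a reference.
    """
    n_frames = len(trajectory)
    n_residues = len(trajectory[0])
    fluctuations = []
    for i in range(n_residues):
        # Time-averaged position of residue i.
        mean = [
            sum(frame[i][k] for frame in trajectory) / n_frames
            for k in range(3)
        ]
        # Mean squared deviation from that average position.
        msd = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
            for frame in trajectory
        ) / n_frames
        fluctuations.append(math.sqrt(msd))
    return fluctuations


# Hypothetical two-frame toy trajectory: residue 0 is rigid, residue 1 moves.
toy_trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
print(rmsf_per_residue(toy_trajectory))
```

                                    <p>
                                        In such a profile, a flexible region like residues 105 to 115 would appear as
                                        a stretch of high values next to the low values of the rigid core.
                                    </p>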
  
                    </div>
+
                                    <p>
                </div>
+
                                        FlexPepDock was used to conduct the docking simulations with target peptides.
 +
                                        Each run yielded 50,000 structures.
 +
                                        In multiple steps we reduced the number of complexes to 100 clusters with
 +
                                        respect to total, reweighted and interface score.
 +
                                        We extracted the best scoring complexes and investigated interactions.
 +
                                    </p>
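                                    <p>
                                        The reduction described above (keep the best total scores, then take the best
                                        complex of each cluster) could look roughly like the following sketch. The
                                        <code>Decoy</code> record, the function names and the toy values are
                                        hypothetical, and the cluster labels are assumed to come from a separate
                                        clustering step such as one done with scikit-learn. We also assume that a
                                        lower score is better, as is usual for Rosetta energies.
                                    </p>

```python
from collections import namedtuple

# Hypothetical record for one docked complex and its three scores.
Decoy = namedtuple("Decoy", ["name", "total", "interface", "reweighted"])


def top_n_by_total(decoys, n):
    """Keep the n decoys with the lowest (best) total score."""
    return sorted(decoys, key=lambda d: d.total)[:n]


def best_per_cluster(decoys, labels, score="total"):
    """Return {cluster label: best decoy} for the chosen score.

    `labels[i]` is the cluster label of `decoys[i]`; the labels are
    assumed to be produced by an external clustering step.
    """
    best = {}
    for decoy, label in zip(decoys, labels):
        current = best.get(label)
        if current is None or getattr(decoy, score) < getattr(current, score):
            best[label] = decoy
    return best


# Toy data standing in for the 50,000 docking results.
decoys = [
    Decoy("a", -10.0, -3.0, -12.0),
    Decoy("b", -12.0, -1.0, -11.0),
    Decoy("c", -5.0, -4.0, -6.0),
    Decoy("d", -11.0, -2.0, -13.0),
]
selected = top_n_by_total(decoys, 3)   # stands in for "500 best total scores"
labels = [0, 0, 1]                     # hypothetical cluster assignment
representatives = best_per_cluster(selected, labels, score="interface")
print({label: d.name for label, d in representatives.items()})
```

                                    <p>
                                        Calling <code>best_per_cluster</code> once per score (total, interface,
                                        reweighted) yields one representative per cluster and score, which can then
                                        be inspected visually.
                                    </p>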
  
                <div class="tab my-3">
+
                                    <p>
                    <button class="btn btn-block" id="tab1" data-toggle="collapse" data-target="#body5"
+
                                        For LPETGG we observed a uniform binding to the active site, fulfilling our
                        aria-expanded="false" aria-controls="collapseOne">
+
                                        expectation.
                        Acknowledgements and References
+
                                        Strikingly, polyG appeared to bind to the &beta;6&#47;&beta;7 loop in a uniform
                    </button>
+
                                        manner.
                </div>
+
                                        As is known from the literature, polyG is a functioning ligand of sortase.
 +
                                        Supported by literature and our data, <b>we postulate the following
 +
                                            mechanism</b>:
 +
                                        the &beta;6&#47;&beta;7 loop transports bound polyG towards the active site of
 +
                                        Sortase A7M, thereby lowering the activation energy of the linking reaction.
 +
                                    </p>
  
                <div class="collapse multi-collapse" id="body5">
+
                                    <p>
                    <div class="card card-body">
+
                                        As the theory is neither backed up by nor contradicts experimental data, further
                        <div class="row">
+
                                        research is required.
                            <div class="col-xs-12 col-sm-12 col-md-2">
+
                                     </p>
                                <img class="img-fluid"
+
                                    src="https://2019.igem.org/wiki/images/c/c3/T--TU_Darmstadt--model_aknowledge.png"
+
                                     style="max-width:100%;">
+
                            </div>
+
                            <div class="col-xs-12 col-sm-12 col-md-10">
+
                                <div class="flex-center">
+
 
+
                                    ԅ(‾⌣‾ԅ)
+
  
 
                                 </div>
 
                                 </div>
Line 1,366: Line 1,589:
 
                 </div>
 
                 </div>
  
 +
                <h2>Acknowledgements</h2>
 +
                <p>We would like to thank the working group of Prof. Dr. Kay Hamacher, especially Benjamin Mayer,
 +
                    Maximilian
 +
                    Dombrowsky and Patrick Kunzmann for their generous advice and support.</p>
 +
                <p>Furthermore, we would like to thank the
 +
                    <a target="_blank" href="https://2019.igem.org/Team:TU_Darmstadt/Collaborations"> LAB3 </a> for providing us with
 +
                    the computing power necessary to
 +
                    execute our modeling.</p>
 +
                <h2>References</h2>
 +
 +
                <ol class="references">
 +
                    <li id="cite_note-1">
 +
                        <span class="mw-cite-backlink">
 +
                            <a href="#cite_ref-1">↑</a>
 +
                        </span>
 +
                        <span class="reference-text">
 +
                            Bishop, C.M., Neural Networks for Pattern Recognition.
 +
                            Oxford University
 +
                            Press,
 +
                            1995.
 +
                            <a rel="nofollow" class="external autonumber"
 +
                                href="https://www.biorxiv.org/content/10.1101/265231v1" target="_blank">[1] </a>
 +
                        </span>
 +
                    </li>
 +
                    <li id="cite_note-2">
 +
                        <span class="mw-cite-backlink">
 +
                            <a href="#cite_ref-2">↑</a>
 +
                        </span>
 +
                        <span class="reference-text">
 +
                            AlQuraishi, M., End-to-End Differentiable Learning of
 +
                            Protein
 +
                            Structure. Cell Systems, 2019. 8: 1–10.
 +
                            <a rel="nofollow" class="external autonumber"
 +
                                href="https://www.biorxiv.org/content/10.1101/265231v1" target="_blank">[2] </a>
 +
                        </span>
 +
                    </li>
 +
                    <li id="cite_note-3">
 +
                        <span class="mw-cite-backlink">
 +
                            <a href="#cite_ref-3">↑</a>
 +
                        </span>
 +
                        <span class="reference-text">
 +
                            Leaver-Fay, A. et al., ROSETTA3: an object-oriented software
 +
                            suite for
 +
                            the
 +
                            simulation and design of
 +
                            macromolecules. Methods Enzymol, 2011. 487:545-74.
 +
                            <a rel="nofollow" class="external autonumber"
 +
                                href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4083816/" target="_blank">[3]
 +
                            </a>
 +
                        </span>
 +
                    </li>
 +
                    <li id="cite_note-4">
 +
                        <span class="mw-cite-backlink">
 +
                            <a href="#cite_ref-4">↑</a>
 +
                        </span>
 +
                        <span class="reference-text">
 +
                            Moult, J. et al., Critical assessment of methods of protein
 +
                            structure
 +
                            prediction
 +
                            (CASP)—Round 6. PROTEINS:
 +
                            Structure, Function, and Bioinformatics, 2005. Suppl 7:3–7.
 +
                            <a rel="nofollow" class="external autonumber"
 +
                                href="https://onlinelibrary.wiley.com/doi/pdf/10.1002/prot.20716" target="_blank">[4]
 +
                            </a>
 +
                        </span>
 +
                    </li> <!-- Iz fertig -->
 +
                    <li id="cite_note-5">
 +
                        <span class="mw-cite-backlink">
 +
                            <a href="#cite_ref-5">↑</a>
 +
                        </span>
 +
                        <span class="reference-text">
 +
                            Song Y., et al. High resolution comparative modeling with RosettaCM.
 +
                            2013.
 +
                            <a rel="nofollow" class="external autonumber"
 +
                                href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3811137/" target="_blank">[5] </a>
 +
                        </span>
 +
                    </li>
 +
                    <li id="cite_note-6">
 +
                        <span class="mw-cite-backlink">
 +
                            <a href="#cite_ref-6">↑</a>
 +
                        </span>
 +
                        <span class="reference-text">
 +
                            Metropolis N., et al., Equation of State Calculations by Fast Computing
 +
                            Machines.
 +
                            J. Chem. Phys., 1953. 21: 1087.
 +
                            <a rel="nofollow" class="external autonumber"
 +
                                href="https://www.biorxiv.org/content/10.1101/265231v1" target="_blank">[6] </a>
 +
                        </span>
 +
                    </li>
 +
                    <li id="cite_note-7">
 +
                        <span class="mw-cite-backlink">
 +
                            <a href="#cite_ref-7">↑</a>
 +
                        </span>
 +
                        <span class="reference-text">
 +
                            Schrödinger, LLC. The PyMOL Molecular Graphics System, Version 1.8.
 +
                            2015.
 +
                            <a rel="nofollow" class="external autonumber" href="https://pymol.org/2/"
 +
                                target="_blank">[7] </a>
 +
                        </span>
 +
                    </li>
 +
                    <li id="cite_note-8">
 +
                        <span class="mw-cite-backlink">
 +
                            <a href="#cite_ref-8">↑</a>
 +
                        </span>
 +
                        <span class="reference-text">
 +
                            Apol, E., et al. GROMACS
 +
                            USER MANUAL. Department of Biophysical Chemistry, University of
 +
                            Groningen.
 +
                            2015.
 +
                            <a rel="nofollow" class="external autonumber"
 +
                                href="https://www.biorxiv.org/content/10.1101/265231v1" target="_blank">[8] </a>
 +
                        </span>
 +
                    </li>
 +
                    <li id="cite_note-9">
 +
                        <span class="mw-cite-backlink">
 +
                            <a href="#cite_ref-9">↑</a>
 +
                        </span>
 +
                        <span class="reference-text">
 +
                            Vanommeslaeghe K., et al. CHARMM general force field: A force field for
 +
                            drug‐like molecules compatible with the CHARMM all‐atom additive
 +
                            biological force fields.
 +
                            J. Comput. Chem., 2010. 31: 671-90.
 +
                            <a rel="nofollow" class="external autonumber"
 +
                                href="https://www.ncbi.nlm.nih.gov/pubmed/19575467" target="_blank">[9] </a>
 +
                        </span>
 +
                    </li>
 +
                    <li id="cite_note-10">
 +
                        <span class="mw-cite-backlink">
 +
                            <a href="#cite_ref-10">↑</a>
 +
                        </span>
 +
                        <span class="reference-text">
 +
                            Kunzmann P., et al. Biotite: a unifying open source computational
 +
                            biology framework in Python.
 +
                            BMC Bioinformatics, 2018.
 +
                            <a rel="nofollow" class="external autonumber"
 +
                                href="https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2367-z"
 +
                                target="_blank">[10] </a>
 +
                        </span>
 +
                    </li>
 +
                    <li id="cite_note-11">
 +
                        <span class="mw-cite-backlink">
 +
                            <a href="#cite_ref-11">↑</a>
 +
                        </span>
 +
                        <span class="reference-text">
 +
                            Wold S., et al. Principal component analysis.
 +
                            Chemometrics and Intelligent Laboratory Systems, 1987. 2: 37-52.
 +
                            <a rel="nofollow" class="external autonumber"
 +
                                href="https://www.sciencedirect.com/science/article/pii/0169743987800849"
 +
                                target="_blank">[11] </a>
 +
                        </span>
 +
                    </li>
 +
                    <li id="cite_note-12">
 +
                        <span class="mw-cite-backlink">
 +
                            <a href="#cite_ref-12">↑</a>
 +
                        </span>
 +
                        <span class="reference-text">
 +
                            Clancy K.W. Sortase transpeptidases: insights into mechanism, substrate
 +
                            specificity, and inhibition.
 +
                            Biopolymers. 2010. 94: 385-396.
 +
                            <a rel="nofollow" class="external autonumber"
 +
                                href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4648256/" target="_blank">[12] </a>
 +
                        </span>
 +
                    </li>
 +
                    <li id="cite_note-13">
 +
                        <span class="mw-cite-backlink">
 +
                            <a href="#cite_ref-13">↑</a>
 +
                        </span>
 +
                        <span class="reference-text">
 +
                            Raveh B., et al. Rosetta FlexPepDock ab-initio: Simultaneous Folding,
 +
                            Docking and Refinement of Peptides onto Their Receptors.
 +
                            Plos One. 2011.
 +
                            <a rel="nofollow" class="external autonumber"
 +
                                href="https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0018934"
 +
                                target="_blank">[13] </a>
 +
                        </span>
 +
                    </li>
 +
                    <li id="cite_note-14">
 +
                        <span class="mw-cite-backlink">
 +
                            <a href="#cite_ref-14">↑</a>
 +
                        </span>
 +
                        <span class="reference-text">
 +
                            Buitinck L., et al.,
 +
                            API design for machine learning software: experiences from the
 +
                            scikit-learn project. arXiv. 2013.
 +
                            <a rel="nofollow" class="external autonumber" href="https://arxiv.org/abs/1309.0238"
 +
                                target="_blank">[14] </a>
 +
                        </span>
 +
                    </li>
 +
                    <li id="cite_note-15">
 +
                        <span class="mw-cite-backlink">
 +
                            <a href="#cite_ref-15">↑</a>
 +
                        </span>
 +
                        <span class="reference-text">
 +
                            McGuffin L.J., et al. The PSIPRED protein structure prediction server.
 +
                            Bioinformatics, 2000. 16(4): 404-405.
 +
                            <a rel="nofollow" class="external autonumber"
 +
                                href="https://academic.oup.com/bioinformatics/article/16/4/404/187312"
 +
                                target="_blank">[15] </a>
 +
                        </span>
 +
                    </li>
 +
                </ol>
  
 
                 <a id="back-to-top" href="#" class="back-to-top" role="button" title="Return to the top"
 
                 <a id="back-to-top" href="#" class="back-to-top" role="button" title="Return to the top"

Latest revision as of 18:34, 21 October 2019

TU Darmstadt

Modeling

Introduction


In synthetic biology, theoretical models are often used to gain insights, to predict and to improve experiments. In our project we are modifying virus-like particles (VLPs) by attaching proteins to the surface of the P22 capsid through a linker. The linking is catalyzed by the enzyme Sortase A7M, a calcium-independent mutant of the wild-type Sortase A from Staphylococcus aureus. We performed modeling to predict the unknown structure of Sortase A7M in order to improve the linking between proteins and thereby optimize the modification efficiency of our platform.
Two different modeling approaches were used to determine the structure of the Sortase A7M. We compared machine learning approaches to traditional, comparative Monte-Carlo based modeling methods. The results were evaluated using an energy-scoring function and molecular dynamics (MD) simulations. The most promising Sortase A7M structures were used to perform a docking simulation to investigate binding with linkers.

In silico modeling and simulation of proteins requires a 3D structure, which can be obtained from the RCSB Protein Data Bank. However, if no 3D structures are annotated, as is the case with Sortase A7M, the structure has to be determined by other means. The structure prediction of Sortase A7M was done using two different approaches.

Deep Learning

Background

Machine Learning is a class of algorithms that aim to determine a function between two datasets. This is commonly done by presenting the algorithm with training data, as well as a scoring function to measure its success at processing the input data. During training a feedback loop is used to allow the algorithm to automatically find a function that fits the data. In contrast, classical algorithms are often hardcoded to solve a specific problem and only allow for limited flexibility.

A neural network consists of neurons, which are commonly referred to as nodes. They process input using weights, which are adjusted during training. Nodes in neural networks are linked together: one neuron processes the output of other neurons, loosely mimicking the structure of biological brains. While one usually has a fixed number of input and output neurons, determined by the data one wishes to classify, adding layers of hidden neurons can improve the classification. This is often referred to as deep learning and has led to revolutions in applications like speech and image recognition.

Using Machine Learning to predict protein structures has many advantages over conventional methods, especially for iGEM teams, who often have only limited access to resources. Training a neural network is computationally expensive and is often done in centralized data centers, but once trained, the network can be used to predict the structure of a wide variety of proteins. [1] Using pretrained models, novel protein structures can be obtained within seconds, [2] whereas conventional methods take several hours or days. [3]

Until earlier this year, the use of Machine Learning in the prediction of protein structures was restricted to applications within human-written algorithms. [2] AlQuraishi demonstrated a complete deep learning approach that is able to make predictions within 1-2 Å of other approaches, [2] while only using a fraction of the computational power. This enables accurate structural prediction with less powerful, as well as less expensive, hardware and thus significantly reduces the cost of structural modeling.

Procedure

We used AlQuraishi’s approach in combination with his pretrained model, which was trained on the ProteinNet database containing all structures released prior to the start of CASP12 (the 12th Critical Assessment of Techniques for Protein Structure Prediction, 2016). Our generated candidate structure, CASP12, is named after this competition. The results were tested against the CASP12 datasets and reached distance root-mean-square deviation (RMSD) values between 10 and 13 Å. The RMSD is the root-mean-square deviation of all atom positions from those of a template structure:

\[ \mathrm{RMSD} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left\lVert x_i - x_{\mathrm{ref},i} \right\rVert^2} \]

where x_i is a vector of the atomic coordinates of the i-th atom, x_{ref,i} is the corresponding vector in the template structure and N is the number of atoms. None of the proteins in the CASP datasets were published until after the competition, so they represent an assessment with only little bias. [4] We used this pretrained model to make structural predictions for our Sortase A7M. The predicted structure was then relaxed in a molecular dynamics simulation.
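
As a minimal sketch of the RMSD described above, the following plain-Python function compares two coordinate sets atom by atom. It assumes that both structures contain the same atoms in the same order and have already been superimposed; the function name and the toy coordinates are illustrative only and are not taken from our pipeline.

```python
import math


def rmsd(coords, reference):
    """RMSD between two equally long lists of (x, y, z) tuples.

    Assumption: the structures are already superimposed, so no fitting
    is performed here.
    """
    if len(coords) != len(reference) or not coords:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = sum(
        (a - b) ** 2
        for atom, ref_atom in zip(coords, reference)
        for a, b in zip(atom, ref_atom)
    )
    return math.sqrt(squared_sum / len(coords))


# Toy example: every atom is displaced by exactly 1 Å along z.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
template = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(predicted, template))  # 1.0
```

In practice a structural biology library would normally be used for this, including the superposition step that this sketch leaves out.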

In the following, the specific steps for obtaining a tertiary structure predicted by AlQuraishi’s model are listed.

  1. We used the amino acid sequence of the Sortase A7M in the FASTA format to predict the tertiary structure of the amino acid backbone using AlQuraishi’s implementation of his end-to-end differentiable learning of protein structure with the pretrained preCASP ProteinNet model. The output was a .tertiary file, which contains a sequence of 3x3 matrices with the atomic coordinates of each amino acid's backbone, starting at the N-terminus. The raw backbone is depicted in Animation 1.
  2. As the standard format for protein structure information is the PDB (Protein Data Bank) file format, we wrote a Python script to combine the structural information from the FASTA and .tertiary files into a single PDB file. For ease of use we used the Biotite Python module.
  3. Using Rosetta's fixed backbone design program 'fixbb' with the 'hpatch' database, the optimal position of the side-chains was determined and added to the PDB file. The fixed backbone tool adds the corresponding side-chains and optimizes their conformation. The hpatch database ensures that hydrophilic side-chains are preferred on the surface of the protein, as our sortase is present in an aqueous environment. The resulting structure is depicted in Animation 2.
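
The conversion in step 2 can be sketched as follows. This is not our original script, which used Biotite; it is a plain-Python illustration under stated assumptions: the backbone atoms of each residue are ordered N, CA, C, the coordinates are already given in Å, and the whole protein is written as chain A. The function name `backbone_to_pdb` and the toy coordinates are hypothetical.

```python
# One-letter to three-letter residue names for the 20 standard amino acids.
THREE_LETTER = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}
BACKBONE = ("N", "CA", "C")  # assumed atom order within each residue

# Fixed-column ATOM record (78 characters wide).
ATOM_LINE = (
    "ATOM  {serial:5d}  {name:<3s} {res:3s} A{seq:4d}    "
    "{x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{b:6.2f}          {el:>2s}"
)


def backbone_to_pdb(sequence, coords):
    """Combine a one-letter sequence and backbone coordinates into PDB text.

    coords holds three (x, y, z) tuples per residue in N, CA, C order.
    """
    if len(coords) != 3 * len(sequence):
        raise ValueError("expected three backbone atoms per residue")
    lines = []
    for i, (x, y, z) in enumerate(coords):
        name = BACKBONE[i % 3]
        lines.append(ATOM_LINE.format(
            serial=i + 1, name=name, res=THREE_LETTER[sequence[i // 3]],
            seq=i // 3 + 1, x=x, y=y, z=z, occ=1.0, b=0.0, el=name[0],
        ))
    lines.append("END")
    return "\n".join(lines)


# Toy dipeptide with made-up coordinates.
toy_coords = [
    (1.0, 2.0, 3.0), (2.4, 2.0, 3.0), (3.0, 3.4, 3.0),
    (4.3, 3.5, 3.0), (5.0, 4.8, 3.0), (6.5, 4.6, 3.0),
]
pdb_text = backbone_to_pdb("MK", toy_coords)
print(pdb_text)
```

A backbone-only file of this kind is what the side-chain placement in step 3 takes as input.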

In order to evaluate the structure obtained, we constructed a Ramachandran plot by calculating the dihedral angles of the amino acid backbone, as depicted in Fig. 1. These can then be compared to the typical dihedral angles for specific secondary structures such as α-Helices and β-Sheets. The typical angles from a randomly sampled dataset are depicted in Fig. 2.

Figure 1: The dihedral angles of amino acids can be calculated to create a Ramachandran plot.

Figure 2: The dihedral angles over a range of randomly sampled proteins.
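
The dihedral angles behind a Ramachandran plot can be computed from four consecutive backbone atoms. The sketch below uses only the Python standard library; the helper names and the per-residue input layout (one `(N, CA, C)` triple of coordinates per residue) are assumptions made for illustration, not the code used to produce the figures.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(backbone):
    """Return (phi, psi) per residue; undefined terminal angles are None.

    backbone is a list of (N, CA, C) coordinate triples, one per residue.
    phi uses C(i-1), N(i), CA(i), C(i); psi uses N(i), CA(i), C(i), N(i+1).
    """
    angles = []
    for i, (n, ca, c) in enumerate(backbone):
        phi = dihedral(backbone[i - 1][2], n, ca, c) if i > 0 else None
        psi = dihedral(n, ca, c, backbone[i + 1][0]) if i < len(backbone) - 1 else None
        angles.append((phi, psi))
    return angles


# Four points with the last one rotated by a quarter turn around the z axis.
quarter_turn = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
```

Plotting psi against phi for all residues gives the Ramachandran plot; residues in α-helices and β-sheets cluster in characteristic regions of that plot.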

Results

Animation 1: The raw PDB File converted from the .tertiary file.

Animation 2: The PDB-File after Step 3.

For analysis, the structure was viewed in PyMOL. As can be seen in Animation 3 below, no secondary structures could be recognized by PyMOL. Thus, a Ramachandran plot was used to evaluate the dihedral angles of the backbone. This is depicted in Fig. 3. It was found that the angles do not match the typical angles for α-helices and β-sheets.

Animation 3: The cartoon view in PyMOL.

Figure 3: Ramachandran plot of the predicted structure.

During training, the predictions in AlQuraishi’s model were optimized for their RMSD, which is the root-mean-square deviation of the distance between the atoms of the prediction and the reference structure. Thus, even though the predictions are expected to have a similar shape to the physical structure, they may not be in the energy minimum.

Rosetta Comparative Modeling

Background

In our second approach we used RosettaCommons comparative modeling (RosettaCM), which is based on homology modeling. [5] Homology modeling is a protein modeling method that requires one or more template structures as a base for the protein to be modeled. The template sequences are aligned with the sequence of the target protein. Unaligned sections are modeled using fragment or protein libraries, so that the resulting protein structures are based on different sequence homologues of the protein of interest. Ab-initio or de novo modeling, on the other hand, attempts to find protein structures solely based on physicochemical principles applied to the primary sequence, which can be compared to the refolding of a denatured protein.

RosettaCM combines ab-initio modeling with homology modeling. The homologous sections, for which a resolved 3D structure with a sufficiently similar sequence exists, are generated using homology modeling. Afterwards the unaligned sections are modeled de novo. By combining the two methods, RosettaCM represents a precise and resource-efficient tool for protein structure prediction. Rosetta applications rely on Monte-Carlo optimization, which is a probabilistic approach to finding a local minimum in the energy landscape of protein conformations. The underlying equation serving as the foundation of the statistical Monte-Carlo [6] method is the Metropolis acceptance criterion:

\[ p = \exp\left(-\frac{\Delta E}{k_B T}\right) \]

where k_B is the Boltzmann constant, ΔE the difference in energy of the two states and T the temperature. The term 1/(k_B T) can also be written as a single factor β.

During the statistical protein folding based on the Monte-Carlo method, the initial structure is changed by small random perturbations of the atom locations. Whether the structure is accepted, or not, is decided by the Metropolis acceptance criterion. If ΔE < 0, the structure is accepted, otherwise the newly proposed structure is accepted with probability p as described in the Metropolis acceptance criterion.
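
The acceptance rule described above can be sketched in a few lines of Python. This is an illustration of the Metropolis criterion itself, not of Rosetta's internal implementation; the unit choice (kcal/mol with the corresponding Boltzmann constant), the function name and the injectable random number source are assumptions made for this example.

```python
import math
import random

# Boltzmann constant in kcal/(mol*K); assumes energies are given in kcal/mol.
K_B = 0.0019872041


def metropolis_accept(delta_e, temperature, rng=random.random, k_b=K_B):
    """Decide whether a proposed structure is accepted.

    delta_e is the energy of the proposed state minus the current one.
    Moves that lower the energy are always accepted; all other moves are
    accepted with probability p = exp(-delta_e / (k_b * temperature)).
    """
    if delta_e < 0:
        return True
    p = math.exp(-delta_e / (k_b * temperature))
    return rng() < p


# A move that lowers the energy is always accepted.
print(metropolis_accept(-2.5, 300.0))  # True

# An uphill move is accepted only some of the time.
accepted = sum(metropolis_accept(0.5, 300.0) for _ in range(10000))
print(accepted / 10000)  # fluctuates around exp(-0.5 / (K_B * 300)), about 0.43
```

Accepting some uphill moves is what allows the search to leave shallow local minima instead of getting stuck in the first one it finds.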

Procedure

The RosettaCM protocol requires evolutionarily related structures and sequences, as well as fragment files of the target structure. The fragment files serve as a structure template for the proteins and consist of peptide fragments with lengths of 3 and 9 amino acids. We gathered five evolutionarily related structures from the PDB with the accession numbers:


The five PDB entries represent different structures of sortases from Staphylococcus aureus. Fragment files can be created with the Robetta online server or with the Rosetta FragmentPicker application.

The RosettaCM procedure is best described in the following steps: [5]

  1. sequence and structural alignment of templates
  2. fragment insertion in unaligned sections
  3. replacement of random segment with segment from a different template structure
  4. energy minimization
  5. all-atom optimization

The alignment can be performed with various tools. We used MAFFT to generate the multiple sequence alignments. Prior to using the alignments as input, they were converted to the Grishin alignment format, as RosettaCM requires the alignments to be in this format. The minimization is performed using the Rosetta centroid energy function. For the centroid function to be applied, the protein is converted to the centroid representation. A protein in centroid representation consists of the backbone atoms N, Cα and the carbonyl O, plus a pseudo-atom of varying size representing the side chain. The advantage of the centroid representation is that its energy landscape is smoother and can therefore be traversed more easily. Finally, the generated structure undergoes a second minimization in an all-atom model by means of Monte-Carlo optimization. This is similar to the energy minimization, but without the amino acids being represented as centroids of their functional groups. Structures computed through all-atom optimizations can reach atomic resolutions, [5] which is crucial for a model meant to be used to estimate atomic interactions.

Results

The run yielded 15,000 structures, which were compared using the Rosetta scoring function talaris2013. From the 15,000 structures generated, we inspected the ten best scoring structures.
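
Picking the best scoring structures from such a run can be sketched as below. The sketch assumes a whitespace-separated score table whose rows start with `SCORE:`, whose first such row is a header, and which contains the columns `total_score` and `description`; the function name and the example rows (including the structure names and score values) are made up for illustration. Lower Rosetta scores are better.

```python
def best_scoring(score_lines, n=10):
    """Return the n lowest-scoring (total_score, description) pairs."""
    header = None
    rows = []
    for line in score_lines:
        if not line.startswith("SCORE:"):
            continue  # skip lines that are not part of the score table
        fields = line.split()[1:]
        if header is None:
            header = fields  # the first SCORE: line names the columns
            continue
        row = dict(zip(header, fields))
        rows.append((float(row["total_score"]), row["description"]))
    rows.sort()  # ascending: most negative (best) score first
    return rows[:n]


# Hypothetical score table with three structures.
example = [
    "SEQUENCE: ",
    "SCORE: total_score rms description",
    "SCORE: -310.5 1.2 S_0002",
    "SCORE: -325.1 0.9 S_0001",
    "SCORE: -298.0 2.0 S_0003",
]
print(best_scoring(example, n=2))
```

With a real score file, `score_lines` could simply be the open file object, since iterating over it yields one line at a time.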

As can be seen in Fig. 4, the most prominent differences can be found in the regions close to the N- and C-terminus. As fluctuations in those regions are not untypical, we decided to use the best scoring structure, candidate S_14771 (Animation 4), as the input for the simulations to follow.

Figure 4: The structural alignment of the ten best scoring sortase structures displaying minor differences with the exception of the C- and N-terminal regions. N- and C-terminal regions tend to show strong fluctuations, thus it is unsurprising to find the terminal regions to be unaligned.

Animation 4: Sortase A7M candidate S_14771 created through RosettaCM.

In order to evaluate the secondary structure of the Sortase A7M candidate S_14771 a Ramachandran plot was created and compared to the five sortases used as input for the comparative modeling. Comparisons were also drawn with the sortase predicted by Deep Learning, as well as a database of randomly sampled proteins. Ramachandran plots of dihedral angles (Fig. 5) can be a first indicator whether the structures computed are valid.

Figure 5: The Ramachandran plot of randomly sampled proteins [1] and the input structures of the comparative modeling show similar secondary structures. Secondary structure analysis of both sortase candidates reveals absence of secondary structures for candidate CASP12. This is not the case with candidate S_14771 as the Ramachandran plot shows all relevant structures.

The Ramachandran plot (Fig. 5) showing α-helices and β-sheets is a strong indicator of a successful structure determination, as those secondary structures are crucial for the functionality of sortases.

Conclusion

We used machine learning methods as well as Monte-Carlo simulations to determine the structure of the mutated transpeptidase Sortase A7M. The machine learning approach using AlQuraishi's deep neural network yielded a structure which did not seem to have any secondary structures. To exclude the possibility of an error in the PyMOL visualization software by Schrödinger, [7] a Ramachandran plot (Fig. 3) was created. The plot shows that no typical secondary structures are present, which is a strong indicator that this approach failed to determine a valid structure. The second approach, using Rosetta Comparative Modeling, yielded 15,000 structures scored with the talaris2013 scoring function. The ten best structures were aligned and exhibited almost identical secondary structures (Fig. 4). The greatest structural differences are present in the N- and C-terminal regions. Since terminal regions tend to fluctuate more strongly than non-terminal segments of the protein, we deemed those fluctuations not relevant for the protein's functionality.
Being the best scoring candidate, structure S_14771 was analyzed structurally using a Ramachandran plot (Fig. 5). The plot shows all the relevant and typical structures sortases exhibit and serves as an indicator for a successful structure prediction.
In the steps to follow, a molecular dynamics (MD) simulation will be performed on both structures. Even though structure CASP12 does not seem to be a valid structure, refolding processes during an MD simulation might lead to a relaxation of the protein and allow for a promising prediction of the Sortase A7M structure.

Background

The structure predictions made so far were based on statistical methods with physical constraints. The Deep Learning algorithm uses a neural network trained to find a function associating the amino acid sequence and the final 3D positions of the atoms within the protein. On the other hand, predictions were made with Rosetta using the Monte-Carlo Method. Here random movement of individual atoms occurs, and the energy is estimated after each step.

Even though both methods use physical constraints to find plausible protein structures, neither of them actually simulates the behavior of these molecules within a physical force field. Moreover, neither method necessarily outputs a fully relaxed protein structure, and both simulate water only implicitly by preferring hydrophilic parts of the protein to be on the outside. Thus, we conducted a molecular dynamics (MD) simulation to verify the plausibility of our protein structures and to allow equilibration. The molecular dynamics simulation provides the opportunity to simulate water as discrete molecules, creating a solvated protein. This step is crucial to validate the structures, as the interaction with water is one of the primary mechanisms driving protein folding. Since neither candidate CASP12 nor S_14771 has been modeled with explicit water, an MD simulation is imperative to verify the correctness of the candidates' conformations. This is, of course, much more expensive in terms of computational resources, as the protein has to be placed in a simulation box that is then filled with water molecules. This is called solvation and is visualized for candidate S_14771 in Fig. 6.

Figure 6: Sortase A7M in a force field surrounded by discrete water molecules and ions.

We used GROMACS (GROningen MAchine for Chemical Simulations) as the tool for our molecular dynamics simulations. GROMACS solves Newton's equations of motion for individual atoms. [8] While this classical simulation is much more accurate than predictions made by the other methods, approximations are used nonetheless: forces are cut off beyond a certain radius and the system size is quite small. [8] Additionally, atoms are assumed to be classical particles, which is not the case, as quantum mechanics plays a role in particle-particle interactions. Still, this simulation is computationally very expensive. Therefore, only time periods far shorter than one second (here 200 ns, see Fig. 7) could be simulated.

Methods

To perform the molecular dynamics simulations we mostly followed the GROMACS Lysozyme tutorial, as it matches our use case closely. We created a dodecahedral simulation box with a distance of 0.7 nm between the solute and the box borders. We used periodic boundary conditions and a Na+/Cl- concentration of 0.012 mol/L. The main difference in our approach was that we used the CHARMM36 [9] force field instead of the OPLS-AA/L force field and adjusted our molecular dynamics parameters accordingly. The simulation was performed on an NVIDIA GTX 760 graphics card, allowing us to simulate approximately 1 ns per hour.

To analyse the MD simulation we used the Python programming language and the Biotite package [10] as well as GROMACS analysis tools such as covar and anaeig. The first analyses are a root-mean-square deviation (RMSD), a root-mean-square fluctuation (RMSF) and a radius of gyration analysis. RMSD calculations have been described in the structure prediction section. To compute the RMSF, the movement of each residue is computed as a root-mean-square over time:

\[ \mathrm{RMSF}_i = \sqrt{\frac{1}{T} \sum_{t=1}^{T} \left\lVert v_i(t) - \bar{v}_i \right\rVert^2} \]

where v_i(t) is the position of atom i at time t, \bar{v}_i is its position averaged over time and T is the number of time steps. The radius of gyration is a quantity used to describe the expansion or compression of a particle. The final analysis performed on the MD simulation is called Principal Component Analysis (PCA). By applying PCA to a protein it is possible to gain insights into the relevant vibrational motions and thereby the physical mechanism of the protein. [11]
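
To make the RMSF and the radius of gyration more concrete, the following plain-Python sketch computes both for a toy trajectory. It is not the analysis code we ran, which relied on Biotite and GROMACS; it assumes that all frames have already been superimposed onto a common reference, and the function names and coordinates are illustrative.

```python
import math


def rmsf(trajectory):
    """Per-atom RMSF; trajectory[t][i] is the (x, y, z) of atom i in frame t.

    Assumption: the frames are already superimposed onto a common reference.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for i in range(n_atoms):
        # Time-averaged position of atom i.
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        # Mean squared displacement from that average.
        msd = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
            for frame in trajectory
        ) / n_frames
        result.append(math.sqrt(msd))
    return result


def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration of one frame (unit masses by default)."""
    if masses is None:
        masses = [1.0] * len(coords)
    total = sum(masses)
    center = [sum(m * c[k] for m, c in zip(masses, coords)) / total for k in range(3)]
    msd = sum(
        m * sum((c[k] - center[k]) ** 2 for k in range(3))
        for m, c in zip(masses, coords)
    ) / total
    return math.sqrt(msd)


# Toy trajectory: atom 0 is fixed, atom 1 oscillates between x = -1 and x = +1.
frames = [
    [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
]
print(rmsf(frames))                                                  # [0.0, 1.0]
print(radius_of_gyration([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]))       # 1.0
```

A flexible region such as a loop shows up as a run of atoms with high RMSF values, while a shrinking radius of gyration over the course of a trajectory indicates that the protein is becoming more compact.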

Results

First indicators

The first possible indicators of a stable protein structure are a converging root-mean-square deviation (RMSD), small root-mean-square fluctuation (RMSF) values and a converging radius of gyration. Using Python and the Biotite package, we calculated these quantities and plotted the results for both candidate S_14771 and candidate CASP12.

Figure 7: The RMSD of the MD simulation of S_14771 over a period of 200,000 ps. The RMSD is one of three main indicators of a stable protein structure. As time progresses, the RMSD increases with a decreasing slope. The value stabilizes at about 110,000 ps and fluctuates around 6 Å.

Figure 8: The RMSD has already arrived at a stable value at t = 40,000 ps, while the radius of gyration (Fig. 10) decreases continuously over time. This suggests that the protein might be folding and potentially developing secondary structures that were not present previously.

Figure 9: The prominent fluctuations of the residues from ranges 105 to 115 might indicate a binding site or another form of functional structure. The radius of gyration, just as the RMSD (Fig. 7), stabilizes around a simulation time of of 110,000 ps and converges towards a value of 16.7 Å.

Figure 10: As from t = 40,000 ps the radius of gyration decreases constantly. At the end of the simulation the gyration radius reaches a value of 17 Å. This behavior indicates folding of the protein structure.

Figure 11: The fluctuations (RMSF) of most residues appear insignificant compared to the first, the last residues and the residues close to residue 110 . Typically the N- and C-terminus tend to fluctuate more intensively due to the lack of stabilizing structures. The prominent fluctuations in the range of residue 105 to 115 can indicate a binding site or another form of functional structure.

Figure 12: The prominent fluctuations of the residues from ranges 105 to 115 might indicate a binding site or another form of functional structure. The radius of gyration, just as the RMSD (Fig. 8), stabilizes around a simulation time of of 110,000 ps and converges towards a value of 16.7 Å.


Typically, RMSDs and radii of gyration converge towards a value that depends on the size of the protein. Convergence of these quantities can be interpreted as a stable state of the protein structure. As can be seen in Fig. 7 and Fig. 9, both the RMSD and the radius of gyration stabilize at the same time, when the simulation reaches 110,000 ps (110 ns), suggesting a stabilized structure of candidate S_14771 solvated in water. Another indicator of a functional protein is the RMSF. Instead of being averaged over all atoms, the RMSF is averaged over time for each amino acid. It provides insights into both protein stability and functionality. Fig. 11 reveals the RMSF of residues 105 to 115 to be significantly higher than that of other residues. This hints at the presence of a functional unit along these residues. As noted in the section describing our structure prediction approaches, the N- and C-terminal regions tend to fluctuate more strongly because they lack stabilizing structures.
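
One simple way to make the notion of a stabilizing value concrete is a rolling-mean heuristic: find the earliest point from which every windowed average stays close to the average at the end of the run. The sketch below is an illustrative criterion only, not the procedure used to read off the 110,000 ps value from the figures; the window size, the tolerance and the toy series are assumptions.

```python
def rolling_means(values, window):
    """Mean of every consecutive block of `window` samples."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

def stabilization_index(values, window, tolerance):
    """Earliest window start from which all rolling means stay within
    `tolerance` of the final rolling mean."""
    means = rolling_means(values, window)
    final = means[-1]
    first = len(means) - 1
    for i in range(len(means) - 1, -1, -1):
        if abs(means[i] - final) > tolerance:
            break
        first = i
    return first

# Toy RMSD series (arbitrary units): a rise followed by a plateau around 6.
toy_rmsd = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 6.1, 5.9, 6.0, 6.1, 5.9, 6.0]

start = stabilization_index(toy_rmsd, window=3, tolerance=0.2)
print(f"plateau begins at sample {start}")
```

Applied to a real trajectory, the returned sample index would still have to be multiplied by the output interval to obtain a time in ps.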

RMSD and radius of gyration calculations for candidate CASP12 (Fig. 8 and Fig. 10) provide evidence of folding. However, the RMSF values are significantly higher, an effect possibly caused by instability or refolding. Nevertheless, the strongest fluctuations, disregarding the terminal regions, can be seen in the region of residues 105 to 115. This insight supports the theory that residues 105 to 115 might be part of a functional unit.
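
The idea of looking for the strongest fluctuations while disregarding the termini can be sketched as a small outlier test on the per-residue RMSF. The cutoff of two standard deviations, the number of excluded terminal residues and the toy profile below are assumptions chosen for illustration; they are not parameters taken from our analysis.

```python
import statistics

def flexible_residues(rmsf_values, n_terminal=5, z_cutoff=2.0):
    """Return 1-based residue numbers whose RMSF exceeds
    mean + z_cutoff * stdev of the non-terminal residues."""
    last = len(rmsf_values) - n_terminal
    core = rmsf_values[n_terminal:last]
    threshold = statistics.mean(core) + z_cutoff * statistics.pstdev(core)
    return [i + 1 for i in range(n_terminal, last) if rmsf_values[i] > threshold]

# Toy RMSF profile for 30 residues: rigid baseline, floppy termini, one mobile loop.
toy_rmsf = [1.0] * 30
toy_rmsf[0:3] = [4.0, 4.0, 4.0]       # N-terminus
toy_rmsf[27:30] = [4.0, 4.0, 4.0]     # C-terminus
toy_rmsf[13:16] = [3.0, 3.0, 3.0]     # internal loop (residues 14 to 16)

print(flexible_residues(toy_rmsf))
```

The terminal residues are excluded both from the statistics and from the result, so only the internal loop is reported.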

We were unsure whether candidate CASP12 can be considered a plausible structure and how to interpret the findings concerning the prominent fluctuations. Therefore, we decided to perform a Principal Component Analysis.

Principal Component Analysis

To analyze our system further, Principal Component Analysis (PCA) was performed using GROMACS. By applying PCA to an MD simulation of a protein it is possible to extract the most relevant motions of the protein.
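
The underlying calculation can be sketched in plain Python: the coordinates of each frame are flattened into one vector, the covariance matrix of these vectors is built, and its dominant eigenvector gives the most relevant collective motion. The analysis in this project was done with the GROMACS tools, so the code below is only a minimal illustration of the mathematics; it assumes frames that are already fitted to a reference, uses power iteration instead of a full diagonalization, and works on made-up toy data.

```python
import math

def covariance_matrix(samples):
    """Covariance matrix of a list of equally long coordinate vectors."""
    n = len(samples)
    dim = len(samples[0])
    mean = [sum(s[k] for s in samples) / n for k in range(dim)]
    centered = [[s[k] - mean[k] for k in range(dim)] for s in samples]
    return [
        [sum(row[a] * row[b] for row in centered) / n for b in range(dim)]
        for a in range(dim)
    ]

def dominant_component(matrix, iterations=200):
    """Largest eigenvalue and its eigenvector, found by power iteration."""
    dim = len(matrix)
    vec = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        nxt = [sum(matrix[a][b] * vec[b] for b in range(dim)) for a in range(dim)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    eigenvalue = sum(
        vec[a] * sum(matrix[a][b] * vec[b] for b in range(dim)) for a in range(dim)
    )
    return eigenvalue, vec

# Toy data: two atoms, flattened as (x0, y0, z0, x1, y1, z1).
# Atom 1 swings strongly along x, atom 0 only jitters slightly along y.
frames = [
    [0.0,  0.1, 0.0, 3.0, 0.0, 0.0],
    [0.0, -0.1, 0.0, 4.0, 0.0, 0.0],
    [0.0, -0.1, 0.0, 6.0, 0.0, 0.0],
    [0.0,  0.1, 0.0, 7.0, 0.0, 0.0],
]

eigenvalue, component = dominant_component(covariance_matrix(frames))
print(eigenvalue, component)
```

The first principal component points almost entirely along the x coordinate of the second atom, which is the motion with the largest variance in the toy data.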

Animation 5: Principal Component Analysis showing the most prominent movements of the Cα chain of candidate S_14771, with a less relevant (blue) and a more relevant (red) principal component. Both principal components show movement of the β6/β7 loop, consisting of residues 105 to 115, towards the active site. Thus we can assume that the closing β6/β7 loop is involved in the reaction mechanism.

Animation 6: The principal movements of candidate CASP12 appear similar to each other and no single dominant movement can be identified. This makes the most relevant (red) and less relevant (blue) principal components indistinguishable from one another. Moreover, the active site amino acids do not appear to be in close proximity, which would make a reaction catalyzed by candidate CASP12 impossible.

The results from the Principal Component Analysis of candidate S_14771 (Animation 5) show a movement of residues 105 to 115 towards the active site, [12] supporting our theory that residues 105 to 115 are important for the reaction mechanism. Since the most relevant vibrational movement of the sortase (red) is directed towards the active site, it is possible that the β6/β7 loop either closes the binding site of the ligand peptides or even transports one peptide towards the other.

Animation 6 shows the results of the Principal Component Analysis of candidate CASP12. As the RMSF calculations suggested (Fig. 12), the whole protein seems to be moving randomly with no directed movement. In addition, the active site residues [12] are spread across the protein, confirming our assumption that the protein is not in a stable or plausible conformation.

Conclusion

By applying several methods to analyse the structural stability and validity of our two Sortase A7M candidates, we gained evidence that at least one of the models is a valid and stable candidate. Candidate S_14771, which was generated using RosettaCM, appears to be a fitting candidate not only because of the successful analyses, but also because the residues of the active site [12] are close enough to each other to catalyze a ligation reaction. Our model created through deep learning performed well only in terms of the RMSD and radius of gyration calculations. The RMSF, the Principal Component Analysis and the conformation of the active site all indicate that candidate CASP12 is of no use for further calculations, as it does not portray a valid conformation of Sortase A7M.

Now that the binding site of the Sortase had been found, the peptide ligand needed to be inserted into the binding site to create a peptide-protein complex. The procedure of choice for the introduction of a ligand into the binding site of a protein is called docking. In the following sections, we will present the protocol and methods we used as well as the results they yielded.

Background

Enzymes are among the most relevant macromolecules in biology. Their functionality is determined by the way they interact with their ligands. Although enzymes are highly specific concerning the ligands they interact with, similar compounds can often bind to the same enzyme, albeit with different affinity. To determine the best possible binding conformation of the protein-ligand complex, we use FlexPepDock, [13] an algorithm provided by the RosettaCommons software package.

Procedure

The ab-initio FlexPepDock protocol consists of multiple steps and is described in the RosettaCommons online documentation. We modified the protocol, as the one provided did not work with our approach. The modified protocol has the following form:

  1. secondary structure determination
  2. complex creation
  3. FlexPepDock refinement

To determine the secondary structure of the peptide, fragment files (3-, 5- and 9-mers) had to be generated and a PSIPRED secondary structure prediction [15] had to be performed. As the peptides had a sequence length of less than 20 amino acids, we were not able to use online services such as Robetta or the PSIPRED server. Instead we used the Rosetta FragmentPicker application and the PSIPRED command line tool. The generated structures serve as the input for the refinement protocol.
The generation of the peptide-protein complex can be divided into three steps:

  • peptide creation
  • peptide relaxation
  • coarse complex creation

The peptide structure was created through ab-initio modeling. Initial creation of the peptide was followed by insertion of the peptide into the sortase binding site. This led to a coarse model of the peptide-sortase complex. Here we used insight gained from the molecular dynamics simulation to place the peptide close to the binding site. This operation was performed using Biotite. [10]
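
The coarse placement step amounts to a rigid translation of the peptide coordinates. The following sketch shows one possible placement rule in plain Python: the peptide centroid is moved to a point a fixed distance outside the binding site, along the line from the protein centre through the site. This rule, the clearance value and the toy coordinates are assumptions made for illustration; the text above only states that the peptide was placed close to the binding site using Biotite.

```python
import math

def centroid(coords):
    """Geometric centre of a list of (x, y, z) tuples."""
    n = len(coords)
    return tuple(sum(c[k] for c in coords) / n for k in range(3))

def place_near_site(peptide, protein, site, clearance):
    """Translate `peptide` so that its centroid lies `clearance` beyond the
    centroid of `site`, along the direction protein centre -> site centre."""
    protein_c = centroid(protein)
    site_c = centroid(site)
    direction = [s - p for s, p in zip(site_c, protein_c)]
    length = math.sqrt(sum(d * d for d in direction))
    target = [s + clearance * d / length for s, d in zip(site_c, direction)]
    shift = [t - c for t, c in zip(target, centroid(peptide))]
    return [tuple(a + s for a, s in zip(atom, shift)) for atom in peptide]

# Toy coordinates in Å (hypothetical, not taken from the sortase model).
protein = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0), (10.0, 10.0, 0.0)]
site = [(10.0, 4.0, 0.0), (10.0, 6.0, 0.0)]      # atoms lining the binding site
peptide = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]

placed = place_near_site(peptide, protein, site, clearance=3.0)
print(placed)
```

Because only a translation is applied, the internal geometry of the peptide is unchanged; the subsequent refinement is then responsible for orientation and conformation.
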
In the final step the FlexPepDock refinement protocol is executed and 50,000 complex structures are generated. We used the inputs as described in [13], written by the authors of the FlexPepDock documentation.
To get a better overview of our data we performed a clustering in Python, using the scikit-learn [14] package. We clustered the structures with respect to:

  • total score: the total score of the docking provided by the Rosetta scoring function
  • interface score: the sum of the energy of the residues in the interfacing region
  • reweighted score: a score calculated by double weighting the contribution of the residues in the interfacing region
  • root-mean-square deviation: the root-mean-square deviation of the peptides in relation to the structure with the highest score
  • peptide direction: the direction the peptide is facing
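
The five criteria listed above can be combined into one numeric feature vector per docked structure before clustering. The sketch below is one possible encoding, not the one used in our scripts: it assumes that the peptide direction is represented as the unit vector from the first to the last Cα atom and that all features are z-score standardized so that no single score dominates the distances. The score values and coordinates are invented toy data.

```python
import math
import statistics

def peptide_direction(ca_coords):
    """Unit vector pointing from the first to the last C-alpha atom."""
    start, end = ca_coords[0], ca_coords[-1]
    diff = [e - s for s, e in zip(start, end)]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]

def standardize(rows):
    """Z-score every column; constant columns are left centred at zero."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    sds = [statistics.pstdev(c) or 1.0 for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

# Toy decoys with made-up scores (Rosetta energy units) and C-alpha traces.
decoys = [
    {"total": -250.0, "interface": -20.0, "reweighted": -270.0, "rmsd": 0.0,
     "ca": [(0.0, 0.0, 0.0), (0.0, 0.0, 4.0)]},
    {"total": -245.0, "interface": -18.0, "reweighted": -263.0, "rmsd": 2.5,
     "ca": [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]},
    {"total": -240.0, "interface": -22.0, "reweighted": -262.0, "rmsd": 5.0,
     "ca": [(0.0, 0.0, 0.0), (0.0, 2.0, 0.0)]},
]

raw_rows = [
    [d["total"], d["interface"], d["reweighted"], d["rmsd"],
     *peptide_direction(d["ca"])]
    for d in decoys
]
features = standardize(raw_rows)
print(features[0])
```

The resulting rows can be passed to any distance-based clustering method.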

Here clustering is used to group the docking results and thereby decrease the sample size. From the 50,000 results we picked the results with the 500 best total scores, the 500 best interface scores and the 500 best reweighted scores. As we aimed to create an unbiased set for clustering, the absence of duplicates in the set was ensured. We decreased the sample size to 100 groups representing the best scoring structures from the three categories.
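
The selection step described above can be written as a union of three top-N lists in which every structure is kept only once. The sketch below assumes that lower Rosetta scores are better and that each structure carries a unique name; the dictionary keys and the toy scores are hypothetical.

```python
def top_n_union(decoys, score_keys, n):
    """Union of the n best decoys for each score, without duplicates.
    Assumes that a lower (more negative) score is better."""
    selected = {}
    for key in score_keys:
        for decoy in sorted(decoys, key=lambda d: d[key])[:n]:
            selected[decoy["name"]] = decoy   # same name is stored only once
    return list(selected.values())

# Toy decoys with made-up scores.
toy_decoys = [
    {"name": "a", "total": -10.0, "interface": -1.0, "reweighted": -5.0},
    {"name": "b", "total": -9.0,  "interface": -4.0, "reweighted": -6.0},
    {"name": "c", "total": -8.0,  "interface": -5.0, "reweighted": -2.0},
    {"name": "d", "total": -1.0,  "interface": -2.0, "reweighted": -7.0},
    {"name": "e", "total": 0.0,   "interface": 0.0,  "reweighted": 0.0},
]

subset = top_n_union(toy_decoys, ["total", "interface", "reweighted"], n=2)
print([d["name"] for d in subset])
```

With n = 500 and three scores, the resulting set contains at most 1,500 structures and fewer whenever a structure ranks among the best in more than one category.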

Results

For the sequences MGGGGPPPPPP (M-polyG), GGGGPPPPPP (polyG) and PPPPPPLPETGG (LPETGG), 50,000 structures each were created and clustered. After the clustering the sample consisted of 100 structures of docked complexes.

Figure 13: The three best scoring structures (total score, interface score, reweighted score) of the LPETGG tag are shown. Only two results are visible as the best reweighted score candidate is identical to the best total score candidate. The reacting section of the LPETGG tag, namely glycine, is colored yellow, as is the active site. The glycines of both ligand peptides face the active site.

Analysis of the scores has shown a similar score for all three dockings. The best scoring results of the LPETGG docking show a tendency of the glycines to face the active site while also being in close proximity to it.

Figure 14: The three best scoring structures (total score, interface score, reweighted score) of the polyG peptide are shown. Only two results are visible as the best reweighted score candidate is identical to the best total score candidate. Instead of facing the active site (yellow), the reacting glycines (yellow) appear to interact with the β6/β7 loop of the sortase.

Figure 15: The three best scoring structures (total score, interface score, reweighted score) of the M-polyG peptide are shown. Only two results are visible as the best reweighted score candidate is identical to the best total score candidate. For the M-polyG peptide no uniform orientation can be observed. The structure with the best interface score (light blue) is oriented towards the β6/β7 loop, while the structure with the best total/reweighted score (dark blue) is oriented towards the β-sheets.

Fig. 13 shows the docking result of the LPETGG peptide to the sortase. The results shown are the best scoring structures of the clustering with respect to the total score, interface score and reweighted score. As the best scoring structure is the same for the total score and the reweighted score, only two peptides are shown. This also applies to Fig. 14 and Fig. 15. For both results the reacting glycine residues (yellow) are facing the active site. Additionally, the same residues are in close proximity to the active site.

Fig. 14 and Fig. 15 show the docking of both polyG and M-polyG. While the polyG results align well and seem to be interacting with the β6/β7 loop rather than with the active site, this does not seem to be the case for M-polyG. Instead of both structures interacting with either the β6/β7 loop or the active site, one (best interface score; dark blue) interacts with the β6/β7 loop and the other (best reweighted/total score; light blue-gray) appears to interact with the active site.

Figure 16: The close-up of the M-polyG peptide (best total/reweighted score) indicates an interaction of methionine with arginine139 and cysteine126.

Figure 17: Methionine of the result with the best interface score interacted with the β6/β7 loop rather than the active site. Still, the reactive glycine residues appear to be bound to the β6/β7 loop.

As can be seen in Fig. 16, the docking result with the best total/reweighted score suggests an interaction of methionine with two active site residues, namely arginine139 and cysteine126. Fig. 17 shows the interaction of M-polyG with the β6/β7 loop. The glycines still interact with the β6/β7 loop. Instead of binding above the β6/β7 loop, which is the case for polyG as illustrated in Fig. 14, the interaction seems to be influenced by methionine. By interacting with the residues in the β-helix, methionine could potentially hinder binding of glycine to the β6/β7 loop by partial immobilization of the peptide. Overall, peptide binding and orientation are less uniform compared to polyG without the leading methionine, which could be an indicator of a lower binding affinity of M-polyG towards the β6/β7 loop.

Conclusion

To computationally investigate the binding affinities of the polyG and M-polyG as well as the LPETGG tags, we performed docking simulations using the Rosetta FlexPepDock application. We used a modified version of the recommended protocol, as the modified version was easier to automate and served our purpose better than the standard protocol. From the calculated scores alone, we could not see a difference in binding affinities. Thus, we inspected the best scoring structures regarding the total score, the interface score and the reweighted score using PyMOL. [7] Since the best structures with respect to total score and reweighted score were the same for all simulations, only two structures were inspected per run. A polyproline tag was appended to all the peptides to simulate the modification of the VLPs with a small peptide.

As expected, the results showed that for LPETGG, the glycines of both peptides oriented towards the active site. This is unsurprising, as peptides with the sequence LPXTGG are known to be substrates of the sortase. It was more surprising to see the polyG tag oriented away from the active site, since polyG is also a known substrate of the sortase. Both polyG peptides were facing the β6/β7 loop (residues 105 to 115) uniformly and appeared to be interacting with it. The M-polyG peptides did not show a uniform orientation or interaction scheme. On the one hand, the visualization of the best result concerning the total and reweighted score has shown interaction of methionine with cysteine126 and arginine139, two residues of the active site. On the other hand, the visualization of the best result with respect to the interface score shows the M-polyG facing the mobile β6/β7 loop. In contrast to the polyG peptide lacking the methionine, the M-polyG peptide is pulled down below the β6/β7 loop by the methionine interacting with one of the β-sheets leading to the active site. This is not the case for the polyG results, which lie aligned in one plane with the β6/β7 loop.

For our project it was key to understand and characterize Sortase A7M. As there is no annotated 3D structure for this specific Sortase, an in silico structure determination was performed. This problem was tackled using two different approaches. The Deep Learning approach did not yield a promising model as later analysis also confirmed. However, comparative modeling with Rosetta produced valid structures. We used the best structure, candidate S_14771, for extensive characterization. We evaluated the model with regard to its secondary structure using Ramachandran plots which suggested plausible secondary structures.

Molecular Dynamics simulations were used to investigate stability and dynamic properties of the candidate. The RMSD and radius of gyration stabilized over the course of the simulation, a first indicator of an equilibrated structure. Interestingly, RMSF analysis showed strong fluctuations of residues 105 to 115. We further investigated this by performing Principal Component Analysis. Doing so, we extracted the principal movements of the model. We could observe movement of the β6/β7 loop towards the active site, suggesting the presence of a binding site. Consequently, we performed docking simulations.

FlexPepDock was used to conduct the docking simulations with target peptides. Each run yielded 50,000 structures. In multiple steps we reduced the number of complexes to 100 clusters with respect to total, reweighted and interface score. We extracted the best scoring complexes and investigated interactions.

For LPETGG we observed uniform binding to the active site, fulfilling our expectation. Strikingly, polyG appeared to bind to the β6/β7 loop in a uniform manner. As is known from the literature, polyG is a functioning ligand of sortase. Supported by the literature and our data, we postulate the following mechanism: the β6/β7 loop transports bound polyG towards the active site of Sortase A7M, thereby lowering the activation energy of the linking reaction.

As the theory is neither backed up by nor contradicts experimental data, further research is required.

Acknowledgements

We would like to thank the working group of Prof. Dr. Kay Hamacher, especially Benjamin Mayer, Maximilian Dombrowsky and Patrick Kunzmann, for their generous advice and support.

Furthermore, we would like to thank LAB3 for providing us with the computing power necessary to carry out our modeling.

References

  1. Bishop, C.M., Neural Networks for Pattern Recognition. Oxford University Press, 1995. [1]
  2. AlQuraishi, M., End-to-End Differentiable Learning of Protein Structure. Cell Systems, 2019. 8: 1–10. [2]
  3. Leaver-Fay, A., et al., ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules. Methods Enzymol., 2011. 487: 545-74. [3]
  4. Moult, J., et al., Critical assessment of methods of protein structure prediction (CASP), Round 6. PROTEINS: Structure, Function, and Bioinformatics, 2005. Suppl 7: 3–7. [4]
  5. Song, Y., et al., High resolution comparative modeling with RosettaCM. 2013. [5]
  6. Metropolis, N., et al., Equation of State Calculations by Fast Computing Machines. J. Chem. Phys., 1953. 21: 1087. [6]
  7. Schrödinger, LLC, The PyMOL Molecular Graphics System, Version 1.8. 2015. [7]
  8. Apol, E., et al., GROMACS User Manual. Department of Biophysical Chemistry, University of Groningen, 2015. [8]
  9. Vanommeslaeghe, K., et al., CHARMM general force field: A force field for drug‐like molecules compatible with the CHARMM all‐atom additive biological force fields. J. Comput. Chem., 2010. 31: 671-90. [9]
  10. Kunzmann, P., et al., Biotite: a unifying open source computational biology framework in Python. BMC Bioinformatics, 2018. [10]
  11. Wold, S., et al., Principal component analysis. Chemometrics and Intelligent Laboratory Systems, 1987. 2: 37-52. [11]
  12. Clancy, K.W., Sortase transpeptidases: insights into mechanism, substrate specificity, and inhibition. Biopolymers, 2010. 94: 385-396. [12]
  13. Raveh, B., et al., Rosetta FlexPepDock ab-initio: Simultaneous Folding, Docking and Refinement of Peptides onto Their Receptors. PLoS ONE, 2011. [13]
  14. Buitinck, L., et al., API design for machine learning software: experiences from the scikit-learn project. arXiv, 2013. [14]
  15. McGuffin, L.J., et al., The PSIPRED protein structure prediction server. Bioinformatics, 2000. 16(4): 404-405. [15]