Difference between revisions of "Team:TU Darmstadt/Model"

(Undo revision 425988 by Tomtomhdx (talk))
Line 23: Line 23:
 
     </div>
 
     </div>
  
<!-- Headings with colored underline -->
+
    <!-- Headings with colored underline -->
<div class="container">
+
    <div class="row">
+
        <div class="col mx-2">
+
            <!-- Insert text here -->
+
<h2>Introduction</h2>
+
<hr class="head" />
+
 
+
 
     <div class="container">
 
     <div class="container">
 
         <div class="row">
 
         <div class="row">
 
             <div class="col mx-2">
 
             <div class="col mx-2">
                 <p> In synthetic biology, theoretical models are often used to gain insights, predict and improve
+
                 <!-- Insert text here -->
                    experiments. In our project we are modifying virus-like particles (VLPs) by attaching proteins to the
+
                <h2>Introduction</h2>
                    surface of the P22 capsid
+
                 <hr class="head" />
                    <!-- Link to Background or Project overview --> through a linker. The linking is catalyzed using
+
                    the enzyme Sortase A7M, which is a calcium-independent mutant of the wild-type Sortase A
+
                    <!-- Link to Sortase Background --> from <i>Staphylococcus aureus</i>. We performed modeling to predict the unknown structure of the
+
                    Sortase A7M and to improve the linker between proteins, thereby optimizing the modification
+
                    efficiency of our platform. <br>
+
                    Two different modeling approaches were used to determine the structure of Sortase A7M. We compared
+
                    machine learning approaches to traditional comparative, Monte Carlo-based modeling methods. The
+
                    results were evaluated using an energy-scoring function and molecular dynamics (MD) simulations. The
+
                    most promising Sortase A7M structures were used to perform a docking simulation to screen for
+
                    optimal linkers.
+
                 </p>
+
            </div>
+
        </div>
+
    </div>
+
  
    <div class="tab my-3">
+
                <div class="container">
        <button class="btn btn-block" id="last" data-toggle="collapse" data-target=".multi-collapse"
+
                    <div class="row">
            aria-expanded="false" aria-controls="collapseOne">
+
                        <div class="col mx-2">
            Toggle all
+
                            <p> In synthetic biology, theoretical models are often used to gain insights, predict and
        </button>
+
                                improve
    </div>
+
                                experiments. In our project we are modifying virus-like particles (VLPs) by attaching
 +
                                proteins to the
 +
                                surface of the P22 capsid
 +
                                <!-- Link to Background or Project overview --> through a linker. The linking is
 +
                                catalyzed using
 +
                                the enzyme Sortase A7M, which is a calcium-independent mutant of the wild-type Sortase A
 +
                                <!-- Link to Sortase Background --> from <i>Staphylococcus aureus</i>. We performed
 +
                                modeling to predict the unknown structure of the
 +
                                Sortase A7M and to improve the linker between proteins, thereby optimizing the
 +
                                modification
 +
                                efficiency of our platform. <br>
 +
                                Two different modeling approaches were used to determine the structure of Sortase A7M.
 +
                                We compared
 +
                                machine learning approaches to traditional comparative, Monte Carlo-based modeling
 +
                                methods. The
 +
                                results were evaluated using an energy-scoring function and molecular dynamics (MD)
 +
                                simulations. The
 +
                                most promising Sortase A7M structures were used to perform a docking simulation to
 +
                                screen for
 +
                                optimal linkers.
 +
                            </p>
 +
                        </div>
 +
                    </div>
 +
                </div>
  
    <div class="tab my-3">
+
                <div class="tab my-3">
        <button class="btn btn-block" id="tab1" data-toggle="collapse" data-target="#body1" aria-expanded="false"
+
                    <button class="btn btn-block" id="last" data-toggle="collapse" data-target=".multi-collapse"
            aria-controls="collapseOne">
+
                        aria-expanded="false" aria-controls="collapseOne">
            Structure determination
+
                        Toggle all
        </button>
+
                    </button>
    </div>
+
                </div>
  
    <div class="collapse multi-collapse" id="body1">
+
                <div class="tab my-3">
        <div class="card card-body">
+
                    <button class="btn btn-block" id="tab1" data-toggle="collapse" data-target="#body1"
            <div class="row">
+
                         aria-expanded="false" aria-controls="collapseOne">
                <div class="col-xs-12 col-sm-12 col-md-2">
+
                         Structure determination
                    <img class="img-fluid"
+
                    </button>
                         src="https://2019.igem.org/wiki/images/d/d7/T--TU_Darmstadt--Structure_Determination_Modeling.png"
+
                         style="max-width:100%;">
+
 
                 </div>
 
                 </div>
                <div class="col-xs-12 col-sm-12 col-md-10">
 
                    <div class="flex-center">
 
  
                        <p>
+
                <div class="collapse multi-collapse" id="body1">
                            <i>In silico</i> modeling and simulation of proteins requires a 3D structure, which can be
+
                    <div class="card card-body">
                             obtained from the <a href="https://www.rcsb.org/" target="_blank">RCSB Protein Data
+
                        <div class="row">
                                Bank</a>. However, if no 3D structures are annotated, as is the case with sortase
+
                             <div class="col-xs-12 col-sm-12 col-md-2">
                             A7M, the structure has to be determined by other means. The structure prediction of sortase A7M was done using two different approaches.
+
                                <img class="img-fluid"
                        </p>
+
                                    src="https://2019.igem.org/wiki/images/d/d7/T--TU_Darmstadt--Structure_Determination_Modeling.png"
 +
                                    style="max-width:100%;">
 +
                            </div>
 +
                             <div class="col-xs-12 col-sm-12 col-md-10">
 +
                                <div class="flex-center">
  
                        <h2 id="Deep_Learning">Deep Learning</h2>
+
                                    <p>
                        <h3 class="ausfahrbarer-boi">Background</h3>
+
                                        <i>In silico</i> modeling and simulation of proteins requires a 3D structure,
                        <p>Machine Learning is a class of algorithms that aim to determine a function between two
+
                                        which can be
                            datasets. This is commonly
+
                                        obtained from the <a href="https://www.rcsb.org/" target="_blank">RCSB Protein
                            done by
+
                                            Data
                            presenting the algorithm with training data as well as a scoring function to measure its
+
                                            Bank</a>. However, if no 3D structures are annotated, as is the case with
                            success at processing the
+
                                        sortase
                            input data. During training a feedback loop is used to allow the algorithm to automatically
+
                                        A7M, the structure has to be determined by other means. The structure prediction
                            find a function to fit
+
                                        of sortase A7M was done using two different approaches.
                            the data. In contrast, classical
+
                                    </p>
                            algorithms are often
+
                            hardcoded to solve a specific problem and only allow for limited flexibility.</p>
+
  
                        <p>A neural network consists of neurons, which are commonly referred to as nodes. They process
+
                                    <h2 id="Deep_Learning">Deep Learning</h2>
                            input using
+
                                    <h3 class="ausfahrbarer-boi">Background</h3>
                            weights, which are adjusted during its training. Nodes in neural networks are linked
+
                                    <p>Machine Learning is a class of algorithms that aim to determine a function
                            together: One neuron processes
+
                                        between two
                            the inputs of other neurons, loosely mimicking the structure of biological brains. While one
+
                                        datasets. This is commonly
                            usually has a fixed
+
                                        done by
                            amount of input and output neurons limited by the data one wishes to classify, adding layers of hidden neurons can improve the classification.
+
                                        presenting the algorithm with training data as well as a scoring function to
                            This is often referred to as
+
                                        measure its
                            deep learning and has led to revolutions in applications like speech and image recognition.
+
                                        success at processing the
                        </p>
+
                                        input data. During training a feedback loop is used to allow the algorithm to
 +
                                        automatically
 +
                                        find a function to fit
 +
                                        the data. In contrast, classical
 +
                                        algorithms are often
 +
                                        hardcoded to solve a specific problem and only allow for limited flexibility.
 +
                                    </p>
  
                        <p>Using Machine Learning to predict protein structures has many advantages compared to
+
                                    <p>A neural network consists of neurons, which are commonly referred to as nodes.
                            conventional methods especially
+
                                        They process
                            for iGEM teams who often only have limited access to resources. After training a neural
+
                                        input using
                            network, which is a
+
                                        weights, which are adjusted during its training. Nodes in neural networks are
                            computationally expensive process and often done in centralized data centers, it can be used
+
                                        linked
                            to predict the
+
                                        together: One neuron processes
                            structure of a wide variety of proteins.
+
                                        the inputs of other neurons, loosely mimicking the structure of biological
                            <sup id="cite_ref-1" class="reference">
+
                                        brains. While one
                                <a href="#cite_note-1">[1] </a>
+
                                        usually has a fixed
                            </sup>
+
                                        amount of input and output neurons limited by the data one wishes to classify,
                            Using pretrained models, novel protein structures can be obtained within seconds
+
                                        adding layers of hidden neurons can improve the classification.
 +
                                        This is often referred to as
 +
                                        deep learning and has led to revolutions in applications like speech and image
 +
                                        recognition.
 +
                                    </p>
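                                    <p>The description of nodes, weights and hidden layers above can be illustrated
                                        with a minimal forward pass. This is only a sketch for illustration and not the
                                        network used in our project: the function names, the sigmoid activation and the
                                        layer sizes are assumptions chosen for brevity.</p>

```python
import math


def sigmoid(x):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """Compute the outputs of one layer of nodes.

    Each row of `weights` belongs to one node: the node multiplies every
    input by its weight, adds its bias and applies the activation.
    """
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_biases):
    """Pass the inputs through one hidden layer and then the output layer."""
    hidden = layer(inputs, hidden_weights, hidden_biases)
    return layer(hidden, output_weights, output_biases)


if __name__ == "__main__":
    # Two input nodes, three hidden nodes, one output node (illustrative values).
    prediction = forward(
        [1.0, 2.0],
        [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]],
        [0.0, 0.1, -0.1],
        [[0.3, -0.6, 0.9]],
        [0.05],
    )
    print(prediction)
```

                                    <p>Training would adjust the numbers in the weight and bias lists; the sketch only
                                        shows how a trained network turns inputs into an output.</p>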
  
                            <sup id="cite_ref-2" class="reference">
+
                                    <p>Using Machine Learning to predict protein structures has many advantages compared
                                <a href="#cite_note-2">[2] </a>
+
                                        to
                            </sup>
+
                                        conventional methods especially
                            compared to conventional methods taking several hours or days.
+
                                        for iGEM teams who often only have limited access to resources. After training a
                            <sup id="cite_ref-2" class="reference">
+
                                        neural
                                <a href="#cite_note-2">[3] </a>
+
                                        network, which is a
                            </sup>
+
                                        computationally expensive process and often done in centralized data centers, it
                        </p>
+
                                        can be used
                        <p>Until earlier this year the use of Machine Learning in the prediction of protein structures
+
                                        to predict the
                            has been restricted to
+
                                        structure of a wide variety of proteins.
                            applications within human-written algorithms.
+
                                        <sup id="cite_ref-1" class="reference">
                            <sup id="cite_ref-1" class="reference">
+
                                            <a href="#cite_note-1">[1] </a>
                                <a href="#cite_note-1">[2] </a>
+
                                        </sup>
                            </sup>
+
                                        Using pretrained models, novel protein structures can be obtained within seconds
                            AlQuraishi demonstrated a complete deep learning approach that is able to make predictions
+
                            within 1-2 Å of other
+
                            approaches
+
                            <sup id="cite_ref-1" class="reference">
+
                                <a href="#cite_note-1">[2] </a>
+
                            </sup>
+
                            , while only using a fraction of the computational power. This enables accurate structural
+
                            prediction with less
+
                            powerful as well as less expensive hardware and thus significantly reduces the cost of
+
                            structural modeling.</p>
+
                        <h3>Procedure</h3>
+
                        <p> We used AlQuraishi’s approach in combination with his pretrained model, which was trained on
+
                            the ProteinNet database
+
                            containing all structures released prior to the start of CASP12 (12th Critical Assessment of
+
                            Techniques for Protein <!-- Always write RMSD the same way, hyphens and so on -->
+
                            Structure Prediction – 2016). The results were tested against the CASP12 datasets and
+
                            reached distance root-mean-square deviation (RMSD) values between
+
                            10 and 13 &#8491;. The RMSD is the root-mean-square deviation of all atom positions compared to a template structure.
+
                            It is defined as:
+
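                            $$ RMSD = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left\| v_{t,i} - v_i \right\|^2},$$
+
                            where v_i is the position of atom i in the predicted structure, v_{t,i} is the position of the corresponding atom in the template structure, and N is the number of atoms. <!-- change here -->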
+
                            None of the proteins in the CASP datasets were published until after the competition and thus represent an
+
                            assessment with only little bias.
+
                            <sup id="cite_ref-4" class="reference">
+
                                <a href="#cite_note-4">[4] </a>
+
                            </sup>
+
                            We used this pretrained model to make structural predictions for our Sortase A7M. The
+
                            predicted structure was then relaxed in a Molecular Dynamics Simulation using GROMACS.
+
                        </p>
+
  
                        <p>In the following, the specific steps for obtaining a tertiary structure predicted by
+
                                        <sup id="cite_ref-2" class="reference">
                            AlQuraishi’s model are listed.
+
                                            <a href="#cite_note-2">[2] </a>
                        </p>
+
                                        </sup>
 +
                                        compared to conventional methods taking several hours or days.
 +
                                        <sup id="cite_ref-2" class="reference">
 +
                                            <a href="#cite_note-2">[3] </a>
 +
                                        </sup>
 +
                                    </p>
 +
                                    <p>Until earlier this year the use of Machine Learning in the prediction of protein
 +
                                        structures
 +
                                        has been restricted to
 +
                                        applications within human-written algorithms.
 +
                                        <sup id="cite_ref-1" class="reference">
 +
                                            <a href="#cite_note-1">[2] </a>
 +
                                        </sup>
 +
                                        AlQuraishi demonstrated a complete deep learning approach that is able to make
 +
                                        predictions
 +
                                        within 1-2 Å of other
 +
                                        approaches
 +
                                        <sup id="cite_ref-1" class="reference">
 +
                                            <a href="#cite_note-1">[2] </a>
 +
                                        </sup>
 +
                                        , while only using a fraction of the computational power. This enables accurate
 +
                                        structural
 +
                                        prediction with less
 +
                                        powerful as well as less expensive hardware and thus significantly reduces the
 +
                                        cost of
 +
                                        structural modeling.</p>
 +
                                    <h3>Procedure</h3>
 +
                                    <p> We used AlQuraishi’s approach in combination with his pretrained model, which
 +
                                        was trained on
 +
                                        the ProteinNet database
 +
                                        containing all structures released prior to the start of CASP12 (12th Critical
 +
                                        Assessment of
 +
                                        Techniques for Protein
 +
                                        <!-- Always write RMSD the same way, hyphens and so on -->
 +
                                        Structure Prediction – 2016). The results were tested against the CASP12
 +
                                        datasets and
 +
                                        reached distance root-mean-square deviation (RMSD) values between
 +
                                        10 and 13 &#8491;. The RMSD is the root-mean-square deviation of all atom
 +
                                        positions compared to a template structure.
 +
                                        It is defined as:
 +
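                                        $$ RMSD = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left\| v_{t,i} - v_i \right\|^2},$$
 +
                                        where v_i is the position of atom i in the predicted structure, v_{t,i} is the position of the corresponding atom in the template structure, and N is the number of atoms.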
 +
                                        <!-- change here -->
 +
                                        None of the proteins in the CASP datasets were published until after the competition
 +
                                        and thus represent an
 +
                                        assessment with only little bias.
 +
                                        <sup id="cite_ref-4" class="reference">
 +
                                            <a href="#cite_note-4">[4] </a>
 +
                                        </sup>
 +
                                        We used this pretrained model to make structural predictions for our Sortase
 +
                                        A7M. The
 +
                                        predicted structure was then relaxed in a Molecular Dynamics Simulation using
 +
                                        GROMACS.
 +
                                    </p>
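                                    <p>As an illustration of the RMSD described above, the following sketch computes
                                        the root-mean-square deviation between two equally long lists of atom
                                        positions, i.e. the square root of the mean squared distance over all N atoms.
                                        The function name and the input layout are assumptions made for this example.
                                        It assumes the two structures are already superimposed and does not compute
                                        the distance-based variant (comparing internal pairwise distances) reported
                                        for the CASP12 evaluation.</p>

```python
import math


def rmsd(predicted, template):
    """Root-mean-square deviation between two lists of (x, y, z) positions.

    Both lists must contain the same atoms in the same order, and the
    structures are assumed to be superimposed beforehand.
    """
    if not predicted or len(predicted) != len(template):
        raise ValueError("both structures need the same, non-zero number of atoms")
    squared_sum = sum(
        (px - tx) ** 2 + (py - ty) ** 2 + (pz - tz) ** 2
        for (px, py, pz), (tx, ty, tz) in zip(predicted, template)
    )
    # Average over all atoms before taking the square root.
    return math.sqrt(squared_sum / len(predicted))


if __name__ == "__main__":
    model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
    reference = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0)]
    print(rmsd(model, reference))
```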
  
                        <ol>
+
                                    <p>In the following, the specific steps for obtaining a tertiary structure predicted
 +
                                        by
 +
                                        AlQuraishi’s model are listed.
 +
                                    </p>
  
                            <li>We used the amino acid sequence of the Sortase A7M in the FASTA format to predict the
+
                                    <ol>
                                tertiary structure of the
+
                                amino acid backbone using AlQuraishi’s TensorFlow implementation of his end-to-end
+
                                differentiable learning of
+
                                protein structure with the pretrained preCASP ProteinNet database. The Output file was a
+
                                <i>.tertiary</i> file which
+
                                contains a sequential 3x3 Matrix with atomic coordinates from each amino acid backbone
+
                                starting at the
+
                                N-Terminus.</li>
+
  
                            <li>As the standard format for protein structure information is the PDB file format, we
+
                                        <li>We used the amino acid sequence of the Sortase A7M in the FASTA format to
                                wrote a Python script to
+
                                            predict the
                                combine the structural information from the FASTA and .tertiary files into a PDB file.
+
                                            tertiary structure of the
                                For ease of use we
+
                                            amino acid backbone using AlQuraishi’s TensorFlow implementation of his
                                used the
+
                                            end-to-end
                                Biotite Python Module. <!-- BIOTITE REFERENCE --></li>
+
                                            differentiable learning of
 +
                                            protein structure with the pretrained preCASP ProteinNet database. The
 +
                                            Output file was a
 +
                                            <i>.tertiary</i> file which
 +
                                            contains a sequential 3x3 Matrix with atomic coordinates from each amino
 +
                                            acid backbone
 +
                                            starting at the
 +
                                            N-Terminus.</li>
  
                            <li>Using Rosetta's fixed backbone design program 'fixbb' with the 'hpatch', the optimal
+
                                        <li>As the standard format for protein structure information is the PDB file
                                position of the side-chains
+
                                            format, we
                                was
+
                                            wrote a Python script to
                                determined and added to the PDB file. The fixed backbone tool adds the corresponding
+
                                            combine the structural information from the FASTA and .tertiary files into a
                                side-chains and optimizes
+
                                            PDB file.
                                their conformation. The Hpatch database ensures that hydrophilic side-chains are to be
+
                                            For ease of use we
                                preferred on the surface
+
                                            used the
                                of the protein as our sortase is present in an aqueous environment.</li>
+
                                            Biotite Python Module.
 +
                                            <!-- BIOTITE REFERENCE -->
 +
                                        </li>
  
 +
                                        <li>Using Rosetta's fixed backbone design program 'fixbb' with the 'hpatch', the
 +
                                            optimal
 +
                                            position of the side-chains
 +
                                            was
 +
                                            determined and added to the PDB file. The fixed backbone tool adds the
 +
                                            corresponding
 +
                                            side-chains and optimizes
 +
                                            their conformation. The Hpatch database ensures that hydrophilic side-chains
 +
                                            are
 +
                                            preferred on the surface
 +
                                            of the protein as our sortase is present in an aqueous environment.</li>
  
                        </ol>
 
  
                        <h3>Results</h3>
+
                                    </ol>
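                                    <p>The conversion in step 2 can be sketched as follows. This is not our original
                                        script and does not use Biotite; it is a minimal standard-library
                                        illustration that writes backbone-only PDB <code>ATOM</code> records. It
                                        assumes the coordinates have already been read from the .tertiary file into
                                        one (N, C&alpha;, C) triple of positions per residue, given in &#8491;. The
                                        parsing of the .tertiary file, its units and the function name
                                        <code>backbone_to_pdb</code> are assumptions made for this example.</p>

```python
# One-letter to three-letter residue names for the 20 standard amino acids.
THREE_LETTER = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}

# Backbone atoms per residue: (four-column PDB atom name, element symbol).
BACKBONE_ATOMS = ((" N  ", "N"), (" CA ", "C"), (" C  ", "C"))


def backbone_to_pdb(sequence, coords, chain_id="A"):
    """Build PDB text from a sequence and per-residue backbone coordinates.

    `coords` holds one entry per residue; each entry is a sequence of three
    (x, y, z) positions for N, CA and C (assumed layout, in Angstrom).
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coordinates differ in length")
    lines = []
    serial = 1
    for res_index, (aa, residue_atoms) in enumerate(zip(sequence, coords), start=1):
        res_name = THREE_LETTER[aa]
        for (atom_name, element), (x, y, z) in zip(BACKBONE_ATOMS, residue_atoms):
            # Fixed-column ATOM record; occupancy 1.00 and B-factor 0.00 are placeholders.
            lines.append(
                f"ATOM  {serial:5d} {atom_name} {res_name} {chain_id}{res_index:4d}    "
                f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}          {element:>2s}"
            )
            serial += 1
    lines.append("END")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = backbone_to_pdb(
        "MK",
        [
            [(0.0, 0.0, 0.0), (1.458, 0.0, 0.0), (2.0, 1.4, 0.0)],
            [(3.3, 1.5, 0.0), (4.0, 2.8, 0.0), (5.5, 2.6, 0.0)],
        ],
    )
    print(example, end="")
```

                                    <p>The resulting file contains only the backbone; the side-chains are added in
                                        the next step.</p>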
                        <div class="row">
+
 
                            <div class="figurcolumn column" style="width: 50%; float: left; padding: 1em;">
+
                                    <h3>Results</h3>
                                <img class="img-fluid center"
+
                                    <div class="row">
                                    src="https://2019.igem.org/wiki/images/5/57/T--TU_Darmstadt--CASP_Nochains.gif"
+
                                        <div class="figurcolumn column" style="width: 50%; float: left; padding: 1em;">
                                    style="width:100%">
+
                                            <img class="img-fluid center"
                                <p><b>Animation 1: </b>The raw PDB File converted from the .tertiary file.</p>
+
                                                src="https://2019.igem.org/wiki/images/5/57/T--TU_Darmstadt--CASP_Nochains.gif"
                            </div>
+
                                                style="width:100%">
                            <div class="figurcolumn column" style="width: 50%; float: right; padding: 1em;">
+
                                            <p><b>Animation 1: </b>The raw PDB File converted from the .tertiary file.
                                <img class="img-fluid center"
+
                                            </p>
                                    src="https://2019.igem.org/wiki/images/c/c1/T--TU_Darmstadt--CASP_Chains.gif"
+
                                        </div>
                                    style="width:100%">
+
                                        <div class="figurcolumn column" style="width: 50%; float: right; padding: 1em;">
                                <p><b>Animation 2: </b>The PDB-File after Step 3.</p>
+
                                            <img class="img-fluid center"
 +
                                                src="https://2019.igem.org/wiki/images/c/c1/T--TU_Darmstadt--CASP_Chains.gif"
 +
                                                style="width:100%">
 +
                                            <p><b>Animation 2: </b>The PDB-File after Step 3.</p>
 +
                                        </div>
 +
                                    </div>
 +
                                    <p>For analysis, the structure was viewed in PyMOL
 +
                                        <!-- PYMOL REFERENCE -->. As can be seen in the pictures below,
 +
                                        no secondary structures could be recognized by PyMOL. Thus, a Ramachandran plot
 +
                                        was used to
 +
                                        evaluate the dihedral angles of the backbone. It was found that the angles do
 +
                                        not match with
 +
                                        the typical angles for &alpha;-helices and &beta;-sheets.</p>
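                                    <p>The backbone dihedral angles shown in a Ramachandran plot can be computed
                                        directly from the backbone atom positions. The sketch below is an
                                        illustration only, not the tool we used: the function names are assumptions,
                                        and each atom is passed as an (x, y, z) tuple. &phi; is the dihedral
                                        C(i-1)-N-C&alpha;-C and &psi; is the dihedral N-C&alpha;-C-N(i+1).</p>

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Remove the components along the central bond, leaving the parts of
    # b0 and b2 that lie in the plane perpendicular to it.
    d0 = _dot(b0, b1)
    d2 = _dot(b2, b1)
    v = _sub(b0, (b1[0] * d0, b1[1] * d0, b1[2] * d0))
    w = _sub(b2, (b1[0] * d2, b1[1] * d2, b1[2] * d2))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(prev_c, n, ca, c, next_n):
    """Return (phi, psi) in degrees for one residue."""
    return dihedral(prev_c, n, ca, c), dihedral(n, ca, c, next_n)


if __name__ == "__main__":
    # Arbitrary test geometry, not taken from a real protein.
    print(phi_psi((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1), (0, 1, 2)))
```

                                    <p>For orientation, right-handed &alpha;-helices typically lie near
                                        &phi; &asymp; -60&deg;, &psi; &asymp; -45&deg;, while &beta;-sheets cluster
                                        around &phi; &asymp; -120&deg;, &psi; &asymp; +130&deg;.</p>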
 +
                                    <div class="row">
 +
                                        <div class="figurcolumn column" style="width: 50%; float: left;  padding: 1em;">
 +
                                            <img class="img-fluid center"
 +
                                                src="https://2019.igem.org/wiki/images/8/82/T--TU_Darmstadt--CASP_CHAINSCartoon.gif"
 +
                                                style="width:100%">
 +
                                            <p><b>Animation 3: </b>The cartoon view in PyMOL.</p>
 +
                                        </div>
 +
                                        <div class="figurcolumn column"
 +
                                            style="width: 50%; float: right;  padding: 1em;">
 +
                                            <img class="img-fluid center"
 +
                                                src="https://2019.igem.org/wiki/images/1/18/T--TU_Darmstadt--Ramachandran_Plot.png"
 +
                                                style="width:100%">
 +
                                            <p> <b>Figure 1: </b>Ramachandran plot of the predicted structure.</p>
 +
                                        </div>
 +
                                    </div>
 +
                                    <p>During training, the predictions of AlQuraishi’s model were optimized for their
 +
                                        RMSD which is
 +
                                        the root-mean-square deviation of the distance between the atoms of the
 +
                                        prediction and
 +
                                        reference
 +
                                        structure. Thus, even though the predictions are expected to have a similar
 +
                                        shape to
 +
                                        the physical structure, they may not be in the energy minimum. Hence, we applied
 +
                                        a
 +
                                        GROMACS molecular
 +
                                        dynamics simulation in order to relax the structure obtained by AlQuraishi’s deep learning
 +
                                        model.</p>
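                                    <p>For reference, a common definition of the RMSD is sketched below. It assumes
                                        \(N\) atoms with positions \(\mathbf{r}_i\) in the prediction and
                                        \(\mathbf{r}_i^{\mathrm{ref}}\) in the reference structure after superposition.
                                        These symbols are introduced here for illustration only, and the exact variant
                                        used during training of the model may differ:
                                        $$\mathrm{RMSD} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left\lVert \mathbf{r}_i - \mathbf{r}_i^{\mathrm{ref}} \right\rVert^{2}}$$
                                        A low RMSD therefore indicates a similar overall shape, but says nothing
                                        about whether the predicted conformation sits in an energy minimum.</p>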
 +
                                    <h2>RosettaCM</h2>
 +
                                    <h3 class="ausfahrbarer-boi">Background</h3>
 +
 
 +
                                    <p>In our second approach we used the <a href="https://rosettacommons.org"
 +
                                            target="_blank"><i>RosettaCommons</i></a> <i>comparative modeling
 +
                                        (<a>RosettaCM</a>)</i>, which
 +
                                        is based on homology modeling. <i>Homology modeling</i> is a protein modeling
 +
                                        method, which
 +
                                        requires one or more template structures as a base on which to model the protein.
 +
                                        The protein
 +
                                        sequences are aligned with the sequence of the target protein. Unaligned
 +
                                        sections are
 +
                                        modeled using fragment or protein libraries, which leads to creating
 +
                                        <!-- aesthetics --> protein structures based
 +
                                        on different sequence homologues of the protein of interest.
 +
                                        <i>Ab-initio</i> or <i>de novo</i> modeling on the other hand attempts to find
 +
                                        protein
 +
                                        structures solely based on physicochemical principles applied to the primary
 +
                                        sequence, which
 +
                                        can be compared to the refolding of a denatured protein.</p>
 +
 
 +
                                    <p>RosettaCM combines <i>ab-initio modeling</i> with <i>homology modeling</i>. The
 +
                                        homologous structures for which a resolved 3D structure with sufficiently similar
 +
                                        sequence exists are generated using homology modeling. Afterwards the unaligned
 +
                                        sequences are modeled de novo. By combining the two methods RosettaCM
 +
                                        represents a precise and resource efficient tool for protein structure
 +
                                        prediction.
 +
                                        Rosetta applications rely on Monte-Carlo optimization, which is a
 +
                                        probabilistic
 +
                                        approach to finding a local minimum in the energy landscape of protein
 +
                                        conformations. The
 +
                                        underlying equation serving as the foundation of the statistical Monte-Carlo
 +
                                        <!-- ref original paper --> method is the Metropolis acceptance criterion:
 +
                                        $$p = \min\left(1, \exp\left[-\frac{\Delta E}{k_{B} \cdot T}\right]\right),$$
 +
                                        <br> where k<sub>B</sub> is the Boltzmann constant, &Delta;E the difference in
 +
                                        energy of the two states and T the temperature. The term 1/(k<sub>B</sub>T) can also
 +
                                        be written as a single factor &beta;.</p>
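                                    <p>The acceptance criterion above can be sketched in a few lines of Python. This
                                        is a minimal illustration and not Rosetta's implementation: the function names
                                        <code>acceptance_probability</code> and <code>metropolis_accept</code> and the
                                        parameter <code>kt</code> (the product k<sub>B</sub>T, in the same units as the
                                        energy) are assumptions made for this example.</p>

```python
import math
import random


def acceptance_probability(delta_e, kt):
    """Metropolis probability p = min(1, exp(-delta_e / kt)).

    delta_e: energy of the proposed state minus energy of the current state.
    kt: hypothetical parameter holding k_B * T in the same energy units.
    """
    if delta_e <= 0:
        # Downhill or neutral moves are always accepted (also avoids overflow).
        return 1.0
    return math.exp(-delta_e / kt)


def metropolis_accept(delta_e, kt, rng=random):
    """Return True if the proposed move is accepted.

    rng only needs a random() method returning a float in [0, 1).
    """
    return rng.random() < acceptance_probability(delta_e, kt)


if __name__ == "__main__":
    rng = random.Random(0)
    # An uphill move of 1 kT is accepted only part of the time.
    trials = 10000
    accepted = sum(metropolis_accept(1.0, 1.0, rng) for _ in range(trials))
    print("accepted fraction:", accepted / trials)
```

                                    <p>Accepting some uphill moves is what allows the search to leave shallow local
                                        minima; a larger <code>kt</code> makes such moves more likely.</p>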
 +
 
 +
                                    <p>
 +
                                        During the statistical protein folding based on the Monte-Carlo method, the
 +
                                        initial
 +
                                        structure is changed by small random perturbations of the atom locations.
 +
                                        Whether the structure is accepted or
 +
                                        not is decided by the Metropolis acceptance criterion.
 +
                                        If &Delta;E &lt; 0, the structure is accepted; otherwise the newly proposed
 +
                                            structure is accepted with probability p as described in the Metropolis
 +
                                            acceptance criterion. </p> <h3 class="ausfahrbarer-boi">Procedure</h3>
 +
                                            <p>
 +
                                                The RosettaCM protocol requires evolutionarily related structures and
 +
                                                sequences,
 +
                                                as well as fragment files of the target structure.
 +
                                                The fragment files serve as a structure template for the proteins and
 +
                                                they
 +
                                                consist of peptide fragments of lengths 3 and 9.
 +
                                                We gathered five evolutionarily related structures from the RCSB PDB with
 +
                                                the
 +
                                                accession numbers:</p>
 +
                                            <ul>
 +
                                                <!-- TODO: add links for all structures -->
 +
                                                <li>1ija</li>
 +
                                                <li>1itw</li>
 +
                                                <li>1itp</li>
 +
                                                <li>1ito</li>
 +
                                                <li>2mlm</li>
 +
                                            </ul>
 +
                                            <br>
 +
                                            <p>
 +
                                                The five RCSB entries represent different structures of sortases from
 +
                                                <i>Staphylococcus aureus</i>.
 +
                                                Fragment files can be created with the Robetta <a
 +
                                                    href="http://robetta.bakerlab.org/"
 +
                                                    target="_blank">online server</a> or with the Rosetta FragmentPicker
 +
                                                application.
 +
                                            </p>
 +
                                            <p>The RosettaCM procedure is best described in the following steps:</p>
 +
                                            <!-- source: RosettaCM page -->
 +
                                            <ol>
 +
                                                <li>sequence and structural alignment of templates</li>
 +
                                                <li>fragment insertion in unaligned sections</li>
 +
                                                <li>replacement of random segment with segment from a different template
 +
                                                    structure</li>
 +
                                                <li>energy minimization</li>
 +
                                                <li>all-atom optimization</li>
 +
 
 +
                                            </ol>
 +
                                            <br>
 +
                                            <p>
 +
                                                The alignment can be performed with various tools. We used <a
 +
                                                    href="https://mafft.cbrc.jp/alignment/server/"
 +
                                                    target="_blank">MAFFT</a> to
 +
                                                generate the multiple sequence alignments.
 +
                                                Prior to using the alignments as an input, they were converted to the
 +
                                                Grishin
 +
                                                alignment format as RosettaCM requires the alignments to be in said
 +
                                                format.
 +
                                                The minimization is performed using the Rosetta centroid energy
 +
                                                function. For
 +
                                                the centroid function to be applied, the protein is converted to the
 +
                                                centroid
 +
                                                representation. A protein in centroid representation consists of the
 +
                                                backbone
 +
                                                atoms N, C<sub>&alpha;</sub>, O<sub>Carbonyl</sub> and an atom of
 +
                                                varying size representing the
 +
                                                side chain. The advantage of using the centroid representation is that
 +
                                                the
 +
                                                energy landscape can be traversed more easily due to the smoother nature of
 +
                                                the
 +
                                                centroid energy landscape.
 +
                                                Finally, the generated structure undergoes a second minimization in an
 +
                                                all-atom model by
 +
                                                means of Monte-Carlo optimization. This is similar to the energy
 +
                                                minimization but without the amino acids being
 +
                                                represented as centroids of their functional groups. Structures computed
 +
                                                through
 +
                                                all-atom optimizations can reach atomic resolutions
 +
                                                {{Source: Rosetta paper}}
 +
                                                which is crucial for a model meant to be used to estimate atomic
 +
                                                interactions.
 +
                                            </p>
 +
 
 +
                                            <h3>Results</h3>
 +
                                            <p>
 +
                                                The run yielded 15,000 structures, which were compared using the
 +
                                                Rosetta
 +
                                                scoring functions (talaris2013).
 +
                                                <!-- scoring -->
 +
                                                From the 15,000 structures generated, we inspected the ten best scoring
 +
                                                structures. </p>
 +
 
 +
                                            <p>As can be seen in Figure 2, the most prominent differences can
 +
                                                be found in the regions close to the N- and C-terminus. As
 +
                                                fluctuations in those
 +
                                                regions are common, we decided to use the best-scoring
 +
                                                structure, candidate S_14771 (Figure 3), as the input for the
 +
                                                simulations to follow.</p>
 +
 
 +
 
 +
                                            <div class="row">
 +
                                                <div class="figurcolumn column"
 +
                                                    style="width: 50%; float: left;  padding: 1em;">
 +
                                                    <img class="img-fluid center"
 +
                                                        src="https://2019.igem.org/wiki/images/4/40/T--TU_Darmstadt--top10_corporate.png"
 +
                                                        style="width:100%">
 +
                                                    <p><b>Figure 2</b>: The structural alignment of the ten best scoring
 +
                                                        sortase structures
 +
                                                        displaying minor differences with the exception of the C- and
 +
                                                        N-terminal
 +
                                                        regions. N- and C-terminal regions tend to show strong
 +
                                                        fluctuations, thus it is
 +
                                                        unsurprising to find the terminal regions to be unaligned.</p>
 +
                                                </div>
 +
                                                <div class="figurcolumn column"
 +
                                                    style="width: 50%; float: right;  padding: 1em;">
 +
                                                    <img class="img-fluid center"
 +
                                                        src="https://2019.igem.org/wiki/images/b/b3/T--TU_Darmstadt--s14771.gif"
 +
                                                        style="width:100%">
 +
                                                    <p><b>Figure 3</b>: Sortase A7M candidate S_14771 created through
 +
                                                        RosettaCM.</p>
 +
                                                </div>
 +
                                            </div>
 +
 
 +
                                            <div class="figurcolumn column" style="width: 70%; padding: 1em;">
 +
                                                <img class="img-fluid center"
 +
                                                    src="https://2019.igem.org/wiki/images/8/8d/T--TU_Darmstadt--dihedral.png"
 +
                                                    style="width:100%">
 +
                                                <p><b>Figure 4</b>: The dihedral angles of amino acids can be
 +
                                                    calculated to create a Ramachandran plot. </p>
 +
                                            </div>
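                                            <p>As a rough illustration of how the backbone dihedral angles behind a
                                                Ramachandran plot can be computed from atom coordinates, the following
                                                Python sketch may help. It is not the tool used to produce the figures
                                                on this page: the function names <code>dihedral</code> and
                                                <code>phi_psi</code> are hypothetical, and coordinates are assumed to
                                                be plain (x, y, z) tuples.</p>

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees, in (-180, 180], defined by four points."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev, n, ca, c, n_next):
    """Return (phi, psi) for one residue.

    phi uses C(i-1), N(i), CA(i), C(i); psi uses N(i), CA(i), C(i), N(i+1).
    """
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)


if __name__ == "__main__":
    # Four points in a planar zig-zag (trans) arrangement: magnitude 180 degrees.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))
```

                                            <p>Plotting the resulting (phi, psi) pair of every residue as one point
                                                gives the Ramachandran plot.</p>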
 +
 
 +
                                            <!-- needs revision -->
 +
 
 +
                                            <p>To evaluate the secondary structure, as was done for the structure
 +
                                            obtained through deep learning, a Ramachandran plot of the dihedral angles
 +
                                            of the five sortases used as inputs was created.
 +
                                            Ramachandran plots of dihedral angles (fig x) can be a first indicator
 +
                                            of whether the computed structures are valid.</p>
 +
 
 +
                                            <div class="row">
 +
                                                <div class="figurcolumn column"
 +
                                                    style="width: 50%; float: left;  padding: 1em;">
 +
                                                    <img class="img-fluid center"
 +
                                                        src="https://2019.igem.org/wiki/images/2/28/T--TU_Darmstadt--ramachandran_s14711.png"
 +
                                                        style="width:100%">
 +
                                                </div>
 +
                                                <div class="figurcolumn column"
 +
                                                    style="width: 50%; float: right;  padding: 1em;">
 +
                                                    <img class="img-fluid center"
 +
                                                        src="https://2019.igem.org/wiki/images/1/18/T--TU_Darmstadt--Ramachandran_Plot.png"
 +
                                                        style="width:100%">
 +
                                                </div>
 +
                                            </div>
 +
                                            <div class="row">
 +
                                                <div class="figurcolumn column"
 +
                                                    style="width: 50%; float: left;  padding: 1em;">
 +
                                                    <img class="img-fluid center"
 +
                                                        src="https://2019.igem.org/wiki/images/e/ee/T--TU_Darmstadt--ramachandran_five_sortases.png"
 +
                                                        style="width:100%">
 +
                                                </div>
 +
                                                <div class="figurcolumn column"
 +
                                                    style="width: 50%; float: right;  padding: 1em;">
 +
                                                    <img class="img-fluid center"
 +
                                                        src="https://2019.igem.org/wiki/images/7/73/T--TU_Darmstadt--Comp_Ramachandran.PNG"
 +
                                                        style="height:82.5%; padding-top: 1.8em; padding-bottom: 2.5em;">
 +
                                                </div>
 +
                                            </div>
 +
 
 +
                                            <p><b>Figure 5: </b> The comparison of the Ramachandran plot of
 +
                                                structure S_14771 and the Ramachandran plot found on <a
 +
                                                    href="https://proteopedia.org/wiki/images/9/90/Ramachandran_plot_general_100K.jpg">Proteopedia</a>
 +
                                                suggests that secondary structures are present. Hence the structure
 +
                                                appears
 +
                                                to contain &alpha;-helices, &beta;-sheets and a small amount of
 +
                                                left-handed
 +
                                                &alpha;-helices. </p>
 +
                                            <p>The Ramachandran plot (Figure xzy) showing &alpha;-helices and
 +
                                            &beta;-sheets is a
 +
                                            strong indicator of a successful structure determination, as those
 +
                                            secondary
 +
                                            structures are crucial for the functionality of sortases.</p>
 +
 
 +
                                            <h2>Conclusion</h2>
 +
                                            <p>
 +
                                                We used machine learning methods, as well as Monte-Carlo simulations
 +
                                                to
 +
                                                determine the structure of the mutated transpeptidase Sortase A7M.
 +
                                                The machine
 +
                                                learning approach using AlQuraishi's deep neural network yielded a
 +
                                                structure which seemed to
 +
                                                not have any secondary structures. To exclude the possibility of an
 +
                                                error in the
 +
                                                PyMOL visualization software by Schrödinger, a Ramachandran plot
 +
                                                (figure xyz)
 +
                                                was created. The plot shows that no typical secondary structures are
 +
                                                present
 +
                                                which is a strong indicator of a failed approach to determine a
 +
                                                structure.
 +
                                                The second approach, using <i>Rosetta Comparative Modeling</i>, yielded
 +
                                                15,000
 +
                                                structures scored with the talaris2013 scoring function. The ten
 +
                                                best structures
 +
                                                were aligned and exhibited almost identical secondary structures
 +
                                                (figure xzy).
 +
                                                The greatest structural differences are present in the N- and
 +
                                                C-terminal
 +
                                                regions. Since terminal regions tend to fluctuate more strongly than
 +
                                                non-terminal segments of the protein, we deemed those fluctuations
 +
                                                irrelevant
 +
                                                for the protein's functionality.
 +
                                                <br>
 +
                                                Being the best scoring candidate, structure S_14771 was analyzed
 +
                                                structurally
 +
                                                using a Ramachandran plot (figure xyz). The plot shows all the
 +
                                                relevant and
 +
                                                typical structures that sortases exhibit and serves as an indicator of
 +
                                                a
 +
                                                successful structure prediction.
 +
                                                <br>
 +
                                                In the steps to follow, a molecular dynamics (MD)
 +
                                                simulation will be performed on both structures. Even though
 +
                                                structure CASP12
 +
                                                does not seem to be a valid structure, refolding processes during an
 +
                                                MD
 +
                                                simulation might lead to a relaxation of the protein and allow for a
 +
                                                promising
 +
                                                prediction of the Sortase A7M structure.
 +
                                            </p>
 +
                                            <h2>References</h2>
 +
                                            <ol class="references">
 +
                                                <li id="cite_note-1">
 +
                                                    <span class="mw-cite-backlink">
 +
                                                        <a href="#cite_ref-1">↑</a>
 +
                                                    </span>
 +
                                                    <span class="reference-text">
 +
                                                        Bishop, C. M., Neural Networks for Pattern Recognition.
 +
                                                        Oxford University
 +
                                                        Press,
 +
                                                        1995.
 +
                                                        <a rel="nofollow" class="external autonumber"
 +
                                                            href="https://www.biorxiv.org/content/10.1101/265231v1"
 +
                                                            target="_blank">[1] </a>
 +
                                                    </span>
 +
                                                </li>
 +
                                                <li id="cite_note-2">
 +
                                                    <span class="mw-cite-backlink">
 +
                                                        <a href="#cite_ref-2">↑</a>
 +
                                                    </span>
 +
                                                    <span class="reference-text">
 +
                                                        AlQuraishi, M., End-to-End Differentiable Learning of
 +
                                                        Protein
 +
                                                        Structure. Cell Systems, 2019. 8: 1–10.
 +
                                                        <a rel="nofollow" class="external autonumber"
 +
                                                            href="https://www.biorxiv.org/content/10.1101/265231v1"
 +
                                                            target="_blank">[2] </a>
 +
                                                    </span>
 +
                                                </li> <!-- dihedral junge shrinken -->
 +
                                                <li id="cite_note-3">
 +
                                                    <span class="mw-cite-backlink">
 +
                                                        <a href="#cite_ref-3">↑</a>
 +
                                                    </span>
 +
                                                    <span class="reference-text">
 +
                                                        Leaver-Fay, A. et al., ROSETTA3: an object-oriented software
 +
                                                        suite for
 +
                                                        the
 +
                                                        simulation and design of
 +
                                                        macromolecules. Methods Enzymol, 2011. 487:545-74.
 +
                                                        <a rel="nofollow" class="external autonumber"
 +
                                                            href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4083816/"
 +
                                                            target="_blank">[3]
 +
                                                        </a>
 +
                                                    </span>
 +
                                                </li>
 +
                                                <li id="cite_note-4">
 +
                                                    <span class="mw-cite-backlink">
 +
                                                        <a href="#cite_ref-4">↑</a>
 +
                                                    </span>
 +
                                                    <span class="reference-text">
 +
                                                        Moult, J. et al., Critical assessment of methods of protein
 +
                                                        structure
 +
                                                        prediction
 +
                                                        (CASP)—Round 6. PROTEINS:
 +
                                                        Structure, Function, and Bioinformatics, 2005. Suppl 7:3–7.
 +
                                                        <a rel="nofollow" class="external autonumber"
 +
                                                            href="https://onlinelibrary.wiley.com/doi/pdf/10.1002/prot.20716"
 +
                                                            target="_blank">[4] </a>
 +
                                                    </span>
 +
                                                </li>
 +
                                            </ol>
 +
                                </div>
 
                             </div>
 
                             </div>
 
                         </div>
 
                         </div>
                        <p>For analysis, the structure was viewed in PyMOL<!-- PyMOL reference -->. As can be seen in the pictures below,
+
 
                            no secondary structures could be recognized by PyMOL. Thus, a Ramachandran plot was used to
+
                    </div>
                            evaluate the dihedral angles of the backbone. It was found that the angles do not match with
+
                </div>
                            the typical angles for &alpha;-helices and &beta;-sheets.</p>
+
 
 +
                <div class="tab my-3">
 +
                    <button class="btn btn-block" id="tab1" data-toggle="collapse" data-target="#body3"
 +
                        aria-expanded="false" aria-controls="collapseOne">
 +
                        Molecular dynamics
 +
                    </button>
 +
                </div>
 +
 
 +
 
 +
                <div class="collapse multi-collapse" id="body3">
 +
                    <div class="card card-body">
 
                         <div class="row">
 
                         <div class="row">
                             <div class="figurcolumn column" style="width: 50%; float: left;  padding: 1em;">
+
                             <div class="col-xs-12 col-sm-12 col-md-2">
                                <img class="img-fluid center"
+
                                 <img class="img-fluid"
                                    src="https://2019.igem.org/wiki/images/8/82/T--TU_Darmstadt--CASP_CHAINSCartoon.gif"
+
                                     src="https://2019.igem.org/wiki/images/4/4e/T--TU_Darmstadt--MD_Modeling.png">
                                    style="width:100%">
+
                                <p><b>Animation 3: </b>The cartoon view in Pymol.</p>
+
                            </div>
+
                            <div class="figurcolumn column" style="width: 50%; float: right;  padding: 1em;">
+
                                 <img class="img-fluid center"
+
                                     src="https://2019.igem.org/wiki/images/1/18/T--TU_Darmstadt--Ramachandran_Plot.png"
+
                                    style="width:100%">
+
                                <p> <b>Figure 1: </b>Ramachandran plot of the predicted structure.</p>
+
 
                             </div>
 
                             </div>
                        </div>
+
                            <div class="col-xs-12 col-sm-12 col-md-10">
                        <p>During training, the predictions of AlQuraishi’s model were optimized for their RMSD, which is
+
                                <div class="flex-center">
                            the root-mean-square deviation of the distance between the atoms of the prediction and
+
                                    <h2>Introduction</h2>
                            reference
+
                            structure. Thus, even though the predictions are expected to have a similar shape to
+
                            the physical structure, they may not be in the energy minimum. Hence, we applied a
+
                            GROMACS molecular
+
                            dynamics simulation in order to relax the structure obtained by AlQuraishi’s deep learning model.</p>
+
                        <h2>RosettaCM</h2>
+
                        <h3 class="ausfahrbarer-boi">Background</h3>
+
  
                        <p>In our second approach we used the <a href="rosettacommons.org"
+
                                    <p>The structure predictions made so far were based on statistical methods with
                                target="_blank"><i>RosettaCommons</i></a><i> comparative modeling (<a>RosettaCM</a>)</i>, which
+
                                        physical
                            is based on homology modeling. <i>Homology modeling</i> is a protein modeling method, which
+
                                        constraints. The Deep
                            requires one or more template structures on which the model of the target protein is based. The template
+
                                        Learning algorithm uses a neural network trained to find a function associating
                            sequences are aligned with the sequence of the target protein. Unaligned sections are
+
                                        the
                            modeled using fragment or protein libraries, which yields <!-- ästhetik --> protein structures based
+
                                        amino acid sequence with
                            on different sequence homologues of the protein of interest.
+
                                        the final 3D positions of the atoms within the protein. On the other hand,
                            <i>Ab initio</i> or <i>de novo</i> modeling, on the other hand, attempts to find protein
+
                                        predictions
                            structures solely based on physicochemical principles applied to the primary sequence, which
+
                                        were made with Rosetta
                            can be compared to the refolding of a denatured protein.</p>
+
                                        using the Monte Carlo method. Here, individual atoms are moved randomly,
 +
                                        and the
 +
                                        energy is estimated after
 +
                                        each step.</p>
 +
 
 +
                                    <div class="row">
 +
                                        <div class="figurcolumn column"
 +
                                            style="width: 80%; float: right;  padding: 1em;">
 +
                                            <img class="img-fluid center"
 +
                                                src="https://2019.igem.org/wiki/images/0/08/T--TU_Darmstadt--MoleculeInWater.png"
 +
                                                style="width:100%">
 +
                                            <p><b>Figure 6: </b>Sortase A7M in a force field surrounded by discrete
 +
                                                water molecules. Image was made with …. </p>
 +
                                        </div>
 +
                                    </div>
  
                        <p>RosettaCM combines <i>ab initio modeling</i> with <i>homology modeling</i>. Regions for which a resolved 3D structure with a sufficiently similar sequence exists are modeled by homology. Afterwards, the unaligned regions are modeled <i>de novo</i>. By combining the two methods, RosettaCM
 
                            is a precise and resource-efficient tool for protein structure prediction.
 
                            Rosetta applications rely on Monte Carlo optimization, which is a probabilistic
 
                            approach to finding a local minimum in the energy landscape of protein conformations. The
 
                            equation underlying the Monte Carlo <!-- ref original paper --> method is the Metropolis acceptance criterion:
 
                            $$p = \min\left(1, \exp\left[-\Delta E / (k_{B} \cdot T)\right]\right),$$
 
                            <br> where k<sub>B</sub> is the Boltzmann constant, &Delta;E the difference in energy between the two states and T the temperature. The factor 1/(k<sub>B</sub>T) is commonly abbreviated as &beta;.</p>
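                            <p>The acceptance criterion can be illustrated with a minimal Python sketch. This is not the Rosetta implementation: the one-dimensional energy function <code>toy_energy</code>, the step size, the value of <code>kT</code> and the helper names are hypothetical and only serve to show how moves are accepted or rejected.</p>

```python
import math
import random


def metropolis_accept(delta_e, kT, rng=random):
    """Return True if a move with energy change delta_e is accepted."""
    if delta_e <= 0:
        return True  # moves that do not raise the energy are always accepted (p = 1)
    p = math.exp(-delta_e / kT)  # p = exp(-dE / (k_B * T)) for uphill moves
    return rng.random() < p


def toy_energy(x):
    """Hypothetical 1D energy landscape with its minimum at x = 2."""
    return (x - 2.0) ** 2


def minimize(steps=5000, kT=0.1, step_size=0.5, seed=0):
    """Toy Monte Carlo search: perturb, score, then accept or reject."""
    rng = random.Random(seed)
    x = 0.0
    energy = toy_energy(x)
    for _ in range(steps):
        candidate = x + rng.uniform(-step_size, step_size)  # small random perturbation
        candidate_energy = toy_energy(candidate)
        if metropolis_accept(candidate_energy - energy, kT, rng):
            x, energy = candidate, candidate_energy
    return x, energy
```

                            <p>Occasionally accepting uphill moves is what allows the search to leave shallow local minima; a lower <code>kT</code> makes such moves rarer.</p>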
 
 
                            <p>
 
                                During the statistical protein folding based on the Monte-Carlo method, the initial
 
                                structure is changed by small random perturbations of the atom positions. Whether the new structure is accepted or
 
                                not is decided by the Metropolis acceptance criterion.
 
                                If &Delta;E &lt; 0, the structure is accepted; otherwise the newly proposed structure is
 
                                    accepted with probability p as described in the Metropolis acceptance criterion. </p> <h3
 
                                    class="ausfahrbarer-boi">Procedure</h3>
 
 
                                     <p>
 
                                     <p>
                                         The RosettaCM protocol requires evolutionarily related structures and sequences,
+
                                         Even though both methods use physical constraints to find plausible protein
                                         as well as fragment files of the target structure.
+
                                        structures, neither of them actually
                                         The fragment files serve as a structure template for the proteins and they
+
                                         simulates the behavior of these molecules within a physical force field.
                                         consist of peptide fragments of sizes 3 and 9.
+
                                         Moreover, both methods do not necessarily output fully relaxed protein
                                         We gathered five evolutionarily related structures from the RCSB PDB with the
+
                                        structures and simulate water implicitly by preferring hydrophilic parts of the
                                         accession numbers:</p>
+
                                        proteins to be on the outside. Thus, we conducted a molecular dynamics (MD)
                                    <ul>
+
                                         simulation to verify the plausibility of our protein structure and allow
<!-- INSERT LINKS FOR ALL STRUCTURES -->
+
                                        equilibration.
                                         <li>1ija</li>
+
                                         The molecular dynamics simulation provides the opportunity to simulate water as
                                         <li>1itw</li>
+
                                        discrete molecules, creating a solvated protein. This step is crucial to
                                         <li>1itp</li>
+
                                        validate the structures, as the interaction with water is one of the primary
                                         <li>1ito</li>
+
                                         mechanisms of protein folding.
                                         <li>2mlm</li>
+
                                        Since neither candidate CASP12 nor S_14771 has been modeled with explicit water,
                                     </ul>
+
                                        a corresponding MD simulation is imperative to
                                    <br>
+
                                         verify the correctness of the candidates' conformations.
 +
                                         This is, of course, much more expensive in terms of computational resources, as
 +
                                         the protein has to be placed in a simulation box
 +
                                         and said box is filled with water molecules. This is called solvation and is
 +
                                         visualized for candidate S_14771 in figure eeeeee.
 +
                                     </p>
 +
 
 +
 
 
                                     <p>
 
                                     <p>
                                         The five RCSB entries represent different structures of sortases from
+
                                         We used GROMACS (GROningen MAchine for Chemical Simulations)
                                         <i>Staphylococcus aureus</i>.
+
                                         <!-- cite --> as the tool for our molecular dynamics simulations. GROMACS solves
                                         Fragment files can be created with the Robetta <a
+
                                        Newton's
                                             href="http://robetta.bakerlab.org/"
+
                                        equations of motion for
                                             target="_blank">online server</a> or with the Rosetta FragmentPicker
+
                                        individual atoms
                                         application.
+
                                         <sup id="cite_ref-1" class="reference">
 +
                                             <a href="#cite_note-1">[1] </a>
 +
                                        </sup>
 +
                                        . While this classical simulation is much more accurate than predictions made by
 +
                                        the
 +
                                        other methods,
 +
                                        approximations are used nonetheless: Forces are cut after a certain radius and
 +
                                        the system
 +
                                        size is quite small.
 +
                                        <sup id="cite_ref-1" class="reference">
 +
                                             <a href="#cite_note-1">[1] </a>
 +
                                        </sup>
 +
                                        Additionally, atoms are assumed to be classical particles, which is not the
 +
                                         case, as quantum mechanics plays a role in particle-particle interactions.
 +
                                        Still, this simulation is very computationally expensive. Therefore, only time
 +
                                        periods far shorter
 +
                                        than one second could be
 +
                                        simulated.
 
                                     </p>
 
                                     </p>
                                    <p>The RosettaCM procedure is best described in the following steps:</p>
 
                                    <!-- quelle auf rosetta cm seite-->
 
                                    <ol>
 
                                        <li>sequence and structural alignment of templates</li>
 
                                        <li>fragment insertion in unaligned sections</li>
 
                                        <li>replacement of random segment with segment from a different template
 
                                            structure</li>
 
                                        <li>energy minimization</li>
 
                                        <li>all-atom optimization</li>
 
  
                                     </ol>
+
                                     <h2>Methods</h2>
<br>
+
 
                                     <p>
 
                                     <p>
                                         The alignment can be performed with various tools. We used <a
+
                                         To perform the molecular dynamics simulations we mostly followed the <a
                                             href="https://mafft.cbrc.jp/alignment/server/" target="_blank">MAFFT</a> to
+
                                             href="http://www.mdtutorials.com/gmx/lysozyme/01_pdb2gmx.html"
                                         generate the multiple sequence alignments.
+
                                            target="_blank">GROMACS lysozyme tutorial</a>, as it serves our purpose
                                        Prior to using the alignments as an input, they were converted to the grishin
+
                                         perfectly. We used a dodecahedral simulation box with a 0.7
                                        alignment format as RosettaCM requires the alignments to be in said format.
+
                                         nm distance between the solute and the box borders. We used periodic boundary
                                         The minimization is performed using the Rosetta centroid energy function. For
+
                                         conditions and a Na<sup>+</sup> Cl<sup>-</sup> concentration of 0.012 mol/L. The
                                        the centroid function to be applied, the protein is converted to the centroid
+
                                        main difference in our approach was that we used the CHARMM36
                                        representation. A protein in centroid representation consists of the backbone
+
                                         <!-- cite --> force field instead of the OPLS-AA/L force field and adjusted
                                         atoms N, C<sub>&alpha;</sub>, O<sub>Carbonyl</sub> and an atom of varying size representing the
+
                                         our molecular dynamics parameters <a
                                        side chain. The advantage of using the centroid representation is that the
+
                                            href="http://www.gromacs.org/Documentation/Terminology/Force_Fields/CHARMM"
                                         energy landscape can be traversed more easily due to the smoother nature of the
+
                                            target="_blank">accordingly</a>.
                                         centroid energy landscape.
+
                                         The simulation was performed on an NVIDIA GTX 760 graphics card, allowing us to
                                        Finally, the generated structure undergoes a second minimization in an all-atom model by
+
                                         simulate approximately 1 ns per hour.
                                        means of Monte-Carlo optimization. This is similar to the energy minimization but without the amino acids being
+
                                        represented as centroids of their functional groups. Structures computed through
+
                                         all-atom optimization can reach atomic resolution {{source: Rosetta paper}},
+
                                        which is crucial for a model meant to be used to estimate atomic
+
                                         interactions.
+
 
                                     </p>
 
                                     </p>
  
                                    <h3>Results</h3>
 
 
                                     <p>
 
                                     <p>
                                         The run yielded 15,000 structures, which were compared using the Rosetta
+
                                        To analyse the MD simulation we used the Python programming language and the <a
                                         scoring function (talaris2013).
+
                                            href="https://www.biotite-python.org/" target="_blank">Biotite package</a>
                                         <!-- scoring -->
+
                                        <!-- cite --> as well as GROMACS analysis tools such as
                                         From the 15,000 structures generated, we inspected the ten best scoring
+
                                        <!-- links zu den jungs--> <a>covar</a> and anaeig.
                                         structures.
+
                                         The first analyses are a root-mean-square deviation (RMSD), a root-mean-square
 +
                                        fluctuation (RMSF) and a radius of gyration analysis.
 +
                                        RMSD calculations have been described in the structure prediction section. To
 +
                                        compute the RMSF the movement distance of each
 +
                                        residue is computed as a root-mean-square over time as:
 +
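                                         $$ RMSF_i = \sqrt{ \frac{1}{N_t} \sum_{t=1}^{N_t} \left( v_i(t) - \langle v_i \rangle \right)^2 }, $$
 +
                                        where v<sub>i</sub>(t) is the position of atom i at time t, &lang;v<sub>i</sub>&rang; is its time-averaged position and N<sub>t</sub> is the number of time steps. The radius of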
 +
                                        gyration is
 +
                                         <!-- überarbeiten -->
 +
                                         The final analysis performed on the MD simulation is called Principal Component
 +
                                        Analysis (PCA).
 +
                                        By applying PCA to a protein it is possible to gain insights into the relevant
 +
                                         vibrational motions and thereby the physical mechanism of the protein
 +
                                        <!-- zitat -->.
 +
                                    </p>
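                                    <p>The three quantities can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions, not the Biotite or GROMACS routines used for the actual analysis: frames are not superimposed before the RMSD is computed, all atoms are given equal mass, the RMSF is taken relative to the time-averaged position, and the two-atom <code>trajectory</code> is made up.</p>

```python
import math


def rmsd(frame, reference):
    """RMSD of one frame to a reference frame: average over atoms."""
    n_atoms = len(frame)
    return math.sqrt(
        sum(math.dist(a, b) ** 2 for a, b in zip(frame, reference)) / n_atoms
    )


def rmsf(trajectory):
    """Per-atom RMSF: average over time, relative to the mean position."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for i in range(n_atoms):
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        msf = sum(math.dist(frame[i], mean) ** 2 for frame in trajectory) / n_frames
        result.append(math.sqrt(msf))
    return result


def radius_of_gyration(frame):
    """Radius of gyration of one frame, assuming equal atomic masses."""
    n_atoms = len(frame)
    center = [sum(atom[k] for atom in frame) / n_atoms for k in range(3)]
    return math.sqrt(sum(math.dist(atom, center) ** 2 for atom in frame) / n_atoms)


# Made-up trajectory: atom 0 is fixed, atom 1 moves along the x axis.
trajectory = [
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
]
rmsd_to_first = [rmsd(frame, trajectory[0]) for frame in trajectory]
rmsf_per_atom = rmsf(trajectory)
gyration_per_frame = [radius_of_gyration(frame) for frame in trajectory]
```

                                    <p>The RMSD and the radius of gyration give one value per frame and are therefore plotted over time, whereas the RMSF gives one value per atom or residue.</p>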
  
                            <div class="figurcolumn column" style="width: 50%; float: right; padding: 1em;">
+
                                    <h2>Results</h2>
                                <img class="img-fluid center"
+
                                    <h3>First indicators</h3>
                                    src="https://2019.igem.org/wiki/images/4/40/T--TU_Darmstadt--top10_corporate.png"
+
                                    <p>
                                    style="width:100%">
+
                                        The first possible indicators of a stable protein structure are converging RMSD,
                                <p><b>Figure 2</b>: The structural alignment of the ten best scoring sortase structures
+
                                        small RMSF values
                                        displaying minor differences with the exception of the C- and N-terminal
+
                                        as well as converging radii of gyration. Using the Python software package and
                                        regions. N- and C-terminal regions tend to show strong fluctuations, thus it is
+
                                        the module Biotite we calculated
                                        unsurprising to find the terminal regions to be unaligned.</p>
+
                                        these quantities and plotted the results for both candidate S_14771 and
                            </div>
+
                                        candidate CASP12.
                                        <p>As can be seen in figure 5, the most prominent differences can
+
                                    </p>
                                         be found in the regions close to the N- and C-terminus. As fluctuations in those
+
                                    <div class="row">
                                        regions are not unusual, we decided to use the best scoring structure, candidate S_14771 (figure 6), as the input for the simulations to follow.</p>
+
                                        <div class="figurcolumn column" style="width: 50%; float: left; padding: 1em;">
 +
                                            <img class="img-fluid center"
 +
                                                src="https://2019.igem.org/wiki/images/4/4f/T--TU_Darmstadt--rmsd_s14771.png"
 +
                                                style="width:100%">
 +
                                            <p>
 +
                                                <b>Figure 7: </b> The RMSD is one of three main indicators of a stable
 +
                                                protein structure of the MD simulation of
 +
                                                S_14771 over the period of 200,000 ps. As time progressed the RMSD
 +
                                                increased with a decreasing slope.
 +
                                                The value stabilized at a time of 110,000 ps and fluctuated around the
 +
                                                value of 6 &#8491;.
 +
                                            </p>
 +
                                         </div>
  
                            <div class="figurcolumn column" style="width: 50%; float: right; padding: 1em;">
+
                                        <div class="figurcolumn column" style="width: 50%; float: right; padding: 1em;">
                                <img class="img-fluid center"
+
                                            <img class="img-fluid center"
                                    src="https://2019.igem.org/wiki/images/b/b3/T--TU_Darmstadt--s14771.gif"
+
                                                src="https://2019.igem.org/wiki/images/a/aa/T--TU_Darmstadt--rmsd_casp.png"
                                    style="width:100%">
+
                                                style="width:100%">
                                <p><b>Figure 3</b>: Sortase A7M candidate S_14771 created through RosettaCM.</p>
+
                                            <p>
                            </div>
+
                                                <b>Figure 8: </b> Already at t = 40,000 ps, the RMSD has arrived at a
                           
+
                                                stable value, while at the same time
                        <div class="figurcolumn column" style="width: 70%; padding: 1em;">
+
                                                the radius of gyration (fig x) decreases continuously over time. This
                            <img class="img-fluid center"
+
                                                information suggests the protein
                                src="https://2019.igem.org/wiki/images/8/8d/T--TU_Darmstadt--dihedral.png"
+
                                                might be folding and potentially developing secondary structures not
                                style="width:100%">
+
                                                present previously.
                            <p><b>Figure 4</b>: The dihedral angles of amino acids can be calculated to create a Ramachandran plot. </p>
+
                                            </p>
                        </div>
+
                                        </div>
 +
                                        <div class="figurcolumn column" style="width: 50%; float: left; padding: 1em;">
 +
                                            <img class="img-fluid center"
 +
                                                src="https://2019.igem.org/wiki/images/9/94/T--TU_Darmstadt--gyration_s14771.png"
 +
                                                style="width:100%">
 +
                                            <p>
 +
                                                <b>Figure 9: </b> The prominent fluctuations of the residues from
 +
                                                105 to 115 might
 +
                                                indicate a binding site or another form of functional structure. The
 +
                                                radius of gyration, just as
 +
                                                the RMSD fig xyz, stabilizes around a simulation time of 110,000 ps
 +
                                                and converges towards a value of
 +
                                                16.7 &#8491;.
 +
                                            </p>
 +
                                        </div>
  
<!-- needs to be revised -->
+
                                        <div class="figurcolumn column" style="width: 50%; float: right; padding: 1em;">
 +
                                            <img class="img-fluid center"
 +
                                                src="https://2019.igem.org/wiki/images/0/03/T--TU_Darmstadt--gyration_casp.png"
 +
                                                style="width:100%">
 +
                                            <p>
 +
                                                <b>Figure 10: </b> From t = 40,000 ps onward, the radius of gyration
 +
                                                decreases steadily. At the end of the simulation the gyration radius
 +
                                                reaches a value of 17 &#8491;.
 +
                                                This behavior indicates folding of the protein structure.
 +
                                            </p>
 +
                                        </div>
  
                        To evaluate the secondary structure, as was done for the structure obtained through deep learning, a Ramachandran plot of the dihedral angles of the five sortases used as inputs was made.
+
                                        <div class="figurcolumn column" style="width: 50%; float: left; padding: 1em;">
                        Ramachandran plots of dihedral angles (fig x) can be a first indicator whether the structures computed are valid.
+
                                            <img class="img-fluid center"
 +
                                                src="https://2019.igem.org/wiki/images/f/f4/T--TU_Darmstadt--rmsf_s14771.png"
 +
                                                style="width:100%">
 +
                                            <p>
 +
                                                <b>Figure 11: </b> The fluctuations
 +
                                                (RMSF) of most residues appear insignificant compared to the first, the
 +
                                                last residues and
 +
                                                the residues close to residue 110. Typically, the N- and C-termini tend
 +
                                                to fluctuate more strongly due to the lack of
 +
                                                stabilizing structures. The prominent fluctuations in the range of
 +
                                                residue 105 to 115
 +
                                                can indicate a binding site or another form of functional structure.
 +
                                            </p>
 +
                                        </div>
  
                        <div class="row">
+
                                        <div class="figurcolumn column" style="width: 50%; float: right; padding: 1em;">
                            <div class="figurcolumn column" style="width: 50%; float: left; padding: 1em;">
+
                                            <img class="img-fluid center"
                                <img class="img-fluid center"
+
                                                src="https://2019.igem.org/wiki/images/a/aa/T--TU_Darmstadt--rmsf_casp.png"
                                    src="https://2019.igem.org/wiki/images/2/28/T--TU_Darmstadt--ramachandran_s14711.png"
+
                                                style="width:100%">
                                    style="width:100%">
+
                                            <p>
                            </div>
+
                                                <b>Figure 12: </b> The prominent fluctuations of the residues from
                            <div class="figurcolumn column" style="width: 50%; float: right;  padding: 1em;">
+
                                                105 to 115 might
                                <img class="img-fluid center"
+
                                                indicate a binding site or another form of functional structure. The
                                    src="https://2019.igem.org/wiki/images/1/18/T--TU_Darmstadt--Ramachandran_Plot.png"
+
                                                radius of gyration, just as
                                    style="width:100%">
+
                                                the RMSD fig xyz, stabilizes around a simulation time of 110,000 ps
                            </div>
+
                                                and converges towards a value of
                        </div>
+
                                                16.7 &#8491;.
                        <div class="row">
+
                                            </p>
                                <div class="figurcolumn column" style="width: 50%; float: left;  padding: 1em;">
+
                                        </div>
                                     <img class="img-fluid center"
+
                                    </div>
                                         src="https://2019.igem.org/wiki/images/e/ee/T--TU_Darmstadt--ramachandran_five_sortases.png"
+
                                    <br>
                                         style="width:100%">
+
                                     <p>
                                </div>
+
                                         Typical RMSDs and radii of gyration converge towards a value dependent on the
                                <div class="figurcolumn column" style="width: 50%; float: right;  padding: 1em;">
+
                                        size of the
                                    <img class="img-fluid center"
+
                                        protein. Convergence of those quantities can be interpreted as a stable state of
                                         src="https://2019.igem.org/wiki/images/7/73/T--TU_Darmstadt--Comp_Ramachandran.PNG"
+
                                        the protein
                                         style="height:82.5%;">
+
                                        structure. As can be seen in Figures x and y, both the RMSD and the radius of
                                </div>
+
                                        gyration
                            </div>
+
                                        stabilize at the same time, once the simulation reaches 110,000 ps (110 ns),
 +
                                        suggesting a now
 +
                                        stabilized structure of candidate S_14771 solvated in water. Another indicator
 +
                                        of a
 +
                                        functional protein is the RMSF. Instead of being averaged over all atoms, the
 +
                                        RMSF is
 +
                                        averaged over time with respect to each amino acid. It provides insights into both
 +
                                        protein
 +
                                        stability and functionality. Fig xzf reveals the RMSF of residues 105 to 115 to
 +
                                        be
 +
                                        significantly higher than that of other residues. This hints at the presence of
 +
                                        a
 +
                                        functional unit along these residues. As commented on in the section
 +
                                        describing our structure prediction approaches, the N-
 +
                                        and C-terminal regions tend to fluctuate more strongly as a result of the
 +
                                         absence of
 +
                                        stabilizing structures.
 +
                                    </p>
 +
                                    <p>
 +
                                        RMSD and radius of gyration calculations of candidate CASP12 (figures x and y)
 +
                                         provide evidence of folding.
 +
                                        However, the RMSF values are significantly higher, an
 +
                                        effect possibly caused by instability or refolding. Nevertheless, the strongest
 +
                                        fluctuations, disregarding the terminal regions, can be seen in the region of
 +
                                        residues 105 to
 +
                                        115. This insight consolidates the theory that residues 105 to 115 might be a
 +
                                         part of a
 +
                                        functional unit.
 +
                                    </p>
 +
                                    <p>
 +
                                        We were unsure whether candidate CASP12 can be considered a plausible structure
 +
                                        and
 +
                                        how to interpret the findings concerning the prominent fluctuations. Therefore,
 +
                                        we decided to perform a
 +
                                        <i>Principal Component Analysis</i>.
 +
                                    </p>
  
                                         <p><b>Figure 5: </b> The comparison of the Ramachandran plot of structure S_14771 and the Ramachandran plot found on <a href="https://proteopedia.org/wiki/images/9/90/Ramachandran_plot_general_100K.jpg">Proteopedia</a>
+
                                    <h3>Principal Component Analysis</h3>
                                         suggests that secondary structures are present. Hence the structure appears
+
                                    <p>
                                         to contain &alpha;-helices, &beta;-sheets and a small amount of left-handed
+
                                        To analyze our system further, Principal Component Analysis (PCA) was performed
                                         &alpha;-helices. </p>
+
                                        using GROMACS.
                                         The Ramachandran plot (Figure xzy) showing &alpha;-helices and &beta;-sheets is a
+
                                    </p>
                                         strong indicator of a successful structure determination, as those secondary
+
                                    <div class="figurcolumn column" style="width: 50%; float: left; padding: 1em;">
                                         structures are crucial for the functionality of sortases.
+
                                        <img class="img-fluid center"
 +
                                            src="https://2019.igem.org/wiki/images/d/db/T--TU_Darmstadt--modes_s14771.gif"
 +
                                            style="width:100%">
 +
                                         <p><b>Animation 4: </b> A Principal Component Analysis of a fast (blue) and a
 +
                                            slow (red) mode showing the most prominent movements of the C&alpha;-chain
 +
                                            of candidate S_14771. Both modes show movement of the &beta;6&#47;&beta;7
 +
                                            loop consisting of residues 105 to 115 towards the active site. Thus we can
 +
                                            assume that the closing &beta;6&#47;&beta;7 loop is involved in the reaction
 +
                                            mechanism. </p>
 +
                                    </div>
 +
 
 +
                                    <div class="figurcolumn column" style="width: 50%; float: right; padding: 1em;">
 +
                                        <img class="img-fluid center"
 +
                                            src="https://2019.igem.org/wiki/images/1/17/T--TU_Darmstadt--modes_casp.gif"
 +
                                            style="width:100%">
 +
                                        <p><b>Animation 5: </b> The modes of candidate CASP appear similar to each other
 +
                                            and no strong single movement can be specified. This makes the slow (red)
 +
                                            and fast (blue) mode indistinguishable from one another. Moreover, the active
 +
                                            site amino acids do not appear to be in close proximity, which would make a
 +
                                            reaction catalyzed by candidate CASP12 impossible. </p>
 +
                                    </div>
 +
 
 +
                                    <p>
 +
                                         The results from the Principal Component Analysis of candidate S_14771
 +
                                        (animation xy) show a movement of the residues 105 to 115 towards the active
 +
                                        site, supporting our theory that residues 105 to 115 are important for the
 +
                                        reaction mechanism. Since the slow mode (red), which shows the most relevant
 +
                                         movement of the sortase, moves further towards the active site, it is possible
 +
                                        that the &beta;6&#47;&beta;7 loop either closes the binding site of the ligand
 +
                                         peptides or even transports one peptide towards the other.
 +
                                    </p>
 +
 
 +
                                    <p>
 +
                                         Animation xyz shows the results of the Principal Component Analysis of candidate
 +
                                        CASP12. As the RMSF calculations suggested (fig xyz), the whole protein seems to
 +
                                         be moving randomly with no directed movement.
 +
                                         In addition, the active site amino acids
 +
                                        <!-- ref --> are spread across the protein confirming our assumption that the
 +
                                        protein is not in a stable or plausible conformation.
 +
                                    </p>
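                                    <p>
                                        The Principal Component Analysis described above was carried out with the
                                        GROMACS analysis tools. As an illustration of the underlying idea only, the
                                        following minimal sketch diagonalizes the covariance matrix of atom positions
                                        with NumPy. It assumes that the trajectory has already been superimposed and is
                                        available as an array of shape (frames, atoms, 3); the function name
                                        <code>pca_modes</code> and the toy trajectory are hypothetical and not part of
                                        our pipeline.
                                    </p>

```python
import numpy as np

def pca_modes(coords):
    """PCA of a superimposed trajectory.

    coords: array of shape (frames, atoms, 3).
    Returns eigenvalues (descending), eigenvectors (as columns) and the
    projection of every frame onto every eigenvector.
    """
    n_frames = coords.shape[0]
    flat = coords.reshape(n_frames, -1)        # (frames, 3 * atoms)
    centered = flat - flat.mean(axis=0)        # remove the average structure
    cov = centered.T @ centered / n_frames     # positional covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]          # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals, eigvecs, centered @ eigvecs

# Toy trajectory (assumption): three atoms with small noise, where only
# atom 1 performs a large oscillation along the x axis.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 200)
toy = rng.normal(scale=0.01, size=(200, 3, 3))
toy[:, 1, 0] += np.sin(t)

eigvals, eigvecs, projections = pca_modes(toy)
# The first mode is dominated by the x coordinate of atom 1 (flat index 3).
print(np.round(np.abs(eigvecs[:, 0]), 2))
```

                                    <p>
                                        In such a decomposition, the eigenvectors with the largest eigenvalues describe
                                        the collective motions that carry most of the positional variance, which is why
                                        a single dominant loop movement stands out clearly while random motion does not.
                                    </p>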
  
 
                                     <h2>Conclusion</h2>
 
                                     <h2>Conclusion</h2>
 
                                     <p>
 
                                     <p>
                                         We used machine learning methods, as well as Monte Carlo simulations, to
+
                                         We gained evidence that at least one of our Sortase A7M models is a valid and
                                        determine the structure of the mutated transpeptidase Sortase A7M. The machine
+
                                         stable candidate by performing various methods to analyse the structural
                                         learning approach using AlQuraishi's Deep Neural Network yielded a structure which seemed to
+
                                         stability and validity of our two Sortase A7M candidates. The candidate S_14771
                                         not have any secondary structures. To exclude the possibility of an error in the
+
                                         that was generated using <i>RosettaCM</i> appears to be a fitting candidate not
                                        PyMOL visualization software by Schrödinger, a Ramachandran plot (figure xyz)
+
                                         only due to successful analyses, but also since the residues of the active site
                                        was created. The plot shows that no typical secondary structures are present
+
                                         <!-- ref --> are close enough to each other to catalyze a ligation reaction.
                                         which is a strong indicator of a failed approach to determine a structure.
+
                                         Our model created through deep learning excelled only in terms of RMSD and
                                        The approach, using <i>Rosetta Comparative Modeling</i>, yielded 15,000
+
                                         gyration radius calculations. Not only the RMSF and Principal Component Analysis
                                         structures scored with the talaris2013 scoring function. The ten best structures
+
                                         but also the conformation of the active site have proven candidate CASP12 to be
                                         were aligned and exhibited almost identical secondary structures (figure xzy).
+
                                         of no use for further calculations as it does not portray a valid conformation
                                        The greatest structural differences are present in the N- and C-terminal
+
                                         of Sortase A7M.
                                        regions. Since terminal regions tend to fluctuate more strongly than
+
                                        non-terminal segments of the protein, we deemed those fluctuations non-relevant
+
                                        for the protein's functionality.
+
                                        <br>
+
                                        Being the best scoring candidate, structure S_14771 was analyzed structurally
+
                                        using a Ramachandran plot (figure xyz). The plot shows all the relevant and
+
                                         typical structures sortases exhibit and serves as an indicator for a
+
                                         successful structure prediction.
+
                                         <br>
+
                                        In the steps to follow, a molecular dynamics (MD)
+
                                        simulation will be performed on both structures. Even though structure CASP12
+
                                         does not seem to be a valid structure, refolding processes during an MD
+
                                         simulation might lead to a relaxation of the protein and allow for a promising
+
                                        prediction of the Sortase A7M structure.
+
 
                                     </p>
 
                                     </p>
 +
                                    </p>
 +
 +
 
                                     <h2>References</h2>
 
                                     <h2>References</h2>
 
                                     <ol class="references">
 
                                     <ol class="references">
Line 433: Line 976:
 
                                             </span>
 
                                             </span>
 
                                             <span class="reference-text">
 
                                             <span class="reference-text">
                                                 Bishop, C. M., Neural Networks for Pattern Recognition. Oxford University
+
                                                 Apol, E. et al., GROMACS
                                                 Press,
+
                                                 USER MANUAL. Department of Biophysical Chemistry, University of
                                                 1995.
+
                                                Groningen.
 +
                                                 2015.
 
                                                 <a rel="nofollow" class="external autonumber"
 
                                                 <a rel="nofollow" class="external autonumber"
 
                                                     href="https://www.biorxiv.org/content/10.1101/265231v1"
 
                                                     href="https://www.biorxiv.org/content/10.1101/265231v1"
 
                                                     target="_blank">[1] </a>
 
                                                     target="_blank">[1] </a>
                                            </span>
 
                                        </li>
 
                                        <li id="cite_note-2">
 
                                            <span class="mw-cite-backlink">
 
                                                <a href="#cite_ref-2">↑</a>
 
                                            </span>
 
                                            <span class="reference-text">
 
                                                AlQuraishi, M., End-to-End Differentiable Learning of Protein
 
                                                Structure. Cell Systems, 2019. 8: 1–10.
 
                                                <a rel="nofollow" class="external autonumber"
 
                                                    href="https://www.biorxiv.org/content/10.1101/265231v1"
 
                                                    target="_blank">[2] </a>
 
                                            </span>
 
                                        </li> <!-- dihedral junge shrinken -->
 
                                        <li id="cite_note-3">
 
                                            <span class="mw-cite-backlink">
 
                                                <a href="#cite_ref-3">↑</a>
 
                                            </span>
 
                                            <span class="reference-text">
 
                                                Leaver-Fay, A. et al., ROSETTA3: an object-oriented software suite for
 
                                                the
 
                                                simulation and design of
 
                                                macromolecules. Methods Enzymol, 2011. 487:545-74.
 
                                                <a rel="nofollow" class="external autonumber"
 
                                                    href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4083816/"
 
                                                    target="_blank">[3]
 
                                                </a>
 
                                            </span>
 
                                        </li>
 
                                        <li id="cite_note-4">
 
                                            <span class="mw-cite-backlink">
 
                                                <a href="#cite_ref-4">↑</a>
 
                                            </span>
 
                                            <span class="reference-text">
 
                                                Moult, J. et al., Critical assessment of methods of protein structure
 
                                                prediction
 
                                                (CASP)—Round 6. PROTEINS:
 
                                                Structure, Function, and Bioinformatics, 2005. Suppl 7:3–7.
 
                                                <a rel="nofollow" class="external autonumber"
 
                                                    href="https://onlinelibrary.wiley.com/doi/pdf/10.1002/prot.20716"
 
                                                    target="_blank">[4] </a>
 
 
                                             </span>
 
                                             </span>
 
                                         </li>
 
                                         </li>
 
                                     </ol>
 
                                     </ol>
                    </div>
 
                </div>
 
            </div>
 
  
        </div>
+
                                </div>
    </div>
+
                            </div>
 
+
                        </div>
    <div class="tab my-3">
+
        <button class="btn btn-block" id="tab1" data-toggle="collapse" data-target="#body3" aria-expanded="false"
+
            aria-controls="collapseOne">
+
            Molecular dynamics
+
        </button>
+
    </div>
+
  
   
+
                    </div>
    <div class="collapse multi-collapse" id="body3">
+
                </div>
        <div class="card card-body">
+
                <div class="tab my-3">
            <div class="row">
+
                    <button class="btn btn-block" id="tab1" data-toggle="collapse" data-target="#body4"
                <div class="col-xs-12 col-sm-12 col-md-2">
+
                        aria-expanded="false" aria-controls="collapseOne">
                    <img class="img-fluid"
+
                         Docking
                         src="https://2019.igem.org/wiki/images/4/4e/T--TU_Darmstadt--MD_Modeling.png">
+
                    </button>
 
                 </div>
 
                 </div>
                <div class="col-xs-12 col-sm-12 col-md-10">
 
                    <div class="flex-center">
 
                        <h2>Introduction</h2>
 
 
                        <p>The structure predictions made so far were based on statistical methods with physical
 
                            constraints. The Deep
 
                            Learning algorithm uses a neural network trained to find a function associating the
 
                            amino acid sequence and
 
                            the final 3D positions of the atoms within the protein. On the other hand, predictions
 
                            were made with Rosetta
 
                            using the Monte Carlo Method. Here random movement of individual atoms occurs, and the
 
                            energy  is estimated after
 
                            each step.</p>
 
  
 +
                <div class="collapse multi-collapse" id="body4">
 +
                    <div class="card card-body">
 
                         <div class="row">
 
                         <div class="row">
                             <div class="figurcolumn column" style="width: 80%; float: right;  padding: 1em;">
+
                             <div class="col-xs-12 col-sm-12 col-md-2">
                                 <img class="img-fluid center"
+
                                 <img class="img-fluid"
                                     src="https://2019.igem.org/wiki/images/0/08/T--TU_Darmstadt--MoleculeInWater.png" style="width:100%">
+
                                     src="https://2019.igem.org/wiki/images/d/df/T--TU_Darmstadt--Docking_Structure_Determination.png">
                                <p><b>Figure 6: </b>Sortase A7M in a force field surrounded by discrete water molecules. Image was made with …. </p>
+
 
                             </div>
 
                             </div>
                        </div>
+
                            <div class="col-xs-12 col-sm-12 col-md-10">
 +
                                <div class="flex-center">
  
                        <p>
+
                                    <p>
                            Even though both methods use physical constraints to find plausible protein structures, neither of them actually
+
                                        Now that the binding site of the Sortase had been found, the peptide ligand
                            simulates the behavior of these molecules within a physical force field. Moreover, both methods do not necessarily output fully relaxed protein structures and simulate water implicitly by preferring hydrophilic parts of the proteins to be on the outside. Thus, we conducted a molecular dynamics (MD) simulation to verify the plausibility of our protein structure and allow equilibration.
+
                                        needed to be inserted into the binding site to create a peptide-protein complex.
                            The molecular dynamics simulation provides the opportunity to simulate water as discrete molecules, creating a solvated protein. This step is crucial to
+
                                        The procedure of choice
                            validate the structures, as the interaction with water is one of the primary mechanisms of protein folding.
+
                                        for the introduction of a ligand into the binding site of a protein is called
                            Since neither candidate CASP12 nor S_14771 has been modeled with explicit water, a corresponding MD simulation is imperative to
+
                                        <i>docking</i>. In the
                            verify the correctness of the candidates' conformations.
+
                                        following sections, we will present the protocol and methods we used as well as
                            This, of course, is much more expensive in terms of computational resources, as the protein has to be placed in a simulation box
+
                                        the results they yielded.
                            and said box is filled with water molecules. This is called solvation and is visualized for candidate S_14771 in figure eeeeee.
+
                                    </p>
                        </p>
+
                           
+
  
                        <p>
+
                                    <h2>Background</h2>
                            We used GROMACS (GROningen MAchine for Chemical Simulations) <!-- cite --> as the tool for our molecular dynamics simulations. GROMACS solves Newton's
+
                                    <p>
                            equations of motion for
+
                                        Enzymes are one of the most relevant macromolecules in biology. Their
                            individual atoms
+
                                        functionality is determined through the way they interact with their ligands.
                            <sup id="cite_ref-1" class="reference">
+
                                        Although enzymes are highly specific concerning the ligands they interact with,
                                <a href="#cite_note-1">[1] </a>
+
                                        similar compounds can often bind to the same enzyme albeit with different
                            </sup>
+
                                        affinity.
                            . While this classical simulation is much more accurate than predictions made by the  
+
                                        To determine the best possible binding conformation of the protein-ligand
                            other methods,
+
                                        complex, we use FlexPepDock, an algorithm provided by the RosettaCommons
                            approximations are used nonetheless: Forces are cut after a certain radius and the system
+
                                        software package.
                            size is quite small.
+
                                    </p>
                            <sup id="cite_ref-1" class="reference">
+
                                <a href="#cite_note-1">[1] </a>
+
                            </sup>
+
                            Additionally, atoms are assumed to be classical particles, which is not the case, as quantum mechanics plays a role in particle-particle interactions.
+
                            Still, this simulation is very computationally expensive. Therefore, only time periods less
+
                            than one second could be
+
                            simulated.
+
                        </p>
+
  
                        <h2>Methods</h2>
+
                                    <h2>Procedure</h2>
                        <p>
+
                                    <p>
                            To perform the molecular dynamics simulations we mostly followed the <a href="http://www.mdtutorials.com/gmx/lysozyme/01_pdb2gmx.html" target="_blank">GROMACS Lysozyme tutorial</a>, as it serves our purpose perfectly. We created a dodecahedral simulation box with a distance of 0.7 nm between the solute and the box borders. We used periodic boundary conditions and a Na<sup>+</sup> Cl<sup>-</sup> concentration of 0.012 mol/L. The main difference in our approach was that we used the CHARMM36 <!-- cite --> force field instead of the OPLS-AA/L force field and adjusted our molecular dynamics parameters <a href="http://www.gromacs.org/Documentation/Terminology/Force_Fields/CHARMM" target="_blank">accordingly</a>.
+
                                        The ab-initio FlexPepDock protocol consists of multiple steps and is documented
The simulation was performed on an NVIDIA GTX 760 graphics card allowing us to simulate approximately 1 ns per hour.
+
                                        on the RosettaCommons <a href="">online documentation</a>. We modified the
</p>
+
                                        protocol as the one provided did not work with our approach.
 +
                                        The modified protocol has the following form:
 +
                                    </p>
 +
                                    <ol>
 +
                                        <li>secondary structure determination</li>
 +
                                        <li>complex creation</li>
 +
                                        <li>FlexPepDock refinement</li>
 +
                                    </ol>
 +
                                    <br>
 +
                                    <p>
 +
                                        To determine the secondary structure of the peptide, fragment files (3- and
 +
                                        5-mers) had to be generated and a PSIPRED secondary structure prediction had to
 +
                                        be performed. As the peptides had a sequence length less than 20 amino acids, we
 +
                                        were not able to use the online services such as <a
 +
                                            href="http://robetta.bakerlab.org/">Robetta</a> and the <a
 +
                                            href="http://bioinf.cs.ucl.ac.uk/psipred/">PSIPRED online service</a>.
 +
                                        Instead we used the Rosetta <a
 +
                                            href="https://www.rosettacommons.org/docs/latest/application_documentation/utilities/app-fragment-picker">FragmentPicker
 +
                                            application</a> and the PSIPRED <a
 +
                                            href="https://github.com/psipred/psipred">command line tool</a>.
 +
                                        The generated structures serve as the input for the refinement protocol.
 +
                                        <br>
 +
                                        The generation of the peptide-protein complex can be divided into three steps:
 +
                                    </p>
 +
                                    <ul>
 +
                                        <li>peptide creation</li>
 +
                                        <li>peptide relaxation</li>
 +
                                        <li>coarse complex creation</li>
 +
                                    </ul>
 +
                                    <br>
 +
                                    <p>
 +
                                        The peptide structure was created through ab-initio modeling.
 +
                                        Initial creation of the peptide was followed by insertion of the peptide into
 +
                                        the sortase binding site. This led to a coarse model of the peptide-sortase
 +
                                        complex. Here we used insight gained from the molecular dynamics simulation to
 +
                                        place the peptide close to the binding site.
 +
                                        <!-- maybe mention Biotite here already -->
 +
                                        <br>
 +
                                        In the final step the FlexPepDock refinement protocol is executed and 50,000
 +
                                        complex structures are generated. We used the inputs as described in
 +
                                        {{fuhrman paper}}, written by the authors of the FlexPepDock documentation.
 +
                                        <br>
 +
                                        To get a better overview of our data, we performed a clustering in Python,
 +
                                        using the scikit-learn package. We clustered the structures with respect to:
 +
                                    </p>
 +
                                    <ul>
 +
                                        <li>total score: the total score of the docking provided by the <i>Rosetta</i>
 +
                                            scoring function</li>
 +
                                        <li>interface score: the sum of the energy of the residues in the interfacing
 +
                                            region</li>
 +
                                        <li>reweighted score: a score calculated by double weighting the contribution of
 +
                                            the residues in the interfacing region</li>
 +
                                        <li>root-mean-square deviation: the root-mean-square deviation of the peptides
 +
                                            in relation to the structure with the highest score</li>
 +
                                        <li>peptide direction: the direction the peptide is facing</li>
 +
                                    </ul>
 +
                                    <br>
 +
                                    <p>
 +
                                        Here clustering is used to group the docking results and thereby decrease the
 +
                                        sample size.
 +
                                        From the 50,000 results we picked the results with the 500 best total scores,
 +
                                        the 500 best interface scores and
 +
                                        the 500 best reweighted scores.
 +
                                        As we aimed to create an unbiased set for clustering, the absence of duplicates
 +
                                        in the set was ensured.
 +
                                        We decreased the sample size to 100 groups representing the best scoring
 +
                                        structures from the three categories.
 +
                                    </p>
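                                    <p>
                                        The selection and clustering steps described above can be sketched as follows.
                                        This is a minimal illustration under stated assumptions, not our original
                                        script: the text does not name the clustering algorithm, so k-means on
                                        standardized features is assumed here, lower Rosetta scores are treated as
                                        better, and the random arrays as well as the function names
                                        <code>select_best</code> and <code>cluster_structures</code> are hypothetical
                                        stand-ins for the real docking output.
                                    </p>

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def select_best(scores, n_best=500):
    """Indices of the n_best rows per score column (lower = better),
    merged into one set without duplicates."""
    picked = [np.argsort(scores[:, col])[:n_best]
              for col in range(scores.shape[1])]
    return np.unique(np.concatenate(picked))

def cluster_structures(features, n_groups=100, seed=0):
    """Group structures into n_groups clusters (k-means is an assumption)."""
    scaled = StandardScaler().fit_transform(features)
    model = KMeans(n_clusters=n_groups, n_init=10, random_state=seed)
    return model.fit_predict(scaled)

# Synthetic stand-in for 50,000 docking results (assumption).
rng = np.random.default_rng(0)
scores = rng.normal(size=(50_000, 3))   # total, interface, reweighted score
extra = rng.normal(size=(50_000, 2))    # e.g. peptide RMSD and a direction value

keep = select_best(scores)              # union of the three top-500 lists
features = np.hstack([scores, extra])[keep]
labels = cluster_structures(features)   # one cluster label per kept structure
print(len(keep), "structures grouped into", len(np.unique(labels)), "clusters")
```

                                    <p>
                                        Merging the three top-500 lists with <code>np.unique</code> is what guarantees
                                        that a structure ranking highly in several categories enters the set only once.
                                    </p>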
  
                        <p>
+
                                    <h2>Results</h2>
                            To analyse the MD simulation we used the Python programming language and the <a href="https://www.biotite-python.org/" target="_blank">Biotite package</a> <!-- cite --> as well as GROMACS analysis tools such as <!-- links to these tools --> <a>covar</a> and anaeig.
+
                                    <p>
                            The first analyses are a root-mean-square deviation (RMSD), a root-mean-square fluctuation (RMSF) and a gyration radius analysis.
+
                                        For the sequences MGGGGPPPPPP (M-polyG), GGGGPPPPPP (polyG) and PPPPPPLPETGG (LPETGG),
                            RMSD calculations have been described in the structure prediction section. To compute the RMSF the movement distance of each
+
                                        50,000 structures have been created and clustered.
                            residue is computed as a root-mean-square over time as:
+
                                        After the clustering the sample consisted of 100 structures of docked complexes.
                            $$ RMSF_i = \sqrt{ \frac{1}{T} \sum_{t=1}^{T} \left( v_i(t) - \bar{v}_i \right)^2 }, \qquad \bar{v}_i = \frac{1}{T} \sum_{t=1}^{T} v_i(t), $$
+
                                    </p>
                            where v<sub>i</sub>(t) is the position of atom i at time t. The radius of gyration is <!-- revise -->
+
                            The final analysis performed on the MD simulation is called Principal Component Analysis (PCA).
+
                            By applying PCA to a protein, it is possible to gain insights into the relevant vibrational motions and thereby the physical mechanism of the protein <!-- citation -->.
+
                        </p>
+
  
                        <h2>Results</h2>
+
                                    <div class="figurcolumn column" style="width: 50%; float: center; padding: 1em;">
                        <h3>First indicators</h3>
+
                                        <img class="img-fluid center"
                        <p>
+
                                            src="https://2019.igem.org/wiki/images/7/78/T--TU_Darmstadt--dock_lpetgg.png"
                            The first possible indicators of a stable protein structure are converging RMSD, small RMSF values
+
                                            style="width:100%">
                            as well as converging radii of gyration. Using Python and the Biotite package, we calculated
+
                                        <p><b>Figure 13: </b> The three best scoring structures (total score, interface
                            these quantities and plotted the results for both candidate S_14771 and candidate CASP12.
+
                                            score, reweighted score) of the LPETGG-tag are shown. Only two results are
                        </p>
+
                                            visible as the best reweighted score candidate is identical to the best
                        <div class="row">
+
                                            interface score candidate. The reacting section of the LPETGG-tag namely
                            <div class="figurcolumn column" style="width: 50%; float: left; padding: 1em;">
+
                                            glycine is colored yellow as is the active site. The glycine of both ligand
                                <img class="img-fluid center"
+
                                            peptides is facing the active site. </p>
                                    src="https://2019.igem.org/wiki/images/4/4f/T--TU_Darmstadt--rmsd_s14771.png"
+
                                    </div>
                                    style="width:100%">
+
                                    <p>
                                <p>
+
                                        Analysis of the scores has shown a similar score for all the three dockings. The
                                    <b>Figure 7: </b> The RMSD is one of three main indicators of a stable protein structure of the MD simulation of
+
                                        best scoring results of the LPETGG docking show a tendency of the glycines to
                                    S_14771 over the period of 200,000 ps. As time progressed, the slope of the RMSD decreased.
+
                                        face the active site while also being in close proximity to the active site.
                                    The value stabilizes at a time of 110,000 ps and then fluctuates around 6 &#8491;.
+
                                    </p>
                                </p>
+
                            </div>
+
  
                            <div class="figurcolumn column" style="width: 50%; float: right; padding: 1em;">
+
                                    <div class="row">
                                <img class="img-fluid center"
+
                                        <div class="figurcolumn column" style="width: 50%; float: left; padding: 1em;">
                                    src="https://2019.igem.org/wiki/images/a/aa/T--TU_Darmstadt--rmsd_casp.png"
+
                                            <img class="img-fluid center"
                                    style="width:100%">
+
                                                src="https://2019.igem.org/wiki/images/8/8d/T--TU_Darmstadt--dock_polyg.png"
                                <p>
+
                                                style="width:100%">
                                    <b>Figure 8: </b> Already at t = 40,000 ps the RMSD has arrived at a stable value, while at the same time
+
                                            <p><b>Figure 14: </b>The three best scoring structures (total score,
                                    the radius of gyration (fig x) decreases continuously over time. This information suggests the protein
+
                                                interface score, reweighted score) of the poly-g peptide are shown. Only
                                    might be folding and potentially developing secondary structures not present previously.
+
                                                two results are visible as the best reweighted score candidate is
                                </p>
+
                                                identical to the best interface score candidate. Instead of facing the
                            </div>
+
                                                active site (yellow) the reacting glycines (yellow) appear to interact
                            <div class="figurcolumn column" style="width: 50%; float: left; padding: 1em;">
+
                                                with the &beta;6&#47;&beta;7 loop of the sortase. </p>
                                <img class="img-fluid center"
+
                                        </div>
                                    src="https://2019.igem.org/wiki/images/9/94/T--TU_Darmstadt--gyration_s14771.png"
+
                                        <div class="figurcolumn column" style="width: 50%; float: right; padding: 1em;">
                                    style="width:100%">
+
                                            <img class="img-fluid center"
                                <p>
+
                                                src="https://2019.igem.org/wiki/images/9/92/T--TU_Darmstadt--dock_mpolyg.png"
                                    <b>Figure 9: </b> The prominent fluctuations of the residues in the range 105 to 115 might
+
                                                style="width:100%">
                                     indicate a binding site or another form of functional structure. The radius of gyration, just as
+
                                            <p><b>Figure 15: </b>The three best scoring structures (total score,
                                    the RMSD (fig xyz), stabilizes around a simulation time of 110,000 ps and converges towards a value of
+
                                                interface score, reweighted score) of the M-polyG peptide are shown. Only
                                    16.7 &#8491;.
+
                                                two results are visible as the best reweighted score candidate is
                                </p>
+
                                                identical to the best interface score candidate.
                            </div>
+
                                                Concerning the M-poly-G peptide no uniform directional orientation can
 +
                                                be observed.
 +
                                                The structure with the best interface score (light blue) is oriented
 +
                                                towards the loop while the structure with the best total/reweighted score
 +
                                                (dark blue) is oriented towards the &beta;-sheets.</p>
 +
                                        </div>
 +
                                     </div>
 +
                                    <!-- use a see-more button here instead, as further up -->
 +
                                    <p>
 +
                                        Figure lpetgg
 +
                                        <!-- change this as well --> shows the docking result of the LPETGG peptide to
 +
                                        the sortase. The results shown are the best scoring structures of the clustering
 +
                                        with respect to the total score, interface score and reweighted score. As the
 +
                                        best scoring structure is the same for the total score and the reweighted score
 +
                                        only two peptides are shown. This also applies to figures x and y. For both
 +
                                        results the reacting glycine residues (yellow) are facing the active site.
 +
                                        Additionally, the same residues are in close proximity to the active site.
 +
                                    </p>
 +
                                    <p>
 +
                                        Figures x and y show the docking of both polyG and M-polyG. While polyG
 +
                                        results align well and seem to be interacting with the &beta;6&#47;&beta;7 loop
 +
                                        rather than with the active site, this does not seem to be the case for M-polyG.
 +
                                        Instead of both structures interacting with the &beta;6&#47;&beta;7 loop or
 +
                                        active site one (best interaction score; dark blue) interacts with the
 +
                                        &beta;6&#47;&beta;7 loop and the other (best reweighted/total score; light
 +
                                        blue-gray) appears to interact with the active site.
 +
                                    </p>
  
                            <div class="figurcolumn column" style="width: 50%; float: right; padding: 1em;">
+
                                    <div class="row">
                                <img class="img-fluid center"
+
                                        <div class="figurcolumn column" style="width: 50%; float: left; padding: 1em;">
                                    src="https://2019.igem.org/wiki/images/0/03/T--TU_Darmstadt--gyration_casp.png"
+
                                            <img class="img-fluid center"
                                    style="width:100%">
+
                                                src="https://2019.igem.org/wiki/images/7/76/T--TU_Darmstadt--dock_zoom_active.png"
                                <p>
+
                                                style="width:100%">
                                    <b>Figure 10: </b> From t = 40,000 ps onward the radius of gyration decreases steadily. At the end of the simulation the gyration radius
+
                                            <p><b>Figure 16: </b>The close up of the M-polyG peptide (best
                                    reaches a value of 17 &#8491;.
+
                                                total/reweighted score) indicates an interaction of methionine with
                                    This behavior indicates folding of the protein structure.
+
                                                arginine<sub>139</sub> and cysteine<sub>126</sub>. </p>
                                </p>
+
                                        </div>
                            </div>
+
                                        <div class="figurcolumn column" style="width: 50%; float: right; padding: 1em;">
 +
                                            <img class="img-fluid center"
 +
                                                src="https://2019.igem.org/wiki/images/4/48/T--TU_Darmstadt--dock_zoom_loop.png"
 +
                                                style="width:100%">
 +
                                            <p><b>Figure 17: </b> Methionine of the result with the best interface score
 +
                                                interacted with the &beta;6&#47;&beta;7 loop rather than the active
 +
                                                site. Still the reactive glycine residues appear to be bound to the
 +
                                                &beta;6&#47;&beta;7 loop. </p>
 +
                                        </div>
 +
                                    </div>
  
                            <div class="figurcolumn column" style="width: 50%; float: left; padding: 1em;">
+
                                    <p>
                                <img class="img-fluid center"
+
                                        As can be seen in figure 16, visualizing the result of the docking simulation (best
                                    src="https://2019.igem.org/wiki/images/f/f4/T--TU_Darmstadt--rmsf_s14771.png"
+
                                        total/reweighted score) suggests an interaction of methionine and two of the
                                    style="width:100%">
+
                                        active site residues, namely arginine<sub>139</sub> and cysteine<sub>126</sub>.
                                <p>
+
                                        <!-- mention methionine -->
                                    <b>Figure 11: </b> The fluctuations
+
                                        Visualizing the result of the corresponding docking simulation, as can be seen in
                                    (RMSF) of most residues appear insignificant compared to the first, the last residues and
+
                                        figure 16, suggests an interaction between methionine and two active site
                                    the residues close to residue 110 . Typically the N- and C-terminus tend to fluctuate more intensively due to the lack of
+
                                        residues, namely arginine<sub>139</sub> and cysteine<sub>126</sub>.
                                    stabilizing structures. The prominent fluctuations in the range of residue 105 to 115
+
                                        Figure 17 shows the interaction of M-polyG with the &beta;6&#47;&beta;7 loop.
                                    can indicate a binding site or another form of functional structure.
+
                                        The glycines still interact with the &beta;6&#47;&beta;7 loop.
                                </p>
+
                                        Instead of binding above the &beta;6&#47;&beta;7 loop, which is the case for
                            </div>
+
                                        polyG as illustrated in fig z,
 +
                                        the interaction seems to be influenced by methionine. By interacting with the
 +
                                        residues in the &beta;-helix
 +
                                        methionine could potentially hinder binding of glycine to the
 +
                                        &beta;6&#47;&beta;7 loop by partial
 +
                                        immobilization of the peptide. Overall peptide binding and orientation is less
 +
                                        uniform compared to
 +
                                        polyG without the leading methionine, which could be an indicator of lesser
 +
                                        binding affinity of M-polyG towards
 +
                                        the &beta;6&#47;&beta;7 loop.
 +
                                    </p>
  
                            <div class="figurcolumn column" style="width: 50%; float: right; padding: 1em;">
+
                                    <h2>Conclusion</h2>
                                <img class="img-fluid center"
+
                                    <p>
                                    src="https://2019.igem.org/wiki/images/a/aa/T--TU_Darmstadt--rmsf_casp.png"
+
                                        To computationally investigate binding affinities of the polyG and M-polyG as
                                    style="width:100%">
+
                                        well as the LPETGG tags we performed
                                <p>
+
                                        docking simulations using the <i>Rosetta FlexPepDock</i> application. We used a
                                     <b>Figure 12: </b> The prominent fluctuations of the residues in the range 105 to 115 might
+
                                        modified version of the recommended
                                    indicate a binding site or another form of functional structure. The radius of gyration, just as
+
                                        protocol as the modified version was easier to automate and served our purpose
                                    the RMSD (fig xyz), stabilizes around a simulation time of 110,000 ps and converges towards a value of
+
                                        better than the standard protocol.
                                    16.7 &#8491;.
+
                                        From the calculated scores only, we could not see a difference in binding
                                </p>
+
                                        affinities.
 +
                                        Thus, we inspected the best scoring
 +
                                        structures regarding the total score, the interface score and the reweighted
 +
                                        score using PyMOL.
 +
                                        Since the best structures with respect to total score and reweighted score were
 +
                                        the same for all simulations,
 +
                                        only two structures have been inspected per run. A polyproline tag was appended
 +
                                        to all the peptides to simulate
 +
                                        the modification of the VLPs with a small peptide.
 +
                                        <!-- mention large helices etc. as justification -->
 +
                                    </p>
 +
                                     <p>
 +
                                        As expected, the results showed that for LPETGG, the glycines of both peptides
 +
                                        oriented towards the active site.
 +
                                        This is unsurprising as peptides with the sequence LPXTGG are known to be
 +
                                        substrates of the sortase. It was more surprising to
 +
                                        see the polyG tag oriented away from the active site since polyG also is a known
 +
                                        substrate of the sortase. Both polyG peptides
 +
                                        were facing the &beta;6&#47;&beta;7 loop (residues 105 to 115) uniformly and
 +
                                        appeared to be interacting with it. The M-polyG peptides did not
 +
                                        show a uniform orientation or interaction scheme. On one hand the visualization
 +
                                        of the best result concerning the total and reweighted
 +
                                        score has shown interaction of methionine with the cysteine<sub>126</sub> and
 +
                                        arginine<sub>139</sub>, two residues of the active
 +
                                        site. On the other hand, the visualization of the best result with respect to
 +
                                        the interface score shows the M-polyG facing the mobile &beta;6&#47;&beta;7
 +
                                        loop.
 +
                                        In contrast to the polyG peptide lacking the methionine, the M-polyG peptide
 +
                                        is pulled down below the &beta;6&#47;&beta;7 loop by the methionine interacting
 +
                                        with one of the &beta;-sheets leading to the active site. This is not the case
 +
                                        with the polyG results, which lie aligned in one plane
 +
                                        with the &beta;6&#47;&beta;7 loop.
 +
                                    </p>
 +
                                </div>
 
                             </div>
 
                             </div>
 
                         </div>
 
                         </div>
                        <br>
 
                        <p>
 
                            Typical RMSDs and radii of gyration converge towards a value dependent on the size of the
 
                            protein. Convergence of those quantities can be interpreted as a stable state of the protein
 
                            structure. As can be seen in Figures x and y, both the RMSD and the radius of gyration
 
                            stabilize at the same time as the simulation reaches 110,000 ps (110 ns), suggesting a now
 
                            stabilized structure of candidate S_14771 solvated in water. Another indicator of a
 
                            functional protein is the RMSF. Instead of being averaged over all atoms, the RMSF is
 
                            averaged over time with respect to each amino acid. It provides insights into both protein
 
                            stability and functionality. Fig xzf reveals the RMSF of residues 105 to 115 to be
 
                            significantly higher than that of other residues. This hints at the presence of a
 
                            functional unit along these residues. As commented on in the section
 
                            describing our structure prediction approaches, the N-
 
                            and C-terminal regions tend to fluctuate more strongly as a result of the absence of
 
                            stabilizing structures.
 
                        </p>
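                        <p>
                            The difference between the three quantities discussed above can be illustrated with a short, self-contained Python sketch. It is not the Biotite-based analysis used for the figures: it uses plain lists of coordinates, assumes that all frames are already superimposed, and treats all atoms as having equal mass. All function names are hypothetical.
                        </p>

```python
import math


def centroid(points):
    """Arithmetic mean of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def rmsd(frame, reference):
    """RMSD of one frame: averaged over all atoms, relative to a reference frame."""
    sq = [squared_distance(p, q) for p, q in zip(frame, reference)]
    return math.sqrt(sum(sq) / len(sq))


def rmsf(trajectory):
    """RMSF per atom: averaged over all frames, relative to the mean position."""
    result = []
    for i in range(len(trajectory[0])):
        positions = [frame[i] for frame in trajectory]
        mean = centroid(positions)
        sq = [squared_distance(p, mean) for p in positions]
        result.append(math.sqrt(sum(sq) / len(sq)))
    return result


def radius_of_gyration(frame):
    """Radius of gyration of one frame (assumption: equal atom masses)."""
    center = centroid(frame)
    sq = [squared_distance(p, center) for p in frame]
    return math.sqrt(sum(sq) / len(sq))


# Toy trajectory: two atoms, two frames; only the second atom moves.
toy_trajectory = [
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
]
print(rmsd(toy_trajectory[1], toy_trajectory[0]))
print(rmsf(toy_trajectory))
print(radius_of_gyration(toy_trajectory[0]))
```

                        <p>
                            The RMSD and the radius of gyration yield one value per frame and can therefore be plotted over time, whereas the RMSF yields one value per atom or residue.
                        </p>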
 
                        <p>
 
                            RMSD and radius of gyration calculations of candidate CASP12 (figures x and y) provide evidence of folding.
 
                            However, the RMSF values are significantly higher, an
 
                            effect possibly caused by instability or refolding. Nevertheless, the strongest
 
                            fluctuations, disregarding the terminal regions, can be seen in the region of residue 105 to
 
                            115. This insight consolidates the theory that residues 105 to 115 might be a part of a
 
                            functional unit.
 
                        </p>
 
                        <p>
 
                            We were unsure whether candidate CASP12 can be considered a plausible structure and
 
                            how to interpret the findings concerning the prominent fluctuations. Therefore, we decided to perform a
 
                            <i>Principal Component Analysis</i>.
 
                        </p>
 
 
                        <h3>Principal Component Analysis</h3>
 
                        <p>
 
                        To analyze our system further, Principal Component Analysis (PCA) was performed using GROMACS.
 
                        </p>
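                        <p>
                            What such an analysis computes can be sketched in plain Python: the covariance matrix of the atomic coordinates is built from the trajectory, and its eigenvectors describe the collective motions. The sketch below is not the GROMACS implementation. As a simplification it extracts only the eigenvector with the largest variance, using power iteration instead of a full diagonalization, and all names are hypothetical.
                        </p>

```python
import math


def covariance_matrix(samples):
    """Covariance of the coordinates; each sample is one flattened frame."""
    n = len(samples)
    dim = len(samples[0])
    mean = [sum(s[k] for s in samples) / n for k in range(dim)]
    centered = [[s[k] - mean[k] for k in range(dim)] for s in samples]
    return [
        [sum(c[a] * c[b] for c in centered) / n for b in range(dim)]
        for a in range(dim)
    ]


def dominant_mode(cov, iterations=200):
    """Eigenvector and eigenvalue of largest variance via power iteration.

    Assumption: the start vector is not orthogonal to the dominant eigenvector.
    """
    dim = len(cov)
    vec = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(dim)) for a in range(dim)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    # Rayleigh quotient of the normalized vector gives the eigenvalue.
    eigenvalue = sum(
        vec[a] * sum(cov[a][b] * vec[b] for b in range(dim)) for a in range(dim)
    )
    return vec, eigenvalue


# Toy data: two coordinates, with most of the motion along the first one.
toy_frames = [[-2.0, 0.0], [2.0, 0.0], [0.0, -1.0], [0.0, 1.0]]
mode, variance = dominant_mode(covariance_matrix(toy_frames))
print(mode, variance)
```

                        <p>
                            Projecting the trajectory onto such an eigenvector shows the motion along that mode, which is what the animations below visualize for two selected modes.
                        </p>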
 
                        <div class="figurcolumn column" style="width: 50%; float: left; padding: 1em;">
 
                                <img class="img-fluid center"
 
                                    src="https://2019.igem.org/wiki/images/d/db/T--TU_Darmstadt--modes_s14771.gif"
 
                                    style="width:100%">
 
                                <p><b>Animation 4: </b> A Principal Component Analysis of a fast (blue) and a slow (red) mode showing the most prominent movements of the C&alpha;-chain of candidate S_14771. Both modes show movement of the &beta;6&#47;&beta;7 loop consisting of residues 105 to 115 towards the active site. Thus we can assume that the closing &beta;6&#47;&beta;7 loop is involved in the reaction mechanism. </p>
 
                        </div>
 
 
                        <div class="figurcolumn column" style="width: 50%; float: right; padding: 1em;">
 
                            <img class="img-fluid center"
 
                                src="https://2019.igem.org/wiki/images/1/17/T--TU_Darmstadt--modes_casp.gif"
 
                                style="width:100%">
 
                            <p><b>Animation 5: </b> The modes of candidate CASP12 appear similar to each other and no strong single movement can be specified. This makes the slow (red) and fast (blue) mode indistinguishable from one another. Moreover, the active site amino acids do not appear to be in close proximity, which would make a reaction catalyzed by candidate CASP12 impossible. </p>
 
                        </div>
 
 
                        <p>
 
                            The results from the Principal Component Analysis of candidate S_14771 (animation xy) show a movement of the residues 105 to 115 towards the active site, supporting our theory that residues 105 to 115 are important for the reaction mechanism. Since the slow mode (red), which shows the most relevant movement of the sortase, moves further towards the active site, it is possible that the &beta;6&#47;&beta;7 loop either closes the binding site of the ligand peptides or even transports one peptide towards the other.
 
                        </p>
 
 
                        <p>
 
                            Animation xyz shows the results of the Principal Component Analysis of candidate CASP12. As the RMSF calculations suggested (fig xyz), the whole protein seems to be moving randomly with no directed movement.
 
                            In addition, the active site amino acids <!-- ref --> are spread across the protein, confirming our assumption that the protein is not in a stable or plausible conformation.
 
                        </p>
 
 
                        <h2>Conclusion</h2>
 
                        <p>
 
                            We gained evidence that at least one of our Sortase A7M models is a valid and stable candidate by applying various methods to analyse the structural stability and validity of our two Sortase A7M candidates. The candidate S_14771 that was generated using <i>RosettaCM</i> appears to be a fitting candidate not only due to successful analyses, but also since the residues of the active site <!-- ref --> are close enough to each other to catalyze a ligation reaction.
 
                            Our model created through deep learning excelled only in terms of RMSD and gyration radius calculations. Not only the RMSF and Principal Component Analysis but also the conformation of the active site have proven candidate CASP12 to be of no use for further calculations as it does not portray a valid conformation of Sortase A7M.
 
                        </p>
 
                        </p>
 
 
 
                        <h2>References</h2>
 
                        <ol class="references">
 
                            <li id="cite_note-1">
 
                                <span class="mw-cite-backlink">
 
                                    <a href="#cite_ref-1">↑</a>
 
                                </span>
 
                                <span class="reference-text">
 
                                    Apol, E. et al. GROMACS
 
                                    USER MANUAL. Department of Biophysical Chemistry, University of Groningen.
 
                                    2015.
 
                                    <a rel="nofollow" class="external autonumber"
 
                                        href="https://www.biorxiv.org/content/10.1101/265231v1" target="_blank">[1] </a>
 
                                </span>
 
                            </li>
 
                        </ol>
 
  
 
                     </div>
 
                     </div>
 
                 </div>
 
                 </div>
            </div>
 
 
        </div>
 
    </div>
 
    <div class="tab my-3">
 
        <button class="btn btn-block" id="tab1" data-toggle="collapse" data-target="#body4" aria-expanded="false"
 
            aria-controls="collapseOne">
 
            Docking
 
        </button>
 
    </div>
 
  
    <div class="collapse multi-collapse" id="body4">
+
                <div class="tab my-3">
        <div class="card card-body">
+
                    <button class="btn btn-block" id="tab1" data-toggle="collapse" data-target="#body6"
            <div class="row">
+
                        aria-expanded="false" aria-controls="collapseOne">
                <div class="col-xs-12 col-sm-12 col-md-2">
+
                         Conclusion
                    <img class="img-fluid"
+
                    </button>
                         src="https://2019.igem.org/wiki/images/d/df/T--TU_Darmstadt--Docking_Structure_Determination.png">
+
 
                 </div>
 
                 </div>
                <div class="col-xs-12 col-sm-12 col-md-10">
 
                    <div class="flex-center">
 
 
                        <p>
 
                            Now that the binding site of the Sortase had been found, the peptide ligand
 
                            needed to be inserted into the binding site to create a peptide-protein complex. The procedure of choice
 
                            for the introduction of a ligand into the binding site of a protein is called <i>docking</i>. In the
 
                            following sections, we will present the protocol and methods we used as well as the results they yielded.
 
                        </p>
 
 
                        <h2>Background</h2>
 
                        <p>
 
                        Enzymes are one of the most relevant macromolecules in biology. Their functionality is determined through the way they interact with their ligands. Although enzymes are highly specific concerning the ligands they interact with, similar compounds can often bind to the same enzyme albeit with different affinity.
 
                        To determine the best possible binding conformation of the protein-ligand complex, we use FlexPepDock, an algorithm provided by the RosettaCommons software package.
 
                        </p>
 
 
                        <h2>Procedure</h2>
 
                        <p>
 
                        The ab-initio FlexPepDock protocol consists of multiple steps and is documented on the RosettaCommons <a href="">online documentation</a>. We modified the protocol as the one provided did not work with our approach.
 
                        The modified protocol has the following form:
 
                        </p>
 
                        <ol>
 
                            <li>secondary structure determination</li>
 
                            <li>complex creation</li>
 
                            <li>FlexPepDock refinement</li>
 
                        </ol>
 
                        <br>
 
                        <p>
 
                        To determine the secondary structure of the peptide, fragment files (3- and 5-mers) had to be generated and a PSIPRED secondary structure prediction had to be performed. As the peptides had a sequence length of less than 20 amino acids, we were not able to use online services such as <a href="http://robetta.bakerlab.org/">Robetta</a> and the <a href="http://bioinf.cs.ucl.ac.uk/psipred/">PSIPRED online service</a>. Instead, we used the Rosetta <a href="https://www.rosettacommons.org/docs/latest/application_documentation/utilities/app-fragment-picker">FragmentPicker application</a> and the PSIPRED <a href="https://github.com/psipred/psipred">command line tool</a>.
 
                        The generated structures serve as the input for the refinement protocol.
 
                        <br>
 
                        The generation of the peptide-protein complex can be divided into three steps:
 
                        </p>
 
                        <ul>
 
                            <li>peptide creation</li>
 
                            <li>peptide relaxation</li>
 
                            <li>coarse complex creation</li>
 
                        </ul>
 
                        <br>
 
                        <p>
 
                        The peptide structure was created through ab-initio modeling.
 
                        Initial creation of the peptide was followed by insertion of the peptide into the sortase binding site. This led to a coarse model of the peptide-sortase complex. Here we used insight gained from the molecular dynamics simulation to place the peptide close to the binding site. <!-- maybe mention Biotite here already -->
 
                        <br>
 
                        In the final step the FlexPepDock refinement protocol is executed and 50,000 complex structures are generated. We used the inputs as described in {{fuhrman paper}}, written by the authors of the FlexPepDock documentation.
 
                        <br>
 
                        To get a better overview of our data, we clustered the results in Python using the scikit-learn package. We clustered the structures with respect to:
 
                        </p>
 
                        <ul>
 
                            <li>total score: the total score of the docking provided by the <i>Rosetta</i> scoring function</li>
 
                            <li>interface score: the sum of the energy of the residues in the interfacing region</li>
 
                            <li>reweighted score: a score calculated by double weighting the contribution of the residues in the interfacing region</li>
 
                            <li>root-mean-square deviation: the root-mean-square deviation of the peptides in relation to the structure with the highest score</li>
 
                            <li>peptide direction: the direction the peptide is facing</li>
 
                        </ul>
 
                        <br>
 
                        <p>
 
                        Here, clustering is used to group the docking results and thereby decrease the sample size.

                        From the 50,000 results we picked the 500 results with the best total scores, the 500 with the best interface scores and

                        the 500 with the best reweighted scores.

                        As we aimed to create an unbiased set for clustering, we ensured that the set contained no duplicates.

                        We then decreased the sample size to 100 groups representing the best scoring

                        structures from the three categories.
 
                        </p>
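                        <p>
                        The selection step described above can be sketched in a few lines of Python. This is only a minimal illustration, not the script we used: the dictionary keys (<i>tag</i>, <i>total</i>, <i>interface</i>, <i>reweighted</i>) are hypothetical names, and the sketch assumes that lower scores are better, as is usual for Rosetta energies. The subsequent clustering with scikit-learn is not shown.
                        </p>

```python
def select_candidates(decoys, score_keys, n_best=500, lower_is_better=True):
    """Return the union of the n_best decoys per score, without duplicates.

    decoys: list of dicts, each with a unique "tag" (hypothetical key name)
            and one numeric entry per name in score_keys.
    """
    selected = {}
    for key in score_keys:
        # Rank all decoys by the current score; best decoys come first.
        ranked = sorted(decoys, key=lambda d: d[key], reverse=not lower_is_better)
        for decoy in ranked[:n_best]:
            # Keying by tag guarantees that a decoy picked by several
            # scores enters the set only once.
            selected[decoy["tag"]] = decoy
    return list(selected.values())


# Toy data: ten decoys with made-up scores.
decoys = [
    {"tag": i, "total": float(i), "interface": float(9 - i), "reweighted": float(i)}
    for i in range(10)
]
subset = select_candidates(decoys, ["total", "interface", "reweighted"], n_best=3)
print(sorted(d["tag"] for d in subset))
```

                        <p>
                        Because up to 500 structures are taken per score and duplicates are merged, the resulting set contains at most 1,500 structures.
                        </p>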
 
 
                        <h2>Results</h2>
 
                        <p>
 
                            For each of the sequences MGGGGPPPPPP (M-polyG), GGGGPPPPPP (polyG) and PPPPPPLPETGG (LPETGG), 50,000 structures were created and clustered.

                            After the clustering, each sample consisted of 100 structures of docked complexes.
 
                        </p>
 
 
                        <div class="figurcolumn column" style="width: 50%; float: center; padding: 1em;">
 
                                <img class="img-fluid center"
 
                                    src="https://2019.igem.org/wiki/images/7/78/T--TU_Darmstadt--dock_lpetgg.png"
 
                                    style="width:100%">
 
                                <p><b>Figure 13: </b> The three best scoring structures (total score, interface score, reweighted score) of the LPETGG-tag are shown. Only two results are visible as the best reweighted score candidate is identical to the best interface score candidate. The reacting section of the LPETGG-tag, namely glycine, is colored yellow, as is the active site. The glycine of both ligand peptides is facing the active site. </p>
 
                        </div>
 
                        <p>
 
                        Analysis of the scores showed similar scores for all three dockings. In the best scoring results of the LPETGG docking, the glycines tend to face the active site while also being in close proximity to it.
 
                        </p>
 
  
 +
                <div class="collapse multi-collapse" id="body6">
 +
                    <div class="card card-body">
 
                         <div class="row">
 
                         <div class="row">
                             <div class="figurcolumn column" style="width: 50%; float: left; padding: 1em;">
+
                             <div class="col-xs-12 col-sm-12 col-md-2">
                                 <img class="img-fluid center"
+
                                 <img class="img-fluid"
                                     src="https://2019.igem.org/wiki/images/8/8d/T--TU_Darmstadt--dock_polyg.png"
+
                                     src="https://2019.igem.org/wiki/images/a/aa/T--TU_Darmstadt--model_conclusion.png"
                                     style="width:100%">
+
                                     style="max-width:100%;">
                                <p><b>Figure 14: </b>The three best scoring structures (total score, interface score, reweighted score) of the polyG peptide are shown. Only two results are visible as the best reweighted score candidate is identical to the best interface score candidate. Instead of facing the active site (yellow), the reacting glycines (yellow) appear to interact with the &beta;6&#47;&beta;7 loop of the sortase. </p>
+
 
                             </div>
 
                             </div>
                             <div class="figurcolumn column" style="width: 50%; float: right; padding: 1em;">
+
                             <div class="col-xs-12 col-sm-12 col-md-10">
                                 <img class="img-fluid center"
+
                                 <div class="flex-center">
                                    src="https://2019.igem.org/wiki/images/9/92/T--TU_Darmstadt--dock_mpolyg.png"
+
 
                                    style="width:100%">
+
                                     ª{•̃̾_•̃̾}ª
                                <p><b>Figure 15: </b>The three best scoring structures (total score, interface score, reweighted score) of the M-polyG peptide are shown. Only two results are visible as the best reweighted score candidate is identical to the best interface score candidate.
+
                                     For the M-polyG peptide, no uniform directional orientation can be observed.
+
                                    The structure with the best interface score (light blue) is oriented towards the loop while the structure with the best total/reweighted score (dark blue) is oriented towards the &beta;-sheets.</p>
+
                            </div>
+
                        </div>
+
<!-- see more button instead oben halt und so -->
+
                        <p>
+
                            Figure 13 <!-- change this as well --> shows the docking result of the LPETGG peptide to the sortase. The results shown are the best scoring structures of the clustering with respect to the total score, interface score and reweighted score. As the best scoring structure is the same for the total score and the reweighted score, only two peptides are shown. This also applies to Figures 14 and 15. For both results the reacting glycine residues (yellow) are facing the active site. Additionally, the same residues are in close proximity to the active site.
+
                        </p>
+
                        <p>
+
                            Figures 14 and 15 show the docking of polyG and M-polyG, respectively. While the polyG results align well and seem to interact with the &beta;6&#47;&beta;7 loop rather than with the active site, this is not the case for M-polyG. Instead of both structures interacting with the &beta;6&#47;&beta;7 loop or with the active site, one (best interface score; dark blue) interacts with the &beta;6&#47;&beta;7 loop and the other (best reweighted/total score; light blue-gray) appears to interact with the active site.
+
                        </p>
+
  
                        <div class="row">
+
                                 </div>
                            <div class="figurcolumn column" style="width: 50%; float: left; padding: 1em;">
+
                                 <img class="img-fluid center"
+
                                    src="https://2019.igem.org/wiki/images/7/76/T--TU_Darmstadt--dock_zoom_active.png"
+
                                    style="width:100%">
+
                                <p><b>Figure 16: </b>The close up of the M-polyG peptide (best total/reweighted score) indicates an interaction of methionine with arginine<sub>139</sub> and cysteine<sub>126</sub>.  </p>
+
                            </div>
+
                            <div class="figurcolumn column" style="width: 50%; float: right; padding: 1em;">
+
                                <img class="img-fluid center"
+
                                    src="https://2019.igem.org/wiki/images/4/48/T--TU_Darmstadt--dock_zoom_loop.png"
+
                                    style="width:100%">
+
                                <p><b>Figure 17: </b> Methionine of the result with the best interface score interacted with the &beta;6&#47;&beta;7 loop rather than the active site. Still the reactive glycine residues appear to be bound to the &beta;6&#47;&beta;7 loop. </p>
+
 
                             </div>
 
                             </div>
 
                         </div>
 
                         </div>
  
                        <p>
 
                        Visualizing the result of the corresponding docking simulation (best total/reweighted score), as can be seen in Figure 16, suggests an interaction between methionine and two active site residues, namely arginine<sub>139</sub> and cysteine<sub>126</sub>. <!-- mention methionine -->
 
                        Figure 17 shows the interaction of M-polyG with the &beta;6&#47;&beta;7 loop.

                        The glycines still interact with the &beta;6&#47;&beta;7 loop.

                        Instead of binding above the &beta;6&#47;&beta;7 loop, which is the case for polyG as illustrated in Figure 14,

                        the interaction seems to be influenced by methionine. By interacting with the residues in the &beta;-helix,

                        methionine could potentially hinder binding of glycine to the &beta;6&#47;&beta;7 loop by partial

                        immobilization of the peptide. Overall, peptide binding and orientation are less uniform compared to

                        polyG without the leading methionine, which could be an indicator of a lower binding affinity of M-polyG towards

                        the &beta;6&#47;&beta;7 loop.
 
                        </p>
 
 
                        <h2>Conclusion</h2>
 
                        <p>
 
                            To computationally investigate the binding affinities of the polyG, M-polyG and LPETGG tags, we performed

                            docking simulations using the <i>Rosetta FlexPepDock</i> application. We used a modified version of the recommended

                            protocol as the modified version was easier to automate and served our purpose better than the standard protocol.

                            From the calculated scores alone, we could not see a difference in binding affinities.

                            Thus, we used PyMOL to inspect the best scoring

                            structures with respect to the total score, the interface score and the reweighted score.

                            Since the best structures with respect to total score and reweighted score were the same for all simulations,

                            only two structures were inspected per run. A polyproline tag was appended to all the peptides to simulate

                            the modification of the VLPs with a small peptide. <!-- mention large helices etc. as justification -->
 
                        </p>
 
                        <p>
 
                            As expected, the results showed that for LPETGG, the glycines of both peptides were oriented towards the active site.

                            This is unsurprising as peptides with the sequence LPXTGG are known to be substrates of the sortase. It was more surprising to

                            see the polyG tag oriented away from the active site since polyG is also a known substrate of the sortase. Both polyG peptides

                            were facing the &beta;6&#47;&beta;7 loop (residues 105 to 115) uniformly and appeared to be interacting with it. The M-polyG peptides did not

                            show a uniform orientation or interaction scheme. On the one hand, the visualization of the best result with respect to the total and reweighted

                            score showed an interaction of methionine with cysteine<sub>126</sub> and arginine<sub>139</sub>, two residues of the active

                            site. On the other hand, the visualization of the best result with respect to the interface score shows the M-polyG facing the mobile &beta;6&#47;&beta;7 loop.

                            In contrast to the polyG peptide lacking the methionine, the M-polyG peptide is pulled down below the &beta;6&#47;&beta;7 loop by the methionine interacting

                            with one of the &beta;-sheets leading to the active site. This is not the case with the polyG results, which lie aligned in one plane

                            with the &beta;6&#47;&beta;7 loop.
 
                        </p>
 
 
                     </div>
 
                     </div>
 
                 </div>
 
                 </div>
            </div>
 
  
        </div>
+
                <div class="tab my-3">
    </div>
+
                    <button class="btn btn-block" id="tab1" data-toggle="collapse" data-target="#body5"
 +
                        aria-expanded="false" aria-controls="collapseOne">
 +
                        Acknowledgements and References
 +
                    </button>
 +
                </div>
  
    <div class="tab my-3">
+
                 <div class="collapse multi-collapse" id="body5">
            <button class="btn btn-block" id="tab1" data-toggle="collapse" data-target="#body6" aria-expanded="false"
+
                    <div class="card card-body">
                 aria-controls="collapseOne">
+
                        <div class="row">
                Conclusion
+
                            <div class="col-xs-12 col-sm-12 col-md-2">
            </button>
+
                                <img class="img-fluid"
        </div>
+
                                    src="https://2019.igem.org/wiki/images/c/c3/T--TU_Darmstadt--model_aknowledge.png"
       
+
                                    style="max-width:100%;">
        <div class="collapse multi-collapse" id="body6">
+
                            </div>
            <div class="card card-body">
+
                            <div class="col-xs-12 col-sm-12 col-md-10">
                <div class="row">
+
                                <div class="flex-center">
                    <div class="col-xs-12 col-sm-12 col-md-2">
+
 
                        <img class="img-fluid"
+
                                    ԅ(‾⌣‾ԅ)
                            src="https://2019.igem.org/wiki/images/a/aa/T--TU_Darmstadt--model_conclusion.png"
+
 
                            style="max-width:100%;">
+
                                 </div>
                    </div>
+
                            </div>
                    <div class="col-xs-12 col-sm-12 col-md-10">
+
                        <div class="flex-center">
+
       
+
                                 ª{•̃̾_•̃̾}ª
+
       
+
 
                         </div>
 
                         </div>
                    </div>
 
                </div>
 
       
 
            </div>
 
        </div>
 
  
    <div class="tab my-3">
 
            <button class="btn btn-block" id="tab1" data-toggle="collapse" data-target="#body5" aria-expanded="false"
 
                aria-controls="collapseOne">
 
                Acknowledgements and References
 
            </button>
 
        </div>
 
       
 
        <div class="collapse multi-collapse" id="body5">
 
            <div class="card card-body">
 
                <div class="row">
 
                    <div class="col-xs-12 col-sm-12 col-md-2">
 
                        <img class="img-fluid"
 
                            src="https://2019.igem.org/wiki/images/c/c3/T--TU_Darmstadt--model_aknowledge.png"
 
                            style="max-width:100%;">
 
                    </div>
 
                    <div class="col-xs-12 col-sm-12 col-md-10">
 
                        <div class="flex-center">
 
       
 
                                ԅ(‾⌣‾ԅ)
 
       
 
                        </div>
 
 
                     </div>
 
                     </div>
 
                 </div>
 
                 </div>
       
 
            </div>
 
        </div>
 
  
  
    <a id="back-to-top" href="#" class="back-to-top" role="button" title="Return to the top" data-toggle="tooltip"
+
                <a id="back-to-top" href="#" class="back-to-top" role="button" title="Return to the top"
        data-placement="left">
+
                    data-toggle="tooltip" data-placement="left">
        <img src="https://2019.igem.org/wiki/images/b/b1/T--TU_Darmstadt--UpArrow.svg" alt="Logo" style="width: 2em" />
+
                    <img src="https://2019.igem.org/wiki/images/b/b1/T--TU_Darmstadt--UpArrow.svg" alt="Logo"
 +
                        style="width: 2em" />
  
  
 +
            </div>
 
         </div>
 
         </div>
        </div>
+
    </div>
        </div>
+
  
  
  
        <a id="back-to-top" href="#" class="back-to-top" role="button" title="Return to the top" data-toggle="tooltip"
+
    <a id="back-to-top" href="#" class="back-to-top" role="button" title="Return to the top" data-toggle="tooltip"
            data-placement="left"> <img src="https://2019.igem.org/wiki/images/b/b1/T--TU_Darmstadt--UpArrow.svg"
+
        data-placement="left"> <img src="https://2019.igem.org/wiki/images/b/b1/T--TU_Darmstadt--UpArrow.svg" alt="Logo"
                alt="Logo" style="width: 2em"> </a>
+
            style="width: 2em"> </a>
  
 
</body>
 
</body>

Revision as of 23:10, 20 October 2019

TU Darmstadt

Modeling

Introduction


In synthetic biology, theoretical models are often used to gain insights and to predict and improve experiments. In our project we are modifying virus-like particles (VLPs) by attaching proteins to the surface of the P22 capsid through a linker. The linking is catalyzed by the enzyme Sortase A7M, which is a calcium-independent mutant of the wild type Sortase A from Staphylococcus aureus. We performed modeling to predict the unknown structure of Sortase A7M and to improve the linker between the proteins, thereby optimizing the modification efficiency of our platform.
Two different modeling approaches were used to determine the structure of Sortase A7M. We compared machine learning approaches to traditional comparative, Monte-Carlo based modeling methods. The results were evaluated using an energy-scoring function and molecular dynamics (MD) simulations. The most promising Sortase A7M structures were used to perform a docking simulation to screen for optimal linkers.

In silico modeling and simulation of proteins requires a 3D structure, which can be obtained from the RCSB Protein Data Bank. However, if no 3D structure is deposited, as is the case with Sortase A7M, the structure has to be determined by other means. The structure prediction of Sortase A7M was done using two different approaches.

Deep Learning

Background

Machine Learning is a class of algorithms that aim to determine a function between two datasets. This is commonly done by presenting the algorithm with training data as well as a scoring function to measure its success at processing the input data. During training a feedback loop is used to allow the algorithm to automatically find a function to fit the data. In contrast, classical algorithms are often hardcoded to solve a specific problem and only allow for limited flexibility.

A neural network consists of neurons, which are commonly referred to as nodes. They process input using weights, which are adjusted during its training. Nodes in neural networks are linked together: One neuron processes the inputs of other neurons, loosely mimicking the structure of biological brains. While one usually has a fixed amount of input and output neurons limited by the data one wishes to classify, adding layers of hidden neurons can improve the classification. This is often referred to as deep learning and has led to revolutions in applications like speech and image recognition.

Using Machine Learning to predict protein structures has many advantages compared to conventional methods especially for iGEM teams who often only have limited access to resources. After training a neural network, which is a computationally expensive process and often done in centralized data centers, it can be used to predict the structure of a wide variety of proteins. [1] Using pretrained models, novel protein structures can be obtained within seconds [2] compared to conventional methods taking several hours or days. [3]

Until earlier this year the use of Machine Learning in the prediction of protein structures was restricted to applications within human-written algorithms. [2] AlQuraishi demonstrated a complete deep learning approach that is able to make predictions within 1-2 Å of other approaches [2], while only using a fraction of the computational power. This enables accurate structural prediction with less powerful as well as less expensive hardware and thus significantly reduces the cost of structural modeling.

Procedure

We used AlQuraishi’s approach in combination with his pretrained model, which was trained on the ProteinNet database containing all structures released prior to the start of CASP12 (12th Critical Assessment of Techniques for Protein Structure Prediction, 2016). The results were tested against the CASP12 datasets and reached distance root-mean-square deviation (RMSD) values between 10 and 13 Å. The RMSD is the root-mean-square deviation of all atom positions compared to a template structure. It is defined as: $$ RMSD = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \lVert v_{t,i} - v_i \rVert^2},$$ where v_i is the position vector of atom i in the predicted structure, v_{t,i} the position of the corresponding atom in the template structure and N the number of atoms. The proteins in the CASP datasets were not published until after the competition and thus allow an assessment with only little bias. [4] We used this pretrained model to make structural predictions for our Sortase A7M. The predicted structure was then relaxed in a molecular dynamics simulation using GROMACS.
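
The root-mean-square deviation, i.e. the square root of the mean squared distance between corresponding atoms, can be computed with a few lines of Python. The following is a minimal sketch and not the evaluation code used for CASP12: it assumes that both structures contain the same atoms in the same order and that they are already superimposed, since no fitting step is performed. The function name and the toy coordinates are illustrative only.

```python
import math


def rmsd(predicted, template):
    """Root-mean-square deviation between two equally long coordinate lists.

    Each list holds one (x, y, z) tuple per atom, in matching order.
    No superposition is performed; the structures are assumed to be aligned.
    """
    if not predicted or len(predicted) != len(template):
        raise ValueError("both structures need the same, non-zero number of atoms")
    squared_sum = 0.0
    for atom_p, atom_t in zip(predicted, template):
        # Squared Euclidean distance between corresponding atoms.
        squared_sum += sum((p - t) ** 2 for p, t in zip(atom_p, atom_t))
    return math.sqrt(squared_sum / len(predicted))


# Toy example: the second atom is displaced by 2 Å along y.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
template = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
print(rmsd(predicted, template))
```

In this toy case the squared distances are 0 and 4, so the mean is 2 and the RMSD is the square root of 2.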

In the following, the specific steps for obtaining a tertiary structure predicted by AlQuraishi’s model are listed.

  1. We used the amino acid sequence of the Sortase A7M in the FASTA format to predict the tertiary structure of the amino acid backbone using AlQuraishi’s TensorFlow implementation of his end-to-end differentiable learning of protein structure with the pretrained pre-CASP ProteinNet database. The output was a .tertiary file, which contains a sequential 3x3 matrix with the atomic coordinates of each amino acid backbone, starting at the N-terminus.
  2. As the standard format for protein structure information is the PDB file format, we wrote a Python script to combine the structural information from the FASTA and .tertiary files into a PDB file. For ease of use we used the Biotite Python module.
  3. Using Rosetta's fixed backbone design program 'fixbb' with the 'hpatch', the optimal position of the side-chains was determined and added to the PDB file. The fixed backbone tool adds the corresponding side-chains and optimizes their conformation. The Hpatch database ensures that hydrophilic side-chains are to be preferred on the surface of the protein as our sortase is present in an aqueous environment.
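
The conversion in step 2 can be illustrated with a short sketch. It is not our actual script: instead of Biotite it uses plain string formatting of PDB ATOM records, and it assumes that the backbone coordinates have already been read from the .tertiary file into one (N, CA, C) triple per residue, given in Å. The function name, the chain identifier and the example coordinates are hypothetical.

```python
# One-letter to three-letter codes for the 20 standard amino acids.
THREE_LETTER = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}


def backbone_to_pdb(sequence, coords, chain="A"):
    """Format backbone atoms as fixed-column PDB ATOM records.

    sequence: one-letter amino acid string (e.g. taken from a FASTA file).
    coords:   per residue a triple of (x, y, z) tuples for N, CA and C,
              assumed to be in Angstrom.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coordinates differ in length")
    lines = []
    serial = 1
    for resseq, (aa, atoms) in enumerate(zip(sequence, coords), start=1):
        for name, (x, y, z) in zip(("N", "CA", "C"), atoms):
            # Columns follow the PDB ATOM record layout; occupancy is set
            # to 1.00 and the B-factor to 0.00 as neutral placeholders.
            lines.append(
                f"ATOM  {serial:5d}  {name:<3s} {THREE_LETTER[aa]:3s} {chain:1s}{resseq:4d}    "
                f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}          {name[0]:>2s}"
            )
            serial += 1
    lines.append("END")
    return "\n".join(lines)


# Made-up coordinates for a two-residue peptide.
example = backbone_to_pdb(
    "MG",
    [
        ((0.000, 0.000, 0.000), (1.458, 0.000, 0.000), (2.009, 1.420, 0.000)),
        ((3.332, 1.536, 0.000), (3.988, 2.839, 0.000), (5.504, 2.693, 0.000)),
    ],
)
print(example)
```

A file written this way contains only backbone atoms, which is why the side chains still have to be added in step 3.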

Results

Animation 1: The raw PDB File converted from the .tertiary file.

Animation 2: The PDB-File after Step 3.

For analysis the structure was viewed in PyMOL. As can be seen in the pictures below, no secondary structures could be recognized by PyMOL. Thus, a Ramachandran plot was used to evaluate the dihedral angles of the backbone. It was found that the angles do not match the typical angles for α-helices and β-sheets.

Animation 3: The cartoon view in Pymol.

Figure 1: Ramachandran plot of the predicted structure.

During training the predictions in AlQuraishi’s model were optimized for their RMSD, which is the root-mean-square deviation of the distance between the atoms of the prediction and the reference structure. Thus, even though the predictions are expected to have a similar shape to the physical structure, they may not be in the energy minimum. Hence, we applied a GROMACS molecular dynamics simulation in order to relax the structure obtained by AlQuraishi’s deep learning model.

RosettaCM

Background

In our second approach we used RosettaCommons comparative modeling (RosettaCM), which is based on homology modeling. Homology modeling is a protein modeling method that requires one or more template structures as a basis for the protein to be modeled. The sequences of the templates are aligned with the sequence of the target protein. Unaligned sections are modeled using fragment or protein libraries, so that the protein structure is built from different sequence homologues of the protein of interest. Ab-initio or de novo modeling, on the other hand, attempts to find protein structures solely based on physicochemical principles applied to the primary sequence, which can be compared to the refolding of a denatured protein.

RosettaCM combines ab-initio modeling with homology modeling. The homologous sections, for which a resolved 3D structure with a sufficiently similar sequence exists, are generated using homology modeling. Afterwards the unaligned sequences are modeled de novo. By combining the two methods RosettaCM represents a precise and resource-efficient tool for protein structure prediction. Rosetta applications rely on Monte Carlo optimization, which is a probabilistic approach to finding a local minimum in the energy landscape of protein conformations. The equation serving as the foundation of the statistical Monte Carlo method is the Metropolis acceptance criterion: $$p = \min\left(1, \exp\left[-\Delta E/ (k_{B} \cdot T)\right]\right),$$
where kB is the Boltzmann constant, ΔE the difference in energy between the two states and T the temperature. The term 1/(kBT) is commonly abbreviated as a single factor β.

During the statistical protein folding based on the Monte-Carlo method, the initial structure is changed by small random perturbations of the atom locations. Whether the structure is accepted or not is decided by the Metropolis acceptance criterion. If ΔE < 0, the structure is accepted, otherwise the newly proposed structure is accepted with probability p as described in the Metropolis acceptance criterion.
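
The acceptance rule described above can be written down in a few lines. The sketch below is a generic illustration of the Metropolis criterion and not Rosetta's implementation; the function names are our own, and the energy difference and kBT are simply assumed to be given in the same units.

```python
import math
import random


def acceptance_probability(delta_e, kT):
    """Metropolis probability p = min(1, exp(-delta_e / kT))."""
    if delta_e <= 0:
        return 1.0  # moves that do not raise the energy are always accepted
    return math.exp(-delta_e / kT)


def metropolis_accept(delta_e, kT, rng=random):
    """Decide whether a proposed perturbation is accepted.

    rng only needs a random() method returning a float in [0, 1).
    """
    return rng.random() < acceptance_probability(delta_e, kT)


# An uphill move of one kT is accepted with probability exp(-1).
print(acceptance_probability(1.0, 1.0))
```

A higher temperature raises the chance of accepting an unfavorable move, which is what allows the search to escape shallow local minima.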

Procedure

The RosettaCM protocol requires evolutionarily related structures and sequences, as well as fragment files of the target structure. The fragment files serve as a structure template for the proteins and consist of peptide fragments of sizes 3 and 9. We gathered five evolutionarily related structures from the RCSB PDB with the accession numbers:

  • 1ija
  • 1itw
  • 1itp
  • 1ito
  • 2mlm

The five RCSB entries represent different structures of sortases from Staphylococcus aureus. Fragment files can be created with the Robetta online server or with the Rosetta FragmentPicker application.

The RosettaCM procedure is best described in the following steps:

  1. sequence and structural alignment of templates
  2. fragment insertion in unaligned sections
  3. replacement of random segment with segment from a different template structure
  4. energy minimization
  5. all-atom optimization

The alignment can be performed with various tools. We used MAFFT to generate the multiple sequence alignments. Prior to using the alignments as an input, they were converted to the Grishin alignment format, as RosettaCM requires the alignments to be in this format. The minimization is performed using the Rosetta centroid energy function. For the centroid function to be applied, the protein is converted to the centroid representation. A protein in centroid representation consists of the backbone atoms N, Cα and the carbonyl O, and an atom of varying size representing the side chain. The advantage of the centroid representation is that its smoother energy landscape can be traversed more easily. Finally, the generated structure undergoes a second minimization in an all-atom model by means of Monte Carlo optimization. This is similar to the energy minimization but without the amino acids being represented as centroids of their functional groups. Structures computed through all-atom optimization can reach atomic resolution {{source: Rosetta paper}}, which is crucial for a model meant to be used to estimate atomic interactions.

Results

The run yielded 15,000 structures which have been compared using the Rosetta scoring functions (talaris2013). From the 15,000 structures generated, we inspected the ten best scoring structures.

As can be seen in figure 2, the most prominent differences can be found in the regions close to the N- and C-terminus. As fluctuations in those regions are not unusual, we decided to use the best scoring structure, candidate S_14771 (figure 3), as the input for the simulations to follow.

Figure 2: The structural alignment of the ten best scoring sortase structures displaying minor differences with the exception of the C- and N-terminal regions. N- and C-terminal regions tend to show strong fluctuations, thus it is unsurprising to find the terminal regions to be unaligned.

Figure 3: Sortase A7M candidate S_14771 created through RosettaCM.

Figure 4: The dihedral angles of amino acids can be calculated to create a Ramachandran plot.

To evaluate the secondary structure, as was done for the structure acquired through Deep Learning, a Ramachandran plot of the dihedral angles of the five sortases used as inputs was made. Ramachandran plots of dihedral angles (fig x) can be a first indicator of whether the computed structures are valid.

Figure 5: The comparison of the Ramachandran plot of structure S_14771 and the Ramachandran plot found on Proteopedia suggests that secondary structures are present. Hence the structure appears to contain α-helices, β-sheets and a small amount of left-handed α-helices.

The Ramachandran plot (Figure xzy) showing α-helices and β-sheets is a strong indicator of a successful structure determination, as those secondary structures are crucial for the functionality of sortases.

Conclusion

We used machine learning methods as well as Monte Carlo simulations to determine the structure of the mutated transpeptidase Sortase A7M. The machine learning approach using AlQuraishi's deep neural network yielded a structure which did not seem to have any secondary structures. To exclude the possibility of an error in the PyMOL visualization software by Schrödinger, a Ramachandran plot (figure xyz) was created. The plot shows that no typical secondary structures are present, which is a strong indicator of a failed structure determination. The second approach, using Rosetta Comparative Modeling, yielded 15,000 structures scored with the talaris2013 scoring function. The ten best structures were aligned and exhibited almost identical secondary structures (figure xzy). The greatest structural differences are present in the N- and C-terminal regions. Since terminal regions tend to fluctuate more strongly than non-terminal segments of the protein, we deemed those fluctuations irrelevant for the protein's functionality.
Being the best scoring candidate, structure S_14771 was analyzed structurally using a Ramachandran plot (figure xyz). The plot shows all the relevant and typical structures that sortases exhibit and serves as an indicator of a successful structure prediction.
In the steps to follow, a molecular dynamics (MD) simulation will be performed on both structures. Even though structure CASP12 does not seem to be a valid structure, refolding processes during an MD simulation might lead to a relaxation of the protein and allow for a promising prediction of the Sortase A7M structure.

References

  1. Bishop, C. M., Neural Networks for Pattern Recognition. Oxford University Press, 1995. [1]
  2. AlQuraishi, M., End-to-End Differentiable Learning of Protein Structure. Cell Systems, 2019. 8: 1–10. [2]
  3. Leaver-Fay, A. et al., ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules. Methods Enzymol, 2011. 487:545-74. [3]
  4. Moult, J. et al., Critical assessment of methods of protein structure prediction (CASP)—Round 6. PROTEINS: Structure, Function, and Bioinformatics, 2005. Suppl 7:3–7. [4]

Introduction

The structure predictions made so far were based on statistical methods with physical constraints. The Deep Learning algorithm uses a neural network trained to find a function associating the amino acid sequence and the final 3D positions of the atoms within the protein. On the other hand, predictions were made with Rosetta using the Monte Carlo Method. Here random movement of individual atoms occurs, and the energy is estimated after each step.
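The Monte Carlo idea described above can be sketched with the Metropolis acceptance criterion on a one-dimensional toy energy function. This is only an illustration under simplifying assumptions: the function names, the quadratic energy and all parameter values are hypothetical, and Rosetta's actual move set and scoring function are far more elaborate.

```python
import math
import random


def metropolis_accept(delta_energy, kT, rng=random):
    """Accept downhill moves always, uphill moves with Boltzmann probability."""
    if delta_energy <= 0.0:
        return True
    return rng.random() < math.exp(-delta_energy / kT)


def monte_carlo_minimize(energy, x0, step=0.5, kT=0.1, n_steps=5000, seed=0):
    """Random-walk Monte Carlo search that remembers the lowest-energy state."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(n_steps):
        trial = x + rng.uniform(-step, step)  # random trial move
        e_trial = energy(trial)               # energy estimated after the move
        if metropolis_accept(e_trial - e, kT, rng):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


def toy_energy(x):
    """Toy energy landscape with a single minimum at x = 2."""
    return (x - 2.0) ** 2


best_x, best_e = monte_carlo_minimize(toy_energy, x0=0.0)
```

Accepting some uphill moves is what allows the search to leave shallow local minima, at the cost of needing many steps.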

Figure 6: Sortase A7M in a force field surrounded by discrete water molecules. Image was made with ….

Even though both methods use physical constraints to find plausible protein structures, neither of them actually simulates the behavior of these molecules within a physical force field. Moreover, neither method necessarily outputs a fully relaxed protein structure, and both simulate water only implicitly by preferring hydrophilic parts of the protein to be on the outside. Thus, we conducted a molecular dynamics (MD) simulation to verify the plausibility of our protein structures and to allow equilibration. The molecular dynamics simulation provides the opportunity to simulate water as discrete molecules, creating a solvated protein. This step is crucial to validate the structures, as the interaction with water is one of the primary mechanisms of protein folding. Since neither candidate CASP12 nor candidate S_14771 has been modeled with explicit water, a corresponding MD simulation is imperative to verify the correctness of the candidates' conformations. This is, of course, much more expensive in terms of computational resources, as the protein has to be placed in a simulation box that is then filled with water molecules. This is called solvation and is visualized for candidate S_14771 in figure eeeeee.

We used GROMACS (GROningen MAchine for Chemical Simulations) as the tool for our molecular dynamics simulations. GROMACS solves Newton's equations of motion for individual atoms [1]. While this classical simulation is much more accurate than the predictions made by the other methods, approximations are used nonetheless: forces are cut off beyond a certain radius and the system size is quite small [1]. Additionally, atoms are treated as classical particles, which is an approximation, as quantum mechanics plays a role in particle-particle interactions. Still, this simulation is computationally very expensive. Therefore, only very short time periods, far below one second, could be simulated.
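To make the idea of numerically solving Newton's equations of motion concrete, here is a minimal sketch that integrates a single particle in a harmonic potential with the velocity Verlet scheme. It is a toy stand-in under stated assumptions: the function name, the one-dimensional setup and the parameter values are chosen for illustration only, and GROMACS by default uses a closely related leap-frog integrator on many thousands of atoms with a full force field.

```python
import math


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate m * a = F(x) for one particle in one dimension."""
    f = force(x)
    trajectory = [(x, v)]
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
        trajectory.append((x, v))
    return trajectory


# Harmonic oscillator with spring constant k (toy units).
k = 1.0
traj = velocity_verlet(x=1.0, v=0.0, force=lambda x: -k * x,
                       mass=1.0, dt=0.01, n_steps=1000)

# Total energy should stay nearly constant for a stable integration.
energies = [0.5 * v * v + 0.5 * k * x * x for x, v in traj]
energy_drift = max(energies) - min(energies)
```

The time step limits how far a simulation can reach: each step covers only a tiny interval, which is why long simulated times require very many force evaluations.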

Methods

To perform the molecular dynamics simulations we mostly followed the GROMACS lysozyme tutorial, as it serves our purpose well. We created a dodecahedral simulation box with a distance of 0.7 nm between the solute and the box borders. We used periodic boundary conditions and a Na+ Cl- concentration of 0.012 mol/L. The main difference in our approach was that we used the CHARMM36 force field instead of the OPLS-AA/L force field and adjusted our molecular dynamics parameters accordingly. The simulation was performed on an NVIDIA GTX 760 graphics card, allowing us to simulate approximately 1 ns per hour.
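Periodic boundary conditions mean that distances are measured to the nearest periodic image of an atom. The sketch below shows this minimum image convention for a cubic box; this is a simplifying assumption for illustration, since the simulation described here used a dodecahedral box, for which the image search is more involved. The function name and the example values are hypothetical.

```python
import math


def minimum_image_distance(a, b, box_length):
    """Distance between points a and b in a cubic periodic box."""
    total = 0.0
    for ai, bi in zip(a, b):
        d = ai - bi
        # Shift the component by whole box lengths to the nearest image.
        d -= box_length * round(d / box_length)
        total += d * d
    return math.sqrt(total)


# Two atoms near opposite faces of a 5 nm box are close through the boundary.
d = minimum_image_distance((0.1, 0.0, 0.0), (4.9, 0.0, 0.0), box_length=5.0)
```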

To analyse the MD simulation we used the Python programming language and the Biotite package as well as GROMACS analysis tools such as covar and anaeig. The first analyses are a root-mean-square deviation (RMSD), a root-mean-square fluctuation (RMSF) and a radius of gyration analysis. RMSD calculations have been described in the structure prediction section. To compute the RMSF, the displacement of each residue from its reference position is computed as a root-mean-square over time: $$ RMSF_i = \sqrt{ \frac{1}{T} \sum_{t=1}^{T} \left( v_i(t) - \bar{v}_i \right)^2 } $$ where v_i(t) is the position of atom i at time t, \bar{v}_i is its reference position (typically the time-averaged position) and T is the number of time frames. The radius of gyration is the mass-weighted root-mean-square distance of the atoms from the protein's center of mass and is a measure of its compactness. The final analysis performed on the MD simulation is called Principal Component Analysis (PCA). By applying PCA to a protein it is possible to gain insights into the relevant vibrational motions and thereby the physical mechanism of the protein.
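The three first-line quantities can be written down compactly with NumPy. The following is a minimal sketch, assuming a trajectory stored as an array of shape (frames, atoms, 3) that has already been superimposed onto a reference; the function names and the toy data are assumptions for this example, and the analysis on this page was done with Biotite and the GROMACS tools rather than with this code.

```python
import numpy as np


def rmsd(traj, ref):
    """RMSD of every frame to a reference structure, averaged over atoms."""
    diff = traj - ref
    return np.sqrt((diff ** 2).sum(axis=2).mean(axis=1))


def rmsf(traj):
    """RMSF of every atom around its time-averaged position."""
    mean_pos = traj.mean(axis=0)
    diff = traj - mean_pos
    return np.sqrt((diff ** 2).sum(axis=2).mean(axis=0))


def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration of a single frame."""
    center_of_mass = np.average(coords, axis=0, weights=masses)
    sq_dist = ((coords - center_of_mass) ** 2).sum(axis=1)
    return np.sqrt(np.average(sq_dist, weights=masses))


# Toy trajectory: 4 frames, 2 atoms; the second atom oscillates along x.
toy_traj = np.zeros((4, 2, 3))
toy_traj[:, 1, 0] = [1.0, -1.0, 1.0, -1.0]

rmsd_per_frame = rmsd(toy_traj, toy_traj[0])
rmsf_per_atom = rmsf(toy_traj)
rg_first_frame = radius_of_gyration(toy_traj[0], np.array([12.0, 12.0]))
```

The RMSD averages over atoms and yields one value per frame, whereas the RMSF averages over frames and yields one value per atom, which is why the two are plotted against time and residue number respectively.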

Results

First indicators

The first possible indicators of a stable protein structure are a converging RMSD, small RMSF values and a converging radius of gyration. Using Python and the Biotite package, we calculated these quantities and plotted the results for both candidate S_14771 and candidate CASP12.

Figure 7: The RMSD of candidate S_14771 over the 200,000 ps of the MD simulation. The RMSD is one of three main indicators of a stable protein structure. As time progresses, the RMSD increases with a decreasing slope. The value stabilizes at about 110,000 ps and fluctuates around 6 Å.

Figure 8: At t = 40,000 ps the RMSD has already arrived at a stable value, while the radius of gyration (fig x) decreases continuously over time. This suggests that the protein might be folding and potentially developing secondary structures not present previously.

Figure 9: The prominent fluctuations of residues 105 to 115 might indicate a binding site or another form of functional structure. The radius of gyration, just like the RMSD (fig xyz), stabilizes around a simulation time of 110,000 ps and converges towards a value of 16.7 Å.

Figure 10: From t = 40,000 ps onward, the radius of gyration decreases steadily. At the end of the simulation it reaches a value of 17 Å. This behavior indicates folding of the protein structure.

Figure 11: The fluctuations (RMSF) of most residues appear insignificant compared to those of the first and last residues and of the residues close to residue 110. Typically the N- and C-termini fluctuate more strongly due to the lack of stabilizing structures. The prominent fluctuations in the range of residues 105 to 115 can indicate a binding site or another form of functional structure.

Figure 12: The prominent fluctuations of residues 105 to 115 might indicate a binding site or another form of functional structure. The radius of gyration, just like the RMSD (fig xyz), stabilizes around a simulation time of 110,000 ps and converges towards a value of 16.7 Å.


Typically, RMSDs and radii of gyration converge towards a value dependent on the size of the protein. Convergence of those quantities can be interpreted as a stable state of the protein structure. As can be seen in Figures x and y, both the RMSD and the radius of gyration stabilize at the same time, as the simulation reaches 110,000 ps (110 ns), suggesting a stabilized structure of candidate S_14771 solvated in water. Another indicator of a functional protein is the RMSF. Instead of being averaged over all atoms, the RMSF is averaged over time for each amino acid. It provides insights into both protein stability and functionality. Fig xzf reveals the RMSF of residues 105 to 115 to be significantly higher than that of other residues. This hints at the presence of a functional unit along these residues. As noted in the section describing our structure prediction approaches, the N- and C-terminal regions tend to fluctuate more strongly as a result of the absence of stabilizing structures.

RMSD and radius of gyration calculations for candidate CASP12 (figures x and y) provide evidence of folding. However, the RMSF values are significantly higher, an effect possibly caused by instability or refolding. Nevertheless, the strongest fluctuations, disregarding the terminal regions, can be seen in the region of residues 105 to 115. This insight supports the theory that residues 105 to 115 might be part of a functional unit.

We were unsure whether candidate CASP12 can be considered a plausible structure and how to interpret the findings concerning the prominent fluctuations. Therefore, we decided to perform a Principal Component Analysis.

Principal Component Analysis

To analyze our system further, a Principal Component Analysis (PCA) was performed using GROMACS.
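Conceptually, a PCA of a trajectory diagonalizes the covariance matrix of the atomic coordinates, and the eigenvectors with the largest eigenvalues describe the dominant collective motions. The sketch below illustrates this with NumPy under simplifying assumptions: the trajectory is assumed to be already fitted to a reference, no mass weighting is applied, and the function name and toy data are hypothetical. It is not the GROMACS implementation used for the analysis on this page.

```python
import numpy as np


def trajectory_pca(traj):
    """PCA of a trajectory with shape (n_frames, n_atoms, 3)."""
    n_frames = traj.shape[0]
    flat = traj.reshape(n_frames, -1)           # one row of 3N coordinates per frame
    centered = flat - flat.mean(axis=0)         # remove the average structure
    cov = centered.T @ centered / (n_frames - 1)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]       # largest variance first
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    projections = centered @ eigenvectors       # trajectory along each mode
    return eigenvalues, eigenvectors, projections


# Toy trajectory: one atom with a large motion along x and a small one along y.
t = np.linspace(0.0, 4.0 * np.pi, 200)
toy_traj = np.zeros((200, 1, 3))
toy_traj[:, 0, 0] = 3.0 * np.sin(t)
toy_traj[:, 0, 1] = 0.1 * np.cos(t)

values, vectors, proj = trajectory_pca(toy_traj)
```

In the toy data the first mode points along x, the direction of the largest motion; for a protein, the first few modes are the ones usually visualized as animations.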

Animation 4: A Principal Component Analysis of a fast (blue) and a slow (red) mode showing the most prominent movements of the Cα-chain of candidate S_14771. Both modes show movement of the β6/β7 loop, consisting of residues 105 to 115, towards the active site. Thus we can assume that the closing β6/β7 loop is involved in the reaction mechanism.

Animation 5: The modes of candidate CASP12 appear similar to each other and no strong single movement can be identified. This makes the slow (red) and fast (blue) modes indistinguishable from one another. Moreover, the active site amino acids do not appear to be in close proximity, which would make a reaction catalyzed by candidate CASP12 impossible.

The results from the Principal Component Analysis of candidate S_14771 (animation xy) show a movement of residues 105 to 115 towards the active site, supporting our theory that residues 105 to 115 are important for the reaction mechanism. Since the slow mode (red), which shows the most relevant movement of the sortase, moves further towards the active site, it is possible that the β6/β7 loop either closes the binding site of the ligand peptides or even transports one peptide towards the other.

Animation xyz shows the results of the Principal Component Analysis of candidate CASP12. As the RMSF calculations suggested (fig xyz), the whole protein seems to be moving randomly with no directed movement. In addition, the active site amino acids are spread across the protein, confirming our assumption that the protein is not in a stable or plausible conformation.

Conclusion

We gained evidence that at least one of our Sortase A7M models is a valid and stable candidate by applying various methods to analyse the structural stability and validity of our two Sortase A7M candidates. The candidate S_14771, which was generated using RosettaCM, appears to be a fitting candidate not only due to successful analyses, but also because the residues of the active site are close enough to each other to catalyze a ligation reaction. Our model created through deep learning performed well only in terms of RMSD and radius of gyration calculations. The RMSF, the Principal Component Analysis and the conformation of the active site all indicate that candidate CASP12 is of no use for further calculations, as it does not portray a valid conformation of Sortase A7M.

References

  1. Apol, E. et al., GROMACS User Manual. Department of Biophysical Chemistry, University of Groningen, 2015. [1]

Now that the binding site of the Sortase had been found, the peptide ligand needed to be inserted into the binding site to create a peptide-protein complex. The procedure of choice for the introduction of a ligand into the binding site of a protein is called docking. In the following sections, we will present the protocol and methods we used as well as the results they yielded.

Background

Enzymes are among the most relevant macromolecules in biology. Their functionality is determined by the way they interact with their ligands. Although enzymes are highly specific concerning the ligands they interact with, similar compounds can often bind to the same enzyme, albeit with different affinity. To determine the best possible binding conformation of the protein-ligand complex, we used FlexPepDock, an algorithm provided by the RosettaCommons software package.

Procedure

The ab-initio FlexPepDock protocol consists of multiple steps and is documented on the RosettaCommons online documentation. We modified the protocol as the one provided did not work with our approach. The modified protocol has the following form:

  1. secondary structure determination
  2. complex creation
  3. FlexPepDock refinement

To determine the secondary structure of the peptide, fragment files (3- and 5-mers) had to be generated and a PSIPRED secondary structure prediction had to be performed. As the peptides had a sequence length less than 20 amino acids, we were not able to use the online services such as Robetta and the PSIPRED online service. Instead we used the Rosetta FragmentPicker application and the PSIPRED command line tool. The generated structures serve as the input for the refinement protocol.
The generation of the peptide-protein complex can be divided into three steps:

  • peptide creation
  • peptide relaxation
  • coarse complex creation

The peptide structure was created through ab-initio modeling. Initial creation of the peptide was followed by insertion of the peptide into the sortase binding site. This led to a coarse model of the peptide-sortase complex. Here we used insight gained from the molecular dynamics simulation to place the peptide close to the binding site.
In the final step the FlexPepDock refinement protocol is executed and 50,000 complex structures are generated. We used the inputs as described in {{fuhrman paper}}, written by the authors of the FlexPepDock documentation.
To get a better overview of our data we performed a clustering in Python, using the scikit-learn package. We clustered the structures with respect to:

  • total score: the total score of the docking provided by the Rosetta scoring function
  • interface score: the sum of the energy of the residues in the interfacing region
  • reweighted score: a score calculated by double weighting the contribution of the residues in the interfacing region
  • root-mean-square deviation: the root-mean-square deviation of the peptides in relation to the structure with the highest score
  • peptide direction: the direction the peptide is facing

Here clustering is used to group the docking results and thereby decrease the sample size. From the 50,000 results we picked the results with the 500 best total scores, the 500 best interface scores and the 500 best reweighted scores. As we aimed to create an unbiased set for clustering, the absence of duplicates in the set was ensured. We decreased the sample size to 100 groups representing the best scoring structures from the three categories.
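The pre-selection described above, taking the best structures per score and removing duplicates, can be sketched as follows. The dictionary keys, the structure names and the function name are hypothetical and chosen only for this example, and the sketch assumes that lower Rosetta scores are better.

```python
def select_candidates(decoys, n_best=500,
                      score_keys=("total_score", "interface_score",
                                  "reweighted_score")):
    """Union of the n_best decoys per score, with duplicates removed."""
    selected = {}
    for key in score_keys:
        ranked = sorted(decoys, key=lambda d: d[key])  # lowest score first
        for decoy in ranked[:n_best]:
            selected[decoy["name"]] = decoy            # name acts as unique ID
    return list(selected.values())


# Toy data set with three docked structures (hypothetical values).
toy_decoys = [
    {"name": "a", "total_score": -10.0, "interface_score": -1.0,
     "reweighted_score": -11.0},
    {"name": "b", "total_score": -9.0, "interface_score": -5.0,
     "reweighted_score": -14.0},
    {"name": "c", "total_score": -1.0, "interface_score": -2.0,
     "reweighted_score": -3.0},
]

best = select_candidates(toy_decoys, n_best=1)
```

The resulting set could then be passed to a scikit-learn clustering estimator to obtain the 100 groups; which estimator and which feature scaling were used is not stated here.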

Results

For each of the sequences MGGGGPPPPPP (M-polyG), GGGGPPPPPP (polyG) and PPPPPPLPETGG (LPETGG), 50,000 structures were created and clustered. After the clustering, each sample consisted of 100 structures of docked complexes.

Figure 13: The three best scoring structures (total score, interface score, reweighted score) of the LPETGG-tag are shown. Only two results are visible as the best reweighted score candidate is identical to the best total score candidate. The reacting section of the LPETGG-tag, namely glycine, is colored yellow, as is the active site. The glycine of both ligand peptides is facing the active site.

Analysis of the scores has shown a similar score for all the three dockings. The best scoring results of the LPETGG docking show a tendency of the glycines to face the active site while also being in close proximity to the active site.

Figure 14: The three best scoring structures (total score, interface score, reweighted score) of the polyG peptide are shown. Only two results are visible as the best reweighted score candidate is identical to the best total score candidate. Instead of facing the active site (yellow), the reacting glycines (yellow) appear to interact with the β6/β7 loop of the sortase.

Figure 15: The three best scoring structures (total score, interface score, reweighted score) of the M-polyG peptide are shown. Only two results are visible as the best reweighted score candidate is identical to the best total score candidate. For the M-polyG peptide no uniform directional orientation can be observed. The structure with the best interface score (light blue) is oriented towards the loop, while the structure with the best total/reweighted score (dark blue) is oriented towards the β-sheets.

Figure lpetgg shows the docking result of the LPETGG peptide to the sortase. The results shown are the best scoring structures of the clustering with respect to the total score, interface score and reweighted score. As the best scoring structure is the same for the total score and the reweighted score, only two peptides are shown. This also applies to figures x and y. For both results the reacting glycine residues (yellow) are facing the active site. Additionally, the same residues are in close proximity to the active site.

Figures x and y show the docking of both polyG and M-polyG. While the polyG results align well and seem to interact with the β6/β7 loop rather than with the active site, this does not seem to be the case for M-polyG. Instead of both structures interacting with either the β6/β7 loop or the active site, one (best interface score; dark blue) interacts with the β6/β7 loop and the other (best reweighted/total score; light blue-gray) appears to interact with the active site.

Figure 16: The close up of the M-polyG peptide (best total/reweighted score) indicates an interaction of methionine with arginine139 and cysteine126.

Figure 17: Methionine of the result with the best interface score interacted with the β6/β7 loop rather than the active site. Still the reactive glycine residues appear to be bound to the β6/β7 loop.

Visualizing the result of the corresponding docking simulation (best total/reweighted score), as can be seen in figure 16, suggests an interaction between methionine and two active site residues, namely arginine139 and cysteine126. Figure 17 shows the interaction of M-polyG with the β6/β7 loop. The glycines still interact with the β6/β7 loop. Instead of binding above the β6/β7 loop, which is the case for polyG as illustrated in fig z, the interaction seems to be influenced by methionine. By interacting with the residues in the β-sheet, methionine could potentially hinder binding of glycine to the β6/β7 loop by partial immobilization of the peptide. Overall, peptide binding and orientation are less uniform compared to polyG without the leading methionine, which could be an indicator of a lower binding affinity of M-polyG towards the β6/β7 loop.

Conclusion

To computationally investigate binding affinities of the polyG and M-polyG as well as the LPETGG tags we performed docking simulations using the Rosetta FlexPepDock application. We used a modified version of the recommended protocol as the modified version was easier to automate and served our purpose better than the standard protocol. From the calculated scores only, we could not see a difference in binding affinities. Thus, we inspected the best scoring structures regarding the total score, the interface score and the reweighted score using PyMOL. Since the best structures with respect to total score and reweighted score were the same for all simulations, only two structures have been inspected per run. A polyproline tag was appended to all the peptides to simulate the modification of the VLPs with a small peptide.

As expected, the results showed that for LPETGG, the glycines of both peptides were oriented towards the active site. This is unsurprising, as peptides with the sequence LPXTGG are known to be substrates of the sortase. It was more surprising to see the polyG tag oriented away from the active site, since polyG is also a known substrate of the sortase. Both polyG peptides were facing the β6/β7 loop (residues 105 to 115) uniformly and appeared to be interacting with it. The M-polyG peptides did not show a uniform orientation or interaction scheme. On the one hand, the visualization of the best result concerning the total and reweighted score showed an interaction of methionine with cysteine126 and arginine139, two residues of the active site. On the other hand, the visualization of the best result with respect to the interface score shows the M-polyG facing the mobile β6/β7 loop. In contrast to the polyG peptide lacking the methionine, the M-polyG peptide is pulled down below the β6/β7 loop by the methionine interacting with one of the β-sheets leading to the active site. This is not the case with the polyG results, which lie aligned in one plane with the β6/β7 loop.
