Difference between revisions of "Team:Rice/Software"

Line 110: Line 110:
 
         }
 
         }
  
         @media screen and (orientation: portrait;) {
+
         @media screen and (orientation: portrait; ) {
 
             .column {
 
             .column {
 
                 width: 100%;
 
                 width: 100%;
Line 127: Line 127:
  
 
         <div class="column" style="background-color: #d2e8c5!important; height: 65vh;">
 
         <div class="column" style="background-color: #d2e8c5!important; height: 65vh;">
             <h2>Basis for a Genetic Algorithm</h2>
+
             <h2>Software Innovation</h2>
 
             <p>
 
             <p>
                 Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed id porta lacus. Proin vel urna nec ex
+
                 The software designed by the 2019 Rice iGEM team uses a genetic algorithm to optimize RNA thermometers
                 rhoncus tristique at ut felis. Integer pharetra ipsum sapien, a bibendum nisi facilisis egestas. Nam
+
                 for a given temperature range by maximizing the base-pairing and secondary structure change within said
                 molestie urna quis porttitor vulputate. Cras nibh leo, fringilla eget luctus et, rutrum ut risus.
+
                range. A Python 3 library called Distributed Evolutionary Algorithms in Python (DEAP) provides the
                 Aliquam finibus nulla eu nunc luctus vulputate. Duis euismod pellentesque dictum. Curabitur blandit diam
+
                 components to build the genetic algorithm. Another library named Scalable Concurrent Operations in
                 et porttitor sollicitudin. In hac habitasse platea dictumst.
+
                Python (SCOOP) parallelizes evaluation of potential candidates, allowing for quick turnover in candidate
 +
                 selection. NuPack is used to evaluate base pairing and secondary structure. The advantage of our program
 +
                 over traditional design processes lies in its speed and automated nature. It allows for the creation and
 +
                testing of a library of RNA thermometers optimized for a custom temperature range without requiring
 +
                access to sophisticated technical expertise. For iGEM teams that require temperature-dependent
 +
                components not present in the existing literature, this would be an expeditious way to provide wetlab
 +
                candidates for testing.
 
             </p>
 
             </p>
 
         </div>
 
         </div>
         <div class="column" style="background-color: #ffffff!important; height: 65vh;">
+
         <<!-- div class="column" style="background-color: #ffffff!important; height: 65vh;">
 
             <img class="img-responsive" style="width:100%; padding:10px;"
 
             <img class="img-responsive" style="width:100%; padding:10px;"
 
                 src="https://static.igem.org/mediawiki/2019/1/18/T--Rice--basic_heat_response.png" />
 
                 src="https://static.igem.org/mediawiki/2019/1/18/T--Rice--basic_heat_response.png" />
        </div>
+
    </div> -->
        <div class="row" style="background-color:#9bc4cf!important; padding-top: 5vh;">
+
    <div class="row" style="background-color:#9bc4cf!important; padding-top: 5vh;">
            <h2>Percentage base pair optimization algorithm</h2>
+
        <h2>General procedure used by the software</h2>
            <ol>
+
        <ol>
                <li>
+
            <li>
                    <p>
+
                <p>
                        For each base of the complement of the context containing the RBS, create three other
+
                    For each base of the complement of the context containing the RBS, create three other
                        permutations which have that base mutated. All of these permutations combined form the initial
+
                    permutations which have that base mutated. All of these permutations combined form the initial
                        population. The baseline is defined as the full sequence containing the context before the
+
                    population. The baseline is defined as the full sequence containing the context before the
                        variable region, the variable region which is the complement of the RBS-containing context, and
+
                    variable region, the variable region which is the complement of the RBS-containing context, and
                        the RBS-containing context.
+
                    the RBS-containing context.
                    </p>
+
                </p>
                </li>
+
            </li>
                <li>
+
            <li>
                    <p>
+
                <p>
                        Calculate the base pairing probabilities at the two given temperatures for each test sequence
+
                    Running the NuPack command
                        using the NuPack command <code>pairs -T TEMP -pseudo -material rna sequencename.</code>
+
                    <code>complexes -T + <b>TEMPERATURE</b>°C + -material rna -pairs -mfe -quiet</code> on an input
                     </p>
+
                     file containing the sequence to be tested produces a number of files which contain the
                </li>
+
                    base-pairing probabilities and minimum free energy secondary structures of the given sequence.
                <li>
+
                </p>
                    <p>
+
            </li>
                        The output of the command consists in part of a list of every base and the base(s) which it has
+
        </ol>
                        a > 0.001 probability of base pairing with by position number. Find the base pairings where one
+
        <h2>Percentage base pair optimization algorithm</h2>
                        of the bases is in the RBS-containing region and add up the probabilities corresponding to that
+
        <ol>
                        base pairing. The nature of this NuPack command should prevent duplicates. Then, subtract the
+
            <li>
                        probability that these bases are unpaired.
+
                <p>
                    </p>
+
                    The <code>.ocx-ppairs</code> file outputted by the command above consists in part of a list of
                </li>
+
                    every base and the base(s) which it has a > 0.001 probability of base pairing with by position
                <li>
+
                    number. Find the base pairings where one of the bases is in the RBS-containing region and add up
                    <p>
+
                    the probabilities corresponding to that base pairing. The nature of this NuPack command should
                        Subtract the number of base pairings at 30°C from the number of base pairings at 25°C. We want
+
                    prevent duplicates. Then, subtract the probability that these bases are unpaired.
                        to maximize the number of RBS-base pairings that disappear as the temperature increases from
+
                </p>
                        25°C to 30°C (for reference, the command I use for the minimum free energy calculation is
+
            </li>
                        <code>mfe -T TEMP -pseudo -material rna sequencename</code>)
+
            <li>
                     </p>
+
                <p>
                </li>
+
                    Subtract the number of base pairings at the higher temperature from the number of base pairings
                <li>
+
                    at the lower temperature. We want to maximize the number of RBS-base pairings that disappear as
                    <p>
+
                    the temperature increases from 25°C to 30°C. The resulting number is the base-pairing fitness
                        Using the library DEAP, select 50 individuals through the tournament selection method.
+
                    value.
                     </p>
+
                </p>
                </li>
+
            </li>
            </ol>
+
        </ol>
        </div>
+
 
 +
        <h2>Secondary structure change optimization algorithm</h2>
 +
        <ol>
 +
            <li>
 +
                <p>
 +
                    The <code>.ocx-mfe</code> file outputted by the command above contains the dot-parentheses
 +
                     representation of the minimum free energy secondary structure of the given sequence at the given
 +
                    temperature.
 +
                </p>
 +
            </li>
 +
            <li>
 +
                <p>
 +
                    Find the Levenshtein distance between the strings representing the dot-parentheses representation at
 +
                    the two different temperatures. The Levenshtein distance measures the number of changes needed to
 +
                     transform one string into another. The resulting numerical value will serve as a proxy for the
 +
                    degree of change in secondary structure between the two temperatures.
 +
                </p>
 +
            </li>
 +
        </ol>
 +
    </div>
  
 
     </div>
 
     </div>

Revision as of 22:48, 16 October 2019


Software Innovation

The software designed by the 2019 Rice iGEM team uses a genetic algorithm to optimize RNA thermometers for a given temperature range by maximizing the change in base pairing and secondary structure within that range. A Python 3 library called Distributed Evolutionary Algorithms in Python (DEAP) provides the components to build the genetic algorithm. Another library named Scalable Concurrent Operations in Python (SCOOP) parallelizes evaluation of potential candidates, allowing for quick turnover in candidate selection. NuPack is used to evaluate base pairing and secondary structure. The advantage of our program over traditional design processes lies in its speed and automated nature. It allows for the creation and testing of a library of RNA thermometers optimized for a custom temperature range without requiring access to sophisticated technical expertise. For iGEM teams that require temperature-dependent components not present in the existing literature, this offers a fast way to generate candidates for wet-lab testing.


General procedure used by the software

  1. For each base of the complement of the context containing the RBS, create three variants in which that base is replaced by each of the other three bases. All of these variants combined form the initial population. The baseline is defined as the full sequence made up of the context before the variable region, the variable region (which is the complement of the RBS-containing context), and the RBS-containing context.

  2. Running the NuPack command complexes -T TEMPERATURE -material rna -pairs -mfe -quiet (with TEMPERATURE given in °C) on an input file containing the sequence to be tested produces a number of files that contain the base-pairing probabilities and minimum free energy secondary structures of the given sequence.
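
Step 1 above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the team's code: the function name single_base_mutants and the example sequence are hypothetical, and since the text does not say whether the unmutated sequence itself joins the population, it is left out here.

```python
RNA_BASES = "ACGU"


def single_base_mutants(variable_region):
    """Return every sequence that differs from variable_region at exactly one position."""
    mutants = []
    for position, original_base in enumerate(variable_region):
        for replacement in RNA_BASES:
            if replacement != original_base:
                # Swap in the replacement base, leaving the rest of the sequence untouched.
                mutants.append(
                    variable_region[:position] + replacement + variable_region[position + 1:]
                )
    return mutants


# Hypothetical 6-nt variable region, for illustration only.
initial_population = single_base_mutants("CCUCCU")
print(len(initial_population))  # three variants per base: 18
```

A variable region of length n therefore yields 3n starting candidates, each one substitution away from the baseline.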

Percentage base pair optimization algorithm

  1. The .ocx-ppairs file produced by the command above includes a list, by position number, of every base and the base(s) with which it has a pairing probability greater than 0.001. Find the base pairings in which one of the bases lies in the RBS-containing region and add up the corresponding probabilities. The nature of this NuPack command should prevent duplicates. Then, subtract the probability that these bases are unpaired.

  2. Subtract the base-pairing value at the higher temperature from the base-pairing value at the lower temperature. We want to maximize the amount of RBS base pairing that disappears as the temperature increases (for example, from 25°C to 30°C). The resulting number is the base-pairing fitness value.
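
The two steps above can be sketched as follows. The sketch assumes the pair probabilities have already been read from the .ocx-ppairs file into (i, j, probability) tuples and the unpaired probabilities into a dictionary keyed by position; that parsed layout, the function names, and all numbers are hypothetical and chosen only for illustration.

```python
def rbs_pairing_score(pair_probs, unpaired_probs, rbs_positions):
    """Sum pairing probabilities that involve an RBS base, minus the RBS unpaired probabilities."""
    rbs = set(rbs_positions)
    # Each (i, j) pair is listed once, so a pair is counted once even if both bases are in the RBS.
    paired = sum(p for i, j, p in pair_probs if i in rbs or j in rbs)
    unpaired = sum(unpaired_probs.get(position, 0.0) for position in rbs)
    return paired - unpaired


def base_pairing_fitness(score_low_temp, score_high_temp):
    """Pairing lost on heating: lower-temperature score minus higher-temperature score."""
    return score_low_temp - score_high_temp


# Made-up probabilities for a toy sequence whose RBS occupies positions 9 and 10.
rbs_region = [9, 10]
low = rbs_pairing_score([(1, 10, 0.9), (2, 9, 0.8), (3, 4, 0.5)], {9: 0.2, 10: 0.1}, rbs_region)
high = rbs_pairing_score([(1, 10, 0.3), (2, 9, 0.2)], {9: 0.8, 10: 0.7}, rbs_region)
print(round(base_pairing_fitness(low, high), 3))  # 2.4
```

A larger value means the RBS region is well paired at the lower temperature and mostly free at the higher one, which is the behavior being selected for.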

Secondary structure change optimization algorithm

  1. The .ocx-mfe file produced by the command above contains the dot-parenthesis representation of the minimum free energy secondary structure of the given sequence at the given temperature.

  2. Find the Levenshtein distance between the dot-parenthesis strings at the two different temperatures. The Levenshtein distance is the minimum number of single-character insertions, deletions, or substitutions needed to transform one string into another. The resulting numerical value serves as a proxy for the degree of change in secondary structure between the two temperatures.
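
As an illustration of step 2, the Levenshtein distance can be computed with the standard dynamic-programming recurrence. This is a generic sketch, not necessarily the implementation used by the software, and the two structures below are invented for the example.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions turning a into b."""
    # previous[j] holds the distance between the first i-1 characters of a and the first j of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,                      # delete char_a
                    current[j - 1] + 1,                   # insert char_b
                    previous[j - 1] + substitution_cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


# Hypothetical structures: a hairpin at the lower temperature, fully open at the higher one.
structure_low_temp = "((((....))))"
structure_high_temp = "............"
print(levenshtein(structure_low_temp, structure_high_temp))  # 8
```

Here all eight paired positions change to unpaired, so the distance is 8; two identical structures would score 0.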