Difference between revisions of "Team:Calgary/BOT"

 

Latest revision as of 03:52, 14 December 2019

Software

BioBrick Optimization Tool

BioBrick Optimization Tool - synthesis

A software tool to help iGEM teams optimize hard-to-synthesize sequences for expression and synthesis.

  • Remove Repeats
  • Reduce GC Content
  • Remove Hairpins
  • And More!

Results

How good is BOT?

Using the SPEA2 algorithm (Coello et al., 2007), BOTs can optimize sequences for both expression and ease of synthesis. This is evidenced by its ability to reduce IDT Gene Fragment Synthesis scores from well over a hundred to 7, a task that cannot be done by hand in a reasonable time.

This score for SGR, an enzyme in the degradation pathway, was achieved in minutes with BOTs, whereas bringing it down to this level by hand took two weeks of tireless work.

Background

Why do we want to improve codon optimization?

This year, one of the team's enzymes was nearly impossible to optimize for both expression and synthesis. Tools to optimize for expression are plentiful, but tools to optimize for synthesis are very hard to find. This led to frustration, as two weeks were spent on painstaking, iterative manual work. Thus the idea of a novel tool for synthetic biology was born.

Codon optimization is a standard problem in synthetic biology. To produce a protein in a given host, we need to think not only about restriction sites and how the protein might fold in the host, but also about achieving a high level of production. That high production is obtained through codon optimization. Expression optimization is important, but sometimes you get unlucky and the optimized sequence cannot be synthesized. You then have to deal with repeats, GC richness, hairpins, and other features that make synthesis harder. Removing all of these features by hand is tedious, because removing one can introduce another.
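
As a rough illustration of what detecting these synthesis problems might look like, here is a minimal Python sketch. It is not taken from BOTs or codon-harmony; the function names, the k-mer length, and the stem and loop lengths are all assumptions chosen for the example.

```python
from collections import defaultdict

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq):
    """Fraction of bases in seq that are G or C (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def repeated_kmers(seq, k=8):
    """Map each k-mer that occurs more than once to its start positions."""
    seq = seq.upper()
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


def reverse_complement(seq):
    """Reverse complement of a DNA string made of A, C, G, T."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def hairpin_stems(seq, stem=6, min_loop=3):
    """Find (i, j) pairs where seq[i:i+stem] could pair with seq[j:j+stem].

    A very crude hairpin check (hypothetical thresholds): it only looks for
    an exact reverse complement at least min_loop bases downstream.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - stem + 1):
        target = reverse_complement(seq[i:i + stem])
        j = seq.find(target, i + stem + min_loop)
        if j != -1:
            hits.append((i, j))
    return hits
```

Each function only identifies a feature; none of them says how to remove it without creating a different one, which is the hard part described above.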

How

Steps in the approach

So we divided the problem into three parts.

  1. Finding current solutions
  2. High Usability
  3. High Performance Code

Current solutions

codon-harmony

A first look at current solutions yielded very few results. Many tools can optimize sequences for expression, and some can remove known undesirable features, but only one, codon-harmony by Brian Weitzner, claimed to optimize for synthesis as well. It was extremely hard to use, and although it did run, it did not function well: synthesis scores would often increase rather than decrease. Furthermore, it was a command-line tool, which made it much harder to use.

Usability

Using Django

To make it usable, we had to steer away from the command line, and we decided the best way to do this was to build a website. Because codon-harmony by Brian Weitzner is written in Python, we chose Django, a Python web framework that lets us drive Python code from HTML pages. The first step was to modify the code to accept arguments from sources other than the command line; in its original state the code could only take command-line arguments, which is very user-unfriendly. We decided to let callers provide the arguments as a dictionary, using Python dataclasses. This way, scripts can easily be set up to run codon-harmony. Once this is done, the code can be called easily from the website, and the inputs can be checked beforehand. One major flaw of the original program is that it does not recognize whether an input file is in FASTA format. We changed it so that it rejects non-FASTA files and raises a warning instead of outputting empty sequences.

Whilst our web version of codon-harmony can currently only run locally, we hope to have it fully deployed by the Jamboree so that all iGEMers can use BOTs in the future.
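
The idea of passing arguments as a dictionary through a dataclass, and of checking for FASTA input up front, could be sketched roughly as follows. This is a hypothetical illustration only: `OptimizationOptions`, its field names and defaults, and `looks_like_fasta` are assumptions made for the example and are not the actual codon-harmony or BOTs interface.

```python
from dataclasses import dataclass, asdict


@dataclass
class OptimizationOptions:
    # Hypothetical fields; the real tool's arguments may differ.
    input_path: str
    max_gc: float = 0.65
    remove_repeats: bool = True
    remove_hairpins: bool = True


def options_from_dict(raw):
    """Build options from a plain dict, e.g. data posted by a web form.

    Unknown keys raise TypeError, so typos are caught early.
    """
    return OptimizationOptions(**raw)


def looks_like_fasta(text):
    """Return True if text has a '>' header followed by sequence lines."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        return False
    has_sequence = False
    for ln in lines:
        if ln.startswith(">"):
            continue
        # Allow letters plus '*' (stop symbol in protein FASTA files).
        if not ln.replace("*", "").isalpha():
            return False
        has_sequence = True
    return has_sequence


def validate_input(text):
    """Reject non-FASTA input with a clear error instead of empty output."""
    if not looks_like_fasta(text):
        raise ValueError("Input does not look like a FASTA file.")
    return text
```

A script or a web view could then call `options_from_dict({"input_path": "sgr.fasta", "max_gc": 0.6})` (the file name is made up) and run `validate_input` on the uploaded text before starting the optimization.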

Better Program

Genetic Algorithms

Although codon-harmony claimed to be able to optimize for synthesis, testing showed that it not only failed to optimize sequences but often made them worse. This was because codon-harmony was iterative: it would optimize a sequence one function at a time, and the gains from one function could then be undone by the next.

So we created a tool that removes repeats, keeps GC richness below a certain percentage and removes hairpins.

The initial design used a simple genetic algorithm. The advantage of a genetic algorithm is that it approaches the problem in a very different computational way. Identifying all the repeats in a sequence is trivial compared to removing all of them. The genetic algorithm only relies on identification; it does not solve the problem algorithmically. Solving the problem is left to evolution, i.e. random events.
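
To make the "identify, then let evolution solve" idea concrete, here is a minimal single-objective sketch in Python. It is an assumption-laden toy, not the BOTs implementation: the penalty function, its weight of 100, the mutation rate, and the population settings are all invented for illustration. Mutations only swap a codon for a synonymous one, so the encoded protein never changes. The input is assumed to be upper-case DNA whose length is a multiple of three.

```python
import random
from itertools import product

# Standard genetic code (NCBI table 1), codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SYNONYMS = {}
for codon, aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(seq):
    """Translate an upper-case DNA string whose length is a multiple of 3."""
    return "".join(CODON_TO_AA[seq[i:i + 3]] for i in range(0, len(seq), 3))


def mutate(seq, rate, rng):
    """Randomly replace codons with synonymous ones (protein is preserved)."""
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    for n, codon in enumerate(codons):
        if rng.random() < rate:
            codons[n] = rng.choice(SYNONYMS[CODON_TO_AA[codon]])
    return "".join(codons)


def penalty(seq, max_gc=0.6, k=8):
    """Identification only: count repeated k-mers and excess GC content.

    The weight of 100 on the GC term is an arbitrary, hypothetical choice.
    """
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    repeats = len(kmers) - len(set(kmers))
    return repeats + 100 * max(0.0, gc - max_gc)


def evolve(seq, generations=50, pop_size=20, rate=0.1, seed=0):
    """Keep the better half each generation and refill with mutants."""
    rng = random.Random(seed)
    population = [seq] + [mutate(seq, rate, rng) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=penalty)
        parents = population[: pop_size // 2]
        children = [mutate(rng.choice(parents), rate, rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return min(population, key=penalty)
```

Note that nothing in `evolve` knows how to remove a repeat or lower GC content; it only scores candidates and keeps the ones that score better.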

This did not work. It operated on the sum of all the fitness functions, which, while valid for many applications, is unsuitable here because we have no idea what weights to assign; weights can only be chosen after extensive research into the solution space. Thus came the third and final form of BOTs, which is an application of the SPEA2 algorithm. SPEA2 is considered one of the best evolutionary algorithms and is commonly used as a benchmark for new genetic algorithms; for the most part, the best new ones are described as "comparable to" it rather than clearly better, at least across a breadth of problem statements. So if you are building a genetic algorithm, SPEA2 is your friend.
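
The difference from a weighted sum can be sketched with the Pareto-dominance and raw-fitness steps that SPEA2 uses to rank candidates without any weights. The snippet below is a simplified illustration, not the BOTs code: it assumes every objective is a penalty to be minimized, and it leaves out SPEA2's density estimate and archive truncation. The example score tuples are made up.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in one.

    Objectives are assumed to be penalties, so lower is better.
    """
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def raw_fitness(scores):
    """SPEA2-style raw fitness for a list of objective tuples.

    strength[i] is the number of candidates that i dominates; the raw
    fitness of i is the summed strength of everything that dominates i.
    Non-dominated candidates therefore get a raw fitness of 0.
    """
    n = len(scores)
    strength = [sum(dominates(scores[i], scores[j]) for j in range(n))
                for i in range(n)]
    return [sum(strength[j] for j in range(n)
                if dominates(scores[j], scores[i]))
            for i in range(n)]


# Hypothetical candidates scored as (repeat penalty, GC penalty).
candidates = [(0, 5), (3, 1), (4, 6), (3, 5)]
print(raw_fitness(candidates))  # [0, 0, 5, 4]
```

Here `(0, 5)` and `(3, 1)` are both kept as equally good trade-offs, because neither dominates the other; a weighted sum would have forced a choice between them based on weights we do not know.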

Future Directions

There are several ways we could improve BOTs: improving the truncation function and the raw fitness calculation function; integrating a more compact version of the graph for more efficient calculations; using linked lists rather than arrays, as arrays lose their data more easily during truncation; and hosting the website publicly rather than only having a functional local version.

Access

GitHub

GitHub Link for BOTs: https://github.com/iGEMCalgary/BOTs

References

Coello, C. A. C., Lamont, G. B., & Van Veldhuizen, D. A. (2007). Evolutionary Algorithms for Solving Multi-Objective Problems (2nd ed.). Boston (MA): Springer.