Revision as of 10:27, 21 October 2019

BioBrick Optimization Tool - synthesis

A software tool to help iGEM teams optimize hard-to-synthesize sequences for expression and synthesis.

Remove Repeats
Reduce GC Content
Remove Hairpins
And More!

Results

How good is BOT?

With its SPEA2 algorithm[1], BOTs can optimize sequences for both expression and ease of synthesis. This is evidenced by its ability to reduce IDT Gene Fragment Synthesis scores from well over a hundred to 7, a task that cannot be done by hand in a reasonable time.

This score for SGR, an enzyme in the degradation pathway, was achieved in minutes with BOTs, whereas bringing it down to this level by hand took two weeks of tireless work.

Background

Why do we want to improve codon optimization

This year, one of the team's enzymes was nearly impossible to optimize for both expression and synthesis. Tools that optimize for expression are plentiful, but we could not find any that optimize for synthesis. This led to frustration, as two weeks were spent on painstaking, iterative manual work. Thus the idea of a new tool for synthetic biology was born.

Codon optimization is a standard problem in synthetic biology. To produce a protein in a given host, we need to think not only about restriction sites and how the protein might fold in the host, but also about reaching a high level of production. That high production is achieved through codon optimization. Codon-optimized genes express well, but sometimes you get unlucky and the resulting sequence cannot be synthesized. You then have to deal with repeats, high GC content, hairpins and other features that make synthesis harder. Removing all of these features by hand is tedious work.
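To make the three synthesis problems concrete, here is a minimal Python sketch of how each one could be detected. It is not BOTs or codon-harmony code: the function names, the k-mer length and the stem and loop sizes are assumptions chosen only for illustration, and real synthesis vendors apply their own, more detailed rules.

```python
# Illustrative detectors for the three features named above.
# All names and thresholds are hypothetical, not taken from BOTs.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def find_repeats(seq: str, k: int = 8) -> dict:
    """Map every k-mer that occurs more than once to its start positions."""
    seq = seq.upper()
    positions = {}
    for i in range(len(seq) - k + 1):
        positions.setdefault(seq[i:i + k], []).append(i)
    return {kmer: starts for kmer, starts in positions.items() if len(starts) > 1}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence containing only A, C, G and T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_hairpins(seq: str, stem: int = 6, min_loop: int = 3, max_loop: int = 10) -> list:
    """Return (stem_start, partner_start) pairs where a stem is followed,
    after a short loop, by its exact reverse complement."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - stem + 1):
        target = reverse_complement(seq[i:i + stem])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            # A slice running past the end is shorter than the target,
            # so it can never match and needs no explicit bounds check.
            if seq[j:j + stem] == target:
                hits.append((i, j))
    return hits
```

Each function only reports a problem. Deciding which codons to change so that all three reports come back clean at the same time is the hard part.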

How

Steps in the approach

We divided the problem into two parts:

Change the code so anyone can use it.
Modify the algorithm to make it faster and more effective, and/or add features.

Easier to Use

Using Django

For the first part, we had to make the existing code usable, and we decided the best way to do this was to create a website. Because codon-harmony by Brian Weitzner is written in Python, we decided to use Django, a Python web framework that lets us serve HTML pages backed by Python code. The first step was to change how the code accepts its arguments. In its original state, the code could only take command-line arguments, which is very user-unfriendly. We decided to let people provide the arguments as a dictionary by using dataclasses in Python. This way, scripts can easily be set up to run codon-harmony. Once this is done, the code can be called easily from the website, and the input can be checked beforehand. One of the major flaws of the program is that it does not check whether the input file is in FASTA format. So we changed it to warn the user when the input is not FASTA and then ask whether they want it converted to FASTA.

The second part was to optimize the program. Time versus space is a classic trade-off in bioinformatics. If we go for pure speed, we are likely to use too much memory, especially for large sequences. But if we go for lower memory use, the program will be very slow. There are many techniques for optimizing one, the other, or both. We will try our best.
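The two changes described above can be sketched in Python as follows. This is a hedged illustration, not the actual BOTs or codon-harmony code: the class name, its fields and defaults, and the helper functions are all hypothetical, and the FASTA check is deliberately simple (it only looks for a header line followed by at least one sequence line).

```python
from dataclasses import dataclass, fields


@dataclass
class OptimizationOptions:
    """Hypothetical option set; the field names are assumptions,
    not codon-harmony's real parameters."""
    host: str = "ecoli"
    max_gc: float = 0.65
    min_repeat_length: int = 8

    @classmethod
    def from_dict(cls, raw: dict) -> "OptimizationOptions":
        """Build the options from a plain dictionary, rejecting unknown keys."""
        known = {f.name for f in fields(cls)}
        unknown = set(raw) - known
        if unknown:
            raise ValueError(f"unknown option(s): {sorted(unknown)}")
        return cls(**raw)


def looks_like_fasta(text: str) -> bool:
    """True if the first non-empty line is a '>' header followed by sequence."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return (
        len(lines) >= 2
        and lines[0].startswith(">")
        and not lines[1].startswith(">")
    )


def to_fasta(text: str, name: str = "sequence", width: int = 60) -> str:
    """Wrap a bare sequence in a single-record FASTA with the given header."""
    residues = "".join(text.split())
    body = "\n".join(residues[i:i + width] for i in range(0, len(residues), width))
    return f">{name}\n{body}\n"
```

A website view could then call `looks_like_fasta` on the uploaded text, warn the user when it returns `False`, and only call `to_fasta` after the user agrees to the conversion.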

Better Program

Genetic Algorithms

So we set out to create a tool that removes repeats, keeps GC content below a certain percentage and removes hairpins. Unfortunately, someone had already created one that did this. It wasn't good: it is slow, it is very difficult to use, and it works in a format few people can use. So the plan for this project switched from creating the entire algorithm to optimizing an existing algorithm and making it easy to use.

Unfortunately, just modifying the code didn't work. The old codon-harmony iteratively changes one aspect of the sequence at a time, so the progress gained from removing repeats may be destroyed when it then tries to reduce GC content, rendering the program useless for complicated sequences.

With that setback, a new way of looking at optimization was required. At first we blamed the individual functions, which weren't perfect but were not the cause. To change the overall program, we came up with the idea of using a genetic algorithm. The advantage of a genetic algorithm is that it approaches the computation in a very different way. Identifying all the repeats in a sequence is trivial compared to changing a sequence until there are none. The genetic algorithm only has to identify the problems, not work out how to fix them. Fixing them is left to evolution.
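The idea that the algorithm only needs to score sequences can be shown with a toy example. The Python sketch below is an assumption-laden illustration, not BOTs itself: it uses a partial synonymous-codon table (four amino acids only), a single objective (the number of repeated 6-mers), and hypothetical function names and parameters. Mutation swaps codons for synonymous ones at random, and selection keeps whichever sequences score best.

```python
import random

# Partial synonymous-codon table, for illustration only.
SYNONYMS = {
    "A": ["GCT", "GCC", "GCA", "GCG"],
    "G": ["GGT", "GGC", "GGA", "GGG"],
    "P": ["CCT", "CCC", "CCA", "CCG"],
    "K": ["AAA", "AAG"],
}
CODON_TO_AA = {codon: aa for aa, codons in SYNONYMS.items() for codon in codons}


def translate(seq: str) -> str:
    """Translate a coding sequence using the partial table above."""
    return "".join(CODON_TO_AA[seq[i:i + 3]] for i in range(0, len(seq), 3))


def count_repeats(seq: str, k: int = 6) -> int:
    """Number of k-mer occurrences beyond the first (0 means no repeats)."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    return len(kmers) - len(set(kmers))


def mutate(seq: str, rng: random.Random, rate: float = 0.1) -> str:
    """Randomly replace codons with synonymous ones; the protein is unchanged."""
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    for idx, codon in enumerate(codons):
        if rng.random() < rate:
            codons[idx] = rng.choice(SYNONYMS[CODON_TO_AA[codon]])
    return "".join(codons)


def evolve(seq: str, generations: int = 50, population: int = 20, seed: int = 0) -> str:
    """Keep the best-scoring sequences each generation and return the best one."""
    rng = random.Random(seed)
    pool = [seq]
    for _ in range(generations):
        children = [mutate(rng.choice(pool), rng) for _ in range(population)]
        # Sort by score, then by sequence text so that ties break deterministically.
        ranked = sorted(set(pool + children), key=lambda s: (count_repeats(s), s))
        pool = ranked[:population]
    return pool[0]
```

Nothing in `evolve` knows how to remove a repeat. It only calls `count_repeats` and keeps the sequences that score lower, and because the best sequence is always retained, the result is never worse than the input.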

A first genetic algorithm was created to do this. It did not work. It operated on the sum of all the fitness functions, which, while valid for many applications, does not suit this one because we have no idea what weights to assign. Thus came the third and final form of BOT, which is an application of the SPEA2 algorithm. SPEA2 is a multi-objective algorithm: it compares candidates on each objective separately, so no weights are needed. Considered one of the best evolutionary algorithms, SPEA2 is widely used as a benchmark for new genetic algorithms, and for the most part the best new ones are "comparable to" it, never really better. So if you are building a genetic algorithm, SPEA2 is your friend.
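How SPEA2 avoids weights can be sketched with its two core quantities: the strength of an individual (how many others it dominates) and its raw fitness (the summed strength of everything that dominates it). The Python below is a minimal sketch assuming every objective is minimized; the function names and the example objective pairs are hypothetical. It leaves out SPEA2's density estimate and archive truncation, so it is not a full implementation and not BOTs code.

```python
def dominates(a, b) -> bool:
    """True if a is no worse than b on every objective and strictly better
    on at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def raw_fitness(scores) -> list:
    """SPEA2-style raw fitness for each score tuple; 0 means non-dominated."""
    n = len(scores)
    # Strength: how many individuals each one dominates.
    strength = [
        sum(dominates(scores[i], scores[j]) for j in range(n))
        for i in range(n)
    ]
    # Raw fitness: summed strength of all individuals that dominate this one.
    return [
        sum(strength[j] for j in range(n) if dominates(scores[j], scores[i]))
        for i in range(n)
    ]


# Hypothetical (repeat count, excess GC) pairs for five candidate sequences.
candidates = [(0, 5), (5, 0), (3, 3), (6, 6), (4, 4)]
print(raw_fitness(candidates))  # [0, 0, 0, 5, 2]
```

The first three candidates trade repeats against GC content in different ways, yet all receive a raw fitness of 0 because none of them is beaten on both objectives at once. A weighted sum would have forced a choice between them.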

@@ Line 1: / Line 1: @@
 {{:Template:Calgary/Layout}}
 <html>
-<head>
+  <head>
-</head>
+  </head>
-<body>
+  <body>
-	<div class="container-fluid">
+    <div class="container-fluid">
-		<div class = "fixed" id="fixed-content">
+      <div class = "fixed" id="fixed-content">
-			<div class="mobile-banner-back" id="banner">
-				<div class="page-banner">
-					<h2 class="page-subtitle">Section &nbsp;&nbsp;/&nbsp;&nbsp; <span class="emphasis">Page</span></h2>
-					<img src="Navigation Section.svg">
-				</div>
-			</div>
-			<div class="progress-container">
+        <div class="section-menu section-menu-up" id="section-menu">
-				<progress value="0" max="100" id="bar"></progress>
+          <div class="sections" id="sections">
-			</div>
+          </div>
+          <div class="back-to-top">
+          </div>
+        </div>
+      </div>
-			<div class="section-menu section-menu-up" id="section-menu">
-				<div class="sections" id="sections">
-				</div>
-				<div class="back-to-top">
-					<a class="goto-top" href="#">Back to Top</a>
-				</div>
-			</div>
-		</div>
-		<div class="desktop-banner-back">
+      <div class="desktop-banner-back">
-			<div class="text-area">
+        <div class="text-area">
-				<div class="page-banner">
+          <div class="page-banner">
-					<h2 class="page-subtitle">SOFTWARE</h2>
+            <h2 class="page-subtitle">Software</h2>
-					<h1 class="page-title">BioBrick Optimization Tool</h1>
+            <h1 class="page-title">BioBrick Optimization Tool</h1>
-				</div>
+          </div>
-			</div>
+        </div>
-			<div class="overlap-area" id="overlap"></div>
+        <div class="overlap-area" id="overlap"></div>
-		</div>
+      </div>
-		<div class="interface-group">
+      <div class="interface-group" id="interface">
-			<div class="desktop-section-menu" id="desktop-section-menu">
+        <div class="menu-container" id="menu-container">
-				<div class="sections" id="desktop-sections">
+          <div class="desktop-section-menu" id="desktop-section-menu">
-				</div>
+            <div class="sections" id="desktop-sections">
-				<div class="back-to-top" id="go-top">
+            </div>
-					<a class="goto-top" href="#">Back to Top</a>
+            <div class="back-to-top" id="go-top">
-				</div>
+            </div>
-			</div>
+          </div>
-			<div class="content-area" id="textual-content">
+        </div>
-<div class="header-area">
-<img src="https://static.igem.org/mediawiki/2019/7/78/T--Calgary--BOTs-logo.png" width="50%" style="margin-left: auto; margin-right: auto; display: block;"/>
-					<h2 style="text-align:center;">BioBrick Optimization Tool - synthesis</h2>
-				</div>
-<p>A software tool to help iGEM teams optimize hard-to-synthesize sequences for expression and synthesis. GRAPHIC
-<ul>
-<li>Remove Repeats</li>
-<li>Reduce GC Content</li>
-<li>Remove Hairpins</li>
-<li>And More!</li>
-</ul>
-<div class="header-area">
-					<h1>Results</h1>
-					<h2>How good is BOT?</h2>
-				</div>
-<p>With BOTs' advanced SPEA2 algorithm[1], BOTs is capable of optimizing sequences for both expression and ease-of-synthesis. As evidenced by its ability to reduce IDT Gene Fragment Synthesis scores from well over a hundred to 7. A task impossible to do in a reasonable time by manual labour.</p>
-<img src="https://static.igem.org/mediawiki/2019/d/db/T--Calgary--BOTs-120.png" width="100%" style="margin-left: auto; margin-right: auto; display: block;"/>
-<img src="https://static.igem.org/mediawiki/2019/5/55/T--Calgary--BOTs-Score-7.png" width="100%" style="margin-left: auto; margin-right: auto; display: block;"/>
-<p>This score for <a target="blank" href="https://2019.igem.org/Team:Calgary/RepurposingChlorophyll"> SGR, an enzyme in the degradation pathway</a> was achieved in minutes with BOTs, whilst it took two weeks of tireless work to bring it down to this level with manual labour.</p>
-				<div class="header-area">
-					<h1>Background</h1>
-					<h2>Why do we want to improve codon optimization</h2>
-				</div>
-				<p>This year, one of the team's enzymes was nearly impossible to optimize for both expression and synthesis. Tools to optimize for expression are plenty, but tools to optimize for synthesis are nowhere to be found. This led to frustration as two weeks of work were wasted on painstakingly iterative work. Thus the idea of a new tool for synthetic biology was born.</p>
+        <div class="content-area" id="textual-content">
+          <div class="header-area">
+            <img src="https://static.igem.org/mediawiki/2019/7/78/T--Calgary--BOTs-logo.png" width="50%" style="margin-left: auto; margin-right: auto; display: block;"/>
+            <h2 style="text-align:center;">BioBrick Optimization Tool - synthesis</h2>
+          </div>
+          <p>A software tool to help iGEM teams optimize hard-to-synthesize sequences for expression and synthesis. GRAPHIC
+          <ul>
+            <li>Remove Repeats</li>
+            <li>Reduce GC Content</li>
+            <li>Remove Hairpins</li>
+            <li>And More!</li>
+          </ul>
+          <div class="header-area">
+            <h1>Results</h1>
+            <h2>How good is BOT?</h2>
+          </div>
+          <p>With BOTs' advanced SPEA2 algorithm[1], BOTs is capable of optimizing sequences for both expression and ease-of-synthesis. As evidenced by its ability to reduce IDT Gene Fragment Synthesis scores from well over a hundred to 7. A task impossible to do in a reasonable time by manual labour.</p>
+          <img src="https://static.igem.org/mediawiki/2019/d/db/T--Calgary--BOTs-120.png" width="100%" style="margin-left: auto; margin-right: auto; display: block;"/>
+          <img src="https://static.igem.org/mediawiki/2019/5/55/T--Calgary--BOTs-Score-7.png" width="100%" style="margin-left: auto; margin-right: auto; display: block;"/>
+          <p>This score for <a target="blank" href="https://2019.igem.org/Team:Calgary/RepurposingChlorophyll"> SGR, an enzyme in the degradation pathway</a> was achieved in minutes with BOTs, whilst it took two weeks of tireless work to bring it down to this level with manual labour.</p>
+          <div class="header-area">
+            <h1>Background</h1>
+            <h2>Why do we want to improve codon optimization</h2>
+          </div>
-				<p>Codon optimization is a standard problem in synthetic biology. To produce a protein in a given host, we not only need to think about restriction sites, and how it might get folded in the host, we also want a high level of production in the host. Getting that high production is done through codon optimization. The genes in which the codons are optimized are great and all, but sometimes you get unlucky and your sequence cannot be synthesized. You have to deal with repeats, gc richness, hairpins and other factors that reduce the ability to synthesize. Removing all these features is tedious work. </p>
+          <p>This year, one of the team's enzymes was nearly impossible to optimize for both expression and synthesis. Tools to optimize for expression are plenty, but tools to optimize for synthesis are nowhere to be found. This led to frustration as two weeks of work were wasted on painstakingly iterative work. Thus the idea of a new tool for synthetic biology was born.</p>
+          <p>Codon optimization is a standard problem in synthetic biology. To produce a protein in a given host, we not only need to think about restriction sites, and how it might get folded in the host, we also want a high level of production in the host. Getting that high production is done through codon optimization. The genes in which the codons are optimized are great and all, but sometimes you get unlucky and your sequence cannot be synthesized. You have to deal with repeats, gc richness, hairpins and other factors that reduce the ability to synthesize. Removing all these features is tedious work. </p>
-				<div class="header-area">
-					<h1>How</h1>
-					<h2>Steps in the approach</h2>
-				</div>
-				<p>So we divided the problem into two parts.
-<ol><li>Change the code so anyone can use it.</li>
-<li>Modify the algorithm to make it faster and/or more performant and/or add features.</li></ol>
-</p>
+          <div class="header-area">
+            <h1>How</h1>
+            <h2>Steps in the approach</h2>
+          </div>
-				<div class="header-area">
+          <p>So we divided the problem into two parts.
-					<h1>Easier to Use</h1>
+          <ol><li>Change the code so anyone can use it.</li>
-					<h2>Using Django</h2>
+            <li>Modify the algorithm to make it faster and/or more performant and/or add features.</li></ol>
-				</div>
+          </p>
-				<p>To do the first one we had to modify it to be usable. We decided it was best to create a website to do this. Because codon-harmony by Brian Weitzner is coded in Python we decided to use Django, an API that lets us use python code and HTML at the same time. The first step in this process was to modify the code to accept arguments. In its current state, the code could only take command line arguments which is very user-unfriendly. We decided to let people provide the arguments as a dictionary by using dataclasses in Python. This way scripts could easily be set-up to run codon-harmony. After this is done it is possible to use the code easily in the website, and do checks on it beforehand. One of the major flaws of the program is that it doesn’t recognize if a file is in a FASTA file or not. So we made the changes so it warns the user it is not a FASTA, then asks them if they want their project modified into a fasta.</p>
-				<p>Second optimize the program. Time vs Space is a classic conundrum in bioinformatics. If we go for pure speed, it is likely we use too much memory, especially for large sequences. But if we go towards using less memory, it will be a very slow program. There are multitudes of techniques to optimize for one or the other, or both. We will try our best.</p>
-				<div class="header-area">
-					<h1>Better Program</h1>
-					<h2>Genetic Algorithms</h2>
-				</div>
-<p>So we created a tool that removes repeats, keeps GC richness below a certain percentage and removes hairpins. Unfortunately, someone already created one that did it. It wasn’t good, it is slow, it is very difficult to use. And worked in a format few people can use. So the plan for this project switched from creating the entire algorithm, to optimizing an algorithm and making it easy to use.</p>
-<p>Unfortunately, just modifying the code didn't work, the way the old codon-harmony worked is to iteratively change every aspect of the sequence one at a time, meaning that all progress gained from removing repeats, may be destroyed from trying to reduce GC content, rendering the program completely useless for complicated sequences</p>
+        <div class="header-area">
+          <h1>Easier to Use</h1>
+          <h2>Using Django</h2>
+        </div>
+        <p>To do the first one we had to modify it to be usable. We decided it was best to create a website to do this. Because codon-harmony by Brian Weitzner is coded in Python we decided to use Django, an API that lets us use python code and HTML at the same time. The first step in this process was to modify the code to accept arguments. In its current state, the code could only take command line arguments which is very user-unfriendly. We decided to let people provide the arguments as a dictionary by using dataclasses in Python. This way scripts could easily be set-up to run codon-harmony. After this is done it is possible to use the code easily in the website, and do checks on it beforehand. One of the major flaws of the program is that it doesn’t recognize if a file is in a FASTA file or not. So we made the changes so it warns the user it is not a FASTA, then asks them if they want their project modified into a fasta.</p>
+        <p>Second optimize the program. Time vs Space is a classic conundrum in bioinformatics. If we go for pure speed, it is likely we use too much memory, especially for large sequences. But if we go towards using less memory, it will be a very slow program. There are multitudes of techniques to optimize for one or the other, or both. We will try our best.</p>
-<p>With that setback, a new way of looking at optimization was required. Initial fault was given to the individual functions, which weren't perfect, but were not to blame. To change the overall program, an idea was formed to just use a genetic algorithm. The advantage of a genetic algorithm is that it will operate in a very different computational way. Identifying all the repeats in a sequence is trivial compared to changing a sequence until there is none. <<read up on NP-Complete>> The genetic algorithm only operates on identification, not actually solving the problem. Solving the problem is left to evolution.<<Read up on how genetic algorithms work>> </p>
-<p>A first genetic algorithm was created to do this. It did not work. It operated on the sum of all the fitness functions, which while valid for many applications, is incorrect for this application as we have no idea what weights to assign. Thus came along the third and final form of BOT, which is an application of the SPEA2 algorithm. Considered on of the best evolutionary algorithm, SPEA2 is used as a benchmark for all new genetic algorithms, and for the most part the best new ones are "comparable to" never really better. So if you are building a genetic algorithm, SPEA2 is your friend.
+        <div class="header-area">
-</p>
+          <h1>Better Program</h1>
-				<div class="header-area">
+          <h2>Genetic Algorithms</h2>
-					<h1>Results</h1>
+        </div>
-				</div>
+        <p>So we created a tool that removes repeats, keeps GC richness below a certain percentage and removes hairpins. Unfortunately, someone already created one that did it. It wasn’t good, it is slow, it is very difficult to use. And worked in a format few people can use. So the plan for this project switched from creating the entire algorithm, to optimizing an algorithm and making it easy to use.</p>
-<img src="https://static.igem.org/mediawiki/2019/0/02/T--Calgary--SPEA2-Diagram.jpg" width="100%" style="margin-left: auto; margin-right: auto; display: block;"/>
-			</div>
+        <p>Unfortunately, just modifying the code didn't work, the way the old codon-harmony worked is to iteratively change every aspect of the sequence one at a time, meaning that all progress gained from removing repeats, may be destroyed from trying to reduce GC content, rendering the program completely useless for complicated sequences</p>
-		</div>
-	</div>
-</body>
+        <p>With that setback, a new way of looking at optimization was required. Initial fault was given to the individual functions, which weren't perfect, but were not to blame. To change the overall program, an idea was formed to just use a genetic algorithm. The advantage of a genetic algorithm is that it will operate in a very different computational way. Identifying all the repeats in a sequence is trivial compared to changing a sequence until there is none. <<read up on NP-Complete>> The genetic algorithm only operates on identification, not actually solving the problem. Solving the problem is left to evolution.<<Read up on how genetic algorithms work>> </p>
+        <p>A first genetic algorithm was created to do this. It did not work. It operated on the sum of all the fitness functions, which while valid for many applications, is incorrect for this application as we have no idea what weights to assign. Thus came along the third and final form of BOT, which is an application of the SPEA2 algorithm. Considered on of the best evolutionary algorithm, SPEA2 is used as a benchmark for all new genetic algorithms, and for the most part the best new ones are "comparable to" never really better. So if you are building a genetic algorithm, SPEA2 is your friend.
+        </p>
+        <div class="header-area">
+          <h1>Results</h1>
+        </div>
+        <img src="https://static.igem.org/mediawiki/2019/0/02/T--Calgary--SPEA2-Diagram.jpg" width="100%" style="margin-left: auto; margin-right: auto; display: block;"/>
+      </div>
+    </div>
+    </div>
+  </body>
 </html>
 {{Calgary/Footer}}

Difference between revisions of "Team:Calgary/BOT"
