Team:Calgary/BOT

SOFTWARE

BioBrick Optimization Tool

BOT

A software tool to help iGEM teams optimize hard-to-synthesize sequences for expression and synthesis.

Results

How good is BOT?

BOT uses the SPEA2 algorithm[1] to optimize sequences for both expression and ease of synthesis. For example, it has reduced IDT Gene Fragment synthesis scores from well over a hundred to 7, a task that is impractical to do by hand.
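To give a feel for how SPEA2 ranks candidates when there are two goals at once, here is a minimal sketch of its strength and raw-fitness step. It is not BOT's actual code: the function names and the two example objectives (an expression penalty and a synthesis score, both to be minimized) are assumptions made for illustration, and the density term of full SPEA2 is left out.

```python
from typing import List, Tuple

# Each candidate sequence is summarized by its objective values.
# Assumption for this sketch: every objective is minimized.
Objectives = Tuple[float, ...]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b on every objective and better on at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def spea2_raw_fitness(points: List[Objectives]) -> List[int]:
    """Raw fitness as defined in SPEA2: lower is better, 0 means non-dominated."""
    # Strength of a candidate: how many other candidates it dominates.
    strength = [sum(dominates(p, q) for q in points) for p in points]
    # Raw fitness: summed strength of every candidate that dominates this one.
    return [
        sum(strength[j] for j, q in enumerate(points) if dominates(q, p))
        for p in points
    ]


# Hypothetical (expression penalty, synthesis score) pairs for five candidates.
candidates = [(1, 5), (2, 2), (5, 1), (3, 3), (6, 6)]
raw = spea2_raw_fitness(candidates)
```

Candidates with a raw fitness of 0 are not beaten on both objectives by any other candidate, so they form the trade-off front that the next generation is built from.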

Background

Why do we want to improve codon optimization?

This year, one of the team's enzymes was nearly impossible to optimize for both expression and synthesis. Tools that optimize for expression are plentiful, but tools that optimize for synthesis are hard to find. Two weeks were lost to painstaking, iterative manual editing, and out of that frustration the idea for a new synthetic biology tool was born.

Codon optimization is a standard problem in synthetic biology. To produce a protein in a given host, we need to think about restriction sites and about how the protein might fold in the host, but we also want a high level of production. That high production is achieved through codon optimization. A codon-optimized gene is not always practical, though: sometimes the resulting sequence cannot be synthesized. Repeats, high GC content, hairpins and other features all make a sequence harder to synthesize, and removing them by hand is tedious work.
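Two of these features are easy to measure directly. The sketch below is only an illustration with hypothetical function names and an arbitrary repeat length, not the scoring used by BOT or by any synthesis provider; hairpin detection is more involved and is not covered here.

```python
from collections import defaultdict
from typing import Dict, List


def gc_fraction(seq: str) -> float:
    """Fraction of bases in the sequence that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def repeated_kmers(seq: str, k: int = 8) -> Dict[str, List[int]]:
    """Map every k-mer that occurs more than once to its start positions."""
    seq = seq.upper()
    positions: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    # Keep only the k-mers seen at two or more positions (direct repeats).
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


example = "ATGCATGC"
gc = gc_fraction(example)
repeats = repeated_kmers(example, k=4)
```

Fixing a flagged repeat by hand means swapping synonymous codons and then re-checking the whole sequence, which is why the process becomes so iterative.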

So we set out to create a tool that removes repeats, keeps GC content below a set percentage, and removes hairpins. Unfortunately, someone had already built one. That tool, however, was slow, very difficult to use, and worked with a format that few people can use. The plan for this project therefore shifted from writing the entire algorithm ourselves to optimizing an existing algorithm and making it easy to use.

How

Steps in the approach

So we divided the problem into two parts.

  1. Change the code so anyone can use it.
  2. Modify the algorithm to make it faster and/or more performant and/or add features.

Easier to Use

Using Django

For the first part, we had to make the existing code usable, and we decided the best way to do this was to build a website. Because codon-harmony by Brian Weitzner is written in Python, we chose Django, a Python web framework that lets us serve HTML pages backed by Python code. The first step was to change how the code accepts its arguments. In its original state, the code could only take command-line arguments, which is very user-unfriendly. We decided to let callers provide the arguments as a dictionary, backed by Python dataclasses, so that scripts can easily be set up to run codon-harmony. With this in place, the code can be called directly from the website, and the inputs can be checked beforehand. One major flaw of the original program is that it does not recognize whether an input file is in FASTA format. We changed it so that it warns the user when the input is not FASTA and then asks whether they want it converted to FASTA.
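The two ideas above can be sketched roughly as follows. This is an assumption-laden illustration, not the actual BOT or codon-harmony code: the class name, its fields and defaults, and the helper functions are all hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass
class OptimizationOptions:
    # Hypothetical option names; the real tool's arguments may differ.
    input_path: str
    host: str = "e_coli"
    max_gc: float = 0.65
    cycles: int = 10

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "OptimizationOptions":
        """Build options from a plain dictionary, rejecting unknown keys."""
        known = {f.name for f in fields(cls)}
        unknown = set(raw) - known
        if unknown:
            raise ValueError(f"Unknown option(s): {sorted(unknown)}")
        return cls(**raw)


def looks_like_fasta(text: str) -> bool:
    """Loose check: a '>' header first, then at least one sequence line."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        return False
    return any(not ln.startswith(">") for ln in lines[1:])


def to_fasta(raw_sequence: str, name: str = "sequence", width: int = 60) -> str:
    """Wrap a bare sequence in a single-record FASTA string."""
    seq = "".join(raw_sequence.split())  # drop spaces and line breaks
    body = "\n".join(seq[i:i + width] for i in range(0, len(seq), width))
    return f">{name}\n{body}\n"


options = OptimizationOptions.from_dict({"input_path": "gene.fasta", "max_gc": 0.6})
pasted = "ATGGCT AAAGGT"
fasta_text = pasted if looks_like_fasta(pasted) else to_fasta(pasted, name="my_gene")
```

In a web form the conversion would only run after the user confirms it; here it is applied directly to keep the sketch short.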

The second part is to optimize the program. Time versus space is a classic trade-off in bioinformatics. If we go for pure speed, we are likely to use too much memory, especially for large sequences. If we instead minimize memory use, the program becomes very slow. There are many techniques to optimize for one, the other, or both, and we will do our best to apply them.
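As a small illustration of a case where both can improve at once, consider computing GC content over a sliding window. This sketch is not taken from BOT; the function names are hypothetical and the sequence is assumed to be uppercase. The naive version recounts every window and stores all results, while the rolling version updates a single counter and yields values one at a time.

```python
from typing import Iterator, List


def gc_windows_naive(seq: str, window: int) -> List[float]:
    """Recount G/C for every window: roughly len(seq) * window steps."""
    if window <= 0 or len(seq) < window:
        return []
    return [
        sum(base in "GC" for base in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def gc_windows_rolling(seq: str, window: int) -> Iterator[float]:
    """Update one running count per step and yield results lazily."""
    if window <= 0 or len(seq) < window:
        return
    count = sum(base in "GC" for base in seq[:window])
    yield count / window
    for i in range(window, len(seq)):
        # Add the base entering the window, subtract the one leaving it.
        count += (seq[i] in "GC") - (seq[i - window] in "GC")
        yield count / window


sequence = "ATGCGCAT"
naive = gc_windows_naive(sequence, 4)
rolling = list(gc_windows_rolling(sequence, 4))
```

Both functions return the same values; the rolling one simply does less work per position and never needs to hold the full result list unless the caller asks for it.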