Difference between revisions of "Team:DTU-Denmark/Software"

Revision as of 03:06, 22 October 2019

Software

Coming soon!

Introduction

To generate our synthetic promoters, we built promoter sequence models based on hidden Markov models (HMMs) (see the model page). To handle the many calculations and large volumes of data needed to build these models, we built a user-friendly program that does all the work. We call it proHMMoter! Besides making it significantly easier, not to mention faster, to work with the model, it also allowed us to implement all the different features we wanted our promoters to have in a way that is easy for future users to extend and modify. proHMMoter is written as a command-line program that ties a small suite of scripts together in a single convenient command. Thanks to the command-line interface and the consistent naming and location of output files, users can easily embed our software as a tool in a bigger pipeline.

Features

Domestication: proHMMoter outputs promoter sequences that are compatible with iGEM’s Type IIS standards, and incorporates a penalty system that can automatically domesticate for other relevant synthetic biology standards when this can be done without major disruption.
Shortening sequences: proHMMoter can identify evolutionarily non-conserved regions. Accordingly, the program can remove non-essential regions upstream of the actual promoter region, creating shorter promoter sequences. This can be essential when working with higher eukaryotes, as these gene constructs can get very large, and thereby more difficult and expensive to work with.
Non-homology: proHMMoter can generate promoter sequences that in theory should have the same function as the original promoter, but with a drastically different sequence. This can be very useful, since it reduces the chance of unwanted homologous recombination.
Synthesizability: The promoters generated by proHMMoter automatically comply with the complexity rules of major DNA synthesis vendors, making it easy to get your freshly generated promoters synthesised.
Host versatility: proHMMoter can generate consensus sequences representing the promoters in diverse sets of genomes. As long as these genomes are sufficiently related, the promoters should be usable in any of the organisms, and further in silico analysis can help verify this. As a proof of concept, we generated our synthetic Aspergillus niger promoters based on all the genomes within the Aspergillus genus.
Induction of noise: proHMMoter has built-in functionality that allows the user to inject varying amounts of noise into the promoter generation process. This gives the user the opportunity to create artificially weakened promoters, and thereby allows for the creation of promoters with different strengths but broadly similar expression dynamics. For our project, we injected noise into several of our promoters and showed that expression was still achieved, a step towards our aim of building a comprehensive promoter ladder.
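The domestication and synthesizability features above amount to checks that a candidate sequence must pass. The following is a minimal sketch of such checks, not proHMMoter's actual code. It assumes that the Type IIS enzymes of interest are BsaI and SapI, and the function names and thresholds (`max_homopolymer`, `gc_range`) are hypothetical placeholders rather than any vendor's real rules.

```python
from itertools import groupby

# Assumed recognition sites for the Type IIS standard (BsaI and SapI).
TYPE_IIS_SITES = {"BsaI": "GGTCTC", "SapI": "GCTCTTC"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def longest_homopolymer(seq):
    """Length of the longest run of a single repeated base."""
    return max((len(list(run)) for _, run in groupby(seq)), default=0)


def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def violations(seq, max_homopolymer=8, gc_range=(0.25, 0.65)):
    """List the rules a sequence breaks. Thresholds are illustrative only."""
    seq = seq.upper()
    found = []
    for enzyme, site in TYPE_IIS_SITES.items():
        # A site on either strand makes the part incompatible.
        if site in seq or reverse_complement(site) in seq:
            found.append(f"{enzyme} site")
    if longest_homopolymer(seq) > max_homopolymer:
        found.append("homopolymer")
    low, high = gc_range
    if not low <= gc_fraction(seq) <= high:
        found.append("GC content")
    return found
```

An empty list from `violations` means the sequence passes every check in this sketch; a real pipeline would take its limits from the chosen vendor's published rules.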

How it works

The program first finds orthologs of the genes you have chosen, using the HMMER tool phmmer to search by amino acid sequence and approximating the ortholog in each genome as the highest-scoring homolog. Once the ortholog sequences are found, the program locates them in their respective genomes and extracts the putative promoters upstream of these genes. These promoter regions are then aligned with the multiple sequence alignment tool MUSCLE[2]. From the multiple sequence alignment we build the actual HMM using the HMMER tool hmmbuild. To further refine this model, we perform an expectation maximization step: we align all native promoter sequences to the model using hmmalign, rebuild the HMM from this alignment while trimming unaligned ends, and iterate until the consensus sequence converges. To ensure minimal promoter lengths, we extract the position-weight matrix (PWM) underlying the HMM and use it to generate the synthetic promoters. We implemented the linear program described on our model page in clear, concise code that is easy to extend. It applies constraints such as limits on homopolymers and GC content to reduce synthesis complexity to levels considered acceptable to major vendors, and domesticates with respect to the features considered in our design, complying with strictly required synthetic biology standards and making a best effort to follow merely desirable ones. The latter is accomplished via a penalty system, which ensures that the amount of sequence conservation sacrificed when domesticating (measured in bits, as sequence conservation is essentially a measure of information) stays below preset levels. We then solve this linear program, finding the sequence which, with respect to sequence conservation, comes as close to the consensus sequence as the constraints allow.
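To make "sequence conservation measured in bits" concrete, here is a small sketch of how conservation can be read off a PWM. It is an illustration, not the team's implementation. It assumes a uniform background over the four bases and a PWM stored as a list of per-position probability columns in A, C, G, T order. The `conservation_cost` function is a hypothetical stand-in for the penalty described above: it measures how many bits of log-probability a sequence gives up relative to the consensus.

```python
import math

BASES = "ACGT"


def column_information(column):
    """Information content of one PWM column in bits (uniform background)."""
    entropy = -sum(p * math.log2(p) for p in column if p > 0)
    return math.log2(len(BASES)) - entropy


def consensus(pwm):
    """Most probable base at each position (ties go to the first base)."""
    return "".join(BASES[max(range(4), key=lambda i: col[i])] for col in pwm)


def conservation_cost(pwm, seq):
    """Bits of log-probability lost by seq relative to the consensus."""
    cost = 0.0
    for col, base in zip(pwm, seq):
        p = col[BASES.index(base)]
        if p == 0:
            return math.inf  # base never observed at this position
        cost += math.log2(max(col)) - math.log2(p)
    return cost
```

Under these assumptions a fully conserved position carries 2 bits and a completely random one carries 0, so a domestication change costs little where conservation is low and a lot where it is high.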
We also adjust the distribution implied by the PWM as described on our model page and sample from it, using the linear program to reject non-solutions, thus fairly sampling promoter sequences that meet our defined standards. In our project, we have shown that our promoters work even when a substantial amount of noise has been injected, and that this resulted, as expected, in a relatively weak promoter.
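The sampling step can be sketched as rejection sampling. The adjustment of the distribution is described on the model page and is not reproduced here; as a labeled assumption, this sketch blends each PWM column with the uniform distribution, controlled by a hypothetical `noise` parameter between 0 and 1. The `is_valid` callback stands in for the constraint check that the linear program performs.

```python
import random

BASES = "ACGT"


def add_noise(pwm, noise):
    """Blend each column with the uniform distribution (assumed noise model)."""
    return [[(1 - noise) * p + noise / 4 for p in col] for col in pwm]


def sample_sequence(pwm, rng):
    """Draw one base per position according to the column probabilities."""
    return "".join(rng.choices(BASES, weights=col)[0] for col in pwm)


def sample_valid(pwm, is_valid, noise=0.0, rng=None, max_tries=10_000):
    """Sample until a sequence passes is_valid, or give up after max_tries."""
    rng = rng or random.Random()
    noisy = add_noise(pwm, noise)
    for _ in range(max_tries):
        seq = sample_sequence(noisy, rng)
        if is_valid(seq):
            return seq
    raise RuntimeError("no valid sequence found within max_tries")
```

Because rejected draws are simply discarded, accepted sequences follow the noisy distribution restricted to the valid set, which is what sampling fairly means here.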

More text soon

Soon.

Sources here will also come soon

The logos of our three biggest supporters, DTU Blue Dot, Novo Nordisk fonden and Otto Mønsted fonden

The logos of all of our sponsors, DTU, BioNordica, Eurofins Genomics, Qiagen, NEB New England biolabs, IDT Integrated DNA technologies and Twist bioscience


@@ Line 78: / Line 78: @@
 <div class="sm-no-float col-md-8">
-<h2>Comming soon</h2>
+<h2>Introduction</h2>
-<p>Text will also be comming soon</a>.
+<p>To generate our synthetic promoters, we built promoter sequence models based on hidden Markov models (HMMs) (see <a target="_blank" href="https://2019.igem.org/Team:DTU-Denmark/Model">the model page</a>). To handle the many calculations and large volumes of data needed to build these models, we built a user-friendly program that does all the work. We call it <code>proHMMoter</code>! Besides making it significantly easier, not to mention faster, to work with the model, it also allowed us to implement all the different features we wanted our promoters to have in a way that is easy for future users to extend and modify.
+<code>proHMMoter</code> is written as a command-line program that ties a small suite of scripts together in a single convenient command. Thanks to the command-line interface and the consistent naming and location of output files, users can easily embed our software as a tool in a bigger pipeline.
 </p>
@@ Line 100: / Line 101: @@
-<h2>More text soon</h2>
+<h2>Features</h2>
-<p>Soon.
+<p>
+<dl>
+<dt>Domestication</dt>
+<dd><code>proHMMoter</code> outputs promoter sequences that are compatible with iGEM’s Type IIS standards, and incorporates a penalty system that can automatically domesticate for other relevant synthetic biology standards when this can be done without major disruption.</dd>
+<dt>Shortening sequences</dt>
+<dd><code>proHMMoter</code> can identify evolutionarily non-conserved regions. Accordingly, the program can remove non-essential regions upstream of the actual promoter region, creating shorter promoter sequences. This can be essential when working with higher eukaryotes, as these gene constructs can get very large, and thereby more difficult and expensive to work with.</dd>
+<dt>Non-homology</dt>
+<dd><code>proHMMoter</code> can generate promoter sequences that in theory should have the same function as the original promoter, but with a drastically different sequence. This can be very useful, since it reduces the chance of unwanted homologous recombination. </dd>
+<dt>Synthesizability</dt>
+<dd>The promoters generated by <code>proHMMoter</code> automatically comply with the complexity rules of major DNA synthesis vendors, making it easy to get your freshly generated promoters synthesised.</dd>
+<dt>Host versatility</dt>
+<dd><code>proHMMoter</code> can generate consensus sequences representing the promoters in diverse sets of genomes. As long as these genomes are sufficiently related, the promoters should be usable in any of the organisms, and further <i>in silico</i> analysis can help verify this. As a proof of concept, we generated our synthetic <i>Aspergillus niger</i> promoters based on all the genomes within the  <i>Aspergillus</i> genus.</dd>
+<dt>Induction of noise</dt>
+<dd><code>proHMMoter</code> has built-in functionality that allows the user to inject varying amounts of noise into the promoter generation process. This gives the user the opportunity to create artificially weakened promoters, and thereby allows for the creation of promoters with different strengths but broadly similar expression dynamics. For our project, we injected noise into several of our promoters and showed that expression was still achieved, a step towards our aim of building a comprehensive promoter ladder.</dd>
+</dl>
 </p>
@@ Line 122: / Line 136: @@
 			<div class=" sm-no-float col-md-4 bbmobile col-sm-12">
-<img src="https://static.igem.org/mediawiki/2019/d/d8/T--DTU-Denmark--commingsoon.png" class="safetyfirstimg"/>
+<figure>
+<img style="padding:28px; width:100%" src="https://static.igem.org/mediawiki/2019/c/ce/T--DTU-Denmark--MikkelMarcusGraf.png" alt="A flow chart showing arrows, symbolizing actions, pointing to nodes, symbolizing the files these actions generate, starting from annotated genomes and choices for candidate genes, and ending at both consensus promoters and promoters with artificial noise." class="safetyfirstimg"/>
+<figcaption>Fig. 1: Flowchart showing how <code>proHMMoter</code> works.</figcaption>
+</figure>
 </div>
@@ Line 129: / Line 148: @@
 <div class="sm-no-float col-md-8">
-<h2>Comming soon</h2>
+<h2>How it works</h2>
-<p>Text will also be comming soon</a>.
+<p>The program first finds orthologs of the genes you have chosen, using the HMMER tool <code>phmmer</code> to search by amino acid sequence and approximating the ortholog in each genome as the highest-scoring homolog.
+Once the ortholog sequences are found, the program locates them in their respective genomes and extracts the putative promoters upstream of these genes. These promoter regions are then aligned with the multiple sequence alignment tool MUSCLE[2]. From the multiple sequence alignment we build the actual HMM using the HMMER tool <code>hmmbuild</code>. To further refine this model, we perform an expectation maximization step: we align all native promoter sequences to the model using <code>hmmalign</code>, rebuild the HMM from this alignment while trimming unaligned ends, and iterate until the consensus sequence converges. To ensure minimal promoter lengths, we extract the position-weight matrix (PWM) underlying the HMM and use it to generate the synthetic promoters. We implemented the linear program described on <a target="_blank" href="https://2019.igem.org/Team:DTU-Denmark/Model">our model page</a>
+in clear, concise code that is easy to extend. It applies constraints such as limits on homopolymers and GC content to reduce synthesis complexity to levels considered acceptable to major vendors, and domesticates with respect to the features considered in
+<a target="_blank" href="https://2019.igem.org/Team:DTU-Denmark/Design_Promoter">our design</a>, complying with strictly required synthetic biology standards and making a best effort to follow merely desirable ones. The latter is accomplished via a penalty system, which ensures that the amount of sequence conservation sacrificed when domesticating (measured in bits, as sequence conservation is essentially a measure of information) stays below preset levels.
+We then solve this linear program, finding the sequence which, with respect to sequence conservation, comes as close to the consensus sequence as the constraints allow.
+We also adjust the distribution implied by the PWM as described on <a href="https://2019.igem.org/Team:DTU-Denmark/Model" target="_blank">our model page</a> and sample from it, using the linear program to reject non-solutions, thus fairly sampling promoter sequences that meet our defined standards.
+In our project, we have shown that our promoters work even when a substantial amount of noise has been injected, and that this resulted, as expected, in a relatively weak promoter.
 </p>