Team:TUDelft/Osman

Sci-Phi 29

Overview - Codon Usage

Overview text here.

Codon Usage - Cross-species codon harmonization

To support our orthogonal system, we developed a novel codon adaptation tool that aims to equalize translation rates across different species. Similar translation rates should make heterologous protein expression levels more predictable across bacterial species.
A protein's structure depends on its DNA sequence, which is converted into a functional protein through two successive cellular processes: transcription and translation (Angov, Hillier, Kincaid, & Lyon). The 20 standard amino acids are encoded by 61 of the 64 possible codons; the remaining three are stop codons. Because there are more codons than amino acids, most amino acids are encoded by more than one codon, a phenomenon called synonymous codon usage (Nascimento et al.; Crick; Gun, Yumiao, Haixian, & Liang). Nascimento, Kelly et al. (2018) showed that cells exploit this freedom of codon choice, since codon usage directly affects both the mRNA level and the translation rate. They found that highly expressed proteins have more mRNA copies and contain more frequently used codons, which speeds up translation (Nascimento et al.).

Once gene editing techniques became available, scientists began expressing heterologous proteins in new host cells. Heterologous expression often yields protein levels that differ from those in the original microorganism. One reason for lower expression is the difference in codon usage between the original organism and the new host cell. To raise the expression level of the heterologous protein in the host, codon adaptation tools were developed. The tools currently available fall into two main groups, based on how their algorithms work:

Codon optimization tools: The basic idea is to achieve the highest possible translation rate by substituting each codon with the codon that is used most frequently for the corresponding amino acid, while avoiding hairpin formation (Hanson & Coller 2017). Relative codon frequencies are expressed as the weights that underlie the Codon Adaptation Index (CAI), as shown in the following equation. In this equation $w_i$ is the relative adaptiveness of codon $i$, $f_i$ is the frequency of that codon, and $\max(f_i)$ is the frequency of the codon used most often for the same amino acid. The CAI of a gene is the geometric mean of the $w_i$ values of its codons.

$$w_i = \frac{f_i}{\max(f_i)}$$

Codon harmonization tools: The basic idea is to reproduce the native translation-rate profile in the host organism by placing rare codons at specific positions, while avoiding hairpin formation as much as possible. The codon usage of the original microorganism serves as the reference point. This approach gives the protein time to pre-fold during translation, which reduces the chance of misfolding (Figure 1).

Figure 1: Schematic representation of protein translation. The green parts are encoded by high-frequency codons, which speed up translation. The red parts are encoded by rare codons, which slow translation down and create a short time window for protein pre-folding, reducing misfolding. Codon harmonization aims to recreate the codon usage pattern of the native host in order to increase the amount of functional protein.
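The difference between the two strategies can be illustrated with a minimal Python sketch. It is not the implementation of any existing tool: the codon counts in `NATIVE` and `HOST` are made-up numbers, the helper names are hypothetical, and "harmonization" is reduced here to picking the host codon whose relative adaptiveness $w_i$ is closest to that of the native codon. Hairpin avoidance is left out. The `cai` helper takes the geometric mean of the $w_i$ values, which is how the index is usually defined.

```python
from math import exp, log

# Hypothetical codon counts (illustrative numbers only, not real usage tables).
NATIVE = {"Leu": {"CTG": 50, "CTA": 5}, "Lys": {"AAA": 40, "AAG": 16}}
HOST = {"Leu": {"CTG": 8, "CTA": 6, "TTA": 40}, "Lys": {"AAA": 15, "AAG": 30}}


def relative_adaptiveness(table):
    """w_i = f_i / max(f_i), computed within each amino acid."""
    return {aa: {codon: n / max(counts.values()) for codon, n in counts.items()}
            for aa, counts in table.items()}


def split_codons(sequence):
    """Cut a coding sequence into consecutive triplets."""
    return [sequence[i:i + 3] for i in range(0, len(sequence), 3)]


def amino_acid_of(codon, table):
    """Look up which amino acid a codon encodes in the given table."""
    for aa, codons in table.items():
        if codon in codons:
            return aa
    raise KeyError(codon)


def cai(sequence, table):
    """Geometric mean of w_i over all codons of the sequence."""
    w = relative_adaptiveness(table)
    values = [w[amino_acid_of(c, table)][c] for c in split_codons(sequence)]
    return exp(sum(log(v) for v in values) / len(values))


def optimize(sequence, native, host):
    """Codon optimization: always take the host's most-used codon."""
    w_host = relative_adaptiveness(host)
    out = []
    for codon in split_codons(sequence):
        aa = amino_acid_of(codon, native)
        out.append(max(w_host[aa], key=w_host[aa].get))
    return "".join(out)


def harmonize(sequence, native, host):
    """Codon harmonization: match the native w_i as closely as possible."""
    w_native = relative_adaptiveness(native)
    w_host = relative_adaptiveness(host)
    out = []
    for codon in split_codons(sequence):
        aa = amino_acid_of(codon, native)
        target = w_native[aa][codon]
        out.append(min(w_host[aa], key=lambda c: abs(w_host[aa][c] - target)))
    return "".join(out)


gene = "CTGCTAAAG"  # Leu (common), Leu (rare), Lys (less common) in NATIVE
print(optimize(gene, NATIVE, HOST))
print(harmonize(gene, NATIVE, HOST))
```

With these toy tables the optimized sequence uses only the host's top codons, whereas the harmonized sequence keeps a rare leucine codon at the position where the native gene had one.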

Since our project is all about creating a universal toolbox, we extended it with the first cross-species codon harmonization tool. We developed this tool to make heterologous protein expression more predictable across multiple bacterial host species by taking into account the variability in translation between organisms. The tool is designed to yield the same expression level of functional protein in different host cells from a single coding sequence. The codon harmonization approach forms the core of our algorithm, which we extended with a statistical analysis. Furthermore, we made the tool's output compatible with the BioBrick RFC standard by removing the recognition sites of the type II restriction enzymes that the standard uses.

How does our universal harmonization tool work?
- From nucleotide input to nucleotide output
  
  First, the codon frequency is calculated in the same way as in Athey et al. (2017). We used this formula instead of the CAI formula because the CAI calculation requires a reference gene. The CAI is very useful when you want to change the expression of a gene within the same organism. Our system, however, works across species, so an adapted version of the CAI is used, as shown in the equation below.
  $$freq_{codon,i} = \frac{n_{codon,i}}{\sum n_{codon,i}}$$
  Here $n_{codon,i}$ is the number of times the codon occurs in organism $i$.
  
  Second, the variance is calculated separately for each codon position. In this step, the variance of the codon frequency across the organisms of interest is calculated for each synonymous codon that encodes the amino acid at that position. The frequency of the codon at that particular sequence position is also included in the calculation in order to limit the influence of outliers. The candidate codons for each position are then ordered from lowest to highest variance.
  
  In the first iteration for generating the final sequence, we use the lowest-variance codon at each position. The generated sequence is then screened for type II restriction enzyme recognition sequences. If a site is found, the codon at that position is replaced by the codon with the second-lowest variance. By repeating this cycle, we arrive at a single nucleotide sequence that is free of type II restriction sites and codon-harmonized for all organisms of interest, with the aim of achieving the same translation rate in each of them.
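The three steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not our actual tool. The codon counts in `COUNTS` are made up, the helper names are hypothetical, and the frequency is normalized over all codons counted in an organism. The forbidden sites are the EcoRI, XbaI, SpeI and PstI recognition sequences, which are palindromic, so only one strand is scanned. The outlier correction that uses the native codon frequency is omitted. When a site is found, the sketch moves the first overlapping codon that still has an alternative to its next-lowest-variance synonym.

```python
from statistics import pvariance

# Hypothetical codon counts per organism (illustrative numbers only).
COUNTS = {
    "host_A": {"GAA": 50, "GAG": 30, "TTC": 60, "TTT": 60},
    "host_B": {"GAA": 50, "GAG": 50, "TTC": 60, "TTT": 40},
}
SYNONYMS = {"Glu": ["GAA", "GAG"], "Phe": ["TTC", "TTT"]}
# Assumed set of sites to remove: EcoRI, XbaI, SpeI, PstI (all 6 nt long).
FORBIDDEN = ("GAATTC", "TCTAGA", "ACTAGT", "CTGCAG")


def frequencies(counts):
    """Step 1: codon count divided by the summed codon counts of one organism."""
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def ranked_synonyms(counts_by_organism, synonyms):
    """Step 2: order each amino acid's codons from lowest to highest variance."""
    freqs = [frequencies(c) for c in counts_by_organism.values()]

    def variance(codon):
        return pvariance([f[codon] for f in freqs])

    return {aa: sorted(codons, key=variance) for aa, codons in synonyms.items()}


def find_site(sequence, sites):
    """Return the start index of the first forbidden site, or None."""
    hits = [sequence.find(site) for site in sites if site in sequence]
    return min(hits) if hits else None


def harmonize(sequence, counts_by_organism, synonyms, sites=FORBIDDEN):
    """Step 3: build the sequence and iterate until no forbidden site remains."""
    ranking = ranked_synonyms(counts_by_organism, synonyms)
    codon_to_aa = {c: aa for aa, codons in synonyms.items() for c in codons}
    protein = [codon_to_aa[sequence[i:i + 3]] for i in range(0, len(sequence), 3)]
    ranks = [0] * len(protein)  # 0 = lowest-variance codon at every position
    while True:
        candidate = "".join(ranking[aa][r] for aa, r in zip(protein, ranks))
        start = find_site(candidate, sites)
        if start is None:
            return candidate
        # Codon positions overlapped by the 6-nt site starting at `start`.
        for pos in range(start // 3, (start + 5) // 3 + 1):
            if ranks[pos] + 1 < len(ranking[protein[pos]]):
                ranks[pos] += 1  # fall back to the next-lowest variance codon
                break
        else:
            raise ValueError("no synonymous codon left to remove the site")


print(harmonize("GAATTT", COUNTS, SYNONYMS))
```

In this toy example the lowest-variance choice is `GAATTC`, which is an EcoRI site, so the first codon falls back to `GAG` and the returned sequence is `GAGTTC`.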

Experimental validation

text here