Team:TUDelft/Osman

Sci-Phi 29

Overview - Codon Usage

Overview text here.

Codon Usage - Cross-species codon harmonization

To support our orthogonal system, we developed a novel codon adaptation tool that aims to equalize translation rates across different species. Similar translation rates should make heterologous protein expression levels more predictable across bacterial species.
A protein's structure depends on its DNA sequence, which is converted into a functional protein through two successive cellular processes: transcription and translation (Angov, Hillier, Kincaid, & Lyon). The 20 standard amino acids are encoded by 61 of the 64 possible codons; the remaining three are stop codons. Because there are more codons than amino acids, most amino acids are encoded by more than one codon, a phenomenon called synonymous codon usage (Nascimento et al., 2018; Crick; Gun, Yumiao, Haixian, & Liang). Nascimento, Kelly et al. (2018) showed that cells make extensive use of this freedom in codon choice, since codon usage directly affects both the mRNA level and the translation rate. They reported that highly expressed proteins have more mRNA copies and contain more frequently used codons, which speeds up translation (Nascimento et al., 2018).
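The degeneracy of the genetic code can be checked directly. The following is a minimal sketch that builds the standard genetic code (NCBI translation table 1) and counts the synonymous codons per amino acid; the names `CODON_TABLE`, `SENSE_CODONS`, and `DEGENERACY` are chosen here for illustration only.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Codons that encode an amino acid ("*" marks a stop codon).
SENSE_CODONS = [c for c, aa in CODON_TABLE.items() if aa != "*"]

# Number of synonymous codons available for each amino acid.
DEGENERACY = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

print(len(CODON_TABLE), len(SENSE_CODONS), len(DEGENERACY))
print({aa: DEGENERACY[aa] for aa in "LSRMW"})
```

Leucine, serine, and arginine each have six synonymous codons, whereas methionine and tryptophan have only one, so codon choice is available at most but not all positions of a coding sequence.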

After the development of gene editing techniques, scientists started to express heterologous proteins in new host cells. Heterologously expressed proteins often reach different levels than in the original microorganism. One of the reasons for a lower expression level is the difference in codon usage between the original organism and the new host cell. In order to increase the expression level of the heterologous protein in the host cell, codon adaptation tools were developed. The tools available now can be divided into two main groups based on how their algorithms work:

Codon optimization tools: The basic idea is to achieve the highest possible translation rate, while avoiding hairpin formation, by substituting each codon with the codon that is used most frequently for the corresponding amino acid (Hanson & Coller 2017). The relative codon frequencies that underlie the Codon Adaptation Index (CAI) are calculated as shown in the following equation. In this equation $w_i$ is the relative adaptiveness of codon $i$, $f_i$ is the frequency of that codon, and $\max(f_i)$ is the frequency of the codon that is used most frequently for the same amino acid. The CAI of a gene is the geometric mean of the $w_i$ values of its codons.

$$w_i = \frac{f_i}{\max(f_i)}$$
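As an illustration of this equation, the sketch below derives $w_i$ from codon counts in a set of reference sequences and then scores a gene by the geometric mean of its $w_i$ values. The function names and the toy reference sequence are assumptions made for this example. Codons with a weight of zero are skipped when scoring, which is a simplification rather than a prescribed rule.

```python
from collections import Counter
from itertools import product
from math import exp, log

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def split_codons(seq):
    """Split a coding sequence into codons, ignoring a trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_seqs):
    """Return w_i = f_i / max(f_i) within each synonymous codon family."""
    counts = Counter(c for s in reference_seqs for c in split_codons(s))
    weights = {}
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # stop codons are not scored
        family_max = max(counts[c] for c, a in CODON_TABLE.items() if a == aa)
        if family_max > 0:
            weights[codon] = counts[codon] / family_max
    return weights


def cai(seq, weights):
    """Geometric mean of the weights of the scorable codons in seq."""
    ws = [weights[c] for c in split_codons(seq) if weights.get(c, 0) > 0]
    if not ws:
        raise ValueError("no scorable codons in sequence")
    return exp(sum(log(w) for w in ws) / len(ws))


# Toy reference: CTG is used three times, CTA once (both encode leucine).
w = relative_adaptiveness(["CTGCTGCTGCTA"])
print(w["CTG"], w["CTA"])
print(cai("CTGCTG", w), cai("CTGCTA", w))
```

A gene built only from the most frequent codon of each family scores 1, and every rarer codon pulls the score down, which is why maximizing this index pushes a sequence toward uniformly fast translation.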

Codon harmonization tools: The basic idea is to reproduce, in the new host, the translation rate profile of the original organism by placing rare codons at specific positions while avoiding hairpin formation as much as possible. The codon usage of the original microorganism serves as the reference point. This approach allows the protein to pre-fold during translation, which reduces the chance of misfolding (Figure 1).

Figure 1: Translation rate. Schematic representation of protein translation. The green parts are encoded with high frequency codons in order to speed up the translation rate. The red parts are encoded by rare codons in order to slow down the translation rate, which limits misfolding of proteins by creating a small time window for protein pre-folding. Codon harmonization aims to create the same codon usage pattern as the native host in order to increase the amount of functional protein.
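The harmonization idea can be sketched in a few lines. The example below is one simple way to express it, not the algorithm of any particular published tool: for each codon of the native gene it looks up how common that codon is within its synonymous family in the native organism, then picks the host codon whose relative usage is closest. The usage tables and the function names are hypothetical, and hairpin avoidance is left out.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Synonymous codon families, keyed by one-letter amino acid code.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def relative_usage(usage, codon):
    """Usage of codon relative to the most used codon of its family."""
    family = SYNONYMS[CODON_TABLE[codon]]
    family_max = max(usage.get(c, 0.0) for c in family)
    return usage.get(codon, 0.0) / family_max if family_max else 0.0


def harmonize(seq, native_usage, host_usage):
    """Replace each codon by the host codon with the closest relative usage."""
    seq = seq.upper()
    out = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        target = relative_usage(native_usage, codon)
        family = SYNONYMS[CODON_TABLE[codon]]
        out.append(min(
            family,
            key=lambda c: abs(relative_usage(host_usage, c) - target),
        ))
    return "".join(out)


# Hypothetical usage counts for two leucine codons in each organism.
native = {"CTG": 50, "CTA": 5}   # CTG common, CTA rare in the native organism
host = {"TTA": 40, "CTG": 4}     # TTA common, CTG rare in the host
print(harmonize("CTGCTA", native, host))
```

In this toy case the common native codon is mapped to the common host codon and the rare native codon to a rare host codon, so the fast and slow stretches of Figure 1 are preserved while the amino acid sequence stays the same.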

Since our project aims to create a universal toolkit, we created the first cross-species codon harmonization tool. We developed this tool to increase the predictability of heterologous protein expression in multiple bacterial host species by taking into account the variability in translation between organisms. The tool provides the user with a single DNA coding sequence that is designed to yield the same protein expression level in different bacterial host cells. The codon harmonization approach explained above forms the core of our algorithm, which we extended with statistical analysis. Furthermore, we made the output of our tool BioBrick RFC compatible by removing standard type II restriction sites.

How does our universal harmonization tool work?
- From nucleotide input to nucleotide output
  
  First, the codon frequency is calculated in the same way as in Athey et al. (2017). We used this formula instead of the formula for the CAI, since the CAI calculation requires a reference gene. The CAI is very useful if you want to change the expression of a gene within the same organism. However, our system functions across species, so we use an adapted measure that does not depend on a reference gene, as shown in the equation below.
  $$\mathrm{freq}_{\mathrm{codon},i} = \frac{\mathrm{codon}}{\sum \mathrm{codon},i}$$
  
  Secondly, for each codon position, the variance in frequency across the organisms of interest is calculated separately for every synonymous codon. The codon frequency at that particular sequence position is also taken into account during this calculation in order to remove outliers as much as possible. At each position, the synonymous codons are then ordered from lowest to highest variance.
  
  In the first iteration of generating the final sequence, we use the lowest-variance codon at every position. The generated sequence is then screened for type II restriction enzyme recognition sequences. If a site is found, the codon at that position is substituted with the synonymous codon that has the second-lowest variance. By repeating this cycle, we derive a single nucleotide sequence that is free of type II restriction sites and codon harmonized for all organisms of interest, with the aim of achieving the same translation rate in each of them.
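The steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the tool's actual implementation: the variance is taken as the population variance of a codon's frequency across the supplied usage tables, ties are broken in favour of the more frequently used codon, the position-specific outlier handling is omitted, and the list of screened restriction sites is an assumed example. All function and variable names are hypothetical.

```python
from itertools import product
from statistics import mean, pvariance

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumed screening list for illustration; the text does not name the sites.
SITES = ("GAATTC", "TCTAGA", "ACTAGT", "CTGCAG", "GCGGCCGC")


def ranked_synonyms(aa, usage_tables):
    """Synonymous codons for aa, lowest cross-species variance first."""
    family = [c for c, a in CODON_TABLE.items() if a == aa]

    def key(codon):
        # Each table must contain every codon; a missing one raises KeyError.
        freqs = [table[codon] for table in usage_tables]
        return (pvariance(freqs), -mean(freqs))

    return sorted(family, key=key)


def first_site(seq):
    """Return (start, length) of the earliest restriction site, or None."""
    hits = [(seq.find(site), len(site)) for site in SITES if site in seq]
    return min(hits) if hits else None


def cross_species_harmonize(seq, usage_tables):
    """Nucleotide sequence in, harmonized nucleotide sequence out."""
    seq = seq.upper()
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    ranks = [ranked_synonyms(CODON_TABLE[c], usage_tables) for c in codons]
    choice = [0] * len(codons)  # index into the ranking at each position

    while True:
        candidate = "".join(r[i] for r, i in zip(ranks, choice))
        hit = first_site(candidate)
        if hit is None:
            return candidate
        start, length = hit
        # Move one codon that overlaps the site to its next-ranked synonym.
        for k in range(start // 3, (start + length - 1) // 3 + 1):
            if choice[k] + 1 < len(ranks[k]):
                choice[k] += 1
                break
        else:
            raise ValueError("could not remove restriction site at %d" % start)
```

Each pass either returns a clean sequence or advances one position to its next-ranked codon, so the loop always terminates. A codon that is equally rare in every organism would also have a low variance, which is why this sketch breaks ties by mean frequency; the position-specific frequency check described above would serve a similar purpose.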

Experimental validation

text here