Revision as of 03:37, 22 October 2019

Computational analyses and modeling were essential parts of our project, as they allowed us to address two fundamental questions:
(1) How can we produce jacaric acid in the absence of any information on the enzyme catalysing its synthesis?
(2) Which Yarrowia lipolytica strain is the best adapted, most efficient and most robust for producing Conjugated Linolenic Acids (CLnAs) such as punicic and jacaric acids?

To answer these questions, we (1) undertook a substantial experimental and computational effort to identify the sequence of the FadX gene directly from Jacaranda seeds, and (2) carried out Flux Balance Analyses to identify the Yarrowia lipolytica strains best suited to producing the rare fatty acids of interest. We describe this work in detail below.

Uncovering the Jacaranda mimosifolia FadX gene

Within the scope of the iGEM project, we needed to identify and characterize the enzyme that catalyzes the synthesis of jacaric acid, a Conjugated Linolenic Acid (CLnA) with anti-tumoral, anti-inflammatory and anti-obesity properties. This compound is naturally found at a high concentration in the seeds of the plant Jacaranda mimosifolia (hereafter called Jacaranda); however, the genome of this species has not been sequenced. To identify the enzyme that catalyzes this reaction in Jacaranda, we decided to sequence the RNA (RNA sequencing) of both fresh and germinated seeds.

RNA sequencing of Jacaranda mimosifolia and data pre-processing

We extracted RNA from the fresh and germinated seeds of Jacaranda using the NucleoSpin® RNA Plant and Fungi kit (Macherey Nagel). After a quick check on an ethidium bromide stained agarose gel followed by a Qubit RNA assay (Thermo Fisher Scientific), samples were prepared for sequencing using the TruSeq Stranded mRNA kit (Illumina). The libraries were sequenced on a HiSeq 4000 system (Illumina), and the resulting data were preprocessed and cleaned by the sequencing platform (Genoscope).

Figure 1. Ethidium bromide stained agarose gel showing the bands from fresh and germinated Jacaranda seeds RNA extracts.

Illumina Filter

Raw reads were filtered to remove clusters with too many bases of ambiguous signal. Signal purity was assessed over the first 25 cycles of each cluster by assigning each cycle a score: Chastity = (Maximum Intensity) / (Sum of the two highest intensities). The filter applied during base calling allows at most one cycle with a chastity value below 0.6.
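As a rough illustration of this filter, the Python sketch below computes the chastity of each cycle and applies the rule described above. The function names, the layout of four channel intensities per cycle and the toy values are assumptions made for the example; this is not Illumina's actual implementation.

```python
def chastity(intensities):
    """Chastity of one cycle: brightest intensity over the sum of the two brightest."""
    top, second = sorted(intensities, reverse=True)[:2]
    total = top + second
    return top / total if total > 0 else 0.0


def passes_chastity_filter(cycles, threshold=0.6, n_cycles=25, max_failures=1):
    """True if at most `max_failures` of the first `n_cycles` cycles fall below `threshold`."""
    failures = sum(1 for cycle in cycles[:n_cycles] if chastity(cycle) < threshold)
    return failures <= max_failures


# Toy clusters: four channel intensities per cycle (values are made up).
clean_cycle = (900.0, 50.0, 30.0, 20.0)   # chastity = 900 / 950
mixed_cycle = (500.0, 450.0, 30.0, 20.0)  # chastity = 500 / 950, below 0.6
good_cluster = [clean_cycle] * 24 + [mixed_cycle]      # one impure cycle: kept
bad_cluster = [clean_cycle] * 23 + [mixed_cycle] * 2   # two impure cycles: removed
```

With these toy values, `passes_chastity_filter(good_cluster)` is true and `passes_chastity_filter(bad_cluster)` is false.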

Genoscope Filter

Adapter and primer sequences were removed from the reads, as were low-quality bases (Q < 20) at both ends. Reads shorter than 30 nucleotides were discarded. Finally, PhiX reads from the Illumina internal spike-in were discarded.
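The quality and length criteria can be sketched in Python as follows. This is only one simple reading of the rule (trim bases with Q < 20 from each end, then drop reads shorter than 30 nucleotides). Adapter, primer and PhiX removal are not modelled, and the function names are hypothetical rather than part of the Genoscope pipeline.

```python
def trim_ends(seq, quals, min_q=20):
    """Trim bases with quality below `min_q` from both ends of a read."""
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


def clean_reads(reads, min_q=20, min_len=30):
    """Yield end-trimmed reads, dropping any read shorter than `min_len` after trimming."""
    for seq, quals in reads:
        trimmed_seq, trimmed_quals = trim_ends(seq, quals, min_q)
        if len(trimmed_seq) >= min_len:
            yield trimmed_seq, trimmed_quals


# Toy reads with made-up Phred scores.
reads = [
    ("A" * 40, [10] * 3 + [35] * 34 + [5] * 3),  # 34 nt left after trimming: kept
    ("C" * 32, [10] * 2 + [30] * 29 + [10]),     # 29 nt left after trimming: discarded
]
kept = list(clean_reads(reads))
```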

Filtering of ribosomal sequences

The cleaned reads corresponding to ribosomal RNA were separated from other reads.

Methods

To analyze the sequencing data, we followed two independent approaches: alignment to a reference genome and de novo transcriptome assembly.

Alignment using a reference genome

In this first approach, we used the genome of Handroanthus impetiginosus [1], a plant of the Bignoniaceae family, as the reference genome for aligning the reads. We chose it following discussions with Dr. Florian Jabbour, Senior Lecturer and Collection Manager in the field of Morpho-Anatomy and Plant Development at the Institute of Systematics, Evolution, Biodiversity of the National Museum of Natural History of Paris (for details, see our Human Practices page on this wiki). We used the Galaxy platform [2] to analyze the fastq files provided by the Illumina platform. We uploaded four fastq files containing the paired-end reads for the germinated and fresh seeds (two for each seed type), as well as the reference genome [1]. First, we used FastQ Groomer to ensure that our fastq files used the standard quality format. Next, we used FastQC to perform quality control on our files. Before mapping the reads against the reference genome, we inspected the raw-data statistics for each paired-end file, for both the fresh seeds (referred to as COS) and the germinated ones (BOS).

We then mapped our fastq files (alignment of the reads to the reference genome) with the BWA tool, obtaining BAM files for both seed samples. We sorted these files by coordinate on the reference genome. At this step, we used the bcftools mpileup tool to create a VCF file with the variant calls. We then used bcftools consensus to create a consensus sequence in which the reference genome bases are replaced by the variants of the VCF file. Finally, the consensus sequences obtained for the fresh and germinated seeds were translated into peptide sequences in the three forward frames using EMBOSS Transeq [3]. The resulting peptide sequences were then aligned with nine sequences of FADX enzymes, five FAD2 enzymes and the two most similar peptide sequences of Handroanthus impetiginosus.

Figure 2. Galaxy workflow for the alignment to a reference genome approach
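To make the last two steps of this workflow concrete, here is a minimal Python sketch that substitutes single-nucleotide variants into a reference and translates the result in the three forward frames. It is a toy stand-in for bcftools consensus and EMBOSS Transeq, not a reimplementation of either. The function names and the toy sequence are hypothetical, only substitutions are handled (no indels), and the standard codon assignments are assumed.

```python
from itertools import product

# Standard codon assignments, built from the usual TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def apply_snvs(reference, variants):
    """Replace reference bases by variants given as (1-based position, ref, alt)."""
    consensus = list(reference)
    for pos, ref, alt in variants:
        if consensus[pos - 1] != ref:
            raise ValueError(f"reference mismatch at position {pos}")
        consensus[pos - 1] = alt
    return "".join(consensus)


def translate(seq):
    """Translate codon by codon; codons not in the table become 'X'."""
    seq = seq.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def three_forward_frames(seq):
    """Return the translations of the three forward reading frames."""
    return [translate(seq[offset:]) for offset in range(3)]


# Toy reference and variant (made up): the stop codon TGA becomes TGG.
consensus = apply_snvs("ATGGCCTGA", [(9, "A", "G")])
frames = three_forward_frames(consensus)
```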

De novo transcriptome assembly

In this second approach, we performed a de novo transcriptome assembly from Illumina reads 150 nucleotides long, to obtain longer contigs that would cover the entire sequence of our enzyme. We then aligned those transcripts to the FADX enzymes (Table 1) to identify the closest transcripts. We performed the de novo transcriptome assembly on the Galaxy platform [2]. Briefly, we used the Trimmomatic tool [4] with a sliding window of four nucleotides to ensure the quality of the reads. We then ran the Trinity tool [5] on the paired reads to create longer contigs, obtaining two libraries of roughly 135 000 and 200 000 sequences (assembly quality reports: germinated seeds; fresh seeds).

Figure 3. Galaxy workflow for the de novo transcriptome assembly approach
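The idea behind sliding-window trimming can be sketched in a few lines of Python. This is a simplified illustration and not Trimmomatic's exact algorithm. The window of four nucleotides comes from the text, while the mean-quality threshold of 20 and the function name are assumptions.

```python
def sliding_window_trim(seq, quals, window=4, min_avg_q=20):
    """Cut the read at the start of the first window whose mean quality is below `min_avg_q`."""
    for start in range(0, len(seq) - window + 1):
        if sum(quals[start:start + window]) / window < min_avg_q:
            return seq[:start], quals[:start]
    # No failing window (or read shorter than the window): keep the read as is.
    return seq, quals


# Toy read with made-up Phred scores: quality collapses towards the 3' end.
seq = "ACGTACGTAC"
quals = [30, 30, 30, 30, 30, 30, 10, 10, 10, 10]
trimmed_seq, trimmed_quals = sliding_window_trim(seq, quals)
```

In this toy read, the first window with a mean quality below 20 starts at the sixth base, so the first five bases are kept.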

As the genome of Jacaranda is not annotated, we could not identify the start codon of those sequences. Moreover, since we cannot be sure that our contigs contain the beginning of the real transcript, and therefore the start codon, we translated all the transcripts of the library into amino acid sequences in the three potential open reading frames (ORFs). To convert the transcripts to protein sequences and to perform local alignments, we used Python’s Seq module from the Bio package (Biopython) [6]. For the translation, we used the default table for plant plastids provided by the module. For the alignments, we used a BLOSUM62 substitution matrix. We first ran the analysis with a gap open penalty of 10 and a gap extension penalty of 0.5, then refined our results with a more stringent gap open penalty of 20. The Python script is available here.

To analyze our alignments, we ranked the scores obtained by each contig against each enzyme for a given ORF and then compared the results between contigs and ORFs.
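To show how the gap penalties affect a local alignment, here is a dependency-free sketch of a Smith-Waterman score with affine gaps. It is not the team's script, which relies on Biopython and BLOSUM62. The function names are hypothetical, a simple match/mismatch score stands in for BLOSUM62, and a gap of length k is assumed to cost gap_open + (k - 1) * gap_extend.

```python
def local_alignment_score(a, b, score, gap_open=10.0, gap_extend=0.5):
    """Best local alignment score with affine gaps (Gotoh recurrences, score only)."""
    neg = float("-inf")
    cols = len(b) + 1
    prev_h = [0.0] * cols   # best score of an alignment ending at (i - 1, j)
    prev_f = [neg] * cols   # same, but ending with a gap in b
    best = 0.0
    for i in range(1, len(a) + 1):
        h = [0.0] * cols
        f = [neg] * cols
        e = neg             # best score ending at (i, j) with a gap in a
        for j in range(1, cols):
            e = max(h[j - 1] - gap_open, e - gap_extend)
            f[j] = max(prev_h[j] - gap_open, prev_f[j] - gap_extend)
            diag = prev_h[j - 1] + score(a[i - 1], b[j - 1])
            h[j] = max(0.0, diag, e, f[j])
            best = max(best, h[j])
        prev_h, prev_f = h, f
    return best


def identity_score(x, y):
    """Toy scoring used here instead of BLOSUM62: +5 for a match, -4 otherwise."""
    return 5.0 if x == y else -4.0


# Toy sequences (made up): the second one carries a two-residue insertion.
lenient = local_alignment_score("AAAACCCC", "AAAAGGCCCC", identity_score, gap_open=10.0)
strict = local_alignment_score("AAAACCCC", "AAAAGGCCCC", identity_score, gap_open=20.0)
```

With a gap open penalty of 10, the best alignment bridges the insertion with a gap. With a penalty of 20, an ungapped alignment scores higher, which is the sense in which the second setting is more stringent.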
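The ranking step can be sketched as follows, assuming the alignment scores are stored in a dictionary keyed by contig, enzyme and reading frame. The contig names and score values are invented for the example, and only the UniProt identifiers come from Table 1.

```python
def rank_contigs(scores):
    """For each (enzyme, frame) pair, list contigs from best to worst alignment score."""
    hits_by_key = {}
    for (contig, enzyme, frame), value in scores.items():
        hits_by_key.setdefault((enzyme, frame), []).append((value, contig))
    return {key: [contig for value, contig in sorted(hits, reverse=True)]
            for key, hits in hits_by_key.items()}


# Hypothetical scores: (contig, enzyme, reading frame) -> local alignment score.
scores = {
    ("contig_A", "Q84UB8", 1): 412.0,
    ("contig_B", "Q84UB8", 1): 655.5,
    ("contig_C", "Q84UB8", 1): 120.0,
    ("contig_A", "Q9SP61", 2): 88.0,
    ("contig_B", "Q9SP61", 2): 301.0,
}
rankings = rank_contigs(scores)
```

Comparing the top entries across keys then shows whether the same contig ranks first for several enzymes and frames.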

Results and comparison of the two approaches

We used a set of nine known CLnA enzymes (Table 1) to perform local alignments and identify the closest sequences. We then challenged those alignments with five sequences corresponding to FAD2 enzymes (Table 2), performing multiple alignments with Clustal Omega [7] and constructing phylogenetic trees with PhyML 3.0 (bootstrap parameter: 100) [8], to check that the discovered sequence corresponds to a FADX enzyme. We visualized the multiple alignments with Jalview [9].

Table 1. FADX enzymes

Enzyme	Organism	Uniprot identifier
Bifunctional fatty acid conjugase/Delta(12)-oleate desaturase	Punica granatum	Q84UB8
Bifunctional fatty acid conjugase/Delta(12)-oleate desaturase	Trichosanthes kirilowii	Q84UC0
Bifunctional desaturase/conjugase FADX	Vernicia fordii	Q8GZC2
Delta13 fatty acid desaturase FADX-1B	Momordica charantia	Q9SP61
Delta(12) acyl-lipid conjugase (11E,13E-forming)	Impatiens balsamina	Q9SP62
Delta13 fatty acid desaturase FADX-1B	Exocarpos cupressiformis	U5LN76
Fatty acid conjugase FAC2 B	Calendula officinalis	Q9FPP7
Fatty acid conjugase FAC2 A	Calendula officinalis	Q9FPP8
Delta(12) fatty acid desaturase DES8.11	Calendula officinalis	Q9SCG2

Figure 4. Multiple alignments of the FADX enzymes with colors representing the identity percentage

Table 2. FAD2 enzymes

Enzyme	Organism	Uniprot identifier
Delta(12) fatty acid desaturase FAD2	Calendula officinalis	Q9AT72
Delta(12)-acyl-lipid-desaturase	Punica granatum	Q84VT2
Delta(12)-fatty-acid desaturase	Arabidopsis thaliana	P46313
Delta(12)-fatty-acid desaturase FAD2	Vernicia fordii	Q8GZC3
(submitted)	Yarrowia lipolytica	Q6CF55

Figure 5. Multiple alignments of the FAD2 enzymes with colors representing the identity percentage

Figure 6. Phylogenetic tree of the FAD2 and FADX enzymes

Figure 7. Multiple alignments of the FADX and FAD2 enzymes with colors representing the identity percentage

Results for the alignment to a reference genome

For comparison, we used the genes predicted for Handroanthus impetiginosus (Bignoniaceae) and the peptide sequences of the nine CLnA enzymes (Table 1). Searching for local alignments with Python’s Bio.Seq module (script available here), we identified the two gene sequences of Handroanthus impetiginosus that were most similar to those enzymes. These sequences correspond to peptide1 and peptide2 (Figure 8). As a preliminary analysis, we performed multiple alignments between the sequences of the enzymes in Tables 1 (FADX) and 2 (FAD2), the consensus sequences for the fresh (COS) and germinated (BOS) seeds of Jacaranda (for the three reading frames) and the two peptide sequences of Handroanthus impetiginosus. These alignments showed that the sequence translated in the first reading frame of the fresh seeds was the most similar to FADX enzymes, while the other ORFs and the Handroanthus impetiginosus gene sequences were more similar to FAD2 enzymes or were outliers. We therefore decided to focus on this sequence and repeated the alignment with the FADX and FAD2 enzymes, constructing a phylogenetic tree to see which enzymes would be closest to our sequence (Figure 8).

Figure 8. Phylogenetic tree with the sequences from the first ORF of the fresh (COS) seeds in the alignment

Because the consensus sequence for the first reading frame of the fresh seeds is much longer than the enzyme sequences, we extracted the region covered by the alignment and repeated the analysis (Figure 9).

Figure 9. Phylogenetic tree with the sequences from the first ORF of the fresh seeds. Within the first reading frame, we selected the sequence corresponding to the amino acids 91623 to 91980

The sequence extracted from the fresh seeds lies between the FADX and FAD2 enzymes without clearly clustering with either type.

Figure 10. Multiple alignments between the selected consensus sequence (amino acids 91623 to 91980) from the first ORF of the fresh (COS) seeds and the FADX and FAD2 enzymes. The colors represent the alignment score for a BLOSUM62 substitution matrix

[...] models, the normal value for gene expression is set to 1. To knock out genes in the models, we have set this value to 0, and we have set it to 10 to simulate an over-expression. The FBA results are presented in Table 3.

Table 3. Jacaric acid production of several Y. lipolytica strains determined by Flux Balance Analysis.

Y. lipolytica strain	Mutations	Biomass	Biomass (% of JMY195)	Jacaric acid	Jacaric acid (% of JMY195)
JMY195	Po1d (Ura3-, Leu2-, Xpr2-)	29.50	100.0%	226.54	100.0%
JMY1233	Po1d, pox1-6Δ	29.50	100.0%	226.54	100.0%
JMY1877	Po1d, dga1Δ dga2Δ lro1Δ are1Δ	0.00	0.0%	0.00	0.0%
JMY2159	Po1d, pox1-6Δ dga1Δ dga2Δ lro1Δ fad2Δ	0.00	0.0%	0.00	0.0%
JMY3325	Po1d, pox1-6Δ dga1Δ dga2Δ lro1Δ fad2Δ, pTEF-FAD2-LEU2	0.00	0.0%	0.00	0.0%
JMY3820	Po1d, pox1-6Δ tgl4Δ, pTEF-DGA2, pTEF-GPD1	29.50	100.0%	386.41	170.6%

[...] evolutionary optimization tool of OptFlux on JMY195 and set the minimum biomass production to 95% of the control. As a result, this tool gave us genes or sets of genes whose deletion increases the jacaric acid production to 1,000, which is the maximum potential value of the reaction. These genes are presented in Table 4. The solutions proposed by this tool should always be manually verified, since a model may not be refined enough to represent reality exactly and could propose absurd deletions as a result.

Table 4. Gene deletions proposed by the evolutionary optimization of OptFlux to increase jacaric acid production.

Gene	Occurrence in solution
YALI0D00583g	10
YALI0F15631g	7
YALI0E33517g	4
YALI0F24475g	4
YALI0E25740g	4
YALI0F01210g	4
YALI0F26323g	3
YALI0E18568g	3
YALI0A05379g	2
YALI0B09647g	2
YALI0B22682g	2
YALI0E33099g	2
YALI0C05258g	1
YALI0D04741g	1
YALI0D10131g	1
YALI0E31009g	1
YALI0F14025g	1
YALI0E31471g	1
YALI0D24750g	1
YALI0E24013g	1
YALI0C11407g	1
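As a quick check of the strain table above (jacaric acid production determined by FBA), the percentage column can be recomputed from the raw values, assuming that each percentage is relative to the control strain JMY195. The helper name is hypothetical, and the flux values are copied from the table.

```python
# Jacaric acid production values copied from the strain table.
jacaric_acid = {
    "JMY195": 226.54,
    "JMY1233": 226.54,
    "JMY1877": 0.00,
    "JMY2159": 0.00,
    "JMY3325": 0.00,
    "JMY3820": 386.41,
}


def relative_to(reference, values):
    """Express each value as a percentage of the reference strain, rounded to 0.1."""
    base = values[reference]
    return {strain: round(100.0 * value / base, 1) for strain, value in values.items()}


percentages = relative_to("JMY195", jacaric_acid)
```

Under this assumption the computed value for JMY3820 is 170.6, matching the table.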

Difference between revisions of "Team:Evry Paris-Saclay/Model"
