Difference between revisions of "Team:Evry Paris-Saclay/Model"

Line 16: Line 16:
  
 
<h2>RNA sequencing of Jacaranda mimosifolia and data pre-processing</h2>
 
<h2>RNA sequencing of Jacaranda mimosifolia and data pre-processing</h2>
<br>
 
 
<p>We performed RNA extraction for the fresh and germinated seeds of Jacaranda using the NucleoSpin® RNA Plant and Fungi kit (Macherey Nagel) and, after a quick analysis on an ethidium bromide stained agarose gel followed by a Qubit RNA assay (Thermo Fisher Scientific) samples were prepared for sequencing using the TruSeq Stranded mRNA kit (Illumina). The libraries were analyzed on a HiSeq 4000 system (Illumina) and the obtained data were preprocessed and cleaned by the sequencing platform (Genoscope).</p>
 
<p>We performed RNA extraction for the fresh and germinated seeds of Jacaranda using the NucleoSpin® RNA Plant and Fungi kit (Macherey Nagel) and, after a quick analysis on an ethidium bromide stained agarose gel followed by a Qubit RNA assay (Thermo Fisher Scientific) samples were prepared for sequencing using the TruSeq Stranded mRNA kit (Illumina). The libraries were analyzed on a HiSeq 4000 system (Illumina) and the obtained data were preprocessed and cleaned by the sequencing platform (Genoscope).</p>
  
Line 22: Line 21:
  
 
<h2>Illumina Filter</h2>
 
<h2>Illumina Filter</h2>
<br>
+
<p>Raw reads were filtered to remove clusters that had too many bases with ambiguous intensity. The purity of the signal was analyzed on the first 25 cycles of each cluster by assigning a score to each of the cycles: Chastity = (Maximum Intensity) / (Sum of the two highest intensities). The filter used in basic calling allows at most a cycle with a chastity value of less than 0.6.</p>
Raw reads were filtered to remove clusters that had too many bases with ambiguous intensity. The purity of the signal was analyzed on the first 25 cycles of each cluster by assigning a score to each of the cycles: Chastity = (Maximum Intensity) / (Sum of the two highest intensities). The filter used in basic calling allows at most a cycle with a chastity value of less than 0.6
+
  
 
<br>
 
<br>
  
 
<h2>Genoscope Filter</h2>
 
<h2>Genoscope Filter</h2>
<br>
 
 
<p>Adapters and primers sequences were eliminated from the readings as well as low-quality bases (Q <20) at 2 extremities. Reads of less than 30 nucleotides were discarded. Finally, PhiX reads from Illumina internal spiking were discarded.</p>
 
<p>Adapters and primers sequences were eliminated from the readings as well as low-quality bases (Q <20) at 2 extremities. Reads of less than 30 nucleotides were discarded. Finally, PhiX reads from Illumina internal spiking were discarded.</p>
  
Line 34: Line 31:
  
 
<h2>Filtering of ribosomal sequences</h2>
 
<h2>Filtering of ribosomal sequences</h2>
<br>
 
 
<p>The cleaned reads corresponding to ribosomal RNA were separated from other reads.</p>
 
<p>The cleaned reads corresponding to ribosomal RNA were separated from other reads.</p>
  
Line 46: Line 42:
  
 
<h2>Alignment using a reference genome</h2>
 
<h2>Alignment using a reference genome</h2>
<br>
 
 
<p>In this first approach, we used the genome of Handroanthus impetiginosus [1], Bignoniaceaea as a reference genome for the alignment of the reads. We choose it following the discussions we had with Dr. Florian Jabbour, Senior Lecturer and Collection Manager in the field of Morpho-Anatomy and Plant Development at the Institute of Systematics, Evolution, Biodiversity of the National Museum of Natural History of Paris (for details, see our Human Practices page on this wiki).  
 
<p>In this first approach, we used the genome of Handroanthus impetiginosus [1], Bignoniaceaea as a reference genome for the alignment of the reads. We choose it following the discussions we had with Dr. Florian Jabbour, Senior Lecturer and Collection Manager in the field of Morpho-Anatomy and Plant Development at the Institute of Systematics, Evolution, Biodiversity of the National Museum of Natural History of Paris (for details, see our Human Practices page on this wiki).  
 
We used the Galaxy platform [2] to analyze the fastq files provided by the Illumina platform. We uploaded four fastq files containing the paired-end sequences for the germinated and fresh seeds (two for each seed type) as well as the reference genome [1].
 
We used the Galaxy platform [2] to analyze the fastq files provided by the Illumina platform. We uploaded four fastq files containing the paired-end sequences for the germinated and fresh seeds (two for each seed type) as well as the reference genome [1].
Line 55: Line 50:
 
Finally, the consensus sequences obtained for the fresh seeds and germinated seeds were translated into peptide sequences according to three forward frames using EMBOSS Transeq [3].
 
Finally, the consensus sequences obtained for the fresh seeds and germinated seeds were translated into peptide sequences according to three forward frames using EMBOSS Transeq [3].
 
The corresponding peptide sequences were then aligned along with nine sequences of FADX enzymes, five FAD2 enzymes and the two peptide sequences of Handroanthus Impetiginosus with the best similarity.</p>
 
The corresponding peptide sequences were then aligned along with nine sequences of FADX enzymes, five FAD2 enzymes and the two peptide sequences of Handroanthus Impetiginosus with the best similarity.</p>
 +
 +
<br><br>
 +
 +
<h2>De novo transcriptome assembly</h2>
 +
<p>In this second approach, we performed a de novo transcriptome assembly from Illumina reads of 150 nucleotides length to obtain longer contigs which would cover the entire sequence of our enzyme. We then aligned those transcripts to the FADX enzymes (table 1) to identify the closest transcripts.
 +
We performed the de novo transcriptome assembly using the galaxy platform [1].
 +
Briefly, we used the Trimmomatic tool [4]  with a sliding window of four nucleotides to ensure the quality of the reads. Then we used the Trinity tool [5] on the paired reads to create longer contigs and obtain two libraries of roughly 135 000 and 200 000 sequences (assembly quality reports: <a href="T--Evry_Paris-Saclay--Dry_File1_BOSRB-QuastReport-.pdf">germinated seeds</a>; <a href="T--Evry_Paris-Saclay--Dry_File2_COSRB-QuastReport-.pdf">fresh seeds</a>.</p>
 +
  
 
</html>
 
</html>
  
 
{{Evry_Paris-Saclay/bottom}}
 
{{Evry_Paris-Saclay/bottom}}

Revision as of 19:30, 21 October 2019

Uncovering the Jacaranda mimosifolia FadX gene


For our iGEM project, we needed to identify and characterize the enzyme that catalyzes the synthesis of jacaric acid, a Conjugated Linolenic Acid (CLnA) with anti-tumoral, anti-inflammatory and anti-obesity properties. This compound is naturally found at high concentration in the seeds of the plant Jacaranda mimosifolia (hereafter called Jacaranda); however, the genome of this species has not been sequenced. To identify which enzyme catalyzes this reaction in Jacaranda, we decided to perform exome sequencing on both fresh and germinated seeds.


RNA sequencing of Jacaranda mimosifolia and data pre-processing

We extracted RNA from fresh and germinated Jacaranda seeds using the NucleoSpin® RNA Plant and Fungi kit (Macherey-Nagel). After a quick check on an ethidium bromide-stained agarose gel followed by a Qubit RNA assay (Thermo Fisher Scientific), samples were prepared for sequencing using the TruSeq Stranded mRNA kit (Illumina). The libraries were sequenced on a HiSeq 4000 system (Illumina), and the resulting data were preprocessed and cleaned by the sequencing platform (Genoscope).


Illumina Filter

Raw reads were filtered to remove clusters that had too many bases with ambiguous intensity. Signal purity was assessed over the first 25 cycles of each cluster by assigning a score to each cycle: Chastity = (Maximum Intensity) / (Sum of the two highest intensities). The filter applied during base calling allows at most one cycle with a chastity value below 0.6.
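The chastity rule above can be illustrated with a minimal Python sketch. It is only an illustration of the formula and threshold stated in the text, not the code run by the sequencer; the function names and the representation of a cycle as a tuple of four channel intensities are assumptions made for this example.

```python
def chastity(intensities):
    """Chastity of one cycle: highest intensity / sum of the two highest."""
    top_two = sorted(intensities, reverse=True)[:2]
    total = sum(top_two)
    # Guard against an all-zero cycle (assumption: treat it as impure).
    return top_two[0] / total if total > 0 else 0.0


def passes_chastity_filter(cycles, n_cycles=25, threshold=0.6, max_failures=1):
    """Keep a cluster if at most `max_failures` of its first `n_cycles`
    cycles have a chastity below `threshold`."""
    failures = sum(1 for c in cycles[:n_cycles] if chastity(c) < threshold)
    return failures <= max_failures


# Hypothetical intensities for the four channels of each cycle.
clean_cycle = (100, 10, 5, 5)   # chastity = 100 / 110, about 0.91
mixed_cycle = (50, 50, 0, 0)    # chastity = 50 / 100 = 0.5

one_bad = [mixed_cycle] + [clean_cycle] * 24
two_bad = [mixed_cycle] * 2 + [clean_cycle] * 23
```

With these hypothetical values, a cluster with a single ambiguous cycle is kept, while a cluster with two ambiguous cycles among the first 25 is removed.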


Genoscope Filter

Adapter and primer sequences were removed from the reads, and low-quality bases (Q < 20) were trimmed from both ends. Reads shorter than 30 nucleotides were discarded. Finally, reads from the PhiX control spiked in by Illumina were discarded.
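As a rough sketch of the quality and length criteria described above (Q < 20 at both ends, minimum length of 30 nucleotides), the following Python functions trim a read and decide whether to keep it. This is not Genoscope's pipeline: the function names are hypothetical, qualities are assumed to be already decoded to integer Phred scores, and adapter, primer and PhiX removal are not covered.

```python
def trim_low_quality_ends(seq, quals, min_q=20):
    """Remove bases with quality below `min_q` from both ends of a read."""
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


def keep_read(seq, min_len=30):
    """Discard reads shorter than `min_len` nucleotides after trimming."""
    return len(seq) >= min_len


# Hypothetical 40-nt read: 3 poor bases at the 5' end, 2 at the 3' end.
read_seq = "A" * 40
read_quals = [10] * 3 + [30] * 35 + [5] * 2
trimmed_seq, trimmed_quals = trim_low_quality_ends(read_seq, read_quals)

# Hypothetical 32-nt read that drops below 30 nt once trimmed.
short_seq, _ = trim_low_quality_ends("C" * 32, [10] * 3 + [30] * 29)
```

In this example the first read is trimmed to 35 nucleotides and kept, whereas the second is trimmed to 29 nucleotides and would be discarded.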


Filtering of ribosomal sequences

The cleaned reads corresponding to ribosomal RNA were separated from other reads.



Methods


To analyze the exome sequencing data, we followed two independent approaches: alignment to a reference genome and de novo transcriptome assembly.


Alignment using a reference genome

In this first approach, we used the genome of Handroanthus impetiginosus [1], a member of the Bignoniaceae, as the reference genome for the alignment of the reads. We chose it following discussions with Dr. Florian Jabbour, Senior Lecturer and Collection Manager in the field of Morpho-Anatomy and Plant Development at the Institute of Systematics, Evolution, Biodiversity of the National Museum of Natural History in Paris (for details, see our Human Practices page on this wiki). We used the Galaxy platform [2] to analyze the fastq files provided by the Illumina platform. We uploaded four fastq files containing the paired-end sequences for the germinated and fresh seeds (two for each seed type) as well as the reference genome [1]. First, we used FastQ Groomer to ensure that our fastq files used the standard quality format. Next, we used FastQC to perform quality control on our files. Before mapping the reads against the reference genome, we inspected the raw-data statistics for each paired-end file, for both the fresh seeds (referred to as COS) and the germinated seeds (BOS).


We then mapped the reads in our fastq files to the reference genome using the BWA tool, producing BAM files for both seed exomes. We sorted these files by coordinate on the reference genome. At this step, we used the bcftools mpileup tool to create a VCF file with the variant calls. We then used bcftools consensus to create a consensus sequence by substituting the variants from the VCF file for the corresponding reference genome bases. Finally, the consensus sequences obtained for the fresh and germinated seeds were translated into peptide sequences in the three forward frames using EMBOSS Transeq [3]. The resulting peptide sequences were then aligned with nine FADX enzyme sequences, five FAD2 enzyme sequences and the two most similar peptide sequences of Handroanthus impetiginosus.
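To make the final translation step concrete, here is a minimal Python sketch of translating a nucleotide sequence in its three forward frames. It only illustrates the idea and is not EMBOSS Transeq: the function names are hypothetical, the standard genetic code is assumed, incomplete trailing codons are simply dropped, and codons containing ambiguous bases are written as X.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, frame=0):
    """Translate `seq` starting at offset `frame` (0, 1 or 2)."""
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    # Unknown codons (e.g. containing N) become 'X'; '*' marks a stop.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_forward_frames(seq):
    """Return the peptide for each of the three forward reading frames."""
    return [translate(seq, frame) for frame in range(3)]


frames = three_forward_frames("ATGGCCTAA")
```

For the toy sequence ATGGCCTAA, the three frames read ATG GCC TAA, TGG CCT and GGC CTA respectively. Only one frame is expected to carry the open reading frame of interest, which is why all three are kept for the subsequent peptide alignment.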



De novo transcriptome assembly

In this second approach, we performed a de novo transcriptome assembly from 150-nucleotide Illumina reads to obtain longer contigs that would cover the entire sequence of our enzyme. We then aligned the assembled transcripts to the FADX enzymes (table 1) to identify the closest transcripts. We performed the de novo transcriptome assembly using the Galaxy platform [2]. Briefly, we used the Trimmomatic tool [4] with a sliding window of four nucleotides to ensure the quality of the reads. We then used the Trinity tool [5] on the paired reads to create longer contigs, obtaining two libraries of roughly 135 000 and 200 000 sequences (assembly quality reports: germinated seeds; fresh seeds).
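The sliding-window idea behind the trimming step can be sketched in a few lines of Python. This is a simplified illustration, not Trimmomatic's implementation: the function name is hypothetical, the window of four nucleotides comes from the text, and the average-quality threshold of 20 is an assumption because the value used is not stated here.

```python
def sliding_window_trim(seq, quals, window=4, min_avg_q=20):
    """Scan from the 5' end and cut the read at the first window whose
    average quality falls below `min_avg_q` (assumed threshold)."""
    for start in range(len(seq) - window + 1):
        window_quals = quals[start:start + window]
        if sum(window_quals) / window < min_avg_q:
            return seq[:start], quals[:start]
    return seq, quals


# Hypothetical 16-nt read whose quality collapses after position 10.
example_seq = "ACGT" * 4
example_quals = [30] * 10 + [10] * 6
kept_seq, kept_quals = sliding_window_trim(example_seq, example_quals)
```

In this example the first window with an average below 20 starts at index 9, so the first nine bases (ACGTACGTA) are kept. Averaging over a window tolerates an isolated poor base while still removing a sustained drop in quality before the reads are passed to the assembler.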