The TwistDx guidelines 2 recommend the following parameters for the target sequence:
- A GC content between 40 and 60%
- An optimal size range of 100-200 bp, with an upper limit of 500 bp
- No repetitive sequences/palindromes (to optimize specificity)
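The criteria above can be checked programmatically. The following is a minimal Python sketch, not the tool we actually used: the function names are hypothetical, the 8 bp minimum palindrome length is an arbitrary assumption (the guidelines as quoted here do not give a threshold), and tandem or dispersed repeats are not checked at all.

```python
def gc_percent(seq: str) -> float:
    """Return the GC content of a DNA sequence as a percentage."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def has_palindrome(seq: str, min_len: int = 8) -> bool:
    """True if seq contains a reverse-complement palindrome >= min_len.

    min_len is an assumed threshold and must be even: a DNA palindrome
    always has even length, and any longer palindrome contains a
    central palindrome of exactly min_len bases.
    """
    if min_len % 2 != 0:
        raise ValueError("min_len must be even")
    seq = seq.upper()
    for start in range(len(seq) - min_len + 1):
        window = seq[start:start + min_len]
        if window == reverse_complement(window):
            return True
    return False


def check_target(seq: str, min_palindrome: int = 8) -> dict:
    """Evaluate a candidate target against the criteria listed above."""
    gc = gc_percent(seq)
    return {
        "gc_in_range": 40.0 <= gc <= 60.0,        # 40 to 60% GC
        "length_optimal": 100 <= len(seq) <= 200,  # optimal size range
        "length_allowed": len(seq) <= 500,         # upper size limit
        "no_palindrome": not has_palindrome(seq, min_palindrome),
    }
```

Calling `check_target` on a candidate sequence returns one boolean per criterion, which makes it easy to see which condition a given candidate fails.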
The GC content requirement quickly turned out to be more problematic than expected, as phytoplasmas usually have a genomic GC content between 21 and 33% 3.
We started by looking at the Hren et al. paper 4 detailing the detection method cited by the European and Mediterranean Plant Protection Organisation (EPPO) protocol 5.
Bois Noir
The target sequence for BN is derived from a Candidatus Phytoplasma solani genomic sequence (GenBank accession number AF447593) used in the Hren et al. paper. The sequence was analyzed in silico to confirm that it contained potential amplicons with more than 40% GC. It was then BLASTed to test for specificity, which returned matches only for stolbur phytoplasmas (stolbur is the name given to the disease caused by Candidatus Phytoplasma solani in every plant other than grapevine).
Flavescence Dorée
Choosing a target sequence was much less straightforward for FD. The sequence described in the Hren et al. paper is a fragment of the SecY gene. It has an average GC content of 23.75% and does not contain a single stretch of ≥100 bp with more than 35% GC. This ruled out SecY, so we needed to find a new target sequence for FD.
Figure 1: Analysis of the GC content of the SecY sequence, generated by computing the GC content of fragments of varying lengths (80 bp; 100 bp; 120 bp; 150 bp) at each position of the sequence
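The kind of analysis shown in this figure can be reproduced with a sliding window. The sketch below is an illustration under our own assumptions rather than the exact script behind the figure: `sliding_gc` and `max_window_gc` are hypothetical names, and the input is assumed to be a plain DNA string without gaps or ambiguity codes.

```python
def sliding_gc(seq: str, window: int) -> list:
    """GC percentage of every fragment of length `window` in seq.

    Element i of the result is the GC content of seq[i:i + window].
    """
    seq = seq.upper()
    if window <= 0 or window > len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    is_gc = [base in "GC" for base in seq]
    count = sum(is_gc[:window])           # GC count of the first fragment
    profile = [100.0 * count / window]
    for end in range(window, len(seq)):
        # Slide by one base: add the entering base, drop the leaving one.
        count += is_gc[end] - is_gc[end - window]
        profile.append(100.0 * count / window)
    return profile


def max_window_gc(seq: str, window: int) -> tuple:
    """Return (start, gc_percent) of the GC-richest fragment."""
    profile = sliding_gc(seq, window)
    start = max(range(len(profile)), key=profile.__getitem__)
    return start, profile[start]
```

Running `sliding_gc` once per fragment length (for example 80, 100, 120 and 150) gives one curve per length, and `max_window_gc` shows directly whether any fragment of a given length reaches the 40% threshold.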
Phytoplasma strains are classified based on their 16S ribosomal RNA. For instance, Candidatus Phytoplasma vitis (Flavescence Dorée) belongs to the 16SrV group, while Candidatus Phytoplasma solani (Bois Noir) belongs to the 16SrXII group. We decided to try the R16F2n/R16R2 (F2N/R2) fragment of the 16S ribosomal RNA gene. This fragment is used as an identification sequence in various studies involving Candidatus Phytoplasma species.
We aligned 7 sequences of the F2N/R2 fragment, taken from Candidatus Phytoplasma vitis as well as from Candidatus Phytoplasma solani (Bois Noir) and other phytoplasma strains of the 16SrV group, to assess specificity (additional material, table 3). The alignment was performed with the online analysis tool Benchling.com, which generated a consensus sequence. The target sequence was selected from the consensus sequence, in a region with high identity within the 16SrV group and low similarity to the 16SrXII group. The target sequence was then BLASTed to confirm that it did not match any known sequence from Bois Noir, other diseases or grapevine.
Figure 2: Alignment of FD, FD-related (16SrV) and BN R16F2n/R16R2 sequences. The target sequence was selected from the consensus sequence, in a region with high identity within the 16SrV group and low similarity to the 16SrXII group.
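The selection criterion (high identity within 16SrV, low similarity to 16SrXII) can be expressed as a simple window score over an alignment. The sketch below is only an illustration of that idea: in practice we inspected the Benchling alignment, and everything here (function names, the majority-vote consensus, the scoring formula, the treatment of gaps as ordinary characters) is our own assumption. It expects pre-aligned sequences of equal length.

```python
from collections import Counter


def consensus(aligned: list) -> str:
    """Majority-vote consensus of equal-length aligned sequences."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must have equal length")
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))


def match_fraction(aligned: list, reference: str) -> list:
    """Per-column fraction of sequences identical to the reference."""
    return [
        sum(s[i] == ref_base for s in aligned) / len(aligned)
        for i, ref_base in enumerate(reference)
    ]


def best_window(in_group: list, out_group: list, window: int) -> tuple:
    """Find the window that is conserved in-group and divergent out-group.

    Score = mean in-group identity minus mean out-group identity,
    both measured against the in-group consensus. Returns (start, score).
    """
    cons = consensus(in_group)
    if any(len(s) != len(cons) for s in out_group):
        raise ValueError("out-group must share the alignment length")
    in_id = match_fraction(in_group, cons)
    out_id = match_fraction(out_group, cons)
    best_start, best_score = 0, float("-inf")
    for start in range(len(cons) - window + 1):
        end = start + window
        score = (sum(in_id[start:end]) - sum(out_id[start:end])) / window
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score
```

Here `in_group` would hold the aligned 16SrV sequences and `out_group` the aligned 16SrXII sequence(s). A candidate window found this way would still need to pass the GC content check and a BLAST search.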
Figure 3: Analysis of the GC content of the F2N/R2 consensus sequence, generated by computing the GC content of fragments of varying lengths (100 bp; 120 bp; 150 bp) at each position of the sequence
We considered a few other barcode genes (such as tuf and map), but the F2N/R2 fragment turned out to be an excellent candidate for detection: it is well documented in most phytoplasma strains, has a high GC content and shows strong specificity.
Endogenous Control
When we asked which requirements our test had to fulfill to be usable, the Agroscope expert told us that we needed to be able to detect a third sequence, neither BN nor FD. This sequence has to be present regardless of whether FD or BN is present, and serves two purposes:
- Extraction control: confirming that the DNA was extracted correctly from the plant
- Amplification control: confirming that the DNA amplification step worked
We refer to this control sequence as the endogenous control, or EC for short.
For the target sequence, we chose to work with a gene from the V. vinifera chloroplast genome, more specifically an intronic region of the tRNA-Leu (trnL) gene. We chose this gene because the chloroplast genome is well documented and the trnL gene is highly conserved in all plants, which is essential since our control needs to work for all grapevine cultivars.
To construct the target sequence, we aligned 8 sequences from V. vinifera cultivars, subspecies and closely related species (additional material, table 2). The alignment confirmed the high level of similarity between the sequences. The target sequence was again selected within the generated consensus sequence.
Figure 4: Alignment of trnL sequences from Vitis vinifera cultivars, subspecies and closely related species.
The final target sequences can be found in table 1 of the additional materials.