Xylencer


Determining pathogenicity of the Xanthomonas species

The phage delivery bacterium (PDB) demands a strain that is both able to replicate X. fastidiosa phages and is not harmful to the treated plant. This requires an evolutionarily closely related, non-pathogenic bacterium. An excellent candidate is Xanthomonas, of which there are multiple reported non-pathogenic strains. However, Xanthomonas is known as a predominantly pathogenic genus. By combining extensive literature research and genomic information into a machine learning approach, this subproject establishes a genetic basis for discerning pathogens from non-pathogens in the Xanthomonas genus. This allowed us to confidently select the non-pathogenic X. arboricola strain CITA 44 for use as our PDB.

Principal component analysis plot of the average model predictions. This gives an overview of the performance of the model on the different samples. Samples that were mislabeled in more than 5% of the cases are designated as 'missed' and are symbolized by a crossed-out dot.

A total of 1372 Xanthomonas genomes were retrieved from the NCBI database and reannotated with protein domains, the functional units of proteins. Extensive literature research was performed to curate a high-quality dataset of 104 Xanthomonas genomes with experimentally verified pathogenicity. This dataset was used to train a random forest machine learning model. The performance of 100 models was combined to estimate the sensitivity and specificity of the model, resulting in an average sensitivity (non-pathogen prediction rate) of 0.90 ± 0.12 (SD) and an average specificity (pathogen prediction rate) of 0.81 ± 0.11 (SD). The non-pathogens that were correctly predicted with high certainty were selected as promising candidates for the PDB. Protein domains important for prediction were extracted from the model. This deepened our understanding of the biology of pathogenicity in the Xanthomonas genus by identifying transposable DNA elements and the type III secretion system as key factors in Xanthomonas pathogenicity.

Box-plots of the sensitivity (prediction rate of non-pathogens) and specificity (prediction rate of pathogens) of the 100 model replicates. This gives a measure of consistency of the predictions.

Introduction

The phage delivery bacterium is an integral part of Xylencer. Early on, through our contact with phage experts, it became apparent that delivery of the phage to the infected areas of a diseased plant was one of the biggest bottlenecks holding back agricultural phage therapy.

Two major factors at play here are:

The difficulty of precise delivery to the areas of infection.
Damage to the phage during and after delivery due to environmental factors, of which the main culprit is UV damage.

Xylencer overcomes these hurdles with the introduction of the phage delivery bacterium (PDB). This bacterium forms a protective shell for the phage and facilitates targeted delivery of the phage inside the plant. However, one big question remains: Which specific bacterial species and strain do we use as the PDB?

What criteria do we apply to our PDB?
There are two main criteria when considering a bacterium for use as a PDB:
1. The bacterium should be able to replicate the phage, by having the right machinery to transcribe and translate the phage proteins.
2. The bacterium should not be harmful to the plant to which it will be applied as a cure.
To satisfy the first criterion, the PDB needs to have sufficiently similar machinery to X. fastidiosa, enabling it to correctly replicate the phage. The odds of this being true increase the more closely the PDB is evolutionarily related to X. fastidiosa. To satisfy the latter criterion, the bacterium should be non-pathogenic to the plant that is to be treated. Since X. fastidiosa can infect over 350 plant species, the PDB should be non-pathogenic to all of them. Simply put, it should be a general non-pathogen. Having a non-pathogenic PDB has the added benefit of increasing the safety of the therapy.

In the case of X. fastidiosa, these criteria are at apparent odds with each other. The Xanthomonadaceae, of which X. fastidiosa is a part, are considered to be entirely pathogenic [1], which violates the second criterion. But if we look outside of the Xanthomonadaceae, the evolutionary distance becomes so large that the first criterion no longer holds. Luckily, there might be a way out of this paradox.

Over two decades ago, reports of non-pathogenic Xanthomonas strains started to appear [2] and this list has been slowly growing. Xanthomonas is the most closely related organism to X. fastidiosa, at 95% 16S rRNA similarity. Additionally, Xanthomonas produces a yellow pigment called xanthomonadin that protects it from photobiological damage [3]. This same pigment would also allow Xanthomonas to better protect the phage from UV damage. On top of that, it was already experimentally confirmed that X. fastidiosa phages can be replicated by Xanthomonas [4]. This means that these non-pathogenic Xanthomonas strains meet both of the criteria set for the PDB.

However, non-pathogenicity is hard to prove. There is always a risk of testing the wrong host, testing the wrong conditions or using the wrong method of inoculation, to name a few. An example of this is a set of X. arboricola strains isolated from infected strawberries that were unable to cause symptoms when manually inoculated onto strawberries again [5]. Based on this evidence, these strains could be considered non-pathogenic, but when the pathogenicity assay was repeated at an increased humidity, symptoms did show [6]. Of course, testing all possible hosts and conditions makes for a near-infinite parameter space and is unfeasible. Still, it would be desirable and responsible to extend the evidence for non-pathogenicity of the PDB beyond mere literature reports. This sub-project aims to provide additional evidence by examining whether a genetic basis for delineating pathogens from non-pathogens can be established using an in silico approach.

General Approach

This sub-project is based on the hypothesis that the reliability of the diagnosis of a bacterium as a non-pathogen is increased if we can accurately separate non-pathogens from pathogens on a genetic basis. This genetic basis can be established by combining literature information on pathogenicity with genome sequencing information and feeding it to a classifier (Figure 1). The reliability of this genetic basis can then be assessed by examining the performance of the classifier. This requires three components: a list of information on the pathogenicity of different Xanthomonas strains, a database consisting of the corresponding genomes with functional annotations, and lastly a classifier that can interpret this data and predict the pathogenicity of a given strain. The next sections will go over these components in more detail.

*Figure 1:* A schematic overview of the approach. A literature study is combined with all publicly available genomes to train a random forest machine learning model to predict the pathogenicity of *Xanthomonas* strains.

The Genomic Database

Publicly available Xanthomonas genomes were retrieved from the NCBI database, which yielded 1397 genomes. Quality control filtered out 25 genomes, resulting in 1372 genomes spanning 35 different subspecies. To prevent differences in annotation date and method from contaminating the results, all genomes were re-annotated using the SAPP platform [7]. Prodigal was used to call genes [8], which were subsequently annotated with Pfam protein domains [9] using InterProScan [10].

*Figure 2:* Genome size versus gene count of the 1372 *Xanthomonas* genomes obtained after filtering.

Why use protein domains?

Gene function was inferred based on protein domain content instead of traditional global sequence similarity-based methods. The motivation for using protein domains was two-fold. First, the Hidden Markov Models (HMMs) used for predicting domains are more sophisticated than the matrices used for scoring sequence similarity. Each HMM has its own predetermined threshold, which can vary between domains; this avoids the problem, encountered with sequence similarity methods, of having to select arbitrary thresholds. This makes protein domains a better proxy for biological function [11]. Second, when compared to the calculation of bi-directional best hits required for sequence-similarity-based approaches, annotation of protein domains is less computationally intensive. This allows the computation time to scale linearly with the database size, as opposed to the quadratic relationship observed for traditional methods [12].

Database statistics

All of the annotated genomes were uploaded to a graph database to allow for easy querying. The database was interfaced using SPARQL queries and R was used for further analyses. The genomes had an average size of 4.8 Mb and contained 4425 genes on average. Figure 2 shows the relationship between genome size and gene count. Most of the genomes follow the expected trend of a linear relationship, with only a subgroup of X. oryzae showing a higher gene density than expected. This is most probably a product of the high amount of recombination and rapid evolution observed in this subspecies [13]. The subspecies albilineans, fragariae and dyei all show a reduced genome size, which is in line with the reported genome reduction ongoing in albilineans [14].

Figure 3: Genome size versus gene domain coverage of the 1372 Xanthomonas genomes obtained after filtering.

On average, domain annotation yielded at least one domain for 79% of the genes. The average domain abundance was 5213 domains per genome, 2370 of which were distinct. The distribution is visualized in Figure 3. X. oryzae shows the lowest coverage but is still in line with the observed trend. Coverage plateaus at 83%, with strains undergoing genome reduction showing a lower than expected coverage for their gene count. This may indicate that the set of unannotated genes is enriched with biologically important genes that are preserved under selective pressure. Figure 4 shows a linear relationship between domain abundance and the number of distinct domains, with higher domain abundance giving rise to a more diverse repertoire of domains. Again the subgroup of X. oryzae that showed a higher gene density in Figure 2 stands out, because it has fewer distinct domains than expected for its domain abundance. This could again be explained by the increased recombination activity, by which new genes are created from combinations of existing domains, increasing only abundance, not diversity.

Figure 4: Distinct number of domains versus total number of domains of the 1372 Xanthomonas genomes obtained after filtering.

A binary domain presence/absence matrix covering all genomes was generated as the main starting point for further analyses (Figure 5). On this matrix a PCA was performed to examine the natural grouping present in the data set. The PCA-plot (Figure 6) clusters genomes together on a species level, with two high-density areas near the bottom right. PC-1 is responsible for the separation of most subspecies and explains considerably more variation than PC-2. PC-2 is mainly responsible for separating X. oryzae from the rest of the species. This is a by-product of the over-representation of this species in the data set (33% of all genomes), which causes the PCA to put a lot of weight on separating these genomes.
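The construction of such a presence/absence matrix can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the original analysis was done in R on data queried from the graph database, and the genome names and the helper `presence_absence_matrix` below are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def presence_absence_matrix(
    annotations: Dict[str, Set[str]],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Build a binary matrix: rows are genomes, columns are domains.

    matrix[i][j] is 1 if genome i contains domain j at least once, else 0.
    """
    genomes = sorted(annotations)
    domains = sorted(set().union(*annotations.values())) if annotations else []
    matrix = [
        [1 if domain in annotations[genome] else 0 for domain in domains]
        for genome in genomes
    ]
    return genomes, domains, matrix


# Toy input (hypothetical genome names mapped to the Pfam accessions found in them).
toy_annotations = {
    "genome_A": {"PF01845", "PF09487", "PF01609"},
    "genome_B": {"PF01609"},
    "genome_C": {"PF09487", "PF01609", "PF09386"},
}

genomes, domains, matrix = presence_absence_matrix(toy_annotations)
for genome, row in zip(genomes, matrix):
    print(genome, row)
```

Domain copy number is deliberately discarded here; only presence or absence is recorded, which matches the binary matrix described above.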

*Figure 5:* Binary domain presence/absence matrix for the *Xanthomonas* genus. Dark red indicates presence of a domain, light red indicates absence of a domain. X-axis represents the different domains, y-axis the different genomes. The colored bar at left indicates the species of the genome in that row.

*Figure 6:* Principal component analysis of all obtained *Xanthomonas* genomes based on domain presence/absence.

Curation Of The Pathogenicity Dataset

A model will only ever be as good as the data it is built on. Because of the importance of the model, it is vital that a high-quality dataset is available. In the case of Xanthomonas there is no centralized database or any other publicly accessible resource that holds a large amount of information on pathogenicity. This created the need to manually curate our own dataset from literature reports. Ideally, this dataset would be backed up by our own experimental data, but the safety risks involved in handling a large number of pathogenic bacteria and the time-consuming nature of pathogenicity assays led to the decision to focus solely on computational research. Curation of the dataset was very strict, using only data that was backed up by experimental confirmation of (non-)pathogenicity. If conflicting evidence existed for a strain, it was excluded from the dataset altogether. This also meant that weakly pathogenic strains were excluded because of the ambiguity in their classification. Genome availability of the selected strains was confirmed using the previously described database. This resulted in a pathogenicity dataset consisting of 104 experimentally verified and publicly available genomes [5, 15–27].
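The curation rules described above (experimental confirmation required, conflicting strains dropped, weak pathogens dropped) can be expressed as a small filter. The sketch below is illustrative only: curation was done manually from the literature, and the record layout, label strings and strain names are assumptions made for the example.

```python
from typing import Dict, Iterable, Set, Tuple

ACCEPTED_LABELS = {"pathogen", "non-pathogen"}


def curate(reports: Iterable[Tuple[str, str, bool]]) -> Dict[str, str]:
    """Apply the strict curation rules to literature reports.

    Each report is (strain, label, experimentally_verified).
    A strain is kept only if all of its verified reports agree on one
    accepted label; ambiguous labels such as "weak pathogen" are dropped.
    """
    labels_per_strain: Dict[str, Set[str]] = {}
    for strain, label, verified in reports:
        if verified:  # ignore reports without experimental confirmation
            labels_per_strain.setdefault(strain, set()).add(label)

    curated = {}
    for strain, labels in labels_per_strain.items():
        if len(labels) == 1:  # no conflicting evidence
            (label,) = labels
            if label in ACCEPTED_LABELS:
                curated[strain] = label
    return curated


# Hypothetical reports, not taken from the actual dataset.
toy_reports = [
    ("strain_1", "non-pathogen", True),
    ("strain_2", "pathogen", True),
    ("strain_2", "non-pathogen", True),   # conflicting evidence: excluded
    ("strain_3", "weak pathogen", True),  # ambiguous class: excluded
    ("strain_4", "pathogen", False),      # not experimentally verified: excluded
    ("strain_5", "pathogen", True),
]
print(curate(toy_reports))
```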

Figure 7: Overview of the distribution of the pathogenicity dataset over the total Xanthomonas database. The PCA was generated from the domain presence/absence matrix. Genomes shown in gray are not included in the pathogenicity dataset. This figure can be compared to Figure 6 to get the species distribution.

A deeper look at the dataset

The dataset is composed of 70 pathogens and 34 non-pathogens. This imbalance is caused by the extreme over-representation of pathogenic strains in literature. The list of non-pathogens should be close to exhaustive at the time of writing. It is dominated by the X. arboricola species (24 out of 34 genomes), as this is by far the most represented species in literature on non-pathogenicity. The list of pathogens is by no means exhaustive, but was selected to reflect the diversity of the genus and to counterbalance the predominance of X. arboricola among the non-pathogens by including an equal number of pathogenic X. arboricola genomes. The PCA-plot in Figure 7 gives an overview of the distribution of the pathogenicity dataset over the previously described database of all Xanthomonas genomes. Figure 8 better shows the distribution within the curated dataset and Figure 9 shows the corresponding domain presence/absence matrix. It is interesting to note that the X. arboricola strains are scattered all over the plot, with the non-pathogenic strains clustering near the center-left. The pathogenic strains are more spread out over the PCA, but generally do separate well from the non-pathogens.

(a) Colored by pathogenicity (b) Colored by species
Figure 8: PCA-plot of the domain presence/absence matrix of the pathogenicity dataset.

Pan & core domainome analysis

For a good model it is important that predictions do not change massively upon the introduction of new data or genomes. For a model to be able to capture all of the required information, the dataset itself should contain most of the domains present in the genus. One way to estimate whether this is the case is to check the openness of the pan domainome. This was done by performing a Heaps' law analysis on the domain presence/absence matrix of the pathogenicity dataset using the "micropan" package [28]. The analysis yielded an alpha value of 1.11; an alpha above 1 indicates that the pan domainome is indeed closed. To analyze how the number of genomes affects the size of the core and pan domainome, a 10-fold random sampling was performed on a range of sample sizes between 1 and 104 at a 5-sample interval. Results are shown in Figure 10. The estimated core and pan domainome sizes were calculated using "micropan", whilst the observed pan and core domainome sizes were inferred from the matrix directly. The estimated core domainome size seems to approach 1000 domains. The pan domainome still appears to be growing, but its closedness is reflected by the fact that the estimated and observed pan domainome sizes are approaching each other.
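The observed pan and core domainome sizes for a given sample size can be computed directly from the domain sets, as in the following Python sketch. It covers only the "observed" curves; the estimated sizes and the Heaps' law alpha come from "micropan" and are not reproduced here. Function names, the seed and the toy genomes are assumptions for illustration.

```python
import random
from typing import List, Sequence, Set, Tuple


def pan_core_sizes(genomes: Sequence[Set[str]]) -> Tuple[int, int]:
    """Pan size = domains found in any genome; core size = domains found in all."""
    pan = set().union(*genomes)
    core = set.intersection(*genomes)
    return len(pan), len(core)


def sampled_pan_core(
    genomes: List[Set[str]], sample_size: int, repeats: int = 10, seed: int = 0
) -> Tuple[float, float]:
    """Mean observed pan and core size over `repeats` random subsets of genomes."""
    rng = random.Random(seed)
    sizes = [pan_core_sizes(rng.sample(genomes, sample_size)) for _ in range(repeats)]
    mean_pan = sum(pan for pan, _ in sizes) / repeats
    mean_core = sum(core for _, core in sizes) / repeats
    return mean_pan, mean_core


# Toy genomes, each represented by the set of domains it contains (hypothetical).
toy_genomes = [{"a", "b", "c"}, {"a", "b"}, {"a", "d"}]

for n in range(1, len(toy_genomes) + 1):
    print(n, sampled_pan_core(toy_genomes, n))
```

As more genomes are sampled, the pan size can only grow and the core size can only shrink, which is the behavior summarized in Figure 10.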

Figure 10: The effect of sample size on the true/estimated pan and core domainome size. Error bars are estimated with 10-fold random sampling and show standard error. The pan domainome consists of all domains found within a group; the core domainome consists only of the domains that are shared among all members of the group.

Model Building

The final piece of the puzzle is a suitable classifier. The pathogenicity dataset is a very "wide" dataset, at 104 samples and 2241 features, making it prone to over-fitting. A Random Forest model [29] was chosen because it is one of the top-performing models on bioinformatics datasets across the board [30] and on pathogenicity data [31], has great out-of-the-box performance, doing well without any hyperparameter tuning [32], and is naturally resilient against over-fitting [29].

How we built the model

The model was implemented using R's "randomForest" package [33] for model building and "caret" [34] for model testing and optimization. The model parameter "ntree" was varied between 1,000 and 10,000 at intervals of 1,000 and "mtry" between 50 and 150 at intervals of 5 to obtain the best-performing parameter set. In this analysis non-pathogens were regarded as "positives" and pathogens as "negatives". The objective of maximizing specificity was selected to optimize performance on predicting pathogens, as misclassifying a pathogen as a non-pathogen is far more damaging to the safety of our project. Each parameter set was benchmarked using 10-fold cross-validation. The parameter "ntree" had no impact on model performance and was set to 1,000 to reduce computation time; the optimal "mtry" was 100. The optimized model was assessed using 100 times repeated 5-fold cross-validation (Figure 11), in which the imbalance in the dataset was corrected with down-sampling to yield a balanced training set. Model performance was assessed based on the sensitivity and specificity on the test set. These metrics were selected because they are not distorted by the imbalance in the dataset. Model performance was estimated at an average sensitivity of 0.90 ± 0.12 (SD) and an average specificity of 0.81 ± 0.11 (SD) (Figure 12). This indicates that the model performed well on both groups of data, although the variability between models is moderately high. The model performed better at classifying non-pathogens than pathogens, even though parameters were optimized to maximize performance on pathogens.
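To make the metric definitions and the down-sampling step concrete, here is a minimal Python sketch. It is not the R/caret code used for the analysis: the function names, label strings and seed are assumptions, and no model is trained. Only the convention (non-pathogens as positives) and the 34/70 class sizes come from the text.

```python
import random
from typing import List, Sequence, Tuple

POSITIVE = "non-pathogen"  # convention used in the text
NEGATIVE = "pathogen"


def sensitivity_specificity(
    truth: Sequence[str], predicted: Sequence[str]
) -> Tuple[float, float]:
    """Sensitivity = fraction of non-pathogens predicted as non-pathogen.
    Specificity = fraction of pathogens predicted as pathogen.
    Assumes both classes occur in `truth`.
    """
    pairs = list(zip(truth, predicted))
    tp = sum(t == POSITIVE and p == POSITIVE for t, p in pairs)
    fn = sum(t == POSITIVE and p == NEGATIVE for t, p in pairs)
    tn = sum(t == NEGATIVE and p == NEGATIVE for t, p in pairs)
    fp = sum(t == NEGATIVE and p == POSITIVE for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


def downsample_indices(labels: Sequence[str], seed: int = 0) -> List[int]:
    """Return indices of a class-balanced subset by down-sampling the larger class."""
    rng = random.Random(seed)
    positives = [i for i, label in enumerate(labels) if label == POSITIVE]
    negatives = [i for i, label in enumerate(labels) if label == NEGATIVE]
    n = min(len(positives), len(negatives))
    return sorted(rng.sample(positives, n) + rng.sample(negatives, n))


# Small worked example: 3 of 4 non-pathogens and 4 of 5 pathogens predicted correctly.
truth = [POSITIVE] * 4 + [NEGATIVE] * 5
predicted = [POSITIVE] * 3 + [NEGATIVE] + [NEGATIVE] * 4 + [POSITIVE]
print(sensitivity_specificity(truth, predicted))

# Class sizes of the curated dataset: 34 non-pathogens and 70 pathogens.
dataset_labels = [POSITIVE] * 34 + [NEGATIVE] * 70
balanced = downsample_indices(dataset_labels)
print(len(balanced))
```

Because each metric is computed within a single class, neither changes when the class ratio of the test set changes, which is why they suit an imbalanced dataset.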

Figure 12: Box-plots of the sensitivity (prediction rate of non-pathogens) and specificity (prediction rate of pathogens) of the 100 calculated models.

Figure 13: PCA-plot of the average model predictions. Samples that were mislabeled in more than 5% of the cases are designated as "missed" and are symbolized by a crossed-out dot.

Figure 13b: Dendrogram with prediction accuracy

Figure 13b: Dendrogram based on complete linkage using the binary distance metric on the domain presence matrix. Red = pathogenic, Blue = non-pathogenic. The (x) indicates that the strain is mispredicted by more than 5% of the models.

Determining prediction accuracy

To obtain a measure for the prediction accuracy of individual samples, the model building procedure was repeated 100 times with a bootstrapped dataset. From the bootstrap results, the 90% bias-corrected confidence interval was calculated as a measure of prediction variability. Samples that were incorrectly predicted in 5 or more percent of the models are visualized in Figure 13, where we can observe that there are two clusters of non-pathogens that can be reliably predicted. Pathogens close to these clusters are often mislabeled, which can mean either that these samples are incorrectly annotated or that some part of the complexity is not captured by the model or the dataset. By minimizing the calculated variability and false prediction rate, we can select a candidate strain for the PDB that is reliably separated from the pathogens. The list of candidates is composed of all non-pathogens whose lower confidence bound (5% CI) lies above 95% prediction accuracy and is given in Table 1.
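The idea of a bootstrap interval on per-sample prediction accuracy can be illustrated as follows. Note the simplifications: this sketch uses a plain percentile interval rather than the bias-corrected variant mentioned above, and it resamples a list of per-model outcomes for one strain, which is an assumption about how the interval could be computed. The function names, number of resamples and seed are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def nearest_rank_percentile(sorted_values: Sequence[float], q: float) -> float:
    """Nearest-rank percentile of an ascending list, with q between 0 and 100."""
    rank = max(1, math.ceil(q / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def bootstrap_accuracy_interval(
    outcomes: List[int], n_boot: int = 1000, seed: int = 0
) -> Tuple[float, float]:
    """90% percentile interval (5th and 95th percentile) of accuracy in percent.

    `outcomes` holds 1 for every model that predicted the strain correctly
    and 0 for every model that did not.
    """
    rng = random.Random(seed)
    n = len(outcomes)
    accuracies = sorted(
        100 * sum(rng.choices(outcomes, k=n)) / n for _ in range(n_boot)
    )
    return (
        nearest_rank_percentile(accuracies, 5),
        nearest_rank_percentile(accuracies, 95),
    )


# A strain predicted correctly by all 100 models has a degenerate interval.
print(bootstrap_accuracy_interval([1] * 100))

# A strain predicted correctly by 90 of 100 models gets a wider interval.
lower, upper = bootstrap_accuracy_interval([1] * 90 + [0] * 10)
print(lower, upper)
```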

Table 1: Overview of all Xanthomonas strains with a lower confidence bound (5% CI) of over 95%. This group forms the list of candidates for our PDB. The strains tagged "Xanthomonas" are reported by the authors of their source to belong to a new species called X. sontii.
Species	Strain	Source	Accuracy (%)	5% CI	95% CI
X. arboricola	3004	[19]	100	100.00	100.00
X. arboricola	CFBP 7614	[22]	100	100.00	100.00
X. arboricola	CFBP 7629	[17]	100	100.00	100.00
X. arboricola	CFBP 7634	[17]	100	100.00	100.00
X. arboricola	CFBP 7653	[17]	100	100.00	100.00
X. arboricola	CFBP 8130	[22]	100	100.00	100.00
X. arboricola	CFBP 8138	[22]	100	100.00	100.00
X. arboricola	CFBP 8147	[22]	100	100.00	100.00
X. arboricola	CFBP 8149	[22]	100	100.00	100.00
X. arboricola	CFBP 8150	[22]	100	100.00	100.00
X. arboricola	CFBP7604	[22]	100	100.00	100.00
X. arboricola	CFBP7697	[22]	100	100.00	100.00
X. arboricola	CITA 124	[20]	100	100.00	100.00
X. arboricola	CITA 44	[20]	100	100.00	100.00
Xanthomonas	SHU 308	[16]	100	100.00	100.00
X. arboricola	CFBP 8153	[22]	100	99.94	100.00
X. arboricola	CFBP 1022	[17]	100	99.00	100.00
X. arboricola	CFBP 7610	[22]	100	99.00	100.00
X. arboricola	CFBP 7622	[22]	100	99.00	100.00
X. arboricola	CFBP 7652	[17]	100	99.00	100.00
X. arboricola	CFBP 8132	[22]	100	99.00	100.00
X. arboricola	CFBP 8152	[22]	100	99.00	100.00
X. sacchari	R1	[18]	99	99.00	99.00
Xanthomonas	SHU 166	[16]	100	99.00	100.00
Xanthomonas	SHU 199	[16]	99	99.00	100.00
X. arboricola	CFBP 7645	[17]	100	98.00	100.00
X. sacchari	PPL1	[15]	99	98.00	100.00
X. sacchari	PPL2	[15]	99	98.00	100.00
X. arboricola	CFBP 8142	[22]	100	97.98	100.00
X. arboricola	CFBP 7651	[17]	100	95.74	100.00

From this list, X. arboricola strain CITA 44 was selected as the most promising candidate: its confidence interval lies entirely at 100%, it lacks the type III secretion system (shown later in this subproject to be central to pathogenicity), and it has the most rigorous experimental data backing it up, as it was tested on five different hosts [20].

Samples with a lower confidence bound (5% CI) below 50% were regarded as "hard to predict" and are displayed in Table 2. The model seems to be biased towards labeling X. arboricola strains as non-pathogenic: the non-pathogenic strains are all predicted correctly, but their pathogenic counterparts are often mislabeled. Again, this could also mean that these pathogenic strains are incorrectly annotated, but only in planta experiments can provide more conclusive evidence for such claims.
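The two thresholds used for Table 1 and Table 2 amount to a simple rule on the lower confidence bound, sketched here in Python. The function name and category strings are hypothetical; the thresholds and the four example rows are taken from the text and tables.

```python
def categorize(label: str, lower_bound: float) -> str:
    """Assign a strain to a category based on its 5% CI bound (in percent)."""
    if lower_bound < 50:
        return "hard to predict"  # rule behind Table 2, applies to both classes
    if label == "non-pathogen" and lower_bound > 95:
        return "PDB candidate"    # rule behind Table 1, non-pathogens only
    return "other"


# (strain, label, 5% CI) rows copied from Table 1 and Table 2.
rows = [
    ("CITA 44", "non-pathogen", 100.00),
    ("CFBP 7651", "non-pathogen", 95.74),
    ("M97", "non-pathogen", 0.00),
    ("CFBP 6771", "pathogen", 0.00),
]
for strain, label, lower in rows:
    print(strain, "->", categorize(label, lower))
```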

Table 2: Samples difficult to predict

Table 2: Samples that were predicted with a lower bound (5% CI) of less than 50%. These are regarded as "difficult" to predict.

(a) Non-pathogens
Species	Strain	Source	Accuracy (%)	5% CI	95% CI
X. translucens	CS2	[17]	0	0.00	1.00
X. maliensis	M97	[23]	2	0.00	20.47
X. axonopodis	NCPPB 1159	[21]	16	7.00	31.00
X. axonopodis	ORST4	[21]	48	26.75	73.77

(b) Pathogens
Species	Strain	Source	Accuracy (%)	5% CI	95% CI
X. arboricola	CFBP 6771	[22]	0	0.00	0.00
X. arboricola	CFBP6827	[22]	0	0.00	0.00
X. arboricola	CFBP 7410	[22]	0	0.00	1.00
X. arboricola	NCPPB 1832	[19]	0	0.00	1.00
X. arboricola	NCPPB 1630	[19]	0	0.00	2.00
X. arboricola	CITA 14	[20]	1	0.00	9.00
X. arboricola	CFBP 3122	[22]	3	0.00	26.00
X. arboricola	CFBP3123	[22]	9	1.00	38.95
X. arboricola	CFBP 7407	[22]	26	1.78	76.00
X. pisi	CFBP4643	[22]	50	10.91	84.40
X. axonopodis	CFBP1851	[21]	16	13.48	22.00
X. theicola	CFBP 4691	[22]	50	17.53	74.11
X. albilineans	CFBP2523	[22]	59	21.00	89.00
X. axonopodis	ORST17	[21]	49	32.00	56.23
X. sacchari	NCPPB 4393	[24]	36	36.00	36.00
X. hyacinthi	CFBP1156	[22]	79	36.95	85.25
X. melonis	CFBP4644	[22]	81	40.00	98.00
X. sacchari	CFBP4641	[22]	43	43.00	43.00

Learning from machine learning

Not only can we use the model to select a list of candidates, but we can also study the most important predictors of the model. This set of predictors might hold new information about the biology of Xanthomonas pathogenicity. Random Forest is an ensemble method, meaning that we cannot directly interpret the model. One thing we can do is estimate the importance of a variable (here, a domain) by observing the mean decrease in accuracy when the given domain is left out of the model. The mean decrease of the 50 most important domains is plotted in Figure 14. The mean decrease in accuracy has no directionality, making it impossible to assess whether a domain is important for either pathogens or non-pathogens. In an attempt to visualize the propensity of the domains, heatmaps of the top 30 domains were created (Figures 15 and 16). The heatmaps show that no single domain is uniquely present in either of the groups, but most domains do show a clear difference in abundance between the two groups. A summary of the top 30 domains is given in Table 3.
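Since the importance score has no direction, one way to recover the propensity of a domain is to compare how often it occurs in each group of genomes. The Python sketch below shows this for a single domain. The 0.5 cut-off, the function names and the toy genomes are assumptions for illustration; the source only states that the propensity lists the groups in which a domain is most commonly found.

```python
from collections import defaultdict
from typing import Dict, List


def group_prevalence(
    presence: Dict[str, bool], groups: Dict[str, int]
) -> Dict[int, float]:
    """Fraction of genomes in each group that carry the domain."""
    carriers: Dict[int, int] = defaultdict(int)
    totals: Dict[int, int] = defaultdict(int)
    for genome, group in groups.items():
        totals[group] += 1
        if presence.get(genome, False):
            carriers[group] += 1
    return {group: carriers[group] / totals[group] for group in sorted(totals)}


def propensity(prevalence: Dict[int, float], threshold: float = 0.5) -> List[int]:
    """Groups in which at least `threshold` of the genomes carry the domain."""
    return [group for group, fraction in prevalence.items() if fraction >= threshold]


# Hypothetical genomes assigned to groups 1, 3 and 4 (group numbering as in Table 3).
toy_groups = {"g1": 1, "g2": 1, "g3": 3, "g4": 3, "g5": 4, "g6": 4}
# Presence of one hypothetical domain; genomes not listed lack the domain.
toy_presence = {"g3": True, "g4": True, "g5": True}

prevalence = group_prevalence(toy_presence, toy_groups)
print(prevalence)
print(propensity(prevalence))
```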

*Figure 14:* Top 30 most predictive Pfam domains for delineating pathogens from non-pathogens in *Xanthomonas*, based on the scaled mean decrease in accuracy.

Figures 15 & 16: Domain presence matrices

Figure 15: Heat-map of domain presence/absence of the top 30 most important Pfam domains. Domains are ordered from most to least important. Dark red indicates presence of a domain, light red indicates absence of a domain. X-axis represents the different domains, y-axis the different genomes.

Figure 16: Heat-map of domain presence/absence of the top 30 most important Pfam domains. The domains and samples are clustered using complete linkage and the binary distance metric. Dark red indicates presence of a domain, light red indicates absence of a domain. X-axis represents the different domains, y-axis the different genomes.

Table 3: Most important domains

*Table 3:* Top 30 Pfam domains and their description. Propensity contains the groups in which the domain is most commonly found. Groups: 1 = non-pathogenic *X. arboricola,* 2 = non-pathogens excluding *X. arboricola,* 3 = pathogenic *X. arboricola,* 4 = pathogens excluding *X. arboricola*
Domain	Description	Propensity
PF01845	Toxin CcdB	4
PF07362	Post-segregation antitoxin CcdA	4
PF09487	Type III secretion protein HrpB2	3, 4
PF01609	Transposase; IS4-like	2, 3, 4
PF05394	Avirulence B/C	3, 4
PF07532	Bacterial Ig-related	2, 3, 4
PF09838	Protein of unknown function DUF2065	-
PF09483	Type III secretion protein HpaP	3, 4
PF09502	Type III secretion protein HrpB4	3, 4
PF01548	Transposase; IS111A/IS1328/IS1533; N-terminal	2, 3, 4
PF05426	Alginate lyase domain	1, 2, 3
PF01095	Pectinesterase; catalytic	1, 2, 4
PF13855	Leucine-rich repeat	3, 4
PF09486	Type III secretion protein HrpB7	3, 4
PF02638	Glycosyl hydrolase-like 10	4
PF09386	Antitoxin ParD	1
PF09613	Type III secretion system; HrpB1/HrpK	3, 4
PF02371	Transposase; IS116/IS110/IS902	2, 3, 4
PF17263	Protein of unknown function DUF5329	1, 2, 3
PF05621	Bacterial TniB	3
PF05932	Tir chaperone protein (CesT) family	3, 4
PF05015	Toxin HigB-1	2, 3, 4
PF03412	Peptidase C39; bacteriocin processing	2, 3, 4
PF09286	Peptidase S53; activation domain	2, 3, 4
PF02498	BRO N-terminal domain	3, 4
PF09907	Toxin-antitoxin system; toxin component; HigB; putative	-
PF10899	Abortive phage resistance protein AbiGi	3
PF07638	RNA polymerase sigma-70 ECF-like	1, 2, 3
PF13438	Domain of unknown function DUF4113	1
PF17784	Sulfotransferase; S. mansonii-type	2, 3, 4

From this list of most predictive domains and their propensity we can make the following important observations:

Two parts of the Ccd toxin-antitoxin system seem to be crucial for pathogens outside of the X. arboricola species. This system is known as a plasmid retention system [35] and might be linked to a plasmid important for pathogenicity that could have ended up in the genome, either as an assembly error or as a real biological phenomenon.
Amongst the top-ranking domains are three domains related to transposases. Transposases are often found to flank pathogenicity islands in genomes of Xanthomonas and X. fastidiosa [36] and could be a more reliable predictor than any of the genes found in these islands.
Type III secretion related domains make up the most abundant class of domains, with five unique domains. Type III secretion proteins HrpB1/HrpK, HrpB2, and HrpB4 are all part of a set of secretion proteins that are essential but non-conserved in X. campestris [37]. HrpB1/HrpK and HrpB4 are membrane-bound proteins, in contrast to HrpB2, which is secreted. HrpB2 possibly acts as a translocator of effectors into the host cell. Both HrpB7 and HpaP serve an unknown function but are commonly found in type III secretion operons. Avirulence genes are common effectors secreted by the type III secretion system and are used to suppress plant immune responses. Certain plants have evolved hypersensitive responses against these avirulence genes, allowing the plant to detect and respond to the pathogen with an effector-triggered immunity response [38]. Interestingly, the avirulence domain has a propensity towards pathogens. This could mean that the domain does not cause a hypersensitive response and is actually important for virulence.
The bacterial Ig-related fold is commonly found on the outside of bacteria and is a target for receptors that trigger immune responses against these bacteria in mammals. The function of this fold is still unknown [39].
Alginate degradation has not been linked to pathogenicity or non-pathogenicity in Xanthomonas, but for Flavobacterium there are reports of alginate as an anti-microbial compound, protecting the bacterium and its plant host from pathogens [40].
The leucine-rich repeat (LRR) domain is most likely associated with a type III effector that can be used by pathogens to suppress the plant's defenses. Interestingly, one of the few LRRs known in Xanthomonas confers avirulence to X. campestris in A. thaliana [41].
Tir chaperone proteins are also involved in type III secretion and are a strong predictor of pathogenic potential in E. coli. These small cytosolic proteins serve to stabilize secreted effectors in a secretion-competent state [42].

Overall, non-pathogenicity seems to be dictated by a lack of the domains deemed important by the model, i.e. pathogenicity seems to be caused by a gain of function in domains that allow the bacterium to exploit its host. Proteins related to type III secretion are over-represented in the set of important predictors and have a propensity towards pathogenic bacteria. This class of proteins is already hypothesized to be important for pathogenicity in Xanthomonas [13], and their predictive power underlines the importance of understanding this system.

Data availability

All genomes used are available at the NCBI ftp database. Genomes were annotated using the SAPP platform and were uploaded to a GraphDB SPARQL database. The manually curated pathogenicity dataset is available here and the R Markdown version of the script that was used to generate the results is available here.

References
1. Ania M Cutiño-Jiménez, Marinalva Martins-Pinheiro, Wanessa C Lima, Alexander Martín-Tornet, Osleidys G Morales, and Carlos FM Menck. Evolutionary placement of xanthomonadales based on conserved protein signature sequences. Molecular phylogenetics and evolution, 54(2):524–534, 2010.
2. Luc Vauterin, Ping Yang, Anne Alvarez, Yuichi Takikawa, Don A Roth, Anne K Vidaver, Robert E Stall, Karel Kersters, and Jean Swings. Identification of non-pathogenic xanthomonas strains associated with plants. Systematic and applied microbiology, 19(1):96–105, 1996.
3. AR Poplawsky, SC Urban, and W Chun. Biological role of xanthomonadin pigments in xanthomonas campestris pv. campestris. Appl. Environ. Microbiol., 66(12):5123–5127, 2000.
4. Stephen J Ahern, Mayukh Das, Tushar Suvra Bhowmick, Ry Young, and Carlos F Gonzalez. Characterization of novel virulent broad-host-range phages of xylella fastidiosa and xanthomonas. Journal of bacteriology, 196(2):459–471, 2014.
5. Joachim Vandroemme, Bart Cottyn, Joël F Pothier, Valentin Pflüger, Brion Duffy, and Martine Maes. Xanthomonas arboricola pv. fragariae: what’s in a name? Plant Pathology, 62(5):1123–1131, 2013.
6. Patrizia Ferrante and Marco Scortichini. Xanthomonas arboricola pv. fragariae: a confirmation of the pathogenicity of the pathotype strain. European journal of plant pathology, 150(3):825–829, 2018.
7. Jasper J Koehorst, Jesse CJ van Dam, Edoardo Saccenti, Vitor AP Martins dos Santos, Maria Suarez-Diez, and Peter J Schaap. Sapp: functional genome annotation and analysis through a semantic framework using fair principles. Bioinformatics, 34(8):1401–1403, 2017.
8. Doug Hyatt, Gwo-Liang Chen, Philip F LoCascio, Miriam L Land, Frank W Larimer, and Loren J Hauser. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC bioinformatics, 11(1):119, 2010.
9. Sara El-Gebali, Jaina Mistry, Alex Bateman, Sean R Eddy, Aurélien Luciani, Simon C Potter, Matloob Qureshi, Lorna J Richardson, Gustavo A Salazar, Alfredo Smart, et al. The pfam protein families database in 2019. Nucleic acids research, 47(D1):D427–D432, 2018.
10. Philip Jones, David Binns, Hsin-Yu Chang, Matthew Fraser, Weizhong Li, Craig McAnulla, Hamish McWilliam, John Maslen, Alex Mitchell, Gift Nuka, et al. Interproscan 5: genome-scale protein function classification. Bioinformatics, 30(9):1236–1240, 2014.
11. Chris P Ponting and Robert R Russell. The natural history of protein domains. Annual review of biophysics and biomolecular structure, 31(1):45–71, 2002.
12. Jasper J Koehorst, Edoardo Saccenti, Peter J Schaap, Vitor AP Martins dos Santos, and Maria Suarez-Diez. Protein domain architectures provide a fast, efficient and scalable alternative to sequence-based methods for comparative functional genomics. F1000Research, 5, 2016.
13. Robert P Ryan, Frank-Jörg Vorhölter, Neha Potnis, Jeffrey B Jones, Marie-Anne Van Sluys, Adam J Bogdanove, and J Maxwell Dow. Pathogenomics of xanthomonas: understanding bacterium–plant interactions. Nature Reviews Microbiology, 9(5):344, 2011.
14. Isabelle Pieretti, Monique Royer, Valérie Barbe, Sébastien Carrere, Ralf Koebnik, Stéphane Cociancich, Arnaud Couloux, Armelle Darrasse, Jérôme Gouzy, Marie-Agnès Jacques, et al. The complete genome sequence of xanthomonas albilineans provides new insights into the reductive genome evolution of the xylem-limited xanthomonadaceae. BMC genomics, 10(1):616, 2009.
15. Kanika Bansal, Amandeep Kaur, Samriti Midha, Sanjeet Kumar, Suresh Korpole, and Prabhu B Patil. Xanthomonas sontii sp. nov., a non-pathogenic bacterium isolated from healthy basmati rice (oryza sativa) seeds from india. bioRxiv, page 738047, 2019.
16. Kanika Bansal, Samriti Midha, Sanjeet Kumar, Amandeep Kaur, Ramesh V Sonti, and Prabhu B Patil. Ecological and evolutionary insights into pathogenic and non-pathogenic rice associated xanthomonas. bioRxiv, page 453373, 2019.
17. Salwa Essakhi, Sophie Cesbron, Marion Fischer-Le Saux, Sophie Bonneau, Marie-Agnès Jacques, and Charles Manceau. Phylogenetic and variable-number tandem-repeat analyses identify nonpathogenic xanthomonas arboricola lineages lacking the canonical type iii secretion system. Appl. Environ. Microbiol., 81(16):5395–5410, 2015.
18. Yunxia Fang, Haiyan Lin, Liwen Wu, Deyong Ren, Weijun Ye, Guojun Dong, Li Zhu, and Longbiao Guo. Genome sequence of xanthomonas sacchari r1, a biocontrol bacterium isolated from the rice seed. Journal of biotechnology, 206:77–78, 2015.
19. Jerson Garita-Cambronero, Ana Palacio-Bielsa, and Jaime Cubero. Xanthomonas arboricola pv. pruni, causal agent of bacterial spot of stone fruits and almond: its genomic and phenotypic characteristics in the x. arboricola species context. Molecular plant pathology, 19(9):2053–2065, 2018.
20. Jerson Garita-Cambronero, Ana Palacio-Bielsa, María M López, and Jaime Cubero. Pan-genomic analysis permits differentiation of virulent and non-virulent strains of xanthomonas arboricola that cohabit prunus spp. and elucidate bacterial virulence factors. Frontiers in microbiology, 8:573, 2017.
21. Carolina Gonzalez, Silvia Restrepo, Joe Tohme, and Valérie Verdier. Characterization of pathogenic and nonpathogenic strains of xanthomonas axonopodis pv. manihotis by pcr-based dna fingerprinting techniques. FEMS microbiology letters, 215(1):23–31, 2002.
22. Déborah Merda, Sophie Bonneau, Jean-François Guimbaud, Karine Durand, Chrystelle Brin, Tristan Boureau, Christophe Lemaire, Marie-Agnès Jacques, and Marion Fischer-Le Saux. Recombination-prone bacterial strains form a reservoir from which epidemic clones emerge in agroecosystems. Environmental microbiology reports, 8(5):572–581, 2016.
23. Lindsay R Triplett, Valérie Verdier, Tony Campillo, Cinzia Van Malderghem, Ilse Cleenwerck, Martine Maes, Loïc Deblais, Rene Corral, Ousmane Koita, Bart Cottyn, et al. Characterization of a novel clade of xanthomonas isolated from rice leaves in mali and proposal of xanthomonas maliensis sp. nov. Antonie Van Leeuwenhoek, 107(4):869–881, 2015.
24. G Karamura, Julian Smith, David Studholme, Jerome Kubiriba, and E Karamura. Comparative pathogenicity studies of the xanthomonas vasicola species on maize, sugarcane and banana. Afr. J. Plant Sci, 9:385–400, 2015.
25. Wei Qian, Yantao Jia, Shuang-Xi Ren, Yong-Qiang He, Jia-Xun Feng, Ling-Feng Lu, Qihong Sun, Ge Ying, Dong-Jie Tang, Hua Tang, et al. Comparative and functional genomic analyses of the pathogenicity of phytopathogen xanthomonas campestris pv. campestris. Genome research, 15(6):757–767, 2005.
26. David J Studholme, Eric Kemen, Daniel MacLean, Sebastian Schornack, Valente Aritua, rd Thwaites, Murray Grant, Julian Smith, and Jonathan DG Jones. Genome-wide sequencing data reveals virulence factors implicated in banana xanthomonas wilt. FEMS microbiology letters, 310(2):182–192, 2010.
27. Issa Wonni, Bart Cottyn, Liselot Detemmerman, S Dao, L Ouedraogo, S Sarra, C Tekete, S Poussier, R Corral, L Triplett, et al. Analysis of xanthomonas oryzae pv. oryzicola population in mali and burkina faso reveals a high level of genetic and pathogenic diversity. Phytopathology, 104(5):520–531, 2014.
28. Lars Snipen and Kristian Hovde Liland. micropan: an r-package for microbial pangenomics. BMC bioinformatics, 16(1):79, 2015.
29. Leo Breiman. Random forests. Machine learning, 45(1):5–32, 2001.
30. Randal S Olson, William La Cava, Zairah Mustahsan, Akshay Varik, and Jason H Moore. Data-driven advice for applying machine learning to bioinformatics problems. arXiv preprint arXiv:1708.05070, 2017.
31. Tjerko Kamminga, Jasper J Koehorst, Paul Vermeij, Simen-Jan Slagman, Vitor AP Martins dos Santos, Jetta JE Bijlsma, and Peter J Schaap. Persistence of functional protein domains in mycoplasma species and their role in host specificity and synthetic minimal life. Frontiers in cellular and infection microbiology, 7:31, 2017.
32. Raphael Couronné, Philipp Probst, and Anne-Laure Boulesteix. Random forest versus logistic regression: a large-scale benchmark experiment. BMC bioinformatics, 19(1):270, 2018.
33. Andy Liaw, Matthew Wiener, et al. Classification and regression by randomforest. R news, 2(3):18–22, 2002.
34. Max Kuhn. Caret: classification and regression training. Astrophysics Source Code Library, 2015.
35. Laurence Van Melderen, Philippe Bernard, and Martine Couturier. Lon-dependent proteolysis of ccda is the key control for activation of ccdb in plasmid-free segregant bacteria. Molecular microbiology, 11(6):1151–1157, 1994.
36. Claudia B Monteiro-Vitorello, Mariana C De Oliveira, Marcelo M Zerillo, Alessandro M Varani, Edwin Civerolo, and Marie-Anne Van Sluys. Xylella and xanthomonas mobil’omics. Omics: a journal of integrative biology, 9(2):146–159, 2005.
37. Ombeline Rossier, Guido Van den Ackerveken, and Ulla Bonas. Hrpb2 and hrpf from xanthomonas are type iii-secreted proteins and essential for pathogenicity and recognition by the host plant. Molecular microbiology, 38(4):828–838, 2000.
38. Yulei Shang, Xinyan Li, Haitao Cui, Ping He, Roger Thilmony, Satya Chintamanani, Julie Zwiesler-Vollick, Suresh Gopalan, Xiaoyan Tang, and Jian-Min Zhou. Rar1, a central player in plant immunity, is targeted by pseudomonas syringae effector avrb. Proceedings of the National Academy of Sciences, 103(50):19200–19205, 2006.
39. Qian Han, Ning Liu, Howard Robinson, Lin Cao, Changli Qian, Qianfu Wang, Lei Xie, Haizhen Ding, Qian Wang, Yongping Huang, et al. Biochemical characterization and crystal structure of a gh10 xylanase from termite gut bacteria reveal a novel structural feature and significance of its bacterial ig-like domain. Biotechnology and bioengineering, 110(12):3093–3103, 2013.
40. Q-D An, G-L Zhang, H-T Wu, Z-C Zhang, G-S Zheng, L Luan, Yoshiyuki Murata, and X Li. Alginate-deriving oligosaccharide production by alginase from newly isolated flavobacterium sp. lxa and its potential application in protection against pathogens. Journal of applied microbiology, 106(1):161–170, 2009.
41. Rong-Qi Xu, Servane Blanvillain, Jia-Xun Feng, Bo-Le Jiang, Xian-Zhen Li, Hong-Yu Wei, Thomas Kroj, Emmanuelle Lauber, Dominique Roby, Baoshan Chen, et al. Avracxcc8004, a type iii effector with a leucine-rich repeat domain from xanthomonas campestris pathovar campestris confers avirulence in vascular tissues of arabidopsis thaliana ecotype col-0. Journal of bacteriology, 190(1):343–355, 2008.
42. Robin M Delahay, Robert K Shaw, Simon J Elliott, James B Kaper, Stuart Knutton, and Gad Frankel. Functional analysis of the enteropathogenic escherichia coli type iii secretion system chaperone cest identifies domains that mediate substrate interactions. Molecular microbiology, 43(1):61–73, 2002.



Team:Wageningen UR/Results/Pathogenicity