Team:Stony Brook/Model

iGEM SBU 2019

Image processing is the use of algorithms to analyze digital images. Applications include classifying objects and images and recognizing patterns, and image processing is especially useful for analyzing systems that are difficult to analyze by hand. In our project, we used image processing and analysis to visualize and quantify the mottling and fluorescence in our leaves during our experiments.

One symptom of tobacco mosaic virus (TMV) infection is a mottled “mosaic” pattern on the leaf, seen as yellow or light green spots on the leaf surface (figure 1). Because these light areas have been shown to contain virus, while the dark green areas are resistant to it [1], we expected that if our gene reduced the amount of viral RNA, it would also reduce the amount of mottling on the leaf.

Figure 1. An example of mottling due to TMV on a leaf

The algorithm used to measure mottling was adapted from [2] and is summarized in figure 2. First, the leaf is photographed on a white background together with a standard of known area (step 1). Next, the image is binarized so that the leaf and the standard appear black, while the spots appear white (step 2). The image is then processed in two ways. First, the background is filled so that only the spots remain white (step 3a), and statistics such as the total area of all the spots, the number of spots, and the size of each spot are calculated. In parallel, the image is inverted and filled so that only the standard and the entire leaf appear white (step 3b), and the areas of the standard and the leaf are calculated. The standard is distinguished from the leaf by measuring the eccentricity of each object and taking the object with the lower eccentricity as the standard. The leaf is then traced in green and the standard in magenta, so the user can check whether the program identified the objects correctly (step 3b). Finally, these statistics are used to display metrics about the leaf to the user, such as descriptive statistics (step 4) or a histogram of the spots (step 5).
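The mottling steps above can be sketched in Python. This is a minimal, pure-Python illustration, not the team's MATLAB code. It assumes the photo is already a 2-D list of grayscale values, that a single hypothetical `threshold` separates light pixels (background and spots) from dark ones (leaf and standard), and that the two largest dark objects are the leaf and the standard. Eccentricity is computed from the second central moments of each object's pixels, which closely approximates MATLAB's region eccentricity.

```python
from collections import deque
import math


def label_regions(mask):
    """Return the 4-connected regions of True pixels as lists of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(pixels)
    return regions


def eccentricity(pixels):
    """Eccentricity of the ellipse with the same second central moments."""
    n = len(pixels)
    my = sum(y for y, _ in pixels) / n
    mx = sum(x for _, x in pixels) / n
    syy = sum((y - my) ** 2 for y, _ in pixels) / n
    sxx = sum((x - mx) ** 2 for _, x in pixels) / n
    sxy = sum((y - my) * (x - mx) for y, x in pixels) / n
    half_trace = (sxx + syy) / 2
    root = math.sqrt(max(half_trace ** 2 - (sxx * syy - sxy ** 2), 0.0))
    major, minor = half_trace + root, half_trace - root  # covariance eigenvalues
    return math.sqrt(1 - minor / major) if major > 0 else 0.0


def analyze_mottling(gray, threshold, standard_area):
    rows, cols = len(gray), len(gray[0])
    # Step 2: binarize. True = light (background and spots), False = dark.
    light = [[value >= threshold for value in row] for row in gray]
    background, spots = set(), []
    for region in label_regions(light):
        if any(y in (0, rows - 1) or x in (0, cols - 1) for y, x in region):
            background.update(region)  # light region touching the image edge
        else:
            spots.append(region)       # Step 3a: enclosed light regions
    # Step 3b: everything that is not background = filled leaf and standard.
    filled = [[(r, c) not in background for c in range(cols)] for r in range(rows)]
    objects = sorted(label_regions(filled), key=len, reverse=True)[:2]
    standard, leaf = sorted(objects, key=eccentricity)  # lower eccentricity = standard
    area_per_pixel = standard_area / len(standard)      # calibrate with the standard
    spot_areas = [len(s) * area_per_pixel for s in spots]
    leaf_area = len(leaf) * area_per_pixel
    return {
        "leaf_area": leaf_area,
        "spot_count": len(spots),
        "spot_areas": spot_areas,
        "mottled_fraction": sum(spot_areas) / leaf_area,
    }
```

On a synthetic image with a 4x4-pixel standard of area 1.0, a 6x16-pixel leaf and one enclosed 2x2 light spot, this reports a leaf area of 6.0 and one spot of area 0.25. Sorting by eccentricity mirrors the rule in the text: a square standard has eccentricity near 0, while an elongated leaf is well above it.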

Figure 2. TMV mottling algorithm.

This algorithm was implemented in MATLAB. A GUI (figure 3) lets the user see each image as it is analyzed. The user can also switch the leaf and the standard if the program identified them incorrectly, and can download all the data and summary statistics as an Excel file. For large numbers of images, a MATLAB script applies this algorithm to every image in a specified folder. To download these scripts and functions, please visit our GitHub page.

Figure 3. Image of the MATLAB GUI that can be used to visualize the TMV mottling algorithm described above.

While using the program to measure the leaves, we became concerned about whether it measured leaf areas accurately. To address this, we measured the areas of 127 leaves by hand using graph paper (figure 4), then photographed the same leaves and measured their areas with the program.

Figure 4. Example of the graph paper used to calculate the true areas of the leaves by hand, for comparison with the areas calculated by the program.

First, we compared the eccentricities of the leaves with those of the standards (pieces of paper of known area) to check that eccentricity is a valid parameter for distinguishing the leaf from the standard. MATLAB code was used to compute the eccentricities and perform significance testing. The eccentricities of the standards $(\mu = 0.138995, \sigma = 0.044967)$ were significantly different from those of the leaves $(\mu = 0.580446, \sigma = 0.150514)$ $(t_{252} = -31.6696, p < 0.001)$, as illustrated by the histogram in figure 5. We therefore concluded that eccentricity is a valid parameter for distinguishing the leaf from the standard. For the few cases where it fails, a “switch objects” button recalculates the leaf statistics with the standard and the leaf switched.
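The reported statistic is consistent with a pooled-variance (Student's) two-sample t-test on 127 standards and 127 leaves, which has $127 + 127 - 2 = 252$ degrees of freedom. The sketch below assumes that test (the text does not name the exact MATLAB function) and recomputes $t$ from the summary statistics above; it gives approximately $t_{252} = -31.67$.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's two-sample t statistic with pooled variance, and its df."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


# Summary statistics reported above: standards first, then leaves.
t, df = pooled_t(0.138995, 0.044967, 127, 0.580446, 0.150514, 127)
print(f"t({df}) = {t:.4f}")
```

The negative sign simply reflects that the standards, listed first, have the lower mean eccentricity.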

Figure 5. A histogram plot created in MATLAB of the distribution of eccentricities of the leaves and of the standards for all the photos. n=127.

Next, the accuracy of the program was measured by calculating the relative error between the true areas (measured by hand with graph paper) and the program areas. As the unshifted plot shows (figure 6a), the program tended to overestimate the area, so its output needed to be corrected. A dilation shift was therefore applied to all data points. The shift was calculated using the formula \[\text{shift} = 1 - \frac{m\,\overline{r}}{\sum_{i=1}^{m} \frac{p_i}{t_i}}\] where $m$ is the number of images, $\overline{r}$ is the average relative error, and $\frac{p_i}{t_i}$ is the ratio of the program area to the true area for image $i$. For our data, the shift was $0.649253$. As the shifted plot shows (figure 6b), the shifted program areas align more closely with the true areas. A rank sum test confirmed that the distribution of shifted relative errors was not significantly different from a distribution with a mean of 0 $(U = 14001, p = 0.332)$. Therefore, all mottling programs multiply every area by the shift of $0.649253$ before output.
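Assuming $m$ is the number of images and the relative error is signed, $r_i = p_i/t_i - 1$, the shift formula simplifies: $m\overline{r} = \sum_i p_i/t_i - m$, so $\text{shift} = m / \sum_i p_i/t_i$, the reciprocal of the mean ratio of program area to true area. Multiplying every program area by this factor makes the mean relative error exactly zero, which is what the rank sum check tests. Under this reading, a shift of 0.649253 means the program areas were on average about 1.54 times the true areas. A small Python sketch (function names and data are hypothetical):

```python
def relative_errors(program, true):
    """Signed relative error of each program area against its true area."""
    return [(p - t) / t for p, t in zip(program, true)]


def dilation_shift(program, true):
    """shift = 1 - m * r_bar / sum(p_i / t_i), taking m = number of images."""
    m = len(program)
    r_bar = sum(relative_errors(program, true)) / m
    ratio_sum = sum(p / t for p, t in zip(program, true))
    return 1 - m * r_bar / ratio_sum


program_areas = [3.0, 5.0]   # hypothetical program areas
true_areas = [2.0, 4.0]      # hypothetical graph-paper areas
shift = dilation_shift(program_areas, true_areas)
shifted = [p * shift for p in program_areas]
print(shift, sum(relative_errors(shifted, true_areas)) / len(true_areas))
```

Here the mean ratio is 1.375, so the shift is 1/1.375, and the mean relative error of the shifted areas is zero up to rounding.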

Figure 6. Line plots of the relative errors of the program. Figure 6a (left) shows the unshifted data, and Figure 6b (right) shows the data multiplied by the shift of 0.649253. The red circles represent the individual data points, the blue line is the line of best fit, and the black dotted line is the reference line y = x (program area = true area). Figures were created in MATLAB.

The main use of image processing in our project was measuring fluorescence intensity in the leaves, which served as a quantitative measure of how strongly the virus (GFP) and our gene (RFP) were expressed. For more information about our results, see our results page.

The algorithm is summarized in figure 7. First, an image of the fluorescing leaf is taken (step 1). Next, the image is converted to grayscale based on the filter supplied (step 2). Then, if specified, standards are used to set the lowest and highest intensities (step 3a). The intensity values of the image are adjusted accordingly: values below the low intensity are set to 0, values above the high intensity are set to 255, and values in between are set to $255 \cdot \frac{v - s_o}{s_b - s_o}$, where $v$ is the pixel value, $s_o$ is the low intensity value, and $s_b$ is the high intensity value. Additionally, if a mask photo is specified, a mask is applied to find the border of the leaf, which can be used to find the percent of the leaf that is infected (step 3b). Then, the intensity of each region is measured in the grayscale image. A histogram of the intensity distribution of all the spots (step 4) and a heatmap showing where the spots are located and how intense they are (step 5) can also be created from the raw data.
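The intensity adjustment in step 3a and the masked measurement in step 3b can be sketched as follows. This is a hedged Python illustration, not the team's MATLAB code. It assumes the image has already been converted to a single grayscale channel (step 2), that the mask is a boolean grid of the same size marking leaf pixels, and that "percent infected" means the share of leaf pixels whose adjusted intensity exceeds a hypothetical `infected_cutoff`, since the text does not define it. The adjustment is a linear contrast stretch with clipping, similar in spirit to MATLAB's `imadjust`.

```python
def rescale(value, s_o, s_b):
    """Map a grayscale value to 0-255 using the low (s_o) and high (s_b) standards."""
    if value <= s_o:
        return 0.0
    if value >= s_b:
        return 255.0
    return 255 * (value - s_o) / (s_b - s_o)


def leaf_statistics(gray, mask, s_o, s_b, infected_cutoff):
    """Mean adjusted intensity inside the leaf mask, and percent of leaf pixels
    above a hypothetical infected_cutoff."""
    leaf_values = [rescale(v, s_o, s_b)
                   for row, mask_row in zip(gray, mask)
                   for v, inside in zip(row, mask_row) if inside]
    mean_intensity = sum(leaf_values) / len(leaf_values)
    infected = sum(v > infected_cutoff for v in leaf_values)
    return mean_intensity, 100 * infected / len(leaf_values)


gray = [[50, 100], [150, 0]]
mask = [[True, True], [True, False]]   # the bottom-right pixel is off the leaf
print(leaf_statistics(gray, mask, s_o=50, s_b=150, infected_cutoff=100))
```

Pixels at or below the low standard map to 0 and pixels at or above the high standard map to 255, so intensities from different photos become comparable once the same standards are used.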

Figure 7. Fluorescence intensity algorithm.

This algorithm was implemented in MATLAB. A GUI (figure 8) lets the user see each photo as it is analyzed. The user can also download all the graphs as MATLAB figure files and download all the data and statistics as an Excel file. For large numbers of images, a MATLAB script applies this algorithm to all images in a specified folder. To download these scripts and functions, please visit our GitHub page.

Figure 8. Image of the MATLAB GUI that can be used to visualize the fluorescence intensity algorithm described above.

We developed MATLAB code for analyzing mottling and fluorescence in leaves and used it to help quantify expression in our leaves during our experiments. All of our code is open source and available on our GitHub page for other teams to examine, use, and adapt for their own purposes.

References

[1] Burundukova, O.L., et al. “Dark and Light Green Tissues of Tobacco Leaves Systemically Infected with Tobacco Mosaic Virus.” Biologia Plantarum, Apr. 2007, https://link.springer.com/content/pdf/10.1007/s10535-009-0053-8.pdf

[2] Marathe, Hrushiketh, and Prerna Kothe. “Leaf Disease Detection Using Image Processing Techniques.” International Journal of Engineering Research & Technology, Mar. 2013, https://www.ijert.org/research/leaf-disease-detection-using-image-processing-techniques-IJERTV2IS3480.pdf

Our ordinary differential equation (ODE) models describe the effect of tobacco mosaic virus (TMV) on the tobacco plant Nicotiana benthamiana and the effect of XRN1 on the TMV-infected plant. We use two systems of equations: the first describes TMV infecting Nicotiana benthamiana, and the second describes XRN1 degrading TMV.

MATLAB was used to model the effect of TMV on Nicotiana benthamiana. TMV infects plant cells by releasing its RNA, which is translated by host ribosomes. This RNA codes for a capsid protein (CP), a movement protein (MP), and a replicase protein (p50), which together assemble progeny viruses or a virus replication complex (VRC). A VRC is a complex formed in the cell's endoplasmic reticulum (ER) that helps infect the plant by producing viruses and moving through the plasmodesmata (PD). We constructed the following biochemical reactions based on the nature of TMV infection, with the following assumptions:

Viruses are used to infect healthy plant cells whenever possible, so the number of free viruses is kept to a minimum until all cells are infected.
TMV proteins are used to create VRCs whenever possible, so their amount is kept to a minimum.

$C + mV \overset{k}{\rightarrow} I$
$I \overset{i}{\rightarrow} nV + I$
$I \overset{D_I}{\rightarrow} bV$
$V \overset{x}{\rightarrow} P$
$V + P \overset{p}{\rightarrow} W$
$W \overset{r}{\rightarrow} nV + W$
$W + C \overset{a}{\rightarrow} W + I$
$C + P \overset{h}{\rightarrow} 0$
$C \overset{D_C}{\rightarrow} 0$
$V \overset{D_V}{\rightarrow} 0$
$W \overset{D_V}{\rightarrow} 0$
$P \overset{D_P}{\rightarrow} 0$
$C \overset{g}{\rightarrow} 2C$

From the biochemical reactions, we can obtain the following ODE:

$dI = kCV^m - ID_I + aWC$
$dV = -mkCV^m + inI + bID_I - xV - pVP + rnW - VD_V$
$dC = -kCV^m - aWC + gC - hCP - CD_C$
$dW = pVP - WD_V$
$dP = xV - pVP - hCP - PD_P$

Initial conditions:

$C = 1 \times 10^7$
$V = 10$
All other values $= 0$

Constant	Description	Units	Value
$m$	Multiplicity of Infection of TMV	$cells^{-1}$	$2$
$k$	Rate of virus to infect a cell	$min^{-1}$	$0.167$
$i$	Rate of infected cell producing viruses	$min^{-1}$	$0.25$
$b$	Number of viruses that burst from infected cell	$-$	$20$
$n$	Number of viruses produced from VRC and infected cell	$-$	$4$
$D_I$	Death rate of infected cell	$min^{-1}$	$5 \times 10^{-5}$
$x$	Rate of virus to create CP and MP	$min^{-1}$	$0.4$
$p$	Rate at which CP and MP are used to make VRC	$min^{-1}$	$0.3$
$r$	Rate of VRC replication	$min^{-1}$	$2$
$a$	Affinity for the cell to take VRC and become infected	$-$	$0.167$
$h$	Rate of cell with TMV protein to undergo hypersensitive response (HR) death	$min^{-1}$	$1 \times 10^{-4}$
$D_C$	Death rate of healthy plant cells	$min^{-1}$	$4 \times 10^{-5}$
$D_V$	Degradation rate of TMV	$min^{-1}$	$0.1$
$D_P$	Degradation rate of TMV proteins	$min^{-1}$	$0.1$
$g$	Growth rate of plant	$meters \times min^{-1}$	$1.7 \times 10^{-5}$

Variable	Description	Units
$I$	Infected cells	$cells$
$V$	TMV	$molecules$
$C$	Healthy plant cells	$cells$
$W$	Virus replication complex	$molecules$
$P$	TMV Protein	$molecules$
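To make the mass-action bookkeeping explicit, the sketch below writes the right-hand side of the TMV system in Python with the parameter values from the table. It is an illustration, not the team's MATLAB code. One assumption to note: the virus equation here uses $-mkCV^m$ for the reaction $C + mV \overset{k}{\rightarrow} I$, the mass-action term for $m$ viruses consumed per infection. With these parameters the system is very stiff (at the initial conditions, $kCV^m \approx 1.7 \times 10^8$ per minute), so a stiff solver such as MATLAB's `ode15s` or SciPy's `solve_ivp` with `method="BDF"` is advisable for integrating it; the `(t, state)` signature below matches what `solve_ivp` expects.

```python
# Parameter values from the table above.
TMV_PARAMS = {
    "m": 2, "k": 0.167, "i": 0.25, "b": 20, "n": 4, "D_I": 5e-5,
    "x": 0.4, "p": 0.3, "r": 2, "a": 0.167, "h": 1e-4,
    "D_C": 4e-5, "D_V": 0.1, "D_P": 0.1, "g": 1.7e-5,
}


def tmv_rhs(t, state, q=TMV_PARAMS):
    """Time derivatives of (I, V, C, W, P); t is unused (autonomous system)."""
    I, V, C, W, P = state
    infection = q["k"] * C * V ** q["m"]          # C + mV -> I
    dI = infection - q["D_I"] * I + q["a"] * W * C
    dV = (-q["m"] * infection                     # m viruses used per infection
          + q["i"] * q["n"] * I                   # I -> nV + I
          + q["b"] * q["D_I"] * I                 # I -> bV (burst on death)
          - q["x"] * V                            # V -> P
          - q["p"] * V * P                        # V + P -> W
          + q["r"] * q["n"] * W                   # W -> nV + W
          - q["D_V"] * V)                         # V -> 0
    dC = (-infection - q["a"] * W * C + q["g"] * C
          - q["h"] * C * P - q["D_C"] * C)
    dW = q["p"] * V * P - q["D_V"] * W
    dP = q["x"] * V - q["p"] * V * P - q["h"] * C * P - q["D_P"] * P
    return [dI, dV, dC, dW, dP]


# Initial conditions from the text: C = 1e7, V = 10, everything else 0.
initial = [0.0, 10.0, 1e7, 0.0, 0.0]
print(tmv_rhs(0.0, initial))
```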

Figure 1. The first graph depicts the effects of TMV on N. benthamiana, and the second depicts the concentration of proteins produced from TMV during the infection. The number of cells in a leaf is assumed to be approximately 10 million. Dpi = days post infection.

Agrobacterium genetically engineered with XRN1 was agro-infiltrated into N. benthamiana to degrade TMV. The plant cells take up and express the XRN1 gene, allowing them to degrade TMV RNA. We constructed the following biochemical reactions based on the interaction of TMV and XRN1, with the following assumptions:

XRN1 proteins are used to cure infected cells whenever possible, so their amount is kept to a minimum
XRN1 is degraded after it degrades TMV
TMV stunts the growth of the plant
Agrobacterium with XRN1 is agro-infiltrated into the plant after the plant is completely infected by TMV

$C \overset{e}{\rightarrow} C + X$
$I + X \overset{c}{\rightarrow} C$
$X \overset{v}{\rightarrow} - V - W$
$I \overset{D_I}{\rightarrow} bV$
$X \overset{D_X}{\rightarrow} 0$
$C \overset{D_C}{\rightarrow} 0$
$V \overset{D_V}{\rightarrow} 0$
$W \overset{D_V}{\rightarrow} 0$

From the biochemical reactions, we can obtain the following ODE:

$dIX = -cIX - ID_I$
$dVX = -vX + bID_I - VD_V$
$dCX = cIX - CD_C$
$dWX = -vX - WD_V$
$dX = eC - XD_X - cIX - vX$

Initial conditions:

$I = 10{,}000$
$V = W = 21{,}000$
$X = 1$
$C = 0$

Constant	Description	Units	Value
$c$	Rate of curing infected cell	$min^{-1}$	$2$
$e$	Rate of healthy cell producing XRN1	$min^{-1}$	$2$
$v$	Rate of XRN1 degrading viruses	$min^{-1}$	$3$
$b$	Number of viruses that burst from infected cell	$-$	$20$
$D_I$	Death rate of infected cell	$min^{-1}$	$5 \times 10^{-5}$
$D_X$	Degradation rate of XRN1	$min^{-1}$	$0.1$
$D_C$	Death rate of healthy plant cells	$min^{-1}$	$4 \times 10^{-5}$
$D_V$	Degradation rate of TMV	$min^{-1}$	$0.1$

Variable	Description	Units
$dIX$	Infected cells	$cells$
$dVX$	TMV	$molecules$
$dCX$	Healthy plant cells	$cells$
$dWX$	Virus replication complex	$molecules$
$dX$	XRN1 Protein	$molecules$
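The XRN1 system can be written the same way. This Python sketch is an illustration rather than the team's MATLAB code. It follows the XRN1 equations term by term, reading $cIX$ as $c \cdot I \cdot X$ (the curing reaction consumes one XRN1 per cured cell) and $vX$ as the rate at which XRN1 removes TMV and VRCs, with the parameter values from the table.

```python
# Parameter values from the XRN1 table above.
XRN1_PARAMS = {"c": 2, "e": 2, "v": 3, "b": 20,
               "D_I": 5e-5, "D_X": 0.1, "D_C": 4e-5, "D_V": 0.1}


def xrn1_rhs(t, state, q=XRN1_PARAMS):
    """Time derivatives of (I, V, C, W, X) in the XRN1 model; t is unused."""
    I, V, C, W, X = state
    cure = q["c"] * I * X        # I + X -> C
    removal = q["v"] * X         # XRN1 removing TMV and VRCs
    dI = -cure - q["D_I"] * I
    dV = -removal + q["b"] * q["D_I"] * I - q["D_V"] * V
    dC = cure - q["D_C"] * C
    dW = -removal - q["D_V"] * W
    dX = q["e"] * C - q["D_X"] * X - cure - removal
    return [dI, dV, dC, dW, dX]


# Initial conditions from the text: I = 10,000, V = W = 21,000, X = 1, C = 0.
print(xrn1_rhs(0.0, [10_000.0, 21_000.0, 0.0, 21_000.0, 1.0]))
```

Because the $-vX$ terms in the $V$ and $W$ equations do not depend on $V$ or $W$, a numerical integration of these equations can push $V$ and $W$ below zero once they are depleted; clamping them at zero is one pragmatic option when simulating.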

Figure 2. The first graph depicts the effect of agrobacterium with XRN1 agro-infiltrated into a TMV-infected N. benthamiana, and the second depicts the concentration of XRN1 during the infiltration. The number of cells in a leaf is assumed to be approximately 10 million. Dpi = days post infection.


Phylogeny refers to the evolutionary history of a species or its genes. A phylogenetic analysis is a means of estimating the evolutionary relationships between species. Phylogenetics is important because it enriches our understanding of how species and their genes evolve in relation to one another.

XRN family proteins are 5'-3' exoribonucleases in eukaryotes. The XRN family has four homologs: XRN1, XRN2, XRN3, and XRN4. The homologs differ slightly in function and in where in the cell they are enzymatically active: XRN1 and XRN4 are metabolically active in the cytoplasm, whereas XRN2 and XRN3 are found in the nucleus.

To our knowledge, this is the first attempt to construct a phylogenetic tree for the XRN family: a literature review revealed no previous attempts to construct one, making this a new but promising area of research.

S. cerevisiae has XRN1 and XRN2 but not XRN3 or XRN4. Most plant species, however, lack XRN1 and instead have the XRN3 and XRN4 exoribonucleases. We did not find XRN1 in any of the Nicotiana species, but they did have XRN3 and XRN4. We did not find XRN3 or XRN4 for the species we are genetically modifying (Nicotiana benthamiana), although this could be due to missing data in the genome database.

Why is the XRN1 gene found in S. cerevisiae but not in plants? Was the gene lost, or did it evolve into something new in plants? Three possible scenarios are outlined below (the cladograms, from left to right, correspond to scenarios 1 to 3).

Scenario 1: An ancient ancestor of plants possessed XRN1, but during the evolution of modern plants the gene changed so much that it is now unrecognizable.
Scenario 2: An ancient ancestor possessed XRN1, but the plant lineage lost it.
Scenario 3: The ancient ancestor did not have XRN1, but the ancestor of fungi and animals acquired the gene.

To try to answer these evolutionary questions, we constructed a phylogenetic tree of the XRN protein homologs. Ideally, the data would show a clear relationship pointing to one of the scenarios. In practice, phylogenetic data often do not show such a clear trend, and this was the case for our project, so we cannot determine which scenario occurred. Instead, our goal was to provide evolutionary context for the XRN1 protein to aid our understanding of the main project and its feasibility.

Hypothesis: Based on the location of enzymatic activity in the cell, we hypothesized that XRN1 and XRN4 would be more evolutionarily related since they are both cytoplasmic enzymes. Similarly, we hypothesized that XRN2 and XRN3 would also be more evolutionarily related because they are active in the nucleus.

The species for our phylogenetic tree were chosen based on the availability of their genome sequences and how well studied they are. This was crucial because including a species without a well-researched genome could skew our results: the protein database might be missing an XRN homolog that is actually present in the organism but absent from the database only because its DNA has not been thoroughly studied or documented. We intentionally limited the number of species, because adding more organisms cluttered the tree and made the evolutionary relationships between the XRN genes harder to see. The species used in our analysis are listed below.

Yeast:
Candida albicans
S. cerevisiae

Plant ancestors (alga):
Chlamydomonas reinhardtii

Fungi:
Schizosaccharomyces pombe
Neurospora crassa

Plants:
Arabidopsis thaliana
Selaginella moellendorffii
Zea mays
Oryza sativa
Nicotiana tabacum
Nicotiana tomentosiformis
Nicotiana sylvestris

Using NCBI BLAST, we searched each species for XRN1 homologs and downloaded their sequences as FASTA files. We then uploaded the FASTA file to Clustal to align the sequences. From this alignment, we characterized the domains of the XRN proteins. We identified two domains (XRN-N and XRN-M) present in most of the XRN homologs, as well as four domains found only in XRN1 (D1-D4).

XRN-N (N-terminus, position 1-230)
XRN-M (helical domain with catalytic core, position 272-680)
D1 (position 723-914)
D2 (position 918-1005)
D3 (position 1090-1157)
D4 (position 1163-1233)

The alignment also allowed us to identify and eliminate species whose protein sequences were not similar to XRN1. This highlights a limitation of databases, which sometimes mistakenly categorize a protein as XRN.
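Screening out sequences that are not similar to XRN1 can be made concrete with percent identity over aligned positions. The sketch below is illustrative only: the team compared sequences by inspecting the Clustal alignment, and the 25% cutoff, sequence names and fragments here are hypothetical.

```python
def percent_identity(a, b, gap="-"):
    """Percent of aligned columns (skipping columns with a gap) where a and b match."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    return 100 * sum(x == y for x, y in pairs) / len(pairs)


def flag_dissimilar(alignment, reference, cutoff=25.0):
    """Names of aligned sequences whose identity to `reference` is below `cutoff`."""
    ref_seq = alignment[reference]
    return sorted(name for name, seq in alignment.items()
                  if name != reference and percent_identity(seq, ref_seq) < cutoff)


toy_alignment = {                 # hypothetical aligned fragments
    "ScXRN1": "MGIPKFFRW",
    "homologA": "MGVPKFYRW",
    "mislabeled": "QT--LADSE",
}
print(flag_dissimilar(toy_alignment, "ScXRN1"))
```

In this toy alignment, `homologA` matches the reference at 7 of 9 positions and is kept, while `mislabeled` shares no residues with it and is flagged.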


The evolutionary history was inferred by using the Maximum Likelihood method and Le_Gascuel_2008 (LG+G+I) model with bootstrap (1000 replications). The tree with the highest log likelihood (-5899.25) is shown. The percentage of trees in which the associated taxa clustered together is shown next to the branches. Initial tree(s) for the heuristic search were obtained automatically by applying Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using a JTT model, and then selecting the topology with superior log likelihood value. A discrete Gamma distribution was used to model evolutionary rate differences among sites (5 categories (+G, parameter = 0.8197)). The rate variation model allowed for some sites to be evolutionarily invariable ([+I], 12.25% sites). The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. This analysis involved 26 amino acid sequences. All positions with less than 95% site coverage were eliminated, i.e., fewer than 5% alignment gaps, missing data, and ambiguous bases were allowed at any position (partial deletion option). There were a total of 302 positions in the final dataset.
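The partial-deletion rule described above keeps an alignment column only if at least 95% of the sequences have an unambiguous residue there. A minimal Python sketch of that filter, assuming gaps are written as `-` and missing or ambiguous residues as `?` or `X` (MEGA's own handling may differ in detail):

```python
def partial_deletion(alignment, min_coverage=0.95, missing="-?X"):
    """Drop alignment columns whose site coverage is below min_coverage.

    alignment: list of equal-length aligned sequences.
    Returns the filtered sequences and the indices of the kept columns.
    """
    n_seqs = len(alignment)
    kept = []
    for col in range(len(alignment[0])):
        covered = sum(seq[col] not in missing for seq in alignment)
        if covered / n_seqs >= min_coverage:
            kept.append(col)
    filtered = ["".join(seq[col] for col in kept) for seq in alignment]
    return filtered, kept


# 26 hypothetical sequences: column 0 has two gaps, column 1 has one.
seqs = ["AC"] * 24 + ["-C", "--"]
filtered, kept = partial_deletion(seqs)
print(kept)
```

With 26 sequences, as in this analysis, a column survives with at most one gap (25/26, about 96% coverage) but not with two (24/26, about 92%), so only column 1 is kept here.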

An asterisk (*) next to a species on the tree indicates that the database named the protein incorrectly.

The tree is shown in two formats: rectangular and radial. The radial tree is unrooted, which makes it technically more accurate for our purposes, but it is harder to read, so the rectangular tree is provided for reference. An unrooted tree is more conservative than the midpoint-rooted (rectangular) tree because it does not assume a root, and therefore does not imply the direction of time.
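Midpoint rooting, used for the rectangular tree, places the root halfway along the longest leaf-to-leaf path of the unrooted tree. The Python sketch below illustrates the idea on a small hypothetical tree; it is not how MEGA or the team computed their tree. It finds the two most distant leaves with two farthest-node searches (valid for trees with non-negative branch lengths) and then walks back along that path to the branch containing the midpoint.

```python
def farthest_from(tree, start):
    """Distances from `start`, parent pointers, and the farthest node."""
    dist, parent, stack = {start: 0.0}, {start: None}, [start]
    while stack:
        node = stack.pop()
        for nbr, length in tree[node].items():
            if nbr not in dist:
                dist[nbr] = dist[node] + length
                parent[nbr] = node
                stack.append(nbr)
    return max(dist, key=dist.get), dist, parent


def midpoint_root(tree):
    """Return (a, b, offset): the root lies on branch a-b, `offset` away from a."""
    leaf = next(node for node, nbrs in tree.items() if len(nbrs) == 1)
    u, _, _ = farthest_from(tree, leaf)          # one end of the longest path
    v, dist, parent = farthest_from(tree, u)     # the other end
    half = dist[v] / 2
    node = v
    while dist[parent[node]] > half:             # walk back toward u
        node = parent[node]
    a = parent[node]
    return a, node, half - dist[a]


# Hypothetical unrooted tree: {node: {neighbour: branch length}}.
toy_tree = {
    "A": {"X": 1.0}, "B": {"X": 1.0}, "C": {"Y": 5.0},
    "X": {"A": 1.0, "B": 1.0, "Y": 1.0},
    "Y": {"X": 1.0, "C": 5.0},
}
print(midpoint_root(toy_tree))
```

In this toy tree the longest path (C to A, length 7) has its midpoint 3.5 units from C, inside the long C-Y branch, so the root is placed there. Midpoint rooting assumes roughly clock-like evolution, which is why the unrooted view is the more conservative one.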

Using FastML, we derived the ancestral sequence of XRN1. We obtained ancestral sequences for all nodes on the tree, not just the most ancient common ancestor, and in this document the sequences of all the nodes are shown. Each node is labeled N followed by its number, and these node labels are marked on the tree above. N1 is the most ancient common ancestor.

N1 ancestral sequence: MGIPAFYRWLSEKYPLIISQVIEDEPIAIAGGGGATVPVDSSKPNPNGVEFDNLYLDMNGIIHPCSHPEDQLTHADIDRPSPTTEEEMFAAIFEYIDRLFRIVRPRKLLYMAIDGVAPRAKMNQQRSRRFRAAKDALDAEEEEERLRKQFEAEGREVRAPPKEESEVEVVVKKAFDSNCITPGTEFMAVLSEALQYYIHCKLNEDPGWQNIKVILSDANVPGEGEHKIMEFIRSQRSLPDYDPNTRHCLYGLDADLIMLGLATHEPHFSILREDVFYQNSGYGLSIESTTQPEKCYLCGQKGHLKERPWQAANCEGKAKRKRGEFDQKDELEPLPKKPFQFLHISVLREYLELELQEIPDPPKFKYDFERIIDDFIFICFFVGNDFLPHMPTLEIHEGAIDLLMSIYKQEFPNMGGYLTDMDKVKDKYNGHVNLSRVEHFIQALGKYEDKIFQKRARLHERQAKRIKRDKLKDEKKQRRRGDHRQLRQRPESLGPIARFSGSRLNQFPSPSPFQSNKPKLTSNHDAVDHANVSSLSDLNIQSENREDVDNQLPSSRSRSVDSSSTPADTLRVAETTAEEAAPPATLEDRMQEIKEELKKKQKASLREKSEVFKSDNEVTDKVKLWEEGWKDRYYEEKFKAHITPEEDEEVRRDVVQRYTEGLCWVLHYYYQGCPSWNWFYPYHYAPFASDLKDLGQELINEKIDIKFELGTPFKPFEQLMGVLPAASRHAIPECYRPLMSDENSPIIDFYPNDFEIDMNGKRFSWQGIALLPFIDERRLLAAMRKVEHTLTEEERRRNSVGKDILFVFASHYTHFYPSPLPGFFPDLEHNQCIEKEYKLPPMEGKEYRIGLCPGAKLGAFMLAGFPTFHTLPFKSELAYHEIKVFGHPSRNQSMILNVEDVWKTSDLTLEQFAQQYVNKQFYSKWPYLRECKLEPVMDEGYMYLAQKTNGYKPFGELIRPLSSQTKNLFNYERSTMIHKFDKQCGHKFGEMNLLGYQKPVPGLVRNHEGALVKKFSKSGLEYYPLQLVVKKYAGKDQRYKYRPPPPIEEEFPLDSKVFFLGDYAYGGPAKIVGYNDDKTKLGITWFKTQPGLEPNWGKERLNMDKNEVKYYPSYIVAKLLHLHPLLLSKITSQFMITDGTKRKVNIGLELKFEARHQKVLGYTRKSPKGKFWEFSNLTINLIKEYKNTFPGLFKKLTNHGNKDNLSGDLFPDCFPKDDMEQLKGIKHWLKEVGSKFLPVSLEDEDFLKFDIQKLEEYMDNYLGMGQGGYQRKEGKGGPREGYLGPGEGYQLLRGQYFDLGDRGGYGQDFGKQPILSYGPPPGWMPLPPPHHLGFGFDQPLLPGGNLYGRCPGGRGYGGYGSLVLGYQNKQFVYHSRAGKNRKKKTDENNIRKLKAKEAKKNPKQQQQQKEQQKKQKQDEQQFQKGDNELLNLLKKKNDMMGTQKDSKPDGDCKKKEPKDYDKENGQEDEANFEKAPLDNPTVAGSIFRVDPNQYKQGYGHIYNNPMPPGPMRPPPPPPPGPGHPQQFPYGYPIPPPFMPMHPGYHPLHPGQPPYPNFYGQYPPPGQPFGFGQPPPPPPPPPMTQGQPYGQGGTRDEYQGQDNKYNGPGRQHGGYRGRGGYRGGGYKGGGKGGGYRNHQECPKPKVKQEAADRDNKKDEST

In our analysis of the tree, we examined how S. cerevisiae's XRN1 relates to other XRNs, especially the XRN4 found in Nicotiana species. The tree shows that XRN1 is evolving very quickly even within yeast. Because XRN1 already differs considerably among yeasts, the XRN4 found in plants differs substantially from it. Although XRN1 and XRN4 are both active in the cytoplasm, they are less closely related than we initially thought.

Assuming our radial tree is correct, XRN3 and XRN4 arose in plants and are equidistantly related to XRN1. It appears that XRN2 is actually more closely related to XRN1. Thus, our hypothesis was incorrect. XRN1 and XRN4 are not more evolutionarily similar even though they are metabolically active at the same cell location.

While conducting our research, we found that databases sometimes list a protein as XRN when it is not. During protein alignment and tree construction, we found many proteins that had either been mislabeled as XRN or labeled as the wrong XRN homolog. For example, XRN3 of the plant species Arabidopsis thaliana was incorrectly labeled XRN2. We note this for future research, as such mistakes could hinder future phylogenetic studies of the XRN family.

The main goal of this modeling project was to provide evolutionary context for our main project. We gained a better understanding of how the XRN1 found in S. cerevisiae relates to the XRN4 in Nicotiana species. The effect of integrating XRN1 into a plant genome was unknown, and this analysis showed that the exoribonuclease we are inserting is very different from the exoribonuclease normally found in plants. This could raise concerns about whether the XRN1 enzyme would be functional and stable in plants. If we were to repeat this experiment using the results of this phylogenetic analysis, an XRN from a relative of the ancestors of plants (such as a green alga) might be a more suitable choice. Alternatively, one could insert the ancestral sequence of N4 or N7 derived above into a plant and observe the results. We are not implying that these proteins would be especially effective against TMV; that cannot be determined without an experiment. However, because they are more closely related to the Nicotiana XRNs than the yeast XRN1 is, they might be more stable in plants and give more desirable results.


Agent-based modeling is a simulation technique that uses agents to model complex systems, and it has been applied in many fields, including biology. An agent-based model (ABM) simulates the actions and interactions of autonomous entities called agents. Each agent behaves according to a set of rules and can exhibit different behaviors depending on the state of the system. For example, consider an ecosystem of sheep and wolves in which we want to determine the population of each species at a given time. The agents would be the sheep and the wolves, and we would define what happens when two sheep interact and when a sheep and a wolf interact. ABMs typically include randomness and can simulate behaviors that would be hard or impossible to capture with conventional mathematical models. In short, an ABM describes a system from the perspective of its constituents [1].

Figure 1. An agent-based model (ABM) describes the entities of a system as agents, the set of rules for their behavior and the interactions between agents [2].

Complex and unanticipated behavior may arise when the entities of a model interact with one another. Because an ABM defines rules for the interactions between agents, it can easily capture phenomena resulting from those interactions.

In a system that is made up of autonomous entities, it is natural to describe their behavior and how they interact with each other, and ABM makes that possible.

In an ABM, it is easy to add more agents and to add or modify the rules by which they behave and interact, which makes ABMs flexible [1].

NetLogo is open-source software for developing agent-based models. It is well suited to modeling complex systems that change over time. It uses a variant of Logo as its programming language and provides an easy-to-use graphical user interface for visualizing the parameters and the agents of the system.

Features of a NetLogo model:

Turtles: Mobile agents
Patches: A grid of stationary agents
Links: Agents that connect turtles to define interactions between them
Procedures: For describing sets of behavior rules for the agents
Reporters: Functions that tell us something about the status of the agents

Figure 2. A wolf sheep predation example model on NetLogo.

This year, our team developed a NetLogo model of the cell-to-cell spread of tobacco mosaic virus (TMV). The model has two kinds of agents: virus replication complexes (VRCs) and cells. VRCs are shown as small white triangles and can move within a cell, spread to adjacent cells, and replicate inside a cell. Each cell is a green area enclosed by a black border, and together the cells form a randomly generated Voronoi diagram. Healthy cells are green, while infected cells are red. The model also links adjacent cells, shown as gray lines between the centers of adjacent cells; these links form the dual graph of the Voronoi diagram. A slider lets the user change the number of initial infection sites (1-20).
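A discrete version of this randomly generated cell layout can be sketched in Python: each grid patch is assigned to its nearest random seed point, which yields Voronoi cells, and two cells are linked when their patches touch, which yields the adjacency (dual) graph along which the virus spreads. This is an assumption about how such a layout can be built on a patch grid, not the team's NetLogo source; the grid size, cell count and random seed are hypothetical.

```python
import random


def voronoi_cells(width, height, n_cells, seed=0):
    """Assign every patch (x, y) to the index of its nearest random seed point."""
    rng = random.Random(seed)
    seeds = [(rng.randrange(width), rng.randrange(height)) for _ in range(n_cells)]
    return [[min(range(n_cells),
                 key=lambda i: (x - seeds[i][0]) ** 2 + (y - seeds[i][1]) ** 2)
             for x in range(width)]
            for y in range(height)]


def cell_links(owner):
    """Adjacency between cells whose patches share an edge (the dual graph)."""
    height, width = len(owner), len(owner[0])
    links = {cell: set() for row in owner for cell in row}
    for y in range(height):
        for x in range(width):
            for nx, ny in ((x + 1, y), (x, y + 1)):
                if nx < width and ny < height and owner[ny][nx] != owner[y][x]:
                    links[owner[y][x]].add(owner[ny][nx])
                    links[owner[ny][nx]].add(owner[y][x])
    return links


layout = voronoi_cells(40, 30, 12, seed=1)
print(cell_links(layout))
```

Checking only the right and lower neighbor of each patch is enough, because every shared edge between two patches is visited exactly once that way and the links are added in both directions.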

The number of VRCs produced in a cell to spread to adjacent cells is equal to the number of its healthy adjacent cells.
VRCs do not move to an already infected cell.
Each VRC spreads to a different healthy cell.
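The three spreading assumptions above amount to a simple rule on the cell adjacency graph: in each round, every infected cell sends exactly one VRC to each of its healthy neighbors and none to infected neighbors. A deterministic Python sketch of one round, assuming the adjacency is given as a dictionary of neighbor sets (the NetLogo model also moves VRCs randomly inside each cell, which this ignores):

```python
def spread_round(links, infected):
    """One round of cell-to-cell spread.

    links: {cell: set of adjacent cells}; infected: set of infected cells.
    Returns the new infected set and the number of VRCs that moved.
    """
    newly_infected = set()
    vrcs_moved = 0
    for cell in infected:
        healthy = [nbr for nbr in links[cell] if nbr not in infected]
        vrcs_moved += len(healthy)        # one VRC per healthy neighbour
        newly_infected.update(healthy)    # each VRC goes to a different cell
    return infected | newly_infected, vrcs_moved


chain = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}   # hypothetical row of 4 cells
state = {0}
for _ in range(3):
    state, moved = spread_round(chain, state)
    print(sorted(state), moved)
```

An infected cell stops sending VRCs once all of its neighbors are infected, which matches the assumption that VRCs do not move into already infected cells.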

Initial infection and spreading [3]:

Hours post infection	Description
0 - 14	Viral RNA and proteins are produced, and VRCs are formed. They are still motionless.
14 - 16	VRCs rapidly move within the initially infected cell.
16 - 18	Movement of VRCs is slowed down and eventually stopped. VRCs are now located at or adjacent to plasmodesmata.
18 - 20	VRCs open the plasmodesmata and remain motionless in this state.
20+	VRCs spread to adjacent cells, and replication is initiated in secondary cells.

Replication and spreading in secondary and subsequent cells [3]:

Hours post infection	Description
0 - 2	VRCs rapidly move within the infected cell.
2 - 4	VRCs move to plasmodesmata and spread to adjacent cells to infect more cells.
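Together with the spreading rule, the two timing tables give a simple estimate of when each cell becomes infected: cells infected directly from an initial site are infected at about 20 hours post infection, and each further step adds about 4 hours. The Python sketch below computes this with a multi-source breadth-first search on the cell adjacency graph. It is a deterministic simplification that assumes every spread happens at the end of the time windows listed above, whereas the NetLogo model moves VRCs stochastically.

```python
from collections import deque

PRIMARY_SPREAD_H = 20    # initially infected cell: VRCs spread at about 20 hpi
SECONDARY_SPREAD_H = 4   # later cells: VRCs spread about 4 h after infection


def infection_times(links, initial_sites):
    """Hours post infection at which each reachable cell becomes infected."""
    hops = {site: 0 for site in initial_sites}
    queue = deque(initial_sites)
    while queue:                       # breadth-first: fewest hops first
        cell = queue.popleft()
        for nbr in links[cell]:
            if nbr not in hops:
                hops[nbr] = hops[cell] + 1
                queue.append(nbr)
    # Time grows with hop count, so the fewest hops give the earliest time.
    return {cell: 0 if d == 0 else PRIMARY_SPREAD_H + SECONDARY_SPREAD_H * (d - 1)
            for cell, d in hops.items()}


chain = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}   # hypothetical row of 4 cells
print(infection_times(chain, [0]))
```

With one infection site at the end of a row of four cells, the neighbors are infected at 20, 24 and 28 hours; adding a second site at the other end (as the slider allows) makes every cell infected by 20 hours.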

Figure 3. The NetLogo model for TMV spread in action. The number of healthy cells, infected cells and VRCs can be seen on the graph on the left.

Figure 4. The resulting graph from running the model with one initial infection site.

To view the source code of this model, please visit our organizational Github account.

This model could be extended to simulate the spread of TMV throughout the entire plant, not just one leaf. XRN1 could also be added to the model to analyze its effect on the spread of the virus. Future iGEM teams could also use this program as a baseline for developing their own virus spread models.

References

[1] Bonabeau, Eric. “Agent-Based Modeling: Methods and Techniques for Simulating Human Systems.” PNAS, National Academy of Sciences, 14 May 2002, www.pnas.org/content/99/suppl_3/7280/tab-figures-data.

[2] Galán, José Manuel, Luis R. Izquierdo, Segismundo S. Izquierdo, José Ignacio Santos, Ricardo del Olmo, Adolfo López-Paredes, and Bruce Edmonds. “Errors and Artefacts in Agent-Based Modelling.” Journal of Artificial Societies and Social Simulation, vol. 12, no. 1, 2009, article 1.

[3] Kawakami, Shigeki, et al. “Tobacco Mosaic Virus Infection Spreads Cell to Cell as Intact Replication Complexes.” PNAS, National Academy of Sciences, 20 Apr. 2004, www.pnas.org/content/101/16/6291.
