Model
What is image processing and analysis?
Image processing is the use of algorithms to analyze digital images. This can include classification of objects and images, and pattern recognition. Image processing is also useful for analyzing a system that is difficult to analyze by hand. For our project, we used image processing and analysis to visualize and quantify the amount of mottling and fluorescence in our leaves during our experiments.
Analyzing mottling on leaves
One of the symptoms of TMV is a mottled “mosaic” pattern on the leaf, seen as yellow or light green spots on the leaf surface (figure 1). Since these light areas have been shown to contain virus, while the dark green areas are resistant to it [1], we expected that if our gene reduced the amount of viral RNA, it would also reduce the amount of mottling on the leaf.
The algorithm used to measure mottling was adapted from [2] and is summarized in figure 2. First, the leaf is photographed on a white background together with a standard of known area (step 1). Next, the image is binarized so the leaf and the standard appear black, while the spots appear white (step 2). The image is then processed in two ways. First, the background is filled so that only the spots are shown in white (step 3a), and statistics such as the total area of all the spots, the number of spots, and the size of each spot are calculated. In parallel, the image is inverted and filled so that only the standard and the entire leaf appear white (step 3b), and the areas of the standard and the leaf are calculated. The standard is distinguished from the leaf by measuring the eccentricity of each object and taking the object with the lower eccentricity as the standard. The leaf is then traced in green and the standard in magenta so the user can check whether the program identified the objects correctly (step 3b). Finally, these statistics are used to display metrics about the leaf to the user, such as descriptive statistics (step 4) or a histogram of the spots (step 5).
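The mottling steps above can be sketched in pure Python. This is not our MATLAB code; it is a minimal illustration that assumes the binarized image is given as nested lists with 1 for white pixels (background and spots) and 0 for black pixels (leaf and standard), that exactly two dark objects are present, and that regions are 4-connected. The function names (`label_regions`, `eccentricity`, `analyze_mottling`) are hypothetical. Eccentricity is computed from each object's second moments, the same idea as the "Eccentricity" property of MATLAB's `regionprops`.

```python
import math
from collections import deque


def label_regions(mask):
    """Return the 4-connected regions of True pixels as lists of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, region = deque([(r, c)]), []
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions


def eccentricity(region):
    """Eccentricity of the ellipse with the same second moments as the region."""
    n = len(region)
    my = sum(y for y, _ in region) / n
    mx = sum(x for _, x in region) / n
    syy = sum((y - my) ** 2 for y, _ in region) / n
    sxx = sum((x - mx) ** 2 for _, x in region) / n
    sxy = sum((y - my) * (x - mx) for y, x in region) / n
    half_trace = (sxx + syy) / 2
    spread = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    major, minor = half_trace + spread, half_trace - spread  # covariance eigenvalues
    return math.sqrt(max(0.0, 1 - minor / major)) if major > 0 else 0.0


def analyze_mottling(binary, standard_area):
    """binary[r][c] == 1 for white pixels (background, spots), 0 for leaf/standard."""
    rows, cols = len(binary), len(binary[0])
    white = [[v == 1 for v in row] for row in binary]
    spots, background = [], set()
    for region in label_regions(white):
        # White regions touching the image border are background; the rest are spots (step 3a).
        if any(y in (0, rows - 1) or x in (0, cols - 1) for y, x in region):
            background.update(region)
        else:
            spots.append(region)
    # Invert and fill: every non-background pixel belongs to the leaf or the standard (step 3b).
    objects = label_regions([[(r, c) not in background for c in range(cols)] for r in range(rows)])
    standard, leaf = sorted(objects, key=eccentricity)  # expects exactly two objects
    scale = standard_area / len(standard)  # real area per pixel
    return {
        "leaf_area": len(leaf) * scale,
        "spot_count": len(spots),
        "spot_areas": sorted(len(s) * scale for s in spots),
        "percent_mottled": 100 * sum(map(len, spots)) / len(leaf),
    }
```

Because the filled leaf object still contains its spots, `percent_mottled` here is the share of the whole leaf area covered by spots.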
This algorithm was implemented in MATLAB. A GUI was created (figure 3) so the user can see each image as it is being analyzed. The user can also switch the leaf and the standard if the program identified the objects incorrectly, and can download all the data and summary statistics as an Excel file. For large numbers of images, we also wrote a MATLAB script that runs this algorithm on every image in a specified folder. To download these scripts and functions, please visit our GitHub page.
Measuring bias in the program
When using the program to measure leaves, a concern arose about whether it was measuring leaf area accurately. To address this, we measured the areas of 127 leaves by hand using graph paper (figure 4), then photographed the same leaves and measured their areas with the program.
First, the eccentricity of the leaves was compared with the eccentricity of the standard (a piece of paper of known area) to confirm that eccentricity is a valid parameter for distinguishing the leaf from the standard. MATLAB code was used to compute the eccentricities and perform significance testing. The eccentricities of the standards $(\mu = 0.138995, \sigma = 0.044967)$ were significantly different from the eccentricities of the leaves $(\mu = 0.580446, \sigma = 0.150514)$ $(t_{252} = -31.6696, p < 0.001)$, as illustrated by the histogram in figure 5. We therefore concluded that eccentricity is a valid parameter for distinguishing the leaf from the standard. For the few cases where it fails, a “switch objects” button was implemented to recalculate the leaf statistics with the standard and leaf swapped.
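The reported statistic can be recomputed from the summary values alone. This sketch assumes the test was a standard pooled-variance two-sample t-test; the reported 252 degrees of freedom (127 + 127 − 2) are consistent with that. The function name `pooled_t_test` is hypothetical.

```python
import math


def pooled_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Student's two-sample t statistic with pooled variance; returns (t, df)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Summary statistics reported above (standards vs. leaves, 127 images each)
t, df = pooled_t_test(0.138995, 0.044967, 127, 0.580446, 0.150514, 127)
print(f"t({df}) = {t:.2f}")  # prints t(252) = -31.67
```

With equal group sizes the pooled and Welch t statistics coincide, so only the degrees of freedom depend on which variant was used.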
Next, the accuracy of the program was measured by calculating the relative error between the true areas (measured by hand with graph paper) and the program areas. As the unshifted line graph shows (figure 6a), the program tended to overestimate the area, so its output needed to be corrected. A dilation shift was therefore applied to all the data points. The shift was calculated using the formula \[\text{shift} = 1 - \frac{m\overline{r}}{\sum_{i=1}^{m} \frac{p_i}{t_i}}\] where $m$ is the number of images, $\overline{r}$ is the average relative error, and $\frac{p_i}{t_i}$ is the ratio of program area to true area for image $i$. For our data, the shift was $0.649253$. As the shifted line graph shows (figure 6b), the shifted program areas align more closely with the true areas. A rank-sum test confirmed that the distribution of shifted relative errors was not significantly different from a distribution with a mean of 0 $(U = 14001, p = 0.332)$. Therefore, in all of our mottling programs, a shift of $0.649253$ is applied to all areas before they are output.
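A minimal sketch of the shift calculation, taking $m$ to be the number of images and reading the relative error as the signed quantity $r_i = (p_i - t_i)/t_i$ (our interpretation of the formula). Under that reading $\sum_i p_i/t_i = m + m\overline{r}$, so the shift simplifies to $m / \sum_i p_i/t_i$, the reciprocal of the mean area ratio; multiplying every program area by it makes the mean relative error exactly zero. The function names and example data are hypothetical.

```python
def dilation_shift(program_areas, true_areas):
    """shift = 1 - m * r_bar / sum(p_i / t_i), reading r_i as (p_i - t_i) / t_i."""
    m = len(program_areas)
    ratios = [p / t for p, t in zip(program_areas, true_areas)]
    r_bar = sum(ratio - 1 for ratio in ratios) / m  # average signed relative error
    return 1 - m * r_bar / sum(ratios)


def relative_errors(program_areas, true_areas, shift=1.0):
    """Signed relative error of each (optionally shifted) program area."""
    return [(shift * p - t) / t for p, t in zip(program_areas, true_areas)]


# Hypothetical example data, not our measurements
program = [2.0, 3.0]
true_areas = [1.0, 2.0]
shift = dilation_shift(program, true_areas)  # 1 - 2 * 0.75 / 3.5 = 4/7
mean_error = sum(relative_errors(program, true_areas, shift)) / len(program)
print(shift, mean_error)
```

Here the shift is 4/7 ≈ 0.571 and the mean shifted relative error is zero up to floating-point rounding.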
Analyzing fluorescence in leaves
The main use of images in our project was to measure fluorescence intensity in the leaves, which served as a quantitative measure of how strongly the virus (GFP) and our gene (RFP) were expressed. For more information about our results, see our results page.
The algorithm is summarized in figure 7. First, the fluorescent leaf is photographed (step 1). Next, the image is converted to grayscale based on the filter supplied (step 2). Then, if standards are supplied, they are used to set the lowest and highest intensity values (step 3a). The intensity values of the image are then adjusted: values below the low intensity are set to 0, values above the high intensity are set to 255, and values in between are set to $255 \cdot \frac{v - s_o}{s_b - s_o}$, where $v$ is the pixel's intensity value, $s_o$ is the low intensity value, and $s_b$ is the high intensity value. Additionally, if a mask photo is supplied, it is applied to find the border of the leaf, which can be used to find the percent infected (step 3b). Then, the intensity of each region is measured in the grayscale image. A histogram of the intensity distribution of all the spots (step 4) and a heatmap showing where the spots are located and how intense they are (step 5) can also be created from the raw data.
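The intensity adjustment in step 3a and the masking in step 3b can be sketched as below. This is an illustration, not our MATLAB code: images are nested lists of grayscale values, the mask uses 1 for leaf pixels, and the "percent infected" definition (share of leaf pixels whose adjusted value exceeds a `threshold`) is a hypothetical stand-in, as are the function names.

```python
def normalize(v, s_o, s_b):
    """Map a grayscale value onto 0-255 using the low (s_o) and high (s_b) standards."""
    if v <= s_o:
        return 0.0
    if v >= s_b:
        return 255.0
    return 255 * (v - s_o) / (s_b - s_o)


def leaf_fluorescence(gray, mask, s_o, s_b, threshold=1.0):
    """Mean adjusted intensity inside the mask, plus a hypothetical 'percent infected':
    the share of leaf pixels whose adjusted value exceeds `threshold`."""
    values = [normalize(v, s_o, s_b)
              for gray_row, mask_row in zip(gray, mask)
              for v, inside in zip(gray_row, mask_row) if inside]
    mean_intensity = sum(values) / len(values)
    percent_infected = 100 * sum(val > threshold for val in values) / len(values)
    return mean_intensity, percent_infected


gray = [[100, 200], [10, 0]]
mask = [[1, 1], [1, 0]]  # the bottom-right pixel is outside the leaf
print(leaf_fluorescence(gray, mask, s_o=50, s_b=150))
```

With these toy values the three leaf pixels map to 127.5, 255 and 0, giving a mean of 127.5 and two of three pixels counted as infected.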
This algorithm was implemented in MATLAB. A GUI was created (figure 8) so the user can see each photo as it is being analyzed. The user also has the option to download all the graphs as MATLAB figure files and to download all the data and statistics as an Excel file. For large numbers of images, we also wrote a MATLAB script that runs this algorithm on all images in a specified folder. To download these scripts and functions, please visit our GitHub page.
Summary of our image processing
We developed MATLAB code for analyzing mottling and fluorescence in leaves and used it to help quantify expression in our leaves during our experiments. All of our code is open source and available on our GitHub page for other teams to examine, use, and adapt for their own purposes.
References
- Burundukova, O.L., et al. “Dark and Light Green Tissues of Tobacco Leaves Systemically Infected with Tobacco Mosaic Virus.” Biologia Plantarum, Apr. 2007, https://link.springer.com/content/pdf/10.1007/s10535-009-0053-8.pdf
- Marathe, Hrushiketh, and Kothe, Prerna. “Leaf Disease Detection Using Image Processing Techniques.” International Journal of Engineering Research & Technology, Mar. 2013, https://www.ijert.org/research/leaf-disease-detection-using-image-processing-techniques-IJERTV2IS3480.pdf
What is our ODE model about?
Our ordinary differential equation (ODE) models describe the effect of tobacco mosaic virus (TMV) on the tobacco plant Nicotiana benthamiana and the effect of XRN1 on the TMV-infected plant. There are two models: the first describes TMV infecting Nicotiana benthamiana, and the second describes XRN1 degrading TMV.
TMV Model
MATLAB was used to model the effect of TMV on Nicotiana benthamiana. TMV infects plant cells by releasing its RNA, which is translated by host ribosomes. This RNA codes for a capsid protein (CP), a movement protein (MP), and a replicase protein (p50), which work together to assemble progeny viruses or a virus replication complex (VRC). A VRC is a complex formed in the cell’s endoplasmic reticulum (ER) that helps infect the plant by producing viruses and moving through the plasmodesmata (PD). Based on this infection mechanism, we constructed the following biochemical reactions under these assumptions:
- Viruses are used whenever possible to infect healthy plant cells, so their amount is kept to a minimum until all cells are infected.
- TMV proteins are used whenever possible to create VRCs, so their amount is kept to a minimum.
- $C + mV \overset{k}{\rightarrow} I$
- $I \overset{i}{\rightarrow} nV + I$
- $I \overset{D_I}{\rightarrow} bV$
- $V \overset{x}{\rightarrow} P$
- $V + P \overset{p}{\rightarrow} W$
- $W \overset{r}{\rightarrow} nV + W$
- $W + C \overset{a}{\rightarrow} W + I$
- $C + P \overset{h}{\rightarrow} 0$
- $C \overset{D_C}{\rightarrow} 0$
- $V \overset{D_V}{\rightarrow} 0$
- $W \overset{D_V}{\rightarrow} 0$
- $P \overset{D_P}{\rightarrow} 0$
- $T \overset{g}{\rightarrow} T + C$
From the biochemical reactions, we obtain the following ODEs:
- $\frac{dI}{dt} = kCV^m - ID_I + aWC$
- $\frac{dV}{dt} = kCV^m + inI + bID_I - xV - pVP + rnW - VD_V$
- $\frac{dC}{dt} = -kCV^m - aWC + gC - hCP - CD_C$
- $\frac{dW}{dt} = pVP - WD_V$
- $\frac{dP}{dt} = xV - pVP - hCP - PD_P$
Initial conditions:
- $C = 1 \times 10^7$
- $V = 10$
- All other values $= 0$
Constant | Description | Units | Value |
---|---|---|---|
$m$ | Multiplicity of Infection of TMV | $cells^{-1}$ | $2$ |
$k$ | Rate of virus to infect a cell | $min^{-1}$ | $0.167$ |
$i$ | Rate of infected cell producing viruses | $min^{-1}$ | $0.25$ |
$b$ | Number of viruses that burst from infected cell | $-$ | $20$ |
$n$ | Number of viruses produced from VRC and infected cell | $-$ | $4$ |
$D_I$ | Death rate of infected cell | $min^{-1}$ | $5 \times 10^{-5}$ |
$x$ | Rate of virus to create CP and MP | $min^{-1}$ | $0.4$ |
$p$ | Rate at which CP and MP are used to make VRC | $min^{-1}$ | $0.3$ |
$r$ | Rate of VRC replication | $min^{-1}$ | $2$ |
$a$ | Affinity for the cell to take VRC and become infected | $-$ | $0.167$ |
$h$ | Rate of cell with TMV protein to undergo hypersensitive response (HR) death | $min^{-1}$ | $1 \times 10^{-4}$ |
$D_C$ | Death rate of healthy plant cells | $min^{-1}$ | $4 \times 10^{-5}$ |
$D_V$ | Degradation rate of TMV | $min^{-1}$ | $0.1$ |
$D_P$ | Degradation rate of TMV proteins | $min^{-1}$ | $0.1$ |
$g$ | Growth rate of plant | $meters \times min^{-1}$ | $1.7 \times 10^{-5}$ |
Variable | Description | Units |
---|---|---|
$I$ | Infected cells | $cells$ |
$V$ | TMV | $molecules$ |
$C$ | Healthy plant cells | $cells$ |
$W$ | Virus replication complex | $molecules$ |
$P$ | TMV Protein | $molecules$ |
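To simulate the TMV model, the ODEs can be written as a right-hand-side function for a numerical solver. This is a minimal Python sketch, not our MATLAB code: it copies the equations term by term as written above, with the parameter values from the constants table; the names `tmv_rhs` and `TMV_PARAMS` are hypothetical. The `(t, y)` argument order is the one expected by SciPy's `solve_ivp` and MATLAB's `ode45`.

```python
TMV_PARAMS = dict(m=2, k=0.167, i=0.25, b=20, n=4, D_I=5e-5, x=0.4, p=0.3, r=2,
                  a=0.167, h=1e-4, D_C=4e-5, D_V=0.1, D_P=0.1, g=1.7e-5)


def tmv_rhs(t, y, q=TMV_PARAMS):
    """Time derivatives of the state y = [I, V, C, W, P] (t is unused: autonomous system)."""
    I, V, C, W, P = y
    infection = q["k"] * C * V ** q["m"]  # kCV^m
    dI = infection - I * q["D_I"] + q["a"] * W * C
    dV = (infection + q["i"] * q["n"] * I + q["b"] * I * q["D_I"] - q["x"] * V
          - q["p"] * V * P + q["r"] * q["n"] * W - V * q["D_V"])
    dC = -infection - q["a"] * W * C + q["g"] * C - q["h"] * C * P - C * q["D_C"]
    dW = q["p"] * V * P - W * q["D_V"]
    dP = q["x"] * V - q["p"] * V * P - q["h"] * C * P - P * q["D_P"]
    return [dI, dV, dC, dW, dP]


y0 = [0.0, 10.0, 1e7, 0.0, 0.0]  # initial conditions: I, V, C, W, P
print(tmv_rhs(0.0, y0))
```

At the initial conditions this gives $dI/dt \approx 1.67 \times 10^8$ cells/min while $C = 10^7$, so the infection term acts within seconds while degradation acts over hours. The system is therefore likely stiff, and an implicit solver (for example `solve_ivp(..., method="BDF")` in SciPy or `ode15s` in MATLAB) is probably a safer choice than an explicit one.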
XRN1 Model
Agrobacterium genetically engineered with XRN1 was agro-infiltrated into N. benthamiana to degrade TMV. The plant cells take up and express the XRN1 gene, allowing them to degrade TMV RNA. We constructed the following biochemical reactions based on the interaction between TMV and XRN1, under these assumptions:
- XRN1 proteins are used whenever possible to cure infected cells, so their amount is kept to a minimum
- XRN1 degrades after degrading TMV
- TMV stunts growth rate of the plant
- Agrobacterium with XRN1 is agro-infiltrated into the plant after it is completely infected by TMV
- $C \overset{e}{\rightarrow} C + X$
- $I + X \overset{c}{\rightarrow} C$
- $X \overset{v}{\rightarrow} - V - W$
- $I \overset{D_I}{\rightarrow} bV$
- $X \overset{D_X}{\rightarrow} 0$
- $C \overset{D_C}{\rightarrow} 0$
- $V \overset{D_V}{\rightarrow} 0$
- $W \overset{D_V}{\rightarrow} 0$
From the biochemical reactions, we obtain the following ODEs:
- $\frac{dI}{dt} = -cIX - ID_I$
- $\frac{dV}{dt} = -vX + bID_I - VD_V$
- $\frac{dC}{dt} = cIX - CD_C$
- $\frac{dW}{dt} = -vX - WD_V$
- $\frac{dX}{dt} = eC - XD_X - cIX - vX$
Initial conditions:
- $I = 10,000$
- $V, W = 21,000$
- $X = 1$
- $C = 0$
Constant | Description | Units | Value |
---|---|---|---|
$c$ | Rate of curing infected cell | $min^{-1}$ | $2$ |
$e$ | Rate of healthy cell producing XRN1 | $min^{-1}$ | $2$ |
$v$ | Rate of XRN1 degrading viruses | $min^{-1}$ | $3$ |
$b$ | Number of viruses that burst from infected cell | $-$ | $20$ |
$D_I$ | Death rate of infected cell | $min^{-1}$ | $5 \times 10^{-5}$ |
$D_X$ | Degradation rate of XRN1 | $min^{-1}$ | $0.1$ |
$D_C$ | Death rate of healthy plant cells | $min^{-1}$ | $4 \times 10^{-5}$ |
$D_V$ | Degradation rate of TMV | $min^{-1}$ | $0.1$ |
Variable | Description | Units |
---|---|---|
$I$ | Infected cells | $cells$ |
$V$ | TMV | $molecules$ |
$C$ | Healthy plant cells | $cells$ |
$W$ | Virus replication complex | $molecules$ |
$X$ | XRN1 Protein | $molecules$ |
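The XRN1 ODEs can likewise be written as a right-hand-side function for a solver. This is a hedged sketch rather than our MATLAB code: it transcribes the equations term by term as written above, using the constants table (rate $c$ for curing, rate $e$ for XRN1 production); the names `xrn1_rhs` and `XRN1_PARAMS` are hypothetical.

```python
XRN1_PARAMS = dict(c=2, e=2, v=3, b=20, D_I=5e-5, D_X=0.1, D_C=4e-5, D_V=0.1)


def xrn1_rhs(t, y, q=XRN1_PARAMS):
    """Time derivatives of the state y = [I, V, C, W, X] (t is unused: autonomous system)."""
    I, V, C, W, X = y
    cure = q["c"] * I * X  # cIX: infected cells cured by XRN1
    dI = -cure - I * q["D_I"]
    dV = -q["v"] * X + q["b"] * I * q["D_I"] - V * q["D_V"]
    dC = cure - C * q["D_C"]
    dW = -q["v"] * X - W * q["D_V"]
    dX = q["e"] * C - X * q["D_X"] - cure - q["v"] * X
    return [dI, dV, dC, dW, dX]


y0 = [10_000.0, 21_000.0, 0.0, 21_000.0, 1.0]  # initial conditions: I, V, C, W, X
print(xrn1_rhs(0.0, y0))
```

At the initial conditions it returns approximately [-20000.5, -2093, 20000, -2103, -20003.1].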
References
- Liu, Chengke, and Richard S. Nelson. “The Cell Biology of Tobacco Mosaic Virus Replication and Movement.” Frontiers in Plant Science, vol. 4, 2013, doi:10.3389/fpls.2013.00012.
- Mandadi, Kranthi K., and Karen-Beth G. Scholthof. “Plant Immune Responses Against Viruses: How Does a Virus Cause Disease?” The Plant Cell, vol. 25, no. 5, 2013, pp. 1489–1505, doi:10.1105/tpc.113.111658.
- Asurmendi, S., et al. “Coat Protein Regulates Formation of Replication Complexes during Tobacco Mosaic Virus Infection.” Proceedings of the National Academy of Sciences, vol. 101, no. 5, 2004, pp. 1415–1420, doi:10.1073/pnas.0307778101.
What is phylogenetic analysis?
Phylogeny refers to the evolutionary history of a species or its genes. A phylogenetic analysis is a means of estimating the evolutionary relationships between species. Phylogenetics is important because it enriches our understanding of how species and their genes evolve in relation to one another.
Introducing the XRN family
XRN family proteins are 5'-3' exoribonucleases in eukaryotes. There are four homologs in the XRN family: XRN1, XRN2, XRN3, XRN4. Each homolog slightly differs in its function and its location of enzymatic activity in the cell. For example, XRN1 and XRN4 are metabolically active in the cytoplasm, whereas XRN2 and XRN3 are found in the cell’s nucleus.
To our knowledge, this is the first attempt to construct a phylogenetic tree for the XRN family: our literature review found no previous attempts, making this a new but promising area of research.
S. cerevisiae has XRN1 and XRN2 but does not possess XRN3 or XRN4. Most plant species, however, lack XRN1 and instead have the XRN3 and XRN4 exoribonucleases. We did not find XRN1 in any of the Nicotiana species, but they did have XRN3 and XRN4. The species we are genetically modifying (Nicotiana benthamiana) was not found to have XRN3 or XRN4, although this could be due to a lack of data in the genome database.
The evolutionary question
Why is the XRN1 gene found in S. cerevisiae but not in plants? Was this gene lost, or did it evolve into something new in plants? Three possible scenarios are outlined below (the cladograms are pictured from left to right in the same order as the scenarios).
- Scenario 1: The ancient ancestor of plants possessed XRN1, but on the way to modern plants the gene changed so much that it is now unrecognizable.
- Scenario 2: The ancient ancestor possessed XRN1, but the plant lineage lost it.
- Scenario 3: The ancient ancestor did not have XRN1, but the ancestor of fungi and animals acquired the gene.
Our objective
To attempt to answer the evolutionary questions posed above, we constructed a phylogenetic tree of the XRN protein homologs. Ideally, the data would show a very clear relationship that points to one of the scenarios. In practice, phylogenetic data often do not show such a clear trend, and ours did not, so we cannot pinpoint which scenario occurred. Instead, our goal was to provide evolutionary context for the protein XRN1 to help us understand the main project and its feasibility.
Hypothesis: Based on the location of enzymatic activity in the cell, we hypothesized that XRN1 and XRN4 would be more evolutionarily related since they are both cytoplasmic enzymes. Similarly, we hypothesized that XRN2 and XRN3 would also be more evolutionarily related because they are active in the nucleus.
Data
1. Constructing a list of species for the tree
The species for our phylogenetic tree were chosen based on the availability of their genome sequences and how well studied they are. This was crucial because choosing a species without a well-researched genome could skew our results: the protein database might be missing an XRN homolog that is actually present in the organism and appears absent only because its DNA has not been thoroughly studied or documented. We intentionally limited the number of species in our tree, because we found that adding more organisms made the tree cluttered and the evolutionary relationships between the XRN genes harder to see. Below is the list of species used in our analysis.
Yeast
- Candida albicans
- S. cerevisiae
Plant Ancestors (alga)
- Chlamydomonas reinhardtii
Fungi
- Schizosaccharomyces pombe
- Neurospora crassa
Plants
- Arabidopsis thaliana
- Selaginella moellendorffii
- Zea mays
- Oryza sativa
- Nicotiana tabacum
- Nicotiana tomentosiformis
- Nicotiana sylvestris
2. Aligning the protein sequences and characterizing their domains
Using NCBI BLAST, we searched each species for XRN1 homologs and downloaded the resulting sequences as FASTA files. We uploaded the FASTA file to Clustal to align the sequences, as shown above. From this alignment, we characterized the domains of the XRN proteins. We identified two domains (XRN-N and XRN-M) present in most of the XRN homologs, as well as four domains (D1-D4) found only in XRN1.
- XRN-N (N-terminus, position 1-230)
- XRN-M (helical domain with catalytic core, position 272-680)
- D1 (position 723-914)
- D2 (position 918-1005)
- D3 (position 1090-1157)
- D4 (position 1163-1233)
This alignment also allowed us to identify and eliminate species whose protein sequences were not similar to the protein sequence of XRN1. This highlights a limitation of the databases, which sometimes mistakenly categorize a protein as XRN.
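The screening above was done by inspecting the Clustal alignment. As an illustration only, one simple way to quantify "not similar" is percent identity to a reference XRN1 sequence over aligned columns where neither sequence has a gap. The function names and the 30% cutoff are hypothetical and were not part of our analysis.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100 * sum(a == b for a, b in pairs) / len(pairs)


def screen_homologs(alignment, reference, min_identity=30.0):
    """Keep aligned sequences at least `min_identity` percent identical to the reference.
    The 30% default is an illustrative assumption, not the criterion we used."""
    ref_seq = alignment[reference]
    return {name: seq for name, seq in alignment.items()
            if percent_identity(seq, ref_seq) >= min_identity}


print(percent_identity("MG-IPA", "MGLIPV"))  # 4 of 5 gap-free columns match
```

Ignoring gapped columns keeps a short but genuine homolog from being penalized for missing regions; a stricter screen could also require a minimum alignment length.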
3. Creating the phylogenetic tree
The evolutionary history was inferred by using the Maximum Likelihood method and Le_Gascuel_2008 (LG+G+I) model with bootstrap (1000 replications). The tree with the highest log likelihood (-5899.25) is shown. The percentage of trees in which the associated taxa clustered together is shown next to the branches. Initial tree(s) for the heuristic search were obtained automatically by applying Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using a JTT model, and then selecting the topology with superior log likelihood value. A discrete Gamma distribution was used to model evolutionary rate differences among sites (5 categories (+G, parameter = 0.8197)). The rate variation model allowed for some sites to be evolutionarily invariable ([+I], 12.25% sites). The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. This analysis involved 26 amino acid sequences. All positions with less than 95% site coverage were eliminated, i.e., fewer than 5% alignment gaps, missing data, and ambiguous bases were allowed at any position (partial deletion option). There were a total of 302 positions in the final dataset.
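The site-coverage filter described above (partial deletion with a 95% cutoff) can be sketched as a column filter over the aligned sequences. This is a simplified illustration, not MEGA's implementation: it assumes that `-`, `?` and `X` count as gaps, missing data, or ambiguous residues, and the function name is hypothetical. Under this reading, with 26 sequences a column is kept only if at most one sequence lacks a residue there (25/26 ≈ 96%).

```python
def partial_deletion(alignment, min_coverage=0.95, missing="-?X"):
    """Keep only alignment columns in which at least `min_coverage` of the
    sequences have a real residue (not a gap, missing, or ambiguous symbol)."""
    sequences = list(alignment.values())
    n = len(sequences)
    length = len(sequences[0])
    keep = [col for col in range(length)
            if sum(seq[col] not in missing for seq in sequences) / n >= min_coverage]
    return {name: "".join(seq[col] for col in keep) for name, seq in alignment.items()}


toy = {"a": "MK-L", "b": "MKAL", "c": "MKAL", "d": "MKAL"}
print(partial_deletion(toy))  # the third column (75% coverage) is removed
```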
Species marked with an asterisk (*) on the tree denote proteins that were improperly named in the database.
The tree is shown in two formats: rectangular and radial. The radial tree is technically more accurate for our purposes because it is unrooted, but it is harder to read, so the rectangular tree is provided for reference. An unrooted tree is more conservative than the midpoint-rooted rectangular tree because it makes no assumption about the direction of time, that is, about which lineage is ancestral.
4. Constructing the Ancestral Sequence of XRN1
Using FastML, we reconstructed the ancestral sequence of XRN1. We obtained the ancestral sequences of all nodes on the tree, not just the most ancient common ancestor. In this document, the sequences of all the nodes are shown. N refers to a node on the tree, and the number identifies which node it is; these nodes are labeled on the tree (see above). N1 is the most ancient common ancestor.
N1 ancestral sequence: MGIPAFYRWLSEKYPLIISQVIEDEPIAIAGGGGATVPVDSSKPNPNGVEFDNLYLDMNGIIHPCSHPEDQLTHADIDRPSPTTEEEMFAAIFEYIDRLFRIVRPRKLLYMAIDGVAPRAKMNQQRSRRFRAAKDALDAEEEEERLRKQFEAEGREVRAPPKEESEVEVVVKKAFDSNCITPGTEFMAVLSEALQYYIHCKLNEDPGWQNIKVILSDANVPGEGEHKIMEFIRSQRSLPDYDPNTRHCLYGLDADLIMLGLATHEPHFSILREDVFYQNSGYGLSIESTTQPEKCYLCGQKGHLKERPWQAANCEGKAKRKRGEFDQKDELEPLPKKPFQFLHISVLREYLELELQEIPDPPKFKYDFERIIDDFIFICFFVGNDFLPHMPTLEIHEGAIDLLMSIYKQEFPNMGGYLTDMDKVKDKYNGHVNLSRVEHFIQALGKYEDKIFQKRARLHERQAKRIKRDKLKDEKKQRRRGDHRQLRQRPESLGPIARFSGSRLNQFPSPSPFQSNKPKLTSNHDAVDHANVSSLSDLNIQSENREDVDNQLPSSRSRSVDSSSTPADTLRVAETTAEEAAPPATLEDRMQEIKEELKKKQKASLREKSEVFKSDNEVTDKVKLWEEGWKDRYYEEKFKAHITPEEDEEVRRDVVQRYTEGLCWVLHYYYQGCPSWNWFYPYHYAPFASDLKDLGQELINEKIDIKFELGTPFKPFEQLMGVLPAASRHAIPECYRPLMSDENSPIIDFYPNDFEIDMNGKRFSWQGIALLPFIDERRLLAAMRKVEHTLTEEERRRNSVGKDILFVFASHYTHFYPSPLPGFFPDLEHNQCIEKEYKLPPMEGKEYRIGLCPGAKLGAFMLAGFPTFHTLPFKSELAYHEIKVFGHPSRNQSMILNVEDVWKTSDLTLEQFAQQYVNKQFYSKWPYLRECKLEPVMDEGYMYLAQKTNGYKPFGELIRPLSSQTKNLFNYERSTMIHKFDKQCGHKFGEMNLLGYQKPVPGLVRNHEGALVKKFSKSGLEYYPLQLVVKKYAGKDQRYKYRPPPPIEEEFPLDSKVFFLGDYAYGGPAKIVGYNDDKTKLGITWFKTQPGLEPNWGKERLNMDKNEVKYYPSYIVAKLLHLHPLLLSKITSQFMITDGTKRKVNIGLELKFEARHQKVLGYTRKSPKGKFWEFSNLTINLIKEYKNTFPGLFKKLTNHGNKDNLSGDLFPDCFPKDDMEQLKGIKHWLKEVGSKFLPVSLEDEDFLKFDIQKLEEYMDNYLGMGQGGYQRKEGKGGPREGYLGPGEGYQLLRGQYFDLGDRGGYGQDFGKQPILSYGPPPGWMPLPPPHHLGFGFDQPLLPGGNLYGRCPGGRGYGGYGSLVLGYQNKQFVYHSRAGKNRKKKTDENNIRKLKAKEAKKNPKQQQQQKEQQKKQKQDEQQFQKGDNELLNLLKKKNDMMGTQKDSKPDGDCKKKEPKDYDKENGQEDEANFEKAPLDNPTVAGSIFRVDPNQYKQGYGHIYNNPMPPGPMRPPPPPPPGPGHPQQFPYGYPIPPPFMPMHPGYHPLHPGQPPYPNFYGQYPPPGQPFGFGQPPPPPPPPPMTQGQPYGQGGTRDEYQGQDNKYNGPGRQHGGYRGRGGYRGGGYKGGGKGGGYRNHQECPKPKVKQEAADRDNKKDEST
Discussion
How does S. cerevisiae’s XRN1 relate to other XRNs?
In our analysis of the tree, we looked at how S. cerevisiae’s XRN1 relates to other XRNs, especially the XRN4 found in Nicotiana species. As shown on the tree, XRN1 appears to be evolving very fast even within yeasts. Given how much the XRN1 sequences already differ among yeasts, it is not surprising that XRN1 and the plant XRN4 differ significantly. Although XRN1 and XRN4 are both active in the cytoplasm, they are less closely related than we had initially thought.
Was our hypothesis correct?
Assuming our radial tree is correct, XRN3 and XRN4 arose in plants and are equally distant from XRN1, whereas XRN2 appears to be more closely related to XRN1. Our hypothesis was therefore incorrect: XRN1 and XRN4 are not more closely related evolutionarily, even though they are active in the same cellular compartment.
Arbitrary Naming of XRNs
While conducting our research, we found that databases sometimes list a protein as XRN when it is not. During protein alignment and tree construction, we found many proteins that were mislabeled as XRN or labeled as the wrong XRN homolog. For example, XRN3 of Arabidopsis thaliana was incorrectly labeled XRN2. We note this for future research, as such mistakes could hinder future phylogenetic studies of the XRN family.
Conclusions and Future Research
The main goal of this modeling project was to provide evolutionary context for our main project. We gained a better understanding of how the XRN1 found in S. cerevisiae relates to the XRN4 in Nicotiana species. The effect of integrating XRN1 into a plant genome was unknown, and this analysis showed that the exoribonuclease we are inserting is very different from the exoribonuclease normally found in plants. This raises some concern about whether the XRN1 enzyme would be functional and stable in plants. If we were to repeat this experiment using the results of this phylogenetic analysis, an XRN from a closer relative of plants (such as a green alga) might be a more suitable choice. Using the ancestral sequences derived above (section 4), one could insert the sequence of N4 or N7 into a plant and observe the results. We are not implying that these proteins would be especially effective against TMV; that cannot be determined without an experiment. However, because they are more closely related to the Nicotiana XRNs than yeast XRN1 is, they might be more stable in plants and give more desirable results.
References
- Chang, J. H., Xiang, S., Xiang, K., Manley, J. L., & Tong, L. (2011). Structural and biochemical studies of the 5′→ 3′ exoribonuclease Xrn1. Nature Structural & Molecular Biology, 18(3), 270.
- Miki, T. S., & Großhans, H. (2013). The multifunctional RNase XRN2.
- Sakyiama, J., Zimmer, S. L., Ciganda, M., & Williams, N. (2013). Ribosome biogenesis requires a highly diverged XRN family 5′→ 3′ exoribonuclease for rRNA processing in Trypanosoma brucei. RNA, 19(10), 1419-1431.
- Souret, F. F., Kastenmayer, J. P., & Green, P. J. (2004). AtXRN4 degrades mRNA in Arabidopsis and its substrates include selected miRNA targets. Molecular Cell, 15(2), 173-183.
What is agent-based modeling?
Agent-based modeling is a simulation technique that uses agents to model complex systems, and it has been applied in many fields, including biology. An agent-based model (ABM) simulates the actions and interactions of autonomous entities called agents. Each agent behaves according to a set of rules and exhibits different behaviours depending on the state of the system. For example, consider an ecosystem of sheep and wolves in which we want to determine the population of each species at a given time. The agents would be the sheep and the wolves, and we would define what happens when two sheep interact and when a sheep and a wolf interact. ABMs typically include randomness and can simulate behaviours that would be hard or impossible to capture with conventional mathematical models. In short, ABM describes a system from the perspective of its constituents [1].
Why use agent-based modeling?
1. It makes it easier to capture emergent phenomena.
Complex and unanticipated behavior may arise when the entities of a model interact with one another. Because an ABM defines rules for the interactions between agents, phenomena resulting from those interactions are easy to capture.
2. It provides a natural description of the system.
In a system that is made up of autonomous entities, it is natural to describe their behavior and how they interact with each other, and ABM makes that possible.
3. It is flexible.
In an ABM, it is easy to add more agents and to add or modify the rules by which they behave and interact. This makes ABMs flexible [1].
NetLogo
NetLogo is an open-source platform for developing agent-based models. It is useful for modeling complex systems that change over time. It uses a dialect of the Logo programming language and provides an easy-to-use graphical user interface for visualizing the parameters and the agents of the system.
Features of a NetLogo model:
- Turtles: Mobile agents
- Patches: A grid of stationary agents
- Links: Agents that connect turtles to define interactions between them
- Procedures: For describing sets of behavior rules for the agents
- Reporters: Functions that tell us something about the status of the agents
Tobacco Mosaic Virus (TMV) Spread Model
This year, our team developed a NetLogo model of the cell-to-cell spread of Tobacco Mosaic Virus (TMV). The model has two types of agents: virus replication complexes (VRCs) and cells. VRCs are represented as small white triangles and can move within a cell, spread to adjacent cells, and replicate inside a cell. Cells are represented by green areas enclosed by black borders; together they form a randomly generated Voronoi diagram. Healthy cells are shown in green, while infected cells are shown in red. The model also has links between adjacent cells, represented by gray lines between the centers of adjacent cells; these links form the dual graph of the Voronoi diagram. A slider lets the user change the number of initial infection sites (1 - 20).
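One simple way to build such a leaf is to scatter random seed points on a grid of patches, give each patch to its nearest seed, and link two cells whenever their patches touch. The Python sketch below illustrates this under those assumptions; it is not necessarily how our NetLogo code builds the diagram, and the function names are hypothetical.

```python
import random


def voronoi_patches(width, height, seeds):
    """Assign every grid patch to its nearest seed; ties go to the lower seed index."""
    owner = {}
    for x in range(width):
        for y in range(height):
            owner[(x, y)] = min(
                range(len(seeds)),
                key=lambda i: (seeds[i][0] - x) ** 2 + (seeds[i][1] - y) ** 2,
            )
    return owner


def dual_graph(owner):
    """Link two cells whenever a patch of one shares an edge with a patch of the other."""
    links = {cell: set() for cell in owner.values()}
    for (x, y), cell in owner.items():
        for neighbour in ((x + 1, y), (x, y + 1)):
            other = owner.get(neighbour)
            if other is not None and other != cell:
                links[cell].add(other)
                links[other].add(cell)
    return links


def random_leaf(n_cells, width, height, seed=0):
    """Random Voronoi 'leaf': distinct random seed positions, then patches and links."""
    rng = random.Random(seed)
    positions = [(x, y) for x in range(width) for y in range(height)]
    seeds = rng.sample(positions, n_cells)
    owner = voronoi_patches(width, height, seeds)
    return owner, dual_graph(owner)
```

Sampling distinct seed positions guarantees that every cell owns at least its own seed patch, so every cell appears in the link graph.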
Assumptions of the model
- The number of VRCs produced in a cell to spread to adjacent cells is equal to the number of its healthy adjacent cells.
- VRCs do not move to an already infected cell.
- Each VRC spreads to a different healthy cell.
Behavior of VRC agents
Initial infection and spreading [3]:
Hours post infection | Description |
---|---|
0 - 14 | Viral RNA and proteins are produced, and VRCs are formed. They are still motionless. |
14 - 16 | VRCs rapidly move within the initially infected cell. |
16 - 18 | Movement of VRCs is slowed down and eventually stopped. VRCs are now located at or adjacent to plasmodesmata. |
18 - 20 | Plasmodesmata are opened by VRCs. They are motionless in this state. |
20+ | VRCs spread to adjacent cells, and replication is initiated in secondary cells. |
Replication and spreading in secondary cells and so on [3]:
Hours post infection | Description |
---|---|
0 - 2 | VRCs rapidly move within the infected cell. |
2 - 4 | VRCs move to plasmodesmata and spread to adjacent cells to infect more cells. |
To view the source code of this model, please visit our organization's GitHub account.
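The timing rules in the two tables, together with the model's assumptions, can be condensed into a coarse event-driven sketch. This is a simplification of the NetLogo model, which moves individual VRC turtles tick by tick: here an initially infected cell spreads at 20 hours post infection, a secondary cell spreads 4 hours after it is infected (the end of the 2-4 h window), and at that moment one VRC goes to each currently healthy neighbour. The function and parameter names are hypothetical.

```python
import heapq


def simulate_spread(links, initial_sites, first_delay=20.0, later_delay=4.0):
    """Return the infection time (hours) of every cell the virus reaches.

    links maps each cell to the set of adjacent cells (the dual graph)."""
    infected_at = {}
    events = []  # min-heap of (time the cell spreads, cell)
    for cell in initial_sites:
        infected_at[cell] = 0.0
        heapq.heappush(events, (first_delay, cell))
    while events:
        time, cell = heapq.heappop(events)
        # One VRC per healthy neighbour, each to a different cell;
        # VRCs never move into an already infected cell.
        for neighbour in sorted(links[cell]):
            if neighbour not in infected_at:
                infected_at[neighbour] = time
                heapq.heappush(events, (time + later_delay, neighbour))
    return infected_at


chain = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(simulate_spread(chain, [0]))
```

On this four-cell chain, cell 1 is infected at 20 h, cell 2 at 24 h and cell 3 at 28 h, so each step away from the initial site adds 4 h.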
Future goals
This model could be extended to simulate the spread of TMV throughout the entire plant, not just one leaf. It would also be possible to include XRN1 in the model and analyze its effect on the spread of the virus. Future iGEM teams could also use this program as a baseline for developing their own virus spread models.
References
- Bonabeau, Eric. “Agent-Based Modeling: Methods and Techniques for Simulating Human Systems.” PNAS, National Academy of Sciences, 14 May 2002, www.pnas.org/content/99/suppl_3/7280/tab-figures-data.
- Galán, José Manuel, Izquierdo, Luis R., Izquierdo, Segismundo S., Santos, José Ignacio, del Olmo, Ricardo, López-Paredes, Adolfo and Edmonds, Bruce (2009). 'Errors and Artefacts in Agent-Based Modelling'. Journal of Artificial Societies and Social Simulation 12(1)1.
- Kawakami, Shigeki, et al. “Tobacco Mosaic Virus Infection Spreads Cell to Cell as Intact Replication Complexes.” PNAS, National Academy of Sciences, 20 Apr. 2004, www.pnas.org/content/101/16/6291.