Team:Tongji Software/Project


Revision as of 13:00, 20 October 2019

Tongji Software | Pathlab

PROJECT

PROJECT - DESCRIPTION

OVERVIEW

Given a target product specified by the user, our software constructs an optimal synthetic pathway in E. coli or yeast. For each reaction step in the pathway, it takes the user's requirements into account and provides information about the enzymes needed. Finally, the sequences of all required enzymes are joined together with an appropriate promoter to form a BioBrick backbone for the user. Relevant research literature and a post-experiment feedback community are also provided.

WHY THIS PROJECT -- MEET THE NEEDS

Synthetic biologists need a computational tool for pathway design and reconstruction when they want to optimize genetic processes within cells, build models for yield prediction, perform flux balance analysis and generate value-added products. However, when actually establishing a metabolic pathway, purchasing different enzymes separately from different suppliers and transferring each of them into the chassis is cumbersome. We propose constructing all the enzymes of a pathway on the same plasmid so that they can be transferred at once; regulating enzyme expression under different conditions then ensures that the pathway functions. Synthetic DNA may be indispensable in this process. Although synthetic DNA is still expensive, its cost continues to decline. We believe synthetic DNA will become widespread in the future, and by then our tool will be even more practical.

HOW WE START -- INSPIRATION INSIDE IGEM

We are grateful to three previous iGEM projects that inspired us:
①Team Tongji-Software 2018: their tool AlphaAnt showed us a framework for designing pathways.
②Team HokkaidoU_Japan 2012: their experiments gave us confidence to construct multiple enzymes on the same plasmid.
③Team IIT-Madras 2017: their statistics on codon preference inspired our sequence optimization.

WHAT WE ARE DOING

For the core of the software, we built on the 2018 Tongji-Software project: we optimized the search algorithm by pruning and expanded the reaction database with novel reactions [1].


Based on how frequently different biological chassis are used, users can choose between two chassis: E. coli and yeast [2]. The results differ depending on the strain the user selects.


We select enzymes with higher catalytic efficiency based on their intrinsic kinetic parameters [3]. To ensure that each enzyme is expressed normally, we use taxonomic knowledge and sequence alignment analysis to choose strains closely related to the selected chassis as the enzyme's sequence source. The codons are then optimized. To regulate expression of the synthetic sequences, we integrate the relevant signaling pathways so that the BioBrick backbone in the results is more practical. The physical and chemical properties of each enzyme are also included in the results so that they can be applied in actual experiments.


In addition, we recommend literature associated with the products and enzymes, which may help users explore further research directions.


After all, the results of design software are idealized. We therefore established a community where synthetic biologists can exchange ideas and give feedback after actual experiments. This community not only gives users a reference for evaluating the results but also shows our developers how to improve the software.

REFERENCES

[1] Hadadi N, MohammadiPeyhani H, Miskovic L, Seijo M, Hatzimanikatis V. Enzyme annotation for orphan and novel reactions using knowledge of substrate reactive sites. Proc Natl Acad Sci U S A. 2019;116(15):7298–7307.

[2] Kim J, Salvador M, Saunders E, González J, Avignone-Rossa C, Jiménez JI. Properties of alternative microbial hosts used in synthetic biology: towards the design of a modular chassis. Essays Biochem. 2016;60(4):303–313.

[3] Carbonell P, Wong J, Swainston N, Takano E, Turner NJ, Scrutton NS, Kell DB, Breitling R, Faulon JL. Selenzyme: enzyme selection tool for pathway design. Bioinformatics. 2018;34(12):2153–2154.


PROJECT - DESIGN

DATA

We updated the data inherited from the 2018 Tongji-Software team. The physicochemical properties of enzymes, including kcat/Km, Km, optimal pH and optimal temperature, were collated from the BRENDA database.



(Image: data formats compiled from several databases)

Fig1. Data sources of Pathlab


SEARCHING ALGORITHM

(Image: algorithm diagram)

Instead of the depth-first search (DFS) algorithm used last year, we chose a greedy algorithm. A greedy algorithm follows the problem-solving heuristic of making the locally optimal choice at each stage in the hope of finding a global optimum. In many problems a greedy strategy does not produce an optimal solution, but a greedy heuristic can yield locally optimal solutions that approximate the global optimum in a reasonable amount of time. After testing, we concluded that with the limited set of reactions in our software, the greedy algorithm can also reach the globally optimal solution, and in less time.
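To make this trade-off concrete, here is a minimal Python sketch contrasting a greedy step-by-step search with an exhaustive DFS on a toy reaction network. The graph format (each substrate mapped to a list of `(product, cost)` pairs), the cost values and the function names `greedy_path` and `dfs_best_path` are hypothetical and are not taken from Pathlab's code; a lower cost simply stands in for a better-scoring reaction step.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical reaction network: substrate -> [(product, step_cost), ...].
# A lower cost stands in for a better-scoring reaction step.
Graph = Dict[str, List[Tuple[str, float]]]


def greedy_path(graph: Graph, source: str, target: str,
                max_steps: int = 10) -> Optional[List[str]]:
    """Follow the cheapest reaction to an unvisited compound at every step (no backtracking)."""
    path, visited, current = [source], {source}, source
    for _ in range(max_steps):
        if current == target:
            return path
        options = [(cost, nxt) for nxt, cost in graph.get(current, []) if nxt not in visited]
        if not options:
            return None  # dead end: a greedy search does not go back
        _, current = min(options)
        visited.add(current)
        path.append(current)
    return path if current == target else None


def dfs_best_path(graph: Graph, source: str, target: str,
                  max_steps: int = 10) -> Optional[Tuple[float, List[str]]]:
    """Enumerate every simple path of at most max_steps reactions and keep the cheapest."""
    best: Optional[Tuple[float, List[str]]] = None

    def visit(node: str, path: List[str], cost: float) -> None:
        nonlocal best
        if node == target:
            if best is None or cost < best[0]:
                best = (cost, list(path))
            return
        if len(path) > max_steps:  # path already contains max_steps reactions
            return
        for nxt, step_cost in graph.get(node, []):
            if nxt not in path:
                path.append(nxt)
                visit(nxt, path, cost + step_cost)
                path.pop()

    visit(source, [source], 0.0)
    return best


toy = {"A": [("B", 1.0), ("C", 2.0)], "B": [("D", 5.0)], "C": [("D", 1.0)]}
print(greedy_path(toy, "A", "D"))    # ['A', 'B', 'D'], total cost 6.0
print(dfs_best_path(toy, "A", "D"))  # (3.0, ['A', 'C', 'D'])
```

On this toy graph the greedy search commits to the cheaper first step and misses the cheaper overall route, which is the risk a greedy heuristic carries; whenever both searches return the same route, the greedy one has explored far fewer paths.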

RANKING CRITERIA

(Image: detailed illustrated explanation)

When scoring a pathway, we consider thermodynamic feasibility, competition from heterologous reactions, reaction frequency and compound toxicity, the same factors used in last year's project. Each factor has a corresponding weight, and users can change these weights to meet different requirements. For example, a chemist designing an in vitro reaction, for whom cytotoxicity is irrelevant, could set the toxicity weight to 0.
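A minimal sketch of such user-adjustable weighting follows. It assumes each factor has already been normalised to a score in [0, 1] (higher is better) and that the factors are combined as a weighted mean; the factor names, the normalisation and the weighted-mean form are assumptions for illustration, not the documented Pathlab formula.

```python
from typing import Dict

# Hypothetical factor names; each score is assumed normalised to [0, 1],
# higher meaning better (e.g. toxicity 1.0 = non-toxic intermediates).
FACTORS = ("gibbs", "competition", "frequency", "toxicity")


def pathway_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted mean of factor scores; a weight of 0 drops that factor from the ranking."""
    total = sum(weights.get(f, 0.0) for f in FACTORS)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return sum(weights.get(f, 0.0) * scores[f] for f in FACTORS) / total


scores = {"gibbs": 0.8, "competition": 0.6, "frequency": 0.9, "toxicity": 0.1}
in_vivo = {"gibbs": 1, "competition": 1, "frequency": 1, "toxicity": 1}
in_vitro = {**in_vivo, "toxicity": 0}  # cytotoxicity is irrelevant outside the cell
print(round(pathway_score(scores, in_vivo), 2))   # 0.6
print(round(pathway_score(scores, in_vitro), 2))  # 0.77
```

Dividing by the total weight keeps scores comparable when users change weights, so a pathway with toxic intermediates rises in the ranking once toxicity is switched off.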






(Image: detailed illustrated explanation)


In the Enzyme Selection function, we search for the required enzyme in bacteria closely related to the engineered chassis, based on taxonomic relatedness. If the same enzyme exists in several closely related organisms, we rank the candidates by the enzyme's physicochemical properties: kcat/Km, Km, optimal pH and optimal temperature. To measure how well these physicochemical properties suit the chassis, we built a model.
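The ranking step can be sketched as follows, assuming candidates are first filtered by taxonomic distance to the chassis and then ordered by a suitability score. The `Candidate` fields mirror the BRENDA properties named above, but the distance measure, the scoring formula and the default host conditions (pH 7, 37 °C, roughly an E. coli culture) are illustrative assumptions, not the model Pathlab uses.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    organism: str
    taxonomic_distance: int  # hypothetical: taxonomic ranks separating donor and chassis
    kcat_km: float           # catalytic efficiency, mM^-1 s^-1
    km: float                # Michaelis constant, mM
    optimal_ph: float
    optimal_temp: float      # degrees Celsius


def suitability(c: Candidate, host_ph: float, host_temp: float) -> float:
    """Hypothetical score: catalytic efficiency discounted by pH and temperature mismatch."""
    mismatch = abs(c.optimal_ph - host_ph) + abs(c.optimal_temp - host_temp) / 10.0
    return c.kcat_km / (1.0 + mismatch)


def rank_candidates(cands: List[Candidate], host_ph: float = 7.0, host_temp: float = 37.0,
                    max_distance: int = 2) -> List[Candidate]:
    """Keep close relatives of the chassis, best suitability first, lower Km breaking ties."""
    close = [c for c in cands if c.taxonomic_distance <= max_distance]
    return sorted(close, key=lambda c: (-suitability(c, host_ph, host_temp), c.km))


cands = [
    Candidate("donor_A", 1, 100.0, 0.5, 7.0, 37.0),
    Candidate("donor_B", 1, 150.0, 0.2, 8.5, 30.0),
    Candidate("donor_C", 4, 500.0, 0.1, 7.0, 37.0),  # too distant from the chassis
]
print([c.organism for c in rank_candidates(cands)])  # ['donor_A', 'donor_B']
```

Here donor_B has the higher kcat/Km, but its pH and temperature optima are far from the host's, so donor_A ranks first; donor_C is excluded despite its efficiency because it is taxonomically distant.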



LITERATURE RECOMMENDATION

Since users may make personalized modifications when applying the resulting pathway in practice, we present, for each reaction step, PubMed literature keywords related to the compounds involved, in the form of a word cloud. This may suggest possible target compounds or a follow-up research direction.
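The word-cloud input can be produced by simple term counting. The sketch below assumes the abstracts for a compound have already been retrieved from PubMed (the retrieval step is not shown) and uses a deliberately small, hypothetical stop-word list; the resulting counts would serve as word sizes in the cloud.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# A tiny illustrative stop-word list; a real one would be much longer.
STOPWORDS = {"the", "of", "and", "in", "to", "a", "is", "for", "by", "with", "from", "on"}


def keyword_weights(abstracts: Iterable[str], top_n: int = 20) -> List[Tuple[str, int]]:
    """Count words (2+ characters) across abstracts, skipping stop words, most frequent first."""
    counts: Counter = Counter()
    for text in abstracts:
        tokens = re.findall(r"[a-z][a-z0-9-]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(top_n)


abstracts = [
    "Naringenin production in E. coli",
    "Microbial production of naringenin from tyrosine",
]
print(keyword_weights(abstracts, top_n=2))  # [('naringenin', 2), ('production', 2)]
```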



(Image: word cloud)

Fig2. Wordcloud examples



CODON OPTIMIZATION

We collected codon usage databases for E. coli and yeast from the Internet and used them to replace rarely used codons, avoiding problems caused by differences in translation and gene expression between organisms. This also improves the chance of successfully expressing foreign genes.
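A minimal sketch of threshold-based codon replacement: every codon whose usage frequency in the host falls below a cut-off is replaced by the host's most frequently used synonymous codon. The table layout, the 0.1 threshold and the toy frequencies are assumptions for illustration; a real usage table covers all 61 sense codons, and the scoring behind Table1 may differ.

```python
from typing import Dict

# Codon usage table: amino acid -> {codon: fraction of that amino acid's codons}.
UsageTable = Dict[str, Dict[str, float]]


def optimise_codons(cds: str, usage: UsageTable, threshold: float = 0.1) -> str:
    """Swap codons rarer than `threshold` for the host's most frequent synonymous codon."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    codon_to_aa = {codon: aa for aa, codons in usage.items() for codon in codons}
    preferred = {aa: max(codons, key=codons.get) for aa, codons in usage.items()}
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        aa = codon_to_aa.get(codon)
        if aa is not None and usage[aa][codon] < threshold:
            codon = preferred[aa]  # rare in the host: use its preferred synonym
        out.append(codon)  # codons missing from the table are kept unchanged
    return "".join(out)


# Illustrative numbers only, NOT measured E. coli or yeast frequencies.
toy_usage = {
    "M": {"ATG": 1.0},
    "L": {"CTG": 0.5, "TTA": 0.13, "CTA": 0.04},
    "R": {"CGT": 0.4, "AGG": 0.02},
}
print(optimise_codons("ATGCTAAGGTTA", toy_usage))  # ATGCTGCGTTTA
```

Replacing only codons below the threshold, rather than every codon, changes the sequence as little as possible while removing the rarest codons.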


(Image: codon optimization)

Table1. Codon usage frequency & score



PROJECT - CONTRIBUTION

WHAT WE DO:

Building a complete pathway requires three steps: searching for a pathway, selecting the related enzymes and designing parts. These steps are difficult for one researcher to carry out alone, so we aimed to integrate the whole process into a single piece of software that frees researchers from complicated and tedious work.


With this goal in mind, we developed our software, Pathlab, whose core idea is modular design. Users can use a single module or any combination of modules.
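One way to realise this kind of modular design is to have every module read from and add to a shared context, so that modules can be chained or run alone. The module names and the dictionary-based context below are hypothetical placeholders, not Pathlab's real interfaces.

```python
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]
Module = Callable[[Context], Context]


# Hypothetical stand-ins for the three modules; each reads and extends a shared context.
def pathway_search(ctx: Context) -> Context:
    ctx["pathway"] = [ctx["substrate"], "intermediate", ctx["product"]]
    return ctx


def enzyme_selection(ctx: Context) -> Context:
    n_reactions = len(ctx["pathway"]) - 1
    ctx["enzymes"] = [f"enzyme_{i + 1}" for i in range(n_reactions)]
    return ctx


def parts_design(ctx: Context) -> Context:
    ctx["construct"] = ["promoter", *ctx["enzymes"], "terminator"]
    return ctx


def run(modules: List[Module], ctx: Context) -> Context:
    """Apply the chosen modules in order; any subset works if its inputs are supplied."""
    for module in modules:
        ctx = module(ctx)
    return ctx


full = run([pathway_search, enzyme_selection, parts_design], {"substrate": "S", "product": "P"})
print(full["construct"])  # ['promoter', 'enzyme_1', 'enzyme_2', 'terminator']

alone = run([enzyme_selection], {"pathway": ["A", "B"]})  # user supplies their own pathway
print(alone["enzymes"])  # ['enzyme_1']
```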


In brief, Pathlab gives people working in synthetic biology a platform for finding pathways that can be applied in practice.



(Image: software screenshots)

Fig3. Software screenshots



PROJECT - VALIDATION

To verify whether Pathlab achieves the expected function, we used it to search for several pathways and compared the results with pathways reported in the literature.







EXAMPLE1 - Validate with AlphaAnt

Pathway for the production of flavonoids from glucose

The first validation example is taken from the validation case study of last year's project, AlphaAnt, because our project builds on it.


Flavonoids comprise a large family of secondary plant metabolites that exhibit a wide variety of antioxidant and human health-related properties. However, their widespread use and availability are currently limited by inefficiencies in both their chemical synthesis and their extraction from natural plant sources [6]. As a result, significant strides have been made in recent years in improving the microbial production of flavonoids. A four-step pathway is known to convert L-tyrosine productively to naringenin (C00509), the main flavonoid precursor.

Weight matrix: (Gibbs Weight: 1; Toxicity Weight: 1; Frequency Weight: 1)

v1-1

v1-2

Fig5. Search results from Pathlab (top) and other pathway prediction tools (bottom)



As the figure shows, Pathlab finds the same pathway as the one used in the literature, which suggests that our software works and that its results are consistent with the literature.







EXAMPLE2 - Validate with iGEM19_CAU_China

Astaxanthin synthesis pathway

Astaxanthin is the most powerful antioxidant found in nature. It has a wide range of health benefits, including lowering high blood pressure by reducing oxidative stress and relaxing blood vessel walls, and even inhibiting cancer metastasis. Astaxanthin has a promising market: products of over 98% purity sell at Sigma for up to $200 per 50 mg. This year, CAU_China engineered Escherichia coli to produce astaxanthin from cellulose, addressing the problem of crop stalk disposal in China.


The enzymes involved in each step of the astaxanthin synthesis pathway are well understood, so as part of our collaboration we used their pathway to validate our software.

Here is their pathway:

First, we searched for astaxanthin synthesis pathways starting from farnesyl pyrophosphate. As the result shows, our software efficiently finds the pathway they used. Even better, the pathway CAU_China used ranks first in our results, which shows that our pathway search is effective.

Weight matrix: (Gibbs Weight: 1; Toxicity Weight: 1; Frequency Weight: 1)

v2-1

Fig6. Search result for Example 2



v2-2

Fig7. Pathway construct report



Next, to validate our enzyme selection module, we used our software to select enzymes for each reaction. The report contains the source organisms of the enzymes they used. Owing to the limitations of the databases we use, we cannot provide complete information; however, we can still give a suitable enzyme selection based on the existing data.


v2-3v2-4

Fig8. Enzyme selection report







EXAMPLE3 - Validate with iGEM12_Tokyo_Tech

P(3HB) synthesis pathway

Polyhydroxyalkanoates (PHAs) are biological polyesters synthesized by a wide range of bacteria, and they can be produced by fermentation from renewable carbon sources such as sugars and vegetable oils. Team iGEM12_Tokyo_Tech created the first BioBrick part to synthesize P(3HB), a kind of PHA. We chose this project for validation because of the completeness of its pathway and enzyme information, and because of the romantic story behind it.

Their pathway is as follows:

Weight matrix: (Gibbs Weight: 1; Toxicity Weight: 1; Frequency Weight: 1)

First, our software finds this pathway.

v3-1

The enzymes they used for the three steps (EC numbers, in pathway order) are 2.3.1.16 > 1.1.1.36 > 2.3.1.-.
Here is our selection result; the enzyme donors they used are included.

v3-2-1

v3-2-2

v3-2-3

Fig9. Enzyme donors results
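The third EC number above, 2.3.1.-, is only partially specified: the dash stands for an unassigned or unspecified fourth level, so looking it up requires wildcard matching. A minimal sketch, assuming enzymes are indexed by plain four-level EC strings; the function names and the example list are hypothetical.

```python
from typing import List


def ec_matches(pattern: str, ec: str) -> bool:
    """True if `ec` falls under `pattern`; '-' in the pattern matches any value at that level."""
    p_levels, e_levels = pattern.split("."), ec.split(".")
    if len(p_levels) != 4 or len(e_levels) != 4:
        raise ValueError("EC numbers have exactly four levels")
    return all(p == "-" or p == e for p, e in zip(p_levels, e_levels))


def filter_by_ec(pattern: str, ec_numbers: List[str]) -> List[str]:
    """Keep the EC numbers covered by `pattern`, preserving input order."""
    return [ec for ec in ec_numbers if ec_matches(pattern, ec)]


known = ["2.3.1.16", "1.1.1.36", "2.3.1.9", "2.4.1.1"]
print(filter_by_ec("2.3.1.-", known))  # ['2.3.1.16', '2.3.1.9']
```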



Owing to the limitations of the databases, we obtained only limited information, but combined with validation by literature and experiment it is enough to support a preliminary investigation. To reduce the impact of this limitation, we built the platform so that users can submit their experimental data to expand the database.







EXAMPLE4 - Validate by comparison with the traditional pathway of Tongji_China

Indole pathway

The characteristic blue of denim fabrics usually comes from indigo, and the high demand for this dye has led to the production of indigo by chemical synthesis on an industrial scale.


To promote the practical application of their method, they planned to remove the glucose inhibition of the circuit developed by team Berkeley 2013, making it possible to use low-cost carbon sources, and to find a cost-effective indole donor. Based on research into related industries, they designed an accessible, environmentally friendly indigo dye production system with application value.


During our earlier collaboration, we tried to find an indole donor by computational search. Disappointingly, there were no useful results, so they went back to the traditional pathway. After finishing our project, we searched for their pathway again to validate our software and to give Tongji_China a comparison between software search and the traditional approach.


Here is their pathway, obtained through traditional research.

Weight matrix: (Gibbs Weight: 1; Toxicity Weight: 1; Frequency Weight: 1)

v4-1

Fig10. Indole pathway


v4-2

Fig11. Indole pathway search result



Here is our software's result. After optimization, we found their pathway as expected; interestingly, however, this pathway's score is very low, and the gap to the top-ranked pathway is clear.

v4-3-2

Fig12. Software result screenshot



This result shows that our software is useful, and it highlights the difference between the traditional approach and computational search. We plan to carry out experiments after iGEM to determine which one performs better.

PROJECT - DEMONSTRATION

PROJECT - IMPROVE

Our software builds on the project of last year's Tongji_Software team. The main improvements are a new search algorithm and additional functions, including enzyme selection and parts design.

SEARCHING ALGORITHM

In theory, the greedy algorithm may fail to find the global optimum even as it improves speed. We therefore used both the DFS algorithm and the greedy algorithm to find specific pathways and compared the results. Our tests of the greedy algorithm's accuracy against DFS on the limited reaction set showed that the two algorithms have similar accuracy, while the greedy algorithm is significantly faster, so we regard the change as a worthwhile improvement.


DFS


DFS

Fig13. Comparison of running time between DFS & BFS


ADDITIONAL FUNCTIONS

To choose the enzymes needed for each reaction, we established our own evaluation model. In addition, keywords related to the compounds in the pathway are extracted and presented as a word cloud. When the final enzyme selection is returned to the user, the sequence is provided already optimized for the codon preference of the engineered chassis.


For parts design, we cleaned the data from the iGEM parts database and built a search engine that lets users search for parts by name or by function.
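Search by name or function can be sketched with an inverted index that maps each word to the parts mentioning it. The index below is a hypothetical illustration rather than Pathlab's search engine; the three part descriptions are brief paraphrases used as sample records, not the registry's full entries.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lower-case words; underscores are kept so part IDs like BBa_B0034 stay whole."""
    return re.findall(r"[a-z0-9_]+", text.lower())


class PartIndex:
    """Minimal inverted index over part names and descriptions."""

    def __init__(self, parts: Dict[str, str]) -> None:
        self.index: Dict[str, Set[str]] = defaultdict(set)
        for name, description in parts.items():
            for token in tokenize(name) + tokenize(description):
                self.index[token].add(name)

    def search(self, query: str) -> List[str]:
        """Return parts whose name or description contains every query word."""
        tokens = tokenize(query)
        if not tokens:
            return []
        hits = set.intersection(*(self.index.get(t, set()) for t in tokens))
        return sorted(hits)


parts = {
    "BBa_J23100": "constitutive promoter from the Anderson promoter family",
    "BBa_B0034": "ribosome binding site",
    "BBa_B0015": "double terminator",
}
index = PartIndex(parts)
print(index.search("promoter"))             # ['BBa_J23100']
print(index.search("BBa_B0034"))            # ['BBa_B0034']
print(index.search("terminator promoter"))  # []
```

Requiring every query word (set intersection) keeps results precise; a ranked, any-word search would be the natural next step for a larger parts collection.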


These functions can be used together as a whole or separately.

In addition, users can register their own accounts on our website and leave messages on the webpage. We monitor users' messages and keep optimizing Pathlab accordingly, and users can also comment on optimized enzymes or particular parts. Messages are visible to other users, so they can communicate through the message board and read others' comments about the enzymes or parts they plan to use.

PROJECT - COLLABORATION

The pathways found by our software are based on databases and algorithms and need to be verified by practical experiments. At the same time, our software's results can support pathway design for experimental teams.


Through CCiC, we communicated in depth with three other experimental teams working on pathway construction. We learned which substrates they had and which products they wanted, then used Pathlab to search for pathways and design parts, and compared the results with the pathways they implemented.

COLLABORATION 1: TONGJI_CHINA

Because we are from the same university, Tongji_China and our team collaborated closely from the very beginning, and we held several joint meetings. Their project focuses on production, while ours focuses on pathway search, so we received feedback from them after they used our software, and our results sometimes inspired them.


One of their suggestions that greatly influenced us was to avoid unreasonable results in which a group is added to a compound and then removed again, which is pointless. We therefore added code to prevent this from happening. We also searched for the pathway from tryptophan to indole that they used, but did not obtain a realistic, practical result; for example, some pathways fell into cycles. This made us realize that the database we used had limitations.
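A simple filter for such add-then-remove steps, and for the cycles mentioned above, is to reject any candidate pathway that returns to a compound it has already visited. This is a minimal sketch under that assumption, with placeholder compound names; the check actually added to Pathlab may be more specific.

```python
from typing import Dict, List, Optional, Tuple


def find_futile_cycle(pathway: List[str]) -> Optional[Tuple[int, int]]:
    """Return (i, j) with i < j if the pathway revisits a compound, e.g. when one step
    adds a group and a later step removes it again; return None if every compound is new."""
    first_seen: Dict[str, int] = {}
    for j, compound in enumerate(pathway):
        if compound in first_seen:
            return first_seen[compound], j
        first_seen[compound] = j
    return None


candidates = [
    ["A", "B", "C", "B", "D"],  # B -> C -> B undoes a modification
    ["A", "B", "C", "D"],
]
print([p for p in candidates if find_futile_cycle(p) is None])  # [['A', 'B', 'C', 'D']]
```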


Tongji_China aimed to improve the synthesis of indigo. New pathways can be found by reading the literature, by experimentally combining existing pathways, or by simulated synthesis through software design and retrieval; we helped with the software retrieval. However, we found no useful results in the existing database, because the indigo synthesis pathways in it had either already been published or required experimental materials that were too expensive to be suitable for synthesis. Nevertheless, the upstream and downstream information about indigo that we found provided some reference and support for their experiments. They also shared their experimental attempts with us to enrich our database so that we can design more efficient and useful pathways.


Later, they completed their synthesis pathway by combining two published pathways, and we further optimized our software. They then tested our software by searching for their pathway, and the search result gave us a sample comparison between software search and traditional experimental design.




Fig14. Collaboration with Tongji_China





COLLABORATION 2: WASHINGTON IGEM

Washington iGEM invited us to take part in producing their audiobook, a popular-science introduction to biology. We mainly translated and recorded text for them, so that Chinese students can read it and learn what biology is about.

COLLABORATION 3: SASTRA IGEM

SASTRA iGEM invited us to take part in producing their magazine. Our contributions included, but were not limited to, writing articles about synthetic biology and experiments, providing interviews with professionals, developing synthetic biology themes and taking related photographs.

COLLABORATION 4: UCD IGEM

We took part in UCD's research on the use of mammals in this year's iGEM competition.

COLLABORATION 5: UESTC_Software

The UESTC software team integrated various parts databases, with the iGEM parts database at the core, which makes it very convenient for users to search for related information. Integrated database information not only improves search efficiency but also provides strong data support for other software teams, and our collaboration is built on this data. Our own work covers pathway design: from the reaction, to the catalytic enzyme, to the choice of regulatory parts. Different users may choose different regulatory parts according to their experimental requirements, so we wanted to build a regulatory parts database and a search engine that lets users retrieve the parts they need. Because the UESTC software team had already collated the iGEM parts data, we established a collaboration with them; their data support reduced our workload. In addition, we link to their software, where users can find more complete information about the selected parts.




Fig15. Software logo of UESTC_Software





COLLABORATION 6: CAU iGEM

Our collaboration with China Agricultural University arose from their need for detailed information about their pathway, and it was also an attempt to apply our software in practice. They synthesized astaxanthin from glucose obtained by degrading cellulose, using a synthetic pathway taken from the literature. For the wet-lab team, however, the information available in the literature was limited, and searching the databases was time-consuming. We therefore used our software to search for possible pathways from lycopene to astaxanthin and gave them a PDF of the search results, from which they obtained reliable information for their experiments. They were pleasantly surprised to find information they had not expected from the literature. It would be interesting to test whether the pathways found by the software perform better than those from the literature, but time did not allow this verification; if possible, we will carry it out after iGEM.




Fig16. Collaboration with CAU_China





COLLABORATION 7: SJTU-software

SJTU-software contacted us to collaborate on the use and functions of our software, so we organized a face-to-face seminar at Shanghai Jiao Tong University and invited UESTC_Software to join online. Each team gave a demonstration of what their software does, which data it uses, what functions it has and how to use it. After the presentations, we discussed the problems in each team's software and offered suggestions. UESTC_Software's software is complete and user-friendly, so we gave them some detailed suggestions. For SJTU-software, we offered technical guidance: since we use the same framework to build our software, we showed them our source data and explained it. For our part, their advice made us aware of the shortcomings of our login function, and based on it we added the ability to comment on each result a user obtains.




Fig17. Collaboration with SJTU_software