Difference between revisions of "Team:Tongji Software/Project"

Line 726: Line 726:
 
         /*  -----------------------------  footer style --------------------------- */
 
         /*  -----------------------------  footer style --------------------------- */
 
         #FOOTER p{
 
         #FOOTER p{
               font-size:15px;
+
               font-size:13px;
 
         }
 
         }
 
         .footer-distributed {
 
         .footer-distributed {

Revision as of 18:07, 21 October 2019

Tongji Software | Pathlab

PROJECT

PROJECT - DESCRIPTION

OVERVIEW

With the development of synthetic biology, it has become possible to design metabolic pathways and implement them. An integrated platform for pathway construction is therefore urgently needed. Our software, Pathlab, addresses this demand with accurate and efficient algorithms and open data from the KEGG and BRENDA databases. It constructs an optimal synthetic pathway in E. coli or yeast based on the desired product provided by the user. For such a pathway, we comprehensively consider the requirements and provide information about the enzymes needed for each reaction step. Moreover, Pathlab provides additional functions: a novel reaction database that lets users try reactions that do not exist in the KEGG database, a Word Cloud of compounds built from keywords in the latest published literature, and a search engine for promoters and parts in the iGEM database.

WHY THIS PROJECT -- MEET THE NEEDS

A computational tool for pathway design and reconstruction is needed when synthetic biologists want to optimize genetic processes within cells, build models for yield prediction, perform flux balance analysis, and generate value-added products. However, when actually establishing a metabolic pathway, purchasing different enzymes separately from different suppliers and transferring them into a chassis is cumbersome. We believe that if all the enzymes in a pathway can be constructed in the same organism and transferred at one time, the experiment becomes easier and more convenient. So we developed our software, which has pathway search, enzyme selection, and parts browser functions, so that the pathway design process can be completed on one platform. In this process, synthetic DNA may be indispensable. Although the cost of synthetic DNA is not low at present, it continues to decline. We believe that synthetic DNA will become widely used in the future, and by that time our tools will be more practical.

HOW WE START -- INSPIRATION INSIDE IGEM

We are grateful to three previous iGEM projects that inspired us:
① Team Tongji-Software 2018: their tool AlphaAnt showed us a framework for designing a pathway.
② Team HokkaidoU_Japan 2012: their experiments gave us confidence that multiple enzymes can be constructed on the same plasmid.
③ Team IIT-Madras 2017: their statistics on codon preferences inspired our sequence optimization.

WHAT WE ARE DOING

In the main body of the software, building on the 2018 Tongji-Software project, we switched the search algorithm to a greedy algorithm to speed up the search while keeping the same accuracy, and we expanded the reaction database by adding novel reactions [1].


Based on how frequently various biological chassis are used, we offer users two chassis options: E. coli and yeast [2]. The results differ depending on the chassis selected by the user.


We select enzymes with higher catalytic efficiency based on the kinetic parameters of the enzyme itself [3]. To ensure that the enzyme is expressed normally, we use taxonomic knowledge and a comparison of important enzyme parameters to select organisms that are close to the chosen chassis as the sequence source for the enzyme. Subsequently, the codons are optimized according to the codon preference of the chosen chassis organism. In the parts section, we built a browser so that users can efficiently find the parts they want in the iGEM parts database, which helps them design their own personalized BioBricks.


In addition, considering that users may not be familiar with the latest research on the compounds in a pathway, we generate word clouds from the keywords of recently published literature for each compound. In this way, users may be able to explore more research directions.


After all, the results of design software are idealized. We need to establish a community where synthetic biologists can give feedback after the actual experiment and tell us about the performance of a certain enzyme under specific conditions. This community not only gives users a reference for the results, but also gives our developers a direction for improving the software and makes it possible for us to collect more data to refine the existing functions and even develop new functions as needed.

REFERENCES

[1] Hadadi N, MohammadiPeyhani H, Miskovic L, Seijo M, Hatzimanikatis V. Enzyme annotation for orphan and novel reactions using knowledge of substrate reactive sites. Proc Natl Acad Sci U S A. 2019;116(15):7298–7307.

[2] Kim J, Salvador M, Saunders E, González J, Avignone-Rossa C, Jiménez JI. Properties of alternative microbial hosts used in synthetic biology: towards the design of a modular chassis. Essays Biochem. 2016;60(4):303–313.

[3] Carbonell P, Wong J, Swainston N, Takano E, Turner NJ, Scrutton NS, Kell DB, Breitling R, Faulon JL. Selenzyme: enzyme selection tool for pathway design. Bioinformatics. 2018;34(12):2153–2154.


PROJECT - DESIGN

DATA

We updated the data used by the 2018 Tongji-Software team. The physicochemical properties of enzymes are collated from the BRENDA database, including the ratio of kcat to Km, the Km value, the optimal pH, and the optimal temperature.



Data formats compiled from several databases.jpg

Fig1. Data sources of Pathlab


SEARCHING ALGORITHM

Algorithm diagram.jpg

Instead of the DFS (depth-first search) algorithm used last year, we chose a greedy algorithm. A greedy algorithm is an algorithmic paradigm that follows the problem-solving heuristic of making the locally optimal choice at each stage with the intent of finding a global optimum. In many problems a greedy strategy does not produce an optimal solution; nonetheless, a greedy heuristic may yield locally optimal solutions that approximate a globally optimal solution in a reasonable amount of time. In our software, where the set of reactions is limited, our tests led us to conclude that the greedy algorithm also reaches a globally optimal solution, in less time.
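To make the greedy idea concrete, here is a minimal Python sketch. It is not Pathlab's actual implementation: the network layout (compound mapped to a list of next compounds with a per-step score), the function name `greedy_path`, and the toy scores are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy network: compound -> list of (next_compound, step_score).
Network = Dict[str, List[Tuple[str, float]]]


def greedy_path(network: Network, source: str, target: str,
                max_steps: int = 10) -> Optional[List[str]]:
    """Walk from source to target, always taking the best-scoring unvisited step."""
    path = [source]
    visited = {source}
    current = source
    for _ in range(max_steps):
        if current == target:
            return path
        candidates = [(score, nxt) for nxt, score in network.get(current, [])
                      if nxt not in visited]
        if not candidates:
            return None  # dead end: a pure greedy walk does not backtrack
        _, current = max(candidates)  # locally optimal choice
        visited.add(current)
        path.append(current)
    return path if current == target else None


# Illustrative scores only; they are not taken from any database.
toy: Network = {
    "A": [("B", 0.9), ("C", 0.4)],
    "B": [("D", 0.8)],
    "C": [("D", 0.7)],
    "D": [],
}
print(greedy_path(toy, "A", "D"))
```

Because the walk never backtracks, it visits each compound at most once, which is where the speed advantage over an exhaustive search comes from.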

RANKING CRITERIA

Detailed illustrated explanation.jpg

When scoring a pathway, we consider thermodynamic feasibility, competition from heterologous reactions, reaction frequency, and compound toxicity, the same factors used in last year's project. Each factor has a corresponding weight, and users can change the weight of each factor to meet different requirements. For example, a chemist looking for an in vitro reaction does not need to consider cytotoxicity and could set the toxicity weight to 0.
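The effect of the user-adjustable weights can be sketched as follows. The linear weighted sum, the factor names, and the assumption that every factor is already normalised so that a higher value is better are ours; the text above does not state the exact scoring formula.

```python
from typing import Dict

# Assumed factor names; each value is taken to be normalised to [0, 1],
# with higher meaning "better" (for toxicity: less toxic).
DEFAULT_WEIGHTS: Dict[str, float] = {
    "gibbs": 1.0,
    "competition": 1.0,
    "frequency": 1.0,
    "toxicity": 1.0,
}


def pathway_score(factors: Dict[str, float],
                  weights: Dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted sum of the factor scores (an assumed form of the ranking)."""
    return sum(weights.get(name, 0.0) * value for name, value in factors.items())


# Illustrative factor values for one hypothetical pathway.
example = {"gibbs": 0.8, "competition": 0.5, "frequency": 0.6, "toxicity": 0.2}

in_vivo = pathway_score(example)
# An in vitro user ignores cytotoxicity by setting its weight to 0.
in_vitro = pathway_score(example, {**DEFAULT_WEIGHTS, "toxicity": 0.0})
```

With the toxicity weight at 0, the toxicity term drops out of the sum, so pathways through toxic intermediates are no longer penalised.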






Detailed illustrated explanation.jpg


In the Enzyme Selection function, we search for the required enzyme in organisms closely related to the engineered chassis, according to their phylogenetic relatedness. If the same enzyme exists in multiple closely related organisms, we rank the candidate sequences according to the physicochemical properties of the enzyme, including the ratio of kcat to Km, the Km value, the optimal pH, and the optimal temperature. To measure how well these physicochemical properties fit, we built a model.
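The model itself is not specified here, so the following Python sketch only illustrates the kind of ranking described: catalytic efficiency is rewarded and a mismatch with the chassis conditions is penalised. The scoring formula, the `Candidate` fields, and the default chassis conditions (pH 7.0, 37 °C) are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    organism: str
    kcat_over_km: float  # catalytic efficiency
    opt_ph: float        # optimal pH
    opt_temp: float      # optimal temperature in degrees Celsius


def rank_candidates(cands: List[Candidate],
                    chassis_ph: float = 7.0,
                    chassis_temp: float = 37.0) -> List[Candidate]:
    """Sort candidates best-first by efficiency discounted by condition mismatch."""
    def score(c: Candidate) -> float:
        # Assumed penalty: 1 pH unit counts as much as 10 degrees Celsius.
        mismatch = abs(c.opt_ph - chassis_ph) + abs(c.opt_temp - chassis_temp) / 10.0
        return c.kcat_over_km / (1.0 + mismatch)
    return sorted(cands, key=score, reverse=True)


# Illustrative values only; they are not BRENDA entries.
candidates = [
    Candidate("organism Y", 150.0, 5.0, 57.0),
    Candidate("organism X", 100.0, 7.0, 37.0),
    Candidate("organism Z", 80.0, 7.5, 37.0),
]
ranked = [c.organism for c in rank_candidates(candidates)]
```

In this toy example the enzyme with the highest raw efficiency ranks last because its optimal conditions are far from those of the chassis.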



WORD CLOUD

In the early stage of establishing a project, researchers may not have a clear idea of each compound involved in the pathway, so it is essential to offer some aids for getting to know these compounds quickly. Therefore, we introduced a Word Cloud that visualizes the keywords of the latest published literature to show the current research directions for a given compound.
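A word cloud is drawn from keyword frequencies, so the core step is counting. The sketch below assumes the keywords of each paper are already available as lists of strings; the function name and the sample keywords are illustrative, and the rendering step is left out.

```python
from collections import Counter
from typing import Dict, Iterable, List


def keyword_frequencies(keyword_lists: Iterable[List[str]],
                        top_n: int = 5) -> Dict[str, int]:
    """Count normalised keywords across papers and keep the most frequent ones."""
    counts = Counter(
        kw.strip().lower()
        for keywords in keyword_lists
        for kw in keywords
        if kw.strip()
    )
    return dict(counts.most_common(top_n))


# Hypothetical keyword lists for papers mentioning one compound.
papers = [
    ["Naringenin", "Flavonoid"],
    ["flavonoid", "E. coli"],
    ["Flavonoid ", "antioxidant"],
]
freqs = keyword_frequencies(papers)
```

The resulting frequency dictionary can then be handed to any word cloud renderer, where a larger count means a larger word.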



Word cloud

Fig2. Word Cloud examples



CODON OPTIMIZATION

We obtained codon preference databases for E. coli and yeast from the Internet and used them to replace the infrequently used codons in the enzyme sequence. This avoids problems caused by differences in translation and gene expression in a heterologous host, and thus improves the success rate of expressing foreign genes in the host.
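The replacement of rare codons can be sketched in a few lines of Python. The codon-to-amino-acid assignments below follow the standard genetic code, but the table is deliberately partial, the usage frequencies are illustrative placeholders rather than values from a real codon usage database, and the rarity threshold is an assumption.

```python
from typing import Dict

# Partial codon table (standard genetic code), enough for the example.
CODON_TO_AA: Dict[str, str] = {
    "ATG": "M",
    "CTG": "L", "CTA": "L", "TTA": "L",
    "CGT": "R", "AGA": "R", "AGG": "R",
    "TAA": "*",
}

# Illustrative relative usage values only, NOT real measurements.
USAGE: Dict[str, float] = {
    "ATG": 1.00,
    "CTG": 0.50, "CTA": 0.04, "TTA": 0.13,
    "CGT": 0.38, "AGA": 0.04, "AGG": 0.02,
    "TAA": 0.64,
}


def optimize(seq: str, threshold: float = 0.10) -> str:
    """Replace codons used less often than threshold with the most used synonym."""
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3].upper()
        aa = CODON_TO_AA.get(codon)
        if aa is not None and USAGE[codon] < threshold:
            synonyms = [c for c, a in CODON_TO_AA.items() if a == aa]
            codon = max(synonyms, key=USAGE.__getitem__)
        out.append(codon)  # codons missing from the table are left unchanged
    return "".join(out)


optimized = optimize("ATGCTAAGATTATAA")
```

Only synonymous codons are swapped, so the encoded protein sequence stays the same while the rare codons disappear.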


Codon optimization

Table1. Codon usage frequency & score



PROJECT - CONTRIBUTION

WHAT WE DO:

Pathlab is built to make it easier for manufacturing or scientific researchers to realize a pathway in common engineered bacteria. With this goal, we developed Pathlab around the core idea of modular design. We divide the different functions into modules so that users can use a single module or any combination of modules.


To realize a pathway, users first need to know which pathway to use, so the most powerful function of our software is pathway search, which finds pathways given a substrate and a product. For those who just wonder what they can obtain from a compound they have, we provide a one-step search function. Users can also adjust the weight matrix and the results they want returned to meet their own needs.


Running screenshot.jpg

Fig5. Function Page of Pathway Search


Based on the pathway, they can use our software for enzyme selection and parts browsing, which are important for the subsequent experiments. In the enzyme selection part, we rank enzymes by taking their homology and basic physicochemical properties into consideration, and users can choose from the ranked results.


Running screenshot.jpg

Fig5. Function Page of Enzyme Selection.



The parts browser helps users search for parts from the iGEM database by keyword or part ID. Compared with querying parts one by one directly on the iGEM website, using the parts browser is more efficient.
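A search by part ID or keyword can be sketched as below. The record layout, the placeholder part IDs, and the matching rule (exact ID match first, otherwise all query terms must appear in the type or description) are assumptions for illustration, not the browser's actual logic.

```python
from typing import Dict, List

# Hypothetical records; real entries would come from the iGEM parts database.
PARTS: List[Dict[str, str]] = [
    {"id": "BBa_EXAMPLE1", "type": "promoter",
     "description": "strong constitutive promoter"},
    {"id": "BBa_EXAMPLE2", "type": "rbs",
     "description": "medium strength ribosome binding site"},
    {"id": "BBa_EXAMPLE3", "type": "promoter",
     "description": "inducible promoter, arabinose"},
]


def search_parts(parts: List[Dict[str, str]], query: str) -> List[Dict[str, str]]:
    """Return the part with a matching ID, or all parts matching every keyword."""
    q = query.strip().lower()
    if not q:
        return []
    exact = [p for p in parts if p["id"].lower() == q]
    if exact:
        return exact
    terms = q.split()
    return [
        p for p in parts
        if all(t in f'{p["type"]} {p["description"]}'.lower() for t in terms)
    ]


promoters = [p["id"] for p in search_parts(PARTS, "promoter")]
```

A query such as "inducible promoter" narrows the result to parts whose type or description contains both words.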


Running screenshot.jpg

Fig5. Function Page of Parts Browser



Since the central idea of synthetic biology is to use biological methods to synthesize substances, manufacturing is a major industry for synthetic biology, and our work is closely related to it. Because of limits on time and data, we cannot take every possible aspect into consideration to choose the perfect pathway, but Pathlab can help researchers read the literature in a more targeted way in the early stage of establishing a project.



PROJECT - VALIDATION

To verify whether Pathlab achieves its expected function, we used the software to search for several pathways and compared them with the actual pathways reported in the literature.
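This comparison amounts to asking at which rank the literature pathway appears in the software's result list. A minimal sketch, assuming each pathway is an ordered list of compound identifiers (the placeholder labels below are not real compounds):

```python
from typing import Optional, Sequence


def literature_rank(ranked: Sequence[Sequence[str]],
                    literature: Sequence[str]) -> Optional[int]:
    """Return the 1-based rank of the literature pathway, or None if absent."""
    for rank, pathway in enumerate(ranked, start=1):
        if list(pathway) == list(literature):
            return rank
    return None


# Hypothetical search output, best-scoring pathway first.
results = [
    ["S", "I1", "I2", "P"],
    ["S", "I3", "P"],
]
rank = literature_rank(results, ["S", "I3", "P"])
```

A rank of 1 means the software's top result matches the literature; `None` means the literature pathway was not found at all.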







EXAMPLE1 - Validate with Alpha Ant

Pathway for the production of flavonoids from glucose

The first validation example is taken from the validation case study of last year's project, Alpha Ant, because our project builds on and improves it.


Flavonoids comprise a large family of secondary plant metabolic intermediates that exhibit a wide variety of antioxidant and human health-related properties. However, their widespread use and availability are currently limited by inefficiencies in both their chemical synthesis and extraction from natural plant sources. As a result, significant strides have been made in recent years in improving the microbial production of flavonoids. A four-step pathway is known to be productive for the conversion of L-tyrosine to naringenin (C00509), the main flavonoid precursor.

Weight matrix : (Gibbs Weight:1; Toxicity Weight:1; Frequency Weight:1)

v1-1

v1-2

Fig5. Searching results by Pathlab (top) & other pathway predicting tools (bottom)



As the figure shows, Pathlab returns the same pathway used in the literature, which suggests that our software works and that its result is consistent with the literature.







EXAMPLE2 - Validate with iGEM19_CAU_China

Astaxanthin synthesis pathway

Astaxanthin is among the most powerful antioxidants found in nature. It has a wide range of health care functions, including fighting high blood pressure by reducing oxidative stress and relaxing blood vessel walls, and even inhibiting cancer metastasis. Astaxanthin has a promising market, with products of over 98% purity sold by SIGMA for up to $200 per 50 mg. This year CAU_China constructed an engineered Escherichia coli strain that uses cellulose to produce astaxanthin, addressing the problem of stalk disposal in China.


The enzymes involved in each step of the astaxanthin synthesis pathway are well understood. So, as part of our collaboration, we used their pathway to validate our software.

Here is their pathway:

First, we searched for the astaxanthin synthesis pathway starting from farnesyl pyrophosphate. As the result shows, Pathlab finds the pathway they use efficiently. Even better, the pathway CAU_China used is ranked first in our results, which shows that our software is effective at pathway search.

Weight matrix : (Gibbs Weight:1; Toxicity Weight:1; Frequency Weight:1)

v2-1

Fig6. Searching result for example2



v2-2

Fig7. Pathway construct report



Then, to validate our enzyme selection part, we used our software to select enzymes for each reaction. The report contains the source organisms of the enzymes they use. We cannot offer complete information because of the limitations of the databases we use. However, we can give a suitable enzyme selection result with the existing data. So this is also a collaboration.


v2-3v2-4

Fig8. Enzyme selection report







EXAMPLE3 - Validate with iGEM12_Tokyo_Tech

Synthesize P(3HB)

Polyhydroxyalkanoates (PHAs) are biological polyesters synthesized by a wide range of bacteria, and can be produced by fermentation from renewable carbon sources such as sugars and vegetable oil. Team iGEM12_Tokyo_Tech created the first BioBrick part to synthesize P(3HB), a kind of PHA. We initially chose this project for validation because of the completeness of its pathway and enzyme information and the romantic story it contains.

Their pathway is

Weight matrix : (Gibbs Weight:1; Toxicity Weight:1; Frequency Weight:1)

First of all, we can find this pathway in our software.

v3-1

The enzymes they used for each step are 2.3.1.16 > 1.1.1.36 > 2.3.1.-
Here is our selection result; the enzyme donor they used is included.

v3-2-1

v3-2-2

v3-2-3

Fig9. Enzyme donor results. (The organism highlighted in orange is the bacterium they used.)



Because of the limitations of the databases, we obtain only a little information, but it is enough to support preliminary investigation, as validated by literature and experiment. To reduce the trouble caused by this limitation, we have built a platform where users can submit their experimental data to expand the database.







EXAMPLE4 - Validate with comparison to traditional pathway by Tongji_China

Indole pathway

The representative blue of denim fabrics usually derives from indigo, and the high demand for such dyes has led to the production of indigo by chemical synthesis on an industrial scale.


To promote the practical application of this method, they plan to remove the glucose inhibition of the circuit, based on the work of team Berkeley 2013, making it possible to use low-cost carbon sources, and to find a cost-effective indole donor. Through research into related industries, they designed an accessible, environmentally friendly indigo dye production system with application value.


During our earlier collaboration, we tried to find an indole donor by computational search. Disappointingly, there were no useful results, so they went back to the traditional pathway. After finishing our project, we searched for their pathway again to validate our software and to give Tongji_China a comparison between the software search and the traditional approach.


Here is the pathway they obtained through traditional research.

Weight matrix : (Gibbs Weight:1; Toxicity Weight:1; Frequency Weight:1)

v4-1

Fig10. Indole pathway


v4-2

Fig11. Indole pathway searching result



Here is our software result. After the optimization, we found their pathway as expected, but interestingly, this pathway's score is very low, as the difference from the top-ranked pathway shows.

v4-3-2

Fig12. Software result screenshot



This result shows that our software is useful, and it shows the difference between the traditional approach and the computational one. We plan to do experiments after iGEM to determine which one is better.

PROJECT - DEMONSTRATION



This video shows how to use our software and the functions Pathlab has. As the video shows, our software works. The efficiency of our software is demonstrated below.




PROJECT - IMPROVE

Our software was built on the project of last year's Tongji_Software team. The main improvements are the change of search algorithm and the additional software functions, including enzyme selection and parts design.

SEARCHING ALGORITHM

In theory, the greedy algorithm may fail to reach a global optimum while improving speed. However, we used both the DFS algorithm and the greedy algorithm to find specific pathways and compared the results. In these tests on our limited set of reactions, the accuracy of the two algorithms was similar, while the greedy algorithm was significantly faster, so we regard it as a good improvement.
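The kind of comparison described above can be reproduced on a toy scale. The sketch below is not the team's test harness: the network format, the use of a summed step score as the pathway score, and both example networks are assumptions. It shows one network where the two searches agree and one constructed "trap" where the greedy choice misses the best pathway.

```python
import time
from typing import Dict, List, Optional, Tuple

Network = Dict[str, List[Tuple[str, float]]]


def dfs_best(network: Network, source: str, target: str) -> Optional[List[str]]:
    """Exhaustive depth-first search: return the simple path with the highest total score."""
    best_total = float("-inf")
    best_path: Optional[List[str]] = None

    def visit(node: str, path: List[str], total: float) -> None:
        nonlocal best_total, best_path
        if node == target:
            if total > best_total:
                best_total, best_path = total, list(path)
            return
        for nxt, score in network.get(node, []):
            if nxt not in path:
                path.append(nxt)
                visit(nxt, path, total + score)
                path.pop()

    visit(source, [source], 0.0)
    return best_path


def greedy(network: Network, source: str, target: str) -> Optional[List[str]]:
    """Take the best-scoring unvisited step each time, without backtracking."""
    path = [source]
    while path[-1] != target:
        options = [(s, n) for n, s in network.get(path[-1], []) if n not in path]
        if not options:
            return None
        path.append(max(options)[1])
    return path


def timed(fn, *args):
    """Return (result, elapsed seconds) for one call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Illustrative scores only.
agree: Network = {"A": [("B", 0.9), ("C", 0.4)], "B": [("D", 0.8)], "C": [("D", 0.7)]}
trap: Network = {"A": [("B", 0.9), ("C", 0.8)], "B": [("D", 0.1)], "C": [("D", 0.9)]}

for name, net in (("agree", agree), ("trap", trap)):
    d, t_dfs = timed(dfs_best, net, "A", "D")
    g, t_greedy = timed(greedy, net, "A", "D")
    print(name, d, g, d == g)
```

On networks this small the timings are not meaningful; the point of the sketch is that agreement between the two searches has to be checked empirically, as the tests described above did.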


DFS


DFS

Fig13. Comparison of time consumption between DFS & Greedy


UPDATED & APPENDED DATABASE

Compared with the data used last year, we updated all data, including but not limited to KEGG pathway data, compound toxicity data, reaction Gibbs free energies, Km, and kcat. We also added new data this year, such as novel reaction data from the LCSB database [1], enzyme sequences from KEGG, enzyme physicochemical properties from BRENDA [2], and parts data from the iGEM parts registry [3].

All these updated and newly added data have greatly improved the reliability of our search results, which gives our software better performance.


ADDITIONAL FUNCTIONS

NOVEL REACTION


From the feedback on last year's project and the survey we did this year, we found a strong need to add novel reactions to our project. Introducing novel reactions creates more possible pathway search results, which can make the search process easier. After getting permission from the LCSB database, we expanded our pathway and reaction database with their ATLAS reaction database. In total we obtained 137879 raw novel reaction records; after filtering and selection, 49396 novel reactions with a high confidence level remain. Most of the novel reactions are predicted and generated by BNICE.ch, a powerful computational method for exploring the theoretical space of biochemistry. Users can decide for themselves whether to include novel reactions in the search. If they want something new, they can click the 'yes' button next to 'Novel reaction' on the pathway build page of Pathlab, and novel reactions will then be taken into consideration.
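The filtering and the user toggle can be pictured as follows. The `confidence` field, the 0.8 threshold, and the record layout are hypothetical; the text above does not state how the high-confidence subset was selected.

```python
from typing import Any, Dict, List

Reaction = Dict[str, Any]


def reaction_pool(known: List[Reaction], novel: List[Reaction],
                  include_novel: bool, min_confidence: float = 0.8) -> List[Reaction]:
    """Build the set of reactions searched, optionally adding confident novel ones."""
    pool = list(known)
    if include_novel:  # corresponds to the user choosing 'yes' for novel reactions
        pool += [r for r in novel if r["confidence"] >= min_confidence]
    return pool


# Hypothetical records for illustration.
known = [{"id": "R_known_1"}]
novel = [
    {"id": "N1", "confidence": 0.95},
    {"id": "N2", "confidence": 0.40},
]
with_novel = [r["id"] for r in reaction_pool(known, novel, include_novel=True)]
```

Leaving the toggle off keeps the search restricted to known reactions, so the default behaviour is unchanged for users who do not want predicted reactions.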


Although we cannot confirm that the predicted enzyme information is 100% correct, the example below shows that adding novel reactions can complete the reaction network and offer users more choices, making our project more practical.

novel reaction


In choosing the enzymes needed for each reaction, we established our own judgment model. At the same time, the keywords related to the compounds in the pathway are collected and presented as a word cloud. When providing the final enzyme selection result to the user, we provide a sequence optimized for the codon preference of the engineered chassis. Through this selection method we try to avoid problems with altered enzyme function and gene expression.



ENZYME RECOMMENDATION


Finding a possible pathway is just the first step; there is still much that can be done after pathway search. Alpha Ant, last year's project, stopped at pathway search, and this year we wanted to go further. After obtaining predicted pathways, Pathlab recommends enzymes for each step in every pathway and offers information about each enzyme. We established our own judgment model to choose the enzyme for each reaction, which takes optimal pH, optimal temperature, Km, kcat, and other enzyme properties into consideration. If the enzyme we need does not exist in the chosen chassis, Pathlab recommends the several most likely enzymes from related species and offers their optimized sequences.


Through these improvements to enzyme selection and recommendation, we want to give our users more useful information and a better user experience.




CODON OPTIMIZATION


To give our enzymes a better expression level in the target organism (chassis), Pathlab not only recommends the most likely enzymes from related organisms but also codon-optimizes them. The expression level of an exogenous gene is influenced by many factors, such as differences in codon usage bias and GC content between organisms. Codon optimization can make an exogenous gene perform better in the chassis's codon environment and thereby reach a better expression level.


We added this function to make Pathlab more practical, and we think it will be a big help for users who want to introduce an exogenous pathway into their project.




WORD CLOUD


In the early stage of establishing a project, researchers may not have a clear idea of each compound involved in the pathway, so it is essential to offer some aids for getting to know these compounds quickly. Therefore, we introduced a Word Cloud that visualizes the keywords of the latest published literature to show the current research directions for a given compound.




PARTS BROWSER


The parts browser uses data from the iGEM parts database, and we built a search engine that lets users search for parts by name or by function. This function helps users build the final BioBricks.



The central idea of our software is modular design, which means all the functions mentioned can be used together as a whole or separately.


BETTER RESULT DISPLAY & USER EXPERIENCE

Each time you use Pathlab, you can choose to download a report of the information you selected on the website in order to keep a record.

What's more, users can register their own account on our website and leave messages on the webpage. On the one hand, the data we use are not complete, and we need more data from users, such as the performance of enzymes under certain conditions, so user accounts help us collect more data. Other users can also learn more details about the enzymes or parts they want to use. On the other hand, we will keep paying attention to users' messages and continually optimize the functions of Pathlab so that users have a better experience and get practical results.

PROJECT - COLLABORATION

The paths found by our software are based on databases and algorithms, which need to be verified by practical experiments. At the same time, the results obtained by our software can provide support for the path design of the experimental team.


Through CCiC, we had in-depth communication with three other experimental teams working on pathway construction. We learned which substrates they have and which products they want to obtain, then tried to design parts by searching for paths with Pathlab and to verify the pathways they implement.

COLLABORATION 1:TONGJI_CHINA

Since we are from the same school, we and Tongji_China have collaborated closely from the very beginning. We held several meetings together. Their project is about manufacturing while ours is about pathway search, so we got feedback from them after they used our software, and our results also inspired them.


One of their suggestions that had a great influence on us was that we should avoid unreasonable results that put a group onto a compound and then take it off again, which is useless. So we added code to prevent this situation. We also searched for the pathway from tryptophan to indole that they used, but we did not get a practical result; for example, some pathways fell into a cycle. Thus, we realized that the database we used had limitations.
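Both problems mentioned here, adding a group and then removing it again and pathways that fall into a cycle, show up as a compound appearing twice in the same pathway. A minimal check, assuming a pathway is an ordered list of compound identifiers (the labels are placeholders and this is not the code the team added):

```python
from typing import List, Sequence


def revisits_compound(pathway: Sequence[str]) -> bool:
    """True if any compound occurs more than once, i.e. the pathway loops back."""
    return len(set(pathway)) < len(pathway)


def drop_futile(pathways: List[List[str]]) -> List[List[str]]:
    """Keep only pathways in which every compound is visited once."""
    return [p for p in pathways if not revisits_compound(p)]


# Hypothetical search output: the second pathway adds a group to A and removes it again.
found = [
    ["A", "B", "C"],
    ["A", "A-X", "A", "C"],
]
kept = drop_futile(found)
```

Applying such a filter during the search, rather than afterwards, also stops the search from expanding looping pathways in the first place.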



Fig14.1. Collaboration with Tongji_China



Tongji_China tried to improve the synthesis of indigo. New pathways can be found by reading literature, by experimentally combining existing pathways, or by simulated synthesis through software design and retrieval. We therefore helped with software retrieval. Because the synthetic indigo pathways in the data were either already published in the literature or required experimental materials that were too expensive and unsuitable for synthesis, we did not find useful results in the existing database. However, the upstream and downstream information about indigo that we found provided some reference and support for their experiments. They also shared their attempts with us to enrich our database for designing more efficient and useful pathways.


Later, they completed their synthesis pathway design based on several published pathways, and we optimized our software. They then tested our software by searching for their pathway; the search result gives us a sample comparison between software and traditional experiment.




Fig14.2. The indole synthesised by Tongji_China





COLLABORATION 2:UESTC_Software

The UESTC software team integrates various parts databases, with the iGEM parts database as the main body of the integration, which makes it very convenient for users to search for related information. Integrated database information not only improves search efficiency but also gives other software teams strong data support, and our collaboration is based on data. Our work completes the pathway design part, from the reaction to the catalytic enzyme and then to the choice of regulatory parts. For the regulatory parts, different users can choose different options according to their experimental requirements. We wanted to establish a regulatory parts database and build a search engine so that users can retrieve the corresponding parts according to their own needs. The UESTC software team had already collated the data of the iGEM parts database, so we established a cooperation with them. They provided us with data support, which reduced our workload. In addition, we provide a link to their software, where users can get more complete information on selected parts.




Fig15. Software logo of UESTC_Software





COLLABORATION 3:CAU iGEM

Our cooperation with China Agricultural University is based on their need for detailed information about their pathway, and it is also an attempt to apply our software in practice. They synthesized astaxanthin from glucose derived from the degradation of cellulose. The synthetic pathway was retrieved from the literature, but the information available there was limited for the technical team, and searching through databases was time-consuming. So we used our software to search for possible pathways from lycopene to astaxanthin. Finally, we provided them with a PDF of the software search results, from which they got some reliable information for their experiments. They were pleasantly surprised to get information that they had not expected from the literature. It would be interesting to see whether the software search results perform better than those from the literature, but this verification takes time, so if possible we will complete it after iGEM.




Fig16. Collaboration with CAU_China





COLLABORATION 4:SJTU-software

SJTU-software contacted us to collaborate on the use and functions of our software, so we organized a face-to-face seminar at Shanghai Jiao Tong University and invited UESTC_Software to join online. Each team presented what they do, which data they use, what functions they have, and how to use their software, in the style of a demonstration. After the presentations, we discussed the problems in each software and put forward advice for each team. UESTC_Software's software is complete and user-friendly, so we gave them suggestions on details. We gave SJTU-software some technical guidance: we use the same framework to build our software, so we showed them our source data and explained it. For our part, their advice made us aware of the shortcomings of our login function, and based on it we added the ability to comment on each result a user gets.




Fig17. Collaboration with SJTU-software





COLLABORATION 5:PROMOTION of SYNTHETIC BIOLOGY

SASTRA iGEM invited us to participate in producing their magazine. Our contributions included, but were not limited to, writing articles about synthetic biology and experiments, providing interviews with professionals, creating synthetic-biology-themed content, and taking related photographs.

Washington iGEM invited us to participate in producing their audiobook about biology. We mainly did translation and recording work for them, helping Chinese students learn what synthetic biology is and join some interesting experiments.