Revision as of 12:28, 21 October 2019

Tongji Software | Pathlab

PROJECT

PROJECT - DESCRIPTION

OVERVIEW

With the development of synthetic biology, it has become possible to design metabolic pathways and implement them in living cells, so an integrated platform for pathway construction is urgently needed. Our software, Pathlab, addresses this demand with accurate and efficient algorithms and open data from the KEGG and BRENDA databases. Given a desired product provided by the user, it constructs an optimal synthetic pathway in E. coli or yeast. For each pathway, we comprehensively consider the requirements and provide information about the enzymes needed for each reaction step. Pathlab also provides additional functions, such as word clouds of keywords from pathway-related literature and a search engine for promoters and other parts used in iGEM.

WHY THIS PROJECT -- MEET THE NEEDS

A computational tool for pathway design and reconstruction is needed when synthetic biologists want to optimize genetic processes within cells, build models for yield prediction, perform flux balance analysis, or generate value-added products. However, when actually establishing a metabolic pathway, it is cumbersome to purchase different enzymes separately from different suppliers and transfer them into the chassis. We propose that all the enzymes in a pathway can be constructed on the same plasmid and transferred at one time. Regulating enzyme expression under different conditions then ensures that the pathway is realized. Synthetic DNA may be an indispensable part of this process. Although the cost of synthetic DNA is not low at present, it continues to decline. We believe that synthetic DNA will become widespread in the future and that our tool will be more practical by then.

HOW WE START -- INSPIRATION INSIDE IGEM

We appreciate three previous iGEM projects that inspired us:
①Team Tongji-Software 2018: their useful tool AlphaAnt showed us a framework for designing a pathway.
②Team HokkaidoU_Japan 2012: their experiments gave us confidence that multiple enzymes can be constructed on the same plasmid.
③Team IIT-Madras 2017: their statistics on codon preferences inspired our sequence optimization.

WHAT WE ARE DOING

For the main body, we built on the 2018 Tongji-Software project: we replaced the search algorithm with a greedy algorithm, which runs faster with the same accuracy, and expanded the reaction database by adding novel reactions ^[1].

Based on how frequently various biological chassis are used, we offer users two chassis options: E. coli and yeast ^[2]. The results differ depending on the chassis the user selects.

We select enzymes with higher catalytic efficiency according to the kinetic parameters of the enzyme itself ^[3]. To ensure that the enzyme is expressed normally, we use taxonomic knowledge and a comparison of important enzyme parameters to choose organisms closely related to the selected chassis as the sequence source for the enzyme. The codons are then optimized according to the codon preference of the selected chassis organism. In the parts section, we built a browser so that users can efficiently find the parts they want in the iGEM parts database, which helps them design their own personalized BioBricks.

In addition, since users may not be familiar with the latest research on the compounds in a pathway, we generate a word cloud for each compound from the keywords of recently published literature. In this way, users may be able to explore more research directions.

After all, the results of design software are idealized. We need to establish a community where synthetic biologists can give feedback after the actual experiment and tell us about the performance of a certain enzyme under specific conditions. This community not only gives users a reference for interpreting the results, but also gives our developers a direction for improving the software and lets us collect more data to perfect the existing functions and even develop new functions as needed.

REFERENCES

[1] Hadadi N, MohammadiPeyhani H, Miskovic L, Seijo M, Hatzimanikatis V. Enzyme annotation for orphan and novel reactions using knowledge of substrate reactive sites. Proc Natl Acad Sci U S A. 2019;116(15):7298–7307.

[2] Kim J, Salvador M, Saunders E, González J, Avignone-Rossa C, Jiménez JI. Properties of alternative microbial hosts used in synthetic biology: towards the design of a modular chassis. Essays Biochem. 2016;60(4):303–313.

[3] Carbonell P, Wong J, Swainston N, Takano E, Turner NJ, Scrutton NS, Kell DB, Breitling R, Faulon JL. Selenzyme: enzyme selection tool for pathway design. Bioinformatics. 2018;34(12):2153–2154.

PROJECT - DESIGN

DATA

We updated the data collected by the 2018 Tongji-Software team. The physicochemical properties of enzymes are collated from the BRENDA database, including the kcat/Km ratio, the Km value, the optimal pH and the optimal temperature.

Fig1. Data sources of Pathlab

SEARCHING ALGORITHM

Instead of the DFS (depth-first search) algorithm used last year, we chose a greedy algorithm. A greedy algorithm is an algorithmic paradigm that follows the problem-solving heuristic of making the locally optimal choice at each stage with the intent of finding a global optimum. For many problems, a greedy strategy does not produce an optimal solution; nonetheless, a greedy heuristic may yield locally optimal solutions that approximate a globally optimal solution in a reasonable amount of time. In our software, where the set of reactions is limited, our tests led us to conclude that the greedy algorithm also reaches a globally optimal solution, and in less time.
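
The greedy idea can be illustrated with a minimal sketch. It is not Pathlab's actual implementation: the toy network REACTIONS, the step costs and the function name greedy_path are all hypothetical, and we simply assume that each reaction step carries a cost where lower is better.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy reaction network: compound -> [(product, step_cost), ...].
# Lower cost means a more favourable reaction step.
REACTIONS: Dict[str, List[Tuple[str, float]]] = {
    "A": [("B", 1.0), ("C", 2.0)],
    "B": [("D", 5.0)],
    "C": [("D", 1.0)],
    "D": [],
}


def greedy_path(start: str, target: str, max_steps: int = 10) -> Optional[List[str]]:
    """Follow the cheapest unvisited reaction at each step until the target is reached."""
    path = [start]
    visited = {start}
    current = start
    for _ in range(max_steps):
        if current == target:
            return path
        candidates = [
            (cost, product)
            for product, cost in REACTIONS.get(current, [])
            if product not in visited
        ]
        if not candidates:
            return None  # dead end: a pure greedy search does not backtrack
        _, current = min(candidates)  # locally optimal choice
        visited.add(current)
        path.append(current)
    return path if current == target else None
```

On this toy network, greedy_path("A", "D") takes the cheap first step to B and ends with A, B, D (total cost 6), although A, C, D would cost only 3. This is the theoretical weakness described above, and it is why the greedy result has to be checked against an exhaustive search on the real reaction set.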

RANKING CRITERIA

When scoring a pathway, we consider thermodynamic feasibility, competition from heterologous reactions, reaction frequency and compound toxicity, the same factors used in last year's project. Each factor has a corresponding weight, and users can change the weights to meet different requirements. For example, a chemist planning an in vitro reaction does not need to consider cytotoxicity and could set the toxicity weight to 0.
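
A weighted score of this kind can be sketched as a weighted sum. This is only an assumption about the form of the score: the factor names follow the weight matrix used in our validation examples (Gibbs, toxicity, frequency), while the function name pathway_score and the convention that every factor is normalized so that higher is better are hypothetical.

```python
from typing import Dict, Optional

# Default weights, mirroring a weight matrix of 1 for every factor.
DEFAULT_WEIGHTS: Dict[str, float] = {"gibbs": 1.0, "toxicity": 1.0, "frequency": 1.0}


def pathway_score(factors: Dict[str, float],
                  weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted sum of per-pathway factor scores.

    Assumes each factor is already normalized so that higher is better.
    A factor with no entry in `weights` contributes nothing.
    """
    if weights is None:
        weights = DEFAULT_WEIGHTS
    return sum(weights.get(name, 0.0) * value for name, value in factors.items())


# Hypothetical factor scores for one candidate pathway.
example = {"gibbs": 0.8, "toxicity": 0.5, "frequency": 0.6}

# In vitro use case: cytotoxicity is irrelevant, so its weight is set to 0.
in_vitro_weights = {"gibbs": 1.0, "toxicity": 0.0, "frequency": 1.0}
```

With the default weights all three factors contribute to the score of the example pathway; with in_vitro_weights only the Gibbs and frequency terms remain, so a pathway through a toxic intermediate is no longer penalized.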

In the Enzyme Selection function, we search for the required enzyme in organisms closely related to the chassis, ordered by how closely they are related. If the same enzyme exists in several closely related organisms, we rank the candidates by the physicochemical properties of the enzyme, including the kcat/Km ratio, the Km value, the optimal pH and the optimal temperature. To measure how well these physicochemical properties suit the chassis, we built a model.
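
The model itself is not specified here, so the following is only a sketch of one possible ranking under stated assumptions: candidates are ordered first by a hypothetical taxonomic distance to the chassis, then by kcat/Km (higher first), then by Km (lower first), then by how close the optimal pH and temperature are to assumed host conditions. The class Candidate, the function rank_candidates and the default host values are illustrative names and numbers, not Pathlab's.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    organism: str
    tax_distance: int    # hypothetical distance to the chassis (0 = same species)
    kcat_over_km: float  # catalytic efficiency, higher is better
    km: float            # lower means higher substrate affinity
    opt_ph: float        # optimal pH of the enzyme
    opt_temp: float      # optimal temperature in degrees Celsius


def rank_candidates(candidates: List[Candidate],
                    host_ph: float = 7.0,
                    host_temp: float = 37.0) -> List[Candidate]:
    """Sort candidate enzymes from most to least suitable for the host."""
    def key(c: Candidate):
        return (
            c.tax_distance,               # closer relatives first
            -c.kcat_over_km,              # then higher catalytic efficiency
            c.km,                         # then lower Km
            abs(c.opt_ph - host_ph),      # then pH closest to host conditions
            abs(c.opt_temp - host_temp),  # then temperature closest to host
        )
    return sorted(candidates, key=key)
```

A lexicographic ordering like this is the simplest choice; a weighted model that trades a slightly more distant source organism against a much better kcat/Km would need a scoring function instead of a sort key.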

WORD CLOUD

In the early stage of establishing a project, researchers may not have a clear idea of each compound involved in the pathway, so it is essential to give them some aids for getting to know these compounds quickly. Therefore, we introduced a word cloud that visualizes the keywords of recently published literature and clearly shows the current research directions for a given compound.
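
A word cloud is drawn from keyword frequencies, so the counting step can be sketched with the Python standard library alone. The stop-word list, the function name keyword_frequencies and the input titles are hypothetical; how Pathlab actually collects and filters keywords is not described here.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Small illustrative stop-word list; a real one would be much longer.
STOPWORDS = {"the", "of", "and", "in", "a", "to", "is", "for", "by", "from"}


def keyword_frequencies(texts: Iterable[str], top_n: int = 5) -> List[Tuple[str, int]]:
    """Count how often each non-stop-word occurs across titles or abstracts."""
    words: List[str] = []
    for text in texts:
        words += [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    return Counter(words).most_common(top_n)


# Hypothetical literature titles for one compound.
titles = [
    "Naringenin biosynthesis in yeast",
    "Biosynthesis of naringenin from tyrosine",
]
```

The resulting (word, count) pairs are what a rendering library would scale into font sizes: words that occur in many recent papers appear larger.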

Fig2. Word Cloud examples

CODON OPTIMIZATION

We obtained codon preference data for E. coli and yeast from online databases and used them to replace the infrequently used codons in the enzyme sequence. This avoids problems caused by differences in translation and gene expression in a heterologous host and thus improves the success rate of expressing foreign genes in the host.
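
The replacement step can be sketched as follows. The usage fractions below are made-up placeholders for illustration only and cover just three amino acids; real values must come from a codon usage database for the chosen host. The names USAGE, optimize and the 0.10 threshold are assumptions, not Pathlab's parameters.

```python
from typing import Dict

# Placeholder usage fractions (NOT real data): amino acid -> {codon: fraction}.
USAGE: Dict[str, Dict[str, float]] = {
    "R": {"CGT": 0.40, "AGA": 0.05},
    "L": {"CTG": 0.50, "CTA": 0.04},
    "M": {"ATG": 1.00},
}

# Reverse lookup: codon -> amino acid, built from the table above.
CODON_TO_AA = {codon: aa for aa, table in USAGE.items() for codon in table}


def optimize(seq: str, threshold: float = 0.10) -> str:
    """Replace codons used less often than `threshold` with the host's preferred synonym."""
    out = []
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        aa = CODON_TO_AA.get(codon)
        if aa is not None and USAGE[aa][codon] < threshold:
            # Swap in the most frequently used synonymous codon.
            codon = max(USAGE[aa], key=USAGE[aa].get)
        out.append(codon)  # codons missing from the table are kept unchanged
    return "".join(out)
```

Because only synonymous codons are swapped, the encoded protein sequence stays the same; only the DNA sequence changes.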

Table1. Codon usage frequency & score

PROJECT - CONTRIBUTION

WHAT WE DO :

Building a complete pathway requires three steps: searching for a pathway, selecting the related enzymes, and designing parts. These steps are quite difficult for one researcher to carry out alone, so we aim to integrate the whole process into one piece of software and free researchers from complicated and tedious work.

With this goal, we developed our software, Pathlab, whose core idea is modular design. Users can choose a single module or any combination of modules.

In brief, Pathlab gives people who work in synthetic biology a platform for searching for a pathway that can be applied in practice.

Fig3. Software screenshots

PROJECT - VALIDATION

To verify whether Pathlab achieves its expected function, we used the software to search for several pathways and compared them with the actual pathways in the literature.

EXAMPLE1 - Validate with Alpha Ant

Pathway for the production of flavonoids from glucose

The first validation example is taken from the validation case study of last year's project, Alpha Ant, because our project improves on it.

Flavonoids comprise a large family of secondary plant metabolic intermediates that exhibit a wide variety of antioxidant and human health-related properties. However, their widespread use and availability are currently limited by inefficiencies in both their chemical synthesis and their extraction from natural plant sources. As a result, significant strides have been made in recent years in improving the microbial production of flavonoids. A four-step pathway is known to be productive for the conversion of L-tyrosine to naringenin (C00509), the main flavonoid precursor.

Weight matrix : (Gibbs Weight:1; Toxicity Weight:1; Frequency Weight:1)

Fig5. Searching results by Pathlab (top) & other pathway predicting tools (bottom)

As the figure shows, we obtain the same pathway as the one used in the literature, which suggests that our software works and that its result is reliable from the perspective of the literature.

EXAMPLE2 - Validate with iGEM19_CAU_China

Astaxanthin synthesis pathway

Astaxanthin is one of the most powerful antioxidants found in nature. It has a wide range of health benefits, including fighting high blood pressure by reducing oxidative stress and relaxing blood vessel walls, and even inhibiting cancer metastasis. Astaxanthin has a promising market: products of over 98% purity are sold by SIGMA for up to $200 per 50 mg. This year, CAU_China constructed an engineered Escherichia coli that uses cellulose to produce astaxanthin, in order to address the problem of stalk disposal in China.

The enzymes involved in each step of the astaxanthin synthesis pathway are well understood, so as part of our collaboration we used their pathway to validate our software.

Here is their pathway:

First, we searched for the astaxanthin synthesis pathway starting from farnesyl pyrophosphate. As the result shows, our software finds the pathway they use efficiently. Even more exciting, the pathway CAU_China used ranks first in our results, which shows that our software is effective at pathway search.

Weight matrix : (Gibbs Weight:1; Toxicity Weight:1; Frequency Weight:1)

Fig6. Searching result for example2

Fig7. Pathway construct report

Then, to validate our enzyme selection part, we used our software to select enzymes for each reaction. The report contains the source organisms of the enzymes they use. We cannot offer complete information because of the limitations of the databases we use, but we can give a suitable enzyme selection result with the existing data. So this is also a collaboration.

Fig8. Enzyme selection report

EXAMPLE3 - Validate with iGEM12_Tokyo_Tech

Synthesis of P(3HB)

Polyhydroxyalkanoates (PHAs) are biological polyesters synthesized by a wide range of bacteria, and can be produced by fermentation from renewable carbon sources such as sugars and vegetable oil. Team iGEM12_Tokyo_Tech created the first BioBrick part to synthesize P(3HB), a kind of PHA. We chose this project for validation because of its complete information on the pathway and enzymes, and because of the romantic story behind it.

Their pathway is

Weight matrix : (Gibbs Weight:1; Toxicity Weight:1; Frequency Weight:1)

First of all, we can find this pathway in our software.

The enzymes they used for each step are 2.3.1.16 > 1.1.1.36 > 2.3.1.-
Here is our selection result, in which the enzyme donor they used is included.

Fig9. Enzyme donors results

Because of the limitations of the databases, we obtain only a little information, but validation against the literature and experiments shows that it is enough to support preliminary research. To reduce the trouble caused by this limitation, we built a platform where users can submit their experimental data to expand the database.

EXAMPLE4 - Validate with comparison to traditional pathway by Tongji_China

Indole pathway

The characteristic blue of denim fabrics usually derives from indigo, and the high demand for such dyes has led to the production of indigo by chemical synthesis on an industrial scale.

To promote the practical application of this method, they plan to remove the glucose inhibition of the circuit built by team Berkeley 2013, which makes it possible to use low-cost carbon sources, and to find a cost-effective indole donor. Drawing on research into related industries, they designed an accessible, environmentally friendly indigo dye production system with application value.

During our earlier collaboration, we tried to find an indole donor by computational search. Disappointingly, there were no useful results, so they went back to the traditional pathway. After finishing our project, we searched for their pathway again to validate our software and to give Tongji_China a comparison between software search and the traditional approach.

Here is their pathway, obtained through traditional research.

Weight matrix : (Gibbs Weight:1; Toxicity Weight:1; Frequency Weight:1)

Fig10. Indole pathway

Fig11. Indole pathway searching result

Here is our software result. After the optimization, we found their pathway as expected. Interestingly, this pathway's score is very low, and the difference from the top-ranked pathway is clear.

Fig12. Software result screenshot

This result shows that our software is useful and reveals the difference between the traditional approach and computational search. We plan to do experiments after iGEM to determine which one is better.

PROJECT - DEMONSTRATION

PROJECT - IMPROVE

Our software was built on the project of last year's Tongji_Software team. The main improvements are a new searching algorithm and additional software functions, including enzyme selection and parts design.

SEARCHING ALGORITHM

In theory, the greedy algorithm improves speed but may fail to find a global optimum. We therefore used both the DFS algorithm and the greedy algorithm to find specific pathways and compared the results. With our limited set of reactions, the accuracy of the two algorithms is similar, while the greedy algorithm is significantly faster, so we regard it as a good improvement.
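
A comparison of this kind can be sketched by running an exhaustive DFS and a greedy search on the same network and checking that they return the same pathway. The toy network GRAPH, its costs and the function names dfs_best and greedy are hypothetical and only illustrate the check; they are not our test data or code.

```python
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, List[Tuple[str, float]]]
Result = Optional[Tuple[float, List[str]]]

# Hypothetical toy network: compound -> [(product, step_cost), ...].
GRAPH: Graph = {"A": [("B", 1.0), ("C", 2.0)], "B": [("D", 1.0)], "C": [("D", 3.0)], "D": []}


def dfs_best(graph: Graph, start: str, target: str) -> Result:
    """Exhaustive depth-first search: enumerate every simple path, keep the cheapest."""
    best: Result = None

    def visit(node: str, path: List[str], cost: float) -> None:
        nonlocal best
        if node == target:
            if best is None or cost < best[0]:
                best = (cost, list(path))
            return
        for nxt, step in graph.get(node, []):
            if nxt not in path:  # never revisit a compound
                path.append(nxt)
                visit(nxt, path, cost + step)
                path.pop()

    visit(start, [start], 0.0)
    return best


def greedy(graph: Graph, start: str, target: str) -> Result:
    """Greedy search: always take the cheapest unvisited step, with no backtracking."""
    path, cost, node = [start], 0.0, start
    while node != target:
        options = [(step, nxt) for nxt, step in graph.get(node, []) if nxt not in path]
        if not options:
            return None
        step, node = min(options)
        path.append(node)
        cost += step
    return (cost, path)
```

On this network both searches return the pathway A, B, D with cost 2.0, but the DFS has to explore every branch while the greedy search follows a single one. Agreement on a test set does not prove that the greedy search is optimal in general, only that it was on the cases tested.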

Fig13. Comparison of running time between DFS & Greedy

ADDITIONAL FUNCTIONS

For choosing the enzymes needed for each reaction, we established our own judgment model. The keywords related to the compounds in the pathway are also collected and presented as a word cloud. When the final enzyme selection result is provided to the user, it includes a sequence optimized for the codon preference of the engineered host.

For parts design, we cleaned up the data from the iGEM parts database and built a search engine that lets users search for parts by name or by function.
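
A search by name or function can be sketched as a case-insensitive substring match over part records. The record layout, the function name search_parts and the short descriptions are assumptions made for illustration; the three entries are only a tiny sample and not our cleaned dataset.

```python
from typing import Dict, List

# Illustrative sample records; field names and descriptions are assumptions.
PARTS: List[Dict[str, str]] = [
    {"name": "BBa_R0010", "type": "promoter", "description": "LacI-regulated promoter"},
    {"name": "BBa_B0034", "type": "rbs", "description": "ribosome binding site"},
    {"name": "BBa_B0015", "type": "terminator", "description": "double terminator"},
]


def search_parts(query: str, parts: List[Dict[str, str]] = PARTS) -> List[str]:
    """Return the names of parts whose name, type or description contains the query."""
    q = query.lower()
    return [
        p["name"]
        for p in parts
        if q in p["name"].lower() or q in p["type"].lower() or q in p["description"].lower()
    ]
```

For example, search_parts("promoter") matches on function, while search_parts("bba_b00") matches on a name prefix. A linear scan like this is enough for a sketch; a larger database would need an index.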

These functions can be used together or separately.

What’s more, users can register their own account on our website and leave messages on the webpage. We will always pay attention to users’ messages and continually optimize Pathlab, and users can also comment on optimized enzymes or different parts. Messages are visible to other users, so they can communicate through the message board and read others’ comments about the enzymes or parts they are going to use.

PROJECT - COLLABORATION

The pathways found by our software are based on databases and algorithms and need to be verified by practical experiments. At the same time, the results obtained by our software can support the pathway design of experimental teams.

Through CCiC, we had in-depth communication with three other experimental teams working on pathway construction. We learned which substrates they have and which products they want to obtain, then tried to search for pathways and design parts with Pathlab and to verify the pathways they implement.

COLLABORATION 1：TONGJI_CHINA

Since we are from the same school, Tongji_China and we have collaborated closely from the very beginning. We held several joint meetings. Their project is about manufacturing, while ours is about pathway search, so we got feedback from them after they used our software. Our results also inspired them.

One of their suggestions that had a great influence on us is that we should avoid unreasonable results in which a group is added to a compound and then removed again, which is useless. So we added code to prevent this situation. We also searched for the pathway from tryptophan to indole that they used, but we did not get a practical result; for example, some pathways fell into a cycle. Thus, we realized that the database we used had limitations.
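
One simple way to reject such results is to discard any pathway that visits the same compound twice, which covers both the add-then-remove case (A to B and back to A) and longer cycles. This is a sketch under that assumption; the function names and the compound IDs in the usage note are placeholders, and the check we actually added may differ.

```python
from typing import List


def has_round_trip(compounds: List[str]) -> bool:
    """True if the pathway revisits a compound, e.g. A -> B -> A."""
    return len(set(compounds)) != len(compounds)


def filter_pathways(pathways: List[List[str]]) -> List[List[str]]:
    """Keep only pathways in which every compound appears at most once."""
    return [p for p in pathways if not has_round_trip(p)]
```

For example, filter_pathways([["C1", "C2", "C3"], ["C1", "C2", "C1", "C3"]]) keeps only the first pathway, because the second one returns to C1 after one step.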

Tongji_China tried to improve the synthesis of indigo. New pathways can be found by reading the literature, by experimentally combining existing pathways, or by simulated synthesis through software design and retrieval. We therefore helped with software retrieval. Since the synthetic indigo pathways in the data are either already published in the literature or require experimental materials that are too expensive and not suitable for synthesis, we did not find useful results in the existing database. However, the upstream and downstream information about indigo that we found provided some reference and support for their experiments. They also shared their attempts with us to enrich our database for designing more efficient and useful pathways.

Later, they completed their synthesis pathway design by combining two published pathways, and we also optimized our software. They then tested our software by searching for their pathway; the search result gives us a sample comparison between software and traditional experiment.

Fig14. Collaboration with Tongji_China

COLLABORATION 2：WASHINGTON IGEM

Washington iGEM invited us to participate in producing their audiobook about biology. We mainly did translation and recording work for them, helping Chinese students learn what synthetic biology is and join some interesting experiments.

COLLABORATION 3：SASTRA IGEM

SASTRA iGEM invited us to participate in producing their magazine. Our collaboration included, but was not limited to, writing articles about synthetic biology and experiments, providing interviews with professionals, creating content on the theme of synthetic biology and taking related photographs.

COLLABORATION 4：UCD IGEM

We participated in UCD’s research on the use of mammals in this year’s iGEM competition.

COLLABORATION 5：UESTC_Software

The UESTC software team integrates various parts databases, with the iGEM parts database as the main body of the integration, which makes it very convenient for users to search for related information. Integrated database information not only improves search efficiency but also provides other software teams with strong data support, and our collaboration is based on data. Our work is to complete the pathway design part, from the reaction to the catalytic enzyme and then to the choice of regulatory parts. For the regulatory parts, different users can choose different options according to their experimental requirements. We wanted to establish a regulatory parts database and build a search engine so that users can retrieve the corresponding parts according to their own needs. The UESTC software team had already collated the data of the iGEM parts database, so we established cooperation with them. They provided us with data support, which reduced our workload. What’s more, we provide a link to their software, where users can get more complete information about the selected parts.

Fig15. Software logo of UESTC_Software

COLLABORATION 6：CAU iGEM

The cooperation with China Agricultural University is based on their demand for detailed information about their pathway, and it is also an attempt to apply our software in practice. They synthesized astaxanthin from glucose obtained from the degradation of cellulose. Their synthetic pathway was retrieved from the literature, but the information available there was limited for the technical team, and searching through databases was a time-consuming process. So we used our software to search for possible pathways from lycopene to astaxanthin. Finally, we provided them with a PDF of the search results, from which they got some reliable information for their experiments. They were amazed to have access to information they had not found in the literature. It would be interesting to see whether the results of the software search perform better than those from the literature, but this verification is limited by time, so if possible we will complete it after iGEM.

Fig16. Collaboration with CAU_China

COLLABORATION 7：SJTU-software

SJTU-software contacted us to collaborate on the use and functions of our software, so we organized a face-to-face seminar at Shanghai Jiao Tong University and invited UESTC_Software to join us online. Each team showed what they do, which data they use, what functions they have and how to use their software, much like a demonstration. After the presentations, we discussed the problems in each piece of software and put forward advice for each team. UESTC_Software's software is complete and user-friendly, so we gave them suggestions on details. We gave SJTU-software some technical guidance: since we use the same framework to build our software, we showed them our source data and explained it. From their advice, we realized the shortcomings of our login function, and on that basis we added a comment feature to each result a user gets.

Fig17. Collaboration with SJTU_software

@@ Line 1,129: / Line 1,129: @@
                  <h2 class="ProSubSubTitle"><b>C</b>OLLABORATION 1：<b>T</b>ONGJI_<b>C</b>HINA</h2>
                  <p>Since we are from the same school, <a href="https://2019.igem.org/Team:Tongji_China" id="Model_link">Tongji_China</a> and we have more integrated collaboration from the very beginning. We had conferences together for several times, and
-                     their project is about manufacturing, meanwhile, ours is about pathway search, so we get feedback from them after they used our software.Our results also inspire them at the same time.</p><br>
+                     their project is about manufacturing, meanwhile, ours is about pathway search, so we get feedback from them after they used our software. Our results also inspire them at the same time.</p><br>
                  <p>One of their suggestions which had a great influence on us is that we should avoid some unreasonable results putting a group on a compound and then taking it apart, which is pretty useless. So we added codes to avoid this kind of situation
                      taking place. And we had searched the pathway from tryptophan to indole they used, but we didn't get a practical result, for example, there will be some pathways fall into the cycle. Thus, we realized that the database we used had
@@ Line 1,138: / Line 1,138: @@
                      for synthesis, we did not find useful results in the existed database. However, the upstream and downstream information about indigo we found provided certain reference and support for their experiments. They also tried to give us
                      their attempts to enrich our database for designing more efficient and useful pathways.</p><br>
-                 <p>Later, they completed their synthesis pathway based on the combination of two pathways published, and we also have optimized our software. Here they tested our software by searching their pathway, the search result provides us a sample
+                 <p>Later, they completed their synthesis pathway design based on the combination of two pathways published, and we also have optimized our software. Here they tested our software by searching their pathway, the search result provides us a sample
                      of comparison between software and traditional experiment.</p>
                  <br><br>

Difference between revisions of "Team:Tongji Software/Project"
