Difference between revisions of "Team:USTC-Software/Model"

Line 28: Line 28:
 
           <div class="passage">Flux Balance Analysis (FBA) is a mathematical method for simulating metabolism in
 
           <div class="passage">Flux Balance Analysis (FBA) is a mathematical method for simulating metabolism in
 
             metabolic networks. It is based on linear programming to calculate fluxes when the model is stable.
 
             metabolic networks. It is based on linear programming to calculate fluxes when the model is stable.
             <br>
+
             <br><br>
 
             Assuming S is a matrix, which contains all reactions in a model. In this matrix, metabolites consumed take
 
             Assuming S is a matrix, which contains all reactions in a model. In this matrix, metabolites consumed take
 
             negative coefficients, and metabolites produced then take positive ones. Furthermore, v is a vector that
 
             negative coefficients, and metabolites produced then take positive ones. Furthermore, v is a vector that
Line 44: Line 44:
 
             reaction flux while they satisfy constraints and have the same optimal objective by solving a double linear
 
             reaction flux while they satisfy constraints and have the same optimal objective by solving a double linear
 
             programming problem.
 
             programming problem.
             <br>
+
             <br><br>
 
             However, when the metabolic network contains internal loops, the result can be too high in absolute to be
 
             However, when the metabolic network contains internal loops, the result can be too high in absolute to be
 
             realistic. Users can select the "loopless" option to avoid such internal loops, and get a more consistent
 
             realistic. Users can select the "loopless" option to avoid such internal loops, and get a more consistent
Line 55: Line 55:
 
             gap between object-oriented programming languages and relational databases and avoids the vulnerability of
 
             gap between object-oriented programming languages and relational databases and avoids the vulnerability of
 
             SQL Injection.
 
             SQL Injection.
             <br>
+
             <br><br>
 
             We use ORM to organize our metabolic networks, biobricks data, and user's computational models. It brings us
 
             We use ORM to organize our metabolic networks, biobricks data, and user's computational models. It brings us
 
             the convenience of creating, querying, listing, and connecting models, with acceptable performance.
 
             the convenience of creating, querying, listing, and connecting models, with acceptable performance.
             <br>
+
             <br><br>
 
             In practice, we use Django ORM with MySQL backend to provide fast, flexible, and reliable service.
 
             In practice, we use Django ORM with MySQL backend to provide fast, flexible, and reliable service.
 
           </div>
 
           </div>
Line 65: Line 65:
 
             Our website requires computing large models, and it is quite embarrassing to see users waiting for browsers
 
             Our website requires computing large models, and it is quite embarrassing to see users waiting for browsers
 
             for so long. So we use message queues to maintain computing tasks.
 
             for so long. So we use message queues to maintain computing tasks.
             <br>
+
             <br><br>
 
             We split our programs computing models from our website, and use message queues to send and receive
 
             We split our programs computing models from our website, and use message queues to send and receive
 
             information about our computation. A queue is a data structure that stores things waiting to be handled, and
 
             information about our computation. A queue is a data structure that stores things waiting to be handled, and
 
             it obeys the "First Come, First Serve" principle. So, we can store our computing tasks in the queue, and
 
             it obeys the "First Come, First Serve" principle. So, we can store our computing tasks in the queue, and
 
             response to users instantly about the progress of his tasks.
 
             response to users instantly about the progress of his tasks.
             <br>
+
             <br><br>
 
             In Foresyn, we use RabbitMQ and Celery to build our message queues. Here, Celery uses its protocol to
 
             In Foresyn, we use RabbitMQ and Celery to build our message queues. Here, Celery uses its protocol to
 
             communicate with web programs and computation programs. And RabbitMQ acts as a broker who accepts and
 
             communicate with web programs and computation programs. And RabbitMQ acts as a broker who accepts and
Line 80: Line 80:
 
             our strategy is to use slow but accurate algorithm when searching small datasets, and use fast but less
 
             our strategy is to use slow but accurate algorithm when searching small datasets, and use fast but less
 
             accurate algorithm when searching large ones.
 
             accurate algorithm when searching large ones.
             <br>
+
             <br><br>
 
             Here, there are only some models like E. coli, but a lot of genes, reactions and metabolites in our
 
             Here, there are only some models like E. coli, but a lot of genes, reactions and metabolites in our
 
             database. We choose to calculate Levenshtein distances for model searching. Levenshtein distance defines as
 
             database. We choose to calculate Levenshtein distances for model searching. Levenshtein distance defines as
Line 94: Line 94:
 
             $$
 
             $$
 
             which has a time complexity of \(O(nm)\) where n and m are the length of the two strings.
 
             which has a time complexity of \(O(nm)\) where n and m are the length of the two strings.
             <br>
+
             <br><br>
 
             When we are handling genes, reactions and metabolites, we use another algorithm.
 
             When we are handling genes, reactions and metabolites, we use another algorithm.
 
           </div>
 
           </div>

Revision as of 08:18, 20 October 2019

Model

Biological Models

Flux Balance Analysis

Flux Balance Analysis (FBA) is a mathematical method for simulating metabolism in metabolic networks. It is based on linear programming to calculate fluxes when the model is stable.

Assuming S is a matrix, which contains all reactions in a model. In this matrix, metabolites consumed take negative coefficients, and metabolites produced then take positive ones. Furthermore, v is a vector that represents the flux of all reactions. When the system is steady, it satisfies: $$ \textbf{S}\textbf{v}=0 $$ And then, FBA tries to maximize or minimize the objective function \(\textbf{c}^\textrm{T} \textbf{v}\), which c contains the weight of all reactions contributing to the function. Now it is clear that FBA is a linear programming problem, and it just works.

Flux Variability Analysis

Flux Variability Analysis (FVA) is an extension of FBA. It can show the minimum and maximum range of each reaction flux while they satisfy constraints and have the same optimal objective by solving a double linear programming problem.

However, when the metabolic network contains internal loops, the result can be too high in absolute to be realistic. Users can select the "loopless" option to avoid such internal loops, and get a more consistent result.

Computer Models

Object-Relational Mapping

Object-Relational Mapping (ORM) enables us to work with databases more comfortable and safer. It fills the gap between object-oriented programming languages and relational databases and avoids the vulnerability of SQL Injection.

We use ORM to organize our metabolic networks, biobricks data, and user's computational models. It brings us the convenience of creating, querying, listing, and connecting models, with acceptable performance.

In practice, we use Django ORM with MySQL backend to provide fast, flexible, and reliable service.

Message Queues

Our website requires computing large models, and it is quite embarrassing to see users waiting for browsers for so long. So we use message queues to maintain computing tasks.

We split our programs computing models from our website, and use message queues to send and receive information about our computation. A queue is a data structure that stores things waiting to be handled, and it obeys the "First Come, First Serve" principle. So, we can store our computing tasks in the queue, and response to users instantly about the progress of his tasks.

In Foresyn, we use RabbitMQ and Celery to build our message queues. Here, Celery uses its protocol to communicate with web programs and computation programs. And RabbitMQ acts as a broker who accepts and forwards messages of Celery.

Different Searching Engines

When we are developing Foresyn, we discovered that there is a contradiction between speed and accuracy. So our strategy is to use slow but accurate algorithm when searching small datasets, and use fast but less accurate algorithm when searching large ones.

Here, there are only some models like E. coli, but a lot of genes, reactions and metabolites in our database. We choose to calculate Levenshtein distances for model searching. Levenshtein distance defines as the following recursive function: $$ \qquad\operatorname{lev}_{a,b}(i,j) = \begin{cases} \max(i,j) & \text{if } \min(i,j)=0, \\ \min (\operatorname{lev}_{a,b}(i-1,j) + 1,\operatorname{lev}_{a,b}(i,j-1) + 1, \operatorname{lev}_{a,b}(i-1,j-1) + 1) & a_i \neq b_i, \\ \min (\operatorname{lev}_{a,b}(i-1,j) + 1,\operatorname{lev}_{a,b}(i,j-1) + 1, \operatorname{lev}_{a,b}(i-1,j-1)) & \text{otherwise.} \\ \end{cases} $$ which has a time complexity of \(O(nm)\) where n and m are the length of the two strings.

When we are handling genes, reactions and metabolites, we use another algorithm.

Biobricks Recommendation

TBD

References

  • 1. Orth JD, Thiele I, Palsson BØ. What is flux balance analysis?. *Nat Biotechnol*. 2010;28(3):245–248. doi:10.1038/nbt.1614
  • 2. Gudmundsson S, Thiele I. Computationally efficient flux variability analysis. *BMC Bioinformatics*. 2010;11:489. Published 2010 Sep 29. doi:10.1186/1471-2105-11-489
  • 3. Arne C. Müller, Alexander Bockmayr, Fast thermodynamically constrained flux variability analysis, *Bioinformatics*, Volume 29, Issue 7, 1 April 2013, Pages 903–909, https://doi.org/10.1093/bioinformatics/btt059
  • 4. Abdelmoneim Amer Desouki, Florian Jarre, Gabriel Gelius-Dietrich, Martin J. Lercher, CycleFreeFlux: efficient removal of thermodynamically infeasible loops from flux distributions, *Bioinformatics*, Volume 31, Issue 13, 1 July 2015, Pages 2159–2165, https://doi.org/10.1093/bioinformatics/btv096
  • 5. Ambler, Scott W. "Mapping objects to relational databases." On the World Wide Web: http://www. AmbySoft. com (2000).
  • 6. Vinoski, Steve. "Advanced message queuing protocol." IEEE Internet Computing 6 (2006): 87-89.