Team:UESTC-Software/Contribution

Description

...

Contribution

BioMaster 2.0 is an integrated database that addresses incomplete information and a user-unfriendly experience, and lays a foundation for CAD software in synthetic biology. The contribution consists of four parts: database integration, information screening, data retrieval, and data supporting service.
Following the spiral model, our team carried out Human Practices to handle backtracking and constantly changing requirements. This model can also serve as a template for other teams' Human Practices.

Database Integration

BioMaster 2.0 integrates 11 established biological research databases, such as UniProt, STRING and BRENDA, to give users more information about biobricks, including biobrick sites, interactions, function keys, annotations and related references. Through BioMaster 2.0, users can use biobricks more accurately, and even improve them.
We also collected information from team wikis from 2005 to 2018. Compared with the official wiki search tool, BioMaster 2.0 adds team awards and summary pictures. Users can search by team name, award or keyword to learn more and get inspiration from previous projects.

Information Screening and Completion

The iGEM Registry has been committed to promoting the standardization of parts, but many parts still have incomplete information or obvious errors, especially among the early submissions. BioMaster 2.0 makes a great effort to screen out the correct information, or to alert synthetic biologists so they can avoid errors. We used sequence alignment to find the best hits and link the corresponding entries in UniProt, BRENDA and other databases. In this way, BioMaster 2.0 increases the content displayed on each part page and fills in missing part information.
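The "best hit" step can be illustrated with a minimal sketch. It assumes the alignment results are available in BLAST's 12-column tabular format (query ID, subject ID, percent identity, alignment length, ..., e-value, bit score) and that the best hit is the one with the highest bit score; neither assumption is confirmed by this page, and the function name `best_hits` is hypothetical.

```python
def best_hits(tabular_lines):
    """Return {query_id: subject_id} keeping the highest bit score per query.

    Assumes BLAST tabular output with 12 tab-separated columns, where
    column 1 is the query ID, column 2 the subject ID and column 12 the
    bit score. This is an illustrative assumption, not the documented
    BioMaster 2.0 pipeline.
    """
    best = {}  # query_id -> (subject_id, bit_score)
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query_id, subject_id = fields[0], fields[1]
        bit_score = float(fields[11])
        # Keep the first hit seen, or replace it with a strictly better one.
        if query_id not in best or bit_score > best[query_id][1]:
            best[query_id] = (subject_id, bit_score)
    return {query: subject for query, (subject, _) in best.items()}
```

The resulting mapping from part ID to external entry ID is what would let a part page link out to the matching record in another database.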

Data Retrieval and Display

BioMaster 2.0 provides four search methods to find useful information more effectively and accurately. Users can retrieve related biobricks by various IDs and gene names. With the help of Elasticsearch, we implemented keyword search. The BLAST tool is also provided. On the detail pages for parts, BioMaster 2.0 retains database links so that users can quickly find the original source. Besides the text content, SBOL images, an interactive scatter plot and other graphs are shown on the page to present search results visually.
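As a rough illustration of how a keyword search can be expressed for Elasticsearch, the sketch below builds a `multi_match` query body as a plain Python dictionary. The field names (`part_name`, `description`) and the function name are hypothetical and do not reflect the actual BioMaster 2.0 index; no Elasticsearch client is called.

```python
import json


def build_keyword_query(keywords, fields=("part_name", "description"), size=10):
    """Build an Elasticsearch query body that matches keywords across fields.

    The field names are placeholders chosen for illustration only.
    """
    if not keywords.strip():
        raise ValueError("keywords must not be empty")
    return {
        "size": size,  # maximum number of hits to return
        "query": {
            "multi_match": {
                "query": keywords,
                "fields": list(fields),
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON body that would be sent to the search endpoint.
    print(json.dumps(build_keyword_query("green fluorescent protein"), indent=2))
```

Keeping the query as a plain dictionary makes it easy to inspect or log before it is handed to whichever client library the site uses.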

Database Service

One of our major goals is to provide an integrated part database for iGEM software teams, or any other software project that uses the part registry, saving them from tedious and repetitive database adaptation work. BioMaster 2.0 offers three database download formats, SQL, CSV and XML, all stored on the Amazon S3 cloud service.
iGEMers have seen many outstanding software projects that can no longer contribute to the community because they lack maintenance and updates. To avoid this, we improved the maintainability, mergeability and data updating of BioMaster 2.0 in several ways.
We have developed a BioMaster 2.0 website that runs in Docker. After downloading the website files and installing Docker, iGEMers can run the website independently, with its own migrated data. The website is also completely open source, and other developers can pull the images from Docker Hub and share the same development environment. In addition, we integrated the data acquisition and update code into a single program. Both website maintainers and other developers can use this "one-click update", which greatly simplifies maintenance. We believe these efforts can prolong the project's life cycle and provide iGEMers with a steady, continuous service.

Human Practices

Our Human Practices follow the software development life cycle to guide our project. We use the spiral model to carry out research, and advance the project by running requirements research and coding in parallel.
In the three-round spiral survey, each round brings in new requirements or suggestions for improving the project. We enhance the project by continually uncovering and solving problems. After all the requirements are satisfied, the preliminary version is tested by users again to collect feedback and comments.
All these improvements contribute to the final version. Spiral research lets us keep looking back without interrupting software implementation, and changes in requirements can be picked up in time. Such a model is not only applicable to our project, but can also be used as a template by other software teams or experimental teams.

Education Popularization

To promote synthetic biology and iGEM, we released a board game, BioME (along with a PC version), and a synthetic biology brochure suitable for junior and senior high school students.
BioME is a board game based on the bottom-up construction idea of synthetic biology. Following the idea of building blocks, players start by collecting parts and assemble them into devices layer by layer, with the final goal of building a functional system. Through this game, we hope to help the public understand the "knowledge-based application" idea of synthetic biology.
Understanding the ideas alone is not enough, and lectures by themselves can be boring and difficult. The brochure therefore serves as a supplementary aid for all kinds of outreach activities. It is available in two editions, Chinese and English, and other teams can also use them when preparing promotion activities.

Validation

Theoretical Support

$$result = V_{id}\times w_1 + \frac{V_{score}}{7\times alilen}\times w_2 + \frac{\lg(-\log_6(V_{evalue}))}{3}\times w_3$$
$$\text{if } result > 0.65:\ \text{save}; \qquad \text{else: delete}$$
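The filtering rule above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: identity is taken as a fraction in [0, 1], the weights `w1`, `w2`, `w3` default to equal placeholder values because the actual weights are not given here, and an e-value outside (0, 1) is rejected because the logarithm term is undefined there. How such hits were handled in BioMaster 2.0 is not stated.

```python
import math

THRESHOLD = 0.65  # cut-off taken from the rule above


def blast_hit_score(identity, bit_score, alilen, evalue,
                    w1=1 / 3, w2=1 / 3, w3=1 / 3):
    """Combine identity, bit score and e-value into a single score.

    identity is assumed to be a fraction in [0, 1]; the default weights
    are placeholders, not the values used by the authors.
    """
    if alilen <= 0:
        raise ValueError("alignment length must be positive")
    if not 0 < evalue < 1:
        # -log_6(evalue) must be positive for the base-10 logarithm.
        raise ValueError("evalue must lie strictly between 0 and 1")
    score_term = bit_score / (7 * alilen)
    evalue_term = math.log10(-math.log(evalue, 6)) / 3
    return identity * w1 + score_term * w2 + evalue_term * w3


def keep_hit(identity, bit_score, alilen, evalue, **weights):
    """Return True when the hit should be saved, False when deleted."""
    return blast_hit_score(identity, bit_score, alilen, evalue, **weights) > THRESHOLD
```

Note that BLAST can report an e-value of exactly 0 for very strong hits; this sketch raises an error in that case instead of guessing a substitute value.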
After applying the same mathematical processing as this year, the identity, score and e-value of last year's BLAST results are normalized to (0, 1). The average of each of the three values is then computed for both years and shown as a histogram. As the figure shows, the accuracy of this year's results has improved across the board over last year's, because accuracy is positively correlated with these values: as they increase, accuracy rises as well.
Fig.1 Comparison with last version-1
By plotting the total numbers of UniProt IDs and iGEM IDs in last year's BLAST results against the corresponding totals in this year's results, we can clearly see that this year's data set is more complete than last year's.
Fig.2 Comparison with last version-2

Database Testing

To address the unclear and insufficient information in the iGEM Registry, we developed BioMaster 2.0 on the basis of BioMaster 1.0, with more data. The final database model is as follows:
Fig. 3. Structure of classifier
To verify the performance of our database, we compared the information from the iGEM Registry and BioMaster 1.0 with that from BioMaster 2.0. We searched for BBa_K1745002; the results are as follows:

Comparison about Information of Parts

Comparison about Interaction of Genes/Protein

Comparison about Other Databases' Information

Comparison about Details of Reference

Abundant Information of BioMaster 2.0

The comparison above shows that, relative to the iGEM Registry and BioMaster 1.0, BioMaster 2.0 gives more information about the biobrick from different databases. This makes BioMaster 2.0 a better synthetic biology database, where you can find rich information to make better use of biobricks.

Database Download

In BioMaster 2.0, we provide six download formats for each biobrick to meet the needs of different users. Along with sequence information, users can get any other information stored in BioMaster.
Fig.4 Download buttons for a part
Fig.5 Example for data download
Fig.6 The data download page in BioMaster 2.0
If that's not enough, users can download all the data in SQL, CSV or XML. Users can also pull the BioMaster 2.0 image through Docker and run our database locally.
Fig.7 Catalog for download files in Amazon S3
The files for download are stored in the Amazon S3 cloud service rather than on the web service itself, which offers better safety, reliability and extensibility.

Comment Function

To facilitate communication among synthetic biologists, we provide a forum where users can post their experience with biobricks, ask questions, and answer questions for others.
Fig.8 Screenshot for comment function

Search Function

We provide several types of search: by iGEM ID, EPD ID, UniProt ID, gene name, keyword, team wiki, and DNA sequence. Search results can also be sorted, and users can assign different weights to find the most appropriate biobricks.
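One way to picture weighted sorting is a weighted sum over per-result criteria, sketched below. The criterion names (`identity`, `usage`), the assumption that each value is already normalized to [0, 1], and the function name `rank_results` are all hypothetical; the criteria BioMaster 2.0 actually exposes are not listed here.

```python
def rank_results(results, weights):
    """Sort search results by a user-weighted sum of their criteria.

    results: list of dicts, each holding numeric criteria values.
    weights: dict mapping a criterion name to its user-chosen weight.
    Criteria missing from a result count as 0.
    """
    def weighted_score(result):
        return sum(weight * result.get(name, 0.0)
                   for name, weight in weights.items())

    # Highest weighted score first; sorted() keeps ties in input order.
    return sorted(results, key=weighted_score, reverse=True)
```

For example, a user who cares mostly about sequence similarity could pass a large weight for `identity` and a small one for `usage`, and the same result list would be reordered accordingly.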
Fig.9 The search result page

Enzymatic Functions Prediction Tool

You can predict probable EC numbers by providing a protein sequence, and click “more information” to see further details in the BRENDA database.
Fig.10 EC number prediction result
To test the tool, we assembled data sets from different data sources that were never used in training the system. The tests indicated that our prediction tool is effective. The test results are as follows:
Table.1. The test results

Promoter Prediction Tool

Fig.11 Promoter prediction result

BLAST

The BLAST tool is used to obtain more information when users have an uncharacterized sequence.
Fig.12 Example for BLAST result

Database ID Conversion Tool

The ID conversion tool converts an identifier in one database to the corresponding identifier in other databases.
Fig.13 Example for ID conversion result

SBOL-design Tool

SBOL-design is used to design a genetic circuit and save it in different formats, including PNG, FASTA and GenBank.
Fig.14 Example for SBOL-design tool-1
Fig.15 Example for SBOL-design tool-2

Docker

After downloading our web program from GitHub, users can run the BioMaster web program on their own computer by typing a few commands to pull the image and build the containers.
Fig.16 Example for Docker image install
We established a repository on Docker Hub and uploaded the images our project needs. Any Docker developer can pull these images to their own computer and run them.
Fig.17 Screenshot for our Docker repository
We uploaded our Docker-based web program to GitHub, where you can access it and leave comments. We will continue to improve and debug it.
Fig.18 Screenshot for our GitHub project
Docker also guarantees a long life cycle for our project. With Docker, we can easily migrate our program and even provide BioMaster 2.0 to users without a remote server. Below are BioMaster 2.0 web pages running and being accessed through localhost.
Fig.19 BioMaster running on localhost-1
Fig.20 BioMaster running on localhost-2

Feedback

After many iGEM teams tested BioMaster 2.0, we received a lot of feedback and suggestions. The collaboration with other teams is the greatest recognition of our project, and it also verified the project's feasibility. BioMaster 2.0 is no longer only for individual users, but for all those who design software for synthetic biology. For individuals, complete information reduces the trouble of jumping from one database to another when searching. For software design, we provide the basic information needed, whether it involves metabolic pathways or biobrick recommendations. This year's collaboration with Tongji-Software and USTC-Software fully demonstrates the accessibility of BioMaster 2.0 for software design.