Contribution
BioMaster 2.0 is an integrated database that addresses incomplete part information and a user-unfriendly experience, and lays the foundation for CAD software in synthetic biology. Our contribution consists of four parts: database integration, information screening, data retrieval, and data support services.
Following the spiral model, our team carried out Human Practices in a way that handles backtracking and constantly changing requirements. This model can also serve as a template for other teams' Human Practices.
Database Integration
BioMaster 2.0 integrates 11 established biological research databases, such as UniProt, STRING and BRENDA, to give users more information about biobricks, including biobrick sites, interactions, functional keywords, annotations and related references. With this information, users can use biobricks more accurately and even improve them.
In addition, we collected information from team wikis from 2005 to 2018. Compared with the official wiki search tool, BioMaster 2.0 adds team awards and summary pictures. Users can search by team name, award or keyword to learn more about previous projects and draw inspiration from them.
Information Screening and Completion
The iGEM Registry has long been committed to promoting the standardization of parts, but many parts still have incomplete information or obvious errors, especially early submissions. BioMaster 2.0 works to screen out the correct information and to warn synthetic biologists about errors. We used sequence alignment to find the best hits and link the corresponding entries in UniProt, BRENDA and other databases. In this way, BioMaster 2.0 increases the content displayed on each part page and fills in missing part information.
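The page does not specify the alignment tool, output format or hit criterion used for linking. The following is a minimal sketch that assumes BLAST+ tabular output (`-outfmt 6`, default twelve columns) and treats the "best hit" as the one with the highest bitscore, with ties broken by the lower E-value. The IDs in the example are placeholders.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text):
    """Return {query_id: hit}, keeping the hit with the highest bitscore
    for each query (ties broken by the lower E-value)."""
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        for field in ("pident", "evalue", "bitscore"):
            hit[field] = float(hit[field])
        rank = (hit["bitscore"], -hit["evalue"])
        current = best.get(hit["qseqid"])
        if current is None or rank > (current["bitscore"], -current["evalue"]):
            best[hit["qseqid"]] = hit
    return best


sample = (
    "part_1\tuniprot_A\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-100\t380\n"
    "part_1\tuniprot_B\t70.0\t180\t54\t2\t5\t185\t3\t183\t1e-40\t150\n"
)
print(best_hits(sample)["part_1"]["sseqid"])  # uniprot_A
```

The selected subject ID would then be used to look up the matching entry in the target database.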
Data Retrieval and Display
BioMaster 2.0 provides four search methods to help users find useful information more effectively and accurately. Users can retrieve related biobricks by various IDs or by gene name. Keyword search is implemented with Elasticsearch, and a BLAST tool is also provided.
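The page does not describe the index layout. The following sketch shows the kind of Elasticsearch Query DSL body a keyword search could send to `POST /<index>/_search`. It uses the `multi_match` query with per-field boosting (`^2`) and `fuzziness`. The field names are hypothetical placeholders, not BioMaster 2.0's actual mapping.

```python
import json


def keyword_query(text, size=20):
    """Build an Elasticsearch Query DSL body for a keyword search.

    Field names are hypothetical; "^2" boosts matches in the part name,
    and "fuzziness": "AUTO" tolerates small typos in the keywords.
    """
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": text,
                "fields": ["part_name^2", "short_desc", "annotation", "team"],
                "fuzziness": "AUTO",
            }
        },
    }


print(json.dumps(keyword_query("GFP reporter"), indent=2))
```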
On each part's detail page, BioMaster 2.0 keeps links to the source databases so that users can quickly find the original entries. Besides text, the page shows SBOL images, an interactive scatter plot and other graphs to present the search results visually.
Database Service
One of our major goals is to provide an integrated part database for iGEM software teams and any other software projects that use the Registry, saving them from tedious, repetitive database-adaptation work. BioMaster 2.0 offers the full database for download in three formats, SQL, CSV and XML, all stored on the Amazon S3 cloud storage service.
iGEMers have seen many outstanding software projects stop contributing to the community because of a lack of maintenance and updates. To avoid this, we improved the maintainability, mergeability and data updating of BioMaster 2.0 in several ways.
We have packaged the BioMaster 2.0 website so that it runs in Docker. After downloading the website files and installing Docker, iGEMers can run our website independently, with its own portable data. The website is also completely open source, and other developers can pull our images from Docker Hub to share the same development environment. In addition, we integrated the data acquisition and update code into a single program, so both website maintainers and other developers can enjoy a "one-click update", which greatly simplifies maintenance. We believe these efforts will prolong the project's life cycle and provide iGEMers with steady, continuous service.
Human Practices
Our Human Practices follow the software development life cycle to guide our project. We used the spiral model to carry out our research, and advanced the project by running requirements research in parallel with coding.
In our three-round spiral survey, each round yielded new requirements or suggestions for improving the project. We improved the project by continually digging into problems and solving them. Once all requirements were satisfied, users tested the preliminary version again so that we could collect feedback and comments. All these improvements contributed to the final version. The spiral approach let us keep revisiting earlier stages without interrupting software implementation, and it also let us track changing requirements in time. This model is not only applicable to our project but can also serve as a template for other software teams or wet-lab teams.
Education Popularization
To promote synthetic biology and iGEM, we released a board game, BioME (with a PC version), and a synthetic biology brochure suitable for junior and senior high school students.
BioME is a board game based on the bottom-up construction principle of synthetic biology. As with building blocks, players start by collecting parts and assemble them layer by layer into devices, finally building a functional system. Through this game, we hope to help the public understand the "knowledge-based application" idea of synthetic biology.
Understanding the ideas alone is not enough, and lectures on their own can be boring and hard to follow. The brochure therefore serves as a supplementary learning tool for all kinds of outreach activities. It is available in two editions, Chinese and English, and other teams can also use it when preparing outreach activities.
Validation
Theoretical Support
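$$\mathrm{result} = V_{id}\times w_1 + \frac{V_{score}}{7\times \mathrm{alilen}}\times w_2 + \frac{\lg\left(-\log_6 V_{evalue}\right)}{3}\times w_3$$
$$\begin{cases}\text{save}, & \mathrm{result} > 0.65\\ \text{delete}, & \text{otherwise}\end{cases}$$
Here $V_{id}$, $V_{score}$ and $V_{evalue}$ are the identity, score and E-value of a BLAST hit, $\mathrm{alilen}$ is the alignment length, and $w_1$, $w_2$, $w_3$ are weights.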
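The following sketch shows how the filter above might be applied to a single BLAST hit. It assumes identity is given as a fraction in [0, 1] and uses hypothetical equal weights, because the page does not state $w_1$, $w_2$, $w_3$. The formula is undefined for an E-value of 0.0 (which BLAST can report) and for E ≥ 1. This sketch clamps 0.0 to a tiny floor and gives E ≥ 1 zero E-value credit. Both choices are our assumptions.

```python
import math

THRESHOLD = 0.65


def hit_score(identity, score, evalue, alilen,
              weights=(1 / 3, 1 / 3, 1 / 3), min_evalue=1e-300):
    """Combine identity, score and E-value of one BLAST hit into one value.

    identity: fraction in [0, 1]; score: BLAST score; alilen: alignment
    length. The equal weights are hypothetical placeholders.
    """
    w1, w2, w3 = weights
    evalue = max(evalue, min_evalue)  # assumption: avoid log(0) for E = 0.0
    neg_log6 = -math.log(evalue, 6)   # -log_6(E)
    # lg() needs a positive argument; assumption: E >= 1 gets no credit.
    e_term = math.log10(neg_log6) / 3 if neg_log6 > 0 else 0.0
    return identity * w1 + score / (7 * alilen) * w2 + e_term * w3


def keep_hit(identity, score, evalue, alilen, **kwargs):
    """Apply the save/delete rule: keep the hit only if result > 0.65."""
    return hit_score(identity, score, evalue, alilen, **kwargs) > THRESHOLD


print(keep_hit(1.0, 700, 6 ** -10, 100))  # True (result = 7/9)
```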
After applying this year's mathematical processing to last year's BLAST results, the identity, score and E-value values are normalized to (0, 1). The averages of the three values for each year are shown side by side in a bar chart. As the figure shows, this year's results are more accurate than last year's on all three measures, because accuracy is positively correlated with these values: the larger they are, the higher the accuracy.
Fig.1 Comparison with last version-1
By comparing the total numbers of UniProt IDs and iGEM IDs in last year's BLAST results with those in this year's results, we can clearly see that this year's data set is more complete than last year's.
Fig.2 Comparison with last version-2
Database Testing
To address unclear and insufficient information in the iGEM Registry, we developed BioMaster 2.0 on the basis of BioMaster 1.0, and it contains more data than its predecessor. The final database model is as follows:
Fig. 3. Structure of classifier
To verify the performance of our database, we compared the information for BBa_K1745002 in the iGEM Registry and BioMaster 1.0 with the information in BioMaster 2.0. The results are as follows:
The comparison shows that, compared with the iGEM Registry and BioMaster 1.0, BioMaster 2.0 gives more information about the biobrick from different databases. This makes BioMaster 2.0 a better synthetic biology database, where you can find rich information to make better use of biobricks.
Database Download
In BioMaster 2.0, we provide six download formats for each biobrick to meet the needs of different users. Besides sequence information, users can download any other information stored in BioMaster.
Fig.4 Download buttons for a part
Fig.5 Example for data download
Fig.6 The data download page in BioMaster 2.0
If that is not enough, users can download all the data in SQL, CSV or XML format. Users can also pull the BioMaster 2.0 image through Docker and run our database locally.
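As a minimal sketch of consuming the CSV download, the code below indexes the rows by an ID column using Python's standard `csv.DictReader`. The column names `part_id` and `short_desc` and the sample row are hypothetical. Check the header row of the actual file before use. For a downloaded file, pass the text read from `open(path, newline="", encoding="utf-8")`.

```python
import csv
import io


def load_parts(csv_text, key="part_id"):
    """Index the rows of a CSV export by the given (hypothetical) key column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key]: row for row in reader}


parts = load_parts("part_id,short_desc\npart_1,example promoter\n")
print(parts["part_1"]["short_desc"])  # example promoter
```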
Fig.7 Catalog for download files in Amazon S3
The download files are stored on the Amazon S3 cloud service rather than on the web server, which provides better security, reliability and scalability.
Comment Function
To facilitate communication among synthetic biologists, we provide a forum where users can share their experience with biobricks, ask questions, and answer other users' questions.
Fig.8 Screenshot for comment function
Search Function
We provide several types of search: by iGEM ID, EPD ID, UniProt ID, gene name, keyword, team wiki, and DNA sequence. We also added the ability to sort search results, so users can assign different weights to find the most appropriate biobricks.
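The page does not list the sorting criteria, so the following is only a sketch of user-weighted sorting. Each result is scored as a weighted sum of criteria assumed to be already scaled to [0, 1], and the results are sorted in descending order. The criterion names `identity` and `rating` are hypothetical.

```python
def rank_results(results, weights):
    """Sort results by a user-weighted sum of their (hypothetical) criteria.

    `results` is a list of dicts; `weights` maps criterion name to weight.
    Missing criteria count as 0.
    """
    def weighted(result):
        return sum(w * result.get(field, 0.0) for field, w in weights.items())

    return sorted(results, key=weighted, reverse=True)


results = [{"id": "A", "identity": 0.9, "rating": 0.2},
           {"id": "B", "identity": 0.5, "rating": 0.9}]
print([r["id"] for r in rank_results(results, {"identity": 0.2, "rating": 1.0})])
# ['B', 'A']
```

Changing the weights, for example to `{"identity": 1.0}`, puts A first.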
Fig.9 The search result page
Enzymatic Functions Prediction Tool
You can predict the probable EC number by providing a protein sequence, and then click "more information" to view the corresponding entry in the BRENDA database.
Fig.10 EC number prediction result
To test the tool, we assembled test sets from different data sources that were never used in training. The results indicate that our prediction tool is effective. The test results are as follows:
Table.1. The test results
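Table 1's metrics are not reproduced in this text. One common way to score EC-number predictions on held-out data is hierarchical agreement: the fraction of predictions that match the true EC number on its first 1, 2, 3 and 4 fields. The following sketch implements that measure. It is not necessarily the metric behind Table 1, and the EC pairs in the example are illustrative.

```python
def ec_level_accuracy(pairs):
    """Fraction of (predicted, true) EC pairs agreeing on the first
    1, 2, 3 and 4 fields, e.g. "1.1.1.2" vs "1.1.1.1" agree on 3."""
    hits = [0, 0, 0, 0]
    for predicted, true in pairs:
        p, t = predicted.split("."), true.split(".")
        if len(p) != 4 or len(t) != 4:
            raise ValueError(f"expected four EC fields: {predicted!r}, {true!r}")
        for level in range(4):
            if p[: level + 1] != t[: level + 1]:
                break  # a mismatch at this level rules out deeper levels
            hits[level] += 1
    n = len(pairs)
    return [h / n for h in hits] if n else [0.0] * 4


pairs = [("1.1.1.1", "1.1.1.1"), ("1.1.1.2", "1.1.1.1"), ("2.7.1.1", "1.1.1.1")]
print(ec_level_accuracy(pairs))
```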
Promoter Prediction Tool
Fig.11 Promoter prediction result
Please refer to https://2018.igem.org/Team:UESTC-Software/Validation
for details.
BLAST
The BLAST tool helps users obtain more information about an uncharacterized sequence.
Fig.12 Example for BLAST result
Database ID Conversion Tool
The ID conversion tool converts an identifier in one database into the corresponding identifiers in other databases.
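The page does not describe how the cross-references are stored. As a sketch of the data structure such a tool needs, the code below builds an index from each `(database, id)` pair to all linked pairs from cross-reference records. Conversion in any direction then becomes a single lookup. Database names and IDs are hypothetical placeholders.

```python
from collections import defaultdict


def build_id_index(xrefs):
    """Map (database, id) to every linked (database, id) pair.

    Each record in `xrefs` maps database name to the identifier of the
    same entity in that database.
    """
    index = defaultdict(set)
    for record in xrefs:
        items = list(record.items())
        for src in items:
            for dst in items:
                if dst != src:
                    index[src].add(dst)
    return index


def convert(index, source_db, source_id, target_db):
    """Return the sorted identifiers in target_db linked to source_id."""
    linked = index.get((source_db, source_id), ())
    return sorted(i for db, i in linked if db == target_db)


idx = build_id_index([{"iGEM": "part_1", "UniProt": "uniprot_A", "EPD": "epd_1"}])
print(convert(idx, "iGEM", "part_1", "UniProt"))  # ['uniprot_A']
```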
Fig.13 Example for ID conversion result
SBOL-design Tool
SBOL-design is used to design genetic circuits and to export them in different formats, including PNG, FASTA and GenBank.
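Of the export formats listed, FASTA is the simplest to illustrate. The following sketch writes a designed circuit, given as an ordered list of hypothetical `(part_name, sequence)` tuples, as one FASTA record. It concatenates the sequences in design order and wraps lines at a fixed width. It is not the SBOL-design tool's own implementation.

```python
def to_fasta(name, parts, width=60):
    """Write an ordered list of (part_name, dna_sequence) as one FASTA record.

    The part names go into the header so the design order is preserved.
    """
    header = ">" + name + " " + "|".join(part for part, _ in parts)
    sequence = "".join(seq.upper() for _, seq in parts)
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + lines) + "\n"


print(to_fasta("circuit1", [("promoter", "ttga"), ("rbs", "AGGA")], width=4), end="")
# >circuit1 promoter|rbs
# TTGA
# AGGA
```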
Fig.14 Example for SBOL-design tool-1
Fig.15 Example for SBOL-design tool-2
Docker
After downloading our web program from GitHub, users can run BioMaster on their own computer by typing a few commands to pull the image and build the containers.
Fig.16 Example for Docker image install
We set up a repository on Docker Hub and uploaded the images our project needs, so any Docker user can pull these images to their computer and run them.
Fig.17 Screenshot for our Docker repository
We have uploaded our Docker-based web program to GitHub, where you can access it and leave comments. We will continue improving and debugging it.
Fig.18 Screenshot for our GitHub project
Docker also helps ensure a long life cycle for our project. With Docker, we can easily migrate our program and even provide BioMaster 2.0 to users without a remote server. Below are BioMaster 2.0 web pages running on and accessed through localhost.
Fig.19 BioMaster running on localhost-1
Fig.20 BioMaster running on localhost-2
Feedback
After many iGEM teams tested BioMaster 2.0, we received a lot of feedback and suggestions. Collaboration with other teams is the greatest recognition of our project and has also verified its feasibility. BioMaster 2.0 serves not only individual users but also everyone who designs software for synthetic biology. For individuals, complete information saves them from jumping between databases when searching. For software design, we provide the basic information needed, whether a project involves metabolic pathways or biobrick recommendations. This year's collaborations with Tongji-Software and USTC-Software demonstrate that BioMaster 2.0 is readily accessible for software design.