Overview
In 2018, UESTC-Software released BioMaster 1.0 and achieved good results. Users and judges also gave us valuable suggestions, which we took seriously and addressed in our 2019 work. This year we did a lot of new work, including improvements to BioMaster 1.0 and a number of features new to the BioMaster 2.0 release. BioMaster 2.0 has the following features:
New Algorithms for Sequence Alignment
A new sequence comparison model determines the mapping relations between databases.
Ranking and Recommendation for Searching
A new weighted equation for ranking search results, plus a screening function.
Prediction
A prediction tool that infers enzymatic function by predicting the EC number.
New Features in BioMaster 2.0
More Integrated Databases
Compared with BioMaster 1.0, BioMaster 2.0 integrates significantly more databases. We focused on the enzyme databases that other software teams pay the most attention to, and therefore added KEGG, BRENDA and ExplorEnz to our system.
KEGG contains information on metabolic pathways, while BRENDA and ExplorEnz supply experimental information that is missing from the iGEM Registry. Although information about a single gene in a part does not describe the physical and chemical properties or the experimental data of the part as a whole, we believe this added information can still provide inspiration and data support for synthetic biologists.
We also added the interaction database BioGRID to BioMaster 2.0. BioGRID provides information on protein, genetic and chemical interactions; it complements STRING and, to some extent, reflects how a part interacts with its environment.
Upgraded Framework
For BioMaster 1.0, we built our website with the ThinkPHP framework. However, as the amount of data and the number of visits grew, the defects of the BioMaster 1.0 framework were exposed: low security, poor scalability, inability to handle large numbers of visits, and inconvenient maintenance.
We therefore rebuilt BioMaster 2.0 on the Laravel framework and completely rewrote the code logic of the entire site. Laravel is a widely used, high-performance PHP framework with advantages in scalability and maintainability. We use OpenSSL for encryption, which significantly improves the security of the website. Back-end tasks are processed asynchronously using queues, so the website can run long, time-consuming tasks (such as EC number prediction and NCBI web BLAST API requests) without crashing under heavy traffic.
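Laravel's queues are written in PHP, but the underlying pattern is language-neutral. As a minimal sketch under that assumption (not BioMaster's code), the Python below uses the standard library `queue` and a worker thread: a web request only enqueues a job and returns a job ID at once, while a background worker runs the slow task and stores the result for later polling. The names `submit_job`, `get_result` and `slow_ec_prediction`, and the in-memory result store, are hypothetical.

```python
import queue
import threading
import time
import uuid
from typing import Optional

# Hypothetical in-memory queue and result store; a production system would
# use a persistent backend (e.g. a database or Redis) so jobs survive restarts.
jobs: "queue.Queue[tuple[str, str]]" = queue.Queue()
results: dict = {}
results_lock = threading.Lock()


def slow_ec_prediction(sequence: str) -> str:
    """Placeholder for a long-running task such as EC number prediction."""
    time.sleep(0.1)  # simulate expensive work
    return f"predicted for {len(sequence)} residues"


def worker() -> None:
    """Consume jobs forever, storing each result under its job ID."""
    while True:
        job_id, sequence = jobs.get()
        outcome = slow_ec_prediction(sequence)
        with results_lock:
            results[job_id] = outcome
        jobs.task_done()  # lets jobs.join() know this job is finished


def submit_job(sequence: str) -> str:
    """Called from a web request: enqueue the work and return immediately."""
    job_id = uuid.uuid4().hex
    jobs.put((job_id, sequence))
    return job_id


def get_result(job_id: str) -> Optional[str]:
    """Polled by the client; None means the job is still pending."""
    with results_lock:
        return results.get(job_id)


# Start one background worker when the module loads.
threading.Thread(target=worker, daemon=True).start()

if __name__ == "__main__":
    jid = submit_job("MKTAYIAKQR")
    jobs.join()  # demo only: block until the queue is drained
    print(get_result(jid))
```

The request handler never waits on the slow function, which is what keeps the site responsive when many long jobs arrive at once.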
New Algorithms for Sequence Alignment
BioMaster 1.0 used a "sequence preprocessing + machine learning + classifier" screening strategy. However, while organizing the iGEM Registry data, we found that the original screening model had many shortcomings. For example, because only the full sequence is considered, the original model in some cases ignores duplicated gene fragments within a part. In addition, the classifier in the model still essentially depends on the manually annotated data set and on the original data set, which we have extended in BioMaster 2.0.
Therefore, for the BioMaster 2.0 screening model, we abandoned the machine learning method and instead judge the quality of hit sets comprehensively by setting up preprocessing equations and weights. See Model for a detailed description. The new screening method outperforms the original method on the three BLAST parameter indicators and transfers more easily to other projects.
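The actual preprocessing equations and weights are described on the Model page and are not reproduced here. Purely to illustrate the general idea of weighted scoring, the sketch below combines three common BLAST hit statistics (percent identity, query coverage and E-value) into one score in [0, 1] and keeps hits above a threshold. The choice of statistics, the capped -log10 transform of the E-value, the weights and the threshold are all assumptions, not BioMaster's parameters.

```python
import math
from dataclasses import dataclass


@dataclass
class Hit:
    subject_id: str
    identity: float  # percent identity, 0-100
    coverage: float  # percent of the query covered by the alignment, 0-100
    evalue: float    # BLAST E-value


# Hypothetical weights; they sum to 1 so the combined score stays in [0, 1].
WEIGHTS = {"identity": 0.4, "coverage": 0.4, "evalue": 0.2}


def normalize_evalue(evalue: float, cap: float = 180.0) -> float:
    """Map an E-value to [0, 1] via -log10, capped; E = 0 maps to 1."""
    if evalue <= 0:
        return 1.0
    return min(max(-math.log10(evalue), 0.0), cap) / cap


def score(hit: Hit) -> float:
    """Weighted sum of the normalized hit statistics."""
    return (WEIGHTS["identity"] * hit.identity / 100
            + WEIGHTS["coverage"] * hit.coverage / 100
            + WEIGHTS["evalue"] * normalize_evalue(hit.evalue))


def screen(hits: list, threshold: float = 0.7) -> list:
    """Keep hits scoring at or above the threshold, best first."""
    kept = [h for h in hits if score(h) >= threshold]
    return sorted(kept, key=score, reverse=True)


if __name__ == "__main__":
    hits = [Hit("A", 98, 95, 1e-90), Hit("B", 40, 30, 1e-3)]
    for h in screen(hits):
        print(h.subject_id, round(score(h), 3))
```

One appeal of a fixed weighted equation over a trained classifier is that it needs no manually annotated training set, which matches the motivation given above for abandoning the machine learning method.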
User-friendly Interface
We redesigned the BioMaster database page and fixed some bugs that appeared in BioMaster 1.0. The website now has a navigational home page and a database description page. We kept the pages simple and adapted them to different screen sizes.
Ranking and Recommendation for Searching
In BioMaster 2.0, users can filter search results according to their own needs using the screening and recommendation function.
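The text does not list which fields users can filter on. As a hypothetical sketch, the code below represents each search result as a dictionary, applies user-chosen filters (a set of source databases and a minimum score) and then ranks the remaining results by a user-chosen field. The field names `source`, `score` and `id` are assumptions.

```python
from typing import Any, Iterable, Optional


def filter_and_rank(
    results: Iterable,
    sources: Optional[set] = None,
    min_score: float = 0.0,
    sort_key: str = "score",
    descending: bool = True,
) -> list:
    """Filter search results by user criteria, then rank them.

    sources:   keep only results from these databases (None keeps all)
    min_score: drop results whose 'score' is below this value
    sort_key:  field used for ranking
    """
    kept = [
        r for r in results
        if (sources is None or r["source"] in sources)
        and r["score"] >= min_score
    ]
    return sorted(kept, key=lambda r: r[sort_key], reverse=descending)


if __name__ == "__main__":
    hits = [
        {"id": "P1", "source": "UniProt", "score": 0.91},
        {"id": "K1", "source": "KEGG", "score": 0.75},
        {"id": "B1", "source": "BRENDA", "score": 0.42},
        {"id": "P2", "source": "UniProt", "score": 0.66},
    ]
    top = filter_and_rank(hits, sources={"UniProt", "KEGG"}, min_score=0.7)
    print([r["id"] for r in top])  # ['P1', 'K1']
```

Keeping filtering and ranking as separate steps lets the same ranking be reused whatever filters a user selects.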
Prediction
BioMaster 2.0 adds EC number prediction for unknown sequences. If a sequence does not return a satisfactory search result, EC number prediction can provide a useful reference.
Stable Long-Term Support
Whether a software project can contribute to the iGEM community has always been one of our key considerations. To increase the sustainability and security of the project, we store data in Amazon S3 cloud storage. To make data updates easy, we integrated the update code of all databases into a single program. These improvements benefit users in the long term.
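The update program itself is not shown here. A minimal sketch of the "one program updates every database" idea is a registry of per-database update functions that a single entry point runs in turn, continuing past individual failures and reporting them. The database names come from the text above; the decorator, the function names and the placeholder bodies are hypothetical, and the BRENDA updater deliberately fails to demonstrate error reporting.

```python
from typing import Callable, Dict

# Registry mapping a database name to its update routine (hypothetical).
UPDATERS: Dict[str, Callable[[], int]] = {}


def updater(name: str):
    """Decorator that registers an update function under a database name."""
    def register(func: Callable[[], int]) -> Callable[[], int]:
        UPDATERS[name] = func
        return func
    return register


@updater("KEGG")
def update_kegg() -> int:
    return 0  # placeholder: download, parse and load; return record count


@updater("BRENDA")
def update_brenda() -> int:
    raise ConnectionError("source unavailable")  # simulated failure for the demo


@updater("BioGRID")
def update_biogrid() -> int:
    return 0  # placeholder


def update_all() -> Dict[str, str]:
    """Run every registered updater; one failure does not stop the others."""
    report: Dict[str, str] = {}
    for name, func in UPDATERS.items():
        try:
            count = func()
            report[name] = f"ok ({count} records)"
        except Exception as exc:  # record the error and move on
            report[name] = f"failed: {exc}"
    return report


if __name__ == "__main__":
    for db, status in update_all().items():
        print(db, status)
```

Adding a new database then only requires registering one more function, and the single entry point can be scheduled as one job.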
To maximize the project's portability, we packaged the entire project with Docker, so that BioMaster 2.0 can even run independently. We believe these features give BioMaster 2.0 a longer project life than BioMaster 1.0.
Visualization
We created a web-based SBOL design tool that supports part design and adjustment in the browser. This plugin is also used to visualize sequences in the search results.
Other Tools
We provide the NCBI web BLAST API tool to our users, so that sequences can be aligned against other large databases directly from BioMaster.
To convert identifiers between databases, we added the UniProt Retrieve/ID mapping API to BioMaster 2.0.
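For readers who want to call the same NCBI service, the BLAST URL API at https://blast.ncbi.nlm.nih.gov/Blast.cgi works in two steps: a CMD=Put request submits the query and returns a request ID (RID) with an estimated wait time (RTOE, in seconds), and later CMD=Get requests use that RID to check status and fetch results. The sketch below only builds the request data and parses the RID; it makes no network calls, it is not BioMaster's own wrapper, and the sample response in the test is illustrative.

```python
import re
from urllib.parse import urlencode

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_put_request(query: str, program: str = "blastp",
                      database: str = "nr") -> tuple:
    """Return the URL and form-encoded body for a CMD=Put submission."""
    params = {"CMD": "Put", "PROGRAM": program,
              "DATABASE": database, "QUERY": query}
    return BLAST_URL, urlencode(params).encode("ascii")


def parse_rid(response_text: str) -> tuple:
    """Extract the request ID (RID) and estimated wait (RTOE, seconds)."""
    rid = re.search(r"^\s*RID = (\S+)", response_text, re.MULTILINE)
    rtoe = re.search(r"^\s*RTOE = (\d+)", response_text, re.MULTILINE)
    if rid is None or rtoe is None:
        raise ValueError("no RID/RTOE found in BLAST response")
    return rid.group(1), int(rtoe.group(1))


def build_get_url(rid: str, fmt: str = "Text") -> str:
    """URL for fetching results once the search status is READY."""
    return BLAST_URL + "?" + urlencode({"CMD": "Get", "RID": rid,
                                        "FORMAT_TYPE": fmt})
```

In practice a client would POST the body, wait roughly RTOE seconds, poll with CMD=Get&FORMAT_OBJECT=SearchInfo until the status is READY, and then fetch the results. NCBI's usage guidelines ask clients not to poll a given RID more than about once a minute, which is another reason to run such requests as queued back-end tasks.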
Comparison Results
Compared with BioMaster 1.0
Compared with BioMaster 1.0, BioMaster 2.0 integrates significantly more databases, adopts a new screening model that outperforms the previous one, and substantially changes the website architecture and database structure. All in all, BioMaster 2.0 significantly improves data integrity, search accuracy and user friendliness, and offers better maintainability and portability for long-term service to the iGEM community.
Compared with Other Software Projects
Compared with other similar projects, BioMaster is a database rather than a community or platform. Therefore, BioMaster should place more emphasis on improving the iGEM Registry and providing database services for other software projects. We not only obtain information about iGEM parts from the iGEM Registry, but also present information about proteins related to those parts from 11 databases, such as UniProt, via sequence alignment, which we hope will inspire synthetic biologists.