Team:UESTC-Software/Advantages

Description

...

Overview

In 2018, UESTC-Software released BioMaster 1.0 and achieved good results. Many users and judges also gave us valuable suggestions. We took this feedback seriously and responded to it in our 2019 work. This year we did a lot of new work, including improvements to BioMaster 1.0 and a number of features specific to the BioMaster 2.0 release. BioMaster 2.0 has the following features:
More Integrated Databases
The number of main reference databases has increased from 4 to 8.
Upgraded Framework
The front-end and back-end framework of the website have been redesigned.
New Algorithms for Sequence Alignment
Using a new sequence comparison model to determine database mapping relations.
User-friendly Interface
More concise, structured pages.
Ranking and Recommendation for Searching
A weighted equation for ranking search results, together with a screening function.
Prediction
Adding a prediction tool, which suggests the enzymatic function of a sequence by predicting its EC number.
Stable Support for Long Term
Packaging with Docker to facilitate migration.
Visualization
Adding a web-based SBOL editor and applying it to feature visualization.
Other Tools
NCBI web BLAST tool and UniProt Retrieve/ID mapping are included.

New Features in BioMaster 2.0

More Integrated Databases

Compared to BioMaster 1.0, BioMaster 2.0 integrates significantly more databases. We focused on the enzyme databases that other software teams pay the most attention to, so KEGG, BRENDA and ExplorEnz were added to our system.
KEGG contains information on metabolic pathways, while BRENDA and ExplorEnz supply experimental information that is missing from the iGEM Registry. Although the information about a single gene in a part does not represent the physical and chemical properties or the experimental data of the whole part, we believe that adding this information can still provide inspiration and data support for synthetic biologists.
We also added the interaction database BioGRID to BioMaster 2.0, which provides information on protein, genetic and chemical interactions. BioGRID complements STRING and, to some extent, reflects how a part interacts with its environment.

Upgraded Framework

For BioMaster 1.0, we built our website using the ThinkPHP framework. However, as the amount of data and the number of visits increased, the defects of the BioMaster 1.0 framework were exposed: low security, poor scalability, inability to handle large numbers of requests, and inconvenient maintenance.
We therefore decided to build BioMaster 2.0 on the Laravel framework and completely rewrote the code logic of the entire site. Laravel is a commonly used high-performance framework with advantages in scalability and maintainability. OpenSSL is used for encryption, which significantly improves website security. Back-end tasks are processed asynchronously using queues, so the website can run long, time-consuming tasks (such as EC number prediction and the web BLAST API) without crashing under heavy traffic.
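The queue-based processing described above can be illustrated with a minimal sketch. It is written in Python with the standard library rather than Laravel's queue system, and the names `JobQueue`, `slow_task` and the job identifiers are hypothetical, chosen only for this example. The point it shows is that a request handler only enqueues a job and returns, while a background worker does the slow work.

```python
import queue
import threading
import time


def slow_task(name, seconds):
    """Stand-in for a long-running job such as EC number prediction or a web BLAST call."""
    time.sleep(seconds)
    return f"{name} finished"


class JobQueue:
    """Accepts jobs immediately and runs them one by one on a background worker thread."""

    def __init__(self):
        self._jobs = queue.Queue()
        self.results = {}
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, job_id, func, *args):
        # Returns at once; the caller never waits for the job itself.
        self._jobs.put((job_id, func, args))

    def _run(self):
        while True:
            job_id, func, args = self._jobs.get()
            try:
                self.results[job_id] = func(*args)
            except Exception as exc:
                # A failing job is recorded instead of killing the worker.
                self.results[job_id] = f"failed: {exc}"
            finally:
                self._jobs.task_done()

    def wait(self):
        # Block until every submitted job has been processed.
        self._jobs.join()


jobs = JobQueue()
jobs.submit("ec-1", slow_task, "EC prediction", 0.05)
jobs.submit("blast-1", slow_task, "web BLAST", 0.05)
jobs.wait()
print(jobs.results)
```

In a real deployment the queue would normally be persistent and the workers would run as separate processes, so that pending jobs survive a restart of the web server.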

New Algorithms for Sequence Alignment

BioMaster 1.0 used a "preprocessing sequence + machine learning + classifier" screening strategy. However, while organizing the iGEM Registry data, we found that the original screening model has many shortcomings. For example, because it considers only the full sequence, it sometimes ignores duplicated gene fragments within a part. In addition, the classifier in the model still depends essentially on the manually annotated data set and the original data set within the extended BioMaster 2.0.
Therefore, for the BioMaster 2.0 screening model, we abandoned the machine learning method and instead judge the quality of the hit sets comprehensively, using preprocessing equations and weights. See Model for a detailed description. The new screening method surpasses the original method on the three BLAST parameter indicators and can be migrated to other projects more easily.
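As a rough illustration of weight-based screening, the sketch below scores each BLAST hit with a weighted sum and keeps only hits above a threshold. It is not the equation described on the Model page: the three indicators (percent identity, query coverage, e-value), the weights, the e-value transform and the threshold are all assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Hit:
    subject_id: str
    identity: float  # percent identity, 0-100
    coverage: float  # percent of the query covered, 0-100
    evalue: float    # BLAST expect value, >= 0


# Hypothetical weights for illustration; not the values used by BioMaster.
WEIGHTS = {"identity": 0.4, "coverage": 0.4, "evalue": 0.2}


def evalue_score(evalue, cap=100.0):
    """Map an e-value to [0, 1]; smaller e-values score higher, saturating at 1e-100."""
    if evalue <= 0:
        return 1.0
    return min(max(-math.log10(evalue), 0.0), cap) / cap


def hit_score(hit):
    """Weighted sum of the three normalized indicators, in [0, 1]."""
    return (WEIGHTS["identity"] * hit.identity / 100
            + WEIGHTS["coverage"] * hit.coverage / 100
            + WEIGHTS["evalue"] * evalue_score(hit.evalue))


def rank_hits(hits, min_score=0.5):
    """Drop hits scoring below min_score and sort the rest, best first."""
    scored = [(hit_score(h), h) for h in hits]
    kept = [(s, h) for s, h in scored if s >= min_score]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


hits = [
    Hit("B", identity=60, coverage=40, evalue=1e-5),
    Hit("C", identity=85, coverage=70, evalue=1e-30),
    Hit("A", identity=98, coverage=95, evalue=1e-80),
]
for score, hit in rank_hits(hits):
    print(f"{hit.subject_id}: {score:.3f}")
```

Because the rule is an explicit formula rather than a trained classifier, it needs no annotated training set, which is one reason such an approach transfers easily to other projects.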

User-friendly Interface

We redesigned the BioMaster database page and fixed some bugs that appeared in BioMaster 1.0. The website now has a navigational home page and a database description page. We also kept the pages simple and adapted them to different screen sizes.

Ranking and Recommendation for Searching

In BioMaster 2.0, users can filter the search results according to their own needs with the screening and recommendation function.

Prediction

BioMaster 2.0 adds EC number prediction for unknown sequences. If a search for a sequence does not return a satisfactory result, the predicted EC number can serve as a reference.
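To show how a predicted EC number can be read, the sketch below parses the standard four-level notation (class, subclass, sub-subclass, serial number) and reports the top-level enzyme class. The functions `parse_ec` and `describe_ec` are hypothetical helpers written for this example and are not part of BioMaster; the sketch also assumes purely numeric fields, with `-` marking an unresolved level.

```python
# The seven top-level enzyme classes of the EC nomenclature.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}


def parse_ec(ec):
    """Split an EC number such as '3.4.21.4' or '3.4.-.-' into four levels.

    Unresolved levels written as '-' are returned as None.
    """
    text = ec.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields, got {ec!r}")
    levels = []
    for part in parts:
        if part == "-":
            levels.append(None)
        elif part.isdigit():
            levels.append(int(part))
        else:
            raise ValueError(f"invalid field {part!r} in {ec!r}")
    if levels[0] not in EC_CLASSES:
        raise ValueError(f"unknown enzyme class in {ec!r}")
    return levels


def describe_ec(ec):
    """Return the top-level class name and how many leading levels are resolved."""
    levels = parse_ec(ec)
    depth = 0
    for level in levels:
        if level is None:
            break
        depth += 1
    return f"{EC_CLASSES[levels[0]]} (resolved to level {depth} of 4)"


print(describe_ec("EC 3.4.21.4"))
print(describe_ec("2.7.-.-"))
```

A partially resolved prediction such as `2.7.-.-` still narrows the likely function to a broad enzyme class, which is the kind of reference a user can fall back on when the search itself returns nothing useful.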

Stable Support for Long Term

Whether a software project can keep contributing to the iGEM community has always been one of our key considerations. To increase the sustainability and security of the project, we store our data in Amazon S3 cloud storage. To make data updates easy, we integrated the update code for all databases into a single program. These improvements let users benefit from BioMaster in the long term.
To maximize the project's portability, we used Docker to package the entire project, so that BioMaster 2.0 can even run independently. We believe these features give BioMaster 2.0 a longer project life than BioMaster 1.0.

Visualization

We created a web-based SBOL design tool that supports part design and adjustment in the browser. This plugin is also used for the sequence visualization of search results.

Other Tools

We provide the NCBI web BLAST API tool to our users, so that sequences can be aligned against other large databases directly within BioMaster.
To convert identifiers between different databases, we have added the UniProt Retrieve/ID mapping API to BioMaster 2.0.

Comparison Results

Compared with BioMaster 1.0

Compared to BioMaster 1.0, BioMaster 2.0 adds a significant number of databases, adopts a new screening model that outperforms the previous one, and makes significant changes to the website architecture and database structure. All in all, BioMaster 2.0 offers significant improvements in data integrity, search accuracy and user friendliness, as well as better maintainability and portability for long-term service to the iGEM community.

Compared with Other Software Projects

Compared with other similar projects, BioMaster is a database rather than a community or platform. Therefore, BioMaster should place more emphasis on improving the iGEM Registry and providing database services for other software projects. We not only obtain information on iGEM parts from the iGEM Registry, but also use sequence alignment to present information on the proteins related to those parts from 11 databases such as UniProt, which we hope will inspire synthetic biologists.