Database
Building on the more than 40,000 biobricks in the iGEM Registry, we integrated 10 databases (UniProt, PromEC, EPD, QuickGO, KEGG, BRENDA, ExplorEnz, STRING, BioGRID and NCBI PubMed) to expand the available information and describe each biobrick more accurately. All the information contained in BioMaster has been experimentally verified, which ensures high data reliability.
Fig.1 Database integration (all logos above are from the official databases)
Data Integration
In 2018, we used local sequence alignment with BLAST (BLAST+ 2.7.1) to compare each biobrick's sequence against the sequences in the other databases and retrieve related information.
In 2019, we found that aligning only the full sequence of each part would lose some necessary information. However, the sequence annotations in the iGEM Registry also contain many errors and much redundancy. To make the information more complete, we kept both the full sequence of each part and the sequences of its annotated feature regions related to the 3 types of coding proteins as BLAST[1] queries, and aligned these partial sequences against the sequences of the other databases.
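A minimal sketch of how such BLAST queries could be assembled, assuming each part's annotations are available as features with a type and 1-based inclusive start/end coordinates. The feature types in `CODING_FEATURE_TYPES`, the helper names and the database name `other_db` are hypothetical placeholders; only the `blastn` options `-query`, `-db` and `-outfmt` come from BLAST+ itself.

```python
from dataclasses import dataclass

# Hypothetical feature types treated as protein-coding; the text mentions
# "3 types" without naming them, so these are placeholders.
CODING_FEATURE_TYPES = {"cds", "protein", "coding"}


@dataclass
class Feature:
    feature_type: str
    start: int  # 1-based, inclusive (assumed coordinate convention)
    end: int    # 1-based, inclusive


def build_queries(part_id, sequence, features, keep_types=CODING_FEATURE_TYPES):
    """Return (header, sequence) pairs: the full part plus each kept feature region."""
    queries = [(f"{part_id}|full", sequence)]
    for i, feat in enumerate(features):
        if feat.feature_type.lower() not in keep_types:
            continue
        region = sequence[feat.start - 1:feat.end]
        if region:  # skip features whose coordinates fall outside the sequence
            header = f"{part_id}|{feat.feature_type}_{i}|{feat.start}-{feat.end}"
            queries.append((header, region))
    return queries


def to_fasta(queries, width=60):
    """Serialize (header, sequence) pairs as FASTA, wrapping lines at `width`."""
    lines = []
    for header, seq in queries:
        lines.append(f">{header}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Local BLAST+ search of the written query file against a pre-built
# nucleotide database; "other_db" is a hypothetical database name.
BLAST_CMD = ["blastn", "-query", "queries.fa", "-db", "other_db", "-outfmt", "6"]
```

Keeping the full sequence as the first query preserves the 2018 behaviour, while the extra feature-region queries recover hits that a whole-part alignment would miss. Against protein databases such as UniProt, `blastx` would take the place of `blastn`.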
Fig.2 Process of data integration
Database Structure
This year, we switched to MySQL for data storage to gain stability and extensibility. We added more tables and imported last year's CSV data into MySQL. To reduce redundancy and make the data easier to query, we further normalized the schema and strengthened the relations between tables.
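To illustrate what a normalized layout with explicit relations can look like, here is a small sketch that uses SQLite as a stand-in for MySQL so it runs anywhere. The table and column names are hypothetical, not BioMaster's actual schema; the point is that protein records are stored once and linked to parts through a mapping table, and CSV exports are bulk-loaded into the matching tables.

```python
import csv
import io
import sqlite3

# Hypothetical schema; SQLite stands in for MySQL in this sketch.
SCHEMA = """
CREATE TABLE part (
    part_id   TEXT PRIMARY KEY,   -- iGEM-style part identifier
    part_type TEXT,
    sequence  TEXT
);
CREATE TABLE uniprot_entry (
    accession    TEXT PRIMARY KEY,
    protein_name TEXT
);
-- Link table: each protein is stored once and shared by many parts.
CREATE TABLE part_uniprot (
    part_id   TEXT REFERENCES part(part_id),
    accession TEXT REFERENCES uniprot_entry(accession),
    PRIMARY KEY (part_id, accession)
);
"""


def load_csv(conn, table, csv_text):
    """Bulk-insert rows from CSV text whose header matches the table's columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    cols = reader.fieldnames
    placeholders = ", ".join("?" for _ in cols)
    sql = f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})"
    conn.executemany(sql, ([row[c] for c in cols] for row in reader))


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the declared relations
conn.executescript(SCHEMA)
load_csv(conn, "part", "part_id,part_type,sequence\nBBa_A,Coding,ATG\nBBa_B,Coding,ATGC\n")
load_csv(conn, "uniprot_entry", "accession,protein_name\nACC1,Protein X\n")
load_csv(conn, "part_uniprot", "part_id,accession\nBBa_A,ACC1\nBBa_B,ACC1\n")

# Join parts to their proteins through the link table.
rows = conn.execute("""
    SELECT p.part_id, u.protein_name
    FROM part p
    JOIN part_uniprot pu ON pu.part_id = p.part_id
    JOIN uniprot_entry u ON u.accession = pu.accession
    ORDER BY p.part_id
""").fetchall()
```

Because "Protein X" is stored once, correcting it in `uniprot_entry` updates every part that links to it, which is the redundancy reduction normalization is meant to buy.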
Fig.3 Database structure
iGEM Parts
We downloaded the latest iGEM parts tables directly from the iGEM Registry's POINT-IN-TIME DATABASE DUMP to ensure that the information is accurate and up to date. At the same time, we extended the mappings between the different databases, using the iGEM parts as the common anchor.
Fig.4 iGEM POINT-IN-TIME DATABASE DUMP
Interaction
BioMaster provides interaction information between biobricks and related components. We collected the genes contained in each part from the iGEM Registry and looked up their interactions in STRING[2]. BioGRID further enriches this interaction information.
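A minimal sketch of the two steps, under stated assumptions: building a request URL for STRING's public `network` endpoint (multiple identifiers separated by carriage returns, plus an NCBI taxon ID for `species`), and merging interaction pairs from STRING and BioGRID into one undirected, de-duplicated edge set. The helper names are hypothetical, and the sketch does not perform any network request.

```python
from urllib.parse import urlencode

STRING_API = "https://string-db.org/api"


def string_network_url(genes, species_taxon, fmt="json"):
    """Build a STRING 'network' API URL; identifiers are joined by '\\r' (sent as %0D)."""
    params = {"identifiers": "\r".join(genes), "species": species_taxon}
    return f"{STRING_API}/{fmt}/network?{urlencode(params)}"


def merge_interactions(*sources):
    """Merge interaction pairs from several databases into undirected edges.

    Each source is a (label, pairs) tuple, e.g. ("STRING", [("lacI", "lacZ")]).
    Returns {frozenset({gene_a, gene_b}): set of source labels}.
    """
    edges = {}
    for label, pairs in sources:
        for a, b in pairs:
            if a == b:
                continue  # ignore self-interactions
            edges.setdefault(frozenset((a, b)), set()).add(label)
    return edges
```

Storing each edge under a `frozenset` makes A-B and B-A the same key, so an interaction reported by both databases appears once, tagged with both sources.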
To visualize the interactions between biobricks, we used the interactive graph library Cytoscape.js to draw interactive network diagrams.
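Cytoscape.js accepts a flat list of elements in which nodes carry `data.id` and edges carry `data.source` and `data.target` pointing at node IDs. The sketch below converts a mapping from unordered gene pairs to source labels into that JSON shape on the server side; the function name and the extra `sources` field (usable for edge styling) are hypothetical.

```python
def to_cytoscape_elements(edges):
    """Convert {frozenset({a, b}): labels} into a Cytoscape.js elements list."""
    nodes = sorted({gene for pair in edges for gene in pair})
    elements = [{"group": "nodes", "data": {"id": n}} for n in nodes]
    for pair, labels in sorted(edges.items(), key=lambda kv: sorted(kv[0])):
        ends = sorted(pair)
        a, b = ends[0], ends[-1]  # a one-element set would be a self-loop
        elements.append({
            "group": "edges",
            "data": {
                "id": f"{a}--{b}",
                "source": a,
                "target": b,
                "sources": sorted(labels),  # custom field, e.g. for styling
            },
        })
    return elements
```

The resulting list can be serialized with `json.dumps` and passed to the `elements` option when the Cytoscape.js instance is created in the browser.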
Fig.5 Interaction of biobricks
Enzymes
Enzyme-coding fragments are often the key components of iGEM parts or devices. To give experimenters better guidance, we included KEGG, BRENDA[3] and ExplorEnz[4] to supplement the enzyme-related information, using the EC number as the key that links these databases. Together they supply information such as enzyme activity conditions, reactants and products, and references.
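Using the EC number as a join key only works if every source writes it the same way. A minimal sketch, assuming records arrive as dicts with an `ec` field that may carry an "EC" prefix: normalize each value, reject malformed ones, and group records from KEGG, BRENDA and ExplorEnz under the shared key. The pattern is a simplified EC format (four fields, trailing fields may be "-", the last may be a preliminary "n" number); the function names are hypothetical.

```python
import re
from collections import defaultdict

# Simplified EC number pattern, e.g. 1.1.1.1, 2.7.-.-, 3.5.1.n3
EC_PATTERN = re.compile(r"^\d+\.(?:\d+|-)\.(?:\d+|-)\.(?:n?\d+|-)$")


def normalize_ec(raw):
    """Strip whitespace and an optional 'EC'/'EC:' prefix; return None if invalid."""
    ec = raw.strip()
    if ec.upper().startswith("EC"):
        ec = ec[2:].lstrip(" :")
    return ec if EC_PATTERN.match(ec) else None


def join_by_ec(**sources):
    """Group records from several enzyme databases under their normalized EC number.

    Each keyword argument maps a source name to a list of dicts with an 'ec' field;
    records whose EC number cannot be parsed are skipped.
    """
    merged = defaultdict(dict)
    for source, records in sources.items():
        for rec in records:
            ec = normalize_ec(rec.get("ec", ""))
            if ec is not None:
                merged[ec].setdefault(source, []).append(rec)
    return dict(merged)
```

For example, `join_by_ec(kegg=[...], brenda=[...])` returns one entry per EC number holding the matching records from each database side by side.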
Search
To improve the user experience, BioMaster offers several search methods. Users can find the corresponding biobricks by keyword or by various IDs (such as iGEM_ID, UniProt_ID, EC number, gene name or EPD_ID). In addition, we provide a sequence alignment tool (BLAST[1]), which makes it possible to find matching biobricks directly from a sequence.
Fig.6 Different search methods
Keywords Search
The keyword search of BioMaster 1.0 was well received, and many users hoped we would improve it further. For better maintainability and scalability, we now implement keyword search with Elasticsearch.
ID Search
BioMaster supports retrieval by many kinds of database IDs, letting users find biobricks in a variety of ways and view the data from multiple perspectives.
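One way to accept many ID types in a single search box is to guess the type from the string's shape before querying. A minimal sketch under that assumption: the UniProt pattern is the accession format published in UniProt's documentation, while the iGEM and EC patterns are simplified and the function name is hypothetical. Anything that matches no ID pattern falls back to keyword search.

```python
import re

# Checked in order; the first matching pattern wins.
ID_PATTERNS = [
    ("igem", re.compile(r"^BBa_[A-Za-z0-9_]+$")),
    ("ec", re.compile(r"^\d+\.(?:\d+|-)\.(?:\d+|-)\.(?:n?\d+|-)$")),
    ("uniprot", re.compile(
        r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
        r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$")),
]


def classify_query(query):
    """Guess which ID type a search string is, falling back to 'keyword'."""
    q = query.strip()
    for id_type, pattern in ID_PATTERNS:
        if pattern.match(q):
            return id_type
    return "keyword"
```

Gene names and EPD IDs are not distinguished here; in this sketch they would go to the keyword path, where a full-text index can still match them.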
Sequence Search
We noticed that many users wanted to search by sequence, so BioMaster provides sequence alignment search. Users submit a sequence and set appropriate thresholds to locate the related information.
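The thresholds can be applied to BLAST+ tabular output (`-outfmt 6`), whose default columns are the twelve listed in `OUTFMT6_FIELDS`. A minimal sketch: parse the hits, keep those meeting a minimum percent identity, a maximum e-value and a minimum alignment length, then sort by bit score. The default threshold values and function names are hypothetical, not BioMaster's settings.

```python
# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
FLOAT_FIELDS = ("pident", "evalue", "bitscore")
INT_FIELDS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")


def parse_outfmt6(text):
    """Parse tab-separated BLAST hits into dicts with numeric fields converted."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        hit = dict(zip(OUTFMT6_FIELDS, line.split("\t")))
        for key in FLOAT_FIELDS:
            hit[key] = float(hit[key])
        for key in INT_FIELDS:
            hit[key] = int(hit[key])
        hits.append(hit)
    return hits


def filter_hits(hits, min_identity=90.0, max_evalue=1e-5, min_length=0):
    """Keep hits passing all thresholds, best bit score first."""
    kept = [h for h in hits
            if h["pident"] >= min_identity
            and h["evalue"] <= max_evalue
            and h["length"] >= min_length]
    return sorted(kept, key=lambda h: (-h["bitscore"], h["evalue"]))
```

Applying the thresholds after the search rather than through BLAST options lets users tighten or relax them without rerunning the alignment.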
Team Wiki Search
Studying previous projects is very helpful for new teams. Building on version 1.0, we added the team information and profile pictures of the 306 teams from 2018, together with any single (special) awards they won. Users can now search by "Year & Single Award" to find the teams that won a given single award in a given year.
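A "Year & Single Award" lookup maps naturally onto an index keyed by the pair (year, award). A minimal sketch with a hypothetical `Team` record; the team names in the usage example are placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Team:
    name: str
    year: int
    single_awards: list = field(default_factory=list)  # special awards won


def build_award_index(teams):
    """Map (year, award) to the names of the teams that won that award."""
    index = defaultdict(list)
    for team in teams:
        for award in team.single_awards:
            index[(team.year, award)].append(team.name)
    return index
```

Building the index once makes each query a single dictionary lookup, e.g. `index.get((2018, "Best Software Tool"), [])`.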
Fig.7 Team wiki search
Cloud Database Service
Our project combines a local server with cloud storage: the files are stored in Amazon S3. Keeping the data in the cloud helps protect the local project and makes it easier to provide data services to other teams.
Users can download biobrick data in FASTA and GenBank formats. The whole database can be downloaded in SQL, CSV and XML formats, from which users can rebuild an identical copy of the database.
Web Page
To make BioMaster more user-friendly, we designed a guiding home page and a top navigation bar, and this year we also adapted the layout to different devices. On the search results page, we modeled the side navigation and filtering on the designs of NCBI PubMed and UniProt. All of this aims to create a smooth user experience.
Additionally, we used the Bootstrap responsive framework and other front-end plugins to make the web pages responsive, meeting the needs of users on mobile devices.
On the back end, we use the PHP framework Laravel. Laravel is one of the most popular and widely used PHP frameworks; its simple, elegant design frees us from messy code.
Docker is an open-source application container engine that lets developers package their applications and dependencies into a portable image and publish it to any popular Linux or Windows machine, which also makes it easy to share a development environment.
We developed a version of BioMaster 2.0 that runs in Docker. After downloading the website files, users only need to install Docker and run the image to run the website independently; the data travel with the image, so the site is self-contained and easy to move between machines. The website is also fully open source: other developers can pull the image by following our documentation and share the same development environment with us.
We used Elasticsearch, one of the most popular open-source search engines, for keyword full-text search.
BioMaster 1.0 implemented keyword search with MySQL full-text matching and its default dictionary. This approach performed poorly, offered few query options, and required the dictionary to be updated in step with every database update. We therefore switched to Elasticsearch, a search engine built on the Lucene library that is fast and supports a wide range of query operations. Because Elasticsearch does not require us to maintain a separate dictionary, database updates do not affect search performance.
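For illustration, a keyword search in Elasticsearch is expressed as a JSON Query DSL body. The sketch below builds one with a `bool` query: a `multi_match` clause scores the keywords across several fields with per-field boosts, and an optional `term` filter narrows results without changing scores. The field names, boosts and index layout are assumptions (the `term` filter assumes `part_type` is mapped as a keyword field), not BioMaster's configuration.

```python
def build_keyword_query(keywords, part_type=None, size=20):
    """Build an Elasticsearch Query DSL body for full-text part search."""
    must = [{
        "multi_match": {
            "query": keywords,
            # Hypothetical fields; "^3" makes name matches weigh three times more.
            "fields": ["part_name^3", "short_desc^2", "description"],
        }
    }]
    body = {"size": size, "query": {"bool": {"must": must}}}
    if part_type:
        # Filter clauses restrict the result set without affecting relevance scores.
        body["query"]["bool"]["filter"] = [{"term": {"part_type": part_type}}]
    return body
```

The body would be sent as JSON to the `_search` endpoint of the parts index; since the index is rebuilt from the database rather than from a hand-maintained dictionary, new records become searchable after reindexing.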
User-friendliness
BioMaster made great efforts to improve user experience:
We designed a graphical guide on the home page and a simple, easy-to-use navigation bar to give first-time visitors a pleasant experience. We also integrated frequently used functions, such as ID conversion between databases and a web version of NCBI BLAST, and improved the overall web design.
According to our survey, search result ranking was the feature users most wanted BioMaster to have. We therefore focused on improving the search process and result display, and added filtering and sorting functions. By adjusting the ranking method on the search results page, users can get the results they want.
Furthermore, many users want to find the newest parts or parts with more descriptive information. To meet this need, we designed a weighting system: users can adjust the weights of "document quantity", "keyword matching degree" and "descriptive information quality", and BioMaster shows the re-ranked recommendations when the page refreshes.
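A common way to implement such user-adjustable ranking is a weighted average of per-result factor scores. A minimal sketch, assuming each of the three factors has already been normalized to [0, 1]; the class, field and function names are hypothetical, and the actual BioMaster scoring may differ.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    part_id: str
    document_quantity: float    # each factor assumed pre-normalized to [0, 1]
    keyword_match: float
    description_quality: float


def rank(candidates, weights):
    """Sort candidates by the weighted average of the three factors, highest first."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")

    def score(c):
        return (weights["document_quantity"] * c.document_quantity
                + weights["keyword_match"] * c.keyword_match
                + weights["description_quality"] * c.description_quality) / total

    return sorted(candidates, key=score, reverse=True)
```

Dividing by the weight sum keeps scores on the same [0, 1] scale whatever weights the user chooses, so only the relative weights affect the order.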
New Function
EC Number Prediction
The EC number prediction tool[5] is an automated enzymatic function prediction method based on EC numbers that takes amino acid sequences as input. It uses an ensemble approach, combining the results of 3 different predictors of differing quality, and can provide probabilistic enzymatic function predictions for uncharacterized protein sequences.
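To make the ensemble idea concrete, here is a generic sketch that combines per-class probabilities from several predictors by a weighted average and keeps classes above a threshold. This is an illustrative assumption, not necessarily the combination rule used by the cited tool; the weights, the threshold and the function name are hypothetical.

```python
def ensemble_predict(predictions, weights, threshold=0.5):
    """Combine per-class probabilities from several predictors by weighted average.

    predictions: one {ec_number: probability} dict per predictor.
    weights: one non-negative weight per predictor, e.g. reflecting its quality.
    Returns {ec_number: combined probability} for classes at or above the
    threshold, ordered from most to least probable.
    """
    if len(predictions) != len(weights):
        raise ValueError("need one weight per predictor")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    classes = set().union(*predictions)  # every EC number any predictor scored
    combined = {
        ec: sum(w * p.get(ec, 0.0) for p, w in zip(predictions, weights)) / total
        for ec in classes
    }
    kept = {ec: s for ec, s in combined.items() if s >= threshold}
    return dict(sorted(kept.items(), key=lambda kv: kv[1], reverse=True))
```

A class that a predictor does not score counts as probability 0 for that predictor, so a high-quality predictor that strongly disagrees can pull a class below the threshold.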
Fig.8 EC prediction[6]
Visualization of SBOL Diagrams
People often search for parts and designs of interest in the SynBioHub design repository. SynBioHub displays search results through a visual interface that lets users intuitively understand the structure of each biobrick.
Fig.9 Example on SynBioHub
However, SynBioHub does not update the contents of its library in a timely manner or correct erroneous features, so we combined the contents of BioMaster 2.0 with SynBioHub's approach to build an optimized visual interface[7] that gives users a better search experience.
Fig.10 Visual interface in BioMaster 2.0
SBOL Design Tool
SBOL Designer-3.0[8] is a Java-based visual design tool that helps users design new biobricks or simple gene sequences.
The software can be downloaded from GitHub, but installing it is very difficult for most synthetic biology users who are not familiar with Java. We therefore created a web-based SBOL[9] design tool that users can launch with a single click and that visualizes the genetic design directly in the web page.
With this tool, users can add components such as promoters and CDSs, and attach information such as descriptions and sequences to each component. When the design is finished, users can save it as an image or in GenBank format for later use.
Fig.11 SBOL-design tool
Others
We inherited last year's promoter prediction tools. Please refer to BioMaster1.0_description and
BioMaster1.0_model for details.
For users' convenience, we added two small but useful functions to BioMaster 2.0: a web version of NCBI BLAST preset with commonly used parameters, and an ID conversion tool between databases built on the UniProt API.