Team:UESTC-Software/Human Practices


Overview

We spent a lot of time and energy constructing our models, which played an important role in our project.
Software engineering mainly comprises requirements analysis, software design, code implementation, and maintenance and testing, which we group into three stages. All of our project development and Human Practices work followed the model below. Activities at different stages had different priorities and needs, so we adjusted our responses accordingly. Through a series of Human Practices activities, our project was gradually refined and successfully completed.
During software development, requirements tend to be updated constantly. Software design often involves backtracking, especially during the testing and maintenance stage, which means that time-consuming earlier work may have to be repeated. We therefore put continuous effort into researching what experimental teams and synthetic biologists need from the software, and our Human Practices focused on investigation during stage 1 and stage 2.

Design and Implementation

After deciding to improve BioMaster into an auxiliary tool for a Gene-CAD system, we wanted to learn more about the software requirements of synthetic biology researchers and iGEMers. In the design and implementation stage, we conducted three rounds of research within the spiral model shown below.
[Figure: spiral model of the three research rounds. Labels: Expanding the database; Optimizing search; Adding EC Prediction; Copyright of database; Adding Education Module; Optimizing database structure; Feedback to iGEM about mistaken mark of feature; ElasticSearch to optimize search; Coding]

First Round

Requirement Analysis 1: Expanding the database

Communication with the Leader of UESTC-China 2015

We had the honor of interviewing Kaiyue Zhang, leader of the 2015 UESTC-China team, and inviting him to try BioMaster 1.0. As an experienced predecessor, he not only explained the core ideas of synthetic biology to us but also pointed out BioMaster’s shortcomings and ways to improve it.
Summary
1. To keep the data complete and current, the database must be expanded and its contents updated promptly.
2. Showing the track of each part would make the tool more user-friendly.

"A Gene Computer-Aided Design system is necessary for synthetic biology, and it requires a high-standard database. The timeliness and completeness of the data matter enormously to a database. In addition, given that the iGEM competition includes 12 tracks, such as diagnosis, treatment, environment, etc., teams in different tracks will want different things from a search, so classification by the function of the part and by species of origin can be added."
—— Leader of 2015 UESTC-China Kaiyue Zhang

Requirement Analysis 2: Improvement

Questionnaire about Requirements

For the database upgrade, besides fixing shortcomings left by time constraints (such as the interface display and the mode of user interaction), we needed more professional and targeted opinions from iGEMers and synthetic biologists to improve the database's functionality, so a questionnaire survey was indispensable.
The survey targeted two groups: people who had never used BioMaster 1.0 and people who had. From the former, we mainly wanted opinions on the necessity of database integration and on their particular needs; from the latter, we sought specific advice on the previous version and on how to improve it. The results of the survey are as follows:
Note: We received 131 valid responses, 61 of them from people who had tried BioMaster 1.0.


Using the chi-square test and other methods, we drew the following conclusions:
1. The iGEM Registry has many problems, so it is clearly necessary to keep improving the database.
2. BioMaster needs to expand its information and improve its interface design.
3. BioMaster needs to provide more detailed information about part/gene interactions and the environmental conditions of enzyme expression.
Summary
1. STRING, BRENDA, KEGG, BioGRID and other practical databases can be integrated into BioMaster.
2. Improving user interaction is necessary for a better experience.
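
The chi-square test of independence mentioned above can be sketched as follows. This is a minimal illustration, not the team's actual analysis: the cell counts are hypothetical placeholders (only the row totals, 61 and 70, follow from the 131 valid responses), and `chi_square_2x2` is a name introduced here.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 table (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# HYPOTHETICAL counts, for illustration only.
# Rows: tried BioMaster 1.0 (61 people), never tried it (70 people).
# Columns: answered "yes" / "no" to some survey question.
table = [[45, 16],
         [38, 32]]

stat = chi_square_2x2(table)

# Critical value of the chi-square distribution, 1 degree of freedom, alpha = 0.05.
CRITICAL_005_DF1 = 3.841
significant = stat > CRITICAL_005_DF1
print(f"chi-square = {stat:.3f}, significant at 0.05: {significant}")
```

A statistic above the critical value would suggest that the two groups answered differently; with real survey data, a library routine such as SciPy's `chi2_contingency` would also report the p-value.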

Software Design: Copyright

Consultation with Law Firm

To find out whether using the databases contained in BioMaster could lead to infringement, we consulted lawyer Liang for professional advice. He has handled many related cases and has rich experience in this field. He indicated that most of these databases are open resources and that, although they carry no explicit confirmation that we may use them, intellectual property issues are generally not involved.

Summary
1. We need to check every database carefully so that we do not miss any terms governing its use.
2. We need to state clearly which databases we integrated and that we have no commercial intentions.

Implementation: Feedback to iGEM

Email to iGEM HQ about the Feature of Part in iGEM Registry

As an auxiliary tool for CAD, BioMaster depends on accurate and complete data when integrating databases. However, while processing the Part Registry provided by iGEM, we found many problems. Because our project centers on the iGEM database, the non-standardized data greatly increased our workload. For many experimental teams, incorrectly labeled part features also cause problems. We therefore sorted out the major feature annotation problems and emailed them to iGEM HQ.
Summary
To encourage teams to submit parts in a standard form in the future, we proposed that iGEM HQ strengthen the review of part submissions, promoting the standardization and modularization of synthetic biology through strict specifications.
Common feature annotation errors:
1. Incomplete ID display
2. Incorrect or missing feature labels
3. Incorrectly marked sites
4. Incorrectly marked lengths
5. Empty part sequences
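
Checks for the five error types listed above could be automated along the following lines. This is only a sketch under stated assumptions: the record layout (`part_id`, `sequence`, `features` with `label`, `start`, `end`, `length`), the ID pattern, and the function name are hypothetical and do not describe the Registry's actual export format or the team's code.

```python
import re

# ASSUMPTION: a complete part ID looks like "BBa_" followed by at least one character.
PART_ID_PATTERN = re.compile(r"^BBa_[A-Za-z0-9_]+$")


def find_annotation_errors(part):
    """Return a list of human-readable problems found in one part record."""
    errors = []
    part_id = part.get("part_id", "")
    sequence = part.get("sequence", "")

    if not PART_ID_PATTERN.match(part_id):
        errors.append("incomplete or malformed ID")
    if not sequence:
        errors.append("empty sequence")

    for feature in part.get("features", []):
        label = (feature.get("label") or "").strip()
        start, end = feature.get("start"), feature.get("end")
        if not label:
            errors.append("missing feature label")
        # Sites are assumed to be 1-based, inclusive positions in the sequence.
        if start is None or end is None or not (1 <= start <= end <= len(sequence)):
            errors.append(f"feature site out of range: {label or '?'}")
        elif feature.get("length") != end - start + 1:
            errors.append(f"feature length mismatch: {label or '?'}")
    return errors


# Hypothetical records for illustration.
good_part = {
    "part_id": "BBa_X0001",
    "sequence": "ATGCATGCAT",
    "features": [{"label": "promoter", "start": 1, "end": 4, "length": 4}],
}
bad_part = {
    "part_id": "BBa_",
    "sequence": "",
    "features": [{"label": "", "start": 3, "end": 9, "length": 5}],
}

print(find_annotation_errors(good_part))
print(find_annotation_errors(bad_part))
```

Running such a check at submission time is one way a stricter review could be enforced without manual inspection of every part.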

Second Round

Requirement Analysis: Optimizing Search

2019 Southwestern iGEM Exchange Conference

Together with UESTC-China, we hosted the 5th Southwestern iGEM Exchange Conference for iGEMers and invited four iGEM teams and Ms. Zhang Nan, the Asian Ambassador of iGEM.
The meeting emphasized the exchange of project ideas among teams. We also learned that different teams prioritize different software requirements. Participants pointed out that BioMaster 1.0 used only a brute-force algorithm for string matching, which made responses slow and search results poorly relevant, and they recommended optimizing the algorithm.
Summary
1. To optimize search, we decided to replace the brute-force approach with a method that returns more relevant results for keywords.
2. The accuracy of the mapping between databases needs particular attention.

Software Design: Education Module

Interview with Professor Zheng Xuelian

To improve the project, we needed more professional guidance. Professor Zheng Xuelian is regularly involved in synthetic biology research. We mainly consulted her about development trends in synthetic biology and its popularization, and we also received valuable suggestions on our project design.
Summary
1. To promote synthetic biology, it is more effective to highlight its practical applications in daily life.
2. Adding an educational module would not only serve science popularization but also attract more users and make BioMaster more practical.

Software Implementation: Optimizing Search

Interview with Professor Zeng Dong about Search Optimization

We turned to Professor Zeng Dong for help with the suggestion (optimizing the search) raised by other teams at the Southwestern iGEM Exchange Conference. He recommended Elasticsearch for optimizing keyword search.
Elasticsearch is a distributed search and analytics engine that can analyze massive amounts of data in near real time. Its natively distributed architecture, near-real-time (second-level) performance on very large data sets, and powerful query syntax for search and aggregation make Elasticsearch well suited to data analysis in big-data scenarios.
With this search engine, we only need to choose which deployments users can search. Elasticsearch (ES) handles the rest, including the certificates that establish trusted communication between deployments.
Summary
Elasticsearch is a better fit for our search than brute-force matching. It improved the relevance of keyword results and sped up search responses in a portable way, which improved the user experience.
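
The difference between a brute-force scan and index-based, relevance-ranked search can be illustrated with a toy example. This sketch is not Elasticsearch's API or its actual scoring: it builds a small inverted index in plain Python and ranks hits with a simple term-frequency and inverse-document-frequency weight. The documents and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    return text.lower().split()


def brute_force_search(docs, keyword):
    """Scan every document for the substring; results are unranked."""
    return [doc_id for doc_id, text in docs.items() if keyword.lower() in text.lower()]


def build_index(docs):
    """Inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][doc_id] = count
    return index


def ranked_search(index, n_docs, query):
    """Look up only the query terms and order hits by a simple TF-IDF score."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(1 + n_docs / len(postings))  # rarer terms weigh more
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical part descriptions.
docs = {
    "p1": "strong constitutive promoter",
    "p2": "inducible promoter with upstream promoter element",
    "p3": "green fluorescent protein reporter",
}
index = build_index(docs)

print(brute_force_search(docs, "promoter"))        # every document is scanned
print(ranked_search(index, len(docs), "promoter"))  # only the postings for "promoter" are read
```

The brute-force version reads every document on every query and also matches partial strings such as "mot", while the indexed version touches only the entries for the query terms and returns the documents that mention them most first.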

Third Round

Requirement Analysis: Adding EC Prediction

Visit to Sichuan Junyu Biotechnology Co., Ltd.

In the third round of research, we mainly sought opinions on the functionality of the database. We visited Mr. Zhang Yunfei, a researcher at Sichuan Junyu Biotechnology Co., Ltd. He advised us on which data are most useful in practice and emphasized the importance of prediction in biological research.
Summary
1. Promoter Prediction can be retained.
2. EC Prediction is a new feature this year and can offer iGEMers useful reference information when searching.

Software Design: Optimize Data Structure

The 6th Conference of China iGEMer Community

From August 20th to 23rd, we participated in the 6th CCiC, hosted by the Shenzhen Institute of Synthetic Biology, Chinese Academy of Sciences. As many as 70 teams attended. We received many suggestions from other teams and judges, as well as opportunities to cooperate with other teams.
During our presentation, a judge raised questions about the data structure and the relationships between tables and gave several optimization suggestions. Accordingly, we took the database structure into account when storing data in MySQL. Some teams also suggested enriching the features, for example by visualizing plasmid composition.
Summary
1. We improved the normalization of the data to a certain extent and strengthened the relationships between tables, which both reduced redundancy and made the data easier to query.
2. ID conversion and SBOL design can be added to enrich BioMaster.
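
The idea behind normalizing tables and strengthening their relationships can be sketched as follows. This is an illustration only: it uses Python's built-in `sqlite3` as a stand-in for MySQL, and the `organism` and `part` tables, their columns, and the sample rows are hypothetical rather than BioMaster's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Each organism name is stored once; parts refer to it by key instead of
# repeating the name in every row.
conn.executescript("""
CREATE TABLE organism (
    organism_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);
CREATE TABLE part (
    part_id     TEXT PRIMARY KEY,
    description TEXT NOT NULL,
    organism_id INTEGER NOT NULL REFERENCES organism(organism_id)
);
""")

conn.execute("INSERT INTO organism (organism_id, name) VALUES (1, 'Escherichia coli')")
conn.executemany(
    "INSERT INTO part (part_id, description, organism_id) VALUES (?, ?, ?)",
    [("part_A", "hypothetical promoter", 1), ("part_B", "hypothetical terminator", 1)],
)

# A join recovers the combined view when it is needed.
rows = conn.execute(
    "SELECT p.part_id, o.name FROM part AS p "
    "JOIN organism AS o ON o.organism_id = p.organism_id ORDER BY p.part_id"
).fetchall()

# The foreign key rejects a part that points at a non-existent organism.
try:
    conn.execute("INSERT INTO part VALUES ('part_C', 'orphan part', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(rows, orphan_rejected)
```

Storing shared values once and linking them by key is what reduces redundancy, while the declared relationship lets the database itself keep the tables consistent.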

Maintenance and Testing

Interview with Professor Liang of Sichuan University

Once implementation was complete, we had a test version and contacted Professor Liang Shufang of the State Key Laboratory of Biotherapy, Sichuan University. Through discussions with her and her students, we received a new suggestion: a one-click update function. Considering the maintainability, sustainability, and timeliness of the database, we added a function that updates the database regularly. We were also told that some users are not clear about what EC prediction does and that the results shown after EC prediction were too brief and should be more detailed. In response, we added an introduction to EC and enriched the presentation of the results.

Test and Feedback

In addition, we invited many iGEM teams to test our software and received a lot of feedback. We then improved the visual design and fixed some bugs.

Collaborations with Others

After all the modifications and refinements, we collaborated with Tongji-Software and USTC-Software on their projects. These collaborations not only made their projects more complete but also made our project more meaningful.