BioTuring Data Delivery Pipeline

Project information

The BioTuring data delivery pipeline is used by BioTuring's data science team to publish curated data to the public database, where BBrowser users can download it. The pipeline accepts data release requests from the data administrator through an interface in BBrowser. Each request is pushed to a delivery task queue, from which the task manager distributes requests to the workers. Each worker processes one request: it releases the data to Amazon S3, writes the metadata to MongoDB, and logs the result to a MySQL database.
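The flow described above (request, queue, worker, three destinations) can be sketched roughly as follows. This is a minimal illustration and not the actual BioTuring implementation: the in-memory `object_store`, `metadata_store`, and `result_log` are hypothetical stand-ins for Amazon S3, MongoDB, and MySQL, and the request fields (`dataset_id`, `data`, `metadata`) and function names are assumptions made for the example. It also assumes each worker handles one request at a time.

```python
import queue
import threading

# Hypothetical in-memory stand-ins for Amazon S3, MongoDB, and MySQL.
object_store = {}
metadata_store = {}
result_log = []
store_lock = threading.Lock()


def process_request(request):
    """Release one dataset and record the outcome (assumed request shape)."""
    dataset_id = request.get("dataset_id")
    try:
        data = request["data"]
        metadata = request["metadata"]
    except KeyError as exc:
        # An incomplete request is logged as failed; nothing is released.
        status = f"failed: missing field {exc}"
    else:
        with store_lock:
            object_store[dataset_id] = data        # stands in for the S3 upload
            metadata_store[dataset_id] = metadata  # stands in for the MongoDB write
        status = "released"
    with store_lock:
        result_log.append((dataset_id, status))    # stands in for the MySQL log row


def worker(task_queue):
    """Take requests off the queue one at a time until a None sentinel arrives."""
    while True:
        request = task_queue.get()
        try:
            if request is None:
                return
            process_request(request)
        finally:
            task_queue.task_done()


def run_pipeline(requests, num_workers=2):
    """Play the task manager: enqueue requests and let the workers drain them."""
    task_queue = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(task_queue,))
        for _ in range(num_workers)
    ]
    for thread in threads:
        thread.start()
    for request in requests:
        task_queue.put(request)
    for _ in threads:
        task_queue.put(None)  # one shutdown sentinel per worker
    task_queue.join()
    for thread in threads:
        thread.join()
```

Calling `run_pipeline` with a list of request dictionaries fills the two stores and appends one `(dataset_id, status)` entry to `result_log` per request. A real deployment would presumably replace the stand-ins with actual storage clients and a persistent queue.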

My Contributions

BioTuring Data Delivery Pipeline

  • Designed the pipeline's architecture and built the API server and the interface users interact with.
  • Maintained the pipeline and fixed existing bugs.