BD2K – The NIH’s Big Data To Knowledge Effort

The topic at a recent Bioinformatics lunch discussion was the NIH’s Big Data To Knowledge (BD2K) program. The BD2K Mission Statement (below) gives some idea of where the program is heading:

BD2K is a trans-NIH initiative established to enable biomedical research as a digital research enterprise, to facilitate discovery and support new knowledge, and to maximize community engagement.

The BD2K initiative addresses four major aims that, in combination, are meant to enhance the utility of biomedical Big Data:

  • To facilitate broad use of biomedical digital assets by making them discoverable, accessible, and citable.
  • To conduct research and develop the methods, software, and tools needed to analyze biomedical Big Data.
  • To enhance training in the development and use of methods and tools necessary for biomedical Big Data science.
  • To support a data ecosystem that accelerates discovery as part of a digital enterprise.

Overall, the focus of the BD2K program is to support the research and development of innovative and transforming approaches and tools to maximize and accelerate the integration of Big Data and data science into biomedical research. [BD2K Mission Statement]

But as the interview below with project director and noted bioinformaticist Phil Bourne indicates, the scope of the initiative spans the entire research life cycle (ideas, hypotheses, experiments, data, analysis, comprehension, dissemination). In concrete terms, that means tools for authoring, collecting, analyzing and visualizing data. In part, the project hopes to address some of the issues around reproducibility, discoverability and provenance of data and software that have impeded the ability of industry to leverage academic research. [Assessing Credibility Via Reproducibility] [Reproducibility and Provenance]

The Commons is a program within the BD2K initiative that seeks to improve the discoverability of data and to provide open APIs (for data and tools), unique IDs for research objects, and containers for packaged applications that run in cloud and HPC environments.

My goal at this month’s meeting was to learn more about the program from some of the local participants.

Ben Good from the Su Lab at The Scripps Research Institute described some of the work they were doing.  Earlier this year, they held a Network of Biothings/BD2K Hackathon to kickstart some projects.  During that meeting the

Chunlei Wu, also of the Su Lab, is developing a “Community Platform for Data Wrangling of Gene and Genetic Variant Annotations”.

At the PDB, Peter Rose has been working on a compression technique for 3D protein structures as part of the BD2K’s Targeted Software Development program.  This technique makes it possible to stream complex 3D structures in the same way that you might stream a YouTube video.


About Mark Fortner

I write software for scientists doing drug discovery and cancer research. I'm interested in Design Thinking, Agile Software Development, Web Components, Java, JavaScript, Groovy, Grails, MongoDB, Firebase, microservices, the Semantic Web, Drug Discovery and Cancer Biology.