Cloud Computing for Drug Discovery

Over the past year, cloud computing has become a major topic of conversation in drug discovery companies.  At the San Diego Bioinformatics Forum this year, Sandor Szalma gave a great presentation on how cloud computing was used at his company, Centocor (a division of Johnson and Johnson).  Just prior to his talk, BioITWorld featured an article on Pfizer’s use of antibody docking in the cloud.  And just recently, they devoted an entire issue to cloud computing.

What Is Cloud Computing?
So what exactly IS cloud computing?  The answer, of course, depends on whom you ask; there are three different approaches to cloud computing.

  1. To some vendors, cloud computing is hosted computing.  The software and data for your company run on a hosted server.  (This is’s approach).
  2. To the users of Google App Engine, it’s a technology platform that allows you to host your own services at negligible cost, and with the ability to scale as your needs change.
  3. To the users of Amazon’s Web Services, it’s a means for hosting Virtual Machine Images and easily deploying them.

And it’s this last scenario that has received the most traction within the life sciences community.

What Are the Benefits?
Cloud computing provides ready access to high-performance hardware without the added costs associated with building and provisioning your own server room.  Cloud computing can let you deal with peak computing needs, so that you don’t have to buy extra hardware to meet a temporary need.  Often when attempting to host similar research computing applications internally, one ends up having long conversations about what the predicted disk and memory usage will be for the application.  As with many research applications, these types of metrics are hard to gauge up-front.  Cloud computing makes those types of discussions unnecessary, and in many cases bypasses the whole IT provisioning process altogether.
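
The peak-capacity argument can be made concrete with a little arithmetic.  The sketch below is only an illustration: the function names and every figure passed to them (node counts, prices, hourly rate) are hypothetical placeholders, not actual vendor pricing, so plug in your own numbers before drawing conclusions.

```python
HOURS_PER_YEAR = 24 * 365


def owned_cost(peak_nodes, price_per_node, upkeep_per_node_year, years):
    """Cost of buying enough nodes to cover the peak and running them for `years`."""
    return peak_nodes * (price_per_node + upkeep_per_node_year * years)


def rented_cost(baseline_nodes, peak_nodes, peak_hours_per_year,
                rate_per_node_hour, years):
    """Cost of renting baseline nodes all year plus extra nodes only during peaks."""
    baseline_hours = baseline_nodes * HOURS_PER_YEAR
    burst_hours = (peak_nodes - baseline_nodes) * peak_hours_per_year
    return (baseline_hours + burst_hours) * rate_per_node_hour * years


if __name__ == "__main__":
    # All inputs below are made-up values for illustration only.
    own = owned_cost(peak_nodes=40, price_per_node=5000.0,
                     upkeep_per_node_year=1000.0, years=3)
    rent = rented_cost(baseline_nodes=4, peak_nodes=40, peak_hours_per_year=336,
                       rate_per_node_hour=0.50, years=3)
    print(f"Own hardware sized for the peak: ${own:,.0f}")
    print(f"Rent baseline plus bursts:       ${rent:,.0f}")
```

The comparison swings on how many hours per year the peak actually lasts; the shorter and rarer the peak, the more the rented option tends to win.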

Some pharmas are taking a page from the cloud computing book and standardizing on a particular set of Virtual Machine Images (VMIs).  The advantage of this approach is that they can simply hand you a VMI that is guaranteed to work in their server environment.  You can develop and test your application with the assurance that everything will work properly when deployed in the production environment.

Over the past few years, pharmas have been adding to their pipelines through collaborative agreements with smaller biopharmaceutical companies.  Cloud computing makes it possible to provide the infrastructure required to support those collaborations in “neutral territory” that both parties have access to.

Is It Safe/Secure?
As with any application development project, things are as secure as you make them.  In the case of Centocor and Pfizer, Amazon’s cloud offering was vetted by internal security resources.  Amazon recently began offering a Virtual Private Cloud, basically a cloud application with VPN access.

What’s Missing?
At the moment Amazon offers a number of features that make it easier to develop informatics applications in a cloud:

  • Access to large public datasets like PubMed, UniGene, GenBank and PubChem.
  • Amazon Machine Images (AMIs) with preconfigured tools make it easy to get started.  The bioinformatics image is a good example of this.
  • Built-in infrastructure to support relational databases and web application servers.
  • OS Support for Windows, Linux and Solaris.

Despite all of these features there are still a few things I would like to see:

  • Additional public datasets for EntrezGene, PDB, Pathway Commons and ontologies like the NCI Thesaurus.
  • VMIs tailored to specific roles within the drug discovery process.  The existing bioinformatics image provides tools for genomics analysis, but not much else.  Tools for target identification, compound registration, compound management and screening are not yet available.  A tiered offering, with an open source set of tools for startup companies and a proprietary set of tools for big pharmas, would definitely make the transition to cloud computing easier.
  • Better ways to move large datasets around, which is still problematic.  Amazon offers a bulk loading service whereby you can send them a disk of data to be loaded onto your grid.  This method seems to be the most efficient approach for loading large datasets.
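
To see why shipping a disk can beat the network, it helps to estimate how long an upload would take.  This is a rough sketch, not a benchmark: the `transfer_days` helper, the 5 TB dataset size, the link speeds and the 80% link efficiency are all assumptions chosen for illustration.

```python
def transfer_days(dataset_tb, bandwidth_mbps, efficiency=0.8):
    """Estimate days needed to upload `dataset_tb` terabytes (decimal, 1e12 bytes)
    over a link of `bandwidth_mbps` megabits per second.

    `efficiency` is an assumed fraction of the nominal bandwidth that is
    actually achieved in practice.
    """
    bits = dataset_tb * 1e12 * 8
    seconds = bits / (bandwidth_mbps * 1e6 * efficiency)
    return seconds / 86400


if __name__ == "__main__":
    # Hypothetical 5 TB dataset over three hypothetical link speeds.
    for mbps in (10, 100, 1000):
        days = transfer_days(5, mbps)
        print(f"{mbps:>5} Mbit/s: {days:.1f} days for 5 TB")
```

If the estimate comes out at several days or more, a courier carrying a disk is likely to arrive first.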

Where Can I Get Help?
I liken the current state of cloud computing to Microsoft Office without templates.  Sure, you can get work done, but the results often aren’t pretty, and it will take you a while to get there.  Fortunately, there are a number of organizations with pharma experience helping to bridge the gap between Amazon’s offering and the real-world needs of pharmaceutical companies.  Here in San Diego, Cirrhus 9 has developed AMIs that help companies with their document management needs in a regulated environment, and more offerings are in the pipeline.  Organizations like BioTeam and Cycle Computing are also working to bridge the gap.


About aspenbio

I write software for scientists. I'm interested in Java/Groovy/Grails, the Semantic Web and Cancer Biology.
This entry was posted in Bioinformatics, Informatics.
