Fetching Data With BioGroovy

With BioGroovy 1.1 we’ve added support for fetching data from a variety of RESTful web services, including EntrezGene, PubMed, UniProt, and many others.

Fetching Data
In this example, we’ll fetch gene information from EntrezGene for 3 genes, and output the result to the console.

@Grab(group='org.biogroovy', module='biogroovy', version='1.1')
import org.biogroovy.io.eutils.EntrezGeneSlurper
import org.biogroovy.models.Gene

// Create a slurper that fetches gene records from EntrezGene
EntrezGeneSlurper slurper = new EntrezGeneSlurper()

// Print the header rows
println "Gene"
println "Symbol\tEntrezGeneID\tName"

// Fetch three genes by their comma-delimited EntrezGene IDs
List<Gene> geneList = slurper.fetchAll('675,1034,133')
geneList.each { Gene gene ->
    println "${gene.symbol}\t${gene.entrezGeneId}\t${gene.name}"
}

In the slurper.fetchAll() method call, we pass a string containing a comma-delimited list of EntrezGene IDs.  This returns a list of Gene objects.  We iterate through the gene list and print the results out to the console.


BioGroovy and Web Service Identity

With the recent release of BioGroovy 1.1, we added support for a number of web services. Web service providers like NCBI’s eUtils, EntrezAjax and JournalTOCs have requirements for tracking usage of their services.  In some cases, each request must carry a token that identifies the tool making the request, or the email address of a specific user.

To support this type of interaction, a BioGroovyConfig class was added.  In this example, we’ll see how the EntrezGeneSlurper class takes advantage of BioGroovyConfig.  In the constructor for EntrezGeneSlurper you’ll see the following snippet:

ConfigObject conf = BioGroovyConfig.getConfig();
this.tool = conf.eutils.tool
this.email = conf.eutils.email

The first line looks for a biogroovy.conf file in your ~/.biogroovy directory. If it doesn’t see a file there, it copies the default configuration into this directory, and throws an exception, letting the user know that they’ve failed to configure the biogroovy.conf file properly. The default configuration file does not contain any real identity information, and so it must be updated with real information in order to be used. Here’s an example of what the default file looks like:

eutils = [
    tool : 'biogroovy',
    email : 'goofy@disney.com'
]

The biogroovy.conf file is in reality a Groovy file that is parsed as a Groovy ConfigObject. In the first line, we’re declaring the eutils properties, tool and email, that need to be sent with each request. In the second line we’re setting the journaltocs.userid property, and in the third line, the EntrezAjax userid. In each of these cases, you’ll need to replace these default values with your own values.  The links in this paragraph will take you to the registration pages for these services.
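To make the lookup behaviour described above a little more concrete, here is a minimal sketch of a "seed the defaults, then complain" loader built on Groovy’s standard ConfigSlurper. This is not the actual BioGroovyConfig source: the loadConfig helper, the exception type, and the use of a temporary directory in place of ~/.biogroovy are all assumptions made for illustration.

```groovy
import java.nio.file.Files

// Placeholder defaults, mirroring the eutils entry shown earlier (not real identity data)
final String DEFAULT_CONF = '''\
eutils = [
    tool : 'biogroovy',
    email : 'goofy@disney.com'
]
'''

// Hypothetical helper: load biogroovy.conf from confDir, seeding it with defaults on first use
ConfigObject loadConfig(File confDir, String defaults) {
    File confFile = new File(confDir, 'biogroovy.conf')
    if (!confFile.exists()) {
        // First run: write the default file, then fail so the user knows to edit it
        confDir.mkdirs()
        confFile.text = defaults
        throw new IllegalStateException('Created ' + confFile + '; replace the defaults with your own values.')
    }
    // The file is ordinary Groovy, so ConfigSlurper can evaluate it directly
    return new ConfigSlurper().parse(confFile.toURI().toURL())
}

// A temporary directory stands in for ~/.biogroovy in this sketch
File confDir = Files.createTempDirectory('biogroovy-demo').toFile()
boolean failedOnFirstRun = false
try {
    loadConfig(confDir, DEFAULT_CONF)
} catch (IllegalStateException e) {
    failedOnFirstRun = true
    println e.message
}

// The second call finds the seeded file and parses it
ConfigObject conf = loadConfig(confDir, DEFAULT_CONF)
println conf.eutils.tool
println conf.eutils.email
```

A real loader would presumably also check that the default values had actually been replaced before letting requests go out; that check is left out here.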

After you’ve configured the biogroovy.conf file, you can run the EntrezGeneSlurperTest, and see the results.


Tinkering With BioGroovy 1.1

Since the initial release of BioGroovy, a lot has changed, and the library has continued to grow substantially. With the recent BioGroovy 1.1 release, I thought I would review some of the changes, and update the information on how you can get started using BioGroovy. Here’s a brief list of some of the changes:

  • Support for new model objects, including Drug and ClinicalTrial.
  • A new search engine client API with support for EntrezGene, PubMed, and ClinicalTrials.gov.
  • Refactored IO framework that supports:
    • direct fetching of data, and mapping into model objects (“fetch KRAS from EntrezGene [id=3845] and return the result as a Gene object”).
    • caching of data in your local file system to make unit testing your code easier and to reduce the load on external web services.
    • New fetchers to support fetching data from EntrezGene, PubMed, UniProt, MyGene.info, OMIM, ClinicalTrials.gov, JournalTOCs, ChEMBL, and PubChem.
    • Support for either JSON or XML results.
    • Support for web service identity.
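
The local caching idea in the list above can be illustrated with a few lines of plain Groovy. This is only a sketch of the general technique, not BioGroovy’s own caching code: the cachedFetch helper, the one-file-per-ID layout, and the fake fetcher closure are all hypothetical.

```groovy
import java.nio.file.Files

// Hypothetical helper: return the cached text for `key`, calling `fetch` only on a cache miss
String cachedFetch(File cacheDir, String key, Closure<String> fetch) {
    File cached = new File(cacheDir, key + '.xml')
    if (!cached.exists()) {
        cacheDir.mkdirs()
        cached.text = fetch(key)
    }
    return cached.text
}

File cacheDir = Files.createTempDirectory('biogroovy-cache').toFile()

// Stand-in for a real web service call; counts how often it is invoked
int remoteCalls = 0
Closure<String> fakeFetch = { String id ->
    remoteCalls++
    return '<gene id="' + id + '"/>'
}

String first = cachedFetch(cacheDir, '3845', fakeFetch)   // cache miss: calls fakeFetch
String second = cachedFetch(cacheDir, '3845', fakeFetch)  // cache hit: read from disk
println "remote calls: ${remoteCalls}"
```

In a unit test, the same pattern lets you pre-populate the cache directory with saved responses so the test never touches the network.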

In addition to these changes, we’re also publishing the BioGroovy binaries, source and documentation through Bintray.  This means that you’ll want to update your .groovy/grapeConfig.xml using the instructions found here.

BioGroovy Models
BioGroovy uses POGOs (Plain Old Groovy Objects) to hold commonly accessible data. In a typical use case, you might want to fetch a list of Genes from EntrezGene, and write the results out to an Excel file, a CSV file, or a database. With the 1.1 release, we’ve added support for ClinicalTrial objects, Drugs, Journals, and RSS feeds.  We’ve also added support for clustering, to let users generate graphs of data that can be rendered using Cytoscape.  For example, you can cluster a set of genes by GO terms, and export the result as a SIF file using the Go2SIFClusterWriter. You can cluster articles by keywords, journal or MeSH terms.  You can also use the FrequencyMap object to create a simple map of the number of occurrences of a particular object.
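
To give a feel for what clustering by GO term and counting occurrences amount to, here is a rough sketch in plain Groovy. It does not use Go2SIFClusterWriter or FrequencyMap, whose APIs aren’t shown in this post; the gene symbols, GO terms, and the "annotates" relationship name are placeholders invented for the example. The output follows Cytoscape’s tab-delimited SIF layout of source node, relationship, then target nodes.

```groovy
// Illustrative data only: placeholder gene symbols and GO terms, not curated annotations
Map<String, List<String>> goTermsByGene = [
    GENE_A: ['apoptosis', 'DNA repair'],
    GENE_B: ['apoptosis'],
    GENE_C: ['DNA repair', 'cell cycle']
]

// Cluster: invert the map so each GO term points at the genes annotated with it
Map<String, List<String>> genesByTerm = [:].withDefault { [] }
goTermsByGene.each { gene, terms ->
    terms.each { term -> genesByTerm[term] << gene }
}

// One SIF line per cluster: <GO term> <relationship> <gene> <gene> ...
List<String> sifLines = genesByTerm.collect { term, genes ->
    ([term, 'annotates'] + genes).join('\t')
}
sifLines.each { println it }

// Frequency count: how many genes carry each GO term
Map<String, Integer> termCounts = goTermsByGene.values().flatten().countBy { it }
println termCounts
```

Writing sifLines to a file with a .sif extension should give something Cytoscape can import as a network.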


Network of BioThings Hackathon #2

This weekend’s Network of BioThings hackathon (#hackNoB) was a great success with some really innovative projects making great strides in a short amount of time.

The event was hosted by Jeff Grethe of the Neuroscience Information Framework group at UCSD, organized by Ben Good (Su Lab @ TSRI) and Dexter Pratt (NDEx project) with the help of many others, and sponsored by NDEx, San Diego Center for Systems Biology, and the International Society for Biocuration.

This year’s winner was the Citizen Science team, who hacked the BRAT web-based document annotation tool.  This application lets citizen scientists annotate abstracts with gene, drug, and disease information, along with the connections between these semantic types. A tool like this promises to make it easier to extract relevant facts from the avalanche of publications.

This may at some point be used to feed annotated journal articles into a project like CIViC (Clinical Interpretation of Variants in Cancer).

In second place was the SBiDer project (a tool for developing Synthetic Biocircuits).



Google Docs for Scientists

Science is inherently a collaborative effort, and at least once a month I encounter someone who mentions in passing some trial or tribulation they had when sharing documents.  The story usually goes like this…

We were working on a presentation/paper for a meeting.  Everyone had last-minute changes and new data to share, and somehow, someone accidentally picked up the wrong version of the document and started editing.  Everyone was frustrated because they had to get their updates in, and they were all waiting for Joe to finish his changes.  Joe went to lunch and left the file open and no one could get any work done, etc.


Google+ For Scientists

Recently, I’ve been thinking about the role that social media plays in science. And while friends are fond of pointing out that I’ve drunk the Kool-Aid, colleagues at various labs usually shoot me a quizzical look when I bring up the subject.

Perhaps the biggest benefit of social media is that it acts like a peer-reviewed lens that brings the latest developments in your field into view: a conference that runs 24-7 and acts as a form of social democratization for scientific thought. Someone you wouldn’t dream of approaching in the real world is instantly more accessible on Twitter, Facebook or Google Plus. They are also more frank than they might be if you approached them in person.


What Grails 3 Brings To The Research Informatics Lab

In my previous post, I talked about the current state of the Grails platform and what it brings to the Research Informatics environment.  In this post, I’ll discuss some of the features in the upcoming Grails 3 release, and the potential impact of those features on Research Informatics organizations.
