BioGroovy and the Semantic Web

Recently, I found myself putting together a presentation on BioGroovy. One of the key features that I’ve been taking advantage of is the ability to use BioGroovy models as Grails domain objects. This makes it possible to easily download and persist Genes, Proteins, and PubMed articles into a relational database. However, I always ask myself whether putting things into a database makes sense. Occam’s Razor suggests that it isn’t always the right choice, especially with hierarchical data.

It occurred to me that we could use the JenaBean API to serialize these models as RDF. In preparation for the presentation, I hacked together a unit test that generates RDF for a Gene model, and it worked amazingly well. Here’s an example of how to use the annotations to generate RDF:

package org.biogroovy.models

import thewebsemantic.Id
import thewebsemantic.Namespace
import thewebsemantic.RdfProperty

/**
 * This class represents a Gene.
 * @author mfortner
 */
@Namespace("http://www.mygrid.org.uk/ontology#")
class Gene implements Sequence {

    /**
     * The EntrezGene ID for the gene.
     */
    @RdfProperty("Entrez_Gene_ID")
    @Id
    int entrezGeneId;

    /**
     * The name of the gene.
     */
    String geneName;

    /**
     * The symbol for the gene.
     */
    @RdfProperty("HGNC_symbol")
    String geneSymbol;

    // ... remaining fields omitted ...
}

There are three annotations that I used. @Namespace at the beginning of the class allows me to declare the ontology that I’m using. In this case, it’s the MyGrid bioinformatics ontology developed by the University of Manchester. This is a great starter ontology that I used as a proof of concept, but in the long term we’re going to need a more comprehensive ontology that describes more than just IDs, accessions, and sequences. Unfortunately, NCBI does not currently maintain an ontology of the fields in its databases. I did find an ontology at Linked Life Data. My only reservation about using it is that it’s new. More generally, any ontology not maintained by the originators of the data may be problematic. My preference would be for NCBI to maintain the ontology for its data (and its own triple stores, for that matter), as that would ensure that the data and metadata are kept up to date.

The @Id annotation allows you to indicate which field in the model class should be used as an ID when generating the RDF. Currently, we’re using the EntrezGene ID. However, this may result in collisions with identifiers from other databases, so we may need to prefix it with “entrezgene:” or a similar identifier.
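To make the prefixing idea concrete, here is a minimal sketch in plain Java. The PrefixedId class and its of method are hypothetical names that I made up for illustration; they are not part of BioGroovy or JenaBean, and "otherdb" is just a placeholder for some second database.

```java
// Hypothetical helper for illustration; not part of BioGroovy or JenaBean.
final class PrefixedId {
    private PrefixedId() { }

    /** Builds an identifier such as "entrezgene:638" from a source prefix and a numeric ID. */
    static String of(String prefix, int id) {
        if (prefix == null || prefix.isEmpty()) {
            throw new IllegalArgumentException("prefix must not be empty");
        }
        return prefix + ":" + id;
    }

    public static void main(String[] args) {
        // The same number from two sources no longer collides once it is prefixed.
        System.out.println(PrefixedId.of("entrezgene", 638));
        System.out.println(PrefixedId.of("otherdb", 638)); // "otherdb" is a placeholder
    }
}
```

The model would then need to expose a String identifier rather than the bare int. How best to wire that into the @Id annotation is something I haven’t worked out yet.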

Lastly, the @RdfProperty annotation allows you to map individual fields in the model to properties defined in the ontology. Again, since the ontology we were using defines only a limited number of properties, we were only able to tag a few of the fields.
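To give a feel for what the three annotations express, here is a rough, self-contained sketch in plain Java that reads annotations by reflection and prints one triple per tagged field. It is not how JenaBean is implemented. The annotations below are stand-ins that I declare myself, the subject URI scheme (namespace + class name + "/" + ID) is an assumption, the field values are placeholders, and untagged fields are simply skipped, which JenaBean itself may handle differently.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Stand-in annotations, declared here only for illustration. The real ones live
// in the thewebsemantic package and may behave differently.
@Retention(RetentionPolicy.RUNTIME) @interface Namespace { String value(); }
@Retention(RetentionPolicy.RUNTIME) @interface RdfProperty { String value(); }
@Retention(RetentionPolicy.RUNTIME) @interface Id { }

// Hypothetical model with placeholder values.
@Namespace("http://www.mygrid.org.uk/ontology#")
class GeneSketch {
    @RdfProperty("Entrez_Gene_ID") @Id int entrezGeneId = 638;
    String geneName = "example gene";   // untagged, so skipped by this sketch
    @RdfProperty("HGNC_symbol") String geneSymbol = "EX1";
}

final class TripleSketch {
    private TripleSketch() { }

    /** Returns one N-Triples-style line per @RdfProperty field of the bean. */
    static List<String> triples(Object bean) {
        Class<?> type = bean.getClass();
        Namespace ns = type.getAnnotation(Namespace.class);
        if (ns == null) {
            throw new IllegalArgumentException("missing @Namespace on " + type.getName());
        }
        // Assumed subject scheme: namespace + class name + "/" + value of the @Id field.
        String subject = null;
        for (Field f : type.getDeclaredFields()) {
            if (f.isAnnotationPresent(Id.class)) {
                subject = ns.value() + type.getSimpleName() + "/" + read(f, bean);
            }
        }
        if (subject == null) {
            throw new IllegalArgumentException("missing @Id field on " + type.getName());
        }
        List<String> lines = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            RdfProperty p = f.getAnnotation(RdfProperty.class);
            if (p == null) {
                continue;
            }
            // Values are written as plain literals; no escaping or datatypes.
            lines.add("<" + subject + "> <" + ns.value() + p.value() + "> \"" + read(f, bean) + "\" .");
        }
        return lines;
    }

    private static Object read(Field f, Object bean) {
        try {
            f.setAccessible(true);
            return f.get(bean);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String line : triples(new GeneSketch())) {
            System.out.println(line);
        }
    }
}
```

The point of the sketch is only the mapping: the namespace supplies the base URI, the ID field identifies the subject, and each tagged field becomes a predicate from the ontology.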

To generate the RDF we use the following snippet:

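// Package names assume Jena 2.x, which JenaBean was built against.
import com.hp.hpl.jena.ontology.OntModel
import com.hp.hpl.jena.rdf.model.ModelFactory
import thewebsemantic.Bean2RDF

OntModel ontModel = ModelFactory.createOntologyModel();
Bean2RDF writer = new Bean2RDF(ontModel);
writer.save(gene);  // gene is a populated Gene instance
ontModel.write(new FileOutputStream(new File("./638.rdf")));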

Pretty neat.  With a little extra work this might make a handy Grails plugin.

About aspenbio

I write software for scientists. I'm interested in Java/Groovy/Grails, the Semantic Web and Cancer Biology.
This entry was posted in Bioinformatics, Informatics, Semantic Web.