The Secret Names of Things: Naming The Database

In my previous post I described how an ontology like EDAM could be used to provide applications with consistent names for objects and their attributes.  Let’s take a look at how we can add and retrieve ontology information to/from a database.

Adding Ontology References
All relational databases provide some level of table and column-level metadata. Although the SQL syntax for adding metadata may differ from one database to another, the method for retrieving that metadata is the same using Java’s JDBC API.

In this example, I’ll show you how to add ontology references to a database, and retrieve those references using Groovy.  I’ll assume that you have both the Java JDK and Groovy installed on your machine.

If you use the MySQL database, then you can add an ontology reference using this syntax:

ALTER TABLE target MODIFY entrez_gene_id varchar(255) NOT NULL 
COMMENT 'http://edamontology.org/data_1035';

Here we have a column called entrez_gene_id that we want to explicitly describe.  There isn’t a way to simply add the comment to the column, so we have to modify the column definition.  The red text shows the original column definition.

Retrieving Ontology References
To retrieve the database metadata simply use the snippet of code below:

import groovy.sql.Sql;
import java.sql.ResultSet;
@GrabConfig(systemClassLoader=true)
@Grapes(
    @Grab(group='mysql', module='mysql-connector-java', version='5.1.18')
)

// get the metadata for the pipeline_dev database
def sql = Sql.newInstance('jdbc:mysql://localhost:3306/pipeline_dev', "", "", 'com.mysql.jdbc.Driver')
def metadata = sql.connection.metaData
ResultSet rs = metadata.getTables(null, null, null, null);

// parse a local copy of the EDAM ontology
def ontology = new XmlSlurper().parse(new File("/Users/markfortner/Downloads/EDAM.owl")).declareNamespace(
    owl:"http://www.w3.org/2002/07/owl#",
    rdf:"http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    oboInOwl:"http://www.geneontology.org/formats/oboInOwl#")

// iterate through the metadata, if a column has a REMARK
// assume the remark is the ontology reference URI, and 
// lookup the definition for the field in the ontology
String tblName = null;
while(rs.next()){
   tblName = rs.getString('TABLE_NAME');
   println ">>>>>> "+tblName 
   ResultSet tblMetadata = metadata.getColumns(null, null, tblName, null);
   while(tblMetadata.next()){
      String url = tblMetadata.getString("REMARKS");
      def definition = "";
      if (url != null && url != ''){
         definition  = ontology.'owl:Class'.grep{ it.'@rdf:about' == url }.'oboInOwl:hasDefinition'.get(0)         
      }
      println tblMetadata.getString("COLUMN_NAME")  + "\t"+ definition;
   }
}

This will give you the following results:


>>>>>> target
id
version
description
entrez_gene_id Identifier of a gene from the GeneDB database.
name
pdb_id

In this example I’ve used a local copy of the ontology and used it to look up the description of entries.  This gives us a nice small self-contained example to use. Another approach would be to use the NCBO’s RESTful interface to fetch the ontology definition found at the URL above.  You’ll find more information on using the NCBO’s services here.

Advertisements

About aspenbio

I write software for scientists. I'm interested in Java/Groovy/Grails, the Semantic Web and Cancer Biology.
This entry was posted in Informatics, Semantic Web and tagged , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s