Adding Semantics to XML Schemas

Recently, I was working on a UniProt parser for the next BioGroovy release, and while looking through the UniProt schema I started to wonder why the elements in schemas weren’t annotated with references to ontologies. Let’s take a look at an example.

A typical UniProt record contains many “reference” elements like this:

<reference key="2">
 <citation type="journal article" date="1998" name="Biochem. Biophys. Res. Commun." volume="244" first="285" last="292">
 <title>cDNA cloning, expression, subcellular localization, and chromosomal assignment of mammalian aurora homologues, aurora-related kinase (ARK) 1 and 2.</title>
 <person name="Shindo M."/>
 <person name="Nakano H."/>
 <person name="Kuroyanagi H."/>
 <person name="Shirasawa T."/>
 <person name="Mihara M."/>
 <person name="Gilbert D.J."/>
 <person name="Jenkins N.A."/>
 <person name="Copeland N.G."/>
 <person name="Yagita H."/>
 <person name="Okumura K."/>
 <dbReference type="PubMed" id="9514916"/>
 <dbReference type="DOI" id="10.1006/bbrc.1998.8250"/>
 <scope>VARIANT ILE-57</scope>

My question was: what do the scope elements signify? Are they evidence of an assertion about the function of a protein, similar to the GeneRIFs in EntrezGene? In the example above, does VARIANT ILE-57 mean that “there exists a variant of this protein called ‘ILE-57’, and this document is evidence of that assertion”?

In order to answer that question, I started digging around in the XML Schema file for the UniProt file format.  The schema’s rather cryptic answer was this:

<xs:element name="scope" type="xs:string" maxOccurs="unbounded">
 <xs:annotation>
  <xs:documentation>Describes the scope of a citation. Equivalent to the flat file RP-line.</xs:documentation>
 </xs:annotation>
</xs:element>

While musing over this answer (“What’s an RP-line?”), it occurred to me that some light might be cast on the situation if the elements in the schema pointed to a well-documented ontology. Is such a thing even possible? One quick Google search later, I arrived at Semantic Annotations for WSDL and XML Schema (SAWSDL), a W3C Recommendation. Here’s a snippet of XML that shows what embedding an ontology model reference in an XML schema looks like:

<xs:simpleType name="Confirmation"

The one drawback of this approach is that if you’re trying to read the documentation found in the ontology, you’re going to be doing a lot of clicking. Otherwise, you’ll need to transform the schema into a more comprehensive, human-readable document that consolidates the information in the schema and the ontology.
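
As a rough illustration of that consolidation step, here is a minimal Python sketch that walks a schema and lists each element's name, its schema documentation, and any SAWSDL model references side by side. The annotated schema in it is my own assumption: UniProt's schema does not carry these annotations, the wrapping type name `referenceType` is used only for illustration, and the ontology URI `http://example.org/ontology#CitationScope` is a made-up placeholder. Only the `sawsdl:modelReference` attribute and its namespace come from the SAWSDL specification.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
SAWSDL = "http://www.w3.org/ns/sawsdl"

# Hypothetical annotated schema fragment. The modelReference URI is a
# placeholder, not a real UniProt or ontology identifier.
ANNOTATED_SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:sawsdl="http://www.w3.org/ns/sawsdl">
  <xs:complexType name="referenceType">
    <xs:sequence>
      <xs:element name="scope" type="xs:string" maxOccurs="unbounded"
                  sawsdl:modelReference="http://example.org/ontology#CitationScope">
        <xs:annotation>
          <xs:documentation>Describes the scope of a citation. Equivalent to the flat file RP-line.</xs:documentation>
        </xs:annotation>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
"""


def describe_elements(schema_text):
    """Return one dict per xs:element: name, documentation, model references."""
    root = ET.fromstring(schema_text)
    rows = []
    for element in root.iter(f"{{{XS}}}element"):
        doc = element.find(f"{{{XS}}}annotation/{{{XS}}}documentation")
        # SAWSDL allows several URIs in one attribute, separated by whitespace.
        refs = element.get(f"{{{SAWSDL}}}modelReference", "").split()
        rows.append({
            "name": element.get("name"),
            "documentation": (doc.text or "").strip() if doc is not None else "",
            "model_references": refs,
        })
    return rows


for row in describe_elements(ANNOTATED_SCHEMA):
    print(row["name"])
    print("  schema says:", row["documentation"])
    for uri in row["model_references"]:
        print("  ontology term:", uri)
```

This only gathers the references in one place. Fetching each ontology term's own documentation and merging it into the output would be the next step, and how to do that depends on how the ontology is published.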


About aspenbio

I write software for scientists. I'm interested in Java/Groovy/Grails, the Semantic Web and Cancer Biology.
This entry was posted in Bioinformatics, Informatics, Science Blogging, Semantic Web.

2 Responses to Adding Semantics to XML Schemas

  1. voldrani says:

    It looks like ‘scope’ is … UniProt publishes a lot (all?) of its data in RDF, so the way I got to that was from an example record, and then comparing the XML and the RDF/XML outputs. In the RDF/XML, there is:


    In the XML namespace

    • aspenbio says:

      It sounds like you’re proving my point: if the schema were annotated to begin with, the detective work needed to understand the XML would be much easier, or perhaps unnecessary.
