Adding Semantics to XML Schemas

Recently, I was working on a UniProt parser for the next BioGroovy release, and while looking through the UniProt schema I started to wonder why the elements in schemas weren’t annotated with references to ontologies. Let’s take a look at an example.

A typical UniProt record contains many “reference” elements like this:

<reference key="2">
 <citation type="journal article" date="1998" name="Biochem. Biophys. Res. Commun." volume="244" first="285" last="292">
 <title>cDNA cloning, expression, subcellular localization, and chromosomal assignment of mammalian aurora homologues, aurora-related kinase (ARK) 1 and 2.</title>
 <person name="Shindo M."/>
 <person name="Nakano H."/>
 <person name="Kuroyanagi H."/>
 <person name="Shirasawa T."/>
 <person name="Mihara M."/>
 <person name="Gilbert D.J."/>
 <person name="Jenkins N.A."/>
 <person name="Copeland N.G."/>
 <person name="Yagita H."/>
 <person name="Okumura K."/>
 <dbReference type="PubMed" id="9514916"/>
 <dbReference type="DOI" id="10.1006/bbrc.1998.8250"/>
 </citation>
 <scope>VARIANT ILE-57</scope>
</reference>

My question was: what do the scope elements signify? Are they evidence of an assertion about the function of a protein, similar to the GeneRIFs in Entrez Gene? In the example above, does VARIANT ILE-57 mean “there exists a variant of this protein called ILE-57, and this document is evidence of that assertion”?

In order to answer that question, I started digging around in the XML Schema file for the UniProt file format.  The schema’s rather cryptic answer was this:

<xs:element name="scope" type="xs:string" maxOccurs="unbounded">
 <xs:annotation>
  <xs:documentation>Describes the scope of a citation. Equivalent to the flat file RP-line.</xs:documentation>
 </xs:annotation>
</xs:element>

While musing over this answer (“What’s an RP-line?”), it occurred to me that some light might be cast on the situation if the elements in the schema pointed to a well-documented ontology. Is such a thing even possible? One quick Google search later, I arrived at Semantic Annotations for WSDL and XML Schema (SAWSDL). Here’s a snippet of XML that shows what embedding an ontology model reference in an XML schema looks like:

<xs:simpleType name="Confirmation"

The one drawback of this approach is that reading the documentation in the ontology means a lot of clicking back and forth between the schema and the ontology. The alternative is to transform the schema into a single human-readable document that consolidates the information from both.
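To make the consolidation idea a little more concrete, here is a minimal Python sketch of the first step: walking a schema and pulling out each element's name, its model references, and its inline documentation. Everything specific in it is an assumption for illustration. The schema fragment is a hypothetical annotated version of the scope element, the type name exampleReferenceType is made up, and the ontology URI under example.org is a placeholder, not a real UniProt ontology term. The only parts taken from the SAWSDL specification are the sawsdl namespace and the modelReference attribute, which holds a whitespace-separated list of URIs.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
SAWSDL = "http://www.w3.org/ns/sawsdl"

# Hypothetical schema fragment: the scope element from the UniProt schema,
# with a made-up sawsdl:modelReference added. The example.org URI and the
# type name are placeholders, not real UniProt identifiers.
SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:sawsdl="http://www.w3.org/ns/sawsdl">
  <xs:complexType name="exampleReferenceType">
    <xs:sequence>
      <xs:element name="scope" type="xs:string" maxOccurs="unbounded"
                  sawsdl:modelReference="http://example.org/ontology#CitationScope">
        <xs:annotation>
          <xs:documentation>Describes the scope of a citation. Equivalent to the flat file RP-line.</xs:documentation>
        </xs:annotation>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
"""


def model_references(schema_text):
    """Return (element name, list of model reference URIs, documentation text)
    for every xs:element found in the schema."""
    root = ET.fromstring(schema_text)
    rows = []
    for element in root.iter(f"{{{XS}}}element"):
        # modelReference is a whitespace-separated list of URIs.
        refs = element.get(f"{{{SAWSDL}}}modelReference", "").split()
        doc = element.find(f"{{{XS}}}annotation/{{{XS}}}documentation")
        text = " ".join(doc.text.split()) if doc is not None and doc.text else ""
        rows.append((element.get("name"), refs, text))
    return rows


if __name__ == "__main__":
    for name, refs, text in model_references(SCHEMA):
        print(name)
        print("  documentation:", text)
        for ref in refs:
            print("  model reference:", ref)
```

A real consolidation tool would go one step further and fetch the label and definition behind each URI from the ontology, so the reader never has to click through. That part depends on how the ontology is published, so it is left out here.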


About Mark Fortner

I write software for scientists doing drug discovery and cancer research. I'm interested in Design Thinking, Agile Software Development, Web Components, Java, Javascript, Groovy, Grails, MongoDB, Firebase, microservices, the Semantic Web, Drug Discovery, and Cancer Biology.
This entry was posted in Bioinformatics, Informatics, Science Blogging, Semantic Web.

2 Responses to Adding Semantics to XML Schemas

  1. voldrani says:

    It looks like ‘scope’ is […]. UniProt publishes a lot (all?) of its data in RDF, so the way I got to that was from an example record, and then comparing the XML and the RDF/XML outputs. In the RDF/XML, there is:


    In the XML namespace

    • aspenbio says:

      It sounds like you’re proving my point: if the schema had been annotated in the first place, the detective work needed to understand the XML would be much easier, or perhaps unnecessary.
