Google Glass in the Lab

During our recent San Diego Informatics Lunch program, we had a lively discussion about the possibilities that Google Glass and other wearables bring to research labs.  To get the conversation started, I posted a video clip from Digital Sciences and Imperial College London that envisions what augmented reality might look like in a research lab.

Is This a Tube I See Before Me?
To borrow a line from Macbeth: one of the most commonly cited use cases was the need to see the contents of a tube.  Imagine being able to scan the barcode of a tube and instantly see information about the compound or reagent in the tube. This might be the compound ID and structure.  Or for a biological sample, you might have the protein symbol and 3D structure or sequence.

Admittedly you could probably do the same thing with the Google Goggles mobile app and QR codes instead of barcodes.  But this would be more cumbersome to do and you run the risk of contaminating the surface of your phone with the residue of whatever experiment was last on your gloves.

The advantage that Google Glass brings to this scenario is that the experience would be entirely hands-free.  You simply look at the tube, and Glass tells you what’s in it.

How Could Glass Fit Into the Workflow?
In some commercial labs, the process of performing an experiment is tightly controlled. A lab tech may be working from a barcoded work order that contains a printed protocol, and barcoded lists of materials to be used.  The tech usually scans their badge, the work order, and the barcodes of the materials before performing the experiment.

Imagine being able to simply glance down at your bench, automatically register the materials and the work order, and verify that all of the materials actually belong to the order that you’re currently working on.  Any materials that are out of date might appear highlighted in red.
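Here is a minimal sketch of the check that would sit behind that glance, written in Python. The Material record, its field names, and the check_bench function are all hypothetical; a real lab system would pull this data from its own inventory database.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Material:
    barcode: str
    work_order: str  # work order this material was issued for (assumed field)
    expires: date    # expiry date (assumed field)


def check_bench(scanned, work_order, today):
    """Sort scanned materials into ok / wrong-order / expired buckets."""
    report = {"ok": [], "wrong_order": [], "expired": []}
    for material in scanned:
        if material.work_order != work_order:
            report["wrong_order"].append(material.barcode)
        elif material.expires < today:
            # These are the items that would be highlighted in red.
            report["expired"].append(material.barcode)
        else:
            report["ok"].append(material.barcode)
    return report
```

Calling check_bench with everything currently on the bench, the active work order, and today's date gives the display layer three lists to color differently.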

Imagine that you’ve just run an experiment with a plate of samples.  You glance down at the plate, and it shows you the results for each well of the plate.  If one of those samples has a problem, the well appears in red.  Suppose it’s not the pipetter.  Could it be the compound? Suppose you could instantaneously scroll through all of the plates where a given compound was used to find out.  Or compare the results of the first batch of a compound against the current batch.  Did both batches of compound come from the same source? Show me the vendor card for this compound.  What was the age of each batch before the compounds were used?  Show me the batch information.

Are you free?
In many labs, it’s a constant hunt for a free machine (a gene chip machine that’s not being used, or a thermocycler that’s free).  Suppose you could look at a machine and know that it would be free in half-an-hour.  Time for a quick run around the park before returning to the lab.

Inventory with Glass
Finding materials within a lab can also be challenging. Racks and plates can become overiced, making it difficult to locate them without spending lots of time rummaging around.  With many freezers you want to limit the number of times that you put a rack of tubes through a freeze-thaw cycle, in addition to limiting the number of times the freezer itself is opened.

Imagine a shopping cart scenario where the lab manager knows that on a given day, these 5 work orders will be processed.  Rather than opening up the freezers each time you start a workflow, you could create a consolidated materials list in a shopping cart.  Simply staring at a freezer could show you an “X-Ray” view of the freezer with the locations of the racks you needed. You open the door quickly, pull the racks you need, and shut the door within a few seconds, without the need to dawdle in front of the freezer like a teenager with the munchies.
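The shopping cart idea boils down to merging the material lists of the day's work orders and grouping them by freezer. A rough Python sketch follows; the data shapes (a dict of work orders to rack IDs, and a dict of rack IDs to freezer and shelf) are assumptions made purely for illustration.

```python
from collections import defaultdict


def consolidate(work_orders, locations):
    """Group every rack needed today by freezer, listing each rack once.

    work_orders: {work_order_id: [rack_id, ...]}   (assumed shape)
    locations:   {rack_id: (freezer_id, shelf)}    (assumed shape)
    """
    by_freezer = defaultdict(set)
    for racks in work_orders.values():
        for rack in racks:
            freezer, shelf = locations[rack]
            by_freezer[freezer].add((shelf, rack))
    # Sort by shelf so one pass through the open door collects everything.
    return {freezer: sorted(items) for freezer, items in by_freezer.items()}
```

A rack that appears in several work orders shows up only once in the result, which is the point: each freezer door gets opened one time.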

To transfer a tube from one rack to another, you simply move the tube to an open slot, and re-scan the rack.

The same scenario would also be useful for identifying the locations of expired materials within the freezer.

Safety Glass
Safety is another scenario where Google Glass could be very handy.  Suppose that a colleague has accidentally spilled the contents of a tube on themselves.  Do you simply wash it off?  Imagine looking at the tube and getting the Material Safety Data Sheet (MSDS) for the contents of the tube.

And speaking of safety, the current version of Google Glass can be used with prescription lenses, and potentially attached to safety glasses, depending on the type of frames you have.

Gesture Support
Currently, the primary way that users interact with Glass is through voice command, and by swiping the right temple of the frames.  But that can be annoying to your colleagues when every other word out of your mouth is some command to an unseen entity.

Currently Glass supports the ability to swipe the touch pad at your temple to swipe a card out of your field of view.  You can also navigate by voice using the keyword “Next”, by moving your head, or by blinking. But full-on gesture support is not there yet.  So no waving your hands around in front of your face.  It should be noted, though, that constantly swiping at the touch pad could also pose contamination and safety problems, especially near your eyes.

Getting Google Glass
If you haven’t heard by now, Google is opening up its Explorer program for the day to let anyone (with $1500 to spare) purchase Glass.  You can find out more about it here.

How are you using (or planning to use) Google Glass in your lab?  Is Glass still in search of that killer app? Drop me a line.

Posted in Informatics

Notes from San Diego KNIME User Group Meeting

This week Brock Luty of Dart Neuroscience hosted the first San Diego KNIME User Group meeting.  Michael Berthold, one of the founders of KNIME, came down from San Francisco to deliver a briefing.

Mark Donnelly of XIFIN demonstrated how a business that had formerly been run on a series of linked spreadsheets was able to use KNIME to collect financial data, analyze past performance, and predict future performance using a predictive analytics model that he created in KNIME.

George Nicola of UCSD’s Mike Gilson Lab demonstrated how a workflow could be used to identify potential targets for a compound using a model based on the chemical fingerprints of compounds.  The workflow was trained with sets of compounds that were grouped into “active” and “inactive” compounds, where active compounds showed activity against dopamine receptors.  The workflow examined the fingerprints of these compounds and compared them with libraries of compounds; when combined with DrugBank data, it was able to identify previously unknown targets for compounds.
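To give a feel for what comparing fingerprints means, here is a small Python sketch using the Tanimoto coefficient, a common similarity measure for chemical fingerprints. This is my own illustration and not the model from the talk; I don't know which measure that workflow actually used, and the function names are made up.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def nearest_active(query_fp, actives):
    """Return (name, similarity) for the known active most similar to the query.

    actives: {compound_name: set_of_on_bits}   (assumed shape)
    """
    scored = ((name, tanimoto(query_fp, fp)) for name, fp in actives.items())
    return max(scored, key=lambda pair: pair[1])
```

A library compound whose fingerprint scores close to 1.0 against a known active is a candidate for sharing that active's target.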

Lastly, Alex Guazzelli of Zementis presented on ADAPA (a platform for real-time scoring) and on UPPI (a means for scoring either within your database or within Hadoop).

Posted in Drug Development, Informatics

Ideas from the Cytoscape Workshop

During the recent Cytoscape workshop, I started out re-acquainting myself with Cytoscape and learning about some of the new features.  As I did so, I found myself accumulating a wishlist of features that I thought I would share:

Using Templates To Export Cytoscape Sessions To HTML
The new JSON export feature looks very useful.  With it you can export your data and style information in JSON format, which is handy when you want to share your data with colleagues who just want to browse through your network.

The goal with previous versions of Cytoscape has always been to end up with publication-quality graphics, but in industry this isn’t always the goal.  For the most part you simply want to share the information with colleagues, and you want the results to be a little more dynamic than a static image.  You want genes to be linked to EntrezGene (or some internal target database).  You want to see sequence and structural information about the target, in addition to information on existing drugs, or existing internal compounds that may address those targets.

The only problem with this approach is that you still have to do some work with HTML and JavaScript in order to really make your work browseable.

It would also be nice if there were a way to export my network via a simple user-definable template. There are templating systems like Velocity and FreeMarker that you can use to make this easier. This would let anyone with minimal (or no) HTML skills dump out the data, style information, and a skeletal web page into a directory.  If you want to link gene nodes to EntrezGene, you simply add an IFRAME and a bit of JavaScript to your template and you have a simple way of browsing the nodes in your network.  The templates could be shared in the Cytoscape community so that members can learn from each other.  And publishing your results is as simple as sticking your web page in Google Drive or GitHub.
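As a rough sketch of the template idea, here is what it might look like using Python's built-in string.Template rather than Velocity or FreeMarker. The network dictionary loosely follows the nodes-with-data shape used by Cytoscape.js, the geneId and name attributes are my own, and the link URL is a placeholder rather than a real EntrezGene address. It emits plain links instead of an IFRAME to keep things short.

```python
from html import escape
from string import Template

# A user-editable page skeleton; $title and $items are filled in at export time.
PAGE = Template("""<html>
<body>
<h1>$title</h1>
<ul>
$items
</ul>
</body>
</html>
""")

ITEM = Template('<li><a href="$url">$label</a></li>')


def render_page(network, title, url_pattern):
    """Render one list item per node, linking each gene out via url_pattern."""
    items = []
    for node in network["elements"]["nodes"]:
        data = node["data"]
        url = url_pattern.replace("{geneId}", str(data["geneId"]))
        items.append(ITEM.substitute(url=escape(url), label=escape(data["name"])))
    return PAGE.substitute(title=escape(title), items="\n".join(items))
```

Someone with no HTML skills would only ever touch the PAGE and ITEM text; swapping in a different skeleton changes the whole export without touching code.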

Manually Creating Networks
Most of the workflows demonstrated in the workshop assumed that you’re starting out by importing networks from external sources.  In my case however, I have a lot of individual nodes that represent genes, clinical stages, disease processes, and pathways, and I want to manually create this network.

Manually creating nodes and edges can be a little “clicky”.  You have to right-click on the canvas, select Add, select Node from a submenu, repeat the process for the second node, and then perform a similar operation to create an edge.  One way to simplify the process would be to add “mode” buttons: a browse mode for browsing through the diagram, and a design mode for editing the diagram.  Whenever you’re in design mode, mouse clicks would be interpreted as node creation events, and drag events would be interpreted as edge creation events.  If you drag from a node into an empty spot, it creates both the edge and the node for you. This would make it easier and faster to manually create networks.

Entering information in the Node/Edge/Network tables could also be a little easier. The new document icon currently means “create column” which can be confusing to new users who expect it to mean “create a document”. The usual icon in Excel or Open Office for adding a column shows a table with a selected column and a plus sign overlaid on top of the icon.  A similar icon with a minus sign is usually used to represent the delete column operation.

Also, sometimes you just want to create a series of nodes and edges and worry about how they are laid out later.  To support that you’d need to be able to add and delete rows in the node/edge tables, and to tab between cells in the tables.

Semantic Columns
As I was tinkering around with my data, I thought it would be useful to be able to create columns that had semantic meaning.  For example, if I add a geneId column, I’d like to be able to indicate that this is an EntrezGene ID, perhaps using an EDAM ontology URI.  It should then be fairly simple to map the URI to a URL pattern containing a {cell-value} placeholder.  This would be a fairly generic approach that would allow people to configure support for linking out to different resources without the need to add code to support it.  You would simply tag a column to indicate that “this is an Entrez Gene ID”. Whenever you clicked on a node, it would construct a link-out menu item for each of the mappings you had defined. These kinds of mappings could also be shared with the community.
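A minimal Python sketch of how such a mapping might work is shown here. The tag names and the example.org URLs are placeholders I made up; in practice the keys would be ontology URIs and the patterns would point at the real resources.

```python
# Hypothetical mapping from a semantic tag to a link-out URL pattern.
LINK_PATTERNS = {
    "entrez-gene-id": "https://genes.example.org/gene/{cell-value}",
    "pubmed-id": "https://papers.example.org/article/{cell-value}",
}


def link_out_items(row, column_tags):
    """Build (column, url) menu items for every tagged column that has a value.

    row:         {column_name: cell_value}
    column_tags: {column_name: semantic_tag}
    """
    items = []
    for column, tag in column_tags.items():
        value = row.get(column)
        pattern = LINK_PATTERNS.get(tag)
        if value is not None and pattern is not None:
            items.append((column, pattern.replace("{cell-value}", str(value))))
    return items
```

Supporting a new resource is then a one-line addition to the mapping, with no new code.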

Representing Evidence Using Edge Weight
I like the idea of representing evidence as an edge attribute; however, what I’d really like to be able to do is tie a list of PubMed IDs to a single edge, rather than creating an edge per evidence link and then trying to collapse those edges.  Assertions that lack evidence might appear with a dashed line, or a faded line.  Assertions that had a lot of evidence might appear darker and bolder.

It would also be useful to be able to associate different types of search patterns with different “classes” of edges. In my network, I have some edges that represent mutation events, others represent differential expression, or methylation events — each type of event has a different style associated with it, and I want to have a different type of literature search associated with each edge “class”. For mutation events, this might look something like “pubmed: {source-node} is mutated in {destination-node} AND pancreatic adenocarcinoma” or “google-scholar: {source-node} is differentially expressed in {destination-node} AND pancreatic adenocarcinoma”.  When you click on an edge, the popup menu would show you menu items for each of the searches you had previously configured, and let you look for new evidence that backs up your assertion.  You would then be able to bookmark any search result, and add it to the list of evidence items for your edge.  You could also have a “search edge evidence” menu item that would perform these kinds of searches across your entire network and highlight edges that had new evidence.
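Here is a small Python sketch of per-class search templates, using the two example patterns above. The edge dictionary keys and the build_search function are hypothetical, and nothing here actually runs a search; it only fills in the template and splits off the search engine prefix.

```python
SEARCH_TEMPLATES = {
    "mutation": "pubmed: {source-node} is mutated in {destination-node} "
                "AND pancreatic adenocarcinoma",
    "expression": "google-scholar: {source-node} is differentially expressed in "
                  "{destination-node} AND pancreatic adenocarcinoma",
}


def build_search(edge):
    """Return (engine, query) for an edge, based on the template for its class."""
    template = SEARCH_TEMPLATES[edge["class"]]
    filled = (template
              .replace("{source-node}", edge["source"])
              .replace("{destination-node}", edge["target"]))
    engine, _, query = filled.partition(": ")
    return engine, query
```

A “search edge evidence” menu item would simply loop this over every edge in the network.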

Representing Temporal Networks
One of the topics mentioned in the meeting concerned how to represent temporal information in Cytoscape.  I’d like to be able to show different events at different stages in the progression of pancreatic cancer.  For example, I want to show that in PanIN-1A lesions, you have KRAS mutated in perhaps 50% of lesions, and that percentage grows as you reach PanIN-3. It would be nice if there were a time-based slider where you could tie an attribute in each node to a discrete time point. As you moved the slider between these time points, nodes in the network might fade in and fade out. In my case, the KRAS node would get darker as I dragged the slider through the different PanIN stages.
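A rough Python sketch of the slider logic might look like this. The stage list and the per-stage numbers are illustrative placeholders only (apart from the "perhaps 50%" starting point mentioned above, they are not measured frequencies), and node_opacity is a made-up helper.

```python
# Discrete time points for the slider (assumed ordering of stages).
STAGES = ["PanIN-1A", "PanIN-1B", "PanIN-2", "PanIN-3"]

# Placeholder values for illustration only, NOT real mutation frequencies.
KRAS_FRACTION = {"PanIN-1A": 0.5, "PanIN-1B": 0.6, "PanIN-2": 0.8, "PanIN-3": 0.9}


def node_opacity(fraction_by_stage, slider_index):
    """Map the slider position to (stage, opacity) for one node.

    A node with no value at the selected stage gets opacity 0.0, so it fades out.
    """
    stage = STAGES[slider_index]
    return stage, fraction_by_stage.get(stage, 0.0)
```

Dragging the slider just changes slider_index, and every node re-reads its own per-stage attribute.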

You could also use this to show different cellular processes at work at each stage of a disease.  Of course the same type of approach could be used to view gene expression data at various time points after a treatment, or any other kind of data that had discrete time points.

Node and Edge Classes

By default, Cytoscape provides you with a table that you can use to enter node and edge data.  The table is flexible in that you can add columns for new attributes; however, all of the attributes are the same for all of the nodes and edges.  This means that if your diagram has both genes and pathways on it, and you create a geneId column for genes, and a pathwayId column for pathway nodes, then any node (regardless of its type) will have those attributes.  This means that pathways will have geneIds and genes will have pathwayIds — something which doesn’t really make much sense.

Imagine if you could tell Cytoscape that there is a class of node called “gene”.  That class has an attribute called “geneId”, and another attribute called “goTerms” which contains a list of gene ontology terms.  You could also define other classes like pathways, disease processes, etc and save the definitions of those classes and how they are to be styled in your project.
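In programming terms this is just a set of typed records. A minimal Python sketch with dataclasses follows; the class and attribute names mirror the examples above, and everything else is an assumption rather than anything Cytoscape provides.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Gene:
    name: str
    geneId: str
    goTerms: list = field(default_factory=list)  # list of gene ontology terms


@dataclass
class Pathway:
    name: str
    pathwayId: str


def attribute_names(node):
    """List the attributes that apply to this node's class, and only those."""
    return [f.name for f in fields(node)]
```

With this arrangement a pathway node simply has no geneId, so the node table never has to show an empty cell for it.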

You could also define mappings between attributes in your gene class, and attributes in EntrezGene’s XML.  If you click on the new gene node, you could enter a gene ID, and Cytoscape would fetch the data from EntrezGene, extract the bits you’re interested in, and save them in the gene node you had created.

Although that last step might be a bit “twiddly” for some folks, it’s something that could be defined once, and shipped with Cytoscape as part of its default configuration. If you want to change how gene data gets extracted from EntrezGene, or how edge data gets extracted from PubMed, you could modify this configuration to suit your needs.

Localization-based Layout
Lastly, one of the things I’d really like to see is localization-based layout.  Typically, when you view pathways, the nodes are laid out in a manner that shows where the genes are localized in a cell.  This makes it easier to visualize the role of the gene in the context of a cell.

Beyond just cell-based localization though, it would be useful to be able to see where a gene functions in relation to a tumor, or some other structure. This might mean showing genes localized in stromal tissue, in successive generations of tumor colonies, in premetastatic niches, etc.  This makes it easier to visualize the roles of these genes with respect to specific disease processes.  This might be accomplished by importing an image (either a PNG or SVG) of particular tissues and manually positioning nodes on a layer above the image.

After discussing a number of these ideas with the Cytoscape team, I was pleased to learn that some of them would be incorporated into upcoming releases.  I’m looking forward to giving them a try when they become available.

Posted in Informatics

Tips for Better PubMed Searches

Logo for PubMed, a service of the National Library of Medicine. (Photo credit: Wikipedia)

PubMed is one of those tools that everyone uses but not everyone uses it well.  Usually your enthusiasm for investigating the search results will wane after the third or fourth page of results. Here are a few tips for making your searches more targeted and successful.

Use MeSH Terms
PubMed uses Medical Subject Headings (MeSH) terms to make the process of searching for papers easier.  The challenge is knowing which terms to use.  Luckily the Advanced Search page can help you get familiar with some of the search terms and field tags.  This tutorial from PubMed can help you get started.

Let’s take a look at a few of the most frequently used field tags.  Suppose you wanted to do a search for a paper whose title contained the exact text “Early onset pancreatic cancer”. To do this search simply enter “Early onset pancreatic cancer”[Title], exactly as shown, including the quotation marks. The [Title] field tag tells PubMed to look only in the Title field.

Suppose that you’ve become familiar with some of the authors in a particular field, and you want to see if there are other papers by the same author.  Use the [Author] field tag like this: pancreatic cancer AND Hruban RH[Author]. In this example, I’m looking for all papers written by Ralph Hruban on pancreatic cancer.

Suppose that you notice that a lot of the authors in your field are associated with a particular institution, and you’d like to narrow your focus to just those articles from Johns Hopkins or the Sol Goldman Institute.  You can add the [Affiliation] field tag to your existing search, like this: AND (Johns Hopkins[Affiliation] OR Sol Goldman[Affiliation]).

Keep an eye on the Search details field on the right-hand side of the PubMed search results page as you build and run your query.  As you perform your search it shows you exactly how your search terms are decomposed.  In my case, pancreatic cancer becomes pancreatic neoplasms and any filters I add to my search are automatically turned into MeSH terms.

If you want to narrow your search to articles published over the last 5 years, use this clause: AND ("2008/12/24"[PDat] : "2013/12/22"[PDat])

The [PDat] field stands for the Publication Date.

If you only want to see results that have free full text, use the "loattrfree full text"[sb] clause.
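If you find yourself assembling these clauses often, a small helper can do the gluing. The Python sketch below only builds the query string from the clauses described above; build_query and its parameter names are my own invention, and it does not contact PubMed.

```python
def build_query(terms, title=None, author=None, affiliations=(),
                date_range=None, free_full_text=False):
    """Join PubMed search clauses with AND, using the tags discussed above."""
    clauses = [terms] if terms else []
    if title:
        clauses.append(f'"{title}"[Title]')
    if author:
        clauses.append(f"{author}[Author]")
    if affiliations:
        joined = " OR ".join(f"{name}[Affiliation]" for name in affiliations)
        clauses.append(f"({joined})")
    if date_range:
        start, end = date_range  # dates as YYYY/MM/DD strings
        clauses.append(f'("{start}"[PDat] : "{end}"[PDat])')
    if free_full_text:
        clauses.append('"loattrfree full text"[sb]')
    return " AND ".join(clauses)
```

For example, passing "pancreatic cancer" with author="Hruban RH" and the two affiliations reproduces the combined search built up in the paragraphs above, ready to paste into the search box.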

Use the Filters, Luke
PubMed provides a number of built-in filters that let you further narrow the search results.  These filters appear on the left side of the results page.  You can tell PubMed that you’re only interested in papers published over the last 5 years, or papers that only have “Full free text”.  Perhaps you’re only interested in articles concerning clinical trials — there’s a filter for that.

Once you have the search working the way you want it to, you can save the search to your MyNCBI account.

You’ll find more helpful tips here.  Happy Searching!

Posted in Informatics, Science Blogging

Notes from the Cytoscape Workshop

Recently, the Sanford Burnham Consortium hosted a workshop on Cytoscape.  Cytoscape is a tool for representing networks.  Its primary users are biologists wanting to understand different types of biological networks, but it can also be used for computer networks or social networks. The project is a multi-institutional collaboration between UCSD, the Institute for Systems Biology, Memorial Sloan Kettering, and the Pasteur Institute, among others.  And this bit of workshop pre-reading gave a pretty good overview of how biologists could use Cytoscape in the wild.

The workshop started with a number of introductory talks that provided an overview of the project, showed some examples of how Cytoscape has been used in the past, and showed an example of recent work in analyzing prostate cancer genes.  At that point the meeting broke into separate sessions for beginners and advanced users, and since a colleague and I were attending we each took different sessions.

The advanced session, which I attended, gave a good overview of some of the features from the upcoming 3.10 release of Cytoscape. You can download the current beta version of it here.  The release includes features like:

  • The ability to export networks and style information as JSON.  This feature is particularly important if you want to be able to create a network in the Cytoscape desktop application, and share that data with colleagues via a simple web page.  The Cytoscape.js library can be used to render the JSON data.
  • An improved welcome screen that helps you get started with Cytoscape.  The previous version opened onto a blank screen and left you wondering “now what do I do”.
  • A command-line and REST interface.
  • Improved Vizmapper — this is the tool that lets you assign styles to different types of nodes and edges in the network.
  • Faster network filtering.
  • Improved support for PSICQUIC.
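To make the JSON export item a little more concrete, here is a Python sketch that builds a tiny network in the elements layout that Cytoscape.js accepts (nodes and edges, each wrapped in a data object). This is my own hand-rolled illustration; the file that the desktop application actually exports will carry more fields, such as positions and style data.

```python
import json


def to_cytoscape_js(nodes, edges):
    """Build a minimal Cytoscape.js-style elements structure.

    nodes: iterable of node IDs
    edges: iterable of (source_id, target_id) pairs
    """
    return {
        "elements": {
            "nodes": [{"data": {"id": node}} for node in nodes],
            "edges": [
                {"data": {"id": f"{source}-{target}", "source": source, "target": target}}
                for source, target in edges
            ],
        }
    }


if __name__ == "__main__":
    network = to_cytoscape_js(["KRAS", "TP53"], [("KRAS", "TP53")])
    print(json.dumps(network, indent=2))
```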

You can see the workshop materials here.

Human interactome – round2 – Cytoscape (Photo credit: andytrop79)

The usual results that I end up with when playing with Cytoscape are less than stellar and share more than a passing resemblance to a hairball.  So I was determined to learn some tips and tricks for visualizing lots of information more clearly.

In upcoming posts, I’ll talk more about these features and some of the ideas that the workshop generated, and some of the results from my own project.

Posted in Informatics

What does the 23andMe letter mean?

Recently, the FDA sent a warning letter to 23andMe telling them to stop marketing their $99 diagnostic kit.  This highlighted some key gaps, both in the molecular diagnostics (MDx) industry specifically and in the healthcare industry in general.

Making Tests Actionable
Traditionally, a physician will order a test if there is a condition that can’t be readily diagnosed with the tools at hand.  Currently, the tests that a physician may request are usually those that insurance companies will pay for, and that are easily actionable.  To be actionable, the results must point to an action that is part of the standard of care.

Since the field of molecular diagnostics is still in its nascent form, there are only a few conditions where this is currently done.  Moreover, genomic medicine has a rather uneven adoption amongst the curricula of medical colleges and, to a certain extent, remains part of the bailiwick of specialities like oncology and pathology.  Stanford currently lists it as an elective, but provides additional training through its pathology program. Harvard also provides a series of electives. UCSD has an Institute of Genomic Medicine. But in general, genomic medicine is not part and parcel of your average GP’s toolkit.  And that’s unfortunate, because most patients will see their GP first, and very often a condition may go undiagnosed.

Ideally, a baseline set of tests should be performed as a matter of course (just as you would ask a patient to provide a medical history).  In the TV show House, the protagonist was always fond of pointing out that the patient always lies, whether intentionally or unintentionally.  And the sad reality is that most patients are woefully under-informed about the incidence of disease within their own extended family.  One might know about one’s parents’ medical conditions, but it’s not that common to know about those of one’s aunts, uncles, and grandparents.

A well-designed battery of tests could help physicians set up a personalized medical surveillance program.  If you have a mutation in BRCA2, for example, it would be worthwhile not only to watch for signs of breast cancer, but also for ovarian and pancreatic cancer.  The latter are often asymptomatic until their terminal stages, and early diagnosis is key to survival.

The Weight of Evidence
In order for clinicians to succeed, each “read-out” needs to have enough clear-cut evidence behind it.  For example, let’s say that a journal publishes an article that asserts that a germline mutation in gene X is predictive of a patient’s likelihood of getting cancer.  Let’s also say that in the 6 months following publication, 3 other papers are published confirming the results.  Is that enough evidence to be able to create and sell a diagnostic kit for the mutation? What happens if it turns out that the researchers all had a financial interest in the diagnostic? What happens if the researcher conducted the experiment 6 times and the first 5 didn’t yield a publishable result, but the 6th did?

The Evolving Role of the GP
So where does this leave the GP? If a patient comes to you and says they took a 23andMe test that shows they have a mutation in BRCA2, what do you do?  Refer them to a genetics counselor?  At some point, the patient is going to come back to you, since for all intents and purposes you are the face of the medical community.  Moreover, how do you take the information that the patient gets and translate that into a viable set of actions that their insurance will pay for, and that provides adequate surveillance of potential conditions?  Do you refer your patient to an oncologist even though they show no signs of cancer but may have a predisposition towards cancer? What about other conditions and other specialists?  The results might indicate that the patient is prone to a wide array of potential conditions.  How much surveillance is warranted?  We’ve gone from one extreme to the other over the past 5 years with respect to prostate cancer, and other forms of cancer are likely to follow the same path.

Context is Everything
Genomic tests could be used in a variety of contexts, though.  So far, I’ve only discussed surveillance scenarios, but these types of biomarkers can be used to adjust the dosing of medication like warfarin, or to recommend one course of treatment over another.  As it is, more and more clinical trials make use of biomarkers to identify patient populations that may benefit from a particular treatment. Perhaps one day, we will have prognostic biomarkers capable of staging a disease, telling the patient “you are here” in the progression of the disease, and recommending a specific cocktail of drugs.  But we aren’t there yet.  And we need publicly accessible databases like ClinVar to help point the way.  For most indications, we just don’t have enough patient data to be able to make clinical decisions based on these tests, and that ultimately is what the FDA letter is pointing towards.

What’s Needed
In order to really make molecular diagnostics part-and-parcel of medical care we need:

  • Genomic medicine to be an integral part of medical training and continuing medical education.  This will be a moving target, and physicians need to know how to apply what we know now in a clinical setting.
  • A clear-cut, critical path similar to the pharmaceutical industry that leads from repeatable, verifiable academic research to diagnostic development and approval.
  • Support for both baseline tests and ongoing surveillance and insurance programs that will pay for them.
  • Decision trees for the medical community that connect the “read-outs” from tests to specific treatment and surveillance plans.
  • Electronic health records that make it easy to mine that data and learn from it.
  • An easy way for physicians to see recommended changes to a patient’s plans as new findings and tests become available, in the same way that you see new email.
  • A degree of latitude when it comes to using these tests. Just as a physician can now make off-label use of a drug, there are going to be acute conditions for which we don’t have enough evidence that clearly points to a treatment program.  Physicians need to be able to make off-label use of diagnostics.

Posted in Uncategorized

Tinkering With EntrezAjax

English: Logo of the United States National Center for Biotechnology Information, a part of the National Library of Medicine, itself part of the National Institutes of Health. Français : Logo du NCBI (Photo credit: Wikipedia)

Recently I was working on a web project, and I wanted to see how feasible it would be to build a web application that had no back end.  No database, or app server, just a simple web page, some JavaScript and some data.

Since I also needed to fetch data from EntrezGene and other NCBI web sites, I decided that I’d try out the EntrezAjax RESTful web service.  Unlike NCBI’s eUtils API, EntrezAjax returns the results of queries as JSON datagrams, which makes it easier to integrate into your web page.

The other nice thing about it is that it allows you to chain queries together.  For example, in most cases, I don’t want to do a PubMed search, parse the article IDs that it returns, and turn around and fetch those PubMed records.  That’s just a waste of time.  I want to do the search and get useable data in one transaction, and EntrezAjax lets me do that.
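For comparison, here is roughly what the two-step dance looks like with NCBI's own eUtils. The Python sketch below only builds the request URLs for esearch and efetch and makes no network calls; the helper names are mine, and this shows the eUtils round trip that EntrezAjax saves you, not the EntrezAjax API itself.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, retmax=20):
    """Step 1: search PubMed; the response contains only a list of article IDs."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS}/esearch.fcgi?" + urlencode(params)


def efetch_url(pmids):
    """Step 2: fetch the full records for the IDs returned by step 1."""
    params = {"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"}
    return f"{EUTILS}/efetch.fcgi?" + urlencode(params)
```

With eUtils you issue the first request, parse out the IDs, then issue the second; the chained query collapses that into a single transaction.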

Another nice thing about it is that it’s hosted on Google’s Appspot cloud infrastructure, which means that the infrastructure can expand to meet the demands placed on it by developers.

If you’d like to give it a try, go here and sign up for an API key.  You’ll need the key in order to run queries.  You can read more about it here.

Posted in Informatics, Uncategorized