Ideas from the Cytoscape Workshop

During the recent Cytoscape workshop, I started out re-acquainting myself with Cytoscape and learning about some of the new features.  As I did so, I found myself accumulating a wishlist of features that I thought I would share:

Using Templates To Export Cytoscape Sessions To HTML
The new JSON export feature looks very useful.  With it you can export your data and style information in JSON format, which is handy when you want to share your data with colleagues who just want to browse through your network.

The goal with previous versions of Cytoscape has always been to end up with publication-quality graphics, but in industry this isn’t always the goal.  For the most part you simply want to share the information with colleagues, and you want the results to be a little more dynamic than a static image.  You want genes to be linked to EntrezGene (or some internal target database).  You want to see sequence and structural information about the target, in addition to information on existing drugs, or existing internal compounds that may address those targets.

The only problem with the JSON export is that you still have to do some work with HTML and JavaScript in order to really make your work browseable.

It would also be nice if there were a way to export my network via a simple user-definable template. There are templating systems like Velocity and FreeMarker that you can use to make this easier. This would let anyone with minimal (or no) HTML skills dump out the data, style information, and a skeletal web page into a directory.  If you want to link gene nodes to EntrezGene, you simply add an IFRAME and a bit of JavaScript to your template and you have a simple way of browsing the nodes in your network.  The templates could be shared in the Cytoscape community so that members can learn from each other.  And publishing your results is as simple as sticking your web page in Google Drive or GitHub.
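
To make the template idea a little more concrete, here is a minimal sketch in Python, using the standard library's string.Template as a stand-in for Velocity or FreeMarker. It assumes the exported JSON has an "elements" object holding a "nodes" list whose entries carry a "data" dictionary; the geneId attribute, the template placeholders, and the render_network function are names I made up for illustration, not part of Cytoscape.

```python
import html
import json
from string import Template

# Hypothetical user-editable template; $title and $rows are placeholder
# names chosen for this sketch.
PAGE_TEMPLATE = Template("""<html>
<head><title>$title</title></head>
<body>
<h1>$title</h1>
<ul>
$rows
</ul>
</body>
</html>""")

# Link-out pattern for gene nodes (assumed; adjust for an internal database).
GENE_URL = "http://www.ncbi.nlm.nih.gov/gene/{gene_id}"


def render_network(network_json: str, title: str) -> str:
    """Render a skeletal HTML page listing every node in the export.

    Nodes that carry a (hypothetical) geneId attribute become links.
    """
    network = json.loads(network_json)
    rows = []
    for node in network["elements"]["nodes"]:
        data = node["data"]
        name = html.escape(str(data.get("name", data["id"])))
        gene_id = data.get("geneId")
        if gene_id is not None:
            url = GENE_URL.format(gene_id=gene_id)
            rows.append(f'<li><a href="{url}">{name}</a></li>')
        else:
            rows.append(f"<li>{name}</li>")
    return PAGE_TEMPLATE.substitute(title=html.escape(title), rows="\n".join(rows))
```

The point of keeping the template separate from the rendering code is that someone with no programming background could edit or swap the template without touching anything else.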

Manually Creating Networks
Most of the workflows demonstrated in the workshop assumed that you’re starting out by importing networks from external sources.  In my case however, I have a lot of individual nodes that represent genes, clinical stages, disease processes, and pathways, and I want to manually create this network.

Manually creating nodes and edges can be a little “clicky”.  You have to right-click on the canvas, select Add, select Node from a submenu, repeat the process for the second node, and then perform a similar operation to create an edge.  One way to simplify the process would be to add “mode” buttons: a browse mode for browsing through the diagram, and a design mode to edit the diagram.  Whenever you’re in design mode, mouse clicks would be interpreted as node creation events, and drag events would be interpreted as edge creation events.  If you drag from a node into an empty spot, it creates both the edge and the node for you. This would make it easier and faster to manually create networks.
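
The mode idea boils down to a small state machine. The sketch below is only an illustration of the behaviour I have in mind; NetworkEditor, click and drag are hypothetical names and have nothing to do with Cytoscape's actual event handling.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class NetworkEditor:
    """Toy model of a canvas with a 'browse' mode and a 'design' mode."""

    mode: str = "browse"
    nodes: List[str] = field(default_factory=list)
    edges: List[Tuple[str, str]] = field(default_factory=list)

    def _new_node(self) -> str:
        node_id = f"n{len(self.nodes) + 1}"
        self.nodes.append(node_id)
        return node_id

    def click(self, target: Optional[str]) -> Optional[str]:
        """Handle a click; target is None when the click lands on empty canvas."""
        if self.mode == "design" and target is None:
            return self._new_node()
        # Browse mode, or a click on an existing node: treat it as selection.
        return target

    def drag(self, source: str, target: Optional[str]) -> Optional[Tuple[str, str]]:
        """Handle a drag from a node; target is None when released on empty canvas."""
        if self.mode != "design" or source not in self.nodes:
            return None
        if target is None:
            # Dragging into an empty spot creates the node as well as the edge.
            target = self._new_node()
        elif target not in self.nodes:
            return None
        edge = (source, target)
        self.edges.append(edge)
        return edge
```

In design mode, one click plus one drag into empty space yields two nodes and an edge, which replaces the three trips through the context menu described above.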

Entering information in the Node/Edge/Network tables could also be a little easier. The new document icon currently means “create column” which can be confusing to new users who expect it to mean “create a document”. The usual icon in Excel or Open Office for adding a column shows a table with a selected column and a plus sign overlaid on top of the icon.  A similar icon with a minus sign is usually used to represent the delete column operation.

Also, sometimes you just want to create a series of nodes and edges and worry about how they are laid out later.  To support that you’d need to be able to add and delete rows in the node/edge tables, and support tabbing between cells in the tables.

Semantic Columns
As I was tinkering around with my data, I thought it would be useful to be able to create columns that had semantic meaning.  For example, if I add a geneId column, I’d like to be able to indicate that this is an EntrezGene ID, perhaps using an EDAM ontology URI such as http://edamontology.org/data_1027.  It should then be fairly simple to map the URI to a URL pattern like http://www.ncbi.nlm.nih.gov/gene/{cell-value}.  This would be a fairly generic approach that would allow people to configure support for linking out to different resources without the need to add code to support it.  You would simply tag a column to indicate that “this is an Entrez Gene ID”. Whenever you clicked on a node, it would construct a link-out menu item for each of the mappings you had defined. These kinds of mappings could also be shared with the community.
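
Here is a rough sketch of how the two lookups could fit together: column name to ontology URI, then URI to URL pattern. The COLUMN_TAGS and URL_PATTERNS tables and the link_outs function are hypothetical names for this example; the URI and the URL pattern are the ones mentioned above.

```python
from typing import Dict, List, Tuple
from urllib.parse import quote

# Column name -> semantic type (the tag a user would attach to a column).
COLUMN_TAGS: Dict[str, str] = {
    "geneId": "http://edamontology.org/data_1027",
}

# Semantic type -> URL pattern; this table could be shared with the community.
URL_PATTERNS: Dict[str, str] = {
    "http://edamontology.org/data_1027": "http://www.ncbi.nlm.nih.gov/gene/{cell-value}",
}


def link_outs(row: Dict[str, object]) -> List[Tuple[str, str]]:
    """Return (column, url) pairs for every tagged, non-empty cell in a node's row."""
    links = []
    for column, value in row.items():
        uri = COLUMN_TAGS.get(column)
        pattern = URL_PATTERNS.get(uri) if uri is not None else None
        if pattern is None or value in (None, ""):
            continue
        url = pattern.replace("{cell-value}", quote(str(value), safe=""))
        links.append((column, url))
    return links
```

Supporting a new resource then means adding one entry to each table rather than writing code.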

Representing Evidence Using Edge Weight
I like the idea of representing evidence as an edge attribute; however, what I’d really like to be able to do is tie a list of PubMed IDs to a single edge, rather than creating an edge per evidence link and then trying to collapse those edges.  Assertions that lack evidence might appear with a dashed line, or a faded line.  Assertions that had a lot of evidence might appear darker and bolder.
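
One way to picture this is a function from an edge's list of PubMed IDs to its visual properties. The thresholds, the property names and the edge_style function below are assumptions I picked for illustration, not Cytoscape style settings.

```python
from typing import Dict, Iterable, Union


def edge_style(pubmed_ids: Iterable[str], max_evidence: int = 5) -> Dict[str, Union[str, float]]:
    """Map the amount of evidence on an edge to a line style, width and opacity.

    max_evidence is an arbitrary cap: edges with that many (or more)
    distinct PubMed IDs are drawn at full width and opacity.
    """
    count = len(set(pubmed_ids))  # ignore duplicate IDs
    if count == 0:
        # No evidence: dashed and faded.
        return {"line_style": "dashed", "width": 1.0, "opacity": 0.4}
    strength = min(count, max_evidence) / max_evidence
    return {
        "line_style": "solid",
        "width": 1.0 + 3.0 * strength,
        "opacity": 0.4 + 0.6 * strength,
    }
```

Because the evidence lives in a single list attribute, there is only ever one edge per assertion and nothing needs to be collapsed.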

It would also be useful to be able to associate different types of search patterns with different “classes” of edges. In my network, some edges represent mutation events, while others represent differential expression or methylation events. Each type of event has a different style associated with it, and I want to have a different type of literature search associated with each edge “class”. For mutation events, this might look something like “pubmed: {source-node} is mutated in {destination-node} AND pancreatic adenocarcinoma”; for differential expression, it might be “google-scholar: {source-node} is differentially expressed in {destination-node} AND pancreatic adenocarcinoma”.  When you click on an edge, the popup menu would show you menu items for each of the searches you had previously configured, and let you look for new evidence that backs up your assertion.  You would then be able to bookmark any search result, and add it to the list of evidence items for your edge.  You could also have a “search edge evidence” menu item that would perform these kinds of searches across your entire network and highlight edges that had new evidence.
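
As a sketch of how the per-class searches might be configured, the snippet below expands the patterns into menu items with search URLs. The edge class names, the SEARCH_PATTERNS table and the search_menu function are made up for this example, and the PubMed and Google Scholar URL formats are what I assume those sites accept for a plain query string.

```python
from typing import Dict, List, Tuple
from urllib.parse import quote_plus

# Search engine prefix -> URL with a {query} slot (assumed URL formats).
SEARCH_ENGINES: Dict[str, str] = {
    "pubmed": "https://pubmed.ncbi.nlm.nih.gov/?term={query}",
    "google-scholar": "https://scholar.google.com/scholar?q={query}",
}

# Edge class -> list of (engine, pattern); class names are hypothetical.
SEARCH_PATTERNS: Dict[str, List[Tuple[str, str]]] = {
    "mutation": [
        ("pubmed", "{source-node} is mutated in {destination-node} AND pancreatic adenocarcinoma"),
    ],
    "differential-expression": [
        ("google-scholar", "{source-node} is differentially expressed in {destination-node} AND pancreatic adenocarcinoma"),
    ],
}


def search_menu(edge_class: str, source: str, destination: str) -> List[Tuple[str, str]]:
    """Build (menu label, search URL) pairs for an edge of the given class."""
    items = []
    for engine, pattern in SEARCH_PATTERNS.get(edge_class, []):
        query = pattern.replace("{source-node}", source).replace("{destination-node}", destination)
        url = SEARCH_ENGINES[engine].replace("{query}", quote_plus(query))
        items.append((f"{engine}: {query}", url))
    return items
```

A network-wide “search edge evidence” command would simply loop over every edge and call the same function.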

Representing Temporal Networks
One of the topics mentioned in the meeting concerned how to represent temporal information in Cytoscape.  I’d like to be able to show different events at different stages in the progression of pancreatic cancer.  For example, I want to show that in PanIN-1A lesions, you have KRAS mutated in perhaps 50% of lesions, and that percentage grows as you reach PanIN-3. It would be nice if there were some time-based slider where you could tie an attribute in each node to a discrete time point. As you moved the slider between these time points, nodes in the network might fade in and fade out. In my case, the KRAS node would get darker as I dragged the slider through the different PanIN stages.

You could also use this to show different cellular processes at work at each stage of a disease.  Of course the same type of approach could be used to view gene expression data at various time points after a treatment, or any other kind of data that had discrete time points.
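
A slider like this amounts to recomputing each node's opacity from a per-stage attribute whenever the selected stage changes. In the sketch below, node_opacity and frame are hypothetical helpers, and any fractions you feed them in an example are placeholders for illustration rather than measured values.

```python
from typing import Dict


def node_opacity(fraction_by_stage: Dict[str, float], stage: str, floor: float = 0.1) -> float:
    """Opacity of a node at a given stage.

    fraction_by_stage maps a stage name to a value between 0 and 1
    (for example, the fraction of lesions carrying a mutation).
    A node with no value for the stage is fully faded out.
    """
    fraction = fraction_by_stage.get(stage)
    if fraction is None:
        return 0.0
    clamped = max(0.0, min(1.0, fraction))
    # Keep present-but-rare nodes faintly visible by starting at `floor`.
    return floor + (1.0 - floor) * clamped


def frame(network: Dict[str, Dict[str, float]], stage: str) -> Dict[str, float]:
    """Opacity of every node for one slider position."""
    return {node_id: node_opacity(values, stage) for node_id, values in network.items()}
```

The same function works for any attribute with discrete time points, such as expression levels at set intervals after a treatment, as long as the values are scaled to the 0 to 1 range first.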

Node and Edge Classes
By default, Cytoscape provides you with a table that you can use to enter node and edge data.  The table is flexible in that you can add columns for new attributes; however, all nodes share the same set of attributes, as do all edges.  If your diagram has both genes and pathways on it, and you create a geneId column for genes and a pathwayId column for pathway nodes, then any node (regardless of its type) will have both attributes.  Pathways will have geneIds and genes will have pathwayIds, which doesn’t really make much sense.

Imagine if you could tell Cytoscape that there is a class of node called “gene”.  That class has an attribute called “geneId”, and another attribute called “goTerms” which contains a list of gene ontology terms.  You could also define other classes like pathways, disease processes, etc., and save the definitions of those classes and how they are to be styled in your project.
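
In code terms, a node class is just a named set of typed attributes plus a style. The sketch below shows one possible shape for such a definition; NodeClass, its validate method and the style keys are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class NodeClass:
    """A named kind of node with its own attributes and default style."""

    name: str
    attributes: Dict[str, type]
    style: Dict[str, str] = field(default_factory=dict)

    def validate(self, values: Dict[str, Any]) -> Dict[str, Any]:
        """Reject attributes that do not belong to this class or have the wrong type."""
        unknown = set(values) - set(self.attributes)
        if unknown:
            raise ValueError(f"{self.name} nodes have no attribute(s): {sorted(unknown)}")
        for key, value in values.items():
            if not isinstance(value, self.attributes[key]):
                raise TypeError(f"{key} must be {self.attributes[key].__name__}")
        return dict(values)


# Example definitions; the style values are placeholders.
GENE = NodeClass("gene", {"geneId": int, "goTerms": list}, {"shape": "ellipse"})
PATHWAY = NodeClass("pathway", {"pathwayId": str}, {"shape": "rectangle"})
```

With definitions like these, trying to give a pathway node a geneId fails immediately instead of silently adding an empty column to every node.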

You could also define mappings between attributes in your gene class, and attributes in EntrezGene’s XML.  If you click on the new gene node, you could enter a gene ID, and Cytoscape would fetch the data from EntrezGene, extract the bits you’re interested in, and save them in the gene node you had created.
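
Such a mapping could be as simple as a table of attribute names and element paths. The sketch below parses a small XML snippet that I invented for the example; it is not the real EntrezGene schema, and the element names, GO term placeholders, GENE_MAPPING table and extract_attributes function are all hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict, Tuple

# Invented sample document; the real EntrezGene XML is structured differently.
SAMPLE_XML = """<Gene>
  <Id>3845</Id>
  <Symbol>KRAS</Symbol>
  <Terms>
    <Term>GO:placeholder-1</Term>
    <Term>GO:placeholder-2</Term>
  </Terms>
</Gene>"""

# Node attribute -> (element path, whether to collect every match as a list).
GENE_MAPPING: Dict[str, Tuple[str, bool]] = {
    "symbol": ("Symbol", False),
    "goTerms": ("Terms/Term", True),
}


def extract_attributes(xml_text: str, mapping: Dict[str, Tuple[str, bool]]) -> Dict[str, Any]:
    """Pull the configured bits out of an XML record into a dict of node attributes."""
    root = ET.fromstring(xml_text)
    result: Dict[str, Any] = {}
    for attribute, (path, is_list) in mapping.items():
        if is_list:
            result[attribute] = [element.text for element in root.findall(path)]
        else:
            result[attribute] = root.findtext(path)
    return result
```

Fetching the record by gene ID is left out on purpose; the part worth making configurable is the mapping table.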

Although that last step might be a bit “twiddly” for some folks, it’s something that could be defined once, and shipped with Cytoscape as part of its default configuration. If you want to change how gene data gets extracted from EntrezGene, or how edge data gets extracted from PubMed, you could modify this configuration to suit your needs.

Localization-based Layout
Lastly, one of the things I’d really like to see is localization-based layout.  Typically, when you view pathways, the nodes are laid out in a manner that shows where the genes are localized in a cell.  This makes it easier to visualize the role of the gene in the context of a cell.

Beyond just cell-based localization though, it would be useful to be able to see where a gene functions in relation to a tumor, or some other structure. This might mean showing genes localized in stromal tissue, in successive generations of tumor colonies, in premetastatic niches, etc.  This makes it easier to visualize the roles of these genes with respect to specific disease processes.  This might be accomplished by importing an image (either a PNG or SVG) of particular tissues and manually positioning nodes on a layer above the image.

Followup
After discussing a number of these ideas with the Cytoscape team, I was pleased to learn that some of them would be incorporated into upcoming releases.  I’m looking forward to giving them a try when they become available.

About aspenbio

I write software for scientists. I'm interested in Java/Groovy/Grails, the Semantic Web and Cancer Biology.
