After my recent post on the current state of pancreatic cancer research, a reader’s comment got me wondering whether we are in fact making any measurable progress towards understanding and treating pancreatic cancer.
My first thought was that a simple year-by-year count of papers on the subject would give me a crude measure. I’d been itching to try out some of the functions in the Google Docs spreadsheet, especially the ones you don’t find in Microsoft Excel. Google Docs has an “ImportXML” function that downloads an XML document and extracts data from it using XPath expressions. In this case, I want to run a simple PubMed query and extract the number of papers found in a particular year. NCBI provides a set of RESTful web services called eUtils that we’ll use for this purpose.
The key pieces of the URL are the term parameter and the mindate and maxdate parameters. Here I’m looking for papers on pancreatic cancer published between mindate and maxdate. The query returns an XML document; you can click on the link to see what it looks like.
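If you’d rather build the query URL programmatically, here is a minimal Python sketch. It assumes the standard eSearch endpoint and adds two parameters not mentioned above: db=pubmed, and datetype=pdat so that mindate/maxdate filter on publication date. The helper name build_esearch_url is my own, not part of eUtils.

```python
from urllib.parse import urlencode

# NCBI eSearch endpoint (the eUtils service that runs a PubMed query)
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term: str, year: int) -> str:
    """Build an eSearch URL for papers matching `term` published in `year`.

    Hypothetical helper. Assumption: datetype="pdat" makes mindate/maxdate
    apply to the publication date.
    """
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",
        # Setting mindate == maxdate restricts the search to a single year
        "mindate": str(year),
        "maxdate": str(year),
    }
    # urlencode escapes the search term (spaces become '+')
    return ESEARCH_URL + "?" + urlencode(params)


print(build_esearch_url("pancreatic cancer", 2010))
```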
We’re primarily interested in the number of results found in a given year, so we use the following XPath expression to extract the paper count: /eSearchResult/Count
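Outside of Google Docs, the same extraction can be sketched with Python’s standard-library ElementTree. ElementTree doesn’t accept absolute XPath, so this sketch checks that the root is eSearchResult and then looks only at its direct Count child. That matters because, as far as I know, real responses can also carry Count elements nested inside the query-translation details. The sample response below is made up purely for illustration.

```python
import xml.etree.ElementTree as ET


def extract_count(xml_text: str) -> int:
    """Return the value of /eSearchResult/Count from an eSearch XML response."""
    root = ET.fromstring(xml_text)
    # Equivalent of the leading /eSearchResult step in the XPath expression
    if root.tag != "eSearchResult":
        raise ValueError(f"unexpected root element: {root.tag}")
    # find("Count") searches direct children only, so nested Count elements are ignored
    count = root.find("Count")
    if count is None or count.text is None:
        raise ValueError("no Count element in response")
    return int(count.text)


# Made-up response for illustration only (not real PubMed data)
sample = "<eSearchResult><Count>42</Count><RetMax>20</RetMax><RetStart>0</RetStart></eSearchResult>"
print(extract_count(sample))  # prints 42
```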
Finally, we pull the results together in a Google Docs spreadsheet using the formula shown below:
You can see the results here.
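For anyone who would rather script the whole year-by-year tally than use a spreadsheet, here is a rough Python equivalent. It is a sketch, not what the spreadsheet actually does. The function names are hypothetical, and the HTTP fetcher is passed in as a parameter so the logic can be exercised without network access. The default pause between requests reflects NCBI’s guidance of at most three requests per second without an API key.

```python
import time
import xml.etree.ElementTree as ET
from typing import Callable, Dict
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def http_fetch(url: str) -> str:
    """Download a URL and return the body as text."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")


def count_for_year(term: str, year: int, fetch: Callable[[str], str]) -> int:
    """Run one eSearch query limited to `year` and return /eSearchResult/Count."""
    # Assumption: datetype="pdat" filters on publication date
    params = {"db": "pubmed", "term": term, "datetype": "pdat",
              "mindate": str(year), "maxdate": str(year)}
    root = ET.fromstring(fetch(ESEARCH_URL + "?" + urlencode(params)))
    return int(root.find("Count").text)


def counts_by_year(term: str, first: int, last: int,
                   fetch: Callable[[str], str] = http_fetch,
                   delay: float = 0.34) -> Dict[int, int]:
    """Return {year: paper count} for every year from first to last inclusive."""
    results: Dict[int, int] = {}
    for year in range(first, last + 1):
        results[year] = count_for_year(term, year, fetch)
        # Stay under roughly three requests per second
        time.sleep(delay)
    return results


if __name__ == "__main__":
    for year, n in counts_by_year("pancreatic cancer", 2000, 2005).items():
        print(year, n)
```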
Admittedly, this is a rather crude measure of progress. All it really takes is one seminal paper to make a big difference in the field. It does raise the question, though: what are the hallmarks of true progress? For example, if we look at the papers published in the run-up to the approval of Imatinib (Gleevec), are there any signs that a revolutionary drug was in the offing? At what point in the literature do we see an indication that the right target for CML had been identified? And how much time elapsed between target identification/validation and the approval of Imatinib?