Pancreatic Cancer and the Microbiome


Beyond the genomic complexity of the disease, we’ve begun to examine the effect of the microbiome on pancreatic cancer. In a recent paper in Science, “Potential role of intratumor bacteria in mediating tumor resistance to the chemotherapeutic drug gemcitabine”, Geller et al. reported that 76% of the pancreatic cancer patients in their study (n=113) had Gammaproteobacteria in their pancreatic tumors. These bacteria were shown to metabolize gemcitabine, a chemotherapeutic drug commonly prescribed to pancreatic cancer patients, converting it into an inactive form. This may explain in part why gemcitabine has such a limited effect on patients.

In addition, a growing body of evidence points to a role for oral bacteria in the progression of pancreatic cancer. In 2008, Meyer et al. identified a potential relationship between periodontal disease and pancreatic cancer. In 2013, a European study by Michaud et al. reported that people with high levels of plasma antibodies to Porphyromonas gingivalis, an oral pathogen, had a more than 2-fold higher risk of pancreatic cancer.

A prospective risk study was also performed at NYU Langone to help confirm the earlier results, and was published by Fan et al. It added further data to the growing body of knowledge on the subject, including a potential role for Aggregatibacter actinomycetemcomitans.



Fan X, Alekseyenko AV, Wu J, Peters BA, Jacobs EJ, Gapstur SM, Purdue MP, Abnet CC, Stolzenberg-Solomon R, Miller G, Ravel J, Hayes RB, Ahn J (2018) Human oral microbiome and prospective risk for pancreatic cancer: a population-based nested case-control study. Gut 67(1): 120–127. doi: 10.1136/gutjnl-2016-312580. Epub 2016 Oct 14.

Meyer MS, Joshipura K, Giovannucci E, Michaud DS (2008) A review of the relationship between tooth loss, periodontal disease, and cancer. Cancer Causes Control 19(9): 895–907.

Michaud DS, Izard J, Wilhelm-Benartzi CS, You DH, Grote VA, Tjonneland A, Dahm CC, Overvad K, Jenab M, Fedirko V, Boutron-Ruault MC, Clavel-Chapelon F, Racine A, Kaaks R, Boeing H, Foerster J, Trichopoulou A, Lagiou P, Trichopoulos D, Sacerdote C, Sieri S, Palli D, Tumino R, Panico S, Siersema PD, Peeters PH, Lund E, Barricarte A, Huerta JM, Molina-Montes E, Dorronsoro M, Quiros JR, Duell EJ, Ye W, Sund M, Lindkvist B, Johansen D, Khaw KT, Wareham N, Travis RC, Vineis P, Bueno-de-Mesquita HB, Riboli E (2013) Plasma antibodies to oral bacteria and risk of pancreatic cancer in a large European prospective cohort study. Gut 62(12):


Posted in Cancer Research, microbiome, pancreatic cancer

Subtypes of Pancreatic Cancer

In the third part of this series on pancreatic cancer, we’ll take a look at recent developments in identifying subtypes of pancreatic cancer and their potential impact on the lives of pancreatic cancer patients.

Twenty-one years ago, we had identified some of the histologically distinct types of pancreatic cancer. These neoplastic tissue types included PanINs (pancreatic intraepithelial neoplasias), IPMNs (intraductal papillary mucinous neoplasms), and MCNs (mucinous cystic neoplasms). Wu et al. analyzed cancer-associated genes in the cyst fluid of IPMNs and discovered that IPMNs harbor GNAS and KRAS mutations.

In 2016, Andrew Biankin of the Wolfson Wohl Cancer Research Centre in Glasgow identified 4 distinct subtypes of pancreatic cancer in his paper “Genomic analyses identify molecular subtypes of pancreatic cancer”. These subtypes included:

  • Squamous – enriched for TP53 & KDM6A mutations, upregulation of the TP63ΔN transcriptional network, and hypermethylation of pancreatic endodermal cell-fate determining genes; associated with a poor prognosis
  • Pancreatic progenitor – differentially expressed genes involved in early pancreatic development, including FOXA2/3, PDX1, and MNX1
  • Immunogenic – contained upregulated immune networks including pathways involved in acquired immune suppression
  • Aberrantly differentiated endocrine exocrine (ADEX) – displayed upregulation of genes that regulate networks involved in KRAS activation, exocrine (NR5A2 and RBPJL), and endocrine differentiation (NEUROD1 and NKX2-2)

With these subtypes, the hope is that clinicians can segment patient populations into groups that respond better to specific, targeted therapies. Already we’ve seen that patients with BRCA2 mutations are more likely to respond to PARP inhibitors, and researchers are currently investigating if patients in the Immunogenic subtype are more likely to respond to the latest immunotherapies.

Just recently, the Precision-Panc organization announced the first of three PRIMUS (Pancreatic cancer Individualized Multi-arm Umbrella Study) trials. PRIMUS-001 is an adaptive Phase II/III study with an integrated biomarker evaluation in patients with metastatic disease. PRIMUS-002 will aim to define biomarkers of therapeutic responsiveness in the neoadjuvant setting and is set to open in 2018. PRIMUS-003, supported by AstraZeneca, is using an immunotherapy approach and is also currently recruiting patients with metastatic disease.

Posted in Cancer Research, pancreatic cancer

The Genetics of Pancreatic Cancer

In the second part of this series on pancreatic cancer, we’ll look at the changes in our understanding of the role of familial genetics in pancreatic cancer, and how that new understanding holds promise for new therapeutics.

In the mid-1970s, doctors began reporting cases of familial pancreatic cancer, where multiple first-degree relatives had presented with pancreatic cancer. Epidemiological studies could give us estimates of risk, but not the source of the risk or a mechanistic understanding of it. It wasn’t until the earliest days of the genomic era, however, that researchers had the tools necessary to understand the genetic components of these cancers.


Estimates vary depending on the study size, but approximately 10% of pancreatic cancer patients have one of the following syndromes which predispose them to pancreatic cancer:

  • BRCA2 – a DNA repair gene that predisposes family members to breast, ovarian, pancreatic and prostate cancer.
  • Other DNA Repair Genes (PALB2 & ATM)
  • Peutz-Jeghers Syndrome (STK11/LKB1)
  • Fanconi Anemia Syndrome (FANCC)
  • Lynch Syndrome (MLH1, MSH2, MSH6, and PMS2)
  • Von Hippel-Lindau Syndrome (VHL)
  • Hereditary pancreatitis (PRSS1, SPINK1)
  • FAMMM – Familial Atypical Multiple Mole Melanoma (CDKN2A)
  • Palladin (PALLD)

The chart below shows the relative prevalence of these syndromes within the pancreatic cancer patient population.

Source: [1]

Of these syndromes, BRCA2 and CDKN2A account for the majority of mutations found in familial pancreatic cancer. For the most part, the genes associated with these syndromes have been well established. One exception is Palladin, a gene first reported by Pogue-Geile et al. [2] to be mutated in a family known as Family X. Subsequent work [3], however, failed to replicate the findings.

Pooling Resources

In 2002, PACGENE, a consortium of 7 cancer centers, was formed to help centralize the collection of genetic information about pancreatic cancer within the US and Canada. The institutions involved include MD Anderson, the Mayo Clinic, Dana-Farber Cancer Institute, and Johns Hopkins. The goal of the consortium is to collect data from the broadest pool of pancreatic cancer patients and eventually to create a means of surveillance and early detection for families that are likely to see an increased rate of pancreatic cancer diagnosis.

Some of the participating institutions, like Johns Hopkins, had previously established pancreatic cancer tumor registries. Their tumor registry, known as the National Familial Pancreas Tumor Registry (NFPTR), was established in 1994 by Ralph Hruban, one of the leading researchers in pancreatic cancer.

The NCI formed the Early Detection Research Network to help collect, curate and standardize available information on cancer-related biomarkers. The website also helps identify potential collaborators, and funding opportunities for biomarker development.

In 2015, David Zhen of the Mayo Clinic published a study[4] of the prevalence of mutations found in the families of pancreatic cancer patients using PACGENE data. The dataset confirmed that BRCA2 (3.7%) mutations were most prevalent in the patient population, followed by CDKN2A (2.5%), BRCA1 (1.2%) and PALB2 (0.6%).

In 2010, McWilliams et al. [5] reported a modest association between mutations in the CFTR gene (cystic fibrosis transmembrane conductance regulator) and an increased risk of pancreatic cancer. Over the past 20 years, we’ve seen the association between breast, ovarian, and pancreatic cancer become more clearly defined. Along with that effort, we’ve seen additional associations being made with stomach cancer [6], prostate cancer and colon cancer [6,7].

Perhaps one of the most valuable datasets for familial pancreatic cancer (FPC) has been the Australian Pancreatic Cancer Genome Initiative. The data set of over 760 patients has revealed aberrations in axon guidance pathway genes [8], a novel set of subtypes (more about that in a later blog post), and the discovery [9] that nearly 78% of FPC families had 2 affected first-degree relatives (FDRs). The study also revealed that when a parent had been diagnosed with pancreatic cancer, an affected child was diagnosed on average 12 years earlier, which underscores the importance of awareness in early detection of the disease.

On a side note, most of these studies showed that smoking accelerated the course of the disease, often resulting in an earlier age of presentation than in non-smokers. And while we’ve known for some time that smoking increases the risk of pancreatic cancer, the mechanism of action hadn’t (until recently [10]) been investigated. That work concluded that the mutation rate of critical driver genes in pancreatic cancer was higher in smokers than in non-smokers, resulting in faster progression, earlier onset, and increased aggressiveness of the disease. The mechanism by which smoking alters signalling in pancreatic cancer was also studied [11], and smoking was linked to changes in the axon guidance pathway [11,12].

The Real Challenge Is In The Data

By far the biggest challenge in assessing the impact of these syndromes on the patient population is our lack of a comprehensive dataset.

PACGENE in North America, NFPTR, EUROPAC, FaPaCa in Germany, the Japanese Familial Pancreatic Cancer Registry and similar efforts in other countries have thus far provided the most comprehensive datasets, but there are always holes in the data, leading to more questions.

Efforts [4,13] to compare self-reported questionnaires with genetic counselor driven questionnaires have found that self-reporting actually failed to identify a number of high-risk family members. This indicates that comprehensive efforts involving not only questionnaires but also genomic screening are necessary to accurately determine risk within families.

Given that 53,000 patients will be diagnosed with pancreatic cancer in the United States this year, these datasets represent only a small fraction of the patient population, which makes it difficult to assess the real prevalence of these syndromes. Many of the studies found new candidate gene variants of unknown significance. Of these, we don’t know which may be “driver” mutations for specific processes within pancreatic cancer and which are “passenger” mutations. Without that mechanistic understanding of the roles of these genes, it’s difficult to identify plausible new drug targets, and difficult to understand how pervasive these variants may be in the population as a whole.

Putting The Data To Work In The Clinic

One of the primary goals for collecting the data is to develop early screening and surveillance programs for families where pancreatic cancer is a known risk.

The efforts thus far have tried to answer two basic questions: who is most at risk for pancreatic cancer, and how can we cost-effectively identify them at an early enough stage for treatment to be effective?

Who to Test

Using the information gleaned from PACGENE and similar efforts, the risk population has become easier to identify. Many of the efforts, however, have been hampered by small sample sizes and a narrow focus on families that report 2 or more cases of pancreatic cancer in first-degree relatives. The problem with this narrow approach to identifying high-risk kindreds is that it misses families where first-degree relatives have cancers other than pancreatic cancer, such as the breast, ovarian, and prostate cancers that have been linked to BRCA1/2 mutations. It also means that we are less likely to uncover new biomarkers, and that the 10% figure for FPC families quoted earlier may actually be larger than we suspect.

The temporal nature of cancer also means that as patients age into that “magic window” of cancer susceptibility, bounded by age, familial risk, diabetes onset, smoking and other risk factors, we still aren’t acting quickly enough to identify other family members who may be at risk. We still treat each cancer patient as though their disease were a sporadic case of bad luck, rather than something that may have arisen out of an inherited syndrome and requires assessment. If clinicians were to assess familial risk at the time of diagnosis of a single member of a family, we would be more likely to intercept cancer at an early stage. That one family member would in effect play the role of “canary in the coal mine” for the other family members.

How to Test Them More Cost-Effectively

While studies have shown that Endoscopic Ultrasonography (EUS) can be used as a cost-effective surveillance tool for high-risk families, it is still too expensive and too invasive to be made widely available beyond families that fit into these predefined high-risk categories. What’s needed is a low-cost screening mechanism that could be made part of a yearly checkup, perhaps a test that could be added to the existing blood panel: a test that can be made broadly available, and is likely to identify a wider risk pool.

Traditionally, molecular diagnostics for pancreatic cancer have been rather limited. Over-expression of CA19-9, CA-50 and CEA has been used as a biomarker for pancreatic cancer diagnosis and treatment efficacy. As genomic technologies have made their way into clinical practice, there has been a renewed focus on identifying new multi-biomarker signatures [14], [15], [16] capable of identifying pancreatic cancer earlier, providing staging and prognostic information, and guiding precision medicine. These newer technologies are less invasive and more economical to implement in a clinical setting, making them more likely to be used.

Recently we’ve begun to see a number of emergent diagnostic platforms applied to detecting pancreatic cancer including circulating tumor cell (CTC) based diagnostics [1,17], and microRNA-based diagnostics [18], [19]. This month, the Economist reported that Cancer Research UK was investigating the potential use of a breathalyzer-type technology capable of detecting different types of cancers based on the presence of volatile organic compounds (VOCs) found in the breath.

The Illumina spinoff company GRAIL has been working on a new cell-free DNA diagnostic for pancreatic cancer. Because the signals are often low for early-stage disease, GRAIL sequences the genomic regions thousands of times to improve the signal-to-noise ratio, and sequences a larger panel of genes, making it more likely to detect rare tumor DNA molecules.

GRAIL is also working on a study called the Circulating Cell-Free Genome Atlas, which will include 7000 cancer patients at various stages of disease progression and of various ages, genders and smoking histories.

Immunovia, the Swedish molecular diagnostics company, has developed a multiplex antibody array, known as IMMARAY PANCAN-D, that is reported to detect 98% of pancreatic cancers [20] [21].


Over the next 10 years we will see the continuing translation of bench-top ‘omic technologies to the bedside to help diagnose patients and guide their treatment. As the cost of these technologies continues to decline, we will see greater use amongst family members. Beyond the clinic, consumer genomic testing services such as 23andMe continue to proliferate in the marketplace, increasing awareness amongst family members of the potential risk of pancreatic cancer. We’re also seeing a cancer diagnosis act as a canary in a coal mine, bringing awareness that cancer risk can be passed down through the germline.

In the past 5 years, we’ve seen a growing realization of the genetic nature of cancer, which has changed the way the clinical community views and treats cancer. Rather than seeing cancer in terms of its tissue of origin, we’re seeing greater emphasis placed on the genetic commonalities between different types of cancer. This has resulted in the application of treatments from other cancers, and even other diseases, to pancreatic cancer.


1. Petersen GM. Familial Pancreatic Adenocarcinoma. Hematol Oncol Clin North Am. 2015;29: 641–653.

2. Pogue-Geile KL, Chen R, Bronner MP, Crnogorac-Jurcevic T, Moyes KW, Dowen S, et al. Palladin mutation causes familial pancreatic cancer and suggests a new cancer mechanism. PLoS Med. 2006;3: e516.

3. Klein AP, Borges M, Griffith M, Brune K, Hong S-M, Omura N, et al. Absence of deleterious palladin mutations in patients with familial pancreatic cancer. Cancer Epidemiol Biomarkers Prev. 2009;18: 1328–1330.

4. Zhen DB, Rabe KG, Gallinger S, Syngal S, Schwartz AG, Goggins MG, et al. BRCA1, BRCA2, PALB2, and CDKN2A mutations in familial pancreatic cancer: a PACGENE study. Genet Med. 2015;17: 569–577.

5. McWilliams RR, Petersen GM, Rabe KG, Holtegaard LM, Lynch PJ, Bishop MD, et al. Cystic fibrosis transmembrane conductance regulator (CFTR) gene mutations and risk for pancreatic adenocarcinoma. Cancer. 2010;116: 203–209.

6. Jakubowska A, et al. BRCA2 gene mutations in families with aggregations of breast and stomach cancers. PubMed [Internet]. [cited 2 Dec 2017].

7. Lee MV, Katabathina VS, Bowerson ML, Mityul MI, Shetty AS, Elsayes KM, et al. BRCA-associated Cancers: Role of Imaging in Screening, Diagnosis, and Management. Radiographics. 2017;37: 1005–1023.

8. Biankin AV, Waddell N, Kassahn KS, Gingras M-C, Muthuswamy LB, Johns AL, et al. Pancreatic cancer genomes reveal aberrations in axon guidance pathway genes. Nature. 2012;491: 399–405.

9. Humphris JL, Johns AL, Simpson SH, Cowley MJ, Pajic M, Chang DK, et al. Clinical and pathologic features of familial pancreatic cancer. Cancer. 2014;120: 3669–3675.

10. Matsubayashi H, Takaori K, Morizane C, Maguchi H, Mizuma M, Takahashi H, et al. Familial pancreatic cancer: Concept, management and issues. World J Gastroenterol. 2017;23: 935–948.

11. Momi N, Kaur S, Ponnusamy MP, Kumar S, Wittel UA, Batra SK. Interplay between smoking-induced genotoxicity and altered signaling in pancreatic carcinogenesis. Carcinogenesis. 2012;33: 1617–1628.

12. Tang H, et al. Axonal guidance signaling pathway interacting with smoking in modifying the risk of pancreatic cancer: a gene- and pathway-based interaction analysis… PubMed [Internet]. [cited 2 Dec 2017].

13. Lucas AL, Tarlecki A, Van Beck K, Lipton C, RoyChoudhury A, Levinson E, et al. Self-Reported Questionnaire Detects Family History of Cancer in a Pancreatic Cancer Screening Program. J Genet Couns. 2017;26: 806–813.

14. Kim J, Bamlet WR, Oberg AL, Chaffee KG, Donahue G, Cao X-J, et al. Detection of early pancreatic ductal adenocarcinoma with thrombospondin-2 and CA19-9 blood markers. Sci Transl Med. 2017;9. doi:10.1126/scitranslmed.aah5583

15. Makawita S, Dimitromanolakis A, Soosaipillai A, Soleas I, Chan A, Gallinger S, et al. Validation of four candidate pancreatic cancer serological biomarkers that improve the performance of CA19.9. BMC Cancer. 2013;13: 404.

16. Capello M, Bantis LE, Scelo G, Zhao Y, Li P, Dhillon DS, et al. Sequential Validation of Blood-Based Protein Biomarker Candidates for Early-Stage Pancreatic Cancer. J Natl Cancer Inst. 2017;109. doi:10.1093/jnci/djw266

17. Kenner BJ, Go VLW, Chari ST, Goldberg AE, Rothschild LJ. Early Detection of Pancreatic Cancer: The Role of Industry in the Development of Biomarkers. Pancreas. 2017;46: 1238–1241.

18. Huang J, Liu J, Chen-Xiao K, Zhang X, Paul Lee WN, Go VLW, et al. Advance in microRNA as a potential biomarker for early detection of pancreatic cancer. Biomarker Research. 2016;4. doi:10.1186/s40364-016-0074-3

19. Yuan W, Tang W, Xie Y, Wang S, Chen Y, Qi J, et al. New combined microRNA and protein plasmatic biomarker panel for pancreatic cancer. Oncotarget. 2016; doi:10.18632/oncotarget.12406

20. Wingren C, Sandström A, Segersvärd R, Carlsson A, Andersson R, Löhr M, et al. Identification of serum biomarker signatures associated with pancreatic cancer. Cancer Res. 2012;72: 2481–2490.

21. Ingvarsson J, Wingren C, Carlsson A, Ellmark P, Wahren B, Engström G, et al. Detection of pancreatic cancer using antibody microarray-based serum protein profiling. Proteomics. 2008;8: 2211–2219.

Posted in Cancer Research, pancreatic cancer

 Remember, Remember…

“… the Fifth of November!”, so the old rhyme goes. And as every British schoolchild knows, this is the day that Guy Fawkes attempted to blow up the Houses of Parliament in 1605. For families of pancreatic cancer patients, November is Pancreatic Cancer Awareness Month, a month filled with fundraising and awareness-raising activities.

For my family, today marks my father’s birthday, and the day when my mother was diagnosed with pancreatic cancer 21 years ago.  For me, it’s a time to reflect on how far we’ve come in our understanding of the disease, and how far we have to go.

The advent of the genomic era brought with it a slew of technologies that fundamentally changed our understanding of pancreatic cancer. Affymetrix GeneChips let us identify genes that were differentially expressed in pancreatic cancer; Next Generation Sequencing, Whole Genome Sequencing, Whole Exome Sequencing and RNASeq helped us see the mutational landscape of pancreatic cancer, and much more.

The first of these discoveries was the PanIN (Pancreatic Intraepithelial Neoplasia) model that describes the early neoplastic changes that occur in pancreatic cancer. These early lesions had been described nearly 100 years earlier, and had been known by various names including ductal hyperplasia, hypertrophy, metaplasia and dysplasia, but a progressive model that described the underlying genetic changes had never been attempted. In 2000, Ralph Hruban of Johns Hopkins outlined the histopathologic changes and identified mutations in KRAS, CDKN2A, TP53, and SMAD4 as drivers in this process in his paper entitled “Progression Model for Pancreatic Cancer”.



In a follow-up paper entitled “Update to Pancreatic Intraepithelial Neoplasia”, Hruban described how the progression model had been used to create genetically engineered mouse models, which are essential to helping researchers create and test new drugs. He also described how the model could be used for improved early diagnostics.

In 2002, Christine Iacobuzio-Donahue used Affymetrix GeneChips to identify differentially expressed genes in pancreatic cancer that might be used to help diagnose the disease. This paper, entitled “Discovery of Novel Tumor Markers of Pancreatic Cancer using Global Gene Expression Technology”, identified 97 differentially expressed genes that could potentially be used as biomarkers in future diagnostic tests.

This early research gave us some clues about the early progression of the disease and potential diagnostics, but we still didn’t have an appreciation for the genetic complexity of pancreatic cancer until 2008, when Sian Jones of Johns Hopkins published a paper entitled “Core signaling pathways in human pancreatic cancers revealed by global genomic analyses” [Jones, et al]. The paper used a limited number of tumor samples (n=24) to identify an average of 63 genetic alterations per tumor.

The genes identified in the paper fell into the following categories/pathways: KRAS signalling, TGFB signalling, JNK signalling, integrin signalling, Wnt/Notch signalling, hedgehog signalling, control of G1/S phase transition, apoptosis, DNA damage control, small GTPase signalling, invasion, and cell-cell adhesion.

A subsequent paper, “Distant Metastasis Occurs Late during the Genetic Evolution of Pancreatic Cancer” [Yachida & Jones, et al] published in 2010, established a timeline for the progression of pancreatic cancer of over 20 years, thus providing us with a longer potential window of opportunity to diagnose and treat this disease.


A follow-on paper, also by Yachida, “Clinical significance of the genetic landscape of pancreatic cancer and implications for identification of potential long-term survivors” [Yachida et al], further established how alterations in KRAS, CDKN2A, TP53, and SMAD4 (the most commonly mutated genes in pancreatic cancer) can directly influence patient outcomes.

Additional tools began to make their way into the lab and helped us gain a better understanding of the importance of epigenetic changes in driving pancreatic cancer. We were beginning to understand how a gene like CDKN2A could become inactivated in pancreatic cancer through promoter hypermethylation, as described in “Hypermethylation of multiple genes in pancreatic adenocarcinoma” [Ueki et al].

And beyond epigenetics, we were beginning to see the roles that microRNAs play in pancreatic cancer, sometimes acting as tumor suppressors and inhibiting invasion and migration. These new potential drug targets also brought with them a whole new potential therapeutic class: oligonucleotides, stretches of man-made RNA that could bind to microRNAs and interfere with them in ways that small molecules could not. In addition, researchers began exploring how circulating microRNAs could be used as diagnostic tools in pancreatic cancer.

These new tools brought with them the promise of new diagnostics, and new therapies, and a deeper understanding of the disease necessary to begin to make progress.  In the posts that follow, we’ll take a look at some of the new pathways that were discovered, the role of familial genetics and smoking in pancreatic cancer, and the promise of precision medicine and pancreatic subtypes.  We’ll also take a closer look at the pipelines of drug companies both large and small, and what promises they hold for the pancreatic cancer patients of tomorrow.


Posted in Cancer Research, pancreatic cancer

PubMed vs EuropePMC: Let’s Get Ready To Rumble


For most researchers, PubMed is the go-to resource for all biomedical literature. But from a programmatic standpoint it has some real challenges that make it difficult to integrate into many informatics applications.  Let’s take a look at a typical application.

Suppose we have an internal application used for target identification and tracking, and we want to add the ability to perform literature searches, and add selected hits to a specific target.

To do this using PubMed’s eUtils API requires two calls, both of which return overly verbose results that must be parsed.  (Click on the Sample Call links below to see an example of what the call looks like, as well as the server response).

  1. Perform a search, and get a list of PubMed IDs. [Sample Call]
  2. Fetch the PubMed records, allow the user to review them, and then save the selected records. [Sample Call]
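
The two calls above can be sketched in Python roughly as follows. This is a minimal illustration rather than the code behind the sample links: it assumes the public E-utilities `esearch.fcgi` and `efetch.fcgi` endpoints, and the helper names (`esearch_url`, `efetch_url`, `parse_ids`, `parse_titles`) are made up for this example. It only builds the URLs and parses response bodies, so it can be tried without network access.

```python
import json
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities (assumed here; check NCBI's documentation).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, retmax=20):
    """Step 1: build the search URL. The response contains IDs only."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def efetch_url(pmids):
    """Step 2: build the fetch URL for the full records (returned as XML)."""
    params = {"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


def parse_ids(esearch_body):
    """Extract the list of PubMed IDs from an esearch JSON response body."""
    return json.loads(esearch_body)["esearchresult"]["idlist"]


def parse_titles(efetch_body):
    """Map PMID to article title from an efetch XML response body."""
    root = ET.fromstring(efetch_body)
    titles = {}
    for article in root.iter("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID")
        titles[pmid] = article.findtext("MedlineCitation/Article/ArticleTitle")
    return titles
```

In an application, each URL would be passed to an HTTP client (for example `urllib.request.urlopen`) and the response body handed to the matching parse function. Note that a title is only available after the second call.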

The first problem is that the search only returns IDs and search metadata. It doesn’t return titles, or abstracts, or anything else that a user might find useful in making a decision about which article to download or view.

The second problem is that the results when fetching PubMed articles are too verbose. The response is only available in XML, not JSON, and this has a performance impact. For example, all of the dates found in the record appear as separate tags.

Rather than:
<date-created date="2017-01-27"/>
or even
<date-created year="2017" month="01" day="27"/>

That’s 79 characters vs 33 or 47 (depending on which format you prefer).

A simple author name appears like this:

<Author ValidYN="Y">

when this would do:

<author lastname="Tao" firstname="Huimin" initials="H"/>

That’s 107 characters vs 56.

On the surface these seem like niggling complaints, but record size affects the speed and responsiveness of your application, as well as the amount of memory and processing power required to parse the data, so it has some serious implications. For each author or date you could cut the number of characters roughly in half.
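
The character counts quoted above can be checked with a few lines of Python. This sketch measures only the compact forms shown in the post; the sizes of the verbose PubMed forms (79 and 107 characters) are taken from the text as given, not re-measured.

```python
# Compact forms proposed in the post.
compact_date = '<date-created date="2017-01-27"/>'
split_date = '<date-created year="2017" month="01" day="27"/>'
compact_author = '<author lastname="Tao" firstname="Huimin" initials="H"/>'

# Sizes of the verbose PubMed forms, as quoted in the post (assumed, not measured).
VERBOSE_DATE_CHARS = 79
VERBOSE_AUTHOR_CHARS = 107


def saving(verbose_chars, compact):
    """Fraction of characters saved by switching to the compact form."""
    return 1 - len(compact) / verbose_chars


for label, verbose_chars, compact in [
    ("date (single attribute)", VERBOSE_DATE_CHARS, compact_date),
    ("date (three attributes)", VERBOSE_DATE_CHARS, split_date),
    ("author", VERBOSE_AUTHOR_CHARS, compact_author),
]:
    print(f"{label}: {len(compact)} vs {verbose_chars} characters, "
          f"{saving(verbose_chars, compact):.0%} smaller")
```

The savings work out to somewhere between about 40% and 60% per element, which is where the "roughly half" figure comes from.
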

Aside from the verbosity of the results though, PubMed does not attempt to text mine abstract data. The record does not contain gene, protein, pathway or compound information which would make it truly useful in a drug discovery or literature mining application. The closest we come to getting article metadata are the MeSH (Medical Subject Headings) terms.

Although BioGroovy makes it easy to search, download and parse PubMed records, it (like other libraries and applications) is not immune to the limitations of the eUtils API.


Perhaps the best alternative to PubMed is EuropePMC.  The database includes both PubMed abstracts, and PubMed Central full text articles.  The EuropePMC API provides you with both XML and JSON response formats. Let’s take a look at our previous algorithm, and how EuropePMC’s API differs from PubMed’s.

  1. Perform a search. [Sample Call]
  2. Fetch the selected records [Sample Call]
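
For comparison, here is a similar sketch for EuropePMC. It assumes the public REST `search` endpoint with `query`, `format` and `pageSize` parameters, and fetches a single record by reusing the search with an `ext_id:` query; the helper names are hypothetical and the sample links in the post may use different calls.

```python
import json
from urllib.parse import urlencode

# EuropePMC REST search endpoint (assumed here; check the EuropePMC docs).
EPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def epmc_search_url(query, page_size=25):
    """Step 1: build a search URL that asks for JSON results."""
    params = {"query": query, "format": "json", "pageSize": page_size}
    return f"{EPMC_SEARCH}?{urlencode(params)}"


def epmc_record_url(pmid):
    """Step 2: fetch one selected record by querying on its PubMed ID."""
    return epmc_search_url(f"ext_id:{pmid} src:med", page_size=1)


def parse_results(body):
    """Return the list of result records from a search response body."""
    return json.loads(body)["resultList"]["result"]
```

Both steps go through the same endpoint and return the same record shape, so one parser covers the search hits and the selected record.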

One of the first things you’ll notice is that the search results actually contain useful information.  In the sample below, we can see a title, the DOI, a well-formatted author, the journal. We can even see if the article has text-mined terms associated with it.

id: "28094263",
source: "MED",
pmid: "28094263",
doi: "10.1038/nrclinonc.2017.3",
title: "Pancreatic cancer: Pancreatic cancer cells digest extracellular protein.",
authorString: "Sidaway P.",
journalTitle: "Nat Rev Clin Oncol",
pubYear: "2017",
journalIssn: "1759-4774; 1759-4782; ",
pubType: "journal article",
isOpenAccess: "N",
inEPMC: "N",
inPMC: "N",
hasPDF: "N",
hasBook: "N",
citedByCount: 0,
hasReferences: "N",
hasTextMinedTerms: "N",
hasDbCrossReferences: "N",
hasLabsLinks: "Y",
epmcAuthMan: "N",
hasTMAccessionNumbers: "N"


What makes this especially useful is the results can easily be used in a user interface, and contain enough information to allow a user to determine whether or not the article is potentially useful.
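
As a rough illustration of how little work is needed to put a hit on screen, the sketch below turns one result record into a single display line. The field names come from the sample record above; the `summarize` function and the choice of flags to show are assumptions made for this example.

```python
# Subset of the sample record shown above.
sample_record = {
    "pmid": "28094263",
    "doi": "10.1038/nrclinonc.2017.3",
    "title": "Pancreatic cancer: Pancreatic cancer cells digest extracellular protein.",
    "authorString": "Sidaway P.",
    "journalTitle": "Nat Rev Clin Oncol",
    "pubYear": "2017",
    "isOpenAccess": "N",
    "hasPDF": "N",
    "hasTextMinedTerms": "N",
}

# "Y"/"N" flags worth surfacing to the user, with display labels (illustrative).
FLAG_LABELS = {
    "isOpenAccess": "open access",
    "hasPDF": "PDF",
    "hasTextMinedTerms": "text-mined terms",
}


def summarize(record):
    """Build a one-line, human-readable summary of a search hit."""
    line = "{authors} {title} {journal} ({year}).".format(
        authors=record.get("authorString", "[no authors]"),
        title=record.get("title", "[no title]"),
        journal=record.get("journalTitle", "[no journal]"),
        year=record.get("pubYear", "n.d."),
    )
    if record.get("doi"):
        line += " doi:" + record["doi"]
    flags = [label for key, label in FLAG_LABELS.items() if record.get(key) == "Y"]
    if flags:
        line += " [" + ", ".join(flags) + "]"
    return line


print(summarize(sample_record))
```

No second request is needed to show the user a title, author, journal and year, which is the main practical difference from the eUtils search call.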

You can also fetch text-mined terms, such as genes, diseases, and chemicals, from EuropePMC records. [Sample Call]  For example, in the previous call we’re returning all terms from a particular record. One of those terms is a record for the chemical taxol, which is used as a chemotherapeutic agent. The compound metadata includes information from the ChEBI chemical database.


Posted in Bioinformatics, Informatics

A New Year, A New Site, A New Service

It’s the start of a new year, and nothing says New Year like a website refresh. With the rise in the number of visitors to the site using mobile browsers, we’ve updated the site to make it more mobile friendly. Not only is it more easily viewable on smartphones and tablets, but you can add it to your home screen just like any other app.


That last bit is important because we’re also announcing the launch of a new service simply called Aspen Gene. With it you can look up information on any gene. The service is powered by the web service developed by Chunlei Wu at The Scripps Research Institute’s Su Lab.  To visualize the data, we’ve developed a series of web components and are making them available through the new open source BioPolymer project.


The Aspen Gene Search interface

Let’s take a look at the service. You start by entering the symbol for a gene of interest, in this case KRAS. Then tap the “Search” button to start the search.  The search results will appear as a series of cards at the bottom of the screen. Tap on the arrow icon on a result card, and the gene summary will appear as shown below.
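
For readers who would rather script the same lookup, here is a hedged Python sketch. It assumes that the underlying web service is MyGene.info and that its v3 query endpoint accepts the q, species, and fields parameters; it is not the code that Aspen Gene itself runs, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the gene web service is MyGene.info, queried through its v3 API.
MYGENE_QUERY_URL = "https://mygene.info/v3/query"


def build_gene_query_url(symbol, species="human",
                         fields=("symbol", "name", "alias", "summary")):
    """Build a query URL that looks a gene up by its official symbol."""
    params = {
        "q": f"symbol:{symbol}",
        "species": species,
        "fields": ",".join(fields),
    }
    return f"{MYGENE_QUERY_URL}?{urlencode(params)}"


def search_gene(symbol):
    """Run the query and return the list of hits (needs network access)."""
    with urlopen(build_gene_query_url(symbol)) as response:
        return json.load(response).get("hits", [])
```

Calling search_gene("KRAS") would return the matching records, each of which could be rendered as one result card.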

Gene Summary

The Summary tab provides an overview of the gene, including its symbol, synonyms, and IDs in related databases.  We’re currently linking to NCBI’s Entrez Gene database, Online Mendelian Inheritance in Man (OMIM), the HUGO Gene Nomenclature Committee (HGNC), UniGene, and PharmGKB.  You can tap on the icon to the right of a field to open the record in a new window.


Gene Summary Tab

Protein Information

The Protein tab shows the UniProt ID, along with a list of InterPro domains found in the protein.  You can tap on any of the domains to see more information.  The Protein Database section shows a list of PDB IDs. You can tap on any ID to display the associated protein structure.


Protein Information Tab

Pathway Information

The Pathways tab shows a list of all of the pathways that the gene participates in. This includes entries from KEGG, Reactome, PharmGKB, WikiPathways, and more. You can tap on any pathway name to see a diagram of the pathway.


Pathway Information Tab



Publication Information

The Publications tab shows a list of GeneRIF publications. GeneRIFs (Gene References Into Function) are short annotations, each tied to a paper that describes the function of a gene, and are found in the NCBI Entrez Gene database. Tap on a card to display the PubMed record for the article.


Publication Information Tab

So, come visit us, and give our new site (and our new service) a try!

Posted in Bioinformatics, Informatics, Science Blogging

Drug Target Identification: And then a miracle occurs


At the front end of most drug discovery programs lies a step called Target Identification. A few months ago I sat down with a colleague to discuss their approach to it, and in particular the question “how do you characterize a target?” I was surprised at how much that process can vary from company to company.

As I set out to describe my workflow for this blog post, I was reminded of this cartoon, and how much work goes on between the starting point and the end point when researching the function of genes.

I should preface what I’m about to say with the words “this is the way I work.” Your goals and tools might be different, and I’m always curious about the way other people work, so please feel free to comment.

At a macroscopic level (regardless of your ultimate research goals) there are three levels of research:

  1. Foundational Research: where you familiarize yourself with the general “landscape” of a particular research topic.
  2. Deep Dive Research: where you examine certain concepts exposed in step 1 in-depth.
  3. Current Research: where you create a “surveillance” program to keep yourself up-to-date with the latest developments in a particular area of research.

In the examples which follow, I’ll be showing you the steps that I take and the tools that I use to learn more about the target space for pancreatic cancer.

Foundational Research
My goal at this stage in the game is to answer the following questions:

  • What is the etiology of the disease? (What syndromes predispose people to the disease and what percentage of the patient population do they account for?)
  • How does the disease progress? (What are the clinical stages?)
  • What genes & proteins are involved in the progression of the disease?
  • What pathways & disease processes do they participate in?
  • What is the current standard of care, and what genes are targeted by that standard of care?
  • Who are the thought leaders in this area?
  • Is research in this area heating up?

Since my workflow is very disease-centric, I usually start by searching the OMIM database.  OMIM provides a good overview of the disease, with information on the genes involved and the relevant literature.  Recently, I’ve also added Wikipedia to the list.  I’ve been pleasantly surprised by the depth of information available on Wikipedia, both for diseases and for genes.  In addition to these more general sources, the National Cancer Institute’s PDQ site provides a good overview of the clinical stages of the disease, and the standards of care applied at each stage.  This information is critical for two reasons: it gives discovery scientists insight into the clinical presentation of the disease, and it makes it possible to design a drug or cocktail that targets a particular patient population.

For the literature itself, my usual starting point is PubMed, where I begin by looking for review papers on the topic of interest.  In this case my query looks like this:
(pancreatic cancer) AND "review"[Publication Type]

You can further restrict the results by limiting hits to the last few years. Sorting by publication date also helps focus your attention on the latest developments.  You’ll find more tips and tricks for using PubMed here.
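
The same search, with the date restriction and sort order described above, can also be expressed as a call to NCBI’s E-utilities. The Python sketch below only builds the ESearch URL. It assumes the db, term, datetype, mindate, maxdate, sort, retmax, and retmode parameters behave as I understand them (in particular that "pub_date" sorts newest first), and the function name and defaults are my own.

```python
from datetime import date
from urllib.parse import urlencode

# Assumed location of the E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_review_search_url(topic, years_back=5, max_results=20, today=None):
    """Build an ESearch URL for recent review papers on a topic."""
    today = today or date.today()
    params = {
        "db": "pubmed",
        "term": f'({topic}) AND "review"[Publication Type]',
        "datetype": "pdat",                        # filter on publication date
        "mindate": str(today.year - years_back),   # earliest year to include
        "maxdate": str(today.year),                # latest year to include
        "sort": "pub_date",                        # assumed: newest first
        "retmax": max_results,
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


print(build_review_search_url("pancreatic cancer"))
```

Fetching that URL should return a list of PubMed IDs, which still have to be retrieved in a second call.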

As I read through the review papers, I compile a list of genes which I keep in two “piles” — targets and biomarkers.  I also compile a list of pathways, and attempt to connect those to specific biological processes involved in the disease.

Gene-Centric vs Pathway-Centric vs Disease-Centric Workflows
When I first started out in this industry, I thought, perhaps rather naively, that drug discovery research always followed the same path, and consequently that every company used the same approach to identify new drug candidates.  However, I quickly learned that this wasn’t the case.

Some companies used a traditional compound-centric approach to drug discovery.  They would screen a compound through a particular target panel, find some interesting binding characteristics for a target, and then back-track to an indication or set of indications.

In a gene-centric approach, the process starts with a gene.  The function of the gene is determined (at least initially) by the Gene Ontology terms, by literature, by sequence homology, by protein domain, etc. Depending on the drug class (small molecule vs peptide or antibody, siRNA, etc) certain types of genes/transcripts/proteins may be more or less amenable to being addressed. For example, antibodies may be more appropriate for targets that have extracellular domains to which the antibody can attach.

A few years ago, Novartis espoused a more pathway-centric approach to drug discovery, the aim of which was to use signaling pathways to help identify new targets, whether for monotherapies, for drug cocktails, or for repurposing existing drugs.

In a disease-centric approach, the disease biology, and the genes that drive that biology, are used to set the strategy for therapeutic development.  This approach, originally pioneered by organizations with a vested interest in research in particular disease areas, appears to be the most promising.  These organizations, which I loosely classify as “Translational Medicine Companies”, have a great deal of knowledge and experience in a particular indication, and thus tend to take a systems biology approach to identifying potential targets and drug candidates.  Organizations like the Michael J. Fox Foundation, and globalCure (an initiative of the Translational Genomics Research Institute to find new treatments for pancreatic cancer) spring to mind.


Posted in Bioinformatics, Cancer Research, Drug Development, Informatics, pancreatic cancer, Science Blogging