Patient-Specific Pathway Analysis Using PARADIGM Identifies Key Activities… – Josh Stuart

Josh Stuart:
I want to thank the organizers for inviting me. So I’m going to talk about our method
called PARADIGM, which integrates multiple types of data on patient samples for inferring
what’s going on in these cancers. So as folks know in TCGA, we generate lots and lots of
data and it’s often referred to as a flood. The Broad calls their system Firehose [spelled
phonetically] for an appropriate reason. And my point is that when you participate in these
projects you often want to do lots of different types of comparisons, from comparing expression
and methylation to figure out why something’s not expressed to, you know, looking at the
copy number and expression and methylation all together. This quickly gets out of control.
You have lots of combinations of things you want to look at and it can be overwhelming.
More importantly, when you’re thinking about a gene and trying to figure out what’s going
on with that gene is it active, is it not active, you’ve got all these different pieces
of data telling you different things, you feel like you’re at this stop light and you
don’t know whether to go or not. Well, at least that’s how I feel and this is often
how many of us feel [laughs]. This is what it makes you want to do. This is your brain on all these types of data.
So our particular approach is to say let’s use a knowledge based approach and the analogy
I like to use is you’re kind of like a detective or a car mechanic in this example and each
patient is a different accident, let’s say. And something different went wrong and some
things are more serious than others here. And if you could try to do data mining on
these car wrecks, if somebody handed you a ream of data, how fast the car was going,
the direction, what people were saying. Some of it’s relevant; some of it’s not relevant.
You’re going to be better off if you use knowledge about how the system works, and I like Car
Talk so I’m showing Click and Clack here, right. People call them because they know
a lot about cars and can figure out and diagnose the problem. And this cartoon shows a radiator
running off and the mechanics looking in the engine and saying, “I know what the problem
with the car is, that you don’t have a radiator.” Now you laugh, but with this data set, you
know, it took a little bit of knowledge in this case to know what was missing.
So in cellular systems obviously we have put
together at least some of the circuitry and the machines inside cells and so we should
use those. And I’m going to show you a system that defines a computational model to represent
these types of systems and we benefit from all these efforts out there and there are
many I didn’t list, that’s the ellipses at the end. We’ve drawn from Reactome, KEGG, BioCarta,
NCI PID, many different institutions, and our favorite of course combines all of them: Pathway
Commons from Memorial Sloan-Kettering. And so we try to suck in all that data to learn
something about what’s going on in a cell.
So to motivate why we want to do this beyond
just data fusion just think of a simple example, we’ve got a transcription factor and you’re
looking at the expression of the transcription factor and there’s let’s say three different
transcription factors shown here. You know, you’ve got two that have high expression
shown in red and one that’s lower expression. And we know that expression isn’t everything
and so it’s almost a teleological argument, but how do you figure out whether something’s
working or not? How would you figure out that an enzyme’s working? You know, even if you
had magic goggles and you could look inside a cell and see that it’s bumping around and
moving in a cell and chewing things up, you’re going to look at its secondary effects. You’re
going to look, did it actually metabolize substrate? Or did it, it’s a kinase, did it
actually phosphorylate a target? And for a transcription factor, is it turning on its
targets? Right, and so that secondary evidence tells you something about the activity of
the transcription factor so in this case you assume or you infer that the transcription
factor’s on and that might confirm your expression evidence.
In another case you might see that, oh, well,
the targets aren’t doing anything downstream of the factor and in this case you would think
it’s off, either post-transcriptionally or even translationally. We didn’t activate
this protein, or it’s not localizing correctly, or there’s a mutation that’s blocking
its function, or its co-activators, right, aren’t around. On the reverse, you could have
a low level of expression of the factor and yet it’s still enough to have potent transcriptional
activity. So you want to look around the neighborhood, is the argument here, to figure out what’s
going on in these things.
So that’s one piece of the puzzle, to look at neighbors. And the
other idea too is, you know, in this previous
example we inferred that the factor was on because of its downstream targets. But suppose I ask
Gady [spelled phonetically] to give me GISTIC plots. Now this is a different type of data,
copy number data, and just serendipitously, all the targets are amplified now. And so
I could explain away that overexpression via amplification, and so I’m less likely
now to think that the factor’s on. Maybe I still think, you know, over
my prior expectation, it’s on. But it’s not as high anymore because I have another piece
to explain the up-regulation of those targets, via a cis-regulation type of machinery.
So to model those two pieces of information,
we’re also standing on the shoulders of giants here. There’s been lots of development in
the ’80s and ’90s and even currently by seminal work in the field from Judea Pearl and Heckerman
in the early ’90s and more recently by Daphne Koller and Nir Friedman and Eran Segal [spelled
phonetically]. There are lots of people in this list and I would recommend folks read this
really nice review article by Nir Friedman in Science in 2004, so it’s getting dated,
but it’s still a very nice read. So these Bayesian networks and probabilistic graphical
models that they describe give us a very nice way of modeling lots of different data and
dependencies and we can — we can learn something from data where we might have had a knowledge
bottleneck before.
And so just a simple example here, let’s go
back to the diagram we had from the nice work from Sloan-Kettering and the
GBM study, and we have an oncogene, MDM2, that is known to inhibit p53. So there are two
parts to this system that we model: one has to do with the regulation of MDM2’s activity
and the other part has to do with the interaction between it and p53. And just as a quick toy
[spelled phonetically] example of the model that we have: when you see our activities
for genes, it’s actually a little bit more of a rich representation that looks something
like the central dogma for a gene, right? You have a certain number of
copies in the genome, you can express it, you can have a certain level of protein and
a certain activity in that protein, and all these variables are beliefs that you infer
from data, and these little black boxes show you constraints that help you infer those
beliefs from data or from other beliefs in the system. And you can propagate this information
to infer something about a higher level thing like apoptosis or activities for these genes.
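As an illustration of the explaining-away argument from the amplified-targets example, here is a minimal Python sketch. It is not PARADIGM itself: the three-variable network, the names (P_TF_ACTIVE, P_AMPLIFIED, P_TARGET_UP) and every probability are made-up assumptions, and the inference is plain brute-force enumeration rather than anything the talk describes.

```python
from itertools import product

# Hypothetical numbers, chosen only for illustration.
P_TF_ACTIVE = 0.3   # prior belief that the transcription factor is active
P_AMPLIFIED = 0.2   # prior belief that the target gene is amplified

# P(target overexpressed | TF active?, target amplified?)
P_TARGET_UP = {
    (True, True): 0.95,
    (True, False): 0.80,
    (False, True): 0.70,
    (False, False): 0.05,
}


def joint(tf_active, amplified, target_up):
    """Joint probability of one full assignment of the three variables."""
    p = P_TF_ACTIVE if tf_active else 1 - P_TF_ACTIVE
    p *= P_AMPLIFIED if amplified else 1 - P_AMPLIFIED
    p_up = P_TARGET_UP[(tf_active, amplified)]
    p *= p_up if target_up else 1 - p_up
    return p


def posterior_tf_active(target_up, amplified=None):
    """P(TF active | evidence); amplified=None means copy number is unobserved."""
    numerator = denominator = 0.0
    for tf, amp in product([True, False], repeat=2):
        if amplified is not None and amp != amplified:
            continue  # inconsistent with the observed copy number
        p = joint(tf, amp, target_up)
        denominator += p
        if tf:
            numerator += p
    return numerator / denominator


after_expression = posterior_tf_active(target_up=True)
after_copy_number = posterior_tf_active(target_up=True, amplified=True)
print(f"prior:                      {P_TF_ACTIVE:.2f}")
print(f"target overexpressed:       {after_expression:.2f}")
print(f"overexpressed + amplified:  {after_copy_number:.2f}")
```

With these toy numbers the belief that the factor is active rises from 0.30 to roughly 0.66 once the target is seen to be overexpressed, then falls back to roughly 0.37 once the amplification is also observed: still above the prior, but no longer as high.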
And those inferred activities are what we use for our downstream analysis.
And so the big picture looks like we take
a cohort of patients, various types of data, run it through our pathway models and then
we produce one matrix that we can now do analysis on. So we don’t have to think about all these
different modalities anymore, we can just think about whether the gene is active in the sample
and provide this new matrix for analysis.
So for the ovarian study, the obvious signature
here from the PARADIGM analysis was this FOXM1 signature. So when we zoomed in on this, all
the patients pretty much had an up-regulation of this known mitotic regulator, FOXM1. The
slightly more interesting story about it is that it has two isoforms and one part feeds
into proliferation, the other part feeds into DNA repair. And there’s a lot of disruptions
in the genome in all the ovarian samples; they’re getting constitutive signaling
through, like, ATM and ATR, turning on genes like FOXM1 that, if they’re not being spliced
correctly, are promoting two different, very opposite kinds of things that you want to
happen in a cell, both, you know, this proliferation switch and this DNA repair switch. So FOXM1
also regulates BRCA2, for example.
So, very interesting story surrounding FOXM1,
if you take the pathway activities and you try to define subtypes for the ovarian samples
then the good news there was that we could actually start seeing a delineation of meaningful
subtypes so this purple cluster shows you that they have slightly better survival patterns
than the rest of the patients.
We’ve recently worked on the colorectal paper
led by Rajinder Kaul [spelled phonetically] and David Wheeler [spelled phonetically] and
in this case the story isn’t so much FOXM1 but activated MYC throughout, and that’s an
interesting piece of information. As we see in the mutation data and other types of genomic
perturbations, when TGF-beta signaling pathway genes are mutated, those all impinge on
this mis-regulation of MYC and that also bears out in the pathway analysis. And so one other
type of analysis that we’re doing with the pathways is we can take two groups of samples
or patients and look for markers of one subtype versus another say, and then hone in on sub-networks
that are markers for a particular cohort. And we’re working on this for the luminal-basal
comparison in the breast cancer model. And just to show you, this is the closest
we get to the dreaded hairball, but you can see that, you know, blue
is more expressed or more active in luminal. And you can see the expected sort of ER signaling
pathways and then you have some other intriguing pathways among the proliferative ones for
basal shown in red, like HIF1-alpha, for example.
So, the way we can use that hairball is to
do something like a master regulators analysis like Andrea Califano [spelled phonetically]
likes to do with ARACNE. You can look upstream in this example of a — of a basal marker
such as FOXM1, like I showed and sort of by chain of reasoning, up the regulation hierarchy
you see that there’s a polo kinase. And so the prediction there is that basal cells will
be more sensitive to a polo kinase inhibitor. And this actually pans out in a cell line
model shown in Joe Gray’s [spelled phonetically] lab with his cell lines. So this plot here
shows you sensitivity to a polo kinase inhibitor for basal and claudin-lows
contrasted against those in luminal cells. And the reverse is true as well. You can look
up a marker for luminal, like a luminal hub, and in this case it was an HDAC. And so the
prediction is that luminal cells would be more sensitive to an HDAC inhibitor, and that’s
what turns out to happen in these cell line models.
And you saw a nice example yesterday from
Sam Ng [spelled phonetically]. Just to go through that real quick because I wanted to
show you one more result that Sam didn’t have time to show. So he’s developed a clever method
where you can run our pathway analysis twice: once where you connect the gene
to its downstream targets and infer an activity for it, and once where you connect it to its
upstream regulators and infer an activity. And you just look at the difference to get what he
calls the discrepancy in the activities that are inferred. And he showed you an example,
sort of a positive control, for Rb. You can see that in the mutated cases, he’s seeing a
lower discrepancy, which corresponds to a loss of function event. And he showed you the pathway
surrounding these things.
So we’ve tried this for a few positive controls
and he showed you p53. And you can kind of squint and see that for the cases in red around
the circle plot, the tick marks are patients, sorry, I didn’t mention that, you can see
a lower activity being inferred. And so I asked Sam late last night actually, “Can you
please run this for the lung squamous results?” And as you saw before he was predicting for
NFE2L2 this known oncogenic gene that he’s getting a positive discrepancy. And there
are 30 mutations in CDKN2A and consistent with, you know, other deletions, homozygous
deletions in CDKN2A, he’s predicting loss of function. So that’s interesting.
But now the power is, and these are sort of
for more frequent events, but you can now start actually drilling into some of these
lower frequency events, and there are some intriguing stories I think in there.
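The two-run discrepancy idea described above can be sketched in a few lines of Python. This is only a toy under stated assumptions: the sign convention (downstream-run activity minus upstream-run activity) is assumed rather than stated in the talk, the function names are hypothetical, and the activity values are invented rather than taken from any PARADIGM output.

```python
from statistics import mean


def discrepancy(downstream_run, upstream_run):
    """Difference between the two inferred activities for one gene in one sample.

    Assumed convention: activity inferred with only downstream targets connected,
    minus activity inferred with only upstream regulators connected.
    """
    return downstream_run - upstream_run


# Invented activities for one gene: sample -> (downstream run, upstream run, mutated?)
toy_samples = {
    "sample_1": (-1.2, 0.4, True),
    "sample_2": (-0.9, 0.6, True),
    "sample_3": (0.3, 0.2, False),
    "sample_4": (0.1, 0.3, False),
}


def mean_discrepancy(samples, mutated):
    """Average discrepancy over the samples with the given mutation status."""
    scores = [
        discrepancy(down, up)
        for down, up, is_mutated in samples.values()
        if is_mutated == mutated
    ]
    return mean(scores)


mutated_score = mean_discrepancy(toy_samples, mutated=True)
wild_type_score = mean_discrepancy(toy_samples, mutated=False)
print(f"mutated:   {mutated_score:+.2f}")
print(f"wild type: {wild_type_score:+.2f}")
```

Under this convention, a clearly lower score in the mutated samples than in the rest would read as a loss of function call, and a clearly higher one as a gain of function call.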
And I wanted to just point out that some of its highest scoring discrepant genes now
are not the most frequent, right? So you actually have a HIF, a hypoxia-inducible
factor, up here in seven samples. Why would that be? And among these up here are going
to be possible new targets that you could go after for your druggable genome, for example.
So we even have a MAP kinase kinase up there that might be worthwhile. And on the other
end of the spectrum, there are some other loss of function events that we might
want to pay attention to.
So you might ask, “What do you do if you don’t
have good pathway models for genes? How can you infer activity? Or do these mutations
mean anything?” You can plot them against clinical information. And so this is just
sort of an overview of — you can show some phenotypic information against these pathway
activities and infer a connection between mutations or phenotypes. And just really quickly,
since I’m almost out of time, we’ve piloted this in the colorectal study
and you can cluster the mutations based on these signatures, and you can see, you can look
up, that APC and p53 tend to have the same correlations in the colorectal study, for
example. And it confirms that APC mutations are correlated with MYC activity, in this
case anti-correlated with the repressed targets of MYC. And on the other end of the spectrum
you have TGF-beta pathway mutations, so those cluster together. And in the middle you have
RTK and PI3 kinase pathway mutations.
So, the obvious idea here is if you have a
mutation in gene X and it looks like it’s associated with the same activities
in possibly different patients, perhaps it’s also acting in the same pathway,
based on this type of association analysis that Ted’s [spelled phonetically] doing. And
so I’m basically out of time. I’m going to skip to the end. Obviously we want to use
these to look across multiple cancers. The pathway activities give us a way to do that
and we’re working on pan-cancer analysis, a basal comparison to ovarian for the breast
work, and so on.
So I hope I showed you that we have a nice
model for integrating a lot of different data sets. We use knowledge about pathways. We’re
trying to expand that with predicted interactions now. We can stratify patients with that, find
predictive sub-networks and so on and use it to predict hopefully more of these rarer
mutations. And the beliefs allow — the inferences allow us to connect cancers across different
data sets. And hopefully, the last slide that I just skipped there, it was just trying to
make a point that we can connect subtypes together, maybe get a clue about therapies.
So, I wanted to just say a special thank you to the Broad team here. They’ve got PARADIGM
working in Firehose [spelled phonetically] and this is not a trivial feat. And a lot
of these big network methods, by the way, take a lot of CPU time to run, so this is really
nice that it’s going to put the results in the hands of the public, actually. And so you
don’t have to go off and implement these yourself.
And this is my group that worked on the integration
analysis. I’ve highlighted the work of the folks circled there, especially Sam Ng who
you saw speak earlier. And this is work in collaboration with David Haussler who actually
heads the whole team, and Chris Benz. And Jane Ju [spelled phonetically] ran a tutorial yesterday
and she runs the engineering staff. So thank you and I’ll take any questions. Sorry I went
a couple minutes over.
[applause]
Male Speaker:
Time for one quick question for Josh.
Josh Stuart:
Crystal clear.
Male Speaker:
No, okay, well I’m sure he’d be happy to take it up over coffee if something emerges. So
thank you, Josh.
