Query: now that I’m a video master, would people like videos of these lessons? I don’t see how they’d help much, but who knows. Let me know below or via email.
Next up is survival analysis, a.k.a. time-to-event analysis. We could treat times to events as regular numbers, and use regression, or even tobit regression, or the like, except for a twist. And that twist is called censoring.
Suppose we’re studying when people hand in their dinner pails for the final time after shooting them up with some new Ask-Your-Doctor-About drug. Some will, during our study, gladden the hearts of undertakers, while others will have frustratingly remained above ground. Yet we know they have an unbreakable appointment with the Lord. We know the keeling-over times of the dead, but only the so-far times of the living. These so-far times are said to be censored.
There are other kinds of censoring, but today all we want to do is this so-called “right” censoring. (Wrong censoring happens in your more mature Western democracies.)
The first problem is finding useful software. Predictive methods are not yet so common that every package contains them. And the rare ones that exist rely on MCMC-type methods, about which more below. But if you don’t recall why these creatures are not what most think, then you must review: this and this at the least, and this too for good measure.
We’ve been using rstanarm, and it has a method that sorta kinda doesn’t not work, called stan_jm, for joint longitudinal-survival models. Longitudinal models measure things over time, like time-series. There is a prediction method for this model, but it only produces predictions for the longitudinal part. There are some laborious workarounds, but our point here is not software per se, but understanding the philosophy of models. So we’ll leave it behind.
There is also spBayesSurv, which works, and which allows spatial processes to join the fun. The difficulty with it is that you have to work directly with design matrices, which aren’t especially hard to grasp, but again the code requirements will become a distraction for us.
So we’re going to use brms. The weakness here is resources. It is a memory hog (which is why I’ve been avoiding it up to now). It externally compiles models before running them. The first time I tried this for the model below, I had several other instances of R running, along with a video editor, and it locked up my system. Close extraneous programs before beginning.
Obligatory anti-MCMC (mini) rant. Since you reviewed, or you remembered the cautions, you recall MCMC doesn’t do what it says, not exactly. The problem with it is not that useful answers can’t be extracted from MCMC methods—of course they can. The problem is that people find it so useful I fear not enough thinking is going into finding better analytical approximations to complex integrals, solutions which would bypass all these fictional “random” simulations. For one, we could learn to embrace discrete finite models, which are exact, and not approximations as all continuous models are (and which everybody forgets because of the Deadly Sin of Reification). Bonus: discrete finite models don’t need integrals, thus don’t need MCMC. End of rant.
The Code
install.packages('brms', dependencies=TRUE)
library(survival)  # classical survival tools; also ships its own kidney data
library(brms)      # Bayesian regression models via Stan
We’ll use the built-in kidney data. Type ?kidney to learn about it. It has a time to event (infection), a censoring indicator, age, sex, and disease type. Beware: both survival and brms ship a dataset named kidney, so it pays to be explicit about which one we mean. The package authors already wrote the model code for us, to which I make only one change: assigning the data to x (for consistency).
x = brms::kidney   # be explicit: the survival package also has a kidney dataset
fit = brm(time | cens(censored) ~ age + sex + disease, data = x,
          family = weibull(), inits = "0")  # newer brms spells the last argument init
The code form is mostly familiar from other models, except for the addition of time | cens(censored) to indicate this is time given censoring. The “weibull” is to characterize uncertainty in the time. ?brmsfamily shows the other options. When run, this will first show “Compiling the C++ model”. It could take considerable time here, too; minutes, maybe, depending on your resources. Then the MCMC bits begin.
As before, we could take time to examine all the MCMC diagnostics, which give information about the parameters. This should be done. Do plot(fit). Look for “convergence”. I won’t do that here, because this example works fine.
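For the record, here is a minimal sketch of the usual checks (summary(fit) is my addition; plot(fit) was mentioned above):

plot(fit)     # trace and density plots for each parameter
summary(fit)  # look at Rhat: values near 1 are the usual "convergence" signal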
In the end, we do not give a rat’s kidney about the parameters. They do not exist. They are mathematical struts that make the model work. That and nothing more. As always, we care about this:
Pr( time in t | New age, sex, disease, D, M) (1),
where t is some time of interest, where we make guesses of new values of the measures, where D is the old data, and M the model. And where we remember that the priors and the MCMC suppositions all form the model M. Change the model—change any part of the model—change the probability! This is not a bug, it’s a feature. Change anything on the right hand side, change the probability. All probability is conditional.
We want predictions. Which is to say, we want equation (1). Which means we must supply guesses of age, sex, and disease type. If we didn’t want to specify guesses of age, sex, and disease type, we shouldn’t have put them into the model. Luckily, we have a ready supply of such guesses: the old data. The probabilities produced by (1) will not be for these old patients, though (unlike the supposition of classical hypothesis testing). We already know all about them. They will be for new patients who are “like” the old ones, where “like” is defined by us: an implicit right-hand-side assumption. It is there even though it doesn’t appear visibly.
p = predict(fit, probs = c(0.10, 0.90))
The probs = c(0.10, 0.90) is not the default, which instead is the old familiar 95%. But why on earth would we want 95% prediction intervals? That’s a mighty harsh requirement for time predictions. It produces great uncertainty; why shouldn’t it? Such an interval says “Conditional on the model and so on, there is a 95% chance this patient will have an event in this time interval.” Considering a 100% chance would be the interval (0, infinity), you can see a 95% interval would be wide, too. The default is there only because old habits die hard. A quick comparison is below.
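To see the widths for yourself, compare the default bounds with the 80% ones (a quick sketch; p95 and p80 are my names, nothing official):

p95 = predict(fit)                        # default probs = c(0.025, 0.975)
p80 = predict(fit, probs = c(0.10, 0.90))
head(cbind(p95[,3:4], p80[,3:4]))         # the 95% bounds are far wider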
First few rows of p:
> p
       Estimate Est.Error       Q10       Q90
 [1,]  44.91669  54.83210  3.636403 105.41077
 [2,] 197.94483 223.35119 17.389400 478.37263
 [3,]  42.76444  51.70631  3.689502  98.58186
 [4,] 211.71675 233.77522 18.804205 494.05413
 [5,]  46.70474  58.45127  3.842455 111.61812
 [6,] 221.68699 263.71488 19.417692 519.37713
 [7,]  40.13264  52.69928  3.114666  95.80451
 [8,] 196.17201 237.87493 16.869121 470.98349
 [9,] 123.36390 146.84011 10.551705 294.04778
And the first few rows of x (which are matched to these p):
> x
  time censored patient recur age    sex disease
1    8        0       1     1  28   male   other
2   23        0       2     1  48 female      GN
3   22        0       3     1  32   male   other
4  447        0       4     1  31 female   other
5   30        0       5     1  10   male   other
6   24        0       6     1  16 female   other
7    7        0       7     1  51   male      GN
8  511        0       8     1  55 female      GN
9   53        0       9     1  69 female      AN
Doesn’t look so hot, this model. We have to be careful how we interpret its performance, though, because of the censoring (none of the first nine were censored, meaning all had the event).
Let’s look at the empirical cumulative distribution functions for the data, and for the point predictions, busted out by censoring.
plot(ecdf(x$time[x$censored==0]), main='ECDF', xlab='Time')
lines(ecdf(x$time[x$censored==1]), col=2)
lines(ecdf(p[x$censored==0,1]), col=3)
lines(ecdf(p[x$censored==1,1]), col=4)
grid()
legend('bottomright', c('X not-cens', 'X cens', 'P not-cens', 'P cens'),
       col=1:4, lwd=3, bty='n')
There is a clear difference in distributions of times for censored and uncensored data. What is assumed is that the times for the censored patients will be larger than what is seen (obviously). There is no censoring in the predictions, of course; the breaking out by censoring is only to show the matching points with the data. That is, the model is not predicting whether a new patient will be censored, for that concept has no place in guessing a person’s eventual time to event—which may be “infinite”, i.e. they never get an infection. Censoring only happens in limited-time studies.
This all means the predicted times must be larger than what was seen. The censored points “push out” the ECDFs to higher numbers. Thus (at least with a Weibull) the model tends to overpredict, in a way.
And what all that means is that we can’t really compare the model’s predictions with the observed data. The only way to verify this model is to test it on new times.
Wait. Build a model, make predictions, then test how well the model performs in real life? Where have we heard that before? You know it, baby. Rightched here. And in this gorgeous award-eligible book.
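For flavor, here is a crude sketch of what that testing could look like with a holdout, which is only a stand-in for genuinely new patients (my illustration, not part of the original analysis):

set.seed(42)                                  # any seed; only for repeatability
idx = sample(nrow(x), floor(nrow(x)/2))
fit.h = update(fit, newdata = x[idx,])        # refit the same model on half the data
p.h = predict(fit.h, newdata = x[-idx,], probs = c(0.10, 0.90))
ok = x$censored[-idx] == 0                    # coverage checkable only for observed events
mean(x$time[-idx][ok] >= p.h[ok,3] & x$time[-idx][ok] <= p.h[ok,4])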
So let’s examine the predictions themselves, knowing (as we knew for all our past efforts) that we’re limited to making statements only about the predictions and not their quality.
Here we run back into the screwiness of MCMC. Ideally, we’d specify a new age, sex, disease and compute (1), which would produce the same number (same prediction) for every duplicate entry of age, sex, and disease. Not so with MCMC, which produces hazy numbers. Run this:
i = order(x[,5], x[,6],x[,7]) # order by age, sex, disease
x = x[i,]
p = p[i,]
Then look at the first few rows of x:
> x
   time censored patient recur age    sex disease
5    30        0       5     1  10   male   other
36   12        0       5     1  10   male   other
24  132        0      27     1  10 female      GN
50  156        0      27     2  10 female      GN
6    24        0       6     1  16 female   other
14  536        0      15     1  17 female   other
37  245        0       6     1  17 female   other
68   25        1      15     2  17 female   other
31  119        0      35     1  22 female   other
57    8        0      35     2  22 female   other
1     8        0       1     1  28   male   other
33   16        0       1     1  28   male   other
20  402        0      22     1  30 female   other
72   24        1      22     2  30 female   other
The first two patients are the same! As are many of the others. At this point somebody will chirp up, “But those data are correlated! This is the wrong model!” Which, I have to tell you, is empty of meaning, or ambiguous. Correlated to us means that, when conditioned on, a probability changes. It does not mean cause. Since probability is conditional on the sort of model we choose, and on everything else on the right hand side, it is not clear how multiple measures on patients would change the probability. Other models are easy to explore; the package authors even thought of some. ?kidney will show them to you (scroll to the bottom).
We’re going to ignore the multiple-measures aspect (we’re not in this lesson trying to build the world’s most predictive model of kidney infections). Instead we’ll suppose, as happens, we have some rows of data that are the same. The first two rows of data are identical, as far as (1) goes. They have predictions:
> p
     Estimate Est.Error      Q10       Q90
[1,] 46.70474  58.45127 3.842455 111.61812
[2,] 46.26424  56.88700 3.454595 111.20644
Again, if these were analytical results, or non-“simulated” results, these rows would be identical. They are not. They’re close, and whether “close” is close enough depends on the decisions that would be made—and on nothing else. I’m going to say close enough. But you might not. If not, you have to find a way to merge them, either by some kind of averaging, say, or by working through the parent code and hacking the simulation to group like rows, or whatever. Far-apartness would then be an indication the model did not “converge”.
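Here is a minimal sketch of the averaging route, assuming plain group means will do (key and p.avg are my names):

key = interaction(x$age, x$sex, x$disease, drop=TRUE)  # one label per combination of measures
p.avg = apply(p, 2, function(col) ave(col, key))       # group means, same shape as p
head(p.avg)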
Relevance
Okay! So hypothesis testing is out. How do we test measures? Class? Class?
If you said relevance, you’re right! Good memory. Let’s first look at all the predictions in some useful way. (The reordering of x and p won’t matter.) We could do something like this.
jit = rnorm(nrow(p), 0, .2)
plot(x$age+jit, p[,1], pch=as.numeric(x$sex), col=as.numeric(x$disease),
     cex=2, ylab='Times', xlab='Age', ylim=c(0, max(p)))
grid()
for(i in 1:nrow(p)){
  lines(c(x$age[i]+jit[i], x$age[i]+jit[i]), c(p[i,3], p[i,4]),
        col=as.numeric(x$disease[i]))
}
legend('topleft', c('male', 'female', 'other', 'GN', 'AN', 'PKD'),  # 'GN', not 'CN'
       col=c('#999999','#999999',1:4), pch=c(1,2,NA,NA,NA,NA),
       lty=c(NA,NA,1,1,1,1), lwd=3, bty='n')
The jit adds a bit of jitter (which needs to be saved) to separate the points. Remember: we’re looking for differences in probability and not just point predictions.
The changes in probabilities for sex are obvious, as they are for diseases AN and PKD versus the other two. The changes in probabilities are not so great for age, except for two females with PKD (it’s the same two patients, measured twice each). These are the only females with PKD, and the suspicion is age doesn’t matter too much, but the combination of female and PKD does. I’m not a kidneyologist, so I don’t know what this means.
We can test these easily.
y = data.frame(age = seq(10,70,10), sex=rep('female',7), disease=rep('PKD',7))
p = predict(fit, newdata=y, probs = c(0.10, 0.90))
> cbind(round(p), y)
  Estimate Est.Error Q10  Q90 age    sex disease
1     1084      1884  56 2509  10 female     PKD
2     1022      1539  63 2481  20 female     PKD
3      951      1375  67 2353  30 female     PKD
4      914      1340  64 2176  40 female     PKD
5      942      1468  63 2279  50 female     PKD
6      876      1279  68 2066  60 female     PKD
7      901      1309  58 2140  70 female     PKD
So age does change the probability (and in a way that makes sense to naive readers like me). Is this change enough to make a difference? I have no idea, and unless you are a kidney guy, neither do you. I don’t know what kinds of decisions are pertinent. These kinds of decisions are not up to the statistician to make. There are no magic numbers in the predictive way. P-values presume to give probabilities and make decisions simultaneously. We won’t make that enormous mistake.
Homework
You can repeat the same thing but for sex and disease. If you know something about kidneys, let us know below.
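A starting point (my sketch, picking one fixed age and sex out of many possibilities):

y = data.frame(age = 40, sex = 'female', disease = c('other','GN','AN','PKD'))
cbind(round(predict(fit, newdata=y, probs = c(0.10, 0.90))), y)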
Then fit the second model, where it says (from ?kidney) “adding random intercepts over patients”. That’s a misnomer. Everything not known in a Bayesian analysis is “random”, which is nothing but a synonym for unknown.
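In brms formula syntax the patient-varying intercept looks like this (a sketch; the exact call at the bottom of ?kidney may differ in its priors and options):

fit2 = brm(time | cens(censored) ~ age + sex + disease + (1 | patient),
           data = x, family = weibull(), inits = "0")
p2 = predict(fit2, probs = c(0.10, 0.90))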
Compare directly the predictions (don’t forget you sorted p above) from both. We cannot say which of these models is better in a predictive sense per se: not until we get new data in. But what can you say?
Advanced readers should try this. brms is limited, unlike rstanarm, because its prediction method only spits out a point and prediction bounds. In rstanarm you get the whole distribution.
First pick a combination of the measures, and then a time you think is interesting. Suppose it’s 300. Now find the probability of exceeding that time with your given combination. This is trivial in rstanarm. Here you need to optimize. Next take a systematic series of measures (age, sex, disease) and plot the exceedance probabilities for this sequence. Different times will lead, of course, to different curves. The differences in those curves may be big or small depending on the decisions to be made conditional on the model. Proving, yet again, that the same model may be useful to one man, and useless for the next.
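One escape hatch (my sketch; it assumes your brms version lets predict() return the raw posterior predictive draws via summary = FALSE, which recent versions do):

y = data.frame(age = seq(10,70,10), sex = 'female', disease = 'PKD')
draws = predict(fit, newdata = y, summary = FALSE)  # matrix: draws by rows of y
p.exceed = colMeans(draws > 300)                    # Pr(time > 300 | measures, D, M)
plot(y$age, p.exceed, type='b', xlab='Age', ylab='Pr(time > 300)', ylim=c(0,1))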