Thursday, July 17, 2008
Chest has an article about the effects of treatment limitations in ICU patients on prolonged survival. I'm not going to discuss the article itself much: it's also well discussed in the July 2008 PC-FACS (although you have to be an AAHPM member to access it) and in an accompanying editorial in Chest. Instead I wanted to focus on its use of propensity scores, as the article is a good introduction to them.
Some background on the article. It's a single-institution retrospective cohort study which compared 60-day mortality between patients for whom there was some sort of order/decision to withhold a life-sustaining treatment in the ICU (e.g. vent, dialysis, pressors, CPR, etc.) - a withholding of life-sustaining treatment (WLST) decision - and patients who had no such decision/order. Patients who had any such treatment withdrawn were excluded, as were patients who wanted comfort-only care. There were ~2000 patients in the study; ~200 had a WLST decision. As you'd expect, the WLST patients were older and sicker, with higher in-ICU and in-hospital mortality than the non-WLST patients (16% vs 2%, 30% vs 5%).
The authors then created a propensity score model to describe the likelihood of having a WLST decision. Propensity scores (PS) are a way to try to minimize confounding differences between groups in observational research. Clearly one cannot do an RCT of WLST decisions. Instead all you can do is watch what happens to those who have a WLST decision and those who don't. Of course there are likely many confounding variables in such observations - things that are associated both with having a WLST decision and with death, like being older and sicker. It's not fair, say, to compare these older, sicker patients with the younger, healthier ones and conclude that the WLST decision was responsible for the increased mortality. What PS try to do is mimic an RCT by creating a model which predicts the likelihood of a subject getting an intervention (in this case WLST), then comparing outcomes between subjects who did and did not get the intervention but who had an equal chance of getting it in the first place (i.e. as if they had been randomized to the intervention or a control).
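To make that modeling step concrete, here is a minimal sketch of deriving propensity scores by fitting a logistic regression (the standard approach for this) to predict the 'intervention'. The cohort, covariates, and coefficients below are entirely invented for illustration - two variables rather than the paper's 69-variable model:

```python
# Hypothetical sketch: derive propensity scores by fitting a logistic
# regression that predicts the 'intervention' (a WLST decision) from
# baseline covariates. All data here are made up for illustration.
import math

# Hypothetical cohort: (age, APACHE II score, had_wlst_decision)
patients = [
    (67, 30, 1), (53, 14, 0), (80, 25, 1), (45, 10, 0),
    (72, 28, 1), (60, 18, 0), (85, 32, 1), (50, 12, 0),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def zscore(x, xs):
    # Standardize a covariate so gradient descent behaves well.
    mean = sum(xs) / len(xs)
    sd = (sum((v - mean) ** 2 for v in xs) / len(xs)) ** 0.5
    return (x - mean) / sd

ages = [p[0] for p in patients]
apaches = [p[1] for p in patients]
X = [(zscore(a, ages), zscore(s, apaches)) for a, s, _ in patients]
y = [p[2] for p in patients]

# Fit the logistic model by plain gradient descent.
w1 = w2 = b = 0.0
lr = 0.5
for _ in range(1000):
    g1 = g2 = gb = 0.0
    for (x1, x2), yi in zip(X, y):
        p = sigmoid(w1 * x1 + w2 * x2 + b)
        g1 += (p - yi) * x1
        g2 += (p - yi) * x2
        gb += (p - yi)
    w1 -= lr * g1 / len(X)
    w2 -= lr * g2 / len(X)
    b -= lr * gb / len(X)

# The propensity score is the fitted probability of a WLST decision.
ps = [sigmoid(w1 * x1 + w2 * x2 + b) for x1, x2 in X]
for (age, apache, _), score in zip(patients, ps):
    print(f"age {age}, APACHE II {apache}: PS = {score:.2f}")
```

In this toy cohort the older, sicker patients come out with higher scores, which is the intuition the worked example below is after.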
To clarify.... A multivariate model is created from as many data points as (hopefully) the researchers have. This model creates a score (the PS) which describes a patient's likelihood - within the cohort - of receiving the intervention (in this case a WLST decision). In this paper it was a 69-variable model and included things like demographics, markers of illness severity, etc. The model was derived from data from the subjects in this cohort and, again, predicts a subject's likelihood (propensity) of getting the intervention in question. A simplified example: a 67 year old white male with Medicare, admitted to the ICU on hospital day 4 with an APACHE II score of 30 and gram-negative sepsis, would have a PS of X - X being some number which means something to statisticians about how likely this patient is to have a WLST decision in this cohort. (A 53 yo woman with diabetic ketoacidosis and an APACHE II score of 14 would have a lower PS, for instance.) What you then do is take a patient with a WLST decision, derive their PS, then match them as closely as possible to a patient in your cohort who did not have a WLST decision but who has a nearly identical PS. The idea is, again, to mimic an RCT in the sense that - as far as your model is accurate - both of these patients had an identical 'chance' or 'risk' of having a WLST decision and it 'just so happens' that one did and one didn't. You repeat this matching across your entire sample of WLST patients, and because the patients in the WLST group and the matched non-WLST group ideally had an equal 'chance' of receiving the 'intervention', it's fair to then compare outcomes between the groups.
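The matching step can be sketched the same way. This uses greedy 1:1 nearest-neighbor matching without replacement, which is one common strategy (the paper's exact matching algorithm may differ); the patient IDs, scores, and outcomes are made up:

```python
# Hypothetical sketch of 1:1 greedy nearest-neighbor matching on a
# precomputed propensity score, then comparing outcomes between the
# matched groups. All IDs, scores, and outcomes are invented.

# (patient_id, propensity_score, died_by_day_60)
wlst = [("w1", 0.81, 1), ("w2", 0.65, 0), ("w3", 0.90, 1)]
controls = [("c1", 0.80, 0), ("c2", 0.30, 0), ("c3", 0.64, 1),
            ("c4", 0.88, 0), ("c5", 0.10, 0)]

def match(treated, pool):
    """Pair each treated patient with the as-yet-unmatched control
    whose PS is closest (greedy, without replacement)."""
    pool = list(pool)
    pairs = []
    for t in treated:
        best = min(pool, key=lambda c: abs(c[1] - t[1]))
        pool.remove(best)
        pairs.append((t, best))
    return pairs

pairs = match(wlst, controls)
for t, c in pairs:
    print(f"{t[0]} (PS {t[1]:.2f}) matched to {c[0]} (PS {c[1]:.2f})")

# With matched groups in hand, the outcome comparison is direct:
wlst_mort = sum(t[2] for t, _ in pairs) / len(pairs)
ctrl_mort = sum(c[2] for _, c in pairs) / len(pairs)
print(f"60-day mortality: WLST {wlst_mort:.0%} "
      f"vs matched controls {ctrl_mort:.0%}")
```

Note that the unmatched controls (c2 and c5, with very low scores) simply drop out of the analysis - which is why the study below shrinks from ~2000 patients to 400.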
So, to keep things concrete, in this study they took their 200 WLST patients and matched them 1:1 with non-WLST patients with nearly identical PS and then compared outcomes between the groups (which now total 400 patients and not the original 2000). What they found is that despite the now very similar baseline characteristics between the groups (age, demographics, indices of illness severity) the WLST patients had higher mortality in-ICU, in-hospital, at 30 days, and at 60 days (16% vs 6%, 32% vs 12%, 42% vs 22%, 51% vs 26%). (The authors were surprised that the difference in mortality extended so long and there is some hand-wringing about whether or not we are causing 'premature' deaths by WLST - see the editorial mentioned above for some common sense reaction to this.)
The obvious problem with PS is that everything hinges on what is included in the multivariate analysis used to derive the score. Only things which are measured can be included, so if there are important unmeasured or omitted factors which could influence the outcome, the model breaks down. (The editorialists point out that in this study clinicians' predictions of prognosis were not included.)
Why am I rambling on about PS? They have been proposed and promoted within the palliative care research community as one way to 'get around' the fact that controlled trials are often impossible or impractical for our patient population (like, for instance, this trial, or one looking at the effects of G-tube feedings in dementia, or the effects of early palliative care consultation on some outcome, etc.), and PS have some appeal because they approximate randomization (again, only insofar as the models contain all relevant variables, which is a significant issue). They have been discussed in J Palliative Med here, and were the subject of a concurrent session at AAHPM last winter, and I've begun to see them used more often. I have been waiting for a good article about PS to introduce into my program's palliative care-EBM curriculum, and this is the one I'm probably going to use (it's relatively easy to understand and a little controversial, which gets people excited and interested).
PS are not without controversy, not only because of the issues mentioned above; there's also some debate about whether they actually add anything to 'routine' multivariate analysis, though that debate is quite statistical and well above my head. I haven't found any really good, simple (casually readable) summaries of PS: this one is OK.