Effective Altruism, and building a better QALY

2015-06-04

This originally started as a write-up for my friend James, but since it may be of general interest I decided to blog it here.

Effective Altruism (EA) is a progressive social movement that says altruism is good, but provably effective altruism is way better. According to EA this is particularly important, because charities can vary in effectiveness by 10x, 100x, or more, so being a little picky in which charity you give your money to can lead to much better outcomes.

To paraphrase Yudkowsky and Hanson, a lot of philanthropy tends to be less about accomplishing good, and more about signaling to others (and yourself) that you are a Good Person Who Helps People. Public philanthropy in particular can be a very effective way to ‘purchase’ social status and warm fuzzies, and while any philanthropist or charitable organization you ask would swear up and down that they were only interested in doing good, often the actual good can get lost in the shuffle.

EA tries to turn this around: it seldom discourages philanthropy, but it’s trying to build a culture and community where people gain more status and more warm fuzzies if their philanthropy is provably effective. The community is larger and more dynamic than I can give it credit for here, but some notable sites include givewell.org, givingwhatwecan.org, 80000hours.org, eaglobal.org, and eaventures.org. Peter Singer also has several books on the topic.

I love this movement, I endorse this movement, and I wish it had been founded a long time ago. But I have a few nits to pick about its quantitative foundation.

Okay, so EA likes measurements. What do they use?

The current gold standard for measuring ‘utility’, which the EA community enthusiastically uses, is the Quality-Adjusted Life Year (QALY). Here’s the University of Ottawa’s explanation:

QALYs are calculated as the average number of additional years of life gained from an intervention, multiplied by a utility judgment of the quality of life in each of those years. For example, a person might be placed on hypertension therapy for 30 years, which prolongs his life by 10 years at a slightly reduced quality level of 0.9. In addition, the need for continued drug therapy reduces his quality of life by 0.03. Hence, the QALYs gained would be 10 x 0.09 - 30 x 0.03 = 8.1 years. The valuations of quality may be collected from surveys; a subjective weight is given to indicate the quality or utility of a year of life with that disability.

The idea of QALYs can also be applied to years of life lost due to sickness, injuries or disability. This can illustrate the societal impact of disease. For example, a year lived following a disabling stroke may be judged worth 0.8 normal years. Imagine a person aged 55 years who lives for 10 years after a stroke and dies at age 65. In the absence of the stroke he might be expected to live to 72 years of age, so he has lost 7 potential years. As his last 10 years were in poor health, they were quality-adjusted downward to an equivalent of 8 years, so the quality-adjusted years of life lost would be 7 + (10 - 8), or 9.

In short, if we’re giving money to charity, we should look for opportunities which give a high QALY-per-dollar ratio (e.g., Malaria nets in Africa) and shun those that don’t (e.g., invasive cancer surgery on bedridden 90-year-olds). Every so often, givewell.org updates their shortlist for trusted, validated charities that can produce lots of QALYs for your donation. There are also different ‘flavors’ of the QALY model that focus on different things: the Disability-Adjusted Life Year (DALY), the Wellbeing-Adjusted Life Year (WELBY), and so on.

The QALY, a foundation of wire, particle board, and spackle: way better than nothing, but not ideal.

It’s great we have these metrics: they make intuitive sense, they’re handy for quickly summarizing the expected benefit of various humanitarian interventions, and they’re a lot better than nothing. But jeez, they depend on a lot of complex, kludgy, top-down simplifications. For instance, most variants of the QALY tend to:

treat utility, quality-of-life, absence of disease, and health as the same thing;
assume health states are well-defined and have exact durations;
have no accommodation for differences in how people deal with health burdens, or have different baselines;
essentially ignore the effects of interventions that might do something other than reduce some burden, generally disease burden (may be arguable);
have all the limitations and biases characteristic of subjective evaluations and self-reports;

… and so on. The QALY is a wonderfully useful tool, but it’s very far from “carving reality at the joints”, and it’s trivial to think up long lists of scenarios where it’d be a terrible metric.

To be fair, this problem- measuring well-being- is inherently difficult, and it’s not that people don’t want to ‘build a better QALY’, it’s that innovation here is so constrained because it’s very, very hard to design and validate something better, especially when you have something simple that already sorta works. (The fact that QALYs are so closely tied in with the medical establishment, never a paragon of ideological nimbleness, doesn’t help things either.)

“Okay,” you say, “you’re skeptical of using the QALY to evaluate the effect of charitable interventions on well-being. What should we use instead?”

In the short term: let’s augment/validate the QALY with some bottom-up, quantitative proxies for well-being.

What gives me hope that the QALY is a brief stop-over point toward a better metric is that there’s so much overlap between the EA community and the Quantified Self (QS) community, and the QS community is absolutely nuts about novel, clever, and useful ways to measure stuff. The QS meetups I’ve attended have had people present on things ranging from the relative effectiveness of 5+ dating sites for meeting women and the quality of follow-up interactions from each, to the effects of 30+ different foods on poop consistency. Quantifying and tracking well-being is well within the QS scope!

So how would a QSer measure well-being in a more bottom-up, data-driven way?

First, there’s a lot that a QALY-style factor analysis does right. It gives us an expected effect on well-being from the environment, and helps sort through causes of well-being or suffering, something that purely biological proxies can’t do. So we wouldn’t throw it out, but we should set our sights on augmenting and validating it with QS-style data collection.

Second, we’d look for some really good (and ideally, really really simple) bottom-up, biological proxies that track well-being. I suspect we could throw something together from stress hormone levels (e.g., cortisol) and their dynamic range, and possibly heart-rate variability.

Third, we’d crunch the numbers on possible biological proxies, pick the best ones, and validate the heck out of them. Easier said than done, but simple enough in principle.

Why do we need to bother with biological proxies? Because they’re strong in the same ways that top-down metrics tend to be weak. E.g.,

they don’t involve any subjective component… they may not tell the whole story, but they don’t lie;
they can be frequently measured, and frequent measurements are important, because they let you have tight feedback loops;
they can measure and compensate for the phenomenon of hedonic adaptation;
there’s a significant amount of literature in animal models that explores and validates this approach, so we wouldn’t have to start from scratch.

Obviously we couldn’t do this sort of QS-style data collection on everybody all the time, but even if just a few people did it, it’d go pretty far toward improving the QALY for everybody else, by seeing where our predictions of environmental effects on well-being hold up and where they break down, at least compared to the window of biological data we can directly measure.

In the medium term: let’s figure out some common unit by which to measure human and non-human animal well-being.

One of the principles of EA is that all sentient beings matter- that humans don’t have a monopoly on ethical significance. I agree! But how do we compare different organisms?

First, those biological proxies from step (1) will definitely come in handy. Humans, dogs, cows, and chickens all share the same basic brain architecture, which implies that if we find something that’s a good proxy for well-being or suffering in humans, it should be at least a not-so-terrible proxy for the same in basically all non-human animals too.

But for numerical comparisons we’ll need more than just that. I suspect we’ll need some sort of a plausible method for adjusting for how sentient and/or capable of suffering something is. Most people, for instance, would agree that mice can suffer, and that mouse suffering is a bad thing. But can they suffer as much as humans can? Most people would say ‘no’, but we can’t put good numbers or confidence ranges on this. My intuition is that almost everything with a brain shares the basic emotional architecture, and so is technically capable of suffering, but various animals will differ significantly in their degree of consciousness, which acts as a multiplier of suffering/well-being for the purposes of ethics. E.g., ethical significance = [suffering]*[degree of consciousness]. The capacity to have a strong sense of self (i.e., the sense that there is a self that is being harmed) may also be important, which likely has a neuroanatomical basis. Call it the SQALY (Sentience and Quality-Adjusted Life-Year).

The road forward here is murky, but important. I hope people are thinking about ways to quantify this, because one of EA’s core strengths is that it argues that human and animal well-being is in principle commensurable, and quantifiable. My off-the-cuff intuition is that a mash-up of Panksepp’s work on mapping structural commonalities in “emotion centers” across vertebrate brains, and a comparison of relative amounts of brain mass in areas thought most significant for consciousness, could bear fruit. But I dunno. It’ll be tough to put numbers on this.

In the long term: let’s build a rigorous mathematical foundation that explains what suffering is, what happiness is, and how we can literally do utilitarian calculus.

EA wants to maximize well-being and minimize suffering. Cool! But what are these things? These tend to be “I know it when I see it” phenomena, and usually that’s good enough. But eventually we’re gonna need a properly rigorous, crisp definition of what we’re after. To paraphrase Max Tegmark: ’Some arrangements of particles feel better than others. Why?’ — if pain and pleasure are patterns within conscious systems… what can we say about these patterns? Can we characterize them mathematically?

I’ve spent most of my creative output of the last couple years on this problem, and though I don’t have a complete answer yet, I think I know more than I did when I started. It’s not really ready to share publicly, but feel free to track me down in private and I can share what I think is reasonable to say, and what I think is plausible to speculate.

We don’t need an answer to this right away. But technology is taking us to strange places (see e.g., If a brain emulation has a headache, does it hurt?) where it’ll be handy to have some guide for detecting suffering other than our (very fallible) intuitions.

This is way too much to worry about! And isn’t it a better use of resources to actually help sentient creatures we know are in pain, rather than slog through all these abstract philosophical details that seem impossible to overcome?

Very possibly! But I don’t think this would be wasted effort, at all.

First: EA is all about metrics. True, improving these metrics can be very difficult, and very difficult to validate, but it’s the sort of thing that pays huge long-term dividends. And if we don’t do it, is there anybody else who will? Abe Lincoln has this (possibly apocryphal) quote, “Give me six hours to chop down a tree and I will spend the first four sharpening the axe.” The QALY is EA’s axe, and it’s worth trying to sharpen.

Second, this all sounds very hard, but it may be easier than it seems. This stuff reminds me of what Paul Graham said about startups: the best time to work on something is when technology has made what you want to do possible, but before it becomes obvious. With all the consciousness research that’s been going on, I suspect we have many more good tools to work with than most people- even (or perhaps especially?) ethicists- realize. We may not have solid answers to these questions, but we’re increasingly getting a decent picture as to what good answers will look like.

Third, and I think just as important, I fear that EA is going to be vulnerable to mission drift and ideological hijacking. This is not a criticism of EA, but rather a comment that almost everybody would love to think they’re engaging in effective altruism! Every activist and culture warrior thinks they’re improving the world by their actions, and that their special brand of activism is the most provably effective way to do so. And so I think it’s very plausible that people with external agendas will be highly attracted to EA (whether consciously or not), especially as it starts to gain traction. EA is going to need a sturdy rudder to avoid diversions, and strong (yet not overactive!) immune system to fend off ideological hijacking. I can’t help with latter, but I do think improvements in the core metric EA uses could help with ‘organic’ long-term mission drift.

Concrete recommendations for EA: There’s no one person ‘in charge’ of EA, so the best, most effective, and least annoying way to get something done is generally to organize it yourself. That said, here are a few things I think the community could (and should) do:

Consider forming an “EA Foundational Research” team for the sort of inquiry that might not happen effectively on its own. It need not be super formal- even an ‘EA Foundational Issues Journal Club’ could be helpful and would be fun;
Foster relationships with receptive scientific communities to help with each step (short/medium/long)… the way Randal Koene’s carboncopies coordinates research between labs might be an interesting model to emulate;
Think deeply about how to shield EA from unwanted appropriation by culture warriors without being too insular, and/or how to pick the culture battles worth fighting;
Keep being awesome.