Good Judgment with Numbers

The latter can inform, without substituting for, the former

Sep 23, 2024

A general theme of my writing is that people are often too quick to “read off” practical differences from theoretical ones. A classic example: whereas pluralistic views tend to leave unspecified the relative weights of their varied moral reasons, utilitarian reasons have precise weights (in theory), determinately fixing what ought to be done in any given situation. But people often infer from this that we can easily determine what ought to be done, if utilitarianism is true. (“Just add up and compare the numbers!”) This inference is fallacious: positing precise and determinate truth-makers doesn’t imply that we have easy epistemic access to them. The world’s a complicated place, which leaves plenty of room for reasonable disagreement over the theoretically “simple” question of what interventions will actually do the most good.

In this post, I especially want to target the view (rarely explicitly defended, but often implicitly assumed) that the role of quantitative tools in practical decision-making must be all or nothing. On this view, either you’re committed to blindly following a simple algorithm come what may (a la naive instrumentalism), or you dismiss “soulless number-crunching” as entirely irrelevant to ethics. I think both options are bad, and moral agents should instead use good judgment informed by quantitative considerations.1

The ‘All or Nothing’ Assumption

In my experience, the ‘All or Nothing’ view of quantitative tools is especially common amongst critics of effective altruism. They are opposed to quantitative methods,2 so to resist EA calls to quantify their impact, they (i) implicitly assume the ‘All or Nothing’ view, (ii) attribute the ‘All’ view to effective altruists, (iii) suggest that the ‘All’ view is bad, and so (iv) conclude that their ‘Nothing’ view is good and any pressures towards moral optimization can rightly be dismissed.

Consider, for example, Leif Wenar’s old WIRED diatribe against GiveWell.3 He gets very outraged by GiveWell’s “hedging” of their quantitative results:

In the fine print, the calculations are hedged with phrases like “very rough guess,” “very limited data,” “we don’t feel confident,” “we are highly uncertain,” “subjective and uncertain inputs.” These pages also say that “we consider our cost-effectiveness numbers to be extremely rough,” and that these numbers “should not be taken literally.” What’s going on?

GiveWell transparently explains that their quantitative modelling is uncertain; it’s just one input that goes into determining their overall verdicts.4 (An excellent 2011 blog post explains why weakly-supported explicit expected-value estimates should be adjusted back towards the mean, the same way you would adjust down your estimate of the quality of a restaurant sporting a single five-star review and no other info.) This all seems very reasonable and epistemically responsible to me. But it makes critics like Wenar feel cheated. What’s going on?
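To make the restaurant analogy concrete, here is a minimal sketch of that kind of adjustment. It assumes a normal prior over how good a typical option is and normally distributed noise in the explicit estimate. Those distributional assumptions, the function name `shrink_estimate`, and all of the numbers are illustrative, not GiveWell's actual model.

```python
def shrink_estimate(prior_mean, prior_sd, estimate, estimate_sd):
    """Combine a normal prior with a noisy normal estimate.

    Returns the posterior mean and standard deviation. The noisier the
    estimate (larger estimate_sd), the closer the result stays to the prior.
    """
    prior_precision = 1.0 / prior_sd ** 2
    estimate_precision = 1.0 / estimate_sd ** 2
    total_precision = prior_precision + estimate_precision
    posterior_mean = (
        prior_mean * prior_precision + estimate * estimate_precision
    ) / total_precision
    posterior_sd = total_precision ** -0.5
    return posterior_mean, posterior_sd


# Hypothetical inputs: a typical option is worth 1 unit (sd 1),
# and an explicit model claims this one is worth 10 units.
well_supported, _ = shrink_estimate(1.0, 1.0, 10.0, 0.5)    # tight estimate
weakly_supported, _ = shrink_estimate(1.0, 1.0, 10.0, 5.0)  # very rough estimate

print(round(well_supported, 2), round(weakly_supported, 2))
```

Under these made-up inputs, the well-supported estimate stays close to the claimed value, while the weakly-supported one is pulled most of the way back towards the prior mean. That is the single five-star review in numerical form.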

My best guess is that the All or Nothing theorist associates numbers with mathematical certainty. So, to use numbers to present one’s best estimate inherently “projects absolute confidence” (an accusation Wenar levels against GiveWell, for no other reason but that they present numbers). But this just seems a mistake on the part of the All or Nothing theorist. There’s no necessary connection between numerical representation and higher-order confidence in the represented number. If you want to know how confident GiveWell is in their estimates, you can read their full reports to find out. If you make assumptions based on nothing but the presented estimate, that’s on you.

Don’t Blindly Trust Numbers

Sometimes we have higher-order evidence that certain kinds of first-order estimates require adjustment, or even outright dismissal. I’ve already mentioned GiveWell’s argument for discounting weakly-supported “extreme” verdicts. And I often write about the standard two-level utilitarian case for trusting reliable heuristics over error-prone individual calculations. I don’t have much patience for galaxy-brained proposals about how murdering your rival (or “stealing to give”) would make the world a better place. Unreliably calculating those things out demonstrates bad practical judgment, in my view. We can better approximate ideal guidance (i.e., maximizing expected value, by the lights of the objectively warranted expectations) by not doing that.

If someone presents you with a “proof” that 1 = 0, you can (rightly) be extremely confident that it contains a mistake, even if you can’t immediately identify the error. Their “evidence” isn’t strong enough to overcome your (rational) prior resistance to the conclusion. The same holds, in moderated fashion, if someone presents you with an expected value “calculation” favoring a disreputable act like theft or murder (in a context where it seems intuitively bad): even if you can’t immediately see what they’re leaving out, you can reasonably expect that they’re missing something important. Your prior resistance to their conclusion should be weaker than in the pure math case, of course; you’d expect there to be some cases where disreputable-seeming acts really are for the best. But a handwavy EV estimate is very weak evidence indeed, and shouldn’t much move you away from expectations that seem reasonable based purely on priors. (Even if there are some cases where disreputable acts turn out for the best, your expectation in any particular case should not be so sanguine; compare the expected value of buying an overpriced lottery ticket.)
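The point about weak evidence and strong priors can be put in simple Bayesian terms. The sketch below uses the odds form of Bayes' rule. The prior of 1 in 1,000 and the likelihood ratio of 3 are invented purely for illustration, and `posterior_probability` is a hypothetical helper name.

```python
def posterior_probability(prior, likelihood_ratio):
    """Update a prior probability using the odds form of Bayes' rule.

    likelihood_ratio is how much more likely the evidence is if the
    claim is true than if it is false.
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical: you start out 99.9% confident the disreputable act is not
# for the best, and a handwavy EV argument is only 3 times likelier to be
# offered when the act really is best than when it is not.
updated = posterior_probability(0.001, 3.0)
print(f"{updated:.4f}")
```

With these assumed inputs, the probability that the act is really for the best roughly triples but remains well under one percent. The argument counts as evidence, just not nearly enough of it.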

Similarly, as I wrote in It’s Not Wise to be Clueless, if someone pulls out some numbers (and an associated narrative) suggesting that global nuclear war would actually be best for the long-term future, you should not believe them. Practical rationality requires good priors; if you don’t have them, you’ll be led badly astray.5 (Compare Pascal’s Mugging.) It really matters what we should expect to have better consequences. And, as J.S. Mill rightly stressed, we ought to have some pretty robust expectations about some of these things.6

Look where you’re going! Don’t blindly follow big numbers off a cliff.

Don’t Blindly Trust Formalisms

It’s not just “input” numbers that need to be filtered through good judgment. It’s also quite possible to start with accurate or reasonable numbers, and be led to utter insanity due to doing the wrong things with them.

For example, many philosophers seem to take seriously a formalization of risk-aversion which implies pro-extinctionism. As far as I can tell, there’s no principled basis for those particular formalisms (unlike the arguments for orthodox decision theory); they just happened to yield desired results in the narrow range of (low-stakes) cases usually considered. As a result, I don’t see any reason to take these formalisms seriously when they have crazy implications in high-stakes cases. Especially when ordinary risk-aversion seems better explained as a simple humility heuristic, trying to build it into the fundamental formalism itself just seems a mistake.

Another common mistake with formalisms is the failure to consider model uncertainty, relying on a single “most-likely” scenario or model, when you should be distributing your credence across a wider range. (“Respectable” academic objections to longtermism often take this form.)
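As a toy illustration of the difference, the sketch below compares reading the value off the single most likely model with weighting each model's verdict by your credence in it. The three models, their credences, and their values are all made up for the example.

```python
# Each entry: (label, credence in the model, value the model assigns).
# All numbers are hypothetical.
models = [
    ("most likely model", 0.6, 1.0),
    ("alternative A", 0.3, 5.0),
    ("alternative B", 0.1, 100.0),
]

# Approach 1: pick the most probable model and ignore the rest.
most_likely_value = max(models, key=lambda m: m[1])[2]

# Approach 2: distribute credence across all the models.
credence_weighted_value = sum(credence * value for _, credence, value in models)

print(most_likely_value, round(credence_weighted_value, 2))
```

Here the two approaches disagree by an order of magnitude, because the less likely models carry most of the expected value. Whether those tail credences and values are themselves reasonable is, of course, exactly where good judgment has to come in.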

Don’t Blindly Dismiss All Numbers

So, it’s easy to go wrong when dealing with numbers in practical decision-making. Still, the numbers do count, in principle. So it’s even easier to go wrong if you refuse to consider numbers at all. That route leads to all the familiar sorts of inefficiencies that motivated effective altruism to begin with. (For example: donating to a charity that trains seeing-eye dogs for $50,000, when another could restore sight to someone suffering from trachoma for just $50.) It’s worth trying seriously to do more good rather than less, all else equal. There’s plenty of low-hanging fruit here. Favoring GiveWell-recommended anti-malaria charities over arbitrary alternatives, for example, doesn’t risk any of the problems discussed in the previous section.7 It’s just a no-brainer. Yet most people still fail to do this. (You even get crazy people like Crary and Wenar who are positively hostile to GiveWell. It’s the weirdest thing.)

If we ask what decision-procedure (or broader moral mindset) we should expect to have the best consequences, I think it’s clear that the answer is:

Not blindly following naive expected value calculations; but also…
Not completely ignoring numbers.
Rather, take scale (and tractability) into account when trying to do good, in the ways that effective altruists recommend, considering a wide range of potential opportunities and ambitiously pursuing potential “upside”, while…
Taking care to mitigate “downside” risks, e.g. by acting with integrity, respecting moral “guard rails” (e.g. legitimate laws and rights), and avoiding Machiavellian manipulation or deceit (except when validated by common sense).
And generally exercising good judgment throughout!

If folks want to argue that EA-style quantification is bad, I’d like to see them go beyond “All or Nothing” reasoning and seriously engage with this far more reasonable intermediate position. In particular, I’d love to see them spell out an alternative decision procedure that they expect would have better consequences for the world, and offer some argument or reasoning in support of that expectation.

In general, criticism is much more valuable when it goes beyond merely noting that something is flawed or imperfect (as if anything isn’t), and positively establishes a better alternative. If your criticism is just that EA is imperfect, that’s compatible with every alternative being (as I believe) much worse. And it’s hardly reasonable to try to discourage people from pursuing the best moral project currently on offer. (Indeed, that seems transparently vicious.)

Fallacies to Watch Out For

Be very careful about what you infer from the fact that some tradeoffs are unclear or difficult to make. Sometimes people will try to infer sweeping conclusions from this: things like, “therefore, you should sometimes prioritize homeless Americans over the global poor”; or “therefore, we should not even try to optimize”; or “therefore, effective altruism should be entirely abandoned.”

These inferences are bonkers,8 and you should severely downgrade your estimate of the reasonableness of anyone you see making them. They are rationalizing, not reasoning. (This becomes especially clear if they further infer, “therefore, you should prioritize my pet cause over things that seem more effective.”)

The correct inference to draw is just that it will sometimes be difficult to discern which option is most worth prioritizing. That’s all.

1. There’s a risk here of thinking, “Anyone more quantitatively-inclined than I am is a blind number-cruncher; anyone less inclined is stupid.” But even without getting too specific, I think it’s helpful to note that some (possibly broad) “middle ground” between the two extremes is plausibly going to be the most reasonable stance. Both extremes are very much worth avoiding!

2. Often, I think, because it’s really clear that there’s simply no basis for prioritizing their preferred interventions, but they don’t want to admit this.

3. I previously criticized the central mistake of his article: trying to raise the salience of rare side-effects, in a way that exploits people’s status quo bias, with the predictable effect of deterring life-saving interventions. Bad stuff. The mistake I want to discuss today is less egregious, but still worth remedying.

4. Relatedly: it’s totally fine for them to say, “Here are some potential negative and offsetting effects of the program that we believe to be too small or unlikely to have been included in our quantitative model.” Wenar claimed to find this outrageous, which is (again) simply unreasonable on his part.

5. Sometimes people imagine there’s an epistemic norm of open-mindedness, that you should never reject a view without argument, or even that you should always assign non-trivial credence to any view that you cannot absolutely disprove using non-question-begging premises. Something along these lines may (?) often be a good heuristic norm for general discourse. But as a strict universal generalization, it is completely insane, as again demonstrated by Pascal’s Mugging. (Middling credences are not automatically more reasonable than “extreme” ones; it depends on the proposition!)

6. J.S. Mill, chp 2 of Utilitarianism: “People talk as if… at the moment when some man feels tempted to meddle with the property or life of another, he had to begin considering for the first time whether murder and theft are injurious to human happiness.” Or, as Kamala Harris’s mom put it, “Do you think you just fell out of a coconut tree?”

7. It would be different if you really thought we should expect anti-malarial interventions to do more harm than good (due to second-order effects or whatever). But that seems a crazy expectation: one can of course describe a scenario vindicating how it is possible, but it would seem hard to justify having this as your dominant expectation. Even Wenar makes no such positive claims. To muddy the waters, he just raises the possibility of bad outcomes, without doing the epistemic work of establishing that we should actually expect them to predominate. Possibilities are cheap.

8. You might think that the first claim is independently defensible. What I’m criticizing here is the specific inference, not the conclusion being drawn. (I’d say it’s misguided to prioritize homeless Americans over the global poor, but I wouldn’t call the view “bonkers”.)

Good Thoughts
