The Train to Crazy Town
How to Ride Without Losing Your Mind
Five years ago, Ajeya Cotra memorably described Effective Altruist reasoning as a “train to crazy town”—with longtermists in particular staying on the train for longer than their nearterm-focused colleagues. It’s not a perfect metaphor, but what I suspect resonates for a lot of people is the sense that EA-style abstract reasoning can push you in directions you may not antecedently want to go (i.e., against your natural sympathies), and there’s something potentially alienating or even threatening about that.1
Earlier still, back in 2015, Scott Alexander wrote about the dilemma that either we shouldn’t care about non-human animals at all, or they should totally swamp every other moral concern. There’s no realistic chance that the correct moral weighting conveniently turns out to be that precise value needed to justify a balanced approach on first-order grounds. His response was to “safeword out” of that whole line of abstract reasoning for the sake of his sanity. I actually think that’s a pretty good response, but would like to try to give it a more principled backing—one that turns out to carry a surprising upshot: that moral theories may not be meant to guide us directly at all.
As a first step, I want to highlight how important it is that people feel free to not act on their beliefs. It might not sound ideal, but it’s a practical guardrail that’s essential for protecting unbiased inquiry. Teaching applied ethics, especially topics like charitable giving or eating meat, I find that undergraduate students engage in the most transparently motivated reasoning because they feel like they need to defend their current actions. My bet is that we’d make more moral progress as a society if people felt more comfortable saying, “Yeah, I don’t do as much as I should in this arena. It’s on my moral to-do list.” And then make sure your to-do list is sensibly prioritized. Your next step may then involve doing a lot more good than would be likely to happen if you were just stuck rationalizing your existing behavior, unwilling to even consider where it might improve, let alone prioritizing among the many such opportunities for moral improvement.
So: let your beliefs roam free, and if you don’t like where they end up, just don’t act upon it. That’s your prerogative. But I do think we should make serious efforts to be intellectually honest and try to figure out what’s true. And hopefully that’ll help us to identify significant marginal improvements that we can make for others at low personal cost.
Returning to the train of abstract reasoning, there are two big worries:
(1) Swamping effects, leading to the neglect of current people’s interests
(2) Motivating fanatical or extremist behavior (neglecting downside risk)
My thesis is that both can be successfully addressed by suitable application of the distinction between ideal theory and imperfect practice — which you’ll notice we’ve already applied once in the discussion above. The remaining core challenge is to introduce suitable guardrails on how we put theory into practice that limit downside risk without symmetrically2 capping the upside of rational reflection.
Meet my foil. One prominent longtermist famously endorsed “double or nothing” existence gambles: he said he’d accept a 49% chance of destroying the world for a 51% chance of doubling it and all the value it contains. He later became even more notorious for gambling with customer deposits, riding the train all the way to federal prison.
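To see why this kind of gamble unsettles people, it can help to run the arithmetic. The sketch below assumes (for illustration only, not as part of the original claim) that the same 51/49 bet is accepted repeatedly and that value is measured in multiples of the current world. The function name `repeated_double_or_nothing` is hypothetical.

```python
def repeated_double_or_nothing(rounds, p_win=0.51):
    """Return (survival probability, expected value) after `rounds` gambles.

    Assumption: each round doubles all value with probability p_win and
    destroys everything otherwise; the starting value is 1.0.
    """
    # The world survives only if every single gamble is won.
    survival = p_win ** rounds
    # Each round multiplies expected value by 2 * p_win (1.02 by default).
    expected_value = (2 * p_win) ** rounds
    return survival, expected_value


survival, expected_value = repeated_double_or_nothing(10)
print(f"survival probability: {survival:.4f}")
print(f"expected value: {expected_value:.4f}")
```

Under these assumptions, expected value rises by 2% per round while the chance that anything survives falls by almost half each round.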
As a moral philosopher, I’m interested in the question of when it’s helpful to be guided by abstract moral reasoning, and when we should be more cautious and conservative. If there’s one thing philosophers know, it’s that human minds often struggle to identify the flaw in a chain of superficially-plausible reasoning. If we were committed to accepting any conclusion whose argument we couldn’t logically refute, we would be easy dupes, at the mercy of anyone more analytically skilled than ourselves.
But if there’s a second thing we know, it’s that historically, unreflective deference to socially conventional common sense has enabled moral disasters: slavery, homophobia, factory farming. It would be an extraordinary coincidence if our generation were the first whose conventional wisdom was morally perfect. So we have two opposing failure modes to protect against: the easy dupe and the dogmatist. My question is how best to balance these risks.
I view Effective Altruism as basically “applied moral philosophy”. Whereas most social movements are pushing certain predefined interests or values, EA’s cause neutrality and concern for the abstract good make it open to an unusually wide range of concrete goals. The usual first stop, combining cosmopolitan moral concern with a concern for real evidence of effectiveness, is I think widely recognized as an improvement over ordinary vibes-based giving. But people worry about taking the train of moral optimization any further. It’s one thing to prioritize the global poor over the local homeless. It’s quite another to prioritize shrimp or digital minds or speculative AI safety risks. And that’s before we even get to the worries about violent extremism.
So I want to explore our options for mitigating the downsides of moral optimization and abstract reasoning, without losing the most valuable upsides. People are often tempted by “all or nothing” reasoning, but I think there’s a lot to be said for moderation and messy compromises. My route to this conclusion goes via taking our philosophical uncertainty and fallibility seriously: shifting from ideal to non-ideal theory. But the question of how best to do that gives us our first major choice point.
Moral Uncertainty
Suppose we’re torn between multiple moral theories, accounts of which entities are truly sentient, or other broad “worldviews”. How we respond to this uncertainty may be very different depending on whether we opt for maximizing expected choiceworthiness or worldview diversification. The former involves centralized agency, weighing the possible stakes of each option in proportion to the credibility of the theory that assigns it such stakes, and then potentially going “all in” on whatever option yields the best prospect in expectation. Perhaps a good approach for ideal agents, but (I suggest) too risky for the rest of us. The latter alternative decentralizes and devolves power or resources to an ensemble of subagents representing different philosophical worldviews in rough proportion to their credibility.3 Notice what this does to the usual picture: the moral theories aren’t directly guiding me; they’re guiding the subagents I’ve delegated to. (I’ll come back to this.) By design, this prevents moral “swamping”: you can easily ensure that any credible worldview maintains some power and agency to improve the world by its lights.
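A toy comparison may make the contrast concrete. Everything below is hypothetical: the three worldviews, their credences, the stakes each assigns to its own favored cause, and the assumption that those stakes can be placed on a common scale at all. It is a sketch of the two allocation rules, not a proposal for how to set the numbers.

```python
# Hypothetical credences in three worldviews (illustrative only).
credence = {"global_health": 0.6, "animal_welfare": 0.3, "longtermism": 0.1}

# Hypothetical value per dollar that each worldview assigns to its own
# favored cause, on an assumed common scale. Simplifying assumption: each
# worldview assigns zero value to the other causes.
stakes = {"global_health": 1.0, "animal_welfare": 100.0, "longtermism": 10_000.0}


def maximize_expected_choiceworthiness(budget):
    """Centralized rule: put the whole budget on the best option in expectation."""
    expected = {w: credence[w] * stakes[w] for w in credence}
    best = max(expected, key=expected.get)
    return {w: (budget if w == best else 0.0) for w in credence}


def worldview_diversification(budget):
    """Decentralized rule: each worldview gets a share proportional to credence."""
    total = sum(credence.values())
    return {w: budget * credence[w] / total for w in credence}


print(maximize_expected_choiceworthiness(100.0))
print(worldview_diversification(100.0))
```

With these made-up inputs, the maximizing rule sends the entire budget to the worldview with the lowest credence, because its claimed stakes swamp everything else, while the proportional rule leaves every worldview with a share matching its credence.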
I’m a big fan of worldview diversification: I think it is the solution for fallible, non-ideal agents, as we all are. Not only does it avoid swamping worries, but it also avoids the opposite problem of conventional dogmatism neglecting novel moral disasters. We can have a subagent who looks out for shrimp, and another who looks out for the longterm future, and yet another that focuses on the most robust and reliable ways to improve human welfare today. The resources we assign to each can shift in proportion to how credible we find each subagent’s underlying philosophical worldview. And I think someone who undertakes this process responsibly can feel pretty confident that they’re going to do a lot of good for the world, or at least not gratuitously overlook low-hanging fruit.
That’s not to say that this approach is optimal. Just that it’s hard to be confident in what alternative would beat it. I don’t necessarily want to discourage anyone who is more willing to personally go all-in on a neglected high-impact cause area (shrimp welfare, for example). Especially if you imagine what your ideal moral portfolio would look like at the level of all of society, you might well find that going “all in” on the most neglected of your ideal cause areas is — on present margins — actually the best way for you to diversify society’s moral portfolio, and make up for others’ mistaken neglect of important causes.
But I do think we shouldn’t want society to go all-in on any one cause area — speculative longtermism, for example — even if it would seem to have higher expected value than diversification. I think there are several reasons for this. One involves principled hedging against the risk of disaster from miscalculation (especially given a default prior of diminishing marginal value for concentrated funding of cause areas, it seems plausible to me that the objectively correct expected value calculation would support some degree of diversification, whereas our subjective EV calculations could easily go wrong by missing this). Even if expectational reasoning is objectively ideal, we can’t trust human brains to do it right. We need a more diverse portfolio.
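The parenthetical point about diminishing marginal value can be illustrated with a simple model. Assume, purely for illustration, that the value produced by funding `x` in cause `i` is `w_i * sqrt(x)`, where `w_i` is a hypothetical weight for how promising the cause is. Maximizing total value under a fixed budget then gives each cause a share proportional to `w_i ** 2`, so even a much less promising cause gets some funding.

```python
import math


def total_value(weights, allocation):
    """Total value under the assumed square-root returns model."""
    return sum(w * math.sqrt(x) for w, x in zip(weights, allocation))


def optimal_split(weights, budget):
    """Budget split that maximizes total_value for a fixed budget.

    Setting the marginal values w_i / (2 * sqrt(x_i)) equal across causes
    gives x_i proportional to w_i ** 2.
    """
    norm = sum(w ** 2 for w in weights)
    return [budget * w ** 2 / norm for w in weights]


# Hypothetical weights: the second cause looks three times as promising.
weights = [1.0, 3.0]
budget = 100.0

diversified = optimal_split(weights, budget)
all_in = [0.0, budget]  # everything on the most promising cause

print(diversified, total_value(weights, diversified))
print(all_in, total_value(weights, all_in))
```

If value were instead linear in funding, the same calculation would favor going all-in on the highest-weight cause, which is why the shape of the returns curve matters so much here.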
Some of my motivation here may just be a brute unwillingness to follow the train of argument to certain destinations if they seem too crazy. But even that can be given a principled backing, since gut intuitions often encode a lot of subconscious information that’s difficult to capture and model explicitly. So again, this ultimately comes back to maintaining a degree of skeptical distrust in abstract reasoning and calculations. We should take them on board to a non-zero extent, to protect against neglected moral disasters, but a kind of blind 100% deference risks disaster too. Intermediate weighting, relying on good judgment, strikes me as a much safer general approach.
Comparing Risks
It’s important to recognize that conventional thinking is far from risk-free, given its outright dismissal of unusual causes like shrimp welfare and longtermism. Given the scales involved, it’s not just that neglecting these things could conceivably be morally disastrous. Unless one has sufficient evidence to support overwhelming confidence in one’s dismissal of the worldviews on which these causes matter immensely, dismissal is a disaster in expectation. (We can recognize this fact without putting excessive trust in any precise calculations.)
This bears emphasizing because people often move from the unfamiliarity of these cause areas to the conclusion that supporting them at all must require some really weird, distinctive philosophical commitments. But actually the very opposite is true. Any suitably agnostic, moderate view of the possibility space will entail that we have very strong reasons to support these credibly ultra-high-stakes cause areas. It’s the dismissal that requires super niche views to really justify.
(If you’re pattern matching to being mugged by minuscule probabilities, I think it makes a big practical difference whether we actually have sufficiently good reasons to regard the speculative claims as substantively credible. Made-up numbers won’t automatically qualify. It’s a judgment call whether any given speculative claim is genuinely credible — answerable to evidence and argument, but not settled by any algorithm I can offer you. And note that proportional representation of subagents caps the downside: a worldview I shouldn’t have seated gets only its small share, rather than swamping the whole portfolio as it could under expected-value or expected-choiceworthiness maximization.)
My claim is that the most sensible, moderate view gives neither zero nor 100% of our collective moral resources to weird speculative causes. We should all want to see a more balanced approach, and hence think about how our marginal contributions could bring us closer to that goal.
So there’s a kind of speculative reasoning that I’m very open to, and I think people in general should be open to, and that is speculative reasoning that expands our options and protects against a wider range of possible moral disasters.
There’s a very different kind of speculative reasoning that I think should be quarantined to the philosophy seminar room and should not affect our practice. And that’s, roughly speaking, reasoning that has a tendency to narrow our options and leave us more vulnerable to downside risks and possible moral disasters. Examples include things like:
Double or nothing existence gambles
Wireheading people into experience machines
Violence / social defection (e.g. stealing to give, assassination)
It’s a very familiar point in the utilitarian tradition that naive instrumentalist calculations in support of norm-breaking for the greater good are apt to be unreliable (more often resulting from motivated reasoning or other mistakes than from objectively accurate calculation). Since we can’t tell if we’re mistaken from the inside, we simply can’t trust the results of convenient-but-usually-misguided modes of thought like this. Once we take this into account, we find that the higher-order expected value of disreputable norm-breaking behavior is almost always negative.
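The higher-order point can be put as a one-line calculation. The sketch below assumes a hypothetical reliability (the chance that a naive calculation favoring norm-breaking is actually right), a gain if it is right, and a loss if it is wrong. None of these numbers come from evidence, and the function names are illustrative.

```python
def higher_order_ev(reliability, gain_if_right, loss_if_wrong):
    """Expected value of acting on a naive calculation that may be mistaken."""
    return reliability * gain_if_right - (1 - reliability) * loss_if_wrong


def break_even_reliability(gain_if_right, loss_if_wrong):
    """Reliability at which acting on the naive calculation has zero expected value."""
    return loss_if_wrong / (gain_if_right + loss_if_wrong)


# Hypothetical inputs: the naive calculation is right 10% of the time,
# gains 10 units if right, and loses 5 units if wrong.
print(higher_order_ev(0.1, 10.0, 5.0))
print(break_even_reliability(10.0, 5.0))
```

On this framing, the act only comes out ahead if one's reliability exceeds `loss / (gain + loss)`, and the passage's claim is that for disreputable norm-breaking our reliability rarely clears that bar.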
A good test is to ask whether common sense would recognize one’s situation as a legitimate exception to the usual rule or not: like lying to the murderer at the door. There are two reasons to like this test: one is that common sense may encode valuable heuristic information.4 The other, more cynical reason is that common sense will shape how others react and thus the reputational costs of the action.
So a practical endorsement of something like commonsense deontic constraints (though ultimately justified on pragmatic rather than intrinsic grounds) is how I think we can address worries about extremist behavior in light of high-stakes beliefs.
This is another point I want to dwell on a moment, because public discourse here is painfully stupid. You see people on Twitter claiming that to call AI an existential risk constitutes incitement to violence against AI company CEOs. I feel like they’re really telling on themselves if they’re so bought in to naive instrumentalism that they don’t even notice the gap between recognizing high stakes and resorting to violence. It’s baffling, because it’s undeniable that the world contains high stakes — in politics, for example — and it must be possible to recognize this truth without thereby supporting political assassination.
How is it possible? Here’s where bad philosophers appeal to deontology: “We just have to reject consequentialism,” they say, “and there’s no longer any temptation to take bad means to good ends.” That’s bad reasoning because it neglects the fact that moderate deontology (no one endorses absolutism these days) also mandates doing the best act when the stakes are sufficiently high — which is precisely the scenario we’re concerned with here. So deontology is no help.
The real solution is more epistemic than moral. What we need is for people to reject naive instrumentalism and come to sincerely believe that taking the bad means would be counterproductive to good ends. It’s ultimately an empirical belief, but—like favoring liberal democracy over dictatorship—it’s one that I think we clearly ought to hold as a matter of very robust expectation. Even if there are possible exceptions, we shouldn’t expect to be able to reliably identify them in advance.
Wrapping up
So, that’s my response to extremism: reject naive instrumentalism. We saw my response to swamping was to appeal to worldview diversification (rather than maximizing expected choiceworthiness) as our method for dealing with philosophical uncertainty. Both are practical fixes mandated by the realization that we can’t be trusted to reason perfectly. And they’re fixes that serve as asymmetric guardrails: limiting downside risk without symmetrically capping the upside.
When we combine these two solutions, there’s no longer anything to fear from Crazy Town. In fact, it’s a mistake to think that we have to choose a single destination along the train line, putting all our eggs in one basket. Instead, we (at least collectively) ought to diversify, and ensure that every credible philosophical worldview gets to exert proportionate power and influence in shaping the world for the better. Rather than being, as Scott Alexander feared, a “series of unprincipled exceptions”, I think we can reasonably view this stance as meta-optimal, taking higher-order uncertainty into account, despite the fact that no worldview would identify the resulting compromise as optimal on first-order grounds.
These practical thoughts suggest a curious theoretical upshot. It turns out that the role of moral theories and related “worldviews” is not to directly guide us as moral agents, but rather to guide our subagents. This way of structuring the connection between theory and practice has a salutary tendency towards moderation, and I think towards reaping the low-hanging fruit of abstract reasoning while limiting downside risks. Those are my thoughts. I look forward to hearing yours.5
1. Really that’s a risk of all sincere moral reasoning, but a less quantitative approach makes it easier to dodge honest inquiry (confronting tradeoffs, etc.) without realizing it.
2. Safer options will reduce potential upside, but, if done well, not to the same extent as we lower the downside risk.
3. It’s an interesting question whether to allocate based on credibility alone, or try to add in some element of stakes-sensitivity here. I’ve assumed the former for simplicity. I’m not sure whether there’s a good way to do the latter without reintroducing swamping. There’s an element of arbitrariness to the allocation procedure, which is a real cost of this approach.
4. I suggested at the start that common sense is not a reliable guide to what’s positively worth doing. (So mere “weirdness” is no objection.) But if an action is intuitively atrocious, that’s a red flag I take more seriously.



This is something I thought about for a long time before I came to the conclusion that ultimately:
1. The moral assessment of persons really doesn’t matter.
2. I can come to terms with being a morally flawed person. As a utilitarian I eat meat and don’t do some of the other things I should, and that’s wrong and I work to do better but I’m ok with not being perfect.
3. I don’t actually agree with managing moral uncertainty using multiple moral theories as anything but heuristics.
There is a weird belief that seems rampant in philosophy and the wider world that a moral theory must make all people at least minimally good people, and this is why many think utilitarianism is too demanding. I think there is no reason this needs to be true.
Another point about naive instrumentalism is that it really depends on the strength of existing norms. As those norms lose value and efficacy, the case for breaking them on instrumentalist grounds becomes stronger. In the U.S. today, where this is happening, this may tempt people to break them more, possibly motivating more political violence and so on.