The Basic Argument for AI Safety
High-stakes uncertainty warrants caution and research
When I see confident dismissals of AI risk from other philosophers, it’s usually not clear whether our disagreement is ultimately empirical or decision-theoretic in nature. (Are they confident that there’s no non-negligible risk here, or do they think we should ignore the risk even though it’s non-negligible?) Either option seems pretty unreasonable to me, for the general reasons I previously outlined in X-Risk Agnosticism. But let me now take a stab at spelling out an ultra-minimal argument for worrying about AI safety in particular:
1. It’s just a matter of time until humanity develops artificial superintelligence (ASI). There’s no in-principle barrier to such technology, nor should we by default expect sociopolitical barriers to automatically prevent the innovation.
   - Indeed, we can’t even be confident that it’s more than a decade away.
   - Reasonable uncertainty should allow at least a 1% chance that it occurs within 5 years (let alone 10).
2. The stakes surrounding ASI are extremely high, to the point that we can’t be confident that humanity would long survive this development.
   - Even on tamer timelines (with no “acute jumps in capabilities”), gradual disempowerment of humanity is a highly credible concern.
3. We should not neglect credible near-term risks of human disempowerment or even extinction. Such risks warrant urgent further investigation and investment in precautionary measures.
   - If there’s even a 1% chance that, within a decade, we’ll develop technology that we can’t be confident humanity would survive, that easily qualifies as a “credible near-term risk” for purposes of applying this principle.

Conclusion: AI risk warrants urgent further investigation and precautionary measures.
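The “1% chance” reasoning above can be made concrete with a back-of-the-envelope expected-value sketch. All numbers below are purely illustrative assumptions of mine, not figures from the post; the point is only that a small probability of an enormous loss can dominate a much larger certain cost of precaution.

```python
# Illustrative expected-value sketch. Every number here is a
# hypothetical stand-in, chosen only to show the structure of
# the argument, not an estimate of actual stakes or costs.

p_catastrophe = 0.01        # the post's minimal "at least 1%" credence
loss_catastrophe = 1e9      # arbitrary units for a catastrophic outcome
cost_precaution = 1e6       # hypothetical cost of safety research

# Expected loss if we do nothing:
expected_loss = p_catastrophe * loss_catastrophe

# Precaution is worth funding whenever the expected loss it could
# avert exceeds its cost:
worthwhile = expected_loss > cost_precaution

print(expected_loss, worthwhile)  # 10000000.0 True
```

On these (made-up) numbers, a 1% risk carries an expected loss a thousand times the cost of precaution; dismissing the conclusion therefore requires rejecting a premise, not just noting that 1% is small.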
My question for those who disagree with the conclusion: which premise(s) do you reject?
[Edited to add:] See also:
- Helen Toner’s “Long” timelines to advanced AI have gotten crazy short
- Kelsey Piper’s If someone builds it, will everyone die?, and
- Vox’s How to kill a rogue AI—tagline: “none of the options are very appealing”.
Of course, there’s a lot of room for disagreement about what precise form this response should take. But resolving that requires further discussion. For now, I’m just focused on addressing those who claim not to view AI safety as worth discussing at all.



Often I find that philosophers' dismissals of AI risk are driven by a sort of fatalism, and that such philosophers can sometimes be swayed by a quick case for tractability along the following lines. With AI risk, the dangerous technology doesn't exist yet (in contrast to nukes), we can shape its features (in contrast to pandemics and asteroids), and the barriers to causing harm are high (in contrast to engineered pandemics and climate change, where they are low). To make a difference, we just need to persuade a small number of (admittedly very powerful and motivated) people to do things a little bit differently.
And it helps to note that reducing AI risk is especially tractable *for philosophers*. To address nuclear, pandemic, asteroid, or climate risks, you have to learn a new field, and your philosophical skills are approximately useless. With AI risks, that's not true: philosophical skills are useful. And since Claude Code can now design and run ML experiments for you, you don't even have to learn much of a new field.
Premise zero should be "ASI defined," and premise one should be "ASI possible." Most dismissive views seem to imagine ASI as something very different from what safety researchers mean by it; further, they don't believe it is possible at all in some sense, though this skepticism is easy to conflate with skepticism about "ASI soon."