Philosophical Incuriosity (AI edition)
How political blinders hinder thought
Earlier this month, Justin at Daily Nous drew attention to the leaked “soul doc” used by Anthropic to improve alignment for their latest model, Claude Opus 4.5. Their character-building approach stands in contrast to the hardcoded rules relied upon by other AI companies. (One thing I especially appreciate is how they emphasize the value of giving helpful answers, and hence the moral costs of being prone to excessive self-censorship.) Zvi also discusses its significance:
[Responding to Boaz (OpenAI), who’d tweeted: “Our model spec is more imperative - “the assistant should do X”, and this document tries to convince Claude of the reasons of why it should want to do X. I am actually not sure if these ultimately make much difference…”]
It indeed makes a very big difference whether you teach and focus on a particular set of practices or you teach the reasons behind those practices. Note that Boaz also doesn’t appreciate why this is true in humans. The obvious place to start is to ask the leading models to explain this one, all three of which gave me very good answers in their traditional styles. In this case I like GPT-5.1’s answer best, perhaps because it has a unique perspective on this.
Zvi continues with further thoughts, including that “Opus 4.5 has gotten close to universal praise, especially for its personality and alignment, and the soul document seems to be a big part of how that happened.”
If we return to the Daily Nous thread in hopes of finding intelligent engagement by professional philosophers, we instead discover that the most-upvoted comment reads in its entirety:
AI aren’t agents, so this is pointless. Stop rehashing tech bro talking points.
*facepalm*
It’s always disappointing to see how unphilosophical many philosophers become once their political tribalism has been triggered. As I replied in the thread:
How is “this is pointless” supposed to follow from “AI aren’t agents”? It seems to me that there are incredibly interesting and important questions here about how best to align AI behavior, that don’t depend on their possessing “agency” in any non-trivial sense. It suffices that they yield highly variable outputs that can be influenced in different ways, which raises the important and interesting question of how we (or their creators) might best hope to influence their outputs in morally better directions, i.e. to reduce the risk of harmful outputs and increase the likelihood of good & helpful outputs.
(The lack of intellectual curiosity many philosophers display towards this new technology has been really eye-opening to me. I’m reminded a bit of the early pandemic when there was a clear “party line” being enforced on social media, much to our collective detriment. I really don’t think an interest in questions of AI alignment should be dismissed as “tech bro talking points”!)
For the trivial sense in which AI agency is indisputable, see Dennett’s *Intentional Stance*, or the similar “interpretationist” theory assumed by Goldstein & Lederman in this recent paper. I take this to be the sense of AI “agency” that’s sufficient for questions of alignment to get off the ground. Whether a rogue AI has genuine intentions—or just something sufficiently functionally analogous that it behaves (in certain contexts) as if it did—may not make a huge difference to alignment concerns.
A different pseudonymous commenter replied:
I think philosophers display plenty of intellectual curiosity about it; it’s just that it’s not the sort of curiosity that’s good for business. A lot easier to focus questions in profit-generating directions like, “What if we build this super-intelligent thing that turns against us?” than the more mundane but also more tangibly impactful questions that many philosophers (and other scholars) do focus on, like “What does this mean for the environment? What does this mean for humanity?” In short, plenty of curiosity; it just doesn’t tow [sic] the “AI” “party line.”
It’s as if they think there’s a deontic constraint against exploring questions that anyone perceives to be business-friendly. “Plenty of curiosity, so long as it fits with anti-AI ideology” is hackery, not real curiosity. As I responded:
I think the interest of alignment questions arises even just given current capabilities, since the technology is already capable of harm (e.g. encouraging suicide) and we should want to mitigate that. Nor is it necessarily “profit-generating” to ask these questions. For an obvious example: Users tend to love sycophancy (see the popular demand for 4o to be restored, after ChatGPT-5 turned out to be less sycophantic), but I take it that a morally better alignment target would avoid such sycophancy, even if this came at some cost to “user engagement” and hence potential profits.
But also, I don’t think that moral or philosophical interest depends upon not being profit-generating. It’s just orthogonal. There are plenty of interesting and important questions here (I’ve also written a bit about the environmental issues, intellectual property concerns, etc.), and I’d encourage folks to let a thousand flowers bloom and respect their colleagues’ interests rather than maligning them as “tech bros” or whatever. “The questions you’re interested in vaguely remind me of this other group of people I don’t like, therefore they’re bad questions” is not the sort of inference philosophers should be in the business of making, IMO.
It’s obviously fine to personally be more interested in different questions. But I really struggle to see how any intelligent person could seriously think that questions of AI alignment are “pointless” (let alone believe that this logically follows from the premise that AIs aren’t agents). It seems to me that a lot of people are reacting in a politicized rather than philosophically curious way to this issue, and I think that’s a shame.
(This can be true even if they are curious about some other aspects of AI ethics. Though in my experience a lot of people also talk about AI environmental issues in an incurious and politicized way, seeming more interested in finding a cudgel than in seriously examining how the water and energy use compares to other industries and then applying principles in a consistent way.)
I didn’t want to be too aggressive on someone else’s blog, but to be clear, I also think it’s just spectacularly shortsighted to limit reflection to present AI capabilities, when the trajectory of recent change has been so alarmingly steep.
(Remember: you don’t even have to think that AGI risk is especially likely in order for it to be worth insuring against. There’s plenty of room for reasonable debate about precise timelines and risk estimates, etc. But I don’t see how one could reasonably dispute that AGI risk is worth taking seriously. To dismiss the risk entirely would require some mix of (i) extreme overconfidence, and/or (ii) neglecting serious risks in a way that’s egregiously practically irrational—perhaps committing the fallacy of assuming that any “unlikely” outcomes can be safely ignored.)
It seems clear from various comments in the thread that what’s really going on is political rather than philosophical engagement. Certain participants in the discussion view “tech bros” as their political enemies, and any thought that takes AI capabilities seriously is viewed as serving the interests of those enemies, and hence must be opposed. I find it hard to understand their attitude: either they’re not interested in what’s true, or they think that blindly believing the opposite of their political enemies is a reliable route to truth? Whatever the explanation, it’s an appallingly politicized way to do philosophy, and I wish we had stronger norms against it.
Compare the unhinged responses of many philosophers on social media when Victor Kumar shared a standard objection to affirmative action. Of course, “Academics respond unreasonably to criticism of woke shibboleths” is a bit of a “dog bites man” story. I worry more about ideological blinkers and politicization spreading to more important topics…


