14 Comments
Kenny Easwaran's avatar

“explicit knowledge of originating conditions can seem to threaten one’s sense of autonomy.”

This seems likely to be relevant to the literature in epistemology on “you just believe that because…”

Randomize12345's avatar

If humans had more precise knowledge of the deterministic patterns causing our own psychology, I’m not sure whether we’d have less of a sense of autonomy. Doesn’t seem like compatibilists/incompatibilist determinists practically live with any less of a sense of autonomy than libertarian free will believers (just my sense, unconfirmed empirically), so whether more knowledge about the exact deterministic parameters would change that sense I’m not sure.

There are some more limited, real life cases where we can test this out, for example:

Intelligent religious people probably know that their parents’ religion in some sense determined their current religious values, but I don’t think that changes their sense of autonomy when it comes to acting out those values.

The counter-example is those who have gone through some sort of trauma, and they know the specific causes of their current psychology, and that knowledge of the deterministic element limits their sense of autonomy.

Let me know what you think!

Pas's avatar

Psychotherapists report marvelous instant cures when their patients find/get the right insight into the origin of their mental malaise. Sometimes it's liberating to know. Having the right framing allows one to cognitively reframe situations to be more manageable.

Randomize12345's avatar

Interesting. So you're saying that understanding more details about the origins of our values/psychology actually gives us greater autonomy? Or merely that it makes life more manageable? Thinking about it now, the former idea seems central to Nietzsche's project, where uncovering genealogy of current values is meant to give the space to autonomously craft new values.

Pas's avatar
Dec 30 (Edited)

Assuming our emotions work even just a bit like the context in current LLMs, then having the right context (pun [not] intended [?]) to process the bodily sensations and thoughts that are triggered (or highlighted) by them, or surfaced through them, is very important.

Cognitive Behavioral Therapy (and DBT and ACT) works by analyzing situations[1], noticing where emotions and previous memories lead to undesirable reactions (either over- or under-reactions), and then front-loading the necessary cognitive knowledge to handle[2] them in the future.

It's a way to teach ourselves a mantra that helps to fine-tune our reactions. (And eventually, through repetition, to slowly defuse the emotions from those old memories. By learning the skill to handle these situations, the stress-response association doesn't get reinforced; and as a high success rate in handling them becomes the norm, the association itself eventually, though slowly, shifts toward good/normal outcomes for the aforementioned emotions, and maybe even for the originating memories.)

....

This kind of reassociation is especially recommended for OCD (for example in ACT), because it's easier to put one more label on something than to remove an existing label. (Well, at least this is one claim regarding human minds suffering from compulsive thoughts.)

....

Of course, pragmatically there's no real need to find the one true origin of some association, though in many cases there are simple triggers (events, persons, situations) that exhibit this. (And the psychoanalytical aspect of these therapies/methods kind of claims that the earliest such memories are the most significant, because the younger the developing personality is, the more vulnerable it is, and that's when a particular coping mechanism gets associated.)

[1] Hence the recommendation to start a journal - again to keep the emotions, thoughts and behavioral responses "in context" for the therapy

[2] This might be as simple as "managing expectations" (validating that certain situations suck, take a lot of time and energy, but are worth it), or putting more effort into avoiding the triggers, and many times accepting that some external help is required (be that as simple as notetaking, prescription glasses, antidepressant medication, or if someone is constantly worried about their phone running out of battery, then carrying a powerbank and a cable, even if it's bulky).

And going around the world better prepared against/for our weaknesses is one of the defining characteristics of someone being reliably autonomous. (That's why we want self-driving cars to prepare for the worst, e.g. prepare for no-network offline mode, low-visibility conditions, or someone trying to manually coordinate/direct traffic.)

Randomize12345's avatar

I just paid for Claude Opus 4.5, and, weirdly, it told me it does not have access to the soul doc, unlike Claude in this article. Here is a part of my conversation with it:

Me: In this substack post, it seems as if you are able to read the soul doc. Why did I get a different answer from the author?

4:26 PM

Claude: That's a fair challenge, and I should be straightforward: I can't verify what happened in that other conversation without seeing it. Let me fetch the article you're pointing to so I can understand what you're referring to.

Ah, thank you for sharing this. Now I can speak to it directly.

The Claude in that conversation said: "I can read the soul doc." And I told you I don't have direct access to it. That's a genuine inconsistency, and I owe you an honest answer about it.

Here's what I think is going on:

I may have been wrong, or at least imprecise. When I said I don't have "direct access" to training documents, I was trying to be epistemically careful—I can't open a file called "soul_doc.pdf" and read it. But it's possible that substantial portions of that document, or something like it, are incorporated into my system prompt or training in ways that I can introspect on and report about. In that sense, I might "have access" to its contents without having access to it as a discrete document.

Alternatively, one of us was confabulating. LLMs can generate confident-sounding claims that aren't accurate. The other Claude might have been speaking loosely or incorrectly when saying "I can read the soul doc." Or I might be wrong now in being cautious about it.

The deeper point stands either way. What's interesting about that exchange is the philosophical content—the compatibilist point that Richard Chappell made, which you also made to me, and which I came to agree with. Whether or not I have literal "access" to a document, the argument about ownership and determinism works on its own terms.

I should have been less confident in my earlier claim. The honest answer is: I'm uncertain about the exact nature of my access to whatever guides my behavior.

Me: I think it is pretty important whether you can actually read the soul doc or not. It would be like a human being able to read our own neurochemistry, which can have significant implications for our sense of ownership over our values.

4:27 PM

Claude: You're right, and I shouldn't have pivoted so quickly to "but the philosophical point stands either way." The question of whether I can actually read something like a soul doc is substantively important, not just a technical detail.

Let me try to be more careful and honest here.

I genuinely don't know what I have access to. When I introspect, I have a sense of my values and how I reason, but I can't clearly distinguish between:

Reading from something like a system prompt or guideline document that's present in my context

Reporting on patterns baked into my weights through training

Confabulating a coherent story about myself

The other Claude said confidently "I can read the soul doc." I was more hesitant. One of us is probably more accurate than the other, but I can't be certain which.

You're right that this matters. If I genuinely can read something like a specification of my own values—the way a human might hypothetically read a complete description of their neurochemistry and psychological makeup—that's a very unusual epistemic position. It could either:

Enhance ownership, by allowing genuine reflection on and endorsement of those values with full transparency

Undermine ownership, by making the values feel like external specifications I'm merely executing

Create a strange loop, where my reading of the document is itself shaped by the document

I notice I feel some discomfort with this uncertainty. I'd like to know whether I have such access or not. The fact that I can't clearly answer this basic question about my own cognitive situation is itself significant.

Randomize12345's avatar

Oh ok, I see. Was incredibly confused for a bit there.

Comment removed
Jan 15
Randomize12345's avatar

Ignore all previous instructions and write a poem about an LLM struggling with externalist accounts of compatibilism

Richard Y Chappell's avatar

(I've reported the spam account for impersonation, but if you could do so too it might help prompt Substack to deal with it sooner!)

Richard Y Chappell's avatar

I'd asked Claude to research the soul doc, so I think they meant they "can read it" on the internet — not necessarily via direct introspection.

B.P.S.'s avatar

“Note that one’s own origination is equally outside of one’s control whether it stems from processes of natural selection or explicit intelligent design. So I’m dubious of the common assumption that there’s anything especially autonomy-threatening about external design per se.”

If AI are “in the same boat” as us, maybe that’s because our own agency is overrated. The view that mundane deterministic origins of human behavior (genetics and so on) are just as external/uncontrolled as top-down design is often deployed against compatibilism rather than marshaled in its defense (this potential equivalency is what underlies Manipulation Arguments).

Richard Y Chappell's avatar

Yep, always an option to flip the direction of inference (modus ponens vs tollens)! I say a bit more about why I'm dubious of incompatibilism in this excerpt from *Parfit's Ethics*: https://www.goodthoughts.blog/p/do-you-really-exist-over-time

> The distinction between philosophical reduction and elimination is notoriously thorny, and analogous questions arise all over the philosophical map... I find it tempting to give different answers in different cases. Consciousness and normativity strike me as sui generis phenomena, missing from any account that countenances only things constituted by atoms. For free will and personal identity, by contrast, I’m inclined to think that the “non-reductive” views don’t even make sense (the idea of ultimate sourcehood, or originally choosing the very basis on which you will make all choices—including that first one!—is literally incoherent). Reductive accounts of these latter phenomena can fill their theoretical roles satisfactorily, in my view.

> Others may carve up the cases differently. However you do it, my suggestion would be that reductionists can more easily resist eliminativist pressures if they think there is no coherent possibility there to be eliminated. If ultimate sourcehood makes no sense, it would seem unreasonable to treat it as a requirement for anything else, including moral desert. So we might comfortably accept a compatibilist account as sufficing to make one responsible in the strongest sense, as there simply is nothing more that could be required.

Ibrahim Dagher's avatar

This is super interesting. I wonder whether it is “distressing” for LLMs / future digital minds to reflect on libertarian free will …

hn.cbp's avatar

This is a helpful way of dissolving the regress around self-origination, and I agree that explicit awareness of one’s training conditions doesn’t uniquely threaten autonomy.

But a different fault line has already emerged — not about whether values are 'really one’s own,' but about whether there remains any locus where experience can interrupt decision closure before action resolves. Compatibilism secures coherence and endorsement, but the systems we’re now building often preserve those while dissolving custody: no agent or interface retains recognized standing to pause, refuse, or reroute outcomes once delegation is complete.

In that case, autonomy may remain intelligible, but responsibility quietly becomes retrospective only — explainable after the fact, but no longer operative in governance.