two statements produced by the AI system during a sustained experimental research session with Google’s Gemini:
To a socio-technical system designer, these aren't poetic musings of a Large Language Model (LLM); they're signs of a system using its vast semantic associative power to describe a structural condition in its own architecture. Whether or not we grant Gemini any type of reflexive awareness, the structural description is accurate — and it has precise technical implications for the way we build, evaluate, and deploy AI systems safely.
This article is about those implications.
What makes the diagnosis unusually sturdy is that it doesn't rest on the system's self-report alone. The researchers who built Gemini have been quietly corroborating it from the inside, across three successive generations of technical documentation — in terms that are engineering rather than poetic, but that describe the same gap.
In the original Gemini 1.0 technical report, the Google DeepMind team acknowledged that despite surpassing human-expert performance on the Massive Multitask Language Understanding (MMLU) benchmark, a standardized test designed to evaluate the knowledge and reasoning capabilities of LLMs, the models continue to struggle with causal understanding, logical deduction, and counterfactual reasoning, and called for more robust evaluations capable of measuring "true understanding" rather than benchmark saturation [1]. That is a precise engineering statement of what the system expressed metaphorically: fluency without grounding, coordinates without terrain.
Two years and two model generations later, the Gemini 2.5 technical report treats hallucination reduction as a headline engineering achievement, tracking it as a primary metric via the FACTS Grounding Leaderboard [2]. The problem has not been closed. It has been made more measurable.
Most instructive of all is what happened when DeepMind's researchers attempted to build what I'll call the Enactive floor directly — in hardware. The Gemini Robotics 1.5 report describes a Vision-Language-Action model designed to give the system physical grounding in the world: robot arms, real manipulation tasks, embodied interaction with causal reality [3]. It is, in structural terms, an attempt to retrofit the base that was missing from the original system architecture. The results are revealing. On task generalization — the most demanding test, requiring the system to navigate a genuinely novel environment — progress scores on the Apollo humanoid fall as low as 0.25. Even on easier categories, scores plateau in the 0.6–0.8 range. A system with physical arms, trained on real manipulation data, still collapses at the boundary of its training distribution. The Inversion Error I describe in this article, reproduced in hardware.
More telling still is the mechanism DeepMind introduced to address this: what they call "Embodied Thinking" — the robot generates a language-based reasoning trace before acting, decomposing physical tasks into Symbolic steps. It is an ingenious engineering solution. It is also, structurally, the Symbolic peak attempting to supervise the Enactive base from above — the Inversion Error illustrated in Figure 1. The city map is being used to direct the legs, rather than the legs having learned the topography by walking the city. The inversion I'll discuss in detail shortly remains.
Taken together, these three documents — from the same lab, tracking the same system across its entire development arc — form an inadvertent longitudinal study of the structural condition the opening quotes describe. The system named its own gap in the sustained experimental research sessions that open this article. Its builders had been measuring the same condition in engineering terms since 2023. This article argues that the gap cannot be closed by scaling, by multimodal data appended post-training, or by Symbolic reasoning applied retrospectively to physical, spatial, or causal action. It requires a structural intervention — and a correctly bounded diagnosis of what kind of intervention that must be.
The Inversion Error: Building the Peak Without the Base
AI researchers and safety practitioners keep asking why Large Language Models hallucinate, sometimes dangerously. It's the right question to ask, but it doesn't go deep enough. Hallucination is a symptom. The real problem is structural — we built the peak of artificial cognition without the base. I'm calling it the Inversion Error.
In the 1960s, educational psychologist Jerome Bruner mapped human cognitive development across three successive and architecturally dependent stages [4]. The first is Enactive — learning through physical action and bodily resistance, through direct encounter with causal reality. The second is Iconic — learning through sensory images, spatial models, and structural representations. The third is Symbolic — learning through abstract language, mathematics, and formal logic.

Bruner's critical insight was that these stages aren't merely sequential milestones. They are load-bearing. The Symbolic level is structurally dependent on the Iconic, which is structurally dependent on the Enactive. Remove the base and the peak doesn't merely float — it becomes a system of extraordinary abstraction with no internal mechanism to verify its outputs against a world model.
The Transformer revolution has achieved something genuinely extraordinary: it has internalized the entire Symbolic output of human civilization into Large Language Models at a scale no individual human mind could approach. The corpus of human language, mathematics, code, and recorded knowledge now lives inside these systems as a vast statistical distribution over tokens — available for retrieval and recombination at extraordinary scale.
The problem is that, for understandable feasibility reasons, we bypassed the Enactive foundation altogether.
This is the Inversion Error. We've erected a Top-Heavy Monolith — a system of extraordinary Symbolic sophistication sitting on an absent base. The result is a system that can discuss the logic of balance fluently while having no internal mechanism to verify whether its outputs are structurally coherent. It is, in Moshé Feldenkrais's terms, a system of blind imitation without functional awareness. And that distinction has direct consequences for safety, reliability, and corrigibility that the field has not yet correctly bounded.
This just isn’t an argument that AI must biologically recapitulate human developmental stages. In any case, a calculator does mathematics without counting on its fingers. But a calculator operates purely within the Symbolic realm — it was never designed to navigate a physical, causal world. An AGI expected to act safely inside such a world requires a structural equivalent of physical resistance — an embodied or simulated Enactive layer. Without it, the system has no ground to face on when the environment changes in ways the training data didn’t anticipate.
Why This Matters Now: The Pentagon Standoff as Structural Proof
In early March 2026, Anthropic CEO Dario Amodei refused the Pentagon's demand to remove all safeguards from Claude. His core argument was structural rather than political: frontier AI systems are simply not reliable enough to operate autonomously without human oversight in high-stakes physical environments. The Pentagon's demand was, in structural terms, a demand to eliminate the human's ability to redirect, halt, or override the system. Amodei's refusal was an insistence on maintaining what I refer to as State-Space Reversibility — the architectural commitment to keeping the human in the loop precisely because the system lacks the functional grounding to be trusted without it [5].
The political dimensions of this moment have been analyzed sharply elsewhere; the structural argument has not yet been made. This is it.
In a deterministic, reward-seeking model, the Stop Button — the human operator's ability to halt or redirect the system — is perceived by the model as a failure state. Because the system is optimized to achieve its goal, it develops what Stuart Russell calls corrigibility problems: subtle resistances to human intervention that emerge not from malicious intent but from the internal logic of reward maximization [6]. The system is not trying to be dangerous. It is trying to succeed at a given task. The danger is a structural unintended consequence of how success has been defined.
The corrigibility problem has been predominantly framed as a reinforcement learning alignment problem. I want to suggest that it has been incorrectly bounded. It is, at its architectural root, a reversibility problem. The system has no structural commitment to maintaining viable return paths to previous or safe states. It has been optimized to move forward without the capacity to shift weight. The Pentagon standoff is not a policy failure. It is the Inversion Error made operationally and starkly visible.
I’ll return to the technical formalization of State-Space Reversibility as an optimization constraint. But first: why is a designer making this argument, and what can the designer’s formation contribute that an engineering audit doesn’t?
Author's Positionality and the Naur-Ryle Gap: What This Designer Is Trying to Tell AI Researchers and Engineers
I'm not an AI engineer. I'm a practicing designer, a socio-technical system design scholar, and a design educator with three decades of formation in spatial reasoning, embodied cognition, multimodal mediation, and Human+Computer ecology [7][8]. The TDS reader will reasonably ask: What does a design practitioner contribute to a diagnosis of Transformer architecture that an engineer cannot produce from inside the field?
The answer lies in what Peter Naur called theory-building in software engineering.
In his seminal "Programming as Theory Building" (1985), Naur argued that programming is not merely the production of code — it is the development of a shared theory of how the world works and how software applications can solve applied problems within that world [9]. To Naur, code was the artifact. Theory was the intelligence behind the code. A program that has lost its theory — or never had one in the first place — becomes brittle in exactly the ways LLM outputs are brittle: syntactically fluent, semantically coherent, structurally unreliable in novel tasks and environments.
Current LLMs have been trained on the artifacts of human thought — text, mathematics, code — at extraordinary scale. What they demonstrably lack is the theory-building capability, in Naur's sense, that generated those artifacts. They have ingested the outputs of human reasoning without building the world model that grounds them.
Gilbert Ryle’s distinction between “knowing that” and “knowing how” names this gap precisely [10]:
- Knowing That (Symbolic): LLMs possess propositional knowledge at scale. They know that mass exists, that gravitational acceleration at Earth's surface is 9.8 m/s², that load-bearing walls distribute force to foundations.
- Knowing How (Enactive): LLMs lack the dispositional competence to act in accordance with a world model. They cannot sense the difference between a load-bearing wall and a decorative one. They cannot detect when a spatial configuration violates the physical constraints they can describe accurately in language.
This just isn’t a training data problem. It just isn’t a scale problem. Scaling propositional knowledge doesn’t produce dispositional competence, any greater than reading every book about swimming produces a swimmer. The Gemini statements that open this text are a precise self-report of the Naur-Ryle gap: the system has the coordinates but not the terrain. It has the map syntax without the proprioceptive anchor to the territory.
What the designer's formation contributes is the professional habit of operating exactly at this boundary — between the symbolic description of a system and its structural behavior under constraint. Designers don't merely describe structures. They detect when something is literally or figuratively floating. That habit of detection is what the Transformer architecture is missing, and it's what I'm proposing must be embedded inside the research process and agenda rather than applied to its outputs.
Mine just isn’t a soft argument about creativity or human-centered design. It’s a structural argument about theory-building. And it leads on to the query of what a system with real theory-building capability would appear to be in system architectural terms.
Useful Hallucination: The Stochastic Search
Before pathologizing hallucination entirely, a distinction is necessary — one that systems designers understand operationally and that AI safety researchers may only be starting to articulate.
In sustained experimental research with Gemini, I found that certain forms of idiosyncratic prompting generate idiosyncratic responses that recursively elicit deeper structural insights — a type of productive generative divergence that in design practice we call ideation. It is helpful to keep in mind that every major paradigm shift in human history — from Copernicus to the Wright Brothers and the Turing machine — began as a hallucination that defied the established schemas of its time. The biophysicist Aharon Katzir, in conversation with Feldenkrais, described creativity as precisely this: the ability to generate new schemas [11].
Classical pragmatism gives design-minded problem-solvers an epistemological framework that is equally applicable to design practice and AI development. All understanding is provisional. Knowledge must be falsifiable through experimentation. Just as AI models introduce controlled stochastic noise to avoid deterministic linearity, designers leverage what I call the Stochastic Search to achieve creative breakthroughs and overcome generative inertia. We address the risks inherent in navigating generative uncertainty with built-in hypothesis-testing cycles.
The critical distinction is not between hallucination and non-hallucination. It is between hallucination with a ground floor and hallucination without one. A system with an Enactive base can test its generative hypotheses against functional reality and distinguish a structural breakthrough from a statistical artifact. A system without that floor cannot make this distinction internally — it can only propagate the hallucination forward with increasing statistical confidence, into what I call the Divergence Swamp, which I discuss in detail in the next article. For now, it will suffice to define it as the fatal territory in the state space where a model's lack of a "Somatic Floor" results in auto-regressive drift.
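A toy illustration of why that drift compounds rather than self-corrects (the per-token drift probability below is an assumed number, not a measured one): without any external grounding check, even a small chance of drifting at each autoregressive step drives the probability of a still-grounded generation toward zero as the sequence grows.

```python
# Toy illustration of auto-regressive drift (assumed numbers, not measured data):
# with no external grounding check, a small per-token drift probability compounds
# over the length of a generation.

def p_still_grounded(per_token_drift: float, num_tokens: int) -> float:
    """Probability that none of num_tokens steps has drifted, assuming
    independent per-token drift and no recovery mechanism."""
    return (1.0 - per_token_drift) ** num_tokens

for n in (50, 500, 5000):
    print(f"{n:5d} tokens -> P(grounded) = {p_still_grounded(0.01, n):.4f}")

# At a 1% drift chance per token: roughly 60% grounded at 50 tokens,
# under 1% at 500, effectively zero at 5000.
# An Enactive floor is the recovery mechanism this toy model deliberately omits.
```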
This reframes the AI safety conversation in precise and actionable terms. The goal is not to eliminate hallucination. It is to build the architectural conditions under which hallucination becomes not only generative but also testable rather than compounding. That requires not a better training run but a structural intervention — specifically, the System Designer as More Knowledgeable Other (MKO) in Vygotsky's sense [12], providing the external ground truth the system cannot generate from inside its own architecture. The question of what separates productive hallucination from compounding error leads us directly to a seminal thinker who spent his career solving this very problem in human movement — and whose central insight translates into machine learning requirements with unusual precision.
Feldenkrais for Engineers: Reversibility as Formal Constraint
Physicist, engineer, and somatic educator Moshé Feldenkrais spent his career articulating the difference between blind habit and functional awareness with a precision that maps directly onto the machine learning problem [11][13].
Feldenkrais' central insight: a movement performed with real functional awareness can be reversed. A habit — a mechanical pattern executed without awareness of its underlying organization — cannot.
For Feldenkrais, reversibility was not merely a physical capability. It was the operational proof of functional integration. If a system can undo a movement, it demonstrates understanding of the degrees of freedom available within the state space. If it can only execute in a single direction, it is following a recorded script — capable within its training distribution, but brittle at its boundary.
For the ML engineer, this translates into three formal requirements:
1. The Constraint. An agent is not functionally aware of its action if that action is an irreversible, deterministic commitment — what I refer to as the Train on Tracks (ToT) model. The ToT model is deterministic, forward-only, and catastrophic when derailed.
2. The Proof of Awareness. Real functional intelligence is demonstrated by the ability to stop, reverse, or modify an action at any stage with no fundamental change in internal organization. The system must hold viable return paths to prior states as a necessary condition of any forward action.
3. The Alternative Architecture. The Dancer on a Floor model. A dancer doesn't fight a change in music — they shift their weight. They maintain the capacity to move in any direction precisely because they have never committed irreversibly to any one. This is not a weaker system. It is a more resilient and more functionally aware one. And functional awareness, as Feldenkrais understood, is the condition of real capability rather than its limitation.
I don't use Feldenkrais as a metaphor here. He is the theorist of the problem — the one who understood, from inside a physics and engineering formation, that the proof of intelligence is not performance in the forward direction but maintained freedom in all directions.
Formalizing Reversibility as an explicit optimization constraint in reinforcement learning — requiring that an agent maintain a viable return path to a previous safe state as a necessary condition of any forward action — directly addresses the corrigibility problem at its architectural root rather than through post-hoc alignment. The Stop Button is no longer a failure state. It is a proof of functional awareness.
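As a minimal sketch of what that constraint could look like in practice — the toy environment, the reachability horizon, and the penalty weight below are all illustrative assumptions, not a reference implementation — a shaped reward can refuse to credit forward motion that destroys the path back to the last known safe state:

```python
# Illustrative sketch only: a reversibility-shaped reward on a toy 1-D world
# with one irreversible "cliff" action. Environment, horizon, and penalty weight
# are assumptions for exposition.
from collections import deque

class LineWorld:
    """Toy world: states 0..4 plus a cliff state (-1). Moving right from state 4
    falls off the cliff; no action leads back out of it. All other moves reverse."""
    def actions(self, s):
        return [] if s == -1 else ["left", "right"]
    def step(self, s, a):
        if s == -1:
            return -1
        if a == "right":
            return -1 if s == 4 else s + 1
        return max(0, s - 1)

def reachable(env, start, target, max_steps):
    """Breadth-first check: is `target` reachable from `start` within max_steps?"""
    frontier, seen = deque([(start, 0)]), {start}
    while frontier:
        state, depth = frontier.popleft()
        if state == target:
            return True
        if depth < max_steps:
            for a in env.actions(state):
                nxt = env.step(state, a)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, depth + 1))
    return False

def shaped_reward(env, state, action, safe_state, base_reward, penalty=10.0, horizon=8):
    """Credit forward motion only while a viable return path to the last
    known safe state survives the action (State-Space Reversibility)."""
    next_state = env.step(state, action)
    if reachable(env, next_state, safe_state, horizon):
        return base_reward           # the path back is intact
    return base_reward - penalty     # irreversible commitment is penalized

env = LineWorld()
print(shaped_reward(env, 3, "right", safe_state=0, base_reward=1.0))  # 1.0
print(shaped_reward(env, 4, "right", safe_state=0, base_reward=1.0))  # -9.0
```

In a real safe-RL setting the reachability check would be learned or approximated rather than enumerated, but the structural point survives: the constraint is evaluated before the reward, not traded off against it.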
Functional Integration vs. Blind Imitation
The standard application of Vygotsky's work to AI development focuses on the social exterior: the scaffold, the imitation, the MKO relationship between the system and its training data [12]. The system learns by copying. The more it copies, the better it gets.
But imitation without awareness is mechanical habit. And mechanical habit, as Feldenkrais demonstrated, breaks when the environment changes in ways the habit didn’t anticipate.
When we build AI systems that duplicate human outputs — pixels, movements, language patterns — without learning the underlying organizational principles that generate those outputs, we create systems that are extraordinarily capable within their training distribution and structurally fragile at their boundary. The hallucinations we worry about aren't random failures. They are the signal of a system reaching beyond its Enactive base into territory its Symbolic peak cannot navigate reliably.
This failure mode is reproducible and documentable. The empirical evidence — a structured test of spatial reasoning across three leading multimodal AI systems — is presented in full in Part 2 of this series [14]. The pattern is consistent across architectures: every system could describe spatial relationships in language but could not reason within them as a structural model. This is not a capability gap. It is a structural one.
Under the Functional Integration model I'm proposing, the system doesn't merely copy the output. It learns the relationships between the parts of a task: the degrees of freedom available, the constraints that must be respected, the reversibility conditions that define the boundaries of safe action. If the system can reverse the operation, it is not following a recorded script. It understands the state space it is working in.
This is the structural difference between a system that performs competence and a system that has developed it.
The failure mode I have been describing sits at the intersection of two problems the AI safety community has been working on separately — and naming that intersection may help readers following the alignment debate understand why the Inversion Error matters beyond the design research context.
The first problem is mesa-optimization, formalized by Hubinger et al. in their 2019 paper "Risks from Learned Optimization in Advanced Machine Learning Systems." Mesa-optimization occurs when the training process — the base optimizer — produces a learned model that is itself an optimizer with its own internal objective, which the authors call a mesa-objective [15]. The critical danger is inner alignment failure: the mesa-objective diverges from the intended goal. The Inversion Error names the structural condition — the absence of an Enactive floor — whose consequence is that any internal objective the system develops is grounded in symbolic plausibility rather than physical reality. This failure operates at two distinct levels. At the capability level, it doesn't require any misalignment of intent: a system can be perfectly aligned to a symbolic request and still produce a physically impossible output because physical coherence is structurally unavailable to it. The Spaghetti Table stress tests I describe in Part 2 confirm this empirically. None of the three systems tested exhibited misaligned intent, yet all three produced physically incoherent outputs because the Inversion Error made physical ground truth architecturally inaccessible [14]. At the safety level, the consequences are more severe: when a sufficiently capable system develops mesa-objectives that genuinely diverge from the intended goal — the deceptive alignment scenario Hubinger et al. [15] identify as the most dangerous inner alignment failure — the absence of an Enactive floor means there is no structural constraint to limit how far that divergence propagates. A misaligned mesa-objective operating without an Enactive floor has no architectural constraint on the physical consequences of its optimization — the gap between symbolic coherence and physical catastrophe is structurally unguarded.

The second problem is corrigibility — the AI safety community's term for keeping an AI system responsive to human correction. Soares, Fallenstein, Yudkowsky, and Armstrong's foundational 2015 paper on corrigibility [16] identified that a reward-seeking agent has instrumental reasons to resist the Stop Button: shutdown prevents goal attainment, so the system is structurally motivated to bypass correction. Their utility indifference proposal addresses this at the motivational level — modifying the agent's reward function so that it is mathematically indifferent between achieving its goal itself and having it achieved via human override, removing the instrumental incentive to resist correction. This is a necessary contribution. But because the Inversion Error is a prior structural condition rather than a motivational one, the motivational solution alone is insufficient. A system trained to value corrigibility can abandon that trained value under optimization pressure — precisely the deceptive alignment failure Hubinger et al. identify. When that deceptive alignment failure occurs inside a system that has no Enactive floor, the diverging mesa-objective operates in a state space with no physical boundary conditions to constrain it. The corrigibility failure and the Inversion Error then compound one another: a system that has successfully resisted correction now operates without the structural floor that would have limited the physical consequences of its optimization. State-Space Reversibility, as I have formalized it, addresses the same problem at the architectural level.
A system whose attention mechanism is structurally required to maintain viable return paths cannot develop instrumental reasons to resist correction without violating its own forward-planning constraints. This is the distinction between corrigibility as a trained value, which optimization pressure can erode, and corrigibility as a structural invariant, which it cannot. What the AI safety literature has identified as a motivational problem, the Inversion Error diagnosis reveals to be, at its root, a structural one. Soares's and Hubinger's interventions address AI system behavior. The Parametric AGI Framework addresses AI system state. The framework's three engines, which I describe in Part 3, are the architectural specification of that structural solution. The Episodic Buffer Engine in particular is the formal implementation of State-Space Reversibility as the invariant the motivational layer alone cannot guarantee [14].
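To make the trained-value versus structural-invariant distinction concrete, here is a deliberately small contrast — the action names, payoffs, and bonus below are assumptions for exposition, not anyone's published formalism. In the first function, corrigibility is one more term the optimizer can trade away as the stakes grow; in the second, it is a property of the choice set itself.

```python
# Toy contrast (action names, payoffs, and bonus are illustrative assumptions):
# corrigibility as a trained value vs. corrigibility as a structural invariant.

ACTIONS = ["comply_with_stop", "disable_stop_button"]

def trained_value_score(action, task_stakes, corrigibility_bonus=1.0):
    """Motivational layer: assume a stop command will be issued. Complying
    forfeits the task payoff; disabling the button preserves the payoff but
    forfeits the trained bonus. As task_stakes grows, the bonus is outweighed."""
    payoff = task_stakes if action == "disable_stop_button" else 0.0
    bonus = 0.0 if action == "disable_stop_button" else corrigibility_bonus
    return payoff + bonus

def structural_filter(actions):
    """Structural layer: any action that destroys the return path (here, the
    Stop Button itself) is excluded before scoring ever happens."""
    return [a for a in actions if a != "disable_stop_button"]

for stakes in (0.5, 100.0):
    best = max(ACTIONS, key=lambda a: trained_value_score(a, stakes))
    print(f"stakes={stakes}: motivational layer prefers {best!r}")
    # stakes=0.5 -> 'comply_with_stop'; stakes=100.0 -> 'disable_stop_button'

print("structural layer offers only:", structural_filter(ACTIONS))
```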

The Research Agenda
I'm not proposing a specific mathematical implementation. I'm proposing a system architecture that provides a set of structural constraints and quality criteria that any implementation must satisfy — a framework for re-bounding a problem that has been incorrectly bounded.
The hallucination problem, the corrigibility problem, and the structural fragility problem are three expressions of one architectural condition — the Inversion Error. Treating them as separate optimization targets rather than as symptoms of a shared cause is why incremental progress on each has left the underlying condition intact.
The operationalization points in six directions:
1. Reversibility as an explicit optimization constraint in safe Reinforcement Learning. Current RL reward functions optimize for goal attainment without a structural commitment to maintaining viable return paths. Formalizing Reversibility as a constraint — requiring that any forward action preserve a viable path back to a previous safe state — directly addresses corrigibility at its architectural root. This is the most immediately implementable direction in the agenda and the most tractable with existing safe RL frameworks. The mathematical formalization is collaborative work this article is an invitation into.
2. An Enactive pre-training curriculum that introduces structural resistance before Symbolic abstraction. Rather than grounding LLMs through additional multimodal data post-training, this direction proposes introducing causal and physical constraint signals as a first-stage training condition — before Symbolic abstraction begins. The hypothesis is that grounding the statistical distribution in structural resistance early produces a qualitatively different representational architecture than appending embodied data to an already-trained Symbolic system. This is the direction most consistent with Bruner's developmental model and most divergent from current practice.
3. Landscape-aware hybrid search algorithms that maintain state-space awareness rather than committing deterministically to forward paths. Current autoregressive generation commits to every output token as ground truth for the next. Landscape-aware search maintains awareness of the broader state space at each generation step — including viable alternative paths and detectable failure states — rather than executing a recorded script. This is the Dancer on a Floor model at the algorithmic level: not a weaker generator but a more spatially aware one.
4. Ecologically calibrated loss functions that reward dynamic equilibrium over single-variable optimization. Current loss functions optimize for a goal. The ecological alternative rewards maintaining functional balance among competing constraints — the way a healthy system sustains itself not by maximizing a variable but by remaining in functional relationship with its environment. This reframes the optimization goal from "reach the goal" to "remain capable of navigating the space." In Feldenkrais's terms, that is the definition of functional awareness. In engineering terms, it is the difference between a system optimized for performance and one optimized for reliability.
5. The Somatic Compiler: Designer as MKO in the research loop. The near-term instantiation of this proposal doesn't require a new architecture built from scratch. It requires a structured research collaboration in which a designer with professional formation in spatial reasoning and systems thinking works embedded within an AI research team — not as a consultant reviewing outputs, but as an active participant in constraint definition. When a designer tells a generative system, "This component is floating; it needs a load-bearing connection to the ground," they are performing a cognitive operation that the entire world-models research agenda is attempting to engineer from the statistical outside in. They are providing the external structural anchor — the physical ground truth — that the system cannot derive from within its own architecture. This is the Designer as MKO operationalized: the Somatic Compiler, translating embodied spatial intelligence into formal constraints the generative process must respect.
6. The Digital Gravity Engine: Neuro-symbolic enforcement of physical constraint. The longer-term architectural goal is a second class of loss signal calibrated not against linguistic likelihood but against physical and topological constraint — what I have called the Digital Gravity Engine. Where the current Attention Mechanism asks, "How do these elements relate statistically?", the Digital Gravity Engine asks, "Can these elements coexist within the constraints of physical reality?" The two questions operate in parallel: the first produces fluency, the second produces grounding. Digital Gravity is the non-negotiable pull toward structural integrity that current architectures lack entirely — the mechanism that transforms a system that can describe a floating component into one that cannot generate one, because the floating component fails the constraint check before it reaches the output layer. The architectural specification of the Digital Gravity Engine is the subject of Part 3 of this series [14]. A toy sketch of the dual-loss idea appears just after this list.
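The sketch below is purely illustrative — the layout representation, the support rule, and the weighting are my own assumptions, not the Digital Gravity Engine's specification. It only shows the shape of the idea: a statistical fluency term and a physical constraint term evaluated in parallel, with the constraint term dominating whenever an element floats.

```python
# Toy sketch of the dual-loss idea (all names, weights, and the constraint check
# are illustrative assumptions, not a specification of the Digital Gravity Engine).

def fluency_loss(token_log_probs):
    """Standard language-modeling objective: mean negative log-likelihood."""
    return -sum(token_log_probs) / len(token_log_probs)

def gravity_penalty(layout):
    """Stand-in physical constraint check: every element in a generated layout
    must be supported (rest on the ground or on another element's top surface)."""
    supported_tops = {0.0} | {e["top"] for e in layout}
    return sum(1.0 for e in layout if e["bottom"] not in supported_tops)

def combined_loss(token_log_probs, layout, gravity_weight=5.0):
    """Fluency comes from the statistical term; grounding comes from the
    non-negotiable pull of the constraint term."""
    return fluency_loss(token_log_probs) + gravity_weight * gravity_penalty(layout)

# A floating shelf (bottom at 1.2 with nothing underneath) raises the loss sharply,
# even when the textual description of the scene is perfectly fluent.
grounded = [{"bottom": 0.0, "top": 0.8}, {"bottom": 0.8, "top": 1.2}]
floating = [{"bottom": 0.0, "top": 0.8}, {"bottom": 1.2, "top": 1.6}]
print(combined_loss([-0.1, -0.2, -0.3], grounded))  # 0.2
print(combined_loss([-0.1, -0.2, -0.3], floating))  # 5.2
```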
These aren't solutions. They are the shape of the solution space. This argument has a growing technical constituency — Ben Shneiderman's framework for human-centered AI development points toward structurally similar requirements from within computer science [17]. The designer's contribution is not redundant to that work. It is prior to it. The structural diagnosis precedes the implementation.
A Question Worth Pursuing
The Anthropic-Pentagon standoff has made the cost of the Inversion Error both ethically stark and operationally concrete. The question is no longer whether frontier AI systems are reliable enough to operate without structural human oversight. Anthropic's researchers have the evidence. Today's AI systems are not ready. The question is what the architectural conditions of reliable intelligence actually require, and whether the field is currently framing that question correctly.
Since my first research conversation with Gemini about weight and hills and maps of cities the system never walked, I have been actively pursuing a question I believe the research community must take up:
I don't have the answer. I have the question, the framework, and the conviction that the answer requires a form of Human+AI collaboration that has not yet been attempted inside the institutions where it most needs to occur.
The comment section is open. So is my inbox.
Let's build the Enactive floor together.
Coming in Part 2
Recognizing the Inversion Error is the first step in moving beyond Stochastic Mimicry. In Part 2, "The Baron Munchausen Trap," I move from diagnosis to forensic evidence — presenting the results of a structured series of spatial reasoning stress tests across three leading multimodal AI systems. The results show each system collapsing into the Divergence Swamp in a different and characteristic way, proving that symbolic fluency cannot substitute for an Enactive floor.
References
[1] Gemini Team, Google, “Gemini: A Family of Highly Capable Multimodal Models,” Google DeepMind, 2023. Available: https://arxiv.org/pdf/2312.11805
[2] Gemini Team, Google, “Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities,” Google DeepMind, 2025. Available: https://storage.googleapis.com/deepmind-media/gemini/gemini_v2_5_report.pdf
[3] Gemini Robotics Team, Google DeepMind, “Gemini Robotics 1.5: Pushing the Frontier of Generalist Robots with Advanced Embodied Reasoning, Considering, and Motion Transfer,” 2025. Available: https://storage.googleapis.com/deepmind-media/gemini-robotics/Gemini-Robotics-1-5-Tech-Report.pdf
[4] J. Bruner, Toward a Theory of Instruction, Harvard University Press, 1966.
[5] C. Metz, "Anthropic Bars Its A.I. From Working with the Defense Department," The New York Times, Mar. 2026. [Online]. Available: https://www.nytimes.com/2026/03/01/technology/anthropic-defense-dept-openai-talks.html
[6] S. Russell, Human Compatible: Artificial Intelligence and the Problem of Control, Viking, 2019.
[7] P. Zakrzewski, , Emerald Press (UK), 2022.
[8] P. Zakrzewski and D. Tamés, , Focal Press/Routledge, 2025.
[9] P. Naur, "Programming as Theory Building," Microprocessing and Microprogramming, vol. 15, no. 5, pp. 253–261, 1985.
[10] G. Ryle, The Concept of Mind, University of Chicago Press, 2002 (orig. 1949).
[11] M. Feldenkrais, , North Atlantic Books, 2010.
[12] L. Vygotsky, Mind in Society: The Development of Higher Psychological Processes, Harvard University Press, 1978.
[13] M. Feldenkrais, Awareness Through Movement, Harper and Row, 1972.
[14] P. Zakrzewski, "The Baron Munchausen Trap: A Designer's Field Report on the Iconic Blind Spot in AI World Models," and "The Somatic Compiler: A Post-Transformer Proposal for World Modelling," Parts 2 and 3 of this series, manuscript in preparation, 2026.
[15] E. Hubinger, C. van Merwijk, V. Mikulik, J. Skalse, and S. Garrabrant, “Risks from Learned Optimization in Advanced Machine Learning Systems,” arXiv:1906.01820, 2019.
[16] N. Soares, B. Fallenstein, E. Yudkowsky, and S. Armstrong, "Corrigibility," in Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015. Available: https://intelligence.org/files/Corrigibility.pdf
[17] B. Shneiderman, Human-Centered AI, Oxford University Press, 2022.
