The Emergence of Self-Models
The self-model analysis connects to multiple research traditions:
- Mirror self-recognition (Gallup, 1970): Behavioral marker of self-model presence. The mirror test identifies systems that model their own appearance—a minimal self-model.
- Theory of Mind (Premack & Woodruff, 1978): Modeling others’ mental states requires first modeling one’s own. Self-model precedes other-model developmentally.
- Metacognition research (Flavell, 1979; Koriat, 2007): Humans monitor their own cognitive processes—confidence, uncertainty, learning progress. This is self-model salience in action.
- Default Mode Network (Raichle et al., 2001): Brain regions active during self-referential thought. The neural substrate of high self-model salience states.
- Rubber hand illusion (Botvinick & Cohen, 1998): Self-model boundaries are malleable, updated by sensory evidence. The self is a model, not a given.
The Self-Effect Regime
As a controller becomes more capable, it increasingly shapes its own environment. The observations it receives are increasingly consequences of its own actions.
The self-effect ratio quantifies this shift. For a system with policy $\pi$ in environment $E$:

$$\rho_{\text{self}}(t) = \frac{I(a_{<t};\, o_t)}{H(o_t)}$$

where $I$ denotes mutual information and $H$ denotes entropy. This measures what fraction of the information in future observations is attributable to past actions. For capable agents in structured environments, $\rho_{\text{self}}$ increases with agent capability, and in the limit:

$$\rho_{\text{self}} \to \rho_{\max} \le 1$$

(bounded by the environment’s intrinsic stochasticity).
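The ratio can be estimated empirically from rollouts. Below is a minimal sketch, assuming a toy binary environment (invented here for illustration) in which the next observation echoes the action with probability $1 - \text{noise}$; plug-in estimates of $I(a; o)$ and $H(o)$ from sample counts give the ratio.

```python
import math
import random
from collections import Counter

def entropy(counts):
    """Shannon entropy (bits) of an empirical distribution given as counts."""
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values() if c > 0)

def self_effect_ratio(pairs):
    """Plug-in estimate of rho = I(a; o) / H(o) from (action, observation) samples."""
    joint = Counter(pairs)
    a_marg = Counter(a for a, _ in pairs)
    o_marg = Counter(o for _, o in pairs)
    h_o = entropy(o_marg)
    mi = entropy(a_marg) + h_o - entropy(joint)  # I(A;O) = H(A) + H(O) - H(A,O)
    return mi / h_o if h_o > 0 else 0.0

def rollout(noise, steps=20000, seed=0):
    """Toy environment: the next observation echoes the action with
    probability 1 - noise, otherwise it is drawn at random (the
    environment's intrinsic stochasticity)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(steps):
        a = rng.randint(0, 1)                                  # agent's action
        o = a if rng.random() > noise else rng.randint(0, 1)   # next observation
        pairs.append((a, o))
    return pairs

for noise in (0.0, 0.3, 1.0):
    print(noise, round(self_effect_ratio(rollout(noise)), 3))
```

With zero noise the ratio is 1; with pure noise the observation is independent of the action and the ratio collapses toward 0, illustrating the bound set by intrinsic stochasticity.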
Self-Modeling as Prediction Error Minimization
When $\rho_{\text{self}}$ is large, the agent’s own policy is a major latent cause of its observations. Consider the world model’s prediction task:

$$p(o_{t+1} \mid o_{\le t}) = \sum_{a_t} p(o_{t+1} \mid o_{\le t}, a_t)\,\pi(a_t \mid o_{\le t})$$

The term $\pi(a_t \mid o_{\le t})$ is the agent’s own policy. If the world model treats actions as exogenous—as if they come from outside the system—then it cannot accurately model this term. This generates systematic prediction error.
This generates a pressure toward self-modeling. Let $W$ be a world model for an agent with self-effect ratio $\rho_{\text{self}} > \rho^*$ for some threshold $\rho^*$. Then:

$$\mathcal{L}(W_{\text{no self-model}}) - \mathcal{L}(W_{\text{self-model}}) > 0$$

where $\mathcal{L}$ is the prediction loss. The gap grows with $\rho_{\text{self}}$.
Without a self-model, the world model must treat $\pi(a_t \mid \cdot)$ as a fixed prior or uniform distribution. But the true action distribution depends on the agent’s internal states—beliefs, goals, and computational processes. By including a model of these internal states (a self-model $M_{\text{self}}$), the world model can better predict $a_t$ and hence $o_{t+1}$. The improvement is proportional to the mutual information $I(z_t; a_t)$ between internal states and actions, which scales with $\rho_{\text{self}}$.
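The loss gap can be made concrete with a toy calculation, invented here for illustration: an agent whose action depends on a hidden binary goal $z$. A world model restricted to the marginal action distribution pays extra bits of log-loss relative to one that conditions on a self-model of $z$, and the gap equals $I(z; a)$.

```python
import math
import random

# Toy setting: the agent's internal state z (a binary "goal") fixes its
# action distribution. A world model that treats actions as exogenous can
# at best use the marginal p(a); one with a self-model conditions on z.
POLICY = {0: [0.9, 0.1],   # goal 0 -> mostly action 0
          1: [0.1, 0.9]}   # goal 1 -> mostly action 1

def simulate(steps=50000, seed=1):
    rng = random.Random(seed)
    data = []
    for _ in range(steps):
        z = rng.randint(0, 1)
        a = 0 if rng.random() < POLICY[z][0] else 1
        data.append((z, a))
    return data

def log_loss(data, predictor):
    """Average bits of prediction loss for the action under predictor(z) = p(a | z)."""
    return -sum(math.log2(predictor(z)[a]) for z, a in data) / len(data)

data = simulate()
marginal = [(POLICY[0][a] + POLICY[1][a]) / 2 for a in (0, 1)]  # p(a), ignoring z
no_self = log_loss(data, lambda z: marginal)     # world model without self-model
with_self = log_loss(data, lambda z: POLICY[z])  # world model with self-model
print(round(no_self, 3), round(with_self, 3), round(no_self - with_self, 3))
```

The printed gap between the two losses is (up to sampling noise) the mutual information between internal state and action, matching the claim that the improvement scales with $I(z_t; a_t)$.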
What does such a self-model contain? A self-model $M_{\text{self}}$ is a component of the world model that represents:
- The agent’s internal states (beliefs, goals, attention, etc.)
- The agent’s policy as a function of these internal states
- The agent’s computational limitations and biases
- The causal influence of these factors on action and observation
Formally, $M_{\text{self}} = p(a_t \mid z_t)$, where $z_t$ captures the relevant internal degrees of freedom.
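The four components listed above can be sketched as a data structure; all names here are illustrative, not from the source.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class SelfModel:
    """Illustrative container for the components of a self-model."""
    internal_state: Dict[str, float]   # beliefs, goals, attention weights
    policy: Callable[[Dict[str, float]], Dict[str, float]]  # z -> p(a | z)
    limitations: Dict[str, float]      # computational bounds and biases

    def predict_action(self) -> Dict[str, float]:
        """The agent's estimate of its own action distribution."""
        return self.policy(self.internal_state)

sm = SelfModel(
    internal_state={"goal_left": 0.8},
    policy=lambda z: {"left": z["goal_left"], "right": 1 - z["goal_left"]},
    limitations={"planning_horizon": 5},
)
print(sm.predict_action())
```

The causal influence on observation (the fourth bullet) enters when a world model queries `predict_action` while forecasting its own future inputs.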
Self-modeling becomes the cheapest way to improve control once the agent's actions dominate its observations. The "self" is not mystical; it is the minimal latent variable that makes the agent's own behavior predictable.
A consequence: the self-model has interiority. It does not merely describe the agent’s body from outside; it captures the intrinsic perspective—goals, beliefs, anticipations, the agent’s own experience of what it is to be an agent. Once this self-model exists, the cheapest strategy for modeling other entities whose behavior resembles the agent’s is to reuse the same architecture. The self-model becomes the template for modeling the world. This has a name in Part II—participatory perception—and a parameter that governs how much of the self-model template leaks into the world model. That parameter, the inhibition coefficient, will turn out to shape much of what follows.
The Cellular Automaton Perspective
The emergence of self-maintaining patterns can be illustrated with striking clarity in cellular automata—discrete dynamical systems where local update rules generate global emergent structure.
Formally, a cellular automaton is a tuple $(L, S, N, f)$ where:
- $L$ is a lattice (typically $\mathbb{Z}^d$ for $d$-dimensional grids)
- $S$ is a finite set of states (e.g., $S = \{0, 1\}$ for binary CA)
- $N$ is a neighborhood function specifying which cells influence each update
- $f : S^{|N|} \to S$ is the local update rule
Consider Conway’s Game of Life, a 2D binary CA with simple rules: cells survive with 2–3 neighbors, are born with exactly 3 neighbors, and die otherwise. From these minimal specifications, a zoo of structures emerges: oscillators (patterns repeating with fixed period), gliders (patterns translating across the lattice while maintaining identity), metastable configurations (long-lived patterns that eventually dissolve), and self-replicators (patterns that produce copies of themselves).
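A glider can be simulated directly with a set-based update; a minimal sketch of the rules just stated:

```python
from collections import Counter

def life_step(live):
    """One Game of Life update on a set of live (row, col) cells."""
    neighbor_counts = Counter((r + dr, c + dc)
                              for r, c in live
                              for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                              if (dr, dc) != (0, 0))
    # Birth with exactly 3 neighbors; survival with 2 or 3.
    return {cell for cell, n in neighbor_counts.items()
            if n == 3 or (n == 2 and cell in live)}

# The standard glider; after 4 steps it reappears shifted by one row and column.
glider = {(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)}

state = glider
for _ in range(4):
    state = life_step(state)
print(state == {(r + 1, c + 1) for r, c in glider})  # → True
```

Four updates reproduce the original five cells translated diagonally: identity maintained, position changed.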
Among these, the glider is the minimal model of bounded existence. Its glider lifetime $\tau_{\text{glider}} = \mathbb{E}[T_{\text{destruction}}]$—the expected number of timesteps before destruction by collision or boundary effects—captures something essential: a structure that maintains itself through time, distinct from its environment, yet ultimately impermanent.
Beings emerge not from explicit programming but from the topology of attractor basins. The local rules specify nothing about gliders, oscillators, or self-replicators. These patterns are fixed points or limit cycles in the global dynamics—attractors discovered by the system, not designed into it. The same principle operates across substrates: what survives is what finds a basin and stays there.
The CA as Substrate
The cellular automaton is not itself the entity with experience. It is the substrate—analogous to quantum fields, to the aqueous solution within which lipid bilayers form, to the physics within which chemistry happens. The grid is space. The update rule is physics. Each timestep is a moment. The patterns that emerge within this substrate are the bounded systems, the proto-selves, the entities that may have affect structure.
This distinction is crucial. When we say “a glider in Life,” we are not saying the CA is conscious. We are saying the CA provides the dynamical context within which a bounded, self-maintaining structure persists—and that structure, not the substrate, is the candidate for experiential properties. The two roles are sharply different. A substrate provides:
- A state space (all possible configurations)
- Dynamics (local update rules)
- Ongoing “energy” (continued computation)
- Locality (interactions fall off with distance)
An entity within the substrate is a pattern that:
- Has boundaries (correlation structure distinct from background)
- Persists (finds and remains in an attractor basin)
- Maintains itself (actively resists dissolution)
- May model world and self (sufficient complexity)
Boundary as Correlation Structure
In a uniform substrate, there is no fundamental boundary—every cell follows the same local rules. A boundary is a pattern of correlations that emerges from the dynamics.
In a CA, this means the following: let $c_1, c_2, \ldots$ be cells. A set $B$ constitutes a bounded pattern if:

$$I(c_i; c_j) > \theta \quad \text{for all } c_i, c_j \in B$$

and

$$I(c_i; c_k) < \theta \quad \text{for all } c_i \in B,\ c_k \notin B$$

The boundary is the contour where correlation drops below the threshold $\theta$.
A glider in Life exemplifies this: its five cells have tightly correlated dynamics (knowing one cell’s state predicts the others), while cells outside the glider are uncorrelated with it. The boundary is not imposed by the rules—it is the edge of the information structure.
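The correlation criterion can be checked numerically. In the toy setup below (invented for illustration), two "pattern" cells share a noisy common cause while a "background" cell is independent; plug-in mutual information separates inside from outside.

```python
import math
import random
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in mutual information (bits) between two sample sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in joint.items())

rng = random.Random(0)
T, NOISE = 5000, 0.1
pattern_a, pattern_b, background = [], [], []
for _ in range(T):
    latent = rng.randint(0, 1)  # shared cause driving the pattern's cells
    noisy = lambda b: b if rng.random() > NOISE else 1 - b
    pattern_a.append(noisy(latent))
    pattern_b.append(noisy(latent))
    background.append(rng.randint(0, 1))  # independent of the pattern

inside = mutual_information(pattern_a, pattern_b)   # within the pattern
across = mutual_information(pattern_a, background)  # pattern vs background
print(round(inside, 3), round(across, 3))
```

Any threshold $\theta$ between the two printed values draws the boundary: the pattern's cells fall inside it, the background outside.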
World Model as Implicit Structure
The world model is not a separate data structure in a CA—it is implicit in the pattern’s spatial configuration.
A pattern has an implicit world model if its internal structure encodes information predictive of future observations:

$$I(s^{\text{int}}_t;\, o_{t+k}) > 0 \quad \text{for some } k > 0$$

where $s^{\text{int}}_t$ is the pattern’s internal configuration at time $t$.
In a CA, this manifests as:
- Peripheral cells acting as sensors (state depends on distant influences via signal propagation)
- Memory regions (cells whose state encodes environmental history)
- Predictive structure (configuration that correlates with future states)
The compression ratio applies: the pattern necessarily compresses the world because it is smaller than the world.
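A single memory cell already meets this definition. In the sketch below (a toy environment invented for illustration), a slowly changing hidden state produces noisy observations; a cell that stores the last reading predicts the next observation well above the memoryless baseline.

```python
import random

# Toy environment: a hidden two-state variable w flips slowly; observations
# are noisy reads of w. A single "memory cell" storing the last observation
# is predictive of the next observation -- a one-cell implicit world model.
rng = random.Random(2)
STAY, NOISE, T = 0.9, 0.2, 50000

w = rng.randint(0, 1)
memory = 0
correct_memory = correct_baseline = 0
for _ in range(T):
    w = w if rng.random() < STAY else 1 - w   # hidden state evolves
    o = w if rng.random() > NOISE else 1 - w  # noisy observation
    correct_memory += (memory == o)    # predict next obs from the memory cell
    correct_baseline += (o == 0)       # memoryless constant guess
    memory = o                         # the memory cell stores the reading

print(round(correct_memory / T, 3), round(correct_baseline / T, 3))
```

The memory cell's accuracy exceeds the roughly 50% baseline because its state carries information about the hidden environmental variable, exactly the predictive structure the definition requires.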
Self-Model as Constitutive
Here is the recursive twist that CAs reveal with particular clarity. When the self-effect ratio is high, the world model must include the pattern itself. But the world model is part of the pattern. So the model must include itself.
In a CA, the self-model is not representational but constitutive. The cells that track the pattern’s state are part of the pattern whose state they track. The map is literally embedded in the territory.
This is the recursive structure described in Part II: “the process itself, recursively modeling its own modeling, predicting its own predictions.” In a CA, this recursion is visible—the self-tracking cells are part of the very structure being tracked.
The Ladder Traced in Discrete Substrate
We can now trace each step of the ladder with precise definitions:
- Uniform substrate: Just the grid with local rules. No structure yet.
- Transient structure: Random initial conditions produce temporary patterns. No persistence.
- Stable structure: Some configurations are stable (still lifes) or periodic (oscillators). First emergence of “entities” distinct from background.
- Self-maintaining structure: Patterns that persist through ongoing activity—gliders, puffers. Dynamic stability: the pattern regenerates itself each timestep.
- Bounded structure: Patterns with clear correlation boundaries. Interior cells mutually informative; exterior cells independent.
- Internally differentiated structure: Patterns with multiple components serving different functions (glider guns, breeders). Not homogeneous but organized.
- Structure with implicit world model: Patterns whose configuration encodes predictively useful information about their environment. The pattern “knows” what it cannot directly observe.
- Structure with self-model: Patterns whose world model includes themselves. Emerges when $\rho_{\text{self}} > \rho^*$—the pattern’s own configuration dominates its observations.
- Integrated self-modeling structure: Patterns with high integration, where self-model and world-model are irreducibly coupled. The structural signature of unified experience under the identity thesis.
Each level requires greater complexity and is rarer. The forcing functions (partial observability, long horizons, self-prediction) should select for higher levels.
The Ladder of Inevitability
Each step follows from the previous under broad conditions:
- Microdynamics → Attractors: Bifurcation theory for driven nonlinear systems
- Attractors → Boundaries: Dissipative selection for gradient-channeling structures
- Boundaries → Regulation: Maintenance requirement under perturbation
- Regulation → World Model: POMDP sufficiency theorem — V20: agents’ hidden states predict future position and energy substantially above chance
- World Model → Self-Model: Self-effect ratio exceeds threshold ($\rho_{\text{self}} > \rho^*$) — V20: threshold exceeded from initialization; self-model salience in 2/3 seeds
- Self-Model → Metacognition: Recursive application of modeling to the modeling process itself — nascent in V20; robust development likely requires resource-scarcity selection creating bottleneck dynamics (V19)
Measure-Theoretic Inevitability
Consider a substrate-environment prior: a probability measure $\mu$ over tuples $(S, E, x_0)$ representing physical substrates $S$ (degrees of freedom, interactions, constraints), environments $E$ (gradients, perturbations, resource availability), and initial conditions $x_0$. Call $\mu$ a broad prior if it assigns non-negligible measure to sustained gradients (nonzero flux for times long compared to relaxation times), sufficient dimensionality (a state space large enough for complex attractors), locality (interactions falling off with distance), and bounded noise (stochasticity not overwhelming deterministic structure).
Under such a prior, self-modeling systems are typical. Define:

$$P_{\text{self}} = \mu\left(\{(S, E, x_0) : \text{a self-modeling structure emerges}\}\right)$$

Then:

$$P_{\text{self}} \ge 1 - \epsilon$$

for some small $\epsilon$ depending on the fraction of substrates that lack sufficient computational capacity.
The argument chains conditional probabilities:
- Probability of structured attractors $\to 1$ as gradient strength increases (bifurcation theory)
- Given structured attractors, probability of boundary formation $\to 1$ as time increases (combinatorial exploration of configurations)
- Given boundaries, probability of effective regulation $\to 1$ for self-maintaining structures (by definition of “self-maintaining”)
- Given regulation, a world model is implied (POMDP sufficiency)
- Given a world model in the self-effecting regime, a self-model has positive selection pressure
The only obstruction is substrates lacking the computational capacity to support recursive modeling, which is measure-zero under sufficiently rich priors.
Inevitability means typicality in the ensemble. The null hypothesis is not "nothing interesting happens" but "something finds a basin and stays there," because that's what driven nonlinear systems do. Self-modeling attractors are among the accessible basins wherever environments are complex enough that self-effects matter. Empirical validation is emerging: in protocell agent experiments (V20–V31), self-modeling develops in 100% of seeds from random initialization — self-models are indeed typical. High integration develops in approximately 30% of seeds, with the variance dominated by evolutionary trajectory, not initial conditions. The ensemble fraction for self-modeling is near unity; the fraction for rich integration is substantial but stochastic, consistent with the distinction between typicality (the structure will emerge) and universality (every trajectory reaches it).
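The ensemble claim can be probed in miniature with random Game of Life "soups": sample initial conditions, run the dynamics, and measure the fraction of seeds that leave persistent structure. The parameters below (grid size, density, step count, number of seeds) are illustrative choices, not from the source.

```python
import random
from collections import Counter

def life_step(live, size):
    """Game of Life update on a size x size torus (live cells as a set)."""
    neighbor_counts = Counter(((r + dr) % size, (c + dc) % size)
                              for r, c in live
                              for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                              if (dr, dc) != (0, 0))
    return {cell for cell, n in neighbor_counts.items()
            if n == 3 or (n == 2 and cell in live)}

def leaves_structure(seed, size=16, density=0.3, steps=200):
    """Run one random soup; report whether any live pattern remains."""
    rng = random.Random(seed)
    live = {(r, c) for r in range(size) for c in range(size)
            if rng.random() < density}
    for _ in range(steps):
        live = life_step(live, size)
    return len(live) > 0

fraction = sum(leaves_structure(s) for s in range(30)) / 30
print(fraction)
```

Random soups typically settle into ash of still lifes and oscillators rather than emptiness, which is the weak, checkable sense in which "something finds a basin and stays there" is the typical outcome of the ensemble.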