From Boundaries to Models
The Necessity of Regulation Under Uncertainty
Once a boundary exists, it must be maintained. The interior must remain distinct from the exterior despite perturbations, degradation, and environmental fluctuations. This maintenance problem has a specific structure.
Let the interior state be $x \in \mathcal{X}$ and the exterior state be $e \in \mathcal{E}$. The boundary mediates interactions through:
- Observations: $o_t$, a partial and noisy readout of $e$
- Actions: $a_t$ (boundary permeabilities, active transport, etc.)
The system’s persistence requires maintaining $x$ within a viable region $\mathcal{V} \subset \mathcal{X}$ despite:
- Incomplete observation of $e$ (partial observability)
- Stochastic perturbations (environmental and internal noise)
- Degradation of the boundary itself (requiring continuous repair)
- Finite resources (energy, raw materials)
This maintenance problem has a deep consequence: regulation requires modeling. Let $B$ be a bounded system that must maintain $x \in \mathcal{V}$ under partial observability of $e$. Any policy $\pi$ that achieves viability with probability $p > p_0$ (where $p_0$ is the viability probability under random actions) implicitly computes a function $f: o_{1:t} \mapsto z_t$, where $z_t$ is a sufficient statistic for predicting future observations and viability-relevant outcomes.
By the sufficiency principle, any policy that outperforms random must exploit statistical regularities in the observation sequence. These regularities, if exploited, constitute an implicit model of the environment’s dynamics. The minimal such model is the sufficient statistic for the prediction task. In the POMDP formulation (see below), this is the belief state.
POMDP Formalization
The situation of a bounded system under uncertainty admits precise formalization as a Partially Observable Markov Decision Process (POMDP).
The POMDP framework connects this analysis to several established research programs:
- Active Inference (Friston et al., 2017): Organisms as inference machines that minimize expected free energy through action. The belief-state sufficiency result here is a formalization of their “Bayesian brain” hypothesis.
- Predictive Processing (Clark, 2013; Hohwy, 2013): The brain as a prediction engine, with perception as hypothesis-testing. The world model is their “generative model.”
- Good Regulator Theorem (Conant & Ashby, 1970): Every good regulator of a system must be a model of that system. The belief state sufficiency result above is a POMDP-specific instantiation.
- Embodied Cognition (Varela, Thompson & Rosch, 1991): Cognition as enacted through sensorimotor coupling. My emphasis on the boundary as the locus of modeling aligns with enactivist insights.
Formally, a POMDP is a tuple $(\mathcal{S}, \mathcal{A}, \mathcal{O}, T, \Omega, R, \gamma)$ where:
- $\mathcal{S}$: State space (true world state, including system interior)
- $\mathcal{A}$: Action space
- $\mathcal{O}$: Observation space
- $T$: Transition kernel, $T(s' \mid s, a) = P(s_{t+1} = s' \mid s_t = s, a_t = a)$
- $\Omega$: Observation kernel, $\Omega(o \mid s') = P(o_{t+1} = o \mid s_{t+1} = s')$
- $R$: Reward function, $R: \mathcal{S} \times \mathcal{A} \to \mathbb{R}$
- $\gamma \in [0, 1)$: Discount factor
The agent does not observe $s_t$ directly but only $o_t \sim \Omega(\cdot \mid s_t)$. The sufficient statistic for decision-making is the belief state—the posterior distribution over world states given the history:
$$b_t(s) = P(s_t = s \mid o_{1:t}, a_{1:t-1})$$
The belief state updates via Bayes’ rule:
$$b_{t+1}(s') \propto \Omega(o_{t+1} \mid s') \sum_{s} T(s' \mid s, a_t)\, b_t(s)$$
A classical result establishes that $b_t$ is a sufficient statistic for optimal decision-making: any optimal policy can be written as $\pi^*(b_t)$, mapping belief states to actions.
This establishes that any system that performs better than random under partial observability is implicitly maintaining and updating a belief state—i.e., a model of the world.
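The predict-then-correct structure of the belief update can be sketched numerically. The following is a minimal Bayes filter for an assumed toy two-state POMDP; all probability tables are illustrative, not values from the text:

```python
import numpy as np

# Illustrative two-state POMDP under one fixed action.
T = np.array([[0.9, 0.1],      # T[s, s'] = P(s' | s)
              [0.2, 0.8]])
Omega = np.array([[0.8, 0.2],  # Omega[s', o] = P(o | s')
                  [0.3, 0.7]])

def belief_update(b, obs):
    """Predict through the transition kernel, then correct with the likelihood."""
    predicted = T.T @ b                 # sum_s T(s' | s) b(s)
    unnorm = Omega[:, obs] * predicted  # multiply by Omega(o | s')
    return unnorm / unnorm.sum()        # renormalize

b = np.array([0.5, 0.5])                # uniform initial belief
for obs in [0, 0, 1]:
    b = belief_update(b, obs)           # b stays a distribution over states
```

Each observation reweights the predicted belief; the normalized result is exactly the posterior $b_{t+1}$ on which an optimal policy can act.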
The World Model
In practice, maintaining the full belief state is computationally intractable for complex environments. Real systems maintain compressed representations.
A world model is a parameterized family of distributions $p_\theta(o_{t+1:t+H} \mid o_{1:t}, a_{1:t+H})$ that predicts future observations given history and planned actions, for some horizon $H$.
Modern implementations in machine learning typically use recurrent latent state-space models:
$$z_t = f_\theta(z_{t-1}, a_{t-1}, o_t), \qquad \hat{o}_{t+1} \sim p_\theta(\,\cdot \mid z_t, a_t)$$
The latent state $z_t$ serves as a compressed belief state, and the model is trained to minimize prediction error:
$$\mathcal{L}(\theta) = \mathbb{E}\Big[\sum_t -\log p_\theta(o_{t+1} \mid z_t, a_t)\Big]$$
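The shape of such a model can be sketched in a few lines. This is a deliberately minimal version: dimensions are assumed, the recurrent weights are fixed at random, the data stream is noise, and only a linear readout is trained by gradient descent on squared prediction error:

```python
import numpy as np

rng = np.random.default_rng(0)
dz, do, da = 8, 3, 2                      # latent, observation, action dims (assumed)
A = rng.normal(0, 0.3, (dz, dz))          # recurrent weights (fixed for this sketch)
B = rng.normal(0, 0.3, (dz, da))          # action input weights
C = rng.normal(0, 0.3, (dz, do))          # observation input weights
D = rng.normal(0, 0.3, (do, dz))          # trainable readout: predicts next observation

def step(z, a, o):
    """Recurrent latent update z_t = f(z_{t-1}, a_{t-1}, o_t)."""
    return np.tanh(A @ z + B @ a + C @ o)

z, lr = np.zeros(dz), 0.05
for t in range(50):
    a = rng.normal(size=da)               # stand-in action stream
    o = rng.normal(size=do)               # stand-in observation stream
    z = step(z, a, o)
    o_next = rng.normal(size=do)          # stand-in for the real next observation
    err = D @ z - o_next                  # one-step prediction error
    D -= lr * np.outer(err, z)            # gradient step on 0.5 * ||err||^2
```

Real systems train all parameters jointly and use stochastic latents, but the structure is the same: a compressed state carried forward in time, corrected by prediction error.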
The world model is not an optional add-on. It is the minimal object that makes coherent control possible under uncertainty. Any system that regulates effectively under partial observability has a world model, whether explicit or implicit.
The Necessity of Compression
The world model is not merely convenient—it is constitutively necessary. This follows from a fundamental asymmetry between the world and any bounded system embedded within it.
The information bottleneck makes this precise.
Let $\mathcal{W}$ be the world state space with effective dimensionality $d_W$, and let $B$ be a bounded system with finite computational capacity $C$. Then:
$$\dim(\mathcal{Z}) \le C \ll d_W$$
where $\mathcal{Z}$ is the system’s internal representation. The world model necessarily inhabits a state space smaller than the world.
The world contains effectively unbounded degrees of freedom: every particle, field configuration, and their interactions across all scales. Any physical system has finite matter, energy, and spatial extent, hence finite information-carrying capacity. The system cannot represent the world at full resolution; it must compress. This is not a limitation to be overcome but a constitutive feature of being a bounded entity in an unbounded world.
The compression ratio $\rho$ of a world model captures how aggressively this simplification operates:
$$\rho = \frac{\dim(\mathcal{W}_{\text{rel}})}{\dim(\mathcal{Z})}$$
where $\mathcal{W}_{\text{rel}} \subset \mathcal{W}$ is the subspace of world states that affect the system’s viability. The compression ratio characterizes how much the system must discard to exist. And this has a profound implication: compression determines ontology. What a system can perceive, respond to, and value is determined by what survives compression. The world model’s structure—which distinctions it maintains, which it collapses—constitutes the system’s effective ontology.
The information bottleneck principle formalizes this: the optimal representation maximizes information about viability-relevant outcomes while minimizing complexity:
$$\max_{p(z \mid w)} \; I(Z; V) - \beta\, I(Z; W)$$
The Lagrange multiplier $\beta$ controls the compression–fidelity tradeoff. Different values yield different creatures: high $\beta$ produces simple organisms with coarse world models; low $\beta$ produces complex organisms with rich representations.
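The tradeoff can be checked numerically. The sketch below assumes a four-state world $W$, a binary viability variable $V$ with an illustrative noise table, and compares two deterministic encoders, a rich one ($z = w$) and a coarse two-class one, under the objective $I(Z;V) - \beta\, I(Z;W)$:

```python
import numpy as np

def mutual_info(pxy):
    """I(X;Y) in bits from a joint distribution table."""
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    return float((pxy[mask] * np.log2(pxy[mask] / (px @ py)[mask])).sum())

p_w = np.full(4, 0.25)               # uniform prior over world states (assumed)
p_v_given_w = np.array([[0.9, 0.1],  # P(v | w): illustrative viability noise
                        [0.6, 0.4],
                        [0.4, 0.6],
                        [0.1, 0.9]])

def ib_objective(enc, beta):
    """I(Z;V) - beta * I(Z;W) for an encoder table enc[w, z] = p(z | w)."""
    p_wz = enc * p_w[:, None]                     # joint over (w, z)
    p_zv = enc.T @ (p_v_given_w * p_w[:, None])   # joint over (z, v)
    return mutual_info(p_zv) - beta * mutual_info(p_wz)

fine = np.eye(4)                                  # z = w: keeps every distinction
coarse = np.array([[1., 0], [1, 0], [0, 1], [0, 1]])  # collapses w into two classes
# Low beta favors the rich encoder; high beta favors the coarse one.
```

At $\beta = 0.05$ the fine encoder scores higher; at $\beta = 0.5$ the coarse encoder wins: the same world, two different ontologies, selected by the price of complexity.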
The world model is not a luxury or optimization strategy. It is what it means to be a bounded system in an unbounded world. The compression ratio is not a parameter to be minimized but a constitutive feature of finite existence. What survives compression determines what the system is.
This has a precise architectural consequence that the experiments will confirm (Part VII, V22–V27). A linear prediction head compresses hidden state to output through a single weight matrix — and a single matrix is always decomposable into independent columns, each serving a separate target dimension. The compression creates a factored ontology: the system's internal states are channeled into independent streams with no pressure to coordinate. Replace the linear map with a two-layer architecture, and the compression changes: the chain rule through two weight matrices means every hidden dimension's gradient depends on every other dimension's activation at the intermediate layer. The compression now demands coordination. What survives it is not a collection of independent features but a coupled representation — an ontology where the parts cannot be understood without the whole. Compression does not merely determine what the system perceives. It determines whether the system's internal states are unified or factored.
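The gradient-coupling claim can be verified directly. In this sketch (all dimensions and values are illustrative), perturbing a single target dimension changes only the corresponding row of a linear head's gradient, but changes every row of a two-layer head's first-layer gradient:

```python
import numpy as np

rng = np.random.default_rng(0)
h = rng.normal(size=5)           # hidden state (assumed dims)
t1 = rng.normal(size=3)          # target
t2 = t1.copy(); t2[0] += 1.0     # same target, perturbed only in dim 0

# Linear head: y = W h. Gradient of 0.5*||y - t||^2 w.r.t. W is outer(err, h);
# row i depends only on the error in output dimension i.
W = rng.normal(size=(3, 5))
g1 = np.outer(W @ h - t1, h)
g2 = np.outer(W @ h - t2, h)
rows_changed = [i for i in range(3) if not np.allclose(g1[i], g2[i])]
# Only row 0 changes: the output dimensions are gradient-independent.

# Two-layer head: y = V tanh(U h). The first-layer gradient mixes all output
# errors through V, so one perturbed target dimension touches all of dU.
U = rng.normal(size=(4, 5))
V = rng.normal(size=(3, 4))
def grad_U(t):
    a = np.tanh(U @ h)
    err = V @ a - t
    return np.outer((V.T @ err) * (1 - a**2), h)
dU1, dU2 = grad_U(t1), grad_U(t2)
rows_coupled = [i for i in range(4) if not np.allclose(dU1[i], dU2[i])]
# Every row of dU changes: the intermediate layer couples the streams.
```

The chain rule through `V` is what carries the error in one output dimension into every hidden dimension's update: the coordination pressure is in the backward pass itself.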
Attention as Measurement Selection
Compression determines what can be perceived. But a second operation determines what is perceived: attention. Even within the compressed representation, the system must allocate processing resources selectively—it cannot respond to all viability-relevant features simultaneously. Attention is this allocation.
In any system whose dynamics are sensitive to initial conditions—and all nonlinear driven systems are—the choice of what to measure has consequences beyond what it reveals. It determines which trajectories the system becomes correlated with.
The claim is that attention selects trajectories. Let a system inhabit a chaotic environment where small differences in observation lead to divergent action sequences. The system’s attention pattern weights which observations are processed at high fidelity and which are compressed or discarded. Because subsequent actions depend on processed observations, and those actions shape future states, the attention pattern selects which dynamical trajectory the system follows from the space of trajectories consistent with its current state.
This is not metaphor. In deterministic chaos, trajectories diverge exponentially from nearby initial conditions. The system’s attention pattern determines which perturbations are registered and which are ignored, which means it determines which branch of the diverging trajectory bundle the system follows. The unattended perturbations are not “collapsed” or destroyed—they continue to exist in the dynamics of the broader environment. But the system’s future becomes correlated with the perturbations it attended to and decorrelated from those it did not.
The mechanism admits a precise formulation. Let $p(s)$ be the a priori distribution over states—the probability of finding the environment in state $s$, governed by physics. Let $m(s)$ be the system’s measurement distribution—the probability that it attends to, and therefore registers, a perturbation at state $s$. The effective distribution over states the system becomes correlated with is:
$$p_{\text{eff}}(s) = \frac{p(s)\, m(s)}{\int p(s')\, m(s')\, ds'}$$
The system does not control $p(s)$—that is physics. But it controls $m(s)$—that is attention. If $m$ is sharply peaked (narrow attention), the effective distribution concentrates on a small region of state space regardless of the prior. If $m$ is broad (diffuse attention), the effective distribution approximates the prior. The system’s trajectory through state space follows from the sequence of effective distributions it generates, each conditioned on the previous.
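Both limits are easy to exhibit numerically. The sketch below uses an assumed Gaussian prior and two attention profiles, computing the renormalized product of the prior and the measurement distribution:

```python
import numpy as np

x = np.linspace(-3, 3, 601)
prior = np.exp(-x**2 / 2)            # a priori distribution p(s) (assumed Gaussian)
prior /= prior.sum()

def effective(m):
    """Effective distribution: renormalized product of prior and attention."""
    p = prior * m
    return p / p.sum()

narrow = np.exp(-(x - 1.0)**2 / (2 * 0.1**2))  # sharply peaked attention at x = 1
broad = np.ones_like(x)                         # diffuse attention

p_narrow = effective(narrow)
p_broad = effective(broad)
# Narrow attention concentrates the effective distribution near x = 1;
# diffuse attention leaves the prior essentially unchanged.
```

With a narrow $m$, the effective distribution sits almost entirely where attention points, pulled only slightly back toward the prior; with a uniform $m$, it reproduces the prior exactly.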
This has a consequence for agency that deserves explicit statement. A system whose trajectory depends on its attention pattern is a system whose future depends, in part, on what it chooses to measure. Every branch it follows is fully deterministic—no physical law is violated. But which deterministic branch it follows is selected by the attention pattern, which is itself a product of the system’s internal dynamics (its world model, its self-model, its policy). This is not “free will” in the libertarian sense of uncaused choice. It is something more precise: trajectory selection through measurement, where the selecting mechanism is the system’s own cognitive architecture. Determinism is preserved. Agency is real. Both are true because “agency” does not require violation of physical law—it requires that the system’s internal states (including its values, its goals, its attention distribution) causally influence which trajectory it follows. They do.
This trajectory-selection mechanism operates at the population level too. In evolutionary experiments (V31), different seeds follow different trajectories through the same dynamical landscape — not because their initial conditions differ (all start identically) but because the drought-recovery measurement distribution differs: which agents survive each bottleneck selects which evolutionary path the population follows. The correlation between post-drought recovery and mean integration across seeds is . The measurement distribution — which perturbations are survived rather than which are attended to — selects the trajectory. The equation is the same; the scale is different.
This trajectory selection has a temporal depth. Once measurement information is integrated into the system’s belief state, its future must remain consistent with what was observed. Registered observations constrain the trajectory: the system cannot “un-observe” a perturbation. However, if entropy degrades the information—if the observation is forgotten, overwritten, or lost to noise—the constraint dissolves. The system’s trajectory is no longer pinned by that measurement, and the space of accessible futures re-expands. Sustained attention to a particular feature of reality functions as repeated measurement: it continuously re-constrains the trajectory, stabilizing it near states consistent with the attended feature. This is analogous to the quantum Zeno effect, where repeated measurement prevents a system from evolving away from its measured state—but the classical version requires no quantum mechanics, only the sensitivity of chaotic dynamics to which perturbations are registered.
The trajectory-selection mechanism admits a speculative extension. In an Everettian quantum framework, where all measurement outcomes coexist as branches, attention would determine not just which classical trajectory a system follows but which quantum branch it becomes entangled with. The effective distribution equation above would apply at the quantum level: the a priori distribution is the quantum state, the measurement distribution is the observer’s attention pattern, and the effective distribution determines which branch the observer becomes entangled with.
Whether this quantum extension is necessary depends on whether quantum coherence persists at scales relevant to biological attention—a question on which the evidence is currently against, given decoherence timescales at biological temperatures. But the classical version of the claim (attention selects among chaotically-divergent trajectories) requires no quantum commitment and is sufficient to establish that what a system attends to partially determines what happens to it, not merely what it knows about what happens to it. The speculative extension is noted here because the formal structure is identical at both scales—the same equation governs trajectory selection whether the underlying dynamics are classical-chaotic or quantum-mechanical.