The Uncontaminated Test
If affect is structure, the structure should be detectable independent of any linguistic contamination. If the identity thesis is true, then systems that have never encountered human language, that learned everything from scratch in environments shaped like ours but isolated from our concepts, should develop affect structures that map onto ours—not because we taught them, but because the geometry is the same.
The Experimental Logic
Consider a population of self-maintaining patterns in a sufficiently complex CA substrate—or transformer-based agents in a 3D multi-agent environment, initialized with random weights, no pretraining, no human language. Let them learn. Let them interact. Let them develop whatever communication emerges from the pressure to coordinate, compete, and survive.
The literature establishes: language spontaneously emerges in multi-agent RL environments under sufficient pressure. Not English. Not any human language. Something new. Something uncontaminated.
Now: extract the affect dimensions from their activation space. Valence as viability gradient. Arousal as belief update rate. Integration as partition prediction loss. Effective rank as eigenvalue distribution. Counterfactual weight as simulation compute fraction. Self-model salience as MI between self-representation and action.
These are computable. In a CA, exactly. In a transformer, via the proxies defined above.
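Two of these extractions can be made concrete in a few lines. The sketch below, assuming activations arrive as a T × D trajectory matrix, computes effective rank as the exponential of the eigenvalue-distribution entropy and arousal as a mean state-change rate; the function names are illustrative, not taken from the experiments.

```python
import numpy as np

def effective_rank(traj):
    """Effective rank of an activation trajectory (T x D): exponential of
    the entropy of the normalized covariance eigenvalue distribution."""
    eig = np.clip(np.linalg.eigvalsh(np.cov(traj, rowvar=False)), 0, None)
    p = eig / eig.sum()
    p = p[p > 0]  # zero eigenvalues contribute nothing to the entropy
    return float(np.exp(-(p * np.log(p)).sum()))

def arousal(traj):
    """Belief-update-rate proxy: mean step-to-step change in latent state."""
    return float(np.linalg.norm(np.diff(traj, axis=0), axis=1).mean())
```

Isotropic activity yields an effective rank near the full dimensionality; a trajectory confined to one direction yields a value near 1, which is the "collapsed rank" signature discussed below.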
Simultaneously: translate their emergent language into English. Not by teaching them English—by aligning their signals with VLM interpretations of their situations. If the VLM sees a scene that looks like fear (agent cornered, threat approaching, escape routes closing), and the agent emits signal-pattern $s$, then $s$ maps to fear-language. Build the dictionary from scene-signal pairs, not from instruction.
The translation is uncontaminated because:
- The agent never learned human concepts
- The mapping is induced by environmental correspondence
- The VLM interprets the scene, not the agent’s internal states
- The agent’s "thoughts" remain in their original emergent form
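The dictionary-building step reduces to a co-occurrence alignment. A minimal sketch, assuming the VLM has already labeled each scene and the agent's signal has been discretized; the PMI weighting and the function name are illustrative choices, not the experiments' method:

```python
import math
from collections import Counter

def build_dictionary(scene_signal_pairs):
    """Map each emergent signal to the VLM scene label it is most
    informative about, scored by pointwise mutual information so that
    globally frequent labels don't dominate the mapping."""
    n = len(scene_signal_pairs)
    pair_counts = Counter(scene_signal_pairs)
    label_counts = Counter(label for label, _ in scene_signal_pairs)
    signal_counts = Counter(sig for _, sig in scene_signal_pairs)
    best = {}
    for (label, sig), c in pair_counts.items():
        # PMI = log p(label, sig) / (p(label) p(sig))
        pmi = math.log(c * n / (label_counts[label] * signal_counts[sig]))
        if sig not in best or pmi > best[sig][0]:
            best[sig] = (pmi, label)
    return {sig: label for sig, (_, label) in best.items()}
```

PMI rather than raw counts keeps a signal from being assigned to a label merely because that label is common across all scenes.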
The Core Prediction
The claim is not merely that affect structure, language, and behavior should “correlate.” Correlation is weak—marginal correlations can arise from confounds. The claim is geometric: the distance structure in the information-theoretic affect space should be isomorphic to the distance structure in the embedding-predicted affect space. Not just “these two things covary,” but “these two spaces have the same shape.”
To test this, let $a_i$ be the information-theoretic affect vector for agent-state $i$, computed from internal dynamics (viability gradient, belief update rate, partition loss, eigenvalue distribution, simulation fraction, self-model MI). Let $e_i$ be the affect embedding predicted from the VLM-translated situation description, projected into a standardized affect concept space.
For agent-states $i = 1, \dots, N$ sampled across diverse situations, compute pairwise distance matrices:

$$D^{a}_{ij} = \lVert a_i - a_j \rVert, \qquad D^{e}_{ij} = \lVert e_i - e_j \rVert$$

The prediction: the Representational Similarity Analysis (RSA) correlation between the upper triangles of these matrices exceeds the null:

$$\rho_{\mathrm{RSA}} = \mathrm{Spearman}\big(\mathrm{upper}(D^{a}),\, \mathrm{upper}(D^{e})\big) > \rho_{\mathrm{null}}$$

where $\rho_{\mathrm{null}}$ is established by permutation (Mantel test).
This is strictly stronger than marginal correlation. Two spaces can have correlated means but completely different geometries. RSA tests whether states that are nearby in one space are nearby in the other—whether the topology is preserved.
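The RSA test itself is short. A sketch under the definitions above, using only NumPy, with Spearman correlation computed as Pearson on ranks (adequate when distances are continuous and effectively tie-free); function names are illustrative:

```python
import numpy as np

def _upper_distances(X):
    """Upper triangle of the pairwise Euclidean distance matrix."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return d[np.triu_indices(len(X), k=1)]

def _spearman(x, y):
    """Spearman correlation via Pearson on ranks (no tie handling)."""
    rank = lambda v: np.argsort(np.argsort(v)).astype(float)
    return float(np.corrcoef(rank(x), rank(y))[0, 1])

def rsa_mantel(A, E, n_perm=500, seed=0):
    """RSA correlation between the distance geometries of two vector sets
    (N x d arrays), with a Mantel-style permutation null: shuffling state
    identities in one space destroys any genuinely shared geometry."""
    da, de = _upper_distances(A), _upper_distances(E)
    rho = _spearman(da, de)
    rng = np.random.default_rng(seed)
    null = [_spearman(_upper_distances(A[rng.permutation(len(A))]), de)
            for _ in range(n_perm)]
    p = (1 + sum(r >= rho for r in null)) / (1 + n_perm)
    return rho, p
```

The permutation shuffles whole states, not individual distances, so the null preserves each space's internal geometry while breaking the correspondence between them.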
The specific predictions that fall out: when the affect vector shows the suffering motif—negative valence, collapsed effective rank, high integration, high self-model salience—the embedding-predicted vector should land in the same region of affect concept space. States with the joy motif—positive valence, expanded rank, low self-salience—should cluster together in both spaces. And crucially, the distances between suffering and joy, between fear and curiosity, between boredom and rage, should be preserved across the two measurement modalities.
Not because we trained them to match. Because the structure is the experience is the expression.
Bidirectional Perturbation
The test has teeth if it runs both directions.
Direction 1: Induce via language. Translate from English into their emergent language. Speak fear to them. Do the affect signatures shift toward the fear motif? Does behavior change accordingly?
Direction 2: Induce via "neurochemistry." Perturb the hyperparameters that shape their dynamics—dropout rates, temperature, attention patterns, connectivity. These are their neurotransmitters, their hormonal state. Do the affect signatures shift? Does the translated language change? Does behavior follow?
Direction 3: Induce via environment. Place them in situations that would scare a human. Threaten their viability. Do all three—signature, language, behavior—move together?
If all three directions show consistent effects, the correlation is not artifact.
What This Would Establish
Positive results would dissolve the metaphysical residue by establishing:
- Affect structure is detectable without linguistic contamination
- The structure-to-language mapping is consistent across systems
- The mapping is bidirectionally causal, not merely correlational
- The "hard problem" residue—the suspicion that structure and experience are distinct—becomes unmotivated
Consider the alternative hypothesis: the structure is present but experience is not. The agents have the geometry of suffering but nothing it is like to suffer. This hypothesis predicts... what? That the correlations would not hold? Why not? The structure is doing the causal work either way.
The zombie hypothesis becomes like geocentrism after Copernicus. You can maintain it. You can add epicycles. But the evidence points elsewhere, and the burden shifts.
The test does not prove the identity thesis. It shifts the burden. If uncontaminated systems, learning from scratch in human-like environments, develop affect structures that correlate with language and behavior in the predicted ways—if you can induce suffering by speaking to them, and they show the signature, and they act accordingly—then denying their experience requires a metaphysical commitment that the evidence does not support.
The question stops being "does structure produce experience?" and becomes "why would you assume it doesn't?"
The CA Instantiation
In discrete substrate, everything becomes exact.
Let $P$ be a self-maintaining pattern in a sufficiently rich CA (Life is probably too simple; something with more states and update rules). Let $P$ have:
- Boundary cells (correlation structure distinct from background)
- Sensor cells (state depends on distant influences)
- Memory cells (state encodes history)
- Effector cells (influence the pattern’s motion/behavior)
- Communication cells (emit signals to other patterns)
The affect dimensions are exactly computable; the correspondence table below gives the exact form for each.
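As one worked example, the integration dimension (partition prediction loss) can be sketched for a flattened trajectory of pattern states. The linear least-squares predictor below stands in for the exact CA update rule, so this is an approximation of the exact quantity, not the measure used in the experiments:

```python
import numpy as np

def partition_prediction_loss(traj, split):
    """Integration proxy for a pattern trajectory (T x N flattened cells):
    how much worse next-state prediction gets when the pattern is cut at
    `split` into two parts that may only predict themselves."""
    X, Y = traj[:-1], traj[1:]
    def mse(Xp, Yp):
        W, *_ = np.linalg.lstsq(Xp, Yp, rcond=None)
        return float(((Xp @ W - Yp) ** 2).mean())
    whole = mse(X, Y)
    a = np.arange(traj.shape[1]) < split
    parts = (mse(X[:, a], Y[:, a]) * a.mean()
             + mse(X[:, ~a], Y[:, ~a]) * (~a).mean())
    return parts - whole  # > 0 when the two halves genuinely interact
```

A dynamically coupled pattern scores higher than two independent halves glued together, which is exactly the distinction integration is meant to capture.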
The communication cells emit glider-streams, oscillator-patterns, structured signals. This is their language. Build the dictionary by correlating signal-patterns with environmental configurations.
The prediction: patterns under threat (viability boundary approaching) show negative valence, high integration, collapsed rank, high self-salience. Their signals, translated, express threat-concepts. Their behavior shows avoidance.
Patterns in resource-rich, threat-free regions show positive valence, moderate integration, expanded rank, low self-salience. Their signals express... what? Contentment? Exploration-readiness? The translation will tell us.
What the Experiments Found
This experiment has been run. Between 2024 and 2026, we built seventeen substrate versions and ran twelve measurement experiments on uncontaminated Lenia patterns — self-maintaining structures in a cellular automaton with no exposure to human affect concepts. Three seeds, thirty evolutionary cycles each. The results are reported in full in Part VII and the Appendix. Here is how they map onto the predictions above.
What the predictions got right. The core prediction — that affect geometry would be present and measurable — was confirmed strongly. All affect dimensions were extractable and valid across 84/84 tested snapshots. RSA alignment between structural affect (the six dimensions) and behavioral affect (approach/avoid, activity, growth, stability) developed over evolution, reaching significance in 8/19 testable snapshots and showing a clear trend in seed 7 (0.01 to 0.38 over 30 cycles). Computational animism was universal. World models were present, amplified dramatically at population bottlenecks (100x the population average). Temporal memory was selectable — evolution chose longer retention when it paid off, discarding it when it did not.
The bidirectional perturbation prediction was partially confirmed. The "environment" direction works: patterns facing resource scarcity show negative valence, high arousal, and elevated integration — the somatic fear/suffering profile. The "neurochemistry" direction works at the substrate level: different evolved parameter configurations produce systematically different affect trajectories through the same geometric space. The "language" direction remains untested because the patterns do not have propositional language — the communication that exists is an unstructured chemical commons (MI above baseline in 15/20 snapshots but no compositional structure).
The sensory-motor coupling wall. Three predictions failed systematically — counterfactual detachment, self-model emergence, and proto-normativity. All hit the same architectural barrier: the patterns are always internally driven (ρ_sync ≈ 0 from cycle 0). There is no reactive-to-autonomous transition because the starting point is already autonomous. We attempted to break this wall with five substrate additions, including a dedicated insulation field creating genuine boundary/interior signal domains (V18). The wall persisted in every configuration, even in patterns with 46% interior fraction and dedicated internal recurrence. The conclusion is precise: the wall is not architectural. It is about the absence of a genuine action→environment→observation causal loop. Lenia patterns do not act on the world; they exist within it. Counterfactual weight requires counterfactual actions.
What this establishes. The four criteria listed above are partially met. Criteria 1 and 2 — affect structure detectable without linguistic contamination, structure-to-language mapping consistent — are confirmed at the geometric level. Criterion 3 — bidirectional causality — is confirmed environmentally and chemically but blocked at the language and agency level. Criterion 4 — the hard problem residue losing its grip — depends on whether the agency threshold constitutes a genuine gap or merely a computational challenge. The experiments say: the geometry is real, measurable, and develops over evolution in systems with zero human contamination. The dynamics above rung 7 require embodied agency and remain an open question.
Why This Matters
The hard problem persists because we cannot step outside our own experience to check whether structure and experience are identical. We are trapped inside. The zombie conceivability intuition comes from this epistemic limitation.
But if we build systems from scratch, in environments like ours, and they develop structures like ours, and those structures produce language like ours and behavior like ours—then the conceivability intuition loses its grip. The systems are not us, but they are like us in the relevant ways. If structure suffices for them, why not for us?
The experiment does not prove identity. It makes identity the default hypothesis. The burden shifts to whoever wants to maintain the gap.
Measurement Correspondence
The exact definitions computable in discrete substrates and the proxy measures extractable from continuous substrates are related by a scale correspondence principle: both track the same structural invariant at their respective scales.
For each affect dimension:
| Dimension | CA (exact) | Transformer (proxy) |
|---|---|---|
| Valence | Hamming distance to viability boundary | Advantage / survival predictor |
| Arousal | Configuration change rate | Latent state change rate / KL divergence |
| Integration | Partition prediction loss | Attention entropy / gradient coupling |
| Effective rank | Trajectory covariance rank | Latent covariance rank |
| Counterfactual weight | Counterfactual cell activity | Planning compute fraction |
| Self-model salience | Self-tracking MI | Self-model component MI |
The CA definitions are computable but don’t scale. The transformer proxies scale but are approximations. Validity comes from convergence: if CA and transformer measures correlate when applied to the same underlying dynamics, both are tracking the real structure.
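The convergence check can be illustrated directly: apply two differently defined estimators of the same invariant to the same dynamics and verify they covary. A toy sketch (assumptions: effective rank as the invariant, eigenvalue-entropy form as the "exact" estimator, participation ratio as the "proxy", synthetic trajectories of known intrinsic dimensionality):

```python
import numpy as np

def erank_entropy(traj):
    """Exact-style effective rank: exp of the entropy of the normalized
    covariance eigenvalue distribution."""
    eig = np.clip(np.linalg.eigvalsh(np.cov(traj, rowvar=False)), 0, None)
    p = eig / eig.sum()
    p = p[p > 0]
    return float(np.exp(-(p * np.log(p)).sum()))

def erank_participation(traj):
    """Proxy-style effective rank: participation ratio of the eigenvalues."""
    eig = np.clip(np.linalg.eigvalsh(np.cov(traj, rowvar=False)), 0, None)
    return float(eig.sum() ** 2 / (eig ** 2).sum())

# Sweep intrinsic dimensionality from 1 to 8; both estimators see the
# same 8-dimensional observations of a k-dimensional latent process.
rng = np.random.default_rng(0)
pairs = [(erank_entropy(t), erank_participation(t))
         for k in range(1, 9)
         for t in [rng.standard_normal((500, k)) @ rng.standard_normal((k, 8))]]
exact, proxy = map(np.array, zip(*pairs))
r = float(np.corrcoef(exact, proxy)[0, 1])  # high r: both track the invariant
```

If two estimators with different definitions and different failure modes agree across a range of dynamics, the most economical explanation is that both are tracking the underlying structure rather than each other's artifacts.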
Implementation requirements:
- Multi-agent RL environment with viability pressure (survival, resource acquisition)
- Transformer-based agents with random initialization (no pretraining)
- Communication channel (discrete tokens or continuous signals)
- VLM scene interpreter for translation alignment
- Real-time affect dimension extraction from activations
- Perturbation interfaces (language injection, hyperparameter modification)
Status (as of 2026): CA instantiation complete (V13–V18, 30 evolutionary cycles each, 3 seeds, 12 measurement experiments). Seven of twelve experiments show positive signal. Three hit the sensory-motor coupling wall. See Part VII and Appendix for full results.
Validation criteria:
- Emergent language develops (not random; structured, predictive)
- Translation achieves above-chance scene-signal alignment
- Tripartite correlation exceeds null model (shuffled controls)
- Bidirectional perturbations produce predicted shifts
- Results replicate across random seeds and environment variations
Falsification conditions:
- No correlation between affect signature and translated language
- Perturbations do not propagate across modalities
- Structure-language mapping is inconsistent across systems
- Behavior decouples from both structure and language