Part II: Identity Thesis

The Core Prediction


The claim is not merely that affect structure, language, and behavior should “correlate.” Correlation is weak—marginal correlations can arise from confounds. The claim is geometric: the distance structure in the information-theoretic affect space should be isomorphic to the distance structure in the embedding-predicted affect space. Not just “these two things covary,” but “these two spaces have the same shape.”

To test this, let $\mathbf{a}_i \in \mathbb{R}^6$ be the information-theoretic affect vector for agent-state $i$, computed from internal dynamics (viability gradient, belief update rate, partition loss, eigenvalue distribution, simulation fraction, self-model MI). Let $\mathbf{e}_i \in \mathbb{R}^d$ be the affect embedding predicted from the VLM-translated situation description, projected into a standardized affect concept space.

For $N$ agent-states sampled across diverse situations, compute pairwise distance matrices:

$$
\begin{aligned}
D^{(a)}_{ij} &= \|\mathbf{a}_i - \mathbf{a}_j\| \quad \text{(info-theoretic affect space)} \\
D^{(e)}_{ij} &= \|\mathbf{e}_i - \mathbf{e}_j\| \quad \text{(embedding-predicted affect space)}
\end{aligned}
$$

The prediction: Representational Similarity Analysis (RSA) correlation between the upper triangles of these matrices exceeds the null:

$$
\rho_{\text{RSA}}(D^{(a)}, D^{(e)}) > \rho_{\text{null}}
$$

where $\rho_{\text{null}}$ is established by permutation (Mantel test).

This is strictly stronger than marginal correlation. Two spaces can have correlated means but completely different geometries. RSA tests whether states that are nearby in one space are also nearby in the other, that is, whether the relational geometry (the rank order of pairwise distances) is preserved.

The specific predictions that fall out: when the affect vector shows the suffering motif—negative valence, collapsed effective rank, high integration, high self-model salience—the embedding-predicted vector should land in the same region of affect concept space. States with the joy motif—positive valence, expanded rank, low self-salience—should cluster together in both spaces. And crucially, the distances between suffering and joy, between fear and curiosity, between boredom and rage, should be preserved across the two measurement modalities.

Not because we trained them to match. Because the structure is the experience is the expression.

Technical: Representational Similarity Analysis

RSA compares the geometry of two representation spaces without requiring them to share dimensionality or units. The method (Kriegeskorte et al., 2008) is standard in computational neuroscience for comparing neural representations across brain regions, species, and models.

Procedure. Given $N$ stimuli represented in two spaces ($\mathbf{a}_i \in \mathbb{R}^p$, $\mathbf{e}_i \in \mathbb{R}^q$), compute the $N \times N$ pairwise distance matrices $D^{(a)}$ and $D^{(e)}$. The RSA statistic is the Spearman rank correlation between the upper triangles of these matrices, taken over the $\binom{N}{2}$ distinct pairs.
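The procedure can be sketched in a few lines of NumPy. This is a minimal illustration, not the pipeline used here: the Euclidean metric matches the distance definition above, but the helper names (`pairwise_distances`, `average_ranks`, `rsa_spearman`) and the synthetic data are assumptions made for the example.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix for the rows of X (shape N x dim)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def average_ranks(x):
    """Ranks starting at 1; tied values share their average rank."""
    order = np.argsort(x, kind="stable")
    ordinal = np.empty(len(x), dtype=float)
    ordinal[order] = np.arange(1, len(x) + 1)
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    return np.bincount(inverse, weights=ordinal)[inverse] / counts[inverse]

def rsa_spearman(D_a, D_e):
    """Spearman correlation between the upper triangles of two distance matrices."""
    iu = np.triu_indices_from(D_a, k=1)  # the N-choose-2 distinct pairs
    r_a, r_e = average_ranks(D_a[iu]), average_ranks(D_e[iu])
    return float(np.corrcoef(r_a, r_e)[0, 1])

# Synthetic stand-ins (assumptions): 40 agent-states, a in R^6, e in R^32
rng = np.random.default_rng(0)
a = rng.normal(size=(40, 6))
e = a @ rng.normal(size=(6, 32)) + 0.1 * rng.normal(size=(40, 32))
e_unrelated = rng.normal(size=(40, 32))

rho_related = rsa_spearman(pairwise_distances(a), pairwise_distances(e))
rho_unrelated = rsa_spearman(pairwise_distances(a), pairwise_distances(e_unrelated))
print(f"related: {rho_related:.3f}  unrelated: {rho_unrelated:.3f}")
```

Note that the two spaces have different dimensionality (6 and 32); only the distance matrices, which are both $N \times N$, are compared.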

Significance. The Mantel test: apply the same random permutation to the rows and columns of one matrix, recompute the correlation, and repeat $10^4$ times. The $p$-value is the fraction of permuted correlations that equal or exceed the observed one.
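A hedged sketch of the permutation test follows. The function names and synthetic data are illustrative assumptions. Two details go beyond the text: rows and columns are permuted jointly (so the permuted matrix is still a valid distance matrix), and the $p$-value uses the common `(hits + 1) / (n_perm + 1)` correction so that it is never exactly zero.

```python
import numpy as np

def upper(D):
    """Flatten the strict upper triangle of a square matrix."""
    return D[np.triu_indices_from(D, k=1)]

def spearman(x, y):
    """Spearman correlation via ordinal ranks (assumes no exactly tied values)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    return float(np.corrcoef(rx, ry)[0, 1])

def mantel_test(D_a, D_e, n_perm=10_000, seed=0):
    """One-sided Mantel test: returns (observed correlation, p-value)."""
    rng = np.random.default_rng(seed)
    n = D_a.shape[0]
    u_a = upper(D_a)
    observed = spearman(u_a, upper(D_e))
    hits = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        # Relabel the items of D_e: the same permutation on rows and columns
        if spearman(u_a, upper(D_e[np.ix_(p, p)])) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

def euclidean(X):
    """Euclidean distance matrix for the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Synthetic stand-ins (assumptions): e is a noisy linear image of a
rng = np.random.default_rng(1)
a = rng.normal(size=(30, 6))
e = a @ rng.normal(size=(6, 32)) + 0.1 * rng.normal(size=(30, 32))

rho_obs, p_value = mantel_test(euclidean(a), euclidean(e), n_perm=999)
print(f"rho = {rho_obs:.3f}, p = {p_value:.4f}")
```

Permuting whole items rather than individual matrix entries matters: the $\binom{N}{2}$ distances are not independent of one another, so shuffling entries freely would understate the null variance.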

Alternative: CKA. Centered Kernel Alignment (Kornblith et al., 2019) compares centered similarity (kernel) matrices rather than distance matrices. It is more robust to outliers and does not require choosing a distance metric, although a kernel must still be chosen. We report both.
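For concreteness, here is a minimal sketch of the linear-kernel variant of CKA, in its feature-space form. The choice of the linear kernel is an assumption for this example (the text does not say which kernel is reported), and the function name `linear_cka` is illustrative.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two representations with the same number of rows.

    X has shape (N, p) and Y has shape (N, q); p and q may differ.
    """
    X = X - X.mean(axis=0)  # center each feature column
    Y = Y - Y.mean(axis=0)
    cross = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, ord="fro")
    norm_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return float(cross / (norm_x * norm_y))

# Synthetic check (assumption): a rotated, rescaled copy should score 1
rng = np.random.default_rng(0)
a = rng.normal(size=(40, 6))
Q, _ = np.linalg.qr(rng.normal(size=(6, 6)))  # random orthogonal matrix
cka_rotated = linear_cka(a, 3.0 * a @ Q)
print(f"CKA(a, 3aQ) = {cka_rotated:.6f}")
```

Linear CKA is invariant to orthogonal transformations and isotropic rescaling of either space, which is why the rotated, rescaled copy is treated as identical to the original.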

Why RSA over marginal correlation. Marginal correlation asks: does valence in space $A$ predict valence in space $B$? RSA asks: does the entire relational structure transfer? Two states might have similar valence but differ on integration and self-salience. RSA captures this. It tests whether the spaces are geometrically aligned, not merely univariately correlated.
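The difference can be made concrete with a toy construction. Everything here is synthetic and assumed for illustration: column 0 stands in for valence, `b_marginal` shares only that column with `a`, and `b_rotated` is a rotated copy of `a` whose individual axes no longer line up with those of `a`.

```python
import numpy as np

def rsa(X, Y):
    """Spearman correlation between upper-triangle Euclidean distances of X and Y."""
    iu = np.triu_indices(X.shape[0], k=1)

    def ranked_distances(Z):
        d = np.sqrt(((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1))[iu]
        return np.argsort(np.argsort(d)).astype(float)  # ordinal ranks; ties not expected

    return float(np.corrcoef(ranked_distances(X), ranked_distances(Y))[0, 1])

rng = np.random.default_rng(0)
n = 60
a = rng.normal(size=(n, 6))  # column 0 plays the role of valence

# Space sharing only valence with a; the other five dimensions are unrelated
b_marginal = rng.normal(size=(n, 6))
b_marginal[:, 0] = a[:, 0]

# Space with the same geometry as a, expressed in rotated coordinates
Q, _ = np.linalg.qr(rng.normal(size=(6, 6)))
b_rotated = a @ Q

valence_corr_marginal = float(np.corrcoef(a[:, 0], b_marginal[:, 0])[0, 1])
rsa_marginal = rsa(a, b_marginal)
rsa_rotated = rsa(a, b_rotated)

print(f"valence correlation (marginal space): {valence_corr_marginal:.3f}")
print(f"RSA (marginal space): {rsa_marginal:.3f}")
print(f"RSA (rotated space):  {rsa_rotated:.3f}")
```

In this construction the first space matches `a` perfectly on valence yet shares little of its distance structure, while the rotated space reproduces the distance structure exactly even though no single coordinate needs to match.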