[Literature Review] The Mind's Transformer: Computational Neuroanatomy of LLM-Brain Alignment

This review is intended for my personal learning.

Paper Info

OpenReview: PgIlCCNxdB (ICLR 2026)
Title: The Mind's Transformer: Computational Neuroanatomy of LLM-Brain Alignment
Authors: Cheng-Yeh Chen, Raghupathy Sivakumar
Code: cheng-yeh/MindTransformer

Prior Knowledge

Voxel-wise encoding is the standard methodology for studying LLM-brain alignment. Given fMRI recordings from subjects listening to or reading a stimulus, the same stimulus is fed through a language model, and a linear ridge regression is fit to predict each voxel's BOLD signal from the model's representations. The Pearson correlation between predicted and held-out activity quantifies alignment. Almost all prior work has used a single representation per transformer layer — typically the input hidden state of each block, which in a residual stream equals the output of the previous block — and a smaller body of work has used the per-head context vector from individual attention heads.
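To make the encoding recipe concrete for myself, here is a minimal sketch of the ridge-plus-Pearson step on synthetic data. The closed-form solver, the fixed `alpha`, the array sizes, and the random "features" and "voxels" are all my own assumptions for illustration; none of them come from the paper's code.

```python
import numpy as np

def fit_ridge(X, Y, alpha):
    """Closed-form ridge: W = (X^T X + alpha I)^-1 X^T Y, one column of W per voxel."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)

def pearson_per_voxel(Y_true, Y_pred):
    """Pearson r between true and predicted time courses, computed column-wise."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    return (yt * yp).sum(axis=0) / denom

# Synthetic stand-ins: rows are time points, columns are model features or voxels.
rng = np.random.default_rng(0)
n_train, n_test, n_features, n_voxels = 200, 50, 16, 5
X = rng.standard_normal((n_train + n_test, n_features))
W_true = rng.standard_normal((n_features, n_voxels))
Y = X @ W_true + 0.5 * rng.standard_normal((n_train + n_test, n_voxels))

W = fit_ridge(X[:n_train], Y[:n_train], alpha=1.0)   # fit on training rows only
r = pearson_per_voxel(Y[n_train:], X[n_train:] @ W)  # score on held-out rows
print(r.round(3))
```

In a real analysis `alpha` would be tuned per voxel rather than fixed, and the held-out rows would be whole runs rather than a trailing slice.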

A transformer block contains far more internal computation than these two representations capture. Within a single block, the input passes through pre-attention layer normalization, query/key/value projections, Rotary Positional Embedding application to queries and keys, attention score computation, weighted aggregation into context vectors, output projection, residual connection, pre-FFN normalization, FFN up-projection, nonlinearity, FFN down-projection, and another residual. Whether these intermediate computations carry distinct neurally-relevant information beyond what the standard hidden state already encodes has not been systematically tested.

On the brain side, language processing is organized hierarchically along the cortex. Low-level acoustic and phonological processing occurs in Heschl's gyrus and the planum temporale within the auditory cortex. From there, two anatomically distinct streams emerge: a ventral stream running through the superior and middle temporal gyrus to anterior temporal regions for sound-to-meaning mapping, and a dorsal stream through the planum temporale to inferior parietal and frontal regions for sound-to-articulation mapping. High-level semantic processing engages the inferior frontal gyrus and angular gyrus, which form part of the functionally defined Fedorenko language network. Prior LLM-brain alignment work has consistently failed to predict activity in low-level auditory regions, leading to a widespread view that LLM representations only converge with the brain at the semantic level.

Main Question

When the transformer block is decomposed into all of its intermediate computational states rather than treated as a single hidden state, which states best predict activity in which brain regions, and does this finer decomposition reveal an alignment with the brain's processing hierarchy that the standard single-state approach misses?

Key Claims

The hidden states commonly used in LLM-brain alignment are suboptimal for the majority of brain voxels. Over 90% of voxels in the language network and over 96% in the auditory cortex are best explained by other intermediate states inside the transformer block.
Within a single transformer block, early-stage states (attention-related computations) preferentially align with low-level sensory cortices, while late-stage states (FFN-related computations) preferentially align with high-level association cortices. This intra-block hierarchy mirrors the brain's own cortical processing hierarchy.
Per-head queries after Rotary Positional Embedding (RoPE) application dominate alignment with the auditory cortex, winning in 73.88% of auditory voxels compared to only 7.82% for per-head queries without RoPE. This effect is specific to auditory regions and provides the first neurobiological evidence for RoPE's functional role.
A simple feature selection framework (MindTransformer) that exploits the distributed nature of brain-relevant information across intermediate states yields larger gains in primary auditory cortex than scaling models by 456 times.

Method

The work proceeds in three stages: dissecting the transformer block into intermediate states, mapping each state to brain activity via voxel-wise encoding, and then aggregating the resulting computational neuroanatomy map into a feature selection framework.

Brain Data and Encoding Pipeline

The fMRI data comes from the English subset of the Le Petit Prince corpus (49 native speakers listening to an audiobook for approximately 100 minutes across 9 runs). For each voxel, signals are averaged across participants to improve SNR, then aligned to LLM features via the standard pipeline:

HRF convolution. Word-level LLM activations are convolved with the canonical Glover hemodynamic response function to match the slow, delayed temporal profile of the BOLD signal.
Ridge regression. A separate L2-regularized linear model is fit per voxel, mapping HRF-convolved features to fMRI activity. The regularization parameter is selected by nested cross-validation in the range .
9-fold cross-validation over the 9 runs of the audiobook, with the final reported metric being the mean Pearson correlation across folds.
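The three steps above can be sketched roughly as follows. This is a toy under stated assumptions: the double-gamma parameters are illustrative values in the spirit of the Glover model rather than the paper's exact HRF, the repetition time `tr = 2.0` is assumed, summing word features into TR bins is my own simplification of the resampling step, and the onsets and features are invented.

```python
import math
import numpy as np

def double_gamma_hrf(tr, duration=32.0, peak=6.0, undershoot=12.0,
                     dispersion=0.9, ratio=0.35):
    """Double-gamma HRF sampled every `tr` seconds (illustrative parameters)."""
    t = np.arange(0.0, duration, tr)

    def gamma_pdf(t, shape, scale):
        out = np.zeros_like(t)
        pos = t > 0
        out[pos] = (t[pos] ** (shape - 1) * np.exp(-t[pos] / scale)
                    / (math.gamma(shape) * scale ** shape))
        return out

    hrf = (gamma_pdf(t, peak / dispersion, dispersion)
           - ratio * gamma_pdf(t, undershoot / dispersion, dispersion))
    return hrf / hrf.sum()

def words_to_tr_grid(word_onsets, word_features, n_trs, tr):
    """Sum word-level feature vectors into the TR bin containing each onset."""
    grid = np.zeros((n_trs, word_features.shape[1]))
    bins = np.minimum((np.asarray(word_onsets) / tr).astype(int), n_trs - 1)
    np.add.at(grid, bins, word_features)
    return grid

def convolve_hrf(grid, hrf):
    """Causally convolve each feature column with the HRF, truncated to n_trs."""
    return np.stack([np.convolve(col, hrf)[: len(col)] for col in grid.T], axis=1)

def leave_one_run_out(run_ids):
    """Yield (train_idx, test_idx) pairs with one run held out per fold."""
    run_ids = np.asarray(run_ids)
    for run in np.unique(run_ids):
        yield np.where(run_ids != run)[0], np.where(run_ids == run)[0]

tr, n_trs = 2.0, 18                        # assumed TR and toy scan length
word_onsets = [0.0, 0.6, 1.4, 7.2, 20.5]   # seconds, invented
word_features = np.arange(10, dtype=float).reshape(5, 2)
design = convolve_hrf(words_to_tr_grid(word_onsets, word_features, n_trs, tr),
                      double_gamma_hrf(tr))
run_ids = np.repeat(np.arange(9), 2)       # 9 runs, 2 TRs each in this toy
folds = list(leave_one_run_out(run_ids))
print(design.shape, len(folds))
```

Holding out whole runs, rather than random time points, keeps temporally adjacent (and therefore autocorrelated) samples from leaking between the training and test sets.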

Decomposition of the Transformer Block

The block is decomposed into 13 intermediate states organized in three stages:

Block Input (2 states): input hidden state and pre-attention normalized state.
Attention Mechanism (7 states): per-head Q, per-head K, per-head Q with RoPE, per-head K with RoPE, per-head V, per-head context vector, and combined attention output.
FFN and Residuals (4 states): post-attention hidden state, pre-FFN normalized state, FFN activated state (after up-projection), and FFN output (after down-projection).
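To see where each of the 13 states sits in the forward pass, I find it helpful to write out a toy pre-norm block that records them by name. Everything here is an assumption made for illustration: the state names, RMSNorm without a learned gain, standard multi-head attention with as many K/V heads as Q heads, a plain SiLU FFN without the gating branch, random weights, and a `(heads, tokens, head_dim)` layout for per-head states. The paper extracts these states from real pretrained models, not from a toy like this.

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    return x / np.sqrt((x ** 2).mean(axis=-1, keepdims=True) + eps)

def rope(x, base=10000.0):
    """Rotate feature pairs of x (heads, tokens, head_dim) by position-dependent angles."""
    _, n_tokens, head_dim = x.shape
    half = head_dim // 2
    freqs = base ** (-np.arange(half) / half)             # (half,)
    angles = np.arange(n_tokens)[:, None] * freqs[None]   # (tokens, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

def split_heads(x, n_heads):
    n_tokens, d_model = x.shape
    return x.reshape(n_tokens, n_heads, d_model // n_heads).transpose(1, 0, 2)

def block_states(h, p, n_heads):
    """Run one pre-norm block on h (tokens, d_model) and return the 13 named states."""
    n_tokens = h.shape[0]
    s = {"input_hidden": h}
    s["pre_attn_norm"] = rms_norm(h)
    s["q"] = split_heads(s["pre_attn_norm"] @ p["wq"], n_heads)
    s["k"] = split_heads(s["pre_attn_norm"] @ p["wk"], n_heads)
    s["q_rope"] = rope(s["q"])
    s["k_rope"] = rope(s["k"])
    s["v"] = split_heads(s["pre_attn_norm"] @ p["wv"], n_heads)
    scores = s["q_rope"] @ s["k_rope"].transpose(0, 2, 1) / np.sqrt(s["q"].shape[-1])
    causal = np.tril(np.ones((n_tokens, n_tokens), dtype=bool))
    scores = np.where(causal, scores, -np.inf)             # mask future tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    s["context"] = weights @ s["v"]                        # (heads, tokens, head_dim)
    merged = s["context"].transpose(1, 0, 2).reshape(n_tokens, -1)
    s["attn_output"] = merged @ p["wo"]
    s["post_attn_hidden"] = h + s["attn_output"]           # first residual
    s["pre_ffn_norm"] = rms_norm(s["post_attn_hidden"])
    up = s["pre_ffn_norm"] @ p["w_up"]
    s["ffn_activated"] = up / (1.0 + np.exp(-up))          # SiLU(up)
    s["ffn_output"] = s["ffn_activated"] @ p["w_down"]
    return s

rng = np.random.default_rng(0)
n_tokens, d_model, n_heads, d_ffn = 6, 16, 4, 32
shapes = {"wq": (d_model, d_model), "wk": (d_model, d_model), "wv": (d_model, d_model),
          "wo": (d_model, d_model), "w_up": (d_model, d_ffn), "w_down": (d_ffn, d_model)}
p = {name: rng.standard_normal(shape) / np.sqrt(shape[0]) for name, shape in shapes.items()}
states = block_states(rng.standard_normal((n_tokens, d_model)), p, n_heads)
for name, value in states.items():
    print(f"{name:18s} {value.shape}")
```

The block's final output, `post_attn_hidden + ffn_output`, is not listed separately because in a residual stream it is the next block's input hidden state.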

Per-head states retain shapes such as rather than being flattened, preserving head-level structure for regression. Token-level activations are aggregated to word level by averaging across subword tokens, matching the word-by-word onset timing of the audiobook stimulus.
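The subword-to-word averaging is simple enough to sketch directly. The helper name `average_to_words` and the `word_ids` list (one word index per token, contiguous from 0, with special tokens already removed) are my own assumptions about how the token-to-word mapping is stored.

```python
import numpy as np

def average_to_words(token_features, word_ids):
    """Average token-level features (tokens, dim) over tokens that share a word id."""
    word_ids = np.asarray(word_ids)
    n_words = word_ids.max() + 1
    sums = np.zeros((n_words, token_features.shape[1]))
    np.add.at(sums, word_ids, token_features)              # accumulate per word
    counts = np.bincount(word_ids, minlength=n_words)[:, None]
    return sums / counts

# Three tokens: the first two are subwords of word 0, the third is word 1.
token_features = np.array([[1.0, 1.0], [3.0, 3.0], [5.0, 5.0]])
words = average_to_words(token_features, [0, 0, 1])
print(words)  # [[2. 2.] [5. 5.]]
```

For per-head states the same averaging would be applied along the token axis of each head.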

This decomposition is applied to 21 open-weight LLMs spanning the Llama, Qwen, Mistral, GPT, and Gemma families, ranging from 270M to 123B parameters. The set covers grouped-query attention (Llama, Qwen, most Mistral and Gemma variants), multi-query attention (Gemma 270M and 1B), and mixture-of-experts (GPT-oss); the paper additionally states that standard multi-head attention is included, although every model listed in Appendix Table 2 uses .

Winning State Analysis

For each voxel, the state achieving the highest test correlation is recorded. The proportion of voxels where each state wins is then computed both globally and within specific ROIs: the whole brain (25,870 voxels), the Fedorenko language network (1,740 voxels, comprising IFG orbital, IFG, MFG, anterior and posterior temporal regions, and the angular gyrus), and the auditory-sensory cortex (325 voxels, comprising Heschl's gyrus, planum temporale, and the anterior and posterior divisions of the STG). Note that in this ROI scheme STG is grouped with the auditory cortex rather than the language network.
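A minimal sketch of the winning-state bookkeeping, assuming the per-state test correlations are stacked into one `(n_states, n_voxels)` array and each ROI is given as a boolean mask. The function name, state labels, and numbers below are invented for illustration.

```python
import numpy as np

def winning_ratios(corr, state_names, roi_mask=None):
    """corr: (n_states, n_voxels) test correlations. Returns {state: fraction of voxels won}."""
    if roi_mask is not None:
        corr = corr[:, roi_mask]
    winners = corr.argmax(axis=0)                          # index of best state per voxel
    counts = np.bincount(winners, minlength=len(state_names))
    return {name: counts[i] / corr.shape[1] for i, name in enumerate(state_names)}

state_names = ["input_hidden", "q_rope", "ffn_output"]
corr = np.array([[0.1, 0.5, 0.2, 0.3],    # input_hidden
                 [0.4, 0.2, 0.6, 0.1],    # q_rope
                 [0.3, 0.3, 0.1, 0.2]])   # ffn_output
auditory_mask = np.array([True, False, True, False])       # hypothetical ROI

whole = winning_ratios(corr, state_names)
auditory = winning_ratios(corr, state_names, auditory_mask)
print(whole)     # input_hidden and q_rope each win half of the voxels
print(auditory)  # q_rope wins every voxel inside the toy ROI
```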

Intra-Block Hierarchy Quantification

To test whether the position of the winning state within the block correlates with the position of the brain region in the cortical hierarchy, the authors restrict attention to a focused ordered subset of 8 states. The subset starts from per-head Query, identified as the functional entry point because it is the first state to consistently align with early sensory cortices. The input hidden state and pre-attention normalized state are excluded as preceding this entry point, and Key, Value, and Key-with-RoPE are excluded to avoid redundancy with Query, since they occupy the same topological layer but exhibit weaker alignment. Each state in the resulting ordered subset is assigned a normalized Computational Depth:

$$d(s_i) = \frac{i - 1}{N - 1}, \quad i = 1, \dots, N$$

with $N = 8$, so per-head Query maps to $d = 0$ and FFN output to $d = 1$. For each brain region $R$, a Weighted Computational Depth is computed by averaging across voxels:

$$\mathrm{WCD}(R) = \frac{1}{|V_R|} \sum_{v \in V_R} d(s^*_v)$$

where $V_R$ is the set of voxels in $R$ and $s^*_v$ is the winning state for voxel $v$. The value of $\mathrm{WCD}(R)$ is then plotted against the cortical position of $R$ along the auditory stream (HG → PT → STG → MTG) and the language network (MTG → IFG → AG), and a linear fit quantifies the alignment between the transformer's internal processing order and the brain's anatomical processing order.
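Here is a small sketch of the depth computation and the linear fit. I am assuming that the normalized depth of a state is its zero-based position in the 8-state ordered subset divided by 7 (giving 0 for per-head Query and 1 for FFN output), and that cortical positions are coded as evenly spaced integers. The state labels and the per-region winners are invented toy data, not results from the paper.

```python
import numpy as np

# The 8-state ordered subset: the 13 states minus the two block-input states and K, V, K-with-RoPE.
ORDERED_STATES = ["q", "q_rope", "context", "attn_output",
                  "post_attn_hidden", "pre_ffn_norm", "ffn_activated", "ffn_output"]

def computational_depth(states):
    """Map each state to its normalized position in [0, 1]."""
    n = len(states)
    return {s: i / (n - 1) for i, s in enumerate(states)}

def weighted_depth(winning_states, depth):
    """Average the depth of the winning state over the voxels of one region."""
    return float(np.mean([depth[s] for s in winning_states]))

depth = computational_depth(ORDERED_STATES)

# Invented winners for four voxels per region along the auditory stream.
winners = {
    "HG":  ["q_rope", "q_rope", "q", "q_rope"],
    "PT":  ["q_rope", "context", "q_rope", "attn_output"],
    "STG": ["context", "post_attn_hidden", "pre_ffn_norm", "q_rope"],
    "MTG": ["ffn_activated", "ffn_output", "pre_ffn_norm", "ffn_activated"],
}
wcd = {region: weighted_depth(states, depth) for region, states in winners.items()}

positions = np.arange(len(wcd))                  # HG=0, PT=1, STG=2, MTG=3
values = np.array(list(wcd.values()))
slope, intercept = np.polyfit(positions, values, 1)
residual = values - (slope * positions + intercept)
r2 = 1.0 - (residual ** 2).sum() / ((values - values.mean()) ** 2).sum()
print(wcd, round(slope, 3), round(r2, 3))
```

A positive slope with a good fit would mean that regions further along the stream are, on average, won by states deeper inside the block.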

MindTransformer Framework

Building on the per-voxel winning state analysis, MindTransformer is proposed in two modes:

Mode 1 (Optimal Single-State Selection). For each voxel or ROI, identify the single intermediate state with the highest training-set encoding correlation and use only that state for the final encoding model. This formalizes the winning state analysis as a prediction method.
Mode 2 (Multi-State Feature Integration). Concatenate representations from all 13 states and fit a ridge regression on the full feature space. Then select the top- features ranked by (with ) and retrain a refined ridge model on just those features. All feature selection happens strictly within the training set to prevent leakage.
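The two modes can be sketched for a single voxel as below. Several pieces are my own assumptions and should not be read as the authors' implementation: Mode 1 is scored with the in-sample training correlation of a fixed-`alpha` ridge fit, Mode 2 ranks features by absolute ridge coefficient (the ranking criterion is a placeholder I picked for illustration), and the states, sizes, and data are synthetic.

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

def pearson(a, b):
    a, b = a - a.mean(), b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def mode1_select_state(states_train, y_train, alpha=1.0):
    """Pick the state whose ridge fit has the highest training-set correlation."""
    scores = {name: pearson(y_train, X @ ridge_fit(X, y_train, alpha))
              for name, X in states_train.items()}
    return max(scores, key=scores.get)

def mode2_select_features(states_train, y_train, k, alpha=1.0):
    """Concatenate all states, fit ridge, keep the top-k columns, then refit on them.

    Ranking by |coefficient| is an assumed criterion for this sketch.
    """
    X = np.concatenate(list(states_train.values()), axis=1)
    w_full = ridge_fit(X, y_train, alpha)
    keep = np.sort(np.argsort(np.abs(w_full))[-k:])
    return keep, ridge_fit(X[:, keep], y_train, alpha)

# Synthetic voxel driven only by the "q_rope" state.
rng = np.random.default_rng(0)
n = 300
states = {name: rng.standard_normal((n, 4))
          for name in ["input_hidden", "q_rope", "ffn_output"]}
y = states["q_rope"] @ np.array([1.0, -1.0, 0.5, 2.0]) + 0.1 * rng.standard_normal(n)

train = slice(0, 200)
states_train = {name: X[train] for name, X in states.items()}
best = mode1_select_state(states_train, y[train])
keep, w = mode2_select_features(states_train, y[train], k=4)

# Selection used training rows only; the held-out rows are touched once, for scoring.
X_test = np.concatenate(list(states.values()), axis=1)[200:][:, keep]
r_test = pearson(y[200:], X_test @ w)
print(best, keep, round(r_test, 3))
```

One caveat with the in-sample scoring used here: it tends to favor higher-dimensional states, so a real Mode 1 would more safely score candidates with an inner cross-validation inside the training set.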

Robustness Controls

Two controls address possible confounds. First, since FFN activated states have higher dimensionality than other states, the authors apply top-k feature selection to fix all states at . The auditory-vs-language dissociation persists, confirming that high feature count is not the primary driver. Second, encoding correlations are baseline-adjusted against random embeddings and GloVe embeddings to isolate the contribution of context-aware LLM processing beyond stimulus-onset tracking or static lexical features.

Result

Suboptimality of Standard Hidden States

The input hidden state (used by almost all prior alignment work) and the per-head context vector (used by Kumar et al., 2024), the two most commonly studied representations, together win in only 16.65% of whole-brain voxels (2.34% and 14.31% respectively). Within the language network this drops to 9.91%, and within the auditory cortex to 3.68%. Selecting the optimal state per voxel raises mean alignment correlation from 0.275 to 0.296 in the whole brain, 0.433 to 0.450 in the language network, and most dramatically from 0.407 to 0.475 in the auditory cortex.

Intra-Block Hierarchy

The winning state for each voxel systematically tracks the brain's hierarchy. Early-stage states (per-head Q, per-head Q with RoPE) dominate in Heschl's gyrus and the superior temporal gyrus, while late-stage states (FFN activated state, FFN output) dominate in the inferior frontal gyrus and angular gyrus. Quantitatively, the Weighted Computational Depth versus Cortical Hierarchy plot is reported separately for the auditory stream (HG → PT → STG → MTG) and the language network (MTG → IFG → AG). Along the auditory stream all five LLM families show a steep positive slope with consistently high (0.74 to 0.79; e.g. for Llama-3, for Mistral), indicating that ascending cortical levels correspond to deeper intra-block computational depth. Along the language network the trajectory is flat and the values are uneven across families (0.00 for Qwen-3 to 1.00 for Llama-3, with slopes between 0.03 and 0.28), consistent with the interpretation that alignment in association cortex has already saturated at the block's later stages. The same auditory steepness and language plateau are recovered when 5 subjects are analyzed individually on Llama 3.2 1B, suggesting the pattern is not an artifact of group averaging.

Role of RoPE in Auditory Alignment

The per-head query with RoPE is the single state that most improves alignment over the input hidden state, and its improvement is anatomically structured. The top-10 parcels with the largest improvement clearly trace the ventral and dorsal streams of auditory language processing, with the largest gain in Heschl's gyrus itself. The winning ratio comparison is striking: per-head Q with RoPE wins in 73.88% of auditory voxels versus 7.82% for per-head Q without RoPE. The pattern reverses in the language network, where per-head Q without RoPE (19.43%) outperforms its RoPE-enhanced counterpart (9.82%), indicating that RoPE's contribution is specifically tuned to low-level sensory processing rather than being a uniform improvement.
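As a reminder of what RoPE actually changes about a query, here is a small numerical sketch of its two defining properties: the rotation leaves the vector norm untouched, and the query-key dot product depends only on the relative offset between positions. The half-split pairing convention, the base of 10000, and the random vectors are assumptions for illustration. The sketch only shows the mechanics of RoPE and says nothing about why auditory cortex prefers the rotated queries.

```python
import numpy as np

def rope_rotate(x, position, base=10000.0):
    """Apply RoPE to a single even-length vector x at an integer position."""
    half = x.shape[0] // 2
    angles = position * base ** (-np.arange(half) / half)
    x1, x2 = x[:half], x[half:]
    return np.concatenate([x1 * np.cos(angles) - x2 * np.sin(angles),
                           x1 * np.sin(angles) + x2 * np.cos(angles)])

rng = np.random.default_rng(0)
q, k = rng.standard_normal(8), rng.standard_normal(8)

# Same relative offset (3) at two very different absolute positions.
score_near = rope_rotate(q, 5) @ rope_rotate(k, 2)
score_far = rope_rotate(q, 105) @ rope_rotate(k, 102)
# Different relative offset (1) for comparison.
score_other = rope_rotate(q, 5) @ rope_rotate(k, 4)

print(np.isclose(score_near, score_far))                     # offset-only dependence
print(np.isclose(np.linalg.norm(rope_rotate(q, 5)), np.linalg.norm(q)))
print(score_near, score_other)
```

So the query with RoPE carries explicit positional phase that the query without RoPE lacks, which is the only difference between the two states being compared.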

MindTransformer Performance

Averaged across the 21 LLMs and all transformer layers, MindTransformer Mode 2 improves Heschl's gyrus correlation from 0.356 (standard baseline) to 0.467, a 31.0% relative gain. The auditory average improves by 22.0%, while the language network average improves by only 2.6%, reinforcing that the value of intermediate states is concentrated in low-level sensory regions. To contextualize the size of the auditory gain, the paper notes that scaling LLMs 456 times from 270M to 123B parameters typically yields correlation improvements of only 0.02 to 0.04 in auditory regions, whereas Mode 2 achieves a 0.111 gain in Heschl's gyrus.
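As a quick sanity check on the numbers in this paragraph, the arithmetic below uses only the rounded values quoted above:

$$\frac{123 \times 10^{9}}{270 \times 10^{6}} \approx 455.6 \approx 456, \qquad 0.467 - 0.356 = 0.111, \qquad \frac{0.111}{0.04} \approx 2.78, \qquad \frac{0.111}{0.02} \approx 5.55$$

So the Mode 2 gain in Heschl's gyrus is roughly 2.8 to 5.6 times the gain attributed to scaling. The relative gain computed from the rounded averages is $0.111 / 0.356 \approx 31.2\%$, close to the reported 31.0%; I assume the small gap comes from rounding or from the order in which gains are averaged across models and layers.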

These improvements survive several robustness controls. Per-subject analysis on the first 5 subjects with Llama 3.2 1B reproduces a 21.9% gain in Heschl's gyrus. Baseline-adjusted improvements remain large: 29.2% over the standard baseline after random-embedding adjustment and 46.0% after GloVe adjustment, both in Heschl's gyrus. Bootstrap significance analysis (FDR ) shows that Mode 2 expands the coverage of significantly aligned voxels in Heschl's gyrus from 78.6% to 89.0%.

In some language regions, Mode 1 slightly outperforms Mode 2 (e.g. Angular Gyrus, 0.419 vs 0.409), suggesting that for high-level semantic processing a single well-chosen state suffices and adding more states can introduce noise.

Related Papers

Scaling laws for language encoding models in fMRI: establishes the log-linear scaling of brain alignment with model size that this paper uses as the comparison baseline for its 456-fold claim.
Shared functional specialization in transformer-based language models and the human brain: the immediate predecessor that introduced per-head context vectors as brain-alignment features, which this paper extends by analyzing all 13 intermediate states.
Brain-Like Language Processing via a Shallow Untrained Multihead Attention Network: relevant to the question of whether the RoPE-auditory result reflects architectural priors versus learned computation, since it shows that untrained attention architectures already capture significant brain variance.