[Literature Review] Parallelized Hierarchical Connectome: A Spatiotemporal Recurrent Framework for Spiking State-Space Models

This review is intended for my personal learning

Paper Info

arXiv: 2604.01295
Title: Parallelized Hierarchical Connectome: A Spatiotemporal Recurrent Framework for Spiking State-Space Models
Authors: Po-Han Chiang

I learned about this work through Dr. Po-Han Chiang's presentation at Rethinking Intelligence: A NeuroAI Symposium (RIANS). Many thanks to Dr. Chiang for sharing this line of research.


Prior Knowledge

Modern state-space models (SSMs) such as S4, S4D, S5, Mamba, and LRU achieve efficient sequence processing by restricting their state-transition matrices to diagonal form, enabling parallel associative scans with $\mathcal{O}(\log T)$ sequential depth. This diagonal constraint means that each hidden dimension evolves independently at every timestep: neurons within the same step are mutually decoupled, precluding lateral or feedback interactions. To recover representational depth, current SSM architectures stack $L$ independent SSM blocks interleaved with MLPs, yielding $\mathcal{O}(L \cdot D^2)$ parameters for hidden dimension $D$.
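As a concrete illustration of why the diagonal form permits parallelism, here is a minimal sketch (mine, not the paper's) of a diagonal linear recurrence $x_t = a \odot x_{t-1} + b \odot u_t$ solved with an associative scan; each of the $D$ dimensions evolves independently, which is exactly the decoupling described above.

```python
# Minimal sketch (not from the paper): a diagonal linear recurrence
# x_t = a * x_{t-1} + b * u_t solved in parallel with an associative scan.
import jax
import jax.numpy as jnp

def diagonal_ssm_scan(a, b, u):
    """a, b: (D,) diagonal parameters; u: (T, D) input sequence; returns states (T, D)."""
    # Represent each step as the affine map x -> a * x + b * u_t, i.e. the pair (a, b * u_t).
    elems = (jnp.broadcast_to(a, u.shape), b * u)

    def combine(left, right):
        # Compose two affine maps applied in order: first `left`, then `right`.
        a1, c1 = left
        a2, c2 = right
        return a2 * a1, a2 * c1 + c2

    _, x = jax.lax.associative_scan(combine, elems)
    return x

# Example: 1,000 timesteps, 16 hidden dimensions that never interact with each other.
u = jax.random.normal(jax.random.PRNGKey(0), (1000, 16))
states = diagonal_ssm_scan(jnp.full((16,), 0.95), jnp.ones((16,)), u)
```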

Separately, spiking neural networks (SNNs) and recurrent neural networks (RNNs) support rich spatiotemporal dynamics including lateral connections, Dale's Law (excitatory/inhibitory segregation), and spike-timing-dependent plasticity (STDP). However, their non-linear recurrent dependencies force strictly sequential execution, so training over long sequences cannot exploit parallel hardware. Several spiking SSMs have been proposed (Binary-S4D, SpikingSSM, SPikE-SSM), but none enforces the full suite of biological constraints simultaneously, and none introduces learnable lateral connections within the parallelizable recurrence.

The Tsodyks-Markram model of short-term plasticity (STP) describes how synaptic efficacy varies dynamically through facilitation and depression, transforming static structural connectivity into time-varying effective weights. The adaptive leaky integrate-and-fire (ALIF) neuron model extends the standard LIF with an adaptive firing threshold that accumulates with spiking history, providing intrinsic single-cell memory.
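For reference, a standard textbook form of these two models (the symbols are conventional choices, not necessarily the paper's notation): facilitation $u$ and recovery $x$ for STP, and an adaptive threshold $\theta_t$ with spike-history variable $a_t$ for ALIF:

$$
\frac{du}{dt} = -\frac{u}{\tau_f} + U\,(1-u)\,\delta(t - t_{\mathrm{sp}}), \qquad
\frac{dx}{dt} = \frac{1-x}{\tau_d} - u\,x\,\delta(t - t_{\mathrm{sp}}), \qquad
w_{\mathrm{eff}}(t) = A\,u(t)\,x(t)
$$

$$
\theta_t = \theta_0 + \beta\,a_t, \qquad a_t = \rho\,a_{t-1} + s_{t-1}, \qquad s_t = H(v_t - \theta_t)
$$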


Main Question

Can the fundamental trade-off between learnable lateral spatial connections (inter-neuron interactions within a timestep) and temporal parallel scan efficiency in SSMs be resolved through a framework that decouples spatial and temporal dynamics, and can this framework support a comprehensive suite of biological constraints while maintaining competitive sequence modeling performance?


Key Claims

  1. The Parallelized Hierarchical Connectome (PHC) framework is the first SSM architecture to introduce learnable weighted lateral connections within the recurrence structure while preserving temporal parallel scan efficiency.
  2. By collapsing stacked layers into a single shared Neuron Layer and Synapse Layer connected via a Multi-Transmission Loop, the framework reduces parameter complexity from $\mathcal{O}(L \cdot D^2)$ to $\mathcal{O}(D^2)$, where $L$ is the number of stacked blocks and $D$ the hidden dimension.
  3. The instantiation PHCSSM is the first model to simultaneously enforce five biological constraints (ALIF dynamics, Dale's Law, short-term plasticity, hierarchical connectome topology, and reward-modulated STDP) within a fully parallelizable training pipeline.
  4. Each biological constraint contributes positively and non-redundantly to performance, functioning as stabilizing inductive biases rather than bottlenecks.

Method

Intra-Step Spatiotemporal Decoupling

The core architectural principle is the separation of temporal and spatial dynamics into distinct, bidirectionally coupled components. PHC identifies a structural isomorphism between conventional stacked SSMs and neural circuit elements: the diagonal SSM core maps to a shared Neuron Layer (NL) that encapsulates intrinsic membrane dynamics, while the inter-layer MLP maps to a shared Synapse Layer (SL) that mediates all inter-neuronal communication. Rather than replicating independent parameters across layers, PHC partitions the neurons of the single NL into hierarchical regions and routes communication through a single Hierarchical Connectome Matrix $M \odot W$, where $M$ is a binary topology mask governing permissible projections (feedforward, feedback, and intra-region).
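To make the topology mask concrete, here is an illustrative sketch (the region sizes, helper name, and adjacent-region connectivity rules are my assumptions, not the paper's code) of a binary mask $M$ permitting intra-region, feedforward, and feedback projections; the effective connectome is then $M \odot W$.

```python
# Illustrative sketch: build a binary topology mask M over hierarchical regions,
# allowing intra-region, feedforward (r -> r+1), and feedback (r+1 -> r) projections.
import jax.numpy as jnp

def hierarchical_mask(region_sizes):
    bounds = [0]
    for size in region_sizes:
        bounds.append(bounds[-1] + size)
    n = bounds[-1]
    M = jnp.zeros((n, n))                                  # M[post, pre]
    for r in range(len(region_sizes)):
        lo, hi = bounds[r], bounds[r + 1]
        M = M.at[lo:hi, lo:hi].set(1.0)                    # intra-region
        if r + 1 < len(region_sizes):
            nlo, nhi = bounds[r + 1], bounds[r + 2]
            M = M.at[nlo:nhi, lo:hi].set(1.0)              # feedforward r -> r+1
            M = M.at[lo:hi, nlo:nhi].set(1.0)              # feedback r+1 -> r
    return M

M = hierarchical_mask([8, 8, 8])     # three regions of 8 neurons each
# Given learnable dense weights W, all communication uses the masked matrix M * W.
```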

Multi-Transmission Loop

To recover the logical processing depth lost by collapsing layers, the Multi-Transmission Loop circulates signals between NL and SL for up to $K$ steps within each temporal window. At transmission step $k$, the SL maps the previous step's spike trains to synaptic currents and the NL maps those currents back to spikes, applied to all timesteps at once:

$$ I_t^{(k)} = \mathrm{SL}\big(s_t^{(k-1)}\big), \qquad s_t^{(k)} = \mathrm{NL}\big(I_t^{(k)}\big). $$

The loop terminates via a Cauchy convergence criterion on the synaptic current, or at a maximum of $K$ steps. This achieves depth-$K$ spatial recurrence with total computational depth $\mathcal{O}(K \log T)$, where the spatial and temporal axes are fully orthogonal. Parallelizability is maintained because within each transmission step, every neuron's input is fixed across all timesteps (determined by the previous step), so the neuron-internal scans and the per-timestep matrix multiplication remain parallelizable.
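A rough sketch of the loop in my own paraphrase (the `neuron_layer`/`synapse_layer` callables, the convergence tolerance, and the default $K$ are assumptions): each iteration recomputes synaptic currents from the previous iteration's spike trains and re-runs the neuron scans over the whole sequence.

```python
# Rough sketch of the Multi-Transmission Loop (my paraphrase, not the paper's code).
# `neuron_layer` and `synapse_layer` are assumed callables over whole sequences of shape (T, N).
import jax.numpy as jnp

def multi_transmission_loop(x, neuron_layer, synapse_layer, K=4, tol=1e-4):
    """x: (T, N) external input current; returns the spike trains after the loop settles."""
    spikes = neuron_layer(x)                     # step 0: neurons driven by external input only
    prev_current = jnp.zeros_like(x)
    for _ in range(K):                           # at most K transmission steps
        current = x + synapse_layer(spikes)      # lateral/feedback input from previous step's spikes
        spikes = neuron_layer(current)           # all T timesteps processed in parallel
        if jnp.max(jnp.abs(current - prev_current)) < tol:
            break                                # Cauchy-style convergence of the synaptic current
        prev_current = current
    return spikes
```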

Neuron Layer

The NL implements ALIF dynamics through three sequential diagonal parallel scans, each solved via log-domain parallel prefix sums. Scan 1 integrates synaptic input through a learnable excitatory decay to produce the membrane potential $v_t$. Scan 2 computes an adaptive threshold $\theta_t$ driven by membrane-to-threshold proximity. Scan 3 models post-spike refractory suppression $\rho_t$. The final spike output is:

$$ s_t = H\!\big(v_t - \theta_t - \rho_t\big), $$

where $H$ is the Heaviside function with a fast sigmoid surrogate gradient for backpropagation.
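For reference, a common way to implement such a surrogate-gradient Heaviside in JAX (the surrogate slope value here is an assumption, not taken from the paper):

```python
# Sketch of a Heaviside spike with a fast-sigmoid surrogate gradient.
import jax
import jax.numpy as jnp

@jax.custom_vjp
def spike(x):
    return (x > 0).astype(x.dtype)           # forward pass: hard Heaviside

def spike_fwd(x):
    return spike(x), x

def spike_bwd(x, g):
    slope = 10.0                              # surrogate sharpness (assumed value)
    surrogate = 1.0 / (slope * jnp.abs(x) + 1.0) ** 2   # derivative of the fast sigmoid
    return (g * surrogate,)

spike.defvjp(spike_fwd, spike_bwd)

# Usage: s_t = spike(v_t - theta_t - rho_t); gradients flow through the surrogate.
```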

Synapse Layer

The SL comprises a Pre-synapse Module implementing Tsodyks-Markram STP and a Post-synapse Module enforcing biologically constrained spatial transmission. STP operates through two per-neuron state variables, facilitation $u_t$ and recovery $x_t$, both reformulated as affine recurrences and solved via two sequential parallel scans. The effective synaptic current becomes:

$$ I_t = \big(M \odot W\big)\,\big(u_t \odot x_t \odot \hat{s}_t\big), $$

where $\hat{s}_t$ is the delayed pre-synaptic spike. The weight matrix $W$ is subject to three constraints: zero-diagonal (no autaptic self-excitation), Dale's Law (sign-clamping excitatory columns to non-negative and inhibitory columns to non-positive), and hierarchical topology masking via $M$.
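A small sketch of how the three weight constraints might be applied (the function name, E/I split, and ordering are my assumptions):

```python
# Sketch: enforce zero-diagonal, Dale's Law, and hierarchical topology masking on W.
import jax.numpy as jnp

def constrain_weights(W_raw, M, ei_sign):
    """W_raw: (N, N) learnable weights (rows = post, cols = pre).
    M: (N, N) binary hierarchical topology mask.
    ei_sign: (N,) +1 for excitatory, -1 for inhibitory pre-synaptic neurons (Dale's Law)."""
    W = W_raw * (1.0 - jnp.eye(W_raw.shape[0]))          # zero diagonal: no autapses
    W = jnp.where(ei_sign[None, :] > 0,                  # sign-clamp each pre-synaptic column
                  jnp.maximum(W, 0.0),                    # excitatory columns >= 0
                  jnp.minimum(W, 0.0))                    # inhibitory columns <= 0
    return W * M                                          # hierarchical topology mask

# Usage with an assumed 80/20 E/I split over N = 24 neurons:
N = 24
ei_sign = jnp.where(jnp.arange(N) < int(0.8 * N), 1.0, -1.0)
```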

R-STDP

Exponentially decaying eligibility traces track pre- and post-synaptic spike timing, both solved in $\mathcal{O}(\log T)$ via the same parallel scan primitive. The accumulated weight update is gated by a reward signal $r$ derived from batch-level classification accuracy:

$$ \Delta W = \eta_{\mathrm{STDP}} \, r \sum_{t} e_t, $$

where $e_t$ is the eligibility trace accumulated from paired pre- and post-synaptic spikes. R-STDP operates only during training as a complementary learning signal outside the autograd computation graph, while gradient descent handles global optimization.
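A sketch of what reward-gated eligibility traces can look like with the same parallel-scan primitive (the trace form, decay constant, and learning rate below are assumptions, not the paper's values):

```python
# Sketch of reward-modulated STDP with exponentially decaying eligibility traces.
import jax
import jax.numpy as jnp

def rstdp_update(pre_spikes, post_spikes, reward, lam=0.9, lr=1e-3):
    """pre_spikes, post_spikes: (T, N) spike trains as float arrays (0./1.);
    reward: scalar derived from batch accuracy. Returns (N, N) update (rows = post, cols = pre)."""
    def trace(s):
        # Exponentially decaying trace as an affine recurrence, solved with an associative scan.
        elems = (jnp.full_like(s, lam), s)
        def combine(left, right):
            a1, c1 = left
            a2, c2 = right
            return a2 * a1, a2 * c1 + c2
        _, tr = jax.lax.associative_scan(combine, elems)
        return tr

    pre_tr, post_tr = trace(pre_spikes), trace(post_spikes)
    # Eligibility: potentiation when a post spike follows the pre trace,
    # depression when a pre spike follows the post trace.
    eligibility = post_spikes.T @ pre_tr - (pre_spikes.T @ post_tr).T
    return lr * reward * eligibility
```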

Experimental Setup

PHCSSM is evaluated on six datasets from the UEA Multivariate Time-Series Classification Archive, following the benchmark protocol of Walker et al. (2024): Heartbeat (405 steps), SCP1 (896 steps), SCP2 (1,152 steps), EthanolConcentration (1,751 steps), MotorImagery (3,000 steps), and EigenWorms (17,984 steps). A 70/15/15 split is used with grid search over learning rate, neuron dimension, readout strategy, and topology configuration, with test accuracy averaged over five fixed seeds. Baselines include NRDE, NCDE, Log-NCDE, LRU, S5, S6, Mamba, LinOSS-IMEX, LinOSS-IM, Transformer, RFormer, LrcSSM, and PD-SSM, all of which are unconstrained models.


Result

PHCSSM achieves the highest reported SSM accuracy on SCP2 (59.3%), surpassing LinOSS-IMEX (58.9%). On MotorImagery, it scores 53.7%, outperforming Mamba by 6.0 percentage points and exceeding six other baselines. On EigenWorms (17,984 timesteps), it achieves 83.9% with only 2,701 parameters, surpassing five baselines including LinOSS-IMEX (80.0%) and Mamba (70.9%). On Heartbeat, 74.2% exceeds LrcSSM (72.7%).

On SCP1 (80.0%) and EthanolConcentration (32.7%), PHCSSM does not reach the top unconstrained models, which the author attributes to the topology-constrained connectivity and fixed E/I ratio limiting capacity on tasks with low channel counts or where richer gating mechanisms are advantageous.

Parameter counts range from 1,748 to 9,485, one to two orders of magnitude smaller than comparably performing SSMs (S5: 67K to 206K, Mamba: 67K to 401K). Training runtime for 1,000 steps with R-STDP ranges from 27 to 129 seconds on an NVIDIA RTX 4090, and peak GPU memory stays between 10 and 48 MB, comparable to unconstrained baselines.

The ablation study across three benchmarks shows that all four ablated constraints contribute positively. ALIF dynamics produce the largest mean accuracy drop when removed (5.23 percentage points), with the most pronounced effect on SCP2 (12.3 points). R-STDP removal yields the most uniform penalty across datasets (mean 4.37 points) and substantially increases the validation-test accuracy gap (from 1.05 to 7.72 on SCP2), suggesting it functions as a generalization regularizer. Dale's Law removal increases training variance (SCP2 standard deviation rising from 6.0 to 10.6), corroborating its role as a structural regularizer.