Foundation Model for Intracranial Recordings: Training & Representation Analysis

Authors

  • Sebastjan Kramar, University of Ljubljana

Abstract

Intracranial recordings—principally electrocorticography (ECoG) and stereoelectroencephalography (SEEG)—deliver millisecond-resolved, high-SNR measurements of population-level neural activity directly from the cortical surface. Despite this richness, deep-learning (DL) methods remain under-used because openly licensed datasets are scarce, electrode layouts vary across patients, and brain signals drift over time. These factors hinder cross-study aggregation and make it difficult to train high-capacity networks without overfitting. Still, DL models have been shown to outperform pipelines based on hand-crafted spectral features and shallow classifiers when applied to sufficiently rich biomedical signals, revealing non-linear relationships that conventional approaches often miss [1].

This thesis aims to address this gap by pre-training a foundation model: a task-agnostic network exposed to diverse, unlabeled ECoG and SEEG data so that it can later be fine-tuned for downstream applications. The dataset is a closed-source collection from UMC Utrecht, comprising recordings from over 152 participants implanted with research or clinical grids. These include patients with ALS or other forms of paralysis, as well as patients implanted with research grids during surgery. Although the full dataset offers the potential for 24/7 monitoring, this project uses only task-related segments ranging from a few minutes to several hours. Both classic and high-density grids are available; for now, we focus on classic grids only.

After preprocessing (notch filtering, re-referencing, and windowing), signals will be fed into a network in an unsupervised manner. The main architecture is based on U-Net, which preserves local detail while integrating broader context through its encoder–decoder structure with skip connections. The model will be trained using a contrastive self-supervised objective to encourage the discovery of stable structure across time windows.
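As a minimal sketch of the preprocessing steps above, the pipeline below applies a notch filter, common-average re-referencing, and fixed-length windowing. All concrete parameter values (50 Hz mains frequency, 512 Hz sampling rate, 1 s non-overlapping windows) are illustrative assumptions, not the thesis's final settings:

```python
import numpy as np
from scipy.signal import iirnotch, filtfilt

def preprocess(data, fs=512.0, notch_hz=50.0, win_s=1.0):
    """Illustrative preprocessing sketch (assumed parameters).

    data: array of shape (n_channels, n_samples)
    returns: array of shape (n_windows, n_channels, win_len)
    """
    # 1. Notch-filter out mains interference at notch_hz (zero-phase).
    b, a = iirnotch(notch_hz, Q=30.0, fs=fs)
    data = filtfilt(b, a, data, axis=-1)

    # 2. Re-reference each sample to the common average across channels.
    data = data - data.mean(axis=0, keepdims=True)

    # 3. Cut the recording into fixed-length, non-overlapping windows.
    win_len = int(win_s * fs)
    n_win = data.shape[-1] // win_len
    trimmed = data[:, : n_win * win_len]
    return trimmed.reshape(data.shape[0], n_win, win_len).transpose(1, 0, 2)

# Synthetic example: 8 channels, 10 s at 512 Hz.
x = np.random.randn(8, 5120)
windows = preprocess(x)
print(windows.shape)  # (10, 8, 512)
```

The window axis comes first so that each window can be treated as an independent training example for the self-supervised objective.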

Generalisation will be tested by fine-tuning on held-out data that was not seen during pre-training. Two benchmark tasks, attempted-movement decoding and speech-sound reconstruction, are candidate validation tasks. The network will handle variable-length inputs and apply spatial masking to accommodate differing electrode configurations. Performance will be compared with patient-specific baselines trained on standard features. Finally, we will explore the model's embeddings using PCA, clustering, and saliency analysis, not to interpret specific patterns but to examine whether any meaningful structure emerges.
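One simple way to realise the spatial masking mentioned above is random channel dropout, where a random subset of electrodes is zeroed per training window so the network cannot rely on any fixed layout. The sketch below is an illustrative assumption; the `keep_prob` hyperparameter and the zeroing strategy are not the thesis's final design:

```python
import numpy as np

def spatial_mask(windows, keep_prob=0.8, rng=None):
    """Randomly zero out channels, independently per window.

    windows: array of shape (n_windows, n_channels, win_len)
    returns: (masked windows, boolean keep-mask of shape (n_windows, n_channels))
    """
    rng = np.random.default_rng() if rng is None else rng
    # One independent keep/drop decision per (window, channel) pair.
    mask = rng.random(windows.shape[:2]) < keep_prob
    # Broadcasting the mask over the time axis zeroes dropped channels.
    return windows * mask[..., None], mask

# Example: 4 windows of 8 channels, all-ones data for visibility.
x = np.ones((4, 8, 512))
masked, mask = spatial_mask(x, keep_prob=0.5, rng=np.random.default_rng(0))
# Dropped channels are exactly zero; kept channels are untouched.
```

The same mechanism doubles as an augmentation for the contrastive objective: two differently masked views of one window form a positive pair.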

This work has limitations. Despite its size, the dataset may still lack the heterogeneity needed for broad generalisation. Furthermore, while the model may uncover consistent patterns, we do not expect to link these directly to cognitive or physiological processes. The goal is to provide a reusable foundation—not full interpretability—on which future research can build.

References

[1] A. H. Peterson et al., “Generalized neural decoders for transfer learning across patients and recording modalities,” Nature Biomedical Engineering, vol. 5, no. 11, pp. 1229–1238, 2021.

Published

2025-06-10