Free SKILL.md scraped from GitHub. Clone the repo or copy the file directly into your Claude Code skills directory.
```bash
npx versuz@latest install hiyenwong-ai-collection-collection-skills-eeg-foundation-models-review
```

```bash
git clone https://github.com/hiyenwong/ai_collection.git
cp ai_collection/SKILL.MD ~/.claude/skills/hiyenwong-ai-collection-collection-skills-eeg-foundation-models-review/SKILL.md
```

---
name: eeg-foundation-models-review
description: Comprehensive review of EEG foundation models covering self-supervised learning, transfer learning, and downstream applications. Synthesizes current state-of-the-art approaches, benchmarks, and best practices. Based on arXiv:2604.16655 (April 2026).
tags: [EEG, foundation models, self-supervised learning, transfer learning, brain-computer interface, neural decoding]
---

# EEG Foundation Models Review

## Overview

Comprehensive survey of foundation models for EEG signal processing, covering pre-training strategies, architectural designs, transfer learning approaches, and downstream task performance.

**Paper**: arXiv:2604.16655 (April 2026)

## Key Categories

### 1. Self-Supervised Pre-training Objectives

1. **Masked Signal Modeling**
   - Mask random temporal segments
   - Predict masked content from context
   - Similar to BERT's masked language modeling
   - Works well for temporal structure learning

2. **Contrastive Learning**
   - Positive pairs: augmentations of the same EEG segment
   - Negative pairs: different subjects/sessions
   - Subject-invariant representation learning
   - Handles inter-subject variability

3. **Predictive Coding**
   - Predict future EEG samples from past context
   - Temporal dynamics modeling
   - Autoregressive pre-training
   - Captures neural oscillation patterns

4. **Cross-Modal Pre-training**
   - EEG + text, EEG + image, EEG + behavior
   - Multi-modal alignment objectives
   - Leverages abundant paired data
   - Enables zero-shot transfer

### 2. Architecture Families

#### Transformer-Based
- Multi-head self-attention for temporal modeling
- Positional encoding for temporal structure
- Scalable to large datasets
- Most popular approach (BrainBERT, NeuroBERT variants)

#### CNN-Based
- Local temporal feature extraction
- Efficient for short-range dependencies
- Lower computational requirements
- Good for real-time applications

#### Hybrid Architectures
- CNN for local features + Transformer for global context
- Best of both worlds
- More complex training pipeline

#### State-Space Models
- Selective SSMs (Mamba-style)
- Linear scaling with sequence length
- Emerging approach for long EEG recordings

### 3. Transfer Learning Strategies

#### Fine-tuning Approaches

1. **Full fine-tuning**
   - All layers updated
   - Best performance, highest compute
   - Risk of catastrophic forgetting

2. **Linear probing**
   - Only the classification head is trained
   - Fast evaluation of representation quality
   - Lower performance ceiling

3. **Parameter-efficient fine-tuning**
   - LoRA adapters
   - Prompt tuning
   - Prefix tuning
   - Minimal parameter updates

#### Domain Adaptation

1. **Subject-to-subject transfer**
   - Pre-train on many subjects
   - Adapt to a new subject with minimal data
   - Few-shot / zero-shot scenarios

2. **Task-to-task transfer**
   - Pre-train on one task (e.g., motor imagery)
   - Adapt to another (e.g., emotion recognition)
   - Cross-task generalization

3. **Cross-dataset transfer**
   - Pre-train on large public datasets
   - Fine-tune on domain-specific data
   - Handles distribution shift
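To make the masked-signal-modeling objective from Section 1 concrete, the following is a minimal PyTorch sketch: random temporal patches are masked and the model is trained to reconstruct them from the unmasked context. The class name, patch size, channel count, and encoder depth are illustrative assumptions, not settings taken from any model in the survey.

```python
import torch
import torch.nn as nn


class MaskedEEGPretrainer(nn.Module):
    """Minimal masked-signal-modeling sketch (hypothetical sizes): mask random
    temporal patches and reconstruct them from the unmasked context."""

    def __init__(self, n_channels: int = 32, patch_len: int = 40, d_model: int = 128):
        super().__init__()
        self.patch_len = patch_len
        self.embed = nn.Linear(n_channels * patch_len, d_model)   # patch -> token
        self.mask_token = nn.Parameter(torch.zeros(d_model))      # learnable [MASK]
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, n_channels * patch_len)    # token -> patch

    def forward(self, eeg: torch.Tensor, mask_ratio: float = 0.5) -> torch.Tensor:
        # eeg: (batch, channels, time); trailing samples that do not fill a patch are dropped
        b, c, _ = eeg.shape
        patches = eeg.unfold(2, self.patch_len, self.patch_len)   # (b, c, n_patches, patch_len)
        patches = patches.permute(0, 2, 1, 3).reshape(b, -1, c * self.patch_len)

        tokens = self.embed(patches)
        mask = torch.rand(b, tokens.shape[1], device=eeg.device) < mask_ratio
        tokens = torch.where(mask.unsqueeze(-1), self.mask_token, tokens)

        recon = self.head(self.encoder(tokens))
        # BERT-style objective: reconstruction loss only on the masked patches
        return nn.functional.mse_loss(recon[mask], patches[mask])


# Usage: 4-second windows at 250 Hz, 32 channels (illustrative numbers only)
model = MaskedEEGPretrainer()
loss = model(torch.randn(8, 32, 1000))
loss.backward()
```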
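As a minimal illustration of the fine-tuning strategies in Section 3, the sketch below shows linear probing: the pre-trained encoder is frozen and only a new classification head is trained. `build_linear_probe` and the stand-in encoder are hypothetical placeholders; a real EEG foundation model checkpoint would take the encoder's place.

```python
import torch
import torch.nn as nn


def build_linear_probe(encoder: nn.Module, feat_dim: int, n_classes: int):
    """Linear-probing sketch: freeze the pre-trained encoder and train only a
    linear classification head on top of its features."""
    for p in encoder.parameters():
        p.requires_grad = False                 # frozen backbone
    head = nn.Linear(feat_dim, n_classes)       # the only trainable parameters
    model = nn.Sequential(encoder, head)
    optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
    return model, optimizer


# Usage with a stand-in encoder mapping (batch, 32, 1000) EEG windows to 128-d features
encoder = nn.Sequential(nn.Flatten(), nn.Linear(32 * 1000, 128))
model, opt = build_linear_probe(encoder, feat_dim=128, n_classes=4)

x, y = torch.randn(8, 32, 1000), torch.randint(0, 4, (8,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
opt.step()
```

In a real run the frozen encoder would also be put in `eval()` mode so that any normalization statistics stay fixed; when the linear probe's performance ceiling is too low, the same pattern extends to parameter-efficient methods such as LoRA adapters.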
### 4. Benchmark Datasets

#### Large-Scale Pre-training
- TUH EEG Corpus (largest clinical EEG dataset)
- Sleep-EDF / Sleep-EDFx
- CHB-MIT (seizure detection)
- BCI Competition datasets

#### Downstream Evaluation
- Motor imagery (BCI IV 2a/2b)
- Emotion recognition (DEAP, SEED)
- Sleep staging
- Seizure detection
- Cognitive state decoding

## Performance Insights

### Representation Quality
1. Self-supervised pre-training significantly outperforms random initialization
2. Contrastive learning excels at subject-invariant features
3. Masked modeling captures temporal structure better
4. Multi-modal pre-training enables cross-domain transfer

### Scalability
1. Performance scales with pre-training data size
2. Larger models show better transfer to low-data regimes
3. Diminishing returns beyond certain model sizes
4. Computational cost remains a bottleneck

### Practical Considerations
1. Channel configuration matters
   - Models trained on 64 channels don't transfer well to 32
   - Need channel-agnostic pre-training strategies
2. Sampling rate standardization needed
3. Artifact handling critical for clinical deployment
4. Interpretability tools needed for medical applications

## Best Practices

### Pre-training
1. Use diverse datasets covering multiple paradigms
2. Balance subject diversity and data volume
3. Include both clinical and research-grade EEG
4. Standardize the preprocessing pipeline

### Fine-tuning
1. Start with linear probing to assess representations
2. Use parameter-efficient methods for small target datasets
3. Apply domain-specific augmentations during fine-tuning
4. Monitor for distribution shift

### Evaluation
1. Report both within-subject and cross-subject performance
2. Use standardized metrics per task type
3. Include ablation studies on pre-training objectives
4. Compare against strong baselines (not just random initialization)

## Pitfalls

- Ignoring inter-subject variability during pre-training
- Over-reliance on a single pre-training objective
- Not evaluating cross-dataset generalization
- Overlooking channel mismatch between pre-training and fine-tuning
- Insufficient data augmentation for contrastive learning
- Neglecting computational cost for deployment

## Future Directions

1. Foundation models for emerging EEG modalities (dry electrodes, wearables)
2. Real-time inference optimization
3. Multi-modal fusion (EEG + fNIRS, EEG + eye-tracking)
4. Federated pre-training for privacy-preserving EEG models
5. Open-source benchmarks and standardized evaluation

## Related Skills

- eeg-foundation-model-adapters
- neural-encoding-evaluation-meeg
- meta-learning-in-context-brain-decoding

## References

- arXiv:2604.16655 (April 2026)