---
name: brain-data-value-scaling-laws
description: "Mathematical framework for quantifying the value of brain data for machine learning. Derives scaling laws for brain-regularized estimators, brain-to-task data exchange rates, and budget-optimal allocation of neural vs task samples. From Lewis et al. 2026 (arXiv:2605.09243). Use when: evaluating neural data for ML training, brain distillation, NeuroAI data collection planning, brain-regularized learning, neural alignment analysis."
---
# Brain Data Value Scaling Laws
Mathematical framework from **"How Much is Brain Data Worth for Machine Learning?"** (Lewis, Wang, Schwab, Pitkow -- CMU, NSF AI Institute, arXiv:2605.09243, May 2026) for quantifying the value of neural recordings as a training resource for machine learning models.
## Core Question
If a biological system (human/animal) can solve a task, does measuring its neural activity make it easier to train an ML model to solve that task too? If so, how much easier?
## Generative Model
The framework models four objects:
1. **Environmental inputs** x ~ N(0, I_{dx})
2. **Latent neural features** -- intermediate brain representations that are lower-dimensional than inputs but partially aligned with the task
3. **Neural recordings** -- noisy, partial observations of latent features
4. **Task targets** -- y = x^T beta* + epsilon_y
Key parameter: **misalignment m** between brain and task features, arising from:
- Only a subset of brain latents being measured
- Brain features not fully capturing task-relevant directions
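The four objects above can be instantiated as a small synthetic-data sketch. Dimensions, noise levels, and the misalignment weight below are illustrative choices, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
d_x, k, d_r, n = 50, 5, 20, 1000            # illustrative dimensions

X = rng.standard_normal((n, d_x))           # 1. environmental inputs x ~ N(0, I_dx)
A = rng.standard_normal((d_x, k)) / np.sqrt(d_x)
Z = X @ A                                   # 2. latent neural features (low-dim)
H = rng.standard_normal((k, d_r))           # partial readout of the latents
R = Z @ H + 0.5 * rng.standard_normal((n, d_r))  # 3. noisy neural recordings

beta_star = A @ rng.standard_normal(k)      # task direction partly in the brain subspace
beta_star += 0.3 * rng.standard_normal(d_x) # misalignment component (weight is arbitrary)
y = X @ beta_star + 0.1 * rng.standard_normal(n)  # 4. task targets
```

Scaling the misalignment component up or down moves the parameter m, which drives everything downstream.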
## BEFS Estimator (Brain Encoding Foundation Student)
Two-stage estimator that uses neural recordings:
### Stage 1: Brain Encoding
- Learns encoding model from nB brain samples
- A_hat, H_hat = LRR(X, R) -- low-rank regression
- Learns latent feature map from inputs to low-dimensional neural representations
### Stage 2: Task Learning
- Uses learned brain features to regularize task learning on nT task samples
- Ridge regression with learned brain feature prior:
beta_hat_BEFS = argmin_beta (1/nT)||y - X beta||^2 + lambda ||A_hat^T beta||^2
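A minimal sketch of the two stages, assuming reduced-rank regression via SVD of the least-squares fit for Stage 1 and the closed-form solution of the penalized objective for Stage 2 (the paper's exact LRR procedure may differ):

```python
import numpy as np

def fit_befs(X_brain, R, X_task, y, k, lam):
    """Two-stage BEFS sketch; `k` is the assumed latent dimension."""
    # Stage 1: reduced-rank regression R ~ X_brain @ A_hat @ H_hat
    B_ols, *_ = np.linalg.lstsq(X_brain, R, rcond=None)
    _, _, Vt = np.linalg.svd(X_brain @ B_ols, full_matrices=False)
    V_k = Vt[:k].T                      # top-k right singular vectors
    A_hat = B_ols @ V_k                 # (d_x, k) latent feature map
    H_hat = V_k.T                       # (k, d_r) readout

    # Stage 2: ridge with brain-feature penalty lambda * ||A_hat^T beta||^2
    n_T = X_task.shape[0]
    G = X_task.T @ X_task / n_T + lam * (A_hat @ A_hat.T)
    beta_hat = np.linalg.solve(G, X_task.T @ y / n_T)
    return A_hat, H_hat, beta_hat
```

The Stage 2 normal equations follow directly from differentiating the objective above; the brain data enters only through `A_hat`.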
## Scaling Laws
### Asymptotic Test Error (Theorem 1)
E[||y_test - x_test^T beta_hat_BEFS||^2] = sigma_y^2 * (dx/nT) + alpha * Tr(J_A_hat)/nT + O(1/nT^2)
where alpha = nT/(nT + lambda) and J_A_hat captures brain-data-dependent terms.
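The leading-order terms of Theorem 1 are easy to evaluate directly. Treating Tr(J_A_hat) as a given scalar (it depends on the learned brain features):

```python
def asymptotic_test_error(sigma_y2, d_x, n_T, lam, trace_J):
    """Leading-order test error from Theorem 1 (O(1/nT^2) term dropped)."""
    alpha = n_T / (n_T + lam)
    return sigma_y2 * d_x / n_T + alpha * trace_J / n_T
```

Both terms shrink as 1/nT, which is why brain data matters most when nT is small: the brain-dependent term Tr(J_A_hat) is where nB enters.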
### Brain-Task Data Exchange Rate
Quantifies how many task samples nT a given number of brain samples nB can substitute for:
Value(nB) = nT_baseline - nT_with_brain(nB)
The exchange rate depends on:
- **Brain-task alignment (m)**: higher alignment -> more value
- **SNR ratio (SNR_T/SNR_B)**: better neural SNR -> more value
- **Latent dimension fraction (d_latent/dx)**: smaller latent fraction -> more value
- **Number of brain samples (nB)**: value saturates with more brain data
- **Task sample regime**: brain data is most valuable in small/moderate nT regimes
### Key Findings
1. **Brain data substitutes for task data** -- yields equal performance while saving a fraction of the task samples
2. **Value decreases with nT** -- savings are highest when task data is scarce
3. **Distribution shift matters** -- brain-regularized learning provides robustness when test distribution has mass on brain-insensitive directions
4. **Adversarial inputs can negate value** -- under certain distribution shifts, brain data can even hurt performance
5. **Budget optimization** -- under fixed collection budget, optimal allocation of nB vs nT depends on cost ratio cB/cT
## Budget Allocation Formula
Given total budget B = cB * nB + cT * nT:
- Optimal (nB*, nT*) maximizes performance under the budget constraint
- Brain data is worth collecting when:
  - Task solving is harder than brain estimation
  - A small number of highly task-aligned latents is well-exposed
  - Neural SNR is sufficient relative to task noise
## When Brain Data is Most Valuable
| Condition | Effect on Value |
|-----------|-----------------|
| Low task sample regime (nT small) | High value |
| High brain-task alignment (low m) | High value |
| Good neural SNR | More value |
| Small latent dimension fraction | More value |
| Large task sample regime | Diminishing returns |
| Adversarial test distribution | Can be negative |
## Application Scenarios
1. **NeuroAI data collection planning** -- Should you collect brain data or more task labels?
2. **Brain-regularized model training** -- Using neural recordings as regularization
3. **Brain distillation** -- Extracting representations from biological systems
4. **fMRI/EEG/Neural recording studies** -- Quantifying expected ML benefits
5. **Neural alignment analysis** -- Measuring how well brain features align with tasks
## Implementation Sketch
```python
import numpy as np

def brain_data_exchange_rate(n_B, m, snr_ratio, d_latent, d_input, n_T_base):
    """Estimate how many task samples brain data can substitute for.

    Heuristic sketch of the exchange rate; the paper's exact expression
    involves constants not reproduced here.
    """
    delta = m * (1 / snr_ratio) * (d_latent / d_input)
    savings = delta * np.sqrt(n_B) / (1 + np.sqrt(n_B / n_T_base))
    return n_T_base * savings / (1 + savings)

def estimate_test_error(n_B, n_T, m, snr_ratio, d_latent, d_input, sigma_y2=1.0):
    """Proxy for test error: brain data acts as extra effective task samples.

    Illustrative stand-in for the paper's full error expression.
    """
    effective_n_T = n_T + brain_data_exchange_rate(
        n_B, m, snr_ratio, d_latent, d_input, n_T)
    return sigma_y2 * d_input / effective_n_T

def optimal_budget_allocation(budget, c_B, c_T, m, snr_ratio, d_latent, d_input):
    """Grid-search the split between brain and task samples under the budget."""
    best_perf = float('inf')
    best_nB, best_nT = 0, 0
    for n_B in range(1, int(budget / c_B) + 1):
        n_T = int((budget - c_B * n_B) / c_T)
        if n_T < 1:
            continue
        perf = estimate_test_error(n_B, n_T, m, snr_ratio, d_latent, d_input)
        if perf < best_perf:
            best_perf = perf
            best_nB, best_nT = n_B, n_T
    return best_nB, best_nT
```
## Related Work
- **Brain distillation**: Using neural recordings to guide ML training
- **Scaling laws**: Systematic relationships between resources and performance (Kaplan et al. 2020)
- **Brain-aligned ML**: Selecting models/data based on brain predictiveness
- **Neural encoding models**: Predicting neural responses from stimuli
## References
- Lewis, L., Wang, Z., Schwab, D., Pitkow, X. (2026). "How Much is Brain Data Worth for Machine Learning?" arXiv:2605.09243
- Kaplan, J. et al. (2020). "Scaling Laws for Neural Language Models." arXiv:2001.08361
## Activation Keywords
- brain data value, neural data worth, brain distillation, brain-regularized learning
- neuroAI data collection, neural recording ML, brain-task alignment
- brain data scaling laws, BEFS estimator, neural sample efficiency
- how much is brain data worth, brain data exchange rate