---
name: brain-data-value-scaling-laws
description: "Mathematical framework for quantifying the value of brain data for machine learning. Derives scaling laws for brain-regularized estimators, brain-to-task data exchange rates, and budget-optimal allocation of neural vs task samples. From Lewis et al. 2026 (arXiv:2605.09243). Use when: evaluating neural data for ML training, brain distillation, NeuroAI data collection planning, brain-regularized learning, neural alignment analysis."
---
# Brain Data Value Scaling Laws
Mathematical framework from **"How Much is Brain Data Worth for Machine Learning?"** (Lewis, Wang, Schwab, Pitkow -- CMU, NSF AI Institute, arXiv:2605.09243, May 2026) for quantifying the value of neural recordings as a training resource for machine learning models.
## Core Question
If a biological system (human/animal) can solve a task, does measuring its neural activity make it easier to train an ML model to solve that task too? If so, how much easier?
## Generative Model
The framework models four objects:
1. **Environmental inputs** x ~ N(0, I_{dx})
2. **Latent neural features** -- intermediate brain representations that are lower-dimensional than inputs but partially aligned with the task
3. **Neural recordings** -- noisy, partial observations of latent features
4. **Task targets** -- y = x^T beta* + epsilon_y
Key parameter: **misalignment m** between brain and task features, arising from:
- Only a subset of brain latents being measured
- Brain features not fully capturing task-relevant directions
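The four objects above can be instantiated as a small synthetic-data sketch. Dimensions, noise levels, and the misalignment weight below are illustrative choices, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
d_x, k, d_r, n = 50, 5, 20, 1000            # illustrative dimensions

X = rng.standard_normal((n, d_x))           # 1. environmental inputs x ~ N(0, I_dx)
A = rng.standard_normal((d_x, k)) / np.sqrt(d_x)
Z = X @ A                                   # 2. latent neural features (low-dim)
H = rng.standard_normal((k, d_r))           # partial readout of the latents
R = Z @ H + 0.5 * rng.standard_normal((n, d_r))  # 3. noisy neural recordings

beta_star = A @ rng.standard_normal(k)      # task direction partly in the brain subspace
beta_star += 0.3 * rng.standard_normal(d_x) # misalignment component (weight is arbitrary)
y = X @ beta_star + 0.1 * rng.standard_normal(n)  # 4. task targets
```

Scaling the misalignment component up or down moves the parameter m, which drives everything downstream.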
## BEFS Estimator (Brain Encoding Foundation Student)
Two-stage estimator that uses neural recordings:
### Stage 1: Brain Encoding
- Learns encoding model from nB brain samples
- A_hat, H_hat = LRR(X, R) -- low-rank regression
- Learns latent feature map from inputs to low-dimensional neural representations
### Stage 2: Task Learning
- Uses learned brain features to regularize task learning on nT task samples
- Ridge regression with learned brain feature prior:
beta_hat_BEFS = argmin_beta (1/nT)||y - X beta||^2 + lambda ||A_hat^T beta||^2
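A minimal sketch of the two stages, assuming reduced-rank regression via SVD of the least-squares fit for Stage 1 and the closed-form solution of the penalized objective for Stage 2 (the paper's exact LRR procedure may differ):

```python
import numpy as np

def fit_befs(X_brain, R, X_task, y, k, lam):
    """Two-stage BEFS sketch; `k` is the assumed latent dimension."""
    # Stage 1: reduced-rank regression R ~ X_brain @ A_hat @ H_hat
    B_ols, *_ = np.linalg.lstsq(X_brain, R, rcond=None)
    _, _, Vt = np.linalg.svd(X_brain @ B_ols, full_matrices=False)
    V_k = Vt[:k].T                      # top-k right singular vectors
    A_hat = B_ols @ V_k                 # (d_x, k) latent feature map
    H_hat = V_k.T                       # (k, d_r) readout

    # Stage 2: ridge with brain-feature penalty lambda * ||A_hat^T beta||^2
    n_T = X_task.shape[0]
    G = X_task.T @ X_task / n_T + lam * (A_hat @ A_hat.T)
    beta_hat = np.linalg.solve(G, X_task.T @ y / n_T)
    return A_hat, H_hat, beta_hat
```

The Stage 2 normal equations follow directly from differentiating the objective above; the brain data enters only through `A_hat`.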
## Scaling Laws
### Asymptotic Test Error (Theorem 1)
E[||y_test - x_test^T beta_hat_BEFS||^2] = sigma_y^2 * (dx/nT) + alpha * Tr(J_A_hat)/nT + O(1/nT^2)
where alpha = nT/(nT + lambda) and J_A_hat captures brain-data-dependent terms.
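The leading-order terms of Theorem 1 are easy to evaluate directly. Treating Tr(J_A_hat) as a given scalar (it depends on the learned brain features):

```python
def asymptotic_test_error(sigma_y2, d_x, n_T, lam, trace_J):
    """Leading-order test error from Theorem 1 (O(1/nT^2) term dropped)."""
    alpha = n_T / (n_T + lam)
    return sigma_y2 * d_x / n_T + alpha * trace_J / n_T
```

Both terms shrink as 1/nT, which is why brain data matters most when nT is small: the brain-dependent term Tr(J_A_hat) is where nB enters.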
### Brain-Task Data Exchange Rate
Quantifies how many task samples nT a given number of brain samples nB can substitute for:
Value(nB) = nT_baseline - nT_with_brain(nB)
The exchange rate depends on:
- **Brain-task alignment (m)**: higher alignment -> more value
- **SNR ratio (SNR_T/SNR_B)**: better neural SNR -> more value
- **Latent dimension fraction (d_latent/dx)**: smaller latent fraction -> more value
- **Number of brain samples (nB)**: value saturates with more brain data
- **Task sample regime**: brain data is most valuable in small/moderate nT regimes
### Key Findings
1. **Brain data substitutes for task data** -- yields equal performance while saving a fraction of the task samples
2. **Value decreases with nT** -- savings are highest when task data is scarce
3. **Distribution shift matters** -- brain-regularized learning provides robustness when test distribution has mass on brain-insensitive directions
4. **Adversarial inputs can negate value** -- under certain distribution shifts, brain data can even hurt performance
5. **Budget optimization** -- under fixed collection budget, optimal allocation of nB vs nT depends on cost ratio cB/cT
## Budget Allocation Formula
Given total budget B = cB * nB + cT * nT:
- Optimal (nB*, nT*) maximizes performance under the budget constraint
- Brain data is worth collecting when:
  - Task solving is harder than brain estimation
  - A small number of highly task-aligned latents is well-exposed
  - Neural SNR is sufficient relative to task noise
## When Brain Data is Most Valuable
| Condition | Effect on Value |
|-----------|-----------------|
| Low task sample regime (nT small) | High value |
| High brain-task alignment (low m) | High value |
| Good neural SNR | More value |
| Small latent dimension fraction | More value |
| Large task sample regime | Diminishing returns |
| Adversarial test distribution | Can be negative |
## Application Scenarios
1. **NeuroAI data collection planning** -- Should you collect brain data or more task labels?
2. **Brain-regularized model training** -- Using neural recordings as regularization
3. **Brain distillation** -- Extracting representations from biological systems
4. **fMRI/EEG/Neural recording studies** -- Quantifying expected ML benefits
5. **Neural alignment analysis** -- Measuring how well brain features align with tasks
## Implementation Sketch
```python
import numpy as np

def brain_data_exchange_rate(n_B, m, snr_ratio, d_latent, d_input, n_T_base):
    """Estimate how many task samples brain data can substitute for.

    Heuristic sketch of the exchange rate; the paper's exact expression
    involves constants not reproduced here.
    """
    delta = m * (1 / snr_ratio) * (d_latent / d_input)
    savings = delta * np.sqrt(n_B) / (1 + np.sqrt(n_B / n_T_base))
    return n_T_base * savings / (1 + savings)

def estimate_test_error(n_B, n_T, m, snr_ratio, d_latent, d_input, sigma_y2=1.0):
    """Proxy for test error: brain data acts as extra effective task samples.

    Illustrative stand-in for the paper's full error expression.
    """
    effective_n_T = n_T + brain_data_exchange_rate(
        n_B, m, snr_ratio, d_latent, d_input, n_T)
    return sigma_y2 * d_input / effective_n_T

def optimal_budget_allocation(budget, c_B, c_T, m, snr_ratio, d_latent, d_input):
    """Grid-search the split between brain and task samples under the budget."""
    best_perf = float('inf')
    best_nB, best_nT = 0, 0
    for n_B in range(1, int(budget / c_B) + 1):
        n_T = int((budget - c_B * n_B) / c_T)
        if n_T < 1:
            continue
        perf = estimate_test_error(n_B, n_T, m, snr_ratio, d_latent, d_input)
        if perf < best_perf:
            best_perf = perf
            best_nB, best_nT = n_B, n_T
    return best_nB, best_nT
```
## Related Work
- **Brain distillation**: Using neural recordings to guide ML training
- **Scaling laws**: Systematic relationships between resources and performance (Kaplan et al. 2020)
- **Brain-aligned ML**: Selecting models/data based on brain predictiveness
- **Neural encoding models**: Predicting neural responses from stimuli
## References
- Lewis, L., Wang, Z., Schwab, D., Pitkow, X. (2026). "How Much is Brain Data Worth for Machine Learning?" arXiv:2605.09243
- Kaplan, J. et al. (2020). "Scaling Laws for Neural Language Models." arXiv:2001.08361
## Activation Keywords
- brain data value, neural data worth, brain distillation, brain-regularized learning
- neuroAI data collection, neural recording ML, brain-task alignment
- brain data scaling laws, BEFS estimator, neural sample efficiency
- how much is brain data worth, brain data exchange rate