---
name: braindinobrain-mri-foundation
description: "BrainDINO: Brain MRI Foundation Model for generalizable clinical representation learning. Self-distilled foundation model trained on 6.6M unlabeled axial slices from diverse brain MRI datasets, generalizing across heterogeneous endpoints without task-specific fine-tuning. Trigger words: BrainDINO, brain MRI foundation model, self-supervised MRI, clinical representation learning, neuroimaging foundation, DINO brain MRI, medical imaging foundation model, brain MRI self-distillation."
---
# BrainDINO: Brain MRI Foundation Model
## Overview
BrainDINO is a self-supervised foundation model for brain MRI that learns generalizable representations from ~6.6 million unlabeled axial slices, enabling zero-shot and few-shot transfer to diverse clinical endpoints.
## Key Results
- **Training data**: ~6.6M unlabeled axial brain MRI slices
- **Method**: Self-distillation (DINO-style) with multi-crop augmentation
- **Generalization**: Works across heterogeneous brain MRI endpoints
- **Clinical relevance**: No task-specific labeled data needed for representation learning
## Architecture
### DINO Framework for Brain MRI
```
                Input MRI slice
                       │
               Multi-crop views
             (2 global + 6 local)
           ┌───────────┴───────────┐
           ▼                       ▼
 [Student Encoder (ViT)]   [Teacher Encoder (ViT)]
           │                       │   ▲ EMA update
           ▼                       ▼   │ from student
   Student features        Teacher features
           └───────────┬───────────┘
                       ▼
            Cross-entropy loss
         (student matches teacher)
```
### Key Components
1. **Vision Transformer (ViT) backbone** — patch-based self-attention
2. **Multi-crop augmentation** — global + local views of brain slices
3. **Teacher-student distillation** — teacher updated via EMA
4. **Centering & sharpening** — prevent collapsed representations
5. **Positional encoding** — adapted for 2D medical images
## Pretraining Procedure
```python
import math

import torch
import torch.nn as nn
import torchvision.transforms as transforms


class BrainDINOTrainer:
    def __init__(self, student, teacher, momentum=0.996,
                 warmup_epochs=10, total_epochs=300):
        self.student = student
        self.teacher = teacher
        self.momentum = momentum
        self.warmup_epochs = warmup_epochs
        self.total_epochs = total_epochs
        # Initialize teacher with student weights
        self._init_teacher()

    def _init_teacher(self):
        """Initialize teacher with student weights; teacher receives no gradients."""
        for param_s, param_t in zip(self.student.parameters(),
                                    self.teacher.parameters()):
            param_t.data.copy_(param_s.data)
            param_t.requires_grad = False

    @torch.no_grad()
    def update_teacher(self, epoch):
        """EMA update of teacher from student."""
        momentum = self._get_momentum(epoch)
        for param_s, param_t in zip(self.student.parameters(),
                                    self.teacher.parameters()):
            param_t.data.mul_(momentum).add_(param_s.data, alpha=1 - momentum)

    def _get_momentum(self, epoch):
        """Cosine schedule from the base momentum up to 1.0."""
        return 1 - (1 - self.momentum) * (
            (1 + math.cos(math.pi * epoch / self.total_epochs)) / 2
        )

    def compute_loss(self, student_out, teacher_out, temp_student=0.1,
                     temp_teacher=0.04):
        """DINO loss: cross-entropy between sharpened teacher and student."""
        student_log_probs = torch.log_softmax(student_out / temp_student, dim=-1)
        teacher_probs = torch.softmax(teacher_out / temp_teacher, dim=-1).detach()
        return -(teacher_probs * student_log_probs).sum(dim=-1).mean()

    def train_step(self, batch, optimizer, epoch):
        """Single training step."""
        # Generate multi-crop views (self._augment is assumed to return
        # lists of global and local crop batches)
        global_views, local_views = self._augment(batch)

        # Student sees all views; teacher sees global views only
        student_out = [self.student(v) for v in global_views + local_views]
        with torch.no_grad():
            teacher_out = [self.teacher(v) for v in global_views]

        # Every student view matches every teacher view, except the
        # identical global crop (standard DINO pairing)
        loss, n_terms = 0.0, 0
        for i, s_out in enumerate(student_out):
            for j, t_out in enumerate(teacher_out):
                if i == j:  # skip the same global view
                    continue
                loss += self.compute_loss(s_out, t_out)
                n_terms += 1
        loss = loss / n_terms

        # Backward pass updates the student only
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Teacher follows via EMA
        self.update_teacher(epoch)
        return loss.item()
```
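The trainer above omits the centering step listed under Key Components. As a minimal sketch of how it can be added (the `center_momentum` value of 0.9 is an assumption based on typical DINO settings, not stated in this document), teacher outputs are centered with an EMA of batch means before sharpening:

```python
import torch


class TeacherCentering:
    """EMA-centering of teacher outputs to prevent representation collapse."""

    def __init__(self, out_dim, center_momentum=0.9):
        self.center = torch.zeros(out_dim)
        self.center_momentum = center_momentum

    @torch.no_grad()
    def __call__(self, teacher_out, temp_teacher=0.04):
        # Subtract the running center, then sharpen with a low temperature
        probs = torch.softmax((teacher_out - self.center) / temp_teacher, dim=-1)
        # Update the center toward the current batch mean
        batch_center = teacher_out.mean(dim=0)
        self.center = (self.center_momentum * self.center
                       + (1 - self.center_momentum) * batch_center)
        return probs
```

Centering pushes logits toward uniformity while the low teacher temperature sharpens them; the two forces together are what keeps the representations from collapsing.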
### Augmentation Pipeline for Brain MRI
```python
def get_brain_mri_augmentation():
    """Multi-crop augmentation for brain MRI slices."""
    global_transform = transforms.Compose([
        transforms.RandomResizedCrop(224, scale=(0.4, 1.0)),
        transforms.RandomHorizontalFlip(),
        transforms.RandomRotation(15),
        transforms.GaussianBlur(kernel_size=3, sigma=(0.1, 2.0)),
        transforms.ColorJitter(brightness=0.2, contrast=0.2),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.5], std=[0.5]),
    ])
    local_transform = transforms.Compose([
        transforms.RandomResizedCrop(96, scale=(0.05, 0.4)),
        transforms.RandomHorizontalFlip(),
        transforms.RandomRotation(30),
        transforms.GaussianBlur(kernel_size=3, sigma=(0.1, 2.0)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.5], std=[0.5]),
    ])
    return global_transform, local_transform
```
## Downstream Tasks
### Linear Probing
```python
class LinearProbe(nn.Module):
    """Linear classifier on top of frozen BrainDINO features."""

    def __init__(self, backbone, num_classes, feature_dim=768):
        super().__init__()
        self.backbone = backbone
        for param in self.backbone.parameters():
            param.requires_grad = False
        self.head = nn.Linear(feature_dim, num_classes)

    def forward(self, x):
        with torch.no_grad():
            features = self.backbone(x)
        return self.head(features)
```
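A minimal training sketch for the probe, with a stand-in backbone (the real BrainDINO ViT would replace the `nn.Linear` here; the input width of 32 and the 3 diagnostic classes are illustrative assumptions):

```python
import torch
import torch.nn as nn

# Stand-in for a pretrained backbone mapping inputs to 768-d features
backbone = nn.Linear(32, 768)
for p in backbone.parameters():
    p.requires_grad = False

head = nn.Linear(768, 3)  # 3 hypothetical diagnostic classes
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

x = torch.randn(16, 32)          # dummy batch of inputs
y = torch.randint(0, 3, (16,))   # dummy labels

with torch.no_grad():            # frozen feature extraction
    features = backbone(x)
logits = head(features)
loss = criterion(logits, y)
loss.backward()
optimizer.step()
```

Because the backbone is frozen and its forward pass runs under `torch.no_grad()`, only the head's parameters accumulate gradients.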
### Few-Shot Fine-Tuning
```python
def few_shot_finetune(backbone, train_loader, val_loader,
                      num_classes, epochs=50, lr=1e-4):
    """Fine-tune with limited labeled data."""
    model = nn.Sequential(backbone, nn.Linear(768, num_classes))
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    for epoch in range(epochs):
        model.train()
        for batch_x, batch_y in train_loader:
            optimizer.zero_grad()
            logits = model(batch_x)
            loss = criterion(logits, batch_y)
            loss.backward()
            optimizer.step()
        # Validate
        model.eval()
        correct, total = 0, 0
        with torch.no_grad():
            for batch_x, batch_y in val_loader:
                logits = model(batch_x)
                predicted = logits.argmax(dim=1)
                correct += predicted.eq(batch_y).sum().item()
                total += batch_y.size(0)
        acc = correct / total
        print(f"Epoch {epoch}: val_acc={acc:.4f}")
    return model
```
## Clinical Applications
| Task | Input | Output | Notes |
|---|---|---|---|
| Tumor segmentation | MRI slice | Pixel mask | Use feature maps for U-Net decoder |
| Disease classification | MRI volume | Diagnosis | Pool slice features |
| Age prediction | MRI slice | Age (years) | Regression head |
| Atrophy quantification | MRI slice | Volume loss | Feature-based regression |
| Anomaly detection | MRI slice | Anomaly score | Reconstruction or distance |
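For volume-level tasks such as disease classification, the table's "pool slice features" note can be sketched as follows (mean pooling is one simple choice; the slice count and class count are illustrative assumptions):

```python
import torch
import torch.nn as nn

num_slices, feature_dim, num_classes = 24, 768, 2

# Per-slice features from a frozen encoder, stacked along the slice axis
slice_features = torch.randn(num_slices, feature_dim)

# Mean-pool across slices to get one volume-level representation
volume_feature = slice_features.mean(dim=0)

# Volume-level diagnosis head
head = nn.Linear(feature_dim, num_classes)
logits = head(volume_feature)
```

Attention-weighted pooling over slices is a common alternative when some slices are more diagnostic than others.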
## Advantages Over Task-Specific Models
1. **Data efficiency**: Pretrained on unlabeled data, fine-tune with few labeled examples
2. **Generalization**: Single model works across diverse clinical endpoints
3. **Transfer learning**: Features transfer to unseen tasks
4. **Scalability**: Training scales with unlabeled data availability
5. **Robustness**: Learns invariant features across scanners and protocols
## Best Practices
1. **Multi-crop augmentation**: Use 2 global + 6 local crops per image
2. **Momentum scheduling**: Cosine schedule from 0.996 to 1.0
3. **Temperature tuning**: Student temp ~0.1, teacher temp ~0.04
4. **Centering**: Apply centering to teacher output to prevent collapse
5. **Batch size**: Large batches (256+) improve representation quality
6. **Patch size**: 16×16 patches work well for 224×224 brain MRI slices
7. **Positional encoding**: Use learnable 2D positional embeddings
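The cosine momentum schedule in practice 2 can be checked numerically (base value 0.996 and 300 epochs as used above):

```python
import math


def teacher_momentum(epoch, total_epochs=300, base=0.996):
    """Cosine schedule: base momentum at epoch 0, rising to 1.0 at the end."""
    return 1 - (1 - base) * (1 + math.cos(math.pi * epoch / total_epochs)) / 2


start = teacher_momentum(0)    # 0.996 at the start of training
mid = teacher_momentum(150)    # 0.998 halfway through
end = teacher_momentum(300)    # 1.0 at the end (teacher effectively frozen)
```

As the momentum approaches 1.0, the teacher averages over an ever-longer history of student weights, which stabilizes the distillation targets late in training.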
## Reference
arXiv: 2604.27277 (2026-04-30)
Authors: Wu, Wang, Li, et al.
URL: https://arxiv.org/abs/2604.27277