---
name: brain-to-text-unified-decoding
description: "Unified brain-to-text decoding framework for both speech production and perception in Mandarin Chinese. Uses shared neural representations across modalities with dual-decoder architecture. Activation: brain-to-text decoding, speech BCI, neural speech decoding, unified speech decoding."
---
# Unified Brain-to-Text Decoding Across Speech Production and Perception
> A unified brain-to-sentence decoding framework for both speech production and perception in Mandarin Chinese, with strong cross-modal generalization.
## Metadata
- **Source**: arXiv:2603.12628v1
- **Authors**: Zhizhang Yuan, Yang Yang, Gaorui Zhang, et al.
- **Published**: 2026-03-13
- **Category**: Brain-Computer Interface, Speech Decoding, Neural Engineering
## Core Methodology
### Problem Addressed
Traditional brain-to-text decoding approaches:
- Focus on a single modality (production OR perception)
- Are largely limited to alphabetic languages
- Cannot leverage the neural representations shared between producing and perceiving speech
### Key Innovation
This framework provides **unified decoding** across:
1. **Speech Production** (motor cortex → speech)
2. **Speech Perception** (auditory cortex → speech)
Using a shared latent space with modality-specific decoders.
### Architecture
```
       Neural Activity (ECoG/iEEG)
                  ↓
         Shared Feature Encoder
                  ↓
     ┌────────────┴────────────┐
     ↓                         ↓
Production Decoder    Perception Decoder
     ↓                         ↓
     └────────────┬────────────┘
                  ↓
       Mandarin Sentence Output
```
## Implementation Guide
### Prerequisites
- Python 3.8+
- PyTorch
- MNE-Python for ECoG/iEEG preprocessing
- Chinese NLP tools (jieba for tokenization)
### Step-by-Step Implementation
#### Step 1: Data Preprocessing
```python
import mne
import numpy as np
from scipy.signal import resample
# Load ECoG/iEEG data
raw = mne.io.read_raw_edf('neural_data.edf', preload=True)
# High-gamma band extraction (70-150 Hz for speech)
raw.filter(l_freq=70, h_freq=150)
raw.apply_hilbert(envelope=True)
# Epoch around speech events
events = mne.find_events(raw, stim_channel='STI')
epochs = mne.Epochs(raw, events, tmin=-0.5, tmax=2.0, baseline=None)
# Get data: (n_trials, n_channels, n_timepoints)
neural_data = epochs.get_data()
# Resample every trial to a fixed number of timepoints
neural_data = resample(neural_data, num=200, axis=2)  # (n_trials, n_channels, 200)
```
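Per-channel normalization is not shown in the excerpt above, but it is standard practice for high-gamma features before feeding them to a neural network. A minimal sketch, assuming `neural_data` is stacked as `(n_trials, n_channels, n_timepoints)`; the paper may use a different normalization scheme:
```python
import numpy as np

def zscore_per_channel(neural_data, eps=1e-8):
    """Z-score each channel across trials and time (a common, assumed
    preprocessing step; not confirmed from the paper)."""
    mean = neural_data.mean(axis=(0, 2), keepdims=True)  # per-channel mean
    std = neural_data.std(axis=(0, 2), keepdims=True)    # per-channel std
    return (neural_data - mean) / (std + eps)

neural_data = zscore_per_channel(neural_data)
```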
#### Step 2: Shared Feature Encoder
```python
import torch
import torch.nn as nn
class SharedFeatureEncoder(nn.Module):
    """Shared encoder for both production and perception"""

    def __init__(self, n_channels, n_timepoints, hidden_dim=512):
        super().__init__()
        self.temporal_conv = nn.Sequential(
            nn.Conv1d(n_channels, 128, kernel_size=11, padding=5),
            nn.ReLU(),
            nn.BatchNorm1d(128),
            nn.Conv1d(128, 256, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.BatchNorm1d(256),
            nn.Conv1d(256, hidden_dim, kernel_size=5, padding=2),
            nn.ReLU()
        )
        # Temporal pooling via self-attention
        self.temporal_attention = nn.MultiheadAttention(
            embed_dim=hidden_dim, num_heads=8, batch_first=True
        )

    def forward(self, x):
        # x: (batch, n_channels, n_timepoints)
        features = self.temporal_conv(x)
        features = features.transpose(1, 2)  # (batch, time, hidden)
        # Self-attention for temporal aggregation
        attn_out, _ = self.temporal_attention(features, features, features)
        # Mean pooling over time
        encoded = attn_out.mean(dim=1)  # (batch, hidden_dim)
        return encoded
```
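A quick shape check with synthetic input (the channel count here is illustrative; electrode counts vary by subject):
```python
import torch

encoder = SharedFeatureEncoder(n_channels=128, n_timepoints=200)
dummy = torch.randn(4, 128, 200)   # (batch, channels, timepoints)
print(encoder(dummy).shape)        # torch.Size([4, 512])
```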
#### Step 3: Dual Decoder Architecture
```python
class ModalitySpecificDecoder(nn.Module):
    """Decoder for a specific modality (production or perception)"""

    def __init__(self, hidden_dim=512, vocab_size=5000, max_length=50):
        super().__init__()
        self.vocab_size = vocab_size
        self.max_length = max_length
        # Token embedding feeds the previous token back into the LSTM
        self.embedding = nn.Embedding(vocab_size, hidden_dim)
        self.decoder = nn.LSTM(
            input_size=hidden_dim,
            hidden_size=hidden_dim,
            num_layers=2,
            batch_first=True
        )
        self.output_projection = nn.Linear(hidden_dim, vocab_size)

    def forward(self, encoded_features, target_tokens=None):
        # First decoder input is the encoded neural feature vector
        decoder_input = encoded_features.unsqueeze(1)  # (batch, 1, hidden)
        outputs = []
        hidden = None
        n_steps = self.max_length if target_tokens is None else target_tokens.shape[1]
        for t in range(n_steps):
            out, hidden = self.decoder(decoder_input, hidden)
            logits = self.output_projection(out.squeeze(1))
            outputs.append(logits)
            if target_tokens is not None:
                # Teacher forcing: feed the ground-truth token
                next_input = target_tokens[:, t]
            else:
                # Greedy decoding: feed the model's own prediction
                next_input = logits.argmax(dim=1)
            decoder_input = self.embedding(next_input).unsqueeze(1)
        return torch.stack(outputs, dim=1)  # (batch, seq_len, vocab_size)


class UnifiedBrainToText(nn.Module):
    """Complete unified decoding model"""

    def __init__(self, n_channels, n_timepoints, vocab_size=5000, hidden_dim=512):
        super().__init__()
        self.encoder = SharedFeatureEncoder(n_channels, n_timepoints, hidden_dim)
        self.production_decoder = ModalitySpecificDecoder(hidden_dim, vocab_size)
        self.perception_decoder = ModalitySpecificDecoder(hidden_dim, vocab_size)

    def forward(self, neural_data, modality='production', target_tokens=None):
        # Encode neural activity into the shared latent space
        shared_features = self.encoder(neural_data)
        # Route to the modality-specific decoder
        if modality == 'production':
            return self.production_decoder(shared_features, target_tokens)
        return self.perception_decoder(shared_features, target_tokens)
```
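To sanity-check the full pipeline before training, run a forward pass on synthetic data (channel count, sequence length, and vocabulary size here are placeholders, not values from the paper):
```python
import torch

model = UnifiedBrainToText(n_channels=128, n_timepoints=200, vocab_size=5000)
neural = torch.randn(4, 128, 200)           # fake high-gamma features
targets = torch.randint(0, 5000, (4, 20))   # fake token ids, length 20
logits = model(neural, modality='production', target_tokens=targets)
print(logits.shape)                         # torch.Size([4, 20, 5000])
```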
#### Step 4: Training with Cross-Modal Regularization
```python
def train_unified_model(model, train_loader, epochs=50, lr=1e-3):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss(ignore_index=0)  # ignore <PAD> tokens

    for epoch in range(epochs):
        model.train()
        total_loss = 0.0
        # modality_labels: LongTensor with 0 = production, 1 = perception
        for neural_data, modality_labels, text_tokens in train_loader:
            optimizer.zero_grad()
            # Split the batch by modality
            prod_mask = modality_labels == 0
            perc_mask = modality_labels == 1
            loss = 0.0
            # Production branch
            if prod_mask.any():
                prod_out = model(
                    neural_data[prod_mask],
                    modality='production',
                    target_tokens=text_tokens[prod_mask]
                )
                loss = loss + criterion(
                    prod_out.reshape(-1, prod_out.shape[-1]),
                    text_tokens[prod_mask].reshape(-1)
                )
            # Perception branch
            if perc_mask.any():
                perc_out = model(
                    neural_data[perc_mask],
                    modality='perception',
                    target_tokens=text_tokens[perc_mask]
                )
                loss = loss + criterion(
                    perc_out.reshape(-1, perc_out.shape[-1]),
                    text_tokens[perc_mask].reshape(-1)
                )
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        if epoch % 10 == 0:
            print(f"Epoch {epoch}: Loss = {total_loss / len(train_loader):.4f}")
```
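The heading mentions cross-modal regularization, but the loop above only sums the two decoding losses. One plausible regularizer (my sketch, not confirmed from the paper) aligns the shared encodings of production and perception trials that carry the same sentence, e.g. via a cosine-similarity penalty:
```python
import torch.nn.functional as F

def cross_modal_alignment_loss(prod_features, perc_features):
    """Encourage shared encodings of the same sentence to agree across
    modalities. Assumes prod_features[i] and perc_features[i] are paired
    (same sentence spoken vs. heard); this pairing is an assumption."""
    return 1.0 - F.cosine_similarity(prod_features, perc_features, dim=-1).mean()

# Inside the training loop one could then add (0.1 is a hypothetical weight):
# loss = loss + 0.1 * cross_modal_alignment_loss(
#     model.encoder(paired_prod_data), model.encoder(paired_perc_data))
```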
#### Step 5: Mandarin Chinese Tokenization
```python
import jieba

def tokenize_mandarin(text):
    """Tokenize Mandarin Chinese text into words"""
    return list(jieba.cut(text))

# Build vocabulary from the training transcripts
# (training_sentences: your list of Mandarin training transcripts)
vocab = {'<PAD>': 0, '<UNK>': 1, '<START>': 2, '<END>': 3}
for sentence in training_sentences:
    for token in tokenize_mandarin(sentence):
        if token not in vocab:
            vocab[token] = len(vocab)
```
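To feed sentences to the decoders, tokens must be mapped to padded tensors of ids. A minimal sketch using the vocabulary above; the fixed length of 50 matches `max_length` in the decoder but is otherwise my choice:
```python
import torch

def encode_sentence(sentence, vocab, max_length=50):
    """Map a sentence to a fixed-length LongTensor of token ids,
    wrapped in <START>/<END> and padded with <PAD> (id 0)."""
    ids = [vocab['<START>']]
    ids += [vocab.get(tok, vocab['<UNK>']) for tok in tokenize_mandarin(sentence)]
    ids.append(vocab['<END>'])
    ids = ids[:max_length]                          # truncate long sentences
    ids += [vocab['<PAD>']] * (max_length - len(ids))
    return torch.tensor(ids, dtype=torch.long)
```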
## Applications
1. **Communication BCIs**: Enable speech output for paralyzed patients
2. **Speech Rehabilitation**: Assist stroke recovery with neural feedback
3. **Cognitive Neuroscience**: Study speech production-perception interactions
4. **Multilingual BCIs**: Framework adaptable to other languages
## Pitfalls
- **Language Specificity**: Current implementation is optimized for Mandarin; adapting it to other tonal languages may require explicit pitch/tone encoding
- **Electrode Coverage**: Requires specific coverage of motor and auditory cortices
- **Training Data**: Needs paired production-perception recordings for best results
- **Cross-Subject Generalization**: Performance varies across subjects; transfer learning recommended
- **Latency**: Real-time decoding requires optimization for low-latency applications
## Related Skills
- brain-to-speech-prosody-feature-engineering
- brain-to-speech-synthesis
- eeg-brain-connectivity-bci
- iphoneme-brain-to-text-als-conformerxl
## Citation
```bibtex
@article{yuan2026unified,
title={Towards unified brain-to-text decoding across speech production and perception},
author={Yuan, Zhizhang and Yang, Yang and Zhang, Gaorui and others},
journal={arXiv preprint arXiv:2603.12628},
year={2026}
}
```