---
name: brain-to-speech-synthesis-v2
description: "Brain-to-speech synthesis from intracranial EEG (iEEG) using prosody-aware feature engineering and transformer-based reconstruction. Enables high-fidelity speech reconstruction for neuroprosthetics. Based on arXiv:2604.05751v1 (April 2026). Activation: brain-to-speech, iEEG speech, intracranial EEG, speech neuroprosthetics, prosody brain decoding."
---
# Brain-to-Speech: Prosody Feature Engineering and Transformer-Based Reconstruction
Brain-to-speech synthesis from intracranial EEG using prosody-aware features and transformer architectures.
**Paper**: arXiv:2604.05751v1 (April 7, 2026)
**Authors**: Mohammed Salah Al-Radhi, Géza Németh, Andon Tchechmedjiev, Binbin Xu
## Overview
This methodology synthesizes speech directly from intracranial electroencephalography (iEEG) brain signals. The approach integrates prosody-aware feature engineering with transformer-based models to generate accurate and natural-sounding speech for assistive technologies.
## Key Components
### 1. Prosody Feature Extraction from iEEG
Traditional approaches decode phonetic content alone. This method additionally extracts the prosodic features that make speech sound natural:
- **Intonation**: Pitch contours and melodic patterns
- **Pitch**: Fundamental frequency (F0) trajectories
- **Rhythm**: Timing and stress patterns
```python
def extract_prosody_features(ieeg_signal):
    """
    Extract prosodic features from iEEG signals.

    Args:
        ieeg_signal: Raw intracranial EEG data (channels x time)

    Returns:
        prosody_features: Dict with intonation, pitch, rhythm
    """
    # Extract pitch-related features from auditory cortex activity
    pitch = extract_pitch_from_ieeg(ieeg_signal, region='superior_temporal')

    # Extract intonation contours
    intonation = extract_intonation_pattern(ieeg_signal)

    # Extract rhythmic patterns
    rhythm = extract_timing_patterns(ieeg_signal)

    return {
        'pitch': pitch,
        'intonation': intonation,
        'rhythm': rhythm,
    }
```
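The three helper functions above are placeholders. As one illustration, here is a minimal sketch of `extract_pitch_from_ieeg` under the assumption that the high-gamma band envelope of superior temporal electrodes serves as the pitch-related feature (a common correlate in iEEG speech decoding; the band edges and the region handling are assumptions, not taken from the paper):
```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def extract_pitch_from_ieeg(ieeg_signal, region='superior_temporal',
                            fs=1000, band=(70, 150)):
    """Sketch: high-gamma envelope as a pitch-related feature.

    ieeg_signal: (channels x time) array. Region-based channel selection
    is omitted here; channels are assumed to be pre-selected.
    """
    nyq = fs / 2
    b, a = butter(4, [band[0] / nyq, band[1] / nyq], btype='band')

    # Band-limit each channel to the high-gamma range
    hg = filtfilt(b, a, ieeg_signal, axis=-1)

    # The analytic-signal envelope tracks band power over time
    envelope = np.abs(hilbert(hg, axis=-1))

    # Average across electrodes to obtain a single time course
    return envelope.mean(axis=0)
```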
### 2. Transformer Encoder Architecture
The method uses a transformer architecture designed specifically for brain-to-speech tasks:
```
┌─────────────────────────────────────────────────────────┐
│ BRAIN-TO-SPEECH TRANSFORMER │
├─────────────────────────────────────────────────────────┤
│ Input: iEEG signal (multi-channel time series) │
│ ↓ │
│ ┌──────────────────────────────────────┐ │
│ │ Temporal Convolution Feature Extraction│ │
│ └──────────────────────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────────┐ │
│ │ Multi-Head Self-Attention (Brain) │ │
│ │ - Capture spatial-temporal patterns │ │
│ └──────────────────────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────────┐ │
│ │ Prosody Feature Integration │ │
│ │ - Inject intonation, pitch, rhythm │ │
│ └──────────────────────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────────┐ │
│ │ Cross-Modal Attention │ │
│ │ - Brain → Speech features │ │
│ └──────────────────────────────────────┘ │
│ ↓ │
│ Output: Speech spectrogram / acoustic features │
└─────────────────────────────────────────────────────────┘
```
### 3. Speech Reconstruction Pipeline
```python
import torch.nn as nn

class BrainToSpeechTransformer(nn.Module):
    def __init__(self):
        super().__init__()
        self.temporal_encoder = TemporalConvEncoder()
        self.transformer = TransformerEncoder()
        self.prosody_integrator = ProsodyIntegrationModule()
        self.speech_decoder = SpeechDecoder()

    def forward(self, ieeg_signal):
        # Extract temporal features
        temporal_features = self.temporal_encoder(ieeg_signal)

        # Apply transformer self-attention over time
        attended = self.transformer(temporal_features)

        # Integrate prosody features extracted from the raw signal
        prosody = extract_prosody_features(ieeg_signal)
        enhanced = self.prosody_integrator(attended, prosody)

        # Decode to a speech spectrogram; prosody is also returned
        # so the training loop can apply an auxiliary prosody loss
        speech = self.speech_decoder(enhanced)
        return speech, prosody
```
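The `TemporalConvEncoder`, `TransformerEncoder`, `ProsodyIntegrationModule`, and `SpeechDecoder` submodules are not specified above. As one reading of the diagram's prosody-integration and cross-modal-attention stages, here is a sketch of `ProsodyIntegrationModule` using standard PyTorch cross-attention; the dimensions, and the assumption that the three prosody curves are stacked into a `(batch, time, 3)` tensor, are mine rather than the paper's:
```python
import torch
import torch.nn as nn

class ProsodyIntegrationModule(nn.Module):
    """Sketch: inject prosody features into brain embeddings via cross-attention."""

    def __init__(self, d_model=256, d_prosody=3, n_heads=4):
        super().__init__()
        # Project stacked prosody curves (pitch, intonation, rhythm) to model width
        self.prosody_proj = nn.Linear(d_prosody, d_model)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, brain_emb, prosody):
        # brain_emb: (batch, time, d_model); prosody: (batch, time, d_prosody)
        p = self.prosody_proj(prosody)

        # Brain embeddings query the prosody sequence
        attended, _ = self.cross_attn(query=brain_emb, key=p, value=p)

        # Residual connection keeps the original brain representation intact
        return self.norm(brain_emb + attended)
```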
## Implementation
### Step 1: iEEG Preprocessing
```python
import numpy as np

def preprocess_ieeg(raw_ieeg, sampling_rate=1000):
    """
    Preprocess intracranial EEG signals.

    Args:
        raw_ieeg: Raw iEEG data (channels x time)
        sampling_rate: Sampling frequency in Hz

    Returns:
        processed: Cleaned, normalized iEEG signal
    """
    # Bandpass filter (1-200 Hz is typical for speech decoding)
    filtered = bandpass_filter(raw_ieeg, low=1, high=200, fs=sampling_rate)

    # Notch filter for line noise (60 Hz; use 50 Hz outside North America)
    notch_filtered = notch_filter(filtered, freq=60, fs=sampling_rate)

    # Common average reference: subtract the across-channel mean
    car = notch_filtered - np.mean(notch_filtered, axis=0)

    # Z-score normalization per channel
    normalized = (car - np.mean(car, axis=1, keepdims=True)) / np.std(car, axis=1, keepdims=True)

    return normalized
```
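The `bandpass_filter` and `notch_filter` helpers above are left undefined. A minimal sketch using `scipy.signal`, where the filter order and notch quality factor are assumptions:
```python
from scipy.signal import butter, filtfilt, iirnotch

def bandpass_filter(x, low, high, fs, order=4):
    # Zero-phase Butterworth bandpass, applied along the time axis
    nyq = fs / 2
    b, a = butter(order, [low / nyq, high / nyq], btype='band')
    return filtfilt(b, a, x, axis=-1)

def notch_filter(x, freq, fs, quality=30):
    # Narrow IIR notch at the line frequency
    b, a = iirnotch(freq, quality, fs)
    return filtfilt(b, a, x, axis=-1)
```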
### Step 2: Prosody-Aware Training
```python
import torch

def train_with_prosody(model, dataloader, epochs=100):
    """
    Train the brain-to-speech model with prosody supervision.

    Args:
        model: BrainToSpeechTransformer
        dataloader: Training data loader
        epochs: Number of training epochs
    """
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    for epoch in range(epochs):
        for batch in dataloader:
            ieeg = batch['ieeg']               # Brain signals
            target_speech = batch['speech']    # Target audio
            target_prosody = batch['prosody']  # Prosody annotations

            # Forward pass (model returns speech plus prosody features)
            predicted_speech, predicted_prosody = model(ieeg)

            # Multi-objective loss: spectral reconstruction + prosody term
            speech_loss = spectral_loss(predicted_speech, target_speech)
            prosody_loss = prosody_mse(predicted_prosody, target_prosody)
            total_loss = speech_loss + 0.5 * prosody_loss

            # Backpropagation
            optimizer.zero_grad()
            total_loss.backward()
            optimizer.step()
```
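Here `spectral_loss` and `prosody_mse` are placeholders. A common choice for the spectral term is an L1 distance between log-mel spectrograms; a minimal sketch with `torchaudio`, where the mel parameters (16 kHz, 80 mels) are assumptions:
```python
import torch
import torch.nn.functional as F
import torchaudio

mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=16000, n_fft=1024, hop_length=256, n_mels=80
)

def spectral_loss(pred_spec, target_speech):
    # Compare the predicted mel spectrogram to that of the target audio
    target_spec = torch.log(mel(target_speech) + 1e-5)
    return F.l1_loss(pred_spec, target_spec)

def prosody_mse(pred_prosody, target_prosody):
    # Simple MSE over stacked prosody trajectories (pitch, intonation, rhythm)
    return F.mse_loss(pred_prosody, target_prosody)
```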
### Step 3: Inference and Evaluation
```python
import torch

def synthesize_speech(model, ieeg_signal):
    """
    Synthesize speech from an iEEG signal.

    Args:
        model: Trained BrainToSpeechTransformer
        ieeg_signal: Input iEEG data (channels x time)

    Returns:
        audio: Synthesized speech waveform
    """
    model.eval()
    with torch.no_grad():
        # Preprocess and convert to a batched float tensor
        processed = preprocess_ieeg(ieeg_signal)
        x = torch.from_numpy(processed).float().unsqueeze(0)

        # Model inference (the prosody output is only used during training)
        spectrogram, _ = model(x)

        # Vocoder (e.g., HiFi-GAN) converts the spectrogram to a waveform
        audio = vocoder(spectrogram)

    return audio
```
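A hypothetical end-to-end call, assuming a saved checkpoint `b2s_checkpoint.pt`, a 16 kHz vocoder, and an `ieeg_recording` array are available (all names here are illustrative):
```python
import soundfile as sf
import torch

model = BrainToSpeechTransformer()
model.load_state_dict(torch.load("b2s_checkpoint.pt"))  # hypothetical checkpoint

# ieeg_recording: (channels x time) numpy array from the recording system
audio = synthesize_speech(model, ieeg_recording)
sf.write("reconstructed.wav", audio.squeeze().cpu().numpy(), 16000)
```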
## Evaluation Metrics
### Quantitative Metrics
- **Mel Cepstral Distortion (MCD)**: Spectral similarity (see the sketch after this list)
- **Fundamental Frequency RMSE**: Pitch accuracy
- **Short-Time Objective Intelligibility (STOI)**: Speech intelligibility
- **Perceptual Evaluation of Speech Quality (PESQ)**: Perceptual quality
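As referenced above, a minimal MCD sketch, assuming the reference and synthesized mel-cepstral coefficient (MCEP) frames are already time-aligned (e.g., via DTW) and exclude the 0th energy coefficient:
```python
import numpy as np

def mel_cepstral_distortion(mcep_ref, mcep_syn):
    """MCD in dB between two (frames x coefficients) MCEP arrays."""
    # MCD = (10 / ln 10) * sqrt(2 * sum_d (c_ref - c_syn)^2), averaged over frames
    diff = mcep_ref - mcep_syn
    frame_dist = np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return (10.0 / np.log(10.0)) * np.mean(frame_dist)
```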
### Perceptual Metrics
- Mean Opinion Score (MOS) for naturalness
- Intelligibility scores from listeners
- Speaker similarity ratings
## Advantages Over Baselines
| Method | Intelligibility | Naturalness | Prosody Accuracy |
|--------|----------------|-------------|------------------|
| Griffin-Lim | Baseline | Poor | N/A |
| CNN-based | Moderate | Moderate | Poor |
| **Transformer + Prosody** | **High** | **High** | **High** |
## Applications
- **Neuroprosthetics**: Restore communication for speech-impaired individuals
- **Brain-Computer Interfaces**: Speech output for locked-in patients
- **Clinical Research**: Understanding speech production in the brain
- **Assistive Technology**: Communication aids for ALS, stroke patients
## Future Directions
1. **Diffusion Models**: Integrate diffusion-based vocoders for higher quality
2. **Real-time Systems**: Reduce latency for interactive use
3. **Non-invasive EEG**: Extend the approach to scalp EEG, at the cost of some reconstruction fidelity
4. **Multilingual Support**: Adapt to different languages
## Trigger Words
- brain-to-speech, iEEG speech, intracranial EEG
- speech neuroprosthetics, prosody brain decoding
- speech brain interface, neural speech synthesis
## Category
neuroscience, brain-computer-interface, speech-synthesis
## Reference
Al-Radhi, M. S., Németh, G., Tchechmedjiev, A., & Xu, B. (2026). Brain-to-Speech: Prosody Feature Engineering and Transformer-Based Reconstruction. arXiv:2604.05751v1.