---
name: brain-to-speech-prosody-feature-engineering-transformer-based-reconstruction
description: Brain-to-speech synthesis from intracranial EEG using prosody feature engineering and transformer-based reconstruction. Converts neural activity into natural-sounding speech with proper prosodic features. Activation: brain-to-speech, prosody, speech reconstruction, intracranial EEG, neural speech synthesis, iEEG to speech.
version: 1.0.0
metadata:
  hermes:
    source_paper: "Brain-to-Speech: Prosody Feature Engineering and Transformer-Based Reconstruction (arXiv:2604.05751)"
    tags: [brain-computer-interface, speech-synthesis, transformer, ieeg, prosody]
---
# Brain-to-Speech: Prosody Feature Engineering and Transformer-Based Reconstruction
## Overview
Methodology for reconstructing speech from intracranial EEG (iEEG) signals using prosody feature engineering and transformer-based models. This enables direct brain-to-speech conversion for brain-computer interfaces.
## Source Paper
- **Title:** Brain-to-Speech: Prosody Feature Engineering and Transformer-Based Reconstruction
- **arXiv:** 2604.05751v1
- **Authors:** Mohammed Salah Al-Radhi, Géza Németh, Andon Tchechmedjiev, Binbin Xu
- **Published:** 2026-04-07
## Core Concepts
### Prosody Features
Speech prosody encompasses:
- **Fundamental frequency (F0):** Pitch contour
- **Intensity:** Loudness variation
- **Duration:** Timing and rhythm
- **Spectral envelope:** Timbre characteristics
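
The extractor later in this document covers F0, energy, and duration; the spectral envelope is not implemented there. As a minimal sketch of how it could be computed, assuming a mono NumPy signal and using only a short-time FFT with coarse band pooling (the frame sizes and band count are illustrative choices, not from the paper):

```python
import numpy as np

def spectral_envelope(audio, sample_rate=16000, n_bands=24):
    """Coarse spectral envelope: per-frame log-magnitude spectrum pooled into bands."""
    frame_size = int(0.025 * sample_rate)   # 25 ms frames
    hop_size = int(0.010 * sample_rate)     # 10 ms hop
    window = np.hanning(frame_size)
    envelopes = []
    for start in range(0, len(audio) - frame_size, hop_size):
        frame = audio[start:start + frame_size] * window
        log_mag = np.log(np.abs(np.fft.rfft(frame)) + 1e-10)  # avoid log(0)
        # Pool FFT bins into n_bands coarse bands to approximate the envelope.
        bands = np.array_split(log_mag, n_bands)
        envelopes.append([b.mean() for b in bands])
    return np.array(envelopes)  # shape: (n_frames, n_bands)
```

For a pure tone, the band containing the tone's frequency dominates the averaged envelope, which is a quick way to sanity-check the pooling.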
### Architecture
```
iEEG signals -> Neural encoder -> Latent representation -> Transformer decoder -> Speech features -> Vocoder -> Audio
```
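
The transformer decoder stage above is built on scaled dot-product attention. As a minimal NumPy sketch of that core operation, with hypothetical dimensions (50 latent time steps, a 64-dim latent space, and 80-dim speech features, none of which are specified by the paper):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Core transformer operation: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Hypothetical shapes: 50 latent time steps, 64-dim latent, 80-dim speech features.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 64))             # latent representation from the encoder
W_q = rng.normal(size=(64, 64))
W_k = rng.normal(size=(64, 64))
W_v = rng.normal(size=(64, 80))                # project to the speech-feature dimension
out, attn = scaled_dot_product_attention(latent @ W_q, latent @ W_k, latent @ W_v)
# out has shape (50, 80): one decoded speech-feature frame per latent time step.
```

A full decoder would stack several such attention layers with feed-forward blocks and learned weights; this single untrained layer only illustrates the shape flow from latent representation to speech features.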
### Feature Engineering Pipeline
```python
import numpy as np


class ProsodyFeatureExtractor:
    """Extracts frame-level prosody features (F0, energy, duration) from audio."""

    def __init__(self, sample_rate=16000):
        self.sr = sample_rate

    def extract_prosody_features(self, audio):
        features = {}
        features['f0'] = self._estimate_f0(audio)

        # Short-time energy over 25 ms frames with a 10 ms hop.
        frame_size = int(0.025 * self.sr)
        hop_size = int(0.010 * self.sr)
        energy = []
        for i in range(0, len(audio) - frame_size, hop_size):
            frame = audio[i:i + frame_size]
            energy.append(np.sum(frame ** 2))
        features['energy'] = np.array(energy)

        # Total duration in seconds.
        features['duration'] = len(audio) / self.sr
        return features

    def _estimate_f0(self, signal):
        """Autocorrelation-based F0 estimate over non-overlapping 30 ms frames."""
        frame_size = int(0.030 * self.sr)
        f0 = []
        for i in range(0, len(signal) - frame_size, frame_size):
            frame = signal[i:i + frame_size]
            if np.std(frame) < 0.01:
                f0.append(0)  # treat near-silent frames as unvoiced
                continue
            autocorr = np.correlate(frame, frame, mode='full')
            autocorr = autocorr[len(autocorr) // 2:]  # keep non-negative lags
            # Restrict the lag search to a plausible 50-500 Hz pitch range.
            min_lag = int(self.sr / 500)
            max_lag = int(self.sr / 50)
            search_region = autocorr[min_lag:max_lag]
            if len(search_region) > 0:
                lag = np.argmax(search_region) + min_lag
                f0.append(self.sr / lag)
            else:
                f0.append(0)
        return np.array(f0)


class NeuralSpeechDecoder:
    """Placeholder linear mapping from neural features to prosody features."""

    def __init__(self, n_electrodes, n_prosody_features=4):
        self.n_electrodes = n_electrodes
        self.n_features = n_prosody_features

    def neural_to_prosody(self, neural_features):
        # Untrained random projection; a real system would learn this mapping
        # from paired neural and speech data.
        W = np.random.randn(self.n_electrodes, self.n_features) * 0.1
        prosody = neural_features @ W
        return prosody
```
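
To sanity-check the autocorrelation approach used in `_estimate_f0`, here is a self-contained run of the same steps on a clean synthetic tone (the 220 Hz frequency and 30 ms frame are illustrative; real iEEG-driven audio would be far noisier):

```python
import numpy as np

sr = 16000
t = np.arange(int(0.5 * sr)) / sr
audio = 0.5 * np.sin(2 * np.pi * 220 * t)       # 220 Hz synthetic tone

frame = audio[:int(0.030 * sr)]                 # one 30 ms analysis frame
autocorr = np.correlate(frame, frame, mode='full')
autocorr = autocorr[len(autocorr) // 2:]        # keep non-negative lags
min_lag, max_lag = int(sr / 500), int(sr / 50)  # restrict search to 50-500 Hz
lag = np.argmax(autocorr[min_lag:max_lag]) + min_lag
f0 = sr / lag
print(round(f0, 1))                             # close to 220 Hz
```

The estimate lands within a few Hz of the true pitch because the autocorrelation peaks at the nearest integer lag to the tone's period (16000 / 220 ≈ 72.7 samples).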
## Applications
- Speech restoration for paralyzed patients
- Brain-computer interfaces for communication
- Understanding neural basis of speech production
- Real-time neural speech decoding
## Related
- [[eeg-ieeg-bridge-bci]]
- [[brain-to-speech-synthesis]]