---
name: brain-to-speech-prosody-feature-engineering-transformer-based-reconstruction
description: Brain-to-speech synthesis from intracranial EEG using prosody feature engineering and transformer-based reconstruction. Converts neural activity into natural-sounding speech with proper prosodic features. Activation: brain-to-speech, prosody, speech reconstruction, intracranial EEG, neural speech synthesis, iEEG to speech.
version: 1.0.0
metadata:
  hermes:
    source_paper: "Brain-to-Speech: Prosody Feature Engineering and Transformer-Based Reconstruction (arXiv:2604.05751)"
    tags: [brain-computer-interface, speech-synthesis, transformer, ieeg, prosody]
---
# Brain-to-Speech: Prosody Feature Engineering and Transformer-Based Reconstruction
## Overview
Methodology for reconstructing speech from intracranial EEG (iEEG) signals using prosody feature engineering and transformer-based models. This enables direct brain-to-speech conversion for brain-computer interfaces.
## Source Paper
- **Title:** Brain-to-Speech: Prosody Feature Engineering and Transformer-Based Reconstruction
- **arXiv:** 2604.05751v1
- **Authors:** Mohammed Salah Al-Radhi, Géza Németh, Andon Tchechmedjiev, Binbin Xu
- **Published:** 2026-04-07
## Core Concepts
### Prosody Features
Speech prosody encompasses:
- **Fundamental frequency (F0):** Pitch contour
- **Intensity:** Loudness variation
- **Duration:** Timing and rhythm
- **Spectral envelope:** Timbre characteristics
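
The extractor later in this document covers F0, energy, and duration; the spectral envelope is not implemented there. As a minimal sketch of how it could be computed, assuming a mono NumPy signal and using only a short-time FFT with coarse band pooling (the frame sizes and band count are illustrative choices, not from the paper):

```python
import numpy as np

def spectral_envelope(audio, sample_rate=16000, n_bands=24):
    """Coarse spectral envelope: per-frame log-magnitude spectrum pooled into bands."""
    frame_size = int(0.025 * sample_rate)   # 25 ms frames
    hop_size = int(0.010 * sample_rate)     # 10 ms hop
    window = np.hanning(frame_size)
    envelopes = []
    for start in range(0, len(audio) - frame_size, hop_size):
        frame = audio[start:start + frame_size] * window
        log_mag = np.log(np.abs(np.fft.rfft(frame)) + 1e-10)  # avoid log(0)
        # Pool FFT bins into n_bands coarse bands to approximate the envelope.
        bands = np.array_split(log_mag, n_bands)
        envelopes.append([b.mean() for b in bands])
    return np.array(envelopes)  # shape: (n_frames, n_bands)
```

For a pure tone, the band containing the tone's frequency dominates the averaged envelope, which is a quick way to sanity-check the pooling.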
### Architecture
```
iEEG signals -> Neural encoder -> Latent representation -> Transformer decoder -> Speech features -> Vocoder -> Audio
```
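
The transformer decoder stage above is built on scaled dot-product attention. As a minimal NumPy sketch of that core operation, with hypothetical dimensions (50 latent time steps, a 64-dim latent space, and 80-dim speech features, none of which are specified by the paper):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Core transformer operation: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Hypothetical shapes: 50 latent time steps, 64-dim latent, 80-dim speech features.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 64))             # latent representation from the encoder
W_q = rng.normal(size=(64, 64))
W_k = rng.normal(size=(64, 64))
W_v = rng.normal(size=(64, 80))                # project to the speech-feature dimension
out, attn = scaled_dot_product_attention(latent @ W_q, latent @ W_k, latent @ W_v)
# out has shape (50, 80): one decoded speech-feature frame per latent time step.
```

A full decoder would stack several such attention layers with feed-forward blocks and learned weights; this single untrained layer only illustrates the shape flow from latent representation to speech features.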
### Feature Engineering Pipeline
```python
import numpy as np


class ProsodyFeatureExtractor:
    """Extracts frame-level prosody features (F0, energy, duration) from audio."""

    def __init__(self, sample_rate=16000):
        self.sr = sample_rate

    def extract_prosody_features(self, audio):
        features = {}
        features['f0'] = self._estimate_f0(audio)

        # Short-time energy over 25 ms frames with a 10 ms hop.
        frame_size = int(0.025 * self.sr)
        hop_size = int(0.010 * self.sr)
        energy = []
        for i in range(0, len(audio) - frame_size, hop_size):
            frame = audio[i:i + frame_size]
            energy.append(np.sum(frame ** 2))
        features['energy'] = np.array(energy)

        # Total duration in seconds.
        features['duration'] = len(audio) / self.sr
        return features

    def _estimate_f0(self, signal):
        """Autocorrelation-based F0 estimate over non-overlapping 30 ms frames."""
        frame_size = int(0.030 * self.sr)
        f0 = []
        for i in range(0, len(signal) - frame_size, frame_size):
            frame = signal[i:i + frame_size]
            if np.std(frame) < 0.01:
                f0.append(0)  # treat near-silent frames as unvoiced
                continue
            autocorr = np.correlate(frame, frame, mode='full')
            autocorr = autocorr[len(autocorr) // 2:]  # keep non-negative lags
            # Restrict the lag search to a plausible 50-500 Hz pitch range.
            min_lag = int(self.sr / 500)
            max_lag = int(self.sr / 50)
            search_region = autocorr[min_lag:max_lag]
            if len(search_region) > 0:
                lag = np.argmax(search_region) + min_lag
                f0.append(self.sr / lag)
            else:
                f0.append(0)
        return np.array(f0)


class NeuralSpeechDecoder:
    """Placeholder linear mapping from neural features to prosody features."""

    def __init__(self, n_electrodes, n_prosody_features=4):
        self.n_electrodes = n_electrodes
        self.n_features = n_prosody_features

    def neural_to_prosody(self, neural_features):
        # Untrained random projection; a real system would learn this mapping
        # from paired neural and speech data.
        W = np.random.randn(self.n_electrodes, self.n_features) * 0.1
        prosody = neural_features @ W
        return prosody
```
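
To sanity-check the autocorrelation approach used in `_estimate_f0`, here is a self-contained run of the same steps on a clean synthetic tone (the 220 Hz frequency and 30 ms frame are illustrative; real iEEG-driven audio would be far noisier):

```python
import numpy as np

sr = 16000
t = np.arange(int(0.5 * sr)) / sr
audio = 0.5 * np.sin(2 * np.pi * 220 * t)       # 220 Hz synthetic tone

frame = audio[:int(0.030 * sr)]                 # one 30 ms analysis frame
autocorr = np.correlate(frame, frame, mode='full')
autocorr = autocorr[len(autocorr) // 2:]        # keep non-negative lags
min_lag, max_lag = int(sr / 500), int(sr / 50)  # restrict search to 50-500 Hz
lag = np.argmax(autocorr[min_lag:max_lag]) + min_lag
f0 = sr / lag
print(round(f0, 1))                             # close to 220 Hz
```

The estimate lands within a few Hz of the true pitch because the autocorrelation peaks at the nearest integer lag to the tone's period (16000 / 220 ≈ 72.7 samples).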
## Applications
- Speech restoration for paralyzed patients
- Brain-computer interfaces for communication
- Understanding neural basis of speech production
- Real-time neural speech decoding
## Related
- [[eeg-ieeg-bridge-bci]]
- [[brain-to-speech-synthesis]]