---
name: audify
description: Convert any text source (PDF, Markdown, TXT, log file, or a Claude Code session transcript) into a single MP3 file using OpenAI TTS. Preserves content verbatim — never summarizes, paraphrases, or generates dialogue (unlike NotebookLM). Use whenever the user says "audify", "make this an audio", "convert to mp3", "read this aloud", "I want to listen to this", or asks to turn a document/conversation/file into audio they can play, share, email, or upload.
---
# audify — Document & Session → MP3
Turn any text source into a single MP3 using OpenAI TTS, with structural cleanup so it sounds natural (dehyphenation, page-number removal, markdown stripping, blockquote unwrapping, reference-marker removal). Output is a plain MP3 — share it via the existing `/upload`, `/share`, `/upload-and-share`, or `/email` skills.
## When to use
Trigger on any of:
- "audify X" / "audify this"
- "convert X to mp3" / "make an mp3 of X"
- "read this aloud" / "I want to listen to X on the train"
- "turn this PDF/markdown/log/transcript into audio"
- "audify our session" / "make this conversation an audio"
## Inputs supported
| Suffix | Handler |
|---|---|
| `.pdf` | `pdftotext -nopgbrk` (default flow, NOT `-layout` — see Insights) |
| `.md`, `.markdown` | UTF-8 read + markdown structural strip |
| `.txt`, `.log` | UTF-8 read |
| Claude Code session JSONL | First run `extract_session.py` to flatten to `.txt`, then audify |
## Standard workflow
### Case 1 — Single document (PDF/MD/TXT)
```bash
# Dry-run first to confirm extraction + cost (free, no API call)
python3 ~/.claude/skills/audify/scripts/audify.py /path/to/input.pdf --dry-run
# Real run — writes /path/to/input.mp3
python3 ~/.claude/skills/audify/scripts/audify.py /path/to/input.pdf
```
For long inputs (>50K chars / multiple minutes of synthesis), run in the background and check the tempdir for `part_NNNN.mp3` to track progress:
```bash
ls /tmp/tmp*/part_*.mp3 2>/dev/null | wc -l # chunks completed so far
```
### Case 2 — Claude Code session transcript
The active session's JSONL lives at:
```
~/.claude/projects/<sanitized-cwd>/<session-uuid>.jsonl
```
`<sanitized-cwd>` is the absolute working-directory path with every `/` replaced by `-` (so the leading `/` of the path becomes the leading `-` of the directory name). Find the most recent session file with:
```bash
ls -lt ~/.claude/projects/<sanitized-cwd>/*.jsonl | head -1
```
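The sanitization rule above can be sketched in Python. This assumes the simple `/` → `-` substitution described here is the whole rule; Claude Code may sanitize other characters too, so treat this as an approximation:

```python
import os

def session_dir(cwd=None):
    """Map a working directory to its ~/.claude/projects subdirectory.

    Assumes the rule stated above: every '/' becomes '-', so the
    leading '/' of an absolute path becomes a leading '-'.
    """
    cwd = cwd or os.getcwd()
    sanitized = cwd.replace("/", "-")
    return os.path.expanduser(f"~/.claude/projects/{sanitized}")
```

For example, `session_dir("/home/vivek/proj")` ends in `.claude/projects/-home-vivek-proj`.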
Then extract → audify:
```bash
python3 ~/.claude/skills/audify/scripts/extract_session.py \
/path/to/session.jsonl \
./session.txt
python3 ~/.claude/skills/audify/scripts/audify.py ./session.txt
```
`extract_session.py` keeps only user prompts and assistant prose — it strips tool calls, tool results, thinking blocks, system reminders, command stdout, slash-command echoes, and skill envelope text. The output reads cleanly as alternating "User said." / "Assistant replied." narration.
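The filtering described above can be sketched roughly as follows. This is a simplified illustration, not the real `extract_session.py`: the actual script handles more message shapes and strips the envelope tags individually, whereas this sketch just drops any text block starting with `<` or a known plumbing prefix:

```python
import json

# Prefixes that mark tool plumbing arriving as user-role text blocks
PLUMBING_PREFIXES = (
    "Base directory for this skill:",
    "Tool loaded.",
    "Launching skill:",
    "Caveat:",
)

def narrate(jsonl_path):
    """Yield 'User said.' / 'Assistant replied.' narration lines from a
    session JSONL, skipping tool calls, results, and envelope text."""
    with open(jsonl_path) as f:
        for line in f:
            entry = json.loads(line)
            msg = entry.get("message", {})
            role = msg.get("role")
            for block in msg.get("content", []):
                # keep only plain text blocks; tool_use / tool_result /
                # thinking blocks have other types and are dropped here
                if not isinstance(block, dict) or block.get("type") != "text":
                    continue
                text = block["text"].strip()
                if text.startswith(PLUMBING_PREFIXES) or text.startswith("<"):
                    continue  # skill envelopes, system reminders, etc.
                speaker = "User said." if role == "user" else "Assistant replied."
                yield f"{speaker} {text}"
```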
### Case 3 — Audify + share in one go
After producing the MP3, chain into `/upload-and-share` (or call `gws` directly) to push it to Google Drive and share with a recipient. Don't reinvent that plumbing — delegate.
## CLI reference (`audify.py`)
```
audify.py INPUT [-o OUT.mp3] [--voice V] [--model M]
[--speed S] [--no-clean] [--dry-run]
```
| Flag | Default | Notes |
|---|---|---|
| `-o, --output` | `<input>.mp3` | Output path |
| `--voice` | `alloy` | One of: alloy, ash, ballad, coral, echo, fable, nova, onyx, sage, shimmer, verse |
| `--model` | `gpt-4o-mini-tts` | Cheapest. `tts-1` and `tts-1-hd` also work. |
| `--speed` | `1.0` | 0.25 – 4.0 |
| `--no-clean` | off | Skip structural cleanup (true verbatim, sounds rough on PDFs) |
| `--dry-run` | off | Extract + clean + chunk + cost estimate, no API call |
Cost (gpt-4o-mini-tts): ~$0.015 per 1,000 characters. A 70K-char paper ≈ $1.
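The estimate the dry run reports is simple arithmetic on the cleaned character count, at the rate quoted above:

```python
PRICE_PER_1K_CHARS = 0.015  # gpt-4o-mini-tts, USD, as quoted above

def estimate_cost(n_chars):
    """Estimated synthesis cost in USD for a cleaned input of n_chars."""
    return n_chars / 1000 * PRICE_PER_1K_CHARS

# 70,000 chars: 70 * 0.015 = $1.05, hence "a 70K-char paper is about $1"
```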
## Prerequisites
- `OPENAI_API_KEY` env var set (Vivek already has this).
- `ffmpeg` and `pdftotext` on PATH (standard on Pop!_OS / most Linux).
- `openai` Python SDK (≥1.0; tested with 2.17).
## How it works (architecture)
```
input → extract → clean → chunk (≤3500 chars) → OpenAI TTS per chunk
↓
ffmpeg concat -c copy → MP3
```
- **Chunking** targets 3500 chars (OpenAI's hard limit is 4096) with sentence-boundary splits and greedy packing for minimum API calls.
- **Concat** uses `ffmpeg -f concat -c copy` — stream copy, no re-encode, no quality loss.
- **Single-chunk inputs** skip ffmpeg entirely.
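The greedy packing step can be sketched as below. This is a minimal illustration: the real script's sentence splitter is more robust, and a single sentence longer than the limit would become an oversize chunk here:

```python
import re

def chunk(text, limit=3500):
    """Greedily pack whole sentences into chunks of at most `limit`
    chars, keeping each chunk under OpenAI's 4096-char cap with margin
    while minimizing the number of API calls."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + 1 + len(s) > limit:
            chunks.append(current)  # current chunk is full; start a new one
            current = s
        else:
            current = f"{current} {s}" if current else s
    if current:
        chunks.append(current)
    return chunks
```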
## Insights / gotchas (lessons learned building this)
1. **Use `pdftotext` default, not `-layout`.** `-layout` preserves visual columns, which on a two-column academic PDF reads row-1-left, row-1-right, row-2-left, row-2-right — gibberish for TTS. Default flow tracks reading order.
2. **Regex order matters in markdown cleanup.** Strip blockquote markers (`>`) BEFORE heading markers (`#`), so a `> ## Title` line gets fully unwrapped. Reorder by dependency, not by aesthetic grouping.
3. **Don't pipe long-running synthesis through `tail`.** `python3 audify.py X 2>&1 | tail -30` looks fine but silences interim progress until the pipe closes — useless for monitoring. Either drop the pipe or `tee` to a logfile.
4. **Session JSONL extraction is non-trivial.** Skill tool results arrive as user-role messages with `text` content blocks — they look like real user speech but are plumbing. Filter on prefixes like `Base directory for this skill:`, `Tool loaded.`, `Launching skill:`, `Caveat:`. Also strip XML-style envelope tags (`<system-reminder>`, `<command-name>`, `<local-command-stdout>`, `<task-notification>`, etc.).
5. **References/Bibliography sections are noise.** Drop everything from a `References` / `Bibliography` / `Works Cited` heading onward — listening to a citation list is awful.
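Insight 2 above can be illustrated with a minimal sketch (the real cleanup pass handles many more markdown constructs):

```python
import re

def clean_line(line):
    """Strip blockquote markers BEFORE heading markers, so that a line
    like '> ## Title' unwraps fully to 'Title'."""
    line = re.sub(r"^\s*(?:>\s*)+", "", line)  # blockquotes first
    line = re.sub(r"^#{1,6}\s+", "", line)     # then headings
    return line
```

With the order reversed, the heading regex never matches `> ## Title` (it anchors at the start of the line, before the `>`), so the `#` markers would survive into the narration.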
## Out of scope (deliberate)
- No equation-to-speech (would need an LLM pass; risks hallucination of "verbatim" content).
- No multi-speaker / dialogue generation (that's NotebookLM — the explicit non-goal).
- No streaming-while-synthesizing playback.
- No GUI.
- No partial-resume / chunk caching — short inputs re-run cheaply.
## Files
- `scripts/audify.py` — main CLI
- `scripts/extract_session.py` — Claude session JSONL → narration text