Free SKILL.md scraped from GitHub. Clone the repo or copy the file directly into your Claude Code skills directory.
- `npx versuz@latest install vivekkarmarkar-claude-code-os-skills-voice-session-to-pdf`
- `git clone https://github.com/VivekKarmarkar/claude-code-os.git`
- `cp claude-code-os/SKILL.MD ~/.claude/skills/vivekkarmarkar-claude-code-os-skills-voice-session-to-pdf/SKILL.md`

# Voice Session to PDF: Clean Up Voice Transcripts into Polished Documents

Take a raw robo-voice transcript .txt file and transform it into a clean, structured, polished PDF document.

## Arguments

`<path to transcript .txt file>`: The raw transcript file to process.

Examples:
- `/voice-session-to-pdf foolish-framing/transcripts.txt`
- `/voice-session-to-pdf ~/conversations/unix-philosophy-discussion.txt`
- `/voice-session-to-pdf` (no args; ask the user which file to process)

## Reference Document

The file `reference-documents/example_transcript.txt` shows the exact input format this skill expects. Read it before processing any transcript to understand the structure.

## Input Format

The transcript comes from robo-voice and has this structure:

```
[12:05:31 AM] You: Hello?
[12:05:50 AM] Robo: Hey! Welcome back. Ready to continue?
[12:05:57 AM] You: Yeah.
[12:05:58 AM] You: So
[12:05:59 AM] You: okay. So let me start at this.
[12:06:02 AM] You: We were at
[12:06:03 AM] You: give me a quick okay. So I got distracted.
```

Key characteristics:
- **Timestamps** in `[H:MM:SS AM/PM]` format
- **Speaker labels**: `You:` (the user) and `Robo:` (Claude Code)
- **Fragmented speech**: Voice transcription splits sentences across multiple lines with separate timestamps
- **Filler words**: "uh", "um", "like", "you know" scattered throughout
- **Transcription errors**: Words misheard by the STT engine (e.g., "Clark Court" for "Claude Code", "Eunix" for "Unix")

## Workflow

### Step 1: Read and Parse

Read the transcript file. Identify all messages by speaker and timestamp.
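As a rough illustration of Step 1, the sketch below parses transcript lines into timestamp, speaker, and text. It is only one possible approach: the names `LINE_RE`, `Message`, `parse_line`, and `parse_transcript` are hypothetical and not part of the skill, and the regular expression assumes the exact `[H:MM:SS AM/PM] Speaker: text` layout shown in the Input Format section. Lines that do not match are skipped.

```python
import re
from datetime import datetime
from typing import List, NamedTuple, Optional

# Assumed layout: "[12:05:31 AM] You: Hello?" with speaker "You" or "Robo".
LINE_RE = re.compile(r"^\[(\d{1,2}:\d{2}:\d{2} [AP]M)\]\s+(You|Robo):\s*(.*)$")


class Message(NamedTuple):
    time: datetime  # time of day only; the date part defaults to 1900-01-01
    speaker: str
    text: str


def parse_line(line: str) -> Optional[Message]:
    """Parse one transcript line, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    stamp, speaker, text = match.groups()
    return Message(datetime.strptime(stamp, "%I:%M:%S %p"), speaker, text)


def parse_transcript(raw: str) -> List[Message]:
    """Parse every recognizable line of a transcript, skipping the rest."""
    parsed = (parse_line(line) for line in raw.splitlines())
    return [message for message in parsed if message is not None]
```

Keeping the timestamp as a `datetime` rather than a string makes the later gap comparison between consecutive lines a simple subtraction.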
### Step 2: Merge Fragments

Consecutive lines from the same speaker within a short time window (< 30 seconds between lines) should be merged into a single coherent paragraph. This is the most important step: raw voice transcripts are fragmented into 3-10 word chunks that need to be reassembled.

### Step 3: Clean Up

For each merged paragraph:
- Fix obvious transcription errors (context-dependent: "Clark Court" → "Claude Code", "Eunix" → "Unix", etc.)
- Remove excessive filler words while keeping the natural voice feel
- Fix grammar where speech-to-text produced nonsense
- Do NOT rewrite the content; preserve the speaker's words and meaning
- Keep the conversational tone: this should read like a cleaned-up conversation, not a formal paper

### Step 4: Structure

Identify natural sections in the conversation:
- Topic changes
- Q&A patterns
- Arguments and responses
- Transitions

Add section headers where the conversation shifts topic.

### Step 5: Generate PDF

Use reportlab to create a polished PDF:

```python
from reportlab.lib.pagesizes import letter
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer
from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
from reportlab.lib.colors import HexColor
```

Style guidelines:
- **Title**: Large, dark, centered
- **Subtitle**: Smaller, gray, centered
- **User quotes**: Dark text, normal weight
- **Robo quotes**: Can be slightly different style (bold label, normal text)
- **Section headers**: Bold, colored, with space above
- **Margins**: 1 inch all sides
- **Font size**: 11pt body, 12pt for speaker-labeled paragraphs

### Step 6: Save and Report

Save the PDF next to the source transcript file (same directory, same name but .pdf extension).

Report:
- Number of raw lines processed
- Number of merged paragraphs produced
- Number of sections identified
- Output file path and size

## Rules

1. **Read the reference transcript first.** Understand the format before processing.
2. **Merge fragments aggressively.** A single thought split across 8 timestamped lines should become one paragraph.
3. **Preserve meaning.** Clean up delivery, not content. The speaker's ideas and words should survive intact.
4. **Fix transcription errors contextually.** Only fix words that are obviously wrong given the context. Don't guess.
5. **Keep the conversational feel.** This is a cleaned-up conversation, not a formal document. Some informality is correct.
6. **Ask the user for a title.** The transcript doesn't contain one. Ask what to call the document before generating.
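
The merge rule from Step 2 and the output location from Step 6 could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the skill's actual implementation: `Fragment`, `merge_fragments`, and `output_pdf_path` are hypothetical names, fragments are assumed to be already parsed into `(timestamp, speaker, text)` tuples, and a negative gap is assumed to mean the session crossed midnight, since the transcript timestamps carry no date.

```python
from datetime import datetime
from pathlib import Path
from typing import List, Tuple

Fragment = Tuple[datetime, str, str]  # (timestamp, speaker, text)


def merge_fragments(fragments: List[Fragment], max_gap_seconds: int = 30) -> List[Fragment]:
    """Join consecutive same-speaker lines that are less than max_gap_seconds apart."""
    merged: List[Fragment] = []
    previous_stamp = None
    for stamp, speaker, text in fragments:
        gap = None
        if previous_stamp is not None:
            gap = (stamp - previous_stamp).total_seconds()
            if gap < 0:
                # Assumption: timestamps have no date, so a negative gap
                # is treated as a rollover past midnight.
                gap += 24 * 60 * 60
        same_speaker = bool(merged) and merged[-1][1] == speaker
        if same_speaker and gap is not None and gap < max_gap_seconds:
            start, _, so_far = merged[-1]
            merged[-1] = (start, speaker, f"{so_far} {text}".strip())
        else:
            merged.append((stamp, speaker, text.strip()))
        previous_stamp = stamp
    return merged


def output_pdf_path(transcript_path: str) -> Path:
    """Same directory and base name as the transcript, with a .pdf extension."""
    return Path(transcript_path).expanduser().with_suffix(".pdf")
```

The gap is measured from the previous line rather than from the start of the paragraph, so a long uninterrupted thought keeps merging as long as no single pause reaches the threshold. The lengths of the input and output lists give the raw-line and merged-paragraph counts for the Step 6 report.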