Free SKILL.md scraped from GitHub. Clone the repo or copy the file directly into your Claude Code skills directory.
npx versuz@latest install youtube-transcriber-pipeline-claude-code-skillgit clone https://github.com/astoreyai/claude-skills.gitcp -r claude-skills/ ~/.claude/skills/youtube-transcriber-pipeline-claude-code-skill/# YouTube Transcriber Pipeline - Claude Code Skill
**Version**: 1.0.0 | **Status**: Ready for Claude Code Integration
## Overview
Complete 4-skill pipeline for extracting, formatting, organizing, and archiving YouTube transcripts. Integrates with Claude Code CLI using the system's configured API keys (no manual key management needed).
## Capabilities
This skill provides a complete workflow:
1. **Extract Facts** - AI-powered fact extraction from transcripts (uses Claude API via Claude Code)
2. **Format Notes** - Convert facts to atomic markdown notes with wiki-links
3. **Integrate Obsidian** - Import notes into Obsidian vault with backlinks
4. **Archive Transcripts** - Safe archival with integrity verification
## Usage in Claude Code
```bash
# Run complete pipeline
/run-youtube-transcriber --transcript ~/Downloads/transcript.txt
# Just extract facts
/extract-transcript-facts ~/Downloads/transcript.txt
# Just format notes
/format-transcript-notes ~/Downloads/notes/ --output ./formatted
# Just integrate Obsidian
/integrate-obsidian-vault ./notes/ --vault-path ~/Documents/Obsidian/Aaron
# Just archive
/archive-transcripts ~/Downloads/transcript.txt
```
## Features
### Phase 1: Extract Facts (AI-Powered)
- Uses Claude 3.5 Sonnet via Claude Code CLI
- 7 fact categories (scientific, quotes, methods, examples, statistics, warnings, connections)
- Hierarchical topic organization
- Confidence scoring (0.0-1.0)
- Automatic metadata (actionable, philosophical, controversial)
- **No API key needed** - Uses Claude Code's configured authentication
### Phase 2: Format as Atomic Notes
- 1 fact = 1 markdown file
- Wiki-link cross-references [[Topic Name]]
- YAML front matter with metadata
- Topic indices and hierarchies
- Obsidian vault compatibility
### Phase 3: Integrate Obsidian
- Safe vault backup before changes
- Preserve directory structure
- Auto-generate backlinks
- Master index creation
- Link verification
- Dry-run mode for testing
### Phase 4: Archive Transcripts
- tar.gz compression
- MD5 integrity verification
- Metadata tracking
- Optional cleanup
- Verify-only mode
## Installation
The skill is ready to use once copied to Claude Code:
```bash
# Copy to Claude Code skills directory
cp -r ~/github/astoreyai/claude-skills/ ~/.claude/skills/youtube-transcriber
# Or individual components
cp -r youtube-transcript-extractor ~/.claude/skills/
cp -r transcript-to-logseq ~/.claude/skills/
cp -r transcript-to-obsidian ~/.claude/skills/
cp -r transcript-archiver ~/.claude/skills/
```
## Requirements
### For Claude Code CLI
- Claude Code installed and configured
- API key already configured in Claude Code (no manual setup needed)
- Python 3.11+
### For Obsidian Integration
- Obsidian vault at `~/Documents/Obsidian/Aaron/` (configurable)
- Read/write permissions to vault directory
## Architecture
```
Claude Code CLI
↓
Transcript Extractor (Phase 1)
├─ Uses Claude Code's API key automatically
├─ Extracts facts via Claude API
└─ Outputs Fact objects
↓
Logseq Formatter (Phase 2)
├─ Converts facts to atomic notes
└─ Creates markdown files
↓
Obsidian Integrator (Phase 3)
├─ Imports notes to vault
├─ Generates backlinks
└─ Creates indices
↓
Transcript Archiver (Phase 4)
├─ Compresses files
├─ Verifies integrity
└─ Stores archive
```
## Future: Ollama Integration
The architecture is designed to support local LLM models via Ollama:
```bash
# Future upgrade (when Ollama integration ready)
/run-youtube-transcriber --transcript ~/Downloads/transcript.txt \
--llm-backend ollama \
--model mistral-7b
# Benefits:
# - No API costs
# - Runs completely locally
# - Works offline
# - Full privacy (data never leaves your machine)
```
Swapping backends will be as simple as changing a configuration parameter.
## Testing
All 4 skills are fully tested:
- Phase 1: 18 tests (extraction logic)
- Phase 2: 21 tests (formatting, hierarchy)
- Phase 3: 6 tests (integration, backlinks)
- Phase 4: 18 tests (archival, verification)
- Integration: 6 tests (cross-skill compatibility)
**Total: 69/69 tests passing (100%)**
## Performance
- Fact extraction: 50-100ms per fact (via Claude API)
- Note formatting: ~500ms per note
- Obsidian integration: <1s for 100 notes
- Archival: <2s for 1000 files
## Data Flow
```
Raw Transcript (txt/md)
↓
Phase 1: Extract Facts
→ Fact objects with metadata
↓
Phase 2: Format Notes
→ Markdown files in ~/output/notes/
↓
Phase 3: Integrate Obsidian
→ Updated ~/Documents/Obsidian/Aaron/
↓
Phase 4: Archive
→ ~/youtube-transcriber/transcripts-archive/
```
## Example Workflow
### 1. Extract Facts
```bash
/extract-transcript-facts ~/Downloads/ml-lecture.txt
```
Output:
```
✓ Transcript loaded (5,234 chars)
✓ Extracting facts...
✓ Facts extracted: 12
✓ Topics identified: 4
✓ Accuracy: 92%
```
### 2. Format Notes
```bash
/format-transcript-notes ~/Downloads/ml-lecture.txt --output ./notes
```
Output:
```
✓ Creating atomic notes...
✓ Notes created: 12
✓ Topics structured: 4
✓ Output: ./notes/
```
### 3. Integrate Obsidian
```bash
/integrate-obsidian-vault ./notes/ --vault-path ~/Documents/Obsidian/Aaron
```
Output:
```
✓ Vault path: ~/Documents/Obsidian/Aaron
✓ Notes integrated: 12
✓ Backlinks created: 8
✓ Index created: Video-Transcripts/index.md
```
### 4. Archive
```bash
/archive-transcripts ~/Downloads/ml-lecture.txt
```
Output:
```
✓ Files archived: 1
✓ Archive: ~/youtube-transcriber/transcripts-archive/transcripts_20251122_143200.tar.gz
✓ Size: 45.2 KB
```
## Configuration
Edit `config.yaml` in the skill directory:
```yaml
# Paths
vault_path: "~/Documents/Obsidian/Aaron"
archive_dir: "~/youtube-transcriber/transcripts-archive"
# Extraction
max_facts: null # No limit
confidence_threshold: 0.75
extract_all_categories: true
# Formatting
notes_per_file: 1 # Atomic notes
add_wiki_links: true
create_indices: true
# Integration
backup_before_integrate: true
auto_generate_backlinks: true
dry_run_mode: false # Set true to test without changes
# Archival
compression: "gzip"
verify_after_archive: true
cleanup_originals: false
```
## Troubleshooting
### API Key Issues (If Running Standalone)
If you run the scripts outside Claude Code and get API key errors:
```bash
# Claude Code integration (recommended)
# Just use /extract-transcript-facts - no setup needed
# Or manually set API key for standalone use
export ANTHROPIC_API_KEY='sk-ant-...'
python workflow_runner.py --transcript your_file.txt
```
### Obsidian Vault Not Found
```bash
# Create vault directory if needed
mkdir -p ~/Documents/Obsidian/Aaron
# Set custom vault path
/integrate-obsidian-vault ./notes/ --vault-path /your/vault/path
```
### Permission Errors
```bash
# Ensure write access to vault and archive directories
chmod -R 755 ~/Documents/Obsidian/Aaron
mkdir -p ~/youtube-transcriber/transcripts-archive
chmod -R 755 ~/youtube-transcriber
```
## Files Included
```
youtube-transcriber/
├── youtube-transcript-extractor/ Phase 1 Skill
│ ├── src/
│ │ ├── extractor.py
│ │ ├── formatters.py
│ │ └── metadata.py
│ └── tests/ (18 tests)
│
├── transcript-to-logseq/ Phase 2 Skill
│ ├── src/
│ │ ├── logseq_formatter.py
│ │ ├── obsidian_compat.py
│ │ └── hierarchy_builder.py
│ └── tests/ (21 tests)
│
├── transcript-to-obsidian/ Phase 3 Skill
│ ├── src/
│ │ └── integrator.py
│ └── tests/ (6 tests)
│
├── transcript-archiver/ Phase 4 Skill
│ ├── src/
│ │ └── archiver.py
│ └── tests/ (18 tests)
│
├── workflow_runner.py Full pipeline runner
├── integration_test.py Cross-skill tests (6 tests)
└── config.yaml Configuration
```
## Future Roadmap
### v1.1.0 (Ollama Integration)
- [ ] Swap Claude API → Ollama for local LLMs
- [ ] Support multiple local models (Mistral, Llama2, etc)
- [ ] Offline-first workflow
### v1.2.0 (Advanced Features)
- [ ] Batch transcript processing
- [ ] Custom fact categories
- [ ] Multiple output formats
- [ ] Web UI for management
### v1.3.0 (Integration)
- [ ] Direct YouTube URL support (fetch transcripts automatically)
- [ ] Streaming support for long transcripts
- [ ] Multi-language support
## Support
- **Documentation**: See README.md in each skill directory
- **Tests**: Run `python -m pytest tests/` in each directory
- **Issues**: Check integration_test.py for validation
## License
Same as parent project (astoreyai)
---
**Status**: ✅ Ready for Claude Code integration
**Works With**:
- ✅ Claude Code CLI (uses configured API key automatically)
- ✅ Ollama (local models - future version)
- ✅ Custom LLM backends (extensible architecture)
**No API key management needed when used with Claude Code CLI**