Free SKILL.md scraped from GitHub. Clone the repo or copy the file directly into your Claude Code skills directory.
npx versuz@latest install nyldn-claude-octopus-skills-skill-extractgit clone https://github.com/nyldn/claude-octopus.gitcp claude-octopus/SKILL.MD ~/.claude/skills/nyldn-claude-octopus-skills-skill-extract/SKILL.md---
name: skill-extract
description: "Reverse-engineer design systems, tokens, and components from live products or screenshots"
---
> **Host: Codex CLI** — This skill was designed for Claude Code and adapted for Codex.
> Cross-reference commands use installed skill names in Codex rather than `/octo:*` slash commands.
> Use the active Codex shell and subagent tools. Do not claim a provider, model, or host subagent is available until the current session exposes it.
> For host tool equivalents, see `skills/blocks/codex-host-adapter.md`.
# Extract Skill - Implementation Guide
## Overview
The `extract` skill provides comprehensive reverse-engineering capabilities for design systems and product architectures. It transforms undocumented codebases into structured, implementation-ready documentation.
## Capabilities
### Design System Extraction
- **Token Extraction**: Colors, typography, spacing, shadows from code or CSS
- **Component Analysis**: Props, variants, usage patterns across React/Vue/Svelte
- **Pattern Detection**: Layout patterns, design rules, accessibility guidelines
- **Storybook Generation**: Auto-generated stories with variants and controls
### Product Architecture Extraction
- **Service Detection**: Microservice boundaries, modules, domain boundaries
- **API Mapping**: REST, GraphQL, tRPC, gRPC endpoint cataloging
- **Data Modeling**: ORM schema extraction (Prisma, TypeORM, Sequelize)
- **Feature Inventory**: Route-based and domain-based feature detection
- **C4 Diagrams**: Automated architecture visualization (Mermaid)
## Technical Implementation
### Token Extraction Pipeline
**Priority Order** (High to Low Confidence):
1. **Code-Defined** (95%): `theme.ts`, `tokens.json`, Tailwind config
2. **CSS Variables** (90%): `:root` declarations
3. **Computed Styles** (60%): DOM analysis
4. **Inferred** (40-60%): Color clustering, scale detection
**Color Clustering Algorithm**:
- Uses CIEDE2000 for perceptually-accurate color distance
- K-means++ initialization for stable clustering
- Default k=8 clusters for primary palettes
- ΔE < 2 threshold for duplicate detection
### Component Analysis
**Detection Strategies**:
- AST parsing for TypeScript/JavaScript
- Prop extraction from interfaces and PropTypes
- Variant detection from union types
- Usage tracking across codebase
**Supported Frameworks**:
- React (functional, class, hooks)
- Vue (SFC, Composition API, Options API)
- Svelte (script/template separation)
### Architecture Detection
**Service Boundary Heuristics**:
- Package.json in subdirectories
- Independent deployment configs
- Team ownership boundaries
- Communication pattern analysis
**API Endpoint Detection**:
- Decorator-based routing (NestJS, routing-controllers)
- Express/Fastify route definitions
- GraphQL resolver classes
- tRPC router procedures
- Protocol Buffer (.proto) files
## Multi-AI Orchestration
When enabled, the extract feature uses multiple AI providers for higher accuracy:
**Provider Roles**:
- **Claude**: Synthesis, conflict resolution, documentation
- **Codex**: Code-level analysis, type extraction, architecture
- **Gemini**: Pattern recognition, alternative interpretations, UX insights
**Consensus Mechanism**:
- Threshold: 67% (2/3 providers must agree)
- Disagreements logged in `90_evidence/disagreements.md`
- Confidence scores attached to all outputs
## Output Structure
```
octopus-extract/
└── project-name/
└── timestamp/
├── README.md # Navigation and summary
├── metadata.json # Extraction parameters
│
├── 00_intent/
│ ├── answers.json # User intent responses
│ ├── intent-contract.md # Human-readable summary
│ └── detection-report.md # Stack auto-detection results
│
├── 10_design/
│ ├── tokens.json # W3C Design Tokens format
│ ├── tokens.css # CSS custom properties
│ ├── tokens.md # Human-readable token docs
│ ├── components.csv # Component inventory (tabular)
│ ├── components.json # Structured component data
│ ├── patterns.md # Layout and design patterns
│ └── storybook/ # Storybook scaffold (optional)
│ ├── .storybook/
│ └── stories/
│
├── 20_product/
│ ├── product-overview.md # What, who, key journeys
│ ├── feature-inventory.md # Features by domain
│ ├── architecture.md # C4 text description
│ ├── architecture.mmd # Mermaid C4 diagrams
│ ├── PRD.md # AI-agent executable PRD
│ ├── user-stories.md # Gherkin-style scenarios
│ ├── api-contracts.md # Endpoint specifications
│ ├── data-model.md # Entity relationships
│ └── implementation-plan.md # Phased milestones
│
└── 90_evidence/
├── quality-report.md # Coverage and confidence metrics
├── disagreements.md # Multi-AI conflicts
├── extraction-log.md # Timestamped progress log
└── references.json # File paths per claim
```
## Quality Gates
Automated validation ensures extraction quality:
1. **Token Coverage**: Fail if 0 tokens in design mode
2. **Component Coverage**: Warn if < 50% of component files detected
3. **Architecture Completeness**: Warn if no services detected in product mode
4. **Multi-AI Consensus**: Fail if < 50% agreement on key outputs
## Usage Patterns
### Basic Extraction
```bash
/octo:extract ./my-app
```
### Design-Only Extraction
```bash
/octo:extract ./my-app --mode design --storybook true
```
### Deep Analysis with Multi-AI
```bash
/octo:extract ./my-app --depth deep --multi-ai force
```
### URL Extraction
```bash
/octo:extract https://example.com --mode design --depth quick
```
## Integration with Other Skills
- **/octo:review**: Review extracted outputs for quality
- **/octo:deliver**: Validate extraction completeness
- **/octo:docs**: Generate additional documentation from extractions
## Error Handling
Common error codes:
- `ERR-001`: Invalid input (path/URL not found)
- `ERR-002`: Network timeout (URL extraction)
- `ERR-003`: Permission denied
- `ERR-004`: Out of memory (use `--depth quick`)
- `VAL-001`: Validation failed (no tokens detected)
- `VAL-004`: Low multi-AI consensus
## Performance Targets
| Depth | Time Target | Coverage Target |
|-------|-------------|-----------------|
| Quick | < 2 min | 70% coverage, basic analysis |
| Standard | 2-5 min | 85% coverage, comprehensive |
| Deep | 5-15 min | 95% coverage, multi-AI validation |
## Research Sources
This skill is informed by research on:
- [Tokens Studio](https://tokens.studio/) - Design token automation
- [Superposition](https://superposition.design/) - Token extraction from websites
- [W3C Design Tokens](https://www.designtokens.org/) - Token format standard
- [C4 Model](https://c4model.com/) - Architecture diagramming
- Modern reverse-engineering practices (2026)
## Implementation Status
**Current Version**: 1.0.0 (Skeleton)
**Implemented**:
- ✅ Command structure
- ✅ CLI argument parsing
- ✅ Output directory setup
- ✅ Metadata generation
- ✅ Multi-AI detection
**In Progress**:
- 🚧 Token extraction pipeline
- 🚧 Component analysis engine
- 🚧 Architecture detection
- 🚧 PRD generation
- 🚧 Quality gates
**Planned**:
- ⏳ Storybook scaffold generation
- ⏳ C4 diagram generation
- ⏳ URL extraction mode
- ⏳ CSS inference algorithms
## Contributing
See implementation plan in project documentation.
Implementation phases:
1. Foundation & CLI (Week 1)
2. Auto-Detection Engine (Week 2)
3. Design Extraction (Week 3-4)
4. Product Extraction (Week 5-6)
5. Multi-AI Orchestration (Week 7)
6. Quality Gates (Week 8)
7. Testing & Documentation (Week 10)
*This skill implements the design specified in PRD v2.0 (AI-Executable)*