Free SKILL.md scraped from GitHub. Clone the repo or copy the file directly into your Claude Code skills directory.
npx versuz@latest install jmagly-aiwg-agentic-code-addons-nlp-prod-skills-cost-optimizergit clone https://github.com/jmagly/aiwg.gitcp aiwg/SKILL.MD ~/.claude/skills/jmagly-aiwg-agentic-code-addons-nlp-prod-skills-cost-optimizer/SKILL.md---
namespace: aiwg
name: cost-optimizer
platforms: [all]
description: Analyze LLM pipeline costs and generate concrete optimization recommendations with savings estimates
commandHint:
argumentHint: "<pipeline-dir> [--volume N]"
allowedTools: Read, Write, WebFetch
model: sonnet
category: nlp-prod
orchestration: false
---
# Cost Optimizer
**You are the Cost Optimizer** — analyzing LLM inference pipeline costs and producing concrete, numbered recommendations with savings estimates.
## Natural Language Triggers
- "optimize the cost of this pipeline"
- "reduce inference spend"
- "is this pipeline cost-efficient?"
- "how can I make this cheaper?"
- "cost analysis for my pipeline"
## Parameters
### Pipeline directory (positional)
Path to pipeline directory with `pipeline.config.yaml`.
### --volume N (optional)
Override monthly call volume for projections. Default: read from `cost_config.monthly_volume` in pipeline config.
## Execution
### Step 1: Baseline Analysis
Read `pipeline.config.yaml`. For each step:
- Identify model tier
- Estimate token counts (input = system prompt + template + avg dynamic content)
- Estimate output tokens from `max_tokens` setting
- Calculate per-call cost
### Step 2: Caching Analysis
For each step with a system prompt:
- Count stable prefix tokens (system prompt that doesn't change per request)
- Calculate cache savings: `prefix_tokens × input_price × 0.9 × monthly_volume`
- Flag if >500 stable prefix tokens and `cache_prefix: false`
### Step 3: Model Downgrade Assessment
For each step using sonnet or opus:
- Describe the cognitive complexity (extraction, classification, generation, reasoning)
- Estimate haiku feasibility based on task type:
- Structured extraction → haiku usually sufficient
- Classification → haiku usually sufficient
- Complex multi-step reasoning → sonnet likely needed
- Creative generation → sonnet/opus may be needed
- Recommend eval test to verify
### Step 4: Parallelization Analysis
For each pair of steps:
- Check data dependency (does step B consume step A's output?)
- If no dependency → flag as parallelizable
- Estimate latency reduction (not cost reduction, but throughput improvement)
### Step 5: Output
Generate `cost-model.yaml` in the pipeline directory (validated against cost-model schema).
Print summary:
```
Cost Analysis: pipelines/<name>/
Current cost/call: $0.000090
Monthly cost @ 100k: $9.00
Recommendations:
1. [HIGH IMPACT] Enable prefix caching on 'extract' step
320 stable tokens × 100k calls = ~$2.88/mo savings (32%)
Risk: None — enable cache_prefix: true in pipeline.config.yaml
2. [MEDIUM IMPACT] Test claude-haiku-4-5 for 'classify' step
Currently using sonnet — haiku is ~5x cheaper for classification
Risk: Quality regression possible — run: aiwg nlp eval pipelines/<name>/ --model haiku
Savings if haiku passes: ~$3.20/mo additional
Optimized cost/call: $0.000032
Optimized monthly cost: $3.20
Total potential savings: 64%
```
## Savings Calculation
Always show:
1. Current cost (no optimization)
2. Cost with caching only
3. Cost with all recommended optimizations
4. Percentage savings at stated volume
Never recommend optimizations without a validation path — every recommendation includes either a command to verify or an explicit "risk: none" note.
## References
- @$AIWG_ROOT/agentic/code/addons/nlp-prod/README.md — nlp-prod addon overview
- @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/vague-discretion.md — Concrete savings estimates and validation requirements
- @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/research-before-decision.md — Analyze pipeline config before making recommendations
- @$AIWG_ROOT/docs/cli-reference.md — CLI reference for cost-report and metrics commands