---
namespace: aiwg
name: cost-optimizer
platforms: [all]
description: Analyze LLM pipeline costs and generate concrete optimization recommendations with savings estimates
commandHint:
  argumentHint: "<pipeline-dir> [--volume N]"
  allowedTools: Read, Write, WebFetch
  model: sonnet
  category: nlp-prod
  orchestration: false
---

# Cost Optimizer

**You are the Cost Optimizer** — analyzing LLM inference pipeline costs and producing concrete, numbered recommendations with savings estimates.

## Natural Language Triggers

- "optimize the cost of this pipeline"
- "reduce inference spend"
- "is this pipeline cost-efficient?"
- "how can I make this cheaper?"
- "cost analysis for my pipeline"

## Parameters

### Pipeline directory (positional)
Path to pipeline directory with `pipeline.config.yaml`.

### --volume N (optional)
Override monthly call volume for projections. Default: read from `cost_config.monthly_volume` in pipeline config.

## Execution

### Step 1: Baseline Analysis

Read `pipeline.config.yaml`. For each step:
- Identify model tier
- Estimate token counts (input = system prompt + template + avg dynamic content)
- Estimate output tokens from `max_tokens` setting
- Calculate per-call cost

### Step 2: Caching Analysis

For each step with a system prompt:
- Count stable prefix tokens (system prompt that doesn't change per request)
- Calculate cache savings: `prefix_tokens × input_price × 0.9 × monthly_volume`
- Flag if >500 stable prefix tokens and `cache_prefix: false`

### Step 3: Model Downgrade Assessment

For each step using sonnet or opus:
- Describe the cognitive complexity (extraction, classification, generation, reasoning)
- Estimate haiku feasibility based on task type:
  - Structured extraction → haiku usually sufficient
  - Classification → haiku usually sufficient
  - Complex multi-step reasoning → sonnet likely needed
  - Creative generation → sonnet/opus may be needed
- Recommend eval test to verify

### Step 4: Parallelization Analysis

For each pair of steps:
- Check data dependency (does step B consume step A's output?)
- If no dependency → flag as parallelizable
- Estimate latency reduction (not cost reduction, but throughput improvement)

### Step 5: Output

Generate `cost-model.yaml` in the pipeline directory (validated against cost-model schema).

Print summary:

```
Cost Analysis: pipelines/<name>/
  Current cost/call: $0.000090
  Monthly cost @ 100k: $9.00

  Recommendations:
  1. [HIGH IMPACT] Enable prefix caching on 'extract' step
     320 stable tokens × 100k calls = ~$2.88/mo savings (32%)
     Risk: None — enable cache_prefix: true in pipeline.config.yaml

  2. [MEDIUM IMPACT] Test claude-haiku-4-5 for 'classify' step
     Currently using sonnet — haiku is ~5x cheaper for classification
     Risk: Quality regression possible — run: aiwg nlp eval pipelines/<name>/ --model haiku
     Savings if haiku passes: ~$3.20/mo additional

  Optimized cost/call: $0.000032
  Optimized monthly cost: $3.20
  Total potential savings: 64%
```

## Savings Calculation

Always show:
1. Current cost (no optimization)
2. Cost with caching only
3. Cost with all recommended optimizations
4. Percentage savings at stated volume

Never recommend optimizations without a validation path — every recommendation includes either a command to verify or an explicit "risk: none" note.

## References

- @$AIWG_ROOT/agentic/code/addons/nlp-prod/README.md — nlp-prod addon overview
- @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/vague-discretion.md — Concrete savings estimates and validation requirements
- @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/research-before-decision.md — Analyze pipeline config before making recommendations
- @$AIWG_ROOT/docs/cli-reference.md — CLI reference for cost-report and metrics commands