
cost-trend

Read every docs/benchmarks/runs/*.json and surface drift in win rate, latency, escalation rate, and LLM-baseline cost over time



§ 02 — Install

Get cost-trend.

Free SKILL.md scraped from GitHub. Clone the repo or copy the file directly into your Claude Code skills directory.

One-line install · Claude Code

$ npx versuz@latest install ruvnet-ruflo-plugins-ruflo-cost-tracker-skills-cost-trend

Or clone the repo

$ git clone https://github.com/ruvnet/ruflo.git

Or copy the SKILL.md manually

$ cp ruflo/SKILL.MD ~/.claude/skills/ruvnet-ruflo-plugins-ruflo-cost-tracker-skills-cost-trend/SKILL.md


---
name: cost-trend
description: Read every docs/benchmarks/runs/*.json and surface drift in win rate, latency, escalation rate, and LLM-baseline cost over time
argument-hint: ""
allowed-tools: Bash
---

# Cost Trend

The smoke gate is binary: `winRate ≥ 0.80` either passes or fails. Benchmark runs captured over time form a curve, and a curve catches regressions the gate misses. A win rate that slides from 100% to 85% still passes the smoke gate, yet it is a real degradation.
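
To make the gate-versus-curve distinction concrete, here is a minimal sketch. Only the `0.80` threshold comes from the text above; the function names and the sample win rates are invented for illustration and are not taken from the ruflo scripts.

```javascript
// Threshold taken from the smoke gate described above.
const SMOKE_THRESHOLD = 0.80;

// Binary gate: looks at one run at a time.
function passesSmokeGate(winRate) {
  return winRate >= SMOKE_THRESHOLD;
}

// Curve view: compares the last run against the first run in a series.
function winRateDrift(winRates) {
  return winRates[winRates.length - 1] - winRates[0];
}

// Hypothetical win rates, oldest first (illustrative data, not real results).
const sampleWinRates = [1.0, 0.95, 0.9, 0.85];

const allPass = sampleWinRates.every((rate) => passesSmokeGate(rate)); // every run clears the gate
const drift = winRateDrift(sampleWinRates); // roughly -0.15, which the gate never sees

console.log(`all runs pass: ${allPass}, drift: ${(drift * 100).toFixed(1)} points`);
```

Every run in the sample passes the gate on its own, while the first-to-last comparison shows a drop of about 15 points.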

This skill reads every persisted run in `docs/benchmarks/runs/*.json` and reports first→last deltas plus a per-run series, flagging regressions in win rate or latency.
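
The first→last comparison could look roughly like the sketch below. It assumes the runs are already loaded and sorted oldest-first. Apart from `winRate`, the metric field names (`avgLatencyMs`, `escalationRate`) are hypothetical, since the run JSON schema is not shown here.

```javascript
// Metric names other than winRate are hypothetical; the real run schema may differ.
const METRICS = ["winRate", "avgLatencyMs", "escalationRate"];

// Given runs sorted oldest-first, return first, last, and delta for each metric.
function firstLastDeltas(runs) {
  if (runs.length === 0) return {};
  const first = runs[0];
  const last = runs[runs.length - 1];
  const summary = {};
  for (const metric of METRICS) {
    // Skip metrics that either run did not record.
    if (typeof first[metric] !== "number" || typeof last[metric] !== "number") continue;
    summary[metric] = {
      first: first[metric],
      last: last[metric],
      delta: last[metric] - first[metric],
    };
  }
  return summary;
}

// Illustrative data only, not real benchmark results.
const exampleRuns = [
  { winRate: 1.0, avgLatencyMs: 2, escalationRate: 0 },
  { winRate: 0.75, avgLatencyMs: 4, escalationRate: 0.25 },
];
const deltas = firstLastDeltas(exampleRuns);
console.log(deltas);
```

Skipping metrics that a run did not record keeps older runs usable even if later runs added fields.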

## When to use

- Before a release: check that the speedup hasn't drifted.
- After expanding the corpus: verify that the win rate still holds, remembering that each older run reflects the corpus version it was recorded against.
- After upgrading `agent-booster`: surface latency or strategy changes.

## Steps

1. **Run the trend script** from the project root:

```bash
node plugins/ruflo-cost-tracker/scripts/trend.mjs
```

Optional environment variables:
- `TREND_FORMAT=json` — emit JSON instead of markdown
- `TREND_LIMIT=10` — consider only the most recent N runs
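
How `trend.mjs` parses these two variables is not shown here, so the sketch below is only one plausible reading: `TREND_FORMAT` selects JSON over the default markdown, and `TREND_LIMIT` keeps the most recent N runs. The function names `parseTrendOptions` and `applyLimit` are hypothetical.

```javascript
// One plausible interpretation of the two optional variables; not the script's actual code.
function parseTrendOptions(env) {
  const format = env.TREND_FORMAT === "json" ? "json" : "markdown";
  const parsed = Number.parseInt(env.TREND_LIMIT ?? "", 10);
  // Assumption: a missing or non-positive value means "no limit".
  const limit = Number.isInteger(parsed) && parsed > 0 ? parsed : Infinity;
  return { format, limit };
}

// Keep only the most recent N runs, assuming runs are sorted oldest-first.
function applyLimit(runs, limit) {
  return Number.isFinite(limit) ? runs.slice(-limit) : runs;
}

const options = parseTrendOptions({ TREND_FORMAT: "json", TREND_LIMIT: "2" });
const recent = applyLimit(["run-a", "run-b", "run-c"], options.limit);
console.log(options, recent);
```

From a POSIX shell, both variables can be set for a single invocation by prefixing the command: `TREND_FORMAT=json TREND_LIMIT=10 node plugins/ruflo-cost-tracker/scripts/trend.mjs`.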

2. **Inspect the drift summary**, which compares the first and last runs on win rate, average latency, p99 latency, escalation rate, and speedup vs Gemini.

3. **Inspect the per-run series** — one row per run, including Sonnet 4.6 + Opus 4.7 baseline latencies if those were enabled (`BENCH_ANTHROPIC=1` at run time).

4. **Regression flags** — the script emits `> ⚠ Regression` callouts when:
- Win rate dropped between the first and last run
- Average latency in the last run is ≥ 1.5× that of the first run
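
The two regression rules can be sketched as a small pure function. This reads the latency rule as "the last run's average is at least 1.5× the first run's". The `avgLatencyMs` field name and the wording after `> ⚠ Regression` are assumptions, not the script's actual output.

```javascript
// Factor from the latency rule above.
const LATENCY_FACTOR = 1.5;

// avgLatencyMs is a hypothetical field name; winRate appears in the source.
function regressionFlags(first, last) {
  const found = [];
  if (last.winRate < first.winRate) {
    found.push(`> ⚠ Regression: win rate dropped from ${first.winRate} to ${last.winRate}`);
  }
  if (first.avgLatencyMs > 0 && last.avgLatencyMs >= LATENCY_FACTOR * first.avgLatencyMs) {
    found.push(`> ⚠ Regression: avg latency rose from ${first.avgLatencyMs} to ${last.avgLatencyMs}`);
  }
  return found;
}

// Illustrative data only: win rate falls and latency reaches exactly 1.5x.
const sampleFlags = regressionFlags(
  { winRate: 1.0, avgLatencyMs: 2 },
  { winRate: 0.9, avgLatencyMs: 3 }
);
console.log(sampleFlags.join("\n"));
```

Both rules compare only the first and last runs, so a dip that recovers before the final run would not be flagged under this reading.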

## Cross-references

- `cost-benchmark` — the producer of the run JSONs this skill consumes
- `bench/booster-corpus.json` — the corpus version is recorded in each run, so trends across corpus versions remain interpretable
- `docs/benchmarks/runs/latest.json` — the most-recent run; smoke step 23 gates on `winRate ≥ 0.80` from this file
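
Because each run records its corpus version, one way to keep comparisons interpretable is to group runs by that version before reading the trend. The sketch below assumes a hypothetical `corpusVersion` field; the actual key name in the run JSON is not given here.

```javascript
// corpusVersion is a hypothetical field name; the source only says the version is recorded per run.
function groupByCorpusVersion(runs) {
  const groups = new Map();
  for (const run of runs) {
    const key = run.corpusVersion ?? "unknown";
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(run);
  }
  return groups;
}

// Illustrative data only, not real benchmark results.
const grouped = groupByCorpusVersion([
  { corpusVersion: "v1", winRate: 1.0 },
  { corpusVersion: "v1", winRate: 0.95 },
  { corpusVersion: "v2", winRate: 0.9 },
]);
for (const [version, versionRuns] of grouped) {
  console.log(version, versionRuns.map((r) => r.winRate));
}
```

A drop that coincides with a version change may reflect a harder corpus rather than a regression, which is why the grouping is worth doing before drawing conclusions.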
