Free SKILL.md scraped from GitHub. Clone the repo or copy the file directly into your Claude Code skills directory.
npx versuz@latest install seb155-atlas-plugin-skills-skill-canary-deployergit clone https://github.com/seb155/atlas-plugin.gitcp atlas-plugin/SKILL.MD ~/.claude/skills/seb155-atlas-plugin-skills-skill-canary-deployer/SKILL.md---
name: skill-canary-deployer
description: "Skill canary deployment + auto-rollback. 10% traffic mirror, error budget >2% = rollback. Use when 'canary deploy', 'gradual rollout', 'rollback skill'."
mode: [ops, engineering]
effort: high
version: 1.0.0
tier: [admin]
---
# Skill Canary Deployer — Gradual Rollout + Auto-Rollback
Canary-deploy a new skill version to **10% of session traffic**, watch its
error rate via the W1.2 `skill-scorecard` JSONL telemetry, and **auto-rollback**
when the rolling error rate exceeds a 2% budget over a 50-invocation window.
This is the v7.0 W3.2 quality gate that lets you ship skill changes without
blast-radius surprises. It pairs with W3.1 (regression test gate, blocks bad
versions before merge) — W3.1 is the *pre-merge* defense, W3.2 is the
*post-merge progressive-rollout* defense.
> **HITL gate (W3.2 mandatory)**: the **first three canaries** ever deployed
> require explicit Seb sign-off before promotion. After three successful
> rollouts, auto-promote becomes the default and Seb is notified async.
## When to invoke
- "canary deploy <skill>"
- "gradual rollout"
- "deploy skill behind a flag"
- "rollback the canary for <skill>"
- After tagging a new version of an existing admin/dev skill
- Whenever a skill change is risky enough to merit shadow traffic before full
rollout (e.g. prompt rewrite, model-tier downgrade, new tool dependency)
## Architecture
```
┌──────────────────────────────────────────────────────────────────────┐
│ atlas-plugin/ atlas-plugin-staging/ (DESIGN ONLY) │
│ skills/<name>/SKILL.md skills/<name>/SKILL.md │
│ ▲ ▲ │
│ │ baseline │ canary │
│ │ (90%) │ (10%) │
│ │ │ │
│ └──────────┬───────────────────┘ │
│ │ │
│ hooks/canary-traffic-mirrorer (PreToolUse Skill) │
│ ├─ deterministic coin-flip per session │
│ ├─ writes ATLAS_SKILL_VERSION=canary|baseline │
│ └─ updateEnv via JSON output │
│ │
│ ~/.atlas/canary-state/<skill>.json │
│ { canary_version, baseline_version, traffic_pct, │
│ invocations, errors, status, started_at } │
│ │
│ ~/.atlas/scorecards/<skill>/YYYY-MM-DD.jsonl ← W1.2 │
│ (data source: error_rate over rolling 50 invocations) │
│ │
│ atlas-routines hourly scan: │
│ → if rolling error_rate > 2% over last 50 → ROLLBACK │
│ → if invocations >= 200 and error_rate < 2% → eligible promote │
└──────────────────────────────────────────────────────────────────────┘
```
### Staging area (`atlas-plugin-staging/`)
The canary lives in a parallel git tree co-located with the main plugin
checkout. **Design only for v7.0 W3.2** — actual cross-tree dispatch
infrastructure is deferred (BKL-W3.2-INFRA). The `ATLAS_SKILL_VERSION`
env var is the contract; consumers (Skill tool, Read of `SKILL.md`)
honor it once that infra lands. For now, the env var is recorded for
downstream telemetry / future routing and the hook is observable via
`~/.atlas/canary-state/_decisions.jsonl`.
### Traffic split (10%)
The PreToolUse hook computes a deterministic coin-flip per session:
```
seed = sha256("$CLAUDE_SESSION_ID:$skill_name") modulo 100
canary chosen ⇔ seed < traffic_pct (default 10)
```
This guarantees a session that picks canary on the first invocation keeps
picking canary for every subsequent invocation of the same skill — no
mid-session flapping that would smear the metrics across versions.
`ATLAS_CANARY_SKILL_OVERRIDE=<skill>` forces canary for that skill in the
current shell (manual override for debugging / dogfooding).
### Error budget (2% over 50 invocations)
Source of truth: `~/.atlas/scorecards/<skill>/YYYY-MM-DD.jsonl` written by
W1.2 `hooks/scorecard-emitter`. Each line:
```json
{"ts":"2026-05-01T18:30:00Z","skill":"foo","duration_ms":182,
"status":"ok","tokens_used":1240,"cost_usd":0.012,"trace_id":"…"}
```
Aggregator:
```
canary_invocations = lines where status in (ok,error) AND ATLAS_SKILL_VERSION=canary
in the last 50 canary lines (across all daily files)
canary_errors = subset where status == "error"
error_rate = canary_errors / max(1, canary_invocations)
```
When `canary_invocations >= 50` and `error_rate > 0.02`:
- mark `status=rolled_back` in canary state
- emit `~/.atlas/canary-state/_decisions.jsonl` audit line
- log to ATLAS audit channel
- **HITL #1-3**: also page Seb (notify-only — rollback already done)
When `canary_invocations >= 200` and `error_rate <= 0.02`:
- mark `status=eligible_promote`
- HITL #1-3: AskUserQuestion before flipping
- HITL ≥4: auto-promote and emit notification
## CLI
| Command | Behavior |
|---------|----------|
| `atlas canary deploy <skill@version>` | Register canary; init state; first-call setup the staging path; bump traffic to 10%. Refuses if a canary already exists for that skill (must rollback first). |
| `atlas canary status [--all\|--skill <name>]` | Tabulate active canaries: skill, canary_version, baseline_version, invocations, error_rate, status. Default = `--all`. |
| `atlas canary promote <skill@version>` | Move canary → baseline. HITL gate for canaries #1-3; otherwise immediate. Updates state, archives decision line, removes ATLAS_CANARY_SKILL_OVERRIDE notes from shell init. |
| `atlas canary rollback <skill>` | Force rollback. Sets traffic_pct=0, status=rolled_back, removes hook bindings for that skill. Emits decision line with reason="manual". |
All four commands are thin wrappers — the implementation surface is
intentionally small (a single Python module under
`scripts/atlas_canary.py` per the plan, deferred infra) so the SKILL
itself is purely the *protocol* + *audit guarantees*.
## State file schema
`~/.atlas/canary-state/<skill>.json`:
```json
{
"skill": "memory-dream",
"canary_version": "1.4.0",
"baseline_version": "1.3.2",
"traffic_pct": 10,
"invocations": 42,
"errors": 1,
"status": "running",
"started_at": "2026-05-01T17:00:00Z",
"hitl_index": 2,
"rollback_threshold": 0.02,
"promote_threshold_invocations": 200
}
```
Allowed `status` values: `running` | `rolled_back` | `eligible_promote`
| `promoted` | `paused`.
## Auto-rollback routine (cron-equivalent)
Implementation is `atlas-routines` (15-min interval, headless via
Anthropic Routines API). Pseudocode:
```python
for state in glob("~/.atlas/canary-state/*.json"):
if state.status != "running": continue
metrics = aggregate_scorecard(state.skill, version="canary", window=50)
if metrics.invocations < 50: continue
if metrics.error_rate > state.rollback_threshold:
rollback(state.skill, reason="error_budget_exceeded",
actual_rate=metrics.error_rate)
notify_seb(state) # always, regardless of HITL index
elif metrics.invocations >= state.promote_threshold_invocations:
if state.hitl_index < 3:
ask_user_question(state) # blocking AskUserQuestion
else:
promote(state.skill)
notify_seb(state, channel="async")
```
The routine is idempotent (re-reads state every tick) and crash-safe
(state writes use `tmp + atomic rename`).
## Hook contract
`hooks/canary-traffic-mirrorer` (PreToolUse, event filter: tool=Skill):
- Reads stdin (Claude Code hook payload JSON)
- If `tool_name != "Skill"`, exits 0 silently
- Reads `~/.atlas/canary-state/<skill>.json`; if missing, exits 0
- If `ATLAS_CANARY_SKILL_OVERRIDE=<skill>`, force `version=canary`
- Else compute `coin_flip = sha256("$CLAUDE_SESSION_ID:$skill") % 100`;
pick `canary` iff `coin_flip < state.traffic_pct`
- Emit `{"updateEnv":{"ATLAS_SKILL_VERSION":"<canary|baseline>"}}` to stdout
- Append a one-liner to `~/.atlas/canary-state/_decisions.jsonl` for audit
- Always exit 0 (never block tool dispatch)
Performance budget: <5ms P95 (matches the trace-id-injector budget; the
hook does at most one stat + one read + one append).
## HITL flow (gates W3.2 #1, #2, #3)
For the first three canaries deployed across the lifetime of an ATLAS
install (tracked via `~/.atlas/canary-state/_hitl-counter.json`), the
promote step is **blocking on Seb**:
```
ATLAS uses AskUserQuestion (per global rule "AskUserQuestion = OBLIGATOIRE"):
Question: "Canary <skill@1.4.0> reached 200 invocations with 0.5% error rate.
Promote to baseline?"
Options: ["Promote now", "Hold (collect more data)", "Rollback"]
```
After three successful HITL-gated promotions, the counter freezes and
auto-promote is the default. Rollback notifications remain blocking for
the engineer (so a 3 AM rollback wakes someone) — the auto-rollback
itself is *immediate*, but the human follow-up is mandatory.
## Reuse pointer (W1.2 skill-scorecard)
The error rate comes **entirely** from W1.2's scorecard JSONL — this skill
adds **zero new emission paths**. We re-use:
- `~/.atlas/scorecards/<skill>/YYYY-MM-DD.jsonl` (W1.2 hooks/scorecard-emitter)
- The `error_rate` aggregation primitive documented in `skill-scorecard`
- The same `status` field semantics (`ok|error`)
If the hook ever needs a per-version dimension, we add a `version` field
to the scorecard schema (W1.2 owns it). For v7.0 W3.2, the hook records
the version in `_decisions.jsonl` and the routine joins the two streams
on `(skill, ts-bucket)`.
## See Also
- W1.1 atlas-trace (parent observability layer)
- W1.2 skill-scorecard (error_rate data source — REUSED)
- W3.1 skill-regression-gate (pre-merge sibling)
- W4.x progressive-deploy (cross-skill orchestration, future)
- Anthropic Routines API (atlas-routines skill, hourly cron)
- Plan SSoT: `.blueprint/plans/ultrathink-regarde-ce-qui-abundant-petal.md` §H W3.2
## Anti-patterns (do NOT do)
1. **Bypass the coin-flip** by hard-coding `ATLAS_SKILL_VERSION=canary` in
`~/.bashrc`. That smears the canary metric across 100% of your sessions
and the auto-rollback math becomes meaningless. Use
`ATLAS_CANARY_SKILL_OVERRIDE=<skill>` for the single-shell override.
2. **Promote without invocations**. The 200-invocation floor is the
statistical-significance budget. Lowering it (state file edit) means
you ship on noise.
3. **Multi-skill canary in a single session**. The hook supports it but
correlated failures across skills become impossible to attribute.
Deploy one canary at a time per session.
4. **Edit state files by hand**. Use the CLI; manual edits skip the audit
line and break the routine's diff detection.
5. **Skip HITL on canary #1-3**. The first three deployments calibrate
*your* operator intuition for the system. Skipping = no calibration.
## Verification
```bash
# 1. Frontmatter valid
head -10 skills/skill-canary-deployer/SKILL.md \
| python3 -c "import sys, yaml; doc=sys.stdin.read().split('---')[1]; print(yaml.safe_load(doc))"
# 2. Hook syntax valid
bash -n hooks/canary-traffic-mirrorer
# 3. Hook emits no env update when no canary state exists (default-off)
echo '{"tool_name":"Skill","tool_input":{"skill":"nonexistent"}}' \
| bash hooks/canary-traffic-mirrorer
# Expected: empty stdout, exit 0
# 4. Hook flips canary deterministically when state file present
mkdir -p ~/.atlas/canary-state
cat > ~/.atlas/canary-state/test-skill.json <<'EOF'
{"skill":"test-skill","canary_version":"2.0","baseline_version":"1.0",
"traffic_pct":100,"invocations":0,"errors":0,"status":"running"}
EOF
echo '{"tool_name":"Skill","tool_input":{"skill":"test-skill"}}' \
| CLAUDE_SESSION_ID=test bash hooks/canary-traffic-mirrorer
# Expected: stdout contains '"ATLAS_SKILL_VERSION":"canary"' (100% traffic)
rm ~/.atlas/canary-state/test-skill.json
```