---
name: skill-discovery-loop
description: "Voyager-style skill auto-discovery loop: idle-curiosity propose gap → skill-creator draft → adversary test → canary deploy → eval gate → promote OR rollback. HITL gate first 3 auto-skills."
mode: [personal, all]
effort: high
version: 1.0.0
tier: [admin]
---
# Skill Discovery Loop — Voyager Autonomy Pattern
Closed-loop skill auto-discovery inspired by **NVIDIA Voyager** (Wang et al.,
2023): an autonomous agent proposes new abilities, drafts them, stress-tests
them, deploys them progressively, and promotes or discards them based on
empirical evidence — all without a human in the inner loop.
This is the v7.0 W4.1 capstone of the autonomy stack. It does **not**
introduce new logic: it **composes** existing skills (one per stage for
Stages 1-5, plus `forgejo-pr` and `decision-log` for Stage 6) into a single
self-perpetuating workflow that grows the ATLAS skill library over time.
> **HITL gate (W4.1 mandatory)**: the **first three auto-created skills**
> require explicit Seb sign-off at the promotion stage. After three
> successful auto-skills land in `main` and survive 30 days without rollback,
> the loop transitions to **fully autonomous** — Seb is notified async via
> Routines, and only intervenes on explicit failure escalations.
## When to invoke
- `/atlas skill-discovery-loop --once` — manual single iteration
- `/atlas skill-discovery-loop --status` — list recent auto-skills + state
- Daily cron via Anthropic Routines (target cadence: ≥1 auto-skill/month
at steady state, Q4 2026)
- After idle-curiosity surfaces a gap with `confidence ≥ 0.7`
- When the user asks to "grow the skill library", "Voyager loop",
"auto-discover skills", "self-improve ATLAS"
## Workflow — The Voyager Loop (6 stages)
```
┌──────────────────────────────────────────────────────────────────────┐
│  Stage 1: PROPOSE (idle-curiosity)                                   │
│    └─ Detect skill gap from session telemetry                        │
│    └─ Output: gap-proposal.json {topic, evidence, confidence}        │
│          │                                                           │
│          ▼                                                           │
│  Stage 2: DRAFT (skill-creator)                                      │
│    └─ Generate SKILL.md frontmatter + body from proposal             │
│    └─ Output: skills/<auto-name>/SKILL.md (in draft/ branch)         │
│          │                                                           │
│          ▼                                                           │
│  Stage 3: STRESS-TEST (W3.3 skill-adversary)                         │
│    └─ Apply 8 attack patterns: prompt injection, edge inputs,        │
│       ambiguous triggers, scope creep, hallucination bait, etc.      │
│    └─ Gate: ≥6/8 attacks survived → continue, else → discard         │
│          │                                                           │
│          ▼                                                           │
│  Stage 4: CANARY DEPLOY (W3.2 skill-canary-deployer)                 │
│    └─ Mirror 10% session traffic for 50-invocation window            │
│    └─ Watch error rate via skill-scorecard JSONL telemetry           │
│    └─ Auto-rollback if error budget >2%                              │
│          │                                                           │
│          ▼                                                           │
│  Stage 5: EVAL GATE (W3.1 skill-regression-test)                     │
│    └─ Run golden eval suite (LLM-as-judge + deterministic asserts)   │
│    └─ Gate: pass-rate ≥ baseline OR no regression on existing skills │
│          │                                                           │
│          ▼                                                           │
│  Stage 6: PROMOTE OR ROLLBACK                                        │
│    └─ HITL gate (first 3 auto-skills): block at PR for Seb sign-off  │
│    └─ Auto-promote (after 3 successful): merge to main + tag         │
│    └─ Rollback: archive draft + log decision in decisions.jsonl      │
└──────────────────────────────────────────────────────────────────────┘
```
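The numeric gates in Stages 3-5 can be read as a chain of simple predicates. The following is a minimal Python sketch of that reading, not the ATLAS implementation: `StageEvidence` and `first_unmet_gate` are hypothetical names, the Stage 5 verdict is taken as a boolean handed over by `skill-regression-test`, and treating an incomplete canary window as "not cleared yet" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

ADVERSARY_MIN_PASS = 6          # Stage 3: attacks survived, out of 8
CANARY_WINDOW = 50              # Stage 4: invocations in the canary window
CANARY_ERROR_BUDGET_PCT = 2.0   # Stage 4: auto-rollback above this rate


@dataclass
class StageEvidence:
    """Hypothetical summary of what Stages 3-5 report for one draft."""
    attacks_survived: int
    canary_invocations: int
    canary_errors: int
    regression_passed: bool


def first_unmet_gate(ev: StageEvidence) -> Optional[str]:
    """Return the first gate the draft has not cleared, or None if all pass."""
    if ev.attacks_survived < ADVERSARY_MIN_PASS:
        return "adversary"
    # Assumption: an incomplete canary window counts as "not cleared yet".
    if ev.canary_invocations < CANARY_WINDOW:
        return "canary"
    error_pct = 100.0 * ev.canary_errors / ev.canary_invocations
    if error_pct > CANARY_ERROR_BUDGET_PCT:
        return "canary"
    if not ev.regression_passed:
        return "regression"
    return None


print(first_unmet_gate(StageEvidence(7, 50, 0, True)))   # every gate cleared
print(first_unmet_gate(StageEvidence(5, 50, 0, True)))   # stops at Stage 3
```

Returning the name of the first unmet gate, instead of a bare boolean, keeps the rollback reason available for the decision log.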
## Reuse pointers (no new logic — pure orchestration)
| Stage | Composed skill | Source |
|-------|----------------------------------------|--------|
| 1 | `idle-curiosity` | `skills/idle-curiosity/SKILL.md` |
| 2 | `skill-creator` (Anthropic core) | `skills/skill-creator/SKILL.md` (or `skill-management` for ATLAS-specific scaffold) |
| 3 | `skill-adversary` (W3.3) | `skills/skill-adversary/SKILL.md` |
| 4 | `skill-canary-deployer` (W3.2) | `skills/skill-canary-deployer/SKILL.md` |
| 5 | `skill-regression-test` (W3.1) | `skills/skill-regression-test/SKILL.md` |
| 6 | `forgejo-pr` + `decision-log` | `skills/forgejo-pr/SKILL.md`, `skills/decision-log/SKILL.md` |
**Zero net-new code paths** — this skill is a workflow contract. All side
effects flow through composed children's existing telemetry, hooks, and
deployment mechanisms.
## CLI surface
```bash
# Manual single iteration (synchronous)
atlas skill-discovery-loop --once
# → runs Stages 1-6 in sequence; blocks on the HITL gate for the first 3 auto-skills
# Status / observability
atlas skill-discovery-loop --status
# → prints table:
#   | auto-name           | stage       | created    | result     |
#   |---------------------|-------------|------------|------------|
#   | auto-grafana-tuner  | promoted    | 2026-04-12 | live       |
#   | auto-pg-vacuum-tip  | rolled-back | 2026-04-19 | regression |
#   | auto-traefik-debug  | canary      | 2026-04-28 | watching   |
# Specific stage isolation (dev/debug)
atlas skill-discovery-loop --propose-only # Stage 1 only
atlas skill-discovery-loop --resume <auto-name> # restart from last stage
atlas skill-discovery-loop --abandon <auto-name> # archive draft + log
# Routines integration (cloud cron)
atlas skill-discovery-loop --routine-create
# → registers Anthropic Routine: daily at 03:00 EDT, --once
```
## Configuration
`~/.atlas/skill-discovery-loop.yaml` (created on first run):
```yaml
hitl:
  required_signoffs: 3           # promote auto-skills 1-3 with Seb approval
  signoffs_completed: 0          # incremented only after main-merge + 30d
gates:
  adversary_min_pass: 6          # of 8 attacks
  canary_error_budget_pct: 2.0
  canary_window_invocations: 50
  regression_min_baseline_ratio: 1.0
cadence:
  proposals_per_day_max: 1
  steady_state_target: "1 skill/month"
naming:
  prefix: "auto-"                # all auto-created skills use auto-* prefix
  reserved_words: [atlas, core, admin, dev]  # forbidden in auto-names
```
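As an illustration of the `naming` block, the check below is one way the prefix and reserved-word rules could be enforced. It is a sketch under assumptions: `is_valid_auto_name` is a hypothetical helper, and a reserved word is treated as forbidden only when it appears as a whole hyphen-separated token (the config does not say whether substrings count).

```python
PREFIX = "auto-"
RESERVED_WORDS = {"atlas", "core", "admin", "dev"}


def is_valid_auto_name(name: str) -> bool:
    """Check a candidate skill name against the prefix and reserved-word rules."""
    if not name.startswith(PREFIX):
        return False
    tokens = name[len(PREFIX):].split("-")
    # Reject empty tokens such as "auto-" or "auto--x".
    if any(token == "" for token in tokens):
        return False
    # Assumption: reserved words are matched per token, not as substrings.
    return not any(token in RESERVED_WORDS for token in tokens)


for candidate in ("auto-grafana-tuner", "grafana-tuner", "auto-atlas-helper"):
    print(candidate, is_valid_auto_name(candidate))
```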
## HITL gate semantics (CRITICAL)
The first 3 auto-created skills MUST satisfy ALL of:
1. PR opened against `main` with label `auto-skill-hitl-required`
2. Seb posts approving review with the exact phrase `auto-skill: APPROVE`
3. PR description contains the full Stage 1-5 audit trail (proposal,
adversary report, canary metrics, regression diff)
4. Skill survives 30 days post-merge with zero rollbacks
Only when all 4 conditions hold for 3 distinct auto-skills does
`signoffs_completed` reach 3 and the loop transition to autonomous mode.
**Escape valve**: any auto-skill can be force-rolled-back via
`atlas skill-discovery-loop --abandon <auto-name>` regardless of stage.
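The four HITL conditions can be expressed as a single predicate over a promotion PR. The sketch below is illustrative only: `PromotionPR` is a hypothetical snapshot type (not a Forgejo API object), and detecting the audit trail by searching the description for four marker phrases is an assumption about how condition 3 might be checked.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

HITL_LABEL = "auto-skill-hitl-required"
APPROVAL_PHRASE = "auto-skill: APPROVE"
# Assumed markers for the Stage 1-5 audit trail in the PR description.
AUDIT_MARKERS = ("proposal", "adversary report", "canary metrics", "regression diff")
SURVIVAL_PERIOD = timedelta(days=30)


@dataclass
class PromotionPR:
    """Hypothetical snapshot of a promotion PR (not a Forgejo API type)."""
    base_branch: str
    labels: List[str]
    approving_review: Optional[str]   # body of Seb's approving review, if any
    description: str
    merged_at: Optional[datetime]
    rollbacks: int = 0


def counts_toward_signoffs(pr: PromotionPR, now: datetime) -> bool:
    """True only when all four HITL conditions hold for this auto-skill."""
    opened_correctly = pr.base_branch == "main" and HITL_LABEL in pr.labels
    approved = (pr.approving_review is not None
                and APPROVAL_PHRASE in pr.approving_review)
    text = pr.description.lower()
    audited = all(marker in text for marker in AUDIT_MARKERS)
    survived = (pr.merged_at is not None
                and now - pr.merged_at >= SURVIVAL_PERIOD
                and pr.rollbacks == 0)
    return opened_correctly and approved and audited and survived


merged = datetime(2026, 4, 15, tzinfo=timezone.utc)
pr = PromotionPR("main", [HITL_LABEL], "auto-skill: APPROVE",
                 "Proposal, adversary report, canary metrics, regression diff",
                 merged)
print(counts_toward_signoffs(pr, merged + timedelta(days=30)))
```

Under this reading, `signoffs_completed` would be incremented once per auto-skill for which the predicate first becomes true.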
## Telemetry & observability
Each loop iteration appends one JSONL line to
`~/.atlas/skill-discovery-loop.jsonl`:
```json
{"iteration": 42, "auto_name": "auto-pg-vacuum-tip", "stage_reached": "regression",
"stage_results": {"propose": "ok", "draft": "ok", "adversary": "6/8",
"canary": "1.2% err", "regression": "fail"},
"outcome": "rolled-back", "duration_s": 487,
"ts": "2026-04-19T03:14:02Z"}
```
The `--status` subcommand renders this ledger plus a 90-day moving average
of "skills proposed → skills promoted" funnel conversion. SLO target:
**≥10% propose-to-promote conversion** in steady state.
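For illustration, the funnel figure could be derived from the ledger roughly as follows. This is a sketch, not the `--status` implementation: it computes conversion over a single trailing 90-day window, it counts an iteration as "proposed" when `stage_results.propose` is `"ok"`, and it assumes promoted iterations carry `"outcome": "promoted"` (the ledger example above only shows `"rolled-back"`).

```python
import json
from datetime import datetime, timedelta, timezone


def funnel_conversion(jsonl_text: str, now: datetime, window_days: int = 90) -> float:
    """Promoted / proposed over the trailing window; 0.0 if nothing was proposed."""
    cutoff = now - timedelta(days=window_days)
    proposed = promoted = 0
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        # Normalise the trailing "Z" so older Python versions can parse it.
        ts = datetime.fromisoformat(record["ts"].replace("Z", "+00:00"))
        if ts < cutoff:
            continue
        if record.get("stage_results", {}).get("propose") != "ok":
            continue  # iteration produced no proposal
        proposed += 1
        if record.get("outcome") == "promoted":  # assumed outcome label
            promoted += 1
    return promoted / proposed if proposed else 0.0


ledger = "\n".join([
    '{"stage_results": {"propose": "ok"}, "outcome": "promoted", "ts": "2026-04-12T07:00:00Z"}',
    '{"stage_results": {"propose": "ok"}, "outcome": "rolled-back", "ts": "2026-04-19T03:14:02Z"}',
])
rate = funnel_conversion(ledger, datetime(2026, 5, 1, tzinfo=timezone.utc))
print(f"conversion: {rate:.0%}, meets SLO: {rate >= 0.10}")
```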
## Rationale (Voyager → ATLAS)
Voyager grew Minecraft skills via environment feedback (block-world physics).
ATLAS grows engineering skills via session telemetry (success/error rates,
user reactions, reuse counts). The composition pattern preserves
Voyager's three invariants:
1. **Open-ended exploration** — proposals are not constrained to a fixed
taxonomy; idle-curiosity surfaces whatever gap is empirically real.
2. **Iterative refinement** — failed adversary or regression rounds feed
back into the proposal corpus (negative training signal).
3. **Skill library compounds** — each promoted auto-skill becomes a
primitive callable by future proposals (composition multiplier).
## Failure modes & recovery
| Failure | Detection | Recovery |
|----------------------------------------|---------------------------------|---------------------------------------------------|
| idle-curiosity returns no proposals | Stage 1 stdout empty | Skip iteration, log `no-proposal`, retry next day |
| skill-creator drafts malformed YAML | Frontmatter parse fails | Abandon draft, log `draft-malformed` |
| skill-adversary <6/8 pass | adversary report `pass < 6` | Abandon draft, feed report into `decisions.jsonl` |
| Canary error >2% over 50 invocations | scorecard JSONL rolling window | Auto-rollback via skill-canary-deployer |
| Regression eval fails baseline | regression-test exit ≠ 0 | Abandon draft, log diff for next iteration |
| HITL signoff timeout (>14d, first 3) | PR labeled `auto-skill-stale` | Auto-close PR, archive draft, retry next month |
| forgejo-pr open fails (rate limit, 5xx) | exit code ≠ 0                  | Backoff 1h, retry once, then escalate to Seb      |
All failures are **non-fatal to the loop itself** — the next scheduled
iteration runs as if the failed attempt never happened, except that the
failed proposal's topic is added to a 30-day cooldown to avoid re-proposing
the same gap immediately.
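The cooldown rule can be sketched as a filter applied to Stage 1 candidates. The helper name `eligible_topics` and the in-memory `last_failed` mapping are hypothetical; where the loop actually stores failed topics is not specified here.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

COOLDOWN = timedelta(days=30)


def eligible_topics(candidates: List[str],
                    last_failed: Dict[str, datetime],
                    now: datetime) -> List[str]:
    """Keep only topics that are not inside the 30-day cooldown."""
    return [
        topic for topic in candidates
        if topic not in last_failed or now - last_failed[topic] >= COOLDOWN
    ]


now = datetime(2026, 5, 1, tzinfo=timezone.utc)
last_failed = {"pg vacuum tips": datetime(2026, 4, 19, tzinfo=timezone.utc)}
# "pg vacuum tips" failed 12 days ago, so only the other topic is kept.
print(eligible_topics(["pg vacuum tips", "traefik debugging"], last_failed, now))
```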
## Worked example (Stage 1-6 audit trail)
```text
# Iteration #17, 2026-04-12 03:00 EDT
Stage 1 PROPOSE → idle-curiosity surfaced gap: "Grafana dashboard tuner"
confidence=0.82, evidence=12 sessions debugging panels manually
Stage 2 DRAFT → skill-creator generated skills/auto-grafana-tuner/SKILL.md
(frontmatter valid, body 287 lines)
Stage 3 ADVERSARY → 7/8 attacks survived (failed: prompt-injection in panel JSON)
→ continue (≥6/8 threshold met)
Stage 4 CANARY → 10% mirror, 50 invocations, error rate 0.4%
→ continue (under 2% budget)
Stage 5 REGRESS → golden eval: 14/14 pass, no regression on 73 existing skills
→ continue
Stage 6 PROMOTE → HITL gate active (signoffs_completed=0/3)
→ PR opened, awaiting `auto-skill: APPROVE` from Seb
```
After Seb approves and the skill survives 30 days in `main`, `signoffs_completed`
increments to 1. After 3 such cycles, the gate flips to autonomous and Seb
receives Routines digest emails instead of PR review pings.
## Anti-patterns (what this loop will NOT do)
- ❌ Propose skills outside the admin tier without explicit Seb invocation
- ❌ Skip any gate (adversary, canary, eval, HITL) "because the proposal looks safe"
- ❌ Auto-promote during the first-3-skill HITL window
- ❌ Run more than 1 proposal/day (avoid skill-spam)
- ❌ Create skills with names not prefixed `auto-`
- ❌ Modify existing skills (loop only CREATES; modifications go through
normal `atlas-dev-self` workflow)
## References
- Plan section H W4.1 — Voyager autonomy pattern
- W3.1 `skill-regression-test/SKILL.md` — eval gate
- W3.2 `skill-canary-deployer/SKILL.md` — gradual rollout
- W3.3 `skill-adversary/SKILL.md` — stress-test gate
- `idle-curiosity/SKILL.md` — gap proposer
- `skill-creator/SKILL.md` — draft generator
- Voyager paper: Wang et al., "Voyager: An Open-Ended Embodied Agent with
Large Language Models" (2023, arXiv:2305.16291)
- Anthropic Routines API — daily cron substrate