---
name: experiment-loop
description: "Autonomous optimization loop (autoresearch). This skill should be used when the user asks to 'run experiment', 'optimize', 'autoresearch', '/atlas experiment', or has a config.yaml for analyze→mutate→execute→measure→decide with HITL gates."
mode: [personal, all]
effort: high
context: fork
agent: experiment-runner
---
# Experiment Loop — Autonomous Optimization
Inspired by Karpathy-style autoresearch: defined target → autonomous experimentation → HITL review.
## v5.7.0+ Native Delegation (Phase 4)
For simple recurring tasks, prefer Claude Code's native `/loop` (v2.1.89+) or the `CronCreate` tool
over this skill's custom scheduling:
```bash
# Simple periodic task — use native
/loop 5m check deploy status
/loop 1h /atlas health infra
# Full experiment with HITL gates + mutation tracking — use this skill
/atlas experiment start <config>
```
Keep this skill for: multi-iteration experiments with mutation proposals, HITL gates,
measurement tracking, result synthesis. Not for simple recurring pings.
## Invocation
| Command | Action |
|---------|--------|
| `/atlas tune <name>` | Run named experiment |
| `/atlas tune --list` | List available experiments |
| `/atlas tune --history <name>` | Show experiment history |
| `/atlas tune --baseline <name>` | Show/update baseline |
## Experiment Config
Defined in `.claude/assay/experiments.yaml`. Each experiment specifies:
| Field | Purpose |
|-------|---------|
| `target` | What to optimize: `{type: database\|file\|api, table/path, filter}` |
| `metric` | Measurement: `{name, direction: maximize\|minimize, command}` |
| `golden_dataset` | HITL-validated baseline: `{path, description}` |
| `budget` | Limits: `{max_iterations, time_per_iteration, total_timeout}` |
| `hitl` | Gates: `{threshold, auto_accept_below, always_reject_below}` |
| `mutation_strategy` | Approach: `insights\|random\|systematic` + params |
| `model` | `sonnet` (iteration) or `opus` (design/report) |
See existing experiments in `.claude/assay/experiments.yaml` for examples (rule-engine, yolo-pid, omnisearch).
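A config entry might look like the sketch below. The experiment name, field values, paths, and the metric command are all hypothetical — use the real entries in `.claude/assay/experiments.yaml` as the source of truth.

```yaml
# Hypothetical experiment entry — values are illustrative, not from a real config.
rule-accuracy:
  target:
    type: database
    table: rules
    filter: "project_id = demo"
  metric:
    name: match_rate
    direction: maximize
    command: "python evaluate_rules.py --project demo"   # hypothetical script
  golden_dataset:
    path: .claude/assay/golden/rules-demo.jsonl
    description: "HITL-validated rule matches"
  budget:
    max_iterations: 10
    time_per_iteration: 300   # seconds
    total_timeout: 3600
  hitl:
    threshold: 10             # deltas above this trigger the HITL gate
    auto_accept_below: 1      # small deltas accepted quietly
    always_reject_below: -5   # regressions below this rolled back
  mutation_strategy: insights
  model: sonnet
```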
## Execution Flow
### Step 1: LOAD
Read config → validate target/metric/dataset → load or create baseline.
**HITL Gate**: AskUserQuestion to confirm experiment parameters before starting.
### Step 2: BASELINE
Execute metric command → record `{metric, timestamp, config_snapshot}` → save to `.claude/assay/baselines/`.
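A saved baseline record might look like this (field values are illustrative; only the three field names come from the step above):

```json
{
  "metric": 0.82,
  "timestamp": "2025-01-15T10:00:00Z",
  "config_snapshot": {"mutation_strategy": "insights", "model": "sonnet"}
}
```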
### Step 3: ITERATE (loop up to max_iterations)
| Phase | Action |
|-------|--------|
| 3a. ANALYZE | Read insights/telemetry, identify lowest-performing element |
| 3b. HYPOTHESIZE | Formulate what to change and why |
| 3c. MUTATE | Apply exactly **ONE** change. Record old/new/reason |
| 3d. EXECUTE | Run metric command (time-boxed) |
| 3e. MEASURE | `delta = new - baseline`, compute improvement % |
| 3f. DECIDE | See decision table below |
| 3g. LOG | Append to `.claude/assay/history/{experiment}-{date}.jsonl` |
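Each iteration appends one JSON line to the history file. A hypothetical entry, with field names inferred from the phases above:

```json
{"iteration": 3, "hypothesis": "...", "mutation": {"old": "...", "new": "...", "reason": "..."}, "metric": 0.84, "delta": 0.02, "decision": "ACCEPT (auto)"}
```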
**Decision table** (conditions checked top to bottom; first match wins):
| Condition | Action |
|-----------|--------|
| `delta < always_reject_below` | ROLLBACK |
| `delta < auto_accept_below` | ACCEPT (quiet) |
| `delta > threshold` | **HITL GATE** — AskUserQuestion: accept/reject/modify |
| Otherwise | ACCEPT (auto) |
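The 3f DECIDE step can be sketched as a small function — a minimal sketch, assuming the table's conditions are checked in row order and that `always_reject_below < auto_accept_below < threshold`:

```python
def decide(delta: float, hitl: dict) -> str:
    """Map a measured delta to an action per the decision table above.

    `hitl` mirrors the config fields: threshold, auto_accept_below,
    always_reject_below. Conditions are checked in table order.
    """
    if delta < hitl["always_reject_below"]:
        return "ROLLBACK"           # regression beyond the hard floor
    if delta < hitl["auto_accept_below"]:
        return "ACCEPT (quiet)"     # small change, no announcement
    if delta > hitl["threshold"]:
        return "HITL"               # significant change: ask the user
    return "ACCEPT (auto)"
```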
### Step 4: REPORT
Generate report → save to `.claude/assay/reports/{experiment}-{date}.md`.
**HITL Gate**: Keep all / rollback to baseline / cherry-pick iterations.
## Approved-Mode Integration (v6.0.0-alpha.7+)
Every HITL gate in experiment-loop respects session autonomy state via `hooks/autonomy-gate.sh`:
| Gate | Default gate_id | Default tier | Always-ask action |
|------|-----------------|--------------|-------------------|
| Step 1 LOAD (confirm params) | `experiment-params` | CODED | — |
| Step 3f DECIDE (delta > threshold) | `experiment-decide` | VALIDATING | — |
| Step 4 REPORT (keep/rollback/cherry-pick) | `experiment-final` | VALIDATED | `data:modify_prod_schema` (if rule changes affect prod) |
**Auto-approval scenarios**:
- In `approved` mode with `experiment-params` + `experiment-decide` gates approved → iteration loop runs autonomously
- Step 4 REPORT with `VALIDATED` tier → ALWAYS asks (always_ask_tiers)
- If experiment touches prod rule config → `data:modify_prod_schema` action forces ask
**User activation pattern**:
```bash
./hooks/autonomy-gate.sh approve experiment-params
./hooks/autonomy-gate.sh approve experiment-decide
./hooks/autonomy-gate.sh set-mode approved
/atlas tune my-experiment # runs autonomously until Step 4 HITL
```
Audit trail: every gate decision logged to `.claude/decisions.jsonl`.
## Integration APIs
| System | Endpoints |
|--------|-----------|
| Rule Engine | `GET /{pid}/rules/insights`, `GET /{pid}/rules/evaluate`, `PUT /{pid}/rules/{id}`, `POST /{pid}/rules/{id}/revert` |
| YOLO (VM 600) | `POST /detect`, SSH: `python train_yolo_pid.py`, `python evaluate.py` |
## Non-Negotiable Rules
1. **ONE mutation per iteration** — isolate variables
2. **Time-boxed** — never exceed budget
3. **HITL gates** — significant changes require human approval
4. **Rollback capability** — every mutation reversible
5. **Structured logging** — every iteration to JSONL
6. **Baseline preservation** — original state always recoverable
7. **DRY_RUN first** — validate metric command before iterating
## Model Strategy
| Phase | Model |
|-------|-------|
| Experiment design, hypotheses, final report | Opus 4.7 |
| Iteration execution (mutations, evaluation) | Sonnet 4.6 |