---
name: experiment-loop
description: "Autonomous optimization loop (autoresearch). This skill should be used when the user asks to 'run experiment', 'optimize', 'autoresearch', '/atlas experiment', or has a config.yaml for analyze→mutate→execute→measure→decide with HITL gates."
mode: [personal, all]
effort: high
context: fork
agent: experiment-runner
---
# Experiment Loop — Autonomous Optimization
Inspired by Karpathy-style autoresearch: defined target → autonomous experimentation → HITL review.
## v5.7.0+ Native Delegation (Phase 4)
For simple recurring tasks, prefer Claude Code's native `/loop` (v2.1.89+) or the `CronCreate` tool
over this skill's custom scheduling:
```bash
# Simple periodic task — use native
/loop 5m check deploy status
/loop 1h /atlas health infra
# Full experiment with HITL gates + mutation tracking — use this skill
/atlas experiment start <config>
```
Keep this skill for: multi-iteration experiments with mutation proposals, HITL gates,
measurement tracking, result synthesis. Not for simple recurring pings.
## Invocation
| Command | Action |
|---------|--------|
| `/atlas tune <name>` | Run named experiment |
| `/atlas tune --list` | List available experiments |
| `/atlas tune --history <name>` | Show experiment history |
| `/atlas tune --baseline <name>` | Show/update baseline |
## Experiment Config
Defined in `.claude/assay/experiments.yaml`. Each experiment specifies:
| Field | Purpose |
|-------|---------|
| `target` | What to optimize: `{type: database\|file\|api, table/path, filter}` |
| `metric` | Measurement: `{name, direction: maximize\|minimize, command}` |
| `golden_dataset` | HITL-validated baseline: `{path, description}` |
| `budget` | Limits: `{max_iterations, time_per_iteration, total_timeout}` |
| `hitl` | Gates: `{threshold, auto_accept_below, always_reject_below}` |
| `mutation_strategy` | Approach: `insights\|random\|systematic` + params |
| `model` | `sonnet` (iteration) or `opus` (design/report) |
See existing experiments in `.claude/assay/experiments.yaml` for examples (rule-engine, yolo-pid, omnisearch).
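A config entry might look like the sketch below. The experiment name, field values, paths, and the metric command are all hypothetical — use the real entries in `.claude/assay/experiments.yaml` as the source of truth.

```yaml
# Hypothetical experiment entry — values are illustrative, not from a real config.
rule-accuracy:
  target:
    type: database
    table: rules
    filter: "project_id = demo"
  metric:
    name: match_rate
    direction: maximize
    command: "python evaluate_rules.py --project demo"   # hypothetical script
  golden_dataset:
    path: .claude/assay/golden/rules-demo.jsonl
    description: "HITL-validated rule matches"
  budget:
    max_iterations: 10
    time_per_iteration: 300   # seconds
    total_timeout: 3600
  hitl:
    threshold: 10             # deltas above this trigger the HITL gate
    auto_accept_below: 1      # small deltas accepted quietly
    always_reject_below: -5   # regressions below this rolled back
  mutation_strategy: insights
  model: sonnet
```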
## Execution Flow
### Step 1: LOAD
Read config → validate target/metric/dataset → load or create baseline.
**HITL Gate**: AskUserQuestion to confirm experiment parameters before starting.
### Step 2: BASELINE
Execute metric command → record `{metric, timestamp, config_snapshot}` → save to `.claude/assay/baselines/`.
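A saved baseline record might look like this (field values are illustrative; only the three field names come from the step above):

```json
{
  "metric": 0.82,
  "timestamp": "2025-01-15T10:00:00Z",
  "config_snapshot": {"mutation_strategy": "insights", "model": "sonnet"}
}
```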
### Step 3: ITERATE (loop up to max_iterations)
| Phase | Action |
|-------|--------|
| 3a. ANALYZE | Read insights/telemetry, identify lowest-performing element |
| 3b. HYPOTHESIZE | Formulate what to change and why |
| 3c. MUTATE | Apply exactly **ONE** change. Record old/new/reason |
| 3d. EXECUTE | Run metric command (time-boxed) |
| 3e. MEASURE | `delta = new - baseline`, compute improvement % |
| 3f. DECIDE | See decision table below |
| 3g. LOG | Append to `.claude/assay/history/{experiment}-{date}.jsonl` |
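Each iteration appends one JSON line to the history file. A hypothetical entry, with field names inferred from the phases above:

```json
{"iteration": 3, "hypothesis": "...", "mutation": {"old": "...", "new": "...", "reason": "..."}, "metric": 0.84, "delta": 0.02, "decision": "ACCEPT (auto)"}
```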
**Decision table** (conditions checked top to bottom; first match wins):
| Condition | Action |
|-----------|--------|
| `delta < always_reject_below` | ROLLBACK |
| `delta < auto_accept_below` | ACCEPT (quiet) |
| `delta > threshold` | **HITL GATE** — AskUserQuestion: accept/reject/modify |
| Otherwise | ACCEPT (auto) |
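The 3f DECIDE step can be sketched as a small function — a minimal sketch, assuming the table's conditions are checked in row order and that `always_reject_below < auto_accept_below < threshold`:

```python
def decide(delta: float, hitl: dict) -> str:
    """Map a measured delta to an action per the decision table above.

    `hitl` mirrors the config fields: threshold, auto_accept_below,
    always_reject_below. Conditions are checked in table order.
    """
    if delta < hitl["always_reject_below"]:
        return "ROLLBACK"           # regression beyond the hard floor
    if delta < hitl["auto_accept_below"]:
        return "ACCEPT (quiet)"     # small change, no announcement
    if delta > hitl["threshold"]:
        return "HITL"               # significant change: ask the user
    return "ACCEPT (auto)"
```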
### Step 4: REPORT
Generate report → save to `.claude/assay/reports/{experiment}-{date}.md`.
**HITL Gate**: Keep all / rollback to baseline / cherry-pick iterations.
## Approved-Mode Integration (v6.0.0-alpha.7+)
Every HITL gate in experiment-loop respects session autonomy state via `hooks/autonomy-gate.sh`:
| Gate | Default gate_id | Default tier | Always-ask action |
|------|-----------------|--------------|-------------------|
| Step 1 LOAD (confirm params) | `experiment-params` | CODED | — |
| Step 3f DECIDE (delta > threshold) | `experiment-decide` | VALIDATING | — |
| Step 4 REPORT (keep/rollback/cherry-pick) | `experiment-final` | VALIDATED | `data:modify_prod_schema` (if rule changes affect prod) |
**Auto-approval scenarios**:
- In `approved` mode with `experiment-params` + `experiment-decide` gates approved → iteration loop runs autonomously
- Step 4 REPORT with `VALIDATED` tier → ALWAYS asks (always_ask_tiers)
- If experiment touches prod rule config → `data:modify_prod_schema` action forces ask
**User activation pattern**:
```bash
./hooks/autonomy-gate.sh approve experiment-params
./hooks/autonomy-gate.sh approve experiment-decide
./hooks/autonomy-gate.sh set-mode approved
/atlas tune my-experiment # runs autonomously until Step 4 HITL
```
Audit trail: every gate decision logged to `.claude/decisions.jsonl`.
## Integration APIs
| System | Endpoints |
|--------|-----------|
| Rule Engine | `GET /{pid}/rules/insights`, `GET /{pid}/rules/evaluate`, `PUT /{pid}/rules/{id}`, `POST /{pid}/rules/{id}/revert` |
| YOLO (VM 600) | `POST /detect`, SSH: `python train_yolo_pid.py`, `python evaluate.py` |
## Non-Negotiable Rules
1. **ONE mutation per iteration** — isolate variables
2. **Time-boxed** — never exceed budget
3. **HITL gates** — significant changes require human approval
4. **Rollback capability** — every mutation reversible
5. **Structured logging** — every iteration to JSONL
6. **Baseline preservation** — original state always recoverable
7. **DRY_RUN first** — validate metric command before iterating
## Model Strategy
| Phase | Model |
|-------|-------|
| Experiment design, hypotheses, final report | Opus 4.7 |
| Iteration execution (mutations, evaluation) | Sonnet 4.6 |