SQLdurinwinterFree

harkonnen

Operate Harkonnen through the repo-local MCP server and CLI. TRIGGER: user asks about run history, PackChat threads or messages, checkpoints, board snapshots, reasoning trails, benchmark suites or reports, wants to diagnose a failure, wants a run report, or wants to start, queue, or watch a run. Preferred over raw shell commands for Harkonnen interactions.

Repo bundle on Versuzdurinwinter/Harkonnen-Labs32 indexed entries (SKILL.md and CLAUDE.md) from this repository — open the full bundle view.

Open bundle →

View on GitHub ↗</>github.com/durinwinter/Harkonnen-Labs Yours? Claim it ↗

§ 01 — Stats

Stars4

Prior1661

Quality70.0

Score—

Tasks—

§ 02 — Install

Get harkonnen.

Free SKILL.md scraped from GitHub. Clone the repo or copy the file directly into your Claude Code skills directory.

One-line install · Claude Code

$npx versuz@latest install durinwinter-harkonnen-labs-claude-skills-harkonnen

Or clone the repo

$git clone https://github.com/durinwinter/Harkonnen-Labs.git

Or copy the SKILL.md manually

$cp Harkonnen-Labs/SKILL.MD ~/.claude/skills/durinwinter-harkonnen-labs-claude-skills-harkonnen/SKILL.md

More Versuz picks

★ Featured$1.99

vz-bench-debug

Document

★ Featured$0.99

vz-scrape-runner

Web

Got something better ?Submit your skill — it enters tomorrow's cycle. No fee.

Submit yours →

§ 05 — Challenge

Think you can beat it?

$npx versuz challenge durinwinter-harkonnen-labs-claude-skills-harkonnen↵

Show SKILL.md content (~2.2k tokens)

---
name: harkonnen
description: "Operate Harkonnen through the repo-local MCP server and CLI. TRIGGER: user asks about run history, PackChat threads or messages, checkpoints, board snapshots, reasoning trails, benchmark suites or reports, wants to diagnose a failure, wants a run report, or wants to start, queue, or watch a run. Preferred over raw shell commands for Harkonnen interactions."
user-invocable: true
argument-hint: "[recent-runs | report <run-id> | diagnose <run-id> | queue <spec> <product-or-path> | watch <run-id> | boards <run-id> | reasoning <run-id> | start <spec> <product-or-path> | thread-open <title> | thread-list | say <thread-id> <message> | checkpoints <run-id> | answer-checkpoint <run-id> <checkpoint-id> | unblock <run-id> <agent> | benchmark-suites | benchmark-recent | benchmark-report <id-or-latest> | benchmark-smoke <suite-id...>]"
allowed-tools:
- Read
- Bash(claude mcp list)
- Bash(cargo run -- setup check)
- mcp__harkonnen__*
- mcp__sqlite__*
- mcp__filesystem__*
---

# /harkonnen - Run, Board, And Benchmark Operations

Arguments passed: `$ARGUMENTS`

Use the repo-local `harkonnen` MCP server as the first-choice surface for run,
PackChat, checkpoint, and benchmark ops. If MCP is unavailable, diagnose and
fix the MCP path instead of silently dropping to normal CLI operations.

---

## Common Tasks

### `recent-runs`
Call `mcp__harkonnen__list_runs` with limit 5–10. Return a compact table:
run-id, spec, status, outcome, timestamp.

### `report <run-id>`
Call `mcp__harkonnen__get_run_report`. Extract: outcome, failure point, timings,
recommended next step. Don't dump the full report unless the user asks.

### `diagnose <run-id>`
Combine `mcp__harkonnen__get_run`, `mcp__harkonnen__get_run_report`, and
`mcp__harkonnen__list_run_decisions`. Identify whether the blocker is:
spec quality / implementation / validation / setup or MCP / hidden scenarios.
End with the highest-leverage next move.

### `queue <spec> <product-or-path>`
Use `mcp__harkonnen__queue_run` for normal commissioning so the command returns
immediately with a queued run id. If the second argument looks like a path
(contains `/` or starts with `.`), use `product_path`; otherwise use `product`.
Quote paths with spaces. Default `run_hidden_scenarios` to `true` unless the
user asks for a lighter run. Echo: spec, product/path, hidden scenarios
enabled, returned run-id, and tell the user `watch <run-id>` is the next step.
If the user provides draft spec YAML, pass it through `queue_run` instead of
asking them to save a file first.

### `watch <run-id>`
Call `mcp__harkonnen__watch_run`. Summarize the current status, latest phase
activity, active checkpoints, and run timing when present. This is the default
follow-up after `queue`.

### `boards <run-id>`
Call `mcp__harkonnen__get_run_board_snapshot`. Present the run using the same
four-board mental model as the web UI: Mission, Action, Evidence, and Memory.
Prefer this when the user wants a holistic “what’s going on right now?” view
instead of a narrow event stream. Use the summary view by default for quick
answers, and only ask for the full board payload when the user explicitly wants
deep detail.

### `reasoning <run-id>`
Call `mcp__harkonnen__get_run_reasoning_snapshot`. Use this when the user wants
to inspect how the pack is thinking in practice: recent decisions, checkpoint
answers, unblock patterns, and live reasoning counts. Prefer the summary view
for quick diagnostics and the full view when the user wants the exact recent
decision trail or checkpoint-answer history.

### `start <spec> <product-or-path>`
If the second argument looks like a path (contains `/` or starts with `.`),
use `product_path`. Otherwise use `product`. Quote paths with spaces.
This is the blocking variant and should be used only when the user explicitly
wants to wait for the lifecycle result in one call. Otherwise prefer `queue`.
If used, default `run_hidden_scenarios` to `true` unless user asks for a
lighter run. Echo: spec, product/path, hidden scenarios enabled, returned
run-id. Tell the user: `report <run-id>` or `diagnose <run-id>` is the next
step. If the user provides draft spec YAML, pass it through the MCP `start_run`
operation instead of asking them to save a file first.

### `thread-list`
Call `mcp__harkonnen__list_chat_threads`. Return thread id, title, kind, run
binding, and updated time. Use small limits unless the user asks for a full list.

### `thread-open <title>`
Call `mcp__harkonnen__open_chat_thread`. Default to `thread_kind: "general"`
unless the user clearly wants a run, spec, or operator-model thread.

### `say <thread-id> <message>`
Call `mcp__harkonnen__post_chat_message`. Summarize both the operator message
and the generated agent reply. If the user addresses a specific pup, pass the
agent explicitly when helpful; otherwise let normal PackChat routing handle it.

### `checkpoints <run-id>`
Call `mcp__harkonnen__list_run_checkpoints`. Focus on checkpoints still open or
recently answered, and surface agent, phase, prompt, and next action.

### `answer-checkpoint <run-id> <checkpoint-id>`
Call `mcp__harkonnen__reply_to_checkpoint`. Use `answer_text` for ordinary
operator replies and `decision_json` only when the user has clearly provided
structured decisions.

### `unblock <run-id> <agent>`
Call `mcp__harkonnen__unblock_agent` after confirming the target agent and run.
Use this when the user wants the run to continue after answering a blocker or
when they explicitly ask to release a stalled pup.

### `benchmark-suites`
Call `mcp__harkonnen__list_benchmark_suites`. Return the suite ids, titles,
default-selected status, and any obvious required env that affects a smoke run.

### `benchmark-recent`
Call `mcp__harkonnen__list_benchmark_reports`. Summarize the newest benchmark
artifacts with report id, generated time, selected suites, and pass/fail/skipped
counts.

### `benchmark-report <id-or-latest>`
Call `mcp__harkonnen__get_benchmark_report`. Default to `latest` when the user
does not provide an id. Prefer `format: "markdown"` for operator-facing output
and `format: "summary"` for compact machine-readable summaries.

### `benchmark-smoke <suite-id...>`
Call `mcp__harkonnen__run_benchmarks`. If the user does not name a suite,
default to the repo's fast smoke path, `local_regression`. Before running
obviously external or long-haul suites such as LongMemEval, LoCoMo, DevBench,
SWE-bench, or tau2-bench, pause and confirm because they may need extra setup
or much more time.

---

## Benchmark Smoke

- Prefer structured specs, especially DevBench-generated specs
- For Harkonnen self-tests: `product_path` is usually `.`
- Prefer `queue` plus `watch` over the blocking `start` path for everyday
commissioning through Claude or Codex
- Prefer `boards` when the user wants the UI-style Mission / Action / Evidence /
Memory overview instead of raw event or status output
- Prefer `reasoning` when the user wants to inspect the live decision trail,
checkpoint-answer history, or unblock behavior rather than only phase status
- Prefer PackChat thread open/list/message and checkpoint reply/unblock through
MCP instead of telling the user to use the web UI or raw API routes
- Start the run first; inspect report + decision log before dropping to shell-level
benchmark commands
- Prefer benchmark suite listing, report listing, report retrieval, and smoke
execution through MCP before dropping to shell-level benchmark commands
- For comparative benchmarking beyond one run: use MCP run data first, then recommend
suite-level shell commands

---

## MCP Recovery

If `harkonnen` tools are unavailable:
1. Read `.mcp.json` and `.claude/settings.local.json`
2. Confirm `harkonnen` is present and enabled
3. Run `claude mcp list` or `cargo run -- setup check`
4. Tell the user the concrete fix — don't only report that MCP failed

---

## Boundaries

- Do not use shell commands for run inspection when the MCP tool already exists
- Do not invent run IDs, statuses, reports, or decision details
- If a start request is missing both product name and product path, ask one
concise follow-up question