---
name: plan-task
description: >-
  Drive a task through PLANNING to READY: investigate the spec, surface gaps,
  file subtasks, and run the gate check. Use when: 'investigate mt#X', 'plan
  mt#X', 'look into mt#X', "what's the gap for mt#X", 'bring mt#X to ready',
  'research mt#X', 'analyze mt#X spec'. Does NOT create new tasks (use
  /create-task) and does NOT implement (use /implement-task).
user-invocable: true
---

# Plan Task

Drive an existing task from TODO through PLANNING to READY by investigating its spec, surfacing
gaps, filing any needed subtasks, and running the PLANNING → READY gate check.

## Arguments

Required: a task ID (e.g., `/plan-task mt#915` or `investigate mt#915`).

## Triggers

This skill auto-invokes on:

- "investigate mt#X"
- "plan mt#X"
- "look into mt#X"
- "what's the gap for mt#X"
- "bring mt#X to ready"
- "research mt#X"
- "analyze mt#X spec"

It does **not** trigger on task creation intents (use `/create-task`) or implementation
intents (use `/implement-task`).

## PLANNING lifecycle ownership

This skill owns the **TODO → PLANNING → READY** state arc. The first mechanical step is always
a status transition; everything else is investigation and gate-check.

## Process

- Step 1: Transition to PLANNING (idempotent)
- Step 2: Read and verify the spec
- Step 2.5: Premise audit (four checks; must run before the gate)
- Step 3: Run the PLANNING → READY gate check
  - (a) Required spec sections present
  - (b) Success criteria are testable
  - (c) Scope is bounded
  - (d) No blocking questions
  - (e) File:line references are fresh
  - (f) Subtasks filed for multi-phase work
  - (g) No parallel work in flight
  - (h) Contract-propagation enumeration
- Step 4: Act on gate results

### Step 1: Transition to PLANNING (idempotent)

1. Call `mcp__minsky__tasks_status_get` with the task ID to read the current status.
2. Branch on current status:
   - **TODO** → call `mcp__minsky__tasks_status_set` to transition to **PLANNING**.
   - **PLANNING** → already in the right state; proceed without re-transitioning.
   - **READY** → task is already gate-passed. Confirm with the user whether to re-investigate
     or stop. Default: stop and report it's READY.
   - **IN-PROGRESS / IN-REVIEW / DONE** → task is past the planning phase. Inform the user
     and stop; do not attempt to walk the status backward.
   - **BLOCKED** → surface the blocker, do not transition.
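The branch list above can be read as a lookup table. The following is a minimal sketch of that table, not the skill's actual implementation; `Action`, `_ACTIONS`, and `next_action` are hypothetical names introduced only for illustration.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical labels for the Step 1 outcomes listed above."""
    TRANSITION_TO_PLANNING = "transition_to_planning"
    PROCEED = "proceed"
    CONFIRM_OR_STOP = "confirm_or_stop"
    INFORM_AND_STOP = "inform_and_stop"
    SURFACE_BLOCKER = "surface_blocker"


# Assumed mapping that mirrors the branch list; status names come from this document.
_ACTIONS = {
    "TODO": Action.TRANSITION_TO_PLANNING,
    "PLANNING": Action.PROCEED,
    "READY": Action.CONFIRM_OR_STOP,
    "IN-PROGRESS": Action.INFORM_AND_STOP,
    "IN-REVIEW": Action.INFORM_AND_STOP,
    "DONE": Action.INFORM_AND_STOP,
    "BLOCKED": Action.SURFACE_BLOCKER,
}


def next_action(status: str) -> Action:
    """Return the Step 1 action for a task status; unknown statuses raise ValueError."""
    try:
        return _ACTIONS[status.strip().upper()]
    except KeyError:
        raise ValueError(f"unrecognized task status: {status!r}") from None
```

Because the status is read before anything is written, running the step twice on a PLANNING task yields `Action.PROCEED` both times, which is what makes the transition idempotent.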

### Step 2: Read and verify the spec

1. Call `mcp__minsky__tasks_spec_get` to load the full task specification.
2. Check that the spec is substantive — not just a one-line title. If the spec is empty or
only contains a title, that is itself a blocking gap (surface it now).
3. Note any file:line references and verify them against the current codebase (use
`mcp__minsky__session_exec` or `mcp__minsky__session_grep_search` to confirm they exist
and point to the right code).

### Step 2.5: Premise audit

Before running the spec-quality gate, answer all four checks below explicitly in your
planning output. **READY recommendations, closure recommendations, amendment recommendations,
and new-task creation calls are blocked until all four answers are stated.**

Each check is a separate sub-section in the output. Use the (i)/(ii)/(iii)/(iv) labels.

#### Premise check (i) — Open hypotheses

Does the parent investigation (or the spec being planned) explicitly leave premises open
that this task is treating as settled?

- Name any open premises the spec carries forward as if they were resolved facts.
- Identify what evidence or decision would resolve each open premise.
- Either gate the task on that resolution, or rescope to be premise-independent.

If no open premises exist, state that explicitly: "(i) No open premises identified."

#### Categorization check (ii) — Scope/label fit

Is the plan relying on a categorization (scope label, file pattern, tier, classifier
verdict) — and does that categorization actually fit the change's nature, or is it
inherited from a heuristic built for a different purpose?

- Name any categorization the plan depends on.
- Verify it was designed for this type of change (not just pattern-matched).
- If the categorization is suspect: file a separate task to fix the classifier rather than
building on its bad output. Do not proceed on a categorization you cannot validate.

If no categorization is relied on, state that explicitly: "(ii) No inherited categorization relied on."

#### Parallel-work check (iii) — In-flight overlap

Before recommending closure, amendment, or new tasks: run `mcp__minsky__tasks_search`
with subsystem keywords from the task being planned. Surface any in-flight tasks that
touch the same files, subsystem, or problem class.

This check fires the moment the planning flow generates a closure, amendment, or new-task
recommendation — not only on the actual `tasks_create` call.

Report any overlapping tasks found. If none: "(iii) No overlapping in-flight tasks found."

#### Framing check (iv) — Symptom vs. structure

Before recommending implementation, ask: "Is this fixing a symptom of a deeper structural
issue?"

If a fix repeatedly recurs in the same area (sanitizer iteration #N, prompt iteration #N,
classifier patch #N), surface the structural reframe as a follow-up RFC even when shipping
the tactical patch.

**Socratic-premise sub-check.** When stuck on a tactical recommendation, decompose the
operation being patched into its constituent parts. Ask: "What are the actual sub-operations
of this thing? Are they being conflated?" Do this decomposition explicitly as part of the
check, rather than only pattern-matching on cluster shape.

If no structural issue is suspected: "(iv) No recurring pattern identified; tactical fix is appropriate."
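One way to enforce "all four answers are stated" mechanically is to scan the planning output for the four labels. This is a sketch under the assumption that each answer begins a line with its label, as in the quoted fallback sentences; `missing_premise_answers` is a hypothetical helper, and it checks only that an answer exists, not that it is a good one.

```python
import re

REQUIRED_LABELS = ("(i)", "(ii)", "(iii)", "(iv)")


def missing_premise_answers(planning_output: str) -> list[str]:
    """Return the premise-audit labels that have no answer line in the output."""
    missing = []
    for label in REQUIRED_LABELS:
        # A label counts as answered when a line starts with it and text follows.
        pattern = r"^\s*" + re.escape(label) + r"\s+\S"
        if not re.search(pattern, planning_output, flags=re.MULTILINE):
            missing.append(label)
    return missing
```

An empty result would be the precondition for moving on to Step 3; any returned label means the audit is incomplete.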

### Step 3: Run the PLANNING → READY gate check

Evaluate every criterion in order, and keep evaluating after a failure. A single **fail**
blocks promotion to READY; surface all failures together so the user can address them in one
pass.
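The "collect everything, then report" behavior can be sketched as a loop that never short-circuits. The `Check` type and `run_gate` function are hypothetical; the sketch assumes each criterion is a function that takes the spec text and returns either a gap description or `None`.

```python
from typing import Callable, Optional

# Assumed shape: a check returns a gap description, or None when the criterion passes.
Check = Callable[[str], Optional[str]]


def run_gate(spec: str, checks: dict[str, Check]) -> list[tuple[str, str]]:
    """Run every check in insertion order and collect (criterion letter, gap) pairs."""
    gaps = []
    for letter, check in checks.items():
        gap = check(spec)
        if gap is not None:
            gaps.append((letter, gap))  # keep going so all failures surface together
    return gaps
```

An empty list means every criterion passed; a non-empty list means the task stays in PLANNING.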

#### Gate criterion (a) — Required spec sections present

The spec must have **all five** of the following top-level sections (exact heading text):

- `## Summary`
- `## Success Criteria`
- `## Scope`
- `## Acceptance Tests`
- `## Context`

Check each section's presence. Record any missing sections as blocking gaps.
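A minimal sketch of the presence check, assuming the spec is available as a Markdown string. `missing_sections` is a hypothetical helper; it matches heading text exactly and does not skip headings that appear inside fenced code blocks.

```python
REQUIRED_SECTIONS = (
    "## Summary",
    "## Success Criteria",
    "## Scope",
    "## Acceptance Tests",
    "## Context",
)


def missing_sections(spec: str) -> list[str]:
    """Return the required headings that do not appear as their own line in the spec."""
    headings = {line.rstrip() for line in spec.splitlines() if line.startswith("## ")}
    return [section for section in REQUIRED_SECTIONS if section not in headings]
```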

#### Gate criterion (b) — Success criteria are testable

Each item under `## Success Criteria` must be independently verifiable by an agent or a
human reviewer. Reject criteria that:

- Use vague language ("should work correctly", "behaves as expected", "is improved")
- Cannot be checked by running a command, reading a file, or calling a tool
- Are aspirational rather than observable

For each weak criterion, write a concrete revision and surface it as a gap.
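The first rejection rule (vague language) is the only one that lends itself to a mechanical pre-filter. The sketch below assumes the criteria have already been extracted as a list of strings. `flag_vague_criteria` is a hypothetical helper, and its phrase list contains only the three examples given above. Whether a criterion is checkable or merely aspirational still needs judgment.

```python
VAGUE_PHRASES = ("should work correctly", "behaves as expected", "is improved")


def flag_vague_criteria(criteria: list[str]) -> list[str]:
    """Return the criteria that contain one of the known vague phrases."""
    return [
        criterion
        for criterion in criteria
        if any(phrase in criterion.lower() for phrase in VAGUE_PHRASES)
    ]
```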

#### Gate criterion (c) — Scope is bounded

`## Scope` must contain explicit **In scope** and **Out of scope** (or equivalent) lists.
A scope section that only describes what is in scope (no out-of-scope list) is insufficient —
without an out-of-scope list, creep risk is unmanaged. Surface as a gap if missing.
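As a rough illustration, the boundedness check can be reduced to looking for both list labels inside the `## Scope` section. The helpers `scope_section` and `scope_is_bounded` are hypothetical, and the sketch only recognizes the literal phrases "In scope" and "Out of scope"; the "or equivalent" wording above would need a human or agent to judge.

```python
def scope_section(spec: str) -> str:
    """Return the body of the `## Scope` section, or an empty string if absent."""
    body, inside = [], False
    for line in spec.splitlines():
        if line.startswith("## "):
            inside = line.strip() == "## Scope"
            continue
        if inside:
            body.append(line)
    return "\n".join(body)


def scope_is_bounded(spec: str) -> bool:
    """True when the Scope section mentions both an in-scope and an out-of-scope list."""
    text = scope_section(spec).lower()
    return "in scope" in text and "out of scope" in text
```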

#### Gate criterion (d) — No blocking questions

Look for any open questions in the spec or in the task's history that would prevent starting
implementation. Indicators:

- "TBD" or "TODO" items inside the spec text
- Unresolved design decisions ("[open question: …]" patterns)
- Dependencies on unmerged PRs or incomplete tasks (check status of listed deps)

If blocking questions exist, list them explicitly. They must be answered before READY.
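The first two indicators are textual and can be found with a scan like the one sketched here. `MARKER` and `blocking_markers` are hypothetical names, and the pattern will also flag harmless uses of the word "TODO" (for example, as a status name), so hits are candidates to review rather than verdicts. Dependency status still has to be checked separately.

```python
import re

# Matches TBD/TODO as whole words, or an "[open question: ...]" marker.
MARKER = re.compile(r"\b(?:TBD|TODO)\b|\[open question:[^\]]*\]", re.IGNORECASE)


def blocking_markers(spec: str) -> list[tuple[int, str]]:
    """Return (1-indexed line number, matched text) for each marker in the spec."""
    hits = []
    for lineno, line in enumerate(spec.splitlines(), start=1):
        for match in MARKER.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits
```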

#### Gate criterion (e) — File:line references are fresh

For every `path/to/file.ts:N` reference in the spec:

1. Verify the file exists in the current codebase.
2. Verify the referenced code (function, class, constant) is still present within ±10 lines of line N.
3. If a reference is stale, note the stale ref and the correct location (or note it was deleted).

If no file:line references exist in the spec, this criterion passes automatically.
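The extraction and the ±10-line window can be sketched as two small helpers. Both names are hypothetical, and the regular expression is deliberately naive (it can also match strings such as `host.name:8080`). The sketch assumes the caller has already confirmed the file exists, read it into a list of lines, and knows which symbol the reference is meant to point at.

```python
import re

# Naive pattern for `path/to/file.ext:N`.
REF = re.compile(r"([\w./-]+\.\w+):(\d+)")


def extract_refs(spec: str) -> list[tuple[str, int]]:
    """Return (path, line) pairs for every file:line reference in the spec."""
    return [(m.group(1), int(m.group(2))) for m in REF.finditer(spec)]


def symbol_near_line(lines: list[str], line: int, symbol: str, window: int = 10) -> bool:
    """True if `symbol` appears within `window` lines of the 1-indexed `line`."""
    lo = max(0, line - 1 - window)
    hi = min(len(lines), line + window)
    return any(symbol in text for text in lines[lo:hi])
```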

#### Gate criterion (f) — Subtasks filed for multi-phase work

If the task spec describes work that spans multiple independent phases, components, or team
boundaries, confirm that child subtasks have been filed (check `mcp__minsky__tasks_children`).
If the parent has no children but the work clearly decomposes, surface "subtasks not yet filed"
as a blocking gap and propose the decomposition.

Single-phase tasks pass this criterion automatically.

#### Gate criterion (g) — No parallel work in flight

Before a task can be READY, verify no other in-flight work covers the same files, signatures,
or symptoms. Three required checks; **any hit is a blocking gap** until resolved (the user
chooses: wait, coordinate, reframe scope, or explicitly acknowledge).

Rationale: this gate operationalizes `feedback_check_parallel_work_before_decomposing`.
Three recurrences in three days proved memory-only enforcement insufficient (mt#1192/mt#1199,
mt#1068/mt#1240, mt#1261/mt#1281, plus the meta-incident: mt#1299 vs mt#1305 itself).

Run all three:

1. **Path/file-collision check**: for each file/path listed in the spec's
   `## Scope` → `In scope` section:

   - Call `mcp__github__list_pull_requests` with `state: "open"` and inspect titles/branches.
   - For high-suspicion matches, call `mcp__github__pull_request_read` with `method: "get_diff"`
     to confirm the PR actually touches the path.
   - Also check recent merges: `mcp__minsky__git_log` with the file path filter for the
     last 7 days. A fix that just landed on `main` is just as bad as one in flight.

2. **Signature search**: for the spec's signature phrases (specific identifier names,
   error message strings, env var names, migration slot numbers):

   - Call `mcp__minsky__tasks_search` with each phrase. Inspect any IN-REVIEW, IN-PROGRESS,
     or recently-DONE matches.
   - For bug tasks, also run `mcp__minsky__git_log` with `--grep=<phrase>` against `main` for
     recently-merged commits.

3. **Parent/sibling enumeration**: if the task has a parent:

   - Walk `mcp__minsky__tasks_parent` then `mcp__minsky__tasks_children` to enumerate the
     full sibling/descendant set.
   - For each related task ID, call `mcp__minsky__session_pr_list` with `status: "open"`
     and `task: "mt#X"`; surface any open PR.

If any check hits, surface findings as a blocking gap with task/PR IDs and the specific
overlap (file, phrase, or sibling). Do NOT promote to READY until the user resolves the
overlap.

If no check hits, this criterion passes.
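Once the open PRs and their changed files have been fetched, the path-collision check comes down to an intersection. The sketch below assumes that data is already in hand as a plain dictionary. `path_collisions` is a hypothetical helper, the PR identifiers in the usage are placeholders, and the tool calls themselves are not modeled.

```python
def path_collisions(
    scope_paths: list[str], pr_files: dict[str, list[str]]
) -> dict[str, list[str]]:
    """Map each PR id to the in-scope paths it touches.

    A scope entry matches a changed file when they are equal, or when the
    scope entry is a directory that contains the file.
    """
    hits = {}
    for pr, files in pr_files.items():
        touched = sorted(
            {
                scope
                for scope in scope_paths
                for changed in files
                if changed == scope or changed.startswith(scope.rstrip("/") + "/")
            }
        )
        if touched:
            hits[pr] = touched
    return hits
```

Any non-empty result would be surfaced as a blocking gap naming the PR and the overlapping path.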

#### Gate criterion (h) — Contract-propagation enumeration

When the task retires or modifies a contract — a function/type signature, skill text, command
name, env-var name, config key, or schema field — the spec's `## Scope` → `In scope` section
must explicitly enumerate the downstream consumers of that contract. A spec that names the
retired or changed artifact without listing who reads or depends on it is incomplete and must
not proceed to READY.

Rationale: four incidents between 2026-05-06 and 2026-05-08 were traced to exactly this gap.
In each case the spec correctly identified the artifact being changed but missed one or more
consumer classes, causing silent breakage after merge:

- **mt#1551** — retired the `/verify-task` audit gate without enumerating the skill files
referencing it; caused idle-drift on PR #970.
- **mt#1086** — added required fields to `ReviewerConfig` without enumerating test fixtures;
CI on main was broken for ~24 hours.
- **mt#1610 (doc-side)** — enumerated 25+ in-scope code sites but missed three documentation
files (`docs/configuration-guide.md`, `docs/repository-configuration.md`,
`docs/github-issues-backend-guide.md`).
- **mt#1610 (Railway env-var side)** — spec claimed "Sole consumer is `~/.config/minsky/config.yaml`"
but the Railway-deployed `minsky-mcp` service was also a consumer with its own
`MINSKY_SESSIONDB_*` env vars. Production crashed 2026-05-08T00:09Z; fixed via mt#1624 / PR #976.

This criterion encodes the escalation policy of the `contract_propagation_at_design_time`
memory (id `513934fa-3000-4f67-8869-2d50598f484b`): when a fourth instance surfaces, add
Gate criterion (h).

**Trigger condition.** This criterion fires when the spec describes any of:

- Retiring, renaming, or changing the signature of a function, type, interface, or class
- Renaming or retiring a skill, command, or CLI subcommand
- Renaming or retiring an env-var or config key
- Changing a schema field name, type, or required-status

If none of these apply, this criterion passes automatically. State that explicitly:
"(h) No contract modification — criterion passes."

**Consumer enumeration heuristic by change type.** For each category of change, the spec's
`## Scope` → `In scope` list must cover all of the following:

| Change type | Consumers to enumerate |
| ------------------------- | -------------------------------------------------------------------------------------------------------------------------------- |
| Function / type signature | All call sites and imports in `src/`, `tests/`, `services/`, `.github/` |
| Skill text / command name | All skill files under `.claude/skills/` and `.claude/agents/`, all `CLAUDE.md` sections that reference the skill/command by name |
| Env-var rename | All reads in `src/`, `services/`, `scripts/`, `.github/` **and** deployed-environment artifacts (see below) |
| Config key / schema field | All reads in `src/`, `tests/`, `services/`, `.github/`, `docs/` **and** deployed-environment artifacts (see below) |

**Deployed-environment artifacts (required callout for env-var and config-key changes).**
Source-code consumers are not the only consumers. When an env-var or config key changes, the
following deployed-environment locations must be explicitly checked and enumerated or ruled out:

- **Railway service env vars** — any Railway service that sets or reads the variable
(visible in `services/*/railway.config.ts`, Railway dashboard env-var declarations, and
`railway.json` / `railway.toml` files if present)
- **CI/CD env declarations** — `.github/workflows/*.yml` files that set the variable via
`env:` blocks or `secrets:` references
- **In-tree service configs** — `services/*/railway.config.ts` and any other
service-config files in the `services/` directory that reference the key

**Check steps:**

1. Read the spec and identify whether it describes any of the trigger-condition change types.
If not, record "(h) passes — no contract modification."
2. If triggered, identify the specific artifact(s) being changed (names, paths, key names).
3. For each artifact, look up its consumer class in the heuristic table above.
4. Verify the spec's `## Scope` → `In scope` list covers each consumer class. Missing
consumer classes are blocking gaps.
5. For env-var and config-key changes specifically: confirm the spec explicitly addresses each
of the three deployed-environment artifact categories, either enumerating consumers or
stating "no consumers in this category."

A spec that says "sole consumer is X" without a verified sweep of the consumer classes does
not satisfy this criterion — the claim must be grounded in an actual search, not an assumption.
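Check steps 3 and 4 amount to a set difference against the heuristic table. The sketch below transcribes the table into a dictionary. The key names, the `"deployed-environment"` placeholder, and `missing_consumer_classes` are hypothetical, and `addressed` is assumed to hold the consumer classes the spec either enumerates or explicitly rules out after a search.

```python
# Transcribed from the heuristic table above; key names are illustrative only.
CONSUMER_CLASSES = {
    "function_or_type_signature": ["src/", "tests/", "services/", ".github/"],
    "skill_text_or_command_name": [".claude/skills/", ".claude/agents/", "CLAUDE.md"],
    "env_var_rename": ["src/", "services/", "scripts/", ".github/", "deployed-environment"],
    "config_key_or_schema_field": [
        "src/", "tests/", "services/", ".github/", "docs/", "deployed-environment",
    ],
}


def missing_consumer_classes(change_type: str, addressed: set[str]) -> list[str]:
    """Return the consumer classes the spec has neither enumerated nor ruled out."""
    return [c for c in CONSUMER_CLASSES[change_type] if c not in addressed]
```

Each returned class is a blocking gap under criterion (h).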

### Step 4: Act on gate results

**All gate criteria pass:**

1. Report the gate summary (all green).
2. Call `mcp__minsky__tasks_status_set` to transition the task to **READY**.
3. **Continue the lifecycle: invoke `/implement-task mt#X` directly** (do NOT stop and hand the next-step instruction back to the user). Per CLAUDE.md User Preferences ("Take direct action without asking: When the next step is clear, proceed immediately"), the post-READY default IS implementation. Stopping at READY with "Use `/implement-task` to begin" wording is the failure mode this step was rewritten to prevent (originating incident 2026-05-11; prior incident 2026-04-30 captured in memory `feedback_auto_mode_chains_skills_at_affirmative_tokens`, id `4b83ff51-4bc2-49f5-84be-7e4eac073125`).

**Only halt before `/implement-task` if** one of these explicit halt conditions holds:

- The user said something during planning that explicitly defers implementation ("don't implement yet", "just plan it", "I'll handle the impl").
- The READY transition itself surfaced a new blocking signal (e.g., dependency status check failed mid-transition).
- The task is gated on an external decision the user owns (e.g., "spec needs your approval before impl"), explicitly stated in the spec.

**Do NOT halt for any of these reasons** (each was a confabulated halt rationale in the originating incident):

- "Planning is the skill's scope; implementation is a separate skill."
- "User might want to review the gate report before I proceed."
- "The next move is user-driven."

When a brief affirmative ("proceed", "continue", "go", "ok", "yes") arrives at any planning hand-off point, treat it as confirmation to walk the chain forward — NOT as acknowledgment to stop. The bridge memory `4b83ff51` covers this verbatim; this step encodes the same discipline structurally so the agent doesn't have to recall the memory at hand-off time.

**Tracking task for the structural chaining mechanism:** mt#1478 (Auto-mode skill chaining: /plan-task → /implement-task → /prepare-pr → /review-pr walk the chain at gate-passes). When mt#1478's other deliverables ship (implement-task, prepare-pr, review-pr SKILL amendments + CLAUDE.md doc section), the chain is fully structural and this paragraph can be retired.

**One or more gate criteria fail:**

1. Do **not** call `tasks_status_set` → READY.
2. Task remains in PLANNING.
3. Present a structured gap report:

```
## Gap Report for mt#X (PLANNING — not yet READY)

### Blocking gaps
- [criterion letter] <description of gap>
- [criterion letter] <description of gap>

### Required actions before READY
1. <concrete action the user or agent must take>
2. <concrete action the user or agent must take>

To re-run the gate after fixes: `/plan-task mt#X`
```

4. Stop. Do not attempt to patch the spec automatically unless the user explicitly asks.
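The gap-report template above can be filled in programmatically. This is a sketch only: `render_gap_report` is a hypothetical helper that assumes gaps arrive as (criterion letter, description) pairs and actions as plain strings.

```python
def render_gap_report(
    task_id: str, gaps: list[tuple[str, str]], actions: list[str]
) -> str:
    """Render the gap-report template for a task that failed the gate."""
    lines = [
        # \u2014 is the dash character used in the template's title line.
        f"## Gap Report for {task_id} (PLANNING \u2014 not yet READY)",
        "",
        "### Blocking gaps",
    ]
    lines += [f"- [{letter}] {description}" for letter, description in gaps]
    lines += ["", "### Required actions before READY"]
    lines += [f"{i}. {action}" for i, action in enumerate(actions, start=1)]
    lines += ["", f"To re-run the gate after fixes: `/plan-task {task_id}`"]
    return "\n".join(lines)
```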

**Example (h) failure.** For a task that renames a config key (e.g., `sessionDbPath` →
`sessiondb.path`) whose spec says "Sole consumer is `~/.config/minsky/config.yaml`":

```
## Gap Report for mt#1610 (PLANNING — not yet READY)

### Blocking gaps
- (h) Contract-propagation enumeration: spec claims sole consumer of `MINSKY_SESSIONDB_*`
is `~/.config/minsky/config.yaml` but does not enumerate deployed-environment consumers.
Missing: Railway service env vars (`MINSKY_SESSIONDB_PATH`, `MINSKY_SESSIONDB_AUTH_TOKEN`
set on `minsky-mcp` Railway service), CI/CD env declarations (`.github/workflows/`
references), and in-tree service configs (`services/*/railway.config.ts`).

### Required actions before READY
1. Add the Railway env-var consumers to `## Scope` → `In scope`:
"Railway `minsky-mcp` service env vars: MINSKY_SESSIONDB_PATH, MINSKY_SESSIONDB_AUTH_TOKEN"
2. State explicitly whether CI/CD workflows or in-tree service configs reference this key
(or confirm they do not after a verified grep).

To re-run the gate after fixes: `/plan-task mt#1610`
```

## State transition map

| Current status | Action |
| -------------- | ------------------------------------------------ |
| TODO | → PLANNING (first step), then investigate + gate |
| PLANNING | Skip transition, investigate + gate |
| READY | Report already READY, stop (confirm to re-run) |
| IN-PROGRESS | Out of scope for this skill; inform user |
| IN-REVIEW | Out of scope for this skill; inform user |
| DONE | Out of scope for this skill; inform user |
| BLOCKED | Surface blocker, do not transition |

## Key constraints

- **Never set DONE** — only the merge + post-merge audit flow does that.
- **Never start a session** — that is `/implement-task`'s responsibility.
- **Never create the task** — use `/create-task` for new tasks.
- **Idempotent transitions** — calling `tasks_status_set` → PLANNING when already PLANNING
is a no-op; the skill handles this by reading status first.
- **Premise audit must precede spec-quality gate check** — READY recommendations, closure
  recommendations, amendment recommendations, and new-task creation calls are blocked until
  all four premise-audit checks (i)–(iv) have explicit answers in the agent's output.

## Reframe-trigger ergonomics

There is no reliable harness-level intervention that _produces_ a reframe. The harness can
block premature transitions and require audit answers, but it cannot force the agent to
recognize a structural pattern it has not already seen.

The load-bearing prompt-shape that unlocks a reframe is **Socratic premise-interrogation by
the user**: asking "what exactly is this fixing?", "what are the sub-operations?", "is this
the third time we've patched this?" These questions surface assumptions the agent has
silently inherited.

This skill encourages the agent to apply that Socratic shape to itself during the framing
check (iv): decompose the operation being patched, question whether sub-operations are being
conflated, and check whether the cluster of prior fixes points to a structural gap rather
than a series of independent incidents.

The agent should not wait for the user to ask these questions. If the framing check (iv)
produces no structural reframe, the agent should explicitly document why — not silently
pass.

## Regression example

**Example failure (2026-04-27, mt#1357 investigation).** Investigating three child tasks of
a sanitizer-cluster investigation, the agent:

(a) treated parent-investigation correlation as causation without checking what would
resolve the open hypothesis — premise check (i) failure;

(b) anchored on existing scope-calibration architecture when the actual problem was an
output-format issue, not a rigor-calibration issue — categorization check (ii) failure;

(c) inherited a classifier's verdict as truth (skill files matching `*.md` therefore being
"docs") — a second categorization check (ii) failure;

(d) skipped the parallel-work check because investigation felt like not-yet-acting — a
parallel-work check (iii) failure.

The user's premise-checking questions surfaced all four errors. The structural fix (this
premise-audit step) would have produced the same answers without that prompting.
