---
name: grill-with-docs
description: "Requirements elicitation anchored on existing domain model docs (CLAUDE.md, PRD, .blueprint/). Use when 'grill on feature X', 'requirements elicitation', 'spec the feature', or before implementing anything ambiguous."
mode: [engineering]
effort: medium
version: 2.1.0
tier: [admin]
attribution: "Cherry-picked from mattpocock/skills (MIT, see CREDITS.md). Socratic enhancements: sp5 2026-05-02. Re-merged upstream practical-workflow tactics + ported CONTEXT-FORMAT.md + ADR-FORMAT.md companion files: 2026-05-03 (per memory/audit-mattpocock-drift-2026-05-03.md)."
see_also: [vision-alignment, brainstorming, plan-builder, to-prd, scope-check, decision-log]
thinking_mode: adaptive
---
# Grill With Docs — Socratic Requirements Elicitation
> Stress-test a plan or feature idea against the project's existing domain model. Sharpen
> terminology, surface hidden variables, and force precision BEFORE implementation. Anchored
> on the Synapse + AXOIQ documented landscape (CLAUDE.md, `.blueprint/PRD-*`, MEMORY.md, ADRs).
>
> **SOTA 2026** — enhanced with Socratic Q-Tree (3 levels), Adversarial Debate (proponent vs
> skeptic), and Hidden-Var Drag (5 categories). Precedes `/plan-builder` in the standard flow.
## When to Invoke
### Manual triggers
- `/grill-me "feature description"` (canonical — use command for UX)
- "grill on feature X" / "grill me on this" / "grill-with-docs"
- "requirements elicitation" / "elicit requirements"
- "spec the feature" / "stress-test this plan"
- "challenge this against the domain model"
### Automatic triggers
Auto-fire BEFORE implementation when ANY of these signals:
- User asks to implement a feature with < 3 sentences of spec
- User uses domain terms (`E-Part`, `M-Part`, `WBS`, `package_material_rules`) without binding to concrete behaviour
- A new endpoint/service/page is requested but no PRD section exists yet
- User says "add X" without naming the persona, the entry-point, or the data source
### Skip triggers (do NOT grill)
- A PRD section in `.blueprint/PRD-SYNAPSE-ENTERPRISE.md` already covers this feature
→ call out the section + skip elicitation
- User has produced a written spec / sub-plan in `.blueprint/plans/*.md`
→ run `vision-alignment` instead (alignment, not elicitation)
- Trivial bugfixes / one-line config tweaks
- Pure refactor with no behaviour change
## Anti-Pattern Guard
Before grilling, scan the documented landscape:
```bash
# Verify the user hasn't already specced this
rg -l "<feature-keyword>" .blueprint/PRD-*.md .blueprint/plans/ memory/ 2>/dev/null
```
If hits found → STOP grilling, surface the existing doc, ask user if they want to
**extend** the existing spec or **replace** it. Do NOT re-elicit from scratch — that
wastes user time and introduces drift.
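As a minimal, self-contained illustration of the extend-vs-replace decision (the file name and keyword below are invented for the demo; the real scan is the `rg` call above):
```shell
# Sketch of the guard decision on a throwaway fixture.
tmp=$(mktemp -d)
printf '# PRD section\nauto-classify rules\n' > "$tmp/PRD-demo.md"

if grep -rl "auto-classify" "$tmp" >/dev/null 2>&1; then
  guard=hit    # existing doc found: surface it, ask extend vs replace
else
  guard=miss   # no prior spec: proceed to the workflow below
fi
echo "$guard"
rm -rf "$tmp"
```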
## Workflow
### Step 1 — Context Loading (read-only, fast)
Scan these sources in this order. Use `rg` and `grep`, NOT full reads:
| Source | Looks For |
|--------|-----------|
| `CLAUDE.md` (root + worktree) | Synapse principles, MBSE 4-layer, part lifecycle, persona list |
| `.blueprint/PRD-SYNAPSE-ENTERPRISE.md` | Existing feature specs, persona definitions |
| `.blueprint/MEGA-PLAN.md` | Cross-plan strategic context |
| `.blueprint/plans/INDEX.md` + targeted sub-plans | Adjacent or parent work |
| `memory/MEMORY.md` + `memory/feedback_*.md` | User preferences, prior decisions, lessons |
| `.blueprint/AUDITS-REGISTRY.md` | Constraints from past audits (corpo-leak, Zustand, TVT) |
| Code (only if domain term ambiguous) | `rg "class X" backend/app/` to confirm canonical naming |
Output of Step 1: a 5-7 line "context summary" — what the docs already say about this
feature area. Surface to user explicitly: "Here's what the docs already establish: …"
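A hypothetical context summary, showing the shape only (the feature, sections, and findings below are invented):
```
Here's what the docs already establish:
- PRD § Estimation covers manual hour entry but has no spec for auto-recompute.
- CLAUDE.md: estimates sit at the ESTIMATE node, downstream of PROCURE.
- MEMORY.md: user previously rejected AI-assisted estimation (determinism rule).
- AUDITS-REGISTRY.md: TVT audit constrains estimate writes to the service layer.
- No sub-plan in .blueprint/plans/ currently touches estimation.
```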
### Step 2 — Generate Grilling Questions
Produce **5-7 questions** targeting domain-model gaps. Cover at minimum:
1. **Persona** — Which of the 8 personas (I&C / EL / ME / Process / PM / Procurement /
Admin / Client) is the primary user? Secondary?
2. **Engineering chain placement** — Where does this fit in `IMPORT → CLASSIFY → ENGINEER
→ SPEC GROUP → E-BOM → PROCURE → ESTIMATE → OUTPUTS`?
3. **Part lifecycle** — Does this touch G-Part / E-Part / M-Part / I-Part / S-Part? Which
transitions?
4. **MBSE layer** — Which of the 4 layers (QUOI / OÙ / COMMENT / QUI — what / where / how / who) does this affect?
5. **Multi-tenant** — How does the `project_id` filter work for this? Do THM-012 and the 4 REF
   projects behave the same?
6. **Data source** — Where does the data live? `material_catalog` / `package_material_rules`
/ `frm_item_presentation` / `instruments` / new table?
7. **Determinism** — Same input = same output? Any AI dependency to flag?
For each question, also propose a **recommended answer** based on Step 1 context. The
user can accept / refine / reject. Recommended answers anchor the conversation.
### Step 3 — AskUserQuestion (MANDATORY per ATLAS rule)
Ask the questions **one at a time** via the `AskUserQuestion` tool — never via free text.
This is non-negotiable per `~/.claude/CLAUDE.md` global rule.
Per question:
- Pose the question with the recommended answer as default option
- Wait for the user's reply before asking the next
- If the user can't answer or says "tu décides" ("you decide") → log a gap, propose the
  most-likely default, and mark it `assumed:` in the spec output
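A sketch of one question turn, content only (the exact `AskUserQuestion` tool schema is not reproduced here, and the feature is invented):
```
Q2/6 — Engineering chain placement
Where does auto-recompute fit in the chain?
  [A] ESTIMATE (recommended — it consumes procured quantities)
  [B] E-BOM (recompute at BOM build time)
  [C] Other (describe)
```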
### Step 4 — Sharpen / Cross-Reference
While grilling:
- **Glossary clash** — if the user uses a term that conflicts with `CLAUDE.md` or
`.blueprint/PRD-*` → call it out: "You said `M-Part`, but the lifecycle defines that
as a procurement-stage part. Did you mean `E-Part` (engineering stage)?"
- **Code clash** — if user states a behaviour that contradicts the code → surface it:
"Your description says `recompute_hours` ignores salvage rate, but the branching in
`backend/app/services/estimate_service.py` includes it."
- **Concrete scenario** — for each ambiguity, invent one concrete scenario. "Walk me
through THM-012 building 3, panel 2A, package P-201 — what happens?"
### Step 5 — Detect Showstoppers (refuse to proceed)
Refuse to produce a spec if ANY of these unresolved:
- No persona identified (≠ "everyone uses it")
- Multi-tenant filter undefined (would leak data across projects)
- Data source undefined ("we'll figure it out") = future hardcode incident
- Engineering chain placement unclear (cannot wire into existing pipeline)
If a showstopper is unresolved → tell the user, document the gap in `memory/grill-gap-<topic>.md`,
abandon the elicitation cleanly. Better to refuse than produce a wishful spec.
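One possible shape for the gap log (section names are a suggestion, not a fixed schema):
```
# Grill gap — <topic>
- Date / session: …
- Feature as stated: one-sentence restatement
- Questions answered: e.g. 4/7
- Showstopper(s): e.g. multi-tenant filter undefined
- What would unblock: the decision or data needed to resume
```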
### Step 6 — Produce Spec Output
If all questions answered → produce a **spec markdown** ready for `to-prd` or
`plan-builder` downstream:
```
.blueprint/plans/draft-<feature-slug>.md
```
Sections (minimum):
- **Persona(s)** — primary + secondary
- **Engineering chain placement** — node in the pipeline
- **Part lifecycle touched** — which G/E/M/I/S transitions
- **MBSE layer(s)** — QUOI / OÙ / COMMENT / QUI
- **Multi-tenant rule** — how `project_id` is enforced
- **Data source** — table(s) + columns + new schema if any
- **Determinism contract** — explicit "no AI in prod" reaffirmation
- **Acceptance scenarios** — 3-5 concrete scenarios from Step 4
The spec is a **draft** — `to-prd` (W4.6) folds it into `PRD-SYNAPSE-ENTERPRISE.md`,
`plan-builder` produces the executable sub-plan.
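A minimal skeleton for `draft-<feature-slug>.md`, mirroring the section list above (headings and layout are a suggestion):
```
# Draft spec — <feature-slug>
## Persona(s)
Primary: … | Secondary: … | Anti-user: …
## Engineering chain placement
…
## Part lifecycle touched
…
## MBSE layer(s)
…
## Multi-tenant rule
…
## Data source
…
## Determinism contract
Same input = same output; no AI in prod.
## Acceptance scenarios
1. …
```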
## Pair With (downstream)
| Skill | Role |
|-------|------|
| `vision-alignment` | If feature feels strategic — run BEFORE grilling for roadmap fit |
| `brainstorming` | If multiple architectural options — run BEFORE grilling to enumerate |
| `to-prd` (W4.6) | Folds the grill output into the PRD |
| `plan-builder` | Produces the 15+5 section sub-plan from the PRD section |
| `scope-check` | Mid-execution drift detection on the grilled spec |
## Pocock → ATLAS Adaptations
| Pocock pattern | ATLAS adaptation |
|----------------|-------------------|
| `CONTEXT.md` glossary | Synapse uses CLAUDE.md + `.blueprint/PRD-*` as SSoT — no separate glossary file |
| `docs/adr/` lazy creation | ATLAS uses `.blueprint/AUDITS-REGISTRY.md` + `.claude/decisions.jsonl` — log there, not new ADR per term |
| Inline CONTEXT.md updates | Inline `.blueprint/PRD-SYNAPSE-ENTERPRISE.md` updates — only via `to-prd` downstream, never directly here |
| Free-text questions | **MANDATORY `AskUserQuestion`** per global rule (non-negotiable) |
| ADR offered sparingly | ADRs in `infrastructure/docs/ADR/` (homelab) or `projects/atlas-plugin/docs/ADR/` (atlas) — leave to `decision-log` skill |
| Single-language English | Bilingual FR/Quebecois friendly — questions can be French if user French |
## Anti-Patterns
1. **Grilling without context load (Step 1)** — produces generic questions, wastes user's time
2. **Grilling AFTER spec exists** — re-elicits what's already documented → drift
3. **Free-text questions** — violates global AskUserQuestion rule
4. **Producing a spec without persona** — every Synapse feature MUST name a persona
5. **Ignoring multi-tenant** — features that don't filter `project_id` = data leak risk
6. **Batching questions** — the Pocock rule is one question at a time; batching loses context
7. **Refusing to refuse** — if showstoppers unresolved, abandon. Wishful specs cost weeks
to retract (see `memory/feedback_time_estimation_always_wrong.md`)
8. **Using grill-me to write PRD** — `/grill-me` produces a draft spec only. `/to-prd`
folds the draft into `PRD-SYNAPSE-ENTERPRISE.md`. **NEVER** write to the PRD from here.
---
## Socratic Q-Tree (SOTA 2026 Enhancement)
> Reference: `references/socratic-patterns.md` | Source: Socratic-Solver (emergentmind.com)
> Three levels of questioning depth, auto-selected based on ambiguity signals.
### Level Selection Logic
| Signal detected | Run levels |
|-----------------|-----------|
| Feature described in ≥ 5 sentences, persona named, data source specified | L1 only |
| Feature described in < 5 sentences OR missing persona OR data source unclear | L1 + L2 |
| L2 surfaces 2+ unstated assumptions OR `--level 3` flag | L1 + L2 + L3 |
| Multi-tenant/auth touched OR 2+ critical risks at L3 OR `--debate` flag | L1 + L2 + L3 + Debate |
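The selection table can be sketched as a small shell function. The signal names and flag encoding are invented, and the multi-tenant/auth signal for Debate is reduced to a risk count for brevity; this is an illustration of the logic, not a real CLI:
```shell
# Map ambiguity signals to the questioning levels to run.
select_levels() {
  sentences=$1; persona_named=$2; source_clear=$3; l2_gaps=$4; l3_risks=$5
  out="L1"
  if [ "$sentences" -lt 5 ] || [ "$persona_named" = no ] || [ "$source_clear" = no ]; then
    out="$out+L2"                                  # vague feature: challenge it
    [ "$l2_gaps" -ge 2 ] && out="$out+L3"          # unstated assumptions: constrain
  fi
  [ "$l3_risks" -ge 2 ] && out="$out+Debate"       # critical risks: adversarial pass
  echo "$out"
}

levels=$(select_levels 3 no no 2 0)   # vague feature, 2 unstated assumptions
echo "$levels"
```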
### Level 1 — Clarify (always)
**Goal**: establish identity. 5 fixed questions tied to Synapse domain model.
1. **Persona** — Which of the 8 personas (I&C / EL / ME / Process / PM / Procurement /
Admin / Client) is primary? Secondary? Anti-user (who must NOT use it)?
2. **Chain placement** — Where does this fit in `IMPORT → CLASSIFY → ENGINEER → SPEC GROUP
→ E-BOM → PROCURE → ESTIMATE → OUTPUTS`?
3. **Part lifecycle** — Which G→E→M→I→S transitions does this trigger or depend on?
4. **Data source** — Which table(s): `material_catalog`, `package_material_rules`,
`frm_item_presentation`, `instruments`, or new?
5. **Multi-tenant** — How does `project_id` filtering apply? Do THM-012 AND the 4 REF projects
   behave identically?
Output: context summary + 5 questions with recommended answers.
### Level 2 — Challenge
**Goal**: surface assumptions the user didn't articulate. 3-5 questions using 3 tactics.
**Tactic A — Reverse the spec**: "If we DON'T build this, what breaks or stays manual?"
This surfaces whether the feature is essential or nice-to-have.
**Tactic B — Counter-example**: "Give me one concrete scenario where your stated rule
produces the wrong result." Forces precision on edge conditions.
**Tactic C — Dependency probe**: "What must already be true (in the data, in the code,
in the user's workflow) for this to work?" Uncovers hidden prerequisites.
Example Level 2 questions:
- "You said 'auto-classify'. If classification returns no match, what happens?"
- "If the same package exists in THM-012 AND BRTZ, should the rule apply identically?"
- "What existing feature would break if we deployed this as described?"
### Level 3 — Constrain
**Goal**: bound the solution space before design begins. 3-4 constraint questions.
**Type P (Performance)**: "What is the acceptable response time for X, given Y records?"
**Type S (Security)**: "Who must NOT see this data? What role is required to execute this?"
**Type R (Rollback)**: "If we revert this feature in 2 weeks, what data becomes permanently orphaned?"
**Type E (Edge)**: "What happens at zero records? At 10,000? Under concurrent edits?"
Level 3 output includes: 2-3 constraint statements + 1-2 flagged risks ready for decision-log.
---
## Adversarial Debate (SOTA 2026 Enhancement)
> Reference: `references/adversarial-debate.md` | Source: Mitsubishi multi-agent pattern (2026)
> Two conceptual sub-agents stress-test the spec from opposing directions.
> Pattern: atlas-team compatible (the two roles can be spawned as real parallel agents in V2).
### When to Run
- Explicit `--debate` flag
- Level 3 surfaces 2+ critical risks
- Feature touches: multi-tenant data, authentication, billing/cost calculations, HITL gates
### Proponent Agent
**Role**: "Defend this spec as stated. Find the 3 strongest reasons this approach is correct."
Inputs: draft spec from Levels 1-3 + existing domain-model docs.
Produces:
1. 3 arguments FOR the approach (anchored on domain-model evidence)
2. 1 risk the approach explicitly accepts (and why it's acceptable)
3. 1 success criterion that would prove this works
### Skeptic Agent
**Role**: "Challenge this spec. Find the 3 most likely production failure modes."
Inputs: same as proponent.
Produces:
1. 3 points where the spec breaks in production (concrete failure scenario per point)
2. 1 alternative approach that avoids the highest-risk assumption
3. 1 hidden dependency the spec silently relies on
### Resolution (Orchestrator)
After both agents respond:
1. Identify 1-2 genuine disagreements (proponent's strength vs skeptic's failure mode)
2. Pose each genuine disagreement to user via `AskUserQuestion` for resolution
3. Log resolved disagreements as decision-log entries (`.claude/decisions.jsonl`)
4. Unresolved disagreements after 2 turns → showstopper (abandon cleanly)
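A hypothetical shape for a resolved-disagreement entry in `.claude/decisions.jsonl` (field names and values are invented; match the project's actual schema):
```json
{"date": "2026-05-03", "skill": "grill-with-docs", "topic": "auto-recompute salvage rate", "proponent": "include salvage per estimate_service branching", "skeptic": "salvage data missing for REF projects", "resolution": "include; backfill REF salvage defaults", "resolved_by": "user"}
```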
---
## Hidden-Var Drag (SOTA 2026 Enhancement)
> "Hidden variables" = invisible spec dependencies that drag the feature toward production
> failure if not surfaced during elicitation. Run AFTER Level 2, BEFORE Level 3.
### 5 Categories
| Category | Probe questions | Risk if missed |
|----------|----------------|----------------|
| **domain** | Term matches CLAUDE.md glossary? MBSE layer correct? Chain placement confirmed? | Terminology drift → wrong implementation |
| **persona** | Named persona from 8? Secondary user? Anti-user (who must NOT)? | Features built for "everyone" work for no one |
| **edge** | Zero records? Max records? Concurrent edits? Partial import? | 3 AM failures on edge inputs |
| **perf** | Hot path? (see `performance-discipline.md`) Max latency? ParadeDB BM25 or PG? | Perf budget violated post-launch |
| **sec** | `project_id` filter confirmed? Audit trail needed? Role required? RBAC impact? | Multi-tenant data leak (irreversible) |
### Probe Execution
For each category, run 1-2 targeted questions based on the feature description.
A category is "covered" when its core invariant is confirmed (not necessarily perfect).
An uncovered category at this stage → **promoted to a Level 3 constraint question**.
### Hidden-Var Summary
After probing, emit a table:
```
| Category | Status | Gap (if any) |
|----------|----------|---------------------------------|
| domain | ✅ clear | — |
| persona | ⚠️ gap | Anti-user not defined |
| edge | ✅ clear | — |
| perf | ⚠️ gap | Latency budget not stated |
| sec | ✅ clear | project_id filter confirmed |
```
Gaps → promoted to Level 3 or showstopper (if sec category is uncovered).
---
## /grill-me ≠ /to-prd (NON-NEGOTIABLE Distinction)
| Dimension | `/grill-me` | `/to-prd` |
|-----------|-------------|-----------|
| **Output** | `draft-<slug>.md` + decision-log entry | Section in `PRD-SYNAPSE-ENTERPRISE.md` |
| **Trigger** | Feature is ambiguous, pre-implementation | Spec is sharp, post-grilling |
| **Audience** | Developer + AI (internal, iterative) | Stakeholders + future devs (canonical) |
| **Produces PRD?** | ❌ NEVER | ✅ YES |
| **Decision-log?** | ✅ ALWAYS | ❌ Not its job |
| **Reversible?** | ✅ draft can be discarded | ⚠️ PRD edit = official record |
**Refusal pattern**: if user says "grill me and write the PRD" → `/grill-me` MUST:
1. Complete grilling and produce the draft spec
2. State explicitly: "Draft spec created. Use `/to-prd` to fold into the PRD."
3. NOT touch `PRD-SYNAPSE-ENTERPRISE.md`
This is a golden.jsonl refusal case (sp5-014) — weight 2.5.
## Output Discipline
- Per turn: **1 question** via `AskUserQuestion`, ≤ 5 lines of preamble, recommended answer included
- Per session: **1 draft spec** (`draft-<slug>.md`), gap log if abandoned, decision-log entry always
- Never: free-text question, multi-question turn, spec without persona, PRD write
## References
- Source: https://github.com/mattpocock/skills `engineering/grill-with-docs`
- License: MIT (see `CREDITS.md` after batch consolidation)
- Plan (v1): `.blueprint/plans/ultrathink-regarde-ce-qui-abundant-petal.md` Section H W4.5
- Plan (v2): `.blueprint/plans/sp5-grill-me-socratic-2026.md`
- Memory: `memory/lesson_cherrypick_distribute_vs_dedicated_wave.md`
- Socratic patterns: `skills/grill-with-docs/references/socratic-patterns.md`
- Adversarial debate: `skills/grill-with-docs/references/adversarial-debate.md`
- Command: `commands/grill-me.md`
- Companion: `vision-alignment`, `brainstorming`, `plan-builder`, `to-prd`, `decision-log`
- Socratic-Solver: https://www.emergentmind.com/topics/socratic-solver
- SocraticAgent: https://www.emergentmind.com/topics/socraticagent
- Mitsubishi adversarial debates: https://us.mitsubishielectric.com/en/pr/global/2026/0120/
- Towards-AI: https://towardsai.net/p/machine-learning/the-socratic-prompt-how-to-make-a-language-model-stop-guessing-and-start-thinking
- Synapse domain SSoT: `CLAUDE.md`, `.blueprint/PRD-SYNAPSE-ENTERPRISE.md`, `memory/MEMORY.md`
- Persona list: `.blueprint/PRD-SYNAPSE-ENTERPRISE.md` § Personas
- Engineering chain: `CLAUDE.md` § IDENTITY
- Part lifecycle: `CLAUDE.md` § IDENTITY (G→E→M→I→S)
- MBSE 4-layer: `CLAUDE.md` § IDENTITY (QUOI / OÙ / COMMENT / QUI)
- Performance discipline: `.claude/rules/performance-discipline.md`
---
## Practical workflow tactics (re-merged from upstream 2026-05-03)
The atlas v2.0.0 SOTA enhancements (Socratic Q-Tree, Adversarial Debate, Hidden-Var Drag) augment but do not replace upstream's practical tactics. Per audit `memory/audit-mattpocock-drift-2026-05-03.md`, these upstream tactics are now exposed via companion references:
- **Domain awareness** — before grilling, locate the project's existing `CLAUDE.md` glossary section, `.blueprint/PRD-*.md` glossary, or `.claude/decisions.jsonl` ADRs. Do not re-litigate established decisions.
- **Cross-reference with code** — when the user asserts a behavior, grep the codebase to verify before challenging. Saves rounds of unnecessary debate.
- **Sharpen fuzzy language** — when a term is used loosely, name it explicitly during the conversation and add it to the glossary source on the spot (CONTEXT.md upstream; CLAUDE.md / `.blueprint/PRD-*` in ATLAS, per the adaptations table).
- **Challenge against the glossary** — every claim should map to a known term. Unmappable claims surface either (a) a missing glossary term, or (b) a confused requirement.
- **Offer ADRs sparingly** — only when the rationale would actually be needed by a future explorer. Skip ephemeral ("not worth it now") and self-evident reasons.
- **Discuss concrete scenarios** — ground abstract claims in scenarios. "When the user clicks X with Y selected, what happens?" beats "the system should be flexible."
Companion references (ported 2026-05-03 from upstream `mattpocock/skills`):
- [references/CONTEXT-FORMAT.md](references/CONTEXT-FORMAT.md) — DDD glossary template (compatible with `ubiquitous-language` skill)
- [references/ADR-FORMAT.md](references/ADR-FORMAT.md) — ADR template (used by `decision-log` skill)