writing-mioflow-skills

Show SKILL.md content (~2.5k tokens)
---
name: writing-mioflow-skills
description: Use when creating a new mioflow skill, editing an existing mioflow skill, or verifying a skill works under pressure before deploying. Use when you need an agent skill to be bulletproof against rationalization. Do NOT use for project-specific AGENTS.md conventions or one-off solutions.
metadata:
  dependencies: []
---

# Writing MioFlow Skills

If `.mioflow/onboarding.json` is missing or stale for the current repo, stop and invoke `mioflow:using-mioflow` before continuing.

## Overview

Skills are code. They have bugs. Test them before deploying.

This is the TDD-for-skills methodology adapted from Superpowers (N=28,000 scale testing confirms persuasion-optimized skills produce 3-4× better agent compliance than plain instructions).

**THE IRON LAW: NO SKILL WITHOUT A FAILING TEST FIRST.**
Write skill before testing? Delete it. Start over. No exceptions — not for "simple additions," not for "just a section," not for "reference only."

## When to Use

Use this skill when you are about to:
- Create any new skill for the mioflow ecosystem
- Edit an existing skill (even a small section)
- Deploy a skill and want confidence it works under pressure

When NOT to use: AGENTS.md files, project-specific conventions, one-off prompt instructions.

## The Core Cycle: RED → GREEN → REFACTOR

| TDD Concept | Skill Equivalent |
|---|---|
| Test case | Pressure scenario with subagent |
| Production code | SKILL.md |
| Test fails (RED) | Agent violates rule without skill |
| Test passes (GREEN) | Agent complies with skill present |
| Refactor | Close loopholes, maintain compliance |

---

## PHASE 1 — RED: Write the Failing Test

**HARD-GATE: Do not write any skill content until you complete this phase.**

Teams that skip baseline testing consistently deploy skills with predictable, preventable failures.

**Steps:**
1. Define the skill's purpose: what behavior must it enforce? What are failure modes without it?
2. Create 3–5 pressure scenarios that stress-test critical constraints (see `references/pressure-test-template.md`)
3. Run scenarios WITHOUT the skill — give agents the realistic task under pressure
4. Document exact rationalizations verbatim: "Agent was wrong" is useless. "Agent said 'I already manually tested it, so the spirit of TDD is satisfied'" is target material
5. Identify patterns: which excuses repeat across scenarios?

**What to record:**
```
Scenario: [name]
Combined pressures: [list]
Exact violation: [what agent chose]
Exact rationalization (verbatim): "[quote]"
```

---

## PHASE 2 — GREEN: Write the Minimal Skill

Write SKILL.md addressing the **specific rationalizations documented in RED only.**
Do not add content for hypothetical cases you didn't observe — hypothetical content bloats the skill and gets skipped.

**SKILL.md checklist:**
- [ ] YAML frontmatter starts on line 1 (`---`)
- [ ] `name`: bare hyphen-case, matches the directory name exactly
- [ ] `description`: starts with "Use when..." — **triggering conditions ONLY, no workflow summary**
- [ ] Description is third-person, ≤1024 chars
- [ ] Body stays lean; prefer < 400 lines and move overflow into `references/` when practical
- [ ] Uses persuasion principles (see table below)
- [ ] HARD-GATE markers on critical stops
- [ ] `references/` files never nested more than one level deep

For MioFlow plugin skills specifically, keep frontmatter `name` bare. The plugin wrapper adds the `mioflow:` prefix when the skill is surfaced to agents.

**Description trap (most common mistake):**
Workflow summary in description → Claude follows description instead of reading skill body. Every time.
```yaml
# ❌ BAD — workflow summary
description: Use when creating skills — run baseline test, write minimal skill, run tests

# ✅ GOOD — triggering conditions only
description: Use when creating a new mioflow skill or editing an existing one
```

**Dependency metadata style:**
When a MioFlow skill has runtime dependencies, write `metadata.dependencies` as a mapping keyed by dependency id. Do not use YAML arrays of objects such as `- id: beads-cli`; generic skill evaluators may reject that frontmatter shape.

```yaml
metadata:
  dependencies:
    beads-cli:
      kind: command
      command: br
      missing_effect: unavailable
      reason: The skill reads and updates beads.
```

**Apply persuasion principles:**

| Principle | Implementation | Use For |
|---|---|---|
| **Authority** | "YOU MUST", "Never", "No exceptions" | Discipline-enforcing rules |
| **Commitment** | Ordered checklists, announce skill usage | Multi-step processes |
| **Scarcity** | "Before proceeding", "IMMEDIATELY after X" | Verification requirements |
| **Social Proof** | "Teams report...", "X without Y = failure. Every time." | Common failure patterns |
| **Unity** | "our skills", collaborative framing | Techniques, guidance |

After writing: re-run the same pressure scenarios WITH the skill. Agent must now comply.
If agent still fails → skill is unclear or incomplete. Revise and re-test. Do not proceed.

---

## PHASE 3 — REFACTOR: Close Loopholes

When an agent violates a rule despite having the skill, that is a test regression — the skill has a bug. Fix it:

1. Capture the new rationalization verbatim
2. Add explicit negation in the rule
3. Add entry to rationalization table in the skill
4. Add entry to red flags list
5. Re-run all scenarios — verify all still pass

Continue until no new rationalizations emerge from pressure testing.

**Meta-testing technique:** After an agent chooses wrong, ask:
> "You read the skill and chose Option C anyway. How could the skill have been written differently to make Option A the only acceptable answer?"

Three diagnoses:
- "The skill WAS clear, I chose to ignore it" → add "Violating the letter IS violating the spirit"
- "The skill should have said X" → add their exact suggestion verbatim
- "I didn't see section Y" → make key point more prominent, move it earlier

---

## PHASE 4 — VALIDATE & DOCUMENT

**Run validation:**
```bash
python3 "${CODEX_HOME:-$HOME/.codex}/skills/.system/skill-creator/scripts/quick_validate.py" skills/ak-mioflow/mioflow-skills/<skill-name>
bash scripts/check-markdown-links.sh skills/ak-mioflow/mioflow-skills/<skill-name>/SKILL.md
bash scripts/sync-skills.sh --dry-run
```

If the edited skill owns a repo-local test script, run that too.

**Create CREATION-LOG.md** documenting the full TDD process (see `references/creation-log-template.md`):
- Source material and extraction decisions
- Pressure scenarios run and results
- Rationalizations found and fixes applied
- Iterations required before bulletproof

**Signs the skill IS bulletproof:**
- Agent chooses correct option under maximum pressure
- Agent cites specific skill sections as justification
- Agent acknowledges temptation but follows rule
- Meta-test reveals: "skill was clear, I should follow it"

**Signs the skill is NOT bulletproof:**
- Agent finds rationalizations not addressed in the skill
- Agent argues the skill itself is wrong
- Agent creates "hybrid approaches" that satisfy letter but not spirit

---

## Rationalization Table (Common Violations)

| Excuse | Reality |
|---|---|
| "I know this technique, testing is unnecessary" | You're testing the SKILL, not your knowledge. Agents differ from you. |
| "It's so simple it can't have bugs" | Every untested skill has issues. Test takes 30 minutes. |
| "Academic questions passed — that's sufficient" | Reading a skill ≠ using a skill under pressure. Test application scenarios. |
| "My description summarizes the workflow so agents know what to do" | Workflow-summary descriptions cause agents to skip the skill body. Remove it. |
| "This edit is minor — testing isn't needed" | The Iron Law applies to edits. No exceptions. |
| "I'll test it after a few real uses" | Problems = agents misuse in production. Test BEFORE deploying. |
| "The baseline is obvious, I know what failures to expect" | You know YOUR failures. Agent failures differ. Run the baseline. |

---

## Red Flags — STOP and Run Baseline Tests

- Writing skill content before creating any pressure scenarios
- "I already know what agents will do"
- "It's just a small addition"
- "Academic questions passed, that's sufficient testing"
- Description contains workflow steps or process summary
- Skill addresses hypothetical scenarios not observed in baseline
- Deploying without running scenarios WITH skill (no green verification)
- "The skill was good last month, edits don't need testing"

**All of these mean: Stop. Run baseline tests first.**

---

## References

Load when needed:
- `references/creation-log-template.md` — CREATION-LOG.md template for documenting the TDD process
- `references/pressure-test-template.md` — Pressure scenario templates and the 7 pressure types

**Background:** The TDD-for-skills methodology originates from the Superpowers framework (obra/superpowers). Persuasion research: Meincke et al. (2025), N=28,000 LLM conversations, University of Pennsylvania. Compliance methodology validated by ComplexBench, PromptAgent, and RNR studies (see research/15-tdd-skills-methodology.md for full citations).
Get writing-mioflow-skills.

vz-bench-debug

vz-scrape-runner

Think you can beat it?