---
name: pbt-generator
description: "Property-Based Test generator local runtime. Generates Hypothesis (Python) / fast-check (TS) tests from code change. LLM-judge validates invariants. Use when 'generate PBT', 'property test', 'invariants for X'."
mode: [coding, engineering]
effort: medium
version: 1.0.0
tier: [admin]
---
# pbt-generator — Property-Based Test generator (local runtime)
> Treat invariants like production tests: AI drafts properties from code shape, local runner exercises them with random inputs, LLM-judge validates the invariant set is meaningful.
> Pattern: code change → draft PBT → run locally (pytest/vitest) → judge invariants → PASS/FAIL.
> Pairs with W3.1 `skill-regression-test` (failing PBT cases become regression fixtures) and W1.5 `atlas-eval` (judge layer).
## When to Use
Invoke this skill when the user says:
- "generate PBT for `<function>`"
- "property test this"
- "what are the invariants for `X`?"
- "fuzz `parse_*` / `validate_*` / `serialize_*`"
- "run PBT on the diff"
- After modifying a pure helper, parser, validator, or serializer (high invariant density)
Skip for:
- I/O-heavy code (DB, HTTP, FS) — use integration tests instead
- UI components — use Storybook/Playwright
- Code with < 10 LOC and zero branches — TVT T0/T1 covers it cheaper
## What This Skill Does (No External Sandbox)
**Doctrine** (locked 2026-04-30 by Seb): runs **LOCAL on the trust mesh**. No e2b, no Modal, no remote sandbox. Rationale: PBT is fast, deterministic, and sandbox-free runtime keeps the loop tight (< 30s end-to-end on a typical helper).
```
┌─────────────────────────────────────────────────────────────────┐
│ 1. INSPECT atlas pbt generate <file-or-function> │
│ │ Reads target. Extracts signature, types, branches│
│ ▼ │
│ 2. DRAFT LLM proposes 3-7 invariants + Hypothesis/ │
│ │ fast-check strategies covering them │
│ ▼ │
│ 3. WRITE Emits test file under tests/pbt/<target>_pbt.py │
│ │ (or .test.ts for TS). Idempotent — overwrites │
│ │ with `# generated-by: pbt-generator` header │
│ ▼ │
│ 4. RUN atlas pbt run <skill> │
│ │ pytest -p hypothesis --hypothesis-seed=0 -x │
│ │ OR bunx vitest run --testNamePattern '@pbt' │
│ ▼ │
│ 5. JUDGE atlas pbt judge <result> │
│ Delegates to W1.5 atlas-eval. Scores invariant │
│ quality 0-100. Threshold ≥ 70 = PASS. │
└─────────────────────────────────────────────────────────────────┘
```
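Step 3's idempotent overwrite can be sketched as a header gate (a minimal sketch; `write_pbt_file` is illustrative, not the skill's actual implementation — the only assumption taken from this doc is the `# generated-by: pbt-generator` marker):

```python
from pathlib import Path

HEADER = "# generated-by: pbt-generator"

def write_pbt_file(path: Path, body: str) -> bool:
    """Write a generated PBT file, but never clobber hand-edited tests.

    Returns True if the file was (re)written, False if it was preserved.
    """
    if path.exists() and not path.read_text().startswith(HEADER):
        # Existing file without the marker header: treat as hand-edited.
        return False
    path.write_text(f"{HEADER}\n{body}")
    return True
```

Stripping the header (or renaming the file) is therefore enough to opt a test file out of regeneration.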
## CLI Surface
```bash
# Draft + write a test file (no execution)
atlas pbt generate backend/app/services/wbs/parse.py::parse_wbs_code
atlas pbt generate frontend/src/utils/format-currency.ts
# Generate AND run (combined fast loop)
atlas pbt run pbt-generator # runs all PBTs from this skill
atlas pbt run --target parse_wbs_code # narrow to one target
# Judge a recorded result (uses W1.5 atlas-eval)
atlas pbt judge tests/pbt/parse_wbs_code_pbt_result.jsonl
# Re-record fixtures from a passing run (used by W3.1)
atlas pbt freeze parse_wbs_code
```
Flags:
- `--lang {py,ts,auto}` — default `auto`, inferred from the file extension
- `--max-examples N` — Hypothesis `max_examples` / fast-check `numRuns` (default 100)
- `--seed N` — deterministic replay (default 0)
- `--judge / --no-judge` — toggle LLM-judge step (default: on for `run`, off for `generate`)
## Invariant Cookbook (what the LLM drafts)
The generator proposes invariants from these standard families. The judge penalises drafts that exercise only happy-path values, i.e. that omit applicable families from this table:
| Family | Example for a parser | Example for a serializer |
|---|---|---|
| **Roundtrip** | `parse(format(x)) == x` | `decode(encode(x)) == x` |
| **Idempotence** | `parse(parse(s)) == parse(s)` | `normalize(normalize(x)) == normalize(x)` |
| **Determinism** | same input → same output across calls | same input → byte-identical bytes |
| **Boundary** | empty / single / max-length inputs | None / 0 / negative / NaN |
| **Type safety** | never raises on `str`; always raises on `bytes` | output type matches signature |
| **Algebraic** | `merge(a, b) == merge(b, a)` (commutativity), `f(g(x)) == h(x)` | sum-of-parts equals whole |
| **Monotonic** | sort-order preserved across encoding | length non-decreasing |
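As a sketch, the idempotence and determinism families translate to Hypothesis properties like the following (`normalize` here is a hypothetical stand-in target, not a Synapse helper):

```python
from hypothesis import given, strategies as st

def normalize(s: str) -> str:
    # Hypothetical target: collapse runs of whitespace to single spaces.
    return " ".join(s.split())

@given(st.text())
def test_idempotence(s):
    # Idempotence: normalizing twice changes nothing further.
    assert normalize(normalize(s)) == normalize(s)

@given(st.text())
def test_determinism(s):
    # Determinism: same input yields the same output across calls.
    assert normalize(s) == normalize(s)

# Hypothesis drives these under pytest; calling them directly also works
# because @given supplies the generated examples itself.
test_idempotence()
test_determinism()
```

A roundtrip property would need the paired inverse (`parse`/`format`), which is why the generator inspects the whole module, not just one function.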
## Reuse (no reinventing)
- **`tdd` skill** (existing) — owns the Red-Green-Refactor patterns. PBTs slot into RED phase as the *first* failing test, then concrete examples follow.
- **`atlas-eval` (W1.5)** — owns the LLM-judge harness. `pbt judge` is a thin wrapper that supplies the rubric (`references/judge-rubric.md` — invariant coverage 40 / boundary 25 / determinism 15 / readability 20).
- **Hypothesis** (Python, `pyproject.toml` already has it for backend tests) — strategies (`st.text`, `st.integers`, `st.from_type`) cover most signatures.
- **fast-check** (TS, install on demand: `bun add -D fast-check`) — `fc.string`, `fc.record`, `fc.tuple`.
- **W3.1 `skill-regression-test`** — when a PBT finds a failing input, its shrunken minimal counter-example is appended to the regression JSONL automatically. Same schema as `atlas-eval`.
## Output Schema (forward-compat with eval/canary)
`tests/pbt/<target>_pbt_result.jsonl` — one line per invariant:
```jsonl
{"target":"parse_wbs_code","invariant":"roundtrip","status":"pass","examples":100,"shrunk":null,"seed":0,"ts":"2026-05-01T...Z"}
{"target":"parse_wbs_code","invariant":"empty-input-raises","status":"fail","examples":3,"shrunk":"\"\"","seed":0,"ts":"..."}
```
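A downstream consumer can fold the result lines into a verdict with plain stdlib (a sketch; `summarize` is illustrative, the field names follow the schema above):

```python
import json

def summarize(jsonl_text: str) -> dict:
    """Fold one PBT result file into counts plus shrunken counter-examples."""
    rows = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    failures = [r for r in rows if r["status"] == "fail"]
    return {
        "total": len(rows),
        "failed": len(failures),
        # Shrunken minimal inputs, ready to become W3.1 regression fixtures.
        "shrunk": [r["shrunk"] for r in failures if r.get("shrunk") is not None],
    }
```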
## Verify (acceptance criteria)
The skill is considered working when this concrete example holds:
```bash
# 1. Pick a real Synapse helper that has known edge cases
atlas pbt generate backend/app/services/wbs/parse.py::parse_wbs_code
# 2. Run it
atlas pbt run --target parse_wbs_code --max-examples 200
# 3. EXPECT: at least one shrunken counter-example for empty-string input,
# surfaced as a minimal failing case (~3 chars or fewer after shrink)
# with status:fail in the JSONL.
```
If step 3 yields only `pass` rows, either (a) `parse_wbs_code` was hardened since the cookbook was written (good — log the decision and pick another target), or (b) the draft missed the boundary family — re-run with `--judge` and inspect the rubric score; a score below 70 means re-draft.
## Pair-With Map
- **Before** invoking `tdd` for a pure function: run `pbt generate` first. If PBT finds counter-examples, those become the RED-phase tests.
- **During** `code-review`: reviewer-agent flags any new pure function (no I/O, ≥ 1 branch) without a corresponding `tests/pbt/*_pbt.{py,ts}` neighbour.
- **After** finding a regression bug in production: `pbt freeze <target>` records the shrunken input as a permanent regression fixture; W3.1 picks it up for nightly canary (W3.2).
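The freeze step amounts to appending shrunk failing inputs to the regression fixtures file (a sketch under the result schema above; the fixture field names are illustrative, W3.1 owns the real schema):

```python
import json
from pathlib import Path

def freeze(result_path: Path, fixtures_path: Path) -> int:
    """Append each shrunken counter-example from a PBT result file to the
    regression fixtures JSONL. Returns the number of fixtures added."""
    added = 0
    with fixtures_path.open("a") as out:
        for line in result_path.read_text().splitlines():
            if not line.strip():
                continue
            row = json.loads(line)
            if row["status"] == "fail" and row.get("shrunk") is not None:
                out.write(json.dumps({
                    "target": row["target"],
                    "invariant": row["invariant"],
                    "input": row["shrunk"],
                }) + "\n")
                added += 1
    return added
```

Because the file is opened in append mode, repeated freezes accumulate fixtures; dedup is left to the regression runner.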
## Constraints (non-negotiable)
- **No external sandbox.** Runs in the same Python/Node process as the dev box.
- **No network access in tests.** PBT targets pure functions only. If the target imports `httpx`, `requests`, `fetch`, or `psycopg`, the generator refuses (suggests integration test path instead).
- **Deterministic by default.** `--seed 0` always set unless the user overrides. Flaky PBTs are bugs in the invariant, not in the runner — fix the invariant.
- **Idempotent regeneration.** `atlas pbt generate` overwrites only files carrying the `# generated-by: pbt-generator` header. Hand-edited PBTs are preserved (rename them or strip the header).
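The refusal rule for I/O-importing targets can be sketched as a static check over the Python side (assuming the deny list from above; `ast` only, the target is never executed — `fetch` would need the analogous TS-side check):

```python
import ast

FORBIDDEN = {"httpx", "requests", "psycopg", "socket"}

def refuses(source: str) -> bool:
    """True if the target module imports any I/O library on the deny list."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = {alias.name.split(".")[0] for alias in node.names}
        elif isinstance(node, ast.ImportFrom):
            names = {(node.module or "").split(".")[0]}
        else:
            continue
        if names & FORBIDDEN:
            return True
    return False
```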
## References
- Plan parent: `.blueprint/plans/ultrathink-regarde-ce-qui-abundant-petal.md` Section H W5.1
- Decision (no sandbox): user 2026-04-30
- Sibling skills: `tdd`, `atlas-eval` (W1.5), `skill-regression-test` (W3.1)
- Hypothesis docs: https://hypothesis.readthedocs.io/
- fast-check docs: https://fast-check.dev/
- TVT companion: `.claude/rules/test-value-tiering.md` (PBT lives at T1/T2)
- Mock-budget companion: `.claude/rules/testing-mock-budget.md` (PBT must NOT mock the target)