---
name: pbt-generator
description: "Property-Based Test generator local runtime. Generates Hypothesis (Python) / fast-check (TS) tests from code change. LLM-judge validates invariants. Use when 'generate PBT', 'property test', 'invariants for X'."
mode: [coding, engineering]
effort: medium
version: 1.0.0
tier: [admin]
---
# pbt-generator — Property-Based Test generator (local runtime)
> Treat invariants like production tests: AI drafts properties from code shape, local runner exercises them with random inputs, LLM-judge validates the invariant set is meaningful.
> Pattern: code change → draft PBT → run locally (pytest/vitest) → judge invariants → PASS/FAIL.
> Pairs with W3.1 `skill-regression-test` (failing PBT cases become regression fixtures) and W1.5 `atlas-eval` (judge layer).
## When to Use
Invoke this skill when the user says:
- "generate PBT for `<function>`"
- "property test this"
- "what are the invariants for `X`?"
- "fuzz `parse_*` / `validate_*` / `serialize_*`"
- "run PBT on the diff"
- After modifying a pure helper, parser, validator, or serializer (high invariant density)
Skip for:
- I/O-heavy code (DB, HTTP, FS) — use integration tests instead
- UI components — use Storybook/Playwright
- Code with < 10 LOC and zero branches — TVT T0/T1 covers it cheaper
## What This Skill Does (No External Sandbox)
**Doctrine** (locked 2026-04-30 by Seb): runs **LOCAL on the trust mesh**. No e2b, no Modal, no remote sandbox. Rationale: PBT is fast, deterministic, and sandbox-free runtime keeps the loop tight (< 30s end-to-end on a typical helper).
```
┌─────────────────────────────────────────────────────────────────┐
│ 1. INSPECT atlas pbt generate <file-or-function> │
│ │ Reads target. Extracts signature, types, branches│
│ ▼ │
│ 2. DRAFT LLM proposes 3-7 invariants + Hypothesis/ │
│ │ fast-check strategies covering them │
│ ▼ │
│ 3. WRITE Emits test file under tests/pbt/<target>_pbt.py │
│ │ (or .test.ts for TS). Idempotent — overwrites │
│ │ with `# generated-by: pbt-generator` header │
│ ▼ │
│ 4. RUN atlas pbt run <skill> │
│ │ pytest -p hypothesis --hypothesis-seed=0 -x │
│ │ OR bunx vitest run --testNamePattern '@pbt' │
│ ▼ │
│ 5. JUDGE atlas pbt judge <result> │
│ Delegates to W1.5 atlas-eval. Scores invariant │
│ quality 0-100. Threshold ≥ 70 = PASS. │
└─────────────────────────────────────────────────────────────────┘
```
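Step 3's idempotent overwrite can be sketched as a header gate (a minimal sketch; `write_pbt_file` is illustrative, not the skill's actual implementation — the only assumption taken from this doc is the `# generated-by: pbt-generator` marker):

```python
from pathlib import Path

HEADER = "# generated-by: pbt-generator"

def write_pbt_file(path: Path, body: str) -> bool:
    """Write a generated PBT file, but never clobber hand-edited tests.

    Returns True if the file was (re)written, False if it was preserved.
    """
    if path.exists() and not path.read_text().startswith(HEADER):
        # Existing file without the marker header: treat as hand-edited.
        return False
    path.write_text(f"{HEADER}\n{body}")
    return True
```

Stripping the header (or renaming the file) is therefore enough to opt a test file out of regeneration.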
## CLI Surface
```bash
# Draft + write a test file (no execution)
atlas pbt generate backend/app/services/wbs/parse.py::parse_wbs_code
atlas pbt generate frontend/src/utils/format-currency.ts
# Generate AND run (combined fast loop)
atlas pbt run pbt-generator # runs all PBTs from this skill
atlas pbt run --target parse_wbs_code # narrow to one target
# Judge a recorded result (uses W1.5 atlas-eval)
atlas pbt judge tests/pbt/parse_wbs_code_pbt_result.jsonl
# Re-record fixtures from a passing run (used by W3.1)
atlas pbt freeze parse_wbs_code
```
Flags:
- `--lang {py,ts,auto}` — default `auto`, inferred from the file extension
- `--max-examples N` — Hypothesis `max_examples` / fast-check `numRuns` (default 100)
- `--seed N` — deterministic replay (default 0)
- `--judge / --no-judge` — toggle LLM-judge step (default: on for `run`, off for `generate`)
## Invariant Cookbook (what the LLM drafts)
The generator proposes invariants from these standard families. The judge penalises drafts that exercise only happy-path values, i.e. that omit applicable families from this table:
| Family | Example for a parser | Example for a serializer |
|---|---|---|
| **Roundtrip** | `parse(format(x)) == x` | `decode(encode(x)) == x` |
| **Idempotence** | `parse(parse(s)) == parse(s)` | `normalize(normalize(x)) == normalize(x)` |
| **Determinism** | same input → same output across calls | same input → byte-identical bytes |
| **Boundary** | empty / single / max-length inputs | None / 0 / negative / NaN |
| **Type safety** | never raises on `str`; always raises on `bytes` | output type matches signature |
| **Algebraic** | `merge(a, b) == merge(b, a)` (commutativity), `f(g(x)) == h(x)` | sum-of-parts equals whole |
| **Monotonic** | sort-order preserved across encoding | length non-decreasing |
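As a sketch, the idempotence and determinism families translate to Hypothesis properties like the following (`normalize` here is a hypothetical stand-in target, not a Synapse helper):

```python
from hypothesis import given, strategies as st

def normalize(s: str) -> str:
    # Hypothetical target: collapse runs of whitespace to single spaces.
    return " ".join(s.split())

@given(st.text())
def test_idempotence(s):
    # Idempotence: normalizing twice changes nothing further.
    assert normalize(normalize(s)) == normalize(s)

@given(st.text())
def test_determinism(s):
    # Determinism: same input yields the same output across calls.
    assert normalize(s) == normalize(s)

# Hypothesis drives these under pytest; calling them directly also works
# because @given supplies the generated examples itself.
test_idempotence()
test_determinism()
```

A roundtrip property would need the paired inverse (`parse`/`format`), which is why the generator inspects the whole module, not just one function.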
## Reuse (no reinventing)
- **`tdd` skill** (existing) — owns the Red-Green-Refactor patterns. PBTs slot into RED phase as the *first* failing test, then concrete examples follow.
- **`atlas-eval` (W1.5)** — owns the LLM-judge harness. `pbt judge` is a thin wrapper that supplies the rubric (`references/judge-rubric.md` — invariant coverage 40 / boundary 25 / determinism 15 / readability 20).
- **Hypothesis** (Python, `pyproject.toml` already has it for backend tests) — strategies (`st.text`, `st.integers`, `st.from_type`) cover most signatures.
- **fast-check** (TS, install on demand: `bun add -D fast-check`) — `fc.string`, `fc.record`, `fc.tuple`.
- **W3.1 `skill-regression-test`** — when a PBT finds a failing input, its shrunken minimal counter-example is appended to the regression JSONL automatically. Same schema as `atlas-eval`.
## Output Schema (forward-compat with eval/canary)
`tests/pbt/<target>_pbt_result.jsonl` — one line per invariant:
```jsonl
{"target":"parse_wbs_code","invariant":"roundtrip","status":"pass","examples":100,"shrunk":null,"seed":0,"ts":"2026-05-01T...Z"}
{"target":"parse_wbs_code","invariant":"empty-input-raises","status":"fail","examples":3,"shrunk":"\"\"","seed":0,"ts":"..."}
```
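A downstream consumer can fold the result lines into a verdict with plain stdlib (a sketch; `summarize` is illustrative, the field names follow the schema above):

```python
import json

def summarize(jsonl_text: str) -> dict:
    """Fold one PBT result file into counts plus shrunken counter-examples."""
    rows = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    failures = [r for r in rows if r["status"] == "fail"]
    return {
        "total": len(rows),
        "failed": len(failures),
        # Shrunken minimal inputs, ready to become W3.1 regression fixtures.
        "shrunk": [r["shrunk"] for r in failures if r.get("shrunk") is not None],
    }
```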
## Verify (acceptance criteria)
The skill is considered working when this concrete example holds:
```bash
# 1. Pick a real Synapse helper that has known edge cases
atlas pbt generate backend/app/services/wbs/parse.py::parse_wbs_code
# 2. Run it
atlas pbt run --target parse_wbs_code --max-examples 200
# 3. EXPECT: at least one shrunken counter-example for empty-string input,
# surfaced as a minimal failing case (~3 chars or fewer after shrink)
# with status:fail in the JSONL.
```
If step 3 yields only `pass` rows, either (a) `parse_wbs_code` was hardened since the cookbook was written (good — log the decision and pick another target), or (b) the draft missed the boundary family — re-run with `--judge` and inspect the rubric score; a score below 70 means re-draft.
## Pair-With Map
- **Before** invoking `tdd` for a pure function: run `pbt generate` first. If PBT finds counter-examples, those become the RED-phase tests.
- **During** `code-review`: reviewer-agent flags any new pure function (no I/O, ≥ 1 branch) without a corresponding `tests/pbt/*_pbt.{py,ts}` neighbour.
- **After** finding a regression bug in production: `pbt freeze <target>` records the shrunken input as a permanent regression fixture; W3.1 picks it up for nightly canary (W3.2).
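The freeze step amounts to appending shrunk failing inputs to the regression fixtures file (a sketch under the result schema above; the fixture field names are illustrative, W3.1 owns the real schema):

```python
import json
from pathlib import Path

def freeze(result_path: Path, fixtures_path: Path) -> int:
    """Append each shrunken counter-example from a PBT result file to the
    regression fixtures JSONL. Returns the number of fixtures added."""
    added = 0
    with fixtures_path.open("a") as out:
        for line in result_path.read_text().splitlines():
            if not line.strip():
                continue
            row = json.loads(line)
            if row["status"] == "fail" and row.get("shrunk") is not None:
                out.write(json.dumps({
                    "target": row["target"],
                    "invariant": row["invariant"],
                    "input": row["shrunk"],
                }) + "\n")
                added += 1
    return added
```

Because the file is opened in append mode, repeated freezes accumulate fixtures; dedup is left to the regression runner.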
## Constraints (non-negotiable)
- **No external sandbox.** Runs in the same Python/Node process as the dev box.
- **No network access in tests.** PBT targets pure functions only. If the target imports `httpx`, `requests`, `fetch`, or `psycopg`, the generator refuses (suggests integration test path instead).
- **Deterministic by default.** `--seed 0` always set unless the user overrides. Flaky PBTs are bugs in the invariant, not in the runner — fix the invariant.
- **Idempotent regeneration.** `atlas pbt generate` overwrites only files carrying the `# generated-by: pbt-generator` header. Hand-edited PBTs are preserved (rename them or strip the header).
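The refusal rule for I/O-importing targets can be sketched as a static check over the Python side (assuming the deny list from above; `ast` only, the target is never executed — `fetch` would need the analogous TS-side check):

```python
import ast

FORBIDDEN = {"httpx", "requests", "psycopg", "socket"}

def refuses(source: str) -> bool:
    """True if the target module imports any I/O library on the deny list."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = {alias.name.split(".")[0] for alias in node.names}
        elif isinstance(node, ast.ImportFrom):
            names = {(node.module or "").split(".")[0]}
        else:
            continue
        if names & FORBIDDEN:
            return True
    return False
```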
## References
- Plan parent: `.blueprint/plans/ultrathink-regarde-ce-qui-abundant-petal.md` Section H W5.1
- Decision (no sandbox): user 2026-04-30
- Sibling skills: `tdd`, `atlas-eval` (W1.5), `skill-regression-test` (W3.1)
- Hypothesis docs: https://hypothesis.readthedocs.io/
- fast-check docs: https://fast-check.dev/
- TVT companion: `.claude/rules/test-value-tiering.md` (PBT lives at T1/T2)
- Mock-budget companion: `.claude/rules/testing-mock-budget.md` (PBT must NOT mock the target)