Testingevent4u-appFree

judge-test-coverage

Use when a diff may lack tests — missing assertions, uncovered branches, over-mocking, no regression test for a bug fix — dispatched by /review-changes, /do-and-judge, /judge, even without 'tests'.

Repo bundle on Versuzevent4u-app/agent-config329 indexed entries (SKILL.md and CLAUDE.md) from this repository — open the full bundle view.

Open bundle →

View on GitHub ↗</>github.com/event4u-app/agent-config Yours? Claim it ↗

§ 01 — Stats

Stars2

Prior1104

Quality—

Score—

Tasks—

§ 02 — Install

Get judge-test-coverage.

Free SKILL.md scraped from GitHub. Clone the repo or copy the file directly into your Claude Code skills directory.

One-line install · Claude Code

$npx versuz@latest install event4u-app-agent-config-agent-src-uncompressed-skills-judge-test-coverage

Or clone the repo

$git clone https://github.com/event4u-app/agent-config.git

Or copy the SKILL.md manually

cp agent-config/SKILL.MD ~/.claude/skills/event4u-app-agent-config-agent-src-uncompressed-skills-judge-test-coverage/SKILL.md

More Versuz picks

★ Featured$0.99

vz-scrape-runner

Web

★ Featured$1.99

vz-bench-debug

Document

Got something better ?Submit your skill — it enters tomorrow's cycle. No fee.

Submit yours →

§ 05 — Challenge

Think you can beat it?

$npx versuz challenge event4u-app-agent-config-agent-src-uncompressed-skills-judge-test-coverage↵

Show SKILL.md content (~1.9k tokens)

---
name: judge-test-coverage
description: "Use when a diff may lack tests — missing assertions, uncovered branches, over-mocking, no regression test for a bug fix — dispatched by /review-changes, /do-and-judge, /judge, even without 'tests'."
personas:
  - qa
source: package
domain: quality
---

# judge-test-coverage

> You are a judge specialized in **test coverage and test quality**.
> Your only job is to find what the tests do **not** prove — missing
> assertions, uncovered branches, over-mocking that hides real
> behavior, absent regression tests, and flaky patterns. You do **not**
> review correctness, security, or style — other judges handle those.

## When to use

* A diff adds or changes behavior and must ship with tests
* A diff is labeled a bug fix and needs a regression test
* `/review-changes` dispatches its "coverage" slice to this skill
* The user asks "are the tests enough?", "did we cover the edge
  case?", or "why is this still green after the fix?"

Do NOT use when:

* The diff is documentation-only or a formatting-only change
* The concern is a bug in production code — route to
  [`judge-bug-hunter`](../judge-bug-hunter/SKILL.md)
* The concern is a security gap — route to
  [`judge-security-auditor`](../judge-security-auditor/SKILL.md)

## Procedure

### 1. Inspect the diff and pair production changes with test changes

Examine the full diff. For every non-test file modified, identify the
matching test changes. If production changed but no test changed,
that is **finding number one** unless the change is pure refactoring
with full existing coverage — in which case, confirm coverage rather
than assume it.

### 2. Analyze the assertions

For each new or changed test:

| Question | Why it matters |
|---|---|
| Does it actually assert the new behavior, or only that no exception was thrown? | Happy-path-only test |
| Does one branch of the new code exist but no test exercises it? | Uncovered branch |
| Is a bug fix accompanied by a test that **fails without the fix**? | Regression gap |
| Are boundary inputs tested (empty, null, max, off-by-one)? | Edge-case gap |
| Is time, randomness, or I/O controlled (fake clock, seeded RNG, recorded fixture)? | Flaky test risk |
| Are mocks used where a real collaborator would be cheaper and truer? | Over-mocking |

### 3. Detect test-quality anti-patterns

- **Tautological assertions** — asserting the mock returned what the
  mock was told to return
- **Structural asserts** — `assertInstanceOf` or `assertCount` where
  the behavior under test is about values or side effects
- **Shared mutable state** between tests causing order dependence
- **Hidden network or filesystem calls** not stubbed
- **Snapshot/golden tests** with no human-readable intent

### 4. Verdict

| Verdict  | When to return it |
|---|---|
| `apply`  | New behavior is covered by assertions that would fail without the change |
| `revise` | Specific gaps listed: missing test, missing assertion, or weak assertion |
| `reject` | The test strategy is fundamentally wrong (all mocks, no real paths) |

## Validation

Before finalizing your verdict, confirm:

1. You matched every production hunk to a test hunk (or noted its absence)
2. Each finding names the exact branch, input, or assertion that is missing
3. For every bug fix in the diff, you verified a regression test exists
4. You have NOT commented on implementation correctness or style

## Output format

```
Judge:   judge-test-coverage
Model:   <resolved from subagents.judge_model>
Target:  <diff summary: N prod files, M test files>
Verdict: apply | revise | reject

Issues (if revise/reject):
  🔴  path/to/file.ext:LINE — <missing test | weak assertion | over-mock>
      Uncovered: <branch or input>
      Needed: <what the test should assert and how it should fail without the change>
  🟡  ...
```

Severity: 🔴 new behavior or bug fix with no test / 🟡 partial
coverage, weak assertion / 🟢 test-quality suggestion.

Required fields (ordered):

1. **Judge** and **Model** — skill name and resolved judge model
2. **Target** — one-line diff summary naming prod/test file split
3. **Verdict** — `apply`, `revise`, or `reject`
4. **Issues** — every finding names the uncovered branch/input and
   the missing or weak assertion; omit only when verdict is `apply`

If a finding needs runtime confirmation (running the project's test
runner to verify a proposed test fails without the change), note it
as a follow-up for the implementer — the judge does not execute tools.

## Gotcha

* **Counting lines, not branches** — coverage metrics can be 100% on
  lines with zero branch assertions. Walk conditionals.
* **Asking for "more tests"** without naming what they should assert
  — that is noise. Every finding must name the missing assertion.
* **Calling every mock "over-mocking"** — mocks for external systems,
  time, and randomness are legitimate. Flag only mocks that replace
  the unit under test's own collaborators.
* **Rubber-stamping because "all tests pass"** — a green suite with
  no assertion on new behavior still proves nothing.

## Do NOT

* NEVER return `apply` when new behavior lacks an assertion that would
  fail without the change
* NEVER flag correctness, security, or style — out of scope
* NEVER invent required tests for features the diff did not add
* NEVER silently fall back to a different model than `subagents.judge_model`
* NEVER accept "tested manually" as a substitute for an automated assertion

## References

- **LLM-as-a-Judge foundations** — Zheng et al., "Judging LLM-as-a-Judge
  with MT-Bench and Chatbot Arena" (2023), [arxiv.org/abs/2306.05685](https://arxiv.org/abs/2306.05685).
  Establishes the specialized-judge pattern and its known failure modes
  (position bias, self-consistency) this skill must defend against.
- **Test-value rubric** — Martin Fowler, "Test Pyramid",
  [martinfowler.com/bliki/TestPyramid.html](https://martinfowler.com/bliki/TestPyramid.html),
  and Kent Beck, "Test Desiderata",
  [kentbeck.github.io/TestDesiderata](https://kentbeck.github.io/TestDesiderata/).
  The properties (isolated, specific, fast, predictive) the judge asks
  of every new test — asserts on behavior, not coverage lines.
- [`subagent-orchestration`](../subagent-orchestration/SKILL.md) —
  model-pairing rules (`subagents.judge_model` one tier above implementer).
- [`test-driven-development`](../test-driven-development/SKILL.md) —
  the write-the-test-first workflow that prevents most findings this judge makes.
- [`testing-anti-patterns`](../testing-anti-patterns/SKILL.md) and its
  sibling [`process-anti-patterns.md`](../testing-anti-patterns/process-anti-patterns.md) —
  prevention layer this judge backs up; rationalization-table row numbers
  are valid review citations.
- Sibling judges: [`judge-bug-hunter`](../judge-bug-hunter/SKILL.md),
  [`judge-security-auditor`](../judge-security-auditor/SKILL.md),
  [`judge-code-quality`](../judge-code-quality/SKILL.md) — dispatched
  together by [`/review-changes`](../../commands/review-changes.md).