ResearchWILLOSCARFree

paragraph-curator

Structured paragraph curation for C5: **select -> evaluate -> subset -> fuse**, so drafts converge instead of only expanding. **Trigger**: paragraph curator, curation, select evaluate fuse, paragraph selection, 选段, 评价, 融合, 收敛, 去冗余. **Use when**: you are in C5, `sections/*.md` exist, and the writing loop drifts toward 'longer by accumulation' (repetition, redundant paragraphs, weak synthesis). **Skip if**: evidence packs are thin / `evidence-selfloop` is BLOCKED; or you are pre-C2 (NO PROSE). **Network**: none. **Guardrail**: do not invent facts; do not add/remove citation keys; do not move citations across subsections; keep section-level claims consistent with `output/ARGUMENT_SKELETON.md# Consistency Contract`.

Repo bundle on VersuzWILLOSCAR/research-units-pipeline-skills109 indexed entries (SKILL.md and CLAUDE.md) from this repository — open the full bundle view.

Open bundle →

View on GitHub ↗</>github.com/WILLOSCAR/research-units-pipeline-skills Yours? Claim it ↗

§ 01 — Stats

Stars431

Prior1169

Quality—

Score—

Tasks—

§ 02 — Install

Get paragraph-curator.

Free SKILL.md scraped from GitHub. Clone the repo or copy the file directly into your Claude Code skills directory.

One-line install · Claude Code

$npx versuz@latest install willoscar-research-units-pipeline-skills-codex-skills-paragraph-curator

Or clone the repo

$git clone https://github.com/WILLOSCAR/research-units-pipeline-skills.git

Or copy the SKILL.md manually

More Versuz picks

★ Featured$1.99

vz-bench-debug

Document

★ Featured$0.99

vz-scrape-runner

Web

Got something better ?Submit your skill — it enters tomorrow's cycle. No fee.

Submit yours →

§ 05 — Challenge

Think you can beat it?

$npx versuz challenge willoscar-research-units-pipeline-skills-codex-skills-paragraph-curator↵

Show SKILL.md content (~2.1k tokens)

---
name: paragraph-curator
description: |
  Structured paragraph curation for C5: **select -> evaluate -> subset -> fuse**, so drafts converge instead of only expanding.
  **Trigger**: paragraph curator, curation, select evaluate fuse, paragraph selection, 选段, 评价, 融合, 收敛, 去冗余.
  **Use when**: you are in C5, `sections/*.md` exist, and the writing loop drifts toward 'longer by accumulation' (repetition, redundant paragraphs, weak synthesis).
  **Skip if**: evidence packs are thin / `evidence-selfloop` is BLOCKED; or you are pre-C2 (NO PROSE).
  **Network**: none.
  **Guardrail**: do not invent facts; do not add/remove citation keys; do not move citations across subsections; keep section-level claims consistent with `output/ARGUMENT_SKELETON.md# Consistency Contract`.
---

# Paragraph Curator (select -> evaluate -> subset -> fuse)

Purpose: turn “keep rewriting and getting longer” into a controlled convergence step.

This skill adds a decision layer between “draft paragraphs” and “polish voice”:
- keep the best paragraphs
- merge redundant ones
- rewrite for clearer argument moves
- expand only when coverage is missing (using existing evidence cards)

This is a content-structure pass (not a style pass). Run `style-harmonizer` and `opener-variator` after curation.

## Inputs

Required:
- `sections/` (especially H3 bodies: `sections/S<sub_id>.md`)
- `outline/writer_context_packs.jsonl` (what each H3 must cover + allowed citations)
- `output/ARGUMENT_SKELETON.md` (single source of truth for terminology + premises)

Recommended:
- `output/SECTION_ARGUMENT_SUMMARIES.jsonl` (paragraph moves + outputs)
- `output/SECTION_LOGIC_REPORT.md` (paragraph linkage risks)
- `output/WRITER_SELFLOOP_TODO.md` (style smells / scope/citation warnings)

## Outputs

- Updated `sections/*.md` (same filenames; body-only; no headings)
- `output/PARAGRAPH_CURATION_REPORT.md` (short; PASS/FAIL + what changed)
- Create `sections/paragraphs_curated.refined.ok` when done (empty file; pipeline contract signal)

## What this skill optimizes (rubric)

You are not trying to “shorten”. You are trying to increase information density while keeping the section verifiable.

Score each paragraph on a simple 0-2 rubric:

| Criterion | 0 (bad) | 1 (ok) | 2 (good) |
|---|---|---|---|
| Coverage | does not match any required axis/card | matches one axis, thin | directly executes a must-use card/comparison |
| Novelty | repeats nearby content | partially redundant | adds a distinct comparison/insight |
| Move clarity | unclear what it does | move exists, weak output | clear move + reusable output |
| Consistency | premise/term drift vs skeleton | minor mismatch | fully aligned with Consistency Contract |
| Citation hygiene | uncited when it should be; cite-dump vibe | acceptable | citations are local and anchored (not just tail) |
| Fusion readiness | cannot merge; tangled | mergeable with edits | clean unit that can be fused or kept |

Decision labels:
- `KEEP`: keep mostly as-is
- `REWRITE`: keep content, rewrite for clearer move/output
- `FUSE`: merge with neighbor(s) and rewrite into one stronger paragraph
- `REPLACE`: keep the slot, but rewrite using existing evidence cards (when coverage is missing)

## Paragraph budget (profile-aware)

Default per-H3 target:
- `draft_profile=survey`: 10-12 paragraphs
- `draft_profile=deep`: 11-13 paragraphs

If you exceed the budget, do not delete content blindly. Prefer `FUSE` (merge redundancy) and make the fused paragraph denser.

## Must-have coverage checklist (per H3)

Each H3 must contain at least:
- 1x `Definition/Setup` (only if this H3 introduces a new term/protocol field)
- 2x concrete `Contrast` paragraphs (A-vs-B comparisons; not just “many papers do...”)
- 1x `Evaluation anchor` paragraph (task + metric + constraint/budget/tool access; cite-backed)
- 1x cross-paper `Synthesis` paragraph (what generalizes, what does not; cite-backed)
- 1x `Boundary/Failure` paragraph (limitations; threats to validity; cite-backed when possible)
- 1x `Local conclusion` (a reusable takeaway used downstream)

If any item is missing, use `REPLACE` to write that paragraph from the writer context pack (do not invent new facts).

## Workflow (minimal)

1) Pick the target set
- Start with the H3 bodies listed in `output/SECTION_LOGIC_REPORT.md`, plus any H3 flagged in `output/WRITER_SELFLOOP_TODO.md` as repetitive/template-y, plus any H3 that keeps growing across edits.
- Work file-by-file: each target is a concrete `sections/S<sub_id>.md`.

2) Build a paragraph inventory (scratch only; do not paste into the paper)
- If `output/SECTION_ARGUMENT_SUMMARIES.jsonl` exists, use its per-paragraph `moves`/`output` as the first draft of your inventory, then reconcile with the actual text.
For each paragraph, write one line:
- `P<i> :: move(s) -> output (1 sentence) :: citations (keys)`

3) Apply the rubric and label each paragraph
- Mark `KEEP/REWRITE/FUSE/REPLACE`.
- If two adjacent paragraphs repeat the same axis, `FUSE`.
- For any paragraph you plan to change (`REWRITE`/`REPLACE`/`FUSE`), draft 2-3 candidate rewrites in parallel (different angles: contrast-first / protocol-first / synthesis-first).
  - Score candidates quickly with the rubric; keep one winner (or fuse two if they cover complementary axes).
  - Keep citation keys unchanged while sampling; you are choosing surface form + structure, not changing the evidence set.

4) Construct the curated set
- Use `outline/writer_context_packs.jsonl` to enforce must-have coverage (paragraph_plan/must_use/comparison_cards/limitation_hooks) without inventing new content.
- Enforce the must-have coverage checklist.
- Enforce the paragraph budget by fusing redundancy rather than deleting substance.

5) Fuse + rewrite (keep citation keys fixed)
Rules that keep the pipeline stable:
- Do not add/remove citation keys; when fusing, carry citations forward and re-anchor them to the right sentence.
- Do not move citations across subsections.
- Avoid adjacent citation blocks (e.g., `[@a] [@b]`) and duplicate keys in one block (e.g., `[@a; @a]`).
- When fusing, it is often faster to write two fused candidates (one contrast-heavy, one synthesis-heavy) and pick the better one.

6) Write the report + marker
- `output/PARAGRAPH_CURATION_REPORT.md` should be short and actionable:
  - `- Status: PASS|FAIL`
  - per H3: paragraph count before/after; what was fused; any remaining gaps
  - (minimal) how many candidates you tried for the main rewrites (e.g., 2-3), so future passes can see whether this was a real selection step
- Create `sections/paragraphs_curated.refined.ok`.

## Routing rules

- If you cannot fill a missing must-have paragraph without new evidence: stop and route upstream (`evidence-selfloop` / C3-C4). Do not pad.
- If you feel forced to change a definition or evaluation premise: update `output/ARGUMENT_SKELETON.md# Consistency Contract` first, then rerun `argument-selfloop`.
- If the only issue is surface cadence/openers: do not overwork curation; run `style-harmonizer` / `opener-variator`.

## Done checklist

- [ ] Each targeted H3 stays within its paragraph budget (survey 10-12; deep 11-13) without losing required moves.
- [ ] Redundant paragraphs are fused into denser, clearer ones (not just deleted).
- [ ] No citation keys were added/removed; citation shape is reader-facing (no adjacent blocks, no dup keys).
- [ ] `output/PARAGRAPH_CURATION_REPORT.md` exists and is understandable.
- [ ] `sections/paragraphs_curated.refined.ok` exists.

## Script

### Quick Start

- `python .codex/skills/paragraph-curator/scripts/run.py --workspace workspaces/<ws>`

### All Options

- `--workspace <dir>` (required)
- `--unit-id <U###>`
- `--inputs <semicolon-separated>`
- `--outputs <semicolon-separated>`
- `--checkpoint <C#>`

### Examples

- Curate paragraphs in a survey workspace:
  - `python .codex/skills/paragraph-curator/scripts/run.py --workspace workspaces/survey-llm-agents`