---
name: reproduce-basic-paper-tex
description: Reproduce an entire research paper as a single merged LaTeX source file by orchestrating the lower-level reproduce-*-basic-tex skills. Asks the user for paper context (name, PDF path, reconstruction directory, stem), detects which per-page `.tex` files already exist, dispatches the most-efficient downstream skill to generate whatever is missing (all-pages / multi-pages / per-page based on the shape of the missing set), concatenates all per-page files into `<stem>_full.tex`, and verifies the result compiles cleanly. Use when the user invokes `/reproduce-basic-paper-tex` and wants a single end-to-end "reconstruct this paper" command that picks up from wherever the reconstruction directory currently sits.
---

# reproduce-basic-paper-tex

Produce a **single merged LaTeX document** for an entire research paper by orchestrating the three lower-level reproduce-*-basic-tex skills. This is the **top of the reproduce-page-basic family** — a Layer 3 orchestrator that:

1. Asks the user for context (paper name, PDF path, stem, reconstruction directory)
2. Detects which per-page `.tex` files already exist in the reconstruction directory
3. Dispatches the most-efficient downstream skill to generate whatever is missing
4. Concatenates all per-page files into `<stem>_full.tex` via the skill-local `concatenate_pages.py` helper
5. Verifies the merged file compiles cleanly via `pdflatex` to a temp directory (so no `.pdf`/`.aux`/`.log` pollute the reconstruction dir)
6. Reports success or the compile error and STOPs

The skill's primary value is **intelligent dispatch**: it picks the cheapest downstream skill based on the shape of the missing-page set. On a reconstruction dir that already has all pages, no downstream skill is invoked, so the run reduces to concatenate + verify and costs only one merge and one `pdflatex` pass.

## Arguments (optional — skill asks for missing pieces)

- `<pdf_path>` — absolute path to the source PDF (optional if all per-page tex files already exist)
- `<stem>` — filename prefix for per-page files (e.g., `test_paper` means the skill looks for `test_paper_page1.tex`, `test_paper_page2.tex`, ...). Defaults to the PDF basename without extension.
- `<reconstruction_dir>` — directory containing (or destined to contain) the per-page files. Defaults to `<stem>_reconstruction/` under the current working directory.

**Interactive mode**: if invoked with zero arguments, the skill asks the user for `<pdf_path>` first, then infers `<stem>` and `<reconstruction_dir>` from sensible defaults and confirms them explicitly before proceeding.
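
The default rules above can be sketched in a few lines of Python. This is only an illustration of the documented defaults, not code shipped with the skill; the function name `resolve_defaults` and its `cwd` parameter are assumptions introduced here.

```python
from pathlib import Path
from typing import Optional, Tuple


def resolve_defaults(pdf_path: str,
                     stem: Optional[str] = None,
                     reconstruction_dir: Optional[str] = None,
                     cwd: Optional[str] = None) -> Tuple[str, Path]:
    """Return (stem, reconstruction_dir) after applying the documented defaults.

    Hypothetical helper: the name and signature are illustrative only.
    """
    base = Path(cwd) if cwd is not None else Path.cwd()
    # Stem defaults to the PDF basename without its extension.
    resolved_stem = stem if stem else Path(pdf_path).stem
    # Reconstruction dir defaults to <stem>_reconstruction/ under the working directory.
    if reconstruction_dir:
        resolved_dir = Path(reconstruction_dir)
    else:
        resolved_dir = base / f"{resolved_stem}_reconstruction"
    return resolved_stem, resolved_dir
```

Whatever values come out of a step like this should still be confirmed with the user before the pipeline proceeds, as interactive mode requires.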

## Pipeline

### Step 1 — Gather context

If any required argument is missing, ask the user a single consolidated question:

> "I'm about to reconstruct a paper as a merged `.tex` file. I need:
>  - PDF path (e.g., `/path/to/paper.pdf`)
>  - Stem for the per-page filenames (e.g., `paper_stem` → looks for `paper_stem_pageN.tex`). Default: PDF basename.
>  - Reconstruction directory. Default: `<stem>_reconstruction/` under CWD.
>  - Human-readable paper name (for logs). Default: PDF basename."

Wait for the user's reply. Accept defaults on anything they omit.

If the user supplies a PDF path but NO per-page tex files exist yet and no reconstruction directory was specified, create `<stem>_reconstruction/` as the default location.

**Edge case**: if no PDF is supplied AND no per-page files exist, report that nothing can be done and STOP — there's nothing to operate on.

### Step 2 — Determine the total page count

Two possible sources:

- **PDF available**: run the shared helper
  ```bash
  python3 ~/.claude/skills/reproduce-all-pages-basic-tex/helpers/get_total_pages.py <pdf_path>
  ```
  Capture the bare-integer stdout as `TOTAL`. If the helper exits non-zero, surface its stderr to the user and STOP.

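- **PDF not available (all-pages-already-exist path)**: infer `TOTAL` from the highest-numbered `<stem>_pageN.tex` file in `<reconstruction_dir>`. If no files exist, STOP (see Step 1 edge case). Note that this inference cannot detect pages missing beyond the highest existing file, which is one more reason to have the user confirm `TOTAL`.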

Report the derived `TOTAL` to the user so they can confirm the skill is looking at the right paper.
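
For the no-PDF path, inferring `TOTAL` from the filenames might look like the sketch below. The function names `existing_pages` and `infer_total` are hypothetical; the only thing taken from this document is the `<stem>_page<N>.tex` naming scheme.

```python
import re
from pathlib import Path
from typing import List, Optional


def existing_pages(reconstruction_dir: str, stem: str) -> List[int]:
    """Return the sorted page numbers N for which <stem>_page<N>.tex exists."""
    # Anchored pattern, so files such as <stem>_full.tex are ignored.
    pattern = re.compile(rf"^{re.escape(stem)}_page(\d+)\.tex$")
    pages = []
    for entry in Path(reconstruction_dir).iterdir():
        match = pattern.match(entry.name)
        if match and entry.is_file():
            pages.append(int(match.group(1)))
    return sorted(pages)


def infer_total(reconstruction_dir: str, stem: str) -> Optional[int]:
    """Highest existing page number, or None when no per-page files exist."""
    pages = existing_pages(reconstruction_dir, stem)
    return pages[-1] if pages else None
```

A `None` result corresponds to the Step 1 edge case: with no PDF and no per-page files, there is nothing to operate on.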

### Step 3 — Classify the missing-page set

For each `N` in `1..TOTAL`, check whether `<reconstruction_dir>/<stem>_page<N>.tex` exists. Partition `1..TOTAL` into `EXISTING` and `MISSING`.
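
A minimal sketch of that partition, assuming only the file-naming convention used throughout this document (the name `partition_pages` is hypothetical):

```python
from pathlib import Path
from typing import List, Tuple


def partition_pages(reconstruction_dir: str, stem: str, total: int) -> Tuple[List[int], List[int]]:
    """Split 1..total into (EXISTING, MISSING) by checking each per-page file."""
    root = Path(reconstruction_dir)
    existing: List[int] = []
    missing: List[int] = []
    for n in range(1, total + 1):
        target = root / f"{stem}_page{n}.tex"
        # A page counts as existing only if its .tex file is present on disk.
        (existing if target.is_file() else missing).append(n)
    return existing, missing
```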

Classify `MISSING` into one of four shapes:

| Shape of `MISSING` | Downstream action |
|---|---|
| `MISSING == ∅` (empty — all pages already exist) | Skip directly to Step 5 (concatenate) |
| `MISSING == {1, 2, …, TOTAL}` (all pages missing) | Invoke `reproduce-all-pages-basic-tex` with `<pdf_path>` |
| `MISSING` is a **single contiguous run** `{a, a+1, …, b}` (some existing, some missing, but missing forms one unbroken range) | Invoke `reproduce-multiple-pages-basic-tex` with `<pdf_path> a b` |
| `MISSING` has **gaps** (two or more contiguous runs, or a mix of runs and singletons, or only scattered singletons) | Invoke `reproduce-page-basic-tex` with `<pdf_path> N` **once per missing page** `N` (in ascending order) |

**Contiguous-run detection algorithm**: sort `MISSING`. Check that `missing[i+1] == missing[i] + 1` for all adjacent pairs. If yes, it's a single contiguous run `[missing[0], missing[-1]]`. If no, fall through to the scattered-gaps case.
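
The four-way classification, including the adjacent-pair check, can be expressed as the following sketch. The function name `classify_missing` and the shape labels (`"none"`, `"all"`, `"contiguous"`, `"scattered"`) are illustrative assumptions, not identifiers used by the skill.

```python
from typing import Iterable, List, Tuple


def classify_missing(missing: Iterable[int], total: int) -> Tuple[str, List[int]]:
    """Classify the missing-page set into one of the four shapes from the table.

    Returns (shape, payload): payload is [] for "none", the full page list for
    "all" and "scattered", and [a, b] (the run's endpoints) for "contiguous".
    """
    pages = sorted(set(missing))
    if not pages:
        return "none", []
    if pages == list(range(1, total + 1)):
        return "all", pages
    # Single contiguous run: every adjacent pair differs by exactly 1.
    is_contiguous = all(b == a + 1 for a, b in zip(pages, pages[1:]))
    if is_contiguous:
        return "contiguous", [pages[0], pages[-1]]
    return "scattered", pages
```

The order of the checks matters: an all-missing set is also a contiguous run, so it has to be tested first. Under the adjacent-pair rule a single missing page has no pairs to check, so it is classified as a contiguous run with `a == b`.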

Report the classification and the chosen downstream skill to the user BEFORE invoking it (single line, e.g. `"All 15 pages missing → delegating to reproduce-all-pages-basic-tex"` or `"3 of 15 pages missing at {5, 8, 12} → running reproduce-page-basic-tex 3 times"`).

### Step 4 — Generate missing pages (if any)

Execute the downstream skill pipeline selected in Step 3. For each downstream invocation, follow that skill's own SKILL.md pipeline verbatim — do NOT re-implement its logic inline here.

**Failure handling**: if the downstream skill reports per-page failures via its continue-and-report summary, do NOT proceed to Step 5. Report the failed pages to the user and STOP. (Concatenating an incomplete reconstruction would produce a merged output that silently loses content.)

### Step 5 — Concatenate

Once all `<stem>_page1.tex` through `<stem>_page<TOTAL>.tex` exist, run the skill-local helper:

```bash
python3 ~/.claude/skills/reproduce-basic-paper-tex/helpers/concatenate_pages.py \
  <reconstruction_dir> <stem> <TOTAL>
```

The helper:
- Discovers all `<stem>_page<N>.tex` files in `<reconstruction_dir>`
- Sorts them by page number
- Verifies the full range `[1, TOTAL]` is present (exit 1 if any are missing — in which case Step 4 didn't finish and the skill should report that and STOP)
- Extracts each file's body content and its `\setcounter{equation}{M}` line
- Assembles a single merged document with a canonical superset preamble + `\newpage` between pages
- Writes `<reconstruction_dir>/<stem>_full.tex`

Exit 0 from the helper means the merge succeeded structurally. Stdout reports the output path and page count.

### Step 6 — Verify coherence

Compile the merged file via `pdflatex` in **a temp directory** so no `.pdf`/`.aux`/`.log` files pollute `<reconstruction_dir>`:

```bash
TMPDIR=$(mktemp -d)
pdflatex -interaction=nonstopmode -output-directory="$TMPDIR" \
  <reconstruction_dir>/<stem>_full.tex > /tmp/<stem>_compile.log 2>&1
```

Check for fatal LaTeX errors by searching the log for lines matching `^!`:

```bash
ERROR_LINES=$(grep "^!" /tmp/<stem>_compile.log)
```

- **If `ERROR_LINES` is empty**: compile succeeded. Report "coherent ✓" with the PDF page count (read via `pdfinfo` on the temp-dir output). Delete `$TMPDIR`. Proceed to Step 7.
- **If `ERROR_LINES` is non-empty**: compile failed. Report the first 10 lines of errors to the user along with the source line numbers, note that the merged `.tex` file still exists at `<reconstruction_dir>/<stem>_full.tex` for manual inspection, and STOP.
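
As a rough sketch of the failure branch, the error lines and their source line numbers could be pulled out of the log as shown below. It assumes the default `pdflatex` log layout, in which a line starting with `!` is later followed by a line of the form `l.<N> ...` giving the source line. The name `extract_errors` is hypothetical.

```python
import re
from typing import List, Optional, Tuple

# In the default log layout, the source line is reported as "l.<N> <context>".
LINE_REF = re.compile(r"^l\.(\d+)\b")


def extract_errors(log_text: str, limit: int = 10) -> List[Tuple[str, Optional[int]]]:
    """Return up to `limit` (error_message, source_line) pairs from a compile log."""
    lines = log_text.splitlines()
    errors: List[Tuple[str, Optional[int]]] = []
    for i, line in enumerate(lines):
        if not line.startswith("!"):
            continue
        source_line = None
        # Look ahead for the next "l.<N>" line, stopping at the next error.
        for follow in lines[i + 1:]:
            if follow.startswith("!"):
                break
            m = LINE_REF.match(follow)
            if m:
                source_line = int(m.group(1))
                break
        errors.append((line, source_line))
        if len(errors) == limit:
            break
    return errors
```

Some errors (for example an emergency stop) carry no `l.<N>` line, which is why the source line is optional here.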

**Important**: compilation errors in the merged file are often caused by latent bugs in individual per-page files (e.g., unescaped `\~`, unbalanced braces, missing math-mode delimiters). The tex-only sub-skills don't compile during their own pipelines, so such bugs can sit undetected until this skill runs. When reporting a compile failure, tell the user which per-page file the error likely originated from (the line number in the merged file can be traced back to a specific page by counting `% =========== Page N:` comment markers).
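
Tracing a merged-file line number back to its page could be done along these lines. The sketch assumes the markers begin with `% =========== Page N:` as described above; anything after the colon is unknown and ignored, and the name `page_for_line` is hypothetical.

```python
import re
from typing import Optional

# Matches comment markers of the form "% =========== Page N:" (any number of "=").
PAGE_MARKER = re.compile(r"^%\s*=+\s*Page\s+(\d+):")


def page_for_line(merged_text: str, line_number: int) -> Optional[int]:
    """Return the page whose marker most recently precedes `line_number` (1-indexed).

    Returns None when the line falls before the first marker, e.g. in the preamble.
    """
    current = None
    for idx, line in enumerate(merged_text.splitlines(), start=1):
        if idx > line_number:
            break
        m = PAGE_MARKER.match(line)
        if m:
            current = int(m.group(1))
    return current
```

An error at a line that maps to page `N` most likely originated in `<stem>_page<N>.tex`, though an unbalanced brace or unclosed environment can surface on a later page than the one that caused it.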

### Step 7 — Report and STOP

On success, report a single summary block:

```
Paper: <paper_name>
PDF: <pdf_path>  (<TOTAL> pages)
Reconstruction dir: <reconstruction_dir>
Pages generated this run: <count of pages created in Step 4, or "0 — all already existed">
Merged output: <reconstruction_dir>/<stem>_full.tex
Coherence check: ✓ compiles cleanly (<physical PDF page count>)
```

Then STOP. Do NOT:
- Open any viewer
- Copy the PDF anywhere
- Ask "do you want to see it?"
- Append to any NOTES.md (that's the per-page skills' job)

## Failure semantics

This skill can fail at Steps 1, 2, 4, 5, and 6; each failure mode halts the pipeline with a clear reason:

| Step | Failure mode | User-visible message |
|---|---|---|
| 1 | No PDF and no existing per-page files | "Nothing to operate on — give me a PDF path or an existing reconstruction directory." |
| 2 | `get_total_pages.py` non-zero exit | Surface the helper's stderr verbatim. |
| 4 | Downstream skill reports per-page failures | "Reconstruction is incomplete — the following pages failed: [list]. Merged `.tex` not generated." |
| 5 | `concatenate_pages.py` exit 1 (missing pages) | "Pages <list> missing after Step 4 — something is wrong with the downstream skill's output. Merged `.tex` not generated." |
| 5 | `concatenate_pages.py` exit 2 (parse error) | "A per-page file is malformed: <error>. Merged `.tex` not generated." |
| 6 | `pdflatex` reports fatal errors | Show first 10 `^!` lines + pointer to the likely source page + note that the merged `.tex` exists for manual inspection. |

The skill never silently produces a corrupt `<stem>_full.tex`. Either it finishes with coherence ✓, or it halts with a reason.

## Resources in this skill

- **`NOTES.md`** (symlink → base `reproduce-page-basic/NOTES.md`) — shared judgment-call reference.
- **`examples/`** (symlink → base) — 11 worked Wei examples, consulted transitively by downstream skills.
- **`helpers/concatenate_pages.py`** — **new**; the skill-local merger. Takes `<reconstruction_dir> <stem> [<expected_total>]`, produces `<reconstruction_dir>/<stem>_full.tex`. Uses a canonical superset preamble hardcoded in the helper (covers every package seen in the Wei + Adam reconstructions). Tri-state exit codes (0 = OK, 1 = missing pages, 2 = usage/parse error).

This skill does NOT symlink the four base helpers because it never runs per-page extraction or prose cleanup directly — those are the downstream skills' job.

## When to use this skill vs. the sub-skills directly

| If you want… | Use |
|---|---|
| A single page, `.tex` + compiled PDF + viewer | `reproduce-page-basic` |
| A single page, `.tex` only | `reproduce-page-basic-tex` |
| A contiguous range of pages, `.tex` only | `reproduce-multiple-pages-basic-tex` |
| Every page of a paper, `.tex` only, individual per-page files | `reproduce-all-pages-basic-tex` |
| **A single merged `.tex` document for the whole paper** (end-to-end) | **`reproduce-basic-paper-tex`** (this skill) |
| **Resume a partially-completed reconstruction** (some pages exist, others missing) | **`reproduce-basic-paper-tex`** — its dispatch logic will pick the right sub-skill based on what's missing |
| Pixel-perfect publisher layout | None of the current skills — out of scope |

The "resume" use case is the one this skill uniquely enables: you can stop mid-reconstruction, come back later, and invoke this skill on the partial state. It'll fill in whatever's missing and concatenate. No other skill in the family handles that.

## Do-not-touch rules

- NEVER modify the input PDF. It is read-only.
- NEVER delete or overwrite existing `<stem>_page<N>.tex` files. The idempotency `.bak` rule is inherited from the downstream skills; this skill respects it.
- NEVER concatenate an incomplete set of per-page files. If Step 4 reports failures, halt — do not produce a "mostly merged" output that silently loses content.
- NEVER write `.pdf`/`.aux`/`.log` files to `<reconstruction_dir>`. The coherence check happens in a temp directory; only `<stem>_full.tex` lands in the reconstruction dir.
- NEVER run the merged document through a viewer. The skill is source-only; the temp-dir compile is for verification only.
- NEVER skip the Step 6 coherence check. Compilation is the ground-truth "does this work" signal and catches bugs that the tex-only sub-skills miss by design.