---
name: reproduce-page-basic
description: Reproduce a single research-paper PDF page as a standalone compiled LaTeX file — prose taken verbatim from the PDF text layer, equations hand-transcribed from a visual reading of the rendered page, figures as placeholder boxes with visual descriptions, tables reconstructed via booktabs. Use when the user invokes `/reproduce-page-basic` with a PDF path and a page number, or asks to "reproduce page N of <paper>" in a way that produces a compiled LaTeX artifact.
---
# reproduce-page-basic
Produce a faithful LaTeX reconstruction of a *single* PDF page, using a hybrid pipeline that minimizes LLM involvement (where hallucination is a risk) and maximizes use of deterministic Unix tools (where exactness is free).
**Scope boundary**: this skill does NOT reproduce figures as images, does NOT redraw figures as TikZ, does NOT build a BibTeX bibliography, does NOT achieve pixel-perfect publisher layout. Figures become placeholder boxes with prose descriptions; citations stay as literal `[N]` text markers; output uses standard `article` class. For future skills that handle real figures or full-paper workflows, see the "scope boundary" section at the bottom.
## Arguments
- `<pdf_path>` — absolute path to the source PDF (must have a text layer; scanned PDFs won't work)
- `<page_number>` — 1-indexed page number as it appears when the PDF is opened in a viewer (NOT the printed journal page number, unless they happen to match)
- `[<out_dir>]` — optional; defaults to the current working directory
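
The three arguments and the `<stem>_page<N>` file naming used later in the pipeline can be sketched as a small resolver. This is illustrative only and not part of the skill: `PageJob` and `resolve_job` are hypothetical names, and the sketch assumes `<stem>` means the PDF filename without its extension.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class PageJob:
    pdf: Path      # source PDF (read-only input)
    page: int      # 1-indexed PDF page, not the printed journal page
    raw_txt: Path  # <out_dir>/<stem>_page<N>_raw.txt
    tex: Path      # <out_dir>/<stem>_page<N>.tex


def resolve_job(pdf_path: str, page_number: int,
                out_dir: Optional[str] = None) -> PageJob:
    """Hypothetical helper: validate the arguments and derive output paths."""
    pdf = Path(pdf_path)
    if not pdf.is_absolute():
        raise ValueError("pdf_path must be an absolute path")
    if page_number < 1:
        raise ValueError("page_number is 1-indexed and must be >= 1")
    # out_dir is optional and defaults to the current working directory.
    out = Path(out_dir) if out_dir is not None else Path.cwd()
    base = f"{pdf.stem}_page{page_number}"
    return PageJob(pdf, page_number, out / f"{base}_raw.txt", out / f"{base}.tex")
```

For example, `resolve_job("/papers/wei.pdf", 3, "/tmp/out")` yields `/tmp/out/wei_page3_raw.txt` and `/tmp/out/wei_page3.tex` on a POSIX system.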
## Pipeline (execute these steps in order)
### 1. Extract the prose verbatim (deterministic)
Run the prose helper, which wraps `pdftotext -layout` for a single page:
```bash
python3 ~/.claude/skills/reproduce-page-basic/helpers/extract_pdf_text.py \
<pdf_path> --page <N> > <out_dir>/<stem>_page<N>_raw.txt
```
Treat the contents of `<stem>_page<N>_raw.txt` as the source of truth for all prose on the page. Do NOT retype prose from memory or from the visual reading — only from this file. This is the key invariant that prevents hallucination.
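For orientation, the core of such a wrapper might look roughly like the following. This is a sketch based only on the description of `extract_pdf_text.py` as a wrapper around `pdftotext -layout -f N -l N`, not the helper's actual source. The function names are hypothetical, and it assumes the Poppler `pdftotext` binary is on `PATH`.

```python
import subprocess
from typing import List


def pdftotext_cmd(pdf_path: str, page: int) -> List[str]:
    """Build the command line for extracting exactly one page."""
    if page < 1:
        raise ValueError("page is 1-indexed")
    # -f/-l select the first and last page; "-" sends the text to stdout.
    return ["pdftotext", "-layout", "-f", str(page), "-l", str(page), pdf_path, "-"]


def extract_page_text(pdf_path: str, page: int) -> str:
    """Run pdftotext and return the page's text layer as a string."""
    # check=True raises CalledProcessError if pdftotext exits non-zero.
    result = subprocess.run(
        pdftotext_cmd(pdf_path, page),
        check=True, capture_output=True, text=True, encoding="utf-8",
    )
    return result.stdout
```

No model is involved anywhere in this step, which is why the output can be treated as the source of truth for prose.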
### 2. Read the page visually (for equations, figures, tables, layout)
Use the Read tool with `pages: "<N>"` to load the rendered page image:
```
Read(<pdf_path>, pages: "<N>")
```
From this visual reading, you extract:
- **Equations** — hand-transcribe to LaTeX. `pdftotext` mangles equations into scattered line fragments; ignore those fragments entirely.
- **Figures** — note the layout, panel structure, colorbars, captions, axis labels. You do NOT reproduce the figure; you write a placeholder box that describes it.
- **Tables** — reproduce the structure via `booktabs` + `multirow`. Cell content comes from the visual reading.
- **Section headers, italicized subsection headers, bold labels** — these are layout cues the text layer doesn't preserve reliably.
### 3. Write the `.tex` file
Combine the cleaned prose (from step 1) with the hand-transcribed equations and figure/table structure (from step 2) into a standalone, compilable `.tex` file. The file should:
1. Start with the stable preamble (see `preamble_stable.tex` in this skill, and `examples/wei-explicit-inverse/` for complete files). Add `booktabs`, `multirow`, `array` to the preamble if the page contains a table.
2. Use the footer convention `\cfoot{\small <PRINTED_PAGE> $\to$ \arabic{page}}` + `\setcounter{page}{1}`, where `<PRINTED_PAGE>` is the journal's printed page number (NOT the PDF page index — check the visual for the actual number shown on the page). One journal page often spills onto two physical pages in the reconstruction; with this convention the footers read "136 → 1" and "136 → 2" instead of the misleading "136" and "137".
3. Use `\setcounter{equation}{M}` at the top where M is the last equation number from the previous page (so per-page equation numbering matches the paper). Skip this if the page has no equations.
4. Apply the prose cleanup rules from NOTES.md §6:
   - Soft hyphens (U+00AD): join word halves (`distri` + U+00AD + `bution` → `distribution`)
   - Line breaks inside paragraphs: collapse to spaces
   - Unicode en-dashes (U+2013): convert to LaTeX `--`
   - Unicode right quotation marks (U+2019): convert to ASCII `'`
   - Escape `%`, `&`, `$`, `_`, `#` in prose
5. Preserve typesetting anomalies **verbatim** — do not "fix" the paper. See NOTES.md §5 for known anomaly types.
6. Reference figures as placeholder boxes following the template in NOTES.md §7.
7. Reference tables using the template in NOTES.md §7a.
8. If the page ends mid-sentence, preserve the cut-off. Do NOT complete the sentence from context.
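
The prose cleanup rules in item 4 can be sketched as one small function. This is a minimal illustration, not the skill's helper code: `clean_prose` is a hypothetical name, and the sketch assumes paragraphs in the raw text are separated by blank lines and that a soft hyphen may sit at a line end.

```python
import re

SOFT_HYPHEN = "\u00ad"
EN_DASH = "\u2013"
RIGHT_QUOTE = "\u2019"


def clean_prose(raw: str) -> str:
    """Apply the cleanup rules to raw pdftotext prose (illustrative sketch)."""
    # Join word halves split by a soft hyphen, including one at a line end.
    text = re.sub(SOFT_HYPHEN + r"\n?", "", raw)
    text = text.replace(EN_DASH, "--").replace(RIGHT_QUOTE, "'")
    # Blank lines separate paragraphs; line breaks inside one become spaces.
    paragraphs = [" ".join(p.split()) for p in re.split(r"\n\s*\n", text.strip())]
    # Escape the LaTeX special characters named in the rules.
    escaped = [re.sub(r"([%&$_#])", r"\\\1", p) for p in paragraphs]
    return "\n\n".join(escaped)
```

The function only rearranges and escapes characters that are already in the raw text; it never adds words, which keeps it on the deterministic side of the pipeline.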
### 4. Compile
```bash
cd <out_dir>
pdflatex -interaction=nonstopmode <stem>_page<N>.tex
```
If compilation fails, read the log, fix the LaTeX error (usually an unescaped `%` or a missing package), and recompile. Do NOT rewrite the content from scratch — the error is almost always a single-character typo.
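Reading the log can be narrowed to the lines that matter. The sketch below relies on the standard TeX log format, in which error messages start with `! ` and the offending source line is reported as `l.<number>`. `tex_errors` is a hypothetical helper, not something shipped with this skill.

```python
from typing import List


def tex_errors(log_text: str) -> List[str]:
    """Return each '! ...' error line, with the 'l.<n>' line that locates it."""
    lines = log_text.splitlines()
    found = []
    for i, line in enumerate(lines):
        if line.startswith("! "):
            # The source line number normally follows within a few log lines.
            where = next((l for l in lines[i + 1:i + 6] if l.startswith("l.")), "")
            found.append(f"{line} [{where}]" if where else line)
    return found
```

Assuming `pdflatex` wrote `<stem>_page<N>.log` next to the `.tex` file, pass its contents in with something like `Path(log_path).read_text(errors="replace")`.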
### 5. Open the result
```bash
xdg-open <stem>_page<N>.pdf
```
### 6. (Optional but recommended) Record what was tricky
If the page introduced a new quirk not covered by NOTES.md (new content type, new typesetting anomaly, new prose cleanup rule), append a brief note to a `NOTES.md` file in `<out_dir>` so future reproductions of pages from the same paper can consult it. This is how the example base grows.
## The invariant you must preserve
> **Prose comes from the text layer. Equations come from the visual. Neither tool crosses into the other's domain.**
This is the key insight that makes the pipeline work. `pdftotext` is character-exact on prose but mangles equations into meaningless line fragments. The Read tool sees equations cleanly but will hallucinate long prose if asked to retype it. Using each tool only inside its strength zone is what gives the output its fidelity.
If you ever find yourself retyping a sentence "from memory" because the raw.txt had some weird character, STOP. Either the raw.txt is correct (and you need to escape the weird character in LaTeX), or the PDF has no text layer (in which case this skill cannot handle it and you should tell the user).
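A quick way to tell the two cases apart is to inspect the raw text itself. The sketch below lists every non-ASCII character with its code point and Unicode name, so it can be escaped deliberately, and flags an empty extraction. Both function names are hypothetical, and treating an empty extraction as a missing text layer is an assumption.

```python
import unicodedata
from typing import Dict


def non_ascii_report(raw_text: str) -> Dict[str, str]:
    """Map each distinct non-ASCII character to its code point and Unicode name."""
    report = {}
    for ch in sorted(set(raw_text)):
        if ord(ch) > 127:
            report[ch] = f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}"
    return report


def has_text_layer(raw_text: str) -> bool:
    # An extraction with only whitespace suggests a scanned page.
    return bool(raw_text.strip())
```

If `has_text_layer` returns `False`, the skill cannot handle the page. Otherwise the report identifies exactly which characters need a LaTeX escape or replacement.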
## Resources in this skill
- **`NOTES.md`** — judgment calls collected from reproducing Wei et al. pages 1–11: math notation conventions for the Wei paper, prose cleanup rules, figure placeholder template, table template, known typesetting anomalies. Read this before starting if the paper you're reproducing looks similar to Wei; use it as a decision reference in general.
- **`preamble_stable.tex`** — the standard preamble that has compiled cleanly across 11 pages. Copy this block to the top of new reconstructions, then add/remove packages as needed.
- **`helpers/`** — four atomic CLI tools for the mechanical steps:
  - `extract_pdf_text.py` — wraps `pdftotext -layout -f N -l N`
  - `clean_soft_hyphens.py` — strips U+00AD, converts U+2019 to ASCII `'` and U+2013 to `--`
  - `reflow_paragraphs.py` — collapses per-line output to paragraph-per-blank-line
  - `escape_latex.py` — escapes LaTeX special characters in prose text

  Each helper reads from stdin or a file argument and writes to stdout, so they compose with Unix pipes.
- **`examples/wei-explicit-inverse/`** — eleven worked reconstructions of Wei et al. (Appl. Math. Modelling 134, 2024). Each is a complete `.tex` → `.pdf` pair you can open and read as a reference for what "good" looks like. See `examples/wei-explicit-inverse/README.md` for the per-page content map.
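
The stdin-or-file convention described for the helpers can be sketched as a reusable filter skeleton. This is an assumption about how such a tool could be structured, not the helpers' actual code; `run_filter` and `strip_soft_hyphens` are hypothetical names.

```python
import argparse
import sys
from typing import Callable, List, Optional, TextIO


def run_filter(transform: Callable[[str], str],
               argv: Optional[List[str]] = None,
               stdin: TextIO = sys.stdin,
               stdout: TextIO = sys.stdout) -> None:
    """Read text from a file argument or stdin, transform it, write to stdout."""
    parser = argparse.ArgumentParser(description="stdin-or-file text filter")
    parser.add_argument("path", nargs="?", help="input file; omit to read stdin")
    args = parser.parse_args(argv)  # argv=None falls back to sys.argv[1:]
    if args.path:
        with open(args.path, encoding="utf-8") as handle:
            text = handle.read()
    else:
        text = stdin.read()
    stdout.write(transform(text))


def strip_soft_hyphens(text: str) -> str:
    # Example transform: drop U+00AD so split word halves rejoin.
    return text.replace("\u00ad", "")
```

A script that ends with `run_filter(strip_soft_hyphens)` can then sit in the middle of a pipe or take a filename, matching the behaviour the helper list describes.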
## When to consult which resource
| If the page has… | Consult |
|---|---|
| Only prose (no equations, no figures) | `examples/wei-explicit-inverse/wei_page1.tex` (two-column abstract block) |
| Equations + prose | `examples/wei-explicit-inverse/wei_page2.tex` or `wei_page4.tex` (dense, 7 equations) |
| A figure | `examples/wei-explicit-inverse/wei_page3.tex` (B-spline figure) or `wei_page5.tex` (two figures) |
| A result figure comparison grid | `examples/wei-explicit-inverse/wei_page9.tex` or `wei_page10.tex` |
| A table | `examples/wei-explicit-inverse/wei_page11.tex` (booktabs + multirow, 2-level column headers) |
| A typesetting anomaly (bold equation, unusual glyph) | NOTES.md §5 |
| Uncertainty about a math symbol | NOTES.md §4 |
## Scope boundary (what this skill intentionally does NOT do)
- Does NOT extract figures as images
- Does NOT redraw figures as TikZ
- Does NOT reproduce pixel-perfect publisher layout
- Does NOT build a BibTeX bibliography
- Does NOT handle scanned PDFs without a text layer
- Does NOT reproduce a whole paper in one invocation — use a shell loop for that:
```bash
for p in $(seq 1 22); do /reproduce-page-basic paper.pdf $p; done
```
The atomic-first design is deliberate; see `reproduce-page-basic-global-skill-blueprint/architecture.md` §"The Granularity Decision" for the McIlroy reasoning.
These are categorical limitations. Future skills (`reproduce-page-with-figures`, `reproduce-page-full`, `reproduce-paper`) would address them one at a time, when the need actually appears.
## Do-not-touch rules
- NEVER modify the input PDF. It is read-only.
- NEVER overwrite an existing `<stem>_page<N>.tex` without user confirmation. Reconstructions are work product, not scratch.
- NEVER "fix" typesetting anomalies in the source paper. Reproduce verbatim.
- NEVER retype prose from memory. It must come from `pdftotext` output.
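
The no-overwrite rule can be enforced mechanically rather than by memory. A minimal sketch, assuming the reconstruction is written from Python; `write_new_tex` is a hypothetical helper, not part of this skill.

```python
from pathlib import Path


def write_new_tex(tex_path: Path, content: str) -> None:
    """Create the .tex file, refusing to replace an existing one."""
    # Mode "x" creates the file and raises FileExistsError if it already
    # exists, so an earlier reconstruction is never silently replaced.
    with open(tex_path, "x", encoding="utf-8") as handle:
        handle.write(content)
```

Catching `FileExistsError` at the call site is the natural place to stop and ask the user for confirmation.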