Install via the marketplace CLI, or clone the repo and copy the file into your Claude Code skills directory:

- `npx versuz@latest install vivekkarmarkar-claude-code-os-skills-analyze-paper-against-cluster`
- `git clone https://github.com/VivekKarmarkar/claude-code-os.git`
- `cp claude-code-os/SKILL.MD ~/.claude/skills/vivekkarmarkar-claude-code-os-skills-analyze-paper-against-cluster/SKILL.md`

---
name: analyze-paper-against-cluster
description: Given a query paper, find its semantic cluster in the user's PDF library, read every paper in full, define the cluster's shared methodological class, and judge whether the query paper is (a) an intra-class incremental tweak, (b) a substantive class-jump novelty, or (c) just an application-space sibling that's not methodologically relevant. Use when the user asks "is this paper really new or just another one in the same bucket?"
---

# analyze-paper-against-cluster

Three-stage pipeline: cluster → read → judge.

## Arguments

- `<query>` — paper PDF path (preferred) or title
- `--root <dir>` (optional, default: CWD) — search root for the cluster
- `--top <N>` (optional, default: 5) — how many neighbors form the cluster
- `--threshold <0-1>` (optional, default: 0.15) — minimum similarity for cluster membership

## Pipeline

### Step 1 — Find the cluster

Delegate to `find-similar-papers`:

```bash
python3 ~/.claude/skills/find-similar-papers/helpers/find_similar.py \
  --root "<root>" \
  --query-path "<pdf>" \
  --top <N> --threshold <thresh> --exclude-self
```

Collect the returned paths (the "cluster"). If fewer than 3 neighbors clear the threshold, STOP and report — there is no real cluster to define.

### Step 2 — Read every paper in full

For each PDF in the cluster + the query paper:

```bash
pdftotext -layout "<pdf>" /tmp/cluster_analysis/<idx>_<short_name>.txt
```

Then **read every file in full** with the Read tool (use offset/limit if needed). Do not summarize from abstracts only. The user explicitly cares about reading the whole paper, not just the intro.

### Step 3 — Define the class DNA

Synthesize what unifies the cluster.
Write a single paragraph describing:

- **Input** — what physical measurement they take
- **Model** — local vs global, what physics or learning is invoked
- **Output** — what they actually produce (point estimate, surface map, volumetric field, classification, etc.)
- **Search/Sampling strategy** — how they decide what to do next
- **Scope** — surface only, single-layer, full volume, etc.

This paragraph is the **class definition**. If you cannot write it in one paragraph, the cluster isn't coherent — say so.

### Step 4 — Judge the query paper

Compare the query against the class definition along the same five axes. Then return one of three verdicts:

1. **Intra-class incremental** — same input, same model family, same output, just a smarter sub-component (better acquisition function, better controller, better sensor). Not worth deep attention.
2. **Substantive class-jump** — solves a genuinely different mathematical problem (e.g., local point estimation vs global field reconstruction, forward vs inverse problem, single-modality vs multi-modality fusion). Demands attention.
3. **Application-space sibling, methodologically distinct** — shares the application (e.g., tumor detection) but the methodology is in a different class entirely. Belongs in the broad lit review but not the core baseline list.

### Step 5 — Report

Return a structured response:

```
CLUSTER DEFINITION (1 paragraph)
─────────────────────────────────
[the class DNA]

QUERY PAPER POSITION
─────────────────────────────────
Verdict: [intra-class incremental | class-jump | application-sibling]
Justification: [2-3 sentences citing the specific axis where it differs or doesn't]
```

Keep the report tight. The user wants the verdict, not a literature review.

## Do-not-touch rules

- Read papers in full. Do not shortcut to abstract-only analysis — the user explicitly pushed back on this.
- Do not invent class boundaries. If the cluster is incoherent, say so.
- Do not hedge. Pick one of the three verdicts.
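## Appendix — sketch of Steps 1-2 as a driver script

For orientation, the cluster-then-convert half of the pipeline (Steps 1 and 2) can be sketched as one small script. This is an illustrative sketch under assumptions, not part of the skill: it assumes `find_similar.py` prints one matching PDF path per line on stdout and that `pdftotext` is on `PATH`; the helper names `parse_cluster` and `txt_name` are invented here for illustration.

```python
"""Illustrative driver for Steps 1-2 (find the cluster, convert PDFs to text).

ASSUMPTIONS (not guaranteed by the skill): find_similar.py prints one
matching PDF path per line on stdout, and pdftotext is on PATH.
"""
import subprocess
import sys
from pathlib import Path

MIN_CLUSTER = 3  # Step 1 rule: fewer neighbors means no real cluster to define

# Path to the delegated helper from Step 1
HELPER = Path.home() / ".claude/skills/find-similar-papers/helpers/find_similar.py"


def parse_cluster(stdout: str) -> list[str]:
    """Turn helper stdout into a list of PDF paths (assumed one path per line)."""
    return [line.strip() for line in stdout.splitlines() if line.strip()]


def txt_name(idx: int, pdf: str) -> str:
    """Build the <idx>_<short_name>.txt filename used in Step 2."""
    short = Path(pdf).stem[:40].replace(" ", "_")
    return f"{idx}_{short}.txt"


def find_cluster(root: str, query_pdf: str,
                 top: int = 5, threshold: float = 0.15) -> list[str]:
    """Step 1: delegate to find-similar-papers and collect neighbor paths."""
    out = subprocess.run(
        [sys.executable, str(HELPER),
         "--root", root, "--query-path", query_pdf,
         "--top", str(top), "--threshold", str(threshold), "--exclude-self"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_cluster(out)


def extract_texts(pdfs: list[str], out_dir: str = "/tmp/cluster_analysis") -> None:
    """Step 2: run pdftotext -layout on every PDF for full-text reading."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for idx, pdf in enumerate(pdfs):
        target = Path(out_dir) / txt_name(idx, pdf)
        subprocess.run(["pdftotext", "-layout", pdf, str(target)], check=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    query = sys.argv[1]
    cluster = find_cluster(root=".", query_pdf=query)
    if len(cluster) < MIN_CLUSTER:
        sys.exit(f"Only {len(cluster)} neighbors cleared the threshold: "
                 "no coherent cluster, stop and report.")
    extract_texts(cluster + [query])  # include the query paper itself
```

The actual reading (Step 2's Read tool) and the judgment (Steps 3-5) remain with the agent; this sketch only mechanizes the two shell commands shown above.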