Install via the marketplace CLI, or clone the repo and copy the file into your Claude Code skills directory:

- `npx versuz@latest install vivekkarmarkar-claude-code-os-skills-analyze-paper-against-cluster`
- `git clone https://github.com/VivekKarmarkar/claude-code-os.git`
- `cp claude-code-os/SKILL.MD ~/.claude/skills/vivekkarmarkar-claude-code-os-skills-analyze-paper-against-cluster/SKILL.md`

---
name: analyze-paper-against-cluster
description: Given a query paper, find its semantic cluster in the user's PDF library, read every paper in full, define the cluster's shared methodological class, and judge whether the query paper is (a) an intra-class incremental tweak, (b) a substantive class-jump novelty, or (c) just an application-space sibling that's not methodologically relevant. Use when the user asks "is this paper really new or just another one in the same bucket?"
---

# analyze-paper-against-cluster

Three-stage pipeline: cluster → read → judge.

## Arguments

- `<query>` — paper PDF path (preferred) or title
- `--root <dir>` (optional, default: CWD) — search root for the cluster
- `--top <N>` (optional, default: 5) — how many neighbors form the cluster
- `--threshold <0-1>` (optional, default: 0.15) — minimum similarity for cluster membership

## Pipeline

### Step 1 — Find the cluster

Delegate to `find-similar-papers`:

```bash
python3 ~/.claude/skills/find-similar-papers/helpers/find_similar.py \
  --root "<root>" \
  --query-path "<pdf>" \
  --top <N> --threshold <thresh> --exclude-self
```

Collect the returned paths (the "cluster"). If fewer than 3 neighbors clear the threshold, STOP and report — there is no real cluster to define.

### Step 2 — Read every paper in full

For each PDF in the cluster + the query paper:

```bash
pdftotext -layout "<pdf>" /tmp/cluster_analysis/<idx>_<short_name>.txt
```

Then **read every file in full** with the Read tool (use offset/limit if needed). Do not summarize from abstracts only. The user explicitly cares about reading the whole paper, not just the intro.

### Step 3 — Define the class DNA

Synthesize what unifies the cluster.
Write a single paragraph describing:

- **Input** — what physical measurement they take
- **Model** — local vs global, what physics or learning is invoked
- **Output** — what they actually produce (point estimate, surface map, volumetric field, classification, etc.)
- **Search/Sampling strategy** — how they decide what to do next
- **Scope** — surface only, single-layer, full volume, etc.

This paragraph is the **class definition**. If you cannot write it in one paragraph, the cluster isn't coherent — say so.

### Step 4 — Judge the query paper

Compare the query against the class definition along the same five axes. Then return one of three verdicts:

1. **Intra-class incremental** — same input, same model family, same output, just a smarter sub-component (better acquisition function, better controller, better sensor). Not worth deep attention.
2. **Substantive class-jump** — solves a genuinely different mathematical problem (e.g., local point estimation vs global field reconstruction, forward vs inverse problem, single-modality vs multi-modality fusion). Demands attention.
3. **Application-space sibling, methodologically distinct** — shares the application (e.g., tumor detection) but the methodology is in a different class entirely. Belongs in the broad lit review but not the core baseline list.

### Step 5 — Report

Return a structured response:

```
CLUSTER DEFINITION (1 paragraph)
─────────────────────────────────
[the class DNA]

QUERY PAPER POSITION
─────────────────────────────────
Verdict: [intra-class incremental | class-jump | application-sibling]
Justification: [2-3 sentences citing the specific axis where it differs or doesn't]
```

Keep the report tight. The user wants the verdict, not a literature review.

## Do-not-touch rules

- Read papers in full. Do not shortcut to abstract-only analysis — the user explicitly pushed back on this.
- Do not invent class boundaries. If the cluster is incoherent, say so.
- Do not hedge. Pick one of the three verdicts.
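## Appendix — sketch of Steps 1-2 as a driver script

For orientation, the cluster-then-convert half of the pipeline (Steps 1 and 2) can be sketched as one small script. This is an illustrative sketch under assumptions, not part of the skill: it assumes `find_similar.py` prints one matching PDF path per line on stdout and that `pdftotext` is on `PATH`; the helper names `parse_cluster` and `txt_name` are invented here for illustration.

```python
"""Illustrative driver for Steps 1-2 (find the cluster, convert PDFs to text).

ASSUMPTIONS (not guaranteed by the skill): find_similar.py prints one
matching PDF path per line on stdout, and pdftotext is on PATH.
"""
import subprocess
import sys
from pathlib import Path

MIN_CLUSTER = 3  # Step 1 rule: fewer neighbors means no real cluster to define

# Path to the delegated helper from Step 1
HELPER = Path.home() / ".claude/skills/find-similar-papers/helpers/find_similar.py"


def parse_cluster(stdout: str) -> list[str]:
    """Turn helper stdout into a list of PDF paths (assumed one path per line)."""
    return [line.strip() for line in stdout.splitlines() if line.strip()]


def txt_name(idx: int, pdf: str) -> str:
    """Build the <idx>_<short_name>.txt filename used in Step 2."""
    short = Path(pdf).stem[:40].replace(" ", "_")
    return f"{idx}_{short}.txt"


def find_cluster(root: str, query_pdf: str,
                 top: int = 5, threshold: float = 0.15) -> list[str]:
    """Step 1: delegate to find-similar-papers and collect neighbor paths."""
    out = subprocess.run(
        [sys.executable, str(HELPER),
         "--root", root, "--query-path", query_pdf,
         "--top", str(top), "--threshold", str(threshold), "--exclude-self"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_cluster(out)


def extract_texts(pdfs: list[str], out_dir: str = "/tmp/cluster_analysis") -> None:
    """Step 2: run pdftotext -layout on every PDF for full-text reading."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for idx, pdf in enumerate(pdfs):
        target = Path(out_dir) / txt_name(idx, pdf)
        subprocess.run(["pdftotext", "-layout", pdf, str(target)], check=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    query = sys.argv[1]
    cluster = find_cluster(root=".", query_pdf=query)
    if len(cluster) < MIN_CLUSTER:
        sys.exit(f"Only {len(cluster)} neighbors cleared the threshold: "
                 "no coherent cluster, stop and report.")
    extract_texts(cluster + [query])  # include the query paper itself
```

The actual reading (Step 2's Read tool) and the judgment (Steps 3-5) remain with the agent; this sketch only mechanizes the two shell commands shown above.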