Free SKILL.md scraped from GitHub. Clone the repo or copy the file directly into your Claude Code skills directory.
- `npx versuz@latest install vivekkarmarkar-claude-code-os-skills-find-paper-by-title`
- `git clone https://github.com/VivekKarmarkar/claude-code-os.git`
- `cp claude-code-os/SKILL.MD ~/.claude/skills/vivekkarmarkar-claude-code-os-skills-find-paper-by-title/SKILL.md`

---
name: find-paper-by-title
description: Search a directory tree for PDF(s) whose title matches a query string. Uses PDF metadata first (fast), falls back to filename and first-page text if needed. Groups byte-identical duplicates via MD5 so you see which copies are the same file vs different versions. Use when the user asks "do I have a paper called X?" or "where is the paper with title Y in my folder?" or wants to dedupe papers across subfolders.
---

# find-paper-by-title

Fast PDF title search across a directory tree. Returns all matches, grouped by content hash so duplicates are visible.

## Arguments

- `<title>`: paper title (may be truncated or paraphrased)
- `--root <dir>` (optional, default: current directory): where to search
- `--threshold <0-1>` (optional, default: 0.5): minimum match score
- `--deep` (optional): also extract first-page text for PDFs whose metadata is missing/junky (slower)

## Pipeline

### Step 0: Determine the search root

If the user named a directory, use it. Otherwise default to `$PWD`. For this project, the common search root is the PAT-Scan repo.

### Step 1: Run the search

```bash
python3 ~/.claude/skills/find-paper-by-title/helpers/search_paper.py \
  "<title>" \
  --root "<dir>" \
  [--threshold 0.5] \
  [--deep]
```

The helper:

1. Walks the tree collecting all `*.pdf` files.
2. For each PDF, extracts candidate titles from PDF metadata (`pdfinfo`) and the normalized filename.
3. Scores each candidate against the query using token overlap + substring match.
4. Keeps matches above the threshold.
5. Computes the MD5 of each match and groups duplicates.
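Steps 3 to 5 of the helper can be sketched roughly as below. This is a minimal illustration, not the source of `search_paper.py`: the function names (`tokens`, `score`, `md5_of`, `group_by_md5`) and the exact scoring rule (fraction of query tokens found in the candidate, promoted to 1.0 on a whole-phrase substring hit) are assumptions chosen to match the description above.

```python
import hashlib
import re
from collections import defaultdict
from pathlib import Path


def tokens(text: str) -> list[str]:
    """Lowercase the text and split it on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query: str, candidate: str) -> float:
    """Hypothetical scoring rule: token overlap, with a substring shortcut."""
    q, c = tokens(query), tokens(candidate)
    if not q or not c:
        return 0.0
    # Substring match on normalized text, padded so only whole tokens match.
    if f" {' '.join(q)} " in f" {' '.join(c)} ":
        return 1.0
    # Otherwise: fraction of distinct query tokens present in the candidate.
    q_set = set(q)
    return len(q_set & set(c)) / len(q_set)


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks, opened read-only, so large PDFs stay cheap."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_md5(paths: list[Path]) -> dict[str, list[Path]]:
    """Map each content hash to every path holding that exact content."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in paths:
        groups[md5_of(path)].append(path)
    return dict(groups)
```

With this rule, a truncated query whose words appear in order inside a longer title scores 1.0, and a group with more than one path corresponds to what the report marks as `[DUPLICATE]`.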
### Step 2: Report

The helper prints human-readable output by default:

- Number of matches and distinct files
- For each distinct file: title, score, MD5, and all paths where that content lives
- `[DUPLICATE]` tag if the same file appears in multiple locations, `[UNIQUE]` otherwise

If the user wants machine-readable output, pass `--json`.

## Examples

**Exact title, no duplicates:**

```
python3 search_paper.py "A novel tactile tomography system based on mechanical principles for internal 3D imaging" --root /path/to/PAT-Scan
```

**Truncated title (OK because normalized token match handles it):**

```
python3 search_paper.py "JAX-SSO differentiable finite element analysis solver" --root /path/to/PAT-Scan
```

**Paper stored under a filename that doesn't mention the title (fallback to `--deep`):**

```
python3 search_paper.py "<paraphrased title>" --root /path/to/papers --deep
```

## When to use `--deep`

Default mode relies on PDF metadata + filename. It's fast (~5 seconds for 250 PDFs). Use `--deep` only when:

- A paper is known to exist but the default search can't find it
- Metadata-less scanned PDFs or bare arXiv dumps are in play
- The user explicitly wants exhaustive search

`--deep` adds first-page text extraction, which is still tolerable (~30s for 250 PDFs) but not needed most of the time.

## Do-not-touch rules

- Never modify or move any PDF. This skill is read-only.
- Filename-based "normalization" for matching is internal; the actual filename on disk is not touched.
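To illustrate the second rule, here is a minimal sketch of filename normalization that works only on the path string. The function name `normalized_title_from_filename` and the choice of separators are assumptions for illustration; the helper's real normalization may differ.

```python
import re
from pathlib import Path


def normalized_title_from_filename(path: Path) -> str:
    """Build a comparison string from the filename alone.

    The file is never opened, renamed, or moved; only the name is read.
    """
    stem = path.stem                          # drop the directory and ".pdf"
    stem = re.sub(r"[_\-.+]+", " ", stem)     # treat common separators as spaces
    stem = re.sub(r"\s+", " ", stem)          # collapse repeated whitespace
    return stem.strip().lower()


print(normalized_title_from_filename(
    Path("/papers/JAX-SSO_differentiable.finite--element.pdf")
))
# prints "jax sso differentiable finite element"
```

Because the function returns a new string and performs no filesystem calls, the on-disk name stays exactly as it was.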