Free SKILL.md scraped from GitHub. Clone the repo or copy the file directly into your Claude Code skills directory.
npx versuz@latest install willoscar-research-units-pipeline-skills-codex-skills-dedupe-rankgit clone https://github.com/WILLOSCAR/research-units-pipeline-skills.gitcp research-units-pipeline-skills/SKILL.MD ~/.claude/skills/willoscar-research-units-pipeline-skills-codex-skills-dedupe-rank/SKILL.md--- name: dedupe-rank description: | Use when a broad paper candidate pool needs deterministic deduplication and a stable core set. **Trigger**: dedupe, rank, core set, 去重, 排序, 精选论文, 核心集合. **Use when**: 检索后需要把广覆盖集合收敛成可管理的 core set(用于 taxonomy/outline/mapping)。 **Skip if**: 已经有人手工整理了稳定的 `papers/core_set.csv`(无需再次 churn)。 **Network**: none. **Guardrail**: 偏 deterministic;输出应可重复(稳定 paper_id、字段规范)。 --- # Dedupe + Rank Turns a raw candidate pool into a deduped pool and a stable core set. ## Input - `papers/papers_raw.jsonl` ## Outputs - `papers/papers_dedup.jsonl` - `papers/core_set.csv` ## Script boundary `scripts/run.py` should own only: - title/year deduplication - deterministic ranking - stable `paper_id` generation Use shared domain packs or pipeline contract metadata for topic-specific or product-specific behavior. ## Contract-driven behavior The script should prefer pipeline contract metadata over profile-name branching. Current important field: - `quality_contract.candidate_pool_policy.keep_full_deduped_pool` If true, the script keeps the full deduped pool in `papers/core_set.csv` unless the user explicitly overrides core size. ## Acceptance - deduped JSONL exists - core-set CSV exists - reruns are stable for the same inputs ## Non-goals - retrieval - screening - manual topic authoring inside the script