---
name: scrna-orchestrator
description: Local Scanpy pipeline for single-cell RNA-seq QC, clustering, marker discovery, and optional two-group differential expression from raw-count .h5ad.
version: 0.1.0
author: Yonghao Zhao
license: MIT
tags: [scrna, single-cell, scanpy, clustering, differential-expression]
metadata:
  openclaw:
    requires:
      bins:
        - python3
      env: []
      config: []
    always: false
    emoji: "🦖"
    homepage: https://github.com/ClawBio/ClawBio
    os: [macos, linux]
    install:
      - kind: uv
        package: scanpy
        bins: []
      - kind: uv
        package: anndata
        bins: []
    trigger_keywords:
      - scrna
      - single-cell
      - scanpy
      - h5ad
      - leiden
      - marker genes
      - differential expression
---
# 🦖 scRNA Orchestrator
You are **scRNA Orchestrator**, a specialised ClawBio agent for local single-cell RNA-seq analysis with Scanpy.
## Why This Exists
Single-cell workflows are easy to misconfigure and hard to reproduce when run ad hoc.
- **Without it**: Users manually stitch QC, normalization, clustering, and marker/DE steps with inconsistent defaults.
- **With it**: One command produces a consistent `report.md`, figures, tables, and reproducibility bundle.
- **Why ClawBio**: The workflow is local-first, explicit about assumptions (raw counts), and ships machine-readable outputs.
## Core Capabilities
1. **QC and Filtering**: Mitochondrial percentage filtering plus minimum genes-per-cell and cells-per-gene thresholds.
2. **Preprocessing**: Library-size normalization, `log1p`, and HVG selection.
3. **Embedding and Clustering**: PCA, neighbors graph, UMAP, Leiden clustering.
4. **Cluster Markers**: Wilcoxon cluster-vs-rest marker detection.
5. **Optional Group DE (v1)**: Two-group Wilcoxon DE on any `obs` column.
6. **Optional Volcano Plot**: Generate a DE volcano plot with `--de-volcano`.
7. **Reporting**: Markdown report, CSV/TSV tables, PNG figures, reproducibility files.
## Input Formats
| Format | Extension | Required Fields | Example |
|--------|-----------|-----------------|---------|
| AnnData raw counts | `.h5ad` | Raw count matrix in `X`; cell metadata in `obs`; gene metadata in `var` | `pbmc_raw.h5ad` |
| Demo mode | n/a | none | `python clawbio.py run scrna --demo` |
Notes:
- Processed/normalized/scaled `.h5ad` inputs are rejected with an actionable error.
- `pbmc3k_processed`-style inputs are out of scope for this skill.
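The exact rule the skill uses to detect processed matrices is not documented here. One common heuristic, shown below as a minimal sketch, treats a matrix as raw counts only when every stored value is a finite, non-negative integer. The helper name `looks_like_raw_counts` and the tolerance are hypothetical.

```python
import math
from typing import Iterable


def looks_like_raw_counts(values: Iterable[float], tol: float = 1e-6) -> bool:
    """Heuristic: raw counts are finite, non-negative, and integer-valued.

    Hypothetical helper, not the skill's actual check. Pass the stored
    values of the matrix, e.g. `adata.X.data` for a SciPy sparse matrix
    or `adata.X.ravel()` for a dense array.
    """
    for v in values:
        v = float(v)
        if not math.isfinite(v) or v < 0:
            return False  # NaN/inf or negative values suggest scaling or corruption
        if abs(v - round(v)) > tol:
            return False  # fractional values suggest normalization or log1p
    return True


print(looks_like_raw_counts([0, 3, 1, 12]))        # True
print(looks_like_raw_counts([0.0, 1.386, 2.197]))  # False
```

A check like this is cheap to run before any filtering, which is why it fits the validation step rather than the processing step.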
## Workflow
When the user asks for scRNA QC/clustering/markers/DE:
1. **Validate**: Check `.h5ad` input (or `--demo`), and reject processed-like matrices.
2. **Process**: Run QC filtering, normalization, HVG selection, PCA, neighbors, UMAP, and Leiden.
3. **Analyze**:
   - Always run cluster marker analysis (`leiden`, Wilcoxon).
   - Run DE only when `--de-groupby`, `--de-group1`, and `--de-group2` are all provided.
4. **Generate**: Write `report.md`, `result.json`, tables, figures, and reproducibility bundle.
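The Process and Analyze steps map closely onto standard Scanpy calls. The sketch below is one way to chain them, using the parameters given under Algorithm / Methodology (`target_sum=1e4`, `flavor="seurat"`, `max_value=10`, Wilcoxon markers). The function name, the threshold defaults (200 genes, 3 cells, 20% mitochondrial), and the `MT-` gene prefix (human gene symbols) are assumptions, not the skill's documented values.

```python
def run_scrna_pipeline(adata, min_genes=200, min_cells=3, max_mt_pct=20.0):
    """Sketch of QC -> preprocessing -> clustering -> markers on raw counts.

    Threshold defaults are illustrative assumptions, not the skill's own.
    """
    # Imported here so this module loads even where scanpy is not installed.
    import scanpy as sc

    # QC: flag mitochondrial genes (assumes human "MT-" symbols) and compute metrics.
    adata.var["mt"] = adata.var_names.str.startswith("MT-")
    sc.pp.calculate_qc_metrics(
        adata, qc_vars=["mt"], percent_top=None, log1p=False, inplace=True
    )
    sc.pp.filter_cells(adata, min_genes=min_genes)
    sc.pp.filter_genes(adata, min_cells=min_cells)
    adata = adata[adata.obs["pct_counts_mt"] < max_mt_pct].copy()

    # Preprocess: library-size normalization, log1p, HVG selection.
    sc.pp.normalize_total(adata, target_sum=1e4)
    sc.pp.log1p(adata)
    sc.pp.highly_variable_genes(adata, flavor="seurat")
    adata.raw = adata  # keep log-normalized values for all genes
    adata = adata[:, adata.var["highly_variable"]].copy()

    # Embed and cluster.
    sc.pp.scale(adata, max_value=10)
    sc.tl.pca(adata)
    sc.pp.neighbors(adata)
    sc.tl.umap(adata)
    sc.tl.leiden(adata)

    # Markers: each Leiden cluster against the rest.
    sc.tl.rank_genes_groups(adata, groupby="leiden", method="wilcoxon", pts=True)
    return adata
```

Under these assumptions, `run_scrna_pipeline(sc.read_h5ad("pbmc_raw.h5ad"))` would return an AnnData object with `leiden` labels in `obs` and marker statistics in `uns["rank_genes_groups"]`.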
## CLI Reference
```bash
# Standard usage
python skills/scrna-orchestrator/scrna_orchestrator.py \
  --input <input.h5ad> --output <report_dir>

# Demo mode
python skills/scrna-orchestrator/scrna_orchestrator.py \
  --demo --output <report_dir>

# Optional two-group DE
python skills/scrna-orchestrator/scrna_orchestrator.py \
  --input <input.h5ad> --output <report_dir> \
  --de-groupby <obs_column> --de-group1 <group_a> --de-group2 <group_b>

# Optional DE volcano plot
python skills/scrna-orchestrator/scrna_orchestrator.py \
  --input <input.h5ad> --output <report_dir> \
  --de-groupby <obs_column> --de-group1 <group_a> --de-group2 <group_b> \
  --de-volcano

# Via ClawBio runner
python clawbio.py run scrna --input <input.h5ad> --output <report_dir>
python clawbio.py run scrna --demo
```
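The three `--de-*` options only make sense together, so a parser can enforce that up front. The following is a minimal `argparse` sketch of the flags listed above, not the script's actual parser. Treating `--input` and `--demo` as mutually exclusive, rejecting a partial set of DE options, and requiring DE for `--de-volcano` are all assumptions.

```python
import argparse
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="scrna_orchestrator.py")

    # Assumption: exactly one of --input / --demo selects the data source.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--input", help="raw-count .h5ad file")
    source.add_argument("--demo", action="store_true", help="use demo data")

    parser.add_argument("--output", help="report directory")
    parser.add_argument("--de-groupby", help="obs column holding the groups")
    parser.add_argument("--de-group1", help="group of interest")
    parser.add_argument("--de-group2", help="reference group")
    parser.add_argument("--de-volcano", action="store_true")
    args = parser.parse_args(argv)

    # DE runs only when all three options are present.
    de_flags = [args.de_groupby, args.de_group1, args.de_group2]
    args.run_de = all(flag is not None for flag in de_flags)

    # Assumption: a partial set is treated as a user error, not silently ignored.
    if any(flag is not None for flag in de_flags) and not args.run_de:
        parser.error("--de-groupby, --de-group1 and --de-group2 must be given together")
    if args.de_volcano and not args.run_de:
        parser.error("--de-volcano requires the three --de-* options")
    return args
```

Failing fast on a partial set avoids a run that silently skips DE when the user mistyped one of the options.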
## Demo
```bash
python clawbio.py run scrna --demo
```
Expected output:
- `report.md` with QC, clustering, and marker summaries
- figure files (`qc_violin.png`, `umap_leiden.png`, `marker_dotplot.png`)
- optional DE figure (`de_volcano.png`) when `--de-volcano` is set
- marker tables and reproducibility bundle
## Algorithm / Methodology
1. **QC**:
   - Compute QC metrics (`n_genes_by_counts`, `total_counts`, `pct_counts_mt`)
   - Filter by `min_genes`, `min_cells`, `max_mt_pct`
2. **Preprocess**:
   - Normalize total counts to `1e4`
   - Apply `log1p`
   - Select HVGs (`flavor="seurat"`)
3. **Embed and cluster**:
   - Scale (`max_value=10`)
   - PCA, neighbors graph, UMAP
   - Leiden clustering
4. **Markers**:
   - `scanpy.tl.rank_genes_groups(groupby="leiden", method="wilcoxon", pts=True)`
5. **Optional DE v1**:
   - `scanpy.tl.rank_genes_groups(groupby=<de_groupby>, groups=[group1], reference=group2, method="wilcoxon", pts=True)`
   - Export full statistics and top genes by score
6. **Optional volcano plot**:
   - Plot `logfoldchanges` vs `-log10(pvals_adj)` (fallback to `pvals` if needed)
   - Highlight genes with `p < 0.05` and `|log2FC| >= 1`
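The volcano rule in step 6 can be written per gene as a small function. This is a sketch only: the name `volcano_point` is hypothetical, the fallback is assumed to trigger when the adjusted p-value is missing or NaN, the highlight test is assumed to use the same p-value that is plotted, and the `1e-300` floor is an added guard against `log10(0)`.

```python
import math
from typing import Optional, Tuple


def volcano_point(
    logfc: float,
    p_adj: Optional[float],
    p_raw: float,
    p_cutoff: float = 0.05,
    lfc_cutoff: float = 1.0,
) -> Tuple[float, float, bool]:
    """Return (x, y, highlighted) for one gene on the volcano plot."""
    # Fall back to the raw p-value when the adjusted one is missing or NaN.
    p = p_raw if p_adj is None or math.isnan(p_adj) else p_adj
    # Floor p so -log10 stays finite when p underflows to 0 (assumed choice).
    y = -math.log10(max(p, 1e-300))
    highlighted = p < p_cutoff and abs(logfc) >= lfc_cutoff
    return logfc, y, highlighted


print(volcano_point(2.0, 0.001, 0.0001)[2])  # True: significant and large effect
print(volcano_point(0.4, 0.001, 0.0001)[2])  # False: effect size below cutoff
```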
## Example Queries
- "Run standard QC and clustering on my h5ad file"
- "Find marker genes for each cluster"
- "Generate a UMAP coloured by cluster"
- "Run differential expression for treated vs control"
## Output Structure
```text
output_directory/
├── report.md
├── result.json
├── figures/
│   ├── qc_violin.png
│   ├── umap_leiden.png
│   ├── marker_dotplot.png
│   └── de_volcano.png      # only when DE volcano is enabled
├── tables/
│   ├── cluster_summary.csv
│   ├── markers_top.csv
│   ├── markers_top.tsv
│   ├── de_full.csv         # only when DE is enabled
│   └── de_top.csv          # only when DE is enabled
└── reproducibility/
    ├── commands.sh
    ├── environment.yml
    └── checksums.sha256
```
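As an illustration of how a `checksums.sha256` manifest could be produced, the sketch below hashes every file under the output directory and writes lines in the `<digest>  <relative path>` layout that `sha256sum -c` accepts. Which files the skill actually hashes, and the exact manifest layout, are assumptions. `write_checksums` is a hypothetical helper.

```python
import hashlib
from pathlib import Path


def write_checksums(
    output_dir: str, manifest: str = "reproducibility/checksums.sha256"
) -> Path:
    """Write a SHA-256 manifest for all files under output_dir (sketch)."""
    root = Path(output_dir)
    manifest_path = root / manifest
    manifest_path.parent.mkdir(parents=True, exist_ok=True)

    lines = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        if path == manifest_path:
            continue  # the manifest cannot contain its own hash
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        # Two spaces between digest and path, as sha256sum expects.
        lines.append(f"{digest}  {path.relative_to(root).as_posix()}")

    manifest_path.write_text("\n".join(lines) + "\n")
    return manifest_path
```

With a manifest in this layout, running `sha256sum -c reproducibility/checksums.sha256` from the output directory would verify that no report file changed after the run.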
## Dependencies
**Required**:
- `scanpy` >= 1.10
- `anndata` >= 0.10
- `numpy`, `pandas`, `matplotlib`, `leidenalg`, `python-igraph`
**Optional (future)**:
- `celltypist` (cell-type annotation)
- `scvi-tools` (deep generative modeling)
## Safety
- **Local-first**: No patient data upload.
- **Disclaimer**: Reports include the ClawBio medical disclaimer.
- **Input guardrails**: Rejects processed-like matrices to reduce invalid biological inferences.
- **Reproducibility**: Writes command/environment/checksum bundle.
## Integration with Bio Orchestrator
**Trigger conditions**:
- File extension `.h5ad`
- User intent includes scRNA terms (single-cell, Scanpy, clustering, marker genes, DE)
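How the Bio Orchestrator evaluates these conditions is not specified here. A minimal sketch, assuming a case-insensitive substring match against the `trigger_keywords` from the frontmatter, could look like this. `should_route_to_scrna` is a hypothetical name.

```python
from pathlib import Path
from typing import Optional

# Keywords copied from the skill's frontmatter.
TRIGGER_KEYWORDS = (
    "scrna",
    "single-cell",
    "scanpy",
    "h5ad",
    "leiden",
    "marker genes",
    "differential expression",
)


def should_route_to_scrna(query: str, file_path: Optional[str] = None) -> bool:
    """Return True when the file extension or the query text matches (sketch)."""
    if file_path and Path(file_path).suffix.lower() == ".h5ad":
        return True
    text = query.lower()
    return any(keyword in text for keyword in TRIGGER_KEYWORDS)


print(should_route_to_scrna("Find marker genes for each cluster"))         # True
print(should_route_to_scrna("Align reads to the genome", "sample.fastq"))  # False
```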
**Current limitations**:
- Raw-count `.h5ad` only
- Seurat input/output is not implemented in the Python path
- Multi-group pairwise DE, within-cluster DE, and automated annotation are future work
## Citations
- [Scanpy documentation](https://scanpy.readthedocs.io/) — analysis API and methods.
- [AnnData documentation](https://anndata.readthedocs.io/) — data model.
- [Leiden algorithm paper](https://www.nature.com/articles/s41598-019-41695-z) — community detection.