Free SKILL.md scraped from GitHub. Clone the repo or copy the file directly into your Claude Code skills directory.
npx versuz@latest install freedomintelligence-openclaw-medical-skills-skills-bulk-deg-analysisgit clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills.gitcp OpenClaw-Medical-Skills/SKILL.MD ~/.claude/skills/freedomintelligence-openclaw-medical-skills-skills-bulk-deg-analysis/SKILL.md---
name: bulk-rna-seq-differential-expression-with-omicverse
title: Bulk RNA-seq differential expression with omicverse
description: Guide Claude through omicverse's bulk RNA-seq DEG pipeline, from gene ID mapping and DESeq2 normalization to statistical testing, visualization, and pathway enrichment. Use when a user has bulk count matrices and needs differential expression analysis in omicverse.
---
# Bulk RNA-seq differential expression with omicverse
## Overview
Follow this skill to run the end-to-end differential expression (DEG) workflow showcased in [`t_deg.ipynb`](../../omicverse_guide/docs/Tutorials-bulk/t_deg.ipynb). It assumes the user provides a raw gene-level count matrix (e.g., from featureCounts) and wants to analyse bulk RNA-seq cohorts inside omicverse.
## Instructions
1. **Set up the session**
- Import `omicverse as ov`, `scanpy as sc`, and `matplotlib.pyplot as plt`.
- Call `ov.plot_set()` so downstream plots adopt omicverse styling.
2. **Prepare ID mapping assets**
- When gene IDs must be converted to gene symbols, instruct the user to download mapping pairs via `ov.utils.download_geneid_annotation_pair()` and store them under `genesets/`.
- Mention the available prebuilt genomes (T2T-CHM13, GRCh38, GRCh37, GRCm39, danRer7, danRer11) and that users can generate their own mapping from GTF files if needed.
3. **Load the raw counts**
- Read tab-delimited featureCounts output with `ov.pd.read_csv(..., sep='\t', header=1, index_col=0)`.
- Strip trailing `.bam` segments from column names using list comprehension so sample IDs are clean.
4. **Map gene identifiers**
- Run `ov.bulk.Matrix_ID_mapping(counts_df, 'genesets/pair_<GENOME>.tsv')` to replace `gene_id` entries with gene symbols.
5. **Initialise the DEG object**
- Create `dds = ov.bulk.pyDEG(mapped_counts)`.
- Handle duplicate gene symbols with `dds.drop_duplicates_index()` to keep the highest expressed version.
6. **Normalise and estimate size factors**
- Execute `dds.normalize()` to calculate DESeq2 size factors, correcting for library size and batch differences.
7. **Run differential testing**
- Collect treatment and control replicate labels into lists.
- Call `dds.deg_analysis(treatment_groups, control_groups, method='ttest')` for the default Welch t-test.
- Offer optional alternatives: `method='edgepy'` for edgeR-like tests and `method='limma'` for limma-style modelling.
8. **Filter and threshold results**
- Note that lowly expressed genes are retained by default; filter using `dds.result.loc[dds.result['log2(BaseMean)'] > 1]` when needed.
- Set dynamic fold-change and significance cutoffs via `dds.foldchange_set(fc_threshold=-1, pval_threshold=0.05, logp_max=6)` (`fc_threshold=-1` auto-selects based on log2FC distribution).
9. **Visualise differential expression**
- Produce volcano plots with `dds.plot_volcano(title=..., figsize=..., plot_genes=... or plot_genes_num=...)` to highlight key genes.
- Generate per-gene boxplots using `dds.plot_boxplot(genes=[...], treatment_groups=..., control_groups=..., figsize=..., legend_bbox=...)`; adjust y-axis tick labels if required.
10. **Perform pathway enrichment (optional)**
- Download curated pathway libraries through `ov.utils.download_pathway_database()`.
- Load genesets with `ov.utils.geneset_prepare(<path>, organism='Mouse'|'Human'|...)`.
- Build the DEG gene list from `dds.result.loc[dds.result['sig'] != 'normal'].index`.
- Run enrichment with `ov.bulk.geneset_enrichment(gene_list=deg_genes, pathways_dict=..., pvalue_type='auto', organism=...)`. Encourage users without internet access to provide a `background` gene list.
- Visualise single-library results via `ov.bulk.geneset_plot(...)` and combine multiple ontologies using `ov.bulk.geneset_plot_multi(enr_dict, colors_dict, num=...)`.
11. **Document outputs**
- Suggest exporting `dds.result` and enrichment tables to CSV for downstream reporting.
- Encourage users to save figures generated by matplotlib (`plt.savefig(...)`) when running outside notebooks.
12. **Troubleshooting tips**
- Ensure sample labels in `treatment_groups`/`control_groups` exactly match column names post-cleanup.
- Verify required packages (`omicverse`, `pyComplexHeatmap`, `gseapy`) are installed for enrichment visualisations.
- Remind users that internet access is required the first time they download gene mappings or pathway databases.
## Examples
- "I have a featureCounts matrix for mouse tumour samples—normalize it with DESeq2, run t-test DEG, and highlight the top 8 genes in a volcano plot."
- "Use omicverse to compute edgeR-style differential expression between treated and control replicates, then run GO enrichment on significant genes."
- "Guide me through converting Ensembl IDs to symbols, performing limma DEG, and plotting boxplots for Krtap9-5 and Lef1."
## References
- Detailed walkthrough notebook: [`t_deg.ipynb`](../../omicverse_guide/docs/Tutorials-bulk/t_deg.ipynb)
- Sample count matrix for testing: [`sample/counts.txt`](../../sample/counts.txt)
- Quick copy/paste commands: [`reference.md`](reference.md)