Free SKILL.md scraped from GitHub. Clone the repo or copy the file directly into your Claude Code skills directory.
npx versuz@latest install freedomintelligence-openclaw-medical-skills-skills-tcga-preprocessinggit clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills.gitcp OpenClaw-Medical-Skills/SKILL.MD ~/.claude/skills/freedomintelligence-openclaw-medical-skills-skills-tcga-preprocessing/SKILL.md---
name: tcga-bulk-data-preprocessing-with-omicverse
title: TCGA bulk data preprocessing with omicverse
description: Guide Claude through ingesting TCGA sample sheets, expression archives, and clinical carts into omicverse, initialising survival metadata, and exporting annotated AnnData files.
---
# TCGA bulk data preprocessing with omicverse
## Overview
Follow this skill to recreate the preprocessing routine from [`t_tcga.ipynb`](../../omicverse_guide/docs/Tutorials-bulk/t_tcga.ipynb). It automates loading TCGA downloads, generating raw/normalised matrices, initialising metadata, and running survival analyses through `ov.bulk.pyTCGA`.
## Instructions
1. **Gather required downloads**
- Confirm the user has:
- `gdc_sample_sheet.<date>.tsv` from the TCGA Sample Sheet export.
- The decompressed `gdc_download_xxxxx` directory containing expression archives.
- The `clinical.cart.<date>` directory with clinical XML/JSON files.
- Mention that sample data are available under [`omicverse_guide/docs/Tutorials-bulk/data/TCGA_OV/`](../../omicverse_guide/docs/Tutorials-bulk/data/TCGA_OV/).
2. **Initialise the TCGA helper**
- Import `omicverse as ov` (and `scanpy as sc` if plotting) then call `ov.plot_set()`.
- Instantiate `aml_tcga = ov.bulk.pyTCGA(sample_sheet_path, download_dir, clinical_dir)`.
- Run `aml_tcga.adata_init()` to build the AnnData object with raw counts, FPKM, and TPM layers.
3. **Persist the dataset**
- Encourage saving the initial AnnData: `aml_tcga.adata.write_h5ad('data/TCGA_OV/ov_tcga_raw.h5ad', compression='gzip')`.
- When reloading, reconstruct the class with the same paths and call `aml_tcga.adata_read(<path>)`.
4. **Initialise metadata and clinical information**
- Populate sample metadata using `aml_tcga.adata_meta_init()` to convert gene IDs to symbols and attach patient info.
- Add survival attributes via `aml_tcga.survial_init()` (note the intentional spelling in the API).
5. **Perform survival analyses**
- Plot gene-level survival curves with `aml_tcga.survival_analysis('GENE', layer='deseq_normalize', plot=True)`.
- To process all genes, call `aml_tcga.survial_analysis_all()`; warn that it may take time.
6. **Export results**
- Save enriched metadata to a new AnnData file (`aml_tcga.adata.write_h5ad('.../ov_tcga_survial_all.h5ad', compression='gzip')`).
- Suggest exporting summary tables (e.g., survival statistics) if users need to share outputs outside Python.
7. **Troubleshooting tips**
- Ensure TCGA archives are fully extracted; missing XML/TSV files trigger parsing errors.
- The helper expects matching case IDs between the sample sheet and expression files—direct users to re-download if IDs do not
align.
- Survival plots require clinical dates; if absent, instruct users to check the `clinical_cart` contents.
## Examples
- "Read my TCGA OV download, initialise metadata, and plot MYC survival curves using DESeq-normalised counts."
- "Reload a saved AnnData file, attach survival annotations, and export the updated `.h5ad`."
- "Run survival analysis for all genes and store the enriched dataset."
## References
- Tutorial notebook: [`t_tcga.ipynb`](../../omicverse_guide/docs/Tutorials-bulk/t_tcga.ipynb)
- Sample dataset: [`data/TCGA_OV/`](../../omicverse_guide/docs/Tutorials-bulk/data/TCGA_OV/)
- Quick copy/paste commands: [`reference.md`](reference.md)