Free SKILL.md scraped from GitHub. Clone the repo or copy the file directly into your Claude Code skills directory.

- `npx versuz@latest install freedomintelligence-openclaw-medical-skills-skills-bio-research-tools-biomarker-signature-studio`
- `git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills.git`
- `cp OpenClaw-Medical-Skills/SKILL.MD ~/.claude/skills/freedomintelligence-openclaw-medical-skills-skills-bio-research-tools-biomarker-signature-studio/SKILL.md`

<!--
# COPYRIGHT NOTICE
# This file is part of the "Universal Biomedical Skills" project.
# Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu>
# All Rights Reserved.
#
# This code is proprietary and confidential.
# Unauthorized copying of this file, via any medium is strictly prohibited.
#
# Provenance: Authenticated by MD BABU MIA
-->

---
name: bio-research-tools-biomarker-signature-studio
description: Multi-omic biomarker discovery studio that ingests expression + metadata, performs QC, multi-strategy feature selection, nested CV model training, survival analysis hooks, and SHAP-based interpretation. Use to design translational biomarker panels with documented evidence.
tool_type: python
primary_tool: scikit-learn
depends_on:
  - machine-learning/biomarker-discovery
  - machine-learning/model-validation
  - machine-learning/omics-classifiers
  - differential-expression/de-results
  - workflow-management/biomarker-pipeline
measurable_outcome: Run biomarker_signature_studio.py end-to-end on provided data within 20 minutes and produce metrics + feature rankings JSON artifacts.
allowed-tools:
  - read_file
  - run_shell_command
---

# Biomarker Signature Studio

Design validated biomarker panels that are explainable, stable, and ready for translational follow-up. This skill stitches together the existing biomarker pipeline tooling, adds configurable feature-selection ensembles, a small survival-analysis hook, and artifact export so downstream lab teams can review QC outputs.

## What This Skill Does

1. **QC + Harmonization:** Align expression matrices (samples x features) with metadata, check label balance, and compute summary stats.
2. **Feature Selection Ensemble:** Supports Boruta, elastic-net stability, mutual-information top-K, and mRMR with optional intersection voting.
3. **Model Factory:** Trains multiple estimators (Logistic L1, RandomForest, XGBoost if present) under nested CV, picks champion by AUC.
4. **Explainability + Export:** Produces SHAP tables/plots when packages are available, exports feature rankings and model weights.
5. **Survival Hook:** If metadata contains `time_to_event` and `event` the skill computes concordance for selected features via Cox model.

All logic lives in `scripts/biomarker_signature_studio.py`.

## Inputs

- Expression matrix (`--expression`): CSV/TSV genes x samples or samples x genes (auto-detected by metadata match).
- Metadata (`--metadata`): Must contain `--label-column`. Optional `--id-column` (default `sample_id`), `time_to_event`, `event`.
- Optional gene list for filtering (`--feature-list`).
- Output directory (`--output-dir`), created if missing.

## Quick CLI Usage

```bash
python Skills/Research_Tools/Biomarker_Signature_Studio/scripts/biomarker_signature_studio.py \
  --expression data/expression.csv \
  --metadata data/metadata.csv \
  --label-column phenotype \
  --selectors boruta,lasso,mrmr \
  --models rf,logit \
  --output-dir outputs/biomarkers_run1
```

Key flags:

| Flag | Description |
|------|-------------|
| `--selectors` | Comma list of selection strategies (`boruta`, `lasso`, `mrmr`, `mi_topk`). |
| `--models` | Models to evaluate (`logit`, `rf`, `xgb`). |
| `--k-features` | Target number of features for `mrmr`/`mi_topk`. |
| `--survival` | Enable Cox evaluation when survival columns exist. |
| `--random-state` | Reproducibility. |
| `--nested-folds` | Outer CV folds (default 5). |

## Workflow

1. Load + align inputs, infer orientation, impute missing values.
2. Standardize features (fit on train set only).
3. Run requested selectors; create intersection + union candidate lists.
4. For each selector output run nested CV training across requested models.
5. Export champion metrics (`metrics.json`), feature table (`selected_features.csv`), SHAP summary (`shap_summary.csv` when available), and survival stats (`survival.json`).

## QC Expectations

- Class count ratio ≤3:1; warnings logged otherwise.
- Selected features between 5 and 250 unless user overrides.
- Nested CV AUC ≥0.70 or flagged in report.
- SHAP overlap with selected features ≥60% (reported).

## Related Assets

- `examples/configs/biomarker_studio_template.yaml` (scaffold for teams)
- `scripts/biomarker_signature_studio.py` (entry point)
- Existing biomarker workflow skill for orchestrated runs.

Use this skill whenever you need a ready-to-review biomarker dossier (data QC, model metrics, explainability artifacts) before moving to validation cohorts or lab assays.

<!-- AUTHOR_SIGNATURE: 9a7f3c2e-MD-BABU-MIA-2026-MSSM-SECURE -->
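
The thresholds listed under QC Expectations can be checked mechanically once a run finishes. The following is a minimal sketch and does not reproduce the logic in `scripts/biomarker_signature_studio.py`: the function `qc_warnings`, its arguments, and the definition of SHAP overlap (share of selected features that also appear in a SHAP top-ranked list) are assumptions made for illustration.

```python
from collections import Counter

# Thresholds copied from the QC Expectations list. All names below are
# hypothetical and are not taken from biomarker_signature_studio.py.
MAX_CLASS_RATIO = 3.0
MIN_FEATURES, MAX_FEATURES = 5, 250
MIN_AUC = 0.70
MIN_SHAP_OVERLAP = 0.60


def qc_warnings(labels, selected_features, nested_cv_auc, shap_top_features=None):
    """Return a list of warning strings, one per violated QC expectation."""
    warnings = []

    # Class balance: largest class count divided by smallest class count.
    counts = Counter(labels)
    if len(counts) < 2:
        # Extra guard (not in the source list): a ratio needs two classes.
        warnings.append("fewer than two classes present")
    else:
        ratio = max(counts.values()) / min(counts.values())
        if ratio > MAX_CLASS_RATIO:
            warnings.append(f"class ratio {ratio:.2f}:1 exceeds 3:1")

    # Panel size.
    n_selected = len(set(selected_features))
    if not MIN_FEATURES <= n_selected <= MAX_FEATURES:
        warnings.append(f"{n_selected} selected features outside 5-250")

    # Discrimination.
    if nested_cv_auc < MIN_AUC:
        warnings.append(f"nested CV AUC {nested_cv_auc:.2f} below 0.70")

    # SHAP overlap (assumed definition, see the note above the block).
    if shap_top_features is not None and n_selected > 0:
        shared = set(shap_top_features) & set(selected_features)
        overlap = len(shared) / n_selected
        if overlap < MIN_SHAP_OVERLAP:
            warnings.append(f"SHAP overlap {overlap:.0%} below 60%")

    return warnings
```

A caller could log each returned string and still write the report, which matches the "warnings logged" and "flagged in report" wording above rather than aborting the run.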
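
Workflow step 3 builds intersection and union candidate lists from the selector outputs. One way this could be done is sketched below with plain Python sets. The function `combine_selectors` and its `min_votes` parameter are hypothetical; the source only states that intersection voting is optional and does not describe how votes are counted.

```python
from collections import Counter


def combine_selectors(selections, min_votes=None):
    """Combine per-selector feature lists into candidate lists.

    selections: dict mapping a selector name (e.g. "boruta") to an
                iterable of feature names it selected.
    min_votes:  hypothetical threshold; a feature is kept in "voted" when
                at least this many selectors chose it. Defaults to all
                selectors, which makes "voted" equal the intersection.
    """
    if not selections:
        return {"union": [], "intersection": [], "voted": []}

    votes = Counter()
    for features in selections.values():
        votes.update(set(features))  # one vote per selector per feature

    n_selectors = len(selections)
    threshold = n_selectors if min_votes is None else min_votes

    return {
        "union": sorted(votes),
        "intersection": sorted(f for f, v in votes.items() if v == n_selectors),
        "voted": sorted(f for f, v in votes.items() if v >= threshold),
    }


# Illustrative gene names only; these are not results from any dataset.
example = combine_selectors(
    {
        "boruta": ["TP53", "BRCA1", "EGFR"],
        "lasso": ["TP53", "EGFR", "MYC"],
        "mrmr": ["TP53", "KRAS", "MYC"],
    },
    min_votes=2,
)
print(example["voted"])
```

Sorting the outputs keeps the candidate lists deterministic across runs, which helps when comparing exported feature tables.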
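
The Inputs section says the matrix orientation is auto-detected by metadata match. A minimal sketch of that idea, assuming detection simply compares how many metadata sample IDs appear among the row labels versus the column labels, is given below. The function name `infer_orientation` and its return strings are hypothetical, and the actual script may use a different rule.

```python
def infer_orientation(row_labels, column_labels, sample_ids):
    """Guess which axis of an expression matrix holds the samples.

    Returns "samples_x_features" when more sample IDs match the row labels,
    "features_x_samples" when more match the column labels, and raises
    ValueError when the two counts tie (including when nothing matches).
    """
    ids = set(sample_ids)
    row_hits = len(ids & set(row_labels))
    col_hits = len(ids & set(column_labels))

    if row_hits == col_hits:
        raise ValueError(
            f"cannot infer orientation: {row_hits} row matches, "
            f"{col_hits} column matches"
        )
    return "samples_x_features" if row_hits > col_hits else "features_x_samples"


# Illustrative labels only.
orientation = infer_orientation(
    row_labels=["TP53", "EGFR"],
    column_labels=["S1", "S2", "S3"],
    sample_ids=["S1", "S2", "S3"],
)
print(orientation)
```

Raising on a tie rather than guessing surfaces ID mismatches early, for example when metadata IDs carry a prefix that the matrix header lacks.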