<!--
# COPYRIGHT NOTICE
# This file is part of the "Universal Biomedical Skills" project.
# Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu>
# All Rights Reserved.
#
# This code is proprietary and confidential.
# Unauthorized copying of this file, via any medium is strictly prohibited.
#
# Provenance: Authenticated by MD BABU MIA
-->
---
name: bio-workflow-management-snakemake-workflows
description: Build reproducible bioinformatics pipelines with Snakemake using rules, wildcards, and automatic dependency resolution. Use when creating Python-based workflows, automating multi-step analyses with make-like dependency tracking, or running pipelines on HPC clusters with SLURM.
tool_type: python
primary_tool: Snakemake
measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes.
allowed-tools:
- read_file
- run_shell_command
---
# Snakemake Workflows
Compatible with Snakemake 7.x, 8.x, and 9.x. For Snakemake 8.0+, use `--executor` instead of `--cluster`.
## Basic Rule Structure
```python
# Snakefile
SAMPLES = ["sample1", "sample2"]  # sample names drive the target list

rule all:
    input:
        expand("aligned/{sample}.bam", sample=SAMPLES)

rule align:
    input:
        r1 = "data/{sample}_R1.fq.gz",
        r2 = "data/{sample}_R2.fq.gz",
        index = "ref/genome.fa"
    output:
        bam = "aligned/{sample}.bam"
    threads: 8
    shell:
        "bwa mem -t {threads} {input.index} {input.r1} {input.r2} | "
        "samtools sort -@ {threads} -o {output.bam}"
```
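Requesting a concrete target file is often the quickest sanity check, since Snakemake infers the rule chain needed to produce it; a minimal sketch (the sample name is a placeholder):

```bash
# Build one specific BAM; Snakemake resolves the dependencies
snakemake aligned/sample1.bam --cores 8

# Or build everything listed under rule all
snakemake --cores 8
```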
## Config File
```yaml
# config.yaml
samples:
- sample1
- sample2
- sample3
reference: "ref/genome.fa"
threads: 8
```
```python
# Snakefile
configfile: "config.yaml"

SAMPLES = config["samples"]
REF = config["reference"]

rule all:
    input:
        expand("results/{sample}.bam", sample=SAMPLES)
```
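Config values can also be swapped or overridden at invocation time without editing the Snakefile; a quick sketch (the alternate filename and key values are placeholders):

```bash
# Point at a different config file
snakemake --configfile config.prod.yaml --cores 8

# Override individual keys from the command line
snakemake --config reference=ref/hg38.fa --cores 8
```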
## Wildcards and Expand
```python
# Define samples
SAMPLES = ["A", "B", "C"]
CHROMOSOMES = [str(i) for i in range(1, 23)] + ["X", "Y"]

# Expand over all sample/chromosome combinations
rule all:
    input:
        expand("results/{sample}_{chrom}.vcf", sample=SAMPLES, chrom=CHROMOSOMES)

# Access a wildcard inside a rule
rule process:
    input:
        "data/{sample}.bam"
    output:
        "results/{sample}.vcf"
    params:
        sample = lambda wildcards: wildcards.sample
    shell:
        "bcftools mpileup -f ref/genome.fa {input} | "
        "bcftools call -mv -s {params.sample} -o {output}"
```
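When two rules could produce files matching the same pattern, wildcard constraints pin down what each wildcard may match; a minimal sketch using Snakemake's global `wildcard_constraints` directive:

```python
# Restrict wildcard values workflow-wide to avoid ambiguous rule matches
wildcard_constraints:
    sample = "[A-Za-z0-9]+",
    chrom = "[0-9XY]+"
```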
## Python Integration
```python
rule analyze:
    input:
        counts = "data/{sample}_counts.txt"
    output:
        results = "results/{sample}_de.csv"
    run:
        import pandas as pd
        counts = pd.read_csv(input.counts, sep='\t')
        # Aggregate counts per gene before downstream analysis
        results = counts.groupby('gene').sum()
        results.to_csv(output.results)
```
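`run:` blocks execute inline, but for longer logic the `script:` directive (used by the DESeq2 rule at the end of this page) is cleaner: the script is handed a `snakemake` object exposing the rule's inputs and outputs. A sketch, with the script path chosen for illustration:

```python
rule analyze_external:
    input:
        counts = "data/{sample}_counts.txt"
    output:
        results = "results/{sample}_de.csv"
    script:
        "scripts/analyze.py"
```

```python
# scripts/analyze.py -- Snakemake injects the `snakemake` object
import pandas as pd

counts = pd.read_csv(snakemake.input.counts, sep="\t")
counts.groupby("gene").sum().to_csv(snakemake.output.results)
```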
## Conda Environments
```python
rule fastqc:
    input:
        "data/{sample}.fq.gz"
    output:
        "qc/{sample}_fastqc.html"
    conda:
        "envs/qc.yaml"
    shell:
        "fastqc {input} -o qc/"
```
```yaml
# envs/qc.yaml
channels:
- bioconda
- conda-forge
dependencies:
- fastqc=0.12.1
- multiqc=1.14
```
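Conda environments are only used when enabled at run time, and the flag changed in Snakemake 8:

```bash
# Snakemake 7.x
snakemake --use-conda --cores 8

# Snakemake 8+ (abbreviated --sdm)
snakemake --software-deployment-method conda --cores 8
```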
## Container Support
```python
rule align:
    input:
        r1 = "data/{sample}_R1.fq.gz",
        index = "ref/genome.fa"
    output:
        bam = "aligned/{sample}.bam"
    container:
        "docker://biocontainers/bwa:v0.7.17"
    # NB: every tool used in the command must exist inside the container
    shell:
        "bwa mem {input.index} {input.r1} | samtools sort -o {output.bam}"
```
```bash
# Run with Singularity
snakemake --use-singularity --singularity-args "-B /data"
```
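On Snakemake 8+, container execution goes through the same software-deployment flag:

```bash
# Snakemake 8+ equivalent
snakemake --software-deployment-method apptainer --cores 8
```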
## Resource Management
```python
rule memory_intensive:
    input:
        "data/{sample}.bam"
    output:
        "results/{sample}.vcf"
    threads: 4
    resources:
        mem_mb = 16000,
        time = "4:00:00",
        gpu = 1
    shell:
        "variant_caller --threads {threads} {input} > {output}"
```
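Rule-level resources only constrain local scheduling when matching limits are passed at invocation; Snakemake then runs jobs concurrently only while the totals fit:

```bash
# Cap total memory and GPUs across concurrently running jobs
snakemake --cores 16 --resources mem_mb=64000 gpu=2
```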
## Cluster Execution
```yaml
# cluster.yaml (Snakemake 7.x --cluster-config)
__default__:
  partition: normal
  time: "1:00:00"
  mem: 4G
align:
  partition: normal
  time: "8:00:00"
  mem: 32G
  cpus-per-task: 8
```
```bash
# Snakemake 8.x with executor plugin
pip install snakemake-executor-plugin-slurm
snakemake --executor slurm --jobs 100
# Snakemake 7.x cluster execution
snakemake --cluster "sbatch --partition={cluster.partition} \
--time={cluster.time} --mem={cluster.mem}" \
--cluster-config cluster.yaml --jobs 100
# Or use profile (both versions)
snakemake --profile slurm
```
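A profile bundles these options into a config file so the invocation stays short. A minimal sketch for the Snakemake 8+ SLURM executor plugin (partition name and limits are placeholders):

```yaml
# ~/.config/snakemake/slurm/config.yaml
executor: slurm
jobs: 100
default-resources:
  slurm_partition: normal
  runtime: 60      # minutes
  mem_mb: 4000
```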
## Checkpoints
```python
import os

# For rules that determine downstream files dynamically
checkpoint split_samples:
    input:
        "data/all_samples.txt"
    output:
        directory("split/")
    shell:
        "split_script.py {input} {output}"

def get_split_files(wildcards):
    # Blocks until the checkpoint has run, then globs its output directory
    checkpoint_output = checkpoints.split_samples.get(**wildcards).output[0]
    return expand(
        "split/{sample}.txt",
        sample=glob_wildcards(os.path.join(checkpoint_output, "{sample}.txt")).sample,
    )

rule aggregate:
    input:
        get_split_files
    output:
        "results/aggregated.txt"
    shell:
        "cat {input} > {output}"
```
## Modular Workflows
```python
# Snakefile
include: "rules/qc.smk"
include: "rules/align.smk"
include: "rules/call.smk"

rule all:
    input:
        rules.qc_all.input,
        rules.call_all.input
```
```python
# rules/qc.smk
rule fastqc:
    input: "data/{sample}.fq.gz"
    output: "qc/{sample}_fastqc.html"
    shell: "fastqc {input} -o qc/"

rule qc_all:
    input: expand("qc/{sample}_fastqc.html", sample=SAMPLES)
```
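Snakemake 6+ also supports the `module` directive as a namespaced alternative to `include:`; imported rules get a prefix, which avoids name clashes. A minimal sketch (paths hypothetical):

```python
# Snakefile
module rnaseq:
    snakefile: "modules/rnaseq/Snakefile"
    config: config

# Import every rule from the module under an rnaseq_ prefix
use rule * from rnaseq as rnaseq_*
```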
## Temporary and Protected Files
```python
rule align:
    input:
        fq = "data/{sample}.fq.gz",
        index = "ref/genome.fa"
    output:
        bam = temp("aligned/{sample}.unsorted.bam"),   # deleted once all consumers finish
        sorted = protected("aligned/{sample}.bam")     # write-protected after the run
    shell:
        "bwa mem {input.index} {input.fq} > {output.bam} && "
        "samtools sort {output.bam} -o {output.sorted}"
```
## Benchmarking
```python
rule heavy_computation:
    input:
        "data/{sample}.bam"
    output:
        "results/{sample}.txt"
    benchmark:
        "benchmarks/{sample}.tsv"
    shell:
        "heavy_tool {input} > {output}"
```
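Single runs are noisy; wrapping the benchmark path in `repeat()` re-executes the rule and records one row per repetition:

```python
rule heavy_computation_repeated:
    input:
        "data/{sample}.bam"
    output:
        "results/{sample}_repeated.txt"
    benchmark:
        repeat("benchmarks/{sample}_repeated.tsv", 3)  # 3 runs, one TSV row each
    shell:
        "heavy_tool {input} > {output}"
```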
## Logging
```python
rule process:
    input:
        "data/{sample}.bam"
    output:
        "results/{sample}.txt"
    log:
        "logs/{sample}.log"
    shell:
        "(command {input} > {output}) 2> {log}"
```
## Run Commands
```bash
# Dry run (show what would be executed)
snakemake -n
# Run with 8 cores
snakemake --cores 8
# Force rerun of specific rule
snakemake --forcerun align
# Run until specific target
snakemake results/sample1.vcf
# Generate DAG visualization
snakemake --dag | dot -Tpdf > dag.pdf
# Generate report
snakemake --report report.html
```
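A few more flags that come up in day-to-day use:

```bash
# Remove a stale lock left by a crashed run
snakemake --unlock

# Re-run jobs whose outputs were left incomplete
snakemake --rerun-incomplete --cores 8

# Mark existing outputs as up to date without running anything
snakemake --touch
```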
## Complete RNA-seq Workflow
```python
configfile: "config.yaml"

SAMPLES = config["samples"]

rule all:
    input:
        "results/multiqc_report.html",
        "results/deseq2_results.csv"

rule fastp:
    input:
        r1 = "data/{sample}_R1.fq.gz",
        r2 = "data/{sample}_R2.fq.gz"
    output:
        r1 = "trimmed/{sample}_R1.fq.gz",
        r2 = "trimmed/{sample}_R2.fq.gz",
        json = "qc/{sample}_fastp.json"
    threads: 4
    shell:
        "fastp -i {input.r1} -I {input.r2} -o {output.r1} -O {output.r2} "
        "--json {output.json} --thread {threads}"

rule salmon_quant:
    input:
        r1 = "trimmed/{sample}_R1.fq.gz",
        r2 = "trimmed/{sample}_R2.fq.gz",
        index = config["salmon_index"]
    output:
        directory("salmon/{sample}")
    threads: 8
    shell:
        "salmon quant -i {input.index} -l A -1 {input.r1} -2 {input.r2} "
        "-o {output} --threads {threads}"

rule multiqc:
    input:
        expand("qc/{sample}_fastp.json", sample=SAMPLES),
        expand("salmon/{sample}", sample=SAMPLES)
    output:
        "results/multiqc_report.html"
    shell:
        "multiqc qc/ salmon/ -o results/"

rule deseq2:
    input:
        quants = expand("salmon/{sample}", sample=SAMPLES),
        metadata = config["metadata"]
    output:
        "results/deseq2_results.csv"
    script:
        "scripts/deseq2.R"
```
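This workflow assumes a config file defining `samples`, `salmon_index`, and `metadata`; a plausible sketch with placeholder paths:

```yaml
# config.yaml
samples:
  - sample1
  - sample2
salmon_index: "ref/salmon_index"
metadata: "metadata/samples.tsv"
```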
## Related Skills
- workflow-management/nextflow-pipelines - Nextflow alternative
- workflows/rnaseq-to-de - RNA-seq workflow
- read-qc/fastp-workflow - QC in pipelines
<!-- AUTHOR_SIGNATURE: 9a7f3c2e-MD-BABU-MIA-2026-MSSM-SECURE -->