<!--
# COPYRIGHT NOTICE
# This file is part of the "Universal Biomedical Skills" project.
# Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu>
# All Rights Reserved.
#
# This code is proprietary and confidential.
# Unauthorized copying of this file, via any medium is strictly prohibited.
#
# Provenance: Authenticated by MD BABU MIA

-->

---
name: bio-workflows-metagenomics-pipeline
description: End-to-end metagenomics workflow from FASTQ to taxonomic and functional profiles. Covers Kraken2 classification, Bracken abundance estimation, and HUMAnN functional profiling. Use when profiling metagenomic samples.
tool_type: cli
primary_tool: Kraken2
workflow: true
depends_on:
  - read-qc/fastp-workflow
  - metagenomics/kraken-classification
  - metagenomics/metaphlan-profiling
  - metagenomics/abundance-estimation
  - metagenomics/functional-profiling
  - metagenomics/metagenome-visualization
qc_checkpoints:
  - after_qc: "Q30 >80%, host reads removed"
  - after_classification: "Classification rate >60%, known taxa dominant"
  - after_functional: "Pathway coverage reasonable, unmapped <50%"
measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes.
allowed-tools:
  - read_file
  - run_shell_command
---

# Metagenomics Pipeline

Complete workflow from metagenomic FASTQ to taxonomic and functional profiles.

## Workflow Overview

```
FASTQ files
    |
    v
[1. QC & Host Removal] --> fastp + Bowtie2
    |
    v
[2. Taxonomic Classification]
    |
    +---> Kraken2 + Bracken (fast, database-dependent)
    |
    +---> MetaPhlAn (marker-based, standardized)
    |
    v
[3. Functional Profiling] --> HUMAnN
    |
    v
Taxonomic profiles + Pathway abundances
```

## Primary Path: Kraken2 + Bracken + HUMAnN

### Step 1: Quality Control and Host Removal

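```bash
# Create the output directories used in this step
mkdir -p trimmed host_removed qc

# QC with fastp (per-sample HTML and JSON reports)
for sample in sample1 sample2 sample3; do
    fastp -i ${sample}_R1.fastq.gz -I ${sample}_R2.fastq.gz \
        -o trimmed/${sample}_R1.fq.gz -O trimmed/${sample}_R2.fq.gz \
        --detect_adapter_for_pe \
        --qualified_quality_phred 20 \
        --length_required 50 \
        --html qc/${sample}_fastp.html \
        --json qc/${sample}_fastp.json
done

# Remove host reads (human example).
# Pairs that do not align concordantly to the host index are written to
# host_removed/; the SAM output is discarded and the summary log is kept.
for sample in sample1 sample2 sample3; do
    bowtie2 -p 8 -x human_index \
        -1 trimmed/${sample}_R1.fq.gz \
        -2 trimmed/${sample}_R2.fq.gz \
        --un-conc-gz host_removed/${sample}_R%.fq.gz \
        > /dev/null 2> qc/${sample}_host_removal.log
done
```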
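
The `after_qc` checkpoint in the front matter (Q30 >80%, host reads removed) can be checked from the Step 1 reports. The following is a minimal sketch, not part of the original skill. It assumes fastp wrote a JSON report to `qc/<sample>_fastp.json` (via `--json`) and that the Bowtie2 log ends with the usual `overall alignment rate` summary line. The helper names `fastp_q30_rate`, `bowtie2_alignment_rate`, and `check_qc` are hypothetical.

```python
import json
import re
from pathlib import Path


def fastp_q30_rate(json_path):
    """Return the post-filtering Q30 base fraction (0-1) from a fastp JSON report."""
    report = json.loads(Path(json_path).read_text())
    return report["summary"]["after_filtering"]["q30_rate"]


def bowtie2_alignment_rate(log_path):
    """Return the overall alignment rate (percent) from a Bowtie2 stderr log."""
    text = Path(log_path).read_text()
    match = re.search(r"([\d.]+)% overall alignment rate", text)
    if match is None:
        raise ValueError(f"no alignment summary found in {log_path}")
    return float(match.group(1))


def check_qc(sample, qc_dir="qc", min_q30=0.80):
    """Summarize the after_qc checkpoint for one sample (hypothetical helper)."""
    q30 = fastp_q30_rate(Path(qc_dir) / f"{sample}_fastp.json")
    # Reads aligned to the host index approximate the host fraction of the sample.
    host_pct = bowtie2_alignment_rate(Path(qc_dir) / f"{sample}_host_removal.log")
    return {
        "sample": sample,
        "q30_rate": q30,
        "q30_ok": q30 > min_q30,
        "host_pct": host_pct,
    }
```

Calling `check_qc("sample1")` returns a small dictionary per sample. The host percentage is reported rather than thresholded, since an acceptable value depends on the sample type.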

### Step 2A: Kraken2 Classification

```bash
# Create the output directory, then classify reads
mkdir -p kraken
for sample in sample1 sample2 sample3; do
    kraken2 --db kraken2_db \
        --threads 8 \
        --paired \
        --report kraken/${sample}.report \
        --output kraken/${sample}.output \
        host_removed/${sample}_R1.fq.gz \
        host_removed/${sample}_R2.fq.gz
done
```
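
The `after_classification` checkpoint (classification rate >60%) can be read from the Kraken2 report. The sketch below assumes the standard six-column report layout (percentage, clade fragments, direct fragments, rank code, taxid, name), that is, a report produced without `--report-minimizer-data`. The function name `kraken2_classification_rate` is hypothetical.

```python
import csv


def kraken2_classification_rate(report_path):
    """Percent of fragments classified, from a standard 6-column Kraken2 report."""
    unclassified = 0
    classified = 0
    with open(report_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 6:
                continue  # skip blank or malformed lines
            clade_count = int(row[1])
            rank_code = row[3].strip()
            taxid = row[4].strip()
            if rank_code == "U":
                unclassified = clade_count  # the "unclassified" row
            elif taxid == "1":
                classified = clade_count  # the root row covers all classified fragments
    total = classified + unclassified
    if total == 0:
        raise ValueError(f"no fragment counts found in {report_path}")
    return 100.0 * classified / total
```

For example, `kraken2_classification_rate("kraken/sample1.report") > 60` tests the checkpoint for one sample.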

### Step 2B: Bracken Abundance Estimation

```bash
# Create the output directory, then estimate species abundance
mkdir -p bracken
for sample in sample1 sample2 sample3; do
    bracken -d kraken2_db \
        -i kraken/${sample}.report \
        -o bracken/${sample}.species.txt \
        -r 150 \
        -l S \
        -t 10
done

# Combine samples into abundance matrix
combine_bracken_outputs.py \
    --files bracken/*.species.txt \
    -o bracken/combined_species.txt
```
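
To check whether known taxa dominate (the second half of the `after_classification` checkpoint), list the most abundant species per sample. This sketch reads one per-sample Bracken output and assumes its usual tab-separated header, which includes `name` and `fraction_total_reads`. The function name `top_species` is hypothetical.

```python
import csv


def top_species(bracken_path, n=5):
    """Return the n most abundant (name, fraction) pairs from one Bracken output file."""
    with open(bracken_path, newline="") as handle:
        rows = [
            (row["name"], float(row["fraction_total_reads"]))
            for row in csv.DictReader(handle, delimiter="\t")
        ]
    # Sort by estimated fraction of total reads, highest first.
    return sorted(rows, key=lambda item: item[1], reverse=True)[:n]
```

Calling `top_species("bracken/sample1.species.txt")` gives a short list to compare against the taxa expected for the sample type.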

### Step 2C: Alternative - MetaPhlAn Profiling

```bash
# Create the output directory, then profile with MetaPhlAn 4
mkdir -p metaphlan
for sample in sample1 sample2 sample3; do
    metaphlan host_removed/${sample}_R1.fq.gz,host_removed/${sample}_R2.fq.gz \
        --bowtie2out metaphlan/${sample}.bowtie2.bz2 \
        --input_type fastq \
        --nproc 8 \
        -o metaphlan/${sample}_profile.txt
done

# Merge profiles
merge_metaphlan_tables.py metaphlan/*_profile.txt > metaphlan/merged_abundance.txt
```

### Step 3: Functional Profiling with HUMAnN

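```bash
mkdir -p humann

# Run HUMAnN
for sample in sample1 sample2 sample3; do
    # Concatenate paired reads (HUMAnN takes a single input file)
    cat host_removed/${sample}_R1.fq.gz host_removed/${sample}_R2.fq.gz > \
        host_removed/${sample}_concat.fq.gz

    # --output-basename names outputs ${sample}_pathabundance.tsv instead of
    # ${sample}_concat_pathabundance.tsv
    humann --input host_removed/${sample}_concat.fq.gz \
        --output humann/${sample} \
        --output-basename ${sample} \
        --threads 8 \
        --metaphlan-options "--bowtie2db metaphlan_db"
done

# Normalize each sample's pathway abundances to copies per million (CPM)
for sample in sample1 sample2 sample3; do
    humann_renorm_table --input humann/${sample}/${sample}_pathabundance.tsv \
        --output humann/${sample}/${sample}_pathabundance_cpm.tsv \
        --units cpm
done

# Join the normalized tables; per-sample outputs sit in subdirectories of humann/
humann_join_tables --input humann \
    --output humann/merged_pathabundance.tsv \
    --file_name pathabundance_cpm \
    --search-subdirectories
```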
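
A rough way to check the `after_functional` checkpoint (unmapped <50%) is to compare the `UNMAPPED` row of a single-sample HUMAnN table against the community-level total. The sketch below assumes a two-column, tab-separated table with a `#`-prefixed header and `|`-delimited stratified rows. It treats the `UNMAPPED` share as an approximation only. The function name `unmapped_fraction` is hypothetical.

```python
import csv


def unmapped_fraction(table_path):
    """Share of community-level abundance in the UNMAPPED row of a HUMAnN table."""
    unmapped = 0.0
    total = 0.0
    with open(table_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 2 or row[0].startswith("#"):
                continue  # skip the header and blank lines
            feature, value = row[0], float(row[1])
            if "|" in feature:
                continue  # skip per-species (stratified) rows to avoid double counting
            total += value
            if feature == "UNMAPPED":
                unmapped = value
    if total == 0:
        raise ValueError(f"no community-level abundances found in {table_path}")
    return unmapped / total
```

For example, `unmapped_fraction("humann/sample1/sample1_pathabundance.tsv") < 0.5` flags whether the sample passes the checkpoint under these assumptions.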

### Visualization

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the combined Bracken table. combine_bracken_outputs.py writes
# taxonomy_id and taxonomy_lvl columns plus <sample>_num and <sample>_frac
# columns per sample, so keep only the read-count (_num) columns for plotting.
species = pd.read_csv('bracken/combined_species.txt', sep='\t', index_col=0)
species = species.filter(regex='_num$')

# Top 20 species heatmap (estimated read counts)
top20 = species.sum(axis=1).nlargest(20).index
plt.figure(figsize=(12, 8))
sns.heatmap(species.loc[top20], cmap='viridis', annot=False)
plt.title('Top 20 Species Abundance')
plt.tight_layout()
plt.savefig('top20_species_heatmap.pdf')
plt.close()

# Stacked bar plot of relative abundance (percent of each sample's reads)
species_norm = species.div(species.sum()) * 100
top10 = species_norm.sum(axis=1).nlargest(10).index
other = species_norm.loc[~species_norm.index.isin(top10)].sum()

# Samples on the x-axis, top 10 species plus an "Other" category as stacks
plot_data = species_norm.loc[top10].T
plot_data['Other'] = other
plot_data.plot(kind='bar', stacked=True, figsize=(10, 6))
plt.ylabel('Relative Abundance (%)')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.savefig('species_barplot.pdf')
```

## Parameter Recommendations

| Step | Parameter | Value |
|------|-----------|-------|
| fastp | --length_required | 50 (metagenomic reads) |
| Kraken2 | --confidence | 0.0 (default) or 0.1 |
| Bracken | -r | Read length (e.g., 150) |
| Bracken | -l | S (species) or G (genus) |
| Bracken | -t | 10 (min reads threshold) |
| MetaPhlAn | --min_cu_len | 2000 (default) |
| HUMAnN | --threads | 8+ |

## Troubleshooting

| Issue | Likely Cause | Solution |
|-------|--------------|----------|
| Low classification rate | Database does not match the sample type, or novel organisms | Try a different database; confirm the sample type |
| High unclassified fraction | Novel microbes or residual host contamination | Remove host reads; use a larger database |
| High host read fraction | Incomplete host removal | Align against multiple host reference genomes |
| HUMAnN runs slowly | Large input files | Increase `--threads`; pre-filter reads |

## Complete Pipeline Script

```bash
#!/bin/bash
# Exit on errors, unset variables, and failures inside pipelines
set -euo pipefail

THREADS=8
KRAKEN_DB="kraken2_standard_db"
HOST_INDEX="human_bt2_index"
SAMPLES="sample1 sample2 sample3"
OUTDIR="metagenomics_results"

mkdir -p ${OUTDIR}/{trimmed,host_removed,kraken,bracken,metaphlan,humann,qc}

# Step 1: QC
echo "=== QC ==="
for sample in $SAMPLES; do
    fastp -i ${sample}_R1.fastq.gz -I ${sample}_R2.fastq.gz \
        -o ${OUTDIR}/trimmed/${sample}_R1.fq.gz \
        -O ${OUTDIR}/trimmed/${sample}_R2.fq.gz \
        --length_required 50 \
        --html ${OUTDIR}/qc/${sample}_fastp.html -w ${THREADS}
done

# Host removal
echo "=== Host Removal ==="
for sample in $SAMPLES; do
    bowtie2 -p ${THREADS} -x ${HOST_INDEX} \
        -1 ${OUTDIR}/trimmed/${sample}_R1.fq.gz \
        -2 ${OUTDIR}/trimmed/${sample}_R2.fq.gz \
        --un-conc-gz ${OUTDIR}/host_removed/${sample}_R%.fq.gz \
        > /dev/null 2> ${OUTDIR}/qc/${sample}_host.log
done

# Step 2: Kraken2
echo "=== Kraken2 ==="
for sample in $SAMPLES; do
    kraken2 --db ${KRAKEN_DB} --threads ${THREADS} --paired \
        --report ${OUTDIR}/kraken/${sample}.report \
        --output ${OUTDIR}/kraken/${sample}.output \
        ${OUTDIR}/host_removed/${sample}_R1.fq.gz \
        ${OUTDIR}/host_removed/${sample}_R2.fq.gz
done

# Bracken
echo "=== Bracken ==="
for sample in $SAMPLES; do
    bracken -d ${KRAKEN_DB} \
        -i ${OUTDIR}/kraken/${sample}.report \
        -o ${OUTDIR}/bracken/${sample}.species.txt \
        -r 150 -l S -t 10
done

echo "=== Pipeline Complete ==="
echo "Kraken reports: ${OUTDIR}/kraken/"
echo "Bracken abundances: ${OUTDIR}/bracken/"
```

## Related Skills

- metagenomics/kraken-classification - Kraken2 details
- metagenomics/metaphlan-profiling - MetaPhlAn parameters
- metagenomics/abundance-estimation - Bracken options
- metagenomics/functional-profiling - HUMAnN workflow
- metagenomics/metagenome-visualization - Plotting functions


<!-- AUTHOR_SIGNATURE: 9a7f3c2e-MD-BABU-MIA-2026-MSSM-SECURE -->