Free SKILL.md scraped from GitHub. Clone the repo or copy the file directly into your Claude Code skills directory.
npx versuz@latest install freedomintelligence-openclaw-medical-skills-skills-bio-alignment-indexinggit clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills.gitcp OpenClaw-Medical-Skills/SKILL.MD ~/.claude/skills/freedomintelligence-openclaw-medical-skills-skills-bio-alignment-indexing/SKILL.md<!--
# COPYRIGHT NOTICE
# This file is part of the "Universal Biomedical Skills" project.
# Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu>
# All Rights Reserved.
#
# This code is proprietary and confidential.
# Unauthorized copying of this file, via any medium is strictly prohibited.
#
# Provenance: Authenticated by MD BABU MIA
-->
---
name: bio-alignment-indexing
description: Create and use BAI/CSI indices for BAM/CRAM files using samtools and pysam. Use when enabling random access to alignment files or fetching specific genomic regions.
tool_type: cli
primary_tool: samtools
measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes.
allowed-tools:
- read_file
- run_shell_command
---
# Alignment Indexing
Create indices for random access to alignment files using samtools and pysam.
## Index Types
| Index | Extension | Use Case |
|-------|-----------|----------|
| BAI | `.bai` | Standard BAM index, chromosomes < 512 Mbp |
| CSI | `.csi` | Large chromosomes, custom bin sizes |
| CRAI | `.crai` | CRAM index |
## samtools index
### Create BAI Index
```bash
samtools index input.bam
# Creates input.bam.bai
```
### Create CSI Index
```bash
samtools index -c input.bam
# Creates input.bam.csi
```
### Specify Output Name
```bash
samtools index input.bam output.bai
```
### Multi-threaded Indexing
```bash
samtools index -@ 4 input.bam
```
### Index CRAM
```bash
samtools index input.cram
# Creates input.cram.crai
```
## Index Requirements
Indexing requires coordinate-sorted files:
```bash
# Check sort order
samtools view -H input.bam | grep "^@HD"
# Should show SO:coordinate
# Sort if needed, then index
samtools sort -o sorted.bam input.bam
samtools index sorted.bam
```
## Using Indices for Region Access
### samtools view with Region
```bash
# Requires index file present
samtools view input.bam chr1:1000000-2000000
```
### Multiple Regions
```bash
samtools view input.bam chr1:1000-2000 chr2:3000-4000
```
### Regions from BED File
```bash
samtools view -L regions.bed input.bam
```
## pysam Python Alternative
### Create Index
```python
import pysam
pysam.index('input.bam')
# Creates input.bam.bai
```
### Create CSI Index
```python
pysam.index('input.bam', 'input.bam.csi', csi=True)
```
### Fetch with Index
```python
with pysam.AlignmentFile('input.bam', 'rb') as bam:
# fetch() requires index
for read in bam.fetch('chr1', 1000000, 2000000):
print(read.query_name)
```
### Check if Indexed
```python
import pysam
from pathlib import Path
def is_indexed(bam_path):
bam_path = Path(bam_path)
return (bam_path.with_suffix('.bam.bai').exists() or
Path(str(bam_path) + '.bai').exists() or
bam_path.with_suffix('.bam.csi').exists())
if not is_indexed('input.bam'):
pysam.index('input.bam')
```
### Fetch Multiple Regions
```python
regions = [('chr1', 1000, 2000), ('chr1', 5000, 6000), ('chr2', 1000, 2000)]
with pysam.AlignmentFile('input.bam', 'rb') as bam:
for chrom, start, end in regions:
count = sum(1 for _ in bam.fetch(chrom, start, end))
print(f'{chrom}:{start}-{end}: {count} reads')
```
### Count Reads in Region
```python
with pysam.AlignmentFile('input.bam', 'rb') as bam:
count = bam.count('chr1', 1000000, 2000000)
print(f'Reads in region: {count}')
```
### Get Reads Covering Position
```python
with pysam.AlignmentFile('input.bam', 'rb') as bam:
for read in bam.fetch('chr1', 1000000, 1000001):
if read.reference_start <= 1000000 < read.reference_end:
print(f'{read.query_name} covers position 1000000')
```
## Index File Locations
samtools looks for indices in two locations:
```
input.bam.bai # Standard location
input.bai # Alternative location
```
For CRAM:
```
input.cram.crai
```
## idxstats - Index Statistics
### Get Per-Chromosome Counts
```bash
samtools idxstats input.bam
```
Output format:
```
chr1 248956422 5000000 0
chr2 242193529 4500000 0
* 0 0 10000
```
Columns: reference name, length, mapped reads, unmapped reads
### Sum Total Mapped Reads
```bash
samtools idxstats input.bam | awk '{sum += $3} END {print sum}'
```
### pysam idxstats
```python
with pysam.AlignmentFile('input.bam', 'rb') as bam:
for stat in bam.get_index_statistics():
print(f'{stat.contig}: {stat.mapped} mapped, {stat.unmapped} unmapped')
```
## FASTA Index (faidx)
Related but different - index reference FASTA for random access:
```bash
samtools faidx reference.fa
# Creates reference.fa.fai
# Fetch region from indexed FASTA
samtools faidx reference.fa chr1:1000-2000
```
### pysam FastaFile
```python
with pysam.FastaFile('reference.fa') as ref:
seq = ref.fetch('chr1', 1000, 2000)
print(seq)
```
## Quick Reference
| Task | samtools | pysam |
|------|----------|-------|
| Create BAI | `samtools index file.bam` | `pysam.index('file.bam')` |
| Create CSI | `samtools index -c file.bam` | `pysam.index('file.bam', csi=True)` |
| Fetch region | `samtools view file.bam chr1:1-1000` | `bam.fetch('chr1', 0, 1000)` |
| Count in region | `samtools view -c file.bam chr1:1-1000` | `bam.count('chr1', 0, 1000)` |
| Index stats | `samtools idxstats file.bam` | `bam.get_index_statistics()` |
| Index FASTA | `samtools faidx ref.fa` | Automatic with FastaFile |
## Common Errors
| Error | Cause | Solution |
|-------|-------|----------|
| `random alignment retrieval only works for indexed BAM` | Missing index | Run `samtools index file.bam` |
| `file is not sorted` | Unsorted BAM | Sort first with `samtools sort` |
| `chromosome not found` | Wrong chromosome name | Check names with `samtools view -H` |
## Related Skills
- sam-bam-basics - View and convert alignment files
- alignment-sorting - Sort BAM files (required before indexing)
- alignment-filtering - Filter by regions using index
- bam-statistics - Use idxstats for quick counts
- sequence-io/read-sequences - Index FASTA with SeqIO.index_db()
<!-- AUTHOR_SIGNATURE: 9a7f3c2e-MD-BABU-MIA-2026-MSSM-SECURE -->