Free SKILL.md scraped from GitHub. Clone the repo or copy the file directly into your Claude Code skills directory.
npx versuz@latest install freedomintelligence-openclaw-medical-skills-skills-bio-sequence-propertiesgit clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills.gitcp OpenClaw-Medical-Skills/SKILL.MD ~/.claude/skills/freedomintelligence-openclaw-medical-skills-skills-bio-sequence-properties/SKILL.md<!--
# COPYRIGHT NOTICE
# This file is part of the "Universal Biomedical Skills" project.
# Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu>
# All Rights Reserved.
#
# This code is proprietary and confidential.
# Unauthorized copying of this file, via any medium is strictly prohibited.
#
# Provenance: Authenticated by MD BABU MIA
-->
---
name: bio-sequence-properties
description: Calculate sequence properties like GC content, molecular weight, isoelectric point, and GC skew using Biopython. Use when analyzing sequence composition, computing physical properties, or comparing sequences.
tool_type: python
primary_tool: Bio.SeqUtils
measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes.
allowed-tools:
- read_file
- run_shell_command
---
# Sequence Properties
Calculate physical and chemical properties of biological sequences using Biopython.
## Required Imports
```python
from Bio.Seq import Seq
from Bio.SeqUtils import gc_fraction, molecular_weight, GC123, GC_skew, nt_search, seq1, seq3
from Bio.SeqUtils.ProtParam import ProteinAnalysis
```
## DNA/RNA Properties
### GC Content
```python
from Bio.SeqUtils import gc_fraction
seq = Seq('ATGCGATCGATCGATCGATCG')
gc = gc_fraction(seq) # Returns 0.476... (fraction)
gc_percent = gc * 100 # Convert to percentage
```
Handle ambiguous bases:
```python
gc = gc_fraction(seq, ambiguous='ignore') # Ignore N bases in calculation
gc = gc_fraction(seq, ambiguous='weighted') # Weight by probability
```
### GC at Codon Positions (GC123)
Analyze GC content at each codon position (useful for codon bias analysis):
```python
from Bio.SeqUtils import GC123
seq = Seq('ATGCGATCGATCGATCGATCG')
gc_total, gc_pos1, gc_pos2, gc_pos3 = GC123(seq)
# gc_total: overall GC%
# gc_pos1: GC% at 1st codon position
# gc_pos2: GC% at 2nd codon position
# gc_pos3: GC% at 3rd codon position (wobble)
```
### GC Skew
Calculate (G-C)/(G+C) in sliding windows to identify replication origins:
```python
from Bio.SeqUtils import GC_skew
seq = Seq('ATGCGATCGATCGATCGATCG' * 10)
skew_values = GC_skew(seq, window=100) # Returns list of skew values
```
### Molecular Weight
```python
from Bio.SeqUtils import molecular_weight
dna = Seq('ATGCGATCG')
mw = molecular_weight(dna) # Single-stranded DNA weight
# Double-stranded DNA
mw_ds = molecular_weight(dna, double_stranded=True)
# Circular DNA
mw_circ = molecular_weight(dna, circular=True)
# RNA
rna = Seq('AUGCGAUCG')
mw_rna = molecular_weight(rna, seq_type='RNA')
# Protein
protein = Seq('MRCRS')
mw_prot = molecular_weight(protein, seq_type='protein')
```
### Melting Temperature
```python
from Bio.SeqUtils import MeltingTemp
seq = Seq('ATGCGATCGATCG')
# Basic Tm (Wallace rule - short oligos)
tm_wallace = MeltingTemp.Tm_Wallace(seq)
# GC method (more accurate for longer sequences)
tm_gc = MeltingTemp.Tm_GC(seq)
# Nearest neighbor (most accurate)
tm_nn = MeltingTemp.Tm_NN(seq)
# With salt correction
tm_corrected = MeltingTemp.Tm_NN(seq, Na=50, Mg=1.5)
```
### Sequence Search with IUPAC Codes (nt_search)
Built-in search that handles IUPAC ambiguity codes:
```python
from Bio.SeqUtils import nt_search
seq = 'ATGCGATCGATCGATNGATC'
# Search for pattern with ambiguous bases
result = nt_search(seq, 'GATNGATC') # N matches any base
# Returns: ['GATNGATC', 8] - pattern and position(s)
```
### Base Composition
```python
def base_composition(seq):
seq_str = str(seq).upper()
total = len(seq_str)
return {base: seq_str.count(base) / total * 100 for base in 'ATGC'}
```
## Amino Acid Code Conversion
Convert between 1-letter and 3-letter amino acid codes:
```python
from Bio.SeqUtils import seq1, seq3
# 3-letter to 1-letter
one_letter = seq1('MetAlaGlyTrp') # Returns 'MAGW'
# 1-letter to 3-letter
three_letter = seq3('MAGW') # Returns 'MetAlaGlyTrp'
# With custom separator
three_letter = seq3('MAGW', join='-') # Returns 'Met-Ala-Gly-Trp'
```
## Protein Properties
### Using ProteinAnalysis
```python
from Bio.SeqUtils.ProtParam import ProteinAnalysis
protein = ProteinAnalysis('MAEGEITTFTALTEKFNLPPGNYKKPKLLYCSNG')
```
### Basic Properties
```python
mw = protein.molecular_weight() # Molecular weight in Daltons
pi = protein.isoelectric_point() # Isoelectric point
aa_comp = protein.amino_acids_percent # Amino acid composition (property)
aa_count = protein.count_amino_acids() # Raw amino acid counts
```
### Stability and Hydropathy
```python
instability = protein.instability_index() # < 40 = stable, > 40 = unstable
gravy = protein.gravy() # Negative = hydrophilic, Positive = hydrophobic
arom = protein.aromaticity() # Fraction of Phe+Trp+Tyr
```
### Charge at Specific pH
```python
charge = protein.charge_at_pH(7.0) # Net charge at pH 7.0
charge_acidic = protein.charge_at_pH(4.0)
charge_basic = protein.charge_at_pH(10.0)
```
### Flexibility Profile
```python
flexibility = protein.flexibility() # List of flexibility values per residue
# Based on Vihinen, 1994 scale
```
### Secondary Structure
```python
helix, turn, sheet = protein.secondary_structure_fraction()
```
### Extinction Coefficient
```python
eps_reduced, eps_cystine = protein.molar_extinction_coefficient()
# eps_reduced: all Cys are reduced
# eps_cystine: all Cys form disulfide bonds
```
### Protein Scale Profiles
Calculate profiles using any amino acid scale:
```python
# Kyte-Doolittle hydropathy profile
kd_scale = {'A': 1.8, 'R': -4.5, 'N': -3.5, ...}
profile = protein.protein_scale(kd_scale, window=7)
```
## Code Patterns
### Analyze Multiple Sequences
```python
from Bio import SeqIO
from Bio.SeqUtils import gc_fraction
def analyze_fasta(filename):
results = []
for record in SeqIO.parse(filename, 'fasta'):
results.append({
'id': record.id,
'length': len(record.seq),
'gc': gc_fraction(record.seq) * 100
})
return results
```
### GC Content Distribution (Sliding Windows)
```python
from Bio.SeqUtils import gc_fraction
def gc_distribution(seq, window_size=100, step=50):
gc_values = []
for i in range(0, len(seq) - window_size + 1, step):
window = seq[i:i + window_size]
gc_values.append((i, gc_fraction(window) * 100))
return gc_values
```
### GC Skew Plot Data
```python
from Bio.SeqUtils import GC_skew
def gc_skew_analysis(seq, window=1000):
skew = GC_skew(seq, window=window)
positions = list(range(0, len(seq) - window + 1, window))
cumulative = []
total = 0
for s in skew:
total += s
cumulative.append(total)
return positions, skew, cumulative
```
### Full Protein Analysis Report
```python
from Bio.SeqUtils.ProtParam import ProteinAnalysis
def protein_report(sequence):
protein = ProteinAnalysis(str(sequence).replace('*', ''))
helix, turn, sheet = protein.secondary_structure_fraction()
return {
'length': len(sequence),
'molecular_weight': protein.molecular_weight(),
'isoelectric_point': protein.isoelectric_point(),
'charge_at_pH7': protein.charge_at_pH(7.0),
'instability_index': protein.instability_index(),
'gravy': protein.gravy(),
'aromaticity': protein.aromaticity(),
'helix_fraction': helix,
'turn_fraction': turn,
'sheet_fraction': sheet,
}
```
### Dinucleotide Frequencies
```python
from collections import Counter
def dinucleotide_freq(seq):
seq_str = str(seq)
dinucs = [seq_str[i:i+2] for i in range(len(seq_str) - 1)]
counts = Counter(dinucs)
total = sum(counts.values())
return {di: count / total for di, count in counts.items()}
```
### CpG Observed/Expected Ratio
```python
def cpg_ratio(seq):
seq_str = str(seq).upper()
cpg = seq_str.count('CG')
c = seq_str.count('C')
g = seq_str.count('G')
expected = (c * g) / len(seq_str) if len(seq_str) > 0 else 0
return cpg / expected if expected > 0 else 0
```
## Property Reference
### DNA/RNA Properties
| Property | Function | Notes |
|----------|----------|-------|
| GC content | `gc_fraction()` | Returns fraction (0-1) |
| GC by position | `GC123()` | Total, pos1, pos2, pos3 |
| GC skew | `GC_skew()` | (G-C)/(G+C) in windows |
| Molecular weight | `molecular_weight()` | In Daltons |
| Melting temp | `MeltingTemp.Tm_*()` | Multiple methods |
| IUPAC search | `nt_search()` | Handles ambiguity codes |
### Protein Properties
| Property | Method | Typical Range |
|----------|--------|---------------|
| MW | `molecular_weight()` | Daltons |
| pI | `isoelectric_point()` | 0-14 |
| Charge | `charge_at_pH(pH)` | Varies |
| Instability | `instability_index()` | <40 stable |
| GRAVY | `gravy()` | -2 to +2 |
| Aromaticity | `aromaticity()` | 0-0.15 |
| Flexibility | `flexibility()` | Per-residue list |
### Amino Acid Codes
| Function | Input | Output |
|----------|-------|--------|
| `seq1()` | 'MetAlaGly' | 'MAG' |
| `seq3()` | 'MAG' | 'MetAlaGly' |
## Common Errors
| Error | Cause | Solution |
|-------|-------|----------|
| `KeyError` in ProteinAnalysis | Non-standard amino acid | Remove X, *, etc. |
| Wrong MW | Wrong seq_type | Specify 'RNA' or 'protein' |
| Invalid Tm | Sequence too short | Use >10 bp for accurate Tm |
| Empty GC_skew | Sequence shorter than window | Reduce window size |
## Decision Tree
```
Need sequence properties?
├── DNA/RNA sequence?
│ ├── GC content? → gc_fraction()
│ ├── GC at codon positions? → GC123()
│ ├── GC skew (replication origin)? → GC_skew()
│ ├── Molecular weight? → molecular_weight()
│ ├── Melting temperature? → MeltingTemp.Tm_NN()
│ └── Search with IUPAC codes? → nt_search()
├── Protein sequence?
│ └── Create ProteinAnalysis object
│ ├── Size → molecular_weight()
│ ├── Charge → isoelectric_point(), charge_at_pH()
│ ├── Stability → instability_index()
│ ├── Hydrophobicity → gravy()
│ ├── Flexibility → flexibility()
│ └── Structure → secondary_structure_fraction()
└── Convert amino acid codes?
└── seq1() / seq3()
```
## Related Skills
- seq-objects - Create Seq objects for property calculation
- sequence-io/sequence-statistics - File-level statistics (N50, totals)
- codon-usage - GC123 for codon position analysis
- transcription-translation - Translate before protein analysis
- alignment-files/bam-statistics - samtools stats for alignment-level GC and quality stats
<!-- AUTHOR_SIGNATURE: 9a7f3c2e-MD-BABU-MIA-2026-MSSM-SECURE -->