---
name: protein-design-workflow
description: >
  End-to-end guidance for protein design pipelines.
  Use this skill when: (1) Starting a new protein design project,
  (2) Needing step-by-step workflow guidance,
  (3) Understanding the full design pipeline,
  (4) Planning compute resources and timelines,
  (5) Integrating multiple design tools.
  For tool selection, use binder-design.
  For QC thresholds, use protein-qc.
license: MIT
category: orchestration
tags: [guidance, pipeline, workflow]
---
# Protein Design Workflow Guide
## Standard binder design pipeline
### Overview
```
Target Preparation --> Backbone Generation --> Sequence Design
       |                      |                      |
       v                      v                      v
   (pdb skill)          (rfdiffusion)          (proteinmpnn)
                              |                      |
                              v                      v
                      Structure Validation --> Filtering
                              |                      |
                              v                      v
                      (alphafold/chai)         (protein-qc)
```
## Phase 1: Target preparation
### 1.1 Obtain target structure
```bash
# Download from PDB
curl -o target.pdb "https://files.rcsb.org/download/XXXX.pdb"
```
### 1.2 Clean and prepare
```python
# Extract target chain
# Remove waters, ligands if needed
# Trim to binding region + 10 Å buffer
```
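The three cleanup steps above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the method used by the pdb skill: `clean_target` and its parameters are hypothetical names, the input is assumed to follow the fixed-column PDB format, and insertion codes and alternate locations are ignored.

```python
from math import dist


def clean_target(pdb_lines, chain_id, site_residues=None, buffer=10.0):
    """Keep ATOM records of one chain; optionally trim around a binding site.

    Hypothetical helper. Keeping only ATOM records drops HETATM lines
    (waters, ligands), and also drops modified residues stored as HETATM.
    """
    atoms = []
    for line in pdb_lines:
        if line.startswith("ATOM") and line[21] == chain_id:
            resnum = int(line[22:26])  # residue number, columns 23-26
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((resnum, xyz, line))
    if site_residues is None:
        return [line for _, _, line in atoms]
    site_xyz = [xyz for resnum, xyz, _ in atoms if resnum in site_residues]
    # Keep whole residues that have any atom within `buffer` angstroms of the site
    keep = {
        resnum
        for resnum, xyz, _ in atoms
        if any(dist(xyz, s) <= buffer for s in site_xyz)
    }
    return [line for resnum, _, line in atoms if resnum in keep]
```

Calling `clean_target(open("target.pdb").read().splitlines(), "A")` would extract chain A only; passing a set of residue numbers as `site_residues` additionally trims to the binding region plus buffer.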
### 1.3 Select hotspots
- Choose 3-6 exposed residues
- Prefer charged/aromatic (K, R, E, D, W, Y, F)
- Check surface accessibility
- Verify residue numbering
**Output**: `target_prepared.pdb`, hotspot list
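Two of the checks above, residue numbering and residue type, are easy to automate before launching a campaign. The sketch below assumes hotspots are written as chain letter plus residue number (as in `A45`) and that the file uses fixed-column PDB records; `check_hotspots` is a hypothetical helper, and it does not measure surface accessibility.

```python
# Three-letter codes for the preferred types K, R, E, D, W, Y, F
PREFERRED = {"LYS", "ARG", "GLU", "ASP", "TRP", "TYR", "PHE"}


def check_hotspots(pdb_lines, hotspots):
    """Return {hotspot: (residue_name or None, is_preferred_type)}.

    A residue name of None means the hotspot is not present in the
    structure, which usually points to a numbering mismatch.
    """
    names = {}
    for line in pdb_lines:
        if line.startswith("ATOM"):
            # chain ID: column 22; residue number: columns 23-26; name: columns 18-20
            names[f"{line[21]}{int(line[22:26])}"] = line[17:20].strip()
    report = {}
    for hotspot in hotspots:
        name = names.get(hotspot)
        report[hotspot] = (name, name in PREFERRED)
    return report
```

Any hotspot that maps to `None` should be fixed before it is passed to the backbone generation step.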
## Phase 2: Backbone generation
### Option A: RFdiffusion (diverse exploration)
```bash
modal run modal_rfdiffusion.py \
  --pdb target_prepared.pdb \
  --contigs "A1-150/0 70-100" \
  --hotspot "A45,A67,A89" \
  --num-designs 500
```
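The `--contigs` and `--hotspot` strings are easy to get wrong by hand. As far as RFdiffusion's contig syntax goes, `A1-150` keeps target residues 1 to 150 of chain A, `/0 ` marks a chain break, and `70-100` is the sampled binder length range. The helper below is a hypothetical sketch that builds both strings and rejects hotspots that fall outside the kept target range; it does not call RFdiffusion.

```python
def rfdiffusion_inputs(chain, first, last, binder_min, binder_max, hotspots):
    """Build the contig and hotspot strings used in the command above.

    Hypothetical helper: hotspots are expected as chain letter + residue
    number, for example "A45".
    """
    for hotspot in hotspots:
        if hotspot[0] != chain or not first <= int(hotspot[1:]) <= last:
            raise ValueError(f"hotspot {hotspot} is outside {chain}{first}-{last}")
    contigs = f"{chain}{first}-{last}/0 {binder_min}-{binder_max}"
    return contigs, ",".join(hotspots)


contigs, hotspot_arg = rfdiffusion_inputs("A", 1, 150, 70, 100, ["A45", "A67", "A89"])
print(contigs)      # A1-150/0 70-100
print(hotspot_arg)  # A45,A67,A89
```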
### Option B: BindCraft (end-to-end)
```bash
modal run modal_bindcraft.py \
  --target-pdb target_prepared.pdb \
  --hotspots "A45,A67,A89" \
  --num-designs 100
```
**Output**: 100-500 backbone PDBs
## Phase 3: Sequence design
### For RFdiffusion backbones
```bash
for backbone in backbones/*.pdb; do
  modal run modal_proteinmpnn.py \
    --pdb-path "$backbone" \
    --num-seq-per-target 8 \
    --sampling-temp 0.1
done
```
**Output**: 8 sequences per backbone (800-4000 total)
## Phase 4: Structure validation
### Predict complexes
```bash
# Prepare FASTA with binder + target
# binder:target format for multimer
modal run modal_colabfold.py \
  --input-faa all_sequences.fasta \
  --out-dir predictions/
```
**Output**: AF2 predictions with pLDDT, ipTM, PAE
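The `binder:target` format mentioned in the comments above puts both chains on one sequence line, separated by a colon. A minimal sketch for producing such a file follows; `complex_fasta` is a hypothetical helper, and the record names and sequences in the usage line are placeholders.

```python
def complex_fasta(binders, target_seq):
    """Return FASTA text with one binder:target record per design.

    `binders` maps a design name to its binder sequence (hypothetical
    input layout; adapt to however sequences were collected).
    """
    records = []
    for name, binder_seq in binders.items():
        records.append(f">{name}\n{binder_seq}:{target_seq}")
    return "\n".join(records) + "\n"


text = complex_fasta({"design_0": "MKT", "design_1": "MEE"}, "GSH")
print(text, end="")
```

Writing the returned text to `all_sequences.fasta` gives the input file expected by the command above.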
## Phase 5: Filtering and selection
### Apply standard thresholds
```python
import pandas as pd
# Load metrics
designs = pd.read_csv('all_metrics.csv')
# Filter
filtered = designs[
    (designs['pLDDT'] > 0.85) &
    (designs['ipTM'] > 0.50) &
    (designs['PAE_interface'] < 10) &
    (designs['scRMSD'] < 2.0) &
    (designs['esm2_pll'] > 0.0)
].copy()  # copy so the new column below is not written to a view of `designs`
# Rank by composite score
filtered['score'] = (
    0.3 * filtered['pLDDT'] +
    0.3 * filtered['ipTM'] +
    0.2 * (1 - filtered['PAE_interface'] / 20) +
    0.2 * filtered['esm2_pll']
)
top_designs = filtered.nlargest(50, 'score')
```
**Output**: 50-200 filtered candidates
## Resource planning
### Compute requirements
| Stage | GPU | Time (100 designs) |
|-------|-----|-------------------|
| RFdiffusion | A10G | 30 min |
| ProteinMPNN | T4 | 15 min |
| ColabFold | A100 | 4-8 hours |
| Filtering | CPU | 15 min |
### Total timeline
- Small campaign (100 designs): 8-12 hours
- Medium campaign (500 designs): 24-48 hours
- Large campaign (1000+ designs): 2-5 days
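For rough planning, the per-stage times in the table can be scaled to other campaign sizes. The sketch below assumes linear scaling with the number of designs and strictly sequential stages, neither of which is stated in this guide, so treat the result as a lower bound. Summing the table gives 5 to 9 hours for 100 designs, below the 8-12 hours quoted above, so the quoted timelines presumably include time that the table does not itemize.

```python
# (low, high) hours per 100 designs, copied from the table above
STAGE_HOURS_PER_100 = {
    "RFdiffusion": (0.5, 0.5),
    "ProteinMPNN": (0.25, 0.25),
    "ColabFold": (4.0, 8.0),
    "Filtering": (0.25, 0.25),
}


def estimate_hours(n_designs):
    """Return (low, high) total hours, assuming linear scaling (an assumption)."""
    scale = n_designs / 100
    low = sum(lo for lo, _ in STAGE_HOURS_PER_100.values()) * scale
    high = sum(hi for _, hi in STAGE_HOURS_PER_100.values()) * scale
    return low, high


print(estimate_hours(100))  # (5.0, 9.0)
print(estimate_hours(500))  # (25.0, 45.0)
```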
## Quality checkpoints
### After backbone generation
- [ ] Visual inspection of diverse backbones
- [ ] Secondary structure present
- [ ] No clashes with target
### After sequence design
- [ ] ESM2 PLL > 0.0 for most sequences
- [ ] No unwanted cysteines (unless intentional)
- [ ] Reasonable sequence diversity
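The cysteine and diversity checks can be scripted. The sketch below uses mean pairwise identity as a diversity measure, which is one reasonable choice rather than a metric prescribed by this guide, and it assumes the compared sequences have equal length (true for sequences designed on the same backbone). All function names are hypothetical.

```python
from itertools import combinations


def has_cysteine(seq):
    """True if the sequence contains at least one cysteine."""
    return "C" in seq.upper()


def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def mean_pairwise_identity(seqs):
    """Average identity over all pairs; lower values mean more diversity."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(identity(a, b) for a, b in pairs) / len(pairs)
```

A value close to 1.0 across the sequences for one backbone suggests raising the sampling temperature, as noted under common issues.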
### After validation
- [ ] pLDDT > 0.85
- [ ] ipTM > 0.50
- [ ] PAE_interface < 10
- [ ] Self-consistency RMSD < 2.0 Å
### Final selection
- [ ] Diverse sequences (cluster if needed)
- [ ] Manufacturable (no problematic motifs)
- [ ] Reasonable molecular weight
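The last two checks can be approximated with a quick screen. This guide does not say which motifs count as problematic, so the two patterns below (the N-glycosylation sequon N-X-S/T with X not proline, and Asn-Gly deamidation sites) are illustrative assumptions, as is the average residue mass of about 110 Da used for the weight estimate. Function names are hypothetical.

```python
import re

# Illustrative motif choices (assumptions, not taken from this guide).
# Lookaheads let overlapping matches be reported.
MOTIFS = {
    "N-glycosylation sequon": re.compile(r"(?=N[^P][ST])"),
    "Asn-Gly deamidation site": re.compile(r"(?=NG)"),
}
AVG_RESIDUE_MASS_DA = 110.0  # rough average residue mass; an approximation


def find_motifs(seq):
    """Return (label, 1-based position) for every motif hit."""
    hits = []
    for label, pattern in MOTIFS.items():
        for match in pattern.finditer(seq.upper()):
            hits.append((label, match.start() + 1))
    return hits


def approx_mass_kda(seq):
    """Approximate molecular weight in kDa from sequence length."""
    return len(seq) * AVG_RESIDUE_MASS_DA / 1000
```

Under this approximation, a binder in the 70-100 residue range used earlier comes out at roughly 8 to 11 kDa.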
## Common issues
| Problem | Solution |
|---------|----------|
| Low ipTM | Check hotspots, increase designs |
| Poor diversity | Higher temperature, more backbones |
| High scRMSD | Backbone may be unusual; inspect it and consider regenerating |
| Low pLDDT | Inspect the predicted structure; redesign the sequence or drop the design |
## Advanced workflows
### Multi-tool combination
1. RFdiffusion for initial backbones
2. ColabDesign for refinement
3. ProteinMPNN diversification
4. AF2 final validation
### Iterative refinement
1. Run initial campaign
2. Analyze failures
3. Adjust hotspots/parameters
4. Repeat with insights
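For the "analyze failures" step, it helps to know which threshold rejects the most designs before changing hotspots or parameters. The sketch below counts failures per metric using the thresholds from Phase 5; the input layout (one dict of metrics per design) and the function name are assumptions.

```python
from collections import Counter

# Thresholds from Phase 5: (comparison, cutoff)
THRESHOLDS = {
    "pLDDT": (">", 0.85),
    "ipTM": (">", 0.50),
    "PAE_interface": ("<", 10),
    "scRMSD": ("<", 2.0),
    "esm2_pll": (">", 0.0),
}


def failure_counts(designs):
    """Count how many designs fail each threshold.

    `designs` is a list of dicts keyed by metric name (hypothetical
    layout, for example rows of all_metrics.csv converted to dicts).
    """
    counts = Counter()
    for design in designs:
        for metric, (op, cutoff) in THRESHOLDS.items():
            value = design[metric]
            passed = value > cutoff if op == ">" else value < cutoff
            if not passed:
                counts[metric] += 1
    return counts
```

`failure_counts(rows).most_common()` then lists the metrics from most to least limiting, which indicates where the next iteration should focus.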