Free SKILL.md scraped from GitHub. Clone the repo or copy the file directly into your Claude Code skills directory.
npx versuz@latest install jeremylongshore-claude-code-plugins-plus-skills-plugins-saas-packs-claude-pack-skills-clade-incident-runbookgit clone https://github.com/jeremylongshore/claude-code-plugins-plus-skills.gitcp claude-code-plugins-plus-skills/SKILL.MD ~/.claude/skills/jeremylongshore-claude-code-plugins-plus-skills-plugins-saas-packs-claude-pack-skills-clade-incident-runbook/SKILL.md---
name: clade-incident-runbook
description: "Respond to Anthropic API incidents \u2014 outages, degraded performance,\n\
Use when working with incident-runbook patterns.\nerror spikes, and rate limit issues\
\ in production.\nTrigger with \"anthropic down\", \"claude outage\", \"anthropic\
\ incident\",\n\"claude not responding\", \"anthropic 529\".\n"
allowed-tools: Read, Grep, Bash(curl:*)
version: 1.0.0
license: MIT
author: Jeremy Longshore <jeremy@intentsolutions.io>
tags:
- saas
- anthropic
- claude
- incident
- runbook
compatibility: Designed for Claude Code
---
# Anthropic Incident Runbook
## Overview
Respond to Anthropic API incidents in production — outages, sustained 529 errors, authentication failures, and timeouts. Covers status page checking, severity classification, model fallback activation, communication, and post-incident review.
## Step 1: Confirm the Issue
```bash
# Check Anthropic status
curl -s https://status.anthropic.com/api/v2/status.json | python3 -c "
import json, sys
d = json.load(sys.stdin)
print(f\"Status: {d['status']['description']} ({d['status']['indicator']})\")"
# Test API directly
curl -s -w "\nHTTP %{http_code} in %{time_total}s\n" \
https://api.anthropic.com/v1/messages \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-H "claude-version: 2023-06-01" \
-H "content-type: application/json" \
-d '{"model":"claude-haiku-4-5-20251001","max_tokens":5,"messages":[{"role":"user","content":"ping"}]}'
```
## Step 2: Classify Severity
| Symptom | Severity | Action |
|---------|----------|--------|
| 529 overloaded (intermittent) | Low | SDK auto-retries handle this |
| 529 overloaded (sustained 5+ min) | Medium | Switch to fallback model |
| 401/403 on all requests | High | API key issue — check console |
| All requests timing out | High | Check status page, activate fallback |
| Status page shows incident | Varies | Follow status page updates |
## Step 3: Activate Fallback
```typescript
async function callWithFallback(params: Anthropic.MessageCreateParams) {
try {
return await client.messages.create(params);
} catch (err) {
if (err instanceof Anthropic.APIError && (err.status === 529 || err.status === 500)) {
// Try a different model
if (params.model.includes('opus')) {
return await client.messages.create({ ...params, model: 'claude-sonnet-4-20250514' });
}
if (params.model.includes('sonnet')) {
return await client.messages.create({ ...params, model: 'claude-haiku-4-5-20251001' });
}
}
throw err;
}
}
```
## Step 4: Communicate
- Update your status page if user-facing
- Note: Anthropic incidents typically resolve in 15-60 minutes
## Step 5: Post-Incident
- Check your error logs for the incident window
- Calculate impact (failed requests, user impact)
- Verify all systems recovered
## Output
- Incident confirmed via status page and direct API test
- Severity classified (Low/Medium/High) based on symptoms
- Fallback activated if needed (downgrade model or queue requests)
- Impact assessed and documented post-incident
## Error Handling
| Error | Cause | Solution |
|-------|-------|----------|
| API Error | Check error type and status code | See `clade-common-errors` |
## Examples
See Step 1 (curl status check and API test), Step 2 (severity classification table), Step 3 (fallback code with model downgrade), and Step 5 (post-incident checklist) above.
## Resources
- [Anthropic Status](https://status.anthropic.com)
- [Status API](https://status.anthropic.com/api)
## Next Steps
See `clade-reliability-patterns` for building resilient integrations.
## Prerequisites
- Production Claude integration deployed
- Fallback model configuration in place (see `clade-reliability-patterns`)
- Monitoring/alerting configured (see `clade-observability`)
## Instructions
### Step 1: Review the patterns below
Each section contains production-ready code examples. Copy and adapt them to your use case.
### Step 2: Apply to your codebase
Integrate the patterns that match your requirements. Test each change individually.
### Step 3: Verify
Run your test suite to confirm the integration works correctly.