
llm-routing-and-fallback

Route LLM API calls through cost-efficient proxy layers with automatic model fallback, budget caps, and retry logic. Use when building systems that need to switch between Claude models, fall back from expensive to cheaper models, or unify multiple LLM providers behind one interface. Trigger when users mention litellm, model routing, API cost reduction, model fallback, or budget-aware inference.



§ 02 — Install

Get llm-routing-and-fallback.

Free SKILL.md scraped from GitHub. Clone the repo or copy the file directly into your Claude Code skills directory.

One-line install · Claude Code

$ npx versuz@latest install ultroncore-claude-skill-vault-skills-optimization-llm-routing-and-fallback-versions-v1

Or clone the repo

$ git clone https://github.com/UltronCore/claude-skill-vault.git

Or copy the SKILL.md manually


---
name: llm-routing-and-fallback
description: Route LLM API calls through cost-efficient proxy layers with automatic model fallback, budget caps, and retry logic. Use when building systems that need to switch between Claude models, fall back from expensive to cheaper models, or unify multiple LLM providers behind one interface. Trigger when users mention litellm, model routing, API cost reduction, model fallback, or budget-aware inference.
---

# LLM Routing and Fallback

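When building Claude Code skills or automation that calls LLM APIs, routing through a proxy layer helps in three ways: it can cut cost by sending each request to the cheapest adequate model, it adds reliability through fallbacks and retries, and it lets you switch models by changing a model string instead of rewriting call sites.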

## Core patterns

### Pattern 1: litellm unified interface
litellm provides an OpenAI-compatible interface for 100+ LLMs. Claude calls look identical to GPT calls at the code level.

```python
import litellm

# litellm reads provider keys from the environment
# (ANTHROPIC_API_KEY for Claude, OPENAI_API_KEY for GPT models).

# Route to Claude
response = litellm.completion(
    model="anthropic/claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Hello"}]
)

# Fallback: if the Claude call fails, retry the same request on gpt-4o
response = litellm.completion(
    model="anthropic/claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Hello"}],
    fallbacks=["openai/gpt-4o"]
)
```
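
To make the fallback behaviour concrete, here is a minimal provider-agnostic sketch of the idea: try each model in order and return the first success. It is not how litellm implements `fallbacks`. The names `complete_with_fallback`, `call`, and `fake_call` are hypothetical, and the fake adapter stands in for a real API call so the sketch runs without keys.

```python
from typing import Callable, Sequence


def complete_with_fallback(
    models: Sequence[str],
    call: Callable[[str], str],
) -> tuple[str, str]:
    """Try each model in order; return (model_used, reply) from the first success.

    `call` is a hypothetical adapter: it takes a model name and returns the
    reply text, raising an exception on failure (for example a thin wrapper
    around litellm.completion).
    """
    if not models:
        raise ValueError("models must contain at least one model name")
    errors: list[str] = []
    for model in models:
        try:
            return model, call(model)
        except Exception as exc:  # any provider error moves on to the next model
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


# Demo with a fake adapter that simulates an Anthropic outage.
def fake_call(model: str) -> str:
    if model.startswith("anthropic/"):
        raise ConnectionError("simulated outage")
    return f"reply from {model}"


used, reply = complete_with_fallback(
    ["anthropic/claude-sonnet-4-6", "openai/gpt-4o"], fake_call
)
print(used, "->", reply)
```

Returning the model name alongside the reply lets the caller log which tier actually served the request, which matters when fallbacks change cost or output quality.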

### Pattern 2: Budget caps
```python
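import litellm

# Global spend cap in USD for this process. litellm tracks cumulative cost
# and raises BudgetExceededError once the cap is passed.
litellm.max_budget = 0.50

prompt = "Hello"  # placeholder; substitute your own prompt

response = litellm.completion(
    model="anthropic/claude-haiku-4-5-20251001",  # cheapest tier first
    messages=[{"role": "user", "content": prompt}]
)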
```
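
If you want to see the budget-cap idea without depending on a library, the following is a rough sketch of a per-run spend tracker. `BudgetGuard`, `BudgetExceeded`, and the per-call costs are all hypothetical and are not part of litellm. The guard only illustrates the check-then-record pattern.

```python
class BudgetExceeded(Exception):
    """Raised when recorded spend has reached the configured cap."""


class BudgetGuard:
    """Hypothetical per-run spend tracker (not part of litellm)."""

    def __init__(self, max_budget_usd: float) -> None:
        self.max_budget_usd = max_budget_usd
        self.spent_usd = 0.0

    def check(self) -> None:
        """Call before each request; raises once the cap has been reached."""
        if self.spent_usd >= self.max_budget_usd:
            raise BudgetExceeded(
                f"spent ${self.spent_usd:.4f} of ${self.max_budget_usd:.2f}"
            )

    def record(self, cost_usd: float) -> None:
        """Call after each request with that request's cost in USD."""
        if cost_usd < 0:
            raise ValueError("cost_usd must be non-negative")
        self.spent_usd += cost_usd


guard = BudgetGuard(max_budget_usd=0.50)
for cost in (0.20, 0.20, 0.20):  # made-up per-call costs
    guard.check()
    guard.record(cost)
# Recorded spend is now above the cap, so the next guard.check() raises.
```

Because the check happens before the request and the cost is only known afterwards, a guard like this can overshoot the cap by at most one request. Treat the cap as a soft limit or leave headroom.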

### Pattern 3: Model tiering
Route by task complexity:
- Simple tasks → claude-haiku (cheapest)
- Standard tasks → claude-sonnet
- Complex/creative → claude-opus (only when justified)

```python
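import litellm

# Map each complexity tier to a model, cheapest first.
MODELS = {
    "simple": "anthropic/claude-haiku-4-5-20251001",
    "standard": "anthropic/claude-sonnet-4-6",
    "intensive": "anthropic/claude-opus-4-6",
}


def route_by_complexity(prompt: str, complexity: str) -> str:
    """Send the prompt to the model for the given tier and return the reply text."""
    if complexity not in MODELS:
        raise ValueError(f"unknown complexity {complexity!r}; expected one of {sorted(MODELS)}")
    response = litellm.completion(
        model=MODELS[complexity],
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content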
```
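
`route_by_complexity` expects the caller to supply the tier. One way to derive it is a cheap heuristic on the prompt itself. The sketch below is only an assumption about what such a classifier could look like: `classify_complexity`, the keyword list, and the word-count thresholds are hypothetical and should be tuned against real traffic.

```python
import re

# Hypothetical keyword hints for work that usually needs the strongest model.
INTENSIVE_HINTS = re.compile(
    r"\b(architect|refactor|prove|design|novel|strategy)", re.IGNORECASE
)


def classify_complexity(prompt: str) -> str:
    """Return 'simple', 'standard', or 'intensive' using rough heuristics."""
    words = len(prompt.split())
    # Keyword hit or a very long prompt: route to the top tier.
    if INTENSIVE_HINTS.search(prompt) or words > 400:
        return "intensive"
    # Medium-length or multi-line prompts: route to the middle tier.
    if words > 40 or prompt.count("\n") > 5:
        return "standard"
    return "simple"


for sample in (
    "Translate 'hello' to French",
    "Refactor this module into smaller services",
):
    print(classify_complexity(sample), "<-", sample)
```

The returned string matches the tier keys used in the routing function, so the two can be chained as `route_by_complexity(prompt, classify_complexity(prompt))`.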

### Pattern 4: Vercel AI SDK (TypeScript)
```typescript
import { anthropic } from "@ai-sdk/anthropic";
import { generateText } from "ai";

const { text } = await generateText({
    model: anthropic("claude-sonnet-4-6"),
    prompt: "Hello",
    maxRetries: 3,
});
```
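
The `maxRetries` option delegates retrying to the SDK. Where a Python caller needs similar behaviour by hand, a minimal sketch with exponential backoff might look like the following. `with_retries` and its parameters are hypothetical, and the flaky function simulates a provider that times out twice before succeeding.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on exception, retry up to max_retries times with backoff.

    Delays double each attempt: base_delay, 2*base_delay, 4*base_delay, ...
    `sleep` is injectable so tests can skip real waiting.
    """
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")


# Demo: fails twice, then succeeds; delays are recorded instead of slept.
calls = {"n": 0}
delays: list[float] = []


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"


result = with_retries(flaky, max_retries=3, base_delay=0.5, sleep=delays.append)
print(result, delays)
```

This sketch retries on any exception for brevity. In practice it is usually safer to retry only transient failures such as timeouts and rate limits, and to let authentication or validation errors fail immediately.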

## When to use each pattern
- Building a tool that should work across multiple LLM providers → Pattern 1
- Enforcing API spend limits → Pattern 2
- Optimizing cost by routing based on task complexity → Pattern 3 + claude-usage-orchestrator
- Building TypeScript/Next.js skills → Pattern 4

## Related skills
- claude-usage-orchestrator (routing decisions)
- sentry-and-otel-setup (observability for LLM calls)