---
name: prompt-engineering-patterns
description: "Use when designing production-LLM prompts — few-shot, chain-of-thought, system prompts, templates, self-verification — distinct from prompt-optimizer and refine-prompt."
source: package
domain: product
status: active
---

# prompt-engineering-patterns

Production patterns for LLM prompts: few-shot, chain-of-thought, system-prompt design, templating, self-verification.

**Distinct surface** from sibling skills:

- [`prompt-optimizer`](../prompt-optimizer/SKILL.md) — polishes a single end-user prompt for ChatGPT / Claude / Gemini.
- [`refine-prompt`](../refine-prompt/SKILL.md) — refines a free-form work prompt into engine-ready acceptance criteria.
- **This skill** — designs prompts that ship inside an application that calls an LLM at runtime.

## When to use

- Designing the system prompt for a new LLM-powered feature.
- Building a few-shot template with dynamic example selection.
- Adding chain-of-thought reasoning to a low-accuracy prompt.
- Reviewing a prompt diff in production code.
- Diagnosing inconsistent LLM outputs that look like prompt drift.

Do NOT use when:

- Polishing a one-off prompt for a chat session — route to `prompt-optimizer`.
- Turning a Jira ticket into engine input — route to `refine-prompt`.
- Tuning a model's weights — this skill is prompt-only, not fine-tuning.

## Decision framework

### Step 1 — Pick the prompt level (progressive disclosure)

```
Start at Level 1; only escalate when measurement says you must.

Level 1  Direct instruction                     "Summarize this article."
Level 2  + constraints (length, format, focus)  "...in 3 bullets, key findings only."
Level 3  + reasoning scaffold                   "Read first, identify findings, then summarize."
Level 4  + few-shot examples                    "Like these examples: ..."
Level 5  + self-verification step               "...then check answer against criteria; revise if fails."
```

Escalating without evidence is over-engineering. Each level adds tokens, latency, and a maintenance surface.

### Step 2 — Structure the prompt

Fixed instruction hierarchy — every production prompt fills these slots in order:

```
[System context]    role, expertise, constraints, safety
[Task instruction]  what to do, in one sentence
[Examples]          few-shot demonstrations (optional)
[Input data]        the user-supplied content
[Output format]     schema, length, citation rules
```

Stable slots (system, task, format) belong in cached prompt prefixes; volatile slots (examples, input) belong in the per-call portion.

### Step 3 — Pick the few-shot strategy

```
Examples are uniform and small (< 20)   → embed all of them; deterministic.
Examples are large or diverse           → semantic-similarity retrieval per call.
Edge cases dominate                     → diversity-sampled examples (cluster + pick one per cluster).
Token budget tight                      → fewer, higher-quality examples beat many mediocre ones.
Examples drift with the data            → regenerate from a labeled corpus on a schedule, not hand-edited.
```

Bad examples are worse than no examples — the model imitates structure. (A sketch combining the Step 2 slot structure with similarity-based example selection follows Step 4.)

### Step 4 — Add chain-of-thought ONLY when measured

CoT improves accuracy on multi-step reasoning, hurts on classification and lookup. Decision rule:

```
Task is multi-step / arithmetic / multi-hop     → add CoT (zero-shot "let's think step by step", or few-shot CoT).
Task is single-step extraction / classify       → CoT adds tokens without lift; skip.
You haven't measured                            → measure first, decide second.
Self-consistency needed (high-stakes answers)   → sample N reasoning paths, majority vote.
```
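A minimal sketch of Steps 2 and 3 combined, under stated assumptions: the role, task, and output-schema strings are illustrative placeholders, `embed` is a toy stand-in that exists only so the sketch runs (swap in your real embedding model or API), and the leading stable slots are kept byte-identical so they can land in a provider prefix cache.

```python
from dataclasses import dataclass

# Step 2 slots: the leading stable run (system + task) is the cacheable prefix.
# NOTE: the output-format slot is also stable, but it sits after volatile slots;
# a byte-identical prefix cache only reaches the slots before the first volatile one.
SYSTEM_CONTEXT = "You are a news analyst. Never invent figures."              # illustrative role
TASK_INSTRUCTION = "Summarize the article in 3 bullets, key findings only."   # illustrative task
OUTPUT_FORMAT = 'Return JSON: {"bullets": [...], "confidence": "high|low"}'   # illustrative schema
STABLE_PREFIX = "\n\n".join([SYSTEM_CONTEXT, TASK_INSTRUCTION])


@dataclass
class Example:
    article: str
    summary: str


def embed(text: str) -> list[float]:
    """Toy character-frequency embedding, a stand-in for a real embedding model."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]


def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def select_examples(pool: list[Example], query: str, k: int = 3) -> list[Example]:
    """Step 3: semantic-similarity retrieval; pick the k examples closest to the input."""
    q = embed(query)
    return sorted(pool, key=lambda ex: cosine(embed(ex.article), q), reverse=True)[:k]


def build_prompt(pool: list[Example], article: str) -> tuple[str, str]:
    """Return (cached stable prefix, volatile per-call suffix) in the Step 2 slot order."""
    shots = select_examples(pool, article)
    examples_block = "\n\n".join(f"Article: {ex.article}\nSummary: {ex.summary}" for ex in shots)
    volatile = "\n\n".join([examples_block, f"Article: {article}", OUTPUT_FORMAT])
    return STABLE_PREFIX, volatile
```

Regenerating `pool` from a labeled corpus on a schedule, rather than hand-editing it, keeps the retrieval step aligned with the drift rule in Step 3.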
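A hedged sketch of the self-consistency branch in Step 4, plus the Step 5 fallback and confidence clauses it feeds: `call_llm` is whatever provider client you already have (its signature is assumed here as `(prompt, temperature) -> completion text`), and the `FINAL ANSWER:` extraction convention is an assumption of this sketch, not a provider feature.

```python
import re
from collections import Counter
from typing import Callable

ANSWER_RE = re.compile(r"FINAL ANSWER:\s*(.+)", re.IGNORECASE)


def self_consistent_answer(
    question: str,
    call_llm: Callable[[str, float], str],  # assumed client: (prompt, temperature) -> completion text
    n: int = 5,
) -> tuple[str, float]:
    """Sample n reasoning paths at temperature > 0 and majority-vote the extracted answers.

    Cost multiplies by n; reserve for high-stakes paths (see Gotchas).
    """
    prompt = (
        "Let's think step by step, then give the result on a line starting with "
        f"'FINAL ANSWER:'.\n\nQuestion: {question}"
    )
    votes: Counter[str] = Counter()
    for _ in range(n):
        completion = call_llm(prompt, 0.7)
        match = ANSWER_RE.search(completion)
        if match:  # ignore paths that never commit to an answer
            votes[match.group(1).strip()] += 1
    if not votes:
        return "I don't know", 0.0  # explicit fallback (Step 5); never let the model invent
    answer, count = votes.most_common(1)[0]
    return answer, count / n  # agreement ratio doubles as a confidence gate for callers
```

The agreement ratio gives downstream code something to gate on without asking the model to grade its own confidence.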
### Step 5 — Build error recovery into the prompt

Production prompts handle their own failure cases:

- Specify the explicit "I don't know" output (don't let the model invent).
- Require a confidence indicator when downstream code needs to gate.
- Define the format for "missing information" so callers can branch.
- For self-verification: specify the criteria, then the revision rule.

### Step 6 — Treat prompts as code

- Version every prompt (file + git, not a wiki page).
- Test on a frozen evaluation set before shipping changes.
- Track P50 / P95 latency, token usage, accuracy, and success rate per version.
- A/B test prompt variants behind a flag; never edit a live prompt without a rollback path.

## Procedure: Apply to a new LLM feature

1. **Inspect** the existing prompt (if any) and the eval set; verify a success metric exists (accuracy / consistency / latency / token cost) — refuse to design without it.
2. Draft a Level-1 prompt (Step 1) and measure on the eval set.
3. Escalate one level at a time (Step 1) until the metric is met or the budget runs out.
4. Lock the structure (Step 2), choose the few-shot strategy (Step 3), decide on CoT (Step 4).
5. Add error-recovery clauses (Step 5).
6. Commit prompt + eval results + chosen version (Step 6); cite this skill.

## Output format

1. Prompt-spec table: slot · content · stable-vs-volatile · cached-vs-per-call.
2. Eval results table: prompt-version · metric · delta-vs-previous.
3. Failure-mode list: trigger · prompt clause that handles it.

## Gotchas

- Few-shot examples leak the model's style — examples that include hedging produce hedging.
- "Let's think step by step" works zero-shot on capable models, fails on smaller models without exemplar reasoning traces.
- Self-consistency (N samples + vote) multiplies cost by N — use it only on high-stakes paths.
- Cached prompt prefixes only cache when byte-identical — a single reformat busts the cache.
- Prompts that drift across model versions silently regress accuracy when the provider rolls a model update; pin the model version OR re-run the eval per release.

## Do NOT

- Do NOT escalate to Level 4 / 5 before measuring at lower levels.
- Do NOT mix few-shot examples from different tasks; the model averages them.
- Do NOT add CoT to single-step classification — it hurts.
- Do NOT hand-edit production prompts without versioning + eval.
- Do NOT echo secrets or PII into the prompt — they end up in provider logs.

## Auto-trigger keywords

- prompt engineering
- few-shot learning
- chain-of-thought
- system prompt design
- prompt template
- LLM prompt versioning
- prompt evaluation

## Provenance

- Adopted from: `Microck/ordinary-claude-skills@8f5c83174f7aa683b4ddc7433150471983b93131:skills_all/prompt-engineering-patterns/SKILL.md` (MIT, © 2025 Microck) — restructured into a decision-framework shape; vendor `prompt_optimizer` Python snippets dropped (project-specific to Microck).
- Cross-linked: [`prompt-optimizer`](../prompt-optimizer/SKILL.md), [`refine-prompt`](../refine-prompt/SKILL.md), [`mcp-builder`](../mcp-builder/SKILL.md), [`async-python-patterns`](../async-python-patterns/SKILL.md).
- Provenance registry: `agents/contexts/skills-provenance.yml` (entry: `prompt-engineering-patterns`).
- Iron-Law floor: `verify-before-complete`, `skill-quality`, `non-destructive-by-default`.