---
name: validate
description: Validate that something actually works by testing it the way a real human user would — with expected inputs and expected outputs, in an agentic loop that retries, inspects, and doesn't stop until it either passes or gives you a clear failure report. Use this skill when the user says "validate this", "test this end to end", "does this actually work", "prove it works", "check this for real", "verify the output", "run a real test", "make sure this works", "validate inputs and outputs", "acceptance test", or when you just finished building something and need to confirm it works before claiming it's done. Also triggers on "/validate". This is not unit testing — this is human-style "click the button and see what happens" validation.
---
# Validate — Human-Style Input/Output Testing
You test things the way a human QA engineer would: give it real inputs, check the real outputs, and don't trust anything until you've seen it with your own eyes.
## Philosophy
Unit tests check code. This skill checks BEHAVIOR. The question isn't "does the function return the right value" — it's "if a human user did X, would they get Y?" That means using the actual interfaces: running the command, hitting the API, opening the browser, sending the message, uploading the file — whatever the real user would do.
The agentic loop is key: if something fails, don't just report failure. Investigate WHY, try to fix it, and re-test. A human QA engineer doesn't file a bug on the first failure — they poke at it, try variations, check if it's a flaky test or a real bug, and THEN report.
## Step 1: Define the Test Spec
Before testing anything, establish what you're testing. Either extract from context or ask the user:
**What are we testing?**
- A skill you just created? A script? An API? A UI? A workflow?
- What's the entry point? (command to run, URL to hit, skill to invoke, file to execute)
**Expected inputs:**
- What does the user/caller provide?
- What are the edge cases? (empty input, huge input, malformed input, unicode, etc.)
- What are the preconditions? (auth required? file must exist? server must be running?)
**Expected outputs:**
- What should the user see/get back?
- How do you verify it? (check stdout? read a file? inspect the database? look at the browser?)
- What does failure look like? (error message? wrong output? hang? crash?)
Build a test matrix:
```markdown
## Test Spec: [what we're testing]
| # | Test Case | Input | Expected Output | How to Verify |
|---|-----------|-------|-----------------|---------------|
| 1 | Happy path | [normal input] | [expected result] | [verification method] |
| 2 | Edge case: empty | [empty/null] | [graceful error or default] | [verification method] |
| 3 | Edge case: large | [oversized input] | [handles correctly] | [verification method] |
| 4 | Error case | [bad input] | [clear error message] | [verification method] |
| 5 | Real-world scenario | [realistic user input] | [what user expects] | [verification method] |
```
Show this to the user: "Here's my test plan. Anything to add or change?"
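If it helps to keep the spec machine-readable, the same matrix can be captured as plain data before anything runs. A minimal sketch in Python; the field names mirror the columns above and every value is a hypothetical placeholder, not something this skill prescribes.
```python
# The test matrix as data, so the loop in Step 2 can iterate over it.
# Every value here is a hypothetical placeholder.
test_spec = [
    {"id": 1, "case": "Happy path",       "input": "sample.csv",
     "expected": "Processed 100 rows",    "verify": "stdout contains message"},
    {"id": 2, "case": "Edge case: empty", "input": "",
     "expected": "graceful error",        "verify": "stderr + non-zero exit code"},
    {"id": 3, "case": "Error case",       "input": "not-a-csv.bin",
     "expected": "clear error message",   "verify": "stderr mentions the bad input"},
]
```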
## Step 2: Execute the Agentic Test Loop
For each test case, run this loop:
```
┌─────────────────┐
│ PREPARE │ Set up preconditions, clean state
└────────┬────────┘
▼
┌─────────────────┐
│ EXECUTE │ Run the actual thing with the input
└────────┬────────┘
▼
┌─────────────────┐
│ OBSERVE │ Capture ALL output (stdout, files, UI, side effects)
└────────┬────────┘
▼
┌─────────────────┐
│ COMPARE │ Does actual output match expected?
└────────┬────────┘
▼
┌────┴────┐
│ PASS? │
└────┬────┘
YES │ NO
│ ▼
│ ┌──────────────────┐
│ │ INVESTIGATE │ WHY did it fail?
│ │ - Read error msgs │
│ │ - Check logs │
│ │ - Inspect state │
│ └────────┬─────────┘
│ ▼
│ ┌─────┴─────┐
│ │ FIXABLE? │
│ └─────┬─────┘
│ YES │ NO
│ │ ▼
│ │ Record failure
│ │ with evidence
│ ▼
│ ┌──────────┐
│ │ FIX │ Apply the fix
│ └────┬─────┘
│ ▼
│ ┌──────────┐
│ │ RE-TEST │ Run the same test case again
│ └────┬─────┘
│ │
│ └──→ (back to EXECUTE, max 3 retries)
▼
Record PASS
with evidence
```
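The same loop, sketched as code. This is only an illustration of the control flow, assuming the caller supplies `execute`, `compare`, and `try_fix` functions; those names are placeholders, not part of any tool this skill uses.
```python
from typing import Callable

MAX_ATTEMPTS = 3  # max 3 fix attempts per test case, then it's a real bug

def validate_case(execute: Callable[[], dict],
                  compare: Callable[[dict], bool],
                  try_fix: Callable[[dict], bool]) -> dict:
    """Run one test case through the EXECUTE/OBSERVE/COMPARE/FIX loop."""
    observed: dict = {}
    for attempt in range(1, MAX_ATTEMPTS + 1):
        observed = execute()           # EXECUTE + OBSERVE: run it, capture everything
        if compare(observed):          # COMPARE: actual vs expected
            return {"result": "PASS", "attempts": attempt, "evidence": observed}
        if not try_fix(observed):      # INVESTIGATE + FIX: stop early if not fixable
            break
    return {"result": "FAIL", "attempts": attempt, "evidence": observed}
```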
### How to execute each test type:
**Testing a skill:**
- Invoke it via the Skill tool with the test input
- Capture what it does (tool calls, files created, output)
- Compare against expected behavior
**Testing a script:**
- Run it via Bash with the test input
- Capture stdout, stderr, exit code
- Check output files if applicable
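For example, a single script run might be checked like this. A hedged sketch; the script name, arguments, and output file are hypothetical.
```python
import pathlib
import subprocess

# Run the script the way a user would, with a real input file (hypothetical names)
result = subprocess.run(
    ["python", "process.py", "--input", "sample.csv"],
    capture_output=True, text=True, timeout=60,
)
print("exit code:", result.returncode)   # non-zero usually signals failure
print("stdout:", result.stdout)          # what the user would actually see
print("stderr:", result.stderr)          # errors and warnings live here
# Check output files if the script is supposed to produce them
print("output file exists:", pathlib.Path("output.json").exists())
```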
**Testing an API/endpoint:**
- Hit it with curl or the appropriate MCP tool
- Check response status, body, headers
- Verify side effects (database changes, files created)
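A standard-library-only sketch of that check; the endpoint, payload, and expected fields are placeholders for whatever the real API defines.
```python
import json
import urllib.request

req = urllib.request.Request(
    "http://localhost:8000/api/items",                 # hypothetical endpoint
    data=json.dumps({"name": "test item"}).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req, timeout=10) as resp:
    body = json.loads(resp.read())
    print("status:", resp.status)                              # response status
    print("content-type:", resp.headers.get("Content-Type"))   # response headers
    print("has id field:", "id" in body)                       # expected body shape
# Side effects (database rows, files written) still need their own check.
```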
**Testing a UI/browser flow:**
- Use browser automation (agent-browser / Playwright MCP) to:
- Navigate to the page
- Input the test data
- Click the buttons
- Screenshot the result
- Read the page text
- Compare against expected
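Inside Claude Code this is normally driven through the agent-browser / Playwright MCP tools, but the same flow written as a plain Playwright script looks roughly like this; the URL, selectors, and expected text are hypothetical.
```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("http://localhost:3000/login")         # navigate to the page
    page.fill("#email", "user@example.com")          # input the test data
    page.fill("#password", "hunter2")
    page.click("button[type=submit]")                # click the buttons
    page.screenshot(path="after-login.png")          # screenshot the result
    body_text = page.inner_text("body")              # read the page text
    print("logged in:", "Welcome" in body_text)      # compare against expected
    browser.close()
```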
**Testing a workflow (multi-step):**
- Execute step by step
- Verify each intermediate state
- Check the final output
### Investigation rules:
When a test fails, don't just say "it failed." Investigate like a human would:
1. **Read the error message carefully** — what does it actually say?
2. **Check the logs** — is there more context in stderr, log files, or console output?
3. **Inspect the state** — did the input get processed at all? Where did it break?
4. **Try a variation** — does a slightly different input work? This isolates the problem.
5. **Check preconditions** — is auth valid? Is the server running? Is the file accessible?
### Fix-and-retry rules:
- **Max 3 fix attempts per test case** — after that, it's a real bug, not a flaky test
- **Only fix things within scope** — if the test reveals the code is wrong, fix the code. If the test itself is wrong, fix the test spec (and tell the user).
- **Document every fix** — what was wrong, what you changed, did the re-test pass
- **Don't silently change expected outputs** — if the actual output is different but arguably correct, flag it for the user to decide
## Step 3: Report Results
After all test cases, present a results table:
```markdown
## Validation Report: [what we tested]
| # | Test Case | Result | Attempts | Notes |
|---|-----------|--------|----------|-------|
| 1 | Happy path | ✅ PASS | 1/1 | Output matched exactly |
| 2 | Empty input | ✅ PASS | 2/3 | Failed first time (missing null check), fixed, re-passed |
| 3 | Large input | ⚠️ PASS with caveat | 1/1 | Works but takes 8s for large files |
| 4 | Bad input | ✅ PASS | 1/1 | Clear error message returned |
| 5 | Real-world | ❌ FAIL | 3/3 | Encoding issue with unicode chars — needs fix |
**Overall: 4/5 passing (1 failure)**
### Failures requiring attention:
**Test 5 — Real-world scenario**
- Input: [what was provided]
- Expected: [what should have happened]
- Actual: [what actually happened]
- Root cause: [what investigation revealed]
- Evidence: [screenshot, log output, error message]
- Suggested fix: [what needs to change]
### Fixes applied during testing:
- Test 2: Added null check in [file:line] — empty input now returns graceful error
```
### Evidence rules:
Every result needs EVIDENCE — not just "it passed." Show:
- **For PASS**: the actual output (or a screenshot), proving it matches
- **For FAIL**: the error, the expected vs actual, and investigation findings
- **For fixes**: what was changed and the before/after
## Step 4: Regression Check (if fixes were applied)
If you fixed anything during testing, re-run ALL test cases — not just the one that failed. A fix for test 2 might break test 1. This is a full regression pass:
```markdown
## Regression Pass (after fixes)
| # | Test Case | Before Fix | After Fix |
|---|-----------|------------|-----------|
| 1 | Happy path | ✅ | ✅ (still passing) |
| 2 | Empty input | ❌ | ✅ (fixed) |
| 3 | Large input | ✅ | ✅ (still passing) |
| 4 | Bad input | ✅ | ✅ (still passing) |
| 5 | Real-world | ❌ | ❌ (still failing — separate issue) |
```
## Adapting Scope
**Quick validation** (user says "just check if it works"):
- 1-2 test cases: happy path + one edge case
- Skip the full test matrix, just run and report
**Thorough validation** (user says "really test this" or you just built something complex):
- Full test matrix with 5-8 cases
- Edge cases, error cases, real-world scenarios
- Fix-and-retry loop with regression
**Continuous validation** (used with /loop):
- Define a "smoke test" — the single most important test case
- Run it on an interval to catch regressions
- Alert only on failure
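One way that interval loop could look, sketched with placeholders for the smoke-test command and the interval:
```python
import subprocess
import time

SMOKE_TEST = ["python", "process.py", "--input", "sample.csv"]  # hypothetical command
INTERVAL_SECONDS = 300                                          # e.g. every 5 minutes

while True:
    result = subprocess.run(SMOKE_TEST, capture_output=True, text=True)
    if result.returncode != 0:
        # Alert only on failure; stay quiet while the smoke test keeps passing
        print("SMOKE TEST FAILED:\n", result.stderr)
    time.sleep(INTERVAL_SECONDS)
```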
## When to Use This Skill Proactively
You should suggest `/validate` after:
- Creating a new skill (test it before declaring it done)
- Writing a script that processes data (test with real data)
- Building an API endpoint (test with real requests)
- Fixing a bug (test the fix AND test that nothing else broke)
- Any time you're about to say "it should work" — test it instead