Drive a structured evaluation iteration loop for any DevAI-Hub skill - capture user intent, write test prompts, run the skill against a baseline (no-skill) control, grade outputs against assertions, aggregate results into a benchmark, view them in a browser, collect feedback, and improve the skill across iterations until the pass rate stabilizes. Use whenever the user wants to evaluate a skill, benchmark a skill, A/B test a skill, optimize a skill description, run an eval set, score a skill against test prompts, iterate on a skill, or "make this skill actually work" - even if they never say the word "eval". Covers workspace layout, eval-prompt authoring, paired with-skill / without-skill runs, grading via assertions, browser-based human review, feedback capture, and the description-optimizer integration. SKIP one-off prompt tests with no comparison, ad-hoc skill drafting that does not need iteration, and simple unit-test runs against deterministic code.
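
The core of the loop is the paired run: each eval prompt is executed twice, once with the skill loaded and once without, and both outputs are graded against the same assertions, so the resulting benchmark measures the skill's marginal effect rather than raw model quality. The sketch below illustrates that shape only; `run_model`, `EvalCase`, and substring-based grading are hypothetical stand-ins, since the actual DevAI-Hub invocation and grading APIs are not specified here:

```python
from dataclasses import dataclass

@dataclass
class EvalCase:
    prompt: str
    assertions: list[str]  # substrings the output must contain to pass

def run_model(prompt: str, skill: str | None) -> str:
    """Hypothetical stand-in for the real DevAI-Hub invocation.

    skill=None is the baseline (no-skill) control arm.
    """
    raise NotImplementedError

def grade(output: str, case: EvalCase) -> bool:
    # A case passes only if every assertion appears in the output.
    return all(a in output for a in case.assertions)

def run_paired_eval(cases: list[EvalCase], skill: str) -> dict[str, int]:
    # Run each case in both arms and aggregate pass counts into a benchmark.
    results = {"with_skill": 0, "without_skill": 0, "total": len(cases)}
    for case in cases:
        if grade(run_model(case.prompt, skill=skill), case):
            results["with_skill"] += 1
        if grade(run_model(case.prompt, skill=None), case):
            results["without_skill"] += 1
    return results
```

Because both arms share the same cases and the same grader, the gap between the `with_skill` and `without_skill` pass counts is attributable to the skill itself - the quantity the iteration loop tries to improve until it stabilizes across iterations.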