Typical spread: the score is acceptable. The "spread" is the standard deviation of the three judges' average scores. A small spread means the judges agree; a large spread means the score should be taken with a grain of salt.
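Here is a minimal POSIX shell sketch of the arithmetic that computes the spread from the judges' average scores. The page does not say which form of standard deviation it uses, so this sketch assumes the population form (dividing by n). `spread` is a hypothetical helper name.

```shell
#!/bin/sh
# spread: standard deviation of the judges' average scores.
# Assumption: population standard deviation (divide by n); the page does
# not say which form it uses. "spread" is a hypothetical helper name.
spread() {
  [ "$#" -gt 0 ] || return 1          # need at least one score
  printf '%s\n' "$@" | awk '
    { x[n++] = $1; sum += $1 }        # store each score, accumulate the sum
    END {
      mean = sum / n
      for (i = 0; i < n; i++) { d = x[i] - mean; ss += d * d }
      printf "%.2f\n", sqrt(ss / n)   # population standard deviation
    }'
}

# Three judges averaging 7.0, 7.5 and 8.0 agree closely:
spread 7.0 7.5 8.0   # prints 0.41
```

With the sample form (dividing by n - 1), the same three scores give 0.50. For a set of only three judges, the choice of form noticeably changes the number.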
Free SKILL.md scraped from GitHub. Clone the repo or copy the file directly into your Claude Code skills directory.
npx versuz@latest install e2e-testing
git clone https://github.com/WenJunDuan/Rlues.git
mkdir -p ~/.claude/skills/e2e-testing
cp Rlues/vibeCoding/codex/8.9/skills/e2e-testing/SKILL.md ~/.claude/skills/e2e-testing/SKILL.md

“Output is a bare JSON array with zero implementation, no crawler logic, and no evidence of robots.txt compliance, filtering, or deduplication. Penalty rules E (well-formed but no specifics) and C (generic, ignores task) apply. A real developer cannot use this; it's a mock result, not a working solution. Major gaps across instruction-following, correctness, and completeness.”
“Output is well-formed but lacks specifics like robots.txt handling and deduplication, reducing completeness and usefulness.”
“The output matches the requested JSON shape but fails to provide the required observational evidence and performs no checks, so correctness, completeness, and usefulness are low and the claim is overconfident.”
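---
name: e2e-testing
description: E2E testing, T phase (Path C+)
---
1. Check that Playwright is installed
2. Extract the key user flows from plan.md
3. Write or update the E2E tests
4. Run the tests; on failure, rerun them (at most 3 rounds)
5. Path C+: use the chrome-devtools MCP to assist with browser debugging

Fallback: curl/fetch API smoke tests.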
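Steps 1 and 4 can be sketched as a small POSIX shell script. It assumes a Node project with `@playwright/test` installed locally, so the CLI lives at `./node_modules/.bin/playwright`, and it reruns the whole suite rather than only the failing tests. `playwright_installed` and `run_with_rounds` are hypothetical helper names, not part of the skill.

```shell
#!/bin/sh
# Sketch of steps 1 and 4 of the e2e-testing skill.
# Assumption: Node project with @playwright/test installed locally.
# playwright_installed and run_with_rounds are hypothetical helper names.

PLAYWRIGHT=./node_modules/.bin/playwright

# Step 1: succeed only if the local Playwright CLI exists and runs.
playwright_installed() {
  [ -x "$PLAYWRIGHT" ] && "$PLAYWRIGHT" --version >/dev/null 2>&1
}

# Step 4: run the given command up to $1 times, stopping at the first
# successful round; return 1 if every round fails.
run_with_rounds() {
  max_rounds=$1
  shift
  round=1
  while [ "$round" -le "$max_rounds" ]; do
    echo "E2E round $round/$max_rounds: $*" >&2
    if "$@"; then
      return 0
    fi
    round=$((round + 1))
  done
  return 1
}

if playwright_installed; then
  run_with_rounds 3 "$PLAYWRIGHT" test || echo "E2E still failing after 3 rounds" >&2
else
  echo "Playwright not found; fall back to API smoke tests" >&2
fi
```

Playwright's own `--retries` option (for example `npx playwright test --retries=2`) retries individual failing tests within a single run, which is a finer-grained alternative to rerunning the whole suite.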
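The curl fallback can be sketched with curl alone. The script requests each endpoint and requires a 2xx status code from each one. The base URL and paths are hypothetical placeholders, and `smoke_check` and `smoke_test` are illustrative names, not part of the skill.

```shell
#!/bin/sh
# Fallback sketch: curl-based API smoke test for when Playwright is unavailable.
# Assumption: the service exposes plain HTTP endpoints; the base URL and
# paths are hypothetical placeholders.

# Request one URL and succeed only on a 2xx status code.
smoke_check() {
  # -s: quiet, -o /dev/null: discard the body, -w: print only the status code.
  # curl prints 000 when it gets no response; default to 000 if curl is absent.
  status=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1") || true
  [ -n "$status" ] || status=000
  case "$status" in
    2??) echo "ok   $status $1"; return 0 ;;
    *)   echo "FAIL $status $1" >&2; return 1 ;;
  esac
}

# Check every path under a base URL; fail if any check fails.
smoke_test() {
  base=$1
  shift
  failures=0
  for path in "$@"; do
    smoke_check "$base$path" || failures=$((failures + 1))
  done
  echo "$failures failing endpoint(s)" >&2
  [ "$failures" -eq 0 ]
}

# Usage (hypothetical URL and paths):
# smoke_test "http://localhost:3000" /health /api/items
```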