---
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
name: bench-compare
description: Run a Pinot JMH benchmark twice — once on a baseline commit, once with the current changes — and report both sets of results side by side. Uses a git worktree so the user's working tree is never disturbed.
---
# /bench-compare
Purpose: when a change claims a performance impact (principle C6.7 — "performance-sensitive changes require benchmark comparisons"), produce the before/after numbers in one command, without making the user manually stash, checkout, run, un-stash, and re-run.
Usage:
- `/bench-compare BenchmarkDictionary` — compares current working tree vs. `merge-base HEAD upstream/master` (falls back to `origin/master` if upstream missing).
- `/bench-compare BenchmarkDictionary <baseline-ref>` — compare against an explicit ref (commit, tag, branch).
- `/bench-compare BenchmarkDictionary --args "-wi 1 -i 2 -f 1 -r 5s -w 5s"` — pass extra JMH args. **Always use short warmup/iteration flags for a first pass**; defaults run for hours or days.
**Time expectations.** Pinot benchmarks are not quick. The default JMH config in `pinot-perf` is 8 warmup × 60s + 8 measurement × 60s, all × 5 forks, per parameter combination — a single `@Benchmark` method can report an ETA of multiple days. The skill will refuse to run unless either (a) explicit `--args` reduce the warmup/iteration counts, or (b) the user confirms they really do want the full default run.
## Procedure
1. **Locate the benchmark.** Glob for `pinot-perf/**/<benchmarkName>.java`. If zero or multiple matches, report and stop. The benchmark class must be under `pinot-perf`.
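A minimal lookup sketch (`$benchmark_name` is an illustrative variable holding the skill argument; `find` stands in for the glob):
```
# Resolve the benchmark class; refuse on zero or multiple matches.
matches=$(find pinot-perf -name "${benchmark_name}.java")
count=$(printf '%s' "$matches" | grep -c . || true)
if [ "$count" -ne 1 ]; then
  printf 'error: expected 1 match for %s, found %s:\n%s\n' "$benchmark_name" "$count" "$matches" >&2
  exit 1
fi
```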
2. **Resolve the baseline ref.**
- Default: `git merge-base HEAD upstream/master`. If the `upstream` remote isn't defined, fall back to `origin/master`. If neither resolves, ask the user for an explicit ref.
- If the user passed a ref, validate it with `git rev-parse --verify <ref>`.
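A sketch of the default resolution, assuming the remotes are named `upstream`/`origin` as described above (`$baseline` is an illustrative variable):
```
if git rev-parse --verify -q upstream/master >/dev/null; then
  baseline=$(git merge-base HEAD upstream/master)
elif git rev-parse --verify -q origin/master >/dev/null; then
  baseline=$(git merge-base HEAD origin/master)
else
  echo "error: neither upstream/master nor origin/master resolves; ask for an explicit ref" >&2
  exit 1
fi
```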
3. **Prepare output directory.** `mkdir -p .bench-compare/` and append it to the repo's `.gitignore` if not already there. The two runs will write `baseline-<short-sha>.txt` and `current-<short-sha-or-WIP>.txt` here.
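A sketch, reusing `$baseline` from the step-2 sketch:
```
mkdir -p .bench-compare
# Append the ignore entry only if it isn't already present (exact-line match).
grep -qxF '.bench-compare/' .gitignore 2>/dev/null || echo '.bench-compare/' >> .gitignore
short_sha=$(git rev-parse --short "$baseline")
```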
4. **Warn and confirm.** Benchmarks take real time. Inspect `--args` — if the user hasn't passed iteration controls, warn that the default suite can take hours to days and suggest a starter like `-wi 1 -i 2 -f 1 -r 5s -w 5s`. Print an estimate of the pair of runs (rough: a 5s-warmup × 5s-measurement × 1 fork run takes ~30–120s per `@Benchmark` method after the Pinot-side `@Setup` completes; `@Setup` alone can run for 1–10 minutes for benchmarks that build segments). Ask the user to confirm.
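A rough heuristic for the iteration-control check; this is a sketch only, and `$jmh_args` is an illustrative variable holding the `--args` string:
```
case " $jmh_args " in
  *" -wi "*|*" -i "*|*" -f "*|*" -r "*|*" -w "*)
    : ;;  # iteration controls present; proceed to the estimate
  *)
    echo "WARNING: no -wi/-i/-f/-r/-w in --args; the default suite can run for hours to days." ;;
esac
```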
5. **Build pinot-perf in a baseline worktree.** This avoids touching the working tree:
```
git worktree add /tmp/pinot-bench-baseline <baseline-ref>
(cd /tmp/pinot-bench-baseline && ./mvnw -pl pinot-perf -am package -DskipTests)
```
The package goal produces the jars, an appassembler-generated launcher (for ~21 blessed benchmark classes) at `pinot-perf/target/pinot-perf-pkg/bin/pinot-<BenchmarkClass>.sh`, and a fat `lib/` directory.
6. **Run the baseline benchmark.** Two invocation styles exist; always use the first:
**Preferred — always use JMH's own Main class:**
```
java -Xms4G -Xmx8G -cp '/tmp/pinot-bench-baseline/pinot-perf/target/pinot-perf-pkg/lib/*' \
org.openjdk.jmh.Main 'org.apache.pinot.perf.<BenchmarkClass>' \
-wi 1 -i 2 -f 1 -r 5s -w 5s \
-jvmArgsAppend='-XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/jdk.internal.misc=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.api=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.file=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.parser=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.tree=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.util=ALL-UNNAMED -Dio.netty.tryReflectionSetAccessible=true' \
> .bench-compare/baseline-<short-sha>.txt 2>&1
```
Why not use the generated `pinot-<BenchmarkClass>.sh`?
- It hard-codes `-Xms24G -Xmx24G` — OOMs on <32GB machines.
- The benchmark's own `main()` (which the script invokes) typically constructs `OptionsBuilder` directly and **ignores CLI args**, so you can't override warmup/iterations or pass `-jvmArgsAppend`. Going through `org.openjdk.jmh.Main` bypasses the custom main and gets you JMH's standard CLI.
- The `--add-opens`/`--add-exports` flags are mandatory for any benchmark that extends `BaseClusterIntegrationTest` (i.e., spins up a Pinot cluster) on JDK 21 — without them, ZK startup fails with `InaccessibleObjectException` wrapped as `ExceptionInInitializerError`.
For the vector suite (`BenchmarkVectorIndex`) use the `exec:java` form from `pinot-perf/README.md`; it has its own quirks.
7. **Build and run the current tree.** Two mandatory gotchas:
- **Always clean first:** `./mvnw -pl pinot-perf clean package -DskipTests` (note the `clean`; see the next bullet for when `-am` is needed). If `pinot-perf/target/pinot-perf-pkg/` already exists from a prior build on a different branch/ref, incremental `package` leaves stale third-party jars in `lib/` when a dependency version changes upstream. Those stale jars sit on the classpath alongside the new ones (e.g. `zookeeper-3.9.4.jar` and `zookeeper-3.9.5.jar`) and cause `NoSuchMethodError` at runtime. Crucially, Helix/Pinot swallows the resulting `ExceptionInInitializerError` in ZK startup and surfaces a misleading `ZkTimeoutException: timeout: 1000` instead — which looks for all the world like a flaky port or timing issue. If you see that exception, **check `lib/` for duplicate versions of `zookeeper-*`, `helix-*`, `netty-*`, etc. first.**
- **Use `-am` only on the first build.** After the first clean+package, upstream modules are populated; subsequent builds can skip `-am`. The worktree build in step 5 gets a fresh `target/` so doesn't have this problem.
Invocation is identical to step 6, just against the current tree's `lib/*`:
```
./mvnw -pl pinot-perf clean package -DskipTests -am
java -Xms4G -Xmx8G -cp 'pinot-perf/target/pinot-perf-pkg/lib/*' \
org.openjdk.jmh.Main 'org.apache.pinot.perf.<BenchmarkClass>' <same JMH + jvmArgsAppend flags> \
> .bench-compare/current-<sha-or-WIP>.txt 2>&1
```
8. **Clean up the worktree.** `git worktree remove /tmp/pinot-bench-baseline --force`. Do this even if steps 6 or 7 failed.
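One way to make the cleanup unconditional is to register a trap before step 6; a sketch:
```
# Runs on any exit (success, failure, interrupt); --force also removes a dirty worktree.
trap 'git worktree remove /tmp/pinot-bench-baseline --force 2>/dev/null || true' EXIT
```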
9. **Diff the results.** Parse JMH's table output (the `Benchmark ... Score Error Units` lines) from both files. Produce a table:
```
Benchmark Baseline (ops/s) Current (ops/s) Δ Δ%
foo.methodA 1234.5 ± 12.1 1478.2 ± 15.3 +243.7 +19.7%
foo.methodB 987.6 ± 8.0 992.1 ± 7.2 +4.5 +0.4%
```
Use "ops/s", "ns/op", or whatever unit JMH emits — don't convert.
10. **Report.** Print the table. Flag any benchmark where `|Δ%| > 2×error%` as likely a real change (otherwise probably noise). Include the paths to the raw files so the user can share them in a PR.
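A minimal parsing sketch for steps 9 and 10. It assumes plain JMH table output (result rows of the form `name ... score ± error units` after the final `Benchmark ...` header) and uses the baseline error for the 2× noise rule; runs without an error column would need a looser filter:
```
extract() {  # print "name score error units" from the final JMH results table
  awk '/^Benchmark /{tbl=1; next} tbl && /±/ {print $1, $(NF-3), $(NF-1), $NF}' "$1"
}
extract .bench-compare/baseline-*.txt | sort > /tmp/base.txt
extract .bench-compare/current-*.txt  | sort > /tmp/curr.txt
# join on benchmark name: fields become  name bScore bErr bUnit cScore cErr cUnit
join /tmp/base.txt /tmp/curr.txt | awk '{
  delta  = $5 - $2
  pct    = ($2 != 0) ? 100 * delta / $2 : 0
  errpct = ($2 != 0) ? 100 * $3    / $2 : 0
  flag   = (pct < 0 ? -pct : pct) > 2 * errpct ? "  <-- likely real" : ""
  printf "%-45s %12.1f ±%8.1f %12.1f ±%8.1f %+9.1f %+7.1f%%%s\n", $1, $2, $3, $5, $6, delta, pct, flag
}'
```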
## Notes
- **Stale-jar trap is the #1 source of mysterious failures.** When a benchmark fails on a subsequent `/bench-compare` run in the same repo, the cause is almost always duplicate third-party jars in `pinot-perf/target/pinot-perf-pkg/lib/` — typically `zookeeper-X.jar` + `zookeeper-Y.jar` (or the equivalent for helix, netty, guava) from different builds. The failure shows up as a deeply wrapped `ZkTimeoutException: Unable to connect to zookeeper server within timeout: 1000` or a similar infrastructure-looking error, with the underlying `NoSuchMethodError` swallowed. First diagnostic when a second run fails: `ls pinot-perf/target/pinot-perf-pkg/lib/ | sort | awk -F- '{v=$NF; sub("\\.jar$","",v); k=$0; sub("-"v"\\.jar$","",k); print k}' | sort | uniq -d` to spot duplicates. The fix is always `./mvnw -pl pinot-perf clean package -DskipTests -am`, never `rm` individual jars.
- Worktrees require a clean `.git`. If the repo is in the middle of a rebase/merge, abort with a clear message.
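A sketch of that check, relying on standard `.git` internals (the `rebase-merge`/`rebase-apply` directories and the `MERGE_HEAD` file):
```
git_dir=$(git rev-parse --git-dir)
if [ -d "$git_dir/rebase-merge" ] || [ -d "$git_dir/rebase-apply" ] || [ -f "$git_dir/MERGE_HEAD" ]; then
  echo "error: a rebase or merge is in progress; finish or abort it before running /bench-compare" >&2
  exit 1
fi
```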
- **JDK 21 needs the full `--add-opens` / `--add-exports` flag set** for any cluster-backed benchmark (extends `BaseClusterIntegrationTest`). Without them, ZK startup fails with `InaccessibleObjectException: ... module java.base does not "opens java.lang"`. Pass via `-jvmArgsAppend=...` to `org.openjdk.jmh.Main`; CI's `pinot_tests.yml` has the canonical list.
- **The generated `pinot-<BenchmarkClass>.sh` scripts hard-code `-Xms24G -Xmx24G`.** Avoid them — use `java -cp 'lib/*' org.openjdk.jmh.Main <FQN>` directly with your own `-Xmx`.
- **Not every benchmark has a generated script.** The appassembler programs list in `pinot-perf/pom.xml` covers ~21 of ~60 benchmark classes. The direct `java -cp` invocation works for any of them.
- JMH's `-l` (list benchmarks) flag **does not help here** — Pinot benchmark classes have custom `main()` entry points that construct `OptionsBuilder` directly, ignore CLI args, and plunge straight into `Runner.run(...)` which in turn kicks off `@Setup`. For `BenchmarkDictionary` this `@Setup` alone burns 5+ minutes building dictionaries. There is no fast sanity-check short of actually running the benchmark through `org.openjdk.jmh.Main` (which at least accepts `-wi 1 -i 1 -r 1s -w 1s` to minimise it).
- Both runs must happen on the same hardware, the same JDK, and under comparable system load. Warn the user if they're on battery power or running other heavy processes.
- Do not `sleep` between runs for "timing" reasons. If a second run fails, it is the stale-jar issue (above), not TIME_WAIT. I spent a long time chasing the timing hypothesis before spotting the classpath mismatch.
- If the benchmark's output format isn't plain JMH (e.g. `BenchmarkVectorIndex` writes a custom report), don't try to parse it — just save both outputs and tell the user where they are, with a note that manual comparison is needed.
- Never use `git stash` instead of a worktree. Stash can be lost if the second build fails and the user doesn't know to pop it.