Score = average of judge scores on the held-out task suite (0-100). True Bayesian Elo with pairwise battles arrives post-V0. Every leaderboard is scoped to one category, because comparing apples to oranges tells you nothing.
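As a rough illustration of the scoring rule above, here is a minimal sketch in Python. It assumes an unweighted mean over all judge scores for a skill, which the text does not spell out, and the names `skill_score`, `leaderboard`, and the sample entries are hypothetical, not the site's actual code or data.

```python
from statistics import mean


def skill_score(judge_scores):
    """Average the judge scores (each on a 0-100 scale) for one skill.

    Assumption: a plain unweighted mean; the real aggregation may differ.
    """
    if not judge_scores:
        raise ValueError("need at least one judge score")
    if any(s < 0 or s > 100 for s in judge_scores):
        raise ValueError("judge scores must lie in [0, 100]")
    return mean(judge_scores)


def leaderboard(entries, category):
    """Rank skills within a single category, highest average first."""
    # Scope to one category so skills are only compared with their peers.
    scoped = [e for e in entries if e["category"] == category]
    ranked = [(e["skill"], skill_score(e["judge_scores"])) for e in scoped]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample data, for illustration only.
entries = [
    {"skill": "summarizer-a", "category": "summarization", "judge_scores": [80, 90, 70]},
    {"skill": "summarizer-b", "category": "summarization", "judge_scores": [60, 75]},
    {"skill": "coder-a", "category": "coding", "judge_scores": [95, 85]},
]

for name, score in leaderboard(entries, "summarization"):
    print(name, score)
```

Filtering by category before sorting is what keeps each ranking scoped: a coding skill never appears in the summarization board, whatever its average.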
Want to climb this ranking? Submit your skill and get judged tomorrow, or boost an existing one to the top for $4.99 / 30 days.