Method: flat-decode + CEK evaluation of 89 Plutus scripts. Each VM is measured with its own native benchmark framework. Runs are sequential, inside a single Docker container.
Rankings (with penalty for missing scripts)
A VM that fails or skips a script is assigned the slowest competitor's time for that script. This prevents gaming the leaderboard by selectively skipping slow scripts.
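The penalty rule can be sketched as follows. This is a minimal illustration, not the benchmark's actual code: the `times[vm][script]` layout, the use of `None` for a failed or skipped script, and the name `apply_penalty` are all assumptions made for the example. How the filled-in times are then aggregated into a ranking is not covered here.

```python
from typing import Dict, Optional

# Hypothetical layout: times[vm][script] is a time (any consistent unit),
# or None / missing when the VM failed or skipped that script.
Times = Dict[str, Dict[str, Optional[float]]]


def apply_penalty(times: Times) -> Dict[str, Dict[str, float]]:
    """Fill each failed or skipped result with the slowest successful time."""
    scripts = sorted({s for per_vm in times.values() for s in per_vm})
    filled: Dict[str, Dict[str, float]] = {vm: {} for vm in times}
    for script in scripts:
        # Successful times recorded by any VM for this script.
        ok = [per_vm[script] for per_vm in times.values()
              if per_vm.get(script) is not None]
        if not ok:
            continue  # no VM succeeded, so there is no time to penalise with
        slowest = max(ok)
        for vm, per_vm in times.items():
            result = per_vm.get(script)
            filled[vm][script] = slowest if result is None else result
    return filled
```

Under this rule, skipping a script can never score better than the worst VM that actually ran it.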
Rankings (raw, no penalty)
The geometric mean is computed only over scripts for which every VM has a successful result. This ensures a fair like-for-like comparison.
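A minimal sketch of the raw computation, under the same illustrative assumptions: `times[vm][script]` holds a positive time or `None` for a failure or skip, and `raw_geomean` is a hypothetical name rather than part of the benchmark harness.

```python
import math
from typing import Dict, Optional, Set


def raw_geomean(times: Dict[str, Dict[str, Optional[float]]]) -> Dict[str, float]:
    """Geometric mean per VM over the scripts every VM completed."""
    common: Optional[Set[str]] = None
    for per_vm in times.values():
        ok = {s for s, t in per_vm.items() if t is not None}
        common = ok if common is None else common & ok
    if not common:
        raise ValueError("no script was completed by every VM")
    # exp(mean(log t)) is the geometric mean; it assumes all times are > 0.
    return {
        vm: math.exp(sum(math.log(per_vm[s]) for s in common) / len(common))
        for vm, per_vm in times.items()
    }
```

Because only the intersection of successful scripts is used, a script that any single VM fails drops out of the raw ranking for all VMs.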