labeille can compare two test runs and show what changed and why.
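When a package's status changes between runs, for example from PASS to CRASH, labeille compares the commit of the package's repo in both runs. If the commit is the same, the package's code did not change, so the cause is likely a CPython/JIT regression. If the commit differs, the package itself might be responsible: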
requests: PASS → CRASH
Repo: abc1234 → abc1234 (unchanged — likely a CPython/JIT regression)
flask: CRASH → PASS
Repo: 222bbbb → 333cccc (changed)
This makes it possible to summarize a comparison with statements such as "3 of these are JIT regressions".
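
The rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not labeille's actual implementation: the `RunResult` type, the `classify_change` function, and the label strings are hypothetical names introduced here, and the treatment of transitions other than PASS to CRASH is a guess.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    """Hypothetical record of one package's outcome in one test run."""
    package: str
    status: str  # e.g. "PASS" or "CRASH"
    commit: str  # commit of the package's repo that was tested


def classify_change(old: RunResult, new: RunResult) -> str:
    """Guess the likely cause of a status change between two runs."""
    if old.status == new.status:
        return "unchanged"
    if old.commit != new.commit:
        # The package's own code changed, so it might explain the difference.
        return "package changed"
    if old.status == "PASS" and new.status == "CRASH":
        # Same package code, new crash: the interpreter is the likely variable.
        return "likely CPython/JIT regression"
    # Assumption: any other transition on the same commit is attributed
    # to the interpreter without calling it a regression.
    return "likely CPython/JIT change"


# The two packages from the sample output above.
pairs = [
    (RunResult("requests", "PASS", "abc1234"), RunResult("requests", "CRASH", "abc1234")),
    (RunResult("flask", "CRASH", "222bbbb"), RunResult("flask", "PASS", "333cccc")),
]

labels = {old.package: classify_change(old, new) for old, new in pairs}
regressions = sum(label == "likely CPython/JIT regression" for label in labels.values())
print(f"{regressions} of these are likely JIT regressions")
```

Counting the labels is what produces a one-line summary; checking the commit before the status keeps a changed package from being blamed on the interpreter.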