labeille can compare 2 test runs and show what changed and why it changed.

When it goes from PASS to CRASH, labeille looks at the package's repo. If the commit is the same, it's a CPython/JIT regression. Otherwise, it might be the package:

requests: PASS → CRASH
Repo: abc1234 → abc1234 (unchanged — likely a CPython/JIT regression)

flask: CRASH → PASS
Repo: 222bbbb → 333cccc (changed)

This allows figuring out "3 of these are JIT regressions".

I built labeille to find CPython JIT crashes, but it's a "run real world test suites at scale" platform.

It also works for:
— Checking which packages pass their tests on a new CPython version
— Testing free-threaded (no-GIL) CPython compatibility
— Measuring coverage.py or memray overhead across hundreds of packages
— Comparing CPython vs PyPy performance on real code

The registry of 350+ packages with install/test commands is the core.

0

If you have a fediverse account, you can quote this note from your own instance. Search https://mastodon.social/users/danzin/statuses/116148601036225419 on your instance and quote it. (Note that quoting is not supported in Mastodon.)