I've been working on a new Python tool: labeille. Its main purpose is to look for CPython JIT crashes by running real-world test suites.
https://github.com/devdanzin/labeille
But it's grown a feature that might interest more people: benchmarking using PyPI packages.
How does that work?
labeille lets you run test suites in two different configurations, say with coverage on and off, or with memray on and off. Here's an example:
https://gist.github.com/devdanzin/63528343df98779b5fedf657bf8286cd
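The basic idea of an on/off comparison can be sketched in a few lines of Python. To be clear, this is not labeille's implementation or API. It's a minimal illustration that assumes you simply time the same test command under two configurations several times and compare the medians. The `run_config` and `compare` helpers and the example commands are hypothetical.

```python
import statistics
import subprocess
import sys
import time


def run_config(cmd: list[str], repeats: int = 5) -> list[float]:
    """Run `cmd` `repeats` times and return wall-clock durations in seconds.

    Hypothetical helper for illustration only; not part of labeille.
    """
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        # Discard output; raise CalledProcessError if the run fails.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def compare(baseline: list[str], treatment: list[str],
            repeats: int = 5) -> dict[str, float]:
    """Time two configurations of the same run and report the relative overhead.

    Hypothetical helper: uses the median of each set of runs to damp noise.
    """
    base = statistics.median(run_config(baseline, repeats))
    treat = statistics.median(run_config(treatment, repeats))
    return {
        "baseline_s": base,
        "treatment_s": treat,
        "overhead_pct": (treat - base) / base * 100,
    }


# Example usage (assumes pytest and coverage.py are installed and that you are
# in a checkout of the package whose test suite you want to measure):
# plain = [sys.executable, "-m", "pytest", "-q"]
# with_cov = [sys.executable, "-m", "coverage", "run", "-m", "pytest", "-q"]
# print(compare(plain, with_cov))
```

Taking the median of several runs, rather than a single timing, makes the comparison less sensitive to one-off noise such as a cold disk cache. A real tool would also need to handle flaky tests and environment setup.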