My favorite local model right now is a bit of a surprise to me: I'm really enjoying the relatively tiny Qwen3-8B, running the 4-bit quantized version on my Mac using MLX.

It's surprisingly capable given it's a 4.3GB download and uses just 4-5GB of RAM while it's running.
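For anyone who wants to try the same setup, a minimal sketch using the mlx-lm package (the model identifier `mlx-community/Qwen3-8B-4bit` is an assumption based on the mlx-community naming convention for quantized releases):

```shell
# Install the MLX language-model tooling (Apple Silicon Macs only)
pip install mlx-lm

# Download the 4-bit quantized Qwen3-8B and run a prompt against it
# (model name assumed from the mlx-community Hugging Face convention)
mlx_lm.generate \
  --model mlx-community/Qwen3-8B-4bit \
  --prompt "Write a haiku about local LLMs"
```

The first run downloads the model weights from Hugging Face; subsequent runs load them from the local cache.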

simonwillison.net/2025/May/2/q

