More interesting progress trying to make swad suitable for very busy sites!

I realized that TLS (with both libraries I tried) is a *major* bottleneck. With TLS enabled, I couldn't get past 3000 requests per second with somewhat acceptable response times (most below 500ms). With TLS disabled, I could really see the impact of a lockfree queue as opposed to one protected by a mutex. With the mutex, up to around 8000 req/s could be reached on the same hardware. And with a lockfree design, that quickly went beyond 10k req/s, but crashed. 😆

So I read some scientific papers 🙈 ... and redesigned a lot (*). And now it finally seems to work. My latest test reached a throughput of almost 25k req/s, with response times below 10ms for most requests! I really didn't expect to see *this* happen. 🤩 Maybe it could do even more, didn't try yet.

Open issue: Can I do something about TLS? There *must* be some way to make it perform at least a *bit* better...

(*) edit: Here's the design I finally used, with a much simplified "dequeue" because the queues in question are guaranteed to have only a single consumer: dl.acm.org/doi/10.1145/248052.
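
To illustrate why a single consumer makes "dequeue" so much simpler, here is a minimal sketch in C11. It is not swad's actual code and not the algorithm from the linked paper: it uses a simpler exchange-based variant (often attributed to Dmitry Vyukov), and all names (`MpscQueue`, `mpsc_push`, `mpsc_pop`, ...) are made up for this example. Producers never take a mutex, but strictly speaking this variant is not fully lockfree: a producer stalled between its two atomic steps briefly hides later items from the consumer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical node and queue types, for illustration only. */
typedef struct Node {
    _Atomic(struct Node *) next;
    void *value;
} Node;

typedef struct MpscQueue {
    _Atomic(Node *) tail; /* producers append here */
    Node *head;           /* dummy node, touched by the single consumer only */
} MpscQueue;

bool mpsc_init(MpscQueue *q)
{
    Node *stub = malloc(sizeof *stub);
    if (!stub) return false;
    atomic_init(&stub->next, NULL);
    stub->value = NULL;
    q->head = stub;
    atomic_init(&q->tail, stub);
    return true;
}

/* May be called from any number of threads concurrently. */
bool mpsc_push(MpscQueue *q, void *value)
{
    assert(q != NULL);
    Node *node = malloc(sizeof *node);
    if (!node) return false;
    atomic_init(&node->next, NULL);
    node->value = value;
    /* Step 1: claim the tail position with a single atomic exchange. */
    Node *prev = atomic_exchange_explicit(&q->tail, node,
                                          memory_order_acq_rel);
    /* Step 2: link the previous node to the new one, which publishes it. */
    atomic_store_explicit(&prev->next, node, memory_order_release);
    return true;
}

/* Must only ever be called from ONE thread (the single consumer). */
bool mpsc_pop(MpscQueue *q, void **out)
{
    Node *head = q->head;
    Node *next = atomic_load_explicit(&head->next, memory_order_acquire);
    if (!next) return false; /* empty, or a push is not yet linked */
    *out = next->value;
    q->head = next;          /* "next" becomes the new dummy node */
    free(head);              /* safe: its producer finished linking it */
    return true;
}

/* Call only after all producers have stopped. */
void mpsc_destroy(MpscQueue *q)
{
    void *ignored;
    while (mpsc_pop(q, &ignored)) { /* drop remaining items */ }
    free(q->head);
    q->head = NULL;
}
```

Because only one thread ever reads or writes `head`, the dequeue side needs no compare-and-swap loop at all, and the old dummy node can be freed right away since no producer still holds a reference to it once its `next` pointer is set.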

Image: Throughput curve of my latest stress test of swad (with ramp-up and ramp-down phase)
Image: Response times in percentiles. 97% stay below 10ms!