An update on the state of our instance.
We were very successful tonight in getting the database under control.
It took a gargantuan effort, and I thank everyone in the Discord for all their help. This was a cross-instance effort, and I'm so humbled by that.
@RichardNairn,
@andrewdemarsh,
@quaff,
@ThaMunstaMike Johnston,
@controlc, and so many others: we all put our brains together.
As it turns out, we were missing a couple of indexes in the database (now 364GB, down from 480GB), autovacuum wasn't cleaning up a few tables frequently enough, and our concurrency (the number of connections to the database) was too high.
All of this came to a head last Friday, as if there was too much water for a waterfall.
There are about 600,000 jobs left for the instance to process to catch up with the fediverse, so if your notifications or posts are two hours behind, that's why. That latency is decreasing with time.
I'm hopeful that come the morning, when our traffic comes back, we'll still be in great shape.
🇨🇦