random thought: have any performance enthusiasts tried building a fast cache or similar data structure that self-tunes based on perf counters or access timings? like "oh, cache set 21 has too much contention, let's try to leave that set mostly empty from now on" or "this was supposed to fit into L2 but clearly it's all been evicted because the other code running in between is too L2-heavy, let's resize to something more appropriate given that we can only persist in L3"
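to make the idea a bit more concrete, here's a rough sketch of just the decision logic, not a real implementation. everything in it is an assumption for illustration: the names (`set_index`, `contended_sets`, `SelfTuningBudget`), the tier sizes and latency ceilings, and the simple modulo set indexing (real last-level caches often hash addresses, so that mapping may not hold). the miss counts and latency samples are fed in by hand; on real hardware they would have to come from perf counters or timed probes.

```python
from dataclasses import dataclass

LINE_SIZE = 64   # assumed cache line size in bytes
NUM_SETS = 64    # assumed set count (e.g. 32 KiB, 8-way, 64 B lines)

# Hypothetical tiers: (name, byte budget, latency ceiling in ns per access).
# Placeholder numbers, not measurements of any real CPU.
TIERS = [
    ("L1", 32 * 1024, 2.0),
    ("L2", 512 * 1024, 6.0),
    ("L3", 8 * 1024 * 1024, 25.0),
]


def set_index(addr: int) -> int:
    """Set an address maps to, assuming plain modulo indexing (no hashing)."""
    return (addr // LINE_SIZE) % NUM_SETS


def contended_sets(miss_counts: dict, factor: float = 4.0) -> set:
    """Return sets whose miss count exceeds `factor` times the mean."""
    if not miss_counts:
        return set()
    mean = sum(miss_counts.values()) / len(miss_counts)
    return {s for s, n in miss_counts.items() if n > factor * mean}


@dataclass
class SelfTuningBudget:
    """Retarget the next cache level after repeated too-slow samples."""
    tier: int = 1        # index into TIERS; start out aiming for L2
    patience: int = 3    # consecutive slow samples before giving up on a tier
    bad_streak: int = 0

    @property
    def capacity(self) -> int:
        return TIERS[self.tier][1]

    def observe(self, ns_per_access: float) -> int:
        """Feed one latency sample; return the byte budget to use next."""
        if ns_per_access > TIERS[self.tier][2]:
            self.bad_streak += 1
        else:
            self.bad_streak = 0
        # Only step outward; the last tier has nowhere further to go.
        if self.bad_streak >= self.patience and self.tier < len(TIERS) - 1:
            self.tier += 1
            self.bad_streak = 0
        return self.capacity
```

the structure would then skip placing hot entries at addresses where `set_index(addr)` lands in `contended_sets(...)`, and resize itself to whatever `observe(...)` returns. the `patience` counter is there so a single noisy sample doesn't trigger a resize; a real version would probably also want a way to move back to a smaller tier once the interfering code stops running.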