Objective-C's fast enumeration protocol is one of the most underrated parts of the language and something I wish other systems would learn from.
It's designed for building fast loops that iterate over a collection while staying agnostic to the kind of collection, without needing a lot of run-time specialisation of the loop.
When you write for (object in collection), it is expanded to a nested loop. The outer loop is responsible for filling a buffer, the inner loop for iterating over the buffer.
The outer loop makes one dynamically dispatched call, which returns the number of elements available, so the dynamic dispatch cost is amortised over the number of elements returned. The method is passed a caller-provided buffer and the size of that buffer, along with a state structure. That structure contains a pointer to the elements to iterate over and some space for storing other iteration state, as well as a pointer that can be checked to ensure that concurrent mutation has not occurred.
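To make that concrete, here's a rough C++ sketch of the shape of the expansion. It's an analogue, not the real thing: EnumerationState loosely mirrors NSFastEnumerationState, while Enumerable, next_run, ArrayCollection and sum_all are names made up for illustration, and int stands in for the object type. The collection is the simplest possible one, a contiguous array that hands back everything in a single run.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical analogue of NSFastEnumerationState (field names are illustrative).
struct EnumerationState {
    unsigned long state = 0;                   // 0 on the first call; owned by the collection after that
    const int *items = nullptr;                // start of the current run of elements
    const unsigned long *mutations = nullptr;  // pointee changes if the collection is mutated
    unsigned long extra[5] = {};               // scratch space for the collection
};

// Abstract collection: one virtual call per run, not one per element.
struct Enumerable {
    virtual ~Enumerable() = default;
    // Returns the number of elements readable at st.items, or 0 when finished.
    // buf/len is scratch space owned by the caller; the collection may ignore it.
    virtual std::size_t next_run(EnumerationState &st, int *buf, std::size_t len) = 0;
};

// Contiguous storage: expose the whole array as one run and ignore the buffer.
struct ArrayCollection : Enumerable {
    std::vector<int> data;
    unsigned long version = 0;  // a real collection would bump this on every mutation
    explicit ArrayCollection(std::vector<int> d) : data(std::move(d)) {}
    std::size_t next_run(EnumerationState &st, int *, std::size_t) override {
        if (st.state != 0) return 0;  // the single run was already handed out
        st.state = 1;
        st.items = data.data();
        st.mutations = &version;
        return data.size();
    }
};

// Roughly what a loop like `for (x in collection) total += x;` turns into.
long sum_all(Enumerable &c) {
    EnumerationState st;
    int buf[16];
    long total = 0;
    bool first = true;
    unsigned long seen = 0;
    // Outer loop: one dynamically dispatched call per run.
    while (std::size_t n = c.next_run(st, buf, 16)) {
        if (first) { seen = *st.mutations; first = false; }
        // Inner loop: plain pointer arithmetic, no dispatch.
        for (std::size_t i = 0; i < n; ++i) {
            assert(*st.mutations == seen);  // stand-in for the mutation check
            total += st.items[i];
        }
    }
    return total;
}
```

The caller never learns what kind of collection it has. All it sees is a count and a pointer, so the inner loop is the same tight loop regardless of how the elements are stored.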
If a collection stores data in a single contiguous block (for example, something like std::vector) of the correct type, the outer loop runs once, fetches that pointer, and then the inner loop scans through it. If the collection stores data in a set of buckets, it can return each bucket as a run. If it stores each element separately, then each dynamically dispatched call can copy up to as many elements as the caller's buffer holds into that buffer, which can then be processed quickly by the inner loop.
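Here's a minimal C++ sketch of the last two storage strategies behind one interface. Everything in it is hypothetical (RunState, Enumerable, next_run, Bucketed, Linked, Node and sum_all are illustrative names), the mutation check is left out to keep it short, and the linked nodes are assumed to be owned elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical iteration state, loosely modelled on NSFastEnumerationState.
template <typename T>
struct RunState {
    unsigned long cursor = 0;     // opaque progress marker owned by the collection
    const T *items = nullptr;     // start of the current run
    const void *extra = nullptr;  // scratch space for the collection
};

template <typename T>
struct Enumerable {
    virtual ~Enumerable() = default;
    // Returns the number of elements readable at st.items, or 0 when finished.
    virtual std::size_t next_run(RunState<T> &st, T *buf, std::size_t len) = 0;
};

// Bucketed storage: each call hands back one non-empty bucket in place.
template <typename T>
struct Bucketed : Enumerable<T> {
    std::vector<std::vector<T>> buckets;
    std::size_t next_run(RunState<T> &st, T *, std::size_t) override {
        while (st.cursor < buckets.size()) {
            const std::vector<T> &b = buckets[st.cursor++];
            if (!b.empty()) { st.items = b.data(); return b.size(); }
        }
        return 0;
    }
};

template <typename T>
struct Node { T value; const Node *next; };

// One element per node: copy up to len elements into the caller's buffer.
template <typename T>
struct Linked : Enumerable<T> {
    const Node<T> *head = nullptr;  // nodes are owned elsewhere in this sketch
    std::size_t next_run(RunState<T> &st, T *buf, std::size_t len) override {
        // Resume from the node remembered by the previous call, if any.
        const Node<T> *n = st.cursor == 0 ? head
                                          : static_cast<const Node<T> *>(st.extra);
        st.cursor = 1;
        std::size_t count = 0;
        for (; n != nullptr && count < len; n = n->next) buf[count++] = n->value;
        st.extra = n;    // where the next call picks up
        st.items = buf;  // this run lives in the caller's buffer
        return count;
    }
};

// The consumer is compiled for T but works with any Enumerable<T>.
template <typename T>
T sum_all(Enumerable<T> &c) {
    RunState<T> st;
    T buf[4];
    T total{};
    while (std::size_t n = c.next_run(st, buf, 4))
        for (std::size_t i = 0; i < n; ++i) total += st.items[i];
    return total;
}
```

With a four-element buffer, a six-node list costs three virtual calls (runs of four and two, then a zero to finish) instead of one per element. The bucketed collection costs one call per non-empty bucket plus the final zero, and never copies anything.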
Pretty much every C++ forward iterator that I've implemented has ended up being some specialised variant of this. ICU's UText has a similar pattern, and I've seen something equivalent implemented in an ad-hoc way in a lot of different places. Being able to write code that iterates over a collection of T and can dynamically dispatch to any collection, even if it's compile-time specialised on the type of T, is incredibly useful for avoiding tight coupling across ABI boundaries.