Cloudflare outage should not have happened, and they seem to be missing the point on how to avoid it in the future
Yet again, another global IT outage happen (deja vu strikes again in our industry). This time at cloudflare(Prince 2025). Again, taking down large swats of the internet with it(Booth 2025).
And yes, like my previous analysis of the GCP and CrowdStrike’s outages, this post critiques Cloudflare’s root cause analysis (RCA), which — despite providing a great overview of what happened — misses the real lesson.
Here’s the key section of their RCA:
ebellani.github.io