X, Spotify, ChatGPT among services taken offline today
X, Spotify, PayPal and ChatGPT were among hundreds of websites and services taken offline today after cloud infrastructure provider Cloudflare suffered a global outage.
Cloudflare had posted notices for scheduled maintenance today at data centres in Atlanta, Chile and Tahiti. However, at 11:17 this morning the company said it was experiencing issues with its support portal provider, leaving customers facing slower response times. A further update at 11:48 said Cloudflare was experiencing “an internal service degradation”. At this point users lost access for about half an hour before some services began to reappear.
By 1:35 error levels had returned to “pre-incident rates”. At 2:42 Cloudflare posted: “A fix has been implemented and we believe the incident is now resolved. We are continuing to monitor for errors to ensure all services are back to normal.”
Appropriately, one of the affected services was Down Detector, which tracks the status of websites around the world.
Responses on social media have ranged from memes depicting engineers as cats to calls for greater decentralisation of infrastructure.
Posting on X, @KuroukaRun said: “One company, multiple failure points, Zero geographic isolation, cascade effect to dependent services. This wasn’t a small incident. This was a systemic failure that exposed how fragile internet infrastructure is.”
Also on X, Cloudflare CTO Dane Knecht emphasised the outage was not the result of a cyber attack, posting: “…earlier today we failed our customers and the broader Internet when a problem in @Cloudflare network impacted large amounts of traffic that rely on us. The sites, businesses, and organizations that rely on Cloudflare depend on us being available and I apologize for the impact that we caused.
“Transparency about what happened matters, and we plan to share a breakdown with more details in a few hours. In short, a latent bug in a service underpinning our bot mitigation capability started to crash after a routine configuration change we made. That cascaded into a broad degradation to our network and other services. This was not an attack.
“That issue, impact it caused, and time to resolution is unacceptable. Work is already underway to make sure it does not happen again, but I know it caused real pain today. The trust our customers place in us is what we value the most and we are going to do what it takes to earn that back.”
“This is exactly the kind of upstream failure that should be on every Irish organisation’s risk register,” said George Foley, spokesperson for Eset Ireland. “Cloudflare sits quietly in front of a huge slice of the Internet, and when it stumbles, the impact is instant. Your own servers and applications can be healthy, but staff and customers are locked out because DNS lookups, routing or security checks are failing somewhere else. If you build your entire digital front door on a single provider, you are accepting their bad day as your outage… Outages like today are not rare black swan events. They are a predictable operational risk that belongs right beside ransomware and fraud in board conversations.”
TechCentral Reporters


