Anthropic's 12-hour outage
Last Thursday, the Anthropic API was unstable or down for almost 12 hours.
Orgs that had set up fallbacks for their Anthropic requests saw almost no failures: 99.86% of their requests succeeded, and only 0.14% failed.
These users had set up fallbacks to route their requests to (1) OpenAI, (2) Azure, (3) Gemini, and (4) a bunch of hosted Llama models.
The 0.14% of requests that failed still offer some key lessons:
- Some users hadn't configured fallbacks for the 529 status code (which had spiked the most that day)
- A few had improperly set up fallback targets (expired keys, non-existent targets)
- In rare cases, even the fallback target failed (pro tip: always have multiple options!)
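To make the three lessons above concrete, here is a minimal Python sketch of a fallback chain. It is not Portkey's API. Every name in it (`Target`, `Response`, `validate_targets`, `call_with_fallback`) is hypothetical, and the exact set of status codes that triggers a fallback is an assumption, apart from 529, which is the code that spiked during the outage. The config check only catches missing keys and a chain that is too short. Catching an expired key would need a live probe request, which is left out here.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional, Sequence, Tuple

# Status codes that trigger a fallback. 529 is listed explicitly because it
# is easy to forget; the rest of the set is an assumption for illustration.
FALLBACK_STATUS_CODES: FrozenSet[int] = frozenset({429, 500, 502, 503, 504, 529})


@dataclass
class Response:
    status: int
    body: str = ""


@dataclass
class Target:
    name: str
    api_key: Optional[str]
    send: Callable[[str], Response]  # hypothetical transport: prompt -> Response


def validate_targets(targets: Sequence[Target]) -> List[str]:
    """Return configuration problems that can be found without sending traffic."""
    problems = []
    if len(targets) < 2:
        problems.append("fewer than two targets configured; no fallback available")
    for target in targets:
        if not target.api_key:
            problems.append(f"{target.name}: missing API key")
    return problems


def call_with_fallback(
    prompt: str,
    targets: Sequence[Target],
    fallback_on: FrozenSet[int] = FALLBACK_STATUS_CODES,
) -> Tuple[str, Response]:
    """Try targets in order; return (target name, response) from the first
    target whose status is not in fallback_on."""
    last_error = "no targets configured"
    for target in targets:
        try:
            response = target.send(prompt)
        except Exception as exc:  # transport errors also move to the next target
            last_error = f"{target.name}: {exc!r}"
            continue
        if response.status in fallback_on:
            last_error = f"{target.name}: HTTP {response.status}"
            continue
        return target.name, response
    raise RuntimeError(f"all targets failed; last error: {last_error}")


# Usage with stub transports: the primary returns 529, so the backup answers.
primary = Target("primary", "key-a", lambda prompt: Response(529))
backup = Target("backup", "key-b", lambda prompt: Response(200, "ok"))

assert validate_targets([primary, backup]) == []
served_by, response = call_with_fallback("hello", [primary, backup])
print(served_by)  # prints "backup"
```

Listing more than one backup in `targets` covers the rare case where the first fallback is failing too, and running `validate_targets` at deploy time surfaces a broken chain before an outage does.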
Check out our fallback documentation to protect your app from going down with an LLM API failure again → Portkey Fallback Docs