Failure Index / Rate limits & availability · warning · LangChain · OpenAI Agents SDK · CrewAI

Agent tool fails with HTTP 500 / 502 / 503 from the upstream service

The service behind the tool is having a bad day — this failure is on their side, not in your agent. The real question is whether your agent handles it gracefully or corrupts the rest of the run.

The error

tool 503 service unavailable agent
langchain tool 500 internal server error
502 bad gateway agent api call

Root cause

Upstream outage or degradation. Not caused by the agent's inputs or configuration — but agents without retry/fallback logic turn a transient blip into a failed run.

The fix

Retry with backoff (5xx errors are usually transient), check the provider's status page if it persists, and make the agent degrade gracefully — skip the step or use a fallback tool instead of aborting the whole run.
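
The retry-then-degrade pattern described above could look roughly like the sketch below. It is framework-agnostic and is not the API of LangChain, the OpenAI Agents SDK, or CrewAI. `UpstreamError`, `call_with_retry`, and `run_step` are hypothetical names introduced for illustration, and the sketch assumes your tool wrapper raises an exception carrying the HTTP status code.

```python
import random
import time


class UpstreamError(Exception):
    """Hypothetical exception a tool wrapper raises for an HTTP error response."""

    def __init__(self, status):
        super().__init__(f"upstream returned HTTP {status}")
        self.status = status


def call_with_retry(tool, *, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call tool(); retry 5xx responses with exponential backoff plus jitter."""
    for attempt in range(attempts):
        try:
            return tool()
        except UpstreamError as err:
            last_try = attempt == attempts - 1
            if err.status < 500 or last_try:
                raise  # not a server error, or out of attempts
            # Base delay doubles each attempt; jitter adds up to 100% on top.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))


def run_step(primary, fallback=None, **retry_kwargs):
    """Try the primary tool; on persistent 5xx use the fallback or skip the step."""
    try:
        return call_with_retry(primary, **retry_kwargs)
    except UpstreamError as err:
        if err.status < 500:
            raise  # client-side errors should surface, not be hidden
        if fallback is not None:
            return fallback()
        return None  # assumption: the caller treats None as "step skipped"
```

The attempt count and base delay are illustrative defaults, not recommendations from any provider. Returning `None` for a skipped step is one possible convention; an explicit sentinel or result object may suit a larger agent better.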

Preventing it next time

Wrap tool calls in retry-with-backoff by default, and distinguish transient failures (5xx, timeouts) from permanent ones (most 4xx) in your error handling. Only transient failures deserve retries.
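
One way to encode the transient-versus-permanent distinction is a small classifier that the retry logic consults before trying again. This is a sketch under assumptions: `is_transient` is a hypothetical helper, the `status` attribute is an assumed convention for how your tool wrapper exposes the HTTP code, and treating 408 and 429 as retryable goes beyond what this page states.

```python
# Assumption: these two 4xx codes signal a temporary condition
# (request timeout, rate limiting), so they are retried like 5xx.
RETRYABLE_4XX = {408, 429}


def is_transient(error):
    """Return True when a failure is worth retrying."""
    # Built-in network-level failures: no response arrived at all.
    if isinstance(error, (TimeoutError, ConnectionError)):
        return True
    # Hypothetical convention: HTTP errors carry a numeric .status attribute.
    status = getattr(error, "status", None)
    if status is None:
        return False  # unknown errors are treated as permanent
    return 500 <= status <= 599 or status in RETRYABLE_4XX
```

Defaulting unknown errors to "permanent" keeps the agent from retrying bugs in its own code; flip that default only if your tools raise untyped errors for network problems.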

