ADR-021: Provider Fallback Chain
- Status
-
Accepted
- Date
-
2026-04
- Authors
-
Netresearch DTT GmbH
Context
A single misbehaving provider (OpenAI rate-limit, Claude outage, local Ollama daemon not running) previously bubbled up as an uncaught exception to every consuming extension. Operators had no built-in way to degrade gracefully to a second or third provider.
Decision
A configuration's fallback_chain column stores an ordered JSON list of
other Llm identifiers. On retryable failures during
Llm or
complete, a Fallback walks the
chain and returns the first successful response — or throws
Fallback carrying every attempt error.
"Retryable" is narrowly defined: the request might succeed against a different provider.
Provider— network / timeout / HTTP 5xx / retries exhaustedConnection Exception Providerwith HTTP code 429 — this provider is rate-limiting us, another might not beResponse Exception
Everything else (authentication, bad request, unsupported feature, misconfiguration) bubbles up unchanged — a different provider won't help.
Scope limitations (v1)
- Streaming is not wrapped. Once the first chunk has been yielded, we
cannot swap providers mid-stream.
streamcalls the primary adapter directly.Chat With Configuration () - Shallow only. A fallback configuration's own chain is ignored. This
prevents both cycles (
a -> b -> a) and exponential blow-up of attempts. - Inactive fallbacks are skipped, not treated as failures.
- Missing identifiers are skipped with a warning log, not treated as failures. Misconfiguration should not mask outages.
Storage
The chain is stored as a single JSON column to keep the schema change
minimal and avoid an additional relation table. The
Netresearch\ value object handles
serialization, deduplication, and order preservation.
TCA presents the field as a JSON textarea for v1. A richer UI (sortable multi-select of available configurations) can replace the textarea without schema or API change.
Alternatives considered
- Fat middleware pipeline (as in b13/aim). Rejected for this release — too invasive for a single-feature change. The middleware pattern remains on the roadmap as a v1.0 refactor; a fallback chain is the most valuable pipeline step users ask for and works fine as a standalone service.
- Recursive chain resolution (fallback's fallback). Rejected as the cost (cycle detection, attempt amplification) outweighs the benefit; operators can always append to the primary's chain directly.
- Per-link retry policy (per fallback: max retries, backoff, which exceptions). Rejected as over-engineered for the initial release.