ADR-038: Tool runtime (function-calling agent loop)

Status: Accepted
Date: 2026-06-29
Authors: Netresearch DTT GmbH

Context

nr-llm completion has been single-shot: one request, one answer. The tool protocol value objects already existed — ToolSpec and ToolCall (ADR-010), OpenAI-wire-aligned — and LlmServiceManager::chatWithTools() could send tool declarations and read the model's tool calls back. But there was no registry of executable tools, no PHP that runs a tool, and no loop that feeds a tool result back into the conversation. A model could ask to call a tool; nothing answered.

Worse, chatWithTools() cannot be the loop's engine. It resolves its provider from the ExtensionConfiguration['nr_llm']['providers'] keyed registry and runs against a model-less transient configuration. That registry is not populated for chat (providers, models and configurations are DB-backed). The consequences are concrete:

For keyed providers (Claude, Gemini, Groq, Mistral, OpenRouter) there is no registered API key, so the request is sent unauthenticated and fails with HTTP 401.
Every provider runs on its hardcoded default model, never the model the admin selected on the LlmConfiguration.
Cost is computed downstream by UsageMiddleware from the priced Model; a model-less transient config records zero-cost usage, so the budget cost bucket never sees the spend.

So the agent loop cannot reach a selected configuration's vault key, model, temperature, system prompt or pricing through the provider-key path. A config-aware entry point is required before a loop is safe to run.

Decision

A DI-tagged tool registry. ToolInterface (Classes/Service/Tool/) declares four methods: getSpec(): ToolSpec, execute(array $arguments): string, isEnabledByDefault(): bool (curated low-risk tools return true; secret- or system-exposing tools return false, so they are opt-in) and requiresAdmin(): bool (admin-only gating for tools that surface system, host or cross-user data). The last two are central to the fail-open/fail-closed security model below. The interface carries #[AutoconfigureTag('nr_llm.tool')]. ToolRegistry collects every tagged tool through an autowired iterator and indexes it by spec name; a duplicate name is a developer error and throws a LogicException at construction. An extension adds a tool simply by tagging a class, with no edit to a central registration list. The registry is the authoritative allow-set: specs($allowedNames) intersects the declared names with what is actually registered and drops the rest.
A config-aware tool entry point. LlmServiceManager::chatWithToolsForConfiguration() mirrors chatWithConfiguration(): it resolves the adapter from the LlmConfiguration (vault key, real Model and params), checks that the adapter is an instanceof ToolCapableInterface, and runs through the middleware pipeline, so UsageMiddleware sees the priced model and records real cost. It is an additive method on LlmServiceManagerInterface (no break for existing consumers) and is the only call the loop makes per round.
A bounded agent loop. ToolLoopService::runLoop() calls chatWithToolsForConfiguration() each iteration; while the model returns tool calls it executes them and re-sends, bounded by a configurable max-iteration cap (constructor default 5). Three fail-soft rules keep the admin informed instead of aborting:
- An empty offered set (no tools, or an empty allow-list) is served by a single plain chatWithConfiguration() completion, because an empty tools array makes some providers (OpenAI, for one) reject the request with HTTP 400.
- Hitting the cap with tools still pending triggers one final plain chatWithConfiguration() call (no tools field at all) to synthesise a closing answer, and sets truncated = true. Unlike toolChoice='none' or an empty tools array, a completion without tools yields a real finalContent uniformly across OpenAI, Claude and Ollama.
- A BudgetExceededException thrown mid-loop returns the partial ToolLoopResult (trace and usage so far, truncated = true). The budget check fires pre-flight, before a request is sent, and the tools are read-only, so no inconsistent state is left behind.
Raw-array message turns; ChatMessage unchanged. The loop appends the assistant tool_calls turn, and one tool result turn per call, as raw arrays. LlmServiceManager::normaliseMessages() routes only arrays with exactly the two keys {role, content} through ChatMessage; the 3-key tool turns pass through unchanged to OpenAI and Claude. Empty arguments serialise to {} (an object), never []. OllamaProvider translates the replayed OpenAI-shape turns into Ollama's native /api/chat shape (object arguments, tool_call_id dropped). Because Ollama returns no call ids and ToolCall rejects an empty id, it also synthesises an id (call_<index>) for each tool call it reads from Ollama's response.
Skill.allowed_tools is a fail-closed-on-declaration allow-list. AllowedToolsResolver reads the effective skills of the configuration and task (enabled, non-orphaned and deduplicated, i.e. exactly the set SkillComposer injects). If no skill declares allowed-tools, it returns null (no skill-imposed restriction, so all registered tools are eligible). If any skill declares them, the result is the union of the declared lists; a lone declared empty list therefore yields [] (no tools). The allow-list is enforced twice, when computing the offered specs() and again at execution time, so a model steered by injected skill prose cannot call a registered-but-not-offered tool.
Authorization is enforced in the runtime against the acting backend user, not only in the playground. Because ToolLoopService runs tools on behalf of a backend request (and a future non-admin consumer could be wired to it), every tool declares requiresAdmin(). The loop resolves the acting user from $GLOBALS['BE_USER'] and, when that user is not an admin, filters every admin-only tool out of the offered set (fail-closed: an unknown tool name is treated as admin-only). Admin-only tools are those exposing system, host or cross-user data: fetch_logs, get_env / get_env_raw, get_php_info / get_php_info_raw, list_be_users / list_be_users_raw, list_be_groups and read_fal_asset_meta. Tools that read user-scoped records and are usable by a non-admin instead enforce the acting user's own TYPO3 permissions inside execute(): get_pagetree applies getPagePermsClause(Permission::PAGE_SHOW) and get_tca filters tables by check('tables_select', …). An admin bypasses both, since TYPO3 admins see everything. Queries use the default restriction set (no blanket removeAll()), so soft-deleted rows never surface; the admin-only be_users / be_groups listings keep removeAll() plus an explicit deleted = 0 so that disabled users remain visible for auditing.
Generic error egress, detail logged server-side. An exception thrown by a tool, an unknown or disallowed tool name, and any unexpected provider failure all become a generic error string. An exception message may carry DBAL/PDO credentials that URL sanitising would not strip, so it never reaches the provider or the DOM; the full detail is logged through the injected logger.
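The registry decision above can be sketched in plain PHP. This is a minimal, hypothetical version, not nr-llm's code: ToolSpec is reduced to a stand-in that only needs its name (the real ADR-010 value object carries the full function schema), and the #[AutoconfigureTag('nr_llm.tool')] attribute plus the autowired tagged iterator are replaced by a plain iterable so the sketch runs without Symfony DI. It shows the two registry rules: a duplicate spec name throws a LogicException at construction, and specs($allowedNames) keeps only declared names that are actually registered, with null meaning "no restriction" in this sketch.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for nr-llm's ToolSpec (ADR-010); only the name is used here.
final class ToolSpec
{
    public function __construct(
        public readonly string $name,
        public readonly string $description = '',
        public readonly array $parameters = [],
    ) {}
}

// The four methods named in the ADR. The #[AutoconfigureTag('nr_llm.tool')]
// attribute is omitted so the sketch runs without Symfony DI.
interface ToolInterface
{
    public function getSpec(): ToolSpec;
    public function execute(array $arguments): string;
    public function isEnabledByDefault(): bool;
    public function requiresAdmin(): bool;
}

final class ToolRegistry
{
    /** @var array<string, ToolInterface> tools indexed by spec name */
    private array $tools = [];

    /** @param iterable<ToolInterface> $tools in DI, an autowired tagged iterator */
    public function __construct(iterable $tools)
    {
        foreach ($tools as $tool) {
            $name = $tool->getSpec()->name;
            if (isset($this->tools[$name])) {
                // Two tools claiming one name is a developer error, so fail loudly.
                throw new LogicException(sprintf('Duplicate tool name "%s".', $name));
            }
            $this->tools[$name] = $tool;
        }
    }

    public function get(string $name): ?ToolInterface
    {
        return $this->tools[$name] ?? null;
    }

    /**
     * @param list<string>|null $allowedNames null means no restriction
     * @return list<ToolSpec>
     */
    public function specs(?array $allowedNames = null): array
    {
        $registered = array_keys($this->tools);
        $names = $allowedNames === null
            ? $registered
            // Keep only declared names that are actually registered; drop the rest.
            : array_values(array_unique(array_intersect($allowedNames, $registered)));

        return array_map(fn (string $name): ToolSpec => $this->tools[$name]->getSpec(), $names);
    }
}
```

With get_tca registered, a skill declaring ['get_tca', 'typo_in_skill'] is offered only the get_tca spec; the misspelt name is dropped rather than forwarded to the provider.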
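The bounded loop and its three fail-soft rules can be illustrated with the following sketch. It is hypothetical throughout: the provider calls are injected as closures ($chatWithTools standing in for chatWithToolsForConfiguration(), $chatPlain for chatWithConfiguration()), responses are simplified arrays, the message turns are simplified rather than the exact OpenAI wire shape, and LoopResult is a reduced ToolLoopResult without usage totals. Returning an empty finalContent on a budget stop is an assumption of the sketch, not something the ADR specifies.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for nr-llm's BudgetExceededException.
final class BudgetExceededException extends RuntimeException {}

// Hypothetical reduced ToolLoopResult: no usage totals.
final class LoopResult
{
    public function __construct(
        public readonly string $finalContent,
        public readonly array $trace,
        public readonly bool $truncated,
    ) {}
}

/**
 * @param callable(array, array): array $chatWithTools returns ['content' => string, 'toolCalls' => list<array{id, name, arguments}>]
 * @param callable(array): string $chatPlain completion sent with no tools field at all
 * @param callable(string, array): string $runTool executes one tool call
 */
function runLoop(
    array $messages,
    array $specs,
    callable $chatWithTools,
    callable $chatPlain,
    callable $runTool,
    int $maxIterations = 5,
): LoopResult {
    // Rule 1: nothing to offer, so send one plain completion instead of an empty tools array.
    if ($specs === []) {
        return new LoopResult($chatPlain($messages), [], false);
    }

    $trace = [];
    try {
        for ($round = 0; $round < $maxIterations; $round++) {
            $response = $chatWithTools($messages, $specs);
            if ($response['toolCalls'] === []) {
                return new LoopResult($response['content'], $trace, false); // model answered
            }
            // Replay the assistant turn, then one tool result turn per call (simplified shapes).
            $messages[] = ['role' => 'assistant', 'content' => $response['content'], 'tool_calls' => $response['toolCalls']];
            foreach ($response['toolCalls'] as $call) {
                $output = $runTool($call['name'], $call['arguments']);
                $trace[] = ['tool' => $call['name'], 'output' => $output];
                $messages[] = ['role' => 'tool', 'tool_call_id' => $call['id'], 'content' => $output];
            }
        }

        // Rule 2: cap reached with tool results still pending, so ask for a closing answer without tools.
        return new LoopResult($chatPlain($messages), $trace, true);
    } catch (BudgetExceededException) {
        // Rule 3: keep the partial trace and flag the result as truncated.
        return new LoopResult('', $trace, true);
    }
}
```

Under this shape a single loop makes at most $maxIterations tool-bearing requests plus one closing plain request, which is what makes the cap a spend bound.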
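The wire details of the raw-array turns are easy to get wrong in PHP, because json_encode([]) produces [] rather than {}, and json_decode('{}', true) turns an empty object back into an empty array. The following sketch shows one way to keep empty arguments as an object, to apply the exact-two-key routing rule, and to translate turns for Ollama. All function names are hypothetical helpers, not nr-llm's LlmServiceManager or OllamaProvider code. The OpenAI-side shape (id, type, function.name, function.arguments as a JSON string) follows the public Chat Completions format; the Ollama-side shape is reduced to what the ADR mentions.

```php
<?php
declare(strict_types=1);

/** OpenAI-shape assistant turn; $calls is a hypothetical list of [id, name, arguments] arrays. */
function assistantToolCallTurn(array $calls): array
{
    return [
        'role' => 'assistant',
        'content' => null,
        'tool_calls' => array_map(static fn (array $c): array => [
            'id' => $c['id'],
            'type' => 'function',
            'function' => [
                'name' => $c['name'],
                // Empty arguments must serialise to {} (an object), never [].
                'arguments' => json_encode($c['arguments'] === [] ? new stdClass() : $c['arguments'], JSON_THROW_ON_ERROR),
            ],
        ], $calls),
    ];
}

function toolResultTurn(string $callId, string $output): array
{
    return ['role' => 'tool', 'tool_call_id' => $callId, 'content' => $output];
}

/** Routing rule: only arrays with exactly the keys role and content would become ChatMessage objects. */
function isPlainChatTurn(array $message): bool
{
    $keys = array_keys($message);
    sort($keys);
    return $keys === ['content', 'role'];
}

/** Translate a replayed OpenAI-shape turn into Ollama's native /api/chat shape. */
function toOllamaTurn(array $message): array
{
    if (isset($message['tool_calls'])) {
        $message['tool_calls'] = array_map(static fn (array $c): array => [
            'function' => [
                'name' => $c['function']['name'],
                // Decode to stdClass (not an associative array) so {} stays an object.
                'arguments' => json_decode($c['function']['arguments'], false, 512, JSON_THROW_ON_ERROR),
            ],
        ], $message['tool_calls']);
    }
    unset($message['tool_call_id']); // Ollama turns carry no call id
    return $message;
}

/** Ollama returns tool calls without ids; give each one call_<index>. */
function withSynthesisedIds(array $ollamaToolCalls): array
{
    $out = [];
    foreach (array_values($ollamaToolCalls) as $index => $call) {
        $out[] = [
            'id' => 'call_' . $index,
            'name' => $call['function']['name'],
            'arguments' => (array) $call['function']['arguments'],
        ];
    }
    return $out;
}
```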
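The allow-list semantics (null when nothing is declared, otherwise the union of declarations, with a lone empty declaration meaning no tools) fit in a small pure function. This is a hypothetical sketch, not AllowedToolsResolver itself: skills are plain arrays, allowedTools = null stands for a skill that declares nothing, and deduplicating by uid is an assumption about how the effective set is built. The second helper shows the execution-time re-check that backs up the offer-time filter.

```php
<?php
declare(strict_types=1);

/**
 * @param list<array{uid: int, enabled: bool, orphaned: bool, allowedTools: list<string>|null}> $skills
 * @return list<string>|null null means no skill-imposed restriction
 */
function resolveAllowedTools(array $skills): ?array
{
    $seen = [];
    $allowed = null;
    foreach ($skills as $skill) {
        // Effective skills only: enabled, non-orphaned, deduplicated (by uid in this sketch).
        if (!$skill['enabled'] || $skill['orphaned'] || isset($seen[$skill['uid']])) {
            continue;
        }
        $seen[$skill['uid']] = true;
        if ($skill['allowedTools'] === null) {
            continue; // this skill declares nothing
        }
        // Any declaration switches to allow-list mode; declarations are unioned.
        $allowed = array_merge($allowed ?? [], $skill['allowedTools']);
    }
    return $allowed === null ? null : array_values(array_unique($allowed));
}

/** Execution-time re-check: a tool call outside the allow-list is refused even if the model asks. */
function mayExecute(string $toolName, ?array $allowed): bool
{
    return $allowed === null || in_array($toolName, $allowed, true);
}
```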
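The runtime authorization filter can be sketched as follows. In the real runtime the per-tool flag comes from each tool's requiresAdmin(); here it is a hypothetical name-to-bool map, and a name missing from the map is treated as admin-only, mirroring the fail-closed rule. actingUserIsAdmin() reads $GLOBALS['BE_USER'] and calls isAdmin() (a method TYPO3's BackendUserAuthentication provides); treating a missing backend user as non-admin is an assumption of this sketch.

```php
<?php
declare(strict_types=1);

/**
 * @param list<string> $offeredNames names left after the allow-list intersection
 * @param array<string, bool> $requiresAdminByName requiresAdmin() per registered tool
 * @return list<string>
 */
function filterForUser(array $offeredNames, array $requiresAdminByName, bool $isAdmin): array
{
    if ($isAdmin) {
        return $offeredNames; // admins may use every offered tool
    }
    return array_values(array_filter(
        $offeredNames,
        // Fail-closed: a name without a known flag counts as admin-only.
        static fn (string $name): bool => ($requiresAdminByName[$name] ?? true) === false,
    ));
}

function actingUserIsAdmin(): bool
{
    $user = $GLOBALS['BE_USER'] ?? null;
    // Sketch assumption: no backend user means "not an admin".
    return is_object($user) && method_exists($user, 'isAdmin') && $user->isAdmin() === true;
}
```

Filtering the offered set, rather than only refusing at execution time, keeps admin-only tool names and descriptions out of a non-admin's prompt entirely.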
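The error-egress rule amounts to a single choke point around tool execution. In this hypothetical sketch the logger is a closure standing in for the injected PSR-3 logger, the generic message text is invented for illustration, and a null tool stands for an unknown name. Whatever goes wrong, the caller (and therefore the provider and the playground DOM) only ever receives the generic string; the exception object travels only to the log.

```php
<?php
declare(strict_types=1);

const GENERIC_TOOL_ERROR = 'Tool execution failed.'; // hypothetical wording

/**
 * @param Closure(array): string|null $tool null when the name is not registered
 * @param Closure(string, string, array): void $log stand-in for a PSR-3 logger call
 */
function executeToolSafely(string $name, array $arguments, ?Closure $tool, bool $allowed, Closure $log): string
{
    if ($tool === null || !$allowed) {
        $log('warning', sprintf('Rejected tool call "%s" (unknown or not allowed).', $name), []);
        return GENERIC_TOOL_ERROR;
    }
    try {
        return $tool($arguments);
    } catch (Throwable $e) {
        // Full detail, possibly including DSNs or credentials, stays server-side.
        $log('error', sprintf('Tool "%s" failed.', $name), ['exception' => $e]);
        return GENERIC_TOOL_ERROR;
    }
}
```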

Consequences

●● nr-llm gains a real agent loop: admin-curated PHP tools run mid-generation on the selected configuration's vault key and model, and the result is fed back until the model answers or the cap is reached.
●● Cost is recorded through the config-aware path and bounded by the iteration cap plus the per-iteration budget pre-flight check (request-count, token and cost buckets, provided the backend user's uid is set). Without chatWithToolsForConfiguration(), only the cap and the token/request counts would bound spend, and keyed providers would fail with HTTP 401.
● Extensions extend the tool set by tagging a class; no edit to nr-llm and no architecture exception (tools live under Service\Tool and inherit the existing service-layer guard).
● The allow-list re-validation at both offer and execution time means a declared-but-unknown tool name is dropped and an injected prompt cannot reach a tool the skills did not grant.
◐ The shipped built-in tools (fetch_logs, read_fal_asset_meta, and the later diagnostic/record tools — get_php_info, get_env, get_pagetree, get_tca, list_be_users, list_be_groups and their secret-redacted/raw variants) are admin-curated, read-only, input-bounded and scoped (limit cap + PII redaction; storage-scoped lookup). They are reference implementations of the security contract, not a general capability.
●● Authorization is per-tool and enforced in the runtime against the acting backend user, not merely the playground gate (§6): admin-only tools are filtered out for non-admins (fail-closed), and the user-scoped tools honour the acting user's page / table permissions. A future non-admin consumer of ToolLoopService therefore cannot reach system data or read beyond the user's own TYPO3 rights — closing the escalation surface the earlier admin-only-playground assumption relied on.
◐ read_fal_asset_meta is gated admin-only rather than resolving per-user file-storage permissions: file metadata can span storages a non-admin cannot see, and per-storage resolution is brittle, so the simpler, stricter gate was chosen (with the storage allow-list as a further bound).
✕ Message role is not a trust boundary: a prompt injection in skill prose can still steer a tool's arguments. The mitigation is input validation + scoping in each tool, the offered allow-list, and the XSS-safe render of every tool-derived string in the playground.

See ADR-010 for the tool/function-calling abstraction, ADR-013 for the configuration hierarchy the loop runs on, ADR-026 for the middleware pipeline that records cost, ADR-036 for skill injection (which steers tool arguments), and the administration guide for operation.