ADR-038: Tool runtime (function-calling agent loop)
- Status: Accepted
- Date: 2026-06-29
- Authors: Netresearch DTT GmbH
Context
nr-llm completion has been single-shot: one request, one answer. The
tool protocol value objects already existed (ToolSpec and
ToolCall from ADR-010, OpenAI-wire-aligned), and
LlmServiceManager could send tool declarations and
read the model's tool calls back. But there was no registry of executable
tools, no PHP that runs a tool, and no loop that feeds a tool result back
into the conversation. A model could ask to call a tool; nothing answered.
Worse, chatWithTools() cannot be the loop's engine. It resolves its
provider from the ExtensionConfiguration['nr_llm']['providers'] keyed
registry and runs against a model-less transient configuration. That
registry is not populated for chat (providers, models and configurations are
DB-backed). The consequences are concrete:
- For keyed providers (Claude, Gemini, Groq, Mistral, OpenRouter) there is no registered API key, so the call is unauthenticated (401).
- Every provider runs on its hardcoded default model, never the model the admin selected on the LlmConfiguration.
- Cost is computed downstream by UsageMiddleware from the priced Model; a model-less transient config records zero-cost usage, so the budget cost bucket never sees the spend.
So the agent loop cannot reach a selected configuration's vault key, model, temperature, system prompt or pricing through the provider-key path. A config-aware entry point is required before a loop is safe to run.
Decision
- A DI-tagged tool registry.
ToolInterface (Classes/Service/Tool/) declares four methods: getSpec(): ToolSpec, execute(array $arguments): string, isEnabledByDefault(): bool (curated low-risk tools return true; secret- or system-exposing tools return false so they are opt-in) and requiresAdmin(): bool (admin-only gating for tools surfacing system/host/cross-user data). The last two are central to the fail-open/fail-closed security model below. The interface carries #[AutoconfigureTag('nr_llm.tool')]. ToolRegistry collects every tagged tool through an autowired iterator and indexes it by spec name (a duplicate name is a developer error → LogicException at construction). An extension adds a tool simply by tagging a class, with no central registration edit. The registry is the authoritative allow-set: specs($allowedNames) intersects the declared names against what is actually registered and drops the rest.
- A config-aware tool entry point.
LlmServiceManager::chatWithToolsForConfiguration() mirrors chatWithConfiguration(): it resolves the adapter from the LlmConfiguration (vault key + real Model + params), guards instanceof ToolCapableInterface and runs through the middleware pipeline, so UsageMiddleware sees the priced model and records real cost. It is additive on LlmServiceManagerInterface (no consumer break) and is the only call the loop makes per round.
- A bounded agent loop.
ToolLoopService::runLoop() calls chatWithToolsForConfiguration() each iteration; while the model returns tool calls it executes them and re-sends, bounded by a configurable max-iteration cap (constructor default 5). Three fail-soft rules keep the admin informed instead of aborting:
  - An empty offered set (no tools, or an empty allow-list) is a single plain chatWithConfiguration() completion, because an empty tools array makes some providers (OpenAI) respond with a 400.
  - Hitting the cap with tools still pending triggers one final plain chatWithConfiguration() (no tools field at all) to synthesise a closing answer and sets truncated = true. A no-tools completion yields a real finalContent uniformly across OpenAI, Claude and Ollama, unlike toolChoice='none' or an empty tools array.
  - A mid-loop BudgetExceededException returns the partial ToolLoopResult (trace + usage so far, truncated = true); the budget fires pre-flight and tools are read-only, so the state is consistent.
- Raw-array message turns; ChatMessage unchanged. The loop appends the assistant tool_calls turn and one tool result turn per call as raw arrays. LlmServiceManager::normaliseMessages() routes only exact 2-key {role,content} arrays through ChatMessage; the 3-key tool turns pass through unchanged to OpenAI and Claude. Empty arguments serialise to {} (an object), never []. OllamaProvider translates the replayed OpenAI-shape turns into Ollama's native /api/chat shape (object arguments, tool_call_id dropped) and synthesises a call id (call_<index>) on the way out, because Ollama returns none and ToolCall rejects an empty id.
- Skill.allowed_tools is a fail-closed-on-declaration allow-list.
AllowedToolsResolver reads the effective skills (enabled, non-orphaned, deduped; exactly what SkillComposer injects) of the configuration and task. If no skill declares allowed-tools it returns null (no skill-imposed restriction → all registered tools). If any skill declares it, the result is the union of the declared lists; a lone declared empty list yields [] (no tools). The allow-list is enforced twice: when computing the offered specs() and again at execution time, so a model steered by injected skill prose cannot call a registered-but-not-offered tool.
- Authorization is enforced in the runtime, against the acting backend user, not only in the playground. Because ToolLoopService runs tools on behalf of a backend request (and a future non-admin consumer could be wired to it), every tool declares requiresAdmin(). The loop resolves the acting $GLOBALS['BE_USER'] and, when it is not an admin, filters every admin-only tool out of the offered set (fail-closed: an unknown tool name is treated as admin-only). Admin-only tools are those exposing system / host / cross-user data: fetch_logs, get_env/get_env_raw, get_php_info/get_php_info_raw, list_be_users/list_be_users_raw, list_be_groups and read_fal_asset_meta. Tools that read user-scoped records and are usable by a non-admin instead self-enforce the acting user's own TYPO3 permissions inside execute(): get_pagetree applies getPagePermsClause(Permission::PAGE_SHOW) and get_tca filters tables by check('tables_select', …) (an admin bypasses both, since TYPO3 admins see everything). Queries use the default restriction set (no blanket removeAll()) so soft-deleted rows never surface; the admin-only be_users/be_groups listings keep removeAll() plus an explicit deleted = 0 so disabled users remain visible for auditing.
- Generic error egress, detail logged server-side. A thrown tool, an unknown or disallowed tool name, and any unexpected provider failure become a generic error string. The exception body may carry DBAL/PDO credentials that URL-sanitising would not strip, so it never reaches the provider or the DOM; the full detail is logged through the injected logger.
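The three fail-soft rules of the bounded loop can be illustrated with a minimal, self-contained sketch. It is not the nr-llm implementation: `runLoopSketch()`, `BudgetExceededSketch`, the two callables standing in for chatWithToolsForConfiguration() and chatWithConfiguration(), and the simplified array shapes are all assumptions made for illustration. Returning a null final content on budget exhaustion is likewise an assumption.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the budget exception; the name is an assumption.
final class BudgetExceededSketch extends RuntimeException {}

/**
 * Illustrative bounded tool loop (assumed shapes, not the real service).
 *
 * @param callable $chatWithTools fn(array $messages, array $toolNames): array
 *        returning ['content' => ?string, 'toolCalls' => list of
 *        ['id' => string, 'name' => string, 'arguments' => array]]
 * @param callable $chatPlain     fn(array $messages): string, sends no tools field
 * @param array<string, callable> $tools offered tools: name => fn(array): string
 */
function runLoopSketch(
    callable $chatWithTools,
    callable $chatPlain,
    array $tools,
    array $messages,
    int $maxIterations = 5
): array {
    // Rule 1: an empty offered set becomes a single plain completion.
    if ($tools === []) {
        return ['finalContent' => $chatPlain($messages), 'truncated' => false, 'trace' => []];
    }

    $trace = [];
    for ($i = 0; $i < $maxIterations; $i++) {
        try {
            $response = $chatWithTools($messages, array_keys($tools));
        } catch (BudgetExceededSketch $e) {
            // Rule 3: budget exhausted, return the partial result gathered so far.
            return ['finalContent' => null, 'truncated' => true, 'trace' => $trace];
        }

        if ($response['toolCalls'] === []) {
            // The model answered without asking for a tool: the loop is done.
            return ['finalContent' => $response['content'], 'truncated' => false, 'trace' => $trace];
        }

        // Replay the assistant turn, then one tool turn per call, as raw arrays.
        $messages[] = [
            'role' => 'assistant',
            'content' => $response['content'],
            'tool_calls' => $response['toolCalls'],
        ];
        foreach ($response['toolCalls'] as $call) {
            try {
                if (!isset($tools[$call['name']])) {
                    // Execution-time re-check: only offered tools may run.
                    throw new RuntimeException('Tool not offered: ' . $call['name']);
                }
                $result = $tools[$call['name']]($call['arguments']);
            } catch (Throwable $e) {
                // Generic egress only; a real service would log the detail server-side.
                $result = 'Tool execution failed.';
            }
            $trace[] = ['tool' => $call['name'], 'result' => $result];
            $messages[] = ['role' => 'tool', 'tool_call_id' => $call['id'], 'content' => $result];
        }
    }

    // Rule 2: cap reached with tools still pending, so ask for a closing answer
    // with a plain completion and flag the result as truncated.
    return ['finalContent' => $chatPlain($messages), 'truncated' => true, 'trace' => $trace];
}
```

Driving the sketch with a scripted model that requests one tool call and then answers ends the loop on the second round with `truncated` false; a model that keeps requesting tools reaches the cap and receives the plain closing completion instead.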
Consequences
- ●● nr-llm gains a real agent loop: admin-curated PHP tools run mid-generation on the selected configuration's vault key and model, and the result is fed back until the model answers or the cap is reached.
- ●● Cost is recorded via the config-aware path and bounded by the iteration cap plus the per-iteration budget pre-flight (request-count / token / cost buckets, given the BE-user uid is set). Without chatWithToolsForConfiguration() only the cap and token/request counts would bound spend, and keyed providers would 401.
- ● Extensions extend the tool set by tagging a class; no edit to nr-llm and no architecture exception (tools live under Service\Tool and inherit the existing service-layer guard).
- ● The allow-list re-validation at both offer and execution time means a declared-but-unknown tool name is dropped and an injected prompt cannot reach a tool the skills did not grant.
- ◐ The shipped built-in tools (fetch_logs, read_fal_asset_meta, and the later diagnostic/record tools get_php_info, get_env, get_pagetree, get_tca, list_be_users, list_be_groups and their secret-redacted/raw variants) are admin-curated, read-only, input-bounded and scoped (limit cap + PII redaction; storage-scoped lookup). They are reference implementations of the security contract, not a general capability.
- ●● Authorization is per-tool and enforced in the runtime against the acting backend user, not merely the playground gate (§6): admin-only tools are filtered out for non-admins (fail-closed), and the user-scoped tools honour the acting user's page / table permissions. A future non-admin consumer of ToolLoopService therefore cannot reach system data or read beyond the user's own TYPO3 rights, which closes the escalation surface the earlier admin-only-playground assumption relied on.
- ◐ read_fal_asset_meta is gated admin-only rather than resolving per-user file-storage permissions: file metadata can span storages a non-admin cannot see, and per-storage resolution is brittle, so the simpler, stricter gate was chosen (with the storage allow-list as a further bound).
- ✕ Message role is not a trust boundary: a prompt injection in skill prose can still steer a tool's arguments. The mitigation is input validation + scoping in each tool, the offered allow-list, and the XSS-safe render of every tool-derived string in the playground.
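The allow-list semantics described above (null when no skill declares a list, the union when any does, an empty list when the only declaration is empty, plus the admin filter on the offered set) can be sketched as follows. The function names and array shapes are hypothetical and chosen for illustration; they are not the AllowedToolsResolver or ToolRegistry API.

```php
<?php
declare(strict_types=1);

/**
 * Illustrative allow-list resolution (assumed shape, not the real resolver).
 *
 * @param array<int, ?array> $declarations one entry per effective skill;
 *        null means the skill declares no allowed-tools at all
 * @return ?array null = no skill-imposed restriction, otherwise the union
 */
function resolveAllowedToolsSketch(array $declarations): ?array
{
    $declared = array_values(array_filter(
        $declarations,
        static fn(?array $d): bool => $d !== null
    ));
    if ($declared === []) {
        return null; // nobody declared a list: all registered tools stay eligible
    }
    // Union of every declared list; a lone empty list yields [].
    return array_values(array_unique(array_merge(...$declared)));
}

/**
 * Illustrative offered-set computation.
 *
 * @param array<string, bool> $registered tool name => requires admin
 * @param ?array $allowed result of resolveAllowedToolsSketch()
 * @return array names that are registered, allowed and visible to this user
 */
function offeredToolsSketch(array $registered, ?array $allowed, bool $isAdmin): array
{
    $offered = [];
    foreach ($registered as $name => $requiresAdmin) {
        if ($allowed !== null && !in_array($name, $allowed, true)) {
            continue; // not granted by any skill
        }
        if ($requiresAdmin && !$isAdmin) {
            continue; // admin-only tool hidden from a non-admin
        }
        $offered[] = $name;
    }
    return $offered;
}

/** Execution-time re-check: a call is honoured only if the tool was offered. */
function isCallAllowedSketch(string $name, array $offered): bool
{
    return in_array($name, $offered, true);
}
```

Because the offered set is built by iterating over registered tools, a declared name with no matching registration never reaches the model, and the execution-time check rejects any name the model produces that was not offered.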
See ADR-010 for the tool/function-calling abstraction, ADR-013 for the configuration hierarchy the loop runs on, ADR-026 for the middleware pipeline that records cost, ADR-036 for skill injection (which steers tool arguments), and the administration guide for operation.