.. include:: /Includes.rst.txt .. _adr-038: ================================================================== ADR-038: Tool runtime (function-calling agent loop) ================================================================== :Status: Accepted :Date: 2026-06-29 :Authors: Netresearch DTT GmbH .. _adr-038-context: Context ======= nr-llm completion has been **single-shot**: one request, one answer. The tool *protocol* value objects already existed — :php:`ToolSpec` and :php:`ToolCall` (:ref:`ADR-010 `), OpenAI-wire-aligned — and :php:`LlmServiceManager::chatWithTools()` could send tool declarations and read the model's tool calls back. But there was **no registry of executable tools, no PHP that runs a tool, and no loop** that feeds a tool result back into the conversation. A model could *ask* to call a tool; nothing answered. Worse, ``chatWithTools()`` cannot be the loop's engine. It resolves its provider from the ``ExtensionConfiguration['nr_llm']['providers']`` keyed registry and runs against a **model-less transient configuration**. That registry is not populated for chat (providers, models and configurations are DB-backed). The consequences are concrete: - For keyed providers (Claude, Gemini, Groq, Mistral, OpenRouter) there is no registered API key, so the call is **unauthenticated** (401). - Every provider runs on its **hardcoded default model**, never the model the admin selected on the :php:`LlmConfiguration`. - Cost is computed downstream by :php:`UsageMiddleware` from the priced :php:`Model`; a model-less transient config records **zero-cost** usage, so the budget cost bucket never sees the spend. So the agent loop cannot reach a selected configuration's vault key, model, temperature, system prompt or pricing through the provider-key path. A config-aware entry point is required before a loop is safe to run. .. _adr-038-decision: Decision ======== 1. **A DI-tagged tool registry.** :php:`ToolInterface` (``Classes/Service/Tool/``) declares four methods — ``getSpec(): ToolSpec``, ``execute(array $arguments): string``, ``isEnabledByDefault(): bool`` (curated low-risk tools return ``true``; secret- or system-exposing tools return ``false`` so they are opt-in) and ``requiresAdmin(): bool`` (admin-only gating for tools surfacing system/host/cross-user data) — both central to the fail-open/fail-closed security model below. It carries ``#[AutoconfigureTag('nr_llm.tool')]``. :php:`ToolRegistry` collects every tagged tool through an autowired iterator and indexes it by spec name (a duplicate name is a developer error → :php:`LogicException` at construction). **An extension adds a tool simply by tagging a class** — no central registration edit. The registry is the **authoritative allow-set**: ``specs($allowedNames)`` intersects the declared names against what is actually registered and drops the rest. 2. **A config-aware tool entry point.** :php:`LlmServiceManager::chatWithToolsForConfiguration()` mirrors ``chatWithConfiguration()`` — it resolves the adapter from the :php:`LlmConfiguration` (vault key + real :php:`Model` + params), guards ``instanceof ToolCapableInterface`` and runs through the middleware pipeline, so :php:`UsageMiddleware` sees the priced model and records real cost. It is additive on :php:`LlmServiceManagerInterface` (no consumer break) and is the only call the loop makes per round. 3. **A bounded agent loop.** :php:`ToolLoopService::runLoop()` calls ``chatWithToolsForConfiguration()`` each iteration; while the model returns tool calls it executes them and re-sends, **bounded by a configurable max-iteration cap (constructor default 5)**. Three fail-soft rules keep the admin informed instead of aborting: - An **empty offered set** (no tools, or an empty allow-list) is a single plain ``chatWithConfiguration()`` completion — an empty ``tools`` array makes some providers (OpenAI) 400. - **Hitting the cap** with tools still pending triggers one final plain ``chatWithConfiguration()`` (no ``tools`` field at all) to synthesise a closing answer and sets ``truncated = true``. A no-tools completion yields a real ``finalContent`` uniformly across OpenAI, Claude and Ollama — unlike ``toolChoice='none'`` or an empty tools array. - A mid-loop :php:`BudgetExceededException` returns the **partial** :php:`ToolLoopResult` (trace + usage so far, ``truncated = true``); the budget fires pre-flight and tools are read-only, so the state is consistent. 4. **Raw-array message turns; ChatMessage unchanged.** The loop appends the assistant ``tool_calls`` turn and one ``tool`` result turn per call as raw arrays. :php:`LlmServiceManager::normaliseMessages()` routes only exact 2-key ``{role,content}`` arrays through :php:`ChatMessage`; the 3-key tool turns pass through unchanged to OpenAI and Claude. Empty arguments serialise to ``{}`` (an object), never ``[]``. :php:`OllamaProvider` translates the replayed OpenAI-shape turns into Ollama's native ``/api/chat`` shape (object arguments, ``tool_call_id`` dropped) and **synthesises a call id** (``call_``) on the way out, because Ollama returns none and :php:`ToolCall` rejects an empty id. 5. **Skill.allowed_tools is a fail-closed-on-declaration allow-list.** :php:`AllowedToolsResolver` reads the *effective* skills (enabled, non-orphaned, deduped — exactly what :php:`SkillComposer` injects) of the configuration and task. If **no** skill declares ``allowed-tools`` it returns ``null`` (no skill-imposed restriction → all registered tools). If **any** declares, the result is the **union** of the declared lists — a lone declared empty list yields ``[]`` (no tools). The allow-list is enforced **twice**: when computing the offered ``specs()`` *and* again at execution time, so a model steered by injected skill prose cannot call a registered-but-not-offered tool. 6. **Authorization is enforced in the runtime, against the acting backend user — not only in the playground.** Because :php:`ToolLoopService` runs tools on behalf of a backend request (and a future non-admin consumer could be wired to it), every tool declares :php:`requiresAdmin()`. The loop resolves the acting ``$GLOBALS['BE_USER']`` and, when it is not an admin, **filters every admin-only tool out of the offered set** (fail-closed: an unknown tool name is treated as admin-only). Admin-only tools are those exposing system / host / cross-user data — ``fetch_logs``, ``get_env`` / ``get_env_raw``, ``get_php_info`` / ``get_php_info_raw``, ``list_be_users`` / ``list_be_users_raw``, ``list_be_groups`` and ``read_fal_asset_meta``. Tools that read user-scoped records and are usable by a non-admin instead **self-enforce the acting user's own TYPO3 permissions** inside ``execute()``: ``get_pagetree`` applies ``getPagePermsClause(Permission::PAGE_SHOW)`` and ``get_tca`` filters tables by ``check('tables_select', …)`` (an admin bypasses both — TYPO3 admins see everything). Queries use the default restriction set (no blanket ``removeAll()``) so soft-deleted rows never surface; the admin-only ``be_users`` / ``be_groups`` listings keep ``removeAll()`` plus an explicit ``deleted = 0`` so disabled users remain visible for auditing. 7. **Generic error egress, detail logged server-side.** A thrown tool, an unknown or disallowed tool name, and any unexpected provider failure become a **generic** error string. The exception body may carry DBAL/PDO credentials that URL-sanitising would not strip, so it never reaches the provider or the DOM; the full detail is logged through the injected logger. .. _adr-038-consequences: Consequences ============ - ●● nr-llm gains a real agent loop: admin-curated PHP tools run mid-generation on the selected configuration's vault key and model, and the result is fed back until the model answers or the cap is reached. - ●● Cost is **recorded** via the config-aware path and bounded by the iteration cap **plus** the per-iteration budget pre-flight (request-count / token / cost buckets, given the BE-user uid is set). Without ``chatWithToolsForConfiguration()`` only the cap and token/request counts would bound spend, and keyed providers would 401. - ● Extensions extend the tool set by tagging a class; no edit to nr-llm and no architecture exception (tools live under ``Service\Tool`` and inherit the existing service-layer guard). - ● The allow-list re-validation at both offer and execution time means a declared-but-unknown tool name is dropped and an injected prompt cannot reach a tool the skills did not grant. - ◐ The shipped built-in tools (``fetch_logs``, ``read_fal_asset_meta``, and the later diagnostic/record tools — ``get_php_info``, ``get_env``, ``get_pagetree``, ``get_tca``, ``list_be_users``, ``list_be_groups`` and their secret-redacted/raw variants) are admin-curated, **read-only**, input-bounded and scoped (limit cap + PII redaction; storage-scoped lookup). They are reference implementations of the security contract, not a general capability. - ●● Authorization is **per-tool and enforced in the runtime against the acting backend user**, not merely the playground gate (§6): admin-only tools are filtered out for non-admins (fail-closed), and the user-scoped tools honour the acting user's page / table permissions. A future non-admin consumer of :php:`ToolLoopService` therefore cannot reach system data or read beyond the user's own TYPO3 rights — closing the escalation surface the earlier admin-only-playground assumption relied on. - ◐ ``read_fal_asset_meta`` is gated **admin-only** rather than resolving per-user file-storage permissions: file metadata can span storages a non-admin cannot see, and per-storage resolution is brittle, so the simpler, stricter gate was chosen (with the storage allow-list as a further bound). - ✕ Message role is not a trust boundary: a prompt injection in skill prose can still steer a tool's arguments. The mitigation is input validation + scoping in each tool, the offered allow-list, and the XSS-safe render of every tool-derived string in the playground. See :ref:`ADR-010 ` for the tool/function-calling abstraction, :ref:`ADR-013 ` for the configuration hierarchy the loop runs on, :ref:`ADR-026 ` for the middleware pipeline that records cost, :ref:`ADR-036 ` for skill injection (which steers tool arguments), and :ref:`the administration guide ` for operation.