ADR-035: Skill ingest (GitHub-hosted SKILL.md sources)
- Status: Accepted
- Date: 2026-06-27
- Authors: Netresearch DTT GmbH
Context
Editors want to reuse the growing ecosystem of Claude Code skills —
SKILL.md files with YAML front-matter (name + description)
and a markdown body — inside nr-llm. These live on GitHub as a single
file, as a whole repository (many SKILL.md under skills/,
.claude/skills/ or <plugin>/skills/), or behind an Anthropic
marketplace.json index that points at further repositories.
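To make the file shape concrete, the following is a minimal sketch of separating the YAML front-matter from the markdown body. The function name `splitSkillMarkdown` and the sample skill are hypothetical and only serve as illustration; the sketch deliberately stops before YAML parsing and does not claim to mirror the extension's actual parser.

```php
<?php

declare(strict_types=1);

/**
 * Splits a SKILL.md document into raw YAML front-matter and markdown body.
 * Hypothetical helper for illustration only.
 *
 * @return array{frontmatter: string, body: string}
 */
function splitSkillMarkdown(string $content): array
{
    // Normalize line endings so the delimiter pattern only deals with "\n".
    $content = str_replace(["\r\n", "\r"], "\n", $content);

    // The front-matter must open on the very first line with '---' and
    // close with a line that contains only '---'.
    $pattern = '/\A---\n(.*?)\n---(?:\n|\z)(.*)\z/s';
    if (preg_match($pattern, $content, $m) !== 1) {
        throw new InvalidArgumentException('SKILL.md has no YAML front-matter block.');
    }

    return ['frontmatter' => $m[1], 'body' => ltrim($m[2], "\n")];
}

// Invented sample content: a name, a description and a short body.
$sample = "---\nname: pdf-helper\ndescription: Extracts text from PDFs\n---\n\n# PDF helper\n\nUse this skill when ...\n";
$parts = splitSkillMarkdown($sample);

echo $parts['frontmatter'], "\n";
```

The raw front-matter string is kept separate from the body so that it can be size-checked before any YAML parser sees it.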
Fetching attacker-influenced markdown from the public internet and later feeding it into an LLM prompt raises two separate concerns that are easy to conflate:
- Server-Side Request Forgery. The existing nr-vault transport (vault->http()) already blocks internal/private/metadata targets. That guard is about where a request may go, not who owns it.
- Supply-chain origin and integrity. Even a non-SSRF target must be a real GitHub host, and the bytes we store must be the bytes we reviewed; a moving branch ref can change content under us.
This ADR records the decisions for Plan 1a — ingest only. Skills are parsed, materialized and reviewed, but not yet injected into prompts; injection, the MM attach tables, and checksum-verify-on-injection are deferred to Plan 1b.
Decision
- Dedicated entities, not extended snippets. Two new Extbase entities, SkillSource (table tx_nrllm_skill_source) and Skill (table tx_nrllm_skill), model the ingest domain. A skill is a materialized SKILL.md; a source produces N skills. Reusing PromptSnippet (ADR-031: Tagged Prompt Snippet Library) was rejected: snippets are editor-authored fragments, skills are synced remote artifacts with their own lifecycle (sync status, checksum, orphaning).
- Ingest / use split. Unit 1 is split at the MM-table seam into Plan 1a (this ADR: sources, fetch, parse, review) and Plan 1b (attach + inject). Each ships fully implemented, no stubs.
- SSRF guard ≠ GitHub-origin guard. On top of the nr-vault SSRF guard, GitHubClient enforces an app-level GitHub host allowlist: scheme = https AND host ∈ {github.com, raw.githubusercontent.com, api.github.com, codeload.github.com} on the initial request URL. The transport does not follow redirects (any 3xx is treated as an error), so there is no redirect target to escape the allowlist. A rejected URL raises a typed HostNotAllowedException, never a silent skip.
- Fetch by immutable commit SHA + checksum. A source ref (branch/tag) is resolved once to a commit SHA via GET /repos/{o}/{r}/commits/{ref}; the stored pinned_sha is the revision all bodies are fetched from (raw.githubusercontent.com by SHA, never by branch). A body_checksum (sha256) is computed at materialization and re-verified on injection in Plan 1b (fail-closed).
- Disabled-by-default for multi-skill discovery. Every repo and marketplace skill arrives enabled = false and must be reviewed before use. A single_file source, being one explicit admin act, may default enabled. Re-syncing an enabled skill whose recomputed body_checksum changed auto-reverts it to disabled and surfaces the diff for re-confirmation.
- Namespaced upsert, orphan-disable. identifier is namespaced "{source_uid}:{path}" so identical skill names across sources never collide. Re-sync is upsert-by-(source, identifier); a skill that disappeared upstream is marked orphaned + disabled, never silently dropped.
- Admin-only management. Sources and skills live in a new nrllm_skills backend submodule with access = admin. The two tables are an escalation surface (the body becomes prompt context in 1b) and must never be granted to non-admin backend groups; sync-managed TCA fields (body_checksum, source_sha, raw_frontmatter, support_status, identifier) are read-only and github_token is never shown in a FormEngine form.
- String-backed enums + bounded JSON. SkillSourceType, SyncStatus and SupportStatus are string-backed with values()/isValid()/tryFromString() (the project's Defensive-Enum rule). raw_frontmatter and the reserved allowed_tools JSON are byte- and shape-bounded at parse time even though allowed_tools is ignored in 1a.
- Explicit symfony/yaml dependency. Front-matter is parsed with Symfony\Component\Yaml\Yaml; the package is added to composer.json require explicitly rather than relied on transitively.
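The origin and integrity decisions above can be sketched as three small functions. This is an illustration under stated assumptions, not the extension's GitHubClient: the function names, the constant, the local exception class and the 40-character SHA-1 check are all hypothetical, and no HTTP request is made.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for the typed exception named in the decision.
final class HostNotAllowedException extends RuntimeException {}

// The four hosts listed in the decision.
const GITHUB_HOSTS = [
    'github.com',
    'raw.githubusercontent.com',
    'api.github.com',
    'codeload.github.com',
];

/** Throws unless the URL is https and targets an allowlisted GitHub host. */
function assertGitHubUrl(string $url): void
{
    $parts = parse_url($url);
    $scheme = is_array($parts) ? strtolower($parts['scheme'] ?? '') : '';
    $host = is_array($parts) ? strtolower($parts['host'] ?? '') : '';

    // Exact host match only, so "github.com.evil.example" is rejected.
    if ($scheme !== 'https' || !in_array($host, GITHUB_HOSTS, true)) {
        throw new HostNotAllowedException(sprintf('"%s" is not an allowed GitHub URL.', $url));
    }
}

/** Builds a raw-content URL addressed by commit SHA, never by branch or tag. */
function pinnedRawUrl(string $owner, string $repo, string $sha, string $path): string
{
    // Assumption: full lowercase 40-hex SHA-1 commit ids.
    if (preg_match('/\A[0-9a-f]{40}\z/', $sha) !== 1) {
        throw new InvalidArgumentException('Expected a full 40-character commit SHA.');
    }
    $encodedPath = implode('/', array_map('rawurlencode', explode('/', ltrim($path, '/'))));
    $url = sprintf(
        'https://raw.githubusercontent.com/%s/%s/%s/%s',
        rawurlencode($owner),
        rawurlencode($repo),
        $sha,
        $encodedPath
    );
    assertGitHubUrl($url);

    return $url;
}

/** sha256 over the exact bytes that are stored. */
function bodyChecksum(string $body): string
{
    return hash('sha256', $body);
}

$url = pinnedRawUrl('acme', 'skills', str_repeat('a', 40), 'skills/pdf/SKILL.md');
echo $url, "\n";
echo bodyChecksum("# PDF helper\n"), "\n";
```

Rejecting anything that is not a full commit SHA is what keeps a branch or tag name from reaching the raw URL, so a moving ref cannot change the stored bytes between review and fetch.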
Consequences
- ● Admins reuse the GitHub skill ecosystem from inside the backend, with SHA-pinned, checksum-verified, host-allowlisted fetches.
- ● The SSRF guard and the GitHub-origin allowlist are independent controls, stated and tested separately — neither masks the other.
- ● Disabled-by-default plus auto-disable-on-change means no remote content silently enters a prompt: every enable is a deliberate admin review, and an upstream change re-opens that review.
- ● Orphan-disable (never drop) keeps attached skills (Plan 1b) from vanishing under an editor and makes upstream deletions visible.
- ◐ Two more domain entities and a new submodule increase surface area; the split from PromptSnippet is intentional and documented here and in the administration guide.
- ◐ On hardened instances the global HTTP/allowed_hosts SSRF list must include the four GitHub hosts, or every sync fails closed; this is a deliberate, documented prerequisite.
- ✕ support_status = partial is not a safety signal. It only flags that referenced scripts/assets are not executed (always true in 1a); the prose stays fully untrusted. The injection-time output integrity controls land in Plan 1b.
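The re-sync behaviour behind the disabled-by-default, auto-disable-on-change and orphan-disable consequences can be sketched as one pure function. The `SkillState` class, the `reconcile` function and the sample data are hypothetical; the sketch models only the enabled and orphaned flags and leaves out persistence and the diff shown to the admin.

```php
<?php

declare(strict_types=1);

/** Hypothetical value object holding only the fields the re-sync rules touch. */
final class SkillState
{
    public function __construct(
        public string $identifier,
        public string $bodyChecksum,
        public bool $enabled,
        public bool $orphaned = false,
    ) {}
}

/**
 * @param array<string, SkillState> $stored  existing rows keyed by identifier
 * @param array<string, string>     $fetched path => body from the current sync
 * @return array<string, SkillState>
 */
function reconcile(int $sourceUid, array $stored, array $fetched, bool $defaultEnabled = false): array
{
    $result = [];
    foreach ($fetched as $path => $body) {
        // Namespaced identifier: "{source_uid}:{path}".
        $identifier = $sourceUid . ':' . $path;
        $checksum = hash('sha256', $body);
        $previous = $stored[$identifier] ?? null;

        if ($previous === null) {
            // New skill: disabled unless the source type opts in.
            $result[$identifier] = new SkillState($identifier, $checksum, $defaultEnabled);
            continue;
        }
        // An unchanged body keeps its reviewed state; a changed body re-opens review.
        $enabled = $previous->enabled && hash_equals($previous->bodyChecksum, $checksum);
        $result[$identifier] = new SkillState($identifier, $checksum, $enabled);
    }
    foreach ($stored as $identifier => $previous) {
        if (!isset($result[$identifier])) {
            // Gone upstream: keep the row, mark it orphaned and disabled.
            $result[$identifier] = new SkillState($identifier, $previous->bodyChecksum, false, true);
        }
    }

    return $result;
}

$stored = [
    '7:skills/a/SKILL.md' => new SkillState('7:skills/a/SKILL.md', hash('sha256', 'v1'), true),
    '7:skills/b/SKILL.md' => new SkillState('7:skills/b/SKILL.md', hash('sha256', 'b'), true),
];
// "a" changed upstream, "b" disappeared, "c" is new.
$next = reconcile(7, $stored, ['skills/a/SKILL.md' => 'v2', 'skills/c/SKILL.md' => 'c']);
```

In this sample all three resulting skills end up disabled: the changed one awaits re-confirmation, the missing one is kept as an orphan, and the new one awaits its first review.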