# Research: Sitemap XML Generation

**Feature**: `002-sitemap-generation`
**Branch**: `002-sitemap-generation`
**Date**: 2025-07-14

---

## R-001: Token Reuse — OIDC Cache Pattern

**Decision**: Reuse `redis.hGet('authorization', 'token')` / `redis.hGet('authorization', 'expiry')` and the existing stampede-guard / token-refresh flow verbatim.

**Rationale**: The existing `kmeContentSourceAdapter.js` already implements a correct, battle-tested pattern for obtaining a valid OIDC `id_token` from Redis and refreshing it when expired. Duplicating only the cache-read portion (steps 1–3 of the existing flow) would create divergence. Calling the full existing logic first and then branching to the sitemap flow avoids that risk while reusing the security invariants already proven in production.

**Approach in code**: Refactor the top-level IIFE so that:

1. URL routing check happens **first** (before any async work).
2. For sitemap requests, a shared `getValidToken()` helper (inlined in the script, no imports) performs the identical cache-hit → stampede-guard → refresh → cache-write sequence.
3. For all other requests, the existing flow runs unchanged.

**Alternatives considered**:

- Call the existing OIDC logic unconditionally, then branch: rejected because it adds unnecessary latency to non-sitemap requests (token check not needed for sitemap but would execute anyway).
- Separate helper file: rejected by the monolithic architecture constraint (Section I, constitution).

---

## R-002: KME Knowledge Search Service API — Response Envelope

**Decision**: Assume the response body is a JSON object with a top-level `items` array. Each element of `items` is an object whose `vkm:url` property holds the canonical document URL.

**Rationale**: The feature spec states:

> "The `vkm:url` field is present at the top level of each item object in the search results
> array; the exact response envelope shape will be confirmed against the live API during
> implementation."


The most common shape for knowledge/search services is `{ items: [ { "vkm:url": "...", ... } ] }`. This assumption allows the code to be written and fully unit-tested before live-API access is available. A single `items` extraction line (`response.data.items ?? response.data`) means the adaptation to the real shape is a one-line change.

**Concrete assumption**:

```json
{
  "items": [
    { "vkm:url": "https://kme.example.com/knowledge/doc-1", "title": "…" },
    { "vkm:url": "https://kme.example.com/knowledge/doc-2", "title": "…" }
  ]
}
```

**Verification required**: During implementation, run the live API call against `/` and confirm:

1. The top-level key that holds the array (likely `items`, `results`, or the root is directly an array).
2. That `vkm:url` is a string property, not nested deeper.

**Fallback**: If the root is a bare array, `response.data` itself is used as the items array.

**Alternatives considered**:

- `results` key: equally plausible; the code will use `response.data.items ?? response.data` as a defensive pattern until confirmed.
- Deeply nested: no evidence for this; rejected pending confirmation.

---

## R-003: xmlbuilder2 `create()` API for Sitemap XML

**Decision**: Use the `xmlBuilder` context variable (which is `xmlbuilder2`'s `create` function) with the following call chain:

```javascript
const doc = xmlBuilder({ version: '1.0', encoding: 'UTF-8' });
const urlset = doc.ele('urlset', { xmlns: 'http://www.sitemaps.org/schemas/sitemap/0.9' });
for (const item of items) {
  // locValue: the proxied URL for this item, built as described in R-006
  urlset.ele('url').ele('loc').txt(locValue).up().up();
}
const xml = doc.end({ prettyPrint: false });
```

**Rationale**: `xmlbuilder2` v4.x `create()` returns an `XMLBuilder` document node. Calling `.ele()` on it creates the root element. Child elements are built by chaining `.ele()` / `.txt()` / `.up()`. `doc.end({ prettyPrint: false })` serialises to a string prefixed with the XML declaration (`<?xml version="1.0" encoding="UTF-8"?>`). `prettyPrint: false` is chosen for minimal byte overhead (sitemap consumers parse XML, not read it).

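
For illustration, the expected shape of the serialised output can be checked with plain string tests, independent of the builder. This is a minimal sketch: `looksLikeSitemapXml` is a hypothetical helper (not part of the existing script or of `xmlbuilder2`), the `proxy.example.com` URL is made up, and the sample string is hand-written rather than captured from a real `doc.end()` call.

```javascript
const SITEMAP_NS = 'http://www.sitemaps.org/schemas/sitemap/0.9';

// Hypothetical check: true when the string starts with an XML declaration
// and the first element after it is <urlset> carrying the sitemap namespace.
function looksLikeSitemapXml(xml) {
  if (typeof xml !== 'string') return false;
  const startsWithDeclaration = xml.startsWith('<?xml ');
  // Capture the attribute text of the <urlset ...> tag that follows the declaration.
  const rootMatch = xml.match(/\?>\s*<urlset\b([^>]*)>/);
  const hasNamespace =
    rootMatch !== null && rootMatch[1].includes(`xmlns="${SITEMAP_NS}"`);
  return startsWithDeclaration && hasNamespace;
}

// Hand-written sample of the assumed output shape (not real builder output).
const sampleXml =
  '<?xml version="1.0" encoding="UTF-8"?>' +
  `<urlset xmlns="${SITEMAP_NS}">` +
  '<url><loc>https://proxy.example.com/adapter?kmeURL=https%3A%2F%2Fkme.example.com%2Fknowledge%2Fdoc-1</loc></url>' +
  '</urlset>';
```

A check of this kind only guards the envelope; it does not replace validation against the sitemap XSD.
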

**Sitemap namespace**: `http://www.sitemaps.org/schemas/sitemap/0.9`, required by the Sitemaps protocol and the XSD schema referenced in SC-004.

**Validation**: The serialised string must begin with the XML declaration and contain a `<urlset>` root element. Unit tests will assert this.

**Alternatives considered**:

- Manual string concatenation: rejected (error-prone escaping, violates FR-008 which requires xmlBuilder).
- `xmlbuilder` (v1/v2): not the installed package; rejected.

---

## R-004: Axios Error Differentiation — 502 vs 504

**Decision**: Reuse the exact error-detection pattern already present in the script:

| Condition | Status | Detection |
|---|---|---|
| `err.response` is defined | 502 Bad Gateway | Axios sets `err.response` for non-2xx HTTP responses |
| `err.code === 'ECONNABORTED'` | 504 Gateway Timeout | Axios `timeout` option elapsed |
| `err.code === 'ERR_CANCELED'` | 504 Gateway Timeout | Request aborted via `AbortSignal` (e.g. a signal-based timeout on Node 18+) |
| Other | 502 Bad Gateway | Treated as upstream failure |

**Rationale**: The existing script already uses this exact pattern for token-service errors (`err.response`, `err.code === 'ECONNABORTED' || err.code === 'ERR_CANCELED'`). Reusing it for search-service errors ensures consistent error classification across all upstream calls.

**Timeout value**: 10 000 ms, as stated in the spec assumption ("consistent with industry-standard defaults for proxy-initiated upstream requests").

**Alternatives considered**:

- `AbortController` + `fetch`: not available in the VM context (only `axios` is injected). Rejected.
- Different timeout for search vs auth: spec does not require this; YAGNI.

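
The R-004 mapping from Axios error to HTTP status can be sketched as a small pure function. `classifyUpstreamError` is a hypothetical name, not taken from the existing script; the sketch assumes only that errors carry the `response` and `code` properties Axios sets, and it checks `err.response` first.

```javascript
// Hypothetical helper (name not from the existing script): maps an
// Axios-style error object to the HTTP status the adapter should return.
function classifyUpstreamError(err) {
  // Upstream answered with a non-2xx status: 502 Bad Gateway.
  if (err && err.response) {
    return 502;
  }
  // No response, and the request timed out or was aborted: 504 Gateway Timeout.
  if (err && (err.code === 'ECONNABORTED' || err.code === 'ERR_CANCELED')) {
    return 504;
  }
  // Anything else (DNS failure, connection refused, ...) counts as an upstream failure.
  return 502;
}
```

Keeping the classification free of I/O means it can be unit-tested with plain objects, without mocking `axios`.
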

---

## R-005: Settings Validation — New Fields

**Decision**: At the entry point of the sitemap flow, perform an explicit guard before any async operation:

```javascript
const requiredSitemapFields = ['searchApiBaseUrl', 'tenant', 'proxyBaseUrl'];
for (const field of requiredSitemapFields) {
  if (!kme_CSA_settings[field]) {
    res.writeHead(500, { 'Content-Type': 'text/plain' });
    res.end('Configuration error: missing required field: ' + field);
    return;
  }
}
```

**Rationale**: FR-011 requires HTTP 500 with a descriptive message for missing settings. Checking before any async work means no I/O is attempted against an unconfigured upstream, and the error message identifies exactly which field is absent.

**The three new fields to add to `kme_CSA_settings.json`**:

| Field | Type | Description |
|---|---|---|
| `searchApiBaseUrl` | string | Base URL of the KME Knowledge Search Service |
| `tenant` | string | Tenant identifier appended to search base URL |
| `proxyBaseUrl` | string | Externally accessible HTTPS URL of this adapter instance |

---

## R-006: `loc` URL Construction and `vkm:url` Encoding

**Decision**: Construct each `<loc>` as:

```javascript
`${proxyBaseUrl}?kmeURL=${encodeURIComponent(item['vkm:url'])}`
```

**Rationale**: FR-005 specifies exactly this pattern. `encodeURIComponent` is a built-in available inside the VM context without injection (it is a standard JavaScript global). Using it percent-encodes the `vkm:url` value, producing a safe query-string parameter even if the value contains `://`, `?`, `#`, or other URL-special characters.

**Empty/missing guard** (FR-006):

```javascript
const vkmUrl = item['vkm:url'];
if (!vkmUrl) continue; // omit silently
```

---

## Summary of All Decisions

| ID | Topic | Decision |
|---|---|---|
| R-001 | Token reuse | Inline shared token-fetch logic; branch on URL first |
| R-002 | Search API response shape | Assume `{ items: [...] }`; verify against live API |
| R-003 | xmlbuilder2 API | `xmlBuilder({...}).ele('urlset', {...})…doc.end({})` |
| R-004 | Error mapping | Reuse existing `err.response` / `err.code` pattern |
| R-005 | Settings validation | Explicit `requiredSitemapFields` guard → HTTP 500 |
| R-006 | `loc` construction | `proxyBaseUrl?kmeURL=encodeURIComponent(vkm:url)` |
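
Taken together, R-002 and R-006 reduce to a small pure transformation from the search response body to the list of `loc` values. The sketch below is illustrative only: `extractItems` and `buildLocValues` are hypothetical names, the `proxy.example.com` URL is made up, and the `items`-or-bare-array envelope is still the unconfirmed assumption from R-002.

```javascript
// Hypothetical helpers; names are illustrative and not from the existing script.

// R-002: accept either { items: [...] } or a bare array (assumed shapes,
// still to be confirmed against the live API). Anything else yields [].
function extractItems(data) {
  const candidate = data?.items ?? data;
  return Array.isArray(candidate) ? candidate : [];
}

// R-006: build one proxied URL per item, skipping items whose `vkm:url`
// is missing or empty (FR-006).
function buildLocValues(data, proxyBaseUrl) {
  const locValues = [];
  for (const item of extractItems(data)) {
    const vkmUrl = item?.['vkm:url'];
    if (!vkmUrl) continue; // omit silently
    locValues.push(`${proxyBaseUrl}?kmeURL=${encodeURIComponent(vkmUrl)}`);
  }
  return locValues;
}

// Example with made-up values: the second item has no `vkm:url` and is dropped.
const exampleLocs = buildLocValues(
  {
    items: [
      { 'vkm:url': 'https://kme.example.com/knowledge/doc-1' },
      { title: 'no url' },
    ],
  },
  'https://proxy.example.com/adapter'
);
// exampleLocs[0] === 'https://proxy.example.com/adapter?kmeURL=https%3A%2F%2Fkme.example.com%2Fknowledge%2Fdoc-1'
```

Each resulting string would then be passed to `.txt()` as the `locValue` in the R-003 call chain.
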