# Implementation Plan: Sitemap XML Generation **Branch**: `002-sitemap-generation` | **Date**: 2025-07-14 | **Spec**: [spec.md](./spec.md) **Input**: Feature specification from `/specs/002-sitemap-generation/spec.md` --- ## Summary Add a `GET /sitemap.xml` route to `kmeContentSourceAdapter.js`. The adapter detects sitemap requests by URL suffix, obtains a valid OIDC `id_token` from the Redis cache (reusing the existing stampede-guarded refresh logic), calls the KME Knowledge Search Service, maps each result's `vkm:url` field to a `` entry, and returns a standards-compliant XML Sitemap as `application/xml`. All existing non-sitemap requests are unaffected. Three new fields are added to `kme_CSA_settings.json` (`searchApiBaseUrl`, `tenant`, `proxyBaseUrl`). --- ## Technical Context **Language/Version**: Node.js ≥18, ESM (`"type": "module"`) **Primary Dependencies**: `axios` (HTTP), `redis` (token cache), `xmlbuilder2` (XML — already injected as `xmlbuilder2`), `uuid`, `jsonwebtoken` — all already in `package.json` **Storage**: Redis read/write (`hGet`/`hSet`) for OIDC token cache only — no new storage **Testing**: Node.js built-in test runner (`node:test`); no external test framework **Target Platform**: Linux server / container (HTTP proxy adapter) **Project Type**: HTTP proxy adapter (web-service) **Performance Goals**: Sitemap response < 5 s p95 under normal conditions (SC-001); error responses < 10 s (SC-005) **Constraints**: - Zero `import`/`export` in `kmeContentSourceAdapter.js` (runs in `vm.createContext`) - No references to `config`, `global.config`, or `process.env` in proxy script - XML built exclusively with the injected `xmlbuilder2` (FR-008) - No new npm packages; no new source files (monolithic architecture — Section I of constitution) **Scale/Scope**: Single tenant per deployment; all search results in one API call (no pagination, v1) --- ## Constitution Check *GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.* | # | Principle | Status | Notes | |---|---|---|---| | I | Monolithic architecture | ✅ PASS | All new code added to `kmeContentSourceAdapter.js`; no new source files | | I (vm.Script) | Zero imports/exports in proxy script | ✅ PASS | Sitemap logic is inlined; no import statements introduced | | I.0 | No forbidden globals (`config`, `global.config`, `process.env`) | ✅ PASS | Only `kme_CSA_settings`, `redis`, `axios`, `xmlbuilder2`, `req`, `res` used | | I.I | Business logic in proxy.js | ✅ PASS | Auth, API call, XML generation all in `kmeContentSourceAdapter.js` | | I.II | Separate files only for allowed categories | ✅ PASS | Settings JSON in `src/globalVariables/` (existing pattern) | | I.III | No new files challenged | ✅ PASS | No new files in `src/` | | I.IV | New config in `src/globalVariables/` not `config/default.json` | ✅ PASS | Three fields added to `kme_CSA_settings.json` | | I.V | `xmlbuilder2` already in `globalVMContext` | ✅ PASS | `xmlbuilder2` `create` already injected; no server.js changes needed | | II | API-First Design | ✅ PASS | HTTP contract documented in `contracts/sitemap-endpoint.md` | | III | Test-First Development | ✅ REQUIRED | Unit + contract tests must be written before/alongside implementation | | VII | No new dependencies | ✅ PASS | All required packages already installed (`xmlbuilder2`, `axios`, `redis`) | **Post-design re-check**: All gates still pass. The design introduces zero new files, zero new dependencies, and zero architectural violations. --- ## Project Structure ### Documentation (this feature) ```text specs/002-sitemap-generation/ ├── plan.md # This file (/speckit.plan command output) ├── spec.md # Feature specification ├── research.md # Phase 0 output (/speckit.plan command) ├── data-model.md # Phase 1 output (/speckit.plan command) ├── quickstart.md # Phase 1 output (/speckit.plan command) ├── contracts/ # Phase 1 output (/speckit.plan command) │ └── sitemap-endpoint.md └── tasks.md # Phase 2 output (/speckit.tasks command - NOT created by /speckit.plan) ``` ### Source Code (repository root) ```text src/ ├── proxyScripts/ │ └── kmeContentSourceAdapter.js # MODIFIED: sitemap branch + token helper added ├── globalVariables/ │ ├── kme_CSA_settings.json # MODIFIED: 3 new fields (searchApiBaseUrl, tenant, proxyBaseUrl) │ └── kme_CSA_settings.json.example # MODIFIED: updated with new field placeholders └── server.js # NO CHANGE tests/ ├── unit/ │ └── proxy.test.js # MODIFIED: sitemap test cases added └── contract/ └── proxy-http.test.js # MODIFIED: sitemap HTTP contract tests added ``` **Structure Decision**: Single-project layout. No new directories. Only the proxy script, its settings JSON, and the existing test files are modified. --- ## Phase 0: Research Findings > Full research notes: [research.md](./research.md) | Research ID | Topic | Decision | |---|---|---| | R-001 | Token reuse | Inline shared `getValidToken()` helper in proxy script; branch on URL first | | R-002 | Search API response shape | Assume `{ items: [...] }`; verify against live API during implementation | | R-003 | xmlbuilder2 API | `xmlbuilder2({...}).ele('urlset',{xmlns:...})…doc.end({})` — no prettyPrint | | R-004 | Error mapping | Reuse `err.response` / `err.code === ECONNABORTED\|ERR_CANCELED` pattern | | R-005 | Settings validation | `requiredSitemapFields` guard before any async work → HTTP 500 | | R-006 | `loc` construction | `` `${proxyBaseUrl}?kmeURL=${encodeURIComponent(item['vkm:url'])}` `` | **Resolved NEEDS CLARIFICATION**: None remain. All decisions are documented. --- ## Phase 1: Design ### Data Model > Full data model: [data-model.md](./data-model.md) **Key entities**: - `KnowledgeItem` — raw search result with `vkm:url` (read-only, from upstream API) - `SitemapEntry` — `{ loc: string }` derived in-memory from `KnowledgeItem` - `SitemapDocument` — serialised XML output (`urlset` + `url` elements) - `OIDCTokenCache` — shared Redis store (unchanged; `hGet`/`hSet` pattern reused) - `kme_CSA_settings` — extended JSON settings (3 new fields) ### Contracts > Full contract: [contracts/sitemap-endpoint.md](./contracts/sitemap-endpoint.md) | Scenario | Status | Response | |---|---|---| | Search succeeds, items present | 200 | `application/xml` sitemap with `` entries | | Search succeeds, zero items | 200 | `application/xml` empty `` | | Missing settings field | 500 | `text/plain` descriptive message | | Upstream non-2xx | 502 | `text/plain` upstream error | | Upstream timeout | 504 | `text/plain` timeout message | ### Implementation Design **Entry point restructure** (single IIFE, no imports): ```javascript (async () => { // FR-001: Route on URL suffix if (req.url.endsWith('/sitemap.xml')) { await sitemapFlow(); } else { await oidcAuthFlow(); // existing logic, moved to inner async function } })(); ``` **`sitemapFlow` logic**: ```javascript async function sitemapFlow() { // FR-011: Validate required settings const required = ['searchApiBaseUrl', 'tenant', 'proxyBaseUrl']; for (const f of required) { if (!kme_CSA_settings[f]) { res.writeHead(500, { 'Content-Type': 'text/plain' }); res.end('Configuration error: missing required field: ' + f); return; } } // FR-003: Obtain valid OIDC token (shared helper with existing flow) const token = await getValidToken(); // throws on failure → caught by outer try/catch // FR-002: Call KME Knowledge Search Service const { searchApiBaseUrl, tenant, proxyBaseUrl } = kme_CSA_settings; const searchResponse = await axios.get( `${searchApiBaseUrl}/${tenant}`, { headers: { Authorization: `OIDC_id_token ${token}` }, timeout: 10_000, } ); // Extract items (R-002: assume { items: [...] } or bare array) const items = searchResponse.data.items ?? searchResponse.data ?? []; // FR-004, FR-005, FR-006, FR-008: Build sitemap XML const doc = xmlbuilder2({ version: '1.0', encoding: 'UTF-8' }); const urlset = doc.ele('urlset', { xmlns: 'http://www.sitemaps.org/schemas/sitemap/0.9' }); for (const item of items) { const vkmUrl = item['vkm:url']; if (!vkmUrl) continue; // FR-006: omit silently const loc = `${proxyBaseUrl}?kmeURL=${encodeURIComponent(vkmUrl)}`; urlset.ele('url').ele('loc').txt(loc).up().up(); } const xml = doc.end({ prettyPrint: false }); // FR-007: Respond res.writeHead(200, { 'Content-Type': 'application/xml' }); res.end(xml); } ``` **Error handling** (wrapping `sitemapFlow` catch): - `err.code === 'ECONNABORTED' || err.code === 'ERR_CANCELED'` → 504 - `err.response` defined → 502 `Search service error: HTTP ${err.response.status}` - other → 502 `Search service error: ${err.message}` **`getValidToken` helper** (shared inline function; extract from existing OIDC flow): Encapsulates steps 2–6 of the existing flow: - `hGet('authorization', 'token')` / `hGet('authorization', 'expiry')` - Cache hit → return token - Stampede guard → queue on in-flight promise - Cache miss → `axios.post(tokenUrl, ...)` → `hSet` both fields - Returns the `id_token` string; throws on failure **Token fetch failure in sitemap context**: If `getValidToken` throws, the outer catch returns `401 Unauthorized: ` (same as existing flow). ### Test Plan **Unit tests** (`tests/unit/proxy.test.js`) — new `describe('sitemap flow')` block: | Scenario | Mock | Assert | |---|---|---| | Happy path: items present | axios.get → `{ items: [{ 'vkm:url': '...' }] }` | 200, `application/xml`, `` | | Happy path: zero items | axios.get → `{ items: [] }` | 200, empty `` | | Items with empty vkm:url | mix of valid + empty | only non-empty items in output | | Missing `searchApiBaseUrl` | settings without field | 500, descriptive message | | Missing `tenant` | settings without field | 500, descriptive message | | Missing `proxyBaseUrl` | settings without field | 500, descriptive message | | Upstream 503 | axios.get rejects with `{ response: { status: 503 } }` | 502 | | Upstream timeout | axios.get rejects with `{ code: 'ECONNABORTED' }` | 504 | | Non-sitemap URL still works | req.url = '/' | existing 200 Authorized behaviour | **Contract tests** (`tests/contract/proxy-http.test.js`) — new `describe('sitemap endpoint')` block: | Scenario | Setup | Assert | |---|---|---| | Full round-trip: GET /sitemap.xml | Mock search server → 200 `{ items: [...] }` | 200, `application/xml`, valid XML with `` | | Empty results | Mock search server → 200 `{ items: [] }` | 200, `application/xml`, empty `` | | Search server returns 503 | Mock → 503 | 502 | | Search server hangs > 10 s | Mock → never respond | 504 | --- ## Complexity Tracking > No violations to justify. All gates pass. No entries required.