# Implementation Plan: KME Article Content Fetch

**Branch**: `003-kme-content-fetch` | **Date**: 2025-07-15 | **Spec**: [spec.md](spec.md)
**Input**: Feature specification from `specs/003-kme-content-fetch/spec.md`

## Summary

Add a new `contentFetchFlow()` to `src/proxyScripts/kmeContentSourceAdapter.js` that handles requests carrying a `?kmeURL=` query parameter. The flow validates the parameter, obtains an OIDC token via the existing `getValidToken()`, performs a GET request to the `kmeURL` with `Authorization: OIDC_id_token {token}`, extracts `vkm:articleBody` from the JSON-LD response, and returns it as `text/html`. A new pure helper `extractArticleBody(data)` is added to `src/globalVariables/kmeContentSourceAdapterHelpers.js`. No new files, no new npm dependencies.

## Technical Context

**Language/Version**: Node.js ≥18, ESM (`"type": "module"`)
**Primary Dependencies**: `axios ^1.13` (HTTP client, already in context), `redis ^5` (token cache, injected), `xmlbuilder2 ^4` (sitemap, unrelated to this feature)
**Storage**: Redis (token cache only; managed by `getValidToken()`, not modified by this feature)
**Testing**: Node.js built-in test runner (`node:test`): `npm run test:unit`, `npm run test:contract`
**Target Platform**: Node.js server (Linux/macOS); proxy script executed inside `vm.createContext` per request
**Project Type**: HTTP proxy adapter with a monolithic VM-sandbox architecture
**Performance Goals**: End-to-end response ≤11 s (10 s upstream timeout + 1 s proxy overhead) per SC-001
**Constraints**: Zero new imports/exports in VM sandbox files; no new npm dependencies; no new `src/` files
**Scale/Scope**: Two files modified (`kmeContentSourceAdapter.js`, `kmeContentSourceAdapterHelpers.js`); new unit tests in `tests/unit/proxy.test.js`; new contract tests in `tests/contract/proxy-http.test.js`

## Constitution Check

*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*

| Gate | Status | Notes |
|------|--------|-------|
| All business logic stays in `src/proxyScripts/kmeContentSourceAdapter.js` | ✅ PASS | `contentFetchFlow()` lives entirely in the adapter file |
| Zero `import`/`export` in VM sandbox file | ✅ PASS | No imports added; all dependencies via injected context |
| `extractArticleBody` is a pure utility → helper file | ✅ PASS | No state, no API calls; qualifies for `kmeContentSourceAdapterHelpers.js` |
| No new files in `src/` | ✅ PASS | Only two existing files are modified |
| No new npm dependencies | ✅ PASS | `axios` and `URL`/`URLSearchParams` already available in context |
| Helpers file uses literal function body pattern | ✅ PASS | New helper added before existing `return { ... }` block |
| Authentication (`getValidToken`) stays in proxy script (called from adapter, not moved) | ✅ PASS | `getValidToken()` is invoked from `contentFetchFlow()` in the adapter |

**Post-design re-check**: All gates pass. No constitutional violations. No complexity tracking required.

## Project Structure

### Documentation (this feature)

```text
specs/003-kme-content-fetch/
├── plan.md                      # This file
├── research.md                  # Phase 0 output
├── data-model.md                # Phase 1 output
├── quickstart.md                # Phase 1 output
├── contracts/
│   └── http-content-fetch.md    # HTTP request/response contract
└── tasks.md                     # Phase 2 output (/speckit.tasks; NOT created here)
```

### Source Code (repository root)

```text
src/
├── proxyScripts/
│   └── kmeContentSourceAdapter.js          # MODIFIED: add contentFetchFlow(), update routing
├── globalVariables/
│   └── kmeContentSourceAdapterHelpers.js   # MODIFIED: add extractArticleBody()
├── logger.js                               # unchanged
└── server.js                               # unchanged

tests/
├── unit/
│   └── proxy.test.js                       # MODIFIED: add content-fetch describe blocks
└── contract/
    └── proxy-http.test.js                  # MODIFIED: add content-fetch contract tests
```

**Structure Decision**: Single-project monolith.
Two existing source files modified; two existing test files extended. No new files in `src/`. The VM sandbox pattern and helper file pattern are preserved exactly as documented in the constitution.

---

## Phase 0: Research Findings → `research.md`

See [research.md](research.md) for full decision log. Summary:

- **URL parameter extraction**: `new URL(req.url, 'http://localhost').searchParams.get('kmeURL')`; confirmed safe from VM context research; `URL` is injected at server.js line 19.
- **URL validation**: `new URL(kmeURL)` + protocol check in try/catch; cleanly handles FR-007/FR-008 in one guard.
- **Axios error handling**: Confirmed `ECONNABORTED`/`ERR_CANCELED` for timeout; `err.response.status` available for all HTTP errors; JSON auto-parsed when `Content-Type: application/json`.
- **JSON-LD parsing**: `response.data` is an object when axios auto-parses; fallback `JSON.parse()` needed for non-JSON content-types; non-object → 502.
- **No unknowns remaining**: All NEEDS CLARIFICATION resolved. Research complete.

---

## Phase 1: Design

### Routing Change (`kmeContentSourceAdapter.js`)

Add a new `else if` branch between the existing sitemap check and the `oidcAuthFlow` fallback:

```javascript
// Entry point: URL routing
try {
  if (req.url.endsWith('/sitemap.xml')) {
    await sitemapFlow();
  } else if (new URL(req.url, 'http://localhost').searchParams.has('kmeURL')) {
    await contentFetchFlow();   // ← NEW
  } else {
    await oidcAuthFlow();
  }
} catch (err) { /* existing outer catch → 401 */ }
```

`contentFetchFlow()` is fully self-contained: all errors are caught internally and never propagate to the outer catch.

### `contentFetchFlow()`: Complete Logic

```javascript
async function contentFetchFlow() {
  // Step 1: Extract kmeURL (FR-001, FR-002)
  const kmeURL = new URL(req.url, 'http://localhost').searchParams.get('kmeURL') ?? '';

  // Step 2: Validate: absent / empty (FR-007)
  if (!kmeURL.trim()) {
    res.writeHead(400, { 'Content-Type': 'text/plain' });
    res.end('Bad Request: kmeURL parameter is required');
    return;
  }

  // Step 3: Validate: well-formed absolute http/https URL (FR-008)
  try {
    const u = new URL(kmeURL);
    if (u.protocol !== 'http:' && u.protocol !== 'https:') throw new Error();
  } catch {
    res.writeHead(400, { 'Content-Type': 'text/plain' });
    res.end('Bad Request: kmeURL must be a well-formed absolute http/https URL');
    return;
  }

  // Step 4: Validate OIDC settings (config guard, returns 500 for missing config)
  const missingField = kmeContentSourceAdapterHelpers.validateSettings(
    kme_CSA_settings,
    ['tokenUrl', 'username', 'password', 'clientId', 'scope'],
  );
  if (missingField) {
    console.error({ message: 'Content fetch: config error', missingField });
    res.writeHead(500, { 'Content-Type': 'text/plain' });
    res.end('Configuration error: missing required field: ' + missingField);
    return;
  }

  // Step 5: Obtain OIDC token (FR-003, FR-011)
  let token;
  try {
    token = await kmeContentSourceAdapterHelpers.getValidToken(req.url, req.method);
  } catch (tokenErr) {
    console.error({ message: 'Content fetch: token acquisition failed', error: tokenErr.message });
    res.writeHead(502, { 'Content-Type': 'text/plain' });
    res.end('Bad Gateway: token acquisition failed');
    return;
  }

  // Step 6: GET kmeURL verbatim with auth header (FR-002, FR-003, FR-004)
  let response;
  try {
    console.debug({ message: 'Content fetch: fetching article', kmeURL });
    response = await axios.get(kmeURL, {
      headers: { Authorization: `OIDC_id_token ${token}` },
      timeout: 10000,
    });
  } catch (fetchErr) {
    if (fetchErr.code === 'ECONNABORTED' || fetchErr.code === 'ERR_CANCELED') {
      console.error({ message: 'Content fetch: upstream timeout', code: fetchErr.code });
      res.writeHead(502, { 'Content-Type': 'text/plain' });
      res.end('Bad Gateway: upstream request timed out');
    } else if (fetchErr.response) {
      const status = fetchErr.response.status;
      console.error({ message: 'Content fetch: upstream HTTP error', status });
      if (status >= 400 && status < 500) {
        res.writeHead(404, { 'Content-Type': 'text/plain' });
        res.end('Not Found: article not found at upstream');
      } else {
        res.writeHead(502, { 'Content-Type': 'text/plain' });
        res.end('Bad Gateway: upstream error HTTP ' + status);
      }
    } else {
      console.error({ message: 'Content fetch: network error', error: fetchErr.message });
      res.writeHead(502, { 'Content-Type': 'text/plain' });
      res.end('Bad Gateway: ' + fetchErr.message);
    }
    return;
  }

  // Step 7: Parse body, handling non-JSON content-type (FR-005, FR-010)
  let data = response.data;
  if (typeof data === 'string') {
    try {
      data = JSON.parse(data);
    } catch {
      console.error({ message: 'Content fetch: unparseable response body', kmeURL });
      res.writeHead(502, { 'Content-Type': 'text/plain' });
      res.end('Bad Gateway: unparseable response from upstream');
      return;
    }
  }
  if (typeof data !== 'object' || data === null) {
    console.error({ message: 'Content fetch: unexpected non-object response', kmeURL });
    res.writeHead(502, { 'Content-Type': 'text/plain' });
    res.end('Bad Gateway: unexpected response from upstream');
    return;
  }

  // Step 8: Extract vkm:articleBody (FR-005, FR-009)
  const articleBody = kmeContentSourceAdapterHelpers.extractArticleBody(data);
  if (!articleBody) {
    console.error({ message: 'Content fetch: vkm:articleBody absent or empty', kmeURL });
    res.writeHead(404, { 'Content-Type': 'text/plain' });
    res.end('Not Found: article body not present in upstream response');
    return;
  }

  // Step 9: Return article HTML (FR-006)
  console.info({ message: 'Content fetch: article fetched successfully', kmeURL });
  res.writeHead(200, { 'Content-Type': 'text/html' });
  res.end(articleBody);
}
```

### `extractArticleBody(data)`: New Helper

Add to `kmeContentSourceAdapterHelpers.js` before the existing `return { ... }` block:

```javascript
/**
 * Extracts the vkm:articleBody string from a KME Content Service JSON-LD response.
 * Returns null if the field is absent, null, not a string, or an empty/whitespace string.
 * @param {object} data - parsed JSON-LD response from the KME Content Service
 * @returns {string|null}
 */
function extractArticleBody(data) {
  if (!data || typeof data !== 'object') return null;
  const body = data['vkm:articleBody'];
  if (body == null || typeof body !== 'string' || body.trim() === '') return null;
  return body;
}
```

Update the `return { ... }` at the bottom of the helpers file to export the new function:

```javascript
return {
  validateSettings,
  extractHydraItems,
  buildSitemapXml,
  getValidToken,
  extractArticleBody,   // ← NEW
};
```

### Error Response Matrix

| Condition | HTTP Status | Response Body |
|-----------|-------------|---------------|
| `kmeURL` absent or empty | 400 | `Bad Request: kmeURL parameter is required` |
| `kmeURL` not a well-formed absolute http/https URL | 400 | `Bad Request: kmeURL must be a well-formed absolute http/https URL` |
| Missing OIDC config field | 500 | `Configuration error: missing required field: {field}` |
| Token acquisition failure | 502 | `Bad Gateway: token acquisition failed` |
| Upstream 4xx response | 404 | `Not Found: article not found at upstream` |
| Upstream 5xx response | 502 | `Bad Gateway: upstream error HTTP {status}` |
| Upstream timeout (`ECONNABORTED`/`ERR_CANCELED`) | 502 | `Bad Gateway: upstream request timed out` |
| Network error (no `err.response`) | 502 | `Bad Gateway: {err.message}` |
| Response body unparseable as JSON | 502 | `Bad Gateway: unparseable response from upstream` |
| Non-object response body | 502 | `Bad Gateway: unexpected response from upstream` |
| `vkm:articleBody` absent, null, or empty | 404 | `Not Found: article body not present in upstream response` |
| Success | 200 `text/html` | article body HTML |

### Test Coverage Plan

**Unit tests** (add to `tests/unit/proxy.test.js`):

| Describe block | Test case | Verifies |
|---------------|-----------|---------|
| `US-content-fetch: happy path` | cached token → 200 HTML body | FR-001, FR-005, FR-006 |
| `US-content-fetch: happy path` | cache miss → token fetch → 200 HTML body | FR-003 |
| `US-content-fetch: happy path` | expired token → refresh → 200 HTML body | FR-003 |
| `US-content-fetch: input validation` | no `kmeURL` → oidcAuthFlow (unchanged 200) | FR-012 |
| `US-content-fetch: input validation` | `kmeURL` empty string → 400 | FR-007 |
| `US-content-fetch: input validation` | `kmeURL` whitespace → 400 | FR-007 |
| `US-content-fetch: input validation` | `kmeURL` relative URL → 400 | FR-008 |
| `US-content-fetch: input validation` | `kmeURL` non-http protocol (`ftp:`) → 400 | FR-008 |
| `US-content-fetch: input validation` | `kmeURL` malformed string → 400 | FR-008 |
| `US-content-fetch: token failure` | `getValidToken` throws → 502 | FR-011 |
| `US-content-fetch: upstream errors` | upstream 404 → 404 | FR-009 |
| `US-content-fetch: upstream errors` | upstream 410 → 404 | FR-009 |
| `US-content-fetch: upstream errors` | upstream 503 → 502 | FR-010 |
| `US-content-fetch: upstream errors` | timeout `ECONNABORTED` → 502 | FR-010 |
| `US-content-fetch: upstream errors` | timeout `ERR_CANCELED` → 502 | FR-010 |
| `US-content-fetch: upstream errors` | network error (no `err.response`) → 502 | FR-010 |
| `US-content-fetch: body parsing` | unparseable string body → 502 | FR-010 |
| `US-content-fetch: body parsing` | `vkm:articleBody` absent → 404 | FR-009 |
| `US-content-fetch: body parsing` | `vkm:articleBody` null → 404 | FR-009 |
| `US-content-fetch: body parsing` | `vkm:articleBody` empty string → 404 | FR-009 |
| `US-content-fetch: body parsing` | `vkm:articleBody` whitespace → 404 | FR-009 |
| `US-content-fetch: passthrough preserved` | no `kmeURL`, not sitemap → 200 'Authorized' | FR-012 |
| `extractArticleBody helper` | returns body string | FR-005 |
| `extractArticleBody helper` | null data → null | FR-005 |
| `extractArticleBody helper` | no `vkm:articleBody` field → null | FR-009 |
| `extractArticleBody helper` | empty string → null | FR-009 |
| `extractArticleBody helper` | whitespace string → null | FR-009 |

**Contract tests** (add to `tests/contract/proxy-http.test.js`):

| Test case | Setup | Verifies |
|-----------|-------|---------|
| valid `kmeURL` → real mock HTTP server returning JSON-LD with `vkm:articleBody` → 200 HTML | real HTTP server, real token server | SC-001, FR-006 |
| real mock server returning 404 → proxy returns 404 | real 404 HTTP server | FR-009 |
| real mock server returning 503 → proxy returns 502 | real 503 HTTP server | FR-010 |
| non-responding server → proxy returns 502 within 12 s | real server that never responds | FR-010 |

### `extractArticleBody`: Edge Case Coverage

| Input | Expected output |
|-------|----------------|
| `{ 'vkm:articleBody': '<p>Hello</p>' }` | `'<p>Hello</p>'` |
| `{ 'vkm:articleBody': '' }` | `null` |
| `{ 'vkm:articleBody': ' ' }` | `null` |
| `{ 'vkm:articleBody': null }` | `null` |
| `{ 'vkm:articleBody': undefined }` (field absent) | `null` |
| `{}` (field absent) | `null` |
| `null` | `null` |
| `'string'` (non-object) | `null` |
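
As a quick sanity check on the edge cases above, the following sketch runs a copy of `extractArticleBody` against a small table of inputs. The `edgeCases` array and the `runEdgeCases` function are hypothetical names introduced only for illustration, and the `<p>Hello</p>` fragment is an assumed sample body. The real tests are expected to live in `tests/unit/proxy.test.js` under the `node:test` runner, so treat this as a standalone approximation rather than the final test code.

```javascript
// Copy of the helper from this plan, repeated so the sketch runs on its own.
function extractArticleBody(data) {
  if (!data || typeof data !== 'object') return null;
  const body = data['vkm:articleBody'];
  if (body == null || typeof body !== 'string' || body.trim() === '') return null;
  return body;
}

// Hypothetical case list mirroring the edge-case table (names are illustrative).
const edgeCases = [
  { label: 'HTML body', input: { 'vkm:articleBody': '<p>Hello</p>' }, expected: '<p>Hello</p>' },
  { label: 'empty string', input: { 'vkm:articleBody': '' }, expected: null },
  { label: 'whitespace only', input: { 'vkm:articleBody': ' ' }, expected: null },
  { label: 'null value', input: { 'vkm:articleBody': null }, expected: null },
  { label: 'undefined value', input: { 'vkm:articleBody': undefined }, expected: null },
  { label: 'field absent', input: {}, expected: null },
  { label: 'null data', input: null, expected: null },
  { label: 'non-object data', input: 'string', expected: null },
];

// Runs every case and records whether the helper returned the expected value.
function runEdgeCases(cases) {
  return cases.map(({ label, input, expected }) => {
    const actual = extractArticleBody(input);
    return { label, expected, actual, pass: actual === expected };
  });
}

const results = runEdgeCases(edgeCases);
const failures = results.filter((r) => !r.pass);

console.log(`${results.length - failures.length}/${results.length} edge cases passed`);
for (const f of failures) {
  console.log(`FAILED: ${f.label}`);
}
```

Because the helper is pure, the same case list could be reused inside a `describe`/`it` block without any mocking of `axios`, Redis, or the VM context.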