- Add contentFetchFlow() to proxy (FR-001 through FR-012) - Add extractArticleBody() helper with vkm:articleBody / articleBody fallback - Dynamic proxyBaseUrl derivation from x-forwarded-proto/host headers - Forward query/size/category params on /sitemap.xml requests - Add Accept: application/ld+json header to content API calls - Remove oidcAuthFlow() - unmatched requests now return 404 Not Found - Fix xmlbuilder2 import: default import, call as xmlbuilder2.create(...) - Version bump 0.2.0 → 0.3.0 - 45/45 tests passing Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
15 KiB
Implementation Plan: KME Article Content Fetch
Branch: 003-kme-content-fetch | Date: 2025-07-15 | Spec: spec.md
Input: Feature specification from specs/003-kme-content-fetch/spec.md
Summary
Add a new contentFetchFlow() to src/proxyScripts/kmeContentSourceAdapter.js that handles
requests carrying a ?kmeURL= query parameter. The flow validates the parameter, obtains an OIDC
token via the existing getValidToken(), performs a GET request to the kmeURL with
Authorization: OIDC_id_token {token}, extracts vkm:articleBody from the JSON-LD response, and
returns it as text/html. A new pure helper extractArticleBody(data) is added to
src/globalVariables/kmeContentSourceAdapterHelpers.js. No new files, no new npm dependencies.
Technical Context
Language/Version: Node.js ≥18, ESM ("type": "module")
Primary Dependencies: axios ^1.13 (HTTP client, already in context), redis ^5 (token cache, injected), xmlbuilder2 ^4 (sitemap, unrelated to this feature)
Storage: Redis (token cache only — managed by getValidToken(), not modified by this feature)
Testing: Node.js built-in test runner (node:test) — npm run test:unit, npm run test:contract
Target Platform: Node.js server (Linux/macOS); proxy script executed inside vm.createContext per request
Project Type: HTTP proxy adapter — monolithic VM-sandbox architecture
Performance Goals: End-to-end response ≤11 s (10 s upstream timeout + 1 s proxy overhead) per SC-001
Constraints: Zero new imports/exports in VM sandbox files; no new npm dependencies; no new src/ files
Scale/Scope: Two files modified (kmeContentSourceAdapter.js, kmeContentSourceAdapterHelpers.js); new unit tests in tests/unit/proxy.test.js; new contract tests in tests/contract/proxy-http.test.js
Constitution Check
GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.
| Gate | Status | Notes |
|---|---|---|
All business logic stays in src/proxyScripts/kmeContentSourceAdapter.js |
✅ PASS | contentFetchFlow() lives entirely in the adapter file |
Zero import/export in VM sandbox file |
✅ PASS | No imports added; all dependencies via injected context |
extractArticleBody is a pure utility → helper file |
✅ PASS | No state, no API calls — qualifies for kmeContentSourceAdapterHelpers.js |
No new files in src/ |
✅ PASS | Only two existing files are modified |
| No new npm dependencies | ✅ PASS | axios and URL/URLSearchParams already available in context |
| Helpers file uses literal function body pattern | ✅ PASS | New helper added before existing return { ... } block |
Authentication (getValidToken) stays in proxy script (called from adapter, not moved) |
✅ PASS | getValidToken() is invoked from contentFetchFlow() in the adapter |
Post-design re-check: All gates pass. No constitutional violations. No complexity tracking required.
Project Structure
Documentation (this feature)
specs/003-kme-content-fetch/
├── plan.md # This file
├── research.md # Phase 0 output
├── data-model.md # Phase 1 output
├── quickstart.md # Phase 1 output
├── contracts/
│ └── http-content-fetch.md # HTTP request/response contract
└── tasks.md # Phase 2 output (/speckit.tasks — NOT created here)
Source Code (repository root)
src/
├── proxyScripts/
│ └── kmeContentSourceAdapter.js # MODIFIED: add contentFetchFlow(), update routing
├── globalVariables/
│ └── kmeContentSourceAdapterHelpers.js # MODIFIED: add extractArticleBody()
├── logger.js # unchanged
└── server.js # unchanged
tests/
├── unit/
│ └── proxy.test.js # MODIFIED: add content-fetch describe blocks
└── contract/
└── proxy-http.test.js # MODIFIED: add content-fetch contract tests
Structure Decision: Single-project monolith. Two existing source files modified; two existing test
files extended. No new files in src/. The VM sandbox pattern and helper file pattern are preserved
exactly as documented in the constitution.
Phase 0: Research Findings → research.md
See research.md for full decision log. Summary:
- URL parameter extraction:
new URL(req.url, 'http://localhost').searchParams.get('kmeURL')— confirmed safe from VM context research;URLis injected at server.js line 19. - URL validation:
new URL(kmeURL)+ protocol check in try/catch — cleanly handles FR-007/FR-008 in one guard. - Axios error handling: Confirmed
ECONNABORTED/ERR_CANCELEDfor timeout;err.response.statusavailable for all HTTP errors; JSON auto-parsed whenContent-Type: application/json. - JSON-LD parsing:
response.datais an object when axios auto-parses; fallbackJSON.parse()needed for non-JSON content-types; non-object → 502. - No unknowns remaining: All NEEDS CLARIFICATION resolved. Research complete.
Phase 1: Design
Routing Change (kmeContentSourceAdapter.js)
Add a new else if branch between the existing sitemap check and the oidcAuthFlow fallback:
// Entry point — URL routing
try {
if (req.url.endsWith('/sitemap.xml')) {
await sitemapFlow();
} else if (new URL(req.url, 'http://localhost').searchParams.has('kmeURL')) {
await contentFetchFlow(); // ← NEW
} else {
await oidcAuthFlow();
}
} catch (err) { /* existing outer catch → 401 */ }
contentFetchFlow() is fully self-contained — all errors are caught internally and never propagate to the outer catch.
contentFetchFlow() — Complete Logic
async function contentFetchFlow() {
// Step 1: Extract kmeURL (FR-001, FR-002)
const kmeURL = new URL(req.url, 'http://localhost').searchParams.get('kmeURL') ?? '';
// Step 2: Validate — absent / empty (FR-007)
if (!kmeURL.trim()) {
res.writeHead(400, { 'Content-Type': 'text/plain' });
res.end('Bad Request: kmeURL parameter is required');
return;
}
// Step 3: Validate — well-formed absolute http/https URL (FR-008)
try {
const u = new URL(kmeURL);
if (u.protocol !== 'http:' && u.protocol !== 'https:') throw new Error();
} catch {
res.writeHead(400, { 'Content-Type': 'text/plain' });
res.end('Bad Request: kmeURL must be a well-formed absolute http/https URL');
return;
}
// Step 4: Validate OIDC settings (config guard, returns 500 for missing config)
const missingField = kmeContentSourceAdapterHelpers.validateSettings(
kme_CSA_settings,
['tokenUrl', 'username', 'password', 'clientId', 'scope'],
);
if (missingField) {
console.error({ message: 'Content fetch: config error', missingField });
res.writeHead(500, { 'Content-Type': 'text/plain' });
res.end('Configuration error: missing required field: ' + missingField);
return;
}
// Step 5: Obtain OIDC token (FR-003, FR-011)
let token;
try {
token = await kmeContentSourceAdapterHelpers.getValidToken(req.url, req.method);
} catch (tokenErr) {
console.error({ message: 'Content fetch: token acquisition failed', error: tokenErr.message });
res.writeHead(502, { 'Content-Type': 'text/plain' });
res.end('Bad Gateway: token acquisition failed');
return;
}
// Step 6: GET kmeURL verbatim with auth header (FR-002, FR-003, FR-004)
let response;
try {
console.debug({ message: 'Content fetch: fetching article', kmeURL });
response = await axios.get(kmeURL, {
headers: { Authorization: `OIDC_id_token ${token}` },
timeout: 10000,
});
} catch (fetchErr) {
if (fetchErr.code === 'ECONNABORTED' || fetchErr.code === 'ERR_CANCELED') {
console.error({ message: 'Content fetch: upstream timeout', code: fetchErr.code });
res.writeHead(502, { 'Content-Type': 'text/plain' });
res.end('Bad Gateway: upstream request timed out');
} else if (fetchErr.response) {
const status = fetchErr.response.status;
console.error({ message: 'Content fetch: upstream HTTP error', status });
if (status >= 400 && status < 500) {
res.writeHead(404, { 'Content-Type': 'text/plain' });
res.end('Not Found: article not found at upstream');
} else {
res.writeHead(502, { 'Content-Type': 'text/plain' });
res.end('Bad Gateway: upstream error HTTP ' + status);
}
} else {
console.error({ message: 'Content fetch: network error', error: fetchErr.message });
res.writeHead(502, { 'Content-Type': 'text/plain' });
res.end('Bad Gateway: ' + fetchErr.message);
}
return;
}
// Step 7: Parse body — handle non-JSON content-type (FR-005, FR-010)
let data = response.data;
if (typeof data === 'string') {
try {
data = JSON.parse(data);
} catch {
console.error({ message: 'Content fetch: unparseable response body', kmeURL });
res.writeHead(502, { 'Content-Type': 'text/plain' });
res.end('Bad Gateway: unparseable response from upstream');
return;
}
}
if (typeof data !== 'object' || data === null) {
console.error({ message: 'Content fetch: unexpected non-object response', kmeURL });
res.writeHead(502, { 'Content-Type': 'text/plain' });
res.end('Bad Gateway: unexpected response from upstream');
return;
}
// Step 8: Extract vkm:articleBody (FR-005, FR-009)
const articleBody = kmeContentSourceAdapterHelpers.extractArticleBody(data);
if (!articleBody) {
console.error({ message: 'Content fetch: vkm:articleBody absent or empty', kmeURL });
res.writeHead(404, { 'Content-Type': 'text/plain' });
res.end('Not Found: article body not present in upstream response');
return;
}
// Step 9: Return article HTML (FR-006)
console.info({ message: 'Content fetch: article fetched successfully', kmeURL });
res.writeHead(200, { 'Content-Type': 'text/html' });
res.end(articleBody);
}
extractArticleBody(data) — New Helper
Add to kmeContentSourceAdapterHelpers.js before the existing return { ... } block:
/**
* Extracts the vkm:articleBody string from a KME Content Service JSON-LD response.
* Returns null if the field is absent, null, not a string, or an empty/whitespace string.
* @param {object} data – parsed JSON-LD response from the KME Content Service
* @returns {string|null}
*/
function extractArticleBody(data) {
if (!data || typeof data !== 'object') return null;
const body = data['vkm:articleBody'];
if (body == null || typeof body !== 'string' || body.trim() === '') return null;
return body;
}
Update the return { ... } at the bottom of the helpers file to export the new function:
return {
validateSettings,
extractHydraItems,
buildSitemapXml,
getValidToken,
extractArticleBody, // ← NEW
};
Error Response Matrix
| Condition | HTTP Status | Response Body |
|---|---|---|
kmeURL absent or empty |
400 | Bad Request: kmeURL parameter is required |
kmeURL not a well-formed absolute http/https URL |
400 | Bad Request: kmeURL must be a well-formed absolute http/https URL |
| Missing OIDC config field | 500 | Configuration error: missing required field: {field} |
| Token acquisition failure | 502 | Bad Gateway: token acquisition failed |
| Upstream 4xx response | 404 | Not Found: article not found at upstream |
| Upstream 5xx response | 502 | Bad Gateway: upstream error HTTP {status} |
Upstream timeout (ECONNABORTED/ERR_CANCELED) |
502 | Bad Gateway: upstream request timed out |
Network error (no err.response) |
502 | Bad Gateway: {err.message} |
| Response body unparseable as JSON | 502 | Bad Gateway: unparseable response from upstream |
| Non-object response body | 502 | Bad Gateway: unexpected response from upstream |
vkm:articleBody absent, null, or empty |
404 | Not Found: article body not present in upstream response |
| Success | 200 text/html |
article body HTML |
Test Coverage Plan
Unit tests (add to tests/unit/proxy.test.js):
| Describe block | Test case | Verifies |
|---|---|---|
US-content-fetch: happy path |
cached token → 200 HTML body | FR-001, FR-005, FR-006 |
US-content-fetch: happy path |
cache miss → token fetch → 200 HTML body | FR-003 |
US-content-fetch: happy path |
expired token → refresh → 200 HTML body | FR-003 |
US-content-fetch: input validation |
no kmeURL → oidcAuthFlow (unchanged 200) |
FR-012 |
US-content-fetch: input validation |
kmeURL empty string → 400 |
FR-007 |
US-content-fetch: input validation |
kmeURL whitespace → 400 |
FR-007 |
US-content-fetch: input validation |
kmeURL relative URL → 400 |
FR-008 |
US-content-fetch: input validation |
kmeURL non-http protocol (ftp:) → 400 |
FR-008 |
US-content-fetch: input validation |
kmeURL malformed string → 400 |
FR-008 |
US-content-fetch: token failure |
getValidToken throws → 502 |
FR-011 |
US-content-fetch: upstream errors |
upstream 404 → 404 | FR-009 |
US-content-fetch: upstream errors |
upstream 410 → 404 | FR-009 |
US-content-fetch: upstream errors |
upstream 503 → 502 | FR-010 |
US-content-fetch: upstream errors |
timeout ECONNABORTED → 502 |
FR-010 |
US-content-fetch: upstream errors |
timeout ERR_CANCELED → 502 |
FR-010 |
US-content-fetch: upstream errors |
network error (no err.response) → 502 |
FR-010 |
US-content-fetch: body parsing |
unparseable string body → 502 | FR-010 |
US-content-fetch: body parsing |
vkm:articleBody absent → 404 |
FR-009 |
US-content-fetch: body parsing |
vkm:articleBody null → 404 |
FR-009 |
US-content-fetch: body parsing |
vkm:articleBody empty string → 404 |
FR-009 |
US-content-fetch: body parsing |
vkm:articleBody whitespace → 404 |
FR-009 |
US-content-fetch: passthrough preserved |
no kmeURL, not sitemap → 200 'Authorized' |
FR-012 |
extractArticleBody helper |
returns body string | FR-005 |
extractArticleBody helper |
null data → null | FR-005 |
extractArticleBody helper |
no vkm:articleBody field → null |
FR-009 |
extractArticleBody helper |
empty string → null | FR-009 |
extractArticleBody helper |
whitespace string → null | FR-009 |
Contract tests (add to tests/contract/proxy-http.test.js):
| Test case | Setup | Verifies |
|---|---|---|
valid kmeURL → real mock HTTP server returning JSON-LD with vkm:articleBody → 200 HTML |
real HTTP server, real token server | SC-001, FR-006 |
| real mock server returning 404 → proxy returns 404 | real 404 HTTP server | FR-009 |
| real mock server returning 503 → proxy returns 502 | real 503 HTTP server | FR-010 |
| non-responding server → proxy returns 502 within 12 s | real server that never responds | FR-010 |
extractArticleBody — Edge Case Coverage
| Input | Expected output |
|---|---|
{ 'vkm:articleBody': '<p>Hello</p>' } |
'<p>Hello</p>' |
{ 'vkm:articleBody': '' } |
null |
{ 'vkm:articleBody': ' ' } |
null |
{ 'vkm:articleBody': null } |
null |
{ 'vkm:articleBody': undefined } (field absent) |
null |
{} (field absent) |
null |
null |
null |
'string' (non-object) |
null |