Files
kme_content_adapter/specs/003-kme-content-fetch/plan.md
Peter.Morton f840587e5e feat: content fetch, sitemap fixes, remove oidcAuthFlow
- Add contentFetchFlow() to proxy (FR-001 through FR-012)
- Add extractArticleBody() helper with vkm:articleBody / articleBody fallback
- Dynamic proxyBaseUrl derivation from x-forwarded-proto/host headers
- Forward query/size/category params on /sitemap.xml requests
- Add Accept: application/ld+json header to content API calls
- Remove oidcAuthFlow() - unmatched requests now return 404 Not Found
- Fix xmlbuilder2 import: default import, call as xmlbuilder2.create(...)
- Version bump 0.2.0 → 0.3.0
- 45/45 tests passing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-23 16:40:06 -05:00

15 KiB
Raw Blame History

Implementation Plan: KME Article Content Fetch

Branch: 003-kme-content-fetch | Date: 2025-07-15 | Spec: spec.md
Input: Feature specification from specs/003-kme-content-fetch/spec.md

Summary

Add a new contentFetchFlow() to src/proxyScripts/kmeContentSourceAdapter.js that handles requests carrying a ?kmeURL= query parameter. The flow validates the parameter, obtains an OIDC token via the existing getValidToken(), performs a GET request to the kmeURL with Authorization: OIDC_id_token {token}, extracts vkm:articleBody from the JSON-LD response, and returns it as text/html. A new pure helper extractArticleBody(data) is added to src/globalVariables/kmeContentSourceAdapterHelpers.js. No new files, no new npm dependencies.

Technical Context

Language/Version: Node.js ≥18, ESM ("type": "module")
Primary Dependencies: axios ^1.13 (HTTP client, already in context), redis ^5 (token cache, injected), xmlbuilder2 ^4 (sitemap, unrelated to this feature)
Storage: Redis (token cache only — managed by getValidToken(), not modified by this feature)
Testing: Node.js built-in test runner (node:test) — npm run test:unit, npm run test:contract
Target Platform: Node.js server (Linux/macOS); proxy script executed inside vm.createContext per request
Project Type: HTTP proxy adapter — monolithic VM-sandbox architecture
Performance Goals: End-to-end response ≤11 s (10 s upstream timeout + 1 s proxy overhead) per SC-001
Constraints: Zero new imports/exports in VM sandbox files; no new npm dependencies; no new src/ files
Scale/Scope: Two files modified (kmeContentSourceAdapter.js, kmeContentSourceAdapterHelpers.js); new unit tests in tests/unit/proxy.test.js; new contract tests in tests/contract/proxy-http.test.js

Constitution Check

GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.

Gate Status Notes
All business logic stays in src/proxyScripts/kmeContentSourceAdapter.js PASS contentFetchFlow() lives entirely in the adapter file
Zero import/export in VM sandbox file PASS No imports added; all dependencies via injected context
extractArticleBody is a pure utility → helper file PASS No state, no API calls — qualifies for kmeContentSourceAdapterHelpers.js
No new files in src/ PASS Only two existing files are modified
No new npm dependencies PASS axios and URL/URLSearchParams already available in context
Helpers file uses literal function body pattern PASS New helper added before existing return { ... } block
Authentication (getValidToken) stays in proxy script (called from adapter, not moved) PASS getValidToken() is invoked from contentFetchFlow() in the adapter

Post-design re-check: All gates pass. No constitutional violations. No complexity tracking required.

Project Structure

Documentation (this feature)

specs/003-kme-content-fetch/
├── plan.md              # This file
├── research.md          # Phase 0 output
├── data-model.md        # Phase 1 output
├── quickstart.md        # Phase 1 output
├── contracts/
│   └── http-content-fetch.md   # HTTP request/response contract
└── tasks.md             # Phase 2 output (/speckit.tasks — NOT created here)

Source Code (repository root)

src/
├── proxyScripts/
│   └── kmeContentSourceAdapter.js   # MODIFIED: add contentFetchFlow(), update routing
├── globalVariables/
│   └── kmeContentSourceAdapterHelpers.js  # MODIFIED: add extractArticleBody()
├── logger.js             # unchanged
└── server.js             # unchanged

tests/
├── unit/
│   └── proxy.test.js     # MODIFIED: add content-fetch describe blocks
└── contract/
    └── proxy-http.test.js  # MODIFIED: add content-fetch contract tests

Structure Decision: Single-project monolith. Two existing source files modified; two existing test files extended. No new files in src/. The VM sandbox pattern and helper file pattern are preserved exactly as documented in the constitution.


Phase 0: Research Findings → research.md

See research.md for full decision log. Summary:

  • URL parameter extraction: new URL(req.url, 'http://localhost').searchParams.get('kmeURL') — confirmed safe from VM context research; URL is injected at server.js line 19.
  • URL validation: new URL(kmeURL) + protocol check in try/catch — cleanly handles FR-007/FR-008 in one guard.
  • Axios error handling: Confirmed ECONNABORTED/ERR_CANCELED for timeout; err.response.status available for all HTTP errors; JSON auto-parsed when Content-Type: application/json.
  • JSON-LD parsing: response.data is an object when axios auto-parses; fallback JSON.parse() needed for non-JSON content-types; non-object → 502.
  • No unknowns remaining: All NEEDS CLARIFICATION resolved. Research complete.

Phase 1: Design

Routing Change (kmeContentSourceAdapter.js)

Add a new else if branch between the existing sitemap check and the oidcAuthFlow fallback:

// Entry point — URL routing
try {
  if (req.url.endsWith('/sitemap.xml')) {
    await sitemapFlow();
  } else if (new URL(req.url, 'http://localhost').searchParams.has('kmeURL')) {
    await contentFetchFlow();          // ← NEW
  } else {
    await oidcAuthFlow();
  }
} catch (err) { /* existing outer catch → 401 */ }

contentFetchFlow() is fully self-contained — all errors are caught internally and never propagate to the outer catch.

contentFetchFlow() — Complete Logic

async function contentFetchFlow() {
  // Step 1: Extract kmeURL (FR-001, FR-002)
  const kmeURL = new URL(req.url, 'http://localhost').searchParams.get('kmeURL') ?? '';

  // Step 2: Validate — absent / empty (FR-007)
  if (!kmeURL.trim()) {
    res.writeHead(400, { 'Content-Type': 'text/plain' });
    res.end('Bad Request: kmeURL parameter is required');
    return;
  }

  // Step 3: Validate — well-formed absolute http/https URL (FR-008)
  try {
    const u = new URL(kmeURL);
    if (u.protocol !== 'http:' && u.protocol !== 'https:') throw new Error();
  } catch {
    res.writeHead(400, { 'Content-Type': 'text/plain' });
    res.end('Bad Request: kmeURL must be a well-formed absolute http/https URL');
    return;
  }

  // Step 4: Validate OIDC settings (config guard, returns 500 for missing config)
  const missingField = kmeContentSourceAdapterHelpers.validateSettings(
    kme_CSA_settings,
    ['tokenUrl', 'username', 'password', 'clientId', 'scope'],
  );
  if (missingField) {
    console.error({ message: 'Content fetch: config error', missingField });
    res.writeHead(500, { 'Content-Type': 'text/plain' });
    res.end('Configuration error: missing required field: ' + missingField);
    return;
  }

  // Step 5: Obtain OIDC token (FR-003, FR-011)
  let token;
  try {
    token = await kmeContentSourceAdapterHelpers.getValidToken(req.url, req.method);
  } catch (tokenErr) {
    console.error({ message: 'Content fetch: token acquisition failed', error: tokenErr.message });
    res.writeHead(502, { 'Content-Type': 'text/plain' });
    res.end('Bad Gateway: token acquisition failed');
    return;
  }

  // Step 6: GET kmeURL verbatim with auth header (FR-002, FR-003, FR-004)
  let response;
  try {
    console.debug({ message: 'Content fetch: fetching article', kmeURL });
    response = await axios.get(kmeURL, {
      headers: { Authorization: `OIDC_id_token ${token}` },
      timeout: 10000,
    });
  } catch (fetchErr) {
    if (fetchErr.code === 'ECONNABORTED' || fetchErr.code === 'ERR_CANCELED') {
      console.error({ message: 'Content fetch: upstream timeout', code: fetchErr.code });
      res.writeHead(502, { 'Content-Type': 'text/plain' });
      res.end('Bad Gateway: upstream request timed out');
    } else if (fetchErr.response) {
      const status = fetchErr.response.status;
      console.error({ message: 'Content fetch: upstream HTTP error', status });
      if (status >= 400 && status < 500) {
        res.writeHead(404, { 'Content-Type': 'text/plain' });
        res.end('Not Found: article not found at upstream');
      } else {
        res.writeHead(502, { 'Content-Type': 'text/plain' });
        res.end('Bad Gateway: upstream error HTTP ' + status);
      }
    } else {
      console.error({ message: 'Content fetch: network error', error: fetchErr.message });
      res.writeHead(502, { 'Content-Type': 'text/plain' });
      res.end('Bad Gateway: ' + fetchErr.message);
    }
    return;
  }

  // Step 7: Parse body — handle non-JSON content-type (FR-005, FR-010)
  let data = response.data;
  if (typeof data === 'string') {
    try {
      data = JSON.parse(data);
    } catch {
      console.error({ message: 'Content fetch: unparseable response body', kmeURL });
      res.writeHead(502, { 'Content-Type': 'text/plain' });
      res.end('Bad Gateway: unparseable response from upstream');
      return;
    }
  }
  if (typeof data !== 'object' || data === null) {
    console.error({ message: 'Content fetch: unexpected non-object response', kmeURL });
    res.writeHead(502, { 'Content-Type': 'text/plain' });
    res.end('Bad Gateway: unexpected response from upstream');
    return;
  }

  // Step 8: Extract vkm:articleBody (FR-005, FR-009)
  const articleBody = kmeContentSourceAdapterHelpers.extractArticleBody(data);
  if (!articleBody) {
    console.error({ message: 'Content fetch: vkm:articleBody absent or empty', kmeURL });
    res.writeHead(404, { 'Content-Type': 'text/plain' });
    res.end('Not Found: article body not present in upstream response');
    return;
  }

  // Step 9: Return article HTML (FR-006)
  console.info({ message: 'Content fetch: article fetched successfully', kmeURL });
  res.writeHead(200, { 'Content-Type': 'text/html' });
  res.end(articleBody);
}

extractArticleBody(data) — New Helper

Add to kmeContentSourceAdapterHelpers.js before the existing return { ... } block:

/**
 * Extracts the vkm:articleBody string from a KME Content Service JSON-LD response.
 * Returns null if the field is absent, null, not a string, or an empty/whitespace string.
 * @param {object} data  parsed JSON-LD response from the KME Content Service
 * @returns {string|null}
 */
function extractArticleBody(data) {
  if (!data || typeof data !== 'object') return null;
  const body = data['vkm:articleBody'];
  if (body == null || typeof body !== 'string' || body.trim() === '') return null;
  return body;
}

Update the return { ... } at the bottom of the helpers file to export the new function:

return {
  validateSettings,
  extractHydraItems,
  buildSitemapXml,
  getValidToken,
  extractArticleBody,   // ← NEW
};

Error Response Matrix

Condition HTTP Status Response Body
kmeURL absent or empty 400 Bad Request: kmeURL parameter is required
kmeURL not a well-formed absolute http/https URL 400 Bad Request: kmeURL must be a well-formed absolute http/https URL
Missing OIDC config field 500 Configuration error: missing required field: {field}
Token acquisition failure 502 Bad Gateway: token acquisition failed
Upstream 4xx response 404 Not Found: article not found at upstream
Upstream 5xx response 502 Bad Gateway: upstream error HTTP {status}
Upstream timeout (ECONNABORTED/ERR_CANCELED) 502 Bad Gateway: upstream request timed out
Network error (no err.response) 502 Bad Gateway: {err.message}
Response body unparseable as JSON 502 Bad Gateway: unparseable response from upstream
Non-object response body 502 Bad Gateway: unexpected response from upstream
vkm:articleBody absent, null, or empty 404 Not Found: article body not present in upstream response
Success 200 text/html article body HTML

Test Coverage Plan

Unit tests (add to tests/unit/proxy.test.js):

Describe block Test case Verifies
US-content-fetch: happy path cached token → 200 HTML body FR-001, FR-005, FR-006
US-content-fetch: happy path cache miss → token fetch → 200 HTML body FR-003
US-content-fetch: happy path expired token → refresh → 200 HTML body FR-003
US-content-fetch: input validation no kmeURL → oidcAuthFlow (unchanged 200) FR-012
US-content-fetch: input validation kmeURL empty string → 400 FR-007
US-content-fetch: input validation kmeURL whitespace → 400 FR-007
US-content-fetch: input validation kmeURL relative URL → 400 FR-008
US-content-fetch: input validation kmeURL non-http protocol (ftp:) → 400 FR-008
US-content-fetch: input validation kmeURL malformed string → 400 FR-008
US-content-fetch: token failure getValidToken throws → 502 FR-011
US-content-fetch: upstream errors upstream 404 → 404 FR-009
US-content-fetch: upstream errors upstream 410 → 404 FR-009
US-content-fetch: upstream errors upstream 503 → 502 FR-010
US-content-fetch: upstream errors timeout ECONNABORTED → 502 FR-010
US-content-fetch: upstream errors timeout ERR_CANCELED → 502 FR-010
US-content-fetch: upstream errors network error (no err.response) → 502 FR-010
US-content-fetch: body parsing unparseable string body → 502 FR-010
US-content-fetch: body parsing vkm:articleBody absent → 404 FR-009
US-content-fetch: body parsing vkm:articleBody null → 404 FR-009
US-content-fetch: body parsing vkm:articleBody empty string → 404 FR-009
US-content-fetch: body parsing vkm:articleBody whitespace → 404 FR-009
US-content-fetch: passthrough preserved no kmeURL, not sitemap → 200 'Authorized' FR-012
extractArticleBody helper returns body string FR-005
extractArticleBody helper null data → null FR-005
extractArticleBody helper no vkm:articleBody field → null FR-009
extractArticleBody helper empty string → null FR-009
extractArticleBody helper whitespace string → null FR-009

Contract tests (add to tests/contract/proxy-http.test.js):

Test case Setup Verifies
valid kmeURL → real mock HTTP server returning JSON-LD with vkm:articleBody → 200 HTML real HTTP server, real token server SC-001, FR-006
real mock server returning 404 → proxy returns 404 real 404 HTTP server FR-009
real mock server returning 503 → proxy returns 502 real 503 HTTP server FR-010
non-responding server → proxy returns 502 within 12 s real server that never responds FR-010

extractArticleBody — Edge Case Coverage

Input Expected output
{ 'vkm:articleBody': '<p>Hello</p>' } '<p>Hello</p>'
{ 'vkm:articleBody': '' } null
{ 'vkm:articleBody': ' ' } null
{ 'vkm:articleBody': null } null
{ 'vkm:articleBody': undefined } (field absent) null
{} (field absent) null
null null
'string' (non-object) null