kme_content_adapter/specs/003-kme-content-fetch/tasks.md
Peter.Morton f840587e5e feat: content fetch, sitemap fixes, remove oidcAuthFlow
- Add contentFetchFlow() to proxy (FR-001 through FR-012)
- Add extractArticleBody() helper with vkm:articleBody / articleBody fallback
- Dynamic proxyBaseUrl derivation from x-forwarded-proto/host headers
- Forward query/size/category params on /sitemap.xml requests
- Add Accept: application/ld+json header to content API calls
- Remove oidcAuthFlow() - unmatched requests now return 404 Not Found
- Fix xmlbuilder2 import: default import, call as xmlbuilder2.create(...)
- Version bump 0.2.0 → 0.3.0
- 45/45 tests passing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-23 16:40:06 -05:00


Task list for KME Article Content Fetch (003)

Tasks: KME Article Content Fetch

Input: Design documents from specs/003-kme-content-fetch/
Prerequisites: plan.md, spec.md, research.md, data-model.md, contracts/http-content-fetch.md, quickstart.md

Architecture constraints:

  • Zero new files in src/ — only src/proxyScripts/kmeContentSourceAdapter.js and src/globalVariables/kmeContentSourceAdapterHelpers.js are modified
  • VM sandbox: zero import/export statements in proxy script or helpers file
  • Helpers file is a literal function body (ends with return { ... }) — new function added before that block
  • Tests use Node.js built-in test runner (node:test)
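
The helpers-file constraint is easier to picture with a small sketch. It assumes the host evaluates the file's text as a function body and uses the returned object as the helpers namespace. The loader shown here (new Function) and the isBlank helper are illustrative stand-ins, not the actual loading mechanism or an existing helper.

```javascript
// Hypothetical helpers-file text: a bare function body, no import/export.
const helpersSource = `
  // An existing helper (made up for this sketch).
  function isBlank(value) {
    return typeof value !== 'string' || value.trim() === '';
  }

  // New helpers are declared here, before the return block.

  return { isBlank };
`;

// Assumption: the host wraps the text in a function and calls it once.
const helpers = new Function(helpersSource)();

console.log(helpers.isBlank('   ')); // true
console.log(helpers.isBlank('<p>Hi</p>')); // false
```

Anything not listed in the return block stays private to the function body, which is why T003 is a separate task from T002.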

Files in scope:

| File | Change |
|---|---|
| src/globalVariables/kmeContentSourceAdapterHelpers.js | Add extractArticleBody(data); export in return { ... } |
| src/proxyScripts/kmeContentSourceAdapter.js | Add contentFetchFlow(); add routing branch |
| tests/unit/proxy.test.js | Add content-fetch describe blocks and helper tests |
| tests/contract/proxy-http.test.js | Add content-fetch contract tests |
| CHANGELOG.md | Add feature entry |

Format: [ID] [P?] [Story?] Description

  • [P]: Can run in parallel (different files, no dependencies on incomplete tasks)
  • [Story]: Which user story this task belongs to (US1–US4)
  • All file paths are relative to repository root

Phase 1: Setup

Purpose: Confirm baseline before any modifications

  • T001 Run npm test from repository root to confirm all existing tests pass and record the baseline count

Checkpoint: Baseline confirmed — no pre-existing failures


Phase 2: Foundational (Blocking Prerequisite)

Purpose: Add extractArticleBody pure helper — required by contentFetchFlow() in every user story phase

⚠️ CRITICAL: Phase 3 implementation cannot begin until T002 and T003 are complete; T004 is independently testable after T002+T003

  • T002 Add extractArticleBody(data) function body to src/globalVariables/kmeContentSourceAdapterHelpers.js — insert immediately before the existing return { ... } block; implementation: guard for non-object input (if (!data || typeof data !== 'object') return null), extract data['vkm:articleBody'], return null if field is null/undefined/non-string/empty/whitespace, otherwise return the string
  • T003 Add extractArticleBody to the exports in the return { ... } block at the bottom of src/globalVariables/kmeContentSourceAdapterHelpers.js so the injected VM context exposes the new function
  • T004 [P] Add extractArticleBody helper describe block to tests/unit/proxy.test.js covering all 7 edge cases per data-model.md: valid HTML string → returns string; empty string → null; whitespace-only string → null; null field value → null; field absent ({}) → null; null input → null; non-object input (string) → null — no mocking needed, call the helper directly

Checkpoint: extractArticleBody is implemented, exported, and unit-tested; run npm run test:unit to confirm T004 passes
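
A minimal sketch of the helper as T002 describes it, followed by the seven T004 edge cases as input/expected pairs. It reads only data['vkm:articleBody'], as T002 specifies. In the helpers file it would be a plain function declaration placed before the return { ... } block. The cases array is only an illustration, not the test file's actual layout.

```javascript
// Sketch of the helper described in T002; reads only 'vkm:articleBody'.
function extractArticleBody(data) {
  // Guard: null, undefined, and primitives carry no article body.
  if (!data || typeof data !== 'object') return null;

  const body = data['vkm:articleBody'];

  // Reject absent, null, non-string, empty, and whitespace-only values.
  if (typeof body !== 'string' || body.trim() === '') return null;

  return body;
}

// The seven edge cases listed in T004, as [input, expected] pairs.
const cases = [
  [{ 'vkm:articleBody': '<p>Hello</p>' }, '<p>Hello</p>'], // valid HTML string
  [{ 'vkm:articleBody': '' }, null],                       // empty string
  [{ 'vkm:articleBody': '   ' }, null],                    // whitespace only
  [{ 'vkm:articleBody': null }, null],                     // null field value
  [{}, null],                                              // field absent
  [null, null],                                            // null input
  ['a string', null],                                      // non-object input
];

for (const [input, expected] of cases) {
  console.log(extractArticleBody(input) === expected); // true for every case
}
```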


Phase 3: User Story 1 — Happy Path Article Fetch (Priority: P1) 🎯 MVP

Goal: Proxy receives a valid ?kmeURL= request, obtains an OIDC token, fetches the upstream article, extracts vkm:articleBody, and returns it as 200 text/html

Independent Test: curl "http://localhost:3000/?kmeURL=https://content.kme.example/articles/123" returns 200 OK, Content-Type: text/html, and body matching vkm:articleBody from the mock upstream

Implementation for User Story 1

  • T005 [US1] Implement complete contentFetchFlow() async function in src/proxyScripts/kmeContentSourceAdapter.js following the 9-step design in plan.md: (1) extract kmeURL via new URL(req.url, 'http://localhost').searchParams.get('kmeURL') ?? '', (2) empty/blank → 400, (3) malformed/non-http(s) → 400, (4) validateSettings missing field → 500, (5) getValidToken throws → 502, (6) axios.get(kmeURL, { headers: { Authorization: 'OIDC_id_token {token}' }, timeout: 10000 }) — ECONNABORTED/ERR_CANCELED → 502, upstream 4xx → 404, upstream 5xx → 502, network error → 502, (7) string body fallback JSON.parse — failure → 502; non-object → 502, (8) extractArticleBody(data) → null → 404, (9) res.writeHead(200, { 'Content-Type': 'text/html' }); res.end(articleBody)
  • T006 [US1] Add content-fetch routing branch to the URL dispatch block in src/proxyScripts/kmeContentSourceAdapter.js: insert else if (new URL(req.url, 'http://localhost').searchParams.has('kmeURL')) { await contentFetchFlow(); } between the existing sitemap check and the oidcAuthFlow() fallback
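
The dispatch order in T006 can be sketched as a pure function. pickRoute and its return labels are hypothetical names, and the sitemap check is assumed to be a pathname suffix test on /sitemap.xml. The real dispatch block calls the flow functions directly.

```javascript
// Hypothetical helper: decides which flow a request URL belongs to.
function pickRoute(reqUrl) {
  // req.url is path-relative, so a dummy base is needed to parse it.
  const url = new URL(reqUrl, 'http://localhost');

  // Assumption: the existing sitemap check is a pathname suffix match.
  if (url.pathname.endsWith('/sitemap.xml')) return 'sitemap';

  // has() is true even for "?kmeURL=", so empty values still reach
  // contentFetchFlow() and are rejected there with 400.
  if (url.searchParams.has('kmeURL')) return 'contentFetch';

  return 'fallback';
}

console.log(pickRoute('/sitemap.xml?query=x'));    // sitemap
console.log(pickRoute('/?kmeURL='));               // contentFetch
console.log(pickRoute('/?someOtherParam=value'));  // fallback
```

Using has() rather than get() for routing keeps validation in one place: the branch only decides who owns the request, and contentFetchFlow() decides whether the value is acceptable.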

Tests for User Story 1

  • T007 [P] [US1] Add US-content-fetch: happy path describe block to tests/unit/proxy.test.js with two tests: (a) stub getValidToken returning cached token + stub axios.get returning { data: { 'vkm:articleBody': '<p>Hello</p>' } } → assert status 200, Content-Type: text/html, body <p>Hello</p>; (b) stub getValidToken simulating cache miss (returns a freshly acquired token) → same 200 assertion
  • T008 [P] [US1] Add happy path contract test to tests/contract/proxy-http.test.js: start a real mock HTTP server that returns { "vkm:articleBody": "<p>Contract test article</p>" } with Content-Type: application/ld+json; start a real mock token server; issue GET /?kmeURL={mock-server-url} to the proxy; assert status 200, Content-Type: text/html, response body equals <p>Contract test article</p>; verify total round-trip is under 11 s (SC-001)

Checkpoint: npm run test:unit and npm run test:contract both pass for happy path; manually verify with curl per quickstart.md
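
Steps 7 to 9 of T005 can be exercised without a server by recording what the handler writes. In this sketch, respondWithArticle, makeFakeRes, and stubExtract are hypothetical names. The text/plain content type on error responses and the reuse of the "unparseable" message for non-object bodies are assumptions that the task list does not specify.

```javascript
// Sketch of steps 7-9 from T005. `res` is any object with writeHead/end;
// the extractor is passed in only to keep this sketch standalone.
function respondWithArticle(res, upstreamData, extractArticleBody) {
  const fail = (status, message) => {
    // Assumption: error bodies are sent as text/plain.
    res.writeHead(status, { 'Content-Type': 'text/plain' });
    res.end(message);
  };

  let data = upstreamData;

  // Step 7: a string body gets one JSON.parse attempt.
  if (typeof data === 'string') {
    try {
      data = JSON.parse(data);
    } catch {
      return fail(502, 'Bad Gateway: unparseable response from upstream');
    }
  }
  if (!data || typeof data !== 'object') {
    // Assumption: non-object bodies reuse the same message.
    return fail(502, 'Bad Gateway: unparseable response from upstream');
  }

  // Step 8: no usable article body means 404.
  const articleBody = extractArticleBody(data);
  if (articleBody === null) {
    return fail(404, 'Not Found: article body not present in upstream response');
  }

  // Step 9: success.
  res.writeHead(200, { 'Content-Type': 'text/html' });
  res.end(articleBody);
}

// Hypothetical recorder standing in for the real response object.
function makeFakeRes() {
  const seen = { status: null, headers: null, body: null };
  return {
    seen,
    writeHead(status, headers) { seen.status = status; seen.headers = headers; },
    end(body) { seen.body = body; },
  };
}

// Stand-in extractor; the real one lives in the helpers file.
const stubExtract = (d) =>
  (typeof d['vkm:articleBody'] === 'string' ? d['vkm:articleBody'] : null);

const happy = makeFakeRes();
respondWithArticle(happy, '{"vkm:articleBody":"<p>Hello</p>"}', stubExtract);
console.log(happy.seen.status, happy.seen.body); // 200 <p>Hello</p>
```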


Phase 4: User Story 2 — Missing or Empty kmeURL Parameter (Priority: P2)

Goal: Requests with absent, empty, whitespace, or malformed kmeURL receive a 400 response with no upstream call made

Independent Test: curl -o /dev/null -w "%{http_code}" "http://localhost:3000/?kmeURL=" returns 400; curl -o /dev/null -w "%{http_code}" "http://localhost:3000/?kmeURL=not-a-url" returns 400

Tests for User Story 2

  • T009 [P] [US2] Add US-content-fetch: input validation describe block to tests/unit/proxy.test.js with 6 tests using a spy on axios.get to assert it is never called: (a) ?kmeURL absent (no kmeURL param) → routes to oidcAuthFlow → 200 (confirms FR-012); (b) ?kmeURL= empty string → 400, body Bad Request: kmeURL parameter is required; (c) ?kmeURL=%20 whitespace-only → 400; (d) ?kmeURL=relative/path → 400, body Bad Request: kmeURL must be a well-formed absolute http/https URL; (e) ?kmeURL=ftp://example.com/article non-http protocol → 400; (f) ?kmeURL=:::malformed → 400

Checkpoint: npm run test:unit passes for validation tests; confirm no upstream stubs are invoked in any 400 scenario
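
The validation guards (steps 1 to 3 of T005) can be sketched as one function that returns either the accepted URL or a 400 result. validateKmeUrl and its result shape are hypothetical. The two messages are the ones T009 names, and using the "required" message for whitespace-only input is an assumption.

```javascript
// Hypothetical helper mirroring steps 1-3 of T005.
function validateKmeUrl(reqUrl) {
  const malformed = {
    ok: false,
    status: 400,
    message: 'Bad Request: kmeURL must be a well-formed absolute http/https URL',
  };

  // Step 1: read the parameter; a missing value becomes ''.
  const raw = new URL(reqUrl, 'http://localhost').searchParams.get('kmeURL') ?? '';

  // Step 2: empty or whitespace-only values are rejected.
  if (raw.trim() === '') {
    return { ok: false, status: 400, message: 'Bad Request: kmeURL parameter is required' };
  }

  // Step 3: must parse as an absolute URL with an http(s) scheme.
  let parsed;
  try {
    parsed = new URL(raw); // throws on relative or malformed input
  } catch {
    return malformed;
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') return malformed;

  return { ok: true, kmeURL: raw };
}

console.log(validateKmeUrl('/?kmeURL=').status);                          // 400
console.log(validateKmeUrl('/?kmeURL=%20').status);                       // 400
console.log(validateKmeUrl('/?kmeURL=relative/path').status);             // 400
console.log(validateKmeUrl('/?kmeURL=ftp://example.com/article').status); // 400
console.log(validateKmeUrl('/?kmeURL=https://content.kme.example/articles/123').ok); // true
```

Because every rejection happens before any token or upstream call, a spy on axios.get that records zero calls is enough to confirm the "no upstream call" part of the goal.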


Phase 5: User Story 3 — Upstream Failure & Missing Article Body (Priority: P3)

Goal: All upstream error conditions (token failure, 4xx, 5xx, timeout, network error, bad body, missing/empty vkm:articleBody) return the correct 404 or 502 status to the caller

Independent Test: Stub axios.get to throw an ECONNABORTED error; verify proxy returns 502. Stub getValidToken to throw; verify proxy returns 502. Stub axios.get returning { data: {} }; verify proxy returns 404.

Tests for User Story 3

  • T010 [P] [US3] Add US-content-fetch: upstream errors describe block to tests/unit/proxy.test.js with 7 tests: (a) getValidToken throws → 502, body Bad Gateway: token acquisition failed; (b) axios.get throws with { response: { status: 404 } } → 404, body Not Found: article not found at upstream; (c) axios.get throws with { response: { status: 410 } } → 404; (d) axios.get throws with { response: { status: 503 } } → 502, body Bad Gateway: upstream error HTTP 503; (e) axios.get throws with { code: 'ECONNABORTED' } → 502, body Bad Gateway: upstream request timed out; (f) axios.get throws with { code: 'ERR_CANCELED' } → 502; (g) axios.get throws with { message: 'ENOTFOUND' } (no response, no code) → 502, body contains Bad Gateway:
  • T011 [P] [US3] Add US-content-fetch: body parsing describe block to tests/unit/proxy.test.js with 5 tests (all require valid getValidToken stub): (a) axios.get returns { data: 'not json{{{' } (string, unparseable) → 502, body Bad Gateway: unparseable response from upstream; (b) axios.get returns { data: { 'vkm:articleBody': undefined } } (field absent) → 404, body Not Found: article body not present in upstream response; (c) field is null → 404; (d) field is '' empty string → 404; (e) field is ' ' whitespace-only → 404
  • T012 [P] [US3] Add contract error tests to tests/contract/proxy-http.test.js: (a) mock upstream server returns HTTP 404 → proxy returns 404; (b) mock upstream server returns HTTP 503 → proxy returns 502; (c) mock server accepts connection but never responds (use server.on('request', () => {})) → proxy returns 502 within 12 s and does not hang

Checkpoint: All 12 unit tests in T010+T011 pass (7 in T010, 5 in T011); all 3 contract error tests in T012 pass
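
The upstream error mapping that T010 tests can be written as a small lookup on the thrown error. mapUpstreamError is a hypothetical name. The message for ERR_CANCELED and the exact text of the generic network-error message are assumptions, since T010 fixes only the status for those cases.

```javascript
// Hypothetical helper: turns an error thrown by axios.get into the
// status/message pair the proxy should return (step 6 of T005).
function mapUpstreamError(err) {
  const code = err && err.code;

  // Timeouts and cancellations surface as error codes, not responses.
  // Assumption: both share the "timed out" message.
  if (code === 'ECONNABORTED' || code === 'ERR_CANCELED') {
    return { status: 502, message: 'Bad Gateway: upstream request timed out' };
  }

  const upstreamStatus = err && err.response && err.response.status;

  // Any upstream 4xx is reported to the caller as 404.
  if (upstreamStatus >= 400 && upstreamStatus < 500) {
    return { status: 404, message: 'Not Found: article not found at upstream' };
  }

  // Any upstream 5xx is reported as 502 with the original status.
  if (upstreamStatus >= 500) {
    return { status: 502, message: `Bad Gateway: upstream error HTTP ${upstreamStatus}` };
  }

  // No response and no known code: treat as a network error.
  // Assumption: the detail after the prefix is the error's own message.
  const detail = (err && err.message) || 'network error';
  return { status: 502, message: `Bad Gateway: ${detail}` };
}

console.log(mapUpstreamError({ response: { status: 410 } }).status);  // 404
console.log(mapUpstreamError({ response: { status: 503 } }).message); // Bad Gateway: upstream error HTTP 503
console.log(mapUpstreamError({ code: 'ECONNABORTED' }).status);       // 502
console.log(mapUpstreamError({ message: 'ENOTFOUND' }).message);      // Bad Gateway: ENOTFOUND
```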


Phase 6: User Story 4 — Passthrough Behaviour Preserved (Priority: P4)

Goal: Requests without kmeURL and without /sitemap.xml suffix continue to receive the existing 200 OK auth-check passthrough — zero regression

Independent Test: curl -o /dev/null -w "%{http_code}" "http://localhost:3000/" returns 200 and body is Authorized (unchanged)

Tests for User Story 4

  • T013 [US4] Add US-content-fetch: passthrough preserved describe block to tests/unit/proxy.test.js with 1 test: GET /?someOtherParam=value (no kmeURL, not sitemap) → assert status 200, body Authorized, and confirm axios.get is never called (spy asserts not called) — verifies FR-012 and SC-005

Checkpoint: Passthrough test passes; run full npm test to confirm zero regressions across entire suite


Final Phase: Polish & Cross-Cutting Concerns

Purpose: Changelog documentation and final validation

  • T014 [P] Add entry to CHANGELOG.md for feature 003-kme-content-fetch: document new contentFetchFlow() in kmeContentSourceAdapter.js (routes ?kmeURL= requests, handles all error paths 400/404/500/502, 10 s timeout), new extractArticleBody(data) in kmeContentSourceAdapterHelpers.js, new unit test describe blocks in tests/unit/proxy.test.js, and new contract tests in tests/contract/proxy-http.test.js
  • T015 Run full test suite npm test and confirm all tests pass; run the four quickstart.md curl smoke tests (valid kmeURL passthrough, empty kmeURL → 400, malformed kmeURL → 400, sitemap → 200) to validate end-to-end behaviour

Dependencies & Execution Order

Phase Dependencies

  • Setup (Phase 1): No dependencies — run immediately
  • Foundational (Phase 2): Depends on Setup; BLOCKS all user story phases
    • T002 → T003 (sequential, same file)
    • T004 [P] can run after T002+T003 (different file: test file)
  • US1 (Phase 3): Depends on Foundational complete (T002+T003)
    • T005 → T006 (sequential, same file)
    • T007 [P] and T008 [P] can run after T005+T006 (different files)
  • US2 (Phase 4): Depends on T005+T006 complete (tests the validation guards inside contentFetchFlow)
  • US3 (Phase 5): Depends on T005+T006 complete (tests the error guards inside contentFetchFlow)
  • US4 (Phase 6): Depends on T006 complete (tests the routing branch)
  • Polish (Final): Depends on all user story phases complete

User Story Dependencies

  • US1 (P1): Depends only on Foundational phase
  • US2 (P2): Depends on US1 implementation (T005+T006) — validation lives inside contentFetchFlow()
  • US3 (P3): Depends on US1 implementation (T005+T006) — error paths live inside contentFetchFlow()
  • US4 (P4): Depends on the T006 routing branch; tests that the passthrough is still reached when kmeURL is absent

Within Each Phase

  • Source file edits must complete before their corresponding test tasks
  • T002 must complete before T003 (same file, sequential)
  • T005 must complete before T006 (same file, sequential)
  • T005+T006 must complete before T007, T008, T009, T010, T011, T012, T013

Parallel Opportunities

  • T004 [P] runs in parallel with T005+T006 (different files: test file vs source file)
  • After T005+T006: T007 [P], T008 [P], T009 [P], T010 [P], T011 [P], T012 [P] can all run in parallel (different describe blocks, or separate test file vs unit file)
  • T013 and T014 [P] run in parallel (different files)

Parallel Execution Examples

Foundational Phase Parallelism

# After T002+T003 complete, run simultaneously:
Task A: T004 — Write extractArticleBody unit tests in tests/unit/proxy.test.js
Task B: T005 — Implement contentFetchFlow() in src/proxyScripts/kmeContentSourceAdapter.js

After T005+T006 Complete

# These 6 tasks can all run in parallel (different describe blocks / different files):
Task A: T007 — Happy path unit tests (proxy.test.js)
Task B: T008 — Happy path contract test (proxy-http.test.js)
Task C: T009 — Input validation unit tests (proxy.test.js, separate describe block)
Task D: T010 — Upstream error unit tests (proxy.test.js, separate describe block)
Task E: T011 — Body parsing unit tests (proxy.test.js, separate describe block)
Task F: T012 — Contract error tests (proxy-http.test.js, separate describe block)

Implementation Strategy

MVP First (User Story 1 Only)

  1. Complete Phase 1: Setup baseline verification
  2. Complete Phase 2: Add extractArticleBody helper (CRITICAL — blocks everything)
  3. Complete Phase 3: Implement contentFetchFlow(), routing branch, and happy path tests
  4. STOP and VALIDATE: npm run test:unit + npm run test:contract pass; manual curl smoke test works
  5. Deploy/demo if ready — consumers can now fetch articles via the proxy

Incremental Delivery

  1. Foundation + US1 → happy path working → Demo MVP
  2. Add US2 tests → validate 400 rejection works
  3. Add US3 tests → validate error handling works
  4. Add US4 test → confirm no regression
  5. Polish → CHANGELOG + final npm test

Single-Developer Sequence (Optimal Order)

T001 → T002 → T003 → T005 → T006 → T004* → T007 → T009 → T010 → T011 → T013 → T008 → T012 → T014 → T015
(* T004 can be done any time after T003 — fits naturally here before test sprint)
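
The sequence above can be checked mechanically against the stated dependencies. The sketch below encodes the edges from "Phase Dependencies" and "Within Each Phase" and reports any task scheduled before one of its prerequisites. Placing T015 after T014 is an assumption, and findOrderViolations is a hypothetical helper.

```javascript
// Prerequisites per task, as stated in the dependency sections.
const afterRouting = ['T006'];
const dependsOn = {
  T001: [],
  T002: ['T001'],
  T003: ['T002'],
  T004: ['T003'],
  T005: ['T003'],
  T006: ['T005'],
  T007: afterRouting,
  T008: afterRouting,
  T009: afterRouting,
  T010: afterRouting,
  T011: afterRouting,
  T012: afterRouting,
  T013: afterRouting,
  T014: ['T007', 'T008', 'T009', 'T010', 'T011', 'T012', 'T013'],
  T015: ['T014'], // assumption: final validation runs last
};

// The single-developer sequence from this section.
const sequence = [
  'T001', 'T002', 'T003', 'T005', 'T006', 'T004', 'T007', 'T009',
  'T010', 'T011', 'T013', 'T008', 'T012', 'T014', 'T015',
];

// Returns a message for every prerequisite that does not come first.
function findOrderViolations(order, deps) {
  const position = new Map(order.map((id, index) => [id, index]));
  const violations = [];
  for (const [task, prerequisites] of Object.entries(deps)) {
    for (const pre of prerequisites) {
      if (!(position.get(pre) < position.get(task))) {
        violations.push(`${pre} must precede ${task}`);
      }
    }
  }
  return violations;
}

console.log(findOrderViolations(sequence, dependsOn)); // []
```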

Notes

  • VM sandbox constraint: contentFetchFlow() must not contain any import or require — all dependencies (axios, kmeContentSourceAdapterHelpers, kme_CSA_settings, URL, URLSearchParams, console, req, res) arrive via the injected VM context
  • Helpers file constraint: extractArticleBody must be inserted as a plain function declaration before the existing return { ... } block — no module syntax
  • [P] tasks: different files with no dependency on incomplete tasks in the same file
  • [Story] labels: map each test task back to the user story it validates for traceability
  • Each user story's test tasks are independently runnable with node --test tests/unit/proxy.test.js (filter by describe block name)
  • Commit after each logical group (e.g., after T002+T003, after T005+T006, after all unit test tasks)
  • Verify npm test green at each checkpoint before proceeding to next phase