kme_content_adapter/specs/003-kme-content-fetch/tasks.md
Peter.Morton f840587e5e feat: content fetch, sitemap fixes, remove oidcAuthFlow
- Add contentFetchFlow() to proxy (FR-001 through FR-012)
- Add extractArticleBody() helper with vkm:articleBody / articleBody fallback
- Dynamic proxyBaseUrl derivation from x-forwarded-proto/host headers
- Forward query/size/category params on /sitemap.xml requests
- Add Accept: application/ld+json header to content API calls
- Remove oidcAuthFlow() - unmatched requests now return 404 Not Found
- Fix xmlbuilder2 import: default import, call as xmlbuilder2.create(...)
- Version bump 0.2.0 → 0.3.0
- 45/45 tests passing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-23 16:40:06 -05:00

---
description: "Task list for KME Article Content Fetch (003)"
---
# Tasks: KME Article Content Fetch
**Input**: Design documents from `specs/003-kme-content-fetch/`
**Prerequisites**: plan.md ✅, spec.md ✅, research.md ✅, data-model.md ✅, contracts/http-content-fetch.md ✅, quickstart.md ✅
**Architecture constraints**:
- Zero new files in `src/` — only `src/proxyScripts/kmeContentSourceAdapter.js` and `src/globalVariables/kmeContentSourceAdapterHelpers.js` are modified
- VM sandbox: zero `import`/`export` statements in proxy script or helpers file
- Helpers file is a literal function body (ends with `return { ... }`) — new function added before that block
- Tests use Node.js built-in test runner (`node:test`)
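
The two sandbox constraints above can be illustrated with a small sketch. It assumes the helpers file is evaluated as a function body and that its return value is injected into a `node:vm` context; the actual loader is not shown in this task list, so this is only one plausible arrangement. The names `helpersSource`, `proxySource`, and `shout` are hypothetical stand-ins.

```javascript
const vm = require('node:vm');

// Hypothetical stand-in for the helpers file: a function body, not a module.
// It has no import/export and ends with a `return { ... }` block.
const helpersSource = `
  function shout(text) {
    return String(text).toUpperCase();
  }
  return { shout };
`;

// Wrapping the body in a Function and calling it yields the helpers object.
const helpers = new Function(helpersSource)();

// Hypothetical stand-in for the proxy script: it only uses names that the
// context provides, so it needs no import or require of its own.
const proxySource = `result = kmeContentSourceAdapterHelpers.shout('ok');`;

// Assumption: dependencies are handed to the script as context properties.
const context = { kmeContentSourceAdapterHelpers: helpers, result: null };
vm.runInNewContext(proxySource, context);

console.log(context.result); // prints "OK"
```

This shape is why a new helper has to be declared before the `return { ... }` block and listed inside it: anything not returned is invisible to the proxy script.
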
**Files in scope**:
| File | Change |
|------|--------|
| `src/globalVariables/kmeContentSourceAdapterHelpers.js` | Add `extractArticleBody(data)`; export in `return { ... }` |
| `src/proxyScripts/kmeContentSourceAdapter.js` | Add `contentFetchFlow()`; add routing branch |
| `tests/unit/proxy.test.js` | Add content-fetch describe blocks and helper tests |
| `tests/contract/proxy-http.test.js` | Add content-fetch contract tests |
| `CHANGELOG.md` | Add feature entry |
## Format: `[ID] [P?] [Story?] Description`
- **[P]**: Can run in parallel (different files, no dependencies on incomplete tasks)
- **[Story]**: Which user story this task belongs to (US1–US4)
- All file paths are relative to repository root
---
## Phase 1: Setup
**Purpose**: Confirm baseline before any modifications
- [X] T001 Run `npm test` from repository root to confirm all existing tests pass and record the baseline count
**Checkpoint**: Baseline confirmed — no pre-existing failures
---
## Phase 2: Foundational (Blocking Prerequisite)
**Purpose**: Add `extractArticleBody` pure helper — required by `contentFetchFlow()` in every user story phase
**⚠️ CRITICAL**: Phase 3 implementation cannot begin until T002 and T003 are complete; T004 is independently testable after T002+T003
- [X] T002 Add `extractArticleBody(data)` function body to `src/globalVariables/kmeContentSourceAdapterHelpers.js` — insert immediately before the existing `return { ... }` block; implementation: guard for non-object input (`if (!data || typeof data !== 'object') return null`), extract `data['vkm:articleBody']`, return null if field is null/undefined/non-string/empty/whitespace, otherwise return the string
- [X] T003 Add `extractArticleBody` to the exports in the `return { ... }` block at the bottom of `src/globalVariables/kmeContentSourceAdapterHelpers.js` so the injected VM context exposes the new function
- [X] T004 [P] Add `extractArticleBody helper` describe block to `tests/unit/proxy.test.js` covering all 7 edge cases per data-model.md: valid HTML string → returns string; empty string → null; whitespace-only string → null; null field value → null; field absent (`{}`) → null; null input → null; non-object input (string) → null — no mocking needed, call the helper directly
**Checkpoint**: `extractArticleBody` is implemented, exported, and unit-tested; run `npm run test:unit` to confirm T004 passes
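
A minimal sketch of the helper described in T002, followed by the seven edge cases from T004 written as input/expected pairs. It follows the T002 wording only, so it reads `vkm:articleBody` and nothing else; the `cases` array is an illustrative name, not something from data-model.md.

```javascript
// Sketch of the helper described in T002. In the real helpers file this
// would be a plain function declaration placed before `return { ... }`.
function extractArticleBody(data) {
  // Guard: null, undefined, and non-object inputs (for example a string).
  if (!data || typeof data !== 'object') return null;

  const body = data['vkm:articleBody'];

  // Absent, null, and non-string values are all treated as "no body".
  if (typeof body !== 'string') return null;

  // Empty and whitespace-only strings also count as "no body".
  if (body.trim() === '') return null;

  return body;
}

// The seven edge cases listed in T004, as [input, expected] pairs.
const cases = [
  [{ 'vkm:articleBody': '<p>Hello</p>' }, '<p>Hello</p>'],
  [{ 'vkm:articleBody': '' }, null],
  [{ 'vkm:articleBody': '   ' }, null],
  [{ 'vkm:articleBody': null }, null],
  [{}, null],
  [null, null],
  ['a string', null],
];

for (const [input, expected] of cases) {
  console.log(extractArticleBody(input) === expected); // prints true each time
}
```

Because the helper is pure, the unit tests can call it directly with no stubs, as T004 notes.
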
---
## Phase 3: User Story 1 — Happy Path Article Fetch (Priority: P1) 🎯 MVP
**Goal**: Proxy receives a valid `?kmeURL=` request, obtains an OIDC token, fetches the upstream article, extracts `vkm:articleBody`, and returns it as `200 text/html`
**Independent Test**: `curl "http://localhost:3000/?kmeURL=https://content.kme.example/articles/123"` returns `200 OK`, `Content-Type: text/html`, and body matching `vkm:articleBody` from the mock upstream
### Implementation for User Story 1
- [X] T005 [US1] Implement complete `contentFetchFlow()` async function in `src/proxyScripts/kmeContentSourceAdapter.js` following the 9-step design in plan.md: (1) extract `kmeURL` via `new URL(req.url, 'http://localhost').searchParams.get('kmeURL') ?? ''`, (2) empty/blank → 400, (3) malformed/non-http(s) → 400, (4) `validateSettings` missing field → 500, (5) `getValidToken` throws → 502, (6) `axios.get(kmeURL, { headers: { Authorization: 'OIDC_id_token {token}' }, timeout: 10000 })` — ECONNABORTED/ERR_CANCELED → 502, upstream 4xx → 404, upstream 5xx → 502, network error → 502, (7) string body fallback `JSON.parse` — failure → 502; non-object → 502, (8) `extractArticleBody(data)` → null → 404, (9) `res.writeHead(200, { 'Content-Type': 'text/html' }); res.end(articleBody)`
- [X] T006 [US1] Add content-fetch routing branch to the URL dispatch block in `src/proxyScripts/kmeContentSourceAdapter.js`: insert `else if (new URL(req.url, 'http://localhost').searchParams.has('kmeURL')) { await contentFetchFlow(); }` between the existing sitemap check and the `oidcAuthFlow()` fallback
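
The dispatch order from T006 can be sketched as a function that returns a label rather than calling the flows. `chooseFlow` is a hypothetical name, and the sitemap test is assumed to be a pathname suffix check, since the existing condition is not quoted in this task list.

```javascript
// Decide which flow handles a request. Returns a label instead of calling
// the flows, so the sketch stays self-contained.
function chooseFlow(reqUrl) {
  // req.url is a path plus query, so a placeholder base makes it parseable.
  const url = new URL(reqUrl, 'http://localhost');

  // Assumption: the existing sitemap check is a suffix test on the path.
  if (url.pathname.endsWith('/sitemap.xml')) return 'sitemap';

  // has() is true even for an empty value such as `?kmeURL=`.
  if (url.searchParams.has('kmeURL')) return 'contentFetch';

  return 'fallback';
}

console.log(chooseFlow('/sitemap.xml?size=10'));   // prints "sitemap"
console.log(chooseFlow('/?kmeURL='));              // prints "contentFetch"
console.log(chooseFlow('/?someOtherParam=value')); // prints "fallback"
```

Routing on `has('kmeURL')` rather than on a non-empty value means an empty parameter still reaches `contentFetchFlow()`, which is where the 400 response is produced.
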
### Tests for User Story 1
- [X] T007 [P] [US1] Add `US-content-fetch: happy path` describe block to `tests/unit/proxy.test.js` with two tests: (a) stub `getValidToken` returning cached token + stub `axios.get` returning `{ data: { 'vkm:articleBody': '<p>Hello</p>' } }` → assert status 200, `Content-Type: text/html`, body `<p>Hello</p>`; (b) stub `getValidToken` simulating cache miss (returns a freshly acquired token) → same 200 assertion
- [X] T008 [P] [US1] Add happy path contract test to `tests/contract/proxy-http.test.js`: start a real mock HTTP server that returns `{ "vkm:articleBody": "<p>Contract test article</p>" }` with `Content-Type: application/ld+json`; start a real mock token server; issue `GET /?kmeURL={mock-server-url}` to the proxy; assert status 200, `Content-Type: text/html`, response body equals `<p>Contract test article</p>`; verify total round-trip is under 11 s (SC-001)
**Checkpoint**: `npm run test:unit` and `npm run test:contract` both pass for happy path; manually verify with `curl` per quickstart.md
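
For the unit-level assertions in T007, one possible approach is a small response double that records what the code under test writes. Both `createFakeRes` and `sendArticle` are hypothetical helpers introduced for illustration; the existing test file may already provide its own equivalent.

```javascript
// Hypothetical test double for the Node.js response object: it records what
// the code under test writes, so a test can assert on it afterwards.
function createFakeRes() {
  const recorded = { status: null, headers: {}, body: '' };
  return {
    recorded,
    writeHead(status, headers = {}) {
      recorded.status = status;
      recorded.headers = headers;
    },
    end(chunk = '') {
      recorded.body += chunk;
    },
  };
}

// Step 9 of T005, isolated so it can be exercised on its own.
function sendArticle(res, articleBody) {
  res.writeHead(200, { 'Content-Type': 'text/html' });
  res.end(articleBody);
}

const res = createFakeRes();
sendArticle(res, '<p>Hello</p>');
console.log(res.recorded.status, res.recorded.body); // prints: 200 <p>Hello</p>
```
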
---
## Phase 4: User Story 2 — Missing or Empty kmeURL Parameter (Priority: P2)
**Goal**: Requests with absent, empty, whitespace, or malformed `kmeURL` receive a 400 response with no upstream call made
**Independent Test**: `curl -o /dev/null -w "%{http_code}" "http://localhost:3000/?kmeURL="` returns `400`; `curl -o /dev/null -w "%{http_code}" "http://localhost:3000/?kmeURL=not-a-url"` returns `400`
### Tests for User Story 2
- [X] T009 [P] [US2] Add `US-content-fetch: input validation` describe block to `tests/unit/proxy.test.js` with 6 tests using a spy on `axios.get` to assert it is never called: (a) `?kmeURL` absent (no kmeURL param) → routes to `oidcAuthFlow` → 200 (confirms FR-012); (b) `?kmeURL=` empty string → 400, body `Bad Request: kmeURL parameter is required`; (c) `?kmeURL=%20` whitespace-only → 400; (d) `?kmeURL=relative/path` → 400, body `Bad Request: kmeURL must be a well-formed absolute http/https URL`; (e) `?kmeURL=ftp://example.com/article` non-http protocol → 400; (f) `?kmeURL=:::malformed` → 400
**Checkpoint**: `npm run test:unit` passes for validation tests; confirm no upstream stubs are invoked in any 400 scenario
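
The validation guards exercised by T009 (steps 1 to 3 of T005) might look like the sketch below. `validateKmeUrl` and its return shape are hypothetical; the response bodies are the ones quoted in T009, and reusing the "required" message for whitespace-only input is an assumption, since T009 only fixes the status for that case.

```javascript
// Returns { ok: true, kmeURL } or { ok: false, status, body }.
function validateKmeUrl(reqUrl) {
  const raw = new URL(reqUrl, 'http://localhost').searchParams.get('kmeURL') ?? '';

  // Empty or whitespace-only values are rejected before any parsing.
  if (raw.trim() === '') {
    return { ok: false, status: 400, body: 'Bad Request: kmeURL parameter is required' };
  }

  let parsed = null;
  try {
    parsed = new URL(raw); // no base argument, so relative values throw
  } catch {
    parsed = null;
  }

  if (!parsed || (parsed.protocol !== 'http:' && parsed.protocol !== 'https:')) {
    return {
      ok: false,
      status: 400,
      body: 'Bad Request: kmeURL must be a well-formed absolute http/https URL',
    };
  }

  return { ok: true, kmeURL: raw };
}

console.log(validateKmeUrl('/?kmeURL=%20').status);                      // prints 400
console.log(validateKmeUrl('/?kmeURL=ftp://example.com/article').status); // prints 400
console.log(validateKmeUrl('/?kmeURL=https://content.kme.example/articles/123').ok); // prints true
```

Since every rejection returns before any token or HTTP work, a spy on `axios.get` should record zero calls in all 400 scenarios.
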
---
## Phase 5: User Story 3 — Upstream Failure & Missing Article Body (Priority: P3)
**Goal**: All upstream error conditions (token failure, 4xx, 5xx, timeout, network error, bad body, missing/empty `vkm:articleBody`) return the correct 404 or 502 status to the caller
**Independent Test**: Stub `axios.get` to throw an ECONNABORTED error; verify proxy returns 502. Stub `getValidToken` to throw; verify proxy returns 502. Stub `axios.get` returning `{ data: {} }`; verify proxy returns 404.
### Tests for User Story 3
- [X] T010 [P] [US3] Add `US-content-fetch: upstream errors` describe block to `tests/unit/proxy.test.js` with 7 tests: (a) `getValidToken` throws → 502, body `Bad Gateway: token acquisition failed`; (b) `axios.get` throws with `{ response: { status: 404 } }` → 404, body `Not Found: article not found at upstream`; (c) `axios.get` throws with `{ response: { status: 410 } }` → 404; (d) `axios.get` throws with `{ response: { status: 503 } }` → 502, body `Bad Gateway: upstream error HTTP 503`; (e) `axios.get` throws with `{ code: 'ECONNABORTED' }` → 502, body `Bad Gateway: upstream request timed out`; (f) `axios.get` throws with `{ code: 'ERR_CANCELED' }` → 502; (g) `axios.get` throws with `{ message: 'ENOTFOUND' }` (no `response`, no code) → 502, body contains `Bad Gateway:`
- [X] T011 [P] [US3] Add `US-content-fetch: body parsing` describe block to `tests/unit/proxy.test.js` with 5 tests (all require valid `getValidToken` stub): (a) `axios.get` returns `{ data: 'not json{{{' }` (string, unparseable) → 502, body `Bad Gateway: unparseable response from upstream`; (b) `axios.get` returns `{ data: { 'vkm:articleBody': undefined } }` (field absent) → 404, body `Not Found: article body not present in upstream response`; (c) field is `null` → 404; (d) field is `''` empty string → 404; (e) field is `' '` whitespace-only → 404
- [X] T012 [P] [US3] Add contract error tests to `tests/contract/proxy-http.test.js`: (a) mock upstream server returns HTTP 404 → proxy returns 404; (b) mock upstream server returns HTTP 503 → proxy returns 502; (c) mock server accepts connection but never responds (use `server.on('request', () => {})`) → proxy returns 502 within 12 s and does not hang
**Checkpoint**: All 12 unit tests in T010+T011 pass (7 upstream-error tests and 5 body-parsing tests); all 3 contract error tests in T012 pass
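
The status mapping that T010 and T011 verify can be summarised in two small functions. This is a sketch under assumptions: `mapUpstreamError` and `parseUpstreamBody` are hypothetical names, the error objects are assumed to have the shapes used in the test descriptions above, and the exact wording of the generic network-error body is a guess beyond the `Bad Gateway:` prefix.

```javascript
// Map an error thrown by axios.get to the proxy response (step 6 of T005).
function mapUpstreamError(err) {
  if (err.code === 'ECONNABORTED' || err.code === 'ERR_CANCELED') {
    return { status: 502, body: 'Bad Gateway: upstream request timed out' };
  }

  const upstreamStatus = err.response && err.response.status;
  if (upstreamStatus >= 400 && upstreamStatus < 500) {
    return { status: 404, body: 'Not Found: article not found at upstream' };
  }
  if (upstreamStatus >= 500) {
    return { status: 502, body: `Bad Gateway: upstream error HTTP ${upstreamStatus}` };
  }

  // No response and no timeout code: treat as a network error.
  // Assumption: only the "Bad Gateway:" prefix is specified for this case.
  return { status: 502, body: `Bad Gateway: ${err.message}` };
}

// Normalise the upstream body (step 7 of T005). Returns the parsed object,
// or null when the body is unusable, which the caller turns into a 502.
function parseUpstreamBody(data) {
  let value = data;
  if (typeof value === 'string') {
    try {
      value = JSON.parse(value);
    } catch {
      return null;
    }
  }
  return value && typeof value === 'object' ? value : null;
}

console.log(mapUpstreamError({ response: { status: 410 } }).status); // prints 404
console.log(mapUpstreamError({ response: { status: 503 } }).body);   // prints "Bad Gateway: upstream error HTTP 503"
console.log(parseUpstreamBody('not json{{{'));                       // prints null
```

Checking the timeout codes first keeps a timed-out request from being misreported as a generic network error.
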
---
## Phase 6: User Story 4 — Passthrough Behaviour Preserved (Priority: P4)
**Goal**: Requests without `kmeURL` and without `/sitemap.xml` suffix continue to receive the existing 200 OK auth-check passthrough — zero regression
**Independent Test**: `curl -o /dev/null -w "%{http_code}" "http://localhost:3000/"` returns `200` and body is `Authorized` (unchanged)
### Tests for User Story 4
- [X] T013 [US4] Add `US-content-fetch: passthrough preserved` describe block to `tests/unit/proxy.test.js` with 1 test: GET `/?someOtherParam=value` (no `kmeURL`, not sitemap) → assert status 200, body `Authorized`, and confirm `axios.get` is never called (spy asserts not called) — verifies FR-012 and SC-005
**Checkpoint**: Passthrough test passes; run full `npm test` to confirm zero regressions across entire suite
---
## Final Phase: Polish & Cross-Cutting Concerns
**Purpose**: Changelog documentation and final validation
- [X] T014 [P] Add entry to `CHANGELOG.md` for feature `003-kme-content-fetch`: document new `contentFetchFlow()` in `kmeContentSourceAdapter.js` (routes `?kmeURL=` requests, handles all error paths 400/404/500/502, 10 s timeout), new `extractArticleBody(data)` in `kmeContentSourceAdapterHelpers.js`, new unit test describe blocks in `tests/unit/proxy.test.js`, and new contract tests in `tests/contract/proxy-http.test.js`
- [X] T015 Run full test suite `npm test` and confirm all tests pass; run the four quickstart.md `curl` smoke tests (valid kmeURL passthrough, empty kmeURL → 400, malformed kmeURL → 400, sitemap → 200) to validate end-to-end behaviour
---
## Dependencies & Execution Order
### Phase Dependencies
- **Setup (Phase 1)**: No dependencies — run immediately
- **Foundational (Phase 2)**: Depends on Setup ✅ — **BLOCKS** all user story phases
  - T002 → T003 (sequential, same file)
  - T004 [P] can run after T002+T003 (different file: test file)
- **US1 (Phase 3)**: Depends on Foundational complete (T002+T003)
  - T005 → T006 (sequential, same file)
  - T007 [P] and T008 [P] can run after T005+T006 (different files)
- **US2 (Phase 4)**: Depends on T005+T006 complete (tests the validation guards inside `contentFetchFlow`)
- **US3 (Phase 5)**: Depends on T005+T006 complete (tests the error guards inside `contentFetchFlow`)
- **US4 (Phase 6)**: Depends on T006 complete (tests the routing branch)
- **Polish (Final)**: Depends on all user story phases complete
### User Story Dependencies
- **US1 (P1)**: Depends only on Foundational phase
- **US2 (P2)**: Depends on US1 implementation (T005+T006) — validation lives inside `contentFetchFlow()`
- **US3 (P3)**: Depends on US1 implementation (T005+T006) — error paths live inside `contentFetchFlow()`
- **US4 (P4)**: Depends on T006 routing branch — tests that the passthrough is still reached when no `kmeURL` parameter is present
### Within Each Phase
- Source file edits must complete before their corresponding test tasks
- T002 must complete before T003 (same file, sequential)
- T005 must complete before T006 (same file, sequential)
- T005+T006 must complete before T007, T008, T009, T010, T011, T012, T013
### Parallel Opportunities
- T004 [P] runs in parallel with T005+T006 (different files: test file vs source file)
- After T005+T006: T007 [P], T008 [P], T009 [P], T010 [P], T011 [P], T012 [P] can all run in parallel (separate describe blocks in the unit test file, or the separate contract test file)
- T013 and T014 [P] run in parallel (different files)
---
## Parallel Execution Examples
### Foundational Phase Parallelism
```
# After T002+T003 complete, run simultaneously:
Task A: T004 — Write extractArticleBody unit tests in tests/unit/proxy.test.js
Task B: T005 — Implement contentFetchFlow() in src/proxyScripts/kmeContentSourceAdapter.js
```
### After T005+T006 Complete
```
# These 6 tasks can all run in parallel (different describe blocks / different files):
Task A: T007 — Happy path unit tests (proxy.test.js)
Task B: T008 — Happy path contract test (proxy-http.test.js)
Task C: T009 — Input validation unit tests (proxy.test.js, separate describe block)
Task D: T010 — Upstream error unit tests (proxy.test.js, separate describe block)
Task E: T011 — Body parsing unit tests (proxy.test.js, separate describe block)
Task F: T012 — Contract error tests (proxy-http.test.js, separate describe block)
```
---
## Implementation Strategy
### MVP First (User Story 1 Only)
1. Complete Phase 1: Setup baseline verification
2. Complete Phase 2: Add `extractArticleBody` helper (CRITICAL — blocks everything)
3. Complete Phase 3: Implement `contentFetchFlow()`, routing branch, and happy path tests
4. **STOP and VALIDATE**: `npm run test:unit` + `npm run test:contract` pass; manual `curl` smoke test works
5. **Deploy/demo if ready** — consumers can now fetch articles via the proxy
### Incremental Delivery
1. Foundation + US1 → happy path working → Demo MVP
2. Add US2 tests → validate 400 rejection works
3. Add US3 tests → validate error handling works
4. Add US4 test → confirm no regression
5. Polish → CHANGELOG + final `npm test`
### Single-Developer Sequence (Optimal Order)
```
T001 → T002 → T003 → T005 → T006 → T004* → T007 → T009 → T010 → T011 → T013 → T008 → T012 → T014 → T015
(* T004 can be done any time after T003 — fits naturally here before test sprint)
```
---
## Notes
- **VM sandbox constraint**: `contentFetchFlow()` must not contain any `import` or `require` — all dependencies (`axios`, `kmeContentSourceAdapterHelpers`, `kme_CSA_settings`, `URL`, `URLSearchParams`, `console`, `req`, `res`) arrive via the injected VM context
- **Helpers file constraint**: `extractArticleBody` must be inserted as a plain `function` declaration before the existing `return { ... }` block — no module syntax
- **`[P]` tasks**: different files with no dependency on incomplete tasks in the same file
- **`[Story]` labels**: map each test task back to the user story it validates for traceability
- Each user story's test tasks are independently runnable with `node --test tests/unit/proxy.test.js` (filter by describe block name, for example with the `--test-name-pattern` flag where the installed Node.js version supports it)
- Commit after each logical group (e.g., after T002+T003, after T005+T006, after all unit test tasks)
- Verify `npm test` green at each checkpoint before proceeding to next phase