feat: content fetch, sitemap fixes, remove oidcAuthFlow

- Add contentFetchFlow() to proxy (FR-001 through FR-012)
- Add extractArticleBody() helper with vkm:articleBody / articleBody fallback
- Dynamic proxyBaseUrl derivation from x-forwarded-proto/host headers
- Forward query/size/category params on /sitemap.xml requests
- Add Accept: application/ld+json header to content API calls
- Remove oidcAuthFlow() - unmatched requests now return 404 Not Found
- Fix xmlbuilder2 import: default import, call as xmlbuilder2.create(...)
- Version bump 0.2.0 → 0.3.0
- 45/45 tests passing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-23 16:40:06 -05:00
parent d50f041488
commit f840587e5e
29 changed files with 1998 additions and 352 deletions

@@ -0,0 +1,227 @@
---
description: "Task list for KME Article Content Fetch (003)"
---
# Tasks: KME Article Content Fetch
**Input**: Design documents from `specs/003-kme-content-fetch/`
**Prerequisites**: plan.md ✅, spec.md ✅, research.md ✅, data-model.md ✅, contracts/http-content-fetch.md ✅, quickstart.md ✅
**Architecture constraints**:
- Zero new files in `src/` — only `src/proxyScripts/kmeContentSourceAdapter.js` and `src/globalVariables/kmeContentSourceAdapterHelpers.js` are modified
- VM sandbox: zero `import`/`export` statements in proxy script or helpers file
- Helpers file is a literal function body (ends with `return { ... }`) — new function added before that block
- Tests use Node.js built-in test runner (`node:test`)
**Files in scope**:
| File | Change |
|------|--------|
| `src/globalVariables/kmeContentSourceAdapterHelpers.js` | Add `extractArticleBody(data)`; export in `return { ... }` |
| `src/proxyScripts/kmeContentSourceAdapter.js` | Add `contentFetchFlow()`; add routing branch |
| `tests/unit/proxy.test.js` | Add content-fetch describe blocks and helper tests |
| `tests/contract/proxy-http.test.js` | Add content-fetch contract tests |
| `CHANGELOG.md` | Add feature entry |
## Format: `[ID] [P?] [Story?] Description`
- **[P]**: Can run in parallel (different files, no dependencies on incomplete tasks)
- **[Story]**: Which user story this task belongs to (US1-US4)
- All file paths are relative to repository root
---
## Phase 1: Setup
**Purpose**: Confirm baseline before any modifications
- [X] T001 Run `npm test` from repository root to confirm all existing tests pass and record the baseline count
**Checkpoint**: Baseline confirmed — no pre-existing failures
---
## Phase 2: Foundational (Blocking Prerequisite)
**Purpose**: Add `extractArticleBody` pure helper — required by `contentFetchFlow()` in every user story phase
**⚠️ CRITICAL**: Phase 3 implementation cannot begin until T002 and T003 are complete; T004 is independently testable after T002+T003
- [X] T002 Add `extractArticleBody(data)` function body to `src/globalVariables/kmeContentSourceAdapterHelpers.js` — insert immediately before the existing `return { ... }` block; implementation: guard for non-object input (`if (!data || typeof data !== 'object') return null`), extract `data['vkm:articleBody']`, return null if field is null/undefined/non-string/empty/whitespace, otherwise return the string
- [X] T003 Add `extractArticleBody` to the exports in the `return { ... }` block at the bottom of `src/globalVariables/kmeContentSourceAdapterHelpers.js` so the injected VM context exposes the new function
- [X] T004 [P] Add `extractArticleBody helper` describe block to `tests/unit/proxy.test.js` covering all 7 edge cases per data-model.md: valid HTML string → returns string; empty string → null; whitespace-only string → null; null field value → null; field absent (`{}`) → null; null input → null; non-object input (string) → null — no mocking needed, call the helper directly
**Checkpoint**: `extractArticleBody` is implemented, exported, and unit-tested; run `npm run test:unit` to confirm T004 passes
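
The helper described in T002 can be sketched roughly as follows. This is a minimal illustration of the guard and field checks listed there, not necessarily the committed code; it reads only `vkm:articleBody`, as T002 specifies.

```javascript
// Sketch of extractArticleBody per T002. Written as a plain function declaration
// so it could sit before the helpers file's `return { ... }` block.
function extractArticleBody(data) {
  // Non-object input (null, undefined, string, number) cannot carry the field.
  if (!data || typeof data !== 'object') return null;

  const body = data['vkm:articleBody'];

  // Null, undefined, non-string, empty, and whitespace-only values all map to null.
  if (typeof body !== 'string' || body.trim() === '') return null;

  return body;
}

// A few of the edge cases from T004, called directly with no mocking.
console.log(extractArticleBody({ 'vkm:articleBody': '<p>Hello</p>' })); // <p>Hello</p>
console.log(extractArticleBody({ 'vkm:articleBody': '   ' }));          // null
console.log(extractArticleBody('just a string'));                       // null
```

Because the function is pure, the seven T004 cases reduce to direct calls with literal inputs.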
---
## Phase 3: User Story 1 — Happy Path Article Fetch (Priority: P1) 🎯 MVP
**Goal**: Proxy receives a valid `?kmeURL=` request, obtains an OIDC token, fetches the upstream article, extracts `vkm:articleBody`, and returns it as `200 text/html`
**Independent Test**: `curl "http://localhost:3000/?kmeURL=https://content.kme.example/articles/123"` returns `200 OK`, `Content-Type: text/html`, and body matching `vkm:articleBody` from the mock upstream
### Implementation for User Story 1
- [X] T005 [US1] Implement complete `contentFetchFlow()` async function in `src/proxyScripts/kmeContentSourceAdapter.js` following the 9-step design in plan.md: (1) extract `kmeURL` via `new URL(req.url, 'http://localhost').searchParams.get('kmeURL') ?? ''`, (2) empty/blank → 400, (3) malformed/non-http(s) → 400, (4) `validateSettings` missing field → 500, (5) `getValidToken` throws → 502, (6) `axios.get(kmeURL, { headers: { Authorization: 'OIDC_id_token {token}' }, timeout: 10000 })` — ECONNABORTED/ERR_CANCELED → 502, upstream 4xx → 404, upstream 5xx → 502, network error → 502, (7) string body fallback `JSON.parse` — failure → 502; non-object → 502, (8) `extractArticleBody(data)` → null → 404, (9) `res.writeHead(200, { 'Content-Type': 'text/html' }); res.end(articleBody)`
- [X] T006 [US1] Add content-fetch routing branch to the URL dispatch block in `src/proxyScripts/kmeContentSourceAdapter.js`: insert `else if (new URL(req.url, 'http://localhost').searchParams.has('kmeURL')) { await contentFetchFlow(); }` between the existing sitemap check and the `oidcAuthFlow()` fallback
### Tests for User Story 1
- [X] T007 [P] [US1] Add `US-content-fetch: happy path` describe block to `tests/unit/proxy.test.js` with two tests: (a) stub `getValidToken` returning cached token + stub `axios.get` returning `{ data: { 'vkm:articleBody': '<p>Hello</p>' } }` → assert status 200, `Content-Type: text/html`, body `<p>Hello</p>`; (b) stub `getValidToken` simulating cache miss (returns a freshly acquired token) → same 200 assertion
- [X] T008 [P] [US1] Add happy path contract test to `tests/contract/proxy-http.test.js`: start a real mock HTTP server that returns `{ "vkm:articleBody": "<p>Contract test article</p>" }` with `Content-Type: application/ld+json`; start a real mock token server; issue `GET /?kmeURL={mock-server-url}` to the proxy; assert status 200, `Content-Type: text/html`, response body equals `<p>Contract test article</p>`; verify total round-trip is under 11 s (SC-001)
**Checkpoint**: `npm run test:unit` and `npm run test:contract` both pass for happy path; manually verify with `curl` per quickstart.md
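
Step (7) of T005, the string-body fallback, could look something like the sketch below. The function name `normalizeUpstreamBody` and the `{ ok, status, body }` result shape are hypothetical; T005 places this logic inline in `contentFetchFlow()`. The response text for the non-object case is also an assumption, since the task list only specifies the status.

```javascript
// Hypothetical helper illustrating step (7): accept either a parsed object or a JSON string.
function normalizeUpstreamBody(raw) {
  const unparseable = {
    ok: false,
    status: 502,
    body: 'Bad Gateway: unparseable response from upstream',
  };

  let data = raw;
  if (typeof data === 'string') {
    // The upstream may deliver JSON as text; try to parse it before giving up.
    try {
      data = JSON.parse(data);
    } catch {
      return unparseable;
    }
  }

  // Valid JSON that is not an object (for example a number or null) is also rejected.
  // Reusing the same message here is an assumption.
  if (data === null || typeof data !== 'object') return unparseable;

  return { ok: true, data };
}

console.log(normalizeUpstreamBody('not json{{{').status);                    // 502
console.log(normalizeUpstreamBody('{"vkm:articleBody":"<p>Hi</p>"}').ok);    // true
```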
---
## Phase 4: User Story 2 — Missing or Empty kmeURL Parameter (Priority: P2)
**Goal**: Requests with absent, empty, whitespace, or malformed `kmeURL` receive a 400 response with no upstream call made
**Independent Test**: `curl -o /dev/null -w "%{http_code}" "http://localhost:3000/?kmeURL="` returns `400`; `curl -o /dev/null -w "%{http_code}" "http://localhost:3000/?kmeURL=not-a-url"` returns `400`
### Tests for User Story 2
- [X] T009 [P] [US2] Add `US-content-fetch: input validation` describe block to `tests/unit/proxy.test.js` with 6 tests using a spy on `axios.get` to assert it is never called: (a) `?kmeURL` absent (no kmeURL param) → routes to `oidcAuthFlow` → 200 (confirms FR-012); (b) `?kmeURL=` empty string → 400, body `Bad Request: kmeURL parameter is required`; (c) `?kmeURL=%20` whitespace-only → 400; (d) `?kmeURL=relative/path` → 400, body `Bad Request: kmeURL must be a well-formed absolute http/https URL`; (e) `?kmeURL=ftp://example.com/article` non-http protocol → 400; (f) `?kmeURL=:::malformed` → 400
**Checkpoint**: `npm run test:unit` passes for validation tests; confirm no upstream stubs are invoked in any 400 scenario
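
The validation guards exercised by T009 (steps 1 to 3 of T005) can be illustrated with a small sketch. The function name `validateKmeUrl` and its return shape are hypothetical, since the task list keeps this logic inside `contentFetchFlow()`; the response bodies are taken from T009.

```javascript
// Hypothetical helper illustrating the kmeURL guards; relies only on the global URL class.
function validateKmeUrl(reqUrl) {
  const malformed = {
    ok: false,
    status: 400,
    body: 'Bad Request: kmeURL must be a well-formed absolute http/https URL',
  };

  // Step 1: read the parameter; req.url is relative, so a dummy base is supplied.
  const kmeURL = new URL(reqUrl, 'http://localhost').searchParams.get('kmeURL') ?? '';

  // Step 2: empty or whitespace-only values are rejected before any parsing.
  if (kmeURL.trim() === '') {
    return { ok: false, status: 400, body: 'Bad Request: kmeURL parameter is required' };
  }

  // Step 3: the value must parse as an absolute URL with an http or https scheme.
  let parsed;
  try {
    parsed = new URL(kmeURL);
  } catch {
    return malformed;
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') return malformed;

  return { ok: true, kmeURL };
}

console.log(validateKmeUrl('/?kmeURL=').status);                          // 400
console.log(validateKmeUrl('/?kmeURL=ftp://example.com/article').status); // 400
console.log(validateKmeUrl('/?kmeURL=https://content.kme.example/articles/123').ok); // true
```

Since every rejection happens before any network call, a spy on `axios.get` should record zero invocations in all 400 scenarios.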
---
## Phase 5: User Story 3 — Upstream Failure & Missing Article Body (Priority: P3)
**Goal**: All upstream error conditions (token failure, 4xx, 5xx, timeout, network error, bad body, missing/empty `vkm:articleBody`) return the correct 404 or 502 status to the caller
**Independent Test**: Stub `axios.get` to throw an ECONNABORTED error; verify proxy returns 502. Stub `getValidToken` to throw; verify proxy returns 502. Stub `axios.get` returning `{ data: {} }`; verify proxy returns 404.
### Tests for User Story 3
- [X] T010 [P] [US3] Add `US-content-fetch: upstream errors` describe block to `tests/unit/proxy.test.js` with 7 tests: (a) `getValidToken` throws → 502, body `Bad Gateway: token acquisition failed`; (b) `axios.get` throws with `{ response: { status: 404 } }` → 404, body `Not Found: article not found at upstream`; (c) `axios.get` throws with `{ response: { status: 410 } }` → 404; (d) `axios.get` throws with `{ response: { status: 503 } }` → 502, body `Bad Gateway: upstream error HTTP 503`; (e) `axios.get` throws with `{ code: 'ECONNABORTED' }` → 502, body `Bad Gateway: upstream request timed out`; (f) `axios.get` throws with `{ code: 'ERR_CANCELED' }` → 502; (g) `axios.get` throws with `{ message: 'ENOTFOUND' }` (no `response`, no code) → 502, body contains `Bad Gateway:`
- [X] T011 [P] [US3] Add `US-content-fetch: body parsing` describe block to `tests/unit/proxy.test.js` with 5 tests (all require valid `getValidToken` stub): (a) `axios.get` returns `{ data: 'not json{{{' }` (string, unparseable) → 502, body `Bad Gateway: unparseable response from upstream`; (b) `axios.get` returns `{ data: { 'vkm:articleBody': undefined } }` (field absent) → 404, body `Not Found: article body not present in upstream response`; (c) field is `null` → 404; (d) field is `''` empty string → 404; (e) field is `' '` whitespace-only → 404
- [X] T012 [P] [US3] Add contract error tests to `tests/contract/proxy-http.test.js`: (a) mock upstream server returns HTTP 404 → proxy returns 404; (b) mock upstream server returns HTTP 503 → proxy returns 502; (c) mock server accepts connection but never responds (use `server.on('request', () => {})`) → proxy returns 502 within 12 s and does not hang
**Checkpoint**: All 12 unit tests in T010+T011 pass (7 in T010, 5 in T011); all 3 contract error tests in T012 pass
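
The status mapping that T010 verifies can be summarised in a sketch like the one below. The name `mapUpstreamError` is hypothetical, and two response bodies are assumptions: the task list gives no body for `ERR_CANCELED`, and for a generic network error it only requires that the body contain `Bad Gateway:`.

```javascript
// Hypothetical helper: translate an axios-style rejection into the proxy's status and body.
function mapUpstreamError(err) {
  const code = err && err.code;
  // Timeouts and cancelled requests are reported as a gateway failure.
  // Using the same body for ERR_CANCELED is an assumption.
  if (code === 'ECONNABORTED' || code === 'ERR_CANCELED') {
    return { status: 502, body: 'Bad Gateway: upstream request timed out' };
  }

  const upstreamStatus = err && err.response && err.response.status;
  // Any upstream 4xx is collapsed into 404 for the caller.
  if (upstreamStatus >= 400 && upstreamStatus < 500) {
    return { status: 404, body: 'Not Found: article not found at upstream' };
  }
  // Upstream 5xx becomes 502 and reports the original status.
  if (upstreamStatus >= 500) {
    return { status: 502, body: `Bad Gateway: upstream error HTTP ${upstreamStatus}` };
  }

  // No response and no recognised code: treat as a network error (body suffix assumed).
  const detail = (err && err.message) || 'network error';
  return { status: 502, body: `Bad Gateway: ${detail}` };
}

console.log(mapUpstreamError({ response: { status: 410 } }).status); // 404
console.log(mapUpstreamError({ response: { status: 503 } }).body);   // Bad Gateway: upstream error HTTP 503
```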
---
## Phase 6: User Story 4 — Passthrough Behaviour Preserved (Priority: P4)
**Goal**: Requests without `kmeURL` and without `/sitemap.xml` suffix continue to receive the existing 200 OK auth-check passthrough — zero regression
**Independent Test**: `curl -o /dev/null -w "%{http_code}" "http://localhost:3000/"` returns `200` and body is `Authorized` (unchanged)
### Tests for User Story 4
- [X] T013 [US4] Add `US-content-fetch: passthrough preserved` describe block to `tests/unit/proxy.test.js` with 1 test: GET `/?someOtherParam=value` (no `kmeURL`, not sitemap) → assert status 200, body `Authorized`, and confirm `axios.get` is never called (spy asserts not called) — verifies FR-012 and SC-005
**Checkpoint**: Passthrough test passes; run full `npm test` to confirm zero regressions across entire suite
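
The dispatch order that T006 and T013 rely on can be sketched as a pure function. The name `selectFlow`, the string labels, and the exact sitemap check (a pathname ending in `/sitemap.xml`) are assumptions made for illustration; the real dispatch block calls the flow functions directly.

```javascript
// Hypothetical routing helper: decides which flow a request URL would reach.
function selectFlow(reqUrl) {
  const url = new URL(reqUrl, 'http://localhost');

  // Existing sitemap check comes first (suffix match assumed).
  if (url.pathname.endsWith('/sitemap.xml')) return 'sitemap';

  // New branch: any request carrying a kmeURL parameter, even an empty one,
  // goes to the content-fetch flow, which then validates the value.
  if (url.searchParams.has('kmeURL')) return 'contentFetch';

  // Everything else keeps the pre-existing fallback behaviour.
  return 'passthrough';
}

console.log(selectFlow('/?someOtherParam=value')); // passthrough
console.log(selectFlow('/?kmeURL='));              // contentFetch
console.log(selectFlow('/sitemap.xml'));           // sitemap
```

Using `searchParams.has` rather than a truthiness check on the value is what sends `?kmeURL=` to the 400 path instead of the passthrough.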
---
## Final Phase: Polish & Cross-Cutting Concerns
**Purpose**: Changelog documentation and final validation
- [X] T014 [P] Add entry to `CHANGELOG.md` for feature `003-kme-content-fetch`: document new `contentFetchFlow()` in `kmeContentSourceAdapter.js` (routes `?kmeURL=` requests, handles all error paths 400/404/500/502, 10 s timeout), new `extractArticleBody(data)` in `kmeContentSourceAdapterHelpers.js`, new unit test describe blocks in `tests/unit/proxy.test.js`, and new contract tests in `tests/contract/proxy-http.test.js`
- [X] T015 Run full test suite `npm test` and confirm all tests pass; run the four quickstart.md `curl` smoke tests (valid kmeURL passthrough, empty kmeURL → 400, malformed kmeURL → 400, sitemap → 200) to validate end-to-end behaviour
---
## Dependencies & Execution Order
### Phase Dependencies
- **Setup (Phase 1)**: No dependencies — run immediately
- **Foundational (Phase 2)**: Depends on Setup ✅ — **BLOCKS** all user story phases
- T002 → T003 (sequential, same file)
- T004 [P] can run after T002+T003 (different file: test file)
- **US1 (Phase 3)**: Depends on Foundational complete (T002+T003)
- T005 → T006 (sequential, same file)
- T007 [P] and T008 [P] can run after T005+T006 (different files)
- **US2 (Phase 4)**: Depends on T005+T006 complete (tests the validation guards inside `contentFetchFlow`)
- **US3 (Phase 5)**: Depends on T005+T006 complete (tests the error guards inside `contentFetchFlow`)
- **US4 (Phase 6)**: Depends on T006 complete (tests the routing branch)
- **Polish (Final)**: Depends on all user story phases complete
### User Story Dependencies
- **US1 (P1)**: Depends only on Foundational phase
- **US2 (P2)**: Depends on US1 implementation (T005+T006) — validation lives inside `contentFetchFlow()`
- **US3 (P3)**: Depends on US1 implementation (T005+T006) — error paths live inside `contentFetchFlow()`
- **US4 (P4)**: Depends on T006 routing branch; tests that the passthrough is still reached when no `kmeURL` is present
### Within Each Phase
- Source file edits must complete before their corresponding test tasks
- T002 must complete before T003 (same file, sequential)
- T005 must complete before T006 (same file, sequential)
- T005+T006 must complete before T007, T008, T009, T010, T011, T012, T013
### Parallel Opportunities
- T004 [P] runs in parallel with T005+T006 (different files: test file vs source file)
- After T005+T006: T007 [P], T008 [P], T009 [P], T010 [P], T011 [P], T012 [P] can all run in parallel (different describe blocks, or separate test file vs unit file)
- T013 and T014 [P] run in parallel (different files)
---
## Parallel Execution Examples
### Foundational Phase Parallelism
```
# After T002+T003 complete, run simultaneously:
Task A: T004 — Write extractArticleBody unit tests in tests/unit/proxy.test.js
Task B: T005 — Implement contentFetchFlow() in src/proxyScripts/kmeContentSourceAdapter.js
```
### After T005+T006 Complete
```
# These 6 tasks can all run in parallel (different describe blocks / different files):
Task A: T007 — Happy path unit tests (proxy.test.js)
Task B: T008 — Happy path contract test (proxy-http.test.js)
Task C: T009 — Input validation unit tests (proxy.test.js, separate describe block)
Task D: T010 — Upstream error unit tests (proxy.test.js, separate describe block)
Task E: T011 — Body parsing unit tests (proxy.test.js, separate describe block)
Task F: T012 — Contract error tests (proxy-http.test.js, separate describe block)
```
---
## Implementation Strategy
### MVP First (User Story 1 Only)
1. Complete Phase 1: Setup baseline verification
2. Complete Phase 2: Add `extractArticleBody` helper (CRITICAL — blocks everything)
3. Complete Phase 3: Implement `contentFetchFlow()`, routing branch, and happy path tests
4. **STOP and VALIDATE**: `npm run test:unit` + `npm run test:contract` pass; manual `curl` smoke test works
5. **Deploy/demo if ready** — consumers can now fetch articles via the proxy
### Incremental Delivery
1. Foundation + US1 → happy path working → Demo MVP
2. Add US2 tests → validate 400 rejection works
3. Add US3 tests → validate error handling works
4. Add US4 test → confirm no regression
5. Polish → CHANGELOG + final `npm test`
### Single-Developer Sequence (Optimal Order)
```
T001 → T002 → T003 → T005 → T006 → T004* → T007 → T009 → T010 → T011 → T013 → T008 → T012 → T014 → T015
(* T004 can be done any time after T003 — fits naturally here before test sprint)
```
---
## Notes
- **VM sandbox constraint**: `contentFetchFlow()` must not contain any `import` or `require` — all dependencies (`axios`, `kmeContentSourceAdapterHelpers`, `kme_CSA_settings`, `URL`, `URLSearchParams`, `console`, `req`, `res`) arrive via the injected VM context
- **Helpers file constraint**: `extractArticleBody` must be inserted as a plain `function` declaration before the existing `return { ... }` block — no module syntax
- **`[P]` tasks**: different files with no dependency on incomplete tasks in the same file
- **`[Story]` labels**: map each test task back to the user story it validates for traceability
- Each user story's test tasks are independently runnable with `node --test tests/unit/proxy.test.js`; to run a single story, filter by describe block name (for example with `--test-name-pattern`)
- Commit after each logical group (e.g., after T002+T003, after T005+T006, after all unit test tasks)
- Verify `npm test` green at each checkpoint before proceeding to next phase
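
To illustrate why the helpers file must be a bare function body with no module syntax, here is a minimal sketch of how such a file could be evaluated. The loader shown (`new Function`) is an assumption for demonstration; the actual host may use a different mechanism to build the VM context.

```javascript
// Stand-in for the helpers file contents: no import/export, ends with `return { ... }`.
const helpersSource = `
  function extractArticleBody(data) {
    if (!data || typeof data !== 'object') return null;
    const body = data['vkm:articleBody'];
    return typeof body === 'string' && body.trim() !== '' ? body : null;
  }
  return { extractArticleBody };
`;

// Assumed loader: wrap the source in a function and call it to get the helpers object.
const kmeContentSourceAdapterHelpers = new Function(helpersSource)();
console.log(kmeContentSourceAdapterHelpers.extractArticleBody({ 'vkm:articleBody': '<p>Hi</p>' }));

// Module syntax is not valid inside a function body, so it fails at load time.
let moduleSyntaxRejected = false;
try {
  new Function('export const x = 1;');
} catch (err) {
  moduleSyntaxRejected = err instanceof SyntaxError;
}
console.log(moduleSyntaxRejected); // true
```

A function only becomes visible to the proxy script if it is listed in the final `return { ... }`, which is why T003 is a separate task from T002.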