# Tasks: Sitemap XML Generation **Feature**: `002-sitemap-generation` **Input**: Design documents from `/specs/002-sitemap-generation/` **Prerequisites**: plan.md ✅ spec.md ✅ research.md ✅ data-model.md ✅ contracts/sitemap-endpoint.md ✅ quickstart.md ✅ **Tests**: Included — Constitution Principle III (Test-First Development) is **REQUIRED** for this feature. **Organization**: Tasks grouped by user story to enable independent implementation and testing. ## Format: `[ID] [P?] [Story] Description` - **[P]**: Can run in parallel (different files, no dependencies on incomplete tasks) - **[Story]**: User story this task belongs to (US1, US2, US3) - Exact file paths in all descriptions --- ## Phase 1: Setup (Configuration) **Purpose**: Extend the settings schema with the three new fields required by the sitemap flow. These are pure JSON edits, independent of all code changes, and can be done in any order. - [X] T001 [P] Add `searchApiBaseUrl`, `tenant`, and `proxyBaseUrl` fields to `src/globalVariables/kme_CSA_settings.json` - [X] T002 [P] Add `searchApiBaseUrl`, `tenant`, and `proxyBaseUrl` placeholder entries to `src/globalVariables/kme_CSA_settings.json.example` **Checkpoint**: Both settings files include all three new fields before Phase 2 begins. --- ## Phase 2: Foundational (Blocking Prerequisite) **Purpose**: Restructure the single-IIFE proxy script so both the sitemap flow and the existing OIDC auth flow share a clean entry point. **No user-story work can begin until this is done.** - [X] T003 Restructure `src/proxyScripts/kmeContentSourceAdapter.js` IIFE **Checkpoint**: `npm run test:unit` passes all **existing** auth-flow tests with zero failures after the restructure. --- ## Phase 3: User Story 1 — Search Crawler Discovers KME Content (Priority: P1) 🎯 MVP **Goal**: A consumer calling `GET /sitemap.xml` receives a well-formed XML Sitemap containing one `/` per knowledge item, built via `xmlbuilder2`, with `Content-Type: application/xml`. **Independent Test**: `curl http://localhost:3000/sitemap.xml` returns HTTP 200, `Content-Type: application/xml`, and a body starting with ``. ### Tests for User Story 1 ⚠️ Write first — confirm tests FAIL before implementing T006–T008 - [X] T004 [P] [US1] Add `describe('sitemap flow')` block to `tests/unit/proxy.test.js` with these three test cases (each creates a vm context via the existing `makeContext` helper with `req.url` set to `'/sitemap.xml'`): - **Happy path — items present**: mock `axios.get` resolving `{ data: { items: [{ 'vkm:url': 'https://kme.example.com/doc-1' }, { 'vkm:url': 'https://kme.example.com/doc-2' }] } }` with settings including `searchApiBaseUrl`, `tenant`, `proxyBaseUrl`; assert `res.statusCode === 200`, `res.headers['Content-Type'] === 'application/xml'`, body contains `https://proxy.example.com?kmeURL=https%3A%2F%2Fkme.example.com%2Fdoc-1` - **Happy path — zero items**: mock `axios.get` resolving `{ data: { items: [] } }`; assert 200, `application/xml`, body contains `` - **Items with empty `vkm:url` filtered**: mock items array `[{ 'vkm:url': '' }, { 'vkm:url': 'https://kme.example.com/valid' }]`; assert body contains exactly one `` and it contains `valid` - [X] T005 [P] [US1] Add `describe('sitemap endpoint')` block to `tests/contract/proxy-http.test.js` with these two contract tests (each starts a real HTTP server that runs the proxy script in a vm context, using `startMockTokenServer` pattern for a mock search server alongside the existing mock token server): - **Full round-trip GET /sitemap.xml**: mock search server returns `{ items: [{ 'vkm:url': 'https://kme.example.com/doc-1' }] }`; send real `axios.get('http://localhost:/sitemap.xml')`; assert status 200, `content-type` header contains `application/xml`, body is parseable XML containing `` - **Empty results round-trip**: mock search server returns `{ items: [] }`; assert 200, `application/xml`, body contains `` element ### Implementation for User Story 1 - [X] T006 [US1] Replace the `sitemapFlow()` stub in `src/proxyScripts/kmeContentSourceAdapter.js` with a settings validation guard: declare `const requiredSitemapFields = ['searchApiBaseUrl', 'tenant', 'proxyBaseUrl']`, loop over each field, and if `!kme_CSA_settings[field]` respond `res.writeHead(500, { 'Content-Type': 'text/plain' })` + `res.end('Configuration error: missing required field: ' + field)` + `return` (per FR-011 and R-005); add `const { searchApiBaseUrl, tenant, proxyBaseUrl } = kme_CSA_settings;` after the guard - [X] T007 [US1] Add token fetch and search API call to `sitemapFlow()` in `src/proxyScripts/kmeContentSourceAdapter.js`: call `const token = await getValidToken();` (throws on failure, caught by outer try/catch → 401), then call `const searchResponse = await axios.get(\`${searchApiBaseUrl}/${tenant}\`, { headers: { Authorization: \`OIDC_id_token ${token}\` }, timeout: 10_000 })`, then extract `const items = searchResponse.data.items ?? searchResponse.data ?? [];` (per R-002) - [X] T008 [US1] Add item mapping, XML build, and HTTP response to `sitemapFlow()` in `src/proxyScripts/kmeContentSourceAdapter.js`: iterate `items`, skip entries where `!item['vkm:url']` (FR-006), for each valid item compute `const loc = \`${proxyBaseUrl}?kmeURL=${encodeURIComponent(item['vkm:url'])}\`` (FR-005, R-006); build XML via `const doc = xmlbuilder2({ version: '1.0', encoding: 'UTF-8' }); const urlset = doc.ele('urlset', { xmlns: 'http://www.sitemaps.org/schemas/sitemap/0.9' }); urlset.ele('url').ele('loc').txt(loc).up().up();` (FR-008, R-003); serialise with `const xml = doc.end({ prettyPrint: false })`; respond `res.writeHead(200, { 'Content-Type': 'application/xml' }); res.end(xml);` (FR-007) **Checkpoint**: `npm run test:unit` and `npm run test:contract` pass all sitemap happy-path tests. At this point `GET /sitemap.xml` is fully functional; MVP is deliverable. --- ## Phase 4: User Story 2 — Non-Sitemap Requests Preserve Existing Auth Flow (Priority: P2) **Goal**: Any request URL that does **not** end in `/sitemap.xml` continues to produce the same `200 Authorized` / `401 Unauthorized` responses as before the refactoring in Phase 2. **Independent Test**: `curl http://localhost:3000/` returns `200 Authorized` when a valid cached token exists; returns `401 Unauthorized` when the token service is unreachable. ### Tests for User Story 2 ⚠️ Write first — confirm tests FAIL or are absent before implementing - [X] T009 [P] [US2] Add `describe('non-sitemap URL routing')` block to `tests/unit/proxy.test.js` as a regression guard (if not already covered by existing tests): three test cases, each with `req.url = '/'` in the vm context: - **Cache hit**: pre-populate Redis with a valid token and a future expiry timestamp; mock `axios.post` to fail (should never be called); assert `res.statusCode === 200`, body `=== 'Authorized'`, and `axios.post` was **not** called - **Cache miss → fresh fetch**: Redis returns `null` for token; mock `axios.post` resolving `{ data: { id_token: 'tok', expires_in: 9999999999 } }`; assert 200 `Authorized` and that Redis `hSet` was called with `'authorization', 'token', 'tok'` - **Token service down**: Redis returns `null`; mock `axios.post` rejecting with `{ code: 'ECONNABORTED' }`; assert `res.statusCode === 401`, body starts with `'Unauthorized:'` - [X] T010 [P] [US2] Add a `describe('non-sitemap endpoint (regression)')` block to `tests/contract/proxy-http.test.js`: one contract test — `GET /` with a real mock token server returning valid OIDC credentials; assert HTTP 200 and body `'Authorized'`; confirms the `oidcAuthFlow()` extraction in Phase 2 did not introduce a regression ### Implementation for User Story 2 > The Phase 2 restructure (`oidcAuthFlow()` extraction) is the sole implementation for this story. > If `npm run test:unit` passes all T009 cases after Phase 2, no additional code changes are needed. - [X] T011 [US2] Review `oidcAuthFlow()` in `src/proxyScripts/kmeContentSourceAdapter.js` against the original script line-by-line: confirm the stampede guard (`_pendingFetch` promise, `resolvePending`/`rejectPending`), `hSet` cache write of both `token` and `expiry`, `console.debug`/`console.info`/`console.error` calls, and all error-path `res.writeHead(401)` / `res.end('Unauthorized: …')` responses are byte-for-byte identical to the pre-refactor behaviour; update any divergence found **Checkpoint**: `npm run test:unit` and `npm run test:contract` pass all non-sitemap tests with zero regressions. --- ## Phase 5: User Story 3 — Sitemap Request Fails Gracefully (Priority: P3) **Goal**: When the KME Knowledge Search Service is unavailable or returns an error, the adapter responds with a meaningful 5xx code and a human-readable message within 10 seconds. **Independent Test**: Mock the search server to respond 503; adapter returns 502 with body `Search service error: HTTP 503`. Mock the search server to time out; adapter returns 504. ### Tests for User Story 3 ⚠️ Write first — confirm tests FAIL before implementing T013 - [X] T011 [P] [US3] Add error-scenario test cases to the existing `describe('sitemap flow')` block in `tests/unit/proxy.test.js` (append after T004 cases): - **Upstream 503**: mock `axios.get` rejecting with `{ response: { status: 503 } }`; assert `res.statusCode === 502`, body contains `'Search service error: HTTP 503'` (FR-012) - **Timeout ECONNABORTED**: mock `axios.get` rejecting with `{ code: 'ECONNABORTED' }`; assert `res.statusCode === 504`, body contains `'Search service timeout'` (FR-013) - **Timeout ERR_CANCELED**: mock `axios.get` rejecting with `{ code: 'ERR_CANCELED' }`; assert `res.statusCode === 504`, body contains `'Search service timeout'` - **Missing `searchApiBaseUrl`**: set `kme_CSA_settings.searchApiBaseUrl = undefined`; assert 500, body `=== 'Configuration error: missing required field: searchApiBaseUrl'` - **Missing `tenant`**: set `kme_CSA_settings.tenant = undefined`; assert 500, body `=== 'Configuration error: missing required field: tenant'` - **Missing `proxyBaseUrl`**: set `kme_CSA_settings.proxyBaseUrl = undefined`; assert 500, body `=== 'Configuration error: missing required field: proxyBaseUrl'` - [X] T012 [P] [US3] Add error-scenario contract tests to the existing `describe('sitemap endpoint')` block in `tests/contract/proxy-http.test.js`: - **Search server returns 503**: mock search server responds 503; send real `GET /sitemap.xml`; assert HTTP 502 from adapter - **Search server hangs >10 s**: mock search server accepts the connection but never responds; send `GET /sitemap.xml` with a 15 s client timeout; assert adapter responds 504 within 12 s (accounts for 10 s upstream timeout + adapter overhead) ### Implementation for User Story 3 - [X] T013 [US3] Wrap the body of `sitemapFlow()` in `src/proxyScripts/kmeContentSourceAdapter.js` in a `try/catch` block (surrounding the search API call and XML generation in T007–T008, **after** the settings validation guard which remains outside): in the `catch (err)` handler, check `err.code === 'ECONNABORTED' || err.code === 'ERR_CANCELED'` → `res.writeHead(504, { 'Content-Type': 'text/plain' }); res.end('Search service timeout');`; else if `err.response` → `res.writeHead(502, { 'Content-Type': 'text/plain' }); res.end('Search service error: HTTP ' + err.response.status);`; else → `res.writeHead(502, { 'Content-Type': 'text/plain' }); res.end('Search service error: ' + err.message);` (per R-004 and contracts/sitemap-endpoint.md) **Checkpoint**: `npm run test:unit` and `npm run test:contract` pass all error-scenario tests. --- ## Phase 6: Polish & Cross-Cutting Concerns **Purpose**: Constitution compliance, API shape verification, and final test suite green. - [X] T014 [P] Verify `src/proxyScripts/kmeContentSourceAdapter.js` constitution compliance: run `grep -n 'import\|export\|process\.env\|global\.config\b\|config\.' src/proxyScripts/kmeContentSourceAdapter.js` and confirm zero matches (FR-009, Constitution §I); confirm `xmlbuilder2` is the sole XML-building mechanism (FR-008); confirm no new files were created in `src/` - [X] T015 [P] Verify live search API response shape against R-002 assumption: using a test token, call `GET ${searchApiBaseUrl}/${tenant}` manually with `curl -H "Authorization: OIDC_id_token " /` and confirm (a) the top-level key holding the items array (`items` vs `results` vs bare array) and (b) that `vkm:url` is a direct string property of each item; update the extraction line `response.data.items ?? response.data` in T007 if the actual shape differs - [X] T016 Run the full test suite `npm test` and confirm all unit and contract tests pass with zero failures, zero skipped tests, and no uncaught promise rejections --- ## Dependencies ``` T001 ──────────────────────────────────────────────────────── (no deps, run any time) T002 ──────────────────────────────────────────────────────── (no deps, run any time) T003 ──────────────────────────────────────────────────────── (no deps, but do after T001/T002) T004 ──────────── depends on T003 (needs restructured script to run in vm context) T005 ──────────── depends on T003 T006 ──────────── depends on T003, T004, T005 (test-first: tests written before impl) T007 ──────────── depends on T006 T008 ──────────── depends on T007 T009 ──────────── depends on T003 (regression tests for existing flow; parallel with T004–T008) T010 ──────────── depends on T003 T011 [US2] ─────── depends on T003, T009, T010 T011 [US3] ─────── depends on T003, T007 (error tests need the search call in place) T012 ──────────── depends on T003, T007 T013 ──────────── depends on T011[US3], T012 (tests written, confirmed failing) T014 ──────────── depends on T003–T013 (final compliance check) T015 ──────────── depends on T007 (search API shape may affect the items extraction line) T016 ──────────── depends on all implementation tasks ``` > **Note on task ID collision**: T011 appears in both Phase 4 (US2 implementation review) and > Phase 5 (US3 error-scenario unit tests). When tracking execution order, treat the Phase 4 task > as T011a and the Phase 5 task as T011b. Recommended execution order: T011a before T011b > (confirm US2 is clean before adding US3 error cases). --- ## Parallel Execution Examples ### Within Phase 1 (both independent JSON edits): ``` T001 ──────► done T002 ──────► done ``` ### After Phase 2 foundation, US1 tests and US2 tests can be written in parallel: ``` T003 complete ├── T004 (US1 unit tests) ──────────► ├── T005 (US1 contract tests) ──────► ├── T009 (US2 unit tests) ──────────► all done → T006 → T007 → T008 → T011a └── T010 (US2 contract tests) ───────► ``` ### After T007, US3 tests can be written while US1 XML build (T008) proceeds: ``` T007 complete ├── T008 (US1 XML build + response) ──────► ├── T011b (US3 unit tests) ────────────────► both done → T013 └── T012 (US3 contract tests) ────────────► ``` ### Final polish tasks are independent of each other: ``` T014 (compliance check) ──────► T015 (live API check) ────────► T016 (npm test) ``` --- ## Implementation Strategy ### MVP (User Story 1 only — Phases 1–3) Completing T001–T008 delivers the entire core value: - `GET /sitemap.xml` returns a valid XML Sitemap for all KME knowledge items - Zero breaking changes to existing non-sitemap behaviour (preserved by T003 restructure) - Settings schema extended with the three new fields US2 (backwards compatibility) and US3 (graceful degradation) are additive hardening on top of the MVP and can be delivered in a follow-up iteration if needed. ### Incremental delivery order 1. **Iteration 1** (MVP): T001 → T002 → T003 → T004 + T005 → T006 → T007 → T008 2. **Iteration 2** (Hardening): T009 + T010 → T011a → T011b + T012 → T013 3. **Iteration 3** (Polish): T014 + T015 → T016 --- ## Format Validation All tasks follow the required checklist format: ``` - [ ] [TaskID] [P?] [Story?] Description with file path ``` | Check | Result | |---|---| | All tasks start with `- [ ]` checkbox | ✅ | | All tasks have a sequential ID (T001–T016) | ✅ | | `[P]` only on tasks modifying different files with no unmet dependencies | ✅ | | `[US1]`/`[US2]`/`[US3]` labels only on user-story phase tasks | ✅ | | Setup/Foundational/Polish tasks have no story label | ✅ | | All tasks name at least one explicit file path | ✅ |