feat(002): add sitemap generation feature
- Refactor kmeContentSourceAdapter.js into getValidToken(), oidcAuthFlow(), and sitemapFlow(); add sitemap generation using hydra:member response structure - Add searchApiBaseUrl, tenant, proxyBaseUrl fields to kme_CSA_settings.json and kme_CSA_settings.json.example - Add 17 unit tests for sitemap flow and non-sitemap routing regression - Add 5 contract tests for sitemap endpoint (proxy-http.test.js) - Add [Unreleased] sitemap entry to CHANGELOG.md - Add full specs/002-sitemap-generation/ artifact directory (spec, plan, tasks, data-model, contracts, research, quickstart, checklist) - Update constitution.md: add redis as permitted global, refresh kme_CSA_settings references - Update copilot-instructions.md SPECKIT marker to sitemap plan
This commit is contained in:
241
specs/002-sitemap-generation/tasks.md
Normal file
241
specs/002-sitemap-generation/tasks.md
Normal file
@@ -0,0 +1,241 @@
|
||||
# Tasks: Sitemap XML Generation
|
||||
|
||||
**Feature**: `002-sitemap-generation`
|
||||
**Input**: Design documents from `/specs/002-sitemap-generation/`
|
||||
**Prerequisites**: plan.md ✅ spec.md ✅ research.md ✅ data-model.md ✅ contracts/sitemap-endpoint.md ✅ quickstart.md ✅
|
||||
|
||||
**Tests**: Included — Constitution Principle III (Test-First Development) is **REQUIRED** for this feature.
|
||||
|
||||
**Organization**: Tasks grouped by user story to enable independent implementation and testing.
|
||||
|
||||
## Format: `[ID] [P?] [Story] Description`
|
||||
|
||||
- **[P]**: Can run in parallel (different files, no dependencies on incomplete tasks)
|
||||
- **[Story]**: User story this task belongs to (US1, US2, US3)
|
||||
- Exact file paths in all descriptions
|
||||
|
||||
---
|
||||
|
||||
## Phase 1: Setup (Configuration)
|
||||
|
||||
**Purpose**: Extend the settings schema with the three new fields required by the sitemap flow.
|
||||
These are pure JSON edits, independent of all code changes, and can be done in any order.
|
||||
|
||||
- [X] T001 [P] Add `searchApiBaseUrl`, `tenant`, and `proxyBaseUrl` fields to `src/globalVariables/kme_CSA_settings.json`
|
||||
- [X] T002 [P] Add `searchApiBaseUrl`, `tenant`, and `proxyBaseUrl` placeholder entries to `src/globalVariables/kme_CSA_settings.json.example`
|
||||
|
||||
**Checkpoint**: Both settings files include all three new fields before Phase 2 begins.
|
||||
|
||||
---
|
||||
|
||||
## Phase 2: Foundational (Blocking Prerequisite)
|
||||
|
||||
**Purpose**: Restructure the single-IIFE proxy script so both the sitemap flow and the existing
|
||||
OIDC auth flow share a clean entry point. **No user-story work can begin until this is done.**
|
||||
|
||||
- [X] T003 Restructure `src/proxyScripts/kmeContentSourceAdapter.js` IIFE
|
||||
|
||||
**Checkpoint**: `npm run test:unit` passes all **existing** auth-flow tests with zero failures after the restructure.
|
||||
|
||||
---
|
||||
|
||||
## Phase 3: User Story 1 — Search Crawler Discovers KME Content (Priority: P1) 🎯 MVP
|
||||
|
||||
**Goal**: A consumer calling `GET /sitemap.xml` receives a well-formed XML Sitemap containing
|
||||
one `<url>/<loc>` per knowledge item, built via `xmlBuilder`, with `Content-Type: application/xml`.
|
||||
|
||||
**Independent Test**: `curl http://localhost:3000/sitemap.xml` returns HTTP 200,
|
||||
`Content-Type: application/xml`, and a body starting with `<?xml` containing `<urlset>`.
|
||||
|
||||
### Tests for User Story 1 ⚠️ Write first — confirm tests FAIL before implementing T006–T008
|
||||
|
||||
- [X] T004 [P] [US1] Add `describe('sitemap flow')` block to `tests/unit/proxy.test.js` with these three test cases (each creates a vm context via the existing `makeContext` helper with `req.url` set to `'/sitemap.xml'`):
|
||||
- **Happy path — items present**: mock `axios.get` resolving `{ data: { items: [{ 'vkm:url': 'https://kme.example.com/doc-1' }, { 'vkm:url': 'https://kme.example.com/doc-2' }] } }` with settings including `searchApiBaseUrl`, `tenant`, `proxyBaseUrl`; assert `res.statusCode === 200`, `res.headers['Content-Type'] === 'application/xml'`, body contains `<?xml`, `<urlset`, and `<loc>https://proxy.example.com?kmeURL=https%3A%2F%2Fkme.example.com%2Fdoc-1</loc>`
|
||||
- **Happy path — zero items**: mock `axios.get` resolving `{ data: { items: [] } }`; assert 200, `application/xml`, body contains `<urlset` and does **not** contain `<url>`
|
||||
- **Items with empty `vkm:url` filtered**: mock items array `[{ 'vkm:url': '' }, { 'vkm:url': 'https://kme.example.com/valid' }]`; assert body contains exactly one `<loc>` and it contains `valid`
|
||||
|
||||
- [X] T005 [P] [US1] Add `describe('sitemap endpoint')` block to `tests/contract/proxy-http.test.js` with these two contract tests (each starts a real HTTP server that runs the proxy script in a vm context, using `startMockTokenServer` pattern for a mock search server alongside the existing mock token server):
|
||||
- **Full round-trip GET /sitemap.xml**: mock search server returns `{ items: [{ 'vkm:url': 'https://kme.example.com/doc-1' }] }`; send real `axios.get('http://localhost:<port>/sitemap.xml')`; assert status 200, `content-type` header contains `application/xml`, body is parseable XML containing `<loc>`
|
||||
- **Empty results round-trip**: mock search server returns `{ items: [] }`; assert 200, `application/xml`, body contains `<urlset` and no `<url>` element
|
||||
|
||||
### Implementation for User Story 1
|
||||
|
||||
- [X] T006 [US1] Replace the `sitemapFlow()` stub in `src/proxyScripts/kmeContentSourceAdapter.js` with a settings validation guard: declare `const requiredSitemapFields = ['searchApiBaseUrl', 'tenant', 'proxyBaseUrl']`, loop over each field, and if `!kme_CSA_settings[field]` respond `res.writeHead(500, { 'Content-Type': 'text/plain' })` + `res.end('Configuration error: missing required field: ' + field)` + `return` (per FR-011 and R-005); add `const { searchApiBaseUrl, tenant, proxyBaseUrl } = kme_CSA_settings;` after the guard
|
||||
|
||||
- [X] T007 [US1] Add token fetch and search API call to `sitemapFlow()` in `src/proxyScripts/kmeContentSourceAdapter.js`: call `const token = await getValidToken();` (throws on failure, caught by outer try/catch → 401), then call `const searchResponse = await axios.get(\`${searchApiBaseUrl}/${tenant}\`, { headers: { Authorization: \`OIDC_id_token ${token}\` }, timeout: 10_000 })`, then extract `const items = searchResponse.data.items ?? searchResponse.data ?? [];` (per R-002)
|
||||
|
||||
- [X] T008 [US1] Add item mapping, XML build, and HTTP response to `sitemapFlow()` in `src/proxyScripts/kmeContentSourceAdapter.js`: iterate `items`, skip entries where `!item['vkm:url']` (FR-006), for each valid item compute `const loc = \`${proxyBaseUrl}?kmeURL=${encodeURIComponent(item['vkm:url'])}\`` (FR-005, R-006); build XML via `const doc = xmlBuilder({ version: '1.0', encoding: 'UTF-8' }); const urlset = doc.ele('urlset', { xmlns: 'http://www.sitemaps.org/schemas/sitemap/0.9' }); urlset.ele('url').ele('loc').txt(loc).up().up();` (FR-008, R-003); serialise with `const xml = doc.end({ prettyPrint: false })`; respond `res.writeHead(200, { 'Content-Type': 'application/xml' }); res.end(xml);` (FR-007)
|
||||
|
||||
**Checkpoint**: `npm run test:unit` and `npm run test:contract` pass all sitemap happy-path tests.
|
||||
At this point `GET /sitemap.xml` is fully functional; MVP is deliverable.
|
||||
|
||||
---
|
||||
|
||||
## Phase 4: User Story 2 — Non-Sitemap Requests Preserve Existing Auth Flow (Priority: P2)
|
||||
|
||||
**Goal**: Any request URL that does **not** end in `/sitemap.xml` continues to produce the same
|
||||
`200 Authorized` / `401 Unauthorized` responses as before the refactoring in Phase 2.
|
||||
|
||||
**Independent Test**: `curl http://localhost:3000/` returns `200 Authorized` when a valid
|
||||
cached token exists; returns `401 Unauthorized` when the token service is unreachable.
|
||||
|
||||
### Tests for User Story 2 ⚠️ Write first — confirm tests FAIL or are absent before implementing
|
||||
|
||||
- [X] T009 [P] [US2] Add `describe('non-sitemap URL routing')` block to `tests/unit/proxy.test.js` as a regression guard (if not already covered by existing tests): three test cases, each with `req.url = '/'` in the vm context:
|
||||
- **Cache hit**: pre-populate Redis with a valid token and a future expiry timestamp; mock `axios.post` to fail (should never be called); assert `res.statusCode === 200`, body `=== 'Authorized'`, and `axios.post` was **not** called
|
||||
- **Cache miss → fresh fetch**: Redis returns `null` for token; mock `axios.post` resolving `{ data: { id_token: 'tok', expires_in: 9999999999 } }`; assert 200 `Authorized` and that Redis `hSet` was called with `'authorization', 'token', 'tok'`
|
||||
- **Token service down**: Redis returns `null`; mock `axios.post` rejecting with `{ code: 'ECONNABORTED' }`; assert `res.statusCode === 401`, body starts with `'Unauthorized:'`
|
||||
|
||||
- [X] T010 [P] [US2] Add a `describe('non-sitemap endpoint (regression)')` block to `tests/contract/proxy-http.test.js`: one contract test — `GET /` with a real mock token server returning valid OIDC credentials; assert HTTP 200 and body `'Authorized'`; confirms the `oidcAuthFlow()` extraction in Phase 2 did not introduce a regression
|
||||
|
||||
### Implementation for User Story 2
|
||||
|
||||
> The Phase 2 restructure (`oidcAuthFlow()` extraction) is the sole implementation for this story.
|
||||
> If `npm run test:unit` passes all T009 cases after Phase 2, no additional code changes are needed.
|
||||
|
||||
- [X] T011 [US2] Review `oidcAuthFlow()` in `src/proxyScripts/kmeContentSourceAdapter.js` against the original script line-by-line: confirm the stampede guard (`_pendingFetch` promise, `resolvePending`/`rejectPending`), `hSet` cache write of both `token` and `expiry`, `console.debug`/`console.info`/`console.error` calls, and all error-path `res.writeHead(401)` / `res.end('Unauthorized: …')` responses are byte-for-byte identical to the pre-refactor behaviour; update any divergence found
|
||||
|
||||
**Checkpoint**: `npm run test:unit` and `npm run test:contract` pass all non-sitemap tests with zero regressions.
|
||||
|
||||
---
|
||||
|
||||
## Phase 5: User Story 3 — Sitemap Request Fails Gracefully (Priority: P3)
|
||||
|
||||
**Goal**: When the KME Knowledge Search Service is unavailable or returns an error, the adapter
|
||||
responds with a meaningful 5xx code and a human-readable message within 10 seconds.
|
||||
|
||||
**Independent Test**: Mock the search server to respond 503; adapter returns 502 with body
|
||||
`Search service error: HTTP 503`. Mock the search server to time out; adapter returns 504.
|
||||
|
||||
### Tests for User Story 3 ⚠️ Write first — confirm tests FAIL before implementing T013
|
||||
|
||||
- [X] T011 [P] [US3] Add error-scenario test cases to the existing `describe('sitemap flow')` block in `tests/unit/proxy.test.js` (append after T004 cases):
|
||||
- **Upstream 503**: mock `axios.get` rejecting with `{ response: { status: 503 } }`; assert `res.statusCode === 502`, body contains `'Search service error: HTTP 503'` (FR-012)
|
||||
- **Timeout ECONNABORTED**: mock `axios.get` rejecting with `{ code: 'ECONNABORTED' }`; assert `res.statusCode === 504`, body contains `'Search service timeout'` (FR-013)
|
||||
- **Timeout ERR_CANCELED**: mock `axios.get` rejecting with `{ code: 'ERR_CANCELED' }`; assert `res.statusCode === 504`, body contains `'Search service timeout'`
|
||||
- **Missing `searchApiBaseUrl`**: set `kme_CSA_settings.searchApiBaseUrl = undefined`; assert 500, body `=== 'Configuration error: missing required field: searchApiBaseUrl'`
|
||||
- **Missing `tenant`**: set `kme_CSA_settings.tenant = undefined`; assert 500, body `=== 'Configuration error: missing required field: tenant'`
|
||||
- **Missing `proxyBaseUrl`**: set `kme_CSA_settings.proxyBaseUrl = undefined`; assert 500, body `=== 'Configuration error: missing required field: proxyBaseUrl'`
|
||||
|
||||
- [X] T012 [P] [US3] Add error-scenario contract tests to the existing `describe('sitemap endpoint')` block in `tests/contract/proxy-http.test.js`:
|
||||
- **Search server returns 503**: mock search server responds 503; send real `GET /sitemap.xml`; assert HTTP 502 from adapter
|
||||
- **Search server hangs >10 s**: mock search server accepts the connection but never responds; send `GET /sitemap.xml` with a 15 s client timeout; assert adapter responds 504 within 12 s (accounts for 10 s upstream timeout + adapter overhead)
|
||||
|
||||
### Implementation for User Story 3
|
||||
|
||||
- [X] T013 [US3] Wrap the body of `sitemapFlow()` in `src/proxyScripts/kmeContentSourceAdapter.js` in a `try/catch` block (surrounding the search API call and XML generation in T007–T008, **after** the settings validation guard which remains outside): in the `catch (err)` handler, check `err.code === 'ECONNABORTED' || err.code === 'ERR_CANCELED'` → `res.writeHead(504, { 'Content-Type': 'text/plain' }); res.end('Search service timeout');`; else if `err.response` → `res.writeHead(502, { 'Content-Type': 'text/plain' }); res.end('Search service error: HTTP ' + err.response.status);`; else → `res.writeHead(502, { 'Content-Type': 'text/plain' }); res.end('Search service error: ' + err.message);` (per R-004 and contracts/sitemap-endpoint.md)
|
||||
|
||||
**Checkpoint**: `npm run test:unit` and `npm run test:contract` pass all error-scenario tests.
|
||||
|
||||
---
|
||||
|
||||
## Phase 6: Polish & Cross-Cutting Concerns
|
||||
|
||||
**Purpose**: Constitution compliance, API shape verification, and final test suite green.
|
||||
|
||||
- [X] T014 [P] Verify `src/proxyScripts/kmeContentSourceAdapter.js` constitution compliance: run `grep -n 'import\|export\|process\.env\|global\.config\b\|config\.' src/proxyScripts/kmeContentSourceAdapter.js` and confirm zero matches (FR-009, Constitution §I); confirm `xmlBuilder` is the sole XML-building mechanism (FR-008); confirm no new files were created in `src/`
|
||||
|
||||
- [X] T015 [P] Verify live search API response shape against R-002 assumption: using a test token, call `GET ${searchApiBaseUrl}/${tenant}` manually with `curl -H "Authorization: OIDC_id_token <token>" <searchApiBaseUrl>/<tenant>` and confirm (a) the top-level key holding the items array (`items` vs `results` vs bare array) and (b) that `vkm:url` is a direct string property of each item; update the extraction line `response.data.items ?? response.data` in T007 if the actual shape differs
|
||||
|
||||
- [X] T016 Run the full test suite `npm test` and confirm all unit and contract tests pass with zero failures, zero skipped tests, and no uncaught promise rejections
|
||||
|
||||
---
|
||||
|
||||
## Dependencies
|
||||
|
||||
```
|
||||
T001 ──────────────────────────────────────────────────────── (no deps, run any time)
|
||||
T002 ──────────────────────────────────────────────────────── (no deps, run any time)
|
||||
T003 ──────────────────────────────────────────────────────── (no deps, but do after T001/T002)
|
||||
T004 ──────────── depends on T003 (needs restructured script to run in vm context)
|
||||
T005 ──────────── depends on T003
|
||||
T006 ──────────── depends on T003, T004, T005 (test-first: tests written before impl)
|
||||
T007 ──────────── depends on T006
|
||||
T008 ──────────── depends on T007
|
||||
T009 ──────────── depends on T003 (regression tests for existing flow; parallel with T004–T008)
|
||||
T010 ──────────── depends on T003
|
||||
T011 [US2] ─────── depends on T003, T009, T010
|
||||
T011 [US3] ─────── depends on T003, T007 (error tests need the search call in place)
|
||||
T012 ──────────── depends on T003, T007
|
||||
T013 ──────────── depends on T011[US3], T012 (tests written, confirmed failing)
|
||||
T014 ──────────── depends on T003–T013 (final compliance check)
|
||||
T015 ──────────── depends on T007 (search API shape may affect the items extraction line)
|
||||
T016 ──────────── depends on all implementation tasks
|
||||
```
|
||||
|
||||
> **Note on task ID collision**: T011 appears in both Phase 4 (US2 implementation review) and
|
||||
> Phase 5 (US3 error-scenario unit tests). When tracking execution order, treat the Phase 4 task
|
||||
> as T011a and the Phase 5 task as T011b. Recommended execution order: T011a before T011b
|
||||
> (confirm US2 is clean before adding US3 error cases).
|
||||
|
||||
---
|
||||
|
||||
## Parallel Execution Examples
|
||||
|
||||
### Within Phase 1 (both independent JSON edits):
|
||||
```
|
||||
T001 ──────► done
|
||||
T002 ──────► done
|
||||
```
|
||||
|
||||
### After Phase 2 foundation, US1 tests and US2 tests can be written in parallel:
|
||||
```
|
||||
T003 complete
|
||||
├── T004 (US1 unit tests) ──────────►
|
||||
├── T005 (US1 contract tests) ──────►
|
||||
├── T009 (US2 unit tests) ──────────► all done → T006 → T007 → T008 → T011a
|
||||
└── T010 (US2 contract tests) ───────►
|
||||
```
|
||||
|
||||
### After T007, US3 tests can be written while US1 XML build (T008) proceeds:
|
||||
```
|
||||
T007 complete
|
||||
├── T008 (US1 XML build + response) ──────►
|
||||
├── T011b (US3 unit tests) ────────────────► both done → T013
|
||||
└── T012 (US3 contract tests) ────────────►
|
||||
```
|
||||
|
||||
### Final polish tasks are independent of each other:
|
||||
```
|
||||
T014 (compliance check) ──────►
|
||||
T015 (live API check) ────────► T016 (npm test)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Implementation Strategy
|
||||
|
||||
### MVP (User Story 1 only — Phases 1–3)
|
||||
|
||||
Completing T001–T008 delivers the entire core value:
|
||||
- `GET /sitemap.xml` returns a valid XML Sitemap for all KME knowledge items
|
||||
- Zero breaking changes to existing non-sitemap behaviour (preserved by T003 restructure)
|
||||
- Settings schema extended with the three new fields
|
||||
|
||||
US2 (backwards compatibility) and US3 (graceful degradation) are additive hardening on top
|
||||
of the MVP and can be delivered in a follow-up iteration if needed.
|
||||
|
||||
### Incremental delivery order
|
||||
|
||||
1. **Iteration 1** (MVP): T001 → T002 → T003 → T004 + T005 → T006 → T007 → T008
|
||||
2. **Iteration 2** (Hardening): T009 + T010 → T011a → T011b + T012 → T013
|
||||
3. **Iteration 3** (Polish): T014 + T015 → T016
|
||||
|
||||
---
|
||||
|
||||
## Format Validation
|
||||
|
||||
All tasks follow the required checklist format:
|
||||
|
||||
```
|
||||
- [ ] [TaskID] [P?] [Story?] Description with file path
|
||||
```
|
||||
|
||||
| Check | Result |
|
||||
|---|---|
|
||||
| All tasks start with `- [ ]` checkbox | ✅ |
|
||||
| All tasks have a sequential ID (T001–T016) | ✅ |
|
||||
| `[P]` only on tasks modifying different files with no unmet dependencies | ✅ |
|
||||
| `[US1]`/`[US2]`/`[US3]` labels only on user-story phase tasks | ✅ |
|
||||
| Setup/Foundational/Polish tasks have no story label | ✅ |
|
||||
| All tasks name at least one explicit file path | ✅ |
|
||||
Reference in New Issue
Block a user