Files
kme_content_adapter/specs/002-sitemap-generation/tasks.md
Peter.Morton 50b87297d2 feat(002): add sitemap generation feature
- Refactor kmeContentSourceAdapter.js into getValidToken(), oidcAuthFlow(),
  and sitemapFlow(); add sitemap generation using hydra:member response structure
- Add searchApiBaseUrl, tenant, proxyBaseUrl fields to kme_CSA_settings.json
  and kme_CSA_settings.json.example
- Add 17 unit tests for sitemap flow and non-sitemap routing regression
- Add 5 contract tests for sitemap endpoint (proxy-http.test.js)
- Add [Unreleased] sitemap entry to CHANGELOG.md
- Add full specs/002-sitemap-generation/ artifact directory
  (spec, plan, tasks, data-model, contracts, research, quickstart, checklist)
- Update constitution.md: add redis as permitted global, refresh
  kme_CSA_settings references
- Update copilot-instructions.md SPECKIT marker to sitemap plan
2026-04-22 22:08:08 -05:00

17 KiB
Raw Blame History

Tasks: Sitemap XML Generation

Feature: 002-sitemap-generation Input: Design documents from /specs/002-sitemap-generation/ Prerequisites: plan.md spec.md research.md data-model.md contracts/sitemap-endpoint.md quickstart.md

Tests: Included — Constitution Principle III (Test-First Development) is REQUIRED for this feature.

Organization: Tasks grouped by user story to enable independent implementation and testing.

Format: [ID] [P?] [Story] Description

  • [P]: Can run in parallel (different files, no dependencies on incomplete tasks)
  • [Story]: User story this task belongs to (US1, US2, US3)
  • Exact file paths in all descriptions

Phase 1: Setup (Configuration)

Purpose: Extend the settings schema with the three new fields required by the sitemap flow. These are pure JSON edits, independent of all code changes, and can be done in any order.

  • T001 [P] Add searchApiBaseUrl, tenant, and proxyBaseUrl fields to src/globalVariables/kme_CSA_settings.json
  • T002 [P] Add searchApiBaseUrl, tenant, and proxyBaseUrl placeholder entries to src/globalVariables/kme_CSA_settings.json.example

Checkpoint: Both settings files include all three new fields before Phase 2 begins.


Phase 2: Foundational (Blocking Prerequisite)

Purpose: Restructure the single-IIFE proxy script so both the sitemap flow and the existing OIDC auth flow share a clean entry point. No user-story work can begin until this is done.

  • T003 Restructure src/proxyScripts/kmeContentSourceAdapter.js IIFE

Checkpoint: npm run test:unit passes all existing auth-flow tests with zero failures after the restructure.


Phase 3: User Story 1 — Search Crawler Discovers KME Content (Priority: P1) 🎯 MVP

Goal: A consumer calling GET /sitemap.xml receives a well-formed XML Sitemap containing one <url>/<loc> per knowledge item, built via xmlBuilder, with Content-Type: application/xml.

Independent Test: curl http://localhost:3000/sitemap.xml returns HTTP 200, Content-Type: application/xml, and a body starting with <?xml containing <urlset>.

Tests for User Story 1 ⚠️ Write first — confirm tests FAIL before implementing T006T008

  • T004 [P] [US1] Add describe('sitemap flow') block to tests/unit/proxy.test.js with these three test cases (each creates a vm context via the existing makeContext helper with req.url set to '/sitemap.xml'):

    • Happy path — items present: mock axios.get resolving { data: { items: [{ 'vkm:url': 'https://kme.example.com/doc-1' }, { 'vkm:url': 'https://kme.example.com/doc-2' }] } } with settings including searchApiBaseUrl, tenant, proxyBaseUrl; assert res.statusCode === 200, res.headers['Content-Type'] === 'application/xml', body contains <?xml, <urlset, and <loc>https://proxy.example.com?kmeURL=https%3A%2F%2Fkme.example.com%2Fdoc-1</loc>
    • Happy path — zero items: mock axios.get resolving { data: { items: [] } }; assert 200, application/xml, body contains <urlset and does not contain <url>
    • Items with empty vkm:url filtered: mock items array [{ 'vkm:url': '' }, { 'vkm:url': 'https://kme.example.com/valid' }]; assert body contains exactly one <loc> and it contains valid
  • T005 [P] [US1] Add describe('sitemap endpoint') block to tests/contract/proxy-http.test.js with these two contract tests (each starts a real HTTP server that runs the proxy script in a vm context, using startMockTokenServer pattern for a mock search server alongside the existing mock token server):

    • Full round-trip GET /sitemap.xml: mock search server returns { items: [{ 'vkm:url': 'https://kme.example.com/doc-1' }] }; send real axios.get('http://localhost:<port>/sitemap.xml'); assert status 200, content-type header contains application/xml, body is parseable XML containing <loc>
    • Empty results round-trip: mock search server returns { items: [] }; assert 200, application/xml, body contains <urlset and no <url> element

Implementation for User Story 1

  • T006 [US1] Replace the sitemapFlow() stub in src/proxyScripts/kmeContentSourceAdapter.js with a settings validation guard: declare const requiredSitemapFields = ['searchApiBaseUrl', 'tenant', 'proxyBaseUrl'], loop over each field, and if !kme_CSA_settings[field] respond res.writeHead(500, { 'Content-Type': 'text/plain' }) + res.end('Configuration error: missing required field: ' + field) + return (per FR-011 and R-005); add const { searchApiBaseUrl, tenant, proxyBaseUrl } = kme_CSA_settings; after the guard

  • T007 [US1] Add token fetch and search API call to sitemapFlow() in src/proxyScripts/kmeContentSourceAdapter.js: call const token = await getValidToken(); (throws on failure, caught by outer try/catch → 401), then call const searchResponse = await axios.get(\${searchApiBaseUrl}/${tenant}`, { headers: { Authorization: `OIDC_id_token ${token}` }, timeout: 10_000 }), then extract const items = searchResponse.data.items ?? searchResponse.data ?? [];` (per R-002)

  • T008 [US1] Add item mapping, XML build, and HTTP response to sitemapFlow() in src/proxyScripts/kmeContentSourceAdapter.js: iterate items, skip entries where !item['vkm:url'] (FR-006), for each valid item compute const loc = \${proxyBaseUrl}?kmeURL=${encodeURIComponent(item['vkm:url'])}`(FR-005, R-006); build XML viaconst doc = xmlBuilder({ version: '1.0', encoding: 'UTF-8' }); const urlset = doc.ele('urlset', { xmlns: 'http://www.sitemaps.org/schemas/sitemap/0.9' }); urlset.ele('url').ele('loc').txt(loc).up().up();(FR-008, R-003); serialise withconst xml = doc.end({ prettyPrint: false }); respond res.writeHead(200, { 'Content-Type': 'application/xml' }); res.end(xml);` (FR-007)

Checkpoint: npm run test:unit and npm run test:contract pass all sitemap happy-path tests. At this point GET /sitemap.xml is fully functional; MVP is deliverable.


Phase 4: User Story 2 — Non-Sitemap Requests Preserve Existing Auth Flow (Priority: P2)

Goal: Any request URL that does not end in /sitemap.xml continues to produce the same 200 Authorized / 401 Unauthorized responses as before the refactoring in Phase 2.

Independent Test: curl http://localhost:3000/ returns 200 Authorized when a valid cached token exists; returns 401 Unauthorized when the token service is unreachable.

Tests for User Story 2 ⚠️ Write first — confirm tests FAIL or are absent before implementing

  • T009 [P] [US2] Add describe('non-sitemap URL routing') block to tests/unit/proxy.test.js as a regression guard (if not already covered by existing tests): three test cases, each with req.url = '/' in the vm context:

    • Cache hit: pre-populate Redis with a valid token and a future expiry timestamp; mock axios.post to fail (should never be called); assert res.statusCode === 200, body === 'Authorized', and axios.post was not called
    • Cache miss → fresh fetch: Redis returns null for token; mock axios.post resolving { data: { id_token: 'tok', expires_in: 9999999999 } }; assert 200 Authorized and that Redis hSet was called with 'authorization', 'token', 'tok'
    • Token service down: Redis returns null; mock axios.post rejecting with { code: 'ECONNABORTED' }; assert res.statusCode === 401, body starts with 'Unauthorized:'
  • T010 [P] [US2] Add a describe('non-sitemap endpoint (regression)') block to tests/contract/proxy-http.test.js: one contract test — GET / with a real mock token server returning valid OIDC credentials; assert HTTP 200 and body 'Authorized'; confirms the oidcAuthFlow() extraction in Phase 2 did not introduce a regression

Implementation for User Story 2

The Phase 2 restructure (oidcAuthFlow() extraction) is the sole implementation for this story. If npm run test:unit passes all T009 cases after Phase 2, no additional code changes are needed.

  • T011 [US2] Review oidcAuthFlow() in src/proxyScripts/kmeContentSourceAdapter.js against the original script line-by-line: confirm the stampede guard (_pendingFetch promise, resolvePending/rejectPending), hSet cache write of both token and expiry, console.debug/console.info/console.error calls, and all error-path res.writeHead(401) / res.end('Unauthorized: …') responses are byte-for-byte identical to the pre-refactor behaviour; update any divergence found

Checkpoint: npm run test:unit and npm run test:contract pass all non-sitemap tests with zero regressions.


Phase 5: User Story 3 — Sitemap Request Fails Gracefully (Priority: P3)

Goal: When the KME Knowledge Search Service is unavailable or returns an error, the adapter responds with a meaningful 5xx code and a human-readable message within 10 seconds.

Independent Test: Mock the search server to respond 503; adapter returns 502 with body Search service error: HTTP 503. Mock the search server to time out; adapter returns 504.

Tests for User Story 3 ⚠️ Write first — confirm tests FAIL before implementing T013

  • T011 [P] [US3] Add error-scenario test cases to the existing describe('sitemap flow') block in tests/unit/proxy.test.js (append after T004 cases):

    • Upstream 503: mock axios.get rejecting with { response: { status: 503 } }; assert res.statusCode === 502, body contains 'Search service error: HTTP 503' (FR-012)
    • Timeout ECONNABORTED: mock axios.get rejecting with { code: 'ECONNABORTED' }; assert res.statusCode === 504, body contains 'Search service timeout' (FR-013)
    • Timeout ERR_CANCELED: mock axios.get rejecting with { code: 'ERR_CANCELED' }; assert res.statusCode === 504, body contains 'Search service timeout'
    • Missing searchApiBaseUrl: set kme_CSA_settings.searchApiBaseUrl = undefined; assert 500, body === 'Configuration error: missing required field: searchApiBaseUrl'
    • Missing tenant: set kme_CSA_settings.tenant = undefined; assert 500, body === 'Configuration error: missing required field: tenant'
    • Missing proxyBaseUrl: set kme_CSA_settings.proxyBaseUrl = undefined; assert 500, body === 'Configuration error: missing required field: proxyBaseUrl'
  • T012 [P] [US3] Add error-scenario contract tests to the existing describe('sitemap endpoint') block in tests/contract/proxy-http.test.js:

    • Search server returns 503: mock search server responds 503; send real GET /sitemap.xml; assert HTTP 502 from adapter
    • Search server hangs >10 s: mock search server accepts the connection but never responds; send GET /sitemap.xml with a 15 s client timeout; assert adapter responds 504 within 12 s (accounts for 10 s upstream timeout + adapter overhead)

Implementation for User Story 3

  • T013 [US3] Wrap the body of sitemapFlow() in src/proxyScripts/kmeContentSourceAdapter.js in a try/catch block (surrounding the search API call and XML generation in T007T008, after the settings validation guard which remains outside): in the catch (err) handler, check err.code === 'ECONNABORTED' || err.code === 'ERR_CANCELED'res.writeHead(504, { 'Content-Type': 'text/plain' }); res.end('Search service timeout');; else if err.responseres.writeHead(502, { 'Content-Type': 'text/plain' }); res.end('Search service error: HTTP ' + err.response.status);; else → res.writeHead(502, { 'Content-Type': 'text/plain' }); res.end('Search service error: ' + err.message); (per R-004 and contracts/sitemap-endpoint.md)

Checkpoint: npm run test:unit and npm run test:contract pass all error-scenario tests.


Phase 6: Polish & Cross-Cutting Concerns

Purpose: Constitution compliance, API shape verification, and final test suite green.

  • T014 [P] Verify src/proxyScripts/kmeContentSourceAdapter.js constitution compliance: run grep -n 'import\|export\|process\.env\|global\.config\b\|config\.' src/proxyScripts/kmeContentSourceAdapter.js and confirm zero matches (FR-009, Constitution §I); confirm xmlBuilder is the sole XML-building mechanism (FR-008); confirm no new files were created in src/

  • T015 [P] Verify live search API response shape against R-002 assumption: using a test token, call GET ${searchApiBaseUrl}/${tenant} manually with curl -H "Authorization: OIDC_id_token <token>" <searchApiBaseUrl>/<tenant> and confirm (a) the top-level key holding the items array (items vs results vs bare array) and (b) that vkm:url is a direct string property of each item; update the extraction line response.data.items ?? response.data in T007 if the actual shape differs

  • T016 Run the full test suite npm test and confirm all unit and contract tests pass with zero failures, zero skipped tests, and no uncaught promise rejections


Dependencies

T001 ──────────────────────────────────────────────────────── (no deps, run any time)
T002 ──────────────────────────────────────────────────────── (no deps, run any time)
T003 ──────────────────────────────────────────────────────── (no deps, but do after T001/T002)
T004 ──────────── depends on T003 (needs restructured script to run in vm context)
T005 ──────────── depends on T003
T006 ──────────── depends on T003, T004, T005 (test-first: tests written before impl)
T007 ──────────── depends on T006
T008 ──────────── depends on T007
T009 ──────────── depends on T003 (regression tests for existing flow; parallel with T004T008)
T010 ──────────── depends on T003
T011 [US2] ─────── depends on T003, T009, T010
T011 [US3] ─────── depends on T003, T007 (error tests need the search call in place)
T012 ──────────── depends on T003, T007
T013 ──────────── depends on T011[US3], T012 (tests written, confirmed failing)
T014 ──────────── depends on T003T013 (final compliance check)
T015 ──────────── depends on T007 (search API shape may affect the items extraction line)
T016 ──────────── depends on all implementation tasks

Note on task ID collision: T011 appears in both Phase 4 (US2 implementation review) and Phase 5 (US3 error-scenario unit tests). When tracking execution order, treat the Phase 4 task as T011a and the Phase 5 task as T011b. Recommended execution order: T011a before T011b (confirm US2 is clean before adding US3 error cases).


Parallel Execution Examples

Within Phase 1 (both independent JSON edits):

T001  ──────►  done
T002  ──────►  done

After Phase 2 foundation, US1 tests and US2 tests can be written in parallel:

T003 complete
├── T004  (US1 unit tests)  ──────────►
├── T005  (US1 contract tests)  ──────►
├── T009  (US2 unit tests)  ──────────►  all done → T006 → T007 → T008 → T011a
└── T010  (US2 contract tests)  ───────►

After T007, US3 tests can be written while US1 XML build (T008) proceeds:

T007 complete
├── T008  (US1 XML build + response)  ──────►
├── T011b (US3 unit tests)  ────────────────►  both done → T013
└── T012  (US3 contract tests)  ────────────►

Final polish tasks are independent of each other:

T014  (compliance check)  ──────►
T015  (live API check)  ────────►  T016 (npm test)

Implementation Strategy

MVP (User Story 1 only — Phases 13)

Completing T001T008 delivers the entire core value:

  • GET /sitemap.xml returns a valid XML Sitemap for all KME knowledge items
  • Zero breaking changes to existing non-sitemap behaviour (preserved by T003 restructure)
  • Settings schema extended with the three new fields

US2 (backwards compatibility) and US3 (graceful degradation) are additive hardening on top of the MVP and can be delivered in a follow-up iteration if needed.

Incremental delivery order

  1. Iteration 1 (MVP): T001 → T002 → T003 → T004 + T005 → T006 → T007 → T008
  2. Iteration 2 (Hardening): T009 + T010 → T011a → T011b + T012 → T013
  3. Iteration 3 (Polish): T014 + T015 → T016

Format Validation

All tasks follow the required checklist format:

- [ ] [TaskID] [P?] [Story?] Description with file path
Check Result
All tasks start with - [ ] checkbox
All tasks have a sequential ID (T001T016)
[P] only on tasks modifying different files with no unmet dependencies
[US1]/[US2]/[US3] labels only on user-story phase tasks
Setup/Foundational/Polish tasks have no story label
All tasks name at least one explicit file path