- Add contentFetchFlow() to proxy (FR-001 through FR-012) - Add extractArticleBody() helper with vkm:articleBody / articleBody fallback - Dynamic proxyBaseUrl derivation from x-forwarded-proto/host headers - Forward query/size/category params on /sitemap.xml requests - Add Accept: application/ld+json header to content API calls - Remove oidcAuthFlow() - unmatched requests now return 404 Not Found - Fix xmlbuilder2 import: default import, call as xmlbuilder2.create(...) - Version bump 0.2.0 → 0.3.0 - 45/45 tests passing Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
17 KiB
Tasks: Sitemap XML Generation
Feature: 002-sitemap-generation
Input: Design documents from /specs/002-sitemap-generation/
Prerequisites: plan.md ✅ spec.md ✅ research.md ✅ data-model.md ✅ contracts/sitemap-endpoint.md ✅ quickstart.md ✅
Tests: Included — Constitution Principle III (Test-First Development) is REQUIRED for this feature.
Organization: Tasks grouped by user story to enable independent implementation and testing.
Format: [ID] [P?] [Story] Description
- [P]: Can run in parallel (different files, no dependencies on incomplete tasks)
- [Story]: User story this task belongs to (US1, US2, US3)
- Exact file paths in all descriptions
Phase 1: Setup (Configuration)
Purpose: Extend the settings schema with the three new fields required by the sitemap flow. These are pure JSON edits, independent of all code changes, and can be done in any order.
- T001 [P] Add
searchApiBaseUrl,tenant, andproxyBaseUrlfields tosrc/globalVariables/kme_CSA_settings.json - T002 [P] Add
searchApiBaseUrl,tenant, andproxyBaseUrlplaceholder entries tosrc/globalVariables/kme_CSA_settings.json.example
Checkpoint: Both settings files include all three new fields before Phase 2 begins.
Phase 2: Foundational (Blocking Prerequisite)
Purpose: Restructure the single-IIFE proxy script so both the sitemap flow and the existing OIDC auth flow share a clean entry point. No user-story work can begin until this is done.
- T003 Restructure
src/proxyScripts/kmeContentSourceAdapter.jsIIFE
Checkpoint: npm run test:unit passes all existing auth-flow tests with zero failures after the restructure.
Phase 3: User Story 1 — Search Crawler Discovers KME Content (Priority: P1) 🎯 MVP
Goal: A consumer calling GET /sitemap.xml receives a well-formed XML Sitemap containing
one <url>/<loc> per knowledge item, built via xmlbuilder2, with Content-Type: application/xml.
Independent Test: curl http://localhost:3000/sitemap.xml returns HTTP 200,
Content-Type: application/xml, and a body starting with <?xml containing <urlset>.
Tests for User Story 1 ⚠️ Write first — confirm tests FAIL before implementing T006–T008
-
T004 [P] [US1] Add
describe('sitemap flow')block totests/unit/proxy.test.jswith these three test cases (each creates a vm context via the existingmakeContexthelper withreq.urlset to'/sitemap.xml'):- Happy path — items present: mock
axios.getresolving{ data: { items: [{ 'vkm:url': 'https://kme.example.com/doc-1' }, { 'vkm:url': 'https://kme.example.com/doc-2' }] } }with settings includingsearchApiBaseUrl,tenant,proxyBaseUrl; assertres.statusCode === 200,res.headers['Content-Type'] === 'application/xml', body contains<?xml,<urlset, and<loc>https://proxy.example.com?kmeURL=https%3A%2F%2Fkme.example.com%2Fdoc-1</loc> - Happy path — zero items: mock
axios.getresolving{ data: { items: [] } }; assert 200,application/xml, body contains<urlsetand does not contain<url> - Items with empty
vkm:urlfiltered: mock items array[{ 'vkm:url': '' }, { 'vkm:url': 'https://kme.example.com/valid' }]; assert body contains exactly one<loc>and it containsvalid
- Happy path — items present: mock
-
T005 [P] [US1] Add
describe('sitemap endpoint')block totests/contract/proxy-http.test.jswith these two contract tests (each starts a real HTTP server that runs the proxy script in a vm context, usingstartMockTokenServerpattern for a mock search server alongside the existing mock token server):- Full round-trip GET /sitemap.xml: mock search server returns
{ items: [{ 'vkm:url': 'https://kme.example.com/doc-1' }] }; send realaxios.get('http://localhost:<port>/sitemap.xml'); assert status 200,content-typeheader containsapplication/xml, body is parseable XML containing<loc> - Empty results round-trip: mock search server returns
{ items: [] }; assert 200,application/xml, body contains<urlsetand no<url>element
- Full round-trip GET /sitemap.xml: mock search server returns
Implementation for User Story 1
-
T006 [US1] Replace the
sitemapFlow()stub insrc/proxyScripts/kmeContentSourceAdapter.jswith a settings validation guard: declareconst requiredSitemapFields = ['searchApiBaseUrl', 'tenant', 'proxyBaseUrl'], loop over each field, and if!kme_CSA_settings[field]respondres.writeHead(500, { 'Content-Type': 'text/plain' })+res.end('Configuration error: missing required field: ' + field)+return(per FR-011 and R-005); addconst { searchApiBaseUrl, tenant, proxyBaseUrl } = kme_CSA_settings;after the guard -
T007 [US1] Add token fetch and search API call to
sitemapFlow()insrc/proxyScripts/kmeContentSourceAdapter.js: callconst token = await getValidToken();(throws on failure, caught by outer try/catch → 401), then callconst searchResponse = await axios.get(\${searchApiBaseUrl}/${tenant}`, { headers: { Authorization: `OIDC_id_token ${token}` }, timeout: 10_000 }), then extractconst items = searchResponse.data.items ?? searchResponse.data ?? [];` (per R-002) -
T008 [US1] Add item mapping, XML build, and HTTP response to
sitemapFlow()insrc/proxyScripts/kmeContentSourceAdapter.js: iterateitems, skip entries where!item['vkm:url'](FR-006), for each valid item computeconst loc = \${proxyBaseUrl}?kmeURL=${encodeURIComponent(item['vkm:url'])}`(FR-005, R-006); build XML viaconst doc = xmlbuilder2({ version: '1.0', encoding: 'UTF-8' }); const urlset = doc.ele('urlset', { xmlns: 'http://www.sitemaps.org/schemas/sitemap/0.9' }); urlset.ele('url').ele('loc').txt(loc).up().up();(FR-008, R-003); serialise withconst xml = doc.end({ prettyPrint: false }); respondres.writeHead(200, { 'Content-Type': 'application/xml' }); res.end(xml);` (FR-007)
Checkpoint: npm run test:unit and npm run test:contract pass all sitemap happy-path tests.
At this point GET /sitemap.xml is fully functional; MVP is deliverable.
Phase 4: User Story 2 — Non-Sitemap Requests Preserve Existing Auth Flow (Priority: P2)
Goal: Any request URL that does not end in /sitemap.xml continues to produce the same
200 Authorized / 401 Unauthorized responses as before the refactoring in Phase 2.
Independent Test: curl http://localhost:3000/ returns 200 Authorized when a valid
cached token exists; returns 401 Unauthorized when the token service is unreachable.
Tests for User Story 2 ⚠️ Write first — confirm tests FAIL or are absent before implementing
-
T009 [P] [US2] Add
describe('non-sitemap URL routing')block totests/unit/proxy.test.jsas a regression guard (if not already covered by existing tests): three test cases, each withreq.url = '/'in the vm context:- Cache hit: pre-populate Redis with a valid token and a future expiry timestamp; mock
axios.postto fail (should never be called); assertres.statusCode === 200, body=== 'Authorized', andaxios.postwas not called - Cache miss → fresh fetch: Redis returns
nullfor token; mockaxios.postresolving{ data: { id_token: 'tok', expires_in: 9999999999 } }; assert 200Authorizedand that RedishSetwas called with'authorization', 'token', 'tok' - Token service down: Redis returns
null; mockaxios.postrejecting with{ code: 'ECONNABORTED' }; assertres.statusCode === 401, body starts with'Unauthorized:'
- Cache hit: pre-populate Redis with a valid token and a future expiry timestamp; mock
-
T010 [P] [US2] Add a
describe('non-sitemap endpoint (regression)')block totests/contract/proxy-http.test.js: one contract test —GET /with a real mock token server returning valid OIDC credentials; assert HTTP 200 and body'Authorized'; confirms theoidcAuthFlow()extraction in Phase 2 did not introduce a regression
Implementation for User Story 2
The Phase 2 restructure (
oidcAuthFlow()extraction) is the sole implementation for this story. Ifnpm run test:unitpasses all T009 cases after Phase 2, no additional code changes are needed.
- T011 [US2] Review
oidcAuthFlow()insrc/proxyScripts/kmeContentSourceAdapter.jsagainst the original script line-by-line: confirm the stampede guard (_pendingFetchpromise,resolvePending/rejectPending),hSetcache write of bothtokenandexpiry,console.debug/console.info/console.errorcalls, and all error-pathres.writeHead(401)/res.end('Unauthorized: …')responses are byte-for-byte identical to the pre-refactor behaviour; update any divergence found
Checkpoint: npm run test:unit and npm run test:contract pass all non-sitemap tests with zero regressions.
Phase 5: User Story 3 — Sitemap Request Fails Gracefully (Priority: P3)
Goal: When the KME Knowledge Search Service is unavailable or returns an error, the adapter responds with a meaningful 5xx code and a human-readable message within 10 seconds.
Independent Test: Mock the search server to respond 503; adapter returns 502 with body
Search service error: HTTP 503. Mock the search server to time out; adapter returns 504.
Tests for User Story 3 ⚠️ Write first — confirm tests FAIL before implementing T013
-
T011 [P] [US3] Add error-scenario test cases to the existing
describe('sitemap flow')block intests/unit/proxy.test.js(append after T004 cases):- Upstream 503: mock
axios.getrejecting with{ response: { status: 503 } }; assertres.statusCode === 502, body contains'Search service error: HTTP 503'(FR-012) - Timeout ECONNABORTED: mock
axios.getrejecting with{ code: 'ECONNABORTED' }; assertres.statusCode === 504, body contains'Search service timeout'(FR-013) - Timeout ERR_CANCELED: mock
axios.getrejecting with{ code: 'ERR_CANCELED' }; assertres.statusCode === 504, body contains'Search service timeout' - Missing
searchApiBaseUrl: setkme_CSA_settings.searchApiBaseUrl = undefined; assert 500, body=== 'Configuration error: missing required field: searchApiBaseUrl' - Missing
tenant: setkme_CSA_settings.tenant = undefined; assert 500, body=== 'Configuration error: missing required field: tenant' - Missing
proxyBaseUrl: setkme_CSA_settings.proxyBaseUrl = undefined; assert 500, body=== 'Configuration error: missing required field: proxyBaseUrl'
- Upstream 503: mock
-
T012 [P] [US3] Add error-scenario contract tests to the existing
describe('sitemap endpoint')block intests/contract/proxy-http.test.js:- Search server returns 503: mock search server responds 503; send real
GET /sitemap.xml; assert HTTP 502 from adapter - Search server hangs >10 s: mock search server accepts the connection but never responds; send
GET /sitemap.xmlwith a 15 s client timeout; assert adapter responds 504 within 12 s (accounts for 10 s upstream timeout + adapter overhead)
- Search server returns 503: mock search server responds 503; send real
Implementation for User Story 3
- T013 [US3] Wrap the body of
sitemapFlow()insrc/proxyScripts/kmeContentSourceAdapter.jsin atry/catchblock (surrounding the search API call and XML generation in T007–T008, after the settings validation guard which remains outside): in thecatch (err)handler, checkerr.code === 'ECONNABORTED' || err.code === 'ERR_CANCELED'→res.writeHead(504, { 'Content-Type': 'text/plain' }); res.end('Search service timeout');; else iferr.response→res.writeHead(502, { 'Content-Type': 'text/plain' }); res.end('Search service error: HTTP ' + err.response.status);; else →res.writeHead(502, { 'Content-Type': 'text/plain' }); res.end('Search service error: ' + err.message);(per R-004 and contracts/sitemap-endpoint.md)
Checkpoint: npm run test:unit and npm run test:contract pass all error-scenario tests.
Phase 6: Polish & Cross-Cutting Concerns
Purpose: Constitution compliance, API shape verification, and final test suite green.
-
T014 [P] Verify
src/proxyScripts/kmeContentSourceAdapter.jsconstitution compliance: rungrep -n 'import\|export\|process\.env\|global\.config\b\|config\.' src/proxyScripts/kmeContentSourceAdapter.jsand confirm zero matches (FR-009, Constitution §I); confirmxmlbuilder2is the sole XML-building mechanism (FR-008); confirm no new files were created insrc/ -
T015 [P] Verify live search API response shape against R-002 assumption: using a test token, call
GET ${searchApiBaseUrl}/${tenant}manually withcurl -H "Authorization: OIDC_id_token <token>" <searchApiBaseUrl>/<tenant>and confirm (a) the top-level key holding the items array (itemsvsresultsvs bare array) and (b) thatvkm:urlis a direct string property of each item; update the extraction lineresponse.data.items ?? response.datain T007 if the actual shape differs -
T016 Run the full test suite
npm testand confirm all unit and contract tests pass with zero failures, zero skipped tests, and no uncaught promise rejections
Dependencies
T001 ──────────────────────────────────────────────────────── (no deps, run any time)
T002 ──────────────────────────────────────────────────────── (no deps, run any time)
T003 ──────────────────────────────────────────────────────── (no deps, but do after T001/T002)
T004 ──────────── depends on T003 (needs restructured script to run in vm context)
T005 ──────────── depends on T003
T006 ──────────── depends on T003, T004, T005 (test-first: tests written before impl)
T007 ──────────── depends on T006
T008 ──────────── depends on T007
T009 ──────────── depends on T003 (regression tests for existing flow; parallel with T004–T008)
T010 ──────────── depends on T003
T011 [US2] ─────── depends on T003, T009, T010
T011 [US3] ─────── depends on T003, T007 (error tests need the search call in place)
T012 ──────────── depends on T003, T007
T013 ──────────── depends on T011[US3], T012 (tests written, confirmed failing)
T014 ──────────── depends on T003–T013 (final compliance check)
T015 ──────────── depends on T007 (search API shape may affect the items extraction line)
T016 ──────────── depends on all implementation tasks
Note on task ID collision: T011 appears in both Phase 4 (US2 implementation review) and Phase 5 (US3 error-scenario unit tests). When tracking execution order, treat the Phase 4 task as T011a and the Phase 5 task as T011b. Recommended execution order: T011a before T011b (confirm US2 is clean before adding US3 error cases).
Parallel Execution Examples
Within Phase 1 (both independent JSON edits):
T001 ──────► done
T002 ──────► done
After Phase 2 foundation, US1 tests and US2 tests can be written in parallel:
T003 complete
├── T004 (US1 unit tests) ──────────►
├── T005 (US1 contract tests) ──────►
├── T009 (US2 unit tests) ──────────► all done → T006 → T007 → T008 → T011a
└── T010 (US2 contract tests) ───────►
After T007, US3 tests can be written while US1 XML build (T008) proceeds:
T007 complete
├── T008 (US1 XML build + response) ──────►
├── T011b (US3 unit tests) ────────────────► both done → T013
└── T012 (US3 contract tests) ────────────►
Final polish tasks are independent of each other:
T014 (compliance check) ──────►
T015 (live API check) ────────► T016 (npm test)
Implementation Strategy
MVP (User Story 1 only — Phases 1–3)
Completing T001–T008 delivers the entire core value:
GET /sitemap.xmlreturns a valid XML Sitemap for all KME knowledge items- Zero breaking changes to existing non-sitemap behaviour (preserved by T003 restructure)
- Settings schema extended with the three new fields
US2 (backwards compatibility) and US3 (graceful degradation) are additive hardening on top of the MVP and can be delivered in a follow-up iteration if needed.
Incremental delivery order
- Iteration 1 (MVP): T001 → T002 → T003 → T004 + T005 → T006 → T007 → T008
- Iteration 2 (Hardening): T009 + T010 → T011a → T011b + T012 → T013
- Iteration 3 (Polish): T014 + T015 → T016
Format Validation
All tasks follow the required checklist format:
- [ ] [TaskID] [P?] [Story?] Description with file path
| Check | Result |
|---|---|
All tasks start with - [ ] checkbox |
✅ |
| All tasks have a sequential ID (T001–T016) | ✅ |
[P] only on tasks modifying different files with no unmet dependencies |
✅ |
[US1]/[US2]/[US3] labels only on user-story phase tasks |
✅ |
| Setup/Foundational/Polish tasks have no story label | ✅ |
| All tasks name at least one explicit file path | ✅ |