feat(002): add sitemap generation feature

- Refactor kmeContentSourceAdapter.js into getValidToken(), oidcAuthFlow(),
  and sitemapFlow(); add sitemap generation using hydra:member response structure
- Add searchApiBaseUrl, tenant, proxyBaseUrl fields to kme_CSA_settings.json
  and kme_CSA_settings.json.example
- Add 17 unit tests for sitemap flow and non-sitemap routing regression
- Add 5 contract tests for sitemap endpoint (proxy-http.test.js)
- Add [Unreleased] sitemap entry to CHANGELOG.md
- Add full specs/002-sitemap-generation/ artifact directory
  (spec, plan, tasks, data-model, contracts, research, quickstart, checklist)
- Update constitution.md: add redis as permitted global, refresh
  kme_CSA_settings references
- Update copilot-instructions.md SPECKIT marker to sitemap plan
This commit is contained in:
2026-04-22 22:08:08 -05:00
parent 49a6b2e4e7
commit 50b87297d2
17 changed files with 1879 additions and 40 deletions

View File

@@ -0,0 +1,248 @@
# Implementation Plan: Sitemap XML Generation
**Branch**: `002-sitemap-generation` | **Date**: 2025-07-14 | **Spec**: [spec.md](./spec.md)
**Input**: Feature specification from `/specs/002-sitemap-generation/spec.md`
---
## Summary
Add a `GET /sitemap.xml` route to `kmeContentSourceAdapter.js`. The adapter detects sitemap
requests by URL suffix, obtains a valid OIDC `id_token` from the Redis cache (reusing the
existing stampede-guarded refresh logic), calls the KME Knowledge Search Service, maps each
result's `vkm:url` field to a `<loc>` entry, and returns a standards-compliant XML Sitemap as
`application/xml`. All existing non-sitemap requests are unaffected. Three new fields are added
to `kme_CSA_settings.json` (`searchApiBaseUrl`, `tenant`, `proxyBaseUrl`).
---
## Technical Context
**Language/Version**: Node.js ≥18, ESM (`"type": "module"`)
**Primary Dependencies**: `axios` (HTTP), `redis` (token cache), `xmlbuilder2` (XML — already injected as `xmlBuilder`), `uuid`, `jsonwebtoken` — all already in `package.json`
**Storage**: Redis read/write (`hGet`/`hSet`) for OIDC token cache only — no new storage
**Testing**: Node.js built-in test runner (`node:test`); no external test framework
**Target Platform**: Linux server / container (HTTP proxy adapter)
**Project Type**: HTTP proxy adapter (web-service)
**Performance Goals**: Sitemap response < 5 s p95 under normal conditions (SC-001); error responses < 10 s (SC-005)
**Constraints**:
- Zero `import`/`export` in `kmeContentSourceAdapter.js` (runs in `vm.createContext`)
- No references to `config`, `global.config`, or `process.env` in proxy script
- XML built exclusively with the injected `xmlBuilder` (FR-008)
- No new npm packages; no new source files (monolithic architecture — Section I of constitution)
**Scale/Scope**: Single tenant per deployment; all search results in one API call (no pagination, v1)
---
## Constitution Check
*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*
| # | Principle | Status | Notes |
|---|---|---|---|
| I | Monolithic architecture | ✅ PASS | All new code added to `kmeContentSourceAdapter.js`; no new source files |
| I (vm.Script) | Zero imports/exports in proxy script | ✅ PASS | Sitemap logic is inlined; no import statements introduced |
| I.0 | No forbidden globals (`config`, `global.config`, `process.env`) | ✅ PASS | Only `kme_CSA_settings`, `redis`, `axios`, `xmlBuilder`, `req`, `res` used |
| I.I | Business logic in proxy.js | ✅ PASS | Auth, API call, XML generation all in `kmeContentSourceAdapter.js` |
| I.II | Separate files only for allowed categories | ✅ PASS | Settings JSON in `src/globalVariables/` (existing pattern) |
| I.III | No new files challenged | ✅ PASS | No new files in `src/` |
| I.IV | New config in `src/globalVariables/` not `config/default.json` | ✅ PASS | Three fields added to `kme_CSA_settings.json` |
| I.V | `xmlBuilder` already in `globalVMContext` | ✅ PASS | `xmlbuilder2` `create` already injected; no server.js changes needed |
| II | API-First Design | ✅ PASS | HTTP contract documented in `contracts/sitemap-endpoint.md` |
| III | Test-First Development | ✅ REQUIRED | Unit + contract tests must be written before/alongside implementation |
| VII | No new dependencies | ✅ PASS | All required packages already installed (`xmlbuilder2`, `axios`, `redis`) |
**Post-design re-check**: All gates still pass. The design introduces zero new files, zero new dependencies, and zero architectural violations.
---
## Project Structure
### Documentation (this feature)
```text
specs/002-sitemap-generation/
├── plan.md # This file (/speckit.plan command output)
├── spec.md # Feature specification
├── research.md # Phase 0 output (/speckit.plan command)
├── data-model.md # Phase 1 output (/speckit.plan command)
├── quickstart.md # Phase 1 output (/speckit.plan command)
├── contracts/ # Phase 1 output (/speckit.plan command)
│ └── sitemap-endpoint.md
└── tasks.md # Phase 2 output (/speckit.tasks command - NOT created by /speckit.plan)
```
### Source Code (repository root)
```text
src/
├── proxyScripts/
│ └── kmeContentSourceAdapter.js # MODIFIED: sitemap branch + token helper added
├── globalVariables/
│ ├── kme_CSA_settings.json # MODIFIED: 3 new fields (searchApiBaseUrl, tenant, proxyBaseUrl)
│ └── kme_CSA_settings.json.example # MODIFIED: updated with new field placeholders
└── server.js # NO CHANGE
tests/
├── unit/
│ └── proxy.test.js # MODIFIED: sitemap test cases added
└── contract/
└── proxy-http.test.js # MODIFIED: sitemap HTTP contract tests added
```
**Structure Decision**: Single-project layout. No new directories. Only the proxy script, its
settings JSON, and the existing test files are modified.
---
## Phase 0: Research Findings
> Full research notes: [research.md](./research.md)
| Research ID | Topic | Decision |
|---|---|---|
| R-001 | Token reuse | Inline shared `getValidToken()` helper in proxy script; branch on URL first |
| R-002 | Search API response shape | Assume `{ items: [...] }`; verify against live API during implementation |
| R-003 | xmlbuilder2 API | `xmlBuilder({...}).ele('urlset',{xmlns:...})…doc.end({})` — no prettyPrint |
| R-004 | Error mapping | Reuse `err.response` / `err.code === ECONNABORTED\|ERR_CANCELED` pattern |
| R-005 | Settings validation | `requiredSitemapFields` guard before any async work → HTTP 500 |
| R-006 | `loc` construction | `` `${proxyBaseUrl}?kmeURL=${encodeURIComponent(item['vkm:url'])}` `` |
**Resolved NEEDS CLARIFICATION**: None remain. All decisions are documented.
---
## Phase 1: Design
### Data Model
> Full data model: [data-model.md](./data-model.md)
**Key entities**:
- `KnowledgeItem` — raw search result with `vkm:url` (read-only, from upstream API)
- `SitemapEntry` — `{ loc: string }` derived in-memory from `KnowledgeItem`
- `SitemapDocument` — serialised XML output (`urlset` + `url` elements)
- `OIDCTokenCache` — shared Redis store (unchanged; `hGet`/`hSet` pattern reused)
- `kme_CSA_settings` — extended JSON settings (3 new fields)
### Contracts
> Full contract: [contracts/sitemap-endpoint.md](./contracts/sitemap-endpoint.md)
| Scenario | Status | Response |
|---|---|---|
| Search succeeds, items present | 200 | `application/xml` sitemap with `<url>` entries |
| Search succeeds, zero items | 200 | `application/xml` empty `<urlset/>` |
| Missing settings field | 500 | `text/plain` descriptive message |
| Upstream non-2xx | 502 | `text/plain` upstream error |
| Upstream timeout | 504 | `text/plain` timeout message |
### Implementation Design
**Entry point restructure** (single IIFE, no imports):
```javascript
(async () => {
// FR-001: Route on URL suffix
if (req.url.endsWith('/sitemap.xml')) {
await sitemapFlow();
} else {
await oidcAuthFlow(); // existing logic, moved to inner async function
}
})();
```
**`sitemapFlow` logic**:
```javascript
async function sitemapFlow() {
// FR-011: Validate required settings
const required = ['searchApiBaseUrl', 'tenant', 'proxyBaseUrl'];
for (const f of required) {
if (!kme_CSA_settings[f]) {
res.writeHead(500, { 'Content-Type': 'text/plain' });
res.end('Configuration error: missing required field: ' + f);
return;
}
}
// FR-003: Obtain valid OIDC token (shared helper with existing flow)
const token = await getValidToken(); // throws on failure → caught by outer try/catch
// FR-002: Call KME Knowledge Search Service
const { searchApiBaseUrl, tenant, proxyBaseUrl } = kme_CSA_settings;
const searchResponse = await axios.get(
`${searchApiBaseUrl}/${tenant}`,
{
headers: { Authorization: `OIDC_id_token ${token}` },
timeout: 10_000,
}
);
// Extract items (R-002: assume { items: [...] } or bare array)
const items = searchResponse.data.items ?? searchResponse.data ?? [];
// FR-004, FR-005, FR-006, FR-008: Build sitemap XML
const doc = xmlBuilder({ version: '1.0', encoding: 'UTF-8' });
const urlset = doc.ele('urlset', { xmlns: 'http://www.sitemaps.org/schemas/sitemap/0.9' });
for (const item of items) {
const vkmUrl = item['vkm:url'];
if (!vkmUrl) continue; // FR-006: omit silently
const loc = `${proxyBaseUrl}?kmeURL=${encodeURIComponent(vkmUrl)}`;
urlset.ele('url').ele('loc').txt(loc).up().up();
}
const xml = doc.end({ prettyPrint: false });
// FR-007: Respond
res.writeHead(200, { 'Content-Type': 'application/xml' });
res.end(xml);
}
```
**Error handling** (wrapping `sitemapFlow` catch):
- `err.code === 'ECONNABORTED' || err.code === 'ERR_CANCELED'` → 504
- `err.response` defined → 502 `Search service error: HTTP ${err.response.status}`
- other → 502 `Search service error: ${err.message}`
**`getValidToken` helper** (shared inline function; extract from existing OIDC flow):
Encapsulates steps 26 of the existing flow:
- `hGet('authorization', 'token')` / `hGet('authorization', 'expiry')`
- Cache hit → return token
- Stampede guard → queue on in-flight promise
- Cache miss → `axios.post(tokenUrl, ...)` → `hSet` both fields
- Returns the `id_token` string; throws on failure
**Token fetch failure in sitemap context**: If `getValidToken` throws, the outer catch
returns `401 Unauthorized: <message>` (same as existing flow).
### Test Plan
**Unit tests** (`tests/unit/proxy.test.js`) — new `describe('sitemap flow')` block:
| Scenario | Mock | Assert |
|---|---|---|
| Happy path: items present | axios.get → `{ items: [{ 'vkm:url': '...' }] }` | 200, `application/xml`, `<loc>` |
| Happy path: zero items | axios.get → `{ items: [] }` | 200, empty `<urlset/>` |
| Items with empty vkm:url | mix of valid + empty | only non-empty items in output |
| Missing `searchApiBaseUrl` | settings without field | 500, descriptive message |
| Missing `tenant` | settings without field | 500, descriptive message |
| Missing `proxyBaseUrl` | settings without field | 500, descriptive message |
| Upstream 503 | axios.get rejects with `{ response: { status: 503 } }` | 502 |
| Upstream timeout | axios.get rejects with `{ code: 'ECONNABORTED' }` | 504 |
| Non-sitemap URL still works | req.url = '/' | existing 200 Authorized behaviour |
**Contract tests** (`tests/contract/proxy-http.test.js`) — new `describe('sitemap endpoint')` block:
| Scenario | Setup | Assert |
|---|---|---|
| Full round-trip: GET /sitemap.xml | Mock search server → 200 `{ items: [...] }` | 200, `application/xml`, valid XML with `<loc>` |
| Empty results | Mock search server → 200 `{ items: [] }` | 200, `application/xml`, empty `<urlset/>` |
| Search server returns 503 | Mock → 503 | 502 |
| Search server hangs > 10 s | Mock → never respond | 504 |
---
## Complexity Tracking
> No violations to justify. All gates pass. No entries required.