- Refactor kmeContentSourceAdapter.js into getValidToken(), oidcAuthFlow(), and sitemapFlow(); add sitemap generation using hydra:member response structure - Add searchApiBaseUrl, tenant, proxyBaseUrl fields to kme_CSA_settings.json and kme_CSA_settings.json.example - Add 17 unit tests for sitemap flow and non-sitemap routing regression - Add 5 contract tests for sitemap endpoint (proxy-http.test.js) - Add [Unreleased] sitemap entry to CHANGELOG.md - Add full specs/002-sitemap-generation/ artifact directory (spec, plan, tasks, data-model, contracts, research, quickstart, checklist) - Update constitution.md: add redis as permitted global, refresh kme_CSA_settings references - Update copilot-instructions.md SPECKIT marker to sitemap plan
249 lines
11 KiB
Markdown
249 lines
11 KiB
Markdown
# Implementation Plan: Sitemap XML Generation
|
||
|
||
**Branch**: `002-sitemap-generation` | **Date**: 2025-07-14 | **Spec**: [spec.md](./spec.md)
|
||
**Input**: Feature specification from `/specs/002-sitemap-generation/spec.md`
|
||
|
||
---
|
||
|
||
## Summary
|
||
|
||
Add a `GET /sitemap.xml` route to `kmeContentSourceAdapter.js`. The adapter detects sitemap
|
||
requests by URL suffix, obtains a valid OIDC `id_token` from the Redis cache (reusing the
|
||
existing stampede-guarded refresh logic), calls the KME Knowledge Search Service, maps each
|
||
result's `vkm:url` field to a `<loc>` entry, and returns a standards-compliant XML Sitemap as
|
||
`application/xml`. All existing non-sitemap requests are unaffected. Three new fields are added
|
||
to `kme_CSA_settings.json` (`searchApiBaseUrl`, `tenant`, `proxyBaseUrl`).
|
||
|
||
---
|
||
|
||
## Technical Context
|
||
|
||
**Language/Version**: Node.js ≥18, ESM (`"type": "module"`)
|
||
**Primary Dependencies**: `axios` (HTTP), `redis` (token cache), `xmlbuilder2` (XML — already injected as `xmlBuilder`), `uuid`, `jsonwebtoken` — all already in `package.json`
|
||
**Storage**: Redis read/write (`hGet`/`hSet`) for OIDC token cache only — no new storage
|
||
**Testing**: Node.js built-in test runner (`node:test`); no external test framework
|
||
**Target Platform**: Linux server / container (HTTP proxy adapter)
|
||
**Project Type**: HTTP proxy adapter (web-service)
|
||
**Performance Goals**: Sitemap response < 5 s p95 under normal conditions (SC-001); error responses < 10 s (SC-005)
|
||
**Constraints**:
|
||
- Zero `import`/`export` in `kmeContentSourceAdapter.js` (runs in `vm.createContext`)
|
||
- No references to `config`, `global.config`, or `process.env` in proxy script
|
||
- XML built exclusively with the injected `xmlBuilder` (FR-008)
|
||
- No new npm packages; no new source files (monolithic architecture — Section I of constitution)
|
||
**Scale/Scope**: Single tenant per deployment; all search results in one API call (no pagination, v1)
|
||
|
||
---
|
||
|
||
## Constitution Check
|
||
|
||
*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*
|
||
|
||
| # | Principle | Status | Notes |
|
||
|---|---|---|---|
|
||
| I | Monolithic architecture | ✅ PASS | All new code added to `kmeContentSourceAdapter.js`; no new source files |
|
||
| I (vm.Script) | Zero imports/exports in proxy script | ✅ PASS | Sitemap logic is inlined; no import statements introduced |
|
||
| I.0 | No forbidden globals (`config`, `global.config`, `process.env`) | ✅ PASS | Only `kme_CSA_settings`, `redis`, `axios`, `xmlBuilder`, `req`, `res` used |
|
||
| I.I | Business logic in proxy.js | ✅ PASS | Auth, API call, XML generation all in `kmeContentSourceAdapter.js` |
|
||
| I.II | Separate files only for allowed categories | ✅ PASS | Settings JSON in `src/globalVariables/` (existing pattern) |
|
||
| I.III | No new files challenged | ✅ PASS | No new files in `src/` |
|
||
| I.IV | New config in `src/globalVariables/` not `config/default.json` | ✅ PASS | Three fields added to `kme_CSA_settings.json` |
|
||
| I.V | `xmlBuilder` already in `globalVMContext` | ✅ PASS | `xmlbuilder2` `create` already injected; no server.js changes needed |
|
||
| II | API-First Design | ✅ PASS | HTTP contract documented in `contracts/sitemap-endpoint.md` |
|
||
| III | Test-First Development | ✅ REQUIRED | Unit + contract tests must be written before/alongside implementation |
|
||
| VII | No new dependencies | ✅ PASS | All required packages already installed (`xmlbuilder2`, `axios`, `redis`) |
|
||
|
||
**Post-design re-check**: All gates still pass. The design introduces zero new files, zero new dependencies, and zero architectural violations.
|
||
|
||
---
|
||
|
||
## Project Structure
|
||
|
||
### Documentation (this feature)
|
||
|
||
```text
|
||
specs/002-sitemap-generation/
|
||
├── plan.md # This file (/speckit.plan command output)
|
||
├── spec.md # Feature specification
|
||
├── research.md # Phase 0 output (/speckit.plan command)
|
||
├── data-model.md # Phase 1 output (/speckit.plan command)
|
||
├── quickstart.md # Phase 1 output (/speckit.plan command)
|
||
├── contracts/ # Phase 1 output (/speckit.plan command)
|
||
│ └── sitemap-endpoint.md
|
||
└── tasks.md # Phase 2 output (/speckit.tasks command - NOT created by /speckit.plan)
|
||
```
|
||
|
||
### Source Code (repository root)
|
||
|
||
```text
|
||
src/
|
||
├── proxyScripts/
|
||
│ └── kmeContentSourceAdapter.js # MODIFIED: sitemap branch + token helper added
|
||
├── globalVariables/
|
||
│ ├── kme_CSA_settings.json # MODIFIED: 3 new fields (searchApiBaseUrl, tenant, proxyBaseUrl)
|
||
│ └── kme_CSA_settings.json.example # MODIFIED: updated with new field placeholders
|
||
└── server.js # NO CHANGE
|
||
|
||
tests/
|
||
├── unit/
|
||
│ └── proxy.test.js # MODIFIED: sitemap test cases added
|
||
└── contract/
|
||
└── proxy-http.test.js # MODIFIED: sitemap HTTP contract tests added
|
||
```
|
||
|
||
**Structure Decision**: Single-project layout. No new directories. Only the proxy script, its
|
||
settings JSON, and the existing test files are modified.
|
||
|
||
---
|
||
|
||
## Phase 0: Research Findings
|
||
|
||
> Full research notes: [research.md](./research.md)
|
||
|
||
| Research ID | Topic | Decision |
|
||
|---|---|---|
|
||
| R-001 | Token reuse | Inline shared `getValidToken()` helper in proxy script; branch on URL first |
|
||
| R-002 | Search API response shape | Assume `{ items: [...] }`; verify against live API during implementation |
|
||
| R-003 | xmlbuilder2 API | `xmlBuilder({...}).ele('urlset',{xmlns:...})…doc.end({})` — no prettyPrint |
|
||
| R-004 | Error mapping | Reuse `err.response` / `err.code === ECONNABORTED\|ERR_CANCELED` pattern |
|
||
| R-005 | Settings validation | `requiredSitemapFields` guard before any async work → HTTP 500 |
|
||
| R-006 | `loc` construction | `` `${proxyBaseUrl}?kmeURL=${encodeURIComponent(item['vkm:url'])}` `` |
|
||
|
||
**Resolved NEEDS CLARIFICATION**: None remain. All decisions are documented.
|
||
|
||
---
|
||
|
||
## Phase 1: Design
|
||
|
||
### Data Model
|
||
|
||
> Full data model: [data-model.md](./data-model.md)
|
||
|
||
**Key entities**:
|
||
- `KnowledgeItem` — raw search result with `vkm:url` (read-only, from upstream API)
|
||
- `SitemapEntry` — `{ loc: string }` derived in-memory from `KnowledgeItem`
|
||
- `SitemapDocument` — serialised XML output (`urlset` + `url` elements)
|
||
- `OIDCTokenCache` — shared Redis store (unchanged; `hGet`/`hSet` pattern reused)
|
||
- `kme_CSA_settings` — extended JSON settings (3 new fields)
|
||
|
||
### Contracts
|
||
|
||
> Full contract: [contracts/sitemap-endpoint.md](./contracts/sitemap-endpoint.md)
|
||
|
||
| Scenario | Status | Response |
|
||
|---|---|---|
|
||
| Search succeeds, items present | 200 | `application/xml` sitemap with `<url>` entries |
|
||
| Search succeeds, zero items | 200 | `application/xml` empty `<urlset/>` |
|
||
| Missing settings field | 500 | `text/plain` descriptive message |
|
||
| Upstream non-2xx | 502 | `text/plain` upstream error |
|
||
| Upstream timeout | 504 | `text/plain` timeout message |
|
||
|
||
### Implementation Design
|
||
|
||
**Entry point restructure** (single IIFE, no imports):
|
||
|
||
```javascript
|
||
(async () => {
|
||
// FR-001: Route on URL suffix
|
||
if (req.url.endsWith('/sitemap.xml')) {
|
||
await sitemapFlow();
|
||
} else {
|
||
await oidcAuthFlow(); // existing logic, moved to inner async function
|
||
}
|
||
})();
|
||
```
|
||
|
||
**`sitemapFlow` logic**:
|
||
|
||
```javascript
|
||
async function sitemapFlow() {
|
||
// FR-011: Validate required settings
|
||
const required = ['searchApiBaseUrl', 'tenant', 'proxyBaseUrl'];
|
||
for (const f of required) {
|
||
if (!kme_CSA_settings[f]) {
|
||
res.writeHead(500, { 'Content-Type': 'text/plain' });
|
||
res.end('Configuration error: missing required field: ' + f);
|
||
return;
|
||
}
|
||
}
|
||
|
||
// FR-003: Obtain valid OIDC token (shared helper with existing flow)
|
||
const token = await getValidToken(); // throws on failure → caught by outer try/catch
|
||
|
||
// FR-002: Call KME Knowledge Search Service
|
||
const { searchApiBaseUrl, tenant, proxyBaseUrl } = kme_CSA_settings;
|
||
const searchResponse = await axios.get(
|
||
`${searchApiBaseUrl}/${tenant}`,
|
||
{
|
||
headers: { Authorization: `OIDC_id_token ${token}` },
|
||
timeout: 10_000,
|
||
}
|
||
);
|
||
|
||
// Extract items (R-002: assume { items: [...] } or bare array)
|
||
const items = searchResponse.data.items ?? searchResponse.data ?? [];
|
||
|
||
// FR-004, FR-005, FR-006, FR-008: Build sitemap XML
|
||
const doc = xmlBuilder({ version: '1.0', encoding: 'UTF-8' });
|
||
const urlset = doc.ele('urlset', { xmlns: 'http://www.sitemaps.org/schemas/sitemap/0.9' });
|
||
for (const item of items) {
|
||
const vkmUrl = item['vkm:url'];
|
||
if (!vkmUrl) continue; // FR-006: omit silently
|
||
const loc = `${proxyBaseUrl}?kmeURL=${encodeURIComponent(vkmUrl)}`;
|
||
urlset.ele('url').ele('loc').txt(loc).up().up();
|
||
}
|
||
const xml = doc.end({ prettyPrint: false });
|
||
|
||
// FR-007: Respond
|
||
res.writeHead(200, { 'Content-Type': 'application/xml' });
|
||
res.end(xml);
|
||
}
|
||
```
|
||
|
||
**Error handling** (wrapping `sitemapFlow` catch):
|
||
- `err.code === 'ECONNABORTED' || err.code === 'ERR_CANCELED'` → 504
|
||
- `err.response` defined → 502 `Search service error: HTTP ${err.response.status}`
|
||
- other → 502 `Search service error: ${err.message}`
|
||
|
||
**`getValidToken` helper** (shared inline function; extract from existing OIDC flow):
|
||
|
||
Encapsulates steps 2–6 of the existing flow:
|
||
- `hGet('authorization', 'token')` / `hGet('authorization', 'expiry')`
|
||
- Cache hit → return token
|
||
- Stampede guard → queue on in-flight promise
|
||
- Cache miss → `axios.post(tokenUrl, ...)` → `hSet` both fields
|
||
- Returns the `id_token` string; throws on failure
|
||
|
||
**Token fetch failure in sitemap context**: If `getValidToken` throws, the outer catch
|
||
returns `401 Unauthorized: <message>` (same as existing flow).
|
||
|
||
### Test Plan
|
||
|
||
**Unit tests** (`tests/unit/proxy.test.js`) — new `describe('sitemap flow')` block:
|
||
|
||
| Scenario | Mock | Assert |
|
||
|---|---|---|
|
||
| Happy path: items present | axios.get → `{ items: [{ 'vkm:url': '...' }] }` | 200, `application/xml`, `<loc>` |
|
||
| Happy path: zero items | axios.get → `{ items: [] }` | 200, empty `<urlset/>` |
|
||
| Items with empty vkm:url | mix of valid + empty | only non-empty items in output |
|
||
| Missing `searchApiBaseUrl` | settings without field | 500, descriptive message |
|
||
| Missing `tenant` | settings without field | 500, descriptive message |
|
||
| Missing `proxyBaseUrl` | settings without field | 500, descriptive message |
|
||
| Upstream 503 | axios.get rejects with `{ response: { status: 503 } }` | 502 |
|
||
| Upstream timeout | axios.get rejects with `{ code: 'ECONNABORTED' }` | 504 |
|
||
| Non-sitemap URL still works | req.url = '/' | existing 200 Authorized behaviour |
|
||
|
||
**Contract tests** (`tests/contract/proxy-http.test.js`) — new `describe('sitemap endpoint')` block:
|
||
|
||
| Scenario | Setup | Assert |
|
||
|---|---|---|
|
||
| Full round-trip: GET /sitemap.xml | Mock search server → 200 `{ items: [...] }` | 200, `application/xml`, valid XML with `<loc>` |
|
||
| Empty results | Mock search server → 200 `{ items: [] }` | 200, `application/xml`, empty `<urlset/>` |
|
||
| Search server returns 503 | Mock → 503 | 502 |
|
||
| Search server hangs > 10 s | Mock → never respond | 504 |
|
||
|
||
---
|
||
|
||
## Complexity Tracking
|
||
|
||
> No violations to justify. All gates pass. No entries required.
|