- Add contentFetchFlow() to proxy (FR-001 through FR-012)
- Add extractArticleBody() helper with vkm:articleBody / articleBody fallback
- Dynamic proxyBaseUrl derivation from x-forwarded-proto/host headers
- Forward query/size/category params on /sitemap.xml requests
- Add Accept: application/ld+json header to content API calls
- Remove oidcAuthFlow() - unmatched requests now return 404 Not Found
- Fix xmlbuilder2 import: default import, call as xmlbuilder2.create(...)
- Version bump 0.2.0 → 0.3.0
- 45/45 tests passing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Implementation Plan: Sitemap XML Generation
Branch: 002-sitemap-generation | Date: 2025-07-14 | Spec: spec.md
Input: Feature specification from /specs/002-sitemap-generation/spec.md
Summary
Add a GET /sitemap.xml route to kmeContentSourceAdapter.js. The adapter detects sitemap
requests by URL suffix, obtains a valid OIDC id_token from the Redis cache (reusing the
existing stampede-guarded refresh logic), calls the KME Knowledge Search Service, maps each
result's vkm:url field to a <loc> entry, and returns a standards-compliant XML Sitemap as
application/xml. All existing non-sitemap requests are unaffected. Three new fields are added
to kme_CSA_settings.json (searchApiBaseUrl, tenant, proxyBaseUrl).
Technical Context
Language/Version: Node.js ≥18, ESM ("type": "module")
Primary Dependencies: axios (HTTP), redis (token cache), xmlbuilder2 (XML — already injected as xmlbuilder2), uuid, jsonwebtoken — all already in package.json
Storage: Redis read/write (hGet/hSet) for OIDC token cache only — no new storage
Testing: Node.js built-in test runner (node:test); no external test framework
Target Platform: Linux server / container (HTTP proxy adapter)
Project Type: HTTP proxy adapter (web-service)
Performance Goals: Sitemap response < 5 s p95 under normal conditions (SC-001); error responses < 10 s (SC-005)
Constraints:
- Zero import/export in kmeContentSourceAdapter.js (runs in vm.createContext)
- No references to config, global.config, or process.env in proxy script
- XML built exclusively with the injected xmlbuilder2 (FR-008)
- No new npm packages; no new source files (monolithic architecture; Section I of constitution)

Scale/Scope: Single tenant per deployment; all search results in one API call (no pagination, v1)
Constitution Check
GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.
| # | Principle | Status | Notes |
|---|---|---|---|
| I | Monolithic architecture | ✅ PASS | All new code added to kmeContentSourceAdapter.js; no new source files |
| I (vm.Script) | Zero imports/exports in proxy script | ✅ PASS | Sitemap logic is inlined; no import statements introduced |
| I.0 | No forbidden globals (config, global.config, process.env) | ✅ PASS | Only kme_CSA_settings, redis, axios, xmlbuilder2, req, res used |
| I.I | Business logic in proxy.js | ✅ PASS | Auth, API call, XML generation all in kmeContentSourceAdapter.js |
| I.II | Separate files only for allowed categories | ✅ PASS | Settings JSON in src/globalVariables/ (existing pattern) |
| I.III | No new files challenged | ✅ PASS | No new files in src/ |
| I.IV | New config in src/globalVariables/ not config/default.json | ✅ PASS | Three fields added to kme_CSA_settings.json |
| I.V | xmlbuilder2 already in globalVMContext | ✅ PASS | xmlbuilder2 create already injected; no server.js changes needed |
| II | API-First Design | ✅ PASS | HTTP contract documented in contracts/sitemap-endpoint.md |
| III | Test-First Development | ✅ REQUIRED | Unit + contract tests must be written before/alongside implementation |
| VII | No new dependencies | ✅ PASS | All required packages already installed (xmlbuilder2, axios, redis) |
Post-design re-check: All gates still pass. The design introduces zero new files, zero new dependencies, and zero architectural violations.
Project Structure
Documentation (this feature)
specs/002-sitemap-generation/
├── plan.md # This file (/speckit.plan command output)
├── spec.md # Feature specification
├── research.md # Phase 0 output (/speckit.plan command)
├── data-model.md # Phase 1 output (/speckit.plan command)
├── quickstart.md # Phase 1 output (/speckit.plan command)
├── contracts/ # Phase 1 output (/speckit.plan command)
│ └── sitemap-endpoint.md
└── tasks.md # Phase 2 output (/speckit.tasks command - NOT created by /speckit.plan)
Source Code (repository root)
src/
├── proxyScripts/
│ └── kmeContentSourceAdapter.js # MODIFIED: sitemap branch + token helper added
├── globalVariables/
│ ├── kme_CSA_settings.json # MODIFIED: 3 new fields (searchApiBaseUrl, tenant, proxyBaseUrl)
│ └── kme_CSA_settings.json.example # MODIFIED: updated with new field placeholders
└── server.js # NO CHANGE
tests/
├── unit/
│ └── proxy.test.js # MODIFIED: sitemap test cases added
└── contract/
└── proxy-http.test.js # MODIFIED: sitemap HTTP contract tests added
Structure Decision: Single-project layout. No new directories. Only the proxy script, its settings JSON, and the existing test files are modified.
Phase 0: Research Findings
Full research notes: research.md
| Research ID | Topic | Decision |
|---|---|---|
| R-001 | Token reuse | Inline shared getValidToken() helper in proxy script; branch on URL first |
| R-002 | Search API response shape | Assume { items: [...] }; verify against live API during implementation |
| R-003 | xmlbuilder2 API | xmlbuilder2({...}).ele('urlset',{xmlns:...})…doc.end({}) — no prettyPrint |
| R-004 | Error mapping | Reuse err.response / err.code === 'ECONNABORTED' or 'ERR_CANCELED' pattern |
| R-005 | Settings validation | requiredSitemapFields guard before any async work → HTTP 500 |
| R-006 | loc construction | `${proxyBaseUrl}?kmeURL=${encodeURIComponent(item['vkm:url'])}` |
Resolved NEEDS CLARIFICATION: None remain. All decisions are documented.
Phase 1: Design
Data Model
Full data model: data-model.md
Key entities:
- KnowledgeItem: raw search result with vkm:url (read-only, from upstream API)
- SitemapEntry: { loc: string } derived in-memory from KnowledgeItem
- SitemapDocument: serialised XML output (urlset + url elements)
- OIDCTokenCache: shared Redis store (unchanged; hGet/hSet pattern reused)
- kme_CSA_settings: extended JSON settings (3 new fields)
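The KnowledgeItem to SitemapEntry derivation can be sketched as a pure function. This is an illustration under assumptions rather than the adapter's actual code: `toSitemapEntries` is a hypothetical name, and the URLs are placeholders.

```javascript
// Hypothetical helper: derive SitemapEntry objects ({ loc }) from raw
// KnowledgeItem results. Items without a usable vkm:url are skipped (FR-006).
function toSitemapEntries(items, proxyBaseUrl) {
  const entries = [];
  for (const item of items) {
    const vkmUrl = item['vkm:url'];
    if (!vkmUrl) continue; // missing or empty: omit silently
    entries.push({
      loc: `${proxyBaseUrl}?kmeURL=${encodeURIComponent(vkmUrl)}`,
    });
  }
  return entries;
}

// Placeholder data, for illustration only.
const sampleItems = [
  { 'vkm:url': 'https://kme.example.com/content/a b?id=1' },
  { 'vkm:url': '' },
  { title: 'item without a vkm:url field' },
];
const sampleEntries = toSitemapEntries(sampleItems, 'https://proxy.example.com/');
console.log(sampleEntries);
```

Only the first sample item yields an entry. Its URL is percent-encoded as a whole, so characters such as `?` and `=` inside vkm:url cannot be mistaken for the proxy's own query string.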
Contracts
Full contract: contracts/sitemap-endpoint.md
| Scenario | Status | Response |
|---|---|---|
| Search succeeds, items present | 200 | application/xml sitemap with <url> entries |
| Search succeeds, zero items | 200 | application/xml empty <urlset/> |
| Missing settings field | 500 | text/plain descriptive message |
| Upstream non-2xx | 502 | text/plain upstream error |
| Upstream timeout | 504 | text/plain timeout message |
Implementation Design
Entry point restructure (single IIFE, no imports):
(async () => {
  // FR-001: Route on URL suffix
  if (req.url.endsWith('/sitemap.xml')) {
    await sitemapFlow();
  } else {
    await oidcAuthFlow(); // existing logic, moved to inner async function
  }
})();
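A plain suffix test on req.url stops matching once a query string is appended (for example /sitemap.xml?size=10). If that case matters, one option is to test only the path component. A minimal sketch, assuming req.url holds the raw request target as it does in Node's http module; `isSitemapRequest` is a hypothetical helper name, not part of the existing script.

```javascript
// Hypothetical helper: decide whether a raw request target addresses the
// sitemap route, ignoring any query string.
function isSitemapRequest(rawUrl) {
  const queryStart = rawUrl.indexOf('?');
  const path = queryStart === -1 ? rawUrl : rawUrl.slice(0, queryStart);
  return path.endsWith('/sitemap.xml');
}

console.log(isSitemapRequest('/sitemap.xml'));          // true
console.log(isSitemapRequest('/sitemap.xml?size=10'));  // true
console.log(isSitemapRequest('/?next=/sitemap.xml'));   // false
```

The last case shows why the query string is stripped before the suffix test: a bare `endsWith` on the full URL would route it to the sitemap branch.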
sitemapFlow logic:
async function sitemapFlow() {
  // FR-011: Validate required settings
  const required = ['searchApiBaseUrl', 'tenant', 'proxyBaseUrl'];
  for (const f of required) {
    if (!kme_CSA_settings[f]) {
      res.writeHead(500, { 'Content-Type': 'text/plain' });
      res.end('Configuration error: missing required field: ' + f);
      return;
    }
  }
  // FR-003: Obtain valid OIDC token (shared helper with existing flow)
  const token = await getValidToken(); // throws on failure → caught by outer try/catch
  // FR-002: Call KME Knowledge Search Service
  const { searchApiBaseUrl, tenant, proxyBaseUrl } = kme_CSA_settings;
  const searchResponse = await axios.get(
    `${searchApiBaseUrl}/${tenant}`,
    {
      headers: { Authorization: `OIDC_id_token ${token}` },
      timeout: 10_000,
    }
  );
  // Extract items (R-002: assume { items: [...] } or bare array).
  // Any other shape yields an empty list instead of a non-iterable value.
  const data = searchResponse.data;
  const items = Array.isArray(data) ? data : Array.isArray(data?.items) ? data.items : [];
  // FR-004, FR-005, FR-006, FR-008: Build sitemap XML
  const doc = xmlbuilder2({ version: '1.0', encoding: 'UTF-8' });
  const urlset = doc.ele('urlset', { xmlns: 'http://www.sitemaps.org/schemas/sitemap/0.9' });
  for (const item of items) {
    const vkmUrl = item['vkm:url'];
    if (!vkmUrl) continue; // FR-006: omit silently
    const loc = `${proxyBaseUrl}?kmeURL=${encodeURIComponent(vkmUrl)}`;
    urlset.ele('url').ele('loc').txt(loc).up().up();
  }
  const xml = doc.end({ prettyPrint: false });
  // FR-007: Respond
  res.writeHead(200, { 'Content-Type': 'application/xml' });
  res.end(xml);
}
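The settings guard at the top of sitemapFlow can also be written as a small pure function, which makes the "first missing field" behaviour easy to check without req/res. A sketch only: `findMissingSetting` is a hypothetical name, `requiredSitemapFields` follows the name used in R-005, and the settings values are placeholders.

```javascript
// Hypothetical helper: return the name of the first required setting that is
// missing or falsy (undefined, null, or ''), or null when all are present.
function findMissingSetting(settings, requiredFields) {
  for (const field of requiredFields) {
    if (!settings[field]) return field;
  }
  return null;
}

const requiredSitemapFields = ['searchApiBaseUrl', 'tenant', 'proxyBaseUrl'];

// Placeholder settings objects, for illustration only.
const incompleteSettings = {
  searchApiBaseUrl: 'https://search.example.com/api',
  tenant: '',
};
const completeSettings = {
  searchApiBaseUrl: 'https://search.example.com/api',
  tenant: 'acme',
  proxyBaseUrl: 'https://proxy.example.com/',
};

console.log(findMissingSetting(incompleteSettings, requiredSitemapFields)); // 'tenant'
console.log(findMissingSetting(completeSettings, requiredSitemapFields));   // null
```

Because the check is a truthiness test, an empty string counts as missing, which matches the `!kme_CSA_settings[f]` condition in the flow.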
Error handling (wrapping sitemapFlow catch):
- err.code === 'ECONNABORTED' || err.code === 'ERR_CANCELED' → 504
- err.response defined → 502 Search service error: HTTP ${err.response.status}
- other → 502 Search service error: ${err.message}
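The mapping above can be sketched as a pure function that turns an axios-style error into a status and body, leaving the res.writeHead/res.end calls to the caller. `mapSearchError` is a hypothetical name, and the 504 body text is a placeholder since the plan does not fix its wording.

```javascript
// Hypothetical helper: map an axios-style error from the search call to the
// HTTP status and text/plain body the sitemap endpoint should return.
function mapSearchError(err) {
  // Timeout or cancelled request → 504 (message text is a placeholder)
  if (err.code === 'ECONNABORTED' || err.code === 'ERR_CANCELED') {
    return { status: 504, body: 'Search service timeout' };
  }
  // Upstream answered with a non-2xx status → 502
  if (err.response) {
    return { status: 502, body: `Search service error: HTTP ${err.response.status}` };
  }
  // Anything else (DNS failure, connection refused, ...) → 502
  return { status: 502, body: `Search service error: ${err.message}` };
}

console.log(mapSearchError({ code: 'ECONNABORTED' }));
console.log(mapSearchError({ response: { status: 503 } }));
console.log(mapSearchError(new Error('connect ECONNREFUSED')));
```

The timeout check comes first so that a timed-out request is reported as 504 even if the error object also carries a response.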
getValidToken helper (shared inline function; extract from existing OIDC flow):
Encapsulates steps 2–6 of the existing flow:
- hGet('authorization', 'token') / hGet('authorization', 'expiry')
- Cache hit → return token
- Stampede guard → queue on in-flight promise
- Cache miss → axios.post(tokenUrl, ...) → hSet both fields
- Returns the id_token string; throws on failure
Token fetch failure in sitemap context: If getValidToken throws, the outer catch
returns 401 Unauthorized: <message> (same as existing flow).
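The stampede guard described above can be sketched with a shared in-flight promise. This is a self-contained illustration, not the existing implementation. The factory `createGetValidToken`, the `fetchToken` callback (standing in for the axios.post call) and its `{ idToken, expiresAtMs }` result are all hypothetical, as is the assumption that expiry is stored as an epoch-millisecond string.

```javascript
// Sketch of a stampede-guarded token helper. `cache` needs async hGet/hSet;
// `fetchToken` stands in for axios.post(tokenUrl, ...) and is assumed to
// resolve to { idToken, expiresAtMs }.
function createGetValidToken(cache, fetchToken, now = () => Date.now()) {
  let inFlight = null; // shared promise while a refresh is running

  async function refresh() {
    const { idToken, expiresAtMs } = await fetchToken();
    await cache.hSet('authorization', 'token', idToken);
    await cache.hSet('authorization', 'expiry', String(expiresAtMs));
    return idToken;
  }

  return async function getValidToken() {
    const token = await cache.hGet('authorization', 'token');
    const expiry = Number(await cache.hGet('authorization', 'expiry'));
    if (token && expiry > now()) return token; // cache hit

    if (!inFlight) {
      // Cache miss: start one refresh and clear the guard when it settles,
      // so a failed refresh does not block later attempts.
      inFlight = refresh().finally(() => { inFlight = null; });
    }
    return inFlight; // concurrent callers queue on the same promise
  };
}

// In-memory stand-in for the Redis client, for illustration only.
function createFakeCache() {
  const store = new Map();
  return {
    async hGet(key, field) { return store.get(`${key}:${field}`) ?? null; },
    async hSet(key, field, value) { store.set(`${key}:${field}`, value); },
  };
}
```

With this shape, two sitemap requests arriving during a cache miss trigger a single token request. A rejection from `fetchToken` propagates to every queued caller, which is where the outer catch would produce the 401 response.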
Test Plan
Unit tests (tests/unit/proxy.test.js) — new describe('sitemap flow') block:
| Scenario | Mock | Assert |
|---|---|---|
| Happy path: items present | axios.get → { items: [{ 'vkm:url': '...' }] } | 200, application/xml, <loc> |
| Happy path: zero items | axios.get → { items: [] } | 200, empty <urlset/> |
| Items with empty vkm:url | mix of valid + empty | only non-empty items in output |
| Missing searchApiBaseUrl | settings without field | 500, descriptive message |
| Missing tenant | settings without field | 500, descriptive message |
| Missing proxyBaseUrl | settings without field | 500, descriptive message |
| Upstream 503 | axios.get rejects with { response: { status: 503 } } | 502 |
| Upstream timeout | axios.get rejects with { code: 'ECONNABORTED' } | 504 |
| Non-sitemap URL still works | req.url = '/' | existing 200 Authorized behaviour |
Contract tests (tests/contract/proxy-http.test.js) — new describe('sitemap endpoint') block:
| Scenario | Setup | Assert |
|---|---|---|
| Full round-trip: GET /sitemap.xml | Mock search server → 200 { items: [...] } | 200, application/xml, valid XML with <loc> |
| Empty results | Mock search server → 200 { items: [] } | 200, application/xml, empty <urlset/> |
| Search server returns 503 | Mock → 503 | 502 |
| Search server hangs > 10 s | Mock → never respond | 504 |
Complexity Tracking
No violations to justify. All gates pass. No entries required.