Files
Peter.Morton f840587e5e feat: content fetch, sitemap fixes, remove oidcAuthFlow
- Add contentFetchFlow() to proxy (FR-001 through FR-012)
- Add extractArticleBody() helper with vkm:articleBody / articleBody fallback
- Dynamic proxyBaseUrl derivation from x-forwarded-proto/host headers
- Forward query/size/category params on /sitemap.xml requests
- Add Accept: application/ld+json header to content API calls
- Remove oidcAuthFlow() - unmatched requests now return 404 Not Found
- Fix xmlbuilder2 import: default import, call as xmlbuilder2.create(...)
- Version bump 0.2.0 → 0.3.0
- 45/45 tests passing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-23 16:40:06 -05:00

11 KiB
Raw Permalink Blame History

Implementation Plan: Sitemap XML Generation

Branch: 002-sitemap-generation | Date: 2025-07-14 | Spec: spec.md Input: Feature specification from /specs/002-sitemap-generation/spec.md


Summary

Add a GET /sitemap.xml route to kmeContentSourceAdapter.js. The adapter detects sitemap requests by URL suffix, obtains a valid OIDC id_token from the Redis cache (reusing the existing stampede-guarded refresh logic), calls the KME Knowledge Search Service, maps each result's vkm:url field to a <loc> entry, and returns a standards-compliant XML Sitemap as application/xml. All existing non-sitemap requests are unaffected. Three new fields are added to kme_CSA_settings.json (searchApiBaseUrl, tenant, proxyBaseUrl).


Technical Context

Language/Version: Node.js ≥18, ESM ("type": "module") Primary Dependencies: axios (HTTP), redis (token cache), xmlbuilder2 (XML — already injected as xmlbuilder2), uuid, jsonwebtoken — all already in package.json Storage: Redis read/write (hGet/hSet) for OIDC token cache only — no new storage Testing: Node.js built-in test runner (node:test); no external test framework Target Platform: Linux server / container (HTTP proxy adapter) Project Type: HTTP proxy adapter (web-service) Performance Goals: Sitemap response < 5 s p95 under normal conditions (SC-001); error responses < 10 s (SC-005) Constraints:

  • Zero import/export in kmeContentSourceAdapter.js (runs in vm.createContext)
  • No references to config, global.config, or process.env in proxy script
  • XML built exclusively with the injected xmlbuilder2 (FR-008)
  • No new npm packages; no new source files (monolithic architecture — Section I of constitution) Scale/Scope: Single tenant per deployment; all search results in one API call (no pagination, v1)

Constitution Check

GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.

# Principle Status Notes
I Monolithic architecture PASS All new code added to kmeContentSourceAdapter.js; no new source files
I (vm.Script) Zero imports/exports in proxy script PASS Sitemap logic is inlined; no import statements introduced
I.0 No forbidden globals (config, global.config, process.env) PASS Only kme_CSA_settings, redis, axios, xmlbuilder2, req, res used
I.I Business logic in proxy.js PASS Auth, API call, XML generation all in kmeContentSourceAdapter.js
I.II Separate files only for allowed categories PASS Settings JSON in src/globalVariables/ (existing pattern)
I.III No new files challenged PASS No new files in src/
I.IV New config in src/globalVariables/ not config/default.json PASS Three fields added to kme_CSA_settings.json
I.V xmlbuilder2 already in globalVMContext PASS xmlbuilder2 create already injected; no server.js changes needed
II API-First Design PASS HTTP contract documented in contracts/sitemap-endpoint.md
III Test-First Development REQUIRED Unit + contract tests must be written before/alongside implementation
VII No new dependencies PASS All required packages already installed (xmlbuilder2, axios, redis)

Post-design re-check: All gates still pass. The design introduces zero new files, zero new dependencies, and zero architectural violations.


Project Structure

Documentation (this feature)

specs/002-sitemap-generation/
├── plan.md              # This file (/speckit.plan command output)
├── spec.md              # Feature specification
├── research.md          # Phase 0 output (/speckit.plan command)
├── data-model.md        # Phase 1 output (/speckit.plan command)
├── quickstart.md        # Phase 1 output (/speckit.plan command)
├── contracts/           # Phase 1 output (/speckit.plan command)
│   └── sitemap-endpoint.md
└── tasks.md             # Phase 2 output (/speckit.tasks command - NOT created by /speckit.plan)

Source Code (repository root)

src/
├── proxyScripts/
│   └── kmeContentSourceAdapter.js   # MODIFIED: sitemap branch + token helper added
├── globalVariables/
│   ├── kme_CSA_settings.json        # MODIFIED: 3 new fields (searchApiBaseUrl, tenant, proxyBaseUrl)
│   └── kme_CSA_settings.json.example # MODIFIED: updated with new field placeholders
└── server.js                         # NO CHANGE

tests/
├── unit/
│   └── proxy.test.js                 # MODIFIED: sitemap test cases added
└── contract/
    └── proxy-http.test.js            # MODIFIED: sitemap HTTP contract tests added

Structure Decision: Single-project layout. No new directories. Only the proxy script, its settings JSON, and the existing test files are modified.


Phase 0: Research Findings

Full research notes: research.md

Research ID Topic Decision
R-001 Token reuse Inline shared getValidToken() helper in proxy script; branch on URL first
R-002 Search API response shape Assume { items: [...] }; verify against live API during implementation
R-003 xmlbuilder2 API xmlbuilder2({...}).ele('urlset',{xmlns:...})…doc.end({}) — no prettyPrint
R-004 Error mapping Reuse err.response / err.code === ECONNABORTED|ERR_CANCELED pattern
R-005 Settings validation requiredSitemapFields guard before any async work → HTTP 500
R-006 loc construction `${proxyBaseUrl}?kmeURL=${encodeURIComponent(item['vkm:url'])}`

Resolved NEEDS CLARIFICATION: None remain. All decisions are documented.


Phase 1: Design

Data Model

Full data model: data-model.md

Key entities:

  • KnowledgeItem — raw search result with vkm:url (read-only, from upstream API)
  • SitemapEntry{ loc: string } derived in-memory from KnowledgeItem
  • SitemapDocument — serialised XML output (urlset + url elements)
  • OIDCTokenCache — shared Redis store (unchanged; hGet/hSet pattern reused)
  • kme_CSA_settings — extended JSON settings (3 new fields)

Contracts

Full contract: contracts/sitemap-endpoint.md

Scenario Status Response
Search succeeds, items present 200 application/xml sitemap with <url> entries
Search succeeds, zero items 200 application/xml empty <urlset/>
Missing settings field 500 text/plain descriptive message
Upstream non-2xx 502 text/plain upstream error
Upstream timeout 504 text/plain timeout message

Implementation Design

Entry point restructure (single IIFE, no imports):

(async () => {
  // FR-001: Route on URL suffix
  if (req.url.endsWith('/sitemap.xml')) {
    await sitemapFlow();
  } else {
    await oidcAuthFlow();   // existing logic, moved to inner async function
  }
})();

sitemapFlow logic:

async function sitemapFlow() {
  // FR-011: Validate required settings
  const required = ['searchApiBaseUrl', 'tenant', 'proxyBaseUrl'];
  for (const f of required) {
    if (!kme_CSA_settings[f]) {
      res.writeHead(500, { 'Content-Type': 'text/plain' });
      res.end('Configuration error: missing required field: ' + f);
      return;
    }
  }

  // FR-003: Obtain valid OIDC token (shared helper with existing flow)
  const token = await getValidToken();   // throws on failure → caught by outer try/catch

  // FR-002: Call KME Knowledge Search Service
  const { searchApiBaseUrl, tenant, proxyBaseUrl } = kme_CSA_settings;
  const searchResponse = await axios.get(
    `${searchApiBaseUrl}/${tenant}`,
    {
      headers: { Authorization: `OIDC_id_token ${token}` },
      timeout: 10_000,
    }
  );

  // Extract items (R-002: assume { items: [...] } or bare array)
  const items = searchResponse.data.items ?? searchResponse.data ?? [];

  // FR-004, FR-005, FR-006, FR-008: Build sitemap XML
  const doc = xmlbuilder2({ version: '1.0', encoding: 'UTF-8' });
  const urlset = doc.ele('urlset', { xmlns: 'http://www.sitemaps.org/schemas/sitemap/0.9' });
  for (const item of items) {
    const vkmUrl = item['vkm:url'];
    if (!vkmUrl) continue;                                // FR-006: omit silently
    const loc = `${proxyBaseUrl}?kmeURL=${encodeURIComponent(vkmUrl)}`;
    urlset.ele('url').ele('loc').txt(loc).up().up();
  }
  const xml = doc.end({ prettyPrint: false });

  // FR-007: Respond
  res.writeHead(200, { 'Content-Type': 'application/xml' });
  res.end(xml);
}

Error handling (wrapping sitemapFlow catch):

  • err.code === 'ECONNABORTED' || err.code === 'ERR_CANCELED' → 504
  • err.response defined → 502 Search service error: HTTP ${err.response.status}
  • other → 502 Search service error: ${err.message}

getValidToken helper (shared inline function; extract from existing OIDC flow):

Encapsulates steps 26 of the existing flow:

  • hGet('authorization', 'token') / hGet('authorization', 'expiry')
  • Cache hit → return token
  • Stampede guard → queue on in-flight promise
  • Cache miss → axios.post(tokenUrl, ...)hSet both fields
  • Returns the id_token string; throws on failure

Token fetch failure in sitemap context: If getValidToken throws, the outer catch returns 401 Unauthorized: <message> (same as existing flow).

Test Plan

Unit tests (tests/unit/proxy.test.js) — new describe('sitemap flow') block:

Scenario Mock Assert
Happy path: items present axios.get → { items: [{ 'vkm:url': '...' }] } 200, application/xml, <loc>
Happy path: zero items axios.get → { items: [] } 200, empty <urlset/>
Items with empty vkm:url mix of valid + empty only non-empty items in output
Missing searchApiBaseUrl settings without field 500, descriptive message
Missing tenant settings without field 500, descriptive message
Missing proxyBaseUrl settings without field 500, descriptive message
Upstream 503 axios.get rejects with { response: { status: 503 } } 502
Upstream timeout axios.get rejects with { code: 'ECONNABORTED' } 504
Non-sitemap URL still works req.url = '/' existing 200 Authorized behaviour

Contract tests (tests/contract/proxy-http.test.js) — new describe('sitemap endpoint') block:

Scenario Setup Assert
Full round-trip: GET /sitemap.xml Mock search server → 200 { items: [...] } 200, application/xml, valid XML with <loc>
Empty results Mock search server → 200 { items: [] } 200, application/xml, empty <urlset/>
Search server returns 503 Mock → 503 502
Search server hangs > 10 s Mock → never respond 504

Complexity Tracking

No violations to justify. All gates pass. No entries required.