Files
Peter.Morton f840587e5e feat: content fetch, sitemap fixes, remove oidcAuthFlow
- Add contentFetchFlow() to proxy (FR-001 through FR-012)
- Add extractArticleBody() helper with vkm:articleBody / articleBody fallback
- Dynamic proxyBaseUrl derivation from x-forwarded-proto/host headers
- Forward query/size/category params on /sitemap.xml requests
- Add Accept: application/ld+json header to content API calls
- Remove oidcAuthFlow() - unmatched requests now return 404 Not Found
- Fix xmlbuilder2 import: default import, call as xmlbuilder2.create(...)
- Version bump 0.2.0 → 0.3.0
- 45/45 tests passing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-23 16:40:06 -05:00

249 lines
11 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Implementation Plan: Sitemap XML Generation
**Branch**: `002-sitemap-generation` | **Date**: 2025-07-14 | **Spec**: [spec.md](./spec.md)
**Input**: Feature specification from `/specs/002-sitemap-generation/spec.md`
---
## Summary
Add a `GET /sitemap.xml` route to `kmeContentSourceAdapter.js`. The adapter detects sitemap
requests by URL suffix, obtains a valid OIDC `id_token` from the Redis cache (reusing the
existing stampede-guarded refresh logic), calls the KME Knowledge Search Service, maps each
result's `vkm:url` field to a `<loc>` entry, and returns a standards-compliant XML Sitemap as
`application/xml`. All existing non-sitemap requests are unaffected. Three new fields are added
to `kme_CSA_settings.json` (`searchApiBaseUrl`, `tenant`, `proxyBaseUrl`).
---
## Technical Context
**Language/Version**: Node.js ≥18, ESM (`"type": "module"`)
**Primary Dependencies**: `axios` (HTTP), `redis` (token cache), `xmlbuilder2` (XML — already injected as `xmlbuilder2`), `uuid`, `jsonwebtoken` — all already in `package.json`
**Storage**: Redis read/write (`hGet`/`hSet`) for OIDC token cache only — no new storage
**Testing**: Node.js built-in test runner (`node:test`); no external test framework
**Target Platform**: Linux server / container (HTTP proxy adapter)
**Project Type**: HTTP proxy adapter (web-service)
**Performance Goals**: Sitemap response < 5 s p95 under normal conditions (SC-001); error responses < 10 s (SC-005)
**Constraints**:
- Zero `import`/`export` in `kmeContentSourceAdapter.js` (runs in `vm.createContext`)
- No references to `config`, `global.config`, or `process.env` in proxy script
- XML built exclusively with the injected `xmlbuilder2` (FR-008)
- No new npm packages; no new source files (monolithic architecture — Section I of constitution)
**Scale/Scope**: Single tenant per deployment; all search results in one API call (no pagination, v1)
---
## Constitution Check
*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*
| # | Principle | Status | Notes |
|---|---|---|---|
| I | Monolithic architecture | ✅ PASS | All new code added to `kmeContentSourceAdapter.js`; no new source files |
| I (vm.Script) | Zero imports/exports in proxy script | ✅ PASS | Sitemap logic is inlined; no import statements introduced |
| I.0 | No forbidden globals (`config`, `global.config`, `process.env`) | ✅ PASS | Only `kme_CSA_settings`, `redis`, `axios`, `xmlbuilder2`, `req`, `res` used |
| I.I | Business logic in proxy.js | ✅ PASS | Auth, API call, XML generation all in `kmeContentSourceAdapter.js` |
| I.II | Separate files only for allowed categories | ✅ PASS | Settings JSON in `src/globalVariables/` (existing pattern) |
| I.III | No new files challenged | ✅ PASS | No new files in `src/` |
| I.IV | New config in `src/globalVariables/` not `config/default.json` | ✅ PASS | Three fields added to `kme_CSA_settings.json` |
| I.V | `xmlbuilder2` already in `globalVMContext` | ✅ PASS | `xmlbuilder2` `create` already injected; no server.js changes needed |
| II | API-First Design | ✅ PASS | HTTP contract documented in `contracts/sitemap-endpoint.md` |
| III | Test-First Development | ✅ REQUIRED | Unit + contract tests must be written before/alongside implementation |
| VII | No new dependencies | ✅ PASS | All required packages already installed (`xmlbuilder2`, `axios`, `redis`) |
**Post-design re-check**: All gates still pass. The design introduces zero new files, zero new dependencies, and zero architectural violations.
---
## Project Structure
### Documentation (this feature)
```text
specs/002-sitemap-generation/
├── plan.md # This file (/speckit.plan command output)
├── spec.md # Feature specification
├── research.md # Phase 0 output (/speckit.plan command)
├── data-model.md # Phase 1 output (/speckit.plan command)
├── quickstart.md # Phase 1 output (/speckit.plan command)
├── contracts/ # Phase 1 output (/speckit.plan command)
│ └── sitemap-endpoint.md
└── tasks.md # Phase 2 output (/speckit.tasks command - NOT created by /speckit.plan)
```
### Source Code (repository root)
```text
src/
├── proxyScripts/
│ └── kmeContentSourceAdapter.js # MODIFIED: sitemap branch + token helper added
├── globalVariables/
│ ├── kme_CSA_settings.json # MODIFIED: 3 new fields (searchApiBaseUrl, tenant, proxyBaseUrl)
│ └── kme_CSA_settings.json.example # MODIFIED: updated with new field placeholders
└── server.js # NO CHANGE
tests/
├── unit/
│ └── proxy.test.js # MODIFIED: sitemap test cases added
└── contract/
└── proxy-http.test.js # MODIFIED: sitemap HTTP contract tests added
```
**Structure Decision**: Single-project layout. No new directories. Only the proxy script, its
settings JSON, and the existing test files are modified.
---
## Phase 0: Research Findings
> Full research notes: [research.md](./research.md)
| Research ID | Topic | Decision |
|---|---|---|
| R-001 | Token reuse | Inline shared `getValidToken()` helper in proxy script; branch on URL first |
| R-002 | Search API response shape | Assume `{ items: [...] }`; verify against live API during implementation |
| R-003 | xmlbuilder2 API | `xmlbuilder2({...}).ele('urlset',{xmlns:...})…doc.end({})` — no prettyPrint |
| R-004 | Error mapping | Reuse `err.response` / `err.code === ECONNABORTED\|ERR_CANCELED` pattern |
| R-005 | Settings validation | `requiredSitemapFields` guard before any async work → HTTP 500 |
| R-006 | `loc` construction | `` `${proxyBaseUrl}?kmeURL=${encodeURIComponent(item['vkm:url'])}` `` |
**Resolved NEEDS CLARIFICATION**: None remain. All decisions are documented.
---
## Phase 1: Design
### Data Model
> Full data model: [data-model.md](./data-model.md)
**Key entities**:
- `KnowledgeItem` — raw search result with `vkm:url` (read-only, from upstream API)
- `SitemapEntry` — `{ loc: string }` derived in-memory from `KnowledgeItem`
- `SitemapDocument` — serialised XML output (`urlset` + `url` elements)
- `OIDCTokenCache` — shared Redis store (unchanged; `hGet`/`hSet` pattern reused)
- `kme_CSA_settings` — extended JSON settings (3 new fields)
### Contracts
> Full contract: [contracts/sitemap-endpoint.md](./contracts/sitemap-endpoint.md)
| Scenario | Status | Response |
|---|---|---|
| Search succeeds, items present | 200 | `application/xml` sitemap with `<url>` entries |
| Search succeeds, zero items | 200 | `application/xml` empty `<urlset/>` |
| Missing settings field | 500 | `text/plain` descriptive message |
| Upstream non-2xx | 502 | `text/plain` upstream error |
| Upstream timeout | 504 | `text/plain` timeout message |
### Implementation Design
**Entry point restructure** (single IIFE, no imports):
```javascript
(async () => {
// FR-001: Route on URL suffix
if (req.url.endsWith('/sitemap.xml')) {
await sitemapFlow();
} else {
await oidcAuthFlow(); // existing logic, moved to inner async function
}
})();
```
**`sitemapFlow` logic**:
```javascript
async function sitemapFlow() {
// FR-011: Validate required settings
const required = ['searchApiBaseUrl', 'tenant', 'proxyBaseUrl'];
for (const f of required) {
if (!kme_CSA_settings[f]) {
res.writeHead(500, { 'Content-Type': 'text/plain' });
res.end('Configuration error: missing required field: ' + f);
return;
}
}
// FR-003: Obtain valid OIDC token (shared helper with existing flow)
const token = await getValidToken(); // throws on failure → caught by outer try/catch
// FR-002: Call KME Knowledge Search Service
const { searchApiBaseUrl, tenant, proxyBaseUrl } = kme_CSA_settings;
const searchResponse = await axios.get(
`${searchApiBaseUrl}/${tenant}`,
{
headers: { Authorization: `OIDC_id_token ${token}` },
timeout: 10_000,
}
);
// Extract items (R-002: assume { items: [...] } or bare array)
const items = searchResponse.data.items ?? searchResponse.data ?? [];
// FR-004, FR-005, FR-006, FR-008: Build sitemap XML
const doc = xmlbuilder2({ version: '1.0', encoding: 'UTF-8' });
const urlset = doc.ele('urlset', { xmlns: 'http://www.sitemaps.org/schemas/sitemap/0.9' });
for (const item of items) {
const vkmUrl = item['vkm:url'];
if (!vkmUrl) continue; // FR-006: omit silently
const loc = `${proxyBaseUrl}?kmeURL=${encodeURIComponent(vkmUrl)}`;
urlset.ele('url').ele('loc').txt(loc).up().up();
}
const xml = doc.end({ prettyPrint: false });
// FR-007: Respond
res.writeHead(200, { 'Content-Type': 'application/xml' });
res.end(xml);
}
```
**Error handling** (wrapping `sitemapFlow` catch):
- `err.code === 'ECONNABORTED' || err.code === 'ERR_CANCELED'` → 504
- `err.response` defined → 502 `Search service error: HTTP ${err.response.status}`
- other → 502 `Search service error: ${err.message}`
**`getValidToken` helper** (shared inline function; extract from existing OIDC flow):
Encapsulates steps 26 of the existing flow:
- `hGet('authorization', 'token')` / `hGet('authorization', 'expiry')`
- Cache hit → return token
- Stampede guard → queue on in-flight promise
- Cache miss → `axios.post(tokenUrl, ...)` → `hSet` both fields
- Returns the `id_token` string; throws on failure
**Token fetch failure in sitemap context**: If `getValidToken` throws, the outer catch
returns `401 Unauthorized: <message>` (same as existing flow).
### Test Plan
**Unit tests** (`tests/unit/proxy.test.js`) — new `describe('sitemap flow')` block:
| Scenario | Mock | Assert |
|---|---|---|
| Happy path: items present | axios.get → `{ items: [{ 'vkm:url': '...' }] }` | 200, `application/xml`, `<loc>` |
| Happy path: zero items | axios.get → `{ items: [] }` | 200, empty `<urlset/>` |
| Items with empty vkm:url | mix of valid + empty | only non-empty items in output |
| Missing `searchApiBaseUrl` | settings without field | 500, descriptive message |
| Missing `tenant` | settings without field | 500, descriptive message |
| Missing `proxyBaseUrl` | settings without field | 500, descriptive message |
| Upstream 503 | axios.get rejects with `{ response: { status: 503 } }` | 502 |
| Upstream timeout | axios.get rejects with `{ code: 'ECONNABORTED' }` | 504 |
| Non-sitemap URL still works | req.url = '/' | existing 200 Authorized behaviour |
**Contract tests** (`tests/contract/proxy-http.test.js`) — new `describe('sitemap endpoint')` block:
| Scenario | Setup | Assert |
|---|---|---|
| Full round-trip: GET /sitemap.xml | Mock search server → 200 `{ items: [...] }` | 200, `application/xml`, valid XML with `<loc>` |
| Empty results | Mock search server → 200 `{ items: [] }` | 200, `application/xml`, empty `<urlset/>` |
| Search server returns 503 | Mock → 503 | 502 |
| Search server hangs > 10 s | Mock → never respond | 504 |
---
## Complexity Tracking
> No violations to justify. All gates pass. No entries required.