feat: content fetch, sitemap fixes, remove oidcAuthFlow

- Add contentFetchFlow() to proxy (FR-001 through FR-012)
- Add extractArticleBody() helper with vkm:articleBody / articleBody fallback
- Dynamic proxyBaseUrl derivation from x-forwarded-proto/host headers
- Forward query/size/category params on /sitemap.xml requests
- Add Accept: application/ld+json header to content API calls
- Remove oidcAuthFlow() - unmatched requests now return 404 Not Found
- Fix xmlbuilder2 import: default import, call as xmlbuilder2.create(...)
- Version bump 0.2.0 → 0.3.0
- 45/45 tests passing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-23 16:40:06 -05:00
parent d50f041488
commit f840587e5e
29 changed files with 1998 additions and 352 deletions
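
The commit message names two behaviours without showing them: deriving `proxyBaseUrl` from the `x-forwarded-proto`/host headers, and `extractArticleBody()` falling back from `vkm:articleBody` to `articleBody`. A minimal sketch of how such helpers might look follows. The `deriveProxyBaseUrl` name, the `fallbackBaseUrl` argument, the use of `x-forwarded-host` before `host`, and the empty-string default are all assumptions for illustration, not code taken from this commit.

```javascript
// Hypothetical sketch; only the header names and the two JSON-LD keys
// come from the commit message.

// Derive the externally visible base URL from reverse-proxy headers.
// Falls back to a caller-supplied value when no host header is present
// (assumed behaviour).
function deriveProxyBaseUrl(headers, fallbackBaseUrl) {
  // A chain of proxies may send comma-separated values; take the first.
  const first = (value) =>
    typeof value === 'string' ? value.split(',')[0].trim() : '';

  const proto = first(headers['x-forwarded-proto']);
  const host = first(headers['x-forwarded-host']) || first(headers['host']);

  if (!host) return fallbackBaseUrl;
  return `${proto || 'http'}://${host}`;
}

// Prefer the namespaced key, then the plain key; return '' when neither
// holds a string (assumed default).
function extractArticleBody(doc) {
  if (doc === null || typeof doc !== 'object') return '';
  const body = doc['vkm:articleBody'] ?? doc.articleBody;
  return typeof body === 'string' ? body : '';
}
```

Node.js lower-cases incoming header names, which is why the lookups use lower-case keys.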


@@ -6,7 +6,7 @@
## Content Quality
-- [x] No implementation details (languages, frameworks, APIs) — *Note: FR-008/FR-009 reference `xmlBuilder` and the VM sandbox constraint. These are explicitly mandated architectural constraints from the feature description, not incidental implementation choices; they belong in the spec as requirements.*
+- [x] No implementation details (languages, frameworks, APIs) — *Note: FR-008/FR-009 reference `xmlbuilder2` and the VM sandbox constraint. These are explicitly mandated architectural constraints from the feature description, not incidental implementation choices; they belong in the spec as requirements.*
- [x] Focused on user value and business needs
- [x] Written for non-technical stakeholders — *Technical terms (Redis, OIDC) are domain-specific to this integration; they cannot be abstracted away without losing meaning.*
- [x] All mandatory sections completed — User Scenarios, Requirements, Success Criteria, Assumptions all present
@@ -18,7 +18,7 @@
- [x] Success criteria are measurable — SC-001 (5-second response time), SC-002 (zero silent drops), SC-003 (zero regressions), SC-004 (XSD validation), SC-005 (10-second error bound)
- [x] Success criteria are technology-agnostic — SC-004 references the public Sitemaps XSD standard, not an internal tool
- [x] All acceptance scenarios are defined — 8 acceptance scenarios across 3 user stories
-- [x] Edge cases are identified — 5 edge cases documented (expired token, missing `vkm:url`, large result sets, missing settings, missing `xmlBuilder`)
+- [x] Edge cases are identified — 5 edge cases documented (expired token, missing `vkm:url`, large result sets, missing settings, missing `xmlbuilder2`)
- [x] Scope is clearly bounded — v1 scope explicitly excludes pagination, multi-tenant, and optional sitemap elements
- [x] Dependencies and assumptions identified — 8 assumptions documented


@@ -156,7 +156,7 @@ Incoming GET /…/sitemap.xml
(skip empty vkm:url)
|
v
-Build SitemapDocument (xmlBuilder)
+Build SitemapDocument (xmlbuilder2)
|
v
200 OK


@@ -19,7 +19,7 @@ to `kme_CSA_settings.json` (`searchApiBaseUrl`, `tenant`, `proxyBaseUrl`).
## Technical Context
**Language/Version**: Node.js ≥18, ESM (`"type": "module"`)
-**Primary Dependencies**: `axios` (HTTP), `redis` (token cache), `xmlbuilder2` (XML — already injected as `xmlBuilder`), `uuid`, `jsonwebtoken` — all already in `package.json`
+**Primary Dependencies**: `axios` (HTTP), `redis` (token cache), `xmlbuilder2` (XML — already injected as `xmlbuilder2`), `uuid`, `jsonwebtoken` — all already in `package.json`
**Storage**: Redis read/write (`hGet`/`hSet`) for OIDC token cache only — no new storage
**Testing**: Node.js built-in test runner (`node:test`); no external test framework
**Target Platform**: Linux server / container (HTTP proxy adapter)
@@ -28,7 +28,7 @@ to `kme_CSA_settings.json` (`searchApiBaseUrl`, `tenant`, `proxyBaseUrl`).
**Constraints**:
- Zero `import`/`export` in `kmeContentSourceAdapter.js` (runs in `vm.createContext`)
- No references to `config`, `global.config`, or `process.env` in proxy script
-- XML built exclusively with the injected `xmlBuilder` (FR-008)
+- XML built exclusively with the injected `xmlbuilder2` (FR-008)
- No new npm packages; no new source files (monolithic architecture — Section I of constitution)
**Scale/Scope**: Single tenant per deployment; all search results in one API call (no pagination, v1)
@@ -42,12 +42,12 @@ to `kme_CSA_settings.json` (`searchApiBaseUrl`, `tenant`, `proxyBaseUrl`).
|---|---|---|---|
| I | Monolithic architecture | ✅ PASS | All new code added to `kmeContentSourceAdapter.js`; no new source files |
| I (vm.Script) | Zero imports/exports in proxy script | ✅ PASS | Sitemap logic is inlined; no import statements introduced |
-| I.0 | No forbidden globals (`config`, `global.config`, `process.env`) | ✅ PASS | Only `kme_CSA_settings`, `redis`, `axios`, `xmlBuilder`, `req`, `res` used |
+| I.0 | No forbidden globals (`config`, `global.config`, `process.env`) | ✅ PASS | Only `kme_CSA_settings`, `redis`, `axios`, `xmlbuilder2`, `req`, `res` used |
| I.I | Business logic in proxy.js | ✅ PASS | Auth, API call, XML generation all in `kmeContentSourceAdapter.js` |
| I.II | Separate files only for allowed categories | ✅ PASS | Settings JSON in `src/globalVariables/` (existing pattern) |
| I.III | No new files challenged | ✅ PASS | No new files in `src/` |
| I.IV | New config in `src/globalVariables/` not `config/default.json` | ✅ PASS | Three fields added to `kme_CSA_settings.json` |
-| I.V | `xmlBuilder` already in `globalVMContext` | ✅ PASS | `xmlbuilder2` `create` already injected; no server.js changes needed |
+| I.V | `xmlbuilder2` already in `globalVMContext` | ✅ PASS | `xmlbuilder2` `create` already injected; no server.js changes needed |
| II | API-First Design | ✅ PASS | HTTP contract documented in `contracts/sitemap-endpoint.md` |
| III | Test-First Development | ✅ REQUIRED | Unit + contract tests must be written before/alongside implementation |
| VII | No new dependencies | ✅ PASS | All required packages already installed (`xmlbuilder2`, `axios`, `redis`) |
@@ -103,7 +103,7 @@ settings JSON, and the existing test files are modified.
|---|---|---|
| R-001 | Token reuse | Inline shared `getValidToken()` helper in proxy script; branch on URL first |
| R-002 | Search API response shape | Assume `{ items: [...] }`; verify against live API during implementation |
-| R-003 | xmlbuilder2 API | `xmlBuilder({...}).ele('urlset',{xmlns:...})…doc.end({})` — no prettyPrint |
+| R-003 | xmlbuilder2 API | `xmlbuilder2({...}).ele('urlset',{xmlns:...})…doc.end({})` — no prettyPrint |
| R-004 | Error mapping | Reuse `err.response` / `err.code === ECONNABORTED\|ERR_CANCELED` pattern |
| R-005 | Settings validation | `requiredSitemapFields` guard before any async work → HTTP 500 |
| R-006 | `loc` construction | `` `${proxyBaseUrl}?kmeURL=${encodeURIComponent(item['vkm:url'])}` `` |
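
Rows R-004 and R-005 above name a settings guard and an error-mapping pattern without spelling them out. The sketch below shows one way the two could be written. The helper names `missingSitemapFields` and `mapUpstreamError`, the message strings, and the 502/504 status choices are assumptions; the plan only states that missing settings yield HTTP 500 and that `err.response` and `err.code` are inspected.

```javascript
// Field names come from FR-010/FR-011; everything else is illustrative.
const requiredSitemapFields = ['searchApiBaseUrl', 'tenant', 'proxyBaseUrl'];

// Returns the names of required fields that are absent or empty.
// A non-empty result would translate into an HTTP 500 before any async work.
function missingSitemapFields(settings) {
  return requiredSitemapFields.filter((field) => !settings?.[field]);
}

// Maps an axios-style error to a status and a human-readable message.
// 504 for timeouts/cancellations and 502 otherwise are assumed choices.
function mapUpstreamError(err) {
  if (err.code === 'ECONNABORTED' || err.code === 'ERR_CANCELED') {
    return { status: 504, message: 'Search API request timed out' };
  }
  if (err.response) {
    return {
      status: 502,
      message: `Search API responded with ${err.response.status}`,
    };
  }
  return { status: 502, message: 'Search API is unreachable' };
}
```

Returning the list of missing names, rather than a boolean, makes it easy to build the descriptive message that FR-011 asks for.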
@@ -183,7 +183,7 @@ async function sitemapFlow() {
const items = searchResponse.data.items ?? searchResponse.data ?? [];
// FR-004, FR-005, FR-006, FR-008: Build sitemap XML
-const doc = xmlBuilder({ version: '1.0', encoding: 'UTF-8' });
+const doc = xmlbuilder2({ version: '1.0', encoding: 'UTF-8' });
const urlset = doc.ele('urlset', { xmlns: 'http://www.sitemaps.org/schemas/sitemap/0.9' });
for (const item of items) {
const vkmUrl = item['vkm:url'];


@@ -118,9 +118,9 @@ Contract tests live in `tests/contract/proxy-http.test.js`.
- **No new files**: All new logic is added directly to
`src/proxyScripts/kmeContentSourceAdapter.js` (monolithic architecture constraint).
- **No new dependencies**: `xmlbuilder2` is already in `package.json` and injected into the
-VM context as `xmlBuilder`.
+VM context as `xmlbuilder2`.
- **Token reuse**: The sitemap flow reuses the existing Redis `hGet`/token-refresh pattern —
no separate auth logic.
- **VM isolation**: The proxy script runs in a `vm.createContext` sandbox. It has access only
-to the injected globals listed in `src/server.js` (`axios`, `redis`, `xmlBuilder`,
+to the injected globals listed in `src/server.js` (`axios`, `redis`, `xmlbuilder2`,
`kme_CSA_settings`, `req`, `res`, `console`, `URLSearchParams`, `URL`, `crypto`).
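
The VM isolation note above can be illustrated with Node's built-in `vm` module. This is a standalone sketch, not the project's `src/server.js`: the sandbox contents and the placeholder settings object are assumptions, and it uses CommonJS `require` only so that it runs as a single script.

```javascript
const vm = require('node:vm');

// Only the names placed on this object are visible to the sandboxed script.
const sandbox = vm.createContext({
  kme_CSA_settings: { tenant: 'demo' }, // placeholder settings
  console,
});

// The script can read injected globals, but `process` and `require`
// were not injected, so they are not defined inside the context.
const script = new vm.Script(`
  ({
    tenant: kme_CSA_settings.tenant,
    processType: typeof process,
    requireType: typeof require,
  })
`);

const result = script.runInContext(sandbox);
console.log(result.tenant, result.processType, result.requireType);
// prints "demo undefined undefined"
```

This is why the proxy script cannot use `import`, `require`, or `process.env`: anything it needs has to be injected into the context explicitly.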


@@ -72,11 +72,11 @@ adaption to the real shape is a one-line change.
## R-003: xmlbuilder2 `create()` API for Sitemap XML
-**Decision**: Use the `xmlBuilder` context variable (which is `xmlbuilder2`'s `create` function)
+**Decision**: Use the `xmlbuilder2` context variable (which is `xmlbuilder2`'s `create` function)
with the following call chain:
```javascript
-const doc = xmlBuilder({ version: '1.0', encoding: 'UTF-8' });
+const doc = xmlbuilder2({ version: '1.0', encoding: 'UTF-8' });
const urlset = doc.ele('urlset', { xmlns: 'http://www.sitemaps.org/schemas/sitemap/0.9' });
for (const item of items) {
urlset.ele('url').ele('loc').txt(locValue).up().up();
@@ -98,7 +98,7 @@ Unit tests will assert this.
**Alternatives considered**:
- Manual string concatenation: rejected (error-prone escaping, violates FR-008 which requires
-xmlBuilder).
+xmlbuilder2).
- `xmlbuilder` (v1/v2): not the installed package; rejected.
---
@@ -184,7 +184,7 @@ if (!vkmUrl) continue; // omit silently
|---|---|---|
| R-001 | Token reuse | Inline shared token-fetch logic; branch on URL first |
| R-002 | Search API response shape | Assume `{ items: [...] }`; verify against live API |
-| R-003 | xmlbuilder2 API | `xmlBuilder({...}).ele('urlset', {...})…doc.end({})` |
+| R-003 | xmlbuilder2 API | `xmlbuilder2({...}).ele('urlset', {...})…doc.end({})` |
| R-004 | Error mapping | Reuse existing `err.response` / `err.code` pattern |
| R-005 | Settings validation | Explicit `requiredSitemapFields` guard → HTTP 500 |
| R-006 | `loc` construction | `proxyBaseUrl?kmeURL=encodeURIComponent(vkm:url)` |
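
Row R-006, together with the silent-omit rule for empty `vkm:url` values, can be exercised without any XML library. The sketch below isolates that step; the `buildLocs` helper name and the example URLs are assumptions for illustration.

```javascript
// Builds the <loc> values for a sitemap from search result items.
function buildLocs(items, proxyBaseUrl) {
  const locs = [];
  for (const item of items) {
    const vkmUrl = item['vkm:url'];
    if (!vkmUrl) continue; // omit missing or empty URLs silently
    locs.push(`${proxyBaseUrl}?kmeURL=${encodeURIComponent(vkmUrl)}`);
  }
  return locs;
}

const exampleLocs = buildLocs(
  [
    { 'vkm:url': 'https://kme.example.com/articles/42?lang=en' },
    { 'vkm:url': '' },      // skipped: empty
    { title: 'no url' },    // skipped: missing
  ],
  'https://proxy.example.com/content',
);
console.log(exampleLocs);
// [ 'https://proxy.example.com/content?kmeURL=https%3A%2F%2Fkme.example.com%2Farticles%2F42%3Flang%3Den' ]
```

`encodeURIComponent` escapes `:`, `/`, `?` and `=`, so a source URL that carries its own query string stays inside the single `kmeURL` parameter.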


@@ -59,7 +59,7 @@ When the KME Knowledge Search Service is unreachable or returns an error, the ad
- What happens when a knowledge item has a missing or empty `vkm:url` field? That item must be omitted from the sitemap rather than producing a malformed `<loc>` entry.
- What happens when the search API returns a very large number of results? The sitemap should include all returned results; pagination handling is out of scope for v1 (assumption documented below).
- What happens when `searchApiBaseUrl`, `tenant`, or `proxyBaseUrl` are missing from the settings file? The adapter must respond with a `500` error and a descriptive message.
-- What happens when `xmlBuilder` is not available in the VM context? The adapter must respond with a `500` error.
+- What happens when `xmlbuilder2` is not available in the VM context? The adapter must respond with a `500` error.
## Requirements *(mandatory)*
@@ -72,7 +72,7 @@ When the KME Knowledge Search Service is unreachable or returns an error, the ad
- **FR-005**: Each `<loc>` value MUST be constructed as `<proxyBaseUrl>?kmeURL=<vkm:url value>`, where `proxyBaseUrl` is taken from `kme_CSA_settings.proxyBaseUrl`.
- **FR-006**: Knowledge items with a missing or empty `vkm:url` field MUST be silently omitted from the sitemap.
- **FR-007**: The sitemap response MUST be returned with the HTTP header `Content-Type: application/xml`.
-- **FR-008**: The XML MUST be built using the `xmlBuilder` utility already available in the VM context — no additional XML libraries may be imported.
+- **FR-008**: The XML MUST be built using the `xmlbuilder2` utility already available in the VM context — no additional XML libraries may be imported.
- **FR-009**: The proxy script MUST contain zero `import` or `export` statements and MUST NOT reference `config`, `global.config`, or `process.env`.
- **FR-010**: `kme_CSA_settings.json` MUST be extended with three new fields: `searchApiBaseUrl`, `tenant`, and `proxyBaseUrl`.
- **FR-011**: If any required settings field (`searchApiBaseUrl`, `tenant`, `proxyBaseUrl`) is absent at runtime, the adapter MUST respond with HTTP 500 and a descriptive error message.
@@ -100,7 +100,7 @@ When the KME Knowledge Search Service is unreachable or returns an error, the ad
- The KME Knowledge Search Service returns all relevant knowledge items in a single response for v1; pagination of search results is out of scope.
- The `vkm:url` field is present at the top level of each item object in the search results array; the exact response envelope shape will be confirmed against the live API during implementation.
-- The `xmlBuilder` injected into the VM context exposes a builder API compatible with the existing usage in the project (e.g., `fast-xml-parser` `XMLBuilder` or equivalent).
+- The `xmlbuilder2` injected into the VM context exposes a builder API compatible with the existing usage in the project (e.g., `fast-xml-parser` `XMLBuilder` or equivalent).
- No additional `<lastmod>`, `<changefreq>`, or `<priority>` elements are required in sitemap entries for v1; only `<loc>` is mandatory.
- The proxy adapter is deployed behind a reverse proxy or load balancer that handles TLS termination; the `proxyBaseUrl` in settings reflects the externally accessible HTTPS URL.
- A single tenant is configured per adapter deployment; multi-tenant sitemap generation is out of scope.


@@ -42,7 +42,7 @@ OIDC auth flow share a clean entry point. **No user-story work can begin until t
## Phase 3: User Story 1 — Search Crawler Discovers KME Content (Priority: P1) 🎯 MVP
**Goal**: A consumer calling `GET /sitemap.xml` receives a well-formed XML Sitemap containing
-one `<url>/<loc>` per knowledge item, built via `xmlBuilder`, with `Content-Type: application/xml`.
+one `<url>/<loc>` per knowledge item, built via `xmlbuilder2`, with `Content-Type: application/xml`.
**Independent Test**: `curl http://localhost:3000/sitemap.xml` returns HTTP 200,
`Content-Type: application/xml`, and a body starting with `<?xml` containing `<urlset>`.
@@ -64,7 +64,7 @@ one `<url>/<loc>` per knowledge item, built via `xmlBuilder`, with `Content-Type
- [X] T007 [US1] Add token fetch and search API call to `sitemapFlow()` in `src/proxyScripts/kmeContentSourceAdapter.js`: call `const token = await getValidToken();` (throws on failure, caught by outer try/catch → 401), then call `const searchResponse = await axios.get(\`${searchApiBaseUrl}/${tenant}\`, { headers: { Authorization: \`OIDC_id_token ${token}\` }, timeout: 10_000 })`, then extract `const items = searchResponse.data.items ?? searchResponse.data ?? [];` (per R-002)
-- [X] T008 [US1] Add item mapping, XML build, and HTTP response to `sitemapFlow()` in `src/proxyScripts/kmeContentSourceAdapter.js`: iterate `items`, skip entries where `!item['vkm:url']` (FR-006), for each valid item compute `const loc = \`${proxyBaseUrl}?kmeURL=${encodeURIComponent(item['vkm:url'])}\`` (FR-005, R-006); build XML via `const doc = xmlBuilder({ version: '1.0', encoding: 'UTF-8' }); const urlset = doc.ele('urlset', { xmlns: 'http://www.sitemaps.org/schemas/sitemap/0.9' }); urlset.ele('url').ele('loc').txt(loc).up().up();` (FR-008, R-003); serialise with `const xml = doc.end({ prettyPrint: false })`; respond `res.writeHead(200, { 'Content-Type': 'application/xml' }); res.end(xml);` (FR-007)
+- [X] T008 [US1] Add item mapping, XML build, and HTTP response to `sitemapFlow()` in `src/proxyScripts/kmeContentSourceAdapter.js`: iterate `items`, skip entries where `!item['vkm:url']` (FR-006), for each valid item compute `const loc = \`${proxyBaseUrl}?kmeURL=${encodeURIComponent(item['vkm:url'])}\`` (FR-005, R-006); build XML via `const doc = xmlbuilder2({ version: '1.0', encoding: 'UTF-8' }); const urlset = doc.ele('urlset', { xmlns: 'http://www.sitemaps.org/schemas/sitemap/0.9' }); urlset.ele('url').ele('loc').txt(loc).up().up();` (FR-008, R-003); serialise with `const xml = doc.end({ prettyPrint: false })`; respond `res.writeHead(200, { 'Content-Type': 'application/xml' }); res.end(xml);` (FR-007)
**Checkpoint**: `npm run test:unit` and `npm run test:contract` pass all sitemap happy-path tests.
At this point `GET /sitemap.xml` is fully functional; MVP is deliverable.
@@ -133,7 +133,7 @@ responds with a meaningful 5xx code and a human-readable message within 10 secon
**Purpose**: Constitution compliance, API shape verification, and final test suite green.
-- [X] T014 [P] Verify `src/proxyScripts/kmeContentSourceAdapter.js` constitution compliance: run `grep -n 'import\|export\|process\.env\|global\.config\b\|config\.' src/proxyScripts/kmeContentSourceAdapter.js` and confirm zero matches (FR-009, Constitution §I); confirm `xmlBuilder` is the sole XML-building mechanism (FR-008); confirm no new files were created in `src/`
+- [X] T014 [P] Verify `src/proxyScripts/kmeContentSourceAdapter.js` constitution compliance: run `grep -n 'import\|export\|process\.env\|global\.config\b\|config\.' src/proxyScripts/kmeContentSourceAdapter.js` and confirm zero matches (FR-009, Constitution §I); confirm `xmlbuilder2` is the sole XML-building mechanism (FR-008); confirm no new files were created in `src/`
- [X] T015 [P] Verify live search API response shape against R-002 assumption: using a test token, call `GET ${searchApiBaseUrl}/${tenant}` manually with `curl -H "Authorization: OIDC_id_token <token>" <searchApiBaseUrl>/<tenant>` and confirm (a) the top-level key holding the items array (`items` vs `results` vs bare array) and (b) that `vkm:url` is a direct string property of each item; update the extraction line `response.data.items ?? response.data` in T007 if the actual shape differs
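
T015 lists three candidate response shapes (`items`, `results`, or a bare array). The extraction line `response.data.items ?? response.data` returns the whole envelope object when the array sits under `results`, and that object cannot be iterated as a list of items. A tolerant extractor could look like the sketch below; the `extractItems` name and the decision to return an empty array for unknown shapes are assumptions, not part of the task list.

```javascript
// Normalises the three shapes named in T015 to a plain array.
function extractItems(data) {
  if (Array.isArray(data)) return data;                   // bare array
  if (Array.isArray(data?.items)) return data.items;      // { items: [...] }
  if (Array.isArray(data?.results)) return data.results;  // { results: [...] }
  return []; // unknown shape: treat as no results (assumed behaviour)
}

console.log(extractItems({ results: [{ 'vkm:url': 'a' }] }).length); // 1
console.log(extractItems({ total: 0 }).length);                      // 0
```

Checking with `Array.isArray` rather than `??` means a present but non-array value never reaches the `for...of` loop in `sitemapFlow()`.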