Merge branch '003-kme-content-fetch'

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-23 19:08:06 -05:00
30 changed files with 2238 additions and 360 deletions

View File

@@ -1,7 +1,7 @@
## Active Technologies
- Node.js ≥18, ESM (`"type": "module"`) + `axios` (HTTP), `redis` (token cache), `xmlbuilder2` (XML — already injected as `xmlBuilder`), `uuid`, `jsonwebtoken` — all already in `package.json` (002-sitemap-generation)
- Node.js ≥18, ESM (`"type": "module"`) + `axios` (HTTP), `redis` (token cache), `xmlbuilder2` (XML — already injected as `xmlbuilder2`), `uuid`, `jsonwebtoken` — all already in `package.json` (002-sitemap-generation)
- Redis read/write (`hGet`/`hSet`) for OIDC token cache only — no new storage (002-sitemap-generation)
## Recent Changes
- 002-sitemap-generation: Added Node.js ≥18, ESM (`"type": "module"`) + `axios` (HTTP), `redis` (token cache), `xmlbuilder2` (XML — already injected as `xmlBuilder`), `uuid`, `jsonwebtoken` — all already in `package.json`
- 002-sitemap-generation: Added Node.js ≥18, ESM (`"type": "module"`) + `axios` (HTTP), `redis` (token cache), `xmlbuilder2` (XML — already injected as `xmlbuilder2`), `uuid`, `jsonwebtoken` — all already in `package.json`

View File

@@ -1,7 +1,7 @@
<!-- SPECKIT START -->
For additional context about technologies to be used, project structure,
shell commands, and other important information, read the current plan at
`specs/002-sitemap-generation/plan.md`
`specs/003-kme-content-fetch/plan.md`
<!-- SPECKIT END -->
## Project Overview
@@ -52,7 +52,7 @@ config/
| `axios` | HTTP client |
| `jwt` | `jsonwebtoken` |
| `uuidv4` | UUID v4 generator |
| `xmlBuilder` | `xmlbuilder2` `create` function |
| `xmlbuilder2` | `xmlbuilder2` `create` function |
| `URLSearchParams`, `URL` | Node.js globals |
| `adapterHelper` | Loaded from `src/globalVariables/adapterHelper.js` (if present) |
| `<name>` | Each JSON/JS file in `src/globalVariables/` (filename → variable name) |

View File

@@ -1,3 +1,3 @@
{
"feature_directory": "specs/002-sitemap-generation"
"feature_directory": "specs/003-kme-content-fetch"
}

View File

@@ -295,7 +295,7 @@ Follow-up TODOs:
- `crypto` - Web Crypto API for randomUUID()
- `axios` - HTTP client for API calls
- `jwt` - JSON Web Token library for authentication
- `xmlBuilder` - XML document builder
- `xmlbuilder2` - XML document builder
- `uuidv4` - UUID generator
- `redis` - Redis client for token caching and shared state
- `adapterHelper` - Helper functions (loaded from src/globalVariables/)
@@ -440,7 +440,7 @@ const globalVMContext = {
axios,
uuidv4,
jwt,
xmlBuilder,
xmlbuilder2,
redis, // Connected Redis client for token caching
};
@@ -501,11 +501,11 @@ script.runInContext(context);
- Package: `jsonwebtoken`
- Injected from: `globalVMContext.jwt`
6. **xmlBuilder** - XML builder/generator
6. **xmlbuilder2** - XML builder/generator
- Purpose: Constructing XML documents programmatically
- Usage: `xmlBuilder({ root: { child: 'value' } })`
- Usage: `xmlbuilder2({ root: { child: 'value' } })`
- Package: `xmlbuilder2` (create function)
- Injected from: `globalVMContext.xmlBuilder`
- Injected from: `globalVMContext.xmlbuilder2`
7. **redis** - Redis client
- Purpose: Token caching and shared state across requests

View File

@@ -11,12 +11,45 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
---
## [0.4.0] - 2026-04-23
### Added
- Sitemap pagination via `hydra:view['hydra:last']`: after the first search page, all subsequent pages are fetched in parallel using the correct 0-based item-index `start` model (`start = size, 2×size, …, lastStart`); when all results fit on one page (`hydra:view` absent) no additional requests are made
- Latest `vkm:datePublished` selection per `SearchResultItem`: when a search result contains multiple content fragments, only the fragment with the most recent `vkm:datePublished` is included in the sitemap; fragments without a date are treated as epoch 0
- Sitemap URL cap: output is limited to 50,000 `<loc>` entries per the [Sitemaps protocol](https://www.sitemaps.org/protocol.html); a `warn` log is emitted when results are truncated
- Full HTML document wrapper for content fetch responses: body is now `<!DOCTYPE html><html><head><title>…</title></head><body>…</body></html>` instead of a bare `articleBody` fragment
- `<title>` element populated from the `vkm:name` field of the fetched article (empty `<title></title>` when `vkm:name` is absent)
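The pagination and latest-fragment rules above can be sketched as two small pure functions. This is a minimal illustration, not the adapter's actual code: `remainingPageStarts` and `pickLatestFragment` are hypothetical names, `lastStart` is assumed to have already been parsed from `hydra:view['hydra:last']` and to be a multiple of `size`, and `vkm:datePublished` is assumed to be a string that `Date.parse` understands.

```javascript
// Hypothetical helper: list the `start` offsets for every page after the first.
// `size` is the page size; `lastStart` is the 0-based item index of the last page.
function remainingPageStarts(size, lastStart) {
  if (!Number.isInteger(size) || size <= 0) return [];
  // When everything fits on one page there is nothing left to fetch.
  if (!Number.isInteger(lastStart) || lastStart < size) return [];
  const starts = [];
  for (let start = size; start <= lastStart; start += size) {
    starts.push(start);
  }
  return starts;
}

// Hypothetical helper: pick the fragment with the most recent vkm:datePublished.
// Fragments with a missing or unparseable date are treated as epoch 0.
function pickLatestFragment(fragments) {
  let best = null;
  let bestTime = -Infinity;
  for (const fragment of fragments) {
    const parsed = Date.parse(fragment['vkm:datePublished']);
    const time = Number.isNaN(parsed) ? 0 : parsed;
    if (time > bestTime) {
      best = fragment;
      bestTime = time;
    }
  }
  return best; // null when `fragments` is empty
}
```

Each offset returned by `remainingPageStarts` could then be mapped to one request and awaited together, for example with `Promise.all`.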
### Changed
- `oidcAuthFlow` route removed: requests that do not match `?kmeURL=` or `/sitemap.xml` now return `404 Not Found`
### Fixed
- `proxyBaseUrl` is now derived dynamically from the incoming request (`X-Forwarded-Proto`, `X-Forwarded-Host`, `Host` headers) rather than read from settings, ensuring correct `<loc>` URLs in all deployment environments
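One way to derive a base URL from the three headers named above is sketched here. The function name `deriveProxyBaseUrl`, the precedence order (forwarded headers before `Host`), the `http` fallback, and the choice to take the first value of a comma-separated header are all assumptions made for illustration; the changelog entry does not specify them.

```javascript
// Hypothetical helper. Node.js exposes incoming header names in lower case on req.headers.
function deriveProxyBaseUrl(headers) {
  // Proxies may append values as a comma-separated list; use the first entry (assumption).
  const first = (value) =>
    typeof value === 'string' ? value.split(',')[0].trim() : '';

  const proto = first(headers['x-forwarded-proto']) || 'http';
  const host = first(headers['x-forwarded-host']) || first(headers['host']);

  // Without any host information no absolute URL can be built.
  return host ? `${proto}://${host}` : null;
}
```

Whether the derived value should also carry a path prefix or a trailing slash depends on the deployment and is left open in this sketch.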
---
## [0.3.0] - 2026-04-23
### Added
- `GET /?kmeURL=<upstream-article-url>` content fetch endpoint: fetches a KME article by URL and returns its `vkm:articleBody` as `200 text/html; charset=utf-8`
- `contentFetchFlow()` async function in `kmeContentSourceAdapter.js` — URL routing branch, 9-step implementation: validates `kmeURL` parameter (400 for missing/blank/malformed/non-http), acquires OIDC token via `getValidToken` (502 on failure), fetches upstream article with 10-second timeout, handles all error paths (4xx upstream → 404, 5xx/timeout/network → 502, unparseable body → 502, missing/empty `vkm:articleBody` → 404)
- URL routing updated: `?kmeURL=` present → `contentFetchFlow()`, `/sitemap.xml` → `sitemapFlow()`, otherwise → `oidcAuthFlow()` (passthrough, FR-012 preserved)
- `extractArticleBody(data)` pure helper in `kmeContentSourceAdapterHelpers.js` — returns `data['vkm:articleBody']` if non-empty non-whitespace string, otherwise `null`; guards against null/non-object input
- Unit test describe blocks in `tests/unit/proxy.test.js`: `extractArticleBody helper` (7 edge-case tests), `US-content-fetch: happy path` (2 tests), `US-content-fetch: input validation` (6 tests), `US-content-fetch: upstream errors` (7 tests), `US-content-fetch: body parsing` (5 tests), `US-content-fetch: passthrough preserved` (1 test)
- Contract tests in `tests/contract/proxy-http.test.js`: `content fetch: happy path` (full round-trip 200 + SC-001 timing), `content fetch: error handling` (upstream 404 → 404, upstream 503 → 502, server hang → 502 within 12s)
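The behaviour described for `extractArticleBody(data)` can be illustrated with a short sketch. It follows the wording of the entry above (non-empty, non-whitespace string or `null`, with a guard for null and non-object input) and is not copied from `kmeContentSourceAdapterHelpers.js`.

```javascript
// Sketch of the described helper: return the article body, or null when it is unusable.
function extractArticleBody(data) {
  // Guard against null, undefined, and primitive input.
  if (data === null || typeof data !== 'object') return null;

  const body = data['vkm:articleBody'];
  // Only a string with at least one non-whitespace character counts as a body.
  if (typeof body !== 'string' || body.trim() === '') return null;

  return body;
}
```

Returning `null` rather than throwing lets the caller translate a missing body into a single `404` branch.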
---
## [0.2.0] - 2026-04-23
### Added
- `GET /sitemap.xml` endpoint: returns a well-formed XML Sitemap (Sitemaps protocol 0.9) containing one `<url><loc>` per knowledge item from the KME Knowledge Search Service
- `sitemapFlow()` async function in `kmeContentSourceAdapter.js` — settings validation, OIDC token reuse, search API call, XML build via `xmlBuilder`, 10-second timeout, 502/504/500 error responses
- `sitemapFlow()` async function in `kmeContentSourceAdapter.js` — settings validation, OIDC token reuse, search API call, XML build via `xmlbuilder2`, 10-second timeout, 502/504/500 error responses
- `getValidToken()` shared helper extracted from the existing OIDC auth flow — used by both sitemap and non-sitemap paths
- URL routing at IIFE entry point: requests ending in `/sitemap.xml` → `sitemapFlow()`, all others → `oidcAuthFlow()`
- Three new fields in `src/globalVariables/kme_CSA_settings.json`: `searchApiBaseUrl`, `tenant`, `proxyBaseUrl`

View File

@@ -91,7 +91,7 @@ All dependencies are injected into each request's sandbox:
| `axios` | HTTP client |
| `jwt` | `jsonwebtoken` |
| `uuidv4` | UUID v4 generator |
| `xmlBuilder` | `xmlbuilder2` `create` |
| `xmlbuilder2` | `xmlbuilder2` `create` |
| `redis` | Connected Redis client |
| `URLSearchParams`, `URL` | Node.js globals |
| `kme_CSA_settings` | Loaded from `src/globalVariables/kme_CSA_settings.json` |

package-lock.json generated
View File

@@ -1,12 +1,12 @@
{
"name": "kme-content-adapter",
"version": "1.0.0",
"version": "0.4.0",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"name": "kme-content-adapter",
"version": "1.0.0",
"version": "0.4.0",
"license": "ISC",
"dependencies": {
"axios": "^1.13.6",

View File

@@ -1,6 +1,6 @@
{
"name": "kme-content-adapter",
"version": "0.2.0",
"version": "0.4.0",
"description": "HTTP proxy adapter to search and export documents from KME",
"type": "module",
"main": "src/server.js",

View File

@@ -120,7 +120,7 @@ These are injected by `server.js` (`globalVMContext`) and are available to proxy
| `crypto` | Web Crypto API | No UUID or crypto ops in this script |
| `jwt` | jsonwebtoken | No JWT signing/verification needed |
| `uuidv4` | uuid function | No request-ID generation needed |
| `xmlBuilder` | xmlbuilder2 | No XML output |
| `xmlbuilder2` | xmlbuilder2 | No XML output |
---
@@ -136,7 +136,7 @@ const globalVMContext = {
axios, // ← used by proxy.js
uuidv4,
jwt,
xmlBuilder,
xmlbuilder2,
redis, // ← used by proxy.js (token cache)
};

View File

@@ -72,7 +72,7 @@ If the token service rejects the credentials or is unreachable, the proxy script
- **FR-008**: The proxy script MUST respond with HTTP `401 Unauthorized` and a descriptive plain-text message when authentication fails (invalid credentials, unreachable token service, or malformed response).
- **FR-009**: The proxy script file (`src/proxyScripts/proxy.js`) MUST contain zero `import` or `export` statements, as it executes inside a Node.js VM sandbox.
- **FR-010**: The proxy script MUST NOT reference `config`, `global.config`, or `process.env` for any configuration or credential values.
- **FR-011**: The proxy script MUST use only dependencies injected via the VM context: `axios`, `console`, `crypto`, `jwt`, `uuidv4`, `xmlBuilder`, `URLSearchParams`, `URL`, and `redis`.
- **FR-011**: The proxy script MUST use only dependencies injected via the VM context: `axios`, `console`, `crypto`, `jwt`, `uuidv4`, `xmlbuilder2`, `URLSearchParams`, `URL`, and `redis`.
- **FR-012**: `req` and `res` must be treated as the injected Node.js HTTP request and response objects; no other I/O mechanism may be used.
- **FR-013**: When two or more concurrent requests arrive while no valid token is cached, only one token fetch request MUST be made to the token service; all other requests MUST queue and share the result of that single fetch.
- **FR-014**: The token POST request to the OIDC service MUST apply a 5-second HTTP timeout; a timeout error MUST be treated as an authentication failure (FR-008).
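The single-fetch requirement in FR-013 is commonly met by sharing one in-flight promise among concurrent callers. The sketch below shows that pattern under stated assumptions: `createSingleFlight` and `fetchToken` are hypothetical names, `fetchToken` is assumed to return a promise for the token, and caching of the resolved token (for example in Redis) is assumed to happen elsewhere.

```javascript
// Hypothetical factory: wraps a token fetcher so that concurrent callers share one request.
function createSingleFlight(fetchToken) {
  let inFlight = null; // promise of the fetch currently in progress, if any

  return function getToken() {
    if (inFlight === null) {
      // The executor runs synchronously, so the fetch starts right away and a
      // synchronous throw inside fetchToken becomes a rejection.
      inFlight = new Promise((resolve) => resolve(fetchToken())).finally(() => {
        // Clear the slot once settled so a later call can start a fresh fetch.
        inFlight = null;
      });
    }
    // Every caller arriving while the fetch is pending receives the same promise.
    return inFlight;
  };
}
```

Where the shared promise must live so that separate requests actually see the same instance depends on how the VM context is set up, which this sketch does not address.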
@@ -107,4 +107,4 @@ If the token service rejects the credentials or is unreachable, the proxy script
- The `scope` value `openid tags content_entitlements` is fixed and not expected to vary per request.
- The caller of the proxy endpoint does not require the actual OIDC token in the response body; the `200 OK / Authorized` reply is sufficient to confirm authentication succeeded.
- Error responses should be plain text to keep the script simple; no structured error body format is required.
- The VM context is always initialised with all listed dependencies (`axios`, `console`, `crypto`, `jwt`, `uuidv4`, `xmlBuilder`, `URLSearchParams`, `URL`) before the script executes.
- The VM context is always initialised with all listed dependencies (`axios`, `console`, `crypto`, `jwt`, `uuidv4`, `xmlbuilder2`, `URLSearchParams`, `URL`) before the script executes.

View File

@@ -6,7 +6,7 @@
## Content Quality
- [x] No implementation details (languages, frameworks, APIs) — *Note: FR-008/FR-009 reference `xmlBuilder` and the VM sandbox constraint. These are explicitly mandated architectural constraints from the feature description, not incidental implementation choices; they belong in the spec as requirements.*
- [x] No implementation details (languages, frameworks, APIs) — *Note: FR-008/FR-009 reference `xmlbuilder2` and the VM sandbox constraint. These are explicitly mandated architectural constraints from the feature description, not incidental implementation choices; they belong in the spec as requirements.*
- [x] Focused on user value and business needs
- [x] Written for non-technical stakeholders — *Technical terms (Redis, OIDC) are domain-specific to this integration; they cannot be abstracted away without losing meaning.*
- [x] All mandatory sections completed — User Scenarios, Requirements, Success Criteria, Assumptions all present
@@ -18,7 +18,7 @@
- [x] Success criteria are measurable — SC-001 (5-second response time), SC-002 (zero silent drops), SC-003 (zero regressions), SC-004 (XSD validation), SC-005 (10-second error bound)
- [x] Success criteria are technology-agnostic — SC-004 references the public Sitemaps XSD standard, not an internal tool
- [x] All acceptance scenarios are defined — 8 acceptance scenarios across 3 user stories
- [x] Edge cases are identified — 5 edge cases documented (expired token, missing `vkm:url`, large result sets, missing settings, missing `xmlBuilder`)
- [x] Edge cases are identified — 5 edge cases documented (expired token, missing `vkm:url`, large result sets, missing settings, missing `xmlbuilder2`)
- [x] Scope is clearly bounded — v1 scope explicitly excludes pagination, multi-tenant, and optional sitemap elements
- [x] Dependencies and assumptions identified — 8 assumptions documented

View File

@@ -156,7 +156,7 @@ Incoming GET /…/sitemap.xml
(skip empty vkm:url)
|
v
Build SitemapDocument (xmlBuilder)
Build SitemapDocument (xmlbuilder2)
|
v
200 OK

View File

@@ -19,7 +19,7 @@ to `kme_CSA_settings.json` (`searchApiBaseUrl`, `tenant`, `proxyBaseUrl`).
## Technical Context
**Language/Version**: Node.js ≥18, ESM (`"type": "module"`)
**Primary Dependencies**: `axios` (HTTP), `redis` (token cache), `xmlbuilder2` (XML — already injected as `xmlBuilder`), `uuid`, `jsonwebtoken` — all already in `package.json`
**Primary Dependencies**: `axios` (HTTP), `redis` (token cache), `xmlbuilder2` (XML — already injected as `xmlbuilder2`), `uuid`, `jsonwebtoken` — all already in `package.json`
**Storage**: Redis read/write (`hGet`/`hSet`) for OIDC token cache only — no new storage
**Testing**: Node.js built-in test runner (`node:test`); no external test framework
**Target Platform**: Linux server / container (HTTP proxy adapter)
@@ -28,7 +28,7 @@ to `kme_CSA_settings.json` (`searchApiBaseUrl`, `tenant`, `proxyBaseUrl`).
**Constraints**:
- Zero `import`/`export` in `kmeContentSourceAdapter.js` (runs in `vm.createContext`)
- No references to `config`, `global.config`, or `process.env` in proxy script
- XML built exclusively with the injected `xmlBuilder` (FR-008)
- XML built exclusively with the injected `xmlbuilder2` (FR-008)
- No new npm packages; no new source files (monolithic architecture — Section I of constitution)
**Scale/Scope**: Single tenant per deployment; all search results in one API call (no pagination, v1)
@@ -42,12 +42,12 @@ to `kme_CSA_settings.json` (`searchApiBaseUrl`, `tenant`, `proxyBaseUrl`).
|---|---|---|---|
| I | Monolithic architecture | ✅ PASS | All new code added to `kmeContentSourceAdapter.js`; no new source files |
| I (vm.Script) | Zero imports/exports in proxy script | ✅ PASS | Sitemap logic is inlined; no import statements introduced |
| I.0 | No forbidden globals (`config`, `global.config`, `process.env`) | ✅ PASS | Only `kme_CSA_settings`, `redis`, `axios`, `xmlBuilder`, `req`, `res` used |
| I.0 | No forbidden globals (`config`, `global.config`, `process.env`) | ✅ PASS | Only `kme_CSA_settings`, `redis`, `axios`, `xmlbuilder2`, `req`, `res` used |
| I.I | Business logic in proxy.js | ✅ PASS | Auth, API call, XML generation all in `kmeContentSourceAdapter.js` |
| I.II | Separate files only for allowed categories | ✅ PASS | Settings JSON in `src/globalVariables/` (existing pattern) |
| I.III | No new files challenged | ✅ PASS | No new files in `src/` |
| I.IV | New config in `src/globalVariables/` not `config/default.json` | ✅ PASS | Three fields added to `kme_CSA_settings.json` |
| I.V | `xmlBuilder` already in `globalVMContext` | ✅ PASS | `xmlbuilder2` `create` already injected; no server.js changes needed |
| I.V | `xmlbuilder2` already in `globalVMContext` | ✅ PASS | `xmlbuilder2` `create` already injected; no server.js changes needed |
| II | API-First Design | ✅ PASS | HTTP contract documented in `contracts/sitemap-endpoint.md` |
| III | Test-First Development | ✅ REQUIRED | Unit + contract tests must be written before/alongside implementation |
| VII | No new dependencies | ✅ PASS | All required packages already installed (`xmlbuilder2`, `axios`, `redis`) |
@@ -103,7 +103,7 @@ settings JSON, and the existing test files are modified.
|---|---|---|
| R-001 | Token reuse | Inline shared `getValidToken()` helper in proxy script; branch on URL first |
| R-002 | Search API response shape | Assume `{ items: [...] }`; verify against live API during implementation |
| R-003 | xmlbuilder2 API | `xmlBuilder({...}).ele('urlset',{xmlns:...})…doc.end({})` — no prettyPrint |
| R-003 | xmlbuilder2 API | `xmlbuilder2({...}).ele('urlset',{xmlns:...})…doc.end({})` — no prettyPrint |
| R-004 | Error mapping | Reuse `err.response` / `err.code === ECONNABORTED\|ERR_CANCELED` pattern |
| R-005 | Settings validation | `requiredSitemapFields` guard before any async work → HTTP 500 |
| R-006 | `loc` construction | `` `${proxyBaseUrl}?kmeURL=${encodeURIComponent(item['vkm:url'])}` `` |
@@ -183,7 +183,7 @@ async function sitemapFlow() {
const items = searchResponse.data.items ?? searchResponse.data ?? [];
// FR-004, FR-005, FR-006, FR-008: Build sitemap XML
const doc = xmlBuilder({ version: '1.0', encoding: 'UTF-8' });
const doc = xmlbuilder2({ version: '1.0', encoding: 'UTF-8' });
const urlset = doc.ele('urlset', { xmlns: 'http://www.sitemaps.org/schemas/sitemap/0.9' });
for (const item of items) {
const vkmUrl = item['vkm:url'];

View File

@@ -118,9 +118,9 @@ Contract tests live in `tests/contract/proxy-http.test.js`.
- **No new files**: All new logic is added directly to
`src/proxyScripts/kmeContentSourceAdapter.js` (monolithic architecture constraint).
- **No new dependencies**: `xmlbuilder2` is already in `package.json` and injected into the
VM context as `xmlBuilder`.
VM context as `xmlbuilder2`.
- **Token reuse**: The sitemap flow reuses the existing Redis `hGet`/token-refresh pattern —
no separate auth logic.
- **VM isolation**: The proxy script runs in a `vm.createContext` sandbox. It has access only
to the injected globals listed in `src/server.js` (`axios`, `redis`, `xmlBuilder`,
to the injected globals listed in `src/server.js` (`axios`, `redis`, `xmlbuilder2`,
`kme_CSA_settings`, `req`, `res`, `console`, `URLSearchParams`, `URL`, `crypto`).

View File

@@ -72,11 +72,11 @@ adaption to the real shape is a one-line change.
## R-003: xmlbuilder2 `create()` API for Sitemap XML
**Decision**: Use the `xmlBuilder` context variable (which is `xmlbuilder2`'s `create` function)
**Decision**: Use the `xmlbuilder2` context variable (which is `xmlbuilder2`'s `create` function)
with the following call chain:
```javascript
const doc = xmlBuilder({ version: '1.0', encoding: 'UTF-8' });
const doc = xmlbuilder2({ version: '1.0', encoding: 'UTF-8' });
const urlset = doc.ele('urlset', { xmlns: 'http://www.sitemaps.org/schemas/sitemap/0.9' });
for (const item of items) {
urlset.ele('url').ele('loc').txt(locValue).up().up();
@@ -98,7 +98,7 @@ Unit tests will assert this.
**Alternatives considered**:
- Manual string concatenation: rejected (error-prone escaping, violates FR-008 which requires
xmlBuilder).
xmlbuilder2).
- `xmlbuilder` (v1/v2): not the installed package; rejected.
---
@@ -184,7 +184,7 @@ if (!vkmUrl) continue; // omit silently
|---|---|---|
| R-001 | Token reuse | Inline shared token-fetch logic; branch on URL first |
| R-002 | Search API response shape | Assume `{ items: [...] }`; verify against live API |
| R-003 | xmlbuilder2 API | `xmlBuilder({...}).ele('urlset', {...})…doc.end({})` |
| R-003 | xmlbuilder2 API | `xmlbuilder2({...}).ele('urlset', {...})…doc.end({})` |
| R-004 | Error mapping | Reuse existing `err.response` / `err.code` pattern |
| R-005 | Settings validation | Explicit `requiredSitemapFields` guard → HTTP 500 |
| R-006 | `loc` construction | `proxyBaseUrl?kmeURL=encodeURIComponent(vkm:url)` |

View File

@@ -59,7 +59,7 @@ When the KME Knowledge Search Service is unreachable or returns an error, the ad
- What happens when a knowledge item has a missing or empty `vkm:url` field? That item must be omitted from the sitemap rather than producing a malformed `<loc>` entry.
- What happens when the search API returns a very large number of results? The sitemap should include all returned results; pagination handling is out of scope for v1 (assumption documented below).
- What happens when `searchApiBaseUrl`, `tenant`, or `proxyBaseUrl` are missing from the settings file? The adapter must respond with a `500` error and a descriptive message.
- What happens when `xmlBuilder` is not available in the VM context? The adapter must respond with a `500` error.
- What happens when `xmlbuilder2` is not available in the VM context? The adapter must respond with a `500` error.
## Requirements *(mandatory)*
@@ -72,7 +72,7 @@ When the KME Knowledge Search Service is unreachable or returns an error, the ad
- **FR-005**: Each `<loc>` value MUST be constructed as `<proxyBaseUrl>?kmeURL=<vkm:url value>`, where `proxyBaseUrl` is taken from `kme_CSA_settings.proxyBaseUrl`.
- **FR-006**: Knowledge items with a missing or empty `vkm:url` field MUST be silently omitted from the sitemap.
- **FR-007**: The sitemap response MUST be returned with the HTTP header `Content-Type: application/xml`.
- **FR-008**: The XML MUST be built using the `xmlBuilder` utility already available in the VM context — no additional XML libraries may be imported.
- **FR-008**: The XML MUST be built using the `xmlbuilder2` utility already available in the VM context — no additional XML libraries may be imported.
- **FR-009**: The proxy script MUST contain zero `import` or `export` statements and MUST NOT reference `config`, `global.config`, or `process.env`.
- **FR-010**: `kme_CSA_settings.json` MUST be extended with three new fields: `searchApiBaseUrl`, `tenant`, and `proxyBaseUrl`.
- **FR-011**: If any required settings field (`searchApiBaseUrl`, `tenant`, `proxyBaseUrl`) is absent at runtime, the adapter MUST respond with HTTP 500 and a descriptive error message.
@@ -100,7 +100,7 @@ When the KME Knowledge Search Service is unreachable or returns an error, the ad
- The KME Knowledge Search Service returns all relevant knowledge items in a single response for v1; pagination of search results is out of scope.
- The `vkm:url` field is present at the top level of each item object in the search results array; the exact response envelope shape will be confirmed against the live API during implementation.
- The `xmlBuilder` injected into the VM context exposes a builder API compatible with the existing usage in the project (e.g., `fast-xml-parser` `XMLBuilder` or equivalent).
- The `xmlbuilder2` injected into the VM context exposes a builder API compatible with the existing usage in the project (e.g., `fast-xml-parser` `XMLBuilder` or equivalent).
- No additional `<lastmod>`, `<changefreq>`, or `<priority>` elements are required in sitemap entries for v1; only `<loc>` is mandatory.
- The proxy adapter is deployed behind a reverse proxy or load balancer that handles TLS termination; the `proxyBaseUrl` in settings reflects the externally accessible HTTPS URL.
- A single tenant is configured per adapter deployment; multi-tenant sitemap generation is out of scope.

View File

@@ -42,7 +42,7 @@ OIDC auth flow share a clean entry point. **No user-story work can begin until t
## Phase 3: User Story 1 — Search Crawler Discovers KME Content (Priority: P1) 🎯 MVP
**Goal**: A consumer calling `GET /sitemap.xml` receives a well-formed XML Sitemap containing
one `<url>/<loc>` per knowledge item, built via `xmlBuilder`, with `Content-Type: application/xml`.
one `<url>/<loc>` per knowledge item, built via `xmlbuilder2`, with `Content-Type: application/xml`.
**Independent Test**: `curl http://localhost:3000/sitemap.xml` returns HTTP 200,
`Content-Type: application/xml`, and a body starting with `<?xml` containing `<urlset>`.
@@ -64,7 +64,7 @@ one `<url>/<loc>` per knowledge item, built via `xmlBuilder`, with `Content-Type
- [X] T007 [US1] Add token fetch and search API call to `sitemapFlow()` in `src/proxyScripts/kmeContentSourceAdapter.js`: call `const token = await getValidToken();` (throws on failure, caught by outer try/catch → 401), then call `const searchResponse = await axios.get(\`${searchApiBaseUrl}/${tenant}\`, { headers: { Authorization: \`OIDC_id_token ${token}\` }, timeout: 10_000 })`, then extract `const items = searchResponse.data.items ?? searchResponse.data ?? [];` (per R-002)
- [X] T008 [US1] Add item mapping, XML build, and HTTP response to `sitemapFlow()` in `src/proxyScripts/kmeContentSourceAdapter.js`: iterate `items`, skip entries where `!item['vkm:url']` (FR-006), for each valid item compute `const loc = \`${proxyBaseUrl}?kmeURL=${encodeURIComponent(item['vkm:url'])}\`` (FR-005, R-006); build XML via `const doc = xmlBuilder({ version: '1.0', encoding: 'UTF-8' }); const urlset = doc.ele('urlset', { xmlns: 'http://www.sitemaps.org/schemas/sitemap/0.9' }); urlset.ele('url').ele('loc').txt(loc).up().up();` (FR-008, R-003); serialise with `const xml = doc.end({ prettyPrint: false })`; respond `res.writeHead(200, { 'Content-Type': 'application/xml' }); res.end(xml);` (FR-007)
- [X] T008 [US1] Add item mapping, XML build, and HTTP response to `sitemapFlow()` in `src/proxyScripts/kmeContentSourceAdapter.js`: iterate `items`, skip entries where `!item['vkm:url']` (FR-006), for each valid item compute `const loc = \`${proxyBaseUrl}?kmeURL=${encodeURIComponent(item['vkm:url'])}\`` (FR-005, R-006); build XML via `const doc = xmlbuilder2({ version: '1.0', encoding: 'UTF-8' }); const urlset = doc.ele('urlset', { xmlns: 'http://www.sitemaps.org/schemas/sitemap/0.9' }); urlset.ele('url').ele('loc').txt(loc).up().up();` (FR-008, R-003); serialise with `const xml = doc.end({ prettyPrint: false })`; respond `res.writeHead(200, { 'Content-Type': 'application/xml' }); res.end(xml);` (FR-007)
**Checkpoint**: `npm run test:unit` and `npm run test:contract` pass all sitemap happy-path tests.
At this point `GET /sitemap.xml` is fully functional; MVP is deliverable.
@@ -133,7 +133,7 @@ responds with a meaningful 5xx code and a human-readable message within 10 secon
**Purpose**: Constitution compliance, API shape verification, and final test suite green.
- [X] T014 [P] Verify `src/proxyScripts/kmeContentSourceAdapter.js` constitution compliance: run `grep -n 'import\|export\|process\.env\|global\.config\b\|config\.' src/proxyScripts/kmeContentSourceAdapter.js` and confirm zero matches (FR-009, Constitution §I); confirm `xmlBuilder` is the sole XML-building mechanism (FR-008); confirm no new files were created in `src/`
- [X] T014 [P] Verify `src/proxyScripts/kmeContentSourceAdapter.js` constitution compliance: run `grep -n 'import\|export\|process\.env\|global\.config\b\|config\.' src/proxyScripts/kmeContentSourceAdapter.js` and confirm zero matches (FR-009, Constitution §I); confirm `xmlbuilder2` is the sole XML-building mechanism (FR-008); confirm no new files were created in `src/`
- [X] T015 [P] Verify live search API response shape against R-002 assumption: using a test token, call `GET ${searchApiBaseUrl}/${tenant}` manually with `curl -H "Authorization: OIDC_id_token <token>" <searchApiBaseUrl>/<tenant>` and confirm (a) the top-level key holding the items array (`items` vs `results` vs bare array) and (b) that `vkm:url` is a direct string property of each item; update the extraction line `response.data.items ?? response.data` in T007 if the actual shape differs

View File

@@ -0,0 +1,34 @@
# Specification Quality Checklist: KME Article Content Fetch
**Purpose**: Validate specification completeness and quality before proceeding to planning
**Created**: 2025-07-15
**Feature**: [spec.md](../spec.md)
## Content Quality
- [x] No implementation details (languages, frameworks, APIs)
- [x] Focused on user value and business needs
- [x] Written for non-technical stakeholders
- [x] All mandatory sections completed
## Requirement Completeness
- [x] No [NEEDS CLARIFICATION] markers remain
- [x] Requirements are testable and unambiguous
- [x] Success criteria are measurable
- [x] Success criteria are technology-agnostic (no implementation details)
- [x] All acceptance scenarios are defined
- [x] Edge cases are identified
- [x] Scope is clearly bounded
- [x] Dependencies and assumptions identified
## Feature Readiness
- [x] All functional requirements have clear acceptance criteria
- [x] User scenarios cover primary flows
- [x] Feature meets measurable outcomes defined in Success Criteria
- [x] No implementation details leak into specification
## Notes
- All checklist items pass. Spec is ready for `/speckit.clarify` or `/speckit.plan`.

View File

@@ -0,0 +1,201 @@
# HTTP Contract: Content Fetch Route
**Feature**: `003-kme-content-fetch`
**File**: `specs/003-kme-content-fetch/contracts/http-content-fetch.md`
This document defines the HTTP request/response contract for the content-fetch route exposed by the
KME Content Adapter proxy.
---
## Route
```
GET {proxy-base-url}?kmeURL={encoded-article-url}
```
The proxy detects the content-fetch route when:
- The incoming URL does **not** end in `/sitemap.xml`, AND
- The query string contains a `kmeURL` parameter (present, regardless of value)
Requests without `kmeURL` (and not a sitemap request) are routed to the existing auth-check
passthrough (returns 200 "Authorized").
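The route detection rules above can be sketched as a small classifier. `classifyRoute` and its return labels are hypothetical, and checking the path (rather than the full URL string) for the `/sitemap.xml` suffix is an assumption about how "ends in" is meant.

```javascript
// Hypothetical classifier for the three routes described in this contract.
function classifyRoute(rawUrl) {
  // req.url is origin-relative, so a placeholder base is needed to parse it.
  const url = new URL(rawUrl, 'http://placeholder.invalid');

  if (url.pathname.endsWith('/sitemap.xml')) return 'sitemap';
  // Presence of the parameter is enough; its value is validated later.
  if (url.searchParams.has('kmeURL')) return 'content-fetch';
  return 'passthrough';
}
```

For example, `classifyRoute('/?kmeURL=')` yields `'content-fetch'` even though the value is empty, which leaves the `400` response to the validation step.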
---
## Request
### Method
`GET`
### Query Parameters
| Parameter | Required | Description |
|-----------|----------|-------------|
| `kmeURL` | Yes | The verbatim `vkm:url` value from the KME Search API response. Must be a well-formed absolute `http` or `https` URL. Percent-encoded characters are decoded once (standard URL decoding) — double-encoding must not occur. |
### Headers
None required on the inbound request. The proxy adds its own `Authorization` header on the upstream
request.
### Example Request
```
GET /?kmeURL=https%3A%2F%2Fcontent.kme.example%2Farticles%2F123 HTTP/1.1
Host: proxy.example.com
```
---
## Responses
### 200 OK — Article HTML Body
The article was successfully fetched and `vkm:articleBody` was extracted.
```
HTTP/1.1 200 OK
Content-Type: text/html
<p>Article content here...</p>
```
| Field | Value |
|-------|-------|
| Status | `200` |
| `Content-Type` | `text/html` |
| Body | Raw HTML string from `vkm:articleBody` field of the KME Content Service JSON-LD response. Not sanitised or transformed. |
---
### 400 Bad Request — Invalid `kmeURL`
Returned when `kmeURL` is absent, empty, whitespace-only, or not a well-formed absolute http/https URL.
No upstream request is made.
```
HTTP/1.1 400 Bad Request
Content-Type: text/plain
Bad Request: kmeURL parameter is required
```
```
HTTP/1.1 400 Bad Request
Content-Type: text/plain
Bad Request: kmeURL must be a well-formed absolute http/https URL
```
| Trigger | Response body |
|---------|---------------|
| `kmeURL` absent, empty, or whitespace | `Bad Request: kmeURL parameter is required` |
| `kmeURL` present but malformed or non-http/https | `Bad Request: kmeURL must be a well-formed absolute http/https URL` |
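A minimal sketch of the validation described in the table above follows. The function name `validateKmeUrl` and the shape of its return value are illustrative assumptions; the response bodies are the ones listed in the table, and parsing relies on the standard `URL` constructor.

```javascript
// Hypothetical validator returning either { ok: true, url } or a 400 description.
function validateKmeUrl(rawValue) {
  if (typeof rawValue !== 'string' || rawValue.trim() === '') {
    return { ok: false, status: 400, message: 'Bad Request: kmeURL parameter is required' };
  }

  const malformed = {
    ok: false,
    status: 400,
    message: 'Bad Request: kmeURL must be a well-formed absolute http/https URL',
  };

  let parsed;
  try {
    // Without a base argument the constructor throws on relative or malformed input.
    parsed = new URL(rawValue);
  } catch {
    return malformed;
  }

  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') return malformed;

  // Return the original string so the upstream request can use it verbatim.
  return { ok: true, url: rawValue };
}
```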
---
### 404 Not Found — Article Not Found
Returned when the upstream KME Content Service returns a 4xx response for the article URL, or when
the upstream response does not contain a non-empty `vkm:articleBody`.
```
HTTP/1.1 404 Not Found
Content-Type: text/plain
Not Found: article not found at upstream
```
```
HTTP/1.1 404 Not Found
Content-Type: text/plain
Not Found: article body not present in upstream response
```
| Trigger | Response body |
|---------|---------------|
| Upstream 4xx HTTP response | `Not Found: article not found at upstream` |
| `vkm:articleBody` absent, null, or empty string | `Not Found: article body not present in upstream response` |
---
### 500 Internal Server Error — Proxy Configuration Error
Returned when a required OIDC setting is missing from `kme_CSA_settings`. Indicates a proxy
deployment/configuration issue.
```
HTTP/1.1 500 Internal Server Error
Content-Type: text/plain

Configuration error: missing required field: tokenUrl
```
---
### 502 Bad Gateway — Upstream or Token Failure
Returned for any upstream connectivity, protocol, or data error, and for token acquisition failure.
```
HTTP/1.1 502 Bad Gateway
Content-Type: text/plain

Bad Gateway: token acquisition failed
```
| Trigger | Response body |
|---------|---------------|
| OIDC token acquisition failure | `Bad Gateway: token acquisition failed` |
| Upstream request timeout (`ECONNABORTED`/`ERR_CANCELED`) | `Bad Gateway: upstream request timed out` |
| Upstream 5xx HTTP response | `Bad Gateway: upstream error HTTP {status}` |
| Network-level error (no HTTP response) | `Bad Gateway: {error message}` |
| Upstream response body is not valid JSON | `Bad Gateway: unparseable response from upstream` |
| Upstream response body is not an object | `Bad Gateway: unexpected response from upstream` |
---
## Upstream Request (Proxy → KME Content Service)
The proxy makes a single GET request to the verbatim `kmeURL` value.
```
GET {kmeURL} HTTP/1.1
Authorization: OIDC_id_token {id_token}
```
| Field | Value |
|-------|-------|
| Method | `GET` |
| URL | Verbatim value of `kmeURL` query parameter — no manipulation, no re-encoding |
| `Authorization` | `OIDC_id_token {id_token}` where `id_token` is from `getValidToken()` |
| Timeout | 10 000 ms (10 seconds) |
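
The request options in the table above can be sketched as a small config builder. `buildUpstreamConfig` is a hypothetical name used only for illustration; `idToken` stands in for the value returned by `getValidToken()`.

```javascript
// Builds the axios request options described in the table above.
// Hypothetical helper: the plan inlines this object in contentFetchFlow().
function buildUpstreamConfig(idToken) {
  return {
    headers: { Authorization: `OIDC_id_token ${idToken}` },
    timeout: 10000, // milliseconds
  };
}

const config = buildUpstreamConfig('example-token');
console.log(config.headers.Authorization);
```

Inside the flow this would be used as `axios.get(kmeURL, buildUpstreamConfig(token))`, with `kmeURL` passed through unchanged.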
---
## Error Mapping Summary
```
kmeURL absent/empty → 400
kmeURL malformed / non-http(s) → 400
Missing OIDC config → 500
Token acquisition failure → 502
Upstream 4xx → 404
Upstream 5xx → 502
Upstream timeout → 502
Network error → 502
Unparseable response body → 502
vkm:articleBody absent/null/empty → 404
Success → 200 text/html
```
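
The upstream-failure rows of the summary above can be expressed as one pure mapping function. This is a sketch under the assumption that errors have the axios shape described elsewhere in these documents (`code`, `response.status`, `message`); `mapFetchError` is a hypothetical name.

```javascript
// Illustrative pure function: maps an axios-style error to the proxy response
// listed in the summary. Not part of the adapter as planned.
function mapFetchError(err) {
  // Timeouts are checked first because they carry no HTTP response.
  if (err.code === 'ECONNABORTED' || err.code === 'ERR_CANCELED') {
    return { status: 502, body: 'Bad Gateway: upstream request timed out' };
  }
  if (err.response) {
    const upstream = err.response.status;
    return upstream >= 400 && upstream < 500
      ? { status: 404, body: 'Not Found: article not found at upstream' }
      : { status: 502, body: 'Bad Gateway: upstream error HTTP ' + upstream };
  }
  // No response and no timeout code: treat as a network-level failure.
  return { status: 502, body: 'Bad Gateway: ' + err.message };
}

console.log(mapFetchError({ response: { status: 410 } }).status);
```

Keeping the mapping separate from `res.writeHead()` is only one possible design; it makes each row of the summary checkable without an HTTP server.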
---
## Non-regression: Existing Routes
This feature does not change the behaviour of existing routes:
| Route | Behaviour |
|-------|-----------|
| URL ends in `/sitemap.xml` | Sitemap flow (unchanged) |
| No `kmeURL`, not sitemap | Auth-check passthrough → 200 "Authorized" (unchanged) |


@@ -0,0 +1,170 @@
# Data Model: KME Article Content Fetch (003)
**Phase 1 output for `003-kme-content-fetch`**
---
## Entities
### 1. KME Article Content
Represents a single article fetched from the KME Content Service.
| Field | Type | Source | Notes |
|-------|------|--------|-------|
| `vkm:url` | `string` | KME Content Service JSON-LD | Identifies the article; used as the fetch target (`kmeURL` param) |
| `vkm:articleBody` | `string \| null` | KME Content Service JSON-LD | HTML body of the article; may be absent, null, or empty |
**Validation rules**:
- `vkm:articleBody` must be a non-empty, non-whitespace string to constitute a valid article body.
- Absent, null, empty string, and whitespace-only string are all treated as "article body not present" → 404.
**State transitions**: None. This is a read-only fetch — no mutations, no lifecycle.
---
### 2. OIDC Token
Short-lived bearer credential used to authenticate upstream requests to the KME Content Service.
| Field | Type | Storage | Notes |
|-------|------|---------|-------|
| `id_token` | `string` | Redis hash `authorization:token` | The OIDC id_token value |
| `expiry` | `number` (Unix epoch, seconds) | Redis hash `authorization:expiry` | Expiry timestamp; compared to `Date.now() / 1000` |
**Validation rules**:
- Token is valid if `cachedToken !== null && Date.now() / 1000 < expiry`.
- Managed exclusively by `getValidToken()` in `kmeContentSourceAdapterHelpers.js`. Not modified by `contentFetchFlow()`.
**Managed by**: `getValidToken()` (existing helper — unmodified).
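
The validity rule above can be sketched as a small predicate. `isTokenValid` is a hypothetical name, and the injectable `nowMs` parameter is an assumption added so the check can be exercised deterministically; the real logic lives inside `getValidToken()`, which is not shown here.

```javascript
// Sketch of the validity rule: a token is usable only if it is cached and
// its expiry (Unix epoch seconds) is still in the future.
function isTokenValid(cachedToken, expiry, nowMs = Date.now()) {
  return cachedToken !== null && nowMs / 1000 < expiry;
}

// 1_000_000 ms is 1000 s, so an expiry of 2000 s is still in the future.
console.log(isTokenValid('cached-id-token', 2000, 1000000));
```

Because the comparison is strict (`<`), a token whose expiry equals the current second is treated as expired.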
---
### 3. Proxy Request
Incoming HTTP request received by the adapter, carrying routing signals and parameters.
| Field | Type | Source | Notes |
|-------|------|--------|-------|
| `req.url` | `string` | Node.js `http.IncomingMessage` | Relative path + query string, e.g. `/?kmeURL=https://...` |
| `req.method` | `string` | Node.js `http.IncomingMessage` | Always `GET` for content-fetch flow |
| `kmeURL` (extracted) | `string` | `new URL(req.url, 'http://localhost').searchParams.get('kmeURL')` | The verbatim target URL for upstream fetch |
**Validation rules for `kmeURL`**:
1. **Absent, empty, or whitespace-only**: `!kmeURL.trim()` (after defaulting a `null` result from `searchParams.get()` to `''`) → 400 Bad Request (FR-007)
2. **Malformed or non-absolute**: `new URL(kmeURL)` throws, or protocol is not `http:`/`https:` → 400 Bad Request (FR-008)
3. **Valid**: passes both guards → proceed to token acquisition + upstream fetch
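
The three rules above can be sketched as a single function that returns either `null` (valid) or the 400 response body. `validateKmeURL` is a hypothetical name; the plan performs these checks inline rather than in a helper.

```javascript
// Sketch of the kmeURL guards. Returns null when the value is acceptable,
// otherwise the plain-text body for the 400 response.
function validateKmeURL(rawValue) {
  const kmeURL = rawValue ?? ''; // searchParams.get() returns null when absent
  if (!kmeURL.trim()) return 'Bad Request: kmeURL parameter is required';
  try {
    const parsed = new URL(kmeURL); // throws on relative or malformed input
    if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
      throw new Error('unsupported protocol');
    }
  } catch {
    return 'Bad Request: kmeURL must be a well-formed absolute http/https URL';
  }
  return null;
}

console.log(validateKmeURL('ftp://example.com/article'));
```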
---
## Data Flow
```
Incoming request (req.url contains ?kmeURL=...)
Extract kmeURL from query string
(new URL(req.url, 'http://localhost').searchParams.get('kmeURL'))
┌── Validate kmeURL ──────────────────────────────────────┐
│ absent/empty? ──────────────────────────────► 400 │
│ malformed/non-http(s)? ─────────────────────► 400 │
└─────────────────────────────────────────────────────────┘
│ valid
getValidToken() → OIDC token (from Redis cache or fresh fetch)
│ token fetch failed? ──────────────────► 502
axios.get(kmeURL, { headers: { Authorization: 'OIDC_id_token {token}' }, timeout: 10000 })
│ timeout? ──────────────────► 502
│ upstream 4xx? ──────────────────► 404
│ upstream 5xx? ──────────────────► 502
│ network error? ──────────────────► 502
Parse response.data as JSON-LD object
│ unparseable? ──────────────────► 502
│ non-object? ──────────────────► 502
extractArticleBody(data) → vkm:articleBody string or null
│ null (absent/empty/whitespace)? ─────► 404
res.writeHead(200, { 'Content-Type': 'text/html' })
res.end(articleBody)
```
---
## Helper: `extractArticleBody(data)`
**Location**: `src/globalVariables/kmeContentSourceAdapterHelpers.js`
**Type**: Pure function — no side effects, no state, no injected globals required
**Added to exports**: `return { ..., extractArticleBody }`
**Signature**:
```javascript
// function extractArticleBody(data): string | null
```
**Input/output contract**:
| Input | Output |
|-------|--------|
| `{ 'vkm:articleBody': '<p>Hello</p>' }` | `'<p>Hello</p>'` |
| `{ 'vkm:articleBody': '' }` | `null` |
| `{ 'vkm:articleBody': ' ' }` | `null` |
| `{ 'vkm:articleBody': null }` | `null` |
| `{}` (field absent) | `null` |
| `null` | `null` |
| `'a string'` (non-object) | `null` |
**Implementation**:
```javascript
function extractArticleBody(data) {
  if (!data || typeof data !== 'object') return null;
  const body = data['vkm:articleBody'];
  if (body == null || typeof body !== 'string' || body.trim() === '') return null;
  return body;
}
```
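
As a quick check of the contract table, the implementation above can be run against a handful of inputs. The harness below is only an illustrative sketch: the `cases` array and the extra non-string case (`42`) are additions for demonstration, not part of the planned test suite.

```javascript
function extractArticleBody(data) {
  if (!data || typeof data !== 'object') return null;
  const body = data['vkm:articleBody'];
  if (body == null || typeof body !== 'string' || body.trim() === '') return null;
  return body;
}

// Each entry is [input, expected output].
const cases = [
  [{ 'vkm:articleBody': '<p>Hello</p>' }, '<p>Hello</p>'],
  [{ 'vkm:articleBody': '   ' }, null],
  [{ 'vkm:articleBody': 42 }, null], // non-string values are rejected too
  [{}, null],
  [null, null],
  ['a string', null],
];

const failures = cases.filter(([input, expected]) => extractArticleBody(input) !== expected);
console.log(failures.length === 0 ? 'all cases pass' : 'mismatches: ' + failures.length);
```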
---
## KME Content Service Response Shape
The KME Content Service returns a JSON-LD document. Only `vkm:articleBody` is consumed by this
feature. Example:
```json
{
  "@context": "https://vocabs.kme.example/context.jsonld",
  "@type": "vkm:Article",
  "vkm:url": "https://content.kme.example/articles/123",
  "vkm:articleBody": "<p>Article content here...</p>",
  "vkm:title": "Example Article"
}
```
All other fields are ignored by `extractArticleBody`. The proxy passes the raw HTML string of
`vkm:articleBody` directly as the response body — no transformation, sanitisation, or re-encoding.
---
## Settings Used (from `kme_CSA_settings`)
The content-fetch flow reads the same OIDC settings as `oidcAuthFlow` and `sitemapFlow`:
| Field | Purpose |
|-------|---------|
| `tokenUrl` | OIDC token endpoint |
| `username` | OIDC username credential |
| `password` | OIDC password credential |
| `clientId` | OIDC client identifier |
| `scope` | OIDC requested scope |
These are validated via `kmeContentSourceAdapterHelpers.validateSettings()` before calling
`getValidToken()`. Missing fields produce a 500 Configuration Error response.
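
The real `validateSettings()` is an existing helper whose body is not shown in these documents. The sketch below is therefore a guess at its contract, inferred only from how the plan calls it (settings object plus a list of required field names, returning the name of a missing field or a falsy value). `validateSettingsSketch` is deliberately named differently to mark it as hypothetical.

```javascript
// Hypothetical stand-in for validateSettings(): returns the first required
// field that is missing or empty, or null when every field is present.
function validateSettingsSketch(settings, requiredFields) {
  for (const field of requiredFields) {
    const value = settings?.[field];
    if (value === undefined || value === null || value === '') return field;
  }
  return null;
}

const required = ['tokenUrl', 'username', 'password', 'clientId', 'scope'];
const missing = validateSettingsSketch({ username: 'svc', password: 'x' }, required);
console.log('Configuration error: missing required field: ' + missing);
```

Whether the real helper treats an empty string as missing is an assumption made here; check the helper itself before relying on that detail.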


@@ -0,0 +1,330 @@
# Implementation Plan: KME Article Content Fetch
**Branch**: `003-kme-content-fetch` | **Date**: 2025-07-15 | **Spec**: [spec.md](spec.md)
**Input**: Feature specification from `specs/003-kme-content-fetch/spec.md`
## Summary
Add a new `contentFetchFlow()` to `src/proxyScripts/kmeContentSourceAdapter.js` that handles
requests carrying a `?kmeURL=` query parameter. The flow validates the parameter, obtains an OIDC
token via the existing `getValidToken()`, performs a GET request to the `kmeURL` with
`Authorization: OIDC_id_token {token}`, extracts `vkm:articleBody` from the JSON-LD response, and
returns it as `text/html`. A new pure helper `extractArticleBody(data)` is added to
`src/globalVariables/kmeContentSourceAdapterHelpers.js`. No new files, no new npm dependencies.
## Technical Context
**Language/Version**: Node.js ≥18, ESM (`"type": "module"`)
**Primary Dependencies**: `axios ^1.13` (HTTP client, already in context), `redis ^5` (token cache, injected), `xmlbuilder2 ^4` (sitemap, unrelated to this feature)
**Storage**: Redis (token cache only — managed by `getValidToken()`, not modified by this feature)
**Testing**: Node.js built-in test runner (`node:test`) — `npm run test:unit`, `npm run test:contract`
**Target Platform**: Node.js server (Linux/macOS); proxy script executed inside `vm.createContext` per request
**Project Type**: HTTP proxy adapter — monolithic VM-sandbox architecture
**Performance Goals**: End-to-end response ≤11 s (10 s upstream timeout + 1 s proxy overhead) per SC-001
**Constraints**: Zero new imports/exports in VM sandbox files; no new npm dependencies; no new `src/` files
**Scale/Scope**: Two files modified (`kmeContentSourceAdapter.js`, `kmeContentSourceAdapterHelpers.js`); new unit tests in `tests/unit/proxy.test.js`; new contract tests in `tests/contract/proxy-http.test.js`
## Constitution Check
*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*
| Gate | Status | Notes |
|------|--------|-------|
| All business logic stays in `src/proxyScripts/kmeContentSourceAdapter.js` | ✅ PASS | `contentFetchFlow()` lives entirely in the adapter file |
| Zero `import`/`export` in VM sandbox file | ✅ PASS | No imports added; all dependencies via injected context |
| `extractArticleBody` is a pure utility → helper file | ✅ PASS | No state, no API calls — qualifies for `kmeContentSourceAdapterHelpers.js` |
| No new files in `src/` | ✅ PASS | Only two existing files are modified |
| No new npm dependencies | ✅ PASS | `axios` and `URL`/`URLSearchParams` already available in context |
| Helpers file uses literal function body pattern | ✅ PASS | New helper added before existing `return { ... }` block |
| Authentication (`getValidToken`) stays in proxy script (called from adapter, not moved) | ✅ PASS | `getValidToken()` is invoked from `contentFetchFlow()` in the adapter |
**Post-design re-check**: All gates pass. No constitutional violations. No complexity tracking required.
## Project Structure
### Documentation (this feature)
```text
specs/003-kme-content-fetch/
├── plan.md # This file
├── research.md # Phase 0 output
├── data-model.md # Phase 1 output
├── quickstart.md # Phase 1 output
├── contracts/
│ └── http-content-fetch.md # HTTP request/response contract
└── tasks.md # Phase 2 output (/speckit.tasks — NOT created here)
```
### Source Code (repository root)
```text
src/
├── proxyScripts/
│ └── kmeContentSourceAdapter.js # MODIFIED: add contentFetchFlow(), update routing
├── globalVariables/
│ └── kmeContentSourceAdapterHelpers.js # MODIFIED: add extractArticleBody()
├── logger.js # unchanged
└── server.js # unchanged
tests/
├── unit/
│ └── proxy.test.js # MODIFIED: add content-fetch describe blocks
└── contract/
└── proxy-http.test.js # MODIFIED: add content-fetch contract tests
```
**Structure Decision**: Single-project monolith. Two existing source files modified; two existing test
files extended. No new files in `src/`. The VM sandbox pattern and helper file pattern are preserved
exactly as documented in the constitution.
---
## Phase 0: Research Findings → `research.md`
See [research.md](research.md) for full decision log. Summary:
- **URL parameter extraction**: `new URL(req.url, 'http://localhost').searchParams.get('kmeURL')` — confirmed safe from VM context research; `URL` is injected at server.js line 19.
- **URL validation**: a `!kmeURL.trim()` guard covers FR-007, followed by `new URL(kmeURL)` + protocol check in try/catch for FR-008; the two guards together handle both requirements.
- **Axios error handling**: Confirmed `ECONNABORTED`/`ERR_CANCELED` for timeout; `err.response.status` available for all HTTP errors; JSON auto-parsed when `Content-Type: application/json`.
- **JSON-LD parsing**: `response.data` is an object when axios auto-parses; fallback `JSON.parse()` needed for non-JSON content-types; non-object → 502.
- **No unknowns remaining**: All NEEDS CLARIFICATION resolved. Research complete.
---
## Phase 1: Design
### Routing Change (`kmeContentSourceAdapter.js`)
Add a new `else if` branch between the existing sitemap check and the `oidcAuthFlow` fallback:
```javascript
// Entry point — URL routing
try {
if (req.url.endsWith('/sitemap.xml')) {
await sitemapFlow();
} else if (new URL(req.url, 'http://localhost').searchParams.has('kmeURL')) {
await contentFetchFlow(); // ← NEW
} else {
await oidcAuthFlow();
}
} catch (err) { /* existing outer catch → 401 */ }
```
`contentFetchFlow()` is fully self-contained — all errors are caught internally and never propagate to the outer catch.
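
To make the branch order easy to reason about, the routing decision can be restated as a pure function that returns the name of the flow. This is an illustrative sketch only; `selectFlow` is a hypothetical name and the adapter itself keeps the `if`/`else if` chain shown above.

```javascript
// Pure restatement of the routing branches, in the same order as the adapter.
function selectFlow(reqUrl) {
  if (reqUrl.endsWith('/sitemap.xml')) return 'sitemapFlow';
  if (new URL(reqUrl, 'http://localhost').searchParams.has('kmeURL')) return 'contentFetchFlow';
  return 'oidcAuthFlow';
}

for (const url of ['/sitemap.xml', '/?kmeURL=', '/']) {
  console.log(url, '->', selectFlow(url));
}
```

Two consequences follow from the checks as written. An empty `?kmeURL=` still routes to `contentFetchFlow()` (where it is rejected with 400), because `has()` is true for an empty value. And because the sitemap check is a plain `endsWith` on the raw `req.url`, an unencoded `kmeURL` value ending in `/sitemap.xml` would match the first branch; callers that percent-encode the parameter do not hit this.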
### `contentFetchFlow()` — Complete Logic
```javascript
async function contentFetchFlow() {
  // Step 1: Extract kmeURL (FR-001, FR-002)
  const kmeURL = new URL(req.url, 'http://localhost').searchParams.get('kmeURL') ?? '';
  // Step 2: Validate — absent / empty (FR-007)
  if (!kmeURL.trim()) {
    res.writeHead(400, { 'Content-Type': 'text/plain' });
    res.end('Bad Request: kmeURL parameter is required');
    return;
  }
  // Step 3: Validate — well-formed absolute http/https URL (FR-008)
  try {
    const u = new URL(kmeURL);
    if (u.protocol !== 'http:' && u.protocol !== 'https:') throw new Error();
  } catch {
    res.writeHead(400, { 'Content-Type': 'text/plain' });
    res.end('Bad Request: kmeURL must be a well-formed absolute http/https URL');
    return;
  }
  // Step 4: Validate OIDC settings (config guard, returns 500 for missing config)
  const missingField = kmeContentSourceAdapterHelpers.validateSettings(
    kme_CSA_settings,
    ['tokenUrl', 'username', 'password', 'clientId', 'scope'],
  );
  if (missingField) {
    console.error({ message: 'Content fetch: config error', missingField });
    res.writeHead(500, { 'Content-Type': 'text/plain' });
    res.end('Configuration error: missing required field: ' + missingField);
    return;
  }
  // Step 5: Obtain OIDC token (FR-003, FR-011)
  let token;
  try {
    token = await kmeContentSourceAdapterHelpers.getValidToken(req.url, req.method);
  } catch (tokenErr) {
    console.error({ message: 'Content fetch: token acquisition failed', error: tokenErr.message });
    res.writeHead(502, { 'Content-Type': 'text/plain' });
    res.end('Bad Gateway: token acquisition failed');
    return;
  }
  // Step 6: GET kmeURL verbatim with auth header (FR-002, FR-003, FR-004)
  let response;
  try {
    console.debug({ message: 'Content fetch: fetching article', kmeURL });
    response = await axios.get(kmeURL, {
      headers: { Authorization: `OIDC_id_token ${token}` },
      timeout: 10000,
    });
  } catch (fetchErr) {
    if (fetchErr.code === 'ECONNABORTED' || fetchErr.code === 'ERR_CANCELED') {
      console.error({ message: 'Content fetch: upstream timeout', code: fetchErr.code });
      res.writeHead(502, { 'Content-Type': 'text/plain' });
      res.end('Bad Gateway: upstream request timed out');
    } else if (fetchErr.response) {
      const status = fetchErr.response.status;
      console.error({ message: 'Content fetch: upstream HTTP error', status });
      if (status >= 400 && status < 500) {
        res.writeHead(404, { 'Content-Type': 'text/plain' });
        res.end('Not Found: article not found at upstream');
      } else {
        res.writeHead(502, { 'Content-Type': 'text/plain' });
        res.end('Bad Gateway: upstream error HTTP ' + status);
      }
    } else {
      console.error({ message: 'Content fetch: network error', error: fetchErr.message });
      res.writeHead(502, { 'Content-Type': 'text/plain' });
      res.end('Bad Gateway: ' + fetchErr.message);
    }
    return;
  }
  // Step 7: Parse body — handle non-JSON content-type (FR-005, FR-010)
  let data = response.data;
  if (typeof data === 'string') {
    try {
      data = JSON.parse(data);
    } catch {
      console.error({ message: 'Content fetch: unparseable response body', kmeURL });
      res.writeHead(502, { 'Content-Type': 'text/plain' });
      res.end('Bad Gateway: unparseable response from upstream');
      return;
    }
  }
  if (typeof data !== 'object' || data === null) {
    console.error({ message: 'Content fetch: unexpected non-object response', kmeURL });
    res.writeHead(502, { 'Content-Type': 'text/plain' });
    res.end('Bad Gateway: unexpected response from upstream');
    return;
  }
  // Step 8: Extract vkm:articleBody (FR-005, FR-009)
  const articleBody = kmeContentSourceAdapterHelpers.extractArticleBody(data);
  if (!articleBody) {
    console.error({ message: 'Content fetch: vkm:articleBody absent or empty', kmeURL });
    res.writeHead(404, { 'Content-Type': 'text/plain' });
    res.end('Not Found: article body not present in upstream response');
    return;
  }
  // Step 9: Return article HTML (FR-006)
  console.info({ message: 'Content fetch: article fetched successfully', kmeURL });
  res.writeHead(200, { 'Content-Type': 'text/html' });
  res.end(articleBody);
}
```
### `extractArticleBody(data)` — New Helper
Add to `kmeContentSourceAdapterHelpers.js` before the existing `return { ... }` block:
```javascript
/**
 * Extracts the vkm:articleBody string from a KME Content Service JSON-LD response.
 * Returns null if the field is absent, null, not a string, or an empty/whitespace string.
 * @param {object} data parsed JSON-LD response from the KME Content Service
 * @returns {string|null}
 */
function extractArticleBody(data) {
  if (!data || typeof data !== 'object') return null;
  const body = data['vkm:articleBody'];
  if (body == null || typeof body !== 'string' || body.trim() === '') return null;
  return body;
}
```
Update the `return { ... }` at the bottom of the helpers file to export the new function:
```javascript
return {
  validateSettings,
  extractHydraItems,
  buildSitemapXml,
  getValidToken,
  extractArticleBody, // ← NEW
};
```
### Error Response Matrix
| Condition | HTTP Status | Response Body |
|-----------|-------------|---------------|
| `kmeURL` absent or empty | 400 | `Bad Request: kmeURL parameter is required` |
| `kmeURL` not a well-formed absolute http/https URL | 400 | `Bad Request: kmeURL must be a well-formed absolute http/https URL` |
| Missing OIDC config field | 500 | `Configuration error: missing required field: {field}` |
| Token acquisition failure | 502 | `Bad Gateway: token acquisition failed` |
| Upstream 4xx response | 404 | `Not Found: article not found at upstream` |
| Upstream 5xx response | 502 | `Bad Gateway: upstream error HTTP {status}` |
| Upstream timeout (`ECONNABORTED`/`ERR_CANCELED`) | 502 | `Bad Gateway: upstream request timed out` |
| Network error (no `err.response`) | 502 | `Bad Gateway: {err.message}` |
| Response body unparseable as JSON | 502 | `Bad Gateway: unparseable response from upstream` |
| Non-object response body | 502 | `Bad Gateway: unexpected response from upstream` |
| `vkm:articleBody` absent, null, or empty | 404 | `Not Found: article body not present in upstream response` |
| Success | 200 `text/html` | article body HTML |
### Test Coverage Plan
**Unit tests** (add to `tests/unit/proxy.test.js`):
| Describe block | Test case | Verifies |
|---------------|-----------|---------|
| `US-content-fetch: happy path` | cached token → 200 HTML body | FR-001, FR-005, FR-006 |
| `US-content-fetch: happy path` | cache miss → token fetch → 200 HTML body | FR-003 |
| `US-content-fetch: happy path` | expired token → refresh → 200 HTML body | FR-003 |
| `US-content-fetch: input validation` | no `kmeURL` → oidcAuthFlow (unchanged 200) | FR-012 |
| `US-content-fetch: input validation` | `kmeURL` empty string → 400 | FR-007 |
| `US-content-fetch: input validation` | `kmeURL` whitespace → 400 | FR-007 |
| `US-content-fetch: input validation` | `kmeURL` relative URL → 400 | FR-008 |
| `US-content-fetch: input validation` | `kmeURL` non-http protocol (`ftp:`) → 400 | FR-008 |
| `US-content-fetch: input validation` | `kmeURL` malformed string → 400 | FR-008 |
| `US-content-fetch: token failure` | `getValidToken` throws → 502 | FR-011 |
| `US-content-fetch: upstream errors` | upstream 404 → 404 | FR-009 |
| `US-content-fetch: upstream errors` | upstream 410 → 404 | FR-009 |
| `US-content-fetch: upstream errors` | upstream 503 → 502 | FR-010 |
| `US-content-fetch: upstream errors` | timeout `ECONNABORTED` → 502 | FR-010 |
| `US-content-fetch: upstream errors` | timeout `ERR_CANCELED` → 502 | FR-010 |
| `US-content-fetch: upstream errors` | network error (no `err.response`) → 502 | FR-010 |
| `US-content-fetch: body parsing` | unparseable string body → 502 | FR-010 |
| `US-content-fetch: body parsing` | `vkm:articleBody` absent → 404 | FR-009 |
| `US-content-fetch: body parsing` | `vkm:articleBody` null → 404 | FR-009 |
| `US-content-fetch: body parsing` | `vkm:articleBody` empty string → 404 | FR-009 |
| `US-content-fetch: body parsing` | `vkm:articleBody` whitespace → 404 | FR-009 |
| `US-content-fetch: passthrough preserved` | no `kmeURL`, not sitemap → 200 'Authorized' | FR-012 |
| `extractArticleBody helper` | returns body string | FR-005 |
| `extractArticleBody helper` | null data → null | FR-005 |
| `extractArticleBody helper` | no `vkm:articleBody` field → null | FR-009 |
| `extractArticleBody helper` | empty string → null | FR-009 |
| `extractArticleBody helper` | whitespace string → null | FR-009 |
**Contract tests** (add to `tests/contract/proxy-http.test.js`):
| Test case | Setup | Verifies |
|-----------|-------|---------|
| valid `kmeURL` → real mock HTTP server returning JSON-LD with `vkm:articleBody` → 200 HTML | real HTTP server, real token server | SC-001, FR-006 |
| real mock server returning 404 → proxy returns 404 | real 404 HTTP server | FR-009 |
| real mock server returning 503 → proxy returns 502 | real 503 HTTP server | FR-010 |
| non-responding server → proxy returns 502 within 12 s | real server that never responds | FR-010 |
### `extractArticleBody` — Edge Case Coverage
| Input | Expected output |
|-------|----------------|
| `{ 'vkm:articleBody': '<p>Hello</p>' }` | `'<p>Hello</p>'` |
| `{ 'vkm:articleBody': '' }` | `null` |
| `{ 'vkm:articleBody': ' ' }` | `null` |
| `{ 'vkm:articleBody': null }` | `null` |
| `{ 'vkm:articleBody': undefined }` (field present, value `undefined`) | `null` |
| `{}` (field absent) | `null` |
| `null` | `null` |
| `'string'` (non-object) | `null` |


@@ -0,0 +1,122 @@
# Quickstart: KME Article Content Fetch (003)
**Feature branch**: `003-kme-content-fetch`
This guide explains how to develop and test the content-fetch feature locally.
---
## Prerequisites
- Node.js ≥18
- A running Redis instance (default: `localhost:6379`)
- `kme_CSA_settings.json` populated (see `src/globalVariables/kme_CSA_settings.json.example`)
- `npm install` already run
---
## Running the Proxy
```bash
npm run dev # start with --watch (auto-restart on changes)
npm start # start with jq log formatting
```
---
## Testing the Content-Fetch Route
### Happy path (requires a real or stubbed KME Content Service)
```bash
curl -s "http://localhost:3000/?kmeURL=https://content.kme.example/articles/123"
# Expected: 200 OK, Content-Type: text/html, body = <p>Article HTML...</p>
```
### Bad input — missing or invalid kmeURL
```bash
curl -s -o /dev/null -w "%{http_code}" "http://localhost:3000/"
# Expected: 200 (auth-check passthrough, no kmeURL → oidcAuthFlow)
curl -s -o /dev/null -w "%{http_code}" "http://localhost:3000/?kmeURL="
# Expected: 400
curl -s -o /dev/null -w "%{http_code}" "http://localhost:3000/?kmeURL=not-a-url"
# Expected: 400
curl -s -o /dev/null -w "%{http_code}" "http://localhost:3000/?kmeURL=ftp://example.com/article"
# Expected: 400
```
### Existing sitemap route (unchanged)
```bash
curl -s -o /dev/null -w "%{http_code}" "http://localhost:3000/sitemap.xml"
# Expected: 200 (sitemap XML) or 502 if KME Search API is unreachable
```
---
## Running Tests
```bash
npm run test:unit # unit tests (mocked axios and Redis)
npm run test:contract # contract tests (real HTTP servers, fake Redis)
npm test # all tests
```
### Running a single test file
```bash
node --test tests/unit/proxy.test.js
node --test tests/contract/proxy-http.test.js
```
---
## Key Files Modified by This Feature
| File | Change |
|------|--------|
| `src/proxyScripts/kmeContentSourceAdapter.js` | Add `contentFetchFlow()` function; add routing branch `else if (searchParams.has('kmeURL'))` |
| `src/globalVariables/kmeContentSourceAdapterHelpers.js` | Add `extractArticleBody(data)` function; export it in `return { ... }` |
| `tests/unit/proxy.test.js` | Add `describe` blocks for content-fetch unit tests and `extractArticleBody` helper tests |
| `tests/contract/proxy-http.test.js` | Add contract tests for content-fetch (real mock HTTP servers) |
---
## Architecture Reminder
The proxy runs inside a Node.js `vm.Script` / `vm.createContext` sandbox — **zero imports or
exports** are permitted in `kmeContentSourceAdapter.js`. All dependencies arrive via the injected
context:
| Variable | What it is |
|----------|-----------|
| `axios` | HTTP client — `axios.get(url, { headers, timeout })` |
| `kmeContentSourceAdapterHelpers` | Helpers object — `getValidToken()`, `extractArticleBody()`, `validateSettings()` |
| `kme_CSA_settings` | OIDC + service settings from `src/globalVariables/kme_CSA_settings.json` |
| `URL`, `URLSearchParams` | WHATWG URL API — for parsing `req.url` and validating `kmeURL` |
| `console` | Structured logger — `console.info/debug/error({ message, ... })` |
| `req`, `res` | Node.js HTTP request/response |
The helpers file (`kmeContentSourceAdapterHelpers.js`) is a **literal function body** — it ends
with `return { ... }` and contains no `import`/`export` statements. server.js wraps it as an IIFE.
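
The "literal function body" pattern can be illustrated with a small standalone sketch. How `server.js` actually wraps the helpers file is not shown in these documents, so the use of `new Function` below is an assumption chosen only to demonstrate why a top-level `return` is legal once the source is wrapped in a function.

```javascript
// Illustrative only: a tiny "function body" source ending in a return statement.
const helpersSource = `
  function extractArticleBody(data) {
    if (!data || typeof data !== 'object') return null;
    const body = data['vkm:articleBody'];
    if (typeof body !== 'string' || body.trim() === '') return null;
    return body;
  }
  return { extractArticleBody };
`;

// Wrapping the source in a function and calling it immediately yields the
// helpers object, which could then be injected into the sandbox context.
const helpers = new Function(helpersSource)();
console.log(Object.keys(helpers));
```

This also shows why `import`/`export` statements cannot appear in such a file: they are only valid at module top level, not inside a function body.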
---
## Content-Fetch Flow Summary
```
Request: GET /?kmeURL=https://content.kme.example/articles/123
1. Routing: req.url has ?kmeURL= → contentFetchFlow()
2. Extract kmeURL: new URL(req.url, 'http://localhost').searchParams.get('kmeURL')
3. Validate kmeURL: empty → 400; malformed / non-http(s) → 400
4. getValidToken() → OIDC id_token (from Redis cache or fresh fetch)
5. axios.get(kmeURL, { headers: { Authorization: 'OIDC_id_token {token}' }, timeout: 10000 })
6. Error handling: 4xx upstream → 404; 5xx/timeout/network → 502
7. extractArticleBody(response.data) → vkm:articleBody string or null
8. null → 404; string → 200 text/html
```


@@ -0,0 +1,146 @@
# Research: KME Article Content Fetch (003)
**Phase 0 output for `003-kme-content-fetch`**
All questions resolved from existing codebase investigation and spec analysis. No external research
needed. No NEEDS CLARIFICATION items remain.
---
## Decision 1: URL Parameter Extraction Strategy
**Question**: How should `kmeURL` be extracted from `req.url` (a relative path string) inside the VM
sandbox where only the injected `URL` and `URLSearchParams` globals are available?
**Decision**: Use `new URL(req.url, 'http://localhost').searchParams.get('kmeURL')`.
**Rationale**:
- `req.url` in Node's `http.IncomingMessage` is always a relative path + query string (e.g.,
`/?kmeURL=https://...`), never a full absolute URL.
- `new URL(relPath, 'http://localhost')` resolves the relative path against a dummy base, giving
a full `URL` object whose `.searchParams` can be queried safely. The dummy base is never
contacted.
- `URLSearchParams.get()` decodes percent-encoding exactly once — satisfying the spec edge case
"kmeURL must be used verbatim; double-encoding must not occur" (the decoded value is then passed
directly to `axios.get()` without further manipulation, per FR-002).
- `URL` is confirmed injected: `src/server.js` line 19 (`globalVMContext = { URLSearchParams, URL, ... }`).
**Alternatives considered**:
- Manual string split on `?` and parsing manually — rejected: more error-prone, reinvents WHATWG URL parsing.
- `req.url.includes('?kmeURL=')` for the routing check only: workable, but
`new URL(...).searchParams.has('kmeURL')` is used instead for consistency with the extraction call.
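
The decode-once behaviour described in the rationale can be seen with a target URL that itself contains a percent-escape. The request path below is a made-up example: the inner URL has a literal `%20`, which the caller has encoded as `%2520`.

```javascript
// The inner URL contains "%20"; percent-encoding it for the query string
// turns the "%" into "%25", giving "%2520".
const reqUrl = '/?kmeURL=https%3A%2F%2Fcontent.kme.example%2Fmy%2520article';

// The dummy base only makes the relative path parseable; it is never contacted.
const kmeURL = new URL(reqUrl, 'http://localhost').searchParams.get('kmeURL');
console.log(kmeURL);
```

The extracted value is `https://content.kme.example/my%20article`: the outer encoding is removed once and the inner `%20` survives intact, ready to be passed to `axios.get()` without further manipulation. One caveat of `URLSearchParams` worth keeping in mind is that an unencoded `+` in the query string is read as a space.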
---
## Decision 2: kmeURL Validation Strategy
**Question**: How should FR-007 (absent/empty → 400) and FR-008 (malformed/non-absolute → 400) be
implemented in the VM sandbox?
**Decision**: Two-stage guard:
1. `if (!kmeURL.trim())` → 400 (handles absent, empty, whitespace-only)
2. `try { const u = new URL(kmeURL); if (u.protocol !== 'http:' && u.protocol !== 'https:') throw new Error(); } catch { → 400 }`
**Rationale**:
- `new URL(value)` (no base) throws on any malformed string, satisfying FR-008.
- Protocol check rejects `ftp:`, `file:`, `data:`, `javascript:` etc. while accepting http/https.
- `URL` is already in context; no helper needed — the two-line guard is readable inline.
- The spec assumption states kmeURL values are verbatim `vkm:url` values from KME Search (always
absolute http/https), so protocol-level validation is a lightweight safety check.
**Alternatives considered**:
- Regex validation — rejected: more complex, less accurate than the URL parser itself.
- Validating domain/path structure — rejected: spec explicitly says not to validate beyond
well-formedness (assumption bullet 1 in spec).
---
## Decision 3: Axios Timeout and Error Code Handling
**Question**: What error codes distinguish axios timeout from HTTP errors from network errors?
**Decision** (confirmed from existing code and tests):
- `err.code === 'ECONNABORTED' || err.code === 'ERR_CANCELED'` → timeout → 502
- `err.response` populated → HTTP error → inspect `err.response.status`:
- `4xx` → 404 (spec FR-009)
- `5xx` → 502 (spec FR-010)
- No `err.response`, no timeout code → network error → 502
**Rationale**: This pattern already exists verbatim in `sitemapFlow()` (lines 61–73 of
`kmeContentSourceAdapter.js`) and is battle-tested in both unit and contract tests. Using the same
pattern for `contentFetchFlow()` is consistent and maintainable.
**Alternatives considered**:
- Using `axios.isAxiosError()` — not needed; the existing pattern already discriminates all cases.
- Mapping all errors to 502 — rejected: spec explicitly requires 404 for upstream 4xx (FR-009).
---
## Decision 4: JSON-LD Response Body Parsing
**Question**: When should the proxy return 502 (unparseable) vs 404 (article not found)?
**Decision**:
1. If `response.data` is already an object (axios auto-parsed JSON) → proceed to extraction.
2. If `response.data` is a string → try `JSON.parse(data)`; on failure → 502.
3. If `data` is not an object after parsing → 502.
4. Call `extractArticleBody(data)`:
- Returns `null` → 404 (absent, null, empty, whitespace)
- Returns string → 200 text/html
**Rationale**:
- Axios auto-parses JSON when `Content-Type: application/json` (confirmed from contract tests
using `startMockServer` with JSON bodies). If the upstream KME service sends non-JSON content-type,
`response.data` is a string — explicit `JSON.parse` is required as a fallback.
- Spec edge case: "If JSON-LD response body is not valid JSON → 502". The string branch handles this.
- Spec edge case: "If `vkm:articleBody` is empty string → treat as absent → 404". This is handled
by `extractArticleBody`'s `body.trim() === ''` check.
**Alternatives considered**:
- Always calling `JSON.parse(JSON.stringify(data))` — rejected: unnecessary if axios already parsed.
- A single `JSON.parse(response.data)` regardless — rejected: axios may have already parsed.
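
Steps 1 to 3 of the decision can be sketched as a pure function that separates parsing from response writing. `parseUpstreamBody` and its `{ data }` / `{ error }` return shape are hypothetical and chosen for illustration; the plan performs these checks inline in `contentFetchFlow()`.

```javascript
// Sketch of the body-parsing steps. Returns { data } on success, or { error }
// carrying the plain-text body for the 502 response.
function parseUpstreamBody(raw) {
  let data = raw;
  if (typeof data === 'string') {
    // axios left the body as text (non-JSON content type): parse it ourselves.
    try {
      data = JSON.parse(data);
    } catch {
      return { error: 'Bad Gateway: unparseable response from upstream' };
    }
  }
  if (typeof data !== 'object' || data === null) {
    return { error: 'Bad Gateway: unexpected response from upstream' };
  }
  return { data };
}

console.log(parseUpstreamBody('{"vkm:articleBody":"<p>Hi</p>"}'));
```

A string such as `'42'` is valid JSON but not an object, so it falls through to the second error rather than the first.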
---
## Decision 5: `extractArticleBody` Placement
**Question**: Should `extractArticleBody` go in `kmeContentSourceAdapter.js` or
`kmeContentSourceAdapterHelpers.js`?
**Decision**: `kmeContentSourceAdapterHelpers.js` (helpers file).
**Rationale**:
- The function is a pure data transformation: given an object, return a string or null. No side
effects, no state, no API calls.
- Constitution Section I.I classifies "data transformation" as eligible for the helpers file when
it is a pure utility (no business decisions or authentication).
- Consistent with `extractHydraItems` and `buildSitemapXml` which are also pure transformations.
**Alternatives considered**:
- Inline in `contentFetchFlow()` — valid but reduces testability and clarity.
- Defining it as a local helper inside the adapter — rejected: less testable and inconsistent with
existing pattern where data-extraction helpers live in the helpers file.
---
## Decision 6: Error Handling Boundary for `contentFetchFlow()`
**Question**: Should `contentFetchFlow` let errors bubble to the outer catch (→ 401) or handle all
errors internally?
**Decision**: `contentFetchFlow()` is fully self-contained — handles ALL errors internally, returns
early on every error path, never throws to the outer catch.
**Rationale**:
- The outer catch produces a 401 Unauthorized response, which is correct for `oidcAuthFlow` errors
but wrong for content-fetch errors (should be 400/404/500/502 per spec).
- Self-contained error handling makes the function testable in isolation and prevents accidental
401s from unexpected throws.
- Consistent with `sitemapFlow()`, which also handles all errors internally (the outer catch was
designed for `oidcAuthFlow` only).
---
## No Remaining Unknowns
All NEEDS CLARIFICATION items resolved. Phase 1 design may proceed.

View File

@@ -0,0 +1,122 @@
# Feature Specification: KME Article Content Fetch
**Feature Branch**: `003-kme-content-fetch`
**Created**: 2025-07-15
**Status**: Draft
## User Scenarios & Testing *(mandatory)*
### User Story 1 — Happy Path Article Fetch (Priority: P1)
A downstream consumer (e.g. a CMS or search front-end) sends a request to the proxy with a `kmeURL` query parameter containing the verbatim `vkm:url` value it received from the KME Search API. The proxy authenticates the request to the KME Content Service, fetches the article, and returns the HTML body of that article so the consumer can render it.
**Why this priority**: This is the core business value of the feature. Without a working happy path there is nothing to build on.
**Independent Test**: Issue a GET request to the proxy with a valid, reachable `kmeURL`. Verify the response body is HTML matching the `vkm:articleBody` field in the KME Content Service response, status 200 and `Content-Type: text/html`.
**Acceptance Scenarios**:
1. **Given** the proxy receives a GET request whose URL does **not** end in `/sitemap.xml`, **When** the request contains `?kmeURL=https://content.kme.example/articles/123`, **Then** the proxy fetches that URL from the KME Content Service with `Authorization: OIDC_id_token {token}`, extracts `vkm:articleBody` from the JSON-LD response, and returns it as the HTTP response body with status 200 and `Content-Type: text/html`.
2. **Given** the token cache holds a valid OIDC token, **When** the proxy makes the upstream request, **Then** it uses the cached token without a new token acquisition round-trip.
3. **Given** the token cache has expired, **When** the proxy makes the upstream request, **Then** `getValidToken()` refreshes the token transparently before the upstream call is made.
---
### User Story 2 — Missing or Empty kmeURL Parameter (Priority: P2)
A consumer sends a request that matches the content-fetch route (not a sitemap URL) but omits the `kmeURL` parameter or provides it as an empty string. The proxy must reject the request immediately with a clear 400 response rather than making a malformed upstream call.
**Why this priority**: Bad-input rejection prevents meaningless upstream calls and gives consumers a clear, actionable error signal.
**Independent Test**: Send a GET request to the proxy without `kmeURL`, or with `kmeURL=`. Verify a 400 Bad Request response is returned.
**Acceptance Scenarios**:
1. **Given** the proxy receives a request with no `kmeURL` query parameter, **When** the request is processed, **Then** the proxy returns HTTP 400 without making any upstream request.
2. **Given** the proxy receives a request with `?kmeURL=` (empty value), **When** the request is processed, **Then** the proxy returns HTTP 400 without making any upstream request.
---
### User Story 3 — Upstream Content Fetch Failure or Missing Article Body (Priority: P3)
The KME Content Service is unreachable, returns an HTTP error status, times out, or returns a valid JSON-LD document that does not contain `vkm:articleBody`. The proxy must surface an appropriate error to the consumer.
**Why this priority**: Robust error handling avoids silent failures and lets consumers distinguish between "article not found" and "upstream service error".
**Independent Test**: Simulate or stub each failure mode and verify the correct HTTP error code is returned by the proxy.
**Acceptance Scenarios**:
1. **Given** the KME Content Service returns a 4xx response for the requested URL, **When** the proxy processes the response, **Then** the proxy returns HTTP 404 to the caller.
2. **Given** the KME Content Service returns a 5xx response or the request times out (exceeding 10 seconds), **When** the proxy processes the response, **Then** the proxy returns HTTP 502 to the caller.
3. **Given** the KME Content Service returns a 200 JSON-LD response but the `vkm:articleBody` field is absent or null, **When** the proxy processes the response, **Then** the proxy returns HTTP 404 to the caller.
4. **Given** a network-level error prevents the upstream request from completing, **When** the proxy processes the error, **Then** the proxy returns HTTP 502 to the caller.
---
### User Story 4 — Existing Passthrough Behaviour Preserved (Priority: P4)
Requests that do not match the sitemap route and do not carry a `kmeURL` parameter must continue to receive the existing 200 OK response (auth-check passthrough) without any change in behaviour.
**Why this priority**: Non-regression of existing behaviour is required to avoid breaking active consumers that rely on the passthrough route.
**Independent Test**: Send a GET request to the proxy with neither a `/sitemap.xml` suffix nor a `kmeURL` parameter. Verify a 200 OK response is returned, identical to current behaviour.
**Acceptance Scenarios**:
1. **Given** the proxy receives a request with no `kmeURL` parameter and a URL not ending in `/sitemap.xml`, **When** the request is processed, **Then** the proxy returns HTTP 200 (the existing auth-check passthrough).
---
### Edge Cases
- What happens when `kmeURL` contains an already-encoded URL (percent-encoded characters)? The value must be used verbatim; double-encoding must not occur.
- What happens if the JSON-LD response body from the KME Content Service is not valid JSON? The proxy should treat this as a 502 upstream error.
- What happens if the upstream response contains `vkm:articleBody` but its value is an empty string? Treat as absent → return 404.
- What happens if the OIDC token cannot be acquired (e.g. auth service down)? Surface this as a 502 upstream error.
- What happens if `kmeURL` is present but the URL is not a well-formed absolute URL? Return 400 Bad Request (same as missing/empty).
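The input-validation edge cases can be illustrated with a small check. This is a sketch under assumptions: `validateKmeURL` is a hypothetical helper name, the http/https restriction follows the task list rather than this edge-case list, and the messages mirror the ones named in the task list.

```javascript
// Hypothetical helper: returns an error message for a 400 response,
// or null when the value can be used verbatim as the upstream URL.
function validateKmeURL(kmeURL) {
  if (typeof kmeURL !== 'string' || kmeURL.trim() === '') {
    return 'Bad Request: kmeURL parameter is required';
  }
  const malformed = 'Bad Request: kmeURL must be a well-formed absolute http/https URL';
  let parsed;
  try {
    parsed = new URL(kmeURL); // no base argument, so relative values throw
  } catch {
    return malformed;
  }
  // Assumption from the task list: only http and https are accepted.
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') return malformed;
  return null;
}

for (const value of ['', 'relative/path', 'ftp://example.com/article',
  'https://content.kme.example/articles/123']) {
  console.log(JSON.stringify(value), '->', validateKmeURL(value));
}
```

The parsed `URL` object is used only for validation here; the original string is what would be passed upstream, which is how the "use verbatim, no double-encoding" requirement is respected.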
## Requirements *(mandatory)*
### Functional Requirements
- **FR-001**: The proxy MUST detect when an incoming request URL does NOT end in `/sitemap.xml` AND contains a non-empty `kmeURL` query parameter, and route such requests through the content-fetch flow.
- **FR-002**: The proxy MUST use the `kmeURL` parameter value exactly as provided — without any manipulation, re-encoding, or URL construction — as the target URL for the upstream GET request.
- **FR-003**: The proxy MUST attach an `Authorization: OIDC_id_token {token}` header to the upstream GET request, obtaining the token via `getValidToken()` from `kmeContentSourceAdapterHelpers`.
- **FR-004**: The upstream GET request MUST have a timeout of 10 seconds.
- **FR-005**: On a successful upstream response, the proxy MUST extract the `vkm:articleBody` field from the JSON-LD response body.
- **FR-006**: The proxy MUST return the `vkm:articleBody` value as the HTTP response body with status 200 and `Content-Type: text/html`.
- **FR-007**: If `kmeURL` is absent, empty, or blank, the proxy MUST return HTTP 400 without making an upstream request.
- **FR-008**: If `kmeURL` is present but not a well-formed absolute URL, the proxy MUST return HTTP 400.
- **FR-009**: If the upstream request results in a 4xx response, or the `vkm:articleBody` field is absent, null, or empty in an otherwise successful response, the proxy MUST return HTTP 404 to the caller.
- **FR-010**: If the upstream request results in a 5xx response, a timeout, a network error, or an unparseable response body, the proxy MUST return HTTP 502 to the caller.
- **FR-011**: If the OIDC token cannot be acquired, the proxy MUST return HTTP 502 to the caller.
- **FR-012**: Requests that neither end in `/sitemap.xml` nor carry a `kmeURL` parameter MUST continue to receive the existing 200 OK passthrough response, unchanged.
- **FR-013**: The content-fetch flow MUST be implemented entirely within the VM sandbox file (`src/proxyScripts/kmeContentSourceAdapter.js`) using only the injected context variables (`axios`, `kmeContentSourceAdapterHelpers`, `kme_CSA_settings`, `console`, `URLSearchParams`, `URL`, `req`, `res`) — no new imports or module-level exports are permitted.
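FR-009 and FR-010 amount to a small mapping from upstream failure to proxy status. The sketch below assumes axios-style error objects (`err.code`, `err.response.status`) as described in the task list; `classifyUpstreamError` is a hypothetical name.

```javascript
// Hypothetical helper: maps an axios-style error to the status the proxy returns.
function classifyUpstreamError(err) {
  // Timeout or cancelled request (FR-010).
  if (err.code === 'ECONNABORTED' || err.code === 'ERR_CANCELED') return 502;
  if (err.response) {
    const status = err.response.status;
    // Upstream 4xx becomes 404 (FR-009); any other error status becomes 502 (FR-010).
    return status >= 400 && status < 500 ? 404 : 502;
  }
  // No response at all: network-level failure (FR-010).
  return 502;
}

console.log(classifyUpstreamError({ response: { status: 410 } }));
console.log(classifyUpstreamError({ response: { status: 503 } }));
console.log(classifyUpstreamError({ code: 'ECONNABORTED' }));
```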
### Key Entities
- **KME Article Content**: Represents a single article fetched from the KME Content Service. Identified by its `vkm:url`. Key field: `vkm:articleBody` (HTML string).
- **OIDC Token**: A short-lived bearer credential used to authenticate requests to the KME Content Service. Managed by `getValidToken()`, which handles caching (Redis) and refresh transparently.
- **Proxy Request**: An incoming HTTP request received by the proxy script, carrying routing signals in the URL path (sitemap detection) and query string (`kmeURL`).
## Success Criteria *(mandatory)*
### Measurable Outcomes
- **SC-001**: A consumer submitting a valid `kmeURL` receives the corresponding article HTML body in under 11 seconds end-to-end (10 s upstream timeout + 1 s proxy overhead) under normal network conditions.
- **SC-002**: 100% of requests with a missing, empty, or malformed `kmeURL` parameter receive a 400 response without triggering any upstream call.
- **SC-003**: 100% of upstream 4xx responses and missing/empty `vkm:articleBody` scenarios result in a 404 response to the caller.
- **SC-004**: 100% of upstream 5xx, timeout, and network-error scenarios result in a 502 response to the caller.
- **SC-005**: All existing proxy routes (sitemap flow and passthrough) continue to behave identically to their pre-feature behaviour — zero regression.
- **SC-006**: The unit test suite for the proxy script achieves ≥90% branch coverage across the content-fetch flow, including all four error paths.
## Assumptions
- The `kmeURL` value provided by callers is the verbatim `vkm:url` value from the KME Search API response; the proxy does not need to validate its domain or path structure beyond confirming it is a well-formed absolute URL.
- `getValidToken()` is already implemented, tested, and handles all OIDC token edge cases (expiry, Redis connectivity, refresh). This feature does not modify it.
- The `axios` instance injected into the VM context supports a `timeout` configuration option and throws a recognisable error on timeout (following standard axios behaviour).
- The KME Content Service always returns `Content-Type: application/ld+json` (or similar JSON) for valid article requests; no binary or streaming responses are expected.
- HTTP method is always GET for the content-fetch flow; no authentication or session concept exists on the proxy's inbound side.
- The existing sitemap route detection (URL ends in `/sitemap.xml`) takes priority over the `kmeURL` check — a URL ending in `/sitemap.xml?kmeURL=...` would route to the sitemap flow, not the content-fetch flow.
- Error response bodies are plain text or minimal JSON — no prescribed format is required beyond the correct HTTP status code.

View File

@@ -0,0 +1,227 @@
---
description: "Task list for KME Article Content Fetch (003)"
---
# Tasks: KME Article Content Fetch
**Input**: Design documents from `specs/003-kme-content-fetch/`
**Prerequisites**: plan.md ✅, spec.md ✅, research.md ✅, data-model.md ✅, contracts/http-content-fetch.md ✅, quickstart.md ✅
**Architecture constraints**:
- Zero new files in `src/` — only `src/proxyScripts/kmeContentSourceAdapter.js` and `src/globalVariables/kmeContentSourceAdapterHelpers.js` are modified
- VM sandbox: zero `import`/`export` statements in proxy script or helpers file
- Helpers file is a literal function body (ends with `return { ... }`) — new function added before that block
- Tests use Node.js built-in test runner (`node:test`)
**Files in scope**:
| File | Change |
|------|--------|
| `src/globalVariables/kmeContentSourceAdapterHelpers.js` | Add `extractArticleBody(data)`; export in `return { ... }` |
| `src/proxyScripts/kmeContentSourceAdapter.js` | Add `contentFetchFlow()`; add routing branch |
| `tests/unit/proxy.test.js` | Add content-fetch describe blocks and helper tests |
| `tests/contract/proxy-http.test.js` | Add content-fetch contract tests |
| `CHANGELOG.md` | Add feature entry |
## Format: `[ID] [P?] [Story?] Description`
- **[P]**: Can run in parallel (different files, no dependencies on incomplete tasks)
- **[Story]**: Which user story this task belongs to (US1 to US4)
- All file paths are relative to repository root
---
## Phase 1: Setup
**Purpose**: Confirm baseline before any modifications
- [X] T001 Run `npm test` from repository root to confirm all existing tests pass and record the baseline count
**Checkpoint**: Baseline confirmed — no pre-existing failures
---
## Phase 2: Foundational (Blocking Prerequisite)
**Purpose**: Add `extractArticleBody` pure helper — required by `contentFetchFlow()` in every user story phase
**⚠️ CRITICAL**: Phase 3 implementation cannot begin until T002 and T003 are complete; T004 is independently testable after T002+T003
- [X] T002 Add `extractArticleBody(data)` function body to `src/globalVariables/kmeContentSourceAdapterHelpers.js` — insert immediately before the existing `return { ... }` block; implementation: guard for non-object input (`if (!data || typeof data !== 'object') return null`), extract `data['vkm:articleBody']`, return null if field is null/undefined/non-string/empty/whitespace, otherwise return the string
- [X] T003 Add `extractArticleBody` to the exports in the `return { ... }` block at the bottom of `src/globalVariables/kmeContentSourceAdapterHelpers.js` so the injected VM context exposes the new function
- [X] T004 [P] Add `extractArticleBody helper` describe block to `tests/unit/proxy.test.js` covering all 7 edge cases per data-model.md: valid HTML string → returns string; empty string → null; whitespace-only string → null; null field value → null; field absent (`{}`) → null; null input → null; non-object input (string) → null — no mocking needed, call the helper directly
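As a rough illustration of T002 to T004, the sketch below puts a helper into a function-body source string, evaluates it, and runs the seven edge cases as a table. Assumptions: `helpersSource` is a hypothetical stand-in for the helpers file, and `new Function` is used only so the sketch runs without the real VM wrapping done by `server.js`.

```javascript
// Hypothetical stand-in for the helpers file: a literal function body with no
// import/export, ending in a `return { ... }` block.
const helpersSource = `
  function extractArticleBody(data) {
    if (!data || typeof data !== 'object') return null;
    const body = data['vkm:articleBody'];
    if (typeof body !== 'string' || body.trim() === '') return null;
    return body;
  }
  return { extractArticleBody };
`;

// Not how server.js wraps the file; a dependency-free approximation.
const { extractArticleBody } = new Function(helpersSource)();

// The seven T004 edge cases as [label, input, expected] rows.
const cases = [
  ['valid HTML string', { 'vkm:articleBody': '<p>Hello</p>' }, '<p>Hello</p>'],
  ['empty string', { 'vkm:articleBody': '' }, null],
  ['whitespace-only string', { 'vkm:articleBody': '   ' }, null],
  ['null field value', { 'vkm:articleBody': null }, null],
  ['field absent', {}, null],
  ['null input', null, null],
  ['non-object input', 'a string', null],
];

const failures = cases.filter(([, input, expected]) => extractArticleBody(input) !== expected);
console.log(failures.length === 0 ? 'all cases pass' : failures.map(([label]) => label));
```

In the real suite each row would presumably become one test inside the `extractArticleBody helper` describe block; the table form just makes the expected results easy to scan.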
**Checkpoint**: `extractArticleBody` is implemented, exported, and unit-tested; run `npm run test:unit` to confirm T004 passes
---
## Phase 3: User Story 1 — Happy Path Article Fetch (Priority: P1) 🎯 MVP
**Goal**: Proxy receives a valid `?kmeURL=` request, obtains an OIDC token, fetches the upstream article, extracts `vkm:articleBody`, and returns it as `200 text/html`
**Independent Test**: `curl "http://localhost:3000/?kmeURL=https://content.kme.example/articles/123"` returns `200 OK`, `Content-Type: text/html`, and body matching `vkm:articleBody` from the mock upstream
### Implementation for User Story 1
- [X] T005 [US1] Implement complete `contentFetchFlow()` async function in `src/proxyScripts/kmeContentSourceAdapter.js` following the 9-step design in plan.md: (1) extract `kmeURL` via `new URL(req.url, 'http://localhost').searchParams.get('kmeURL') ?? ''`, (2) empty/blank → 400, (3) malformed/non-http(s) → 400, (4) `validateSettings` missing field → 500, (5) `getValidToken` throws → 502, (6) `axios.get(kmeURL, { headers: { Authorization: 'OIDC_id_token {token}' }, timeout: 10000 })` — ECONNABORTED/ERR_CANCELED → 502, upstream 4xx → 404, upstream 5xx → 502, network error → 502, (7) string body fallback `JSON.parse` — failure → 502; non-object → 502, (8) `extractArticleBody(data)` → null → 404, (9) `res.writeHead(200, { 'Content-Type': 'text/html' }); res.end(articleBody)`
- [X] T006 [US1] Add content-fetch routing branch to the URL dispatch block in `src/proxyScripts/kmeContentSourceAdapter.js`: insert `else if (new URL(req.url, 'http://localhost').searchParams.has('kmeURL')) { await contentFetchFlow(); }` between the existing sitemap check and the `oidcAuthFlow()` fallback
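The dispatch order described in T006 can be sketched as a pure function that names the flow a request would take. This is illustrative only: `selectFlow` is a hypothetical name, and the `pathname.endsWith('/sitemap.xml')` check is an assumption about how the existing sitemap detection works.

```javascript
// Hypothetical dispatcher: returns the name of the flow a request URL would take.
function selectFlow(reqUrl) {
  // req.url is path-only, so a placeholder base is needed to parse it.
  const url = new URL(reqUrl, 'http://localhost');
  // Assumed sitemap check; it runs first, so it wins over kmeURL.
  if (url.pathname.endsWith('/sitemap.xml')) return 'sitemapFlow';
  // has() is true even for `?kmeURL=`, so empty values reach the 400 guard.
  if (url.searchParams.has('kmeURL')) return 'contentFetchFlow';
  return 'oidcAuthFlow';
}

for (const reqUrl of ['/sitemap.xml?kmeURL=https://content.kme.example/articles/123',
  '/?kmeURL=', '/?someOtherParam=value']) {
  console.log(reqUrl, '->', selectFlow(reqUrl));
}
```

Using `has` rather than `get` for the routing decision is what lets an empty `kmeURL` be rejected with 400 inside `contentFetchFlow()` instead of silently falling through to the passthrough.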
### Tests for User Story 1
- [X] T007 [P] [US1] Add `US-content-fetch: happy path` describe block to `tests/unit/proxy.test.js` with two tests: (a) stub `getValidToken` returning cached token + stub `axios.get` returning `{ data: { 'vkm:articleBody': '<p>Hello</p>' } }` → assert status 200, `Content-Type: text/html`, body `<p>Hello</p>`; (b) stub `getValidToken` simulating cache miss (returns a freshly acquired token) → same 200 assertion
- [X] T008 [P] [US1] Add happy path contract test to `tests/contract/proxy-http.test.js`: start a real mock HTTP server that returns `{ "vkm:articleBody": "<p>Contract test article</p>" }` with `Content-Type: application/ld+json`; start a real mock token server; issue `GET /?kmeURL={mock-server-url}` to the proxy; assert status 200, `Content-Type: text/html`, response body equals `<p>Contract test article</p>`; verify total round-trip is under 11 s (SC-001)
**Checkpoint**: `npm run test:unit` and `npm run test:contract` both pass for happy path; manually verify with `curl` per quickstart.md
---
## Phase 4: User Story 2 — Missing or Empty kmeURL Parameter (Priority: P2)
**Goal**: Requests with absent, empty, whitespace, or malformed `kmeURL` receive a 400 response with no upstream call made
**Independent Test**: `curl -o /dev/null -w "%{http_code}" "http://localhost:3000/?kmeURL="` returns `400`; `curl -o /dev/null -w "%{http_code}" "http://localhost:3000/?kmeURL=not-a-url"` returns `400`
### Tests for User Story 2
- [X] T009 [P] [US2] Add `US-content-fetch: input validation` describe block to `tests/unit/proxy.test.js` with 6 tests using a spy on `axios.get` to assert it is never called: (a) `?kmeURL` absent (no kmeURL param) → routes to `oidcAuthFlow` → 200 (confirms FR-012); (b) `?kmeURL=` empty string → 400, body `Bad Request: kmeURL parameter is required`; (c) `?kmeURL=%20` whitespace-only → 400; (d) `?kmeURL=relative/path` → 400, body `Bad Request: kmeURL must be a well-formed absolute http/https URL`; (e) `?kmeURL=ftp://example.com/article` non-http protocol → 400; (f) `?kmeURL=:::malformed` → 400
**Checkpoint**: `npm run test:unit` passes for validation tests; confirm no upstream stubs are invoked in any 400 scenario
---
## Phase 5: User Story 3 — Upstream Failure & Missing Article Body (Priority: P3)
**Goal**: All upstream error conditions (token failure, 4xx, 5xx, timeout, network error, bad body, missing/empty `vkm:articleBody`) return the correct 404 or 502 status to the caller
**Independent Test**: Stub `axios.get` to throw an ECONNABORTED error; verify proxy returns 502. Stub `getValidToken` to throw; verify proxy returns 502. Stub `axios.get` returning `{ data: {} }`; verify proxy returns 404.
### Tests for User Story 3
- [X] T010 [P] [US3] Add `US-content-fetch: upstream errors` describe block to `tests/unit/proxy.test.js` with 7 tests: (a) `getValidToken` throws → 502, body `Bad Gateway: token acquisition failed`; (b) `axios.get` throws with `{ response: { status: 404 } }` → 404, body `Not Found: article not found at upstream`; (c) `axios.get` throws with `{ response: { status: 410 } }` → 404; (d) `axios.get` throws with `{ response: { status: 503 } }` → 502, body `Bad Gateway: upstream error HTTP 503`; (e) `axios.get` throws with `{ code: 'ECONNABORTED' }` → 502, body `Bad Gateway: upstream request timed out`; (f) `axios.get` throws with `{ code: 'ERR_CANCELED' }` → 502; (g) `axios.get` throws with `{ message: 'ENOTFOUND' }` (no `response`, no code) → 502, body contains `Bad Gateway:`
- [X] T011 [P] [US3] Add `US-content-fetch: body parsing` describe block to `tests/unit/proxy.test.js` with 5 tests (all require valid `getValidToken` stub): (a) `axios.get` returns `{ data: 'not json{{{' }` (string, unparseable) → 502, body `Bad Gateway: unparseable response from upstream`; (b) `axios.get` returns `{ data: { 'vkm:articleBody': undefined } }` (field absent) → 404, body `Not Found: article body not present in upstream response`; (c) field is `null` → 404; (d) field is `''` empty string → 404; (e) field is `' '` whitespace-only → 404
- [X] T012 [P] [US3] Add contract error tests to `tests/contract/proxy-http.test.js`: (a) mock upstream server returns HTTP 404 → proxy returns 404; (b) mock upstream server returns HTTP 503 → proxy returns 502; (c) mock server accepts connection but never responds (use `server.on('request', () => {})`) → proxy returns 502 within 12 s and does not hang
**Checkpoint**: All 12 unit tests in T010+T011 pass; all 3 contract error tests in T012 pass
---
## Phase 6: User Story 4 — Passthrough Behaviour Preserved (Priority: P4)
**Goal**: Requests without `kmeURL` and without `/sitemap.xml` suffix continue to receive the existing 200 OK auth-check passthrough — zero regression
**Independent Test**: `curl -o /dev/null -w "%{http_code}" "http://localhost:3000/"` returns `200` and body is `Authorized` (unchanged)
### Tests for User Story 4
- [X] T013 [US4] Add `US-content-fetch: passthrough preserved` describe block to `tests/unit/proxy.test.js` with 1 test: GET `/?someOtherParam=value` (no `kmeURL`, not sitemap) → assert status 200, body `Authorized`, and confirm `axios.get` is never called (spy asserts not called) — verifies FR-012 and SC-005
**Checkpoint**: Passthrough test passes; run full `npm test` to confirm zero regressions across entire suite
---
## Final Phase: Polish & Cross-Cutting Concerns
**Purpose**: Changelog documentation and final validation
- [X] T014 [P] Add entry to `CHANGELOG.md` for feature `003-kme-content-fetch`: document new `contentFetchFlow()` in `kmeContentSourceAdapter.js` (routes `?kmeURL=` requests, handles all error paths 400/404/500/502, 10 s timeout), new `extractArticleBody(data)` in `kmeContentSourceAdapterHelpers.js`, new unit test describe blocks in `tests/unit/proxy.test.js`, and new contract tests in `tests/contract/proxy-http.test.js`
- [X] T015 Run full test suite `npm test` and confirm all tests pass; run the four quickstart.md `curl` smoke tests (valid kmeURL passthrough, empty kmeURL → 400, malformed kmeURL → 400, sitemap → 200) to validate end-to-end behaviour
---
## Dependencies & Execution Order
### Phase Dependencies
- **Setup (Phase 1)**: No dependencies — run immediately
- **Foundational (Phase 2)**: Depends on Setup ✅ — **BLOCKS** all user story phases
- T002 → T003 (sequential, same file)
- T004 [P] can run after T002+T003 (different file: test file)
- **US1 (Phase 3)**: Depends on Foundational complete (T002+T003)
- T005 → T006 (sequential, same file)
- T007 [P] and T008 [P] can run after T005+T006 (different files)
- **US2 (Phase 4)**: Depends on T005+T006 complete (tests the validation guards inside `contentFetchFlow`)
- **US3 (Phase 5)**: Depends on T005+T006 complete (tests the error guards inside `contentFetchFlow`)
- **US4 (Phase 6)**: Depends on T006 complete (tests the routing branch)
- **Polish (Final)**: Depends on all user story phases complete
### User Story Dependencies
- **US1 (P1)**: Depends only on Foundational phase
- **US2 (P2)**: Depends on US1 implementation (T005+T006) — validation lives inside `contentFetchFlow()`
- **US3 (P3)**: Depends on US1 implementation (T005+T006) — error paths live inside `contentFetchFlow()`
- **US4 (P4)**: Depends on the T006 routing branch; tests that the passthrough is still reached when no `kmeURL` is present
### Within Each Phase
- Source file edits must complete before their corresponding test tasks
- T002 must complete before T003 (same file, sequential)
- T005 must complete before T006 (same file, sequential)
- T005+T006 must complete before T007, T008, T009, T010, T011, T012, T013
### Parallel Opportunities
- T004 [P] runs in parallel with T005+T006 (different files: test file vs source file)
- After T005+T006: T007 [P], T008 [P], T009 [P], T010 [P], T011 [P], T012 [P] can all run in parallel (different describe blocks or different test files)
- T013 and T014 [P] run in parallel (different files)
---
## Parallel Execution Examples
### Foundational Phase Parallelism
```
# After T002+T003 complete, run simultaneously:
Task A: T004 — Write extractArticleBody unit tests in tests/unit/proxy.test.js
Task B: T005 — Implement contentFetchFlow() in src/proxyScripts/kmeContentSourceAdapter.js
```
### After T005+T006 Complete
```
# These 6 tasks can all run in parallel (different describe blocks / different files):
Task A: T007 — Happy path unit tests (proxy.test.js)
Task B: T008 — Happy path contract test (proxy-http.test.js)
Task C: T009 — Input validation unit tests (proxy.test.js, separate describe block)
Task D: T010 — Upstream error unit tests (proxy.test.js, separate describe block)
Task E: T011 — Body parsing unit tests (proxy.test.js, separate describe block)
Task F: T012 — Contract error tests (proxy-http.test.js, separate describe block)
```
---
## Implementation Strategy
### MVP First (User Story 1 Only)
1. Complete Phase 1: Setup baseline verification
2. Complete Phase 2: Add `extractArticleBody` helper (CRITICAL — blocks everything)
3. Complete Phase 3: Implement `contentFetchFlow()`, routing branch, and happy path tests
4. **STOP and VALIDATE**: `npm run test:unit` + `npm run test:contract` pass; manual `curl` smoke test works
5. **Deploy/demo if ready** — consumers can now fetch articles via the proxy
### Incremental Delivery
1. Foundation + US1 → happy path working → Demo MVP
2. Add US2 tests → validate 400 rejection works
3. Add US3 tests → validate error handling works
4. Add US4 test → confirm no regression
5. Polish → CHANGELOG + final `npm test`
### Single-Developer Sequence (Optimal Order)
```
T001 → T002 → T003 → T005 → T006 → T004* → T007 → T009 → T010 → T011 → T013 → T008 → T012 → T014 → T015
(* T004 can be done any time after T003 — fits naturally here before test sprint)
```
---
## Notes
- **VM sandbox constraint**: `contentFetchFlow()` must not contain any `import` or `require` — all dependencies (`axios`, `kmeContentSourceAdapterHelpers`, `kme_CSA_settings`, `URL`, `URLSearchParams`, `console`, `req`, `res`) arrive via the injected VM context
- **Helpers file constraint**: `extractArticleBody` must be inserted as a plain `function` declaration before the existing `return { ... }` block — no module syntax
- **`[P]` tasks**: different files with no dependency on incomplete tasks in the same file
- **`[Story]` labels**: map each test task back to the user story it validates for traceability
- Each user story's test tasks are independently runnable with `node --test tests/unit/proxy.test.js` (filter by describe block name, e.g. with `--test-name-pattern` on Node.js versions that support it)
- Commit after each logical group (e.g., after T002+T003, after T005+T006, after all unit test tasks)
- Verify `npm test` green at each checkpoint before proceeding to next phase

View File

@@ -1,7 +1,7 @@
// Helpers for kmeContentSourceAdapter.js
// This file is the literal body of a function — no imports or exports.
// server.js wraps and executes it as: (function() { <this file> })()
// Context globals available: redis, axios, console, xmlBuilder, URLSearchParams, kme_CSA_settings
// Context globals available: redis, axios, console, xmlbuilder2, URLSearchParams, kme_CSA_settings
/**
* Returns the first missing required field name, or null if all present.
@@ -21,23 +21,36 @@ function validateSettings(settings, requiredFields) {
* structure returned by the KME Knowledge Search Service:
* data["hydra:member"][n] → SearchResultItem
* data["hydra:member"][n]["hydra:member"] → SearchResultItemFragment[] (has vkm:url)
*
* For each SearchResultItem, only the fragment with the latest vkm:datePublished
* is returned. If no vkm:datePublished is present the fragment is treated as
* epoch 0, so dated fragments always win over undated ones.
*
* @param {object} data response.data from the search API
* @returns {object[]}
*/
function extractHydraItems(data) {
const topMembers = data['hydra:member'] ?? [];
return topMembers.flatMap(resultItem => resultItem['hydra:member'] ?? []);
return topMembers.map(resultItem => {
const fragments = resultItem['hydra:member'] ?? [];
if (fragments.length === 0) return null;
return fragments.reduce((latest, current) => {
const latestDate = new Date(latest['vkm:datePublished'] ?? 0).getTime();
const currentDate = new Date(current['vkm:datePublished'] ?? 0).getTime();
return currentDate > latestDate ? current : latest;
});
}).filter(Boolean);
}
/**
* Builds a Sitemaps-protocol 0.9 XML document from the given items.
* Uses xmlBuilder from the enclosing VM context.
* Uses xmlbuilder2 from the enclosing VM context.
* @param {object[]} items SearchResultItemFragment objects with vkm:url
* @param {string} proxyBaseUrl base URL for <loc> values
* @returns {string} serialised XML
*/
function buildSitemapXml(items, proxyBaseUrl) {
const doc = xmlBuilder({ version: '1.0', encoding: 'UTF-8' });
const doc = xmlbuilder2.create({ version: '1.0', encoding: 'UTF-8' });
const urlset = doc.ele('urlset', { xmlns: 'http://www.sitemaps.org/schemas/sitemap/0.9' });
for (const item of items) {
const vkmUrl = item['vkm:url'];
@@ -120,9 +133,26 @@ async function getValidToken(reqUrl, reqMethod) {
}
}
/**
* Extracts the article body HTML from a KME content API response.
* Returns the value of data['vkm:articleBody'] if it is a non-empty, non-whitespace string,
* otherwise returns null.
* @param {any} data response.data from the content API
* @returns {string|null}
*/
function extractArticleBody(data) {
if (!data || typeof data !== 'object') return null;
// KME returns JSON-LD ('vkm:articleBody') with Accept: application/ld+json,
// or a simplified representation ('articleBody') otherwise.
const body = data['vkm:articleBody'] ?? data['articleBody'];
if (body === null || body === undefined || typeof body !== 'string' || !body.trim()) return null;
return body;
}
return {
validateSettings,
extractHydraItems,
buildSitemapXml,
getValidToken,
extractArticleBody,
};

View File

@@ -1,18 +1,117 @@
(async () => {
// ---------------------------------------------------------------------------
// OIDC auth flow — existing non-sitemap behaviour, unchanged
// Content fetch flow — GET /?kmeURL=<upstream-article-url>
// ---------------------------------------------------------------------------
async function oidcAuthFlow() {
const missingField = kmeContentSourceAdapterHelpers.validateSettings(
kme_CSA_settings,
['tokenUrl', 'username', 'password', 'clientId', 'scope'],
);
if (missingField) throw new Error('missing required field: ' + missingField);
async function contentFetchFlow() {
const searchParams = new URL(req.url, 'http://localhost').searchParams;
const kmeURL = searchParams.get('kmeURL') ?? '';
await kmeContentSourceAdapterHelpers.getValidToken(req.url, req.method);
// Step 2: Empty / blank → 400
if (!kmeURL || !kmeURL.trim()) {
res.writeHead(400, { 'Content-Type': 'application/json' });
res.end(JSON.stringify({ error: 'Bad Request: kmeURL parameter is required' }));
return;
}
res.writeHead(200, { 'Content-Type': 'text/plain' });
res.end('Authorized');
// Step 3: Validate URL shape
let parsedURL;
try {
parsedURL = new URL(kmeURL);
} catch {
res.writeHead(400, { 'Content-Type': 'application/json' });
res.end(JSON.stringify({ error: 'Bad Request: kmeURL must be a well-formed absolute http/https URL' }));
return;
}
if (!['http:', 'https:'].includes(parsedURL.protocol)) {
res.writeHead(400, { 'Content-Type': 'application/json' });
res.end(JSON.stringify({ error: 'Bad Request: kmeURL must be a well-formed absolute http/https URL' }));
return;
}
// Step 5: Acquire OIDC token
let token;
try {
token = await kmeContentSourceAdapterHelpers.getValidToken(req.url, req.method);
} catch (err) {
console.error({ message: 'Failed to acquire OIDC token for content fetch', error: err.message });
res.writeHead(502, { 'Content-Type': 'application/json' });
res.end(JSON.stringify({ error: 'Bad Gateway: token acquisition failed' }));
return;
}
// Step 6: Fetch content from upstream
let contentResponse;
try {
contentResponse = await axios.get(kmeURL, {
headers: {
Authorization: `OIDC_id_token ${token}`,
Accept: 'application/ld+json',
},
timeout: 10000,
});
} catch (err) {
if (err.code === 'ECONNABORTED' || err.code === 'ERR_CANCELED') {
res.writeHead(502, { 'Content-Type': 'application/json' });
res.end(JSON.stringify({ error: 'Bad Gateway: upstream request timed out' }));
} else if (err.response) {
const status = err.response.status;
if (status >= 400 && status < 500) {
res.writeHead(404, { 'Content-Type': 'application/json' });
res.end(JSON.stringify({ error: 'Not Found: article not found at upstream' }));
} else {
console.error({ message: 'KME content service error', error: err.message });
res.writeHead(502, { 'Content-Type': 'application/json' });
res.end(JSON.stringify({ error: `Bad Gateway: upstream error HTTP ${status}` }));
}
} else {
console.error({ message: 'KME content network error', error: err.message });
res.writeHead(502, { 'Content-Type': 'application/json' });
res.end(JSON.stringify({ error: `Bad Gateway: ${err.message}` }));
}
return;
}
// Step 7: Normalise response body. axios can hand back the payload as an
// unparsed string; parse it here and reject anything that is not a JSON object.
let data;
if (typeof contentResponse.data === 'string') {
let parsed;
try {
parsed = JSON.parse(contentResponse.data);
} catch {
res.writeHead(502, { 'Content-Type': 'application/json' });
res.end(JSON.stringify({ error: 'Bad Gateway: unparseable response from upstream' }));
return;
}
if (!parsed || typeof parsed !== 'object') {
res.writeHead(502, { 'Content-Type': 'application/json' });
res.end(JSON.stringify({ error: 'Bad Gateway: unparseable response from upstream' }));
return;
}
data = parsed;
} else {
data = contentResponse.data;
}
// Step 8: Extract article body
const articleBody = kmeContentSourceAdapterHelpers.extractArticleBody(data);
if (!articleBody) {
res.writeHead(404, { 'Content-Type': 'application/json' });
res.end(JSON.stringify({ error: 'Not Found: article body not present in upstream response' }));
return;
}
// Step 9: Return HTML wrapped in a full document
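// vkm:name is plain text, so escape it before interpolating it into <title>.
const title = String(data['vkm:name'] ?? '')
.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');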
const html = `<!DOCTYPE html>
<html>
<head><title>${title}</title></head>
<body>
${articleBody}
</body>
</html>`;
res.writeHead(200, { 'Content-Type': 'text/html; charset=utf-8' });
res.end(html);
}
// ---------------------------------------------------------------------------
@@ -21,7 +120,7 @@
async function sitemapFlow() {
const missingSitemapField = kmeContentSourceAdapterHelpers.validateSettings(
kme_CSA_settings,
['searchApiBaseUrl', 'tenant', 'proxyBaseUrl'],
['searchApiBaseUrl', 'tenant'],
);
if (missingSitemapField) {
console.error({ message: 'Sitemap config error', missingField: missingSitemapField });
@@ -30,7 +129,7 @@
return;
}
const { searchApiBaseUrl, tenant, proxyBaseUrl } = kme_CSA_settings;
const { searchApiBaseUrl, tenant } = kme_CSA_settings;
const missingOidcField = kmeContentSourceAdapterHelpers.validateSettings(
kme_CSA_settings,
@@ -38,18 +137,71 @@
);
if (missingOidcField) throw new Error('missing required field: ' + missingOidcField);
// Derive the proxy base URL from the incoming request so <loc> elements work
// regardless of where the proxy is deployed. Strip /sitemap.xml from the path
// and keep the trailing slash so that ?kmeURL=... can be appended directly.
const proto = req.headers['x-forwarded-proto'] || 'https';
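// Take the host from the request only: the configured proxyBaseUrl is a full
// URL (scheme included), so it is used as the whole-URL fallback below.
const host = req.headers['x-forwarded-host'] || req.headers.host;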
const basePath = new URL(req.url, 'http://localhost').pathname.replace(/\/sitemap\.xml$/, '/');
const proxyBaseUrl = host ? `${proto}://${host}${basePath}` : (kme_CSA_settings.proxyBaseUrl || '');
console.debug({ message: 'Sitemap flow: computed proxyBaseUrl', proxyBaseUrl });
try {
console.debug({ message: 'Sitemap flow: obtaining token', url: req.url });
const token = await kmeContentSourceAdapterHelpers.getValidToken(req.url, req.method);
const searchUrl = `${searchApiBaseUrl}/${tenant}/search?query=*&size=100&category=vkm:ArticleCategory`;
console.info({ message: 'Sitemap flow: calling search API', url: searchUrl });
const searchResponse = await axios.get(searchUrl, {
const reqParams = new URL(req.url, 'http://localhost').searchParams;
const pageSize = reqParams.get('size') ?? '100';
const searchParams = new URLSearchParams({
query: reqParams.get('query') ?? '*',
size: pageSize,
category: reqParams.get('category') ?? 'vkm:ArticleCategory',
});
const searchUrl = `${searchApiBaseUrl}/${tenant}/search?${searchParams}`;
console.info({ message: 'Sitemap flow: calling search API (page 1)', url: searchUrl });
const firstResponse = await axios.get(searchUrl, {
headers: { Authorization: `OIDC_id_token ${token}`, 'Accept': 'application/ld+json' },
timeout: 10000,
});
const items = kmeContentSourceAdapterHelpers.extractHydraItems(searchResponse.data);
const firstData = firstResponse.data;
let allData = [firstData];
// Paginate: hydra:last is nested inside hydra:view.
// hydra:view is absent when all results fit on one page — no pagination needed.
// start= is a 0-based item index; subsequent page start values increment by size.
// e.g. 22 results, size=5 → hydra:view.hydra:last start=20, pages at start=5,10,15,20
const hydraLast = firstData['hydra:view']?.['hydra:last'];
if (hydraLast) {
const lastUrl = new URL(hydraLast);
const lastStart = parseInt(lastUrl.searchParams.get('start') ?? '0', 10);
const size = parseInt(lastUrl.searchParams.get('size') ?? pageSize, 10);
if (lastStart > 0 && size > 0) {
const pageUrls = [];
for (let start = size; start <= lastStart; start += size) {
const pageUrl = new URL(searchUrl);
pageUrl.searchParams.set('start', String(start));
pageUrls.push(pageUrl.toString());
}
console.info({ message: 'Sitemap flow: fetching additional pages', count: pageUrls.length });
const pageResponses = await Promise.all(
pageUrls.map(url => axios.get(url, {
headers: { Authorization: `OIDC_id_token ${token}`, 'Accept': 'application/ld+json' },
timeout: 10000,
}))
);
allData = [firstData, ...pageResponses.map(r => r.data)];
}
}
const SITEMAP_MAX_URLS = 50_000;
const allItems = allData.flatMap(
data => kmeContentSourceAdapterHelpers.extractHydraItems(data)
);
const items = allItems.length > SITEMAP_MAX_URLS ? allItems.slice(0, SITEMAP_MAX_URLS) : allItems;
if (allItems.length > SITEMAP_MAX_URLS) {
console.warn({ message: 'Sitemap flow: result set truncated to 50,000 (sitemaps.org limit)', total: allItems.length });
}
console.debug({ message: 'Sitemap flow: items received', count: items.length });
const xml = kmeContentSourceAdapterHelpers.buildSitemapXml(items, proxyBaseUrl);
@@ -78,10 +230,13 @@
// Entry point — URL routing
// ---------------------------------------------------------------------------
try {
if (req.url.endsWith('/sitemap.xml')) {
if (req.url.endsWith('/sitemap.xml') || new URL(req.url, 'http://localhost').pathname.endsWith('/sitemap.xml')) {
await sitemapFlow();
} else if (new URL(req.url, 'http://localhost').searchParams.has('kmeURL')) {
await contentFetchFlow();
} else {
await oidcAuthFlow();
res.writeHead(404, { 'Content-Type': 'text/plain' });
res.end('Not Found');
}
} catch (err) {
let message;
@@ -92,8 +247,8 @@
} else {
message = err.message;
}
console.error({ message: 'Auth failed', error: message, url: req.url });
res.writeHead(401, { 'Content-Type': 'text/plain' });
res.end('Unauthorized: ' + message);
console.error({ message: 'Request failed', error: message, url: req.url });
res.writeHead(500, { 'Content-Type': 'text/plain' });
res.end('Internal Server Error: ' + message);
}
})()
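
The pagination arithmetic in `sitemapFlow` (first page has no `start`, later pages step by `size` up to the `start` value found in `hydra:last`) can be checked on its own. A minimal sketch follows; `remainingPageStarts` is a hypothetical helper name that does not exist in the adapter, and the URL is a made-up example that matches the "22 results, size=5" case from the comments.

```javascript
// Hypothetical helper (not part of the adapter): given the hydra:last URL,
// list the start offsets of every page after the first one.
function remainingPageStarts(hydraLast, fallbackSize) {
  const lastUrl = new URL(hydraLast);
  const lastStart = parseInt(lastUrl.searchParams.get('start') ?? '0', 10);
  const size = parseInt(lastUrl.searchParams.get('size') ?? String(fallbackSize), 10);
  const starts = [];
  // A NaN or non-positive value fails these comparisons, so bad input yields [].
  if (lastStart > 0 && size > 0) {
    for (let start = size; start <= lastStart; start += size) starts.push(start);
  }
  return starts;
}

const last = 'https://search.example.com/api/test-tenant/search?query=*&size=5&start=20';
console.log(remainingPageStarts(last, 100)); // [ 5, 10, 15, 20 ]
```

Because every offset is known after the first response, the remaining pages can be requested concurrently, which is what the `Promise.all` call in the flow relies on.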

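The sitemap `<loc>` values depend on a proxy base URL derived from the incoming request. The sketch below shows one way to express that derivation as a pure function, assuming the configured `proxyBaseUrl` is treated purely as a fallback when no host header is present. `deriveProxyBaseUrl` is a hypothetical name, and the `?kmeURL=` concatenation is inferred from the expected `<loc>` strings in the tests rather than from `buildSitemapXml` itself.

```javascript
// Hypothetical helper (not part of the adapter): derive the public base URL
// of the proxy from forwarded headers, falling back to a configured value.
function deriveProxyBaseUrl(req, fallback = '') {
  const proto = req.headers['x-forwarded-proto'] || 'https';
  const host = req.headers['x-forwarded-host'] || req.headers.host;
  if (!host) return fallback;
  // Drop the query string and swap a trailing /sitemap.xml for a single slash.
  const basePath = new URL(req.url, 'http://localhost').pathname.replace(/\/sitemap\.xml$/, '/');
  return `${proto}://${host}${basePath}`;
}

const req = {
  url: '/sitemap.xml?size=5',
  headers: { host: 'proxy.example.com', 'x-forwarded-proto': 'https' },
};
const base = deriveProxyBaseUrl(req);
console.log(base); // https://proxy.example.com/

// Assumed <loc> shape, matching the strings asserted in the sitemap tests.
console.log(`${base}?kmeURL=${encodeURIComponent('https://kme.example.com/doc-1')}`);
// https://proxy.example.com/?kmeURL=https%3A%2F%2Fkme.example.com%2Fdoc-1
```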
View File

@@ -7,7 +7,7 @@ import vm from "node:vm";
import axios from "axios";
import { v4 as uuidv4 } from "uuid";
import jwt from "jsonwebtoken";
import { create as xmlBuilder } from "xmlbuilder2";
import xmlbuilder2 from "xmlbuilder2";
import { logger } from "./logger.js";
import { createClient } from "redis";
@@ -22,7 +22,7 @@ const globalVMContext = {
axios,
uuidv4,
jwt,
xmlBuilder,
xmlbuilder2,
};
let globalVariableContext = {};

View File

@@ -6,7 +6,7 @@ import { readFileSync } from 'node:fs';
import { fileURLToPath } from 'node:url';
import { dirname, join } from 'node:path';
import axios from 'axios';
import { create as xmlBuilder } from 'xmlbuilder2';
import xmlbuilder2 from 'xmlbuilder2';
const __filename = fileURLToPath(import.meta.url);
const __dirname = dirname(__filename);
@@ -50,7 +50,6 @@ function startMockServer(statusCode, responseBody) {
/**
* Start a mock token server (alias for backwards compatibility).
*/
const startMockTokenServer = startMockServer;
/** Build an in-memory Redis fake. */
function makeRedisFake() {
@@ -79,79 +78,6 @@ function makeRes() {
// Contract: 200 OK — successful OIDC token fetch
// ---------------------------------------------------------------------------
describe('proxy HTTP contract: 200 OK', () => {
test('fresh token fetch → 200 Authorized with Content-Type text/plain', async () => {
const mock = await startMockTokenServer(200, {
id_token: 'contract-token',
expires_in: 9_999_999_999,
});
try {
const res = makeRes();
const redis = makeRedisFake();
const kme_CSA_settings = {
tokenUrl: mock.url,
username: 'user',
password: 'pass',
clientId: 'client',
scope: 'openid',
};
const deps = { URLSearchParams, console, axios, xmlBuilder, redis, kme_CSA_settings };
const ctx = vm.createContext({
...deps,
kmeContentSourceAdapterHelpers: makeHelpers(deps),
req: { url: '/', method: 'GET', headers: {} },
res,
});
await proxyScript.runInContext(ctx);
assert.strictEqual(res.statusCode, 200);
assert.strictEqual(res.body, 'Authorized');
assert.strictEqual(res.headers['Content-Type'], 'text/plain');
} finally {
await mock.close();
}
});
});
// ---------------------------------------------------------------------------
// Contract: 401 Unauthorized — token service returns 4xx
// ---------------------------------------------------------------------------
describe('proxy HTTP contract: 401 Unauthorized', () => {
test('token service 401 → proxy 401 with Unauthorized: prefix', async () => {
const mock = await startMockTokenServer(401, {});
try {
const res = makeRes();
const redis = makeRedisFake();
const kme_CSA_settings = {
tokenUrl: mock.url,
username: 'bad-user',
password: 'bad-pass',
clientId: 'client',
scope: 'openid',
};
const deps = { URLSearchParams, console, axios, xmlBuilder, redis, kme_CSA_settings };
const ctx = vm.createContext({
...deps,
kmeContentSourceAdapterHelpers: makeHelpers(deps),
req: { url: '/', method: 'GET', headers: {} },
res,
});
await proxyScript.runInContext(ctx);
assert.strictEqual(res.statusCode, 401);
assert.match(res.body, /^Unauthorized: .+/);
assert.strictEqual(res.headers['Content-Type'], 'text/plain');
} finally {
await mock.close();
}
});
});
// ---------------------------------------------------------------------------
// Contract: sitemap endpoint (T005, T012)
// ---------------------------------------------------------------------------
@@ -178,7 +104,7 @@ describe('sitemap endpoint', () => {
tenant: 'test',
proxyBaseUrl: 'https://proxy.example.com',
};
const deps = { URLSearchParams, console, axios, xmlBuilder, redis, kme_CSA_settings };
const deps = { URLSearchParams, URL, console, axios, xmlbuilder2, redis, kme_CSA_settings };
const ctx = vm.createContext({
...deps,
kmeContentSourceAdapterHelpers: makeHelpers(deps),
@@ -266,41 +192,133 @@ describe('sitemap endpoint', () => {
});
});
// ---------------------------------------------------------------------------
// Non-sitemap endpoint regression (T010)
// Content fetch: happy path contract test — T008
// ---------------------------------------------------------------------------
describe('non-sitemap endpoint (regression)', () => {
test('GET / with valid OIDC credentials → 200 Authorized', async () => {
const mock = await startMockTokenServer(200, {
id_token: 'regression-token',
expires_in: 9_999_999_999,
});
describe('content fetch: happy path', () => {
test('GET /?kmeURL=<mock-server> → 200 text/html with article body (SC-001 < 11s)', async () => {
// Mock content server returning a valid article JSON-LD response
const contentMock = await startMockServer(200, { 'vkm:name': 'Contract Article', 'vkm:articleBody': '<p>Contract test article</p>' });
try {
const res = makeRes();
const redis = makeRedisFake();
// Pre-seed token — no real token exchange needed
await redis.hSet('authorization', 'token', 'content-contract-token');
await redis.hSet('authorization', 'expiry', '9999999999');
const res = makeRes();
const kme_CSA_settings = {
tokenUrl: mock.url,
tokenUrl: 'http://127.0.0.1:1', // unreachable — not used (cache hit)
username: 'user',
password: 'pass',
clientId: 'client',
scope: 'openid',
};
const deps = { URLSearchParams, console, axios, xmlBuilder, redis, kme_CSA_settings };
const deps = { URLSearchParams, URL, console, axios, xmlbuilder2, redis, kme_CSA_settings };
const ctx = vm.createContext({
...deps,
kmeContentSourceAdapterHelpers: makeHelpers(deps),
req: { url: '/', method: 'GET', headers: {} },
req: { url: `/?kmeURL=${encodeURIComponent(contentMock.url)}`, method: 'GET', headers: {} },
res,
});
const start = Date.now();
await proxyScript.runInContext(ctx);
const elapsed = Date.now() - start;
assert.strictEqual(res.statusCode, 200);
assert.strictEqual(res.body, 'Authorized');
assert.ok(
res.headers['Content-Type'].startsWith('text/html'),
`Content-Type was: ${res.headers['Content-Type']}`,
);
assert.ok(res.body.includes('<!DOCTYPE html>'), 'body should contain DOCTYPE');
assert.ok(res.body.includes('<title>Contract Article</title>'), 'body should contain title');
assert.ok(res.body.includes('<p>Contract test article</p>'), 'body should contain article content verbatim');
assert.ok(elapsed < 11000, `Round-trip should be under 11 s, took ${elapsed}ms`);
} finally {
await contentMock.close();
}
});
});
// ---------------------------------------------------------------------------
// Content fetch: error handling contract tests — T012
// ---------------------------------------------------------------------------
describe('content fetch: error handling', () => {
/** Build a content-fetch VM context wired to the given upstream URL. */
function makeContentFetchCtx(contentUrl) {
const redis = makeRedisFake();
// Pre-seed token so no real auth server is needed
redis.hSet('authorization', 'token', 'content-contract-token');
redis.hSet('authorization', 'expiry', '9999999999');
const res = makeRes();
const kme_CSA_settings = {
tokenUrl: 'http://127.0.0.1:1', // not used (cache hit)
username: 'user',
password: 'pass',
clientId: 'client',
scope: 'openid',
};
const deps = { URLSearchParams, URL, console, axios, xmlbuilder2, redis, kme_CSA_settings };
const ctx = vm.createContext({
...deps,
kmeContentSourceAdapterHelpers: makeHelpers(deps),
req: { url: `/?kmeURL=${encodeURIComponent(contentUrl)}`, method: 'GET', headers: {} },
res,
});
ctx._res = res;
return ctx;
}
test('mock upstream returns 404 → proxy returns 404', async () => {
const mock = await startMockServer(404, { error: 'Not Found' });
try {
const ctx = makeContentFetchCtx(mock.url);
await proxyScript.runInContext(ctx);
assert.strictEqual(ctx._res.statusCode, 404, `body was: ${ctx._res.body}`);
} finally {
await mock.close();
}
});
test('mock upstream returns 503 → proxy returns 502', async () => {
const mock = await startMockServer(503, { error: 'Service Unavailable' });
try {
const ctx = makeContentFetchCtx(mock.url);
await proxyScript.runInContext(ctx);
assert.strictEqual(ctx._res.statusCode, 502, `body was: ${ctx._res.body}`);
} finally {
await mock.close();
}
});
test('server accepts connection but never responds → proxy returns 502 within 12s', async () => {
const hangServer = await new Promise((resolve, reject) => {
const s = http.createServer(() => { /* intentionally hang — never respond */ });
s.listen(0, '127.0.0.1', () => {
const { port } = s.address();
const close = () => new Promise((res, rej) => s.close(err => err ? rej(err) : res()));
resolve({ server: s, url: `http://127.0.0.1:${port}`, close });
});
s.once('error', reject);
});
try {
const ctx = makeContentFetchCtx(hangServer.url);
const start = Date.now();
await proxyScript.runInContext(ctx);
const elapsed = Date.now() - start;
assert.strictEqual(ctx._res.statusCode, 502, `body was: ${ctx._res.body}`);
assert.ok(elapsed < 12000, `Should respond within 12s, took ${elapsed}ms`);
} finally {
await hangServer.close();
}
});
});

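The three error-handling contract tests above pin down how upstream failures map to proxy responses. The sketch below restates that mapping as a pure function; `mapUpstreamError` is a hypothetical name used only for illustration, and the error objects are hand-built stand-ins for what axios would throw.

```javascript
// Hypothetical pure function mirroring the catch block in contentFetchFlow.
function mapUpstreamError(err) {
  // Timeouts and cancelled requests surface as a gateway failure.
  if (err.code === 'ECONNABORTED' || err.code === 'ERR_CANCELED') {
    return { status: 502, error: 'Bad Gateway: upstream request timed out' };
  }
  if (err.response) {
    const { status } = err.response;
    // Any upstream 4xx is reported as "article not found"; everything else is a 502.
    return status >= 400 && status < 500
      ? { status: 404, error: 'Not Found: article not found at upstream' }
      : { status: 502, error: `Bad Gateway: upstream error HTTP ${status}` };
  }
  // No response at all: DNS failure, connection refused, and similar.
  return { status: 502, error: `Bad Gateway: ${err.message}` };
}

console.log(mapUpstreamError({ response: { status: 404 } }).status); // 404
console.log(mapUpstreamError({ response: { status: 503 } }).status); // 502
console.log(mapUpstreamError({ code: 'ECONNABORTED' }).status);      // 502
```

Keeping the mapping free of `res` calls makes it possible to unit-test each branch without starting a mock server.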
View File

@@ -4,7 +4,7 @@ import vm from 'node:vm';
import { readFileSync } from 'node:fs';
import { fileURLToPath } from 'node:url';
import { dirname, join } from 'node:path';
import { create as xmlBuilder } from 'xmlbuilder2';
import xmlbuilder2 from 'xmlbuilder2';
const __filename = fileURLToPath(import.meta.url);
const __dirname = dirname(__filename);
@@ -86,16 +86,17 @@ function makeContext(t, overrides = {}) {
axios: resolvedAxios,
redis,
kme_CSA_settings: resolvedSettings,
xmlBuilder,
xmlbuilder2,
});
const ctx = vm.createContext({
URLSearchParams,
URL,
console,
axios: resolvedAxios,
redis,
kme_CSA_settings: defaultSettings,
xmlBuilder,
xmlbuilder2,
kmeContentSourceAdapterHelpers,
req: { url: '/', method: 'GET', headers: {} },
res,
@@ -124,156 +125,6 @@ async function runScript(ctx) {
// User Story 1 — Successful Authenticated Request
// ---------------------------------------------------------------------------
describe('US1: successful authenticated request', () => {
test('cache miss → fresh fetch → 200 OK', async (t) => {
const ctx = makeContext(t);
await runScript(ctx);
assert.strictEqual(ctx._res.statusCode, 200);
assert.strictEqual(ctx._res.body, 'Authorized');
assert.strictEqual(ctx._axios.post.mock.calls.length, 1);
});
test('cache hit → no fetch → 200 OK', async (t) => {
const ctx = makeContext(t, {});
// Pre-seed Redis store via the fake
ctx._store['authorization:token'] = 'cached-tok';
ctx._store['authorization:expiry'] = '9999999999';
await runScript(ctx);
assert.strictEqual(ctx._res.statusCode, 200);
assert.strictEqual(ctx._res.body, 'Authorized');
assert.strictEqual(ctx._axios.post.mock.calls.length, 0);
});
});
// ---------------------------------------------------------------------------
// User Story 2 — Token Expiry and Refresh
// ---------------------------------------------------------------------------
describe('US2: token expiry and refresh', () => {
test('expired token → re-fetch → 200 OK', async (t) => {
const ctx = makeContext(t);
ctx._store['authorization:token'] = 'old-tok';
ctx._store['authorization:expiry'] = '1'; // epoch far in the past
await runScript(ctx);
assert.strictEqual(ctx._axios.post.mock.calls.length, 1, 'should re-fetch');
assert.strictEqual(ctx._res.statusCode, 200);
assert.strictEqual(ctx._res.body, 'Authorized');
// New token should have been written to Redis
const hSetCalls = ctx._redis.hSet.mock.calls;
assert.ok(hSetCalls.length >= 2, 'hSet should be called for token and expiry');
});
test('future expiry → no re-fetch → 200 OK', async (t) => {
const ctx = makeContext(t);
ctx._store['authorization:token'] = 'fresh-tok';
ctx._store['authorization:expiry'] = '9999999999';
await runScript(ctx);
assert.strictEqual(ctx._axios.post.mock.calls.length, 0, 'should not re-fetch');
assert.strictEqual(ctx._res.statusCode, 200);
assert.strictEqual(ctx._res.body, 'Authorized');
});
});
// ---------------------------------------------------------------------------
// User Story 3 — Authentication Failure Handling
// ---------------------------------------------------------------------------
describe('US3: authentication failure handling', () => {
test('HTTP 401 from token service → 401 Unauthorized: HTTP 401', async (t) => {
const axiosError = Object.assign(new Error('Request failed with status code 401'), {
response: { status: 401 },
});
const ctx = makeContext(t, {
axios: { post: t.mock.fn(async () => { throw axiosError; }), get: t.mock.fn() },
});
await runScript(ctx);
assert.strictEqual(ctx._res.statusCode, 401);
assert.strictEqual(ctx._res.body, 'Unauthorized: HTTP 401');
});
test('timeout (ECONNABORTED) → 401 Unauthorized: token service timeout', async (t) => {
const axiosError = Object.assign(new Error('timeout'), { code: 'ECONNABORTED' });
const ctx = makeContext(t, {
axios: { post: t.mock.fn(async () => { throw axiosError; }), get: t.mock.fn() },
});
await runScript(ctx);
assert.strictEqual(ctx._res.statusCode, 401);
assert.strictEqual(ctx._res.body, 'Unauthorized: token service timeout');
});
test('timeout (ERR_CANCELED) → 401 Unauthorized: token service timeout', async (t) => {
const axiosError = Object.assign(new Error('canceled'), { code: 'ERR_CANCELED' });
const ctx = makeContext(t, {
axios: { post: t.mock.fn(async () => { throw axiosError; }), get: t.mock.fn() },
});
await runScript(ctx);
assert.strictEqual(ctx._res.statusCode, 401);
assert.strictEqual(ctx._res.body, 'Unauthorized: token service timeout');
});
test('missing id_token in response → 401 Unauthorized: id_token missing from response', async (t) => {
const ctx = makeContext(t, {
axios: {
post: t.mock.fn(async () => ({ data: { expires_in: 9999 } })),
get: t.mock.fn(),
},
});
await runScript(ctx);
assert.strictEqual(ctx._res.statusCode, 401);
assert.strictEqual(ctx._res.body, 'Unauthorized: id_token missing from response');
});
test('missing expires_in in response → 401 Unauthorized: expires_in missing from response', async (t) => {
const ctx = makeContext(t, {
axios: {
post: t.mock.fn(async () => ({ data: { id_token: 'a-token' } })),
get: t.mock.fn(),
},
});
await runScript(ctx);
assert.strictEqual(ctx._res.statusCode, 401);
assert.strictEqual(ctx._res.body, 'Unauthorized: expires_in missing from response');
});
test('missing tokenUrl in kme_CSA_settings → 401 missing required field: tokenUrl', async (t) => {
const ctx = makeContext(t);
ctx.kme_CSA_settings.tokenUrl = '';
await runScript(ctx);
assert.strictEqual(ctx._res.statusCode, 401);
assert.strictEqual(ctx._res.body, 'Unauthorized: missing required field: tokenUrl');
});
test('missing username in kme_CSA_settings → 401 missing required field: username', async (t) => {
const ctx = makeContext(t);
ctx.kme_CSA_settings.username = undefined;
await runScript(ctx);
assert.strictEqual(ctx._res.statusCode, 401);
assert.strictEqual(ctx._res.body, 'Unauthorized: missing required field: username');
});
});
// ---------------------------------------------------------------------------
// Phase 6 — Stampede Guard (FR-013)
// ---------------------------------------------------------------------------
@@ -299,12 +150,15 @@ describe('stampede guard', () => {
hGet: t.mock.fn(async (key, field) => _store[`${key}:${field}`] ?? null),
};
// Slow axios mock — 50ms delay before returning token
// Slow axios post mock — 50ms delay before returning token
const mockAxiosPost = t.mock.fn(async () => {
await new Promise(resolve => setTimeout(resolve, 50));
return { data: { id_token: 'stampede-token', expires_in: 9_999_999_999 } };
});
const sharedAxios = { post: mockAxiosPost, get: t.mock.fn() };
const mockAxiosGet = t.mock.fn(async () => ({
data: { 'vkm:articleBody': '<p>Article</p>' },
}));
const sharedAxios = { post: mockAxiosPost, get: mockAxiosGet };
// Build two contexts sharing kme_CSA_settings, redis, and axios references
function makeRes(tctx) {
@@ -324,21 +178,22 @@ describe('stampede guard', () => {
// Helpers must share the same redis/kme_CSA_settings/axios so the stampede guard works
const sharedHelpers = makeHelpers({
URLSearchParams, console, axios: sharedAxios,
redis, kme_CSA_settings, xmlBuilder,
redis, kme_CSA_settings, xmlbuilder2,
});
const kmeURL = encodeURIComponent('https://kme.example.com/article/1');
const ctx1 = vm.createContext({
URLSearchParams, console, axios: sharedAxios,
redis, kme_CSA_settings, xmlBuilder,
URLSearchParams, URL, console, axios: sharedAxios,
redis, kme_CSA_settings, xmlbuilder2,
kmeContentSourceAdapterHelpers: sharedHelpers,
req: { url: '/', method: 'GET', headers: {} },
req: { url: `/?kmeURL=${kmeURL}`, method: 'GET', headers: {} },
res: res1,
});
const ctx2 = vm.createContext({
URLSearchParams, console, axios: sharedAxios,
redis, kme_CSA_settings, xmlBuilder,
URLSearchParams, URL, console, axios: sharedAxios,
redis, kme_CSA_settings, xmlbuilder2,
kmeContentSourceAdapterHelpers: sharedHelpers,
req: { url: '/', method: 'GET', headers: {} },
req: { url: `/?kmeURL=${kmeURL}`, method: 'GET', headers: {} },
res: res2,
});
@@ -350,8 +205,8 @@ describe('stampede guard', () => {
assert.strictEqual(mockAxiosPost.mock.calls.length, 1, 'stampede guard: only one fetch');
assert.strictEqual(res1.statusCode, 200);
assert.strictEqual(res2.statusCode, 200);
assert.strictEqual(res1.body, 'Authorized');
assert.strictEqual(res2.body, 'Authorized');
assert.ok(res1.body.includes('<p>Article</p>'));
assert.ok(res2.body.includes('<p>Article</p>'));
});
});
@@ -362,12 +217,11 @@ describe('stampede guard', () => {
describe('sitemap flow', () => {
function makeSitemapContext(t, axiosGetImpl, settingsOverrides = {}) {
const ctx = makeContext(t, {
req: { url: '/sitemap.xml', method: 'GET', headers: {} },
req: { url: '/sitemap.xml', method: 'GET', headers: { host: 'proxy.example.com', 'x-forwarded-proto': 'https' } },
});
// Add sitemap-specific settings
ctx.kme_CSA_settings.searchApiBaseUrl = 'https://search.example.com/api';
ctx.kme_CSA_settings.tenant = 'test-tenant';
ctx.kme_CSA_settings.proxyBaseUrl = 'https://proxy.example.com';
Object.assign(ctx.kme_CSA_settings, settingsOverrides);
// Pre-seed token cache so getValidToken() returns immediately
@@ -386,8 +240,8 @@ describe('sitemap flow', () => {
const ctx = makeSitemapContext(t, async () => ({
data: {
'hydra:member': [
{ 'hydra:member': [{ 'vkm:url': 'https://kme.example.com/doc-1' }] },
{ 'hydra:member': [{ 'vkm:url': 'https://kme.example.com/doc-2' }] },
{ 'hydra:member': [{ 'vkm:url': 'https://kme.example.com/doc-1', 'vkm:datePublished': '2024-01-01T00:00:00Z' }] },
{ 'hydra:member': [{ 'vkm:url': 'https://kme.example.com/doc-2', 'vkm:datePublished': '2024-06-01T00:00:00Z' }] },
],
},
}));
@@ -399,11 +253,11 @@ describe('sitemap flow', () => {
assert.ok(ctx._res.body.includes('<?xml'), 'body should start with XML declaration');
assert.ok(ctx._res.body.includes('<urlset'), 'body should contain urlset');
assert.ok(
ctx._res.body.includes('<loc>https://proxy.example.com?kmeURL=https%3A%2F%2Fkme.example.com%2Fdoc-1</loc>'),
ctx._res.body.includes('<loc>https://proxy.example.com/?kmeURL=https%3A%2F%2Fkme.example.com%2Fdoc-1</loc>'),
'body should contain encoded loc for doc-1',
);
assert.ok(
ctx._res.body.includes('<loc>https://proxy.example.com?kmeURL=https%3A%2F%2Fkme.example.com%2Fdoc-2</loc>'),
ctx._res.body.includes('<loc>https://proxy.example.com/?kmeURL=https%3A%2F%2Fkme.example.com%2Fdoc-2</loc>'),
'body should contain encoded loc for doc-2',
);
});
@@ -437,6 +291,139 @@ describe('sitemap flow', () => {
assert.ok(ctx._res.body.includes('valid'), 'the valid URL should appear in the loc');
});
test('multiple fragments per SearchResultItem → only latest vkm:datePublished wins', async (t) => {
const ctx = makeSitemapContext(t, async () => ({
data: {
'hydra:member': [
{
'hydra:member': [
{ 'vkm:url': 'https://kme.example.com/doc/v1', 'vkm:datePublished': '2023-01-01T00:00:00Z' },
{ 'vkm:url': 'https://kme.example.com/doc/v3', 'vkm:datePublished': '2024-06-01T00:00:00Z' },
{ 'vkm:url': 'https://kme.example.com/doc/v2', 'vkm:datePublished': '2023-12-01T00:00:00Z' },
],
},
],
},
}));
await runScript(ctx);
assert.strictEqual(ctx._res.statusCode, 200);
const locMatches = ctx._res.body.match(/<loc>/g);
assert.strictEqual(locMatches?.length ?? 0, 1, 'exactly one <loc> element (latest version only)');
assert.ok(ctx._res.body.includes('doc%2Fv3'), 'the latest fragment (v3) should be the loc');
assert.ok(!ctx._res.body.includes('doc%2Fv1'), 'older fragment v1 should not appear');
assert.ok(!ctx._res.body.includes('doc%2Fv2'), 'older fragment v2 should not appear');
});
// Pagination: hydra:last nested inside hydra:view drives multi-page fetching.
// hydra:view is absent when all results fit on one page — no pagination needed.
// e.g. 22 results, size=5 → hydra:view['hydra:last'] start=20, fetch start=5,10,15,20
test('hydra:last (22 results, size=5, start=20) → fetches 4 extra pages, all 5 pages combined', async (t) => {
// Simulate the example from the spec: 22 results, page size 5
// First call has no start param; subsequent pages: start=5,10,15,20
const base = 'https://search.example.com/api/test-tenant/search?query=*&size=5&category=vkm%3AArticleCategory';
const pageData = {
[`${base}`]: {
'hydra:view': { 'hydra:last': `${base}&start=20` },
'hydra:member': [
{ 'hydra:member': [{ 'vkm:url': 'https://kme.example.com/doc-p1', 'vkm:datePublished': '2024-01-01T00:00:00Z' }] },
],
},
[`${base}&start=5`]: {
'hydra:member': [
{ 'hydra:member': [{ 'vkm:url': 'https://kme.example.com/doc-p2', 'vkm:datePublished': '2024-02-01T00:00:00Z' }] },
],
},
[`${base}&start=10`]: {
'hydra:member': [
{ 'hydra:member': [{ 'vkm:url': 'https://kme.example.com/doc-p3', 'vkm:datePublished': '2024-03-01T00:00:00Z' }] },
],
},
[`${base}&start=15`]: {
'hydra:member': [
{ 'hydra:member': [{ 'vkm:url': 'https://kme.example.com/doc-p4', 'vkm:datePublished': '2024-04-01T00:00:00Z' }] },
],
},
[`${base}&start=20`]: {
'hydra:member': [
{ 'hydra:member': [{ 'vkm:url': 'https://kme.example.com/doc-p5', 'vkm:datePublished': '2024-05-01T00:00:00Z' }] },
],
},
};
// Build context with size=5 in the request URL
const ctx = makeContext(t, {
req: { url: '/sitemap.xml?size=5', method: 'GET', headers: { host: 'proxy.example.com', 'x-forwarded-proto': 'https' } },
});
ctx.kme_CSA_settings.searchApiBaseUrl = 'https://search.example.com/api';
ctx.kme_CSA_settings.tenant = 'test-tenant';
ctx._store['authorization:token'] = 'sitemap-token';
ctx._store['authorization:expiry'] = '9999999999';
ctx._axios.get = t.mock.fn(async (url) => ({ data: pageData[url] ?? { 'hydra:member': [] } }));
await runScript(ctx);
assert.strictEqual(ctx._res.statusCode, 200);
assert.strictEqual(ctx._axios.get.mock.calls.length, 5, 'should make 5 GET calls (start 0,5,10,15,20)');
const locMatches = ctx._res.body.match(/<loc>/g);
assert.strictEqual(locMatches?.length ?? 0, 5, 'all 5 items from all pages should appear');
assert.ok(ctx._res.body.includes('doc-p1'));
assert.ok(ctx._res.body.includes('doc-p5'));
});
test('hydra:view absent (all results on one page) → no additional pages fetched', async (t) => {
const ctx = makeSitemapContext(t, async () => ({
data: {
// No hydra:view: every result fits on the first page (default size=100)
'hydra:member': [
{ 'hydra:member': [{ 'vkm:url': 'https://kme.example.com/only-doc', 'vkm:datePublished': '2024-01-01T00:00:00Z' }] },
],
},
}));
await runScript(ctx);
assert.strictEqual(ctx._res.statusCode, 200);
assert.strictEqual(ctx._axios.get.mock.calls.length, 1, 'only one GET call when hydra:view absent');
const locMatches = ctx._res.body.match(/<loc>/g);
assert.strictEqual(locMatches?.length ?? 0, 1);
});
test('hydra:view present but hydra:last start=0 → no additional pages fetched', async (t) => {
const ctx = makeSitemapContext(t, async () => ({
data: {
'hydra:view': { 'hydra:last': 'https://search.example.com/api/test-tenant/search?query=*&size=100&category=vkm%3AArticleCategory&start=0' },
'hydra:member': [
{ 'hydra:member': [{ 'vkm:url': 'https://kme.example.com/only-doc', 'vkm:datePublished': '2024-01-01T00:00:00Z' }] },
],
},
}));
await runScript(ctx);
assert.strictEqual(ctx._res.statusCode, 200);
assert.strictEqual(ctx._axios.get.mock.calls.length, 1, 'only one GET call when hydra:last start=0');
const locMatches = ctx._res.body.match(/<loc>/g);
assert.strictEqual(locMatches?.length ?? 0, 1);
});
test('more than 50,000 items → sitemap truncated to exactly 50,000 <loc> elements', async (t) => {
const LIMIT = 50_000;
// Build a response with LIMIT + 5 items
const members = Array.from({ length: LIMIT + 5 }, (_, i) => ({
'hydra:member': [{ 'vkm:url': `https://kme.example.com/doc-${i}`, 'vkm:datePublished': '2024-01-01T00:00:00Z' }],
}));
const ctx = makeSitemapContext(t, async () => ({ data: { 'hydra:member': members } }));
await runScript(ctx);
assert.strictEqual(ctx._res.statusCode, 200);
const locMatches = ctx._res.body.match(/<loc>/g);
assert.strictEqual(locMatches?.length ?? 0, LIMIT, `should be capped at ${LIMIT}`);
});
// US3 error scenarios (T011b)
test('upstream 503 → 502 with Search service error message', async (t) => {
@@ -488,71 +475,374 @@ describe('sitemap flow', () => {
assert.strictEqual(ctx._res.statusCode, 500);
assert.strictEqual(ctx._res.body, 'Configuration error: missing required field: tenant');
});
});
test('missing proxyBaseUrl → 500 Configuration error', async (t) => {
const ctx = makeSitemapContext(t, null, { proxyBaseUrl: undefined });
// ---------------------------------------------------------------------------
// extractArticleBody helper — T004
// ---------------------------------------------------------------------------
await runScript(ctx);
describe('extractArticleBody helper', () => {
const minimalDeps = {
URLSearchParams,
URL,
console,
axios: { post: async () => ({}), get: async () => ({}) },
redis: { hGet: async () => null, hSet: async () => 1 },
kme_CSA_settings: {},
xmlbuilder2,
};
const helpers = makeHelpers(minimalDeps);
assert.strictEqual(ctx._res.statusCode, 500);
assert.strictEqual(ctx._res.body, 'Configuration error: missing required field: proxyBaseUrl');
test('valid HTML string → returns the string', () => {
assert.strictEqual(helpers.extractArticleBody({ 'vkm:articleBody': '<p>Hello</p>' }), '<p>Hello</p>');
});
test('empty string → null', () => {
assert.strictEqual(helpers.extractArticleBody({ 'vkm:articleBody': '' }), null);
});
test('whitespace-only string → null', () => {
assert.strictEqual(helpers.extractArticleBody({ 'vkm:articleBody': ' ' }), null);
});
test('null field value → null', () => {
assert.strictEqual(helpers.extractArticleBody({ 'vkm:articleBody': null }), null);
});
test('field absent ({}) → null', () => {
assert.strictEqual(helpers.extractArticleBody({}), null);
});
test('null input → null', () => {
assert.strictEqual(helpers.extractArticleBody(null), null);
});
test('non-object input (string) → null', () => {
assert.strictEqual(helpers.extractArticleBody('not-an-object'), null);
});
});
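// The cases above pin down the helper's contract: return the raw string when
// 'vkm:articleBody' is a non-blank string, otherwise null. A minimal sketch of
// a function that would satisfy them follows. It is an assumption for
// illustration, not the project's actual implementation.

```javascript
// Hypothetical sketch: the real helper is produced by makeHelpers(deps) and may differ.
function extractArticleBody(doc) {
  // Reject null, undefined, and non-object inputs such as strings.
  if (doc === null || typeof doc !== 'object') return null;
  const body = doc['vkm:articleBody'];
  // Only a non-blank string counts as a usable body.
  if (typeof body !== 'string' || body.trim() === '') return null;
  // Return the original string untrimmed so the HTML is passed through verbatim.
  return body;
}
```

// Returning null instead of throwing lets the caller map every "no body" case
// to a single response branch.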
// ---------------------------------------------------------------------------
// Non-sitemap URL routing — regression guard (T009)
// US-content-fetch: happy path — T007
// ---------------------------------------------------------------------------
describe('non-sitemap URL routing', () => {
test('cache hit → no fetch → 200 Authorized', async (t) => {
describe('US-content-fetch: happy path', () => {
test('cached token + valid article response → 200 text/html with body and title', async (t) => {
const contentAxios = {
post: t.mock.fn(async () => ({ data: { id_token: 'mock-token', expires_in: 9_999_999_999 } })),
get: t.mock.fn(async () => ({ data: { 'vkm:name': 'My Article', 'vkm:articleBody': '<p>Hello</p>' } })),
};
const ctx = makeContext(t, {
req: { url: '/', method: 'GET', headers: {} },
axios: {
post: t.mock.fn(async () => { throw new Error('should not be called'); }),
get: t.mock.fn(),
},
req: { url: '/?kmeURL=https://kme.example.com/content/article/123', method: 'GET', headers: {} },
axios: contentAxios,
});
// Pre-seed valid token
ctx._store['authorization:token'] = 'cached-tok';
// Pre-seed token cache → cache hit, no axios.post call
ctx._store['authorization:token'] = 'cached-token';
ctx._store['authorization:expiry'] = '9999999999';
await runScript(ctx);
assert.strictEqual(ctx._res.statusCode, 200);
assert.strictEqual(ctx._res.body, 'Authorized');
// axios.post was set to throw, so if it was called the test would fail
assert.ok(ctx._res.headers['Content-Type'].startsWith('text/html'), `Content-Type was: ${ctx._res.headers['Content-Type']}`);
assert.ok(ctx._res.body.includes('<!DOCTYPE html>'), 'body should contain DOCTYPE');
assert.ok(ctx._res.body.includes('<title>My Article</title>'), 'body should contain title');
assert.ok(ctx._res.body.includes('<p>Hello</p>'), 'body should contain article content verbatim');
assert.ok(!ctx._res.body.includes('<p><p>'), 'article content should not be double-wrapped in <p>');
assert.strictEqual(contentAxios.post.mock.calls.length, 0, 'should not re-fetch token on cache hit');
});
test('cache miss fresh fetch → 200 Authorized', async (t) => {
test('cache miss (fresh token acquired) → 200 text/html with body', async (t) => {
const contentAxios = {
post: t.mock.fn(async () => ({ data: { id_token: 'fresh-token', expires_in: 9_999_999_999 } })),
get: t.mock.fn(async () => ({ data: { 'vkm:name': 'Fresh Article', 'vkm:articleBody': '<p>Hello</p>' } })),
};
const ctx = makeContext(t, {
req: { url: '/', method: 'GET', headers: {} },
req: { url: '/?kmeURL=https://kme.example.com/content/article/123', method: 'GET', headers: {} },
axios: contentAxios,
});
// No pre-seeded token → cache miss
// No pre-seeded token → cache miss → axios.post will be called for fresh token
await runScript(ctx);
assert.strictEqual(ctx._res.statusCode, 200);
assert.strictEqual(ctx._res.body, 'Authorized');
// Verify token was written to Redis
const hSetCalls = ctx._redis.hSet.mock.calls;
const tokenCall = hSetCalls.find(c => c.arguments[0] === 'authorization' && c.arguments[1] === 'token');
assert.ok(tokenCall, 'hSet should be called with token');
assert.strictEqual(tokenCall.arguments[2], 'mock-token');
assert.ok(ctx._res.headers['Content-Type'].startsWith('text/html'), `Content-Type was: ${ctx._res.headers['Content-Type']}`);
assert.ok(ctx._res.body.includes('<!DOCTYPE html>'), 'body should contain DOCTYPE');
assert.ok(ctx._res.body.includes('<title>Fresh Article</title>'), 'body should contain title');
assert.ok(ctx._res.body.includes('<p>Hello</p>'), 'body should contain article content');
assert.strictEqual(contentAxios.post.mock.calls.length, 1, 'should have fetched fresh token');
});
test('token service down (ECONNABORTED) → 401 Unauthorized', async (t) => {
const timeoutErr = Object.assign(new Error('timeout'), { code: 'ECONNABORTED' });
test('vkm:name absent → title element is empty', async (t) => {
const contentAxios = {
post: t.mock.fn(),
get: t.mock.fn(async () => ({ data: { 'vkm:articleBody': '<p>No title</p>' } })),
};
const ctx = makeContext(t, {
req: { url: '/', method: 'GET', headers: {} },
axios: {
post: t.mock.fn(async () => { throw timeoutErr; }),
get: t.mock.fn(),
},
req: { url: '/?kmeURL=https://kme.example.com/content/article/123', method: 'GET', headers: {} },
axios: contentAxios,
});
ctx._store['authorization:token'] = 'cached-token';
ctx._store['authorization:expiry'] = '9999999999';
await runScript(ctx);
assert.strictEqual(ctx._res.statusCode, 200);
assert.ok(ctx._res.body.includes('<title></title>'), 'title should be empty when vkm:name absent');
assert.ok(ctx._res.body.includes('<p>No title</p>'));
});
});
// ---------------------------------------------------------------------------
// US-content-fetch: input validation — T009
// ---------------------------------------------------------------------------
describe('US-content-fetch: input validation', () => {
test('no kmeURL param → 404 Not Found, axios.get not called', async (t) => {
const ctx = makeContext(t, {
req: { url: '/?someOtherParam=value', method: 'GET', headers: {} },
});
await runScript(ctx);
assert.strictEqual(ctx._res.statusCode, 401);
assert.ok(ctx._res.body.startsWith('Unauthorized:'), `body was: ${ctx._res.body}`);
assert.strictEqual(ctx._res.statusCode, 404);
assert.strictEqual(ctx._axios.get.mock.calls.length, 0, 'axios.get should not be called');
});
test('empty kmeURL → 400 kmeURL parameter is required', async (t) => {
const ctx = makeContext(t, {
req: { url: '/?kmeURL=', method: 'GET', headers: {} },
});
await runScript(ctx);
assert.strictEqual(ctx._res.statusCode, 400);
assert.ok(ctx._res.body.includes('kmeURL parameter is required'), `body was: ${ctx._res.body}`);
assert.strictEqual(ctx._axios.get.mock.calls.length, 0, 'axios.get should not be called');
});
test('whitespace-only kmeURL (%20) → 400', async (t) => {
const ctx = makeContext(t, {
req: { url: '/?kmeURL=%20', method: 'GET', headers: {} },
});
await runScript(ctx);
assert.strictEqual(ctx._res.statusCode, 400);
assert.strictEqual(ctx._axios.get.mock.calls.length, 0, 'axios.get should not be called');
});
test('relative path kmeURL → 400 well-formed absolute http/https URL', async (t) => {
const ctx = makeContext(t, {
req: { url: '/?kmeURL=relative/path', method: 'GET', headers: {} },
});
await runScript(ctx);
assert.strictEqual(ctx._res.statusCode, 400);
assert.ok(
ctx._res.body.includes('well-formed absolute http/https URL'),
`body was: ${ctx._res.body}`,
);
assert.strictEqual(ctx._axios.get.mock.calls.length, 0, 'axios.get should not be called');
});
test('ftp protocol kmeURL → 400', async (t) => {
const ctx = makeContext(t, {
req: { url: '/?kmeURL=ftp://example.com/article', method: 'GET', headers: {} },
});
await runScript(ctx);
assert.strictEqual(ctx._res.statusCode, 400);
assert.strictEqual(ctx._axios.get.mock.calls.length, 0, 'axios.get should not be called');
});
test(':::malformed kmeURL → 400', async (t) => {
const ctx = makeContext(t, {
req: { url: '/?kmeURL=:::malformed', method: 'GET', headers: {} },
});
await runScript(ctx);
assert.strictEqual(ctx._res.statusCode, 400);
assert.strictEqual(ctx._axios.get.mock.calls.length, 0, 'axios.get should not be called');
});
});
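// The validation cases above can be read as one decision table: no kmeURL
// parameter gives 404, a blank value gives 400, and anything that is not an
// absolute http/https URL gives 400. The sketch below shows one way to
// implement that table. The function name, return shape, and full message
// wording are assumptions; the tests only assert the quoted substrings.

```javascript
// Hypothetical sketch of kmeURL validation; not taken from the script under test.
function validateKmeUrl(requestUrl) {
  // req.url is a path plus query, so a placeholder base is needed to parse it.
  const params = new URL(requestUrl, 'http://localhost').searchParams;
  if (!params.has('kmeURL')) {
    return { status: 404, message: 'Not Found' };
  }
  const raw = params.get('kmeURL').trim();
  if (raw === '') {
    return { status: 400, message: 'Bad Request: kmeURL parameter is required' };
  }
  const invalid = {
    status: 400,
    message: 'Bad Request: kmeURL must be a well-formed absolute http/https URL',
  };
  let parsed;
  try {
    // Without a base argument, relative paths and malformed input throw.
    parsed = new URL(raw);
  } catch {
    return invalid;
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    return invalid;
  }
  return { status: 200, url: parsed.href };
}
```

// Checking presence with has() before reading the value is what separates the
// 404 case (parameter missing) from the 400 case (parameter present but blank).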
// ---------------------------------------------------------------------------
// US-content-fetch: upstream errors — T010
// ---------------------------------------------------------------------------
describe('US-content-fetch: upstream errors', () => {
/** Build a context with kmeURL set and (optionally) a pre-seeded token. */
function makeUpstreamErrCtx(t, axiosGetImpl, { seedToken = true } = {}) {
const ctx = makeContext(t, {
req: { url: '/?kmeURL=https://kme.example.com/article', method: 'GET', headers: {} },
axios: {
post: t.mock.fn(async () => ({ data: { id_token: 'mock-token', expires_in: 9_999_999_999 } })),
get: t.mock.fn(axiosGetImpl),
},
});
if (seedToken) {
ctx._store['authorization:token'] = 'cached-tok';
ctx._store['authorization:expiry'] = '9999999999';
}
return ctx;
}
test('getValidToken throws → 502 token acquisition failed', async (t) => {
const ctx = makeContext(t, {
req: { url: '/?kmeURL=https://kme.example.com/article', method: 'GET', headers: {} },
axios: {
post: t.mock.fn(async () => { throw new Error('token service down'); }),
get: t.mock.fn(),
},
});
// No pre-seeded token → getValidToken will try to POST → throw
await runScript(ctx);
assert.strictEqual(ctx._res.statusCode, 502);
assert.ok(ctx._res.body.includes('token acquisition failed'), `body was: ${ctx._res.body}`);
});
test('upstream 404 → proxy 404 article not found at upstream', async (t) => {
const err = Object.assign(new Error('Not Found'), { response: { status: 404 } });
const ctx = makeUpstreamErrCtx(t, async () => { throw err; });
await runScript(ctx);
assert.strictEqual(ctx._res.statusCode, 404);
assert.ok(ctx._res.body.includes('article not found at upstream'), `body was: ${ctx._res.body}`);
});
test('upstream 410 → proxy 404', async (t) => {
const err = Object.assign(new Error('Gone'), { response: { status: 410 } });
const ctx = makeUpstreamErrCtx(t, async () => { throw err; });
await runScript(ctx);
assert.strictEqual(ctx._res.statusCode, 404);
});
test('upstream 503 → proxy 502 upstream error HTTP 503', async (t) => {
const err = Object.assign(new Error('Service Unavailable'), { response: { status: 503 } });
const ctx = makeUpstreamErrCtx(t, async () => { throw err; });
await runScript(ctx);
assert.strictEqual(ctx._res.statusCode, 502);
assert.ok(ctx._res.body.includes('upstream error HTTP 503'), `body was: ${ctx._res.body}`);
});
test('ECONNABORTED → proxy 502 upstream request timed out', async (t) => {
const err = Object.assign(new Error('timeout'), { code: 'ECONNABORTED' });
const ctx = makeUpstreamErrCtx(t, async () => { throw err; });
await runScript(ctx);
assert.strictEqual(ctx._res.statusCode, 502);
assert.ok(ctx._res.body.includes('upstream request timed out'), `body was: ${ctx._res.body}`);
});
test('ERR_CANCELED → proxy 502', async (t) => {
const err = Object.assign(new Error('canceled'), { code: 'ERR_CANCELED' });
const ctx = makeUpstreamErrCtx(t, async () => { throw err; });
await runScript(ctx);
assert.strictEqual(ctx._res.statusCode, 502);
});
test('network error (no response, no code) → proxy 502 containing Bad Gateway:', async (t) => {
const err = new Error('ENOTFOUND');
const ctx = makeUpstreamErrCtx(t, async () => { throw err; });
await runScript(ctx);
assert.strictEqual(ctx._res.statusCode, 502);
assert.ok(ctx._res.body.includes('Bad Gateway:'), `body was: ${ctx._res.body}`);
});
});
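// The upstream-error cases above imply a mapping from axios-style errors to
// proxy responses: 404 and 410 become 404, any other HTTP status becomes 502,
// and timeouts, cancellations, and plain network errors also become 502. A
// sketch of that mapping follows. The function name and the exact message
// prefixes are assumptions; only the asserted substrings come from the tests.

```javascript
// Hypothetical sketch of upstream error mapping; the real script may structure this differently.
function mapUpstreamError(err) {
  const upstreamStatus = err?.response?.status;
  // Missing or permanently removed articles are reported as 404 to the client.
  if (upstreamStatus === 404 || upstreamStatus === 410) {
    return { status: 404, message: 'Not Found: article not found at upstream' };
  }
  // Any other HTTP error status from upstream is a gateway failure.
  if (typeof upstreamStatus === 'number') {
    return { status: 502, message: `Bad Gateway: upstream error HTTP ${upstreamStatus}` };
  }
  // axios uses these codes for timeouts and aborted requests.
  if (err?.code === 'ECONNABORTED' || err?.code === 'ERR_CANCELED') {
    return { status: 502, message: 'Bad Gateway: upstream request timed out' };
  }
  // No response and no recognized code: treat as a generic network failure.
  return { status: 502, message: `Bad Gateway: ${err?.message ?? 'unknown error'}` };
}
```

// Checking err.response before err.code keeps HTTP-level failures distinct from
// transport-level ones, which matches how the tests construct their errors.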
// ---------------------------------------------------------------------------
// US-content-fetch: body parsing — T011
// ---------------------------------------------------------------------------
describe('US-content-fetch: body parsing', () => {
/** Build a context whose axios.get resolves with the given data value (token pre-seeded). */
function makeBodyCtx(t, dataValue) {
const ctx = makeContext(t, {
req: { url: '/?kmeURL=https://kme.example.com/article', method: 'GET', headers: {} },
axios: {
post: t.mock.fn(async () => ({ data: { id_token: 'mock-token', expires_in: 9_999_999_999 } })),
get: t.mock.fn(async () => ({ data: dataValue })),
},
});
ctx._store['authorization:token'] = 'cached-tok';
ctx._store['authorization:expiry'] = '9999999999';
return ctx;
}
test('unparseable string response → 502 unparseable response from upstream', async (t) => {
const ctx = makeBodyCtx(t, 'not json{{{');
await runScript(ctx);
assert.strictEqual(ctx._res.statusCode, 502);
assert.ok(
ctx._res.body.includes('unparseable response from upstream'),
`body was: ${ctx._res.body}`,
);
});
test('vkm:articleBody absent (undefined) → 404 article body not present', async (t) => {
const ctx = makeBodyCtx(t, { 'vkm:articleBody': undefined });
await runScript(ctx);
assert.strictEqual(ctx._res.statusCode, 404);
assert.ok(ctx._res.body.includes('article body not present'), `body was: ${ctx._res.body}`);
});
test('vkm:articleBody is null → 404', async (t) => {
const ctx = makeBodyCtx(t, { 'vkm:articleBody': null });
await runScript(ctx);
assert.strictEqual(ctx._res.statusCode, 404);
});
test('vkm:articleBody is empty string → 404', async (t) => {
const ctx = makeBodyCtx(t, { 'vkm:articleBody': '' });
await runScript(ctx);
assert.strictEqual(ctx._res.statusCode, 404);
});
test('vkm:articleBody is whitespace-only → 404', async (t) => {
const ctx = makeBodyCtx(t, { 'vkm:articleBody': ' ' });
await runScript(ctx);
assert.strictEqual(ctx._res.statusCode, 404);
});
});
// ---------------------------------------------------------------------------
// US-content-fetch: passthrough preserved — T013
// ---------------------------------------------------------------------------
describe('US-content-fetch: passthrough preserved', () => {
test('GET with no kmeURL, not sitemap → 404 Not Found, axios.get not called', async (t) => {
const ctx = makeContext(t, {
req: { url: '/?someOtherParam=value', method: 'GET', headers: {} },
});
await runScript(ctx);
assert.strictEqual(ctx._res.statusCode, 404);
assert.strictEqual(ctx._axios.get.mock.calls.length, 0, 'axios.get must not be called');
});
});