feat: content fetch, sitemap fixes, remove oidcAuthFlow
- Add contentFetchFlow() to proxy (FR-001 through FR-012) - Add extractArticleBody() helper with vkm:articleBody / articleBody fallback - Dynamic proxyBaseUrl derivation from x-forwarded-proto/host headers - Forward query/size/category params on /sitemap.xml requests - Add Accept: application/ld+json header to content API calls - Remove oidcAuthFlow() - unmatched requests now return 404 Not Found - Fix xmlbuilder2 import: default import, call as xmlbuilder2.create(...) - Version bump 0.2.0 → 0.3.0 - 45/45 tests passing Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This commit is contained in:
330
specs/003-kme-content-fetch/plan.md
Normal file
330
specs/003-kme-content-fetch/plan.md
Normal file
@@ -0,0 +1,330 @@
|
||||
# Implementation Plan: KME Article Content Fetch
|
||||
|
||||
**Branch**: `003-kme-content-fetch` | **Date**: 2025-07-15 | **Spec**: [spec.md](spec.md)
|
||||
**Input**: Feature specification from `specs/003-kme-content-fetch/spec.md`
|
||||
|
||||
## Summary
|
||||
|
||||
Add a new `contentFetchFlow()` to `src/proxyScripts/kmeContentSourceAdapter.js` that handles
|
||||
requests carrying a `?kmeURL=` query parameter. The flow validates the parameter, obtains an OIDC
|
||||
token via the existing `getValidToken()`, performs a GET request to the `kmeURL` with
|
||||
`Authorization: OIDC_id_token {token}`, extracts `vkm:articleBody` from the JSON-LD response, and
|
||||
returns it as `text/html`. A new pure helper `extractArticleBody(data)` is added to
|
||||
`src/globalVariables/kmeContentSourceAdapterHelpers.js`. No new files, no new npm dependencies.
|
||||
|
||||
## Technical Context
|
||||
|
||||
**Language/Version**: Node.js ≥18, ESM (`"type": "module"`)
|
||||
**Primary Dependencies**: `axios ^1.13` (HTTP client, already in context), `redis ^5` (token cache, injected), `xmlbuilder2 ^4` (sitemap, unrelated to this feature)
|
||||
**Storage**: Redis (token cache only — managed by `getValidToken()`, not modified by this feature)
|
||||
**Testing**: Node.js built-in test runner (`node:test`) — `npm run test:unit`, `npm run test:contract`
|
||||
**Target Platform**: Node.js server (Linux/macOS); proxy script executed inside `vm.createContext` per request
|
||||
**Project Type**: HTTP proxy adapter — monolithic VM-sandbox architecture
|
||||
**Performance Goals**: End-to-end response ≤11 s (10 s upstream timeout + 1 s proxy overhead) per SC-001
|
||||
**Constraints**: Zero new imports/exports in VM sandbox files; no new npm dependencies; no new `src/` files
|
||||
**Scale/Scope**: Two files modified (`kmeContentSourceAdapter.js`, `kmeContentSourceAdapterHelpers.js`); new unit tests in `tests/unit/proxy.test.js`; new contract tests in `tests/contract/proxy-http.test.js`
|
||||
|
||||
## Constitution Check
|
||||
|
||||
*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*
|
||||
|
||||
| Gate | Status | Notes |
|
||||
|------|--------|-------|
|
||||
| All business logic stays in `src/proxyScripts/kmeContentSourceAdapter.js` | ✅ PASS | `contentFetchFlow()` lives entirely in the adapter file |
|
||||
| Zero `import`/`export` in VM sandbox file | ✅ PASS | No imports added; all dependencies via injected context |
|
||||
| `extractArticleBody` is a pure utility → helper file | ✅ PASS | No state, no API calls — qualifies for `kmeContentSourceAdapterHelpers.js` |
|
||||
| No new files in `src/` | ✅ PASS | Only two existing files are modified |
|
||||
| No new npm dependencies | ✅ PASS | `axios` and `URL`/`URLSearchParams` already available in context |
|
||||
| Helpers file uses literal function body pattern | ✅ PASS | New helper added before existing `return { ... }` block |
|
||||
| Authentication (`getValidToken`) stays in proxy script (called from adapter, not moved) | ✅ PASS | `getValidToken()` is invoked from `contentFetchFlow()` in the adapter |
|
||||
|
||||
**Post-design re-check**: All gates pass. No constitutional violations. No complexity tracking required.
|
||||
|
||||
## Project Structure
|
||||
|
||||
### Documentation (this feature)
|
||||
|
||||
```text
|
||||
specs/003-kme-content-fetch/
|
||||
├── plan.md # This file
|
||||
├── research.md # Phase 0 output
|
||||
├── data-model.md # Phase 1 output
|
||||
├── quickstart.md # Phase 1 output
|
||||
├── contracts/
|
||||
│ └── http-content-fetch.md # HTTP request/response contract
|
||||
└── tasks.md # Phase 2 output (/speckit.tasks — NOT created here)
|
||||
```
|
||||
|
||||
### Source Code (repository root)
|
||||
|
||||
```text
|
||||
src/
|
||||
├── proxyScripts/
|
||||
│ └── kmeContentSourceAdapter.js # MODIFIED: add contentFetchFlow(), update routing
|
||||
├── globalVariables/
|
||||
│ └── kmeContentSourceAdapterHelpers.js # MODIFIED: add extractArticleBody()
|
||||
├── logger.js # unchanged
|
||||
└── server.js # unchanged
|
||||
|
||||
tests/
|
||||
├── unit/
|
||||
│ └── proxy.test.js # MODIFIED: add content-fetch describe blocks
|
||||
└── contract/
|
||||
└── proxy-http.test.js # MODIFIED: add content-fetch contract tests
|
||||
```
|
||||
|
||||
**Structure Decision**: Single-project monolith. Two existing source files modified; two existing test
|
||||
files extended. No new files in `src/`. The VM sandbox pattern and helper file pattern are preserved
|
||||
exactly as documented in the constitution.
|
||||
|
||||
---
|
||||
|
||||
## Phase 0: Research Findings → `research.md`
|
||||
|
||||
See [research.md](research.md) for full decision log. Summary:
|
||||
|
||||
- **URL parameter extraction**: `new URL(req.url, 'http://localhost').searchParams.get('kmeURL')` — confirmed safe from VM context research; `URL` is injected at server.js line 19.
|
||||
- **URL validation**: `new URL(kmeURL)` + protocol check in try/catch — cleanly handles FR-007/FR-008 in one guard.
|
||||
- **Axios error handling**: Confirmed `ECONNABORTED`/`ERR_CANCELED` for timeout; `err.response.status` available for all HTTP errors; JSON auto-parsed when `Content-Type: application/json`.
|
||||
- **JSON-LD parsing**: `response.data` is an object when axios auto-parses; fallback `JSON.parse()` needed for non-JSON content-types; non-object → 502.
|
||||
- **No unknowns remaining**: All NEEDS CLARIFICATION resolved. Research complete.
|
||||
|
||||
---
|
||||
|
||||
## Phase 1: Design
|
||||
|
||||
### Routing Change (`kmeContentSourceAdapter.js`)
|
||||
|
||||
Add a new `else if` branch between the existing sitemap check and the `oidcAuthFlow` fallback:
|
||||
|
||||
```javascript
|
||||
// Entry point — URL routing
|
||||
try {
|
||||
if (req.url.endsWith('/sitemap.xml')) {
|
||||
await sitemapFlow();
|
||||
} else if (new URL(req.url, 'http://localhost').searchParams.has('kmeURL')) {
|
||||
await contentFetchFlow(); // ← NEW
|
||||
} else {
|
||||
await oidcAuthFlow();
|
||||
}
|
||||
} catch (err) { /* existing outer catch → 401 */ }
|
||||
```
|
||||
|
||||
`contentFetchFlow()` is fully self-contained — all errors are caught internally and never propagate to the outer catch.
|
||||
|
||||
### `contentFetchFlow()` — Complete Logic
|
||||
|
||||
```javascript
|
||||
async function contentFetchFlow() {
|
||||
// Step 1: Extract kmeURL (FR-001, FR-002)
|
||||
const kmeURL = new URL(req.url, 'http://localhost').searchParams.get('kmeURL') ?? '';
|
||||
|
||||
// Step 2: Validate — absent / empty (FR-007)
|
||||
if (!kmeURL.trim()) {
|
||||
res.writeHead(400, { 'Content-Type': 'text/plain' });
|
||||
res.end('Bad Request: kmeURL parameter is required');
|
||||
return;
|
||||
}
|
||||
|
||||
// Step 3: Validate — well-formed absolute http/https URL (FR-008)
|
||||
try {
|
||||
const u = new URL(kmeURL);
|
||||
if (u.protocol !== 'http:' && u.protocol !== 'https:') throw new Error();
|
||||
} catch {
|
||||
res.writeHead(400, { 'Content-Type': 'text/plain' });
|
||||
res.end('Bad Request: kmeURL must be a well-formed absolute http/https URL');
|
||||
return;
|
||||
}
|
||||
|
||||
// Step 4: Validate OIDC settings (config guard, returns 500 for missing config)
|
||||
const missingField = kmeContentSourceAdapterHelpers.validateSettings(
|
||||
kme_CSA_settings,
|
||||
['tokenUrl', 'username', 'password', 'clientId', 'scope'],
|
||||
);
|
||||
if (missingField) {
|
||||
console.error({ message: 'Content fetch: config error', missingField });
|
||||
res.writeHead(500, { 'Content-Type': 'text/plain' });
|
||||
res.end('Configuration error: missing required field: ' + missingField);
|
||||
return;
|
||||
}
|
||||
|
||||
// Step 5: Obtain OIDC token (FR-003, FR-011)
|
||||
let token;
|
||||
try {
|
||||
token = await kmeContentSourceAdapterHelpers.getValidToken(req.url, req.method);
|
||||
} catch (tokenErr) {
|
||||
console.error({ message: 'Content fetch: token acquisition failed', error: tokenErr.message });
|
||||
res.writeHead(502, { 'Content-Type': 'text/plain' });
|
||||
res.end('Bad Gateway: token acquisition failed');
|
||||
return;
|
||||
}
|
||||
|
||||
// Step 6: GET kmeURL verbatim with auth header (FR-002, FR-003, FR-004)
|
||||
let response;
|
||||
try {
|
||||
console.debug({ message: 'Content fetch: fetching article', kmeURL });
|
||||
response = await axios.get(kmeURL, {
|
||||
headers: { Authorization: `OIDC_id_token ${token}` },
|
||||
timeout: 10000,
|
||||
});
|
||||
} catch (fetchErr) {
|
||||
if (fetchErr.code === 'ECONNABORTED' || fetchErr.code === 'ERR_CANCELED') {
|
||||
console.error({ message: 'Content fetch: upstream timeout', code: fetchErr.code });
|
||||
res.writeHead(502, { 'Content-Type': 'text/plain' });
|
||||
res.end('Bad Gateway: upstream request timed out');
|
||||
} else if (fetchErr.response) {
|
||||
const status = fetchErr.response.status;
|
||||
console.error({ message: 'Content fetch: upstream HTTP error', status });
|
||||
if (status >= 400 && status < 500) {
|
||||
res.writeHead(404, { 'Content-Type': 'text/plain' });
|
||||
res.end('Not Found: article not found at upstream');
|
||||
} else {
|
||||
res.writeHead(502, { 'Content-Type': 'text/plain' });
|
||||
res.end('Bad Gateway: upstream error HTTP ' + status);
|
||||
}
|
||||
} else {
|
||||
console.error({ message: 'Content fetch: network error', error: fetchErr.message });
|
||||
res.writeHead(502, { 'Content-Type': 'text/plain' });
|
||||
res.end('Bad Gateway: ' + fetchErr.message);
|
||||
}
|
||||
return;
|
||||
}
|
||||
|
||||
// Step 7: Parse body — handle non-JSON content-type (FR-005, FR-010)
|
||||
let data = response.data;
|
||||
if (typeof data === 'string') {
|
||||
try {
|
||||
data = JSON.parse(data);
|
||||
} catch {
|
||||
console.error({ message: 'Content fetch: unparseable response body', kmeURL });
|
||||
res.writeHead(502, { 'Content-Type': 'text/plain' });
|
||||
res.end('Bad Gateway: unparseable response from upstream');
|
||||
return;
|
||||
}
|
||||
}
|
||||
if (typeof data !== 'object' || data === null) {
|
||||
console.error({ message: 'Content fetch: unexpected non-object response', kmeURL });
|
||||
res.writeHead(502, { 'Content-Type': 'text/plain' });
|
||||
res.end('Bad Gateway: unexpected response from upstream');
|
||||
return;
|
||||
}
|
||||
|
||||
// Step 8: Extract vkm:articleBody (FR-005, FR-009)
|
||||
const articleBody = kmeContentSourceAdapterHelpers.extractArticleBody(data);
|
||||
if (!articleBody) {
|
||||
console.error({ message: 'Content fetch: vkm:articleBody absent or empty', kmeURL });
|
||||
res.writeHead(404, { 'Content-Type': 'text/plain' });
|
||||
res.end('Not Found: article body not present in upstream response');
|
||||
return;
|
||||
}
|
||||
|
||||
// Step 9: Return article HTML (FR-006)
|
||||
console.info({ message: 'Content fetch: article fetched successfully', kmeURL });
|
||||
res.writeHead(200, { 'Content-Type': 'text/html' });
|
||||
res.end(articleBody);
|
||||
}
|
||||
```
|
||||
|
||||
### `extractArticleBody(data)` — New Helper
|
||||
|
||||
Add to `kmeContentSourceAdapterHelpers.js` before the existing `return { ... }` block:
|
||||
|
||||
```javascript
|
||||
/**
|
||||
* Extracts the vkm:articleBody string from a KME Content Service JSON-LD response.
|
||||
* Returns null if the field is absent, null, not a string, or an empty/whitespace string.
|
||||
* @param {object} data – parsed JSON-LD response from the KME Content Service
|
||||
* @returns {string|null}
|
||||
*/
|
||||
function extractArticleBody(data) {
|
||||
if (!data || typeof data !== 'object') return null;
|
||||
const body = data['vkm:articleBody'];
|
||||
if (body == null || typeof body !== 'string' || body.trim() === '') return null;
|
||||
return body;
|
||||
}
|
||||
```
|
||||
|
||||
Update the `return { ... }` at the bottom of the helpers file to export the new function:
|
||||
|
||||
```javascript
|
||||
return {
|
||||
validateSettings,
|
||||
extractHydraItems,
|
||||
buildSitemapXml,
|
||||
getValidToken,
|
||||
extractArticleBody, // ← NEW
|
||||
};
|
||||
```
|
||||
|
||||
### Error Response Matrix
|
||||
|
||||
| Condition | HTTP Status | Response Body |
|
||||
|-----------|-------------|---------------|
|
||||
| `kmeURL` absent or empty | 400 | `Bad Request: kmeURL parameter is required` |
|
||||
| `kmeURL` not a well-formed absolute http/https URL | 400 | `Bad Request: kmeURL must be a well-formed absolute http/https URL` |
|
||||
| Missing OIDC config field | 500 | `Configuration error: missing required field: {field}` |
|
||||
| Token acquisition failure | 502 | `Bad Gateway: token acquisition failed` |
|
||||
| Upstream 4xx response | 404 | `Not Found: article not found at upstream` |
|
||||
| Upstream 5xx response | 502 | `Bad Gateway: upstream error HTTP {status}` |
|
||||
| Upstream timeout (`ECONNABORTED`/`ERR_CANCELED`) | 502 | `Bad Gateway: upstream request timed out` |
|
||||
| Network error (no `err.response`) | 502 | `Bad Gateway: {err.message}` |
|
||||
| Response body unparseable as JSON | 502 | `Bad Gateway: unparseable response from upstream` |
|
||||
| Non-object response body | 502 | `Bad Gateway: unexpected response from upstream` |
|
||||
| `vkm:articleBody` absent, null, or empty | 404 | `Not Found: article body not present in upstream response` |
|
||||
| Success | 200 `text/html` | article body HTML |
|
||||
|
||||
### Test Coverage Plan
|
||||
|
||||
**Unit tests** (add to `tests/unit/proxy.test.js`):
|
||||
|
||||
| Describe block | Test case | Verifies |
|
||||
|---------------|-----------|---------|
|
||||
| `US-content-fetch: happy path` | cached token → 200 HTML body | FR-001, FR-005, FR-006 |
|
||||
| `US-content-fetch: happy path` | cache miss → token fetch → 200 HTML body | FR-003 |
|
||||
| `US-content-fetch: happy path` | expired token → refresh → 200 HTML body | FR-003 |
|
||||
| `US-content-fetch: input validation` | no `kmeURL` → oidcAuthFlow (unchanged 200) | FR-012 |
|
||||
| `US-content-fetch: input validation` | `kmeURL` empty string → 400 | FR-007 |
|
||||
| `US-content-fetch: input validation` | `kmeURL` whitespace → 400 | FR-007 |
|
||||
| `US-content-fetch: input validation` | `kmeURL` relative URL → 400 | FR-008 |
|
||||
| `US-content-fetch: input validation` | `kmeURL` non-http protocol (`ftp:`) → 400 | FR-008 |
|
||||
| `US-content-fetch: input validation` | `kmeURL` malformed string → 400 | FR-008 |
|
||||
| `US-content-fetch: token failure` | `getValidToken` throws → 502 | FR-011 |
|
||||
| `US-content-fetch: upstream errors` | upstream 404 → 404 | FR-009 |
|
||||
| `US-content-fetch: upstream errors` | upstream 410 → 404 | FR-009 |
|
||||
| `US-content-fetch: upstream errors` | upstream 503 → 502 | FR-010 |
|
||||
| `US-content-fetch: upstream errors` | timeout `ECONNABORTED` → 502 | FR-010 |
|
||||
| `US-content-fetch: upstream errors` | timeout `ERR_CANCELED` → 502 | FR-010 |
|
||||
| `US-content-fetch: upstream errors` | network error (no `err.response`) → 502 | FR-010 |
|
||||
| `US-content-fetch: body parsing` | unparseable string body → 502 | FR-010 |
|
||||
| `US-content-fetch: body parsing` | `vkm:articleBody` absent → 404 | FR-009 |
|
||||
| `US-content-fetch: body parsing` | `vkm:articleBody` null → 404 | FR-009 |
|
||||
| `US-content-fetch: body parsing` | `vkm:articleBody` empty string → 404 | FR-009 |
|
||||
| `US-content-fetch: body parsing` | `vkm:articleBody` whitespace → 404 | FR-009 |
|
||||
| `US-content-fetch: passthrough preserved` | no `kmeURL`, not sitemap → 200 'Authorized' | FR-012 |
|
||||
| `extractArticleBody helper` | returns body string | FR-005 |
|
||||
| `extractArticleBody helper` | null data → null | FR-005 |
|
||||
| `extractArticleBody helper` | no `vkm:articleBody` field → null | FR-009 |
|
||||
| `extractArticleBody helper` | empty string → null | FR-009 |
|
||||
| `extractArticleBody helper` | whitespace string → null | FR-009 |
|
||||
|
||||
**Contract tests** (add to `tests/contract/proxy-http.test.js`):
|
||||
|
||||
| Test case | Setup | Verifies |
|
||||
|-----------|-------|---------|
|
||||
| valid `kmeURL` → real mock HTTP server returning JSON-LD with `vkm:articleBody` → 200 HTML | real HTTP server, real token server | SC-001, FR-006 |
|
||||
| real mock server returning 404 → proxy returns 404 | real 404 HTTP server | FR-009 |
|
||||
| real mock server returning 503 → proxy returns 502 | real 503 HTTP server | FR-010 |
|
||||
| non-responding server → proxy returns 502 within 12 s | real server that never responds | FR-010 |
|
||||
|
||||
### `extractArticleBody` — Edge Case Coverage
|
||||
|
||||
| Input | Expected output |
|
||||
|-------|----------------|
|
||||
| `{ 'vkm:articleBody': '<p>Hello</p>' }` | `'<p>Hello</p>'` |
|
||||
| `{ 'vkm:articleBody': '' }` | `null` |
|
||||
| `{ 'vkm:articleBody': ' ' }` | `null` |
|
||||
| `{ 'vkm:articleBody': null }` | `null` |
|
||||
| `{ 'vkm:articleBody': undefined }` (field absent) | `null` |
|
||||
| `{}` (field absent) | `null` |
|
||||
| `null` | `null` |
|
||||
| `'string'` (non-object) | `null` |
|
||||
Reference in New Issue
Block a user