- Add contentFetchFlow() to proxy (FR-001 through FR-012)
- Add extractArticleBody() helper with vkm:articleBody / articleBody fallback
- Dynamic proxyBaseUrl derivation from x-forwarded-proto/host headers
- Forward query/size/category params on /sitemap.xml requests
- Add Accept: application/ld+json header to content API calls
- Remove oidcAuthFlow() - unmatched requests now return 404 Not Found
- Fix xmlbuilder2 import: default import, call as xmlbuilder2.create(...)
- Version bump 0.2.0 → 0.3.0
- 45/45 tests passing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Data Model: KME Article Content Fetch (003)
Phase 1 output for 003-kme-content-fetch
Entities
1. KME Article Content
Represents a single article fetched from the KME Content Service.
| Field | Type | Source | Notes |
|---|---|---|---|
| `vkm:url` | `string` | KME Content Service JSON-LD | Identifies the article; used as the fetch target (`kmeURL` param) |
| `vkm:articleBody` | `string \| null` | KME Content Service JSON-LD | HTML body of the article; may be absent, null, or empty |
Validation rules:
- `vkm:articleBody` must be a non-empty, non-whitespace string to constitute a valid article body.
- Absent, null, empty string, and whitespace-only string are all treated as "article body not present" → 404.
State transitions: None. This is a read-only fetch — no mutations, no lifecycle.
2. OIDC Token
Short-lived bearer credential used to authenticate upstream requests to the KME Content Service.
| Field | Type | Storage | Notes |
|---|---|---|---|
| `id_token` | `string` | Redis hash `authorization:token` | The OIDC `id_token` value |
| `expiry` | `number` (Unix epoch, seconds) | Redis hash `authorization:expiry` | Expiry timestamp; compared to `Date.now() / 1000` |
Validation rules:
- Token is valid if `cachedToken !== null && Date.now() / 1000 < expiry`.
- Managed exclusively by `getValidToken()` in `kmeContentSourceAdapterHelpers.js`. Not modified by `contentFetchFlow()`.
Managed by: getValidToken() (existing helper — unmodified).
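The validity rule above can be sketched as a small pure function. The name `isTokenValid` and the injectable `nowMs` parameter are assumptions made for illustration only; the real check lives inside `getValidToken()`, whose internals are not shown in this document.

```javascript
// Hypothetical helper illustrating the validity rule; not part of the adapter's API.
// cachedToken: the id_token string read from Redis, or null when nothing is cached.
// expiry: Unix epoch in seconds.
// nowMs: current time in milliseconds (injectable so the rule can be tested).
function isTokenValid(cachedToken, expiry, nowMs = Date.now()) {
  // Date.now() is in milliseconds and expiry is in seconds, so convert before comparing.
  return cachedToken !== null && nowMs / 1000 < expiry;
}
```

Because the comparison is strict, a token whose `expiry` equals the current second is treated as already expired.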
3. Proxy Request
Incoming HTTP request received by the adapter, carrying routing signals and parameters.
| Field | Type | Source | Notes |
|---|---|---|---|
| `req.url` | `string` | Node.js `http.IncomingMessage` | Relative path + query string, e.g. `/?kmeURL=https://...` |
| `req.method` | `string` | Node.js `http.IncomingMessage` | Always `GET` for content-fetch flow |
| `kmeURL` (extracted) | `string` | `new URL(req.url, 'http://localhost').searchParams.get('kmeURL')` | The verbatim target URL for upstream fetch |
Validation rules for `kmeURL`:
- Absent or empty: `!kmeURL || !kmeURL.trim()` → 400 Bad Request (FR-007). The null check comes first because `searchParams.get()` returns `null` when the parameter is absent.
- Malformed or non-absolute: `new URL(kmeURL)` throws, or protocol is not `http:`/`https:` → 400 Bad Request (FR-008)
- Valid: passes both guards → proceed to token acquisition + upstream fetch
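The two guards can be sketched together as follows. The function name `validateKmeUrl` and its return shape are assumptions for illustration, not the adapter's actual interface. `searchParams.get()` returns `null` for a missing parameter, so the sketch checks for that before calling `trim()`.

```javascript
// Hypothetical sketch of the kmeURL guards; name and return shape are assumptions.
function validateKmeUrl(reqUrl) {
  // req.url is relative, so a dummy base is needed to parse the query string.
  const kmeURL = new URL(reqUrl, 'http://localhost').searchParams.get('kmeURL');

  // FR-007: absent (null), empty, or whitespace-only.
  if (!kmeURL || !kmeURL.trim()) {
    return { ok: false, status: 400, reason: 'kmeURL is required' };
  }

  // FR-008: must parse as an absolute URL with an http: or https: scheme.
  let parsed;
  try {
    parsed = new URL(kmeURL);
  } catch {
    return { ok: false, status: 400, reason: 'kmeURL is malformed' };
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    return { ok: false, status: 400, reason: 'kmeURL must use http or https' };
  }

  // Return the verbatim string rather than parsed.href so the upstream
  // fetch uses exactly what the caller supplied.
  return { ok: true, kmeURL };
}
```

Returning a result object instead of writing to `res` directly keeps the guard free of side effects, which makes it straightforward to unit-test.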
Data Flow
Incoming request (req.url contains ?kmeURL=...)
│
▼
Extract kmeURL from query string
(new URL(req.url, 'http://localhost').searchParams.get('kmeURL'))
│
▼
┌── Validate kmeURL ──────────────────────────────────────┐
│ absent/empty? ──────────────────────────────► 400 │
│ malformed/non-http(s)? ──────────────────────► 400 │
└─────────────────────────────────────────────────────────┘
│ valid
▼
getValidToken() → OIDC token (from Redis cache or fresh fetch)
│
│ token fetch failed? ──────────────────► 502
▼
axios.get(kmeURL, { headers: { Authorization: 'OIDC_id_token {token}' }, timeout: 10000 })
│
│ timeout? ──────────────────► 502
│ upstream 4xx? ──────────────────► 404
│ upstream 5xx? ──────────────────► 502
│ network error? ──────────────────► 502
▼
Parse response.data as JSON-LD object
│
│ unparseable? ──────────────────► 502
│ non-object? ──────────────────► 502
▼
extractArticleBody(data) → vkm:articleBody string or null
│
│ null (absent/empty/whitespace)? ─────► 404
▼
res.writeHead(200, { 'Content-Type': 'text/html' })
res.end(articleBody)
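The upstream failure branches in the flow above collapse to two outcomes, which can be sketched as a single mapping function. The name `statusForUpstreamError` is hypothetical. The sketch assumes the axios error shape, where `err.response` is populated only when the upstream answered with a non-2xx status and is undefined for timeouts and network failures.

```javascript
// Hypothetical mapping from an axios-style error to the proxy's response status.
// Assumes err.response is set only when the upstream actually replied.
function statusForUpstreamError(err) {
  const upstreamStatus = err && err.response ? err.response.status : undefined;

  // Upstream 4xx: the article is treated as not found.
  if (upstreamStatus >= 400 && upstreamStatus < 500) return 404;

  // Upstream 5xx, timeout, network error, or anything unexpected: bad gateway.
  return 502;
}
```

Defaulting to 502 means any error shape the sketch does not recognise is still reported as an upstream failure rather than as a missing article.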
Helper: extractArticleBody(data)
Location: src/globalVariables/kmeContentSourceAdapterHelpers.js
Type: Pure function — no side effects, no state, no injected globals required
Added to exports: return { ..., extractArticleBody }
Signature:
function extractArticleBody(data) → string | null
Input/output contract:
| Input | Output |
|---|---|
| `{ 'vkm:articleBody': '<p>Hello</p>' }` | `'<p>Hello</p>'` |
| `{ 'vkm:articleBody': '' }` | `null` |
| `{ 'vkm:articleBody': ' ' }` | `null` |
| `{ 'vkm:articleBody': null }` | `null` |
| `{}` (field absent) | `null` |
| `null` | `null` |
| `'a string'` (non-object) | `null` |
Implementation:
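function extractArticleBody(data) {
  // Reject null, undefined, and non-object inputs (e.g. a plain string).
  if (!data || typeof data !== 'object') return null;
  const body = data['vkm:articleBody'];
  // Absent, null, non-string, empty, and whitespace-only bodies all count as "not present".
  if (body == null || typeof body !== 'string' || body.trim() === '') return null;
  return body;
}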
KME Content Service Response Shape
The KME Content Service returns a JSON-LD document. Only vkm:articleBody is consumed by this
feature. Example:
{
  "@context": "https://vocabs.kme.example/context.jsonld",
  "@type": "vkm:Article",
  "vkm:url": "https://content.kme.example/articles/123",
  "vkm:articleBody": "<p>Article content here...</p>",
  "vkm:title": "Example Article"
}
All other fields are ignored by extractArticleBody. The proxy passes the raw HTML string of
vkm:articleBody directly as the response body — no transformation, sanitisation, or re-encoding.
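The parsing step that precedes extraction (the "unparseable" and "non-object" branches of the data flow) could look roughly like the sketch below. The name `toJsonLdObject` is hypothetical, and rejecting arrays is an assumption of this sketch rather than something the specification states.

```javascript
// Hypothetical normaliser for the upstream payload; the name is an assumption.
// Returns a plain object on success, or null when the proxy should answer 502.
function toJsonLdObject(data) {
  let value = data;

  // An HTTP client may hand back a raw string if the body was not parsed as JSON.
  if (typeof value === 'string') {
    try {
      value = JSON.parse(value);
    } catch {
      return null; // unparseable
    }
  }

  // Assumption: arrays and primitives are not a usable JSON-LD document here.
  if (value === null || typeof value !== 'object' || Array.isArray(value)) {
    return null; // non-object
  }
  return value;
}
```

A non-null result would then be handed to `extractArticleBody`, which decides between 200 and 404.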
Settings Used (from kme_CSA_settings)
The content-fetch flow reads the same OIDC settings as oidcAuthFlow and sitemapFlow:
| Field | Purpose |
|---|---|
| `tokenUrl` | OIDC token endpoint |
| `username` | OIDC username credential |
| `password` | OIDC password credential |
| `clientId` | OIDC client identifier |
| `scope` | OIDC requested scope |
These are validated via kmeContentSourceAdapterHelpers.validateSettings() before calling
getValidToken(). Missing fields produce a 500 Configuration Error response.
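As an illustration of the kind of check involved, the sketch below reports which of the five fields are missing. It is a hypothetical stand-in: the function name `findMissingSettings`, the treatment of blank strings as missing, and the return value are all assumptions, and the real `validateSettings()` may behave differently.

```javascript
// Hypothetical stand-in for the settings check; the real validateSettings() may differ.
const REQUIRED_SETTINGS = ['tokenUrl', 'username', 'password', 'clientId', 'scope'];

function findMissingSettings(settings) {
  const source = settings || {};
  // Assumption: a field counts as missing when it is absent, not a string, or blank.
  return REQUIRED_SETTINGS.filter(
    (key) => typeof source[key] !== 'string' || source[key].trim() === ''
  );
}
```

Under these assumptions, a non-empty result would correspond to the 500 Configuration Error response, and an empty array would let the flow continue to `getValidToken()`.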