# Data Model: KME Article Content Fetch (003)

**Phase 1 output for `003-kme-content-fetch`**

---

## Entities

### 1. KME Article Content

Represents a single article fetched from the KME Content Service.

| Field | Type | Source | Notes |
|-------|------|--------|-------|
| `vkm:url` | `string` | KME Content Service JSON-LD | Identifies the article; used as the fetch target (`kmeURL` param) |
| `vkm:articleBody` | `string \| null` | KME Content Service JSON-LD | HTML body of the article; may be absent, null, or empty |

**Validation rules**:

- `vkm:articleBody` must be a non-empty, non-whitespace string to constitute a valid article body.
- Absent, null, empty string, and whitespace-only string are all treated as "article body not present" → 404.

**State transitions**: None. This is a read-only fetch: no mutations, no lifecycle.

---

### 2. OIDC Token

Short-lived bearer credential used to authenticate upstream requests to the KME Content Service.

| Field | Type | Storage | Notes |
|-------|------|---------|-------|
| `id_token` | `string` | Redis hash `authorization:token` | The OIDC id_token value |
| `expiry` | `number` (Unix epoch, seconds) | Redis hash `authorization:expiry` | Expiry timestamp; compared to `Date.now() / 1000` |

**Validation rules**:

- Token is valid if `cachedToken !== null && Date.now() / 1000 < expiry`.
- Managed exclusively by `getValidToken()` in `kmeContentSourceAdapterHelpers.js`. Not modified by `contentFetchFlow()`.

**Managed by**: `getValidToken()` (existing helper, unmodified).

---

### 3. Proxy Request

Incoming HTTP request received by the adapter, carrying routing signals and parameters.

| Field | Type | Source | Notes |
|-------|------|--------|-------|
| `req.url` | `string` | Node.js `http.IncomingMessage` | Relative path + query string, e.g. `/?kmeURL=https://...` |
| `req.method` | `string` | Node.js `http.IncomingMessage` | Always `GET` for content-fetch flow |
| `kmeURL` (extracted) | `string` | `new URL(req.url, 'http://localhost').searchParams.get('kmeURL')` | The verbatim target URL for upstream fetch |

**Validation rules for `kmeURL`**:

1. **Absent or empty**: `!kmeURL || !kmeURL.trim()` → 400 Bad Request (FR-007). The null check is needed because `searchParams.get()` returns `null` when the parameter is absent.
2. **Malformed or non-absolute**: `new URL(kmeURL)` throws, or protocol is not `http:`/`https:` → 400 Bad Request (FR-008)
3. **Valid**: passes both guards → proceed to token acquisition + upstream fetch

---

## Data Flow

```
Incoming request (req.url contains ?kmeURL=...)
        │
        ▼
Extract kmeURL from query string
  (new URL(req.url, 'http://localhost').searchParams.get('kmeURL'))
        │
        ▼
┌── Validate kmeURL ──────────────────────────────────────┐
│  absent/empty?          ────────────────────────► 400   │
│  malformed/non-http(s)? ────────────────────────► 400   │
└─────────────────────────────────────────────────────────┘
        │ valid
        ▼
getValidToken() → OIDC token (from Redis cache or fresh fetch)
        │
        │  token fetch failed? ──────────────────► 502
        ▼
axios.get(kmeURL, { headers: { Authorization: OIDC_id_token {token} }, timeout: 10000 })
        │
        │  timeout?            ──────────────────► 502
        │  upstream 4xx?       ──────────────────► 404
        │  upstream 5xx?       ──────────────────► 502
        │  network error?      ──────────────────► 502
        ▼
Parse response.data as JSON-LD object
        │
        │  unparseable?        ──────────────────► 502
        │  non-object?         ──────────────────► 502
        ▼
extractArticleBody(data) → vkm:articleBody string or null
        │
        │  null (absent/empty/whitespace)? ──────► 404
        ▼
res.writeHead(200, { 'Content-Type': 'text/html' })
res.end(articleBody)
```

---

## Helper: `extractArticleBody(data)`

**Location**: `src/globalVariables/kmeContentSourceAdapterHelpers.js`

**Type**: Pure function with no side effects, no state, and no injected globals required

**Added to exports**: `return { ..., extractArticleBody }`

**Signature**:

```javascript
function extractArticleBody(data) → string | null
```

**Input/output contract**:

| Input | Output |
|-------|--------|
| `{ 'vkm:articleBody': '<p>Hello</p>' }` | `'<p>Hello</p>'` |
| `{ 'vkm:articleBody': '' }` | `null` |
| `{ 'vkm:articleBody': ' ' }` | `null` |
| `{ 'vkm:articleBody': null }` | `null` |
| `{}` (field absent) | `null` |
| `null` | `null` |
| `'a string'` (non-object) | `null` |

**Implementation**:

```javascript
function extractArticleBody(data) {
  // Reject null, undefined, and non-object inputs such as plain strings.
  if (!data || typeof data !== 'object') return null;
  const body = data['vkm:articleBody'];
  // Absent, null, non-string, empty, and whitespace-only bodies all count as "not present".
  if (body == null || typeof body !== 'string' || body.trim() === '') return null;
  return body;
}
```

---

## KME Content Service Response Shape

The KME Content Service returns a JSON-LD document. Only `vkm:articleBody` is consumed by this feature.

Example:

```json
{
  "@context": "https://vocabs.kme.example/context.jsonld",
  "@type": "vkm:Article",
  "vkm:url": "https://content.kme.example/articles/123",
  "vkm:articleBody": "<p>Article content here...</p>",
  "vkm:title": "Example Article"
}
```

All other fields are ignored by `extractArticleBody`. The proxy passes the raw HTML string of `vkm:articleBody` directly as the response body, with no transformation, sanitisation, or re-encoding.

---

## Settings Used (from `kme_CSA_settings`)

The content-fetch flow reads the same OIDC settings as `oidcAuthFlow` and `sitemapFlow`:

| Field | Purpose |
|-------|---------|
| `tokenUrl` | OIDC token endpoint |
| `username` | OIDC username credential |
| `password` | OIDC password credential |
| `clientId` | OIDC client identifier |
| `scope` | OIDC requested scope |

These are validated via `kmeContentSourceAdapterHelpers.validateSettings()` before calling `getValidToken()`. Missing fields produce a 500 Configuration Error response.