kme_content_adapter/specs/003-kme-content-fetch/data-model.md
Peter.Morton f840587e5e feat: content fetch, sitemap fixes, remove oidcAuthFlow
- Add contentFetchFlow() to proxy (FR-001 through FR-012)
- Add extractArticleBody() helper with vkm:articleBody / articleBody fallback
- Dynamic proxyBaseUrl derivation from x-forwarded-proto/host headers
- Forward query/size/category params on /sitemap.xml requests
- Add Accept: application/ld+json header to content API calls
- Remove oidcAuthFlow() - unmatched requests now return 404 Not Found
- Fix xmlbuilder2 import: default import, call as xmlbuilder2.create(...)
- Version bump 0.2.0 → 0.3.0
- 45/45 tests passing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-23 16:40:06 -05:00


# Data Model: KME Article Content Fetch (003)
**Phase 1 output for `003-kme-content-fetch`**

---
## Entities
### 1. KME Article Content
Represents a single article fetched from the KME Content Service.
| Field | Type | Source | Notes |
|-------|------|--------|-------|
| `vkm:url` | `string` | KME Content Service JSON-LD | Identifies the article; used as the fetch target (`kmeURL` param) |
| `vkm:articleBody` | `string \| null` | KME Content Service JSON-LD | HTML body of the article; may be absent, null, or empty |
**Validation rules**:
- `vkm:articleBody` must be a non-empty, non-whitespace string to constitute a valid article body.
- Absent, null, empty string, and whitespace-only string are all treated as "article body not present" → 404.
**State transitions**: None. This is a read-only fetch — no mutations, no lifecycle.

---
### 2. OIDC Token
Short-lived bearer credential used to authenticate upstream requests to the KME Content Service.
| Field | Type | Storage | Notes |
|-------|------|---------|-------|
| `id_token` | `string` | Redis hash `authorization:token` | The OIDC id_token value |
| `expiry` | `number` (Unix epoch, seconds) | Redis hash `authorization:expiry` | Expiry timestamp; compared to `Date.now() / 1000` |
**Validation rules**:
- Token is valid if `cachedToken !== null && Date.now() / 1000 < expiry`.

**Managed by**: `getValidToken()` in `kmeContentSourceAdapterHelpers.js` (existing helper, unmodified). `contentFetchFlow()` only reads the token and never writes it.
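
The validity rule above can be expressed as a small pure predicate. This is a minimal sketch, not the body of `getValidToken()`: the name `isTokenFresh` and the injectable `nowMs` parameter are assumptions made for illustration. It also assumes the expiry may come back from Redis as a string, and it treats `undefined` the same as `null`.

```javascript
// Sketch only: `isTokenFresh` is a hypothetical name, not the adapter's API.
// `expiry` is a Unix timestamp in seconds, as described in the table above.
function isTokenFresh(cachedToken, expiry, nowMs = Date.now()) {
  // Redis values are typically strings, so coerce before comparing (assumption).
  const expirySeconds = Number(expiry);
  if (cachedToken == null || !Number.isFinite(expirySeconds)) return false;
  // Date.now() is in milliseconds; the stored expiry is in seconds.
  return nowMs / 1000 < expirySeconds;
}
```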

---
### 3. Proxy Request
Incoming HTTP request received by the adapter, carrying routing signals and parameters.
| Field | Type | Source | Notes |
|-------|------|--------|-------|
| `req.url` | `string` | Node.js `http.IncomingMessage` | Relative path + query string, e.g. `/?kmeURL=https://...` |
| `req.method` | `string` | Node.js `http.IncomingMessage` | Always `GET` for content-fetch flow |
| `kmeURL` (extracted) | `string` | `new URL(req.url, 'http://localhost').searchParams.get('kmeURL')` | The verbatim target URL for upstream fetch |
**Validation rules for `kmeURL`**:
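1. **Absent or empty**: `!kmeURL || !kmeURL.trim()` → 400 Bad Request (FR-007). `searchParams.get()` returns `null` when the parameter is absent, so the null check must come before `.trim()`.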
2. **Malformed or non-absolute**: `new URL(kmeURL)` throws, or protocol is not `http:`/`https:` → 400 Bad Request (FR-008)
3. **Valid**: passes both guards → proceed to token acquisition + upstream fetch
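
The three rules above could be combined into one pure function, sketched below. The name `validateKmeUrl`, the `{ ok, status, reason }` return shape, and the reason strings are hypothetical. Only the 400 status codes and the extraction expression come from this document.

```javascript
// Sketch only: `validateKmeUrl` and its return shape are hypothetical.
function validateKmeUrl(reqUrl) {
  // req.url is relative, so a placeholder base is needed to parse it.
  const kmeURL = new URL(reqUrl, 'http://localhost').searchParams.get('kmeURL');

  // Rule 1 (FR-007): get() returns null when the parameter is absent.
  if (kmeURL === null || kmeURL.trim() === '') {
    return { ok: false, status: 400, reason: 'kmeURL is required' };
  }

  // Rule 2 (FR-008): must parse as an absolute URL with an http(s) scheme.
  let parsed;
  try {
    parsed = new URL(kmeURL);
  } catch {
    return { ok: false, status: 400, reason: 'kmeURL is not an absolute URL' };
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    return { ok: false, status: 400, reason: 'kmeURL must use http or https' };
  }

  // Rule 3: valid. The original string is returned verbatim for the upstream fetch.
  return { ok: true, kmeURL };
}
```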

---
## Data Flow
```
Incoming request (req.url contains ?kmeURL=...)
Extract kmeURL from query string
(new URL(req.url, 'http://localhost').searchParams.get('kmeURL'))
┌── Validate kmeURL ──────────────────────────────────────┐
│ absent/empty? ──────────────────────────────► 400 │
│ malformed/non-http(s)? ──────────────────────► 400 │
└─────────────────────────────────────────────────────────┘
│ valid
getValidToken() → OIDC token (from Redis cache or fresh fetch)
│ token fetch failed? ──────────────────► 502
axios.get(kmeURL, { headers: { Authorization: 'OIDC_id_token {token}' }, timeout: 10000 })
│ timeout? ──────────────────► 502
│ upstream 4xx? ──────────────────► 404
│ upstream 5xx? ──────────────────► 502
│ network error? ──────────────────► 502
Parse response.data as JSON-LD object
│ unparseable? ──────────────────► 502
│ non-object? ──────────────────► 502
extractArticleBody(data) → vkm:articleBody string or null
│ null (absent/empty/whitespace)? ─────► 404
res.writeHead(200, { 'Content-Type': 'text/html' })
res.end(articleBody)
```
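
The upstream failure branches in the diagram reduce to one mapping from a rejected request to a proxy status code. The sketch below assumes the axios convention that `error.response` is populated only when the upstream answered with a non-2xx status, and is absent for timeouts and network errors. `statusForUpstreamError` is a hypothetical name.

```javascript
// Sketch only: `statusForUpstreamError` is a hypothetical helper.
// Input is the error object a rejected axios.get() would produce.
function statusForUpstreamError(error) {
  // No `response` means the upstream never answered (timeout, DNS, reset).
  const upstreamStatus = error && error.response ? error.response.status : undefined;
  // Upstream 4xx is reported to the client as 404 per the flow above.
  if (upstreamStatus >= 400 && upstreamStatus < 500) return 404;
  // Upstream 5xx, timeouts, and network errors all become 502.
  return 502;
}
```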
---
## Helper: `extractArticleBody(data)`
**Location**: `src/globalVariables/kmeContentSourceAdapterHelpers.js`
**Type**: Pure function — no side effects, no state, no injected globals required
**Added to exports**: `return { ..., extractArticleBody }`
**Signature**:
```javascript
/** @returns {string | null} */
function extractArticleBody(data) { /* see Implementation */ }
```
**Input/output contract**:
| Input | Output |
|-------|--------|
| `{ 'vkm:articleBody': '<p>Hello</p>' }` | `'<p>Hello</p>'` |
| `{ 'vkm:articleBody': '' }` | `null` |
| `{ 'vkm:articleBody': ' ' }` | `null` |
| `{ 'vkm:articleBody': null }` | `null` |
| `{}` (field absent) | `null` |
| `null` | `null` |
| `'a string'` (non-object) | `null` |
**Implementation**:
```javascript
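function extractArticleBody(data) {
  // Reject null, undefined, and primitives such as strings or numbers.
  if (!data || typeof data !== 'object') return null;
  const body = data['vkm:articleBody'];
  // Absent, null, non-string, empty, and whitespace-only all mean "no body".
  if (body == null || typeof body !== 'string' || body.trim() === '') return null;
  return body;
}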
```
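
One way to keep the contract table and the implementation in sync is a table-driven check, sketched below. The function body is copied from the Implementation block so the snippet runs standalone. `contractCases` and `failures` are illustrative names and are not part of the project's test suite.

```javascript
// Copied from the Implementation block so this snippet is self-contained.
function extractArticleBody(data) {
  if (!data || typeof data !== 'object') return null;
  const body = data['vkm:articleBody'];
  if (body == null || typeof body !== 'string' || body.trim() === '') return null;
  return body;
}

// Each row mirrors one row of the input/output contract table: [input, expected].
const contractCases = [
  [{ 'vkm:articleBody': '<p>Hello</p>' }, '<p>Hello</p>'],
  [{ 'vkm:articleBody': '' }, null],
  [{ 'vkm:articleBody': '   ' }, null],
  [{ 'vkm:articleBody': null }, null],
  [{}, null],
  [null, null],
  ['a string', null],
];

// Collect every row where the actual result differs from the expected one.
const failures = contractCases.filter(
  ([input, expected]) => extractArticleBody(input) !== expected
);
console.log(failures.length === 0 ? 'contract holds' : failures);
```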
---
## KME Content Service Response Shape
The KME Content Service returns a JSON-LD document. Only `vkm:articleBody` is consumed by this
feature. Example:
```json
{
  "@context": "https://vocabs.kme.example/context.jsonld",
  "@type": "vkm:Article",
  "vkm:url": "https://content.kme.example/articles/123",
  "vkm:articleBody": "<p>Article content here...</p>",
  "vkm:title": "Example Article"
}
```
All other fields are ignored by `extractArticleBody`. The proxy passes the raw HTML string of
`vkm:articleBody` directly as the response body — no transformation, sanitisation, or re-encoding.
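
Before `extractArticleBody` runs, the flow has to decide whether the upstream payload is a JSON-LD object at all. The sketch below assumes the HTTP client may hand back either an already-parsed object or a raw string, depending on its configuration and the upstream content type. `toJsonLdObject` is a hypothetical name, and a `null` return stands for the "unparseable" and "non-object" branches that produce a 502.

```javascript
// Sketch only: `toJsonLdObject` is a hypothetical helper.
// Returns the parsed object, or null when the payload is unusable.
function toJsonLdObject(payload) {
  let value = payload;
  if (typeof payload === 'string') {
    try {
      value = JSON.parse(payload);
    } catch {
      return null; // unparseable: caller answers 502
    }
  }
  // Primitives and null are not JSON-LD objects: caller answers 502.
  if (value === null || typeof value !== 'object') return null;
  return value;
}
```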

---
## Settings Used (from `kme_CSA_settings`)
The content-fetch flow reads the same OIDC settings as `sitemapFlow` (and as `oidcAuthFlow` did before its removal in 0.3.0):
| Field | Purpose |
|-------|---------|
| `tokenUrl` | OIDC token endpoint |
| `username` | OIDC username credential |
| `password` | OIDC password credential |
| `clientId` | OIDC client identifier |
| `scope` | OIDC requested scope |
These are validated via `kmeContentSourceAdapterHelpers.validateSettings()` before calling
`getValidToken()`. Missing fields produce a 500 Configuration Error response.
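
A required-fields check of this kind might look like the sketch below. It is not the real `validateSettings()`, whose signature and return value are not shown in this document. `findMissingSettings` and `REQUIRED_SETTINGS` are hypothetical names, and treating non-string or whitespace-only values as missing is an assumption.

```javascript
// Sketch only: not the real validateSettings() helper.
// Field names are taken from the settings table above.
const REQUIRED_SETTINGS = ['tokenUrl', 'username', 'password', 'clientId', 'scope'];

function findMissingSettings(settings) {
  // Treat a missing or non-object settings value as an empty object.
  const source = settings && typeof settings === 'object' ? settings : {};
  return REQUIRED_SETTINGS.filter((key) => {
    const value = source[key];
    // Assumption: every setting is a non-blank string.
    return typeof value !== 'string' || value.trim() === '';
  });
}
```

A caller could respond with the 500 Configuration Error whenever the returned array is non-empty, and include the field names in a log message rather than in the response body.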