# Data Model: KME Article Content Fetch (003)

**Phase 1 output for `003-kme-content-fetch`**

---

## Entities

### 1. KME Article Content

Represents a single article fetched from the KME Content Service.

| Field | Type | Source | Notes |
|-------|------|--------|-------|
| `vkm:url` | `string` | KME Content Service JSON-LD | Identifies the article; used as the fetch target (`kmeURL` param) |
| `vkm:articleBody` | `string \| null` | KME Content Service JSON-LD | HTML body of the article; may be absent, null, or empty |

**Validation rules**:
- `vkm:articleBody` must be a non-empty, non-whitespace string to constitute a valid article body.
- Absent, null, empty string, and whitespace-only string are all treated as "article body not present" → 404.

**State transitions**: None. This is a read-only fetch: no mutations, no lifecycle.

---

### 2. OIDC Token

Short-lived bearer credential used to authenticate upstream requests to the KME Content Service.

| Field | Type | Storage | Notes |
|-------|------|---------|-------|
| `id_token` | `string` | Redis hash `authorization:token` | The OIDC id_token value |
| `expiry` | `number` (Unix epoch, seconds) | Redis hash `authorization:expiry` | Expiry timestamp; compared to `Date.now() / 1000` |

**Validation rules**:
- Token is valid if `cachedToken !== null && Date.now() / 1000 < expiry`.
- Managed exclusively by `getValidToken()` in `kmeContentSourceAdapterHelpers.js`. Not modified by `contentFetchFlow()`.
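The validity rule above can be sketched as a small predicate. This is an illustration only: `isTokenValid` and its injectable `nowMs` parameter are hypothetical names, not part of `getValidToken()`, and the sketch assumes `expiry` has already been converted to a number (values read from a Redis hash typically arrive as strings).

```javascript
// Hypothetical helper for illustration; the real check lives inside getValidToken().
// cachedToken: string | null
// expiry:      Unix epoch in seconds (assumed to be a number already)
// nowMs:       current time in milliseconds, injectable so the rule can be tested
function isTokenValid(cachedToken, expiry, nowMs = Date.now()) {
  if (cachedToken === null || cachedToken === undefined) return false;
  // Assumption: a missing or non-numeric expiry is treated as expired.
  if (!Number.isFinite(expiry)) return false;
  return nowMs / 1000 < expiry;
}

// Fixed clock: "now" is 1,700,000,000 seconds after the epoch.
const fixedNowMs = 1_700_000_000 * 1000;
console.log(isTokenValid('abc', 1_700_000_060, fixedNowMs)); // true  (expires in 60 s)
console.log(isTokenValid('abc', 1_700_000_000, fixedNowMs)); // false (expiry is not strictly in the future)
console.log(isTokenValid(null, 1_700_000_060, fixedNowMs));  // false (no cached token)
```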

**Managed by**: `getValidToken()` (existing helper, unmodified).

---

### 3. Proxy Request

Incoming HTTP request received by the adapter, carrying routing signals and parameters.

| Field | Type | Source | Notes |
|-------|------|--------|-------|
| `req.url` | `string` | Node.js `http.IncomingMessage` | Relative path + query string, e.g. `/?kmeURL=https://...` |
| `req.method` | `string` | Node.js `http.IncomingMessage` | Always `GET` for content-fetch flow |
| `kmeURL` (extracted) | `string \| null` | `new URL(req.url, 'http://localhost').searchParams.get('kmeURL')` | The verbatim target URL for upstream fetch; `null` when the parameter is absent |

**Validation rules for `kmeURL`**:
1. **Absent or empty**: `!kmeURL?.trim()` → 400 Bad Request (FR-007)
2. **Malformed or non-absolute**: `new URL(kmeURL)` throws, or protocol is not `http:`/`https:` → 400 Bad Request (FR-008)
3. **Valid**: passes both guards → proceed to token acquisition + upstream fetch
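The extraction step and the three rules above can be sketched as two small functions. The names `extractKmeUrl` and `validateKmeUrl`, the shape of the result object, and the reason strings are assumptions made for illustration; only the status codes and FR numbers come from the rules above.

```javascript
// Hypothetical helpers; only the 400 outcomes and FR references come from the rules above.
function extractKmeUrl(reqUrl) {
  // req.url is relative, so the URL parser needs a placeholder base.
  // searchParams.get() returns null when the parameter is absent.
  return new URL(reqUrl, 'http://localhost').searchParams.get('kmeURL');
}

function validateKmeUrl(kmeURL) {
  if (typeof kmeURL !== 'string' || kmeURL.trim() === '') {
    return { ok: false, status: 400, reason: 'kmeURL is required' }; // FR-007
  }
  let parsed;
  try {
    parsed = new URL(kmeURL); // throws on malformed or relative input
  } catch {
    return { ok: false, status: 400, reason: 'kmeURL must be an absolute URL' }; // FR-008
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    return { ok: false, status: 400, reason: 'kmeURL must use http or https' }; // FR-008
  }
  return { ok: true, url: kmeURL }; // the original string is passed on verbatim
}

const extracted = extractKmeUrl('/?kmeURL=https%3A%2F%2Fcontent.kme.example%2Farticles%2F123');
console.log(extracted);                                            // https://content.kme.example/articles/123
console.log(validateKmeUrl(extracted).ok);                         // true
console.log(validateKmeUrl(extractKmeUrl('/')).status);            // 400 (parameter absent)
console.log(validateKmeUrl('/articles/123').status);               // 400 (not absolute)
console.log(validateKmeUrl('ftp://content.kme.example/a').status); // 400 (wrong protocol)
```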

---

## Data Flow

```
Incoming request (req.url contains ?kmeURL=...)
    │
    ▼
Extract kmeURL from query string
(new URL(req.url, 'http://localhost').searchParams.get('kmeURL'))
    │
    ▼
┌── Validate kmeURL ──────────────────────────────────────┐
│  absent/empty?           ──────────────────────► 400    │
│  malformed/non-absolute? ──────────────────────► 400    │
└─────────────────────────────────────────────────────────┘
    │ valid
    ▼
getValidToken() → OIDC token (from Redis cache or fresh fetch)
    │
    │  token fetch failed?  ──────────────────► 502
    ▼
axios.get(kmeURL, { Authorization: OIDC_id_token {token}, timeout: 10000 })
    │
    │  timeout?        ──────────────────► 502
    │  upstream 4xx?   ──────────────────► 404
    │  upstream 5xx?   ──────────────────► 502
    │  network error?  ──────────────────► 502
    ▼
Parse response.data as JSON-LD object
    │
    │  unparseable?    ──────────────────► 502
    │  non-object?     ──────────────────► 502
    ▼
extractArticleBody(data) → vkm:articleBody string or null
    │
    │  null (absent/empty/whitespace)? ─────► 404
    ▼
res.writeHead(200, { 'Content-Type': 'text/html' })
res.end(articleBody)
```
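The upstream-error and parse branches in the diagram can be sketched as two pure functions. Both names are hypothetical. The sketch assumes axios-style errors, where `err.response` is set only when the server answered with a non-2xx status, and assumes the HTTP client may hand back either an already-parsed object or a raw string.

```javascript
// Hypothetical helper mirroring the error branches of the diagram above.
function mapUpstreamErrorToStatus(err) {
  const upstreamStatus = err && err.response && err.response.status;
  if (typeof upstreamStatus === 'number') {
    if (upstreamStatus >= 400 && upstreamStatus < 500) return 404; // upstream 4xx
    return 502; // upstream 5xx, or any other unexpected status
  }
  // No response at all: timeout, DNS failure, connection reset, and similar.
  return 502;
}

// Hypothetical helper for the parse step: returns the object, or null when the
// payload is unparseable or not an object (the caller would answer 502).
function toJsonLdObject(data) {
  let value = data;
  if (typeof value === 'string') {
    try {
      value = JSON.parse(value);
    } catch {
      return null; // unparseable
    }
  }
  if (value === null || typeof value !== 'object') return null; // non-object
  return value;
}

console.log(mapUpstreamErrorToStatus({ response: { status: 403 } })); // 404
console.log(mapUpstreamErrorToStatus({ response: { status: 503 } })); // 502
console.log(mapUpstreamErrorToStatus(new Error('socket hang up')));   // 502
console.log(toJsonLdObject('<html>'));                                // null
console.log(toJsonLdObject(42));                                      // null
```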

---

## Helper: `extractArticleBody(data)`

- **Location**: `src/globalVariables/kmeContentSourceAdapterHelpers.js`
- **Type**: Pure function with no side effects, no state, and no injected globals required
- **Added to exports**: `return { ..., extractArticleBody }`

**Signature**:
```javascript
/** @returns {string | null} the article body HTML, or null when it is absent or blank */
function extractArticleBody(data) { /* body shown under Implementation */ }
```

**Input/output contract**:

| Input | Output |
|-------|--------|
| `{ 'vkm:articleBody': '<p>Hello</p>' }` | `'<p>Hello</p>'` |
| `{ 'vkm:articleBody': '' }` | `null` |
| `{ 'vkm:articleBody': ' ' }` (whitespace only) | `null` |
| `{ 'vkm:articleBody': null }` | `null` |
| `{}` (field absent) | `null` |
| `null` | `null` |
| `'a string'` (non-object) | `null` |

**Implementation**:
```javascript
function extractArticleBody(data) {
  // Reject null, undefined, and primitives such as strings or numbers.
  if (!data || typeof data !== 'object') return null;
  const body = data['vkm:articleBody'];
  // Only a string with at least one non-whitespace character counts as a body.
  if (typeof body !== 'string' || body.trim() === '') return null;
  return body;
}
```
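As a usage sketch, the helper's result can drive the final 404/200 decision from the data flow. `buildArticleResponse` is a hypothetical wrapper, and the 404 body text and content type are placeholders that the spec does not define. The helper is repeated with the same logic as the Implementation block so the example runs on its own.

```javascript
// Same logic as the Implementation block above, repeated so this sketch is self-contained.
function extractArticleBody(data) {
  if (!data || typeof data !== 'object') return null;
  const body = data['vkm:articleBody'];
  if (typeof body !== 'string' || body.trim() === '') return null;
  return body;
}

// Hypothetical wrapper: describes the response instead of writing to `res`,
// which keeps the decision easy to test.
function buildArticleResponse(data) {
  const body = extractArticleBody(data);
  if (body === null) {
    // Placeholder 404 payload; the spec only fixes the status code.
    return { status: 404, headers: { 'Content-Type': 'text/plain' }, body: 'Not Found' };
  }
  // The HTML is passed through untouched: no trimming, sanitising, or re-encoding.
  return { status: 200, headers: { 'Content-Type': 'text/html' }, body };
}

console.log(buildArticleResponse({ 'vkm:articleBody': '<p>Hello</p>' }).status); // 200
console.log(buildArticleResponse({ 'vkm:articleBody': '   ' }).status);          // 404
console.log(buildArticleResponse('a string').status);                            // 404
```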

---

## KME Content Service Response Shape

The KME Content Service returns a JSON-LD document. Only `vkm:articleBody` is consumed by this
feature. Example:

```json
{
  "@context": "https://vocabs.kme.example/context.jsonld",
  "@type": "vkm:Article",
  "vkm:url": "https://content.kme.example/articles/123",
  "vkm:articleBody": "<p>Article content here...</p>",
  "vkm:title": "Example Article"
}
```

All other fields are ignored by `extractArticleBody`. The proxy passes the raw HTML string of
`vkm:articleBody` directly as the response body, with no transformation, sanitisation, or re-encoding.

---

## Settings Used (from `kme_CSA_settings`)

The content-fetch flow reads the same OIDC settings as `oidcAuthFlow` and `sitemapFlow`:

| Field | Purpose |
|-------|---------|
| `tokenUrl` | OIDC token endpoint |
| `username` | OIDC username credential |
| `password` | OIDC password credential |
| `clientId` | OIDC client identifier |
| `scope` | OIDC requested scope |

These are validated via `kmeContentSourceAdapterHelpers.validateSettings()` before calling
`getValidToken()`. Missing fields produce a 500 Configuration Error response.
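A minimal sketch of a required-field check over the settings listed above. `findMissingSettings` is a hypothetical stand-in and does not reproduce the actual `validateSettings()` signature or behaviour. It assumes every setting is a string and that blank values count as missing; the sample values are placeholders.

```javascript
// Field names come from the table above; everything else is illustrative.
const REQUIRED_SETTINGS = ['tokenUrl', 'username', 'password', 'clientId', 'scope'];

// Returns the names of required settings that are absent or blank.
function findMissingSettings(settings) {
  const source = settings && typeof settings === 'object' ? settings : {};
  return REQUIRED_SETTINGS.filter((field) => {
    const value = source[field];
    return typeof value !== 'string' || value.trim() === '';
  });
}

const missing = findMissingSettings({
  tokenUrl: 'https://idp.example/token', // placeholder URL
  username: 'svc',
  clientId: 'kme',
});
console.log(missing); // [ 'password', 'scope' ]
```

A caller could answer with the 500 Configuration Error described above whenever the returned array is non-empty.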