feat: content fetch, sitemap fixes, remove oidcAuthFlow
- Add contentFetchFlow() to proxy (FR-001 through FR-012)
- Add extractArticleBody() helper with vkm:articleBody / articleBody fallback
- Dynamic proxyBaseUrl derivation from x-forwarded-proto/host headers
- Forward query/size/category params on /sitemap.xml requests
- Add Accept: application/ld+json header to content API calls
- Remove oidcAuthFlow() - unmatched requests now return 404 Not Found
- Fix xmlbuilder2 import: default import, call as xmlbuilder2.create(...)
- Version bump 0.2.0 → 0.3.0
- 45/45 tests passing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
170
specs/003-kme-content-fetch/data-model.md
Normal file
@@ -0,0 +1,170 @@
# Data Model: KME Article Content Fetch (003)

**Phase 1 output for `003-kme-content-fetch`**

---

## Entities

### 1. KME Article Content

Represents a single article fetched from the KME Content Service.

| Field | Type | Source | Notes |
|-------|------|--------|-------|
| `vkm:url` | `string` | KME Content Service JSON-LD | Identifies the article; used as the fetch target (`kmeURL` param) |
| `vkm:articleBody` | `string \| null` | KME Content Service JSON-LD | HTML body of the article; may be absent, null, or empty |

**Validation rules**:

- `vkm:articleBody` must be a non-empty, non-whitespace string to constitute a valid article body.
- Absent, null, empty string, and whitespace-only string are all treated as "article body not present" → 404.

**State transitions**: None. This is a read-only fetch: no mutations, no lifecycle.

---

### 2. OIDC Token

Short-lived bearer credential used to authenticate upstream requests to the KME Content Service.

| Field | Type | Storage | Notes |
|-------|------|---------|-------|
| `id_token` | `string` | Redis hash `authorization:token` | The OIDC id_token value |
| `expiry` | `number` (Unix epoch, seconds) | Redis hash `authorization:expiry` | Expiry timestamp; compared to `Date.now() / 1000` |

**Validation rules**:

- Token is valid if `cachedToken !== null && Date.now() / 1000 < expiry`.
- Managed exclusively by `getValidToken()` in `kmeContentSourceAdapterHelpers.js`. Not modified by `contentFetchFlow()`.

**Managed by**: `getValidToken()` (existing helper, unmodified).

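The validity rule above can be sketched as a small predicate. This is an illustration only, not the body of `getValidToken()`: the name `isTokenValid`, the injectable `nowMs` parameter, and the `Number()` coercion (on the assumption that Redis hands the expiry back as a string) are all hypothetical.

```javascript
// Hypothetical helper mirroring the rule
// `cachedToken !== null && Date.now() / 1000 < expiry`.
// `nowMs` is injectable so the check can be exercised deterministically.
function isTokenValid(cachedToken, expiry, nowMs = Date.now()) {
  // Treat undefined like null: no cached token means nothing to validate.
  if (cachedToken === null || cachedToken === undefined) return false;
  // `expiry` is a Unix epoch in seconds, while Date.now() is in milliseconds.
  // A non-numeric expiry becomes NaN, which makes the comparison false.
  return nowMs / 1000 < Number(expiry);
}

// Example: a token expiring at t=2000s is still valid at t=1000s.
console.log(isTokenValid('abc', 2000, 1000 * 1000)); // true
```
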
---

### 3. Proxy Request

Incoming HTTP request received by the adapter, carrying routing signals and parameters.

| Field | Type | Source | Notes |
|-------|------|--------|-------|
| `req.url` | `string` | Node.js `http.IncomingMessage` | Relative path + query string, e.g. `/?kmeURL=https://...` |
| `req.method` | `string` | Node.js `http.IncomingMessage` | Always `GET` for content-fetch flow |
| `kmeURL` (extracted) | `string \| null` | `new URL(req.url, 'http://localhost').searchParams.get('kmeURL')` | The verbatim target URL for upstream fetch; `null` when the parameter is absent |

**Validation rules for `kmeURL`**:

1. **Absent or empty**: `!kmeURL || !kmeURL.trim()` → 400 Bad Request (FR-007)
2. **Malformed or non-absolute**: `new URL(kmeURL)` throws, or protocol is not `http:`/`https:` → 400 Bad Request (FR-008)
3. **Valid**: passes both guards → proceed to token acquisition + upstream fetch

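The three rules above could be combined into one guard along the following lines. This is a minimal sketch: the function name `checkKmeUrl` and the shape of its return value are assumptions made for illustration, not part of the adapter. Note that `searchParams.get()` returns `null` for a missing parameter, so the null check has to come before `.trim()`.

```javascript
// Hypothetical guard implementing the kmeURL validation rules.
// Returns { ok: true, kmeURL } or { ok: false, status: 400 }.
function checkKmeUrl(reqUrl) {
  // req.url is relative, so a dummy base is needed to parse it.
  const kmeURL = new URL(reqUrl, 'http://localhost').searchParams.get('kmeURL');

  // Rule 1: absent (null), empty, or whitespace-only.
  if (!kmeURL || !kmeURL.trim()) return { ok: false, status: 400 };

  // Rule 2: must parse as an absolute URL with an http: or https: protocol.
  let parsed;
  try {
    parsed = new URL(kmeURL); // throws TypeError on malformed or relative input
  } catch {
    return { ok: false, status: 400 };
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    return { ok: false, status: 400 };
  }

  // Rule 3: valid. The verbatim string is returned, not the normalised form.
  return { ok: true, kmeURL };
}
```
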
---

## Data Flow

```
Incoming request (req.url contains ?kmeURL=...)
        │
        ▼
Extract kmeURL from query string
  (new URL(req.url, 'http://localhost').searchParams.get('kmeURL'))
        │
        ▼
┌── Validate kmeURL ──────────────────────────────────────┐
│  absent/empty? ──────────────────────────────► 400      │
│  malformed/non-http(s)? ─────────────────────► 400      │
└─────────────────────────────────────────────────────────┘
        │ valid
        ▼
getValidToken() → OIDC token (from Redis cache or fresh fetch)
        │
        │ token fetch failed? ──────────────────► 502
        ▼
axios.get(kmeURL, { headers: { Authorization: `OIDC_id_token ${token}` }, timeout: 10000 })
        │
        │ timeout?            ──────────────────► 502
        │ upstream 4xx?       ──────────────────► 404
        │ upstream 5xx?       ──────────────────► 502
        │ network error?      ──────────────────► 502
        ▼
Parse response.data as JSON-LD object
        │
        │ unparseable?        ──────────────────► 502
        │ non-object?         ──────────────────► 502
        ▼
extractArticleBody(data) → vkm:articleBody string or null
        │
        │ null (absent/empty/whitespace)? ─────► 404
        ▼
res.writeHead(200, { 'Content-Type': 'text/html' })
res.end(articleBody)
```

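The upstream-failure branches in the diagram reduce to a small status mapping. The sketch below assumes axios's default behaviour of rejecting on any non-2xx status and attaching the upstream reply as `err.response`; the function name `statusForUpstreamError` is hypothetical. Statuses the diagram does not mention (for example an unfollowed 3xx) fall through to 502 here, which is a choice made for this sketch rather than something the spec states.

```javascript
// Hypothetical helper: maps a rejected axios request to the proxy status
// shown in the data-flow diagram.
function statusForUpstreamError(err) {
  // err.response is set when the upstream answered with a rejected status.
  if (err && err.response) {
    const status = err.response.status;
    // Upstream 4xx -> 404; everything else that reached us (5xx etc.) -> 502.
    return status >= 400 && status < 500 ? 404 : 502;
  }
  // No response at all: a timeout (axios reports these with
  // err.code 'ECONNABORTED' by default) or a network failure. Both -> 502.
  return 502;
}

// Example: an upstream 401 is surfaced to the client as 404.
console.log(statusForUpstreamError({ response: { status: 401 } })); // 404
```
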
---

## Helper: `extractArticleBody(data)`

**Location**: `src/globalVariables/kmeContentSourceAdapterHelpers.js`
**Type**: Pure function: no side effects, no state, no injected globals required
**Added to exports**: `return { ..., extractArticleBody }`

**Signature**:

```javascript
function extractArticleBody(data) → string | null
```

**Input/output contract**:

| Input | Output |
|-------|--------|
| `{ 'vkm:articleBody': '<p>Hello</p>' }` | `'<p>Hello</p>'` |
| `{ 'vkm:articleBody': '' }` | `null` |
| `{ 'vkm:articleBody': ' ' }` | `null` |
| `{ 'vkm:articleBody': null }` | `null` |
| `{}` (field absent) | `null` |
| `null` | `null` |
| `'a string'` (non-object) | `null` |

**Implementation**:

```javascript
function extractArticleBody(data) {
  // Reject null, undefined, and primitives such as strings.
  if (!data || typeof data !== 'object') return null;
  const body = data['vkm:articleBody'];
  // Absent, null, non-string, empty, and whitespace-only all count as "no body".
  if (body == null || typeof body !== 'string' || body.trim() === '') return null;
  // Return the HTML verbatim (untrimmed).
  return body;
}
```

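The commit message mentions a `vkm:articleBody` / `articleBody` fallback, while the implementation above reads only the prefixed key. One way such a fallback could look is sketched below. The function name, the lookup order (prefixed key first), and the decision to fall through when the prefixed value is blank are all assumptions; the committed helper may behave differently.

```javascript
// Sketch only: tries the prefixed key first, then the bare key.
// Not the committed helper; name and fallback order are assumed.
function extractArticleBodyWithFallback(data) {
  if (!data || typeof data !== 'object') return null;
  for (const key of ['vkm:articleBody', 'articleBody']) {
    const body = data[key];
    // Same validity rule as extractArticleBody: non-empty, non-whitespace string.
    if (typeof body === 'string' && body.trim() !== '') return body;
  }
  return null;
}
```
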
---

## KME Content Service Response Shape

The KME Content Service returns a JSON-LD document. Only `vkm:articleBody` is consumed by this
feature. Example:

```json
{
  "@context": "https://vocabs.kme.example/context.jsonld",
  "@type": "vkm:Article",
  "vkm:url": "https://content.kme.example/articles/123",
  "vkm:articleBody": "<p>Article content here...</p>",
  "vkm:title": "Example Article"
}
```

All other fields are ignored by `extractArticleBody`. The proxy passes the raw HTML string of
`vkm:articleBody` directly as the response body, with no transformation, sanitisation, or re-encoding.

---

## Settings Used (from `kme_CSA_settings`)

The content-fetch flow reads the same OIDC settings as `sitemapFlow` (and as `oidcAuthFlow` did before its removal in this commit):

| Field | Purpose |
|-------|---------|
| `tokenUrl` | OIDC token endpoint |
| `username` | OIDC username credential |
| `password` | OIDC password credential |
| `clientId` | OIDC client identifier |
| `scope` | OIDC requested scope |

These are validated via `kmeContentSourceAdapterHelpers.validateSettings()` before calling
`getValidToken()`. Missing fields produce a 500 Configuration Error response.

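A presence check over the five fields in the table might look like the sketch below. It is not the real `validateSettings()` helper, whose signature and return value are not shown in this document. `findMissingSettings`, the constant name, and the rule that a blank string counts as missing are assumptions made for illustration.

```javascript
// Field names come from the settings table; everything else is hypothetical.
const REQUIRED_OIDC_SETTINGS = ['tokenUrl', 'username', 'password', 'clientId', 'scope'];

// Returns the names of required fields that are absent or blank.
function findMissingSettings(settings) {
  const source = settings && typeof settings === 'object' ? settings : {};
  return REQUIRED_OIDC_SETTINGS.filter((field) => {
    const value = source[field];
    return typeof value !== 'string' || value.trim() === '';
  });
}

// Example: a caller would answer 500 Configuration Error when this is non-empty.
console.log(findMissingSettings({ tokenUrl: 'https://idp.example/token' }));
```
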