kme_content_adapter/specs/003-kme-content-fetch/data-model.md
Peter.Morton f840587e5e feat: content fetch, sitemap fixes, remove oidcAuthFlow
- Add contentFetchFlow() to proxy (FR-001 through FR-012)
- Add extractArticleBody() helper with vkm:articleBody / articleBody fallback
- Dynamic proxyBaseUrl derivation from x-forwarded-proto/host headers
- Forward query/size/category params on /sitemap.xml requests
- Add Accept: application/ld+json header to content API calls
- Remove oidcAuthFlow() - unmatched requests now return 404 Not Found
- Fix xmlbuilder2 import: default import, call as xmlbuilder2.create(...)
- Version bump 0.2.0 → 0.3.0
- 45/45 tests passing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-23 16:40:06 -05:00


Data Model: KME Article Content Fetch (003)

Phase 1 output for 003-kme-content-fetch


Entities

1. KME Article Content

Represents a single article fetched from the KME Content Service.

| Field | Type | Source | Notes |
|---|---|---|---|
| `vkm:url` | `string` | KME Content Service JSON-LD | Identifies the article; used as the fetch target (`kmeURL` param) |
| `vkm:articleBody` | `string \| null` | KME Content Service JSON-LD | HTML body of the article; may be absent, null, or empty |

Validation rules:

  • vkm:articleBody must be a non-empty, non-whitespace string to constitute a valid article body.
  • Absent, null, empty string, and whitespace-only string are all treated as "article body not present" → 404.

State transitions: None. This is a read-only fetch — no mutations, no lifecycle.


2. OIDC Token

Short-lived bearer credential used to authenticate upstream requests to the KME Content Service.

| Field | Type | Storage | Notes |
|---|---|---|---|
| `id_token` | `string` | Redis hash `authorization:token` | The OIDC `id_token` value |
| `expiry` | `number` (Unix epoch, seconds) | Redis hash `authorization:expiry` | Expiry timestamp; compared to `Date.now() / 1000` |

Validation rules:

  • Token is valid if cachedToken !== null && Date.now() / 1000 < expiry.
  • Managed exclusively by getValidToken() in kmeContentSourceAdapterHelpers.js. Not modified by contentFetchFlow().

Managed by: getValidToken() (existing helper — unmodified).
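
The validity rule above can be sketched as a small pure function. This is only an illustration: `isTokenValid` and its injectable `nowMs` parameter are hypothetical names, not part of `getValidToken()`, and the sketch assumes the expiry read back from Redis arrives as a string that must be coerced to a number.

```javascript
// Hypothetical helper; the real check lives inside getValidToken().
// nowMs is injectable so the rule can be exercised without a real clock.
function isTokenValid(cachedToken, expiry, nowMs = Date.now()) {
  // expiry is in Unix epoch seconds, while Date.now() returns milliseconds.
  return cachedToken !== null && nowMs / 1000 < expiry;
}

// Assumption: the stored expiry is a string, so it is coerced before comparing.
const expiry = Number('1700000100');

console.log(isTokenValid('abc', expiry, 1700000000 * 1000)); // true
console.log(isTokenValid('abc', expiry, 1700000100 * 1000)); // false (comparison is strict)
console.log(isTokenValid(null, expiry, 1700000000 * 1000));  // false (no cached token)
```

Because the comparison is strict, a token is treated as expired at the exact second stored in `expiry`.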


3. Proxy Request

Incoming HTTP request received by the adapter, carrying routing signals and parameters.

| Field | Type | Source | Notes |
|---|---|---|---|
| `req.url` | `string` | Node.js `http.IncomingMessage` | Relative path + query string, e.g. `/?kmeURL=https://...` |
| `req.method` | `string` | Node.js `http.IncomingMessage` | Always `GET` for content-fetch flow |
| `kmeURL` (extracted) | `string` | `new URL(req.url, 'http://localhost').searchParams.get('kmeURL')` | The verbatim target URL for upstream fetch |

Validation rules for kmeURL:

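  1. Absent or empty: !kmeURL || !kmeURL.trim() → 400 Bad Request (FR-007). searchParams.get() returns null when the parameter is absent, so the null check must come before the .trim() call.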
  2. Malformed or non-absolute: new URL(kmeURL) throws, or protocol is not http:/https: → 400 Bad Request (FR-008)
  3. Valid: passes both guards → proceed to token acquisition + upstream fetch
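
The three rules above could be combined into one guard, roughly as follows. This is a minimal sketch under stated assumptions: `validateKmeUrl`, its return shape, and the `reason` strings are hypothetical and not taken from the adapter's source.

```javascript
// Hypothetical helper; name, return shape, and reason strings are illustrative only.
function validateKmeUrl(reqUrl) {
  // req.url is relative, so a placeholder base is needed to parse it.
  const kmeURL = new URL(reqUrl, 'http://localhost').searchParams.get('kmeURL');

  // Rule 1: absent (null), empty, or whitespace-only.
  if (!kmeURL || !kmeURL.trim()) {
    return { ok: false, status: 400, reason: 'kmeURL is required' };
  }

  // Rule 2: must parse as an absolute URL with an http: or https: protocol.
  let parsed;
  try {
    parsed = new URL(kmeURL);
  } catch {
    return { ok: false, status: 400, reason: 'kmeURL is malformed' };
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    return { ok: false, status: 400, reason: 'kmeURL must use http or https' };
  }

  // Rule 3: valid; the verbatim string is what would be fetched upstream.
  return { ok: true, kmeURL };
}

console.log(validateKmeUrl('/?kmeURL=https%3A%2F%2Fcontent.kme.example%2Farticles%2F123'));
console.log(validateKmeUrl('/'));                      // rule 1
console.log(validateKmeUrl('/?kmeURL=articles/123'));  // rule 2 (not absolute)
console.log(validateKmeUrl('/?kmeURL=ftp://host/x'));  // rule 2 (wrong protocol)
```

Returning a result object rather than writing to `res` keeps the guard free of side effects, so it can be tested the same way as `extractArticleBody`.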

Data Flow

Incoming request (req.url contains ?kmeURL=...)
          │
          ▼
  Extract kmeURL from query string
  (new URL(req.url, 'http://localhost').searchParams.get('kmeURL'))
          │
          ▼
  ┌── Validate kmeURL ──────────────────────────────────────┐
  │  absent/empty?  ──────────────────────────────► 400     │
  │  malformed/non-http(s)? ──────────────────────► 400     │
  └─────────────────────────────────────────────────────────┘
          │ valid
          ▼
  getValidToken() → OIDC token (from Redis cache or fresh fetch)
          │
          │  token fetch failed? ──────────────────► 502
          ▼
  axios.get(kmeURL, { headers: { Authorization: 'OIDC_id_token <token>' }, timeout: 10000 })
          │
          │  timeout?           ──────────────────► 502
          │  upstream 4xx?      ──────────────────► 404
          │  upstream 5xx?      ──────────────────► 502
          │  network error?     ──────────────────► 502
          ▼
  Parse response.data as JSON-LD object
          │
          │  unparseable?       ──────────────────► 502
          │  non-object?        ──────────────────► 502
          ▼
  extractArticleBody(data) → vkm:articleBody string or null
          │
          │  null (absent/empty/whitespace)? ─────► 404
          ▼
  res.writeHead(200, { 'Content-Type': 'text/html' })
  res.end(articleBody)
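
The error branches in the flow above can be sketched as two small pure functions. Both names are hypothetical. The sketch assumes an axios-style error object, where `err.response.status` is set only when the upstream actually answered, and it assumes a payload that arrives as a string should be given one `JSON.parse` attempt before being rejected.

```javascript
// Hypothetical helper: maps an axios-style error to the proxy's status code.
function statusForUpstreamError(err) {
  const upstreamStatus = err && err.response ? err.response.status : undefined;
  // Upstream answered with 4xx: reported to the caller as 404.
  if (upstreamStatus >= 400 && upstreamStatus < 500) return 404;
  // Upstream 5xx, timeout, or network failure (no response at all): 502.
  return 502;
}

// Hypothetical helper: returns the JSON-LD object, or null when the payload
// is unparseable or not an object (the caller would answer 502 in both cases).
function parseJsonLd(payload) {
  let data = payload;
  if (typeof data === 'string') {
    try {
      data = JSON.parse(data);
    } catch {
      return null;
    }
  }
  return data !== null && typeof data === 'object' ? data : null;
}

console.log(statusForUpstreamError({ response: { status: 403 } })); // 404
console.log(statusForUpstreamError({ response: { status: 503 } })); // 502
console.log(statusForUpstreamError({ code: 'ECONNABORTED' }));      // 502
console.log(parseJsonLd('not json'));                               // null
```

Keeping the mapping separate from the HTTP call means each arrow in the diagram corresponds to one testable branch.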

Helper: extractArticleBody(data)

Location: src/globalVariables/kmeContentSourceAdapterHelpers.js
Type: Pure function — no side effects, no state, no injected globals required
Added to exports: return { ..., extractArticleBody }

Signature:

extractArticleBody(data) → string | null

Input/output contract:

| Input | Output |
|---|---|
| `{ 'vkm:articleBody': '<p>Hello</p>' }` | `'<p>Hello</p>'` |
| `{ 'vkm:articleBody': '' }` | `null` |
| `{ 'vkm:articleBody': ' ' }` | `null` |
| `{ 'vkm:articleBody': null }` | `null` |
| `{}` (field absent) | `null` |
| `null` | `null` |
| `'a string'` (non-object) | `null` |

Implementation:

function extractArticleBody(data) {
  if (!data || typeof data !== 'object') return null;
  const body = data['vkm:articleBody'];
  if (body == null || typeof body !== 'string' || body.trim() === '') return null;
  return body;
}

KME Content Service Response Shape

The KME Content Service returns a JSON-LD document. Only vkm:articleBody is consumed by this feature. Example:

{
  "@context": "https://vocabs.kme.example/context.jsonld",
  "@type": "vkm:Article",
  "vkm:url": "https://content.kme.example/articles/123",
  "vkm:articleBody": "<p>Article content here...</p>",
  "vkm:title": "Example Article"
}

All other fields are ignored by extractArticleBody. The proxy passes the raw HTML string of vkm:articleBody directly as the response body — no transformation, sanitisation, or re-encoding.


Settings Used (from kme_CSA_settings)

The content-fetch flow reads the same OIDC settings as sitemapFlow (and as oidcAuthFlow did before its removal in this change):

| Field | Purpose |
|---|---|
| `tokenUrl` | OIDC token endpoint |
| `username` | OIDC username credential |
| `password` | OIDC password credential |
| `clientId` | OIDC client identifier |
| `scope` | OIDC requested scope |

These are validated via kmeContentSourceAdapterHelpers.validateSettings() before calling getValidToken(). Missing fields produce a 500 Configuration Error response.
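
A check of this kind might look like the sketch below. It is not the body of `validateSettings()`, which is not shown in this document: `findMissingSettings` is a hypothetical name, and treating blank strings as missing is an assumption.

```javascript
// Field names come from the settings table; the helper itself is hypothetical.
const REQUIRED_SETTINGS = ['tokenUrl', 'username', 'password', 'clientId', 'scope'];

// Returns the names of required settings that are absent or blank.
function findMissingSettings(settings) {
  const source = settings && typeof settings === 'object' ? settings : {};
  return REQUIRED_SETTINGS.filter((key) => {
    const value = source[key];
    // Assumption: a non-string or whitespace-only value counts as missing.
    return typeof value !== 'string' || value.trim() === '';
  });
}

const missing = findMissingSettings({
  tokenUrl: 'https://idp.example/token', // placeholder value
  username: 'svc-user',
  password: '',
});
console.log(missing); // [ 'password', 'clientId', 'scope' ]
```

A non-empty result would correspond to the 500 Configuration Error case, and listing the missing field names in the log makes that error easier to act on.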