Peter.Morton f840587e5e feat: content fetch, sitemap fixes, remove oidcAuthFlow
- Add contentFetchFlow() to proxy (FR-001 through FR-012)
- Add extractArticleBody() helper with vkm:articleBody / articleBody fallback
- Dynamic proxyBaseUrl derivation from x-forwarded-proto/host headers
- Forward query/size/category params on /sitemap.xml requests
- Add Accept: application/ld+json header to content API calls
- Remove oidcAuthFlow() - unmatched requests now return 404 Not Found
- Fix xmlbuilder2 import: default import, call as xmlbuilder2.create(...)
- Version bump 0.2.0 → 0.3.0
- 45/45 tests passing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-23 16:40:06 -05:00

Feature Specification: KME Article Content Fetch

Feature Branch: 003-kme-content-fetch
Created: 2025-07-15
Status: Draft

User Scenarios & Testing (mandatory)

User Story 1 — Happy Path Article Fetch (Priority: P1)

A downstream consumer (e.g. a CMS or search front-end) sends a request to the proxy with a kmeURL query parameter containing the verbatim vkm:url value it received from the KME Search API. The proxy authenticates to the KME Content Service, fetches the article, and returns the article's HTML body so the consumer can render it.

Why this priority: This is the core business value of the feature. Without a working happy path there is nothing to build on.

Independent Test: Issue a GET request to the proxy with a valid, reachable kmeURL. Verify that the response has status 200 and Content-Type: text/html, and that its body is the HTML from the vkm:articleBody field of the KME Content Service response.

Acceptance Scenarios:

  1. Given the proxy receives a GET request whose URL does not end in /sitemap.xml, When the request contains ?kmeURL=https://content.kme.example/articles/123, Then the proxy fetches that URL from the KME Content Service with Authorization: OIDC_id_token {token}, extracts vkm:articleBody from the JSON-LD response, and returns it as the HTTP response body with status 200 and Content-Type: text/html.
  2. Given the token cache holds a valid OIDC token, When the proxy makes the upstream request, Then it uses the cached token without a new token acquisition round-trip.
  3. Given the token cache has expired, When the proxy makes the upstream request, Then getValidToken() refreshes the token transparently before the upstream call is made.
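
The extraction step in scenario 1 can be sketched as a small pure function. This is a minimal illustration, not the shipped helper: the name extractArticleBody and the unprefixed articleBody fallback follow the commit summary rather than the requirements, the sample payload is invented, and treating a whitespace-only value as absent is an assumption.

```javascript
// Hypothetical helper: pull the article HTML out of a parsed JSON-LD document.
// Returns null when the field is absent, null, not a string, or blank.
function extractArticleBody(doc) {
  if (doc === null || typeof doc !== 'object') return null;
  // Prefer the prefixed key; fall back to the unprefixed one (assumption
  // taken from the commit summary, not from the requirements list).
  const body = doc['vkm:articleBody'] ?? doc.articleBody;
  // Assumption: a whitespace-only string counts as empty.
  return typeof body === 'string' && body.trim() !== '' ? body : null;
}

// Sample payload, invented for illustration only.
const sample = {
  'vkm:url': 'https://content.kme.example/articles/123',
  'vkm:articleBody': '<p>Hello</p>',
};

console.log(extractArticleBody(sample)); // <p>Hello</p>
console.log(extractArticleBody({}));     // null
```

A null result is what the proxy would translate into a 404, which keeps the "absent", "null", and "empty" cases on a single code path.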

User Story 2 — Missing or Empty kmeURL Parameter (Priority: P2)

A consumer sends a request that matches the content-fetch route (not a sitemap URL) but omits the kmeURL parameter or provides it as an empty string. The proxy must reject the request immediately with a clear 400 response rather than making a malformed upstream call.

Why this priority: Bad-input rejection prevents meaningless upstream calls and gives consumers a clear, actionable error signal.

Independent Test: Send a GET request to the proxy without kmeURL, or with an empty value (?kmeURL=). Verify that a 400 Bad Request response is returned.

Acceptance Scenarios:

  1. Given the proxy receives a request with no kmeURL query parameter, When the request is processed, Then the proxy returns HTTP 400 without making any upstream request.
  2. Given the proxy receives a request with ?kmeURL= (empty value), When the request is processed, Then the proxy returns HTTP 400 without making any upstream request.
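
One way to perform this check before any upstream call is sketched below. The function name validateKmeUrl is hypothetical, and restricting targets to http and https is an assumption that the spec does not state. The sketch also rejects values that are not well-formed absolute URLs, which the Edge Cases section groups with the missing and empty cases.

```javascript
// Hypothetical validator: returns the URL string to fetch, or null when the
// request should be rejected with 400.
function validateKmeUrl(rawValue) {
  // Missing (undefined/null), empty, or whitespace-only values are rejected.
  if (typeof rawValue !== 'string' || rawValue.trim() === '') return null;

  let parsed;
  try {
    // With no base argument, the URL constructor throws a TypeError for
    // relative or malformed input.
    parsed = new URL(rawValue);
  } catch (err) {
    return null;
  }

  // Assumption: only http(s) targets make sense for the Content Service.
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') return null;

  // Return the original string rather than parsed.href so that the value is
  // passed on without normalisation.
  return rawValue;
}

console.log(validateKmeUrl('https://content.kme.example/articles/123'));
console.log(validateKmeUrl(''));              // null
console.log(validateKmeUrl('/articles/123')); // null
```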

User Story 3 — Upstream Content Fetch Failure or Missing Article Body (Priority: P3)

The KME Content Service is unreachable, returns an HTTP error status, times out, or returns a valid JSON-LD document that does not contain vkm:articleBody. The proxy must surface an appropriate error to the consumer.

Why this priority: Robust error handling avoids silent failures and lets consumers distinguish between "article not found" and "upstream service error".

Independent Test: Simulate or stub each failure mode and verify the correct HTTP error code is returned by the proxy.

Acceptance Scenarios:

  1. Given the KME Content Service returns a 4xx response for the requested URL, When the proxy processes the response, Then the proxy returns HTTP 404 to the caller.
  2. Given the KME Content Service returns a 5xx response or the request times out (exceeding 10 seconds), When the proxy processes the response, Then the proxy returns HTTP 502 to the caller.
  3. Given the KME Content Service returns a 200 JSON-LD response but the vkm:articleBody field is absent or null, When the proxy processes the response, Then the proxy returns HTTP 404 to the caller.
  4. Given a network-level error prevents the upstream request from completing, When the proxy processes the error, Then the proxy returns HTTP 502 to the caller.
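
The mapping in scenarios 1, 2, and 4 can be expressed as a single function over the rejection that axios produces. This sketch assumes standard axios behaviour: non-2xx responses reject with an error carrying err.response.status, while timeouts and network failures reject with no response attached. The name statusForUpstreamError is hypothetical.

```javascript
// Hypothetical mapper from an axios-style rejection to the proxy's status code.
function statusForUpstreamError(err) {
  const upstreamStatus = err && err.response ? err.response.status : undefined;

  // Any 4xx from the Content Service is reported to the caller as 404.
  if (upstreamStatus >= 400 && upstreamStatus < 500) return 404;

  // Everything else (5xx, timeout, DNS or connection failure) becomes 502.
  return 502;
}

console.log(statusForUpstreamError({ response: { status: 403 } })); // 404
console.log(statusForUpstreamError({ response: { status: 503 } })); // 502
console.log(statusForUpstreamError({ code: 'ECONNABORTED' }));      // 502
```

Defaulting to 502 means an unanticipated failure is reported as an upstream problem rather than as a missing article.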

User Story 4 — Existing Passthrough Behaviour Preserved (Priority: P4)

Requests that do not match the sitemap route and do not carry a kmeURL parameter must continue to receive the existing 200 OK response (auth-check passthrough) without any change in behaviour.

Why this priority: Non-regression of existing behaviour is required to avoid breaking active consumers that rely on the passthrough route.

Independent Test: Send a GET request to the proxy with neither a /sitemap.xml suffix nor a kmeURL parameter. Verify a 200 OK response is returned, identical to current behaviour.

Acceptance Scenarios:

  1. Given the proxy receives a request with no kmeURL parameter and a URL not ending in /sitemap.xml, When the request is processed, Then the proxy returns HTTP 200 (the existing auth-check passthrough).

Edge Cases

  • What happens when kmeURL contains an already-encoded URL (percent-encoded characters)? The value must be used verbatim; double-encoding must not occur.
  • What happens if the JSON-LD response body from the KME Content Service is not valid JSON? The proxy should treat this as a 502 upstream error.
  • What happens if the upstream response contains vkm:articleBody but its value is an empty string? Treat as absent → return 404.
  • What happens if the OIDC token cannot be acquired (e.g. auth service down)? Surface this as a 502 upstream error.
  • What happens if kmeURL is present but the URL is not a well-formed absolute URL? Return 400 Bad Request (same as missing/empty).
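
The percent-encoding case is easy to get wrong, so a short demonstration may help. It assumes the consumer encodes the vkm:url once to carry it as a query value, and the proxy host name used here is invented. Reading the parameter through searchParams removes exactly that one layer, while any further encoding changes the target.

```javascript
// A vkm:url that already contains a percent-encoded space in its path.
const vkmUrl = 'https://content.kme.example/articles/a%20b?lang=en';

// The consumer encodes it once to carry it safely as a query value.
// (proxy.example/fetch is a made-up inbound address.)
const inbound = 'https://proxy.example/fetch?kmeURL=' + encodeURIComponent(vkmUrl);

// searchParams.get() undoes that single layer and nothing more.
const kmeURL = new URL(inbound).searchParams.get('kmeURL');
console.log(kmeURL === vkmUrl); // true

// Anti-pattern: re-encoding turns %20 into %2520 and points somewhere else.
const doubleEncoded = encodeURI(kmeURL);
console.log(doubleEncoded.includes('%2520')); // true
```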

Requirements (mandatory)

Functional Requirements

  • FR-001: The proxy MUST detect when an incoming request URL does NOT end in /sitemap.xml AND contains a non-empty kmeURL query parameter, and route such requests through the content-fetch flow.
  • FR-002: The proxy MUST use the kmeURL parameter value exactly as provided — without any manipulation, re-encoding, or URL construction — as the target URL for the upstream GET request.
  • FR-003: The proxy MUST attach an Authorization: OIDC_id_token {token} header to the upstream GET request, obtaining the token via getValidToken() from kmeContentSourceAdapterHelpers.
  • FR-004: The upstream GET request MUST have a timeout of 10 seconds.
  • FR-005: On a successful upstream response, the proxy MUST extract the vkm:articleBody field from the JSON-LD response body.
  • FR-006: The proxy MUST return the vkm:articleBody value as the HTTP response body with status 200 and Content-Type: text/html.
  • FR-007: If kmeURL is absent, empty, or blank, the proxy MUST return HTTP 400 without making an upstream request.
  • FR-008: If kmeURL is present but not a well-formed absolute URL, the proxy MUST return HTTP 400.
  • FR-009: If the upstream request results in a 4xx response, or the vkm:articleBody field is absent, null, or empty in an otherwise successful response, the proxy MUST return HTTP 404 to the caller.
  • FR-010: If the upstream request results in a 5xx response, a timeout, a network error, or an unparseable response body, the proxy MUST return HTTP 502 to the caller.
  • FR-011: If the OIDC token cannot be acquired, the proxy MUST return HTTP 502 to the caller.
  • FR-012: Requests that neither end in /sitemap.xml nor carry a kmeURL parameter MUST continue to receive the existing 200 OK passthrough response, unchanged.
  • FR-013: The content-fetch flow MUST be implemented entirely within the VM sandbox file (src/proxyScripts/kmeContentSourceAdapter.js) using only the injected context variables (axios, kmeContentSourceAdapterHelpers, kme_CSA_settings, console, URLSearchParams, URL, req, res) — no new imports or module-level exports are permitted.
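
Taken together, FR-002 through FR-011 suggest a flow along the following lines. This is a sketch under stated assumptions rather than the actual sandbox code: the dependencies are passed as parameters so the function can be exercised with stubs (in the sandbox they would come from the injected context), getValidToken() is assumed to take no arguments and resolve to the token string, res is assumed to expose writeHead() and end() like a Node response, and the error bodies are placeholders.

```javascript
// Sketch only: axios, helpers, and res stand in for the injected context.
async function contentFetchFlow({ kmeURL, axios, helpers, res }) {
  const send = (status, type, body) => {
    res.writeHead(status, { 'Content-Type': type });
    res.end(body);
  };

  // FR-007 / FR-008: reject missing, blank, or non-absolute URLs up front.
  let wellFormed = typeof kmeURL === 'string' && kmeURL.trim() !== '';
  if (wellFormed) {
    try { new URL(kmeURL); } catch (err) { wellFormed = false; }
  }
  if (!wellFormed) return send(400, 'text/plain', 'Bad Request');

  try {
    // FR-003 / FR-011: a token failure is handled by the catch block (502).
    const token = await helpers.getValidToken();

    // FR-002 / FR-004: the URL is passed as received, with a 10 s timeout.
    const upstream = await axios.get(kmeURL, {
      timeout: 10000,
      headers: {
        Authorization: `OIDC_id_token ${token}`,
        Accept: 'application/ld+json',
      },
    });

    // FR-010: if the body arrived as an unparsed string, parse it here so
    // that invalid JSON throws and is reported as 502.
    const doc = typeof upstream.data === 'string'
      ? JSON.parse(upstream.data)
      : upstream.data;

    // FR-009: an absent, null, or empty article body is "not found".
    const body = doc && doc['vkm:articleBody'];
    if (typeof body !== 'string' || body === '') {
      return send(404, 'text/plain', 'Not Found');
    }

    // FR-005 / FR-006: return the HTML as-is.
    return send(200, 'text/html', body);
  } catch (err) {
    // FR-009 / FR-010: upstream 4xx maps to 404, everything else to 502.
    const status = err && err.response ? err.response.status : 0;
    const code = status >= 400 && status < 500 ? 404 : 502;
    return send(code, 'text/plain', code === 404 ? 'Not Found' : 'Bad Gateway');
  }
}
```

Keeping every exit behind the single send() helper makes it straightforward to assert on status and Content-Type in unit tests with a stub res object.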

Key Entities

  • KME Article Content: Represents a single article fetched from the KME Content Service. Identified by its vkm:url. Key field: vkm:articleBody (HTML string).
  • OIDC Token: A short-lived credential sent in the Authorization header (scheme OIDC_id_token) to authenticate requests to the KME Content Service. Managed by getValidToken(), which handles caching (Redis) and refresh transparently.
  • Proxy Request: An incoming HTTP request received by the proxy script, carrying routing signals in the URL path (sitemap detection) and query string (kmeURL).

Success Criteria (mandatory)

Measurable Outcomes

  • SC-001: A consumer submitting a valid kmeURL receives the corresponding article HTML body in under 11 seconds end-to-end (10 s upstream timeout + 1 s proxy overhead) under normal network conditions.
  • SC-002: 100% of requests with a missing, empty, or malformed kmeURL parameter receive a 400 response without triggering any upstream call.
  • SC-003: 100% of upstream 4xx responses and missing/empty vkm:articleBody scenarios result in a 404 response to the caller.
  • SC-004: 100% of upstream 5xx, timeout, and network-error scenarios result in a 502 response to the caller.
  • SC-005: All existing proxy routes (sitemap flow and passthrough) continue to behave identically to their pre-feature behaviour — zero regression.
  • SC-006: The unit test suite for the proxy script achieves ≥90% branch coverage across the content-fetch flow, including all four error paths.

Assumptions

  • The kmeURL value provided by callers is the verbatim vkm:url value from the KME Search API response; the proxy does not need to validate its domain or path structure beyond confirming it is a well-formed absolute URL.
  • getValidToken() is already implemented, tested, and handles all OIDC token edge cases (expiry, Redis connectivity, refresh). This feature does not modify it.
  • The axios instance injected into the VM context supports a timeout configuration option and throws a recognisable error on timeout (following standard axios behaviour).
  • The KME Content Service always returns Content-Type: application/ld+json (or similar JSON) for valid article requests; no binary or streaming responses are expected.
  • HTTP method is always GET for the content-fetch flow; no authentication or session concept exists on the proxy's inbound side.
  • The existing sitemap route detection (URL ends in /sitemap.xml) takes priority over the kmeURL check — a URL ending in /sitemap.xml?kmeURL=... would route to the sitemap flow, not the content-fetch flow.
  • Error response bodies are plain text or minimal JSON — no prescribed format is required beyond the correct HTTP status code.
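
The routing priority described in these assumptions can be sketched as a small selector. Several points are interpretations, not statements from the spec: "ends in /sitemap.xml" is read as a test on the path with the query string ignored, the mere presence of a kmeURL key (even with an empty value) selects the content-fetch flow so that the empty case can be answered with 400, and a request with no kmeURL key falls through to passthrough as in FR-012. The name selectFlow and the placeholder base URL are hypothetical.

```javascript
// Hypothetical router: decides which flow handles an inbound request URL.
function selectFlow(requestUrl) {
  // A placeholder base is supplied because req.url is typically path-only.
  const url = new URL(requestUrl, 'http://proxy.invalid');

  // Sitemap detection looks at the path and wins over any kmeURL parameter.
  if (url.pathname.endsWith('/sitemap.xml')) return 'sitemap';

  // Interpretation: key present (even if empty) means content-fetch, which
  // then validates the value and may answer 400.
  if (url.searchParams.has('kmeURL')) return 'content-fetch';

  return 'passthrough';
}

console.log(selectFlow('/sitemap.xml?kmeURL=https%3A%2F%2Fcontent.kme.example%2Fa')); // sitemap
console.log(selectFlow('/fetch?kmeURL=https%3A%2F%2Fcontent.kme.example%2Fa'));       // content-fetch
console.log(selectFlow('/health'));                                                    // passthrough
```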