Feature Specification: KME Article Content Fetch
Feature Branch: 003-kme-content-fetch
Created: 2025-07-15
Status: Draft
User Scenarios & Testing (mandatory)
User Story 1 — Happy Path Article Fetch (Priority: P1)
A downstream consumer (e.g. a CMS or search front-end) sends a request to the proxy with a kmeURL query parameter containing the verbatim vkm:url value it received from the KME Search API. The proxy authenticates its request to the KME Content Service, fetches the article, and returns the HTML body of that article so the consumer can render it.
Why this priority: This is the core business value of the feature. Without a working happy path there is nothing to build on.
Independent Test: Issue a GET request to the proxy with a valid, reachable kmeURL. Verify that the response has status 200 and Content-Type: text/html, and that its body is the HTML from the vkm:articleBody field of the KME Content Service response.
Acceptance Scenarios:
- Given the proxy receives a GET request whose URL does not end in /sitemap.xml, When the request contains ?kmeURL=https://content.kme.example/articles/123, Then the proxy fetches that URL from the KME Content Service with Authorization: OIDC_id_token {token}, extracts vkm:articleBody from the JSON-LD response, and returns it as the HTTP response body with status 200 and Content-Type: text/html.
- Given the token cache holds a valid OIDC token, When the proxy makes the upstream request, Then it uses the cached token without a new token acquisition round-trip.
- Given the token cache has expired, When the proxy makes the upstream request, Then getValidToken() refreshes the token transparently before the upstream call is made.
User Story 2 — Missing or Empty kmeURL Parameter (Priority: P2)
A consumer sends a request that matches the content-fetch route (not a sitemap URL) but omits the kmeURL parameter or provides it as an empty string. The proxy must reject the request immediately with a clear 400 response rather than making a malformed upstream call.
Why this priority: Bad-input rejection prevents meaningless upstream calls and gives consumers a clear, actionable error signal.
Independent Test: Send a GET request to the proxy without kmeURL, or with kmeURL=. Verify a 400 Bad Request response is returned.
Acceptance Scenarios:
- Given the proxy receives a request with no kmeURL query parameter, When the request is processed, Then the proxy returns HTTP 400 without making any upstream request.
- Given the proxy receives a request with ?kmeURL= (empty value), When the request is processed, Then the proxy returns HTTP 400 without making any upstream request.
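The input check described in these scenarios can be sketched as a small pure function. This is a minimal sketch, not the proxy's actual code: the name validateKmeUrl is hypothetical, and restricting targets to http and https is an assumption the spec does not state. It also rejects values that are not well-formed absolute URLs, which the spec treats the same way as a missing value.

```javascript
// Hypothetical helper: returns the caller's value unchanged when it can be
// used as the upstream target, or null when the proxy should answer 400.
function validateKmeUrl(value) {
  // Absent, empty, or whitespace-only values are rejected.
  if (typeof value !== 'string' || value.trim() === '') return null;
  try {
    // The WHATWG URL constructor throws a TypeError for a relative
    // reference when no base is given, so only absolute URLs get through.
    const parsed = new URL(value);
    // Assumption (not in the spec): only http(s) targets are meaningful.
    if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') return null;
  } catch {
    return null;
  }
  // Hand back the original string rather than parsed.href so that nothing
  // is normalised or re-encoded on the way to the upstream request.
  return value;
}
```

Returning the original string instead of the parsed URL keeps the value verbatim, so an already percent-encoded kmeURL is not encoded a second time.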
User Story 3 — Upstream Content Fetch Failure or Missing Article Body (Priority: P3)
The KME Content Service is unreachable, returns an HTTP error status, times out, or returns a valid JSON-LD document that does not contain vkm:articleBody. The proxy must surface an appropriate error to the consumer.
Why this priority: Robust error handling avoids silent failures and lets consumers distinguish between "article not found" and "upstream service error".
Independent Test: Simulate or stub each failure mode and verify the correct HTTP error code is returned by the proxy.
Acceptance Scenarios:
- Given the KME Content Service returns a 4xx response for the requested URL, When the proxy processes the response, Then the proxy returns HTTP 404 to the caller.
- Given the KME Content Service returns a 5xx response or the request times out (exceeding 10 seconds), When the proxy processes the response, Then the proxy returns HTTP 502 to the caller.
- Given the KME Content Service returns a 200 JSON-LD response but the vkm:articleBody field is absent or null, When the proxy processes the response, Then the proxy returns HTTP 404 to the caller.
- Given a network-level error prevents the upstream request from completing, When the proxy processes the error, Then the proxy returns HTTP 502 to the caller.
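The mapping from a failed upstream call to the proxy's status code can be sketched as follows. The function name statusForUpstreamError is hypothetical. The sketch assumes the standard axios error shape, where err.response is populated when the server answered with a non-2xx status and is absent for timeouts and network-level failures.

```javascript
// Hypothetical helper: decide which status the proxy returns when the
// upstream GET rejects.
function statusForUpstreamError(err) {
  // axios attaches `response` only when the server actually replied.
  const upstreamStatus = err && err.response ? err.response.status : undefined;
  // Any upstream 4xx is reported to the caller as 404.
  if (upstreamStatus >= 400 && upstreamStatus < 500) return 404;
  // Upstream 5xx, timeouts, and network errors all become 502.
  return 502;
}
```

A 200 response without vkm:articleBody does not reject, so that case has to be handled separately, after the response has been received.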
User Story 4 — Existing Passthrough Behaviour Preserved (Priority: P4)
Requests that do not match the sitemap route and do not carry a kmeURL parameter must continue to receive the existing 200 OK response (auth-check passthrough) without any change in behaviour.
Why this priority: Non-regression of existing behaviour is required to avoid breaking active consumers that rely on the passthrough route.
Independent Test: Send a GET request to the proxy with neither a /sitemap.xml suffix nor a kmeURL parameter. Verify a 200 OK response is returned, identical to current behaviour.
Acceptance Scenarios:
- Given the proxy receives a request with no kmeURL parameter and a URL not ending in /sitemap.xml, When the request is processed, Then the proxy returns HTTP 200 (the existing auth-check passthrough).
Edge Cases
- What happens when kmeURL contains an already-encoded URL (percent-encoded characters)? The value must be used verbatim; double-encoding must not occur.
- What happens if the JSON-LD response body from the KME Content Service is not valid JSON? The proxy should treat this as a 502 upstream error.
- What happens if the upstream response contains vkm:articleBody but its value is an empty string? Treat as absent → return 404.
- What happens if the OIDC token cannot be acquired (e.g. auth service down)? Surface this as a 502 upstream error.
- What happens if kmeURL is present but the URL is not a well-formed absolute URL? Return 400 Bad Request (same as missing/empty).
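The response-body edge cases above can be illustrated with a small extraction helper. This is a sketch under stated assumptions: the name readArticleBody and its return shape are hypothetical, and it assumes the upstream payload arrives either as an already-parsed object or as a raw string (axios leaves the body as a string when it cannot parse it as JSON). A vkm:articleBody value that is not a string is treated as absent, which is an assumption as well.

```javascript
// Hypothetical helper: turn the upstream payload into the status (and HTML)
// the proxy should send back.
function readArticleBody(data) {
  let doc = data;
  if (typeof doc === 'string') {
    try {
      doc = JSON.parse(doc);
    } catch {
      // Body is not valid JSON: treated as an upstream error.
      return { status: 502 };
    }
  }
  const body = doc == null ? undefined : doc['vkm:articleBody'];
  // Absent, null, empty, or non-string values all count as "no article".
  if (typeof body !== 'string' || body === '') return { status: 404 };
  return { status: 200, html: body };
}
```

For example, readArticleBody({ 'vkm:articleBody': '' }) yields { status: 404 }, while a payload of 'not json' yields { status: 502 }.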
Requirements (mandatory)
Functional Requirements
- FR-001: The proxy MUST detect when an incoming request URL does NOT end in /sitemap.xml AND contains a non-empty kmeURL query parameter, and route such requests through the content-fetch flow.
- FR-002: The proxy MUST use the kmeURL parameter value exactly as provided, without any manipulation, re-encoding, or URL construction, as the target URL for the upstream GET request.
- FR-003: The proxy MUST attach an Authorization: OIDC_id_token {token} header to the upstream GET request, obtaining the token via getValidToken() from kmeContentSourceAdapterHelpers.
- FR-004: The upstream GET request MUST have a timeout of 10 seconds.
- FR-005: On a successful upstream response, the proxy MUST extract the vkm:articleBody field from the JSON-LD response body.
- FR-006: The proxy MUST return the vkm:articleBody value as the HTTP response body with status 200 and Content-Type: text/html.
- FR-007: If kmeURL is absent, empty, or blank, the proxy MUST return HTTP 400 without making an upstream request.
- FR-008: If kmeURL is present but not a well-formed absolute URL, the proxy MUST return HTTP 400.
- FR-009: If the upstream request results in a 4xx response, or the vkm:articleBody field is absent, null, or empty in an otherwise successful response, the proxy MUST return HTTP 404 to the caller.
- FR-010: If the upstream request results in a 5xx response, a timeout, a network error, or an unparseable response body, the proxy MUST return HTTP 502 to the caller.
- FR-011: If the OIDC token cannot be acquired, the proxy MUST return HTTP 502 to the caller.
- FR-012: Requests that neither end in /sitemap.xml nor carry a kmeURL parameter MUST continue to receive the existing 200 OK passthrough response, unchanged.
- FR-013: The content-fetch flow MUST be implemented entirely within the VM sandbox file (src/proxyScripts/kmeContentSourceAdapter.js) using only the injected context variables (axios, kmeContentSourceAdapterHelpers, kme_CSA_settings, console, URLSearchParams, URL, req, res); no new imports or module-level exports are permitted.
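Taken together, FR-002 through FR-011 could be sketched roughly as below. This is illustrative only and not the adapter's real code. The function name contentFetchFlow, the send helper, the zero-argument getValidToken call, and the assumption that res behaves like a Node.js http.ServerResponse (writeHead and end) are all hypothetical. Dependencies are passed in as arguments so the sketch can run with stubs; in the sandbox they would come from the injected context.

```javascript
// Hypothetical helper: write a status, content type, and body to `res`.
function send(res, status, body, contentType = 'text/plain') {
  res.writeHead(status, { 'Content-Type': contentType });
  res.end(body);
}

// Sketch of the content-fetch flow with all dependencies injected.
async function contentFetchFlow({ axios, getValidToken, kmeURL, res }) {
  // FR-007 / FR-008: reject absent, blank, or non-absolute values up front.
  if (typeof kmeURL !== 'string' || kmeURL.trim() === '') {
    return send(res, 400, 'Missing kmeURL');
  }
  try {
    new URL(kmeURL); // throws unless kmeURL is an absolute URL
  } catch {
    return send(res, 400, 'Invalid kmeURL');
  }

  // FR-011: a token failure is reported as an upstream error.
  let token;
  try {
    token = await getValidToken();
  } catch {
    return send(res, 502, 'Token acquisition failed');
  }

  // FR-002 to FR-004: verbatim URL, auth header, 10 second timeout.
  let upstream;
  try {
    upstream = await axios.get(kmeURL, {
      timeout: 10000,
      headers: { Authorization: `OIDC_id_token ${token}` },
    });
  } catch (err) {
    // FR-009 / FR-010: upstream 4xx becomes 404, everything else 502.
    const status = err && err.response ? err.response.status : 0;
    return send(res, status >= 400 && status < 500 ? 404 : 502, 'Upstream error');
  }

  // FR-010: an unparseable body is an upstream error.
  let doc = upstream.data;
  if (typeof doc === 'string') {
    try {
      doc = JSON.parse(doc);
    } catch {
      return send(res, 502, 'Unparseable upstream response');
    }
  }

  // FR-005, FR-006, FR-009: return the article body or 404 if it is missing.
  const body = doc == null ? undefined : doc['vkm:articleBody'];
  if (typeof body !== 'string' || body === '') {
    return send(res, 404, 'Article body not found');
  }
  return send(res, 200, body, 'text/html');
}
```

Passing the dependencies in explicitly is only a convenience for exercising the sketch in isolation; it introduces no imports, which keeps it compatible in spirit with FR-013.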
Key Entities
- KME Article Content: Represents a single article fetched from the KME Content Service. Identified by its vkm:url. Key field: vkm:articleBody (HTML string).
- OIDC Token: A short-lived bearer credential used to authenticate requests to the KME Content Service. Managed by getValidToken(), which handles caching (Redis) and refresh transparently.
- Proxy Request: An incoming HTTP request received by the proxy script, carrying routing signals in the URL path (sitemap detection) and query string (kmeURL).
Success Criteria (mandatory)
Measurable Outcomes
- SC-001: A consumer submitting a valid kmeURL receives the corresponding article HTML body in under 11 seconds end-to-end (10 s upstream timeout + 1 s proxy overhead) under normal network conditions.
- SC-002: 100% of requests with a missing, empty, or malformed kmeURL parameter receive a 400 response without triggering any upstream call.
- SC-003: 100% of upstream 4xx responses and missing/empty vkm:articleBody scenarios result in a 404 response to the caller.
- SC-004: 100% of upstream 5xx, timeout, and network-error scenarios result in a 502 response to the caller.
- SC-005: All existing proxy routes (sitemap flow and passthrough) continue to behave identically to their pre-feature behaviour, with zero regression.
- SC-006: The unit test suite for the proxy script achieves ≥90% branch coverage across the content-fetch flow, including all four error paths.
Assumptions
- The kmeURL value provided by callers is the verbatim vkm:url value from the KME Search API response; the spec does not need to validate its domain or path structure beyond confirming it is a well-formed absolute URL.
- getValidToken() is already implemented, tested, and handles all OIDC token edge cases (expiry, Redis connectivity, refresh). This feature does not modify it.
- The axios instance injected into the VM context supports a timeout configuration option and throws a recognisable error on timeout (following standard axios behaviour).
- The KME Content Service always returns Content-Type: application/ld+json (or similar JSON) for valid article requests; no binary or streaming responses are expected.
- HTTP method is always GET for the content-fetch flow; no authentication or session concept exists on the proxy's inbound side.
- The existing sitemap route detection (URL ends in /sitemap.xml) takes priority over the kmeURL check; a URL ending in /sitemap.xml?kmeURL=... would route to the sitemap flow, not the content-fetch flow.
- Error response bodies are plain text or minimal JSON; no prescribed format is required beyond the correct HTTP status code.
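The route priority stated in these assumptions can be sketched as a small classifier. The name classifyRequest and its return labels are hypothetical. The sketch assumes req.url holds a path plus query string, so a throwaway base URL is supplied only to make it parseable. It does not decide what happens to requests that match neither route; that is left to the caller.

```javascript
// Hypothetical helper: decide which flow an incoming request belongs to.
function classifyRequest(requestUrl) {
  // The base is a placeholder; only the path and query are inspected.
  const parsed = new URL(requestUrl, 'http://proxy.invalid');
  // Sitemap detection runs first, so it wins even when kmeURL is present.
  if (parsed.pathname.endsWith('/sitemap.xml')) return 'sitemap';
  // searchParams.get returns null when the parameter is absent.
  const kmeURL = parsed.searchParams.get('kmeURL');
  if (kmeURL !== null && kmeURL.trim() !== '') return 'content-fetch';
  return 'other';
}
```

Checking the parsed pathname rather than the raw URL string is what lets /sitemap.xml?kmeURL=... fall into the sitemap flow, since the query string is not part of the path.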