# HTTP Contract: Content Fetch Route

Feature: 003-kme-content-fetch

File: `specs/003-kme-content-fetch/contracts/http-content-fetch.md`
This document defines the HTTP request/response contract for the content-fetch route exposed by the KME Content Adapter proxy.
## Route

```http
GET {proxy-base-url}?kmeURL={encoded-article-url}
```
The proxy selects the content-fetch route when both of the following hold:

- The incoming URL does not end in `/sitemap.xml`, AND
- The query string contains a `kmeURL` parameter (its presence is enough, regardless of value).

Requests without `kmeURL` that are not sitemap requests are routed to the existing auth-check passthrough, which returns 200 "Authorized".
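The routing rule can be sketched in TypeScript as follows. This is a minimal illustration, not the proxy's code. It assumes the proxy sees a path-plus-query request target (as Node's `req.url` provides) and that the `/sitemap.xml` suffix is tested against the path only; `classifyRoute` and the route labels are hypothetical.

```typescript
// Hypothetical route labels; the proxy's internal names may differ.
type Route = "sitemap" | "contentFetch" | "authCheck";

// Classify a request target such as "/?kmeURL=..." or "/sitemap.xml?size=10".
// Assumption: the "/sitemap.xml" suffix is tested against the path only, so a
// sitemap request that carries query parameters still matches.
function classifyRoute(requestTarget: string): Route {
  // A placeholder base lets the WHATWG URL parser accept a path-only target.
  const url = new URL(requestTarget, "http://proxy.invalid");
  if (url.pathname.endsWith("/sitemap.xml")) {
    return "sitemap";
  }
  // Presence alone selects the route: "?kmeURL=" (empty value) still counts,
  // and the content-fetch handler then rejects it with 400.
  if (url.searchParams.has("kmeURL")) {
    return "contentFetch";
  }
  return "authCheck";
}

console.log(classifyRoute("/?kmeURL=https%3A%2F%2Fcontent.kme.example%2Farticles%2F123")); // contentFetch
console.log(classifyRoute("/sitemap.xml?kmeURL=x")); // sitemap
console.log(classifyRoute("/?kmeURL=")); // contentFetch
console.log(classifyRoute("/")); // authCheck
```

Checking the sitemap suffix first mirrors the "does not end in `/sitemap.xml`" condition: a sitemap request never reaches the content-fetch route, even if it happens to carry `kmeURL`.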
## Request

### Method

`GET`

### Query Parameters

| Parameter | Required | Description |
|---|---|---|
| `kmeURL` | Yes | The verbatim `vkm:url` value from the KME Search API response. Must be a well-formed absolute `http` or `https` URL. Percent-encoded characters are decoded once (standard URL decoding); double-encoding must not occur. |
### Headers

None required on the inbound request. The proxy adds its own `Authorization` header to the upstream request.
### Example Request

```http
GET /?kmeURL=https%3A%2F%2Fcontent.kme.example%2Farticles%2F123 HTTP/1.1
Host: proxy.example.com
```
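On the caller side, the example request corresponds to percent-encoding the article URL exactly once. A small TypeScript sketch; `buildContentFetchUrl` is a hypothetical helper and `https://proxy.example.com/` is the placeholder host from the example.

```typescript
// Hypothetical caller-side helper: percent-encode the article URL exactly once.
function buildContentFetchUrl(proxyBase: string, articleUrl: string): string {
  return `${proxyBase}?kmeURL=${encodeURIComponent(articleUrl)}`;
}

const articleUrl = "https://content.kme.example/articles/123";
const requestUrl = buildContentFetchUrl("https://proxy.example.com/", articleUrl);
console.log(requestUrl);
// → https://proxy.example.com/?kmeURL=https%3A%2F%2Fcontent.kme.example%2Farticles%2F123

// Standard query-string decoding undoes that encoding once, recovering the article URL.
console.log(new URL(requestUrl).searchParams.get("kmeURL"));
// → https://content.kme.example/articles/123

// Double-encoding leaves "%3A%2F%2F" in the decoded value, which is not an
// absolute URL and would therefore fall under the 400 "well-formed" rule.
const doubled = buildContentFetchUrl("https://proxy.example.com/", encodeURIComponent(articleUrl));
console.log(new URL(doubled).searchParams.get("kmeURL"));
// → https%3A%2F%2Fcontent.kme.example%2Farticles%2F123
```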
## Responses

### 200 OK: Article HTML Body

The article was successfully fetched and `vkm:articleBody` was extracted.

```http
HTTP/1.1 200 OK
Content-Type: text/html

<p>Article content here...</p>
```
| Field | Value |
|---|---|
| Status | `200` |
| Content-Type | `text/html` |
| Body | Raw HTML string from the `vkm:articleBody` field of the KME Content Service JSON-LD response. Not sanitised or transformed. |
### 400 Bad Request: Invalid kmeURL

Returned when `kmeURL` is absent, empty, whitespace-only, or not a well-formed absolute `http`/`https` URL. No upstream request is made.

```http
HTTP/1.1 400 Bad Request
Content-Type: text/plain

Bad Request: kmeURL parameter is required
```

```http
HTTP/1.1 400 Bad Request
Content-Type: text/plain

Bad Request: kmeURL must be a well-formed absolute http/https URL
```
| Trigger | Response body |
|---|---|
| `kmeURL` absent, empty, or whitespace-only | `Bad Request: kmeURL parameter is required` |
| `kmeURL` present but malformed or not `http`/`https` | `Bad Request: kmeURL must be a well-formed absolute http/https URL` |
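A minimal TypeScript sketch of these 400 rules, assuming the handler receives the already-decoded query value (or `null` when the parameter is absent). `validateKmeUrl` and the `KmeUrlCheck` result shape are hypothetical; the WHATWG `URL` parser stands in for "well-formed absolute URL".

```typescript
// Hypothetical result shape for validating the raw kmeURL query value.
type KmeUrlCheck =
  | { ok: true; url: string }
  | { ok: false; status: 400; body: string };

const REQUIRED_MSG = "Bad Request: kmeURL parameter is required";
const MALFORMED_MSG = "Bad Request: kmeURL must be a well-formed absolute http/https URL";

// With no base URL, the WHATWG parser throws on relative references,
// so a successful parse implies an absolute URL.
function parseAbsolute(raw: string): URL | null {
  try {
    return new URL(raw);
  } catch {
    return null;
  }
}

// `raw` is the decoded query value, or null when the parameter is absent.
function validateKmeUrl(raw: string | null): KmeUrlCheck {
  // Absent, empty, and whitespace-only values share one message.
  if (raw === null || raw.trim() === "") {
    return { ok: false, status: 400, body: REQUIRED_MSG };
  }
  const parsed = parseAbsolute(raw);
  if (parsed === null || (parsed.protocol !== "http:" && parsed.protocol !== "https:")) {
    return { ok: false, status: 400, body: MALFORMED_MSG };
  }
  // Return the original string, not parsed.href, so it can be sent upstream verbatim.
  return { ok: true, url: raw };
}

console.log(validateKmeUrl("https://content.kme.example/articles/123").ok); // true
const check = validateKmeUrl("ftp://content.kme.example/a");
if (!check.ok) console.log(check.body); // Bad Request: kmeURL must be a well-formed absolute http/https URL
```

Returning `raw` rather than `parsed.href` matters because `href` is a normalised serialisation (for example, `new URL("https://a.example").href` gains a trailing slash), while the upstream request must use the value unchanged.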
### 404 Not Found: Article Not Found

Returned when the upstream KME Content Service returns a 4xx response for the article URL, or when the upstream response does not contain a non-empty `vkm:articleBody`.

```http
HTTP/1.1 404 Not Found
Content-Type: text/plain

Not Found: article not found at upstream
```

```http
HTTP/1.1 404 Not Found
Content-Type: text/plain

Not Found: article body not present in upstream response
```
| Trigger | Response body |
|---|---|
| Upstream 4xx HTTP response | `Not Found: article not found at upstream` |
| `vkm:articleBody` absent, null, or empty string | `Not Found: article body not present in upstream response` |
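The choice between the 200 response and the "body not present" 404 can be sketched as below, assuming the upstream JSON has already been parsed into a plain object. `articleResponse` and `ProxyResponse` are hypothetical names; treating a non-string `vkm:articleBody` (for example, a number) like an absent one is an assumption, since the contract only lists absent, null, and empty string.

```typescript
// Hypothetical response shape used by this sketch.
type ProxyResponse = { status: number; contentType: string; body: string };

// Turn an already-parsed upstream JSON-LD object into 200 or 404.
// Only `vkm:articleBody` is consulted, matching this contract.
function articleResponse(doc: Record<string, unknown>): ProxyResponse {
  const articleBody = doc["vkm:articleBody"];
  // Absent (undefined), null, or "" → 404. Other non-string values are
  // treated the same way here, which is an assumption.
  if (typeof articleBody !== "string" || articleBody === "") {
    return {
      status: 404,
      contentType: "text/plain",
      body: "Not Found: article body not present in upstream response",
    };
  }
  // The HTML is returned as-is: no sanitising or transformation.
  return { status: 200, contentType: "text/html", body: articleBody };
}

// Usage with made-up upstream documents:
console.log(articleResponse({ "vkm:articleBody": "<p>Article content here...</p>" }).status); // 200
console.log(articleResponse({ "vkm:articleBody": null }).status); // 404
```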
### 500 Internal Server Error: Proxy Configuration Error

Returned when a required OIDC setting is missing from `kme_CSA_settings`. This indicates a proxy deployment or configuration issue.

```http
HTTP/1.1 500 Internal Server Error
Content-Type: text/plain

Configuration error: missing required field: tokenUrl
```
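A minimal sketch of the 500 check in TypeScript. The contract names only `tokenUrl`, so the list of required fields is passed in rather than guessed; the settings object shape and `checkRequiredSettings` are hypothetical, and counting a blank string as "missing" is an assumption.

```typescript
// Hypothetical response shape used by this sketch.
type ProxyResponse = { status: number; contentType: string; body: string };

// Returns a 500 response for the first missing field, or null when all are present.
function checkRequiredSettings(
  settings: Record<string, unknown>,
  requiredFields: readonly string[],
): ProxyResponse | null {
  for (const field of requiredFields) {
    const value = settings[field];
    // Assumption: a missing, non-string, or blank value counts as "missing".
    if (typeof value !== "string" || value.trim() === "") {
      return {
        status: 500,
        contentType: "text/plain",
        body: `Configuration error: missing required field: ${field}`,
      };
    }
  }
  return null; // every required field is present
}

console.log(checkRequiredSettings({}, ["tokenUrl"])?.body);
// → Configuration error: missing required field: tokenUrl
```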
### 502 Bad Gateway: Upstream or Token Failure

Returned for any upstream connectivity, protocol, or data error, and for token acquisition failure.

```http
HTTP/1.1 502 Bad Gateway
Content-Type: text/plain

Bad Gateway: token acquisition failed
```
| Trigger | Response body |
|---|---|
| OIDC token acquisition failure | `Bad Gateway: token acquisition failed` |
| Upstream request timeout (`ECONNABORTED` / `ERR_CANCELED`) | `Bad Gateway: upstream request timed out` |
| Upstream 5xx HTTP response | `Bad Gateway: upstream error HTTP {status}` |
| Network-level error (no HTTP response) | `Bad Gateway: {error message}` |
| Upstream response body is not valid JSON | `Bad Gateway: unparseable response from upstream` |
| Upstream response body is not an object | `Bad Gateway: unexpected response from upstream` |
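The upstream rows of the 404 and 502 tables can be sketched as two small mapping functions. Assumptions, all hypothetical: failures arrive as an axios-style error object (`code`, `response.status`, `message`), which the `ECONNABORTED` / `ERR_CANCELED` codes suggest; arrays and `null` do not count as "an object"; and token failures are handled elsewhere. `mapUpstreamError` and `parseUpstreamBody` are illustrative names.

```typescript
// Hypothetical response shape used by this sketch.
type ProxyResponse = { status: number; contentType: string; body: string };

// Assumed axios-style failure shape (code / response.status / message).
interface UpstreamError {
  code?: string;
  message: string;
  response?: { status: number };
}

function plainText(status: number, body: string): ProxyResponse {
  return { status, contentType: "text/plain", body };
}

// Map a failed upstream request to the 404 / 502 rows of this contract.
function mapUpstreamError(err: UpstreamError): ProxyResponse {
  if (err.code === "ECONNABORTED" || err.code === "ERR_CANCELED") {
    return plainText(502, "Bad Gateway: upstream request timed out");
  }
  const status = err.response?.status;
  if (status !== undefined && status >= 400 && status < 500) {
    return plainText(404, "Not Found: article not found at upstream");
  }
  if (status !== undefined && status >= 500) {
    return plainText(502, `Bad Gateway: upstream error HTTP ${status}`);
  }
  // No HTTP response (or a status the contract does not list): network-error row.
  return plainText(502, `Bad Gateway: ${err.message}`);
}

type ParsedBody =
  | { ok: true; doc: Record<string, unknown> }
  | { ok: false; error: ProxyResponse };

// Check a successful body before looking for vkm:articleBody.
function parseUpstreamBody(raw: string): ParsedBody {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return { ok: false, error: plainText(502, "Bad Gateway: unparseable response from upstream") };
  }
  // Assumption: arrays and null are not "an object" for this purpose.
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return { ok: false, error: plainText(502, "Bad Gateway: unexpected response from upstream") };
  }
  return { ok: true, doc: value as Record<string, unknown> };
}

console.log(mapUpstreamError({ message: "x", response: { status: 503 } }).body);
// → Bad Gateway: upstream error HTTP 503
```

The timeout codes are checked before the status because a timed-out request has no usable HTTP status to map.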
## Upstream Request (Proxy → KME Content Service)

The proxy makes a single `GET` request to the verbatim `kmeURL` value.

```http
GET {kmeURL} HTTP/1.1
Authorization: OIDC_id_token {id_token}
```
| Field | Value |
|---|---|
| Method | `GET` |
| URL | Verbatim value of the `kmeURL` query parameter: no manipulation, no re-encoding |
| Authorization | `OIDC_id_token {id_token}`, where `id_token` is obtained from `getValidToken()` |
| Timeout | 10 000 ms (10 seconds) |
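The upstream call can be sketched as a single `GET` guarded by an abort timer. Assumptions: `getValidToken()` is modelled as a function returning a promise of the raw `id_token` (its real signature is not given here), and the HTTP client is injected as a minimal fetch-like function (`FetchLike`, hypothetical) so the sketch is not tied to whichever client the proxy actually uses. The global `fetch` in Node 18+ fits this shape.

```typescript
// Minimal shapes so the sketch is not tied to a particular HTTP client (hypothetical).
interface UpstreamResponse {
  status: number;
  text(): Promise<string>;
}
type FetchLike = (
  url: string,
  init: { method: "GET"; headers: Record<string, string>; signal: AbortSignal },
) => Promise<UpstreamResponse>;

const UPSTREAM_TIMEOUT_MS = 10_000; // 10 seconds, per the table above

// One GET to the verbatim kmeURL, authorised with the OIDC id_token.
async function fetchArticle(
  kmeURL: string,
  getValidToken: () => Promise<string>,
  fetchImpl: FetchLike,
): Promise<UpstreamResponse> {
  // A rejection here corresponds to the "token acquisition failure" row (502).
  const idToken = await getValidToken();
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), UPSTREAM_TIMEOUT_MS);
  try {
    // kmeURL is passed through unchanged: no parsing, normalising, or re-encoding.
    return await fetchImpl(kmeURL, {
      method: "GET",
      headers: { Authorization: `OIDC_id_token ${idToken}` },
      signal: controller.signal,
    });
  } finally {
    // In this sketch the timer covers the wait for the response, not reading the body.
    clearTimeout(timer);
  }
}
```

Injecting the client keeps the sketch testable without a network. One caveat if `fetch` is used: a timeout then surfaces as an `AbortError` rather than the `ECONNABORTED` / `ERR_CANCELED` codes listed in the 502 table, so the error mapping would need to recognise it.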
## Error Mapping Summary

| Condition | Status |
|---|---|
| `kmeURL` absent, empty, or whitespace-only | 400 |
| `kmeURL` malformed or not `http`/`https` | 400 |
| Missing OIDC config | 500 |
| Token acquisition failure | 502 |
| Upstream 4xx | 404 |
| Upstream 5xx | 502 |
| Upstream timeout | 502 |
| Network error | 502 |
| Unparseable or non-object response body | 502 |
| `vkm:articleBody` absent, null, or empty | 404 |
| Success | 200 (`text/html`) |
## Non-regression: Existing Routes

This feature does not change the behaviour of existing routes:

| Route | Behaviour |
|---|---|
| URL ends in `/sitemap.xml` | Sitemap flow (unchanged) |
| No `kmeURL`, not sitemap | Auth-check passthrough → 200 "Authorized" (unchanged) |