Peter.Morton f840587e5e feat: content fetch, sitemap fixes, remove oidcAuthFlow
- Add contentFetchFlow() to proxy (FR-001 through FR-012)
- Add extractArticleBody() helper with vkm:articleBody / articleBody fallback
- Dynamic proxyBaseUrl derivation from x-forwarded-proto/host headers
- Forward query/size/category params on /sitemap.xml requests
- Add Accept: application/ld+json header to content API calls
- Remove oidcAuthFlow() - unmatched requests now return 404 Not Found
- Fix xmlbuilder2 import: default import, call as xmlbuilder2.create(...)
- Version bump 0.2.0 → 0.3.0
- 45/45 tests passing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-23 16:40:06 -05:00


HTTP Contract: Content Fetch Route

Feature: 003-kme-content-fetch
File: specs/003-kme-content-fetch/contracts/http-content-fetch.md

This document defines the HTTP request/response contract for the content-fetch route exposed by the KME Content Adapter proxy.


Route

GET {proxy-base-url}?kmeURL={encoded-article-url}

The proxy detects the content-fetch route when:

  • The incoming URL does not end in /sitemap.xml, AND
  • The query string contains a kmeURL parameter (present, regardless of value)

Requests without kmeURL (and not a sitemap request) are routed to the existing auth-check passthrough (returns 200 "Authorized").
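The detection rules above can be sketched as a small classifier. This TypeScript sketch is illustrative only. `classifyRoute` and the route labels are hypothetical names. The sketch also assumes that the "ends in /sitemap.xml" test applies to the path component of the request URL, not to the full URL with its query string.

```typescript
// Hypothetical labels for the three routes this contract describes.
type Route = "sitemap" | "content-fetch" | "auth-check";

// Classify an inbound request target such as "/?kmeURL=..." or "/sitemap.xml".
// Assumption: the "/sitemap.xml" suffix is tested against the path only,
// so sitemap requests that carry query parameters still match.
function classifyRoute(requestTarget: string): Route {
  // A placeholder base lets URL() parse a path-plus-query request target.
  const url = new URL(requestTarget, "http://proxy.invalid");
  if (url.pathname.endsWith("/sitemap.xml")) {
    return "sitemap";
  }
  // Presence alone selects the route: "?kmeURL=" with an empty value still
  // counts, and validation later rejects the empty value with a 400.
  if (url.searchParams.has("kmeURL")) {
    return "content-fetch";
  }
  return "auth-check";
}

console.log(classifyRoute("/?kmeURL=https%3A%2F%2Fcontent.kme.example%2Farticles%2F123")); // content-fetch
console.log(classifyRoute("/?kmeURL=")); // content-fetch
console.log(classifyRoute("/sitemap.xml")); // sitemap
console.log(classifyRoute("/")); // auth-check
```

The sitemap suffix is checked first, which matches the precedence in the rules. A sitemap request is never treated as a content fetch, even if it carries a kmeURL parameter.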


Request

Method

GET

Query Parameters

| Parameter | Required | Description |
|---|---|---|
| kmeURL | Yes | The verbatim vkm:url value from the KME Search API response. Must be a well-formed absolute http or https URL. Percent-encoded characters are decoded exactly once (standard URL decoding), so the value must not be double-encoded. |

Headers

None required on the inbound request. The proxy adds its own Authorization header on the upstream request.

Example Request

```http
GET /?kmeURL=https%3A%2F%2Fcontent.kme.example%2Farticles%2F123 HTTP/1.1
Host: proxy.example.com
```
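A caller can produce the single-encoded form shown in the example by letting `URLSearchParams` do the encoding. This is a minimal sketch. `buildContentFetchUrl` is a hypothetical client-side helper, and the host is the placeholder from the example request.

```typescript
// Hypothetical client-side helper: builds {proxy-base-url}?kmeURL={encoded-article-url}.
function buildContentFetchUrl(proxyBaseUrl: string, articleUrl: string): string {
  const url = new URL(proxyBaseUrl);
  // searchParams.set percent-encodes the value once when serialised,
  // so the article URL must be passed in raw, not pre-encoded.
  url.searchParams.set("kmeURL", articleUrl);
  return url.toString();
}

const requestUrl = buildContentFetchUrl(
  "https://proxy.example.com/",
  "https://content.kme.example/articles/123",
);
console.log(requestUrl);
// https://proxy.example.com/?kmeURL=https%3A%2F%2Fcontent.kme.example%2Farticles%2F123

// A single decode on the receiving side recovers the original article URL.
console.log(new URL(requestUrl).searchParams.get("kmeURL"));
// https://content.kme.example/articles/123

// Pitfall: pre-encoding the value before set() encodes it twice.
const doubled = buildContentFetchUrl(
  "https://proxy.example.com/",
  encodeURIComponent("https://content.kme.example/articles/123"),
);
console.log(new URL(doubled).searchParams.get("kmeURL"));
// https%3A%2F%2Fcontent.kme.example%2Farticles%2F123
```

In the pitfall case, the value is still encoded after the single decode. It is therefore not an absolute URL and fails the well-formedness requirement on kmeURL.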

Responses

200 OK — Article HTML Body

The article was successfully fetched and vkm:articleBody was extracted.

```http
HTTP/1.1 200 OK
Content-Type: text/html

<p>Article content here...</p>
```

| Field | Value |
|---|---|
| Status | 200 |
| Content-Type | text/html |
| Body | Raw HTML string from the vkm:articleBody field of the KME Content Service JSON-LD response. Not sanitised or transformed. |

400 Bad Request — Invalid kmeURL

Returned when kmeURL is absent, empty, whitespace-only, or not a well-formed absolute http/https URL. No upstream request is made.

```http
HTTP/1.1 400 Bad Request
Content-Type: text/plain

Bad Request: kmeURL parameter is required
```

```http
HTTP/1.1 400 Bad Request
Content-Type: text/plain

Bad Request: kmeURL must be a well-formed absolute http/https URL
```

| Trigger | Response body |
|---|---|
| kmeURL absent, empty, or whitespace-only | Bad Request: kmeURL parameter is required |
| kmeURL present but malformed or not http/https | Bad Request: kmeURL must be a well-formed absolute http/https URL |
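Both 400 cases can be checked before any token or upstream work happens. This is a minimal sketch. It assumes the value has already been read from the query string and decoded once. `checkKmeUrl` and `isAbsoluteHttpUrl` are hypothetical names. "Well-formed" is approximated here as "accepted by the WHATWG `URL` parser".

```typescript
// Hypothetical result type for the kmeURL checks; not the proxy's real code.
type KmeUrlCheck =
  | { ok: true; url: string }
  | { ok: false; status: 400; body: string };

// True only for absolute URLs whose scheme is http or https.
function isAbsoluteHttpUrl(value: string): boolean {
  try {
    // Without a base, the URL constructor throws on relative or malformed input.
    const { protocol } = new URL(value);
    return protocol === "http:" || protocol === "https:";
  } catch {
    return false;
  }
}

// `raw` is the already-decoded parameter value, or null when it is absent
// (which is what URLSearchParams.get returns for a missing name).
function checkKmeUrl(raw: string | null): KmeUrlCheck {
  if (raw === null || raw.trim() === "") {
    return { ok: false, status: 400, body: "Bad Request: kmeURL parameter is required" };
  }
  if (!isAbsoluteHttpUrl(raw)) {
    return {
      ok: false,
      status: 400,
      body: "Bad Request: kmeURL must be a well-formed absolute http/https URL",
    };
  }
  // Passed back unchanged, because the upstream request uses it verbatim.
  return { ok: true, url: raw };
}

for (const candidate of [null, "   ", "/articles/123", "ftp://content.kme.example/a", "https://content.kme.example/articles/123"]) {
  const result = checkKmeUrl(candidate);
  console.log(JSON.stringify(candidate), result.ok === true ? "ok" : result.body);
}
```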

404 Not Found — Article Not Found

Returned when the upstream KME Content Service returns a 4xx response for the article URL, or when the upstream response does not contain a non-empty vkm:articleBody.

```http
HTTP/1.1 404 Not Found
Content-Type: text/plain

Not Found: article not found at upstream
```

```http
HTTP/1.1 404 Not Found
Content-Type: text/plain

Not Found: article body not present in upstream response
```

| Trigger | Response body |
|---|---|
| Upstream 4xx HTTP response | Not Found: article not found at upstream |
| vkm:articleBody absent, null, or empty string | Not Found: article body not present in upstream response |
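For a parsed upstream document, the choice between 200 and 404 reduces to one check on vkm:articleBody. This sketch rests on several assumptions. `articleBodyOutcome` is a hypothetical name. The document is assumed to have already passed the JSON and object checks that lead to 502. A non-string value is treated like an absent one, which this contract does not specify.

```typescript
// Hypothetical outcome shape; status codes and messages follow this contract.
type BodyOutcome =
  | { status: 200; contentType: "text/html"; body: string }
  | { status: 404; contentType: "text/plain"; body: string };

// `doc` is assumed to be a parsed JSON object from the KME Content Service.
function articleBodyOutcome(doc: Record<string, unknown>): BodyOutcome {
  const html = doc["vkm:articleBody"];
  // Absent (undefined), null, and "" all count as "not present".
  // Assumption: any other non-string value is handled the same way.
  if (typeof html !== "string" || html === "") {
    return {
      status: 404,
      contentType: "text/plain",
      body: "Not Found: article body not present in upstream response",
    };
  }
  // Returned as-is: the HTML is neither sanitised nor transformed.
  return { status: 200, contentType: "text/html", body: html };
}

console.log(articleBodyOutcome({ "vkm:articleBody": "<p>Article content here...</p>" }).status); // 200
console.log(articleBodyOutcome({ "vkm:articleBody": "" }).status); // 404
```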

500 Internal Server Error — Proxy Configuration Error

Returned when a required OIDC setting is missing from kme_CSA_settings. This indicates a proxy deployment or configuration problem rather than a client error.

```http
HTTP/1.1 500 Internal Server Error
Content-Type: text/plain

Configuration error: missing required field: tokenUrl
```
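A guard for this case, run at startup or per request, might look like the following. The required-field list is an assumption. This contract names only tokenUrl, so `clientId` below is a hypothetical placeholder, and `configError` is not a function from the proxy. Treating an empty string as missing is also an assumption.

```typescript
// Hypothetical settings shape: a flat map of OIDC fields read from kme_CSA_settings.
type Settings = Record<string, string | undefined>;

// Returns the 500 response body for the first missing required field,
// or null when every required field is present.
function configError(settings: Settings, required: readonly string[]): string | null {
  for (const field of required) {
    const value = settings[field];
    // Assumption: an empty string is treated the same as a missing field.
    if (value === undefined || value === "") {
      return `Configuration error: missing required field: ${field}`;
    }
  }
  return null;
}

// Only tokenUrl is named in this contract; "clientId" is a hypothetical example.
const REQUIRED_FIELDS = ["tokenUrl", "clientId"] as const;

console.log(configError({ clientId: "abc" }, REQUIRED_FIELDS));
// Configuration error: missing required field: tokenUrl
```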

502 Bad Gateway — Upstream or Token Failure

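Returned when OIDC token acquisition fails, or when the upstream request fails for connectivity, protocol, or data reasons: a timeout, a network-level error, a 5xx response, or a body that is not a JSON object. Upstream 4xx responses and a missing vkm:articleBody are reported as 404 instead.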

```http
HTTP/1.1 502 Bad Gateway
Content-Type: text/plain

Bad Gateway: token acquisition failed
```

| Trigger | Response body |
|---|---|
| OIDC token acquisition failure | Bad Gateway: token acquisition failed |
| Upstream request timeout (ECONNABORTED/ERR_CANCELED) | Bad Gateway: upstream request timed out |
| Upstream 5xx HTTP response | Bad Gateway: upstream error HTTP {status} |
| Network-level error (no HTTP response) | Bad Gateway: {error message} |
| Upstream response body is not valid JSON | Bad Gateway: unparseable response from upstream |
| Upstream response body is not an object | Bad Gateway: unexpected response from upstream |
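The upstream rows of the 502 table, together with the upstream-4xx row of the 404 section, can be written as two small mapping functions. This is a sketch, not the proxy's code. The function names are hypothetical. The `UpstreamError` shape is an assumption inferred from the ECONNABORTED/ERR_CANCELED codes (codes axios reports for timeouts and aborted requests); the contract does not define it. Token-acquisition failure is left out because it happens before any upstream call.

```typescript
// Assumed error shape, modelled loosely on HTTP-client errors such as axios's:
// an error code, a message, and the HTTP status when a response was received.
interface UpstreamError {
  code?: string;
  message: string;
  response?: { status: number };
}

interface PlainTextResponse {
  status: number;
  body: string; // sent as text/plain
}

// Map a failed upstream call to the proxy response given in the 404 and 502 tables.
function mapUpstreamError(err: UpstreamError): PlainTextResponse {
  if (err.code === "ECONNABORTED" || err.code === "ERR_CANCELED") {
    return { status: 502, body: "Bad Gateway: upstream request timed out" };
  }
  const status = err.response?.status;
  if (status !== undefined && status >= 400 && status < 500) {
    return { status: 404, body: "Not Found: article not found at upstream" };
  }
  if (status !== undefined && status >= 500) {
    return { status: 502, body: `Bad Gateway: upstream error HTTP ${status}` };
  }
  // Everything else, chiefly errors with no HTTP response (refused connection, DNS failure).
  return { status: 502, body: `Bad Gateway: ${err.message}` };
}

type ParsedBody =
  | { ok: true; doc: Record<string, unknown> }
  | { ok: false; response: PlainTextResponse };

// Parse a successful upstream body; syntax errors and non-object values become 502s.
function parseUpstreamBody(text: string): ParsedBody {
  let value: unknown;
  try {
    value = JSON.parse(text);
  } catch {
    return { ok: false, response: { status: 502, body: "Bad Gateway: unparseable response from upstream" } };
  }
  // Assumption: a top-level JSON array also counts as "not an object".
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return { ok: false, response: { status: 502, body: "Bad Gateway: unexpected response from upstream" } };
  }
  return { ok: true, doc: value as Record<string, unknown> };
}
```

The timeout codes are checked before the status. A timed-out request has no response, so without that first check it would fall through to the generic network-error message.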

Upstream Request (Proxy → KME Content Service)

The proxy makes a single GET request to the verbatim kmeURL value.

```http
GET {kmeURL} HTTP/1.1
Authorization: OIDC_id_token {id_token}
```

| Field | Value |
|---|---|
| Method | GET |
| URL | Verbatim value of the kmeURL query parameter, with no manipulation and no re-encoding |
| Authorization | OIDC_id_token {id_token}, where id_token is obtained from getValidToken() |
| Timeout | 10 000 ms (10 seconds) |
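The table translates directly into a request description that any HTTP client could execute. This is a sketch only. `buildUpstreamRequest` and `UpstreamRequest` are hypothetical names, and the sketch includes only the fields listed in the table.

```typescript
// Hypothetical, client-agnostic description of the single upstream call.
interface UpstreamRequest {
  method: "GET";
  url: string;
  headers: Record<string, string>;
  timeoutMs: number;
}

const UPSTREAM_TIMEOUT_MS = 10_000; // 10 seconds

// kmeUrl: the validated kmeURL value; idToken: the token returned by getValidToken().
function buildUpstreamRequest(kmeUrl: string, idToken: string): UpstreamRequest {
  return {
    method: "GET",
    // Keep the original string: routing it through new URL(...).href could normalise
    // it (for example by lowercasing the scheme and host), which counts as manipulation.
    url: kmeUrl,
    headers: { Authorization: `OIDC_id_token ${idToken}` },
    timeoutMs: UPSTREAM_TIMEOUT_MS,
  };
}

const req = buildUpstreamRequest("https://content.kme.example/articles/123", "example-id-token");
console.log(req.headers.Authorization); // OIDC_id_token example-id-token
```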

Error Mapping Summary

kmeURL absent/empty                  → 400
kmeURL malformed / non-http(s)       → 400
Missing OIDC config                  → 500
Token acquisition failure            → 502
Upstream 4xx                         → 404
Upstream 5xx                         → 502
Upstream timeout                     → 502
Network error                        → 502
Unparseable or non-object body       → 502
vkm:articleBody absent/null/empty    → 404
Success                              → 200 text/html
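Encoding the summary as an exhaustive lookup lets the compiler catch a forgotten case. In this sketch, the outcome labels are hypothetical names for the summary rows. A separate `nonObjectBody` label covers the "not an object" row of the 502 table.

```typescript
// Hypothetical outcome labels for the error mapping summary.
type Outcome =
  | "kmeUrlMissing"
  | "kmeUrlMalformed"
  | "configMissing"
  | "tokenFailure"
  | "upstream4xx"
  | "upstream5xx"
  | "upstreamTimeout"
  | "networkError"
  | "unparseableBody"
  | "nonObjectBody"
  | "articleBodyMissing"
  | "success";

// A Record over the union makes the compiler reject a missing or misspelled outcome.
const STATUS_BY_OUTCOME: Record<Outcome, number> = {
  kmeUrlMissing: 400,
  kmeUrlMalformed: 400,
  configMissing: 500,
  tokenFailure: 502,
  upstream4xx: 404,
  upstream5xx: 502,
  upstreamTimeout: 502,
  networkError: 502,
  unparseableBody: 502,
  nonObjectBody: 502,
  articleBodyMissing: 404,
  success: 200,
};

// Only the success path serves HTML; every error response is text/plain.
function contentTypeFor(outcome: Outcome): string {
  return outcome === "success" ? "text/html" : "text/plain";
}

console.log(STATUS_BY_OUTCOME.upstream4xx, contentTypeFor("upstream4xx")); // 404 text/plain
```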

Non-regression: Existing Routes

This feature does not change the behaviour of existing routes:

| Route | Behaviour |
|---|---|
| URL ends in /sitemap.xml | Sitemap flow (unchanged) |
| No kmeURL, not sitemap | Auth-check passthrough → 200 "Authorized" (unchanged) |