# HTTP Contract: Content Fetch Route

**Feature**: `003-kme-content-fetch`

**File**: `specs/003-kme-content-fetch/contracts/http-content-fetch.md`

This document defines the HTTP request/response contract for the content-fetch route exposed by the
KME Content Adapter proxy.

---

## Route

```
GET {proxy-base-url}?kmeURL={encoded-article-url}
```

The proxy selects the content-fetch route when both of these conditions hold:

- The request path does **not** end in `/sitemap.xml`, AND
- The query string contains a `kmeURL` parameter (present with any value, including an empty one).

Requests that have no `kmeURL` parameter and are not sitemap requests are routed to the existing
auth-check passthrough, which returns 200 "Authorized".
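
The routing rules can be sketched in TypeScript as below. This is a minimal illustration, not the
proxy's code: `classifyRoute` and the route labels are hypothetical, and the sketch assumes the
`/sitemap.xml` test applies to the path only, so sitemap requests may still carry a query string.
It uses the WHATWG `URL` parser built into Node.js.

```typescript
// Hypothetical route labels; the contract specifies behaviour, not these names.
type Route = "sitemap" | "content-fetch" | "auth-check";

// Classifies an origin-form request target such as "/?kmeURL=...".
// Assumption: only the path (not the query string) is checked for "/sitemap.xml".
function classifyRoute(requestTarget: string): Route {
  // A placeholder base lets the WHATWG URL parser accept a path-only target.
  const url = new URL(requestTarget, "http://proxy.invalid");
  if (url.pathname.endsWith("/sitemap.xml")) {
    return "sitemap";
  }
  // has() is true for "?kmeURL=" and a bare "?kmeURL": present regardless of value.
  if (url.searchParams.has("kmeURL")) {
    return "content-fetch";
  }
  return "auth-check";
}

console.log(classifyRoute("/sitemap.xml?kmeURL=x")); // sitemap
console.log(classifyRoute("/?kmeURL="));             // content-fetch
console.log(classifyRoute("/"));                     // auth-check
```

The sitemap check runs first, so a sitemap path wins even when a `kmeURL` parameter is present.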

---

## Request

### Method

`GET`

### Query Parameters

| Parameter | Required | Description |
|-----------|----------|-------------|
| `kmeURL` | Yes | The verbatim `vkm:url` value from the KME Search API response. Must be a well-formed absolute `http` or `https` URL. The caller must percent-encode the value exactly once: the proxy applies standard URL decoding a single time, so a double-encoded value would not decode back to the article URL. |
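
A minimal sketch of the single-decode rule, assuming the proxy reads the parameter with the WHATWG
`URLSearchParams` parser (as `URL.searchParams` does in Node.js). `readKmeUrl` is a hypothetical
name. A value encoded once decodes back to the article URL; a double-encoded value still contains
percent escapes after the one decode and no longer parses as an absolute URL.

```typescript
// Returns the kmeURL value after one standard decode, or null if the parameter is absent.
function readKmeUrl(requestTarget: string): string | null {
  const url = new URL(requestTarget, "http://proxy.invalid");
  return url.searchParams.get("kmeURL");
}

// Encoded once: decodes back to the article URL.
const once = readKmeUrl("/?kmeURL=https%3A%2F%2Fcontent.kme.example%2Farticles%2F123");
// Encoded twice ("%" itself became "%25"): one decode leaves escapes behind.
const twice = readKmeUrl("/?kmeURL=https%253A%252F%252Fcontent.kme.example%252Farticles%252F123");

console.log(once);  // https://content.kme.example/articles/123
console.log(twice); // https%3A%2F%2Fcontent.kme.example%2Farticles%2F123
```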

### Headers

None required on the inbound request. The proxy adds its own `Authorization` header to the upstream
request.

### Example Request

```
GET /?kmeURL=https%3A%2F%2Fcontent.kme.example%2Farticles%2F123 HTTP/1.1
Host: proxy.example.com
```
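
On the caller side, a request like the one above can be built with `URLSearchParams`, which
percent-encodes the value exactly once. `buildContentFetchUrl` is a hypothetical helper, and the
host names are the illustrative ones used in this document.

```typescript
// Builds a content-fetch URL; searchParams.set() encodes the article URL exactly once.
function buildContentFetchUrl(proxyBaseUrl: string, articleUrl: string): string {
  const url = new URL(proxyBaseUrl);
  url.searchParams.set("kmeURL", articleUrl);
  return url.toString();
}

console.log(buildContentFetchUrl("https://proxy.example.com/", "https://content.kme.example/articles/123"));
// https://proxy.example.com/?kmeURL=https%3A%2F%2Fcontent.kme.example%2Farticles%2F123
```

Form-style encoding also turns a literal `+` in the article URL into `%2B`, so it is not read back
as a space by a form-style decoder.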

---

## Responses

### 200 OK: Article HTML Body

The article was successfully fetched and `vkm:articleBody` was extracted.

```
HTTP/1.1 200 OK
Content-Type: text/html

<p>Article content here...</p>
```

| Field | Value |
|-------|-------|
| Status | `200` |
| `Content-Type` | `text/html` |
| Body | Raw HTML string from the `vkm:articleBody` field of the KME Content Service JSON-LD response, returned without sanitising or transformation. |
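
The success path can be sketched as follows, assuming the upstream JSON-LD has already been parsed
into a plain object. `readArticleBody`, `articleResponse`, and the `ProxyResponse` shape are
hypothetical. Treating a non-string `vkm:articleBody` like an absent one is an assumption; the
contract only names absent, null, and empty values, which map to 404.

```typescript
interface ProxyResponse {
  status: number;
  contentType: string;
  body: string;
}

// Returns the raw HTML, or null when vkm:articleBody is absent, null, empty,
// or (assumption) not a string at all.
function readArticleBody(payload: Record<string, unknown>): string | null {
  const body = payload["vkm:articleBody"];
  return typeof body === "string" && body.length > 0 ? body : null;
}

function articleResponse(payload: Record<string, unknown>): ProxyResponse {
  const html = readArticleBody(payload);
  if (html === null) {
    return {
      status: 404,
      contentType: "text/plain",
      body: "Not Found: article body not present in upstream response",
    };
  }
  // Returned as-is: no sanitising or transformation.
  return { status: 200, contentType: "text/html", body: html };
}

const res = articleResponse({ "vkm:articleBody": "<p>Article content here...</p>" });
console.log(res.status, res.contentType); // 200 text/html
```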

---

### 400 Bad Request: Invalid `kmeURL`

Returned when `kmeURL` is empty, whitespace-only, or not a well-formed absolute http/https URL.
No upstream request is made. A request with no `kmeURL` parameter at all never reaches this route
(see Route).

```
HTTP/1.1 400 Bad Request
Content-Type: text/plain

Bad Request: kmeURL parameter is required
```

```
HTTP/1.1 400 Bad Request
Content-Type: text/plain

Bad Request: kmeURL must be a well-formed absolute http/https URL
```

| Trigger | Response body |
|---------|---------------|
| `kmeURL` empty or whitespace-only | `Bad Request: kmeURL parameter is required` |
| `kmeURL` present but malformed or not http/https | `Bad Request: kmeURL must be a well-formed absolute http/https URL` |
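
A sketch of the 400 checks, assuming validation runs on the already-decoded `kmeURL` value and
uses the WHATWG `URL` parser to decide what counts as a well-formed absolute URL.
`validateKmeUrl` and the `Validation` type are hypothetical. On success the original string is
returned rather than the parser's normalised `href`, so the upstream URL stays verbatim.

```typescript
type Validation =
  | { ok: true; url: string }
  | { ok: false; status: 400; body: string };

const MALFORMED = "Bad Request: kmeURL must be a well-formed absolute http/https URL";

// null covers a missing parameter if this validator is ever called directly.
function validateKmeUrl(raw: string | null): Validation {
  if (raw === null || raw.trim() === "") {
    return { ok: false, status: 400, body: "Bad Request: kmeURL parameter is required" };
  }
  let parsed: URL;
  try {
    parsed = new URL(raw); // no base URL: relative or malformed input throws TypeError
  } catch {
    return { ok: false, status: 400, body: MALFORMED };
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return { ok: false, status: 400, body: MALFORMED };
  }
  return { ok: true, url: raw };
}

console.log(validateKmeUrl("ftp://content.kme.example/a").ok); // false
```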

---

### 404 Not Found: Article Not Found

Returned when the upstream KME Content Service returns a 4xx response for the article URL, or when
the upstream response does not contain a non-empty `vkm:articleBody`.

```
HTTP/1.1 404 Not Found
Content-Type: text/plain

Not Found: article not found at upstream
```

```
HTTP/1.1 404 Not Found
Content-Type: text/plain

Not Found: article body not present in upstream response
```

| Trigger | Response body |
|---------|---------------|
| Upstream 4xx HTTP response | `Not Found: article not found at upstream` |
| `vkm:articleBody` absent, null, or empty string | `Not Found: article body not present in upstream response` |
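
The status-based rules (upstream 4xx here, upstream 5xx in the 502 section) fit a small lookup.
`mapUpstreamStatus` is hypothetical; returning `null` for statuses outside 4xx and 5xx, meaning
"continue to body parsing", is an assumption, since the contract does not cover them.

```typescript
interface StatusErrorResponse {
  status: 404 | 502;
  body: string;
}

// Returns the proxy's error response for an upstream status, or null to continue.
function mapUpstreamStatus(status: number): StatusErrorResponse | null {
  if (status >= 400 && status <= 499) {
    return { status: 404, body: "Not Found: article not found at upstream" };
  }
  if (status >= 500 && status <= 599) {
    return { status: 502, body: `Bad Gateway: upstream error HTTP ${status}` };
  }
  return null; // assumption: anything else proceeds to JSON-LD parsing
}

console.log(mapUpstreamStatus(503)?.body); // Bad Gateway: upstream error HTTP 503
```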

---

### 500 Internal Server Error: Proxy Configuration Error

Returned when a required OIDC setting is missing from `kme_CSA_settings`. Indicates a proxy
deployment/configuration issue.

```
HTTP/1.1 500 Internal Server Error
Content-Type: text/plain

Configuration error: missing required field: tokenUrl
```
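
A minimal sketch of the configuration check that produces this body. Only `tokenUrl` is named in
this contract; `clientId` and `clientSecret` are placeholder field names, `checkOidcSettings` is
hypothetical, and treating an empty string as missing is an assumption.

```typescript
// Placeholder field names: only tokenUrl is named by this contract.
type OidcField = "tokenUrl" | "clientId" | "clientSecret";
type OidcSettings = Partial<Record<OidcField, string>>;

// Returns the 500 body for the first missing required field, or null if none are missing.
function checkOidcSettings(settings: OidcSettings, required: readonly OidcField[]): string | null {
  for (const field of required) {
    const value = settings[field];
    // Assumption: an empty string counts as missing.
    if (value === undefined || value === "") {
      return `Configuration error: missing required field: ${field}`;
    }
  }
  return null;
}

console.log(checkOidcSettings({ clientId: "abc" }, ["tokenUrl", "clientId"]));
// Configuration error: missing required field: tokenUrl
```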

---

### 502 Bad Gateway: Upstream or Token Failure

Returned when token acquisition fails, and for any upstream connectivity, protocol, or data error
not covered by the 404 rules.

```
HTTP/1.1 502 Bad Gateway
Content-Type: text/plain

Bad Gateway: token acquisition failed
```

| Trigger | Response body |
|---------|---------------|
| OIDC token acquisition failure | `Bad Gateway: token acquisition failed` |
| Upstream request timeout (`ECONNABORTED`/`ERR_CANCELED`) | `Bad Gateway: upstream request timed out` |
| Upstream 5xx HTTP response | `Bad Gateway: upstream error HTTP {status}` |
| Network-level error (no HTTP response) | `Bad Gateway: {error message}` |
| Upstream response body is not valid JSON | `Bad Gateway: unparseable response from upstream` |
| Upstream response body is not an object | `Bad Gateway: unexpected response from upstream` |
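
The transport and body-parsing rows of the table can be sketched as below. `ClientError` is a
structural stand-in whose field names (`code`, `message`, `response`) mirror axios errors, so no
HTTP client is imported; `transportErrorBody` and `parseUpstreamBody` are hypothetical. Treating a
top-level JSON array as "not an object" is an assumption.

```typescript
// Structural stand-in for an HTTP-client error; field names mirror axios errors.
interface ClientError {
  code?: string;
  message: string;
  response?: { status: number };
}

// Error codes this contract treats as an upstream timeout.
const TIMEOUT_CODES = new Set(["ECONNABORTED", "ERR_CANCELED"]);

// 502 body for a failure with no HTTP response; null when a response exists,
// because the status-based rules (4xx -> 404, 5xx -> 502) apply instead.
function transportErrorBody(err: ClientError): string | null {
  if (err.response !== undefined) {
    return null;
  }
  if (err.code !== undefined && TIMEOUT_CODES.has(err.code)) {
    return "Bad Gateway: upstream request timed out";
  }
  return `Bad Gateway: ${err.message}`;
}

// Parses the upstream body: the object on success, otherwise the 502 body string.
function parseUpstreamBody(raw: string): Record<string, unknown> | string {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return "Bad Gateway: unparseable response from upstream";
  }
  // Assumption: a top-level array also counts as "not an object".
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return "Bad Gateway: unexpected response from upstream";
  }
  return data as Record<string, unknown>;
}

console.log(parseUpstreamBody("42")); // Bad Gateway: unexpected response from upstream
```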

---

## Upstream Request (Proxy → KME Content Service)

The proxy makes a single GET request to the decoded `kmeURL` value, used verbatim.

```
GET {kmeURL} HTTP/1.1
Authorization: OIDC_id_token {id_token}
```

| Field | Value |
|-------|-------|
| Method | `GET` |
| URL | Decoded value of the `kmeURL` query parameter, with no further manipulation or re-encoding |
| `Authorization` | `OIDC_id_token {id_token}`, where `id_token` is obtained from `getValidToken()` |
| Timeout | 10 000 ms (10 seconds) |
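
The request described above can be expressed as a plain config object. `buildUpstreamRequest` and
`UpstreamRequest` are hypothetical; the field names match axios's request config (`method`, `url`,
`headers`, `timeout`), but the sketch does not call any HTTP client.

```typescript
interface UpstreamRequest {
  method: "GET";
  url: string;
  headers: Record<string, string>;
  timeout: number; // milliseconds
}

const UPSTREAM_TIMEOUT_MS = 10_000;

// idToken is assumed to come from getValidToken(); kmeUrl is the validated, decoded value.
function buildUpstreamRequest(kmeUrl: string, idToken: string): UpstreamRequest {
  return {
    method: "GET",
    url: kmeUrl, // used as-is: no normalisation or re-encoding here
    headers: { Authorization: `OIDC_id_token ${idToken}` },
    timeout: UPSTREAM_TIMEOUT_MS,
  };
}

console.log(buildUpstreamRequest("https://content.kme.example/articles/123", "abc").headers.Authorization);
// OIDC_id_token abc
```

Whether a given HTTP client normalises the URL before sending it depends on the client, so the
"no re-encoding" requirement is worth checking against the client the proxy actually uses.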

---

## Error Mapping Summary

```
kmeURL empty / whitespace-only      → 400
kmeURL malformed / non-http(s)      → 400
Missing OIDC config                 → 500
Token acquisition failure           → 502
Upstream 4xx                        → 404
Upstream 5xx                        → 502
Upstream timeout                    → 502
Network error                       → 502
Unparseable response body           → 502
Non-object response body            → 502
vkm:articleBody absent/null/empty   → 404
Success                             → 200 text/html
```

---

## Non-regression: Existing Routes

This feature does not change the behaviour of existing routes:

| Route | Behaviour |
|-------|-----------|
| URL ends in `/sitemap.xml` | Sitemap flow (unchanged) |
| No `kmeURL`, not sitemap | Auth-check passthrough → 200 "Authorized" (unchanged) |