kme_content_adapter/specs/002-sitemap-generation/contracts/sitemap-endpoint.md
Peter.Morton 50b87297d2 feat(002): add sitemap generation feature
- Refactor kmeContentSourceAdapter.js into getValidToken(), oidcAuthFlow(),
  and sitemapFlow(); add sitemap generation using hydra:member response structure
- Add searchApiBaseUrl, tenant, proxyBaseUrl fields to kme_CSA_settings.json
  and kme_CSA_settings.json.example
- Add 17 unit tests for sitemap flow and non-sitemap routing regression
- Add 5 contract tests for sitemap endpoint (proxy-http.test.js)
- Add [Unreleased] sitemap entry to CHANGELOG.md
- Add full specs/002-sitemap-generation/ artifact directory
  (spec, plan, tasks, data-model, contracts, research, quickstart, checklist)
- Update constitution.md: add redis as permitted global, refresh
  kme_CSA_settings references
- Update copilot-instructions.md SPECKIT marker to sitemap plan
2026-04-22 22:08:08 -05:00


# Contract: Sitemap Endpoint
**Feature**: `002-sitemap-generation`
**Endpoint type**: HTTP GET
**Introduced in**: `002-sitemap-generation`
---
## Overview
The `kme-content-adapter` proxy exposes one new HTTP endpoint, `GET /sitemap.xml` (more precisely,
any request URL ending in `/sitemap.xml`). This contract governs the endpoint's complete observable
behaviour from the consumer's perspective.
---
## Endpoint
```
GET <proxy-base-url>/sitemap.xml
```
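The adapter identifies a sitemap request by checking whether `req.url` ends with `/sitemap.xml`.
Because `req.url` includes the query string, a request such as `/sitemap.xml?page=2` does not match
and falls through to the existing OIDC flow. Any path prefix in front of `/sitemap.xml` depends on
how the reverse proxy routes requests to this adapter.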
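A minimal sketch of that dispatch rule, assuming the check is a plain `String.prototype.endsWith` on the raw `req.url` as stated here. `isSitemapRequest`, `selectFlow` and the returned strings are hypothetical; the flow names only echo the `sitemapFlow()`/`oidcAuthFlow()` split mentioned in the commit message and are not the adapter's actual code.
```javascript
// Hypothetical dispatcher mirroring the documented rule: a request is a
// sitemap request when req.url ends with "/sitemap.xml".
function isSitemapRequest(reqUrl) {
  // In Node's http module, req.url is the raw request target, so it
  // includes any query string.
  return typeof reqUrl === 'string' && reqUrl.endsWith('/sitemap.xml');
}

// Return the name of the flow that should handle the request.
function selectFlow(reqUrl) {
  return isSitemapRequest(reqUrl) ? 'sitemapFlow' : 'oidcAuthFlow';
}

console.log(selectFlow('/sitemap.xml'));        // sitemapFlow
console.log(selectFlow('/kme/sitemap.xml'));    // sitemapFlow
console.log(selectFlow('/sitemap.xml?page=2')); // oidcAuthFlow
console.log(selectFlow('/articles/123'));       // oidcAuthFlow
```
Matching on `new URL(req.url, 'http://localhost').pathname` instead would ignore query strings, should that ever be wanted; the rule as written here does not do that.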
---
## Request
### Method
`GET`
### Headers
No special request headers required. The adapter uses its own internally cached OIDC token
to authenticate the upstream call to the KME Knowledge Search Service.
### Body
None.
---
## Responses
### 200 OK — Sitemap generated successfully
**Condition**: The KME Knowledge Search Service returned a 2xx response and the sitemap was
built without errors.
**Headers**:
```
Content-Type: application/xml
```
**Body**: A well-formed XML Sitemap document conforming to
[https://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd](https://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd).
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>{proxyBaseUrl}?kmeURL={encodeURIComponent(item['vkm:url'])}</loc>
  </url>
  <!-- one <url> element per knowledge item with a non-empty vkm:url -->
</urlset>
```
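A minimal serializer for this body, assuming the `<loc>` values are already built. `renderUrlset` and `escapeXml` are hypothetical names. The sitemap protocol requires `&`, `'`, `"`, `<` and `>` in data values to be entity-escaped, and `encodeURIComponent` leaves `'` untouched (and never touches `proxyBaseUrl`), so the sketch escapes every `<loc>` value defensively.
```javascript
// Escape the five characters the sitemap protocol requires to be entity-escaped.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;') // first, so entities added below are not escaped again
    .replace(/'/g, '&apos;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

const XML_DECL = '<?xml version="1.0" encoding="UTF-8"?>';
const SITEMAP_NS = 'http://www.sitemaps.org/schemas/sitemap/0.9';

// Render one <url><loc> per entry; zero entries yields the self-closing
// empty-result variant described in this contract.
function renderUrlset(locs) {
  if (locs.length === 0) {
    return `${XML_DECL}\n<urlset xmlns="${SITEMAP_NS}"/>\n`;
  }
  const entries = locs
    .map((loc) => `  <url>\n    <loc>${escapeXml(loc)}</loc>\n  </url>`)
    .join('\n');
  return `${XML_DECL}\n<urlset xmlns="${SITEMAP_NS}">\n${entries}\n</urlset>\n`;
}

console.log(renderUrlset([
  'https://adapter.example.com?kmeURL=https%3A%2F%2Fkme.example.com%2Fknowledge%2Farticle-123',
]));
```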
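**Empty-result variant** (search service returns zero items). This document is well-formed XML, but the 0.9 XSD declares `<url>` with the default `minOccurs` of 1, so a strict schema validator may reject a `<urlset>` that has no children: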
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"/>
```
### 500 Internal Server Error — Missing configuration
**Condition**: One or more required settings fields (`searchApiBaseUrl`, `tenant`,
`proxyBaseUrl`) are absent from `kme_CSA_settings`.
**Headers**:
```
Content-Type: text/plain
```
**Body**:
```
Configuration error: missing required field: <fieldName>
```
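A sketch of the configuration check behind this response. `checkSitemapSettings` and the returned `{ statusCode, headers, body }` shape are hypothetical. The contract names the required fields but not the order in which they are checked, nor whether empty strings count as absent, so the sketch checks in the listed order and treats only `undefined`/`null` as missing.
```javascript
// Fields the sitemap flow needs, per this contract. Check order is an assumption.
const REQUIRED_FIELDS = ['searchApiBaseUrl', 'tenant', 'proxyBaseUrl'];

// Returns null when all fields are present, otherwise a 500 response naming the
// first missing field. "Missing" here means undefined or null.
function checkSitemapSettings(settings) {
  const missing = REQUIRED_FIELDS.find((field) => settings?.[field] == null);
  if (missing === undefined) return null;
  return {
    statusCode: 500,
    headers: { 'Content-Type': 'text/plain' },
    body: `Configuration error: missing required field: ${missing}`,
  };
}

// Placeholder values for illustration only.
console.log(checkSitemapSettings({
  searchApiBaseUrl: 'https://search.example.com',
  tenant: 'example-tenant',
  proxyBaseUrl: 'https://adapter.example.com',
})); // null
console.log(checkSitemapSettings({ tenant: 'example-tenant' }).body);
// Configuration error: missing required field: searchApiBaseUrl
```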
### 502 Bad Gateway — Upstream search service error
**Condition**: The KME Knowledge Search Service returned a non-2xx HTTP response.
**Headers**:
```
Content-Type: text/plain
```
**Body**:
```
Search service error: HTTP <status>
```
### 504 Gateway Timeout — Upstream search service timeout
**Condition**: The request to the KME Knowledge Search Service did not complete within the 10 000 ms timeout.
**Headers**:
```
Content-Type: text/plain
```
**Body**:
```
Search service timeout
```
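A sketch of the upstream call and the 502/504 mapping, assuming Node 18+ (global `fetch` and `AbortSignal.timeout`). The URL is passed in because this contract does not say how it is derived from `searchApiBaseUrl` and `tenant`; `callSearchService`, `toErrorResponse` and the `kind` values are hypothetical. Network failures other than a timeout are rethrown, since the contract defines no response for them.
```javascript
// Hypothetical upstream call with the 10 000 ms limit from this contract.
async function callSearchService(url, token, timeoutMs = 10000) {
  try {
    const res = await fetch(url, {
      headers: { Authorization: `OIDC_id_token ${token}` },
      signal: AbortSignal.timeout(timeoutMs),
    });
    if (!res.ok) return { kind: 'http-error', status: res.status }; // non-2xx
    return { kind: 'ok', data: await res.json() };
  } catch (err) {
    // Depending on the Node version, a timed-out fetch rejects with a
    // TimeoutError (the signal's reason) or a generic AbortError. With only
    // the timeout signal attached, both mean the limit was hit.
    if (err && (err.name === 'TimeoutError' || err.name === 'AbortError')) {
      return { kind: 'timeout' };
    }
    throw err; // other failures are outside this contract
  }
}

// Map a failed upstream outcome to the 502/504 responses defined above.
function toErrorResponse(outcome) {
  const headers = { 'Content-Type': 'text/plain' };
  if (outcome.kind === 'http-error') {
    return { statusCode: 502, headers, body: `Search service error: HTTP ${outcome.status}` };
  }
  if (outcome.kind === 'timeout') {
    return { statusCode: 504, headers, body: 'Search service timeout' };
  }
  return null; // 'ok' outcomes are rendered as a sitemap instead
}
```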
---
## `<loc>` URL Format
Each `<loc>` element is constructed as:
```
{proxyBaseUrl}?kmeURL={encodeURIComponent(item['vkm:url'])}
```
Where:
- `proxyBaseUrl` is taken from `kme_CSA_settings.proxyBaseUrl` (e.g., `https://adapter.example.com`)
- `item['vkm:url']` is the raw `vkm:url` value from the search service result
- `encodeURIComponent` percent-encodes the value so it is safe as a query parameter
**Example**:
```
https://adapter.example.com?kmeURL=https%3A%2F%2Fkme.example.com%2Fknowledge%2Farticle-123
```
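A sketch of turning a search response into `<loc>` values, assuming the JSON body lists its results in a `hydra:member` array (the structure named in the commit message) whose items carry `vkm:url`. `buildLoc` and `collectLocs` are hypothetical names; the sample reproduces the example URL above.
```javascript
// Build one <loc> value exactly as the format above describes.
function buildLoc(proxyBaseUrl, vkmUrl) {
  return `${proxyBaseUrl}?kmeURL=${encodeURIComponent(vkmUrl)}`;
}

// Assumed response shape: { "hydra:member": [ { "vkm:url": "..." }, ... ] }.
// Items whose vkm:url is missing or empty are skipped.
function collectLocs(searchResponse, proxyBaseUrl) {
  const members = Array.isArray(searchResponse?.['hydra:member'])
    ? searchResponse['hydra:member']
    : [];
  return members
    .map((item) => item?.['vkm:url'])
    .filter((url) => typeof url === 'string' && url.length > 0)
    .map((url) => buildLoc(proxyBaseUrl, url));
}

const sample = {
  'hydra:member': [
    { 'vkm:url': 'https://kme.example.com/knowledge/article-123' },
    { 'vkm:url': '' }, // skipped: empty
    {},                // skipped: no vkm:url
  ],
};
console.log(collectLocs(sample, 'https://adapter.example.com'));
// one entry: https://adapter.example.com?kmeURL=https%3A%2F%2Fkme.example.com%2Fknowledge%2Farticle-123
```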
---
## Authentication to Upstream (internal, not exposed to consumer)
The adapter authenticates to the KME Knowledge Search Service using:
```
Authorization: OIDC_id_token <token>
```
Where `<token>` is the `id_token` issued by the OIDC token service and cached in Redis at
`authorization.token`. When the token needs refreshing, the sitemap flow reuses the
stampede-guarded fetch from the existing OIDC auth flow.
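One common way to guard against a refresh stampede is single-flight: callers that miss the cache share one in-flight refresh instead of each calling the token service. The sketch below shows that technique under stated assumptions; `createTokenProvider`, the `cache` interface (async `get`/`set`, Redis in the real adapter) and `fetchToken` are hypothetical stand-ins, the adapter's actual guard is not shown in this contract, and an in-process promise only deduplicates within one Node process. Token expiry handling is omitted.
```javascript
// Hypothetical single-flight token provider. `cache` has async get(key) and
// set(key, value); `fetchToken` asks the OIDC token service for a fresh id_token.
function createTokenProvider({ cache, fetchToken, key = 'authorization.token' }) {
  let inFlight = null; // shared promise for the refresh currently running, if any

  async function refresh() {
    const token = await fetchToken();
    await cache.set(key, token);
    return token;
  }

  return async function getToken() {
    const cached = await cache.get(key);
    if (cached) return cached;
    if (!inFlight) {
      // .finally() runs asynchronously, so inFlight is cleared only after this
      // assignment and after the refresh settles (success or failure).
      inFlight = refresh().finally(() => {
        inFlight = null;
      });
    }
    return inFlight; // concurrent callers in this process await the same refresh
  };
}

// Header format used for the upstream search call, per this contract.
function authorizationHeader(token) {
  return `OIDC_id_token ${token}`;
}
```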
---
## Existing Endpoint Behaviour (unchanged)
All requests whose URL does **not** end in `/sitemap.xml` continue to use the existing OIDC
authentication flow with no change in response behaviour:
| Condition | Response |
|---|---|
| Valid cached OIDC token | `200 Authorized` (`text/plain`) |
| No cached token — fetch succeeds | `200 Authorized` (`text/plain`) |
| Token service unreachable | `401 Unauthorized: <error>` (`text/plain`) |
---
## Non-Functional Constraints
| Constraint | Value | Source |
|---|---|---|
| Search API timeout | 10 000 ms | Spec assumption |
| Max response time (normal conditions) | < 5 000 ms | SC-001 |
| Max response time (error scenarios) | < 10 000 ms | SC-005 |
| Pagination | Not supported (v1) | Spec assumption |
| Multi-tenant | Not supported (v1) | Spec assumption |
---
## Sitemap Protocol Compliance
The returned XML must validate against the Sitemaps XSD:
`https://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd`
Required elements per entry (v1 scope):
- `<loc>` — mandatory
Optional elements **not included** in v1:
- `<lastmod>` — out of scope
- `<changefreq>` — out of scope
- `<priority>` — out of scope
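Beyond element choice, sitemaps.org publishes size limits: at most 50,000 URLs and 50 MB (52,428,800 bytes, uncompressed) per file, and each `<loc>` must begin with the protocol and be under 2,048 characters. With pagination out of scope for v1, a pre-flight check such as the hypothetical `checkSitemapLimits` below could flag a corpus that outgrows one file. It is a sketch, not part of this contract, it only accepts `http`/`https` URLs, and it does not replace XSD validation.
```javascript
// Limits published by the sitemap protocol (sitemaps.org).
const MAX_URLS = 50000;
const MAX_BYTES = 52428800; // 50 MB, uncompressed
const MAX_LOC_LENGTH = 2048; // <loc> must be shorter than this

// Returns a list of human-readable problems; an empty list means within limits.
function checkSitemapLimits(locs, xml) {
  const problems = [];
  if (locs.length > MAX_URLS) {
    problems.push(`too many URLs: ${locs.length} > ${MAX_URLS}`);
  }
  for (const loc of locs) {
    if (loc.length >= MAX_LOC_LENGTH) {
      problems.push(`loc too long (${loc.length} chars): ${loc.slice(0, 40)}...`);
    }
    if (!/^https?:\/\//.test(loc)) {
      problems.push(`loc is not an absolute http(s) URL: ${loc}`);
    }
  }
  const bytes = Buffer.byteLength(xml, 'utf8');
  if (bytes > MAX_BYTES) {
    problems.push(`file too large: ${bytes} bytes > ${MAX_BYTES}`);
  }
  return problems;
}
```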