feat(002): add sitemap generation feature

- Refactor kmeContentSourceAdapter.js into getValidToken(), oidcAuthFlow(), and sitemapFlow(); add sitemap generation using hydra:member response structure
- Add searchApiBaseUrl, tenant, proxyBaseUrl fields to kme_CSA_settings.json and kme_CSA_settings.json.example
- Add 17 unit tests for sitemap flow and non-sitemap routing regression
- Add 5 contract tests for sitemap endpoint (proxy-http.test.js)
- Add [Unreleased] sitemap entry to CHANGELOG.md
- Add full specs/002-sitemap-generation/ artifact directory (spec, plan, tasks, data-model, contracts, research, quickstart, checklist)
- Update constitution.md: add redis as permitted global, refresh kme_CSA_settings references
- Update copilot-instructions.md SPECKIT marker to sitemap plan
189
specs/002-sitemap-generation/contracts/sitemap-endpoint.md
Normal file
@@ -0,0 +1,189 @@
# Contract: Sitemap Endpoint

**Feature**: `002-sitemap-generation`
**Endpoint type**: HTTP GET
**Introduced in**: `002-sitemap-generation`

---

## Overview

The `kme-content-adapter` proxy exposes a single new HTTP endpoint: `GET /sitemap.xml` (or
any URL whose path ends with `/sitemap.xml`). This contract governs the complete observable
behaviour of that endpoint from the consumer's perspective.

---

## Endpoint

```
GET <proxy-base-url>/sitemap.xml
```

The adapter detects sitemap requests by checking whether `req.url` ends with `/sitemap.xml`.
The full path prefix (if any) is determined by how the reverse proxy routes requests to this
adapter.
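
The detection rule above could be sketched as follows. This is a minimal illustration, not the adapter's actual code: the helper name `isSitemapRequest` is hypothetical, and stripping the query string and fragment before the suffix check is an assumption that goes beyond what the contract states.

```javascript
// Hypothetical helper: decides whether a request targets the sitemap endpoint.
// Assumption: any query string or fragment is ignored before the suffix check,
// so "/sitemap.xml?x=1" is still treated as a sitemap request.
function isSitemapRequest(reqUrl) {
  if (typeof reqUrl !== 'string') return false;
  // Keep only the part before the first "?" or "#".
  const path = reqUrl.split(/[?#]/, 1)[0];
  return path.endsWith('/sitemap.xml');
}
```

With this sketch, `/sitemap.xml` and `/kme/proxy/sitemap.xml` both match, while `/sitemap.xml.bak` and `/articles/123` fall through to the existing flow.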

---

## Request

### Method
`GET`

### Headers
No special request headers required. The adapter uses its own internally cached OIDC token
to authenticate the upstream call to the KME Knowledge Search Service.

### Body
None.

---

## Responses

### 200 OK — Sitemap generated successfully

**Condition**: The KME Knowledge Search Service returned a 2xx response and the sitemap was
built without errors.

**Headers**:
```
Content-Type: application/xml
```

**Body**: A well-formed XML Sitemap document conforming to
[https://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd](https://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>{proxyBaseUrl}?kmeURL={encodeURIComponent(vkmUrl)}</loc>
  </url>
  <!-- one <url> element per knowledge item with a non-empty vkm:url -->
</urlset>
```

**Empty-result variant** (search service returns zero items):
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"/>
```
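
As a rough sketch of how both response bodies above might be produced from a `hydra:member` search result, consider the following. The function names `escapeXml` and `buildSitemapXml`, the shape of `searchResponse`, and the decision to XML-escape the `<loc>` text are all assumptions for illustration; the contract only fixes the resulting XML.

```javascript
// Escape the five XML special characters so the <loc> text stays well-formed.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical builder. `searchResponse` is assumed to be the parsed JSON body
// of the search call, with result items under the "hydra:member" key.
function buildSitemapXml(searchResponse, proxyBaseUrl) {
  const members = Array.isArray(searchResponse?.['hydra:member'])
    ? searchResponse['hydra:member']
    : [];

  // Keep only items with a non-empty vkm:url, then build each <loc> value.
  const locs = members
    .map((item) => item?.['vkm:url'])
    .filter((url) => typeof url === 'string' && url.trim() !== '')
    .map((url) => `${proxyBaseUrl}?kmeURL=${encodeURIComponent(url)}`);

  const header = '<?xml version="1.0" encoding="UTF-8"?>\n';
  const ns = 'http://www.sitemaps.org/schemas/sitemap/0.9';

  // Empty-result variant: a self-closing <urlset/>.
  if (locs.length === 0) {
    return `${header}<urlset xmlns="${ns}"/>\n`;
  }

  const entries = locs
    .map((loc) => `  <url>\n    <loc>${escapeXml(loc)}</loc>\n  </url>\n`)
    .join('');
  return `${header}<urlset xmlns="${ns}">\n${entries}</urlset>\n`;
}
```

Escaping matters mainly for `proxyBaseUrl`: the output of `encodeURIComponent` never contains `&`, `<`, or `>`, but a base URL with its own query parameters could.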

### 500 Internal Server Error — Missing configuration

**Condition**: One or more required settings fields (`searchApiBaseUrl`, `tenant`,
`proxyBaseUrl`) are absent from `kme_CSA_settings`.

**Headers**:
```
Content-Type: text/plain
```

**Body**:
```
Configuration error: missing required field: <fieldName>
```
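
A settings check that yields this response might look like the sketch below. The names `REQUIRED_FIELDS` and `checkSitemapSettings`, the plain `{ status, headers, body }` return shape, and the choice to treat `null` or an empty string as absent are assumptions; the contract only names the three fields and the message format. The sketch reports the first missing field only, matching the singular `<fieldName>` placeholder.

```javascript
// Required settings fields, in the order the contract lists them.
const REQUIRED_FIELDS = ['searchApiBaseUrl', 'tenant', 'proxyBaseUrl'];

// Hypothetical validator: returns null when all fields are present, otherwise
// a plain description of the 500 response for the first missing field.
// Assumption: null and '' count as absent, not just undefined.
function checkSitemapSettings(settings) {
  const missing = REQUIRED_FIELDS.find((field) => {
    const value = settings?.[field];
    return value === undefined || value === null || value === '';
  });
  if (missing === undefined) return null;
  return {
    status: 500,
    headers: { 'Content-Type': 'text/plain' },
    body: `Configuration error: missing required field: ${missing}`,
  };
}
```

Running the check before any upstream call keeps a misconfigured adapter from spending a token fetch or a search request on a response that cannot be built.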

### 502 Bad Gateway — Upstream search service error

**Condition**: The KME Knowledge Search Service returned a non-2xx HTTP response.

**Headers**:
```
Content-Type: text/plain
```

**Body**:
```
Search service error: HTTP <status>
```

### 504 Gateway Timeout — Upstream search service timeout

**Condition**: The KME Knowledge Search Service connection timed out (>10 000 ms).

**Headers**:
```
Content-Type: text/plain
```

**Body**:
```
Search service timeout
```
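
The two upstream failure cases can be summarised as a small mapping function. This is only a sketch: `mapUpstreamOutcome` and the `{ timedOut, status }` outcome shape are hypothetical, and how the adapter actually detects a timeout is not specified here.

```javascript
// Hypothetical mapping from an upstream call outcome to the error responses
// defined in this contract. `outcome` is assumed to be either
// { timedOut: true } or { timedOut: false, status: <HTTP status number> }.
function mapUpstreamOutcome(outcome) {
  if (outcome.timedOut) {
    return {
      status: 504,
      headers: { 'Content-Type': 'text/plain' },
      body: 'Search service timeout',
    };
  }
  if (outcome.status < 200 || outcome.status >= 300) {
    return {
      status: 502,
      headers: { 'Content-Type': 'text/plain' },
      body: `Search service error: HTTP ${outcome.status}`,
    };
  }
  // 2xx: no error response; the caller goes on to build the sitemap.
  return null;
}
```

The timeout case is checked first because a timed-out call has no upstream status to report.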

---

## `<loc>` URL Format

Each `<loc>` element is constructed as:

```
{proxyBaseUrl}?kmeURL={encodeURIComponent(item['vkm:url'])}
```

Where:
- `proxyBaseUrl` is taken from `kme_CSA_settings.proxyBaseUrl` (e.g., `https://adapter.example.com`)
- `item['vkm:url']` is the raw `vkm:url` value from the search service result
- `encodeURIComponent` percent-encodes the value so it is safe as a query parameter

**Example**:
```
https://adapter.example.com?kmeURL=https%3A%2F%2Fkme.example.com%2Fknowledge%2Farticle-123
```
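
To illustrate why the encoding step matters, the sketch below builds a `<loc>` value and then recovers the original `vkm:url` from it the way a consumer might. The helper names `buildLocUrl` and `extractKmeUrl` are hypothetical; `encodeURIComponent` and the WHATWG `URL` class are standard.

```javascript
// Hypothetical helper: builds the <loc> value from the documented format.
function buildLocUrl(proxyBaseUrl, vkmUrl) {
  return `${proxyBaseUrl}?kmeURL=${encodeURIComponent(vkmUrl)}`;
}

// Hypothetical consumer-side helper: reads the original vkm:url back out.
// URLSearchParams percent-decodes the value, reversing encodeURIComponent.
function extractKmeUrl(loc) {
  return new URL(loc).searchParams.get('kmeURL');
}

// A vkm:url that carries its own query string survives the round trip,
// because its "?" and "&" are encoded as %3F and %26 inside kmeURL.
const nested = 'https://kme.example.com/knowledge/article?id=1&lang=en';
const loc = buildLocUrl('https://adapter.example.com', nested);
const recovered = extractKmeUrl(loc);
```

Without the encoding, the `&lang=en` part of the nested URL would be parsed as a second query parameter of the proxy URL and lost from `kmeURL`.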

---

## Authentication to Upstream (internal, not exposed to consumer)

The adapter authenticates to the KME Knowledge Search Service using:

```
Authorization: OIDC_id_token <token>
```

Where `<token>` is the `id_token` from the OIDC token service, cached in Redis at
`authorization.token`. Token refresh uses the same stampede-guarded fetch already present
in the existing OIDC auth flow.
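
For readers unfamiliar with the term, one common way to guard a token fetch against a stampede is to let concurrent callers share a single in-flight promise. The sketch below shows that idea only; it is not the adapter's existing guard, which is not reproduced in this contract and may work differently (for example, via Redis). `createTokenGetter`, `buildAuthHeader`, and the `fetchToken` callback are hypothetical names, and the Redis cache is deliberately left out.

```javascript
// Hypothetical single-flight wrapper: while one fetch is pending, every
// caller receives the same promise instead of starting a new fetch.
function createTokenGetter(fetchToken) {
  let inFlight = null;
  return function getToken() {
    if (inFlight === null) {
      inFlight = Promise.resolve(fetchToken()).finally(() => {
        // Clear the slot so a later call can trigger a fresh fetch.
        inFlight = null;
      });
    }
    return inFlight;
  };
}

// Builds the upstream header in the format given by this contract.
function buildAuthHeader(idToken) {
  return { Authorization: `OIDC_id_token ${idToken}` };
}
```

A caller would then do something like `getToken().then((token) => buildAuthHeader(token))` before issuing the search request.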

---

## Existing Endpoint Behaviour (unchanged)

All requests whose URL does **not** end in `/sitemap.xml` continue to use the existing OIDC
authentication flow with no change in response behaviour:

| Condition | Response |
|---|---|
| Valid cached OIDC token | `200 Authorized` (`text/plain`) |
| No cached token — fetch succeeds | `200 Authorized` (`text/plain`) |
| Token service unreachable | `401 Unauthorized: <error>` (`text/plain`) |

---

## Non-Functional Constraints

| Constraint | Value | Source |
|---|---|---|
| Search API timeout | 10 000 ms | Spec assumption |
| Max response time (normal conditions) | < 5 000 ms | SC-001 |
| Max response time (error scenarios) | < 10 000 ms | SC-005 |
| Pagination | Not supported (v1) | Spec assumption |
| Multi-tenant | Not supported (v1) | Spec assumption |

---

## Sitemap Protocol Compliance

The returned XML must validate against the Sitemaps XSD:
`https://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd`

Required elements per entry (v1 scope):
- `<loc>` — mandatory

Optional elements **not included** in v1:
- `<lastmod>` — out of scope
- `<changefreq>` — out of scope
- `<priority>` — out of scope