kme_content_adapter/specs/002-sitemap-generation/data-model.md
Peter.Morton 50b87297d2 feat(002): add sitemap generation feature
- Refactor kmeContentSourceAdapter.js into getValidToken(), oidcAuthFlow(),
  and sitemapFlow(); add sitemap generation using hydra:member response structure
- Add searchApiBaseUrl, tenant, proxyBaseUrl fields to kme_CSA_settings.json
  and kme_CSA_settings.json.example
- Add 17 unit tests for sitemap flow and non-sitemap routing regression
- Add 5 contract tests for sitemap endpoint (proxy-http.test.js)
- Add [Unreleased] sitemap entry to CHANGELOG.md
- Add full specs/002-sitemap-generation/ artifact directory
  (spec, plan, tasks, data-model, contracts, research, quickstart, checklist)
- Update constitution.md: add redis as permitted global, refresh
  kme_CSA_settings references
- Update copilot-instructions.md SPECKIT marker to sitemap plan
2026-04-22 22:08:08 -05:00


# Data Model: Sitemap XML Generation

**Feature**: `002-sitemap-generation`

**Branch**: `002-sitemap-generation`

**Date**: 2025-07-14

---
## Entities
### 1. `KnowledgeItem` (external, read-only)
Represents a single document returned by the KME Knowledge Search Service. The adapter reads
this shape from the upstream API response and never persists or mutates it.

| Field | Type | Source | Notes |
|---|---|---|---|
| `vkm:url` | `string \| undefined` | Search API response `items[]` | Canonical document URL. **Required** for sitemap inclusion. Items where this field is absent or empty are silently omitted (FR-006). |
| `title` | `string \| undefined` | Search API response | Present in the payload but not used by the sitemap. |
| *(other fields)* | `any` | Search API response | Ignored; the adapter reads only `vkm:url`. |

**Assumed response envelope** (to be verified against the live API; see research.md R-002). If the root is a bare array rather than the object below, `response.data` itself is treated as the items array.

```json
{
  "items": [
    { "vkm:url": "https://kme.example.com/knowledge/doc-1", "title": "Doc One" },
    { "vkm:url": "https://kme.example.com/knowledge/doc-2", "title": "Doc Two" }
  ]
}
```
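A minimal sketch of the envelope handling, assuming the parsed body is already in hand (as `response.data` would be). `extractItems` is a hypothetical name, and the `items` key is the unverified assumption tracked in R-002; since the commit message for this file mentions a `hydra:member` response structure, the key is left as a parameter. Treating any other shape as "no items" (and therefore an empty sitemap) is also an assumption; the spec does not say how a malformed body should be handled.

```javascript
// Hypothetical helper: pull the items array out of the search response body.
// The default envelope key "items" is an unverified assumption (R-002).
function extractItems(data, key = "items") {
  // Bare-array root: the body itself is the items array.
  if (Array.isArray(data)) {
    return data;
  }
  // Object root: read the list stored under the envelope key, if it is an array.
  if (data !== null && typeof data === "object" && Array.isArray(data[key])) {
    return data[key];
  }
  // Anything else (null, missing key, wrong type) yields no items.
  return [];
}
```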
---
### 2. `SitemapEntry` (derived, in-memory)
Represents a single `<url>` element, with its `<loc>` child, in the generated sitemap XML. It is
derived from a `KnowledgeItem` during the transformation step.

| Field | Type | Derivation |
|---|---|---|
| `loc` | `string` | `${kme_CSA_settings.proxyBaseUrl}?kmeURL=${encodeURIComponent(item['vkm:url'])}` |

**Validation rules**:
- Only produced if `item['vkm:url']` is a non-empty string.
- The resulting `loc` must be an absolute URL whose `kmeURL` query value is percent-encoded.
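The derivation and validation rules above can be sketched as a single mapping step. `toSitemapEntries` is a hypothetical name; it implements exactly the table's template literal and skips items whose `vkm:url` is not a non-empty string (no trimming is attempted, since the rules do not mention whitespace). With `proxyBaseUrl` set to `https://adapter.example.com`, the input `https://kme.example.com/doc-1` yields the first `<loc>` of the populated sitemap example.

```javascript
// Hypothetical helper: map KnowledgeItems to SitemapEntry objects ({ loc }).
// Items whose `vkm:url` is missing, non-string, or empty are skipped (FR-006).
function toSitemapEntries(items, proxyBaseUrl) {
  const entries = [];
  for (const item of items) {
    const url = item ? item["vkm:url"] : undefined;
    if (typeof url !== "string" || url.length === 0) {
      continue; // silently omitted, as the data model specifies
    }
    // Percent-encode the upstream URL so it survives as one query-parameter value.
    entries.push({ loc: `${proxyBaseUrl}?kmeURL=${encodeURIComponent(url)}` });
  }
  return entries;
}
```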
---
### 3. `SitemapDocument` (output)
The XML document returned in the HTTP response body.

| Attribute | Value |
|---|---|
| XML version | `1.0` |
| Encoding | `UTF-8` |
| Root element | `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">` |
| Child elements | Zero or more `<url><loc>…</loc></url>` entries |

**Populated sitemap**:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://adapter.example.com?kmeURL=https%3A%2F%2Fkme.example.com%2Fdoc-1</loc>
  </url>
  <url>
    <loc>https://adapter.example.com?kmeURL=https%3A%2F%2Fkme.example.com%2Fdoc-2</loc>
  </url>
</urlset>
```
**Empty sitemap** (zero results from search API):
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"/>
```
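A dependency-free sketch of the rendering step, standing in for the `xmlBuilder` named in the state diagram (its API is not specified in this document, so it is not used here). `buildSitemapXml` and `escapeXml` are hypothetical names. The sitemap protocol requires entity-escaping of `&`, `<`, `>`, `"` and `'` in `<loc>` values; because `encodeURIComponent` already encodes those in the `kmeURL` value, escaping mainly guards against reserved characters in `proxyBaseUrl` itself. The output layout follows the two examples above, with two-space indentation.

```javascript
// Escape the five characters XML reserves (& must be replaced first).
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

const XML_DECL = '<?xml version="1.0" encoding="UTF-8"?>';
const SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9";

// Hypothetical builder: render SitemapEntry[] ({ loc }) as a SitemapDocument string.
function buildSitemapXml(entries) {
  if (entries.length === 0) {
    // Zero results: self-closing root element, as in the empty-sitemap example.
    return `${XML_DECL}\n<urlset xmlns="${SITEMAP_NS}"/>`;
  }
  const urls = entries
    .map((e) => `  <url>\n    <loc>${escapeXml(e.loc)}</loc>\n  </url>`)
    .join("\n");
  return `${XML_DECL}\n<urlset xmlns="${SITEMAP_NS}">\n${urls}\n</urlset>`;
}
```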
---
### 4. `OIDCTokenCache` (shared, Redis)
The existing Redis-backed OIDC token store. The sitemap flow **reads** and **writes** this store
using the same hGet/hSet pattern as the existing OIDC auth flow.

| Redis Key | Field | Type | Description |
|---|---|---|---|
| `authorization` | `token` | `string` | The OIDC `id_token` JWT |
| `authorization` | `expiry` | `string (float)` | Unix timestamp (seconds) at which the token expires |

**Access pattern in sitemap flow**:
1. `hGet('authorization', 'token')` — read cached token
2. `hGet('authorization', 'expiry')` — read cached expiry
3. If expired or absent: invoke token-refresh sequence → `hSet` both fields
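A sketch of this read-then-refresh pattern, assuming a client with async `hGet(key, field)` and `hSet(key, field, value)` methods (the shape node-redis v4 exposes) and a hypothetical `fetchToken` callback that performs the OIDC request and resolves to `{ token, expiry }`, with `expiry` in Unix seconds. `getValidToken` borrows its name from the commit message, but the signature here is hypothetical. The `_pendingFetch` handling is an interpretation of the "stampede guard" listed in the settings schema: concurrent requests that find an expired token share one in-flight refresh rather than each calling the token endpoint. The 30-second early-refresh margin is also an assumption. A rejected refresh propagates to the caller, which the lifecycle maps to 401.

```javascript
// Assumption: refresh this many seconds before the stored expiry.
const SKEW_SECONDS = 30;

// Runs the (hypothetical) OIDC fetch and stores both hash fields.
async function refreshToken(redis, settings, fetchToken) {
  const fresh = await fetchToken(settings);
  await redis.hSet("authorization", "token", fresh.token);
  await redis.hSet("authorization", "expiry", String(fresh.expiry));
  return fresh.token;
}

// Returns the cached token if unexpired, otherwise a refreshed one.
async function getValidToken(redis, settings, fetchToken, nowSeconds = Date.now() / 1000) {
  const token = await redis.hGet("authorization", "token");
  const expiry = await redis.hGet("authorization", "expiry");
  if (token && expiry && parseFloat(expiry) - SKEW_SECONDS > nowSeconds) {
    return token;
  }
  // Stampede guard (assumed meaning of _pendingFetch): the first caller starts
  // the refresh; later callers await the same promise until it settles.
  if (!settings._pendingFetch) {
    settings._pendingFetch = refreshToken(redis, settings, fetchToken).finally(() => {
      settings._pendingFetch = null;
    });
  }
  return settings._pendingFetch;
}
```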
---
### 5. `kme_CSA_settings` (configuration, JSON)
The settings object injected into the VM context from `src/globalVariables/kme_CSA_settings.json`.
This feature extends it with three new fields.

**Full schema after this feature**:

| Field | Type | Existing/New | Required By |
|---|---|---|---|
| `tokenUrl` | `string` | Existing | OIDC token fetch (all flows) |
| `username` | `string` | Existing | OIDC token fetch |
| `password` | `string` | Existing | OIDC token fetch |
| `clientId` | `string` | Existing | OIDC token fetch |
| `scope` | `string` | Existing | OIDC token fetch |
| `searchApiBaseUrl` | `string` | **New** | FR-002, FR-010 |
| `tenant` | `string` | **New** | FR-002, FR-010 |
| `proxyBaseUrl` | `string` | **New** | FR-005, FR-010 |
| `_pendingFetch` | `Promise \| null` | Runtime only (not in JSON) | Stampede guard |

**Validation**:
- Existing fields validated at top of script for all requests (unchanged).
- New fields validated at start of sitemap branch only (FR-011).
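A minimal sketch of the sitemap-only check in FR-011, assuming a field counts as missing when it is absent, not a string, or blank (the spec says only "missing field"). `missingSitemapSettings` is a hypothetical name. A non-empty result is what would route the request to the 500 Internal Server Error branch of the lifecycle below; collecting every missing name, rather than stopping at the first, gives a more useful log line.

```javascript
// Fields the sitemap branch needs (FR-011); checked only on that branch.
const SITEMAP_FIELDS = ["searchApiBaseUrl", "tenant", "proxyBaseUrl"];

// Hypothetical helper: names of sitemap fields that are absent or blank.
function missingSitemapSettings(settings) {
  return SITEMAP_FIELDS.filter((name) => {
    const value = settings ? settings[name] : undefined;
    return typeof value !== "string" || value.trim() === "";
  });
}
```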
---
## State Transitions
### Sitemap Request Lifecycle
```
Incoming GET /…/sitemap.xml
        |
        v
Validate settings ---------> 500 Internal Server Error (missing field)
(searchApiBaseUrl,
 tenant, proxyBaseUrl)
        |
        v
Read token from Redis
        |
     [valid?]
   YES  |   NO
        +--------+
        |        v
        |   Refresh token ---> 401 Unauthorized (token fetch failed)
        |        |
        +--------+
        |
        v
GET <searchApiBaseUrl>/<tenant>
  Authorization: OIDC_id_token <token>
  timeout: 10 000 ms
        |
    [success?]
   YES  |   NO
        +--------+--> timeout ----------> 504 Gateway Timeout
        |        +--> non-2xx response -> 502 Bad Gateway
        v
Map items --> SitemapEntry[]
(skip empty vkm:url)
        |
        v
Build SitemapDocument (xmlBuilder)
        |
        v
200 OK
Content-Type: application/xml
Body: <?xml ...><urlset>...</urlset>
```
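The lifecycle reduces to a mapping from failure points to status codes. A minimal sketch, assuming the token lookup, HTTP client and XML rendering are injected as functions (`getToken`, `httpGet` and `render` are hypothetical, as is the `UpstreamTimeout` class). The 10 000 ms limit is enforced with `Promise.race` rather than any particular HTTP client's timeout option. The diagram does not cover a request that fails without any response (for example, a refused connection); the sketch maps that to 502, which is an assumption.

```javascript
// Hypothetical error type so timeouts can be told apart from other failures.
class UpstreamTimeout extends Error {}

// Reject with UpstreamTimeout if `promise` has not settled within `ms`.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new UpstreamTimeout(`timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// deps (all assumed): getToken() -> Promise<string>, rejects on token failure;
// httpGet(url, headers) -> Promise<{ status, data }>; render(data) -> XML string.
async function runSitemapRequest(settings, deps, timeoutMs = 10000) {
  for (const name of ["searchApiBaseUrl", "tenant", "proxyBaseUrl"]) {
    if (!settings[name]) return { status: 500 }; // missing setting
  }
  let token;
  try {
    token = await deps.getToken();
  } catch {
    return { status: 401 }; // token fetch failed
  }
  let response;
  try {
    response = await withTimeout(
      deps.httpGet(`${settings.searchApiBaseUrl}/${settings.tenant}`, {
        Authorization: `OIDC_id_token ${token}`,
      }),
      timeoutMs
    );
  } catch (err) {
    // Timeout -> 504; any other transport failure -> 502 (assumption).
    return { status: err instanceof UpstreamTimeout ? 504 : 502 };
  }
  if (response.status < 200 || response.status > 299) {
    return { status: 502 }; // non-2xx from the search API
  }
  return {
    status: 200,
    headers: { "Content-Type": "application/xml" },
    body: deps.render(response.data),
  };
}
```

Validating settings before touching Redis follows the diagram's order, so a misconfigured deployment fails fast with 500 without attempting a token refresh.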
### Non-Sitemap Request Lifecycle (unchanged)
All requests whose URL does NOT end with `/sitemap.xml` follow the existing OIDC auth flow
exactly as before; that path is not modified.

---
## File Changes
### Modified: `src/globalVariables/kme_CSA_settings.json`
Three new fields added (existing fields unchanged):
```json
{
  "tokenUrl": "…",
  "username": "…",
  "password": "…",
  "clientId": "…",
  "scope": "…",
  "searchApiBaseUrl": "https://kme-search.example.com/api/search",
  "tenant": "my-tenant",
  "proxyBaseUrl": "https://adapter.example.com"
}
```
### Modified: `src/proxyScripts/kmeContentSourceAdapter.js`
Logic added:
1. URL routing guard at entry point.
2. `sitemapFlow` async block: settings validation, token reuse, search API call, XML build, response.
3. Existing OIDC auth flow moved to `else` branch (no logic changes).
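Steps 1 and 3 amount to a two-way dispatch at the entry point. A minimal sketch, assuming the request URL is available as a string; `isSitemapRequest` and `handleRequest` are hypothetical names, while `sitemapFlow` and `oidcAuthFlow` are the function names from the commit message, passed in here rather than defined. Whether the real guard ignores query strings is not stated; this sketch does, which is an assumption.

```javascript
// Hypothetical routing guard: true when the request path ends with /sitemap.xml.
// The base URL only resolves relative paths; any query string is ignored.
function isSitemapRequest(requestUrl) {
  const { pathname } = new URL(requestUrl, "http://localhost");
  return pathname.endsWith("/sitemap.xml");
}

// Entry-point dispatch: sitemap requests take the new flow, everything else
// the unchanged OIDC auth flow. Flows are injected to keep the sketch standalone.
async function handleRequest(requestUrl, flows) {
  if (isSitemapRequest(requestUrl)) {
    return flows.sitemapFlow(requestUrl);
  }
  return flows.oidcAuthFlow(requestUrl);
}
```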
### Modified: `src/globalVariables/kme_CSA_settings.json.example`
Updated to include the three new fields with placeholder values.