# Data Model: Sitemap XML Generation

**Feature**: `002-sitemap-generation`
**Branch**: `002-sitemap-generation`
**Date**: 2025-07-14

---

## Entities

### 1. `KnowledgeItem` (external, read-only)

Represents a single document returned by the KME Knowledge Search Service. The adapter reads this shape from the upstream API response and never persists or mutates it.

| Field | Type | Source | Notes |
|---|---|---|---|
| `vkm:url` | `string \| undefined` | Search API response `items[]` | Canonical document URL. **Required** for sitemap inclusion. Items where this field is absent or empty are silently omitted (FR-006). |
| `title` | `string \| undefined` | Search API response | Present in the payload but not used by the sitemap. |
| *(other fields)* | `any` | Search API response | Ignored; the adapter reads only `vkm:url`. |

**Assumed response envelope** (to be verified against the live API; see research.md R-002):

```json
{
  "items": [
    { "vkm:url": "https://kme.example.com/knowledge/doc-1", "title": "Doc One" },
    { "vkm:url": "https://kme.example.com/knowledge/doc-2", "title": "Doc Two" }
  ]
}
```

If the root is a bare array, `response.data` itself is treated as the items array.

---

### 2. `SitemapEntry` (derived, in-memory)

Represents a single `<url>/<loc>` entry in the generated sitemap XML. Derived from a `KnowledgeItem` during the transformation step.

| Field | Type | Derivation |
|---|---|---|
| `loc` | `string` | `${kme_CSA_settings.proxyBaseUrl}?kmeURL=${encodeURIComponent(item['vkm:url'])}` |

**Validation rules**:

- Only produced if `item['vkm:url']` is a non-empty string.
- The resulting `loc` must be a percent-encoded absolute URL.

---

### 3. `SitemapDocument` (output)

The XML document returned in the HTTP response body.

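
As a rough sketch of how the entries that feed this document could be derived from the search response (sections 1 and 2), the following JavaScript handles both the assumed `items` envelope and a bare array, then applies the `loc` derivation. The helper names `extractItems` and `toSitemapEntries` are hypothetical and not taken from the adapter script, and treating any other response shape as "no results" is an assumption.

```javascript
// Hypothetical helper: accept either an { items: [...] } envelope or a bare array.
function extractItems(data) {
  if (Array.isArray(data)) return data;
  if (data && Array.isArray(data.items)) return data.items;
  return []; // Assumption: any other shape is treated as "no results".
}

// Hypothetical helper: derive SitemapEntry objects, silently skipping items
// whose vkm:url is absent, not a string, or empty (FR-006).
function toSitemapEntries(items, proxyBaseUrl) {
  return items
    .filter((item) => item && typeof item['vkm:url'] === 'string' && item['vkm:url'] !== '')
    .map((item) => ({
      loc: `${proxyBaseUrl}?kmeURL=${encodeURIComponent(item['vkm:url'])}`,
    }));
}

// Sample payload modelled on the assumed envelope, plus two items that must be skipped.
const sample = {
  items: [
    { 'vkm:url': 'https://kme.example.com/knowledge/doc-1', title: 'Doc One' },
    { title: 'No URL' },
    { 'vkm:url': '', title: 'Empty URL' },
  ],
};

const entries = toSitemapEntries(extractItems(sample), 'https://adapter.example.com');
console.log(entries);
```

Only the first sample item yields an entry; the other two are dropped without raising an error, which matches the silent-omission rule.
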
| Attribute | Value |
|---|---|
| XML version | `1.0` |
| Encoding | `UTF-8` |
| Root element | `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">` |
| Child elements | Zero or more `<url>` entries |

**Populated sitemap**:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://adapter.example.com?kmeURL=https%3A%2F%2Fkme.example.com%2Fdoc-1</loc>
  </url>
  <url>
    <loc>https://adapter.example.com?kmeURL=https%3A%2F%2Fkme.example.com%2Fdoc-2</loc>
  </url>
</urlset>
```

**Empty sitemap** (zero results from search API):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"></urlset>
```

---

### 4. `OIDCTokenCache` (shared, Redis)

The existing Redis-backed OIDC token store. The sitemap flow **reads** and **writes** this store using the same `hGet`/`hSet` pattern as the existing OIDC auth flow.

| Redis Key | Field | Type | Description |
|---|---|---|---|
| `authorization` | `token` | `string` | The OIDC `id_token` JWT |
| `authorization` | `expiry` | `string (float)` | Unix timestamp (seconds) when the token expires |

**Access pattern in sitemap flow**:

1. `hGet('authorization', 'token')`: read the cached token.
2. `hGet('authorization', 'expiry')`: read the cached expiry.
3. If the token is expired or absent: invoke the token-refresh sequence, then `hSet` both fields.

---

### 5. `kme_CSA_settings` (configuration, JSON)

The settings object injected into the VM context from `src/globalVariables/kme_CSA_settings.json`. This feature extends it with three new fields.

**Full schema after this feature**:

| Field | Type | Existing/New | Required By |
|---|---|---|---|
| `tokenUrl` | `string` | Existing | OIDC token fetch (all flows) |
| `username` | `string` | Existing | OIDC token fetch |
| `password` | `string` | Existing | OIDC token fetch |
| `clientId` | `string` | Existing | OIDC token fetch |
| `scope` | `string` | Existing | OIDC token fetch |
| `searchApiBaseUrl` | `string` | **New** | FR-002, FR-010 |
| `tenant` | `string` | **New** | FR-002, FR-010 |
| `proxyBaseUrl` | `string` | **New** | FR-005, FR-010 |
| `_pendingFetch` | `Promise \| null` | Runtime only (not in JSON) | Stampede guard |

**Validation**:

- Existing fields are validated at the top of the script for all requests (unchanged).
- New fields are validated at the start of the sitemap branch only (FR-011).

---

## State Transitions

### Sitemap Request Lifecycle

```
Incoming GET /…/sitemap.xml
        |
        v
Validate settings  --> 500 Internal Server Error (missing field)
(searchApiBaseUrl, tenant, proxyBaseUrl)
        |
        v
Read token from Redis
        |
    [valid?]
  YES   |   NO
   |        v
   |    Refresh token  --> 401 Unauthorized (token fetch failed)
   |        |
   +-------+
        v
GET <searchApiBaseUrl>/<tenant>
  Authorization: OIDC_id_token
  timeout: 10 000 ms
        |
    [success?]
  YES   |   NO
   |    +--> timeout           --> 504 Gateway Timeout
   |    +--> non-2xx response  --> 502 Bad Gateway
   v
Map items --> SitemapEntry[]  (skip empty vkm:url)
        |
        v
Build SitemapDocument (xmlBuilder)
        |
        v
200 OK
Content-Type: application/xml
Body: <urlset>...</urlset>
```

### Non-Sitemap Request Lifecycle (unchanged)

All requests whose URL does NOT end with `/sitemap.xml` follow the existing OIDC auth flow exactly as before. That path is not modified.

---

## File Changes

### Modified: `src/globalVariables/kme_CSA_settings.json`

Three new fields added (existing fields unchanged):

```json
{
  "tokenUrl": "…",
  "username": "…",
  "password": "…",
  "clientId": "…",
  "scope": "…",
  "searchApiBaseUrl": "https://kme-search.example.com/api/search",
  "tenant": "my-tenant",
  "proxyBaseUrl": "https://adapter.example.com"
}
```

### Modified: `src/proxyScripts/kmeContentSourceAdapter.js`

Logic added:

1. URL routing guard at the entry point.
2. `sitemapFlow` async block: settings validation, token reuse, search API call, XML build, response.
3. Existing OIDC auth flow moved to the `else` branch (no logic changes).

### Modified: `src/globalVariables/kme_CSA_settings.json.example`

Updated to include the three new fields with placeholder values.
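
The pure, side-effect-free parts of the logic added to `kmeContentSourceAdapter.js` (routing guard, new-field validation, token freshness check, XML build) might look roughly like the sketch below. All function names here are hypothetical. The lifecycle above names `xmlBuilder`, but its API is not shown in this document, so plain string concatenation with manual escaping is swapped in. The guard also assumes the request URL is passed as a path without a query string.

```javascript
// Sitemap protocol namespace used on the <urlset> root element.
const SITEMAP_NS = 'http://www.sitemaps.org/schemas/sitemap/0.9';

// Hypothetical routing guard: only URLs ending in /sitemap.xml enter the sitemap branch.
// Assumption: `url` is the request path with no query string.
function isSitemapRequest(url) {
  return typeof url === 'string' && url.endsWith('/sitemap.xml');
}

// Hypothetical validator for the three new settings (FR-011).
// Returns the names of fields that are missing or empty; a non-empty result maps to a 500.
function missingSitemapSettings(settings) {
  return ['searchApiBaseUrl', 'tenant', 'proxyBaseUrl'].filter(
    (name) => typeof settings[name] !== 'string' || settings[name] === ''
  );
}

// The cached expiry is stored as a string holding a float Unix timestamp in seconds.
// An absent token or an unparsable expiry counts as "not fresh" and triggers a refresh.
function isTokenFresh(token, expiry, nowSeconds = Date.now() / 1000) {
  return typeof token === 'string' && token !== '' && Number.parseFloat(expiry) > nowSeconds;
}

// Escape the five XML special characters so a loc value cannot break the document.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Stand-in for the xmlBuilder step: serialise SitemapEntry[] into a SitemapDocument string.
function buildSitemapXml(entries) {
  const urls = entries.map((entry) => `<url><loc>${escapeXml(entry.loc)}</loc></url>`).join('');
  return `<?xml version="1.0" encoding="UTF-8"?>\n<urlset xmlns="${SITEMAP_NS}">${urls}</urlset>`;
}

console.log(missingSitemapSettings({ searchApiBaseUrl: 'https://kme-search.example.com/api/search' }));
console.log(buildSitemapXml([{ loc: 'https://adapter.example.com?kmeURL=https%3A%2F%2Fkme.example.com%2Fdoc-1' }]));
```

Keeping these helpers free of Redis and HTTP calls means they can be unit-tested without the VM context; the token refresh and the search API request would wrap around them in the `sitemapFlow` block.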