Data Model: Sitemap XML Generation
Feature: 002-sitemap-generation
Branch: 002-sitemap-generation
Date: 2025-07-14
Entities
1. KnowledgeItem (external, read-only)
Represents a single document returned by the KME Knowledge Search Service. The adapter reads this shape from the upstream API response and never persists or mutates it.
| Field | Type | Source | Notes |
|---|---|---|---|
| `vkm:url` | `string \| undefined` | Search API response `items[]` | Canonical document URL. Required for sitemap inclusion. Items where this field is absent or empty are silently omitted (FR-006). |
| `title` | `string \| undefined` | Search API response | Not used by the sitemap; present in payload, ignored. |
| (other fields) | any | Search API response | Ignored; adapter reads only `vkm:url`. |
Assumed response envelope (to be verified against live API — see research.md R-002):
{
  "items": [
    { "vkm:url": "https://kme.example.com/knowledge/doc-1", "title": "Doc One" },
    { "vkm:url": "https://kme.example.com/knowledge/doc-2", "title": "Doc Two" }
  ]
}
If the root is a bare array, response.data itself is treated as the items array.
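The two envelope shapes described above could be normalised with a small helper. This is only a sketch: `extractItems` is a hypothetical name, and falling back to an empty array for any other shape is an assumption, since the envelope itself is still to be verified against the live API.

```javascript
// Hypothetical helper: normalise the search response body into an items array.
// Handles the two shapes described above: { items: [...] } or a bare array.
function extractItems(data) {
  if (Array.isArray(data)) {
    return data; // bare-array root: the body itself is the items array
  }
  if (data && Array.isArray(data.items)) {
    return data.items; // assumed envelope: { "items": [...] }
  }
  return []; // assumption: any other shape is treated as zero results
}

// Usage with the assumed envelope.
const sample = { items: [{ 'vkm:url': 'https://kme.example.com/knowledge/doc-1' }] };
console.log(extractItems(sample).length);
```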
2. SitemapEntry (derived, in-memory)
Represents a single <url>/<loc> entry in the generated sitemap XML. Derived from a KnowledgeItem
during the transformation step.
| Field | Type | Derivation |
|---|---|---|
| `loc` | `string` | `${kme_CSA_settings.proxyBaseUrl}?kmeURL=${encodeURIComponent(item['vkm:url'])}` |
Validation rules:
- Only produced if `item['vkm:url']` is a non-empty string.
- The resulting `loc` must be a percent-encoded absolute URL.
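As an illustration of the derivation and validation rules above, the transformation might look like the following. `toSitemapEntry` is a hypothetical name, and returning `null` for skipped items is an assumption about how omission is signalled.

```javascript
// Hypothetical helper: derive a SitemapEntry from a KnowledgeItem.
// Returns null when vkm:url is absent or empty, so the caller can drop the item.
function toSitemapEntry(item, proxyBaseUrl) {
  const url = item ? item['vkm:url'] : undefined;
  if (typeof url !== 'string' || url.length === 0) {
    return null; // silently omitted (FR-006)
  }
  return { loc: `${proxyBaseUrl}?kmeURL=${encodeURIComponent(url)}` };
}

// Usage: map every item, then filter out the skipped ones.
const exampleItems = [
  { 'vkm:url': 'https://kme.example.com/doc-1', title: 'Doc One' },
  { title: 'No URL' },
];
const exampleEntries = exampleItems
  .map((item) => toSitemapEntry(item, 'https://adapter.example.com'))
  .filter((entry) => entry !== null);
console.log(exampleEntries);
```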
3. SitemapDocument (output)
The XML document returned in the HTTP response body.
| Attribute | Value |
|---|---|
| XML version | 1.0 |
| Encoding | UTF-8 |
| Root element | <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> |
| Child elements | Zero or more <url><loc>…</loc></url> entries |
Populated sitemap:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://adapter.example.com?kmeURL=https%3A%2F%2Fkme.example.com%2Fdoc-1</loc>
  </url>
  <url>
    <loc>https://adapter.example.com?kmeURL=https%3A%2F%2Fkme.example.com%2Fdoc-2</loc>
  </url>
</urlset>
Empty sitemap (zero results from search API):
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"/>
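The two output shapes above can be produced with plain string building. This sketch does not use the `xmlBuilder` global that the adapter relies on; `buildSitemapXml` and `escapeXml` are hypothetical names, and escaping the `loc` text is an added precaution rather than something the spec states.

```javascript
const SITEMAP_NS = 'http://www.sitemaps.org/schemas/sitemap/0.9';

// Escape the five XML special characters so a loc value cannot break the markup.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical helper: serialise SitemapEntry objects ({ loc }) into a SitemapDocument.
function buildSitemapXml(entries) {
  const header = '<?xml version="1.0" encoding="UTF-8"?>';
  if (entries.length === 0) {
    // Zero results: self-closing root element, as in the empty example above.
    return `${header}\n<urlset xmlns="${SITEMAP_NS}"/>`;
  }
  const urls = entries
    .map((entry) => `  <url>\n    <loc>${escapeXml(entry.loc)}</loc>\n  </url>`)
    .join('\n');
  return `${header}\n<urlset xmlns="${SITEMAP_NS}">\n${urls}\n</urlset>`;
}

console.log(buildSitemapXml([
  { loc: 'https://adapter.example.com?kmeURL=https%3A%2F%2Fkme.example.com%2Fdoc-1' },
]));
```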
4. OIDCTokenCache (shared, Redis)
The existing Redis-backed OIDC token store. The sitemap flow reads and writes this store using the identical hGet/hSet pattern as the existing OIDC auth flow.
| Redis Key | Field | Type | Description |
|---|---|---|---|
| `authorization` | `token` | `string` | The OIDC id_token JWT |
| `authorization` | `expiry` | `string` (float) | Unix timestamp (seconds) when token expires |
Access pattern in sitemap flow:
1. `hGet('authorization', 'token')`: read cached token
2. `hGet('authorization', 'expiry')`: read cached expiry
3. If expired or absent: invoke token-refresh sequence → `hSet` both fields
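The access pattern above might be sketched as follows. All names here are hypothetical: `readOrRefreshToken` is not the adapter's actual function, `fetchToken` stands in for the token-refresh sequence and is assumed to resolve to `{ token, expiry }`, and `createFakeRedis` is an in-memory stand-in for a client exposing promise-based `hGet`/`hSet`.

```javascript
// Hypothetical sketch of the read-then-refresh pattern against the token cache.
async function readOrRefreshToken(redis, fetchToken, nowSeconds = Date.now() / 1000) {
  const token = await redis.hGet('authorization', 'token');
  const expiry = await redis.hGet('authorization', 'expiry');

  // expiry is stored as a string holding a float Unix timestamp in seconds.
  if (token && expiry && parseFloat(expiry) > nowSeconds) {
    return token; // cached token is still valid
  }

  // Expired or absent: refresh, then write both fields back.
  const fresh = await fetchToken();
  await redis.hSet('authorization', 'token', fresh.token);
  await redis.hSet('authorization', 'expiry', String(fresh.expiry));
  return fresh.token;
}

// In-memory stand-in for Redis hashes, for illustration only.
function createFakeRedis() {
  const hashes = new Map();
  return {
    async hGet(key, field) {
      const hash = hashes.get(key);
      return hash && hash.has(field) ? hash.get(field) : null;
    },
    async hSet(key, field, value) {
      if (!hashes.has(key)) hashes.set(key, new Map());
      hashes.get(key).set(field, value);
    },
  };
}
```

Passing `nowSeconds` explicitly keeps the expiry check deterministic in tests; production code would rely on the default.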
5. kme_CSA_settings (configuration, JSON)
The settings object injected into the VM context from src/globalVariables/kme_CSA_settings.json.
This feature extends it with three new fields.
Full schema after this feature:
| Field | Type | Existing/New | Required By |
|---|---|---|---|
| `tokenUrl` | `string` | Existing | OIDC token fetch (all flows) |
| `username` | `string` | Existing | OIDC token fetch |
| `password` | `string` | Existing | OIDC token fetch |
| `clientId` | `string` | Existing | OIDC token fetch |
| `scope` | `string` | Existing | OIDC token fetch |
| `searchApiBaseUrl` | `string` | New | FR-002, FR-010 |
| `tenant` | `string` | New | FR-002, FR-010 |
| `proxyBaseUrl` | `string` | New | FR-005, FR-010 |
| `_pendingFetch` | `Promise \| null` | Runtime only (not in JSON) | Stampede guard |
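The table only names `_pendingFetch` as a stampede guard, so the following is a guess at the usual shape of such a guard rather than the adapter's actual code. `fetchTokenOnce` and `fetchToken` are hypothetical names: concurrent callers share one in-flight promise, and the slot is cleared once it settles.

```javascript
// Hypothetical sketch: ensure only one token fetch is in flight at a time.
function fetchTokenOnce(settings, fetchToken) {
  if (!settings._pendingFetch) {
    settings._pendingFetch = fetchToken().finally(() => {
      settings._pendingFetch = null; // allow a new fetch after this one settles
    });
  }
  return settings._pendingFetch; // concurrent callers await the same promise
}

// Usage: two overlapping requests trigger a single upstream fetch.
const demoSettings = { _pendingFetch: null };
let demoFetchCount = 0;
const slowFetch = () =>
  new Promise((resolve) => setTimeout(() => { demoFetchCount += 1; resolve('token'); }, 10));

Promise.all([
  fetchTokenOnce(demoSettings, slowFetch),
  fetchTokenOnce(demoSettings, slowFetch),
]).then(() => console.log('upstream fetches:', demoFetchCount));
```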
Validation:
- Existing fields validated at top of script for all requests (unchanged).
- New fields validated at start of sitemap branch only (FR-011).
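A minimal sketch of the sitemap-branch check, assuming that a missing field means "not a non-empty string". `findMissingSitemapSettings` is a hypothetical name; per the lifecycle in this document, a non-empty result would lead to a 500 response.

```javascript
// The three fields this feature adds to kme_CSA_settings.
const SITEMAP_REQUIRED_FIELDS = ['searchApiBaseUrl', 'tenant', 'proxyBaseUrl'];

// Hypothetical helper: return the names of required sitemap settings that are
// absent, not strings, or blank. An empty array means the settings are usable.
function findMissingSitemapSettings(settings) {
  return SITEMAP_REQUIRED_FIELDS.filter((name) => {
    const value = settings ? settings[name] : undefined;
    return typeof value !== 'string' || value.trim() === '';
  });
}

// Usage: tenant is missing, so the sitemap branch should fail fast.
console.log(findMissingSitemapSettings({
  searchApiBaseUrl: 'https://kme-search.example.com/api/search',
  proxyBaseUrl: 'https://adapter.example.com',
}));
```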
State Transitions
Sitemap Request Lifecycle
Incoming GET /…/sitemap.xml
        |
        v
Validate settings  --> 500 Internal Server Error (missing field)
(searchApiBaseUrl,
 tenant, proxyBaseUrl)
        |
        v
Read token from Redis
        |
     [valid?]
   YES  |  NO
    |      v
    |   Refresh token --> 401 Unauthorized (token fetch failed)
    |      |
    +------+
        v
GET <searchApiBaseUrl>/<tenant>
  Authorization: OIDC_id_token <token>
  timeout: 10 000 ms
        |
    [success?]
   YES  |  NO
    |      +--> timeout --> 504 Gateway Timeout
    |      +--> non-2xx response --> 502 Bad Gateway
    v
Map items --> SitemapEntry[]
  (skip empty vkm:url)
        |
        v
Build SitemapDocument (xmlBuilder)
        |
        v
200 OK
  Content-Type: application/xml
  Body: <?xml ...><urlset>...</urlset>
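The search call and its failure mapping from the lifecycle above could be sketched as below. Both function names are hypothetical. The error shape is an assumption: it follows the axios convention of `error.code === 'ECONNABORTED'` (or `'ETIMEDOUT'`) on timeout, and this document does not confirm which HTTP client the adapter uses. Mapping every other failure to 502 is also an assumption, since the lifecycle only specifies the timeout and non-2xx cases.

```javascript
const SEARCH_TIMEOUT_MS = 10000; // 10 000 ms, as in the lifecycle above

// Hypothetical helper: request options for the search API call.
function buildSearchRequest(settings, token) {
  return {
    method: 'GET',
    url: `${settings.searchApiBaseUrl}/${settings.tenant}`,
    headers: { Authorization: `OIDC_id_token ${token}` },
    timeout: SEARCH_TIMEOUT_MS,
  };
}

// Hypothetical helper: map an upstream failure to the adapter's response status.
// Assumes an axios-style error object (see the note above).
function statusForUpstreamFailure(error) {
  const code = error ? error.code : undefined;
  if (code === 'ECONNABORTED' || code === 'ETIMEDOUT') {
    return 504; // Gateway Timeout
  }
  return 502; // Bad Gateway: non-2xx response (and, by assumption, anything else)
}

console.log(buildSearchRequest(
  { searchApiBaseUrl: 'https://kme-search.example.com/api/search', tenant: 'my-tenant' },
  '<token>'
));
```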
Non-Sitemap Request Lifecycle (unchanged)
All requests whose URL does NOT end with /sitemap.xml follow the existing OIDC auth flow
exactly as before. No modification to that path.
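The routing rule above reduces to a suffix check. In this sketch `isSitemapRequest` is a hypothetical name, and the check is applied to the URL string exactly as received; whether a query string should be stripped first is not specified here.

```javascript
// Hypothetical routing guard: true only when the URL ends with /sitemap.xml.
function isSitemapRequest(requestUrl) {
  return typeof requestUrl === 'string' && requestUrl.endsWith('/sitemap.xml');
}

// Usage: only the first URL would enter the sitemap flow.
console.log(isSitemapRequest('https://adapter.example.com/kme/sitemap.xml'));
console.log(isSitemapRequest('https://adapter.example.com/kme/article/42'));
```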
File Changes
Modified: src/globalVariables/kme_CSA_settings.json
Three new fields added (existing fields unchanged):
{
  "tokenUrl": "…",
  "username": "…",
  "password": "…",
  "clientId": "…",
  "scope": "…",
  "searchApiBaseUrl": "https://kme-search.example.com/api/search",
  "tenant": "my-tenant",
  "proxyBaseUrl": "https://adapter.example.com"
}
Modified: src/proxyScripts/kmeContentSourceAdapter.js
Logic added:
- URL routing guard at entry point.
- `sitemapFlow` async block: settings validation, token reuse, search API call, XML build, response.
- Existing OIDC auth flow moved to `else` branch (no logic changes).
Modified: src/globalVariables/kme_CSA_settings.json.example
Updated to include the three new fields with placeholder values.