kme_content_adapter/specs/002-sitemap-generation/data-model.md
Peter.Morton 50b87297d2 feat(002): add sitemap generation feature
- Refactor kmeContentSourceAdapter.js into getValidToken(), oidcAuthFlow(),
  and sitemapFlow(); add sitemap generation using hydra:member response structure
- Add searchApiBaseUrl, tenant, proxyBaseUrl fields to kme_CSA_settings.json
  and kme_CSA_settings.json.example
- Add 17 unit tests for sitemap flow and non-sitemap routing regression
- Add 5 contract tests for sitemap endpoint (proxy-http.test.js)
- Add [Unreleased] sitemap entry to CHANGELOG.md
- Add full specs/002-sitemap-generation/ artifact directory
  (spec, plan, tasks, data-model, contracts, research, quickstart, checklist)
- Update constitution.md: add redis as permitted global, refresh
  kme_CSA_settings references
- Update copilot-instructions.md SPECKIT marker to sitemap plan
2026-04-22 22:08:08 -05:00


Data Model: Sitemap XML Generation

Feature: 002-sitemap-generation | Branch: 002-sitemap-generation | Date: 2025-07-14


Entities

1. KnowledgeItem (external, read-only)

Represents a single document returned by the KME Knowledge Search Service. The adapter reads this shape from the upstream API response and never persists or mutates it.

| Field | Type | Source | Notes |
| --- | --- | --- | --- |
| vkm:url | string \| undefined | Search API response items[] | Canonical document URL. Required for sitemap inclusion. Items where this field is absent or empty are silently omitted (FR-006). |
| title | string \| undefined | Search API response | Not used by the sitemap; present in payload, ignored. |
| (other fields) | any | Search API response | Ignored; adapter reads only vkm:url. |

Assumed response envelope (to be verified against live API — see research.md R-002):

{
  "items": [
    { "vkm:url": "https://kme.example.com/knowledge/doc-1", "title": "Doc One" },
    { "vkm:url": "https://kme.example.com/knowledge/doc-2", "title": "Doc Two" }
  ]
}

If the root is a bare array, response.data itself is treated as the items array.
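
The two envelope shapes described above could be normalized with a small helper along these lines. This is a minimal sketch: `extractItems` is a hypothetical name, the `items` key is the assumed (unverified) envelope from R-002, and returning an empty array for any other shape is an assumption the spec does not state.

```javascript
// Sketch only: `extractItems` is a hypothetical helper, not the adapter's code.
// `data` stands for the parsed response body (response.data in the text above).
function extractItems(data) {
  // Root is a bare array: treat it as the items array directly.
  if (Array.isArray(data)) return data;
  // Root is an object envelope: read `items` when it is an array.
  if (data && Array.isArray(data.items)) return data.items;
  // Any other shape yields no items (assumption; the spec is silent here).
  return [];
}

console.log(extractItems({ items: [{ 'vkm:url': 'https://kme.example.com/knowledge/doc-1' }] }).length);
console.log(extractItems([]).length);
```

Keeping this normalization in one place means only one function changes if the live API turns out to use a different envelope key.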


2. SitemapEntry (derived, in-memory)

Represents a single <url>/<loc> entry in the generated sitemap XML. Derived from a KnowledgeItem during the transformation step.

| Field | Type | Derivation |
| --- | --- | --- |
| loc | string | ${kme_CSA_settings.proxyBaseUrl}?kmeURL=${encodeURIComponent(item['vkm:url'])} |

Validation rules:

  • Only produced if item['vkm:url'] is a non-empty string.
  • The resulting loc must be an absolute URL whose kmeURL query value is percent-encoded.
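
The derivation and validation rules above can be sketched as a single mapping function. `toSitemapEntry` is a hypothetical name, and `proxyBaseUrl` is passed in explicitly here rather than read from the injected kme_CSA_settings object; returning `null` for skipped items is an illustrative choice, not something the spec prescribes.

```javascript
// Sketch only: `toSitemapEntry` is a hypothetical helper name.
function toSitemapEntry(item, proxyBaseUrl) {
  const url = item ? item['vkm:url'] : undefined;
  // Only produce an entry when vkm:url is a non-empty string (FR-006).
  if (typeof url !== 'string' || url === '') return null;
  // encodeURIComponent percent-encodes ":" and "/" so the whole upstream
  // URL travels as a single query value.
  return { loc: `${proxyBaseUrl}?kmeURL=${encodeURIComponent(url)}` };
}

const items = [
  { 'vkm:url': 'https://kme.example.com/knowledge/doc-1', title: 'Doc One' },
  { title: 'No URL' },   // omitted: field absent
  { 'vkm:url': '' },     // omitted: field empty
];

const entries = items
  .map((item) => toSitemapEntry(item, 'https://adapter.example.com'))
  .filter((entry) => entry !== null);

console.log(entries);
// One entry remains, with loc:
// https://adapter.example.com?kmeURL=https%3A%2F%2Fkme.example.com%2Fknowledge%2Fdoc-1
```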

3. SitemapDocument (output)

The XML document returned in the HTTP response body.

| Attribute | Value |
| --- | --- |
| XML version | 1.0 |
| Encoding | UTF-8 |
| Root element | <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> |
| Child elements | Zero or more <url><loc>…</loc></url> entries |

Populated sitemap (derived from the two KnowledgeItem examples above):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://adapter.example.com?kmeURL=https%3A%2F%2Fkme.example.com%2Fknowledge%2Fdoc-1</loc>
  </url>
  <url>
    <loc>https://adapter.example.com?kmeURL=https%3A%2F%2Fkme.example.com%2Fknowledge%2Fdoc-2</loc>
  </url>
</urlset>

Empty sitemap (zero results from search API):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"/>
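
Both output shapes above can be produced by a small serializer. The sketch below builds the XML with plain string concatenation purely for illustration; it is a stand-in and does not reflect the API of whatever XML library the adapter actually uses. `buildSitemapXml` and `escapeXml` are hypothetical names.

```javascript
// Sketch only: hand-rolled serializer, not the adapter's real XML builder.
const SITEMAP_NS = 'http://www.sitemaps.org/schemas/sitemap/0.9';

// Escape the characters that are unsafe in XML text content. This matters
// if proxyBaseUrl itself ever carries a query string containing "&".
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

function buildSitemapXml(entries) {
  const header = '<?xml version="1.0" encoding="UTF-8"?>';
  // Zero entries: emit the self-closing root element.
  if (entries.length === 0) {
    return `${header}\n<urlset xmlns="${SITEMAP_NS}"/>`;
  }
  const urls = entries
    .map((entry) => `  <url>\n    <loc>${escapeXml(entry.loc)}</loc>\n  </url>`)
    .join('\n');
  return `${header}\n<urlset xmlns="${SITEMAP_NS}">\n${urls}\n</urlset>`;
}

console.log(buildSitemapXml([]));
```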

4. OIDCTokenCache (shared, Redis)

The existing Redis-backed OIDC token store. The sitemap flow reads and writes this store using the same hGet/hSet pattern as the existing OIDC auth flow.

| Redis Key | Field | Type | Description |
| --- | --- | --- | --- |
| authorization | token | string | The OIDC id_token JWT |
| authorization | expiry | string (float) | Unix timestamp (seconds) when token expires |

Access pattern in sitemap flow:

  1. hGet('authorization', 'token') — read cached token
  2. hGet('authorization', 'expiry') — read cached expiry
  3. If expired or absent: invoke token-refresh sequence → hSet both fields
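
The three-step access pattern above might look roughly like the following. This is a sketch under stated assumptions: the in-memory `createFakeRedis` stand-in exists only so the example runs without a server (it exposes the same `hGet`/`hSet` method names as the text), `fetchToken` is a hypothetical callback standing in for the token-refresh sequence, and the `getValidToken` signature shown here is illustrative rather than the adapter's actual one.

```javascript
// In-memory stand-in for the Redis hash calls used below (illustration only).
function createFakeRedis() {
  const hashes = new Map();
  return {
    async hGet(key, field) {
      const hash = hashes.get(key);
      return hash && hash.has(field) ? hash.get(field) : null;
    },
    async hSet(key, field, value) {
      if (!hashes.has(key)) hashes.set(key, new Map());
      hashes.get(key).set(field, String(value));
      return 1;
    },
  };
}

// `fetchToken` (hypothetical) must resolve to { token, expiry }, with
// expiry expressed in Unix seconds.
async function getValidToken(redis, fetchToken, nowSeconds = Date.now() / 1000) {
  // Steps 1 and 2: read the cached token and its expiry.
  const token = await redis.hGet('authorization', 'token');
  const expiry = parseFloat(await redis.hGet('authorization', 'expiry'));
  // A missing expiry parses to NaN and is treated as "absent".
  if (token && Number.isFinite(expiry) && expiry > nowSeconds) return token;
  // Step 3: refresh, then write both fields back.
  const fresh = await fetchToken();
  await redis.hSet('authorization', 'token', fresh.token);
  await redis.hSet('authorization', 'expiry', String(fresh.expiry));
  return fresh.token;
}
```

Passing `nowSeconds` as a parameter is a testing convenience; production code would normally rely on the default.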

5. kme_CSA_settings (configuration, JSON)

The settings object injected into the VM context from src/globalVariables/kme_CSA_settings.json. This feature extends it with three new fields.

Full schema after this feature:

| Field | Type | Existing/New | Required By |
| --- | --- | --- | --- |
| tokenUrl | string | Existing | OIDC token fetch (all flows) |
| username | string | Existing | OIDC token fetch |
| password | string | Existing | OIDC token fetch |
| clientId | string | Existing | OIDC token fetch |
| scope | string | Existing | OIDC token fetch |
| searchApiBaseUrl | string | New | FR-002, FR-010 |
| tenant | string | New | FR-002, FR-010 |
| proxyBaseUrl | string | New | FR-005, FR-010 |
| _pendingFetch | Promise \| null | Runtime only (not in JSON) | Stampede guard |

Validation:

  • Existing fields validated at top of script for all requests (unchanged).
  • New fields validated at start of sitemap branch only (FR-011).
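
The sitemap-only validation could be sketched as a check that reports which of the three new fields are missing. `missingSitemapSettings` is a hypothetical name, and treating whitespace-only strings as missing is an assumption beyond what the spec states.

```javascript
// Sketch only: the three fields introduced by this feature.
const SITEMAP_FIELDS = ['searchApiBaseUrl', 'tenant', 'proxyBaseUrl'];

// Returns the names of sitemap fields that are absent or blank.
function missingSitemapSettings(settings) {
  return SITEMAP_FIELDS.filter((name) => {
    const value = settings ? settings[name] : undefined;
    return typeof value !== 'string' || value.trim() === '';
  });
}

// Run only inside the sitemap branch; a non-empty result would map to
// the 500 Internal Server Error response.
const missing = missingSitemapSettings({ tenant: 'my-tenant' });
console.log(missing); // [ 'searchApiBaseUrl', 'proxyBaseUrl' ]
```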

State Transitions

Sitemap Request Lifecycle

Incoming GET /…/sitemap.xml
         |
         v
  Validate settings         --> 500 Internal Server Error (missing field)
  (searchApiBaseUrl,
   tenant, proxyBaseUrl)
         |
         v
  Read token from Redis
         |
    [valid?]
    YES  |  NO
         |  v
         |  Refresh token   --> 401 Unauthorized (token fetch failed)
         |       |
         +-------+
         v
  GET <searchApiBaseUrl>/<tenant>
  Authorization: OIDC_id_token <token>
  timeout: 10 000 ms
         |
   [success?]
   YES   |  NO
         |  +--> timeout          --> 504 Gateway Timeout
         |  +--> non-2xx response --> 502 Bad Gateway
         v
  Map items --> SitemapEntry[]
  (skip empty vkm:url)
         |
         v
  Build SitemapDocument (xmlBuilder)
         |
         v
  200 OK
  Content-Type: application/xml
  Body: <?xml ...><urlset>...</urlset>
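
The search-call step and its two failure branches in the diagram above can be sketched as follows. This version uses the Fetch API and `AbortSignal.timeout` (both available in recent Node.js releases) as a stand-in; the adapter's real HTTP client may differ. `callSearchApi` and the injectable `fetchImpl` parameter are hypothetical, and mapping non-timeout network errors to 502 is an assumption the spec does not cover.

```javascript
// Sketch only: `fetchImpl` is injectable so the function can be exercised
// without a network; by default it uses the global fetch.
async function callSearchApi(settings, token, fetchImpl = fetch) {
  const url = `${settings.searchApiBaseUrl}/${settings.tenant}`;
  let response;
  try {
    response = await fetchImpl(url, {
      headers: { Authorization: `OIDC_id_token ${token}` },
      signal: AbortSignal.timeout(10000), // 10 000 ms, as in the diagram
    });
  } catch (err) {
    // An aborted or timed-out request maps to 504 Gateway Timeout.
    if (err && (err.name === 'TimeoutError' || err.name === 'AbortError')) {
      return { status: 504, body: 'Gateway Timeout' };
    }
    // Other transport failures: treated as 502 here (assumption).
    return { status: 502, body: 'Bad Gateway' };
  }
  // Any non-2xx upstream response maps to 502 Bad Gateway.
  if (!response.ok) return { status: 502, body: 'Bad Gateway' };
  return { status: 200, data: await response.json() };
}
```

Returning a plain `{ status, ... }` object instead of throwing keeps the status mapping in one place, so the caller only has to decide between building the sitemap and relaying the error.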

Non-Sitemap Request Lifecycle (unchanged)

All requests whose URL does not end with /sitemap.xml follow the existing OIDC auth flow exactly as before; that path is not modified.
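
The routing decision between the two lifecycles could be expressed as a single predicate. `isSitemapRequest` is a hypothetical name, and ignoring any query string or fragment before testing the suffix is an assumption; the spec only says the URL ends with /sitemap.xml.

```javascript
// Sketch only: decides which lifecycle a request follows.
function isSitemapRequest(requestUrl) {
  // Drop query string and fragment so only the path is tested (assumption).
  const path = requestUrl.split(/[?#]/)[0];
  return path.endsWith('/sitemap.xml');
}

console.log(isSitemapRequest('/kme/sitemap.xml'));       // true
console.log(isSitemapRequest('/kme/knowledge/doc-1'));   // false
```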


File Changes

Modified: src/globalVariables/kme_CSA_settings.json

Three new fields added (existing fields unchanged):

{
  "tokenUrl": "…",
  "username": "…",
  "password": "…",
  "clientId": "…",
  "scope": "…",
  "searchApiBaseUrl": "https://kme-search.example.com/api/search",
  "tenant": "my-tenant",
  "proxyBaseUrl": "https://adapter.example.com"
}

Modified: src/proxyScripts/kmeContentSourceAdapter.js

Logic added:

  1. URL routing guard at entry point.
  2. sitemapFlow async block: settings validation, token reuse, search API call, XML build, response.
  3. Existing OIDC auth flow moved to else branch (no logic changes).

Modified: src/globalVariables/kme_CSA_settings.json.example

Updated to include the three new fields with placeholder values.