feat(002): add sitemap generation feature
- Refactor kmeContentSourceAdapter.js into getValidToken(), oidcAuthFlow(), and sitemapFlow(); add sitemap generation using hydra:member response structure
- Add searchApiBaseUrl, tenant, proxyBaseUrl fields to kme_CSA_settings.json and kme_CSA_settings.json.example
- Add 17 unit tests for sitemap flow and non-sitemap routing regression
- Add 5 contract tests for sitemap endpoint (proxy-http.test.js)
- Add [Unreleased] sitemap entry to CHANGELOG.md
- Add full specs/002-sitemap-generation/ artifact directory (spec, plan, tasks, data-model, contracts, research, quickstart, checklist)
- Update constitution.md: add redis as permitted global, refresh kme_CSA_settings references
- Update copilot-instructions.md SPECKIT marker to sitemap plan
202
specs/002-sitemap-generation/data-model.md
Normal file
@@ -0,0 +1,202 @@
# Data Model: Sitemap XML Generation

**Feature**: `002-sitemap-generation`
**Branch**: `002-sitemap-generation`
**Date**: 2025-07-14

---

## Entities

### 1. `KnowledgeItem` (external, read-only)

Represents a single document returned by the KME Knowledge Search Service. The adapter reads
this shape from the upstream API response and never persists or mutates it.

| Field | Type | Source | Notes |
|---|---|---|---|
| `vkm:url` | `string \| undefined` | Search API response `items[]` | Canonical document URL. **Required** for sitemap inclusion. Items where this field is absent or empty are silently omitted (FR-006). |
| `title` | `string \| undefined` | Search API response | Not used by the sitemap; present in payload, ignored. |
| *(other fields)* | `any` | Search API response | Ignored; adapter reads only `vkm:url`. |

**Assumed response envelope** (to be verified against live API; see research.md R-002):
```json
{
  "items": [
    { "vkm:url": "https://kme.example.com/knowledge/doc-1", "title": "Doc One" },
    { "vkm:url": "https://kme.example.com/knowledge/doc-2", "title": "Doc Two" }
  ]
}
```
If the root is a bare array, `response.data` itself is treated as the items array.
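
A minimal sketch of how the two accepted payload shapes might be normalised before mapping. The helper names `extractItems` and `hasUsableUrl` are hypothetical, the `items` key is the assumed envelope from above (still to be verified against the live API), and treating any other shape as zero results is an assumption of this sketch rather than documented behaviour.

```javascript
// Hypothetical helper: return the items array from either accepted shape.
// Assumes the envelope { items: [...] } or a bare array at the root.
function extractItems(data) {
  if (Array.isArray(data)) return data;                      // bare array root
  if (data && Array.isArray(data.items)) return data.items;  // assumed envelope
  return [];                                                 // assumption: unknown shape means no results
}

// Hypothetical predicate: true only when vkm:url is a non-empty string (FR-006).
function hasUsableUrl(item) {
  return item != null && typeof item['vkm:url'] === 'string' && item['vkm:url'] !== '';
}

// Usage: items without a usable URL are dropped silently.
const usableItems = extractItems({
  items: [
    { 'vkm:url': 'https://kme.example.com/knowledge/doc-1', title: 'Doc One' },
    { title: 'No URL' },
  ],
}).filter(hasUsableUrl);
```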

---

### 2. `SitemapEntry` (derived, in-memory)

Represents a single `<url>/<loc>` entry in the generated sitemap XML. Derived from a `KnowledgeItem`
during the transformation step.

| Field | Type | Derivation |
|---|---|---|
| `loc` | `string` | `${kme_CSA_settings.proxyBaseUrl}?kmeURL=${encodeURIComponent(item['vkm:url'])}` |

**Validation rules**:
- Only produced if `item['vkm:url']` is a non-empty string.
- The resulting `loc` must be a percent-encoded absolute URL.
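
The derivation and validation rules above can be sketched as a single mapping function. `toSitemapEntry` is a hypothetical name, and returning `null` for skipped items is a choice made for this sketch so that callers can filter them out.

```javascript
// Hypothetical helper: derive a SitemapEntry from a KnowledgeItem.
// Returns null when vkm:url is absent or empty, so the item can be omitted.
function toSitemapEntry(item, proxyBaseUrl) {
  const url = item ? item['vkm:url'] : undefined;
  if (typeof url !== 'string' || url === '') return null;
  // encodeURIComponent percent-encodes ':' and '/', keeping the whole
  // upstream URL inside a single query parameter value.
  return { loc: `${proxyBaseUrl}?kmeURL=${encodeURIComponent(url)}` };
}

// Usage with the example values from this document.
const sampleEntries = [
  { 'vkm:url': 'https://kme.example.com/knowledge/doc-1', title: 'Doc One' },
  { title: 'No URL' },
]
  .map((item) => toSitemapEntry(item, 'https://adapter.example.com'))
  .filter((entry) => entry !== null);
// sampleEntries[0].loc:
// https://adapter.example.com?kmeURL=https%3A%2F%2Fkme.example.com%2Fknowledge%2Fdoc-1
```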

---

### 3. `SitemapDocument` (output)

The XML document returned in the HTTP response body.

| Attribute | Value |
|---|---|
| XML version | `1.0` |
| Encoding | `UTF-8` |
| Root element | `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">` |
| Child elements | Zero or more `<url><loc>…</loc></url>` entries |

**Populated sitemap**:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://adapter.example.com?kmeURL=https%3A%2F%2Fkme.example.com%2Fdoc-1</loc>
  </url>
  <url>
    <loc>https://adapter.example.com?kmeURL=https%3A%2F%2Fkme.example.com%2Fdoc-2</loc>
  </url>
</urlset>
```

**Empty sitemap** (zero results from search API):
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"/>
```
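
To illustrate how both output shapes could be produced from a list of entries, here is a sketch that uses plain string concatenation. It is a stand-in for illustration only and not the builder the adapter uses; `escapeXml` and `buildSitemapXml` are hypothetical names, and the escaping step is added here because a `loc` value containing `&` would otherwise produce invalid XML.

```javascript
// Escape the five XML special characters so a value is safe as element text.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical builder: entries is an array of { loc } objects.
function buildSitemapXml(entries) {
  const header = '<?xml version="1.0" encoding="UTF-8"?>';
  const ns = 'http://www.sitemaps.org/schemas/sitemap/0.9';
  if (entries.length === 0) {
    // Zero results: a self-closing root element, as in the empty example.
    return `${header}\n<urlset xmlns="${ns}"/>`;
  }
  const urls = entries.map(
    (entry) => `  <url>\n    <loc>${escapeXml(entry.loc)}</loc>\n  </url>`
  );
  return `${header}\n<urlset xmlns="${ns}">\n${urls.join('\n')}\n</urlset>`;
}
```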

---

### 4. `OIDCTokenCache` (shared, Redis)

The existing Redis-backed OIDC token store. The sitemap flow **reads** and **writes** this store
using the identical hGet/hSet pattern as the existing OIDC auth flow.

| Redis Key | Field | Type | Description |
|---|---|---|---|
| `authorization` | `token` | `string` | The OIDC `id_token` JWT |
| `authorization` | `expiry` | `string (float)` | Unix timestamp (seconds) when token expires |

**Access pattern in sitemap flow**:
1. `hGet('authorization', 'token')`: read cached token
2. `hGet('authorization', 'expiry')`: read cached expiry
3. If expired or absent: invoke token-refresh sequence → `hSet` both fields
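
The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the adapter's actual code: `getValidTokenSketch` is a hypothetical name, `redis` is assumed to be any client exposing async `hGet`/`hSet`, and `fetchToken` is an assumed callback resolving to `{ token, expiresInSeconds }`. How the real refresh sequence computes the expiry is not specified in this document.

```javascript
// Hypothetical sketch of the read-then-refresh access pattern.
async function getValidTokenSketch(redis, fetchToken, nowSeconds = Date.now() / 1000) {
  // Steps 1 and 2: read the cached token and its expiry (stored as a string).
  const token = await redis.hGet('authorization', 'token');
  const expiry = parseFloat(await redis.hGet('authorization', 'expiry'));

  // A missing field parses to NaN, which fails the isFinite check.
  if (token && Number.isFinite(expiry) && expiry > nowSeconds) {
    return token;
  }

  // Step 3: refresh, then write both fields back.
  const fresh = await fetchToken();
  await redis.hSet('authorization', 'token', fresh.token);
  await redis.hSet('authorization', 'expiry', String(nowSeconds + fresh.expiresInSeconds));
  return fresh.token;
}
```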

---

### 5. `kme_CSA_settings` (configuration, JSON)

The settings object injected into the VM context from `src/globalVariables/kme_CSA_settings.json`.
This feature extends it with three new fields.

**Full schema after this feature**:

| Field | Type | Existing/New | Required By |
|---|---|---|---|
| `tokenUrl` | `string` | Existing | OIDC token fetch (all flows) |
| `username` | `string` | Existing | OIDC token fetch |
| `password` | `string` | Existing | OIDC token fetch |
| `clientId` | `string` | Existing | OIDC token fetch |
| `scope` | `string` | Existing | OIDC token fetch |
| `searchApiBaseUrl` | `string` | **New** | FR-002, FR-010 |
| `tenant` | `string` | **New** | FR-002, FR-010 |
| `proxyBaseUrl` | `string` | **New** | FR-005, FR-010 |
| `_pendingFetch` | `Promise \| null` | Runtime only (not in JSON) | Stampede guard |

**Validation**:
- Existing fields validated at top of script for all requests (unchanged).
- New fields validated at start of sitemap branch only (FR-011).
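
As an illustration of the sitemap-only validation, the check could look like the sketch below. `missingSitemapSettings` is a hypothetical name, and treating a whitespace-only string as missing is an assumption of this sketch. A non-empty result would correspond to the 500 response in the sitemap request lifecycle.

```javascript
// The three fields introduced by this feature.
const SITEMAP_FIELDS = ['searchApiBaseUrl', 'tenant', 'proxyBaseUrl'];

// Hypothetical check: returns the names of sitemap-only settings that are
// missing or blank. Assumption: whitespace-only values count as missing.
function missingSitemapSettings(settings) {
  return SITEMAP_FIELDS.filter(
    (name) => typeof settings[name] !== 'string' || settings[name].trim() === ''
  );
}

// Usage: run only inside the sitemap branch, never for other requests.
const missingFields = missingSitemapSettings({ tenant: 'my-tenant' });
// missingFields lists 'searchApiBaseUrl' and 'proxyBaseUrl'
```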

---

## State Transitions

### Sitemap Request Lifecycle

```
Incoming GET /…/sitemap.xml
        |
        v
Validate settings       --> 500 Internal Server Error (missing field)
  (searchApiBaseUrl,
   tenant, proxyBaseUrl)
        |
        v
Read token from Redis
        |
     [valid?]
    YES | NO
    |       v
    |   Refresh token   --> 401 Unauthorized (token fetch failed)
    |       |
    +-------+
        v
GET <searchApiBaseUrl>/<tenant>
  Authorization: OIDC_id_token <token>
  timeout: 10 000 ms
        |
    [success?]
    YES | NO
    |   +--> timeout           --> 504 Gateway Timeout
    |   +--> non-2xx response  --> 502 Bad Gateway
    v
Map items --> SitemapEntry[]
  (skip empty vkm:url)
        |
        v
Build SitemapDocument (xmlBuilder)
        |
        v
200 OK
  Content-Type: application/xml
  Body: <?xml ...><urlset>...</urlset>
```
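
The failure branches of the search API call can be summarised in a small classifier. This is a sketch only: `statusForSearchOutcome` and the `{ timedOut, status }` outcome shape are hypothetical, and mapping a failure that carries no status code to 502 is an assumption that the lifecycle above does not state.

```javascript
// Hypothetical classifier: maps the outcome of the upstream search call to
// the HTTP status the adapter returns. Assumed outcome shape:
//   { timedOut: boolean, status?: number }
function statusForSearchOutcome(outcome) {
  if (outcome.timedOut) return 504; // 10 000 ms timeout elapsed
  const ok =
    typeof outcome.status === 'number' && outcome.status >= 200 && outcome.status <= 299;
  return ok ? 200 : 502; // non-2xx (or, by assumption, no status at all)
}
```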

### Non-Sitemap Request Lifecycle (unchanged)

All requests whose URL does NOT end with `/sitemap.xml` follow the existing OIDC auth flow
exactly as before; that path is not modified.
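
The routing decision reduces to a suffix test on the request URL. In the sketch below, `isSitemapRequest` is a hypothetical name, and ignoring any query string or fragment before testing the suffix is an assumption; the rule as written only says the URL ends with `/sitemap.xml`.

```javascript
// Hypothetical routing guard: true when the request path ends with /sitemap.xml.
// Assumption: a query string or fragment is ignored when testing the suffix.
function isSitemapRequest(requestUrl) {
  const path = requestUrl.split(/[?#]/)[0];
  return path.endsWith('/sitemap.xml');
}

// Usage: every other URL falls through to the unchanged OIDC auth flow.
const routeChosen = isSitemapRequest('/kme/sitemap.xml') ? 'sitemapFlow' : 'oidcAuthFlow';
```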

---

## File Changes

### Modified: `src/globalVariables/kme_CSA_settings.json`

Three new fields added (existing fields unchanged):

```json
{
  "tokenUrl": "…",
  "username": "…",
  "password": "…",
  "clientId": "…",
  "scope": "…",
  "searchApiBaseUrl": "https://kme-search.example.com/api/search",
  "tenant": "my-tenant",
  "proxyBaseUrl": "https://adapter.example.com"
}
```

### Modified: `src/proxyScripts/kmeContentSourceAdapter.js`

Logic added:
1. URL routing guard at entry point.
2. `sitemapFlow` async block: settings validation, token reuse, search API call, XML build, response.
3. Existing OIDC auth flow moved to `else` branch (no logic changes).

### Modified: `src/globalVariables/kme_CSA_settings.json.example`

Updated to include the three new fields with placeholder values.