# Data Model: Sitemap XML Generation

**Feature**: `002-sitemap-generation`
**Branch**: `002-sitemap-generation`
**Date**: 2025-07-14

---

## Entities
### 1. `KnowledgeItem` (external, read-only)

Represents a single document returned by the KME Knowledge Search Service. The adapter reads
this shape from the upstream API response and never persists or mutates it.

| Field | Type | Source | Notes |
|---|---|---|---|
| `vkm:url` | `string \| undefined` | Search API response `items[]` | Canonical document URL. **Required** for sitemap inclusion. Items where this field is absent or empty are silently omitted (FR-006). |
| `title` | `string \| undefined` | Search API response | Not used by the sitemap; present in payload, ignored. |
| *(other fields)* | `any` | Search API response | Ignored; adapter reads only `vkm:url`. |

**Assumed response envelope** (to be verified against live API; see research.md R-002):

```json
{
  "items": [
    { "vkm:url": "https://kme.example.com/knowledge/doc-1", "title": "Doc One" },
    { "vkm:url": "https://kme.example.com/knowledge/doc-2", "title": "Doc Two" }
  ]
}
```

If the root is a bare array, `response.data` itself is treated as the items array.
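
A minimal sketch of how the two assumed response shapes could be normalised into one array. The helper name `extractItems` is hypothetical, and returning an empty array for any other shape is an assumption, not something this document specifies.

```javascript
// Hypothetical helper: accepts the parsed response body (response.data)
// and returns the items array for either assumed shape.
function extractItems(data) {
  // Bare-array root: the body itself is the items array.
  if (Array.isArray(data)) return data;
  // Envelope root: { "items": [...] }.
  if (data && Array.isArray(data.items)) return data.items;
  // Any other shape is treated as "no results" (assumption).
  return [];
}

const fromEnvelope = extractItems({ items: [{ 'vkm:url': 'https://kme.example.com/knowledge/doc-1' }] });
const fromBareArray = extractItems([{ 'vkm:url': 'https://kme.example.com/knowledge/doc-2' }]);
```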

---

### 2. `SitemapEntry` (derived, in-memory)

Represents a single `<url>/<loc>` entry in the generated sitemap XML. Derived from a `KnowledgeItem`
during the transformation step.

| Field | Type | Derivation |
|---|---|---|
| `loc` | `string` | `${kme_CSA_settings.proxyBaseUrl}?kmeURL=${encodeURIComponent(item['vkm:url'])}` |

**Validation rules**:
- Only produced if `item['vkm:url']` is a non-empty string.
- The resulting `loc` must be a percent-encoded absolute URL.
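
The derivation and validation rules above can be sketched as a filter followed by a map. The function name `toSitemapEntries` is hypothetical; the `loc` template is the one given in the table.

```javascript
// Hypothetical helper: turns KnowledgeItem objects into SitemapEntry objects.
function toSitemapEntries(items, proxyBaseUrl) {
  return items
    // Skip items whose vkm:url is absent, not a string, or empty.
    .filter((item) => item && typeof item['vkm:url'] === 'string' && item['vkm:url'] !== '')
    // Percent-encode the KME URL and attach it as the kmeURL query parameter.
    .map((item) => ({
      loc: `${proxyBaseUrl}?kmeURL=${encodeURIComponent(item['vkm:url'])}`,
    }));
}

const entries = toSitemapEntries(
  [
    { 'vkm:url': 'https://kme.example.com/doc-1', title: 'Doc One' },
    { title: 'No URL' },
    { 'vkm:url': '' },
  ],
  'https://adapter.example.com'
);
// entries[0].loc === 'https://adapter.example.com?kmeURL=https%3A%2F%2Fkme.example.com%2Fdoc-1'
```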

---

### 3. `SitemapDocument` (output)

The XML document returned in the HTTP response body.

| Attribute | Value |
|---|---|
| XML version | `1.0` |
| Encoding | `UTF-8` |
| Root element | `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">` |
| Child elements | Zero or more `<url><loc>…</loc></url>` entries |

**Populated sitemap**:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://adapter.example.com?kmeURL=https%3A%2F%2Fkme.example.com%2Fdoc-1</loc>
  </url>
  <url>
    <loc>https://adapter.example.com?kmeURL=https%3A%2F%2Fkme.example.com%2Fdoc-2</loc>
  </url>
</urlset>
```

**Empty sitemap** (zero results from search API):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"/>
```
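
To make the two output shapes concrete, here is a dependency-free sketch that produces them with plain string concatenation. The adapter itself is specified to build the document with xmlbuilder2; this stand-in only illustrates the expected output, and the names `escapeXml` and `buildSitemapXml` are hypothetical.

```javascript
// Escape the five XML special characters so a loc value is safe as element text.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical stand-in for the xmlbuilder2 step: entries is SitemapEntry[].
function buildSitemapXml(entries) {
  const header = '<?xml version="1.0" encoding="UTF-8"?>';
  const ns = 'http://www.sitemaps.org/schemas/sitemap/0.9';
  // Zero entries: emit a self-closing root element.
  if (entries.length === 0) {
    return `${header}\n<urlset xmlns="${ns}"/>`;
  }
  const urls = entries.map(
    (entry) => `  <url>\n    <loc>${escapeXml(entry.loc)}</loc>\n  </url>`
  );
  return [header, `<urlset xmlns="${ns}">`, ...urls, '</urlset>'].join('\n');
}

const emptyXml = buildSitemapXml([]);
const populatedXml = buildSitemapXml([
  { loc: 'https://adapter.example.com?kmeURL=https%3A%2F%2Fkme.example.com%2Fdoc-1' },
]);
```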

---

### 4. `OIDCTokenCache` (shared, Redis)

The existing Redis-backed OIDC token store. The sitemap flow **reads** and **writes** this store
using the identical `hGet`/`hSet` pattern as the existing OIDC auth flow.

| Redis Key | Field | Type | Description |
|---|---|---|---|
| `authorization` | `token` | `string` | The OIDC `id_token` JWT |
| `authorization` | `expiry` | `string (float)` | Unix timestamp (seconds) when token expires |

**Access pattern in sitemap flow**:
1. `hGet('authorization', 'token')`: read cached token
2. `hGet('authorization', 'expiry')`: read cached expiry
3. If expired or absent: invoke token-refresh sequence → `hSet` both fields
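
The three steps above can be sketched against an in-memory stand-in for the Redis client. Everything here is illustrative: `createFakeRedis`, `getToken`, and the `refreshToken` callback (assumed to resolve to `{ token, expiry }`) are hypothetical names, and the real token-refresh sequence is not shown.

```javascript
// In-memory stand-in that exposes only the two hash calls the flow uses.
function createFakeRedis() {
  const hashes = new Map();
  return {
    async hGet(key, field) {
      const hash = hashes.get(key);
      return hash && hash.has(field) ? hash.get(field) : null;
    },
    async hSet(key, field, value) {
      if (!hashes.has(key)) hashes.set(key, new Map());
      hashes.get(key).set(field, String(value));
    },
  };
}

// Returns a usable token, refreshing and re-caching it when absent or expired.
async function getToken(redis, refreshToken, nowSeconds = Date.now() / 1000) {
  const token = await redis.hGet('authorization', 'token');   // step 1
  const expiry = await redis.hGet('authorization', 'expiry'); // step 2
  if (token && expiry && parseFloat(expiry) > nowSeconds) {
    return token; // cached token is still valid
  }
  // Step 3: refresh, then write both fields back to the shared hash.
  const fresh = await refreshToken();
  await redis.hSet('authorization', 'token', fresh.token);
  await redis.hSet('authorization', 'expiry', String(fresh.expiry));
  return fresh.token;
}
```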

---

### 5. `kme_CSA_settings` (configuration, JSON)

The settings object injected into the VM context from `src/globalVariables/kme_CSA_settings.json`.
This feature extends it with three new fields.

**Full schema after this feature**:

| Field | Type | Existing/New | Required By |
|---|---|---|---|
| `tokenUrl` | `string` | Existing | OIDC token fetch (all flows) |
| `username` | `string` | Existing | OIDC token fetch |
| `password` | `string` | Existing | OIDC token fetch |
| `clientId` | `string` | Existing | OIDC token fetch |
| `scope` | `string` | Existing | OIDC token fetch |
| `searchApiBaseUrl` | `string` | **New** | FR-002, FR-010 |
| `tenant` | `string` | **New** | FR-002, FR-010 |
| `proxyBaseUrl` | `string` | **New** | FR-005, FR-010 |
| `_pendingFetch` | `Promise \| null` | Runtime only (not in JSON) | Stampede guard |

**Validation**:
- Existing fields validated at top of script for all requests (unchanged).
- New fields validated at start of sitemap branch only (FR-011).
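
A minimal sketch of the sitemap-branch check for the three new fields. The helper name `missingSitemapSettings` is hypothetical, and treating a whitespace-only value as missing is an assumption.

```javascript
// The three fields this feature adds; checked only in the sitemap branch.
const SITEMAP_REQUIRED_FIELDS = ['searchApiBaseUrl', 'tenant', 'proxyBaseUrl'];

// Hypothetical helper: returns the names of required fields that are absent,
// not strings, or blank. An empty result means the settings are usable.
function missingSitemapSettings(settings) {
  return SITEMAP_REQUIRED_FIELDS.filter(
    (field) => typeof settings[field] !== 'string' || settings[field].trim() === ''
  );
}

const missing = missingSitemapSettings({
  searchApiBaseUrl: 'https://kme-search.example.com/api/search',
  tenant: '',
});
// missing is ['tenant', 'proxyBaseUrl']
```

A non-empty result would correspond to the `500 Internal Server Error (missing field)` exit in the request lifecycle.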

---

## State Transitions
### Sitemap Request Lifecycle

```
Incoming GET /…/sitemap.xml
        |
        v
Validate settings --> 500 Internal Server Error (missing field)
  (searchApiBaseUrl,
   tenant, proxyBaseUrl)
        |
        v
Read token from Redis
        |
    [valid?]
   YES  |  NO
    |       v
    |   Refresh token --> 401 Unauthorized (token fetch failed)
    |       |
    +-------+
        v
GET <searchApiBaseUrl>/<tenant>
  Authorization: OIDC_id_token <token>
  timeout: 10 000 ms
        |
    [success?]
   YES  |  NO
    |      +--> timeout --> 504 Gateway Timeout
    |      +--> non-2xx response --> 502 Bad Gateway
    v
Map items --> SitemapEntry[]
  (skip empty vkm:url)
        |
        v
Build SitemapDocument (xmlbuilder2)
        |
        v
200 OK
  Content-Type: application/xml
  Body: <?xml ...><urlset>...</urlset>
```
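
The two failure exits of the search call could be mapped to status codes roughly as follows. This sketch assumes an axios-style error object (a timeout carries `code: 'ECONNABORTED'`, an HTTP error carries `response.status`); the document does not name the HTTP client, and the fallback for other failures is an assumption.

```javascript
// Hypothetical helper: maps a failed search API call to the adapter's
// response status, following the lifecycle diagram.
function statusForSearchFailure(error) {
  // Request exceeded the 10 000 ms timeout.
  if (error && error.code === 'ECONNABORTED') return 504;
  // Upstream answered with a non-2xx status.
  if (error && error.response && typeof error.response.status === 'number') return 502;
  // Anything else (for example a connection failure) is not covered by the
  // diagram; 502 is assumed here.
  return 502;
}

const onTimeout = statusForSearchFailure({ code: 'ECONNABORTED' });
const onUpstreamError = statusForSearchFailure({ response: { status: 503 } });
```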
### Non-Sitemap Request Lifecycle (unchanged)

All requests whose URL does NOT end with `/sitemap.xml` follow the existing OIDC auth flow
exactly as before. No modification to that path.

---

## File Changes
### Modified: `src/globalVariables/kme_CSA_settings.json`

Three new fields added (existing fields unchanged):

```json
{
  "tokenUrl": "…",
  "username": "…",
  "password": "…",
  "clientId": "…",
  "scope": "…",
  "searchApiBaseUrl": "https://kme-search.example.com/api/search",
  "tenant": "my-tenant",
  "proxyBaseUrl": "https://adapter.example.com"
}
```
### Modified: `src/proxyScripts/kmeContentSourceAdapter.js`

Logic added:
1. URL routing guard at entry point.
2. `sitemapFlow` async block: settings validation, token reuse, search API call, XML build, response.
3. Existing OIDC auth flow moved to `else` branch (no logic changes).
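
The routing guard in step 1 might look like the sketch below. The function name `isSitemapRequest` is hypothetical, and ignoring the query string and fragment before testing the suffix is an assumption; the document only states that the URL ends with `/sitemap.xml`.

```javascript
// Hypothetical guard: true when the request path ends with /sitemap.xml.
function isSitemapRequest(requestUrl) {
  // Drop any query string or fragment before checking the suffix (assumption).
  const path = requestUrl.split('?')[0].split('#')[0];
  return path.endsWith('/sitemap.xml');
}

const routes = [
  isSitemapRequest('/adapter/sitemap.xml'), // true: sitemap branch
  isSitemapRequest('/adapter/other'),       // false: existing OIDC auth flow
];
```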
### Modified: `src/globalVariables/kme_CSA_settings.json.example`

Updated to include the three new fields with placeholder values.