From 7d6637effa1a569db12e9b9f5124d70ea07d827a Mon Sep 17 00:00:00 2001
From: "Peter.Morton"
Date: Thu, 23 Apr 2026 19:11:36 -0500
Subject: [PATCH] docs: update README for v0.4.0

- Add Endpoints section documenting /sitemap.xml, /?kmeURL=, and 404 fallback
- Expand settings table with searchApiBaseUrl and tenant fields
- Update file tree to reflect kmeContentSourceAdapterHelpers.js
- Add Helpers section documenting each exported function
- Expand VM context globals table with helpers and correct xmlbuilder2 usage
- Note dynamic proxyBaseUrl derivation from request headers
- Add stampede guard detail to Token Caching section

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 README.md | 105 ++++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 94 insertions(+), 11 deletions(-)

diff --git a/README.md b/README.md
index b92bdf3..a064dc3 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
 # kme-content-adapter
 
-An HTTP proxy adapter that authenticates against KME and proxies content requests through an isolated VM sandbox, mirroring the IVA Studio proxy script execution environment.
+An HTTP proxy adapter that searches and fetches content from KME (Knowledge Management Engine) and exposes it as a Sitemaps-compliant XML feed and individual HTML article pages. Business logic runs in an isolated Node.js VM sandbox, mirroring the IVA Studio proxy script execution environment.
 
 ## Requirements
 
@@ -20,7 +20,7 @@ cp src/globalVariables/kme_CSA_settings.json.example src/globalVariables/kme_CSA
 
 ### `src/globalVariables/kme_CSA_settings.json`
 
-Credentials and OIDC settings — **never commit this file**.
+Credentials and API settings — **never commit this file**.
 
 ```json
 {
@@ -28,10 +28,21 @@ Credentials and OIDC settings — **never commit this file**.
   "username": "",
   "password": "",
   "clientId": "default",
-  "scope": "openid tags content_entitlements"
+  "scope": "openid tags content_entitlements",
+  "searchApiBaseUrl": "https:///km-search-service",
+  "tenant": ""
 }
 ```
 
+| Field | Description |
+|---|---|
+| `tokenUrl` | OIDC token endpoint |
+| `username` / `password` | KME credentials |
+| `clientId` | OAuth client ID (usually `default`) |
+| `scope` | OAuth scopes |
+| `searchApiBaseUrl` | KME Knowledge Search Service base URL |
+| `tenant` | KME tenant/environment path segment (e.g. `qa`) |
+
 ### `config/default.json`
 
 Infrastructure settings (port, host, log level). Override with environment variables:
@@ -42,6 +53,60 @@ Infrastructure settings (port, host, log level). Override with environment varia
 | `HOST` | `0.0.0.0` | Bind address |
 | `LOG_LEVEL` | `debug` | Log level: `DEBUG`, `INFO`, `WARN`, `ERROR` |
 
+## Endpoints
+
+### `GET /sitemap.xml`
+
+Returns a [Sitemaps protocol 0.9](https://www.sitemaps.org/protocol.html) XML document. Each `` points back to this adapter's content fetch endpoint so crawlers can retrieve individual articles.
+
+**Query parameters** (all optional):
+
+| Parameter | Default | Description |
+|---|---|---|
+| `query` | `*` | KME search query string |
+| `size` | `100` | Max results per search page |
+| `category` | `vkm:ArticleCategory` | KME category filter |
+
+Results are paginated automatically using `hydra:view['hydra:last']`. The response is capped at **50,000 URLs** per the Sitemaps protocol.
+
+```
+GET /sitemap.xml?query=temple&size=50&category=vkm:ArticleCategory
+```
+
+### `GET /?kmeURL=`
+
+Fetches a single KME article by its upstream URL and returns it as a full HTML document.
+
+```
+GET /?kmeURL=https%3A%2F%2F%2Fkm-content-service%2F...
+```
+
+**Response:** `200 text/html; charset=utf-8` — a complete HTML document:
+
+```html
+
+
+Article Title from vkm:name
+
+
+
+
+```
+
+**Error responses:**
+
+| Status | Cause |
+|---|---|
+| `400` | `kmeURL` missing, blank, malformed, or non-http/https |
+| `404` | Upstream returned 4xx, or article body absent in response |
+| `502` | Token acquisition failed, upstream 5xx, network error, or timeout |
+
+### `GET /*` (anything else)
+
+Returns `404 Not Found`.
+
+---
+
 ## Running
 
 ```bash
@@ -70,14 +135,15 @@ The server loads `src/proxyScripts/kmeContentSourceAdapter.js` once at startup v
 ```
 src/
 ├── proxyScripts/
-│   └── kmeContentSourceAdapter.js   # All business logic (zero imports/exports)
+│   └── kmeContentSourceAdapter.js         # All business logic (zero imports/exports)
 ├── globalVariables/
-│   ├── kme_CSA_settings.json        # OIDC credentials (gitignored)
-│   └── adapterHelper.js             # Pure utilities (optional)
-├── logger.js                        # Structured JSON logger
-└── server.js                        # HTTP server bootstrap only
+│   ├── kme_CSA_settings.json              # Credentials & API config (gitignored)
+│   ├── kme_CSA_settings.json.example      # Template for version control
+│   └── kmeContentSourceAdapterHelpers.js  # Pure utilities (literal function body)
+├── logger.js                              # Structured JSON logger
+└── server.js                              # HTTP server bootstrap only
 config/
-└── default.json                     # Infrastructure settings
+└── default.json                           # Infrastructure settings
 ```
 
 ### VM Context Globals
@@ -91,10 +157,11 @@ All dependencies are injected into each request's sandbox:
 | `axios` | HTTP client |
 | `jwt` | `jsonwebtoken` |
 | `uuidv4` | UUID v4 generator |
-| `xmlbuilder2` | `xmlbuilder2` `create` |
+| `xmlbuilder2` | `xmlbuilder2` default export (call as `xmlbuilder2.create(...)`) |
 | `redis` | Connected Redis client |
 | `URLSearchParams`, `URL` | Node.js globals |
 | `kme_CSA_settings` | Loaded from `src/globalVariables/kme_CSA_settings.json` |
+| `kmeContentSourceAdapterHelpers` | Loaded from `src/globalVariables/kmeContentSourceAdapterHelpers.js` |
 | `req`, `res` | Node.js HTTP request/response |
 
 ### Key Constraints for `kmeContentSourceAdapter.js`
@@ -102,11 +169,27 @@ All dependencies are injected into each request's sandbox:
 - **Zero `import`/`export`** — runs in a VM with no module system
 - **No `config`, `global.config`, or `process.env`** — use injected globals only
 - Routing metadata is available via `req.params` (set by `server.js`)
+- `proxyBaseUrl` is derived dynamically from request headers (`x-forwarded-proto`, `x-forwarded-host`, `host`) — not read from settings
 
 ## Token Caching
 
-OIDC tokens are cached in Redis under the hash key `authorization` (fields `token` and `expiry`). The cache survives adapter restarts. Token expiry is stored as an absolute Unix epoch timestamp.
+OIDC tokens are cached in Redis under the hash key `authorization` (fields `token` and `expiry`). The cache survives adapter restarts. Token expiry is stored as an absolute Unix epoch timestamp. A stampede guard ensures only one token fetch is in flight at a time when multiple concurrent requests encounter a cache miss.
+
+## Helpers (`kmeContentSourceAdapterHelpers.js`)
+
+A pure-utility module injected into the VM context. Key functions:
+
+| Function | Description |
+|---|---|
+| `getValidToken(reqUrl, reqMethod)` | Returns a cached or freshly-fetched OIDC `id_token`; throws on failure |
+| `extractHydraItems(data)` | Extracts one fragment per `SearchResultItem` — the one with the latest `vkm:datePublished` |
+| `buildSitemapXml(items, proxyBaseUrl)` | Builds Sitemaps 0.9 XML from an array of fragments |
+| `extractArticleBody(data)` | Returns `vkm:articleBody` (or `articleBody` fallback) from a content API response |
+| `validateSettings(settings, fields)` | Returns the first missing required field name, or `null` |
+
+> **Note:** This file is a literal function body — `server.js` wraps it as `(function() { })()`. It must end with a bare `return { ... }` and contain zero `import`/`export` statements.
 
 ## Changelog
 
 See [CHANGELOG.md](CHANGELOG.md).
+
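
Review note on the `proxyBaseUrl` bullet under Key Constraints: the README names the three headers involved (`x-forwarded-proto`, `x-forwarded-host`, `host`) but not how they are combined. The block below is only a sketch of one plausible derivation, not the adapter's actual code. The function name `deriveProxyBaseUrl`, the precedence of forwarded headers over `host`, the `http` fallback scheme, and the `null` return for a missing host are all assumptions made for illustration.

```javascript
// Hypothetical helper, not taken from the adapter source: derives the public
// base URL of the adapter from the incoming request's headers.
function deriveProxyBaseUrl(req) {
  const headers = (req && req.headers) || {};

  // Proxies can append values as a comma-separated list; keep the first entry.
  const first = (value) =>
    typeof value === 'string' ? value.split(',')[0].trim() : '';

  // Assumed precedence: forwarded headers win over the direct Host header.
  // The 'http' default is also an assumption.
  const proto = first(headers['x-forwarded-proto']) || 'http';
  const host = first(headers['x-forwarded-host']) || first(headers['host']);

  // Without any host information there is nothing sensible to build.
  return host ? `${proto}://${host}` : null;
}

// Usage with plain objects standing in for Node.js request objects.
const behindProxy = deriveProxyBaseUrl({
  headers: {
    'x-forwarded-proto': 'https',
    'x-forwarded-host': 'kb.example.com',
    host: '10.0.0.5:3000',
  },
});
const direct = deriveProxyBaseUrl({ headers: { host: 'localhost:3000' } });
console.log(behindProxy, direct);
```

If the real implementation resolves the headers in a different order, it may be worth stating that order in the README, since it determines which URLs end up in the sitemap when the adapter runs behind a reverse proxy.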