docs: update README for v0.4.0

- Add Endpoints section documenting /sitemap.xml, /?kmeURL=, and 404 fallback
- Expand settings table with searchApiBaseUrl and tenant fields
- Update file tree to reflect kmeContentSourceAdapterHelpers.js
- Add Helpers section documenting each exported function
- Expand VM context globals table with helpers and correct xmlbuilder2 usage
- Note dynamic proxyBaseUrl derivation from request headers
- Add stampede guard detail to Token Caching section

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-23 19:11:36 -05:00
parent 6b171c2b45
commit 7d6637effa

105
README.md

@@ -1,6 +1,6 @@
# kme-content-adapter # kme-content-adapter
An HTTP proxy adapter that authenticates against KME and proxies content requests through an isolated VM sandbox, mirroring the IVA Studio proxy script execution environment. An HTTP proxy adapter that searches and fetches content from KME (Knowledge Management Engine) and exposes it as a Sitemaps-compliant XML feed and individual HTML article pages. Business logic runs in an isolated Node.js VM sandbox, mirroring the IVA Studio proxy script execution environment.
## Requirements ## Requirements
@@ -20,7 +20,7 @@ cp src/globalVariables/kme_CSA_settings.json.example src/globalVariables/kme_CSA
### `src/globalVariables/kme_CSA_settings.json` ### `src/globalVariables/kme_CSA_settings.json`
Credentials and OIDC settings — **never commit this file**. Credentials and API settings — **never commit this file**.
```json ```json
{ {
@@ -28,10 +28,21 @@ Credentials and OIDC settings — **never commit this file**.
"username": "<username>", "username": "<username>",
"password": "<password>", "password": "<password>",
"clientId": "default", "clientId": "default",
"scope": "openid tags content_entitlements" "scope": "openid tags content_entitlements",
"searchApiBaseUrl": "https://<host>/km-search-service",
"tenant": "<env>"
} }
``` ```
| Field | Description |
|---|---|
| `tokenUrl` | OIDC token endpoint |
| `username` / `password` | KME credentials |
| `clientId` | OAuth client ID (usually `default`) |
| `scope` | OAuth scopes |
| `searchApiBaseUrl` | KME Knowledge Search Service base URL |
| `tenant` | KME tenant/environment path segment (e.g. `qa`) |
### `config/default.json` ### `config/default.json`
Infrastructure settings (port, host, log level). Override with environment variables: Infrastructure settings (port, host, log level). Override with environment variables:
@@ -42,6 +53,60 @@ Infrastructure settings (port, host, log level). Override with environment varia
| `HOST` | `0.0.0.0` | Bind address | | `HOST` | `0.0.0.0` | Bind address |
| `LOG_LEVEL` | `debug` | Log level: `DEBUG`, `INFO`, `WARN`, `ERROR` | | `LOG_LEVEL` | `debug` | Log level: `DEBUG`, `INFO`, `WARN`, `ERROR` |
## Endpoints
### `GET /sitemap.xml`
Returns a [Sitemaps protocol 0.9](https://www.sitemaps.org/protocol.html) XML document. Each `<loc>` points back to this adapter's content fetch endpoint so crawlers can retrieve individual articles.
**Query parameters** (all optional):
| Parameter | Default | Description |
|---|---|---|
| `query` | `*` | KME search query string |
| `size` | `100` | Max results per search page |
| `category` | `vkm:ArticleCategory` | KME category filter |
Results are paginated automatically using `hydra:view['hydra:last']`. The response is capped at **50,000 URLs** per the Sitemaps protocol.
```
GET /sitemap.xml?query=temple&size=50&category=vkm:ArticleCategory
```
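The defaults and the URL cap above could be applied roughly as follows. This is a minimal sketch, not the adapter's code: `parseSitemapParams`, `capUrls`, and `MAX_SITEMAP_URLS` are hypothetical names, and falling back to the default for a non-numeric or non-positive `size` is an assumption.

```javascript
// Hypothetical helpers; names and fallback behavior are assumptions.
const MAX_SITEMAP_URLS = 50000; // cap stated in this README

function parseSitemapParams(requestUrl) {
  // A dummy base lets URL parse a path-only request target such as req.url.
  const { searchParams } = new URL(requestUrl, 'http://localhost');
  const size = Number.parseInt(searchParams.get('size') ?? '', 10);
  return {
    query: searchParams.get('query') || '*',
    // Assumption: invalid or non-positive sizes fall back to the default.
    size: Number.isInteger(size) && size > 0 ? size : 100,
    category: searchParams.get('category') || 'vkm:ArticleCategory',
  };
}

function capUrls(urls) {
  // Keep at most the first 50,000 entries.
  return urls.slice(0, MAX_SITEMAP_URLS);
}
```

Calling `parseSitemapParams('/sitemap.xml')` yields the three defaults from the table, while any parameter present in the query string overrides its default.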
### `GET /?kmeURL=<upstream-article-url>`
Fetches a single KME article by its upstream URL and returns it as a full HTML document.
```
GET /?kmeURL=https%3A%2F%2F<kme-host>%2Fkm-content-service%2F...
```
**Response:** `200 text/html; charset=utf-8` — a complete HTML document:
```html
<!DOCTYPE html>
<html>
<head><title>Article Title from vkm:name</title></head>
<body>
<!-- vkm:articleBody content verbatim -->
</body>
</html>
```
**Error responses:**
| Status | Cause |
|---|---|
| `400` | `kmeURL` missing, blank, malformed, or non-http/https |
| `404` | Upstream returned 4xx, or article body absent in response |
| `502` | Token acquisition failed, upstream 5xx, network error, or timeout |
### `GET /*` (anything else)
Returns `404 Not Found`.
---
## Running ## Running
```bash ```bash
@@ -70,14 +135,15 @@ The server loads `src/proxyScripts/kmeContentSourceAdapter.js` once at startup v
``` ```
src/ src/
├── proxyScripts/ ├── proxyScripts/
│ └── kmeContentSourceAdapter.js # All business logic (zero imports/exports) │ └── kmeContentSourceAdapter.js # All business logic (zero imports/exports)
├── globalVariables/ ├── globalVariables/
│ ├── kme_CSA_settings.json # OIDC credentials (gitignored) │ ├── kme_CSA_settings.json # Credentials & API config (gitignored)
│ └── adapterHelper.js # Pure utilities (optional) │ ├── kme_CSA_settings.json.example # Template for version control
├── logger.js # Structured JSON logger │ └── kmeContentSourceAdapterHelpers.js # Pure utilities (literal function body)
└── server.js # HTTP server bootstrap only ├── logger.js # Structured JSON logger
└── server.js # HTTP server bootstrap only
config/ config/
└── default.json # Infrastructure settings └── default.json # Infrastructure settings
``` ```
### VM Context Globals ### VM Context Globals
@@ -91,10 +157,11 @@ All dependencies are injected into each request's sandbox:
| `axios` | HTTP client | | `axios` | HTTP client |
| `jwt` | `jsonwebtoken` | | `jwt` | `jsonwebtoken` |
| `uuidv4` | UUID v4 generator | | `uuidv4` | UUID v4 generator |
| `xmlbuilder2` | `xmlbuilder2` `create` | | `xmlbuilder2` | `xmlbuilder2` default export (call as `xmlbuilder2.create(...)`) |
| `redis` | Connected Redis client | | `redis` | Connected Redis client |
| `URLSearchParams`, `URL` | Node.js globals | | `URLSearchParams`, `URL` | Node.js globals |
| `kme_CSA_settings` | Loaded from `src/globalVariables/kme_CSA_settings.json` | | `kme_CSA_settings` | Loaded from `src/globalVariables/kme_CSA_settings.json` |
| `kmeContentSourceAdapterHelpers` | Loaded from `src/globalVariables/kmeContentSourceAdapterHelpers.js` |
| `req`, `res` | Node.js HTTP request/response | | `req`, `res` | Node.js HTTP request/response |
### Key Constraints for `kmeContentSourceAdapter.js` ### Key Constraints for `kmeContentSourceAdapter.js`
@@ -102,11 +169,27 @@ All dependencies are injected into each request's sandbox:
- **Zero `import`/`export`** — runs in a VM with no module system - **Zero `import`/`export`** — runs in a VM with no module system
- **No `config`, `global.config`, or `process.env`** — use injected globals only - **No `config`, `global.config`, or `process.env`** — use injected globals only
- Routing metadata is available via `req.params` (set by `server.js`) - Routing metadata is available via `req.params` (set by `server.js`)
- `proxyBaseUrl` is derived dynamically from request headers (`x-forwarded-proto`, `x-forwarded-host`, `host`) — not read from settings
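Deriving `proxyBaseUrl` from the three headers named above might look like the sketch below. `deriveProxyBaseUrl` is a hypothetical name; the precedence of `x-forwarded-host` over `host`, the `http` fallback, and taking the first value of a comma-separated header are all assumptions.

```javascript
// Hypothetical helper; precedence and fallbacks are assumptions.
function deriveProxyBaseUrl(headers) {
  // Proxies may append values, e.g. "https, http"; use the first one.
  const first = (value) => String(value || '').split(',')[0].trim();
  const proto = first(headers['x-forwarded-proto']) || 'http';
  const host = first(headers['x-forwarded-host']) || first(headers['host']);
  if (!host) return null; // nothing to derive a base URL from
  return `${proto}://${host}`;
}
```

Node.js lowercases incoming header names, so `req.headers` can be passed in directly.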
## Token Caching ## Token Caching
OIDC tokens are cached in Redis under the hash key `authorization` (fields `token` and `expiry`). The cache survives adapter restarts. Token expiry is stored as an absolute Unix epoch timestamp. OIDC tokens are cached in Redis under the hash key `authorization` (fields `token` and `expiry`). The cache survives adapter restarts. Token expiry is stored as an absolute Unix epoch timestamp. A stampede guard ensures only one token fetch is in flight at a time when multiple concurrent requests encounter a cache miss.
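One common way to build such a stampede guard is to share a single in-flight promise among concurrent callers. The sketch below shows only that pattern and is not the adapter's implementation: `createTokenGuard` and `fetchToken` are hypothetical names, and the Redis read and write steps are omitted.

```javascript
// Hypothetical sketch: fetchToken stands in for the real OIDC token request.
function createTokenGuard(fetchToken) {
  let inFlight = null; // promise shared by callers while a fetch is running

  return function getToken() {
    if (!inFlight) {
      inFlight = Promise.resolve()
        .then(fetchToken)
        // Clear the slot on success or failure so a later miss can retry.
        .finally(() => { inFlight = null; });
    }
    return inFlight;
  };
}
```

Every caller that arrives while a fetch is running receives the same promise, so the token endpoint is hit once per cache miss rather than once per request. A failed fetch rejects for all waiting callers, and the next call starts a fresh attempt.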
## Helpers (`kmeContentSourceAdapterHelpers.js`)
A pure-utility module injected into the VM context. Key functions:
| Function | Description |
|---|---|
| `getValidToken(reqUrl, reqMethod)` | Returns a cached or freshly-fetched OIDC `id_token`; throws on failure |
| `extractHydraItems(data)` | Extracts one fragment per `SearchResultItem` — the one with the latest `vkm:datePublished` |
| `buildSitemapXml(items, proxyBaseUrl)` | Builds Sitemaps 0.9 XML from an array of fragments |
| `extractArticleBody(data)` | Returns `vkm:articleBody` (or `articleBody` fallback) from a content API response |
| `validateSettings(settings, fields)` | Returns the first missing required field name, or `null` |
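Loading a file that is a bare function body, rather than a module, can be sketched with Node's built-in `vm` module. Everything here is illustrative: `loadHelpers` is a hypothetical name, `helperSource` stands in for the file's contents, and the `validateSettings` body is an assumed implementation of the behavior described in the table (treating `null`, `undefined`, and empty strings as missing).

```javascript
const vm = require('vm');

// Stand-in for the helpers file: a function body ending in a bare `return`.
const helperSource = `
  function validateSettings(settings, fields) {
    for (const field of fields) {
      const value = settings == null ? undefined : settings[field];
      // Assumption: null, undefined, and '' all count as missing.
      if (value == null || value === '') return field;
    }
    return null;
  }
  return { validateSettings };
`;

function loadHelpers(source, sandbox = {}) {
  // Wrap the body in an immediately invoked function, then evaluate it.
  return vm.runInNewContext(`(function() {${source}\n})()`, sandbox);
}

const helpers = loadHelpers(helperSource);
```

With this sketch, `helpers.validateSettings({ tokenUrl: 'x' }, ['tokenUrl', 'tenant'])` returns `'tenant'`. A top-level `import` or `export` inside the body would be a syntax error once wrapped in a function, which is consistent with the file having to avoid them.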
> **Note:** This file is a literal function body — `server.js` wraps it as `(function() { <file> })()`. It must end with a bare `return { ... }` and contain zero `import`/`export` statements.
## Changelog ## Changelog
See [CHANGELOG.md](CHANGELOG.md). See [CHANGELOG.md](CHANGELOG.md).