kme-content-adapter
An HTTP proxy adapter that searches and fetches content from KME (Knowledge Management Engine) and exposes it as a Sitemaps-compliant XML feed and individual HTML article pages. Business logic runs in an isolated Node.js VM sandbox, mirroring the IVA Studio proxy script execution environment.
Requirements
- Node.js ≥ 18
- Redis (used for token caching)
- jq (optional; used by `npm start` for log pretty-printing)
Setup
```bash
npm install
cp src/globalVariables/kme_CSA_settings.json.example src/globalVariables/kme_CSA_settings.json
# Edit kme_CSA_settings.json with real credentials
```
Configuration
src/globalVariables/kme_CSA_settings.json
Credentials and API settings — never commit this file.
```json
{
  "tokenUrl": "https://<host>/oidc-token-service/<env>/token",
  "username": "<username>",
  "password": "<password>",
  "clientId": "default",
  "scope": "openid tags content_entitlements",
  "searchApiBaseUrl": "https://<host>/km-search-service",
  "tenant": "<env>"
}
```
| Field | Description |
|---|---|
| `tokenUrl` | OIDC token endpoint |
| `username` / `password` | KME credentials |
| `clientId` | OAuth client ID (usually `default`) |
| `scope` | OAuth scopes |
| `searchApiBaseUrl` | KME Knowledge Search Service base URL |
| `tenant` | KME tenant/environment path segment (e.g. `qa`) |
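These fields are what an OIDC token request needs. The sketch below shows one plausible way to assemble that request; it assumes an OAuth 2.0 password grant (`grant_type=password`) with a form-encoded body, and the function name `buildTokenRequest` is hypothetical. Check the adapter's actual `getValidToken` before relying on these details.

```javascript
// Hypothetical helper: turns kme_CSA_settings fields into a token request.
// ASSUMPTION: the token service accepts an OAuth 2.0 password grant with
// form-encoded parameters; verify against the real adapter.
function buildTokenRequest(settings) {
  const body = new URLSearchParams({
    grant_type: "password", // assumed grant type
    username: settings.username,
    password: settings.password,
    client_id: settings.clientId,
    scope: settings.scope,
  });
  return {
    url: settings.tokenUrl,
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: body.toString(), // spaces in scope are encoded as "+"
  };
}

// Usage with placeholder values (not real credentials):
const request = buildTokenRequest({
  tokenUrl: "https://example.invalid/oidc-token-service/qa/token",
  username: "alice",
  password: "s3cret",
  clientId: "default",
  scope: "openid tags content_entitlements",
});
console.log(request.body);
// → grant_type=password&username=alice&password=s3cret&client_id=default&scope=openid+tags+content_entitlements
```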
config/default.json
Infrastructure settings (port, host, log level). Override with environment variables:
| Variable | Default | Description |
|---|---|---|
| `PORT` | `3000` | HTTP server port |
| `HOST` | `0.0.0.0` | Bind address |
| `LOG_LEVEL` | `debug` | Log level: `DEBUG`, `INFO`, `WARN`, `ERROR` |
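A minimal sketch of the override rule (environment variable first, then `config/default.json`). The `resolveInfraConfig` name and the `{ port, host, logLevel }` shape of the defaults object are assumptions made for illustration; the real server may merge configuration differently.

```javascript
// Hypothetical resolver: an environment variable, when set, overrides the
// matching value from config/default.json (assumed shape: { port, host, logLevel }).
function resolveInfraConfig(defaults, env) {
  const port = Number.parseInt(env.PORT ?? "", 10); // NaN when unset or non-numeric
  return {
    port: Number.isInteger(port) ? port : defaults.port,
    host: env.HOST || defaults.host,
    logLevel: env.LOG_LEVEL || defaults.logLevel,
  };
}

const defaults = { port: 3000, host: "0.0.0.0", logLevel: "debug" };
console.log(resolveInfraConfig(defaults, { PORT: "8080" }));
// → { port: 8080, host: '0.0.0.0', logLevel: 'debug' }
```

In a running server the second argument would be `process.env`.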
Endpoints
GET /sitemap.xml
Returns a Sitemaps protocol 0.9 XML document. Each `<loc>` points back to this adapter's content fetch endpoint (`/?kmeURL=`) so crawlers can retrieve individual articles.
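To make the `<loc>` round-trip concrete, here is a dependency-free sketch that percent-encodes each upstream article URL into the `kmeURL` parameter and XML-escapes the result. The adapter itself builds XML with `xmlbuilder2`; plain string templating is used here only so the example runs without packages, and `toLoc`/`sitemapXml` are hypothetical names.

```javascript
// Escape the five XML special characters.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build the adapter URL a crawler will follow for one article (hypothetical helper).
function toLoc(proxyBaseUrl, upstreamUrl) {
  return `${proxyBaseUrl}/?kmeURL=${encodeURIComponent(upstreamUrl)}`;
}

// Minimal Sitemaps 0.9 document: <loc> only, no lastmod/changefreq/priority.
function sitemapXml(proxyBaseUrl, upstreamUrls) {
  const entries = upstreamUrls
    .map((u) => `  <url><loc>${escapeXml(toLoc(proxyBaseUrl, u))}</loc></url>`)
    .join("\n");
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    entries,
    "</urlset>",
  ].join("\n");
}

console.log(sitemapXml("https://adapter.example", ["https://kme.example/a?id=1&v=2"]));
```

Percent-encoding matters here: an upstream URL containing `?` or `&` would otherwise be split into separate query parameters of the adapter URL.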
Query parameters (all optional):
| Parameter | Default | Description |
|---|---|---|
| `query` | `*` | KME search query string |
| `size` | `100` | Max results per search page |
| `category` | `vkm:ArticleCategory` | KME category filter |
Results are paginated automatically using `hydra:view['hydra:last']`. The response is capped at 50,000 URLs, the maximum the Sitemaps protocol allows in a single sitemap file.

```http
GET /sitemap.xml?query=temple&size=50&category=vkm:ArticleCategory
```
GET /?kmeURL=<upstream-article-url>
Fetches a single KME article by its upstream URL and returns it as a full HTML document.
```http
GET /?kmeURL=https%3A%2F%2F<kme-host>%2Fkm-content-service%2F...
```
Response: `200` with `text/html; charset=utf-8`, containing a complete HTML document:
```html
<!DOCTYPE html>
<html>
<head><title>Article Title from vkm:name</title></head>
<body>
<!-- vkm:articleBody content verbatim -->
</body>
</html>
```
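The response document above can be produced with a small template. This sketch HTML-escapes the title but inserts the article body verbatim, matching the description; whether the real adapter escapes the title is not stated, so treat that step as an assumption. `renderArticleHtml` is a hypothetical name.

```javascript
// Escape text for use inside an HTML element such as <title>.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Title is escaped (assumption); articleBody is inserted verbatim.
function renderArticleHtml(title, articleBody) {
  return [
    "<!DOCTYPE html>",
    "<html>",
    `<head><title>${escapeHtml(title)}</title></head>`,
    "<body>",
    articleBody,
    "</body>",
    "</html>",
  ].join("\n");
}

const html = renderArticleHtml("Q&A: Temple hours", "<p>Open daily.</p>");
// In a handler: res.writeHead(200, { "Content-Type": "text/html; charset=utf-8" }); res.end(html);
console.log(html);
```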
Error responses:
| Status | Cause |
|---|---|
| `400` | `kmeURL` missing, blank, malformed, or not an `http`/`https` URL |
| `404` | Upstream returned 4xx, or article body absent from the response |
| `502` | Token acquisition failed, upstream 5xx, network error, or timeout |
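The 400 checks and the 404/502 split in the table can be expressed as two small functions. This is a sketch under stated assumptions: `parseKmeUrl` and `errorStatus` are hypothetical names, and the `{ tokenFailed, upstreamStatus, bodyMissing }` outcome object is invented for illustration.

```javascript
// Returns a URL object for a usable kmeURL, or null (caller responds 400).
function parseKmeUrl(requestUrl) {
  // req.url is path-only, so a placeholder base is needed to parse it.
  const raw = new URL(requestUrl, "http://placeholder.invalid").searchParams.get("kmeURL");
  if (raw === null || raw.trim() === "") return null; // missing or blank
  let parsed;
  try {
    parsed = new URL(raw.trim());
  } catch {
    return null; // malformed
  }
  return parsed.protocol === "http:" || parsed.protocol === "https:" ? parsed : null;
}

// Hypothetical outcome object -> status code, following the table above.
function errorStatus({ tokenFailed = false, upstreamStatus, bodyMissing = false }) {
  if (tokenFailed) return 502; // token acquisition failed
  if (bodyMissing) return 404; // article body absent
  if (upstreamStatus >= 400 && upstreamStatus < 500) return 404; // upstream 4xx
  return 502; // upstream 5xx, or no status at all (network error, timeout)
}
```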
GET /* (anything else)
Returns 404 Not Found.
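A dispatcher for the three routes could be as simple as the following sketch. `routeFor` is a hypothetical name, and sending non-GET methods to 404 is an assumption, since only GET is documented.

```javascript
// Hypothetical dispatcher for the documented routes.
// ASSUMPTION: non-GET methods fall through to 404 like unknown paths.
function routeFor(method, requestUrl) {
  if (method !== "GET") return "notFound";
  const { pathname } = new URL(requestUrl, "http://placeholder.invalid");
  if (pathname === "/sitemap.xml") return "sitemap";
  if (pathname === "/") return "article"; // the handler returns 400 if kmeURL is absent
  return "notFound";
}
```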
Running
```bash
npm run dev   # Development (auto-restart on file changes)
npm start     # Production (logs piped through jq)
```
Testing
```bash
npm test                    # All tests
npm run test:unit           # Unit tests only
npm run test:integration    # Integration tests only
npm run test:contract       # Contract tests only

# Single test file
node --test tests/unit/proxy.test.js
```
Tests use the Node.js built-in node:test runner. No external test framework.
Architecture
The server loads src/proxyScripts/kmeContentSourceAdapter.js once at startup via vm.Script, then executes it in a fresh isolated VM context per request via vm.createContext.
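The compile-once, run-per-request pattern can be sketched with `node:vm` alone. The inline script source and the `greeting` global are stand-ins for the real adapter file and its injected globals. The script bumps a global counter, and because every call gets a fresh context, the counter always starts over.

```javascript
const vm = require("node:vm");

// Compile once at startup (stand-in source; the real script is read from
// src/proxyScripts/kmeContentSourceAdapter.js).
const script = new vm.Script(`
  counter = (typeof counter === "number" ? counter : 0) + 1;
  result = greeting + ", request " + counter;
`);

// Run in a brand-new context per request so no state leaks between requests.
function handleRequest(injected) {
  const context = vm.createContext({ ...injected });
  script.runInContext(context);
  return context.result; // globals assigned by the script appear on the context object
}

console.log(handleRequest({ greeting: "hello" })); // hello, request 1
console.log(handleRequest({ greeting: "hello" })); // hello, request 1 (fresh context)
```

Node's documentation states that `node:vm` is not a security mechanism, so this isolation protects against state leaking between requests, not against hostile code.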
```text
src/
├── proxyScripts/
│   └── kmeContentSourceAdapter.js        # All business logic (zero imports/exports)
├── globalVariables/
│   ├── kme_CSA_settings.json             # Credentials & API config (gitignored)
│   ├── kme_CSA_settings.json.example     # Template for version control
│   └── kmeContentSourceAdapterHelpers.js # Pure utilities (literal function body)
├── logger.js                             # Structured JSON logger
└── server.js                             # HTTP server bootstrap only
config/
└── default.json                          # Infrastructure settings
```
VM Context Globals
All dependencies are injected into each request's sandbox:
| Variable | Source |
|---|---|
| `console` | Structured logger |
| `crypto` | Node.js Web Crypto API |
| `axios` | HTTP client |
| `jwt` | `jsonwebtoken` |
| `uuidv4` | UUID v4 generator |
| `xmlbuilder2` | `xmlbuilder2` default export (call as `xmlbuilder2.create(...)`) |
| `redis` | Connected Redis client |
| `URLSearchParams`, `URL` | Node.js globals |
| `kme_CSA_settings` | Loaded from `src/globalVariables/kme_CSA_settings.json` |
| `kmeContentSourceAdapterHelpers` | Loaded from `src/globalVariables/kmeContentSourceAdapterHelpers.js` |
| `req`, `res` | Node.js HTTP request/response |
Key Constraints for kmeContentSourceAdapter.js
- Zero `import`/`export`: the script runs in a VM with no module system.
- No `config`, `global.config`, or `process.env`: use injected globals only.
- Routing metadata is available via `req.params` (set by `server.js`).
- `proxyBaseUrl` is derived dynamically from request headers (`x-forwarded-proto`, `x-forwarded-host`, `host`), not read from settings.
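One plausible reading of the header rule above, as a sketch: prefer `x-forwarded-proto` and `x-forwarded-host`, fall back to `host`, and take the first entry when a proxy chain has appended several values. The `http` default and the first-value rule are assumptions, and both function names are hypothetical.

```javascript
// Take the first value of a possibly comma-separated or array-valued header.
function firstHeaderValue(value) {
  if (Array.isArray(value)) value = value[0];
  return typeof value === "string" ? value.split(",")[0].trim() : "";
}

// Node lowercases incoming header names, so lowercase keys are used here.
// ASSUMPTION: defaults to "http" when x-forwarded-proto is absent.
function deriveProxyBaseUrl(headers) {
  const proto = firstHeaderValue(headers["x-forwarded-proto"]) || "http";
  const host = firstHeaderValue(headers["x-forwarded-host"]) || firstHeaderValue(headers.host);
  return `${proto}://${host}`;
}

console.log(deriveProxyBaseUrl({ host: "localhost:3000" })); // http://localhost:3000
```

Forwarded headers can be set by any client unless a trusted proxy overwrites them, so this derivation is only safe behind a proxy you control.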
Token Caching
OIDC tokens are cached in Redis under the hash key authorization (fields token and expiry). The cache survives adapter restarts. Token expiry is stored as an absolute Unix epoch timestamp. A stampede guard ensures only one token fetch is in flight at a time when multiple concurrent requests encounter a cache miss.
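The sketch below combines the Redis hash layout described above with a promise-sharing stampede guard: the first caller that misses the cache starts the fetch, and concurrent callers await the same promise. It uses an in-memory stand-in for Redis whose `hGetAll`/`hSet` methods are modeled loosely on node-redis v4; `fetchToken`, the `{ token, expiresIn }` shape, seconds as the epoch unit, and the 30-second safety margin are all assumptions.

```javascript
// In-memory stand-in for the Redis client (ASSUMPTION: loosely mirrors
// node-redis v4 hGetAll/hSet; the real injected client may differ).
function createMemoryHashStore() {
  const hashes = new Map();
  return {
    async hGetAll(key) {
      return { ...(hashes.get(key) ?? {}) };
    },
    async hSet(key, fields) {
      const current = hashes.get(key) ?? {};
      for (const [field, value] of Object.entries(fields)) current[field] = String(value);
      hashes.set(key, current);
      return Object.keys(fields).length;
    },
  };
}

const CACHE_KEY = "authorization"; // hash with fields "token" and "expiry"
const SKEW_SECONDS = 30; // hypothetical margin so nearly expired tokens are refreshed
let inFlight = null; // stampede guard: shared promise for the pending fetch

// fetchToken() is hypothetical: resolves to { token, expiresIn } (seconds).
async function getToken(store, fetchToken, nowSeconds = () => Math.floor(Date.now() / 1000)) {
  const cached = await store.hGetAll(CACHE_KEY);
  if (cached.token && Number(cached.expiry) - SKEW_SECONDS > nowSeconds()) {
    return cached.token; // cache hit
  }
  if (!inFlight) {
    inFlight = (async () => {
      const { token, expiresIn } = await fetchToken();
      // Store an absolute epoch timestamp, not a relative lifetime.
      await store.hSet(CACHE_KEY, { token, expiry: nowSeconds() + expiresIn });
      return token;
    })().finally(() => {
      inFlight = null; // allow a new fetch after this one settles
    });
  }
  return inFlight; // concurrent misses share one fetch
}
```

Two caveats apply to this sketch. The guard deduplicates only within one Node.js process, and because each request runs in a fresh VM context, a variable like `inFlight` would have to live outside the per-request sandbox to be shared between requests; how the real adapter handles this is not documented here.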
Helpers (kmeContentSourceAdapterHelpers.js)
A pure-utility module injected into the VM context. Key functions:
| Function | Description |
|---|---|
| `getValidToken(reqUrl, reqMethod)` | Returns a cached or freshly fetched OIDC `id_token`; throws on failure |
| `extractHydraItems(data)` | Extracts one fragment per `SearchResultItem`: the one with the latest `vkm:datePublished` |
| `buildSitemapXml(items, proxyBaseUrl)` | Builds Sitemaps 0.9 XML from an array of fragments |
| `extractArticleBody(data)` | Returns `vkm:articleBody` (or the `articleBody` fallback) from a content API response |
| `validateSettings(settings, fields)` | Returns the name of the first missing required field, or `null` |
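Two pieces of this table are simple enough to sketch from their descriptions alone. `extractArticleBody` follows the table directly (returning `null` when neither field exists is an assumption); `latestByDatePublished` is a hypothetical function that isolates the selection rule inside `extractHydraItems`, applied to the fragments of a single `SearchResultItem`. The surrounding Hydra response shape is left out because it is not documented here.

```javascript
// Returns vkm:articleBody, falling back to articleBody, else null (assumption).
function extractArticleBody(data) {
  return data?.["vkm:articleBody"] ?? data?.articleBody ?? null;
}

// Hypothetical: given the fragments of ONE SearchResultItem, keep the most
// recently published one. ASSUMPTION: dates are ISO 8601 strings; fragments
// without a parseable date are skipped.
function latestByDatePublished(fragments) {
  let best = null;
  let bestTime = -Infinity;
  for (const fragment of fragments) {
    const time = Date.parse(fragment?.["vkm:datePublished"] ?? "");
    if (!Number.isNaN(time) && time > bestTime) {
      best = fragment;
      bestTime = time;
    }
  }
  return best;
}
```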
Note: This file is a literal function body. `server.js` wraps it as `(function() { <file> })()`, so it must end with a bare `return { ... }` and contain zero `import`/`export` statements.
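The wrapping matters because a top-level `return` is a syntax error in an ordinary script; inside `(function() { ... })()` it becomes the function's return value. The sketch below evaluates a stand-in body with `vm.runInNewContext`; the real `server.js` may evaluate it differently (for example inside each request's context), and `helpersSource` is a hypothetical placeholder for the file's contents.

```javascript
const vm = require("node:vm");

// Stand-in for kmeContentSourceAdapterHelpers.js: a bare function body
// that ends in `return { ... }` and has no import/export.
const helpersSource = `
  function validateSettings(settings, fields) {
    return fields.find((f) => !settings[f]) ?? null;
  }
  return { validateSettings };
`;

// Wrap as (function() { <file> })(); the IIFE's return value is the
// completion value of the script, i.e. the helpers object.
const helpers = vm.runInNewContext(`(function() {\n${helpersSource}\n})()`);

console.log(helpers.validateSettings({ tokenUrl: "x" }, ["tokenUrl", "tenant"])); // tenant
```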
Changelog
See CHANGELOG.md.