kme-content-adapter
An HTTP proxy adapter that searches and fetches content from KME (Knowledge Management Engine) and exposes it as a Sitemaps-compliant XML feed and individual HTML article pages. Business logic runs in an isolated Node.js VM sandbox, mirroring the IVA Studio proxy script execution environment.
Requirements
- Node.js ≥ 18
- Redis (used for token caching)
- jq (optional; used by `npm start` for log pretty-printing)
Setup
npm install
cp src/globalVariables/kme_CSA_settings.json.example src/globalVariables/kme_CSA_settings.json
# Edit kme_CSA_settings.json with real credentials
Configuration
src/globalVariables/kme_CSA_settings.json
Credentials and API settings — never commit this file.
{
"tokenUrl": "https://<host>/oidc-token-service/<env>/token",
"username": "<username>",
"password": "<password>",
"clientId": "default",
"scope": "openid tags content_entitlements",
"searchApiBaseUrl": "https://<host>/km-search-service",
"tenant": "<env>"
}
| Field | Description |
|---|---|
| `tokenUrl` | OIDC token endpoint |
| `username` / `password` | KME credentials |
| `clientId` | OAuth client ID (usually `default`) |
| `scope` | OAuth scopes |
| `searchApiBaseUrl` | KME Knowledge Search Service base URL |
| `tenant` | KME tenant/environment path segment (e.g. `qa`) |
config/default.json
Infrastructure settings (port, host, log level). Override with environment variables:
| Variable | Default | Description |
|---|---|---|
| `PORT` | `3000` | HTTP server port |
| `HOST` | `0.0.0.0` | Bind address |
| `LOG_LEVEL` | `debug` | Log level: DEBUG, INFO, WARN, ERROR |
Endpoints
GET /sitemap.xml
Returns a Sitemaps protocol 0.9 XML document. Each <loc> points back to this adapter's content fetch endpoint so crawlers can retrieve individual articles.
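As a rough illustration of how each `<loc>` can point back at the adapter, the sketch below builds a minimal sitemap from a list of upstream article URLs. It is not the adapter's code: it swaps in plain string concatenation for `xmlbuilder2` so that it runs without dependencies, and the names `escapeXml` and `buildSitemap`, as well as the plain-string input shape, are assumptions made for this example.

```javascript
// Escape the five characters that are special in XML text and attributes.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// upstreamUrls: array of upstream article URLs (assumed input shape).
// proxyBaseUrl: public base URL of this adapter, e.g. "http://localhost:3000".
function buildSitemap(upstreamUrls, proxyBaseUrl) {
  const entries = upstreamUrls.map((upstream) => {
    // Each <loc> targets the content fetch endpoint: GET /?kmeURL=<encoded upstream URL>
    const loc = `${proxyBaseUrl}/?kmeURL=${encodeURIComponent(upstream)}`;
    return `  <url><loc>${escapeXml(loc)}</loc></url>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    '</urlset>',
  ].join('\n');
}

console.log(
  buildSitemap(
    ['https://example.com/km-content-service/articles/123'],
    'http://localhost:3000'
  )
);
```

The upstream URL is percent-encoded before it is placed in the query string, so the resulting `<loc>` value matches the form a crawler would send to the content fetch endpoint.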
Query parameters (all optional):
| Parameter | Default | Description |
|---|---|---|
| `query` | `*` | KME search query string |
| `size` | `100` | Max results per search page |
| `category` | `vkm:ArticleCategory` | KME category filter |
Results are paginated automatically using hydra:view['hydra:last']. The response is capped at 50,000 URLs per the Sitemaps protocol.
curl "http://localhost:3000/sitemap.xml?query=temple&size=50&category=vkm:ArticleCategory"
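The automatic pagination and the 50,000-URL cap could be sketched roughly as follows. This is an assumption-laden outline rather than the adapter's implementation: `fetchPage` is a hypothetical async function that returns one parsed search page, and the use of `hydra:member` and `hydra:view['hydra:next']` comes from the Hydra Core vocabulary, not from this README, which only mentions `hydra:last`.

```javascript
const MAX_SITEMAP_URLS = 50000; // Sitemaps protocol limit per sitemap file

// firstPageUrl: URL of the first search page.
// fetchPage: hypothetical async (url) => parsed JSON-LD page object.
async function collectAllItems(firstPageUrl, fetchPage, maxItems = MAX_SITEMAP_URLS) {
  const items = [];
  const seen = new Set(); // guards against a paging loop
  let url = firstPageUrl;

  while (url && !seen.has(url) && items.length < maxItems) {
    seen.add(url);
    const page = await fetchPage(url);
    items.push(...(page['hydra:member'] || []));

    const view = page['hydra:view'] || {};
    const last = view['hydra:last'];
    // Stop when there is no paging view, or the page just fetched is the last one.
    if (!last || url === last) break;
    url = view['hydra:next']; // assumption: the upstream exposes hydra:next
  }

  return items.slice(0, maxItems);
}

module.exports = { collectAllItems, MAX_SITEMAP_URLS };
```

Comparing the current URL with `hydra:last` only works if the upstream reports both in the same form; a real implementation may need to normalize them first.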
GET /?kmeURL=<upstream-article-url>
Fetches a single KME article by its upstream URL and returns it as a full HTML document.
curl "http://localhost:3000/?kmeURL=https%3A%2F%2Fexample.com%2Fkm-content-service%2Farticles%2F123"
Response: 200 text/html; charset=utf-8 — a complete HTML document:
<!DOCTYPE html>
<html>
<head><title>Article Title from vkm:name</title></head>
<body>
<!-- vkm:articleBody content verbatim -->
</body>
</html>
Error responses:
| Status | Cause |
|---|---|
| 400 | `kmeURL` missing, blank, malformed, or non-http/https |
| 404 | Upstream returned 4xx, or article body absent in response |
| 502 | Token acquisition failed, upstream 5xx, network error, or timeout |
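The status mapping in the table can be illustrated with a small sketch. The function names `parseKmeUrl` and `statusForUpstream` are hypothetical and the adapter's actual checks may differ; the sketch only encodes the rules listed above.

```javascript
// Returns a parsed URL object, or null when the request should get a 400.
function parseKmeUrl(raw) {
  if (typeof raw !== 'string' || raw.trim() === '') return null; // missing or blank
  let parsed;
  try {
    parsed = new URL(raw.trim());
  } catch {
    return null; // malformed
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') return null;
  return parsed;
}

// Maps an upstream HTTP status to the status this adapter would return.
// Token, network, and timeout failures (always 502) are handled elsewhere.
function statusForUpstream(upstreamStatus, hasArticleBody) {
  if (upstreamStatus >= 500) return 502;
  if (upstreamStatus >= 400) return 404;
  if (!hasArticleBody) return 404; // successful response without an article body
  return 200;
}

console.log(parseKmeUrl('ftp://example.com/file') === null); // true
console.log(statusForUpstream(503, false)); // 502
```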
GET /* (anything else)
Returns 404 Not Found.
Running
npm run dev # Development — auto-restart on file changes
npm start # Production — logs piped through jq
Testing
npm test # All tests
npm run test:unit # Unit tests only
npm run test:integration # Integration tests only
npm run test:contract # Contract tests only
# Single test file
node --test tests/unit/proxy.test.js
Tests use the Node.js built-in node:test runner; no external test framework is required.
Architecture
The server loads src/proxyScripts/kmeContentSourceAdapter.js once at startup via vm.Script, then executes it in a fresh isolated VM context per request via vm.createContext. This mirrors the IVA Studio proxy script execution environment.
flowchart TD
subgraph Client["Client/Crawler"]
A[Request]
end
subgraph Server["Node.js Server (server.js)"]
B[HTTP Request Handler]
C[VM Context Per Request]
end
subgraph ProxyScript["Proxy Script (proxy.js)"]
D[Business Logic]
E[API Calls via axios]
F[Redis Token Cache]
end
subgraph External["External Services"]
G[KME Search API]
H[KME Content API]
I[OIDC Token Service]
end
A --> B
B --> C
C --> D
D --> E
E --> G
E --> H
D --> F
F --> I
style C fill:#e1f5ff,stroke:#0288d1,stroke-width:2px
style D fill:#fff3e0,stroke:#fb8c00,stroke-width:2px
style F fill:#e8f5e9,stroke:#43a047,stroke-width:2px
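The compile-once, fresh-context-per-request pattern can be shown in a few lines with Node's built-in `node:vm` module. This is a simplified sketch, not the contents of `server.js`: the inline `source` string stands in for the proxy script file, and `handle` and the `req`/`res` stand-in objects are assumptions for illustration.

```javascript
const vm = require('node:vm');

// Stand-in for the proxy script source; the real script is read from disk.
// The undeclared `counter` shows that state does not leak between requests.
const source = `
  counter = (typeof counter === 'number' ? counter : 0) + 1;
  res.body = 'hello ' + req.url + ' #' + counter;
`;

// Compile once at startup.
const script = new vm.Script(source, { filename: 'kmeContentSourceAdapter.js' });

// Run the compiled script in a brand-new context for every request.
function handle(req) {
  const res = {};
  const context = vm.createContext({ req, res }); // injected globals
  script.runInContext(context);
  return res;
}

console.log(handle({ url: '/a' }).body); // hello /a #1
console.log(handle({ url: '/b' }).body); // hello /b #1
```

Both calls report `#1` because the global `counter` lives on the per-request context object, which is discarded after each run.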
Request Flow
- Request arrives → `server.js` HTTP handler extracts routing metadata (`workspaceId`, `branch`, `route`) from URL params
- VM context created → Fresh isolated sandbox with injected globals (`console`, `crypto`, `axios`, `jwt`, `redis`, etc.)
- Proxy script executes → Business logic in `proxy.js` runs, using injected helpers and settings
- Token caching → OIDC tokens cached in Redis under key `authorization`; a stampede guard limits concurrent cache misses to a single token fetch
- Response returned → XML sitemap or HTML article rendered via `xmlbuilder2`
File Structure
src/
├── proxyScripts/
│ └── kmeContentSourceAdapter.js # All business logic (zero imports/exports)
├── globalVariables/
│ ├── kme_CSA_settings.json # Credentials & API config (gitignored, runtime only)
│ ├── kme_CSA_settings.json.example # Template for version control
│ └── kmeContentSourceAdapterHelpers.js # Pure utilities (literal function body pattern)
├── logger.js # Structured JSON logger
└── server.js # HTTP server bootstrap only
config/
└── default.json # Infrastructure settings (port, host, log level)
VM Context Globals
All dependencies are injected into each request's sandbox:
| Variable | Source |
|---|---|
| `console` | Structured logger |
| `crypto` | Node.js Web Crypto API |
| `axios` | HTTP client |
| `jwt` | `jsonwebtoken` |
| `uuidv4` | UUID v4 generator |
| `xmlbuilder2` | `xmlbuilder2` default export (call as `xmlbuilder2.create(...)`) |
| `redis` | Connected Redis client |
| `URLSearchParams`, `URL` | Node.js globals |
| `kme_CSA_settings` | Loaded from `src/globalVariables/kme_CSA_settings.json` |
| `kmeContentSourceAdapterHelpers` | Loaded from `src/globalVariables/kmeContentSourceAdapterHelpers.js` |
| `req`, `res` | Node.js HTTP request/response |
Key Constraints for kmeContentSourceAdapter.js
- Zero `import`/`export`: the script runs in a VM with no module system
- No `config`, `global.config`, or `process.env`: use injected globals only
- Routing metadata is available via `req.params` (set by `server.js`)
- `proxyBaseUrl` is derived dynamically from request headers (`x-forwarded-proto`, `x-forwarded-host`, `host`), not read from settings
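Deriving `proxyBaseUrl` from the request headers might look like the sketch below. The precedence (forwarded headers first, then `host`), the `http` fallback for the protocol, and the function name `deriveProxyBaseUrl` are assumptions; the README only names the three headers involved.

```javascript
// headers: a Node.js-style headers object with lowercased header names.
function deriveProxyBaseUrl(headers) {
  // Forwarded headers may carry a comma-separated chain; keep the first entry.
  const first = (value) => String(value ?? '').split(',')[0].trim();

  const proto = first(headers['x-forwarded-proto']) || 'http'; // assumed fallback
  const host = first(headers['x-forwarded-host']) || first(headers['host']);
  return host ? `${proto}://${host}` : null;
}

console.log(deriveProxyBaseUrl({ host: 'localhost:3000' }));
// http://localhost:3000
console.log(
  deriveProxyBaseUrl({
    'x-forwarded-proto': 'https',
    'x-forwarded-host': 'kb.example.com',
    host: '10.0.0.5:3000',
  })
);
// https://kb.example.com
```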
Token Caching
OIDC tokens are cached in Redis under the hash key authorization (fields token and expiry). The cache survives adapter restarts. Token expiry is stored as an absolute Unix epoch timestamp. A stampede guard ensures only one token fetch is in flight at a time when multiple concurrent requests encounter a cache miss.
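One common way to build such a stampede guard is to share a single in-flight promise among concurrent callers. The sketch below illustrates that idea under stated assumptions: an in-memory object stands in for the Redis hash, and `getToken`, `createMemoryCache`, `fetchToken`, and the `expiresIn` field are hypothetical names, not the adapter's actual API.

```javascript
let inFlight = null; // shared promise while a token fetch is running

// In-memory stand-in for the Redis hash (fields: token, expiry).
function createMemoryCache() {
  let stored = null;
  return {
    read: async () => stored,
    write: async (value) => { stored = value; },
  };
}

// fetchToken: hypothetical async () => ({ token, expiresIn }) with expiresIn in seconds.
async function getToken(cache, fetchToken, now = () => Math.floor(Date.now() / 1000)) {
  const cached = await cache.read();
  // Expiry is an absolute Unix epoch timestamp, so compare it with "now".
  if (cached && Number(cached.expiry) > now()) return cached.token;

  if (!inFlight) {
    // First caller to miss starts the fetch; later callers reuse the same promise.
    inFlight = (async () => {
      try {
        const { token, expiresIn } = await fetchToken();
        await cache.write({ token, expiry: now() + expiresIn });
        return token;
      } finally {
        inFlight = null; // allow a new fetch after success or failure
      }
    })();
  }
  return inFlight;
}

module.exports = { getToken, createMemoryCache };
```

Because the check and the assignment of `inFlight` happen synchronously within one tick, concurrent requests in a single Node.js process cannot start a second fetch. Coordinating across multiple processes would need a different mechanism, which this sketch does not cover.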
Helpers (kmeContentSourceAdapterHelpers.js)
A pure-utility module injected into the VM context. Key functions:
| Function | Description |
|---|---|
| `getValidToken(reqUrl, reqMethod)` | Returns a cached or freshly-fetched OIDC `id_token`; throws on failure |
| `extractHydraItems(data)` | Extracts one fragment per `SearchResultItem`: the one with the latest `vkm:datePublished` |
| `buildSitemapXml(items, proxyBaseUrl)` | Builds Sitemaps 0.9 XML from an array of fragments |
| `extractArticleBody(data)` | Returns `vkm:articleBody` (or `articleBody` fallback) from a content API response |
| `validateSettings(settings, fields)` | Returns the first missing required field name, or `null` |
Note: This file is a literal function body. `server.js` wraps it as `(function() { <file> })()`. It must end with a bare `return { ... }` and contain zero `import`/`export` statements.
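To make the literal-function-body pattern concrete, here is a minimal sketch of wrapping and evaluating such a file. It assumes the wrapper is evaluated with `node:vm`, which the README does not state, and the `validateSettings` body shown is a guess at the described behavior (treating `undefined`, `null`, and empty strings as missing), not the real helper.

```javascript
const vm = require('node:vm');

// Stand-in for the helpers file: no imports or exports, ends with a bare return.
const helperSource = `
  function validateSettings(settings, fields) {
    for (const field of fields) {
      const value = settings ? settings[field] : undefined;
      if (value === undefined || value === null || value === '') return field;
    }
    return null;
  }
  return { validateSettings };
`;

// Wrap the file contents in an immediately invoked function so that the
// bare top-level "return" becomes legal, then evaluate the result.
const wrapped = `(function() { ${helperSource} })()`;
const helpers = vm.runInNewContext(wrapped);

console.log(helpers.validateSettings({ tokenUrl: 'x' }, ['tokenUrl', 'tenant']));
// tenant
console.log(helpers.validateSettings({ tokenUrl: 'x', tenant: 'qa' }, ['tokenUrl', 'tenant']));
// null
```

The file cannot be loaded with `require` or `import` as written, because a top-level `return` is a syntax error outside a function; the wrapper is what makes it valid.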
Changelog
See CHANGELOG.md.