
Peter.Morton 7d6637effa docs: update README for v0.4.0

- Add Endpoints section documenting /sitemap.xml, /?kmeURL=, and 404 fallback
- Expand settings table with searchApiBaseUrl and tenant fields
- Update file tree to reflect kmeContentSourceAdapterHelpers.js
- Add Helpers section documenting each exported function
- Expand VM context globals table with helpers and correct xmlbuilder2 usage
- Note dynamic proxyBaseUrl derivation from request headers
- Add stampede guard detail to Token Caching section

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

2026-04-23 19:11:36 -05:00


kme-content-adapter

An HTTP proxy adapter that searches and fetches content from KME (Knowledge Management Engine) and exposes it as a Sitemaps-compliant XML feed and individual HTML article pages. Business logic runs in an isolated Node.js VM sandbox, mirroring the IVA Studio proxy script execution environment.

Requirements

Node.js ≥ 18
Redis (used for token caching)
jq (optional — used by npm start for log pretty-printing)

Setup

npm install
cp src/globalVariables/kme_CSA_settings.json.example src/globalVariables/kme_CSA_settings.json
# Edit kme_CSA_settings.json with real credentials

Configuration

`src/globalVariables/kme_CSA_settings.json`

Credentials and API settings — never commit this file.

{
  "tokenUrl": "https://<host>/oidc-token-service/<env>/token",
  "username": "<username>",
  "password": "<password>",
  "clientId": "default",
  "scope": "openid tags content_entitlements",
  "searchApiBaseUrl": "https://<host>/km-search-service",
  "tenant": "<env>"
}

Field	Description
`tokenUrl`	OIDC token endpoint
`username` / `password`	KME credentials
`clientId`	OAuth client ID (usually `default`)
`scope`	OAuth scopes
`searchApiBaseUrl`	KME Knowledge Search Service base URL
`tenant`	KME tenant/environment path segment (e.g. `qa`)
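
A check that all of these fields are present could look like the sketch below. The Helpers section lists `validateSettings(settings, fields)` as returning the first missing required field name, or `null`; the body here is an assumption about how that might work (in particular, treating blank strings as missing), not the adapter's actual code. The `REQUIRED_FIELDS` list is hypothetical.

```javascript
// Sketch only: the real validateSettings may define "missing" differently.
function validateSettings(settings, fields) {
  for (const field of fields) {
    const value = settings?.[field];
    // Assumption: undefined, null, and blank strings all count as missing.
    if (value === undefined || value === null || String(value).trim() === '') {
      return field;
    }
  }
  return null;
}

// Hypothetical list of required fields, taken from the table above.
const REQUIRED_FIELDS = [
  'tokenUrl', 'username', 'password', 'clientId',
  'scope', 'searchApiBaseUrl', 'tenant',
];

const missing = validateSettings(
  { tokenUrl: 'https://example.invalid/token', username: 'u' },
  REQUIRED_FIELDS,
);
console.log(missing); // password
```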

`config/default.json`

Infrastructure settings (port, host, log level). Override with environment variables:

Variable	Default	Description
`PORT`	`3000`	HTTP server port
`HOST`	`0.0.0.0`	Bind address
`LOG_LEVEL`	`debug`	Log level: `DEBUG`, `INFO`, `WARN`, `ERROR`
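
The override behaviour can be pictured as follows. This is a minimal sketch that assumes environment variables simply take precedence over the values in `config/default.json`; `resolveInfraSettings` and the `port`/`host`/`logLevel` key names are hypothetical, and the project may rely on a config library instead.

```javascript
// Hypothetical helper: environment variables win over file defaults.
function resolveInfraSettings(defaults, env) {
  const port = Number.parseInt(env.PORT ?? '', 10);
  return {
    // Fall back to the default when PORT is unset or not a number.
    port: Number.isInteger(port) ? port : defaults.port,
    host: env.HOST || defaults.host,
    logLevel: env.LOG_LEVEL || defaults.logLevel,
  };
}

// Defaults as documented in the table above.
const defaults = { port: 3000, host: '0.0.0.0', logLevel: 'debug' };
const resolved = resolveInfraSettings(defaults, { PORT: '8080' });
console.log(resolved);
```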

Endpoints

`GET /sitemap.xml`

Returns a Sitemaps protocol 0.9 XML document. Each <loc> points back to this adapter's content fetch endpoint so crawlers can retrieve individual articles.
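
To illustrate how each `<loc>` can point back at the adapter, here is a dependency-free sketch. The adapter itself builds XML with `xmlbuilder2`; this version uses plain string concatenation so it runs on its own. The function name `buildSitemapXmlSketch` and the `url` field on each item are assumptions for illustration.

```javascript
// Escape the five XML special characters for use in element text.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical stand-in for buildSitemapXml.
// Assumption: each item carries its upstream article URL in `url`.
function buildSitemapXmlSketch(items, proxyBaseUrl) {
  const entries = items.map(({ url }) => {
    // The upstream URL travels as the kmeURL query parameter.
    const loc = `${proxyBaseUrl}/?kmeURL=${encodeURIComponent(url)}`;
    return `  <url><loc>${escapeXml(loc)}</loc></url>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    '</urlset>',
  ].join('\n');
}

const sitemapXml = buildSitemapXmlSketch(
  [{ url: 'https://kme.example.invalid/a?x=1&y=2' }],
  'http://localhost:3000',
);
console.log(sitemapXml);
```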

Query parameters (all optional):

Parameter	Default	Description
`query`	`*`	KME search query string
`size`	`100`	Max results per search page
`category`	`vkm:ArticleCategory`	KME category filter

The adapter walks all search result pages automatically, using hydra:view['hydra:last'] from the search response to find the last page. The response is capped at 50,000 URLs, the per-file limit in the Sitemaps protocol.

GET /sitemap.xml?query=temple&size=50&category=vkm:ArticleCategory
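
The pagination and the 50,000-URL cap might be combined roughly as below. This is a sketch under stated assumptions: `fetchPage` and `extractItems` are hypothetical injected functions, and the last page is detected by comparing the view's `@id` with `hydra:last`, which may differ from how the adapter actually reads the Hydra view.

```javascript
const MAX_SITEMAP_URLS = 50000; // Sitemaps protocol limit per file

// `fetchPage(pageNumber)` is a hypothetical async function that returns
// one parsed search response; `extractItems(data)` returns its items.
async function collectAllItems(fetchPage, extractItems) {
  const items = [];
  let page = 1;
  while (items.length < MAX_SITEMAP_URLS) {
    const data = await fetchPage(page);
    const pageItems = extractItems(data);
    if (pageItems.length === 0) break; // guard against empty pages
    items.push(...pageItems);
    const view = data['hydra:view'] ?? {};
    // Assumption: we are on the last page when the view's own @id equals
    // hydra:last, or when the response has no hydra:last at all.
    if (!view['hydra:last'] || view['@id'] === view['hydra:last']) break;
    page += 1;
  }
  // Trim any overshoot from the final page.
  return items.slice(0, MAX_SITEMAP_URLS);
}
```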

`GET /?kmeURL=<upstream-article-url>`

Fetches a single KME article by its upstream URL and returns it as a full HTML document.

GET /?kmeURL=https%3A%2F%2F<kme-host>%2Fkm-content-service%2F...

Response: 200 text/html; charset=utf-8 — a complete HTML document:

<!DOCTYPE html>
<html>
<head><title>Article Title from vkm:name</title></head>
<body>
<!-- vkm:articleBody content verbatim -->
</body>
</html>
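
Assembling that document from the two article fields could be sketched like this. `renderArticleHtml` is a hypothetical name, and escaping the title is an assumption on our part; the body is inserted verbatim, as described above.

```javascript
// Escape characters that would break out of the <title> element.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical renderer: title from vkm:name, body from vkm:articleBody.
function renderArticleHtml(title, articleBody) {
  return [
    '<!DOCTYPE html>',
    '<html>',
    `<head><title>${escapeHtml(title)}</title></head>`,
    '<body>',
    articleBody, // inserted verbatim, not escaped
    '</body>',
    '</html>',
  ].join('\n');
}

const html = renderArticleHtml('Fish & Chips', '<p>Serve hot.</p>');
console.log(html);
```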

Error responses:

Status	Cause
`400`	`kmeURL` is missing, blank, malformed, or uses a scheme other than http/https
`404`	Upstream returned 4xx, or article body absent in response
`502`	Token acquisition failed, upstream 5xx, network error, or timeout
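
The status rules in this table can be expressed as two small functions. This is an illustrative sketch; `validateKmeUrl` and `mapUpstreamStatus` are hypothetical names, and the adapter's real checks may be stricter.

```javascript
// Returns { ok: true, url } or { ok: false, status: 400, reason }.
function validateKmeUrl(rawValue) {
  if (typeof rawValue !== 'string' || rawValue.trim() === '') {
    return { ok: false, status: 400, reason: 'kmeURL is missing or blank' };
  }
  let parsed;
  try {
    parsed = new URL(rawValue); // throws on malformed input
  } catch {
    return { ok: false, status: 400, reason: 'kmeURL is malformed' };
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    return { ok: false, status: 400, reason: 'kmeURL must use http or https' };
  }
  return { ok: true, url: parsed };
}

// Maps an upstream HTTP status to the status this adapter returns.
function mapUpstreamStatus(upstreamStatus) {
  if (upstreamStatus >= 400 && upstreamStatus < 500) return 404;
  if (upstreamStatus >= 500) return 502;
  return 200;
}

console.log(validateKmeUrl('ftp://kme.example.invalid/article').reason);
console.log(mapUpstreamStatus(403)); // 404
```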

`GET /*` (anything else)

Returns 404 Not Found.

Running

npm run dev      # Development — auto-restart on file changes
npm start        # Production — logs piped through jq

Testing

npm test                    # All tests
npm run test:unit           # Unit tests only
npm run test:integration    # Integration tests only
npm run test:contract       # Contract tests only

# Single test file
node --test tests/unit/proxy.test.js

Tests use the Node.js built-in node:test runner; no external test framework is required.

Architecture

The server compiles src/proxyScripts/kmeContentSourceAdapter.js once at startup with vm.Script, then runs the compiled script for each request in a fresh, isolated context created with vm.createContext.

src/
├── proxyScripts/
│   └── kmeContentSourceAdapter.js          # All business logic (zero imports/exports)
├── globalVariables/
│   ├── kme_CSA_settings.json               # Credentials & API config (gitignored)
│   ├── kme_CSA_settings.json.example       # Template for version control
│   └── kmeContentSourceAdapterHelpers.js   # Pure utilities (literal function body)
├── logger.js                               # Structured JSON logger
└── server.js                               # HTTP server bootstrap only
config/
└── default.json                            # Infrastructure settings
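
The compile-once, run-per-request pattern looks roughly like this with Node's `vm` module. The script source and the `handleRequest` function are stand-ins for illustration; the real server.js injects many more globals and works with real HTTP objects.

```javascript
const vm = require('node:vm');

// Stand-in for the contents of kmeContentSourceAdapter.js (no imports/exports).
const scriptSource = `
  res.body = 'Hello ' + req.url + ' from ' + kme_CSA_settings.tenant;
`;

// Compile once at startup.
const compiled = new vm.Script(scriptSource, {
  filename: 'kmeContentSourceAdapter.js',
});

// Run in a fresh context per request so script globals never leak
// from one request to the next.
function handleRequest(req, res, injectedGlobals) {
  const context = vm.createContext({ ...injectedGlobals, req, res });
  compiled.runInContext(context);
  return res;
}

const demoRes = handleRequest(
  { url: '/sitemap.xml' },
  {},
  { kme_CSA_settings: { tenant: 'qa' } },
);
console.log(demoRes.body); // Hello /sitemap.xml from qa
```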

VM Context Globals

All dependencies are injected into each request's sandbox:

Variable	Source
`console`	Structured logger
`crypto`	Node.js Web Crypto API
`axios`	HTTP client
`jwt`	`jsonwebtoken`
`uuidv4`	UUID v4 generator
`xmlbuilder2`	`xmlbuilder2` default export (call as `xmlbuilder2.create(...)`)
`redis`	Connected Redis client
`URLSearchParams`, `URL`	Node.js globals
`kme_CSA_settings`	Loaded from `src/globalVariables/kme_CSA_settings.json`
`kmeContentSourceAdapterHelpers`	Loaded from `src/globalVariables/kmeContentSourceAdapterHelpers.js`
`req`, `res`	Node.js HTTP request/response

Key Constraints for `kmeContentSourceAdapter.js`

Zero import/export — runs in a VM with no module system
No config, global.config, or process.env — use injected globals only
Routing metadata is available via req.params (set by server.js)
proxyBaseUrl is derived dynamically from request headers (x-forwarded-proto, x-forwarded-host, host) — not read from settings
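
A derivation of proxyBaseUrl from those headers might look like the sketch below. The precedence (forwarded headers first, then host), the `http` default, and the handling of comma-separated values are assumptions; `deriveProxyBaseUrl` is a hypothetical name.

```javascript
function deriveProxyBaseUrl(headers) {
  // Proxies may send comma-separated lists; take the first entry.
  const first = (value) => (value ? String(value).split(',')[0].trim() : '');
  // Assumption: default to http when no forwarded protocol is present.
  const proto = first(headers['x-forwarded-proto']) || 'http';
  const host = first(headers['x-forwarded-host']) || first(headers.host);
  return host ? `${proto}://${host}` : null;
}

console.log(deriveProxyBaseUrl({ host: 'localhost:3000' }));
// http://localhost:3000
console.log(deriveProxyBaseUrl({
  'x-forwarded-proto': 'https',
  'x-forwarded-host': 'kb.example.invalid',
  host: 'internal:3000',
}));
// https://kb.example.invalid
```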

Token Caching

OIDC tokens are cached in a Redis hash stored under the key authorization, with the fields token and expiry, so the cache survives adapter restarts. Token expiry is stored as an absolute Unix epoch timestamp. A stampede guard ensures that only one token fetch is in flight at a time when several concurrent requests hit a cache miss.
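
One common way to build such a guard is to share a single in-flight promise among concurrent callers. The sketch below shows that technique; it is not necessarily how the adapter implements it. `cache` and `fetchToken` are hypothetical stand-ins for the Redis hash and the OIDC token request, and expiry is assumed to be in seconds.

```javascript
let inFlight = null; // promise for the fetch currently in progress, if any

async function getTokenWithGuard(
  cache,
  fetchToken,
  nowSeconds = Math.floor(Date.now() / 1000),
) {
  const cached = await cache.get();
  if (cached && cached.expiry > nowSeconds) return cached.token;

  if (!inFlight) {
    // First caller on a cache miss starts the fetch; later callers reuse it.
    inFlight = (async () => {
      try {
        const { token, expiresIn } = await fetchToken();
        // Store expiry as an absolute epoch timestamp.
        await cache.set({ token, expiry: nowSeconds + expiresIn });
        return token;
      } finally {
        inFlight = null; // allow a new fetch after success or failure
      }
    })();
  }
  return inFlight;
}
```

If the fetch fails, every waiting caller sees the same rejection, and the next request starts a fresh attempt.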

Helpers (`kmeContentSourceAdapterHelpers.js`)

A pure-utility module injected into the VM context. Key functions:

Function	Description
`getValidToken(reqUrl, reqMethod)`	Returns a cached or freshly-fetched OIDC `id_token`; throws on failure
`extractHydraItems(data)`	Extracts one fragment per `SearchResultItem` — the one with the latest `vkm:datePublished`
`buildSitemapXml(items, proxyBaseUrl)`	Builds Sitemaps 0.9 XML from an array of fragments
`extractArticleBody(data)`	Returns `vkm:articleBody` (or `articleBody` fallback) from a content API response
`validateSettings(settings, fields)`	Returns the first missing required field name, or `null`
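
As an example of how small these helpers can be, here is a sketch of the fallback logic described for `extractArticleBody`. Returning `null` for a blank or non-string body is an assumption, chosen so that the caller can map an absent body to a 404.

```javascript
// Sketch only: prefers vkm:articleBody, then falls back to articleBody.
function extractArticleBody(data) {
  const body = data?.['vkm:articleBody'] ?? data?.articleBody;
  // Assumption: a blank or non-string body counts as absent.
  return typeof body === 'string' && body.trim() !== '' ? body : null;
}

console.log(extractArticleBody({ 'vkm:articleBody': '<p>Hi</p>' })); // <p>Hi</p>
console.log(extractArticleBody({ articleBody: '<p>Fallback</p>' })); // <p>Fallback</p>
console.log(extractArticleBody({})); // null
```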

Note: This file is a literal function body — server.js wraps it as (function() { <file> })(). It must end with a bare return { ... } and contain zero import/export statements.
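
The wrapping step can be sketched as follows. The helper source is a shortened stand-in for the real file, and `loadHelpers` is a hypothetical name; server.js may evaluate the wrapped source differently.

```javascript
const vm = require('node:vm');

// Stand-in for kmeContentSourceAdapterHelpers.js:
// a literal function body that ends with a bare return.
const helpersSource = `
  function extractArticleBody(data) {
    return data['vkm:articleBody'] ?? data.articleBody ?? null;
  }
  return { extractArticleBody };
`;

// Wrap the body in an IIFE so the bare return is legal, then evaluate it.
// The newline before the closing brace protects against a trailing comment.
function loadHelpers(source, context = vm.createContext({})) {
  return vm.runInContext(`(function() { ${source}\n})()`, context);
}

const helpers = loadHelpers(helpersSource);
console.log(Object.keys(helpers)); // [ 'extractArticleBody' ]
```

Evaluated on its own, a file with a top-level `return` is a syntax error, which is why the file cannot be loaded as an ordinary module.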

Changelog

See CHANGELOG.md.
