# kme-content-adapter

An HTTP proxy adapter that searches and fetches content from KME (Knowledge Management Engine) and exposes it as a Sitemaps-compliant XML feed and individual HTML article pages. Business logic runs in an isolated Node.js VM sandbox, mirroring the IVA Studio proxy script execution environment.

## Requirements

- Node.js ≥ 18
- Redis (used for token caching)
- `jq` (optional; used by `npm start` for log pretty-printing)

## Setup

```bash
npm install
cp src/globalVariables/kme_CSA_settings.json.example src/globalVariables/kme_CSA_settings.json
# Edit kme_CSA_settings.json with real credentials
```

## Configuration

### `src/globalVariables/kme_CSA_settings.json`

Credentials and API settings. **Never commit this file.**

```json
{
  "tokenUrl": "https:///oidc-token-service//token",
  "username": "",
  "password": "",
  "clientId": "default",
  "scope": "openid tags content_entitlements",
  "searchApiBaseUrl": "https:///km-search-service",
  "tenant": ""
}
```

| Field | Description |
|---|---|
| `tokenUrl` | OIDC token endpoint |
| `username` / `password` | KME credentials |
| `clientId` | OAuth client ID (usually `default`) |
| `scope` | OAuth scopes |
| `searchApiBaseUrl` | KME Knowledge Search Service base URL |
| `tenant` | KME tenant/environment path segment (e.g. `qa`) |

### `config/default.json`

Infrastructure settings (port, host, log level). Override with environment variables:

| Variable | Default | Description |
|---|---|---|
| `PORT` | `3000` | HTTP server port |
| `HOST` | `0.0.0.0` | Bind address |
| `LOG_LEVEL` | `debug` | Log level: `DEBUG`, `INFO`, `WARN`, `ERROR` |

## Endpoints

### `GET /sitemap.xml`

Returns a [Sitemaps protocol 0.9](https://www.sitemaps.org/protocol.html) XML document. Each entry's URL points back to this adapter's content fetch endpoint so crawlers can retrieve individual articles.

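
As a rough illustration of how a sitemap entry can point back at the content fetch endpoint (`GET /?kmeURL=`), the sketch below builds the XML by plain string concatenation. The function names `escapeXml`, `buildLoc` and `buildSitemap` are hypothetical and are not the adapter's own helpers, and the example host `kme.example.com` is made up; the real adapter builds its XML with `xmlbuilder2`.

```javascript
// Sketch only: hypothetical helper names, not the adapter's actual implementation.

// Escape the five XML special characters so a URL is safe inside <loc>.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build the adapter URL that a crawler would request for one upstream article.
function buildLoc(proxyBaseUrl, kmeUrl) {
  return `${proxyBaseUrl}/?kmeURL=${encodeURIComponent(kmeUrl)}`;
}

// Build a Sitemaps 0.9 document, capped at the protocol limit of 50,000 URLs.
function buildSitemap(proxyBaseUrl, kmeUrls) {
  const MAX_URLS = 50000;
  const entries = kmeUrls
    .slice(0, MAX_URLS)
    .map((u) => `  <url><loc>${escapeXml(buildLoc(proxyBaseUrl, u))}</loc></url>`);
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</urlset>",
  ].join("\n");
}

// Example with a made-up upstream URL.
console.log(buildSitemap("http://localhost:3000", ["https://kme.example.com/a?b=1&c=2"]));
```

Percent-encoding the upstream URL keeps its own query string from being mistaken for parameters of the adapter request.
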
**Query parameters** (all optional):

| Parameter | Default | Description |
|---|---|---|
| `query` | `*` | KME search query string |
| `size` | `100` | Max results per search page |
| `category` | `vkm:ArticleCategory` | KME category filter |

Results are paginated automatically using `hydra:view['hydra:last']`. The response is capped at **50,000 URLs** per the Sitemaps protocol.

```
GET /sitemap.xml?query=temple&size=50&category=vkm:ArticleCategory
```

### `GET /?kmeURL=`

Fetches a single KME article by its upstream URL and returns it as a full HTML document.

```
GET /?kmeURL=https%3A%2F%2F%2Fkm-content-service%2F...
```

**Response:** `200 text/html; charset=utf-8`, a complete HTML document:

```html
Article Title from vkm:name
```

**Error responses:**

| Status | Cause |
|---|---|
| `400` | `kmeURL` missing, blank, malformed, or non-http/https |
| `404` | Upstream returned 4xx, or article body absent in response |
| `502` | Token acquisition failed, upstream 5xx, network error, or timeout |

### `GET /*` (anything else)

Returns `404 Not Found`.

---

## Running

```bash
npm run dev   # Development: auto-restart on file changes
npm start     # Production: logs piped through jq
```

## Testing

```bash
npm test                   # All tests
npm run test:unit          # Unit tests only
npm run test:integration   # Integration tests only
npm run test:contract      # Contract tests only

# Single test file
node --test tests/unit/proxy.test.js
```

Tests use the Node.js built-in `node:test` runner. No external test framework.

## Architecture

The server loads `src/proxyScripts/kmeContentSourceAdapter.js` once at startup via `vm.Script`, then executes it in a **fresh isolated VM context per request** via `vm.createContext`.

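
The compile-once, fresh-context-per-request pattern can be sketched as follows. This is a minimal illustration, not the contents of `server.js`: the inline `scriptSource` is a hypothetical stand-in for the adapter script, and `handle` is an assumed name.

```javascript
const vm = require("node:vm");

// Hypothetical stand-in for the source text of kmeContentSourceAdapter.js.
// It writes to a context global (counter) to show that state does not leak.
const scriptSource = `
  counter = (typeof counter === "undefined" ? 0 : counter) + 1;
  res.body = "hello " + req.url + " #" + counter;
`;

// Compile once at startup; the compiled script is reused for every request.
const script = new vm.Script(scriptSource, { filename: "kmeContentSourceAdapter.js" });

function handle(req, res) {
  // A new context per request: globals created by one run are invisible to the next.
  const context = vm.createContext({ req, res, console });
  script.runInContext(context);
  return res;
}

const first = handle({ url: "/a" }, {});
const second = handle({ url: "/b" }, {});
console.log(first.body);
console.log(second.body);
```

Because each call gets its own context, `counter` starts from scratch every time, while the cost of parsing and compiling the script is paid only once.
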
```
src/
├── proxyScripts/
│   └── kmeContentSourceAdapter.js          # All business logic (zero imports/exports)
├── globalVariables/
│   ├── kme_CSA_settings.json               # Credentials & API config (gitignored)
│   ├── kme_CSA_settings.json.example       # Template for version control
│   └── kmeContentSourceAdapterHelpers.js   # Pure utilities (literal function body)
├── logger.js                               # Structured JSON logger
└── server.js                               # HTTP server bootstrap only
config/
└── default.json                            # Infrastructure settings
```

### VM Context Globals

All dependencies are injected into each request's sandbox:

| Variable | Source |
|---|---|
| `console` | Structured logger |
| `crypto` | Node.js Web Crypto API |
| `axios` | HTTP client |
| `jwt` | `jsonwebtoken` |
| `uuidv4` | UUID v4 generator |
| `xmlbuilder2` | `xmlbuilder2` default export (call as `xmlbuilder2.create(...)`) |
| `redis` | Connected Redis client |
| `URLSearchParams`, `URL` | Node.js globals |
| `kme_CSA_settings` | Loaded from `src/globalVariables/kme_CSA_settings.json` |
| `kmeContentSourceAdapterHelpers` | Loaded from `src/globalVariables/kmeContentSourceAdapterHelpers.js` |
| `req`, `res` | Node.js HTTP request/response |

### Key Constraints for `kmeContentSourceAdapter.js`

- **Zero `import`/`export`**: runs in a VM with no module system
- **No `config`, `global.config`, or `process.env`**: use injected globals only
- Routing metadata is available via `req.params` (set by `server.js`)
- `proxyBaseUrl` is derived dynamically from request headers (`x-forwarded-proto`, `x-forwarded-host`, `host`), not read from settings

## Token Caching

OIDC tokens are cached in Redis under the hash key `authorization` (fields `token` and `expiry`). The cache survives adapter restarts. Token expiry is stored as an absolute Unix epoch timestamp. A stampede guard ensures only one token fetch is in flight at a time when multiple concurrent requests encounter a cache miss.

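
One common way to build such a stampede guard is to share a single in-flight promise among concurrent callers. The sketch below assumes that approach; it is not taken from the adapter's code. `fakeRedis` is an in-memory stand-in whose method names are modelled on node-redis (the README does not say which client is used), and `getToken`, `fetchToken` and `expiresIn` are hypothetical names.

```javascript
// In-memory stand-in for a Redis hash, so the sketch runs without a server.
const fakeRedis = {
  store: new Map(),
  async hGetAll(key) {
    return this.store.get(key) || {};
  },
  async hSet(key, fields) {
    this.store.set(key, { ...(this.store.get(key) || {}), ...fields });
  },
};

// Shared promise: non-null while a token fetch is in flight (per process).
let inFlight = null;

const nowSeconds = () => Math.floor(Date.now() / 1000);

// fetchToken is a hypothetical async function resolving to { token, expiresIn }.
async function getToken(fetchToken) {
  const cached = await fakeRedis.hGetAll("authorization");
  // Expiry is an absolute Unix epoch timestamp, so compare against "now".
  if (cached.token && Number(cached.expiry) > nowSeconds()) {
    return cached.token;
  }
  if (!inFlight) {
    // First caller on a cache miss starts the fetch; later callers reuse it.
    inFlight = (async () => {
      try {
        const { token, expiresIn } = await fetchToken();
        await fakeRedis.hSet("authorization", {
          token,
          expiry: String(nowSeconds() + expiresIn),
        });
        return token;
      } finally {
        inFlight = null; // allow a new fetch after success or failure
      }
    })();
  }
  return inFlight;
}
```

Clearing `inFlight` in `finally` means a failed fetch does not leave later requests stuck on a rejected promise.
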
## Helpers (`kmeContentSourceAdapterHelpers.js`)

A pure-utility module injected into the VM context. Key functions:

| Function | Description |
|---|---|
| `getValidToken(reqUrl, reqMethod)` | Returns a cached or freshly-fetched OIDC `id_token`; throws on failure |
| `extractHydraItems(data)` | Extracts one fragment per `SearchResultItem` (the one with the latest `vkm:datePublished`) |
| `buildSitemapXml(items, proxyBaseUrl)` | Builds Sitemaps 0.9 XML from an array of fragments |
| `extractArticleBody(data)` | Returns `vkm:articleBody` (or `articleBody` fallback) from a content API response |
| `validateSettings(settings, fields)` | Returns the first missing required field name, or `null` |

> **Note:** This file is a literal function body: `server.js` wraps it as `(function() { })()`. It must end with a bare `return { ... }` and contain zero `import`/`export` statements.

## Changelog

See [CHANGELOG.md](CHANGELOG.md).