# kme-content-adapter

An HTTP proxy adapter that searches and fetches content from KME (Knowledge Management Engine) and exposes it as a Sitemaps-compliant XML feed and individual HTML article pages. Business logic runs in an isolated Node.js VM sandbox, mirroring the IVA Studio proxy script execution environment.

## Requirements

- Node.js ≥ 18
- Redis (used for token caching)
- `jq` (optional; used by `npm start` for log pretty-printing)

## Setup

```bash
npm install
cp src/globalVariables/kme_CSA_settings.json.example src/globalVariables/kme_CSA_settings.json
# Edit kme_CSA_settings.json with real credentials
```

## Configuration

### `src/globalVariables/kme_CSA_settings.json`

Credentials and API settings. **Never commit this file.**

```json
{
  "tokenUrl": "https://<host>/oidc-token-service/<env>/token",
  "username": "<username>",
  "password": "<password>",
  "clientId": "default",
  "scope": "openid tags content_entitlements",
  "searchApiBaseUrl": "https://<host>/km-search-service",
  "tenant": "<env>"
}
```

| Field | Description |
|---|---|
| `tokenUrl` | OIDC token endpoint |
| `username` / `password` | KME credentials |
| `clientId` | OAuth client ID (usually `default`) |
| `scope` | OAuth scopes |
| `searchApiBaseUrl` | KME Knowledge Search Service base URL |
| `tenant` | KME tenant/environment path segment (e.g. `qa`) |

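A check for missing settings fields could look roughly like the sketch below. It is a minimal illustration, not the adapter's actual code: treating every field in the table as required, and counting `undefined`, `null`, and blank strings as "missing", are both assumptions.

```javascript
// Hypothetical sketch of a required-field check; not the adapter's actual code.
// Assumption: a field is "missing" when it is undefined, null, or a blank string.
function validateSettings(settings, fields) {
  for (const field of fields) {
    const value = settings == null ? undefined : settings[field];
    if (value === undefined || value === null || String(value).trim() === '') {
      return field; // name of the first missing field
    }
  }
  return null; // every required field is present
}

// Field names come from the settings table; treating all as required is an assumption.
const REQUIRED_FIELDS = [
  'tokenUrl', 'username', 'password', 'clientId', 'scope', 'searchApiBaseUrl', 'tenant',
];

console.log(validateSettings({ tokenUrl: 'https://example.invalid/token' }, REQUIRED_FIELDS));
// prints: username
```

Returning the field name rather than a boolean lets the caller log exactly which setting needs attention.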
### `config/default.json`

Infrastructure settings (port, host, log level). Override with environment variables:

| Variable | Default | Description |
|---|---|---|
| `PORT` | `3000` | HTTP server port |
| `HOST` | `0.0.0.0` | Bind address |
| `LOG_LEVEL` | `debug` | Log level: `DEBUG`, `INFO`, `WARN`, `ERROR` |

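The override behaviour can be pictured with a small sketch. The key names (`port`, `host`, `logLevel`) and the function `resolveInfraConfig` are hypothetical; the real layout of `config/default.json` and the way `server.js` merges values may differ.

```javascript
// Hypothetical sketch: key names and precedence logic are assumptions,
// not read from config/default.json.
function resolveInfraConfig(defaults, env) {
  const port = Number.parseInt(env.PORT ?? '', 10);
  return {
    port: Number.isInteger(port) ? port : defaults.port, // ignore non-numeric PORT
    host: env.HOST || defaults.host,
    logLevel: env.LOG_LEVEL || defaults.logLevel,
  };
}

const defaults = { port: 3000, host: '0.0.0.0', logLevel: 'debug' };
console.log(resolveInfraConfig(defaults, { PORT: '8080' }));
// prints: { port: 8080, host: '0.0.0.0', logLevel: 'debug' }
```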
## Endpoints

### `GET /sitemap.xml`

Returns a [Sitemaps protocol 0.9](https://www.sitemaps.org/protocol.html) XML document. Each `<loc>` points back to this adapter's content fetch endpoint so crawlers can retrieve individual articles.

**Query parameters** (all optional):

| Parameter | Default | Description |
|---|---|---|
| `query` | `*` | KME search query string |
| `size` | `100` | Max results per search page |
| `category` | `vkm:ArticleCategory` | KME category filter |

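As an illustration of the defaults in the table, the query string could be read along these lines. The function name `parseSitemapParams` and its return shape are hypothetical, and falling back to `100` for a non-numeric `size` is an assumption.

```javascript
// Hypothetical sketch; the function name and return shape are assumptions.
function parseSitemapParams(requestUrl) {
  // The base is only needed so that a path-only value such as req.url parses.
  const { searchParams } = new URL(requestUrl, 'http://localhost');
  const size = Number.parseInt(searchParams.get('size') ?? '', 10);
  return {
    query: searchParams.get('query') || '*',
    size: Number.isInteger(size) && size > 0 ? size : 100,
    category: searchParams.get('category') || 'vkm:ArticleCategory',
  };
}

console.log(parseSitemapParams('/sitemap.xml?query=temple&size=50'));
// prints: { query: 'temple', size: 50, category: 'vkm:ArticleCategory' }
```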
Results are paginated automatically using `hydra:view['hydra:last']`. The response is capped at **50,000 URLs** per the Sitemaps protocol.

```
GET /sitemap.xml?query=temple&size=50&category=vkm:ArticleCategory
```

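The pagination and cap described above might be sketched as follows. This is an illustration under stated assumptions, not the adapter's implementation: it assumes each page is a Hydra collection with `hydra:member` items and a `hydra:view` carrying `@id`, `hydra:next`, and `hydra:last`, and `fetchPage` is a hypothetical stand-in for the real search request.

```javascript
// Hypothetical sketch. Assumed page shape: { 'hydra:member': [...],
// 'hydra:view': { '@id', 'hydra:next', 'hydra:last' } }.
const MAX_SITEMAP_URLS = 50000; // Sitemaps protocol limit per file

async function collectAllItems(firstPageUrl, fetchPage) {
  const items = [];
  let url = firstPageUrl;
  while (url && items.length < MAX_SITEMAP_URLS) {
    const page = await fetchPage(url);
    items.push(...(page['hydra:member'] ?? []));
    const view = page['hydra:view'] ?? {};
    // Stop once the current page is the one named by hydra:last.
    const isLast = !view['hydra:last'] || view['@id'] === view['hydra:last'];
    url = isLast ? null : view['hydra:next'];
  }
  return items.slice(0, MAX_SITEMAP_URLS);
}

// Usage with two fake pages instead of a live KME search service.
const pages = {
  '/search?page=1': {
    'hydra:member': ['a', 'b'],
    'hydra:view': { '@id': '/search?page=1', 'hydra:next': '/search?page=2', 'hydra:last': '/search?page=2' },
  },
  '/search?page=2': {
    'hydra:member': ['c'],
    'hydra:view': { '@id': '/search?page=2', 'hydra:last': '/search?page=2' },
  },
};
collectAllItems('/search?page=1', async (url) => pages[url])
  .then((items) => console.log(items)); // prints: [ 'a', 'b', 'c' ]
```

Passing `fetchPage` in as a parameter keeps the loop testable without network access.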
### `GET /?kmeURL=<upstream-article-url>`

Fetches a single KME article by its upstream URL and returns it as a full HTML document.

```
GET /?kmeURL=https%3A%2F%2F<kme-host>%2Fkm-content-service%2F...
```

**Response:** `200 text/html; charset=utf-8`, a complete HTML document:

```html
<!DOCTYPE html>
<html>
<head><title>Article Title from vkm:name</title></head>
<body>
<!-- vkm:articleBody content verbatim -->
</body>
</html>
```

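Assembling that document from a content API response could look something like this sketch. The function names are hypothetical. Escaping the title and returning `null` when no body is present are assumptions; the `articleBody` fallback follows the Helpers description in this README.

```javascript
// Hypothetical sketch, not the adapter's actual code.
// Assumption: the title is HTML-escaped; the body is inserted verbatim.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

function renderArticleHtml(data) {
  const body = data['vkm:articleBody'] ?? data.articleBody;
  if (!body) return null; // caller can answer 404 when the body is absent
  const title = escapeHtml(data['vkm:name'] ?? '');
  return [
    '<!DOCTYPE html>',
    '<html>',
    `<head><title>${title}</title></head>`,
    '<body>',
    body,
    '</body>',
    '</html>',
  ].join('\n');
}

console.log(renderArticleHtml({ 'vkm:name': 'Reset a password', 'vkm:articleBody': '<p>Steps</p>' }));
```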
**Error responses:**

| Status | Cause |
|---|---|
| `400` | `kmeURL` missing, blank, malformed, or non-http/https |
| `404` | Upstream returned 4xx, or article body absent in response |
| `502` | Token acquisition failed, upstream 5xx, network error, or timeout |

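The status rules in the table can be expressed as two small functions. This is a sketch with hypothetical names (`validateKmeUrl`, `mapUpstreamStatus`) and hypothetical error messages; only the status codes themselves come from the table.

```javascript
// Hypothetical sketch; function names and messages are illustrative.
function validateKmeUrl(raw) {
  if (typeof raw !== 'string' || raw.trim() === '') {
    return { status: 400, error: 'kmeURL is missing or blank' };
  }
  let parsed;
  try {
    parsed = new URL(raw.trim()); // throws on malformed input
  } catch {
    return { status: 400, error: 'kmeURL is malformed' };
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    return { status: 400, error: 'kmeURL must use http or https' };
  }
  return { url: parsed };
}

// Maps an upstream HTTP status onto the adapter's response status.
function mapUpstreamStatus(upstreamStatus) {
  if (upstreamStatus >= 400 && upstreamStatus < 500) return 404;
  if (upstreamStatus >= 500) return 502;
  return 200;
}

console.log(validateKmeUrl('ftp://kme.example/article').status); // prints: 400
console.log(mapUpstreamStatus(403));                             // prints: 404
```

Network errors and timeouts have no upstream status, so a caller would map those to `502` in its own `catch` branch.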
### `GET /*` (anything else)

Returns `404 Not Found`.

---

## Running

```bash
npm run dev   # Development: auto-restart on file changes
npm start     # Production: logs piped through jq
```

## Testing

```bash
npm test                   # All tests
npm run test:unit          # Unit tests only
npm run test:integration   # Integration tests only
npm run test:contract      # Contract tests only

# Single test file
node --test tests/unit/proxy.test.js
```

Tests use the Node.js built-in `node:test` runner. No external test framework.

## Architecture

The server loads `src/proxyScripts/kmeContentSourceAdapter.js` once at startup via `vm.Script`, then executes it in a **fresh isolated VM context per request** via `vm.createContext`.

```
src/
├── proxyScripts/
│   └── kmeContentSourceAdapter.js          # All business logic (zero imports/exports)
├── globalVariables/
│   ├── kme_CSA_settings.json               # Credentials & API config (gitignored)
│   ├── kme_CSA_settings.json.example       # Template for version control
│   └── kmeContentSourceAdapterHelpers.js   # Pure utilities (literal function body)
├── logger.js                               # Structured JSON logger
└── server.js                               # HTTP server bootstrap only
config/
└── default.json                            # Infrastructure settings
```

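The compile-once, run-per-request pattern can be illustrated with Node's `vm` module. The sketch below uses an inline string as a stand-in for the script file and a hypothetical `handle` function; how `server.js` actually wires requests is not shown in this README.

```javascript
const vm = require('node:vm');

// Stand-in for the contents of kmeContentSourceAdapter.js (assumption:
// the real script is read from disk at startup).
const source = `
  res.body = 'handled ' + req.url;
  seen = (typeof seen === 'number' ? seen : 0) + 1; // would grow if state leaked
  res.seen = seen;
`;

// Compiled once at startup.
const script = new vm.Script(source, { filename: 'kmeContentSourceAdapter.js' });

// Called per request: a fresh context means globals never carry over.
function handle(req) {
  const res = {};
  const sandbox = vm.createContext({ req, res, console });
  script.runInContext(sandbox);
  return res;
}

console.log(handle({ url: '/sitemap.xml' })); // prints: { body: 'handled /sitemap.xml', seen: 1 }
console.log(handle({ url: '/' }).seen);       // prints: 1 (no state shared with the first call)
```

Compiling once avoids re-parsing the script on every request, while the new context per request keeps one request's globals invisible to the next.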
### VM Context Globals

All dependencies are injected into each request's sandbox:

| Variable | Source |
|---|---|
| `console` | Structured logger |
| `crypto` | Node.js Web Crypto API |
| `axios` | HTTP client |
| `jwt` | `jsonwebtoken` |
| `uuidv4` | UUID v4 generator |
| `xmlbuilder2` | `xmlbuilder2` default export (call as `xmlbuilder2.create(...)`) |
| `redis` | Connected Redis client |
| `URLSearchParams`, `URL` | Node.js globals |
| `kme_CSA_settings` | Loaded from `src/globalVariables/kme_CSA_settings.json` |
| `kmeContentSourceAdapterHelpers` | Loaded from `src/globalVariables/kmeContentSourceAdapterHelpers.js` |
| `req`, `res` | Node.js HTTP request/response |

### Key Constraints for `kmeContentSourceAdapter.js`

- **Zero `import`/`export`**: runs in a VM with no module system
- **No `config`, `global.config`, or `process.env`**: use injected globals only
- Routing metadata is available via `req.params` (set by `server.js`)
- `proxyBaseUrl` is derived dynamically from request headers (`x-forwarded-proto`, `x-forwarded-host`, `host`), not read from settings

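Deriving `proxyBaseUrl` from the three headers listed above might look like the following sketch. The function name, the precedence of `x-forwarded-host` over `host`, and the `http` default are assumptions rather than documented behaviour.

```javascript
// Hypothetical sketch; precedence order and the 'http' default are assumptions.
function deriveProxyBaseUrl(headers) {
  // A proxy chain may append values ("https, http"); keep the first one.
  const first = (value) => (value ? String(value).split(',')[0].trim() : '');
  const proto = first(headers['x-forwarded-proto']) || 'http';
  const host = first(headers['x-forwarded-host']) || first(headers.host);
  return `${proto}://${host}`;
}

console.log(deriveProxyBaseUrl({ host: 'localhost:3000' }));
// prints: http://localhost:3000
console.log(deriveProxyBaseUrl({
  'x-forwarded-proto': 'https',
  'x-forwarded-host': 'kb.example.com',
  host: '10.0.0.5:3000',
}));
// prints: https://kb.example.com
```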
## Token Caching

OIDC tokens are cached in Redis under the hash key `authorization` (fields `token` and `expiry`). The cache survives adapter restarts. Token expiry is stored as an absolute Unix epoch timestamp. A stampede guard ensures only one token fetch is in flight at a time when multiple concurrent requests encounter a cache miss.

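One common way to build such a stampede guard is to share a single in-flight promise, as in the sketch below. This is not the adapter's code: `store` is a hypothetical stand-in for the Redis hash, `fetchToken` for the OIDC request, and the epoch-seconds unit and 30-second safety margin are assumptions.

```javascript
// Hypothetical sketch. `store` and `fetchToken` are injected so the example
// runs without Redis. Assumptions: expiry in epoch seconds, 30 s safety margin.
function createTokenProvider(store, fetchToken, now = () => Math.floor(Date.now() / 1000)) {
  let inFlight = null; // shared promise while a fetch is running

  return async function getValidToken() {
    const cached = await store.get();
    if (cached && cached.token && Number(cached.expiry) - 30 > now()) {
      return cached.token; // cache hit
    }
    if (!inFlight) {
      inFlight = (async () => {
        try {
          const { token, expiresIn } = await fetchToken();
          await store.set({ token, expiry: now() + expiresIn });
          return token;
        } finally {
          inFlight = null; // allow a later refresh, also after a failure
        }
      })();
    }
    return inFlight; // concurrent callers wait on the same fetch
  };
}

// Usage: three concurrent cache misses trigger a single fetch.
let calls = 0;
let saved = null;
const getValidToken = createTokenProvider(
  { get: async () => saved, set: async (value) => { saved = value; } },
  async () => { calls += 1; return { token: 'abc', expiresIn: 3600 }; },
);
Promise.all([getValidToken(), getValidToken(), getValidToken()])
  .then((tokens) => console.log(tokens, calls)); // prints: [ 'abc', 'abc', 'abc' ] 1
```

A guard of this kind only coordinates requests within one process; several adapter instances sharing one Redis would each perform their own fetch.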
## Helpers (`kmeContentSourceAdapterHelpers.js`)

A pure-utility module injected into the VM context. Key functions:

| Function | Description |
|---|---|
| `getValidToken(reqUrl, reqMethod)` | Returns a cached or freshly fetched OIDC `id_token`; throws on failure |
| `extractHydraItems(data)` | Extracts one fragment per `SearchResultItem`: the one with the latest `vkm:datePublished` |
| `buildSitemapXml(items, proxyBaseUrl)` | Builds Sitemaps 0.9 XML from an array of fragments |
| `extractArticleBody(data)` | Returns `vkm:articleBody` (or `articleBody` fallback) from a content API response |
| `validateSettings(settings, fields)` | Returns the first missing required field name, or `null` |

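To make the shape of the sitemap output concrete, here is a rough sketch of a `buildSitemapXml`-style function. It deliberately uses plain string building instead of `xmlbuilder2`, so it is a stand-in rather than the helper's real implementation, and it assumes each fragment carries its upstream article URL in `@id`.

```javascript
// Hypothetical sketch using plain strings rather than xmlbuilder2.
// Assumption: each item carries its upstream article URL in `@id`.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

function buildSitemapXml(items, proxyBaseUrl) {
  const urls = items.slice(0, 50000).map((item) => {
    // Each <loc> points back at the adapter's content fetch endpoint.
    const loc = `${proxyBaseUrl}/?kmeURL=${encodeURIComponent(item['@id'])}`;
    return `  <url><loc>${escapeXml(loc)}</loc></url>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    '</urlset>',
  ].join('\n');
}

console.log(buildSitemapXml([{ '@id': 'https://kme.example/a?x=1' }], 'http://localhost:3000'));
```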
> **Note:** This file is a literal function body: `server.js` wraps it as `(function() { <file> })()`. It must end with a bare `return { ... }` and contain zero `import`/`export` statements.

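The wrapping described in the note can be illustrated as follows. The helper source here is an inline stand-in, and evaluating the wrapped text with `vm.runInNewContext` is an assumption; `server.js` may evaluate it differently.

```javascript
const vm = require('node:vm');

// Stand-in for the contents of kmeContentSourceAdapterHelpers.js: a bare
// function body ending in `return { ... }` (not valid as a standalone script).
const helperSource = `
  function validateSettings(settings, fields) {
    for (const f of fields) if (!settings[f]) return f;
    return null;
  }
  return { validateSettings };
`;

// Wrapping the body in an IIFE turns the top-level `return` into a value.
function loadHelpers(source) {
  return vm.runInNewContext(`(function () {\n${source}\n})()`);
}

const helpers = loadHelpers(helperSource);
console.log(helpers.validateSettings({ tokenUrl: 'x' }, ['tokenUrl', 'tenant']));
// prints: tenant
```

Because the file is only a function body, adding an `import` or `export` statement would make the wrapped text a syntax error, which is why the note forbids them.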
## Changelog

See [CHANGELOG.md](CHANGELOG.md).