Peter.Morton 28ea425af3 Adding opencode files

2026-04-24 22:18:40 -05:00

8.5 KiB

kme-content-adapter

An HTTP proxy adapter that searches and fetches content from KME (Knowledge Management Engine) and exposes it as a Sitemaps-compliant XML feed and individual HTML article pages. Business logic runs in an isolated Node.js VM sandbox, mirroring the IVA Studio proxy script execution environment.

Requirements

Node.js ≥ 18
Redis (used for token caching)
jq (optional — used by npm start for log pretty-printing)

Setup

npm install
cp src/globalVariables/kme_CSA_settings.json.example src/globalVariables/kme_CSA_settings.json
# Edit kme_CSA_settings.json with real credentials

Configuration

`src/globalVariables/kme_CSA_settings.json`

Credentials and API settings — never commit this file.

{
  "tokenUrl": "https://<host>/oidc-token-service/<env>/token",
  "username": "<username>",
  "password": "<password>",
  "clientId": "default",
  "scope": "openid tags content_entitlements",
  "searchApiBaseUrl": "https://<host>/km-search-service",
  "tenant": "<env>"
}

| Field | Description |
| --- | --- |
| `tokenUrl` | OIDC token endpoint |
| `username` / `password` | KME credentials |
| `clientId` | OAuth client ID (usually `default`) |
| `scope` | OAuth scopes |
| `searchApiBaseUrl` | KME Knowledge Search Service base URL |
| `tenant` | KME tenant/environment path segment (e.g. `qa`) |

`config/default.json`

Infrastructure settings (port, host, log level). Override with environment variables:

| Variable | Default | Description |
| --- | --- | --- |
| `PORT` | `3000` | HTTP server port |
| `HOST` | `0.0.0.0` | Bind address |
| `LOG_LEVEL` | `debug` | Log level: `DEBUG`, `INFO`, `WARN`, `ERROR` |
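The override rule can be illustrated with a minimal sketch. This is not the adapter's actual loading code: `resolveSettings` and the shape of `fileDefaults` are hypothetical, and the real project may rely on a configuration library instead.

```javascript
// Hypothetical defaults mirroring the table above; the real values live in config/default.json.
const fileDefaults = { port: 3000, host: '0.0.0.0', logLevel: 'debug' };

// Hypothetical helper: a set environment variable wins over the file default.
function resolveSettings(env, defaults) {
  const port = env.PORT !== undefined ? Number(env.PORT) : defaults.port;
  if (!Number.isInteger(port) || port < 0 || port > 65535) {
    throw new Error(`Invalid PORT value: ${env.PORT}`);
  }
  return {
    port,
    host: env.HOST || defaults.host,
    logLevel: env.LOG_LEVEL || defaults.logLevel,
  };
}

// Example: only PORT and LOG_LEVEL are overridden, HOST falls back to the default.
console.log(resolveSettings({ PORT: '8080', LOG_LEVEL: 'INFO' }, fileDefaults));
```

Validating `PORT` up front is a design choice in this sketch; it turns a typo into a startup error rather than a confusing bind failure.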

Endpoints

`GET /sitemap.xml`

Returns a Sitemaps protocol 0.9 XML document. Each <loc> points back to this adapter's content fetch endpoint so crawlers can retrieve individual articles.
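As a rough illustration of how each `<loc>` can point back at the adapter, the sketch below builds a sitemap by plain string concatenation. It is an assumption-laden stand-in: the real adapter renders XML via xmlbuilder2, the function name `buildSitemapXmlSketch` is hypothetical, and reading the upstream article URL from an `@id` field is a guess about the item shape.

```javascript
// Escape the five XML special characters so URLs with "&" stay well-formed.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical sketch: items are assumed to carry the upstream article URL in "@id".
function buildSitemapXmlSketch(items, proxyBaseUrl) {
  const urls = items.map((item) => {
    // Each <loc> targets this adapter's content fetch endpoint.
    const loc = `${proxyBaseUrl}/?kmeURL=${encodeURIComponent(item['@id'])}`;
    return `  <url><loc>${escapeXml(loc)}</loc></url>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    '</urlset>',
  ].join('\n');
}

console.log(
  buildSitemapXmlSketch(
    [{ '@id': 'https://example.com/km-content-service/articles/123' }],
    'http://localhost:3000'
  )
);
```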

Query parameters (all optional):

| Parameter | Default | Description |
| --- | --- | --- |
| `query` | `*` | KME search query string |
| `size` | `100` | Max results per search page |
| `category` | `vkm:ArticleCategory` | KME category filter |

Results are paginated automatically using hydra:view['hydra:last']. The response is capped at 50,000 URLs per the Sitemaps protocol.

curl "http://localhost:3000/sitemap.xml?query=temple&size=50&category=vkm:ArticleCategory"
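The pagination and the 50,000-URL cap can be sketched as a simple loop. The response shape is an assumption based on the Hydra vocabulary (`hydra:member`, and a `hydra:view` carrying `@id`, `hydra:next`, and `hydra:last`); `collectAllItems` and `fetchPage` are hypothetical names, and the adapter's real loop may work differently.

```javascript
const MAX_SITEMAP_URLS = 50000; // Sitemaps protocol limit per sitemap file

// fetchPage is a hypothetical async function: (url) => parsed search response.
async function collectAllItems(firstPageUrl, fetchPage) {
  const items = [];
  let url = firstPageUrl;
  while (url && items.length < MAX_SITEMAP_URLS) {
    const data = await fetchPage(url);
    items.push(...(data['hydra:member'] || []));
    const view = data['hydra:view'] || {};
    // Stop once the current page is the one named by hydra:last.
    const isLast = !view['hydra:last'] || view['@id'] === view['hydra:last'];
    url = isLast ? null : view['hydra:next'] || null;
  }
  return items.slice(0, MAX_SITEMAP_URLS);
}

// In-memory stand-in for the search API: three pages of two items each.
const pages = {
  '/search?page=1': {
    'hydra:member': ['a', 'b'],
    'hydra:view': { '@id': '/search?page=1', 'hydra:next': '/search?page=2', 'hydra:last': '/search?page=3' },
  },
  '/search?page=2': {
    'hydra:member': ['c', 'd'],
    'hydra:view': { '@id': '/search?page=2', 'hydra:next': '/search?page=3', 'hydra:last': '/search?page=3' },
  },
  '/search?page=3': {
    'hydra:member': ['e', 'f'],
    'hydra:view': { '@id': '/search?page=3', 'hydra:last': '/search?page=3' },
  },
};

collectAllItems('/search?page=1', async (url) => pages[url]).then((all) => {
  console.log(all.length, 'items collected');
});
```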

`GET /?kmeURL=<upstream-article-url>`

Fetches a single KME article by its upstream URL and returns it as a full HTML document.

curl "http://localhost:3000/?kmeURL=https%3A%2F%2Fexample.com%2Fkm-content-service%2Farticles%2F123"

Response: 200 text/html; charset=utf-8 — a complete HTML document:

<!DOCTYPE html>
<html>
<head><title>Article Title from vkm:name</title></head>
<body>
<!-- vkm:articleBody content verbatim -->
</body>
</html>
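A minimal sketch of producing that document is shown here. `renderArticleHtml` is a hypothetical name, and escaping the title is an assumption of this sketch; the body is passed through verbatim, as the template above indicates.

```javascript
// Escape characters that would otherwise break out of the <title> element.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical sketch: article is a content API response with vkm:name and vkm:articleBody.
function renderArticleHtml(article) {
  const title = escapeHtml(article['vkm:name'] || '');
  const body = article['vkm:articleBody']; // inserted verbatim
  return [
    '<!DOCTYPE html>',
    '<html>',
    `<head><title>${title}</title></head>`,
    '<body>',
    body,
    '</body>',
    '</html>',
  ].join('\n');
}

console.log(renderArticleHtml({ 'vkm:name': 'Tom & Jerry', 'vkm:articleBody': '<p>Hi</p>' }));
```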

Error responses:

| Status | Cause |
| --- | --- |
| `400` | `kmeURL` missing, blank, malformed, or non-http/https |
| `404` | Upstream returned 4xx, or article body absent in response |
| `502` | Token acquisition failed, upstream 5xx, network error, or timeout |
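The status mapping above can be sketched as two small functions. Both names (`validateKmeUrl`, `mapUpstreamStatus`) are hypothetical and the sketch only covers the cases listed in the table; token, network, and timeout failures would map to `502` at the call site.

```javascript
// Returns null when kmeURL is acceptable, otherwise a short reason for the 400 response.
function validateKmeUrl(rawValue) {
  if (typeof rawValue !== 'string' || rawValue.trim() === '') {
    return 'kmeURL is missing or blank';
  }
  let parsed;
  try {
    parsed = new URL(rawValue.trim());
  } catch {
    return 'kmeURL is malformed';
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    return 'kmeURL must use http or https';
  }
  return null;
}

// Maps an upstream HTTP status to the adapter's response status.
function mapUpstreamStatus(upstreamStatus, hasArticleBody) {
  if (upstreamStatus >= 500) return 502;
  if (upstreamStatus >= 400) return 404;
  return hasArticleBody ? 200 : 404;
}

console.log(validateKmeUrl('ftp://example.com/article'));
console.log(mapUpstreamStatus(503, false));
```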

`GET /*` (anything else)

Returns 404 Not Found.

Running

npm run dev      # Development — auto-restart on file changes
npm start        # Production — logs piped through jq

Testing

npm test                    # All tests
npm run test:unit           # Unit tests only
npm run test:integration    # Integration tests only
npm run test:contract       # Contract tests only

# Single test file
node --test tests/unit/proxy.test.js

Tests use the Node.js built-in node:test runner. No external test framework.

Architecture

The server loads src/proxyScripts/kmeContentSourceAdapter.js once at startup via vm.Script, then executes it in a fresh isolated VM context per request via vm.createContext. This mirrors the IVA Studio proxy script execution environment.
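The compile-once, run-per-request pattern can be sketched with the `node:vm` module. This is an illustration rather than the code in server.js: the script source is an inline stand-in, the `handle` function is hypothetical, and the real proxy script presumably does asynchronous work that this synchronous sketch omits.

```javascript
const vm = require('node:vm');

// Stand-in for the proxy script source; the real file is read from disk at startup.
const source = `
  seen = (typeof seen === 'number' ? seen : 0) + 1; // leaks a global inside the sandbox only
  res.body = 'hello ' + req.params.route;
  res.seen = seen;
`;

// Compiled once at startup.
const script = new vm.Script(source, { filename: 'kmeContentSourceAdapter.js' });

// Hypothetical per-request handler: every call gets a fresh, isolated context.
function handle(req) {
  const res = {};
  const context = vm.createContext({ req, res, console });
  script.runInContext(context);
  return res;
}

console.log(handle({ params: { route: 'sitemap' } }));
console.log(handle({ params: { route: 'article' } }));
```

Because each request gets a new context, the global `seen` written by the script starts from scratch every time, so state cannot leak between requests through sandbox globals.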

flowchart TD
    subgraph Client["Client/Crawler"]
        A[Request]
    end
    
    subgraph Server["Node.js Server (server.js)"]
        B[HTTP Request Handler]
        C[VM Context Per Request]
    end
    
    subgraph ProxyScript["Proxy Script (kmeContentSourceAdapter.js)"]
        D[Business Logic]
        E[API Calls via axios]
        F[Redis Token Cache]
    end
    
    subgraph External["External Services"]
        G[KME Search API]
        H[KME Content API]
        I[OIDC Token Service]
    end
    
    A --> B
    B --> C
    C --> D
    D --> E
    E --> G
    E --> H
    D --> F
    F --> I
    
    style C fill:#e1f5ff,stroke:#0288d1,stroke-width:2px
    style D fill:#fff3e0,stroke:#fb8c00,stroke-width:2px
    style F fill:#e8f5e9,stroke:#43a047,stroke-width:2px

Request Flow

Request arrives → server.js HTTP handler extracts routing metadata (workspaceId, branch, route) from URL params
VM context created → Fresh isolated sandbox with injected globals (console, crypto, axios, jwt, redis, etc.)
Proxy script executes → Business logic in kmeContentSourceAdapter.js runs, using injected helpers and settings
Token caching → OIDC tokens are cached in the Redis hash authorization; a stampede guard ensures only one token fetch is in flight when concurrent requests hit a cache miss
Response returned → XML sitemap or HTML article rendered via xmlbuilder2

File Structure

src/
├── proxyScripts/
│   └── kmeContentSourceAdapter.js          # All business logic (zero imports/exports)
├── globalVariables/
│   ├── kme_CSA_settings.json               # Credentials & API config (gitignored, runtime only)
│   ├── kme_CSA_settings.json.example       # Template for version control
│   └── kmeContentSourceAdapterHelpers.js   # Pure utilities (literal function body pattern)
├── logger.js                               # Structured JSON logger
└── server.js                               # HTTP server bootstrap only

config/
└── default.json                            # Infrastructure settings (port, host, log level)

VM Context Globals

All dependencies are injected into each request's sandbox:

| Variable | Source |
| --- | --- |
| `console` | Structured logger |
| `crypto` | Node.js Web Crypto API |
| `axios` | HTTP client |
| `jwt` | `jsonwebtoken` |
| `uuidv4` | UUID v4 generator |
| `xmlbuilder2` | `xmlbuilder2` default export (call as `xmlbuilder2.create(...)`) |
| `redis` | Connected Redis client |
| `URLSearchParams`, `URL` | Node.js globals |
| `kme_CSA_settings` | Loaded from `src/globalVariables/kme_CSA_settings.json` |
| `kmeContentSourceAdapterHelpers` | Loaded from `src/globalVariables/kmeContentSourceAdapterHelpers.js` |
| `req`, `res` | Node.js HTTP request/response |

Key Constraints for `kmeContentSourceAdapter.js`

Zero import/export — runs in a VM with no module system
No config, global.config, or process.env — use injected globals only
Routing metadata is available via req.params (set by server.js)
proxyBaseUrl is derived dynamically from request headers (x-forwarded-proto, x-forwarded-host, host) — not read from settings
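One way the `proxyBaseUrl` derivation could look is sketched below. The header precedence (forwarded headers first, then `host`), the `http` fallback for the scheme, and the function name `deriveProxyBaseUrl` are all assumptions of this sketch, not confirmed behavior of the adapter.

```javascript
// Hypothetical sketch: headers is the lower-cased header map from a Node.js request.
function deriveProxyBaseUrl(headers) {
  // Forwarded headers may hold a comma-separated chain; take the first entry.
  const first = (value) => (value ? String(value).split(',')[0].trim() : '');
  const proto = first(headers['x-forwarded-proto']) || 'http';
  const host = first(headers['x-forwarded-host']) || first(headers.host);
  if (!host) {
    throw new Error('Cannot derive proxyBaseUrl: no host header present');
  }
  return `${proto}://${host}`;
}

console.log(deriveProxyBaseUrl({ host: 'localhost:3000' }));
console.log(
  deriveProxyBaseUrl({
    'x-forwarded-proto': 'https',
    'x-forwarded-host': 'kb.example.com',
    host: 'internal:3000',
  })
);
```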

Token Caching

OIDC tokens are cached in Redis under the hash key authorization (fields token and expiry). The cache survives adapter restarts. Token expiry is stored as an absolute Unix epoch timestamp. A stampede guard ensures only one token fetch is in flight at a time when multiple concurrent requests encounter a cache miss.
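The caching and stampede-guard behavior can be sketched as follows. Everything here is illustrative: `createFakeRedis` is an in-memory stand-in whose method names follow node-redis v4's `hGetAll`/`hSet`, `createTokenGetter` and `fetchToken` are hypothetical, and where the real adapter keeps its in-flight state is not shown in this README.

```javascript
// In-memory stand-in for the two Redis hash commands used below.
function createFakeRedis() {
  const store = new Map();
  return {
    async hGetAll(key) {
      return { ...(store.get(key) || {}) };
    },
    async hSet(key, fields) {
      store.set(key, { ...(store.get(key) || {}), ...fields });
    },
  };
}

// fetchToken is a hypothetical async function resolving to { token, expiresIn } (seconds).
function createTokenGetter(redis, fetchToken, nowSeconds = () => Math.floor(Date.now() / 1000)) {
  let inFlight = null; // shared promise acts as the stampede guard
  return async function getToken() {
    const cached = await redis.hGetAll('authorization');
    if (cached.token && Number(cached.expiry) > nowSeconds()) {
      return cached.token; // cache hit: expiry is an absolute epoch timestamp
    }
    if (!inFlight) {
      inFlight = (async () => {
        const { token, expiresIn } = await fetchToken();
        await redis.hSet('authorization', { token, expiry: String(nowSeconds() + expiresIn) });
        return token;
      })().finally(() => {
        inFlight = null; // allow a new fetch after success or failure
      });
    }
    return inFlight; // concurrent callers share the same fetch
  };
}

// Usage: three concurrent cache misses trigger a single token fetch.
let demoCalls = 0;
const demoGetToken = createTokenGetter(createFakeRedis(), async () => {
  demoCalls += 1;
  return { token: 'demo-token', expiresIn: 3600 };
});
Promise.all([demoGetToken(), demoGetToken(), demoGetToken()]).then(() => {
  console.log('token fetches:', demoCalls);
});
```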

Helpers (`kmeContentSourceAdapterHelpers.js`)

A pure-utility module injected into the VM context. Key functions:

| Function | Description |
| --- | --- |
| `getValidToken(reqUrl, reqMethod)` | Returns a cached or freshly-fetched OIDC `id_token`; throws on failure |
| `extractHydraItems(data)` | Extracts one fragment per `SearchResultItem`: the one with the latest `vkm:datePublished` |
| `buildSitemapXml(items, proxyBaseUrl)` | Builds Sitemaps 0.9 XML from an array of fragments |
| `extractArticleBody(data)` | Returns `vkm:articleBody` (or `articleBody` fallback) from a content API response |
| `validateSettings(settings, fields)` | Returns the first missing required field name, or `null` |
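To make two of these contracts concrete, here is a minimal sketch of `validateSettings` and `extractArticleBody` as described in the table. The bodies are guesses at one reasonable implementation: treating `undefined`, `null`, and empty strings as "missing", and returning `null` when no body is present, are assumptions of this sketch.

```javascript
// Sketch: returns the first missing required field name, or null when all are present.
function validateSettings(settings, fields) {
  for (const field of fields) {
    const value = settings ? settings[field] : undefined;
    if (value === undefined || value === null || value === '') {
      return field;
    }
  }
  return null;
}

// Sketch: prefers vkm:articleBody, falls back to articleBody, else null (assumed).
function extractArticleBody(data) {
  if (!data) return null;
  return data['vkm:articleBody'] || data.articleBody || null;
}

const required = ['tokenUrl', 'username', 'password', 'searchApiBaseUrl', 'tenant'];
console.log(validateSettings({ tokenUrl: 'https://host/token', username: 'u' }, required));
console.log(extractArticleBody({ articleBody: '<p>fallback</p>' }));
```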

Note: This file is a literal function body — server.js wraps it as (function() { <file> })(). It must end with a bare return { ... } and contain zero import/export statements.
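The literal-function-body pattern can be illustrated with a short sketch. The helper source is an inline stand-in, and evaluating the wrapped text with `vm.runInNewContext` is an assumption; the README states only that server.js wraps the file as `(function() { <file> })()`.

```javascript
const vm = require('node:vm');

// Stand-in for the helper file's text: no imports/exports, ends with a bare return.
const helperSource = `
  function double(n) { return n * 2; }
  return { double };
`;

// Wrap the file text as an immediately invoked function and evaluate it.
const helpers = vm.runInNewContext(`(function() { ${helperSource} })()`);

console.log(helpers.double(21)); // prints 42
```

A bare top-level `return` is a syntax error in a normal script, which is why the file only works once wrapped in a function.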

Changelog

See CHANGELOG.md.
