kme_content_adapter/README.md
2026-04-24 22:18:40 -05:00

kme-content-adapter

Node.js License: MIT Build Status

An HTTP proxy adapter that searches and fetches content from KME (Knowledge Management Engine) and exposes it as a Sitemaps-compliant XML feed and individual HTML article pages. Business logic runs in an isolated Node.js VM sandbox, mirroring the IVA Studio proxy script execution environment.

Requirements

  • Node.js ≥ 18
  • Redis (used for token caching)
  • jq (optional — used by npm start for log pretty-printing)

Setup

npm install
cp src/globalVariables/kme_CSA_settings.json.example src/globalVariables/kme_CSA_settings.json
# Edit kme_CSA_settings.json with real credentials

Configuration

src/globalVariables/kme_CSA_settings.json

Credentials and API settings — never commit this file.

{
  "tokenUrl": "https://<host>/oidc-token-service/<env>/token",
  "username": "<username>",
  "password": "<password>",
  "clientId": "default",
  "scope": "openid tags content_entitlements",
  "searchApiBaseUrl": "https://<host>/km-search-service",
  "tenant": "<env>"
}
| Field | Description |
|---|---|
| tokenUrl | OIDC token endpoint |
| username / password | KME credentials |
| clientId | OAuth client ID (usually default) |
| scope | OAuth scopes |
| searchApiBaseUrl | KME Knowledge Search Service base URL |
| tenant | KME tenant/environment path segment (e.g. qa) |

config/default.json

Infrastructure settings (port, host, log level). Override with environment variables:

| Variable | Default | Description |
|---|---|---|
| PORT | 3000 | HTTP server port |
| HOST | 0.0.0.0 | Bind address |
| LOG_LEVEL | debug | Log level: DEBUG, INFO, WARN, ERROR |
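
As a rough illustration of the override rule above, the sketch below resolves each setting from the environment first and falls back to the defaults otherwise. The function name `resolveInfraConfig` and the key names `port`, `host`, and `logLevel` are assumptions made for this example; the actual layout of config/default.json may differ.

```javascript
// Hypothetical resolver: an environment variable wins over the default value.
// The key names (port, host, logLevel) are assumed, not taken from config/default.json.
function resolveInfraConfig(env, defaults) {
  const port = Number.parseInt(env.PORT ?? '', 10);
  return {
    // Fall back to the default when PORT is unset or not a number.
    port: Number.isInteger(port) ? port : defaults.port,
    host: env.HOST || defaults.host,
    logLevel: env.LOG_LEVEL || defaults.logLevel,
  };
}

const defaults = { port: 3000, host: '0.0.0.0', logLevel: 'debug' };
const infra = resolveInfraConfig({ PORT: '8080' }, defaults);
```

In a real process the first argument would be `process.env`.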

Endpoints

GET /sitemap.xml

Returns a Sitemaps protocol 0.9 XML document. Each <loc> points back to this adapter's content fetch endpoint so crawlers can retrieve individual articles.
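
To make the "points back to this adapter" idea concrete, here is a minimal sketch of how each upstream article URL could be turned into a `<loc>` entry. It uses plain string building to stay self-contained, whereas the adapter itself uses xmlbuilder2. The function names and the shape of the input (a plain array of upstream URLs) are assumptions for illustration.

```javascript
// Escape the five XML special characters so values are safe inside element text.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical builder: one <url> entry per upstream article URL.
function buildSitemapSketch(upstreamUrls, proxyBaseUrl) {
  const entries = upstreamUrls.map((upstreamUrl) => {
    // The crawler is sent back to the adapter's fetch endpoint, not to KME directly.
    const loc = `${proxyBaseUrl}/?kmeURL=${encodeURIComponent(upstreamUrl)}`;
    return `  <url><loc>${escapeXml(loc)}</loc></url>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    '</urlset>',
  ].join('\n');
}

const xml = buildSitemapSketch(
  ['https://example.com/km-content-service/articles/123'],
  'http://localhost:3000'
);
```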

Query parameters (all optional):

| Parameter | Default | Description |
|---|---|---|
| query | * | KME search query string |
| size | 100 | Max results per search page |
| category | vkm:ArticleCategory | KME category filter |

Results are paginated automatically using hydra:view['hydra:last']. The response is capped at 50,000 URLs per the Sitemaps protocol.
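
The pagination loop and the 50,000 URL cap could look roughly like the sketch below. Several details are assumptions: `fetchPage` is a hypothetical callback that returns one parsed search response, results are assumed to live under `hydra:member`, and the final page number is assumed to be carried in a `page` query parameter of the `hydra:last` URL.

```javascript
const MAX_SITEMAP_URLS = 50000; // Sitemaps protocol limit per sitemap file

// fetchPage is a hypothetical async callback: (pageNumber) => parsed JSON-LD response.
async function collectAllResults(fetchPage) {
  const items = [];
  let page = 1;
  let lastPage = 1;
  do {
    const data = await fetchPage(page);
    items.push(...(data['hydra:member'] ?? []));
    // Assumption: hydra:last is a URL (possibly relative) with a `page` parameter.
    const lastUrl = data['hydra:view']?.['hydra:last'];
    if (lastUrl) {
      const parsedUrl = new URL(lastUrl, 'http://placeholder.invalid');
      const parsedPage = Number.parseInt(parsedUrl.searchParams.get('page') ?? '', 10);
      if (Number.isInteger(parsedPage)) lastPage = parsedPage;
    }
    page += 1;
  } while (page <= lastPage && items.length < MAX_SITEMAP_URLS);
  // Trim any overshoot from the final page so the cap is never exceeded.
  return items.slice(0, MAX_SITEMAP_URLS);
}
```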

curl "http://localhost:3000/sitemap.xml?query=temple&size=50&category=vkm:ArticleCategory"

GET /?kmeURL=<upstream-article-url>

Fetches a single KME article by its upstream URL and returns it as a full HTML document.

curl "http://localhost:3000/?kmeURL=https%3A%2F%2Fexample.com%2Fkm-content-service%2Farticles%2F123"

Response: 200 text/html; charset=utf-8 — a complete HTML document:

<!DOCTYPE html>
<html>
<head><title>Article Title from vkm:name</title></head>
<body>
<!-- vkm:articleBody content verbatim -->
</body>
</html>
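
A minimal sketch of how such a document might be assembled from an upstream article object is shown below. The function names are hypothetical, and escaping the title is an assumption of this example; the body is inserted verbatim, as the template above indicates.

```javascript
// Escape characters that would otherwise break out of the <title> element.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical renderer: title from vkm:name, body from vkm:articleBody (verbatim).
function renderArticleHtml(article) {
  const title = escapeHtml(article['vkm:name'] ?? '');
  const body = article['vkm:articleBody'];
  return [
    '<!DOCTYPE html>',
    '<html>',
    `<head><title>${title}</title></head>`,
    '<body>',
    body,
    '</body>',
    '</html>',
  ].join('\n');
}

const html = renderArticleHtml({
  'vkm:name': 'Resetting a password',
  'vkm:articleBody': '<p>Open Settings and choose Reset.</p>',
});
```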

Error responses:

| Status | Cause |
|---|---|
| 400 | kmeURL missing, blank, malformed, or non-http/https |
| 404 | Upstream returned 4xx, or article body absent in response |
| 502 | Token acquisition failed, upstream 5xx, network error, or timeout |
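
The 400 conditions and the upstream status mapping can be sketched as follows. Both function names are hypothetical, and the mapping covers only the cases listed in the table; it is an illustration, not the adapter's actual code.

```javascript
// Returns the parsed URL when kmeURL is usable, or null when the adapter should answer 400.
function parseKmeUrl(rawValue) {
  if (typeof rawValue !== 'string' || rawValue.trim() === '') return null; // missing or blank
  let parsed;
  try {
    parsed = new URL(rawValue.trim());
  } catch {
    return null; // malformed
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') return null; // wrong scheme
  return parsed;
}

// Maps an upstream HTTP status to the status this adapter would return.
function mapUpstreamStatus(upstreamStatus) {
  if (upstreamStatus >= 400 && upstreamStatus < 500) return 404;
  if (upstreamStatus >= 500) return 502;
  return 200;
}
```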

GET /* (anything else)

Returns 404 Not Found.


Running

npm run dev      # Development — auto-restart on file changes
npm start        # Production — logs piped through jq

Testing

npm test                    # All tests
npm run test:unit           # Unit tests only
npm run test:integration    # Integration tests only
npm run test:contract       # Contract tests only

# Single test file
node --test tests/unit/proxy.test.js

Tests use the Node.js built-in node:test runner. No external test framework.

Architecture

The server loads src/proxyScripts/kmeContentSourceAdapter.js once at startup via vm.Script, then executes it in a fresh isolated VM context per request via vm.createContext. This mirrors the IVA Studio proxy script execution environment.
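
The compile-once, run-per-request pattern can be sketched with the built-in `node:vm` module as below. The inline `source` string stands in for the real proxy script, and the `handle` function and the shape of `req`/`res` are simplifications assumed for this example.

```javascript
const vm = require('node:vm');

// Stand-in for the proxy script source; the real server would read the file from disk.
const source = `
  globalThis.counter = (globalThis.counter || 0) + 1; // would leak if contexts were shared
  res.count = globalThis.counter;
  res.body = 'handled ' + req.url;
`;

// Compile once at startup.
const script = new vm.Script(source, { filename: 'kmeContentSourceAdapter.js' });

// Run per request in a fresh context so globals never carry over between requests.
function handle(req) {
  const res = {};
  const context = vm.createContext({ req, res, console });
  script.runInContext(context);
  return res;
}

const first = handle({ url: '/sitemap.xml' });
const second = handle({ url: '/other' });
```

Because each call builds a new context, `counter` starts from scratch every time, which is the isolation property the per-request sandbox is meant to provide.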

flowchart TD
    subgraph Client["Client/Crawler"]
        A[Request]
    end
    
    subgraph Server["Node.js Server (server.js)"]
        B[HTTP Request Handler]
        C[VM Context Per Request]
    end
    
    subgraph ProxyScript["Proxy Script (kmeContentSourceAdapter.js)"]
        D[Business Logic]
        E[API Calls via axios]
        F[Redis Token Cache]
    end
    
    subgraph External["External Services"]
        G[KME Search API]
        H[KME Content API]
        I[OIDC Token Service]
    end
    
    A --> B
    B --> C
    C --> D
    D --> E
    E --> G
    E --> H
    D --> F
    F --> I
    
    style C fill:#e1f5ff,stroke:#0288d1,stroke-width:2px
    style D fill:#fff3e0,stroke:#fb8c00,stroke-width:2px
    style F fill:#e8f5e9,stroke:#43a047,stroke-width:2px

Request Flow

  1. Request arrives → server.js HTTP handler extracts routing metadata (workspaceId, branch, route) from URL params
  2. VM context created → Fresh isolated sandbox with injected globals (console, crypto, axios, jwt, redis, etc.)
  3. Proxy script executes → Business logic in kmeContentSourceAdapter.js runs, using injected helpers and settings
  4. Token caching → OIDC tokens cached in Redis under key authorization; stampede guard prevents cache thrashing
  5. Response returned → XML sitemap or HTML article rendered via xmlbuilder2

File Structure

src/
├── proxyScripts/
│   └── kmeContentSourceAdapter.js          # All business logic (zero imports/exports)
├── globalVariables/
│   ├── kme_CSA_settings.json               # Credentials & API config (gitignored, runtime only)
│   ├── kme_CSA_settings.json.example       # Template for version control
│   └── kmeContentSourceAdapterHelpers.js   # Pure utilities (literal function body pattern)
├── logger.js                               # Structured JSON logger
└── server.js                               # HTTP server bootstrap only

config/
└── default.json                            # Infrastructure settings (port, host, log level)

VM Context Globals

All dependencies are injected into each request's sandbox:

| Variable | Source |
|---|---|
| console | Structured logger |
| crypto | Node.js Web Crypto API |
| axios | HTTP client |
| jwt | jsonwebtoken |
| uuidv4 | UUID v4 generator |
| xmlbuilder2 | xmlbuilder2 default export (call as xmlbuilder2.create(...)) |
| redis | Connected Redis client |
| URLSearchParams, URL | Node.js globals |
| kme_CSA_settings | Loaded from src/globalVariables/kme_CSA_settings.json |
| kmeContentSourceAdapterHelpers | Loaded from src/globalVariables/kmeContentSourceAdapterHelpers.js |
| req, res | Node.js HTTP request/response |

Key Constraints for kmeContentSourceAdapter.js

  • Zero import/export — runs in a VM with no module system
  • No config, global.config, or process.env — use injected globals only
  • Routing metadata is available via req.params (set by server.js)
  • proxyBaseUrl is derived dynamically from request headers (x-forwarded-proto, x-forwarded-host, host) — not read from settings
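
Deriving proxyBaseUrl from the request headers might look like the sketch below. The function name, the `http` fallback, and the choice to take the first entry of a comma-separated forwarded header are assumptions for illustration.

```javascript
// Hypothetical helper: prefer forwarded headers (set by a reverse proxy), then fall back to host.
function deriveProxyBaseUrl(headers) {
  // Forwarded headers may hold a comma-separated chain; the first entry is the original client hop.
  const firstValue = (value) => (value ? String(value).split(',')[0].trim() : '');
  const proto = firstValue(headers['x-forwarded-proto']) || 'http'; // assumed default scheme
  const host = firstValue(headers['x-forwarded-host']) || firstValue(headers.host);
  return `${proto}://${host}`;
}

const direct = deriveProxyBaseUrl({ host: 'localhost:3000' });
const proxied = deriveProxyBaseUrl({
  'x-forwarded-proto': 'https',
  'x-forwarded-host': 'kb.example.com',
  host: 'internal:3000',
});
```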

Token Caching

OIDC tokens are cached in Redis under the hash key authorization (fields token and expiry). The cache survives adapter restarts. Token expiry is stored as an absolute Unix epoch timestamp. A stampede guard ensures only one token fetch is in flight at a time when multiple concurrent requests encounter a cache miss.
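
The caching and stampede-guard behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the adapter's code: the Redis client is replaced by an in-memory stand-in exposing `hGetAll` and `hSet` (the method names used by node-redis v4), `fetchToken` is a hypothetical callback returning `{ token, expiresIn }`, and expiry is assumed to be stored in epoch seconds.

```javascript
// In-memory stand-in for the Redis hash so the sketch runs without a server.
function createFakeRedis() {
  const hashes = new Map();
  return {
    async hGetAll(key) { return { ...(hashes.get(key) ?? {}) }; },
    async hSet(key, fields) { hashes.set(key, { ...(hashes.get(key) ?? {}), ...fields }); },
  };
}

function createTokenProvider(redis, fetchToken, now = () => Math.floor(Date.now() / 1000)) {
  let inFlight = null; // shared promise: concurrent callers wait on the same lookup

  async function load() {
    const cached = await redis.hGetAll('authorization');
    if (cached.token && Number(cached.expiry) > now()) return cached.token; // cache hit
    const { token, expiresIn } = await fetchToken(); // hypothetical response shape
    // Store expiry as an absolute timestamp so it stays valid across restarts.
    await redis.hSet('authorization', { token, expiry: String(now() + expiresIn) });
    return token;
  }

  return function getValidToken() {
    if (!inFlight) {
      // Clear the guard once the lookup settles, whether it succeeded or failed.
      inFlight = load().finally(() => { inFlight = null; });
    }
    return inFlight;
  };
}
```

Sharing one promise keeps the guard local to a single process; coordinating several adapter instances would need a different mechanism.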

Helpers (kmeContentSourceAdapterHelpers.js)

A pure-utility module injected into the VM context. Key functions:

| Function | Description |
|---|---|
| getValidToken(reqUrl, reqMethod) | Returns a cached or freshly-fetched OIDC id_token; throws on failure |
| extractHydraItems(data) | Extracts one fragment per SearchResultItem: the one with the latest vkm:datePublished |
| buildSitemapXml(items, proxyBaseUrl) | Builds Sitemaps 0.9 XML from an array of fragments |
| extractArticleBody(data) | Returns vkm:articleBody (or articleBody fallback) from a content API response |
| validateSettings(settings, fields) | Returns the first missing required field name, or null |
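
Two of these helpers are simple enough to illustrate directly. The versions below are re-implementations written for this README from the descriptions above, not the shipped code; in particular, treating empty strings as missing and returning null when no body is found are assumptions.

```javascript
// Returns the name of the first required field that is missing, or null when all are present.
function validateSettings(settings, fields) {
  for (const field of fields) {
    const value = settings?.[field];
    // Assumption: undefined, null, and empty string all count as "missing".
    if (value === undefined || value === null || value === '') return field;
  }
  return null;
}

// Prefers the namespaced key and falls back to the plain one.
function extractArticleBody(data) {
  return data?.['vkm:articleBody'] ?? data?.articleBody ?? null;
}

const missing = validateSettings(
  { tokenUrl: 'https://example.com/token', username: 'svc' },
  ['tokenUrl', 'username', 'password']
);
```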

Note: This file is a literal function body — server.js wraps it as (function() { <file> })(). It must end with a bare return { ... } and contain zero import/export statements.
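
The literal function body pattern can be demonstrated in isolation as below. The `helpersSource` string stands in for the file contents, and evaluating the wrapped text with `vm.runInNewContext` is an assumption of this sketch; server.js may evaluate it differently.

```javascript
const vm = require('node:vm');

// Stand-in for the helpers file: no import/export, ends with a bare return.
const helpersSource = `
  function double(n) { return n * 2; }
  return { double };
`;

// A bare top-level return is a syntax error on its own; wrapping it in an IIFE makes it legal.
const wrapped = `(function() { ${helpersSource} })()`;

// The value of the IIFE expression is the object returned by the file body.
const helpers = vm.runInNewContext(wrapped, {});
const result = helpers.double(21);
```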

Changelog

See CHANGELOG.md.