docs: update README for v0.4.0
- Add Endpoints section documenting /sitemap.xml, /?kmeURL=, and 404 fallback
- Expand settings table with searchApiBaseUrl and tenant fields
- Update file tree to reflect kmeContentSourceAdapterHelpers.js
- Add Helpers section documenting each exported function
- Expand VM context globals table with helpers and correct xmlbuilder2 usage
- Note dynamic proxyBaseUrl derivation from request headers
- Add stampede guard detail to Token Caching section

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
README.md
@@ -1,6 +1,6 @@
# kme-content-adapter

An HTTP proxy adapter that authenticates against KME and proxies content requests through an isolated VM sandbox, mirroring the IVA Studio proxy script execution environment.
An HTTP proxy adapter that searches and fetches content from KME (Knowledge Management Engine) and exposes it as a Sitemaps-compliant XML feed and individual HTML article pages. Business logic runs in an isolated Node.js VM sandbox, mirroring the IVA Studio proxy script execution environment.

## Requirements

@@ -20,7 +20,7 @@ cp src/globalVariables/kme_CSA_settings.json.example src/globalVariables/kme_CSA

### `src/globalVariables/kme_CSA_settings.json`

Credentials and OIDC settings — **never commit this file**.
Credentials and API settings — **never commit this file**.

```json
{
@@ -28,10 +28,21 @@ Credentials and OIDC settings — **never commit this file**.
  "username": "<username>",
  "password": "<password>",
  "clientId": "default",
  "scope": "openid tags content_entitlements"
  "scope": "openid tags content_entitlements",
  "searchApiBaseUrl": "https://<host>/km-search-service",
  "tenant": "<env>"
}
```

| Field | Description |
|---|---|
| `tokenUrl` | OIDC token endpoint |
| `username` / `password` | KME credentials |
| `clientId` | OAuth client ID (usually `default`) |
| `scope` | OAuth scopes |
| `searchApiBaseUrl` | KME Knowledge Search Service base URL |
| `tenant` | KME tenant/environment path segment (e.g. `qa`) |

### `config/default.json`

Infrastructure settings (port, host, log level). Override with environment variables:
@@ -42,6 +53,60 @@ Infrastructure settings (port, host, log level). Override with environment varia
| `HOST` | `0.0.0.0` | Bind address |
| `LOG_LEVEL` | `debug` | Log level: `DEBUG`, `INFO`, `WARN`, `ERROR` |

## Endpoints

### `GET /sitemap.xml`

Returns a [Sitemaps protocol 0.9](https://www.sitemaps.org/protocol.html) XML document. Each `<loc>` points back to this adapter's content fetch endpoint so crawlers can retrieve individual articles.

**Query parameters** (all optional):

| Parameter | Default | Description |
|---|---|---|
| `query` | `*` | KME search query string |
| `size` | `100` | Max results per search page |
| `category` | `vkm:ArticleCategory` | KME category filter |

Results are paginated automatically using `hydra:view['hydra:last']`. The response is capped at **50,000 URLs** per the Sitemaps protocol.

```
GET /sitemap.xml?query=temple&size=50&category=vkm:ArticleCategory
```

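
As a rough illustration of the parameter defaults and the URL cap described above, the sketch below parses the query string with the `URL` global. The helper names `parseSitemapParams` and `capSitemapItems` are hypothetical and do not come from the adapter's source. Falling back to the default when `size` is non-numeric or non-positive is also an assumption.

```javascript
// Defaults copied from the query-parameter table above.
const SITEMAP_DEFAULTS = { query: '*', size: 100, category: 'vkm:ArticleCategory' };

// Per-file URL limit stated by the Sitemaps protocol.
const MAX_SITEMAP_URLS = 50000;

// Hypothetical helper: read /sitemap.xml query parameters, applying defaults.
function parseSitemapParams(requestUrl) {
  // The base URL only exists so that a relative request path can be parsed.
  const { searchParams } = new URL(requestUrl, 'http://localhost');
  const size = Number.parseInt(searchParams.get('size') || '', 10);
  return {
    query: searchParams.get('query') || SITEMAP_DEFAULTS.query,
    // Assumption: invalid or non-positive sizes fall back to the default.
    size: Number.isInteger(size) && size > 0 ? size : SITEMAP_DEFAULTS.size,
    category: searchParams.get('category') || SITEMAP_DEFAULTS.category,
  };
}

// Hypothetical helper: truncate collected items to the protocol limit.
function capSitemapItems(items) {
  return items.slice(0, MAX_SITEMAP_URLS);
}
```

Calling `parseSitemapParams('/sitemap.xml')` with no query string yields the three defaults from the table.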
### `GET /?kmeURL=<upstream-article-url>`

Fetches a single KME article by its upstream URL and returns it as a full HTML document.

```
GET /?kmeURL=https%3A%2F%2F<kme-host>%2Fkm-content-service%2F...
```

**Response:** `200 text/html; charset=utf-8` — a complete HTML document:

```html
<!DOCTYPE html>
<html>
<head><title>Article Title from vkm:name</title></head>
<body>
<!-- vkm:articleBody content verbatim -->
</body>
</html>
```

**Error responses:**

| Status | Cause |
|---|---|
| `400` | `kmeURL` missing, blank, malformed, or non-http/https |
| `404` | Upstream returned 4xx, or article body absent in response |
| `502` | Token acquisition failed, upstream 5xx, network error, or timeout |

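
The `400` conditions in the table above could be checked along these lines. This is a minimal sketch, not the adapter's actual code: `validateKmeUrl` and `mapUpstreamStatus` are hypothetical names, and the status mapping covers only upstream HTTP status codes (a missing article body, network errors, and timeouts would need separate handling).

```javascript
// Hypothetical helper: validate the raw kmeURL query value.
function validateKmeUrl(raw) {
  if (typeof raw !== 'string' || raw.trim() === '') {
    return { ok: false, status: 400, reason: 'kmeURL is missing or blank' };
  }
  let parsed;
  try {
    parsed = new URL(raw.trim()); // throws TypeError on malformed input
  } catch {
    return { ok: false, status: 400, reason: 'kmeURL is malformed' };
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    return { ok: false, status: 400, reason: 'kmeURL must use http or https' };
  }
  return { ok: true, url: parsed.href };
}

// Hypothetical helper: translate an upstream HTTP status into the
// adapter status listed in the error table.
function mapUpstreamStatus(upstreamStatus) {
  if (upstreamStatus >= 500) return 502;
  if (upstreamStatus >= 400) return 404;
  return 200;
}
```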
### `GET /*` (anything else)

Returns `404 Not Found`.

---

## Running

```bash
@@ -72,8 +137,9 @@ src/
├── proxyScripts/
│   └── kmeContentSourceAdapter.js  # All business logic (zero imports/exports)
├── globalVariables/
│   ├── kme_CSA_settings.json  # OIDC credentials (gitignored)
│   └── adapterHelper.js  # Pure utilities (optional)
│   ├── kme_CSA_settings.json  # Credentials & API config (gitignored)
│   ├── kme_CSA_settings.json.example  # Template for version control
│   └── kmeContentSourceAdapterHelpers.js  # Pure utilities (literal function body)
├── logger.js  # Structured JSON logger
└── server.js  # HTTP server bootstrap only
config/
@@ -91,10 +157,11 @@ All dependencies are injected into each request's sandbox:
| `axios` | HTTP client |
| `jwt` | `jsonwebtoken` |
| `uuidv4` | UUID v4 generator |
| `xmlbuilder2` | `xmlbuilder2` `create` |
| `xmlbuilder2` | `xmlbuilder2` default export (call as `xmlbuilder2.create(...)`) |
| `redis` | Connected Redis client |
| `URLSearchParams`, `URL` | Node.js globals |
| `kme_CSA_settings` | Loaded from `src/globalVariables/kme_CSA_settings.json` |
| `kmeContentSourceAdapterHelpers` | Loaded from `src/globalVariables/kmeContentSourceAdapterHelpers.js` |
| `req`, `res` | Node.js HTTP request/response |

### Key Constraints for `kmeContentSourceAdapter.js`

@@ -102,11 +169,27 @@ All dependencies are injected into each request's sandbox:
- **Zero `import`/`export`** — runs in a VM with no module system
- **No `config`, `global.config`, or `process.env`** — use injected globals only
- Routing metadata is available via `req.params` (set by `server.js`)
- `proxyBaseUrl` is derived dynamically from request headers (`x-forwarded-proto`, `x-forwarded-host`, `host`) — not read from settings

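
One way the header-based `proxyBaseUrl` derivation might look is sketched below. The function names are hypothetical, and several details are assumptions rather than documented behaviour: that forwarded headers take precedence over `host`, that only the first entry of a comma-separated header is used, and that the scheme defaults to `http` when `x-forwarded-proto` is absent.

```javascript
// Hypothetical helper: derive the adapter's public base URL from request headers.
function deriveProxyBaseUrl(headers) {
  // Forwarded headers can carry a comma-separated chain; take the first entry.
  const first = (value) => (value ? String(value).split(',')[0].trim() : '');
  const proto = first(headers['x-forwarded-proto']) || 'http'; // assumed default
  const host = first(headers['x-forwarded-host']) || first(headers.host);
  if (!host) {
    throw new Error('cannot derive proxyBaseUrl: no host header present');
  }
  return `${proto}://${host}`;
}

// Hypothetical helper: build a sitemap <loc> value that points back at the
// adapter's /?kmeURL= endpoint for a given upstream article URL.
function buildLoc(proxyBaseUrl, upstreamUrl) {
  return `${proxyBaseUrl}/?kmeURL=${encodeURIComponent(upstreamUrl)}`;
}
```

Behind a reverse proxy, `deriveProxyBaseUrl(req.headers)` would then reflect the public host rather than the internal bind address.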
## Token Caching

OIDC tokens are cached in Redis under the hash key `authorization` (fields `token` and `expiry`). The cache survives adapter restarts. Token expiry is stored as an absolute Unix epoch timestamp.
OIDC tokens are cached in Redis under the hash key `authorization` (fields `token` and `expiry`). The cache survives adapter restarts. Token expiry is stored as an absolute Unix epoch timestamp. A stampede guard ensures only one token fetch is in flight at a time when multiple concurrent requests encounter a cache miss.

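
A stampede guard of this kind is commonly built by sharing a single in-flight promise. The sketch below shows that general technique only; it is not the adapter's implementation. `createTokenGetter` and `fetchToken` are hypothetical names, the Redis lookup is left out, and the guard lives in process memory, so it would not coordinate across multiple adapter processes.

```javascript
// Hypothetical factory: wrap a token-fetching function so that concurrent
// callers share one pending fetch instead of each starting their own.
function createTokenGetter(fetchToken) {
  let inFlight = null; // the shared promise while a fetch is running

  return function getToken() {
    if (inFlight === null) {
      // The Promise executor runs synchronously, so the guard is set before
      // any other caller can observe a miss. A synchronous throw from
      // fetchToken becomes a rejection.
      inFlight = new Promise((resolve) => resolve(fetchToken())).finally(() => {
        inFlight = null; // allow a new fetch once this one settles
      });
    }
    return inFlight;
  };
}
```

Two calls made before the first fetch settles receive the same promise, so `fetchToken` runs once.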
## Helpers (`kmeContentSourceAdapterHelpers.js`)

A pure-utility module injected into the VM context. Key functions:

| Function | Description |
|---|---|
| `getValidToken(reqUrl, reqMethod)` | Returns a cached or freshly-fetched OIDC `id_token`; throws on failure |
| `extractHydraItems(data)` | Extracts one fragment per `SearchResultItem` — the one with the latest `vkm:datePublished` |
| `buildSitemapXml(items, proxyBaseUrl)` | Builds Sitemaps 0.9 XML from an array of fragments |
| `extractArticleBody(data)` | Returns `vkm:articleBody` (or `articleBody` fallback) from a content API response |
| `validateSettings(settings, fields)` | Returns the first missing required field name, or `null` |

> **Note:** This file is a literal function body — `server.js` wraps it as `(function() { <file> })()`. It must end with a bare `return { ... }` and contain zero `import`/`export` statements.

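
To make the "literal function body" convention concrete, the sketch below evaluates a small stand-in body with Node's `vm` module. It is illustrative only: how `server.js` actually loads the file is not shown in this README, the `helperSource` string is a made-up stand-in for the real file, and the rule that `undefined`, `null`, and the empty string all count as "missing" is an assumption.

```javascript
const vm = require('node:vm');

// Stand-in for the helpers file: a bare function body with no import/export,
// ending in a top-level `return { ... }`.
const helperSource = `
  function validateSettings(settings, fields) {
    for (const field of fields) {
      const value = settings ? settings[field] : undefined;
      // Assumption: undefined, null, and '' all count as missing.
      if (value === undefined || value === null || value === '') return field;
    }
    return null;
  }
  return { validateSettings };
`;

// Wrap the body in an immediately invoked function, as the note describes,
// and evaluate it in a fresh, empty context.
const helpers = vm.runInNewContext(`(function() { ${helperSource} })()`, {});
```

With this stand-in, `helpers.validateSettings({ tokenUrl: 'x' }, ['tokenUrl', 'tenant'])` reports `tenant` as the first missing field.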
## Changelog

See [CHANGELOG.md](CHANGELOG.md).