feat: content fetch, sitemap fixes, remove oidcAuthFlow

- Add contentFetchFlow() to proxy (FR-001 through FR-012)
- Add extractArticleBody() helper with vkm:articleBody / articleBody fallback
- Dynamic proxyBaseUrl derivation from x-forwarded-proto/host headers
- Forward query/size/category params on /sitemap.xml requests
- Add Accept: application/ld+json header to content API calls
- Remove oidcAuthFlow() - unmatched requests now return 404 Not Found
- Fix xmlbuilder2 import: default import, call as xmlbuilder2.create(...)
- Version bump 0.2.0 → 0.3.0
- 45/45 tests passing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-23 16:40:06 -05:00
parent d50f041488
commit f840587e5e
29 changed files with 1998 additions and 352 deletions

@@ -0,0 +1,122 @@
# Quickstart: KME Article Content Fetch (003)
**Feature branch**: `003-kme-content-fetch`
This guide explains how to develop and test the content-fetch feature locally.
---
## Prerequisites
- Node.js ≥18
- A running Redis instance (default: `localhost:6379`)
- `kme_CSA_settings.json` populated (see `src/globalVariables/kme_CSA_settings.json.example`)
- `npm install` already run
---
## Running the Proxy
```bash
npm run dev # start with --watch (auto-restart on changes)
npm start # start with jq log formatting
```
---
## Testing the Content-Fetch Route
### Happy path (requires a real or stubbed KME Content Service)
```bash
curl -s "http://localhost:3000/?kmeURL=https://content.kme.example/articles/123"
# Expected: 200 OK, Content-Type: text/html, body = <p>Article HTML...</p>
```
### Bad input: missing, empty, or invalid kmeURL
```bash
curl -s -o /dev/null -w "%{http_code}" "http://localhost:3000/"
# Expected: 404 (no kmeURL and no other matching route; oidcAuthFlow was removed)
curl -s -o /dev/null -w "%{http_code}" "http://localhost:3000/?kmeURL="
# Expected: 400
curl -s -o /dev/null -w "%{http_code}" "http://localhost:3000/?kmeURL=not-a-url"
# Expected: 400
curl -s -o /dev/null -w "%{http_code}" "http://localhost:3000/?kmeURL=ftp://example.com/article"
# Expected: 400
```
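The 400 responses above come from a small validation step on `kmeURL`. A minimal sketch of that check follows, assuming a hypothetical helper named `validateKmeUrl` with an illustrative return shape; the real logic lives inside `contentFetchFlow()` and may be organised differently.

```javascript
// Hypothetical helper: the name and the { ok, status, url } shape are
// assumptions for illustration, not the proxy's actual API.
function validateKmeUrl(reqUrl) {
  // req.url is path-only (e.g. "/?kmeURL=..."), so a base is needed to parse it.
  const { searchParams } = new URL(reqUrl, 'http://localhost');
  const raw = searchParams.get('kmeURL');

  // Empty value → 400. A request with no kmeURL at all is normally routed
  // elsewhere before this step, so null is only handled defensively here.
  if (!raw) return { ok: false, status: 400 };

  let parsed;
  try {
    parsed = new URL(raw); // throws TypeError on malformed input
  } catch {
    return { ok: false, status: 400 };
  }

  // Only http(s) targets are accepted; ftp:, file:, etc. are rejected.
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    return { ok: false, status: 400 };
  }
  return { ok: true, url: parsed.href };
}
```

Parsing with the WHATWG `URL` constructor keeps the check to a single try/catch instead of a hand-written regular expression.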
### Existing sitemap route (unchanged)
```bash
curl -s -o /dev/null -w "%{http_code}" "http://localhost:3000/sitemap.xml"
# Expected: 200 (sitemap XML) or 502 if KME Search API is unreachable
```
---
## Running Tests
```bash
npm run test:unit # unit tests (mocked axios and Redis)
npm run test:contract # contract tests (real HTTP servers, fake Redis)
npm test # all tests
```
### Running a single test file
```bash
node --test tests/unit/proxy.test.js
node --test tests/contract/proxy-http.test.js
```
---
## Key Files Modified by This Feature
| File | Change |
|------|--------|
| `src/proxyScripts/kmeContentSourceAdapter.js` | Add `contentFetchFlow()` function; add routing branch `else if (searchParams.has('kmeURL'))` |
| `src/globalVariables/kmeContentSourceAdapterHelpers.js` | Add `extractArticleBody(data)` function; export it in `return { ... }` |
| `tests/unit/proxy.test.js` | Add `describe` blocks for content-fetch unit tests and `extractArticleBody` helper tests |
| `tests/contract/proxy-http.test.js` | Add contract tests for content-fetch (real mock HTTP servers) |
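
For orientation, here is a rough sketch of what `extractArticleBody(data)` might look like, based only on the described `vkm:articleBody` / `articleBody` fallback. Treating an empty string as "no body" is an assumption, and the shipped helper may differ.

```javascript
// Sketch only: the real helper is defined in kmeContentSourceAdapterHelpers.js.
function extractArticleBody(data) {
  // Anything that is not a plain JSON object cannot carry an article body.
  if (data === null || typeof data !== 'object') return null;

  // Prefer the namespaced JSON-LD key, then fall back to the plain key.
  const body = data['vkm:articleBody'] ?? data.articleBody;

  // Assumption: only a non-empty string counts as a usable body.
  return typeof body === 'string' && body.length > 0 ? body : null;
}
```

Returning `null` rather than throwing lets the caller map "no body" to a 404 without a try/catch.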
---
## Architecture Reminder
The proxy runs inside a Node.js `vm.Script` / `vm.createContext` sandbox — **zero imports or
exports** are permitted in `kmeContentSourceAdapter.js`. All dependencies arrive via the injected
context:
| Variable | What it is |
|----------|-----------|
| `axios` | HTTP client — `axios.get(url, { headers, timeout })` |
| `kmeContentSourceAdapterHelpers` | Helpers object — `getValidToken()`, `extractArticleBody()`, `validateSettings()` |
| `kme_CSA_settings` | OIDC + service settings from `src/globalVariables/kme_CSA_settings.json` |
| `URL`, `URLSearchParams` | WHATWG URL API — for parsing `req.url` and validating `kmeURL` |
| `console` | Structured logger — `console.info/debug/error({ message, ... })` |
| `req`, `res` | Node.js HTTP request/response |
The helpers file (`kmeContentSourceAdapterHelpers.js`) is a **literal function body**: it ends
with `return { ... }` and contains no `import`/`export` statements. `server.js` wraps it in an IIFE to obtain the helpers object.
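
The sandbox arrangement can be illustrated with a small standalone script. This is a sketch under assumptions: the inline source strings stand in for the real files, and how `server.js` actually loads and wraps them is not shown here.

```javascript
const vm = require('node:vm');

// Stand-in for the helpers file: a bare function body that ends with
// `return { ... }` and has no import/export statements.
const helpersSource = `
  function extractArticleBody(data) {
    return (data && data['vkm:articleBody']) || null;
  }
  return { extractArticleBody };
`;

// Wrapping the body in an IIFE makes the trailing `return` legal and turns
// its value into the helpers object.
const helpers = vm.runInNewContext(`(function () {${helpersSource}})()`);

// Stand-in for the proxy script: it relies on injected globals only.
const proxyScript = new vm.Script(`
  result = kmeContentSourceAdapterHelpers.extractArticleBody(payload);
`);

// Everything the script may touch must be placed on the context object.
const context = vm.createContext({
  kmeContentSourceAdapterHelpers: helpers,
  payload: { 'vkm:articleBody': '<p>Hello</p>' },
  result: null,
});
proxyScript.runInContext(context);

console.log(context.result); // <p>Hello</p>
```

Because the script sees only what is on the context object, a stray `require` or `import` in the proxy script fails instead of silently pulling in a module.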
---
## Content-Fetch Flow Summary
```
Request: GET /?kmeURL=https://content.kme.example/articles/123
1. Routing: req.url has ?kmeURL= → contentFetchFlow()
2. Extract kmeURL: new URL(req.url, 'http://localhost').searchParams.get('kmeURL')
3. Validate kmeURL: empty → 400; malformed / non-http(s) → 400
4. getValidToken() → OIDC id_token (from Redis cache or fresh fetch)
5. axios.get(kmeURL, { headers: { Authorization: 'OIDC_id_token {token}' }, timeout: 10000 })
6. Error handling: 4xx upstream → 404; 5xx/timeout/network → 502
7. extractArticleBody(response.data) → vkm:articleBody (or articleBody fallback) string, or null
8. null → 404; string → 200 text/html
```
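Steps 4 through 8 can be sketched as a single function with its dependencies passed in. The names `upstreamErrorToStatus` and `contentFetchSketch`, the returned `{ status, contentType, body }` shape, and the reading of `{token}` as a plain substitution are all assumptions for illustration; the real `contentFetchFlow()` writes to `res` directly and may differ.

```javascript
// Hypothetical mapping helper (step 6). axios attaches `response` only when
// the upstream actually answered; timeouts and network errors have none.
function upstreamErrorToStatus(err) {
  const status = err && err.response ? err.response.status : undefined;
  if (status >= 400 && status < 500) return 404; // upstream 4xx → 404
  return 502; // upstream 5xx, timeout, or network failure → 502
}

// Hypothetical flow sketch (steps 4-8). `axios` and `helpers` are passed in
// so the function can be exercised with stubs.
async function contentFetchSketch(kmeURL, { axios, helpers }) {
  const token = await helpers.getValidToken(); // step 4

  let response;
  try {
    // Step 5: headers belong under `headers` in the axios config.
    response = await axios.get(kmeURL, {
      headers: { Authorization: `OIDC_id_token ${token}` },
      timeout: 10000,
    });
  } catch (err) {
    return { status: upstreamErrorToStatus(err) };
  }

  // Steps 7-8: no extractable body → 404; otherwise 200 text/html.
  const body = helpers.extractArticleBody(response.data);
  if (body === null) return { status: 404 };
  return { status: 200, contentType: 'text/html', body };
}
```

Passing `axios` and `helpers` as arguments mirrors how the sandbox injects them, and it lets unit tests substitute stubs without any module mocking.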