kme_content_adapter/specs/002-sitemap-generation/quickstart.md
Peter.Morton 50b87297d2 feat(002): add sitemap generation feature
- Refactor kmeContentSourceAdapter.js into getValidToken(), oidcAuthFlow(),
  and sitemapFlow(); add sitemap generation using hydra:member response structure
- Add searchApiBaseUrl, tenant, proxyBaseUrl fields to kme_CSA_settings.json
  and kme_CSA_settings.json.example
- Add 17 unit tests for sitemap flow and non-sitemap routing regression
- Add 5 contract tests for sitemap endpoint (proxy-http.test.js)
- Add [Unreleased] sitemap entry to CHANGELOG.md
- Add full specs/002-sitemap-generation/ artifact directory
  (spec, plan, tasks, data-model, contracts, research, quickstart, checklist)
- Update constitution.md: add redis as permitted global, refresh
  kme_CSA_settings references
- Update copilot-instructions.md SPECKIT marker to sitemap plan
2026-04-22 22:08:08 -05:00

Quickstart: Sitemap XML Generation

Feature: 002-sitemap-generation | Branch: 002-sitemap-generation


What This Feature Does

Adds a GET /sitemap.xml endpoint to the kme-content-adapter proxy. When a crawler or sitemap consumer requests this URL, the adapter:

  1. Obtains a valid OIDC id_token from the Redis cache (refreshing if expired).
  2. Calls the KME Knowledge Search Service to retrieve all knowledge items.
  3. Builds a standards-compliant XML Sitemap (urlset) with one <loc> per item.
  4. Returns the sitemap as application/xml with HTTP 200.
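
Step 1 can be sketched roughly as follows. This is a minimal illustration, not the adapter's actual code: the Redis hash key `kme:oidc`, the field names `id_token` and `expires_at`, and the `refreshToken` callback are all assumptions made for the example. Only `getValidToken` and the use of Redis `hGet` come from this feature's description.

```javascript
// Hypothetical Redis hash key; the real adapter may use a different one.
const TOKEN_KEY = "kme:oidc";

// Returns the cached id_token if it has not expired, otherwise refreshes it.
// `redis` is any client exposing hGet(key, field), as node-redis v4 does.
// `refreshToken` is a hypothetical async callback that runs the OIDC flow
// and resolves to a fresh id_token.
async function getValidToken(redis, refreshToken, now = Date.now()) {
  const cached = await redis.hGet(TOKEN_KEY, "id_token");
  // Assumed field: expiry stored as epoch milliseconds.
  const expiresAt = Number(await redis.hGet(TOKEN_KEY, "expires_at"));

  if (cached && Number.isFinite(expiresAt) && now < expiresAt) {
    return cached; // cache hit, token still valid
  }
  return refreshToken(); // missing or expired: obtain a new token
}
```

Passing the clock in as `now` keeps the expiry check easy to exercise in unit tests without real timers.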

All other requests continue to use the existing OIDC auth flow without modification.
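
The routing decision described above might look something like the sketch below. `sitemapFlow` and `oidcAuthFlow` are the function names introduced by this feature; `isSitemapRequest` and `selectFlow` are hypothetical helpers invented for the illustration, and the real adapter may dispatch differently.

```javascript
// True when the request path is exactly /sitemap.xml (query string ignored).
function isSitemapRequest(requestUrl) {
  // req.url is path-only in Node's http server, so a placeholder base is
  // supplied purely to make it parseable; only the pathname is used.
  const { pathname } = new URL(requestUrl, "http://localhost");
  return pathname === "/sitemap.xml";
}

// Returns the name of the flow that should handle the request.
function selectFlow(req) {
  return req.method === "GET" && isSitemapRequest(req.url)
    ? "sitemapFlow"
    : "oidcAuthFlow";
}

// Usage: every non-sitemap request falls through to the existing flow.
console.log(selectFlow({ method: "GET", url: "/sitemap.xml" }));
console.log(selectFlow({ method: "GET", url: "/?kmeURL=https%3A%2F%2Fkme.example.com%2Fdoc-1" }));
```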


Setup

1. Add the new settings fields

Open src/globalVariables/kme_CSA_settings.json and add the three new fields (searchApiBaseUrl, tenant, and proxyBaseUrl) alongside the existing ones:

{
  "tokenUrl": "https://<your-oidc-host>/token",
  "username": "apiclient",
  "password": "<your-password>",
  "clientId": "<your-client-id>",
  "scope": "openid ...",
  "searchApiBaseUrl": "https://<kme-search-host>/api/search",
  "tenant": "<your-tenant-id>",
  "proxyBaseUrl": "https://<your-adapter-external-url>"
}

| Field | Description | Example |
|---|---|---|
| searchApiBaseUrl | Base URL of the KME Knowledge Search Service | https://kme-qa.example.com/search |
| tenant | Tenant identifier appended to the search URL path | my-org |
| proxyBaseUrl | Externally accessible HTTPS URL of this adapter | https://proxy.example.com |

The adapter will call GET {searchApiBaseUrl}/{tenant} to retrieve knowledge items.
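
As a rough sketch of how the search URL could be assembled from these settings, with validation up front: `buildSearchUrl`, `findMissingField`, and `REQUIRED_SITEMAP_FIELDS` are hypothetical names, and trimming trailing slashes and URL-encoding the tenant are assumptions rather than documented adapter behaviour. The error text follows the configuration error listed under Error Scenarios.

```javascript
// The three settings fields this feature adds.
const REQUIRED_SITEMAP_FIELDS = ["searchApiBaseUrl", "tenant", "proxyBaseUrl"];

// Returns the name of the first missing or blank field, or null if all are set.
function findMissingField(settings) {
  const missing = REQUIRED_SITEMAP_FIELDS.find(
    (field) => typeof settings[field] !== "string" || settings[field].trim() === ""
  );
  return missing ?? null;
}

// Builds {searchApiBaseUrl}/{tenant}, failing fast on incomplete settings.
function buildSearchUrl(settings) {
  const missing = findMissingField(settings);
  if (missing) {
    throw new Error(`Configuration error: missing required field: ${missing}`);
  }
  // Assumption: tolerate a trailing slash on the base URL.
  const base = settings.searchApiBaseUrl.replace(/\/+$/, "");
  return `${base}/${encodeURIComponent(settings.tenant)}`;
}

// Usage with the example values from the table above.
console.log(buildSearchUrl({
  searchApiBaseUrl: "https://kme-qa.example.com/search",
  tenant: "my-org",
  proxyBaseUrl: "https://proxy.example.com",
}));
```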

2. Start the adapter

npm run dev    # development (auto-restart on changes)
npm start      # production

Redis must be running and accessible (default: redis://localhost:6379).


Usage

Request the sitemap

curl -v http://localhost:3000/sitemap.xml

Expected response:

HTTP/1.1 200 OK
Content-Type: application/xml

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://proxy.example.com?kmeURL=https%3A%2F%2Fkme.example.com%2Fdoc-1</loc>
  </url>
  ...
</urlset>
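
To make the shape of this response concrete, here is a minimal sketch of turning search results into the XML above. It deliberately uses plain string building so that it runs without dependencies; the adapter itself uses xmlbuilder2. `buildSitemapXml` and `escapeXml` are hypothetical names, and the item shape (an array of objects carrying a `vkm:url` property) is inferred from the examples in this document.

```javascript
// Escapes the five XML special characters so a URL is safe inside <loc>.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Builds a sitemap with one <loc> per item that has a non-empty vkm:url.
function buildSitemapXml(items, proxyBaseUrl) {
  const header = '<?xml version="1.0" encoding="UTF-8"?>';
  const ns = "http://www.sitemaps.org/schemas/sitemap/0.9";

  const locs = items
    .map((item) => item["vkm:url"])
    .filter((url) => typeof url === "string" && url.trim() !== "") // skip empty URLs
    .map((url) => `${proxyBaseUrl}?kmeURL=${encodeURIComponent(url)}`);

  if (locs.length === 0) {
    return `${header}\n<urlset xmlns="${ns}"/>`; // zero results: empty urlset
  }
  const body = locs
    .map((loc) => `  <url>\n    <loc>${escapeXml(loc)}</loc>\n  </url>`)
    .join("\n");
  return `${header}\n<urlset xmlns="${ns}">\n${body}\n</urlset>`;
}

// Usage: the second item has an empty vkm:url and is omitted.
console.log(buildSitemapXml(
  [{ "vkm:url": "https://kme.example.com/doc-1" }, { "vkm:url": "" }],
  "https://proxy.example.com"
));
```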

Validate the sitemap against the Sitemaps XSD

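# Using xmllint (libxml2). Many xmllint builds cannot fetch https:// URLs,
# so download the schema first and validate against the local copy.
curl -sO https://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd
curl -s http://localhost:3000/sitemap.xml | \
  xmllint --schema sitemap.xsd --noout -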

Running the Tests

npm run test:unit      # unit tests (VM context mocking, no network)
npm run test:contract  # contract tests (real HTTP, mock token/search servers)
npm test               # all tests

Unit tests live in tests/unit/proxy.test.js. Contract tests live in tests/contract/proxy-http.test.js.


Error Scenarios

| Scenario | How to reproduce | Expected response |
|---|---|---|
| Missing searchApiBaseUrl | Remove field from kme_CSA_settings.json, restart | 500 Configuration error: missing required field: searchApiBaseUrl |
| Search service down | Point searchApiBaseUrl to an unreachable host | 502 Search service error: HTTP <status> or 504 Search service timeout |
| Zero results | Search service returns empty items array | 200 OK with empty <urlset/> |
| Items with empty vkm:url | (covered by unit tests) | Items silently omitted from sitemap |
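
One way the search-service failures above could be mapped to responses is sketched below. `classifySearchError` is a hypothetical helper; it relies on axios's documented error shape (`err.response.status` for HTTP errors, `err.code` of `ECONNABORTED` or `ETIMEDOUT` for timeouts). The fallback branch for other network failures is an assumption, since the table does not say what is returned when no HTTP status is available.

```javascript
// Maps an axios-style error from the search call to a proxy response.
function classifySearchError(err) {
  // Upstream answered with a non-2xx status.
  if (err && err.response && typeof err.response.status === "number") {
    return { status: 502, body: `Search service error: HTTP ${err.response.status}` };
  }
  // Request timed out before any response arrived.
  if (err && (err.code === "ECONNABORTED" || err.code === "ETIMEDOUT")) {
    return { status: 504, body: "Search service timeout" };
  }
  // Assumption: any other network failure is reported as a generic 502.
  return { status: 502, body: "Search service error" };
}

// Usage
console.log(classifySearchError({ response: { status: 503 } }));
console.log(classifySearchError({ code: "ECONNABORTED" }));
```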

Architecture Notes

  • No new files: All new logic is added directly to src/proxyScripts/kmeContentSourceAdapter.js (monolithic architecture constraint).
  • No new dependencies: xmlbuilder2 is already in package.json and injected into the VM context as xmlBuilder.
  • Token reuse: The sitemap flow reuses the existing Redis hGet/token-refresh pattern — no separate auth logic.
  • VM isolation: The proxy script runs in a vm.createContext sandbox. It has access only to the injected globals listed in src/server.js (axios, redis, xmlBuilder, kme_CSA_settings, req, res, console, URLSearchParams, URL, crypto).
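
The VM isolation note can be illustrated with Node's built-in vm module. This is a simplified sketch, not the contents of src/server.js: `runInSandbox` is a hypothetical helper, the inline script stands in for the real proxy script, and the one-second timeout is an arbitrary choice for the example.

```javascript
const vm = require("node:vm");

// Runs script source in a fresh context that sees only the given globals.
function runInSandbox(scriptSource, globals) {
  const context = vm.createContext({ ...globals });
  vm.runInContext(scriptSource, context, { timeout: 1000 });
  return context;
}

// Stand-in response object; the real adapter injects Node's req and res.
const res = { statusCode: null, body: null };

// The script can use injected globals, but `require` was not injected,
// so it is undefined inside the sandbox.
runInSandbox(
  `res.statusCode = 200;
   res.body = typeof require + ":" + kme_CSA_settings.tenant;`,
  { res, kme_CSA_settings: { tenant: "my-org" }, console }
);

console.log(res.statusCode, res.body);
```

The same mechanism is what lets unit tests substitute mocks for axios and redis: whatever object is placed in the context under that name is what the script sees.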