Quickstart: Sitemap XML Generation
Feature: 002-sitemap-generation
Branch: 002-sitemap-generation
What This Feature Does
Adds a `GET /sitemap.xml` endpoint to the kme-content-adapter proxy. When a crawler or
sitemap consumer requests this URL, the adapter:

- Obtains a valid OIDC `id_token` from the Redis cache (refreshing it if expired).
- Calls the KME Knowledge Search Service to retrieve all knowledge items.
- Builds a standards-compliant XML Sitemap (`urlset`) with one `<loc>` per item.
- Returns the sitemap as `application/xml` with HTTP 200.
All other requests continue to use the existing OIDC auth flow without modification.
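A minimal sketch of that dispatch, assuming the adapter branches on the request method and path. The flow names `sitemapFlow` and `oidcAuthFlow` echo the functions introduced by this feature's adapter refactor, but the `routeRequest` helper and passing the flows in as a `flows` object are hypothetical, chosen so the snippet runs on its own.

```javascript
// Hypothetical dispatcher: GET /sitemap.xml goes to the sitemap flow,
// every other request keeps the existing OIDC flow unchanged.
function routeRequest(req, flows) {
  // req.url is path + query (e.g. "/sitemap.xml?x=1"); the placeholder base
  // only exists so the WHATWG URL parser accepts a relative path.
  const { pathname } = new URL(req.url, "http://placeholder.invalid");
  if (req.method === "GET" && pathname === "/sitemap.xml") {
    return flows.sitemapFlow(req);
  }
  return flows.oidcAuthFlow(req);
}
```

Matching on the parsed `pathname` rather than the raw `req.url` keeps `/sitemap.xml?foo=1` on the sitemap path while leaving every other path untouched.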
Setup
1. Add the new settings fields
Open `src/globalVariables/kme_CSA_settings.json` and add the three new fields:

```json
{
  "tokenUrl": "https://<your-oidc-host>/token",
  "username": "apiclient",
  "password": "<your-password>",
  "clientId": "<your-client-id>",
  "scope": "openid ...",
  "searchApiBaseUrl": "https://<kme-search-host>/api/search",
  "tenant": "<your-tenant-id>",
  "proxyBaseUrl": "https://<your-adapter-external-url>"
}
```
| Field | Description | Example |
|---|---|---|
| `searchApiBaseUrl` | Base URL of the KME Knowledge Search Service | `https://kme-qa.example.com/search` |
| `tenant` | Tenant identifier appended to the search URL path | `my-org` |
| `proxyBaseUrl` | Externally accessible HTTPS URL of this adapter; used as the prefix of every `<loc>` in the sitemap | `https://proxy.example.com` |
The adapter calls `GET {searchApiBaseUrl}/{tenant}` to retrieve knowledge items.
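The URL construction and the missing-field check can be sketched as follows. This is an illustration rather than the adapter's code: `buildSearchUrl` and `REQUIRED_SITEMAP_FIELDS` are hypothetical names, treating `tenant` and `proxyBaseUrl` as required alongside `searchApiBaseUrl` is an assumption, and so are trimming a trailing slash and percent-encoding the tenant segment.

```javascript
// Assumption: all three sitemap fields are mandatory for the sitemap flow.
const REQUIRED_SITEMAP_FIELDS = ["searchApiBaseUrl", "tenant", "proxyBaseUrl"];

// Hypothetical helper: validate the parsed kme_CSA_settings object and
// return the search URL in the form {searchApiBaseUrl}/{tenant}.
function buildSearchUrl(settings) {
  for (const field of REQUIRED_SITEMAP_FIELDS) {
    if (typeof settings[field] !== "string" || settings[field].trim() === "") {
      // Same message shape as the 500 response listed under Error Scenarios.
      throw new Error(`Configuration error: missing required field: ${field}`);
    }
  }
  // Drop trailing slashes so ".../search/" and ".../search" yield the same URL,
  // and encode the tenant because it becomes a single path segment.
  const base = settings.searchApiBaseUrl.replace(/\/+$/, "");
  return `${base}/${encodeURIComponent(settings.tenant)}`;
}
```

With the example values from the table, this yields `https://kme-qa.example.com/search/my-org`.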
2. Start the adapter
```shell
npm run dev   # development (auto-restart on changes)
npm start     # production
```

Redis must be running and accessible (default: `redis://localhost:6379`).
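Redis also holds the cached OIDC token that the sitemap flow reuses. A minimal sketch of that cache-then-refresh check follows, assuming a node-redis v4-style client whose `hGet(key, field)` returns a promise. The hash key `kme:oidc`, the `expires_at` field, the 30-second skew, and the `refreshToken` callback are hypothetical; the adapter's real cache layout is not described in this quickstart, and this `getValidToken` signature is only illustrative.

```javascript
// Hypothetical cache layout: hash "kme:oidc" with fields "id_token" and
// "expires_at" (epoch milliseconds). The real adapter's keys may differ.
const TOKEN_KEY = "kme:oidc";
const EXPIRY_SKEW_MS = 30_000; // treat tokens expiring within 30 s as expired

async function getValidToken(redis, refreshToken, now = Date.now()) {
  const [token, expiresAt] = await Promise.all([
    redis.hGet(TOKEN_KEY, "id_token"),
    redis.hGet(TOKEN_KEY, "expires_at"),
  ]);
  // Reuse the cached token only if it exists and is not about to expire.
  // A missing expires_at becomes NaN or 0, so the comparison fails and we refresh.
  if (token && Number(expiresAt) - EXPIRY_SKEW_MS > now) {
    return token;
  }
  // refreshToken() is assumed to call the OIDC token endpoint, write the new
  // values back to Redis, and resolve with the new id_token.
  return refreshToken();
}
```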
Usage
Request the sitemap
```shell
curl -v http://localhost:3000/sitemap.xml
```
Expected response:
```http
HTTP/1.1 200 OK
Content-Type: application/xml

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://proxy.example.com?kmeURL=https%3A%2F%2Fkme.example.com%2Fdoc-1</loc>
  </url>
  ...
</urlset>
```
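A response like the one above can be produced by a transformation along these lines. This sketch assumes items arrive in a `hydra:member` array (the response structure this feature targets) and that each item's document URL is in `vkm:url`; the `?kmeURL=` encoding follows the `<loc>` in the example. The real adapter builds the XML with the injected `xmlBuilder` (xmlbuilder2); plain string concatenation with manual escaping is used here only so the snippet has no dependencies, and `buildSitemapXml` is a hypothetical name.

```javascript
const SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9";

// Escape the five XML special characters for use in element text.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper: search response -> sitemap XML string.
function buildSitemapXml(searchResponse, proxyBaseUrl) {
  const items = Array.isArray(searchResponse?.["hydra:member"])
    ? searchResponse["hydra:member"]
    : [];
  const locs = items
    .map((item) => item?.["vkm:url"])
    // Items without a usable vkm:url are skipped rather than emitted as empty <loc>.
    .filter((kmeUrl) => typeof kmeUrl === "string" && kmeUrl.trim() !== "")
    // The KME URL travels as a percent-encoded query parameter on the proxy URL.
    .map((kmeUrl) => `${proxyBaseUrl}?kmeURL=${encodeURIComponent(kmeUrl)}`);

  const header = '<?xml version="1.0" encoding="UTF-8"?>';
  if (locs.length === 0) {
    // Zero results: still a 200 response, with an empty urlset.
    return `${header}\n<urlset xmlns="${SITEMAP_NS}"/>`;
  }
  const body = locs
    .map((loc) => `  <url>\n    <loc>${escapeXml(loc)}</loc>\n  </url>`)
    .join("\n");
  return `${header}\n<urlset xmlns="${SITEMAP_NS}">\n${body}\n</urlset>`;
}
```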
Validate the sitemap against the Sitemaps XSD
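```shell
# Using xmllint (libxml2). libxml2's built-in network client does not
# support HTTPS, so download the schema first and validate against the local copy.
curl -s -o sitemap.xsd https://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd
curl -s http://localhost:3000/sitemap.xml | \
  xmllint --schema sitemap.xsd --noout -
```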
Running the Tests
```shell
npm run test:unit      # unit tests (VM context mocking, no network)
npm run test:contract  # contract tests (real HTTP, mock token/search servers)
npm test               # all tests
```

Unit tests live in `tests/unit/proxy.test.js`; contract tests live in `tests/contract/proxy-http.test.js`.
Error Scenarios
| Scenario | How to reproduce | Expected response |
|---|---|---|
| Missing `searchApiBaseUrl` | Remove the field from `kme_CSA_settings.json` and restart | `500` `Configuration error: missing required field: searchApiBaseUrl` |
| Search service down | Point `searchApiBaseUrl` to an unreachable host | `502` `Search service error: HTTP <status>` or `504` `Search service timeout` |
| Zero results | Search service returns an empty items array | `200 OK` with an empty `<urlset/>` |
| Items with empty `vkm:url` | (covered by unit tests) | Items silently omitted from the sitemap |
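One way to map failures onto the responses in the table above is sketched below. It assumes configuration errors are `Error`s whose message starts with `Configuration error:`, and that search failures are axios-style errors, where a non-2xx reply populates `err.response.status` and a timeout sets `err.code` to `ECONNABORTED` (or `ETIMEDOUT` when axios's transitional `clarifyTimeoutError` option is enabled). `toErrorResponse` and the fallback message for connection failures with no HTTP status are hypothetical.

```javascript
// Hypothetical mapper from a thrown error to { status, body }.
function toErrorResponse(err) {
  // Configuration problems are detected before any network call.
  if (typeof err?.message === "string" && err.message.startsWith("Configuration error:")) {
    return { status: 500, body: err.message };
  }
  // axios reports request timeouts via err.code.
  if (err?.code === "ECONNABORTED" || err?.code === "ETIMEDOUT") {
    return { status: 504, body: "Search service timeout" };
  }
  // The search service answered, but with a non-2xx status.
  if (err?.response?.status) {
    return { status: 502, body: `Search service error: HTTP ${err.response.status}` };
  }
  // No response at all (e.g. ECONNREFUSED, ENOTFOUND). This message is a
  // hypothetical placeholder; the quickstart does not specify this case's text.
  return { status: 502, body: "Search service error: no response" };
}
```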
Architecture Notes
- No new files: All new logic is added directly to `src/proxyScripts/kmeContentSourceAdapter.js` (monolithic architecture constraint).
- No new dependencies: `xmlbuilder2` is already in `package.json` and is injected into the VM context as `xmlBuilder`.
- Token reuse: The sitemap flow reuses the existing Redis `hGet`/token-refresh pattern; there is no separate auth logic.
- VM isolation: The proxy script runs in a `vm.createContext` sandbox. It can access only the injected globals listed in `src/server.js` (`axios`, `redis`, `xmlBuilder`, `kme_CSA_settings`, `req`, `res`, `console`, `URLSearchParams`, `URL`, `crypto`).
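The sandboxing described above relies on Node's built-in `vm` module. The sketch below shows the core idea: only the properties of the context object become globals for the script, so Node-specific names such as `require` and `process` are absent unless explicitly injected. `runInAdapterSandbox` and the stub values in the usage are hypothetical; the real injection list lives in `src/server.js`.

```javascript
const vm = require("node:vm");

// Hypothetical helper: run proxy-script source with an explicit allowlist of globals.
function runInAdapterSandbox(code, injected) {
  // Only the properties copied into this object become globals in the script.
  // Standard JavaScript built-ins (Object, JSON, Promise, ...) are still present,
  // but Node-specific names such as require and process are not.
  const context = vm.createContext({ ...injected });
  // timeout caps synchronous execution time, in milliseconds.
  return vm.runInContext(code, context, { timeout: 1000 });
}
```

The same property is what makes the VM-context unit tests possible: whatever object is injected as `redis`, `axios`, or `res` is exactly what the script sees, so mocks can stand in for the network.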