# Quickstart: Sitemap XML Generation

**Feature**: `002-sitemap-generation`

**Branch**: `002-sitemap-generation`

---

## What This Feature Does

Adds a `GET /sitemap.xml` endpoint to the `kme-content-adapter` proxy. When a crawler or sitemap consumer requests this URL, the adapter:

1. Obtains a valid OIDC `id_token` from the Redis cache (refreshing if expired).
2. Calls the KME Knowledge Search Service to retrieve all knowledge items.
3. Builds a standards-compliant XML Sitemap (`urlset`) with one `<url>` per item.
4. Returns the sitemap as `application/xml` with HTTP 200.

All other requests continue to use the existing OIDC auth flow without modification.

---

## Setup

### 1. Add the new settings fields

Open `src/globalVariables/kme_CSA_settings.json` and add the three new fields (`searchApiBaseUrl`, `tenant`, `proxyBaseUrl`):

```json
{
  "tokenUrl": "https:///token",
  "username": "apiclient",
  "password": "",
  "clientId": "",
  "scope": "openid ...",
  "searchApiBaseUrl": "https:///api/search",
  "tenant": "",
  "proxyBaseUrl": "https://"
}
```

| Field | Description | Example |
|---|---|---|
| `searchApiBaseUrl` | Base URL of the KME Knowledge Search Service | `https://kme-qa.example.com/search` |
| `tenant` | Tenant identifier appended to the search URL path | `my-org` |
| `proxyBaseUrl` | Externally accessible HTTPS URL of this adapter | `https://proxy.example.com` |

The adapter will call `GET {searchApiBaseUrl}/{tenant}` to retrieve knowledge items.

### 2. Start the adapter

```bash
npm run dev   # development (auto-restart on changes)
npm start     # production
```

Redis must be running and accessible (default: `redis://localhost:6379`).

---

## Usage

### Request the sitemap

```bash
curl -v http://localhost:3000/sitemap.xml
```

**Expected response**:

```
HTTP/1.1 200 OK
Content-Type: application/xml

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://proxy.example.com?kmeURL=https%3A%2F%2Fkme.example.com%2Fdoc-1</loc>
  </url>
  ...
</urlset>
```

### Validate the sitemap against the Sitemaps XSD

```bash
# Using xmllint (libxml2)
curl -s http://localhost:3000/sitemap.xml | \
  xmllint --schema https://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd --noout -
```

---

## Running the Tests

```bash
npm run test:unit       # unit tests (VM context mocking, no network)
npm run test:contract   # contract tests (real HTTP, mock token/search servers)
npm test                # all tests
```

Unit tests live in `tests/unit/proxy.test.js`. Contract tests live in `tests/contract/proxy-http.test.js`.

---

## Error Scenarios

| Scenario | How to reproduce | Expected response |
|---|---|---|
| Missing `searchApiBaseUrl` | Remove field from `kme_CSA_settings.json`, restart | `500 Configuration error: missing required field: searchApiBaseUrl` |
| Search service down | Point `searchApiBaseUrl` to an unreachable host | `502 Search service error: HTTP ` or `504 Search service timeout` |
| Zero results | Search service returns empty items array | `200 OK` with empty `<urlset>` |
| Items with empty `vkm:url` | (covered by unit tests) | Items silently omitted from sitemap |

---

## Architecture Notes

- **No new files**: All new logic is added directly to `src/proxyScripts/kmeContentSourceAdapter.js` (monolithic architecture constraint).
- **No new dependencies**: `xmlbuilder2` is already in `package.json` and injected into the VM context as `xmlbuilder2`.
- **Token reuse**: The sitemap flow reuses the existing Redis `hGet`/token-refresh pattern, with no separate auth logic.
- **VM isolation**: The proxy script runs in a `vm.createContext` sandbox. It has access only to the injected globals listed in `src/server.js` (`axios`, `redis`, `xmlbuilder2`, `kme_CSA_settings`, `req`, `res`, `console`, `URLSearchParams`, `URL`, `crypto`).
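
The VM isolation note above can be illustrated with a minimal, dependency-free sketch. It is not the adapter's actual code: the `sandbox` object is a trimmed stand-in for the real injected globals, and `toProxyLoc` is a hypothetical helper. The use of `encodeURIComponent` is an assumption inferred from the `kmeURL=` value shown in the expected response, not something confirmed by the source.

```javascript
const vm = require('node:vm');

// Hypothetical stand-ins for the globals that src/server.js injects.
// Only two are shown here; the real list is longer.
const sandbox = {
  kme_CSA_settings: { proxyBaseUrl: 'https://proxy.example.com' },
  console,
  result: null,
};
vm.createContext(sandbox);

// Illustrative script body, not the adapter's actual code.
const scriptSource = `
  // Hypothetical helper: wrap a KME document URL in a proxy URL.
  // Assumes the kmeURL parameter is percent-encoded.
  function toProxyLoc(kmeUrl) {
    return kme_CSA_settings.proxyBaseUrl + '?kmeURL=' + encodeURIComponent(kmeUrl);
  }

  result = {
    loc: toProxyLoc('https://kme.example.com/doc-1'),
    // Node-specific globals were not injected, so they are undefined here.
    hasRequire: typeof require !== 'undefined',
    hasProcess: typeof process !== 'undefined',
  };
`;

vm.runInContext(scriptSource, sandbox);
console.log(sandbox.result);
```

Running the sketch shows that the script can read `kme_CSA_settings` but sees neither `require` nor `process`. This is why any library the proxy script needs, such as `xmlbuilder2`, has to be injected rather than imported.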