feat(002): add sitemap generation feature

- Refactor kmeContentSourceAdapter.js into getValidToken(), oidcAuthFlow(), and sitemapFlow(); add sitemap generation using hydra:member response structure
- Add searchApiBaseUrl, tenant, proxyBaseUrl fields to kme_CSA_settings.json and kme_CSA_settings.json.example
- Add 17 unit tests for sitemap flow and non-sitemap routing regression
- Add 5 contract tests for sitemap endpoint (proxy-http.test.js)
- Add [Unreleased] sitemap entry to CHANGELOG.md
- Add full specs/002-sitemap-generation/ artifact directory (spec, plan, tasks, data-model, contracts, research, quickstart, checklist)
- Update constitution.md: add redis as permitted global, refresh kme_CSA_settings references
- Update copilot-instructions.md SPECKIT marker to sitemap plan

126
specs/002-sitemap-generation/quickstart.md
Normal file
@@ -0,0 +1,126 @@
# Quickstart: Sitemap XML Generation

**Feature**: `002-sitemap-generation`
**Branch**: `002-sitemap-generation`

---

## What This Feature Does

Adds a `GET /sitemap.xml` endpoint to the `kme-content-adapter` proxy. When a crawler or
sitemap consumer requests this URL, the adapter:

1. Obtains a valid OIDC `id_token` from the Redis cache (refreshing if expired).
2. Calls the KME Knowledge Search Service to retrieve all knowledge items.
3. Builds a standards-compliant XML Sitemap (`urlset`) with one `<loc>` per item.
4. Returns the sitemap as `application/xml` with HTTP 200.

All other requests continue to use the existing OIDC auth flow without modification.

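The routing split described above can be sketched as a small dispatch function. This is an illustrative sketch, not the adapter's actual code: `selectFlow` is a hypothetical helper, and it returns the name of the flow (`sitemapFlow` or `oidcAuthFlow`, both named in the commit message) rather than calling it, because the real signatures are not shown here.

```javascript
// Hypothetical helper: decides which flow should handle a request.
// The flow names come from the commit message; everything else is assumed.
function selectFlow(method, url) {
  // Parse against a dummy base so that only the path is compared
  // and any query string is ignored.
  const path = new URL(url, 'http://localhost').pathname;
  if (method === 'GET' && path === '/sitemap.xml') {
    return 'sitemapFlow';
  }
  // Every other request keeps using the existing OIDC auth flow.
  return 'oidcAuthFlow';
}

console.log(selectFlow('GET', '/sitemap.xml'));  // sitemapFlow
console.log(selectFlow('GET', '/?kmeURL=https%3A%2F%2Fkme.example.com%2Fdoc-1'));  // oidcAuthFlow
```

Matching on the parsed path rather than the raw URL string means a request such as `/sitemap.xml?foo=bar` would still reach the sitemap flow; whether the adapter behaves this way is an assumption.
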
---

## Setup

### 1. Add the new settings fields

Open `src/globalVariables/kme_CSA_settings.json` and add the three new fields:

```json
{
  "tokenUrl": "https://<your-oidc-host>/token",
  "username": "apiclient",
  "password": "<your-password>",
  "clientId": "<your-client-id>",
  "scope": "openid ...",
  "searchApiBaseUrl": "https://<kme-search-host>/api/search",
  "tenant": "<your-tenant-id>",
  "proxyBaseUrl": "https://<your-adapter-external-url>"
}
```

| Field | Description | Example |
|---|---|---|
| `searchApiBaseUrl` | Base URL of the KME Knowledge Search Service | `https://kme-qa.example.com/search` |
| `tenant` | Tenant identifier appended to the search URL path | `my-org` |
| `proxyBaseUrl` | Externally accessible HTTPS URL of this adapter | `https://proxy.example.com` |

The adapter will call `GET {searchApiBaseUrl}/{tenant}` to retrieve knowledge items.

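As a rough illustration of how these settings combine into the search URL, here is a minimal sketch. `buildSearchUrl` and `REQUIRED_SITEMAP_FIELDS` are hypothetical names. The error message format is taken from the Error Scenarios table in this file, but applying the same check to `tenant` and `proxyBaseUrl` is an assumption, as are the trailing-slash trimming and the tenant encoding.

```javascript
// Hypothetical list of fields the sitemap flow needs (assumed, not confirmed).
const REQUIRED_SITEMAP_FIELDS = ['searchApiBaseUrl', 'tenant', 'proxyBaseUrl'];

// Validates the settings object and returns {searchApiBaseUrl}/{tenant}.
function buildSearchUrl(settings) {
  for (const field of REQUIRED_SITEMAP_FIELDS) {
    if (!settings[field]) {
      throw new Error(`Configuration error: missing required field: ${field}`);
    }
  }
  // Trim trailing slashes so the joined URL never contains "//" (assumption).
  const base = settings.searchApiBaseUrl.replace(/\/+$/, '');
  return `${base}/${encodeURIComponent(settings.tenant)}`;
}

const exampleSettings = {
  searchApiBaseUrl: 'https://kme-qa.example.com/search',
  tenant: 'my-org',
  proxyBaseUrl: 'https://proxy.example.com',
};
console.log(buildSearchUrl(exampleSettings));  // https://kme-qa.example.com/search/my-org
```
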
### 2. Start the adapter

```bash
npm run dev   # development (auto-restart on changes)
npm start     # production
```

Redis must be running and accessible (default: `redis://localhost:6379`).

---

## Usage

### Request the sitemap

```bash
curl -v http://localhost:3000/sitemap.xml
```

**Expected response**:
```
HTTP/1.1 200 OK
Content-Type: application/xml

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://proxy.example.com?kmeURL=https%3A%2F%2Fkme.example.com%2Fdoc-1</loc>
  </url>
  ...
</urlset>
```

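To make the shape of each `<loc>` entry concrete, the sketch below builds the same kind of document from a search response. It is not the adapter's implementation: the adapter uses `xmlbuilder2` (see Architecture Notes), whereas this sketch uses plain string concatenation so that it runs without dependencies. `buildSitemap` and `escapeXml` are hypothetical names, and the assumed response shape (a `hydra:member` array whose items carry a `vkm:url` string) is inferred from the commit message and the Error Scenarios table.

```javascript
// Escapes the five XML special characters in text content.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical builder: one <url>/<loc> per item with a non-empty vkm:url.
function buildSitemap(searchResponse, proxyBaseUrl) {
  const items = searchResponse['hydra:member'] || [];
  const entries = items
    .map((item) => item['vkm:url'])
    // Items with a missing or empty vkm:url are silently omitted.
    .filter((url) => typeof url === 'string' && url.trim() !== '')
    .map((url) => `${proxyBaseUrl}?kmeURL=${encodeURIComponent(url)}`)
    .map((loc) => `  <url>\n    <loc>${escapeXml(loc)}</loc>\n  </url>`);

  const header = '<?xml version="1.0" encoding="UTF-8"?>';
  const ns = 'http://www.sitemaps.org/schemas/sitemap/0.9';
  if (entries.length === 0) {
    return `${header}\n<urlset xmlns="${ns}"/>`;
  }
  return `${header}\n<urlset xmlns="${ns}">\n${entries.join('\n')}\n</urlset>`;
}

const sampleResponse = {
  'hydra:member': [
    { 'vkm:url': 'https://kme.example.com/doc-1' },
    { 'vkm:url': '' },  // omitted from the output
  ],
};
console.log(buildSitemap(sampleResponse, 'https://proxy.example.com'));
```

With the sample input, the output contains a single `<loc>` whose value matches the one in the expected response above.
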
### Validate the sitemap against the Sitemaps XSD

```bash
# Using xmllint (libxml2). The XSD is downloaded first because xmllint
# typically cannot fetch a schema over HTTPS on its own.
curl -sO https://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd
curl -s http://localhost:3000/sitemap.xml | \
  xmllint --schema sitemap.xsd --noout -
```

---

## Running the Tests

```bash
npm run test:unit       # unit tests (VM context mocking, no network)
npm run test:contract   # contract tests (real HTTP, mock token/search servers)
npm test                # all tests
```

Unit tests live in `tests/unit/proxy.test.js`.
Contract tests live in `tests/contract/proxy-http.test.js`.

---

## Error Scenarios

| Scenario | How to reproduce | Expected response |
|---|---|---|
| Missing `searchApiBaseUrl` | Remove field from `kme_CSA_settings.json`, restart | `500 Configuration error: missing required field: searchApiBaseUrl` |
| Search service down | Point `searchApiBaseUrl` to an unreachable host | `502 Search service error: HTTP <status>` or `504 Search service timeout` |
| Zero results | Search service returns empty items array | `200 OK` with empty `<urlset/>` |
| Items with empty `vkm:url` | (covered by unit tests) | Items silently omitted from sitemap |

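One way the search-service rows of the table could map onto status codes is sketched below. `mapSearchError` is a hypothetical helper. It assumes the error object follows the shape axios uses for failed requests (`err.code` set to `ECONNABORTED` or `ETIMEDOUT` on a timeout, `err.response.status` when the upstream answered with an error status). The fallback for a network failure with no response at all is also an assumption, since the table does not specify that message.

```javascript
// Hypothetical mapping from a failed search request to the adapter's response.
function mapSearchError(err) {
  // Timeout reported by the HTTP client (assumed axios-style error codes).
  if (err.code === 'ECONNABORTED' || err.code === 'ETIMEDOUT') {
    return { status: 504, body: 'Search service timeout' };
  }
  // The search service answered, but with an error status.
  if (err.response) {
    return { status: 502, body: `Search service error: HTTP ${err.response.status}` };
  }
  // No response at all (DNS failure, refused connection): assumed to be a 502.
  return { status: 502, body: `Search service error: ${err.message}` };
}

console.log(mapSearchError({ code: 'ECONNABORTED', message: 'timeout of 5000ms exceeded' }));
console.log(mapSearchError({ response: { status: 503 }, message: 'Request failed' }));
```
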
---

## Architecture Notes

- **No new files**: All new logic is added directly to
  `src/proxyScripts/kmeContentSourceAdapter.js` (monolithic architecture constraint).
- **No new dependencies**: `xmlbuilder2` is already in `package.json` and injected into the
  VM context as `xmlBuilder`.
- **Token reuse**: The sitemap flow reuses the existing Redis `hGet`/token-refresh pattern;
  no separate auth logic.
- **VM isolation**: The proxy script runs in a `vm.createContext` sandbox. It has access only
  to the injected globals listed in `src/server.js` (`axios`, `redis`, `xmlBuilder`,
  `kme_CSA_settings`, `req`, `res`, `console`, `URLSearchParams`, `URL`, `crypto`).

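The VM isolation note can be illustrated with Node's built-in `vm` module. This is a minimal sketch, not the code in `src/server.js`: the injected object is a cut-down placeholder, and the `result` property exists only so the sandboxed script can report what it sees.

```javascript
const vm = require('node:vm');

// Only the names placed on this object are visible as globals in the sandbox.
const sandbox = vm.createContext({
  kme_CSA_settings: { tenant: 'my-org' },  // placeholder settings
  console,
  URL,
  result: null,
});

// The script can read the injected globals but not Node's own
// module-level names such as `require` or `process`.
const script = `
  result = {
    tenant: kme_CSA_settings.tenant,
    hasRequire: typeof require !== 'undefined',
    hasProcess: typeof process !== 'undefined',
  };
`;

vm.runInContext(script, sandbox);
console.log(sandbox.result);  // { tenant: 'my-org', hasRequire: false, hasProcess: false }
```

This is why a new global such as `redis` has to be injected explicitly before the proxy script can use it.
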