# API Contract: Sitemap Endpoint

**Feature**: 001-drive-proxy-adapter  
**Date**: 2026-03-07  
**Phase**: 1 - Design & Contracts  
**Endpoint**: `GET /sitemap.xml`

## Overview

The `/sitemap.xml` endpoint returns an XML sitemap listing all Google Drive documents accessible to the Service Account. This is the only endpoint exposed by the adapter.

---

## Endpoint Definition

### URL

```
GET /sitemap.xml
```

### Authentication

- **Method**: None (endpoint is public)
- **Backend Authentication**: Service Account JWT to Google Drive API (transparent to client)
- **Credentials**: Loaded from `GOOGLE_SERVICE_ACCOUNT_KEY` environment variable

### Request

**Method**: `GET`

**Headers**:
- None required

**Query Parameters**:
- None supported

**Request Body**:
- None (GET request)

**Example Request**:

```http
GET /sitemap.xml HTTP/1.1
Host: adapter.example.com
User-Agent: Mozilla/5.0
```

---

## Response Specifications

### Success Response (200 OK)

**Status Code**: `200 OK`

**Headers**:
- `Content-Type: application/xml`
- `Content-Length: {size_in_bytes}`

**Body**: Valid XML sitemap conforming to the sitemap protocol

**XML Schema**:

```xml
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://adapter.example.com/documents/{documentId}</loc>
    <lastmod>2026-03-06T10:30:00.000Z</lastmod>
  </url>
</urlset>
```

**Field Descriptions**:
- `<urlset>`: Root element with sitemap namespace
- `<url>`: Individual URL entry (0 to 50,000 entries)
- `<loc>`: Absolute URL to document using RESTful format `/documents/{documentId}`
- `<lastmod>`: ISO 8601 timestamp of last document modification

**Constraints**:
- Maximum 50,000 `<url>` entries (sitemap protocol limit per spec.md FR-015)
- Maximum 50MB uncompressed (protocol limit, not enforced by the adapter)
- All `<loc>` URLs use the same base URL (configured via `BASE_URL` env var)
- All `<loc>` URLs use RESTful path format: `/documents/{documentId}`

**Example Response**:

```http
HTTP/1.1 200 OK
Content-Type: application/xml
Content-Length: 4582

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://adapter.example.com/documents/1BxAA_example123</loc>
    <lastmod>2026-03-06T10:30:00.000Z</lastmod>
  </url>
  <url>
    <loc>https://adapter.example.com/documents/1CyBB_example456</loc>
    <lastmod>2026-03-05T14:20:00.000Z</lastmod>
  </url>
  <url>
    <loc>https://adapter.example.com/documents/1DzCC_example789</loc>
    <lastmod>2026-03-04T08:15:00.000Z</lastmod>
  </url>
</urlset>
```

**Performance Targets** (from spec.md success criteria):
- Response time: < 5 seconds for up to 10,000 documents
- Memory usage: < 256MB under normal load
- Concurrent requests: Support 10 concurrent requests without degradation

---

### Not Found Response (404)

**Status Code**: `404 Not Found`

**Headers**: None

**Body**: Empty (per spec.md clarification: "HTTP status code only, no error response body")

**When Returned**:
- Any path other than `/sitemap.xml` (per spec.md FR-007)

**Example Response**:

```http
HTTP/1.1 404 Not Found
```

---

### Unauthorized Response (401)

**Status Code**: `401 Unauthorized`

**Headers**: None

**Body**: Empty (per spec.md clarification: "HTTP status code only, no error response body")

**When Returned**:
- Service Account JWT authentication failed (per spec.md FR-010)
- OAuth token refresh failed
- Invalid Service Account credentials

**Example Response**:

```http
HTTP/1.1 401 Unauthorized
```

**Client Action**: Check Service Account credentials in `GOOGLE_SERVICE_ACCOUNT_KEY` environment variable

---

### Rate Limited Response (429)

**Status Code**: `429 Too Many Requests`

**Headers**:
- `Retry-After: {seconds}` (integer, seconds until retry allowed)

**Body**: Empty (per spec.md clarification: "HTTP status code only, no error response body")

**When Returned**:
- Google Drive API rate limit exceeded (per spec.md FR-013)
- Quota exhausted for Service Account

**Example Response**:

```http
HTTP/1.1 429 Too Many Requests
Retry-After: 60
```

**Client Action**: Wait `Retry-After` seconds before retrying the request

**Retry-After Values**:
- Derived from Google Drive API `Retry-After` header if available
- Default: 60 seconds if not specified by Drive API

---

### Internal Server Error (500)

**Status Code**: `500 Internal Server Error`

**Headers**: None

**Body**: Empty (per spec.md clarification: "HTTP status code only, no error response body")

**When Returned**:
- Unexpected server error (per spec.md FR-008)
- Configuration error (missing environment variables)
- XML generation failure

**Example Response**:

```http
HTTP/1.1 500 Internal Server Error
```

**Client Action**: Report error to adapter administrator

**Server Logging**: All 500 errors logged with stack trace to stderr (per spec.md FR-012)

---

### Service Unavailable Response (503)

**Status Code**: `503 Service Unavailable`

**Headers**: None

**Body**: Empty (per spec.md clarification: "HTTP status code only, no error response body")

**When Returned**:
- Google Drive API unavailable (per spec.md FR-017)
- Drive API returns 503 status (no retries per spec clarification)

**Example Response**:

```http
HTTP/1.1 503 Service Unavailable
```

**Client Action**: Retry request later (Drive API temporarily unavailable)

**Retry Behavior**: Adapter does NOT retry Drive API 503 errors; it immediately returns 503 to the client (per spec.md FR-017 clarification)

---

## Error Handling Specification

### Error Response Format

**All error responses follow the same pattern**:
- Status code indicates error type
- No response body (per spec.md clarification)
- Minimal headers (only `Retry-After` for 429)

**Rationale**: Simplicity, consistency, fail-fast approach

### Error Status Code Matrix

| Error Condition | Status Code | Headers | Body | Retry? |
|----------------|-------------|---------|------|--------|
| Authentication failed | 401 | None | Empty | No (fix credentials) |
| Rate limit exceeded | 429 | `Retry-After` | Empty | Yes (after delay) |
| Drive API unavailable | 503 | None | Empty | Yes (later) |
| Internal error | 500 | None | Empty | No (report to admin) |
| Path not found | 404 | None | Empty | No |

---

## Logging Specification

### Request Logging (stdout)

**All requests logged with**:
- Timestamp (ISO 8601)
- HTTP method and path
- Response status code
- Response time (milliseconds)

**Example**:

```
[2026-03-07T14:30:15.456Z] GET /sitemap.xml -> 200 (1234ms)
[2026-03-07T14:30:20.789Z] GET /sitemap.xml -> 429 (234ms)
[2026-03-07T14:30:25.012Z] GET /invalid.xml -> 404 (1ms)
```

### Error Logging (stderr)

**All errors logged with**:
- Timestamp (ISO 8601)
- Request ID (for correlation)
- Error message
- Stack trace (for 500 errors)

**Example**:

```
[2026-03-07T14:30:20.789Z] [ERROR] Rate limit exceeded: Drive API quota exhausted
[2026-03-07T14:30:25.012Z] [ERROR] Authentication failed: Invalid Service Account key
[2026-03-07T14:30:30.345Z] [ERROR] Drive API unavailable: Connection timeout
```

---

## Contract Tests

### Test Scenarios

1. **Successful sitemap generation**
   - Request: `GET /sitemap.xml`
   - Expected: 200 status, valid XML, `Content-Type: application/xml`

2. **Not found for other paths**
   - Request: `GET /invalid.xml`
   - Expected: 404 status, empty body

3. **Rate limiting**
   - Simulate Drive API 429 response
   - Expected: 429 status, `Retry-After` header, empty body

4. **Authentication failure**
   - Simulate invalid credentials
   - Expected: 401 status, empty body

5. **Service unavailable**
   - Simulate Drive API 503 response
   - Expected: 503 status, empty body (no retries)

6. **XML schema validation**
   - Request: `GET /sitemap.xml`
   - Validate XML against sitemap protocol schema

7. **URL format validation**
   - Request: `GET /sitemap.xml`
   - Verify all `<loc>` URLs use `/documents/{documentId}` format

### Test Assertions

**XML Schema Validation**:
- Root element: `<urlset>`
- Each `<url>` has required `<loc>` child
- Each `<lastmod>` is valid ISO 8601 timestamp
- Maximum 50,000 `<url>` entries

**URL Format Validation**:
- All `<loc>` URLs are absolute (start with http:// or https://)
- All `<loc>` URLs use RESTful format: `{baseUrl}/documents/{documentId}`
- Document IDs match regex: `^[a-zA-Z0-9_-]+$`

**Header Validation**:
- 200 responses include `Content-Type: application/xml`
- 429 responses include `Retry-After` header with integer value
- All error responses have empty body

---

## Configuration

### Environment Variables

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `GOOGLE_SERVICE_ACCOUNT_KEY` | Yes | None | Inline JSON of Service Account key file |
| `BASE_URL` | Yes | None | Base URL for sitemap links (e.g., `https://adapter.example.com`) |
| `PORT` | No | 3000 | HTTP server port |

**Example .env**:

```bash
GOOGLE_SERVICE_ACCOUNT_KEY='{"type":"service_account","project_id":"...","private_key":"-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n","client_email":"...@developer.gserviceaccount.com",...}'
BASE_URL=https://adapter.example.com
PORT=3000
```

---

## Compatibility

### Sitemap Protocol Compliance

**Protocol**: https://www.sitemaps.org/protocol.html

**Compliance**:
- ✅ Valid XML with namespace
- ✅ `<loc>` with absolute URLs
- ✅ `<lastmod>` with W3C Datetime format (ISO 8601)
- ✅ Maximum 50,000 URLs
- ✅ Maximum 50MB uncompressed size

**Optional Elements Not Used**:
- `<changefreq>`: Not applicable (no historical change data)
- `<priority>`: Not applicable (all documents equal priority)

### HTTP Compliance

**HTTP Version**: HTTP/1.1

**Methods Supported**: `GET` only

**Status Codes Used**: 200, 401, 404, 429, 500, 503

**Headers Used**:
- Response: `Content-Type`, `Content-Length`, `Retry-After`
- Request: Standard HTTP headers accepted, none required

---
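
The XML schema and URL format assertions listed under Contract Tests can be sketched as a small checker. This is a minimal illustration, not the adapter's actual test suite: the function name `validate_sitemap` and its error strings are hypothetical, and it uses only the Python standard library. The document ID pattern is the one from the assertions, applied with `fullmatch` so that a trailing newline is not accepted.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # sitemap protocol limit on <url> entries
DOC_ID_RE = re.compile(r"[a-zA-Z0-9_-]+")  # used with fullmatch()


def validate_sitemap(xml_text: str, base_url: str) -> list[str]:
    """Return contract violations; an empty list means the sitemap passes."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        return [f"unexpected root element: {root.tag}"]

    errors: list[str] = []
    urls = root.findall(f"{{{SITEMAP_NS}}}url")
    if len(urls) > MAX_URLS:
        errors.append(f"too many <url> entries: {len(urls)}")

    prefix = base_url.rstrip("/") + "/documents/"
    for index, url in enumerate(urls):
        loc = url.findtext(f"{{{SITEMAP_NS}}}loc")
        if loc is None:
            errors.append(f"entry {index}: missing <loc>")
            continue
        loc = loc.strip()
        if not loc.startswith(("http://", "https://")):
            errors.append(f"entry {index}: <loc> is not absolute: {loc}")
        # The part after {baseUrl}/documents/ must be a bare document ID.
        if not loc.startswith(prefix) or not DOC_ID_RE.fullmatch(loc[len(prefix):]):
            errors.append(f"entry {index}: <loc> has wrong format: {loc}")

        lastmod = url.findtext(f"{{{SITEMAP_NS}}}lastmod")
        if lastmod is not None:
            try:
                # Map a trailing "Z" to an explicit offset for older Pythons.
                datetime.fromisoformat(lastmod.strip().replace("Z", "+00:00"))
            except ValueError:
                errors.append(f"entry {index}: invalid <lastmod>: {lastmod}")
    return errors
```

A contract test could fetch `/sitemap.xml`, pass the body and the configured `BASE_URL` to this function, and assert that the returned list is empty.

---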
## Security Considerations

### Authentication

- Service Account credentials secured in environment variable (not in code or config files)
- Credentials never logged or exposed in error messages
- Read-only Drive scope (`drive.readonly`) - no write permissions

### Rate Limiting

- Transparent propagation of Drive API rate limits to client
- No internal rate limiting (rely on Drive API limits)

### Input Validation

- Path validation: Only `/sitemap.xml` accepted
- Method validation: Only `GET` accepted
- No query parameters processed (rejection not required, just ignored)

### Output Sanitization

- All URLs XML-escaped to prevent injection
- All timestamps XML-escaped (though ISO 8601 format doesn't contain XML special chars)

---

## Versioning

**Current Version**: 1.0.0 (initial implementation)

**Future Changes**:
- Breaking changes (new required parameters): Major version bump (2.0.0)
- Backward-compatible additions (query parameters): Minor version bump (1.1.0)
- Bug fixes: Patch version bump (1.0.1)

**Deprecation Policy**:
- Breaking changes include migration guide
- Deprecated features supported for at least one minor version

---

## References

- Feature Specification: `/specs/001-drive-proxy-adapter/spec.md`
- Data Model: `/specs/001-drive-proxy-adapter/data-model.md`
- Research Document: `/specs/001-drive-proxy-adapter/research.md`
- Sitemap Protocol: https://www.sitemaps.org/protocol.html
- Google Drive API v3: https://developers.google.com/drive/api/v3/reference
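---

The Output Sanitization rules under Security Considerations, together with the `{baseUrl}/documents/{documentId}` URL format, can be sketched as a small generator. This is a hedged illustration rather than the adapter's implementation: `build_sitemap` and `format_lastmod` are hypothetical names, the XML declaration line is a customary addition that this contract does not mandate, and dropping entries beyond 50,000 is an assumption, since the contract does not say how overflow is handled.

```python
from datetime import datetime, timezone
from itertools import islice
from typing import Iterable, Tuple
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # sitemap protocol limit on <url> entries


def format_lastmod(modified: datetime) -> str:
    """Format a timezone-aware datetime as e.g. 2026-03-06T10:30:00.000Z."""
    utc = modified.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S") + f".{utc.microsecond // 1000:03d}Z"


def build_sitemap(base_url: str, documents: Iterable[Tuple[str, datetime]]) -> str:
    """Build sitemap XML from (document_id, modified) pairs."""
    prefix = base_url.rstrip("/") + "/documents/"
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',  # customary, not mandated here
        f'<urlset xmlns="{SITEMAP_NS}">',
    ]
    # Assumption: entries past the protocol limit are silently dropped.
    for document_id, modified in islice(documents, MAX_URLS):
        # escape() replaces &, < and > so values cannot break the XML.
        loc = escape(prefix + document_id)
        lastmod = escape(format_lastmod(modified))
        lines += [
            "  <url>",
            f"    <loc>{loc}</loc>",
            f"    <lastmod>{lastmod}</lastmod>",
            "  </url>",
        ]
    lines.append("</urlset>")
    return "\n".join(lines)
```

Calling `build_sitemap("https://adapter.example.com", [("1BxAA_example123", datetime(2026, 3, 6, 10, 30, tzinfo=timezone.utc))])` yields one `<url>` entry whose `<lastmod>` is `2026-03-06T10:30:00.000Z`. Escaping both fields keeps the output well-formed even if an unexpected character reaches the generator.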