# API Contract: Sitemap Endpoint

**Feature**: 001-drive-proxy-adapter
**Date**: 2026-03-07
**Phase**: 1 - Design & Contracts
**Endpoint**: `GET /sitemap.xml`

## Overview

The `/sitemap.xml` endpoint returns an XML sitemap listing all Google Drive documents accessible to the Service Account. This is the only endpoint exposed by the adapter.

---

## Endpoint Definition

### URL

```
GET /sitemap.xml
```

### Authentication
- **Method**: None (endpoint is public)
- **Backend Authentication**: Service Account JWT to Google Drive API (transparent to client)
- **Credentials**: Loaded from `GOOGLE_SERVICE_ACCOUNT_KEY` environment variable

### Request

**Method**: `GET`

**Headers**:
- None required

**Query Parameters**:
- None supported (any query string is ignored, not rejected)

**Request Body**:
- None (GET request)

**Example Request**:
```http
GET /sitemap.xml HTTP/1.1
Host: adapter.example.com
User-Agent: Mozilla/5.0
```

---

## Response Specifications

### Success Response (200 OK)

**Status Code**: `200 OK`

**Headers**:
- `Content-Type: application/xml`
- `Content-Length: {size_in_bytes}`

**Body**: Valid XML sitemap conforming to the sitemap protocol

**XML Schema**:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://adapter.example.com/documents/{documentId}</loc>
    <lastmod>2026-03-06T10:30:00.000Z</lastmod>
  </url>
  <!-- Additional <url> entries (up to 50,000) -->
</urlset>
```

**Field Descriptions**:
- `<urlset>`: Root element with sitemap namespace
- `<url>`: Individual URL entry (0 to 50,000 entries)
- `<loc>`: Absolute URL to document using RESTful format `/documents/{documentId}`
- `<lastmod>`: ISO 8601 timestamp of last document modification

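The `<lastmod>` values above use UTC with millisecond precision and a `Z` suffix, which is also the form in which Google Drive API v3 reports a file's `modifiedTime`. A minimal Python sketch of producing that form, assuming the adapter works with timezone-aware `datetime` values; `format_lastmod` and `parse_drive_time` are hypothetical helper names, not part of this contract.

```python
from datetime import datetime, timezone


def format_lastmod(modified: datetime) -> str:
    """Render a timezone-aware timestamp as UTC ISO 8601 with milliseconds and 'Z'."""
    if modified.tzinfo is None:
        raise ValueError("lastmod timestamps must be timezone-aware")
    utc = modified.astimezone(timezone.utc)
    # isoformat(timespec="milliseconds") gives ...T10:30:00.000+00:00;
    # replace the +00:00 offset with the equivalent "Z" designator.
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def parse_drive_time(value: str) -> datetime:
    """Parse an RFC 3339 string such as '2026-03-06T10:30:00.000Z'."""
    # Before Python 3.11, fromisoformat() rejects a trailing "Z", so map it to +00:00.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


print(format_lastmod(parse_drive_time("2026-03-06T10:30:00.000Z")))
# 2026-03-06T10:30:00.000Z
```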
**Constraints**:
- Maximum 50,000 `<url>` entries (sitemap protocol limit per spec.md FR-015)
- Maximum 50MB (52,428,800 bytes) uncompressed (protocol limit; not enforced separately by the adapter)
- All `<loc>` URLs use the same base URL (configured via the `BASE_URL` env var)
- All `<loc>` URLs use the RESTful path format `/documents/{documentId}`

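The XML structure and constraints above can be produced with Python's standard `xml.etree.ElementTree`. This is a minimal sketch, assuming documents arrive as (`documentId`, preformatted `lastmod`) pairs and that `BASE_URL` has already been read; `build_sitemap` and `MAX_URLS` are illustrative names, and truncating at 50,000 entries (rather than failing) is an assumption the contract does not settle.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # sitemap protocol limit (FR-015)

# Serialize the sitemap namespace as the default xmlns instead of "ns0:".
ET.register_namespace("", SITEMAP_NS)


def build_sitemap(base_url: str, docs: Iterable[Tuple[str, str]]) -> bytes:
    """Build the sitemap body from (documentId, lastmod) pairs."""
    root_url = base_url.rstrip("/")
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for count, (doc_id, lastmod) in enumerate(docs):
        if count >= MAX_URLS:
            break  # assumption: truncate instead of exceeding the limit
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # ElementTree escapes &, < and > in element text on serialization.
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = f"{root_url}/documents/{doc_id}"
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)
```

Registering the namespace under the empty prefix makes ElementTree write `xmlns="..."` on `<urlset>`, matching the example, instead of an auto-generated `ns0:` prefix. The `xml_declaration` argument requires Python 3.8 or later.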
**Example Response**:
```http
HTTP/1.1 200 OK
Content-Type: application/xml
Content-Length: 4582

```

**Performance Targets** (from spec.md success criteria):
- Response time: < 5 seconds for up to 10,000 documents
- Memory usage: < 256MB under normal load
- Concurrent requests: Support 10 concurrent requests without degradation

---

### Not Found Response (404)

**Status Code**: `404 Not Found`

**Headers**: None

**Body**: Empty (per spec.md clarification: "HTTP status code only, no error response body")

**When Returned**:
- Any path other than `/sitemap.xml` (per spec.md FR-007)

**Example Response**:
```http
HTTP/1.1 404 Not Found

```

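The 404 rule amounts to an exact match on the request path, with any query string discarded first. A minimal Python sketch, assuming the server hands over the raw request target (path plus optional query string); `is_sitemap_request` is a hypothetical name. Method checks are left out because this contract does not assign a status code to non-`GET` requests.

```python
from urllib.parse import urlsplit


def is_sitemap_request(target: str) -> bool:
    """True if the request target addresses /sitemap.xml; the query string is ignored."""
    # urlsplit separates "/sitemap.xml?page=2" into path "/sitemap.xml" and query "page=2".
    return urlsplit(target).path == "/sitemap.xml"


for target in ("/sitemap.xml", "/sitemap.xml?page=2", "/invalid.xml"):
    # Anything that is not the sitemap gets a bare 404 with an empty body.
    print(target, "->", "generate sitemap" if is_sitemap_request(target) else 404)
```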

---

### Unauthorized Response (401)

**Status Code**: `401 Unauthorized`

**Headers**: None

**Body**: Empty (per spec.md clarification: "HTTP status code only, no error response body")

**When Returned**:
- Service Account JWT authentication failed (per spec.md FR-010)
- OAuth token refresh failed
- Invalid Service Account credentials

**Example Response**:
```http
HTTP/1.1 401 Unauthorized

```

**Client Action**: None on the client side; the adapter operator must check the Service Account credentials in the `GOOGLE_SERVICE_ACCOUNT_KEY` environment variable before clients retry


---

### Rate Limited Response (429)

**Status Code**: `429 Too Many Requests`

**Headers**:
- `Retry-After: {seconds}` (integer, seconds until retry allowed)

**Body**: Empty (per spec.md clarification: "HTTP status code only, no error response body")

**When Returned**:
- Google Drive API rate limit exceeded (per spec.md FR-013)
- Quota exhausted for Service Account

**Example Response**:
```http
HTTP/1.1 429 Too Many Requests
Retry-After: 60

```

**Client Action**: Wait the number of seconds given in `Retry-After` before retrying the request

**Retry-After Values**:
- Derived from Google Drive API `Retry-After` header if available
- Default: 60 seconds if not specified by Drive API

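The Retry-After rule above (reuse the Drive API value, otherwise default to 60 seconds) could look like this in Python. A minimal sketch; `retry_after_seconds` is a hypothetical name, and treating an HTTP-date or malformed upstream value as "not specified" is an assumption, since the contract only describes integer seconds.

```python
import re
from typing import Optional

DEFAULT_RETRY_AFTER = 60  # seconds, used when Drive API gives no usable value


def retry_after_seconds(upstream: Optional[str]) -> int:
    """Seconds to send in the client-facing Retry-After header."""
    if upstream is not None:
        value = upstream.strip()
        # Only the delay-seconds form (ASCII digits) is honoured in this sketch;
        # an HTTP-date or malformed value falls back to the default.
        if re.fullmatch(r"[0-9]+", value):
            return int(value)
    return DEFAULT_RETRY_AFTER
```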

---

### Internal Server Error (500)

**Status Code**: `500 Internal Server Error`

**Headers**: None

**Body**: Empty (per spec.md clarification: "HTTP status code only, no error response body")

**When Returned**:
- Unexpected server error (per spec.md FR-008)
- Configuration error (missing environment variables)
- XML generation failure

**Example Response**:
```http
HTTP/1.1 500 Internal Server Error

```

**Client Action**: Report error to adapter administrator

**Server Logging**: All 500 errors logged with stack trace to stderr (per spec.md FR-012)


---

### Service Unavailable Response (503)

**Status Code**: `503 Service Unavailable`

**Headers**: None

**Body**: Empty (per spec.md clarification: "HTTP status code only, no error response body")

**When Returned**:
- Google Drive API unavailable (per spec.md FR-017)
- Drive API returns 503 status (no retries per spec clarification)

**Example Response**:
```http
HTTP/1.1 503 Service Unavailable

```

**Client Action**: Retry request later (Drive API temporarily unavailable)

**Retry Behavior**: Adapter does NOT retry Drive API 503 errors; immediately returns 503 to client (per spec.md FR-017 clarification)


---

## Error Handling Specification

### Error Response Format

**All error responses follow the same pattern**:
- Status code indicates error type
- No response body (per spec.md clarification)
- Minimal headers (only `Retry-After` for 429)

**Rationale**: Simplicity, consistency, fail-fast approach

### Error Status Code Matrix

| Error Condition | Status Code | Headers | Body | Retry? |
|----------------|-------------|---------|------|--------|
| Authentication failed | 401 | None | Empty | No (fix credentials) |
| Rate limit exceeded | 429 | `Retry-After` | Empty | Yes (after delay) |
| Drive API unavailable | 503 | None | Empty | Yes (later) |
| Internal error | 500 | None | Empty | No (report to admin) |
| Path not found | 404 | None | Empty | No |

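For failures that originate in the Drive API, the matrix can be read as a lookup from the upstream status code to the adapter's own. A minimal Python sketch, assuming classification keys only on the upstream HTTP status; `adapter_status` is a hypothetical name. Drive can also report some rate-limit errors as `403` with a rate-limit reason, and this sketch, which does not inspect reasons, would map those to 500.

```python
# Hypothetical mapping keyed only on the upstream HTTP status code.
DRIVE_TO_ADAPTER_STATUS = {
    401: 401,  # authentication failed: fix credentials, do not retry
    429: 429,  # rate limit exceeded: respond with Retry-After
    503: 503,  # Drive API unavailable: no adapter-side retries
}


def adapter_status(drive_status: int) -> int:
    """Status code the adapter returns (with an empty body) for a failed Drive call."""
    # Every other failure is reported as an internal error.
    return DRIVE_TO_ADAPTER_STATUS.get(drive_status, 500)
```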

---

## Logging Specification

### Request Logging (stdout)

**All requests logged with**:
- Timestamp (ISO 8601)
- HTTP method and path
- Response status code
- Response time (milliseconds)

**Example**:
```
[2026-03-07T14:30:15.456Z] GET /sitemap.xml -> 200 (1234ms)
[2026-03-07T14:30:20.789Z] GET /sitemap.xml -> 429 (234ms)
[2026-03-07T14:30:25.012Z] GET /invalid.xml -> 404 (1ms)
```

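The request log format in the example can be reproduced with a small Python helper. A sketch, assuming the caller passes a timezone-aware request start time and measures elapsed time itself (for example with `time.perf_counter()`); `log_request` is a hypothetical name.

```python
import sys
from datetime import datetime, timezone
from typing import Optional, TextIO


def log_request(method: str, path: str, status: int, elapsed_ms: float,
                started: datetime, out: Optional[TextIO] = None) -> str:
    """Write one stdout log line, e.g. '[...] GET /sitemap.xml -> 200 (1234ms)'."""
    stamp = started.astimezone(timezone.utc).isoformat(timespec="milliseconds")
    stamp = stamp.replace("+00:00", "Z")
    line = f"[{stamp}] {method} {path} -> {status} ({round(elapsed_ms)}ms)"
    print(line, file=out if out is not None else sys.stdout)
    return line
```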
### Error Logging (stderr)

**All errors logged with**:
- Timestamp (ISO 8601)
- Request ID (for correlation)
- Error message
- Stack trace (for 500 errors)

**Example** (`{requestId}` stands for the per-request correlation ID):
```
[2026-03-07T14:30:20.789Z] [ERROR] [{requestId}] Rate limit exceeded: Drive API quota exhausted
[2026-03-07T14:30:25.012Z] [ERROR] [{requestId}] Authentication failed: Invalid Service Account key
[2026-03-07T14:30:30.345Z] [ERROR] [{requestId}] Drive API unavailable: Connection timeout
```

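The error log fields listed above (timestamp, request ID, message, stack trace for 500 errors) can be combined as follows. A minimal Python sketch: the UUID4-based request ID and the position of the ID within the line are assumptions, `new_request_id` and `log_error` are hypothetical names, and callers are expected to keep credential material out of `message`, as the Security Considerations require.

```python
import sys
import traceback
import uuid
from datetime import datetime, timezone
from typing import Optional, TextIO


def new_request_id() -> str:
    """Random correlation ID per request (assumption: UUID4 hex)."""
    return uuid.uuid4().hex


def log_error(request_id: str, message: str,
              exc: Optional[BaseException] = None,
              out: Optional[TextIO] = None) -> str:
    """Write an error entry to stderr; append the stack trace when `exc` is given."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
    entry = f"[{stamp.replace('+00:00', 'Z')}] [ERROR] [{request_id}] {message}"
    if exc is not None:  # e.g. for 500 responses
        trace = traceback.format_exception(type(exc), exc, exc.__traceback__)
        entry += "\n" + "".join(trace).rstrip()
    print(entry, file=out if out is not None else sys.stderr)
    return entry
```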

---

## Contract Tests

### Test Scenarios

1. **Successful sitemap generation**
   - Request: `GET /sitemap.xml`
   - Expected: 200 status, valid XML, `Content-Type: application/xml`

2. **Not found for other paths**
   - Request: `GET /invalid.xml`
   - Expected: 404 status, empty body

3. **Rate limiting**
   - Simulate Drive API 429 response
   - Expected: 429 status, `Retry-After` header, empty body

4. **Authentication failure**
   - Simulate invalid credentials
   - Expected: 401 status, empty body

5. **Service unavailable**
   - Simulate Drive API 503 response
   - Expected: 503 status, empty body (no retries)

6. **XML schema validation**
   - Request: `GET /sitemap.xml`
   - Validate XML against sitemap protocol schema

7. **URL format validation**
   - Request: `GET /sitemap.xml`
   - Verify all `<loc>` URLs use `/documents/{documentId}` format

### Test Assertions

**XML Schema Validation**:
- Root element: `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">`
- Each `<url>` has a required `<loc>` child
- Each `<lastmod>` is a valid ISO 8601 timestamp
- Maximum 50,000 `<url>` entries

**URL Format Validation**:
- All `<loc>` URLs are absolute (start with `http://` or `https://`)
- All `<loc>` URLs use RESTful format: `{baseUrl}/documents/{documentId}`
- Document IDs match regex: `^[a-zA-Z0-9_-]+$`

**Header Validation**:
- 200 responses include `Content-Type: application/xml`
- 429 responses include a `Retry-After` header with an integer value
- All error responses have an empty body

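The body assertions above can be bundled into one checker that a contract test runs against a `GET /sitemap.xml` response. A sketch in Python using only the standard library; `check_sitemap` is a hypothetical helper, and `datetime.fromisoformat` serves only as an approximate ISO 8601 check whose accepted forms vary between Python versions.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime
from typing import List

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
# Same character class as the contract's ^[a-zA-Z0-9_-]+$, used with fullmatch().
DOC_ID = re.compile(r"[a-zA-Z0-9_-]+")


def check_sitemap(body: bytes, base_url: str) -> List[str]:
    """Return contract violations found in a sitemap body (empty list = pass)."""
    problems: List[str] = []
    root = ET.fromstring(body)  # raises ET.ParseError for malformed XML
    if root.tag != NS + "urlset":
        problems.append(f"unexpected root element {root.tag}")
    urls = root.findall(NS + "url")
    if len(urls) > 50_000:
        problems.append(f"{len(urls)} <url> entries exceed the 50,000 limit")
    prefix = base_url.rstrip("/") + "/documents/"
    for url in urls:
        loc = url.findtext(NS + "loc")
        if loc is None:
            problems.append("<url> without <loc>")
        elif not loc.startswith(("http://", "https://")) or not loc.startswith(prefix):
            problems.append(f"<loc> not absolute or not under {prefix}: {loc}")
        elif not DOC_ID.fullmatch(loc[len(prefix):]):
            problems.append(f"invalid document ID in {loc}")
        lastmod = url.findtext(NS + "lastmod")
        if lastmod is not None:
            try:
                # Approximate ISO 8601 check; map "Z" for Python < 3.11.
                datetime.fromisoformat(lastmod.replace("Z", "+00:00"))
            except ValueError:
                problems.append(f"invalid <lastmod>: {lastmod}")
    return problems
```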

---

## Configuration

### Environment Variables

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `GOOGLE_SERVICE_ACCOUNT_KEY` | Yes | None | Inline JSON of Service Account key file |
| `BASE_URL` | Yes | None | Base URL for sitemap links (e.g., `https://adapter.example.com`) |
| `PORT` | No | 3000 | HTTP server port |

**Example .env**:
```bash
GOOGLE_SERVICE_ACCOUNT_KEY='{"type":"service_account","project_id":"...","private_key":"-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n","client_email":"...@developer.gserviceaccount.com",...}'
BASE_URL=https://adapter.example.com
PORT=3000
```

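The variables in the table can be read and validated in one place, so that a missing or malformed value surfaces as the "Configuration error" listed under the 500 response. A minimal Python sketch; `Config`, `ConfigError`, and `load_config` are hypothetical names, and whether the check runs at startup or per request is not specified by this contract.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Mapping


class ConfigError(Exception):
    """Missing or invalid configuration (reported to clients as a bare 500)."""


@dataclass(frozen=True)
class Config:
    service_account_info: Dict[str, Any]
    base_url: str
    port: int


def load_config(env: Mapping[str, str]) -> Config:
    """Read the variables in the table above; pass os.environ in production."""
    raw_key = env.get("GOOGLE_SERVICE_ACCOUNT_KEY")
    base_url = env.get("BASE_URL")
    if not raw_key or not base_url:
        raise ConfigError("GOOGLE_SERVICE_ACCOUNT_KEY and BASE_URL are required")
    try:
        info = json.loads(raw_key)
    except json.JSONDecodeError as exc:
        # Report only the position, never the key material itself.
        raise ConfigError(f"GOOGLE_SERVICE_ACCOUNT_KEY is not valid JSON (char {exc.pos})") from None
    if not isinstance(info, dict) or info.get("type") != "service_account":
        raise ConfigError("GOOGLE_SERVICE_ACCOUNT_KEY is not a Service Account key")
    try:
        port = int(env.get("PORT", "3000"))
    except ValueError:
        raise ConfigError("PORT must be an integer") from None
    return Config(info, base_url.rstrip("/"), port)
```

With the `google-auth` library, the parsed `service_account_info` dict could then be passed to `google.oauth2.service_account.Credentials.from_service_account_info()` together with the `https://www.googleapis.com/auth/drive.readonly` scope.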

---

## Compatibility

### Sitemap Protocol Compliance

**Protocol**: https://www.sitemaps.org/protocol.html

**Compliance**:
- ✅ Valid XML with namespace
- ✅ `<loc>` with absolute URLs
- ✅ `<lastmod>` with W3C Datetime format (ISO 8601)
- ✅ Maximum 50,000 URLs
- ✅ Maximum 50MB uncompressed size

**Optional Elements Not Used**:
- `<changefreq>`: Not applicable (no historical change data)
- `<priority>`: Not applicable (all documents equal priority)

### HTTP Compliance

**HTTP Version**: HTTP/1.1

**Methods Supported**: `GET` only

**Status Codes Used**: 200, 401, 404, 429, 500, 503

**Headers Used**:
- Response: `Content-Type`, `Content-Length`, `Retry-After`
- Request: Standard HTTP headers accepted, none required


---

## Security Considerations

### Authentication
- Service Account credentials secured in environment variable (not in code or config files)
- Credentials never logged or exposed in error messages
- Read-only Drive scope (`drive.readonly`); no write permissions

### Rate Limiting
- Transparent propagation of Drive API rate limits to client
- No internal rate limiting (relies on Drive API limits)

### Input Validation
- Path validation: Only `/sitemap.xml` accepted
- Method validation: Only `GET` accepted
- Query parameters: Not processed; ignored rather than rejected

### Output Sanitization
- All URLs XML-escaped to prevent injection
- All timestamps XML-escaped (although ISO 8601 timestamps contain no XML special characters)

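The sitemap protocol asks for the five XML special characters (`&`, `'`, `"`, `<`, `>`) to be entity-escaped in URLs. Document IDs matching `^[a-zA-Z0-9_-]+$` never contain them, so in practice the escaping guards the configured `BASE_URL`. A minimal Python sketch, assuming the sitemap is assembled as text; `escape_loc` is a hypothetical name. If the XML is instead built with a serializer such as ElementTree, `&`, `<` and `>` are escaped automatically, and escaping beforehand would double-escape them.

```python
from xml.sax.saxutils import escape

# escape() always handles &, < and >; the extra map covers the two quote characters.
_QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def escape_loc(url: str) -> str:
    """Entity-escape a URL for use as <loc> text in a hand-assembled sitemap."""
    return escape(url, _QUOTE_ENTITIES)


print(escape_loc("https://example.com/a&b"))
# https://example.com/a&amp;b
```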

---

## Versioning

**Current Version**: 1.0.0 (initial implementation)

**Future Changes**:
- Breaking changes (new required parameters): Major version bump (2.0.0)
- Backward-compatible additions (query parameters): Minor version bump (1.1.0)
- Bug fixes: Patch version bump (1.0.1)

**Deprecation Policy**:
- Breaking changes include migration guide
- Deprecated features supported for at least one minor version

---

## References

- Feature Specification: `/specs/001-drive-proxy-adapter/spec.md`
- Data Model: `/specs/001-drive-proxy-adapter/data-model.md`
- Research Document: `/specs/001-drive-proxy-adapter/research.md`
- Sitemap Protocol: https://www.sitemaps.org/protocol.html
- Google Drive API v3: https://developers.google.com/drive/api/v3/reference