Added new feature for document export

2026-03-10 16:25:05 -05:00
parent d477367256
commit 2acb04ad76
11 changed files with 0 additions and 0 deletions

@@ -0,0 +1,290 @@
openapi: 3.0.3
info:
  title: Google Drive Sitemap Adapter API
  description: |
    HTTP adapter for generating XML sitemaps listing accessible Google Drive documents.
    ## Overview
    This adapter provides a single endpoint (`/sitemap.xml`) that generates a valid XML sitemap
    conforming to the sitemap protocol (https://www.sitemaps.org/protocol.html).
    The sitemap lists all documents accessible to the configured Google Service Account,
    with URLs pointing back to this adapter using document IDs.
    ## Authentication
    The adapter uses OAuth 2.0 Service Account authentication to access Google Drive.
    External clients do not need to authenticate with this API.
    ## Rate Limiting
    Google Drive API rate limits are handled gracefully. If rate limited, the adapter
    returns HTTP 429 with a Retry-After header indicating seconds until retry.
    ## Sitemap Protocol Compliance
    - Maximum 50,000 URLs per sitemap (protocol limit)
    - Each URL includes document ID and last modified timestamp
    - Always returns fresh data (no caching)
  version: 1.0.0
  contact:
    name: API Support
  license:
    name: ISC
servers:
  - url: http://localhost:3000
    description: Development server
  - url: https://adapter.example.com
    description: Production server
tags:
  - name: Sitemap
    description: XML sitemap generation
paths:
  /sitemap.xml:
    get:
      summary: Generate XML sitemap
      description: |
        Returns an XML sitemap listing all accessible Google Drive documents.
        Each URL in the sitemap points to this adapter with a document ID:
        `{baseUrl}/{documentId}`
        The sitemap is generated on-demand (no caching) and may take up to 5 seconds
        for drives containing up to 10,000 documents.
        ## Sitemap Format
        Conforms to https://www.sitemaps.org/protocol.html:
        - `<loc>`: Absolute URL with document ID
        - `<lastmod>`: Last modified timestamp (ISO 8601)
        ## Document Retrieval
        Note: The URLs in the sitemap point back to this adapter, but document retrieval
        endpoints are not implemented. This adapter only generates sitemaps for discovery.
      operationId: getSitemap
      tags:
        - Sitemap
      responses:
        '200':
          description: Successfully generated sitemap
          headers:
            Content-Type:
              description: Always application/xml
              schema:
                type: string
                example: application/xml
            Content-Length:
              description: Size of sitemap in bytes
              schema:
                type: integer
                example: 204800
          content:
            application/xml:
              schema:
                type: string
                format: xml
              example: |
                <?xml version="1.0" encoding="UTF-8"?>
                <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
                  <url>
                    <loc>https://adapter.example.com/1BxAA_example123</loc>
                    <lastmod>2026-03-06T10:30:00.000Z</lastmod>
                  </url>
                  <url>
                    <loc>https://adapter.example.com/1CyBB_example456</loc>
                    <lastmod>2026-03-05T14:20:00.000Z</lastmod>
                  </url>
                </urlset>
        '401':
          description: Unauthorized - OAuth authentication failed
          headers:
            Content-Length:
              description: Always 0 (no response body)
              schema:
                type: integer
                example: 0
        '429':
          description: Too Many Requests - Rate limited by Google Drive API
          headers:
            Retry-After:
              description: Seconds to wait before retrying
              schema:
                type: integer
                example: 60
            Content-Length:
              description: Always 0 (no response body)
              schema:
                type: integer
                example: 0
        '500':
          description: Internal Server Error
          headers:
            Content-Length:
              description: Always 0 (no response body)
              schema:
                type: integer
                example: 0
        '503':
          description: Service Unavailable - Google Drive API is down
          headers:
            Content-Length:
              description: Always 0 (no response body)
              schema:
                type: integer
                example: 0
  /{documentId}:
    get:
      summary: Document retrieval endpoint (NOT IMPLEMENTED)
      description: |
        This endpoint is referenced in sitemap URLs but is not implemented.
        The adapter only generates sitemaps; it does not serve documents.
        Clients should treat sitemap URLs as metadata only.
      operationId: getDocument
      tags:
        - Documents (Not Implemented)
      parameters:
        - name: documentId
          in: path
          description: Google Drive document ID
          required: true
          schema:
            type: string
            pattern: '^[a-zA-Z0-9_-]+$'
          example: 1BxAA_example123
      responses:
        '404':
          description: Not Found - Document retrieval not implemented
          headers:
            Content-Length:
              description: Always 0 (no response body)
              schema:
                type: integer
                example: 0
  /{anyOtherPath}:
    get:
      summary: All other paths
      description: |
        Any path other than `/sitemap.xml` returns 404 Not Found.
      operationId: notFound
      tags:
        - Routing
      parameters:
        - name: anyOtherPath
          in: path
          description: Any path other than /sitemap.xml
          required: true
          schema:
            type: string
      responses:
        '404':
          description: Not Found
          headers:
            Content-Length:
              description: Always 0 (no response body)
              schema:
                type: integer
                example: 0
components:
  schemas:
    Sitemap:
      type: object
      description: XML sitemap structure (logical representation, actual response is XML)
      properties:
        xmlns:
          type: string
          description: XML namespace for sitemap protocol
          example: http://www.sitemaps.org/schemas/sitemap/0.9
        urls:
          type: array
          description: Array of URL entries
          items:
            $ref: '#/components/schemas/SitemapUrl'
          maxItems: 50000
    SitemapUrl:
      type: object
      description: Single URL entry in sitemap
      required:
        - loc
        - lastmod
      properties:
        loc:
          type: string
          format: uri
          description: Absolute URL to document (adapter URL + document ID)
          example: https://adapter.example.com/1BxAA_example123
        lastmod:
          type: string
          format: date-time
          description: Last modified timestamp in ISO 8601 format
          example: 2026-03-06T10:30:00.000Z
    Error:
      type: object
      description: Error response (note - most errors return empty body per spec)
      properties:
        code:
          type: integer
          description: HTTP status code
          example: 500
        message:
          type: string
          description: Error message (not included in actual responses)
          example: Internal Server Error
  responses:
    UnauthorizedError:
      description: Unauthorized - OAuth authentication failed
      headers:
        Content-Length:
          schema:
            type: integer
            example: 0
    RateLimitError:
      description: Too Many Requests - Rate limited by Google Drive API
      headers:
        Retry-After:
          description: Seconds to wait before retrying
          schema:
            type: integer
            example: 60
        Content-Length:
          schema:
            type: integer
            example: 0
    InternalError:
      description: Internal Server Error
      headers:
        Content-Length:
          schema:
            type: integer
            example: 0
    ServiceUnavailable:
      description: Service Unavailable - Google Drive API is down
      headers:
        Content-Length:
          schema:
            type: integer
            example: 0
    NotFound:
      description: Not Found - Path not recognized
      headers:
        Content-Length:
          schema:
            type: integer
            example: 0
externalDocs:
  description: Sitemap Protocol Specification
  url: https://www.sitemaps.org/protocol.html

@@ -0,0 +1,454 @@
openapi: 3.0.3
info:
  title: Google Drive HTTP Proxy Adapter API
  description: |
    HTTP proxy adapter for exporting Google Drive documents in multiple formats (Markdown, HTML, PDF)
    and generating XML sitemaps of accessible documents.
    ## Authentication
    The adapter uses OAuth 2.0 to access Google Drive on behalf of configured users.
    External clients do not need to authenticate with this API directly.
    ## Rate Limiting
    API requests are rate-limited to 100 requests per minute per IP address.
    Rate limit information is included in response headers.
  version: 1.0.0
  contact:
    name: API Support
  license:
    name: MIT
servers:
  - url: http://localhost:3000
    description: Development server
  - url: https://api.example.com
    description: Production server
tags:
  - name: Documents
    description: Document export operations
  - name: Discovery
    description: Document discovery and listing
  - name: Health
    description: Service health monitoring
paths:
  /health:
    get:
      summary: Health check endpoint
      description: Returns service health status and version information
      tags:
        - Health
      responses:
        '200':
          description: Service is healthy
          content:
            application/json:
              schema:
                type: object
                properties:
                  status:
                    type: string
                    example: ok
                  version:
                    type: string
                    example: 1.0.0
                  uptime:
                    type: number
                    description: Service uptime in seconds
                    example: 86400
  /sitemap.xml:
    get:
      summary: Generate sitemap of accessible documents
      description: |
        Returns an XML sitemap listing all Google Drive documents accessible to the configured user.
        Follows the sitemap protocol specification (https://www.sitemaps.org/protocol.html).
      tags:
        - Discovery
      responses:
        '200':
          description: Sitemap generated successfully
          headers:
            Content-Type:
              schema:
                type: string
              example: application/xml; charset=utf-8
            X-Request-Id:
              schema:
                type: string
                format: uuid
              description: Unique request identifier for tracing
            X-Document-Count:
              schema:
                type: integer
              description: Number of documents in the sitemap
          content:
            application/xml:
              schema:
                type: string
                format: xml
              example: |
                <?xml version="1.0" encoding="UTF-8"?>
                <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
                  <url>
                    <loc>http://localhost:3000/1BxAA_example123</loc>
                    <lastmod>2026-03-06T10:30:00Z</lastmod>
                  </url>
                  <url>
                    <loc>http://localhost:3000/2CyBB_example456</loc>
                    <lastmod>2026-03-05T14:20:00Z</lastmod>
                  </url>
                </urlset>
        '401':
          $ref: '#/components/responses/Unauthorized'
        '429':
          $ref: '#/components/responses/RateLimited'
        '500':
          $ref: '#/components/responses/InternalError'
        '503':
          $ref: '#/components/responses/ServiceUnavailable'
  /{documentId}:
    get:
      summary: Export Google Drive document in specified format
      description: |
        Fetches a Google Drive document by ID and exports it in the requested format.
        Supports Markdown (default), HTML, and PDF formats.
      tags:
        - Documents
      parameters:
        - name: documentId
          in: path
          required: true
          description: Google Drive file ID (8-128 alphanumeric characters, hyphens, or underscores)
          schema:
            type: string
            pattern: '^[a-zA-Z0-9_-]{8,128}$'
          example: 1BxAA_example123
        - name: format
          in: query
          required: false
          description: Export format (defaults to markdown if not specified)
          schema:
            type: string
            enum:
              - markdown
              - html
              - pdf
            default: markdown
          example: markdown
      responses:
        '200':
          description: Document exported successfully
          headers:
            Content-Type:
              schema:
                type: string
                enum:
                  - text/markdown; charset=utf-8
                  - text/html; charset=utf-8
                  - application/pdf
              description: MIME type of exported document
            X-Request-Id:
              schema:
                type: string
                format: uuid
              description: Unique request identifier for tracing
            X-Document-Title:
              schema:
                type: string
              description: Original document title from Google Drive
            X-Document-Modified:
              schema:
                type: string
                format: date-time
              description: Last modified timestamp (ISO 8601)
          content:
            text/markdown:
              schema:
                type: string
              example: |
                # Document Title
                This is a paragraph with **bold** and *italic* text.
                ## Section Heading
                - List item 1
                - List item 2
            text/html:
              schema:
                type: string
              example: |
                <!DOCTYPE html>
                <html>
                <head><title>Document Title</title></head>
                <body>
                <h1>Document Title</h1>
                <p>This is a paragraph with <strong>bold</strong> and <em>italic</em> text.</p>
                </body>
                </html>
            application/pdf:
              schema:
                type: string
                format: binary
        '400':
          $ref: '#/components/responses/BadRequest'
        '401':
          $ref: '#/components/responses/Unauthorized'
        '403':
          $ref: '#/components/responses/Forbidden'
        '404':
          $ref: '#/components/responses/NotFound'
        '413':
          $ref: '#/components/responses/PayloadTooLarge'
        '415':
          $ref: '#/components/responses/UnsupportedMediaType'
        '429':
          $ref: '#/components/responses/RateLimited'
        '500':
          $ref: '#/components/responses/InternalError'
        '503':
          $ref: '#/components/responses/ServiceUnavailable'
components:
  schemas:
    ErrorResponse:
      type: object
      required:
        - error
        - timestamp
      properties:
        error:
          type: object
          required:
            - code
            - message
            - requestId
          properties:
            code:
              type: string
              description: Machine-readable error code
              enum:
                - DOCUMENT_NOT_FOUND
                - DOCUMENT_FORBIDDEN
                - UNAUTHORIZED
                - INVALID_FORMAT
                - UNSUPPORTED_DOCUMENT_TYPE
                - RATE_LIMITED
                - DRIVE_API_ERROR
                - INTERNAL_ERROR
                - PAYLOAD_TOO_LARGE
              example: DOCUMENT_NOT_FOUND
            message:
              type: string
              description: Human-readable error message
              example: Document with ID '1BxAA_example123' does not exist or is not accessible
            details:
              type: object
              description: Optional additional context
              additionalProperties: true
            requestId:
              type: string
              format: uuid
              description: Request ID for support and debugging
              example: 550e8400-e29b-41d4-a716-446655440000
        timestamp:
          type: string
          format: date-time
          description: ISO 8601 timestamp when error occurred
          example: '2026-03-06T10:30:00.123Z'
  responses:
    BadRequest:
      description: Invalid request parameters
      headers:
        X-Request-Id:
          schema:
            type: string
            format: uuid
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ErrorResponse'
          example:
            error:
              code: INVALID_FORMAT
              message: "Invalid format 'docx'. Supported formats: markdown, html, pdf"
              requestId: 550e8400-e29b-41d4-a716-446655440000
            timestamp: '2026-03-06T10:30:00.123Z'
    Unauthorized:
      description: Authentication failed or missing
      headers:
        X-Request-Id:
          schema:
            type: string
            format: uuid
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ErrorResponse'
          example:
            error:
              code: UNAUTHORIZED
              message: Authentication with Google Drive failed
              requestId: 550e8400-e29b-41d4-a716-446655440001
            timestamp: '2026-03-06T10:30:01.456Z'
    Forbidden:
      description: User lacks permission to access the document
      headers:
        X-Request-Id:
          schema:
            type: string
            format: uuid
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ErrorResponse'
          example:
            error:
              code: DOCUMENT_FORBIDDEN
              message: You do not have permission to access this document
              requestId: 550e8400-e29b-41d4-a716-446655440002
            timestamp: '2026-03-06T10:30:02.789Z'
    NotFound:
      description: Document does not exist
      headers:
        X-Request-Id:
          schema:
            type: string
            format: uuid
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ErrorResponse'
          example:
            error:
              code: DOCUMENT_NOT_FOUND
              message: Document with ID '1BxAA_invalid' does not exist or is not accessible
              requestId: 550e8400-e29b-41d4-a716-446655440003
            timestamp: '2026-03-06T10:30:03.012Z'
    PayloadTooLarge:
      description: Document exceeds maximum size limit
      headers:
        X-Request-Id:
          schema:
            type: string
            format: uuid
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ErrorResponse'
          example:
            error:
              code: PAYLOAD_TOO_LARGE
              message: Document size exceeds maximum limit of 100MB
              requestId: 550e8400-e29b-41d4-a716-446655440004
            timestamp: '2026-03-06T10:30:04.345Z'
    UnsupportedMediaType:
      description: Document type cannot be exported in requested format
      headers:
        X-Request-Id:
          schema:
            type: string
            format: uuid
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ErrorResponse'
          example:
            error:
              code: UNSUPPORTED_DOCUMENT_TYPE
              message: Document type 'application/vnd.google-apps.form' cannot be exported as PDF
              requestId: 550e8400-e29b-41d4-a716-446655440005
            timestamp: '2026-03-06T10:30:05.678Z'
    RateLimited:
      description: Rate limit exceeded
      headers:
        X-Request-Id:
          schema:
            type: string
            format: uuid
        X-RateLimit-Limit:
          schema:
            type: integer
          description: Maximum requests per minute
          example: 100
        X-RateLimit-Remaining:
          schema:
            type: integer
          description: Remaining requests in current window
          example: 0
        X-RateLimit-Reset:
          schema:
            type: integer
          description: Unix timestamp when rate limit resets
          example: 1709724660
        Retry-After:
          schema:
            type: integer
          description: Seconds until rate limit resets
          example: 60
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ErrorResponse'
          example:
            error:
              code: RATE_LIMITED
              message: Rate limit exceeded. Please retry after 60 seconds
              requestId: 550e8400-e29b-41d4-a716-446655440006
            timestamp: '2026-03-06T10:30:06.901Z'
    InternalError:
      description: Internal server error
      headers:
        X-Request-Id:
          schema:
            type: string
            format: uuid
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ErrorResponse'
          example:
            error:
              code: INTERNAL_ERROR
              message: An unexpected error occurred while processing your request
              requestId: 550e8400-e29b-41d4-a716-446655440007
            timestamp: '2026-03-06T10:30:07.234Z'
    ServiceUnavailable:
      description: Service temporarily unavailable (Google Drive API down or rate limited)
      headers:
        X-Request-Id:
          schema:
            type: string
            format: uuid
        Retry-After:
          schema:
            type: integer
          description: Seconds until service may be available
          example: 300
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ErrorResponse'
          example:
            error:
              code: DRIVE_API_ERROR
              message: Google Drive API is temporarily unavailable. Please retry later
              requestId: 550e8400-e29b-41d4-a716-446655440008
            timestamp: '2026-03-06T10:30:08.567Z'

@@ -0,0 +1,436 @@
# API Contract: Sitemap Endpoint
**Feature**: 001-drive-proxy-adapter
**Date**: 2026-03-07
**Phase**: 1 - Design & Contracts
**Endpoint**: `GET /sitemap.xml`
## Overview
The `/sitemap.xml` endpoint returns an XML sitemap listing all Google Drive documents accessible to the Service Account. This is the only endpoint exposed by the adapter.
---
## Endpoint Definition
### URL
```
GET /sitemap.xml
```
### Authentication
- **Method**: None (endpoint is public)
- **Backend Authentication**: Service Account JWT to Google Drive API (transparent to client)
- **Credentials**: Loaded from `GOOGLE_SERVICE_ACCOUNT_KEY` environment variable
### Request
**Method**: `GET`
**Headers**:
- None required
**Query Parameters**:
- None supported
**Request Body**:
- None (GET request)
**Example Request**:
```http
GET /sitemap.xml HTTP/1.1
Host: adapter.example.com
User-Agent: Mozilla/5.0
```
---
## Response Specifications
### Success Response (200 OK)
**Status Code**: `200 OK`
**Headers**:
- `Content-Type: application/xml`
- `Content-Length: {size_in_bytes}`
**Body**: Valid XML sitemap conforming to sitemap protocol
**XML Schema**:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://adapter.example.com/documents/{documentId}</loc>
    <lastmod>2026-03-06T10:30:00.000Z</lastmod>
  </url>
  <!-- Additional <url> entries (up to 50,000) -->
</urlset>
```
**Field Descriptions**:
- `<urlset>`: Root element with sitemap namespace
- `<url>`: Individual URL entry (0 to 50,000 entries)
- `<loc>`: Absolute URL to document using RESTful format `/documents/{documentId}`
- `<lastmod>`: ISO 8601 timestamp of last document modification
**Constraints**:
- Maximum 50,000 `<url>` entries (sitemap protocol limit per spec.md FR-015)
- Maximum 50MB uncompressed (protocol limit, not enforced)
- All `<loc>` URLs use same base URL (configured via `BASE_URL` env var)
- All `<loc>` URLs use RESTful path format: `/documents/{documentId}`
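The constraints above can be sketched as a small generator. This is an illustrative Python sketch, not the adapter's actual implementation: the function name `build_sitemap`, the `(document_id, modified_time)` input shape, and raising `ValueError` above the limit are all assumptions made for the example.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # sitemap protocol limit


def build_sitemap(base_url, documents):
    """Render (document_id, modified_time) pairs as a sitemap XML string.

    Hypothetical helper: `base_url` plays the role of the BASE_URL setting.
    """
    if len(documents) > MAX_URLS:
        # Assumption: how the adapter reacts above the limit is not stated here.
        raise ValueError("sitemap protocol allows at most 50,000 URLs")
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        f'<urlset xmlns="{SITEMAP_NS}">',
    ]
    for document_id, modified_time in documents:
        # Percent-encode the ID first, then XML-escape the complete URL.
        loc = f"{base_url.rstrip('/')}/documents/{quote(document_id, safe='')}"
        lines.append("  <url>")
        lines.append(f"    <loc>{escape(loc)}</loc>")
        lines.append(f"    <lastmod>{escape(modified_time)}</lastmod>")
        lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"
```

Every `<loc>` shares one base URL because the base is passed in once and only the document ID varies per entry.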
**Example Response**:
```http
HTTP/1.1 200 OK
Content-Type: application/xml
Content-Length: 4582
```
**Performance Targets** (from spec.md success criteria):
- Response time: < 5 seconds for up to 10,000 documents
- Memory usage: < 256MB under normal load
- Concurrent requests: Support 10 concurrent requests without degradation
---
### Not Found Response (404)
**Status Code**: `404 Not Found`
**Headers**: None
**Body**: Empty (per spec.md clarification: "HTTP status code only, no error response body")
**When Returned**:
- Any path other than `/sitemap.xml` (per spec.md FR-007)
**Example Response**:
```http
HTTP/1.1 404 Not Found
```
---
### Unauthorized Response (401)
**Status Code**: `401 Unauthorized`
**Headers**: None
**Body**: Empty (per spec.md clarification: "HTTP status code only, no error response body")
**When Returned**:
- Service Account JWT authentication failed (per spec.md FR-010)
- OAuth token refresh failed
- Invalid Service Account credentials
**Example Response**:
```http
HTTP/1.1 401 Unauthorized
```
**Client Action**: Check Service Account credentials in `GOOGLE_SERVICE_ACCOUNT_KEY` environment variable
---
### Rate Limited Response (429)
**Status Code**: `429 Too Many Requests`
**Headers**:
- `Retry-After: {seconds}` (integer, seconds until retry allowed)
**Body**: Empty (per spec.md clarification: "HTTP status code only, no error response body")
**When Returned**:
- Google Drive API rate limit exceeded (per spec.md FR-013)
- Quota exhausted for Service Account
**Example Response**:
```http
HTTP/1.1 429 Too Many Requests
Retry-After: 60
```
**Client Action**: Wait `Retry-After` seconds before retrying request
**Retry-After Values**:
- Derived from Google Drive API `Retry-After` header if available
- Default: 60 seconds if not specified by Drive API
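A minimal sketch of that fallback rule, assuming the upstream Drive response headers are available as a plain mapping. The function name `retry_after_seconds` is hypothetical, and treating a non-integer value (such as an HTTP-date) as "not specified" is an assumption of this sketch.

```python
DEFAULT_RETRY_AFTER = 60  # seconds, used when Drive gives no usable value


def retry_after_seconds(upstream_headers):
    """Pick the Retry-After value to send to the client."""
    # Header names are case-insensitive, so normalise before the lookup.
    headers = {name.lower(): value for name, value in upstream_headers.items()}
    raw = headers.get("retry-after", "")
    try:
        seconds = int(raw.strip())
    except ValueError:
        # Missing, empty, or non-integer (e.g. HTTP-date) values use the default.
        return DEFAULT_RETRY_AFTER
    return seconds if seconds >= 0 else DEFAULT_RETRY_AFTER
```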
---
### Internal Server Error (500)
**Status Code**: `500 Internal Server Error`
**Headers**: None
**Body**: Empty (per spec.md clarification: "HTTP status code only, no error response body")
**When Returned**:
- Unexpected server error (per spec.md FR-008)
- Configuration error (missing environment variables)
- XML generation failure
**Example Response**:
```http
HTTP/1.1 500 Internal Server Error
```
**Client Action**: Report error to adapter administrator
**Server Logging**: All 500 errors logged with stack trace to stderr (per spec.md FR-012)
---
### Service Unavailable Response (503)
**Status Code**: `503 Service Unavailable`
**Headers**: None
**Body**: Empty (per spec.md clarification: "HTTP status code only, no error response body")
**When Returned**:
- Google Drive API unavailable (per spec.md FR-017)
- Drive API returns 503 status (no retries per spec clarification)
**Example Response**:
```http
HTTP/1.1 503 Service Unavailable
```
**Client Action**: Retry request later (Drive API temporarily unavailable)
**Retry Behavior**: Adapter does NOT retry Drive API 503 errors; immediately returns 503 to client (per spec.md FR-017 clarification)
---
## Error Handling Specification
### Error Response Format
**All error responses follow same pattern**:
- Status code indicates error type
- No response body (per spec.md clarification)
- Minimal headers (only `Retry-After` for 429)
**Rationale**: Simplicity, consistency, fail-fast approach
### Error Status Code Matrix
| Error Condition | Status Code | Headers | Body | Retry? |
|----------------|-------------|---------|------|--------|
| Authentication failed | 401 | None | Empty | No (fix credentials) |
| Rate limit exceeded | 429 | `Retry-After` | Empty | Yes (after delay) |
| Drive API unavailable | 503 | None | Empty | Yes (later) |
| Internal error | 500 | None | Empty | No (report to admin) |
| Path not found | 404 | None | Empty | No |
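The matrix translates directly into a lookup table. The sketch below is illustrative only: the condition keys and the `error_response` helper are hypothetical names, and mapping unknown conditions to 500 is an assumption.

```python
# condition -> (status code, whether a Retry-After header is sent)
ERROR_MATRIX = {
    "auth_failed": (401, False),
    "rate_limited": (429, True),
    "drive_unavailable": (503, False),
    "internal_error": (500, False),
    "path_not_found": (404, False),
}


def error_response(condition, retry_after=60):
    """Return (status, headers, body) for an error condition."""
    # Assumption: anything unrecognised is reported as an internal error.
    status, sends_retry_after = ERROR_MATRIX.get(
        condition, ERROR_MATRIX["internal_error"]
    )
    headers = {"Retry-After": str(retry_after)} if sends_retry_after else {}
    return status, headers, b""  # every error body is empty
```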
---
## Logging Specification
### Request Logging (stdout)
**All requests logged with**:
- Timestamp (ISO 8601)
- HTTP method and path
- Response status code
- Response time (milliseconds)
**Example**:
```
[2026-03-07T14:30:15.456Z] GET /sitemap.xml -> 200 (1234ms)
[2026-03-07T14:30:20.789Z] GET /sitemap.xml -> 429 (234ms)
[2026-03-07T14:30:25.012Z] GET /invalid.xml -> 404 (1ms)
```
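One way to produce lines in that shape is sketched below. The function name `format_request_log` is hypothetical, and the sketch assumes a timezone-aware `datetime` is passed in.

```python
from datetime import datetime, timezone


def format_request_log(when, method, path, status, elapsed_ms):
    """Format one request log line; `when` must be timezone-aware."""
    timestamp = (
        when.astimezone(timezone.utc)
        .isoformat(timespec="milliseconds")
        .replace("+00:00", "Z")
    )
    return f"[{timestamp}] {method} {path} -> {status} ({elapsed_ms}ms)"
```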
### Error Logging (stderr)
**All errors logged with**:
- Timestamp (ISO 8601)
- Request ID (for correlation)
- Error message
- Stack trace (for 500 errors)
**Example**:
```
[2026-03-07T14:30:20.789Z] [ERROR] Rate limit exceeded: Drive API quota exhausted
[2026-03-07T14:30:25.012Z] [ERROR] Authentication failed: Invalid Service Account key
[2026-03-07T14:30:30.345Z] [ERROR] Drive API unavailable: Connection timeout
```
---
## Contract Tests
### Test Scenarios
1. **Successful sitemap generation**
- Request: `GET /sitemap.xml`
- Expected: 200 status, valid XML, `Content-Type: application/xml`
2. **Not found for other paths**
- Request: `GET /invalid.xml`
- Expected: 404 status, empty body
3. **Rate limiting**
- Simulate Drive API 429 response
- Expected: 429 status, `Retry-After` header, empty body
4. **Authentication failure**
- Simulate invalid credentials
- Expected: 401 status, empty body
5. **Service unavailable**
- Simulate Drive API 503 response
- Expected: 503 status, empty body (no retries)
6. **XML schema validation**
- Request: `GET /sitemap.xml`
- Validate XML against sitemap protocol schema
7. **URL format validation**
- Request: `GET /sitemap.xml`
- Verify all `<loc>` URLs use `/documents/{documentId}` format
### Test Assertions
**XML Schema Validation**:
- Root element: `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">`
- Each `<url>` has required `<loc>` child
- Each `<lastmod>` is valid ISO 8601 timestamp
- Maximum 50,000 `<url>` entries
**URL Format Validation**:
- All `<loc>` URLs are absolute (start with http:// or https://)
- All `<loc>` URLs use RESTful format: `{baseUrl}/documents/{documentId}`
- Document IDs match regex: `^[a-zA-Z0-9_-]+$`
**Header Validation**:
- 200 responses include `Content-Type: application/xml`
- 429 responses include `Retry-After` header with integer value
- All error responses have empty body
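The XML and URL assertions could be checked with a helper along these lines. This is a sketch using only the Python standard library; `validate_sitemap` is a hypothetical name, and it checks structure only rather than validating against the official XSD.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
DOC_ID = re.compile(r"[a-zA-Z0-9_-]+")


def validate_sitemap(xml_text, base_url):
    """Return a list of problems; an empty list means every check passed."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    if root.tag != f"{{{NS}}}urlset":
        return [f"unexpected root element: {root.tag}"]
    problems = []
    urls = root.findall(f"{{{NS}}}url")
    if len(urls) > 50_000:
        problems.append("more than 50,000 <url> entries")
    prefix = base_url.rstrip("/") + "/documents/"
    for url in urls:
        loc = url.findtext(f"{{{NS}}}loc")
        if loc is None:
            problems.append("<url> without <loc>")
        elif not loc.startswith(("http://", "https://")):
            problems.append(f"<loc> is not absolute: {loc}")
        elif not (loc.startswith(prefix) and DOC_ID.fullmatch(loc[len(prefix):])):
            problems.append(f"<loc> does not match /documents/{{documentId}}: {loc}")
        lastmod = url.findtext(f"{{{NS}}}lastmod")
        if lastmod is not None:
            try:
                # fromisoformat on older Pythons does not accept a trailing "Z".
                datetime.fromisoformat(lastmod.replace("Z", "+00:00"))
            except ValueError:
                problems.append(f"<lastmod> is not ISO 8601: {lastmod}")
    return problems
```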
---
## Configuration
### Environment Variables
| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `GOOGLE_SERVICE_ACCOUNT_KEY` | Yes | None | Inline JSON of Service Account key file |
| `BASE_URL` | Yes | None | Base URL for sitemap links (e.g., `https://adapter.example.com`) |
| `PORT` | No | 3000 | HTTP server port |
**Example .env**:
```bash
GOOGLE_SERVICE_ACCOUNT_KEY='{"type":"service_account","project_id":"...","private_key":"-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n","client_email":"...@developer.gserviceaccount.com",...}'
BASE_URL=https://adapter.example.com
PORT=3000
```
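A minimal sketch of reading these variables, assuming the environment is passed in as a mapping (for example `os.environ`). The `load_config` name and the returned dictionary keys are hypothetical; failing fast on missing required variables is an assumption consistent with the 500 "configuration error" case described earlier.

```python
import json

REQUIRED = ("GOOGLE_SERVICE_ACCOUNT_KEY", "BASE_URL")


def load_config(env):
    """Build a settings dict from an environment mapping."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        # Report only variable names, never credential contents.
        raise RuntimeError(
            "missing required environment variables: " + ", ".join(missing)
        )
    return {
        # The key is supplied as inline JSON, so it is parsed rather than read from disk.
        "service_account": json.loads(env["GOOGLE_SERVICE_ACCOUNT_KEY"]),
        "base_url": env["BASE_URL"].rstrip("/"),
        "port": int(env.get("PORT", "3000")),
    }
```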
---
## Compatibility
### Sitemap Protocol Compliance
**Protocol**: https://www.sitemaps.org/protocol.html
**Compliance**:
- ✅ Valid XML with namespace
- ✅ `<loc>` with absolute URLs
- ✅ `<lastmod>` with W3C Datetime format (ISO 8601)
- ✅ Maximum 50,000 URLs
- ✅ Maximum 50MB uncompressed size (protocol limit; the adapter does not enforce it)
**Optional Elements Not Used**:
- `<changefreq>`: Not applicable (no historical change data)
- `<priority>`: Not applicable (all documents equal priority)
### HTTP Compliance
**HTTP Version**: HTTP/1.1
**Methods Supported**: `GET` only
**Status Codes Used**: 200, 401, 404, 429, 500, 503
**Headers Used**:
- Response: `Content-Type`, `Content-Length`, `Retry-After`
- Request: Standard HTTP headers accepted, none required
---
## Security Considerations
### Authentication
- Service Account credentials secured in environment variable (not in code or config files)
- Credentials never logged or exposed in error messages
- Read-only Drive scope (`drive.readonly`) - no write permissions
### Rate Limiting
- Transparent propagation of Drive API rate limits to client
- No internal rate limiting (rely on Drive API limits)
### Input Validation
- Path validation: Only `/sitemap.xml` accepted
- Method validation: Only `GET` accepted
- No query parameters processed (rejection not required, just ignored)
### Output Sanitization
- All URLs XML-escaped to prevent injection
- All timestamps XML-escaped (though ISO 8601 format doesn't contain XML special chars)
---
## Versioning
**Current Version**: 1.0.0 (initial implementation)
**Future Changes**:
- Breaking changes (new required parameters): Major version bump (2.0.0)
- Backward-compatible additions (query parameters): Minor version bump (1.1.0)
- Bug fixes: Patch version bump (1.0.1)
**Deprecation Policy**:
- Breaking changes include migration guide
- Deprecated features supported for at least one minor version
---
## References
- Feature Specification: `/specs/001-drive-proxy-adapter/spec.md`
- Data Model: `/specs/001-drive-proxy-adapter/data-model.md`
- Research Document: `/specs/001-drive-proxy-adapter/research.md`
- Sitemap Protocol: https://www.sitemaps.org/protocol.html
- Google Drive API v3: https://developers.google.com/drive/api/v3/reference

@@ -0,0 +1,382 @@
# API Contract: Sitemap XML Endpoint
**Feature**: 001-drive-proxy-adapter
**Contract Type**: HTTP API
**Endpoint**: `/sitemap.xml`
**Version**: 1.0.0
**Date**: 2026-03-07
---
## Endpoint Specification
### `GET /sitemap.xml`
Generate an XML sitemap of all accessible Google Drive documents.
---
## Request
### HTTP Method
`GET`
### URL
`/sitemap.xml`
### Query Parameters
None
### Request Headers
None required
### Request Body
None (GET request)
---
## Response
### Success Response (200 OK)
**Status Code**: `200 OK`
**Response Headers**:
```
Content-Type: application/xml; charset=utf-8
Content-Length: {size_in_bytes}
```
**Response Body** (XML):
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/documents/{documentId1}</loc>
    <lastmod>2026-03-07</lastmod>
  </url>
  <url>
    <loc>http://example.com/documents/{documentId2}</loc>
    <lastmod>2026-03-06</lastmod>
  </url>
  <!-- ... up to 50,000 entries -->
</urlset>
```
**XML Schema Requirements**:
- Root element: `<urlset>` with namespace `xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"`
- Each document: `<url>` element containing:
  - `<loc>` (REQUIRED): Absolute URL in format `{baseUrl}/documents/{documentId}`
    - Must be URL-encoded
    - Must escape XML special characters: `&` → `&amp;`, `<` → `&lt;`, `>` → `&gt;`, `"` → `&quot;`, `'` → `&apos;`
  - `<lastmod>` (OPTIONAL): ISO 8601 date format
    - Format: `YYYY-MM-DD` or `YYYY-MM-DDTHH:MM:SS+00:00`
    - Omitted if Drive API provides no `modifiedTime`
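The optional `<lastmod>` rule might look like the following sketch. It assumes `modifiedTime` arrives as an RFC 3339 string (for example `2026-03-07T09:15:00.000Z`) and chooses the `YYYY-MM-DD` form; the helper name `lastmod_element` is hypothetical.

```python
from datetime import datetime, timezone


def lastmod_element(modified_time):
    """Return a <lastmod> element, or '' when no modifiedTime is available."""
    if not modified_time:
        return ""  # the tag is omitted entirely, not emitted empty
    # fromisoformat on older Pythons does not accept a trailing "Z".
    parsed = datetime.fromisoformat(modified_time.replace("Z", "+00:00"))
    day = parsed.astimezone(timezone.utc).date().isoformat()
    return f"<lastmod>{day}</lastmod>"
```

Converting to UTC before taking the date keeps the output stable regardless of the offset in the source timestamp.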
**Empty Drive Response** (0 documents):
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
</urlset>
```
**Constraints**:
- Maximum 50,000 `<url>` entries (sitemap protocol limit)
- If >50,000 documents exist, return 413 error instead
---
### Error Responses
#### 404 Not Found
**Trigger**: Request to any endpoint other than `/sitemap.xml`
**Status Code**: `404 Not Found`
**Response Headers**: None
**Response Body**: Empty (no content)
**Example**:
```
GET /documents/abc123 → 404 Not Found (empty body)
GET /api/sitemap → 404 Not Found (empty body)
POST /sitemap.xml → 404 Not Found (empty body)
```
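The routing rule behind these examples is a single exact match on method and path. A sketch, with the hypothetical helper `is_sitemap_request`; ignoring any query string before matching is an assumption of this sketch.

```python
from urllib.parse import urlsplit


def is_sitemap_request(method, target):
    """True only for GET /sitemap.xml; everything else is answered with 404."""
    path = urlsplit(target).path  # drop any query string before matching
    return method == "GET" and path == "/sitemap.xml"
```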
---
#### 413 Payload Too Large
**Trigger**: Google Drive contains more than 50,000 documents
**Status Code**: `413 Payload Too Large`
**Response Headers**: None
**Response Body**: Empty (no content)
**Rationale**: Sitemap protocol limits sitemaps to 50,000 URLs. This error prevents oversized sitemap generation.
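One way to apply the limit without listing an entire oversized Drive is to stop paging as soon as the count passes 50,000. The sketch below assumes documents arrive as an iterable of pages (lists); `collect_documents` is a hypothetical name, and returning `None` as the "respond with 413" signal is a choice made for the example.

```python
MAX_URLS = 50_000  # sitemap protocol limit


def collect_documents(pages, limit=MAX_URLS):
    """Gather documents page by page; return None once the limit is exceeded."""
    documents = []
    for page in pages:
        documents.extend(page)
        if len(documents) > limit:
            return None  # caller responds with 413 and an empty body
    return documents
```

Because `pages` can be a lazy generator, no further pages are requested after the limit is crossed.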
---
#### 429 Too Many Requests
**Trigger**: Google Drive API returns rate limit error
**Status Code**: `429 Too Many Requests`
**Response Headers**:
```
Retry-After: {seconds}
```
**Response Body**: Empty (no content)
**Example**:
```
HTTP/1.1 429 Too Many Requests
Retry-After: 60
(empty body)
```
**Rationale**: Client should retry after the specified number of seconds.
---
#### 401 Unauthorized
**Trigger**: Service Account token refresh failed
**Status Code**: `401 Unauthorized`
**Response Headers**: None
**Response Body**: Empty (no content)
**Rationale**: Authentication failed. Check Service Account credentials configuration.
---
#### 503 Service Unavailable
**Trigger**: Google Drive API returns 503 error
**Status Code**: `503 Service Unavailable`
**Response Headers**: None
**Response Body**: Empty (no content)
**Behavior**: No retries - immediately pass through 503 to client per specification.
---
#### 500 Internal Server Error
**Trigger**: Unexpected error during sitemap generation
**Status Code**: `500 Internal Server Error`
**Response Headers**: None
**Response Body**: Empty (no content)
**Rationale**: Unexpected server error. Check logs for details.
---
## Examples
### Example 1: Successful Sitemap (3 documents)
**Request**:
```http
GET /sitemap.xml HTTP/1.1
Host: example.com
```
**Response**:
```http
HTTP/1.1 200 OK
Content-Type: application/xml; charset=utf-8
Content-Length: 512
```
---
### Example 2: Empty Drive
**Request**:
```http
GET /sitemap.xml HTTP/1.1
Host: example.com
```
**Response**:
```http
HTTP/1.1 200 OK
Content-Type: application/xml; charset=utf-8
Content-Length: 123
```
---
### Example 3: Rate Limit Exceeded
**Request**:
```http
GET /sitemap.xml HTTP/1.1
Host: example.com
```
**Response**:
```http
HTTP/1.1 429 Too Many Requests
Retry-After: 120
```
---
### Example 4: Too Many Documents
**Request**:
```http
GET /sitemap.xml HTTP/1.1
Host: example.com
```
**Response**:
```http
HTTP/1.1 413 Payload Too Large
```
---
### Example 5: Invalid Endpoint
**Request**:
```http
GET /documents/abc123 HTTP/1.1
Host: example.com
```
**Response**:
```http
HTTP/1.1 404 Not Found
```
---
## Contract Validation
### XML Schema Validation
The sitemap XML MUST validate against the sitemap protocol schema:
- **Namespace**: `http://www.sitemaps.org/schemas/sitemap/0.9`
- **Root element**: `<urlset>`
- **Child elements**: Zero or more `<url>` elements
- **URL elements**: Each contains `<loc>` (required) and `<lastmod>` (optional)
**Validation Tools**:
- XML parser (ensure well-formed XML)
- Sitemap validator: [https://www.xml-sitemaps.com/validate-xml-sitemap.html](https://www.xml-sitemaps.com/validate-xml-sitemap.html)
- XSD schema validation against official sitemap schema
---
### Contract Testing Requirements
All contract tests MUST verify:
1. **Success Path**:
- Response status 200
- Content-Type header is `application/xml; charset=utf-8`
- Response body is valid XML
- XML contains correct namespace
- All `<loc>` URLs are absolute and properly formatted
- All `<loc>` URLs follow pattern: `{baseUrl}/documents/{documentId}`
- All `<lastmod>` dates are valid ISO 8601 format (if present)
2. **Error Handling**:
- Invalid endpoints return 404 with empty body
- >50k documents returns 413 with empty body
- Rate limiting returns 429 with `Retry-After` header and empty body
- Drive API 503 returns 503 with empty body (no retries)
- All error responses have no `Content-Type` header
- All error responses have empty body
3. **Edge Cases**:
- Empty Drive (0 documents) returns valid sitemap with no `<url>` entries
- Documents without `modifiedTime` omit `<lastmod>` tag
- Special characters in document IDs are properly URL-encoded
- XML special characters in URLs are properly escaped
---
## Breaking Changes
Changes that constitute breaking changes (require MAJOR version bump):
1. Changing URL format from `/documents/{id}` to different format
2. Changing XML namespace or root element structure
3. Removing `<lastmod>` field entirely
4. Changing error response status codes
5. Adding required query parameters
6. Changing response Content-Type
---
## References
- [Sitemap Protocol Specification](https://www.sitemaps.org/protocol.html)
- [Google Sitemap Guidelines](https://developers.google.com/search/docs/crawling-indexing/sitemaps/build-sitemap)
- [XML Specification](https://www.w3.org/TR/xml/)
- [ISO 8601 Date Format](https://en.wikipedia.org/wiki/ISO_8601)
---
## Version History
| Version | Date | Changes |
|---------|------|---------|
| 1.0.0 | 2026-03-07 | Initial contract specification |
---
## Summary
This contract defines the complete API specification for the `/sitemap.xml` endpoint, including:
1. **Request/response formats** with examples
2. **Error handling** with all status codes (404, 413, 429, 401, 503, 500)
3. **XML schema requirements** for sitemap format
4. **Validation criteria** for contract testing
5. **Breaking change policy** for version management
All error responses follow the spec requirement: **status code only, no response body**. The 429 response also has an empty body and additionally carries a `Retry-After` header.