API Contract: Sitemap Endpoint

Feature: 001-drive-proxy-adapter
Date: 2026-03-07
Phase: 1 - Design & Contracts
Endpoint: GET /sitemap.xml

Overview

The /sitemap.xml endpoint returns an XML sitemap listing all Google Drive documents accessible to the Service Account. This is the only endpoint exposed by the adapter.


Endpoint Definition

URL

GET /sitemap.xml

Authentication

  • Method: None (endpoint is public)
  • Backend Authentication: Service Account JWT to Google Drive API (transparent to client)
  • Credentials: Loaded from GOOGLE_SERVICE_ACCOUNT_KEY environment variable

Request

Method: GET

Headers:

  • None required

Query Parameters:

  • None supported

Request Body:

  • None (GET request)

Example Request:

GET /sitemap.xml HTTP/1.1
Host: adapter.example.com
User-Agent: Mozilla/5.0

Response Specifications

Success Response (200 OK)

Status Code: 200 OK

Headers:

  • Content-Type: application/xml
  • Content-Length: {size_in_bytes}

Body: Valid XML sitemap conforming to sitemap protocol

XML Structure:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://adapter.example.com/documents/{documentId}</loc>
    <lastmod>2026-03-06T10:30:00.000Z</lastmod>
  </url>
  <!-- Additional <url> entries (up to 50,000) -->
</urlset>

Field Descriptions:

  • <urlset>: Root element with sitemap namespace
  • <url>: Individual URL entry (0 to 50,000 entries)
  • <loc>: Absolute URL to document using RESTful format /documents/{documentId}
  • <lastmod>: ISO 8601 timestamp of last document modification

Constraints:

  • Maximum 50,000 <url> entries (sitemap protocol limit per spec.md FR-015)
  • Maximum 50 MB uncompressed (sitemap protocol limit; the adapter does not enforce it)
  • All <loc> URLs use the same base URL (configured via the BASE_URL env var)
  • All <loc> URLs use RESTful path format: /documents/{documentId}

Example Response:

HTTP/1.1 200 OK
Content-Type: application/xml
Content-Length: 4582

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://adapter.example.com/documents/1BxAA_example123</loc>
    <lastmod>2026-03-06T10:30:00.000Z</lastmod>
  </url>
  <url>
    <loc>https://adapter.example.com/documents/1CyBB_example456</loc>
    <lastmod>2026-03-05T14:20:00.000Z</lastmod>
  </url>
  <url>
    <loc>https://adapter.example.com/documents/1DzCC_example789</loc>
    <lastmod>2026-03-04T08:15:00.000Z</lastmod>
  </url>
</urlset>
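The response body above could be produced along the following lines. This is a minimal sketch, not the adapter's actual implementation: the function names, the (document_id, modified_time) input shape, and the choice to truncate at 50,000 entries are assumptions made for illustration. The contract states only the limit, not how excess documents are handled.

```python
from datetime import datetime, timezone
from typing import Sequence, Tuple
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # sitemap protocol limit on <url> entries


def format_lastmod(dt: datetime) -> str:
    """Render an aware datetime in the form 2026-03-06T10:30:00.000Z."""
    utc = dt.astimezone(timezone.utc)
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def build_sitemap(base_url: str, documents: Sequence[Tuple[str, datetime]]) -> bytes:
    """Build sitemap XML from (document_id, modified_time) pairs.

    Assumption: entries beyond MAX_URLS are silently dropped.
    """
    ET.register_namespace("", SITEMAP_NS)  # emit xmlns="..." without a prefix
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    prefix = base_url.rstrip("/") + "/documents/"
    for doc_id, modified in documents[:MAX_URLS]:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # ElementTree escapes XML special characters in text on serialization.
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = prefix + doc_id
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = format_lastmod(modified)
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)
```

The returned bytes can be sent as the 200 response body, with Content-Length set to their length.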

Performance Targets (from spec.md success criteria):

  • Response time: < 5 seconds for up to 10,000 documents
  • Memory usage: < 256MB under normal load
  • Concurrent requests: Support 10 concurrent requests without degradation

Not Found Response (404)

Status Code: 404 Not Found

Headers: None

Body: Empty (per spec.md clarification: "HTTP status code only, no error response body")

When Returned:

  • Any path other than /sitemap.xml (per spec.md FR-007)

Example Response:

HTTP/1.1 404 Not Found


Unauthorized Response (401)

Status Code: 401 Unauthorized

Headers: None

Body: Empty (per spec.md clarification: "HTTP status code only, no error response body")

When Returned:

  • Service Account JWT authentication failed (per spec.md FR-010)
  • OAuth token refresh failed
  • Invalid Service Account credentials

Example Response:

HTTP/1.1 401 Unauthorized

Client Action: Check Service Account credentials in GOOGLE_SERVICE_ACCOUNT_KEY environment variable


Rate Limited Response (429)

Status Code: 429 Too Many Requests

Headers:

  • Retry-After: {seconds} (integer, seconds until retry allowed)

Body: Empty (per spec.md clarification: "HTTP status code only, no error response body")

When Returned:

  • Google Drive API rate limit exceeded (per spec.md FR-013)
  • Quota exhausted for Service Account

Example Response:

HTTP/1.1 429 Too Many Requests
Retry-After: 60

Client Action: Wait Retry-After seconds before retrying request

Retry-After Values:

  • Derived from Google Drive API Retry-After header if available
  • Default: 60 seconds if not specified by Drive API
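The two rules above can be sketched as a small helper. The function name is hypothetical, and falling back to the default when the upstream value is not a plain integer (for example, the HTTP-date form of Retry-After) is an assumption; the contract does not say how such values are handled.

```python
from typing import Optional

DEFAULT_RETRY_AFTER_SECONDS = 60  # default stated in this contract


def resolve_retry_after(upstream_value: Optional[str]) -> int:
    """Return the Retry-After value (in seconds) to send to the client.

    upstream_value is the raw Retry-After header from the Drive API
    response, or None if the header was absent.
    """
    if upstream_value is None:
        return DEFAULT_RETRY_AFTER_SECONDS
    value = upstream_value.strip()
    # Accept only non-negative ASCII integers (delay-seconds form).
    if value.isascii() and value.isdigit():
        return int(value)
    # Assumption: anything else falls back to the default.
    return DEFAULT_RETRY_AFTER_SECONDS
```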

Internal Server Error (500)

Status Code: 500 Internal Server Error

Headers: None

Body: Empty (per spec.md clarification: "HTTP status code only, no error response body")

When Returned:

  • Unexpected server error (per spec.md FR-008)
  • Configuration error (missing environment variables)
  • XML generation failure

Example Response:

HTTP/1.1 500 Internal Server Error

Client Action: Report error to adapter administrator

Server Logging: All 500 errors logged with stack trace to stderr (per spec.md FR-012)


Service Unavailable Response (503)

Status Code: 503 Service Unavailable

Headers: None

Body: Empty (per spec.md clarification: "HTTP status code only, no error response body")

When Returned:

  • Google Drive API unavailable (per spec.md FR-017)
  • Drive API returns 503 status (no retries per spec clarification)

Example Response:

HTTP/1.1 503 Service Unavailable

Client Action: Retry request later (Drive API temporarily unavailable)

Retry Behavior: Adapter does NOT retry Drive API 503 errors; immediately returns 503 to client (per spec.md FR-017 clarification)


Error Handling Specification

Error Response Format

All error responses follow the same pattern:

  • Status code indicates error type
  • No response body (per spec.md clarification)
  • Minimal headers (only Retry-After for 429)

Rationale: Simplicity, consistency, fail-fast approach

Error Status Code Matrix

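| Error Condition       | Status Code | Headers     | Body  | Retry?               |
|-----------------------|-------------|-------------|-------|----------------------|
| Authentication failed | 401         | None        | Empty | No (fix credentials) |
| Rate limit exceeded   | 429         | Retry-After | Empty | Yes (after delay)    |
| Drive API unavailable | 503         | None        | Empty | Yes (later)          |
| Internal error        | 500         | None        | Empty | No (report to admin) |
| Path not found        | 404         | None        | Empty | No                   |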
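One way to keep an implementation aligned with this matrix is to encode it as a lookup table. The sketch below is illustrative only; the ErrorCondition and ErrorResponse names are hypothetical and not taken from the adapter's code.

```python
from enum import Enum
from typing import Dict, NamedTuple


class ErrorCondition(Enum):
    AUTH_FAILED = "authentication failed"
    RATE_LIMITED = "rate limit exceeded"
    DRIVE_UNAVAILABLE = "drive api unavailable"
    INTERNAL = "internal error"
    NOT_FOUND = "path not found"


class ErrorResponse(NamedTuple):
    status: int
    headers: Dict[str, str]
    body: bytes


# Status codes exactly as listed in the matrix.
STATUS_CODES = {
    ErrorCondition.AUTH_FAILED: 401,
    ErrorCondition.RATE_LIMITED: 429,
    ErrorCondition.DRIVE_UNAVAILABLE: 503,
    ErrorCondition.INTERNAL: 500,
    ErrorCondition.NOT_FOUND: 404,
}


def error_response(condition: ErrorCondition, retry_after: int = 60) -> ErrorResponse:
    """Build the response for an error condition; the body is always empty."""
    headers: Dict[str, str] = {}
    if condition is ErrorCondition.RATE_LIMITED:
        # Retry-After is the only header any error response carries.
        headers["Retry-After"] = str(retry_after)
    return ErrorResponse(STATUS_CODES[condition], headers, b"")
```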

Logging Specification

Request Logging (stdout)

All requests logged with:

  • Timestamp (ISO 8601)
  • HTTP method and path
  • Response status code
  • Response time (milliseconds)

Example:

[2026-03-07T14:30:15.456Z] GET /sitemap.xml -> 200 (1234ms)
[2026-03-07T14:30:20.789Z] GET /sitemap.xml -> 429 (234ms)
[2026-03-07T14:30:25.012Z] GET /invalid.xml -> 404 (1ms)
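A formatter that reproduces the line layout shown in the example might look like this. It is a sketch under the assumption that timestamps are logged in UTC with millisecond precision, as the examples suggest; the function name and parameters are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def format_request_log(method: str, path: str, status: int,
                       elapsed_ms: int, now: Optional[datetime] = None) -> str:
    """Format one request log line: [timestamp] METHOD path -> status (Nms)."""
    moment = now if now is not None else datetime.now(timezone.utc)
    timestamp = (
        moment.astimezone(timezone.utc)
        .isoformat(timespec="milliseconds")
        .replace("+00:00", "Z")
    )
    return f"[{timestamp}] {method} {path} -> {status} ({elapsed_ms}ms)"
```

Passing `now` explicitly keeps the function deterministic in tests; in a server it would be omitted and the line written to stdout.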

Error Logging (stderr)

All errors logged with:

  • Timestamp (ISO 8601)
  • Request ID (for correlation)
  • Error message
  • Stack trace (for 500 errors)

Example:

[2026-03-07T14:30:20.789Z] [ERROR] Rate limit exceeded: Drive API quota exhausted
[2026-03-07T14:30:25.012Z] [ERROR] Authentication failed: Invalid Service Account key
[2026-03-07T14:30:30.345Z] [ERROR] Drive API unavailable: Connection timeout

Contract Tests

Test Scenarios

  1. Successful sitemap generation

    • Request: GET /sitemap.xml
    • Expected: 200 status, valid XML, Content-Type: application/xml
  2. Not found for other paths

    • Request: GET /invalid.xml
    • Expected: 404 status, empty body
  3. Rate limiting

    • Simulate Drive API 429 response
    • Expected: 429 status, Retry-After header, empty body
  4. Authentication failure

    • Simulate invalid credentials
    • Expected: 401 status, empty body
  5. Service unavailable

    • Simulate Drive API 503 response
    • Expected: 503 status, empty body (no retries)
  6. XML schema validation

    • Request: GET /sitemap.xml
    • Validate XML against sitemap protocol schema
  7. URL format validation

    • Request: GET /sitemap.xml
    • Verify all <loc> URLs use /documents/{documentId} format

Test Assertions

XML Schema Validation:

  • Root element: <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  • Each <url> has required <loc> child
  • Each <lastmod> is valid ISO 8601 timestamp
  • Maximum 50,000 <url> entries

URL Format Validation:

  • All <loc> URLs are absolute (start with http:// or https://)
  • All <loc> URLs use RESTful format: {baseUrl}/documents/{documentId}
  • Document IDs match regex: ^[a-zA-Z0-9_-]+$

Header Validation:

  • 200 responses include Content-Type: application/xml
  • 429 responses include Retry-After header with integer value
  • All error responses have empty body
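The XML and URL assertions above could be collected into a single checker for use in contract tests. This is a sketch only: the function name and the shape of the returned problem list are assumptions, and the timestamp check uses Python's `datetime.fromisoformat`, which accepts a subset of ISO 8601 rather than the full standard.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime
from typing import List

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
DOC_ID_RE = re.compile(r"^[a-zA-Z0-9_-]+$")  # regex from the assertions above
MAX_URLS = 50_000


def validate_sitemap(xml_bytes: bytes, base_url: str) -> List[str]:
    """Return a list of contract violations; an empty list means valid."""
    root = ET.fromstring(xml_bytes)
    if root.tag != f"{{{NS}}}urlset":
        return [f"unexpected root element: {root.tag}"]
    problems: List[str] = []
    urls = root.findall(f"{{{NS}}}url")
    if len(urls) > MAX_URLS:
        problems.append(f"too many <url> entries: {len(urls)}")
    prefix = base_url.rstrip("/") + "/documents/"
    for i, url in enumerate(urls):
        loc = url.findtext(f"{{{NS}}}loc")
        if not loc:
            problems.append(f"url[{i}]: missing <loc>")
            continue
        if not loc.startswith(("http://", "https://")):
            problems.append(f"url[{i}]: <loc> is not absolute")
        elif not loc.startswith(prefix) or not DOC_ID_RE.fullmatch(loc[len(prefix):]):
            problems.append(f"url[{i}]: <loc> is not {prefix}{{documentId}}")
        lastmod = url.findtext(f"{{{NS}}}lastmod")
        if lastmod is not None:
            try:
                # fromisoformat on older Pythons rejects a trailing "Z".
                datetime.fromisoformat(lastmod.replace("Z", "+00:00"))
            except ValueError:
                problems.append(f"url[{i}]: invalid <lastmod>: {lastmod}")
    return problems
```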

Configuration

Environment Variables

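| Variable                   | Required | Default | Description                                                    |
|----------------------------|----------|---------|----------------------------------------------------------------|
| GOOGLE_SERVICE_ACCOUNT_KEY | Yes      | None    | Inline JSON of Service Account key file                        |
| BASE_URL                   | Yes      | None    | Base URL for sitemap links (e.g., https://adapter.example.com) |
| PORT                       | No       | 3000    | HTTP server port                                               |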

Example .env:

GOOGLE_SERVICE_ACCOUNT_KEY='{"type":"service_account","project_id":"...","private_key":"-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n","client_email":"...@developer.gserviceaccount.com",...}'
BASE_URL=https://adapter.example.com
PORT=3000
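Loading and validating these variables at startup could be sketched as follows. The Config and ConfigError names are hypothetical. Per this contract, a configuration error surfaces to clients as a 500; the error messages here deliberately name only the variable, never its value, so credentials are not exposed.

```python
import json
from typing import Mapping, NamedTuple


class ConfigError(Exception):
    """Raised when required configuration is missing or malformed."""


class Config(NamedTuple):
    service_account_key: dict
    base_url: str
    port: int


def load_config(env: Mapping[str, str]) -> Config:
    """Read adapter settings from an environment-like mapping."""
    required = ("GOOGLE_SERVICE_ACCOUNT_KEY", "BASE_URL")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ConfigError("missing required variables: " + ", ".join(missing))
    try:
        key = json.loads(env["GOOGLE_SERVICE_ACCOUNT_KEY"])
    except json.JSONDecodeError as exc:
        # Do not echo the raw value: it contains the private key.
        raise ConfigError("GOOGLE_SERVICE_ACCOUNT_KEY is not valid JSON") from exc
    try:
        port = int(env.get("PORT", "3000"))  # default from the table above
    except ValueError as exc:
        raise ConfigError("PORT must be an integer") from exc
    return Config(key, env["BASE_URL"].rstrip("/"), port)
```

In a running process this would typically be called as `load_config(os.environ)`.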

Compatibility

Sitemap Protocol Compliance

Protocol: https://www.sitemaps.org/protocol.html

Compliance:

  • Valid XML with namespace
  • <loc> with absolute URLs
  • <lastmod> with W3C Datetime format (ISO 8601)
  • Maximum 50,000 URLs
  • Maximum 50MB uncompressed size

Optional Elements Not Used:

  • <changefreq>: Not applicable (no historical change data)
  • <priority>: Not applicable (all documents equal priority)

HTTP Compliance

HTTP Version: HTTP/1.1

Methods Supported: GET only

Status Codes Used: 200, 401, 404, 429, 500, 503

Headers Used:

  • Response: Content-Type, Content-Length, Retry-After
  • Request: Standard HTTP headers accepted, none required

Security Considerations

Authentication

  • Service Account credentials secured in environment variable (not in code or config files)
  • Credentials never logged or exposed in error messages
  • Read-only Drive scope (drive.readonly) - no write permissions

Rate Limiting

  • Transparent propagation of Drive API rate limits to client
  • No internal rate limiting (rely on Drive API limits)

Input Validation

  • Path validation: Only /sitemap.xml accepted
  • Method validation: Only GET accepted
  • Query parameters are not processed; requests that include them are not rejected, and the parameters are ignored
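The validation rules above amount to a very small routing decision, sketched below. The function name is hypothetical. The contract does not state which status a non-GET request to /sitemap.xml receives; this sketch assumes 404, since 405 is not among the status codes this contract lists.

```python
from urllib.parse import urlsplit


def route_status(method: str, request_target: str) -> int:
    """Return 200 if the request should be served, else 404.

    Query parameters are dropped before matching, so they are ignored
    rather than rejected.
    """
    path = urlsplit(request_target).path
    if method == "GET" and path == "/sitemap.xml":
        return 200
    # Assumption: unsupported methods are treated like unknown paths.
    return 404
```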

Output Sanitization

  • All URLs XML-escaped to prevent injection
  • All timestamps XML-escaped (though ISO 8601 format doesn't contain XML special chars)
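As an illustration of the escaping step, the standard library's `xml.sax.saxutils.escape` replaces `&`, `<`, and `>` with their entities. The helper name below is hypothetical. Because document IDs matching the contract's regex contain no XML special characters, escaping matters mainly for the configured BASE_URL.

```python
from xml.sax.saxutils import escape


def loc_element(url: str) -> str:
    """Wrap a URL in a <loc> element, escaping XML special characters."""
    return f"<loc>{escape(url)}</loc>"
```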

Versioning

Current Version: 1.0.0 (initial implementation)

Future Changes:

  • Breaking changes (new required parameters): Major version bump (2.0.0)
  • Backward-compatible additions (query parameters): Minor version bump (1.1.0)
  • Bug fixes: Patch version bump (1.0.1)

Deprecation Policy:

  • Breaking changes include migration guide
  • Deprecated features supported for at least one minor version

References