API Contract: Sitemap XML Endpoint

Feature: 001-drive-proxy-adapter
Contract Type: HTTP API
Endpoint: /sitemap.xml
Version: 1.0.0
Date: 2026-03-07


Endpoint Specification

GET /sitemap.xml

Generate an XML sitemap of all accessible Google Drive documents.


Request

HTTP Method

GET

URL

/sitemap.xml

Query Parameters

None

Request Headers

None required

Request Body

None (GET request)


Response

Success Response (200 OK)

Status Code: 200 OK

Response Headers:

Content-Type: application/xml; charset=utf-8
Content-Length: {size_in_bytes}

Response Body (XML):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/documents/{documentId1}</loc>
    <lastmod>2026-03-07</lastmod>
  </url>
  <url>
    <loc>http://example.com/documents/{documentId2}</loc>
    <lastmod>2026-03-06</lastmod>
  </url>
  <!-- ... up to 50,000 entries -->
</urlset>

XML Schema Requirements:

  • Root element: <urlset> with namespace xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
  • Each document: <url> element containing:
    • <loc> (REQUIRED): Absolute URL in format {baseUrl}/documents/{documentId}
      • Must be URL-encoded
      • Must escape XML special characters: & → &amp;, < → &lt;, > → &gt;, " → &quot;, ' → &apos;
    • <lastmod> (OPTIONAL): ISO 8601 date format
      • Format: YYYY-MM-DD or YYYY-MM-DDTHH:MM:SS+00:00
      • Omitted if Drive API provides no modifiedTime
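
The requirements above can be sketched as a small rendering function. This is a minimal illustration, not the adapter's actual implementation: the function name `build_sitemap`, the `(document_id, modified_time)` tuple shape, and the choice to truncate `modifiedTime` to its date portion are all assumptions made for the example.

```python
from typing import Iterable, Optional, Tuple
from urllib.parse import quote
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url: str,
                  docs: Iterable[Tuple[str, Optional[str]]]) -> str:
    """Render (document_id, modified_time) pairs as sitemap XML (hypothetical helper)."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        f'<urlset xmlns="{SITEMAP_NS}">',
    ]
    for doc_id, modified_time in docs:
        # URL-encode the document ID first, then XML-escape the whole URL.
        url = f"{base_url.rstrip('/')}/documents/{quote(doc_id, safe='')}"
        loc = escape(url, {'"': "&quot;", "'": "&apos;"})
        lines.append("  <url>")
        lines.append(f"    <loc>{loc}</loc>")
        if modified_time:
            # Assumption: modified_time is an RFC 3339 timestamp, so its
            # first 10 characters are the YYYY-MM-DD date.
            lines.append(f"    <lastmod>{modified_time[:10]}</lastmod>")
        lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"
```

Encoding before escaping matters: percent-encoding handles characters that are unsafe in a URL path, and the XML escape then covers any `&`, `<`, `>`, or quote characters that remain in the assembled URL.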

Empty Drive Response (0 documents):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
</urlset>

Constraints:

  • Maximum 50,000 <url> entries (sitemap protocol limit)
  • If >50,000 documents exist, return 413 error instead
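
The boundary in this constraint is easy to get wrong by one, so a tiny sketch may help. The names `MAX_SITEMAP_URLS` and `status_for_document_count` are hypothetical; the only thing taken from the contract is that exactly 50,000 documents is still allowed and 50,001 is not.

```python
MAX_SITEMAP_URLS = 50_000  # sitemap protocol limit quoted in this contract


def status_for_document_count(count: int) -> int:
    """Return 413 when the Drive holds more documents than one sitemap allows."""
    # Strictly greater than: a Drive with exactly 50,000 documents still fits.
    return 413 if count > MAX_SITEMAP_URLS else 200
```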

Error Responses

404 Not Found

Trigger: Request to any path other than /sitemap.xml, or to /sitemap.xml with a method other than GET

Status Code: 404 Not Found

Response Headers: None

Response Body: Empty (no content)

Example:

GET /documents/abc123 → 404 Not Found (empty body)
GET /api/sitemap → 404 Not Found (empty body)
POST /sitemap.xml → 404 Not Found (empty body)
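
One way to read the examples above is as a single routing predicate: only `GET /sitemap.xml` is served, and everything else gets an empty 404. The sketch below assumes that reading; `is_sitemap_request` is a hypothetical name, and how a query string on `/sitemap.xml` should be treated is not stated in this contract, so the example compares the path exactly.

```python
def is_sitemap_request(method: str, path: str) -> bool:
    """True only for the one request this contract serves (hypothetical helper)."""
    # Both the method and the exact path must match; anything else is a 404.
    return method == "GET" and path == "/sitemap.xml"
```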

413 Payload Too Large

Trigger: Google Drive contains more than 50,000 documents

Status Code: 413 Payload Too Large

Response Headers: None

Response Body: Empty (no content)

Rationale: Sitemap protocol limits sitemaps to 50,000 URLs. This error prevents oversized sitemap generation.


429 Too Many Requests

Trigger: Google Drive API returns rate limit error

Status Code: 429 Too Many Requests

Response Headers:

Retry-After: {seconds}

Response Body: Empty (no content)

Example:

HTTP/1.1 429 Too Many Requests
Retry-After: 60

(empty body)

Rationale: Client should retry after the specified number of seconds.


401 Unauthorized

Trigger: Service Account token refresh failed

Status Code: 401 Unauthorized

Response Headers: None

Response Body: Empty (no content)

Rationale: Authentication failed. Check Service Account credentials configuration.


503 Service Unavailable

Trigger: Google Drive API returns 503 error

Status Code: 503 Service Unavailable

Response Headers: None

Response Body: Empty (no content)

Behavior: No retries. The 503 is passed through to the client immediately, per the specification.


500 Internal Server Error

Trigger: Unexpected error during sitemap generation

Status Code: 500 Internal Server Error

Response Headers: None

Response Body: Empty (no content)

Rationale: Unexpected server error. Check logs for details.
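
The error responses above share one shape: a status code, an empty body, and no headers except `Retry-After` on 429. A minimal sketch of that mapping follows. The failure labels (`"rate_limited"`, `"auth_failed"`, and so on), the `ErrorResponse` type, and the `to_error_response` function are invented for illustration and are not part of the contract. What the server should do when the upstream rate-limit error carries no delay is also not specified here, so the sketch simply omits the header in that case.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class ErrorResponse:
    status: int
    headers: Dict[str, str] = field(default_factory=dict)
    body: bytes = b""  # every error response in this contract has an empty body


# Hypothetical failure labels mapped to the status codes defined above.
STATUS_BY_KIND = {
    "not_found": 404,
    "too_many_documents": 413,
    "rate_limited": 429,
    "auth_failed": 401,
    "drive_unavailable": 503,
}


def to_error_response(kind: str, retry_after: Optional[int] = None) -> ErrorResponse:
    """Translate a failure label into the contract's error response."""
    status = STATUS_BY_KIND.get(kind, 500)  # anything unexpected becomes 500
    headers: Dict[str, str] = {}
    if status == 429 and retry_after is not None:
        headers["Retry-After"] = str(retry_after)
    return ErrorResponse(status=status, headers=headers)
```

For example, `to_error_response("rate_limited", retry_after=60)` yields status 429 with `Retry-After: 60`, matching the 429 example above.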


Examples

Example 1: Successful Sitemap (3 documents)

Request:

GET /sitemap.xml HTTP/1.1
Host: example.com

Response:

HTTP/1.1 200 OK
Content-Type: application/xml; charset=utf-8
Content-Length: 512

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/documents/1A2B3C4D5E6F7G8H</loc>
    <lastmod>2026-03-07</lastmod>
  </url>
  <url>
    <loc>http://example.com/documents/9I0J1K2L3M4N5O6P</loc>
    <lastmod>2026-03-05</lastmod>
  </url>
  <url>
    <loc>http://example.com/documents/7Q8R9S0T1U2V3W4X</loc>
    <lastmod>2026-03-01</lastmod>
  </url>
</urlset>

Example 2: Empty Drive

Request:

GET /sitemap.xml HTTP/1.1
Host: example.com

Response:

HTTP/1.1 200 OK
Content-Type: application/xml; charset=utf-8
Content-Length: 123

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
</urlset>

Example 3: Rate Limit Exceeded

Request:

GET /sitemap.xml HTTP/1.1
Host: example.com

Response:

HTTP/1.1 429 Too Many Requests
Retry-After: 120


Example 4: Too Many Documents

Request:

GET /sitemap.xml HTTP/1.1
Host: example.com

Response:

HTTP/1.1 413 Payload Too Large


Example 5: Invalid Endpoint

Request:

GET /documents/abc123 HTTP/1.1
Host: example.com

Response:

HTTP/1.1 404 Not Found


Contract Validation

XML Schema Validation

The sitemap XML MUST validate against the sitemap protocol schema:

  • Namespace: http://www.sitemaps.org/schemas/sitemap/0.9
  • Root element: <urlset>
  • Child elements: Zero or more <url> elements
  • URL elements: Each contains <loc> (required) and <lastmod> (optional)
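
A lightweight structural check along these lines can be written with the Python standard library alone. The sketch below is not a full XSD validation; it only checks the four points listed above, and `check_sitemap_structure` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import List

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def check_sitemap_structure(xml_text: str) -> List[str]:
    """Return a list of structural problems; an empty list means none were found."""
    try:
        # Parse from bytes so the encoding declaration is honoured.
        root = ET.fromstring(xml_text.encode("utf-8"))
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        return [f"unexpected root element: {root.tag}"]

    problems: List[str] = []
    for index, child in enumerate(root):
        if child.tag != f"{{{SITEMAP_NS}}}url":
            problems.append(f"child #{index}: expected <url>, got {child.tag}")
            continue
        locs = child.findall(f"{{{SITEMAP_NS}}}loc")
        lastmods = child.findall(f"{{{SITEMAP_NS}}}lastmod")
        if len(locs) != 1 or not (locs[0].text or "").strip():
            problems.append(f"url #{index}: exactly one non-empty <loc> is required")
        if len(lastmods) > 1:
            problems.append(f"url #{index}: at most one <lastmod> is allowed")
    return problems
```

ElementTree represents namespaced tags as `{namespace}name`, which is why the comparisons use that form rather than the bare element names.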

Contract Testing Requirements

All contract tests MUST verify:

  1. Success Path:

    • Response status 200
    • Content-Type header is application/xml; charset=utf-8
    • Response body is valid XML
    • XML contains correct namespace
    • All <loc> URLs are absolute and properly formatted
    • All <loc> URLs follow pattern: {baseUrl}/documents/{documentId}
    • All <lastmod> dates are valid ISO 8601 format (if present)
  2. Error Handling:

    • Invalid endpoints return 404 with empty body
    • More than 50,000 documents returns 413 with empty body
    • Rate limiting returns 429 with Retry-After header and empty body
    • Drive API 503 returns 503 with empty body (no retries)
    • All error responses have no Content-Type header
    • All error responses have empty body
  3. Edge Cases:

    • Empty Drive (0 documents) returns valid sitemap with no <url> entries
    • Documents without modifiedTime omit <lastmod> tag
    • Special characters in document IDs are properly URL-encoded
    • XML special characters in URLs are properly escaped
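
Two of the success-path checks above, the `<loc>` pattern and the `<lastmod>` format, can be expressed as small predicates for use inside contract tests. This is a sketch under assumptions: the helper names are hypothetical, the date-time branch accepts any numeric UTC offset in the `+HH:MM` shape shown in this contract, and it rejects the `Z` suffix because the contract does not list it.

```python
import re
from datetime import datetime

DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")
DATETIME_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}[+-]\d{2}:\d{2}")


def is_valid_lastmod(value: str) -> bool:
    """Accept YYYY-MM-DD or YYYY-MM-DDTHH:MM:SS+00:00 style values."""
    if DATE_RE.fullmatch(value):
        fmt = "%Y-%m-%d"
    elif DATETIME_RE.fullmatch(value):
        fmt = "%Y-%m-%dT%H:%M:%S%z"
    else:
        return False
    try:
        # The regex checks the shape; strptime rejects impossible dates.
        datetime.strptime(value, fmt)
    except ValueError:
        return False
    return True


def loc_matches_pattern(loc: str, base_url: str) -> bool:
    """Check that loc has the form {baseUrl}/documents/{documentId}."""
    prefix = base_url.rstrip("/") + "/documents/"
    if not loc.startswith(prefix):
        return False
    doc_id = loc[len(prefix):]
    # The document ID must be a single, non-empty path segment.
    return doc_id != "" and "/" not in doc_id
```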

Breaking Changes

Changes that constitute breaking changes (require MAJOR version bump):

  1. Changing URL format from /documents/{id} to different format
  2. Changing XML namespace or root element structure
  3. Removing <lastmod> field entirely
  4. Changing error response status codes
  5. Adding required query parameters
  6. Changing response Content-Type

Version History

| Version | Date       | Changes                        |
|---------|------------|--------------------------------|
| 1.0.0   | 2026-03-07 | Initial contract specification |

Summary

This contract defines the complete API specification for the /sitemap.xml endpoint, including:

  1. Request/response formats with examples
  2. Error handling with all status codes (404, 413, 429, 401, 503, 500)
  3. XML schema requirements for sitemap format
  4. Validation criteria for contract testing
  5. Breaking change policy for version management

All error responses follow the spec requirement: status code only, with an empty response body. The 429 response additionally includes a Retry-After header.