# API Contract: Sitemap XML Endpoint

**Feature**: 001-drive-proxy-adapter
**Contract Type**: HTTP API
**Endpoint**: `/sitemap.xml`
**Version**: 1.0.0
**Date**: 2026-03-07

---

## Endpoint Specification

### `GET /sitemap.xml`

Generate an XML sitemap of all accessible Google Drive documents.

---

## Request

### HTTP Method
`GET`

### URL
`/sitemap.xml`

### Query Parameters
None

### Request Headers
None required

### Request Body
None (GET request)

---

## Response

### Success Response (200 OK)

**Status Code**: `200 OK`

**Response Headers**:
```
Content-Type: application/xml; charset=utf-8
Content-Length: {size_in_bytes}
```

**Response Body** (XML):
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/documents/{documentId1}</loc>
    <lastmod>2026-03-07</lastmod>
  </url>
  <url>
    <loc>http://example.com/documents/{documentId2}</loc>
    <lastmod>2026-03-06</lastmod>
  </url>
  <!-- ... up to 50,000 entries -->
</urlset>
```

**XML Schema Requirements**:
- Root element: `<urlset>` with namespace `xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"`
- Each document: `<url>` element containing:
  - `<loc>` (REQUIRED): Absolute URL in format `{baseUrl}/documents/{documentId}`
    - Must be URL-encoded
    - Must escape XML special characters: `&` → `&amp;`, `<` → `&lt;`, `>` → `&gt;`, `"` → `&quot;`, `'` → `&apos;`
  - `<lastmod>` (OPTIONAL): ISO 8601 date format
    - Format: `YYYY-MM-DD` or `YYYY-MM-DDTHH:MM:SS+00:00`
    - Omitted if Drive API provides no `modifiedTime`

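
The `<loc>` and `<lastmod>` rules above can be sketched as a small helper. This is a minimal illustration, not the adapter's actual implementation: the function name `build_url_entry`, its parameters, and the assumption that `modifiedTime` arrives as an RFC 3339 string (for example `2026-03-07T10:15:30.000Z`) are all hypothetical. The document ID is percent-encoded first, then the whole URL is XML-escaped, and `<lastmod>` is left out when no timestamp is supplied.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import quote
from xml.sax.saxutils import escape


def build_url_entry(base_url: str, document_id: str,
                    modified_time: Optional[str] = None) -> str:
    """Render one <url> element; all names here are illustrative."""
    # Percent-encode the ID so it is safe as a single path segment.
    encoded_id = quote(document_id, safe="")
    loc = f"{base_url.rstrip('/')}/documents/{encoded_id}"
    # escape() covers &, <, >; the extra mapping adds both quote characters.
    loc_xml = escape(loc, {'"': "&quot;", "'": "&apos;"})

    lines = ["  <url>", f"    <loc>{loc_xml}</loc>"]
    if modified_time:
        # Assumed input: an RFC 3339 timestamp such as "2026-03-07T10:15:30.000Z".
        parsed = datetime.fromisoformat(modified_time.replace("Z", "+00:00"))
        day = parsed.astimezone(timezone.utc).date().isoformat()
        lines.append(f"    <lastmod>{day}</lastmod>")
    lines.append("  </url>")
    return "\n".join(lines)
```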
**Empty Drive Response** (0 documents):
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
</urlset>
```

**Constraints**:
- Maximum 50,000 `<url>` entries (sitemap protocol limit)
- If more than 50,000 documents exist, return a `413` error instead

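
One way to enforce the 50,000-entry constraint is to stop reading as soon as the limit is provably exceeded, rather than listing the whole Drive first. The sketch below assumes the document IDs arrive as any iterable (for example a paginated listing). The names `collect_document_ids` and `MAX_SITEMAP_URLS` are hypothetical, and the returned integer stands in for the HTTP status the endpoint would send.

```python
from itertools import islice
from typing import Iterable, List, Tuple

MAX_SITEMAP_URLS = 50_000  # sitemap protocol limit quoted in this contract


def collect_document_ids(ids: Iterable[str],
                         limit: int = MAX_SITEMAP_URLS) -> Tuple[int, List[str]]:
    """Return (status, ids): 200 with the IDs, or 413 with an empty list."""
    # Read at most limit + 1 items: one extra is enough to prove an overflow.
    collected = list(islice(ids, limit + 1))
    if len(collected) > limit:
        return 413, []
    return 200, collected
```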

---

### Error Responses

#### 404 Not Found

**Trigger**: Request to any path other than `/sitemap.xml`, or to `/sitemap.xml` with any method other than `GET`

**Status Code**: `404 Not Found`

**Response Headers**: None

**Response Body**: Empty (no content)

**Example**:
```
GET /documents/abc123 → 404 Not Found (empty body)
GET /api/sitemap → 404 Not Found (empty body)
POST /sitemap.xml → 404 Not Found (empty body)
```


---

#### 413 Payload Too Large

**Trigger**: Google Drive contains more than 50,000 documents

**Status Code**: `413 Payload Too Large`

**Response Headers**: None

**Response Body**: Empty (no content)

**Rationale**: The sitemap protocol limits a sitemap to 50,000 URLs. This error prevents oversized sitemap generation.

---

#### 429 Too Many Requests

**Trigger**: Google Drive API returns rate limit error

**Status Code**: `429 Too Many Requests`

**Response Headers**:
```
Retry-After: {seconds}
```

**Response Body**: Empty (no content)

**Example**:
```
HTTP/1.1 429 Too Many Requests
Retry-After: 60

(empty body)
```

**Rationale**: Clients should wait the number of seconds given in `Retry-After` before retrying.

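
From the client's side, honoring `Retry-After` could look like the sketch below. It is transport-agnostic on purpose: `fetch` is a hypothetical callable returning `(status, headers, body)`, `sleep` is injectable so the logic can be exercised without waiting, and the attempt cap and the one-second fallback for a missing or malformed header are assumptions rather than part of this contract. Calling `fetch_with_retry(my_fetch)` returns the first non-429 response, or the last 429 once the attempts are used up.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], bytes]


def fetch_with_retry(fetch: Callable[[], Response],
                     max_attempts: int = 3,
                     sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call fetch(), waiting Retry-After seconds between 429 responses."""
    response = fetch()
    for _ in range(max_attempts - 1):
        status, headers, _body = response
        if status != 429:
            break
        try:
            # The contract sends Retry-After as a number of seconds.
            delay = float(headers.get("Retry-After", "1"))
        except ValueError:
            delay = 1.0  # assumed fallback for a malformed value
        sleep(max(delay, 0.0))
        response = fetch()
    return response
```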

---

#### 401 Unauthorized

**Trigger**: Service Account token refresh failed

**Status Code**: `401 Unauthorized`

**Response Headers**: None

**Response Body**: Empty (no content)

**Rationale**: Authentication failed. Check the Service Account credentials configuration.

---

#### 503 Service Unavailable

**Trigger**: Google Drive API returns 503 error

**Status Code**: `503 Service Unavailable`

**Response Headers**: None

**Response Body**: Empty (no content)

**Behavior**: No retries. The 503 is passed through to the client immediately, per the specification.

---

#### 500 Internal Server Error

**Trigger**: Unexpected error during sitemap generation

**Status Code**: `500 Internal Server Error`

**Response Headers**: None

**Response Body**: Empty (no content)

**Rationale**: Unexpected server error. Check logs for details.

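
The trigger-to-status mapping in the error sections above can be summarized in one function. This is only a sketch under stated assumptions: the exception classes are hypothetical stand-ins for however the adapter detects each trigger, and the source of the `Retry-After` value is not specified by this contract, so it is carried on the exception here. Every branch returns an empty body and no headers other than `Retry-After` on 429.

```python
from typing import Dict, Tuple

Response = Tuple[int, Dict[str, str], bytes]


class TooManyDocuments(Exception):
    """More than 50,000 documents were found (hypothetical marker)."""


class RateLimited(Exception):
    """The Drive API reported a rate limit (hypothetical marker)."""

    def __init__(self, retry_after_seconds: int) -> None:
        super().__init__("rate limited")
        self.retry_after_seconds = retry_after_seconds


class AuthFailed(Exception):
    """Service Account token refresh failed (hypothetical marker)."""


class UpstreamUnavailable(Exception):
    """The Drive API returned 503 (hypothetical marker)."""


def error_response(exc: Exception) -> Response:
    """Map a failure to (status, headers, body) as listed in this contract."""
    if isinstance(exc, TooManyDocuments):
        return 413, {}, b""
    if isinstance(exc, RateLimited):
        return 429, {"Retry-After": str(exc.retry_after_seconds)}, b""
    if isinstance(exc, AuthFailed):
        return 401, {}, b""
    if isinstance(exc, UpstreamUnavailable):
        return 503, {}, b""  # passed through without retrying
    return 500, {}, b""  # anything unexpected
```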

---

## Examples

### Example 1: Successful Sitemap (3 documents)

**Request**:
```http
GET /sitemap.xml HTTP/1.1
Host: example.com
```

**Response**:
```http
HTTP/1.1 200 OK
Content-Type: application/xml; charset=utf-8
Content-Length: 512

```

---

### Example 2: Empty Drive

**Request**:
```http
GET /sitemap.xml HTTP/1.1
Host: example.com
```

**Response**:
```http
HTTP/1.1 200 OK
Content-Type: application/xml; charset=utf-8
Content-Length: 123

```

---

### Example 3: Rate Limit Exceeded

**Request**:
```http
GET /sitemap.xml HTTP/1.1
Host: example.com
```

**Response**:
```http
HTTP/1.1 429 Too Many Requests
Retry-After: 120

```

---

### Example 4: Too Many Documents

**Request**:
```http
GET /sitemap.xml HTTP/1.1
Host: example.com
```

**Response**:
```http
HTTP/1.1 413 Payload Too Large

```

---

### Example 5: Invalid Endpoint

**Request**:
```http
GET /documents/abc123 HTTP/1.1
Host: example.com
```

**Response**:
```http
HTTP/1.1 404 Not Found

```

---

## Contract Validation

### XML Schema Validation

The sitemap XML MUST validate against the sitemap protocol schema:
- **Namespace**: `http://www.sitemaps.org/schemas/sitemap/0.9`
- **Root element**: `<urlset>`
- **Child elements**: Zero or more `<url>` elements
- **URL elements**: Each contains `<loc>` (required) and `<lastmod>` (optional)

**Validation Tools**:
- XML parser (ensure well-formed XML)
- Sitemap validator: [https://www.xml-sitemaps.com/validate-xml-sitemap.html](https://www.xml-sitemaps.com/validate-xml-sitemap.html)
- XSD schema validation against official sitemap schema

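
As a lightweight complement to the tools listed above, the structural rules can be checked with a standard-library XML parser. The sketch below is not a substitute for XSD validation: it only confirms well-formedness, the namespaced `<urlset>` root, `<url>` children, and a non-empty `<loc>` in each. The function name `sitemap_problems` and its message wording are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def sitemap_problems(xml_text: str) -> List[str]:
    """Return a list of structural problems; an empty list means none found."""
    try:
        # Parse from bytes so the encoding declaration in the prolog is honored.
        root = ET.fromstring(xml_text.encode("utf-8"))
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]

    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        return [f"unexpected root element: {root.tag}"]

    problems: List[str] = []
    for index, child in enumerate(root, start=1):
        if child.tag != f"{{{SITEMAP_NS}}}url":
            problems.append(f"child {index}: expected <url>, found {child.tag}")
            continue
        loc = child.find(f"{{{SITEMAP_NS}}}loc")
        if loc is None or not (loc.text or "").strip():
            problems.append(f"child {index}: missing or empty <loc>")
    return problems
```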

---

### Contract Testing Requirements

All contract tests MUST verify:

1. **Success Path**:
   - Response status 200
   - Content-Type header is `application/xml; charset=utf-8`
   - Response body is valid XML
   - XML contains correct namespace
   - All `<loc>` URLs are absolute and properly formatted
   - All `<loc>` URLs follow pattern: `{baseUrl}/documents/{documentId}`
   - All `<lastmod>` dates are valid ISO 8601 format (if present)

2. **Error Handling**:
   - Invalid endpoints return 404 with empty body
   - More than 50,000 documents returns 413 with empty body
   - Rate limiting returns 429 with `Retry-After` header and empty body
   - Drive API 503 returns 503 with empty body (no retries)
   - All error responses have no `Content-Type` header
   - All error responses have empty body

3. **Edge Cases**:
   - Empty Drive (0 documents) returns valid sitemap with no `<url>` entries
   - Documents without `modifiedTime` omit `<lastmod>` tag
   - Special characters in document IDs are properly URL-encoded
   - XML special characters in URLs are properly escaped

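
The error-handling checks shared by every error status (empty body, no `Content-Type`, `Retry-After` on 429) could be factored into one reusable assertion helper. This is a sketch only: `error_contract_violations` is a hypothetical name, and it assumes the test harness exposes the response as a status integer, a header mapping, and the raw body bytes. Header names are compared case-insensitively.

```python
from typing import List, Mapping


def error_contract_violations(status: int,
                              headers: Mapping[str, str],
                              body: bytes) -> List[str]:
    """Return the contract rules an error response breaks (empty if none)."""
    # HTTP header names are case-insensitive, so normalize before checking.
    names = {name.lower() for name in headers}
    violations: List[str] = []
    if body:
        violations.append("error body must be empty")
    if "content-type" in names:
        violations.append("error response must not set Content-Type")
    if status == 429 and "retry-after" not in names:
        violations.append("429 must include Retry-After")
    return violations
```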

---

## Breaking Changes

Changes that constitute breaking changes (require MAJOR version bump):

1. Changing URL format from `/documents/{id}` to different format
2. Changing XML namespace or root element structure
3. Removing `<lastmod>` field entirely
4. Changing error response status codes
5. Adding required query parameters
6. Changing response Content-Type

---

## References

- [Sitemap Protocol Specification](https://www.sitemaps.org/protocol.html)
- [Google Sitemap Guidelines](https://developers.google.com/search/docs/crawling-indexing/sitemaps/build-sitemap)
- [XML Specification](https://www.w3.org/TR/xml/)
- [ISO 8601 Date Format](https://en.wikipedia.org/wiki/ISO_8601)

---

## Version History

| Version | Date | Changes |
|---------|------|---------|
| 1.0.0 | 2026-03-07 | Initial contract specification |

---

## Summary

This contract defines the complete API specification for the `/sitemap.xml` endpoint, including:

1. **Request/response formats** with examples
2. **Error handling** with all status codes (404, 413, 429, 401, 503, 500)
3. **XML schema requirements** for sitemap format
4. **Validation criteria** for contract testing
5. **Breaking change policy** for version management

All error responses follow the spec requirement: **status code only, no response body**. The one addition is 429, which also carries a `Retry-After` header.