Added new feature for document export

2026-03-10 16:25:05 -05:00
parent d477367256
commit 2acb04ad76
11 changed files with 0 additions and 0 deletions

@@ -0,0 +1,382 @@
# API Contract: Sitemap XML Endpoint
**Feature**: 001-drive-proxy-adapter
**Contract Type**: HTTP API
**Endpoint**: `/sitemap.xml`
**Version**: 1.0.0
**Date**: 2026-03-07
---
## Endpoint Specification
### `GET /sitemap.xml`
Generate an XML sitemap of all accessible Google Drive documents.
---
## Request
### HTTP Method
`GET`
### URL
`/sitemap.xml`
### Query Parameters
None
### Request Headers
None required
### Request Body
None (GET request)
---
## Response
### Success Response (200 OK)
**Status Code**: `200 OK`
**Response Headers**:
```
Content-Type: application/xml; charset=utf-8
Content-Length: {size_in_bytes}
```
**Response Body** (XML):
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://example.com/documents/{documentId1}</loc>
<lastmod>2026-03-07</lastmod>
</url>
<url>
<loc>http://example.com/documents/{documentId2}</loc>
<lastmod>2026-03-06</lastmod>
</url>
<!-- ... up to 50,000 entries -->
</urlset>
```
**XML Schema Requirements**:
- Root element: `<urlset>` with namespace `xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"`
- Each document: `<url>` element containing:
  - `<loc>` (REQUIRED): Absolute URL in format `{baseUrl}/documents/{documentId}`
    - Must be URL-encoded
    - Must escape XML special characters: `&` → `&amp;`, `<` → `&lt;`, `>` → `&gt;`, `"` → `&quot;`, `'` → `&apos;`
  - `<lastmod>` (OPTIONAL): ISO 8601 date format
    - Format: `YYYY-MM-DD` or `YYYY-MM-DDTHH:MM:SS+00:00`
    - Omitted if Drive API provides no `modifiedTime`
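
The `<loc>` and `<lastmod>` rules above can be sketched in Python as follows. This is a minimal illustration, not the service's actual implementation: the function name `build_url_entry` and its parameters are hypothetical, and it assumes the document ID is percent-encoded as a single path segment before XML escaping is applied.

```python
from typing import Optional
from urllib.parse import quote
from xml.sax.saxutils import escape


def build_url_entry(base_url: str, document_id: str,
                    last_modified: Optional[str] = None) -> str:
    """Render one <url> element (hypothetical helper, not from the contract)."""
    # Percent-encode the ID; safe="" also encodes "/" so the ID stays one segment.
    encoded_id = quote(document_id, safe="")
    # escape() handles &, < and > by default; the mapping adds both quote characters.
    loc = escape(f"{base_url}/documents/{encoded_id}",
                 {'"': "&quot;", "'": "&apos;"})
    lines = ["<url>", f"<loc>{loc}</loc>"]
    # <lastmod> is omitted entirely when no modification time is known.
    if last_modified is not None:
        lines.append(f"<lastmod>{escape(last_modified)}</lastmod>")
    lines.append("</url>")
    return "\n".join(lines)
```

Encoding happens before escaping so that an `&` in the base URL becomes `&amp;`, while an `&` in the document ID becomes `%26`.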
**Empty Drive Response** (0 documents):
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
</urlset>
```
**Constraints**:
- Maximum 50,000 `<url>` entries (sitemap protocol limit)
- If more than 50,000 documents exist, the endpoint returns `413 Payload Too Large` instead of a sitemap
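
One way to apply the 50,000-entry limit when assembling the response is sketched below. The names `render_sitemap` and `MAX_SITEMAP_URLS` are assumptions made for illustration, and the function expects `<url>` elements that have already been rendered as strings.

```python
from typing import List, Tuple

MAX_SITEMAP_URLS = 50_000  # sitemap protocol limit cited in this contract
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def render_sitemap(url_entries: List[str]) -> Tuple[int, bytes]:
    """Return (status, body); hypothetical helper, not from the contract."""
    # Too many documents: status code only, empty body.
    if len(url_entries) > MAX_SITEMAP_URLS:
        return 413, b""
    parts = ['<?xml version="1.0" encoding="UTF-8"?>',
             f'<urlset xmlns="{SITEMAP_NS}">']
    parts.extend(url_entries)
    parts.append("</urlset>")
    # An empty list still yields a valid, empty <urlset>.
    return 200, "\n".join(parts).encode("utf-8")
```

Checking the count before rendering avoids building a large document that would be discarded anyway.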
---
### Error Responses
#### 404 Not Found
**Trigger**: Any request other than `GET /sitemap.xml`, including other paths and other HTTP methods on `/sitemap.xml`
**Status Code**: `404 Not Found`
**Response Headers**: None
**Response Body**: Empty (no content)
**Example**:
```
GET /documents/abc123 → 404 Not Found (empty body)
GET /api/sitemap → 404 Not Found (empty body)
POST /sitemap.xml → 404 Not Found (empty body)
```
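
The routing rule shown in these examples can be expressed as a single predicate. This is a sketch under assumptions: `is_sitemap_request` is a hypothetical name, and the path is matched exactly, since the contract does not say how query strings or trailing slashes are treated.

```python
def is_sitemap_request(method: str, path: str) -> bool:
    """True only for GET /sitemap.xml; every other request gets an empty 404."""
    # HTTP method names are case-sensitive, so compare against "GET" exactly.
    return method == "GET" and path == "/sitemap.xml"
```

A handler would return an empty `404 Not Found` whenever this predicate is false.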
---
#### 413 Payload Too Large
**Trigger**: Google Drive contains more than 50,000 documents
**Status Code**: `413 Payload Too Large`
**Response Headers**: None
**Response Body**: Empty (no content)
**Rationale**: Sitemap protocol limits sitemaps to 50,000 URLs. This error prevents oversized sitemap generation.
---
#### 429 Too Many Requests
**Trigger**: Google Drive API returns rate limit error
**Status Code**: `429 Too Many Requests`
**Response Headers**:
```
Retry-After: {seconds}
```
**Response Body**: Empty (no content)
**Example**:
```
HTTP/1.1 429 Too Many Requests
Retry-After: 60
(empty body)
```
**Rationale**: Client should retry after the specified number of seconds.
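
On the client side, the wait time can be read from the response as sketched here. The helper name `retry_delay_seconds` is hypothetical, and the sketch assumes `Retry-After` always carries a number of seconds, as this contract specifies, rather than an HTTP date.

```python
from typing import Mapping, Optional


def retry_delay_seconds(status: int, headers: Mapping[str, str]) -> Optional[int]:
    """Return the delay for a 429 response, or None if no usable value exists."""
    if status != 429:
        return None
    for name, value in headers.items():
        # Header names are case-insensitive.
        if name.lower() == "retry-after":
            try:
                return max(0, int(value.strip()))
            except ValueError:
                # Not a plain number of seconds; the caller must pick its own delay.
                return None
    return None
```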
---
#### 401 Unauthorized
**Trigger**: Service Account token refresh failed
**Status Code**: `401 Unauthorized`
**Response Headers**: None
**Response Body**: Empty (no content)
**Rationale**: Authentication failed. Check Service Account credentials configuration.
---
#### 503 Service Unavailable
**Trigger**: Google Drive API returns 503 error
**Status Code**: `503 Service Unavailable`
**Response Headers**: None
**Response Body**: Empty (no content)
**Behavior**: No retries. The 503 is passed through to the client immediately, per the specification.
---
#### 500 Internal Server Error
**Trigger**: Unexpected error during sitemap generation
**Status Code**: `500 Internal Server Error`
**Response Headers**: None
**Response Body**: Empty (no content)
**Rationale**: Unexpected server error. Check logs for details.
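
The upstream-driven error responses (429, 401, 503, 500) share one shape: a status code, an empty body, and no headers except `Retry-After` on 429. A minimal sketch of that mapping follows. The `UpstreamFailure` categories and the 60-second fallback are assumptions made for illustration and are not defined by this contract.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class UpstreamFailure(Enum):
    """Hypothetical failure categories; the contract does not name these."""
    RATE_LIMITED = "rate_limited"            # Drive API rate limit error
    AUTH_FAILED = "auth_failed"              # Service Account token refresh failed
    DRIVE_UNAVAILABLE = "drive_unavailable"  # Drive API returned 503
    UNEXPECTED = "unexpected"                # anything else


_STATUS_BY_FAILURE = {
    UpstreamFailure.RATE_LIMITED: 429,
    UpstreamFailure.AUTH_FAILED: 401,
    UpstreamFailure.DRIVE_UNAVAILABLE: 503,
    UpstreamFailure.UNEXPECTED: 500,
}


def error_response(failure: UpstreamFailure,
                   retry_after_seconds: Optional[int] = None
                   ) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a failure; the body is always empty."""
    status = _STATUS_BY_FAILURE[failure]
    headers: Dict[str, str] = {}
    if status == 429:
        # Assumption: fall back to 60 seconds when no delay is supplied.
        delay = 60 if retry_after_seconds is None else retry_after_seconds
        headers["Retry-After"] = str(delay)
    return status, headers, b""
```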
---
## Examples
### Example 1: Successful Sitemap (3 documents)
**Request**:
```http
GET /sitemap.xml HTTP/1.1
Host: example.com
```
**Response**:
```http
HTTP/1.1 200 OK
Content-Type: application/xml; charset=utf-8
Content-Length: 512
```
---
### Example 2: Empty Drive
**Request**:
```http
GET /sitemap.xml HTTP/1.1
Host: example.com
```
**Response**:
```http
HTTP/1.1 200 OK
Content-Type: application/xml; charset=utf-8
Content-Length: 123
```
---
### Example 3: Rate Limit Exceeded
**Request**:
```http
GET /sitemap.xml HTTP/1.1
Host: example.com
```
**Response**:
```http
HTTP/1.1 429 Too Many Requests
Retry-After: 120
```
---
### Example 4: Too Many Documents
**Request**:
```http
GET /sitemap.xml HTTP/1.1
Host: example.com
```
**Response**:
```http
HTTP/1.1 413 Payload Too Large
```
---
### Example 5: Invalid Endpoint
**Request**:
```http
GET /documents/abc123 HTTP/1.1
Host: example.com
```
**Response**:
```http
HTTP/1.1 404 Not Found
```
---
## Contract Validation
### XML Schema Validation
The sitemap XML MUST validate against the sitemap protocol schema:
- **Namespace**: `http://www.sitemaps.org/schemas/sitemap/0.9`
- **Root element**: `<urlset>`
- **Child elements**: Zero or more `<url>` elements
- **URL elements**: Each contains `<loc>` (required) and `<lastmod>` (optional)
**Validation Tools**:
- XML parser (ensure well-formed XML)
- Sitemap validator: [https://www.xml-sitemaps.com/validate-xml-sitemap.html](https://www.xml-sitemaps.com/validate-xml-sitemap.html)
- XSD schema validation against official sitemap schema
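
As a lightweight complement to these tools, the structural rules can be checked with Python's standard library. This sketch covers well-formedness, namespace, root element, and the required `<loc>`; it is not full XSD validation, and `sitemap_problems` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import List

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def sitemap_problems(xml_bytes: bytes) -> List[str]:
    """Return a list of structural problems; an empty list means none were found."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    # ElementTree spells namespaced tags as "{namespace}localname".
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        return [f"unexpected root element: {root.tag}"]
    problems: List[str] = []
    for index, child in enumerate(root):
        if child.tag != f"{{{SITEMAP_NS}}}url":
            problems.append(f"child {index}: unexpected element {child.tag}")
            continue
        loc = child.find(f"{{{SITEMAP_NS}}}loc")
        if loc is None or not (loc.text or "").strip():
            problems.append(f"child {index}: missing or empty <loc>")
    return problems
```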
---
### Contract Testing Requirements
All contract tests MUST verify:
1. **Success Path**:
   - Response status 200
   - Content-Type header is `application/xml; charset=utf-8`
   - Response body is valid XML
   - XML contains correct namespace
   - All `<loc>` URLs are absolute and properly formatted
   - All `<loc>` URLs follow pattern: `{baseUrl}/documents/{documentId}`
   - All `<lastmod>` dates are valid ISO 8601 format (if present)
2. **Error Handling**:
   - Invalid endpoints return 404 with empty body
   - >50k documents returns 413 with empty body
   - Rate limiting returns 429 with `Retry-After` header and empty body
   - Drive API 503 returns 503 with empty body (no retries)
   - All error responses have no `Content-Type` header
   - All error responses have empty body
3. **Edge Cases**:
   - Empty Drive (0 documents) returns valid sitemap with no `<url>` entries
   - Documents without `modifiedTime` omit `<lastmod>` tag
   - Special characters in document IDs are properly URL-encoded
   - XML special characters in URLs are properly escaped
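
Two of the checks above lend themselves to small reusable helpers in a test suite. The sketch below is illustrative only: both function names are hypothetical, and the `<lastmod>` check accepts only the two formats this contract lists, so a `Z` suffix is rejected by assumption.

```python
import re
from datetime import date, datetime
from typing import List, Mapping

_DATE_ONLY = re.compile(r"\d{4}-\d{2}-\d{2}")
_DATE_TIME = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}[+-]\d{2}:\d{2}")


def is_valid_lastmod(value: str) -> bool:
    """Accept YYYY-MM-DD or YYYY-MM-DDTHH:MM:SS+00:00 style values only."""
    try:
        if _DATE_ONLY.fullmatch(value):
            date.fromisoformat(value)  # rejects impossible dates such as month 13
            return True
        if _DATE_TIME.fullmatch(value):
            datetime.fromisoformat(value)
            return True
    except ValueError:
        return False
    return False


def error_shape_problems(status: int, headers: Mapping[str, str],
                         body: bytes) -> List[str]:
    """Check the shared error-response shape: empty body, no Content-Type."""
    names = {name.lower() for name in headers}
    problems: List[str] = []
    if body:
        problems.append("error body must be empty")
    if "content-type" in names:
        problems.append("error response must not set Content-Type")
    if status == 429 and "retry-after" not in names:
        problems.append("429 must include Retry-After")
    return problems
```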
---
## Breaking Changes
Changes that constitute breaking changes (require MAJOR version bump):
1. Changing URL format from `/documents/{id}` to different format
2. Changing XML namespace or root element structure
3. Removing `<lastmod>` field entirely
4. Changing error response status codes
5. Adding required query parameters
6. Changing response Content-Type
---
## References
- [Sitemap Protocol Specification](https://www.sitemaps.org/protocol.html)
- [Google Sitemap Guidelines](https://developers.google.com/search/docs/crawling-indexing/sitemaps/build-sitemap)
- [XML Specification](https://www.w3.org/TR/xml/)
- [ISO 8601 Date Format](https://en.wikipedia.org/wiki/ISO_8601)
---
## Version History
| Version | Date | Changes |
|---------|------|---------|
| 1.0.0 | 2026-03-07 | Initial contract specification |
---
## Summary
This contract defines the complete API specification for the `/sitemap.xml` endpoint, including:
1. **Request/response formats** with examples
2. **Error handling** with all status codes (404, 413, 429, 401, 503, 500)
3. **XML schema requirements** for sitemap format
4. **Validation criteria** for contract testing
5. **Breaking change policy** for version management
All error responses follow the spec requirement: **status code only, no response body**. The 429 response additionally includes a `Retry-After` header.