Added new feature for document export
382
specs/001-sitemap/contracts/sitemap-xml-schema.md
Normal file
@@ -0,0 +1,382 @@
# API Contract: Sitemap XML Endpoint

**Feature**: 001-drive-proxy-adapter
**Contract Type**: HTTP API
**Endpoint**: `/sitemap.xml`
**Version**: 1.0.0
**Date**: 2026-03-07

---

## Endpoint Specification

### `GET /sitemap.xml`

Generate an XML sitemap of all accessible Google Drive documents.

---

## Request

### HTTP Method
`GET`

### URL
`/sitemap.xml`

### Query Parameters
None

### Request Headers
None required

### Request Body
None (GET request)

---

## Response

### Success Response (200 OK)

**Status Code**: `200 OK`

**Response Headers**:
```
Content-Type: application/xml; charset=utf-8
Content-Length: {size_in_bytes}
```

**Response Body** (XML):
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/documents/{documentId1}</loc>
    <lastmod>2026-03-07</lastmod>
  </url>
  <url>
    <loc>http://example.com/documents/{documentId2}</loc>
    <lastmod>2026-03-06</lastmod>
  </url>
  <!-- ... up to 50,000 entries -->
</urlset>
```

**XML Schema Requirements**:
- Root element: `<urlset>` with namespace `xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"`
- Each document: `<url>` element containing:
  - `<loc>` (REQUIRED): Absolute URL in format `{baseUrl}/documents/{documentId}`
    - Must be URL-encoded
    - Must escape XML special characters: `&` → `&amp;`, `<` → `&lt;`, `>` → `&gt;`, `"` → `&quot;`, `'` → `&apos;`
  - `<lastmod>` (OPTIONAL): ISO 8601 date format
    - Format: `YYYY-MM-DD` or `YYYY-MM-DDTHH:MM:SS+00:00`
    - Omitted if Drive API provides no `modifiedTime`
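
The requirements above can be illustrated with a minimal Python sketch. It is not the service's implementation: `build_sitemap`, its `(document_id, lastmod)` input shape, and the choice to percent-encode before XML-escaping are assumptions made for the example.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# escape() handles &, < and > by default; the two quote characters are added here.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def build_sitemap(base_url, documents):
    """Render a sitemap from (document_id, lastmod) pairs; lastmod may be None."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        f'<urlset xmlns="{SITEMAP_NS}">',
    ]
    for document_id, lastmod in documents:
        # Percent-encode the ID first, then XML-escape the whole URL.
        loc = f"{base_url}/documents/{quote(document_id, safe='')}"
        lines.append("  <url>")
        lines.append(f"    <loc>{escape(loc, QUOTE_ENTITIES)}</loc>")
        if lastmod is not None:
            # <lastmod> is omitted entirely when no modification time is known.
            lines.append(f"    <lastmod>{lastmod}</lastmod>")
        lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"
```

Calling `build_sitemap("http://example.com", [])` yields a `<urlset>` with no `<url>` children, which matches the empty-Drive case.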

**Empty Drive Response** (0 documents):
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
</urlset>
```

**Constraints**:
- Maximum 50,000 `<url>` entries (sitemap protocol limit)
- If >50,000 documents exist, return 413 error instead
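
One way to enforce this limit without loading an arbitrarily large listing is to read at most one item past the cap. The sketch below assumes documents arrive as an iterable; `collect_documents` and its return shape are hypothetical names for illustration.

```python
from itertools import islice

MAX_SITEMAP_URLS = 50_000


def collect_documents(documents, limit=MAX_SITEMAP_URLS):
    """Return (status, documents); 413 with no documents once the limit is exceeded."""
    # Read at most limit + 1 items so an oversized Drive is detected early.
    collected = list(islice(documents, limit + 1))
    if len(collected) > limit:
        return 413, []
    return 200, collected
```

Exactly 50,000 documents still produce a sitemap; the 413 path starts at 50,001.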

---

### Error Responses

#### 404 Not Found

**Trigger**: Any request other than `GET /sitemap.xml`, including requests to other paths and requests to `/sitemap.xml` with another HTTP method

**Status Code**: `404 Not Found`

**Response Headers**: None

**Response Body**: Empty (no content)

**Example**:
```
GET /documents/abc123 → 404 Not Found (empty body)
GET /api/sitemap → 404 Not Found (empty body)
POST /sitemap.xml → 404 Not Found (empty body)
```
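
The three examples reduce to a single check on method and path. The following sketch uses hypothetical helper names and assumes, based on the `POST` example, that every method other than `GET` is treated as not found.

```python
def is_sitemap_request(method, path):
    """True only for the single request line this contract serves."""
    return method == "GET" and path == "/sitemap.xml"


def not_found_response():
    """(status, headers, body) for every other request: no headers, empty body."""
    return 404, {}, b""
```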

---

#### 413 Payload Too Large

**Trigger**: Google Drive contains more than 50,000 documents

**Status Code**: `413 Payload Too Large`

**Response Headers**: None

**Response Body**: Empty (no content)

**Rationale**: Sitemap protocol limits sitemaps to 50,000 URLs. This error prevents oversized sitemap generation.

---

#### 429 Too Many Requests

**Trigger**: Google Drive API returns rate limit error

**Status Code**: `429 Too Many Requests`

**Response Headers**:
```
Retry-After: {seconds}
```

**Response Body**: Empty (no content)

**Example**:
```
HTTP/1.1 429 Too Many Requests
Retry-After: 60

(empty body)
```

**Rationale**: Client should retry after the specified number of seconds.
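
From the client side, honoring `Retry-After` might look like the sketch below. The `fetch` callable, which returns `(status, headers, body)`, is a stand-in for whatever HTTP client is in use, and the attempt cap and one-second fallback are assumptions rather than contract values.

```python
import time


def fetch_with_retry(fetch, max_attempts=3, sleep=time.sleep):
    """Call fetch() until it returns a non-429 status or attempts run out."""
    status, body = None, b""
    for attempt in range(max_attempts):
        status, headers, body = fetch()
        if status != 429:
            break
        if attempt < max_attempts - 1:
            # The contract expresses Retry-After as a number of seconds.
            sleep(int(headers.get("Retry-After", "1")))
    return status, body
```

Passing `sleep` as a parameter keeps the function testable without real delays.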

---

#### 401 Unauthorized

**Trigger**: Service Account token refresh failed

**Status Code**: `401 Unauthorized`

**Response Headers**: None

**Response Body**: Empty (no content)

**Rationale**: Authentication failed. Check Service Account credentials configuration.

---

#### 503 Service Unavailable

**Trigger**: Google Drive API returns 503 error

**Status Code**: `503 Service Unavailable`

**Response Headers**: None

**Response Body**: Empty (no content)

**Behavior**: No retries - immediately pass through 503 to client per specification.

---

#### 500 Internal Server Error

**Trigger**: Unexpected error during sitemap generation

**Status Code**: `500 Internal Server Error`

**Response Headers**: None

**Response Body**: Empty (no content)

**Rationale**: Unexpected server error. Check logs for details.
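
The error cases above can be summarized as one mapping from failure to response. In this sketch the exception classes are hypothetical placeholders for however the service detects each condition; only the status codes, the `Retry-After` header, and the empty bodies come from the contract.

```python
class TooManyDocuments(Exception):
    """More than 50,000 documents were found."""


class DriveAuthFailed(Exception):
    """Service Account token refresh failed."""


class DriveUnavailable(Exception):
    """The Drive API answered 503."""


class DriveRateLimited(Exception):
    """The Drive API reported a rate limit."""

    def __init__(self, retry_after_seconds):
        super().__init__(retry_after_seconds)
        self.retry_after_seconds = retry_after_seconds


def error_response(exc):
    """Map a failure to (status, headers, body); the body is always empty."""
    if isinstance(exc, TooManyDocuments):
        return 413, {}, b""
    if isinstance(exc, DriveRateLimited):
        return 429, {"Retry-After": str(exc.retry_after_seconds)}, b""
    if isinstance(exc, DriveAuthFailed):
        return 401, {}, b""
    if isinstance(exc, DriveUnavailable):
        return 503, {}, b""  # passed through immediately, no retries
    return 500, {}, b""  # anything unexpected
```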

---

## Examples

### Example 1: Successful Sitemap (3 documents)

**Request**:
```http
GET /sitemap.xml HTTP/1.1
Host: example.com
```

**Response**:
```http
HTTP/1.1 200 OK
Content-Type: application/xml; charset=utf-8
Content-Length: 512

```

---

### Example 2: Empty Drive

**Request**:
```http
GET /sitemap.xml HTTP/1.1
Host: example.com
```

**Response**:
```http
HTTP/1.1 200 OK
Content-Type: application/xml; charset=utf-8
Content-Length: 123

```

---

### Example 3: Rate Limit Exceeded

**Request**:
```http
GET /sitemap.xml HTTP/1.1
Host: example.com
```

**Response**:
```http
HTTP/1.1 429 Too Many Requests
Retry-After: 120

```

---

### Example 4: Too Many Documents

**Request**:
```http
GET /sitemap.xml HTTP/1.1
Host: example.com
```

**Response**:
```http
HTTP/1.1 413 Payload Too Large

```

---

### Example 5: Invalid Endpoint

**Request**:
```http
GET /documents/abc123 HTTP/1.1
Host: example.com
```

**Response**:
```http
HTTP/1.1 404 Not Found

```

---

## Contract Validation

### XML Schema Validation

The sitemap XML MUST validate against the sitemap protocol schema:
- **Namespace**: `http://www.sitemaps.org/schemas/sitemap/0.9`
- **Root element**: `<urlset>`
- **Child elements**: Zero or more `<url>` elements
- **URL elements**: Each contains `<loc>` (required) and `<lastmod>` (optional)

**Validation Tools**:
- XML parser (ensure well-formed XML)
- Sitemap validator: [https://www.xml-sitemaps.com/validate-xml-sitemap.html](https://www.xml-sitemaps.com/validate-xml-sitemap.html)
- XSD schema validation against official sitemap schema
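
As a lightweight complement to the tools listed, the structural rules can be checked with Python's standard-library XML parser. This sketch covers well-formedness, namespace, root element, and the required `<loc>` only; it is not a substitute for full XSD validation, and `validate_sitemap` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def validate_sitemap(xml_bytes):
    """Return a list of problems; an empty list means the checks passed."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    # ElementTree spells namespaced tags as "{namespace}name".
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        return [f"unexpected root element: {root.tag}"]
    problems = []
    for index, child in enumerate(root):
        if child.tag != f"{{{SITEMAP_NS}}}url":
            problems.append(f"child {index}: unexpected element {child.tag}")
            continue
        loc = child.find(f"{{{SITEMAP_NS}}}loc")
        if loc is None or not (loc.text or "").strip():
            problems.append(f"child {index}: missing <loc>")
    return problems
```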

---

### Contract Testing Requirements

All contract tests MUST verify:

1. **Success Path**:
   - Response status 200
   - Content-Type header is `application/xml; charset=utf-8`
   - Response body is valid XML
   - XML contains correct namespace
   - All `<loc>` URLs are absolute and properly formatted
   - All `<loc>` URLs follow pattern: `{baseUrl}/documents/{documentId}`
   - All `<lastmod>` dates are valid ISO 8601 format (if present)

2. **Error Handling**:
   - Invalid endpoints return 404 with empty body
   - >50k documents returns 413 with empty body
   - Rate limiting returns 429 with `Retry-After` header and empty body
   - Drive API 503 returns 503 with empty body (no retries)
   - All error responses have no `Content-Type` header
   - All error responses have empty body

3. **Edge Cases**:
   - Empty Drive (0 documents) returns valid sitemap with no `<url>` entries
   - Documents without `modifiedTime` omit `<lastmod>` tag
   - Special characters in document IDs are properly URL-encoded
   - XML special characters in URLs are properly escaped
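
Several of the success-path checks can be expressed as a single helper that a contract test calls on a captured response. The sketch below is an assumption about how such a test could be organized, not a prescribed test harness; `check_success_response` and `is_valid_lastmod` are hypothetical names, and the `lastmod` pattern accepts only the two formats this contract lists.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
# YYYY-MM-DD, optionally followed by THH:MM:SS and a numeric UTC offset.
LASTMOD_RE = re.compile(r"\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2}[+-]\d{2}:\d{2})?")


def is_valid_lastmod(value):
    """True if value has one of the two contract formats and is a real date."""
    if not LASTMOD_RE.fullmatch(value):
        return False
    try:
        datetime.fromisoformat(value)
    except ValueError:
        return False
    return True


def check_success_response(status, headers, body, base_url):
    """Return a list of contract violations for a 200 response."""
    failures = []
    if status != 200:
        failures.append(f"status: {status}")
    if headers.get("Content-Type") != "application/xml; charset=utf-8":
        failures.append("content-type")
    root = ET.fromstring(body)  # raises ParseError if the body is not XML
    prefix = f"{base_url}/documents/"
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS)
        if not loc.startswith(prefix) or len(loc) == len(prefix):
            failures.append(f"loc: {loc!r}")
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        if lastmod is not None and not is_valid_lastmod(lastmod):
            failures.append(f"lastmod: {lastmod!r}")
    return failures
```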

---

## Breaking Changes

Changes that constitute breaking changes (require MAJOR version bump):

1. Changing URL format from `/documents/{id}` to different format
2. Changing XML namespace or root element structure
3. Removing `<lastmod>` field entirely
4. Changing error response status codes
5. Adding required query parameters
6. Changing response Content-Type

---

## References

- [Sitemap Protocol Specification](https://www.sitemaps.org/protocol.html)
- [Google Sitemap Guidelines](https://developers.google.com/search/docs/crawling-indexing/sitemaps/build-sitemap)
- [XML Specification](https://www.w3.org/TR/xml/)
- [ISO 8601 Date Format](https://en.wikipedia.org/wiki/ISO_8601)

---

## Version History

| Version | Date | Changes |
|---------|------|---------|
| 1.0.0 | 2026-03-07 | Initial contract specification |

---

## Summary

This contract defines the complete API specification for the `/sitemap.xml` endpoint, including:

1. **Request/response formats** with examples
2. **Error handling** with all status codes (404, 413, 429, 401, 503, 500)
3. **XML schema requirements** for sitemap format
4. **Validation criteria** for contract testing
5. **Breaking change policy** for version management

All error responses follow the spec requirement: **status code only, no response body**. The 429 response additionally carries a `Retry-After` header.