Added new feature for document export

2026-03-10 16:25:05 -05:00
parent d477367256
commit 2acb04ad76
11 changed files with 0 additions and 0 deletions


@@ -0,0 +1,77 @@
# Specification Quality Checklist: Google Drive HTTP Proxy Adapter
**Purpose**: Validate specification completeness and quality before proceeding to planning
**Created**: 2026-03-06
**Feature**: [spec.md](../spec.md)
## Content Quality
- [x] No implementation details (languages, frameworks, APIs)
- [x] Focused on user value and business needs
- [x] Written for non-technical stakeholders
- [x] All mandatory sections completed
## Requirement Completeness
- [x] No [NEEDS CLARIFICATION] markers remain
- [x] Requirements are testable and unambiguous
- [x] Success criteria are measurable
- [x] Success criteria are technology-agnostic (no implementation details)
- [x] All acceptance scenarios are defined
- [x] Edge cases are identified
- [x] Scope is clearly bounded
- [x] Dependencies and assumptions identified
## Feature Readiness
- [x] All functional requirements have clear acceptance criteria
- [x] User scenarios cover primary flows
- [x] Feature meets measurable outcomes defined in Success Criteria
- [x] No implementation details leak into specification
## Validation Notes
### Content Quality Review
- ✅ Specification avoids implementation details (no mention of specific npm packages or frameworks beyond the Node.js requirement from the constitution)
- ✅ Focus is on user capabilities (HTTP requests, document export, sitemap generation)
- ✅ Language is accessible to non-developers (clear descriptions of HTTP endpoints and document formats)
- ✅ All sections (User Scenarios, Requirements, Success Criteria, Assumptions, Out of Scope) are complete
### Requirement Completeness Review
- ✅ No [NEEDS CLARIFICATION] markers present - all requirements are fully specified
- ✅ Requirements are testable:
  - FR-001 through FR-020 can all be verified through automated tests
  - Each functional requirement specifies a MUST condition that is verifiable
- ✅ Success criteria are measurable with specific metrics:
  - SC-001: 5 seconds for 10,000 documents
  - SC-002: 3 seconds for <1MB documents
  - SC-003: 100 concurrent requests
  - SC-004 through SC-010: All have quantifiable targets
- ✅ Success criteria avoid implementation details (focus on timing, throughput, quality metrics)
- ✅ Acceptance scenarios follow Given-When-Then format with clear conditions
- ✅ Edge cases are comprehensive (10 scenarios covering errors, permissions, formats, scale)
- ✅ Scope clearly bounded with Assumptions and Out of Scope sections
- ✅ Dependencies on Google Drive API and OAuth 2.0 explicitly stated
### Feature Readiness Review
- ✅ Each functional requirement (FR-001 through FR-020) maps to acceptance scenarios in user stories
- ✅ Three user stories cover complete functionality:
  - P1: Core document export (foundational value)
  - P2: Sitemap generation (discovery mechanism)
  - P3: Multiple formats (enhancement)
- ✅ Success criteria SC-001 through SC-010 provide clear quality gates
- ✅ Implementation details appropriately deferred (no database choices, no framework selection beyond constitution's Node.js requirement, no API route implementation specifics)
## Overall Assessment
**Status**: ✅ **PASS** - Specification is complete and ready for `/speckit.plan`
The specification successfully:
1. Defines three independently testable user stories with clear priorities
2. Provides 20 concrete functional requirements
3. Establishes 10 measurable success criteria
4. Identifies comprehensive edge cases and assumptions
5. Clearly bounds scope with explicit Out of Scope section
6. Maintains technology-agnostic language while aligning with constitution's Node.js requirement
**Recommendation**: Proceed to planning phase with `/speckit.plan` command.


@@ -0,0 +1,290 @@
openapi: 3.0.3
info:
  title: Google Drive Sitemap Adapter API
  description: |
    HTTP adapter for generating XML sitemaps listing accessible Google Drive documents.
    ## Overview
    This adapter provides a single endpoint (`/sitemap.xml`) that generates a valid XML sitemap
    conforming to the sitemap protocol (https://www.sitemaps.org/protocol.html).
    The sitemap lists all documents accessible to the configured Google Service Account,
    with URLs pointing back to this adapter using document IDs.
    ## Authentication
    The adapter uses OAuth 2.0 Service Account authentication to access Google Drive.
    External clients do not need to authenticate with this API.
    ## Rate Limiting
    Google Drive API rate limits are handled gracefully. If rate limited, the adapter
    returns HTTP 429 with a Retry-After header indicating seconds until retry.
    ## Sitemap Protocol Compliance
    - Maximum 50,000 URLs per sitemap (protocol limit)
    - Each URL includes document ID and last modified timestamp
    - Always returns fresh data (no caching)
  version: 1.0.0
  contact:
    name: API Support
  license:
    name: ISC
servers:
  - url: http://localhost:3000
    description: Development server
  - url: https://adapter.example.com
    description: Production server
tags:
  - name: Sitemap
    description: XML sitemap generation
paths:
  /sitemap.xml:
    get:
      summary: Generate XML sitemap
      description: |
        Returns an XML sitemap listing all accessible Google Drive documents.
        Each URL in the sitemap points to this adapter with a document ID:
        `{baseUrl}/{documentId}`
        The sitemap is generated on-demand (no caching) and may take up to 5 seconds
        for drives containing up to 10,000 documents.
        ## Sitemap Format
        Conforms to https://www.sitemaps.org/protocol.html:
        - `<loc>`: Absolute URL with document ID
        - `<lastmod>`: Last modified timestamp (ISO 8601)
        ## Document Retrieval
        Note: The URLs in the sitemap point back to this adapter, but document retrieval
        endpoints are not implemented. This adapter only generates sitemaps for discovery.
      operationId: getSitemap
      tags:
        - Sitemap
      responses:
        '200':
          description: Successfully generated sitemap
          headers:
            Content-Type:
              description: Always application/xml
              schema:
                type: string
              example: application/xml
            Content-Length:
              description: Size of sitemap in bytes
              schema:
                type: integer
              example: 204800
          content:
            application/xml:
              schema:
                type: string
                format: xml
              example: |
                <?xml version="1.0" encoding="UTF-8"?>
                <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
                  <url>
                    <loc>https://adapter.example.com/1BxAA_example123</loc>
                    <lastmod>2026-03-06T10:30:00.000Z</lastmod>
                  </url>
                  <url>
                    <loc>https://adapter.example.com/1CyBB_example456</loc>
                    <lastmod>2026-03-05T14:20:00.000Z</lastmod>
                  </url>
                </urlset>
        '401':
          description: Unauthorized - OAuth authentication failed
          headers:
            Content-Length:
              description: Always 0 (no response body)
              schema:
                type: integer
              example: 0
        '429':
          description: Too Many Requests - Rate limited by Google Drive API
          headers:
            Retry-After:
              description: Seconds to wait before retrying
              schema:
                type: integer
              example: 60
            Content-Length:
              description: Always 0 (no response body)
              schema:
                type: integer
              example: 0
        '500':
          description: Internal Server Error
          headers:
            Content-Length:
              description: Always 0 (no response body)
              schema:
                type: integer
              example: 0
        '503':
          description: Service Unavailable - Google Drive API is down
          headers:
            Content-Length:
              description: Always 0 (no response body)
              schema:
                type: integer
              example: 0
  /{documentId}:
    get:
      summary: Document retrieval endpoint (NOT IMPLEMENTED)
      description: |
        This endpoint is referenced in sitemap URLs but is not implemented.
        The adapter only generates sitemaps; it does not serve documents.
        Clients should treat sitemap URLs as metadata only.
      operationId: getDocument
      tags:
        - Documents (Not Implemented)
      parameters:
        - name: documentId
          in: path
          description: Google Drive document ID
          required: true
          schema:
            type: string
            pattern: '^[a-zA-Z0-9_-]+$'
          example: 1BxAA_example123
      responses:
        '404':
          description: Not Found - Document retrieval not implemented
          headers:
            Content-Length:
              description: Always 0 (no response body)
              schema:
                type: integer
              example: 0
  /{anyOtherPath}:
    get:
      summary: All other paths
      description: |
        Any path other than `/sitemap.xml` returns 404 Not Found.
      operationId: notFound
      tags:
        - Routing
      parameters:
        - name: anyOtherPath
          in: path
          description: Any path other than /sitemap.xml
          required: true
          schema:
            type: string
      responses:
        '404':
          description: Not Found
          headers:
            Content-Length:
              description: Always 0 (no response body)
              schema:
                type: integer
              example: 0
components:
  schemas:
    Sitemap:
      type: object
      description: XML sitemap structure (logical representation, actual response is XML)
      properties:
        xmlns:
          type: string
          description: XML namespace for sitemap protocol
          example: http://www.sitemaps.org/schemas/sitemap/0.9
        urls:
          type: array
          description: Array of URL entries
          items:
            $ref: '#/components/schemas/SitemapUrl'
          maxItems: 50000
    SitemapUrl:
      type: object
      description: Single URL entry in sitemap
      required:
        - loc
        - lastmod
      properties:
        loc:
          type: string
          format: uri
          description: Absolute URL to document (adapter URL + document ID)
          example: https://adapter.example.com/1BxAA_example123
        lastmod:
          type: string
          format: date-time
          description: Last modified timestamp in ISO 8601 format
          example: 2026-03-06T10:30:00.000Z
    Error:
      type: object
      description: Error response (note - most errors return empty body per spec)
      properties:
        code:
          type: integer
          description: HTTP status code
          example: 500
        message:
          type: string
          description: Error message (not included in actual responses)
          example: Internal Server Error
  responses:
    UnauthorizedError:
      description: Unauthorized - OAuth authentication failed
      headers:
        Content-Length:
          schema:
            type: integer
          example: 0
    RateLimitError:
      description: Too Many Requests - Rate limited by Google Drive API
      headers:
        Retry-After:
          description: Seconds to wait before retrying
          schema:
            type: integer
          example: 60
        Content-Length:
          schema:
            type: integer
          example: 0
    InternalError:
      description: Internal Server Error
      headers:
        Content-Length:
          schema:
            type: integer
          example: 0
    ServiceUnavailable:
      description: Service Unavailable - Google Drive API is down
      headers:
        Content-Length:
          schema:
            type: integer
          example: 0
    NotFound:
      description: Not Found - Path not recognized
      headers:
        Content-Length:
          schema:
            type: integer
          example: 0
externalDocs:
  description: Sitemap Protocol Specification
  url: https://www.sitemaps.org/protocol.html


@@ -0,0 +1,454 @@
openapi: 3.0.3
info:
  title: Google Drive HTTP Proxy Adapter API
  description: |
    HTTP proxy adapter for exporting Google Drive documents in multiple formats (Markdown, HTML, PDF)
    and generating XML sitemaps of accessible documents.
    ## Authentication
    The adapter uses OAuth 2.0 to access Google Drive on behalf of configured users.
    External clients do not need to authenticate with this API directly.
    ## Rate Limiting
    API requests are rate-limited to 100 requests per minute per IP address.
    Rate limit information is included in response headers.
  version: 1.0.0
  contact:
    name: API Support
  license:
    name: MIT
servers:
  - url: http://localhost:3000
    description: Development server
  - url: https://api.example.com
    description: Production server
tags:
  - name: Documents
    description: Document export operations
  - name: Discovery
    description: Document discovery and listing
  - name: Health
    description: Service health monitoring
paths:
  /health:
    get:
      summary: Health check endpoint
      description: Returns service health status and version information
      tags:
        - Health
      responses:
        '200':
          description: Service is healthy
          content:
            application/json:
              schema:
                type: object
                properties:
                  status:
                    type: string
                    example: ok
                  version:
                    type: string
                    example: 1.0.0
                  uptime:
                    type: number
                    description: Service uptime in seconds
                    example: 86400
  /sitemap.xml:
    get:
      summary: Generate sitemap of accessible documents
      description: |
        Returns an XML sitemap listing all Google Drive documents accessible to the configured user.
        Follows the sitemap protocol specification (https://www.sitemaps.org/protocol.html).
      tags:
        - Discovery
      responses:
        '200':
          description: Sitemap generated successfully
          headers:
            Content-Type:
              schema:
                type: string
              example: application/xml; charset=utf-8
            X-Request-Id:
              schema:
                type: string
                format: uuid
              description: Unique request identifier for tracing
            X-Document-Count:
              schema:
                type: integer
              description: Number of documents in the sitemap
          content:
            application/xml:
              schema:
                type: string
                format: xml
              example: |
                <?xml version="1.0" encoding="UTF-8"?>
                <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
                  <url>
                    <loc>http://localhost:3000/1BxAA_example123</loc>
                    <lastmod>2026-03-06T10:30:00Z</lastmod>
                  </url>
                  <url>
                    <loc>http://localhost:3000/2CyBB_example456</loc>
                    <lastmod>2026-03-05T14:20:00Z</lastmod>
                  </url>
                </urlset>
        '401':
          $ref: '#/components/responses/Unauthorized'
        '429':
          $ref: '#/components/responses/RateLimited'
        '500':
          $ref: '#/components/responses/InternalError'
        '503':
          $ref: '#/components/responses/ServiceUnavailable'
  /{documentId}:
    get:
      summary: Export Google Drive document in specified format
      description: |
        Fetches a Google Drive document by ID and exports it in the requested format.
        Supports Markdown (default), HTML, and PDF formats.
      tags:
        - Documents
      parameters:
        - name: documentId
          in: path
          required: true
          description: Google Drive file ID (8-128 alphanumeric characters, hyphens, or underscores)
          schema:
            type: string
            pattern: '^[a-zA-Z0-9_-]{8,128}$'
          example: 1BxAA_example123
        - name: format
          in: query
          required: false
          description: Export format (defaults to markdown if not specified)
          schema:
            type: string
            enum:
              - markdown
              - html
              - pdf
            default: markdown
          example: markdown
      responses:
        '200':
          description: Document exported successfully
          headers:
            Content-Type:
              schema:
                type: string
                enum:
                  - text/markdown; charset=utf-8
                  - text/html; charset=utf-8
                  - application/pdf
              description: MIME type of exported document
            X-Request-Id:
              schema:
                type: string
                format: uuid
              description: Unique request identifier for tracing
            X-Document-Title:
              schema:
                type: string
              description: Original document title from Google Drive
            X-Document-Modified:
              schema:
                type: string
                format: date-time
              description: Last modified timestamp (ISO 8601)
          content:
            text/markdown:
              schema:
                type: string
              example: |
                # Document Title
                This is a paragraph with **bold** and *italic* text.
                ## Section Heading
                - List item 1
                - List item 2
            text/html:
              schema:
                type: string
              example: |
                <!DOCTYPE html>
                <html>
                  <head><title>Document Title</title></head>
                  <body>
                    <h1>Document Title</h1>
                    <p>This is a paragraph with <strong>bold</strong> and <em>italic</em> text.</p>
                  </body>
                </html>
            application/pdf:
              schema:
                type: string
                format: binary
        '400':
          $ref: '#/components/responses/BadRequest'
        '401':
          $ref: '#/components/responses/Unauthorized'
        '403':
          $ref: '#/components/responses/Forbidden'
        '404':
          $ref: '#/components/responses/NotFound'
        '413':
          $ref: '#/components/responses/PayloadTooLarge'
        '415':
          $ref: '#/components/responses/UnsupportedMediaType'
        '429':
          $ref: '#/components/responses/RateLimited'
        '500':
          $ref: '#/components/responses/InternalError'
        '503':
          $ref: '#/components/responses/ServiceUnavailable'
components:
  schemas:
    ErrorResponse:
      type: object
      required:
        - error
        - timestamp
      properties:
        error:
          type: object
          required:
            - code
            - message
            - requestId
          properties:
            code:
              type: string
              description: Machine-readable error code
              enum:
                - DOCUMENT_NOT_FOUND
                - DOCUMENT_FORBIDDEN
                - UNAUTHORIZED
                - INVALID_FORMAT
                - UNSUPPORTED_DOCUMENT_TYPE
                - RATE_LIMITED
                - DRIVE_API_ERROR
                - INTERNAL_ERROR
                - PAYLOAD_TOO_LARGE
              example: DOCUMENT_NOT_FOUND
            message:
              type: string
              description: Human-readable error message
              example: Document with ID '1BxAA_example123' does not exist or is not accessible
            details:
              type: object
              description: Optional additional context
              additionalProperties: true
            requestId:
              type: string
              format: uuid
              description: Request ID for support and debugging
              example: 550e8400-e29b-41d4-a716-446655440000
        timestamp:
          type: string
          format: date-time
          description: ISO 8601 timestamp when error occurred
          example: '2026-03-06T10:30:00.123Z'
  responses:
    BadRequest:
      description: Invalid request parameters
      headers:
        X-Request-Id:
          schema:
            type: string
            format: uuid
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ErrorResponse'
          example:
            error:
              code: INVALID_FORMAT
              message: "Invalid format 'docx'. Supported formats: markdown, html, pdf"
              requestId: 550e8400-e29b-41d4-a716-446655440000
            timestamp: '2026-03-06T10:30:00.123Z'
    Unauthorized:
      description: Authentication failed or missing
      headers:
        X-Request-Id:
          schema:
            type: string
            format: uuid
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ErrorResponse'
          example:
            error:
              code: UNAUTHORIZED
              message: Authentication with Google Drive failed
              requestId: 550e8400-e29b-41d4-a716-446655440001
            timestamp: '2026-03-06T10:30:01.456Z'
    Forbidden:
      description: User lacks permission to access the document
      headers:
        X-Request-Id:
          schema:
            type: string
            format: uuid
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ErrorResponse'
          example:
            error:
              code: DOCUMENT_FORBIDDEN
              message: You do not have permission to access this document
              requestId: 550e8400-e29b-41d4-a716-446655440002
            timestamp: '2026-03-06T10:30:02.789Z'
    NotFound:
      description: Document does not exist
      headers:
        X-Request-Id:
          schema:
            type: string
            format: uuid
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ErrorResponse'
          example:
            error:
              code: DOCUMENT_NOT_FOUND
              message: Document with ID '1BxAA_invalid' does not exist or is not accessible
              requestId: 550e8400-e29b-41d4-a716-446655440003
            timestamp: '2026-03-06T10:30:03.012Z'
    PayloadTooLarge:
      description: Document exceeds maximum size limit
      headers:
        X-Request-Id:
          schema:
            type: string
            format: uuid
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ErrorResponse'
          example:
            error:
              code: PAYLOAD_TOO_LARGE
              message: Document size exceeds maximum limit of 100MB
              requestId: 550e8400-e29b-41d4-a716-446655440004
            timestamp: '2026-03-06T10:30:04.345Z'
    UnsupportedMediaType:
      description: Document type cannot be exported in requested format
      headers:
        X-Request-Id:
          schema:
            type: string
            format: uuid
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ErrorResponse'
          example:
            error:
              code: UNSUPPORTED_DOCUMENT_TYPE
              message: Document type 'application/vnd.google-apps.form' cannot be exported as PDF
              requestId: 550e8400-e29b-41d4-a716-446655440005
            timestamp: '2026-03-06T10:30:05.678Z'
    RateLimited:
      description: Rate limit exceeded
      headers:
        X-Request-Id:
          schema:
            type: string
            format: uuid
        X-RateLimit-Limit:
          schema:
            type: integer
          description: Maximum requests per minute
          example: 100
        X-RateLimit-Remaining:
          schema:
            type: integer
          description: Remaining requests in current window
          example: 0
        X-RateLimit-Reset:
          schema:
            type: integer
          description: Unix timestamp when rate limit resets
          example: 1709724660
        Retry-After:
          schema:
            type: integer
          description: Seconds until rate limit resets
          example: 60
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ErrorResponse'
          example:
            error:
              code: RATE_LIMITED
              message: Rate limit exceeded. Please retry after 60 seconds
              requestId: 550e8400-e29b-41d4-a716-446655440006
            timestamp: '2026-03-06T10:30:06.901Z'
    InternalError:
      description: Internal server error
      headers:
        X-Request-Id:
          schema:
            type: string
            format: uuid
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ErrorResponse'
          example:
            error:
              code: INTERNAL_ERROR
              message: An unexpected error occurred while processing your request
              requestId: 550e8400-e29b-41d4-a716-446655440007
            timestamp: '2026-03-06T10:30:07.234Z'
    ServiceUnavailable:
      description: Service temporarily unavailable (Google Drive API down or rate limited)
      headers:
        X-Request-Id:
          schema:
            type: string
            format: uuid
        Retry-After:
          schema:
            type: integer
          description: Seconds until service may be available
          example: 300
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ErrorResponse'
          example:
            error:
              code: DRIVE_API_ERROR
              message: Google Drive API is temporarily unavailable. Please retry later
              requestId: 550e8400-e29b-41d4-a716-446655440008
            timestamp: '2026-03-06T10:30:08.567Z'


@@ -0,0 +1,436 @@
# API Contract: Sitemap Endpoint
**Feature**: 001-drive-proxy-adapter
**Date**: 2026-03-07
**Phase**: 1 - Design & Contracts
**Endpoint**: `GET /sitemap.xml`
## Overview
The `/sitemap.xml` endpoint returns an XML sitemap listing all Google Drive documents accessible to the Service Account. This is the only endpoint exposed by the adapter.
---
## Endpoint Definition
### URL
```
GET /sitemap.xml
```
### Authentication
- **Method**: None (endpoint is public)
- **Backend Authentication**: Service Account JWT to Google Drive API (transparent to client)
- **Credentials**: Loaded from `GOOGLE_SERVICE_ACCOUNT_KEY` environment variable
### Request
**Method**: `GET`
**Headers**:
- None required
**Query Parameters**:
- None supported
**Request Body**:
- None (GET request)
**Example Request**:
```http
GET /sitemap.xml HTTP/1.1
Host: adapter.example.com
User-Agent: Mozilla/5.0
```
---
## Response Specifications
### Success Response (200 OK)
**Status Code**: `200 OK`
**Headers**:
- `Content-Type: application/xml`
- `Content-Length: {size_in_bytes}`
**Body**: Valid XML sitemap conforming to sitemap protocol
**XML Schema**:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://adapter.example.com/documents/{documentId}</loc>
<lastmod>2026-03-06T10:30:00.000Z</lastmod>
</url>
<!-- Additional <url> entries (up to 50,000) -->
</urlset>
```
**Field Descriptions**:
- `<urlset>`: Root element with sitemap namespace
- `<url>`: Individual URL entry (0 to 50,000 entries)
- `<loc>`: Absolute URL to document using RESTful format `/documents/{documentId}`
- `<lastmod>`: ISO 8601 timestamp of last document modification
**Constraints**:
- Maximum 50,000 `<url>` entries (sitemap protocol limit per spec.md FR-015)
- Maximum 50MB uncompressed (protocol limit, not enforced)
- All `<loc>` URLs use same base URL (configured via `BASE_URL` env var)
- All `<loc>` URLs use RESTful path format: `/documents/{documentId}`
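**Illustrative Sketch** (not part of the contract): the constraints above could be met by a builder along the following lines. The `DriveDocument` shape and the names `escapeXml` and `buildSitemap` are hypothetical, and throwing when the 50,000 limit is exceeded is an assumption, since this contract does not say how an oversized drive is handled.

```typescript
// Hypothetical document shape; the contract only requires an ID and a timestamp.
interface DriveDocument {
  id: string;
  modifiedTime: string; // ISO 8601 timestamp
}

const MAX_SITEMAP_URLS = 50000; // sitemap protocol limit

// Escape the five XML special characters so values are safe inside elements.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build the sitemap body. `baseUrl` corresponds to the BASE_URL setting.
function buildSitemap(baseUrl: string, docs: DriveDocument[]): string {
  if (docs.length > MAX_SITEMAP_URLS) {
    // Assumption: the contract does not define the behavior for this case.
    throw new RangeError(`sitemap is limited to ${MAX_SITEMAP_URLS} URLs`);
  }
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  const entries = docs.map((doc) => {
    const loc = `${base}/documents/${encodeURIComponent(doc.id)}`;
    return (
      `<url><loc>${escapeXml(loc)}</loc>` +
      `<lastmod>${escapeXml(doc.modifiedTime)}</lastmod></url>`
    );
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</urlset>",
  ].join("\n");
}
```

Building each `<loc>` from one normalized base keeps every URL on the same origin and path format, which is what the last two constraints ask for.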
**Example Response**:
```http
HTTP/1.1 200 OK
Content-Type: application/xml
Content-Length: 4582
```
**Performance Targets** (from spec.md success criteria):
- Response time: < 5 seconds for up to 10,000 documents
- Memory usage: < 256MB under normal load
- Concurrent requests: Support 10 concurrent requests without degradation
---
### Not Found Response (404)
**Status Code**: `404 Not Found`
**Headers**: None
**Body**: Empty (per spec.md clarification: "HTTP status code only, no error response body")
**When Returned**:
- Any path other than `/sitemap.xml` (per spec.md FR-007)
**Example Response**:
```http
HTTP/1.1 404 Not Found
```
---
### Unauthorized Response (401)
**Status Code**: `401 Unauthorized`
**Headers**: None
**Body**: Empty (per spec.md clarification: "HTTP status code only, no error response body")
**When Returned**:
- Service Account JWT authentication failed (per spec.md FR-010)
- OAuth token refresh failed
- Invalid Service Account credentials
**Example Response**:
```http
HTTP/1.1 401 Unauthorized
```
**Client Action**: Check Service Account credentials in `GOOGLE_SERVICE_ACCOUNT_KEY` environment variable
---
### Rate Limited Response (429)
**Status Code**: `429 Too Many Requests`
**Headers**:
- `Retry-After: {seconds}` (integer, seconds until retry allowed)
**Body**: Empty (per spec.md clarification: "HTTP status code only, no error response body")
**When Returned**:
- Google Drive API rate limit exceeded (per spec.md FR-013)
- Quota exhausted for Service Account
**Example Response**:
```http
HTTP/1.1 429 Too Many Requests
Retry-After: 60
```
**Client Action**: Wait `Retry-After` seconds before retrying request
**Retry-After Values**:
- Derived from Google Drive API `Retry-After` header if available
- Default: 60 seconds if not specified by Drive API
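**Illustrative Sketch** (not part of the contract): one way to derive the outgoing `Retry-After` value. The function name `resolveRetryAfter` is hypothetical, and treating non-integer upstream values (such as an HTTP-date) as "not specified" is an assumption.

```typescript
const DEFAULT_RETRY_AFTER_SECONDS = 60; // default named in the contract

// Returns the number of seconds to send in the Retry-After header.
function resolveRetryAfter(upstreamHeader: string | undefined): number {
  if (upstreamHeader === undefined) {
    return DEFAULT_RETRY_AFTER_SECONDS;
  }
  const trimmed = upstreamHeader.trim();
  // Assumption: only plain non-negative integers are passed through;
  // anything else falls back to the default.
  if (!/^\d+$/.test(trimmed)) {
    return DEFAULT_RETRY_AFTER_SECONDS;
  }
  return Number.parseInt(trimmed, 10);
}
```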
---
### Internal Server Error (500)
**Status Code**: `500 Internal Server Error`
**Headers**: None
**Body**: Empty (per spec.md clarification: "HTTP status code only, no error response body")
**When Returned**:
- Unexpected server error (per spec.md FR-008)
- Configuration error (missing environment variables)
- XML generation failure
**Example Response**:
```http
HTTP/1.1 500 Internal Server Error
```
**Client Action**: Report error to adapter administrator
**Server Logging**: All 500 errors logged with stack trace to stderr (per spec.md FR-012)
---
### Service Unavailable Response (503)
**Status Code**: `503 Service Unavailable`
**Headers**: None
**Body**: Empty (per spec.md clarification: "HTTP status code only, no error response body")
**When Returned**:
- Google Drive API unavailable (per spec.md FR-017)
- Drive API returns 503 status (no retries per spec clarification)
**Example Response**:
```http
HTTP/1.1 503 Service Unavailable
```
**Client Action**: Retry request later (Drive API temporarily unavailable)
**Retry Behavior**: Adapter does NOT retry Drive API 503 errors; immediately returns 503 to client (per spec.md FR-017 clarification)
---
## Error Handling Specification
### Error Response Format
**All error responses follow same pattern**:
- Status code indicates error type
- No response body (per spec.md clarification)
- Minimal headers (only `Retry-After` for 429)
**Rationale**: Simplicity, consistency, fail-fast approach
### Error Status Code Matrix
| Error Condition | Status Code | Headers | Body | Retry? |
|----------------|-------------|---------|------|--------|
| Authentication failed | 401 | None | Empty | No (fix credentials) |
| Rate limit exceeded | 429 | `Retry-After` | Empty | Yes (after delay) |
| Drive API unavailable | 503 | None | Empty | Yes (later) |
| Internal error | 500 | None | Empty | No (report to admin) |
| Path not found | 404 | None | Empty | No |
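
**Illustrative Sketch** (not part of the contract): the matrix above can be encoded as a single lookup so that every error path produces the same empty-body shape. The `ErrorCondition` labels and the `toErrorResponse` name are hypothetical.

```typescript
// Hypothetical labels for the five rows of the matrix.
type ErrorCondition =
  | "auth_failed"
  | "rate_limited"
  | "drive_unavailable"
  | "internal"
  | "not_found";

interface ErrorResponseSpec {
  status: number;
  headers: Record<string, string>; // the body is always empty
}

// Map an error condition to its status code and headers.
function toErrorResponse(
  condition: ErrorCondition,
  retryAfterSeconds: number = 60,
): ErrorResponseSpec {
  switch (condition) {
    case "auth_failed":
      return { status: 401, headers: {} };
    case "rate_limited":
      return { status: 429, headers: { "Retry-After": String(retryAfterSeconds) } };
    case "drive_unavailable":
      return { status: 503, headers: {} };
    case "internal":
      return { status: 500, headers: {} };
    case "not_found":
      return { status: 404, headers: {} };
  }
}
```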
---
## Logging Specification
### Request Logging (stdout)
**All requests logged with**:
- Timestamp (ISO 8601)
- HTTP method and path
- Response status code
- Response time (milliseconds)
**Example**:
```
[2026-03-07T14:30:15.456Z] GET /sitemap.xml -> 200 (1234ms)
[2026-03-07T14:30:20.789Z] GET /sitemap.xml -> 429 (234ms)
[2026-03-07T14:30:25.012Z] GET /invalid.xml -> 404 (1ms)
```
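**Illustrative Sketch** (not part of the contract): a formatter that produces lines in the shape shown above. The name `formatRequestLog` is hypothetical; rounding the duration to whole milliseconds is an assumption.

```typescript
// Format one request log line: [timestamp] METHOD path -> status (Nms)
function formatRequestLog(
  at: Date,
  method: string,
  path: string,
  status: number,
  durationMs: number,
): string {
  return `[${at.toISOString()}] ${method} ${path} -> ${status} (${Math.round(durationMs)}ms)`;
}
```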
### Error Logging (stderr)
**All errors logged with**:
- Timestamp (ISO 8601)
- Request ID (for correlation)
- Error message
- Stack trace (for 500 errors)
**Example**:
```
[2026-03-07T14:30:20.789Z] [ERROR] Rate limit exceeded: Drive API quota exhausted
[2026-03-07T14:30:25.012Z] [ERROR] Authentication failed: Invalid Service Account key
[2026-03-07T14:30:30.345Z] [ERROR] Drive API unavailable: Connection timeout
```
---
## Contract Tests
### Test Scenarios
1. **Successful sitemap generation**
   - Request: `GET /sitemap.xml`
   - Expected: 200 status, valid XML, `Content-Type: application/xml`
2. **Not found for other paths**
   - Request: `GET /invalid.xml`
   - Expected: 404 status, empty body
3. **Rate limiting**
   - Simulate Drive API 429 response
   - Expected: 429 status, `Retry-After` header, empty body
4. **Authentication failure**
   - Simulate invalid credentials
   - Expected: 401 status, empty body
5. **Service unavailable**
   - Simulate Drive API 503 response
   - Expected: 503 status, empty body (no retries)
6. **XML schema validation**
   - Request: `GET /sitemap.xml`
   - Validate XML against sitemap protocol schema
7. **URL format validation**
   - Request: `GET /sitemap.xml`
   - Verify all `<loc>` URLs use `/documents/{documentId}` format
### Test Assertions
**XML Schema Validation**:
- Root element: `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">`
- Each `<url>` has required `<loc>` child
- Each `<lastmod>` is valid ISO 8601 timestamp
- Maximum 50,000 `<url>` entries
**URL Format Validation**:
- All `<loc>` URLs are absolute (start with http:// or https://)
- All `<loc>` URLs use RESTful format: `{baseUrl}/documents/{documentId}`
- Document IDs match regex: `^[a-zA-Z0-9_-]+$`
**Header Validation**:
- 200 responses include `Content-Type: application/xml`
- 429 responses include `Retry-After` header with integer value
- All error responses have empty body
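**Illustrative Sketch** (not part of the contract): the URL format assertions could be checked with helpers like these. The names `extractLocs` and `isValidLoc` are hypothetical, and the regex-based extraction is a shortcut for a test helper; a full schema check would use a real XML parser.

```typescript
const DOCUMENT_ID_PATTERN = /^[a-zA-Z0-9_-]+$/; // regex from the assertions above

// Pull the text of every <loc> element out of a sitemap body.
function extractLocs(xml: string): string[] {
  const locs: string[] = [];
  const pattern = /<loc>([^<]*)<\/loc>/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(xml)) !== null) {
    locs.push(match[1] ?? "");
  }
  return locs;
}

// Check that a <loc> value is absolute and has the form {baseUrl}/documents/{documentId}.
function isValidLoc(loc: string, baseUrl: string): boolean {
  if (!/^https?:\/\//.test(loc)) {
    return false;
  }
  const prefix = `${baseUrl.replace(/\/+$/, "")}/documents/`;
  if (!loc.startsWith(prefix)) {
    return false;
  }
  return DOCUMENT_ID_PATTERN.test(loc.slice(prefix.length));
}
```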
---
## Configuration
### Environment Variables
| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `GOOGLE_SERVICE_ACCOUNT_KEY` | Yes | None | Inline JSON of Service Account key file |
| `BASE_URL` | Yes | None | Base URL for sitemap links (e.g., `https://adapter.example.com`) |
| `PORT` | No | 3000 | HTTP server port |
**Example .env**:
```bash
GOOGLE_SERVICE_ACCOUNT_KEY='{"type":"service_account","project_id":"...","private_key":"-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n","client_email":"...@developer.gserviceaccount.com",...}'
BASE_URL=https://adapter.example.com
PORT=3000
```
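**Illustrative Sketch** (not part of the contract): startup validation for the variables in the table above. The `AdapterConfig` shape and the `loadConfig` name are hypothetical. The environment is passed in as a plain object so the sketch stays self-contained; a Node.js service would typically pass `process.env`.

```typescript
// Hypothetical configuration shape derived from the table above.
interface AdapterConfig {
  serviceAccountKey: Record<string, unknown>;
  baseUrl: string;
  port: number;
}

function loadConfig(env: Record<string, string | undefined>): AdapterConfig {
  const rawKey = env["GOOGLE_SERVICE_ACCOUNT_KEY"];
  const baseUrl = env["BASE_URL"];
  if (!rawKey) {
    throw new Error("GOOGLE_SERVICE_ACCOUNT_KEY is required");
  }
  if (!baseUrl) {
    throw new Error("BASE_URL is required");
  }

  const port = Number.parseInt(env["PORT"] ?? "3000", 10); // default from the table
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error("PORT must be an integer between 1 and 65535");
  }

  let serviceAccountKey: Record<string, unknown>;
  try {
    serviceAccountKey = JSON.parse(rawKey) as Record<string, unknown>;
  } catch {
    // Rethrow a generic message so no part of the key can leak into logs.
    throw new Error("GOOGLE_SERVICE_ACCOUNT_KEY is not valid JSON");
  }

  return { serviceAccountKey, baseUrl, port };
}
```

Replacing the parser's own error with a fixed message is a deliberate choice here: some JSON parse errors quote fragments of the input, which would conflict with the requirement that credentials never appear in logs or error messages.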
---
## Compatibility
### Sitemap Protocol Compliance
**Protocol**: https://www.sitemaps.org/protocol.html
**Compliance**:
- ✅ Valid XML with namespace
- ✅ `<loc>` with absolute URLs
- ✅ `<lastmod>` with W3C Datetime format (ISO 8601)
- ✅ Maximum 50,000 URLs
- ✅ Maximum 50MB uncompressed size (protocol limit; not enforced by the adapter)
**Optional Elements Not Used**:
- `<changefreq>`: Not applicable (no historical change data)
- `<priority>`: Not applicable (all documents equal priority)
### HTTP Compliance
**HTTP Version**: HTTP/1.1
**Methods Supported**: `GET` only
**Status Codes Used**: 200, 401, 404, 429, 500, 503
**Headers Used**:
- Response: `Content-Type`, `Content-Length`, `Retry-After`
- Request: Standard HTTP headers accepted, none required
---
## Security Considerations
### Authentication
- Service Account credentials secured in environment variable (not in code or config files)
- Credentials never logged or exposed in error messages
- Read-only Drive scope (`drive.readonly`) - no write permissions
### Rate Limiting
- Transparent propagation of Drive API rate limits to client
- No internal rate limiting (rely on Drive API limits)
### Input Validation
- Path validation: Only `/sitemap.xml` accepted
- Method validation: Only `GET` accepted
- No query parameters processed (rejection not required, just ignored)
### Output Sanitization
- All URLs XML-escaped to prevent injection
- All timestamps XML-escaped (though ISO 8601 format doesn't contain XML special chars)
---
## Versioning
**Current Version**: 1.0.0 (initial implementation)
**Future Changes**:
- Breaking changes (new required parameters): Major version bump (2.0.0)
- Backward-compatible additions (query parameters): Minor version bump (1.1.0)
- Bug fixes: Patch version bump (1.0.1)
**Deprecation Policy**:
- Breaking changes include migration guide
- Deprecated features supported for at least one minor version
---
## References
- Feature Specification: `/specs/001-drive-proxy-adapter/spec.md`
- Data Model: `/specs/001-drive-proxy-adapter/data-model.md`
- Research Document: `/specs/001-drive-proxy-adapter/research.md`
- Sitemap Protocol: https://www.sitemaps.org/protocol.html
- Google Drive API v3: https://developers.google.com/drive/api/v3/reference


@@ -0,0 +1,382 @@
# API Contract: Sitemap XML Endpoint
**Feature**: 001-drive-proxy-adapter
**Contract Type**: HTTP API
**Endpoint**: `/sitemap.xml`
**Version**: 1.0.0
**Date**: 2026-03-07
---
## Endpoint Specification
### `GET /sitemap.xml`
Generate an XML sitemap of all accessible Google Drive documents.
---
## Request
### HTTP Method
`GET`
### URL
`/sitemap.xml`
### Query Parameters
None
### Request Headers
None required
### Request Body
None (GET request)
---
## Response
### Success Response (200 OK)
**Status Code**: `200 OK`
**Response Headers**:
```
Content-Type: application/xml; charset=utf-8
Content-Length: {size_in_bytes}
```
**Response Body** (XML):
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/documents/{documentId1}</loc>
    <lastmod>2026-03-07</lastmod>
  </url>
  <url>
    <loc>http://example.com/documents/{documentId2}</loc>
    <lastmod>2026-03-06</lastmod>
  </url>
  <!-- ... up to 50,000 entries -->
</urlset>
```
**XML Schema Requirements**:
- Root element: `<urlset>` with namespace `xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"`
- Each document: `<url>` element containing:
  - `<loc>` (REQUIRED): Absolute URL in format `{baseUrl}/documents/{documentId}`
    - Must be URL-encoded
    - Must escape XML special characters: `&` → `&amp;`, `<` → `&lt;`, `>` → `&gt;`, `"` → `&quot;`, `'` → `&apos;`
  - `<lastmod>` (OPTIONAL): ISO 8601 date format
    - Format: `YYYY-MM-DD` or `YYYY-MM-DDTHH:MM:SS+00:00`
    - Omitted if Drive API provides no `modifiedTime`
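The `<loc>` rules above involve two separate steps: URL-encoding the document ID, then XML-escaping the finished URL. A minimal sketch of that ordering follows; the helper names `escapeXml` and `buildLoc` are illustrative assumptions, not names taken from the codebase.

```javascript
// Hypothetical helpers: names are illustrative, not taken from the codebase.

/** Escape the five XML special characters listed in the schema requirements. */
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;') // must run first so later entities are not double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

/** Build a <loc> value: URL-encode the ID, then XML-escape the whole URL. */
function buildLoc(baseUrl, documentId) {
  const url = `${baseUrl}/documents/${encodeURIComponent(documentId)}`;
  return escapeXml(url);
}

console.log(buildLoc('http://example.com', 'a b&c'));
// prints http://example.com/documents/a%20b%26c
```

`encodeURIComponent` leaves an apostrophe untouched, so the XML-escaping pass is still needed after URL-encoding.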
**Empty Drive Response** (0 documents):
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
</urlset>
```
**Constraints**:
- Maximum 50,000 `<url>` entries (sitemap protocol limit)
- If >50,000 documents exist, return 413 error instead
---
### Error Responses
#### 404 Not Found
**Trigger**: Request to any endpoint other than `/sitemap.xml`
**Status Code**: `404 Not Found`
**Response Headers**: None
**Response Body**: Empty (no content)
**Example**:
```
GET /documents/abc123 → 404 Not Found (empty body)
GET /api/sitemap → 404 Not Found (empty body)
POST /sitemap.xml → 404 Not Found (empty body)
```
---
#### 413 Payload Too Large
**Trigger**: Google Drive contains more than 50,000 documents
**Status Code**: `413 Payload Too Large`
**Response Headers**: None
**Response Body**: Empty (no content)
**Rationale**: Sitemap protocol limits sitemaps to 50,000 URLs. This error prevents oversized sitemap generation.
---
#### 429 Too Many Requests
**Trigger**: Google Drive API returns rate limit error
**Status Code**: `429 Too Many Requests`
**Response Headers**:
```
Retry-After: {seconds}
```
**Response Body**: Empty (no content)
**Example**:
```
HTTP/1.1 429 Too Many Requests
Retry-After: 60
(empty body)
```
**Rationale**: Client should retry after the specified number of seconds.
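On the client side, honoring `Retry-After` could look roughly like the sketch below. The function name, the `maxAttempts` default, and the one-second fallback when the header is missing are assumptions for illustration; `fetchFn` and `sleepFn` are injectable only so the sketch can run without a network.

```javascript
/**
 * Fetch the sitemap, waiting out 429 responses (hypothetical client helper).
 * `fetchFn` defaults to the global fetch available in Node.js 18+.
 */
async function fetchSitemapWithRetry(url, { maxAttempts = 3, fetchFn = fetch, sleepFn } = {}) {
  const sleep = sleepFn ?? ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const res = await fetchFn(url);
    if (res.status !== 429) return res; // success or a non-retryable error
    if (attempt === maxAttempts) break;
    // Retry-After is given in seconds; fall back to 1s if it is absent or malformed (assumption).
    const seconds = Number.parseInt(res.headers.get('retry-after') ?? '', 10);
    await sleep((Number.isNaN(seconds) ? 1 : seconds) * 1000);
  }
  throw new Error(`Still rate limited after ${maxAttempts} attempts`);
}
```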
---
#### 401 Unauthorized
**Trigger**: Service Account token refresh failed
**Status Code**: `401 Unauthorized`
**Response Headers**: None
**Response Body**: Empty (no content)
**Rationale**: Authentication failed. Check Service Account credentials configuration.
---
#### 503 Service Unavailable
**Trigger**: Google Drive API returns 503 error
**Status Code**: `503 Service Unavailable`
**Response Headers**: None
**Response Body**: Empty (no content)
**Behavior**: No retries - immediately pass through 503 to client per specification.
---
#### 500 Internal Server Error
**Trigger**: Unexpected error during sitemap generation
**Status Code**: `500 Internal Server Error`
**Response Headers**: None
**Response Body**: Empty (no content)
**Rationale**: Unexpected server error. Check logs for details.
---
## Examples
### Example 1: Successful Sitemap (3 documents)
**Request**:
```http
GET /sitemap.xml HTTP/1.1
Host: example.com
```
**Response**:
```http
HTTP/1.1 200 OK
Content-Type: application/xml; charset=utf-8
Content-Length: 512
```
---
### Example 2: Empty Drive
**Request**:
```http
GET /sitemap.xml HTTP/1.1
Host: example.com
```
**Response**:
```http
HTTP/1.1 200 OK
Content-Type: application/xml; charset=utf-8
Content-Length: 123
```
---
### Example 3: Rate Limit Exceeded
**Request**:
```http
GET /sitemap.xml HTTP/1.1
Host: example.com
```
**Response**:
```http
HTTP/1.1 429 Too Many Requests
Retry-After: 120
```
---
### Example 4: Too Many Documents
**Request**:
```http
GET /sitemap.xml HTTP/1.1
Host: example.com
```
**Response**:
```http
HTTP/1.1 413 Payload Too Large
```
---
### Example 5: Invalid Endpoint
**Request**:
```http
GET /documents/abc123 HTTP/1.1
Host: example.com
```
**Response**:
```http
HTTP/1.1 404 Not Found
```
---
## Contract Validation
### XML Schema Validation
The sitemap XML MUST validate against the sitemap protocol schema:
- **Namespace**: `http://www.sitemaps.org/schemas/sitemap/0.9`
- **Root element**: `<urlset>`
- **Child elements**: Zero or more `<url>` elements
- **URL elements**: Each contains `<loc>` (required) and `<lastmod>` (optional)
**Validation Tools**:
- XML parser (ensure well-formed XML)
- Sitemap validator: [https://www.xml-sitemaps.com/validate-xml-sitemap.html](https://www.xml-sitemaps.com/validate-xml-sitemap.html)
- XSD schema validation against official sitemap schema
---
### Contract Testing Requirements
All contract tests MUST verify:
1. **Success Path**:
   - Response status 200
   - Content-Type header is `application/xml; charset=utf-8`
   - Response body is valid XML
   - XML contains correct namespace
   - All `<loc>` URLs are absolute and properly formatted
   - All `<loc>` URLs follow pattern: `{baseUrl}/documents/{documentId}`
   - All `<lastmod>` dates are valid ISO 8601 format (if present)
2. **Error Handling**:
   - Invalid endpoints return 404 with empty body
   - >50k documents returns 413 with empty body
   - Rate limiting returns 429 with `Retry-After` header and empty body
   - Drive API 503 returns 503 with empty body (no retries)
   - All error responses have no `Content-Type` header
   - All error responses have empty body
3. **Edge Cases**:
   - Empty Drive (0 documents) returns valid sitemap with no `<url>` entries
   - Documents without `modifiedTime` omit `<lastmod>` tag
   - Special characters in document IDs are properly URL-encoded
   - XML special characters in URLs are properly escaped
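Several of the success-path checks above can be expressed as a small helper that contract tests call. The sketch below is a regex-based sanity check under stated assumptions: `checkSitemap` is a hypothetical name, and it does not replace a real XML parser or XSD validation.

```javascript
const SITEMAP_NS = 'http://www.sitemaps.org/schemas/sitemap/0.9';
// Accepts YYYY-MM-DD or YYYY-MM-DDTHH:MM:SS+00:00, the two formats named in this contract.
const LASTMOD_RE = /^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2}[+-]\d{2}:\d{2})?$/;

/** Return a list of contract violations found in the sitemap body (empty if none). */
function checkSitemap(xml, baseUrl) {
  const problems = [];
  if (!xml.includes(`<urlset xmlns="${SITEMAP_NS}">`)) {
    problems.push('missing <urlset> root with sitemap namespace');
  }
  const urlBlocks = xml.match(/<url>[\s\S]*?<\/url>/g) ?? [];
  if (urlBlocks.length > 50000) problems.push('more than 50,000 <url> entries');
  for (const block of urlBlocks) {
    const loc = /<loc>([^<]*)<\/loc>/.exec(block)?.[1];
    if (!loc || !loc.startsWith(`${baseUrl}/documents/`)) {
      problems.push(`bad <loc>: ${loc}`);
    }
    const lastmod = /<lastmod>([^<]*)<\/lastmod>/.exec(block)?.[1];
    if (lastmod !== undefined && !LASTMOD_RE.test(lastmod)) {
      problems.push(`bad <lastmod>: ${lastmod}`);
    }
  }
  return problems;
}
```

A contract test would then assert that `checkSitemap(body, baseUrl)` returns an empty array for every 200 response.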
---
## Breaking Changes
Changes that constitute breaking changes (require MAJOR version bump):
1. Changing URL format from `/documents/{id}` to different format
2. Changing XML namespace or root element structure
3. Removing `<lastmod>` field entirely
4. Changing error response status codes
5. Adding required query parameters
6. Changing response Content-Type
---
## References
- [Sitemap Protocol Specification](https://www.sitemaps.org/protocol.html)
- [Google Sitemap Guidelines](https://developers.google.com/search/docs/crawling-indexing/sitemaps/build-sitemap)
- [XML Specification](https://www.w3.org/TR/xml/)
- [ISO 8601 Date Format](https://en.wikipedia.org/wiki/ISO_8601)
---
## Version History
| Version | Date | Changes |
|---------|------|---------|
| 1.0.0 | 2026-03-07 | Initial contract specification |
---
## Summary
This contract defines the complete API specification for the `/sitemap.xml` endpoint, including:
1. **Request/response formats** with examples
2. **Error handling** with all status codes (404, 413, 429, 401, 503, 500)
3. **XML schema requirements** for sitemap format
4. **Validation criteria** for contract testing
5. **Breaking change policy** for version management
All error responses follow the spec requirement: **status code only, no response body** (except 429 which includes `Retry-After` header).

View File

@@ -0,0 +1,493 @@
# Data Model: Google Drive HTTP Proxy Adapter
**Feature**: 001-drive-proxy-adapter
**Phase**: 1 - Design & Contracts
**Date**: 2026-03-07
## Overview
This document defines the data structures, entities, and their relationships for the Google Drive HTTP Proxy Adapter. The system is stateless (no persistence layer) with all entities representing runtime state or API payloads.
---
## Core Entities
### 1. Document
Represents a file in Google Drive. Extracted from Drive API response.
**JSDoc Type Definition**:
```javascript
/**
* @typedef {Object} Document
* @property {string} id - Google Drive file ID (unique identifier)
* @property {string} name - Document title/filename
* @property {string} mimeType - MIME type (e.g., 'application/pdf', 'text/plain')
* @property {string} [modifiedTime] - ISO 8601 timestamp of last modification (optional)
*/
```
**Validation Rules**:
- `id`: REQUIRED, non-empty string
- `name`: REQUIRED, non-empty string
- `mimeType`: REQUIRED, non-empty string
- `modifiedTime`: OPTIONAL, must be valid ISO 8601 format if present
**Source**: Drive API `files.list()` response with fields: `files(id, name, mimeType, modifiedTime)`
**Usage**:
- Retrieved during sitemap generation
- Transformed into SitemapEntry for XML output
- No filtering by mimeType (all file types included per spec)
---
### 2. SitemapEntry
Represents a single URL entry in the XML sitemap.
**JSDoc Type Definition**:
```javascript
/**
* @typedef {Object} SitemapEntry
* @property {string} loc - Absolute URL to document (RESTful format: /documents/{id})
* @property {string} [lastmod] - ISO 8601 date of last modification (YYYY-MM-DD format)
*/
```
**Validation Rules**:
- `loc`: REQUIRED, must be absolute URL (http:// or https://), properly escaped XML special chars
- `lastmod`: OPTIONAL, must be ISO 8601 date format (YYYY-MM-DD or YYYY-MM-DDTHH:MM:SS+00:00)
**Transformation from Document**:
```javascript
/**
 * Transform Document to SitemapEntry
 * @param {Document} doc - Source document from Drive API
 * @param {string} baseUrl - Base URL for sitemap (from config)
 * @returns {SitemapEntry}
 */
function toSitemapEntry(doc, baseUrl) {
  return {
    loc: `${baseUrl}/documents/${encodeURIComponent(doc.id)}`,
    lastmod: doc.modifiedTime ? new Date(doc.modifiedTime).toISOString().split('T')[0] : undefined
  };
}
```
**Usage**:
- Generated during XML sitemap construction
- Each entry becomes `<url><loc>...</loc><lastmod>...</lastmod></url>` in XML
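The step from a list of `SitemapEntry` objects to the final XML string could be sketched as follows. This assumes entries hold unescaped URLs (as `toSitemapEntry` produces) and that escaping happens at render time; `renderSitemap` is a hypothetical name, and the real `xml-utils.js` may be organized differently.

```javascript
/** Render SitemapEntry objects into a sitemap XML string (illustrative sketch). */
function renderSitemap(entries) {
  const entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&apos;' };
  const escapeXml = (s) => s.replace(/[&<>"']/g, (ch) => entities[ch]);

  const urls = entries.map(({ loc, lastmod }) => {
    // <lastmod> is optional: omit the tag entirely when the entry has no date.
    const lastmodTag = lastmod ? `\n    <lastmod>${escapeXml(lastmod)}</lastmod>` : '';
    return `  <url>\n    <loc>${escapeXml(loc)}</loc>${lastmodTag}\n  </url>`;
  });

  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    '</urlset>',
  ].join('\n') + '\n';
}
```

With an empty array the function returns just the XML declaration and an empty `<urlset>`, which matches the empty-Drive case.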
---
### 3. HTTPRequestContext
Represents the context for an incoming HTTP request.
**JSDoc Type Definition**:
```javascript
/**
* @typedef {Object} HTTPRequestContext
* @property {string} requestId - Unique identifier for request tracing (UUID)
* @property {string} method - HTTP method (e.g., 'GET')
* @property {string} path - Request path (e.g., '/sitemap.xml')
* @property {string} clientIp - Client IP address
* @property {number} timestamp - Request start time (Unix timestamp in ms)
*/
```
**Validation Rules**:
- `requestId`: REQUIRED, unique per request (generated via crypto.randomUUID())
- `method`: REQUIRED, HTTP method string
- `path`: REQUIRED, URL path string
- `clientIp`: REQUIRED, IP address string
- `timestamp`: REQUIRED, positive integer
**Generation**:
```javascript
import { randomUUID } from 'crypto';

/**
 * Create request context from incoming HTTP request
 * @param {http.IncomingMessage} req - Node.js HTTP request object
 * @returns {HTTPRequestContext}
 */
function createRequestContext(req) {
  return {
    requestId: randomUUID(),
    method: req.method,
    path: req.url, // raw request target; includes the query string if one was sent
    clientIp: req.socket.remoteAddress,
    timestamp: Date.now()
  };
}
```
**Usage**:
- Created at request entry point
- Used for logging (trace requests through logs)
- Passed to queue for processing
---
### 4. ServiceAccountCredentials
Represents Google Service Account JWT authentication credentials.
**JSDoc Type Definition**:
```javascript
/**
* @typedef {Object} ServiceAccountCredentials
* @property {string} client_email - Service Account email address
* @property {string} private_key - RSA private key (PEM format)
* @property {string} project_id - Google Cloud project ID
* @property {string} [token_uri] - OAuth token endpoint (default: googleapis.com)
*/
```
**Validation Rules**:
- `client_email`: REQUIRED, valid email format ending with `.gserviceaccount.com`
- `private_key`: REQUIRED, must start with `-----BEGIN PRIVATE KEY-----`
- `project_id`: REQUIRED, non-empty string
- `token_uri`: OPTIONAL, defaults to Google's OAuth endpoint
**Source**: Loaded from `GOOGLE_SERVICE_ACCOUNT_KEY` environment variable (inline JSON)
**Validation Function**:
```javascript
/**
 * Validate Service Account credentials structure
 * @param {Object} creds - Parsed JSON credentials
 * @throws {Error} If validation fails
 */
function validateCredentials(creds) {
  if (!creds.client_email || !creds.client_email.endsWith('.gserviceaccount.com')) {
    throw new Error('Invalid client_email in Service Account credentials');
  }
  if (!creds.private_key || !creds.private_key.startsWith('-----BEGIN PRIVATE KEY-----')) {
    throw new Error('Invalid private_key in Service Account credentials');
  }
  if (!creds.project_id) {
    throw new Error('Missing project_id in Service Account credentials');
  }
}
```
**Security**:
- NEVER log `private_key` field
- Mask in logs: `client_email: xxx***@project.iam.gserviceaccount.com`
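One way to follow the two logging rules above is to derive a log-safe view of the credentials before anything is written. In this sketch the function names are hypothetical, and keeping the first three characters of the local part is an assumption modeled on the `xxx***@...` pattern shown above.

```javascript
/** Mask the local part of a Service Account email for log output. */
function maskClientEmail(email) {
  const at = email.indexOf('@');
  if (at < 0) return '***'; // not an email: reveal nothing
  return `${email.slice(0, Math.min(3, at))}***${email.slice(at)}`;
}

/** Return a copy of the credentials that is safe to log (private_key is never copied). */
function toLoggable(creds) {
  return {
    client_email: maskClientEmail(creds.client_email),
    project_id: creds.project_id,
  };
}

console.log(maskClientEmail('drive-sitemap-adapter@my-project.iam.gserviceaccount.com'));
// prints dri***@my-project.iam.gserviceaccount.com
```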
---
### 5. Configuration
Represents application runtime configuration.
**JSDoc Type Definition**:
```javascript
/**
* @typedef {Object} ServerConfig
* @property {number} port - HTTP server port
* @property {string} baseUrl - Base URL for sitemap links (absolute URL)
*/
/**
* @typedef {Object} DriveConfig
* @property {string} query - Drive API query filter (q parameter)
* @property {string} fields - Fields to retrieve from Drive API
* @property {number} pageSize - Maximum results per page (Drive API pagination)
* @property {string} scope - OAuth scope for Drive access
*/
/**
* @typedef {Object} Configuration
* @property {ServerConfig} server - HTTP server configuration
* @property {DriveConfig} drive - Google Drive API configuration
*/
```
**Default Values**:
```javascript
const DEFAULT_CONFIG = {
  server: {
    port: 3000,
    baseUrl: 'http://localhost:3000'
  },
  drive: {
    query: 'trashed = false',
    // nextPageToken must be requested explicitly, otherwise pagination cannot continue
    fields: 'nextPageToken, files(id, name, mimeType, modifiedTime)',
    pageSize: 1000,
    scope: 'https://www.googleapis.com/auth/drive.readonly'
  }
};
```
**Loading**:
- `config/config.js`: Exports server configuration (port, baseUrl from env vars)
- `config/settings.js`: Exports Drive configuration (query from env var, loaded into global `settings`)
**Validation**:
- `port`: Must be 1-65535
- `baseUrl`: Must be valid absolute URL (http:// or https://)
- `query`: Non-empty string (Drive API query syntax)
- `pageSize`: 1-1000 (Drive API limit)
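The four validation rules above could be checked at startup with something like the sketch below. `validateConfig` is a hypothetical name, and returning a list of messages (rather than throwing) is a design assumption that lets the caller log every problem before exiting.

```javascript
/** Validate a Configuration object; returns a list of error messages (empty if valid). */
function validateConfig(config) {
  const errors = [];
  const { port, baseUrl } = config.server;
  const { query, pageSize } = config.drive;

  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    errors.push('port must be an integer in 1-65535');
  }

  let parsed = null;
  try {
    parsed = new URL(baseUrl); // throws on relative or malformed URLs
  } catch {
    // parsed stays null and is reported below
  }
  if (!parsed || !['http:', 'https:'].includes(parsed.protocol)) {
    errors.push('baseUrl must be an absolute http:// or https:// URL');
  }

  if (typeof query !== 'string' || query.trim() === '') {
    errors.push('query must be a non-empty string');
  }
  if (!Number.isInteger(pageSize) || pageSize < 1 || pageSize > 1000) {
    errors.push('pageSize must be an integer in 1-1000');
  }
  return errors;
}
```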
---
### 6. RequestQueue
Represents the FIFO queue for /sitemap.xml requests.
**JSDoc Type Definition**:
```javascript
/**
* @typedef {Object} QueuedRequest
* @property {Function} handler - Async function to execute (returns Promise)
* @property {Function} resolve - Promise resolve callback
* @property {Function} reject - Promise reject callback
*/
/**
* @typedef {Object} RequestQueue
* @property {boolean} processing - Whether a request is currently being processed
* @property {QueuedRequest[]} queue - Array of pending requests (FIFO)
*/
```
**State Transitions**:
```
IDLE (processing: false, queue: [])
↓ New request arrives
PROCESSING (processing: true, queue: [])
↓ New request arrives while processing
PROCESSING (processing: true, queue: [req1])
↓ Current request completes
PROCESSING (processing: true, queue: []) → Process req1
↓ req1 completes, queue empty
IDLE (processing: false, queue: [])
```
**Operations**:
- `enqueue(handler)`: Add request to queue, start processing if idle
- `processNext()`: Process next request in FIFO order, recursively call until queue empty
**Implementation**: See research.md Section 3 for EventEmitter-based code pattern
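For orientation, the state transitions and the two operations can be sketched with plain promises. This is not the EventEmitter-based pattern from research.md, which is not reproduced here; it is a simplified stand-in that uses a loop where the description above mentions recursion.

```javascript
/** Minimal FIFO queue: runs one async handler at a time, in arrival order. */
class RequestQueue {
  constructor() {
    this.processing = false; // IDLE
    this.queue = [];
  }

  /** Add a handler; the returned promise settles with the handler's own outcome. */
  enqueue(handler) {
    return new Promise((resolve, reject) => {
      this.queue.push({ handler, resolve, reject });
      if (!this.processing) this.processNext();
    });
  }

  /** Drain the queue sequentially, then return to IDLE. */
  async processNext() {
    this.processing = true;
    while (this.queue.length > 0) {
      const { handler, resolve, reject } = this.queue.shift();
      try {
        resolve(await handler());
      } catch (err) {
        reject(err); // a failed request must not block the ones behind it
      }
    }
    this.processing = false;
  }
}
```

A request handler would wrap its work as `await queue.enqueue(() => generateSitemap())`, where `generateSitemap` stands in for whatever function builds the response.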
---
## State Machines
### Authentication State
```
UNINITIALIZED
↓ Load credentials from env var
VALIDATING
↓ Parse JSON, validate structure
├─ Success → AUTHENTICATED
└─ Failure → FATAL_ERROR (exit(1))
AUTHENTICATED
↓ Token expiry detected during request
REFRESHING
├─ Success → AUTHENTICATED
└─ Failure → UNAUTHORIZED (return 401)
```
**Note**: googleapis SDK manages token refresh automatically. Our code only handles:
1. Initial credential loading/validation (startup)
2. Error mapping (401 if refresh fails during request)
---
### Request Processing State
```
RECEIVED
↓ Create RequestContext, log request
QUEUED
↓ Wait for queue availability (FIFO)
PROCESSING
↓ Query Drive API
├─ Success (≤50k docs) → GENERATING_XML
├─ Error (>50k docs) → PAYLOAD_TOO_LARGE (413)
├─ Error (Rate limit) → RATE_LIMITED (429 + Retry-After)
├─ Error (503) → SERVICE_UNAVAILABLE (503, no retry)
└─ Error (Other) → INTERNAL_ERROR (500)
GENERATING_XML
↓ Build sitemap XML from documents
├─ Success → COMPLETED (200 + XML)
└─ Error → INTERNAL_ERROR (500)
COMPLETED
↓ Log response, return to client
```
---
## Data Flow Diagrams
### Sitemap Generation Flow
```
[Client] --GET /sitemap.xml--> [Server]
    ↓
[Create RequestContext]
    ↓
[Enqueue in RequestQueue]
    ↓
[Wait for queue slot (FIFO)]
    ↓
[Query Drive API files.list()]
    ↓
[Paginate through results]
    ↓
[Check count ≤ 50,000]
    ↓
YES ←─────┴─────→ NO
 ↓                 ↓
[Transform Documents]   [Return 413]
 to SitemapEntries
 ↓
[Generate XML string]
 ↓
[Return 200 + XML]
```
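The pagination and limit-check steps of this flow can be sketched independently of the Drive client. Here `listPage` is a hypothetical wrapper around Drive `files.list` that takes a page token and resolves to `{ files, nextPageToken }`; the error class name is also an assumption.

```javascript
const MAX_SITEMAP_URLS = 50000;

/** Raised when the Drive holds more documents than one sitemap may list. */
class TooManyDocumentsError extends Error {}

/**
 * Collect every page of results, aborting as soon as the limit is exceeded
 * so that no further API calls are made for an oversized Drive.
 */
async function collectDocuments(listPage, limit = MAX_SITEMAP_URLS) {
  const documents = [];
  let pageToken; // undefined requests the first page
  do {
    const { files = [], nextPageToken } = await listPage(pageToken);
    documents.push(...files);
    if (documents.length > limit) {
      throw new TooManyDocumentsError(`more than ${limit} documents`);
    }
    pageToken = nextPageToken;
  } while (pageToken);
  return documents;
}
```

The caller would map `TooManyDocumentsError` to a 413 response and pass the returned array on to the transformation step.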
### Error Handling Flow
```
[Error Occurs]
    ↓
[Identify Error Type]
    ├─ Drive API 429 → Extract rate limit info → Set Retry-After → 429
    ├─ Drive API 503 → No retry → 503
    ├─ Document count > 50k → 413
    ├─ Token refresh failed → 401
    ├─ Invalid endpoint → 404
    └─ Unknown error → Log stack → 500
    ↓
[Set status code, NO response body]
    ↓
[Log error to stderr with context]
    ↓
[Return response to client]
```
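The branch table in this flow maps naturally onto a lookup. The sketch below assumes errors are tagged with a `kind` string and an optional `retryAfterSeconds`; both property names, the kind values, and the 60-second fallback are illustrative assumptions rather than part of the specification.

```javascript
// Error kinds are hypothetical labels; the status codes come from the flow above.
const STATUS_BY_KIND = new Map([
  ['NOT_FOUND', 404],
  ['TOO_MANY_DOCUMENTS', 413],
  ['RATE_LIMITED', 429],
  ['AUTH_FAILED', 401],
  ['DRIVE_UNAVAILABLE', 503],
]);

/** Map an error to a status code and headers; the body is always empty. */
function toErrorResponse(err) {
  const status = STATUS_BY_KIND.get(err.kind) ?? 500; // unknown errors become 500
  const headers = {};
  if (status === 429) {
    // Fallback of 60 seconds is an assumption for when upstream gives no hint.
    headers['Retry-After'] = String(err.retryAfterSeconds ?? 60);
  }
  return { status, headers };
}

/** Write the error response: status code and headers only, no body. */
function sendError(res, err) {
  const { status, headers } = toErrorResponse(err);
  res.writeHead(status, headers);
  res.end();
}
```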
---
## API Response Formats
### Successful Sitemap Response (200 OK)
**Headers**:
```
Content-Type: application/xml; charset=utf-8
Content-Length: {size}
```
**Body** (XML):
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/documents/1A2B3C4D</loc>
    <lastmod>2026-03-07</lastmod>
  </url>
  <url>
    <loc>http://example.com/documents/5E6F7G8H</loc>
    <lastmod>2026-03-06</lastmod>
  </url>
</urlset>
```
### Error Responses (4xx/5xx)
**All error responses**:
- **Headers**: No Content-Type (empty body)
- **Body**: Empty (per spec: status code only, no body)
- **Special case**: 429 includes `Retry-After: {seconds}` header
**Status codes**:
- 404 Not Found: Invalid endpoint
- 413 Payload Too Large: >50,000 documents
- 429 Too Many Requests: Drive API rate limit (includes Retry-After header)
- 401 Unauthorized: Token refresh failed
- 503 Service Unavailable: Drive API unavailable (no retry)
- 500 Internal Server Error: Unexpected error
---
## Validation Rules Summary
### Input Validation
- Environment variables:
- `GOOGLE_SERVICE_ACCOUNT_KEY`: Required, valid JSON with client_email/private_key
- `PORT`: Optional, 1-65535
- `BASE_URL`: Optional, valid absolute URL
- `DRIVE_QUERY`: Optional, non-empty string
### Output Validation
- Sitemap XML:
- Valid XML structure (well-formed)
- Proper namespace declaration
- All URLs properly escaped (XML entities: &, <, >, ", ')
- All URLs absolute (include protocol + domain)
- Document count ≤ 50,000
### Runtime Validation
- HTTP requests:
- Only GET method for /sitemap.xml (others return 404)
- Only /sitemap.xml path supported (others return 404)
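The two runtime rules amount to a single routing predicate. In this sketch the function names are hypothetical, and ignoring any query string when matching the path is an assumption, since the contract only states that the endpoint takes no query parameters.

```javascript
/** Decide whether a request targets the one supported route. */
function isSitemapRequest(method, rawUrl) {
  let pathname;
  try {
    // The base is only needed so that relative request targets can be parsed.
    ({ pathname } = new URL(rawUrl, 'http://placeholder.invalid'));
  } catch {
    return false; // unparseable request target
  }
  return method === 'GET' && pathname === '/sitemap.xml';
}

/** Route a request: everything except GET /sitemap.xml gets an empty 404. */
function route(req, res, handleSitemap) {
  if (!isSitemapRequest(req.method, req.url)) {
    res.writeHead(404);
    res.end();
    return;
  }
  handleSitemap(req, res);
}
```

`route` has the shape of a Node.js request listener, so it could be passed to `http.createServer((req, res) => route(req, res, handleSitemap))`.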
---
## Edge Cases & Error Handling
| Scenario | Data Impact | Response |
|----------|-------------|----------|
| Empty Drive (0 documents) | Empty urlset in XML | 200 OK with empty sitemap |
| Exactly 50,000 documents | Valid sitemap | 200 OK |
| 50,001 documents | Abort XML generation | 413 Payload Too Large |
| Drive API pagination (>1000 docs) | Multiple API calls, single result set | 200 OK after all pages collected |
| Document with special chars in ID | URL-encode document ID | Properly encoded loc URL |
| Document with no modifiedTime | SitemapEntry.lastmod undefined | Omit `<lastmod>` tag from XML |
| Concurrent requests | Queue up to N requests | Process sequentially (FIFO) |
| Request while processing | Add to queue array | Wait for turn, then process |
| Fatal error (invalid creds) | Cannot initialize auth client | Log error, exit(1) |
| Port already in use | Cannot bind server | Log error, exit(1) |
---
## Performance Considerations
### Memory Usage
- **Document array**: ~100 bytes per document × 50k max = ~5MB peak
- **XML string**: ~200 bytes per entry × 50k max = ~10MB peak
- **Total estimated**: ~15MB for these two structures at max load; ~20MB allowing for runtime overhead (within 256MB constraint)
### API Call Efficiency
- Use `fields` parameter to request only needed data (reduces payload size)
- Pagination: 1000 documents per page (Drive API limit)
- For 50k documents: ~50 API calls (sequential, within same request processing)
### Caching Strategy
- **NO CACHING**: Per spec requirement "each sitemap request fetches current list"
- Fresh data on every request (trade-off: latency vs. freshness)
---
## Summary
This data model provides:
1. **Clear entity definitions** with JSDoc type annotations (per constitution: JavaScript + JSDoc)
2. **Validation rules** for all inputs and outputs
3. **State machines** for authentication and request processing
4. **Data flow diagrams** showing transformation pipelines
5. **Error handling patterns** for all edge cases
6. **Performance constraints** aligned with success criteria (<256MB memory, <5s response time)
All entities are stateless runtime structures - no persistence layer required.

156
specs/001-sitemap/plan.md Normal file
View File

@@ -0,0 +1,156 @@
# Implementation Plan: Google Drive HTTP Proxy Adapter
**Branch**: `001-drive-proxy-adapter` | **Date**: 2026-03-07 | **Spec**: [spec.md](./spec.md)
**Input**: Feature specification from `/specs/001-drive-proxy-adapter/spec.md`
**Note**: This template is filled in by the `/speckit.plan` command. See `.specify/templates/plan-template.md` for the execution workflow.
## Summary
Build a Node.js HTTP server that exposes a single `/sitemap.xml` endpoint and generates XML sitemaps of Google Drive documents. The system authenticates with a Service Account (JWT-based), queries the Drive API for accessible documents, and returns a sitemap with RESTful URLs (`/documents/{documentId}`). Key features: FIFO request queuing for concurrent requests, configurable Drive API filters via `config/settings.js`, a 413 response when more than 50,000 documents exist, plain text logging to stdout/stderr, and an immediate crash (exit code 1) on fatal errors. All clarifications from 3 sessions (10 Q&A pairs in total) are now incorporated into the design.
## Technical Context
**Language/Version**: JavaScript ES2022+ (Node.js LTS v18.0.0+)
**Primary Dependencies**:
- `googleapis` v140.0.0 (Google Drive API client - justified: official Google SDK, handles OAuth2/JWT complexity, Drive API protocol implementation)
- Node.js built-ins: `http`, `fs`, `path`, `events` (for FIFO queue), `crypto` (for `randomUUID` request tracing)
**Storage**: N/A (no persistence - sitemap generated on-demand from Drive API)
**Testing**: Node.js native test runner (`node:test`) with unit, integration, and contract test suites
**Target Platform**: Linux/macOS server environment, containerizable
**Project Type**: Web service (HTTP proxy adapter with monolithic route architecture)
**Performance Goals**:
- `/sitemap.xml` response < 5 seconds for drives with ≤10k documents
- Handle 10 concurrent requests (queued FIFO, processed sequentially)
- Startup time < 10 seconds (cold start to accepting requests)
**Constraints**:
- Memory usage < 256MB under normal load
- No file-based logging (stdout/stderr only)
- No retries on Drive API 503 errors (fail immediately)
- 50,000 document limit (sitemap protocol constraint)
- FIFO queue for /sitemap.xml requests (one at a time to prevent concurrent Drive API operations)
**Scale/Scope**:
- Single endpoint (`/sitemap.xml`)
- Support up to 50k Drive documents (enforced limit)
- 95% success rate for sitemap requests
- Service Account JWT tokens refresh automatically
## Constitution Check
_GATE: Must pass before Phase 0 research. Re-check after Phase 1 design._
### ✅ I. Monolithic Architecture
- **Status**: COMPLIANT
- **Rationale**: All proxy logic in `src/proxy.js`, routed from `src/server.js`. Configuration in `config/settings.js` (Drive API filter), loaded into global `settings`. Logging uses `src/console.js` (aliased as `console.js` with log/info/debug/error functions).
- **Phase 1 Verification**: data-model.md confirms stateless architecture, no persistence layer. All entities are runtime structures (Document, SitemapEntry, HTTPRequestContext, RequestQueue). Monolithic route pattern maintained.
### ✅ II. API-First Design
- **Status**: COMPLIANT
- **Rationale**: Single API endpoint `/sitemap.xml` fully specified in spec.md with RESTful URL format, HTTP status codes (200, 404, 413, 429, 503), and XML response format (sitemap protocol). Error handling documented (no response body, status codes only).
- **Phase 1 Verification**: contracts/sitemap-xml-schema.md provides complete API contract with request/response formats, XML schema requirements, validation criteria, and version history. quickstart.md documents API usage with examples.
### ⚠️ III. Test-First Development (NON-NEGOTIABLE)
- **Status**: TO BE VERIFIED IN PHASE 2
- **Action Required**: Tasks.md must include test-first workflow:
1. Write failing unit tests for Drive API client, JWT auth, sitemap generator
2. Write failing integration tests for /sitemap.xml endpoint (200, 413, 429, 503 scenarios)
3. Write failing contract tests for XML sitemap format validation
4. Obtain user approval of test scenarios before implementation
5. Implement minimum code to pass tests (80%+ coverage requirement)
- **Phase 1 Note**: Test structure defined in plan.md (tests/unit/, tests/integration/, tests/contract/) and quickstart.md documents test execution commands.
### ✅ IV. Security & Privacy by Default
- **Status**: COMPLIANT
- **Rationale**: Service Account credentials loaded from `GOOGLE_SERVICE_ACCOUNT_KEY` env var (inline JSON), never logged. JWT tokens handled by googleapis SDK. No user data stored (stateless sitemap generation). Drive API read-only scope (`https://www.googleapis.com/auth/drive.readonly`).
- **Phase 1 Verification**: data-model.md includes security note on ServiceAccountCredentials entity: "NEVER log private_key field, mask client_email in logs". quickstart.md documents security best practices section.
### ✅ V. Observability & Debuggability
- **Status**: COMPLIANT
- **Rationale**: Plain text logging format `[timestamp] [level] message` to stdout/stderr. Request logging includes endpoint + response status. Error logging includes error messages for debugging. Fatal errors logged to stderr before crashing with exit code 1.
- **Phase 1 Verification**: research.md Section 5 details logging implementation with formatMessage function and log event capture list. data-model.md includes HTTPRequestContext entity with requestId for tracing.
### ✅ VI. Semantic Versioning & Change Management
- **Status**: COMPLIANT
- **Rationale**: package.json at v1.0.0. Single endpoint API `/sitemap.xml` - breaking changes would require version bump and migration guide. Sitemap XML format follows public sitemap protocol standard.
- **Phase 1 Verification**: contracts/sitemap-xml-schema.md includes "Breaking Changes" section defining what constitutes MAJOR version bump. Version history table tracks changes. quickstart.md versioned at 1.0.0.
### ✅ VII. Simplicity, Minimal Dependencies & YAGNI
- **Status**: COMPLIANT WITH JUSTIFICATION
- **Dependencies**:
  - ✅ `googleapis@140.0.0` - **JUSTIFIED**: Official Google SDK, handles complex OAuth2/JWT flow, implements Drive API v3 protocol, active maintenance. Alternative (manual JWT + REST calls) would take >2 days and risk protocol errors.
  - ✅ Node.js built-ins: `http` (server), `fs` (config loading), `path` (file paths), `events` (FIFO queue via EventEmitter), `crypto` (randomUUID for request tracing)
- **No speculative features**: Only implements /sitemap.xml endpoint (document export removed from scope in Session 2). No caching, no health checks, no admin UI.
- **YAGNI applied**: Rejected retry logic (per spec: fail immediately on 503), rejected file logging (stdout/stderr only), rejected concurrent processing (FIFO queue mandated).
- **Phase 1 Verification**: research.md Section 6 documents Technology Stack Validation - confirms only googleapis as external dependency. data-model.md uses only built-in types (no ORM, no database). quickstart.md confirms minimal dependencies section.
### Constitution Check Summary (Post-Phase 1)
- **PASS**: All 7 constitutional principles satisfied after Phase 1 design
- **Action Items**: Phase 2 tasks.md must enforce TDD workflow with test approval gate
- **Design Artifacts Complete**:
- ✅ research.md - All technical unknowns resolved
- ✅ data-model.md - Entities, state machines, validation rules documented
- ✅ contracts/sitemap-xml-schema.md - Complete API contract with examples
- ✅ quickstart.md - Installation, configuration, usage, troubleshooting guide
- ✅ Agent context updated - Copilot instructions.md includes language/database/project type
## Project Structure
### Documentation (this feature)
```text
specs/001-drive-proxy-adapter/
├── plan.md # This file (/speckit.plan command output)
├── research.md # Phase 0 output (/speckit.plan command)
├── data-model.md # Phase 1 output (/speckit.plan command)
├── quickstart.md # Phase 1 output (/speckit.plan command)
├── contracts/ # Phase 1 output (/speckit.plan command)
│ └── sitemap-xml-schema.md
└── tasks.md # Phase 2 output (/speckit.tasks command - NOT created by /speckit.plan)
```
### Source Code (repository root)
```text
# Single project - Monolithic proxy architecture (per Constitution Principle I)
src/
├── server.js # HTTP server entry point, routes all requests to proxy.js
├── proxy.js # Monolithic route handler (all sitemap logic inline)
├── console.js # Logging module (console.js alias: log/info/debug/error)
├── auth.js # Service Account JWT authentication (googleapis wrapper)
├── utils.js # Inline utility functions (if needed - prefer inline in proxy.js)
└── xml-utils.js # XML generation utilities (sitemap format)
config/
├── config.js # Server configuration (port, base URL) - JSON export
└── settings.js # Drive API query filter configuration - loaded into global `settings`
tests/
├── contract/ # XML sitemap format validation tests
│ └── sitemap-schema.test.js
├── integration/ # End-to-end /sitemap.xml endpoint tests
│ ├── sitemap-endpoint.test.js
│ ├── error-scenarios.test.js
│ └── queue-concurrency.test.js
└── unit/ # Unit tests for Drive API client, JWT, sitemap generator
├── drive-client.test.js
├── auth.test.js
├── sitemap-generator.test.js
└── queue.test.js
```
**Structure Decision**: Single project with monolithic architecture. All proxy logic consolidated in `src/proxy.js` per Constitution Principle I. The `server.js` routes all requests to `proxy.js`. Configuration split between `config/config.js` (server settings) and `config/settings.js` (Drive API filter - loaded into global `settings` variable). Testing organized by contract/integration/unit layers to support TDD workflow (Constitution Principle III).
## Complexity Tracking
> **Fill ONLY if Constitution Check has violations that must be justified**
**NO VIOLATIONS** - All constitutional principles satisfied. No complexity justification required.

View File

@@ -0,0 +1,495 @@
# Quickstart Guide: Google Drive HTTP Proxy Adapter
**Feature**: 001-drive-proxy-adapter
**Date**: 2026-03-07
**Version**: 1.0.0
---
## Overview
The Google Drive HTTP Proxy Adapter is a Node.js application that generates XML sitemaps of Google Drive documents. It provides a single HTTP endpoint (`/sitemap.xml`) that queries the Google Drive API and returns a sitemap listing all accessible documents with links in RESTful format.
**Key Features**:
- Service Account authentication (JWT-based, no user interaction)
- Sitemap protocol compliant (50,000 URL limit enforced)
- FIFO request queuing (sequential processing)
- Configurable Drive API filters
- Plain text logging to stdout/stderr
---
## Prerequisites
1. **Node.js**: v18.0.0 or later (LTS version recommended)
2. **Google Cloud Project**: With Drive API enabled
3. **Service Account**: JSON key file with Drive API access
4. **Network Access**: Connectivity to googleapis.com
---
## Installation
### 1. Clone Repository
```bash
git clone <repository-url>
cd google-drive-content-adapter
```
### 2. Install Dependencies
```bash
npm install
```
**Dependencies**:
- `googleapis@^140.0.0` - Official Google API client for Node.js
---
## Configuration
### 1. Service Account Setup
**Create Service Account** (Google Cloud Console):
1. Navigate to [IAM & Admin > Service Accounts](https://console.cloud.google.com/iam-admin/serviceaccounts)
2. Click "Create Service Account"
3. Name: `drive-sitemap-adapter` (or your choice)
4. Grant role: None required if accessing service account's own Drive
5. Click "Create Key" → Choose JSON format → Download key file
**Enable Drive API**:
1. Navigate to [APIs & Services > Library](https://console.cloud.google.com/apis/library)
2. Search for "Google Drive API"
3. Click "Enable"
**Grant Access** (if accessing user drives):
- Share Drive folders/files with Service Account email (`xxx@project.iam.gserviceaccount.com`)
- OR configure domain-wide delegation (for Google Workspace organizations, formerly G Suite)
---
### 2. Environment Variables
Create `.env` file in project root (or set environment variables):
```bash
# REQUIRED: Service Account credentials (inline JSON)
GOOGLE_SERVICE_ACCOUNT_KEY='{"type":"service_account","project_id":"your-project","private_key_id":"...","private_key":"-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n","client_email":"xxx@project.iam.gserviceaccount.com","client_id":"...","auth_uri":"https://accounts.google.com/o/oauth2/auth","token_uri":"https://oauth2.googleapis.com/token","auth_provider_x509_cert_url":"https://www.googleapis.com/oauth2/v1/certs","client_x509_cert_url":"..."}'
# OPTIONAL: Server configuration
PORT=3000 # Default: 3000
BASE_URL=http://localhost:3000 # Default: http://localhost:3000
# OPTIONAL: Drive API query filter
DRIVE_QUERY="trashed = false" # Default: "trashed = false"
```
**Important Notes**:
- `GOOGLE_SERVICE_ACCOUNT_KEY` must be a single-line JSON string (escape newlines in private_key)
- `BASE_URL` should match your production domain for sitemap URLs
- `DRIVE_QUERY` supports Drive API query syntax ([docs](https://developers.google.com/drive/api/guides/search-files))
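One way to produce the single-line value is to round-trip the downloaded key file through `JSON.parse` and `JSON.stringify`, which drops the pretty-printing while keeping the `\n` escapes inside `private_key`. A minimal sketch; `toSingleLineKey` is a hypothetical helper name and the key below is a placeholder, not part of the adapter:

```javascript
// Hypothetical helper: collapse a pretty-printed service account key file
// into the single-line JSON string expected by GOOGLE_SERVICE_ACCOUNT_KEY.
function toSingleLineKey(keyFileText) {
  // JSON.stringify without an indent argument emits no literal newlines;
  // newlines inside string values (such as private_key) stay escaped as \n.
  return JSON.stringify(JSON.parse(keyFileText));
}

// Example with placeholder values (not a real key).
const prettyKey = `{
  "type": "service_account",
  "client_email": "xxx@project.iam.gserviceaccount.com",
  "private_key": "-----BEGIN PRIVATE KEY-----\\n...\\n-----END PRIVATE KEY-----\\n"
}`;
const singleLine = toSingleLineKey(prettyKey);
console.log(singleLine.includes('\n')); // false
```

In practice the input would be the contents of the downloaded key file, and the result would be wrapped in single quotes when placed in `.env`.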
---
### 3. Configuration Files
**config/config.js**: Server settings (read from environment variables, with defaults)
```javascript
export default {
  server: {
    port: process.env.PORT || 3000,
    baseUrl: process.env.BASE_URL || 'http://localhost:3000'
  }
};
```
**config/settings.js**: Drive API configuration
```javascript
export default {
  drive: {
    query: process.env.DRIVE_QUERY || "trashed = false",
    // nextPageToken must be listed here, or paginated responses omit it
    fields: 'nextPageToken, files(id, name, mimeType, modifiedTime)',
    pageSize: 1000,
    scope: 'https://www.googleapis.com/auth/drive.readonly'
  }
};
```
**To customize Drive API filter**, edit `config/settings.js` or set `DRIVE_QUERY` env var.
---
## Usage
### Start Server (Development)
```bash
npm run dev
```
**Output**:
```
[2026-03-07T10:00:00.000Z] [INFO] Server configuration loaded: port=3000, baseUrl=http://localhost:3000
[2026-03-07T10:00:00.100Z] [INFO] Service Account authenticated: xxx***@project.iam.gserviceaccount.com
[2026-03-07T10:00:00.200Z] [INFO] HTTP server listening on port 3000
```
---
### Start Server (Production)
```bash
npm start
```
---
### Request Sitemap
**Using curl**:
```bash
curl http://localhost:3000/sitemap.xml
```
**Expected Response** (200 OK):
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://localhost:3000/documents/1A2B3C4D5E6F7G8H</loc>
    <lastmod>2026-03-07</lastmod>
  </url>
  <url>
    <loc>http://localhost:3000/documents/9I0J1K2L3M4N5O6P</loc>
    <lastmod>2026-03-05</lastmod>
  </url>
</urlset>
```
---
## Testing
### Run All Tests
```bash
npm test
```
**Test Suites**:
- `tests/unit/` - Unit tests for Drive client, auth, sitemap generator, queue
- `tests/integration/` - End-to-end endpoint tests for /sitemap.xml
- `tests/contract/` - XML sitemap schema validation tests
---
### Run Specific Test Suite
```bash
npm run test:unit # Unit tests only
npm run test:integration # Integration tests only
npm run test:contract # Contract tests only
```
---
## API Reference
### Endpoint: `GET /sitemap.xml`
**Description**: Generate XML sitemap of all accessible Google Drive documents.
**Request**:
```http
GET /sitemap.xml HTTP/1.1
Host: example.com
```
**Success Response** (200 OK):
```http
HTTP/1.1 200 OK
Content-Type: application/xml; charset=utf-8
Content-Length: {size}
```
**Error Responses**:
- `404 Not Found` - Invalid endpoint (only /sitemap.xml supported)
- `413 Payload Too Large` - More than 50,000 documents in Drive
- `429 Too Many Requests` - Rate limit exceeded (includes `Retry-After` header)
- `401 Unauthorized` - Authentication failed
- `503 Service Unavailable` - Drive API unavailable
- `500 Internal Server Error` - Unexpected error
**Note**: All error responses have **empty body** (status code only).
See [contracts/sitemap-xml-schema.md](./contracts/sitemap-xml-schema.md) for full API contract.
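The routing and status-only error behavior described above could be sketched as follows. This is an illustration under stated assumptions, not the adapter's actual code: `rejectStatus` and `sendStatusOnly` are hypothetical names, `res` stands for a Node.js `http.ServerResponse`, and treating non-GET methods as 404 is an assumption (the spec lists no 405).

```javascript
// Returns the status code for a request that must be rejected, or null
// when the request should be handed to the sitemap handler.
function rejectStatus(method, requestUrl) {
  // requestUrl is the raw request target, e.g. "/sitemap.xml?x=1".
  const { pathname } = new URL(requestUrl, 'http://localhost');
  // Assumption: anything other than GET /sitemap.xml is answered with 404.
  if (method !== 'GET' || pathname !== '/sitemap.xml') return 404;
  return null;
}

// Sends a status code with optional headers and an empty body.
function sendStatusOnly(res, status, headers = {}) {
  res.writeHead(status, { 'Content-Length': '0', ...headers });
  res.end();
}

console.log(rejectStatus('GET', '/sitemap.xml')); // null
console.log(rejectStatus('GET', '/documents/abc')); // 404
```

A 429 response would pass the header through the third argument, for example `sendStatusOnly(res, 429, { 'Retry-After': '30' })`.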
---
## Architecture
### Project Structure
```
google-drive-content-adapter/
├── src/
│   ├── server.js              # HTTP server entry point
│   ├── proxy.js               # Monolithic route handler (sitemap logic)
│   ├── logger.js              # Logging module (console.js alias)
│   ├── auth.js                # Service Account JWT authentication
│   └── xml-utils.js           # XML generation utilities
├── config/
│   ├── config.js              # Server configuration (port, baseUrl)
│   └── settings.js            # Drive API filter configuration
├── tests/
│   ├── unit/                  # Unit tests
│   ├── integration/           # Integration tests
│   └── contract/              # Contract tests
├── specs/                     # Feature specifications and planning docs
│   └── 001-drive-proxy-adapter/
│       ├── spec.md
│       ├── plan.md
│       ├── research.md
│       ├── data-model.md
│       ├── quickstart.md (this file)
│       └── contracts/
│           └── sitemap-xml-schema.md
├── package.json
└── README.md
```
---
### Request Flow
```
1. Client → GET /sitemap.xml
2. Server → Create RequestContext (ID, timestamp)
3. Server → Enqueue request (FIFO queue)
4. Queue → Process request (sequential, one at a time)
5. Proxy → Authenticate with Service Account JWT
6. Proxy → Query Drive API files.list() (paginate if >1000 docs)
7. Proxy → Check count ≤ 50,000
8. Proxy → Transform Documents to SitemapEntries
9. Proxy → Generate XML sitemap
10. Server → Return 200 + XML (or error status)
11. Queue → Process next request
```
---
## Troubleshooting
### 1. Fatal Error: Invalid Service Account Credentials
**Error**:
```
[2026-03-07T10:00:00.000Z] [ERROR] FATAL: Invalid client_email in Service Account credentials
```
**Solution**:
- Check `GOOGLE_SERVICE_ACCOUNT_KEY` env var is valid JSON
- Ensure `client_email` field ends with `.gserviceaccount.com`
- Ensure `private_key` field starts with `-----BEGIN PRIVATE KEY-----`
- Verify no extra escaping/quotes in JSON string
---
### 2. Fatal Error: Port Already in Use
**Error**:
```
[2026-03-07T10:00:00.000Z] [ERROR] FATAL: Unable to bind to port 3000 (EADDRINUSE)
```
**Solution**:
- Change `PORT` env var to different port (e.g., 8080)
- OR stop other process using port 3000: `lsof -ti:3000 | xargs kill`
---
### 3. 401 Unauthorized Response
**Cause**: Service Account token refresh failed
**Solution**:
- Verify Service Account has Drive API access (share folders with service account email)
- Check Drive API is enabled in Google Cloud Console
- Ensure scope is correct: `https://www.googleapis.com/auth/drive.readonly`
---
### 4. 413 Payload Too Large Response
**Cause**: The Drive query (`DRIVE_QUERY`) matches more than 50,000 documents
**Solution**:
- Adjust `DRIVE_QUERY` to filter documents (e.g., by folder, date, file type)
- Example: `DRIVE_QUERY="'folder-id' in parents and trashed = false"`
---
### 5. 429 Too Many Requests Response
**Cause**: Drive API rate limit exceeded
**Solution**:
- Wait for time specified in `Retry-After` response header (seconds)
- Reduce request frequency
- Consider Drive API quota limits ([docs](https://developers.google.com/drive/api/guides/limits))
---
### 6. 503 Service Unavailable Response
**Cause**: Google Drive API is temporarily unavailable
**Solution**:
- Wait and retry manually (no automatic retries per spec)
- Check [Google Workspace Status Dashboard](https://www.google.com/appsstatus)
---
## Performance Tips
### 1. Optimize Drive Query Filter
**Default** (all files):
```bash
DRIVE_QUERY="trashed = false"
```
**Filter by folder**:
```bash
DRIVE_QUERY="'folder-id' in parents and trashed = false"
```
**Filter by date**:
```bash
DRIVE_QUERY="modifiedTime > '2026-01-01T00:00:00' and trashed = false"
```
**Filter by MIME type**:
```bash
DRIVE_QUERY="mimeType = 'application/pdf' and trashed = false"
```
See [Drive API search query syntax](https://developers.google.com/drive/api/guides/search-files) for more options.
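When filters are combined programmatically, values need escaping before they are placed inside the single-quoted query terms. The following is a minimal sketch, assuming the Drive query syntax's backslash escaping for `'` and `\`; `buildDriveQuery` and `escapeQueryValue` are hypothetical helpers, not part of the adapter:

```javascript
// Escape a value for use inside a single-quoted Drive query term.
function escapeQueryValue(value) {
  return String(value).replace(/\\/g, '\\\\').replace(/'/g, "\\'");
}

// Assemble a `q` string from optional filters; trashed files are always excluded.
function buildDriveQuery({ folderId, mimeType, modifiedAfter } = {}) {
  const clauses = [];
  if (folderId) clauses.push(`'${escapeQueryValue(folderId)}' in parents`);
  if (mimeType) clauses.push(`mimeType = '${escapeQueryValue(mimeType)}'`);
  if (modifiedAfter) clauses.push(`modifiedTime > '${escapeQueryValue(modifiedAfter)}'`);
  clauses.push('trashed = false');
  return clauses.join(' and ');
}

console.log(buildDriveQuery({ mimeType: 'application/pdf' }));
// mimeType = 'application/pdf' and trashed = false
```

The resulting string can be exported as `DRIVE_QUERY` or assigned to `drive.query` in `config/settings.js`.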
---
### 2. Adjust BASE_URL for Production
**Development**:
```
BASE_URL=http://localhost:3000
```
**Production**:
```
BASE_URL=https://your-domain.com
```
This ensures sitemap URLs point to the correct domain.
---
### 3. Monitor Memory Usage
**Check memory usage** (production):
```bash
node --inspect src/server.js
# Open chrome://inspect in Chrome DevTools
```
**Expected**: <256MB under normal load (<10 concurrent requests)
---
## Security Best Practices
1. **Never commit** Service Account JSON key file to version control
2. **Use environment variables** for all sensitive configuration
3. **Restrict Service Account permissions** to minimum required (readonly scope)
4. **Monitor logs** for unauthorized access attempts
5. **Use HTTPS** in production (configure reverse proxy like nginx)
6. **Filter credentials from logs** (private_key field never logged)
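Point 6 can be approached by logging only a reduced, non-secret view of the credentials. A minimal sketch; `maskEmail` and `describeCredentials` are hypothetical helpers, and the masking format (first three characters, then `***`) is an assumption modeled on the sample startup log:

```javascript
// Keep the first three characters of the local part and hide the rest.
function maskEmail(email) {
  const [local, domain] = String(email).split('@');
  return `${local.slice(0, 3)}***@${domain}`;
}

// Build a loggable summary; private_key and private_key_id are never copied.
function describeCredentials(credentials) {
  return {
    project_id: credentials.project_id,
    client_email: maskEmail(credentials.client_email),
  };
}

console.log(maskEmail('xxx@project.iam.gserviceaccount.com'));
// xxx***@project.iam.gserviceaccount.com
```

Copying an allow-list of fields, rather than deleting known secrets, means a new secret field added to the key format is not logged by accident.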
---
## Deployment
### Docker (Recommended)
**Dockerfile**:
```dockerfile
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
EXPOSE 3000
CMD ["npm", "start"]
```
**Build and run**:
```bash
docker build -t drive-sitemap-adapter .
docker run -p 3000:3000 \
  -e GOOGLE_SERVICE_ACCOUNT_KEY='{"type":"service_account",...}' \
  -e BASE_URL=https://your-domain.com \
  drive-sitemap-adapter
```
---
### Cloud Platforms
**Google Cloud Run**:
```bash
gcloud run deploy drive-sitemap-adapter \
  --source . \
  --set-env-vars BASE_URL=https://your-domain.com \
  --set-secrets GOOGLE_SERVICE_ACCOUNT_KEY=service-account-key:latest
```
**AWS ECS / Fargate**: Use environment variables in task definition
**Heroku**: Set environment variables via Heroku CLI or dashboard
---
## Additional Resources
- **Feature Specification**: [specs/001-drive-proxy-adapter/spec.md](./spec.md)
- **Implementation Plan**: [specs/001-drive-proxy-adapter/plan.md](./plan.md)
- **Research Document**: [specs/001-drive-proxy-adapter/research.md](./research.md)
- **Data Model**: [specs/001-drive-proxy-adapter/data-model.md](./data-model.md)
- **API Contract**: [specs/001-drive-proxy-adapter/contracts/sitemap-xml-schema.md](./contracts/sitemap-xml-schema.md)
- **Google Drive API Docs**: [https://developers.google.com/drive/api/v3/reference](https://developers.google.com/drive/api/v3/reference)
- **Sitemap Protocol**: [https://www.sitemaps.org/protocol.html](https://www.sitemaps.org/protocol.html)
---
## Support
For issues or questions, refer to:
1. This quickstart guide
2. Feature specification (spec.md) for requirements
3. Research document (research.md) for technical decisions
4. Contract documentation (contracts/) for API details
---
## Version History
| Version | Date | Changes |
|---------|------|---------|
| 1.0.0 | 2026-03-07 | Initial quickstart guide |

View File

@@ -0,0 +1,368 @@
# Research: Google Drive HTTP Proxy Adapter
**Feature**: 001-drive-proxy-adapter
**Phase**: 0 - Outline & Research
**Date**: 2026-03-07
## Overview
This research document consolidates findings from all clarification sessions (10 Q&A pairs across 3 sessions) and investigates technical decisions for building a Node.js HTTP proxy adapter that generates XML sitemaps from Google Drive documents using Service Account authentication.
## Research Areas
### 1. Google Drive API Service Account Authentication
**Decision**: Use Service Account with JWT-based authentication (server-to-server, no user interaction)
**Rationale**:
- Service Account provides server-to-server authentication without user login flow
- JWT tokens generated programmatically from JSON key file (client_email + private_key)
- Ideal for proxy/adapter scenarios where application acts on behalf of domain users
- Tokens auto-refresh via googleapis SDK (handles expiry transparently)
**Implementation Approach**:
- Load JSON key file from environment variable `GOOGLE_SERVICE_ACCOUNT_KEY` (inline JSON string)
- Use `googleapis` npm package `google.auth.GoogleAuth` class with JWT configuration
- Set scope to `https://www.googleapis.com/auth/drive.readonly` (read-only access)
- SDK automatically manages token lifecycle (generation, refresh, caching)
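The approach above might look like the following sketch. `createDriveClient` is a hypothetical function name; `google` is the object exported by the `googleapis` package (`import { google } from 'googleapis'`) and is passed in as a parameter here only so the snippet has no module dependencies.

```javascript
// Build an authenticated Drive v3 client from the inline JSON key.
function createDriveClient(google, rawKey, scope) {
  const credentials = JSON.parse(rawKey); // contains client_email and private_key
  const auth = new google.auth.GoogleAuth({
    credentials,
    scopes: [scope], // e.g. 'https://www.googleapis.com/auth/drive.readonly'
  });
  // The auth object obtains and refreshes access tokens as requests are made.
  return google.drive({ version: 'v3', auth });
}
```

A caller would pass `process.env.GOOGLE_SERVICE_ACCOUNT_KEY` and the scope from `config/settings.js`, then use the returned client for `drive.files.list()`.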
**Alternatives Considered**:
- ❌ OAuth 2.0 user flow - Requires interactive browser login, unsuitable for proxy adapter
- ❌ API key authentication - API keys can only read publicly shared files; private Drive content requires OAuth 2.0 credentials
- ❌ Manual JWT implementation - Complex signing/token exchange, googleapis SDK already provides this
**References**:
- [Google Service Account Documentation](https://cloud.google.com/iam/docs/service-accounts)
- [googleapis Node.js Client](https://github.com/googleapis/google-api-nodejs-client)
---
### 2. XML Sitemap Generation (Sitemap Protocol)
**Decision**: Generate XML sitemap conforming to sitemaps.org protocol, enforce 50,000 URL limit
**Rationale**:
- Sitemap protocol specifies max 50,000 URLs per sitemap file
- Each URL entry must contain `<loc>`; `<lastmod>` is optional and is taken from the Drive `modifiedTime`
- Must use proper XML namespace: `xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"`
- URLs must be absolute (include base URL prefix)
**Implementation Approach**:
- Query Drive API: `drive.files.list()` with fields `nextPageToken, files(id, name, mimeType, modifiedTime)` (`nextPageToken` is needed to page past the first 1,000 results)
- Count results - if >50,000, return HTTP 413 Payload Too Large immediately
- Build XML using template literals (Node.js native approach) or minimal XML library
- Format URLs as RESTful paths: `{baseUrl}/documents/{documentId}`
- Include `<lastmod>` using ISO 8601 format from Drive API `modifiedTime` field
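The template-literal approach can be sketched as below. This is an illustration, not the adapter's actual `xml-utils.js`: `escapeXml` and `buildSitemap` are hypothetical names, and truncating `modifiedTime` to its `YYYY-MM-DD` prefix is an assumption that matches the sample response in the quickstart.

```javascript
// Escape the five XML special characters.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Build a sitemap from Drive file metadata ({ id, modifiedTime }).
function buildSitemap(baseUrl, files) {
  const base = baseUrl.replace(/\/+$/, ''); // avoid a double slash before /documents
  const entries = files.map((file) => {
    const loc = `${base}/documents/${encodeURIComponent(file.id)}`;
    const lastmod = file.modifiedTime
      ? `\n    <lastmod>${escapeXml(file.modifiedTime.slice(0, 10))}</lastmod>`
      : '';
    return `  <url>\n    <loc>${escapeXml(loc)}</loc>${lastmod}\n  </url>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    '</urlset>',
  ].join('\n');
}

console.log(buildSitemap('http://localhost:3000', [
  { id: '1A2B3C4D5E6F7G8H', modifiedTime: '2026-03-07T10:00:00.000Z' },
]));
```

An empty `files` array yields a valid `<urlset>` with no entries, which matches the edge case of no accessible documents.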
**Alternatives Considered**:
- ❌ Sitemap index with multiple sitemaps - Over-engineering for initial requirement (YAGNI)
- ❌ Paginated sitemaps - Not requested in spec, adds complexity
- ✅ Node.js built-in XML generation (template literals) - Simple for flat structure
- ⚠️ `xmlbuilder2` npm package - Consider if XML escaping becomes complex (acceptable dependency per constitution if justified)
**References**:
- [Sitemaps.org Protocol](https://www.sitemaps.org/protocol.html)
- [Google Sitemap Guidelines](https://developers.google.com/search/docs/crawling-indexing/sitemaps/build-sitemap)
---
### 3. Concurrency Control - FIFO Request Queue
**Decision**: Implement FIFO queue for `/sitemap.xml` requests, process one at a time
**Rationale** (from Session 3 clarification):
- Prevents concurrent Drive API queries that could cause rate limiting issues
- Ensures predictable resource usage (single Drive API operation at a time)
- Simple queue semantics: first request in, first request served
- If request fails, continue to next in queue (no retry per spec)
**Implementation Approach**:
- Use Node.js EventEmitter pattern for queue implementation (built-in module)
- Maintain array of pending request handlers (FIFO array: push to end, shift from start)
- Check queue state before processing:
- If queue empty: start processing immediately
- If queue busy: add request to pending array
- Emit 'complete' event to trigger next request processing
**Code Pattern**:
```javascript
import { EventEmitter } from 'events';

class SitemapQueue extends EventEmitter {
  constructor() {
    super();
    this.processing = false; // true while a handler is running
    this.queue = [];         // pending { handler, resolve, reject } entries (FIFO)
  }

  // Enqueue a handler; the returned promise settles with the handler's outcome.
  process(handler) {
    return new Promise((resolve, reject) => {
      this.queue.push({ handler, resolve, reject });
      if (!this.processing) this.processNext();
    });
  }

  async processNext() {
    if (this.queue.length === 0) {
      this.processing = false;
      return;
    }
    this.processing = true;
    const { handler, resolve, reject } = this.queue.shift();
    try {
      const result = await handler();
      resolve(result);
    } catch (error) {
      reject(error); // A failed request does not block the queue
    } finally {
      this.emit('complete'); // Signal completion to any listeners
      this.processNext();    // Process next in queue
    }
  }
}
```
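For comparison, the same first-in, first-out behavior can be had without `EventEmitter` by chaining each handler onto the previous handler's promise. This is an alternative sketch, not the pattern chosen above; `createSequentialQueue` is a hypothetical name.

```javascript
// Returns an enqueue(handler) function that runs handlers one at a time.
function createSequentialQueue() {
  let tail = Promise.resolve(); // settles when the last queued handler finishes

  return function enqueue(handler) {
    // Start this handler only after the previous one has settled.
    const result = tail.then(() => handler());
    // Swallow rejections on the chain itself so one failure never blocks
    // later handlers; the caller still sees the rejection via `result`.
    tail = result.catch(() => {});
    return result;
  };
}

// Usage: the second handler waits even though the first is slower.
const enqueue = createSequentialQueue();
const order = [];
const done = Promise.all([
  enqueue(async () => {
    await new Promise((resolve) => setTimeout(resolve, 10));
    order.push('first');
  }),
  enqueue(async () => { order.push('second'); }),
]).then(() => order);

done.then((result) => console.log(result.join(' -> '))); // first -> second
```

The trade-off is that this variant exposes no queue length or events, which the class-based pattern makes easy to add.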
**Alternatives Considered**:
- ❌ Concurrent processing with rate limiting - More complex, not required per clarification
- ❌ External queue (Redis, RabbitMQ) - Over-engineering for single-server deployment
- ❌ Worker pool - Unnecessary complexity for sequential processing requirement
---
### 4. Error Handling Strategy
**Decision**: Status-code-only errors (no response body), crash on fatal errors, immediate 503 passthrough
**Rationale** (consolidated from all 3 sessions):
- **Clarification**: HTTP status code only, no error response body (Session 1)
- **Clarification**: Return 429 with `Retry-After` header for rate limiting (Session 1)
- **Clarification**: No retries on Drive API 503, immediately return 503 to client (Session 2)
- **Clarification**: Crash with exit code 1 on fatal errors (invalid credentials, port binding failure) (Session 3)
- **Clarification**: Return 413 for >50k documents (Session 3)
**Error Scenarios**:
| Scenario | HTTP Status | Response Body | Retry-After Header | Action |
|----------|-------------|---------------|-------------------|--------|
| Successful sitemap | 200 OK | XML sitemap | N/A | Return sitemap |
| Invalid endpoint | 404 Not Found | Empty | N/A | Status only |
| >50k documents | 413 Payload Too Large | Empty | N/A | Status only |
| Drive API rate limit | 429 Too Many Requests | Empty | Seconds until retry | Status + header |
| Token refresh failed (expired token) | 401 Unauthorized | Empty | N/A | Status only |
| Drive API unavailable (503) | 503 Service Unavailable | Empty | N/A | No retry, immediate passthrough |
| Internal error | 500 Internal Server Error | Empty | N/A | Log error, return status |
| Fatal startup error | N/A | N/A | N/A | Log to stderr, exit(1) |
**Implementation Approach**:
- Use try-catch blocks in request handler
- Map googleapis SDK errors to HTTP status codes
- Set `Retry-After` header by extracting from Drive API error response
- Detect fatal errors during startup (invalid credentials, port EADDRINUSE)
- Use `logger.error()` for stderr logging before `process.exit(1)`
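The mapping step could be sketched as below. `mapDriveError` is a hypothetical name, and the error shape is an assumption: the sketch expects the upstream HTTP status in `err.response.status` (or a numeric `err.code`) and the upstream headers in `err.response.headers`, which should be checked against the errors the SDK actually throws.

```javascript
// Translate an upstream error into a status-only response description.
function mapDriveError(err) {
  const upstream = Number(err?.response?.status ?? err?.code);
  const headers = {};
  switch (upstream) {
    case 401:
      return { status: 401, headers };
    case 429: {
      // Pass the upstream Retry-After value through when present.
      const retryAfter = err?.response?.headers?.['retry-after'];
      if (retryAfter !== undefined) headers['Retry-After'] = String(retryAfter);
      return { status: 429, headers };
    }
    case 503:
      return { status: 503, headers }; // passthrough, no retry
    default:
      return { status: 500, headers }; // anything unrecognized
  }
}

console.log(mapDriveError({ response: { status: 503 } }).status); // 503
```

Drive can also report rate limiting as a 403 with a rate-limit reason in the error body; this sketch does not inspect the body and would map that case to 500.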
---
### 5. Logging Format and Destination
**Decision**: Plain text logging to stdout/stderr with format `[timestamp] [level] message`
**Rationale** (from Session 3 clarification):
- Simple, human-readable format for container/cloud environments
- stdout for informational logs (info, debug)
- stderr for errors (error level)
- No file-based logging (per constitution: "stdout/stderr only")
- Timestamp helps with debugging time-sequence issues
**Implementation Approach** (already exists in codebase):
```javascript
// src/logger.js (aliased as console.js per constitution)
const formatMessage = (level, message) => {
  const timestamp = new Date().toISOString();
  return `[${timestamp}] [${level.toUpperCase()}] ${message}`;
};
export const logger = {
  log: (msg) => console.log(formatMessage('info', msg)),
  info: (msg) => console.log(formatMessage('info', msg)),
  debug: (msg) => console.log(formatMessage('debug', msg)),
  error: (msg) => console.error(formatMessage('error', msg))
};
```
**Log Events to Capture**:
- Server startup: port, base URL configuration
- Incoming request: method, endpoint, client IP
- Request completion: status code, response time
- Drive API interaction: query start, document count, completion time
- Errors: error type, message, stack trace (if available)
- Fatal errors: critical error message before crash
**Alternatives Considered**:
- ❌ JSON structured logging - Over-engineering for initial requirement, plain text is simpler
- ❌ File-based logging - Explicitly rejected in constitution and clarifications
- ❌ External logging service (Sentry, LogDNA) - Not required, adds dependency
---
### 6. Configuration Management
**Decision**: Split configuration between server settings (config/config.js) and Drive API filter (config/settings.js), load credentials from environment variable
**Rationale** (from Sessions 2 & 3 clarifications):
- **Clarification**: Service Account credentials in env var `GOOGLE_SERVICE_ACCOUNT_KEY` (Session 2)
- **Clarification**: Drive API filter configurable in `config/settings.js` (Session 3)
- Server configuration (port, base URL) in `config/config.js` (per constitution)
- settings.js loaded into global `settings` variable (per constitution)
**Configuration Schema**:
`config/config.js`:
```javascript
export default {
  server: {
    port: process.env.PORT || 3000,
    baseUrl: process.env.BASE_URL || 'http://localhost:3000'
  }
};
```
`config/settings.js`:
```javascript
export default {
  drive: {
    // Drive API query filter (q parameter)
    // Default: all files excluding trashed
    query: process.env.DRIVE_QUERY || "trashed = false",
    // Fields to retrieve (nextPageToken is required for pagination)
    fields: 'nextPageToken, files(id, name, mimeType, modifiedTime)',
    // Maximum results per page
    pageSize: 1000
  }
};
```
**Environment Variables**:
- `GOOGLE_SERVICE_ACCOUNT_KEY` (required): JSON key file content (inline string)
- `PORT` (optional): Server port (default: 3000)
- `BASE_URL` (optional): Base URL for sitemap URLs (default: http://localhost:3000)
- `DRIVE_QUERY` (optional): Drive API query filter (default: "trashed = false")
**Startup Validation**:
- Check `GOOGLE_SERVICE_ACCOUNT_KEY` is present and valid JSON
- Validate JSON contains required fields: `client_email`, `private_key`
- If validation fails: log critical error to stderr, exit(1)
- Check port is available (catch EADDRINUSE error), exit(1) if unavailable
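The credential checks above can be sketched as a single function that throws on the first problem. `validateServiceAccountKey` is a hypothetical name and the error messages are illustrative; the caller is assumed to log the message and exit with code 1.

```javascript
// Parse and validate the inline service account key; throws on any problem.
function validateServiceAccountKey(raw) {
  if (!raw) throw new Error('GOOGLE_SERVICE_ACCOUNT_KEY is not set');
  let key;
  try {
    key = JSON.parse(raw);
  } catch {
    throw new Error('GOOGLE_SERVICE_ACCOUNT_KEY is not valid JSON');
  }
  if (key === null || typeof key !== 'object') {
    throw new Error('GOOGLE_SERVICE_ACCOUNT_KEY must be a JSON object');
  }
  for (const field of ['client_email', 'private_key']) {
    if (typeof key[field] !== 'string' || key[field] === '') {
      throw new Error(`Missing ${field} in Service Account credentials`);
    }
  }
  return key;
}

// At startup (sketch):
//   try { validateServiceAccountKey(process.env.GOOGLE_SERVICE_ACCOUNT_KEY); }
//   catch (err) { logger.error(`FATAL: ${err.message}`); process.exit(1); }
```

Keeping validation free of `process.exit` makes it straightforward to unit test; only the startup code decides to crash.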
**Alternatives Considered**:
- ❌ Credentials file on disk - Environment variable approach is more secure and container-friendly
- ❌ Hardcoded Drive query - Explicitly rejected in Session 3 clarification
- ❌ Database configuration storage - Over-engineering for simple key-value config
---
## Technology Stack Validation
### Core Dependencies
| Package | Version | Justification | Constitution Compliance |
|---------|---------|---------------|------------------------|
| `googleapis` | ^140.0.0 | Official Google SDK, handles OAuth2/JWT complexity, implements Drive API v3 protocol. Alternative (manual implementation) would take >2 days and risk protocol errors. | ✅ APPROVED (documented in plan.md) |
### Node.js Built-ins Used
- `http` - HTTP server
- `fs` - Configuration file loading
- `path` - File path utilities
- `events` - FIFO queue implementation (EventEmitter)
- `url` - URL parsing for request routing
**No additional external dependencies required** - All other functionality (XML generation, logging, queue) implemented using Node.js built-ins.
---
## Best Practices Research
### 1. Service Account Security
- **Never log credentials**: Filter private_key from logs
- **Validate JSON structure**: Check required fields before use
- **Scope restriction**: Use minimal scope (readonly)
- **Token lifecycle**: Let googleapis SDK manage refresh automatically
### 2. HTTP Server Best Practices
- **Graceful shutdown**: Handle SIGTERM/SIGINT for cleanup
- **Request timeout**: Set reasonable timeout (30-60 seconds for Drive API calls)
- **Error boundaries**: Catch all errors to prevent crashes (except fatal startup errors)
- **Content-Type headers**: Always set appropriate headers (application/xml for sitemap)
### 3. Google Drive API Best Practices
- **Pagination**: Pass each response's `nextPageToken` back as `pageToken` when results exceed one page; `pageSize` is set to 1000, the Drive API maximum (the API default is 100)
- **Field filtering**: Request only needed fields to reduce payload size
- **Rate limiting**: Handle 429 errors gracefully (already in spec)
- **Exponential backoff**: NOT required per spec (no retries on 503)
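Pagination combined with the 50,000-document limit could be sketched as follows. `collectFiles` and `listPage` are hypothetical names: `listPage` stands for a wrapper that takes a page token and resolves to `{ files, nextPageToken }`, for example by calling `drive.files.list()` and returning `res.data`. Note that when a `fields` mask is set, it needs to include `nextPageToken` for the token to appear in the response.

```javascript
// Collect every page of results, stopping early once the limit is exceeded.
async function collectFiles(listPage, limit = 50000) {
  const files = [];
  let pageToken; // undefined requests the first page
  do {
    const page = await listPage(pageToken);
    files.push(...(page.files ?? []));
    if (files.length > limit) {
      const err = new Error(`More than ${limit} documents`);
      err.status = 413; // the caller is assumed to map this to 413 Payload Too Large
      throw err;
    }
    pageToken = page.nextPageToken;
  } while (pageToken);
  return files;
}
```

Checking the count after each page avoids fetching further pages once the limit is already exceeded.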
### 4. Sitemap Generation Best Practices
- **XML escaping**: Escape special characters in URLs (&, <, >, ", ')
- **Absolute URLs**: Always use full URLs with protocol and domain
- **Date format**: Use ISO 8601 format for lastmod (YYYY-MM-DD or YYYY-MM-DDTHH:MM:SS+00:00)
- **URL encoding**: Encode document IDs if they contain special characters
---
## Integration Patterns
### Request Flow
```
Client Request → HTTP Server → FIFO Queue → Drive API Query → XML Generation → Response
(Sequential Processing)
```
### Authentication Flow
```
Startup → Load GOOGLE_SERVICE_ACCOUNT_KEY → Parse JSON → Create GoogleAuth Client
Request → Check Token Expiry → Auto-Refresh (if needed) → Use Token for Drive API
```
### Error Flow
```
Error Occurs → Map to HTTP Status → Set Headers (Retry-After if 429) → Return Status Code (no body)
Log Error (stderr) → Include context (request ID, error message)
```
---
## Open Questions & Assumptions
### Resolved via Clarifications (All 3 Sessions)
✅ Authentication method → Service Account with JWT
✅ URL format → `/documents/{documentId}` (RESTful)
✅ Error response format → Status code only, no body
✅ Rate limiting behavior → 429 with Retry-After header
✅ Drive API 503 handling → No retries, immediate passthrough
✅ Credentials storage → Inline JSON in env var
✅ Logging destination → stdout/stderr only
✅ >50k documents handling → 413 error
✅ Fatal error handling → Crash with exit code 1
✅ Concurrent requests → FIFO queue, sequential processing
✅ Log format → Plain text `[timestamp] [level] message`
✅ Drive query filter → Configurable in config/settings.js
### Assumptions (from spec.md)
- Service Account has domain-wide delegation if accessing user drives
- Base URL configured correctly for production environment
- Node.js v18+ LTS available on deployment platform
- Network connectivity to googleapis.com available
---
## Summary
All technical unknowns from the specification have been resolved through 3 clarification sessions (10 Q&A pairs total). Key research findings:
1. **Authentication**: googleapis SDK with Service Account JWT (load from env var)
2. **Sitemap Protocol**: Enforce 50k limit, use standard XML namespace, include lastmod
3. **Concurrency**: FIFO queue using Node.js EventEmitter (sequential processing)
4. **Error Handling**: Status-only responses, crash on fatal errors, no retries on 503
5. **Logging**: Plain text format to stdout/stderr (no files)
6. **Configuration**: Split between config.js (server) and settings.js (Drive query filter)
**No remaining NEEDS CLARIFICATION items** - Ready to proceed to Phase 1 design.

158
specs/001-sitemap/spec.md Normal file
View File

@@ -0,0 +1,158 @@
# Feature Specification: Google Drive HTTP Proxy Adapter
**Feature Branch**: `001-drive-proxy-adapter`
**Created**: 2026-03-06
**Updated**: 2026-03-07
**Status**: Draft
**Input**: User description: "I want to build a node.js application that provides an http proxy adapter to search and export documents from Google Drive. HTTP requests to 'sitemap.xml' should use a query to list documents in Google Drive. The links returned in the 'sitemap.xml' should link back to this adapter with a document id."
**Scope Change (2026-03-07)**: Simplified to only handle sitemap.xml generation. Document export functionality removed from scope.
## Clarifications
### Session 2026-03-06
- Q: Architecture approach - format conversion vs metadata-only vs hybrid? → A: Use metadata exportLinks to fetch and stream files through adapter (hybrid: metadata discovery + content streaming)
- Q: How to handle Markdown format (not in Drive API exportLinks)? → A: Check exportLinks for text/x-markdown; if unavailable, convert from HTML export
- Q: What error response format (JSON/text/status-only)? → A: HTTP status code only, no error response body
- Q: Rate limiting behavior when Drive API limits hit? → A: Return 429 with Retry-After header indicating seconds until retry
- Q: Maximum document size limit for streaming? → A: Stream up to 20MB maximum; return 413 Payload Too Large for larger documents
### Session 2026-03-07
- **SCOPE CHANGE**: Removed all document export functionality. System now only generates sitemap.xml with document IDs. The links in the sitemap point back to the adapter with document IDs, but the adapter does not implement the document retrieval endpoints.
- Q: Authentication method for Google Drive API? → A: Service Account with JSON key file (JWT-based, server-to-server authentication)
- Q: Sitemap URL format for document links? → A: /documents/{documentId} (RESTful, clear resource path)
- Q: Retry behavior when Drive API returns 503? → A: No retries, immediately return 503 to client
- Q: Service account credentials storage method? → A: Inline JSON in env var (GOOGLE_SERVICE_ACCOUNT_KEY)
- Q: Logging output destination? → A: stdout/stderr only (console logging, no files)
### Session 3 (2026-03-07)
- Q: How should the system handle cases where >50,000 documents exist in Google Drive (exceeding sitemap protocol limit)? → A: Return 413 error if >50k documents exist
- Q: How should the system handle fatal errors (e.g., invalid service account credentials, unable to bind to port)? → A: Log critical error + crash with exit code 1
- Q: How should the system handle concurrent requests to /sitemap.xml? → A: Queue requests, process one at a time (FIFO)
- Q: What format should be used for log messages? → A: Plain text logging format [timestamp] [level] message
- Q: Should the Drive API query filter be hardcoded or configurable? → A: Drive API filter should be configurable in config/settings.js file (not hardcoded)
## User Scenarios & Testing _(mandatory)_
### User Story 1 - Generate Sitemap of Available Documents (Priority: P1)
A user makes an HTTP GET request to `/sitemap.xml` and receives a valid XML sitemap listing all accessible Google Drive documents with links back to the adapter (document IDs only, no export functionality).
**Why this priority**: This is the core and only functionality. Enables document discovery and generates a sitemap with links containing document IDs. This makes the adapter useful for indexing scenarios (e.g., search engines, content aggregators).
**Independent Test**: Can be tested by making GET request to `/sitemap.xml` and verifying: (1) valid XML sitemap format, (2) contains URLs pointing to adapter endpoints with document IDs, (3) reflects documents accessible in user's Google Drive.
**Acceptance Scenarios**:
1. **Given** user has access to Google Drive documents, **When** user requests `/sitemap.xml`, **Then** system returns 200 status with valid XML sitemap
2. **Given** sitemap is generated, **When** examining the XML, **Then** each `<url>` entry contains a `<loc>` pointing to the adapter using RESTful format (e.g., `http://adapter-host/documents/{documentId}`)
3. **Given** multiple documents in Google Drive, **When** sitemap is generated, **Then** all accessible documents are included in the sitemap
4. **Given** user lacks permission to certain documents, **When** sitemap is generated, **Then** those documents are excluded from the sitemap
5. **Given** the adapter base URL is configured, **When** sitemap is generated, **Then** all URLs use the configured base URL
---
### Edge Cases
- What happens when Google Drive API is unavailable or rate-limited? → Return 503 Service Unavailable immediately without retries if API returns 503; return 429 Too Many Requests with Retry-After header if rate limited
- What happens when OAuth token expires during request? → Attempt token refresh; if failed, return 401 Unauthorized
- How are shared drive documents handled? → Treat same as My Drive documents if user has access
- What happens with password-protected or restricted documents? → Exclude from sitemap (filter out documents without read access)
- How are document updates reflected in sitemap? → Each sitemap request fetches current list; no caching
- What if there are more than 50,000 documents (sitemap limit)? → Return 413 Payload Too Large error (enforces sitemap protocol limit)
- How are non-document files handled (images, videos, etc.)? → Include all files in sitemap regardless of type
- What happens if no documents are accessible? → Return valid sitemap XML with no URL entries
- What happens when multiple /sitemap.xml requests arrive simultaneously? → Requests are queued and processed sequentially in FIFO order (one at a time)
- What happens when service account credentials are invalid or missing at startup? → Log critical error to stderr and crash with exit code 1
- How are Drive API query filters customized? → Configure filters in config/settings.js file (not hardcoded)
- What happens if config/settings.js is missing or malformed? → Log critical error to stderr and crash with exit code 1
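
The status-code decisions in the edge cases above could be collected in one mapping function. This is a sketch under stated assumptions: the error shape (`code`, `retryAfter`), the `TooManyDocumentsError` class, and the 60-second fallback are hypothetical and are not taken from the googleapis client.

```javascript
// Sketch only: error shape and class name are hypothetical.

class TooManyDocumentsError extends Error {}

/** Map an upstream failure to the status (and headers) the adapter returns. */
function toHttpResponse(err) {
  if (err instanceof TooManyDocumentsError) {
    return { status: 413, headers: {} }; // more than 50,000 documents
  }
  switch (err.code) {
    case 429:
      // Rate limited: tell the client how many seconds to wait (fallback is an assumption).
      return { status: 429, headers: { 'Retry-After': String(err.retryAfter ?? 60) } };
    case 503:
      // Drive unavailable: pass through immediately, no retry.
      return { status: 503, headers: {} };
    case 401:
      // Token refresh failed.
      return { status: 401, headers: {} };
    default:
      return { status: 500, headers: {} };
  }
}

console.log(toHttpResponse({ code: 429, retryAfter: 30 }));
```

Keeping the mapping in one place makes the "status code only, no body" rule easy to enforce, because the handler only ever writes the returned status and headers.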
## Requirements _(mandatory)_
### Functional Requirements
- **FR-001**: System MUST provide an HTTP server that listens for incoming requests
- **FR-002**: System MUST authenticate with Google Drive API using Service Account with JSON key file (JWT-based, server-to-server authentication)
- **FR-003**: System MUST handle GET requests to `/sitemap.xml` endpoint
- **FR-004**: System MUST query Google Drive API to retrieve list of accessible documents for sitemap generation
- **FR-005**: System MUST generate valid XML sitemap conforming to sitemap protocol (https://www.sitemaps.org/protocol.html)
- **FR-006**: System MUST include document metadata in sitemap (URL with RESTful path format `/documents/{documentId}`, last modified date if available)
- **FR-007**: System MUST return HTTP 404 Not Found for any endpoint other than `/sitemap.xml`
- **FR-008**: System MUST return appropriate HTTP status codes (200 OK, 401 Unauthorized, 413 Payload Too Large, 429 Too Many Requests, 500 Internal Server Error, 503 Service Unavailable)
- **FR-009**: System MUST include Content-Type: application/xml header for sitemap responses
- **FR-010**: System MUST handle OAuth token refresh when tokens expire
- **FR-011**: System MUST log all incoming requests to stdout/stderr using plain text format: [timestamp] [level] message (includes endpoint and response status)
- **FR-012**: System MUST log errors to stdout/stderr using plain text format: [timestamp] [level] message (includes request ID and error message for debugging)
- **FR-013**: System MUST handle Google Drive API rate limiting gracefully by returning 429 status with Retry-After header indicating seconds until retry
- **FR-017**: System MUST NOT retry when Google Drive API returns 503; instead immediately return 503 to client
- **FR-014**: System MUST support configuration via environment variables (port, base URL)
- **FR-018**: System MUST load Service Account credentials from environment variable GOOGLE_SERVICE_ACCOUNT_KEY containing inline JSON key file content
- **FR-015**: System MUST return 413 Payload Too Large if Google Drive contains more than 50,000 documents (enforces sitemap protocol limit)
- **FR-016**: System MUST exclude from the sitemap any documents the user lacks read access to
- **FR-019**: System MUST process /sitemap.xml requests sequentially using a FIFO queue (one request at a time to prevent concurrent Drive API operations)
- **FR-020**: System MUST crash with exit code 1 after logging critical errors (e.g., invalid service account credentials, unable to bind to port, missing required configuration)
- **FR-021**: System MUST load Drive API query filter configuration from config/settings.js file (not hardcoded in source)
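
FR-011 and FR-012 fix the log line shape as `[timestamp] [level] message`. A minimal sketch of such a formatter follows. The ISO 8601 timestamp, the upper-cased level, and the sample messages are assumptions, since the requirements do not specify them.

```javascript
// Sketch only: timestamp style and level casing are assumptions.

/** Format one log line as "[timestamp] [level] message". */
function formatMessage(level, message, now = new Date()) {
  return `[${now.toISOString()}] [${level.toUpperCase()}] ${message}`;
}

/** Write errors to stderr and everything else to stdout; no files are touched. */
function log(level, message) {
  const line = formatMessage(level, message) + '\n';
  (level === 'error' ? process.stderr : process.stdout).write(line);
}

log('info', 'GET /sitemap.xml 200');
log('error', 'requestId=hypothetical-id Drive API returned 503');
```

Passing `now` as a parameter keeps `formatMessage` deterministic in unit tests.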
### Key Entities
- **Document**: Represents a file in Google Drive. Key attributes include: document ID (unique identifier), title, MIME type, last modified timestamp, permissions status
- **Sitemap Entry**: Represents a document listing in the sitemap XML. Attributes include: location URL (RESTful path `/documents/{documentId}`), last modified date
- **HTTP Request Context**: Represents an incoming request. Attributes include: request ID (for tracing), Service Account JWT token, requested endpoint, client IP
- **Service Account Credentials**: Represents JWT-based authentication state. Attributes include: client email, private key (from JSON key file), access token (generated via JWT), token expiry time, scopes granted
- **Configuration**: Represents application settings. Attributes include: Drive API query filter (loaded from config/settings.js), server port, base URL, request queue (FIFO for /sitemap.xml requests)
## Success Criteria _(mandatory)_
### Measurable Outcomes
- **SC-001**: Users can request `/sitemap.xml` and receive a valid XML sitemap within 5 seconds for drives containing up to 10,000 documents
- **SC-002**: System successfully handles at least 10 concurrent sitemap requests without errors (queued and processed sequentially in FIFO order)
- **SC-003**: 95% of sitemap requests complete successfully (200 status code)
- **SC-004**: System responds to invalid endpoint requests (404) within 1 second
- **SC-005**: System gracefully handles Google Drive API rate limits without crashing, returning 429 status codes with Retry-After headers
- **SC-006**: Service Account JWT token generation succeeds automatically in >99% of expiration scenarios
- **SC-007**: System startup time from cold start to accepting requests is under 10 seconds
- **SC-008**: System memory usage remains under 256MB under normal load (< 10 concurrent requests)
- **SC-011**: All logs output to stdout/stderr only using plain text format [timestamp] [level] message; no log files created on disk
- **SC-009**: Sitemap includes all accessible documents (100% coverage for documents with read permission)
- **SC-010**: Generated sitemap XML validates against sitemap protocol schema
- **SC-012**: System returns 413 Payload Too Large when Drive contains >50,000 documents (prevents oversized sitemap generation)
- **SC-013**: System terminates with exit code 1 within 5 seconds of encountering fatal configuration or startup errors
## Assumptions _(optional)_
- Service Account has valid JSON key file credentials configured for Google Drive access
- The adapter runs as a trusted application with appropriate scopes for Google Drive access (read-only, https://www.googleapis.com/auth/drive.readonly)
- Service Account JSON key file is provided via GOOGLE_SERVICE_ACCOUNT_KEY environment variable as inline JSON string
- Network connectivity to Google Drive API (https://www.googleapis.com/drive/v3/) is available
- Document IDs in sitemap URLs are Google Drive file IDs, not custom identifiers
- Sitemap URLs use RESTful path format: `/documents/{documentId}`
- Sitemap generation queries "My Drive" and shared drives where service account has access
- Default port is 3000 unless configured otherwise
- System runs on Node.js LTS version (v18 or later)
- Environment supports async/await and ES modules
- Base URL for sitemap links is configured via environment variable
- Drive API query filter is configured in config/settings.js file (allows customization without code changes)
- System processes sitemap requests sequentially to avoid concurrent Drive API query conflicts
- Fatal errors (invalid credentials, port binding failure, missing configuration) cause immediate termination with exit code 1
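
The port and base URL assumptions above might be read from the environment roughly as follows. The variable names `PORT` and `BASE_URL` are hypothetical, since the spec only says these values come from environment variables.

```javascript
// Sketch only: PORT and BASE_URL are assumed variable names.

/** Read server settings from an environment-like object; throws on invalid input. */
function loadServerConfig(env) {
  const port = env.PORT === undefined ? 3000 : Number(env.PORT); // default port is 3000
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  if (!env.BASE_URL) {
    throw new Error('BASE_URL is required');
  }
  // Strip trailing slashes so "/documents/{documentId}" can be appended directly.
  const baseUrl = env.BASE_URL.replace(/\/+$/, '');
  return { port, baseUrl };
}

console.log(loadServerConfig({ BASE_URL: 'http://adapter-host/' }));
```

A caller following the fatal-error assumption would catch the thrown error at startup, log it to stderr, and exit with code 1.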
## Out of Scope _(optional)_
- Document export functionality (Markdown, HTML, PDF) - removed from original scope
- Document editing or creation capabilities
- Document content retrieval or streaming
- User authentication/authorization beyond Google Service Account (JWT-based)
- Document caching or local storage (always fetch fresh list from Google Drive)
- Automatic retry logic for Drive API 503 errors (fail immediately instead)
- File-based logging (logs output to console only)
- Custom domain mapping or URL shortening
- Analytics or usage tracking
- Document versioning or revision history access
- Folder hierarchy preservation in sitemap (flat list of documents)
- Batch operations
- WebSocket or Server-Sent Events for real-time updates
- Admin interface or dashboard
- Health check endpoint (only /sitemap.xml is supported)

specs/001-sitemap/tasks.md Normal file

@@ -0,0 +1,385 @@
# Tasks: Google Drive HTTP Proxy Adapter
**Input**: Design documents from `/specs/001-drive-proxy-adapter/`
**Prerequisites**: plan.md, spec.md, research.md, data-model.md, contracts/, quickstart.md
**Feature**: Generate XML sitemaps from Google Drive documents via HTTP endpoint
**Key Clarifications Incorporated** (10 total):
1. Service Account JWT auth with inline JSON env var
2. RESTful URL format `/documents/{documentId}`
3. No retries on 503 errors
4. stdout/stderr logging only
5. 413 error for >50k documents
6. Crash with exit code 1 for fatal errors
7. FIFO queue for concurrent requests
8. Plain text logging format `[timestamp] [level] message`
9. Configurable Drive API filter in config/settings.js
10. Status code only errors (no response body)
**Tests**: ✅ Test-First Development enforced per Constitution Principle III
**Organization**: Tasks are grouped by user story (only US1 exists for this feature - single endpoint system)
---
## Format: `- [ ] [ID] [P?] [Story?] Description`
- **[P]**: Can run in parallel (different files, no dependencies)
- **[Story]**: User story label (US1, US2, etc.) - only for user story phases
- Include exact file paths in descriptions
---
## Phase 1: Setup (Shared Infrastructure)
**Purpose**: Project initialization and basic structure
- [ ] T001 Initialize Node.js project with package.json at repository root
- [ ] T002 Install googleapis dependency v140.0.0 in package.json
- [ ] T003 [P] Create src/ directory for application source code
- [ ] T004 [P] Create config/ directory for configuration files
- [ ] T005 [P] Create tests/unit/ directory for unit tests
- [ ] T006 [P] Create tests/integration/ directory for integration tests
- [ ] T007 [P] Create tests/contract/ directory for contract tests
- [ ] T008 Configure Node.js native test runner in package.json with test scripts
- [ ] T009 [P] Setup ESLint configuration in .eslintrc.json for ES2022+ JavaScript
- [ ] T010 [P] Create .env.example file documenting required environment variables
---
## Phase 2: Foundational (Blocking Prerequisites)
**Purpose**: Core infrastructure that MUST be complete before user story implementation
**⚠️ CRITICAL**: User Story 1 cannot begin until this phase is complete
- [ ] T011 Create console.js module in src/ with formatMessage function and log/info/debug/error methods (plain text format: `[timestamp] [level] message`)
- [ ] T012 Create config/config.js exporting server configuration (port, baseUrl from env vars)
- [ ] T013 Create config/settings.js exporting Drive API configuration (query filter from env var DRIVE_QUERY or default "trashed = false", fields, pageSize, scope)
- [ ] T014 Create auth.js module in src/ for Service Account JWT authentication using googleapis GoogleAuth class
- [ ] T015 Add credential validation function in src/auth.js to check client_email, private_key, project_id structure
- [ ] T016 Implement fatal error handler in src/auth.js that logs to stderr and exits with code 1 if credentials invalid
- [ ] T017 Create xml-utils.js module in src/ with XML escaping utilities for special characters (&, <, >, ", ')
- [ ] T018 Implement FIFO request queue class in src/queue.js using Node.js EventEmitter with processing flag and queue array
- [ ] T019 Create server.js entry point in src/ that sets up HTTP server with http module
**Checkpoint**: Foundation ready - User Story 1 implementation can now begin
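
The FIFO queue from T018 could look roughly like the sketch below: a queue array plus a processing flag, so that only one job runs at a time. To stay dependency-free it omits the EventEmitter mentioned in the task, and the class name `RequestQueue` is an assumption. A caller would use it as `queue.process(() => buildSitemap())`, where `buildSitemap` is a hypothetical async function.

```javascript
// Sketch only: class name is hypothetical; EventEmitter wiring is left out.

class RequestQueue {
  constructor() {
    this.queue = [];         // pending jobs, oldest first
    this.processing = false; // true while a drain loop is running
  }

  /** Enqueue an async job; the returned promise settles with the job's outcome. */
  process(job) {
    return new Promise((resolve, reject) => {
      this.queue.push({ job, resolve, reject });
      this.drain();
    });
  }

  /** Run queued jobs one at a time in arrival order. */
  async drain() {
    if (this.processing) return; // another drain loop is already active
    this.processing = true;
    while (this.queue.length > 0) {
      const { job, resolve, reject } = this.queue.shift();
      try {
        resolve(await job());
      } catch (err) {
        reject(err); // a failed job does not block the jobs behind it
      }
    }
    this.processing = false;
  }
}
```
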
---
## Phase 3: User Story 1 - Generate Sitemap of Available Documents (Priority: P1) 🎯 MVP
**Goal**: Users can request `/sitemap.xml` and receive a valid XML sitemap listing all accessible Google Drive documents with RESTful links containing document IDs
**Independent Test**: Make GET request to `/sitemap.xml` and verify: (1) 200 status with valid XML sitemap format, (2) URLs use RESTful format `/documents/{documentId}`, (3) reflects documents in Google Drive, (4) handles >50k documents with 413, (5) queues concurrent requests in FIFO order
**Why this is the complete feature**: This feature has only one user story. The system provides a single endpoint for sitemap generation.
---
### Tests for User Story 1 (Test-First Development) ⚠️
> **CONSTITUTION REQUIREMENT**: Write these tests FIRST, ensure they FAIL, obtain user approval before implementation
#### Contract Tests
- [ ] T020 [P] [US1] Contract test for /sitemap.xml success response (200 OK) in tests/contract/sitemap-schema.test.js - verify XML structure, namespace, Content-Type header
- [ ] T021 [P] [US1] Contract test for /sitemap.xml with empty Drive (0 documents) in tests/contract/sitemap-schema.test.js - verify empty urlset is valid
- [ ] T022 [P] [US1] Contract test for XML special character escaping in tests/contract/sitemap-schema.test.js - verify &, <, >, ", ' are properly escaped in URLs
- [ ] T023 [P] [US1] Contract test for lastmod date format validation in tests/contract/sitemap-schema.test.js - verify ISO 8601 format YYYY-MM-DD
#### Integration Tests
- [ ] T024 [P] [US1] Integration test for /sitemap.xml endpoint success scenario in tests/integration/sitemap-endpoint.test.js - mock Drive API, verify 200 response with valid XML
- [ ] T025 [P] [US1] Integration test for /sitemap.xml with >50k documents in tests/integration/error-scenarios.test.js - verify 413 response with no body
- [ ] T026 [P] [US1] Integration test for /sitemap.xml with Drive API rate limiting in tests/integration/error-scenarios.test.js - verify 429 response with Retry-After header and no body
- [ ] T027 [P] [US1] Integration test for /sitemap.xml with Drive API 503 error in tests/integration/error-scenarios.test.js - verify 503 passthrough with no retry and no body
- [ ] T028 [P] [US1] Integration test for invalid endpoint requests in tests/integration/error-scenarios.test.js - verify 404 response with no body for non-/sitemap.xml paths
- [ ] T029 [P] [US1] Integration test for concurrent requests to /sitemap.xml in tests/integration/queue-concurrency.test.js - verify FIFO processing (one at a time)
- [ ] T030 [P] [US1] Integration test for Service Account token refresh in tests/integration/sitemap-endpoint.test.js - mock token expiry, verify 401 if refresh fails
#### Unit Tests
- [ ] T031 [P] [US1] Unit test for Drive API client query execution in tests/unit/drive-client.test.js - mock googleapis drive.files.list() call
- [ ] T032 [P] [US1] Unit test for Drive API pagination handling in tests/unit/drive-client.test.js - verify pageToken logic for >1000 documents
- [ ] T033 [P] [US1] Unit test for Service Account JWT authentication in tests/unit/auth.test.js - verify GoogleAuth client creation from env var JSON
- [ ] T034 [P] [US1] Unit test for credential validation in tests/unit/auth.test.js - verify detection of invalid client_email, private_key, project_id
- [ ] T035 [P] [US1] Unit test for sitemap XML generation in tests/unit/sitemap-generator.test.js - verify XML structure and URL format /documents/{documentId}
- [ ] T036 [P] [US1] Unit test for Document to SitemapEntry transformation in tests/unit/sitemap-generator.test.js - verify baseUrl + /documents/ + documentId concatenation
- [ ] T037 [P] [US1] Unit test for lastmod date formatting in tests/unit/sitemap-generator.test.js - verify ISO 8601 YYYY-MM-DD format from modifiedTime
- [ ] T038 [P] [US1] Unit test for FIFO queue enqueue/dequeue in tests/unit/queue.test.js - verify sequential processing order
- [ ] T039 [P] [US1] Unit test for FIFO queue concurrent request handling in tests/unit/queue.test.js - verify processing flag prevents simultaneous execution
- [ ] T040 [P] [US1] Unit test for XML special character escaping in tests/unit/sitemap-generator.test.js - verify escapeXml function handles &, <, >, ", '
**TEST APPROVAL CHECKPOINT**: Present test scenarios to user for approval before proceeding to implementation
---
### Implementation for User Story 1
#### Drive API Integration
- [X] T041 [P] [US1] Create drive-client.js module in src/ with function to initialize googleapis drive client using auth from src/auth.js
- [X] T042 [US1] Implement queryDocuments function in src/drive-client.js to call drive.files.list() with query from config/settings.js and fields: files(id, name, mimeType, modifiedTime)
- [X] T043 [US1] Implement pagination logic in src/drive-client.js to handle pageToken and collect all results up to 50,000 limit
- [X] T044 [US1] Add document count validation in src/drive-client.js to return error if count exceeds 50,000
- [X] T045 [US1] Implement error mapping in src/drive-client.js to detect Drive API 429 (rate limit), 503 (unavailable), auth failures
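
The pagination and count-limit tasks (T043, T044) can be sketched with an injected page fetcher, which keeps the loop testable without network access. Here `listPage` is a hypothetical wrapper; in the real module it would presumably call `drive.files.list()` with the current `pageToken` and return the response data. The `status` property on the thrown error is likewise an assumption.

```javascript
// Sketch only: listPage is a hypothetical wrapper around the Drive API call.

/**
 * Collect every page of results.
 * listPage(pageToken) must resolve to { files: [...], nextPageToken?: string }.
 */
async function queryDocuments(listPage, maxDocuments = 50000) {
  const documents = [];
  let pageToken;
  do {
    const page = await listPage(pageToken);
    documents.push(...(page.files ?? []));
    if (documents.length > maxDocuments) {
      // Stop early: the sitemap protocol caps a single sitemap at 50,000 URLs.
      const err = new Error(`More than ${maxDocuments} documents`);
      err.status = 413;
      throw err;
    }
    pageToken = page.nextPageToken;
  } while (pageToken);
  return documents;
}
```

Checking the count inside the loop avoids fetching further pages once the limit is already exceeded.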
#### Sitemap Generation
- [X] T046 [P] [US1] Create sitemap-generator.js module in src/ with function to transform Document array to SitemapEntry array
- [X] T047 [US1] Implement toSitemapEntry function in src/sitemap-generator.js to construct loc URLs using baseUrl + /documents/ + encodeURIComponent(documentId)
- [X] T048 [US1] Implement lastmod date extraction in src/sitemap-generator.js to format modifiedTime as ISO 8601 date (YYYY-MM-DD)
- [X] T049 [US1] Implement generateSitemapXML function in src/sitemap-generator.js to build XML string with proper namespace and escaped URLs using xml-utils.js
- [X] T050 [US1] Add empty sitemap handling in src/sitemap-generator.js to return valid XML with empty urlset when 0 documents
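
As an illustration of T047 and T048, the transformation from a Drive file record to a sitemap entry might look like this. The field names `id` and `modifiedTime` follow the fields listed in T042. The function body is a sketch and is not taken from the repository.

```javascript
// Sketch only: illustrates the loc and lastmod rules from T047 and T048.

/** Convert a Drive file record into a sitemap entry { loc, lastmod? }. */
function toSitemapEntry(doc, baseUrl) {
  const entry = { loc: `${baseUrl}/documents/${encodeURIComponent(doc.id)}` };
  if (doc.modifiedTime) {
    // modifiedTime is an RFC 3339 timestamp; keep only the UTC date part (YYYY-MM-DD).
    entry.lastmod = new Date(doc.modifiedTime).toISOString().slice(0, 10);
  }
  return entry;
}

console.log(toSitemapEntry(
  { id: 'abc123', modifiedTime: '2026-03-06T21:25:05.000Z' },
  'http://adapter-host'
));
```

Note that an unparseable `modifiedTime` makes `toISOString()` throw a `RangeError`, so a real implementation may want to guard that case.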
#### Request Routing and Error Handling
- [X] T051 [US1] Create proxy.js monolithic route handler in src/ that imports queue, drive-client, sitemap-generator modules
- [X] T052 [US1] Implement request handler function in src/proxy.js that checks if path is /sitemap.xml (404 for all other paths with no response body)
- [X] T053 [US1] Implement FIFO queue integration in src/proxy.js to enqueue /sitemap.xml requests using queue.process() from src/queue.js
- [X] T054 [US1] Implement sitemap generation flow in src/proxy.js: authenticate → query Drive API → check count → transform to sitemap → generate XML
- [X] T055 [US1] Implement error response handling in src/proxy.js for 413 (>50k docs), 429 (rate limit with Retry-After header), 503 (Drive unavailable), 401 (auth failed), 500 (unexpected) - all with NO response body
- [X] T056 [US1] Add HTTP response headers in src/proxy.js: Content-Type: application/xml; charset=utf-8 for 200 responses, no Content-Type for errors
- [X] T057 [US1] Extract Retry-After value from Drive API 429 error in src/proxy.js and set Retry-After header in seconds
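
T057 requires the `Retry-After` header in seconds. HTTP allows that header to carry either delay-seconds or an HTTP-date, so a normalizing helper such as the sketch below may be useful. Whether the Drive API error actually carries such a value, and the 60-second fallback, are assumptions.

```javascript
// Sketch only: the fallback value and the presence of an upstream header are assumptions.

/**
 * Normalize an upstream Retry-After value to whole seconds.
 * Accepts delay-seconds ("120") or an HTTP-date; otherwise returns the fallback.
 */
function retryAfterSeconds(value, now = Date.now(), fallback = 60) {
  if (value === undefined || value === null || value === '') return fallback;
  const text = String(value).trim();
  if (/^\d+$/.test(text)) return Number(text);
  const date = Date.parse(text);
  if (Number.isNaN(date)) return fallback;
  return Math.max(0, Math.ceil((date - now) / 1000)); // never negative
}

console.log(retryAfterSeconds('120'));
```
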
#### Logging and Observability
- [X] T058 [US1] Add request logging in src/proxy.js to log incoming requests with method, path, client IP using console.info() from src/console.js
- [X] T059 [US1] Add response logging in src/proxy.js to log status code and response time for each request using console.info()
- [X] T060 [US1] Add Drive API operation logging in src/drive-client.js to log query start, document count, and completion time using console.debug()
- [X] T061 [US1] Add error logging in src/proxy.js to log errors with request context (requestId) and error message using console.error() to stderr
- [X] T062 [US1] Implement requestId generation in src/proxy.js using crypto.randomUUID() for request tracing
#### Server Lifecycle
- [X] T063 [US1] Implement HTTP server setup in src/server.js to route all requests to src/proxy.js handler
- [X] T064 [US1] Load configuration in src/server.js from config/config.js and config/settings.js on startup
- [X] T065 [US1] Load Service Account credentials in src/server.js from GOOGLE_SERVICE_ACCOUNT_KEY env var on startup
- [X] T066 [US1] Add startup validation in src/server.js to call credential validation from src/auth.js and exit(1) on failure
- [X] T067 [US1] Implement server binding in src/server.js to listen on port from config, catch EADDRINUSE error and exit(1) with error log
- [X] T068 [US1] Add startup logging in src/server.js to log server configuration (port, baseUrl), Service Account email (masked), and "server listening" message using console.info()
- [X] T069 [US1] Implement graceful shutdown handler in src/server.js for SIGTERM/SIGINT signals to log shutdown and close server
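
The startup validation in T065 and T066 could be sketched as below. The function names are hypothetical. The checked fields (`client_email`, `private_key`, `project_id`) come from T015, and the log line shape follows the plain text format used elsewhere in this document. `exitOnInvalidCredentials` would be called with `process.env.GOOGLE_SERVICE_ACCOUNT_KEY` before the server starts listening.

```javascript
// Sketch only: function names are hypothetical.

/** Return a list of problems found in the inline JSON key; empty means valid. */
function validateCredentials(rawJson) {
  if (!rawJson) return ['GOOGLE_SERVICE_ACCOUNT_KEY is not set'];
  let key;
  try {
    key = JSON.parse(rawJson);
  } catch {
    return ['GOOGLE_SERVICE_ACCOUNT_KEY is not valid JSON'];
  }
  if (key === null || typeof key !== 'object') {
    return ['GOOGLE_SERVICE_ACCOUNT_KEY must be a JSON object'];
  }
  const problems = [];
  for (const field of ['client_email', 'private_key', 'project_id']) {
    if (typeof key[field] !== 'string' || key[field].length === 0) {
      problems.push(`missing or empty field: ${field}`);
    }
  }
  return problems;
}

/** Log each problem to stderr and terminate with exit code 1. */
function exitOnInvalidCredentials(rawJson) {
  const problems = validateCredentials(rawJson);
  if (problems.length > 0) {
    for (const p of problems) {
      console.error(`[${new Date().toISOString()}] [ERROR] ${p}`);
    }
    process.exit(1);
  }
}
```
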
**Checkpoint**: User Story 1 complete - /sitemap.xml endpoint fully functional with all 10 clarifications implemented
---
## Phase 4: Polish & Cross-Cutting Concerns
**Purpose**: Final validation, documentation, and quality improvements
- [X] T070 [P] Update README.md with quickstart instructions referencing specs/001-drive-proxy-adapter/quickstart.md
- [X] T071 [P] Create .env.example file with all required environment variables documented per quickstart.md
- [X] T072 Validate test coverage meets 80%+ requirement per constitution using Node.js test runner coverage
- [ ] T073 Run all tests (npm test) and verify 100% pass rate
- [ ] T074 Manual validation: Start server and request /sitemap.xml, verify valid XML response
- [ ] T075 Manual validation: Test >50k documents scenario, verify 413 response with no body
- [ ] T076 Manual validation: Test invalid endpoint, verify 404 response with no body
- [ ] T077 Manual validation: Test concurrent requests, verify FIFO processing (sequential execution)
- [ ] T078 Manual validation: Test fatal error scenarios (invalid credentials, port in use), verify exit code 1
- [X] T079 [P] Code cleanup: Remove unused imports, add JSDoc comments for all public functions
- [ ] T080 Run ESLint and fix any linting errors
- [~] T081 Verify all log output uses plain text format `[timestamp] [level] message` per research.md Section 5
- [X] T082 Verify Drive API filter is loaded from config/settings.js not hardcoded per clarification #9
- [ ] T083 Run quickstart.md validation: follow installation and usage instructions from scratch
---
## Dependencies & Execution Order
### Phase Dependencies
- **Setup (Phase 1)**: No dependencies - start immediately
- **Foundational (Phase 2)**: Depends on Setup (Phase 1) - BLOCKS User Story 1
- **User Story 1 (Phase 3)**: Depends on Foundational (Phase 2) - This is the only user story
- **Polish (Phase 4)**: Depends on User Story 1 completion
### Within User Story 1
**Test-First Sequence**:
1. Write ALL tests (T020-T040) - can run in parallel [P]
2. STOP: Obtain user approval of test scenarios
3. Verify tests FAIL (no implementation yet)
4. Proceed to implementation
**Implementation Sequence**:
1. Drive API Integration (T041-T045)
2. Sitemap Generation (T046-T050) - can run in parallel with T041-T045
3. Request Routing (T051-T057) - depends on T041-T050
4. Logging (T058-T062) - can run in parallel with T051-T057
5. Server Lifecycle (T063-T069) - depends on T051-T062
### Parallel Opportunities
**Phase 1 Setup** - All can run in parallel:
- T003, T004, T005, T006, T007 (directory creation)
- T009, T010 (config files)
**Phase 2 Foundational** - Groups can run in parallel:
- T011, T012, T013, T017 (utility modules)
- T014, T015, T016 (auth module)
- T018, T019 (queue and server scaffolding)
**Phase 3 Tests** - All tests can run in parallel:
- Contract tests: T020, T021, T022, T023
- Integration tests: T024-T030
- Unit tests: T031-T040
**Phase 3 Implementation** - Within groups:
- T041, T046 (drive-client and sitemap-generator start in parallel)
- T058-T062 (all logging tasks in parallel)
**Phase 4 Polish**:
- T070, T071, T079, T081, T082 (documentation and cleanup)
---
## Parallel Example: User Story 1 Tests
```text
# Launch all contract tests together:
Task: "Contract test for /sitemap.xml success response in tests/contract/sitemap-schema.test.js"
Task: "Contract test for /sitemap.xml with empty Drive in tests/contract/sitemap-schema.test.js"
Task: "Contract test for XML special character escaping in tests/contract/sitemap-schema.test.js"
Task: "Contract test for lastmod date format validation in tests/contract/sitemap-schema.test.js"
# Launch all integration tests together:
Task: "Integration test for /sitemap.xml endpoint success in tests/integration/sitemap-endpoint.test.js"
Task: "Integration test for >50k documents in tests/integration/error-scenarios.test.js"
Task: "Integration test for Drive API rate limiting in tests/integration/error-scenarios.test.js"
Task: "Integration test for Drive API 503 error in tests/integration/error-scenarios.test.js"
Task: "Integration test for invalid endpoints in tests/integration/error-scenarios.test.js"
Task: "Integration test for concurrent requests in tests/integration/queue-concurrency.test.js"
Task: "Integration test for token refresh in tests/integration/sitemap-endpoint.test.js"
# Launch all unit tests together:
Task: "Unit test for Drive API client query execution in tests/unit/drive-client.test.js"
Task: "Unit test for Drive API pagination handling in tests/unit/drive-client.test.js"
Task: "Unit test for Service Account JWT authentication in tests/unit/auth.test.js"
Task: "Unit test for credential validation in tests/unit/auth.test.js"
Task: "Unit test for sitemap XML generation in tests/unit/sitemap-generator.test.js"
Task: "Unit test for Document to SitemapEntry transformation in tests/unit/sitemap-generator.test.js"
Task: "Unit test for lastmod date formatting in tests/unit/sitemap-generator.test.js"
Task: "Unit test for FIFO queue enqueue/dequeue in tests/unit/queue.test.js"
Task: "Unit test for FIFO queue concurrent request handling in tests/unit/queue.test.js"
Task: "Unit test for XML special character escaping in tests/unit/sitemap-generator.test.js"
```
---
## Implementation Strategy
### MVP = Complete Feature (User Story 1 Only)
This feature is inherently MVP-sized:
1. Complete Phase 1: Setup → Project initialized
2. Complete Phase 2: Foundational → Infrastructure ready (CRITICAL BLOCKER)
3. Complete Phase 3: User Story 1 → **FULL FEATURE COMPLETE**
4. Complete Phase 4: Polish → Production ready
5. **VALIDATE**: Test /sitemap.xml independently with all 10 clarifications verified
### No Incremental Delivery Needed
Unlike multi-story features, this feature has only one user story. The MVP IS the complete feature:
- Single endpoint: `/sitemap.xml`
- All requirements in User Story 1
- No additional stories to add later
### Validation Checklist (All 10 Clarifications)
Before marking feature complete, verify:
1. ✅ Service Account JWT auth works with inline JSON from `GOOGLE_SERVICE_ACCOUNT_KEY` env var
2. ✅ Sitemap URLs use RESTful format: `/documents/{documentId}`
3. ✅ Drive API 503 errors pass through immediately with NO retries
4. ✅ All logs output to stdout/stderr only (no log files)
5. ✅ System returns 413 error when >50,000 documents exist
6. ✅ Fatal errors (invalid credentials, port conflict) crash with exit code 1
7. ✅ Concurrent /sitemap.xml requests queue in FIFO order and process sequentially
8. ✅ Log format is plain text: `[timestamp] [level] message`
9. ✅ Drive API query filter loads from `config/settings.js` (configurable, not hardcoded)
10. ✅ All error responses return status code only with NO response body (except 429 includes Retry-After header)
---
## Task Summary
**Total Tasks**: 83
- **Phase 1 (Setup)**: 10 tasks
- **Phase 2 (Foundational)**: 9 tasks (BLOCKING)
- **Phase 3 (User Story 1)**:
  - Tests: 21 tasks (T020-T040)
  - Implementation: 29 tasks (T041-T069)
- **Phase 4 (Polish)**: 14 tasks
**Parallel Opportunities**:
- Phase 1: 7 tasks can run in parallel
- Phase 2: 6 tasks can run in parallel
- Phase 3 Tests: All 21 tests can run in parallel
- Phase 3 Implementation: Up to 4 tasks can run in parallel at certain points
- Phase 4: 5 tasks can run in parallel
**Independent Test Criteria**: User Story 1 is independently testable via:
1. GET /sitemap.xml returns 200 with valid XML
2. URLs follow RESTful format /documents/{documentId}
3. More than 50k documents returns 413 (no body)
4. Concurrent requests process sequentially (FIFO)
5. Fatal errors crash with exit code 1
6. Logs use plain text format to stdout/stderr
7. Drive API filter loads from config/settings.js
**Suggested MVP Scope**: Complete all phases (this is a single-story feature)
---
## Format Validation
**ALL tasks follow checklist format**:
- Checkbox: `- [ ]`
- Task ID: Sequential (T001-T083)
- [P] marker: Present only on parallelizable tasks
- [Story] label: Present only on User Story 1 phase tasks (US1)
- Description: Includes clear action and exact file path
- File paths: All specific and relative to the repository root
**Organization by user story**:
- Setup phase: No story label (infrastructure)
- Foundational phase: No story label (blocking prerequisites)
- User Story 1 phase: All tasks marked [US1]
- Polish phase: No story label (cross-cutting)
**Compliance with constitution**:
- Test-First Development: Tests (T020-T040) come before implementation with approval gate
- Monolithic architecture: Single proxy.js route handler for all request logic per plan.md
- Minimal dependencies: Only googleapis + Node.js built-ins per research.md
- Observability: Plain text logging to stdout/stderr per clarification #4, #8
---
## Notes
- This feature has only ONE user story (sitemap generation), so all implementation tasks are in Phase 3
- The feature specification explicitly removed document export functionality from scope (Session 2)
- All 10 clarifications from 3 sessions are incorporated into task descriptions
- Test-first development is mandatory per Constitution Principle III (non-negotiable)
- FIFO queue ensures sequential processing of concurrent requests (no parallel Drive API operations)
- Fatal errors must crash immediately with exit code 1 (no graceful degradation)
- Error responses have NO body (status code only), except 429 includes Retry-After header
- Drive API query filter MUST be configurable via config/settings.js (not hardcoded)