# Data Model: Document Export API Route
**Feature**: 002-document-export
**Date**: 2026-03-09
**Purpose**: Define data structures and entities for document export functionality
## Overview
This feature introduces three primary entities for handling document export requests: **Document**, **ExportRequest**, and **ExportFormat**. These entities represent the data flowing through the export pipeline from request initiation to response delivery.
---
## Entities
### 1. Document
Represents a file stored in Google Drive, accessed by unique ID.
**Attributes**:
| Field | Type | Required | Description | Validation |
|-------|------|----------|-------------|------------|
| `id` | string | Yes | Google Drive document identifier (extracted from URL parameter) | Non-empty string, alphanumeric with hyphens/underscores |
| `name` | string | Yes | Document name from Google Drive metadata | Non-empty string, used in Content-Disposition filename |
| `mimeType` | string | Yes | MIME type of the document | One of Google Workspace types or native file types |
| `exportLinks` | object | No | Map of available export formats to URLs | Key: MIME type (string), Value: Export URL (string) |
**Document Types**:
1. **Google Workspace Documents**:
   - Docs: `application/vnd.google-apps.document`
   - Sheets: `application/vnd.google-apps.spreadsheet`
   - Slides: `application/vnd.google-apps.presentation`
   - **Characteristic**: Have `exportLinks` field with conversion options
2. **Native Files**:
   - PDF: `application/pdf`
   - Images: `image/jpeg`, `image/png`, etc.
   - Other: Various MIME types
   - **Characteristic**: No `exportLinks` field, streamed directly
**State Transitions**:
- N/A (stateless - documents fetched per request)
**Example**:
```javascript
// Google Workspace Document (has exportLinks)
const workspaceDocument = {
  id: "1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms",
  name: "Meeting Notes Q1 2026",
  mimeType: "application/vnd.google-apps.document",
  exportLinks: {
    "text/x-markdown": "https://docs.google.com/feeds/download/documents/export/Export?...",
    "text/html": "https://docs.google.com/feeds/download/documents/export/Export?...",
    "application/pdf": "https://docs.google.com/feeds/download/documents/export/Export?..."
  }
};

// Native PDF (no exportLinks)
const nativePdf = {
  id: "1AbcDeFgHiJkLmNoPqRsTuVwXyZ1234567890",
  name: "Product Specs",
  mimeType: "application/pdf",
  exportLinks: null
};
```
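The two shapes above can be produced from raw metadata with a small normalizer. The following is a minimal sketch, not the project's implementation: `toDocument` and `isWorkspaceDocument` are hypothetical helper names, and it assumes the metadata object simply lacks `exportLinks` for native files, which is then stored as `null` to match the example.

```javascript
// Prefix shared by the three Google Workspace MIME types listed above.
// Note: other Drive-specific types (for example folders) share it too.
const WORKSPACE_PREFIX = "application/vnd.google-apps.";

// Hypothetical helper: map raw file metadata onto the Document entity.
// A missing exportLinks field is stored as null, as in the native PDF example.
function toDocument(metadata) {
  return {
    id: metadata.id,
    name: metadata.name,
    mimeType: metadata.mimeType,
    exportLinks: metadata.exportLinks ?? null,
  };
}

// Hypothetical helper: true for Workspace types that need conversion.
function isWorkspaceDocument(doc) {
  return doc.mimeType.startsWith(WORKSPACE_PREFIX);
}

const normalized = toDocument({
  id: "1AbcDeFgHiJkLmNoPqRsTuVwXyZ1234567890",
  name: "Product Specs",
  mimeType: "application/pdf",
});
console.log(normalized.exportLinks); // null
console.log(isWorkspaceDocument(normalized)); // false
```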
---
### 2. ExportRequest
Represents a user's request to export a document via the `/documents/:documentId` route.
**Attributes**:
| Field | Type | Required | Description | Validation |
|-------|------|----------|-------------|------------|
| `documentId` | string | Yes | Document ID from URL path parameter | Non-empty string, alphanumeric with hyphens/underscores |
| `timestamp` | Date | Yes | Request initiation timestamp | Valid date, serialized as ISO 8601; used for timeout calculation |
| `accessToken` | string | Yes | Google Drive API access token (from auth context) | Non-empty Google OAuth2 access token, not expired |
**Lifecycle**:
1. **Initiated**: Request received on `/documents/:documentId`
2. **Authenticated**: Access token validated and available
3. **Metadata Fetched**: Google Drive API called for document metadata
4. **Format Selected**: Export format chosen based on availability
5. **Content Streamed**: Document content piped to response
6. **Completed**: Response sent to client
**Timeout Handling**:
- Maximum duration: 30 seconds from timestamp
- Enforced via axios timeout configuration
- Returns HTTP 504 if exceeded
**Example**:
```javascript
const exportRequest = {
  documentId: "1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms",
  timestamp: "2026-03-09T18:00:00.000Z", // ISO 8601 serialization of the Date
  accessToken: "ya29.a0AfH6SMBx..." // Google OAuth2 access token
};
```
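To illustrate how `timestamp` can drive the 30-second limit, here is a minimal sketch. `createExportRequest` and `remainingMs` are hypothetical helpers, and keeping `timestamp` as a `Date` in memory is an assumption. The remaining budget could be passed as the `timeout` value (in milliseconds) of an axios request config.

```javascript
const MAX_DURATION_MS = 30_000; // 30-second limit from the Timeout Handling rules

// Hypothetical factory for the ExportRequest entity.
function createExportRequest(documentId, accessToken, now = new Date()) {
  return { documentId, timestamp: now, accessToken };
}

// Milliseconds left before the request should fail with HTTP 504.
// Never negative, so the value is always safe to use as a timeout.
function remainingMs(request, now = new Date()) {
  const elapsed = now.getTime() - request.timestamp.getTime();
  return Math.max(0, MAX_DURATION_MS - elapsed);
}

const request = createExportRequest(
  "1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms",
  "example-token", // placeholder, not a real token
  new Date("2026-03-09T18:00:00.000Z")
);
console.log(remainingMs(request, new Date("2026-03-09T18:00:10.000Z"))); // 20000
```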
---
### 3. ExportFormat
Represents the selected output format for a document export.
**Attributes**:
| Field | Type | Required | Description | Validation |
|-------|------|----------|-------------|------------|
| `mimeType` | string | Yes | MIME type of the export format | One of: `text/x-markdown`, `text/html`, `application/pdf` |
| `extension` | string | Yes | File extension for Content-Disposition header | One of: `md`, `html`, `pdf` |
| `url` | string | Conditional | Export URL from Google Drive exportLinks | Required for Google Workspace docs, null for native files |
| `isNative` | boolean | Yes | Whether this is a native file (direct stream) or export | `true` for native PDFs, `false` for conversions |
**Format Priority**:
Priority order for selection when multiple formats are available:
1. `text/x-markdown` (.md) - Most portable for content processing
2. `text/html` (.html) - Rich formatting fallback
3. `application/pdf` (.pdf) - Universal viewing format
**Selection Rules**:
1. If `exportLinks` exist: Select first available format from priority list
2. If no `exportLinks` and `mimeType === 'application/pdf'`: Use native PDF streaming
3. Otherwise: Return HTTP 403 "mimetype not supported"
**Example**:
```javascript
// Google Workspace Document export (Markdown selected)
const markdownExport = {
  mimeType: "text/x-markdown",
  extension: "md",
  url: "https://docs.google.com/feeds/download/documents/export/Export?...",
  isNative: false
};

// Native PDF file (direct stream)
const nativePdfExport = {
  mimeType: "application/pdf",
  extension: "pdf",
  url: null, // Not used - file streamed directly
  isNative: true
};

// Unsupported file (image): no format can be selected, so the route returns HTTP 403
const unsupportedExport = {
  mimeType: null,
  extension: null,
  url: null,
  isNative: false
};
```
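The priority list and selection rules can be sketched as a single function. This is an illustrative sketch only: `selectExportFormat` is a hypothetical name, and returning `null` for unsupported documents (rather than an object of nulls) is an assumption the caller would translate into HTTP 403.

```javascript
// Priority order: Markdown, then HTML, then PDF.
const FORMAT_PRIORITY = [
  { mimeType: "text/x-markdown", extension: "md" },
  { mimeType: "text/html", extension: "html" },
  { mimeType: "application/pdf", extension: "pdf" },
];

function selectExportFormat(doc) {
  const links = doc.exportLinks ?? {};
  // Rule 1: first format in priority order that has an export URL.
  for (const { mimeType, extension } of FORMAT_PRIORITY) {
    if (typeof links[mimeType] === "string") {
      return { mimeType, extension, url: links[mimeType], isNative: false };
    }
  }
  // Rule 2: native PDFs are streamed directly, no export URL needed.
  if (doc.mimeType === "application/pdf") {
    return { mimeType: "application/pdf", extension: "pdf", url: null, isNative: true };
  }
  // Rule 3: nothing usable; the caller responds with HTTP 403.
  return null;
}

const selected = selectExportFormat({
  mimeType: "application/vnd.google-apps.document",
  exportLinks: { "application/pdf": "https://example.invalid/pdf", "text/html": "https://example.invalid/html" },
});
console.log(selected.mimeType); // text/html
```

Iterating over the priority list rather than over `exportLinks` keeps the result independent of the key order in the metadata.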
---
## Entity Relationships
```
ExportRequest
|
| 1:1 (fetches)
v
Document
|
| 1:1 (determines)
v
ExportFormat
```
**Flow**:
1. ExportRequest initiated with documentId
2. Document metadata fetched from Google Drive API
3. ExportFormat selected based on Document attributes (mimeType, exportLinks)
4. Content streamed using ExportFormat configuration
---
## Validation Rules
### Document Validation
- **ID Format**: Must be valid Google Drive file ID (alphanumeric, hyphens, underscores)
- **Name Sanitization**: Remove special characters for Content-Disposition filename
- **MIME Type**: Must be recognized Google Workspace or native file type
- **Export Links**: If present, must be object with string keys and URL string values
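
As an illustration of the ID and name rules, the sketch below uses two hypothetical helpers. Which characters count as "special" is not defined above, so the allow-list here (ASCII letters, digits, space, dot, hyphen, underscore) and the `"document"` fallback name are assumptions.

```javascript
// Alphanumeric characters, hyphens, and underscores only.
const DRIVE_ID_PATTERN = /^[A-Za-z0-9_-]+$/;

function isValidDocumentId(id) {
  return typeof id === "string" && DRIVE_ID_PATTERN.test(id);
}

// Strip every character outside the assumed allow-list so the name is
// safe inside a quoted Content-Disposition filename.
function sanitizeFilename(name) {
  const cleaned = name.replace(/[^A-Za-z0-9 ._-]/g, "").trim();
  return cleaned.length > 0 ? cleaned : "document"; // assumed fallback
}

console.log(isValidDocumentId("1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms")); // true
console.log(isValidDocumentId("../etc/passwd")); // false
console.log(sanitizeFilename('Q1 "Report"/2026')); // Q1 Report2026
```

An ASCII-only allow-list drops accented and non-Latin characters; a production version would likely need a different policy for such names.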
### Size & Timeout Constraints
- **Max Document Size**: 10MB (10,485,760 bytes)
  - Validated via `Content-Length` header before streaming
  - Returns HTTP 413 if exceeded
- **Max Request Duration**: 30 seconds
  - Enforced via axios timeout
  - Returns HTTP 504 if exceeded
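
A minimal sketch of the pre-stream size check follows. `checkContentLength` is a hypothetical helper; the behavior for a missing or malformed `Content-Length` header is not specified above, so reporting it as "unknown" and leaving the decision to the caller is an assumption.

```javascript
const MAX_DOCUMENT_BYTES = 10 * 1024 * 1024; // 10,485,760 bytes

// Inspect the upstream Content-Length header before any bytes are streamed.
function checkContentLength(headerValue) {
  const bytes = Number.parseInt(headerValue, 10);
  if (Number.isNaN(bytes) || bytes < 0) {
    // Assumption: size unknown, caller decides how to proceed.
    return { ok: true, known: false };
  }
  if (bytes > MAX_DOCUMENT_BYTES) {
    return { ok: false, known: true, status: 413 };
  }
  return { ok: true, known: true };
}

console.log(checkContentLength("10485760")); // { ok: true, known: true }
console.log(checkContentLength("10485761")); // { ok: false, known: true, status: 413 }
```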
### Format Selection Validation
- **Priority Check**: Iterate through formats in order: Markdown → HTML → PDF
- **Availability Check**: Format must exist in exportLinks object
- **Fallback Check**: If no exportLinks, mimeType must be `application/pdf`
- **Rejection**: If none of the above checks succeeds, return HTTP 403
---
## Error States
### Document Not Found
- **Condition**: Google Drive API returns 404 or document doesn't exist
- **Response**: HTTP 404 "Document not found"
- **Data State**: No Document entity created
### Unauthorized Access
- **Condition**: User lacks permissions, invalid/expired token
- **Response**: HTTP 401 "Unauthorized"
- **Data State**: No Document entity created
### Unsupported Format
- **Condition**: No exportLinks, mimeType not application/pdf
- **Response**: HTTP 403 "mimetype not supported"
- **Data State**: Document entity exists, ExportFormat entity null
### Size Limit Exceeded
- **Condition**: Content-Length > 10MB
- **Response**: HTTP 413 "Payload Too Large"
- **Data State**: Document entity exists, ExportFormat selected, streaming aborted
### Timeout Exceeded
- **Condition**: Request duration > 30 seconds
- **Response**: HTTP 504 "Gateway Timeout"
- **Data State**: Partial processing, request abandoned
### Google Drive API Error
- **Condition**: API unavailable, rate limit exceeded
- **Response**: HTTP 502 "Bad Gateway - Google Drive API unavailable"
- **Data State**: Variable depending on failure point
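
The error states above can be collected into one lookup table so that every handler returns the same status and message. This is a sketch under assumptions: the code names (`DOCUMENT_NOT_FOUND` and so on) and the `errorResponse` helper are hypothetical; only the statuses and messages come from this section.

```javascript
const EXPORT_ERRORS = Object.freeze({
  DOCUMENT_NOT_FOUND: { status: 404, message: "Document not found" },
  UNAUTHORIZED: { status: 401, message: "Unauthorized" },
  UNSUPPORTED_FORMAT: { status: 403, message: "mimetype not supported" },
  SIZE_LIMIT_EXCEEDED: { status: 413, message: "Payload Too Large" },
  TIMEOUT_EXCEEDED: { status: 504, message: "Gateway Timeout" },
  DRIVE_API_ERROR: { status: 502, message: "Bad Gateway - Google Drive API unavailable" },
});

// Look up the response for an error code; unknown codes are a programming
// error, so they throw instead of silently mapping to some default.
function errorResponse(code) {
  if (!Object.prototype.hasOwnProperty.call(EXPORT_ERRORS, code)) {
    throw new RangeError(`Unknown export error code: ${code}`);
  }
  const { status, message } = EXPORT_ERRORS[code];
  return { status, message };
}

console.log(errorResponse("SIZE_LIMIT_EXCEEDED").status); // 413
```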
---
## Data Flow Example
**Successful Export (Google Workspace Document)**:
```
1. ExportRequest { documentId: "abc123", timestamp: T0, accessToken: "..." }
2. Document { id: "abc123", name: "Report", mimeType: "application/vnd.google-apps.document", exportLinks: {...} }
3. ExportFormat { mimeType: "text/x-markdown", extension: "md", url: "https://...", isNative: false }
4. Stream content from url to client
5. Response Headers: Content-Type: text/x-markdown, Content-Disposition: inline; filename="Report.md"
```
**Successful Export (Native PDF)**:
```
1. ExportRequest { documentId: "xyz789", timestamp: T0, accessToken: "..." }
2. Document { id: "xyz789", name: "Invoice", mimeType: "application/pdf", exportLinks: null }
3. ExportFormat { mimeType: "application/pdf", extension: "pdf", url: null, isNative: true }
4. Stream file using files.get with alt=media
5. Response Headers: Content-Type: application/pdf, Content-Disposition: inline; filename="Invoice.pdf"
```
**Failed Export (Unsupported Type)**:
```
1. ExportRequest { documentId: "img456", timestamp: T0, accessToken: "..." }
2. Document { id: "img456", name: "Photo", mimeType: "image/jpeg", exportLinks: null }
3. ExportFormat { mimeType: null, extension: null, url: null, isNative: false }
4. Return HTTP 403 "mimetype not supported"
```
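The response headers in step 5 of the successful flows can be derived from the Document and ExportFormat entities. The sketch below uses a hypothetical `buildResponseHeaders` helper and assumes `doc.name` has already been sanitized; skipping the extension when the name already ends with it is also an assumption, added to avoid names such as `Invoice.pdf.pdf`.

```javascript
function buildResponseHeaders(doc, format) {
  const suffix = `.${format.extension}`;
  // Assumption: do not append the extension twice.
  const filename = doc.name.toLowerCase().endsWith(suffix)
    ? doc.name
    : `${doc.name}${suffix}`;
  return {
    "Content-Type": format.mimeType,
    "Content-Disposition": `inline; filename="${filename}"`,
  };
}

const headers = buildResponseHeaders(
  { name: "Report" },
  { mimeType: "text/x-markdown", extension: "md" }
);
console.log(headers["Content-Disposition"]); // inline; filename="Report.md"
```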
---
## Implementation Notes
### Statelessness
- No entities persisted to database or cache
- All data exists only for request duration
- Document metadata fetched fresh per request
### Memory Management
- Document metadata buffered in memory (typically <1KB)
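- Content is never fully buffered; it is streamed directly to the response
- Worst-case memory per request is bounded by the size limit: ~10MB + metadata; streaming should keep actual usage well below that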
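
One way to keep memory bounded while streaming is to count bytes as chunks pass through instead of collecting them. The sketch below is an assumption about how this could be done, not the project's implementation: `limitBytes` is a hypothetical async generator, and it assumes binary chunks (`Buffer` or `Uint8Array`), whose `length` is a byte count.

```javascript
const MAX_DOCUMENT_BYTES = 10 * 1024 * 1024; // 10MB limit from the spec

// Pass chunks through one at a time and fail once the running total
// exceeds the limit. Only the current chunk is held in memory.
async function* limitBytes(source, maxBytes = MAX_DOCUMENT_BYTES) {
  let seen = 0;
  for await (const chunk of source) {
    seen += chunk.length; // byte count for Buffer/Uint8Array chunks
    if (seen > maxBytes) {
      const err = new Error(`Document exceeds ${maxBytes} bytes`);
      err.status = 413; // hypothetical field for the error handler
      throw err;
    }
    yield chunk;
  }
}
```

Node.js readable streams are async iterable, so an upstream response stream could be wrapped as `limitBytes(upstream)` and turned back into a stream with `Readable.from` before piping. If the limit trips after response headers have been sent, the status can no longer be changed to 413 and the connection can only be aborted.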
### Concurrency
- Each request handled independently with isolated ExportRequest entity
- No shared state between requests
- Target: 50 concurrent requests without degradation