# Data Model: Document Export API Route

**Feature**: 002-document-export
**Date**: 2026-03-09
**Purpose**: Define data structures and entities for document export functionality

## Overview

This feature introduces three primary entities for handling document export requests: **Document**, **ExportRequest**, and **ExportFormat**. These entities represent the data flowing through the export pipeline from request initiation to response delivery.

---

## Entities

### 1. Document

Represents a file stored in Google Drive, accessed by unique ID.

**Attributes**:

| Field | Type | Required | Description | Validation |
|-------|------|----------|-------------|------------|
| `id` | string | Yes | Google Drive document identifier (extracted from URL parameter) | Non-empty string, alphanumeric with hyphens/underscores |
| `name` | string | Yes | Document name from Google Drive metadata | Non-empty string, used in Content-Disposition filename |
| `mimeType` | string | Yes | MIME type of the document | One of Google Workspace types or native file types |
| `exportLinks` | object | No | Map of available export formats to URLs | Key: MIME type (string), Value: Export URL (string) |

**Document Types**:

1. **Google Workspace Documents**:
   - Docs: `application/vnd.google-apps.document`
   - Sheets: `application/vnd.google-apps.spreadsheet`
   - Slides: `application/vnd.google-apps.presentation`
   - **Characteristic**: Have `exportLinks` field with conversion options

2. **Native Files**:
   - PDF: `application/pdf`
   - Images: `image/jpeg`, `image/png`, etc.
   - Other: Various MIME types
   - **Characteristic**: No `exportLinks` field, streamed directly

**State Transitions**:

- N/A (stateless - documents fetched per request)

**Example**:

```javascript
// Google Workspace Document (has exportLinks)
{
  id: "1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms",
  name: "Meeting Notes Q1 2026",
  mimeType: "application/vnd.google-apps.document",
  exportLinks: {
    "text/x-markdown": "https://docs.google.com/feeds/download/documents/export/Export?...",
    "text/html": "https://docs.google.com/feeds/download/documents/export/Export?...",
    "application/pdf": "https://docs.google.com/feeds/download/documents/export/Export?..."
  }
}

// Native PDF (no exportLinks)
{
  id: "1AbcDeFgHiJkLmNoPqRsTuVwXyZ1234567890",
  name: "Product Specs",
  mimeType: "application/pdf",
  exportLinks: null
}
```

---

### 2. ExportRequest

Represents a user's request to export a document via the `/documents/:documentId` route.

**Attributes**:

| Field | Type | Required | Description | Validation |
|-------|------|----------|-------------|------------|
| `documentId` | string | Yes | Document ID from URL path parameter | Non-empty string, alphanumeric with hyphens/underscores |
| `timestamp` | Date | Yes | Request initiation timestamp | ISO 8601 format, used for timeout calculation |
| `accessToken` | string | Yes | Google Drive API access token (from auth context) | Valid OAuth 2.0 access token, not expired |

**Lifecycle**:

1. **Initiated**: Request received on `/documents/:documentId`
2. **Authenticated**: Access token validated and available
3. **Metadata Fetched**: Google Drive API called for document metadata
4. **Format Selected**: Export format chosen based on availability
5. **Content Streamed**: Document content piped to response
6. **Completed**: Response sent to client

**Timeout Handling**:

- Maximum duration: 30 seconds from timestamp
- Enforced via axios timeout configuration
- Returns HTTP 504 if exceeded

**Example**:

```javascript
{
  documentId: "1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms",
  timestamp: "2026-03-09T18:00:00.000Z",
  accessToken: "ya29.a0AfH6SMBx..." // Google OAuth2 access token
}
```

---

### 3. ExportFormat

Represents the selected output format for a document export.

**Attributes**:

| Field | Type | Required | Description | Validation |
|-------|------|----------|-------------|------------|
| `mimeType` | string | Yes | MIME type of the export format | One of: `text/x-markdown`, `text/html`, `application/pdf` |
| `extension` | string | Yes | File extension for Content-Disposition header | One of: `md`, `html`, `pdf` |
| `url` | string | Conditional | Export URL from Google Drive exportLinks | Required for Google Workspace docs, null for native files |
| `isNative` | boolean | Yes | Whether this is a native file (direct stream) or export | `true` for native PDFs, `false` for conversions |

**Format Priority**:

Priority order for selection when multiple formats are available:

1. `text/x-markdown` (.md) - Most portable for content processing
2. `text/html` (.html) - Rich formatting fallback
3. `application/pdf` (.pdf) - Universal viewing format

**Selection Rules**:

1. If `exportLinks` exist: Select first available format from priority list
2. If no `exportLinks` and `mimeType === 'application/pdf'`: Use native PDF streaming
3. Otherwise: Return HTTP 403 "mimetype not supported"

**Example**:

```javascript
// Google Workspace Document export (Markdown selected)
{
  mimeType: "text/x-markdown",
  extension: "md",
  url: "https://docs.google.com/feeds/download/documents/export/Export?...",
  isNative: false
}

// Native PDF file (direct stream)
{
  mimeType: "application/pdf",
  extension: "pdf",
  url: null, // Not used - file streamed directly
  isNative: true
}

// Unsupported file (image)
{
  mimeType: null,
  extension: null,
  url: null,
  isNative: false
}
// Returns HTTP 403
```

---

## Entity Relationships

```
ExportRequest
    |
    | 1:1 (fetches)
    v
Document
    |
    | 1:1 (determines)
    v
ExportFormat
```

**Flow**:

1. ExportRequest initiated with documentId
2. Document metadata fetched from Google Drive API
3. ExportFormat selected based on Document attributes (mimeType, exportLinks)
4. Content streamed using ExportFormat configuration

---

## Validation Rules

### Document Validation

- **ID Format**: Must be valid Google Drive file ID (alphanumeric, hyphens, underscores)
- **Name Sanitization**: Remove special characters for Content-Disposition filename
- **MIME Type**: Must be recognized Google Workspace or native file type
- **Export Links**: If present, must be object with string keys and URL string values

### Size & Timeout Constraints

- **Max Document Size**: 10MB (10,485,760 bytes)
  - Validated via `Content-Length` header before streaming
  - Returns HTTP 413 if exceeded
- **Max Request Duration**: 30 seconds
  - Enforced via axios timeout
  - Returns HTTP 504 if exceeded

### Format Selection Validation

- **Priority Check**: Iterate through formats in order: Markdown → HTML → PDF
- **Availability Check**: Format must exist in exportLinks object
- **Fallback Check**: If no exportLinks, mimeType must be `application/pdf`
- **Rejection**: If none of the above, return HTTP 403

---

## Error States

### Document Not Found

- **Condition**: Google Drive API returns 404 or document doesn't exist
- **Response**: HTTP 404 "Document not found"
- **Data State**: No Document entity created

### Unauthorized Access

- **Condition**: User lacks permissions, invalid/expired token
- **Response**: HTTP 401 "Unauthorized"
- **Data State**: No Document entity created

### Unsupported Format

- **Condition**: No exportLinks, mimeType not application/pdf
- **Response**: HTTP 403 "mimetype not supported"
- **Data State**: Document entity exists, ExportFormat entity null

### Size Limit Exceeded

- **Condition**: Content-Length > 10MB
- **Response**: HTTP 413 "Payload Too Large"
- **Data State**: Document entity exists, ExportFormat selected, streaming aborted

### Timeout Exceeded

- **Condition**: Request duration > 30 seconds
- **Response**: HTTP 504 "Gateway Timeout"
- **Data State**: Partial processing, request abandoned

### Google Drive API Error

- **Condition**: API unavailable, rate limit exceeded
- **Response**: HTTP 502 "Bad Gateway - Google Drive API unavailable"
- **Data State**: Variable depending on failure point

---

## Data Flow Example

**Successful Export (Google Workspace Document)**:

```
1. ExportRequest { documentId: "abc123", timestamp: T0, accessToken: "..." }
2. Document { id: "abc123", name: "Report", mimeType: "application/vnd.google-apps.document", exportLinks: {...} }
3. ExportFormat { mimeType: "text/x-markdown", extension: "md", url: "https://...", isNative: false }
4. Stream content from url to client
5. Response Headers: Content-Type: text/x-markdown, Content-Disposition: inline; filename="Report.md"
```

**Successful Export (Native PDF)**:

```
1. ExportRequest { documentId: "xyz789", timestamp: T0, accessToken: "..." }
2. Document { id: "xyz789", name: "Invoice", mimeType: "application/pdf", exportLinks: null }
3. ExportFormat { mimeType: "application/pdf", extension: "pdf", url: null, isNative: true }
4. Stream file using files.get with alt=media
5. Response Headers: Content-Type: application/pdf, Content-Disposition: inline; filename="Invoice.pdf"
```

**Failed Export (Unsupported Type)**:

```
1. ExportRequest { documentId: "img456", timestamp: T0, accessToken: "..." }
2. Document { id: "img456", name: "Photo", mimeType: "image/jpeg", exportLinks: null }
3. ExportFormat { mimeType: null, extension: null, url: null, isNative: false }
4. Return HTTP 403 "mimetype not supported"
```

---

## Implementation Notes

### Statelessness

- No entities persisted to database or cache
- All data exists only for request duration
- Document metadata fetched fresh per request

### Memory Management

- Document metadata buffered in memory (typically <1KB)
- Content never buffered - streamed directly
- Maximum memory per request: ~10MB + metadata

### Concurrency

- Each request handled independently with isolated ExportRequest entity
- No shared state between requests
- Target: 50 concurrent requests without degradation
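
### Format Selection Sketch

The selection rules listed under ExportFormat (priority order Markdown → HTML → PDF, native PDF fallback, otherwise HTTP 403) could be expressed roughly as follows. This is a minimal sketch, not the route's actual implementation: `FORMAT_PRIORITY` and `selectExportFormat` are hypothetical names, and returning `null` for unsupported documents is an assumption based on the "ExportFormat entity null" data state in the Error States section.

```javascript
// Priority order taken from the "Format Priority" list in this document.
const FORMAT_PRIORITY = [
  { mimeType: "text/x-markdown", extension: "md" },
  { mimeType: "text/html", extension: "html" },
  { mimeType: "application/pdf", extension: "pdf" },
];

// Hypothetical helper: takes a Document-shaped object and returns an
// ExportFormat-shaped object, or null when the caller should answer
// with HTTP 403 "mimetype not supported".
function selectExportFormat(doc) {
  const exportLinks = doc.exportLinks;

  // Rule 1: exportLinks exist, so pick the first available priority format.
  if (exportLinks && Object.keys(exportLinks).length > 0) {
    for (const { mimeType, extension } of FORMAT_PRIORITY) {
      if (typeof exportLinks[mimeType] === "string") {
        return { mimeType, extension, url: exportLinks[mimeType], isNative: false };
      }
    }
    // Assumption: exportLinks that offer none of the priority formats
    // are treated as unsupported. The rules above do not cover this case.
    return null;
  }

  // Rule 2: no exportLinks and a native PDF, so stream the file directly.
  if (doc.mimeType === "application/pdf") {
    return { mimeType: "application/pdf", extension: "pdf", url: null, isNative: true };
  }

  // Rule 3: everything else is unsupported.
  return null;
}
```

A route handler would call `selectExportFormat` once the Document metadata has been fetched, then either stream from `url`, stream the native file when `isNative` is `true`, or respond with HTTP 403 when the result is `null`. Keeping the priority list as data means the order can be changed without touching the selection logic.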