# Data Model: Document Export API Route

**Feature**: 002-document-export
**Date**: 2026-03-09
**Purpose**: Define data structures and entities for document export functionality

## Overview

This feature introduces three primary entities for handling document export requests: **Document**, **ExportRequest**, and **ExportFormat**. These entities represent the data flowing through the export pipeline from request initiation to response delivery.

---

## Entities

### 1. Document

Represents a file stored in Google Drive, accessed by unique ID.

**Attributes**:

| Field | Type | Required | Description | Validation |
|-------|------|----------|-------------|------------|
| `id` | string | Yes | Google Drive document identifier (extracted from URL parameter) | Non-empty string, alphanumeric with hyphens/underscores |
| `name` | string | Yes | Document name from Google Drive metadata | Non-empty string, used in Content-Disposition filename |
| `mimeType` | string | Yes | MIME type of the document | One of Google Workspace types or native file types |
| `exportLinks` | object | No | Map of available export formats to URLs | Key: MIME type (string), Value: Export URL (string) |

**Document Types**:

1. **Google Workspace Documents**:
   - Docs: `application/vnd.google-apps.document`
   - Sheets: `application/vnd.google-apps.spreadsheet`
   - Slides: `application/vnd.google-apps.presentation`
   - **Characteristic**: Have `exportLinks` field with conversion options

2. **Native Files**:
   - PDF: `application/pdf`
   - Images: `image/jpeg`, `image/png`, etc.
   - Other: Various MIME types
   - **Characteristic**: No `exportLinks` field, streamed directly

**State Transitions**:
- N/A (stateless - documents fetched per request)

**Example**:
```javascript
// Google Workspace Document (has exportLinks)
const workspaceDocument = {
  id: "1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms",
  name: "Meeting Notes Q1 2026",
  mimeType: "application/vnd.google-apps.document",
  exportLinks: {
    "text/x-markdown": "https://docs.google.com/feeds/download/documents/export/Export?...",
    "text/html": "https://docs.google.com/feeds/download/documents/export/Export?...",
    "application/pdf": "https://docs.google.com/feeds/download/documents/export/Export?..."
  }
};

// Native PDF (no exportLinks)
const nativePdf = {
  id: "1AbcDeFgHiJkLmNoPqRsTuVwXyZ1234567890",
  name: "Product Specs",
  mimeType: "application/pdf",
  exportLinks: null
};
```
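
The validation column and the two document types above can be checked with a couple of small predicates. The following is a minimal sketch, not the route's actual implementation: the names `DOCUMENT_ID_PATTERN`, `isValidDocumentId`, and `hasExportLinks` are hypothetical, and treating an empty `exportLinks` object the same as a missing one is an assumption.

```javascript
// Hypothetical helpers; names are illustrative and not defined by this spec.

// "Non-empty string, alphanumeric with hyphens/underscores"
const DOCUMENT_ID_PATTERN = /^[A-Za-z0-9_-]+$/;

function isValidDocumentId(id) {
  return typeof id === "string" && DOCUMENT_ID_PATTERN.test(id);
}

// Google Workspace documents carry an exportLinks map; native files do not.
// Assumption: an empty map is treated like a missing one.
function hasExportLinks(doc) {
  return (
    doc.exportLinks !== null &&
    typeof doc.exportLinks === "object" &&
    Object.keys(doc.exportLinks).length > 0
  );
}
```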

---

### 2. ExportRequest

Represents a user's request to export a document via the `/documents/:documentId` route.

**Attributes**:

| Field | Type | Required | Description | Validation |
|-------|------|----------|-------------|------------|
| `documentId` | string | Yes | Document ID from URL path parameter | Non-empty string, alphanumeric with hyphens/underscores |
| `timestamp` | Date | Yes | Request initiation timestamp | ISO 8601 format, used for timeout calculation |
| `accessToken` | string | Yes | Google Drive API access token (from auth context) | Valid Google OAuth 2.0 access token, not expired |

**Lifecycle**:
1. **Initiated**: Request received on `/documents/:documentId`
2. **Authenticated**: Access token validated and available
3. **Metadata Fetched**: Google Drive API called for document metadata
4. **Format Selected**: Export format chosen based on availability
5. **Content Streamed**: Document content piped to response
6. **Completed**: Response sent to client

**Timeout Handling**:
- Maximum duration: 30 seconds from timestamp
- Enforced via axios timeout configuration
- Returns HTTP 504 if exceeded
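
One way to derive the axios timeout from the request `timestamp` is sketched below. It is an illustration under stated assumptions: `remainingBudgetMs` and `upstreamRequestConfig` are hypothetical names, and returning `null` to signal an exhausted budget is a choice made here, not something the spec prescribes. The `timeout` (milliseconds), `responseType`, and `headers` keys are standard axios request options.

```javascript
const MAX_DURATION_MS = 30_000; // 30-second budget from the spec

// Milliseconds left before the request exceeds its budget (never negative).
function remainingBudgetMs(request, now = new Date()) {
  const startedAt = new Date(request.timestamp).getTime();
  const elapsed = now.getTime() - startedAt;
  return Math.max(0, MAX_DURATION_MS - elapsed);
}

// Builds an options object that could be passed to an axios call.
// Returns null when the budget is already spent, because axios treats
// `timeout: 0` as "no timeout"; the caller would answer with HTTP 504.
function upstreamRequestConfig(request, now = new Date()) {
  const budget = remainingBudgetMs(request, now);
  if (budget === 0) {
    return null;
  }
  return {
    timeout: budget,
    responseType: "stream",
    headers: { Authorization: `Bearer ${request.accessToken}` },
  };
}
```

Computing the remaining budget per upstream call, rather than passing a fixed 30 seconds each time, keeps the metadata fetch and the content stream inside one shared 30-second window.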

**Example**:
```javascript
const exportRequest = {
  documentId: "1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms",
  timestamp: "2026-03-09T18:00:00.000Z",
  accessToken: "ya29.a0AfH6SMBx..." // Google OAuth2 access token
};
```

---

### 3. ExportFormat

Represents the selected output format for a document export.

**Attributes**:

| Field | Type | Required | Description | Validation |
|-------|------|----------|-------------|------------|
| `mimeType` | string | Yes | MIME type of the export format | One of: `text/x-markdown`, `text/html`, `application/pdf` |
| `extension` | string | Yes | File extension for Content-Disposition header | One of: `md`, `html`, `pdf` |
| `url` | string | Conditional | Export URL from Google Drive exportLinks | Required for Google Workspace docs, null for native files |
| `isNative` | boolean | Yes | Whether this is a native file (direct stream) or export | `true` for native PDFs, `false` for conversions |

**Format Priority**:
Priority order for selection when multiple formats are available:
1. `text/x-markdown` (.md) - Most portable for content processing
2. `text/html` (.html) - Rich formatting fallback
3. `application/pdf` (.pdf) - Universal viewing format

**Selection Rules**:
1. If `exportLinks` exists: Select the first available format from the priority list
2. If there are no `exportLinks` and `mimeType === 'application/pdf'`: Use native PDF streaming
3. Otherwise: Return HTTP 403 "mimetype not supported"
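
The priority list and selection rules can be expressed as a single pure function. This is a sketch rather than the route's actual code: `FORMAT_PRIORITY` and `selectExportFormat` are hypothetical names, and returning `null` for the rejection case (which the caller would turn into HTTP 403) is an assumption. The spec also does not say what happens when `exportLinks` exists but offers none of the three formats; this sketch treats that case as unsupported.

```javascript
// Priority order from the spec: Markdown, then HTML, then PDF.
const FORMAT_PRIORITY = [
  { mimeType: "text/x-markdown", extension: "md" },
  { mimeType: "text/html", extension: "html" },
  { mimeType: "application/pdf", extension: "pdf" },
];

// Returns an ExportFormat-shaped object, or null when nothing applies.
function selectExportFormat(doc) {
  const links = doc.exportLinks;
  if (links && Object.keys(links).length > 0) {
    // Rule 1: first prioritized format present in exportLinks wins.
    for (const format of FORMAT_PRIORITY) {
      if (typeof links[format.mimeType] === "string") {
        return { ...format, url: links[format.mimeType], isNative: false };
      }
    }
    return null; // assumption: no prioritized format means unsupported
  }
  // Rule 2: native PDFs are streamed directly, so no export URL is needed.
  if (doc.mimeType === "application/pdf") {
    return { mimeType: "application/pdf", extension: "pdf", url: null, isNative: true };
  }
  // Rule 3: everything else is rejected by the caller.
  return null;
}
```

For a Google Doc whose `exportLinks` contains only HTML and PDF entries, this returns the HTML format, since Markdown is skipped when absent.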

**Example**:
```javascript
// Google Workspace Document export (Markdown selected)
const markdownExport = {
  mimeType: "text/x-markdown",
  extension: "md",
  url: "https://docs.google.com/feeds/download/documents/export/Export?...",
  isNative: false
};

// Native PDF file (direct stream)
const nativePdfExport = {
  mimeType: "application/pdf",
  extension: "pdf",
  url: null, // Not used - file streamed directly
  isNative: true
};

// Unsupported file (image)
const unsupportedExport = {
  mimeType: null,
  extension: null,
  url: null,
  isNative: false
};
// Returns HTTP 403
```

---

## Entity Relationships

```
ExportRequest
    |
    | 1:1 (fetches)
    v
Document
    |
    | 1:1 (determines)
    v
ExportFormat
```

**Flow**:
1. ExportRequest initiated with documentId
2. Document metadata fetched from Google Drive API
3. ExportFormat selected based on Document attributes (mimeType, exportLinks)
4. Content streamed using ExportFormat configuration

---

## Validation Rules

### Document Validation
- **ID Format**: Must be valid Google Drive file ID (alphanumeric, hyphens, underscores)
- **Name Sanitization**: Remove special characters for Content-Disposition filename
- **MIME Type**: Must be recognized Google Workspace or native file type
- **Export Links**: If present, must be object with string keys and URL string values
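
The name sanitization rule could look roughly like the sketch below. The spec does not define which characters count as "special", so the allow-list here (ASCII letters, digits, space, dot, underscore, hyphen) and the `"document"` fallback are assumptions, and `sanitizeFilename` is a hypothetical name.

```javascript
// Keeps only characters that are safe inside a quoted Content-Disposition
// filename. Assumption: ASCII letters, digits, space, dot, underscore, hyphen.
function sanitizeFilename(name, fallback = "document") {
  const cleaned = name
    .replace(/[^A-Za-z0-9 ._-]/g, "") // drop quotes, slashes, control chars, non-ASCII
    .replace(/\s+/g, " ") // collapse runs of spaces
    .trim();
  // An all-special name would otherwise produce an empty filename.
  return cleaned.length > 0 ? cleaned : fallback;
}
```

An allow-list this strict also removes non-ASCII characters from document names. If those need to survive, the `filename*` parameter from RFC 6266 is the usual mechanism, which this sketch does not attempt.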

### Size & Timeout Constraints
- **Max Document Size**: 10MB (10,485,760 bytes)
  - Validated via `Content-Length` header before streaming
  - Returns HTTP 413 if exceeded
- **Max Request Duration**: 30 seconds
  - Enforced via axios timeout
  - Returns HTTP 504 if exceeded
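
A sketch of the size check follows. `exceedsSizeLimit` is a hypothetical name, and the header object is assumed to use lowercase keys, as Node-style response headers do. The spec does not say what to do when `Content-Length` is missing or unparsable; letting the request through in that case is an assumption made only for this example.

```javascript
const MAX_DOCUMENT_BYTES = 10 * 1024 * 1024; // 10,485,760 bytes

// True when the upstream response declares a body larger than the limit,
// in which case the caller would answer with HTTP 413 before streaming.
function exceedsSizeLimit(headers) {
  const raw = headers["content-length"];
  const size = Number(raw);
  if (raw === undefined || !Number.isFinite(size)) {
    return false; // assumption: unknown size is not rejected here
  }
  return size > MAX_DOCUMENT_BYTES;
}
```

A document of exactly 10,485,760 bytes passes; one byte more is rejected.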

### Format Selection Validation
- **Priority Check**: Iterate through formats in order: Markdown → HTML → PDF
- **Availability Check**: Format must exist in exportLinks object
- **Fallback Check**: If no exportLinks, mimeType must be `application/pdf`
- **Rejection**: If none of the above applies, return HTTP 403

---

## Error States

### Document Not Found
- **Condition**: Google Drive API returns 404 or document doesn't exist
- **Response**: HTTP 404 "Document not found"
- **Data State**: No Document entity created

### Unauthorized Access
- **Condition**: User lacks permissions, invalid/expired token
- **Response**: HTTP 401 "Unauthorized"
- **Data State**: No Document entity created

### Unsupported Format
- **Condition**: No exportLinks, mimeType not application/pdf
- **Response**: HTTP 403 "mimetype not supported"
- **Data State**: Document entity exists, ExportFormat entity null

### Size Limit Exceeded
- **Condition**: Content-Length > 10MB
- **Response**: HTTP 413 "Payload Too Large"
- **Data State**: Document entity exists, ExportFormat selected, streaming aborted

### Timeout Exceeded
- **Condition**: Request duration > 30 seconds
- **Response**: HTTP 504 "Gateway Timeout"
- **Data State**: Partial processing, request abandoned

### Google Drive API Error
- **Condition**: API unavailable, rate limit exceeded
- **Response**: HTTP 502 "Bad Gateway - Google Drive API unavailable"
- **Data State**: Variable depending on failure point
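
The error states above can be collected into one lookup table so that every response uses the same status and message. This is a sketch only: `EXPORT_ERRORS` and `errorForUpstreamStatus` are hypothetical names, and mapping an upstream 403 to the "Unauthorized" state is an assumption based on the "user lacks permissions" condition. Drive may also signal rate limiting with a 403, which this simplified mapping does not distinguish.

```javascript
// Status codes and messages copied from the error states in this section.
const EXPORT_ERRORS = Object.freeze({
  NOT_FOUND: { status: 404, message: "Document not found" },
  UNAUTHORIZED: { status: 401, message: "Unauthorized" },
  UNSUPPORTED_FORMAT: { status: 403, message: "mimetype not supported" },
  SIZE_LIMIT_EXCEEDED: { status: 413, message: "Payload Too Large" },
  TIMEOUT: { status: 504, message: "Gateway Timeout" },
  UPSTREAM_ERROR: { status: 502, message: "Bad Gateway - Google Drive API unavailable" },
});

// Translates a Google Drive API HTTP status into one of the states above.
// Assumption: 401 and 403 both map to UNAUTHORIZED; anything else that is
// not a 404 (5xx, 429, ...) is reported as an upstream failure.
function errorForUpstreamStatus(upstreamStatus) {
  if (upstreamStatus === 404) return EXPORT_ERRORS.NOT_FOUND;
  if (upstreamStatus === 401 || upstreamStatus === 403) return EXPORT_ERRORS.UNAUTHORIZED;
  return EXPORT_ERRORS.UPSTREAM_ERROR;
}
```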

---

## Data Flow Example

**Successful Export (Google Workspace Document)**:
```
1. ExportRequest { documentId: "abc123", timestamp: T0, accessToken: "..." }
2. Document { id: "abc123", name: "Report", mimeType: "application/vnd.google-apps.document", exportLinks: {...} }
3. ExportFormat { mimeType: "text/x-markdown", extension: "md", url: "https://...", isNative: false }
4. Stream content from url to client
5. Response Headers: Content-Type: text/x-markdown, Content-Disposition: inline; filename="Report.md"
```

**Successful Export (Native PDF)**:
```
1. ExportRequest { documentId: "xyz789", timestamp: T0, accessToken: "..." }
2. Document { id: "xyz789", name: "Invoice", mimeType: "application/pdf", exportLinks: null }
3. ExportFormat { mimeType: "application/pdf", extension: "pdf", url: null, isNative: true }
4. Stream file using files.get with alt=media
5. Response Headers: Content-Type: application/pdf, Content-Disposition: inline; filename="Invoice.pdf"
```
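
Step 5 in both successful flows derives the response headers from the Document and the selected ExportFormat. A minimal sketch of that step is shown here; `buildResponseHeaders` is a hypothetical name, and the document name is assumed to have been sanitized already, so it contains no quotes or other characters that would break the quoted filename.

```javascript
// Builds the headers from step 5 of the flows above.
// Assumption: doc.name has already been sanitized for use in a filename.
function buildResponseHeaders(doc, format) {
  return {
    "Content-Type": format.mimeType,
    "Content-Disposition": `inline; filename="${doc.name}.${format.extension}"`,
  };
}

// Usage with the native PDF flow:
const headers = buildResponseHeaders(
  { id: "xyz789", name: "Invoice", mimeType: "application/pdf", exportLinks: null },
  { mimeType: "application/pdf", extension: "pdf", url: null, isNative: true }
);
```

Here `headers["Content-Disposition"]` is `inline; filename="Invoice.pdf"`, matching the native PDF flow.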

**Failed Export (Unsupported Type)**:
```
1. ExportRequest { documentId: "img456", timestamp: T0, accessToken: "..." }
2. Document { id: "img456", name: "Photo", mimeType: "image/jpeg", exportLinks: null }
3. ExportFormat { mimeType: null, extension: null, url: null, isNative: false }
4. Return HTTP 403 "mimetype not supported"
```

---

## Implementation Notes

### Statelessness
- No entities persisted to database or cache
- All data exists only for request duration
- Document metadata fetched fresh per request

### Memory Management
- Document metadata buffered in memory (typically <1KB)
- Content is never buffered in full; it is streamed directly to the response
- Maximum memory per request: ~10MB + metadata

### Concurrency
- Each request handled independently with isolated ExportRequest entity
- No shared state between requests
- Target: 50 concurrent requests without degradation