Added new feature for document export, including API contracts, data model, implementation plan, and tests. Updated related configurations and instructions.
285
specs/002-document-export/data-model.md
Normal file
@@ -0,0 +1,285 @@
# Data Model: Document Export API Route

**Feature**: 002-document-export
**Date**: 2026-03-09
**Purpose**: Define data structures and entities for document export functionality

## Overview

This feature introduces three primary entities for handling document export requests: **Document**, **ExportRequest**, and **ExportFormat**. These entities represent the data flowing through the export pipeline from request initiation to response delivery.

---

## Entities

### 1. Document

Represents a file stored in Google Drive, accessed by unique ID.

**Attributes**:

| Field | Type | Required | Description | Validation |
|-------|------|----------|-------------|------------|
| `id` | string | Yes | Google Drive document identifier (extracted from URL parameter) | Non-empty string, alphanumeric with hyphens/underscores |
| `name` | string | Yes | Document name from Google Drive metadata | Non-empty string, used in Content-Disposition filename |
| `mimeType` | string | Yes | MIME type of the document | One of Google Workspace types or native file types |
| `exportLinks` | object | No | Map of available export formats to URLs | Key: MIME type (string), Value: Export URL (string) |

**Document Types**:

1. **Google Workspace Documents**:
   - Docs: `application/vnd.google-apps.document`
   - Sheets: `application/vnd.google-apps.spreadsheet`
   - Slides: `application/vnd.google-apps.presentation`
   - **Characteristic**: Have `exportLinks` field with conversion options

2. **Native Files**:
   - PDF: `application/pdf`
   - Images: `image/jpeg`, `image/png`, etc.
   - Other: Various MIME types
   - **Characteristic**: No `exportLinks` field, streamed directly
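
The distinction between the two document types can be sketched as a small predicate. This is a minimal illustration, not the feature's actual code: the function name `isWorkspaceDocument` is hypothetical, and it assumes that the presence of a non-empty `exportLinks` map is the only signal used to tell the types apart.

```javascript
// Hypothetical helper: decides whether a Document needs conversion
// (Google Workspace) or can be streamed as-is (native file).
// Assumption: a missing, null, or empty exportLinks map means "native".
function isWorkspaceDocument(doc) {
  const links = doc.exportLinks;
  return (
    links !== null &&
    typeof links === "object" &&
    Object.keys(links).length > 0
  );
}
```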

**State Transitions**:
- N/A (stateless - documents fetched per request)

**Example**:
```javascript
// Google Workspace Document (has exportLinks)
{
  id: "1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms",
  name: "Meeting Notes Q1 2026",
  mimeType: "application/vnd.google-apps.document",
  exportLinks: {
    "text/x-markdown": "https://docs.google.com/feeds/download/documents/export/Export?...",
    "text/html": "https://docs.google.com/feeds/download/documents/export/Export?...",
    "application/pdf": "https://docs.google.com/feeds/download/documents/export/Export?..."
  }
}

// Native PDF (no exportLinks)
{
  id: "1AbcDeFgHiJkLmNoPqRsTuVwXyZ1234567890",
  name: "Product Specs",
  mimeType: "application/pdf",
  exportLinks: null
}
```

---

### 2. ExportRequest

Represents a user's request to export a document via the `/documents/:documentId` route.

**Attributes**:

| Field | Type | Required | Description | Validation |
|-------|------|----------|-------------|------------|
| `documentId` | string | Yes | Document ID from URL path parameter | Non-empty string, alphanumeric with hyphens/underscores |
| `timestamp` | Date | Yes | Request initiation timestamp | ISO 8601 format, used for timeout calculation |
| `accessToken` | string | Yes | Google Drive API access token (from auth context) | Non-empty Google OAuth2 access token, not expired |

**Lifecycle**:
1. **Initiated**: Request received on `/documents/:documentId`
2. **Authenticated**: Access token validated and available
3. **Metadata Fetched**: Google Drive API called for document metadata
4. **Format Selected**: Export format chosen based on availability
5. **Content Streamed**: Document content piped to response
6. **Completed**: Response sent to client

**Timeout Handling**:
- Maximum duration: 30 seconds from timestamp
- Enforced via axios timeout configuration
- Returns HTTP 504 if exceeded
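
Because the 30-second budget is measured from `timestamp` and the lifecycle makes more than one upstream call, each call can only use whatever time is left. The sketch below shows one way to compute that remainder; `remainingTimeoutMs` and `MAX_REQUEST_DURATION_MS` are hypothetical names. The result would be passed as the `timeout` option of an axios request. Note that axios treats `timeout: 0` as "no timeout", so a caller should respond with 504 immediately when the remainder is 0 rather than pass it through.

```javascript
// Hypothetical constant: the 30-second limit described above.
const MAX_REQUEST_DURATION_MS = 30000;

// Returns the milliseconds still available to an ExportRequest.
// `now` is injectable so the calculation can be checked deterministically.
function remainingTimeoutMs(exportRequest, now = Date.now()) {
  const startedAt = new Date(exportRequest.timestamp).getTime();
  const elapsed = now - startedAt;
  // Never return a negative value; 0 means the budget is exhausted.
  return Math.max(0, MAX_REQUEST_DURATION_MS - elapsed);
}
```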

**Example**:
```javascript
{
  documentId: "1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms",
  timestamp: "2026-03-09T18:00:00.000Z",
  accessToken: "ya29.a0AfH6SMBx..." // Google OAuth2 access token
}
```

---

### 3. ExportFormat

Represents the selected output format for a document export.

**Attributes**:

| Field | Type | Required | Description | Validation |
|-------|------|----------|-------------|------------|
| `mimeType` | string | Yes | MIME type of the export format | One of: `text/x-markdown`, `text/html`, `application/pdf` |
| `extension` | string | Yes | File extension for Content-Disposition header | One of: `md`, `html`, `pdf` |
| `url` | string | Conditional | Export URL from Google Drive exportLinks | Required for Google Workspace docs, null for native files |
| `isNative` | boolean | Yes | Whether this is a native file (direct stream) or export | `true` for native PDFs, `false` for conversions |

**Format Priority**:
Priority order for selection when multiple formats are available:
1. `text/x-markdown` (.md) - Most portable for content processing
2. `text/html` (.html) - Rich formatting fallback
3. `application/pdf` (.pdf) - Universal viewing format

**Selection Rules**:
1. If `exportLinks` exists: Select the first available format from the priority list
2. If no `exportLinks` and `mimeType === 'application/pdf'`: Use native PDF streaming
3. Otherwise: Return HTTP 403 "mimetype not supported"
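
The priority list and selection rules above can be sketched as a single function. This is an illustration under stated assumptions, not the feature's implementation: `FORMAT_PRIORITY` and `selectExportFormat` are hypothetical names, and a document whose `exportLinks` contains none of the three prioritized formats is assumed to fall through to rule 3. The unsupported case returns an all-null ExportFormat, which the caller would translate into HTTP 403.

```javascript
// Hypothetical priority table, in the order given above.
const FORMAT_PRIORITY = [
  { mimeType: "text/x-markdown", extension: "md" },
  { mimeType: "text/html", extension: "html" },
  { mimeType: "application/pdf", extension: "pdf" },
];

// Returns an ExportFormat-shaped object for the given Document.
function selectExportFormat(doc) {
  const links = doc.exportLinks;
  const hasLinks =
    links !== null && typeof links === "object" && Object.keys(links).length > 0;

  if (hasLinks) {
    // Rule 1: first prioritized format that has an export URL.
    for (const { mimeType, extension } of FORMAT_PRIORITY) {
      if (typeof links[mimeType] === "string") {
        return { mimeType, extension, url: links[mimeType], isNative: false };
      }
    }
  } else if (doc.mimeType === "application/pdf") {
    // Rule 2: native PDF, streamed directly without an export URL.
    return { mimeType: "application/pdf", extension: "pdf", url: null, isNative: true };
  }

  // Rule 3: unsupported; the caller responds with HTTP 403.
  return { mimeType: null, extension: null, url: null, isNative: false };
}
```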

**Example**:
```javascript
// Google Workspace Document export (Markdown selected)
{
  mimeType: "text/x-markdown",
  extension: "md",
  url: "https://docs.google.com/feeds/download/documents/export/Export?...",
  isNative: false
}

// Native PDF file (direct stream)
{
  mimeType: "application/pdf",
  extension: "pdf",
  url: null, // Not used - file streamed directly
  isNative: true
}

// Unsupported file (image)
{
  mimeType: null,
  extension: null,
  url: null,
  isNative: false
}
// Returns HTTP 403
```

---

## Entity Relationships

```
ExportRequest
    |
    | 1:1 (fetches)
    v
Document
    |
    | 1:1 (determines)
    v
ExportFormat
```

**Flow**:
1. ExportRequest initiated with documentId
2. Document metadata fetched from Google Drive API
3. ExportFormat selected based on Document attributes (mimeType, exportLinks)
4. Content streamed using ExportFormat configuration

---

## Validation Rules

### Document Validation
- **ID Format**: Must be valid Google Drive file ID (alphanumeric, hyphens, underscores)
- **Name Sanitization**: Remove special characters for Content-Disposition filename
- **MIME Type**: Must be recognized Google Workspace or native file type
- **Export Links**: If present, must be object with string keys and URL string values
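
The ID and name rules above can be sketched as two small helpers. Both function names are hypothetical. The spec does not say which characters count as "special", so the allow-list used here (letters, digits, space, dot, hyphen, underscore) and the `"document"` fallback for names that sanitize to nothing are assumptions made for illustration.

```javascript
// Alphanumeric characters, hyphens, and underscores only; at least one character.
const DOCUMENT_ID_PATTERN = /^[A-Za-z0-9_-]+$/;

// Hypothetical helper: checks the ID format rule.
function isValidDocumentId(id) {
  return typeof id === "string" && DOCUMENT_ID_PATTERN.test(id);
}

// Hypothetical helper: strips characters outside the assumed allow-list
// so the name is safe inside a quoted Content-Disposition filename.
function sanitizeFilename(name) {
  const cleaned = String(name).replace(/[^A-Za-z0-9 ._-]/g, "").trim();
  // Assumption: fall back to a generic name if nothing survives.
  return cleaned.length > 0 ? cleaned : "document";
}
```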

### Size & Timeout Constraints
- **Max Document Size**: 10MB (10,485,760 bytes)
  - Validated via `Content-Length` header before streaming
  - Returns HTTP 413 if exceeded
- **Max Request Duration**: 30 seconds
  - Enforced via axios timeout
  - Returns HTTP 504 if exceeded
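
The size check can be sketched as a predicate over the upstream response headers. The names below are hypothetical, and the sketch assumes lower-cased header keys, as Node's `http` module provides them. It also assumes that a missing or unparseable `Content-Length` is not treated as an error; in that case the size cannot be validated up front and would need to be enforced some other way, which this sketch does not cover.

```javascript
// 10MB limit from the constraint above: 10 * 1024 * 1024 = 10,485,760 bytes.
const MAX_DOCUMENT_BYTES = 10 * 1024 * 1024;

// Hypothetical helper: true when the declared size exceeds the limit,
// in which case the caller would respond with HTTP 413 before streaming.
function exceedsSizeLimit(headers) {
  const raw = headers["content-length"];
  if (raw === undefined || raw === null) {
    return false; // Size unknown: cannot decide from headers alone.
  }
  const declared = Number.parseInt(String(raw), 10);
  return Number.isFinite(declared) && declared > MAX_DOCUMENT_BYTES;
}
```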

### Format Selection Validation
- **Priority Check**: Iterate through formats in order: Markdown → HTML → PDF
- **Availability Check**: Format must exist in exportLinks object
- **Fallback Check**: If no exportLinks, mimeType must be `application/pdf`
- **Rejection**: If none of the above applies, return HTTP 403

---

## Error States

### Document Not Found
- **Condition**: Google Drive API returns 404 or document doesn't exist
- **Response**: HTTP 404 "Document not found"
- **Data State**: No Document entity created

### Unauthorized Access
- **Condition**: User lacks permissions, invalid/expired token
- **Response**: HTTP 401 "Unauthorized"
- **Data State**: No Document entity created

### Unsupported Format
- **Condition**: No exportLinks, mimeType not application/pdf
- **Response**: HTTP 403 "mimetype not supported"
- **Data State**: Document entity exists, ExportFormat entity null

### Size Limit Exceeded
- **Condition**: Content-Length > 10MB
- **Response**: HTTP 413 "Payload Too Large"
- **Data State**: Document entity exists, ExportFormat selected, streaming aborted

### Timeout Exceeded
- **Condition**: Request duration > 30 seconds
- **Response**: HTTP 504 "Gateway Timeout"
- **Data State**: Partial processing, request abandoned

### Google Drive API Error
- **Condition**: API unavailable, rate limit exceeded
- **Response**: HTTP 502 "Bad Gateway - Google Drive API unavailable"
- **Data State**: Variable depending on failure point
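
The upstream-facing error states above (not found, unauthorized, timeout, API failure) could be collapsed into one mapping function. The sketch below is an assumption-heavy illustration: `toClientError` is a hypothetical name, and the input is assumed to be an axios-style error (`err.code` for transport failures, `err.response.status` for upstream HTTP statuses). Treating an upstream 403 as "Unauthorized" is also an assumption; Drive may use 403 for rate limiting as well, and telling those cases apart would require inspecting the error body, which is omitted here.

```javascript
// Hypothetical helper: maps an axios-style upstream error to the
// status and message this route would return to the client.
function toClientError(err) {
  // axios reports a request timeout as ECONNABORTED by default
  // (ETIMEDOUT when configured to clarify timeout errors).
  if (err && (err.code === "ECONNABORTED" || err.code === "ETIMEDOUT")) {
    return { status: 504, message: "Gateway Timeout" };
  }

  const upstreamStatus = err && err.response ? err.response.status : undefined;

  if (upstreamStatus === 404) {
    return { status: 404, message: "Document not found" };
  }
  // Assumption: both upstream 401 and 403 surface as 401 to the client.
  if (upstreamStatus === 401 || upstreamStatus === 403) {
    return { status: 401, message: "Unauthorized" };
  }
  // Everything else (5xx, 429, network failures) is an upstream failure.
  return { status: 502, message: "Bad Gateway - Google Drive API unavailable" };
}
```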

---

## Data Flow Example

**Successful Export (Google Workspace Document)**:
```
1. ExportRequest { documentId: "abc123", timestamp: T0, accessToken: "..." }
2. Document { id: "abc123", name: "Report", mimeType: "application/vnd.google-apps.document", exportLinks: {...} }
3. ExportFormat { mimeType: "text/x-markdown", extension: "md", url: "https://...", isNative: false }
4. Stream content from url to client
5. Response Headers: Content-Type: text/x-markdown, Content-Disposition: inline; filename="Report.md"
```

**Successful Export (Native PDF)**:
```
1. ExportRequest { documentId: "xyz789", timestamp: T0, accessToken: "..." }
2. Document { id: "xyz789", name: "Invoice", mimeType: "application/pdf", exportLinks: null }
3. ExportFormat { mimeType: "application/pdf", extension: "pdf", url: null, isNative: true }
4. Stream file using files.get with alt=media
5. Response Headers: Content-Type: application/pdf, Content-Disposition: inline; filename="Invoice.pdf"
```
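
Step 5 in both successful flows derives the response headers from the Document name and the selected ExportFormat. A minimal sketch of that step follows; `buildResponseHeaders` is a hypothetical name, and `doc.name` is assumed to have been sanitized already. Skipping the extension when the name already ends with it is an added assumption, not something the flows above specify.

```javascript
// Hypothetical helper: builds the headers shown in step 5 of the flows above.
// Assumes doc.name is already sanitized for use inside a quoted filename.
function buildResponseHeaders(doc, format) {
  const suffix = `.${format.extension}`;
  // Assumption: avoid "Invoice.pdf.pdf" when the stored name has the extension.
  const filename = doc.name.toLowerCase().endsWith(suffix)
    ? doc.name
    : `${doc.name}${suffix}`;

  return {
    "Content-Type": format.mimeType,
    "Content-Disposition": `inline; filename="${filename}"`,
  };
}
```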

**Failed Export (Unsupported Type)**:
```
1. ExportRequest { documentId: "img456", timestamp: T0, accessToken: "..." }
2. Document { id: "img456", name: "Photo", mimeType: "image/jpeg", exportLinks: null }
3. ExportFormat { mimeType: null, extension: null, url: null, isNative: false }
4. Return HTTP 403 "mimetype not supported"
```

---

## Implementation Notes

### Statelessness
- No entities persisted to database or cache
- All data exists only for request duration
- Document metadata fetched fresh per request

### Memory Management
- Document metadata buffered in memory (typically <1KB)
- Content never buffered - streamed directly
- Maximum memory per request: ~10MB + metadata

### Concurrency
- Each request handled independently with isolated ExportRequest entity
- No shared state between requests
- Target: 50 concurrent requests without degradation