google-drive-content-adapter/specs/002-document-export/data-model.md

# Data Model: Document Export API Route

**Feature**: 002-document-export
**Date**: 2026-03-09
**Purpose**: Define data structures and entities for document export functionality

## Overview

This feature introduces three primary entities for handling document export requests: **Document**, **ExportRequest**, and **ExportFormat**. These entities represent the data flowing through the export pipeline from request initiation to response delivery.

---

## Entities

### 1. Document

Represents a file stored in Google Drive, accessed by unique ID.

**Attributes**:

| Field | Type | Required | Description | Validation |
|-------|------|----------|-------------|------------|
| `id` | string | Yes | Google Drive document identifier (extracted from URL parameter) | Non-empty string, alphanumeric with hyphens/underscores |
| `name` | string | Yes | Document name from Google Drive metadata | Non-empty string, used in Content-Disposition filename |
| `mimeType` | string | Yes | MIME type of the document | One of Google Workspace types or native file types |
| `exportLinks` | object | No | Map of available export formats to URLs | Key: MIME type (string), Value: Export URL (string) |

**Document Types**:

1. **Google Workspace Documents**:
   - Docs: `application/vnd.google-apps.document`
   - Sheets: `application/vnd.google-apps.spreadsheet`
   - Slides: `application/vnd.google-apps.presentation`
   - **Characteristic**: Have `exportLinks` field with conversion options

2. **Native Files**:
   - PDF: `application/pdf`
   - Images: `image/jpeg`, `image/png`, etc.
   - Other: Various MIME types
   - **Characteristic**: No `exportLinks` field, streamed directly

**State Transitions**:
- N/A (stateless - documents fetched per request)

**Example**:
```javascript
// Google Workspace Document (has exportLinks)
{
  id: "1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms",
  name: "Meeting Notes Q1 2026",
  mimeType: "application/vnd.google-apps.document",
  exportLinks: {
    "text/x-markdown": "https://docs.google.com/feeds/download/documents/export/Export?...",
    "text/html": "https://docs.google.com/feeds/download/documents/export/Export?...",
    "application/pdf": "https://docs.google.com/feeds/download/documents/export/Export?..."
  }
}

// Native PDF (no exportLinks)
{
  id: "1AbcDeFgHiJkLmNoPqRsTuVwXyZ1234567890",
  name: "Product Specs",
  mimeType: "application/pdf",
  exportLinks: null
}
```
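
The ID validation rule and the Workspace/native distinction above could be expressed roughly as follows. This is a minimal sketch: `isValidDocumentId`, `classifyDocument`, and the regular expression are hypothetical names and choices made for illustration, not part of the spec.

```javascript
// Hypothetical helpers for illustration; the spec does not define these names.

// Assumption: "alphanumeric with hyphens/underscores" maps to this pattern.
const DRIVE_ID_PATTERN = /^[A-Za-z0-9_-]+$/;

// True when the ID is a non-empty string of allowed characters.
function isValidDocumentId(id) {
  return typeof id === "string" && DRIVE_ID_PATTERN.test(id);
}

// Classifies a Document by the presence of a non-empty exportLinks map,
// which is the characteristic the spec uses to tell the two types apart.
function classifyDocument(doc) {
  const links = doc.exportLinks;
  const hasExportLinks =
    links !== null && links !== undefined && Object.keys(links).length > 0;
  return hasExportLinks ? "workspace" : "native";
}
```

Classifying by `exportLinks` rather than by MIME type prefix keeps the check aligned with the selection rules, which also key off `exportLinks`.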

---

### 2. ExportRequest

Represents a user's request to export a document via the `/documents/:documentId` route.

**Attributes**:

| Field | Type | Required | Description | Validation |
|-------|------|----------|-------------|------------|
| `documentId` | string | Yes | Document ID from URL path parameter | Non-empty string, alphanumeric with hyphens/underscores |
| `timestamp` | Date | Yes | Request initiation timestamp | Valid date (serialized as an ISO 8601 string), used for timeout calculation |
| `accessToken` | string | Yes | Google Drive API access token (from auth context) | Non-empty Google OAuth 2.0 access token, not expired |

**Lifecycle**:
1. **Initiated**: Request received on `/documents/:documentId`
2. **Authenticated**: Access token validated and available
3. **Metadata Fetched**: Google Drive API called for document metadata
4. **Format Selected**: Export format chosen based on availability
5. **Content Streamed**: Document content piped to response
6. **Completed**: Response sent to client

**Timeout Handling**:
- Maximum duration: 30 seconds from timestamp
- Enforced via axios timeout configuration
- Returns HTTP 504 if exceeded
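
One way to derive the timeout from the request `timestamp` is sketched below. The helper names `remainingBudgetMs` and `buildUpstreamOptions` are hypothetical; the returned object only uses the `timeout`, `responseType`, and `headers` options that axios accepts, and axios itself is not imported here.

```javascript
const MAX_REQUEST_MS = 30000; // 30-second limit from the spec

// Milliseconds left in the request budget, never below zero.
function remainingBudgetMs(timestamp, now = new Date()) {
  const elapsed = now.getTime() - new Date(timestamp).getTime();
  return Math.max(0, MAX_REQUEST_MS - elapsed);
}

// Builds request options for the upstream call, or returns null when the
// budget is already spent (the caller would then respond with HTTP 504).
// Returning null matters because axios treats `timeout: 0` as "no timeout".
function buildUpstreamOptions(request, now = new Date()) {
  const timeout = remainingBudgetMs(request.timestamp, now);
  if (timeout === 0) {
    return null;
  }
  return {
    timeout,
    responseType: "stream",
    headers: { Authorization: `Bearer ${request.accessToken}` },
  };
}
```

Computing the remaining budget per upstream call, rather than passing a fixed 30 seconds each time, keeps the metadata fetch and the content fetch inside a single overall limit. This is a design assumption, not something the spec mandates.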

**Example**:
```javascript
{
  documentId: "1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms",
  timestamp: "2026-03-09T18:00:00.000Z",
  accessToken: "ya29.a0AfH6SMBx..."  // Google OAuth2 access token
}
```

---

### 3. ExportFormat

Represents the selected output format for a document export.

**Attributes**:

| Field | Type | Required | Description | Validation |
|-------|------|----------|-------------|------------|
| `mimeType` | string | Yes | MIME type of the export format | One of: `text/x-markdown`, `text/html`, `application/pdf` |
| `extension` | string | Yes | File extension for Content-Disposition header | One of: `md`, `html`, `pdf` |
| `url` | string | Conditional | Export URL from Google Drive exportLinks | Required for Google Workspace docs, null for native files |
| `isNative` | boolean | Yes | Whether this is a native file (direct stream) or export | `true` for native PDFs, `false` for conversions |

**Format Priority**:
Priority order for selection when multiple formats are available:
1. `text/x-markdown` (.md) - Most portable for content processing
2. `text/html` (.html) - Rich formatting fallback
3. `application/pdf` (.pdf) - Universal viewing format

**Selection Rules**:
1. If `exportLinks` is present: Select the first available format from the priority list
2. If no `exportLinks` and `mimeType === 'application/pdf'`: Use native PDF streaming
3. Otherwise: Return HTTP 403 "mimetype not supported"
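
The priority list and selection rules above can be sketched as a single function. `selectExportFormat` and `FORMAT_PRIORITY` are hypothetical names, and returning `null` for the unsupported case is an assumption; the caller would translate that into the HTTP 403 response.

```javascript
// Priority order from the spec: Markdown, then HTML, then PDF.
const FORMAT_PRIORITY = [
  { mimeType: "text/x-markdown", extension: "md" },
  { mimeType: "text/html", extension: "html" },
  { mimeType: "application/pdf", extension: "pdf" },
];

// Returns an ExportFormat-shaped object, or null when nothing is supported.
function selectExportFormat(doc) {
  const links = doc.exportLinks;

  // Rule 1: exportLinks present, so take the first format in priority order.
  if (links && Object.keys(links).length > 0) {
    for (const format of FORMAT_PRIORITY) {
      const url = links[format.mimeType];
      if (typeof url === "string") {
        return { ...format, url, isNative: false };
      }
    }
    return null; // exportLinks exist but offer none of the three formats
  }

  // Rule 2: no exportLinks, but the file itself is a PDF.
  if (doc.mimeType === "application/pdf") {
    return { mimeType: "application/pdf", extension: "pdf", url: null, isNative: true };
  }

  // Rule 3: unsupported; caller responds with HTTP 403.
  return null;
}
```

For example, a document whose `exportLinks` contains only `text/html` and `application/pdf` entries would yield the HTML format, because Markdown is skipped as unavailable.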

**Example**:
```javascript
// Google Workspace Document export (Markdown selected)
{
  mimeType: "text/x-markdown",
  extension: "md",
  url: "https://docs.google.com/feeds/download/documents/export/Export?...",
  isNative: false
}

// Native PDF file (direct stream)
{
  mimeType: "application/pdf",
  extension: "pdf",
  url: null,  // Not used - file streamed directly
  isNative: true
}

// Unsupported file (image)
{
  mimeType: null,
  extension: null,
  url: null,
  isNative: false
}
// Returns HTTP 403
```

---

## Entity Relationships

```
ExportRequest
    |
    | 1:1 (fetches)
    v
Document
    |
    | 1:1 (determines)
    v
ExportFormat
```

**Flow**:
1. ExportRequest initiated with documentId
2. Document metadata fetched from Google Drive API
3. ExportFormat selected based on Document attributes (mimeType, exportLinks)
4. Content streamed using ExportFormat configuration

---

## Validation Rules

### Document Validation
- **ID Format**: Must be valid Google Drive file ID (alphanumeric, hyphens, underscores)
- **Name Sanitization**: Remove special characters for Content-Disposition filename
- **MIME Type**: Must be recognized Google Workspace or native file type
- **Export Links**: If present, must be object with string keys and URL string values
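
The name sanitization rule might look like the sketch below. The allow-list of characters, the `"document"` fallback, and the helper names `sanitizeFilename` and `contentDisposition` are all assumptions, since the spec does not say which characters count as special.

```javascript
// Assumption: keep ASCII letters, digits, spaces, dots, underscores, hyphens.
// Note that this also strips non-ASCII characters from document names.
function sanitizeFilename(name) {
  const cleaned = name
    .replace(/[^A-Za-z0-9 ._-]/g, "") // drop quotes, slashes, control chars, etc.
    .replace(/ +/g, " ")              // collapse runs of spaces
    .trim();
  // Hypothetical fallback so the header never carries an empty filename.
  return cleaned.length > 0 ? cleaned : "document";
}

// Builds the Content-Disposition value used in the data flow examples.
function contentDisposition(name, extension) {
  return `inline; filename="${sanitizeFilename(name)}.${extension}"`;
}
```

Stripping quotes and line breaks in particular prevents a document name from terminating the quoted filename early or injecting extra header lines.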

### Size & Timeout Constraints
- **Max Document Size**: 10MB (10,485,760 bytes)
  - Validated via `Content-Length` header before streaming
  - Returns HTTP 413 if exceeded
- **Max Request Duration**: 30 seconds
  - Enforced via axios timeout
  - Returns HTTP 504 if exceeded
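
A sketch of the `Content-Length` pre-check follows. `checkContentLength` is a hypothetical helper; it assumes header names arrive lowercased (as Node's HTTP layer provides them) and treats a missing or unparsable header as "size unknown" rather than as a failure, which is a choice the spec leaves open.

```javascript
const MAX_DOCUMENT_BYTES = 10 * 1024 * 1024; // 10,485,760 bytes

// Inspects upstream response headers before any content is streamed.
function checkContentLength(headers) {
  const size = Number.parseInt(headers["content-length"], 10);

  // Header missing or not a number: the size cannot be verified up front.
  if (Number.isNaN(size)) {
    return { allowed: true, size: null };
  }

  // Over the limit: the caller should respond with HTTP 413.
  if (size > MAX_DOCUMENT_BYTES) {
    return { allowed: false, size, status: 413 };
  }

  return { allowed: true, size };
}
```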

### Format Selection Validation
- **Priority Check**: Iterate through formats in order: Markdown → HTML → PDF
- **Availability Check**: Format must exist in exportLinks object
- **Fallback Check**: If no exportLinks, mimeType must be `application/pdf`
- **Rejection**: If none of above, return HTTP 403

---

## Error States

### Document Not Found
- **Condition**: Google Drive API returns 404 or document doesn't exist
- **Response**: HTTP 404 "Document not found"
- **Data State**: No Document entity created

### Unauthorized Access
- **Condition**: User lacks permission to access the document, or the access token is invalid or expired
- **Response**: HTTP 401 "Unauthorized"
- **Data State**: No Document entity created

### Unsupported Format
- **Condition**: No exportLinks, mimeType not application/pdf
- **Response**: HTTP 403 "mimetype not supported"
- **Data State**: Document entity exists, ExportFormat entity null

### Size Limit Exceeded
- **Condition**: Content-Length > 10MB
- **Response**: HTTP 413 "Payload Too Large"
- **Data State**: Document entity exists, ExportFormat selected, streaming aborted

### Timeout Exceeded
- **Condition**: Request duration > 30 seconds
- **Response**: HTTP 504 "Gateway Timeout"
- **Data State**: Partial processing, request abandoned

### Google Drive API Error
- **Condition**: API unavailable, rate limit exceeded
- **Response**: HTTP 502 "Bad Gateway - Google Drive API unavailable"
- **Data State**: Variable depending on failure point
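
The error states above can be collected into one lookup, with a small mapper for upstream failures. The input shape `{ status, timedOut }` and the names `ERROR_RESPONSES` and `mapUpstreamFailure` are hypothetical. Mapping an upstream 403 to "Unauthorized" is an assumption based on the "lacks permission" condition; telling apart a 403 caused by rate limiting would require inspecting the error body, which this sketch does not attempt.

```javascript
// Status codes and messages as listed in the Error States section.
const ERROR_RESPONSES = {
  notFound: { status: 404, message: "Document not found" },
  unauthorized: { status: 401, message: "Unauthorized" },
  unsupported: { status: 403, message: "mimetype not supported" },
  tooLarge: { status: 413, message: "Payload Too Large" },
  timeout: { status: 504, message: "Gateway Timeout" },
  upstream: { status: 502, message: "Bad Gateway - Google Drive API unavailable" },
};

// Translates a failed Google Drive call into the route's response.
// `status` is the upstream HTTP status, if any; `timedOut` flags a timeout.
function mapUpstreamFailure({ status, timedOut = false } = {}) {
  if (timedOut) return ERROR_RESPONSES.timeout;
  if (status === 404) return ERROR_RESPONSES.notFound;
  if (status === 401 || status === 403) return ERROR_RESPONSES.unauthorized;
  // Anything else (5xx, 429, network failure) is reported as a bad gateway.
  return ERROR_RESPONSES.upstream;
}
```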

---

## Data Flow Example

**Successful Export (Google Workspace Document)**:
```
1. ExportRequest { documentId: "abc123", timestamp: T0, accessToken: "..." }
2. Document { id: "abc123", name: "Report", mimeType: "application/vnd.google-apps.document", exportLinks: {...} }
3. ExportFormat { mimeType: "text/x-markdown", extension: "md", url: "https://...", isNative: false }
4. Stream content from url to client
5. Response Headers: Content-Type: text/x-markdown, Content-Disposition: inline; filename="Report.md"
```

**Successful Export (Native PDF)**:
```
1. ExportRequest { documentId: "xyz789", timestamp: T0, accessToken: "..." }
2. Document { id: "xyz789", name: "Invoice", mimeType: "application/pdf", exportLinks: null }
3. ExportFormat { mimeType: "application/pdf", extension: "pdf", url: null, isNative: true }
4. Stream file using files.get with alt=media
5. Response Headers: Content-Type: application/pdf, Content-Disposition: inline; filename="Invoice.pdf"
```

**Failed Export (Unsupported Type)**:
```
1. ExportRequest { documentId: "img456", timestamp: T0, accessToken: "..." }
2. Document { id: "img456", name: "Photo", mimeType: "image/jpeg", exportLinks: null }
3. ExportFormat { mimeType: null, extension: null, url: null, isNative: false }
4. Return HTTP 403 "mimetype not supported"
```

---

## Implementation Notes

### Statelessness
- No entities persisted to database or cache
- All data exists only for request duration
- Document metadata fetched fresh per request

### Memory Management
- Document metadata buffered in memory (typically <1KB)
- Content never buffered - streamed directly
- Maximum content transferred per request: ~10MB; memory held per request is limited to in-flight stream chunks plus metadata
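
Because content is streamed rather than buffered, the size cap can also be tracked with a running byte count instead of holding the document in memory. The sketch below is one hypothetical way to do that: `createByteLimiter` is not named in the spec, and calling it once per chunk from the stream's data handler is an assumption about how the route is wired.

```javascript
const MAX_DOCUMENT_BYTES = 10 * 1024 * 1024; // same 10MB cap as the size constraint

// Returns a function to call with each chunk as it passes through.
// Only the running total is retained; chunks themselves are not stored.
function createByteLimiter(maxBytes = MAX_DOCUMENT_BYTES) {
  let total = 0;
  return function accept(chunk) {
    total += chunk.length;
    if (total > maxBytes) {
      const error = new Error("Payload Too Large");
      error.status = 413; // hypothetical property for the route's error handler
      throw error;
    }
    return total;
  };
}
```

A counter like this would also cover responses that arrive without a `Content-Length` header, where the size is not known before streaming begins.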

### Concurrency
- Each request handled independently with isolated ExportRequest entity
- No shared state between requests
- Target: 50 concurrent requests without degradation