
Data Model: Document Export API Route

Feature: 002-document-export
Date: 2026-03-09
Purpose: Define data structures and entities for document export functionality

Overview

This feature introduces three primary entities for handling document export requests: Document, ExportRequest, and ExportFormat. These entities represent the data flowing through the export pipeline from request initiation to response delivery.


Entities

1. Document

Represents a file stored in Google Drive, accessed by unique ID.

Attributes:

| Field | Type | Required | Description | Validation |
|---|---|---|---|---|
| id | string | Yes | Google Drive document identifier (extracted from URL parameter) | Non-empty string, alphanumeric with hyphens/underscores |
| name | string | Yes | Document name from Google Drive metadata | Non-empty string, used in Content-Disposition filename |
| mimeType | string | Yes | MIME type of the document | One of the Google Workspace types or a native file type |
| exportLinks | object | No | Map of available export formats to URLs | Key: MIME type (string), Value: export URL (string) |

Document Types:

  1. Google Workspace Documents:

    • Docs: application/vnd.google-apps.document
    • Sheets: application/vnd.google-apps.spreadsheet
    • Slides: application/vnd.google-apps.presentation
    • Characteristic: Have exportLinks field with conversion options
  2. Native Files:

    • PDF: application/pdf
    • Images: image/jpeg, image/png, etc.
    • Other: Various MIME types
    • Characteristic: No exportLinks field, streamed directly
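The two document categories can be told apart from metadata alone. A minimal TypeScript sketch, assuming the metadata has already been fetched into the shape of the attribute table; `DriveDocument`, `isWorkspaceDocument`, and `hasExportLinks` are hypothetical names (the interface avoids `Document` so it does not clash with the DOM global type). It relies on Google Workspace MIME types sharing the `application/vnd.google-apps.` prefix.

```typescript
// Hypothetical shape of the Document entity (see the attribute table).
interface DriveDocument {
  id: string;
  name: string;
  mimeType: string;
  // Absent or null for native files such as PDFs.
  exportLinks?: Record<string, string> | null;
}

// All Google Workspace MIME types share this prefix.
const WORKSPACE_MIME_PREFIX = "application/vnd.google-apps.";

function isWorkspaceDocument(doc: DriveDocument): boolean {
  return doc.mimeType.startsWith(WORKSPACE_MIME_PREFIX);
}

// True when Drive offered at least one conversion target.
function hasExportLinks(doc: DriveDocument): boolean {
  return doc.exportLinks != null && Object.keys(doc.exportLinks).length > 0;
}
```

The prefix check alone also matches Workspace types that offer no conversions (folders, for example), which is why the selection rules key on exportLinks rather than on the prefix.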

State Transitions:

  • N/A (stateless - documents fetched per request)

Example:

// Google Workspace Document (has exportLinks)
{
  id: "1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms",
  name: "Meeting Notes Q1 2026",
  mimeType: "application/vnd.google-apps.document",
  exportLinks: {
    "text/x-markdown": "https://docs.google.com/feeds/download/documents/export/Export?...",
    "text/html": "https://docs.google.com/feeds/download/documents/export/Export?...",
    "application/pdf": "https://docs.google.com/feeds/download/documents/export/Export?..."
  }
}

// Native PDF (no exportLinks)
{
  id: "1AbcDeFgHiJkLmNoPqRsTuVwXyZ1234567890",
  name: "Product Specs",
  mimeType: "application/pdf",
  exportLinks: null
}

2. ExportRequest

Represents a user's request to export a document via the /documents/:documentId route.

Attributes:

| Field | Type | Required | Description | Validation |
|---|---|---|---|---|
| documentId | string | Yes | Document ID from URL path parameter | Non-empty string, alphanumeric with hyphens/underscores |
| timestamp | Date | Yes | Request initiation timestamp | Serialized as ISO 8601; start of the timeout window |
| accessToken | string | Yes | Google OAuth2 access token for the Drive API (from auth context) | Non-empty and not expired (Google access tokens are opaque, not JWTs) |

Lifecycle:

  1. Initiated: Request received on /documents/:documentId
  2. Authenticated: Access token validated and available
  3. Metadata Fetched: Google Drive API called for document metadata
  4. Format Selected: Export format chosen based on availability
  5. Content Streamed: Document content piped to response
  6. Completed: Response sent to client

Timeout Handling:

  • Maximum duration: 30 seconds from timestamp
  • Enforced via axios timeout configuration
  • Returns HTTP 504 if exceeded
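Because the 30-second limit is measured from timestamp while an axios `timeout` applies to a single HTTP call, the metadata fetch and the content stream would each get a full 30 seconds if both used a fixed value. One way to keep the total within budget, sketched under that assumption, is to derive each call's timeout from the time remaining; `remainingBudgetMs`, `MAX_REQUEST_MS`, and `DeadlineExceededError` are hypothetical names.

```typescript
// Request-level budget from the Timeout Handling rules.
const MAX_REQUEST_MS = 30_000;

// Shape of the ExportRequest entity (see the attribute table above).
interface ExportRequest {
  documentId: string;
  timestamp: Date;
  accessToken: string;
}

class DeadlineExceededError extends Error {}

// Milliseconds left before the deadline. Throws once it has passed, so the
// caller can respond with HTTP 504 instead of starting another call. It never
// returns 0, because axios treats `timeout: 0` as "no timeout".
function remainingBudgetMs(req: ExportRequest, now: Date = new Date()): number {
  const elapsed = now.getTime() - req.timestamp.getTime();
  const remaining = MAX_REQUEST_MS - elapsed;
  if (remaining <= 0) {
    throw new DeadlineExceededError(`deadline passed ${-remaining} ms ago`);
  }
  return remaining;
}
```

A caller might pass `{ responseType: "stream", timeout: remainingBudgetMs(req) }` to `axios.get` and map `DeadlineExceededError` (or an axios timeout error) to HTTP 504. Whether axios's `timeout` also bounds the time spent reading a streamed body can vary by version and adapter, so an `AbortController` passed via the `signal` option may be needed to enforce the deadline end to end.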

Example:

{
  documentId: "1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms",
  timestamp: "2026-03-09T18:00:00.000Z",
  accessToken: "ya29.a0AfH6SMBx..."  // Google OAuth2 access token
}

3. ExportFormat

Represents the selected output format for a document export.

Attributes:

| Field | Type | Required | Description | Validation |
|---|---|---|---|---|
| mimeType | string | Yes | MIME type of the export format | One of: text/x-markdown, text/html, application/pdf |
| extension | string | Yes | File extension for the Content-Disposition header | One of: md, html, pdf |
| url | string | Conditional | Export URL from Google Drive exportLinks | Required for Google Workspace docs; null for native files |
| isNative | boolean | Yes | Whether the file is streamed directly (native) or converted via export | true for native PDFs, false for conversions |

Format Priority: When multiple formats are available, select in this order:

  1. text/x-markdown (.md) - Most portable for content processing
  2. text/html (.html) - Rich formatting fallback
  3. application/pdf (.pdf) - Universal viewing format

Selection Rules:

  1. If exportLinks exist: Select first available format from priority list
  2. If no exportLinks and mimeType === 'application/pdf': Use native PDF streaming
  3. Otherwise: Return HTTP 403 "mimetype not supported"
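The selection rules above translate into a small pure function. This sketch assumes the ExportFormat shape from the attribute table and returns `null` when no rule matches (the route would then respond with HTTP 403); `selectExportFormat` and `FORMAT_PRIORITY` are illustrative names, not an existing API.

```typescript
// ExportFormat entity, as in the attribute table above.
interface ExportFormat {
  mimeType: string;
  extension: string;
  url: string | null;
  isNative: boolean;
}

// Only the Document fields the selection rules read.
interface DocumentMeta {
  mimeType: string;
  exportLinks?: Record<string, string> | null;
}

// Format Priority, highest first.
const FORMAT_PRIORITY: ReadonlyArray<{ mimeType: string; extension: string }> = [
  { mimeType: "text/x-markdown", extension: "md" },
  { mimeType: "text/html", extension: "html" },
  { mimeType: "application/pdf", extension: "pdf" },
];

// Applies Selection Rules 1-3; null means "respond with HTTP 403".
function selectExportFormat(doc: DocumentMeta): ExportFormat | null {
  const links = doc.exportLinks;
  // Rule 1: conversions are available, so take the highest-priority one offered.
  if (links && Object.keys(links).length > 0) {
    for (const format of FORMAT_PRIORITY) {
      const url = links[format.mimeType];
      if (url) {
        return { ...format, url, isNative: false };
      }
    }
    return null; // exportLinks present but no preferred format: rule 3
  }
  // Rule 2: native PDF, streamed directly (no export URL).
  if (doc.mimeType === "application/pdf") {
    return { mimeType: "application/pdf", extension: "pdf", url: null, isNative: true };
  }
  // Rule 3: unsupported type.
  return null;
}
```

Keeping the function free of I/O makes the priority logic testable without calling Google Drive. If exportLinks is present but offers none of the three preferred formats, the sketch falls through to rule 3, which the rules imply but do not state explicitly.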

Example:

// Google Workspace Document export (Markdown selected)
{
  mimeType: "text/x-markdown",
  extension: "md",
  url: "https://docs.google.com/feeds/download/documents/export/Export?...",
  isNative: false
}

// Native PDF file (direct stream)
{
  mimeType: "application/pdf",
  extension: "pdf",
  url: null,  // Not used - file streamed directly
  isNative: true
}

// Unsupported file (image): no selection rule matches, so no ExportFormat is produced
null
// Returns HTTP 403

Entity Relationships

ExportRequest
    |
    | 1:1 (fetches)
    v
Document
    |
    | 1:1 (determines)
    v
ExportFormat

Flow:

  1. ExportRequest initiated with documentId
  2. Document metadata fetched from Google Drive API
  3. ExportFormat selected based on Document attributes (mimeType, exportLinks)
  4. Content streamed using ExportFormat configuration

Validation Rules

Document Validation

  • ID Format: Must be valid Google Drive file ID (alphanumeric, hyphens, underscores)
  • Name Sanitization: Remove characters that would break the quoted Content-Disposition filename (e.g. double quotes, backslashes, control characters)
  • MIME Type: Must be recognized Google Workspace or native file type
  • Export Links: If present, must be object with string keys and URL string values
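The document validation rules can be expressed as three small checks. A sketch under stated assumptions: the ID pattern follows the "alphanumeric, hyphens, underscores" rule literally, "special characters" is read as control characters, double quotes, backslashes, and slashes, and export URLs are only checked for an http(s) scheme. All function names are hypothetical.

```typescript
// ID Format rule: letters, digits, hyphens and underscores only (non-empty).
const DRIVE_ID_PATTERN = /^[A-Za-z0-9_-]+$/;

function isValidDocumentId(id: string): boolean {
  return DRIVE_ID_PATTERN.test(id);
}

// Export Links rule: a plain object whose values are all http(s) URL strings.
function isValidExportLinks(value: unknown): value is Record<string, string> {
  if (value === null || typeof value !== "object" || Array.isArray(value)) {
    return false;
  }
  return Object.values(value).every(
    (v) => typeof v === "string" && /^https?:\/\//.test(v),
  );
}

// Name Sanitization rule (one possible reading): drop control characters,
// double quotes, backslashes and slashes, then fall back to a default name
// if nothing is left.
function sanitizeFilename(name: string, fallback = "document"): string {
  const cleaned = name.replace(/[\u0000-\u001f\u007f"\\\/]/g, "").trim();
  return cleaned.length > 0 ? cleaned : fallback;
}
```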

Size & Timeout Constraints

  • Max Document Size: 10 MiB (10,485,760 bytes)
    • Checked against the Content-Length header, when present, before streaming
    • Returns HTTP 413 if exceeded
  • Max Request Duration: 30 seconds
    • Enforced via axios timeout
    • Returns HTTP 504 if exceeded
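The size limit can be enforced in two places: a pre-check on the declared Content-Length, and a byte counter on the stream itself in case the header is missing or wrong (chunked HTTP/1.1 responses carry no Content-Length). A sketch using Node's `stream.Transform`; the constant `MAX_DOCUMENT_BYTES` and the names `declaredLengthWithinLimit`, `createSizeLimiter`, and `PayloadTooLargeError` are assumptions for illustration.

```typescript
import { Transform, TransformCallback } from "stream";

// 10 MiB, the Max Document Size above.
const MAX_DOCUMENT_BYTES = 10 * 1024 * 1024; // 10,485,760

class PayloadTooLargeError extends Error {}

// Pre-check on the declared size. Returns false only when a numeric
// Content-Length exceeds the limit; a missing or malformed header defers
// to the byte counter below.
function declaredLengthWithinLimit(contentLength: string | undefined): boolean {
  if (contentLength === undefined) return true;
  const declared = Number(contentLength);
  return !Number.isFinite(declared) || declared <= MAX_DOCUMENT_BYTES;
}

// Pass-through stream that errors once more than `limit` bytes have flowed.
function createSizeLimiter(limit: number = MAX_DOCUMENT_BYTES): Transform {
  let seen = 0;
  return new Transform({
    transform(chunk: Buffer, _encoding: BufferEncoding, callback: TransformCallback) {
      seen += chunk.length;
      if (seen > limit) {
        callback(new PayloadTooLargeError(`document exceeds ${limit} bytes`));
        return;
      }
      callback(null, chunk);
    },
  });
}
```

The limiter would sit between the Drive response stream and the client, for example `pipeline(driveStream, createSizeLimiter(), res, onDone)`. Once bytes have been forwarded, the status line has already been sent, so a mid-stream overflow can only abort the connection; an HTTP 413 status is only possible when the pre-check rejects the request up front.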

Format Selection Validation

  • Priority Check: Iterate through formats in order: Markdown → HTML → PDF
  • Availability Check: Format must exist in exportLinks object
  • Fallback Check: If no exportLinks, mimeType must be application/pdf
  • Rejection: If none of above, return HTTP 403

Error States

Document Not Found

  • Condition: Google Drive API returns 404 or document doesn't exist
  • Response: HTTP 404 "Document not found"
  • Data State: No Document entity created

Unauthorized Access

  • Condition: User lacks permission, or the access token is invalid or expired
  • Response: HTTP 401 "Unauthorized"
  • Data State: No Document entity created

Unsupported Format

  • Condition: No exportLinks, mimeType not application/pdf
  • Response: HTTP 403 "mimetype not supported"
  • Data State: Document entity exists, ExportFormat entity null

Size Limit Exceeded

  • Condition: Content-Length > 10 MiB (10,485,760 bytes)
  • Response: HTTP 413 "Payload Too Large"
  • Data State: Document entity exists, ExportFormat selected, streaming aborted

Timeout Exceeded

  • Condition: Request duration > 30 seconds
  • Response: HTTP 504 "Gateway Timeout"
  • Data State: Partial processing, request abandoned

Google Drive API Error

  • Condition: API unavailable or rate limit exceeded
  • Response: HTTP 502 "Bad Gateway - Google Drive API unavailable"
  • Data State: Variable depending on failure point
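The error states above amount to a mapping from failure conditions to HTTP responses. The sketch models failures as a hypothetical discriminated union `ExportFailure`; the Drive statuses it handles (404, 401/403, 429) and the `reason` strings ending in `RateLimitExceeded` (such as `userRateLimitExceeded`) follow how the Drive API commonly reports these errors, but the exact classification is an assumption to check against the Drive API error documentation.

```typescript
// Hypothetical failure kinds, one per error state above.
type ExportFailure =
  | { kind: "drive_status"; status: number; reason?: string } // non-2xx from Drive
  | { kind: "unsupported_mime" }
  | { kind: "too_large" }
  | { kind: "timeout" }
  | { kind: "network" }; // Drive API unreachable

interface HttpError {
  status: number;
  message: string;
}

const BAD_GATEWAY: HttpError = {
  status: 502,
  message: "Bad Gateway - Google Drive API unavailable",
};

function toHttpError(failure: ExportFailure): HttpError {
  switch (failure.kind) {
    case "unsupported_mime":
      return { status: 403, message: "mimetype not supported" };
    case "too_large":
      return { status: 413, message: "Payload Too Large" };
    case "timeout":
      return { status: 504, message: "Gateway Timeout" };
    case "network":
      return BAD_GATEWAY;
    case "drive_status": {
      if (failure.status === 404) {
        return { status: 404, message: "Document not found" };
      }
      // Quota errors can arrive as 429, or as 403 with a *RateLimitExceeded reason.
      const rateLimited =
        failure.status === 429 || /ratelimitexceeded$/i.test(failure.reason ?? "");
      if (rateLimited) return BAD_GATEWAY;
      if (failure.status === 401 || failure.status === 403) {
        return { status: 401, message: "Unauthorized" };
      }
      return BAD_GATEWAY;
    }
  }
}
```

Checking the rate-limit reason before the generic 401/403 branch matters because a quota error reported with status 403 would otherwise reach the client as an authorization failure.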

Data Flow Example

Successful Export (Google Workspace Document):

1. ExportRequest { documentId: "abc123", timestamp: T0, accessToken: "..." }
2. Document { id: "abc123", name: "Report", mimeType: "application/vnd.google-apps.document", exportLinks: {...} }
3. ExportFormat { mimeType: "text/x-markdown", extension: "md", url: "https://...", isNative: false }
4. Stream content from url to client
5. Response Headers: Content-Type: text/x-markdown, Content-Disposition: inline; filename="Report.md"

Successful Export (Native PDF):

1. ExportRequest { documentId: "xyz789", timestamp: T0, accessToken: "..." }
2. Document { id: "xyz789", name: "Invoice", mimeType: "application/pdf", exportLinks: null }
3. ExportFormat { mimeType: "application/pdf", extension: "pdf", url: null, isNative: true }
4. Stream file using files.get with alt=media
5. Response Headers: Content-Type: application/pdf, Content-Disposition: inline; filename="Invoice.pdf"

Failed Export (Unsupported Type):

1. ExportRequest { documentId: "img456", timestamp: T0, accessToken: "..." }
2. Document { id: "img456", name: "Photo", mimeType: "image/jpeg", exportLinks: null }
3. ExportFormat: null (no selection rule matches)
4. Return HTTP 403 "mimetype not supported"
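The response headers in the successful examples can be built from the Document name and the selected ExportFormat. This sketch assumes the name may contain non-ASCII characters, in which case it adds an RFC 5987 `filename*` parameter (as RFC 6266 permits) alongside an ASCII fallback `filename`; `buildExportHeaders`, `encodeRfc5987`, and `SelectedFormat` are hypothetical.

```typescript
// Hypothetical subset of ExportFormat needed for the headers.
interface SelectedFormat {
  mimeType: string;
  extension: string;
}

// RFC 5987 value encoding: encodeURIComponent leaves ' ( ) * unescaped,
// but they are not valid attr-chars, so escape them as well.
function encodeRfc5987(value: string): string {
  return encodeURIComponent(value).replace(
    /['()*]/g,
    (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase(),
  );
}

function buildExportHeaders(documentName: string, format: SelectedFormat): Record<string, string> {
  const filename = `${documentName}.${format.extension}`;
  // ASCII-only fallback: replace non-printable/non-ASCII characters and drop
  // quotes and backslashes so the quoted-string stays well-formed.
  const asciiFallback = filename.replace(/[^\x20-\x7e]/g, "_").replace(/["\\]/g, "");
  let disposition = `inline; filename="${asciiFallback}"`;
  if (asciiFallback !== filename) {
    disposition += `; filename*=UTF-8''${encodeRfc5987(filename)}`;
  }
  return { "Content-Type": format.mimeType, "Content-Disposition": disposition };
}
```

For the "Report" example this yields `Content-Disposition: inline; filename="Report.md"`, matching the headers listed above; the extra parameter only appears when the name cannot be sent as plain ASCII.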

Implementation Notes

Statelessness

  • No entities persisted to database or cache
  • All data exists only for request duration
  • Document metadata fetched fresh per request

Memory Management

  • Document metadata buffered in memory (typically <1KB)
  • Content never buffered - streamed directly
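  • Upper bound per request: ~10 MiB + metadata (the size limit); with backpressure-aware streaming, actual content memory stays near the stream buffer sizes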

Concurrency

  • Each request handled independently with isolated ExportRequest entity
  • No shared state between requests
  • Target: 50 concurrent requests without degradation