Technical Research: Document Export API Route

Feature: 002-document-export
Date: 2026-03-09
Purpose: Research technical patterns and best practices for implementing Google Drive document export functionality

Research Areas

1. Google Drive Files.get API - Metadata Retrieval

Decision: Use Google Drive API v3 files.get endpoint with specific field selection

Rationale:

  • Google Drive API v3 provides files.get endpoint: GET https://www.googleapis.com/drive/v3/files/{fileId}
  • Field selection via fields query parameter reduces response size and improves performance
  • Required fields: id,name,mimeType,exportLinks
  • exportLinks returns map of available export formats for Google Workspace documents
  • Native files (PDFs, images) don't have exportLinks field

Implementation Pattern:

// In proxy.js - Google Drive API call
const metadataUrl = `https://www.googleapis.com/drive/v3/files/${documentId}`;
const params = {
  fields: 'id,name,mimeType,exportLinks',
  supportsAllDrives: true  // Support shared drives
};
const response = await axios.get(metadataUrl, {
  params,
  headers: { Authorization: `Bearer ${accessToken}` }
});

Alternatives Considered:

  • files.export endpoint directly → Rejected: Requires knowing export format upfront, can't query available formats
  • files.list with query → Rejected: Less efficient, requires additional parsing
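
The metadata request can be factored into a small builder so that the URL and query parameters are checkable without a network call. This is a minimal sketch: `buildMetadataRequest` is a hypothetical helper name, and URL-encoding the document ID is an added precaution that the spec does not mention.

```javascript
const DRIVE_FILES_URL = 'https://www.googleapis.com/drive/v3/files';

// Hypothetical helper: builds the pieces of the files.get metadata request.
function buildMetadataRequest(documentId, accessToken) {
  return {
    // Encode the ID so characters such as "/" or "?" cannot alter the path.
    url: `${DRIVE_FILES_URL}/${encodeURIComponent(documentId)}`,
    params: {
      fields: 'id,name,mimeType,exportLinks',
      supportsAllDrives: true
    },
    headers: { Authorization: `Bearer ${accessToken}` }
  };
}

const request = buildMetadataRequest('abc/123', 'token');
console.log(request.url);
```

The result could then be passed on as `axios.get(request.url, { params: request.params, headers: request.headers })`.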

2. Export Format Selection Strategy

Decision: Priority-based format selection (Markdown > HTML > PDF) with fallback to native file streaming

Rationale:

  • Google Workspace documents (Docs, Sheets, Slides) provide exportLinks map: {"text/plain": "url", "text/html": "url", ...}
  • Markdown (text/x-markdown) is most portable for downstream content processing
  • HTML fallback provides rich formatting when Markdown unavailable
  • PDF is the last-resort fallback: Docs, Sheets, and Slides all offer a PDF export
  • Native PDFs streamed directly using files.get with alt=media parameter

Implementation Pattern:

// Format priority order
const EXPORT_FORMATS = [
  { mimeType: 'text/x-markdown', extension: 'md' },
  { mimeType: 'text/html', extension: 'html' },
  { mimeType: 'application/pdf', extension: 'pdf' }
];

// Selection logic
function selectExportFormat(exportLinks) {
  for (const format of EXPORT_FORMATS) {
    if (exportLinks && exportLinks[format.mimeType]) {
      return {
        url: exportLinks[format.mimeType],
        contentType: format.mimeType,
        extension: format.extension
      };
    }
  }
  return null;  // No export links available
}

Alternatives Considered:

  • User-specified format via query parameter → Rejected: Out of scope per spec, adds complexity
  • Always export as PDF → Rejected: Markdown preferred for content processing
  • Try all formats in parallel → Rejected: Unnecessary, increases API calls

3. Native PDF File Streaming

Decision: Use Google Drive API files.get with alt=media parameter for direct file content download

Rationale:

  • Native PDF files (mimeType: application/pdf) don't have exportLinks
  • files.get with alt=media returns raw file bytes as response body
  • Response is streamed directly to client (no buffering in proxy)
  • Efficient for large files up to 10MB limit

Implementation Pattern:

// For native PDFs (no exportLinks)
if (metadata.mimeType === 'application/pdf' && !metadata.exportLinks) {
  const fileUrl = `https://www.googleapis.com/drive/v3/files/${documentId}`;
  const response = await axios.get(fileUrl, {
    params: { alt: 'media', supportsAllDrives: true },  // Match the metadata call
    headers: { Authorization: `Bearer ${accessToken}` },
    responseType: 'stream'  // Stream response
  });

  // Drop an existing .pdf suffix so the filename does not end in ".pdf.pdf",
  // then sanitize the name before it reaches the header (see section 4)
  const baseName = metadata.name.replace(/\.pdf$/i, '');
  res.setHeader('Content-Type', 'application/pdf');
  res.setHeader('Content-Disposition', generateContentDisposition(baseName, 'pdf'));
  response.data.pipe(res);  // Pipe stream to client
}

Alternatives Considered:

  • Buffer entire file in memory → Rejected: Inefficient for large files, increases memory usage
  • Download and re-upload → Rejected: Unnecessary overhead, adds latency
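
The choice between exporting (section 2) and streaming a native PDF (this section) can be made from the metadata alone. The sketch below shows one way to express that decision as a pure function. `planDelivery` and the `kind` values are hypothetical names, and the 403 for unsupported types follows the mapping listed in section 5.

```javascript
// Priority order from section 2.
const EXPORT_FORMATS = [
  { mimeType: 'text/x-markdown', extension: 'md' },
  { mimeType: 'text/html', extension: 'html' },
  { mimeType: 'application/pdf', extension: 'pdf' }
];

// Hypothetical helper: decides how a file should be served from its metadata.
function planDelivery(metadata) {
  const links = metadata.exportLinks || {};
  const format = EXPORT_FORMATS.find((f) => links[f.mimeType]);
  if (format) {
    return {
      kind: 'export',
      url: links[format.mimeType],
      contentType: format.mimeType,
      extension: format.extension
    };
  }
  if (metadata.mimeType === 'application/pdf') {
    // No exportLinks: fetch the raw bytes with alt=media instead.
    return { kind: 'native', contentType: 'application/pdf', extension: 'pdf' };
  }
  // Neither exportable nor a native PDF.
  return { kind: 'unsupported', status: 403 };
}

// Placeholder URL; real values come from the files.get response.
const docPlan = planDelivery({
  mimeType: 'application/vnd.google-apps.document',
  exportLinks: { 'application/pdf': 'https://example.invalid/export?format=pdf' }
});
const pdfPlan = planDelivery({ mimeType: 'application/pdf' });
const imagePlan = planDelivery({ mimeType: 'image/png' });
console.log(docPlan.kind, pdfPlan.kind, imagePlan.kind);
```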

4. Content-Disposition Header Format

Decision: Use inline; filename="[name].[ext]" format for Content-Disposition header

Rationale:

  • inline disposition allows browser to display content (PDFs, HTML) in-browser
  • Filename parameter provides sensible default if user saves file
  • RFC 6266 compliant format
  • Filename includes extension matching export format (.md, .html, .pdf)

Implementation Pattern:

// Generate Content-Disposition header
function generateContentDisposition(filename, extension) {
  // Sanitize filename: remove special characters, limit length
  const sanitized = filename
    .replace(/[^a-zA-Z0-9_. -]/g, '_')  // Replace anything other than letters, digits, "_", ".", space, "-"
    .substring(0, 255);  // Limit length
  
  return `inline; filename="${sanitized}.${extension}"`;
}

// Usage
res.setHeader('Content-Disposition', generateContentDisposition(metadata.name, 'md'));
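
To make the effect of the sanitization concrete, the sketch below runs the same function on two awkward document names. Both names are invented for illustration.

```javascript
// Mirrors generateContentDisposition above; the hyphen sits last in the
// character class so it reads as a literal.
function generateContentDisposition(filename, extension) {
  const sanitized = filename
    .replace(/[^a-zA-Z0-9_. -]/g, '_')
    .substring(0, 255);
  return `inline; filename="${sanitized}.${extension}"`;
}

// Quotes, colons, and parentheses are replaced, so the quoted string stays valid.
const header = generateContentDisposition('Q1 Report: "Final" (v2)', 'md');
console.log(header);

// CR and LF are replaced too, so a document name cannot inject extra headers.
const hostile = generateContentDisposition('a\r\nX-Evil: 1', 'pdf');
console.log(JSON.stringify(hostile));
```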

Alternatives Considered:

  • attachment disposition → Rejected: Forces download, prevents in-browser viewing
  • No filename parameter → Rejected: Browser uses document ID as filename (poor UX)
  • RFC 8187 filename* encoding for Unicode (the HTTP profile of RFC 2231) → Deferred: Simple ASCII sanitization sufficient for MVP
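
For reference, the deferred Unicode option corresponds to the `filename*` parameter that RFC 6266 defines, encoded per RFC 8187. The sketch below shows roughly what it would involve if it were picked up later. `contentDispositionWithUnicode` is a hypothetical name, and nothing here is part of the MVP decision.

```javascript
// Hypothetical helper: emits an ASCII fallback plus an RFC 8187 filename*.
function contentDispositionWithUnicode(filename, extension) {
  const full = `${filename}.${extension}`;

  // ASCII fallback for clients that ignore filename*.
  const fallback = full.replace(/[^a-zA-Z0-9_. -]/g, '_');

  // encodeURIComponent leaves ' ( ) * unescaped, but RFC 8187 does not
  // allow them unencoded in the value, so percent-encode them as well.
  const encoded = encodeURIComponent(full).replace(
    /['()*]/g,
    (c) => `%${c.charCodeAt(0).toString(16).toUpperCase()}`
  );

  return `inline; filename="${fallback}"; filename*=UTF-8''${encoded}`;
}

const unicodeHeader = contentDispositionWithUnicode('R\u00e9sum\u00e9 (final)', 'pdf');
console.log(unicodeHeader);
```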

5. Error Handling & HTTP Status Codes

Decision: Map Google Drive API errors to appropriate HTTP status codes with descriptive messages

Rationale:

  • Google Drive API returns structured error responses with reason codes
  • Map to standard HTTP status codes for consistent client experience
  • Plain text error messages for simplicity (no JSON wrapper needed)

Implementation Pattern:

// Error mapping
const ERROR_MAP = {
  'notFound': { status: 404, message: 'Document not found' },
  'authError': { status: 401, message: 'Unauthorized' },
  'forbidden': { status: 401, message: 'Unauthorized' },
  'insufficientPermissions': { status: 401, message: 'Unauthorized' },
  'rateLimitExceeded': { status: 502, message: 'Bad Gateway - Google Drive API unavailable' },
  'backendError': { status: 502, message: 'Bad Gateway - Google Drive API unavailable' }
};

// Error handler
function handleDriveError(error) {
  const reason = error.response?.data?.error?.errors?.[0]?.reason;
  const mapped = ERROR_MAP[reason] || { status: 500, message: 'Export failed - unable to retrieve document content' };
  
  return {
    status: mapped.status,
    message: mapped.message
  };
}
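
The handler depends on the shape of the error object that axios produces. The sketch below feeds it one Drive-style error body and one network failure to show both paths. The error map is a shortened copy of the one above, and the sample payload values are illustrative.

```javascript
// Shortened copy of the mapping above, limited to two entries.
const ERROR_MAP = {
  notFound: { status: 404, message: 'Document not found' },
  authError: { status: 401, message: 'Unauthorized' }
};
const FALLBACK = {
  status: 500,
  message: 'Export failed - unable to retrieve document content'
};

function handleDriveError(error) {
  const reason = error.response?.data?.error?.errors?.[0]?.reason;
  return ERROR_MAP[reason] || FALLBACK;
}

// An axios error carrying a Drive v3 error body.
const notFoundError = {
  response: {
    status: 404,
    data: {
      error: {
        code: 404,
        message: 'File not found: abc123.',
        errors: [
          { domain: 'global', reason: 'notFound', message: 'File not found: abc123.' }
        ]
      }
    }
  }
};

// A network failure has no `response`, so the optional chain yields undefined
// and the handler falls through to the generic 500.
const networkError = { code: 'ECONNRESET' };

console.log(handleDriveError(notFoundError).status);
console.log(handleDriveError(networkError).status);
```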

Additional Error Scenarios:

  • Document > 10MB: Check Content-Length header, return HTTP 413
  • Timeout > 30s: Use axios timeout option, return HTTP 504
  • Unsupported mimetype: Check mimeType, return HTTP 403
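
Of these scenarios, the timeout is the least obvious to detect, because axios reports it through an error code rather than an HTTP response. The sketch below shows one way to recognize it. `mapTransportError` is a hypothetical helper, and the 'Gateway Timeout' message text is an assumption, since the spec excerpt only gives the status code.

```javascript
// axios rejects with code 'ECONNABORTED' when its `timeout` option fires
// ('ETIMEDOUT' when transitional.clarifyTimeoutError is enabled).
const TIMEOUT_CODES = new Set(['ECONNABORTED', 'ETIMEDOUT']);

// Hypothetical helper: returns a 504 mapping for timeouts, otherwise null
// so the caller can fall back to the Drive error mapping.
function mapTransportError(error) {
  const isTimeout = Boolean(error) && !error.response && TIMEOUT_CODES.has(error.code);
  return isTimeout ? { status: 504, message: 'Gateway Timeout' } : null;
}

const timeoutResult = mapTransportError({ code: 'ECONNABORTED' });
const driveResult = mapTransportError({ response: { status: 404 } });
console.log(timeoutResult, driveResult);
```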

Alternatives Considered:

  • JSON error responses → Rejected: Plain text simpler per spec assumptions
  • Retry logic → Rejected: Out of scope per spec
  • Detailed error messages → Rejected: Security concern, could leak internal details

6. Request Timeout & Size Limits

Decision: Implement 30-second timeout with axios and 10MB size validation via Content-Length header

Rationale:

  • axios supports timeout option for all requests
  • Content-Length, when Google Drive sends it, is readable from the response headers before the body is consumed, provided the request uses responseType: 'stream'
  • Early validation prevents downloading oversized files
  • Timeout prevents hanging requests from blocking proxy

Implementation Pattern:

// Timeout configuration
const TIMEOUT_MS = 30000;  // 30 seconds
const MAX_SIZE_BYTES = 10 * 1024 * 1024;  // 10 MB

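// Request with timeout; stream so the body is not downloaded before the size check
const response = await axios.get(url, {
  timeout: TIMEOUT_MS,
  headers: { Authorization: `Bearer ${accessToken}` },
  responseType: 'stream'
});

// Size validation (a missing Content-Length parses as 0 and passes the check)
const contentLength = parseInt(response.headers['content-length'] || '0', 10);
if (contentLength > MAX_SIZE_BYTES) {
  response.data.destroy();  // Abort the upstream download
  return res.status(413).send('Payload Too Large');
}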

Alternatives Considered:

  • Progressive timeout (short for metadata, long for content) → Rejected: Adds complexity, 30s sufficient
  • No size validation → Rejected: Could stream partial files, poor UX
  • Configurable limits → Rejected: Hard-coded per spec, no need for configuration
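
The size check treats a missing Content-Length as zero, which passes validation. Whether Drive always sends the header on export responses is not confirmed here, so the sketch below keeps that case distinct. `checkDeclaredSize` and its three return values are hypothetical, and how an 'unknown' size should be handled is left to the spec.

```javascript
const MAX_SIZE_BYTES = 10 * 1024 * 1024; // 10 MB

// Hypothetical helper: classifies a response by its Content-Length header.
// Returns 'ok', 'too-large', or 'unknown' when the header is absent or invalid.
function checkDeclaredSize(headers, maxBytes = MAX_SIZE_BYTES) {
  const raw = headers['content-length'];
  if (raw === undefined) return 'unknown';

  const declared = Number.parseInt(raw, 10);
  if (!Number.isFinite(declared) || declared < 0) return 'unknown';

  return declared > maxBytes ? 'too-large' : 'ok';
}

const smallFile = checkDeclaredSize({ 'content-length': '1024' });
const largeFile = checkDeclaredSize({ 'content-length': String(11 * 1024 * 1024) });
const noHeader = checkDeclaredSize({});
console.log(smallFile, largeFile, noHeader);
```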

7. Streaming vs Buffering

Decision: Stream export content directly from Google Drive to client without buffering

Rationale:

  • axios supports streaming via responseType: 'stream'
  • Node.js streams allow piping directly to HTTP response
  • No memory overhead for file contents (only metadata buffered)
  • Efficient for documents approaching 10MB limit

Implementation Pattern:

// Stream response
const exportResponse = await axios.get(exportUrl, {
  headers: { Authorization: `Bearer ${accessToken}` },
  responseType: 'stream',
  timeout: TIMEOUT_MS
});

// Set headers
res.setHeader('Content-Type', contentType);
res.setHeader('Content-Disposition', contentDisposition);

// Pipe stream
exportResponse.data.pipe(res);

// Handle stream errors
exportResponse.data.on('error', (err) => {
  if (!res.headersSent) {
    res.status(500).send('Export failed - unable to retrieve document content');
  } else {
    res.destroy(err);  // Headers already sent: abort so the client sees a failed transfer
  }
});

Alternatives Considered:

  • Buffer entire response → Rejected: Increases memory usage, adds latency
  • Chunked encoding → Not needed: Google Drive provides Content-Length
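
One caveat with `.pipe()` is that it neither forwards source errors to the destination nor cleans up when the source fails. Node's `stream.pipeline` does both, so it is one possible way to implement the error handling above. The sketch below uses stand-in streams: `streamToClient` is a hypothetical helper name, and the in-memory `fakeUpstream` and `fakeRes` take the place of `exportResponse.data` and the HTTP response.

```javascript
const { pipeline, Readable, Writable } = require('node:stream');

// Hypothetical helper: pipes an upstream body to the client response and
// reports the outcome exactly once through the callback.
function streamToClient(upstream, res, onDone) {
  pipeline(upstream, res, (err) => {
    // On failure, pipeline has already destroyed both streams.
    onDone(err || null);
  });
}

// Stand-ins for exportResponse.data and the Express response object.
const chunks = [];
const fakeUpstream = Readable.from([Buffer.from('# Title\n'), Buffer.from('Body\n')]);
const fakeRes = new Writable({
  write(chunk, _encoding, callback) {
    chunks.push(chunk);
    callback();
  }
});

let outcome;
streamToClient(fakeUpstream, fakeRes, (err) => {
  outcome = err ? 'failed' : Buffer.concat(chunks).toString();
  console.log(outcome);
});
```

In the proxy, the callback is where a failed transfer would be logged. A 500 can only be sent there if `res.headersSent` is still false.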

Summary of Technical Decisions

| Area | Decision | Rationale |
|------|----------|-----------|
| Metadata API | files.get with field selection | Minimal response size, single API call |
| Format Selection | Priority order: Markdown > HTML > PDF | Most portable to least portable |
| Native PDFs | files.get with alt=media streaming | Efficient, no conversion needed |
| Headers | Content-Disposition: inline with filename | Browser rendering + save support |
| Error Mapping | Google Drive errors → HTTP status codes | Consistent client experience |
| Timeouts | 30s axios timeout | Prevents hanging requests |
| Size Limits | 10MB via Content-Length validation | Early rejection, no partial downloads |
| Streaming | Direct pipe from Google Drive to client | Memory efficient, low latency |

All decisions align with constitution principles (monolithic architecture, simplicity, YAGNI) and specification requirements.