# Technical Research: Document Export API Route

**Feature**: 002-document-export
**Date**: 2026-03-09
**Purpose**: Research technical patterns and best practices for implementing Google Drive document export functionality

## Research Areas

### 1. Google Drive Files.get API - Metadata Retrieval

**Decision**: Use Google Drive API v3 `files.get` endpoint with specific field selection

**Rationale**:
- Google Drive API v3 provides the `files.get` endpoint: `GET https://www.googleapis.com/drive/v3/files/{fileId}`
- Field selection via the `fields` query parameter reduces response size and improves performance
- Required fields: `id,name,mimeType,exportLinks`
- `exportLinks` returns a map of available export formats for Google Workspace documents
- Native files (PDFs, images) don't have an `exportLinks` field

**Implementation Pattern**:
```javascript
// In proxy.js - Google Drive API call
const axios = require('axios');

const metadataUrl = `https://www.googleapis.com/drive/v3/files/${documentId}`;
const params = {
  fields: 'id,name,mimeType,exportLinks',
  supportsAllDrives: true // Support shared drives
};

const response = await axios.get(metadataUrl, {
  params,
  headers: { Authorization: `Bearer ${accessToken}` }
});
```

**Alternatives Considered**:
- `files.export` endpoint directly → Rejected: Requires knowing the export format upfront, can't query available formats
- `files.list` with query → Rejected: Less efficient, requires additional parsing

**References**:
- Google Drive API v3 Files.get: https://developers.google.com/drive/api/reference/rest/v3/files/get
- Field selection: https://developers.google.com/drive/api/guides/fields-parameter

---

### 2. Export Format Selection Strategy

**Decision**: Priority-based format selection (Markdown > HTML > PDF) with fallback to native file streaming

**Rationale**:
- Google Workspace documents (Docs, Sheets, Slides) provide an `exportLinks` map: `{"text/plain": "url", "text/html": "url", ...}`
- Markdown (`text/x-markdown`) is the most portable format for downstream content processing
- HTML fallback provides rich formatting when Markdown is unavailable
- PDF fallback covers documents that offer neither Markdown nor HTML
- Native PDFs are streamed directly using `files.get` with the `alt=media` parameter

**Implementation Pattern**:
```javascript
// Format priority order
const EXPORT_FORMATS = [
  { mimeType: 'text/x-markdown', extension: 'md' },
  { mimeType: 'text/html', extension: 'html' },
  { mimeType: 'application/pdf', extension: 'pdf' }
];

// Selection logic: return the first format in priority order that has an export link
function selectExportFormat(exportLinks) {
  for (const format of EXPORT_FORMATS) {
    if (exportLinks && exportLinks[format.mimeType]) {
      return {
        url: exportLinks[format.mimeType],
        contentType: format.mimeType,
        extension: format.extension
      };
    }
  }
  return null; // No export links available
}
```

**Alternatives Considered**:
- User-specified format via query parameter → Rejected: Out of scope per spec, adds complexity
- Always export as PDF → Rejected: Markdown preferred for content processing
- Try all formats in parallel → Rejected: Unnecessary, increases API calls

---

### 3. Native PDF File Streaming

**Decision**: Use Google Drive API `files.get` with `alt=media` parameter for direct file content download

**Rationale**:
- Native PDF files (mimeType: `application/pdf`) don't have `exportLinks`
- `files.get` with `alt=media` returns the raw file bytes as the response body
- The response is streamed directly to the client (no buffering in the proxy)
- Efficient for large files up to the 10MB limit

**Implementation Pattern**:
```javascript
// For native PDFs (no exportLinks)
if (metadata.mimeType === 'application/pdf' && !metadata.exportLinks) {
  const fileUrl = `https://www.googleapis.com/drive/v3/files/${documentId}`;
  const response = await axios.get(fileUrl, {
    params: { alt: 'media' },
    headers: { Authorization: `Bearer ${accessToken}` },
    responseType: 'stream' // Stream response
  });

  // Native PDF names usually already end in ".pdf"; strip it to avoid "report.pdf.pdf"
  const baseName = metadata.name.replace(/\.pdf$/i, '');

  // Pipe stream to client
  res.setHeader('Content-Type', 'application/pdf');
  // generateContentDisposition (section 4) sanitizes the name before it reaches the header
  res.setHeader('Content-Disposition', generateContentDisposition(baseName, 'pdf'));
  response.data.pipe(res);
}
```

**Alternatives Considered**:
- Buffer entire file in memory → Rejected: Inefficient for large files, increases memory usage
- Download and re-upload → Rejected: Unnecessary overhead, adds latency

**References**:
- Google Drive API files.get with alt=media: https://developers.google.com/drive/api/guides/manage-downloads

---

### 4. Content-Disposition Header Format

**Decision**: Use `inline; filename="[name].[ext]"` format for Content-Disposition header

**Rationale**:
- `inline` disposition allows the browser to display content (PDFs, HTML) in-browser
- Filename parameter provides a sensible default if the user saves the file
- RFC 6266 compliant format
- Filename includes an extension matching the export format (.md, .html, .pdf)

**Implementation Pattern**:
```javascript
// Generate Content-Disposition header
function generateContentDisposition(filename, extension) {
  // Sanitize filename: replace anything outside letters, digits, "_", ".", space and "-"
  // (the hyphen sits last in the character class so it is unambiguously a literal)
  const sanitized = filename
    .replace(/[^a-zA-Z0-9_. -]/g, '_') // Replace special chars
    .substring(0, 255); // Limit length of the base name

  return `inline; filename="${sanitized}.${extension}"`;
}

// Usage
res.setHeader('Content-Disposition', generateContentDisposition(metadata.name, 'md'));
```

**Alternatives Considered**:
- `attachment` disposition → Rejected: Forces download, prevents in-browser viewing
- No filename parameter → Rejected: Browser uses document ID as filename (poor UX)
- RFC 2231-style extended encoding for Unicode (the `filename*` parameter, profiled for HTTP in RFC 5987) → Deferred: Simple ASCII sanitization sufficient for MVP

**References**:
- RFC 6266 Content-Disposition: https://datatracker.ietf.org/doc/html/rfc6266

---

### 5. Error Handling & HTTP Status Codes

**Decision**: Map Google Drive API errors to appropriate HTTP status codes with descriptive messages

**Rationale**:
- Google Drive API returns structured error responses with reason codes
- Map to standard HTTP status codes for a consistent client experience
- Plain text error messages for simplicity (no JSON wrapper needed)

**Implementation Pattern**:
```javascript
// Error mapping: Google Drive reason code → HTTP status and plain text message
const ERROR_MAP = {
  'notFound': { status: 404, message: 'Document not found' },
  'authError': { status: 401, message: 'Unauthorized' },
  'forbidden': { status: 401, message: 'Unauthorized' },
  'insufficientPermissions': { status: 401, message: 'Unauthorized' },
  'rateLimitExceeded': { status: 502, message: 'Bad Gateway - Google Drive API unavailable' },
  'backendError': { status: 502, message: 'Bad Gateway - Google Drive API unavailable' }
};

// Error handler: unknown or missing reason codes fall back to HTTP 500
function handleDriveError(error) {
  const reason = error.response?.data?.error?.errors?.[0]?.reason;
  const mapped = ERROR_MAP[reason] || {
    status: 500,
    message: 'Export failed - unable to retrieve document content'
  };

  return { status: mapped.status, message: mapped.message };
}
```

**Additional Error Scenarios**:
- Document > 10MB: Check Content-Length header, return HTTP 413
- Timeout > 30s: Use axios timeout option, return HTTP 504
- Unsupported MIME type: Check `mimeType`, return HTTP 403

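The three additional scenarios are not covered by `ERROR_MAP`, because none of them arrives as a Google Drive reason code. The following is a minimal sketch of how they might be checked, not a definitive implementation. The helper names (`checkMimeType`, `checkSize`, `checkTimeout`) and the 403 and 504 message strings are hypothetical. The sketch assumes a document is supported when it has at least one export link or is a native PDF, and that axios reports a timeout through `error.code` set to `ECONNABORTED` (its default) or `ETIMEDOUT`.

```javascript
const MAX_SIZE_BYTES = 10 * 1024 * 1024; // 10 MB limit from the spec

// Hypothetical helper: 403 when the file is neither exportable nor a native PDF.
// Returns null when the MIME type is acceptable.
function checkMimeType(metadata) {
  const links = metadata.exportLinks || {};
  const exportable = Object.keys(links).length > 0;
  const nativePdf = metadata.mimeType === 'application/pdf';
  if (!exportable && !nativePdf) {
    return { status: 403, message: 'Unsupported document type' }; // message is an assumption
  }
  return null;
}

// Hypothetical helper: 413 when the upstream Content-Length exceeds the limit.
// A missing or unparsable header is treated as "size unknown" and passes.
function checkSize(contentLengthHeader) {
  const contentLength = Number.parseInt(contentLengthHeader || '0', 10);
  if (contentLength > MAX_SIZE_BYTES) {
    return { status: 413, message: 'Payload Too Large' };
  }
  return null;
}

// Hypothetical helper: 504 when the axios request timed out.
// Timeouts carry no HTTP response, so they are detected via error.code.
function checkTimeout(error) {
  const timedOut = error.code === 'ECONNABORTED' || error.code === 'ETIMEDOUT';
  return timedOut ? { status: 504, message: 'Gateway Timeout' } : null; // message is an assumption
}
```

In a route handler, `checkTimeout(error)` would typically run before `handleDriveError(error)` in the catch block, since a timed-out request has no `error.response` and would otherwise fall through to the generic 500.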

**Alternatives Considered**:
- JSON error responses → Rejected: Plain text simpler per spec assumptions
- Retry logic → Rejected: Out of scope per spec
- Detailed error messages → Rejected: Security concern, could leak internal details

---

### 6. Request Timeout & Size Limits

**Decision**: Implement 30-second timeout with axios and 10MB size validation via Content-Length header

**Rationale**:
- axios supports a `timeout` option for all requests
- Content-Length header available in Google Drive API responses before streaming
- Early validation prevents downloading oversized files
- Timeout prevents hanging requests from blocking the proxy

**Implementation Pattern**:
```javascript
// Timeout configuration
const TIMEOUT_MS = 30000; // 30 seconds
const MAX_SIZE_BYTES = 10 * 1024 * 1024; // 10 MB

// Request with timeout; streaming exposes the headers before the body is downloaded
const response = await axios.get(url, {
  timeout: TIMEOUT_MS,
  responseType: 'stream',
  headers: { Authorization: `Bearer ${accessToken}` }
});

// Size validation (a missing Content-Length parses as 0 and passes this check)
const contentLength = parseInt(response.headers['content-length'] || '0', 10);
if (contentLength > MAX_SIZE_BYTES) {
  response.data.destroy(); // Abandon the upstream download
  return res.status(413).send('Payload Too Large');
}
```

**Alternatives Considered**:
- Progressive timeout (short for metadata, long for content) → Rejected: Adds complexity, 30s sufficient
- No size validation → Rejected: Could stream partial files, poor UX
- Configurable limits → Rejected: Hard-coded per spec, no need for configuration

---

### 7. Streaming vs Buffering

**Decision**: Stream export content directly from Google Drive to client without buffering

**Rationale**:
- axios supports streaming via `responseType: 'stream'`
- Node.js streams allow piping directly to the HTTP response
- No memory overhead for file contents (only metadata buffered)
- Efficient for documents approaching the 10MB limit

**Implementation Pattern**:
```javascript
// Stream response
const exportResponse = await axios.get(exportUrl, {
  headers: { Authorization: `Bearer ${accessToken}` },
  responseType: 'stream',
  timeout: TIMEOUT_MS
});

// Set headers
res.setHeader('Content-Type', contentType);
res.setHeader('Content-Disposition', contentDisposition);

// Handle stream errors (registered before piping so no error goes unhandled)
exportResponse.data.on('error', () => {
  if (!res.headersSent) {
    res.status(500).send('Export failed - unable to retrieve document content');
  } else {
    res.destroy(); // Headers already sent: abort so the client sees a failed transfer
  }
});

// Pipe stream
exportResponse.data.pipe(res);
```

**Alternatives Considered**:
- Buffer entire response → Rejected: Increases memory usage, adds latency
- Chunked encoding → Not needed: Google Drive provides Content-Length

---

## Summary of Technical Decisions

| Area | Decision | Rationale |
|------|----------|-----------|
| **Metadata API** | files.get with field selection | Minimal response size, single API call |
| **Format Selection** | Priority order: Markdown > HTML > PDF | Most portable to least portable |
| **Native PDFs** | files.get with alt=media streaming | Efficient, no conversion needed |
| **Headers** | Content-Disposition: inline with filename | Browser rendering + save support |
| **Error Mapping** | Google Drive errors → HTTP status codes | Consistent client experience |
| **Timeouts** | 30s axios timeout | Prevents hanging requests |
| **Size Limits** | 10MB via Content-Length validation | Early rejection, no partial downloads |
| **Streaming** | Direct pipe from Google Drive to client | Memory efficient, low latency |

All decisions align with constitution principles (monolithic architecture, simplicity, YAGNI) and specification requirements.
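
To show how the format-selection, native PDF, and header decisions summarized above might combine, here is a minimal sketch of a pure planning step that runs after the metadata call and before any content is streamed. It is an illustration under stated assumptions rather than the route's actual implementation: `planExport`, `contentDisposition`, and the `kind` field are hypothetical names, and the sketch assumes the metadata object carries the `id`, `name`, `mimeType`, and `exportLinks` fields requested in section 1.

```javascript
// Format priority order from section 2
const EXPORT_FORMATS = [
  { mimeType: 'text/x-markdown', extension: 'md' },
  { mimeType: 'text/html', extension: 'html' },
  { mimeType: 'application/pdf', extension: 'pdf' }
];

// Hypothetical helper: builds the inline Content-Disposition value.
// Strips an existing matching extension so "scan.pdf" does not become "scan.pdf.pdf".
function contentDisposition(name, extension) {
  const base = name.replace(new RegExp(`\\.${extension}$`, 'i'), '');
  const sanitized = base.replace(/[^a-zA-Z0-9_. -]/g, '_').substring(0, 255);
  return `inline; filename="${sanitized}.${extension}"`;
}

// Hypothetical helper: decides what to fetch and which headers to send.
function planExport(metadata) {
  const links = metadata.exportLinks || {};

  // 1. Google Workspace document: first available format in priority order
  const format = EXPORT_FORMATS.find((f) => links[f.mimeType]);
  if (format) {
    return {
      kind: 'export',
      url: links[format.mimeType],
      contentType: format.mimeType,
      contentDisposition: contentDisposition(metadata.name, format.extension)
    };
  }

  // 2. Native PDF: stream the raw bytes via files.get with alt=media
  if (metadata.mimeType === 'application/pdf') {
    return {
      kind: 'native',
      url: `https://www.googleapis.com/drive/v3/files/${metadata.id}?alt=media`,
      contentType: 'application/pdf',
      contentDisposition: contentDisposition(metadata.name, 'pdf')
    };
  }

  // 3. Anything else is unsupported
  return { kind: 'unsupported' };
}
```

Keeping this step free of network calls means the priority order and filename handling can be unit-tested without mocking axios; the streaming request then only needs the `url` and header values from the returned plan.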