Added new feature for document export, including API contracts, data model, implementation plan, and tests. Updated related configurations and instructions.
293
specs/002-document-export/research.md
Normal file
@@ -0,0 +1,293 @@
# Technical Research: Document Export API Route

**Feature**: 002-document-export
**Date**: 2026-03-09
**Purpose**: Research technical patterns and best practices for implementing Google Drive document export functionality

## Research Areas

### 1. Google Drive Files.get API - Metadata Retrieval

**Decision**: Use the Google Drive API v3 `files.get` method with explicit field selection

**Rationale**:
- The Google Drive API v3 exposes `files.get` at `GET https://www.googleapis.com/drive/v3/files/{fileId}`
- Field selection via the `fields` query parameter reduces response size and improves performance
- Required fields: `id,name,mimeType,exportLinks`
- `exportLinks` maps each available export MIME type to a download URL for Google Workspace documents
- Native files (PDFs, images) have no `exportLinks` field

**Implementation Pattern**:
```javascript
// In proxy.js - Google Drive API call
// Encode the ID so an unexpected character cannot alter the request path
const metadataUrl = `https://www.googleapis.com/drive/v3/files/${encodeURIComponent(documentId)}`;
const params = {
  fields: 'id,name,mimeType,exportLinks',
  supportsAllDrives: true // Support shared drives
};
const response = await axios.get(metadataUrl, {
  params,
  headers: { Authorization: `Bearer ${accessToken}` }
});
const metadata = response.data; // { id, name, mimeType, exportLinks? }
```

**Alternatives Considered**:
- `files.export` endpoint directly → Rejected: Requires knowing export format upfront, can't query available formats
- `files.list` with query → Rejected: Less efficient, requires additional parsing

**References**:
- Google Drive API v3 Files.get: https://developers.google.com/drive/api/reference/rest/v3/files/get
- Field selection: https://developers.google.com/drive/api/guides/fields-parameter
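
The metadata response determines which path the route takes next. As a minimal sketch, the helper below summarizes a `files.get` response; `describeDriveFile` is a hypothetical name that does not appear in the spec. It relies on Google Workspace MIME types starting with `application/vnd.google-apps.`.

```javascript
// Hypothetical helper (not part of the spec): summarizes a files.get metadata response.
const WORKSPACE_PREFIX = 'application/vnd.google-apps.';

function describeDriveFile(metadata) {
  return {
    // Docs, Sheets, and Slides all use the vnd.google-apps prefix
    isWorkspaceDocument: String(metadata.mimeType || '').startsWith(WORKSPACE_PREFIX),
    // Native files have no exportLinks field, so fall back to an empty list
    exportMimeTypes: Object.keys(metadata.exportLinks || {})
  };
}
```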

---

### 2. Export Format Selection Strategy

**Decision**: Priority-based format selection (Markdown > HTML > PDF) with fallback to native file streaming

**Rationale**:
- Google Workspace documents (Docs, Sheets, Slides) provide an `exportLinks` map: `{"text/plain": "url", "text/html": "url", ...}`
- Markdown (`text/markdown`, offered for Google Docs) is the most portable format for downstream content processing
- HTML fallback provides rich formatting when Markdown is unavailable
- PDF is the last-resort export format, since Docs, Sheets, and Slides all offer it
- Native PDFs are streamed directly using `files.get` with the `alt=media` parameter

**Implementation Pattern**:
```javascript
// Format priority order
const EXPORT_FORMATS = [
  { mimeType: 'text/markdown', extension: 'md' },
  { mimeType: 'text/html', extension: 'html' },
  { mimeType: 'application/pdf', extension: 'pdf' }
];

// Selection logic: return the first format in priority order that Drive offers
function selectExportFormat(exportLinks) {
  for (const format of EXPORT_FORMATS) {
    if (exportLinks && exportLinks[format.mimeType]) {
      return {
        url: exportLinks[format.mimeType],
        contentType: format.mimeType,
        extension: format.extension
      };
    }
  }
  return null; // No export links available
}
```

**Alternatives Considered**:
- User-specified format via query parameter → Rejected: Out of scope per spec, adds complexity
- Always export as PDF → Rejected: Markdown preferred for content processing
- Try all formats in parallel → Rejected: Unnecessary, increases API calls
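
The decision above combines format selection with a fallback to native streaming. As a rough sketch of how the two could be routed together, the hypothetical `planExport` helper below takes the priority-ordered MIME type list as a parameter. The `action` values are illustrative assumptions, not names from the spec.

```javascript
// Hypothetical routing helper: decides how a file should be served.
// `preferredMimeTypes` is the priority-ordered list of export MIME types.
function planExport(metadata, preferredMimeTypes) {
  const exportLinks = metadata.exportLinks || {};
  const chosen = preferredMimeTypes.find((mimeType) => exportLinks[mimeType]);
  if (chosen) {
    return { action: 'export', url: exportLinks[chosen], contentType: chosen };
  }
  if (metadata.mimeType === 'application/pdf') {
    // Native PDF: no export links, so the raw bytes are fetched with alt=media
    return { action: 'download', contentType: 'application/pdf' };
  }
  return { action: 'unsupported' };
}
```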

---

### 3. Native PDF File Streaming

**Decision**: Use Google Drive API `files.get` with the `alt=media` parameter for direct file content download

**Rationale**:
- Native PDF files (mimeType: `application/pdf`) don't have `exportLinks`
- `files.get` with `alt=media` returns the raw file bytes as the response body
- Response is streamed directly to the client (no buffering in proxy)
- Efficient for large files up to the 10MB limit

**Implementation Pattern**:
```javascript
// For native PDFs (no exportLinks)
if (metadata.mimeType === 'application/pdf' && !metadata.exportLinks) {
  const fileUrl = `https://www.googleapis.com/drive/v3/files/${documentId}`;
  const response = await axios.get(fileUrl, {
    params: { alt: 'media', supportsAllDrives: true },
    headers: { Authorization: `Bearer ${accessToken}` },
    responseType: 'stream' // Stream response
  });

  // Pipe stream to client
  res.setHeader('Content-Type', 'application/pdf');
  // Native PDF names usually already end in ".pdf"; strip it to avoid "name.pdf.pdf",
  // then sanitize via generateContentDisposition (see section 4)
  const baseName = metadata.name.replace(/\.pdf$/i, '');
  res.setHeader('Content-Disposition', generateContentDisposition(baseName, 'pdf'));
  response.data.pipe(res);
}
```

**Alternatives Considered**:
- Buffer entire file in memory → Rejected: Inefficient for large files, increases memory usage
- Download and re-upload → Rejected: Unnecessary overhead, adds latency

**References**:
- Google Drive API files.get with alt=media: https://developers.google.com/drive/api/guides/manage-downloads

---

### 4. Content-Disposition Header Format

**Decision**: Use `inline; filename="[name].[ext]"` format for the Content-Disposition header

**Rationale**:
- `inline` disposition allows the browser to display content (PDFs, HTML) in-browser
- Filename parameter provides a sensible default if the user saves the file
- RFC 6266 compliant format
- Filename includes the extension matching the export format (.md, .html, .pdf)

**Implementation Pattern**:
```javascript
// Generate Content-Disposition header
function generateContentDisposition(filename, extension) {
  // Sanitize filename: replace special characters, limit length
  const sanitized = filename
    .replace(/[^a-zA-Z0-9_. -]/g, '_') // Replace special chars (hyphen last so it is a literal)
    .substring(0, 255); // Limit base name length (extension is appended afterwards)

  return `inline; filename="${sanitized}.${extension}"`;
}

// Usage
res.setHeader('Content-Disposition', generateContentDisposition(metadata.name, 'md'));
```

**Alternatives Considered**:
- `attachment` disposition → Rejected: Forces download, prevents in-browser viewing
- No filename parameter → Rejected: Browser uses document ID as filename (poor UX)
- RFC 5987 `filename*` encoding for Unicode (the mechanism RFC 6266 references) → Deferred: Simple ASCII sanitization sufficient for MVP

**References**:
- RFC 6266 Content-Disposition: https://datatracker.ietf.org/doc/html/rfc6266
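
For reference, the deferred Unicode option could look roughly like the sketch below. It sends an ASCII `filename` fallback plus a percent-encoded `filename*` parameter, as RFC 6266 recommends. `contentDispositionWithUnicode` is a hypothetical name and is not part of the MVP.

```javascript
// Hypothetical helper: adds an RFC 5987 `filename*` parameter next to the ASCII fallback.
function contentDispositionWithUnicode(filename, extension) {
  const fullName = `${filename}.${extension}`;
  // ASCII fallback for older clients: anything outside the safe set becomes "_"
  const asciiFallback = fullName.replace(/[^a-zA-Z0-9_. -]/g, '_');
  // encodeURIComponent leaves ' ( ) * unescaped, which RFC 5987 does not permit here
  const encoded = encodeURIComponent(fullName).replace(
    /['()*]/g,
    (ch) => '%' + ch.charCodeAt(0).toString(16).toUpperCase()
  );
  return `inline; filename="${asciiFallback}"; filename*=UTF-8''${encoded}`;
}
```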

---

### 5. Error Handling & HTTP Status Codes

**Decision**: Map Google Drive API errors to appropriate HTTP status codes with descriptive messages

**Rationale**:
- Google Drive API returns structured error responses with reason codes
- Map to standard HTTP status codes for consistent client experience
- Plain text error messages for simplicity (no JSON wrapper needed)

**Implementation Pattern**:
```javascript
// Error mapping: Drive API reason code -> proxy response
const ERROR_MAP = {
  'notFound': { status: 404, message: 'Document not found' },
  'authError': { status: 401, message: 'Unauthorized' },
  'forbidden': { status: 401, message: 'Unauthorized' },
  'insufficientPermissions': { status: 401, message: 'Unauthorized' },
  'rateLimitExceeded': { status: 502, message: 'Bad Gateway - Google Drive API unavailable' },
  'backendError': { status: 502, message: 'Bad Gateway - Google Drive API unavailable' }
};

// Error handler
function handleDriveError(error) {
  // Drive nests the reason code inside error.errors[0]; any part may be missing
  const reason = error.response?.data?.error?.errors?.[0]?.reason;
  const mapped = ERROR_MAP[reason] || { status: 500, message: 'Export failed - unable to retrieve document content' };

  return {
    status: mapped.status,
    message: mapped.message
  };
}
```

**Additional Error Scenarios**:
- Document > 10MB: Check Content-Length header, return HTTP 413
- Timeout > 30s: Use axios timeout option, return HTTP 504
- Unsupported mimetype: Check mimeType, return HTTP 403

**Alternatives Considered**:
- JSON error responses → Rejected: Plain text simpler per spec assumptions
- Retry logic → Rejected: Out of scope per spec
- Detailed error messages → Rejected: Security concern, could leak internal details
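
Timeouts never reach the reason-code mapping because no Drive response exists. The sketch below shows one way to catch them first; `mapTransportError` is a hypothetical helper, and both the `'Gateway Timeout'` text and the 502 for connection failures are assumptions rather than spec wording. axios reports an exceeded `timeout` with `error.code === 'ECONNABORTED'` by default.

```javascript
// Hypothetical helper: maps errors that never produced a Drive response.
function mapTransportError(error) {
  // axios reports an exceeded `timeout` as ECONNABORTED by default
  // (ETIMEDOUT when `transitional.clarifyTimeoutError` is enabled)
  if (error.code === 'ECONNABORTED' || error.code === 'ETIMEDOUT') {
    return { status: 504, message: 'Gateway Timeout' };
  }
  if (!error.response) {
    // No HTTP response at all: DNS failure, connection reset, and similar
    return { status: 502, message: 'Bad Gateway - Google Drive API unavailable' };
  }
  return null; // Errors carrying a response go through the reason-code mapping
}
```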

---

### 6. Request Timeout & Size Limits

**Decision**: Implement 30-second timeout with axios and 10MB size validation via Content-Length header

**Rationale**:
- axios supports a `timeout` option for all requests
- When Google Drive sends a Content-Length header, it is available before the body is streamed
- Early validation prevents downloading oversized files
- Timeout prevents hanging requests from blocking the proxy

**Implementation Pattern**:
```javascript
// Timeout configuration
const TIMEOUT_MS = 30000; // 30 seconds
const MAX_SIZE_BYTES = 10 * 1024 * 1024; // 10 MB

// Request with timeout; stream so headers can be checked before the body is downloaded
const response = await axios.get(url, {
  timeout: TIMEOUT_MS,
  responseType: 'stream',
  headers: { Authorization: `Bearer ${accessToken}` }
});

// Size validation
const contentLength = parseInt(response.headers['content-length'] || '0', 10);
if (contentLength > MAX_SIZE_BYTES) {
  response.data.destroy(); // Abandon the oversized download
  return res.status(413).send('Payload Too Large');
}
```

**Alternatives Considered**:
- Progressive timeout (short for metadata, long for content) → Rejected: Adds complexity, 30s sufficient
- No size validation → Rejected: Could stream partial files, poor UX
- Configurable limits → Rejected: Hard-coded per spec, no need for configuration
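
The pattern above treats a missing Content-Length as size 0, so a response without the header passes the check. As a minimal sketch, the hypothetical `checkDeclaredSize` helper below reports that case separately so the caller can decide how to handle it. The three return values are illustrative assumptions.

```javascript
// Hypothetical helper: classifies the declared response size.
// Returns 'ok', 'too-large', or 'unknown' (header missing or unparseable).
function checkDeclaredSize(headers, maxBytes) {
  const raw = headers['content-length'];
  const declared = Number.parseInt(raw, 10);
  if (raw === undefined || Number.isNaN(declared) || declared < 0) {
    return 'unknown';
  }
  return declared > maxBytes ? 'too-large' : 'ok';
}
```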

---

### 7. Streaming vs Buffering

**Decision**: Stream export content directly from Google Drive to client without buffering

**Rationale**:
- axios supports streaming via `responseType: 'stream'`
- Node.js streams allow piping directly to the HTTP response
- No memory overhead for file contents (only metadata buffered)
- Efficient for documents approaching the 10MB limit

**Implementation Pattern**:
```javascript
// Stream response
const exportResponse = await axios.get(exportUrl, {
  headers: { Authorization: `Bearer ${accessToken}` },
  responseType: 'stream',
  timeout: TIMEOUT_MS
});

// Set headers
res.setHeader('Content-Type', contentType);
res.setHeader('Content-Disposition', contentDisposition);

// Handle stream errors (registered before piping so no error is missed)
exportResponse.data.on('error', (err) => {
  if (!res.headersSent) {
    res.status(500).send('Export failed - unable to retrieve document content');
  } else {
    res.destroy(err); // Headers already sent: abort so the client sees a failed transfer
  }
});

// Pipe stream
exportResponse.data.pipe(res);
```

**Alternatives Considered**:
- Buffer entire response → Rejected: Increases memory usage, adds latency
- Chunked encoding → Not needed: Google Drive provides Content-Length
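
If a response ever arrives without a usable Content-Length, the size limit can still be enforced while streaming. The sketch below is one possible approach using Node's `stream.Transform` and `stream.pipeline`. `createByteLimit` and `streamWithLimit` are hypothetical names, and this guard is not something the spec requires.

```javascript
const { Transform, pipeline } = require('node:stream');

// Hypothetical guard: passes data through until more than maxBytes have been seen.
function createByteLimit(maxBytes) {
  let seen = 0;
  return new Transform({
    transform(chunk, _encoding, callback) {
      seen += chunk.length;
      if (seen > maxBytes) {
        callback(new Error(`Response exceeded ${maxBytes} bytes`));
      } else {
        callback(null, chunk);
      }
    }
  });
}

// `source` is the upstream stream (for example exportResponse.data), `res` the HTTP response.
function streamWithLimit(source, res, maxBytes, onDone) {
  // pipeline() destroys every stream in the chain if any of them fails
  pipeline(source, createByteLimit(maxBytes), res, onDone);
}
```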

---

## Summary of Technical Decisions

| Area | Decision | Rationale |
|------|----------|-----------|
| **Metadata API** | files.get with field selection | Minimal response size, single API call |
| **Format Selection** | Priority order: Markdown > HTML > PDF | Most portable to least portable |
| **Native PDFs** | files.get with alt=media streaming | Efficient, no conversion needed |
| **Headers** | Content-Disposition: inline with filename | Browser rendering + save support |
| **Error Mapping** | Google Drive errors → HTTP status codes | Consistent client experience |
| **Timeouts** | 30s axios timeout | Prevents hanging requests |
| **Size Limits** | 10MB via Content-Length validation | Early rejection, no partial downloads |
| **Streaming** | Direct pipe from Google Drive to client | Memory efficient, low latency |

All decisions align with constitution principles (monolithic architecture, simplicity, YAGNI) and specification requirements.