google-drive-content-adapter/specs/002-document-export/spec.md

Feature Specification: Document Export API Route

Feature Branch: 002-document-export
Created: 2026-03-09
Status: Draft
Input: User description: "Document exporting. This feature adds a route to the proxy.sh in the format of /documents/:documentId. This route returns a single response that exports the document. The route should first fetch metadata about the document from Google Drive using this API https://developers.google.com/workspace/drive/api/reference/rest/v3/files/get to retrieve these fields 'id,name,mimeType,exportLinks'. If exportLinks are available then select one export option based on availability from the following list in order 'text/x-markdown', 'text/html', 'application/pdf' and export using the link provided in exportLinks. If exportLinks are not available for the document then determine the mimeType of the document and if the mimeType matches 'application/pdf' then stream the PDF file from Google Drive, otherwise send a '403' mimetype not supported message. In all cases make sure that the 'Content-Type' header is set appropriately in the Response."

User Scenarios & Testing (mandatory)

User Story 1 - Export Google Workspace Documents (Priority: P1)

Users request a Google Workspace document (Google Docs, Sheets, Slides) via the export route and receive it in the best available format (Markdown, HTML, or PDF, in that order of preference). The system selects the first of these formats present in Google Drive's export links; users do not choose the format themselves.

Why this priority: This is the core functionality - exporting Google Workspace documents is the primary use case. Without this, the feature cannot deliver any value.

Independent Test: Can be fully tested by requesting a Google Doc via /documents/:documentId endpoint with a valid document ID and verifying the response contains the exported document in the correct format with appropriate Content-Type headers.

Acceptance Scenarios:

  1. Given a Google Workspace document with export links available, When user requests /documents/:documentId, Then system returns the document in Markdown format (if available) with Content-Type: text/x-markdown and Content-Disposition: inline; filename="[name].md" headers
  2. Given a Google Workspace document without Markdown export but HTML export available, When user requests /documents/:documentId, Then system returns the document in HTML format with Content-Type: text/html and Content-Disposition: inline; filename="[name].html" headers
  3. Given a Google Workspace document with only PDF export available, When user requests /documents/:documentId, Then system returns the document in PDF format with Content-Type: application/pdf and Content-Disposition: inline; filename="[name].pdf" headers
  4. Given a valid document ID, When export route is called, Then system fetches metadata using Google Drive API with fields 'id,name,mimeType,exportLinks'
  5. Given an invalid document ID, When export is requested, Then system returns HTTP 404 with message "Document not found"
  6. Given a document the user cannot access, When export is requested, Then system returns HTTP 401 with message "Unauthorized"
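
The format selection in scenarios 1 to 3 can be sketched as follows. This is a minimal illustration in Python rather than the proxy's actual code; the names select_export and EXPORT_PRIORITY are hypothetical, and export_links stands for the exportLinks map from the metadata response.

```python
from typing import Mapping, Optional, Tuple

# Priority order from this spec: the first available format wins.
EXPORT_PRIORITY = ("text/x-markdown", "text/html", "application/pdf")


def select_export(
    export_links: Optional[Mapping[str, str]],
) -> Optional[Tuple[str, str]]:
    """Return (mime_type, url) for the highest-priority export, or None.

    None means that no usable export link exists, so the caller should
    fall through to the mimeType check.
    """
    if not export_links:
        return None
    for mime_type in EXPORT_PRIORITY:
        url = export_links.get(mime_type)
        if url:
            return mime_type, url
    return None
```

The returned MIME type doubles as the Content-Type of the response, which keeps the header and the body format from drifting apart.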

User Story 2 - Export Native PDF Files (Priority: P2)

Users request native PDF files stored in Google Drive and receive them streamed directly without conversion. This handles documents that are already in PDF format rather than Google Workspace documents.

Why this priority: Native PDFs are a common file type in Google Drive. This ensures the export route works for pre-existing PDF files, not just Google Workspace documents. It's lower priority than P1 because Google Workspace documents are the primary use case.

Independent Test: Can be fully tested by uploading a native PDF file to Google Drive, requesting it via /documents/:documentId, and verifying the PDF is streamed correctly with Content-Type: application/pdf header.

Acceptance Scenarios:

  1. Given a native PDF file (mimeType: application/pdf) in Google Drive with no export links, When user requests /documents/:documentId, Then system streams the PDF file directly with Content-Type: application/pdf and Content-Disposition: inline; filename="[name].pdf" headers
  2. Given a native PDF file, When export is requested, Then system bypasses export links and streams the file content
  3. Given a native PDF file larger than 10MB, When export is requested, Then system returns HTTP 413 with message "Payload Too Large"
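
One way to enforce the 10MB limit while streaming is to count bytes as chunks pass through. The sketch below is an assumption-laden illustration: PayloadTooLarge and bounded_chunks are hypothetical names, "10MB" is read as 10 MiB, and chunks is any iterable of byte strings read from the upstream response.

```python
from typing import Iterable, Iterator

# Assumption: "10MB" in this spec is interpreted as 10 * 1024 * 1024 bytes.
MAX_BYTES = 10 * 1024 * 1024


class PayloadTooLarge(Exception):
    """Hypothetical error that the route would translate into HTTP 413."""


def bounded_chunks(
    chunks: Iterable[bytes], limit: int = MAX_BYTES
) -> Iterator[bytes]:
    """Yield chunks unchanged, raising once the running total exceeds limit."""
    total = 0
    for chunk in chunks:
        total += len(chunk)
        if total > limit:
            raise PayloadTooLarge(f"body exceeds {limit} bytes")
        yield chunk
```

A status code cannot be changed once response bytes have been sent, so a real route would likely also need to check the upstream Content-Length (when present) before writing any headers; the counter above only guards against bodies whose size is not known in advance.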

User Story 3 - Handle Unsupported File Types (Priority: P3)

Users attempting to export unsupported file types receive clear error messages indicating the mimetype is not supported. This provides a graceful failure path for non-exportable documents.

Why this priority: Error handling is important for user experience, but it's lower priority than successfully exporting supported file types. Users primarily need the happy path working first.

Independent Test: Can be fully tested by requesting a document with an unsupported mimetype (e.g., image, video, zip) via /documents/:documentId and verifying a 403 response with appropriate error message.

Acceptance Scenarios:

  1. Given a document with unsupported mimeType (not Google Workspace or PDF) and no export links, When user requests /documents/:documentId, Then system returns HTTP 403 with message "mimetype not supported"
  2. Given an image file (e.g., mimeType: image/jpeg), When export is requested, Then system returns 403 error
  3. Given an export operation that exceeds 30 seconds, When timeout occurs, Then system returns HTTP 504 with message "Gateway Timeout"
  4. Given Google Drive API is unavailable, When export is requested, Then system returns HTTP 502 with message "Bad Gateway - Google Drive API unavailable"
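
The decision between exporting, streaming, and rejecting can be expressed as a small function over the metadata. This is a sketch only; classify_document is a hypothetical name, and the metadata argument is assumed to be the parsed JSON object containing the fields id, name, mimeType, and exportLinks.

```python
from typing import Any, Mapping, Tuple


def classify_document(metadata: Mapping[str, Any]) -> Tuple[int, str]:
    """Return (status, action_or_message) for one file's metadata."""
    if metadata.get("exportLinks"):
        # Google Workspace document: handled by the export-link path.
        return 200, "export"
    if metadata.get("mimeType") == "application/pdf":
        # Native PDF: stream the stored bytes without conversion.
        return 200, "stream"
    # Images, video, archives, and a missing mimeType field all end here.
    return 403, "mimetype not supported"
```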

Edge Cases

  • Invalid or non-existent document ID: System returns HTTP 404 with message "Document not found"
  • Insufficient permissions: System returns HTTP 401 with message "Unauthorized" when user lacks access to the requested document
  • Google Drive API unavailable: System returns HTTP 502 with message "Bad Gateway - Google Drive API unavailable"
  • Malformed or inaccessible export links: System returns HTTP 500 with message "Export failed - unable to retrieve document content"
  • Large documents or timeouts: Documents exceeding 10MB return HTTP 413 "Payload Too Large"; exports exceeding 30-second timeout return HTTP 504 "Gateway Timeout"
  • Missing mimeType field: System treats document as unsupported and returns HTTP 403 "mimetype not supported"
  • Multiple export formats with same priority: Not applicable - priority list is strictly ordered
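
The edge cases for the metadata fetch amount to a mapping from the upstream status code to the proxy's own response. The sketch below assumes that Google Drive signals a missing file with 404 and an access problem with 401 or 403; that assumption and the name map_upstream_error are illustrative and not confirmed by this spec.

```python
from typing import Optional, Tuple


def map_upstream_error(status: int) -> Optional[Tuple[int, str]]:
    """Translate a Drive metadata response status into (status, message).

    Returns None when the upstream call succeeded.
    """
    if 200 <= status < 300:
        return None
    if status == 404:
        return 404, "Document not found"
    if status in (401, 403):
        # Assumption: both upstream codes mean the caller lacks access.
        return 401, "Unauthorized"
    # Anything else is treated as the Drive API being unavailable.
    return 502, "Bad Gateway - Google Drive API unavailable"
```

Failures while following an export link are a separate case and map to HTTP 500 as listed above.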

Requirements (mandatory)

Functional Requirements

  • FR-001: System MUST provide an HTTP route in the format /documents/:documentId where documentId is the Google Drive document identifier
  • FR-002: System MUST fetch document metadata from the Google Drive API using the files.get method (GET https://www.googleapis.com/drive/v3/files/{fileId}), documented at https://developers.google.com/workspace/drive/api/reference/rest/v3/files/get
  • FR-003: System MUST retrieve the following metadata fields: id, name, mimeType, exportLinks
  • FR-004: System MUST check for the presence of exportLinks in the metadata response
  • FR-005: System MUST select export format from exportLinks based on this priority order: text/x-markdown, text/html, application/pdf (first available wins)
  • FR-006: System MUST use the export link provided in exportLinks to retrieve the exported document content
  • FR-007: System MUST handle documents without exportLinks by checking the mimeType field
  • FR-008: System MUST stream native PDF files (mimeType: application/pdf) directly from Google Drive when no export links are available
  • FR-009: System MUST return HTTP 403 with message "mimetype not supported" for documents without export links and mimeType other than application/pdf
  • FR-010: System MUST set the Content-Type response header appropriately based on the export format used:
    • text/x-markdown for Markdown exports
    • text/html for HTML exports
    • application/pdf for PDF exports or native PDF files
  • FR-011: System MUST set the Content-Disposition response header to inline; filename="[document-name].[extension]" using the document name from Google Drive metadata and appropriate file extension for the export format
  • FR-012: System MUST return the exported document content as the HTTP response body
  • FR-013: System MUST handle authentication with Google Drive API (assumes OAuth2 or service account credentials are configured)
  • FR-014: System MUST return HTTP 404 with message "Document not found" when the document ID is invalid or doesn't exist in Google Drive
  • FR-015: System MUST return HTTP 401 with message "Unauthorized" when the user lacks permissions to access the requested document
  • FR-016: System MUST return HTTP 502 with message "Bad Gateway - Google Drive API unavailable" when Google Drive API is unavailable or returns an error not covered by FR-014 or FR-015
  • FR-017: System MUST return HTTP 413 with message "Payload Too Large" for documents exceeding 10MB
  • FR-018: System MUST return HTTP 504 with message "Gateway Timeout" when export operations exceed 30 seconds
  • FR-019: System MUST return HTTP 500 with message "Export failed - unable to retrieve document content" when export links are malformed or inaccessible
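
For FR-002, FR-003, and FR-008, the two Drive requests differ only in their query string. The sketch below builds both URLs with the Python standard library; the helper names are hypothetical, and the use of alt=media for downloading native file content follows the Drive v3 files.get method rather than anything stated in this spec.

```python
from urllib.parse import quote, urlencode

DRIVE_FILES_URL = "https://www.googleapis.com/drive/v3/files"
METADATA_FIELDS = "id,name,mimeType,exportLinks"


def metadata_url(document_id: str) -> str:
    """URL that returns only the four metadata fields this route needs."""
    query = urlencode({"fields": METADATA_FIELDS})
    return f"{DRIVE_FILES_URL}/{quote(document_id, safe='')}?{query}"


def download_url(document_id: str) -> str:
    """URL that returns the stored file bytes (used for native PDFs)."""
    return f"{DRIVE_FILES_URL}/{quote(document_id, safe='')}?alt=media"
```

Percent-encoding the document ID keeps a malformed :documentId path parameter from altering the upstream path or query.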

Key Entities

  • Document: A file stored in Google Drive, identified by a unique documentId

    • Attributes: id, name, mimeType, exportLinks (optional map of format to URL)
    • Can be either a Google Workspace document (Docs, Sheets, Slides) or a native file (PDF, images, etc.)
    • Google Workspace documents provide exportLinks for format conversion
    • Native files like PDFs do not have exportLinks and must be streamed directly
  • Export Request: A user's request to retrieve a document via the export route

    • Attributes: documentId (from URL parameter)
    • Triggers metadata fetch, format selection, and document retrieval
  • Export Format: The output format for a document

    • Supported formats: Markdown (text/x-markdown), HTML (text/html), PDF (application/pdf)
    • Prioritized by preference: Markdown > HTML > PDF
    • Determines the Content-Type header in the response
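
The Document entity could be modelled as a small immutable record. This is one possible shape, not a prescribed one; the class name, the snake_case attribute names, and the from_metadata constructor are illustrative choices.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Mapping, Optional


@dataclass(frozen=True)
class Document:
    """Subset of Drive file metadata used by the export route."""

    id: str
    name: str
    mime_type: Optional[str] = None
    # Maps an export MIME type to its URL; empty for native files.
    export_links: Dict[str, str] = field(default_factory=dict)

    @classmethod
    def from_metadata(cls, payload: Mapping[str, Any]) -> "Document":
        """Build a Document from the parsed JSON metadata response."""
        return cls(
            id=payload["id"],
            name=payload["name"],
            mime_type=payload.get("mimeType"),
            export_links=dict(payload.get("exportLinks") or {}),
        )
```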

Success Criteria (mandatory)

Measurable Outcomes

  • SC-001: Users can successfully export Google Workspace documents in under 5 seconds for documents under 10MB
  • SC-002: System correctly selects Markdown format when available in 100% of cases
  • SC-003: System falls back to HTML or PDF formats appropriately when Markdown is unavailable in 100% of cases
  • SC-004: Native PDF files are streamed successfully without conversion in 100% of attempts
  • SC-005: Unsupported file types return clear error messages (403 status) in 100% of cases
  • SC-006: Response Content-Type headers match the exported format in 100% of requests
  • SC-007: Response Content-Disposition headers include correct filenames with appropriate extensions in 100% of requests
  • SC-008: System handles at least 50 concurrent export requests without degradation
  • SC-009: Export success rate exceeds 99% for valid document IDs with proper permissions
  • SC-010: Error responses return appropriate HTTP status codes (401, 403, 404, 413, 500, 502, 504) in 100% of error scenarios
  • SC-011: Export operations exceeding 30 seconds timeout gracefully with HTTP 504 response
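
Because one export involves at least two upstream calls (metadata, then content), the 30-second limit in SC-011 is easiest to honour as a single shared budget. The Deadline class below is a hypothetical helper sketched for illustration; the injectable clock exists only to make it testable.

```python
import time
from typing import Callable


class Deadline:
    """Tracks how much of a fixed time budget is left."""

    def __init__(
        self, seconds: float, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self._clock = clock
        self._expires_at = clock() + seconds

    def remaining(self) -> float:
        """Seconds left in the budget, never negative."""
        return max(0.0, self._expires_at - self._clock())

    def expired(self) -> bool:
        """True once the budget is used up (the route would answer 504)."""
        return self._clock() >= self._expires_at
```

A route could create Deadline(30.0) on entry and pass remaining() as the timeout of each upstream request, so a slow metadata fetch shortens the time left for the export itself.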

Assumptions

  • Google Drive API credentials (OAuth2 or service account) are already configured and available to the proxy
  • The proxy service has network access to Google Drive API endpoints
  • Document permissions are managed by Google Drive - the proxy inherits the authenticated user's or service account's permissions
  • The route naming convention /documents/:documentId matches the sitemap.xml URL format and is consistent with existing proxy route patterns
  • Export format priority (Markdown > HTML > PDF) represents the most useful format hierarchy for downstream consumers
  • Standard HTTP response codes are used: 200 (success), 401 (unauthorized), 403 (unsupported type), 404 (not found), 413 (too large), 500 (server error), 502 (bad gateway), 504 (timeout)
  • Document size limit is 10MB, matching the 10MB cap on content exported through the Drive API's files.export method, with a 30-second timeout for export operations
  • Streaming is preferred for native PDF files to handle large files efficiently
  • Error messages are plain text responses for simplicity and consistency
  • Content-Disposition header is set to "inline" to allow browser rendering while preserving filename for downloads
  • File extensions for Content-Disposition filenames: .md (Markdown), .html (HTML), .pdf (PDF)
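
Building the Content-Disposition value needs some care because document names come from users. The sketch below is a minimal illustration with assumptions labelled: content_disposition is a hypothetical helper, the quoting rules are a simplification, and skipping the extension when the name already ends with it is a choice this spec does not state.

```python
# Extension per export format, as listed in the assumptions above.
EXTENSIONS = {
    "text/x-markdown": ".md",
    "text/html": ".html",
    "application/pdf": ".pdf",
}


def content_disposition(name: str, mime_type: str) -> str:
    """Return an inline Content-Disposition value for the given document."""
    extension = EXTENSIONS[mime_type]
    # Drop CR/LF so a document name cannot inject extra headers.
    safe = name.replace("\r", "").replace("\n", "")
    # Escape backslashes and quotes inside the quoted filename.
    safe = safe.replace("\\", "\\\\").replace('"', '\\"')
    # Assumption: avoid "scan.pdf.pdf" for native PDFs already named *.pdf.
    if not safe.lower().endswith(extension):
        safe += extension
    return f'inline; filename="{safe}"'
```

Names containing non-ASCII characters would additionally need the filename* parameter to display reliably in browsers; that is left out of this sketch.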

Scope

In Scope:

  • Single document export via document ID
  • Google Workspace document export (Docs, Sheets, Slides) via exportLinks
  • Native PDF file streaming (up to 10MB)
  • Format selection based on availability and priority
  • Proper Content-Type and Content-Disposition header management
  • Error handling for unsupported mimetypes, invalid document IDs, permission issues, size limits, and timeouts
  • HTTP status codes for all error scenarios (401, 403, 404, 413, 500, 502, 504)
  • 30-second timeout for export operations

Out of Scope:

  • Batch export of multiple documents
  • Custom format selection by user (always uses priority order)
  • Document conversion beyond what Google Drive provides via exportLinks
  • Caching of exported documents
  • Rate limiting or throttling
  • User authentication/authorization (assumes proxy handles this)
  • Document metadata editing or management
  • Progress tracking for large exports
  • Retry logic for failed Google Drive API calls
  • Logging and monitoring (assumes proxy infrastructure handles this)