google-drive-content-adapter/specs/001-gdrive-url-header/research.md
Peter.Morton 9286ee8927 feat: Add X-Verint-KAB-Original-URL header to document exports
Adds HTTP response header containing original Google Drive URL
for exported documents to enable content traceability and auditing.

- Adds X-Verint-KAB-Original-URL header to successful export responses
- Header format: https://drive.google.com/file/d/{fileId}
- Present for all export formats (PDF, DOCX, plain text)
- Header omitted on error responses (4xx/5xx)
- 18 new tests (9 contract + 9 integration)
- Zero new dependencies
- Performance: 0.000019ms overhead per request

Implements:
- FR-001: Header present on successful exports (200 OK)
- FR-002: Header absent on error responses
- FR-003: Standard header name X-Verint-KAB-Original-URL
- FR-004: Standard URL format with file ID
- FR-005: Uses validated document.id from Google Drive API
- FR-006: Header present regardless of file accessibility
- FR-007: Consistent across all export formats
- FR-008: Minimal performance impact (< 5ms requirement)

Testing:
- Contract tests validate header presence, format, and error handling
- Integration tests verify behavior across formats and permissions
- All 18 tests passing
- 100% requirements coverage

Documentation:
- Feature specification (specs/001-gdrive-url-header/spec.md)
- Implementation plan (plan.md)
- Technical research (research.md)
- Data model (data-model.md)
- API contract (contracts/response-headers.md)
- User guide (quickstart.md)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-27 16:04:54 -05:00

Research: Google Drive Original URL Header

Feature: 001-gdrive-url-header
Date: 2026-03-27
Status: Complete

Purpose

Research implementation approach for adding X-Verint-KAB-Original-URL HTTP response header containing the original Google Drive URL for exported documents.

Research Questions

  1. What is the correct Google Drive URL format for linking to files?
  2. How and where is the document/file ID available in the current codebase?
  3. What is the existing pattern for setting HTTP response headers?
  4. Where in the export response flow should the header be added?
  5. How should errors be handled when the file ID is unavailable?

1. Google Drive URL Format

Decision: Use https://drive.google.com/file/d/{fileId} format

Rationale:

  • This is the standard user-facing URL format for Google Drive files
  • Matches the format specified in spec.md (FR-004)
  • The alternative format https://drive.google.com/open?id={fileId} is also valid, but the /file/d/ format is more modern

Current Codebase Context:

  • The codebase currently uses Google Drive API URLs (e.g., https://www.googleapis.com/drive/v3/files)
  • These are API endpoints, not user-facing URLs
  • User-facing URLs are not currently constructed anywhere in the codebase

Implementation:

const driveUrl = `https://drive.google.com/file/d/${document.id}`;
res.setHeader("X-Verint-KAB-Original-URL", driveUrl);

Alternatives Considered:

  • https://drive.google.com/open?id={fileId} - Older format, less readable
  • https://docs.google.com/document/d/{fileId} - Document-specific, not suitable for all file types

2. Document ID Availability

Decision: Use document.id after metadata fetch (after proxy.js line 278)

Rationale:

  • The document ID flows through multiple stages in the request lifecycle
  • Using document.id (from Google Drive API response) ensures the ID is validated
  • More reliable than the documentId route parameter which could be malformed

ID Flow Through System:

  1. Route Parsing (googleDriveAdapterHelper.js:466-470):

    • URL pattern: /documents/{documentId}
    • Extracted as routeResult.documentId
  2. Request Handler (proxy.js:467):

    • Passed to handleDocumentExportRequest(res, routeResult.documentId, requestId)
  3. Export Handler (proxy.js:255):

    • Available as documentId parameter throughout function
    • Metadata fetched at lines 260-278
    • After line 278: document.id contains validated ID from Google Drive

Code Location:

// proxy.js:260-278
const metadataUrl = `https://www.googleapis.com/drive/v3/files/${documentId}`;
const metadataResponse = await axios.get(metadataUrl, {
  headers: { Authorization: `Bearer ${accessToken}` },
});
const document = metadataResponse.data;
// document.id is now available and validated

Alternatives Considered:

  • Using documentId parameter directly - Less reliable as it hasn't been validated by Google Drive
  • Extracting from API response URL - Unnecessary complexity

3. HTTP Response Header Pattern

Decision: Follow existing res.setHeader(name, value) pattern

Rationale:

  • Consistent with all existing header setting in the codebase
  • Standard Node.js HTTP response API
  • Custom headers already use X- prefix convention

Current Header Setting Patterns:

Export Success Path (proxy.js:374-386):

res.setHeader("Content-Type", contentType);
res.setHeader("X-Request-Id", requestId);
res.setHeader("Content-Disposition", contentDisposition);
if (contentLength) {
  res.setHeader("Content-Length", contentLength);
}

Sitemap Handler (proxy.js:224-226):

res.setHeader("Content-Type", "application/xml; charset=utf-8");
res.setHeader("X-Request-Id", requestId);
res.setHeader("X-Document-Count", documents.length.toString());

Error Paths (lines 302, 338, 362, 415):

res.setHeader("X-Request-Id", requestId);

Pattern Consistency:

  • All custom headers use X- prefix
  • Headers are set immediately before response streaming or res.end()
  • X-Request-Id is always present for traceability

Alternatives Considered:

  • Using helper function to set headers - Unnecessary for simple operation
  • Setting headers in helper module - Violates monolithic architecture

4. Export Response Flow

Decision: Add header after line 377 or 382 in handleDocumentExportRequest()

Rationale:

  • Single code location handles all successful export responses
  • All export formats (PDF, DOCX, text) flow through this path
  • Headers must be set before streaming starts (line 389)
  • document.id is guaranteed to be available at this point

Exact Code Location (proxy.js:374-389):

// Step 5: Set response headers
res.statusCode = 200;
res.setHeader("Content-Type", contentType);
res.setHeader("X-Request-Id", requestId);

// Generate Content-Disposition header
const sanitizedFilename = googleDriveAdapterHelper.sanitizeFilename(document.name);
const contentDisposition = `inline; filename="${sanitizedFilename}.${fileExtension}"`;
res.setHeader("Content-Disposition", contentDisposition);

// *** ADD NEW HEADER HERE (after line 377 or 382) ***
res.setHeader("X-Verint-KAB-Original-URL", `https://drive.google.com/file/d/${document.id}`);

if (contentLength) {
  res.setHeader("Content-Length", contentLength);
}

// Step 6: Stream the content
contentResponse.data.pipe(res);

Why This Location:

  • Success path only (200 OK responses)
  • After document.id is validated (line 278)
  • Before content streaming begins (line 389)
  • Alongside other response headers
  • All export formats use this code path

Alternatives Considered:

  • Setting header after metadata fetch (line 278) - Too early, may fail before export
  • Setting in helper function - Violates monolithic architecture
  • Setting in multiple locations - Error-prone, inconsistent
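
The ordering constraint described above (all headers set before streaming begins) can be sketched with a stub response object. This is a minimal illustration, not the code in proxy.js: `createStubResponse` and `writeExportHeaders` are hypothetical names, and the stub only mimics the Node.js behavior of rejecting header changes once the body has started. The ID and request ID are made-up values.

```javascript
// Hypothetical stub that records headers and refuses changes after the body starts,
// mimicking how Node.js rejects setHeader() once headers have been sent.
function createStubResponse() {
  const headers = new Map();
  let bodyStarted = false;
  return {
    statusCode: 0,
    setHeader(name, value) {
      if (bodyStarted) throw new Error("Cannot set headers after they are sent");
      headers.set(name.toLowerCase(), String(value));
    },
    getHeader(name) {
      return headers.get(name.toLowerCase());
    },
    write() {
      bodyStarted = true;
    },
  };
}

// Hypothetical condensed version of the success path: all headers first, then the body.
function writeExportHeaders(res, document, requestId, contentType) {
  res.statusCode = 200;
  res.setHeader("Content-Type", contentType);
  res.setHeader("X-Request-Id", requestId);
  res.setHeader("X-Verint-KAB-Original-URL", `https://drive.google.com/file/d/${document.id}`);
}

const res = createStubResponse();
writeExportHeaders(res, { id: "abc123" }, "req-1", "application/pdf");
res.write("first chunk"); // streaming has begun

// Any attempt to add the header after this point is rejected.
let lateHeaderRejected = false;
try {
  res.setHeader("X-Verint-KAB-Original-URL", "too late");
} catch (err) {
  lateHeaderRejected = true;
}
console.log(res.getHeader("X-Verint-KAB-Original-URL"), lateHeaderRejected);
```

The stub makes the failure mode visible: a header added after the first chunk is written never reaches the client, which is why the new header sits with the other `res.setHeader()` calls.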

5. Error Handling

Decision: Omit the header entirely on error responses (4xx/5xx)

Rationale:

  • Simpler implementation and clearer API contract
  • Clients can check for header presence to determine success
  • Avoids confusion between empty string and missing value
  • Aligns with existing pattern (custom headers only on success paths)

Error Scenarios:

| Scenario                            | Error Code | File ID Available?     | Current Headers   | Recommendation    |
|-------------------------------------|------------|------------------------|-------------------|-------------------|
| Invalid document ID format          | 404        | No (route param)       | X-Request-Id only | Omit URL header   |
| Document not found (404 from Drive) | 404        | No (validation failed) | X-Request-Id only | Omit URL header   |
| Unsupported mimetype                | 403        | Yes (after metadata)   | X-Request-Id only | Omit URL header   |
| Size limit exceeded                 | 413        | Yes (after metadata)   | X-Request-Id only | Omit URL header   |
| Stream error                        | 500        | Yes (during transfer)  | Already sent      | Cannot add header |
| General API error                   | 500        | No                     | X-Request-Id only | Omit URL header   |

FR-006 Interpretation:

  • Spec states: "empty or null value when document ID cannot be determined"
  • HTTP headers cannot have null values (only strings)
  • Interpretation: Omit header entirely on error paths (cleaner than empty string)

Implementation:

  • Success path (200 OK): Include header with valid URL
  • Error paths (4xx, 5xx): Do not include header
  • No changes needed to existing error handlers

Alternatives Considered:

  • Setting empty string "" on errors - Ambiguous, adds no value
  • Setting placeholder URL - Misleading, could cause client errors
  • Setting header with error indicator - Violates HTTP semantics
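
Because the header is omitted on error paths, a client can treat its absence as "no original URL available". The following is a minimal sketch of such a check; `getOriginalUrl` is a hypothetical client-side helper, and the header objects are made-up stand-ins for a 200 response and a 404 response. It relies on Node.js lower-casing incoming header names.

```javascript
// Hypothetical client-side helper: returns the original Drive URL, or null when the
// header is absent (the expected case on 4xx/5xx responses).
// Node.js lower-cases incoming header names, so the lookup key is lower-case.
function getOriginalUrl(headers) {
  const value = headers["x-verint-kab-original-url"];
  return typeof value === "string" && value.length > 0 ? value : null;
}

// Made-up header sets standing in for a successful export and an error response.
const successHeaders = {
  "x-request-id": "req-1",
  "x-verint-kab-original-url": "https://drive.google.com/file/d/abc123",
};
const errorHeaders = { "x-request-id": "req-2" };

console.log(getOriginalUrl(successHeaders)); // https://drive.google.com/file/d/abc123
console.log(getOriginalUrl(errorHeaders)); // null
```

Treating an empty string the same as a missing header keeps the client robust even if a future server version chose the empty-value alternative.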

Technology Stack Decisions

No New Dependencies Required

Decision: Use standard JavaScript string operations

Rationale:

  • URL construction is simple: https://drive.google.com/file/d/${document.id}
  • No URL encoding needed (file IDs contain only URL-safe characters: letters, digits, hyphens, and underscores)
  • No validation library needed (Google Drive API validates IDs)
  • Aligns with constitution's preference for Node.js built-ins
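
The "no encoding needed" claim can be sanity-checked with built-ins only. The sketch below assumes Drive file IDs consist of letters, digits, hyphens, and underscores; that character set and the names `URL_SAFE_ID` and `needsEncoding` are illustrative assumptions, not an official ID format.

```javascript
// Assumption: Drive file IDs consist only of URL-safe characters
// (letters, digits, "-" and "_"). This pattern is illustrative, not an official format.
const URL_SAFE_ID = /^[A-Za-z0-9_-]+$/;

function needsEncoding(fileId) {
  // encodeURIComponent leaves URL-safe IDs unchanged, so any difference means
  // the ID contains a character that would have to be escaped.
  return encodeURIComponent(fileId) !== fileId;
}

// Made-up example IDs
console.log(URL_SAFE_ID.test("1AbC_dEf-123"), needsEncoding("1AbC_dEf-123")); // true false
console.log(URL_SAFE_ID.test("bad id/../x"), needsEncoding("bad id/../x")); // false true
```

Since document.id comes from the Drive API response rather than from user input, the second case should not occur on the success path; the check only documents the assumption.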

Dependencies Analysis:

  • No new npm packages required
  • Uses existing res.setHeader() Node.js API
  • Simple string interpolation (ES6 template literals)

Best Practices

URL Construction

  • Use document.id (validated by Google Drive) not documentId (route parameter)
  • Use template literal for clarity: `https://drive.google.com/file/d/${document.id}`
  • No need for helper function (one-line operation)

Header Naming

  • Use X-Verint-KAB-Original-URL exactly as specified in FR-003
  • Note: X- prefix is deprecated in RFC 6648 but required by client standards (per spec assumptions)

Testing Strategy

  • Contract tests: Verify header presence and format in successful exports
  • Integration tests: Verify header contains correct file ID for real Drive documents
  • Unit tests: Not needed (too simple to warrant isolated testing)
  • Coverage: Test all export formats (PDF, DOCX, plain text)
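
A format check of the kind the contract tests need might look like the sketch below. `extractFileId` is a hypothetical helper, not part of the existing test suite; it uses only the built-in `URL` class and returns the file ID embedded in a header value, or null when the value does not follow the https://drive.google.com/file/d/{fileId} format.

```javascript
// Hypothetical contract-test helper: returns the file ID embedded in the header value,
// or null if the value is not a well-formed https://drive.google.com/file/d/{fileId} URL.
function extractFileId(headerValue) {
  let url;
  try {
    url = new URL(headerValue);
  } catch (err) {
    return null; // not a parseable absolute URL
  }
  if (url.protocol !== "https:" || url.hostname !== "drive.google.com") return null;
  const match = /^\/file\/d\/([^/]+)$/.exec(url.pathname);
  return match ? match[1] : null;
}

// Made-up example values
console.log(extractFileId("https://drive.google.com/file/d/abc123")); // abc123
console.log(extractFileId("https://www.googleapis.com/drive/v3/files/abc123")); // null
console.log(extractFileId("")); // null
```

A test can then compare the extracted ID with the ID of the document it requested, which covers both header format and header content in one assertion.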

Performance

  • String concatenation overhead: < 1ms
  • Memory impact: ~100 bytes per response
  • Well within SC-005 requirement (< 5ms overhead)
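
The overhead estimate can be checked with a rough micro-benchmark. This is a sketch only: `measureUrlBuildMs` is a hypothetical function, the IDs are made up, and it assumes a Node.js version that exposes `performance` as a global. Results vary by machine, and JIT optimization can make such a tight loop look faster than the same work inside a real request.

```javascript
// Rough micro-benchmark sketch; results vary by machine and Node.js version.
function measureUrlBuildMs(iterations) {
  let totalLength = 0; // accumulate a value so the loop body is not trivially discarded
  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    const url = `https://drive.google.com/file/d/file-${i}`; // made-up IDs
    totalLength += url.length;
  }
  const elapsedMs = performance.now() - start;
  return { avgMs: elapsedMs / iterations, totalLength };
}

const { avgMs } = measureUrlBuildMs(100000);
console.log(`average per URL: ${avgMs.toFixed(6)} ms`);
```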

Implementation Checklist

  • Add header at lines 377-383 in handleDocumentExportRequest()
  • Use document.id for URL construction
  • Use format: https://drive.google.com/file/d/${document.id}
  • Omit header on error responses (no changes to error handlers)
  • Write contract tests for header presence and format
  • Write integration tests with real Drive API responses
  • Test all export formats (PDF, DOCX, plain text)
  • Verify performance impact < 5ms
  • Update API documentation (contracts/response-headers.md)
