google-drive-content-adapter/specs/001-gdrive-url-header/research.md
Peter.Morton 9286ee8927 feat: Add X-Verint-KAB-Original-URL header to document exports
Adds HTTP response header containing original Google Drive URL
for exported documents to enable content traceability and auditing.

- Adds X-Verint-KAB-Original-URL header to successful export responses
- Header format: https://drive.google.com/file/d/{fileId}
- Present for all export formats (PDF, DOCX, plain text)
- Header omitted on error responses (4xx/5xx)
- 18 new tests (9 contract + 9 integration)
- Zero new dependencies
- Performance: 0.000019ms overhead per request

Implements:
- FR-001: Header present on successful exports (200 OK)
- FR-002: Header absent on error responses
- FR-003: Standard header name X-Verint-KAB-Original-URL
- FR-004: Standard URL format with file ID
- FR-005: Uses validated document.id from Google Drive API
- FR-006: Header present regardless of file accessibility
- FR-007: Consistent across all export formats
- FR-008: Minimal performance impact (< 5ms requirement)

Testing:
- Contract tests validate header presence, format, and error handling
- Integration tests verify behavior across formats and permissions
- All 18 tests passing
- 100% requirements coverage

Documentation:
- Feature specification (specs/001-gdrive-url-header/spec.md)
- Implementation plan (plan.md)
- Technical research (research.md)
- Data model (data-model.md)
- API contract (contracts/response-headers.md)
- User guide (quickstart.md)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-27 16:04:54 -05:00


# Research: Google Drive Original URL Header
**Feature**: 001-gdrive-url-header
**Date**: 2026-03-27
**Status**: Complete
## Purpose
Research the implementation approach for adding the `X-Verint-KAB-Original-URL` HTTP response header, which carries the original Google Drive URL of an exported document.
## Research Questions
1. What is the correct Google Drive URL format for linking to files?
2. How and where is the document/file ID available in the current codebase?
3. What is the existing pattern for setting HTTP response headers?
4. Where in the export response flow should the header be added?
5. How should errors be handled when the file ID is unavailable?
---
## 1. Google Drive URL Format
### Decision: Use `https://drive.google.com/file/d/{fileId}` format
**Rationale:**
- This is the standard user-facing URL format for Google Drive files
- Matches the URL format specified in spec.md (FR-004)
- The alternative format `https://drive.google.com/open?id={fileId}` is also valid, but `/file/d/` is the form Drive's own share links use
**Current Codebase Context:**
- The codebase currently uses Google Drive API URLs (e.g., `https://www.googleapis.com/drive/v3/files`)
- These are API endpoints, not user-facing URLs
- User-facing URLs are not currently constructed anywhere in the codebase
**Implementation:**
```javascript
const driveUrl = `https://drive.google.com/file/d/${document.id}`;
res.setHeader("X-Verint-KAB-Original-URL", driveUrl);
```
**Alternatives Considered:**
- `https://drive.google.com/open?id={fileId}` - Older format, less readable
- `https://docs.google.com/document/d/{fileId}` - Document-specific, not suitable for all file types
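
As an illustration of the chosen format, the sketch below builds the header value and shows how a consuming client could recover the file ID from it. Both function names and the sample ID are hypothetical and exist only for this example; the research itself recommends inlining the template literal rather than adding a helper.

```javascript
// Hypothetical helpers for illustration only; neither exists in the codebase.

// Builds the user-facing Drive URL in the format chosen above.
function buildOriginalUrl(fileId) {
  return `https://drive.google.com/file/d/${fileId}`;
}

// Recovers the file ID from a header value, or returns null when the value
// does not follow the https://drive.google.com/file/d/{fileId} shape.
// Note: new URL() throws on strings that are not URLs at all.
function fileIdFromOriginalUrl(headerValue) {
  const { protocol, hostname, pathname } = new URL(headerValue);
  const match = pathname.match(/^\/file\/d\/([^/]+)$/);
  const hostOk = protocol === "https:" && hostname === "drive.google.com";
  return hostOk && match ? match[1] : null;
}

const exampleUrl = buildOriginalUrl("1AbC_dEf-123"); // made-up ID
console.log(exampleUrl);
console.log(fileIdFromOriginalUrl(exampleUrl));
```

A document-specific URL such as `https://docs.google.com/document/d/{fileId}` would yield `null` here, which mirrors the reason that format was rejected.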
---
## 2. Document ID Availability
### Decision: Use `document.id` after metadata fetch (after proxy.js line 278)
**Rationale:**
- The document ID flows through multiple stages in the request lifecycle
- Using `document.id` (from Google Drive API response) ensures the ID is validated
- More reliable than the `documentId` route parameter which could be malformed
**ID Flow Through System:**
1. **Route Parsing** (`googleDriveAdapterHelper.js:466-470`):
   - URL pattern: `/documents/{documentId}`
   - Extracted as `routeResult.documentId`
2. **Request Handler** (`proxy.js:467`):
   - Passed to `handleDocumentExportRequest(res, routeResult.documentId, requestId)`
3. **Export Handler** (`proxy.js:255`):
   - Available as `documentId` parameter throughout function
   - Metadata fetched at line 260-278
   - After line 278: `document.id` contains validated ID from Google Drive
**Code Location:**
```javascript
// proxy.js:260-278
const metadataUrl = `https://www.googleapis.com/drive/v3/files/${documentId}`;
const metadataResponse = await axios.get(metadataUrl, {
  headers: { Authorization: `Bearer ${accessToken}` },
});
const document = metadataResponse.data;
// document.id is now available and validated
```
**Alternatives Considered:**
- Using `documentId` parameter directly - Less reliable as it hasn't been validated by Google Drive
- Extracting from API response URL - Unnecessary complexity
---
## 3. HTTP Response Header Pattern
### Decision: Follow existing `res.setHeader(name, value)` pattern
**Rationale:**
- Consistent with all existing header setting in the codebase
- Standard Node.js HTTP response API
- Custom headers already use `X-` prefix convention
**Current Header Setting Patterns:**
**Export Success Path** (`proxy.js:374-386`):
```javascript
res.setHeader("Content-Type", contentType);
res.setHeader("X-Request-Id", requestId);
res.setHeader("Content-Disposition", contentDisposition);
if (contentLength) {
  res.setHeader("Content-Length", contentLength);
}
```
**Sitemap Handler** (`proxy.js:224-226`):
```javascript
res.setHeader("Content-Type", "application/xml; charset=utf-8");
res.setHeader("X-Request-Id", requestId);
res.setHeader("X-Document-Count", documents.length.toString());
```
**Error Paths** (lines 302, 338, 362, 415):
```javascript
res.setHeader("X-Request-Id", requestId);
```
**Pattern Consistency:**
- All custom headers use `X-` prefix
- Headers are set immediately before response streaming or `res.end()`
- `X-Request-Id` is always present for traceability
**Alternatives Considered:**
- Using helper function to set headers - Unnecessary for simple operation
- Setting headers in helper module - Violates monolithic architecture
---
## 4. Export Response Flow
### Decision: Add header at line 377 or 383 in `handleDocumentExportRequest()`
**Rationale:**
- Single code location handles all successful export responses
- All export formats (PDF, DOCX, text) flow through this path
- Headers must be set before streaming starts (line 389)
- `document.id` is guaranteed to be available at this point
**Exact Code Location** (`proxy.js:374-389`):
```javascript
// Step 5: Set response headers
res.statusCode = 200;
res.setHeader("Content-Type", contentType);
res.setHeader("X-Request-Id", requestId);
// Generate Content-Disposition header
const sanitizedFilename = googleDriveAdapterHelper.sanitizeFilename(document.name);
const contentDisposition = `inline; filename="${sanitizedFilename}.${fileExtension}"`;
res.setHeader("Content-Disposition", contentDisposition);
// *** ADD NEW HEADER HERE (after line 377 or 382) ***
res.setHeader("X-Verint-KAB-Original-URL", `https://drive.google.com/file/d/${document.id}`);
if (contentLength) {
  res.setHeader("Content-Length", contentLength);
}
// Step 6: Stream the content
contentResponse.data.pipe(res);
```
**Why This Location:**
- ✅ Success path only (200 OK responses)
- ✅ After `document.id` is validated (line 278)
- ✅ Before content streaming begins (line 389)
- ✅ Alongside other response headers
- ✅ All export formats use this code path
**Alternatives Considered:**
- Setting header after metadata fetch (line 278) - Too early; later checks (unsupported mimetype, size limit) can still fail, and the header would then remain on the error response
- Setting in helper function - Violates monolithic architecture
- Setting in multiple locations - Error-prone, inconsistent
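
The ordering constraint (headers before streaming) can be reproduced in isolation. The sketch below is a standalone demonstration, not code from `proxy.js`: it constructs a bare `http.ServerResponse` around a stand-in request object with no socket attached, which is an assumption made purely so the example runs without a server. `writeHead()` stands in for the implicit header flush that the first piped chunk triggers.

```javascript
const http = require("node:http");

// Stand-in request object; just enough for ServerResponse's constructor.
// This is a demonstration shortcut, not how proxy.js obtains `res`.
const fakeReq = { method: "GET", httpVersionMajor: 1, httpVersionMinor: 1, headers: {} };
const res = new http.ServerResponse(fakeReq);

// Setting the header before anything is written succeeds.
res.setHeader(
  "X-Verint-KAB-Original-URL",
  "https://drive.google.com/file/d/exampleFileId" // placeholder ID
);
const sentBefore = res.headersSent;

// Finalizes the header block, as the first written chunk would do implicitly.
res.writeHead(200);

// Any setHeader() call after this point throws.
let lateError = null;
try {
  res.setHeader("X-Late-Header", "too late");
} catch (err) {
  lateError = err.code;
}

console.log({ sentBefore, sentAfter: res.headersSent, lateError });
```

The late `setHeader()` call fails with `ERR_HTTP_HEADERS_SENT`, which is why the new header has to sit with the other `res.setHeader()` calls ahead of `contentResponse.data.pipe(res)`.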
---
## 5. Error Handling
### Decision: Omit header on error responses (recommended)
**Rationale:**
- Simpler implementation and clearer API contract
- Clients can check for header presence to determine success
- Avoids confusion between empty string and missing value
- Aligns with the existing pattern (apart from `X-Request-Id`, custom headers are set only on success paths)
**Error Scenarios:**
| Scenario | Error Code | File ID Available? | Current Headers | Recommendation |
|----------|-----------|-------------------|-----------------|----------------|
| Invalid document ID format | 404 | No (route param) | X-Request-Id only | Omit URL header |
| Document not found (404 from Drive) | 404 | No (validation failed) | X-Request-Id only | Omit URL header |
| Unsupported mimetype | 403 | Yes (after metadata) | X-Request-Id only | Omit URL header |
| Size limit exceeded | 413 | Yes (after metadata) | X-Request-Id only | Omit URL header |
| Stream error | 500 | Yes (during transfer) | Already sent | Cannot add header |
| General API error | 500 | No | X-Request-Id only | Omit URL header |
**FR-006 Interpretation:**
- Spec states: "empty or null value when document ID cannot be determined"
- HTTP headers cannot have null values (only strings)
- **Interpretation:** Omit header entirely on error paths (cleaner than empty string)
**Implementation:**
- **Success path (200 OK):** Include header with valid URL
- **Error paths (4xx, 5xx):** Do not include header
- No changes needed to existing error handlers
**Alternatives Considered:**
- Setting empty string `""` on errors - Ambiguous, adds no value
- Setting placeholder URL - Misleading, could cause client errors
- Setting header with error indicator - Violates HTTP semantics
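
The omit-on-error decision can be sketched with a condensed, hypothetical handler. `createFakeResponse` and `respond` are illustrative names, not functions from `proxy.js`, and the fake response implements only the methods this sketch needs. A `null` document stands in for any failure before the success path is reached.

```javascript
// Minimal stand-in for a Node response object (illustration only).
function createFakeResponse() {
  const headers = new Map();
  return {
    statusCode: 200,
    setHeader(name, value) { headers.set(name.toLowerCase(), String(value)); },
    getHeader(name) { return headers.get(name.toLowerCase()); },
    hasHeader(name) { return headers.has(name.toLowerCase()); },
  };
}

// Hypothetical condensed handler: `document` is null when metadata lookup failed.
function respond(res, requestId, document) {
  res.setHeader("X-Request-Id", requestId); // present on every response
  if (!document) {
    res.statusCode = 404;
    return res; // error path: the original URL header is deliberately not set
  }
  res.statusCode = 200;
  res.setHeader(
    "X-Verint-KAB-Original-URL",
    `https://drive.google.com/file/d/${document.id}`
  );
  return res;
}

const okRes = respond(createFakeResponse(), "req-1", { id: "exampleFileId" });
const errRes = respond(createFakeResponse(), "req-2", null);
console.log(okRes.getHeader("X-Verint-KAB-Original-URL"));
console.log(errRes.hasHeader("X-Verint-KAB-Original-URL"));
```

Because the header is only ever set inside the success branch, the existing error handlers need no changes, which matches the implementation note above.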
---
## Technology Stack Decisions
### No New Dependencies Required
**Decision: Use standard JavaScript string operations**
**Rationale:**
- URL construction is simple: `https://drive.google.com/file/d/${document.id}`
- No URL encoding needed (Drive file IDs consist of URL-safe characters: letters, digits, hyphens, and underscores)
- No validation library needed (Google Drive API validates IDs)
- Aligns with constitution's preference for Node.js built-ins
**Dependencies Analysis:**
- ✅ No new npm packages required
- ✅ Uses existing `res.setHeader()` Node.js API
- ✅ Simple string interpolation (ES6 template literals)
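
A quick way to check the "no encoding needed" claim is to confirm that `encodeURIComponent` leaves an ID of this shape untouched. The sample ID is made up, and the character set (letters, digits, `-`, `_`) is an assumption about what Drive returns rather than a documented guarantee.

```javascript
// Made-up ID using the character set Drive file IDs are assumed to use.
const sampleId = "1AbC_dEf-123";

// encodeURIComponent does not escape letters, digits, "-" or "_",
// so encoding such an ID is a no-op.
const encodedId = encodeURIComponent(sampleId);
const unchanged = encodedId === sampleId;

// For contrast, a value containing a slash and a space is rewritten.
const encodedUnsafe = encodeURIComponent("a/b c");

console.log({ unchanged, encodedUnsafe });
```

If an ID outside this character set ever appeared, wrapping `document.id` in `encodeURIComponent` would be a one-token change using a built-in, so the zero-dependency conclusion would still hold.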
---
## Best Practices
### URL Construction
- Use `document.id` (validated by Google Drive) not `documentId` (route parameter)
- Use template literal for clarity: `` `https://drive.google.com/file/d/${document.id}` ``
- No need for helper function (one-line operation)
### Header Naming
- Use `X-Verint-KAB-Original-URL` exactly as specified in FR-003
- Note: `X-` prefix is deprecated in RFC 6648 but required by client standards (per spec assumptions)
### Testing Strategy
- Contract tests: Verify header presence and format in successful exports
- Integration tests: Verify header contains correct file ID for real Drive documents
- Unit tests: Not needed (too simple to warrant isolated testing)
- Coverage: Test all export formats (PDF, DOCX, plain text)
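
A contract check along these lines could back the tests described above. This is a sketch, not the project's test code: `findHeaderViolations` is a hypothetical name, the ID character class is an assumption, and `headers` is assumed to be a plain object with lower-cased keys (the shape Node's HTTP client exposes).

```javascript
const HEADER_NAME = "x-verint-kab-original-url";
// Assumed ID alphabet: letters, digits, hyphen, underscore.
const ORIGINAL_URL_PATTERN = /^https:\/\/drive\.google\.com\/file\/d\/([A-Za-z0-9_-]+)$/;

// Returns a list of human-readable contract violations (empty when compliant).
function findHeaderViolations(statusCode, headers, expectedFileId) {
  const value = headers[HEADER_NAME];
  const violations = [];
  if (statusCode === 200) {
    const match = typeof value === "string" ? value.match(ORIGINAL_URL_PATTERN) : null;
    if (!match) {
      violations.push("missing or malformed original URL header on 200 response");
    } else if (match[1] !== expectedFileId) {
      violations.push("header carries a different file ID than expected");
    }
  } else if (statusCode >= 400 && value !== undefined) {
    violations.push("original URL header must be omitted on error responses");
  }
  return violations;
}

const goodHeaders = { [HEADER_NAME]: "https://drive.google.com/file/d/exampleFileId" };
console.log(findHeaderViolations(200, goodHeaders, "exampleFileId"));
console.log(findHeaderViolations(404, goodHeaders, "exampleFileId"));
```

Running the same check against PDF, DOCX, and plain-text export responses would cover the format-consistency requirement without format-specific assertions.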
### Performance
- String concatenation overhead: < 1ms
- Memory impact: ~100 bytes per response
- Well within SC-005 requirement (< 5ms overhead)
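
The overhead estimate can be sanity-checked with a rough micro-benchmark such as the one below. It is a sketch under stated assumptions: `measureHeaderOverheadMs` is a hypothetical name, it times only the string construction (not `res.setHeader()` itself), and the numbers it prints vary by machine and Node version.

```javascript
const { performance } = require("node:perf_hooks");

// Times repeated URL construction and returns the mean cost per call in ms.
function measureHeaderOverheadMs(iterations) {
  const fileId = "exampleFileId"; // placeholder ID
  let sink = 0; // accumulates lengths so the loop body is not optimized away
  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    const url = `https://drive.google.com/file/d/${fileId}${i}`;
    sink += url.length;
  }
  const elapsedMs = performance.now() - start;
  return { perCallMs: elapsedMs / iterations, sink };
}

const benchResult = measureHeaderOverheadMs(100000);
console.log(`mean per call: ${benchResult.perCallMs} ms`);
console.log(`within 5 ms budget: ${benchResult.perCallMs < 5}`);
```

Micro-benchmarks like this are noisy, so the result is best read as an order-of-magnitude check against the 5 ms budget rather than a precise figure.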
---
## Implementation Checklist
- [ ] Add header between lines 377 and 383 in `handleDocumentExportRequest()`, before streaming starts
- [ ] Use `document.id` for URL construction
- [ ] Use format: `https://drive.google.com/file/d/${document.id}`
- [ ] Omit header on error responses (no changes to error handlers)
- [ ] Write contract tests for header presence and format
- [ ] Write integration tests with real Drive API responses
- [ ] Test all export formats (PDF, DOCX, plain text)
- [ ] Verify performance impact < 5ms
- [ ] Update API documentation (contracts/response-headers.md)
---
## References
- **Spec**: `/specs/001-gdrive-url-header/spec.md`
- **Constitution**: `.specify/memory/constitution.md`
- **Code**: `src/proxyScripts/proxy.js` (lines 255-425)
- **Helpers**: `src/globalVariables/googleDriveAdapterHelper.js`
- **Google Drive URLs**: https://developers.google.com/drive/api/guides/manage-sharing