# Tasks: Google Drive HTTP Proxy Adapter
**Input**: Design documents from `/specs/001-drive-proxy-adapter/`
**Prerequisites**: plan.md, spec.md, research.md, data-model.md, contracts/, quickstart.md
**Feature**: Generate XML sitemaps from Google Drive documents via HTTP endpoint
**Key Clarifications Incorporated** (10 total):
1. Service Account JWT auth with inline JSON env var
2. RESTful URL format `/documents/{documentId}`
3. No retries on 503 errors
4. stdout/stderr logging only
5. 413 error for >50k documents
6. Crash with exit code 1 for fatal errors
7. FIFO queue for concurrent requests
8. Plain text logging format `[timestamp] [level] message`
9. Configurable Drive API filter in config/settings.js
10. Status code only errors (no response body)
**Tests**: ✅ Test-First Development enforced per Constitution Principle III
**Organization**: Tasks are grouped by user story (only US1 exists for this feature - single endpoint system)
---
## Format: `- [ ] [ID] [P?] [Story?] Description`
- **[P]**: Can run in parallel (different files, no dependencies)
- **[Story]**: User story label (US1, US2, etc.) - only for user story phases
- Include exact file paths in descriptions
---
## Phase 1: Setup (Shared Infrastructure)
**Purpose**: Project initialization and basic structure
- [ ] T001 Initialize Node.js project with package.json at repository root
- [ ] T002 Install googleapis dependency v140.0.0 in package.json
- [ ] T003 [P] Create src/ directory for application source code
- [ ] T004 [P] Create config/ directory for configuration files
- [ ] T005 [P] Create tests/unit/ directory for unit tests
- [ ] T006 [P] Create tests/integration/ directory for integration tests
- [ ] T007 [P] Create tests/contract/ directory for contract tests
- [ ] T008 Configure Node.js native test runner in package.json with test scripts
- [ ] T009 [P] Setup ESLint configuration in .eslintrc.json for ES2022+ JavaScript
- [ ] T010 [P] Create .env.example file documenting required environment variables
---
## Phase 2: Foundational (Blocking Prerequisites)
**Purpose**: Core infrastructure that MUST be complete before user story implementation
**⚠️ CRITICAL**: User Story 1 cannot begin until this phase is complete
- [ ] T011 Create console.js module in src/ with formatMessage function and log/info/debug/error methods (plain text format: `[timestamp] [level] message`)
- [ ] T012 Create config/config.js exporting server configuration (port, baseUrl from env vars)
- [ ] T013 Create config/settings.js exporting Drive API configuration (query filter from env var DRIVE_QUERY or default "trashed = false", fields, pageSize, scope)
- [ ] T014 Create auth.js module in src/ for Service Account JWT authentication using googleapis GoogleAuth class
- [ ] T015 Add credential validation function in src/auth.js to check client_email, private_key, project_id structure
- [ ] T016 Implement fatal error handler in src/auth.js that logs to stderr and exits with code 1 if credentials invalid
- [ ] T017 Create xml-utils.js module in src/ with XML escaping utilities for special characters (&, <, >, ", ')
- [ ] T018 Implement FIFO request queue class in src/queue.js using Node.js EventEmitter with processing flag and queue array
- [ ] T019 Create server.js entry point in src/ that sets up HTTP server with http module
**Checkpoint**: Foundation ready - User Story 1 implementation can now begin
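
The FIFO queue from T018 could be sketched as follows. This is a minimal illustration, not the project's actual `src/queue.js`: the class name `RequestQueue`, the private `#drain` helper, and the `idle` event are assumptions, while the `process()` method name, the `processing` flag, and the `queue` array follow the task descriptions.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical sketch of src/queue.js: runs async jobs one at a time, in arrival order.
class RequestQueue extends EventEmitter {
  constructor() {
    super();
    this.queue = [];         // pending jobs, oldest first
    this.processing = false; // true while a drain loop is running
  }

  // Enqueue an async job; the returned promise settles with that job's own outcome.
  process(job) {
    return new Promise((resolve, reject) => {
      this.queue.push({ job, resolve, reject });
      this.#drain();
    });
  }

  async #drain() {
    if (this.processing) return; // a drain loop is already active
    this.processing = true;
    while (this.queue.length > 0) {
      const { job, resolve, reject } = this.queue.shift();
      try {
        resolve(await job());
      } catch (err) {
        reject(err); // a failed job does not block the jobs behind it
      }
    }
    this.processing = false;
    this.emit('idle'); // assumed event name, useful for tests and shutdown
  }
}
```

A route handler would then wrap each `/sitemap.xml` request in `queue.process(() => buildSitemap())`, where `buildSitemap` is a placeholder for the generation flow in T054.
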
---
## Phase 3: User Story 1 - Generate Sitemap of Available Documents (Priority: P1) 🎯 MVP
**Goal**: Users can request `/sitemap.xml` and receive a valid XML sitemap listing all accessible Google Drive documents with RESTful links containing document IDs
**Independent Test**: Make GET request to `/sitemap.xml` and verify: (1) 200 status with valid XML sitemap format, (2) URLs use RESTful format `/documents/{documentId}`, (3) reflects documents in Google Drive, (4) handles >50k documents with 413, (5) queues concurrent requests in FIFO order
**Why this is the complete feature**: This feature has only one user story. The system provides a single endpoint for sitemap generation.
---
### Tests for User Story 1 (Test-First Development) ⚠️
> **CONSTITUTION REQUIREMENT**: Write these tests FIRST, ensure they FAIL, obtain user approval before implementation
#### Contract Tests
- [ ] T020 [P] [US1] Contract test for /sitemap.xml success response (200 OK) in tests/contract/sitemap-schema.test.js - verify XML structure, namespace, Content-Type header
- [ ] T021 [P] [US1] Contract test for /sitemap.xml with empty Drive (0 documents) in tests/contract/sitemap-schema.test.js - verify empty urlset is valid
- [ ] T022 [P] [US1] Contract test for XML special character escaping in tests/contract/sitemap-schema.test.js - verify &, <, >, ", ' are properly escaped in URLs
- [ ] T023 [P] [US1] Contract test for lastmod date format validation in tests/contract/sitemap-schema.test.js - verify ISO 8601 format YYYY-MM-DD
#### Integration Tests
- [ ] T024 [P] [US1] Integration test for /sitemap.xml endpoint success scenario in tests/integration/sitemap-endpoint.test.js - mock Drive API, verify 200 response with valid XML
- [ ] T025 [P] [US1] Integration test for /sitemap.xml with >50k documents in tests/integration/error-scenarios.test.js - verify 413 response with no body
- [ ] T026 [P] [US1] Integration test for /sitemap.xml with Drive API rate limiting in tests/integration/error-scenarios.test.js - verify 429 response with Retry-After header and no body
- [ ] T027 [P] [US1] Integration test for /sitemap.xml with Drive API 503 error in tests/integration/error-scenarios.test.js - verify 503 passthrough with no retry and no body
- [ ] T028 [P] [US1] Integration test for invalid endpoint requests in tests/integration/error-scenarios.test.js - verify 404 response with no body for non-/sitemap.xml paths
- [ ] T029 [P] [US1] Integration test for concurrent requests to /sitemap.xml in tests/integration/queue-concurrency.test.js - verify FIFO processing (one at a time)
- [ ] T030 [P] [US1] Integration test for Service Account token refresh in tests/integration/sitemap-endpoint.test.js - mock token expiry, verify 401 if refresh fails
#### Unit Tests
- [ ] T031 [P] [US1] Unit test for Drive API client query execution in tests/unit/drive-client.test.js - mock googleapis drive.files.list() call
- [ ] T032 [P] [US1] Unit test for Drive API pagination handling in tests/unit/drive-client.test.js - verify pageToken logic for >1000 documents
- [ ] T033 [P] [US1] Unit test for Service Account JWT authentication in tests/unit/auth.test.js - verify GoogleAuth client creation from env var JSON
- [ ] T034 [P] [US1] Unit test for credential validation in tests/unit/auth.test.js - verify detection of invalid client_email, private_key, project_id
- [ ] T035 [P] [US1] Unit test for sitemap XML generation in tests/unit/sitemap-generator.test.js - verify XML structure and URL format /documents/{documentId}
- [ ] T036 [P] [US1] Unit test for Document to SitemapEntry transformation in tests/unit/sitemap-generator.test.js - verify baseUrl + /documents/ + documentId concatenation
- [ ] T037 [P] [US1] Unit test for lastmod date formatting in tests/unit/sitemap-generator.test.js - verify ISO 8601 YYYY-MM-DD format from modifiedTime
- [ ] T038 [P] [US1] Unit test for FIFO queue enqueue/dequeue in tests/unit/queue.test.js - verify sequential processing order
- [ ] T039 [P] [US1] Unit test for FIFO queue concurrent request handling in tests/unit/queue.test.js - verify processing flag prevents simultaneous execution
- [ ] T040 [P] [US1] Unit test for XML special character escaping in tests/unit/sitemap-generator.test.js - verify escapeXml function handles &, <, >, ", '
**TEST APPROVAL CHECKPOINT**: Present test scenarios to user for approval before proceeding to implementation
---
### Implementation for User Story 1
#### Drive API Integration
- [X] T041 [P] [US1] Create drive-client.js module in src/ with function to initialize googleapis drive client using auth from src/auth.js
- [X] T042 [US1] Implement queryDocuments function in src/drive-client.js to call drive.files.list() with query from config/settings.js and fields: files(id, name, mimeType, modifiedTime)
- [X] T043 [US1] Implement pagination logic in src/drive-client.js to handle pageToken and collect all results up to 50,000 limit
- [X] T044 [US1] Add document count validation in src/drive-client.js to return error if count exceeds 50,000
- [X] T045 [US1] Implement error mapping in src/drive-client.js to detect Drive API 429 (rate limit), 503 (unavailable), auth failures
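
The pagination and count check in T043 and T044 might look like the sketch below. To keep it runnable without credentials, the page fetcher is passed in as `listPage`, a stand-in for a bound `drive.files.list()` call; `TooManyDocumentsError` and the `settings` shape are illustrative assumptions rather than names taken from the design documents.

```javascript
const MAX_DOCUMENTS = 50000;

// Hypothetical error type so the route handler can map this case to HTTP 413.
class TooManyDocumentsError extends Error {
  constructor(count) {
    super(`Document count ${count} exceeds ${MAX_DOCUMENTS}`);
    this.name = 'TooManyDocumentsError';
    this.status = 413;
  }
}

// listPage(params) stands in for drive.files.list(); it must resolve to
// { data: { files: [...], nextPageToken?: string } }.
async function queryDocuments(listPage, settings) {
  const documents = [];
  let pageToken;
  do {
    const { data } = await listPage({
      q: settings.query,
      fields: 'nextPageToken, files(id, name, mimeType, modifiedTime)',
      pageSize: settings.pageSize,
      pageToken,
    });
    documents.push(...(data.files ?? []));
    // Stop early once the limit is exceeded instead of fetching every page.
    if (documents.length > MAX_DOCUMENTS) {
      throw new TooManyDocumentsError(documents.length);
    }
    pageToken = data.nextPageToken;
  } while (pageToken);
  return documents;
}
```

In the real module the first argument would be something like `(params) => drive.files.list(params)`, with `drive` created from the authenticated client in T041.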
#### Sitemap Generation
- [X] T046 [P] [US1] Create sitemap-generator.js module in src/ with function to transform Document array to SitemapEntry array
- [X] T047 [US1] Implement toSitemapEntry function in src/sitemap-generator.js to construct loc URLs using baseUrl + /documents/ + encodeURIComponent(documentId)
- [X] T048 [US1] Implement lastmod date extraction in src/sitemap-generator.js to format modifiedTime as ISO 8601 date (YYYY-MM-DD)
- [X] T049 [US1] Implement generateSitemapXML function in src/sitemap-generator.js to build XML string with proper namespace and escaped URLs using xml-utils.js
- [X] T050 [US1] Add empty sitemap handling in src/sitemap-generator.js to return valid XML with empty urlset when 0 documents
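
As a rough illustration of T047 through T050, the transformation and XML generation could be written as below. The function names follow the task list; the exact output layout (indentation, newline placement) is an assumption, and `escapeXml` is inlined here only to keep the sketch self-contained, whereas the tasks place it in `src/xml-utils.js`.

```javascript
// Escape the five XML special characters: & < > " '
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;') // must run first so later entities are not double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Map one Drive document ({ id, modifiedTime, ... }) to a sitemap entry.
function toSitemapEntry(doc, baseUrl) {
  return {
    loc: `${baseUrl}/documents/${encodeURIComponent(doc.id)}`,
    // modifiedTime is an RFC 3339 timestamp; keep only the UTC date (YYYY-MM-DD).
    lastmod: new Date(doc.modifiedTime).toISOString().slice(0, 10),
  };
}

// Build the sitemap document; an empty array yields a urlset with no <url> children.
function generateSitemapXML(entries) {
  const urls = entries.map(
    (e) => `  <url>\n    <loc>${escapeXml(e.loc)}</loc>\n    <lastmod>${e.lastmod}</lastmod>\n  </url>`
  );
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    '</urlset>',
  ].join('\n');
}
```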
#### Request Routing and Error Handling
- [X] T051 [US1] Create proxy.js monolithic route handler in src/ that imports queue, drive-client, sitemap-generator modules
- [X] T052 [US1] Implement request handler function in src/proxy.js that checks if path is /sitemap.xml (404 for all other paths with no response body)
- [X] T053 [US1] Implement FIFO queue integration in src/proxy.js to enqueue /sitemap.xml requests using queue.process() from src/queue.js
- [X] T054 [US1] Implement sitemap generation flow in src/proxy.js: authenticate → query Drive API → check count → transform to sitemap → generate XML
- [X] T055 [US1] Implement error response handling in src/proxy.js for 413 (>50k docs), 429 (rate limit with Retry-After header), 503 (Drive unavailable), 401 (auth failed), 500 (unexpected) - all with NO response body
- [X] T056 [US1] Add HTTP response headers in src/proxy.js: Content-Type: application/xml; charset=utf-8 for 200 responses, no Content-Type for errors
- [X] T057 [US1] Extract Retry-After value from Drive API 429 error in src/proxy.js and set Retry-After header in seconds
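
The status-only error handling in T055 through T057 can be sketched as a small mapping function. The helper names `toErrorResponse` and `sendError` are hypothetical, and the sketch assumes that upstream errors carry a numeric HTTP status on `status` or `code` and expose response headers on `err.response.headers`; the real error shape should be confirmed against the Drive client before relying on this.

```javascript
// Translate an error raised during sitemap generation into { status, headers }.
// Assumption: the HTTP status is on err.status or err.code, and any upstream
// headers are on err.response.headers.
function toErrorResponse(err) {
  const status = Number(err?.status ?? err?.code);
  switch (status) {
    case 401:
    case 413:
    case 503:
      return { status, headers: {} }; // passed through as-is, no retry
    case 429: {
      const retryAfter = err.response?.headers?.['retry-after'];
      return { status, headers: retryAfter ? { 'Retry-After': String(retryAfter) } : {} };
    }
    default:
      return { status: 500, headers: {} }; // anything unexpected
  }
}

// Send a status-only response: no body and no Content-Type header.
function sendError(res, err) {
  const { status, headers } = toErrorResponse(err);
  res.writeHead(status, headers);
  res.end();
}
```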
#### Logging and Observability
- [X] T058 [US1] Add request logging in src/proxy.js to log incoming requests with method, path, client IP using console.info() from src/console.js
- [X] T059 [US1] Add response logging in src/proxy.js to log status code and response time for each request using console.info()
- [X] T060 [US1] Add Drive API operation logging in src/drive-client.js to log query start, document count, and completion time using console.debug()
- [X] T061 [US1] Add error logging in src/proxy.js to log errors with request context (requestId) and error message using console.error() to stderr
- [X] T062 [US1] Implement requestId generation in src/proxy.js using crypto.randomUUID() for request tracing
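
One way the logging tasks above could fit together is shown below. It is a sketch under assumptions: the ISO 8601 timestamp, the upper-case level names, the `withRequestLogging` wrapper, and the exact message layout are illustrative choices, while the `[timestamp] [level] message` format, the stdout/stderr split, and `crypto.randomUUID()` come from the tasks.

```javascript
const { randomUUID } = require('node:crypto');

// Plain-text format: [timestamp] [level] message
function formatMessage(level, message, now = new Date()) {
  return `[${now.toISOString()}] [${level}] ${message}`;
}

// Minimal logger: info/debug go to stdout, error goes to stderr.
const logger = {
  info: (msg) => process.stdout.write(`${formatMessage('INFO', msg)}\n`),
  debug: (msg) => process.stdout.write(`${formatMessage('DEBUG', msg)}\n`),
  error: (msg) => process.stderr.write(`${formatMessage('ERROR', msg)}\n`),
};

// Hypothetical wrapper: tags each request with an ID and logs start, errors, and outcome.
function withRequestLogging(handler, log = logger) {
  return async (req, res) => {
    const requestId = randomUUID();
    const started = Date.now();
    log.info(`${requestId} ${req.method} ${req.url} from ${req.socket?.remoteAddress}`);
    try {
      await handler(req, res, requestId);
    } catch (err) {
      log.error(`${requestId} ${err.message}`);
      if (!res.headersSent) {
        res.statusCode = 500; // status only, no body
        res.end();
      }
    }
    log.info(`${requestId} status=${res.statusCode} duration=${Date.now() - started}ms`);
  };
}
```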
#### Server Lifecycle
- [X] T063 [US1] Implement HTTP server setup in src/server.js to route all requests to src/proxy.js handler
- [X] T064 [US1] Load configuration in src/server.js from config/config.js and config/settings.js on startup
- [X] T065 [US1] Load Service Account credentials in src/server.js from GOOGLE_SERVICE_ACCOUNT_KEY env var on startup
- [X] T066 [US1] Add startup validation in src/server.js to call credential validation from src/auth.js and exit(1) on failure
- [X] T067 [US1] Implement server binding in src/server.js to listen on port from config, catch EADDRINUSE error and exit(1) with error log
- [X] T068 [US1] Add startup logging in src/server.js to log server configuration (port, baseUrl), Service Account email (masked), and "server listening" message using console.info()
- [X] T069 [US1] Implement graceful shutdown handler in src/server.js for SIGTERM/SIGINT signals to log shutdown and close server
**Checkpoint**: User Story 1 complete - /sitemap.xml endpoint fully functional with all 10 clarifications implemented
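
The startup checks and fatal-error behaviour from T065 through T069 might be wired together roughly as follows. The helper names (`validateCredentials`, `fatal`, `start`) and the injectable `env`/`exit` options are assumptions made so the sketch can be exercised without terminating the process; only the checked fields, the `GOOGLE_SERVICE_ACCOUNT_KEY` variable, the `EADDRINUSE` case, and exit code 1 come from the tasks.

```javascript
// Check the inline JSON key; returns a list of problems (empty when valid).
function validateCredentials(raw) {
  let key;
  try {
    key = JSON.parse(raw ?? '');
  } catch {
    return ['GOOGLE_SERVICE_ACCOUNT_KEY is not valid JSON'];
  }
  const problems = [];
  for (const field of ['client_email', 'private_key', 'project_id']) {
    if (typeof key?.[field] !== 'string' || key[field].trim() === '') {
      problems.push(`missing or empty ${field}`);
    }
  }
  return problems;
}

// Fatal path: log to stderr, then exit with code 1 (exit is injectable for tests).
function fatal(message, exit = process.exit) {
  process.stderr.write(`[${new Date().toISOString()}] [ERROR] ${message}\n`);
  exit(1);
}

// Hypothetical startup helper for src/server.js; `server` is an http.Server.
function start(server, port, { env = process.env, exit = process.exit } = {}) {
  const problems = validateCredentials(env.GOOGLE_SERVICE_ACCOUNT_KEY);
  if (problems.length > 0) {
    fatal(`Invalid service account credentials: ${problems.join('; ')}`, exit);
    return;
  }
  server.on('error', (err) => {
    const reason = err.code === 'EADDRINUSE' ? `port ${port} is already in use` : err.message;
    fatal(`Server error: ${reason}`, exit);
  });
  // Graceful shutdown: stop accepting connections on SIGTERM/SIGINT.
  for (const signal of ['SIGTERM', 'SIGINT']) {
    process.once(signal, () => server.close());
  }
  server.listen(port);
}
```
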
---
## Phase 4: Polish & Cross-Cutting Concerns
**Purpose**: Final validation, documentation, and quality improvements
- [X] T070 [P] Update README.md with quickstart instructions referencing specs/001-drive-proxy-adapter/quickstart.md
- [X] T071 [P] Update the .env.example file from T010 so all required environment variables are documented per quickstart.md
- [X] T072 Validate test coverage meets 80%+ requirement per constitution using Node.js test runner coverage
- [ ] T073 Run all tests (npm test) and verify 100% pass rate
- [ ] T074 Manual validation: Start server and request /sitemap.xml, verify valid XML response
- [ ] T075 Manual validation: Test >50k documents scenario, verify 413 response with no body
- [ ] T076 Manual validation: Test invalid endpoint, verify 404 response with no body
- [ ] T077 Manual validation: Test concurrent requests, verify FIFO processing (sequential execution)
- [ ] T078 Manual validation: Test fatal error scenarios (invalid credentials, port in use), verify exit code 1
- [X] T079 [P] Code cleanup: Remove unused imports, add JSDoc comments for all public functions
- [ ] T080 Run ESLint and fix any linting errors
- [~] T081 Verify all log output uses plain text format `[timestamp] [level] message` per research.md Section 5
- [X] T082 Verify Drive API filter is loaded from config/settings.js not hardcoded per clarification #9
- [ ] T083 Run quickstart.md validation: follow installation and usage instructions from scratch
---
## Dependencies & Execution Order
### Phase Dependencies
- **Setup (Phase 1)**: No dependencies - start immediately
- **Foundational (Phase 2)**: Depends on Setup (Phase 1) - BLOCKS User Story 1
- **User Story 1 (Phase 3)**: Depends on Foundational (Phase 2) - This is the only user story
- **Polish (Phase 4)**: Depends on User Story 1 completion
### Within User Story 1
**Test-First Sequence**:
1. Write ALL tests (T020-T040) - can run in parallel [P]
2. STOP: Obtain user approval of test scenarios
3. Verify tests FAIL (no implementation yet)
4. Proceed to implementation
**Implementation Sequence**:
1. Drive API Integration (T041-T045)
2. Sitemap Generation (T046-T050) - can run in parallel with T041-T045
3. Request Routing (T051-T057) - depends on T041-T050
4. Logging (T058-T062) - can run in parallel with T051-T057
5. Server Lifecycle (T063-T069) - depends on T051-T062
### Parallel Opportunities
**Phase 1 Setup** - These tasks can run in parallel:
- T003, T004, T005, T006, T007 (directory creation)
- T009, T010 (config files)
**Phase 2 Foundational** - Groups can run in parallel:
- T011, T012, T013, T017 (utility modules)
- T014, T015, T016 (auth module)
- T018, T019 (queue and server scaffolding)
**Phase 3 Tests** - All tests can run in parallel:
- Contract tests: T020, T021, T022, T023
- Integration tests: T024-T030
- Unit tests: T031-T040
**Phase 3 Implementation** - Within groups:
- T041, T046 (drive-client and sitemap-generator start in parallel)
- T058-T062 (all logging tasks in parallel)
**Phase 4 Polish**:
- T070, T071, T079, T081, T082 (documentation and cleanup)
---
## Parallel Example: User Story 1 Tests
```bash
# Launch all contract tests together:
Task: "Contract test for /sitemap.xml success response in tests/contract/sitemap-schema.test.js"
Task: "Contract test for /sitemap.xml with empty Drive in tests/contract/sitemap-schema.test.js"
Task: "Contract test for XML special character escaping in tests/contract/sitemap-schema.test.js"
Task: "Contract test for lastmod date format validation in tests/contract/sitemap-schema.test.js"
# Launch all integration tests together:
Task: "Integration test for /sitemap.xml endpoint success in tests/integration/sitemap-endpoint.test.js"
Task: "Integration test for >50k documents in tests/integration/error-scenarios.test.js"
Task: "Integration test for Drive API rate limiting in tests/integration/error-scenarios.test.js"
Task: "Integration test for Drive API 503 error in tests/integration/error-scenarios.test.js"
Task: "Integration test for invalid endpoints in tests/integration/error-scenarios.test.js"
Task: "Integration test for concurrent requests in tests/integration/queue-concurrency.test.js"
Task: "Integration test for token refresh in tests/integration/sitemap-endpoint.test.js"
# Launch all unit tests together:
Task: "Unit test for Drive API client query execution in tests/unit/drive-client.test.js"
Task: "Unit test for Drive API pagination handling in tests/unit/drive-client.test.js"
Task: "Unit test for Service Account JWT authentication in tests/unit/auth.test.js"
Task: "Unit test for credential validation in tests/unit/auth.test.js"
Task: "Unit test for sitemap XML generation in tests/unit/sitemap-generator.test.js"
Task: "Unit test for Document to SitemapEntry transformation in tests/unit/sitemap-generator.test.js"
Task: "Unit test for lastmod date formatting in tests/unit/sitemap-generator.test.js"
Task: "Unit test for FIFO queue enqueue/dequeue in tests/unit/queue.test.js"
Task: "Unit test for FIFO queue concurrent request handling in tests/unit/queue.test.js"
Task: "Unit test for XML special character escaping in tests/unit/sitemap-generator.test.js"
```
---
## Implementation Strategy
### MVP = Complete Feature (User Story 1 Only)
This feature is inherently MVP-sized:
1. Complete Phase 1: Setup → Project initialized
2. Complete Phase 2: Foundational → Infrastructure ready (CRITICAL BLOCKER)
3. Complete Phase 3: User Story 1 → **FULL FEATURE COMPLETE**
4. Complete Phase 4: Polish → Production ready
5. **VALIDATE**: Test /sitemap.xml independently with all 10 clarifications verified
### No Incremental Delivery Needed
Unlike multi-story features, this feature has only one user story. The MVP IS the complete feature:
- Single endpoint: `/sitemap.xml`
- All requirements in User Story 1
- No additional stories to add later
### Validation Checklist (All 10 Clarifications)
Before marking feature complete, verify:
1. ✅ Service Account JWT auth works with inline JSON from `GOOGLE_SERVICE_ACCOUNT_KEY` env var
2. ✅ Sitemap URLs use RESTful format: `/documents/{documentId}`
3. ✅ Drive API 503 errors pass through immediately with NO retries
4. ✅ All logs output to stdout/stderr only (no log files)
5. ✅ System returns 413 error when >50,000 documents exist
6. ✅ Fatal errors (invalid credentials, port conflict) crash with exit code 1
7. ✅ Concurrent /sitemap.xml requests queue in FIFO order and process sequentially
8. ✅ Log format is plain text: `[timestamp] [level] message`
9. ✅ Drive API query filter loads from `config/settings.js` (configurable, not hardcoded)
10. ✅ All error responses return status code only with NO response body (429 additionally sets a Retry-After header)
---
## Task Summary
**Total Tasks**: 83
- **Phase 1 (Setup)**: 10 tasks
- **Phase 2 (Foundational)**: 9 tasks (BLOCKING)
- **Phase 3 (User Story 1)**:
  - Tests: 21 tasks (T020-T040)
  - Implementation: 29 tasks (T041-T069)
- **Phase 4 (Polish)**: 14 tasks
**Parallel Opportunities**:
- Phase 1: 7 tasks can run in parallel
- Phase 2: 6 tasks can run in parallel
- Phase 3 Tests: All 21 tests can run in parallel
- Phase 3 Implementation: Up to 4 tasks can run in parallel at certain points
- Phase 4: 5 tasks can run in parallel
**Independent Test Criteria**: User Story 1 is independently testable via:
1. GET /sitemap.xml returns 200 with valid XML
2. URLs follow RESTful format /documents/{documentId}
3. A Drive with >50k documents returns 413 (no body)
4. Concurrent requests process sequentially (FIFO)
5. Fatal errors crash with exit code 1
6. Logs use plain text format to stdout/stderr
7. Drive API filter loads from config/settings.js
**Suggested MVP Scope**: Complete all phases (this is a single-story feature)
---
## Format Validation
**ALL tasks follow checklist format**:
- Checkbox: `- [ ]`
- Task ID: Sequential (T001-T083)
- [P] marker: Present only on parallelizable tasks
- [Story] label: Present only on User Story 1 phase tasks (US1)
- Description: Includes clear action and exact file path
- File paths: All specific and relative to the repository root
**Organization by user story**:
- Setup phase: No story label (infrastructure)
- Foundational phase: No story label (blocking prerequisites)
- User Story 1 phase: All tasks marked [US1]
- Polish phase: No story label (cross-cutting)
**Compliance with constitution**:
- Test-First Development: Tests (T020-T040) come before implementation with approval gate
- Monolithic architecture: Single proxy.js route handler orchestrating all request logic per plan.md
- Minimal dependencies: Only googleapis + Node.js built-ins per research.md
- Observability: Plain text logging to stdout/stderr per clarification #4, #8
---
## Notes
- This feature has only ONE user story (sitemap generation), so all implementation tasks are in Phase 3
- The feature specification explicitly removed document export functionality from scope (Session 2)
- All 10 clarifications from 3 sessions are incorporated into task descriptions
- Test-first development is mandatory per Constitution Principle III (non-negotiable)
- FIFO queue ensures sequential processing of concurrent requests (no parallel Drive API operations)
- Fatal errors must crash immediately with exit code 1 (no graceful degradation)
- Error responses have NO body (status code only); 429 additionally sets a Retry-After header
- Drive API query filter MUST be configurable via config/settings.js (not hardcoded)