# Quickstart Guide: Google Drive HTTP Proxy Adapter

**Feature**: 001-drive-proxy-adapter
**Date**: 2026-03-07
**Version**: 1.0.0

---

## Overview

The Google Drive HTTP Proxy Adapter is a Node.js application that generates XML sitemaps of Google Drive documents. It exposes a single HTTP endpoint (`/sitemap.xml`) that queries the Google Drive API and returns a sitemap listing every accessible document, each linked by a RESTful URL of the form `{BASE_URL}/documents/{id}`.

**Key Features**:
- Service Account authentication (JWT-based, no user interaction)
- Sitemap protocol compliance (50,000 URL limit enforced)
- FIFO request queuing (requests are processed sequentially)
- Configurable Drive API filters
- Plain-text logging to stdout/stderr

---

## Prerequisites

1. **Node.js**: v18.0.0 or later (an LTS release is recommended)
2. **Google Cloud Project**: with the Drive API enabled
3. **Service Account**: a JSON key file with Drive API access
4. **Network Access**: connectivity to `googleapis.com`

---

## Installation

### 1. Clone Repository

```bash
git clone <repository-url>
cd google-drive-content-adapter
```

### 2. Install Dependencies

```bash
npm install
```

**Dependencies**:
- `googleapis@^140.0.0` - Official Google API client for Node.js

---

## Configuration

### 1. Service Account Setup

**Create Service Account** (Google Cloud Console):
1. Navigate to [IAM & Admin > Service Accounts](https://console.cloud.google.com/iam-admin/serviceaccounts)
2. Click "Create Service Account"
3. Name: `drive-sitemap-adapter` (or your choice)
4. Grant role: none required if accessing the service account's own Drive
5. Create a key: open the service account, go to **Keys** → **Add Key** → **Create new key**, choose JSON, and download the key file

**Enable Drive API**:
1. Navigate to [APIs & Services > Library](https://console.cloud.google.com/apis/library)
2. Search for "Google Drive API"
3. Click "Enable"

**Grant Access** (if accessing user drives):
- Share Drive folders/files with the Service Account email (`xxx@project.iam.gserviceaccount.com`)
- OR configure domain-wide delegation (for Google Workspace organizations)
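The downloaded key file is pretty-printed JSON spread over several lines, while `GOOGLE_SERVICE_ACCOUNT_KEY` expects one line. A minimal sketch of the conversion, assuming you paste the file contents in as a string; `toSingleLineKey` is a hypothetical helper, not part of the adapter. Parsing and re-serializing keeps the `\n` escapes inside `private_key`, so the output never contains a raw line break.

```javascript
// Hypothetical helper: convert the multi-line key file downloaded from Google Cloud
// into the single-line JSON string expected by GOOGLE_SERVICE_ACCOUNT_KEY.
function toSingleLineKey(keyFileText) {
  // JSON.parse turns the "\n" escapes in private_key into real newlines;
  // JSON.stringify escapes them again, so the result stays on one line.
  const key = JSON.parse(keyFileText);
  if (key.type !== 'service_account') {
    throw new Error('Not a service account key file');
  }
  return JSON.stringify(key);
}

// Example with a fake, truncated key file
const keyFileText = `{
  "type": "service_account",
  "client_email": "drive-sitemap-adapter@your-project.iam.gserviceaccount.com",
  "private_key": "-----BEGIN PRIVATE KEY-----\\nMIIE...\\n-----END PRIVATE KEY-----\\n"
}`;

const singleLine = toSingleLineKey(keyFileText);
console.log(singleLine.includes('\n')); // false
```

Assign the returned string to `GOOGLE_SERVICE_ACCOUNT_KEY`, wrapped in single quotes. From a shell, `jq -c . key.json` produces the same compact form.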
---

### 2. Environment Variables

Create a `.env` file in the project root (or set the variables in your shell or deployment environment):

```bash
# REQUIRED: Service Account credentials (inline JSON)
GOOGLE_SERVICE_ACCOUNT_KEY='{"type":"service_account","project_id":"your-project","private_key_id":"...","private_key":"-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n","client_email":"xxx@project.iam.gserviceaccount.com","client_id":"...","auth_uri":"https://accounts.google.com/o/oauth2/auth","token_uri":"https://oauth2.googleapis.com/token","auth_provider_x509_cert_url":"https://www.googleapis.com/oauth2/v1/certs","client_x509_cert_url":"..."}'

# OPTIONAL: Server configuration
PORT=3000                              # Default: 3000
BASE_URL=http://localhost:3000         # Default: http://localhost:3000

# OPTIONAL: Drive API query filter
DRIVE_QUERY="trashed = false"          # Default: "trashed = false"
```

**Important Notes**:
- `GOOGLE_SERVICE_ACCOUNT_KEY` must be a single-line JSON string; the newlines inside `private_key` must stay escaped as `\n` (as they are in the downloaded key file)
- `BASE_URL` should match your production domain, because it prefixes every URL in the sitemap
- `DRIVE_QUERY` accepts Drive API query syntax ([docs](https://developers.google.com/drive/api/guides/search-files))

---

### 3. Configuration Files

**config/config.js**: Server settings (read from environment variables)

```javascript
export default {
  server: {
    // Environment variables are strings, so convert PORT to a number
    port: Number(process.env.PORT) || 3000,
    baseUrl: process.env.BASE_URL || 'http://localhost:3000'
  }
};
```

**config/settings.js**: Drive API configuration

```javascript
export default {
  drive: {
    query: process.env.DRIVE_QUERY || 'trashed = false',
    // nextPageToken must be listed explicitly, or pagination stops after the first page
    fields: 'nextPageToken, files(id, name, mimeType, modifiedTime)',
    pageSize: 1000, // Drive API v3 maximum per page
    scope: 'https://www.googleapis.com/auth/drive.readonly'
  }
};
```

**To customize the Drive API filter**, set the `DRIVE_QUERY` environment variable or edit `config/settings.js`.
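Because `config/config.js` and `config/settings.js` read `process.env` when they are imported, a pure function over an env-like object is one way to unit-test the same defaults. This is a sketch, not the adapter's code: `loadConfig` is a hypothetical name, and validating `PORT` and trimming a trailing slash from `BASE_URL` are assumptions added here.

```javascript
// Hypothetical pure version of config/config.js + config/settings.js.
// Takes an env-like object so tests can pass plain objects instead of process.env.
function loadConfig(env) {
  const port = Number(env.PORT);
  return {
    server: {
      // Fall back to 3000 when PORT is unset or not a valid TCP port
      port: Number.isInteger(port) && port > 0 && port < 65536 ? port : 3000,
      // Assumption: strip trailing slashes so baseUrl + '/documents/...' never has '//'
      baseUrl: (env.BASE_URL || 'http://localhost:3000').replace(/\/+$/, '')
    },
    drive: {
      query: env.DRIVE_QUERY || 'trashed = false',
      fields: 'nextPageToken, files(id, name, mimeType, modifiedTime)',
      pageSize: 1000,
      scope: 'https://www.googleapis.com/auth/drive.readonly'
    }
  };
}

const defaults = loadConfig({});
console.log(defaults.server.port, defaults.server.baseUrl); // 3000 http://localhost:3000

const prod = loadConfig({ PORT: '8080', BASE_URL: 'https://your-domain.com/' });
console.log(prod.server.port, prod.server.baseUrl); // 8080 https://your-domain.com
```

In the server entry point this would be called once as `loadConfig(process.env)`.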
---

## Usage

### Start Server (Development)

```bash
npm run dev
```

**Output**:

```
[2026-03-07T10:00:00.000Z] [INFO] Server configuration loaded: port=3000, baseUrl=http://localhost:3000
[2026-03-07T10:00:00.100Z] [INFO] Service Account authenticated: xxx***@project.iam.gserviceaccount.com
[2026-03-07T10:00:00.200Z] [INFO] HTTP server listening on port 3000
```

---

### Start Server (Production)

```bash
npm start
```

---

### Request Sitemap

**Using curl**:

```bash
curl http://localhost:3000/sitemap.xml
```

**Expected Response** (200 OK):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://localhost:3000/documents/1A2B3C4D5E6F7G8H</loc>
    <lastmod>2026-03-07</lastmod>
  </url>
  <url>
    <loc>http://localhost:3000/documents/9I0J1K2L3M4N5O6P</loc>
    <lastmod>2026-03-05</lastmod>
  </url>
</urlset>
```

---

## Testing

### Run All Tests

```bash
npm test
```

**Test Suites**:
- `tests/unit/` - Unit tests for the Drive client, auth, sitemap generator, and queue
- `tests/integration/` - End-to-end endpoint tests for `/sitemap.xml`
- `tests/contract/` - XML sitemap schema validation tests

---

### Run Specific Test Suite

```bash
npm run test:unit          # Unit tests only
npm run test:integration   # Integration tests only
npm run test:contract      # Contract tests only
```

---

## API Reference

### Endpoint: `GET /sitemap.xml`

**Description**: Generates an XML sitemap of all accessible Google Drive documents.

**Request**:

```http
GET /sitemap.xml HTTP/1.1
Host: example.com
```

**Success Response** (200 OK):

```http
HTTP/1.1 200 OK
Content-Type: application/xml; charset=utf-8
Content-Length: {size}
```

**Error Responses**:
- `404 Not Found` - Invalid endpoint (only `/sitemap.xml` is supported)
- `413 Payload Too Large` - More than 50,000 documents in Drive
- `429 Too Many Requests` - Rate limit exceeded (includes a `Retry-After` header)
- `401 Unauthorized` - Authentication failed
- `503 Service Unavailable` - Drive API unavailable
- `500 Internal Server Error` - Unexpected error

**Note**: All error responses have an **empty body**; the status code (plus `Retry-After` on 429) is the only signal.

See [contracts/sitemap-xml-schema.md](./contracts/sitemap-xml-schema.md) for the full API contract.
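The response follows the sitemaps.org protocol. A minimal sketch of the transformation step (Drive file to `<url>` entry), assuming the Drive `files` objects carry the `id` and `modifiedTime` fields requested in `config/settings.js`; `buildSitemap` and `PayloadTooLargeError` are hypothetical names, not the API of the adapter's `xml-utils.js`. It entity-escapes each URL as the protocol requires, trims `modifiedTime` to the `YYYY-MM-DD` form used in the sample response, and refuses more than 50,000 entries, which the endpoint would report as `413`.

```javascript
const SITEMAP_URL_LIMIT = 50000; // sitemaps.org per-file maximum

// Hypothetical error type that the HTTP layer would map to 413
class PayloadTooLargeError extends Error {}

// Escape the five characters that must be entity-escaped in XML text
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// files: Drive API file objects, e.g. { id, name, mimeType, modifiedTime }
function buildSitemap(files, baseUrl) {
  // Check the limit before building anything, so oversized listings fail fast
  if (files.length > SITEMAP_URL_LIMIT) {
    throw new PayloadTooLargeError(`${files.length} documents exceed ${SITEMAP_URL_LIMIT}`);
  }
  const entries = files.map((file) => {
    const loc = `${baseUrl}/documents/${encodeURIComponent(file.id)}`;
    // modifiedTime is RFC 3339 in UTC (e.g. 2026-03-07T09:15:00.000Z); keep the date part
    const lastmod = file.modifiedTime ? file.modifiedTime.slice(0, 10) : null;
    return [
      '  <url>',
      `    <loc>${escapeXml(loc)}</loc>`,
      ...(lastmod ? [`    <lastmod>${lastmod}</lastmod>`] : []),
      '  </url>'
    ].join('\n');
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    '</urlset>'
  ].join('\n');
}

// Illustrative input reusing the IDs and dates of the sample response
const xml = buildSitemap(
  [
    { id: '1A2B3C4D5E6F7G8H', modifiedTime: '2026-03-07T09:15:00.000Z' },
    { id: '9I0J1K2L3M4N5O6P', modifiedTime: '2026-03-05T17:40:12.000Z' }
  ],
  'http://localhost:3000'
);
console.log(xml);
```

This prints a sitemap with the same `<loc>` and `<lastmod>` values as the sample response. The HTTP layer would then translate a `PayloadTooLargeError` into an empty-bodied `413`.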
---

## Architecture

### Project Structure

```
google-drive-content-adapter/
├── src/
│   ├── server.js          # HTTP server entry point
│   ├── proxy.js           # Monolithic route handler (sitemap logic)
│   ├── logger.js          # Logging module (console.js alias)
│   ├── auth.js            # Service Account JWT authentication
│   └── xml-utils.js       # XML generation utilities
├── config/
│   ├── config.js          # Server configuration (port, baseUrl)
│   └── settings.js        # Drive API filter configuration
├── tests/
│   ├── unit/              # Unit tests
│   ├── integration/       # Integration tests
│   └── contract/          # Contract tests
├── specs/                 # Feature specifications and planning docs
│   └── 001-drive-proxy-adapter/
│       ├── spec.md
│       ├── plan.md
│       ├── research.md
│       ├── data-model.md
│       ├── quickstart.md  (this file)
│       └── contracts/
│           └── sitemap-xml-schema.md
├── package.json
└── README.md
```

---

### Request Flow

```
1.  Client → GET /sitemap.xml
2.  Server → Create RequestContext (ID, timestamp)
3.  Server → Enqueue request (FIFO queue)
4.  Queue  → Process request (sequential, one at a time)
5.  Proxy  → Authenticate with Service Account JWT
6.  Proxy  → Query Drive API files.list() (paginate if >1000 docs)
7.  Proxy  → Check count ≤ 50,000
8.  Proxy  → Transform Documents to SitemapEntries
9.  Proxy  → Generate XML sitemap
10. Server → Return 200 + XML (or error status)
11. Queue  → Process next request
```

---

## Troubleshooting

### 1. Fatal Error: Invalid Service Account Credentials

**Error**:

```
[2026-03-07T10:00:00.000Z] [ERROR] FATAL: Invalid client_email in Service Account credentials
```

**Solution**:
- Check that the `GOOGLE_SERVICE_ACCOUNT_KEY` environment variable contains valid JSON
- Ensure the `client_email` field ends with `.gserviceaccount.com`
- Ensure the `private_key` field starts with `-----BEGIN PRIVATE KEY-----`
- Verify the JSON string has no extra escaping or surrounding quotes
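These checks can be run locally before starting the server. A sketch, assuming the adapter validates the same things the steps above list (valid JSON, the `client_email` suffix, the `private_key` header); `checkServiceAccountKey` is a hypothetical helper, not part of the adapter. Pass it the exact string you put in `GOOGLE_SERVICE_ACCOUNT_KEY` to see which rule fails.

```javascript
// Hypothetical pre-flight check mirroring the troubleshooting steps above.
// Returns a list of problems; an empty list means the key looks well-formed.
function checkServiceAccountKey(raw) {
  let key;
  try {
    key = JSON.parse(raw);
  } catch (err) {
    return [`not valid JSON: ${err.message}`];
  }
  // A value wrapped in extra quotes parses as a JSON string, not an object
  if (key === null || typeof key !== 'object') {
    return ['not a JSON object'];
  }
  const problems = [];
  if (typeof key.client_email !== 'string' || !key.client_email.endsWith('.gserviceaccount.com')) {
    problems.push('client_email must end with .gserviceaccount.com');
  }
  if (typeof key.private_key !== 'string' || !key.private_key.startsWith('-----BEGIN PRIVATE KEY-----')) {
    problems.push('private_key must start with -----BEGIN PRIVATE KEY-----');
  }
  return problems;
}

// Extra surrounding quotes: the JSON object became a JSON string
console.log(checkServiceAccountKey('"{\\"type\\":\\"service_account\\"}"'));
// → [ 'not a JSON object' ]

// Well-formed (fake, truncated) key
console.log(checkServiceAccountKey('{"client_email":"x@p.iam.gserviceaccount.com","private_key":"-----BEGIN PRIVATE KEY-----\\n..."}'));
// → []
```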
---

### 2. Fatal Error: Port Already in Use

**Error**:

```
[2026-03-07T10:00:00.000Z] [ERROR] FATAL: Unable to bind to port 3000 (EADDRINUSE)
```

**Solution**:
- Set the `PORT` environment variable to a different port (e.g., 8080)
- OR stop the other process using port 3000: `lsof -ti:3000 | xargs kill`

---

### 3. 401 Unauthorized Response

**Cause**: Service Account token refresh failed

**Solution**:
- Verify the Service Account can access the files (share folders with the service account email)
- Check that the Drive API is enabled in the Google Cloud Console
- Ensure the scope is `https://www.googleapis.com/auth/drive.readonly`

---

### 4. 413 Payload Too Large Response

**Cause**: The Drive query matches more than 50,000 documents

**Solution**:
- Narrow `DRIVE_QUERY` (e.g., by folder, date, or file type)
- Example: `DRIVE_QUERY="'folder-id' in parents and trashed = false"`

---

### 5. 429 Too Many Requests Response

**Cause**: Drive API rate limit exceeded

**Solution**:
- Wait for the number of seconds given in the `Retry-After` response header
- Reduce request frequency
- Review the Drive API quota limits ([docs](https://developers.google.com/drive/api/guides/limits))

---

### 6. 503 Service Unavailable Response

**Cause**: The Google Drive API is temporarily unavailable

**Solution**:
- Wait and retry manually (the adapter does not retry automatically, per spec)
- Check the [Google Workspace Status Dashboard](https://www.google.com/appsstatus)

---

## Performance Tips

### 1. Optimize Drive Query Filter

**Default** (all non-trashed files):

```bash
DRIVE_QUERY="trashed = false"
```

**Filter by folder** (matches direct children of the folder only):

```bash
DRIVE_QUERY="'folder-id' in parents and trashed = false"
```

**Filter by date**:

```bash
DRIVE_QUERY="modifiedTime > '2026-01-01T00:00:00' and trashed = false"
```

**Filter by MIME type**:

```bash
DRIVE_QUERY="mimeType = 'application/pdf' and trashed = false"
```

See [Drive API search query syntax](https://developers.google.com/drive/api/guides/search-files) for more options.
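String values in a Drive query are single-quoted, and Drive's query syntax requires a backslash before any single quote or backslash inside them (e.g. a name like `O'Brien`). When a filter is generated rather than typed by hand, a small builder avoids malformed queries. This is a sketch only; `quoteDriveString` and `buildDriveQuery` are hypothetical helpers that combine the clauses shown above with `and`.

```javascript
// Hypothetical helpers for composing a DRIVE_QUERY value.
// Escape backslashes first, then single quotes, and wrap in single quotes.
function quoteDriveString(value) {
  return `'${String(value).replace(/\\/g, '\\\\').replace(/'/g, "\\'")}'`;
}

function buildDriveQuery({ folderId, modifiedAfter, mimeType } = {}) {
  const clauses = [];
  if (folderId) clauses.push(`${quoteDriveString(folderId)} in parents`);
  if (modifiedAfter) clauses.push(`modifiedTime > ${quoteDriveString(modifiedAfter)}`);
  if (mimeType) clauses.push(`mimeType = ${quoteDriveString(mimeType)}`);
  clauses.push('trashed = false'); // always keep the default filter
  return clauses.join(' and ');
}

console.log(buildDriveQuery({ folderId: 'folder-id' }));
// → 'folder-id' in parents and trashed = false
console.log(buildDriveQuery({ modifiedAfter: '2026-01-01T00:00:00', mimeType: 'application/pdf' }));
// → modifiedTime > '2026-01-01T00:00:00' and mimeType = 'application/pdf' and trashed = false
console.log(quoteDriveString("O'Brien"));
// → 'O\'Brien'
```

The printed string is what goes into `DRIVE_QUERY`, inside double quotes as in the examples above.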
---

### 2. Adjust BASE_URL for Production

**Development**:

```
BASE_URL=http://localhost:3000
```

**Production**:

```
BASE_URL=https://your-domain.com
```

This ensures sitemap URLs point to the correct domain.

---

### 3. Monitor Memory Usage

**Check memory usage**:

```bash
node --inspect src/server.js
# Then open chrome://inspect in Chrome to attach DevTools
```

The inspector listens on `127.0.0.1:9229` by default; do not expose it publicly in production.

**Expected**: <256MB under normal load (<10 concurrent requests)

---

## Security Best Practices

1. **Never commit** the Service Account JSON key file to version control
2. **Use environment variables** for all sensitive configuration
3. **Restrict Service Account permissions** to the minimum required (read-only scope)
4. **Monitor logs** for unauthorized access attempts
5. **Use HTTPS** in production (e.g., behind a reverse proxy such as nginx)
6. **Keep credentials out of logs** (the `private_key` field is never logged)

---

## Deployment

### Docker (Recommended)

**Dockerfile**:

```dockerfile
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
# Install production dependencies only (--omit=dev replaces the deprecated --only=production)
RUN npm ci --omit=dev
COPY . .
EXPOSE 3000
CMD ["npm", "start"]
```

Add a `.dockerignore` that excludes `node_modules`, `.env`, and key files so `COPY . .` does not copy credentials into the image.

**Build and run**:

```bash
docker build -t drive-sitemap-adapter .
docker run -p 3000:3000 \
  -e GOOGLE_SERVICE_ACCOUNT_KEY='{"type":"service_account",...}' \
  -e BASE_URL=https://your-domain.com \
  drive-sitemap-adapter
```

---

### Cloud Platforms

**Google Cloud Run**:

```bash
gcloud run deploy drive-sitemap-adapter \
  --source . \
  --set-env-vars BASE_URL=https://your-domain.com \
  --set-secrets GOOGLE_SERVICE_ACCOUNT_KEY=service-account-key:latest
```

**AWS ECS / Fargate**: Set the environment variables in the task definition

**Heroku**: Set the environment variables via the Heroku CLI or dashboard

---

## Additional Resources

- **Feature Specification**: [specs/001-drive-proxy-adapter/spec.md](./spec.md)
- **Implementation Plan**: [specs/001-drive-proxy-adapter/plan.md](./plan.md)
- **Research Document**: [specs/001-drive-proxy-adapter/research.md](./research.md)
- **Data Model**: [specs/001-drive-proxy-adapter/data-model.md](./data-model.md)
- **API Contract**: [specs/001-drive-proxy-adapter/contracts/sitemap-xml-schema.md](./contracts/sitemap-xml-schema.md)
- **Google Drive API Docs**: [https://developers.google.com/drive/api/v3/reference](https://developers.google.com/drive/api/v3/reference)
- **Sitemap Protocol**: [https://www.sitemaps.org/protocol.html](https://www.sitemaps.org/protocol.html)

---

## Support

For issues or questions, refer to:
1. This quickstart guide
2. Feature specification (spec.md) for requirements
3. Research document (research.md) for technical decisions
4. Contract documentation (contracts/) for API details

---

## Version History

| Version | Date       | Changes                  |
|---------|------------|--------------------------|
| 1.0.0   | 2026-03-07 | Initial quickstart guide |