Quickstart Guide: Google Drive HTTP Proxy Adapter
Feature: 001-drive-proxy-adapter
Date: 2026-03-07
Version: 1.0.0
Overview
The Google Drive HTTP Proxy Adapter is a Node.js application that generates XML sitemaps of Google Drive documents. It exposes a single HTTP endpoint (/sitemap.xml) that queries the Google Drive API and returns a sitemap listing every accessible document, with each link built as {BASE_URL}/documents/{fileId}.
Key Features:
- Service Account authentication (JWT-based, no user interaction)
- Sitemap protocol compliant (50,000 URL limit enforced)
- FIFO request queuing (sequential processing)
- Configurable Drive API filters
- Plain text logging to stdout/stderr
Prerequisites
- Node.js: v18.0.0 or later (LTS version recommended)
- Google Cloud Project: With Drive API enabled
- Service Account: JSON key file with Drive API access
- Network Access: Connectivity to googleapis.com
Installation
1. Clone Repository
git clone <repository-url>
cd google-drive-content-adapter
2. Install Dependencies
npm install
Dependencies:
- googleapis@^140.0.0 - Official Google API client for Node.js
Configuration
1. Service Account Setup
Create Service Account (Google Cloud Console):
- Navigate to IAM & Admin > Service Accounts
- Click "Create Service Account"
- Name: drive-sitemap-adapter (or your choice)
- Grant role: None required if accessing service account's own Drive
- Click "Create Key" → Choose JSON format → Download key file
Enable Drive API:
- Navigate to APIs & Services > Library
- Search for "Google Drive API"
- Click "Enable"
Grant Access (if accessing user drives):
- Share Drive folders/files with Service Account email (xxx@project.iam.gserviceaccount.com)
- OR configure domain-wide delegation (for Google Workspace, formerly G Suite, organizations)
2. Environment Variables
Create a .env file in the project root (or set the environment variables directly):
# REQUIRED: Service Account credentials (inline JSON)
GOOGLE_SERVICE_ACCOUNT_KEY='{"type":"service_account","project_id":"your-project","private_key_id":"...","private_key":"-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n","client_email":"xxx@project.iam.gserviceaccount.com","client_id":"...","auth_uri":"https://accounts.google.com/o/oauth2/auth","token_uri":"https://oauth2.googleapis.com/token","auth_provider_x509_cert_url":"https://www.googleapis.com/oauth2/v1/certs","client_x509_cert_url":"..."}'
# OPTIONAL: Server configuration
PORT=3000 # Default: 3000
BASE_URL=http://localhost:3000 # Default: http://localhost:3000
# OPTIONAL: Drive API query filter
DRIVE_QUERY="trashed = false" # Default: "trashed = false"
Important Notes:
- GOOGLE_SERVICE_ACCOUNT_KEY must be a single-line JSON string (escape newlines in private_key)
- BASE_URL should match your production domain for sitemap URLs
- DRIVE_QUERY supports Drive API query syntax (docs)
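One way to produce the single-line value is to round-trip the downloaded key file through JSON.parse and JSON.stringify, which collapses the layout and re-escapes the newlines in private_key. The sketch below is illustrative only: the helper name toSingleLineJson and the shortened fake key are assumptions, not part of the adapter.

```javascript
// Hypothetical helper (not part of the adapter): collapse a pretty-printed
// Service Account key file into a single-line JSON string for .env usage.
function toSingleLineJson(rawKeyFileContents) {
  // JSON.parse turns the "\n" escapes in private_key into real newlines;
  // JSON.stringify without an indent argument emits one line and re-escapes them.
  return JSON.stringify(JSON.parse(rawKeyFileContents));
}

// Example with a shortened, fake key (real key files have more fields).
const pretty = `{
  "type": "service_account",
  "private_key": "-----BEGIN PRIVATE KEY-----\\nABC\\n-----END PRIVATE KEY-----\\n"
}`;
const oneLine = toSingleLineJson(pretty);

// Prints a line that can be pasted into .env as-is.
console.log(`GOOGLE_SERVICE_ACCOUNT_KEY='${oneLine}'`);
```

In practice the input would come from the downloaded key file (for example via fs.readFileSync), and the output should be treated as a secret like the file itself.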
3. Configuration Files
config/config.js: Server settings (auto-generated from env vars)
export default {
server: {
port: process.env.PORT || 3000,
baseUrl: process.env.BASE_URL || 'http://localhost:3000'
}
};
config/settings.js: Drive API configuration
export default {
drive: {
query: process.env.DRIVE_QUERY || "trashed = false",
fields: 'files(id, name, mimeType, modifiedTime)',
pageSize: 1000,
scope: 'https://www.googleapis.com/auth/drive.readonly'
}
};
To customize the Drive API filter, edit config/settings.js or set the DRIVE_QUERY env var.
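The settings above map onto the parameters of a Drive files.list call. The following is a minimal sketch of how paging over those settings might look, not the adapter's actual code: listAllFiles and the injected listPage function are hypothetical names, and listPage stands in for the real Drive call so the loop can be read without credentials. Note that the sketch adds nextPageToken to the field mask, since Drive only returns the fields that are requested.

```javascript
// Sketch only: `listPage` stands in for a call to drive.files.list() and is
// injected so the paging logic can be exercised without credentials.
const MAX_SITEMAP_URLS = 50000; // sitemap protocol limit mentioned in this guide

async function listAllFiles(listPage, settings) {
  const files = [];
  let pageToken; // undefined on the first request
  do {
    const page = await listPage({
      q: settings.query,
      // nextPageToken must be part of the field mask, otherwise it is
      // omitted from the response and paging stops after the first page.
      fields: `nextPageToken, ${settings.fields}`,
      pageSize: settings.pageSize,
      pageToken,
    });
    files.push(...(page.files || []));
    if (files.length > MAX_SITEMAP_URLS) {
      // Assumption: the caller translates this into the 413 response.
      throw new RangeError('more than 50,000 documents match the query');
    }
    pageToken = page.nextPageToken;
  } while (pageToken);
  return files;
}
```

With the googleapis client, listPage could be written as `(params) => drive.files.list(params).then((res) => res.data)`, where drive comes from `google.drive({ version: 'v3', auth })`.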
Usage
Start Server (Development)
npm run dev
Output:
[2026-03-07T10:00:00.000Z] [INFO] Server configuration loaded: port=3000, baseUrl=http://localhost:3000
[2026-03-07T10:00:00.100Z] [INFO] Service Account authenticated: xxx***@project.iam.gserviceaccount.com
[2026-03-07T10:00:00.200Z] [INFO] HTTP server listening on port 3000
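For readers who want to parse or reproduce these lines, here is a small sketch of the format shown in the output above. The function names, the three-character email mask, and the choice to send only ERROR lines to stderr are assumptions inferred from the sample output, not the adapter's confirmed behavior.

```javascript
// Hypothetical helpers mirroring the log lines shown above.

// Keep at most the first three characters of the local part, then "***".
function maskEmail(email) {
  const at = email.indexOf('@');
  if (at < 0) return '***';
  return `${email.slice(0, Math.min(3, at))}***${email.slice(at)}`;
}

// "[ISO timestamp] [LEVEL] message", matching the sample output.
function formatLogLine(level, message, now = new Date()) {
  return `[${now.toISOString()}] [${level}] ${message}`;
}

// Assumption: errors go to stderr, everything else to stdout.
function log(level, message) {
  const stream = level === 'ERROR' ? process.stderr : process.stdout;
  stream.write(formatLogLine(level, message) + '\n');
}

log('INFO', `Service Account authenticated: ${maskEmail('xxx@project.iam.gserviceaccount.com')}`);
```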
Start Server (Production)
npm start
Request Sitemap
Using curl:
curl http://localhost:3000/sitemap.xml
Expected Response (200 OK):
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://localhost:3000/documents/1A2B3C4D5E6F7G8H</loc>
<lastmod>2026-03-07</lastmod>
</url>
<url>
<loc>http://localhost:3000/documents/9I0J1K2L3M4N5O6P</loc>
<lastmod>2026-03-05</lastmod>
</url>
</urlset>
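The response above can be derived from Drive file metadata (id and modifiedTime) plus BASE_URL. The sketch below shows one way to do that, under stated assumptions: buildSitemap and escapeXml are hypothetical names rather than the actual exports of src/xml-utils.js, and lastmod is taken as the date part of Drive's RFC 3339 modifiedTime.

```javascript
// Hypothetical sketch of sitemap generation; not the adapter's actual code.

// Escape the five XML special characters so values are safe inside elements.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

function buildSitemap(files, baseUrl) {
  const root = baseUrl.replace(/\/+$/, ''); // drop trailing slashes
  const urls = files.map((file) => {
    const loc = `${root}/documents/${encodeURIComponent(file.id)}`;
    const lastmod = file.modifiedTime.slice(0, 10); // YYYY-MM-DD part of RFC 3339
    return `<url>\n<loc>${escapeXml(loc)}</loc>\n<lastmod>${lastmod}</lastmod>\n</url>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    '</urlset>',
  ].join('\n');
}

console.log(buildSitemap(
  [{ id: '1A2B3C4D5E6F7G8H', modifiedTime: '2026-03-07T09:15:00.000Z' }],
  'http://localhost:3000'
));
```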
Testing
Run All Tests
npm test
Test Suites:
- tests/unit/ - Unit tests for Drive client, auth, sitemap generator, queue
- tests/integration/ - End-to-end endpoint tests for /sitemap.xml
- tests/contract/ - XML sitemap schema validation tests
Run Specific Test Suite
npm run test:unit # Unit tests only
npm run test:integration # Integration tests only
npm run test:contract # Contract tests only
API Reference
Endpoint: GET /sitemap.xml
Description: Generate XML sitemap of all accessible Google Drive documents.
Request:
GET /sitemap.xml HTTP/1.1
Host: example.com
Success Response (200 OK):
HTTP/1.1 200 OK
Content-Type: application/xml; charset=utf-8
Content-Length: {size}
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<!-- up to 50,000 URL entries -->
</urlset>
Error Responses:
- 404 Not Found - Invalid endpoint (only /sitemap.xml supported)
- 413 Payload Too Large - More than 50,000 documents in Drive
- 429 Too Many Requests - Rate limit exceeded (includes Retry-After header)
- 401 Unauthorized - Authentication failed
- 503 Service Unavailable - Drive API unavailable
- 500 Internal Server Error - Unexpected error
Note: All error responses have an empty body (status code only).
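An empty-body error response of this kind could be written with Node's standard http response methods, as sketched below. This is an illustration under assumptions: sendEmptyError is a hypothetical name, and how the adapter decides the Retry-After value is not specified in this guide.

```javascript
// Sketch: `res` is a Node.js http.ServerResponse (or any object with the
// same writeHead/end methods). The function name is hypothetical.
function sendEmptyError(res, statusCode, retryAfterSeconds) {
  const headers = { 'Content-Length': '0' };
  // Only 429 carries Retry-After, expressed in whole seconds.
  if (statusCode === 429 && Number.isFinite(retryAfterSeconds)) {
    headers['Retry-After'] = String(Math.ceil(retryAfterSeconds));
  }
  res.writeHead(statusCode, headers);
  res.end(); // no body: the status code is the whole message
}

// Usage inside a request handler, e.g. for an unknown path:
//   sendEmptyError(res, 404);
// or when the upstream rate limit was hit (the 30 is an example value):
//   sendEmptyError(res, 429, 30);
```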
See contracts/sitemap-xml-schema.md for full API contract.
Architecture
Project Structure
google-drive-content-adapter/
├── src/
│   ├── server.js          # HTTP server entry point
│   ├── proxy.js           # Monolithic route handler (sitemap logic)
│   ├── logger.js          # Logging module (console.js alias)
│   ├── auth.js            # Service Account JWT authentication
│   └── xml-utils.js       # XML generation utilities
├── config/
│   ├── config.js          # Server configuration (port, baseUrl)
│   └── settings.js        # Drive API filter configuration
├── tests/
│   ├── unit/              # Unit tests
│   ├── integration/       # Integration tests
│   └── contract/          # Contract tests
├── specs/                 # Feature specifications and planning docs
│   └── 001-drive-proxy-adapter/
│       ├── spec.md
│       ├── plan.md
│       ├── research.md
│       ├── data-model.md
│       ├── quickstart.md  (this file)
│       └── contracts/
│           └── sitemap-xml-schema.md
├── package.json
└── README.md
Request Flow
1. Client → GET /sitemap.xml
2. Server → Create RequestContext (ID, timestamp)
3. Server → Enqueue request (FIFO queue)
4. Queue → Process request (sequential, one at a time)
5. Proxy → Authenticate with Service Account JWT
6. Proxy → Query Drive API files.list() (paginate if >1000 docs)
7. Proxy → Check count ≤ 50,000
8. Proxy → Transform Documents to SitemapEntries
9. Proxy → Generate XML sitemap
10. Server → Return 200 + XML (or error status)
11. Queue → Process next request
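Steps 3, 4, and 11 describe a FIFO queue that runs one request at a time. One common way to get that behavior in Node.js is to chain each task onto the promise of the previous one. The sketch below illustrates the idea only; the class name FifoQueue and its enqueue method are hypothetical, and the adapter's real queue may be built differently.

```javascript
// Minimal sketch of sequential FIFO processing via promise chaining.
class FifoQueue {
  constructor() {
    // Settles when the most recently queued task has finished.
    this.tail = Promise.resolve();
  }

  // `task` is a function returning a promise (e.g. "build the sitemap").
  enqueue(task) {
    // Start the task only after everything queued before it has settled.
    const result = this.tail.then(() => task());
    // Swallow errors on the chain itself so one failed request
    // does not block the requests queued behind it.
    this.tail = result.catch(() => {});
    return result; // the caller still sees the task's own outcome
  }
}

// Usage: requests complete in arrival order even if later ones are faster.
const queue = new FifoQueue();
const wait = (ms, label) => () =>
  new Promise((resolve) => setTimeout(() => resolve(label), ms));

queue.enqueue(wait(20, 'first')).then(console.log);
queue.enqueue(wait(1, 'second')).then(console.log);
```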
Troubleshooting
1. Fatal Error: Invalid Service Account Credentials
Error:
[2026-03-07T10:00:00.000Z] [ERROR] FATAL: Invalid client_email in Service Account credentials
Solution:
- Check GOOGLE_SERVICE_ACCOUNT_KEY env var is valid JSON
- Ensure client_email field ends with .gserviceaccount.com
- Ensure private_key field starts with -----BEGIN PRIVATE KEY-----
- Verify no extra escaping/quotes in JSON string
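The checks above can be run by hand before starting the server. The following sketch applies them to a raw string and returns a list of problems; checkServiceAccountKey is a hypothetical helper written for this guide, and the adapter's own startup validation may differ.

```javascript
// Hypothetical pre-flight check for GOOGLE_SERVICE_ACCOUNT_KEY.
// Returns an array of human-readable problems; empty means the checks passed.
function checkServiceAccountKey(raw) {
  let key;
  try {
    key = JSON.parse(raw);
  } catch (err) {
    return [`not valid JSON: ${err.message}`];
  }
  if (key === null || typeof key !== 'object' || Array.isArray(key)) {
    return ['JSON value is not an object'];
  }
  const problems = [];
  if (typeof key.client_email !== 'string' ||
      !key.client_email.endsWith('.gserviceaccount.com')) {
    problems.push('client_email must end with .gserviceaccount.com');
  }
  if (typeof key.private_key !== 'string' ||
      !key.private_key.startsWith('-----BEGIN PRIVATE KEY-----')) {
    problems.push('private_key must start with -----BEGIN PRIVATE KEY-----');
  }
  return problems;
}

// Usage: report problems for the value currently in the environment.
const problems = checkServiceAccountKey(process.env.GOOGLE_SERVICE_ACCOUNT_KEY ?? '');
console.log(problems.length === 0 ? 'key looks well-formed' : problems.join('\n'));
```

Passing these checks only means the value is well-formed; it does not prove that Google will accept the credentials.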
2. Fatal Error: Port Already in Use
Error:
[2026-03-07T10:00:00.000Z] [ERROR] FATAL: Unable to bind to port 3000 (EADDRINUSE)
Solution:
- Change PORT env var to different port (e.g., 8080)
- OR stop other process using port 3000:
lsof -ti:3000 | xargs kill
3. 401 Unauthorized Response
Cause: Service Account token refresh failed
Solution:
- Verify Service Account has Drive API access (share folders with service account email)
- Check Drive API is enabled in Google Cloud Console
- Ensure scope is correct: https://www.googleapis.com/auth/drive.readonly
4. 413 Payload Too Large Response
Cause: Google Drive contains more than 50,000 documents
Solution:
- Adjust DRIVE_QUERY to filter documents (e.g., by folder, date, file type)
- Example:
DRIVE_QUERY="'folder-id' in parents and trashed = false"
5. 429 Too Many Requests Response
Cause: Drive API rate limit exceeded
Solution:
- Wait for time specified in Retry-After response header (seconds)
- Reduce request frequency
- Consider Drive API quota limits (docs)
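On the client side, honoring Retry-After might look like the sketch below. It assumes the header carries a number of seconds, as stated above; retryDelayMs, fetchSitemapWithOneRetry, and the 60-second fallback are hypothetical choices for illustration. The global fetch used here is available in Node.js 18 and later.

```javascript
// Hypothetical client-side helper: turn a Retry-After header value
// (seconds) into a delay in milliseconds, with a fallback when the
// header is missing or malformed.
function retryDelayMs(retryAfterHeader, fallbackMs = 60000) {
  if (retryAfterHeader == null || String(retryAfterHeader).trim() === '') {
    return fallbackMs;
  }
  const seconds = Number(retryAfterHeader);
  if (!Number.isFinite(seconds) || seconds < 0) return fallbackMs;
  return Math.ceil(seconds * 1000);
}

// Fetch the sitemap, waiting once if the adapter answers 429.
async function fetchSitemapWithOneRetry(url) {
  let res = await fetch(url);
  if (res.status === 429) {
    const delay = retryDelayMs(res.headers.get('retry-after'));
    await new Promise((resolve) => setTimeout(resolve, delay));
    res = await fetch(url);
  }
  return res;
}
```

A single retry is a deliberate limit in this sketch: repeated automatic retries against a rate-limited API tend to prolong the problem.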
6. 503 Service Unavailable Response
Cause: Google Drive API is temporarily unavailable
Solution:
- Wait and retry manually (no automatic retries per spec)
- Check Google Workspace Status Dashboard
Performance Tips
1. Optimize Drive Query Filter
Default (all files):
DRIVE_QUERY="trashed = false"
Filter by folder:
DRIVE_QUERY="'folder-id' in parents and trashed = false"
Filter by date:
DRIVE_QUERY="modifiedTime > '2026-01-01T00:00:00' and trashed = false"
Filter by MIME type:
DRIVE_QUERY="mimeType = 'application/pdf' and trashed = false"
See Drive API search query syntax for more options.
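When filters are combined or come from variables, assembling the string by hand gets error-prone. Below is a small sketch of a builder for the filters shown above. buildDriveQuery and its option names are hypothetical; the only escaping applied is the backslash-escaped single quote that the Drive search syntax documents.

```javascript
// Hypothetical helper for composing DRIVE_QUERY values.

// Wrap a value in single quotes, escaping embedded single quotes as \'
function quote(value) {
  return `'${String(value).replace(/'/g, "\\'")}'`;
}

function buildDriveQuery({ folderId, mimeType, modifiedAfter } = {}) {
  const clauses = [];
  if (folderId) clauses.push(`${quote(folderId)} in parents`);
  if (mimeType) clauses.push(`mimeType = ${quote(mimeType)}`);
  if (modifiedAfter) clauses.push(`modifiedTime > ${quote(modifiedAfter)}`);
  clauses.push('trashed = false'); // always exclude trashed files
  return clauses.join(' and ');
}

console.log(buildDriveQuery({ folderId: 'folder-id' }));
console.log(buildDriveQuery({
  mimeType: 'application/pdf',
  modifiedAfter: '2026-01-01T00:00:00',
}));
```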
2. Adjust BASE_URL for Production
Development:
BASE_URL=http://localhost:3000
Production:
BASE_URL=https://your-domain.com
This ensures sitemap URLs point to the correct domain.
3. Monitor Memory Usage
Check memory usage (production):
node --inspect src/server.js
# Open chrome://inspect in Chrome DevTools
Expected: <256MB under normal load (<10 concurrent requests)
Security Best Practices
- Never commit Service Account JSON key file to version control
- Use environment variables for all sensitive configuration
- Restrict Service Account permissions to minimum required (readonly scope)
- Monitor logs for unauthorized access attempts
- Use HTTPS in production (configure reverse proxy like nginx)
- Filter credentials from logs (private_key field never logged)
Deployment
Docker (Recommended)
Dockerfile:
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
EXPOSE 3000
CMD ["npm", "start"]
Build and run:
docker build -t drive-sitemap-adapter .
docker run -p 3000:3000 \
-e GOOGLE_SERVICE_ACCOUNT_KEY='{"type":"service_account",...}' \
-e BASE_URL=https://your-domain.com \
drive-sitemap-adapter
Cloud Platforms
Google Cloud Run:
gcloud run deploy drive-sitemap-adapter \
--source . \
--set-env-vars BASE_URL=https://your-domain.com \
--set-secrets GOOGLE_SERVICE_ACCOUNT_KEY=service-account-key:latest
AWS ECS / Fargate: Use environment variables in task definition
Heroku: Set environment variables via Heroku CLI or dashboard
Additional Resources
- Feature Specification: specs/001-drive-proxy-adapter/spec.md
- Implementation Plan: specs/001-drive-proxy-adapter/plan.md
- Research Document: specs/001-drive-proxy-adapter/research.md
- Data Model: specs/001-drive-proxy-adapter/data-model.md
- API Contract: specs/001-drive-proxy-adapter/contracts/sitemap-xml-schema.md
- Google Drive API Docs: https://developers.google.com/drive/api/v3/reference
- Sitemap Protocol: https://www.sitemaps.org/protocol.html
Support
For issues or questions, refer to:
- This quickstart guide
- Feature specification (spec.md) for requirements
- Research document (research.md) for technical decisions
- Contract documentation (contracts/) for API details
Version History
| Version | Date | Changes |
|---|---|---|
| 1.0.0 | 2026-03-07 | Initial quickstart guide |