# Quickstart Guide: Google Drive HTTP Proxy Adapter

**Feature**: 001-drive-proxy-adapter
**Date**: 2026-03-07
**Version**: 1.0.0

---

## Overview

The Google Drive HTTP Proxy Adapter is a Node.js application that generates XML sitemaps of Google Drive documents. It exposes a single HTTP endpoint (`/sitemap.xml`) that queries the Google Drive API and returns a sitemap listing every accessible document, with each link in RESTful form (for example, `http://localhost:3000/documents/1A2B3C4D5E6F7G8H`).

**Key Features**:
- Service Account authentication (JWT-based, no user interaction)
- Sitemap protocol compliant (50,000 URL limit enforced)
- FIFO request queuing (sequential processing)
- Configurable Drive API filters
- Plain text logging to stdout/stderr

---

## Prerequisites

1. **Node.js**: v18.0.0 or later (LTS version recommended)
2. **Google Cloud Project**: With Drive API enabled
3. **Service Account**: JSON key file with Drive API access
4. **Network Access**: Connectivity to googleapis.com

---

## Installation

### 1. Clone Repository

```bash
git clone <repository-url>
cd google-drive-content-adapter
```

### 2. Install Dependencies

```bash
npm install
```

**Dependencies**:
- `googleapis@^140.0.0` - Official Google API client for Node.js

---

## Configuration

### 1. Service Account Setup

**Create Service Account** (Google Cloud Console):
1. Navigate to [IAM & Admin > Service Accounts](https://console.cloud.google.com/iam-admin/serviceaccounts)
2. Click "Create Service Account"
3. Name: `drive-sitemap-adapter` (or your choice)
4. Grant role: none required when accessing the Service Account's own Drive
5. Click "Create Key" → Choose JSON format → Download key file

**Enable Drive API**:
1. Navigate to [APIs & Services > Library](https://console.cloud.google.com/apis/library)
2. Search for "Google Drive API"
3. Click "Enable"

**Grant Access** (if accessing user drives):
- Share Drive folders/files with the Service Account email (`xxx@project.iam.gserviceaccount.com`)
- OR configure domain-wide delegation (for Google Workspace, formerly G Suite, organizations)

---

### 2. Environment Variables

Create a `.env` file in the project root (or set the environment variables directly):

```bash
# REQUIRED: Service Account credentials (inline JSON)
GOOGLE_SERVICE_ACCOUNT_KEY='{"type":"service_account","project_id":"your-project","private_key_id":"...","private_key":"-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n","client_email":"xxx@project.iam.gserviceaccount.com","client_id":"...","auth_uri":"https://accounts.google.com/o/oauth2/auth","token_uri":"https://oauth2.googleapis.com/token","auth_provider_x509_cert_url":"https://www.googleapis.com/oauth2/v1/certs","client_x509_cert_url":"..."}'

# OPTIONAL: Server configuration
PORT=3000                       # Default: 3000
BASE_URL=http://localhost:3000  # Default: http://localhost:3000

# OPTIONAL: Drive API query filter
DRIVE_QUERY="trashed = false"   # Default: "trashed = false"
```

**Important Notes**:
- `GOOGLE_SERVICE_ACCOUNT_KEY` must be a single-line JSON string (newlines inside `private_key` must be escaped as `\n`)
- `BASE_URL` should match your production domain, because it is the prefix of every URL in the sitemap
- `DRIVE_QUERY` supports Drive API query syntax ([docs](https://developers.google.com/drive/api/guides/search-files))

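One way to produce the single-line value is to round-trip the downloaded key file through `JSON.parse` and `JSON.stringify`: the result has no line breaks, and the newlines inside `private_key` are written as `\n` escapes. The sketch below assumes a hypothetical helper name, `toSingleLineJson`, which is not part of the adapter; the file text could be read with `fs.readFileSync(path, 'utf8')` and the output pasted between single quotes in `.env`.

```javascript
// Hypothetical helper (not part of the adapter): collapse a pretty-printed
// Service Account key file into the single-line JSON string expected in
// GOOGLE_SERVICE_ACCOUNT_KEY.
function toSingleLineJson(keyFileText) {
  // JSON.parse throws a SyntaxError if the text is not valid JSON.
  const key = JSON.parse(keyFileText);
  // JSON.stringify without an indent argument emits a single line and
  // escapes real newlines inside private_key as the two characters "\n".
  return JSON.stringify(key);
}

// Example input shaped like a downloaded key file (values are placeholders).
const prettyPrinted = `{
  "type": "service_account",
  "private_key": "-----BEGIN PRIVATE KEY-----\\nabc\\n-----END PRIVATE KEY-----\\n"
}`;

const singleLine = toSingleLineJson(prettyPrinted);
console.log(singleLine.includes("\n")); // false: no raw line breaks remain
```
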
---

### 3. Configuration Files

**config/config.js**: Server settings (auto-generated from env vars)
```javascript
export default {
  server: {
    port: process.env.PORT || 3000,
    baseUrl: process.env.BASE_URL || 'http://localhost:3000'
  }
};
```

**config/settings.js**: Drive API configuration
```javascript
export default {
  drive: {
    query: process.env.DRIVE_QUERY || "trashed = false",
    // nextPageToken must be requested explicitly; without it the response
    // omits the token and pagination past the first page cannot continue.
    fields: 'nextPageToken, files(id, name, mimeType, modifiedTime)',
    pageSize: 1000,
    scope: 'https://www.googleapis.com/auth/drive.readonly'
  }
};
```

**To customize the Drive API filter**, edit `config/settings.js` or set the `DRIVE_QUERY` env var.

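Paging through results with these settings can be sketched as follows. The sketch assumes a hypothetical `listPage` callback that performs one `files.list` request and resolves to `{ files, nextPageToken }`; with the `googleapis` client, that callback would typically wrap `drive.files.list({ q, fields, pageSize, pageToken })` and return its `data` property. The function name, the `maxFiles` option, and the `TOO_MANY_FILES` error code are illustrative, not the adapter's actual code.

```javascript
// Illustrative sketch: collect every page of Drive results, stopping early
// once the sitemap limit is exceeded. `listPage` is a hypothetical callback
// that performs one files.list request for the given page token.
async function listAllFiles(listPage, { maxFiles = 50000 } = {}) {
  const files = [];
  let pageToken; // undefined requests the first page

  do {
    const page = await listPage(pageToken);
    files.push(...(page.files ?? []));

    if (files.length > maxFiles) {
      // Assumed error shape; a server could map this code to 413.
      const err = new Error(`More than ${maxFiles} documents matched`);
      err.code = "TOO_MANY_FILES";
      throw err;
    }
    pageToken = page.nextPageToken; // absent on the last page
  } while (pageToken);

  return files;
}

// Usage with a fake two-page backend standing in for the Drive API.
const fakePages = {
  first: { files: [{ id: "a" }, { id: "b" }], nextPageToken: "p2" },
  p2: { files: [{ id: "c" }] },
};
const fakeListPage = async (token) => fakePages[token ?? "first"];

listAllFiles(fakeListPage).then((all) => console.log(all.map((f) => f.id)));
// [ 'a', 'b', 'c' ]
```
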
---

## Usage

### Start Server (Development)

```bash
npm run dev
```

**Output**:
```
[2026-03-07T10:00:00.000Z] [INFO] Server configuration loaded: port=3000, baseUrl=http://localhost:3000
[2026-03-07T10:00:00.100Z] [INFO] Service Account authenticated: xxx***@project.iam.gserviceaccount.com
[2026-03-07T10:00:00.200Z] [INFO] HTTP server listening on port 3000
```

---

### Start Server (Production)

```bash
npm start
```

---

### Request Sitemap

**Using curl**:
```bash
curl http://localhost:3000/sitemap.xml
```

**Expected Response** (200 OK):
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://localhost:3000/documents/1A2B3C4D5E6F7G8H</loc>
    <lastmod>2026-03-07</lastmod>
  </url>
  <url>
    <loc>http://localhost:3000/documents/9I0J1K2L3M4N5O6P</loc>
    <lastmod>2026-03-05</lastmod>
  </url>
</urlset>
```

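A response of this shape could be produced from Drive file metadata roughly as shown below. This is a minimal sketch under assumptions: the function names `escapeXml` and `buildSitemap` are hypothetical (the adapter's `xml-utils.js` may look different), and each file is assumed to carry the `id` and `modifiedTime` fields requested from the Drive API.

```javascript
// Illustrative sketch: build sitemap XML from Drive file metadata.
// Function names are hypothetical; the adapter's xml-utils.js may differ.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

function buildSitemap(baseUrl, files) {
  const root = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  const entries = files.map((file) => {
    const loc = `${root}/documents/${encodeURIComponent(file.id)}`;
    // modifiedTime is an RFC 3339 timestamp; keep only the UTC date part.
    const lastmod = new Date(file.modifiedTime).toISOString().slice(0, 10);
    return [
      "  <url>",
      `    <loc>${escapeXml(loc)}</loc>`,
      `    <lastmod>${lastmod}</lastmod>`,
      "  </url>",
    ].join("\n");
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</urlset>",
  ].join("\n");
}

console.log(
  buildSitemap("http://localhost:3000", [
    { id: "1A2B3C4D5E6F7G8H", modifiedTime: "2026-03-07T10:00:00.000Z" },
  ])
);
```
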
---

## Testing

### Run All Tests

```bash
npm test
```

**Test Suites**:
- `tests/unit/` - Unit tests for Drive client, auth, sitemap generator, queue
- `tests/integration/` - End-to-end endpoint tests for /sitemap.xml
- `tests/contract/` - XML sitemap schema validation tests

---

### Run Specific Test Suite

```bash
npm run test:unit         # Unit tests only
npm run test:integration  # Integration tests only
npm run test:contract     # Contract tests only
```

---

## API Reference

### Endpoint: `GET /sitemap.xml`

**Description**: Generate XML sitemap of all accessible Google Drive documents.

**Request**:
```http
GET /sitemap.xml HTTP/1.1
Host: example.com
```

**Success Response** (200 OK):
```http
HTTP/1.1 200 OK
Content-Type: application/xml; charset=utf-8
Content-Length: {size}

```

**Error Responses**:
- `404 Not Found` - Invalid endpoint (only /sitemap.xml supported)
- `413 Payload Too Large` - More than 50,000 documents in Drive
- `429 Too Many Requests` - Rate limit exceeded (includes `Retry-After` header)
- `401 Unauthorized` - Authentication failed
- `503 Service Unavailable` - Drive API unavailable
- `500 Internal Server Error` - Unexpected error

**Note**: All error responses have an **empty body** (status code only).

See [contracts/sitemap-xml-schema.md](./contracts/sitemap-xml-schema.md) for full API contract.

---

## Architecture

### Project Structure

```
google-drive-content-adapter/
├── src/
│   ├── server.js          # HTTP server entry point
│   ├── proxy.js           # Monolithic route handler (sitemap logic)
│   ├── logger.js          # Logging module (console.js alias)
│   ├── auth.js            # Service Account JWT authentication
│   └── xml-utils.js       # XML generation utilities
├── config/
│   ├── config.js          # Server configuration (port, baseUrl)
│   └── settings.js        # Drive API filter configuration
├── tests/
│   ├── unit/              # Unit tests
│   ├── integration/       # Integration tests
│   └── contract/          # Contract tests
├── specs/                 # Feature specifications and planning docs
│   └── 001-drive-proxy-adapter/
│       ├── spec.md
│       ├── plan.md
│       ├── research.md
│       ├── data-model.md
│       ├── quickstart.md (this file)
│       └── contracts/
│           └── sitemap-xml-schema.md
├── package.json
└── README.md
```

---

### Request Flow

```
1. Client → GET /sitemap.xml
2. Server → Create RequestContext (ID, timestamp)
3. Server → Enqueue request (FIFO queue)
4. Queue → Process request (sequential, one at a time)
5. Proxy → Authenticate with Service Account JWT
6. Proxy → Query Drive API files.list() (paginate if >1000 docs)
7. Proxy → Check count ≤ 50,000
8. Proxy → Transform Documents to SitemapEntries
9. Proxy → Generate XML sitemap
10. Server → Return 200 + XML (or error status)
11. Queue → Process next request
```

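Steps 3, 4, and 11 describe strictly sequential processing. One common way to get that behavior in Node.js is to chain each request onto the promise of the previous one; the sketch below illustrates the idea. The `FifoQueue` class and its `enqueue` method are hypothetical names and not necessarily how the adapter implements its queue.

```javascript
// Illustrative sketch of sequential FIFO processing with a promise chain.
// The class name and API are hypothetical, not the adapter's actual queue.
class FifoQueue {
  constructor() {
    this.tail = Promise.resolve(); // settles when the last queued task ends
  }

  // Schedule `task` (a function returning a promise) after all earlier tasks.
  enqueue(task) {
    const result = this.tail.then(() => task());
    // Swallow rejections on the chain only, so a failed task does not block
    // later ones; the caller still sees the rejection through `result`.
    this.tail = result.catch(() => {});
    return result;
  }
}

// Usage: three "requests" finish in arrival order despite different durations.
const queue = new FifoQueue();
const order = [];
const fakeRequest = (id, ms) => () =>
  new Promise((resolve) =>
    setTimeout(() => {
      order.push(id);
      resolve(id);
    }, ms)
  );

const done = Promise.all([
  queue.enqueue(fakeRequest("req-1", 30)),
  queue.enqueue(fakeRequest("req-2", 10)),
  queue.enqueue(fakeRequest("req-3", 20)),
]);
done.then(() => console.log(order)); // [ 'req-1', 'req-2', 'req-3' ]
```
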
---

## Troubleshooting

### 1. Fatal Error: Invalid Service Account Credentials

**Error**:
```
[2026-03-07T10:00:00.000Z] [ERROR] FATAL: Invalid client_email in Service Account credentials
```

**Solution**:
- Check `GOOGLE_SERVICE_ACCOUNT_KEY` env var is valid JSON
- Ensure `client_email` field ends with `.gserviceaccount.com`
- Ensure `private_key` field starts with `-----BEGIN PRIVATE KEY-----`
- Verify no extra escaping/quotes in JSON string

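The checks in this list can be reproduced outside the server to find out which one fails. The following is a minimal sketch, assuming a hypothetical function name `parseServiceAccountKey` and illustrative error messages; the adapter's own validation in `auth.js` may be stricter or worded differently.

```javascript
// Illustrative sketch: startup checks mirroring the list above.
// The function name and most error messages are hypothetical.
function parseServiceAccountKey(raw) {
  if (!raw) {
    throw new Error("GOOGLE_SERVICE_ACCOUNT_KEY is not set");
  }
  let key;
  try {
    key = JSON.parse(raw);
  } catch (cause) {
    throw new Error(`GOOGLE_SERVICE_ACCOUNT_KEY is not valid JSON: ${cause.message}`);
  }
  if (key === null || typeof key !== "object") {
    throw new Error("GOOGLE_SERVICE_ACCOUNT_KEY must be a JSON object");
  }
  if (typeof key.client_email !== "string" ||
      !key.client_email.endsWith(".gserviceaccount.com")) {
    throw new Error("Invalid client_email in Service Account credentials");
  }
  if (typeof key.private_key !== "string" ||
      !key.private_key.startsWith("-----BEGIN PRIVATE KEY-----")) {
    throw new Error("Invalid private_key in Service Account credentials");
  }
  return key;
}

// Usage with placeholder values; a real check would pass
// process.env.GOOGLE_SERVICE_ACCOUNT_KEY instead.
const sample = JSON.stringify({
  client_email: "xxx@project.iam.gserviceaccount.com",
  private_key: "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
});
console.log(parseServiceAccountKey(sample).client_email);
```
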
---

### 2. Fatal Error: Port Already in Use

**Error**:
```
[2026-03-07T10:00:00.000Z] [ERROR] FATAL: Unable to bind to port 3000 (EADDRINUSE)
```

**Solution**:
- Set the `PORT` env var to a different port (e.g., 8080)
- OR stop the other process using port 3000: `lsof -ti:3000 | xargs kill`

---

### 3. 401 Unauthorized Response

**Cause**: Service Account token refresh failed

**Solution**:
- Verify the Service Account has Drive API access (share folders with the Service Account email)
- Check that the Drive API is enabled in Google Cloud Console
- Ensure the scope is correct: `https://www.googleapis.com/auth/drive.readonly`

---

### 4. 413 Payload Too Large Response

**Cause**: The Drive query matches more than 50,000 documents

**Solution**:
- Adjust `DRIVE_QUERY` to filter documents (e.g., by folder, date, file type)
- Example: `DRIVE_QUERY="'folder-id' in parents and trashed = false"`

---

### 5. 429 Too Many Requests Response

**Cause**: Drive API rate limit exceeded

**Solution**:
- Wait for the time specified in the `Retry-After` response header (seconds)
- Reduce request frequency
- Consider Drive API quota limits ([docs](https://developers.google.com/drive/api/guides/limits))

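On the client side, honoring `Retry-After` could look like the sketch below. It assumes the header carries a whole number of seconds, as stated above, and relies on the global `fetch` available in Node.js 18 and later. The names `retryAfterMs` and `fetchSitemap`, the `fetchImpl` and `maxAttempts` options, and the one-second fallback are illustrative choices, not part of the adapter.

```javascript
// Illustrative client-side sketch: retry a sitemap request after a 429,
// waiting the number of seconds given in Retry-After. Names are hypothetical.
function retryAfterMs(headerValue, fallbackMs = 1000) {
  const seconds = Number.parseInt(headerValue ?? "", 10);
  return Number.isFinite(seconds) && seconds >= 0 ? seconds * 1000 : fallbackMs;
}

async function fetchSitemap(url, { fetchImpl = fetch, maxAttempts = 3 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const response = await fetchImpl(url);
    if (response.status !== 429) return response;
    if (attempt < maxAttempts) {
      // Header lookup is case-insensitive on fetch Headers objects.
      const waitMs = retryAfterMs(response.headers.get("retry-after"));
      await new Promise((resolve) => setTimeout(resolve, waitMs));
    }
  }
  throw new Error(`Still rate limited after ${maxAttempts} attempts`);
}

// Usage (requires a running server, so it is left commented out here):
// fetchSitemap("http://localhost:3000/sitemap.xml")
//   .then((res) => res.text())
//   .then((xml) => console.log(xml));
```
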
---

### 6. 503 Service Unavailable Response

**Cause**: Google Drive API is temporarily unavailable

**Solution**:
- Wait and retry manually (no automatic retries per spec)
- Check [Google Workspace Status Dashboard](https://www.google.com/appsstatus)

---

## Performance Tips

### 1. Optimize Drive Query Filter

**Default** (all files):
```bash
DRIVE_QUERY="trashed = false"
```

**Filter by folder**:
```bash
DRIVE_QUERY="'folder-id' in parents and trashed = false"
```

**Filter by date**:
```bash
DRIVE_QUERY="modifiedTime > '2026-01-01T00:00:00' and trashed = false"
```

**Filter by MIME type**:
```bash
DRIVE_QUERY="mimeType = 'application/pdf' and trashed = false"
```

See [Drive API search query syntax](https://developers.google.com/drive/api/guides/search-files) for more options.

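When several of these filters need to be combined, the query string can be assembled programmatically. The sketch below is one possible approach; `quote`, `buildDriveQuery`, and the option names are hypothetical. It escapes embedded single quotes as `\'`, the form the Drive search documentation describes, and always appends `trashed = false`.

```javascript
// Illustrative sketch: compose a DRIVE_QUERY value from optional filters.
// The function and option names are hypothetical.
function quote(value) {
  // Drive query strings are single-quoted; embedded quotes become \'.
  return `'${String(value).replace(/'/g, "\\'")}'`;
}

function buildDriveQuery({ folderId, modifiedAfter, mimeType } = {}) {
  const clauses = [];
  if (folderId) clauses.push(`${quote(folderId)} in parents`);
  if (modifiedAfter) clauses.push(`modifiedTime > ${quote(modifiedAfter)}`);
  if (mimeType) clauses.push(`mimeType = ${quote(mimeType)}`);
  clauses.push("trashed = false"); // always exclude trashed files
  return clauses.join(" and ");
}

console.log(buildDriveQuery({ folderId: "folder-id", mimeType: "application/pdf" }));
// 'folder-id' in parents and mimeType = 'application/pdf' and trashed = false
```
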
---

### 2. Adjust BASE_URL for Production

**Development**:
```
BASE_URL=http://localhost:3000
```

**Production**:
```
BASE_URL=https://your-domain.com
```

This ensures sitemap URLs point to the correct domain.

---

### 3. Monitor Memory Usage

**Check memory usage** (production):
```bash
node --inspect src/server.js
# Then open chrome://inspect in Chrome and attach DevTools to the process
```

**Expected**: <256MB under normal load (<10 concurrent requests)

---

## Security Best Practices

1. **Never commit** Service Account JSON key file to version control
2. **Use environment variables** for all sensitive configuration
3. **Restrict Service Account permissions** to minimum required (readonly scope)
4. **Monitor logs** for unauthorized access attempts
5. **Use HTTPS** in production (configure reverse proxy like nginx)
6. **Filter credentials from logs** (private_key field never logged)

---

## Deployment

### Docker (Recommended)

**Dockerfile**:
```dockerfile
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
# --omit=dev replaces the deprecated --only=production flag
RUN npm ci --omit=dev
COPY . .
EXPOSE 3000
CMD ["npm", "start"]
```

**Build and run**:
```bash
docker build -t drive-sitemap-adapter .
docker run -p 3000:3000 \
  -e GOOGLE_SERVICE_ACCOUNT_KEY='{"type":"service_account",...}' \
  -e BASE_URL=https://your-domain.com \
  drive-sitemap-adapter
```

---

### Cloud Platforms

**Google Cloud Run**:
```bash
gcloud run deploy drive-sitemap-adapter \
  --source . \
  --set-env-vars BASE_URL=https://your-domain.com \
  --set-secrets GOOGLE_SERVICE_ACCOUNT_KEY=service-account-key:latest
```

**AWS ECS / Fargate**: Use environment variables in task definition

**Heroku**: Set environment variables via Heroku CLI or dashboard

---

## Additional Resources

- **Feature Specification**: [specs/001-drive-proxy-adapter/spec.md](./spec.md)
- **Implementation Plan**: [specs/001-drive-proxy-adapter/plan.md](./plan.md)
- **Research Document**: [specs/001-drive-proxy-adapter/research.md](./research.md)
- **Data Model**: [specs/001-drive-proxy-adapter/data-model.md](./data-model.md)
- **API Contract**: [specs/001-drive-proxy-adapter/contracts/sitemap-xml-schema.md](./contracts/sitemap-xml-schema.md)
- **Google Drive API Docs**: [https://developers.google.com/drive/api/v3/reference](https://developers.google.com/drive/api/v3/reference)
- **Sitemap Protocol**: [https://www.sitemaps.org/protocol.html](https://www.sitemaps.org/protocol.html)

---

## Support

For issues or questions, refer to:
1. This quickstart guide
2. Feature specification (spec.md) for requirements
3. Research document (research.md) for technical decisions
4. Contract documentation (contracts/) for API details

---

## Version History

| Version | Date | Changes |
|---------|------|---------|
| 1.0.0 | 2026-03-07 | Initial quickstart guide |