Peter.Morton 1fecaf52f7 Implement literal function body pattern for global variable functions
Changed architecture so .js files contain literal function bodies

Changes to googleDriveAdapterHelper.js:
1. Changed from object literal (`({...})`) to return statement (`return {...}`)
2. File now contains LITERAL BODY of a function
3. Updated header comment to explain this pattern
4. File is NO LONGER valid standalone JavaScript (has bare return)

Changes to server.js loadGlobalVariables():
1. Wrap loaded code in function: `(function() { ${code} })()`
2. Creates IIFE that executes the function body
3. Captures returned object from the function
4. Pattern applies to ALL .js files in globalVariables/

Pattern:
```javascript
// File: googleDriveAdapterHelper.js (literal function body)
class DocumentCountExceededError extends Error {...}
function generateRequestId() {...}

return {
  DocumentCountExceededError,
  generateRequestId,
  // ... all exports
};
```

```javascript
// server.js wraps it:
const wrappedCode = `(function() {
${code}
})()`;
const script = new vm.Script(wrappedCode, { filename: file });
const returnedObject = script.runInContext(context);
```

Benefits:
- Files represent pure function bodies
- Wrapping logic centralized in server.js
- Clear separation: content vs. execution wrapper
- Explicit return statement in function body
- Consistent pattern for all global variable functions
- Easy to understand: file = function body, server = wrapper

Testing:
✓ Server starts successfully
✓ Module loads: 'Loaded global functions: googleDriveAdapterHelper'
✓ Object captured: type=object, keys=11
✓ All functions accessible in VM context
✓ proxy.js can call googleDriveAdapterHelper.* functions

Note: googleDriveAdapterHelper.js will show syntax error if run standalone
(has bare return statement) - this is intentional, it's a function body!

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-07 12:31:23 -06:00

Google Drive Sitemap Adapter

HTTP service that generates XML sitemaps listing all accessible documents in a Google Drive account. Uses Service Account authentication for secure, automated access.

Features

  • Sitemap Generation: XML sitemap at /sitemap.xml listing all accessible Google Drive documents
  • RESTful URLs: Document links in format /documents/{documentId} per sitemap protocol
  • Service Account Auth: JWT-based authentication using Google Service Account credentials
  • Pagination Support: Handles large document sets (up to 50,000 URLs per sitemap protocol)
  • 50k Limit Enforcement: Returns 413 error if document count exceeds sitemap protocol limit
  • FIFO Request Queue: Concurrent requests processed sequentially (one at a time)
  • Rate Limit Handling: Returns 429 with a Retry-After header when the Drive API rate limit is exceeded
  • No Retry on 503: Fails immediately on Drive API unavailability (per spec)
  • Minimal Dependencies: Only googleapis package required
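
As a rough illustration of how pagination and the 50,000-URL limit could fit together, here is a minimal sketch. It assumes a hypothetical `fetchPage(pageToken)` callback that resolves to `{ files, nextPageToken }`, the shape returned by the Drive API `files.list` call. `listAllDocuments` is an illustrative name, and the body of `DocumentCountExceededError` is assumed rather than taken from the repository.

```javascript
// Sketch only: fetchPage stands in for a Drive API files.list call.
const MAX_URLS = 50000; // sitemap protocol limit per sitemap file

// Assumed shape; the repository's own error class may differ.
class DocumentCountExceededError extends Error {
  constructor(count, limit) {
    super(`Document count ${count} exceeds sitemap limit ${limit}`);
    this.name = 'DocumentCountExceededError';
    this.count = count;
    this.limit = limit;
  }
}

// fetchPage(pageToken) -> Promise<{ files: Array, nextPageToken?: string }>
async function listAllDocuments(fetchPage, limit = MAX_URLS) {
  const documents = [];
  let pageToken; // undefined requests the first page
  do {
    const page = await fetchPage(pageToken);
    documents.push(...page.files);
    // Stop as soon as the limit is crossed instead of fetching every page.
    if (documents.length > limit) {
      throw new DocumentCountExceededError(documents.length, limit);
    }
    pageToken = page.nextPageToken;
  } while (pageToken);
  return documents;
}
```

A request handler could catch `DocumentCountExceededError` and answer with 413, which keeps the limit check next to the paging loop rather than in the XML generator.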

Quick Start

Prerequisites

  • Node.js v18.x or later
  • Google Cloud Project with Drive API enabled
  • Service Account credentials with Drive API access

Setup

  1. Install dependencies:

    npm install
    
  2. Configure Service Account (see specs/001-drive-proxy-adapter/quickstart.md for detailed steps):

    • Create Service Account in Google Cloud Console
    • Download service account key JSON file
    • Share Drive files/folders with service account email
    • Place key file at config/service-account-key.json
  3. Configure environment:

    cp .env.example .env
    # Edit .env with your service account email
    
  4. Start the server:

    npm start
    # or for development with auto-reload:
    npm run dev
    
  5. Generate sitemap:

    curl http://localhost:3000/sitemap.xml
    

Usage Examples

# Get sitemap of all documents
curl http://localhost:3000/sitemap.xml

# Verify XML format
curl http://localhost:3000/sitemap.xml | xmllint --noout -

# Count documents in sitemap
curl http://localhost:3000/sitemap.xml | grep -c '<loc>'

Architecture

Monolithic Design

This project follows a monolithic architecture as specified in the project constitution:

  • Single Route File: ALL routing, business logic, and Drive API integration in src/proxy.js (~350 LOC)
  • Utility Modules: Separate files for auth, logging, XML utils (constitution-compliant separation of concerns)
  • Configuration as Data: JSON configuration in config/default.json loaded into global.config at startup
  • Minimal Dependencies: Only googleapis package for Drive API integration

Why Monolithic?

The rationale is defined in the constitution:

  1. Simplicity: Easy to understand, debug, and maintain
  2. Direct Code Flow: No dependency injection, no framework magic
  3. YAGNI Principle: No premature abstraction for a focused service

Structure

src/
├── server.js           # HTTP server, config loader, validation
├── proxy.js            # Request handler with FIFO queue integration
├── drive-client.js     # Drive API integration with 50k limit enforcement
├── sitemap-generator.js # Sitemap XML generation with RESTful URLs
├── queue.js            # FIFO request queue (sequential processing)
├── auth.js             # Service Account authentication
├── logger.js           # Structured logging utility
├── utils.js            # Request ID, validation
└── xml-utils.js        # XML escaping
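
For readers wondering what "sequential processing" in queue.js could look like, the following is a minimal sketch of a promise-chain FIFO queue. The class and method names are illustrative assumptions, not the repository's actual API.

```javascript
// Sketch only: a promise-chain FIFO queue; names are illustrative.
class FifoQueue {
  constructor() {
    // Settles when the most recently enqueued task has finished.
    this.tail = Promise.resolve();
  }

  // Runs task() only after every previously enqueued task has settled.
  // Returns a promise for this task's own result (or rejection).
  enqueue(task) {
    const result = this.tail.then(() => task());
    // Swallow the rejection on the chain so one failure does not block later tasks.
    this.tail = result.catch(() => {});
    return result;
  }
}
```

Each incoming request would call `queue.enqueue(() => handleSitemapRequest(...))`, where `handleSitemapRequest` is a hypothetical handler, so at most one Drive listing runs at a time while callers still receive their own result or error.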

Testing

Test Structure

Tests follow a TDD workflow with real assertions:

tests/
├── contract/      # API contract tests (HTTP interface)
├── integration/   # Drive API integration tests
└── unit/          # Pure function unit tests

Running Tests

# All tests
npm test

# Specific test suites
npm run test:unit
npm run test:integration
npm run test:contract

Coverage Requirements

  • Minimum: 80% code coverage (enforced)
  • Tests Written First: TDD mandatory per constitution
  • Real Assertions: No placeholder tests

Configuration

Configuration is loaded from config/default.json and merged with environment variables:

{
  "server": {
    "port": 3000,
    "host": "0.0.0.0",
    "baseUrl": "http://localhost:3000"
  },
  "google": {
    "serviceAccountEmail": "service@project.iam.gserviceaccount.com",
    "serviceAccountKeyPath": "./config/service-account-key.json",
    "scopes": ["https://www.googleapis.com/auth/drive.readonly"]
  },
  "sitemap": {
    "maxUrls": 50000
  },
  "logging": {
    "level": "info"
  }
}

Environment variables override JSON config (e.g., PORT, GOOGLE_SERVICE_ACCOUNT_EMAIL).
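
One way the override step might work is sketched below. `applyEnvOverrides` is a hypothetical helper, and only the two environment variables named above are handled; the real loader may map more.

```javascript
// Sketch only: applyEnvOverrides is a hypothetical helper, not the repository's loader.
function applyEnvOverrides(fileConfig, env = process.env) {
  // Deep-copy so the parsed JSON defaults are left untouched.
  const config = JSON.parse(JSON.stringify(fileConfig));
  if (env.PORT) {
    config.server.port = Number.parseInt(env.PORT, 10);
  }
  if (env.GOOGLE_SERVICE_ACCOUNT_EMAIL) {
    config.google.serviceAccountEmail = env.GOOGLE_SERVICE_ACCOUNT_EMAIL;
  }
  return config;
}

// Example with inline defaults instead of reading config/default.json.
const defaults = {
  server: { port: 3000, host: '0.0.0.0' },
  google: { serviceAccountEmail: 'service@project.iam.gserviceaccount.com' },
};
const merged = applyEnvOverrides(defaults, { PORT: '8080' });
console.log(merged.server.port); // 8080
```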

API Documentation

Endpoints

  • GET /sitemap.xml - XML sitemap of all accessible documents (200 OK with XML body)
  • GET /* - All other paths return 404 Not Found (empty body)
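
To make the /sitemap.xml body concrete, here is a minimal sketch of building the XML from a list of document IDs. The function names are illustrative assumptions; the URL format /documents/{documentId} and the need for XML escaping come from this README.

```javascript
// Sketch only: function names are illustrative, not the repository's API.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;') // must run first so later entities are not double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

function buildSitemap(baseUrl, documentIds) {
  const urls = documentIds.map((id) => {
    // Percent-encode the ID for the URL path, then escape the whole URL for XML.
    const loc = `${baseUrl}/documents/${encodeURIComponent(id)}`;
    return `  <url><loc>${escapeXml(loc)}</loc></url>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    '</urlset>',
  ].join('\n');
}
```

Calling `buildSitemap('http://localhost:3000', ['abc123'])` yields a `<urlset>` with one `<loc>` entry pointing at `http://localhost:3000/documents/abc123`.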

Response Headers

Successful sitemap response (200 OK):

  • Content-Type: application/xml; charset=utf-8
  • X-Request-Id: req_<uuid> - Request tracing ID
  • X-Document-Count: <number> - Number of documents in sitemap
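
The headers above could be assembled roughly as follows. This is a sketch under assumptions: `sitemapHeaders` is a hypothetical helper, and the body of `generateRequestId` is a guess at how the `req_<uuid>` format might be produced using Node's built-in `crypto.randomUUID()`.

```javascript
const { randomUUID } = require('node:crypto');

// Sketch only: produces IDs in the documented req_<uuid> format.
function generateRequestId() {
  return `req_${randomUUID()}`;
}

// Hypothetical helper that collects the headers for a 200 sitemap response.
function sitemapHeaders(documentCount, requestId = generateRequestId()) {
  return {
    'Content-Type': 'application/xml; charset=utf-8',
    'X-Request-Id': requestId,
    'X-Document-Count': String(documentCount), // header values are sent as strings
  };
}
```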

Error Responses

All errors return only an HTTP status code, with no response body (per specification):

  • 401 Unauthorized - Service account authentication failed
  • 404 Not Found - Path is not /sitemap.xml
  • 413 Payload Too Large - Document count exceeds 50,000 (sitemap protocol limit)
  • 429 Too Many Requests - Drive API rate limit exceeded (includes Retry-After header in seconds)
  • 500 Internal Server Error - Server error
  • 503 Service Unavailable - Drive API unavailable (NO RETRY per specification)
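
The mapping from failures to status-only responses might be sketched like this. The error shapes (`name`, `status`, `retryAfterSeconds`) and the 60-second fallback are assumptions made for illustration; the repository may represent Drive API failures differently.

```javascript
// Sketch only: the error shapes and the 60-second fallback are assumptions.
function errorToResponse(err) {
  if (err.name === 'DocumentCountExceededError') {
    return { status: 413, headers: {} };
  }
  if (err.status === 429) {
    const retryAfter = Number.isFinite(err.retryAfterSeconds) ? err.retryAfterSeconds : 60;
    return { status: 429, headers: { 'Retry-After': String(retryAfter) } };
  }
  if (err.status === 401 || err.status === 503) {
    // Passed through as-is; a 503 is not retried.
    return { status: err.status, headers: {} };
  }
  return { status: 500, headers: {} };
}

// Writes the status line and headers only; the body stays empty.
function sendError(res, err) {
  const { status, headers } = errorToResponse(err);
  res.writeHead(status, headers);
  res.end();
}
```

Keeping the mapping in a pure function like `errorToResponse` makes it easy to unit-test without starting an HTTP server.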

Performance Characteristics

  • Cold Start: < 10 seconds to accepting requests
  • Sitemap Generation: < 5 seconds for 10,000 documents
  • Concurrent Requests: 10+ without degradation
  • Memory Usage: < 256MB under normal load

Development

Project Structure

google-drive-content-adapter/
├── config/
│   └── default.json           # Configuration
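├── src/
│   ├── server.js              # HTTP server
│   ├── proxy.js               # Request handler (monolithic)
│   ├── drive-client.js        # Drive API integration
│   ├── sitemap-generator.js   # Sitemap XML generation
│   ├── queue.js               # FIFO request queue
│   ├── auth.js                # Service Account auth
│   ├── logger.js              # Structured logging
│   ├── utils.js               # Utilities
│   └── xml-utils.js           # XML escaping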
├── tests/
│   ├── contract/              # API contract tests
│   ├── integration/           # Integration tests
│   └── unit/                  # Unit tests
├── specs/
│   └── 001-drive-proxy-adapter/  # Feature spec, plan, tasks
├── .env.example               # Environment template
├── package.json               # Dependencies and scripts
└── README.md                  # This file

Development Workflow

  1. Write Tests First (TDD)
  2. Implement Minimum Code
  3. Run Tests: npm test
  4. Run in Development: npm run dev

Deployment

Docker

FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY src/ ./src/
COPY config/ ./config/
EXPOSE 3000
CMD ["node", "src/server.js"]

Build and run the image:

docker build -t drive-sitemap-adapter .
docker run -p 3000:3000 -v "$(pwd)/config:/app/config" drive-sitemap-adapter

Direct Node.js

NODE_ENV=production npm start

Troubleshooting

Authentication Failed (401)

  • Verify service account key file exists at config/service-account-key.json
  • Check service account email matches configuration
  • Ensure Drive API is enabled in Google Cloud project

Empty Sitemap

  • Service account needs access to Drive files
  • Share files/folders with service account email
  • Check service account has "Viewer" permission

Rate Limit (429)

  • Wait for time specified in Retry-After header
  • Reduce frequency of sitemap requests
  • Check Google Cloud Console quotas
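
On the client side, the wait time can be derived from the Retry-After header, which this service sends in seconds. The helper below is a hypothetical sketch; the 1-second fallback for a missing or malformed header is an assumption.

```javascript
// Sketch only: retryDelayMs is a hypothetical client-side helper.
// Converts a Retry-After value in seconds to milliseconds, falling back
// to fallbackMs when the header is missing or not a non-negative integer.
function retryDelayMs(retryAfterHeader, fallbackMs = 1000) {
  const seconds = Number.parseInt(retryAfterHeader, 10);
  return Number.isNaN(seconds) || seconds < 0 ? fallbackMs : seconds * 1000;
}

console.log(retryDelayMs('30')); // 30000
console.log(retryDelayMs(undefined)); // 1000
```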

License

ISC

Documentation

For detailed setup and usage instructions, see:

Description
Example of using GitHub speckit to build a proxy script that provides an adapter for crawling Google Drive