Added new feature for document export, including API contracts, data model, implementation plan, and tests. Updated related configurations and instructions.

2026-03-10 16:25:09 -05:00
parent 2acb04ad76
commit bf6f2eebd6
22 changed files with 2856 additions and 64 deletions

@@ -0,0 +1,337 @@
# Quickstart Guide: Document Export API
**Feature**: 002-document-export
**Date**: 2026-03-09
**Audience**: Developers and API consumers
## Overview
The Document Export API provides a simple HTTP endpoint for exporting Google Drive documents in multiple formats. The system automatically selects the best available format (Markdown > HTML > PDF) and streams the content with appropriate headers.
---
## Quick Start
### 1. Start the Proxy Server
```bash
# Install dependencies (if not already done)
npm install
# Start server in development mode (with auto-reload)
npm run dev
# Or start in production mode
npm start
```
The server starts on `http://localhost:3000` by default. The host and port are configurable via `config/default.json`.
---
### 2. Export a Document
**Basic Request**:
```bash
curl http://localhost:3000/documents/{DOCUMENT_ID}
```
**Example (Export Google Doc as Markdown)**:
```bash
curl http://localhost:3000/documents/1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms \
-o output.md
```
**Example (Export Native PDF)**:
```bash
curl http://localhost:3000/documents/1AbcDeFgHiJkLmNoPqRsTuVwXyZ1234567890 \
-o output.pdf
```
**Save with Original Filename**:
```bash
# The Content-Disposition header includes the original filename
curl -OJ http://localhost:3000/documents/{DOCUMENT_ID}
```
---
## Finding Document IDs
### From Google Drive URL
Google Drive URLs contain the document ID:
```
https://docs.google.com/document/d/DOCUMENT_ID/edit
https://drive.google.com/file/d/DOCUMENT_ID/view
```
**Example**:
- URL: `https://docs.google.com/document/d/1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms/edit`
- Document ID: `1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms`
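
When document IDs come from pasted URLs, the extraction can be scripted. The following is a minimal sketch, assuming the ID is always the path segment after `/d/` as in the two URL patterns above. The `extractDocumentId` helper is hypothetical and not part of the proxy.

```javascript
// Hypothetical helper: pulls the document ID out of a Google Docs or Drive URL.
// Assumption: the ID is the path segment that follows "/d/".
function extractDocumentId(input) {
  const match = /\/d\/([A-Za-z0-9_-]+)/.exec(input);
  if (match) {
    return match[1];
  }
  // Fall back to treating the input as a bare ID if it looks like one.
  return /^[A-Za-z0-9_-]+$/.test(input) ? input : null;
}

const id = extractDocumentId(
  'https://docs.google.com/document/d/1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms/edit'
);
console.log(id); // 1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms
```

The helper returns `null` for input that is neither a recognizable URL nor a plausible bare ID, so callers can reject it before sending a request.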
---
## Supported Formats
### Google Workspace Documents
Google Workspace documents are automatically exported in the best available format:
| Document Type | Preferred Format | Fallback Formats |
|---------------|------------------|------------------|
| Google Docs | Markdown (.md) | HTML (.html), PDF (.pdf) |
| Google Sheets | HTML (.html) | PDF (.pdf) |
| Google Slides | PDF (.pdf) | - |
### Native Files
| File Type | Behavior |
|-----------|----------|
| PDF | Streamed directly (no conversion) |
| Images, Videos, Archives | Returns 403 "mimetype not supported" |
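
The selection rules in the two tables above can be expressed as a small lookup. This is only an illustrative sketch, not the proxy's actual implementation: `EXPORT_PREFERENCES` and `selectExportFormat` are hypothetical names, the `availableFormats` argument is an assumption about how available export formats would be known, and the target types reuse the `Content-Type` values this guide lists. The guide does not say what happens when none of the preferred formats is available, so the sketch returns `null` in that case.

```javascript
// Preference order per source MIME type, taken from the tables above.
// The keys are the standard Google Workspace MIME types; all names here are
// hypothetical and chosen for illustration.
const EXPORT_PREFERENCES = {
  'application/vnd.google-apps.document': ['text/x-markdown', 'text/html', 'application/pdf'],
  'application/vnd.google-apps.spreadsheet': ['text/html', 'application/pdf'],
  'application/vnd.google-apps.presentation': ['application/pdf'],
};

function selectExportFormat(sourceMimeType, availableFormats) {
  // Native PDFs are streamed as-is, with no conversion.
  if (sourceMimeType === 'application/pdf') {
    return { action: 'stream', mimeType: 'application/pdf' };
  }
  const preferences = EXPORT_PREFERENCES[sourceMimeType];
  if (!preferences) {
    // Images, videos, archives, and other native files are rejected.
    return { action: 'reject', status: 403, message: 'mimetype not supported' };
  }
  // Pick the first preferred format that is actually available.
  const chosen = preferences.find((format) => availableFormats.includes(format));
  // Assumption: the behavior for "no preferred format available" is unspecified.
  return chosen ? { action: 'export', mimeType: chosen } : null;
}

console.log(
  selectExportFormat('application/vnd.google-apps.document', ['text/html', 'application/pdf'])
);
```

In this usage the Markdown export is unavailable, so the lookup falls back to HTML, the next entry in the Google Docs preference list.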
---
## Response Headers
Every successful response includes:
```http
Content-Type: text/x-markdown | text/html | application/pdf
Content-Disposition: inline; filename="document-name.ext"
```
- **Content-Type**: Indicates the export format
- **Content-Disposition**: Provides the original filename with appropriate extension
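
A client can use these two headers to decide how to name and handle the downloaded file. The sketch below assumes the header shapes shown above (a quoted `filename="..."` parameter and one of the three listed content types). Both helper names are hypothetical, and the parser deliberately ignores the extended `filename*=` syntax.

```javascript
// Hypothetical client-side helpers for the two response headers above.
const EXTENSION_BY_CONTENT_TYPE = {
  'text/x-markdown': '.md',
  'text/html': '.html',
  'application/pdf': '.pdf',
};

// Extracts the quoted filename parameter, or null if it is absent.
function filenameFromContentDisposition(header) {
  const match = /filename="([^"]*)"/i.exec(header || '');
  return match ? match[1] : null;
}

// Maps a Content-Type value (optionally carrying parameters such as charset)
// to the file extension used in this guide.
function extensionForContentType(contentType) {
  const mediaType = (contentType || '').split(';')[0].trim().toLowerCase();
  return EXTENSION_BY_CONTENT_TYPE[mediaType] || null;
}

console.log(filenameFromContentDisposition('inline; filename="Meeting_Notes.md"'));
console.log(extensionForContentType('text/html; charset=utf-8'));
```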
---
## Error Handling
### Common Errors
| Error | Status | Cause | Solution |
|-------|--------|-------|----------|
| Document not found | 404 | Invalid ID | Verify document ID is correct |
| Unauthorized | 401 | No permission | Check Google Drive access permissions |
| mimetype not supported | 403 | Unsupported file type | Only Workspace docs and PDFs supported |
| Payload Too Large | 413 | Document >10MB | Use smaller documents or direct Drive access |
| Gateway Timeout | 504 | Operation >30s | Retry or use smaller documents |
### Error Response Format
All errors return plain text messages:
```bash
$ curl http://localhost:3000/documents/invalid-id
Document not found
$ curl http://localhost:3000/documents/{IMAGE_FILE_ID}
mimetype not supported
```
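
Because errors arrive as a status code plus a plain text body, a client can map them to the guidance in the table above. This is a minimal sketch: `ERROR_GUIDE` and `describeExportError` are hypothetical names, and marking only 504 as retryable is an interpretation of the "Solution" column rather than documented behavior.

```javascript
// Status codes and hints taken from the Common Errors table.
// Assumption: only 504 is worth retrying automatically.
const ERROR_GUIDE = {
  401: { retryable: false, hint: 'Check Google Drive access permissions' },
  403: { retryable: false, hint: 'Only Workspace docs and PDFs are supported' },
  404: { retryable: false, hint: 'Verify the document ID is correct' },
  413: { retryable: false, hint: 'Use a smaller document or direct Drive access' },
  504: { retryable: true, hint: 'Retry or use a smaller document' },
};

function describeExportError(status, body) {
  // Unknown statuses fall back to a generic, non-retryable entry.
  const entry = ERROR_GUIDE[status] || {
    retryable: false,
    hint: 'Review logs for detailed error information',
  };
  // The body is plain text, so trimming the trailing newline is enough.
  return { status, message: String(body).trim(), ...entry };
}

console.log(describeExportError(404, 'Document not found\n'));
```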
---
## Advanced Usage
### Check Response Headers
```bash
# View headers without downloading content
curl -I http://localhost:3000/documents/{DOCUMENT_ID}  # -I sends a HEAD request
```
**Example Output**:
```http
HTTP/1.1 200 OK
Content-Type: text/x-markdown
Content-Disposition: inline; filename="Meeting_Notes.md"
```
### Stream Large Documents
```bash
# Stream to stdout (for processing)
curl http://localhost:3000/documents/{DOCUMENT_ID} | less
# Pipe to another tool
curl http://localhost:3000/documents/{DOCUMENT_ID} | pandoc -f markdown -t docx -o output.docx
```
### Integrate with Scripts
**Bash Script Example**:
```bash
#!/bin/bash
DOCUMENT_ID="1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms"
OUTPUT_DIR="./exports"
# Create output directory
mkdir -p "$OUTPUT_DIR"
# Export document; --fail makes curl exit non-zero on HTTP errors (4xx/5xx),
# and --silent with --show-error hides the progress meter but keeps error messages
if curl "http://localhost:3000/documents/$DOCUMENT_ID" \
  -o "$OUTPUT_DIR/document.md" \
  --fail \
  --silent \
  --show-error; then
  echo "Export successful: $OUTPUT_DIR/document.md"
else
  echo "Export failed" >&2
  exit 1
fi
```
**Node.js Example**:
```javascript
const axios = require('axios');
const fs = require('fs');
const { pipeline } = require('stream/promises');
async function exportDocument(documentId, outputPath) {
  const url = `http://localhost:3000/documents/${documentId}`;
  try {
    const response = await axios.get(url, {
      responseType: 'stream',
      timeout: 30000 // 30 second timeout
    });
    // pipeline() rejects if either the download stream or the file write fails
    await pipeline(response.data, fs.createWriteStream(outputPath));
  } catch (error) {
    if (error.response) {
      // With responseType 'stream' the error body is a stream, not a string,
      // so log the status line instead of error.response.data
      console.error(`Error ${error.response.status}: ${error.response.statusText}`);
    } else {
      console.error('Request failed:', error.message);
    }
    throw error;
  }
}
// Usage
exportDocument('1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms', 'output.md')
  .then(() => console.log('Export complete'))
  .catch(err => console.error('Export failed:', err.message));
```
---
## Testing
### Run Tests
```bash
# Run all tests
npm test
# Run specific test suites
npm run test:contract # API contract tests
npm run test:integration # Google Drive integration tests
npm run test:unit # Unit tests
```
### Manual Testing Checklist
- [ ] Export Google Doc as Markdown
- [ ] Export Google Sheet as HTML
- [ ] Export Google Slides as PDF
- [ ] Export native PDF file
- [ ] Test invalid document ID (should return 404)
- [ ] Test unsupported file type (should return 403)
- [ ] Verify Content-Disposition filename matches document name
- [ ] Verify Content-Type header matches export format
---
## Performance Characteristics
| Metric | Expected Value |
|--------|----------------|
| Response time (docs <10MB) | <5 seconds |
| Concurrent requests | 50+ supported |
| Success rate | >99% for valid docs |
| Memory per request | <1MB (streaming) |
---
## Troubleshooting
### "Document not found" for valid document
1. Verify document ID is correct (check Google Drive URL)
2. Ensure Google Drive service account has access to the document
3. Check if document is in a shared drive (requires `supportsAllDrives=true`)
### "Unauthorized" error
1. Check Google Drive credentials in `src/globalVariables/google_drive_settings.json`
2. Verify service account has been granted access to the document
3. Check if access token is expired (auth handled by proxy layer)
### "Gateway Timeout" on large documents
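1. The document may be large or slow to convert (check the file size in Google Drive; documents over 10MB are rejected with 413 rather than timing out)
2. The network connection to the Google Drive API may be slow
3. Retry the request, since the cause may be a transient network issue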
### "mimetype not supported"
This is expected for non-document files:
- Images (.jpg, .png, .gif)
- Videos (.mp4, .mov)
- Archives (.zip, .tar)
- Executables (.exe, .dmg)
Only Google Workspace documents (Docs, Sheets, Slides) and native PDFs are supported.
---
## Configuration
### Server Settings
Edit `config/default.json`:
```json
{
  "server": {
    "host": "localhost",
    "port": 3000
  },
  "logging": {
    "level": "info"
  }
}
```
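
Client scripts can derive the endpoint URL from the same settings instead of hard-coding `http://localhost:3000`. The following is a sketch under assumptions: `baseUrlFromConfig` and `documentUrl` are hypothetical helpers, the config object is passed in directly rather than loaded from disk, and the fallbacks mirror the defaults shown above.

```javascript
// Hypothetical helpers that build request URLs from the server settings.
function baseUrlFromConfig(config) {
  // Fall back to the defaults documented in this guide.
  const host = config?.server?.host ?? 'localhost';
  const port = config?.server?.port ?? 3000;
  return `http://${host}:${port}`;
}

function documentUrl(config, documentId) {
  // Encode the ID so unexpected characters cannot alter the path.
  return `${baseUrlFromConfig(config)}/documents/${encodeURIComponent(documentId)}`;
}

const config = { server: { host: 'localhost', port: 3000 }, logging: { level: 'info' } };
console.log(documentUrl(config, '1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms'));
```

In a real script the object could be read with `JSON.parse(fs.readFileSync('config/default.json', 'utf8'))`, assuming the script runs from the project root.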
### Google Drive Credentials
Credentials are stored in `src/globalVariables/google_drive_settings.json` (managed by existing infrastructure).
---
## Next Steps
- **Integration**: Use the `/documents/:documentId` endpoint in your applications
- **Testing**: Run contract tests to verify behavior: `npm run test:contract`
- **Monitoring**: Check the logs for errors (`npm run dev` prints them in real time)
- **Scaling**: Deploy multiple instances behind a load balancer for high traffic
---
## Support
For issues or questions:
1. Check error messages and status codes (see Error Handling section)
2. Review logs for detailed error information
3. Verify Google Drive permissions and credentials
4. Consult API contract: `specs/002-document-export/contracts/documents-export-api.md`