# Quickstart Guide: Document Export API

**Feature**: 002-document-export

**Date**: 2026-03-09

**Audience**: Developers and API consumers

## Overview

The Document Export API exposes a single HTTP endpoint, `GET /documents/:documentId`, for exporting Google Drive documents. For each document it selects the best available format (Markdown, then HTML, then PDF) and streams the content with `Content-Type` and `Content-Disposition` headers set.

---

## Quick Start

### 1. Start the Proxy Server

```bash
# Install dependencies (if not already done)
npm install

# Start the server in development mode (with auto-reload)
npm run dev

# Or start it in production mode
npm start
```

The server listens on `http://localhost:3000` by default (configurable via `config/default.json`).

---

### 2. Export a Document

**Basic Request**:

```bash
curl http://localhost:3000/documents/{DOCUMENT_ID}
```

**Example (Export Google Doc as Markdown)**:

```bash
curl http://localhost:3000/documents/1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms \
  -o output.md
```

**Example (Export Native PDF)**:

```bash
curl http://localhost:3000/documents/1AbcDeFgHiJkLmNoPqRsTuVwXyZ1234567890 \
  -o output.pdf
```

**Save with Original Filename**:

```bash
# -O writes the response to a file; -J names that file using the
# original filename from the Content-Disposition header
curl -OJ http://localhost:3000/documents/{DOCUMENT_ID}
```

---

## Finding Document IDs

### From Google Drive URL

Google Drive URLs contain the document ID as the path segment after `/d/`:

```
https://docs.google.com/document/d/DOCUMENT_ID/edit
https://drive.google.com/file/d/DOCUMENT_ID/view
```

**Example**:

- URL: `https://docs.google.com/document/d/1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms/edit`
- Document ID: `1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms`

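When scripting against many documents, it can help to pull the ID out of a pasted URL automatically. A minimal sketch, assuming the ID is always the path segment after `/d/` as in the two URL shapes above; `extractDocumentId` is a hypothetical helper name, and other URL forms are not handled.

```javascript
// Hypothetical helper: returns the Drive document ID embedded in a URL.
// Assumes the ID follows a "/d/" path segment, as in the URL shapes above.
function extractDocumentId(driveUrl) {
  const { pathname } = new URL(driveUrl); // throws TypeError on malformed URLs
  const match = pathname.match(/\/d\/([A-Za-z0-9_-]+)/);
  if (!match) {
    throw new Error(`No "/d/<id>" segment found in ${driveUrl}`);
  }
  return match[1];
}

// Usage: build the proxy export URL from a pasted Google Docs link
const documentId = extractDocumentId(
  'https://docs.google.com/document/d/1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms/edit'
);
console.log(`http://localhost:3000/documents/${documentId}`);
```
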
---

## Supported Formats

### Google Workspace Documents

Google Workspace documents are exported automatically in the best available format:

| Document Type | Preferred Format | Fallback Formats |
|---------------|------------------|------------------|
| Google Docs | Markdown (.md) | HTML (.html), PDF (.pdf) |
| Google Sheets | HTML (.html) | PDF (.pdf) |
| Google Slides | PDF (.pdf) | None |

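The fallback order in this table amounts to a preference lookup: walk the list for the document's type and take the first format that can actually be produced. The sketch below is illustrative only. The map keys are Google's published Workspace MIME types, but the short format names and the `availableFormats` argument are assumptions made for this example, not the proxy's internal API.

```javascript
// Illustrative sketch of the fallback table. Keys are Google Workspace MIME
// types; the format names and `availableFormats` input are assumptions.
const EXPORT_PREFERENCES = new Map([
  ['application/vnd.google-apps.document', ['markdown', 'html', 'pdf']],
  ['application/vnd.google-apps.spreadsheet', ['html', 'pdf']],
  ['application/vnd.google-apps.presentation', ['pdf']],
]);

// Returns the first preferred format that is available, or null when the
// file is not a supported Workspace type or none of its formats is available.
function pickExportFormat(mimeType, availableFormats) {
  const preferences = EXPORT_PREFERENCES.get(mimeType);
  if (!preferences) return null;
  return preferences.find((format) => availableFormats.includes(format)) ?? null;
}

// A Google Doc whose Markdown export is unavailable falls back to HTML
console.log(pickExportFormat('application/vnd.google-apps.document', ['html', 'pdf'])); // prints "html"
```
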
### Native Files

| File Type | Behavior |
|-----------|----------|
| PDF | Streamed directly (no conversion) |
| Images, videos, archives | Returns 403 "mimetype not supported" |

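Taken together, the two tables describe a three-way decision on the file's Drive MIME type: export, stream, or reject. The sketch below shows that decision; `classifyFile` and its return shape are hypothetical, while the MIME type strings are Google's documented values.

```javascript
// Hypothetical dispatcher: decides how a request for a Drive file is handled,
// based on the file's MIME type as reported by the Drive API.
const WORKSPACE_TYPES = new Set([
  'application/vnd.google-apps.document',     // Google Docs
  'application/vnd.google-apps.spreadsheet',  // Google Sheets
  'application/vnd.google-apps.presentation', // Google Slides
]);

function classifyFile(mimeType) {
  if (WORKSPACE_TYPES.has(mimeType)) {
    return { action: 'export' }; // convert using the fallback order
  }
  if (mimeType === 'application/pdf') {
    return { action: 'stream' }; // pass the bytes through unchanged
  }
  // Images, videos, archives, and other Workspace types (e.g. Forms) are refused
  return { action: 'reject', status: 403, message: 'mimetype not supported' };
}

console.log(classifyFile('image/png'));
```
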
---

## Response Headers

Every successful response includes these headers (`Content-Type` takes one of the three values shown):

```http
Content-Type: text/x-markdown | text/html | application/pdf
Content-Disposition: inline; filename="document-name.ext"
```

- **Content-Type**: the export format (`text/x-markdown`, `text/html`, or `application/pdf`)
- **Content-Disposition**: the document's original name, with the extension for the export format

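Clients that save responses to disk need the filename from `Content-Disposition`. A minimal parser is sketched below. It handles the quoted `filename="..."` form this API sends and, as a defensive assumption rather than documented behavior, the RFC 6266 `filename*=UTF-8''...` form used for non-ASCII names. The function name is hypothetical.

```javascript
// Hypothetical helper: extracts the suggested filename from a
// Content-Disposition header value, or returns `fallback` if there is none.
function filenameFromContentDisposition(header, fallback = 'download') {
  if (!header) return fallback;

  // RFC 6266 prefers the extended form (percent-encoded UTF-8) when present
  const extended = /filename\*\s*=\s*UTF-8''([^;]+)/i.exec(header);
  if (extended) return decodeURIComponent(extended[1].trim());

  // Plain form: quoted (filename="a b.md") or a bare token (filename=a.md)
  const plain = /filename\s*=\s*(?:"([^"]*)"|([^;\s]+))/i.exec(header);
  if (plain) return plain[1] ?? plain[2];

  return fallback;
}

console.log(filenameFromContentDisposition('inline; filename="Meeting_Notes.md"')); // prints "Meeting_Notes.md"
```
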
---

## Error Handling

### Common Errors

| Error | Status | Cause | Solution |
|-------|--------|-------|----------|
| Document not found | 404 | Invalid ID, or the file is not accessible to the service account | Verify the document ID and its sharing settings |
| Unauthorized | 401 | Missing or expired credentials, or no permission | Check Google Drive credentials and access permissions |
| mimetype not supported | 403 | Unsupported file type | Only Google Workspace documents and PDFs are supported |
| Payload Too Large | 413 | Document larger than 10 MB | Use a smaller document or access it directly in Drive |
| Gateway Timeout | 504 | Export took longer than 30 s | Retry, or use a smaller document |

### Error Response Format

All errors return a plain-text message in the response body:

```bash
$ curl http://localhost:3000/documents/invalid-id
Document not found

$ curl http://localhost:3000/documents/{IMAGE_FILE_ID}
mimetype not supported
```

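Because error bodies are plain text, a client can pair the status code with a hint and decide whether to retry. The sketch below treats only 504 as transient; that is a judgment call drawn from the error table, not something the API contract specifies. `describeExportError`, `fetchWithRetry`, and their options are hypothetical. The default `fetchFn` is the global `fetch` available in Node.js 18+, and a substitute can be passed so the logic can be exercised without a running server.

```javascript
// Hypothetical mapping from the status codes in the error table to hints.
const HINTS = {
  401: 'Check Google Drive credentials and access permissions',
  403: 'Only Google Workspace documents and PDFs are supported',
  404: 'Verify the document ID and that the service account can access it',
  413: 'Document exceeds 10 MB; access it directly in Drive',
  504: 'Export timed out; retry or use a smaller document',
};

function describeExportError(status, bodyText) {
  return {
    status,
    message: bodyText.trim(), // the API returns plain-text messages
    hint: HINTS[status] ?? 'Unexpected error; check the server logs',
    retryable: status === 504, // assumption: only timeouts are worth retrying
  };
}

// Fetches `url`, retrying 504 responses up to `attempts` times in total,
// waiting delayMs * attempt between tries.
async function fetchWithRetry(url, { attempts = 3, delayMs = 1000, fetchFn = fetch } = {}) {
  for (let attempt = 1; ; attempt++) {
    const response = await fetchFn(url);
    if (response.ok) return response;

    const error = describeExportError(response.status, await response.text());
    if (!error.retryable || attempt >= attempts) {
      const err = new Error(`${error.status} ${error.message} (${error.hint})`);
      err.status = error.status;
      throw err;
    }
    await new Promise((resolve) => setTimeout(resolve, delayMs * attempt));
  }
}
```
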
---

## Advanced Usage

### Check Response Headers

```bash
# View headers without downloading content
curl -I http://localhost:3000/documents/{DOCUMENT_ID}
```

**Example Output**:

```http
HTTP/1.1 200 OK
Content-Type: text/x-markdown
Content-Disposition: inline; filename="Meeting_Notes.md"
```

### Stream Large Documents

```bash
# Stream to stdout for viewing (-sS hides the progress meter but still reports errors)
curl -sS http://localhost:3000/documents/{DOCUMENT_ID} | less

# Pipe into another tool, e.g. convert exported Markdown to .docx with pandoc
curl -sS http://localhost:3000/documents/{DOCUMENT_ID} | pandoc -f markdown -t docx -o output.docx
```

### Integrate with Scripts

**Bash Script Example**:

```bash
#!/bin/bash

DOCUMENT_ID="1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms"
OUTPUT_DIR="./exports"

# Create the output directory
mkdir -p "$OUTPUT_DIR"

# Export the document. --fail makes curl exit non-zero on HTTP errors (4xx/5xx).
# The .md extension assumes a Google Doc; other types export as .html or .pdf.
if curl "http://localhost:3000/documents/$DOCUMENT_ID" \
  -o "$OUTPUT_DIR/document.md" \
  --fail \
  --show-error; then
  echo "Export successful: $OUTPUT_DIR/document.md"
else
  echo "Export failed" >&2
  exit 1
fi
```

**Node.js Example** (uses `axios`; `stream/promises` requires Node.js 15+):

```javascript
const axios = require('axios');
const fs = require('fs');
const { pipeline } = require('stream/promises');

async function exportDocument(documentId, outputPath) {
  const url = `http://localhost:3000/documents/${encodeURIComponent(documentId)}`;

  try {
    const response = await axios.get(url, {
      responseType: 'stream',
      timeout: 30000 // 30-second timeout
    });

    // pipeline() resolves once the file is fully written and rejects if
    // either the response stream or the file stream fails
    await pipeline(response.data, fs.createWriteStream(outputPath));
  } catch (error) {
    if (error.response) {
      // With responseType 'stream', the error body is also a stream; read it as text
      const chunks = [];
      for await (const chunk of error.response.data) chunks.push(chunk);
      console.error(`Error ${error.response.status}: ${Buffer.concat(chunks).toString('utf8')}`);
    } else {
      console.error('Request failed:', error.message);
    }
    throw error;
  }
}

// Usage
exportDocument('1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms', 'output.md')
  .then(() => console.log('Export complete'))
  .catch(err => console.error('Export failed:', err.message));
```

---

## Testing

### Run Tests

```bash
# Run all tests
npm test

# Run specific test suites
npm run test:contract      # API contract tests
npm run test:integration   # Google Drive integration tests
npm run test:unit          # Unit tests
```

### Manual Testing Checklist

- [ ] Export a Google Doc as Markdown
- [ ] Export a Google Sheet as HTML
- [ ] Export Google Slides as PDF
- [ ] Export a native PDF file
- [ ] Request an invalid document ID (should return 404)
- [ ] Request an unsupported file type (should return 403)
- [ ] Verify the Content-Disposition filename matches the document name
- [ ] Verify the Content-Type header matches the export format

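The last two checklist items can also be checked automatically, for example inside a contract test. The helper below is a hypothetical sketch: it takes the raw `Content-Type` and `Content-Disposition` values and confirms that the filename's extension matches the export format, covering only the three content types this API returns.

```javascript
// Hypothetical consistency check between the two response headers.
const EXTENSION_FOR_TYPE = new Map([
  ['text/x-markdown', '.md'],
  ['text/html', '.html'],
  ['application/pdf', '.pdf'],
]);

function headersAreConsistent(contentType, contentDisposition) {
  // Ignore parameters such as "; charset=utf-8"
  const type = (contentType ?? '').split(';')[0].trim().toLowerCase();
  const expectedExtension = EXTENSION_FOR_TYPE.get(type);
  const match = /filename="([^"]+)"/i.exec(contentDisposition ?? '');
  if (!expectedExtension || !match) return false;
  return match[1].toLowerCase().endsWith(expectedExtension);
}

// Usage with the example headers from "Check Response Headers"
console.log(headersAreConsistent('text/x-markdown', 'inline; filename="Meeting_Notes.md"')); // prints true
```
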
---

## Performance Characteristics

| Metric | Expected Value |
|--------|----------------|
| Response time (documents < 10 MB) | < 5 seconds |
| Concurrent requests | 50+ supported |
| Success rate | > 99% for valid documents |
| Memory per request | < 1 MB (streaming) |

---

## Troubleshooting

### "Document not found" for a valid document

1. Verify the document ID is correct (check the Google Drive URL).
2. Ensure the Google Drive service account has access to the document.
3. If the document is in a shared drive, Drive API requests must include `supportsAllDrives=true`.

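For the shared-drive case, the relevant Drive API v3 call is `files.get` with `supportsAllDrives=true`. The sketch below only builds that request URL, so the lookup can be reproduced by hand (for example with `curl` and an OAuth bearer token) to confirm the service account can see the file. This guide does not show how the proxy itself issues Drive requests, so treat `driveMetadataUrl` as an illustrative, hypothetical helper.

```javascript
// Illustrative only: builds a Drive API v3 files.get URL that also finds
// files in shared drives. `fields` limits the response to the metadata
// needed to choose an export path.
function driveMetadataUrl(fileId) {
  const url = new URL(
    `https://www.googleapis.com/drive/v3/files/${encodeURIComponent(fileId)}`
  );
  url.searchParams.set('supportsAllDrives', 'true');
  url.searchParams.set('fields', 'id,name,mimeType');
  return url.toString();
}

// Send with an OAuth access token, e.g. header "Authorization: Bearer <token>"
console.log(driveMetadataUrl('1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms'));
```
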
### "Unauthorized" error

1. Check the Google Drive credentials in `src/globalVariables/google_drive_settings.json`.
2. Verify the service account has been granted access to the document.
3. Check whether the access token has expired (authentication is handled by the proxy layer).

### "Gateway Timeout" on large documents

1. The document may exceed 10 MB (check its file size in Google Drive).
2. The network connection to the Google Drive API may be slow.
3. The failure may be a transient network issue; retry the request.

### "mimetype not supported"

This error is expected for non-document files:

- Images (.jpg, .png, .gif)
- Videos (.mp4, .mov)
- Archives (.zip, .tar)
- Executables (.exe, .dmg)

Only Google Workspace documents (Docs, Sheets, Slides) and native PDFs are supported.

---

## Configuration

### Server Settings

Edit `config/default.json`:

```json
{
  "server": {
    "host": "localhost",
    "port": 3000
  },
  "logging": {
    "level": "info"
  }
}
```

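To inspect or validate these settings from code, a loader like the one below works under two assumptions: the file is plain JSON at `config/default.json`, and `HOST`, `PORT`, and `LOG_LEVEL` environment variables may override it. Those variable names and both function names are hypothetical; the project may instead use a configuration library with its own override rules.

```javascript
const fs = require('node:fs');

// Hypothetical: merge config/default.json with environment overrides and
// fall back to the defaults shown above.
function resolveServerConfig(fileConfig, env = process.env) {
  const server = fileConfig.server ?? {};
  const rawPort = env.PORT ?? server.port ?? 3000;
  const port = Number(rawPort);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid port: ${rawPort}`);
  }
  return {
    host: env.HOST ?? server.host ?? 'localhost',
    port,
    logLevel: env.LOG_LEVEL ?? fileConfig.logging?.level ?? 'info',
  };
}

// Reads and parses the JSON file, then applies overrides from process.env
function loadServerConfig(path = 'config/default.json') {
  return resolveServerConfig(JSON.parse(fs.readFileSync(path, 'utf8')));
}
```
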
### Google Drive Credentials

Credentials are stored in `src/globalVariables/google_drive_settings.json` (managed by existing infrastructure).

---

## Next Steps

- **Integration**: Call the `/documents/:documentId` endpoint from your applications.
- **Testing**: Run the contract tests to verify behavior: `npm run test:contract`.
- **Monitoring**: `npm run dev` prints logs in real time; check them for errors.
- **Scaling**: Deploy multiple instances behind a load balancer for high traffic.

---

## Support

For issues or questions:

1. Check error messages and status codes (see Error Handling).
2. Review the logs for detailed error information.
3. Verify Google Drive permissions and credentials.
4. Consult the API contract: `specs/002-document-export/contracts/documents-export-api.md`.