# mail2couch

A powerful email backup utility that synchronizes mail from IMAP accounts to CouchDB databases with intelligent incremental sync, comprehensive filtering, and native attachment support.

## Features

### Core Functionality
- **IMAP Email Backup**: Connect to any IMAP server (Gmail, Outlook, self-hosted)
- **CouchDB Storage**: Store emails as JSON documents with native CouchDB attachments
- **Incremental Sync**: Efficiently sync only new messages using IMAP SEARCH with timestamp tracking
- **Per-Account Databases**: Each mail source gets its own CouchDB database for better organization
- **Duplicate Prevention**: Automatic detection and prevention of duplicate message storage

### Sync Modes
- **Archive Mode**: Preserve all messages ever seen, even if deleted from mail server (default)
- **Sync Mode**: Maintain 1-to-1 relationship with mail server (removes deleted messages from CouchDB)
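
The two modes differ only in what happens to stored messages that no longer exist on the server. A minimal Python sketch of that decision, assuming messages are identified by their IMAP UID; `plan_sync` and its arguments are hypothetical names used for illustration, not mail2couch's actual code:

```python
def plan_sync(mode, server_uids, stored_uids):
    """Return (uids_to_store, uids_to_delete) for one mailbox.

    mode is "archive" or "sync", matching the `mode` config value.
    """
    server, stored = set(server_uids), set(stored_uids)
    to_store = sorted(server - stored)        # on the server, not yet stored
    if mode == "sync":
        to_delete = sorted(stored - server)   # stored, but gone from the server
    elif mode == "archive":
        to_delete = []                        # archive mode never deletes
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return to_store, to_delete
```

With UIDs 1, 2, 4 on the server and 1, 2, 3 already stored, both modes store UID 4, but only `"sync"` schedules UID 3 for deletion.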

### Advanced Filtering
- **Wildcard Folder Patterns**: Use `*`, `?`, `[abc]` patterns for flexible folder selection
- **Keyword Filtering**: Filter messages by keywords in subjects, senders, or recipients
- **Date Filtering**: Process only messages since a specific date
- **Include/Exclude Logic**: Combine multiple filter types for precise control

### Message Processing
- **Full MIME Support**: Parse multipart messages, HTML/plain text, and embedded content
- **Native Attachments**: Store email attachments as CouchDB native attachments with compression
- **Complete Headers**: Preserve all email headers and metadata
- **UTF-8 Support**: Handle international characters and special content

### HTML Webmail Interface
- **Beautiful Web Interface**: Modern, responsive HTML presentations for viewing archived emails
- **Gmail-like Design**: Professional, mobile-friendly interface with clean typography
- **Message Lists**: Dynamic HTML lists with sorting, filtering, and folder organization
- **Individual Messages**: Rich HTML display with proper formatting, URL linking, and collapsible headers
- **Attachment Support**: Direct download links with file type and size information
- **Search Integration**: Full-text subject search with keyword highlighting
- **Folder Analytics**: Message count summaries and folder-based navigation
- **Mobile Responsive**: Optimized for desktop, tablet, and mobile viewing

### Operational Features
- **Automatic Config Discovery**: Finds configuration files in standard locations
- **Command Line Control**: GNU-style options with `--max-messages`/`-m` and `--config`/`-c` flags
- **Comprehensive Logging**: Detailed output for monitoring and troubleshooting
- **Error Resilience**: Graceful handling of network issues and server problems

## Project Status

**Production Ready** (August 2025): Both Go and Rust implementations are fully functional with identical feature sets and database output.

- ✅ **Complete Feature Parity**: Server-side filtering, binary attachments, incremental sync
- ✅ **Comprehensive Testing**: Verified identical CouchDB output between implementations
- ✅ **SystemD Integration**: Automated scheduling with timer units
- ✅ **Production Quality**: Robust error handling, retry logic, dry-run mode

## 📚 Documentation

Comprehensive documentation is available in the [`docs/`](docs/) directory:

- **[docs/README.md](docs/README.md)** - Documentation overview and quick start
- **[docs/ANALYSIS.md](docs/ANALYSIS.md)** - Technical analysis and current status
- **[docs/IMPLEMENTATION_COMPARISON.md](docs/IMPLEMENTATION_COMPARISON.md)** - Go vs Rust comparison
- **[docs/FOLDER_PATTERNS.md](docs/FOLDER_PATTERNS.md)** - Folder filtering guide
- **[docs/couchdb-schemas.md](docs/couchdb-schemas.md)** - Database schema documentation
- **[docs/TODO.md](docs/TODO.md)** - Development roadmap and future plans

## Quick Start

### Installation

1. **Install dependencies**:
   ```bash
   # Go 1.21+ required
   go version
   ```

2. **Clone and build**:
   ```bash
   git clone <repository-url>
   cd mail2couch/go
   go build -o mail2couch .
   ```

### Basic Usage

1. **Create configuration file** (`config.json`):
   ```json
   {
     "couchDb": {
       "url": "http://localhost:5984",
       "user": "admin",
       "password": "password"
     },
     "mailSources": [
       {
         "name": "Personal Gmail",
         "enabled": true,
         "protocol": "imap",
         "host": "imap.gmail.com",
         "port": 993,
         "user": "your-email@gmail.com",
         "password": "your-app-password",
         "mode": "archive",
         "folderFilter": {
           "include": ["*"],
           "exclude": ["[Gmail]/Trash", "[Gmail]/Spam"]
         }
       }
     ]
   }
   ```

2. **Run mail2couch**:
   ```bash
   ./mail2couch
   ```

The application will:
- Create a CouchDB database named `m2c_personal_gmail`
- Sync all folders except Trash and Spam
- Store messages with native attachments
- Track sync state for efficient incremental updates
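
The database name in this example appears to be derived from the source's `name`. One plausible derivation is sketched below in Python, assuming the name is lowercased and every run of other characters collapses to a single `_`. The real normalisation rules may differ, and `database_name` is a hypothetical helper:

```python
import re

def database_name(source_name, prefix="m2c_"):
    # CouchDB database names must start with a lowercase letter and may
    # contain only lowercase letters, digits, and the characters _ $ ( ) + - /
    slug = re.sub(r"[^a-z0-9]+", "_", source_name.lower()).strip("_")
    return prefix + slug

print(database_name("Personal Gmail"))  # m2c_personal_gmail
```

Under this assumption, two sources whose names differ only in case or punctuation would map to the same database, so distinct, simple names are the safer choice.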

## Configuration

### Configuration File Discovery

mail2couch automatically searches for configuration files in this order:
1. Path specified by `--config`/`-c` flag
2. `./config.json` (current directory)
3. `./config/config.json` (config subdirectory)
4. `~/.config/mail2couch/config.json` (user config directory)
5. `~/.mail2couch.json` (user home directory)
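
The search order above amounts to "first existing file wins". A small Python sketch of that lookup follows; `find_config` is a hypothetical helper, and it assumes that a missing `--config` path simply falls through to the next candidate, which this README does not state (the real tool may report an error instead):

```python
from pathlib import Path

def find_config(explicit=None, cwd=None, home=None):
    """Return the first existing config file in the documented order, or None."""
    cwd = Path(cwd) if cwd else Path.cwd()
    home = Path(home) if home else Path.home()
    candidates = [
        cwd / "config.json",
        cwd / "config" / "config.json",
        home / ".config" / "mail2couch" / "config.json",
        home / ".mail2couch.json",
    ]
    if explicit:
        # Assumption: the --config/-c path is tried before the defaults.
        candidates.insert(0, Path(explicit))
    for path in candidates:
        if path.is_file():
            return path
    return None
```

Calling `find_config()` with no arguments checks the current directory and the user's home directory in the documented order.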

### Command Line Options

```bash
./mail2couch [options]

Options:
  -c, --config FILE       Path to configuration file
  -m, --max-messages N    Limit messages processed per mailbox per run (0 = unlimited)
  -h, --help              Show help message
```

### Folder Pattern Examples

| Pattern | Description | Matches |
|---------|-------------|---------|
| `"*"` | All folders | `INBOX`, `Sent`, `Work/Projects`, etc. |
| `"INBOX"` | Exact match | `INBOX` only |
| `"Work*"` | Prefix match | `Work`, `Work/Projects`, `WorkStuff` |
| `"*/Archive"` | Suffix match | `Personal/Archive`, `Work/Archive` |
| `"Work/*"` | Subfolder match | `Work/Projects`, `Work/Clients` |
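
The table rows can be reproduced with ordinary glob matching in which `*` also matches `/`. The Python sketch below uses the standard-library `fnmatch` module as a stand-in for mail2couch's matcher. It assumes that an empty `include` list means "all folders" and that `exclude` takes precedence over `include`; `folder_selected` is a hypothetical helper:

```python
from fnmatch import fnmatchcase

def folder_selected(folder, include, exclude):
    """Keep a folder if it matches an include pattern and no exclude pattern."""
    included = not include or any(fnmatchcase(folder, p) for p in include)
    excluded = any(fnmatchcase(folder, p) for p in exclude)
    return included and not excluded

folders = ["INBOX", "Sent", "Work", "Work/Projects", "WorkStuff", "Personal/Archive"]
print([f for f in folders if folder_selected(f, ["Work*"], [])])
# ['Work', 'Work/Projects', 'WorkStuff']
print([f for f in folders if folder_selected(f, ["*"], ["*/Archive"])])
# ['INBOX', 'Sent', 'Work', 'Work/Projects', 'WorkStuff']
```

One caveat of plain glob rules: a pattern such as `[Gmail]/Trash` is read as a character class, so under `fnmatch` it does not match the literal folder `[Gmail]/Trash`. How mail2couch treats bracketed folder names is not covered here, so it is worth checking exclude patterns against the server's actual folder list.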

### Keyword Filtering Examples

```json
{
  "messageFilter": {
    "subjectKeywords": ["urgent", "meeting", "invoice"],
    "senderKeywords": ["@company.com", "noreply@"],
    "recipientKeywords": ["team@", "support@"]
  }
}
```

## Advanced Configuration Examples

See the [example configurations](#example-configurations) section below for detailed configuration scenarios.

## Testing

A comprehensive test environment is included with Podman containers:

```bash
cd test

# Quick automated testing (recommended)
./run-tests.sh                    # Complete integration test with automatic cleanup

# Specialized feature testing
./test-wildcard-patterns.sh       # Test folder pattern matching
./test-incremental-sync.sh        # Test incremental synchronization

# Manual testing environment
./start-test-env.sh               # Start persistent test environment
# ... manual testing with various configurations ...
./stop-test-env.sh                # Clean up when done
```

## Architecture

### Database Structure
- **Per-Account Databases**: Each mail source creates its own CouchDB database with `m2c_` prefix
- **Message Documents**: Each email becomes a CouchDB document with metadata
- **Native Attachments**: Email attachments stored as CouchDB attachments (compressed)
- **Sync Metadata**: Tracks incremental sync state per mailbox
- **HTML Webmail Views**: CouchDB design documents with show/list functions for web interface

### Document Structure
```json
{
  "_id": "INBOX_12345",
  "sourceUid": "12345",
  "mailbox": "INBOX",
  "from": ["sender@example.com"],
  "to": ["recipient@example.com"],
  "subject": "Sample Email",
  "date": "2024-01-15T10:30:00Z",
  "body": "Email content...",
  "headers": {"Content-Type": ["text/plain"]},
  "storedAt": "2024-01-15T10:35:00Z",
  "docType": "mail",
  "hasAttachments": true,
  "_attachments": {
    "document.pdf": {
      "content_type": "application/pdf",
      "length": 54321
    }
  }
}
```
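
The sample `_id` looks like the mailbox name joined to the IMAP UID. Assuming that convention holds (it is inferred from this single example), deterministic IDs also give a simple route to duplicate prevention, because a message that was already stored maps to an existing document. A Python sketch with hypothetical helper names:

```python
def document_id(mailbox, uid):
    # Inferred from the sample document: "<mailbox>_<uid>"
    return f"{mailbox}_{uid}"

def new_messages(mailbox, uids, existing_ids):
    """Yield (uid, doc_id) for messages that are not stored yet."""
    for uid in uids:
        doc_id = document_id(mailbox, uid)
        if doc_id not in existing_ids:
            yield uid, doc_id

print(list(new_messages("INBOX", [12345, 12346], {"INBOX_12345"})))
# [(12346, 'INBOX_12346')]
```

Here `existing_ids` stands in for whatever lookup the real implementation performs against CouchDB.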

### Accessing Stored Emails

Once mail2couch has synced your emails, you can access them through CouchDB's REST API:

#### Raw Data Access
```bash
# Replace {database}, {message_id}, and {attachment_name} with real values.
# Add -u user:password to each command if your CouchDB requires authentication.

# List all databases
curl http://localhost:5984/_all_dbs

# View database info
curl http://localhost:5984/{database}

# List all documents in database
curl http://localhost:5984/{database}/_all_docs

# Get individual message
curl http://localhost:5984/{database}/{message_id}

# Download a single attachment from a message
curl -O http://localhost:5984/{database}/{message_id}/{attachment_name}
```
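
Document IDs that contain a `/` (for example, one built from a nested folder name) must be percent-encoded when placed in a URL. The Python sketch below builds such URLs and fetches JSON using only the standard library; `couch_url` and `fetch_json` are hypothetical helpers, and the database and document names are examples rather than guaranteed values:

```python
import base64
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

def couch_url(base, database, doc_id=None, attachment=None):
    """Build a CouchDB URL, percent-encoding each path segment."""
    parts = [base.rstrip("/"), quote(database, safe="")]
    if doc_id is not None:
        parts.append(quote(doc_id, safe=""))
        if attachment is not None:
            parts.append(quote(attachment, safe=""))
    return "/".join(parts)

def fetch_json(url, user=None, password=None):
    """GET a URL and decode the JSON body, using HTTP Basic auth if given."""
    request = Request(url)
    if user is not None:
        token = base64.b64encode(f"{user}:{password}".encode()).decode()
        request.add_header("Authorization", f"Basic {token}")
    with urlopen(request) as response:
        return json.load(response)

print(couch_url("http://localhost:5984", "m2c_personal_gmail", "Work/Projects_42"))
# http://localhost:5984/m2c_personal_gmail/Work%2FProjects_42
```

For instance, `fetch_json(couch_url("http://localhost:5984", "m2c_personal_gmail", "INBOX_12345"), "admin", "password")` would return the message document as a dictionary, provided such a document exists.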

## Example Configurations

### Simple Configuration
Basic setup for a single Gmail account:

```json
{
  "couchDb": {
    "url": "http://localhost:5984",
    "user": "admin",
    "password": "password"
  },
  "mailSources": [
    {
      "name": "Personal Gmail",
      "enabled": true,
      "protocol": "imap",
      "host": "imap.gmail.com",
      "port": 993,
      "user": "your-email@gmail.com",
      "password": "your-app-password",
      "mode": "archive",
      "folderFilter": {
        "include": ["INBOX", "Sent"],
        "exclude": []
      },
      "messageFilter": {
        "since": "2024-01-01"
      }
    }
  ]
}
```

### Advanced Multi-Account Configuration
Complex setup with multiple accounts, filtering, and different sync modes:

```json
{
  "couchDb": {
    "url": "https://your-couchdb.example.com:5984",
    "user": "backup_user",
    "password": "secure_password"
  },
  "mailSources": [
    {
      "name": "Work Email",
      "enabled": true,
      "protocol": "imap",
      "host": "outlook.office365.com",
      "port": 993,
      "user": "you@company.com",
      "password": "app-password",
      "mode": "sync",
      "folderFilter": {
        "include": ["*"],
        "exclude": ["Deleted Items", "Junk Email", "Drafts"]
      },
      "messageFilter": {
        "since": "2023-01-01",
        "subjectKeywords": ["project", "meeting", "urgent"],
        "senderKeywords": ["@company.com", "@client.com"]
      }
    },
    {
      "name": "Personal Gmail",
      "enabled": true,
      "protocol": "imap",
      "host": "imap.gmail.com",
      "port": 993,
      "user": "personal@gmail.com",
      "password": "gmail-app-password",
      "mode": "archive",
      "folderFilter": {
        "include": ["INBOX", "Important", "Work/*", "Personal/*"],
        "exclude": ["[Gmail]/Trash", "[Gmail]/Spam", "*Temp*"]
      },
      "messageFilter": {
        "recipientKeywords": ["family@", "personal@"]
      }
    },
    {
      "name": "Self-Hosted Mail",
      "enabled": true,
      "protocol": "imap",
      "host": "mail.yourdomain.com",
      "port": 143,
      "user": "admin@yourdomain.com",
      "password": "mail-password",
      "mode": "archive",
      "folderFilter": {
        "include": ["INBOX", "Archive/*", "Projects/*"],
        "exclude": ["*/Drafts", "Trash"]
      },
      "messageFilter": {
        "since": "2023-06-01",
        "subjectKeywords": ["invoice", "receipt", "statement"]
      }
    },
    {
      "name": "Legacy Account",
      "enabled": false,
      "protocol": "imap",
      "host": "legacy.mailserver.com",
      "port": 993,
      "user": "old@account.com",
      "password": "legacy-password",
      "mode": "archive",
      "folderFilter": {
        "include": ["INBOX"],
        "exclude": []
      },
      "messageFilter": {}
    }
  ]
}
```

### Configuration Options Reference

#### CouchDB Configuration
- `url`: CouchDB server URL with protocol and port
- `user`: CouchDB username with database access
- `password`: CouchDB password

#### Mail Source Configuration
- `name`: Descriptive name (used for database naming)
- `enabled`: Boolean to enable/disable this source
- `protocol`: Only `"imap"` currently supported
- `host`: IMAP server hostname
- `port`: IMAP port (993 for TLS, 143 for plain, 3143 for testing)
- `user`: Email account username
- `password`: Email account password (use app passwords for Gmail/Outlook)
- `mode`: `"sync"` (mirror server) or `"archive"` (preserve all messages)

#### Folder Filter Configuration
- `include`: Array of folder patterns to process (empty = all folders)
- `exclude`: Array of folder patterns to skip

#### Message Filter Configuration
- `since`: Date string (YYYY-MM-DD); only messages since this date are processed
- `subjectKeywords`: Array of keywords that must appear in subject line
- `senderKeywords`: Array of keywords that must appear in sender addresses
- `recipientKeywords`: Array of keywords that must appear in recipient addresses
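
Because filtering is described as server-side, a message filter presumably ends up as IMAP SEARCH criteria. The Python sketch below shows one way that translation could look. It assumes keywords within one list are alternatives (OR), that the different filter fields must all hold (AND), and that recipients are matched against `TO` only. None of this is confirmed by this README, and `search_criteria` is a hypothetical helper:

```python
from datetime import date

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def imap_date(iso):
    """Convert YYYY-MM-DD to the d-Mon-yyyy form used by IMAP SEARCH."""
    d = date.fromisoformat(iso)
    return f"{d.day}-{MONTHS[d.month - 1]}-{d.year}"

def any_of(key, words):
    """OR together KEY "word" terms; IMAP's OR takes exactly two operands."""
    terms = [f'{key} "{w}"' for w in words]
    expr = terms[-1]
    for term in reversed(terms[:-1]):
        expr = f"OR {term} {expr}"
    return expr

def search_criteria(message_filter):
    parts = []
    if "since" in message_filter:
        parts.append(f"SINCE {imap_date(message_filter['since'])}")
    for field, key in (("subjectKeywords", "SUBJECT"),
                       ("senderKeywords", "FROM"),
                       ("recipientKeywords", "TO")):
        words = message_filter.get(field)
        if words:
            parts.append(any_of(key, words))
    # Space-separated search keys are ANDed by the server.
    return " ".join(parts) or "ALL"

print(search_criteria({"since": "2024-01-01", "subjectKeywords": ["invoice", "receipt"]}))
# SINCE 1-Jan-2024 OR SUBJECT "invoice" SUBJECT "receipt"
```

The sketch does not escape quotes or backslashes inside keywords, which a real implementation would need to handle.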

## Production Deployment

### Security Considerations
- Use app passwords instead of account passwords
- Store configuration files with restricted permissions (600)
- Use HTTPS for CouchDB connections in production
- Consider encrypting sensitive configuration data
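
Since the configuration file holds plaintext passwords, it is worth verifying its permissions before each scheduled run. A minimal Python check for POSIX systems; `config_is_private` is a hypothetical helper, not something mail2couch provides:

```python
import os
import stat

def config_is_private(path):
    """True if the file grants no permissions to group or others (e.g. mode 600)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & 0o077) == 0
```

A wrapper script could refuse to start when `config_is_private("config.json")` returns `False`, and `chmod 600 config.json` restores the recommended mode.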

### Monitoring and Maintenance
- Review sync metadata documents for sync health
- Monitor CouchDB database sizes and compaction
- Set up log rotation for application output
- Schedule regular backups of CouchDB databases

### Performance Tuning
- Use `--max-messages`/`-m` to limit processing load
- Run during off-peak hours for large initial syncs
- Monitor IMAP server rate limits and connection limits
- Consider running multiple instances for different accounts

## Troubleshooting

### Common Issues

**Connection Errors**:
- Verify IMAP server settings and credentials
- Check firewall and network connectivity
- Ensure correct ports (993 for TLS, 143 for plain)

**Authentication Failures**:
- Use app passwords for Gmail, Outlook, and other providers
- Enable "Less Secure Apps" only if your provider still offers and requires it (Gmail has retired this setting, so use an app password there)
- Verify account permissions and 2FA settings

**Sync Issues**:
- Check CouchDB connectivity and permissions
- Review sync metadata documents for error states
- Verify folder names and patterns match server structure

**Performance Problems**:
- Use date filtering (`since`) for large mailboxes
- Use `--max-messages`/`-m` limits for initial syncs
- Monitor server-side rate limiting

For detailed troubleshooting, see the [test environment documentation](test/README.md).

## Future Plans

### CouchDB-Hosted Webmail Viewer

We plan to develop a comprehensive webmail interface for viewing the archived emails directly through CouchDB. This will include:

- **📧 Modern Web Interface**: A responsive, Gmail-style webmail viewer built on CouchDB design documents
- **🔍 Advanced Search**: Full-text search across subjects, senders, and message content
- **📁 Folder Organization**: Browse messages by mailbox with visual indicators and statistics
- **📎 Attachment Viewer**: Direct download and preview of email attachments
- **📱 Mobile Support**: Optimized interface for tablets and smartphones
- **🎨 Customizable Themes**: Multiple UI themes and layout options
- **⚡ Real-time Updates**: Live synchronization as new emails are archived
- **🔐 Authentication**: Secure access controls and user management
- **📊 Analytics Dashboard**: Email statistics and storage insights

This webmail viewer will be implemented as:
- **CouchDB Design Documents**: Views, shows, and list functions for data access
- **Self-contained HTML/CSS/JS**: No external dependencies or servers required
- **RESTful Architecture**: Clean API endpoints for integration with other tools
- **Progressive Enhancement**: Works with JavaScript disabled for basic functionality

The webmail interface will be a separate component that can be optionally installed alongside the core mail2couch storage functionality, maintaining the clean separation between data archival and presentation layers.

## Contributing

This project welcomes contributions! Please see [CLAUDE.md](CLAUDE.md) for development setup and architecture details.

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.