mail2couch/README.md
Ole-Morten Duesund 5a125ba410 feat: add MIT license
- Add the MIT license to the project.
- Update the README to reference the new license.
- Remove the license issue from the ANALYSIS.md document.
2025-08-02 15:32:47 +02:00


# mail2couch
A powerful email backup utility that synchronizes mail from IMAP accounts to CouchDB databases with intelligent incremental sync, comprehensive filtering, and native attachment support.
## Features
### Core Functionality
- **IMAP Email Backup**: Connect to any IMAP server (Gmail, Outlook, self-hosted)
- **CouchDB Storage**: Store emails as JSON documents with native CouchDB attachments
- **Incremental Sync**: Efficiently sync only new messages using IMAP SEARCH with timestamp tracking
- **Per-Account Databases**: Each mail source gets its own CouchDB database for better organization
- **Duplicate Prevention**: Automatic detection and prevention of duplicate message storage
### Sync Modes
- **Archive Mode**: Preserve all messages ever seen, even if deleted from mail server (default)
- **Sync Mode**: Maintain 1-to-1 relationship with mail server (removes deleted messages from CouchDB)
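The difference between the two modes comes down to what happens to stored messages that have disappeared from the server. The following is a minimal sketch of that decision, not the project's actual code; the function names `staleUIDs` and `uidsToDelete` are hypothetical and chosen only for illustration.

```go
package main

import "fmt"

// staleUIDs returns the UIDs that are stored locally but are no longer
// present on the server. IMAP UIDs are 32-bit unsigned integers.
func staleUIDs(stored, onServer []uint32) []uint32 {
	present := make(map[uint32]struct{}, len(onServer))
	for _, uid := range onServer {
		present[uid] = struct{}{}
	}
	var stale []uint32
	for _, uid := range stored {
		if _, ok := present[uid]; !ok {
			stale = append(stale, uid)
		}
	}
	return stale
}

// uidsToDelete applies the mode: "archive" never deletes anything, while
// "sync" removes documents whose message has disappeared from the server.
// (Hypothetical helper; the real implementation may be structured differently.)
func uidsToDelete(mode string, stored, onServer []uint32) []uint32 {
	if mode != "sync" {
		return nil
	}
	return staleUIDs(stored, onServer)
}

func main() {
	stored := []uint32{101, 102, 103}
	onServer := []uint32{101, 103}
	fmt.Println("archive:", uidsToDelete("archive", stored, onServer))
	fmt.Println("sync:", uidsToDelete("sync", stored, onServer))
}
```

In archive mode the result is always empty, so nothing is ever removed from CouchDB; in sync mode only the UIDs missing from the server are returned.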
### Advanced Filtering
- **Wildcard Folder Patterns**: Use `*`, `?`, `[abc]` patterns for flexible folder selection
- **Keyword Filtering**: Filter messages by keywords in subjects, senders, or recipients
- **Date Filtering**: Process only messages since a specific date
- **Include/Exclude Logic**: Combine multiple filter types for precise control
### Message Processing
- **Full MIME Support**: Parse multipart messages, HTML/plain text, and embedded content
- **Native Attachments**: Store email attachments as CouchDB native attachments with compression
- **Complete Headers**: Preserve all email headers and metadata
- **UTF-8 Support**: Handle international characters and special content
### HTML Webmail Interface
- **Beautiful Web Interface**: Modern, responsive HTML presentations for viewing archived emails
- **Gmail-like Design**: Professional, mobile-friendly interface with clean typography
- **Message Lists**: Dynamic HTML lists with sorting, filtering, and folder organization
- **Individual Messages**: Rich HTML display with proper formatting, URL linking, and collapsible headers
- **Attachment Support**: Direct download links with file type and size information
- **Search Integration**: Full-text subject search with keyword highlighting
- **Folder Analytics**: Message count summaries and folder-based navigation
- **Mobile Responsive**: Optimized for desktop, tablet, and mobile viewing
### Operational Features
- **Automatic Config Discovery**: Finds configuration files in standard locations
- **Command Line Control**: GNU-style options with `--max-messages`/`-m` and `--config`/`-c` flags
- **Comprehensive Logging**: Detailed output for monitoring and troubleshooting
- **Error Resilience**: Graceful handling of network issues and server problems
## Quick Start
### Installation
1. **Install dependencies**:
```bash
# Go 1.21+ required
go version
```
2. **Clone and build**:
```bash
git clone <repository-url>
cd mail2couch/go
go build -o mail2couch .
```
### Basic Usage
1. **Create configuration file** (`config.json`):
```json
{
"couchDb": {
"url": "http://localhost:5984",
"user": "admin",
"password": "password"
},
"mailSources": [
{
"name": "Personal Gmail",
"enabled": true,
"protocol": "imap",
"host": "imap.gmail.com",
"port": 993,
"user": "your-email@gmail.com",
"password": "your-app-password",
"mode": "archive",
"folderFilter": {
"include": ["*"],
"exclude": ["[Gmail]/Trash", "[Gmail]/Spam"]
}
}
]
}
```
2. **Run mail2couch**:
```bash
./mail2couch
```
The application will:
- Create a CouchDB database named `m2c_personal_gmail`
- Sync all folders except Trash and Spam
- Store messages with native attachments
- Track sync state for efficient incremental updates
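The database name above (`m2c_personal_gmail`) follows from the source name `Personal Gmail` and the `m2c_` prefix. A rough sketch of one way to derive it is shown below. The exact normalization rules are an assumption (lowercase, runs of other characters collapsed to a single underscore), and `DatabaseName` is a hypothetical helper, not necessarily what mail2couch uses.

```go
package main

import (
	"fmt"
	"strings"
)

// DatabaseName derives a CouchDB database name from a mail source name.
// Assumed rules: lowercase everything, keep only [a-z0-9], collapse any run
// of other characters into one underscore, and prepend "m2c_".
func DatabaseName(sourceName string) string {
	var b strings.Builder
	pendingSep := false
	for _, r := range strings.ToLower(sourceName) {
		if (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9') {
			// Emit a separator only between two kept characters.
			if pendingSep && b.Len() > 0 {
				b.WriteByte('_')
			}
			pendingSep = false
			b.WriteRune(r)
		} else {
			pendingSep = true
		}
	}
	return "m2c_" + b.String()
}

func main() {
	for _, name := range []string{"Personal Gmail", "Work Email", "Self-Hosted Mail"} {
		fmt.Printf("%q -> %s\n", name, DatabaseName(name))
	}
}
```

CouchDB requires database names to start with a lowercase letter and restricts the allowed characters, which is why some normalization of the free-form `name` field is needed at all.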
## Configuration
### Configuration File Discovery
mail2couch automatically searches for configuration files in this order:
1. Path specified by `--config`/`-c` flag
2. `./config.json` (current directory)
3. `./config/config.json` (config subdirectory)
4. `~/.config/mail2couch/config.json` (user config directory)
5. `~/.mail2couch.json` (user home directory)
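The search order above can be sketched as a simple first-match loop. This is an illustrative example under stated assumptions: `candidatePaths` and `findConfig` are hypothetical names, and whether a missing `--config` file falls through to the other locations (as it does here) or is treated as an error is not specified in this README.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// candidatePaths lists the locations in the documented search order.
// flagPath is the value of --config/-c, or "" if the flag was not given.
func candidatePaths(flagPath, homeDir string) []string {
	var paths []string
	if flagPath != "" {
		paths = append(paths, flagPath)
	}
	paths = append(paths,
		"config.json",
		filepath.Join("config", "config.json"),
	)
	if homeDir != "" {
		paths = append(paths,
			filepath.Join(homeDir, ".config", "mail2couch", "config.json"),
			filepath.Join(homeDir, ".mail2couch.json"),
		)
	}
	return paths
}

// findConfig returns the first candidate that exists as a regular file.
func findConfig(paths []string) (string, error) {
	for _, p := range paths {
		info, err := os.Stat(p)
		if err == nil && info.Mode().IsRegular() {
			return p, nil
		}
	}
	return "", errors.New("no configuration file found")
}

func main() {
	home, _ := os.UserHomeDir() // an empty home simply skips the home-based paths
	path, err := findConfig(candidatePaths("", home))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("using", path)
}
```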
### Command Line Options
```bash
./mail2couch [options]

Options:
  -c, --config FILE       Path to configuration file
  -m, --max-messages N    Limit messages processed per mailbox per run (0 = unlimited)
  -h, --help              Show help message
```
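For readers curious how paired long and short options can be handled in Go, here is a small sketch using only the standard `flag` package, which accepts both one and two leading dashes. It is an assumption that this resembles the project's parser (it may well use a different library); `options` and `parseArgs` are hypothetical names.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// options holds the values of the documented command line flags.
type options struct {
	configPath  string
	maxMessages int
}

// parseArgs registers each long/short pair against the same variable so
// that "--config x" and "-c x" are interchangeable.
func parseArgs(args []string) (options, error) {
	var opts options
	fs := flag.NewFlagSet("mail2couch", flag.ContinueOnError)
	fs.StringVar(&opts.configPath, "config", "", "path to configuration file")
	fs.StringVar(&opts.configPath, "c", "", "shorthand for --config")
	fs.IntVar(&opts.maxMessages, "max-messages", 0, "limit messages per mailbox per run (0 = unlimited)")
	fs.IntVar(&opts.maxMessages, "m", 0, "shorthand for --max-messages")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	if opts.maxMessages < 0 {
		return options{}, fmt.Errorf("max-messages must be >= 0, got %d", opts.maxMessages)
	}
	return opts, nil
}

func main() {
	opts, err := parseArgs(os.Args[1:])
	if errors.Is(err, flag.ErrHelp) {
		return // -h/--help: the flag package has already printed usage
	}
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(2)
	}
	fmt.Printf("config=%q max-messages=%d\n", opts.configPath, opts.maxMessages)
}
```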
### Folder Pattern Examples
| Pattern | Description | Matches |
|---------|-------------|---------|
| `"*"` | All folders | `INBOX`, `Sent`, `Work/Projects`, etc. |
| `"INBOX"` | Exact match | `INBOX` only |
| `"Work*"` | Prefix match | `Work`, `Work/Projects`, `WorkStuff` |
| `"*/Archive"` | Suffix match | `Personal/Archive`, `Work/Archive` |
| `"Work/*"` | Subfolder match | `Work/Projects`, `Work/Clients` |
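The table implies that `*` also matches the `/` hierarchy separator (`Work*` matches `Work/Projects`), which Go's `path.Match` does not do. The sketch below therefore translates patterns into regular expressions instead. It is an illustration, not the project's matcher: the function names are hypothetical, the rule that `exclude` wins over `include` is an assumption, and the exact-equality shortcut is an assumed way to let literal names containing brackets, such as `[Gmail]/Trash`, match themselves.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// globToRegexp converts a folder pattern with *, ? and [abc] into an
// anchored regular expression. Here * and ? also match "/".
func globToRegexp(pattern string) (*regexp.Regexp, error) {
	runes := []rune(pattern)
	var b strings.Builder
	b.WriteString("^")
	for i := 0; i < len(runes); i++ {
		switch r := runes[i]; r {
		case '*':
			b.WriteString(".*")
		case '?':
			b.WriteString(".")
		case '[':
			j := i + 1
			for j < len(runes) && runes[j] != ']' {
				j++
			}
			if j >= len(runes) || j == i+1 {
				b.WriteString(`\[`) // unterminated or empty class: literal "["
				continue
			}
			b.WriteString("[" + regexp.QuoteMeta(string(runes[i+1:j])) + "]")
			i = j
		default:
			b.WriteString(regexp.QuoteMeta(string(r)))
		}
	}
	b.WriteString("$")
	return regexp.Compile(b.String())
}

// matchFolder reports whether name matches pattern, trying literal
// equality first (assumption, see the note above).
func matchFolder(pattern, name string) bool {
	if pattern == name {
		return true
	}
	re, err := globToRegexp(pattern)
	return err == nil && re.MatchString(name)
}

// shouldProcess applies include patterns (empty = all folders), then lets
// any matching exclude pattern veto the folder.
func shouldProcess(name string, include, exclude []string) bool {
	included := len(include) == 0
	for _, p := range include {
		if matchFolder(p, name) {
			included = true
			break
		}
	}
	if !included {
		return false
	}
	for _, p := range exclude {
		if matchFolder(p, name) {
			return false
		}
	}
	return true
}

func main() {
	include := []string{"*"}
	exclude := []string{"[Gmail]/Trash", "[Gmail]/Spam"}
	for _, f := range []string{"INBOX", "Work/Projects", "[Gmail]/Trash"} {
		fmt.Printf("%-15s process=%v\n", f, shouldProcess(f, include, exclude))
	}
}
```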
### Keyword Filtering Examples
```json
{
"messageFilter": {
"subjectKeywords": ["urgent", "meeting", "invoice"],
"senderKeywords": ["@company.com", "noreply@"],
"recipientKeywords": ["team@", "support@"]
}
}
```
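To make the filter semantics concrete, here is a hedged sketch of how such a `messageFilter` could be evaluated. This README does not spell out the matching rules, so the following are assumptions: matching is a case-insensitive substring test, a message passes a keyword list if any one keyword matches, an empty list imposes no restriction, and all three lists must pass. The `MessageFilter` type and its `Matches` method are illustrative names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// MessageFilter mirrors the keyword fields of the "messageFilter" object.
type MessageFilter struct {
	SubjectKeywords   []string `json:"subjectKeywords"`
	SenderKeywords    []string `json:"senderKeywords"`
	RecipientKeywords []string `json:"recipientKeywords"`
}

// matchesAny reports whether any value contains any keyword, ignoring
// case. An empty keyword list matches everything.
func matchesAny(values, keywords []string) bool {
	if len(keywords) == 0 {
		return true
	}
	for _, v := range values {
		lv := strings.ToLower(v)
		for _, kw := range keywords {
			if strings.Contains(lv, strings.ToLower(kw)) {
				return true
			}
		}
	}
	return false
}

// Matches requires every configured keyword list to be satisfied (assumed AND).
func (f MessageFilter) Matches(subject string, from, to []string) bool {
	return matchesAny([]string{subject}, f.SubjectKeywords) &&
		matchesAny(from, f.SenderKeywords) &&
		matchesAny(to, f.RecipientKeywords)
}

func main() {
	raw := `{"messageFilter": {"subjectKeywords": ["urgent", "invoice"],
	         "senderKeywords": ["@company.com"]}}`
	var cfg struct {
		MessageFilter MessageFilter `json:"messageFilter"`
	}
	if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	ok := cfg.MessageFilter.Matches("URGENT: server down",
		[]string{"ops@company.com"}, []string{"me@example.com"})
	fmt.Println("message accepted:", ok)
}
```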
## Advanced Configuration Examples
See the [example configurations](#example-configurations) section below for detailed configuration scenarios.
## Testing
A comprehensive test environment is included with Podman containers:
```bash
cd test
# Quick automated testing (recommended)
./run-tests.sh # Complete integration test with automatic cleanup
# Specialized feature testing
./test-wildcard-patterns.sh # Test folder pattern matching
./test-incremental-sync.sh # Test incremental synchronization
# Manual testing environment
./start-test-env.sh # Start persistent test environment
# ... manual testing with various configurations ...
./stop-test-env.sh # Clean up when done
```
## Architecture
### Database Structure
- **Per-Account Databases**: Each mail source creates its own CouchDB database with `m2c_` prefix
- **Message Documents**: Each email becomes a CouchDB document with metadata
- **Native Attachments**: Email attachments stored as CouchDB attachments (compressed)
- **Sync Metadata**: Tracks incremental sync state per mailbox
- **HTML Webmail Views**: CouchDB design documents with show/list functions for web interface
### Document Structure
```json
{
"_id": "INBOX_12345",
"sourceUid": "12345",
"mailbox": "INBOX",
"from": ["sender@example.com"],
"to": ["recipient@example.com"],
"subject": "Sample Email",
"date": "2024-01-15T10:30:00Z",
"body": "Email content...",
"headers": {"Content-Type": ["text/plain"]},
"storedAt": "2024-01-15T10:35:00Z",
"docType": "mail",
"hasAttachments": true,
"_attachments": {
"document.pdf": {
"content_type": "application/pdf",
"length": 54321
}
}
}
```
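If you want to consume these documents from your own Go code, a struct like the one below can decode them. This is a sketch based solely on the sample document above; the type and function names are hypothetical, real documents may carry additional fields, and the `mailbox + "_" + uid` ID scheme is inferred from the example `_id` rather than confirmed.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Attachment describes one entry of the CouchDB "_attachments" stub map.
type Attachment struct {
	ContentType string `json:"content_type"`
	Length      int64  `json:"length"`
}

// MailDocument models the fields shown in the sample document.
type MailDocument struct {
	ID             string                `json:"_id"`
	SourceUID      string                `json:"sourceUid"`
	Mailbox        string                `json:"mailbox"`
	From           []string              `json:"from"`
	To             []string              `json:"to"`
	Subject        string                `json:"subject"`
	Date           time.Time             `json:"date"`
	Body           string                `json:"body"`
	Headers        map[string][]string   `json:"headers"`
	StoredAt       time.Time             `json:"storedAt"`
	DocType        string                `json:"docType"`
	HasAttachments bool                  `json:"hasAttachments"`
	Attachments    map[string]Attachment `json:"_attachments,omitempty"`
}

// DocumentID builds an ID of the form seen in the sample ("INBOX_12345").
func DocumentID(mailbox, uid string) string {
	return mailbox + "_" + uid
}

// decodeMail parses a JSON document into a MailDocument.
func decodeMail(data []byte) (MailDocument, error) {
	var doc MailDocument
	err := json.Unmarshal(data, &doc)
	return doc, err
}

const sampleJSON = `{
  "_id": "INBOX_12345", "sourceUid": "12345", "mailbox": "INBOX",
  "from": ["sender@example.com"], "to": ["recipient@example.com"],
  "subject": "Sample Email", "date": "2024-01-15T10:30:00Z",
  "body": "Email content...", "headers": {"Content-Type": ["text/plain"]},
  "storedAt": "2024-01-15T10:35:00Z", "docType": "mail", "hasAttachments": true,
  "_attachments": {"document.pdf": {"content_type": "application/pdf", "length": 54321}}
}`

func main() {
	doc, err := decodeMail([]byte(sampleJSON))
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(doc.ID, doc.Subject, doc.Date.Format(time.RFC3339))
	for name, att := range doc.Attachments {
		fmt.Printf("attachment %s (%s, %d bytes)\n", name, att.ContentType, att.Length)
	}
}
```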
### Accessing Stored Emails
Once mail2couch has synced your emails, you can access them through CouchDB's REST API:
#### Raw Data Access
```bash
# Replace {database}, {message_id}, and {attachment_name} with real values;
# add -u user:password if your CouchDB requires authentication.
# List all databases
curl http://localhost:5984/_all_dbs
# View database info
curl http://localhost:5984/{database}
# List all documents in database
curl http://localhost:5984/{database}/_all_docs
# Get individual message
curl http://localhost:5984/{database}/{message_id}
# Download a single attachment
curl -O http://localhost:5984/{database}/{message_id}/{attachment_name}
```
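The same endpoints can be called from Go with the standard library. The sketch below fetches one message document over HTTP with basic authentication. The database name, document ID, and credentials are placeholders taken from the examples in this README, and `docURL` and `fetchDoc` are hypothetical helpers. Note that a document ID containing `/` (for example, one derived from a nested folder) has to be percent-encoded in the URL path.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"strings"
	"time"
)

// docURL builds the URL of a single document, escaping the database name
// and the document ID as individual path segments.
func docURL(base, db, id string) string {
	return strings.TrimRight(base, "/") + "/" + url.PathEscape(db) + "/" + url.PathEscape(id)
}

// fetchDoc retrieves a document and decodes it into a generic map.
func fetchDoc(client *http.Client, target, user, password string) (map[string]any, error) {
	req, err := http.NewRequest(http.MethodGet, target, nil)
	if err != nil {
		return nil, err
	}
	req.SetBasicAuth(user, password)
	req.Header.Set("Accept", "application/json")
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	var doc map[string]any
	if err := json.NewDecoder(resp.Body).Decode(&doc); err != nil {
		return nil, err
	}
	return doc, nil
}

func main() {
	client := &http.Client{Timeout: 10 * time.Second}
	// Placeholder values; adjust to your own server, database, and message.
	target := docURL("http://localhost:5984", "m2c_personal_gmail", "INBOX_12345")
	doc, err := fetchDoc(client, target, "admin", "password")
	if err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	fmt.Println("subject:", doc["subject"])
}
```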
## Example Configurations
### Simple Configuration
Basic setup for a single Gmail account:
```json
{
"couchDb": {
"url": "http://localhost:5984",
"user": "admin",
"password": "password"
},
"mailSources": [
{
"name": "Personal Gmail",
"enabled": true,
"protocol": "imap",
"host": "imap.gmail.com",
"port": 993,
"user": "your-email@gmail.com",
"password": "your-app-password",
"mode": "archive",
"folderFilter": {
"include": ["INBOX", "Sent"],
"exclude": []
},
"messageFilter": {
"since": "2024-01-01"
}
}
]
}
```
### Advanced Multi-Account Configuration
Complex setup with multiple accounts, filtering, and different sync modes:
```json
{
"couchDb": {
"url": "https://your-couchdb.example.com:5984",
"user": "backup_user",
"password": "secure_password"
},
"mailSources": [
{
"name": "Work Email",
"enabled": true,
"protocol": "imap",
"host": "outlook.office365.com",
"port": 993,
"user": "you@company.com",
"password": "app-password",
"mode": "sync",
"folderFilter": {
"include": ["*"],
"exclude": ["Deleted Items", "Junk Email", "Drafts"]
},
"messageFilter": {
"since": "2023-01-01",
"subjectKeywords": ["project", "meeting", "urgent"],
"senderKeywords": ["@company.com", "@client.com"]
}
},
{
"name": "Personal Gmail",
"enabled": true,
"protocol": "imap",
"host": "imap.gmail.com",
"port": 993,
"user": "personal@gmail.com",
"password": "gmail-app-password",
"mode": "archive",
"folderFilter": {
"include": ["INBOX", "Important", "Work/*", "Personal/*"],
"exclude": ["[Gmail]/Trash", "[Gmail]/Spam", "*Temp*"]
},
"messageFilter": {
"recipientKeywords": ["family@", "personal@"]
}
},
{
"name": "Self-Hosted Mail",
"enabled": true,
"protocol": "imap",
"host": "mail.yourdomain.com",
"port": 143,
"user": "admin@yourdomain.com",
"password": "mail-password",
"mode": "archive",
"folderFilter": {
"include": ["INBOX", "Archive/*", "Projects/*"],
"exclude": ["*/Drafts", "Trash"]
},
"messageFilter": {
"since": "2023-06-01",
"subjectKeywords": ["invoice", "receipt", "statement"]
}
},
{
"name": "Legacy Account",
"enabled": false,
"protocol": "imap",
"host": "legacy.mailserver.com",
"port": 993,
"user": "old@account.com",
"password": "legacy-password",
"mode": "archive",
"folderFilter": {
"include": ["INBOX"],
"exclude": []
},
"messageFilter": {}
}
]
}
```
### Configuration Options Reference
#### CouchDB Configuration
- `url`: CouchDB server URL with protocol and port
- `user`: CouchDB username with database access
- `password`: CouchDB password
#### Mail Source Configuration
- `name`: Descriptive name (used for database naming)
- `enabled`: Boolean to enable/disable this source
- `protocol`: Only `"imap"` currently supported
- `host`: IMAP server hostname
- `port`: IMAP port (993 for TLS, 143 for plain, 3143 for testing)
- `user`: Email account username
- `password`: Email account password (use app passwords for Gmail/Outlook)
- `mode`: `"archive"` (preserve all messages; the default) or `"sync"` (mirror the server, removing messages deleted there)
#### Folder Filter Configuration
- `include`: Array of folder patterns to process (empty = all folders)
- `exclude`: Array of folder patterns to skip
#### Message Filter Configuration
- `since`: Process only messages since this date (format `YYYY-MM-DD`)
- `subjectKeywords`: Array of keywords that must appear in subject line
- `senderKeywords`: Array of keywords that must appear in sender addresses
- `recipientKeywords`: Array of keywords that must appear in recipient addresses
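If the `since` value is passed on to the server as an IMAP `SEARCH SINCE` criterion (an assumption; this README only states that IMAP SEARCH is used for incremental sync), it has to be rewritten, because IMAP expects dates such as `1-Jan-2024` rather than `2024-01-01`. A minimal sketch of that conversion, with the hypothetical helper `imapSince`:

```go
package main

import (
	"fmt"
	"time"
)

// imapSince converts a YYYY-MM-DD config value into an IMAP search
// criterion using the day-month-year date syntax of RFC 3501.
func imapSince(since string) (string, error) {
	t, err := time.Parse("2006-01-02", since)
	if err != nil {
		return "", fmt.Errorf("invalid since date %q: %w", since, err)
	}
	return "SINCE " + t.Format("2-Jan-2006"), nil
}

func main() {
	for _, s := range []string{"2024-01-01", "2023-06-15", "01/02/2024"} {
		criterion, err := imapSince(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(criterion)
	}
}
```

Validating the date when the configuration is loaded means a typo is reported immediately instead of surfacing later as a server-side search error.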
## Production Deployment
### Security Considerations
- Use app passwords instead of account passwords
- Store configuration files with restricted permissions (600)
- Use HTTPS for CouchDB connections in production
- Consider encrypting sensitive configuration data
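As an illustration of the file-permission advice, a wrapper script or the tool itself could warn when the configuration file is readable by other users. This is a hypothetical check, not an existing mail2couch feature, and it is only meaningful on POSIX systems; `tooPermissive` and `checkConfigPerms` are made-up names.

```go
package main

import (
	"fmt"
	"os"
)

// tooPermissive reports whether any group or other permission bit is set,
// i.e. whether the mode is looser than 0600/0700.
func tooPermissive(mode os.FileMode) bool {
	return mode.Perm()&0o077 != 0
}

// checkConfigPerms returns an error if the file at path is accessible to
// users other than its owner.
func checkConfigPerms(path string) error {
	info, err := os.Stat(path)
	if err != nil {
		return err
	}
	if tooPermissive(info.Mode()) {
		return fmt.Errorf("%s has mode %04o; expected 0600 or stricter", path, info.Mode().Perm())
	}
	return nil
}

func main() {
	if err := checkConfigPerms("config.json"); err != nil {
		fmt.Println("warning:", err)
		return
	}
	fmt.Println("config.json permissions look fine")
}
```

Running `chmod 600 config.json` is enough to satisfy this check.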
### Monitoring and Maintenance
- Review sync metadata documents for sync health
- Monitor CouchDB database sizes and compaction
- Set up log rotation for application output
- Schedule regular backups of CouchDB databases
### Performance Tuning
- Use `--max-messages`/`-m` to limit processing load
- Run during off-peak hours for large initial syncs
- Monitor IMAP server rate limits and connection limits
- Consider running multiple instances for different accounts
## Troubleshooting
### Common Issues
**Connection Errors**:
- Verify IMAP server settings and credentials
- Check firewall and network connectivity
- Ensure correct ports (993 for TLS, 143 for plain)
**Authentication Failures**:
- Use app passwords for Gmail, Outlook, and other providers
- Enable "Less Secure Apps" only where a provider still offers and requires it (Gmail has retired this setting; use an app password instead)
- Verify account permissions and 2FA settings
**Sync Issues**:
- Check CouchDB connectivity and permissions
- Review sync metadata documents for error states
- Verify folder names and patterns match server structure
**Performance Problems**:
- Use date filtering (`since`) for large mailboxes
- Implement `--max-messages`/`-m` limits for initial syncs
- Monitor server-side rate limiting
For detailed troubleshooting, see the [test environment documentation](test/README.md).
## Future Plans
### CouchDB-Hosted Webmail Viewer
We plan to develop a comprehensive webmail interface for viewing the archived emails directly through CouchDB. This will include:
- **📧 Modern Web Interface**: A responsive, Gmail-style webmail viewer built on CouchDB design documents
- **🔍 Advanced Search**: Full-text search across subjects, senders, and message content
- **📁 Folder Organization**: Browse messages by mailbox with visual indicators and statistics
- **📎 Attachment Viewer**: Direct download and preview of email attachments
- **📱 Mobile Support**: Optimized interface for tablets and smartphones
- **🎨 Customizable Themes**: Multiple UI themes and layout options
- **⚡ Real-time Updates**: Live synchronization as new emails are archived
- **🔐 Authentication**: Secure access controls and user management
- **📊 Analytics Dashboard**: Email statistics and storage insights
This webmail viewer will be implemented as:
- **CouchDB Design Documents**: Views, shows, and list functions for data access
- **Self-contained HTML/CSS/JS**: No external dependencies or servers required
- **RESTful Architecture**: Clean API endpoints for integration with other tools
- **Progressive Enhancement**: Works with JavaScript disabled for basic functionality
The webmail interface will be a separate component that can be optionally installed alongside the core mail2couch storage functionality, maintaining the clean separation between data archival and presentation layers.
## Contributing
This project welcomes contributions! Please see [CLAUDE.md](CLAUDE.md) for development setup and architecture details.
## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.