feat: add comprehensive README documentation and clean up configuration

## Documentation Enhancements
- Create comprehensive README with installation, configuration, and usage examples
- Add simple, advanced, and provider-specific configuration examples
- Document all features: incremental sync, wildcard patterns, keyword filtering, attachment support
- Include production deployment guidance and troubleshooting section
- Add architecture documentation with database structure and document format examples

## Configuration Cleanup
- Remove unnecessary `database` field from CouchDB configuration
- Add `m2c_` prefix to all CouchDB database names for better namespace isolation
- Update GenerateAccountDBName() to consistently prefix databases with `m2c_`
- Clean up all configuration examples to remove deprecated database field

## Test Environment Simplification
- Simplify test script structure to eliminate confusion and redundancy
- Remove redundant populate-test-messages.sh wrapper script
- Update run-tests.sh to be a comprehensive automated test with cleanup
- Maintain clear separation: automated tests vs manual testing environment
- Update all test scripts to expect m2c-prefixed database names

## Configuration Examples Added
- config-simple.json: Basic single Gmail account setup
- config-advanced.json: Multi-account with complex filtering and different providers
- config-providers.json: Real-world configurations for Gmail, Outlook, Yahoo, iCloud

## Benefits
- Clear documentation for users from beginner to advanced
- Namespace isolation prevents database conflicts in shared CouchDB instances
- Simplified test workflow eliminates user confusion about which scripts to use
- Comprehensive examples cover common email provider configurations

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Ole-Morten Duesund 2025-08-01 21:26:53 +02:00
commit c2ad55eaaf
17 changed files with 1139 additions and 111 deletions

README.md

@@ -1,5 +1,389 @@
# mail2couch
A powerful email backup utility that synchronizes mail from IMAP accounts to CouchDB databases with intelligent incremental sync, comprehensive filtering, and native attachment support.
At least two implementations will be available: one in Rust and one in Go.
## Features
### Core Functionality
- **IMAP Email Backup**: Connect to any IMAP server (Gmail, Outlook, self-hosted)
- **CouchDB Storage**: Store emails as JSON documents with native CouchDB attachments
- **Incremental Sync**: Efficiently sync only new messages using IMAP SEARCH with timestamp tracking
- **Per-Account Databases**: Each mail source gets its own CouchDB database for better organization
- **Duplicate Prevention**: Automatic detection and prevention of duplicate message storage
### Sync Modes
- **Archive Mode**: Preserve all messages ever seen, even if deleted from mail server (default)
- **Sync Mode**: Maintain 1-to-1 relationship with mail server (removes deleted messages from CouchDB)
### Advanced Filtering
- **Wildcard Folder Patterns**: Use `*`, `?`, `[abc]` patterns for flexible folder selection
- **Keyword Filtering**: Filter messages by keywords in subjects, senders, or recipients
- **Date Filtering**: Process only messages since a specific date
- **Include/Exclude Logic**: Combine multiple filter types for precise control
### Message Processing
- **Full MIME Support**: Parse multipart messages, HTML/plain text, and embedded content
- **Native Attachments**: Store email attachments as CouchDB native attachments with compression
- **Complete Headers**: Preserve all email headers and metadata
- **UTF-8 Support**: Handle international characters and special content
### Operational Features
- **Automatic Config Discovery**: Finds configuration files in standard locations
- **Command Line Control**: Override settings with `--max-messages` and `--config` flags
- **Comprehensive Logging**: Detailed output for monitoring and troubleshooting
- **Error Resilience**: Graceful handling of network issues and server problems
## Quick Start
### Installation
1. **Install dependencies**:
   ```bash
   # Go 1.21+ required
   go version
   ```
2. **Clone and build**:
   ```bash
   git clone <repository-url>
   cd mail2couch/go
   go build -o mail2couch .
   ```
### Basic Usage
1. **Create configuration file** (`config.json`):
   ```json
   {
     "couchDb": {
       "url": "http://localhost:5984",
       "user": "admin",
       "password": "password"
     },
     "mailSources": [
       {
         "name": "Personal Gmail",
         "enabled": true,
         "protocol": "imap",
         "host": "imap.gmail.com",
         "port": 993,
         "user": "your-email@gmail.com",
         "password": "your-app-password",
         "mode": "archive",
         "folderFilter": {
           "include": ["*"],
           "exclude": ["[Gmail]/Trash", "[Gmail]/Spam"]
         }
       }
     ]
   }
   ```
2. **Run mail2couch**:
   ```bash
   ./mail2couch
   ```
The application will:
- Create a CouchDB database named `m2c_personal_gmail`
- Sync all folders except Trash and Spam
- Store messages with native attachments
- Track sync state for efficient incremental updates
## Configuration
### Configuration File Discovery
mail2couch automatically searches for configuration files in this order:
1. Path specified by `--config` flag
2. `./config.json` (current directory)
3. `./config/config.json` (config subdirectory)
4. `~/.config/mail2couch/config.json` (user config directory)
5. `~/.mail2couch.json` (user home directory)
### Command Line Options
```bash
./mail2couch [options]

Options:
  --config PATH       Specify configuration file path
  --max-messages N    Limit messages processed per mailbox per run (0 = unlimited)
```
### Folder Pattern Examples
| Pattern | Description | Matches |
|---------|-------------|---------|
| `"*"` | All folders | `INBOX`, `Sent`, `Work/Projects`, etc. |
| `"INBOX"` | Exact match | `INBOX` only |
| `"Work*"` | Prefix match | `Work`, `Work/Projects`, `WorkStuff` |
| `"*/Archive"` | Suffix match | `Personal/Archive`, `Work/Archive` |
| `"Work/*"` | Subfolder match | `Work/Projects`, `Work/Clients` |
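The table implies that `*` also matches across `/` (`Work*` matches `Work/Projects`). Go's `path.Match` does not allow that, so this sketch translates patterns into anchored regular expressions instead. The sketch makes two assumptions, neither confirmed by the README:

- An exact string comparison is tried first, so literal names such as `[Gmail]/Trash` are not misread as the `[abc]` character class.
- Exclude always wins over include.

All function names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// globToRegexp: '*' matches any run of characters (including '/'),
// '?' matches one character, and "[...]" is passed through as a class.
func globToRegexp(pattern string) (*regexp.Regexp, error) {
	rs := []rune(pattern)
	var b strings.Builder
	b.WriteString("^")
	for i := 0; i < len(rs); i++ {
		switch rs[i] {
		case '*':
			b.WriteString(".*")
		case '?':
			b.WriteString(".")
		case '[':
			j := i + 1
			for j < len(rs) && rs[j] != ']' {
				j++
			}
			if j == len(rs) {
				return nil, fmt.Errorf("unterminated [ in %q", pattern)
			}
			b.WriteString(string(rs[i : j+1]))
			i = j
		default:
			b.WriteString(regexp.QuoteMeta(string(rs[i])))
		}
	}
	b.WriteString("$")
	return regexp.Compile(b.String())
}

// matchesAny checks exact equality first, then the pattern.
func matchesAny(folder string, patterns []string) bool {
	for _, p := range patterns {
		if p == folder {
			return true
		}
		if re, err := globToRegexp(p); err == nil && re.MatchString(folder) {
			return true
		}
	}
	return false
}

// shouldSync: an empty include list means all folders; exclude wins.
func shouldSync(folder string, include, exclude []string) bool {
	if len(include) > 0 && !matchesAny(folder, include) {
		return false
	}
	return !matchesAny(folder, exclude)
}

func main() {
	include := []string{"*"}
	exclude := []string{"[Gmail]/Trash", "[Gmail]/Spam"}
	for _, f := range []string{"INBOX", "Work/Projects", "[Gmail]/Trash"} {
		fmt.Printf("%-15s %v\n", f, shouldSync(f, include, exclude))
	}
}
```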
### Keyword Filtering Examples
```json
{
  "messageFilter": {
    "subjectKeywords": ["urgent", "meeting", "invoice"],
    "senderKeywords": ["@company.com", "noreply@"],
    "recipientKeywords": ["team@", "support@"]
  }
}
```
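The README does not say whether keywords are matched case-insensitively, or whether one keyword per list is enough. Below is a minimal sketch of one plausible reading:

- Matching is case-insensitive substring matching.
- An empty list imposes no constraint.
- At least one keyword from each non-empty list must match.

The type and method names are hypothetical. Recipients are taken to be the `to` addresses only.

```go
package main

import (
	"fmt"
	"strings"
)

// messageFilter mirrors the keyword keys of "messageFilter".
type messageFilter struct {
	SubjectKeywords   []string `json:"subjectKeywords"`
	SenderKeywords    []string `json:"senderKeywords"`
	RecipientKeywords []string `json:"recipientKeywords"`
}

// containsAny reports whether any keyword occurs in any value, ignoring
// case. An empty keyword list places no constraint.
func containsAny(values, keywords []string) bool {
	if len(keywords) == 0 {
		return true
	}
	for _, v := range values {
		lv := strings.ToLower(v)
		for _, k := range keywords {
			if strings.Contains(lv, strings.ToLower(k)) {
				return true
			}
		}
	}
	return false
}

// accept requires every non-empty keyword list to match.
func (f messageFilter) accept(subject string, from, to []string) bool {
	return containsAny([]string{subject}, f.SubjectKeywords) &&
		containsAny(from, f.SenderKeywords) &&
		containsAny(to, f.RecipientKeywords)
}

func main() {
	f := messageFilter{
		SubjectKeywords: []string{"urgent", "meeting", "invoice"},
		SenderKeywords:  []string{"@company.com", "noreply@"},
	}
	fmt.Println(f.accept("Invoice #42", []string{"billing@company.com"}, nil)) // true
	fmt.Println(f.accept("Lunch?", []string{"billing@company.com"}, nil))      // false
}
```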
## Advanced Configuration Examples
See the [example configurations](#example-configurations) section below for detailed configuration scenarios.
## Testing
A comprehensive test environment is included with Podman containers:
```bash
cd test
# Quick automated testing (recommended)
./run-tests.sh # Complete integration test with automatic cleanup
# Specialized feature testing
./test-wildcard-patterns.sh # Test folder pattern matching
./test-incremental-sync.sh # Test incremental synchronization
# Manual testing environment
./start-test-env.sh # Start persistent test environment
# ... manual testing with various configurations ...
./stop-test-env.sh # Clean up when done
```
## Architecture
### Database Structure
- **Per-Account Databases**: Each mail source creates its own CouchDB database with `m2c_` prefix
- **Message Documents**: Each email becomes a CouchDB document with metadata
- **Native Attachments**: Email attachments stored as CouchDB attachments (compressed)
- **Sync Metadata**: Tracks incremental sync state per mailbox
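The commit message mentions `GenerateAccountDBName()`, but its body is not shown. Below is a hypothetical stand-in, `accountDBName`, that reproduces the `m2c_personal_gmail` example. It lowercases the source name, replaces everything outside `[a-z0-9_]` with `_`, and adds the prefix. Because `m2c_` starts with a letter, the result also satisfies CouchDB's rule that database names begin with a lowercase letter.

```go
package main

import (
	"fmt"
	"strings"
)

// accountDBName is a hypothetical sketch of m2c_-prefixed database naming.
func accountDBName(sourceName string) string {
	var b strings.Builder
	b.WriteString("m2c_")
	for _, r := range strings.ToLower(sourceName) {
		if (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9') || r == '_' {
			b.WriteRune(r)
		} else {
			b.WriteRune('_')
		}
	}
	return b.String()
}

func main() {
	fmt.Println(accountDBName("Personal Gmail")) // m2c_personal_gmail
	fmt.Println(accountDBName("Work Email"))     // m2c_work_email
}
```

One consequence of this kind of mapping is that names differing only in punctuation or case collide. For example, "Work-Email" and "Work Email" both become `m2c_work_email`, so source names should stay distinct after normalization.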
### Document Structure
```json
{
  "_id": "INBOX_12345",
  "sourceUid": "12345",
  "mailbox": "INBOX",
  "from": ["sender@example.com"],
  "to": ["recipient@example.com"],
  "subject": "Sample Email",
  "date": "2024-01-15T10:30:00Z",
  "body": "Email content...",
  "headers": {"Content-Type": ["text/plain"]},
  "storedAt": "2024-01-15T10:35:00Z",
  "docType": "mail",
  "hasAttachments": true,
  "_attachments": {
    "document.pdf": {
      "content_type": "application/pdf",
      "length": 54321
    }
  }
}
```
## Example Configurations
### Simple Configuration
Basic setup for a single Gmail account:
```json
{
  "couchDb": {
    "url": "http://localhost:5984",
    "user": "admin",
    "password": "password"
  },
  "mailSources": [
    {
      "name": "Personal Gmail",
      "enabled": true,
      "protocol": "imap",
      "host": "imap.gmail.com",
      "port": 993,
      "user": "your-email@gmail.com",
      "password": "your-app-password",
      "mode": "archive",
      "folderFilter": {
        "include": ["INBOX", "Sent"],
        "exclude": []
      },
      "messageFilter": {
        "since": "2024-01-01"
      }
    }
  ]
}
```
### Advanced Multi-Account Configuration
Complex setup with multiple accounts, filtering, and different sync modes:
```json
{
  "couchDb": {
    "url": "https://your-couchdb.example.com:5984",
    "user": "backup_user",
    "password": "secure_password"
  },
  "mailSources": [
    {
      "name": "Work Email",
      "enabled": true,
      "protocol": "imap",
      "host": "outlook.office365.com",
      "port": 993,
      "user": "you@company.com",
      "password": "app-password",
      "mode": "sync",
      "folderFilter": {
        "include": ["*"],
        "exclude": ["Deleted Items", "Junk Email", "Drafts"]
      },
      "messageFilter": {
        "since": "2023-01-01",
        "subjectKeywords": ["project", "meeting", "urgent"],
        "senderKeywords": ["@company.com", "@client.com"]
      }
    },
    {
      "name": "Personal Gmail",
      "enabled": true,
      "protocol": "imap",
      "host": "imap.gmail.com",
      "port": 993,
      "user": "personal@gmail.com",
      "password": "gmail-app-password",
      "mode": "archive",
      "folderFilter": {
        "include": ["INBOX", "Important", "Work/*", "Personal/*"],
        "exclude": ["[Gmail]/Trash", "[Gmail]/Spam", "*Temp*"]
      },
      "messageFilter": {
        "recipientKeywords": ["family@", "personal@"]
      }
    },
    {
      "name": "Self-Hosted Mail",
      "enabled": true,
      "protocol": "imap",
      "host": "mail.yourdomain.com",
      "port": 143,
      "user": "admin@yourdomain.com",
      "password": "mail-password",
      "mode": "archive",
      "folderFilter": {
        "include": ["INBOX", "Archive/*", "Projects/*"],
        "exclude": ["*/Drafts", "Trash"]
      },
      "messageFilter": {
        "since": "2023-06-01",
        "subjectKeywords": ["invoice", "receipt", "statement"]
      }
    },
    {
      "name": "Legacy Account",
      "enabled": false,
      "protocol": "imap",
      "host": "legacy.mailserver.com",
      "port": 993,
      "user": "old@account.com",
      "password": "legacy-password",
      "mode": "archive",
      "folderFilter": {
        "include": ["INBOX"],
        "exclude": []
      },
      "messageFilter": {}
    }
  ]
}
```
### Configuration Options Reference
#### CouchDB Configuration
- `url`: CouchDB server URL with protocol and port
- `user`: CouchDB username with database access
- `password`: CouchDB password
#### Mail Source Configuration
- `name`: Descriptive name (used for database naming)
- `enabled`: Boolean to enable/disable this source
- `protocol`: Only `"imap"` currently supported
- `host`: IMAP server hostname
- `port`: IMAP port (993 for TLS, 143 for plain, 3143 for testing)
- `user`: Email account username
- `password`: Email account password (use app passwords for Gmail/Outlook)
- `mode`: `"sync"` (mirror server) or `"archive"` (preserve all messages)
#### Folder Filter Configuration
- `include`: Array of folder patterns to process (empty = all folders)
- `exclude`: Array of folder patterns to skip
#### Message Filter Configuration
- `since`: Date string (YYYY-MM-DD) to process messages from
- `subjectKeywords`: Array of keywords that must appear in subject line
- `senderKeywords`: Array of keywords that must appear in sender addresses
- `recipientKeywords`: Array of keywords that must appear in recipient addresses
## Production Deployment
### Security Considerations
- Use app passwords instead of account passwords
- Store configuration files with restricted permissions (600)
- Use HTTPS for CouchDB connections in production
- Consider encrypting sensitive configuration data
### Monitoring and Maintenance
- Review sync metadata documents for sync health
- Monitor CouchDB database sizes and compaction
- Set up log rotation for application output
- Schedule regular backups of CouchDB databases
### Performance Tuning
- Use `--max-messages` to limit processing load
- Run during off-peak hours for large initial syncs
- Monitor IMAP server rate limits and connection limits
- Consider running multiple instances for different accounts
## Troubleshooting
### Common Issues
**Connection Errors**:
- Verify IMAP server settings and credentials
- Check firewall and network connectivity
- Ensure correct ports (993 for TLS, 143 for plain)
**Authentication Failures**:
- Use app passwords for Gmail, Outlook, and other providers
- Do not rely on "Less Secure Apps": Google has retired that setting, so Gmail requires an app password
- Verify account permissions and 2FA settings
**Sync Issues**:
- Check CouchDB connectivity and permissions
- Review sync metadata documents for error states
- Verify folder names and patterns match server structure
**Performance Problems**:
- Use date filtering (`since`) for large mailboxes
- Implement `--max-messages` limits for initial syncs
- Monitor server-side rate limiting
For detailed troubleshooting, see the [test environment documentation](test/README.md).
## Contributing
This project welcomes contributions! Please see [CLAUDE.md](CLAUDE.md) for development setup and architecture details.
## License
[License information to be added]