A utility to back up mail from various sources to couchdb
Ole-Morten Duesund c2ad55eaaf feat: add comprehensive README documentation and clean up configuration
## Documentation Enhancements
- Create comprehensive README with installation, configuration, and usage examples
- Add simple, advanced, and provider-specific configuration examples
- Document all features: incremental sync, wildcard patterns, keyword filtering, attachment support
- Include production deployment guidance and troubleshooting section
- Add architecture documentation with database structure and document format examples

## Configuration Cleanup
- Remove unnecessary `database` field from CouchDB configuration
- Add `m2c_` prefix to all CouchDB database names for better namespace isolation
- Update GenerateAccountDBName() to consistently prefix databases with `m2c_`
- Clean up all configuration examples to remove deprecated database field

## Test Environment Simplification
- Simplify test script structure to eliminate confusion and redundancy
- Remove redundant populate-test-messages.sh wrapper script
- Update run-tests.sh to be comprehensive automated test with cleanup
- Maintain clear separation: automated tests vs manual testing environment
- Update all test scripts to expect m2c-prefixed database names

## Configuration Examples Added
- config-simple.json: Basic single Gmail account setup
- config-advanced.json: Multi-account with complex filtering and different providers
- config-providers.json: Real-world configurations for Gmail, Outlook, Yahoo, iCloud

## Benefits
- Clear documentation for users from beginner to advanced
- Namespace isolation prevents database conflicts in shared CouchDB instances
- Simplified test workflow eliminates user confusion about which scripts to use
- Comprehensive examples cover common email provider configurations

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-01 21:26:53 +02:00

mail2couch

An email backup utility that synchronizes mail from IMAP accounts into CouchDB databases, with incremental sync, folder and message filtering, and native attachment support.

Features

Core Functionality

  • IMAP Email Backup: Connect to any IMAP server (Gmail, Outlook, self-hosted)
  • CouchDB Storage: Store emails as JSON documents with native CouchDB attachments
  • Incremental Sync: Efficiently sync only new messages using IMAP SEARCH with timestamp tracking
  • Per-Account Databases: Each mail source gets its own CouchDB database for better organization
  • Duplicate Prevention: Automatic detection and prevention of duplicate message storage

Sync Modes

  • Archive Mode: Preserve all messages ever seen, even if deleted from mail server (default)
  • Sync Mode: Maintain 1-to-1 relationship with mail server (removes deleted messages from CouchDB)

Advanced Filtering

  • Wildcard Folder Patterns: Use *, ?, [abc] patterns for flexible folder selection
  • Keyword Filtering: Filter messages by keywords in subjects, senders, or recipients
  • Date Filtering: Process only messages since a specific date
  • Include/Exclude Logic: Combine multiple filter types for precise control

Message Processing

  • Full MIME Support: Parse multipart messages, HTML/plain text, and embedded content
  • Native Attachments: Store email attachments as CouchDB native attachments with compression
  • Complete Headers: Preserve all email headers and metadata
  • UTF-8 Support: Handle international characters and special content

Operational Features

  • Automatic Config Discovery: Finds configuration files in standard locations
  • Command Line Control: Choose a configuration file with --config and cap the messages processed per mailbox with --max-messages
  • Comprehensive Logging: Detailed output for monitoring and troubleshooting
  • Error Resilience: Graceful handling of network issues and server problems

Quick Start

Installation

  1. Install Go 1.21 or later and check the installed version:

    # Go 1.21+ required
    go version
    
  2. Clone and build:

    git clone <repository-url>
    cd mail2couch/go
    go build -o mail2couch .
    

Basic Usage

  1. Create a configuration file (config.json):

    {
      "couchDb": {
        "url": "http://localhost:5984",
        "user": "admin", 
        "password": "password"
      },
      "mailSources": [
        {
          "name": "Personal Gmail",
          "enabled": true,
          "protocol": "imap",
          "host": "imap.gmail.com",
          "port": 993,
          "user": "your-email@gmail.com",
          "password": "your-app-password",
          "mode": "archive",
          "folderFilter": {
            "include": ["*"],
            "exclude": ["[Gmail]/Trash", "[Gmail]/Spam"]
          }
        }
      ]
    }
    
  2. Run mail2couch:

    ./mail2couch
    

The application will:

  • Create a CouchDB database named m2c_personal_gmail
  • Sync all folders except Trash and Spam
  • Store messages with native attachments
  • Track sync state for efficient incremental updates

Configuration

Configuration File Discovery

mail2couch automatically searches for configuration files in this order:

  1. Path specified by --config flag
  2. ./config.json (current directory)
  3. ./config/config.json (config subdirectory)
  4. ~/.config/mail2couch/config.json (user config directory)
  5. ~/.mail2couch.json (user home directory)

Command Line Options

./mail2couch [options]

Options:
  --config PATH        Specify configuration file path
  --max-messages N     Limit messages processed per mailbox per run (0 = unlimited)

Folder Pattern Examples

Pattern       Description       Matches
"*"           All folders       INBOX, Sent, Work/Projects, etc.
"INBOX"       Exact match       INBOX only
"Work*"       Prefix match      Work, Work/Projects, WorkStuff
"*/Archive"   Suffix match      Personal/Archive, Work/Archive
"Work/*"      Subfolder match   Work/Projects, Work/Clients

Keyword Filtering Examples

{
  "messageFilter": {
    "subjectKeywords": ["urgent", "meeting", "invoice"],
    "senderKeywords": ["@company.com", "noreply@"],
    "recipientKeywords": ["team@", "support@"]
  }
}

Advanced Configuration Examples

See the example configurations section below for detailed configuration scenarios.

Testing

The test directory contains an integration test environment that runs in Podman containers:

cd test

# Quick automated testing (recommended)
./run-tests.sh              # Complete integration test with automatic cleanup

# Specialized feature testing  
./test-wildcard-patterns.sh # Test folder pattern matching
./test-incremental-sync.sh  # Test incremental synchronization

# Manual testing environment
./start-test-env.sh         # Start persistent test environment
# ... manual testing with various configurations ...
./stop-test-env.sh          # Clean up when done

Architecture

Database Structure

  • Per-Account Databases: Each mail source creates its own CouchDB database with m2c_ prefix
  • Message Documents: Each email becomes a CouchDB document with metadata
  • Native Attachments: Email attachments stored as CouchDB attachments (compressed)
  • Sync Metadata: Tracks incremental sync state per mailbox
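The naming rule can be sketched as follows. The Basic Usage example maps "Personal Gmail" to m2c_personal_gmail; the hypothetical accountDBName reproduces that mapping, but its handling of punctuation and non-ASCII characters is an assumption and may differ from the project's GenerateAccountDBName. The regular expression is CouchDB's documented rule for database names.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// couchDBName matches CouchDB's documented rule for database names.
var couchDBName = regexp.MustCompile(`^[a-z][a-z0-9_$()+/-]*$`)

// accountDBName (hypothetical) lowercases the source name, replaces any
// character outside [a-z0-9] with "_", and adds the "m2c_" prefix.
func accountDBName(sourceName string) string {
	var b strings.Builder
	b.WriteString("m2c_")
	for _, r := range strings.ToLower(sourceName) {
		if (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9') {
			b.WriteRune(r)
		} else {
			b.WriteRune('_')
		}
	}
	return b.String()
}

func main() {
	for _, name := range []string{"Personal Gmail", "Work Email", "Self-Hosted Mail"} {
		db := accountDBName(name)
		fmt.Println(db, couchDBName.MatchString(db))
	}
}
```

Because the prefix always starts with a lowercase letter, every generated name also satisfies CouchDB's first-character rule.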

Document Structure

{
  "_id": "INBOX_12345",
  "sourceUid": "12345", 
  "mailbox": "INBOX",
  "from": ["sender@example.com"],
  "to": ["recipient@example.com"],
  "subject": "Sample Email",
  "date": "2024-01-15T10:30:00Z",
  "body": "Email content...",
  "headers": {"Content-Type": ["text/plain"]},
  "storedAt": "2024-01-15T10:35:00Z",
  "docType": "mail",
  "hasAttachments": true,
  "_attachments": {
    "document.pdf": {
      "content_type": "application/pdf",
      "length": 54321
    }
  }
}

Example Configurations

Simple Configuration

Basic setup for a single Gmail account:

{
  "couchDb": {
    "url": "http://localhost:5984",
    "user": "admin",
    "password": "password"
  },
  "mailSources": [
    {
      "name": "Personal Gmail",
      "enabled": true,
      "protocol": "imap",
      "host": "imap.gmail.com", 
      "port": 993,
      "user": "your-email@gmail.com",
      "password": "your-app-password",
      "mode": "archive",
      "folderFilter": {
        "include": ["INBOX", "Sent"],
        "exclude": []
      },
      "messageFilter": {
        "since": "2024-01-01"
      }
    }
  ]
}

Advanced Multi-Account Configuration

Complex setup with multiple accounts, filtering, and different sync modes:

{
  "couchDb": {
    "url": "https://your-couchdb.example.com:5984",
    "user": "backup_user",
    "password": "secure_password"
  },
  "mailSources": [
    {
      "name": "Work Email", 
      "enabled": true,
      "protocol": "imap",
      "host": "outlook.office365.com",
      "port": 993,
      "user": "you@company.com",
      "password": "app-password",
      "mode": "sync",
      "folderFilter": {
        "include": ["*"],
        "exclude": ["Deleted Items", "Junk Email", "Drafts"]
      },
      "messageFilter": {
        "since": "2023-01-01",
        "subjectKeywords": ["project", "meeting", "urgent"],
        "senderKeywords": ["@company.com", "@client.com"]
      }
    },
    {
      "name": "Personal Gmail",
      "enabled": true, 
      "protocol": "imap",
      "host": "imap.gmail.com",
      "port": 993,
      "user": "personal@gmail.com",
      "password": "gmail-app-password",
      "mode": "archive",
      "folderFilter": {
        "include": ["INBOX", "Important", "Work/*", "Personal/*"],
        "exclude": ["[Gmail]/Trash", "[Gmail]/Spam", "*Temp*"]
      },
      "messageFilter": {
        "recipientKeywords": ["family@", "personal@"]
      }
    },
    {
      "name": "Self-Hosted Mail",
      "enabled": true,
      "protocol": "imap", 
      "host": "mail.yourdomain.com",
      "port": 143,
      "user": "admin@yourdomain.com",
      "password": "mail-password",
      "mode": "archive",
      "folderFilter": {
        "include": ["INBOX", "Archive/*", "Projects/*"],
        "exclude": ["*/Drafts", "Trash"]
      },
      "messageFilter": {
        "since": "2023-06-01",
        "subjectKeywords": ["invoice", "receipt", "statement"]
      }
    },
    {
      "name": "Legacy Account",
      "enabled": false,
      "protocol": "imap",
      "host": "legacy.mailserver.com", 
      "port": 993,
      "user": "old@account.com",
      "password": "legacy-password",
      "mode": "archive",
      "folderFilter": {
        "include": ["INBOX"],
        "exclude": []
      },
      "messageFilter": {}
    }
  ]
}

Configuration Options Reference

CouchDB Configuration

  • url: CouchDB server URL with protocol and port
  • user: CouchDB username with database access
  • password: CouchDB password

Mail Source Configuration

  • name: Descriptive name, also used to derive the database name (e.g. "Personal Gmail" becomes m2c_personal_gmail)
  • enabled: Boolean to enable/disable this source
  • protocol: Only "imap" currently supported
  • host: IMAP server hostname
  • port: IMAP port (993 for TLS, 143 for plain, 3143 for testing)
  • user: Email account username
  • password: Email account password (use app passwords for Gmail/Outlook)
  • mode: "sync" (mirror server) or "archive" (preserve all messages)

Folder Filter Configuration

  • include: Array of folder patterns to process (empty = all folders)
  • exclude: Array of folder patterns to skip

Message Filter Configuration

  • since: Date string (YYYY-MM-DD); only messages from this date onward are processed
  • subjectKeywords: Array of keywords that must appear in subject line
  • senderKeywords: Array of keywords that must appear in sender addresses
  • recipientKeywords: Array of keywords that must appear in recipient addresses

Production Deployment

Security Considerations

  • Use app passwords instead of account passwords
  • Store configuration files with restricted permissions (600)
  • Use HTTPS for CouchDB connections in production
  • Consider encrypting sensitive configuration data

Monitoring and Maintenance

  • Review sync metadata documents for sync health
  • Monitor CouchDB database sizes and compaction
  • Set up log rotation for application output
  • Schedule regular backups of CouchDB databases

Performance Tuning

  • Use --max-messages to limit processing load
  • Run during off-peak hours for large initial syncs
  • Monitor IMAP server rate limits and connection limits
  • Consider running multiple instances for different accounts

Troubleshooting

Common Issues

Connection Errors:

  • Verify IMAP server settings and credentials
  • Check firewall and network connectivity
  • Ensure correct ports (993 for TLS, 143 for plain)

Authentication Failures:

  • Use app passwords for Gmail, Outlook, and other providers
  • Enable "Less Secure Apps" only if your provider still requires it; Gmail has retired this option, so use an app password there
  • Verify account permissions and 2FA settings

Sync Issues:

  • Check CouchDB connectivity and permissions
  • Review sync metadata documents for error states
  • Verify folder names and patterns match server structure
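CouchDB connectivity and permissions can be checked with a single GET /{db} request. The sketch below is hypothetical: checkDatabase maps the status codes to hints, and fakeCouch is an in-process stand-in built with net/http/httptest so the example runs without a real server. To check a real deployment, call checkDatabase with your CouchDB URL, credentials, and an m2c_ database name.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
	"time"
)

// checkDatabase issues GET /{db} with basic auth and turns CouchDB's answer
// into a hint: 200 readable, 401/403 credentials or permissions, 404 missing.
func checkDatabase(baseURL, user, password, db string) error {
	u, err := url.JoinPath(baseURL, db)
	if err != nil {
		return err
	}
	req, err := http.NewRequest(http.MethodGet, u, nil)
	if err != nil {
		return err
	}
	req.SetBasicAuth(user, password)
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		return fmt.Errorf("cannot reach CouchDB: %w", err)
	}
	defer resp.Body.Close()
	switch resp.StatusCode {
	case http.StatusOK:
		return nil
	case http.StatusUnauthorized, http.StatusForbidden:
		return fmt.Errorf("%s: check CouchDB user and permissions (%s)", db, resp.Status)
	case http.StatusNotFound:
		return fmt.Errorf("%s: database does not exist (%s)", db, resp.Status)
	default:
		return fmt.Errorf("%s: unexpected response %s", db, resp.Status)
	}
}

// fakeCouch stands in for a CouchDB server with one database that accepts
// admin/password, so the sketch runs offline.
func fakeCouch() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if u, p, ok := r.BasicAuth(); !ok || u != "admin" || p != "password" {
			http.Error(w, `{"error":"unauthorized"}`, http.StatusUnauthorized)
			return
		}
		if r.URL.Path != "/m2c_personal_gmail" {
			http.Error(w, `{"error":"not_found"}`, http.StatusNotFound)
			return
		}
		fmt.Fprint(w, `{"db_name":"m2c_personal_gmail"}`)
	}))
}

func main() {
	srv := fakeCouch()
	defer srv.Close()
	fmt.Println(checkDatabase(srv.URL, "admin", "password", "m2c_personal_gmail")) // <nil>
	fmt.Println(checkDatabase(srv.URL, "admin", "wrong", "m2c_personal_gmail"))
	fmt.Println(checkDatabase(srv.URL, "admin", "password", "m2c_work_email"))
}
```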

Performance Problems:

  • Use date filtering (since) for large mailboxes
  • Use --max-messages to cap the number of messages per mailbox during initial syncs
  • Monitor server-side rate limiting

For detailed troubleshooting, see the test environment documentation.

Contributing

This project welcomes contributions! Please see CLAUDE.md for development setup and architecture details.

License

[License information to be added]