diff --git a/ANALYSIS.md b/ANALYSIS.md deleted file mode 100644 index 64f46c6..0000000 --- a/ANALYSIS.md +++ /dev/null @@ -1,112 +0,0 @@ -### Comprehensive Analysis of `mail2couch` Implementations - -This document provides an updated, in-depth analysis of the `mail2couch` project, integrating findings from the original `ANALYSIS.md` with a fresh review of the current Go and Rust codebases. It evaluates the current state, compares the two implementations, and outlines a roadmap for future improvements. - ---- - -### 1. Current State of the Implementations - -The project currently consists of two distinct implementations of the same core tool. - -* **The Go Implementation**: This is a mature, functional, and straightforward command-line tool. It is built on a simple, sequential architecture and effectively synchronizes emails from IMAP servers to CouchDB. It serves as a solid baseline for the project's core functionality. - -* **The Rust Implementation**: Contrary to the description in the original `ANALYSIS.md`, the Rust version is **no longer a non-functional placeholder**. It is now a complete and, in many ways, more advanced alternative to the Go version. It is built on a highly modular, asynchronous architecture, prioritizing performance, robustness, and an expanded feature set. - ---- - -### 2. Analysis of Points from Original `ANALYSIS.md` - -Several key issues and suggestions were raised in the original analysis. Here is their current status: - -* **`Incomplete Rust Implementation`**: **(Addressed)** The Rust implementation is now fully functional and surpasses the Go version in features and robustness. -* **`Performance for Large-Scale Use (Concurrency)`**: **(Addressed in Rust)** The Go version remains sequential. The Rust version, however, is fully asynchronous, allowing for concurrent network operations, which directly addresses this performance concern.
-* **`Inefficient Keyword Filtering`**: **(Addressed in Rust)** The Go version still performs keyword filtering client-side. The Rust version implements server-side filtering using `IMAP SEARCH` with keywords, which is significantly more efficient. -* **`Primary Weakness: Security`**: **(Still an Issue)** Both implementations still require plaintext passwords in the configuration file. This remains a primary weakness. -* **`Missing Core Feature: Web Interface`**: **(Still an Issue)** This feature has not been implemented in either version. -* **`Usability Enhancement: Dry-Run Mode`**: **(✅ Resolved)** Both implementations now include comprehensive `--dry-run/-n` mode functionality that allows safe configuration testing without making any CouchDB changes. - ---- - -### 3. Comparative Analysis: Go vs. Rust - -#### **The Go Version** - -* **Pros**: - * **Simplicity**: The code is sequential and easy to follow, making it highly approachable for new contributors. - * **Stability**: It provides a solid, functional baseline that effectively accomplishes the core mission of the project. - * **Fast Compilation**: Quick compile times make for a fast development cycle. - * **Dry-Run Support**: Now includes comprehensive `--dry-run` mode for safe configuration testing. -* **Cons**: - * **Performance**: The lack of concurrency makes it slow for users with multiple accounts or large mailboxes. - * **Inefficiency**: Client-side keyword filtering wastes bandwidth and processing time. - * **Basic Error Handling**: The absence of retry logic makes it brittle in the face of transient network errors. - -#### **The Rust Version** - -* **Pros**: - * **Performance**: The `async` architecture provides superior performance through concurrency. - * **Robustness**: Automatic retry logic for network calls makes it highly resilient to temporary failures. - * **Feature-Rich**: Implements more efficient server-side filtering, better folder-matching logic, and a more professional CLI. 
- * **Safety & Maintainability**: The modular design and Rust's compile-time guarantees make the code safer and easier to maintain and extend. - * **Comprehensive Dry-Run**: Includes detailed `--dry-run` mode with enhanced simulation logging and summary reporting. -* **Cons**: - * **Complexity**: The codebase is significantly more complex due to its asynchronous nature, abstract design, and the inherent learning curve of Rust. - * **Slower Compilation**: Longer compile times can slow down development. - ---- - -### 4. Recent Implementation Updates - -#### **Dry-Run Mode Implementation (August 2025)** - -Both Go and Rust implementations now include comprehensive `--dry-run` functionality: - -##### **Go Implementation Features:** -- **CLI Integration**: Added `--dry-run/-n` flag using pflag with GNU-style options -- **Comprehensive Skipping**: All CouchDB write operations bypassed in dry-run mode -- **IMAP Preservation**: Maintains full IMAP operations for realistic email discovery -- **Detailed Simulation**: Shows what would be done with informative logging -- **Enhanced Reporting**: Clear distinction between dry-run and normal mode output -- **Bash Completion**: Updated completion script includes new flag - -##### **Rust Implementation Features:** -- **CLI Integration**: Added `--dry-run/-n` flag using clap with structured argument parsing -- **Advanced Simulation**: Detailed logging of what would be stored including message subjects -- **Async-Safe Skipping**: All async CouchDB operations properly bypassed -- **Enhanced Summary**: Comprehensive dry-run vs normal mode reporting with emoji indicators -- **Test Coverage**: All tests updated to include new dry_run field - -##### **Implementation Benefits:** -- **Risk Mitigation**: Users can validate configurations without database changes -- **Debugging Aid**: Shows exactly what emails would be processed and stored -- **Development Tool**: Enables safe testing of configuration changes -- **Documentation**: 
Demonstrates the full sync process without side effects - -This addresses the critical usability requirement identified in the original analysis and significantly improves the user experience for configuration validation and troubleshooting. - ---- - -### 5. Future Improvements and Missing Features - -This roadmap combines suggestions from both analyses, prioritizing the most impactful changes. - -#### **Tier 1: Critical Needs** - -1. **Fix the Security Model (Both)**: This is the most urgent issue. - * **Short-Term**: Add support for reading credentials from environment variables (e.g., `M2C_IMAP_PASSWORD`). - * **Long-Term**: Implement OAuth2 for modern providers like Gmail and Outlook. This is the industry standard and eliminates the need to store passwords. -2. **Implement a Web Interface (Either)**: As noted in the original analysis, this is the key missing feature for making the archived data useful. This would involve creating CouchDB design documents and a simple web server to render the views. -3. ~~**Add a `--dry-run` Mode (Both)**~~: **✅ COMPLETED** - Both implementations now include comprehensive dry-run functionality with the `--dry-run/-n` flag that allows users to test their configuration safely before making any changes to their database. - -#### **Tier 2: High-Impact Enhancements** - -1. **Add Concurrency to the Go Version**: To bring the Go implementation closer to the performance of the Rust version, it should be updated to use goroutines to process accounts and/or mailboxes in parallel. -2. **Improve Attachment Handling in Rust**: The `TODO` in the Rust IMAP client for parsing binary attachments should be completed to ensure all attachment types are saved correctly. -3. **URL-Encode Document IDs in Rust**: The CouchDB client in the Rust version should URL-encode document IDs to prevent errors when mailbox names contain special characters. -4. 
**Add Progress Indicators (Rust)**: For a better user experience during long syncs, the Rust version would benefit greatly from progress bars (e.g., using the `indicatif` crate). - -#### **Tier 3: "Nice-to-Have" Features** - -1. **Interactive Setup (Either)**: A `mail2couch setup` command to interactively generate the `config.json` file would significantly improve the first-time user experience. -2. **Support for Other Protocols/Backends (Either)**: Extend the tool to support POP3 or JMAP, or to use other databases like PostgreSQL or Elasticsearch as a storage backend. -3. **Backfill Command (Either)**: A `--backfill-all` flag to ignore existing sync metadata and perform a complete re-sync of an account. \ No newline at end of file diff --git a/FOLDER_PATTERNS.md b/FOLDER_PATTERNS.md deleted file mode 100644 index dc96b8c..0000000 --- a/FOLDER_PATTERNS.md +++ /dev/null @@ -1,102 +0,0 @@ -# Folder Pattern Matching in mail2couch - -mail2couch supports powerful wildcard patterns for selecting which folders to process. This allows flexible configuration for different mail backup scenarios. - -## Pattern Syntax - -The folder filtering uses Go's `filepath.Match` syntax, which supports: - -- `*` matches any sequence of non-separator characters (including none); on Unix-like systems `/` is the separator -- `?` matches any single non-separator character -- `[abc]` matches any character within the brackets -- `[a-z]` matches any character in the range -- `\` escapes special characters - -## Special Cases - -- `"*"` in the include list means **ALL available folders** will be processed -- Empty include list with exclude patterns will process all folders except excluded ones -- Exact string matching is supported for backwards compatibility - -## Examples - -### Include All Folders -```json -{ - "folderFilter": { - "include": ["*"], - "exclude": ["Drafts", "Trash", "Spam"] - } -} -``` -This processes all folders except Drafts, Trash, and Spam.
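The include/exclude rules above can be sketched in Go as follows. This is a minimal illustration, not the project's actual code: the helper names `shouldProcessFolder` and `matchFolder` are hypothetical, and giving exclude patterns precedence over include patterns is an assumption consistent with the examples in this document.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// shouldProcessFolder is a hypothetical helper (not mail2couch's real API).
// Assumption: exclude patterns take precedence over include patterns.
func shouldProcessFolder(name string, include, exclude []string) bool {
	for _, pattern := range exclude {
		if matchFolder(pattern, name) {
			return false
		}
	}
	// An empty include list means "every folder that is not excluded".
	if len(include) == 0 {
		return true
	}
	for _, pattern := range include {
		if matchFolder(pattern, name) {
			return true
		}
	}
	return false
}

// matchFolder treats a lone "*" as "all folders", then accepts an exact
// name match, then falls back to filepath.Match.
func matchFolder(pattern, name string) bool {
	if pattern == "*" || pattern == name {
		return true
	}
	ok, err := filepath.Match(pattern, name)
	// filepath.Match only fails with ErrBadPattern; treat that as no match.
	return err == nil && ok
}

func main() {
	include := []string{"*"}
	exclude := []string{"Drafts", "Trash", "Spam"}
	for _, folder := range []string{"INBOX", "Trash", "Work/Projects"} {
		fmt.Printf("%s: %v\n", folder, shouldProcessFolder(folder, include, exclude))
	}
}
```

Running it prints `INBOX: true`, `Trash: false` and `Work/Projects: true`. The explicit special case for a lone `"*"` matters: on Unix-like systems `filepath.Match("*", "Work/Projects")` returns false, because `*` does not match `/`.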
- ### Work-Related Folders Only -```json -{ - "folderFilter": { - "include": ["Work*", "Projects*", "INBOX"], - "exclude": ["*Temp*", "*Draft*"] - } -} -``` -This includes folders starting with "Work" or "Projects", plus INBOX, but excludes any folder containing "Temp" or "Draft". - -### Archive Patterns -```json -{ - "folderFilter": { - "include": ["Archive*", "*Important*", "INBOX"], - "exclude": ["*Temp"] - } -} -``` -This includes folders starting with "Archive", any folder containing "Important", and INBOX, excluding temporary folders. - -### Specific Folders Only -```json -{ - "folderFilter": { - "include": ["INBOX", "Sent", "Important"], - "exclude": [] - } -} -``` -This processes only the exact folders: INBOX, Sent, and Important. - -### Subfolder Patterns -```json -{ - "folderFilter": { - "include": ["Work/*", "Personal/*"], - "exclude": ["*/Drafts"] - } -} -``` -This includes the direct subfolders of Work and Personal (for example `Work/Projects`), but excludes any `Drafts` folder one level below a top-level folder. - -## Folder Hierarchy - -Different IMAP servers use different separators for folder hierarchies: -- Most servers use `/` (e.g., `Work/Projects`, `Archive/2024`) -- Some use `.` (e.g., `Work.Projects`, `Archive.2024`) - -Patterns are matched against the full folder name, whatever separator your IMAP server uses. If matching is done with Go's `filepath.Match` as described above, note that on Unix-like systems `/` is a path separator: `*` and `?` do not match it, so `Work*` does not match `Work/Projects` and patterns such as `*Temp*` only match names that contain no `/`. A `.` is an ordinary character, so `Work*` does match `Work.Projects`. - -## Common Use Cases - -1. **Corporate Email**: `["*"]` with exclude `["Drafts", "Trash", "Spam"]` for complete backup -2. **Selective Backup**: `["INBOX", "Sent", "Important"]` for essential folders only -3. **Project-based**: `["Project*", "Client*"]` to back up work-related folders -4. **Archive Mode**: `["Archive*", "*Important*"]` for long-term storage -5. **Sync Mode**: `["INBOX"]` for real-time synchronization - -## Message Origin Tracking - -All messages stored in CouchDB include a `mailbox` field that records the original folder name. This ensures you can always identify which folder a message came from, regardless of how it was selected by the folder filter.
- ## Performance Considerations - -- Using `"*"` processes all folders, which may be slow for accounts with many folders -- Listing specific folder names usually selects fewer folders than broad wildcards, so less mail is fetched; the pattern matching itself is cheap -- Consider using exclude patterns to filter out large, unimportant folders like Trash or Spam \ No newline at end of file diff --git a/IMPLEMENTATION_COMPARISON.md b/IMPLEMENTATION_COMPARISON.md deleted file mode 100644 index 4288f27..0000000 --- a/IMPLEMENTATION_COMPARISON.md +++ /dev/null @@ -1,560 +0,0 @@ -# Go vs Rust Implementation Comparison - -This document provides a comprehensive technical analysis comparing the Go and Rust implementations of mail2couch, helping users and developers choose the best implementation for their needs. - -## Executive Summary - -The mail2couch project offers two distinct architectural approaches to email backup: - -- **Go Implementation**: A straightforward, sequential approach emphasizing simplicity and ease of understanding -- **Rust Implementation**: A sophisticated, asynchronous architecture prioritizing performance, reliability, and advanced features - -**Key Finding**: The Rust implementation (~3,056 LOC across 9 modules) is significantly more feature-complete and architecturally advanced than the Go implementation (~1,355 LOC across 4 modules), representing a mature evolution rather than a simple port.
- --- - -## Architecture & Design Philosophy - -### Go Implementation: Sequential Simplicity - -**Design Philosophy**: Straightforward, imperative programming with minimal abstraction - -- **Processing Model**: Sequential processing of sources → mailboxes → messages -- **Error Handling**: Basic error propagation with continue-on-error for non-critical failures -- **Modularity**: Simple package structure (`config`, `couch`, `mail`, `main`) -- **State Management**: Minimal state, mostly function-based operations - -```go -// Example: Sequential processing approach -func processImapSource(source *config.MailSource, couchClient *couch.Client, - dbName string, maxMessages int, dryRun bool) error { - // Connect to IMAP server - imapClient, err := mail.NewImapClient(source) - if err != nil { - return fmt.Errorf("failed to connect to IMAP server: %w", err) - } - defer imapClient.Logout() - - // (Listing mailboxes and applying the folder filter is elided; it yields `mailboxes`.) - // Process each mailbox sequentially - for _, mailbox := range mailboxes { - // Process messages one by one - messages, currentUIDs, err := imapClient.GetMessages(...)
- // Store messages synchronously - } - return nil -} -``` - -### Rust Implementation: Async Orchestration - -**Design Philosophy**: Modular, type-safe architecture with comprehensive error handling - -- **Processing Model**: Asynchronous coordination with concurrent network operations -- **Error Handling**: Sophisticated retry logic, structured error types, graceful degradation -- **Modularity**: Well-separated concerns (`cli`, `config`, `couch`, `imap`, `sync`, `filters`, `schemas`) -- **State Management**: Stateful coordinator pattern with proper resource management - -```rust -// Example: Asynchronous coordination approach -impl SyncCoordinator { - // `SyncResult` is an assumed name for the per-source result type - pub async fn sync_all_sources(&mut self) -> Result<Vec<SyncResult>> { - let mut results = Vec::new(); - let sources = self.config.mail_sources.clone(); - - for source in &sources { - if !source.enabled { - info!("Skipping disabled source: {}", source.name); - continue; - } - - match self.sync_source(source).await { - Ok(result) => { - info!("✅ Completed sync for {}: {} messages across {} mailboxes", - result.source_name, result.total_messages, result.mailboxes_processed); - results.push(result); - } - Err(e) => { - error!("❌ Failed to sync source {}: {}", source.name, e); - // Continue with other sources even if one fails - } - } - } - Ok(results) - } -} -``` - ---- - -## Performance & Scalability - -### Concurrency Models - -| Aspect | Go Implementation | Rust Implementation | -|--------|------------------|-------------------| -| **Processing Model** | Sequential (blocking) | Asynchronous (non-blocking) | -| **Account Processing** | One at a time | One at a time with internal concurrency | -| **Mailbox Processing** | One at a time | One at a time with async I/O | -| **Message Processing** | One at a time | Batch processing with async operations | -| **Network Operations** | Blocking I/O | Non-blocking async I/O | - -### IMAP Filtering Efficiency - -**Go: Client-Side Filtering** -```go -// Downloads ALL messages first, then filters locally
-messages := imap.FetchAll() -filtered := []Message{} -for _, msg := range messages { - if ShouldProcessMessage(msg, filter) { - filtered = append(filtered, msg) - } -} -``` - -**Rust: Server-Side Filtering** -```rust -// Filters on server, only downloads matching messages -// (chrono `DateTime<Utc>` and `u32` UIDs are assumed types) -pub async fn search_messages_advanced( - &mut self, - since_date: Option<&DateTime<Utc>>, - subject_keywords: Option<&[String]>, - from_keywords: Option<&[String]>, -) -> Result<Vec<u32>> { - let mut search_parts = Vec::new(); - - if let Some(keywords) = subject_keywords { - for keyword in keywords { - search_parts.push(format!("SUBJECT \"{}\"", keyword)); - } - } - // Server processes the filter, returns only matching UIDs -} -``` - -**Performance Impact**: For a mailbox with 10,000 emails where you only want messages whose subject matches a few keywords: -- **Go**: Downloads all 10,000 emails, then filters locally -- **Rust**: Server filters first, downloads only matching emails (the saving in data transfer depends on how many messages match) - -### Error Recovery and Resilience - -**Go: Basic Error Handling** -```go -err := processImapSource(&source, couchClient, dbName, args.MaxMessages, args.DryRun) -if err != nil { - log.Printf("ERROR: Failed to process IMAP source %s: %v", source.Name, err) -} -// Continues with next source, no retry logic -``` - -**Rust: Intelligent Retry Logic** -```rust -async fn retry_operation<F, Fut, T>(&self, operation_name: &str, operation: F) -> Result<T> -where F: Fn() -> Fut, Fut: std::future::Future<Output = Result<T>> -{ - const MAX_RETRIES: u32 = 3; - const RETRY_DELAY_MS: u64 = 1000; - - for attempt in 1..=MAX_RETRIES { - match operation().await { - Ok(result) => return Ok(result), - Err(e) => { - let is_retryable = match e.downcast_ref::<CouchError>() { - Some(CouchError::Http(_)) => true, - Some(CouchError::CouchDb { status, .. }) => *status >= 500, - _ => false, - }; - - if is_retryable && attempt < MAX_RETRIES { - warn!("Attempt {}/{} failed for {}: {}.
Retrying in {}ms...", - attempt, MAX_RETRIES, operation_name, e, RETRY_DELAY_MS); - async_std::task::sleep(Duration::from_millis(RETRY_DELAY_MS)).await; - } else { - error!("Operation {} failed after {} attempts: {}", - operation_name, attempt, e); - return Err(e); - } - } - } - } - unreachable!() -} -``` - ---- - -## Developer Experience - -### Code Complexity and Learning Curve - -| Aspect | Go Implementation | Rust Implementation | -|--------|------------------|-------------------| -| **Lines of Code** | 1,355 | 3,056 | -| **Number of Files** | 4 | 9 | -| **Dependencies** | 4 external | 14+ external | -| **Compilation Time** | 2-3 seconds | 6+ seconds | -| **Learning Curve** | Low | Medium-High | -| **Debugging Ease** | Simple stack traces | Rich error context | - -### Dependency Management - -**Go Dependencies (minimal approach):** -```go -require ( - github.com/emersion/go-imap/v2 v2.0.0-beta.5 - github.com/emersion/go-message v0.18.1 - github.com/go-kivik/kivik/v4 v4.4.0 - github.com/spf13/pflag v1.0.7 -) -``` - -**Rust Dependencies (rich ecosystem):** -```toml -[dependencies] -anyhow = "1.0" -serde = { version = "1.0", features = ["derive"] } -serde_json = "1.0" -tokio = { version = "1.0", features = ["full"] } -reqwest = { version = "0.11", features = ["json"] } -clap = { version = "4.0", features = ["derive"] } -log = "0.4" -env_logger = "0.10" -chrono = { version = "0.4", features = ["serde"] } -async-imap = "0.9" -mail-parser = "0.6" -thiserror = "1.0" -glob = "0.3" -dirs = "5.0" -``` - -**Trade-offs**: -- **Go**: Faster builds, fewer potential security vulnerabilities, simpler dependency tree -- **Rust**: Richer functionality, better error types, more battle-tested async ecosystem - ---- - -## Feature Comparison Matrix - -| Feature | Go Implementation | Rust Implementation | Notes | -|---------|------------------|-------------------|-------| -| **Core Functionality** | -| IMAP Email Sync | ✅ | ✅ | Both fully functional | -| CouchDB Storage | ✅ | ✅ | 
Both support attachments | -| Incremental Sync | ✅ | ✅ | Both use metadata tracking | -| **Configuration** | -| JSON Config Files | ✅ | ✅ | Same format, auto-discovery | -| Folder Filtering | ✅ | ✅ | Both support wildcards | -| Date Filtering | ✅ | ✅ | Since date support | -| Keyword Filtering | ✅ (client-side) | ✅ (server-side) | Rust is more efficient | -| **CLI Features** | -| GNU-style Arguments | ✅ | ✅ | Both use standard conventions | -| Dry-run Mode | ✅ | ✅ | Both recently implemented | -| Bash Completion | ✅ | ✅ | Auto-generated scripts | -| Help System | Basic | Rich | Rust uses clap framework | -| **Reliability** | -| Error Handling | Basic | Advanced | Rust has retry logic | -| Connection Recovery | Manual | Automatic | Rust handles reconnections | -| Resource Management | Manual (defer) | Automatic (RAII) | Rust prevents leaks | -| **Performance** | -| Concurrent Processing | ❌ | ✅ | Rust uses async/await | -| Server-side Filtering | ❌ | ✅ | Rust reduces bandwidth | -| Memory Efficiency | Good | Excellent | Rust zero-copy where possible | -| **Development** | -| Test Coverage | Minimal | Comprehensive | Rust has extensive tests | -| Documentation | Basic | Rich | Rust has detailed docs | -| Type Safety | Good | Excellent | Rust prevents more errors | - ---- - -## Use Case Recommendations - -### Choose Go Implementation When: - -#### 🎯 **Personal Use & Simplicity** -- Single email account or small number of accounts -- Infrequent synchronization (daily/weekly) -- Simple setup requirements -- You want to understand/modify the code easily - -#### 🎯 **Resource Constraints** -- Memory-limited environments -- CPU-constrained systems -- Quick deployment needed -- Minimal disk space for binaries - -#### 🎯 **Development Preferences** -- Team familiar with Go -- Preference for simple, readable code -- Fast compilation important for development cycle -- Minimal external dependencies preferred - -**Example Use Case**: Personal backup of 1-2 Gmail accounts, running 
weekly on a Raspberry Pi. - -### Choose Rust Implementation When: - -#### 🚀 **Performance Critical Scenarios** -- Multiple email accounts (3+ accounts) -- Large mailboxes (10,000+ emails) -- Frequent synchronization (hourly/real-time) -- High-volume email processing - -#### 🚀 **Production Environments** -- Business-critical email backups -- Need for reliable error recovery -- 24/7 operation requirements -- Professional deployment standards - -#### 🚀 **Advanced Features Required** -- Server-side IMAP filtering needed -- Complex folder filtering patterns -- Detailed logging and monitoring -- Long-term maintenance planned - -**Example Use Case**: Corporate email backup system handling 10+ accounts with complex filtering rules, running continuously in a production environment. - ---- - -## Performance Benchmarks - -### Theoretical Performance Comparison - -| Scenario | Go Implementation | Rust Implementation | Improvement | -|----------|------------------|-------------------|-------------| -| **Single small account** (1,000 emails) | 2-3 minutes | 1-2 minutes | 33-50% faster | -| **Multiple accounts** (3 accounts, 5,000 emails each) | 15-20 minutes | 8-12 minutes | 40-47% faster | -| **Large mailbox** (50,000 emails with filtering) | 45-60 minutes | 15-25 minutes | 58-67% faster | -| **Network errors** (5% packet loss) | May fail/restart | Continues with retry | Much more reliable | - -*Note: These are estimated performance improvements based on architectural differences. 
Actual performance will vary based on network conditions, server capabilities, and email characteristics.* - -### Resource Usage - -| Metric | Go Implementation | Rust Implementation | -|--------|------------------|-------------------| -| **Memory Usage** | 20-50 MB | 15-40 MB | -| **CPU Usage** | Low (single-threaded) | Medium (multi-threaded) | -| **Network Efficiency** | Lower (downloads then filters) | Higher (filters then downloads) | -| **Disk I/O** | Sequential writes | Batched writes | - ---- - -## Migration Guide - -### From Go to Rust - -If you're currently using the Go implementation and considering migration: - -#### **When to Migrate**: -- You experience performance issues with large mailboxes -- You need better error recovery and reliability -- You want more efficient network usage -- You're planning long-term maintenance - -#### **Migration Steps**: -1. **Test in parallel**: Run both implementations with `--dry-run` to compare results -2. **Backup existing data**: Ensure your CouchDB data is backed up -3. **Update configuration**: Configuration format is identical, no changes needed -4. **Replace binary**: Simply replace the Go binary with the Rust binary -5. 
**Monitor performance**: Compare sync times and resource usage - -#### **Compatibility Notes**: -- ✅ Configuration files are 100% compatible -- ✅ CouchDB database format is identical -- ✅ Command-line arguments are the same -- ✅ Dry-run mode works identically - -### Staying with Go - -The Go implementation remains fully supported and is appropriate when: -- Current performance meets your needs -- Simplicity is more important than features -- Team lacks Rust expertise -- Resource usage is already optimized for your environment - ---- - -## Technical Architecture Details - -### Go Implementation Structure - -``` -go/ -├── main.go # Entry point and orchestration -├── config/ -│ └── config.go # Configuration loading and CLI parsing -├── couch/ -│ └── couch.go # CouchDB client and operations -└── mail/ - └── imap.go # IMAP client and message processing -``` - -**Key Characteristics**: -- Monolithic processing flow -- Synchronous I/O operations -- Basic error handling -- Minimal abstraction layers - -### Rust Implementation Structure - -``` -rust/src/ -├── main.rs # Entry point -├── lib.rs # Library exports -├── cli.rs # Command-line interface -├── config.rs # Configuration management -├── sync.rs # Synchronization coordinator -├── imap.rs # IMAP client with retry logic -├── couch.rs # CouchDB client with error handling -├── filters.rs # Filtering utilities -└── schemas.rs # Data structure definitions -``` - -**Key Characteristics**: -- Modular architecture with clear separation -- Asynchronous I/O with tokio runtime -- Comprehensive error handling -- Rich abstraction layers - ---- - -## Security Considerations - -Both implementations currently share the same security limitations and features: - -### Current Security Features -- ✅ TLS/SSL support for IMAP and CouchDB connections -- ✅ Configuration file validation -- ✅ Safe handling of email content - -### Shared Security Limitations -- ⚠️ Plaintext passwords in configuration files -- ⚠️ No OAuth2 support for modern email 
providers -- ⚠️ No credential encryption at rest - -### Future Security Improvements (Recommended for Both) -1. **Environment Variable Credentials**: Support reading passwords from environment variables -2. **OAuth2 Integration**: Support modern authentication for Gmail, Outlook, etc. -3. **Credential Encryption**: Encrypt stored credentials with system keyring integration -4. **Audit Logging**: Enhanced logging of authentication and access events - ---- - -## Deployment Considerations - -### Go Implementation Deployment - -**Binary Name**: `mail2couch-go` - -**Advantages**: -- Single binary deployment -- Minimal system dependencies -- Lower memory footprint -- Faster startup time - -**Best Practices**: -```bash -# Build for production using justfile -just build-go-release - -# Or build directly -cd go && go build -ldflags="-s -w" -o mail2couch-go . - -# Deploy with systemd service -sudo cp go/mail2couch-go /usr/local/bin/ -sudo systemctl enable mail2couch-go.service -``` - -### Rust Implementation Deployment - -**Binary Name**: `mail2couch-rs` - -**Advantages**: -- Better resource utilization under load -- Superior error recovery -- More detailed logging and monitoring -- Enhanced CLI experience - -**Best Practices**: -```bash -# Build optimized release using justfile -just build-rust-release - -# Or build directly -cd rust && cargo build --release - -# Deploy with enhanced monitoring -sudo cp rust/target/release/mail2couch-rs /usr/local/bin/ -sudo systemctl enable mail2couch-rs.service - -# Configure structured logging -export RUST_LOG=info -export MAIL2COUCH_LOG_FORMAT=json -``` - -### Universal Installation - -```bash -# Build and install both implementations (user-local) -just install -# This installs to ~/bin/mail2couch-go and ~/bin/mail2couch-rs - -# Build and install both implementations (system-wide) -sudo just system-install -# This installs to /usr/local/bin/mail2couch-go and /usr/local/bin/mail2couch-rs -``` - ---- - -## Future Development Roadmap - 
-### Short-term Improvements (Both Implementations) - -1. **Security Enhancements** - - Environment variable credential support - - OAuth2 authentication for major providers - - Encrypted credential storage - -2. **Usability Improvements** - - Interactive configuration wizard - - Progress indicators for long-running operations - - Enhanced error messages with solutions - -### Long-term Strategic Direction - -#### Go Implementation (Maintenance Mode) -- Bug fixes and security updates -- Maintain compatibility with Rust version -- Focus on simplicity and stability -- Target: Personal and small-scale deployments - -#### Rust Implementation (Active Development) -- Performance optimizations -- Advanced features (web interface, monitoring APIs) -- Enterprise features (clustering, high availability) -- Target: Production and large-scale deployments - -### Recommended Development Focus - -1. **Primary Development**: Focus on Rust implementation for new features -2. **Compatibility Maintenance**: Ensure Go version remains compatible -3. **Migration Path**: Provide clear migration guidance and tooling -4. **Documentation**: Maintain comprehensive documentation for both - ---- - -## Conclusion - -Both implementations represent excellent software engineering practices and serve different market segments effectively: - -- **Go Implementation**: Ideal for users who prioritize simplicity, fast deployment, and ease of understanding. Perfect for personal use and small-scale deployments. - -- **Rust Implementation**: Superior choice for users who need performance, reliability, and advanced features. Recommended for production environments and large-scale email processing. - -### Final Recommendation - -**For new deployments**: Start with the Rust implementation unless simplicity is your primary concern. The performance benefits and reliability features provide significant value. 
-
-**For existing Go users**: Consider migration if you experience performance limitations or need better error recovery. The migration path is straightforward due to configuration compatibility.
-
-**For development contributions**: Focus on the Rust implementation for new features, while maintaining the Go version for bug fixes and compatibility.
-
-The project demonstrates that having two implementations can serve different user needs effectively, with each leveraging the strengths of its respective programming language and ecosystem.
\ No newline at end of file
diff --git a/TODO.md b/TODO.md
deleted file mode 100644
index e004c00..0000000
--- a/TODO.md
+++ /dev/null
@@ -1,47 +0,0 @@
-# mail2couch TODO and Feature Requests
-
-## Planned Features
-
-### Keyword Filtering for Messages
-
-Add support for filtering messages by keywords in various message fields. This would extend the current `messageFilter` configuration.
-
-**Proposed Configuration Extension:**
-
-```json
-{
-  "messageFilter": {
-    "since": "2024-01-01",
-    "subjectKeywords": ["urgent", "important", "meeting"],
-    "senderKeywords": ["@company.com", "notifications"],
-    "recipientKeywords": ["team@company.com", "all@"]
-  }
-}
-```
-
-**Implementation Details:**
-
-- `subjectKeywords`: Array of keywords to match in email subject lines
-- `senderKeywords`: Array of keywords to match in sender email addresses or names
-- `recipientKeywords`: Array of keywords to match in recipient (To/CC/BCC) addresses or names
-- Keywords should support both inclusive (must contain) and exclusive (must not contain) patterns
-- Case-insensitive matching by default
-- Support for simple wildcards or regex patterns
-
-**Use Cases:**
-
-1. **Corporate Email Filtering**: Only backup emails from specific domains or containing work-related keywords
-2. **Project-based Archiving**: Filter emails related to specific projects or clients
-3. **Notification Management**: Exclude or include automated notifications based on sender patterns
-4. **Security**: Filter out potential spam/phishing by excluding certain keywords or senders
-
-**Implementation Priority:** Medium - useful for reducing storage requirements and focusing on relevant emails.
-
-## Other Planned Improvements
-
-1. **Real IMAP Message Parsing**: Replace placeholder data with actual message content
-2. **Message Body Extraction**: Support for HTML/plain text and multipart messages
-3. **Attachment Handling**: Optional support for email attachments
-4. **Batch Operations**: Improve CouchDB insertion performance
-5. **Error Recovery**: Retry logic and partial sync recovery
-6. **Testing**: Comprehensive unit test coverage
\ No newline at end of file
diff --git a/couchdb-schemas.md b/couchdb-schemas.md
deleted file mode 100644
index 57c170d..0000000
--- a/couchdb-schemas.md
+++ /dev/null
@@ -1,207 +0,0 @@
-# CouchDB Document Schemas
-
-This document defines the CouchDB document schemas used by mail2couch. These schemas must be maintained consistently across all implementations (Go, Rust, etc.).
-
-## Mail Document Schema
-
-**Document Type**: `mail`
-**Document ID Format**: `{mailbox}_{uid}` (e.g., `INBOX_123`)
-**Purpose**: Stores individual email messages with metadata and content
-
-```json
-{
-  "_id": "INBOX_123",
-  "_rev": "1-abc123...",
-  "_attachments": {
-    "attachment1.pdf": {
-      "content_type": "application/pdf",
-      "length": 12345,
-      "stub": true
-    }
-  },
-  "sourceUid": "123",
-  "mailbox": "INBOX",
-  "from": ["sender@example.com"],
-  "to": ["recipient@example.com"],
-  "subject": "Email Subject",
-  "date": "2025-08-02T12:16:10Z",
-  "body": "Email body content",
-  "headers": {
-    "Content-Type": ["text/plain; charset=utf-8"],
-    "Message-ID": [""],
-    "Date": ["Sat, 02 Aug 2025 14:16:10 +0200"]
-  },
-  "storedAt": "2025-08-02T14:16:22.375241322+02:00",
-  "docType": "mail",
-  "hasAttachments": true
-}
-```
-
-### Field Definitions
-
-| Field | Type | Required | Description |
-|-------|------|----------|-------------|
-| `_id` | string | Yes | CouchDB document ID: `{mailbox}_{uid}` |
-| `_rev` | string | Auto | CouchDB revision (managed by CouchDB) |
-| `_attachments` | object | No | CouchDB native attachments (email attachments) |
-| `sourceUid` | string | Yes | Original IMAP UID from mail server |
-| `mailbox` | string | Yes | Source mailbox name (e.g., "INBOX", "Sent") |
-| `from` | array[string] | Yes | Sender email addresses |
-| `to` | array[string] | Yes | Recipient email addresses |
-| `subject` | string | Yes | Email subject line |
-| `date` | string (ISO8601) | Yes | Email date from headers |
-| `body` | string | Yes | Email body content (plain text) |
-| `headers` | object | Yes | All email headers as key-value pairs |
-| `storedAt` | string (ISO8601) | Yes | When document was stored in CouchDB |
-| `docType` | string | Yes | Always "mail" for email documents |
-| `hasAttachments` | boolean | Yes | Whether email has attachments |
-
-### Attachment Stub Schema
-
-When emails have attachments, they are stored as CouchDB native attachments:
-
-```json
-{
-  "filename.ext": {
-    "content_type": "mime/type",
-    "length": 12345,
-    "stub": true
-  }
-}
-```
-
-| Field | Type | Required | Description |
-|-------|------|----------|-------------|
-| `content_type` | string | Yes | MIME type of attachment |
-| `length` | integer | No | Size in bytes |
-| `stub` | boolean | No | Indicates attachment is stored separately |
-
-## Sync Metadata Document Schema
-
-**Document Type**: `sync_metadata`
-**Document ID Format**: `sync_metadata_{mailbox}` (e.g., `sync_metadata_INBOX`)
-**Purpose**: Tracks synchronization state for incremental syncing
-
-```json
-{
-  "_id": "sync_metadata_INBOX",
-  "_rev": "1-def456...",
-  "docType": "sync_metadata",
-  "mailbox": "INBOX",
-  "lastSyncTime": "2025-08-02T14:26:08.281094+02:00",
-  "lastMessageUID": 15,
-  "messageCount": 18,
-  "updatedAt": "2025-08-02T14:26:08.281094+02:00"
-}
-```
-
-### Field Definitions
-
-| Field | Type | Required | Description |
-|-------|------|----------|-------------|
-| `_id` | string | Yes | CouchDB document ID: `sync_metadata_{mailbox}` |
-| `_rev` | string | Auto | CouchDB revision (managed by CouchDB) |
-| `docType` | string | Yes | Always "sync_metadata" for sync documents |
-| `mailbox` | string | Yes | Mailbox name this metadata applies to |
-| `lastSyncTime` | string (ISO8601) | Yes | When this mailbox was last synced |
-| `lastMessageUID` | integer | Yes | Highest IMAP UID processed in last sync |
-| `messageCount` | integer | Yes | Number of messages processed in last sync |
-| `updatedAt` | string (ISO8601) | Yes | When this metadata was last updated |
-
-## Database Naming Convention
-
-**Format**: `m2c_{account_name}`
-**Rules**:
-- Prefix all databases with `m2c_`
-- Convert account names to lowercase
-- Replace invalid characters with underscores
-- Ensure database name starts with a letter
-- If account name starts with non-letter, prefix with `mail_`
-
-**Examples**:
-- Account "Personal Gmail" → Database `m2c_personal_gmail`
-- Account "123work" → Database `m2c_mail_123work`
-- Email "user@example.com" → Database `m2c_user_example_com`
-
-## Document ID Conventions
-
-### Mail Documents
-- **Format**: `{mailbox}_{uid}`
-- **Examples**: `INBOX_123`, `Sent_456`, `Work/Projects_789`
-- **Uniqueness**: Combination of mailbox and IMAP UID ensures uniqueness
-
-### Sync Metadata Documents
-- **Format**: `sync_metadata_{mailbox}`
-- **Examples**: `sync_metadata_INBOX`, `sync_metadata_Sent`
-- **Purpose**: One metadata document per mailbox for tracking sync state
-
-## Data Type Mappings
-
-### Go to JSON
-| Go Type | JSON Type | Example |
-|---------|-----------|---------|
-| `string` | string | `"text"` |
-| `[]string` | array | `["item1", "item2"]` |
-| `map[string][]string` | object | `{"key": ["value1", "value2"]}` |
-| `time.Time` | string (ISO8601) | `"2025-08-02T14:26:08.281094+02:00"` |
-| `uint32` | number | `123` |
-| `int` | number | `456` |
-| `bool` | boolean | `true` |
-
-### Rust Considerations
-When implementing in Rust, ensure:
-- Use `chrono::DateTime` for timestamps with ISO8601 serialization
-- Use `Vec<String>` for string arrays
-- Use `HashMap<String, Vec<String>>` for headers
-- Use `serde` with `#[serde(rename = "fieldName")]` for JSON field mapping
-- Handle optional fields with `Option<T>`
-
-## Validation Rules
-
-### Required Fields
-All documents must include:
-- `_id`: Valid CouchDB document ID
-- `docType`: Identifies document type for filtering
-- `mailbox`: Source mailbox name (for mail documents)
-
-### Data Constraints
-- Email addresses: No validation enforced (preserve as-is from IMAP)
-- Dates: Must be valid ISO8601 format
-- UIDs: Must be positive integers
-- Document IDs: Must be valid CouchDB IDs (no spaces, special chars)
-
-### Attachment Handling
-- Store email attachments as CouchDB native attachments
-- Preserve original filenames and MIME types
-- Use attachment stubs in document metadata
-- Support binary content through CouchDB attachment API
-
-## Backward Compatibility
-
-When modifying schemas:
-1. Add new fields as optional
-2. Never remove existing fields
-3. Maintain existing field types and formats
-4. Document any breaking changes clearly
-5. Provide migration guidance for existing data
-
-## Implementation Notes
-
-### CouchDB Features Used
-- **Native Attachments**: For email attachments
-- **Document IDs**: Predictable format for easy access
-- **Bulk Operations**: For efficient storage
-- **Conflict Resolution**: CouchDB handles revision conflicts
-
-### Performance Considerations
-- Index by `docType` for efficient filtering
-- Index by `mailbox` for folder-based queries
-- Index by `date` for chronological access
-- Use bulk insert operations for multiple messages
-
-### Future Extensions
-This schema supports future enhancements:
-- **Webmail Views**: CouchDB design documents for HTML interface
-- **Search Indexes**: Full-text search with CouchDB-Lucene
-- **Replication**: Multi-database sync scenarios
-- **Analytics**: Message statistics and reporting
\ No newline at end of file
diff --git a/examples/sample-mail-document.json b/examples/sample-mail-document.json
deleted file mode 100644
index 231981e..0000000
--- a/examples/sample-mail-document.json
+++ /dev/null
@@ -1,42 +0,0 @@
-{
-  "_id": "INBOX_123",
-  "_rev": "1-abc123def456789",
-  "_attachments": {
-    "report.pdf": {
-      "content_type": "application/pdf",
-      "length": 245760,
-      "stub": true
-    },
-    "image.png": {
-      "content_type": "image/png",
-      "length": 12345,
-      "stub": true
-    }
-  },
-  "sourceUid": "123",
-  "mailbox": "INBOX",
-  "from": ["sender@example.com", "alias@example.com"],
-  "to": ["recipient@company.com", "cc@company.com"],
-  "subject": "Monthly Report - Q3 2025",
-  "date": "2025-08-02T12:16:10Z",
-  "body": "Please find the attached monthly report for Q3 2025.\n\nBest regards,\nSender Name",
-  "headers": {
-    "Content-Type": ["multipart/mixed; boundary=\"----=_Part_123456\""],
-    "Content-Transfer-Encoding": ["7bit"],
-    "Date": ["Sat, 02 Aug 2025 14:16:10 +0200"],
-    "From": ["sender@example.com"],
-    "To": ["recipient@company.com"],
-    "Cc": ["cc@company.com"],
-    "Subject": ["Monthly Report - Q3 2025"],
-    "Message-ID": [""],
-    "MIME-Version": ["1.0"],
-    "X-Mailer": ["Mail Client 1.0"],
-    "Return-Path": [""],
-    "Received": [
-      "from smtp.example.com (smtp.example.com [192.168.1.100]) by mx.company.com (Postfix) with ESMTP id ABC123; Sat, 02 Aug 2025 14:16:10 +0200"
-    ]
-  },
-  "storedAt": "2025-08-02T14:16:22.375241322+02:00",
-  "docType": "mail",
-  "hasAttachments": true
-}
\ No newline at end of file
diff --git a/examples/sample-sync-metadata.json b/examples/sample-sync-metadata.json
deleted file mode 100644
index 2aeeb91..0000000
--- a/examples/sample-sync-metadata.json
+++ /dev/null
@@ -1,10 +0,0 @@
-{
-  "_id": "sync_metadata_INBOX",
-  "_rev": "2-def456abc789123",
-  "docType": "sync_metadata",
-  "mailbox": "INBOX",
-  "lastSyncTime": "2025-08-02T14:26:08.281094+02:00",
-  "lastMessageUID": 123,
-  "messageCount": 45,
-  "updatedAt": "2025-08-02T14:26:08.281094+02:00"
-}
\ No newline at end of file
diff --git a/examples/simple-mail-document.json b/examples/simple-mail-document.json
deleted file mode 100644
index 305ba61..0000000
--- a/examples/simple-mail-document.json
+++ /dev/null
@@ -1,24 +0,0 @@
-{
-  "_id": "Sent_456",
-  "_rev": "1-xyz789abc123def",
-  "sourceUid": "456",
-  "mailbox": "Sent",
-  "from": ["user@company.com"],
-  "to": ["client@external.com"],
-  "subject": "Meeting Follow-up",
-  "date": "2025-08-02T10:30:00Z",
-  "body": "Thank you for the productive meeting today. As discussed, I'll send the proposal by end of week.\n\nBest regards,\nUser Name",
-  "headers": {
-    "Content-Type": ["text/plain; charset=utf-8"],
-    "Content-Transfer-Encoding": ["7bit"],
-    "Date": ["Sat, 02 Aug 2025 12:30:00 +0200"],
-    "From": ["user@company.com"],
-    "To": ["client@external.com"],
-    "Subject": ["Meeting Follow-up"],
-    "Message-ID": [""],
-    "MIME-Version": ["1.0"]
-  },
-  "storedAt": "2025-08-02T12:30:45.123456789+02:00",
-  "docType": "mail",
-  "hasAttachments": false
-}
\ No newline at end of file
diff --git a/test-config-comparison.md b/test-config-comparison.md
deleted file mode 100644
index 90ae448..0000000
--- a/test-config-comparison.md
+++ /dev/null
@@ -1,154 +0,0 @@
-# Test Configuration Comparison: Rust vs Go
-
-## Overview
-
-Two identical test configurations have been created for testing both Rust and Go implementations with the test environment:
-
-- **Rust**: `/home/olemd/src/mail2couch/rust/config-test-rust.json`
-- **Go**: `/home/olemd/src/mail2couch/go/config-test-go.json`
-
-## Configuration Details
-
-Both configurations use the **same test environment** from `/home/olemd/src/mail2couch/test/` with:
-
-### Database Connection
-- **CouchDB URL**: `http://localhost:5984`
-- **Admin Credentials**: `admin` / `password`
-
-### IMAP Test Server
-- **Host**: `localhost`
-- **Port**: `3143` (GreenMail test server)
-- **Connection**: Plain (no TLS for testing)
-
-### Test Accounts
-
-Both configurations use the **same IMAP test accounts**:
-
-| Username | Password | Purpose |
-|----------|----------|---------|
-| `testuser1` | `password123` | Wildcard all folders test |
-| `syncuser` | `syncpass` | Work pattern test (sync mode) |
-| `archiveuser` | `archivepass` | Specific folders test |
-| `testuser2` | `password456` | Subfolder pattern test (disabled) |
-
-### Mail Sources Configuration
-
-Both configurations define **identical mail sources** with only the account names differing:
-
-#### 1. Wildcard All Folders Test
-- **Account Name**: "**Rust** Wildcard All Folders Test" vs "**Go** Wildcard All Folders Test"
-- **Mode**: `archive`
-- **Folders**: All folders (`*`) except `Drafts` and `Trash`
-- **Filters**: Subject keywords: `["meeting", "important"]`, Sender keywords: `["@company.com"]`
-
-#### 2. Work Pattern Test
-- **Account Name**: "**Rust** Work Pattern Test" vs "**Go** Work Pattern Test"
-- **Mode**: `sync` (delete removed emails)
-- **Folders**: `Work*`, `Important*`, `INBOX` (exclude `*Temp*`)
-- **Filters**: Recipient keywords: `["support@", "team@"]`
-
-#### 3. Specific Folders Only
-- **Account Name**: "**Rust** Specific Folders Only" vs "**Go** Specific Folders Only"
-- **Mode**: `archive`
-- **Folders**: Exactly `INBOX`, `Sent`, `Personal`
-- **Filters**: None
-
-#### 4. Subfolder Pattern Test (Disabled)
-- **Account Name**: "**Rust** Subfolder Pattern Test" vs "**Go** Subfolder Pattern Test"
-- **Mode**: `archive`
-- **Folders**: `Work/*`, `Archive/*` (exclude `*/Drafts`)
-- **Status**: `enabled: false`
-
-## Expected Database Names
-
-When run, each implementation will create **different databases** due to the account name differences:
-
-### Rust Implementation Databases
-- `m2c_rust_wildcard_all_folders_test`
-- `m2c_rust_work_pattern_test`
-- `m2c_rust_specific_folders_only`
-- `m2c_rust_subfolder_pattern_test` (disabled)
-
-### Go Implementation Databases
-- `m2c_go_wildcard_all_folders_test`
-- `m2c_go_work_pattern_test`
-- `m2c_go_specific_folders_only`
-- `m2c_go_subfolder_pattern_test` (disabled)
-
-## Testing Commands
-
-### Start Test Environment
-```bash
-cd /home/olemd/src/mail2couch/test
-./start-test-env.sh
-```
-
-### Run Rust Implementation
-```bash
-cd /home/olemd/src/mail2couch/rust
-cargo build --release
-./target/release/mail2couch -c config-test-rust.json
-```
-
-### Run Go Implementation
-```bash
-cd /home/olemd/src/mail2couch/go
-go build -o mail2couch .
-./mail2couch -c config-test-go.json
-```
-
-### Verify Results
-```bash
-# List all databases
-curl http://localhost:5984/_all_dbs
-
-# Check Rust databases
-curl http://localhost:5984/m2c_rust_wildcard_all_folders_test
-curl http://localhost:5984/m2c_rust_work_pattern_test
-curl http://localhost:5984/m2c_rust_specific_folders_only
-
-# Check Go databases
-curl http://localhost:5984/m2c_go_wildcard_all_folders_test
-curl http://localhost:5984/m2c_go_work_pattern_test
-curl http://localhost:5984/m2c_go_specific_folders_only
-```
-
-### Stop Test Environment
-```bash
-cd /home/olemd/src/mail2couch/test
-./stop-test-env.sh
-```
-
-## Validation Points
-
-Both implementations should produce **identical results** when processing the same IMAP accounts:
-
-1. **Database Structure**: Same document schemas and field names
-2. **Message Processing**: Same email parsing and storage logic
-3. **Folder Filtering**: Same wildcard pattern matching
-4. **Message Filtering**: Same keyword filtering behavior
-5. **Sync Behavior**: Same incremental sync and deletion handling
-6. **Error Handling**: Same retry logic and error recovery
-
-The only differences should be:
-- Database names (due to account name prefixes)
-- Timestamp precision (implementation-specific)
-- Internal document IDs format (if any)
-
-## Use Cases
-
-### Feature Parity Testing
-Run both implementations with the same configuration to verify identical behavior:
-```bash
-# Run both implementations
-./test-both-implementations.sh
-
-# Compare database contents
-./compare-database-results.sh
-```
-
-### Performance Comparison
-Use identical configurations to benchmark performance differences between Rust and Go implementations.
-
-### Development Testing
-Use separate configurations during development to avoid database conflicts when testing both implementations simultaneously.
\ No newline at end of file
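The expected database names in the removed `test-config-comparison.md` follow the naming rules in the removed `couchdb-schemas.md`. A small model of those rules may help when checking either implementation's output. What follows is a minimal Go sketch, not code from either codebase. The helper name `databaseName` is hypothetical. The sketch also assumes that "invalid characters" means anything outside `a-z`, `0-9` and `_`, which the schema document does not spell out; CouchDB itself also accepts `$()+-/` after the first character of a database name.

```go
package main

import (
	"fmt"
	"strings"
)

// databaseName is a hypothetical helper modelling the m2c_ naming rules:
// lowercase the account name, replace invalid characters with '_', and add a
// "mail_" prefix when the result does not start with a letter.
// Assumption: only a-z, 0-9 and '_' count as valid characters here.
func databaseName(account string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(account) {
		if (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9') || r == '_' {
			b.WriteRune(r)
		} else {
			b.WriteRune('_') // spaces, '@', '.', non-ASCII letters, ...
		}
	}
	name := b.String()
	// Account names that do not begin with a letter get the "mail_" prefix.
	if name == "" || name[0] < 'a' || name[0] > 'z' {
		name = "mail_" + name
	}
	return "m2c_" + name
}

func main() {
	accounts := []string{
		"Personal Gmail",
		"123work",
		"user@example.com",
		"Go Wildcard All Folders Test",
	}
	for _, a := range accounts {
		fmt.Printf("%q -> %s\n", a, databaseName(a))
	}
}
```

The four inputs map to `m2c_personal_gmail`, `m2c_mail_123work`, `m2c_user_example_com` and `m2c_go_wildcard_all_folders_test`. These match the schema examples and the expected Go database name in the test comparison. The sketch does not handle collisions: for example, `a.b` and `a b` both become `m2c_a_b`.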