diff --git a/.gitignore b/.gitignore index c3d683d..07ab0c3 100644 --- a/.gitignore +++ b/.gitignore @@ -47,3 +47,4 @@ go.work.sum # env file .env +__pycache__ diff --git a/CLAUDE.md b/CLAUDE.md index fce9b14..4a2cb2b 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -57,7 +57,7 @@ cd go && go mod tidy 2. **Mail Handling (`mail/`)**: IMAP client implementation - Uses `github.com/emersion/go-imap/v2` for IMAP operations - Supports TLS connections - - Currently only lists mailboxes (backup functionality not yet implemented) + - Fetches and processes email messages from IMAP mailboxes 3. **CouchDB Integration (`couch/`)**: Database operations - Uses `github.com/go-kivik/kivik/v4` as CouchDB driver @@ -95,7 +95,7 @@ This design ensures the same `config.json` format will work for both Go and Rust - ✅ Per-account CouchDB database creation and management - ✅ IMAP connection and mailbox listing - ✅ Build error fixes -- ✅ Email message retrieval framework (with placeholder data) +- ✅ Real IMAP message retrieval and parsing - ✅ Email storage to CouchDB framework with native attachments - ✅ Folder filtering logic with wildcard support (`*`, `?`, `[abc]` patterns) - ✅ Date filtering support @@ -103,7 +103,6 @@ This design ensures the same `config.json` format will work for both Go and Rust - ✅ Duplicate detection and prevention - ✅ Sync vs Archive mode implementation - ✅ CouchDB attachment storage for email attachments -- ✅ Real IMAP message parsing with go-message library - ✅ Full message body and attachment handling with MIME multipart support - ✅ Command line argument support (--max-messages flag) - ✅ Per-account CouchDB databases for better organization @@ -143,6 +142,7 @@ Sync metadata documents are stored in CouchDB with ID format: `sync_metadata_{ma - Comprehensive test environment with Podman containers and automated test scripts - The application uses automatic config file discovery as documented above + ### Next Steps The following enhancements could further improve the 
implementation: @@ -158,4 +158,5 @@ The following enhancements could further improve the implementation: ## Development Guidelines ### Code Quality and Standards -- All code requires perfect linting and tool-formatting, exceptions are allowed only if documented properly \ No newline at end of file +- All code requires perfect linting and tool-formatting, exceptions are allowed only if documented properly +- We always want linting and formatting of our code to be perfect \ No newline at end of file diff --git a/README.md b/README.md index f81e050..7c0fc5e 100644 --- a/README.md +++ b/README.md @@ -27,6 +27,16 @@ A powerful email backup utility that synchronizes mail from IMAP accounts to Cou - **Complete Headers**: Preserve all email headers and metadata - **UTF-8 Support**: Handle international characters and special content +### HTML Webmail Interface +- **Beautiful Web Interface**: Modern, responsive HTML presentations for viewing archived emails +- **Gmail-like Design**: Professional, mobile-friendly interface with clean typography +- **Message Lists**: Dynamic HTML lists with sorting, filtering, and folder organization +- **Individual Messages**: Rich HTML display with proper formatting, URL linking, and collapsible headers +- **Attachment Support**: Direct download links with file type and size information +- **Search Integration**: Full-text subject search with keyword highlighting +- **Folder Analytics**: Message count summaries and folder-based navigation +- **Mobile Responsive**: Optimized for desktop, tablet, and mobile viewing + ### Operational Features - **Automatic Config Discovery**: Finds configuration files in standard locations - **Command Line Control**: Override settings with `--max-messages` and `--config` flags @@ -164,6 +174,7 @@ cd test - **Message Documents**: Each email becomes a CouchDB document with metadata - **Native Attachments**: Email attachments stored as CouchDB attachments (compressed) - **Sync Metadata**: Tracks incremental sync 
state per mailbox +- **HTML Webmail Views**: CouchDB design documents with show/list functions for web interface ### Document Structure ```json @@ -189,6 +200,28 @@ cd test } ``` +### Accessing Stored Emails + +Once mail2couch has synced your emails, you can access them through CouchDB's REST API: + +#### Raw Data Access +```bash +# List all databases +http://localhost:5984/_all_dbs + +# View database info +http://localhost:5984/{database} + +# List all documents in database +http://localhost:5984/{database}/_all_docs + +# Get individual message +http://localhost:5984/{database}/{message_id} + +# Get message with attachments +http://localhost:5984/{database}/{message_id}/{attachment_name} +``` + ## Example Configurations ### Simple Configuration @@ -380,6 +413,30 @@ Complex setup with multiple accounts, filtering, and different sync modes: For detailed troubleshooting, see the [test environment documentation](test/README.md). +## Future Plans + +### CouchDB-Hosted Webmail Viewer + +We plan to develop a comprehensive webmail interface for viewing the archived emails directly through CouchDB. 
This will include: + +- **📧 Modern Web Interface**: A responsive, Gmail-style webmail viewer built on CouchDB design documents +- **🔍 Advanced Search**: Full-text search across subjects, senders, and message content +- **📁 Folder Organization**: Browse messages by mailbox with visual indicators and statistics +- **📎 Attachment Viewer**: Direct download and preview of email attachments +- **📱 Mobile Support**: Optimized interface for tablets and smartphones +- **🎨 Customizable Themes**: Multiple UI themes and layout options +- **⚡ Real-time Updates**: Live synchronization as new emails are archived +- **🔐 Authentication**: Secure access controls and user management +- **📊 Analytics Dashboard**: Email statistics and storage insights + +This webmail viewer will be implemented as: +- **CouchDB Design Documents**: Views, shows, and list functions for data access +- **Self-contained HTML/CSS/JS**: No external dependencies or servers required +- **RESTful Architecture**: Clean API endpoints for integration with other tools +- **Progressive Enhancement**: Works with JavaScript disabled for basic functionality + +The webmail interface will be a separate component that can be optionally installed alongside the core mail2couch storage functionality, maintaining the clean separation between data archival and presentation layers. + ## Contributing This project welcomes contributions! Please see [CLAUDE.md](CLAUDE.md) for development setup and architecture details. diff --git a/couchdb-schemas.md b/couchdb-schemas.md new file mode 100644 index 0000000..57c170d --- /dev/null +++ b/couchdb-schemas.md @@ -0,0 +1,207 @@ +# CouchDB Document Schemas + +This document defines the CouchDB document schemas used by mail2couch. These schemas must be maintained consistently across all implementations (Go, Rust, etc.). 
+ +## Mail Document Schema + +**Document Type**: `mail` +**Document ID Format**: `{mailbox}_{uid}` (e.g., `INBOX_123`) +**Purpose**: Stores individual email messages with metadata and content + +```json +{ + "_id": "INBOX_123", + "_rev": "1-abc123...", + "_attachments": { + "attachment1.pdf": { + "content_type": "application/pdf", + "length": 12345, + "stub": true + } + }, + "sourceUid": "123", + "mailbox": "INBOX", + "from": ["sender@example.com"], + "to": ["recipient@example.com"], + "subject": "Email Subject", + "date": "2025-08-02T12:16:10Z", + "body": "Email body content", + "headers": { + "Content-Type": ["text/plain; charset=utf-8"], + "Message-ID": [""], + "Date": ["Sat, 02 Aug 2025 14:16:10 +0200"] + }, + "storedAt": "2025-08-02T14:16:22.375241322+02:00", + "docType": "mail", + "hasAttachments": true +} +``` + +### Field Definitions + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `_id` | string | Yes | CouchDB document ID: `{mailbox}_{uid}` | +| `_rev` | string | Auto | CouchDB revision (managed by CouchDB) | +| `_attachments` | object | No | CouchDB native attachments (email attachments) | +| `sourceUid` | string | Yes | Original IMAP UID from mail server | +| `mailbox` | string | Yes | Source mailbox name (e.g., "INBOX", "Sent") | +| `from` | array[string] | Yes | Sender email addresses | +| `to` | array[string] | Yes | Recipient email addresses | +| `subject` | string | Yes | Email subject line | +| `date` | string (ISO8601) | Yes | Email date from headers | +| `body` | string | Yes | Email body content (plain text) | +| `headers` | object | Yes | All email headers as key-value pairs | +| `storedAt` | string (ISO8601) | Yes | When document was stored in CouchDB | +| `docType` | string | Yes | Always "mail" for email documents | +| `hasAttachments` | boolean | Yes | Whether email has attachments | + +### Attachment Stub Schema + +When emails have attachments, they are stored as CouchDB native attachments: + 
+```json +{ + "filename.ext": { + "content_type": "mime/type", + "length": 12345, + "stub": true + } +} +``` + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `content_type` | string | Yes | MIME type of attachment | +| `length` | integer | No | Size in bytes | +| `stub` | boolean | No | Indicates attachment is stored separately | + +## Sync Metadata Document Schema + +**Document Type**: `sync_metadata` +**Document ID Format**: `sync_metadata_{mailbox}` (e.g., `sync_metadata_INBOX`) +**Purpose**: Tracks synchronization state for incremental syncing + +```json +{ + "_id": "sync_metadata_INBOX", + "_rev": "1-def456...", + "docType": "sync_metadata", + "mailbox": "INBOX", + "lastSyncTime": "2025-08-02T14:26:08.281094+02:00", + "lastMessageUID": 15, + "messageCount": 18, + "updatedAt": "2025-08-02T14:26:08.281094+02:00" +} +``` + +### Field Definitions + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `_id` | string | Yes | CouchDB document ID: `sync_metadata_{mailbox}` | +| `_rev` | string | Auto | CouchDB revision (managed by CouchDB) | +| `docType` | string | Yes | Always "sync_metadata" for sync documents | +| `mailbox` | string | Yes | Mailbox name this metadata applies to | +| `lastSyncTime` | string (ISO8601) | Yes | When this mailbox was last synced | +| `lastMessageUID` | integer | Yes | Highest IMAP UID processed in last sync | +| `messageCount` | integer | Yes | Number of messages processed in last sync | +| `updatedAt` | string (ISO8601) | Yes | When this metadata was last updated | + +## Database Naming Convention + +**Format**: `m2c_{account_name}` +**Rules**: +- Prefix all databases with `m2c_` +- Convert account names to lowercase +- Replace invalid characters with underscores +- Ensure database name starts with a letter +- If account name starts with non-letter, prefix with `mail_` + +**Examples**: +- Account "Personal Gmail" → Database `m2c_personal_gmail` +- Account 
"123work" → Database `m2c_mail_123work` +- Email "user@example.com" → Database `m2c_user_example_com` + +## Document ID Conventions + +### Mail Documents +- **Format**: `{mailbox}_{uid}` +- **Examples**: `INBOX_123`, `Sent_456`, `Work/Projects_789` +- **Uniqueness**: Combination of mailbox and IMAP UID ensures uniqueness + +### Sync Metadata Documents +- **Format**: `sync_metadata_{mailbox}` +- **Examples**: `sync_metadata_INBOX`, `sync_metadata_Sent` +- **Purpose**: One metadata document per mailbox for tracking sync state + +## Data Type Mappings + +### Go to JSON +| Go Type | JSON Type | Example | +|---------|-----------|---------| +| `string` | string | `"text"` | +| `[]string` | array | `["item1", "item2"]` | +| `map[string][]string` | object | `{"key": ["value1", "value2"]}` | +| `time.Time` | string (ISO8601) | `"2025-08-02T14:26:08.281094+02:00"` | +| `uint32` | number | `123` | +| `int` | number | `456` | +| `bool` | boolean | `true` | + +### Rust Considerations +When implementing in Rust, ensure: +- Use `chrono::DateTime` for timestamps with ISO8601 serialization +- Use `Vec` for string arrays +- Use `HashMap>` for headers +- Use `serde` with `#[serde(rename = "fieldName")]` for JSON field mapping +- Handle optional fields with `Option` + +## Validation Rules + +### Required Fields +All documents must include: +- `_id`: Valid CouchDB document ID +- `docType`: Identifies document type for filtering +- `mailbox`: Source mailbox name (for mail documents) + +### Data Constraints +- Email addresses: No validation enforced (preserve as-is from IMAP) +- Dates: Must be valid ISO8601 format +- UIDs: Must be positive integers +- Document IDs: Must be valid CouchDB IDs (no spaces, special chars) + +### Attachment Handling +- Store email attachments as CouchDB native attachments +- Preserve original filenames and MIME types +- Use attachment stubs in document metadata +- Support binary content through CouchDB attachment API + +## Backward Compatibility + +When 
modifying schemas: +1. Add new fields as optional +2. Never remove existing fields +3. Maintain existing field types and formats +4. Document any breaking changes clearly +5. Provide migration guidance for existing data + +## Implementation Notes + +### CouchDB Features Used +- **Native Attachments**: For email attachments +- **Document IDs**: Predictable format for easy access +- **Bulk Operations**: For efficient storage +- **Conflict Resolution**: CouchDB handles revision conflicts + +### Performance Considerations +- Index by `docType` for efficient filtering +- Index by `mailbox` for folder-based queries +- Index by `date` for chronological access +- Use bulk insert operations for multiple messages + +### Future Extensions +This schema supports future enhancements: +- **Webmail Views**: CouchDB design documents for HTML interface +- **Search Indexes**: Full-text search with CouchDB-Lucene +- **Replication**: Multi-database sync scenarios +- **Analytics**: Message statistics and reporting \ No newline at end of file diff --git a/examples/sample-mail-document.json b/examples/sample-mail-document.json new file mode 100644 index 0000000..231981e --- /dev/null +++ b/examples/sample-mail-document.json @@ -0,0 +1,42 @@ +{ + "_id": "INBOX_123", + "_rev": "1-abc123def456789", + "_attachments": { + "report.pdf": { + "content_type": "application/pdf", + "length": 245760, + "stub": true + }, + "image.png": { + "content_type": "image/png", + "length": 12345, + "stub": true + } + }, + "sourceUid": "123", + "mailbox": "INBOX", + "from": ["sender@example.com", "alias@example.com"], + "to": ["recipient@company.com", "cc@company.com"], + "subject": "Monthly Report - Q3 2025", + "date": "2025-08-02T12:16:10Z", + "body": "Please find the attached monthly report for Q3 2025.\n\nBest regards,\nSender Name", + "headers": { + "Content-Type": ["multipart/mixed; boundary=\"----=_Part_123456\""], + "Content-Transfer-Encoding": ["7bit"], + "Date": ["Sat, 02 Aug 2025 14:16:10 +0200"], + 
"From": ["sender@example.com"], + "To": ["recipient@company.com"], + "Cc": ["cc@company.com"], + "Subject": ["Monthly Report - Q3 2025"], + "Message-ID": [""], + "MIME-Version": ["1.0"], + "X-Mailer": ["Mail Client 1.0"], + "Return-Path": [""], + "Received": [ + "from smtp.example.com (smtp.example.com [192.168.1.100]) by mx.company.com (Postfix) with ESMTP id ABC123; Sat, 02 Aug 2025 14:16:10 +0200" + ] + }, + "storedAt": "2025-08-02T14:16:22.375241322+02:00", + "docType": "mail", + "hasAttachments": true +} \ No newline at end of file diff --git a/examples/sample-sync-metadata.json b/examples/sample-sync-metadata.json new file mode 100644 index 0000000..2aeeb91 --- /dev/null +++ b/examples/sample-sync-metadata.json @@ -0,0 +1,10 @@ +{ + "_id": "sync_metadata_INBOX", + "_rev": "2-def456abc789123", + "docType": "sync_metadata", + "mailbox": "INBOX", + "lastSyncTime": "2025-08-02T14:26:08.281094+02:00", + "lastMessageUID": 123, + "messageCount": 45, + "updatedAt": "2025-08-02T14:26:08.281094+02:00" +} \ No newline at end of file diff --git a/examples/simple-mail-document.json b/examples/simple-mail-document.json new file mode 100644 index 0000000..305ba61 --- /dev/null +++ b/examples/simple-mail-document.json @@ -0,0 +1,24 @@ +{ + "_id": "Sent_456", + "_rev": "1-xyz789abc123def", + "sourceUid": "456", + "mailbox": "Sent", + "from": ["user@company.com"], + "to": ["client@external.com"], + "subject": "Meeting Follow-up", + "date": "2025-08-02T10:30:00Z", + "body": "Thank you for the productive meeting today. 
As discussed, I'll send the proposal by end of week.\n\nBest regards,\nUser Name", + "headers": { + "Content-Type": ["text/plain; charset=utf-8"], + "Content-Transfer-Encoding": ["7bit"], + "Date": ["Sat, 02 Aug 2025 12:30:00 +0200"], + "From": ["user@company.com"], + "To": ["client@external.com"], + "Subject": ["Meeting Follow-up"], + "Message-ID": [""], + "MIME-Version": ["1.0"] + }, + "storedAt": "2025-08-02T12:30:45.123456789+02:00", + "docType": "mail", + "hasAttachments": false +} \ No newline at end of file diff --git a/go/config/config.go b/go/config/config.go index 5581094..dc1808e 100644 --- a/go/config/config.go +++ b/go/config/config.go @@ -40,7 +40,7 @@ type FolderFilter struct { type MessageFilter struct { Since string `json:"since,omitempty"` SubjectKeywords []string `json:"subjectKeywords,omitempty"` // Filter by keywords in subject - SenderKeywords []string `json:"senderKeywords,omitempty"` // Filter by keywords in sender addresses + SenderKeywords []string `json:"senderKeywords,omitempty"` // Filter by keywords in sender addresses RecipientKeywords []string `json:"recipientKeywords,omitempty"` // Filter by keywords in recipient addresses } diff --git a/go/couch/couch.go b/go/couch/couch.go index 7a3c5ab..c75c3b6 100644 --- a/go/couch/couch.go +++ b/go/couch/couch.go @@ -22,20 +22,20 @@ type Client struct { // MailDocument represents an email message stored in CouchDB type MailDocument struct { - ID string `json:"_id,omitempty"` - Rev string `json:"_rev,omitempty"` - Attachments map[string]AttachmentStub `json:"_attachments,omitempty"` // CouchDB attachments - SourceUID string `json:"sourceUid"` // Unique ID from the mail source (e.g., IMAP UID) - Mailbox string `json:"mailbox"` // Source mailbox name - From []string `json:"from"` - To []string `json:"to"` - Subject string `json:"subject"` - Date time.Time `json:"date"` - Body string `json:"body"` - Headers map[string][]string `json:"headers"` - StoredAt time.Time `json:"storedAt"` // When the 
document was stored - DocType string `json:"docType"` // Always "mail" - HasAttachments bool `json:"hasAttachments"` // Indicates if message has attachments + ID string `json:"_id,omitempty"` + Rev string `json:"_rev,omitempty"` + Attachments map[string]AttachmentStub `json:"_attachments,omitempty"` // CouchDB attachments + SourceUID string `json:"sourceUid"` // Unique ID from the mail source (e.g., IMAP UID) + Mailbox string `json:"mailbox"` // Source mailbox name + From []string `json:"from"` + To []string `json:"to"` + Subject string `json:"subject"` + Date time.Time `json:"date"` + Body string `json:"body"` + Headers map[string][]string `json:"headers"` + StoredAt time.Time `json:"storedAt"` // When the document was stored + DocType string `json:"docType"` // Always "mail" + HasAttachments bool `json:"hasAttachments"` // Indicates if message has attachments } // AttachmentStub represents metadata for a CouchDB attachment @@ -94,19 +94,19 @@ func GenerateAccountDBName(accountName, userEmail string) string { if name == "" { name = userEmail } - + // Convert to lowercase and replace invalid characters with underscores name = strings.ToLower(name) // CouchDB database names must match: ^[a-z][a-z0-9_$()+/-]*$ validName := regexp.MustCompile(`[^a-z0-9_$()+/-]`).ReplaceAllString(name, "_") - + // Ensure it starts with a letter and add m2c prefix if len(validName) > 0 && (validName[0] < 'a' || validName[0] > 'z') { validName = "m2c_mail_" + validName } else { validName = "m2c_" + validName } - + return validName } @@ -228,7 +228,7 @@ func (c *Client) GetAllMailDocumentIDs(ctx context.Context, dbName, mailbox stri // Create a view query to get all document IDs for the specified mailbox rows := db.AllDocs(ctx) - + docIDs := make(map[string]bool) for rows.Next() { docID, err := rows.ID() @@ -240,11 +240,11 @@ func (c *Client) GetAllMailDocumentIDs(ctx context.Context, dbName, mailbox stri docIDs[docID] = true } } - + if rows.Err() != nil { return nil, rows.Err() } - + 
return docIDs, nil } @@ -295,7 +295,7 @@ func (c *Client) SyncMailbox(ctx context.Context, dbName, mailbox string, curren if len(parts) < 2 { continue } - + uidStr := parts[len(parts)-1] uid := uint32(0) if _, err := fmt.Sscanf(uidStr, "%d", &uid); err != nil { diff --git a/go/go.mod b/go/go.mod index 377160a..1f6d85c 100644 --- a/go/go.mod +++ b/go/go.mod @@ -4,11 +4,11 @@ go 1.24.4 require ( github.com/emersion/go-imap/v2 v2.0.0-beta.5 + github.com/emersion/go-message v0.18.1 github.com/go-kivik/kivik/v4 v4.4.0 ) require ( - github.com/emersion/go-message v0.18.1 // indirect github.com/emersion/go-sasl v0.0.0-20231106173351-e73c9f7bad43 // indirect github.com/google/uuid v1.6.0 // indirect golang.org/x/net v0.25.0 // indirect diff --git a/go/mail/imap.go b/go/mail/imap.go index 63c7d7e..6ba4453 100644 --- a/go/mail/imap.go +++ b/go/mail/imap.go @@ -104,7 +104,7 @@ func (c *ImapClient) GetMessages(mailbox string, since *time.Time, maxMessages i // First, get all current UIDs in the mailbox for sync purposes allUIDsSet := imap.SeqSet{} allUIDsSet.AddRange(1, mbox.NumMessages) - + // Fetch UIDs for all messages to track current state uidCmd := c.Fetch(allUIDsSet, &imap.FetchOptions{UID: true}) for { @@ -112,12 +112,12 @@ func (c *ImapClient) GetMessages(mailbox string, since *time.Time, maxMessages i if msg == nil { break } - + data, err := msg.Collect() if err != nil { continue } - + if data.UID != 0 { currentUIDs[uint32(data.UID)] = true } @@ -126,13 +126,13 @@ func (c *ImapClient) GetMessages(mailbox string, since *time.Time, maxMessages i // Determine which messages to fetch based on since date var seqSet imap.SeqSet - + if since != nil { // Use IMAP SEARCH to find messages since the specified date searchCriteria := &imap.SearchCriteria{ Since: *since, } - + searchCmd := c.Search(searchCriteria, nil) searchResults, err := searchCmd.Wait() if err != nil { @@ -149,12 +149,12 @@ func (c *ImapClient) GetMessages(mailbox string, since *time.Time, maxMessages i if 
len(searchSeqNums) == 0 { return []*Message{}, currentUIDs, nil } - + // Limit results if maxMessages is specified if maxMessages > 0 && len(searchSeqNums) > maxMessages { searchSeqNums = searchSeqNums[len(searchSeqNums)-maxMessages:] } - + for _, seqNum := range searchSeqNums { seqSet.AddNum(seqNum) } @@ -165,11 +165,11 @@ func (c *ImapClient) GetMessages(mailbox string, since *time.Time, maxMessages i if maxMessages > 0 && int(numToFetch) > maxMessages { numToFetch = uint32(maxMessages) } - + if numToFetch == 0 { return []*Message{}, currentUIDs, nil } - + // Fetch the most recent messages seqSet.AddRange(mbox.NumMessages-numToFetch+1, mbox.NumMessages) } @@ -177,12 +177,12 @@ func (c *ImapClient) GetMessages(mailbox string, since *time.Time, maxMessages i // Fetch message data - get envelope and full message body options := &imap.FetchOptions{ Envelope: true, - UID: true, + UID: true, BodySection: []*imap.FetchItemBodySection{ {}, // Empty section gets the entire message }, } - + fetchCmd := c.Fetch(seqSet, options) for { @@ -196,12 +196,12 @@ func (c *ImapClient) GetMessages(mailbox string, since *time.Time, maxMessages i log.Printf("Failed to parse message: %v", err) continue } - + // Apply message-level keyword filtering if messageFilter != nil && !c.ShouldProcessMessage(parsedMsg, messageFilter) { continue // Skip this message due to keyword filter } - + messages = append(messages, parsedMsg) } @@ -231,7 +231,7 @@ func (c *ImapClient) parseMessage(fetchMsg *imapclient.FetchMessageData) (*Messa env := buffer.Envelope msg.Subject = env.Subject msg.Date = env.Date - + // Parse From addresses for _, addr := range env.From { if addr.Mailbox != "" { @@ -242,7 +242,7 @@ func (c *ImapClient) parseMessage(fetchMsg *imapclient.FetchMessageData) (*Messa msg.From = append(msg.From, fullAddr) } } - + // Parse To addresses for _, addr := range env.To { if addr.Mailbox != "" { @@ -264,7 +264,7 @@ func (c *ImapClient) parseMessage(fetchMsg *imapclient.FetchMessageData) 
(*Messa if len(buffer.BodySection) > 0 { bodyBuffer := buffer.BodySection[0] reader := bytes.NewReader(bodyBuffer.Bytes) - + // Parse the message using go-message entity, err := message.Read(reader) if err != nil { @@ -338,7 +338,7 @@ func (c *ImapClient) parseMessagePart(entity *message.Entity, msg *Message) erro disposition, dispositionParams, _ := entity.Header.ContentDisposition() // Determine if this is an attachment - isAttachment := disposition == "attachment" || + isAttachment := disposition == "attachment" || (disposition == "inline" && dispositionParams["filename"] != "") || params["name"] != "" diff --git a/go/mail2couch b/go/mail2couch deleted file mode 100755 index 2133741..0000000 Binary files a/go/mail2couch and /dev/null differ diff --git a/go/main.go b/go/main.go index 8d4b661..155b195 100644 --- a/go/main.go +++ b/go/main.go @@ -13,7 +13,7 @@ import ( func main() { args := config.ParseCommandLine() - + cfg, err := config.LoadConfigWithDiscovery(args) if err != nil { log.Fatalf("Failed to load configuration: %v", err) @@ -33,12 +33,12 @@ func main() { // Generate per-account database name dbName := couch.GenerateAccountDBName(source.Name, source.User) - + // Ensure the account-specific database exists ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second) err = couchClient.EnsureDB(ctx, dbName) cancel() - + if err != nil { log.Printf("Could not ensure CouchDB database '%s' exists (is it running?): %v", dbName, err) continue @@ -111,7 +111,7 @@ func processImapSource(source *config.MailSource, couchClient *couch.Client, dbN if syncMetadata != nil { // Use last sync time for incremental sync sinceDate = &syncMetadata.LastSyncTime - fmt.Printf(" Incremental sync since: %s (last synced %d messages)\n", + fmt.Printf(" Incremental sync since: %s (last synced %d messages)\n", sinceDate.Format("2006-01-02 15:04:05"), syncMetadata.MessageCount) } else { // First sync - use config since date if available diff --git a/rust/Cargo.toml 
b/rust/Cargo.toml new file mode 100644 index 0000000..87fcab2 --- /dev/null +++ b/rust/Cargo.toml @@ -0,0 +1,52 @@ +[package] +name = "mail2couch" +version = "0.1.0" +edition = "2021" +description = "A powerful email backup utility that synchronizes mail from IMAP accounts to CouchDB" +license = "MIT" +repository = "https://github.com/yourusername/mail2couch" +keywords = ["email", "backup", "imap", "couchdb", "sync"] +categories = ["email", "database"] + +[dependencies] +# Serialization +serde = { version = "1.0", features = ["derive"] } +serde_json = "1.0" + +# Date/time handling +chrono = { version = "0.4", features = ["serde"] } + +# HTTP client for CouchDB +reqwest = { version = "0.11", features = ["json"] } + +# Async runtime +tokio = { version = "1.0", features = ["full"] } + +# Error handling +thiserror = "1.0" +anyhow = "1.0" + +# Configuration +config = "0.13" + +# IMAP client (when implementing IMAP functionality) +# async-imap = "0.9" # Commented out for now due to compatibility issues + +# Logging +log = "0.4" +env_logger = "0.10" + +# CLI argument parsing +clap = { version = "4.0", features = ["derive"] } + +[dev-dependencies] +# Testing utilities +tokio-test = "0.4" + +[lib] +name = "mail2couch" +path = "src/lib.rs" + +[[bin]] +name = "mail2couch" +path = "src/main.rs" \ No newline at end of file diff --git a/rust/README.md b/rust/README.md new file mode 100644 index 0000000..4265bcd --- /dev/null +++ b/rust/README.md @@ -0,0 +1,111 @@ +# Mail2Couch Rust Implementation + +This directory contains the Rust implementation of mail2couch, which will provide the same functionality as the Go implementation while maintaining full compatibility with the CouchDB document schemas. + +## Current Status + +🚧 **Work in Progress** - The Rust implementation is planned for future development. 
+ +Currently available: +- ✅ **CouchDB Schema Definitions**: Complete Rust structs that match the Go implementation +- ✅ **Serialization Support**: Full serde integration for JSON handling +- ✅ **Type Safety**: Strongly typed structures for all CouchDB documents +- ✅ **Compatibility Tests**: Validated against example documents +- ✅ **Database Naming**: Same database naming logic as Go implementation + +## Schema Compatibility + +The Rust implementation uses the same CouchDB document schemas as the Go implementation: + +### Mail Documents +```rust +use mail2couch::{MailDocument, generate_database_name}; + +let mut doc = MailDocument::new( + "123".to_string(), // IMAP UID + "INBOX".to_string(), // Mailbox + vec!["sender@example.com".to_string()], // From + vec!["recipient@example.com".to_string()], // To + "Subject".to_string(), // Subject + Utc::now(), // Date + "Body content".to_string(), // Body + HashMap::new(), // Headers + false, // Has attachments +); + +doc.set_id(); // Sets ID to "INBOX_123" +``` + +### Sync Metadata +```rust +use mail2couch::SyncMetadata; + +let metadata = SyncMetadata::new( + "INBOX".to_string(), // Mailbox + Utc::now(), // Last sync time + 456, // Last message UID + 100, // Message count +); +// ID automatically set to "sync_metadata_INBOX" +``` + +### Database Naming +```rust +use mail2couch::generate_database_name; + +let db_name = generate_database_name("Personal Gmail", ""); +// Returns: "m2c_personal_gmail" + +let db_name = generate_database_name("", "user@example.com"); +// Returns: "m2c_user_example_com" +``` + +## Dependencies + +The Rust implementation uses these key dependencies: + +- **serde**: JSON serialization/deserialization +- **chrono**: Date/time handling with ISO8601 support +- **reqwest**: HTTP client for CouchDB API +- **tokio**: Async runtime +- **anyhow/thiserror**: Error handling + +## Testing + +Run the schema compatibility tests: + +```bash +cargo test +``` + +All tests validate that the Rust structures produce 
JSON compatible with the Go implementation and documented schemas. + +## Future Implementation + +The planned Rust implementation will include: + +- **IMAP Client**: Connect to mail servers and retrieve messages +- **CouchDB Integration**: Store documents using native Rust CouchDB client +- **Configuration**: Same JSON config format as Go implementation +- **CLI Interface**: Compatible command-line interface +- **Performance**: Leveraging Rust's performance characteristics +- **Memory Safety**: Rust's ownership model for reliable operation + +## Schema Documentation + +See the following files for complete schema documentation: + +- [`../couchdb-schemas.md`](../couchdb-schemas.md): Complete schema specification +- [`../examples/`](../examples/): JSON example documents +- [`src/schemas.rs`](src/schemas.rs): Rust type definitions + +## Cross-Implementation Compatibility + +Both Go and Rust implementations: +- Use identical CouchDB document schemas +- Generate the same database names +- Store documents with the same field names and types +- Support incremental sync with compatible metadata +- Handle attachments using CouchDB native attachment storage + +This ensures that databases created by either implementation can be used interchangeably. \ No newline at end of file diff --git a/rust/src/lib.rs b/rust/src/lib.rs new file mode 100644 index 0000000..6be3a6f --- /dev/null +++ b/rust/src/lib.rs @@ -0,0 +1,20 @@ +//! # mail2couch +//! +//! A powerful email backup utility that synchronizes mail from IMAP accounts to CouchDB. +//! +//! This library provides the core functionality for: +//! - Connecting to IMAP servers +//! - Retrieving email messages and attachments +//! - Storing emails in CouchDB with proper document structures +//! - Incremental synchronization to avoid re-processing messages +//! - Filtering by folders, dates, and keywords +//! +//! ## Document Schemas +//! +//! The library uses well-defined CouchDB document schemas that are compatible +//! 
with the Go implementation. See the `schemas` module for details. + +pub mod schemas; + +// Re-export main types for convenience +pub use schemas::{MailDocument, SyncMetadata, AttachmentStub, generate_database_name}; \ No newline at end of file diff --git a/rust/src/main.rs b/rust/src/main.rs new file mode 100644 index 0000000..db9d28f --- /dev/null +++ b/rust/src/main.rs @@ -0,0 +1,7 @@ +// Placeholder main.rs for Rust implementation +// This will be implemented in the future + +fn main() { + println!("mail2couch Rust implementation - Coming Soon!"); + println!("See the Go implementation in ../go/ for current functionality."); +} \ No newline at end of file diff --git a/rust/src/schemas.rs b/rust/src/schemas.rs new file mode 100644 index 0000000..5f75145 --- /dev/null +++ b/rust/src/schemas.rs @@ -0,0 +1,266 @@ +// CouchDB document schemas for mail2couch +// This file defines the Rust structures that correspond to the CouchDB document schemas +// defined in couchdb-schemas.md + +use chrono::{DateTime, Utc}; +use serde::{Deserialize, Serialize}; +use std::collections::HashMap; + +/// Represents an email message stored in CouchDB +/// Document ID format: {mailbox}_{uid} (e.g., "INBOX_123") +/// Document type: "mail" +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct MailDocument { + /// CouchDB document ID + #[serde(rename = "_id")] + #[serde(skip_serializing_if = "Option::is_none")] + pub id: Option<String>, + + /// CouchDB revision (managed by CouchDB) + #[serde(rename = "_rev")] + #[serde(skip_serializing_if = "Option::is_none")] + pub rev: Option<String>, + + /// CouchDB native attachments for email attachments + #[serde(rename = "_attachments")] + #[serde(skip_serializing_if = "Option::is_none")] + pub attachments: Option<HashMap<String, AttachmentStub>>, + + /// Original IMAP UID from mail server + #[serde(rename = "sourceUid")] + pub source_uid: String, + + /// Source mailbox name (e.g., "INBOX", "Sent") + pub mailbox: String, + + /// Sender email addresses + pub from: Vec<String>, + + /// Recipient
email addresses + pub to: Vec<String>, + + /// Email subject line + pub subject: String, + + /// Email date from headers (ISO8601 format) + pub date: DateTime<Utc>, + + /// Email body content (plain text) + pub body: String, + + /// All email headers as key-value pairs + pub headers: HashMap<String, Vec<String>>, + + /// When document was stored in CouchDB (ISO8601 format) + #[serde(rename = "storedAt")] + pub stored_at: DateTime<Utc>, + + /// Document type identifier (always "mail") + #[serde(rename = "docType")] + pub doc_type: String, + + /// Whether email has attachments + #[serde(rename = "hasAttachments")] + pub has_attachments: bool, +} + +/// Metadata for CouchDB native attachments +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct AttachmentStub { + /// MIME type of attachment + #[serde(rename = "content_type")] + pub content_type: String, + + /// Size in bytes (optional) + #[serde(skip_serializing_if = "Option::is_none")] + pub length: Option<u64>, + + /// Indicates attachment is stored separately (optional) + #[serde(skip_serializing_if = "Option::is_none")] + pub stub: Option<bool>, +} + +/// Sync state information for incremental syncing +/// Document ID format: sync_metadata_{mailbox} (e.g., "sync_metadata_INBOX") +/// Document type: "sync_metadata" +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct SyncMetadata { + /// CouchDB document ID + #[serde(rename = "_id")] + #[serde(skip_serializing_if = "Option::is_none")] + pub id: Option<String>, + + /// CouchDB revision (managed by CouchDB) + #[serde(rename = "_rev")] + #[serde(skip_serializing_if = "Option::is_none")] + pub rev: Option<String>, + + /// Document type identifier (always "sync_metadata") + #[serde(rename = "docType")] + pub doc_type: String, + + /// Mailbox name this metadata applies to + pub mailbox: String, + + /// When this mailbox was last synced (ISO8601 format) + #[serde(rename = "lastSyncTime")] + pub last_sync_time: DateTime<Utc>, + + /// Highest IMAP UID processed in last sync + #[serde(rename = "lastMessageUID")] + pub
last_message_uid: u32, + + /// Number of messages processed in last sync + #[serde(rename = "messageCount")] + pub message_count: u32, + + /// When this metadata was last updated (ISO8601 format) + #[serde(rename = "updatedAt")] + pub updated_at: DateTime<Utc>, +} + +impl MailDocument { + /// Create a new MailDocument with required fields + pub fn new( + source_uid: String, + mailbox: String, + from: Vec<String>, + to: Vec<String>, + subject: String, + date: DateTime<Utc>, + body: String, + headers: HashMap<String, Vec<String>>, + has_attachments: bool, + ) -> Self { + let now = Utc::now(); + Self { + id: None, // Will be set when storing to CouchDB + rev: None, // Managed by CouchDB + attachments: None, + source_uid, + mailbox, + from, + to, + subject, + date, + body, + headers, + stored_at: now, + doc_type: "mail".to_string(), + has_attachments, + } + } + + /// Generate document ID based on mailbox and UID + pub fn generate_id(&self) -> String { + format!("{}_{}", self.mailbox, self.source_uid) + } + + /// Set the document ID + pub fn set_id(&mut self) { + self.id = Some(self.generate_id()); + } +} + +impl SyncMetadata { + /// Create new sync metadata for a mailbox + pub fn new( + mailbox: String, + last_sync_time: DateTime<Utc>, + last_message_uid: u32, + message_count: u32, + ) -> Self { + let now = Utc::now(); + Self { + id: Some(format!("sync_metadata_{}", mailbox)), + rev: None, // Managed by CouchDB + doc_type: "sync_metadata".to_string(), + mailbox, + last_sync_time, + last_message_uid, + message_count, + updated_at: now, + } + } +} + +/// Generate CouchDB database name from account information +/// Format: m2c_{account_name} +/// Rules: lowercase, replace invalid chars with underscores, ensure starts with letter +pub fn generate_database_name(account_name: &str, user_email: &str) -> String { + let name = if account_name.is_empty() { + user_email + } else { + account_name + }; + + // Convert to lowercase and replace invalid characters + let mut valid_name = name + .to_lowercase() + .chars() + .map(|c| { +
if c.is_ascii_alphanumeric() || c == '_' || c == '$' || c == '(' || c == ')' || c == '+' || c == '-' || c == '/' { + c + } else { + '_' + } + }) + .collect::<String>(); + + // Ensure starts with a letter + if valid_name.is_empty() || !valid_name.chars().next().unwrap().is_ascii_lowercase() { + valid_name = format!("m2c_mail_{}", valid_name); + } else { + valid_name = format!("m2c_{}", valid_name); + } + + valid_name +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn test_generate_database_name() { + assert_eq!(generate_database_name("Personal Gmail", ""), "m2c_personal_gmail"); + assert_eq!(generate_database_name("", "user@example.com"), "m2c_user_example_com"); + assert_eq!(generate_database_name("123work", ""), "m2c_mail_123work"); + } + + #[test] + fn test_mail_document_id_generation() { + let mut doc = MailDocument::new( + "123".to_string(), + "INBOX".to_string(), + vec!["sender@example.com".to_string()], + vec!["recipient@example.com".to_string()], + "Test Subject".to_string(), + Utc::now(), + "Test body".to_string(), + HashMap::new(), + false, + ); + + assert_eq!(doc.generate_id(), "INBOX_123"); + + doc.set_id(); + assert_eq!(doc.id, Some("INBOX_123".to_string())); + } + + #[test] + fn test_sync_metadata_creation() { + let metadata = SyncMetadata::new( + "INBOX".to_string(), + Utc::now(), + 456, + 100, + ); + + assert_eq!(metadata.id, Some("sync_metadata_INBOX".to_string())); + assert_eq!(metadata.doc_type, "sync_metadata"); + assert_eq!(metadata.mailbox, "INBOX"); + assert_eq!(metadata.last_message_uid, 456); + assert_eq!(metadata.message_count, 100); + } +} \ No newline at end of file diff --git a/scripts/validate-schemas.py b/scripts/validate-schemas.py new file mode 100755 index 0000000..c3f037f --- /dev/null +++ b/scripts/validate-schemas.py @@ -0,0 +1,169 @@ +#!/usr/bin/env python3 +""" +Schema Validation Script for mail2couch + +This script validates that the CouchDB document schemas are consistent +between the Go implementation and the
documented JSON examples. +""" + +import json +import sys +from pathlib import Path + +def load_json_file(file_path): + """Load and parse a JSON file.""" + try: + with open(file_path, 'r') as f: + return json.load(f) + except FileNotFoundError: + print(f"ERROR: File not found: {file_path}") + return None + except json.JSONDecodeError as e: + print(f"ERROR: Invalid JSON in {file_path}: {e}") + return None + +def validate_mail_document(doc, filename): + """Validate a mail document structure.""" + required_fields = [ + '_id', 'sourceUid', 'mailbox', 'from', 'to', 'subject', + 'date', 'body', 'headers', 'storedAt', 'docType', 'hasAttachments' + ] + + errors = [] + + # Check required fields + for field in required_fields: + if field not in doc: + errors.append(f"Missing required field: {field}") + + # Check field types + if 'docType' in doc and doc['docType'] != 'mail': + errors.append(f"Invalid docType: expected 'mail', got '{doc['docType']}'") + + if 'from' in doc and not isinstance(doc['from'], list): + errors.append("Field 'from' must be an array") + + if 'to' in doc and not isinstance(doc['to'], list): + errors.append("Field 'to' must be an array") + + if 'headers' in doc and not isinstance(doc['headers'], dict): + errors.append("Field 'headers' must be an object") + + if 'hasAttachments' in doc and not isinstance(doc['hasAttachments'], bool): + errors.append("Field 'hasAttachments' must be a boolean") + + # Check _id format + if '_id' in doc: + doc_id = doc['_id'] + if '_' not in doc_id: + errors.append(f"Document ID '{doc_id}' should follow format 'mailbox_uid'") + + # Validate attachments if present + if '_attachments' in doc: + if not isinstance(doc['_attachments'], dict): + errors.append("Field '_attachments' must be an object") + else: + for attachment_name, stub in doc['_attachments'].items(): + if 'content_type' not in stub: + errors.append(f"Attachment '{attachment_name}' missing content_type") + + if errors: + print(f"ERRORS in {filename}:") + for error in errors: +
print(f" - {error}") + return False + else: + print(f"✓ {filename}: Valid mail document") + return True + +def validate_sync_metadata(doc, filename): + """Validate a sync metadata document structure.""" + required_fields = [ + '_id', 'docType', 'mailbox', 'lastSyncTime', + 'lastMessageUID', 'messageCount', 'updatedAt' + ] + + errors = [] + + # Check required fields + for field in required_fields: + if field not in doc: + errors.append(f"Missing required field: {field}") + + # Check field types + if 'docType' in doc and doc['docType'] != 'sync_metadata': + errors.append(f"Invalid docType: expected 'sync_metadata', got '{doc['docType']}'") + + if 'lastMessageUID' in doc and not isinstance(doc['lastMessageUID'], int): + errors.append("Field 'lastMessageUID' must be an integer") + + if 'messageCount' in doc and not isinstance(doc['messageCount'], int): + errors.append("Field 'messageCount' must be an integer") + + # Check _id format + if '_id' in doc: + doc_id = doc['_id'] + if not doc_id.startswith('sync_metadata_'): + errors.append(f"Document ID '{doc_id}' should start with 'sync_metadata_'") + + if errors: + print(f"ERRORS in {filename}:") + for error in errors: + print(f" - {error}") + return False + else: + print(f"✓ {filename}: Valid sync metadata document") + return True + +def main(): + """Main validation function.""" + script_dir = Path(__file__).parent + project_root = script_dir.parent + examples_dir = project_root / "examples" + + print("Validating CouchDB document schemas...") + print("=" * 50) + + all_valid = True + + # Validate mail documents + mail_files = [ + "sample-mail-document.json", + "simple-mail-document.json" + ] + + for filename in mail_files: + file_path = examples_dir / filename + doc = load_json_file(file_path) + if doc is None: + all_valid = False + continue + + if not validate_mail_document(doc, filename): + all_valid = False + + # Validate sync metadata + sync_files = [ + "sample-sync-metadata.json" + ] + + for filename in sync_files: + 
file_path = examples_dir / filename + doc = load_json_file(file_path) + if doc is None: + all_valid = False + continue + + if not validate_sync_metadata(doc, filename): + all_valid = False + + print("=" * 50) + if all_valid: + print("✓ All schemas are valid!") + sys.exit(0) + else: + print("✗ Schema validation failed!") + sys.exit(1) + +if __name__ == "__main__": + main() \ No newline at end of file diff --git a/test/README.md b/test/README.md index bd4c180..ec83430 100644 --- a/test/README.md +++ b/test/README.md @@ -8,6 +8,7 @@ The test environment provides: - **CouchDB**: Database for storing email messages - **GreenMail IMAP Server**: Java-based mail server designed for testing with pre-populated test accounts and messages - **Test Configuration**: Ready-to-use config for testing both sync and archive modes +- **HTML Webmail Interface**: Beautiful, responsive web interface for viewing archived emails ## Quick Start @@ -114,6 +115,20 @@ Each account contains: - **SMTP Port**: 3025 - **Server**: GreenMail (Java-based test server) +### Accessing Test Data +After running mail2couch, you can access the stored emails via CouchDB's REST API: + +**📋 Database Access:** +- All databases: http://localhost:5984/_all_dbs +- Specific database: http://localhost:5984/m2c_specific_folders_only +- All documents: http://localhost:5984/m2c_specific_folders_only/_all_docs +- Individual message: http://localhost:5984/m2c_specific_folders_only/INBOX_12 + +**🔍 Raw Data Examples:** +- Database info: http://localhost:5984/m2c_specific_folders_only +- Document content: http://localhost:5984/m2c_specific_folders_only/INBOX_1 +- Email attachments: http://localhost:5984/m2c_specific_folders_only/INBOX_1/{attachment_name} + ## Database Structure mail2couch will create separate databases for each mail source (with `m2c_` prefix): @@ -126,6 +141,7 @@ Each database contains documents with: - `mailbox` field indicating the origin folder - Native CouchDB attachments for email attachments - Full 
message headers and body content +- JSON documents accessible via CouchDB REST API ## Testing Sync vs Archive Modes diff --git a/test/start-test-env.sh b/test/start-test-env.sh index 2e6a82c..dfb9b59 100755 --- a/test/start-test-env.sh +++ b/test/start-test-env.sh @@ -63,5 +63,18 @@ echo "" echo "To run mail2couch:" echo " cd ../go && ./mail2couch -config ../test/config-test.json" echo "" +echo "📧 MAIL2COUCH DATABASE ACCESS:" +echo "After running mail2couch, you can access the stored emails via CouchDB:" +echo "" +echo "📋 Database Access (examples after sync):" +echo " - All databases: http://localhost:5984/_all_dbs" +echo " - Specific database: http://localhost:5984/m2c_specific_folders_only" +echo " - All documents: http://localhost:5984/m2c_specific_folders_only/_all_docs" +echo " - Individual message: http://localhost:5984/m2c_specific_folders_only/INBOX_12" +echo "" +echo "🔍 Raw Data Access:" +echo " - Database info: http://localhost:5984/m2c_specific_folders_only" +echo " - Document with content: http://localhost:5984/m2c_specific_folders_only/INBOX_12" +echo "" echo "To stop the environment:" echo " ./stop-test-env.sh" \ No newline at end of file diff --git a/test/test-incremental-sync.sh b/test/test-incremental-sync.sh index bc35016..45e0005 100755 --- a/test/test-incremental-sync.sh +++ b/test/test-incremental-sync.sh @@ -81,7 +81,17 @@ add_new_messages() { import imaplib import time -from test.populate_greenmail import create_simple_message +import sys +import os + +# Add the test directory to Python path to enable imports +sys.path.insert(0, os.path.dirname(os.path.abspath(__file__))) + +import importlib.util +spec = importlib.util.spec_from_file_location("populate_greenmail", "populate-greenmail.py") +populate_greenmail = importlib.util.module_from_spec(spec) +spec.loader.exec_module(populate_greenmail) +create_simple_message = populate_greenmail.create_simple_message def add_new_messages(): """Add new messages to test incremental sync"""