diff --git a/.gitignore b/.gitignore index 07ab0c3..c3d683d 100644 --- a/.gitignore +++ b/.gitignore @@ -47,4 +47,3 @@ go.work.sum # env file .env -__pycache__ diff --git a/CLAUDE.md b/CLAUDE.md index 4a2cb2b..fce9b14 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -57,7 +57,7 @@ cd go && go mod tidy 2. **Mail Handling (`mail/`)**: IMAP client implementation - Uses `github.com/emersion/go-imap/v2` for IMAP operations - Supports TLS connections - - Fetches and processes email messages from IMAP mailboxes + - Currently only lists mailboxes (backup functionality not yet implemented) 3. **CouchDB Integration (`couch/`)**: Database operations - Uses `github.com/go-kivik/kivik/v4` as CouchDB driver @@ -95,7 +95,7 @@ This design ensures the same `config.json` format will work for both Go and Rust - ✅ Per-account CouchDB database creation and management - ✅ IMAP connection and mailbox listing - ✅ Build error fixes -- ✅ Real IMAP message retrieval and parsing +- ✅ Email message retrieval framework (with placeholder data) - ✅ Email storage to CouchDB framework with native attachments - ✅ Folder filtering logic with wildcard support (`*`, `?`, `[abc]` patterns) - ✅ Date filtering support @@ -103,6 +103,7 @@ This design ensures the same `config.json` format will work for both Go and Rust - ✅ Duplicate detection and prevention - ✅ Sync vs Archive mode implementation - ✅ CouchDB attachment storage for email attachments +- ✅ Real IMAP message parsing with go-message library - ✅ Full message body and attachment handling with MIME multipart support - ✅ Command line argument support (--max-messages flag) - ✅ Per-account CouchDB databases for better organization @@ -142,7 +143,6 @@ Sync metadata documents are stored in CouchDB with ID format: `sync_metadata_{ma - Comprehensive test environment with Podman containers and automated test scripts - The application uses automatic config file discovery as documented above - ### Next Steps The following enhancements could further improve the implementation: @@ -158,5 +158,4 @@ The following enhancements could further improve the implementation: ## Development Guidelines ### Code Quality and Standards -- All code requires perfect linting and tool-formatting, exceptions are allowed only if documented properly -- We always want linting and formatting of our code to be perfect \ No newline at end of file +- All code requires perfect linting and tool-formatting, exceptions are allowed only if documented properly \ No newline at end of file diff --git a/README.md b/README.md index 7c0fc5e..f81e050 100644 --- a/README.md +++ b/README.md @@ -27,16 +27,6 @@ A powerful email backup utility that synchronizes mail from IMAP accounts to Cou - **Complete Headers**: Preserve all email headers and metadata - **UTF-8 Support**: Handle international characters and special content -### HTML Webmail Interface -- **Beautiful Web Interface**: Modern, responsive HTML presentations for viewing archived emails -- **Gmail-like Design**: Professional, mobile-friendly interface with clean typography -- **Message Lists**: Dynamic HTML lists with sorting, filtering, and folder organization -- **Individual Messages**: Rich HTML display with proper formatting, URL linking, and collapsible headers -- **Attachment Support**: Direct download links with file type and size information -- **Search Integration**: Full-text subject search with keyword highlighting -- **Folder Analytics**: Message count summaries and folder-based navigation -- **Mobile Responsive**: Optimized for desktop, tablet, and mobile viewing - ### Operational Features - **Automatic Config Discovery**: Finds configuration files in standard locations - **Command Line Control**: Override settings with `--max-messages` and `--config` flags @@ -174,7 +164,6 @@ cd test - **Message Documents**: Each email becomes a CouchDB document with metadata - **Native Attachments**: Email attachments stored as CouchDB attachments (compressed) - **Sync Metadata**: Tracks incremental sync state per mailbox -- **HTML Webmail Views**: CouchDB design documents with show/list functions for web interface ### Document Structure ```json @@ -200,28 +189,6 @@ cd test } ``` -### Accessing Stored Emails - -Once mail2couch has synced your emails, you can access them through CouchDB's REST API: - -#### Raw Data Access -```bash -# List all databases -http://localhost:5984/_all_dbs - -# View database info -http://localhost:5984/{database} - -# List all documents in database -http://localhost:5984/{database}/_all_docs - -# Get individual message -http://localhost:5984/{database}/{message_id} - -# Get message with attachments -http://localhost:5984/{database}/{message_id}/{attachment_name} -``` - ## Example Configurations ### Simple Configuration @@ -413,30 +380,6 @@ Complex setup with multiple accounts, filtering, and different sync modes: For detailed troubleshooting, see the [test environment documentation](test/README.md). -## Future Plans - -### CouchDB-Hosted Webmail Viewer - -We plan to develop a comprehensive webmail interface for viewing the archived emails directly through CouchDB. This will include: - -- **📧 Modern Web Interface**: A responsive, Gmail-style webmail viewer built on CouchDB design documents -- **🔍 Advanced Search**: Full-text search across subjects, senders, and message content -- **📁 Folder Organization**: Browse messages by mailbox with visual indicators and statistics -- **📎 Attachment Viewer**: Direct download and preview of email attachments -- **📱 Mobile Support**: Optimized interface for tablets and smartphones -- **🎨 Customizable Themes**: Multiple UI themes and layout options -- **⚡ Real-time Updates**: Live synchronization as new emails are archived -- **🔐 Authentication**: Secure access controls and user management -- **📊 Analytics Dashboard**: Email statistics and storage insights - -This webmail viewer will be implemented as: -- **CouchDB Design Documents**: Views, shows, and list functions for data access -- **Self-contained HTML/CSS/JS**: No external dependencies or servers required -- **RESTful Architecture**: Clean API endpoints for integration with other tools -- **Progressive Enhancement**: Works with JavaScript disabled for basic functionality - -The webmail interface will be a separate component that can be optionally installed alongside the core mail2couch storage functionality, maintaining the clean separation between data archival and presentation layers. - ## Contributing This project welcomes contributions! Please see [CLAUDE.md](CLAUDE.md) for development setup and architecture details. diff --git a/couchdb-schemas.md b/couchdb-schemas.md deleted file mode 100644 index 57c170d..0000000 --- a/couchdb-schemas.md +++ /dev/null @@ -1,207 +0,0 @@ -# CouchDB Document Schemas - -This document defines the CouchDB document schemas used by mail2couch. These schemas must be maintained consistently across all implementations (Go, Rust, etc.). - -## Mail Document Schema - -**Document Type**: `mail` -**Document ID Format**: `{mailbox}_{uid}` (e.g., `INBOX_123`) -**Purpose**: Stores individual email messages with metadata and content - -```json -{ - "_id": "INBOX_123", - "_rev": "1-abc123...", - "_attachments": { - "attachment1.pdf": { - "content_type": "application/pdf", - "length": 12345, - "stub": true - } - }, - "sourceUid": "123", - "mailbox": "INBOX", - "from": ["sender@example.com"], - "to": ["recipient@example.com"], - "subject": "Email Subject", - "date": "2025-08-02T12:16:10Z", - "body": "Email body content", - "headers": { - "Content-Type": ["text/plain; charset=utf-8"], - "Message-ID": [""], - "Date": ["Sat, 02 Aug 2025 14:16:10 +0200"] - }, - "storedAt": "2025-08-02T14:16:22.375241322+02:00", - "docType": "mail", - "hasAttachments": true -} -``` - -### Field Definitions - -| Field | Type | Required | Description | -|-------|------|----------|-------------| -| `_id` | string | Yes | CouchDB document ID: `{mailbox}_{uid}` | -| `_rev` | string | Auto | CouchDB revision (managed by CouchDB) | -| `_attachments` | object | No | CouchDB native attachments (email attachments) | -| `sourceUid` | string | Yes | Original IMAP UID from mail server | -| `mailbox` | string | Yes | Source mailbox name (e.g., "INBOX", "Sent") | -| `from` | array[string] | Yes | Sender email addresses | -| `to` | array[string] | Yes | Recipient email addresses | -| `subject` | string | Yes | Email subject line | -| `date` | string (ISO8601) | Yes | Email date from headers | -| `body` | string | Yes | Email body content (plain text) | -| `headers` | object | Yes | All email headers as key-value pairs | -| `storedAt` | string (ISO8601) | Yes | When document was stored in CouchDB | -| `docType` | string | Yes | Always "mail" for email documents | -| `hasAttachments` | boolean | Yes | Whether email has attachments | - -### Attachment Stub Schema - -When emails have attachments, they are stored as CouchDB native attachments: - -```json -{ - "filename.ext": { - "content_type": "mime/type", - "length": 12345, - "stub": true - } -} -``` - -| Field | Type | Required | Description | -|-------|------|----------|-------------| -| `content_type` | string | Yes | MIME type of attachment | -| `length` | integer | No | Size in bytes | -| `stub` | boolean | No | Indicates attachment is stored separately | - -## Sync Metadata Document Schema - -**Document Type**: `sync_metadata` -**Document ID Format**: `sync_metadata_{mailbox}` (e.g., `sync_metadata_INBOX`) -**Purpose**: Tracks synchronization state for incremental syncing - -```json -{ - "_id": "sync_metadata_INBOX", - "_rev": "1-def456...", - "docType": "sync_metadata", - "mailbox": "INBOX", - "lastSyncTime": "2025-08-02T14:26:08.281094+02:00", - "lastMessageUID": 15, - "messageCount": 18, - "updatedAt": "2025-08-02T14:26:08.281094+02:00" -} -``` - -### Field Definitions - -| Field | Type | Required | Description | -|-------|------|----------|-------------| -| `_id` | string | Yes | CouchDB document ID: `sync_metadata_{mailbox}` | -| `_rev` | string | Auto | CouchDB revision (managed by CouchDB) | -| `docType` | string | Yes | Always "sync_metadata" for sync documents | -| `mailbox` | string | Yes | Mailbox name this metadata applies to | -| `lastSyncTime` | string (ISO8601) | Yes | When this mailbox was last synced | -| `lastMessageUID` | integer | Yes | Highest IMAP UID processed in last sync | -| `messageCount` | integer | Yes | Number of messages processed in last sync | -| `updatedAt` | string (ISO8601) | Yes | When this metadata was last updated | - -## Database Naming Convention - -**Format**: `m2c_{account_name}` -**Rules**: -- Prefix all databases with `m2c_` -- Convert account names to lowercase -- Replace invalid characters with underscores -- Ensure database name starts with a letter -- If account name starts with non-letter, prefix with `mail_` - -**Examples**: -- Account "Personal Gmail" → Database `m2c_personal_gmail` -- Account "123work" → Database `m2c_mail_123work` -- Email "user@example.com" → Database `m2c_user_example_com` - -## Document ID Conventions - -### Mail Documents -- **Format**: `{mailbox}_{uid}` -- **Examples**: `INBOX_123`, `Sent_456`, `Work/Projects_789` -- **Uniqueness**: Combination of mailbox and IMAP UID ensures uniqueness - -### Sync Metadata Documents -- **Format**: `sync_metadata_{mailbox}` -- **Examples**: `sync_metadata_INBOX`, `sync_metadata_Sent` -- **Purpose**: One metadata document per mailbox for tracking sync state - -## Data Type Mappings - -### Go to JSON -| Go Type | JSON Type | Example | -|---------|-----------|---------| -| `string` | string | `"text"` | -| `[]string` | array | `["item1", "item2"]` | -| `map[string][]string` | object | `{"key": ["value1", "value2"]}` | -| `time.Time` | string (ISO8601) | `"2025-08-02T14:26:08.281094+02:00"` | -| `uint32` | number | `123` | -| `int` | number | `456` | -| `bool` | boolean | `true` | - -### Rust Considerations -When implementing in Rust, ensure: -- Use `chrono::DateTime` for timestamps with ISO8601 serialization -- Use `Vec` for string arrays -- Use `HashMap>` for headers -- Use `serde` with `#[serde(rename = "fieldName")]` for JSON field mapping -- Handle optional fields with `Option` - -## Validation Rules - -### Required Fields -All documents must include: -- `_id`: Valid CouchDB document ID -- `docType`: Identifies document type for filtering -- `mailbox`: Source mailbox name (for mail documents) - -### Data Constraints -- Email addresses: No validation enforced (preserve as-is from IMAP) -- Dates: Must be valid ISO8601 format -- UIDs: Must be positive integers -- Document IDs: Must be valid CouchDB IDs (no spaces, special chars) - -### Attachment Handling -- Store email attachments as CouchDB native attachments -- Preserve original filenames and MIME types -- Use attachment stubs in document metadata -- Support binary content through CouchDB attachment API - -## Backward Compatibility - -When modifying schemas: -1. Add new fields as optional -2. Never remove existing fields -3. Maintain existing field types and formats -4. Document any breaking changes clearly -5. Provide migration guidance for existing data - -## Implementation Notes - -### CouchDB Features Used -- **Native Attachments**: For email attachments -- **Document IDs**: Predictable format for easy access -- **Bulk Operations**: For efficient storage -- **Conflict Resolution**: CouchDB handles revision conflicts - -### Performance Considerations -- Index by `docType` for efficient filtering -- Index by `mailbox` for folder-based queries -- Index by `date` for chronological access -- Use bulk insert operations for multiple messages - -### Future Extensions -This schema supports future enhancements: -- **Webmail Views**: CouchDB design documents for HTML interface -- **Search Indexes**: Full-text search with CouchDB-Lucene -- **Replication**: Multi-database sync scenarios -- **Analytics**: Message statistics and reporting \ No newline at end of file diff --git a/examples/sample-mail-document.json b/examples/sample-mail-document.json deleted file mode 100644 index 231981e..0000000 --- a/examples/sample-mail-document.json +++ /dev/null @@ -1,42 +0,0 @@ -{ - "_id": "INBOX_123", - "_rev": "1-abc123def456789", - "_attachments": { - "report.pdf": { - "content_type": "application/pdf", - "length": 245760, - "stub": true - }, - "image.png": { - "content_type": "image/png", - "length": 12345, - "stub": true - } - }, - "sourceUid": "123", - "mailbox": "INBOX", - "from": ["sender@example.com", "alias@example.com"], - "to": ["recipient@company.com", "cc@company.com"], - "subject": "Monthly Report - Q3 2025", - "date": "2025-08-02T12:16:10Z", - "body": "Please find the attached monthly report for Q3 2025.\n\nBest regards,\nSender Name", - "headers": { - "Content-Type": ["multipart/mixed; boundary=\"----=_Part_123456\""], - "Content-Transfer-Encoding": ["7bit"], - "Date": ["Sat, 02 Aug 2025 14:16:10 +0200"], - "From": ["sender@example.com"], - "To": ["recipient@company.com"], - "Cc": ["cc@company.com"], - "Subject": ["Monthly Report - Q3 2025"], - "Message-ID": [""], - "MIME-Version": ["1.0"], - "X-Mailer": ["Mail Client 1.0"], - "Return-Path": [""], - "Received": [ - "from smtp.example.com (smtp.example.com [192.168.1.100]) by mx.company.com (Postfix) with ESMTP id ABC123; Sat, 02 Aug 2025 14:16:10 +0200" - ] - }, - "storedAt": "2025-08-02T14:16:22.375241322+02:00", - "docType": "mail", - "hasAttachments": true -} \ No newline at end of file diff --git a/examples/sample-sync-metadata.json b/examples/sample-sync-metadata.json deleted file mode 100644 index 2aeeb91..0000000 --- a/examples/sample-sync-metadata.json +++ /dev/null @@ -1,10 +0,0 @@ -{ - "_id": "sync_metadata_INBOX", - "_rev": "2-def456abc789123", - "docType": "sync_metadata", - "mailbox": "INBOX", - "lastSyncTime": "2025-08-02T14:26:08.281094+02:00", - "lastMessageUID": 123, - "messageCount": 45, - "updatedAt": "2025-08-02T14:26:08.281094+02:00" -} \ No newline at end of file diff --git a/examples/simple-mail-document.json b/examples/simple-mail-document.json deleted file mode 100644 index 305ba61..0000000 --- a/examples/simple-mail-document.json +++ /dev/null @@ -1,24 +0,0 @@ -{ - "_id": "Sent_456", - "_rev": "1-xyz789abc123def", - "sourceUid": "456", - "mailbox": "Sent", - "from": ["user@company.com"], - "to": ["client@external.com"], - "subject": "Meeting Follow-up", - "date": "2025-08-02T10:30:00Z", - "body": "Thank you for the productive meeting today. As discussed, I'll send the proposal by end of week.\n\nBest regards,\nUser Name", - "headers": { - "Content-Type": ["text/plain; charset=utf-8"], - "Content-Transfer-Encoding": ["7bit"], - "Date": ["Sat, 02 Aug 2025 12:30:00 +0200"], - "From": ["user@company.com"], - "To": ["client@external.com"], - "Subject": ["Meeting Follow-up"], - "Message-ID": [""], - "MIME-Version": ["1.0"] - }, - "storedAt": "2025-08-02T12:30:45.123456789+02:00", - "docType": "mail", - "hasAttachments": false -} \ No newline at end of file diff --git a/go/config/config.go b/go/config/config.go index dc1808e..5581094 100644 --- a/go/config/config.go +++ b/go/config/config.go @@ -40,7 +40,7 @@ type FolderFilter struct { type MessageFilter struct { Since string `json:"since,omitempty"` SubjectKeywords []string `json:"subjectKeywords,omitempty"` // Filter by keywords in subject - SenderKeywords []string `json:"senderKeywords,omitempty"` // Filter by keywords in sender addresses + SenderKeywords []string `json:"senderKeywords,omitempty"` // Filter by keywords in sender addresses RecipientKeywords []string `json:"recipientKeywords,omitempty"` // Filter by keywords in recipient addresses } diff --git a/go/couch/couch.go b/go/couch/couch.go index c75c3b6..7a3c5ab 100644 --- a/go/couch/couch.go +++ b/go/couch/couch.go @@ -22,20 +22,20 @@ type Client struct { // MailDocument represents an email message stored in CouchDB type MailDocument struct { - ID string `json:"_id,omitempty"` - Rev string `json:"_rev,omitempty"` - Attachments map[string]AttachmentStub `json:"_attachments,omitempty"` // CouchDB attachments - SourceUID string `json:"sourceUid"` // Unique ID from the mail source (e.g., IMAP UID) - Mailbox string `json:"mailbox"` // Source mailbox name - From []string `json:"from"` - To []string `json:"to"` - Subject string `json:"subject"` - Date time.Time `json:"date"` - Body string `json:"body"` - Headers map[string][]string `json:"headers"` - StoredAt time.Time `json:"storedAt"` // When the document was stored - DocType string `json:"docType"` // Always "mail" - HasAttachments bool `json:"hasAttachments"` // Indicates if message has attachments + ID string `json:"_id,omitempty"` + Rev string `json:"_rev,omitempty"` + Attachments map[string]AttachmentStub `json:"_attachments,omitempty"` // CouchDB attachments + SourceUID string `json:"sourceUid"` // Unique ID from the mail source (e.g., IMAP UID) + Mailbox string `json:"mailbox"` // Source mailbox name + From []string `json:"from"` + To []string `json:"to"` + Subject string `json:"subject"` + Date time.Time `json:"date"` + Body string `json:"body"` + Headers map[string][]string `json:"headers"` + StoredAt time.Time `json:"storedAt"` // When the document was stored + DocType string `json:"docType"` // Always "mail" + HasAttachments bool `json:"hasAttachments"` // Indicates if message has attachments } // AttachmentStub represents metadata for a CouchDB attachment @@ -94,19 +94,19 @@ func GenerateAccountDBName(accountName, userEmail string) string { if name == "" { name = userEmail } - + // Convert to lowercase and replace invalid characters with underscores name = strings.ToLower(name) // CouchDB database names must match: ^[a-z][a-z0-9_$()+/-]*$ validName := regexp.MustCompile(`[^a-z0-9_$()+/-]`).ReplaceAllString(name, "_") - + // Ensure it starts with a letter and add m2c prefix if len(validName) > 0 && (validName[0] < 'a' || validName[0] > 'z') { validName = "m2c_mail_" + validName } else { validName = "m2c_" + validName } - + return validName } @@ -228,7 +228,7 @@ func (c *Client) GetAllMailDocumentIDs(ctx context.Context, dbName, mailbox stri // Create a view query to get all document IDs for the specified mailbox rows := db.AllDocs(ctx) - + docIDs := make(map[string]bool) for rows.Next() { docID, err := rows.ID() @@ -240,11 +240,11 @@ func (c *Client) GetAllMailDocumentIDs(ctx context.Context, dbName, mailbox stri docIDs[docID] = true } } - + if rows.Err() != nil { return nil, rows.Err() } - + return docIDs, nil } @@ -295,7 +295,7 @@ func (c *Client) SyncMailbox(ctx context.Context, dbName, mailbox string, curren if len(parts) < 2 { continue } - + uidStr := parts[len(parts)-1] uid := uint32(0) if _, err := fmt.Sscanf(uidStr, "%d", &uid); err != nil { diff --git a/go/go.mod b/go/go.mod index 1f6d85c..377160a 100644 --- a/go/go.mod +++ b/go/go.mod @@ -4,11 +4,11 @@ go 1.24.4 require ( github.com/emersion/go-imap/v2 v2.0.0-beta.5 - github.com/emersion/go-message v0.18.1 github.com/go-kivik/kivik/v4 v4.4.0 ) require ( + github.com/emersion/go-message v0.18.1 // indirect github.com/emersion/go-sasl v0.0.0-20231106173351-e73c9f7bad43 // indirect github.com/google/uuid v1.6.0 // indirect golang.org/x/net v0.25.0 // indirect diff --git a/go/mail/imap.go b/go/mail/imap.go index 6ba4453..63c7d7e 100644 --- a/go/mail/imap.go +++ b/go/mail/imap.go @@ -104,7 +104,7 @@ func (c *ImapClient) GetMessages(mailbox string, since *time.Time, maxMessages i // First, get all current UIDs in the mailbox for sync purposes allUIDsSet := imap.SeqSet{} allUIDsSet.AddRange(1, mbox.NumMessages) - + // Fetch UIDs for all messages to track current state uidCmd := c.Fetch(allUIDsSet, &imap.FetchOptions{UID: true}) for { @@ -112,12 +112,12 @@ func (c *ImapClient) GetMessages(mailbox string, since *time.Time, maxMessages i if msg == nil { break } - + data, err := msg.Collect() if err != nil { continue } - + if data.UID != 0 { currentUIDs[uint32(data.UID)] = true } @@ -126,13 +126,13 @@ func (c *ImapClient) GetMessages(mailbox string, since *time.Time, maxMessages i // Determine which messages to fetch based on since date var seqSet imap.SeqSet - + if since != nil { // Use IMAP SEARCH to find messages since the specified date searchCriteria := &imap.SearchCriteria{ Since: *since, } - + searchCmd := c.Search(searchCriteria, nil) searchResults, err := searchCmd.Wait() if err != nil { @@ -149,12 +149,12 @@ func (c *ImapClient) GetMessages(mailbox string, since *time.Time, maxMessages i if len(searchSeqNums) == 0 { return []*Message{}, currentUIDs, nil } - + // Limit results if maxMessages is specified if maxMessages > 0 && len(searchSeqNums) > maxMessages { searchSeqNums = searchSeqNums[len(searchSeqNums)-maxMessages:] } - + for _, seqNum := range searchSeqNums { seqSet.AddNum(seqNum) } @@ -165,11 +165,11 @@ func (c *ImapClient) GetMessages(mailbox string, since *time.Time, maxMessages i if maxMessages > 0 && int(numToFetch) > maxMessages { numToFetch = uint32(maxMessages) } - + if numToFetch == 0 { return []*Message{}, currentUIDs, nil } - + // Fetch the most recent messages seqSet.AddRange(mbox.NumMessages-numToFetch+1, mbox.NumMessages) } @@ -177,12 +177,12 @@ func (c *ImapClient) GetMessages(mailbox string, since *time.Time, maxMessages i // Fetch message data - get envelope and full message body options := &imap.FetchOptions{ Envelope: true, - UID: true, + UID: true, BodySection: []*imap.FetchItemBodySection{ {}, // Empty section gets the entire message }, } - + fetchCmd := c.Fetch(seqSet, options) for { @@ -196,12 +196,12 @@ func (c *ImapClient) GetMessages(mailbox string, since *time.Time, maxMessages i log.Printf("Failed to parse message: %v", err) continue } - + // Apply message-level keyword filtering if messageFilter != nil && !c.ShouldProcessMessage(parsedMsg, messageFilter) { continue // Skip this message due to keyword filter } - + messages = append(messages, parsedMsg) } @@ -231,7 +231,7 @@ func (c *ImapClient) parseMessage(fetchMsg *imapclient.FetchMessageData) (*Messa env := buffer.Envelope msg.Subject = env.Subject msg.Date = env.Date - + // Parse From addresses for _, addr := range env.From { if addr.Mailbox != "" { @@ -242,7 +242,7 @@ func (c *ImapClient) parseMessage(fetchMsg *imapclient.FetchMessageData) (*Messa msg.From = append(msg.From, fullAddr) } } - + // Parse To addresses for _, addr := range env.To { if addr.Mailbox != "" { @@ -264,7 +264,7 @@ func (c *ImapClient) parseMessage(fetchMsg *imapclient.FetchMessageData) (*Messa if len(buffer.BodySection) > 0 { bodyBuffer := buffer.BodySection[0] reader := bytes.NewReader(bodyBuffer.Bytes) - + // Parse the message using go-message entity, err := message.Read(reader) if err != nil { @@ -338,7 +338,7 @@ func (c *ImapClient) parseMessagePart(entity *message.Entity, msg *Message) erro disposition, dispositionParams, _ := entity.Header.ContentDisposition() // Determine if this is an attachment - isAttachment := disposition == "attachment" || + isAttachment := disposition == "attachment" || (disposition == "inline" && dispositionParams["filename"] != "") || params["name"] != "" diff --git a/go/mail2couch b/go/mail2couch new file mode 100755 index 0000000..2133741 Binary files /dev/null and b/go/mail2couch differ diff --git a/go/main.go b/go/main.go index 155b195..8d4b661 100644 --- a/go/main.go +++ b/go/main.go @@ -13,7 +13,7 @@ import ( func main() { args := config.ParseCommandLine() - + cfg, err := config.LoadConfigWithDiscovery(args) if err != nil { log.Fatalf("Failed to load configuration: %v", err) @@ -33,12 +33,12 @@ func main() { // Generate per-account database name dbName := couch.GenerateAccountDBName(source.Name, source.User) - + // Ensure the account-specific database exists ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second) err = couchClient.EnsureDB(ctx, dbName) cancel() - + if err != nil { log.Printf("Could not ensure CouchDB database '%s' exists (is it running?): %v", dbName, err) continue @@ -111,7 +111,7 @@ func processImapSource(source *config.MailSource, couchClient *couch.Client, dbN if syncMetadata != nil { // Use last sync time for incremental sync sinceDate = &syncMetadata.LastSyncTime - fmt.Printf(" Incremental sync since: %s (last synced %d messages)\n", + fmt.Printf(" Incremental sync since: %s (last synced %d messages)\n", sinceDate.Format("2006-01-02 15:04:05"), syncMetadata.MessageCount) } else { // First sync - use config since date if available diff --git a/rust/Cargo.toml b/rust/Cargo.toml deleted file mode 100644 index 87fcab2..0000000 --- a/rust/Cargo.toml +++ /dev/null @@ -1,52 +0,0 @@ -[package] -name = "mail2couch" -version = "0.1.0" -edition = "2021" -description = "A powerful email backup utility that synchronizes mail from IMAP accounts to CouchDB" -license = "MIT" -repository = "https://github.com/yourusername/mail2couch" -keywords = ["email", "backup", "imap", "couchdb", "sync"] -categories = ["email", "database"] - -[dependencies] -# Serialization -serde = { version = "1.0", features = ["derive"] } -serde_json = "1.0" - -# Date/time handling -chrono = { version = "0.4", features = ["serde"] } - -# HTTP client for CouchDB -reqwest = { version = "0.11", features = ["json"] } - -# Async runtime -tokio = { version = "1.0", features = ["full"] } - -# Error handling -thiserror = "1.0" -anyhow = "1.0" - -# Configuration -config = "0.13" - -# IMAP client (when implementing IMAP functionality) -# async-imap = "0.9" # Commented out for now due to compatibility issues - -# Logging -log = "0.4" -env_logger = "0.10" - -# CLI argument parsing -clap = { version = "4.0", features = ["derive"] } - -[dev-dependencies] -# Testing utilities -tokio-test = "0.4" - -[lib] -name = "mail2couch" -path = "src/lib.rs" - -[[bin]] -name = "mail2couch" -path = "src/main.rs" \ No newline at end of file diff --git a/rust/README.md b/rust/README.md deleted file mode 100644 index 4265bcd..0000000 --- a/rust/README.md +++ /dev/null @@ -1,111 +0,0 @@ -# Mail2Couch Rust Implementation - -This directory contains the Rust implementation of mail2couch, which will provide the same functionality as the Go implementation while maintaining full compatibility with the CouchDB document schemas. - -## Current Status - -🚧 **Work in Progress** - The Rust implementation is planned for future development. - -Currently available: -- ✅ **CouchDB Schema Definitions**: Complete Rust structs that match the Go implementation -- ✅ **Serialization Support**: Full serde integration for JSON handling -- ✅ **Type Safety**: Strongly typed structures for all CouchDB documents -- ✅ **Compatibility Tests**: Validated against example documents -- ✅ **Database Naming**: Same database naming logic as Go implementation - -## Schema Compatibility - -The Rust implementation uses the same CouchDB document schemas as the Go implementation: - -### Mail Documents -```rust -use mail2couch::{MailDocument, generate_database_name}; - -let mut doc = MailDocument::new( - "123".to_string(), // IMAP UID - "INBOX".to_string(), // Mailbox - vec!["sender@example.com".to_string()], // From - vec!["recipient@example.com".to_string()], // To - "Subject".to_string(), // Subject - Utc::now(), // Date - "Body content".to_string(), // Body - HashMap::new(), // Headers - false, // Has attachments -); - -doc.set_id(); // Sets ID to "INBOX_123" -``` - -### Sync Metadata -```rust -use mail2couch::SyncMetadata; - -let metadata = SyncMetadata::new( - "INBOX".to_string(), // Mailbox - Utc::now(), // Last sync time - 456, // Last message UID - 100, // Message count -); -// ID automatically set to "sync_metadata_INBOX" -``` - -### Database Naming -```rust -use mail2couch::generate_database_name; - -let db_name = generate_database_name("Personal Gmail", ""); -// Returns: "m2c_personal_gmail" - -let db_name = generate_database_name("", "user@example.com"); -// Returns: "m2c_user_example_com" -``` - -## Dependencies - -The Rust implementation uses these key dependencies: - -- **serde**: JSON serialization/deserialization -- **chrono**: Date/time handling with ISO8601 support -- **reqwest**: HTTP client for CouchDB API -- **tokio**: Async runtime -- **anyhow/thiserror**: Error handling - -## Testing - -Run the schema compatibility tests: - -```bash -cargo test -``` - -All tests validate that the Rust structures produce JSON compatible with the Go implementation and documented schemas. - -## Future Implementation - -The planned Rust implementation will include: - -- **IMAP Client**: Connect to mail servers and retrieve messages -- **CouchDB Integration**: Store documents using native Rust CouchDB client -- **Configuration**: Same JSON config format as Go implementation -- **CLI Interface**: Compatible command-line interface -- **Performance**: Leveraging Rust's performance characteristics -- **Memory Safety**: Rust's ownership model for reliable operation - -## Schema Documentation - -See the following files for complete schema documentation: - -- [`../couchdb-schemas.md`](../couchdb-schemas.md): Complete schema specification -- [`../examples/`](../examples/): JSON example documents -- [`src/schemas.rs`](src/schemas.rs): Rust type definitions - -## Cross-Implementation Compatibility - -Both Go and Rust implementations: -- Use identical CouchDB document schemas -- Generate the same database names -- Store documents with the same field names and types -- Support incremental sync with compatible metadata -- Handle attachments using CouchDB native attachment storage - -This ensures that databases created by either implementation can be used interchangeably. \ No newline at end of file diff --git a/rust/src/lib.rs b/rust/src/lib.rs deleted file mode 100644 index 6be3a6f..0000000 --- a/rust/src/lib.rs +++ /dev/null @@ -1,20 +0,0 @@ -//! # mail2couch -//! -//! A powerful email backup utility that synchronizes mail from IMAP accounts to CouchDB. -//! -//! This library provides the core functionality for: -//! - Connecting to IMAP servers -//! - Retrieving email messages and attachments -//! - Storing emails in CouchDB with proper document structures -//! - Incremental synchronization to avoid re-processing messages -//! - Filtering by folders, dates, and keywords -//! -//! ## Document Schemas -//! -//! The library uses well-defined CouchDB document schemas that are compatible -//! with the Go implementation. See the `schemas` module for details. - -pub mod schemas; - -// Re-export main types for convenience -pub use schemas::{MailDocument, SyncMetadata, AttachmentStub, generate_database_name}; \ No newline at end of file diff --git a/rust/src/main.rs b/rust/src/main.rs deleted file mode 100644 index db9d28f..0000000 --- a/rust/src/main.rs +++ /dev/null @@ -1,7 +0,0 @@ -// Placeholder main.rs for Rust implementation -// This will be implemented in the future - -fn main() { - println!("mail2couch Rust implementation - Coming Soon!"); - println!("See the Go implementation in ../go/ for current functionality."); -} \ No newline at end of file diff --git a/rust/src/schemas.rs b/rust/src/schemas.rs deleted file mode 100644 index 5f75145..0000000 --- a/rust/src/schemas.rs +++ /dev/null @@ -1,266 +0,0 @@ -// CouchDB document schemas for mail2couch -// This file defines the Rust structures that correspond to the CouchDB document schemas -// defined in couchdb-schemas.md - -use chrono::{DateTime, Utc}; -use serde::{Deserialize, Serialize}; -use std::collections::HashMap; - -/// Represents an email message stored in CouchDB -/// Document ID format: {mailbox}_{uid} (e.g., "INBOX_123") -/// Document type: "mail" -#[derive(Debug, Clone, Serialize, Deserialize)] -pub struct MailDocument { - /// CouchDB document ID - #[serde(rename = "_id")] - #[serde(skip_serializing_if = "Option::is_none")] - pub id: Option, - - /// CouchDB revision (managed by CouchDB) - #[serde(rename = "_rev")] - #[serde(skip_serializing_if = "Option::is_none")] - pub rev: Option, - - /// CouchDB native attachments for email attachments - #[serde(rename = "_attachments")] - #[serde(skip_serializing_if = "Option::is_none")] - pub attachments: Option>, - - /// Original IMAP UID from mail server - #[serde(rename = "sourceUid")] - pub source_uid: String, - - /// Source mailbox name (e.g., "INBOX", "Sent") - pub mailbox: String, - - /// Sender email addresses - pub from: Vec, - - /// Recipient email addresses - pub to: Vec, - - /// Email subject line - pub subject: String, - - /// Email date from headers (ISO8601 format) - pub date: DateTime, - - /// Email body content (plain text) - pub body: String, - - /// All email headers as key-value pairs - pub headers: HashMap>, - - /// When document was stored in CouchDB (ISO8601 format) - #[serde(rename = "storedAt")] - pub stored_at: DateTime, - - /// Document type identifier (always "mail") - #[serde(rename = "docType")] - pub doc_type: String, - - /// Whether email has attachments - #[serde(rename = "hasAttachments")] - pub has_attachments: bool, -} - -/// Metadata for CouchDB native attachments -#[derive(Debug, Clone, Serialize, Deserialize)] -pub struct AttachmentStub { - /// MIME type of attachment - #[serde(rename = "content_type")] - pub content_type: String, - - /// Size in bytes (optional) - #[serde(skip_serializing_if = "Option::is_none")] - pub length: Option, - - /// Indicates attachment is stored separately (optional) - #[serde(skip_serializing_if = "Option::is_none")] - pub stub: Option, -} - -/// Sync state information for incremental syncing -/// Document ID format: sync_metadata_{mailbox} (e.g., "sync_metadata_INBOX") -/// Document type: "sync_metadata" -#[derive(Debug, Clone, Serialize, Deserialize)] -pub struct SyncMetadata { - /// CouchDB document ID - #[serde(rename = "_id")] - #[serde(skip_serializing_if = "Option::is_none")] - pub id: Option, - - /// CouchDB revision (managed by CouchDB) - #[serde(rename = "_rev")] - #[serde(skip_serializing_if = "Option::is_none")] - pub rev: Option, - - /// Document type identifier (always "sync_metadata") - #[serde(rename = "docType")] - pub doc_type: String, - - /// Mailbox name this metadata applies to - pub mailbox: String, - - /// When this mailbox was last synced (ISO8601 format) - #[serde(rename = "lastSyncTime")] - pub last_sync_time: DateTime, - - /// Highest IMAP UID processed in last sync - #[serde(rename = "lastMessageUID")] - pub last_message_uid: u32, - - /// Number of messages processed in last sync - #[serde(rename = "messageCount")] - pub message_count: u32, - - /// When this metadata was last updated (ISO8601 format) - #[serde(rename = "updatedAt")] - pub updated_at: DateTime, -} - -impl MailDocument { - /// Create a new MailDocument with required fields - pub fn new( - source_uid: String, - mailbox: String, - from: Vec, - to: Vec, - subject: String, - date: DateTime, - body: String, - headers: HashMap>, - has_attachments: bool, - ) -> Self { - let now = Utc::now(); - Self { - id: None, // Will be set when storing to CouchDB - rev: None, // Managed by CouchDB - attachments: None, - source_uid, - mailbox, - from, - to, - subject, - date, - body, - headers, - stored_at: now, - doc_type: "mail".to_string(), - has_attachments, - } - } - - /// Generate document ID based on mailbox and UID - pub fn generate_id(&self) -> String { - format!("{}_{}", self.mailbox, self.source_uid) - } - - /// Set the document ID - pub fn set_id(&mut self) { - self.id = Some(self.generate_id()); - } -} - -impl SyncMetadata { - /// Create new sync metadata for a mailbox - pub fn new( - mailbox: String, - last_sync_time: DateTime, - last_message_uid: u32, - message_count: u32, - ) -> Self { - let now = Utc::now(); - Self { - id: Some(format!("sync_metadata_{}", mailbox)), - rev: None, // Managed by CouchDB - doc_type: "sync_metadata".to_string(), - mailbox, - last_sync_time, - last_message_uid, - message_count, - updated_at: now, - } - } -} - -/// Generate CouchDB database name from account information -/// Format: m2c_{account_name} -/// Rules: lowercase, replace invalid chars with underscores, ensure starts with letter -pub fn generate_database_name(account_name: &str, user_email: &str) -> String { - let name = if account_name.is_empty() { - user_email - } else { - account_name - }; - - // Convert to lowercase and replace invalid characters - let mut valid_name = name - .to_lowercase() - .chars() - .map(|c| { - if c.is_ascii_alphanumeric() || c == '_' || c == '$' || c == '(' || c == ')' || c == '+' || c == '-' || c == '/' { - c - } else { - '_' - } - }) - .collect::(); - - // Ensure starts with a letter - if valid_name.is_empty() || !valid_name.chars().next().unwrap().is_ascii_lowercase() { - valid_name = format!("m2c_mail_{}", valid_name); - } else { - valid_name = format!("m2c_{}", valid_name); - } - - valid_name -} - -#[cfg(test)] -mod tests { - use super::*; - - #[test] - fn test_generate_database_name() { - assert_eq!(generate_database_name("Personal Gmail", ""), "m2c_personal_gmail"); - assert_eq!(generate_database_name("", "user@example.com"), "m2c_user_example_com"); - assert_eq!(generate_database_name("123work", ""), "m2c_mail_123work"); - } - - #[test] - fn test_mail_document_id_generation() { - let mut doc = MailDocument::new( - "123".to_string(), - "INBOX".to_string(), - vec!["sender@example.com".to_string()], - vec!["recipient@example.com".to_string()], - "Test Subject".to_string(), - Utc::now(), - "Test body".to_string(), - HashMap::new(), - false, - ); - - assert_eq!(doc.generate_id(), "INBOX_123"); - - doc.set_id(); - assert_eq!(doc.id, Some("INBOX_123".to_string())); - } - - #[test] - fn test_sync_metadata_creation() { - let metadata = SyncMetadata::new( - "INBOX".to_string(), - Utc::now(), - 456, - 100, - ); - - assert_eq!(metadata.id, Some("sync_metadata_INBOX".to_string())); - assert_eq!(metadata.doc_type, "sync_metadata"); - assert_eq!(metadata.mailbox, "INBOX"); - assert_eq!(metadata.last_message_uid, 456); - assert_eq!(metadata.message_count, 100); - } -} \ No newline at end of file diff --git a/scripts/validate-schemas.py b/scripts/validate-schemas.py deleted file mode 100755 index c3f037f..0000000 --- a/scripts/validate-schemas.py +++ /dev/null @@ -1,169 +0,0 @@ -#!/usr/bin/env python3 -""" -Schema Validation Script for mail2couch - -This script validates that the CouchDB document schemas are consistent -between the Go implementation and the documented JSON examples. -""" - -import json -import sys -from pathlib import Path - -def load_json_file(file_path): - """Load and parse a JSON file.""" - try: - with open(file_path, 'r') as f: - return json.load(f) - except FileNotFoundError: - print(f"ERROR: File not found: {file_path}") - return None - except json.JSONDecodeError as e: - print(f"ERROR: Invalid JSON in {file_path}: {e}") - return None - -def validate_mail_document(doc, filename): - """Validate a mail document structure.""" - required_fields = [ - '_id', 'sourceUid', 'mailbox', 'from', 'to', 'subject', - 'date', 'body', 'headers', 'storedAt', 'docType', 'hasAttachments' - ] - - errors = [] - - # Check required fields - for field in required_fields: - if field not in doc: - errors.append(f"Missing required field: {field}") - - # Check field types - if 'docType' in doc and doc['docType'] != 'mail': - errors.append(f"Invalid docType: expected 'mail', got '{doc['docType']}'") - - if 'from' in doc and not isinstance(doc['from'], list): - errors.append("Field 'from' must be an array") - - if 'to' in doc and not isinstance(doc['to'], list): - errors.append("Field 'to' must be an array") - - if 'headers' in doc and not isinstance(doc['headers'], dict): - errors.append("Field 'headers' must be an object") - - if 'hasAttachments' in doc and not isinstance(doc['hasAttachments'], bool): - errors.append("Field 'hasAttachments' must be a boolean") - - # Check _id format - if '_id' in doc: - doc_id = doc['_id'] - if '_' not in doc_id: - errors.append(f"Document ID '{doc_id}' should follow format 'mailbox_uid'") - - # Validate attachments if present - if '_attachments' in doc: - if not isinstance(doc['_attachments'], dict): - errors.append("Field '_attachments' must be an object") - else: - for filename, stub in doc['_attachments'].items(): - if 'content_type' not in stub: - errors.append(f"Attachment '{filename}' missing content_type") - - if errors: - print(f"ERRORS in {filename}:") - for error in errors: - print(f" - {error}") - return False - else: - print(f"✓ {filename}: Valid mail document") - return True - -def validate_sync_metadata(doc, filename): - """Validate a sync metadata document structure.""" - required_fields = [ - '_id', 'docType', 'mailbox', 'lastSyncTime', - 'lastMessageUID', 'messageCount', 'updatedAt' - ] - - errors = [] - - # Check required fields - for field in required_fields: - if field not in doc: - errors.append(f"Missing required field: {field}") - - # Check field types - if 'docType' in doc and doc['docType'] != 'sync_metadata': - errors.append(f"Invalid docType: expected 'sync_metadata', got '{doc['docType']}'") - - if 'lastMessageUID' in doc and not isinstance(doc['lastMessageUID'], int): - errors.append("Field 'lastMessageUID' must be an integer") - - if 'messageCount' in doc and not isinstance(doc['messageCount'], int): - errors.append("Field 'messageCount' must be an integer") - - # Check _id format - if '_id' in doc: - doc_id = doc['_id'] - if not doc_id.startswith('sync_metadata_'): - errors.append(f"Document ID '{doc_id}' should start with 'sync_metadata_'") - - if errors: - print(f"ERRORS in {filename}:") - for error in errors: - print(f" - {error}") - return False - else: - print(f"✓ {filename}: Valid sync metadata document") - return True - -def main(): - """Main validation function.""" - script_dir = Path(__file__).parent - project_root = script_dir.parent - examples_dir = project_root / "examples" - - print("Validating CouchDB document schemas...") - print("=" * 50) - - all_valid = True - - # Validate mail documents - mail_files = [ - "sample-mail-document.json", - "simple-mail-document.json" - ] - - for filename in mail_files: - file_path = examples_dir / filename - doc = load_json_file(file_path) - if doc is None: - all_valid = False - continue - - if not validate_mail_document(doc, filename): - all_valid = False - - # Validate sync metadata - sync_files = [ - "sample-sync-metadata.json" - ] - - for filename in sync_files: - file_path = examples_dir / filename - doc = load_json_file(file_path) - if doc is None: - all_valid = False - continue - - if not validate_sync_metadata(doc, filename): - all_valid = False - - print("=" * 50) - if all_valid: - print("✓ All schemas are valid!") - sys.exit(0) - else: - print("✗ Schema validation failed!") - sys.exit(1) - -if __name__ == "__main__": - main() \ No newline at end of file diff --git a/test/README.md b/test/README.md index ec83430..bd4c180 100644 --- a/test/README.md +++ b/test/README.md @@ -8,7 +8,6 @@ The test environment provides: - **CouchDB**: Database for storing email messages - **GreenMail IMAP Server**: Java-based mail server designed for testing with pre-populated test accounts and messages - **Test Configuration**: Ready-to-use config for testing both sync and archive modes -- **HTML Webmail Interface**: Beautiful, responsive web interface for viewing archived emails ## Quick Start @@ -115,20 +114,6 @@ Each account contains: - **SMTP Port**: 3025 - **Server**: GreenMail (Java-based test server) -### Accessing Test Data -After running mail2couch, you can access the stored emails via CouchDB's REST API: - -**📋 Database Access:** -- All databases: http://localhost:5984/_all_dbs -- Specific database: http://localhost:5984/m2c_specific_folders_only -- All documents: http://localhost:5984/m2c_specific_folders_only/_all_docs -- Individual message: http://localhost:5984/m2c_specific_folders_only/INBOX_12 - -**🔍 Raw Data Examples:** -- Database info: http://localhost:5984/m2c_specific_folders_only -- Document content: http://localhost:5984/m2c_specific_folders_only/INBOX_1 -- Email attachments: http://localhost:5984/m2c_specific_folders_only/INBOX_1/{attachment_name} - ## Database Structure mail2couch will create separate databases for each mail source (with `m2c_` prefix): @@ -141,7 +126,6 @@ Each database contains documents with: - `mailbox` field indicating the origin folder - Native CouchDB attachments for email attachments - Full message headers and body content -- JSON documents accessible via CouchDB REST API ## Testing Sync vs Archive Modes diff --git a/test/start-test-env.sh b/test/start-test-env.sh index dfb9b59..2e6a82c 100755 --- a/test/start-test-env.sh +++ b/test/start-test-env.sh @@ -63,18 +63,5 @@ echo "" echo "To run mail2couch:" echo " cd ../go && ./mail2couch -config ../test/config-test.json" echo "" -echo "📧 MAIL2COUCH DATABASE ACCESS:" -echo "After running mail2couch, you can access the stored emails via CouchDB:" -echo "" -echo "📋 Database Access (examples after sync):" -echo " - All databases: http://localhost:5984/_all_dbs" -echo " - Specific database: http://localhost:5984/m2c_specific_folders_only" -echo " - All documents: http://localhost:5984/m2c_specific_folders_only/_all_docs" -echo " - Individual message: http://localhost:5984/m2c_specific_folders_only/INBOX_12" -echo "" -echo "🔍 Raw Data Access:" -echo " - Database info: http://localhost:5984/m2c_specific_folders_only" -echo " - Document with content: http://localhost:5984/m2c_specific_folders_only/INBOX_12" -echo "" echo "To stop the environment:" echo " ./stop-test-env.sh" \ No newline at end of file diff --git a/test/test-incremental-sync.sh b/test/test-incremental-sync.sh index 45e0005..bc35016 100755 --- a/test/test-incremental-sync.sh +++ b/test/test-incremental-sync.sh @@ -81,17 +81,7 @@ add_new_messages() { import imaplib import time -import sys -import os - -# Add the test directory to Python path to enable imports -sys.path.insert(0, os.path.dirname(os.path.abspath(__file__))) - -import importlib.util -spec = importlib.util.spec_from_file_location("populate_greenmail", "populate-greenmail.py") -populate_greenmail = importlib.util.module_from_spec(spec) -spec.loader.exec_module(populate_greenmail) -create_simple_message = populate_greenmail.create_simple_message +from test.populate_greenmail import create_simple_message def add_new_messages(): """Add new messages to test incremental sync"""