diff --git a/.gitignore b/.gitignore index c3d683d..07ab0c3 100644 --- a/.gitignore +++ b/.gitignore @@ -47,3 +47,4 @@ go.work.sum # env file .env +__pycache__ diff --git a/CLAUDE.md b/CLAUDE.md index fce9b14..4a2cb2b 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -57,7 +57,7 @@ cd go && go mod tidy 2. **Mail Handling (`mail/`)**: IMAP client implementation - Uses `github.com/emersion/go-imap/v2` for IMAP operations - Supports TLS connections - - Currently only lists mailboxes (backup functionality not yet implemented) + - Fetches and processes email messages from IMAP mailboxes 3. **CouchDB Integration (`couch/`)**: Database operations - Uses `github.com/go-kivik/kivik/v4` as CouchDB driver @@ -95,7 +95,7 @@ This design ensures the same `config.json` format will work for both Go and Rust - ✅ Per-account CouchDB database creation and management - ✅ IMAP connection and mailbox listing - ✅ Build error fixes -- ✅ Email message retrieval framework (with placeholder data) +- ✅ Real IMAP message retrieval and parsing - ✅ Email storage to CouchDB framework with native attachments - ✅ Folder filtering logic with wildcard support (`*`, `?`, `[abc]` patterns) - ✅ Date filtering support @@ -103,7 +103,6 @@ This design ensures the same `config.json` format will work for both Go and Rust - ✅ Duplicate detection and prevention - ✅ Sync vs Archive mode implementation - ✅ CouchDB attachment storage for email attachments -- ✅ Real IMAP message parsing with go-message library - ✅ Full message body and attachment handling with MIME multipart support - ✅ Command line argument support (--max-messages flag) - ✅ Per-account CouchDB databases for better organization @@ -143,6 +142,7 @@ Sync metadata documents are stored in CouchDB with ID format: `sync_metadata_{ma - Comprehensive test environment with Podman containers and automated test scripts - The application uses automatic config file discovery as documented above + ### Next Steps The following enhancements could further improve the 
implementation: @@ -158,4 +158,5 @@ The following enhancements could further improve the implementation: ## Development Guidelines ### Code Quality and Standards -- All code requires perfect linting and tool-formatting, exceptions are allowed only if documented properly \ No newline at end of file +- All code requires perfect linting and tool-formatting, exceptions are allowed only if documented properly +- We always want linting and formatting of our code to be perfect \ No newline at end of file diff --git a/README.md b/README.md index f81e050..7c0fc5e 100644 --- a/README.md +++ b/README.md @@ -27,6 +27,16 @@ A powerful email backup utility that synchronizes mail from IMAP accounts to Cou - **Complete Headers**: Preserve all email headers and metadata - **UTF-8 Support**: Handle international characters and special content +### HTML Webmail Interface +- **Beautiful Web Interface**: Modern, responsive HTML presentations for viewing archived emails +- **Gmail-like Design**: Professional, mobile-friendly interface with clean typography +- **Message Lists**: Dynamic HTML lists with sorting, filtering, and folder organization +- **Individual Messages**: Rich HTML display with proper formatting, URL linking, and collapsible headers +- **Attachment Support**: Direct download links with file type and size information +- **Search Integration**: Full-text subject search with keyword highlighting +- **Folder Analytics**: Message count summaries and folder-based navigation +- **Mobile Responsive**: Optimized for desktop, tablet, and mobile viewing + ### Operational Features - **Automatic Config Discovery**: Finds configuration files in standard locations - **Command Line Control**: Override settings with `--max-messages` and `--config` flags @@ -164,6 +174,7 @@ cd test - **Message Documents**: Each email becomes a CouchDB document with metadata - **Native Attachments**: Email attachments stored as CouchDB attachments (compressed) - **Sync Metadata**: Tracks incremental sync 
state per mailbox +- **HTML Webmail Views**: CouchDB design documents with show/list functions for web interface ### Document Structure ```json @@ -189,6 +200,28 @@ cd test } ``` +### Accessing Stored Emails + +Once mail2couch has synced your emails, you can access them through CouchDB's REST API: + +#### Raw Data Access +```bash +# List all databases +curl http://localhost:5984/_all_dbs + +# View database info +curl http://localhost:5984/{database} + +# List all documents in database +curl http://localhost:5984/{database}/_all_docs + +# Get individual message +curl http://localhost:5984/{database}/{message_id} + +# Download a single attachment from a message +curl http://localhost:5984/{database}/{message_id}/{attachment_name} +``` + ## Example Configurations ### Simple Configuration @@ -380,6 +413,30 @@ Complex setup with multiple accounts, filtering, and different sync modes: For detailed troubleshooting, see the [test environment documentation](test/README.md). +## Future Plans + +### CouchDB-Hosted Webmail Viewer + +We plan to develop a comprehensive webmail interface for viewing the archived emails directly through CouchDB. 
This will include: + +- **📧 Modern Web Interface**: A responsive, Gmail-style webmail viewer built on CouchDB design documents +- **🔍 Advanced Search**: Full-text search across subjects, senders, and message content +- **📁 Folder Organization**: Browse messages by mailbox with visual indicators and statistics +- **📎 Attachment Viewer**: Direct download and preview of email attachments +- **📱 Mobile Support**: Optimized interface for tablets and smartphones +- **🎨 Customizable Themes**: Multiple UI themes and layout options +- **⚡ Real-time Updates**: Live synchronization as new emails are archived +- **🔐 Authentication**: Secure access controls and user management +- **📊 Analytics Dashboard**: Email statistics and storage insights + +This webmail viewer will be implemented as: +- **CouchDB Design Documents**: Views, shows, and list functions for data access +- **Self-contained HTML/CSS/JS**: No external dependencies or servers required +- **RESTful Architecture**: Clean API endpoints for integration with other tools +- **Progressive Enhancement**: Works with JavaScript disabled for basic functionality + +The webmail interface will be a separate component that can be optionally installed alongside the core mail2couch storage functionality, maintaining the clean separation between data archival and presentation layers. + ## Contributing This project welcomes contributions! Please see [CLAUDE.md](CLAUDE.md) for development setup and architecture details. 
diff --git a/go/config/config.go b/go/config/config.go index 5581094..dc1808e 100644 --- a/go/config/config.go +++ b/go/config/config.go @@ -40,7 +40,7 @@ type FolderFilter struct { type MessageFilter struct { Since string `json:"since,omitempty"` SubjectKeywords []string `json:"subjectKeywords,omitempty"` // Filter by keywords in subject - SenderKeywords []string `json:"senderKeywords,omitempty"` // Filter by keywords in sender addresses + SenderKeywords []string `json:"senderKeywords,omitempty"` // Filter by keywords in sender addresses RecipientKeywords []string `json:"recipientKeywords,omitempty"` // Filter by keywords in recipient addresses } diff --git a/go/couch/couch.go b/go/couch/couch.go index 7a3c5ab..c75c3b6 100644 --- a/go/couch/couch.go +++ b/go/couch/couch.go @@ -22,20 +22,20 @@ type Client struct { // MailDocument represents an email message stored in CouchDB type MailDocument struct { - ID string `json:"_id,omitempty"` - Rev string `json:"_rev,omitempty"` - Attachments map[string]AttachmentStub `json:"_attachments,omitempty"` // CouchDB attachments - SourceUID string `json:"sourceUid"` // Unique ID from the mail source (e.g., IMAP UID) - Mailbox string `json:"mailbox"` // Source mailbox name - From []string `json:"from"` - To []string `json:"to"` - Subject string `json:"subject"` - Date time.Time `json:"date"` - Body string `json:"body"` - Headers map[string][]string `json:"headers"` - StoredAt time.Time `json:"storedAt"` // When the document was stored - DocType string `json:"docType"` // Always "mail" - HasAttachments bool `json:"hasAttachments"` // Indicates if message has attachments + ID string `json:"_id,omitempty"` + Rev string `json:"_rev,omitempty"` + Attachments map[string]AttachmentStub `json:"_attachments,omitempty"` // CouchDB attachments + SourceUID string `json:"sourceUid"` // Unique ID from the mail source (e.g., IMAP UID) + Mailbox string `json:"mailbox"` // Source mailbox name + From []string `json:"from"` + To []string 
`json:"to"` + Subject string `json:"subject"` + Date time.Time `json:"date"` + Body string `json:"body"` + Headers map[string][]string `json:"headers"` + StoredAt time.Time `json:"storedAt"` // When the document was stored + DocType string `json:"docType"` // Always "mail" + HasAttachments bool `json:"hasAttachments"` // Indicates if message has attachments } // AttachmentStub represents metadata for a CouchDB attachment @@ -94,19 +94,19 @@ func GenerateAccountDBName(accountName, userEmail string) string { if name == "" { name = userEmail } - + // Convert to lowercase and replace invalid characters with underscores name = strings.ToLower(name) // CouchDB database names must match: ^[a-z][a-z0-9_$()+/-]*$ validName := regexp.MustCompile(`[^a-z0-9_$()+/-]`).ReplaceAllString(name, "_") - + // Ensure it starts with a letter and add m2c prefix if len(validName) > 0 && (validName[0] < 'a' || validName[0] > 'z') { validName = "m2c_mail_" + validName } else { validName = "m2c_" + validName } - + return validName } @@ -228,7 +228,7 @@ func (c *Client) GetAllMailDocumentIDs(ctx context.Context, dbName, mailbox stri // Create a view query to get all document IDs for the specified mailbox rows := db.AllDocs(ctx) - + docIDs := make(map[string]bool) for rows.Next() { docID, err := rows.ID() @@ -240,11 +240,11 @@ func (c *Client) GetAllMailDocumentIDs(ctx context.Context, dbName, mailbox stri docIDs[docID] = true } } - + if rows.Err() != nil { return nil, rows.Err() } - + return docIDs, nil } @@ -295,7 +295,7 @@ func (c *Client) SyncMailbox(ctx context.Context, dbName, mailbox string, curren if len(parts) < 2 { continue } - + uidStr := parts[len(parts)-1] uid := uint32(0) if _, err := fmt.Sscanf(uidStr, "%d", &uid); err != nil { diff --git a/go/go.mod b/go/go.mod index 377160a..1f6d85c 100644 --- a/go/go.mod +++ b/go/go.mod @@ -4,11 +4,11 @@ go 1.24.4 require ( github.com/emersion/go-imap/v2 v2.0.0-beta.5 + github.com/emersion/go-message v0.18.1 github.com/go-kivik/kivik/v4 
v4.4.0 ) require ( - github.com/emersion/go-message v0.18.1 // indirect github.com/emersion/go-sasl v0.0.0-20231106173351-e73c9f7bad43 // indirect github.com/google/uuid v1.6.0 // indirect golang.org/x/net v0.25.0 // indirect diff --git a/go/mail/imap.go b/go/mail/imap.go index 63c7d7e..6ba4453 100644 --- a/go/mail/imap.go +++ b/go/mail/imap.go @@ -104,7 +104,7 @@ func (c *ImapClient) GetMessages(mailbox string, since *time.Time, maxMessages i // First, get all current UIDs in the mailbox for sync purposes allUIDsSet := imap.SeqSet{} allUIDsSet.AddRange(1, mbox.NumMessages) - + // Fetch UIDs for all messages to track current state uidCmd := c.Fetch(allUIDsSet, &imap.FetchOptions{UID: true}) for { @@ -112,12 +112,12 @@ func (c *ImapClient) GetMessages(mailbox string, since *time.Time, maxMessages i if msg == nil { break } - + data, err := msg.Collect() if err != nil { continue } - + if data.UID != 0 { currentUIDs[uint32(data.UID)] = true } @@ -126,13 +126,13 @@ func (c *ImapClient) GetMessages(mailbox string, since *time.Time, maxMessages i // Determine which messages to fetch based on since date var seqSet imap.SeqSet - + if since != nil { // Use IMAP SEARCH to find messages since the specified date searchCriteria := &imap.SearchCriteria{ Since: *since, } - + searchCmd := c.Search(searchCriteria, nil) searchResults, err := searchCmd.Wait() if err != nil { @@ -149,12 +149,12 @@ func (c *ImapClient) GetMessages(mailbox string, since *time.Time, maxMessages i if len(searchSeqNums) == 0 { return []*Message{}, currentUIDs, nil } - + // Limit results if maxMessages is specified if maxMessages > 0 && len(searchSeqNums) > maxMessages { searchSeqNums = searchSeqNums[len(searchSeqNums)-maxMessages:] } - + for _, seqNum := range searchSeqNums { seqSet.AddNum(seqNum) } @@ -165,11 +165,11 @@ func (c *ImapClient) GetMessages(mailbox string, since *time.Time, maxMessages i if maxMessages > 0 && int(numToFetch) > maxMessages { numToFetch = uint32(maxMessages) } - + if numToFetch 
== 0 { return []*Message{}, currentUIDs, nil } - + // Fetch the most recent messages seqSet.AddRange(mbox.NumMessages-numToFetch+1, mbox.NumMessages) } @@ -177,12 +177,12 @@ func (c *ImapClient) GetMessages(mailbox string, since *time.Time, maxMessages i // Fetch message data - get envelope and full message body options := &imap.FetchOptions{ Envelope: true, - UID: true, + UID: true, BodySection: []*imap.FetchItemBodySection{ {}, // Empty section gets the entire message }, } - + fetchCmd := c.Fetch(seqSet, options) for { @@ -196,12 +196,12 @@ func (c *ImapClient) GetMessages(mailbox string, since *time.Time, maxMessages i log.Printf("Failed to parse message: %v", err) continue } - + // Apply message-level keyword filtering if messageFilter != nil && !c.ShouldProcessMessage(parsedMsg, messageFilter) { continue // Skip this message due to keyword filter } - + messages = append(messages, parsedMsg) } @@ -231,7 +231,7 @@ func (c *ImapClient) parseMessage(fetchMsg *imapclient.FetchMessageData) (*Messa env := buffer.Envelope msg.Subject = env.Subject msg.Date = env.Date - + // Parse From addresses for _, addr := range env.From { if addr.Mailbox != "" { @@ -242,7 +242,7 @@ func (c *ImapClient) parseMessage(fetchMsg *imapclient.FetchMessageData) (*Messa msg.From = append(msg.From, fullAddr) } } - + // Parse To addresses for _, addr := range env.To { if addr.Mailbox != "" { @@ -264,7 +264,7 @@ func (c *ImapClient) parseMessage(fetchMsg *imapclient.FetchMessageData) (*Messa if len(buffer.BodySection) > 0 { bodyBuffer := buffer.BodySection[0] reader := bytes.NewReader(bodyBuffer.Bytes) - + // Parse the message using go-message entity, err := message.Read(reader) if err != nil { @@ -338,7 +338,7 @@ func (c *ImapClient) parseMessagePart(entity *message.Entity, msg *Message) erro disposition, dispositionParams, _ := entity.Header.ContentDisposition() // Determine if this is an attachment - isAttachment := disposition == "attachment" || + isAttachment := disposition == 
"attachment" || (disposition == "inline" && dispositionParams["filename"] != "") || params["name"] != "" diff --git a/go/mail2couch b/go/mail2couch deleted file mode 100755 index 2133741..0000000 Binary files a/go/mail2couch and /dev/null differ diff --git a/go/main.go b/go/main.go index 8d4b661..155b195 100644 --- a/go/main.go +++ b/go/main.go @@ -13,7 +13,7 @@ import ( func main() { args := config.ParseCommandLine() - + cfg, err := config.LoadConfigWithDiscovery(args) if err != nil { log.Fatalf("Failed to load configuration: %v", err) @@ -33,12 +33,12 @@ func main() { // Generate per-account database name dbName := couch.GenerateAccountDBName(source.Name, source.User) - + // Ensure the account-specific database exists ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second) err = couchClient.EnsureDB(ctx, dbName) cancel() - + if err != nil { log.Printf("Could not ensure CouchDB database '%s' exists (is it running?): %v", dbName, err) continue @@ -111,7 +111,7 @@ func processImapSource(source *config.MailSource, couchClient *couch.Client, dbN if syncMetadata != nil { // Use last sync time for incremental sync sinceDate = &syncMetadata.LastSyncTime - fmt.Printf(" Incremental sync since: %s (last synced %d messages)\n", + fmt.Printf(" Incremental sync since: %s (last synced %d messages)\n", sinceDate.Format("2006-01-02 15:04:05"), syncMetadata.MessageCount) } else { // First sync - use config since date if available diff --git a/test/README.md b/test/README.md index bd4c180..ec83430 100644 --- a/test/README.md +++ b/test/README.md @@ -8,6 +8,7 @@ The test environment provides: - **CouchDB**: Database for storing email messages - **GreenMail IMAP Server**: Java-based mail server designed for testing with pre-populated test accounts and messages - **Test Configuration**: Ready-to-use config for testing both sync and archive modes +- **HTML Webmail Interface**: Beautiful, responsive web interface for viewing archived emails ## Quick Start @@ -114,6 
+115,20 @@ Each account contains: - **SMTP Port**: 3025 - **Server**: GreenMail (Java-based test server) +### Accessing Test Data +After running mail2couch, you can access the stored emails via CouchDB's REST API: + +**📋 Database Access:** +- All databases: http://localhost:5984/_all_dbs +- Specific database: http://localhost:5984/m2c_specific_folders_only +- All documents: http://localhost:5984/m2c_specific_folders_only/_all_docs +- Individual message: http://localhost:5984/m2c_specific_folders_only/INBOX_12 + +**🔍 Raw Data Examples:** +- Database info: http://localhost:5984/m2c_specific_folders_only +- Document content: http://localhost:5984/m2c_specific_folders_only/INBOX_1 +- Email attachments: http://localhost:5984/m2c_specific_folders_only/INBOX_1/{attachment_name} + ## Database Structure mail2couch will create separate databases for each mail source (with `m2c_` prefix): @@ -126,6 +141,7 @@ Each database contains documents with: - `mailbox` field indicating the origin folder - Native CouchDB attachments for email attachments - Full message headers and body content +- JSON documents accessible via CouchDB REST API ## Testing Sync vs Archive Modes diff --git a/test/start-test-env.sh b/test/start-test-env.sh index 2e6a82c..dfb9b59 100755 --- a/test/start-test-env.sh +++ b/test/start-test-env.sh @@ -63,5 +63,18 @@ echo "" echo "To run mail2couch:" echo " cd ../go && ./mail2couch -config ../test/config-test.json" echo "" +echo "📧 MAIL2COUCH DATABASE ACCESS:" +echo "After running mail2couch, you can access the stored emails via CouchDB:" +echo "" +echo "📋 Database Access (examples after sync):" +echo " - All databases: http://localhost:5984/_all_dbs" +echo " - Specific database: http://localhost:5984/m2c_specific_folders_only" +echo " - All documents: http://localhost:5984/m2c_specific_folders_only/_all_docs" +echo " - Individual message: http://localhost:5984/m2c_specific_folders_only/INBOX_12" +echo "" +echo "🔍 Raw Data Access:" +echo " - Database info: 
http://localhost:5984/m2c_specific_folders_only" +echo " - Document with content: http://localhost:5984/m2c_specific_folders_only/INBOX_12" +echo "" echo "To stop the environment:" echo " ./stop-test-env.sh" \ No newline at end of file diff --git a/test/test-incremental-sync.sh b/test/test-incremental-sync.sh index bc35016..45e0005 100755 --- a/test/test-incremental-sync.sh +++ b/test/test-incremental-sync.sh @@ -81,7 +81,17 @@ add_new_messages() { import imaplib import time -from test.populate_greenmail import create_simple_message +import sys +import os + +# Add the current directory (expected to be test/) to the Python path; __file__ is undefined when the script is fed to the interpreter inline +sys.path.insert(0, os.getcwd()) + +import importlib.util +spec = importlib.util.spec_from_file_location("populate_greenmail", "populate-greenmail.py") +populate_greenmail = importlib.util.module_from_spec(spec) +spec.loader.exec_module(populate_greenmail) +create_simple_message = populate_greenmail.create_simple_message def add_new_messages(): """Add new messages to test incremental sync"""