feat: add comprehensive README documentation and clean up configuration
## Documentation Enhancements
- Create comprehensive README with installation, configuration, and usage examples
- Add simple, advanced, and provider-specific configuration examples
- Document all features: incremental sync, wildcard patterns, keyword filtering, attachment support
- Include production deployment guidance and troubleshooting section
- Add architecture documentation with database structure and document format examples

## Configuration Cleanup
- Remove unnecessary `database` field from CouchDB configuration
- Add `m2c_` prefix to all CouchDB database names for better namespace isolation
- Update GenerateAccountDBName() to consistently prefix databases with `m2c_`
- Clean up all configuration examples to remove deprecated database field

## Test Environment Simplification
- Simplify test script structure to eliminate confusion and redundancy
- Remove redundant populate-test-messages.sh wrapper script
- Update run-tests.sh to be comprehensive automated test with cleanup
- Maintain clear separation: automated tests vs manual testing environment
- Update all test scripts to expect m2c-prefixed database names

## Configuration Examples Added
- config-simple.json: Basic single Gmail account setup
- config-advanced.json: Multi-account with complex filtering and different providers
- config-providers.json: Real-world configurations for Gmail, Outlook, Yahoo, iCloud

## Benefits
- Clear documentation for users from beginner to advanced
- Namespace isolation prevents database conflicts in shared CouchDB instances
- Simplified test workflow eliminates user confusion about which scripts to use
- Comprehensive examples cover common email provider configurations

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
parent 357cd06264
commit c2ad55eaaf
17 changed files with 1139 additions and 111 deletions
55
CLAUDE.md
@@ -31,7 +31,14 @@ cd go && ./mail2couch -config /path/to/config.json -max-messages 50
 # Run linting/static analysis
 cd go && go vet ./...
 
-# Run tests (currently no tests exist)
+# Run integration tests with Podman containers
+cd test && ./run-tests.sh
+
+# Run specialized tests
+cd test && ./test-wildcard-patterns.sh
+cd test && ./test-incremental-sync.sh
+
+# Run unit tests (none currently implemented)
 cd go && go test ./...
 
 # Check dependencies
@@ -60,7 +67,7 @@ cd go && go mod tidy
 ### Configuration Structure
 
 The application uses `config.json` for configuration with the following structure:
-- `couchDb`: Database connection settings (URL, credentials, database name - note: the database field is now ignored as each mail source gets its own database)
+- `couchDb`: Database connection settings (URL, credentials)
 - `mailSources`: Array of mail sources with individual settings:
 - Protocol support (currently only IMAP)
 - Connection details (host, port, credentials)
@@ -100,7 +107,7 @@ This design ensures the same `config.json` format will work for both Go and Rust
 - ✅ Full message body and attachment handling with MIME multipart support
 - ✅ Command line argument support (--max-messages flag)
 - ✅ Per-account CouchDB databases for better organization
-- ❌ Incremental sync functionality
+- ✅ Incremental sync functionality with IMAP SEARCH and sync metadata tracking
 - ❌ Rust implementation
 
 ### Key Dependencies
@@ -108,33 +115,45 @@ This design ensures the same `config.json` format will work for both Go and Rust
 - `github.com/emersion/go-imap/v2`: IMAP client library
 - `github.com/go-kivik/kivik/v4`: CouchDB client library
 
+### Incremental Sync Implementation
+
+The application implements intelligent incremental synchronization to avoid re-processing messages:
+
+- **Sync Metadata Storage**: Each mailbox sync operation stores metadata including last sync timestamp and highest UID processed
+- **IMAP SEARCH Integration**: Uses IMAP SEARCH with SINCE criteria for efficient server-side filtering of new messages
+- **Per-Mailbox Tracking**: Sync state is tracked independently for each mailbox in each account
+- **Fallback Behavior**: Gracefully falls back to fetching recent messages if IMAP SEARCH fails
+- **First Sync Handling**: Initial sync can use config `since` date or perform full sync
+
+Sync metadata documents are stored in CouchDB with ID format: `sync_metadata_{mailbox}` and include:
+- `lastSyncTime`: When this mailbox was last successfully synced
+- `lastMessageUID`: Highest UID processed in the last sync
+- `messageCount`: Number of messages processed in the last sync
+
 ### Development Notes
 
 - The main entry point is `main.go` which orchestrates the configuration loading, CouchDB setup, and mail source processing
-- Each mail source gets its own CouchDB database named using `GenerateAccountDBName()` function
+- Each mail source gets its own CouchDB database named using `GenerateAccountDBName()` function with `m2c_` prefix
 - Each mail source is processed sequentially with proper error handling
-- The application currently uses placeholder message data for testing the storage pipeline
-- Message filtering by folder (include/exclude) and date (since) is implemented
+- The application uses real IMAP message parsing with go-message library for full email processing
+- Message filtering by folder (wildcard patterns), date (since), and keywords is implemented
 - Duplicate detection prevents re-storing existing messages
 - Sync vs Archive mode determines whether to remove documents from CouchDB when they're no longer in the mail account
 - Email attachments are stored as native CouchDB attachments linked to the email document
-- No tests are currently implemented
+- Comprehensive test environment with Podman containers and automated test scripts
 - The application uses automatic config file discovery as documented above
 
 ### Next Steps
 
-To complete the implementation, the following items need to be addressed:
+The following enhancements could further improve the implementation:
 
-1. **Real IMAP Message Parsing**: Replace placeholder message generation with actual IMAP message fetching and parsing using the correct go-imap/v2 API
-2. **Message Body Extraction**: Implement proper text/plain and text/html body extraction from multipart messages
-3. **Keyword Filtering**: Add support for filtering messages by keywords in:
-- Subject line (`subjectKeywords`)
-- Sender addresses (`senderKeywords`)
-- Recipient addresses (`recipientKeywords`)
-4. **Attachment Handling**: Add support for email attachments (optional)
-5. **Error Recovery**: Add retry logic for network failures and partial sync recovery
-6. **Performance**: Add batch operations for better CouchDB insertion performance
-7. **Testing**: Add unit tests for all major components
+1. **Error Recovery**: Add retry logic for network failures and partial sync recovery
+2. **Performance Optimization**: Add batch operations for better CouchDB insertion performance
+3. **Unit Testing**: Add comprehensive unit tests for all major components
+4. **Advanced Filtering**: Add support for more complex filter expressions and regex patterns
+5. **Monitoring**: Add metrics and logging for production deployment
+6. **Configuration Validation**: Enhanced validation for configuration files
+7. **Multi-threading**: Parallel processing of multiple mailboxes or accounts
 
 ## Development Guidelines
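
The sync metadata document described in the Incremental Sync notes of this diff could be modelled in Go roughly as follows. This is a minimal sketch, not the project's code: only the ID format `sync_metadata_{mailbox}` and the three JSON field names come from the diff. The `SyncMetadata` type, the helpers `syncMetadataID` and `searchSince`, and the Go field types are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// SyncMetadata is a hypothetical shape for the per-mailbox sync document.
// The JSON field names follow the diff; the Go types are assumptions.
type SyncMetadata struct {
	ID             string    `json:"_id"`
	LastSyncTime   time.Time `json:"lastSyncTime"`
	LastMessageUID uint32    `json:"lastMessageUID"`
	MessageCount   int       `json:"messageCount"`
}

// syncMetadataID builds the document ID in the documented
// sync_metadata_{mailbox} format.
func syncMetadataID(mailbox string) string {
	return "sync_metadata_" + mailbox
}

// searchSince picks the date for an IMAP SEARCH SINCE query: the last
// successful sync if one is recorded, otherwise the configured `since`
// date. A zero result means "no date filter", i.e. a full sync.
func searchSince(meta *SyncMetadata, configSince time.Time) time.Time {
	if meta != nil && !meta.LastSyncTime.IsZero() {
		return meta.LastSyncTime
	}
	return configSince
}

func main() {
	meta := SyncMetadata{
		ID:             syncMetadataID("INBOX"),
		LastSyncTime:   time.Date(2025, 1, 15, 10, 30, 0, 0, time.UTC),
		LastMessageUID: 4211,
		MessageCount:   37,
	}
	doc, err := json.MarshalIndent(meta, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(doc))

	// IMAP SINCE takes a date without a time of day (e.g. 15-Jan-2025).
	fmt.Println(searchSince(&meta, time.Time{}).Format("2-Jan-2006"))
}
```

Because SINCE only has day granularity, a query built this way can return messages that were already stored earlier the same day. That is presumably where the recorded highest UID and the duplicate detection mentioned in the Development Notes come in.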
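
The Development Notes in this diff mention folder filtering with wildcard patterns. The diff does not show how the patterns are matched, so the sketch below simply uses the standard library's `path.Match` as a stand-in. The `shouldSync` and `matchAny` helpers and the exclude-wins-over-include rule are assumptions, and `path.Match` has its own semantics (for example, `*` does not cross a `/` separator) that may not match the project's behaviour.

```go
package main

import (
	"fmt"
	"path"
)

// matchAny reports whether folder matches at least one glob pattern.
// Malformed patterns are treated as non-matching.
func matchAny(patterns []string, folder string) bool {
	for _, p := range patterns {
		if ok, err := path.Match(p, folder); err == nil && ok {
			return true
		}
	}
	return false
}

// shouldSync is a hypothetical folder filter: an exclude match always
// wins, and an empty include list means "include everything".
func shouldSync(folder string, include, exclude []string) bool {
	if matchAny(exclude, folder) {
		return false
	}
	return len(include) == 0 || matchAny(include, folder)
}

func main() {
	include := []string{"INBOX", "Work/*"}
	exclude := []string{"*Spam*"}
	for _, f := range []string{"INBOX", "Work/Projects", "Spam", "Archive"} {
		fmt.Printf("%-14s sync=%v\n", f, shouldSync(f, include, exclude))
	}
}
```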