mail2couch/couchdb-schemas.md
Ole-Morten Duesund 651d95e98b docs: add comprehensive CouchDB schema documentation for cross-implementation compatibility
2025-08-02 15:08:35 +02:00

CouchDB Document Schemas

This document defines the CouchDB document schemas used by mail2couch. These schemas must be maintained consistently across all implementations (Go, Rust, etc.).

Mail Document Schema

Document Type: mail
Document ID Format: {mailbox}_{uid} (e.g., INBOX_123)
Purpose: Stores individual email messages with metadata and content

{
  "_id": "INBOX_123",
  "_rev": "1-abc123...",
  "_attachments": {
    "attachment1.pdf": {
      "content_type": "application/pdf",
      "length": 12345,
      "stub": true
    }
  },
  "sourceUid": "123",
  "mailbox": "INBOX",
  "from": ["sender@example.com"],
  "to": ["recipient@example.com"],
  "subject": "Email Subject",
  "date": "2025-08-02T12:16:10Z",
  "body": "Email body content",
  "headers": {
    "Content-Type": ["text/plain; charset=utf-8"],
    "Message-ID": ["<msg123@example.com>"],
    "Date": ["Sat, 02 Aug 2025 14:16:10 +0200"]
  },
  "storedAt": "2025-08-02T14:16:22.375241322+02:00",
  "docType": "mail",
  "hasAttachments": true
}

Field Definitions

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| _id | string | Yes | CouchDB document ID: {mailbox}_{uid} |
| _rev | string | Auto | CouchDB revision (managed by CouchDB) |
| _attachments | object | No | CouchDB native attachments (email attachments) |
| sourceUid | string | Yes | Original IMAP UID from mail server, stored as a string |
| mailbox | string | Yes | Source mailbox name (e.g., "INBOX", "Sent") |
| from | array[string] | Yes | Sender email addresses |
| to | array[string] | Yes | Recipient email addresses |
| subject | string | Yes | Email subject line |
| date | string (ISO8601) | Yes | Email date from headers |
| body | string | Yes | Email body content (plain text) |
| headers | object | Yes | All email headers, each header name mapped to an array of its values |
| storedAt | string (ISO8601) | Yes | When document was stored in CouchDB |
| docType | string | Yes | Always "mail" for email documents |
| hasAttachments | boolean | Yes | Whether email has attachments |
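
The field table above could map onto Go types roughly as follows. This is a minimal sketch, not the project's actual definitions: the type names `MailDocument` and `AttachmentStub` and the helper `DecodeMail` are assumptions made for illustration; only the JSON field names come from the schema.

```go
package main

import (
	"encoding/json"
	"time"
)

// AttachmentStub mirrors one entry of the CouchDB "_attachments" object.
// (Hypothetical type name; the JSON keys are the ones shown in the schema.)
type AttachmentStub struct {
	ContentType string `json:"content_type"`
	Length      int64  `json:"length,omitempty"`
	Stub        bool   `json:"stub,omitempty"`
}

// MailDocument is an illustrative mapping of the mail document schema.
type MailDocument struct {
	ID             string                    `json:"_id"`
	Rev            string                    `json:"_rev,omitempty"`
	Attachments    map[string]AttachmentStub `json:"_attachments,omitempty"`
	SourceUID      string                    `json:"sourceUid"`
	Mailbox        string                    `json:"mailbox"`
	From           []string                  `json:"from"`
	To             []string                  `json:"to"`
	Subject        string                    `json:"subject"`
	Date           time.Time                 `json:"date"`
	Body           string                    `json:"body"`
	Headers        map[string][]string       `json:"headers"`
	StoredAt       time.Time                 `json:"storedAt"`
	DocType        string                    `json:"docType"`
	HasAttachments bool                      `json:"hasAttachments"`
}

// DecodeMail parses a JSON mail document such as the example above.
func DecodeMail(raw []byte) (MailDocument, error) {
	var doc MailDocument
	err := json.Unmarshal(raw, &doc)
	return doc, err
}
```

Marking `_rev` and `_attachments` with `omitempty` keeps them out of a newly created document, where CouchDB assigns the revision itself.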

Attachment Stub Schema

Email attachments are stored as CouchDB native attachments. Each one appears in the document's _attachments object as a stub:

{
  "filename.ext": {
    "content_type": "mime/type",
    "length": 12345,
    "stub": true
  }
}

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| content_type | string | Yes | MIME type of attachment |
| length | integer | No | Size in bytes |
| stub | boolean | No | Indicates attachment is stored separately |

Sync Metadata Document Schema

Document Type: sync_metadata
Document ID Format: sync_metadata_{mailbox} (e.g., sync_metadata_INBOX)
Purpose: Tracks synchronization state for incremental syncing

{
  "_id": "sync_metadata_INBOX",
  "_rev": "1-def456...",
  "docType": "sync_metadata",
  "mailbox": "INBOX",
  "lastSyncTime": "2025-08-02T14:26:08.281094+02:00",
  "lastMessageUID": 15,
  "messageCount": 18,
  "updatedAt": "2025-08-02T14:26:08.281094+02:00"
}

Field Definitions

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| _id | string | Yes | CouchDB document ID: sync_metadata_{mailbox} |
| _rev | string | Auto | CouchDB revision (managed by CouchDB) |
| docType | string | Yes | Always "sync_metadata" for sync documents |
| mailbox | string | Yes | Mailbox name this metadata applies to |
| lastSyncTime | string (ISO8601) | Yes | When this mailbox was last synced |
| lastMessageUID | integer | Yes | Highest IMAP UID processed in last sync |
| messageCount | integer | Yes | Number of messages processed in last sync |
| updatedAt | string (ISO8601) | Yes | When this metadata was last updated |
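
To illustrate how these fields might be maintained between sync passes, here is a small sketch. The type `SyncMetadata` and the function `NextSync` are hypothetical names, and the choice never to let `lastMessageUID` move backwards is an assumption, not something the schema states.

```go
package main

import "time"

// SyncMetadata is an illustrative mapping of the sync metadata schema.
type SyncMetadata struct {
	ID             string    `json:"_id"`
	Rev            string    `json:"_rev,omitempty"`
	DocType        string    `json:"docType"`
	Mailbox        string    `json:"mailbox"`
	LastSyncTime   time.Time `json:"lastSyncTime"`
	LastMessageUID uint32    `json:"lastMessageUID"`
	MessageCount   int       `json:"messageCount"`
	UpdatedAt      time.Time `json:"updatedAt"`
}

// NextSync returns a copy of prev updated for a sync pass that processed
// the given UIDs at time now. prev itself is left unchanged.
func NextSync(prev SyncMetadata, uids []uint32, now time.Time) SyncMetadata {
	next := prev
	for _, uid := range uids {
		// Assumption: keep the highest UID seen so far.
		if uid > next.LastMessageUID {
			next.LastMessageUID = uid
		}
	}
	next.MessageCount = len(uids) // messages processed in this pass
	next.LastSyncTime = now
	next.UpdatedAt = now
	return next
}
```

Passing `now` in as a parameter, rather than calling `time.Now()` inside, keeps the function deterministic and easy to test.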

Database Naming Convention

Format: m2c_{account_name}
Rules:

  • Prefix all databases with m2c_
  • Convert account names to lowercase
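  • Replace characters that are not valid in a database name (spaces, @, and . in the examples below) with underscores
  • Ensure the database name starts with a letter (the m2c_ prefix guarantees this)
  • If the account name starts with a non-letter, prefix it with mail_ before adding m2c_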

Examples:

  • Account "Personal Gmail" → Database m2c_personal_gmail
  • Account "123work" → Database m2c_mail_123work
  • Email "user@example.com" → Database m2c_user_example_com
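
The naming rules can be sketched as a single function. This is an illustration under one stated assumption: every character other than a-z, 0-9, and _ is treated as invalid. CouchDB itself accepts a few more characters in database names, so the existing Go implementation may be less strict. The name `DatabaseName` is hypothetical.

```go
package main

import "strings"

// DatabaseName derives a CouchDB database name from an account name.
// Assumption: anything outside a-z, 0-9 and '_' counts as invalid.
func DatabaseName(account string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(account) {
		if (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9') || r == '_' {
			b.WriteRune(r)
		} else {
			b.WriteRune('_') // replace invalid characters with underscores
		}
	}
	name := b.String()
	// Account names that do not start with a letter get the mail_ prefix.
	if name == "" || name[0] < 'a' || name[0] > 'z' {
		name = "mail_" + name
	}
	return "m2c_" + name
}
```

With this sketch the three examples above come out as listed: `DatabaseName("Personal Gmail")` gives `m2c_personal_gmail`, `DatabaseName("123work")` gives `m2c_mail_123work`, and `DatabaseName("user@example.com")` gives `m2c_user_example_com`.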

Document ID Conventions

Mail Documents

  • Format: {mailbox}_{uid}
  • Examples: INBOX_123, Sent_456, Work/Projects_789
  • Uniqueness: Combination of mailbox and IMAP UID ensures uniqueness

Sync Metadata Documents

  • Format: sync_metadata_{mailbox}
  • Examples: sync_metadata_INBOX, sync_metadata_Sent
  • Purpose: One metadata document per mailbox for tracking sync state
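
Both ID formats are simple string concatenations. The sketch below uses hypothetical helper names (`MailDocID`, `SyncMetadataDocID`, `DocPath`). The `DocPath` helper is an addition for illustration: a mailbox name containing `/`, as in Work/Projects_789, has to be percent-encoded when the ID is placed in a CouchDB request path.

```go
package main

import (
	"fmt"
	"net/url"
)

// MailDocID builds the {mailbox}_{uid} document ID.
func MailDocID(mailbox string, uid uint32) string {
	return fmt.Sprintf("%s_%d", mailbox, uid)
}

// SyncMetadataDocID builds the sync_metadata_{mailbox} document ID.
func SyncMetadataDocID(mailbox string) string {
	return "sync_metadata_" + mailbox
}

// DocPath returns the request path for a document, escaping both segments
// so that a "/" inside the ID is sent as %2F.
func DocPath(db, id string) string {
	return "/" + url.PathEscape(db) + "/" + url.PathEscape(id)
}
```

For example, `DocPath("m2c_work", MailDocID("Work/Projects", 789))` yields `/m2c_work/Work%2FProjects_789`.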

Data Type Mappings

Go to JSON

| Go Type | JSON Type | Example |
|---------|-----------|---------|
| string | string | "text" |
| []string | array | ["item1", "item2"] |
| map[string][]string | object | {"key": ["value1", "value2"]} |
| time.Time | string (ISO8601) | "2025-08-02T14:26:08.281094+02:00" |
| uint32 | number | 123 |
| int | number | 456 |
| bool | boolean | true |
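
The mappings in the table can be checked with the standard `encoding/json` package. The struct `sample` and the function `EncodeSample` below are made up for the demonstration; the values are the ones from the table.

```go
package main

import (
	"encoding/json"
	"time"
)

// sample has one field for each Go type listed in the table.
type sample struct {
	Text    string              `json:"text"`
	Items   []string            `json:"items"`
	Headers map[string][]string `json:"headers"`
	When    time.Time           `json:"when"`
	UID     uint32              `json:"uid"`
	Count   int                 `json:"count"`
	Flag    bool                `json:"flag"`
}

// EncodeSample marshals one value of each type to JSON.
func EncodeSample() (string, error) {
	// A fixed +02:00 zone reproduces the offset used in the examples.
	zone := time.FixedZone("", 2*60*60)
	out, err := json.Marshal(sample{
		Text:    "text",
		Items:   []string{"item1", "item2"},
		Headers: map[string][]string{"key": {"value1", "value2"}},
		When:    time.Date(2025, 8, 2, 14, 26, 8, 281094000, zone),
		UID:     123,
		Count:   456,
		Flag:    true,
	})
	return string(out), err
}
```

`time.Time` is marshaled in RFC 3339 form with trailing zeros of the fractional second dropped, so the timestamp above is emitted as "2025-08-02T14:26:08.281094+02:00".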

Rust Considerations

When implementing in Rust:

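  • Use chrono::DateTime<Utc> for timestamps with ISO8601 serialization (stored values may carry a local offset such as +02:00; DateTime<Utc> normalizes it to UTC, while DateTime<FixedOffset> preserves it on a round trip)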
  • Use Vec<String> for string arrays
  • Use HashMap<String, Vec<String>> for headers
  • Use serde with #[serde(rename = "fieldName")] for JSON field mapping
  • Handle optional fields with Option<T>

Validation Rules

Required Fields

All documents must include:

  • _id: Valid CouchDB document ID
  • docType: Identifies document type for filtering
  • mailbox: Mailbox name (the source mailbox for mail documents, the tracked mailbox for sync metadata)

Data Constraints

  • Email addresses: No validation enforced (preserve as-is from IMAP)
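  • Dates: Must be valid ISO8601 strings in the RFC 3339 form shown in the examples (e.g. 2025-08-02T12:16:10Z)
  • UIDs: Must be positive integers (sourceUid holds the value as a string, lastMessageUID as a number)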
  • Document IDs: Must be valid CouchDB IDs (no spaces, special chars)
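
A validator for mail documents along these lines might look like the sketch below. The names `MailFields` and `ValidateMail` are hypothetical, and the checks are one reading of the rules above, not the project's validation script. Rejecting IDs that start with an underscore reflects CouchDB reserving that prefix for its own documents.

```go
package main

import (
	"errors"
	"strconv"
	"strings"
	"time"
)

// MailFields holds the fields covered by the validation rules.
type MailFields struct {
	ID, DocType, Mailbox, SourceUID, Date string
}

// ValidateMail checks the required fields and data constraints.
func ValidateMail(f MailFields) error {
	if f.ID == "" || strings.HasPrefix(f.ID, "_") {
		return errors.New("_id must be non-empty and must not start with '_'")
	}
	if f.DocType != "mail" {
		return errors.New(`docType must be "mail"`)
	}
	if f.Mailbox == "" {
		return errors.New("mailbox is required")
	}
	// UIDs must be positive integers that fit an IMAP UID (32 bits).
	uid, err := strconv.ParseUint(f.SourceUID, 10, 32)
	if err != nil || uid == 0 {
		return errors.New("sourceUid must be a positive integer")
	}
	if f.ID != f.Mailbox+"_"+f.SourceUID {
		return errors.New("_id must have the form {mailbox}_{uid}")
	}
	// time.RFC3339 also accepts fractional seconds when parsing.
	if _, err := time.Parse(time.RFC3339, f.Date); err != nil {
		return errors.New("date must be an ISO8601 (RFC 3339) timestamp")
	}
	return nil
}
```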

Attachment Handling

  • Store email attachments as CouchDB native attachments
  • Preserve original filenames and MIME types
  • Use attachment stubs in document metadata
  • Support binary content through CouchDB attachment API
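
One way to hand binary content to CouchDB is to inline it, base64-encoded, under `_attachments` when the document is written; CouchDB then stores it as a native attachment and returns stubs on later reads. The sketch below shows that variant only. Whether mail2couch uses inline attachments or the standalone attachment endpoint is not stated here, and the names `Part`, `InlineAttachment`, and `InlineAttachments` are hypothetical.

```go
package main

import "encoding/base64"

// Part is one decoded email attachment (hypothetical input type).
type Part struct {
	Filename    string
	ContentType string
	Content     []byte
}

// InlineAttachment is the form CouchDB accepts for inline uploads.
type InlineAttachment struct {
	ContentType string `json:"content_type"`
	Data        string `json:"data"` // base64-encoded content
}

// InlineAttachments builds the "_attachments" value for a new document,
// keeping the original filenames and MIME types.
func InlineAttachments(parts []Part) map[string]InlineAttachment {
	out := make(map[string]InlineAttachment, len(parts))
	for _, p := range parts {
		out[p.Filename] = InlineAttachment{
			ContentType: p.ContentType,
			Data:        base64.StdEncoding.EncodeToString(p.Content),
		}
	}
	return out
}
```

Note that two attachments with the same filename would collide in this map; a real implementation would need a rule for that case.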

Backward Compatibility

When modifying schemas:

  1. Add new fields as optional
  2. Never remove existing fields
  3. Maintain existing field types and formats
  4. Document any breaking changes clearly
  5. Provide migration guidance for existing data
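
Rule 1 can be illustrated in Go with a pointer field and `omitempty`. In this sketch `threadId` is a purely hypothetical new field, and `MailDocV2` declares only a few fields to keep the example short (a real struct would declare all of them, since undeclared fields are dropped on re-encoding).

```go
package main

import "encoding/json"

// MailDocV2 is a cut-down document type with one added optional field.
type MailDocV2 struct {
	ID      string `json:"_id"`
	DocType string `json:"docType"`
	Mailbox string `json:"mailbox"`
	// ThreadID is hypothetical. As a pointer with omitempty it is nil for
	// documents written before the field existed and is omitted on output.
	ThreadID *string `json:"threadId,omitempty"`
}

// RoundTrip decodes a document and encodes it again.
func RoundTrip(raw []byte) ([]byte, error) {
	var doc MailDocV2
	if err := json.Unmarshal(raw, &doc); err != nil {
		return nil, err
	}
	return json.Marshal(doc)
}
```

An older document without `threadId` passes through unchanged, so readers that predate the field keep working.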

Implementation Notes

CouchDB Features Used

  • Native Attachments: For email attachments
  • Document IDs: Predictable format for easy access
  • Bulk Operations: For efficient storage
  • Conflict Resolution: CouchDB tracks revisions through _rev and rejects an update that carries a stale revision

Performance Considerations

  • Index by docType for efficient filtering
  • Index by mailbox for folder-based queries
  • Index by date for chronological access
  • Use bulk insert operations for multiple messages
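
The request bodies for these operations are small JSON documents. The sketch below assumes Mango JSON indexes (created by POSTing to a database's `_index` endpoint) and the `_bulk_docs` endpoint; the schema does not say whether the project uses Mango indexes or views, and the function names and the `idx_` naming are hypothetical.

```go
package main

import "encoding/json"

type indexDef struct {
	Fields []string `json:"fields"`
}

type indexRequest struct {
	Index indexDef `json:"index"`
	Name  string   `json:"name"`
	Type  string   `json:"type"`
}

// IndexRequest builds the body for creating a Mango index on one field,
// for example docType, mailbox, or date.
func IndexRequest(field string) ([]byte, error) {
	return json.Marshal(indexRequest{
		Index: indexDef{Fields: []string{field}},
		Name:  "idx_" + field, // hypothetical naming scheme
		Type:  "json",
	})
}

// BulkDocsRequest wraps documents in the {"docs": [...]} envelope that
// the _bulk_docs endpoint expects.
func BulkDocsRequest(docs []interface{}) ([]byte, error) {
	return json.Marshal(map[string]interface{}{"docs": docs})
}
```

Sending many messages in one `_bulk_docs` request avoids one HTTP round trip per document.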

Future Extensions

This schema supports future enhancements:

  • Webmail Views: CouchDB design documents for HTML interface
  • Search Indexes: Full-text search with CouchDB-Lucene
  • Replication: Multi-database sync scenarios
  • Analytics: Message statistics and reporting