# CouchDB Document Schemas
This document defines the CouchDB document schemas used by mail2couch. These schemas must be maintained consistently across all implementations (Go, Rust, etc.).
## Mail Document Schema
- **Document Type**: `mail`
- **Document ID Format**: `{mailbox}_{uid}` (e.g., `INBOX_123`)
- **Purpose**: Stores individual email messages with metadata and content

```json
{
  "_id": "INBOX_123",
  "_rev": "1-abc123...",
  "_attachments": {
    "attachment1.pdf": {
      "content_type": "application/pdf",
      "length": 12345,
      "stub": true
    }
  },
  "sourceUid": "123",
  "mailbox": "INBOX",
  "from": ["sender@example.com"],
  "to": ["recipient@example.com"],
  "subject": "Email Subject",
  "date": "2025-08-02T12:16:10Z",
  "body": "Email body content",
  "headers": {
    "Content-Type": ["text/plain; charset=utf-8"],
    "Message-ID": ["<msg123@example.com>"],
    "Date": ["Sat, 02 Aug 2025 14:16:10 +0200"]
  },
  "storedAt": "2025-08-02T14:16:22.375241322+02:00",
  "docType": "mail",
  "hasAttachments": true
}
```
### Field Definitions
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `_id` | string | Yes | CouchDB document ID: `{mailbox}_{uid}` |
| `_rev` | string | Auto | CouchDB revision (managed by CouchDB) |
| `_attachments` | object | No | CouchDB native attachments (email attachments) |
| `sourceUid` | string | Yes | Original IMAP UID from the mail server, stored as a string |
| `mailbox` | string | Yes | Source mailbox name (e.g., "INBOX", "Sent") |
| `from` | array[string] | Yes | Sender email addresses |
| `to` | array[string] | Yes | Recipient email addresses |
| `subject` | string | Yes | Email subject line |
| `date` | string (ISO8601) | Yes | Message date taken from the `Date` header (UTC in the example above) |
| `body` | string | Yes | Email body content (plain text) |
| `headers` | object | Yes | All email headers, keyed by header name; each value is an array of strings |
| `storedAt` | string (ISO8601) | Yes | When document was stored in CouchDB |
| `docType` | string | Yes | Always "mail" for email documents |
| `hasAttachments` | boolean | Yes | Whether email has attachments |
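
As a rough illustration of the table above, the schema could be mapped to a Go struct along these lines. The type names (`MailDocument`, `AttachmentStub`), the Go field names, and the `roundTrip` helper are assumptions made for this sketch; only the JSON keys come from the schema.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// AttachmentStub mirrors one entry of the _attachments object.
type AttachmentStub struct {
	ContentType string `json:"content_type"`
	Length      int64  `json:"length,omitempty"`
	Stub        bool   `json:"stub,omitempty"`
}

// MailDocument is a hypothetical Go mapping of the mail schema. The Go names
// are illustrative; the JSON keys follow the field table.
type MailDocument struct {
	ID             string                    `json:"_id"`
	Rev            string                    `json:"_rev,omitempty"` // assigned by CouchDB
	Attachments    map[string]AttachmentStub `json:"_attachments,omitempty"`
	SourceUID      string                    `json:"sourceUid"`
	Mailbox        string                    `json:"mailbox"`
	From           []string                  `json:"from"`
	To             []string                  `json:"to"`
	Subject        string                    `json:"subject"`
	Date           time.Time                 `json:"date"`
	Body           string                    `json:"body"`
	Headers        map[string][]string       `json:"headers"`
	StoredAt       time.Time                 `json:"storedAt"`
	DocType        string                    `json:"docType"`
	HasAttachments bool                      `json:"hasAttachments"`
}

// roundTrip encodes doc to JSON and decodes it again, exercising the field
// tags in both directions. It returns the decoded copy and the raw JSON.
func roundTrip(doc MailDocument) (MailDocument, []byte, error) {
	raw, err := json.Marshal(doc)
	if err != nil {
		return MailDocument{}, nil, err
	}
	var out MailDocument
	if err := json.Unmarshal(raw, &out); err != nil {
		return MailDocument{}, nil, err
	}
	return out, raw, nil
}

func main() {
	doc := MailDocument{
		ID:        "INBOX_123",
		SourceUID: "123",
		Mailbox:   "INBOX",
		From:      []string{"sender@example.com"},
		To:        []string{"recipient@example.com"},
		Subject:   "Email Subject",
		Date:      time.Date(2025, 8, 2, 12, 16, 10, 0, time.UTC),
		Body:      "Email body content",
		Headers:   map[string][]string{"Content-Type": {"text/plain; charset=utf-8"}},
		StoredAt:  time.Now(),
		DocType:   "mail",
	}
	_, raw, err := roundTrip(doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(raw))
}
```

Marking `_rev` and `_attachments` with `omitempty` keeps them out of the JSON for a new document, which matches their "Auto" and "No" entries in the Required column.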
### Attachment Stub Schema
When an email has attachments, they are stored as CouchDB native attachments and appear as stubs in the document's `_attachments` object:
```json
{
  "filename.ext": {
    "content_type": "mime/type",
    "length": 12345,
    "stub": true
  }
}
```
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `content_type` | string | Yes | MIME type of attachment |
| `length` | integer | No | Size in bytes |
| `stub` | boolean | No | `true` when only the attachment metadata is listed and the content itself is stored separately |
## Sync Metadata Document Schema
- **Document Type**: `sync_metadata`
- **Document ID Format**: `sync_metadata_{mailbox}` (e.g., `sync_metadata_INBOX`)
- **Purpose**: Tracks synchronization state for incremental syncing

```json
{
  "_id": "sync_metadata_INBOX",
  "_rev": "1-def456...",
  "docType": "sync_metadata",
  "mailbox": "INBOX",
  "lastSyncTime": "2025-08-02T14:26:08.281094+02:00",
  "lastMessageUID": 15,
  "messageCount": 18,
  "updatedAt": "2025-08-02T14:26:08.281094+02:00"
}
```
### Field Definitions
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `_id` | string | Yes | CouchDB document ID: `sync_metadata_{mailbox}` |
| `_rev` | string | Auto | CouchDB revision (managed by CouchDB) |
| `docType` | string | Yes | Always "sync_metadata" for sync documents |
| `mailbox` | string | Yes | Mailbox name this metadata applies to |
| `lastSyncTime` | string (ISO8601) | Yes | When this mailbox was last synced |
| `lastMessageUID` | integer | Yes | Highest IMAP UID processed in last sync |
| `messageCount` | integer | Yes | Number of messages processed in last sync |
| `updatedAt` | string (ISO8601) | Yes | When this metadata was last updated |
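
The following sketch shows one way the sync metadata could be modelled and refreshed after a sync run. The `SyncMetadata` type and the `nextSyncMetadata` helper are hypothetical, as is the choice of `uint32` for the UID and `int` for the count. Carrying the previous `lastMessageUID` forward when a run processes no messages is also an assumption, not something the schema specifies.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// SyncMetadata is a hypothetical Go mapping of the sync_metadata schema.
type SyncMetadata struct {
	ID             string    `json:"_id"`
	Rev            string    `json:"_rev,omitempty"` // assigned by CouchDB
	DocType        string    `json:"docType"`
	Mailbox        string    `json:"mailbox"`
	LastSyncTime   time.Time `json:"lastSyncTime"`
	LastMessageUID uint32    `json:"lastMessageUID"`
	MessageCount   int       `json:"messageCount"`
	UpdatedAt      time.Time `json:"updatedAt"`
}

// nextSyncMetadata builds the metadata to store after a run that processed
// the given UIDs. Assumption: if no UID exceeds the previous maximum, the
// previous lastMessageUID is kept.
func nextSyncMetadata(prev SyncMetadata, mailbox string, uids []uint32, now time.Time) SyncMetadata {
	next := SyncMetadata{
		ID:             "sync_metadata_" + mailbox,
		Rev:            prev.Rev, // CouchDB needs the current revision to update a document
		DocType:        "sync_metadata",
		Mailbox:        mailbox,
		LastSyncTime:   now,
		LastMessageUID: prev.LastMessageUID,
		MessageCount:   len(uids),
		UpdatedAt:      now,
	}
	for _, uid := range uids {
		if uid > next.LastMessageUID {
			next.LastMessageUID = uid
		}
	}
	return next
}

func main() {
	var prev SyncMetadata // zero value: no earlier sync
	next := nextSyncMetadata(prev, "INBOX", []uint32{13, 15, 14}, time.Now())
	raw, err := json.MarshalIndent(next, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(raw))
}
```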
## Database Naming Convention
**Format**: `m2c_{account_name}`

**Rules**:
- Prefix all databases with `m2c_`
- Convert account names to lowercase
- Replace invalid characters with underscores
- Ensure database name starts with a letter
- If account name starts with non-letter, prefix with `mail_`

**Examples**:
- Account "Personal Gmail" → Database `m2c_personal_gmail`
- Account "123work" → Database `m2c_mail_123work`
- Email "user@example.com" → Database `m2c_user_example_com`
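
The naming rules can be sketched as a small Go function. The function name `databaseName` is hypothetical, and since the rules do not list which characters count as invalid, this sketch assumes that anything other than `a-z`, `0-9`, and `_` is replaced.

```go
package main

import (
	"fmt"
	"strings"
)

// databaseName derives a database name from an account name.
// Assumption: only a-z, 0-9 and '_' are kept; every other character
// becomes an underscore.
func databaseName(account string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(account) {
		if (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9') || r == '_' {
			b.WriteRune(r)
		} else {
			b.WriteByte('_')
		}
	}
	name := b.String()
	// Names that do not start with a letter get the mail_ prefix.
	if name == "" || name[0] < 'a' || name[0] > 'z' {
		name = "mail_" + name
	}
	return "m2c_" + name
}

func main() {
	for _, account := range []string{"Personal Gmail", "123work", "user@example.com"} {
		fmt.Printf("%q -> %s\n", account, databaseName(account))
	}
}
```

Run against the three documented examples, the function reproduces `m2c_personal_gmail`, `m2c_mail_123work`, and `m2c_user_example_com`.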
## Document ID Conventions
### Mail Documents
- **Format**: `{mailbox}_{uid}`
- **Examples**: `INBOX_123`, `Sent_456`, `Work/Projects_789`
- **Uniqueness**: Combination of mailbox and IMAP UID ensures uniqueness
### Sync Metadata Documents
- **Format**: `sync_metadata_{mailbox}`
- **Examples**: `sync_metadata_INBOX`, `sync_metadata_Sent`
- **Purpose**: One metadata document per mailbox for tracking sync state
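
A minimal sketch of building these IDs in Go follows. The helper names (`mailDocID`, `syncMetadataDocID`, `docPath`) are hypothetical. The `docPath` helper illustrates a general CouchDB point: an ID that contains `/`, such as `Work/Projects_789`, has to be percent-encoded when it is used as a path segment in a request URL.

```go
package main

import (
	"fmt"
	"net/url"
)

// mailDocID builds the ID of a mail document: {mailbox}_{uid}.
func mailDocID(mailbox string, uid uint32) string {
	return fmt.Sprintf("%s_%d", mailbox, uid)
}

// syncMetadataDocID builds the ID of a sync metadata document.
func syncMetadataDocID(mailbox string) string {
	return "sync_metadata_" + mailbox
}

// docPath returns the request path for a document, escaping the ID so that
// a '/' inside a mailbox name is not read as a path separator.
func docPath(db, id string) string {
	return "/" + url.PathEscape(db) + "/" + url.PathEscape(id)
}

func main() {
	id := mailDocID("Work/Projects", 789)
	fmt.Println(id)
	fmt.Println(docPath("m2c_personal_gmail", id))
	fmt.Println(syncMetadataDocID("INBOX"))
}
```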
## Data Type Mappings
### Go to JSON
| Go Type | JSON Type | Example |
|---------|-----------|---------|
| `string` | string | `"text"` |
| `[]string` | array | `["item1", "item2"]` |
| `map[string][]string` | object | `{"key": ["value1", "value2"]}` |
| `time.Time` | string (ISO8601) | `"2025-08-02T14:26:08.281094+02:00"` |
| `uint32` | number | `123` |
| `int` | number | `456` |
| `bool` | boolean | `true` |
### Rust Considerations
When implementing in Rust, ensure:
- Use `chrono::DateTime<Utc>` for timestamps with ISO8601 serialization
- Use `Vec<String>` for string arrays
- Use `HashMap<String, Vec<String>>` for headers
- Use `serde` with `#[serde(rename = "fieldName")]` for JSON field mapping
- Handle optional fields with `Option<T>`
## Validation Rules
### Required Fields
All documents must include:
- `_id`: Valid CouchDB document ID
- `docType`: Identifies document type for filtering
- `mailbox`: Mailbox name (the source mailbox for mail documents, the tracked mailbox for sync metadata)
### Data Constraints
- Email addresses: No validation enforced (preserve as-is from IMAP)
- Dates: Must be valid ISO8601 format
- UIDs: Must be positive integers (`sourceUid` holds the UID as a string, `lastMessageUID` as a number)
- Document IDs: Must be valid CouchDB IDs (no spaces or special characters)
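
Some of these rules can be checked mechanically. The sketch below validates a decoded mail document; the `validateMail` function is hypothetical, it covers only a subset of the rules, and it assumes that "valid ISO8601" can be approximated by Go's RFC 3339 parser, which accepts both timestamp styles used in the examples.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// validateMail checks a decoded mail document and returns one message per
// violated rule. An empty result means the checked rules are satisfied.
func validateMail(doc map[string]any) []string {
	var problems []string

	// Required string fields.
	for _, key := range []string{"_id", "docType", "mailbox"} {
		if s, ok := doc[key].(string); !ok || s == "" {
			problems = append(problems, key+": missing or empty")
		}
	}

	// Timestamps; RFC 3339 parsing is an assumption of this sketch.
	for _, key := range []string{"date", "storedAt"} {
		s, _ := doc[key].(string)
		if _, err := time.Parse(time.RFC3339, s); err != nil {
			problems = append(problems, key+": not a valid timestamp")
		}
	}

	// sourceUid is stored as a string but must hold a positive integer.
	uid, _ := doc["sourceUid"].(string)
	if n, err := strconv.ParseUint(uid, 10, 32); err != nil || n == 0 {
		problems = append(problems, "sourceUid: not a positive integer")
	}
	return problems
}

func main() {
	doc := map[string]any{
		"_id":       "INBOX_123",
		"docType":   "mail",
		"mailbox":   "INBOX",
		"sourceUid": "123",
		"date":      "2025-08-02T12:16:10Z",
		"storedAt":  "not-a-date",
	}
	for _, p := range validateMail(doc) {
		fmt.Println(p)
	}
}
```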
### Attachment Handling
- Store email attachments as CouchDB native attachments
- Preserve original filenames and MIME types
- Use attachment stubs in document metadata
- Support binary content through CouchDB attachment API
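
CouchDB accepts attachment content either through a separate request per attachment or inline, as base64 in a `data` field of the `_attachments` entry. Which route mail2couch takes is not stated here; the sketch below assumes the inline form, and the `InlineAttachment` type and `inlineAttachment` helper are hypothetical names.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// InlineAttachment is the upload form of an _attachments entry: the content
// travels base64-encoded in "data" instead of being represented by a stub.
type InlineAttachment struct {
	ContentType string `json:"content_type"`
	Data        string `json:"data"`
}

// inlineAttachment wraps raw bytes for inclusion in a document body,
// preserving the MIME type given by the caller.
func inlineAttachment(contentType string, content []byte) InlineAttachment {
	return InlineAttachment{
		ContentType: contentType,
		Data:        base64.StdEncoding.EncodeToString(content),
	}
}

func main() {
	doc := map[string]any{
		"_id":            "INBOX_123",
		"docType":        "mail",
		"hasAttachments": true,
		// The map key preserves the original filename.
		"_attachments": map[string]InlineAttachment{
			"attachment1.pdf": inlineAttachment("application/pdf", []byte("%PDF-1.4 example bytes")),
		},
	}
	raw, err := json.MarshalIndent(doc, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(raw))
}
```

Once such a document is stored, CouchDB returns the entry as a stub with `content_type`, `length`, and `stub: true` rather than echoing the data back.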
## Backward Compatibility
When modifying schemas:
1. Add new fields as optional
2. Never remove existing fields
3. Maintain existing field types and formats
4. Document any breaking changes clearly
5. Provide migration guidance for existing data
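
Rule 1 can be illustrated with a Go struct in which a newly added field is a pointer tagged `omitempty`. The field `exampleNewField` is purely hypothetical and not part of the schema; the point of the sketch is that documents written before the field existed still decode, and re-encoding them does not introduce the field.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MailDocumentV2 is a trimmed, hypothetical struct showing how an optional
// field could be added without breaking older documents.
type MailDocumentV2 struct {
	ID      string `json:"_id"`
	DocType string `json:"docType"`
	Mailbox string `json:"mailbox"`
	// ExampleNewField is hypothetical. A nil pointer means "absent".
	ExampleNewField *string `json:"exampleNewField,omitempty"`
}

// decode parses a stored document into the newer struct.
func decode(raw []byte) (MailDocumentV2, error) {
	var doc MailDocumentV2
	err := json.Unmarshal(raw, &doc)
	return doc, err
}

// encode serializes a document; absent optional fields are omitted.
func encode(doc MailDocumentV2) (string, error) {
	raw, err := json.Marshal(doc)
	return string(raw), err
}

func main() {
	old := []byte(`{"_id":"INBOX_123","docType":"mail","mailbox":"INBOX"}`)
	doc, err := decode(old)
	if err != nil {
		panic(err)
	}
	fmt.Println("new field present:", doc.ExampleNewField != nil)
	out, err := encode(doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```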
## Implementation Notes
### CouchDB Features Used
- **Native Attachments**: For email attachments
- **Document IDs**: Predictable format for easy access
- **Bulk Operations**: For efficient storage
- **Conflict Resolution**: CouchDB tracks revisions through `_rev` and rejects updates based on a stale revision
### Performance Considerations
- Index by `docType` for efficient filtering
- Index by `mailbox` for folder-based queries
- Index by `date` for chronological access
- Use bulk insert operations for multiple messages
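
These points map onto two CouchDB HTTP endpoints: `POST /{db}/_index` creates a Mango JSON index, and `POST /{db}/_bulk_docs` stores several documents in one request. Whether mail2couch uses Mango indexes or views is not stated here, so the sketch below is an assumption. It only builds the request bodies, and the helper and index names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// indexRequest builds the JSON body for POST /{db}/_index, asking CouchDB
// for a Mango JSON index over the given fields.
func indexRequest(name string, fields ...string) ([]byte, error) {
	body := map[string]any{
		"index": map[string]any{"fields": fields},
		"name":  name,
		"type":  "json",
	}
	return json.Marshal(body)
}

// bulkDocsRequest builds the JSON body for POST /{db}/_bulk_docs, which
// stores all given documents in a single request.
func bulkDocsRequest(docs []map[string]any) ([]byte, error) {
	return json.Marshal(map[string]any{"docs": docs})
}

func main() {
	// One index per access pattern named in the list above.
	for _, field := range []string{"docType", "mailbox", "date"} {
		body, err := indexRequest("by-"+field, field)
		if err != nil {
			panic(err)
		}
		fmt.Println(string(body))
	}

	docs := []map[string]any{
		{"_id": "INBOX_1", "docType": "mail", "mailbox": "INBOX"},
		{"_id": "INBOX_2", "docType": "mail", "mailbox": "INBOX"},
	}
	body, err := bulkDocsRequest(docs)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```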
### Future Extensions
This schema supports future enhancements:
- **Webmail Views**: CouchDB design documents for HTML interface
- **Search Indexes**: Full-text search with CouchDB-Lucene
- **Replication**: Multi-database sync scenarios
- **Analytics**: Message statistics and reporting