# CouchDB Document Schemas

This document defines the CouchDB document schemas used by mail2couch. These schemas must be maintained consistently across all implementations (Go, Rust, etc.).

## Mail Document Schema

**Document Type**: `mail`
**Document ID Format**: `{mailbox}_{uid}` (e.g., `INBOX_123`)
**Purpose**: Stores individual email messages with metadata and content

```json
{
  "_id": "INBOX_123",
  "_rev": "1-abc123...",
  "_attachments": {
    "attachment1.pdf": {
      "content_type": "application/pdf",
      "length": 12345,
      "stub": true
    }
  },
  "sourceUid": "123",
  "mailbox": "INBOX",
  "from": ["sender@example.com"],
  "to": ["recipient@example.com"],
  "subject": "Email Subject",
  "date": "2025-08-02T12:16:10Z",
  "body": "Email body content",
  "headers": {
    "Content-Type": ["text/plain; charset=utf-8"],
    "Message-ID": ["<msg123@example.com>"],
    "Date": ["Sat, 02 Aug 2025 14:16:10 +0200"]
  },
  "storedAt": "2025-08-02T14:16:22.375241322+02:00",
  "docType": "mail",
  "hasAttachments": true
}
```

### Field Definitions

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `_id` | string | Yes | CouchDB document ID: `{mailbox}_{uid}` |
| `_rev` | string | Auto | CouchDB revision (managed by CouchDB) |
| `_attachments` | object | No | CouchDB native attachments (email attachments) |
| `sourceUid` | string | Yes | Original IMAP UID from mail server |
| `mailbox` | string | Yes | Source mailbox name (e.g., "INBOX", "Sent") |
| `from` | array[string] | Yes | Sender email addresses |
| `to` | array[string] | Yes | Recipient email addresses |
| `subject` | string | Yes | Email subject line |
| `date` | string (ISO8601) | Yes | Email date from headers |
| `body` | string | Yes | Email body content (plain text) |
| `headers` | object | Yes | All email headers; each header name maps to an array of string values |
| `storedAt` | string (ISO8601) | Yes | When document was stored in CouchDB |
| `docType` | string | Yes | Always "mail" for email documents |
| `hasAttachments` | boolean | Yes | Whether email has attachments |

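The field table above could map onto Go types roughly as follows. This is a minimal sketch, not the actual mail2couch definitions: the names `MailDocument`, `Attachment`, and `decodeMail` are hypothetical, and only the JSON field names come from the schema.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Attachment mirrors one CouchDB attachment stub under "_attachments".
// (Hypothetical type name; JSON keys follow the schema.)
type Attachment struct {
	ContentType string `json:"content_type"`
	Length      int64  `json:"length,omitempty"`
	Stub        bool   `json:"stub,omitempty"`
}

// MailDocument is an assumed Go mapping of the mail document schema.
type MailDocument struct {
	ID             string                `json:"_id"`
	Rev            string                `json:"_rev,omitempty"`
	Attachments    map[string]Attachment `json:"_attachments,omitempty"`
	SourceUID      string                `json:"sourceUid"`
	Mailbox        string                `json:"mailbox"`
	From           []string              `json:"from"`
	To             []string              `json:"to"`
	Subject        string                `json:"subject"`
	Date           time.Time             `json:"date"`
	Body           string                `json:"body"`
	Headers        map[string][]string   `json:"headers"`
	StoredAt       time.Time             `json:"storedAt"`
	DocType        string                `json:"docType"`
	HasAttachments bool                  `json:"hasAttachments"`
}

// decodeMail parses a raw CouchDB JSON document into a MailDocument.
func decodeMail(raw []byte) (MailDocument, error) {
	var doc MailDocument
	err := json.Unmarshal(raw, &doc)
	return doc, err
}

func main() {
	raw := []byte(`{
		"_id": "INBOX_123",
		"sourceUid": "123",
		"mailbox": "INBOX",
		"from": ["sender@example.com"],
		"to": ["recipient@example.com"],
		"subject": "Email Subject",
		"date": "2025-08-02T12:16:10Z",
		"body": "Email body content",
		"headers": {"Content-Type": ["text/plain; charset=utf-8"]},
		"storedAt": "2025-08-02T14:16:22.375241322+02:00",
		"docType": "mail",
		"hasAttachments": false
	}`)
	doc, err := decodeMail(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(doc.ID, doc.Mailbox, doc.Subject)
}
```

`_rev` and `_attachments` use `omitempty` so that a new document can be written without them, which matches their "Auto" and "No" entries in the Required column.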
### Attachment Stub Schema

Email attachments are stored as CouchDB native attachments. In the document body they appear as stubs under `_attachments`, keyed by filename:

```json
{
  "filename.ext": {
    "content_type": "mime/type",
    "length": 12345,
    "stub": true
  }
}
```

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `content_type` | string | Yes | MIME type of attachment |
| `length` | integer | No | Size in bytes |
| `stub` | boolean | No | `true` when only attachment metadata is included and the content must be fetched separately |

## Sync Metadata Document Schema

**Document Type**: `sync_metadata`
**Document ID Format**: `sync_metadata_{mailbox}` (e.g., `sync_metadata_INBOX`)
**Purpose**: Tracks synchronization state for incremental syncing

```json
{
  "_id": "sync_metadata_INBOX",
  "_rev": "1-def456...",
  "docType": "sync_metadata",
  "mailbox": "INBOX",
  "lastSyncTime": "2025-08-02T14:26:08.281094+02:00",
  "lastMessageUID": 15,
  "messageCount": 18,
  "updatedAt": "2025-08-02T14:26:08.281094+02:00"
}
```

### Field Definitions

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `_id` | string | Yes | CouchDB document ID: `sync_metadata_{mailbox}` |
| `_rev` | string | Auto | CouchDB revision (managed by CouchDB) |
| `docType` | string | Yes | Always "sync_metadata" for sync documents |
| `mailbox` | string | Yes | Mailbox name this metadata applies to |
| `lastSyncTime` | string (ISO8601) | Yes | When this mailbox was last synced |
| `lastMessageUID` | integer | Yes | Highest IMAP UID processed in last sync |
| `messageCount` | integer | Yes | Number of messages processed in last sync |
| `updatedAt` | string (ISO8601) | Yes | When this metadata was last updated |

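To illustrate how these fields might drive incremental syncing, here is a hedged Go sketch. The type name `SyncMetadata`, the methods `NextUIDRange` and `RecordSync`, and the choice of `uint32` and `int` for the two integer fields are assumptions for illustration. Only the JSON field names come from the schema.

```go
package main

import (
	"fmt"
	"time"
)

// SyncMetadata is an assumed Go mapping of the sync metadata schema.
type SyncMetadata struct {
	ID             string    `json:"_id"`
	Rev            string    `json:"_rev,omitempty"`
	DocType        string    `json:"docType"`
	Mailbox        string    `json:"mailbox"`
	LastSyncTime   time.Time `json:"lastSyncTime"`
	LastMessageUID uint32    `json:"lastMessageUID"`
	MessageCount   int       `json:"messageCount"`
	UpdatedAt      time.Time `json:"updatedAt"`
}

// NextUIDRange returns an IMAP UID set for messages newer than the last
// sync, e.g. "16:*" when LastMessageUID is 15. IMAP servers return the
// last message for "n:*" even when its UID is below n, so callers should
// still skip UIDs that are not greater than LastMessageUID.
func (m SyncMetadata) NextUIDRange() string {
	return fmt.Sprintf("%d:*", uint64(m.LastMessageUID)+1)
}

// RecordSync returns a copy updated after a sync run. The highest UID
// only moves forward; the count reflects the run that just finished.
func (m SyncMetadata) RecordSync(highestUID uint32, processed int, now time.Time) SyncMetadata {
	if highestUID > m.LastMessageUID {
		m.LastMessageUID = highestUID
	}
	m.MessageCount = processed
	m.LastSyncTime = now
	m.UpdatedAt = now
	return m
}

func main() {
	meta := SyncMetadata{
		ID:             "sync_metadata_INBOX",
		DocType:        "sync_metadata",
		Mailbox:        "INBOX",
		LastMessageUID: 15,
		MessageCount:   18,
	}
	fmt.Println(meta.NextUIDRange())
	meta = meta.RecordSync(20, 5, time.Now())
	fmt.Println(meta.LastMessageUID, meta.MessageCount)
}
```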
## Database Naming Convention

**Format**: `m2c_{account_name}`
**Rules**:
- Prefix all databases with `m2c_`
- Convert account names to lowercase
- Replace invalid characters with underscores
- Ensure database name starts with a letter
- If account name starts with non-letter, prefix with `mail_`

**Examples**:
- Account "Personal Gmail" → Database `m2c_personal_gmail`
- Account "123work" → Database `m2c_mail_123work`
- Email "user@example.com" → Database `m2c_user_example_com`

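The naming rules above can be sketched in Go as follows. The function name `DatabaseName` is hypothetical, and the sketch assumes that "invalid characters" means anything other than `a-z`, `0-9`, and `_`. That is a conservative reading which reproduces the three examples, and the real implementation may allow more characters.

```go
package main

import (
	"fmt"
	"strings"
)

// DatabaseName derives a database name from an account name.
// Assumption: only a-z, 0-9 and '_' are kept; every other rune
// becomes a single underscore.
func DatabaseName(account string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(account) {
		if (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9') || r == '_' {
			b.WriteRune(r)
		} else {
			b.WriteByte('_')
		}
	}
	name := b.String()
	// Names that do not start with a letter get the "mail_" prefix.
	if name == "" || name[0] < 'a' || name[0] > 'z' {
		name = "mail_" + name
	}
	return "m2c_" + name
}

func main() {
	for _, account := range []string{"Personal Gmail", "123work", "user@example.com"} {
		fmt.Printf("%q -> %s\n", account, DatabaseName(account))
	}
}
```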
## Document ID Conventions

### Mail Documents
- **Format**: `{mailbox}_{uid}`
- **Examples**: `INBOX_123`, `Sent_456`, `Work/Projects_789` (the `/` must be percent-encoded as `%2F` when the ID appears in a URL path)
- **Uniqueness**: Combination of mailbox and IMAP UID ensures uniqueness

### Sync Metadata Documents
- **Format**: `sync_metadata_{mailbox}`
- **Examples**: `sync_metadata_INBOX`, `sync_metadata_Sent`
- **Purpose**: One metadata document per mailbox for tracking sync state

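A short Go sketch of these ID conventions, including URL construction for mailbox names that contain a slash. The helper names `MailDocID`, `SyncMetadataID`, and `DocURL`, and the base URL and database name used in `main`, are illustrative assumptions.

```go
package main

import (
	"fmt"
	"net/url"
)

// MailDocID builds a mail document ID of the form {mailbox}_{uid}.
func MailDocID(mailbox string, uid uint32) string {
	return fmt.Sprintf("%s_%d", mailbox, uid)
}

// SyncMetadataID builds the per-mailbox sync metadata document ID.
func SyncMetadataID(mailbox string) string {
	return "sync_metadata_" + mailbox
}

// DocURL joins a CouchDB base URL, database, and document ID.
// url.PathEscape encodes "/" as %2F so that an ID such as
// "Work/Projects_789" stays a single path segment.
func DocURL(baseURL, db, docID string) string {
	return baseURL + "/" + url.PathEscape(db) + "/" + url.PathEscape(docID)
}

func main() {
	id := MailDocID("Work/Projects", 789)
	fmt.Println(id)
	fmt.Println(SyncMetadataID("INBOX"))
	// The base URL and database name here are placeholders.
	fmt.Println(DocURL("http://localhost:5984", "m2c_work", id))
}
```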
## Data Type Mappings

### Go to JSON
| Go Type | JSON Type | Example |
|---------|-----------|---------|
| `string` | string | `"text"` |
| `[]string` | array | `["item1", "item2"]` |
| `map[string][]string` | object | `{"key": ["value1", "value2"]}` |
| `time.Time` | string (ISO8601) | `"2025-08-02T14:26:08.281094+02:00"` |
| `uint32` | number | `123` |
| `int` | number | `456` |
| `bool` | boolean | `true` |

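The mappings in this table follow from the default behavior of Go's `encoding/json`. The sketch below encodes one value of several of the listed types. The struct `sample` and the function `encodeSample` are hypothetical and exist only to show the resulting JSON.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// sample holds one field per mapping of interest (illustrative only).
type sample struct {
	UpdatedAt      time.Time           `json:"updatedAt"`
	LastMessageUID uint32              `json:"lastMessageUID"`
	Headers        map[string][]string `json:"headers"`
	HasAttachments bool                `json:"hasAttachments"`
}

// encodeSample marshals a fixed value so the output is reproducible.
func encodeSample() (string, error) {
	zone := time.FixedZone("", 2*60*60) // UTC+02:00
	s := sample{
		UpdatedAt:      time.Date(2025, time.August, 2, 14, 26, 8, 281094000, zone),
		LastMessageUID: 15,
		Headers:        map[string][]string{"Content-Type": {"text/plain; charset=utf-8"}},
		HasAttachments: true,
	}
	out, err := json.Marshal(s)
	return string(out), err
}

func main() {
	out, err := encodeSample()
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

`time.Time` marshals as an RFC 3339 string that keeps the zone offset and trims trailing zeros from the fractional seconds, so the timestamp above is written as `"2025-08-02T14:26:08.281094+02:00"`. Other implementations need to accept that variable precision when reading documents written by Go.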
### Rust Considerations
When implementing in Rust:
- Use `chrono::DateTime<Utc>` for timestamps with ISO8601 serialization
- Use `Vec<String>` for string arrays
- Use `HashMap<String, Vec<String>>` for headers
- Use `serde` with `#[serde(rename = "fieldName")]` for JSON field mapping
- Handle optional fields with `Option<T>`

## Validation Rules

### Required Fields
All documents must include:
- `_id`: Valid CouchDB document ID
- `docType`: Identifies document type for filtering
- `mailbox`: Mailbox name (the source mailbox for mail documents, the tracked mailbox for sync metadata)

### Data Constraints
- Email addresses: No validation enforced (preserve as-is from IMAP)
- Dates: Must be valid ISO8601 format
- UIDs: Must be positive integers
- Document IDs: Must be valid CouchDB IDs (no spaces, special chars)

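As a rough illustration of these rules, the following Go sketch checks the required fields and data constraints of a mail document. `ValidateMailFields` is a hypothetical helper, not the project's validation script. It assumes that RFC 3339 parsing is an acceptable stand-in for "valid ISO8601", and it does not check document ID characters.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"time"
)

// ValidateMailFields checks required fields, the UID, and the date.
// Email addresses are deliberately not validated.
func ValidateMailFields(id, docType, mailbox, sourceUID, date string) error {
	if id == "" {
		return errors.New("_id is required")
	}
	if docType == "" {
		return errors.New("docType is required")
	}
	if mailbox == "" {
		return errors.New("mailbox is required")
	}
	// sourceUid is stored as a string but must hold a positive integer.
	uid, err := strconv.ParseUint(sourceUID, 10, 32)
	if err != nil || uid == 0 {
		return fmt.Errorf("sourceUid %q is not a positive integer", sourceUID)
	}
	// time.Parse accepts fractional seconds with the RFC3339 layout.
	if _, err := time.Parse(time.RFC3339, date); err != nil {
		return fmt.Errorf("date %q is not a valid timestamp: %w", date, err)
	}
	return nil
}

func main() {
	err := ValidateMailFields("INBOX_123", "mail", "INBOX", "123", "2025-08-02T12:16:10Z")
	fmt.Println("valid document:", err)
	err = ValidateMailFields("INBOX_0", "mail", "INBOX", "0", "2025-08-02T12:16:10Z")
	fmt.Println("zero UID:", err)
}
```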
### Attachment Handling
- Store email attachments as CouchDB native attachments
- Preserve original filenames and MIME types
- Use attachment stubs in document metadata
- Support binary content through CouchDB attachment API

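CouchDB's attachment API accepts binary content with `PUT /{db}/{docid}/{attname}?rev={rev}`, using the request's `Content-Type` header as the stored MIME type. The Go sketch below only builds such a request and does not send it. The function name `NewAttachmentRequest` and the base URL and database name in `main` are assumptions.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"net/url"
)

// NewAttachmentRequest builds a PUT request that would upload one
// attachment to an existing document revision.
func NewAttachmentRequest(baseURL, db, docID, rev, filename, mimeType string, content []byte) (*http.Request, error) {
	target := fmt.Sprintf("%s/%s/%s/%s?rev=%s",
		baseURL,
		url.PathEscape(db),
		url.PathEscape(docID),
		url.PathEscape(filename), // keeps the original filename as the attachment name
		url.QueryEscape(rev),
	)
	req, err := http.NewRequest(http.MethodPut, target, bytes.NewReader(content))
	if err != nil {
		return nil, err
	}
	// CouchDB records this header as the attachment's content_type.
	req.Header.Set("Content-Type", mimeType)
	return req, nil
}

func main() {
	// Placeholder server, database, and revision values.
	req, err := NewAttachmentRequest("http://localhost:5984", "m2c_work",
		"INBOX_123", "1-abc123", "attachment1.pdf", "application/pdf", []byte("%PDF"))
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```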
## Backward Compatibility

When modifying schemas:
1. Add new fields as optional
2. Never remove existing fields
3. Maintain existing field types and formats
4. Document any breaking changes clearly
5. Provide migration guidance for existing data

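Rule 1 can be illustrated with a small Go sketch. The field `threadId` and the type `mailV2` are invented for this example and are not part of the schema. The point is that a pointer field with `omitempty` lets documents written before the field existed decode and re-encode unchanged.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mailV2 is a trimmed, hypothetical mail document with one new
// optional field added after the original schema was in use.
type mailV2 struct {
	ID       string  `json:"_id"`
	DocType  string  `json:"docType"`
	Subject  string  `json:"subject"`
	ThreadID *string `json:"threadId,omitempty"` // hypothetical new field
}

// decodeV2 parses a document; a missing threadId leaves ThreadID nil.
func decodeV2(raw []byte) (mailV2, error) {
	var doc mailV2
	err := json.Unmarshal(raw, &doc)
	return doc, err
}

// encodeV2 writes the document; a nil ThreadID is omitted entirely.
func encodeV2(doc mailV2) (string, error) {
	out, err := json.Marshal(doc)
	return string(out), err
}

func main() {
	old := []byte(`{"_id":"INBOX_123","docType":"mail","subject":"Email Subject"}`)
	doc, err := decodeV2(old)
	if err != nil {
		panic(err)
	}
	fmt.Println("has thread ID:", doc.ThreadID != nil)
	out, err := encodeV2(doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

`encoding/json` ignores unknown keys when decoding, so an older reader tolerates documents that carry newer fields. If it writes the decoded struct back, however, those unknown fields are dropped.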
## Implementation Notes

### CouchDB Features Used
- **Native Attachments**: For email attachments
- **Document IDs**: Predictable format for easy access
- **Bulk Operations**: For efficient storage
- **Conflict Resolution**: CouchDB handles revision conflicts

### Performance Considerations
- Index by `docType` for efficient filtering
- Index by `mailbox` for folder-based queries
- Index by `date` for chronological access
- Use bulk insert operations for multiple messages

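One way to realize these points is a Mango index created through `POST /{db}/_index`, plus batched writes through `POST /{db}/_bulk_docs`. The Go sketch below only builds the two request bodies. The helper names, the index name, and the choice of a single compound index over three separate ones are assumptions, not the project's actual setup.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// indexSpec and indexRequest model the body of POST /{db}/_index.
type indexSpec struct {
	Fields []string `json:"fields"`
}

type indexRequest struct {
	Index indexSpec `json:"index"`
	Name  string    `json:"name"`
	Type  string    `json:"type"`
}

// IndexBody builds a JSON (Mango) index definition over the given fields.
func IndexBody(name string, fields ...string) ([]byte, error) {
	return json.Marshal(indexRequest{
		Index: indexSpec{Fields: fields},
		Name:  name,
		Type:  "json",
	})
}

// bulkRequest models the body of POST /{db}/_bulk_docs.
type bulkRequest struct {
	Docs []interface{} `json:"docs"`
}

// BulkDocsBody wraps several documents into one bulk insert body.
func BulkDocsBody(docs ...interface{}) ([]byte, error) {
	return json.Marshal(bulkRequest{Docs: docs})
}

func main() {
	// Hypothetical index name; the fields come from the list above.
	idx, err := IndexBody("mail-by-mailbox-date", "docType", "mailbox", "date")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(idx))

	bulk, err := BulkDocsBody(
		map[string]string{"_id": "INBOX_1", "docType": "mail"},
		map[string]string{"_id": "INBOX_2", "docType": "mail"},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(bulk))
}
```

A compound index on `docType`, `mailbox`, `date` suits queries that filter on the first two fields and sort by date. Separate single-field indexes may fit better if the queries are independent.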
### Future Extensions
This schema supports future enhancements:
- **Webmail Views**: CouchDB design documents for HTML interface
- **Search Indexes**: Full-text search with CouchDB-Lucene
- **Replication**: Multi-database sync scenarios
- **Analytics**: Message statistics and reporting