skyview/debian/usr/share/doc/skyview-adsb/DATABASE.md
Ole-Morten Duesund d80bb3a10f docs: Update DATABASE.md with comprehensive schema and usage documentation
2025-08-31 19:44:15 +02:00

SkyView Database Architecture

This document describes SkyView's SQLite database architecture, migration system, and integration approach for persistent data storage.

Overview

SkyView uses a single SQLite database to store:

  • Historic aircraft data: Position history, message counts, signal strength
  • Callsign lookup data: Cached airline/airport information from external APIs
  • Embedded aviation data: OpenFlights airline and airport databases

Database Design Principles

Embedded Architecture

  • Single SQLite file for all persistent data
  • No external database dependencies
  • Self-contained deployment with embedded schemas
  • Backward compatibility through versioned migrations

Performance Optimization

  • Strategic indexing for time-series aircraft data
  • Efficient lookups for callsign enhancement
  • Configurable data retention policies
  • Query optimization for real-time operations

Data Safety

  • Atomic migration transactions
  • Pre-migration backups for destructive changes
  • Data loss warnings for schema changes
  • Rollback capabilities where possible

Database Schema

Core Tables

schema_info

Tracks database version and applied migrations:

CREATE TABLE schema_info (
    version INTEGER PRIMARY KEY,
    applied_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    description TEXT,
    checksum TEXT
);

aircraft_history

Stores time-series aircraft position and message data:

CREATE TABLE aircraft_history (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    icao TEXT NOT NULL,
    timestamp TIMESTAMP NOT NULL,
    latitude REAL,
    longitude REAL,
    altitude INTEGER,
    speed INTEGER,
    track INTEGER,
    vertical_rate INTEGER,
    squawk TEXT,
    callsign TEXT,
    source_id TEXT NOT NULL,
    signal_strength REAL
);

Indexes:

  • idx_aircraft_history_icao_time: Fast queries by aircraft and time range
  • idx_aircraft_history_timestamp: Time-based cleanup and queries
  • idx_aircraft_history_callsign: Callsign-based searches

airlines

Multi-source airline database with unified schema:

CREATE TABLE airlines (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    alias TEXT,
    iata_code TEXT,
    icao_code TEXT,
    callsign TEXT,
    country TEXT,
    country_code TEXT,
    active BOOLEAN DEFAULT 1,
    data_source TEXT NOT NULL DEFAULT 'unknown',
    source_id TEXT,
    imported_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Indexes:

  • idx_airlines_icao_code: ICAO code lookup (primary for callsign enhancement)
  • idx_airlines_iata_code: IATA code lookup
  • idx_airlines_callsign: Radio callsign lookup
  • idx_airlines_country_code: Country-based filtering
  • idx_airlines_active: Active airlines filtering
  • idx_airlines_source: Data source tracking

airports

Multi-source airport database with comprehensive metadata:

CREATE TABLE airports (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    ident TEXT,
    type TEXT,
    city TEXT,
    municipality TEXT,
    region TEXT,
    country TEXT,
    country_code TEXT,
    continent TEXT,
    iata_code TEXT,
    icao_code TEXT,
    local_code TEXT,
    gps_code TEXT,
    latitude REAL,
    longitude REAL,
    elevation_ft INTEGER,
    scheduled_service BOOLEAN DEFAULT 0,
    home_link TEXT,
    wikipedia_link TEXT,
    keywords TEXT,
    timezone_offset REAL,
    timezone TEXT,
    dst_type TEXT,
    data_source TEXT NOT NULL DEFAULT 'unknown',
    source_id TEXT,
    imported_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Indexes:

  • idx_airports_icao_code: ICAO code lookup
  • idx_airports_iata_code: IATA code lookup
  • idx_airports_ident: Airport identifier lookup
  • idx_airports_country_code: Country-based filtering
  • idx_airports_type: Airport type filtering
  • idx_airports_coords: Geographic coordinate queries
  • idx_airports_source: Data source tracking

callsign_cache

Caches external API lookups and local enrichment for callsign enhancement:

CREATE TABLE callsign_cache (
    callsign TEXT PRIMARY KEY,
    airline_icao TEXT,
    airline_iata TEXT,
    airline_name TEXT,
    airline_country TEXT,
    flight_number TEXT,
    origin_iata TEXT,           -- Departure airport IATA code
    destination_iata TEXT,      -- Arrival airport IATA code  
    aircraft_type TEXT,
    route TEXT,                 -- Full route description
    status TEXT,                -- Flight status (scheduled, delayed, etc.)
    source TEXT NOT NULL DEFAULT 'local',
    cached_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP NOT NULL
);

Route Information Fields:

  • origin_iata: IATA code of departure airport (e.g., "JFK" for New York JFK)
  • destination_iata: IATA code of arrival airport (e.g., "LAX" for Los Angeles)
  • route: Human-readable route description (e.g., "JFK-LAX" or "New York to Los Angeles")
  • status: Current flight status when available from external APIs

These fields enable enhanced flight tracking with origin-destination pairs and route visualization.

Indexes:

  • idx_callsign_cache_expires: Efficient cache cleanup
  • idx_callsign_cache_airline: Airline-based queries
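
Filling airline_icao and flight_number means splitting a received callsign into an airline designator and a flight identifier. The following is a minimal sketch of that step, not SkyView's actual parser: the function name parseCallsign and the pattern (three letters followed by one to four characters starting with a digit) are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// callsignPattern is an assumed shape for airline callsigns: a three-letter
// ICAO airline designator followed by a flight identifier of one to four
// characters that starts with a digit.
var callsignPattern = regexp.MustCompile(`^([A-Z]{3})([0-9][0-9A-Z]{0,3})$`)

// parseCallsign normalizes raw and splits it into the airline designator and
// the flight number. ok is false for anything that does not match, such as
// aircraft registrations used as callsigns.
func parseCallsign(raw string) (airlineICAO, flightNumber string, ok bool) {
	cs := strings.ToUpper(strings.TrimSpace(raw))
	m := callsignPattern.FindStringSubmatch(cs)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	for _, raw := range []string{"SAS123", " baw12ab ", "N12345", "LN-ABC"} {
		airline, flight, ok := parseCallsign(raw)
		fmt.Printf("%q -> airline=%q flight=%q ok=%v\n", raw, airline, flight, ok)
	}
}
```

The extracted designator is what would be matched against airlines.icao_code; callsigns that do not parse can still be cached with empty airline fields.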

data_sources

Tracks loaded external data sources and their metadata:

CREATE TABLE data_sources (
    name TEXT PRIMARY KEY,
    license TEXT NOT NULL,
    url TEXT,
    version TEXT,
    imported_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    record_count INTEGER DEFAULT 0,
    user_accepted_license BOOLEAN DEFAULT 0
);

Database Location Strategy

Path Resolution Order

  1. Explicit configuration: database.path in config file
  2. System service: /var/lib/skyview/skyview.db
  3. User mode: ~/.local/share/skyview/skyview.db
  4. Fallback: ./skyview.db in current directory
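
The resolution order above can be sketched as a small function. This is an illustration under stated assumptions rather than SkyView's implementation: the name resolveDatabasePath, the usable callback, and the rule that a candidate is chosen when its parent directory exists are all hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveDatabasePath walks the documented resolution order and returns the
// first candidate whose parent directory passes the usable check.
// systemPath, homeDir and usable are parameters so the logic can be exercised
// without touching the real filesystem; they are assumptions of this sketch.
func resolveDatabasePath(explicit, systemPath, homeDir string, usable func(dir string) bool) string {
	// 1. Explicit configuration always wins.
	if explicit != "" {
		return explicit
	}
	// 2. System service location, then 3. per-user location.
	candidates := []string{systemPath}
	if homeDir != "" {
		candidates = append(candidates,
			filepath.Join(homeDir, ".local", "share", "skyview", "skyview.db"))
	}
	for _, c := range candidates {
		if c != "" && usable(filepath.Dir(c)) {
			return c
		}
	}
	// 4. Fallback: the current directory.
	return "./skyview.db"
}

// dirExists reports whether dir exists and is a directory.
func dirExists(dir string) bool {
	info, err := os.Stat(dir)
	return err == nil && info.IsDir()
}

func main() {
	home, _ := os.UserHomeDir() // an empty home simply skips the user-mode step
	fmt.Println(resolveDatabasePath("", "/var/lib/skyview/skyview.db", home, dirExists))
}
```

Passing the check as a function keeps the ordering logic testable; a real implementation would probably also verify write access.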

Directory Permissions

  • System: root:root with 755 permissions for /var/lib/skyview/
  • User: User-owned directories with standard permissions
  • Service: skyview-adsb:skyview-adsb user/group for system service

Migration System

Migration Structure

type Migration struct {
    Version     int    // Sequential version number
    Description string // Human-readable description
    Up          string // SQL for applying migration
    Down        string // SQL for rollback (optional)
    DataLoss    bool   // Warning flag for destructive changes
}

Migration Process

  1. Version Check: Compare current schema version with available migrations
  2. Backup: Create automatic backup before destructive changes
  3. Transaction: Wrap each migration in an atomic transaction
  4. Validation: Verify schema integrity after migration
  5. Logging: Record successful migrations in schema_info
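
The version check and backup decision in the steps above can be sketched without a database connection. The helper names pendingMigrations and checksum are hypothetical, and using SHA-256 over the Up SQL for the schema_info checksum column is an assumption; the document does not say how the checksum is computed.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// Migration mirrors the struct shown in "Migration Structure".
type Migration struct {
	Version     int
	Description string
	Up          string
	Down        string
	DataLoss    bool
}

// pendingMigrations returns the migrations newer than current, sorted by
// version, and reports whether any of them is flagged as destructive and
// therefore needs a backup (and user consent) first.
func pendingMigrations(current int, all []Migration) (pending []Migration, needsBackup bool) {
	for _, m := range all {
		if m.Version > current {
			pending = append(pending, m)
			needsBackup = needsBackup || m.DataLoss
		}
	}
	sort.Slice(pending, func(i, j int) bool { return pending[i].Version < pending[j].Version })
	return pending, needsBackup
}

// checksum returns a hex-encoded SHA-256 of the Up SQL (assumed scheme).
func checksum(m Migration) string {
	sum := sha256.Sum256([]byte(m.Up))
	return hex.EncodeToString(sum[:])
}

func main() {
	all := []Migration{
		{Version: 3, Description: "Add callsign lookup cache", Up: "CREATE TABLE callsign_cache (callsign TEXT PRIMARY KEY);"},
		{Version: 1, Description: "Initial schema with aircraft history", Up: "CREATE TABLE aircraft_history (id INTEGER PRIMARY KEY);"},
		{Version: 2, Description: "Add OpenFlights airline and airport data", Up: "CREATE TABLE airlines (id INTEGER PRIMARY KEY);"},
	}
	pending, backup := pendingMigrations(1, all)
	for _, m := range pending {
		fmt.Printf("apply v%d (%s) checksum=%s\n", m.Version, m.Description, checksum(m)[:12])
	}
	fmt.Println("backup required:", backup)
}
```

Each pending migration would then run inside its own transaction, with the schema_info row inserted in the same transaction so that a failure leaves the recorded version unchanged.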

Data Loss Protection

  • Migrations marked with DataLoss: true require explicit user consent
  • Automatic backups created before destructive operations
  • Warning messages displayed during upgrade process
  • Rollback SQL provided where possible

Example Migration Sequence

var migrations = []Migration{
    {
        Version:     1,
        Description: "Initial schema with aircraft history",
        Up:          createInitialSchema,
        DataLoss:    false,
    },
    {
        Version:     2,
        Description: "Add OpenFlights airline and airport data",
        Up:          addAviationTables,
        DataLoss:    false,
    },
    {
        Version:     3,
        Description: "Add callsign lookup cache",
        Up:          addCallsignCache,
        DataLoss:    false,
    },
}

Data Sources and Loading

SkyView supports multiple aviation data sources with automatic conflict resolution and license compliance.

Supported Data Sources

OpenFlights Airlines Database

  • Source: https://openflights.org/data.html
  • License: Open Database License (ODbL) 1.0
  • Content: Global airline data with ICAO/IATA codes, callsigns, and country information
  • Records: ~6,162 airlines
  • Update Method: Runtime download (no license confirmation required)

OpenFlights Airports Database

  • Source: https://openflights.org/data.html
  • License: Open Database License (ODbL) 1.0
  • Content: Global airport data with coordinates, codes, and metadata
  • Records: ~7,698 airports
  • Update Method: Runtime download

OurAirports Database

  • Source: https://ourairports.com/data/
  • License: Creative Commons Zero (CC0) 1.0
  • Content: Comprehensive airport database with detailed metadata
  • Records: ~83,557 airports
  • Update Method: Runtime download

Data Loading System

Intelligent Conflict Resolution

The data loading system uses INSERT OR REPLACE statements to handle overlapping data:

INSERT OR REPLACE INTO airlines (id, name, alias, iata_code, icao_code, callsign, country, active, data_source)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)

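This ensures that:

  • Duplicate records are automatically replaced rather than causing errors
  • Later data sources can override earlier ones
  • Database integrity is maintained during bulk loads

Note that INSERT OR REPLACE deletes the conflicting row before inserting the new one, so any column omitted from the statement (here country_code, source_id, imported_at, and updated_at) is reset to its default value or NULL.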

Loading Process

  1. Source Validation: Verify data source accessibility and format
  2. Incremental Processing: Process data in chunks to manage memory
  3. Error Handling: Log and continue on individual record errors
  4. Statistics Reporting: Track records processed, added, and errors
  5. Source Tracking: Record metadata about each loaded source
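
The chunked, error-tolerant part of the loading process can be sketched as follows. This is a hedged illustration, not the project's loader: the names Airline, LoadStats, parseAirline and loadAirlines are hypothetical, the assumed column order (id, name, alias, IATA, ICAO, callsign, country, active) and the \N null marker should be checked against the actual source files, and the store callback stands in for the database write.

```go
package main

import (
	"encoding/csv"
	"errors"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// Airline holds the columns this sketch keeps from one input row.
type Airline struct {
	ID     int
	Name   string
	ICAO   string
	Active bool
}

// LoadStats mirrors the statistics named in the loading process.
type LoadStats struct{ Processed, Added, Errors int }

// parseAirline validates one CSV record (assumed column layout, see above).
func parseAirline(rec []string) (Airline, error) {
	if len(rec) < 8 {
		return Airline{}, fmt.Errorf("expected 8 fields, got %d", len(rec))
	}
	id, err := strconv.Atoi(rec[0])
	if err != nil {
		return Airline{}, fmt.Errorf("bad id %q: %w", rec[0], err)
	}
	icao := strings.TrimSpace(rec[4])
	if icao == `\N` {
		icao = "" // treat the null marker as an empty value
	}
	return Airline{ID: id, Name: rec[1], ICAO: icao, Active: rec[7] == "Y"}, nil
}

// loadAirlines streams records from r, passes them to store in chunks of
// chunkSize (which must be positive), and counts malformed records instead
// of aborting. store must not keep the slice; it is reused for the next chunk.
func loadAirlines(r io.Reader, chunkSize int, store func([]Airline) error) (LoadStats, error) {
	var stats LoadStats
	reader := csv.NewReader(r)
	reader.FieldsPerRecord = -1 // accept ragged rows; parseAirline validates them
	batch := make([]Airline, 0, chunkSize)
	for {
		rec, err := reader.Read()
		if err == io.EOF {
			break
		}
		var parseErr *csv.ParseError
		if err != nil && !errors.As(err, &parseErr) {
			return stats, err // I/O failure: stop rather than retry forever
		}
		stats.Processed++
		var a Airline
		if err == nil {
			a, err = parseAirline(rec)
		}
		if err != nil {
			stats.Errors++ // count the bad row and continue with the next one
			continue
		}
		batch = append(batch, a)
		if len(batch) >= chunkSize {
			if err := store(batch); err != nil {
				return stats, err
			}
			stats.Added += len(batch)
			batch = batch[:0]
		}
	}
	if len(batch) > 0 {
		if err := store(batch); err != nil {
			return stats, err
		}
		stats.Added += len(batch)
	}
	return stats, nil
}

// sampleAirlines is made-up input with two deliberately broken rows.
const sampleAirlines = `324,"All Nippon Airways",\N,"NH","ANA","ALL NIPPON","Japan","Y"
oops,"Broken row",\N,"","","","","N"
1,"Too short"
4296,"Ryanair",\N,"FR","RYR","RYANAIR","Ireland","Y"
137,"Air France",\N,"AF","AFR","AIRFRANS","France","Y"
`

func loadSample(chunkSize int, store func([]Airline) error) (LoadStats, error) {
	return loadAirlines(strings.NewReader(sampleAirlines), chunkSize, store)
}

func main() {
	stats, err := loadSample(2, func(chunk []Airline) error { return nil })
	fmt.Printf("%+v err=%v\n", stats, err)
}
```

In a real loader the store callback would wrap each chunk in a transaction, which bounds memory use and keeps a failed chunk from leaving partial rows behind.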

Performance Characteristics

  • OpenFlights Airlines: ~6,162 records in ~363ms
  • OpenFlights Airports: ~7,698 records in ~200ms
  • OurAirports: ~83,557 records in ~980ms
  • Error Rate: <0.1% under normal conditions

Configuration Integration

Database Configuration

{
  "database": {
    "path": "/var/lib/skyview-adsb/skyview.db",
    "max_history_days": 7,
    "backup_on_upgrade": true,
    "vacuum_interval": "24h",
    "page_size": 4096
  },
  "callsign": {
    "enabled": true,
    "cache_hours": 24,
    "external_apis": true,
    "privacy_mode": false
  }
}

Configuration Fields

database

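  • path: Database file location (empty = auto-resolve)
  • max_history_days: Retention policy for aircraft history (0 = unlimited)
  • backup_on_upgrade: Create backup before schema migrations
  • vacuum_interval: Interval between automatic VACUUM runs (for example "24h")
  • page_size: SQLite page size in bytes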

callsign

  • enabled: Enable callsign enhancement features
  • cache_hours: TTL for cached external API results
  • external_apis: Allow lookups against external APIs
  • privacy_mode: Disable all external data requests
  • sources: Independent control for each data source
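
For orientation, the configuration shown above could be decoded and sanity-checked along these lines. The struct and function names are hypothetical and the validation rules are a sketch: the page-size rule reflects SQLite's requirement of a power of two between 512 and 65536, while parsing vacuum_interval with Go's duration syntax is an assumption based on the "24h" example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// DatabaseConfig and CallsignConfig mirror the JSON keys documented above.
type DatabaseConfig struct {
	Path            string `json:"path"`
	MaxHistoryDays  int    `json:"max_history_days"`
	BackupOnUpgrade bool   `json:"backup_on_upgrade"`
	VacuumInterval  string `json:"vacuum_interval"`
	PageSize        int    `json:"page_size"`
}

type CallsignConfig struct {
	Enabled      bool `json:"enabled"`
	CacheHours   int  `json:"cache_hours"`
	ExternalAPIs bool `json:"external_apis"`
	PrivacyMode  bool `json:"privacy_mode"`
}

type Config struct {
	Database DatabaseConfig `json:"database"`
	Callsign CallsignConfig `json:"callsign"`
}

// parseConfig decodes data and rejects values that cannot work.
func parseConfig(data []byte) (Config, error) {
	var cfg Config
	if err := json.Unmarshal(data, &cfg); err != nil {
		return cfg, fmt.Errorf("parse config: %w", err)
	}
	if cfg.Database.MaxHistoryDays < 0 {
		return cfg, fmt.Errorf("max_history_days must be >= 0, got %d", cfg.Database.MaxHistoryDays)
	}
	// SQLite page sizes are powers of two between 512 and 65536; 0 means "unset".
	if ps := cfg.Database.PageSize; ps != 0 && (ps < 512 || ps > 65536 || ps&(ps-1) != 0) {
		return cfg, fmt.Errorf("page_size %d is not a power of two in [512, 65536]", ps)
	}
	if v := cfg.Database.VacuumInterval; v != "" {
		if _, err := time.ParseDuration(v); err != nil {
			return cfg, fmt.Errorf("vacuum_interval: %w", err)
		}
	}
	return cfg, nil
}

func main() {
	cfg, err := parseConfig([]byte(`{
		"database": {"path": "", "max_history_days": 7, "vacuum_interval": "24h", "page_size": 4096},
		"callsign": {"enabled": true, "cache_hours": 24}
	}`))
	fmt.Printf("%+v err=%v\n", cfg, err)
}
```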

Enhanced Configuration Example

{
  "callsign": {
    "enabled": true,
    "cache_hours": 24,
    "privacy_mode": false,
    "sources": {
      "openflights_embedded": {
        "enabled": true,
        "priority": 1,
        "license": "ODbL-1.0"
      },
      "faa_registry": {
        "enabled": false,
        "priority": 2,
        "update_frequency": "weekly",
        "license": "public_domain"
      },
      "opensky_api": {
        "enabled": false,
        "priority": 3,
        "timeout_seconds": 5,
        "max_retries": 2,
        "requires_consent": true,
        "license_warning": "Commercial use requires OpenSky Network consent",
        "user_accepts_terms": false
      },
      "custom_database": {
        "enabled": false,
        "priority": 4,
        "path": "",
        "license": "user_verified"
      }
    },
    "fallback_chain": ["openflights_embedded", "faa_registry", "opensky_api", "custom_database"]
  }
}

Individual Source Configuration Options

  • enabled: Enable/disable this specific source
  • priority: Processing order (lower numbers = higher priority)
  • license: License type for compliance tracking
  • requires_consent: Whether source requires explicit user consent
  • user_accepts_terms: User acknowledgment of licensing terms
  • timeout_seconds: Per-source timeout configuration
  • max_retries: Per-source retry limits
  • update_frequency: For downloadable sources (daily/weekly/monthly)
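
How enabled, priority, and the consent flags might combine can be sketched as below. The type SourceConfig and function activeSources are hypothetical, and the sketch orders sources by priority only; how the real lookup chain reconciles priority with fallback_chain is not specified here.

```go
package main

import (
	"fmt"
	"sort"
)

// SourceConfig holds the per-source options that affect selection.
type SourceConfig struct {
	Enabled          bool
	Priority         int
	RequiresConsent  bool
	UserAcceptsTerms bool
}

// activeSources returns the names of sources that may be queried, ordered by
// priority (lower number first, ties broken by name). Sources that require
// consent are skipped until the user has accepted their terms.
func activeSources(sources map[string]SourceConfig) []string {
	names := make([]string, 0, len(sources))
	for name, s := range sources {
		if !s.Enabled {
			continue
		}
		if s.RequiresConsent && !s.UserAcceptsTerms {
			continue
		}
		names = append(names, name)
	}
	sort.Slice(names, func(i, j int) bool {
		pi, pj := sources[names[i]].Priority, sources[names[j]].Priority
		if pi != pj {
			return pi < pj
		}
		return names[i] < names[j]
	})
	return names
}

func main() {
	sources := map[string]SourceConfig{
		"openflights_embedded": {Enabled: true, Priority: 1},
		"faa_registry":         {Enabled: false, Priority: 2},
		"opensky_api":          {Enabled: true, Priority: 3, RequiresConsent: true},
		"custom_database":      {Enabled: true, Priority: 4},
	}
	fmt.Println(activeSources(sources))
}
```

Filtering on consent at selection time means an enabled source without accepted terms can never be queried by accident.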

Debian Package Integration

Package Structure

/var/lib/skyview-adsb/           # Database directory
/etc/skyview-adsb/config.json    # Default configuration
/usr/bin/skyview                 # Main application
/usr/share/skyview/              # Embedded resources

Installation Process

  1. postinst: Create directories, user accounts, permissions
  2. First Run: Database initialization and migration on startup
  3. Upgrades: Automatic schema migration with backup
  4. Service: Systemd integration with proper database access

Service User

  • User: skyview-adsb
  • Home: /var/lib/skyview-adsb
  • Shell: /bin/false (service account)
  • Database: Read/write access to /var/lib/skyview-adsb/

Automatic Database Updates

The systemd service configuration includes automatic database updates on startup:

[Service]
Type=simple
User=skyview-adsb
Group=skyview-adsb
# Update database before starting main service  
ExecStartPre=/usr/bin/skyview-data -config /etc/skyview-adsb/config.json update
TimeoutStartSec=300
ExecStart=/usr/bin/skyview -config /etc/skyview-adsb/config.json

This ensures aviation data sources are refreshed before each service start, complementing the weekly timer-based updates.

Data Retention and Cleanup

Automatic Cleanup

  • Aircraft History: Configurable retention period (max_history_days)
  • Cache Expiration: TTL-based cleanup of external API cache
  • Optimization: Periodic VACUUM operations for storage efficiency
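
The two time-based rules above reduce to simple date arithmetic. The sketch below shows one way to derive the deletion cutoff from max_history_days (with 0 meaning unlimited) and the expires_at value from cache_hours; the function names are hypothetical, and treating negative values like 0 is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// historyCutoff returns the timestamp before which aircraft_history rows are
// eligible for deletion. enabled is false when maxHistoryDays is 0, which the
// configuration defines as unlimited retention (negative values are treated
// the same way in this sketch).
func historyCutoff(now time.Time, maxHistoryDays int) (cutoff time.Time, enabled bool) {
	if maxHistoryDays <= 0 {
		return time.Time{}, false
	}
	return now.AddDate(0, 0, -maxHistoryDays), true
}

// cacheExpiry computes the expires_at value for an entry cached at cachedAt.
func cacheExpiry(cachedAt time.Time, cacheHours int) time.Time {
	return cachedAt.Add(time.Duration(cacheHours) * time.Hour)
}

// mustParse turns an RFC 3339 string into a time.Time for the examples.
func mustParse(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	now := mustParse("2025-08-31T12:00:00Z")
	if cutoff, ok := historyCutoff(now, 7); ok {
		fmt.Println("delete history older than", cutoff.Format(time.RFC3339))
	}
	fmt.Println("cache entry expires at", cacheExpiry(now, 24).Format(time.RFC3339))
}
```

The cutoff would be bound as a parameter to a DELETE on aircraft_history, which idx_aircraft_history_timestamp is meant to serve.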

Manual Maintenance

-- Clean old aircraft history (example: 7 days)
DELETE FROM aircraft_history 
WHERE timestamp < datetime('now', '-7 days');

-- Clean expired cache entries
DELETE FROM callsign_cache 
WHERE expires_at < datetime('now');

-- Optimize database storage
VACUUM;

Database Optimization

SkyView includes a comprehensive database optimization system that automatically manages storage efficiency and performance.

Optimization Features

Automatic VACUUM Operations

  • Full VACUUM: Rebuilds database to reclaim deleted space
  • Incremental VACUUM: Gradual space reclamation with minimal performance impact
  • Scheduled Maintenance: Configurable intervals for automatic optimization
  • Size Reporting: Before/after statistics with space savings metrics

Storage Optimization

  • Page Size Optimization: Configurable SQLite page size for optimal performance
  • Auto-Vacuum Configuration: Enables incremental space reclamation
  • Statistics Updates: ANALYZE operations for query plan optimization
  • Efficiency Monitoring: Real-time storage efficiency reporting

Using the Optimization System

Command Line Interface

# Run comprehensive database optimization
skyview-data optimize

# Run with force flag to skip confirmation prompts
skyview-data optimize --force

# Check current optimization statistics
skyview-data optimize --stats-only

Optimization Output Example

Optimizing database for storage efficiency...
✓ Auto VACUUM: Enable incremental auto-vacuum
✓ Incremental VACUUM: Reclaim free pages incrementally  
✓ Optimize: Update SQLite query planner statistics
✓ Analyze: Update table statistics for better query plans

VACUUM completed in 1.2s: 275.3 MB → 263.1 MB (saved 12.2 MB, 4.4%)

Database optimization completed successfully.
Storage efficiency: 96.6% (263.1 MB used of 272.4 MB allocated)

Configuration Options

{
  "database": {
    "vacuum_interval": "24h",
    "page_size": 4096,
    "enable_compression": true,
    "compression_level": 6
  }
}

Optimization Statistics

The optimization system provides detailed metrics about database performance:

Available Statistics

  • Database Size: Total file size in bytes
  • Page Statistics: Page size, count, and utilization
  • Storage Efficiency: Percentage of allocated space actually used
  • Free Space: Amount of reclaimable space available
  • Auto-Vacuum Status: Current auto-vacuum configuration
  • Last Optimization: Timestamp of most recent optimization
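
Several of these statistics can be derived from three values that SQLite exposes through PRAGMA page_size, PRAGMA page_count, and PRAGMA freelist_count. The sketch below shows one plausible derivation; the PageStats type and the efficiency formula (used pages divided by allocated pages) are assumptions, not necessarily what GetOptimizationStats computes.

```go
package main

import "fmt"

// PageStats holds raw values as reported by SQLite pragmas.
type PageStats struct {
	PageSize      int64 // PRAGMA page_size
	PageCount     int64 // PRAGMA page_count
	FreelistCount int64 // PRAGMA freelist_count
}

// AllocatedBytes is the total space occupied by all pages in the file.
func (s PageStats) AllocatedBytes() int64 { return s.PageSize * s.PageCount }

// FreeBytes is the space held by free-list pages, reclaimable by VACUUM.
func (s PageStats) FreeBytes() int64 { return s.PageSize * s.FreelistCount }

// Efficiency is the percentage of allocated pages that actually hold data.
// An empty database is reported as 100% efficient.
func (s PageStats) Efficiency() float64 {
	if s.PageCount == 0 {
		return 100
	}
	return float64(s.PageCount-s.FreelistCount) / float64(s.PageCount) * 100
}

func main() {
	s := PageStats{PageSize: 4096, PageCount: 1000, FreelistCount: 32}
	fmt.Printf("allocated=%d bytes, free=%d bytes, efficiency=%.1f%%\n",
		s.AllocatedBytes(), s.FreeBytes(), s.Efficiency())
}
```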

Programmatic Access

// Get current optimization statistics
optimizer := NewOptimizationManager(db, config)
stats, err := optimizer.GetOptimizationStats()
if err != nil {
    log.Fatalf("failed to get optimization stats: %v", err)
}

fmt.Printf("Database efficiency: %.1f%%\n", stats.Efficiency)
fmt.Printf("Storage used: %.1f MB\n", float64(stats.DatabaseSize)/(1024*1024))

Performance Considerations

Query Optimization

  • Time-range queries use idx_aircraft_history_icao_time
  • Callsign lookups prioritize local cache over external APIs
  • Bulk operations use transactions for consistency

Storage Efficiency

  • Configurable history limits prevent unbounded growth
  • Automatic VACUUM operations with optimization reporting
  • Compressed timestamps and efficient data types
  • Page size optimization for storage efficiency
  • Auto-vacuum configuration for incremental space reclamation

Memory Usage

  • WAL mode for concurrent read/write access
  • Connection pooling for multiple goroutines
  • Prepared statements for repeated queries

Privacy and Security

Privacy Mode

SkyView includes comprehensive privacy controls through the privacy_mode configuration option:

{
  "callsign": {
    "enabled": true,
    "privacy_mode": true,
    "external_apis": false
  }
}

Privacy Mode Features

  • No External Calls: Completely disables all external API requests
  • Local-Only Lookups: Uses only embedded OpenFlights database for callsign enhancement
  • No Data Transmission: Aircraft data never leaves the local system
  • Compliance: Suitable for sensitive environments requiring air-gapped operation

Privacy Mode Behavior

| Feature | Privacy Mode ON | Privacy Mode OFF |
|---|---|---|
| External API calls | Disabled | Configurable |
| OpenFlights lookup | Enabled | Enabled |
| Callsign caching | Local only | Full caching |
| Data transmission | None | ⚠️ API calls only |
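
The behavior above amounts to a single gate in front of every external request. A minimal sketch, assuming the three configuration flags combine as described (the names CallsignConfig, externalLookupsAllowed, and lookupSteps are illustrative only):

```go
package main

import "fmt"

// CallsignConfig holds the flags that decide whether external calls may happen.
type CallsignConfig struct {
	Enabled      bool // callsign.enabled
	PrivacyMode  bool // callsign.privacy_mode
	ExternalAPIs bool // callsign.external_apis
}

// externalLookupsAllowed is true only when enhancement is on, external APIs
// are configured, and privacy mode is off. Privacy mode always wins.
func externalLookupsAllowed(c CallsignConfig) bool {
	return c.Enabled && c.ExternalAPIs && !c.PrivacyMode
}

// lookupSteps lists the lookup stages that may run for a callsign, in order.
func lookupSteps(c CallsignConfig) []string {
	if !c.Enabled {
		return nil
	}
	steps := []string{"cache", "embedded"}
	if externalLookupsAllowed(c) {
		steps = append(steps, "external")
	}
	return steps
}

func main() {
	private := CallsignConfig{Enabled: true, PrivacyMode: true, ExternalAPIs: true}
	open := CallsignConfig{Enabled: true, PrivacyMode: false, ExternalAPIs: true}
	fmt.Println("privacy mode on: ", lookupSteps(private))
	fmt.Println("privacy mode off:", lookupSteps(open))
}
```

Checking the gate in one place, rather than in each API client, makes it harder for a newly added data source to bypass privacy mode.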

Use Cases for Privacy Mode

  • Military installations: No external data transmission allowed
  • Air-gapped networks: No internet connectivity available
  • Corporate policies: External API usage prohibited
  • Personal privacy: User preference for local-only operation

Security Considerations

File Permissions

  • Database files readable only by skyview-adsb user/group
  • Configuration files protected from unauthorized access
  • Backup files inherit secure permissions

Data Protection

  • Local SQLite database with file-system level security
  • No cloud storage or external database dependencies
  • All aviation data processed and stored locally

Network Security

  • External API calls (when enabled) use HTTPS only
  • No persistent connections to external services
  • Optional certificate validation for API endpoints

Data Integrity

  • Foreign key constraints where applicable
  • Transaction isolation for concurrent operations
  • Checksums for migration verification

Troubleshooting

Common Issues

Database Locked

Error: database is locked

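Solution: Stop the SkyView service, confirm that no other process still has the database file open (SQLite locks the database file itself rather than creating separate lock files), then restart the service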

Migration Failures

Error: migration 3 failed: table already exists

Solution: Check schema version, restore from backup, retry migration

Permission Denied

Error: unable to open database file

Solution: Verify file permissions, check directory ownership, ensure disk space

Diagnostic Commands

# Check database integrity
sqlite3 /var/lib/skyview-adsb/skyview.db "PRAGMA integrity_check;"

# View schema version
sqlite3 /var/lib/skyview-adsb/skyview.db "SELECT * FROM schema_info;"

# Database statistics
sqlite3 /var/lib/skyview-adsb/skyview.db ".dbinfo"

Testing and Quality Assurance

SkyView includes comprehensive test coverage for all database functionality to ensure reliability and data integrity.

Test Coverage Areas

Core Database Functionality

  • Database Creation and Initialization: Connection management, configuration handling
  • Migration System: Schema versioning, upgrade/downgrade operations
  • Connection Pooling: Concurrent access, connection lifecycle management
  • SQLite Pragma Settings: WAL mode, foreign keys, performance optimizations

Data Loading and Management

  • Multi-Source Loading: OpenFlights, OurAirports data integration
  • Conflict Resolution: Upsert operations, duplicate handling
  • Error Handling: Network failures, malformed data recovery
  • Performance Validation: Loading speed, memory usage optimization

Callsign Enhancement System

  • Parsing Logic: Callsign validation, airline code extraction
  • Database Integration: Local lookups, caching operations
  • Search Functionality: Airline filtering, country-based queries
  • Cache Management: TTL handling, cleanup operations

Optimization System

  • VACUUM Operations: Space reclamation, performance monitoring
  • Page Size Optimization: Configuration validation, storage efficiency
  • Statistics Generation: Metrics accuracy, reporting consistency
  • Maintenance Scheduling: Automated optimization, interval management

Test Infrastructure

Automated Test Setup

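// setupTestDatabase creates an isolated test database and a cleanup function.
func setupTestDatabase(t *testing.T) (*Database, func()) {
    t.Helper()
    tempFile, err := os.CreateTemp("", "test_skyview_*.db")
    if err != nil {
        t.Fatalf("create temp file: %v", err)
    }
    tempFile.Close() // only the path is needed; the database opens the file itself
    config := &Config{Path: tempFile.Name()}
    db, err := NewDatabase(config)
    if err != nil {
        os.Remove(tempFile.Name())
        t.Fatalf("open test database: %v", err)
    }
    // Run all migrations (assumes Initialize returns an error).
    if err := db.Initialize(); err != nil {
        db.Close()
        os.Remove(tempFile.Name())
        t.Fatalf("initialize test database: %v", err)
    }
    cleanup := func() {
        db.Close()
        os.Remove(tempFile.Name())
    }
    return db, cleanup
}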

Network-Safe Testing

Tests gracefully handle network connectivity issues:

  • Skip tests requiring external data sources when offline
  • Provide meaningful error messages for connectivity failures
  • Use local test data when external sources are unavailable

Running Tests

# Run all database tests
go test -v ./internal/database/...

# Run tests in short mode (skip long-running network tests)
go test -v -short ./internal/database/...

# Run specific test categories
go test -v -run="TestDatabase" ./internal/database/...
go test -v -run="TestOptimization" ./internal/database/...
go test -v -run="TestCallsign" ./internal/database/...

Future Enhancements

Planned Features

  • Compression: Time-series compression for long-term storage
  • Partitioning: Date-based partitioning for large datasets
  • Replication: Read replica support for high-availability setups
  • Analytics: Built-in reporting and statistics tables
  • Enhanced Route Data: Integration with additional flight tracking APIs
  • Geographic Indexing: Spatial queries for airport proximity searches

Migration Path

  • All enhancements will use versioned migrations
  • Backward compatibility maintained for existing installations
  • Data preservation prioritized over schema optimization
  • Comprehensive testing required for all schema changes