docs: Update DATABASE.md with comprehensive schema and usage documentation

- Document complete database schema including aircraft history and callsign cache
- Add external data source tables and relationships
- Include optimization and maintenance procedures
- Document indexes, performance considerations, and storage requirements
- Provide examples of database queries and operations

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Ole-Morten Duesund 2025-08-31 19:44:15 +02:00
commit d80bb3a10f
2 changed files with 1010 additions and 93 deletions


@ -49,7 +49,7 @@ Stores time-series aircraft position and message data:
```sql
CREATE TABLE aircraft_history (
id INTEGER PRIMARY KEY AUTOINCREMENT,
icao_hex TEXT NOT NULL,
icao TEXT NOT NULL,
timestamp TIMESTAMP NOT NULL,
latitude REAL,
longitude REAL,
@ -59,9 +59,8 @@ CREATE TABLE aircraft_history (
vertical_rate INTEGER,
squawk TEXT,
callsign TEXT,
source_id TEXT,
signal_strength REAL,
message_count INTEGER DEFAULT 1
source_id TEXT NOT NULL,
signal_strength REAL
);
```
@ -71,66 +70,123 @@ CREATE TABLE aircraft_history (
- `idx_aircraft_history_callsign`: Callsign-based searches
#### `airlines`
OpenFlights embedded airline database:
Multi-source airline database with unified schema:
```sql
CREATE TABLE airlines (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
alias TEXT,
iata TEXT,
icao TEXT,
iata_code TEXT,
icao_code TEXT,
callsign TEXT,
country TEXT,
active BOOLEAN DEFAULT 1
country_code TEXT,
active BOOLEAN DEFAULT 1,
data_source TEXT NOT NULL DEFAULT 'unknown',
source_id TEXT,
imported_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```
**Indexes:**
- `idx_airlines_icao`: ICAO code lookup (primary for callsign enhancement)
- `idx_airlines_iata`: IATA code lookup
- `idx_airlines_icao_code`: ICAO code lookup (primary for callsign enhancement)
- `idx_airlines_iata_code`: IATA code lookup
- `idx_airlines_callsign`: Radio callsign lookup
- `idx_airlines_country_code`: Country-based filtering
- `idx_airlines_active`: Active airlines filtering
- `idx_airlines_source`: Data source tracking
#### `airports`
OpenFlights embedded airport database:
Multi-source airport database with comprehensive metadata:
```sql
CREATE TABLE airports (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
ident TEXT,
type TEXT,
city TEXT,
municipality TEXT,
region TEXT,
country TEXT,
iata TEXT,
icao TEXT,
country_code TEXT,
continent TEXT,
iata_code TEXT,
icao_code TEXT,
local_code TEXT,
gps_code TEXT,
latitude REAL,
longitude REAL,
altitude INTEGER,
elevation_ft INTEGER,
scheduled_service BOOLEAN DEFAULT 0,
home_link TEXT,
wikipedia_link TEXT,
keywords TEXT,
timezone_offset REAL,
timezone TEXT,
dst_type TEXT,
timezone TEXT
data_source TEXT NOT NULL DEFAULT 'unknown',
source_id TEXT,
imported_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```
**Indexes:**
- `idx_airports_icao`: ICAO code lookup
- `idx_airports_iata`: IATA code lookup
- `idx_airports_icao_code`: ICAO code lookup
- `idx_airports_iata_code`: IATA code lookup
- `idx_airports_ident`: Airport identifier lookup
- `idx_airports_country_code`: Country-based filtering
- `idx_airports_type`: Airport type filtering
- `idx_airports_coords`: Geographic coordinate queries
- `idx_airports_source`: Data source tracking
#### `callsign_cache`
Caches external API lookups for callsign enhancement:
Caches external API lookups and local enrichment for callsign enhancement:
```sql
CREATE TABLE callsign_cache (
callsign TEXT PRIMARY KEY,
airline_icao TEXT,
airline_iata TEXT,
airline_name TEXT,
airline_country TEXT,
flight_number TEXT,
origin_iata TEXT,
destination_iata TEXT,
origin_iata TEXT, -- Departure airport IATA code
destination_iata TEXT, -- Arrival airport IATA code
aircraft_type TEXT,
route TEXT, -- Full route description
status TEXT, -- Flight status (scheduled, delayed, etc.)
source TEXT NOT NULL DEFAULT 'local',
cached_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
expires_at TIMESTAMP,
source TEXT DEFAULT 'local'
expires_at TIMESTAMP NOT NULL
);
```
**Route Information Fields:**
- **`origin_iata`**: IATA code of departure airport (e.g., "JFK" for New York JFK)
- **`destination_iata`**: IATA code of arrival airport (e.g., "LAX" for Los Angeles)
- **`route`**: Human-readable route description (e.g., "JFK-LAX" or "New York to Los Angeles")
- **`status`**: Current flight status when available from external APIs
These fields enable enhanced flight tracking with origin-destination pairs and route visualization.
**Indexes:**
- `idx_callsign_cache_expires`: Efficient cache cleanup
- `idx_callsign_cache_airline`: Airline-based queries
#### `data_sources`
Tracks loaded external data sources and their metadata:
```sql
CREATE TABLE data_sources (
name TEXT PRIMARY KEY,
license TEXT NOT NULL,
url TEXT,
version TEXT,
imported_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
record_count INTEGER DEFAULT 0,
user_accepted_license BOOLEAN DEFAULT 0
);
```
## Database Location Strategy
@ -195,15 +251,72 @@ var migrations = []Migration{
}
```
## Data Sources and Loading
SkyView supports multiple aviation data sources with automatic conflict resolution and license compliance.
### Supported Data Sources
#### OpenFlights Airlines Database
- **Source**: https://openflights.org/data.html
- **License**: Open Database License (ODbL) 1.0
- **Content**: Global airline data with ICAO/IATA codes, callsigns, and country information
- **Records**: ~6,162 airlines
- **Update Method**: Runtime download (no license confirmation required)
#### OpenFlights Airports Database
- **Source**: https://openflights.org/data.html
- **License**: Open Database License (ODbL) 1.0
- **Content**: Global airport data with coordinates, codes, and metadata
- **Records**: ~7,698 airports
- **Update Method**: Runtime download
#### OurAirports Database
- **Source**: https://ourairports.com/data/
- **License**: Creative Commons Zero (CC0) 1.0
- **Content**: Comprehensive airport database with detailed metadata
- **Records**: ~83,557 airports
- **Update Method**: Runtime download
### Data Loading System
#### Intelligent Conflict Resolution
The data loading system uses **INSERT OR REPLACE** statements, which replace any existing row that has the same primary key, to handle overlapping data:
```sql
INSERT OR REPLACE INTO airlines (id, name, alias, iata_code, icao_code, callsign, country, active, data_source)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
```
This ensures that:
- Duplicate records are automatically updated rather than causing errors
- Later data sources can override earlier ones
- Database integrity is maintained during bulk loads
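The override behaviour can be pictured with an in-memory analogue. This is only a sketch of the semantics, not the SQL path itself; the `Airline` type and `upsert` helper are hypothetical names introduced for illustration.

```go
package main

import "fmt"

// Airline holds a subset of the airlines columns (hypothetical type).
type Airline struct {
	ID         int
	Name       string
	ICAOCode   string
	DataSource string
}

// upsert mimics INSERT OR REPLACE keyed on the primary key: a row with an
// existing id is replaced wholesale, otherwise the row is added.
func upsert(table map[int]Airline, row Airline) (replaced bool) {
	_, replaced = table[row.ID]
	table[row.ID] = row
	return replaced
}

func main() {
	table := map[int]Airline{}
	upsert(table, Airline{ID: 1, Name: "Example Air", ICAOCode: "EXA", DataSource: "openflights"})
	replaced := upsert(table, Airline{ID: 1, Name: "Example Airways", ICAOCode: "EXA", DataSource: "other-source"})

	// The later source wins and the table still holds a single row.
	fmt.Println(replaced, len(table), table[1].Name, table[1].DataSource)
}
```

One consequence worth keeping in mind: in SQLite, `INSERT OR REPLACE` deletes the conflicting row and inserts a new one, so any column left out of the column list falls back to its default rather than keeping its previous value.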
#### Loading Process
1. **Source Validation**: Verify data source accessibility and format
2. **Incremental Processing**: Process data in chunks to manage memory
3. **Error Handling**: Log and continue on individual record errors
4. **Statistics Reporting**: Track records processed, added, and errors
5. **Source Tracking**: Record metadata about each loaded source
#### Performance Characteristics
- **OpenFlights Airlines**: ~6,162 records in ~363ms
- **OpenFlights Airports**: ~7,698 records in ~200ms
- **OurAirports**: ~83,557 records in ~980ms
- **Error Rate**: <0.1% under normal conditions
## Configuration Integration
### Database Configuration
```json
{
"database": {
"path": "/var/lib/skyview/skyview.db",
"path": "/var/lib/skyview-adsb/skyview.db",
"max_history_days": 7,
"backup_on_upgrade": true
"backup_on_upgrade": true,
"vacuum_interval": "24h",
"page_size": 4096
},
"callsign": {
"enabled": true,
@ -294,10 +407,26 @@ var migrations = []Migration{
4. **Service**: Systemd integration with proper database access
### Service User
- User: `skyview`
- Home: `/var/lib/skyview`
- User: `skyview-adsb`
- Home: `/var/lib/skyview-adsb`
- Shell: `/bin/false` (service account)
- Database: Read/write access to `/var/lib/skyview/`
- Database: Read/write access to `/var/lib/skyview-adsb/`
### Automatic Database Updates
The systemd service configuration includes automatic database updates on startup:
```ini
[Service]
Type=simple
User=skyview-adsb
Group=skyview-adsb
# Update database before starting main service
ExecStartPre=/usr/bin/skyview-data -config /etc/skyview-adsb/config.json update
TimeoutStartSec=300
ExecStart=/usr/bin/skyview -config /etc/skyview-adsb/config.json
```
This ensures aviation data sources are refreshed before each service start, complementing the weekly timer-based updates.
## Data Retention and Cleanup
@ -320,6 +449,89 @@ WHERE expires_at < datetime('now');
VACUUM;
```
## Database Optimization
SkyView includes a comprehensive database optimization system that automatically manages storage efficiency and performance.
### Optimization Features
#### Automatic VACUUM Operations
- **Full VACUUM**: Rebuilds database to reclaim deleted space
- **Incremental VACUUM**: Gradual space reclamation with minimal performance impact
- **Scheduled Maintenance**: Configurable intervals for automatic optimization
- **Size Reporting**: Before/after statistics with space savings metrics
#### Storage Optimization
- **Page Size Optimization**: Configurable SQLite page size for optimal performance
- **Auto-Vacuum Configuration**: Enables incremental space reclamation
- **Statistics Updates**: ANALYZE operations for query plan optimization
- **Efficiency Monitoring**: Real-time storage efficiency reporting
### Using the Optimization System
#### Command Line Interface
```bash
# Run comprehensive database optimization
skyview-data optimize
# Run with force flag to skip confirmation prompts
skyview-data optimize --force
# Check current optimization statistics
skyview-data optimize --stats-only
```
#### Optimization Output Example
```
Optimizing database for storage efficiency...
✓ Auto VACUUM: Enable incremental auto-vacuum
✓ Incremental VACUUM: Reclaim free pages incrementally
✓ Optimize: Update SQLite query planner statistics
✓ Analyze: Update table statistics for better query plans
VACUUM completed in 1.2s: 275.3 MB → 263.1 MB (saved 12.2 MB, 4.4%)
Database optimization completed successfully.
Storage efficiency: 96.6% (263.1 MB used of 272.4 MB allocated)
```
#### Configuration Options
```json
{
"database": {
"vacuum_interval": "24h",
"page_size": 4096,
"enable_compression": true,
"compression_level": 6
}
}
```
### Optimization Statistics
The optimization system provides detailed metrics about database performance:
#### Available Statistics
- **Database Size**: Total file size in bytes
- **Page Statistics**: Page size, count, and utilization
- **Storage Efficiency**: Percentage of allocated space actually used
- **Free Space**: Amount of reclaimable space available
- **Auto-Vacuum Status**: Current auto-vacuum configuration
- **Last Optimization**: Timestamp of most recent optimization
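Several of these figures can be derived from three values that SQLite exposes through `PRAGMA page_size`, `PRAGMA page_count`, and `PRAGMA freelist_count`. The sketch below assumes that "storage efficiency" means used pages divided by total pages; the document does not define the formula, and the `PageStats` type is hypothetical.

```go
package main

import "fmt"

// PageStats holds the raw values reported by PRAGMA page_size,
// PRAGMA page_count and PRAGMA freelist_count.
type PageStats struct {
	PageSize      int64
	PageCount     int64
	FreelistCount int64
}

// Efficiency returns the percentage of allocated pages that hold data.
// Assumption: efficiency = (page_count - freelist_count) / page_count.
func (s PageStats) Efficiency() float64 {
	if s.PageCount == 0 {
		return 100
	}
	used := s.PageCount - s.FreelistCount
	return 100 * float64(used) / float64(s.PageCount)
}

// DatabaseBytes is the total size implied by the page statistics.
func (s PageStats) DatabaseBytes() int64 { return s.PageCount * s.PageSize }

// FreeBytes is the space held by free pages, which a VACUUM could reclaim.
func (s PageStats) FreeBytes() int64 { return s.FreelistCount * s.PageSize }

func main() {
	s := PageStats{PageSize: 4096, PageCount: 1000, FreelistCount: 50}
	fmt.Printf("efficiency: %.1f%%, reclaimable: %d bytes\n", s.Efficiency(), s.FreeBytes())
	// prints "efficiency: 95.0%, reclaimable: 204800 bytes"
}
```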
#### Programmatic Access
```go
// Get current optimization statistics
optimizer := NewOptimizationManager(db, config)
stats, err := optimizer.GetOptimizationStats()
if err != nil {
log.Fatal("Failed to get stats:", err)
}
fmt.Printf("Database efficiency: %.1f%%\n", stats.Efficiency)
fmt.Printf("Storage used: %.1f MB\n", float64(stats.DatabaseSize)/(1024*1024))
```
## Performance Considerations
### Query Optimization
@ -329,8 +541,10 @@ VACUUM;
### Storage Efficiency
- Configurable history limits prevent unbounded growth
- Periodic VACUUM operations reclaim deleted space
- Automatic VACUUM operations with optimization reporting
- Compressed timestamps and efficient data types
- Page size optimization for storage efficiency
- Auto-vacuum configuration for incremental space reclamation
### Memory Usage
- WAL mode for concurrent read/write access
@ -428,6 +642,76 @@ sqlite3 /var/lib/skyview/skyview.db "SELECT * FROM schema_info;"
sqlite3 /var/lib/skyview/skyview.db ".dbinfo"
```
## Testing and Quality Assurance
SkyView includes comprehensive test coverage for all database functionality to ensure reliability and data integrity.
### Test Coverage Areas
#### Core Database Functionality
- **Database Creation and Initialization**: Connection management, configuration handling
- **Migration System**: Schema versioning, upgrade/downgrade operations
- **Connection Pooling**: Concurrent access, connection lifecycle management
- **SQLite Pragma Settings**: WAL mode, foreign keys, performance optimizations
#### Data Loading and Management
- **Multi-Source Loading**: OpenFlights, OurAirports data integration
- **Conflict Resolution**: Upsert operations, duplicate handling
- **Error Handling**: Network failures, malformed data recovery
- **Performance Validation**: Loading speed, memory usage optimization
#### Callsign Enhancement System
- **Parsing Logic**: Callsign validation, airline code extraction
- **Database Integration**: Local lookups, caching operations
- **Search Functionality**: Airline filtering, country-based queries
- **Cache Management**: TTL handling, cleanup operations
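To make the parsing case concrete, here is a simplified sketch of the kind of logic such tests might exercise: splitting a callsign into a three-letter ICAO airline designator and a flight identifier. The pattern and the `parseCallsign` function are assumptions for illustration; SkyView's actual validation rules may differ.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// callsignPattern is a simplified assumption: a three-letter ICAO airline
// designator followed by a flight identifier that starts with a digit and
// is at most four characters long.
var callsignPattern = regexp.MustCompile(`^([A-Z]{3})([0-9][0-9A-Z]{0,3})$`)

// parseCallsign normalises the raw callsign (ADS-B callsigns are often
// space-padded) and extracts the airline code and flight identifier.
func parseCallsign(raw string) (airlineICAO, flight string, ok bool) {
	cs := strings.ToUpper(strings.TrimSpace(raw))
	m := callsignPattern.FindStringSubmatch(cs)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	for _, cs := range []string{"SAS1234 ", "baw12ab", "N12345"} {
		airline, flight, ok := parseCallsign(cs)
		fmt.Printf("%q -> airline=%q flight=%q ok=%t\n", cs, airline, flight, ok)
	}
}
```

Registration-style callsigns such as `N12345` do not match and would fall through without airline enrichment; the extracted airline code is what an `icao_code` lookup in the `airlines` table would use.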
#### Optimization System
- **VACUUM Operations**: Space reclamation, performance monitoring
- **Page Size Optimization**: Configuration validation, storage efficiency
- **Statistics Generation**: Metrics accuracy, reporting consistency
- **Maintenance Scheduling**: Automated optimization, interval management
### Test Infrastructure
#### Automated Test Setup
```go
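// setupTestDatabase creates an isolated test database in a temporary file
// and returns it together with a cleanup function.
func setupTestDatabase(t *testing.T) (*Database, func()) {
	t.Helper()
	tempFile, err := os.CreateTemp("", "test_skyview_*.db")
	if err != nil {
		t.Fatalf("create temp file: %v", err)
	}
	tempFile.Close() // only the path is needed; the database opens the file itself
	config := &Config{Path: tempFile.Name()}
	db, err := NewDatabase(config)
	if err != nil {
		os.Remove(tempFile.Name())
		t.Fatalf("open database: %v", err)
	}
	db.Initialize() // Run all migrations
	cleanup := func() {
		db.Close()
		os.Remove(tempFile.Name())
	}
	return db, cleanup
}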
```
#### Network-Safe Testing
Tests gracefully handle network connectivity issues:
- Skip tests requiring external data sources when offline
- Provide meaningful error messages for connectivity failures
- Use local test data when external sources are unavailable
### Running Tests
```bash
# Run all database tests
go test -v ./internal/database/...
# Run tests in short mode (skip long-running network tests)
go test -v -short ./internal/database/...
# Run specific test categories
go test -v -run="TestDatabase" ./internal/database/...
go test -v -run="TestOptimization" ./internal/database/...
go test -v -run="TestCallsign" ./internal/database/...
```
## Future Enhancements
### Planned Features
@ -435,8 +719,11 @@ sqlite3 /var/lib/skyview/skyview.db ".dbinfo"
- **Partitioning**: Date-based partitioning for large datasets
- **Replication**: Read replica support for high-availability setups
- **Analytics**: Built-in reporting and statistics tables
- **Enhanced Route Data**: Integration with additional flight tracking APIs
- **Geographic Indexing**: Spatial queries for airport proximity searches
### Migration Path
- All enhancements will use versioned migrations
- Backward compatibility maintained for existing installations
- Data preservation prioritized over schema optimization
- Comprehensive testing required for all schema changes