Skip to main content

The System of Record

The Data Layer is a versioned, relational database that serves as the immutable system of record for all modeling activities. Every input, execution, and output is permanently archived with full audit trails.
Core Principle: Nothing is ever overwritten. All data is append-only, creating perfect reproducibility and temporal analysis capabilities.

Database Architecture

Core Tables

Input Tables

Immutable records of all versioned scenario configurations.
CREATE TABLE scenarios (
    id VARCHAR PRIMARY KEY,
    created_at TIMESTAMP,
    created_by VARCHAR,
    name VARCHAR,
    description TEXT,
    plan_rules_id VARCHAR,
    operating_assumptions_id VARCHAR,
    version INTEGER,
    parent_scenario_id VARCHAR  -- For scenario variants
);
Key Features:
  • Every scenario is versioned
  • Scenarios can fork from parent scenarios
  • Immutable once created

Processing Tables

Execution metadata connecting inputs to outputs.
CREATE TABLE simulation_runs (
    id VARCHAR PRIMARY KEY,
    scenario_id VARCHAR,
    census_id VARCHAR,
    system_config JSONB,
    
    started_at TIMESTAMP,
    completed_at TIMESTAMP,
    status VARCHAR,  -- 'running', 'completed', 'failed'
    
    projection_start_year INTEGER,
    projection_end_year INTEGER,
    
    error_message TEXT,
    execution_metrics JSONB  -- performance stats
);
The Logbook: This table is the permanent record connecting specific inputs to specific outputs.

Output Tables

Year-by-year company financial snapshots.
CREATE TABLE annual_company_states (
    simulation_run_id VARCHAR,
    year INTEGER,
    source_type VARCHAR,  -- 'actual' or 'simulated'
    
    revenue DECIMAL,
    ebitda DECIMAL,
    total_payroll DECIMAL,
    esop_contribution DECIMAL,
    
    financial_metrics JSONB,
    
    PRIMARY KEY (simulation_run_id, year)
);
Source Type: Distinguishes historical actuals from future projections.

Versioning Strategy

Scenario Versioning

Scenarios can be versioned to track changes over time:
# Original scenario (June 2024)
scenario_v1 = Scenario(
    id="acme_base",
    version=1,
    contribution_amount=500_000
)

# Updated scenario (September 2024)
scenario_v2 = Scenario(
    id="acme_base",
    version=2,
    parent_scenario_id="acme_base_v1",
    contribution_amount=450_000,  # Reduced
    change_log="Reduced contribution due to market conditions"
)

Source Type Classification

All data is tagged with source_type:
  • 'actual': Historical, verified data
  • 'simulated': Forecasted data from model runs
This enables powerful queries:
-- Compare actual vs. projected for year 2024
SELECT 
    actual.repurchase_amount as actual_repurchases,
    simulated.repurchase_amount as projected_repurchases,
    (actual.repurchase_amount - simulated.repurchase_amount) as variance
FROM annual_trust_states actual
JOIN annual_trust_states simulated
    ON actual.year = simulated.year
WHERE actual.source_type = 'actual'
    AND simulated.source_type = 'simulated'
    AND actual.year = 2024;

Temporal Analysis Capabilities

The immutable architecture enables sophisticated temporal queries:
Track how your forecasts changed over time:
-- How did our 2028 repurchase forecast evolve?
SELECT 
    sr.completed_at as forecast_date,
    ats.repurchase_amount as projected_2028_repurchases
FROM simulation_runs sr
JOIN annual_trust_states ats 
    ON sr.id = ats.simulation_run_id
WHERE ats.year = 2028
    AND ats.source_type = 'simulated'
ORDER BY sr.completed_at;
Identify which assumption changes drove result differences:
-- What changed between runs?
SELECT 
    jsonb_diff(
        old_run.operating_assumptions,
        new_run.operating_assumptions
    ) as assumption_changes
FROM scenarios old_run, scenarios new_run
WHERE old_run.id = 'acme_v1'
    AND new_run.id = 'acme_v2';
Compare forecasts to actuals to measure model accuracy:
-- How accurate was our 2024 forecast made in 2023?
SELECT 
    projected.repurchase_amount as forecast,
    actual.repurchase_amount as actual,
    ABS(projected.repurchase_amount - actual.repurchase_amount) / 
        actual.repurchase_amount as error_percentage
FROM annual_trust_states projected
JOIN annual_trust_states actual
    ON projected.year = actual.year
WHERE projected.source_type = 'simulated'
    AND projected.simulation_run_id = 'run_2023_forecast'
    AND actual.source_type = 'actual'
    AND actual.year = 2024;

Data Integrity

Constraints & Validation

-- Ensure share conservation
ALTER TABLE annual_trust_states
    ADD CONSTRAINT shares_balance_check
    CHECK (
        total_shares_outstanding = 
        allocated_shares + suspense_shares + unallocated_shares
    );

-- Prevent negative cash
ALTER TABLE annual_trust_states
    ADD CONSTRAINT cash_positive_check
    CHECK (
        (cash_ledger->>'total_cash')::DECIMAL >= 0
    );

Referential Integrity

-- All simulation runs must reference valid scenarios
ALTER TABLE simulation_runs
    ADD FOREIGN KEY (scenario_id) 
    REFERENCES scenarios(id);

-- All participant snapshots must reference valid runs
ALTER TABLE annual_participant_snapshots
    ADD FOREIGN KEY (simulation_run_id)
    REFERENCES simulation_runs(id)
    ON DELETE CASCADE;

Query Patterns

Common Queries

SELECT * FROM simulation_runs
WHERE scenario_id = 'acme_base'
ORDER BY completed_at DESC
LIMIT 1;

Backup & Recovery

Point-in-Time Recovery

Restore database to any point in history

Snapshot Exports

Export specific runs for offline archival

Incremental Backups

Daily backups with 90-day retention

Disaster Recovery

Multi-region replication for business continuity

Performance Considerations

-- Fast simulation run lookups
CREATE INDEX idx_runs_scenario ON simulation_runs(scenario_id, completed_at);

-- Fast year-based queries
CREATE INDEX idx_trust_year ON annual_trust_states(simulation_run_id, year);

-- Fast participant lookups
CREATE INDEX idx_participant ON annual_participant_snapshots(participant_id, year);
Large tables partitioned by simulation_run_id for faster queries and easier archival.
Pre-computed aggregations for common dashboard queries (e.g., total repurchase obligations by year).

Next Steps

Data Models

Explore the object models built on this database

API Reference

Learn how to query and manipulate data via API