Data Modelling

Designing Schemas for Messy Real-World Business Logic

The database schema is the foundation everything else builds on. Get it right, and your application flows naturally. Get it wrong, and you'll fight the data model for the lifetime of the system.

Good data modelling captures business reality accurately. It handles the exceptions, edge cases, and "but sometimes..." scenarios that real businesses have. Poor data modelling forces workarounds, creates data integrity issues, and makes simple queries complicated.

Architecture Depth
Level 1: The Tutorial View.

Standard relational model. Users have Orders. Orders have Items. This works for demos, but fails when business logic gets messy.
Example schema:

users: id BIGINT, email VARCHAR, tier ENUM
orders: id BIGINT, user_id BIGINT, status VARCHAR
order_lines: id BIGINT, order_id BIGINT, qty INT
pricing_rules: id BIGINT, name VARCHAR, conditions JSONB, actions JSONB
products: id BIGINT, sku VARCHAR, attributes JSONB
services: id BIGINT, duration INT, renew_at DATE
audit_log: id BIGINT, subject_id BIGINT, subject_type TEXT, changes JSONB

The Constraint: Business Logic is Messy

Tutorial schemas are clean. A user has many orders. An order has many line items. Foreign keys point where they should. The happy path works.

Then reality arrives. A customer merges with another company, and you need to combine their order histories without losing audit trails. A product gets discontinued, but existing orders still reference it. An invoice needs to reflect the price at the time of sale, not the current price. A manager needs to see "what did this record look like last Tuesday?"

These aren't edge cases. They're the normal state of business software that runs for more than six months. The schema either handles them gracefully or forces increasingly desperate workarounds in application code.

The pattern: Most schema problems aren't discovered during development. They surface six months later when someone asks for a report that the schema makes impossible, or when a business rule changes and you realise the data model assumed it never would.

Five Decisions That Shape Every Schema

Before diving into patterns and SQL, these are the decisions you will face on every project. The rest of this page gives you the frameworks to make them deliberately rather than by accident.

1. Normalisation level

Start normalised. Denormalise only for measured performance problems, using materialised views or snapshot-on-write. Document every trade-off.

2. Temporal data approach

Will anyone ask "what was true on this date?" If yes (and they will), choose between effective dating, snapshot-on-write, or system-versioned tables before you have live data.

3. Soft delete strategy

Soft deletes add query complexity and index bloat. Consider partial indexes, view abstractions, archive tables, or hard deletes with audit logging. For UK businesses, factor in GDPR right-to-erasure obligations.

4. Tenancy pattern

Row-level tenant_id, schema-per-tenant, or database-per-tenant. This decision cascades into every constraint, index, migration, and backup strategy.

5. Constraint philosophy

The database is the last line of defence. Encode business rules as constraints (CHECK, UNIQUE, EXCLUSION, foreign keys) rather than relying on application code alone. Application code has bugs. Constraints don't.


Domain Modelling: Start with the Language

Before writing any SQL, map the language of the business. What do they call things? An "account" might mean a customer to sales, a login to IT, and a billing entity to finance. A "project" in one department might overlap with "job" in another.

These aren't pedantic distinctions. They determine table names, relationship structures, and the queries you'll write for years. Get them wrong, and every developer who touches the codebase will need a translation guide.

1. Interview the domain experts

Talk to the people who do the work. They know the exceptions. "Usually orders have one customer, but sometimes we split billing between two." These details determine schema structure.

2. Build a glossary

Document terms and their meanings across departments. When sales says "account" and support says "account", do they mean the same entity? Resolve ambiguities before encoding them in tables.

3. Map the relationships

Draw the connections. Can a customer have multiple billing addresses? Can a line item span multiple orders? Can an employee report to multiple managers? Cardinality matters.

4. Find the lifecycle events

What happens when an order is cancelled? When a customer churns? When a product is discontinued? These events reveal requirements that happy-path discussions miss.

The result is a ubiquitous language: terms that mean the same thing in code, in the database, and in conversations with the business. When the code says Invoice, everyone knows what that means. No translation required. This shared vocabulary is essential for building a single source of truth.


Normalisation: The Naive Approach vs. Deliberate Design

The Naive Approach: Normalise Everything

Database courses teach normalisation as dogma. Third normal form. No redundant data. Every fact stored once. In isolation, this is correct.

In production, it creates problems: the "show me the order summary" query that joins six tables, the report that runs overnight because it aggregates across perfectly normalised tables, the API endpoint that times out because it needs to reconstruct data from fifteen related records.

The Other Naive Approach: Denormalise Everything

The pendulum swings. "Normalisation is slow, let's just copy data everywhere." Now you have synchronisation bugs. The customer's name is different in the orders table than in the customers table. Someone updated one and not the other. Which is correct? No one knows.

The Better Pattern: Normalise First, Then Denormalise Deliberately

Start with a normalised design. Understand the canonical structure. Then, with specific, measured performance problems in hand, introduce denormalisation where the trade-offs are worth it.

Scenario | Normalised approach | When to denormalise
Customer name on orders | Join to customers table | Copy at order time if the order must preserve the commercial record at the time of sale. This is about snapshot semantics, not join cost.
Order totals | SUM(line_items.price * quantity) | Store a computed total if the calculation is expensive and totals are queried frequently.
Product categories | Junction table for many-to-many | Keep the junction table as the source of truth. If reads are hot, add a materialised view or a cached display column.
Reporting aggregates | Calculate on demand | Materialised views or summary tables for dashboards that run constantly.

The key word is "deliberately". Every denormalisation is a trade-off: storage vs. computation, consistency vs. performance. Document why you made the trade-off. Future developers (including yourself in six months) will need to know.
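For the reporting-aggregates row in particular, a materialised view is the deliberate middle ground: the normalised tables stay the source of truth, and the expensive aggregation is pre-computed on a schedule. A minimal sketch, assuming order_lines carries price and qty columns (names are illustrative):

-- Pre-computed daily revenue per customer
CREATE MATERIALIZED VIEW daily_order_totals AS
SELECT o.customer_id,
       date_trunc('day', o.created_at) AS order_day,
       SUM(ol.price * ol.qty)          AS revenue,
       COUNT(DISTINCT o.id)            AS order_count
FROM orders o
JOIN order_lines ol ON ol.order_id = o.id
GROUP BY o.customer_id, date_trunc('day', o.created_at);

-- Unique index enables REFRESH ... CONCURRENTLY, so readers are not blocked
CREATE UNIQUE INDEX idx_daily_order_totals
ON daily_order_totals (customer_id, order_day);

-- Refresh from a scheduled job, not on every write
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_order_totals;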

The Decision Framework: Normalised vs. Denormalised vs. JSONB

Three storage approaches, each with a specific sweet spot. The wrong choice costs you either performance or data integrity.

Question | Normalised columns | Denormalised copy | JSONB column
Do you filter or sort by this data? | Yes. B-tree indexes, type checking, constraints. | Possible, but sync risk on every write. | Slower for most queries. GIN indexes work well for containment and existence checks, but range queries and sorting remain expensive compared to proper columns.
Does the structure vary per record? | No. Fixed columns cannot adapt. | No. Same limitation. | Yes. Form builder responses, API payloads, user-defined config.
Do you need referential integrity? | Yes. Foreign keys enforced by the database. | Partial. Copies break referential links. | No. Cannot reference other rows from inside JSON.
Is read performance the bottleneck? | Joins can be slow for complex aggregations. | Pre-computed values avoid joins. Use materialised views as a safer middle ground. | Single-row reads are fast, but filtering across rows is slow.

If you find yourself reaching for JSONB because you're unsure of the structure, pause. In most cases, the structure is known but the developer hasn't spoken to the domain experts yet. Talk to them first. The data model usually reveals itself.


Temporal Data: The Problem Everyone Ignores Until It's Too Late

Business data changes over time. Prices change. Employees change roles. Customers change addresses. A schema that only stores current state will eventually fail someone who needs to answer: "What was true on this date?"

The Constraint

An auditor asks: "What was this customer's credit limit when this order was approved?" The order was placed in March. The credit limit was updated in April. Your schema stores only the current credit limit. You cannot answer the question.

Effective Dating

Records that are valid for a period use effective dates: effective_from and effective_to. The current record has effective_to = NULL or a far-future date.

-- Required for EXCLUDE with BIGINT inside a GiST index
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- Price history table
CREATE TABLE product_prices (
    id BIGINT PRIMARY KEY,
    product_id BIGINT REFERENCES products(id),
    price DECIMAL(10,2) NOT NULL,
    effective_from DATE NOT NULL,
    effective_to DATE,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    CONSTRAINT no_overlap EXCLUDE USING gist (
        product_id WITH =,
        daterange(effective_from, effective_to, '[)') WITH &&
    )
);

The exclusion constraint (PostgreSQL) prevents overlapping date ranges for the same product. This is the database enforcing business rules that application code would otherwise need to check on every insert.

Querying Effective-Dated Data

To find the price on a specific date:

SELECT price
FROM product_prices
WHERE product_id = 123
  AND daterange(effective_from, effective_to, '[)') @> DATE '2025-03-15';

This pattern applies to any data that changes over time: employee roles, customer addresses, tax rates, exchange rates, pricing tiers.

Snapshot vs. Temporal Tables

Snapshot approach

Copy relevant data at transaction time. The order stores the price, customer name, and address as they were when the order was placed. Simple to query. Duplicates data.

Use when: You need fast reads and the data is copied at a clear point in time (order placement, invoice generation).
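A minimal sketch of snapshot-on-write, reusing the order_lines name from the example schema with illustrative columns: the line captures the price and description as sold, regardless of later catalogue changes.

CREATE TABLE order_lines (
    id BIGINT PRIMARY KEY,
    order_id BIGINT NOT NULL REFERENCES orders(id),
    product_id BIGINT REFERENCES products(id),    -- link kept for reporting
    description TEXT NOT NULL,                    -- copied from the product at order time
    unit_price DECIMAL(10,2) NOT NULL,            -- price as sold, not the current price
    qty INT NOT NULL CHECK (qty > 0)
);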

Temporal tables

Maintain history in a separate table or with effective dates. Query "as of" any point in time. More complex queries. Row versions are duplicated into the history table, but the source of truth remains a single record at any given point.

Use when: You need to query historical state at arbitrary points, or when the same data is referenced by many records.

SQL Server offers system-versioned temporal tables that handle history automatically. PostgreSQL takes a different approach: use effective-dated rows, history tables, or triggers. PostgreSQL 18 adds temporal constraints such as WITHOUT OVERLAPS, but does not generate history tables automatically. For MySQL or SQLite, you'll implement temporal patterns manually.
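One common PostgreSQL approach is a trigger that copies the outgoing row into a history table before each update. A minimal sketch, assuming a customers table with an id column; the history table and function names are illustrative:

-- History table stores prior versions of each row
CREATE TABLE customers_history (
    history_id BIGSERIAL PRIMARY KEY,
    customer_id BIGINT NOT NULL,
    row_data JSONB NOT NULL,                              -- the row as it looked before the change
    changed_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE FUNCTION customers_record_history() RETURNS TRIGGER AS $$
BEGIN
    -- Capture the outgoing version before the update overwrites it
    INSERT INTO customers_history (customer_id, row_data)
    VALUES (OLD.id, to_jsonb(OLD));
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER customers_history_trigger
BEFORE UPDATE ON customers
FOR EACH ROW EXECUTE FUNCTION customers_record_history();

Querying "as of" a point in time then means combining the current row with its history rows by changed_at.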


Polymorphic Relationships: Several Implementation Options

A "note" that can belong to either a customer or an order. An "attachment" linked to invoices, projects, or support tickets. These are polymorphic relationships: one table relates to multiple other tables.

The Naive Approach: Morph Columns

Laravel's morphTo pattern stores notable_type and notable_id:

CREATE TABLE notes (
    id BIGINT PRIMARY KEY,
    notable_type VARCHAR(255),  -- 'App\Models\Customer' or 'App\Models\Order'
    notable_id BIGINT,
    content TEXT,
    created_at TIMESTAMP
);

This works at the application level but sacrifices database integrity. There's no foreign key constraint. notable_id might reference a row that doesn't exist. The database cannot enforce referential integrity.

The Better Pattern: Multiple Foreign Keys

Use nullable foreign keys for each possible parent:

CREATE TABLE notes (
    id BIGINT PRIMARY KEY,
    customer_id BIGINT REFERENCES customers(id),
    order_id BIGINT REFERENCES orders(id),
    content TEXT,
    created_at TIMESTAMP,
    CONSTRAINT exactly_one_parent CHECK (
        (customer_id IS NOT NULL)::int +
        (order_id IS NOT NULL)::int = 1
    )
);

The check constraint ensures exactly one parent is set. The foreign keys ensure the parent exists. The database enforces the business rules.

The Junction Table Pattern

For true many-to-many polymorphism, use separate junction tables:

CREATE TABLE customer_notes (
    customer_id BIGINT REFERENCES customers(id),
    note_id BIGINT REFERENCES notes(id),
    PRIMARY KEY (customer_id, note_id)
);

CREATE TABLE order_notes (
    order_id BIGINT REFERENCES orders(id),
    note_id BIGINT REFERENCES notes(id),
    PRIMARY KEY (order_id, note_id)
);

More tables, but full referential integrity and the ability for a note to belong to multiple parents if the business requires it.

Trade-off: Morph columns are convenient for rapid development. Multiple foreign keys are safer for production systems where data integrity matters. Choose based on how much you trust your application code to never make mistakes.


Soft Deletes and Archival Strategies

Deleting a record removes it permanently. Sometimes that's correct. Often it isn't. Audit requirements, reporting needs, and the ability to undo mistakes all argue for soft deletes.

The Implementation

ALTER TABLE customers ADD COLUMN deleted_at TIMESTAMP NULL;

A record with deleted_at IS NULL is active. A record with a timestamp is soft-deleted. Every query that should exclude deleted records needs:

SELECT * FROM customers WHERE deleted_at IS NULL;

The Problem: Query Complexity

Every developer must remember to add the filter. Every query, every join, every subquery. Miss one, and deleted records appear in reports. ORMs like Laravel handle this with global scopes, but raw queries bypass them.

The Better Pattern: Partial Indexes and Views

Create a view for active records:

CREATE VIEW active_customers AS
SELECT * FROM customers WHERE deleted_at IS NULL;

Application code queries the view by default. The underlying table is available when you need historical data.

Add a partial index for queries that filter on active records:

CREATE INDEX idx_customers_active_email
ON customers(email)
WHERE deleted_at IS NULL;

This index only includes active records, making it smaller and faster for the common case.

Archival for Large Tables

Soft deletes keep everything in one table. For high-volume tables (millions of rows), this slows queries even with indexes. The archival pattern moves old data to separate tables:

-- Archive table with defaults and CHECK constraints (does not copy indexes or foreign keys)
CREATE TABLE orders_archive (LIKE orders INCLUDING DEFAULTS INCLUDING CONSTRAINTS);

-- Atomic move using DELETE ... RETURNING
WITH moved_rows AS (
    DELETE FROM orders
    WHERE created_at < NOW() - INTERVAL '2 years'
    RETURNING *
)
INSERT INTO orders_archive
SELECT * FROM moved_rows;

The CTE deletes and inserts in a single atomic operation. Active queries hit the smaller table. Historical reports union both tables when needed. For very large tables, consider PostgreSQL's native range partitioning: if designed up front, the query planner automatically prunes old partitions and routes new rows to the correct partition. It won't archive data for you, but it removes the need for manual data movement.
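A minimal sketch of that partitioned design, assuming the orders table is created this way from the start (partition names and ranges are illustrative):

-- Parent table declares the partition key; rows live in the partitions
CREATE TABLE orders (
    id BIGINT NOT NULL,
    customer_id BIGINT NOT NULL,
    status VARCHAR(50) NOT NULL,
    total DECIMAL(10,2) NOT NULL,
    created_at TIMESTAMP NOT NULL,
    PRIMARY KEY (id, created_at)   -- the partition key must be part of the primary key
) PARTITION BY RANGE (created_at);

CREATE TABLE orders_2024 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
CREATE TABLE orders_2025 PARTITION OF orders
    FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');

-- Queries filtering on created_at touch only the relevant partitions
SELECT count(*) FROM orders WHERE created_at >= DATE '2025-01-01';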

  • Soft delete: Add a deleted_at column. Simple, but every query needs the filter.
  • View abstraction: Query active_* views. Deleted records hidden by default.
  • Table archival: Separate tables for old data. Best for high volume.

Data Retention and GDPR Considerations

For UK businesses, soft deletes create a specific legal exposure. Under UK GDPR, a customer who exercises their right to erasure expects their personal data to be deleted, not just hidden behind a deleted_at timestamp. The storage limitation principle, which the ICO's guidance spells out, requires that personal data is kept no longer than necessary for its original purpose.

This means your soft-delete strategy needs a data lifecycle plan.

  • Anonymise rather than delete: Replace personal data with anonymised values while preserving the record for reporting and referential integrity. The order history stays intact; the customer name becomes "Customer #12847".
  • Define retention periods by data category: Transactional records (invoices, payments) have legal retention requirements. Marketing preferences don't. Set different retention policies and automate the lifecycle.
  • Separate personal data from business data: If customer contact details are in a separate table from order records, you can purge personal data without losing transactional history.

A schema that treats "deleted" as a single concept will struggle with these requirements. Design the data lifecycle into the schema from the start, not as a compliance afterthought.
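A minimal sketch of the anonymise-rather-than-delete route (column names such as name, phone, and anonymised_at are illustrative assumptions):

-- Erasure request: scrub the identity, keep the transactional record
UPDATE customers
SET name  = 'Customer #' || id,
    email = 'erased-' || id || '@example.invalid',   -- placeholder keeps NOT NULL/UNIQUE satisfied
    phone = NULL,
    anonymised_at = CURRENT_TIMESTAMP
WHERE id = 12847;

-- Order history still reports correctly; only the identity is gone
SELECT o.id, o.total, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE c.id = 12847;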


JSON Columns vs. Relational Tables

PostgreSQL's JSONB and MySQL's JSON columns offer schema flexibility within a relational database. Used correctly, they simplify certain patterns. Used incorrectly, they recreate all the problems of document databases without the benefits.

Good Uses for JSON Columns

Configuration that varies by record

Each customer has different notification preferences. Each product has different configurable options. The structure varies, and you don't query by these fields. Store as JSON, validate in application code.

External API responses

You store webhook payloads or API responses for debugging and audit purposes. You don't query by their contents. JSON preserves the original structure without schema maintenance.

Truly variable structures

Form builder responses where each form has different fields. Survey answers where questions vary. The structure is user-defined and cannot be predicted at schema design time.

Poor Uses for JSON Columns

Data you filter or sort by

Extracting values from JSON for WHERE clauses is slow. Even with JSON path indexes, it's slower than a proper column. If you query by it, it should be a column.

Data with consistent structure

If every record has the same fields, use columns. You get type checking, constraints, and indexing. JSON provides flexibility you're not using.

Relationships between records

Storing foreign keys in JSON arrays prevents referential integrity. The database cannot enforce that referenced records exist. Use junction tables for relationships.

The Hybrid Pattern

Use columns for data you query, JSON for data you don't:

CREATE TABLE orders (
    id BIGINT PRIMARY KEY,
    customer_id BIGINT REFERENCES customers(id),
    status VARCHAR(50) NOT NULL,           -- Queried frequently
    total DECIMAL(10,2) NOT NULL,          -- Queried frequently
    shipping_address_id BIGINT REFERENCES addresses(id),  -- Relational
    metadata JSONB,                         -- Webhook data, notes, flexible config
    created_at TIMESTAMP
);

-- Index the columns you query
CREATE INDEX idx_orders_status ON orders(status);
CREATE INDEX idx_orders_customer ON orders(customer_id);

-- Optional: index specific JSON paths if you must query them
CREATE INDEX idx_orders_source ON orders((metadata->>'source'));

This gives you relational rigour where it matters and flexibility where it doesn't.
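If you do occasionally need to filter by the JSONB contents beyond a single known path, a GIN index supports containment and existence queries. A brief sketch using the metadata column above (the 'source' key is the same illustrative example):

-- GIN index over the whole JSONB document
CREATE INDEX idx_orders_metadata ON orders USING gin (metadata);

-- Containment query that can use the index
SELECT id, total
FROM orders
WHERE metadata @> '{"source": "webhook"}';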


Indexing Strategies for Common Query Patterns

Indexes speed up reads and slow down writes. Every index must earn its place. The strategy depends on your query patterns, not abstract best practices.

The Naive Approach: Index Everything

Add an index to every column. Queries are fast. Inserts and updates are slow. The database maintains indexes it never uses. Storage grows.

The Other Naive Approach: Index Nothing

Let the database figure it out. It won't. Full table scans on every query. The application slows to a crawl as data grows.

The Better Pattern: Index for Your Queries

Analyse your actual query patterns. Index the columns that appear in WHERE clauses, JOIN conditions, and ORDER BY clauses. Use EXPLAIN ANALYZE to verify indexes are being used.

Single-Column Indexes

The basics. Index columns you filter by:

-- If you frequently query orders by status
CREATE INDEX idx_orders_status ON orders(status);

-- If you frequently query customers by email
CREATE UNIQUE INDEX idx_customers_email ON customers(email);

Composite Indexes

For queries that filter on multiple columns:

-- Queries like: WHERE customer_id = ? AND status = ?
CREATE INDEX idx_orders_customer_status ON orders(customer_id, status);

-- Column order matters: this index helps queries that filter by customer_id alone,
-- but not queries that filter only by status

The leftmost columns in a composite index can satisfy queries independently. A (customer_id, status, created_at) index helps queries filtering by customer_id, or by customer_id + status, or by all three. It does not help queries filtering only by status or only by created_at.

Covering Indexes

Include columns the query selects to avoid table lookups:

-- Query: SELECT status, total FROM orders WHERE customer_id = ?
CREATE INDEX idx_orders_customer_covering
ON orders(customer_id) INCLUDE (status, total);

The index contains all the data the query needs. PostgreSQL can use an index-only scan and avoid most heap lookups when the visibility map allows it. Faster reads at the cost of larger indexes.

Partial Indexes

Index a subset of rows:

-- Most queries only care about active orders
CREATE INDEX idx_orders_active_customer
ON orders(customer_id)
WHERE status NOT IN ('cancelled', 'completed');

-- Most queries exclude soft-deleted records
CREATE INDEX idx_customers_active
ON customers(email)
WHERE deleted_at IS NULL;

Smaller indexes. Faster maintenance. Only useful if your queries include the same filter condition.

Expression Indexes

Index computed values:

-- Queries like: WHERE LOWER(email) = ?
CREATE INDEX idx_customers_email_lower ON customers(LOWER(email));

-- Queries like: WHERE DATE(created_at) = ?
CREATE INDEX idx_orders_created_date ON orders(DATE(created_at));

The query must use the exact same expression for the index to apply.

Measure, don't guess: Use EXPLAIN ANALYZE to see which indexes are actually used. Check pg_stat_user_indexes (PostgreSQL) to find unused indexes. Remove indexes that cost more than they help.
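A starting-point query for PostgreSQL, listing indexes that have not been scanned since statistics were last reset. Treat the output with care: unique indexes backing constraints, and indexes used only on replicas or by rare jobs, will also show zero scans.

-- Indexes with no recorded scans, largest first
SELECT schemaname,
       relname       AS table_name,
       indexrelname  AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
       idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;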


State Machines and Status Fields

Orders move through states: draft, submitted, approved, shipped, completed, cancelled. Modelling state transitions correctly prevents impossible states and provides audit trails.

The Naive Approach: Status Column Only

ALTER TABLE orders ADD COLUMN status VARCHAR(50) DEFAULT 'draft';

The application updates the status. No record of when transitions happened. No enforcement of valid transitions. An order can jump from "draft" to "shipped" if the code has a bug.

The Better Pattern: Status with Transition History

CREATE TABLE order_status_transitions (
    id BIGINT PRIMARY KEY,
    order_id BIGINT REFERENCES orders(id),
    from_status VARCHAR(50),
    to_status VARCHAR(50) NOT NULL,
    transitioned_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    transitioned_by BIGINT REFERENCES users(id),
    notes TEXT
);

-- The orders table stores current status for fast queries
ALTER TABLE orders ADD COLUMN current_status VARCHAR(50) DEFAULT 'draft';
ALTER TABLE orders ADD COLUMN current_status_at TIMESTAMP;

Insert a transition record whenever status changes. The orders table has the current status for filtering. The transitions table has the complete history for audit and reporting. For system-wide change history, see our audit trails guide.
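A minimal sketch of a single status change, wrapped in a transaction so the current status and its history row cannot drift apart (the ids, statuses, and the generated primary key on the transitions table are illustrative assumptions):

BEGIN;

-- Update the denormalised current status on the order
UPDATE orders
SET current_status = 'approved',
    current_status_at = CURRENT_TIMESTAMP
WHERE id = 42;

-- Record the transition in the history table (assumes id is generated)
INSERT INTO order_status_transitions (order_id, from_status, to_status, transitioned_by, notes)
VALUES (42, 'submitted', 'approved', 7, 'Credit check passed');

COMMIT;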

Valid Transition Enforcement

Define which transitions are allowed:

CREATE TABLE valid_status_transitions (
    from_status VARCHAR(50),
    to_status VARCHAR(50),
    PRIMARY KEY (from_status, to_status)
);

INSERT INTO valid_status_transitions VALUES
    ('draft', 'submitted'),
    ('submitted', 'approved'),
    ('submitted', 'rejected'),
    ('approved', 'shipped'),
    ('shipped', 'completed'),
    ('draft', 'cancelled'),
    ('submitted', 'cancelled');

Application code checks this table before allowing a transition. Or use a trigger to enforce it at the database level:

CREATE FUNCTION check_status_transition() RETURNS TRIGGER AS $$
BEGIN
    -- Skip check if status hasn't changed
    IF NEW.current_status IS NOT DISTINCT FROM OLD.current_status THEN
        RETURN NEW;
    END IF;

    IF NOT EXISTS (
        SELECT 1 FROM valid_status_transitions
        WHERE from_status = OLD.current_status
        AND to_status = NEW.current_status
    ) THEN
        RAISE EXCEPTION 'Invalid status transition from % to %',
            OLD.current_status, NEW.current_status;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_check_status_transition
BEFORE UPDATE OF current_status ON orders
FOR EACH ROW
EXECUTE FUNCTION check_status_transition();

The trigger fires on every status update. If the transition isn't in the allowed list, the database rejects it. This prevents bugs in application code from creating invalid states.


Multi-Tenancy Patterns

Multiple clients sharing one database. Three approaches, each with trade-offs.

Tenant ID on every table

Add tenant_id to every tenant-owned table. Scope unique constraints by tenant, include tenant_id in tenant-owned foreign keys, and enforce isolation with RLS. Risk of data leakage if a query misses the filter and RLS is not enabled.

Best for: Shared infrastructure, cost-sensitive deployments, tenants with similar data volumes.

Separate schemas

Each tenant gets their own PostgreSQL schema. Queries don't need tenant filters. Schema changes must be applied to all schemas.

Best for: Moderate isolation requirements, tenants who want some data separation guarantees.

Separate databases

Complete isolation. Each tenant has their own database. Maximum security. Higher operational cost. Cross-tenant queries require application-level aggregation.

Best for: Strict compliance requirements, large enterprise tenants, data residency requirements.

Implementing Tenant ID Safely

If using the tenant ID pattern, use Row Level Security (PostgreSQL) to enforce isolation at the database level. We cover multi-tenancy implementation in more detail in our multi-tenant Laravel guide.

-- Enable RLS (applies to non-owner roles)
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;

-- FORCE applies RLS even to the table owner (critical for admin operations)
ALTER TABLE orders FORCE ROW LEVEL SECURITY;

-- Policy restricts both reads (USING) and writes (WITH CHECK)
CREATE POLICY tenant_isolation ON orders
    USING (tenant_id = current_setting('app.current_tenant_id')::bigint)
    WITH CHECK (tenant_id = current_setting('app.current_tenant_id')::bigint);

-- Set tenant context at connection time
SET app.current_tenant_id = '123';

Without FORCE ROW LEVEL SECURITY, the table owner bypasses all RLS policies. This is a common source of data leakage during admin operations, migrations, and background jobs. The WITH CHECK clause prevents a tenant from inserting or updating rows belonging to another tenant. Even if application code forgets the WHERE clause, the database enforces tenant isolation.
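The schema itself can also encode tenancy, complementing RLS: unique constraints scoped per tenant, and composite foreign keys that stop a child row from pointing at a parent in another tenant. A minimal sketch, assuming a tenants table (names are illustrative):

CREATE TABLE customers (
    id BIGINT PRIMARY KEY,
    tenant_id BIGINT NOT NULL REFERENCES tenants(id),
    email VARCHAR(255) NOT NULL,
    UNIQUE (tenant_id, email),   -- each tenant has its own email namespace
    UNIQUE (tenant_id, id)       -- target for the composite foreign key below
);

CREATE TABLE orders (
    id BIGINT PRIMARY KEY,
    tenant_id BIGINT NOT NULL REFERENCES tenants(id),
    customer_id BIGINT NOT NULL,
    -- The order and its customer must belong to the same tenant
    FOREIGN KEY (tenant_id, customer_id) REFERENCES customers (tenant_id, id)
);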


Constraints: Your Last Line of Defence

Application code has bugs. ORMs have bugs. API consumers send unexpected data. Constraints are the database saying "this data is invalid" regardless of how it arrived.

Constraint Types

  • NOT NULL: Column must have a value. Prevents incomplete records.
  • UNIQUE: No duplicate values. Email addresses, order numbers, SKUs.
  • FOREIGN KEY: Referenced row must exist. Prevents orphaned records.
  • CHECK: Custom validation. CHECK (price >= 0), CHECK (end_date > start_date).
  • EXCLUSION: No overlapping values. Date ranges, scheduled resources, room bookings.
-- btree_gist required for BIGINT inside a GiST index
CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE bookings (
    id BIGINT PRIMARY KEY,
    room_id BIGINT REFERENCES rooms(id) NOT NULL,
    starts_at TIMESTAMP NOT NULL,
    ends_at TIMESTAMP NOT NULL,
    booked_by BIGINT REFERENCES users(id) NOT NULL,

    -- End must be after start
    CONSTRAINT valid_time_range CHECK (ends_at > starts_at),

    -- No double-booking (PostgreSQL with btree_gist extension)
    CONSTRAINT no_overlap EXCLUDE USING gist (
        room_id WITH =,
        tsrange(starts_at, ends_at) WITH &&
    )
);

The database prevents double-bookings. No race conditions in application code. No "oops, two people booked the same room" support tickets.

Quick Reference: Choosing the Right Constraint

Business rule | Constraint type | Example
Every order must have a customer | FOREIGN KEY + NOT NULL | customer_id BIGINT NOT NULL REFERENCES customers(id)
No two customers share an email | UNIQUE | UNIQUE (email)
Prices cannot be negative | CHECK | CHECK (price >= 0)
End date must be after start date | CHECK | CHECK (ends_at > starts_at)
No overlapping booking times | EXCLUSION (PostgreSQL) | EXCLUDE USING gist (room_id WITH =, tsrange(starts_at, ends_at) WITH &&)
No overlapping price periods | EXCLUSION (PostgreSQL) | EXCLUDE USING gist (product_id WITH =, daterange(effective_from, effective_to) WITH &&)
Status must be a known value | CHECK or ENUM | CHECK (status IN ('draft','submitted','approved')) or a PostgreSQL native ENUM

The investment: A missing constraint in a customer database allowed duplicate entries for years. Cleaning up the resulting data cost more than the original development. Constraints are cheap. Data problems are expensive.


Anti-Patterns to Recognise

Patterns that seem reasonable but cause problems at scale or over time.

The "god" table: One table representing multiple concepts. A "things" table with type columns and nullable fields for each type. Impossible to constrain. Queries become CASE statement nightmares.
Entity-Attribute-Value (EAV): Storing everything as key-value pairs. (entity_id, attribute_name, attribute_value). Seems flexible. Actually makes queries impossible to optimise and prevents meaningful constraints.
Comma-separated values in columns: Storing multiple values in one column (tags = 'red,blue,green'). Cannot be indexed properly. Cannot enforce referential integrity. Use a junction table.
Implicit typing: Storing different data types in the same column based on context. A "value" column that's sometimes a number, sometimes a date, sometimes text. Cannot be indexed or validated.
Premature denormalisation: Copying data "for performance" before measuring. Creates synchronisation bugs. Only denormalise with a specific, measured performance problem.

If You Inherited One of These Patterns

Recognising anti-patterns is only half the work. The harder part is migrating away from them on a live system without breaking production. Here's the general approach.

God table: Split by concept

Create new tables for each concept the god table represents. Add foreign keys back to the original. Run a dual-write period where both the old table and new tables receive data. Migrate historical data in batches. Cut over when the new tables are verified. Drop the old columns last.

EAV: Extract to typed columns

Identify the most-queried attributes. Add them as proper columns with constraints. Backfill from the EAV rows. Once verified, stop writing to EAV for those attributes. Keep the EAV table for genuinely variable attributes (if any exist); remove it entirely once it's empty.

JSONB overuse: Promote to columns

For each JSON path you filter or sort by, add a generated column or a proper column with a backfill migration. Add constraints and indexes on the new columns. Update application code to read from columns instead of JSON. Keep JSON only for data that genuinely varies per record.

String foreign keys: Add surrogate keys

Add an integer or UUID primary key to the referenced table. Add a new foreign key column to each referencing table. Backfill the new column using the string match. Switch application code to use the new key. Drop the string column when nothing references it.
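A minimal migration sketch for that sequence, assuming products is currently referenced by a product_sku string on order_lines (all names are illustrative):

-- 1. Give the referenced table a surrogate key
ALTER TABLE products ADD COLUMN id BIGINT GENERATED ALWAYS AS IDENTITY;
ALTER TABLE products ADD CONSTRAINT products_id_unique UNIQUE (id);

-- 2. Add the new foreign key column and backfill via the string match
ALTER TABLE order_lines ADD COLUMN product_id BIGINT;

UPDATE order_lines ol
SET product_id = p.id
FROM products p
WHERE p.sku = ol.product_sku;

-- 3. Enforce the relationship; drop the string column only after nothing reads it
ALTER TABLE order_lines
    ADD CONSTRAINT order_lines_product_fk
    FOREIGN KEY (product_id) REFERENCES products (id);
-- ALTER TABLE order_lines DROP COLUMN product_sku;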

Every remediation follows the same pattern: add the new structure alongside the old, dual-write, backfill, verify, cut over, clean up. Rushing any step risks data loss or downtime. Plan for each migration to take longer than you expect, and test with a copy of production data first.


The Business Link: Why This Matters

Schema work isn't glamorous. It doesn't produce visible features. The payoff is indirect but substantial.

  • Fewer bugs reach production: Constraints catch invalid data before it corrupts reports and confuses users.
  • Queries stay fast as data grows: Proper indexes and structure mean the system that works with 10,000 records still works with 10 million.
  • New features are easier to build: A well-structured schema makes new queries straightforward. A poorly-structured one makes every feature a battle.
  • Reports answer the actual questions: Historical data is available when auditors ask. Temporal queries return accurate results.
  • Migrations are less painful: A schema designed for change can be modified without rebuilding the entire system.

The cost of getting schema right is paid once, during design. The cost of getting it wrong is paid repeatedly, on every feature, every report, every migration, for the lifetime of the system.



Design Your Data Model

We design data models that capture your business reality accurately. Schemas that handle the exceptions, edge cases, and "but sometimes..." scenarios that real businesses have. The foundation everything else builds on.

Let's talk about data modelling →