Database Migrations

Moving and Upgrading Databases Safely


Databases change. New features need new tables. Business requirements reveal missing columns. Performance problems demand restructuring. The question isn't whether your schema will evolve. It's how you'll manage that evolution without taking down production or losing data. This is a core concern of thoughtful data modelling.

Database migrations are versioned, trackable changes to your schema. They turn scary database changes into routine, reversible operations. Done well, migrations enable confident deployments. Done poorly, they cause outages, data corruption, and 3am phone calls.


The Core Problem

Schema changes are inherently dangerous because databases are stateful. Code deployments are atomic: you push new code, it runs. If it fails, you roll back. But databases persist across deployments. A column you drop is gone. A constraint you add might reject live traffic. A table rename breaks every query that references the old name.

This statefulness creates coordination problems that don't exist with stateless deployments:

Ordering dependencies

Code expects certain tables and columns to exist. If the migration runs after the code deploys, queries fail. If the migration runs before old code retires, the old code might write to structures that no longer exist.

Lock contention

Many schema changes acquire locks on tables. On busy tables, this blocks reads and writes. A migration that takes 50ms on your laptop can block production for 10 minutes on a table with 50 million rows.

Irreversibility

Some changes destroy data. Dropping a column, changing a type, removing a table. You can't simply "undo" these. You need explicit strategies for data preservation and recovery.

Environment divergence

Development, staging, and production schemas must match. Manual changes, skipped migrations, or edited migration files create divergence that causes unpredictable failures.


Migration Fundamentals

A migration is a single, versioned unit of schema change. It has a unique identifier (typically a timestamp), an "up" operation that applies the change, and a "down" operation that reverses it. Migrations run in order, tracked in a metadata table so the system knows what's been applied.
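Most tools implement that tracking with a small metadata table. A minimal sketch of what such a table might look like (the exact name and columns vary by tool; schema_migrations here is illustrative):

CREATE TABLE schema_migrations (
    version     VARCHAR(255) PRIMARY KEY,                     -- e.g. 20240315120000_add_orders_table
    applied_at  TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP  -- when the migration was applied
);
-- Before running a migration, the tool checks for its version here;
-- after a successful run, it inserts a row so the same migration never runs twice.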

The migration contract

Every migration must satisfy three properties:

  • Idempotent tracking: Running the migration system twice doesn't apply the same migration twice. The state table prevents re-execution.
  • Ordered execution: Migrations run in their version order. Migration 3 always runs after migration 2, regardless of when files were created.
  • Transactional integrity: DDL statements should run inside transactions where the database supports it. A half-applied migration is worse than a failed one.

Immutability principle

Once a migration has run in any shared environment (staging, production, a colleague's machine), treat it as immutable. Never edit the file. If you made a mistake, write a new migration to correct it. Editing applied migrations causes a checksum mismatch (in tools that verify), or silent divergence (in tools that don't). Either outcome leads to hours of debugging.

Migration granularity

Keep migrations atomic. One migration should do one thing: add a column, create a table, add an index. Resist the temptation to combine related changes. Granular migrations are easier to reason about, easier to roll back, and easier to debug when something fails.

Good: Migration 1 adds the users table. Migration 2 adds the email_verified_at column.
Avoid: One migration creates the users table, adds three columns to orders, and drops a constraint on products.

Safe vs Dangerous Operations

Not all schema changes carry equal risk. Understanding the risk profile of each operation type determines whether you can run a migration during business hours or need a maintenance window.

Additive changes (generally safe)

These changes don't break existing code:

  • Adding a new table
  • Adding a nullable column
  • Adding a column with a default value
  • Adding an index (with caveats)
  • Adding a new enum value

Additive changes can deploy before the code that uses them.

Destructive changes (require coordination)

These changes can break things:

  • Removing a column
  • Renaming a column or table
  • Changing a column's type
  • Removing a table
  • Adding a NOT NULL constraint to an existing column
  • Removing an enum value

Destructive changes require multi-phase deployment.

The hidden dangers of "safe" operations

Even additive changes have edge cases. Adding an index on a large table acquires a lock for the duration of the index build. On PostgreSQL, this blocks writes unless you use CREATE INDEX CONCURRENTLY (see the PostgreSQL documentation on index creation). On MySQL, it depends on the storage engine and version. Adding a column with a default value in older MySQL versions rewrites the entire table. Know your database engine's behaviour.

Index creation locking behaviour by database:

  • PostgreSQL: a standard CREATE INDEX blocks writes for the duration. Non-blocking option: CREATE INDEX CONCURRENTLY.
  • MySQL 5.7+: online DDL is mostly non-blocking. Non-blocking option: ALGORITHM=INPLACE, LOCK=NONE.
  • MySQL 5.6: a standard CREATE INDEX blocks writes. Non-blocking option: pt-online-schema-change.
  • SQL Server: a standard CREATE INDEX blocks writes. Non-blocking option: WITH (ONLINE = ON) (Enterprise edition).
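As a concrete illustration, the PostgreSQL non-blocking form for a hypothetical index on orders.customer_id:

-- Allows reads and writes to continue during the build
-- (note: CONCURRENTLY cannot run inside a transaction block and takes longer than a standard build)
CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders (customer_id);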

Migration Patterns: Concrete Examples

Abstract principles become clearer with specific patterns. Here are the operations we perform regularly, with the exact approach for each.

Pattern 1: Adding a nullable column

The simplest migration. No coordination required with application code. Run the migration, then deploy code that uses the column.

SQL: ALTER TABLE orders ADD COLUMN discount_code VARCHAR(50) NULL;

The column accepts NULL, so existing rows remain valid. Application code that doesn't know about the column continues working. New code can write to it. Deploy migration first, code second.

Pattern 2: Adding a column with a default value

Slightly more complex. The default value strategy depends on your database version.

PostgreSQL 11+, MySQL 8.0.12+:

ALTER TABLE orders ADD COLUMN is_priority BOOLEAN NOT NULL DEFAULT FALSE;

Modern databases handle this efficiently: they store the default in metadata and apply it lazily. Older versions rewrite every row in the table, which locks it for the duration. For large tables on older databases, add the column as nullable, backfill in batches, then add the NOT NULL constraint.
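A sketch of that three-step approach, reusing the is_priority column (PostgreSQL-flavoured syntax; adjust for your engine):

-- Step 1: add the column as nullable (fast, no table rewrite)
ALTER TABLE orders ADD COLUMN is_priority BOOLEAN NULL;

-- Step 2: backfill existing rows (in batches on large tables; see the batching pattern later)
UPDATE orders SET is_priority = FALSE WHERE is_priority IS NULL;

-- Step 3: once the backfill is complete, set the default and the constraint
ALTER TABLE orders ALTER COLUMN is_priority SET DEFAULT FALSE;
ALTER TABLE orders ALTER COLUMN is_priority SET NOT NULL;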

Pattern 3: Renaming a column (expand-contract)

Never rename directly. A direct ALTER TABLE ... RENAME COLUMN breaks all code referencing the old name instantly. Use the expand-contract pattern instead.

Phase 1: Expand
Add new column. Dual-write to both old and new. Read from old.
Phase 2: Migrate
Backfill new column from old. Switch reads to new column.
Phase 3: Contract
Stop writing to old column. Remove old column.

Example: renaming user_name to username:

  1. Migration 1: ALTER TABLE users ADD COLUMN username VARCHAR(255);
  2. Deploy 1: Application writes to both user_name and username. Reads from user_name.
  3. Migration 2: UPDATE users SET username = user_name WHERE username IS NULL; (in batches for large tables)
  4. Deploy 2: Application reads from username. Still writes to both.
  5. Deploy 3: Application stops writing to user_name.
  6. Migration 3: ALTER TABLE users DROP COLUMN user_name;

Three migrations. Three deployments. No downtime. No broken queries.

Pattern 4: Changing a column type

Type changes range from trivial to terrifying depending on the transformation. Widening a VARCHAR from 50 to 255 is usually safe. Changing a VARCHAR to an INTEGER requires data transformation and can fail if the data doesn't convert.

Safe type changes (no data loss; often metadata-only):

  • Widening VARCHAR: VARCHAR(50) to VARCHAR(255)
  • Widening numeric precision: DECIMAL(10,2) to DECIMAL(12,2)
  • Changing INT to BIGINT (no data loss, though some engines rewrite the table to do it)

Dangerous type changes (require expand-contract):

  • Narrowing VARCHAR: data truncation possible
  • Changing between incompatible types: VARCHAR to INTEGER, TEXT to JSON
  • Changing precision in ways that lose data: DECIMAL(10,4) to DECIMAL(10,2)

For dangerous changes, use the expand-contract pattern. Add a new column with the new type, migrate data with explicit transformation logic, switch reads, drop the old column.
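A minimal sketch, assuming a hypothetical orders.quantity_text VARCHAR column being converted to an integer (PostgreSQL syntax; the regex guard skips rows that would fail the cast so they can be investigated separately):

-- Expand: add the new column alongside the old one
ALTER TABLE orders ADD COLUMN quantity INTEGER NULL;

-- Migrate: copy with an explicit, validated transformation (batch this on large tables)
UPDATE orders
SET quantity = CAST(quantity_text AS INTEGER)
WHERE quantity IS NULL
  AND quantity_text ~ '^[0-9]+$';

-- Contract: once all reads use the new column and the remaining rows are accounted for
ALTER TABLE orders DROP COLUMN quantity_text;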

Pattern 5: Removing a column

Column removal must happen after all code stops referencing it. The sequence:

  1. Deploy 1: Remove all reads from the column. The application no longer SELECTs the column.
  2. Deploy 2: Remove all writes to the column. The application no longer INSERTs or UPDATEs the column.
  3. Wait: Ensure no running code references the column. Check for scheduled jobs, background workers, cached queries.
  4. Migration: ALTER TABLE orders DROP COLUMN legacy_status;

Data loss warning: Dropping a column destroys data permanently. Before dropping, verify you have backups and have extracted any data you might need. Consider creating a backup table first: CREATE TABLE orders_legacy_backup AS SELECT id, legacy_status FROM orders;

Pattern 6: Adding a foreign key constraint

Foreign keys validate existing data. If any rows violate the constraint, the ALTER fails. For existing tables with potentially dirty data:

  1. Query for violating rows: SELECT * FROM orders WHERE customer_id NOT IN (SELECT id FROM customers);
  2. Fix or remove violating data
  3. Add the constraint: ALTER TABLE orders ADD CONSTRAINT fk_orders_customer FOREIGN KEY (customer_id) REFERENCES customers(id);

On very large tables, consider adding the constraint as NOT VALID first (PostgreSQL), then validating separately:

PostgreSQL:

ALTER TABLE orders ADD CONSTRAINT fk_customer FOREIGN KEY (customer_id) REFERENCES customers(id) NOT VALID;

ALTER TABLE orders VALIDATE CONSTRAINT fk_customer;

The first statement is fast (no data scan). The second scans and validates existing rows while still allowing normal reads and writes to proceed.

Pattern 7: Adding a NOT NULL constraint

Adding NOT NULL to an existing column that contains NULLs fails immediately. The process:

  1. Backfill NULL values with appropriate defaults
  2. Verify no NULLs remain: SELECT COUNT(*) FROM users WHERE email IS NULL;
  3. Add the constraint

On large tables, backfilling in batches prevents long-running transactions:

Batched backfill: Process 10,000 rows at a time with UPDATE users SET email = 'unknown@example.com' WHERE email IS NULL AND id BETWEEN ? AND ?;
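Once the backfill is verified, adding the constraint itself is a single statement. The syntax differs by engine; both forms below are standard, using users.email as the example column:

-- PostgreSQL (scans the table to verify no NULLs remain)
ALTER TABLE users ALTER COLUMN email SET NOT NULL;

-- MySQL (the full column definition must be restated)
ALTER TABLE users MODIFY email VARCHAR(255) NOT NULL;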


Zero-Downtime Deployment Strategies

Production systems can't go offline for every schema change. Zero-downtime migrations require understanding the relationship between schema changes and application deployments. This coordination is part of broader infrastructure management.

The deployment ordering rule

For any migration, exactly one of these orderings is safe:

Deployment ordering by change type:

  • Add column/table: migration first. New code expects the structure to exist.
  • Remove column/table: code first. Old code must stop referencing the structure before removal.
  • Add constraint: code first (usually). Code should already satisfy the constraint.
  • Remove constraint: migration first. New code may violate the old constraint.
  • Rename: expand-contract. Neither order is safe; use the three-phase approach.

Rolling deployments and schema compatibility

In rolling deployments, old and new application versions run simultaneously. Both versions must work with the current schema. This creates a compatibility window: any schema change must be backward-compatible with the previous code version and forward-compatible with the next.

The implication: most changes require multiple deployments. You can't add a required column in one deployment, because old instances (still running) don't write to it. Add it as nullable first, deploy code that writes to it, then add the NOT NULL constraint.

Blue-green deployments

Blue-green deployments run two complete environments. Traffic switches atomically from blue to green. This sounds simpler for migrations, but both environments share the same database. The same compatibility rules apply.

Blue-green does enable faster rollback: switch traffic back to blue. But if green ran a destructive migration, switching back doesn't undo it. Plan migrations to be rollback-safe independent of deployment strategy.

Feature flags for complex migrations

For complex multi-phase changes, use feature flags to control which code paths execute:

  1. Deploy code that can read from old or new structure, controlled by flag
  2. Run migration with flag set to "old"
  3. Backfill data
  4. Flip flag to "new" for a percentage of traffic
  5. Monitor for errors
  6. Increase percentage until 100%
  7. Deploy code that only knows new structure
  8. Remove old structure

This approach adds complexity but provides granular control. Reserve it for changes with significant risk.


Large Table Migrations

Tables with millions of rows require special handling. Operations that take milliseconds on small tables can run for hours on large ones, blocking production traffic the entire time.

Understanding table size thresholds

The exact threshold depends on hardware, database engine, and operation type. As rough guidance for row counts:

  • Under 100,000 rows: Most operations complete quickly. Standard migration approaches work.
  • 100,000 to 1 million rows: Test migration timing. Some operations may need off-peak scheduling.
  • 1 million to 10 million rows: Careful planning required. Use online schema change tools or maintenance windows.
  • Over 10 million rows: Every change needs explicit planning. Consider partitioning, archiving, or table rebuilding strategies.

Online schema change tools

These tools perform schema changes without extended locking by creating a shadow table, applying the change to the shadow, copying data over, and swapping tables atomically.

pt-online-schema-change (Percona)

MySQL tool that uses triggers to capture changes during copy. Mature, well-tested. Requires triggers, which some environments disable.

gh-ost (GitHub)

MySQL tool that uses binary log streaming instead of triggers. Lighter impact on production. More complex setup.

pg_repack (PostgreSQL)

Repacks tables and indexes to reclaim space and rebuild structure. Useful for table bloat after large deletions.

LHM (Large Hadron Migrator)

Rails-specific tool for online MySQL migrations. Integrates with ActiveRecord. Good for Rails shops.

Batched operations

For data backfills and updates, process in batches to avoid long-running transactions and lock contention:

Batching pattern: Process rows in chunks of 5,000-10,000. Commit after each batch. Add a small delay between batches to let replication catch up and other queries execute.
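One way to express a single batch in plain SQL (PostgreSQL syntax; table, values, and batch size are illustrative; run repeatedly from a job runner until zero rows are updated):

-- One batch: pick the next 10,000 unprocessed rows by primary key and update them
UPDATE users
SET email = 'unknown@example.com'
WHERE id IN (
    SELECT id
    FROM users
    WHERE email IS NULL
    ORDER BY id
    LIMIT 10000
);
-- Commit, pause briefly, then repeat until no rows are affected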

Batching is essential for:

  • Backfilling new columns with computed values
  • Data type conversions
  • Deleting large numbers of rows
  • Updating denormalised values

Maintenance windows

Some changes can't avoid locking. When you must lock a large table:

  1. Schedule during lowest-traffic period
  2. Communicate the window to stakeholders
  3. Test the migration timing on production-size data
  4. Have a rollback plan ready
  5. Monitor replication lag if applicable
  6. Set a time limit: if not done in X minutes, abort and retry later

Schema Migrations vs Data Migrations

Schema migrations change structure: tables, columns, indexes, constraints. Data migrations change values: backfilling, transforming, cleaning. They have different characteristics and should often be separate.

Why separate them

  • Speed: schema migrations are usually fast (DDL operations); data migrations are often slow (row-by-row processing).
  • Reversibility: schema migrations are usually reversible (add/drop operations); data migrations are often irreversible (data transformation).
  • Transaction scope: a schema migration runs as one transaction; a data migration should run in batches outside a single transaction.
  • Failure mode: schema migrations are all-or-nothing (transaction rollback); data migrations can fail partway and need resumability.

Data migration patterns

Backfilling computed values: When adding a column that should be computed from existing data (e.g., full_name from first_name and last_name), add the column in a schema migration, then backfill in a separate data migration or background job.
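A sketch of that split, using the full_name example (CONCAT works on both MySQL and PostgreSQL; batch the UPDATE on large tables):

-- Schema migration: structure only
ALTER TABLE users ADD COLUMN full_name VARCHAR(255) NULL;

-- Separate data migration or background job: values only
UPDATE users
SET full_name = CONCAT(first_name, ' ', last_name)
WHERE full_name IS NULL;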

Format transformation: Converting phone numbers to E.164 format, splitting full addresses into components, normalising email addresses. These require row-by-row processing with validation. Run as background jobs with progress tracking.

Data cleanup: Fixing inconsistencies, removing orphaned records, correcting invalid values. Often best done as one-time scripts outside the migration system, since they're not idempotent and don't need to run in new environments.

Idempotency matters

Schema migrations are inherently idempotent: the migration system tracks what's applied. Data migrations should be explicitly idempotent:

Good: UPDATE users SET normalized_email = LOWER(email) WHERE normalized_email IS NULL;
Avoid: UPDATE users SET normalized_email = LOWER(email); (re-running needlessly updates every row)

Rollback Strategies

Every migration should have a rollback plan before it runs. "Just restore from backup" is not a rollback plan. It's a disaster recovery plan, and it means downtime.

Rollback by reversal

The "down" migration reverses the "up" migration. This works for additive changes:

  • Added a column: Drop it
  • Added a table: Drop it
  • Added an index: Drop it
  • Added a constraint: Drop it

Write the down migration at the same time as the up migration. Test it. A down migration that's never been run is a liability, not a safety net.
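For an additive change the pair is usually symmetrical. A minimal sketch in plain SQL, reusing the discount_code example (most tools wrap this in their own DSL or file-naming convention):

-- up
ALTER TABLE orders ADD COLUMN discount_code VARCHAR(50) NULL;

-- down
ALTER TABLE orders DROP COLUMN discount_code;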

Rollback with data preservation

Some rollbacks would lose data. Before dropping a column or table, consider whether you might need that data back:

  • Create a backup table before dropping: CREATE TABLE orders_status_backup AS SELECT id, status FROM orders;
  • Export to a file: COPY (SELECT id, status FROM orders) TO '/tmp/orders_status.csv' WITH (FORMAT csv); (PostgreSQL)
  • Keep the column longer than you think necessary (weeks, not days)

Rollback-incompatible changes

Some changes can't be cleanly rolled back:

  • Data transformations: Converting UK postcodes to uppercase destroys the original case. Store the original if you might need it.
  • Data merges: Merging two columns into one loses the separation. Keep source columns until you're certain.
  • Destructive updates: Overwriting values based on business logic. Take a snapshot first.

Testing rollbacks

Before running a migration in production:

  1. Run the up migration in a test environment
  2. Run the down migration
  3. Run the up migration again
  4. Verify the schema matches the original up result

Migrations that pass this cycle are safe. Migrations that fail it need fixing before they go anywhere near production.


Testing Migrations

Migration testing goes beyond "it works on my machine". Production has data volumes, edge cases, and constraints that development doesn't.

Test against production-like data

A migration that runs in 50ms on an empty table might run for 30 minutes on 10 million rows. A column that's always populated in test data might have NULLs in production. Test with realistic data volumes and variety.

Options for production-like test data:

  • Anonymised production snapshots: Copy production, strip PII, use for testing
  • Synthetic data at scale: Generate millions of rows with realistic distributions
  • Staging environment: Maintain staging with production-like data volume

Timing estimation

Measure migration duration on production-like data. Record it. Use it to estimate production timing. Build in buffer: migrations often run slower on production due to concurrent load.

Rule of thumb: If a migration takes more than 30 seconds on production-like data, it needs special handling. Schedule off-peak, use online schema change tools, or batch it.

CI/CD integration

Run migrations as part of CI to catch errors early:

  • Fresh database: Run all migrations from scratch. Catches syntax errors and ordering issues.
  • Rollback test: Run up, then down, then up again. Catches incomplete down migrations.
  • Seed data test: Run migrations, then seeders. Catches incompatibilities with expected data.

Pre-deployment checks

Before running migrations in production:

  • Review the SQL: understand exactly what statements will execute. No surprises.
  • Check for locks: identify operations that acquire table locks. Plan accordingly.
  • Estimate timing: know how long it will take. Set expectations.
  • Document the rollback: write down the exact steps to reverse if needed.
  • Coordinate with the team: ensure no conflicting migrations. Check the deployment schedule.

Coordinating Migrations with Deployments

Migrations don't exist in isolation. They're part of a deployment process that includes code changes, configuration updates, and often multiple services.

Single-service deployments

For a single application with its database, the typical flow:

  1. Put the application in maintenance mode (if the migrations require it)
  2. Run migrations
  3. Deploy new code
  4. Verify health, exit maintenance mode

For zero-downtime deployments, migrations run while old code is still serving traffic. The migration must be backward-compatible with old code.

Multi-service deployments

When multiple services share a database (not ideal, but common), migrations require coordination:

  • Identify all services that touch the affected tables
  • Ensure all services are compatible with both old and new schema during transition
  • Deploy services in the correct order
  • Use expand-contract to avoid simultaneous updates

Migration deployment automation

Automate migration execution as part of your deployment pipeline:

  • Pre-deployment hook: Run migrations before new code starts
  • Health checks: Verify database connectivity and migration success before proceeding
  • Rollback triggers: Automatic rollback if deployment fails
  • Notifications: Alert team when migrations run and complete

Never run migrations manually in production if you can avoid it. Manual execution introduces human error and lacks audit trails.


Common Mistakes

These mistakes appear repeatedly. Learn from others' pain.

Not testing rollbacks

Shipping down migrations that have never been executed. When you need them at 2am, they fail with syntax errors or logic bugs. Test every down migration before the up migration goes to production.

Editing applied migrations

Changing a migration file after it's run in staging or production. Creates checksum mismatches or silent divergence. Your production schema no longer matches what the migration file says.

Mixing schema and data migrations

Complex data transformations inside schema migrations. Makes migrations slow, hard to reverse, and prone to timeout. Separate structural changes from data changes.

Ignoring production scale

Testing only on empty databases. That ALTER TABLE takes 50ms locally but 45 minutes on the production table with 20 million rows. Test with realistic data volumes.

Deploying code and migrations together

When something fails, you don't know if it's the migration or the code. Separate them temporally so you can isolate failures.

No rollback plan

"We'll figure it out if something goes wrong." You won't. Under pressure, with alerts firing, you'll make mistakes. Write the rollback plan before you run the migration.


Migration Tooling

Most frameworks provide migration support. Choose tools that track state in the database, support up and down operations, provide clear reporting, and integrate with your deployment pipeline.

Common migration tools by ecosystem:

  • Laravel (PHP): Artisan migrations. Schema builder, rollback support, seeding integration.
  • Rails (Ruby): ActiveRecord migrations. Ruby DSL, reversible blocks, strong_migrations gem for safety.
  • Django (Python): Django migrations. Auto-generated from models, squashing, RunPython for data migrations.
  • Node.js: Knex, Sequelize, Prisma. Varies by ORM: Knex offers raw SQL, Prisma uses a declarative schema.
  • Standalone: Flyway, Liquibase. Language-agnostic, SQL or XML/YAML definitions, enterprise features.
  • Go: golang-migrate, goose. CLI tools, SQL files, embedded migration support.

Whichever stack you use, consider adding strong_migrations (Rails) or an equivalent linting tool that catches dangerous operations before they reach production. These tools flag operations like adding indexes without CONCURRENTLY on PostgreSQL, changing column types, or removing columns without verification.


What You Get

Proper migration discipline yields compounding benefits over time.

  • Zero-downtime schema changes: users never see maintenance windows for routine changes
  • Confident rollbacks: every change is reversible because the down migration is tested
  • Environment parity: development, staging, and production schemas match exactly
  • Audit trail: every schema change is versioned, timestamped, and attributable
  • Team velocity: developers make schema changes without fear or coordination overhead
  • Predictable deployments: migration timing is known in advance, never a surprise

Schema evolution becomes routine. Changes that once required weekend maintenance windows deploy during business hours without anyone noticing.


Migrate Your Database Safely

We implement database migration strategies that make schema changes safe and reversible. Versioned, tested changes that evolve your database without downtime or data loss. Confident deployments even for complex structural changes.

Let's talk about database migrations →