Moving and Upgrading Databases Safely
Your application code lives in Git. Every change is tracked, reviewable, and reversible. Your database schema should work the same way. Database migrations are the mechanism that makes this possible: versioned, ordered, trackable changes to your database structure that deploy alongside your application code.
The concept is simple. The execution in production is where the complexity lives. A migration that works on your local machine with 200 rows will lock a production table with 15 million rows for 40 minutes. A column rename that looks safe will break every query referencing the old name the moment it runs. A rollback that should undo the last deploy will silently drop a column that has already been populated with new data.
These are not hypothetical risks. We have encountered every one of them across the Laravel applications we build and maintain.
Scope: This page covers application-level schema evolution: versioned ALTER TABLE changes deployed alongside your code in a live MySQL or PostgreSQL system. If you are migrating between database engines or moving from on-premises to cloud, that is a different discipline entirely. For that, see our legacy migration page.
Why Database Migrations Fail in Production
Most tutorials teach database migrations as a straightforward pattern: write an "up" method to apply the change, write a "down" method to reverse it, run the command. This works in development. It fails in production for three reasons, and each one becomes more acute as your system scales beyond a single server.
State is permanent
Code deployments replace the old version entirely. Schema changes modify a live, stateful system. A dropped column is gone. A type change may truncate data silently. There is no "git checkout" for a production database.
Scale changes the physics
Adding an index to a table with 500 rows takes milliseconds. Adding an index to a table with 50 million rows takes minutes, and the table is locked for writes the entire time. Every migration has a different risk profile depending on the data volume behind it.
Concurrency creates conflicts
During a rolling deployment, old code and new code run simultaneously. If the new code expects a column that does not exist yet, or the old code expects a column that has just been dropped, requests fail. The deployment window is not atomic.
The gap between tutorial-level migrations and production-grade database migration strategy is where most teams get bitten. Without disciplined schema evolution, you end up with schema drift between environments and no reliable way to reproduce what ran where. The patterns that follow close that gap.
Safe Operations Versus Dangerous Operations
Not all schema migrations carry the same risk. The distinction matters because it determines your deployment strategy.
Generally safe (additive)
- Adding a new table
- Adding a nullable column (no default required, no table rewrite)
- Adding an index (watch table size; see large table section below)
- Adding a new enum value at the end of the list
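Additive changes like these can ship as a single ordinary migration with no deployment choreography. A minimal Laravel sketch, with illustrative table and column names:

```php
<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration
{
    public function up(): void
    {
        Schema::table('orders', function (Blueprint $table) {
            // Nullable, no default: no table rewrite, and code that does not
            // know about the column yet is unaffected.
            $table->string('status')->nullable();
        });
    }

    public function down(): void
    {
        Schema::table('orders', function (Blueprint $table) {
            $table->dropColumn('status');
        });
    }
};
```

Because the column is nullable and unreferenced by old code, this migration can run before, during, or after the code deploy without breaking anything.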
Dangerous (require coordination)
- Removing a column (requires multi-phase deploy)
- Renaming a column or table (use expand-contract)
- Changing a column type (safe only if lossless)
- Dropping a table (irreversible once committed)
The rule is straightforward: anything additive is generally safe. Anything that modifies or removes existing structures needs a multi-step approach with code changes deployed before and after the schema change. This applies to your application code, but also to any API integrations that query the affected tables directly.
Zero-Downtime Migration Patterns
For any application that cannot tolerate downtime during deploys, database migrations must be backward-compatible. Old code and new code must both function correctly against the schema at every intermediate state.
The expand-contract pattern
This is the workhorse pattern for any non-trivial schema change. It works in three phases: expand (add the new structure alongside the old), migrate (dual-write to both structures and backfill until the new one is authoritative), and contract (remove the old structure once nothing references it).
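A sketch of the three phases for moving a column, say `orders.state` to `orders.status` (names illustrative; each phase is a separate migration and deploy in practice):

```php
<?php

use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Schema;

// Phase 1 — expand: add the new column alongside the old one.
Schema::table('orders', fn (Blueprint $t) => $t->string('status')->nullable());

// Phase 2 — migrate: the application dual-writes both columns while a
// backfill copies existing values in keyed chunks. chunkById paginates
// on the primary key, so updating rows inside the callback is safe.
DB::table('orders')
    ->whereNull('status')
    ->orderBy('id')
    ->chunkById(1000, function ($rows) {
        foreach ($rows as $row) {
            DB::table('orders')->where('id', $row->id)
                ->update(['status' => $row->state]);
        }
    });

// Phase 3 — contract: once nothing reads the old column, drop it.
Schema::table('orders', fn (Blueprint $t) => $t->dropColumn('state'));
```

At every intermediate state, both the previous and the current code release work against the schema, which is what makes the pattern zero-downtime.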
Deployment ordering
The sequence matters. For adding a column: deploy the migration first, then deploy the code that uses it. For removing a column: deploy the code that stops using it first, then deploy the migration that drops it. Getting this backwards is one of the most common causes of production incidents during deploys.
| Change Type | Safe Order | Reason |
|---|---|---|
| Add column/table | Migration first | New code expects the structure to exist |
| Remove column/table | Code first | Old code must stop referencing before removal |
| Add constraint | Code first (usually) | Code should already satisfy the constraint |
| Rename | Expand-contract | Neither order is safe alone; use three-phase approach |
In our CI/CD pipelines, we enforce this ordering through deployment checklists and migration review gates.
Feature flags as a safety net
For complex schema transitions, feature flags add a layer of control that deployment ordering alone cannot provide. The pattern works like this: deploy the expand-phase migration, gate the new code path behind a feature flag, enable the flag for a small percentage of traffic, verify correctness, then ramp to 100%. If anything goes wrong, disable the flag immediately. The schema stays expanded, no rollback needed, and you debug at your own pace.
This is particularly valuable during the dual-write phase of an expand-contract migration. The flag controls which code path is active, so you can switch between old and new behaviour without redeploying. Once the contract phase is complete and the old structure is removed, retire the flag. The discipline is in the lifecycle: a flag that outlives its migration becomes technical debt. We tie flag cleanup to the same ticket that tracks the contract-phase migration.
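A sketch of flag-gated dual-write, here using Laravel Pennant's `Feature` facade (any flag library works; the `Order` model, flag name, and column names are illustrative):

```php
<?php

use Laravel\Pennant\Feature;

class OrderStatusService
{
    // Dual-write phase: both columns are written unconditionally, so
    // flipping the flag never loses data in either direction.
    public function updateStatus(Order $order, string $status): void
    {
        $order->state = $status;   // old column, legacy read path
        $order->status = $status;  // new column, flagged read path
        $order->save();
    }

    // The flag only controls reads: ramp it up gradually, and disable it
    // instantly if the new path misbehaves — no redeploy, no rollback.
    public function currentStatus(Order $order): string
    {
        return Feature::active('orders-new-status-column')
            ? $order->status
            : $order->state;
    }
}
```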
Large Table Migrations
When a table exceeds one million rows, standard ALTER TABLE statements become risky. The database engine may need to rewrite the entire table, locking it for the duration. On a busy table, this means blocked writes, queued requests, and eventually timeouts.
Row count risk thresholds
| Row Count | Risk Level | Approach |
|---|---|---|
| Under 100,000 | Low | Standard migrations work fine |
| 100,000 to 1 million | Moderate | Test timing on production-size data first |
| 1 to 10 million | High | Plan carefully, consider online schema change tools |
| Over 10 million | Critical | Explicit strategy per change, use gh-ost or pt-online-schema-change |
Online schema change tools
These tools modify large tables without holding long-running locks.
gh-ost (GitHub)
Creates a ghost table with the new schema, copies rows incrementally, tails the binlog for ongoing changes, then performs an atomic swap. No triggers required, and the operation can be paused or throttled.
pt-online-schema-change (Percona)
Uses a similar approach with triggers to capture ongoing writes. Well-tested and battle-proven, but triggers add overhead on write-heavy tables.
PostgreSQL's built-in advantages
PostgreSQL handles large table migrations better than MySQL in several material ways, which is one reason we prefer it for production systems.
- Non-blocking column additions: Adding a nullable column does not rewrite the table. Since version 11, `ALTER TABLE ... ADD COLUMN` with a `DEFAULT` value also avoids table rewrites, making it safe even on tables with tens of millions of rows.
- Concurrent index creation: `CREATE INDEX CONCURRENTLY` builds the index without holding an `ACCESS EXCLUSIVE` lock, so writes continue uninterrupted. The trade-off is a longer build time and the possibility of an invalid index if the operation fails partway through (check for invalid indexes by querying `pg_index` where `indisvalid = false`, drop any invalid indexes, then retry).
- Non-blocking constraint validation: Add a constraint with `NOT VALID` to skip checking existing rows, then run `VALIDATE CONSTRAINT` separately. Validation takes a weaker `SHARE UPDATE EXCLUSIVE` lock, allowing concurrent reads and writes. This is the safe pattern for adding foreign keys or `CHECK` constraints to large tables.
- Transactional DDL: Unlike MySQL, where DDL statements are auto-committed and partial failures leave orphaned structures, PostgreSQL wraps DDL in transactions. A failed migration rolls back cleanly, leaving the schema untouched.
These capabilities often eliminate the need for external tools like gh-ost or pt-online-schema-change. We select the approach based on the database engine, table size, write volume, and acceptable risk window. For MySQL, where native online DDL is more limited, external tools remain essential for high-row-count tables.
Data Migrations Versus Schema Migrations
Schema migrations change the structure of your database: columns, indexes, constraints, tables. Data migrations change the content: backfilling a new column, transforming existing values, merging duplicate records. Mixing them in a single migration file is a common mistake. They should be separate because they differ in critical ways.
| Aspect | Schema Migrations | Data Migrations |
|---|---|---|
| Speed | Usually fast (except on large tables) | Can run for hours depending on volume |
| Reversibility | Clean reversals (drop what you added) | May not be reversible without backups |
| Transaction scope | Typically transactional | Should batch rows and commit incrementally |
| Failure mode | Usually rolls back cleanly | May partially complete, leaving inconsistent state |
In Laravel's migration system, we structure these as separate migration files with clear naming conventions: `add_status_column_to_orders` for schema, `backfill_order_status_from_legacy_field` for data. The ordering ensures the schema change runs first.
For data migrations that touch millions of rows, we use background jobs with progress tracking. The migration file dispatches the job; the job batches the data transformation. Each batch must be idempotent so that a partial failure followed by a retry does not corrupt already-migrated rows. This keeps the deploy fast and the data migration observable.
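A sketch of such a self-dispatching backfill job, assuming illustrative `orders` columns and a standard queue setup:

```php
<?php

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Support\Facades\DB;

class BackfillOrderStatus implements ShouldQueue
{
    use Dispatchable, Queueable;

    public function __construct(
        private int $startId = 0,
        private int $batchSize = 1000,
    ) {}

    public function handle(): void
    {
        // Idempotent: only rows still missing a status are touched, so a
        // retry after a partial failure cannot corrupt migrated rows.
        $lastId = DB::table('orders')
            ->where('id', '>=', $this->startId)
            ->whereNull('status')
            ->orderBy('id')
            ->limit($this->batchSize)
            ->get()
            ->each(fn ($row) => DB::table('orders')
                ->where('id', $row->id)
                ->update(['status' => $row->legacy_status]))
            ->max('id');

        // Chain the next batch; stop when no unmigrated rows remain.
        if ($lastId !== null) {
            self::dispatch($lastId + 1, $this->batchSize);
        }
    }
}
```

The migration file does nothing but `BackfillOrderStatus::dispatch()`, so the deploy itself stays fast while the queue worker grinds through the data with per-batch commits.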
Rollback Viability by Migration Type
Everyone assumes rollbacks work until they need one. The reality is that rollback safety depends entirely on the type of migration and how much time has passed since it ran. This table covers the five common categories.
| Migration Type | Rollback Safe? | Data at Risk |
|---|---|---|
| Additive (new table, nullable column) | Yes | Drop what was added. No data loss unless the new structure has been populated. |
| Backfill (populate new column) | Usually safe | Backfilled data is lost, but the source data still exists. Re-run the backfill after fixing the issue. |
| Constraint addition (NOT VALID, then VALIDATE) | Yes | Drop the constraint and re-validate later. No data affected. |
| Destructive rename or drop (contract phase) | No | After the contract phase, the old structure is gone. Roll forward with a fix. Rollback cannot restore dropped columns. |
| Lossy type change or field merge | No | Original data has been overwritten or merged. Point-in-time recovery from backups is the only option. |
The key question is not "can I roll back?" but "when does rolling forward become safer than rolling back?" Once new data has been written to a new column or new structure, rolling back destroys that data. After that threshold, fixing the problem in place and deploying a corrective migration is almost always the right call. This is why every destructive migration should have a documented roll-forward plan, not just a down() method. Understanding your recovery point objective (RPO) and recovery time objective (RTO) before you deploy helps you decide how much risk each migration type actually carries.
Multi-Tenant Migration Patterns
Running database migrations across multiple tenants introduces coordination challenges that single-database applications do not face. These patterns handle both shared-database and database-per-tenant architectures in multi-tenant Laravel systems.
Shared database (tenant column)
All tenants share one database with a tenant_id column on every table. Migrations run once and affect all tenants simultaneously. The key discipline: test migrations against production-scale data, not just a single tenant's worth of rows. A table that holds 10,000 rows per tenant and serves 500 tenants has 5 million rows.
Database per tenant
Each tenant has their own database. Migrations must run against every database individually. The failure mode: 400 of 500 tenant databases migrated successfully, but 100 failed. Our migration runner iterates through tenant databases, records success or failure per tenant, and provides a real-time dashboard showing status across the fleet.
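The shape of such a runner, sketched below: `Tenant` and its connection-switching helper are stand-ins for whatever your multi-tenancy package provides, and the status recording here is a plain array rather than a dashboard.

```php
<?php

use Illuminate\Support\Facades\Artisan;

$results = [];

foreach (Tenant::all() as $tenant) {
    try {
        // Assumed helper: point the 'tenant' connection at this database.
        $tenant->configureDatabaseConnection();

        Artisan::call('migrate', ['--database' => 'tenant', '--force' => true]);
        $results[$tenant->id] = 'migrated';
    } catch (\Throwable $e) {
        // Record and continue: one failing tenant must not abort the fleet,
        // and the per-tenant record shows exactly what needs re-running.
        $results[$tenant->id] = 'failed: ' . $e->getMessage();
    }
}
```

The essential properties are per-tenant error isolation and a durable record of which databases are on which schema version.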
This is one area where the data model design has long-term consequences. Choosing shared versus per-tenant databases in the early stages of the project determines your migration complexity for the life of the application.
Migration File Discipline
Migration files are not throwaway scripts. They are a permanent record of your schema's evolution, and treating them with the same rigour as application code prevents an entire category of team-level problems.
- Name migrations descriptively. `add_status_column_to_orders` tells a reviewer exactly what will happen; `update_orders_table` tells them nothing. When debugging a production issue at 2am, the migration filename is the first thing you read.
- Squash deliberately. `schema:dump` collapses all migrations into a single SQL file. This speeds up fresh installs but destroys the change history that is valuable for debugging. Squash when migration count exceeds 200 or test suite setup time becomes painful, but keep the original files in version control.

These rules matter most when multiple developers are creating migrations on different branches. Timestamp-based ordering (Laravel's default) handles most conflicts, but two developers modifying the same table simultaneously will still produce migrations that work individually and fail when merged. The fix is a review convention: any migration touching a shared table gets flagged in the pull request for sequencing review.
Testing Migrations in CI/CD Pipelines
Migrations that are not tested before production will eventually fail in production. The testing strategy depends on what can go wrong.
Structural testing
Every migration should pass an up-down-up cycle: apply it, reverse it, apply it again. This verifies that the "down" method correctly undoes the "up" method and that the migration is re-runnable. In our CI pipelines, this runs automatically on every pull request that includes a migration file.
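The cycle is a few Artisan calls, so it drops naturally into the application's own test suite. A minimal sketch as a PHPUnit test (class name illustrative; assumes the standard Laravel `Tests\TestCase` against a disposable database):

```php
<?php

use Illuminate\Support\Facades\Artisan;
use Tests\TestCase;

class MigrationCycleTest extends TestCase
{
    /**
     * Apply, reverse, re-apply: proves down() actually undoes up(),
     * and that up() is re-runnable afterwards.
     */
    public function test_latest_migration_survives_up_down_up(): void
    {
        $this->assertSame(0, Artisan::call('migrate', ['--force' => true]));
        $this->assertSame(0, Artisan::call('migrate:rollback', ['--step' => 1, '--force' => true]));
        $this->assertSame(0, Artisan::call('migrate', ['--force' => true]));
    }
}
```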
Data-volume testing
A migration that completes in 200 milliseconds against a test database with 50 rows may take 20 minutes against production data. We maintain a staging environment with production-scale data (anonymised) specifically for testing migration timing. Any migration touching a table with more than 500,000 rows requires a timing test before approval.
Dependency testing
Migrations have implicit dependencies on application code. A migration that adds a non-nullable column without a default value will fail if the application has not been updated to provide that value. CI should run the application's test suite against the migrated schema to catch these mismatches.
Migration review checklist
Before any migration reaches production, we verify six conditions.
1. Deployment order documented. Does it require migration before code, or code before migration?
2. Table size assessed. Does it touch a table with more than 100,000 rows?
3. Destructive changes flagged. Does it modify or remove an existing column?
4. Rollback tested. Has the "down" method been verified on a test database?
5. Data migration separated. Does it include a data migration that should be its own file?
6. Production-scale timing measured. Has it been tested against realistic data volumes?
This checklist catches the majority of migration incidents before they reach production. We enforce it through pull request templates and automated CI checks.
Common Failure Modes
These are the migration failure patterns that cause the most damage in production. Each one is drawn from a real incident.
Lock timeout on a busy table
The migration acquires a lock, but the table is under heavy write load. The lock request queues behind existing transactions, and the migration times out. Schedule migrations on high-traffic tables during low-traffic windows. Use online schema change tools that avoid long-running locks.
Rollback that destroys data
The "down" method drops a column added in the "up" method. Between deploy and rollback, the application wrote thousands of records to that column. The rollback drops it all. Treat rollbacks as a safety net. For columns that accumulate data, rename rather than drop.
Partial migration in a non-transactional database
MySQL's DDL statements are not fully transactional. A migration that creates a table and then adds a foreign key can fail on the foreign key, leaving an orphaned table. Keep migrations small. One structural change per file. PostgreSQL handles this better with transactional DDL, which is one reason we prefer it for infrastructure decisions.
Migration ordering conflicts in team development
Two developers create migrations simultaneously for the same table. Both work individually. When merged, the ordering may cause conflicts. Use timestamps for migration ordering (Laravel's default). Review migration files in every pull request. Squash old migrations periodically.
What good migration practice looks like
Database migrations are one of those areas where the difference between "it works" and "it works in production" is significant. Every pattern described above comes from real incidents, real production systems, and real fixes applied under pressure.
Treating migrations as first-class code means they go through code review, they have tests, and they have a deployment strategy. For systems with strict audit requirements, migrations are logged and tied to deploy records.
The result is that schema changes become routine. Changes that once required weekend maintenance windows deploy during business hours without anyone noticing. That is the goal: treating your database as code, with versioning that is as boring and reliable as the rest of your deployment pipeline. Tools like Flyway and Liquibase follow the same principles for non-Laravel stacks, but Laravel's built-in migration system handles the vast majority of what we need. For applications already in production, schema management is part of our ongoing support service, where migration discipline is maintained alongside security patches and feature work.
Get Your Migrations Right
If you are dealing with migration complexity across multiple databases, large tables, or multi-tenant systems, we are happy to talk it through.
Discuss your migration challenges →