Server Management, Self-Hosting, and DevOps Basics
Your application is only as reliable as the servers it runs on. The most carefully tested Laravel codebase becomes worthless when a deployment corrupts the database, a disk fills up at 3am, or a certificate expires without anyone noticing. Web application infrastructure is the set of decisions and systems that prevent those failures.
The patterns described here come from provisioning and managing infrastructure across a wide range of Laravel applications, some running continuously for over a decade. Real failures, real fixes, and the operational discipline that keeps production systems alive. That includes taking over servers configured by previous developers, with no documentation and no obvious way to rebuild them.
The Constraint: Why Infrastructure Decisions Compound
Most development teams treat infrastructure as an afterthought. The application gets attention; the server gets whatever the default setup provides. This works until it does not.
A single misconfigured Nginx worker pool causes request queuing under load. A database running on the same disk as the application means a large import fills the volume and crashes both. An SSL certificate that renews manually gets forgotten and takes the site offline on a Saturday morning.
The compounding effect: Every infrastructure decision constrains future options. The hosting provider you choose determines your scaling options. Your deployment method determines your rollback speed. Your monitoring setup determines whether you find problems or your users do.
The constraint is this: infrastructure must be reproducible, observable, and recoverable. If you cannot rebuild a server from scratch in under an hour, you do not have infrastructure. You have a snowflake.
The Naive Approach: Manual Servers and Hope
The tutorial version of deployment looks like this: SSH into a server, run git pull, run composer install, run php artisan migrate, restart PHP-FPM. It works on the first deploy. It fails on the fiftieth.
The problems accumulate gradually. File permissions drift. Environment variables get edited directly on the server and never recorded anywhere. A failed migration leaves the database in a half-applied state. A composer install downloads a new dependency version that breaks the application, and there is no way to roll back without restoring a full backup.
The pattern extends to hosting decisions. A team starts on shared hosting because it is cheap. The application grows. Shared hosting throttles CPU during peak hours. The response is to upgrade to a bigger plan, then a VPS, then a managed platform, each migration requiring a full rebuild because nothing was documented or automated.
The Robust Pattern: Infrastructure as a Managed System
We treat web application infrastructure as code, not as a set of manual configurations. Every server we provision follows a repeatable process. Every deployment is atomic and produces an immutable build artefact. Every failure mode has a documented recovery path.
The stack
Our standard Laravel infrastructure stack uses these components, each chosen for a specific reason and managed against a specific failure mode.
Ubuntu LTS on a VPS
Hetzner or DigitalOcean, depending on region and requirements. Long-term support releases provide security patches without breaking changes.
Laravel Forge for provisioning
Server provisioning, SSL management, and deployment orchestration. Handles PHP installation, Nginx configuration, and Let's Encrypt certificates.
Nginx + PHP-FPM + OPcache
Nginx handles request routing and SSL termination. PHP-FPM manages application worker processes, with pool sizes calculated from available memory (each worker consumes roughly 30-50MB). OPcache eliminates repeated PHP compilation, reducing response times by 50-70% on typical Laravel requests.
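The OPcache gains above depend on a handful of ini directives. A plausible production fragment follows; the specific values are assumptions to tune per server, and disabling timestamp validation assumes your deploy pipeline reloads PHP-FPM after each release (as ours does).

```ini
; Hypothetical OPcache settings for a production Laravel server.
; Values are starting points, not prescriptions.
opcache.enable=1
opcache.memory_consumption=192        ; MB of shared memory for compiled scripts
opcache.max_accelerated_files=20000   ; Laravel + vendor easily exceeds the default
opcache.validate_timestamps=0         ; code only changes on deploy; the graceful
                                      ; PHP-FPM reload clears the cache
```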
PostgreSQL + Redis
PostgreSQL on a separate volume or server for I/O isolation. Redis provides sub-millisecond reads for cache, session storage, and queue processing.
This is not a complex stack. It is deliberately simple. Fewer moving parts means fewer failure points, and every component has been battle-tested across hundreds of thousands of production hours.
PHP-FPM worker sizing
Most PHP hosting guides say "tune your workers" without providing the actual calculation. Here it is.
pm.max_children formula:
(Total RAM - OS overhead - PostgreSQL shared_buffers - Redis maxmemory - queue worker memory) / per-worker RSS
Measure per-worker RSS with: ps -eo rss,comm | grep php-fpm | awk '{sum+=$1; n++} END {print sum/n/1024 " MB average"}'. On a typical Laravel application, expect 30-50MB per worker. A 4GB VPS with PostgreSQL and Redis running locally leaves roughly 2.5GB for PHP-FPM, which supports around 50-80 workers. Set pm = static for predictable memory usage on dedicated servers, or pm = dynamic on shared environments where memory must be reclaimed during quiet periods.
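The formula can be worked through directly. The figures below are illustrative assumptions for a hypothetical 4GB VPS, not measurements; substitute your own RSS measurement and memory budgets.

```shell
# Worked example of the pm.max_children formula (all values are assumptions).
total_mb=4096
os_mb=512            # OS overhead
pg_mb=1024           # PostgreSQL shared_buffers plus working memory
redis_mb=256         # Redis maxmemory
queue_mb=256         # queue workers
worker_rss_mb=40     # measured per-worker RSS (mid-range of the 30-50MB band)

available=$(( total_mb - os_mb - pg_mb - redis_mb - queue_mb ))
echo "available for PHP-FPM: ${available} MB"
echo "pm.max_children = $(( available / worker_rss_mb ))"
```

The result lands inside the 50-80 worker range quoted above; tightening or loosening the PostgreSQL and Redis budgets moves it accordingly.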
Why Laravel Forge
We use Laravel Forge to provision and manage servers. Forge handles the tedious parts of server setup while still allowing SSH access for the remaining 10% that requires custom configuration. The alternative is maintaining Ansible playbooks or shell scripts for server provisioning. We have done both. Forge reduces the operational burden for the 90% case.
Forge also provides a deployment pipeline: pull code, install dependencies, run migrations, build assets, restart PHP-FPM. Each step is logged. Failures halt the pipeline before the application is affected. For teams that need more control over deployment orchestration, Envoyer provides dedicated zero-downtime deployment management with release history and one-click rollback.
Hosting Decisions: VPS, Cloud, and Managed Platforms
Choosing where to host a web application is a decision with long-term consequences. The wrong choice costs money, limits scaling options, or creates vendor dependency.
VPS hosting (our default)
For most Laravel applications serving under 50,000 daily users, a well-configured VPS is the correct choice. A single server with 4 vCPUs, 8GB RAM, and SSD storage handles more traffic than most businesses generate.
We default to Hetzner for European hosting. The cost difference is significant: a Hetzner CPX31 (4 vCPU, 8GB RAM, 160GB SSD) costs approximately €15/month. The equivalent DigitalOcean droplet costs $48/month. The equivalent AWS EC2 instance costs roughly $70/month before storage and data transfer. That price gap compounds over years.
VPS hosting also means you own your infrastructure. No platform lock-in, no proprietary APIs, no sudden pricing changes. This ties directly into digital sovereignty: if your business logic, customer data, and operational processes live on servers you control, you retain the freedom to move, modify, or scale without asking a platform vendor for permission.
When to use cloud platforms
AWS, Google Cloud, and Azure make sense when you need specific managed services: object storage with CDN (S3 + CloudFront), managed database clusters with automated failover, or serverless compute for unpredictable workloads.
Cost warning: Cloud billing is notoriously difficult to predict. Egress charges, NAT gateway fees, and per-request pricing on managed services can triple the expected monthly cost. We have seen AWS bills double overnight because a misconfigured logging pipeline was writing gigabytes to CloudWatch.
Our rule: start with a VPS. Move specific services to cloud platforms when you have a concrete requirement that a VPS cannot satisfy. Do not start on AWS because it feels professional. Start on a VPS because it is simple, fast, and cheap.
Managed application platforms
Laravel Vapor, Railway, and similar platforms abstract away server management entirely. These platforms suit applications with highly variable traffic or teams with no infrastructure expertise. The cost per request is higher, but the operational burden is near zero. The limitation is control: when something goes wrong at the platform level, you wait for their support team.
Containerisation: when it helps and when it does not
Docker and Kubernetes appear in most infrastructure discussions. For teams running fewer than five services with a development team under ten people, containerisation adds operational overhead without proportional benefit. The debugging complexity, resource consumption, and learning curve exceed what most SMB applications require.
Containers become genuinely useful when you manage five or more distinct services, need environment parity across a large development team, or run a mature CI/CD pipeline that benefits from immutable build artefacts. Below those thresholds, Forge on a VPS has a lower total cost of ownership and a simpler failure surface. This is not an opinion against containers. It is a decision framework: match the tooling to the complexity of the problem.
Zero-Downtime Deployments
Every production deployment we run follows the atomic deployment pattern. A deployment pipeline is not a luxury. It is the difference between a five-second rollback and a two-hour recovery.
Create a new release directory
Clone or pull the latest code into a fresh directory on the server. Install Composer dependencies with --no-dev --optimize-autoloader.
Run migrations and build assets
Run database migrations with a pre-flight check. Build frontend assets if required. Run a health check against the new release.
Swap the symlink
Point the current symlink at the new release directory. This is atomic. The application serves the old release until the exact moment the symlink changes.
Reload and clean up
Graceful PHP-FPM restart (no dropped connections). Purge old releases, keeping the last five for rollback. Rollback means pointing the symlink at a previous directory: a one-second operation.
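The swap step is worth demonstrating in isolation, because its safety rests on a single system call. This sketch runs in a throwaway temp directory; the release names are placeholders.

```shell
# Demonstrate the atomic symlink swap in a scratch directory.
set -e
base=$(mktemp -d)
mkdir -p "$base/releases/old" "$base/releases/new"
ln -s "$base/releases/old" "$base/current"

# Create the new link under a temporary name, then rename it into place.
# rename(2) is atomic, so a request resolving "current" sees either the
# old release or the new one -- never a missing path.
ln -s "$base/releases/new" "$base/current_tmp"
mv -T "$base/current_tmp" "$base/current"

readlink "$base/current"   # now points at releases/new
```

Note that a plain `ln -sfn` is not atomic (it unlinks, then relinks); the temp-link-then-rename pattern is what deployment tools use under the hood.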
Some teams adopt blue-green deployment or canary deployment strategies at this stage, running the new release alongside the old and shifting traffic gradually. For most single-server Laravel applications, the symlink swap achieves the same outcome with less operational overhead.

The naive approach (running git pull in the live directory) means the application serves partially-updated code during deployment. A request hitting the server mid-pull might load old controllers with new views. This causes errors that are difficult to reproduce and diagnose.

When deployments fail
Every zero-downtime deployment guide covers the happy path. The harder question is what happens when a migration fails mid-deploy, when the new release passes health checks but breaks under real traffic, or when a Composer dependency introduces a runtime error that only surfaces in production.
The answer depends on the type of migration that ran. Additive migrations (adding columns, creating tables) are rollback-safe: point the symlink back to the previous release and the old code ignores the new columns. Destructive migrations (dropping columns, renaming tables) are not rollback-safe: the previous release expects columns that no longer exist. This is why we separate additive schema changes from destructive cleanup, deploying them in different releases with a buffer period between.
Rollback rule: If the migration was additive, roll back code immediately (symlink swap, one second). If the migration was destructive, you must roll forward with a fix. This distinction is the reason we never combine additive and destructive schema changes in a single deployment.
Monitoring and Alerting
Monitoring without alerting is data collection. Alerting without monitoring is guesswork. We configure both.
| Metric | Alert Threshold | Why It Matters |
|---|---|---|
| HTTP 5xx rate | Above 1% | Application errors affecting users |
| Response time (p95) | Above 500ms | Performance degradation under load |
| Disk usage | 80% warning, 90% critical | Most common infrastructure failure |
| Memory | Below 500MB free | Process starvation and OOM kills |
| SSL certificate expiry | 14 days before expiry | Auto-renewal can fail silently |
| Queue depth | Above configured threshold | Background work is backing up |
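The disk-usage thresholds from the table can be enforced with a few lines of shell on a cron schedule. This is a minimal sketch; the alerting transport (mail, webhook, pager) is left out and is yours to choose.

```shell
# Minimal disk-usage check matching the 80%/90% thresholds above.
# Intended to run from cron; wire the output to your alert channel.
check_disk() {
  pct=$1
  if [ "$pct" -ge 90 ]; then
    echo "CRITICAL: disk at ${pct}%"
  elif [ "$pct" -ge 80 ]; then
    echo "WARNING: disk at ${pct}%"
  else
    echo "OK: disk at ${pct}%"
  fi
}

# Current usage of the root filesystem, digits only
check_disk "$(df --output=pcent / | tail -n 1 | tr -dc '0-9')"
```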
Tool selection for small server estates
Enterprise APM tools (Datadog, New Relic) cost more per month than the servers they monitor at SMB scale. For teams running 1-5 servers, a layered approach covers every metric without enterprise pricing. This is the practical side of site reliability engineering (SRE): matching observability tooling to the size of the estate.
Oh Dear or Uptime Robot
External uptime monitoring, SSL certificate expiry alerts, and mixed-content detection. These catch problems that server-side monitoring cannot: DNS resolution failures, CDN outages, and certificate renewal issues.
Laravel Pulse
Application-level metrics built into Laravel: slow queries, cache hit rates, queue throughput, and user request patterns. No external service required.
Netdata
Real-time server metrics (CPU, memory, disk I/O, network) with zero-configuration installation. Lightweight enough for production use on the same server it monitors.
Sentry
Error tracking with full stack traces, release tracking, and integration with deployment pipelines. Shows which deployment introduced a new error class.
Lessons from production
Certain monitoring lessons only come from experience. These are the ones that have saved us repeatedly.
Queue workers are a recurring example: they accumulate memory over long runs, so we cap their lifetime (for instance with --max-jobs=1000) and let the supervisor restart them, rather than trusting a worker to run indefinitely.

Backup Strategy
We follow the 3-2-1 rule: three copies of data, on two different storage types, with one copy off-site.
Database backups
Automated daily PostgreSQL dumps via pg_dump, stored locally and replicated to off-site object storage. For production databases, we enable WAL (Write-Ahead Log) archiving, which provides point-in-time recovery: the ability to restore the database to any second, not just the last nightly dump.
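The cadence described above fits naturally in cron. This fragment is illustrative: the database name, paths, retention window, and the rclone remote are all assumptions to replace with your own.

```cron
# Hypothetical crontab for nightly dumps with off-site replication.
# Note: % must be escaped as \% inside crontab entries.
30 2 * * * pg_dump --format=custom app_production > /var/backups/postgres/app-$(date +\%F).dump
45 2 * * * rclone copy /var/backups/postgres offsite:backups/postgres
0  3 * * 0 find /var/backups/postgres -name '*.dump' -mtime +14 -delete
```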
Uploaded files
Synced to off-site storage nightly. Large media files stored directly in object storage (S3 or equivalent) rather than on the application server.
Server configuration
Managed through Forge. A new server can be provisioned from scratch in under 30 minutes.
The critical discipline is testing restores. A backup that has never been restored is a hope, not a backup. We test database restores monthly and document the recovery time.
Two numbers govern backup strategy: RTO (Recovery Time Objective, how long the business can tolerate being offline) and RPO (Recovery Point Objective, how much data loss is acceptable). A nightly pg_dump gives an RPO of up to 24 hours. WAL archiving with continuous shipping can reduce RPO to seconds. If a full restore takes longer than the business can tolerate, we adjust the strategy: faster storage, parallel restore, or a standby replica that can be promoted immediately.
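Enabling WAL archiving is a small configuration change on the PostgreSQL side. A plausible postgresql.conf fragment follows; the archive destination and timeout are assumptions, and in practice the archive_command should be whatever ships segments to your off-site storage.

```ini
# Hypothetical postgresql.conf fragment enabling WAL archiving for
# point-in-time recovery (archive destination is an assumption).
wal_level = replica
archive_mode = on
archive_command = 'rclone copyto %p offsite:wal/%f'   # %p = segment path, %f = file name
archive_timeout = 60    # force a segment switch at least every minute, bounding RPO
```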
Scaling: Vertical First, Then Horizontal
Premature scaling wastes money and adds operational complexity. Most Laravel applications never need horizontal scaling. Vertical scaling (upgrading the server) is simpler, cheaper, and sufficient for the vast majority of workloads.
Vertical scaling
A single well-configured server handles more traffic than most teams expect. Nginx serves static assets from memory. Redis eliminates repeated database queries. PHP-FPM worker tuning ensures available memory is used efficiently. When a server is genuinely under-resourced, the fix is straightforward: increase RAM, add CPU cores, switch to faster storage.
When horizontal scaling is necessary
Horizontal scaling becomes necessary when a single server cannot handle the requirements regardless of size, or when you need geographic distribution for latency reasons. It requires architectural changes that are best planned before they are needed.
The decision rule: if your monthly hosting cost is under £500 and you are not experiencing performance problems, you do not need horizontal scaling. Invest that engineering time in application-level optimisation instead: query optimisation, caching strategies, and efficient background job processing.
Common Infrastructure Symptoms
When something goes wrong in production, the symptom is rarely the cause. This reference maps the errors you see to the infrastructure problems that produce them.
| Symptom | Likely Cause | Fix |
|---|---|---|
| 502 Bad Gateway | PHP-FPM socket not running or worker pool exhausted | Check pm.max_children against available memory. Restart PHP-FPM. |
| 504 Gateway Timeout | Nginx fastcgi_read_timeout exceeded | Optimise the slow request, or move long-running work to a background job. |
| Disk full | Unrotated log files, failed job output, or temporary uploads | Configure logrotate for Laravel logs. Monitor disk usage with alerts at 80%. |
| SSL certificate expired | Let's Encrypt ACME renewal failed silently | Check HTTP-01 challenge accessibility. Test certbot renew --dry-run. |
| "Too many connections" | PHP-FPM workers exceeding PostgreSQL max_connections | Add PgBouncer for connection pooling, or reduce pm.max_children. |
| OOM killer in dmesg | pm.max_children set too high for available RAM | Recalculate worker count using the formula above. Switch to pm = static. |
| Stuck queue jobs | Worker crash from memory leak or unhandled exception | Set --max-jobs=1000 and --max-time=3600 to force periodic worker restarts. |
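The stuck-queue fix from the table belongs in your process supervisor rather than a manual command. A sketch of a Supervisor program definition follows; the program name, path, and process count are assumptions.

```ini
; Hypothetical Supervisor program for Laravel queue workers, applying
; the --max-jobs / --max-time guards from the table above.
[program:app-worker]
command=php /var/www/example/current/artisan queue:work --max-jobs=1000 --max-time=3600 --tries=3
process_name=%(program_name)s_%(process_num)02d
numprocs=2
autostart=true
autorestart=true
stopwaitsecs=60        ; give an in-flight job time to finish before SIGKILL
user=www-data
redirect_stderr=true
```

When a worker exits after hitting its job or time cap, Supervisor restarts it automatically, so slow memory leaks never accumulate.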
Infrastructure as an Asset
Web application infrastructure is not a cost centre. It is an operational asset that determines uptime, deployment speed, and recovery capability.
- Deployments happen multiple times per day: atomic deploys with one-second rollback. No stress, no downtime, no maintenance windows.
- Failures are detected before users notice: continuous monitoring with targeted alerts. Problems found in minutes, not days.
- Recovery follows a documented procedure: tested backups, rehearsed restores, known recovery times. No panicked improvisation.
- Provider independence: standard Linux servers on standard infrastructure. Move providers in a day, not a quarter.
This is closely related to the question of owning versus renting your systems. Infrastructure decisions also affect your security and operational posture. Every component in the stack is a potential attack surface. Fewer components, kept current and monitored, means a smaller surface to defend. If you are migrating from a legacy system, the infrastructure plan must account for the transition period: running old and new systems in parallel, data synchronisation, and the eventual cutover. And infrastructure is not a one-time decision. It is an ongoing maintenance commitment that compounds in value when treated as a first-class concern.
Get Your Infrastructure Right
If you are running a web application and your infrastructure needs attention, we are happy to talk it through. Infrastructure management is a core part of our ongoing support service, covering monitoring, security patches, deployment pipelines, and capacity planning.
Discuss your infrastructure →