Keeping Systems Safe and Running

Building an application is one thing. Keeping it secure, available, and recoverable when things go wrong is another. Security and operations aren't glamorous, but they're what prevent 3am emergencies and data disasters.

Most security breaches and outages stem from neglecting basics: unpatched software, no backups, no monitoring. Getting the fundamentals right protects your business and your users. This page covers the patterns we apply to every production system we build and maintain.

Application Security: The OWASP Top 10

The OWASP Top 10 represents the most critical security risks to web applications. These aren't theoretical vulnerabilities; they're the attack vectors that actually get exploited. Every application we build is hardened against these categories from day one.

Reality check: Most breaches exploit known vulnerabilities with known fixes. The challenge isn't finding exotic zero-days; it's consistently applying well-documented defences across every endpoint, every form, every query.

A01: Broken Access Control

Users acting outside their intended permissions. The naive approach checks authentication but not authorisation, or checks it inconsistently.

Our pattern: Every controller action verifies the authenticated user has permission to access the specific resource. We use policy classes that encode business rules: "Can this user edit this record?" not just "Is this user logged in?" Authorisation checks happen at the query level, not just the route level. If a user requests order #12345, the lookup is scoped to that user's own orders, so an order they don't own is never loaded or returned.

A02: Cryptographic Failures

Sensitive data exposed through weak or missing encryption. The naive approach stores passwords in plaintext, transmits data over HTTP, or uses deprecated algorithms.

Our pattern: Passwords are hashed with bcrypt or Argon2, never encrypted (you should never need to retrieve a password). All traffic uses TLS 1.2 or higher. Sensitive data at rest (payment details, personal data) is encrypted with AES-256. We never roll our own cryptography; we use vetted libraries with secure defaults.

A03: Injection

Untrusted data sent to an interpreter as part of a command or query. SQL injection remains common because developers concatenate user input into queries.

Our pattern: All database queries use parameterised statements or ORMs that parameterise automatically. User input never appears in a query string. The same applies to OS commands, LDAP queries, and XPath expressions. We treat all input as hostile until validated and escaped appropriately for its destination.
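As a minimal sketch of a parameterised statement, the example below uses Python's built-in sqlite3 module. The users table, its columns, and the find_user_by_email helper are assumptions made up for illustration; the point is only that the input travels as a bound parameter rather than as part of the SQL text.

```python
import sqlite3

def find_user_by_email(conn: sqlite3.Connection, email: str):
    """Look up a user with a bound parameter; `email` is never parsed as SQL."""
    cur = conn.execute("SELECT id, email FROM users WHERE email = ?", (email,))
    return cur.fetchone()

# In-memory database purely for demonstration (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
conn.execute("INSERT INTO users (email) VALUES (?)", ("alice@example.com",))

print(find_user_by_email(conn, "alice@example.com"))  # the matching row
print(find_user_by_email(conn, "' OR '1'='1"))         # None: treated as a literal string
```

The classic injection payload in the second call matches nothing, because the driver compares it against the email column as plain data.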

A04: Insecure Design

Missing or ineffective security controls at the design stage. You can't patch your way out of a fundamentally insecure architecture.

Our pattern: Threat modelling during design. We identify what could go wrong before writing code: What data is sensitive? Who should access it? What happens if this component is compromised? Security requirements are specified alongside functional requirements, not bolted on afterward.

A05: Security Misconfiguration

Default credentials, unnecessary features enabled, overly permissive settings. The attack surface expands with every unnecessary component.

Our pattern: Infrastructure as code with security baselines. Every server starts from a hardened template with unnecessary services disabled. Default credentials are changed before deployment (and this is enforced, not trusted). Debug modes and development tools are never present in production. Security headers are configured by default.

A06: Vulnerable and Outdated Components

Using libraries, frameworks, or other components with known vulnerabilities. Your application is only as secure as its weakest dependency.

Our pattern: Automated dependency scanning in CI/CD. Tools like Dependabot, Snyk, or npm audit flag known vulnerabilities before code reaches production. We maintain a schedule for routine updates, not just emergency patches. When a critical vulnerability is announced, we have processes to assess impact and deploy fixes within hours, not weeks.

A07: Identification and Authentication Failures

Weak authentication mechanisms, credential stuffing, session hijacking. The naive approach allows unlimited login attempts and uses predictable session tokens.

Our pattern: Rate limiting on authentication endpoints. Progressive delays after repeated failed attempts, rather than hard lockouts that an attacker can trigger deliberately. Secure session management with HTTP-only, secure, same-site cookies. Session tokens regenerated after authentication. Multi-factor authentication available for sensitive operations.

A08: Software and Data Integrity Failures

Code and infrastructure that doesn't verify integrity. Auto-updates without signature verification. Deserialisation of untrusted data.

Our pattern: Signed releases and verified package sources. CI/CD pipelines that are themselves secured with access controls and audit logs. We never deserialise user-provided data without validation. Subresource integrity (SRI) for third-party scripts.

A09: Security Logging and Monitoring Failures

Breaches undetected because nobody was watching. Insufficient logging means attacks go unnoticed until the damage is done.

Our pattern: Comprehensive logging of security-relevant events: authentication attempts, authorisation failures, input validation failures, unusual access patterns. Logs are centralised, tamper-evident, and retained long enough to investigate incidents. Alerting on anomalies, not just failures.

A10: Server-Side Request Forgery (SSRF)

Application fetches a URL supplied by the user without validation. Attackers use this to access internal services, cloud metadata endpoints, or perform port scanning.

Our pattern: Allowlisting permitted URL schemes and destinations. Blocking requests to internal IP ranges, localhost, and cloud metadata endpoints (169.254.169.254). Validating and sanitising all user-supplied URLs before the application acts on them.
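The checks above can be sketched with the standard library alone. The allowlisted host api.example.com and the function names are hypothetical. This sketch only inspects the URL string; a production implementation would also need to check the addresses a hostname actually resolves to at connection time, which is not shown here.

```python
import ipaddress
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"https"}
ALLOWED_HOSTS = {"api.example.com"}  # hypothetical allowlist

def is_blocked_address(host: str) -> bool:
    """True if host is an IP literal in a private, loopback, link-local or reserved range."""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # not an IP literal; hostnames are handled by the allowlist
    return ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved

def is_url_permitted(url: str) -> bool:
    """Allow only allowlisted schemes and hosts; reject internal IP literals."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.scheme not in ALLOWED_SCHEMES:
        return False
    if is_blocked_address(host):
        return False
    return host in ALLOWED_HOSTS
```

Because the final check is an allowlist, anything not explicitly permitted is rejected, including the cloud metadata address 169.254.169.254.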

Authentication Patterns

Authentication is the first line of defence. Get it wrong and everything else becomes irrelevant. The naive approach implements authentication once and considers it done. The robust approach treats authentication as an ongoing concern with multiple layers.

Password Security

Passwords are hashed, never encrypted or stored in plaintext. We use bcrypt with a cost factor of 12 or higher, or Argon2id where available. Password requirements balance security with usability: minimum 12 characters, no arbitrary complexity rules that encourage predictable patterns.

Breached password checking against known compromised lists (HaveIBeenPwned API)
No password hints or security questions (they weaken security)
Secure password reset flows with time-limited, single-use tokens
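A time-limited, single-use reset token could look roughly like the sketch below. The one-hour lifetime, the in-memory store, and the function names are assumptions for illustration; a real system would keep the hashed tokens in a database table tied to a user account.

```python
import hashlib
import secrets
import time
from typing import Optional

RESET_TTL_SECONDS = 3600  # assumed one-hour validity

# Maps token hash -> expiry timestamp. Only the hash is stored, never the raw token.
_pending_resets = {}

def _digest(token: str) -> str:
    return hashlib.sha256(token.encode()).hexdigest()

def issue_reset_token(now: Optional[float] = None) -> str:
    """Create a random token, remember its hash and expiry, return the raw token."""
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(32)
    _pending_resets[_digest(token)] = now + RESET_TTL_SECONDS
    return token

def redeem_reset_token(token: str, now: Optional[float] = None) -> bool:
    """Return True once for a valid, unexpired token; pop() makes it single-use."""
    now = time.time() if now is None else now
    expiry = _pending_resets.pop(_digest(token), None)
    return expiry is not None and now <= expiry
```

Storing only the hash means a leaked table of pending resets cannot be replayed directly as reset links.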

Multi-Factor Authentication

Something you know (password) plus something you have (device). We implement TOTP (time-based one-time passwords) as the baseline, with WebAuthn/passkeys as the preferred option where supported.

Recovery codes generated at setup for account recovery
MFA required for sensitive operations even within authenticated sessions
Hardware security keys supported for high-value accounts
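For readers curious how TOTP works underneath, here is a compact sketch following RFC 4226 (HOTP) and RFC 6238 (TOTP) with HMAC-SHA1 and a 30-second step. The verify_totp helper and its one-step drift window are assumptions for illustration; in practice a maintained library is the safer choice.

```python
import hashlib
import hmac
import struct

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation offset
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over the current time step."""
    return hotp(secret, unix_time // step, digits)

def verify_totp(secret: bytes, submitted: str, unix_time: int,
                step: int = 30, window: int = 1) -> bool:
    """Accept codes from the current step and `window` steps either side (clock drift)."""
    current = unix_time // step
    for counter in range(max(current - window, 0), current + window + 1):
        if hmac.compare_digest(hotp(secret, counter), submitted):
            return True
    return False
```

With the RFC test secret b"12345678901234567890", totp(secret, 59, digits=8) yields "94287082", which matches the published RFC 6238 test vector.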

Session Management

Sessions are the persistent proof of authentication. Weak session management undermines strong login security.

Cryptographically random session IDs (minimum 128 bits entropy)
Session regeneration after authentication and privilege changes
Absolute timeout (e.g., 8 hours) and idle timeout (e.g., 30 minutes)
Secure cookie attributes: HttpOnly, Secure, SameSite=Strict
Session listing and remote logout capability
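The list above can be sketched with the standard library. The cookie name session and the helper names are assumptions; the timeout values mirror the examples given (8 hours absolute, 30 minutes idle).

```python
import secrets
from http.cookies import SimpleCookie

ABSOLUTE_TIMEOUT = 8 * 3600  # seconds since the session was created
IDLE_TIMEOUT = 30 * 60       # seconds since the last request

def new_session_id() -> str:
    """32 random bytes (256 bits), comfortably above the 128-bit minimum."""
    return secrets.token_urlsafe(32)

def session_cookie_header(session_id: str) -> str:
    """Build a Set-Cookie value with HttpOnly, Secure and SameSite=Strict."""
    cookie = SimpleCookie()
    cookie["session"] = session_id
    cookie["session"]["httponly"] = True
    cookie["session"]["secure"] = True
    cookie["session"]["samesite"] = "Strict"
    cookie["session"]["path"] = "/"
    return cookie["session"].OutputString()

def is_session_expired(created_at: float, last_seen_at: float, now: float) -> bool:
    """Expired if either the absolute or the idle timeout has passed."""
    return (now - created_at) > ABSOLUTE_TIMEOUT or (now - last_seen_at) > IDLE_TIMEOUT
```

Regenerating the ID after login or a privilege change then amounts to calling new_session_id() again and invalidating the old value server-side.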

Brute Force Protection

Rate limiting prevents automated credential attacks. The naive approach locks accounts (enabling denial of service). The robust approach adds progressive delays.

Exponential backoff: 1s, 2s, 4s, 8s delays between attempts
CAPTCHA after threshold (5 failures)
IP-based and account-based rate limits (both matter)
Notification to account owner of suspicious activity
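The delay schedule and CAPTCHA threshold above reduce to a few lines of arithmetic. The function names and the 300-second cap are assumptions added so the delay cannot grow without bound.

```python
def login_delay_seconds(failed_attempts: int, cap: int = 300) -> int:
    """Delay before the next attempt: 1s, 2s, 4s, 8s, ... capped at `cap` seconds."""
    if failed_attempts <= 0:
        return 0
    return min(2 ** (failed_attempts - 1), cap)

def captcha_required(failed_attempts: int, threshold: int = 5) -> bool:
    """Require a CAPTCHA once the failure count reaches the threshold."""
    return failed_attempts >= threshold

print([login_delay_seconds(n) for n in range(1, 5)])  # [1, 2, 4, 8]
```

Tracking the failure count per account and per IP address separately lets both limits apply, as the list recommends.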

Authorisation Patterns

Authentication answers "who are you?" Authorisation answers "what can you do?" The naive approach conflates these, checking login status but not permissions. The robust approach implements granular, context-aware access control.

Principle of least privilege: Every user, process, and system component should have only the minimum access necessary to perform its function. Default deny. Explicit grant.

Role-Based Access Control (RBAC)

Users are assigned roles; roles have permissions. Simple to understand, easy to audit. Works well when access patterns map cleanly to job functions.

Implementation: Define roles (admin, manager, user, guest). Define permissions (create_order, view_reports, manage_users). Assign permissions to roles. Assign roles to users. Check permissions at every access point. Store role assignments in a normalised database structure that supports auditing.
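A minimal in-memory sketch of that structure follows. The role and permission names come from the text; which permissions belong to which role is an assumption for illustration, and a real system would load the mapping from the database.

```python
# Assumed role-to-permission mapping for illustration only.
ROLE_PERMISSIONS = {
    "admin": {"create_order", "view_reports", "manage_users"},
    "manager": {"create_order", "view_reports"},
    "user": {"create_order"},
    "guest": set(),
}

def has_permission(user_roles, permission: str) -> bool:
    """Default deny: True only if one of the user's roles explicitly grants the permission."""
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in user_roles)

print(has_permission(["manager"], "view_reports"))  # True
print(has_permission(["user"], "manage_users"))     # False
```

Unknown roles resolve to an empty permission set, so a typo in a role name fails closed rather than open.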

Attribute-Based Access Control (ABAC)

Access decisions based on attributes of the user, resource, action, and environment. More flexible than RBAC but more complex. Appropriate when access rules depend on context.

Example: "Managers can approve expenses up to £5,000. Directors can approve any amount. Users can only view expenses from their department. Finance can view all expenses but not approve their own." These rules depend on multiple attributes and cannot be expressed as simple role checks.
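The expense rules in that example can be sketched as plain policy functions. The data classes and field names are hypothetical. Two readings are assumptions of this sketch: self-approval is denied for every role, and finance has no approval rights at all (default deny), since the example does not say whether finance may approve other people's expenses.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class User:
    id: int
    role: str        # "user", "manager", "director" or "finance"
    department: str

@dataclass(frozen=True)
class Expense:
    id: int
    owner_id: int
    department: str
    amount_gbp: int

MANAGER_LIMIT_GBP = 5000

def can_view(user: User, expense: Expense) -> bool:
    """Finance sees everything; everyone else sees only their own department."""
    if user.role == "finance":
        return True
    return user.department == expense.department

def can_approve(user: User, expense: Expense) -> bool:
    """Directors approve any amount, managers up to the limit, nobody approves their own."""
    if expense.owner_id == user.id:
        return False  # assumption: self-approval is denied for all roles
    if user.role == "director":
        return True
    if user.role == "manager":
        return expense.amount_gbp <= MANAGER_LIMIT_GBP
    return False  # default deny (includes finance in this sketch)
```

Each decision reads attributes of both the user and the resource, which is what a simple role check cannot express.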

Resource-Level Authorisation

Access control at the record level, not just the action level. The naive approach checks "can this user edit invoices?" The robust approach checks "can this user edit this specific invoice?"

Implementation: Query scoping at the data layer. User.invoices returns only invoices belonging to that user. Direct object references (invoice ID in URL) are always verified against the authenticated user's permissions before access. Never trust that a user should access a resource just because they know its identifier.

Authorisation failures should be logged but not reveal information. Return 403 Forbidden for unauthorised access to existing resources, 404 Not Found when the user shouldn't know the resource exists.
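Query scoping can be illustrated with sqlite3. The invoices table, the NotFound exception, and the helper name are assumptions for this sketch; the ownership condition lives in the WHERE clause, so an invoice belonging to someone else is indistinguishable from one that does not exist.

```python
import sqlite3

class NotFound(Exception):
    """Raised for both missing and not-owned invoices, so callers can map it to a 404."""

def get_invoice_for_user(conn: sqlite3.Connection, user_id: int, invoice_id: int):
    """Fetch an invoice only if it belongs to the given user."""
    row = conn.execute(
        "SELECT id, owner_id, total FROM invoices WHERE id = ? AND owner_id = ?",
        (invoice_id, user_id),
    ).fetchone()
    if row is None:
        raise NotFound(f"invoice {invoice_id}")
    return row

# In-memory database purely for demonstration (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, owner_id INTEGER, total INTEGER)")
conn.executemany("INSERT INTO invoices VALUES (?, ?, ?)", [(101, 1, 250), (102, 2, 900)])
```

Here user 1 can load invoice 101, while a request for invoice 102 raises NotFound even though the row exists.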

Input Validation and Output Encoding

All input is hostile until validated. All output must be encoded for its context. These are not optional practices; they are the fundamental discipline that prevents injection attacks.

Input Validation Strategy

Validate on the server. Client-side validation is for user experience, not security. Attackers bypass your JavaScript.

Allowlist, not blocklist: Define what is permitted, reject everything else
Type coercion: Expect an integer? Cast to integer. Expect an email? Validate format
Length limits: Every field has a maximum. Enforce it
Canonicalisation: Normalise before validation (Unicode, URL encoding)
File uploads: Validate type by content, not extension. Scan for malware. Store outside webroot
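A few of these rules can be sketched as small server-side validators. The username policy (lowercase letters, digits, underscore, 3 to 30 characters) and the quantity limit are assumptions chosen for illustration.

```python
import re
import unicodedata

USERNAME_RE = re.compile(r"[a-z0-9_]{3,30}")  # allowlist with built-in length limits

def validate_username(raw: str) -> str:
    """Canonicalise first (NFKC, trim, lowercase), then check against the allowlist."""
    value = unicodedata.normalize("NFKC", raw).strip().lower()
    if not USERNAME_RE.fullmatch(value):
        raise ValueError("invalid username")
    return value

def parse_quantity(raw: str, maximum: int = 1000) -> int:
    """Accept only 1 to 4 ASCII digits, then coerce and range-check."""
    if not re.fullmatch(r"[0-9]{1,4}", raw):
        raise ValueError("quantity must be a whole number")
    value = int(raw)
    if not 1 <= value <= maximum:
        raise ValueError("quantity out of range")
    return value
```

Normalising before validating matters: a full-width "Ａ" becomes a plain "A" under NFKC, so look-alike characters cannot slip past the pattern in a different form.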

Output Encoding

Data changes meaning based on context. A string that is safe in HTML can be dangerous in JavaScript or SQL.

HTML context: Encode < > & " ' to prevent XSS
JavaScript context: JSON encode, never interpolate into script blocks
URL context: URL encode parameters
SQL context: Parameterised queries (never string concatenation)
CSS context: Sanitise or reject user-controlled values

Modern templating engines (Blade, Twig, React) encode by default. The danger is when developers bypass this protection with "raw" or "dangerouslySetInnerHTML" for convenience. Every exception must be justified and reviewed.
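When encoding by hand, the standard library covers the common contexts. The sketch below is illustrative only; the variable names are made up, and passing data to JavaScript through an HTML-escaped data attribute is one assumed way of avoiding interpolation into script blocks.

```python
import html
import json
from urllib.parse import urlencode

bio = '<script>alert("hi")</script>'

# HTML context: special characters become entities.
html_safe = html.escape(bio, quote=True)

# JavaScript context: JSON-encode, then carry the JSON in an HTML-escaped
# data attribute instead of interpolating it into a <script> block.
payload = {"bio": bio}
data_attr = html.escape(json.dumps(payload), quote=True)

# URL context: encode query parameters.
query = urlencode({"q": "cats & dogs"})

print(html_safe)  # &lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt;
print(query)      # q=cats+%26+dogs
```

Client-side code can then read the attribute and call JSON.parse on it, so user data is never executed as script.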

Attack Vector	Naive Pattern	Secure Pattern
SQL Injection	"SELECT * FROM users WHERE id = " + userId	Parameterised: SELECT * FROM users WHERE id = ?
XSS	<div>{{ user.bio | raw }}</div>	<div>{{ user.bio }}</div> (auto-escaped)
Path Traversal	file_get_contents($userPath)	Validate against allowlist, use basename()
Command Injection	exec("convert " + filename)	Escape arguments, use library APIs

Secrets Management

API keys, database credentials, encryption keys, OAuth secrets. Every application has them. The naive approach commits them to version control or hardcodes them. The robust approach treats secrets as first-class infrastructure concerns, particularly important when building API integrations that require credentials for external services.

If it's in Git, it's compromised. Git history is permanent. Even if you remove a secret in a later commit, it remains in history. Assume any secret that touched version control is burned and rotate it immediately.

Environment Variables

The minimum viable approach. Secrets are injected at runtime, not stored in code. Works for simple deployments.

.env files for local development (never committed)
Environment variables set in hosting platform (Forge, Vapor, Heroku)
CI/CD secrets for deployment credentials

Limitation: No encryption at rest, limited audit trail, rotation requires redeploy.
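Reading secrets from the environment is usually paired with failing fast when one is missing. A minimal sketch, with a hypothetical require_env helper and variable name:

```python
import os

class MissingSecret(RuntimeError):
    """Raised at startup when a required secret is absent."""

def require_env(name: str) -> str:
    """Return the variable's value, or fail loudly if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise MissingSecret(f"required environment variable {name} is not set")
    return value

# Example usage (DB_PASSWORD is a hypothetical variable name):
# db_password = require_env("DB_PASSWORD")
```

Failing at startup surfaces a misconfigured deployment immediately, rather than at the first database call. The error message names the variable but never includes a value.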

Secret Management Services

Purpose-built services for secrets: AWS Secrets Manager, HashiCorp Vault, Azure Key Vault. For production systems handling sensitive data.

Encryption at rest and in transit
Access control and audit logging
Automatic rotation for supported services
Versioning and rollback capability

Trade-off: Additional infrastructure complexity, potential single point of failure.

Secret Rotation

Secrets should be rotated regularly and immediately after suspected compromise. The naive approach uses the same database password for years. The robust approach automates rotation.

Database credentials: Rotate quarterly. Use connection poolers that support credential refresh without application restart.
API keys: Rotate when team members leave. Monitor usage and revoke unused keys.
Encryption keys: Support key versioning. New data uses new key; old data decrypts with old key. Gradual migration.
OAuth tokens: Short-lived access tokens, longer-lived refresh tokens. Revocation capability.

Dependency Security

By code volume, most modern applications consist largely of third-party dependencies. A vulnerability in any dependency is a vulnerability in your application. The naive approach installs packages and never updates them. The robust approach treats dependency management as security hygiene.

Automated Scanning

Integrate vulnerability scanning into CI/CD. Fail builds for critical vulnerabilities. Alert on high and medium.

npm: npm audit, Snyk, Socket.dev
Composer: composer audit, Snyk, Roave Security Advisories
Python: pip-audit, Safety, Snyk
Containers: Trivy, Clair, Anchore

Update Strategy

Balance security with stability. Not every update is urgent; not every update can wait.

Critical vulnerabilities: Patch within 24 hours
High vulnerabilities: Patch within one week
Routine updates: Monthly update cycle with testing
Major version upgrades: Quarterly, with dedicated testing time

Dependabot, Renovate, or similar tools automate pull requests for dependency updates. Review them promptly. A backlog of 50 pending security updates is worse than useless; it's a false sense of coverage.

Supply chain attacks: Package registries can be compromised. Typosquatting (similar package names) and dependency confusion (internal package names hijacked) are real threats. Pin exact versions. Verify package integrity. Consider private registries for sensitive projects.

Backups

Backups are your last line of defence against data loss, ransomware, and catastrophic failures. The naive approach takes backups and hopes they work. The robust approach verifies backups regularly and knows exactly how long recovery takes.

The 3-2-1 Rule

A proven backup strategy:

3 copies of your data
2 different storage types (local + cloud, SSD + tape)
1 copy offsite (different physical location, different cloud region)

This protects against hardware failure, site disasters, and ransomware attacks. A single backup in the same location as your production system is not a backup; it's a copy waiting to be destroyed along with the original.

What to Back Up

Database: Your most critical asset. Daily at minimum, hourly for high-transaction systems. Point-in-time recovery where available.
Uploaded files: User uploads, documents, media. Sync to object storage with versioning.
Configuration: Environment settings, infrastructure definitions. Encrypted if containing secrets.
Code: Already in version control, but verify you can restore a working deployment from a specific commit.

Restore Testing

Backups are worthless if you can't restore from them. We schedule regular restore tests:

Monthly: Restore database to test environment, verify integrity
Quarterly: Full system restore drill, measure actual recovery time
Document the exact steps (don't rely on memory during a crisis)
Verify the restored system functions correctly, not just that files exist

Retention Policy

How long to keep backups depends on your needs and regulatory requirements:

Hourly backups: Retained for 24 hours (for quick rollback)
Daily backups: Retained for 30 days
Weekly backups: Retained for 3 months
Monthly backups: Retained for 1 year
Annual backups: Retained per regulatory requirements (often 7 years for financial data)

Encrypt backups. Store encryption keys separately from backups. A backup you can't decrypt is useless. A backup an attacker can decrypt defeats the purpose.

Logging and Audit Trails

Logs are your forensic evidence. When something goes wrong, logs tell you what happened, when, and how. The naive approach logs errors only. The robust approach creates a complete audit trail that supports investigation, compliance, and debugging.

What to Log

Authentication events: Login, logout, failed attempts, password changes, MFA events
Authorisation failures: Access denied to resources
Data access: Who accessed what sensitive data, when
Data modifications: Who changed what, before/after values
Administrative actions: User creation, role changes, system configuration
Application errors: Exceptions, failed operations, unexpected states
Security events: Input validation failures, CSRF token mismatches, suspicious patterns

What Not to Log

Logs themselves can become a security liability:

Passwords: Never, even in failed login attempts (log username only)
Credit card numbers: Mask all but last 4 digits
Personal data: Consider pseudonymisation or omission
Session tokens: Use hashes or identifiers, not raw tokens
API keys: Reference by name, not value

GDPR and similar regulations apply to logs. Retention and access controls matter.
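The redaction rules above can be applied before anything reaches the logger. This is a sketch with hypothetical helper names; truncating the SHA-256 fingerprint to 12 hex characters is an assumption that trades uniqueness for readability.

```python
import hashlib
import re

def mask_card_number(number: str) -> str:
    """Keep only the last 4 digits; everything else becomes '*'."""
    digits = re.sub(r"\D", "", number)
    if len(digits) <= 4:
        return "*" * len(digits)
    return "*" * (len(digits) - 4) + digits[-4:]

def token_fingerprint(token: str) -> str:
    """A short hash that lets log lines be correlated without exposing the raw token."""
    return hashlib.sha256(token.encode()).hexdigest()[:12]

def login_failure_event(username: str, password: str) -> dict:
    """Build the log entry for a failed login; the password is deliberately dropped."""
    return {"event": "login_failed", "username": username}

print(mask_card_number("4111 1111 1111 1111"))  # ************1111
```

Building log entries through helpers like these makes the safe form the default, rather than relying on every call site to remember the rules.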

Log Management

Logs from multiple servers must be aggregated, searchable, and tamper-evident. We use centralised logging services (Papertrail, Loggly, ELK stack, CloudWatch Logs) with the following practices:

Structured logging: JSON format with consistent fields (timestamp, level, message, context)
Request correlation: Unique request ID propagated through all log entries for a single request
Immutable storage: Write-once logs that cannot be modified after creation
Retention: Minimum 90 days for operational logs, longer for security and compliance
Access control: Log access is audited. Not everyone needs to see everything.
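Structured logging with request correlation can be sketched using Python's logging module. The JsonFormatter class, the logger name, and the way the request ID is passed through extra are assumptions for illustration; the field names follow the list above.

```python
import json
import logging
import uuid
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with consistent fields."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "message": record.getMessage(),
            "request_id": getattr(record, "request_id", None),
            "context": getattr(record, "context", {}),
        }
        return json.dumps(entry)

logger = logging.getLogger("app")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# One ID per incoming request, attached to every entry written while handling it.
request_id = str(uuid.uuid4())
logger.info("order created", extra={"request_id": request_id, "context": {"order_id": 12345}})
```

Searching the central log store for a single request_id then returns every entry for that request, across servers.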

Monitoring and Alerting

Monitoring without alerting is data collection. Alerting without action is noise. The robust approach monitors what matters, alerts when intervention is needed, and ignores the rest.

What to Monitor

Availability: Is the application responding? Can users complete key workflows?
Performance: Response times, database query times, memory usage, queue depth
Errors: Application exceptions, failed jobs, integration failures
Security: Failed login spikes, unusual access patterns, authorisation failures
Resources: Disk space, CPU, memory, connection pool exhaustion
Business metrics: Orders per hour, signups, conversion rates (anomalies indicate problems)

Alerting Discipline

Every alert should be actionable. If you receive an alert and the correct response is "ignore it," the alert should not exist.

Critical: Requires immediate response (application down, data breach)
Warning: Requires response within hours (disk space 80%, error rate elevated)
Info: Review during business hours (deployment completed, backup finished)

Alert fatigue is real. A team that receives 50 alerts daily stops responding to any of them.

External Uptime Monitoring

Internal monitoring can fail silently (if the monitoring server is down, who monitors the monitor?). External services check your application from outside your infrastructure:

HTTP checks from multiple geographic locations
SSL certificate expiry monitoring
DNS resolution monitoring
Transaction monitoring (simulate login, checkout, critical paths)

Security Testing

Security testing happens throughout the development lifecycle, not as a final gate before release. The naive approach runs a vulnerability scanner once a year. The robust approach integrates security testing into daily development.

Static Application Security Testing (SAST)

Analyse source code for security vulnerabilities without executing the application. Fast, runs in CI/CD, catches common patterns.

Tools: Semgrep, SonarQube, PHPStan (with security rules), ESLint security plugins. Focus on injection vulnerabilities, hardcoded secrets, insecure functions.

Dynamic Application Security Testing (DAST)

Test the running application from the outside. Finds vulnerabilities that only manifest at runtime, like misconfigurations and authentication flaws.

Tools: OWASP ZAP, Burp Suite, Nuclei. Run against staging environments. Authenticate as different user roles to test authorisation.

Dependency Scanning

Check all dependencies against known vulnerability databases. Run on every build. Block deployments with critical vulnerabilities.

Tools: Snyk, Dependabot, npm audit, Composer audit, Trivy for containers. Integrate with PR workflow for immediate feedback.

Penetration Testing

Manual testing by security professionals who think like attackers. Finds business logic flaws and chained vulnerabilities that automated tools miss.

Frequency: Annually for most applications. After major changes to authentication, authorisation, or payment flows. Required for PCI-DSS and some regulatory compliance.

Security testing is not a checkbox. Finding no vulnerabilities means either your application is unusually secure, or your testing isn't thorough enough. The latter is more common.

Incident Response

Incidents will happen. The question is not if but when. Preparation determines whether an incident is a controlled recovery or a chaotic scramble.

Preparation

Before incidents happen, document everything: how to access production systems, who to contact, communication channels, escalation paths. Keep offline copies. Practice the runbook. The time to find the fire extinguisher is before the server is on fire.

Response

When an incident occurs: Assess scope and severity. Contain the damage (isolate compromised systems). Communicate with affected parties. Document everything in real-time. One person coordinates; others execute. Don't destroy forensic evidence in the rush to fix.

Recovery and Learning

Post-incident: Conduct blameless post-mortem. Identify root cause (the real one, not the proximate trigger). Implement preventive measures. Update procedures. Share learnings. An incident you don't learn from is an incident waiting to repeat.

Security Incident Specifics

Security incidents require additional considerations beyond operational incidents:

Preserve evidence: Don't wipe compromised systems until forensics are complete. Image drives before restoration.
Assume lateral movement: If one system is compromised, assume attackers have accessed everything that system could access.
Credential rotation: Rotate all secrets that the compromised system could access. All of them.
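Legal and regulatory notification: Data breaches often carry notification deadlines. GDPR, for example, requires notifying the supervisory authority within 72 hours of becoming aware of a breach. Know your obligations before the incident.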
Communication: Prepare holding statements. Coordinate with legal. Don't speculate publicly about scope until you know.

Disaster Recovery

Disaster recovery is the plan for when everything fails. Not a server, not a service, but the entire system. The naive approach assumes it won't happen. The robust approach knows it will and has tested the recovery.

Recovery Objectives

RTO (Recovery Time Objective): How quickly must you recover? Four hours? Twenty-four hours? One week? This determines your infrastructure investment.

RPO (Recovery Point Objective): How much data can you afford to lose? One hour? One day? This determines your backup frequency.

These are business decisions with technical implications. A 15-minute RTO requires hot standby infrastructure. A 24-hour RTO can use cold backups. Cost scales accordingly.
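The relationship between these objectives and what you actually run is simple arithmetic, sketched below. The function names and figures are illustrative assumptions: worst-case data loss is taken to be roughly the interval between backups, and recovery time is whatever the last restore drill measured.

```python
from datetime import timedelta

def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Worst-case data loss is about one backup interval, so it must not exceed the RPO."""
    return backup_interval <= rpo

def meets_rto(measured_restore_time: timedelta, rto: timedelta) -> bool:
    """Compare the restore time measured in a drill against the RTO."""
    return measured_restore_time <= rto

# Hourly backups cannot satisfy a 15-minute RPO.
print(meets_rpo(timedelta(hours=1), timedelta(minutes=15)))   # False
# A 47-minute measured restore fits inside a 4-hour RTO.
print(meets_rto(timedelta(minutes=47), timedelta(hours=4)))   # True
```

When a check fails, either the objective or the infrastructure has to change; that is the business decision.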

Recovery Procedures

Document step-by-step recovery for:

Database corruption or loss
Complete server failure
Ransomware (restore from clean backups)
Cloud provider regional outage
Complete site restoration from scratch

Test these procedures regularly. A plan you've never tested isn't a plan; it's wishful thinking.

Infrastructure as code makes disaster recovery faster and more reliable. If your infrastructure is defined in Terraform or CloudFormation, you can recreate it from scratch. If it was configured manually over years, good luck documenting every setting.

Security Headers

HTTP security headers are low-effort, high-impact defences. They're set once in server configuration and provide protection against common attacks. The naive approach doesn't set them. The robust approach treats them as baseline requirements.

Header	Purpose	Recommended Value
Strict-Transport-Security	Enforce HTTPS, prevent downgrade attacks	max-age=31536000; includeSubDomains; preload
Content-Security-Policy	Prevent XSS, control resource loading	Define allowed sources per resource type
X-Content-Type-Options	Prevent MIME sniffing	nosniff
X-Frame-Options	Prevent clickjacking	DENY or SAMEORIGIN
Referrer-Policy	Control referrer information leakage	strict-origin-when-cross-origin
Permissions-Policy	Control browser features (camera, mic, geolocation)	Disable unused features

Content-Security-Policy deserves particular attention. A properly configured CSP prevents most XSS attacks by controlling which scripts can execute. Start with a restrictive policy and loosen as needed, rather than starting permissive. You can test your current headers using Mozilla Observatory.
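Headers are normally set in web server configuration, but the same baseline can be sketched as WSGI middleware. The with_security_headers function is hypothetical, and the Content-Security-Policy and Permissions-Policy values are assumed restrictive starting points rather than recommendations for any particular site.

```python
SECURITY_HEADERS = {
    "Strict-Transport-Security": "max-age=31536000; includeSubDomains; preload",
    "Content-Security-Policy": "default-src 'self'",  # assumed starting policy
    "X-Content-Type-Options": "nosniff",
    "X-Frame-Options": "DENY",
    "Referrer-Policy": "strict-origin-when-cross-origin",
    "Permissions-Policy": "camera=(), microphone=(), geolocation=()",  # assumed
}

def with_security_headers(app):
    """Wrap a WSGI app so every response carries the baseline headers."""
    def middleware(environ, start_response):
        def patched_start(status, headers, exc_info=None):
            # Don't override headers the application set deliberately.
            present = {name.lower() for name, _ in headers}
            extra = [(k, v) for k, v in SECURITY_HEADERS.items() if k.lower() not in present]
            return start_response(status, list(headers) + extra, exc_info)
        return app(environ, patched_start)
    return middleware
```

Applying the headers in one place means a new endpoint cannot forget them, while still letting an individual response override a value when it has a reviewed reason to.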

What You Get

Security and operations work is invisible when done well. You notice its absence, not its presence. Here's what properly implemented security and operations provides:

Tested backups with verified restoration: Not "we think we can restore" but "we did restore on this date and it took 47 minutes"
Proactive monitoring with actionable alerts: Problems detected before users report them, alerts that require response not dismissal
Defence in depth against common attacks: OWASP Top 10 mitigations applied consistently across every endpoint and form
Audit trails for compliance and investigation: Complete record of who did what, when, searchable and tamper-evident
Incident response procedures that have been tested: Documented runbooks, clear roles, practised execution
Recovery capability with known objectives: RTO and RPO defined by business need, infrastructure built to meet them

The business outcome: fewer 3am emergencies, faster recovery when things do go wrong, confidence that customer data is protected, and evidence of due diligence when auditors come calling.

Secure Your Applications

We implement security and operations that keep your applications safe and running. Backups that work, monitoring that alerts, systems hardened against attack. The fundamentals that prevent disasters.

Let's talk about security and operations →
