Most businesses don't run on one system. They have a CRM, an accounting package, maybe an ERP, various SaaS tools, and probably some legacy system nobody wants to touch. These systems need to talk to each other, and making that work reliably is harder than it looks. For external service connections, see our API integrations guide.
Integration patterns are proven approaches to connecting systems. The right pattern depends on what you're connecting, how data needs to flow, and what happens when things go wrong. This page covers the patterns we use, the trade-offs involved, and the implementation details that separate working integrations from fragile ones.
Why Integration Is Hard
Integration projects fail for predictable reasons. Understanding these failure modes is the first step to avoiding them.
Different data models
Your CRM's idea of a "customer" doesn't match your accounting system's "account." Fields have different names, different formats, different validation rules. Translation is always needed.
Different timing expectations
Some systems expect real-time updates. Others batch process overnight. Some can handle delays; others break if data is stale.
Different reliability guarantees
What happens when a system is down? Does the other system retry? Queue updates? Fail silently? Each system handles failures differently.
Different change velocities
One system updates monthly. Another changes APIs without warning. Your integration has to handle both stable and volatile dependencies.
These differences compound. A customer record in your CRM might map to three different entities in your ERP: a contact, a company, and a billing account. Each has its own identifier, its own update rules, its own validation constraints. The integration has to understand all of this and keep everything in sync.
Integration Architecture Patterns
Before choosing how to connect systems, you need to understand the topology options. Each pattern has different scaling characteristics, failure modes, and maintenance burdens. For a comprehensive catalogue of these patterns, the Enterprise Integration Patterns reference remains the canonical resource.
Point-to-Point
System A talks directly to System B. The simplest possible integration: one system calls another's API or writes to its database. There's no intermediary, no abstraction layer, no message queue.
Connection count: With N systems, point-to-point requires N*(N-1)/2 connections for full mesh connectivity. Three systems need three connections. Ten systems need forty-five. The complexity grows quadratically.
Point-to-point works when you have exactly two systems and no plans to add more. It fails when you try to scale it. Each new system multiplies the integration burden. Each system needs to understand the data formats of every other system. Changes in one system ripple through all its connections.
We use point-to-point for simple, stable, two-system scenarios. A marketing automation tool pushing leads to a CRM. An e-commerce platform sending orders to a fulfilment service. Isolated, well-defined data flows with clear ownership.
Hub and Spoke
All systems connect to a central hub. The hub handles translation between data formats, routes messages to the right destinations, and provides a single point of monitoring. New systems only need one connection: to the hub.
The hub becomes the canonical representation of your data model. System A sends a customer update in its native format. The hub translates it to the canonical format, then translates it again to System B's format. Neither A nor B needs to know anything about the other.
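As a minimal sketch of the canonical-model idea, the snippet below shows a hub translating a CRM-native record into a canonical customer, then into a billing-system format. The field names and the two example systems are illustrative, not taken from any specific product.

```python
# Sketch: the hub translates each system's native format to a canonical customer
# record, then from canonical to the destination's format. Field names are
# hypothetical examples.

def crm_to_canonical(crm_record: dict) -> dict:
    """Translate the CRM's native customer format into the hub's canonical model."""
    return {
        "customer_id": crm_record["ContactId"],
        "name": crm_record["FullName"],
        "email": crm_record["EmailAddress"],
    }

def canonical_to_billing(canonical: dict) -> dict:
    """Translate the canonical model into the billing system's expected format."""
    return {
        "account_ref": canonical["customer_id"],
        "account_name": canonical["name"],
        "contact_email": canonical["email"],
    }

# The hub applies both steps; neither system knows the other's format.
crm_update = {"ContactId": "C-1001", "FullName": "Jane Doe", "EmailAddress": "jane@example.com"}
billing_payload = canonical_to_billing(crm_to_canonical(crm_update))
print(billing_payload)
```

Adding a new system means writing two translations (native to canonical, canonical to native) rather than one per existing system.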
Hub and spoke is the pattern we recommend for most mid-size integration projects. Five to fifteen systems, moderate complexity, a team that can maintain the hub. The initial investment in building the hub pays off quickly as you add more systems.
Event Bus / Message Broker
Systems publish events to a message broker. Other systems subscribe to the events they care about. Publishers don't know who's listening. Subscribers don't know where messages originate. The broker handles routing, persistence, and delivery guarantees.
This is publish-subscribe (pub/sub) architecture. When a customer record changes in your CRM, it publishes a "customer.updated" event containing the changed data. Your marketing system subscribes to customer events. Your billing system subscribes to customer events. Neither knows about the other. The CRM doesn't know who's listening.
Event design matters: Events should contain enough data for subscribers to act without calling back to the source. If your "order.created" event only contains an order ID, every subscriber has to fetch the order details. This creates load on the source system and tight coupling you were trying to avoid.
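As a sketch of the "enough data to act" principle, here is what a self-contained order.created event might look like compared with an ID-only event. The field names and structure are hypothetical; the publishing mechanics depend on your broker.

```python
import json
from datetime import datetime, timezone

# Anti-pattern: subscribers must call back to the source to do anything useful.
thin_event = {"type": "order.created", "order_id": "ORD-1001"}

# Better: the event carries the data subscribers need to act independently.
rich_event = {
    "type": "order.created",
    "occurred_at": datetime.now(timezone.utc).isoformat(),
    "order": {
        "order_id": "ORD-1001",
        "customer_id": "C-42",
        "currency": "GBP",
        "total": "129.99",
        "lines": [
            {"sku": "SKU-7", "quantity": 2, "unit_price": "49.99"},
            {"sku": "SKU-9", "quantity": 1, "unit_price": "30.01"},
        ],
    },
}

# Publishing is broker-specific; the payload design is not.
print(json.dumps(rich_event, indent=2))
```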
Event-driven architecture excels when you have many systems that need to react to the same business events. It handles scale well (add more subscribers without changing publishers), isolates failures (one slow subscriber doesn't block others), and supports temporal decoupling (subscribers can process events when they're ready).
The complexity cost is real. You need to design your event schema carefully. You need to handle event ordering (or accept that events may arrive out of order). You need infrastructure to run the message broker. You need tooling to trace events through the system.
ETL (Extract, Transform, Load)
Batch-oriented data movement. Extract data from source systems, transform it to the target format, load it into the destination. Runs on a schedule rather than in response to individual changes.
ETL is the right pattern when:
- Real-time sync isn't required (reporting, analytics, data warehousing)
- Source systems can't handle the load of real-time queries
- You need to aggregate data across multiple sources before loading
- The transformation logic is complex and benefits from batch processing
ETL jobs typically run during off-peak hours. They extract changed records since the last run (delta extraction), apply transformation logic, and bulk-load into the destination. The batch nature allows optimisations that aren't possible with row-by-row processing.
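A minimal sketch of a delta-extraction job, assuming the source table has an updated_at column and the previous run's watermark is persisted somewhere. It uses SQLite so it is self-contained; the table names, columns, and watermark handling are illustrative.

```python
import sqlite3
from datetime import datetime, timezone

def run_etl(source: sqlite3.Connection, dest: sqlite3.Connection, last_run: str) -> str:
    # Extract: only rows changed since the previous watermark (delta extraction).
    rows = source.execute(
        "SELECT id, name, email, updated_at FROM customers WHERE updated_at > ?",
        (last_run,),
    ).fetchall()

    # Transform: apply whatever mapping the destination requires.
    transformed = [(r[0], r[1].strip().title(), r[2].lower()) for r in rows]

    # Load: bulk upsert into the destination in one batch.
    dest.executemany(
        "INSERT OR REPLACE INTO customers (id, name, email) VALUES (?, ?, ?)",
        transformed,
    )
    dest.commit()

    # Return the new watermark for the next run.
    return datetime.now(timezone.utc).isoformat()

if __name__ == "__main__":
    src = sqlite3.connect(":memory:")
    dst = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE customers (id TEXT, name TEXT, email TEXT, updated_at TEXT)")
    dst.execute("CREATE TABLE customers (id TEXT PRIMARY KEY, name TEXT, email TEXT)")
    src.execute("INSERT INTO customers VALUES ('C-1', ' jane doe ', 'JANE@EXAMPLE.COM', '2024-01-15T10:00:00Z')")
    watermark = run_etl(src, dst, "2024-01-01T00:00:00Z")
    print(dst.execute("SELECT * FROM customers").fetchall(), watermark)
```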
The main limitation is latency. Data in the destination is always stale by at least one batch interval. For some use cases (monthly financial reports), this is fine. For others (inventory availability), it's unacceptable.
Middleware and Message Brokers
The choice of middleware significantly affects what's possible in your integration architecture. Different brokers offer different guarantees around ordering, durability, and delivery.
Message Queue Semantics
Understanding delivery guarantees is fundamental to reliable integration design.
| Guarantee | Behaviour | When to use |
|---|---|---|
| At-most-once | Messages may be lost. Never delivered more than once. | Metrics, logs, non-critical notifications |
| At-least-once | Messages never lost. May be delivered multiple times. | Most business data (with idempotent consumers) |
| Exactly-once | Each message delivered exactly once. | Financial transactions (often simulated with at-least-once + deduplication) |
At-least-once is the practical default for most integrations. It's achievable with standard message brokers (RabbitMQ, Amazon SQS, Azure Service Bus). The trade-off is that your consumers must be idempotent: processing the same message twice should produce the same result as processing it once.
Exactly-once delivery is hard. What looks like exactly-once is usually at-least-once combined with deduplication at the consumer. The broker assigns each message a unique ID. The consumer tracks which IDs it has processed and ignores duplicates. This pushes the complexity from the broker to the consumer, but it's reliable.
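A minimal sketch of consumer-side deduplication for at-least-once delivery, assuming each message carries a unique ID. The handler and in-memory set are illustrative; in production the processed-ID record would live in a durable store.

```python
# Sketch: the broker may redeliver a message; the consumer records processed
# message IDs and skips duplicates, so reprocessing is harmless (idempotent).

processed_ids: set[str] = set()  # in production: a durable store (database, Redis, etc.)

def apply_business_logic(payload: dict) -> None:
    print("processing", payload)

def handle(message: dict) -> None:
    msg_id = message["message_id"]
    if msg_id in processed_ids:
        return  # duplicate delivery: already processed, safely ignore
    apply_business_logic(message["payload"])
    processed_ids.add(msg_id)  # record only after successful processing

# The same message delivered twice produces the same end state.
msg = {"message_id": "m-123", "payload": {"customer_id": "C-42", "email": "jane@example.com"}}
handle(msg)
handle(msg)  # second delivery is deduplicated
```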
Broker Selection Criteria
Different message brokers suit different scenarios. The choice depends on your throughput requirements, ordering guarantees, and operational capabilities.
RabbitMQ
Mature, well-documented, supports multiple messaging patterns (queues, topics, routing). Good for moderate throughput (tens of thousands of messages per second). Strong ordering guarantees within a single queue. Clustering for high availability. We use RabbitMQ for most business integration projects where reliability matters more than extreme throughput.
Apache Kafka
Designed for high-throughput event streaming. Millions of messages per second. Persistent log-based storage means consumers can replay historical events. Partitioned topics for parallel processing (but ordering only guaranteed within partitions). Operationally complex. We use Kafka when throughput demands it or when event replay capability is a requirement.
Cloud-native options
Amazon SQS, Azure Service Bus, Google Pub/Sub. Managed services that reduce operational burden. Generous free tiers for moderate usage. Trade-off is less control over configuration and potential vendor lock-in. We use these when the project is already committed to a specific cloud platform.
Data Synchronisation Strategies
Choosing how to keep data in sync across systems is separate from choosing the integration architecture. The sync strategy determines what data flows, when it flows, and who owns conflicting changes.
Sync Direction
One-way sync
Data flows from source to destination only. The destination is read-only for that data. Simpler because there's no conflict resolution.
Example: Customer data from CRM synced to email marketing tool.
Two-way sync
Changes in either system propagate to the other. Complex because changes can happen simultaneously and conflict.
Example: Calendar sync between two systems where events can be created in either.
Primary-replica
One system is authoritative. Changes happen there, then propagate to others. Other systems can read but not modify.
Example: Product catalogue maintained in ERP, synced to website and POS systems.
Merge replication
Changes from multiple systems are merged. Conflicts resolved by rules or human review. Complex but preserves all changes.
Example: Field sales reps working offline, syncing back to central CRM.
Eventual Consistency
In distributed systems, strong consistency (all systems see the same data at the same time) comes at the cost of availability and latency. Most integration scenarios accept eventual consistency: all systems will converge to the same state, but not instantaneously.
Eventual consistency means living with temporary inconsistencies. A customer's address might be updated in the CRM at 10:00. The billing system might not see that change until 10:05. During those five minutes, the two systems disagree. This is usually acceptable. The alternative (locking both systems until they synchronise) would make them slower and more fragile.
The CAP theorem in practice: of consistency, availability, and partition tolerance, a distributed system can guarantee at most two at once. Network partitions will happen whether you like it or not, so the real choice is between consistency and availability when they do. For cross-system integration, we almost always choose availability, accepting eventual consistency as the trade-off.
Conflict Resolution Strategies
When two systems can modify the same data, conflicts are inevitable. The question is how to detect and resolve them.
Last-write-wins
Compare timestamps. The most recent change wins. Simple to implement, but requires synchronised clocks and loses data when concurrent changes affect different fields of the same record. A CRM user updates a phone number at 10:00:00.000. An ERP user updates a billing address at 10:00:00.001. The entire CRM record gets overwritten.
Field-level merge
Track changes at field granularity. If CRM changes phone and ERP changes address, merge both changes. Only conflict when both systems change the same field. More complex tracking, but preserves more data.
Source-of-truth by field
Define which system owns which fields. CRM owns contact information. ERP owns financial data. Changes to non-owned fields are ignored or flagged. Clear rules, but requires upfront agreement and may not match reality.
Manual resolution queue
Detect conflicts and queue them for human review. Preserves all data but creates operational burden. Appropriate for high-value records where data loss is unacceptable.
We typically recommend source-of-truth by field for most business integrations. It requires upfront analysis to determine which system should own which data, but it eliminates most conflict scenarios and provides clear rules that both technical and business stakeholders can understand.
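A minimal sketch of source-of-truth by field: a simple ownership map decides which system's value is accepted for each field. The field names and system labels are hypothetical.

```python
# Sketch: field-level ownership. Each field has exactly one owning system; updates
# to a field from any other system are ignored (or could be flagged for review).

FIELD_OWNER = {
    "phone": "crm",
    "email": "crm",
    "billing_address": "erp",
    "credit_limit": "erp",
}

def apply_update(record: dict, source_system: str, changes: dict) -> dict:
    merged = dict(record)
    for field, value in changes.items():
        if FIELD_OWNER.get(field) == source_system:
            merged[field] = value
        # else: non-owned field, ignore or queue for review
    return merged

customer = {"phone": "07700 900000", "billing_address": "1 Old Street"}
customer = apply_update(customer, "crm", {"phone": "07700 900123", "billing_address": "ignored"})
customer = apply_update(customer, "erp", {"billing_address": "2 New Street"})
print(customer)  # phone owned by the CRM, billing address owned by the ERP
```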
Data Transformation
Transformation is where most integration complexity lives. Source data never matches destination requirements exactly. The transformation layer handles the translation. Proper data modelling in your target system makes transformation rules clearer and validation more reliable.
Field Mapping
The simplest transformation: source field X maps to destination field Y. Straightforward when semantics match, even if names differ.
Real-world complications:
- Cardinality differences: Source has one address field. Destination has separate fields for street, city, postcode. You need parsing logic.
- Composite keys: Source identifies customers by email. Destination uses a compound key of account number and contact ID. You need lookup logic.
- Optional vs required: Source allows null values. Destination requires values. You need default logic.
- Length limits: Source allows 500-character notes. Destination allows 255. You need truncation logic (and a decision about what to do with the lost data).
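A minimal sketch of a mapping function handling a few of the complications above: splitting a single address field, defaulting a required value, and truncating to a length limit. All field names and the naive comma-split are illustrative only.

```python
# Sketch: field mapping with parsing, defaulting, and truncation. The source and
# destination field names are hypothetical.

def map_customer(source: dict) -> dict:
    # Cardinality: one free-text address field split into parts (naive comma split).
    parts = [p.strip() for p in source.get("address", "").split(",")]
    street, city, postcode = (parts + ["", "", ""])[:3]

    # Optional vs required: destination requires a country; default when missing.
    country = source.get("country") or "GB"

    # Length limits: destination allows 255 characters of notes; truncate and flag.
    notes = source.get("notes", "")
    truncated = len(notes) > 255

    return {
        "street": street,
        "city": city,
        "postcode": postcode,
        "country": country,
        "notes": notes[:255],
        "notes_truncated": truncated,  # make the data loss visible downstream
    }

print(map_customer({"address": "1 High Street, Bristol, BS1 4DJ", "notes": "x" * 300}))
```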
Format Conversion
Data types that look the same often aren't. Dates are the classic example.
| Source | Destination | Challenge |
|---|---|---|
| 2024-01-15 | 15/01/2024 | Format conversion only |
| 2024-01-15T14:30:00Z | 2024-01-15T09:30:00-05:00 | Timezone conversion required |
| 1705330200 | 15-Jan-2024 | Unix timestamp to human-readable |
| "January 15th, 2024" | 2024-01-15 | Natural language parsing |
Currency, phone numbers, and addresses present similar challenges. A UK phone number might be stored as "07700 900000", "+447700900000", or "447700900000". Your transformation layer needs to normalise these to a consistent format.
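A minimal sketch of this kind of normalisation, using the standard library for dates and a deliberately naive rule for UK numbers; a production integration would use a dedicated library such as phonenumbers, and the accepted formats here are just examples.

```python
from datetime import datetime, timezone

def parse_date(value: str) -> str:
    """Accept several inbound date formats and return ISO 8601 (YYYY-MM-DD)."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y", "%d-%b-%Y"):
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    # Unix timestamps sometimes arrive as digit strings in exports.
    if value.isdigit():
        return datetime.fromtimestamp(int(value), tz=timezone.utc).date().isoformat()
    raise ValueError(f"Unrecognised date format: {value!r}")

def normalise_uk_phone(value: str) -> str:
    """Normalise UK numbers to E.164 (+44...). Naive rule, for illustration only."""
    digits = "".join(ch for ch in value if ch.isdigit())
    if digits.startswith("0"):
        digits = "44" + digits[1:]
    return "+" + digits

print(parse_date("15/01/2024"), parse_date("1705330200"))
print(normalise_uk_phone("07700 900000"), normalise_uk_phone("+44 7700 900000"))
```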
Aggregation and Splitting
One record in source becomes multiple in destination (or vice versa). An order in your e-commerce system might create:
- One invoice header in accounting
- Multiple invoice line items
- One or more shipment records in logistics
- Customer credit updates if loyalty points apply
Splitting requires careful sequencing. The invoice header must exist before line items reference it. Foreign key relationships in the destination constrain the order of operations.
Aggregation is the reverse. Multiple source records become one destination record. Daily sales transactions might aggregate into a single revenue entry in the general ledger. The transformation needs to accumulate, validate, and flush at appropriate boundaries.
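A minimal sketch of the splitting case, showing the sequencing constraint: the invoice header is created first so that line items can reference its identifier. The function names and record shapes are hypothetical stand-ins for calls to the destination system.

```python
# Sketch: split one e-commerce order into an invoice header plus line items,
# creating the header first so the lines can reference it.

def create_invoice_header(order: dict) -> str:
    # In a real integration this calls the accounting system and returns its ID.
    invoice_id = f"INV-{order['order_id']}"
    print("created header", invoice_id)
    return invoice_id

def create_invoice_line(invoice_id: str, line: dict) -> None:
    print("created line", {"invoice_id": invoice_id, **line})

def split_order(order: dict) -> None:
    invoice_id = create_invoice_header(order)   # must exist first
    for line in order["lines"]:                 # then the dependent records
        create_invoice_line(invoice_id, line)

split_order({
    "order_id": "1001",
    "lines": [{"sku": "SKU-7", "quantity": 2}, {"sku": "SKU-9", "quantity": 1}],
})
```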
Enrichment
Adding data during transformation. Common enrichment patterns:
- Geocoding: Convert addresses to latitude/longitude for mapping or distance calculations.
- Currency conversion: Apply exchange rates to convert amounts to a base currency.
- Reference data lookup: Resolve codes to descriptions, or vice versa.
- Calculated fields: Derive values from other fields (age from birthdate, status from conditions).
Enrichment adds external dependencies to your integration. If the geocoding service is down, what happens? The answer needs to be explicit: fail the record, queue it for retry, or proceed with empty coordinates.
Validation
Source data may not meet destination requirements. Validate before loading. Decide upfront how to handle failures.
Validation is documentation
Your validation rules encode business knowledge about what valid data looks like. Document them. When the rules change, the integration changes. When you're debugging data quality issues, the validation rules are the first place to look.
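A minimal sketch of validation rules declared as data so they double as documentation; each rule carries a human-readable description that is reported on failure. The rules themselves are illustrative.

```python
# Sketch: validation rules as (description, check) pairs, applied before loading.

RULES = [
    ("email must contain @", lambda r: "@" in r.get("email", "")),
    ("postcode is required", lambda r: bool(r.get("postcode"))),
    ("credit_limit must be non-negative", lambda r: r.get("credit_limit", 0) >= 0),
]

def validate(record: dict) -> list[str]:
    """Return the list of failed rule descriptions (empty means valid)."""
    return [description for description, check in RULES if not check(record)]

record = {"email": "jane.example.com", "credit_limit": -50}
failures = validate(record)
if failures:
    print("quarantine record:", failures)  # or fail the batch, per your error policy
```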
Error Handling and Recovery
Integrations fail. Networks drop. Services time out. Data violates constraints. The question isn't whether failures happen, but how the system responds.
Retry Strategies
Transient failures often resolve themselves. A momentary network glitch, a service restarting, a database failover. Retry with increasing delays.
Exponential backoff
Double the delay between each retry. First retry after 1 second, then 2, then 4, then 8. Prevents overwhelming a recovering service. Add jitter (random variation) to prevent thundering herd when multiple clients retry simultaneously.
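A minimal sketch of exponential backoff with jitter, assuming the operation raises on transient failure. The retry parameters are illustrative, and in practice you would catch only the exception types you know to be transient.

```python
import random
import time

def call_with_backoff(operation, max_attempts: int = 5, base_delay: float = 1.0):
    """Retry with exponentially increasing delays (1s, 2s, 4s, ...) plus jitter."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as exc:  # in practice, catch only transient error types
            if attempt == max_attempts - 1:
                raise  # retry budget exhausted: escalate (e.g. to a dead letter queue)
            delay = base_delay * (2 ** attempt)
            delay += random.uniform(0, delay)  # jitter so clients don't retry in lockstep
            print(f"attempt {attempt + 1} failed ({exc}); retrying in {delay:.1f}s")
            time.sleep(delay)
```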
Retry budgets
Limit total retries to prevent infinite loops. After N attempts, give up and escalate. The right N depends on the failure mode: network issues might resolve in 3 retries; a malformed record will never succeed.
Circuit breakers
If a downstream service fails repeatedly, stop trying temporarily. The circuit "opens" after N failures, rejecting requests immediately. After a timeout, allow one test request through. If it succeeds, "close" the circuit and resume normal operation. This prevents cascading failures and gives failing services time to recover.
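A minimal sketch of a circuit breaker with open, closed, and half-open behaviour. The failure threshold and reset timeout are illustrative defaults.

```python
import time

class CircuitBreaker:
    """After repeated failures the circuit opens and calls are rejected immediately;
    after a cooldown one test call is allowed through, and success closes it again."""

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failure_count = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, operation):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: rejecting call")
            # cooldown elapsed: half-open, allow one test call through
        try:
            result = operation()
        except Exception:
            self.failure_count += 1
            if self.failure_count >= self.failure_threshold:
                self.opened_at = time.monotonic()  # open (or re-open) the circuit
            raise
        self.failure_count = 0
        self.opened_at = None  # success closes the circuit
        return result
```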
Dead Letter Queues
When a message can't be processed after all retries, it goes to the dead letter queue (DLQ). The DLQ is a holding area for failed messages, preserving them for investigation.
DLQ handling should include:
- Alerting: Notify when messages arrive in the DLQ. A growing DLQ is a symptom of a problem.
- Visibility: Tools to inspect DLQ contents, understand why messages failed, and identify patterns.
- Replay: Ability to reprocess DLQ messages after fixing the underlying issue.
- Expiry: Policy for how long messages stay in the DLQ before archiving or deletion.
Partial Success Handling
When you process a batch of 100 records and 3 fail, what happens to the other 97? The answer depends on business requirements.
| Strategy | Behaviour | When to use |
|---|---|---|
| All-or-nothing | If any record fails, roll back the entire batch | Financial transactions requiring consistency |
| Continue on error | Process 97, report 3 failures | Tolerant systems where partial data is useful |
| Quarantine and continue | Process 97, quarantine 3 for review | Most business data sync scenarios |
Monitoring and Alerting
Integrations need visibility. Without monitoring, you only discover failures when users complain.
- Throughput metrics: Records processed per minute. Sudden drops indicate problems.
- Latency metrics: Time from source change to destination update. Increasing latency is a warning sign.
- Error rates: Percentage of failed operations. Set thresholds that trigger alerts.
- Queue depths: Messages waiting to be processed. Growing queues indicate consumers can't keep up.
- Data freshness: Timestamp of most recent successful sync. Stale data means something is stuck.
Alert on anomalies, not just thresholds. A 1% error rate might be normal. A 1% error rate when yesterday it was 0.1% is worth investigating.
Real-Time vs Batch
The choice between real-time and batch integration affects architecture, infrastructure, and operational complexity. Neither is inherently better; each suits different requirements.
Real-time
Data syncs immediately (or nearly so). Required when freshness matters: inventory levels, order status, security events.
Challenges: Handling failures gracefully, managing spikes, ensuring ordering.
Batch
Data syncs on a schedule: hourly, daily, weekly. Appropriate when slight delays are acceptable and consistency matters.
Challenges: Handling large volumes efficiently, managing long-running processes.
Near-real-time
Not immediate, but faster than batch. Often implemented as frequent small batches (every few minutes) or delayed event processing.
Use case: When real-time is overkill but batch is too slow.
Choosing the Right Approach
Start by understanding the business requirement for data freshness. Ask: if this data is 5 minutes old, does it matter? What about 1 hour? 1 day?
| Data type | Typical freshness requirement | Recommended approach |
|---|---|---|
| Inventory levels | Seconds | Real-time |
| Order status | Minutes | Real-time or near-real-time |
| Customer contact info | Hours | Near-real-time or frequent batch |
| Financial reporting | Daily | Batch |
| Data warehouse | Daily to weekly | Batch |
Don't default to real-time. It's more complex to build, harder to debug, and requires more infrastructure. If batch meets the business requirement, batch is the better choice.
Legacy System Integration
Legacy systems are the hard cases. They often predate modern integration practices. They may lack APIs entirely. Their data models are undocumented. Their behaviour is encoded in decades of accumulated business logic. And they contain critical data that can't be lost. For comprehensive guidance on modernising these systems, see our legacy migration guide.
The reality of legacy systems
- No API (file exports only, or nothing at all)
- Undocumented data formats and business rules
- Fragile, unmaintained code that nobody wants to touch
- Critical business data that has nowhere else to live
- Users who depend on specific behaviours
Integration Approaches
File-based integration
Export data from the legacy system to files (CSV, fixed-width, XML). Import those files into the new system. This is the lowest-risk approach: you're not modifying the legacy system or querying it directly. The trade-off is batch timing and the need to handle file transfer, parsing, and cleanup.
Database-level integration
Connect directly to the legacy database and query the tables. Risky because you're bypassing the application layer. The database schema may not reflect business semantics accurately. Triggers and stored procedures may depend on application behaviour. Foreign keys might be enforced in application code, not the database. Proceed with extensive schema analysis and testing.
Screen scraping / UI automation
Automate the legacy system's user interface. Enter data into screens, read data from displays. This is the approach of last resort. It's fragile (any UI change breaks the integration), slow (limited to human-speed interaction), and hard to debug. But when there's no API, no file export, and no database access, it may be the only option.
Wrapper service (strangler fig pattern)
Build an API layer in front of the legacy system. New consumers talk to the API. The API translates to legacy operations. Over time, migrate functionality from legacy to the wrapper until the legacy system can be retired. This is a longer-term strategy but positions you for eventual migration.
Legacy Integration Best Practices
- Document as you go: Legacy systems are poorly documented. Every piece of knowledge you gain during integration should be captured.
- Test with production data: Test data doesn't capture the edge cases accumulated over years of production use.
- Plan for surprises: You will discover undocumented behaviour. Budget time for investigation.
- Maintain the legacy system's ability to function: Until migration is complete, the legacy system is still critical. Don't break it.
- Consider data quality: Legacy data often has quality issues. Address them during integration rather than propagating them.
Integration Testing
Integration testing validates that systems work together correctly. It's distinct from unit testing (individual components) and end-to-end testing (complete user flows). Integration tests verify the connections between systems.
Testing Strategies
Contract testing
Verify that systems adhere to agreed interfaces. Consumer defines expectations. Provider verifies it meets them. Tools like Pact formalise the contract and automate verification. Catches breaking changes before deployment.
Component testing with test doubles
Test your integration code against mock or stub implementations of external systems. Fast, deterministic, and independent of external availability. Doesn't catch issues caused by real system behaviour.
Sandbox environment testing
Many SaaS systems provide sandbox environments for testing. Real API behaviour without affecting production data. Essential for validating integration before go-live.
Production traffic replay
Capture real integration traffic. Replay it against new integration code. Compare results. Catches regressions and edge cases that synthetic tests miss.
What to Test
- Happy path: Normal data flows correctly from source to destination.
- Boundary conditions: Empty strings, maximum lengths, unicode, special characters.
- Error handling: What happens when the destination is unavailable? When data validation fails? When retries exhaust?
- Idempotency: Processing the same message twice produces the same result.
- Ordering: If ordering matters, verify that out-of-order messages are handled correctly.
- Performance: Can the integration handle expected throughput? What happens at peak load?
Test data management: Integration tests need representative data. Anonymised production data is ideal. Synthetic data should include edge cases you've observed in production. Maintain a library of problematic records that caused past failures.
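As a sketch of the idempotency check from the list above, here is a pytest-style test. The consumer built inside the test is a stand-in; a real test would exercise your actual integration code against a test double.

```python
def make_consumer():
    """Build a toy deduplicating consumer: (process function, observable state)."""
    state = {}
    seen = set()

    def process(message: dict) -> None:
        if message["message_id"] in seen:
            return  # duplicate delivery: ignore
        state[message["payload"]["customer_id"]] = message["payload"]["email"]
        seen.add(message["message_id"])

    return process, state

def test_processing_same_message_twice_is_idempotent():
    process, state = make_consumer()
    message = {"message_id": "m-1", "payload": {"customer_id": "C-1", "email": "jane@example.com"}}
    process(message)
    first = dict(state)
    process(message)  # simulate a duplicate delivery
    assert state == first  # state unchanged by the second delivery
```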
API Design for Integration
When building systems that others will integrate with, design choices significantly affect how easy or hard those integrations will be.
- Provide webhooks: Push notifications when data changes. Saves polling and enables real-time sync. Include retry logic for failed deliveries.
- Support bulk operations: Allow retrieving and updating multiple records in one request. Essential for efficient sync. Batch endpoints drastically reduce network overhead.
- Include timestamps: Modification timestamps enable incremental sync (only fetch records changed since the last sync). Also include created_at, updated_at, and deleted_at for soft deletes.
- Use idempotency keys: Allow retries without duplicate effects. The client provides a unique key; the server deduplicates. Critical for reliable integration.
- Version the API: Allow evolution without breaking existing integrations. Publish clear deprecation policies and maintain old versions for a reasonable period.
- Return stable identifiers: External systems store your IDs. If IDs change, integrations break. Use UUIDs or immutable business keys.
- Provide comprehensive error responses: Include error codes, human-readable messages, and enough detail to diagnose problems. Integration developers will thank you.
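As a sketch of two of these points from the consumer's side (modification timestamps for incremental sync, and an idempotency key on writes), assuming a hypothetical REST API. The base URL, endpoints, query parameter, and header name are illustrative; substitute whatever the real API documents.

```python
import uuid
import requests

BASE_URL = "https://api.example.com/v1"  # hypothetical API

def fetch_changed_customers(since_iso: str) -> list:
    """Incremental sync: only fetch records modified since the last successful run."""
    response = requests.get(
        f"{BASE_URL}/customers",
        params={"updated_since": since_iso},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

def create_order(order: dict) -> dict:
    """Write with an idempotency key so a retried request can't create duplicates."""
    response = requests.post(
        f"{BASE_URL}/orders",
        json=order,
        headers={"Idempotency-Key": str(uuid.uuid4())},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()
```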
What You Get
When we build integrations, we apply these patterns systematically. The result is infrastructure that actually works in production, not just in demos.
- Failures handled gracefully: Retries, circuit breakers, dead letter queues, and alerting built in from the start.
- Data transformed correctly: Field mapping, format conversion, and validation that handles real-world inconsistencies.
- Visibility into operations: Monitoring dashboards showing throughput, latency, error rates, and queue depths.
- Isolation from external changes: Abstraction layers that let external systems change without breaking your integration.
- Legacy systems connected: Even systems without APIs can be integrated using appropriate patterns.
- Documented patterns: Clear documentation of data flows, transformation rules, and error handling so your team can maintain the integration.
The outcome: systems that exchange data reliably, recover from failures automatically, and provide the visibility to diagnose problems when they occur.
Connect Your Systems
We build integrations that connect your systems reliably. CRMs, ERPs, accounting packages, legacy databases: connected with the right pattern for your timing and reliability needs. Data flowing where it needs to go, failures handled gracefully.
Let's talk about integrating your systems →