
12 Essential Distributed System Design Patterns Every Architect Should Know

A friendly guide to 12 distributed system design patterns in .NET — saga, CQRS, outbox, circuit breaker, retry, sidecar, and more, with diagrams and code.

14 min read · Updated May 14, 2026

A big kitchen with many cooks

Picture a busy wedding kitchen in India. One person alone cannot cook for 500 guests. So the work is split. One cook makes rotis, another fries the samosas, a third stirs the dal, and someone else only handles sweets. They shout orders across the room, pass plates back and forth, and somehow a full meal reaches every guest.

Now think about what can go wrong. The roti cook might run out of flour. The sweet cook might fall sick. A waiter might drop a tray. If the kitchen has no good habits, one small problem turns into chaos and dinner is ruined.

A distributed system is exactly this kitchen, but for software. Instead of one big program, you have many small programs (services) running on different machines, talking to each other over a network. Each one can fail or slow down at any moment.

Design patterns are the good kitchen habits. They are tried-and-tested ways to keep the whole meal coming, even when one cook stumbles. Below are twelve patterns every architect should know. We will keep the language simple, use pictures, and show small C# examples.

Figure 1: A distributed system is many small services talking over a network. Any one of them can fail at any time.

Why these patterns even exist

In a single small app, life is easy. One database, one transaction, and if something breaks you see one error. The moment you split work across machines, three hard truths appear.

The network is not reliable. Messages get lost, delayed, or arrive twice. A service you call may be up, down, or painfully slow. And there is no single big transaction that can wrap a change across five services at once. Each pattern below is an answer to one of these truths.

Here is a quick map before we go deep.

Group | Patterns | What problem they solve
Resilience | Retry, Circuit Breaker, Bulkhead, Timeout | Keep working when a service fails or slows down
Data consistency | Saga, Outbox, Idempotency | Keep data correct across many services
Read/write shape | CQRS, Event Sourcing, Cache-Aside | Make reads fast and history clear
Structure | API Gateway, Sidecar / Ambassador | Organise traffic and shared concerns

1. Retry — try again, gently

A network call sometimes fails for a tiny moment, then works fine the next second. This is called a transient fault. The Retry pattern simply tries the call again a few times before giving up.

The trick is to wait a little longer between each try. This is called exponential backoff. It stops a struggling service from being hammered while it is trying to recover. Adding a small random delay (called jitter) stops many clients from retrying at the exact same instant.

// The Microsoft.Extensions.Http.Resilience NuGet package (built on Polly,
// available since .NET 8) adds a resilience handler to a typed HttpClient.
using Polly; // for DelayBackoffType
builder.Services
    .AddHttpClient<PaymentClient>()
    .AddStandardResilienceHandler(options =>
    {
        options.Retry.MaxRetryAttempts = 3;
        options.Retry.UseJitter = true; // spread out the retries
        options.Retry.BackoffType = DelayBackoffType.Exponential;
    });

Only retry calls that are safe to repeat. Reading data is safe. Charging a card twice is not — for that you need Idempotency, which we meet later.
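To make the backoff and jitter arithmetic concrete, here is a hand-rolled sketch of what a retry loop roughly does under the hood. It is an illustration only, not how Polly implements it. The names RetryHelper, BackoffDelay, and RetryAsync, and the 20% jitter figure, are assumptions made for this example.

```csharp
using System;
using System.Threading.Tasks;

public static class RetryHelper
{
    // Delay before retry number `attempt` (1-based): baseDelay * 2^(attempt - 1),
    // plus up to 20% random jitter so clients do not all retry in lockstep.
    public static TimeSpan BackoffDelay(int attempt, TimeSpan baseDelay, Random rng)
    {
        double exponential = baseDelay.TotalMilliseconds * Math.Pow(2, attempt - 1);
        double jitter = exponential * 0.2 * rng.NextDouble();
        return TimeSpan.FromMilliseconds(exponential + jitter);
    }

    // Runs `action`, retrying up to `maxRetries` times when it throws.
    // For brevity this sketch treats every exception as transient;
    // real code should retry only on errors known to be temporary.
    public static async Task<T> RetryAsync<T>(
        Func<Task<T>> action, int maxRetries, TimeSpan baseDelay)
    {
        var rng = new Random();
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await action();
            }
            catch (Exception) when (attempt <= maxRetries)
            {
                // Wait a little longer each time, then loop and try again.
                await Task.Delay(BackoffDelay(attempt, baseDelay, rng));
            }
        }
    }
}
```

With a base delay of 200 ms, the waits before the three retries fall in the ranges 200 to 240 ms, 400 to 480 ms, and 800 to 960 ms. Once the retries are used up, the last exception is passed back to the caller.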

2. Circuit Breaker — stop knocking on a closed door

Imagine knocking on a friend's door. If nobody answers after many knocks, you stop and come back later instead of knocking forever. The Circuit Breaker does the same for a service.

It watches the failures. When too many calls fail, it "opens" and fails fast for a while, without even trying the broken service. This gives the sick service time to heal. After a cooldown, it becomes "half-open" and lets a few test calls through. If they pass, it "closes" again and traffic flows. If they fail, it opens once more and the cooldown starts over.

Figure 2: The three states of a circuit breaker. It moves between them based on how many calls succeed or fail.

Retry and Circuit Breaker are best friends. Retry handles a quick blip; the breaker steps in when the blip becomes a real outage, so you stop wasting time and threads on a service that is clearly down.
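The three states can be sketched as a tiny state machine. This is a simplified illustration, not a production breaker: it counts consecutive failures, lets only one test call through when half-open, and is not thread-safe. The class name SimpleCircuitBreaker and its parameters are assumptions for this example. In a real .NET service you would more likely use the breaker that ships with Polly.

```csharp
using System;

public enum CircuitState { Closed, Open, HalfOpen }

public sealed class SimpleCircuitBreaker
{
    private readonly int _failureThreshold;
    private readonly TimeSpan _cooldown;
    private readonly Func<DateTime> _clock; // injectable so tests can control time
    private int _consecutiveFailures;
    private DateTime _openedAt;

    public CircuitState State { get; private set; } = CircuitState.Closed;

    public SimpleCircuitBreaker(int failureThreshold, TimeSpan cooldown,
        Func<DateTime>? clock = null)
    {
        _failureThreshold = failureThreshold;
        _cooldown = cooldown;
        _clock = clock ?? (() => DateTime.UtcNow);
    }

    public T Execute<T>(Func<T> call)
    {
        if (State == CircuitState.Open)
        {
            // Still cooling down: fail fast without touching the service.
            if (_clock() - _openedAt < _cooldown)
                throw new InvalidOperationException("Circuit is open; failing fast.");
            State = CircuitState.HalfOpen; // cooldown over: allow one test call
        }

        try
        {
            T result = call();
            // Success closes the circuit and resets the failure count.
            _consecutiveFailures = 0;
            State = CircuitState.Closed;
            return result;
        }
        catch
        {
            _consecutiveFailures++;
            // A failed test call, or too many failures in a row, opens the circuit.
            if (State == CircuitState.HalfOpen || _consecutiveFailures >= _failureThreshold)
            {
                State = CircuitState.Open;
                _openedAt = _clock();
            }
            throw;
        }
    }
}
```

Callers see the original exception while the circuit is closed, and an InvalidOperationException once it has opened, so they can tell "the service failed" apart from "we did not even try".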

3. Timeout — never wait forever

A slow call is often worse than a failed one. A failed call frees up quickly. A slow call holds a thread, holds memory, and makes the caller slow too. Soon the slowness spreads everywhere.

A Timeout sets a clear limit: "If you do not answer in 2 seconds, I give up." In .NET you pass a CancellationToken so the wait can be cut short cleanly.

public async Task<Quote?> GetQuoteAsync(CancellationToken ct)
{
    using var cts = CancellationTokenSource
        .CreateLinkedTokenSource(ct);
    cts.CancelAfter(TimeSpan.FromSeconds(2)); // hard limit

    // If the call takes too long, it is cancelled and throws
    // an OperationCanceledException (TaskCanceledException).
    // GetFromJsonAsync (System.Net.Http.Json) may return null, hence Quote?.
    return await _httpClient.GetFromJsonAsync<Quote>("/quote", cts.Token);
}

4. Bulkhead — keep one leak from sinking the ship

A big ship is built with separate sealed sections called bulkheads. If one section floods, the water cannot spread, and the ship stays afloat.

In software, the Bulkhead pattern keeps resources separate so that one busy or broken part cannot use up everything. For example, you give the "reports" feature its own pool of connections. If reports go crazy and use all their pool, the "checkout" feature still has its own pool and keeps working.

Bulkhead isolation

  • Shared pool: without bulkheads, one feature can drain it all.
  • Reports pool: an isolated limit just for reports.
  • Checkout pool: stays healthy even if reports fail.
  • Search pool: its own limit, its own safety.

Each feature gets its own pool. One greedy feature cannot starve the others.
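One simple way to sketch a bulkhead in C# is a SemaphoreSlim per feature that caps how many calls may run at once. The Bulkhead class below is a hypothetical helper written for this article, and rejecting immediately when the pool is full is just one possible policy (queueing for a short time is another).

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class Bulkhead
{
    private readonly SemaphoreSlim _slots;

    public Bulkhead(int maxConcurrent) =>
        _slots = new SemaphoreSlim(maxConcurrent, maxConcurrent);

    // Runs the work only if a slot is free; otherwise rejects at once,
    // so a flooded feature cannot pile up waiting callers.
    public async Task<T> ExecuteAsync<T>(Func<Task<T>> work)
    {
        if (!await _slots.WaitAsync(TimeSpan.Zero))
            throw new InvalidOperationException("Bulkhead full; request rejected.");
        try
        {
            return await work();
        }
        finally
        {
            _slots.Release(); // always hand the slot back
        }
    }
}
```

You would create one instance per feature, for example new Bulkhead(5) for reports and new Bulkhead(20) for checkout. Even if every reports slot is busy, checkout still has its own twenty.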

5. API Gateway — one front door

If every client had to know the address of every service, life would be a mess. The API Gateway is a single front door. Clients talk to the gateway, and the gateway forwards each request to the right service.

The gateway is also a handy place for shared jobs: checking the login token, rate limiting, logging, and combining several service calls into one tidy response. In .NET you often build this with YARP, Microsoft's reverse proxy.

Figure 3: An API Gateway gives clients one address and routes each request to the correct service behind it.

6. Sidecar and Ambassador — a helper that travels along

A Sidecar is like the little carriage attached to a motorbike. It rides next to your service in its own container, handling shared chores such as logging, metrics, secrets, and network rules. Your service code stays clean because these chores live in the sidecar.

The Ambassador is a special sidecar that handles outgoing network calls for your service. It can add retries, timeouts, and the circuit breaker for you, so every language and every service gets the same safe behaviour without writing it again. Service meshes like Linkerd and Istio work this way.

7. Cache-Aside — keep a copy of hot answers

Looking something up over the network again and again is slow. The Cache-Aside pattern keeps a copy of popular data close by (often in Redis). The app checks the cache first. If the answer is there (a "hit"), great. If not (a "miss"), it reads the real source, stores a copy, and returns it.

public async Task<Product?> GetProductAsync(int id)
{
    var key = $"product:{id}";
    var cached = await _cache.GetStringAsync(key);
    if (cached is not null)
        return JsonSerializer.Deserialize<Product>(cached); // hit
 
    var product = await _repository.FindAsync(id);            // miss
    if (product is not null)
    {
        await _cache.SetStringAsync(key, JsonSerializer.Serialize(product),
            new DistributedCacheEntryOptions
            {
                AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(10)
            });
    }
    return product;
}

Always give cached items an expiry, and clear the key when the real data changes. Stale data is a common bug.

8. CQRS — separate the writing from the reading

CQRS stands for Command Query Responsibility Segregation. It is a long name for a simple idea: split the part that changes data (commands) from the part that reads data (queries).

Why bother? Reads and writes often want different shapes and different speeds. You may have one write per second but a thousand reads. With CQRS you can give reads their own fast, denormalised model — sometimes even a separate read database — while writes stay strict and safe.

Figure 4: CQRS sends writes and reads down separate paths, each tuned for its own job.

You do not need a library for this. The pattern made MediatR popular, but MediatR is now commercially licensed, so check the terms first. Plain command and query handler classes work perfectly well.
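As a rough idea of what plain handler classes can look like, here is a small sketch with no library at all. Every type name in it (PlaceOrder, GetOrderSummary, OrderSummary, ReadModel, and the two handlers) is made up for this example, and the in-memory collections merely stand in for real write and read databases.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// A command asks to change state; a query asks to read it.
public sealed record PlaceOrder(Guid OrderId, string Customer, decimal Total);
public sealed record GetOrderSummary(Guid OrderId);
public sealed record OrderSummary(Guid OrderId, string DisplayText);

// In-memory stand-in for a denormalised read database.
public sealed class ReadModel
{
    public Dictionary<Guid, OrderSummary> Summaries { get; } = new();
}

public sealed class PlaceOrderHandler
{
    private readonly List<PlaceOrder> _writeStore = new(); // stand-in for the write database
    private readonly ReadModel _readModel;

    public PlaceOrderHandler(ReadModel readModel) => _readModel = readModel;

    public void Handle(PlaceOrder command)
    {
        // The write side stays strict: validate before saving.
        if (command.Total <= 0)
            throw new ArgumentException("Order total must be positive.");
        _writeStore.Add(command);

        // Project into the read shape. Done inline here for brevity;
        // real systems often do this asynchronously from events.
        string total = command.Total.ToString("0.00", CultureInfo.InvariantCulture);
        _readModel.Summaries[command.OrderId] =
            new OrderSummary(command.OrderId, $"{command.Customer}: {total}");
    }
}

public sealed class GetOrderSummaryHandler
{
    private readonly ReadModel _readModel;

    public GetOrderSummaryHandler(ReadModel readModel) => _readModel = readModel;

    // The read side only looks up the ready-made shape; it never touches the write store.
    public OrderSummary? Handle(GetOrderSummary query) =>
        _readModel.Summaries.TryGetValue(query.OrderId, out var summary) ? summary : null;
}
```

The point of the split is that the query handler never needs the validation rules, and the command handler never needs to know how the data will be displayed.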

9. Event Sourcing — store the story, not just the ending

Most apps save only the current state. A bank balance shows ₹500, but you cannot see how it got there. Event Sourcing instead saves every change as an event: "deposited ₹700", "withdrew ₹200". The current balance is worked out by replaying all the events in order: 700 - 200 = 500.

This gives you a perfect history, easy auditing, and the power to rebuild any past state. It pairs naturally with CQRS: writes append events, and reads are built from those events into a handy shape.
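The bank example above can be sketched in a few lines. The event types Deposited and Withdrew and the Account.Replay helper are names invented for this illustration; a real event store would also keep timestamps, stream IDs, and versions.

```csharp
using System;
using System.Collections.Generic;

public abstract record AccountEvent;
public sealed record Deposited(decimal Amount) : AccountEvent;
public sealed record Withdrew(decimal Amount) : AccountEvent;

public static class Account
{
    // Rebuilds the balance by replaying the events in the order they happened.
    public static decimal Replay(IEnumerable<AccountEvent> events)
    {
        decimal balance = 0m;
        foreach (var e in events)
        {
            balance = e switch
            {
                Deposited d => balance + d.Amount,
                Withdrew w => balance - w.Amount,
                _ => throw new InvalidOperationException($"Unknown event {e.GetType().Name}")
            };
        }
        return balance;
    }
}
```

Replaying a log that holds Deposited(700) followed by Withdrew(200) gives 500. Replaying only the first event gives 700, which is how you rebuild the state at an earlier point in time.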

Approach | What is stored | Can see history? | Good for
State-based | The latest value only | No | Simple apps, CRUD
Event Sourcing | Every change as an event | Yes, full history | Audits, finance, complex flows

The cost is more complexity, so use it where history and audit really matter, not everywhere.

10. Saga — many small steps with an undo button

Here is one of the hardest problems in distributed systems. An order needs three services: reserve stock, charge the card, and book delivery. There is no single big transaction across all three. So what if the card charge fails after stock is already reserved?

The Saga pattern answers this. It breaks the work into a chain of small local steps. Each step does its own little transaction. If a later step fails, the saga runs compensating steps that undo the earlier ones — release the stock, cancel the booking. It is like a careful "undo" trail.

Order saga with compensation

  • Reserve stock: local step 1; undo = release stock.
  • Charge card: local step 2; if it fails, release stock.
  • Book delivery: local step 3; undo = cancel booking.
  • Done: all steps succeeded, order confirmed.

Each forward step has an undo. If payment fails, earlier steps are reversed.

There are two styles. In choreography, each service listens for events and reacts on its own — no boss. In orchestration, one coordinator tells each service what to do next. Tools like MassTransit support sagas, but note MassTransit is now commercially licensed too, so weigh that before adopting it.
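For a feel of the orchestration style without any library, here is a minimal in-process sketch. SagaStep and SagaOrchestrator are hypothetical names for this example. A real orchestrator would also persist its progress so it can resume after a crash, and would retry compensations that fail; this sketch does neither.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// One forward action plus the action that undoes it.
public sealed record SagaStep(string Name, Func<Task> Execute, Func<Task> Compensate);

public static class SagaOrchestrator
{
    // Runs the steps in order. If one throws, the steps that already
    // completed are compensated in reverse order.
    // Returns true only when every step succeeded.
    public static async Task<bool> RunAsync(IReadOnlyList<SagaStep> steps)
    {
        var completed = new Stack<SagaStep>();
        foreach (var step in steps)
        {
            try
            {
                await step.Execute();
                completed.Push(step);
            }
            catch (Exception)
            {
                // Undo trail: the most recent successful step is undone first.
                while (completed.Count > 0)
                    await completed.Pop().Compensate();
                return false;
            }
        }
        return true;
    }
}
```

For the order example you would pass three steps: reserve stock (undo: release stock), charge card (undo: refund), and book delivery (undo: cancel booking). If the charge throws, only the stock reservation has completed, so only that one is undone.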

11. Outbox — never lose a message

Here is a sneaky bug called the dual-write problem. Your service saves an order to the database, then publishes an "OrderPlaced" message to the bus. What if the save works but the publish fails (or the other way round)? Now the two are out of sync, and money or stock can go wrong.

The Outbox pattern fixes this. Inside the same database transaction that saves the order, you also write the message into an "outbox" table. Both succeed or both fail together. A separate background worker then reads the outbox and publishes the messages to the bus, marking each as sent.

Figure 5: The outbox saves the message in the same transaction as the data, then a worker publishes it reliably.

// Both the order and the outbox row save together, atomically.
public async Task PlaceOrderAsync(Order order)
{
    _db.Orders.Add(order);
    _db.OutboxMessages.Add(new OutboxMessage
    {
        Type = "OrderPlaced",
        Payload = JsonSerializer.Serialize(order),
        OccurredOn = DateTime.UtcNow
    });
    await _db.SaveChangesAsync(); // one transaction, no lost message
}

This gives at-least-once delivery: the message will arrive, but maybe more than once. That is exactly why the next pattern matters.
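The background worker's job can be sketched as a single polling pass. This is an in-memory illustration under stated assumptions: the list stands in for the outbox table, the publish delegate stands in for a real message bus client, and the Id and SentOn properties are additions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public sealed class OutboxMessage
{
    public Guid Id { get; init; } = Guid.NewGuid();
    public string Type { get; init; } = "";
    public string Payload { get; init; } = "";
    public DateTime OccurredOn { get; init; }
    public DateTime? SentOn { get; set; } // null means not yet published
}

public sealed class OutboxPublisher
{
    private readonly List<OutboxMessage> _outbox;        // stand-in for the outbox table
    private readonly Func<OutboxMessage, Task> _publish; // stand-in for the bus client

    public OutboxPublisher(List<OutboxMessage> outbox, Func<OutboxMessage, Task> publish)
    {
        _outbox = outbox;
        _publish = publish;
    }

    // One polling pass: publish unsent messages oldest first, marking each as sent.
    // Returns how many messages were published.
    public async Task<int> PublishPendingAsync()
    {
        var pending = _outbox
            .Where(m => m.SentOn is null)
            .OrderBy(m => m.OccurredOn)
            .ToList();

        int sent = 0;
        foreach (var message in pending)
        {
            await _publish(message);          // if this throws, the message stays unsent
            message.SentOn = DateTime.UtcNow; // a crash before this line means a re-publish later
            sent++;
        }
        return sent;
    }
}
```

Notice the gap between publishing and marking as sent. If the worker dies in that gap, the next pass publishes the same message again, which is where the "maybe more than once" comes from.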

12. Idempotency — safe to do twice

Because messages can arrive twice, and clients can retry, the same request might reach your service more than once. Idempotency means doing the same thing twice has the same effect as doing it once.

The usual trick is an idempotency key. The client sends a unique key with the request. Your service remembers keys it has already handled. If the same key comes again, it returns the first result instead of charging the card a second time.

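public async Task<Result> ChargeAsync(string idempotencyKey, decimal amount)
{
    // Have we seen this key before?
    var existing = await _db.Charges
        .FirstOrDefaultAsync(c => c.IdempotencyKey == idempotencyKey);
    if (existing is not null)
        return existing.Result; // safe replay, no double charge

    // Caveat: two requests with the same key arriving at the same moment can
    // both pass the check above. A unique index on IdempotencyKey (assumed here)
    // makes the second save fail, but the gateway would still be called twice,
    // so pass the key on to the payment provider too if it supports one.
    var result = await _gateway.ChargeAsync(amount);
    _db.Charges.Add(new Charge { IdempotencyKey = idempotencyKey, Result = result });
    await _db.SaveChangesAsync();
    return result;
}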

Idempotency, Outbox, and Retry form a powerful team. Retry keeps trying, Outbox makes sure nothing is lost, and Idempotency makes sure the extra tries do no harm.

How the patterns work together

No single pattern is enough on its own. Real systems combine them. A typical resilient call to another service stacks several layers: a timeout inside a retry, inside a circuit breaker, going through an ambassador.

Layers of a single resilient call

  • Timeout: give up if the call is too slow.
  • Retry: try transient failures a few times with backoff.
  • Circuit Breaker: stop calling a service that is clearly down.
  • Ambassador: a sidecar applies all of this uniformly.

A safe network call wraps several patterns together, from the inside out.

And a typical write path for data uses another stack: save with the Outbox, deliver with Saga steps, and protect each step with Idempotency. Pick the patterns your problem actually needs. Adding all twelve to a tiny app only buys you complexity.

Choosing wisely

A good architect does not chase patterns for fun. Each one adds moving parts, and moving parts can break. Use this rough guide.

If you need to... | Reach for | Watch out for
Survive a flaky network | Retry + Circuit Breaker + Timeout | Retrying unsafe writes
Keep data correct across services | Saga + Outbox + Idempotency | Forgetting the undo steps
Make reads very fast | Cache-Aside + CQRS | Stale or out-of-date data
Keep full history | Event Sourcing | Extra complexity everywhere
Tidy traffic and shared chores | API Gateway + Sidecar | The gateway becoming a bottleneck

Start small. A modular monolith with a few of these patterns is often better than a swarm of microservices using all twelve badly.

Quick recap

  • A distributed system is many small programs on different machines talking over a network. Any part can fail, slow down, or repeat a message.
  • Retry, Circuit Breaker, Timeout, and Bulkhead keep your app working when a service is flaky, slow, or down.
  • API Gateway gives clients one front door; Sidecar and Ambassador move shared chores like logging and resilience out of your service code.
  • Cache-Aside keeps hot data close for speed; always set an expiry and clear it on change.
  • CQRS splits writing from reading so each can be tuned; Event Sourcing stores every change so you keep the full story.
  • Saga breaks a big cross-service job into small steps with undo (compensation) when something fails.
  • Outbox saves the message in the same transaction as the data so nothing is lost; Idempotency makes repeated requests safe.
  • MediatR and MassTransit are now commercially licensed. These patterns are ideas, not libraries — you can build them in plain C#.
  • Pick only the patterns your problem needs. Each one adds complexity, so start small and grow.
