Retries and Resilience in .NET with Polly and Microsoft Resilience

Learn retries, timeouts, and circuit breakers in .NET using Polly v8 and Microsoft.Extensions.Http.Resilience, with simple examples a beginner can follow.

12 min read · Updated March 17, 2026

When the shopkeeper is busy

Imagine you go to your favourite tea stall in the evening. You ask for one chai. The shopkeeper is busy and does not hear you. What do you do? You do not walk away forever. You wait a moment and ask again. Maybe you ask a third time. But you also know when to stop. If the shop is closed and the shutter is down, asking ten more times is pointless. You go to the next stall.

This is exactly how a good program should talk to other services on the network. Sometimes a call fails for a tiny reason, like a slow Wi-Fi moment. Asking again works. But if the other service is truly down, asking again and again only wastes time and makes things worse.

The skill of knowing when to retry, how long to wait, and when to give up is called resilience. In .NET we get this skill for free using two tools: Polly and Microsoft.Extensions.Http.Resilience. This article will teach you both in simple steps.

What can go wrong on a network

When your app calls another service over the internet, many small things can fail. Most of these failures are temporary. The fancy name for a temporary failure is a transient fault.

[Figure: Common transient faults when one service calls another]

A transient fault is like a phone call that drops because you went under a bridge. Calling back works. A non-transient fault is like dialling a number that does not exist. Calling back will never help. Resilience patterns are mostly about handling the first kind well.

Here are the faults you will meet most often:

| Fault | What it looks like | Retry helps? |
| --- | --- | --- |
| Timeout | The call takes too long and never answers | Often yes |
| Connection dropped | Network blip, socket closed | Often yes |
| HTTP 503 | Service says "I am too busy right now" | Yes, after a wait |
| HTTP 429 | "You are sending too many requests" | Yes, after a wait |
| HTTP 404 | "That thing does not exist" | No, retrying is pointless |
| HTTP 400 | "Your request is wrong" | No, fix the request |

The big lesson: retry only the faults that have a chance of fixing themselves.
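To make the table concrete, here is a minimal sketch of the same decision as a plain C# function. The name IsWorthRetrying is made up for this example, and treating HTTP 408 and every 5xx code as retryable is an assumption that goes a little beyond the rows above.

```csharp
using System;
using System.Net;

// Hypothetical helper: true for status codes that may fix themselves.
static bool IsWorthRetrying(HttpStatusCode status) =>
    status == HttpStatusCode.RequestTimeout        // 408: the server gave up waiting
    || status == HttpStatusCode.TooManyRequests    // 429: slow down, then try again
    || (int)status >= 500;                         // 5xx, including 503

Console.WriteLine(IsWorthRetrying(HttpStatusCode.ServiceUnavailable));
Console.WriteLine(IsWorthRetrying(HttpStatusCode.NotFound));
```

A 404 or 400 falls through to false, so the caller can fail fast instead of asking again.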

Meet the three core patterns

Almost all resilience work comes down to three friendly patterns. Let us meet them one by one.

1. Retry

A retry simply tries the call again when it fails. But a naive retry can be dangerous. If you retry instantly, you might send a flood of calls all at once.

So good retries use two extra ideas:

  • Backoff: wait a little longer between each try. First wait 1 second, then 2, then 4. This is called exponential backoff.
  • Jitter: add a small random delay so that many clients do not all retry at the exact same moment. Without jitter, a thousand apps could retry together and slam the server.
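The two ideas can be sketched in a few lines of plain C#. This is only an illustration of "double the wait, then add a little randomness"; it is not the formula Polly uses internally, and the name NextDelay and the 250 ms jitter cap are assumptions made for the example.

```csharp
using System;

// Delay before retry number `retry` (0 for the first retry):
// baseDelay * 2^retry, plus a random extra of up to maxJitter.
static TimeSpan NextDelay(int retry, TimeSpan baseDelay, TimeSpan maxJitter, Random random)
{
    double backoffMs = baseDelay.TotalMilliseconds * Math.Pow(2, retry);
    double jitterMs = random.NextDouble() * maxJitter.TotalMilliseconds;
    return TimeSpan.FromMilliseconds(backoffMs + jitterMs);
}

// Print the waits for three retries: roughly 1s, 2s, and 4s, each plus jitter.
var rng = new Random();
for (int retry = 0; retry < 3; retry++)
{
    Console.WriteLine(
        NextDelay(retry, TimeSpan.FromSeconds(1), TimeSpan.FromMilliseconds(250), rng));
}
```

Because every client draws its own random extra, a thousand clients that failed together will not all come back in the same instant.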

How a retry with backoff behaves

  1. Call fails: the first attempt errors.
  2. Wait 1s + jitter: the backoff delay.
  3. Retry 1: try again.
  4. Wait 2s + jitter: a longer delay.
  5. Retry 2: the last try.

Each attempt waits a little longer, with a small random jitter added.

2. Timeout

A timeout says "if this call does not finish in X seconds, stop waiting." Without it, one stuck call can hold a thread forever and slowly freeze your whole app. There are two flavours:

  • Per-attempt timeout: limits each single try.
  • Total timeout: limits the whole operation, including all the retries added up.
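Here is a rough sketch of how the two flavours relate, written with plain cancellation tokens and no library. RunWithTimeoutsAsync is a hypothetical helper, it skips backoff to stay short, and it only works if the operation respects the token it is given.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Runs `operation` up to maxAttempts times. Each attempt is cancelled after
// perAttempt; the whole loop is cancelled after total.
static async Task<T> RunWithTimeoutsAsync<T>(
    Func<CancellationToken, Task<T>> operation,
    TimeSpan perAttempt,
    TimeSpan total,
    int maxAttempts)
{
    using var totalCts = new CancellationTokenSource(total);

    for (int attempt = 1; ; attempt++)
    {
        // Linked source: fires when either the total or the per-attempt limit is hit.
        using var attemptCts = CancellationTokenSource.CreateLinkedTokenSource(totalCts.Token);
        attemptCts.CancelAfter(perAttempt);

        try
        {
            return await operation(attemptCts.Token);
        }
        catch (OperationCanceledException)
            when (!totalCts.IsCancellationRequested && attempt < maxAttempts)
        {
            // Only this attempt ran out of time and budget remains: try again.
        }
    }
}
```

The per-attempt limit is reset on every loop, while the total limit keeps ticking from the first attempt. Once the total runs out, or the last attempt is used up, the cancellation exception reaches the caller.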

3. Circuit breaker

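A circuit breaker is the smartest of the three. In its normal Closed state it lets calls through and watches them. If too many fail in a short window, it trips to the Open state and rejects all calls immediately for a while. This gives the struggling service room to breathe. After a rest, it moves to Half-Open and lets one test call through. If that works, the breaker closes and normal traffic resumes. If it fails, the breaker opens again.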

[Figure: The three states of a circuit breaker (Closed, Open, Half-Open)]

Think of the circuit breaker like the trip switch in your home. When there is a short circuit, the switch cuts power to protect the house. You do not keep flipping it back instantly. You wait, fix the issue, then turn it on.
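If you like to see the trip switch as code, here is a deliberately tiny sketch of the Closed, Open, and Half-Open states. It is a teaching toy, not how Polly implements its breaker: SimpleBreaker is a made-up class, it counts consecutive failures instead of a failure ratio over a time window, it is not thread-safe, and it does not limit Half-Open to exactly one test call.

```csharp
using System;

public enum BreakerState { Closed, Open, HalfOpen }

public sealed class SimpleBreaker
{
    private readonly int _failureThreshold;
    private readonly TimeSpan _breakDuration;
    private int _failures;
    private DateTimeOffset _openedAt;

    public BreakerState State { get; private set; } = BreakerState.Closed;

    public SimpleBreaker(int failureThreshold, TimeSpan breakDuration)
    {
        _failureThreshold = failureThreshold;
        _breakDuration = breakDuration;
    }

    // Ask before each call. Returns false while the breaker is resting.
    public bool AllowCall(DateTimeOffset now)
    {
        if (State == BreakerState.Open && now - _openedAt >= _breakDuration)
            State = BreakerState.HalfOpen; // rest is over: allow a test call

        return State != BreakerState.Open;
    }

    public void RecordSuccess()
    {
        _failures = 0;
        State = BreakerState.Closed; // a good call resumes normal traffic
    }

    public void RecordFailure(DateTimeOffset now)
    {
        _failures++;
        if (State == BreakerState.HalfOpen || _failures >= _failureThreshold)
        {
            State = BreakerState.Open; // trip, or trip again after a failed test call
            _openedAt = now;
        }
    }
}
```

The caller checks AllowCall before each request and reports the result afterwards. Passing the current time in as a parameter keeps the sketch easy to test without waiting for real seconds to pass.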

Polly and Microsoft Resilience: who does what

There are two libraries you will hear about. They work together.

| Library | What it gives you | When to reach for it |
| --- | --- | --- |
| Polly | The core engine: retry, timeout, circuit breaker, fallback, rate limiter | Any code, not only HTTP |
| Microsoft.Extensions.Http.Resilience | A neat wrapper that plugs Polly into HttpClient and DI | When you make HTTP calls |

Polly v8 was built as a joint effort between the Polly maintainers and Microsoft. So when you use the Microsoft package, you are still using Polly underneath. The Microsoft package just saves you from wiring things by hand.

How the pieces stack

  1. HttpClient: you call SendAsync.
  2. Microsoft Resilience handler: wraps the call.
  3. Polly pipeline: retry, timeout, breaker.
  4. Network: the real request goes out.

Microsoft Resilience is a friendly layer on top of the Polly engine.

The easiest win: the standard resilience handler

If your problem is HTTP calls, the fastest and safest path is the standard resilience handler. It is one method call and it gives you a well-tuned pipeline that Microsoft built and tested.

First, add the package:

dotnet add package Microsoft.Extensions.Http.Resilience

Now wire it onto a typed HttpClient in your Program.cs:

using Microsoft.Extensions.Http.Resilience;
 
var builder = WebApplication.CreateBuilder(args);
 
builder.Services
    .AddHttpClient<WeatherClient>(client =>
    {
        client.BaseAddress = new Uri("https://api.example.com");
    })
    // This single line adds the full resilience pipeline.
    .AddStandardResilienceHandler();
 
var app = builder.Build();
app.Run();

That one line, AddStandardResilienceHandler(), gives you five strategies stacked in the right order:

  1. Rate limiter — caps how many calls go out at once.
  2. Total request timeout — an overall time limit for the whole operation.
  3. Retry — automatic retries with exponential backoff and jitter.
  4. Circuit breaker — trips when the failure ratio gets too high.
  5. Per-attempt timeout — a time limit for each single try.

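The defaults are sensible: three retries with exponential backoff and jitter, a 10-second per-attempt timeout inside a 30-second total timeout, a circuit breaker that watches a 30-second window and trips at a 10% failure ratio once it has seen at least 100 calls, and a predicate that treats HTTP 408, 429, and 5xx responses (plus HttpRequestException and Polly's timeout exception) as worth retrying. For many apps you can stop right here.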

Tuning the standard handler

The defaults are good, but sometimes you need to nudge them. You can pass options without rebuilding the whole pipeline.

builder.Services
    .AddHttpClient<WeatherClient>()
    .AddStandardResilienceHandler(options =>
    {
        // Give each single attempt 3 seconds.
        options.AttemptTimeout.Timeout = TimeSpan.FromSeconds(3);
 
        // Cap the whole operation, including retries, at 15 seconds.
        options.TotalRequestTimeout.Timeout = TimeSpan.FromSeconds(15);
 
        // Allow up to 4 retries after the first attempt (5 tries in total).
        options.Retry.MaxRetryAttempts = 4;
 
        // Trip the breaker when 30% of calls fail.
        options.CircuitBreaker.FailureRatio = 0.3;
    });

One rule the handler enforces for you: the total timeout must be longer than the per-attempt timeout, and the circuit breaker's sampling duration must be at least twice the per-attempt timeout. If your numbers do not make sense together, you get a clear validation error instead of a strange bug in production.

Building your own pipeline with Polly v8

Sometimes you want full control, or you need resilience for code that is not an HTTP call, like a database query or a message queue read. For that, you build a pipeline directly with Polly v8 and its ResiliencePipelineBuilder.

Here the order you add strategies is the order they wrap each other, from outside to inside.

using Polly;
using Polly.CircuitBreaker;
using Polly.Retry;
using Polly.Timeout;
 
ResiliencePipeline pipeline = new ResiliencePipelineBuilder()
    // Outer: cap the whole thing.
    .AddTimeout(TimeSpan.FromSeconds(15))
    // Then retry with backoff and jitter.
    .AddRetry(new RetryStrategyOptions
    {
        MaxRetryAttempts = 3,
        Delay = TimeSpan.FromSeconds(1),
        BackoffType = DelayBackoffType.Exponential,
        UseJitter = true,
        // Retry network errors, plus attempts cut short by the inner timeout
        // (Polly reports those as TimeoutRejectedException).
        ShouldHandle = new PredicateBuilder()
            .Handle<HttpRequestException>()
            .Handle<TimeoutRejectedException>()
    })
    // Then the circuit breaker.
    .AddCircuitBreaker(new CircuitBreakerStrategyOptions
    {
        FailureRatio = 0.5,
        SamplingDuration = TimeSpan.FromSeconds(10),
        MinimumThroughput = 5,
        BreakDuration = TimeSpan.FromSeconds(30)
    })
    // Inner: each single attempt gets 4 seconds.
    .AddTimeout(TimeSpan.FromSeconds(4))
    .Build();
 
// Use it to run any risky piece of code.
await pipeline.ExecuteAsync(async token =>
{
    await CallTheDatabaseAsync(token);
});

Read that builder from top to bottom and you get the nesting: total timeout wraps retry wraps circuit breaker wraps per-attempt timeout wraps your actual code. This is the same shape the standard HTTP handler uses, minus its outermost rate limiter, just written by hand.

[Figure: How the strategies nest from outer to inner]

Why the order is not just a detail

Picture the difference if you swapped retry and the per-attempt timeout.

  • Retry outside, timeout inside (good): every single try gets its own fresh time limit. A slow try is cut short, and the retry starts a clean attempt.
  • Timeout outside, retry inside: the whole set of retries must finish inside one timeout. If your first try eats most of the time, later retries may never get a fair chance.

Both can be valid choices, but you must pick on purpose. The default Microsoft order is a safe starting point for most teams.
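The two orderings look almost the same in code, which is why they are easy to mix up. Below is a small sketch of both, assuming the Polly v8 package is installed. The 2-second limit and the variable names are picked only for illustration.

```csharp
using System;
using Polly;
using Polly.Retry;
using Polly.Timeout;

// Retry outside, timeout inside: every attempt gets its own 2-second limit.
ResiliencePipeline perAttempt = new ResiliencePipelineBuilder()
    .AddRetry(new RetryStrategyOptions
    {
        MaxRetryAttempts = 3,
        // The inner timeout surfaces as TimeoutRejectedException, so retry on it.
        ShouldHandle = new PredicateBuilder().Handle<TimeoutRejectedException>()
    })
    .AddTimeout(TimeSpan.FromSeconds(2))
    .Build();

// Timeout outside, retry inside: all attempts share one 2-second budget.
ResiliencePipeline shared = new ResiliencePipelineBuilder()
    .AddTimeout(TimeSpan.FromSeconds(2))
    .AddRetry(new RetryStrategyOptions { MaxRetryAttempts = 3 })
    .Build();
```

With perAttempt, a call that hangs is cut off after 2 seconds and tried again from scratch. With shared, the clock starts once, and whatever retries fit inside those 2 seconds are all you get.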

Registering a named pipeline for reuse

If many parts of your app need the same pipeline, register it once in DI and ask for it by name. This keeps your settings in one place.

builder.Services.AddResiliencePipeline("db-pipeline", pipeline =>
{
    pipeline
        .AddRetry(new RetryStrategyOptions
        {
            MaxRetryAttempts = 3,
            BackoffType = DelayBackoffType.Exponential,
            UseJitter = true
        })
        .AddTimeout(TimeSpan.FromSeconds(10));
});
 
// Later, inject ResiliencePipelineProvider<string> (namespace Polly.Registry) and fetch it.
// IOrderRepository is a placeholder for your own data-access type.
public class OrderService(
    ResiliencePipelineProvider<string> provider,
    IOrderRepository repository)
{
    public async Task SaveAsync(Order order, CancellationToken ct)
    {
        var pipeline = provider.GetPipeline("db-pipeline");
        await pipeline.ExecuteAsync(async token =>
            await repository.SaveAsync(order, token), ct);
    }
}

A picture of one resilient call

Let us trace a single call through the pipeline so the flow feels real.

One call through the pipeline

  1. Start: the total timer begins.
  2. Attempt 1 fails: 503 from the server.
  3. Wait + jitter: the backoff delay.
  4. Attempt 2 succeeds: 200 from the server.
  5. Return: the caller gets the data.

A transient failure is retried, then succeeds, all within the total timeout.

And here is the same idea as a sequence, showing how the breaker would step in if failures kept piling up.

[Figure: A retry sequence where the breaker may trip on repeated failure]

Watching what your pipeline does

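A resilience pipeline that you cannot see is a little scary. You want to know when retries happen and when the breaker trips. Polly raises telemetry that flows into the standard .NET logging and metrics system, so tools like OpenTelemetry, Seq, or Jaeger can show you the story. Pipelines registered through DI and the HTTP resilience handlers have this switched on for you; a pipeline built by hand with ResiliencePipelineBuilder needs a call to ConfigureTelemetry from the Polly.Extensions package.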

You can also hook callbacks for learning or alerting:

.AddRetry(new RetryStrategyOptions
{
    MaxRetryAttempts = 3,
    OnRetry = args =>
    {
        // AttemptNumber is zero-based, so add 1 for a human-friendly count.
        Console.WriteLine(
            $"Retry {args.AttemptNumber + 1} after {args.RetryDelay}");
        return default;
    }
})

In production, prefer real logging over Console.WriteLine, and watch the breaker state. A breaker that trips often is a signal that a downstream service is unwell, and that is useful information for your whole team.
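Watching the breaker state can start with the callbacks on the circuit breaker options. The sketch below assumes the Polly v8 package and uses Console.WriteLine only to stay short; in a real app you would swap in your logger. The numbers are example values.

```csharp
using System;
using Polly;
using Polly.CircuitBreaker;

ResiliencePipeline watched = new ResiliencePipelineBuilder()
    .AddCircuitBreaker(new CircuitBreakerStrategyOptions
    {
        FailureRatio = 0.5,
        SamplingDuration = TimeSpan.FromSeconds(10),
        MinimumThroughput = 5,
        BreakDuration = TimeSpan.FromSeconds(30),
        OnOpened = args =>
        {
            Console.WriteLine($"Breaker opened for {args.BreakDuration}");
            return default;
        },
        OnHalfOpened = _ =>
        {
            Console.WriteLine("Breaker half-open: letting a test call through");
            return default;
        },
        OnClosed = _ =>
        {
            Console.WriteLine("Breaker closed: normal traffic resumed");
            return default;
        }
    })
    .Build();
```

An "opened" line that shows up again and again is the signal worth alerting on, because it means the downstream service keeps failing.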

Common mistakes to avoid

A few traps catch almost everyone the first time:

  • Retrying non-transient errors. Retrying an HTTP 400 or 404 just wastes time. Use a ShouldHandle predicate to retry only the right faults.
  • Retrying without a circuit breaker. Pure retries against a dying service create a retry storm and can take the service fully down. Always pair them.
  • Forgetting the timeout. Without a timeout, one frozen call can slowly starve your thread pool.
  • No jitter. Many clients retrying in lockstep create sharp spikes. Jitter spreads them out.
  • Retrying non-idempotent writes blindly. Retrying a "charge the card" call could charge twice. Make such calls idempotent first, or do not retry them.
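For the last trap, the HTTP resilience package offers a shortcut. The sketch below assumes a recent version of Microsoft.Extensions.Http.Resilience that ships the DisableForUnsafeHttpMethods helper; if your version does not have it, treat this as an idea rather than a recipe. The client name "payments" is made up.

```csharp
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Http.Resilience;

var services = new ServiceCollection();

services
    .AddHttpClient("payments") // hypothetical client name
    .AddStandardResilienceHandler(options =>
    {
        // Assumption: a package version that includes this helper.
        // It switches retries off for methods such as POST, PUT, PATCH, and DELETE,
        // so a "charge the card" request is never sent twice by the retry strategy.
        options.Retry.DisableForUnsafeHttpMethods();
    });
```

Reads with GET still get retried as usual. The timeouts and the circuit breaker keep protecting every request, whatever its method.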

When should you use which approach

| Situation | Best choice |
| --- | --- |
| Calling another HTTP service | Standard resilience handler on the typed client |
| Database or queue code, not HTTP | Hand-built Polly pipeline |
| Same policy reused in many places | Named pipeline registered in DI |
| Very special, tuned HTTP behaviour | AddResilienceHandler with custom strategies |

Start simple. For most HTTP work, the standard handler is the right answer, and you only drop down to custom pipelines when you truly need the control.
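For the last row of the table, here is a rough sketch of what "AddResilienceHandler with custom strategies" can look like. It assumes the Microsoft.Extensions.Http.Resilience package. The client name, the pipeline name, and all the numbers are placeholders chosen for this example.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Http.Resilience;
using Polly;

var services = new ServiceCollection();

services
    .AddHttpClient("tuned-client") // hypothetical client name
    .AddResilienceHandler("tuned-pipeline", pipeline =>
    {
        // Outer: cap the whole operation.
        pipeline.AddTimeout(TimeSpan.FromSeconds(15));

        // HTTP-aware retry: its default predicate already understands
        // transient HTTP status codes.
        pipeline.AddRetry(new HttpRetryStrategyOptions
        {
            MaxRetryAttempts = 2,
            BackoffType = DelayBackoffType.Exponential,
            UseJitter = true
        });

        pipeline.AddCircuitBreaker(new HttpCircuitBreakerStrategyOptions
        {
            FailureRatio = 0.3,
            SamplingDuration = TimeSpan.FromSeconds(10),
            MinimumThroughput = 5
        });

        // Inner: each single attempt gets 3 seconds.
        pipeline.AddTimeout(TimeSpan.FromSeconds(3));
    });
```

You pick every strategy and its order yourself here, so nothing is added unless you add it. That is the control you gain, and also the safety net you give up compared with the standard handler.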

Quick recap

  • A transient fault is a temporary failure that often fixes itself. Resilience patterns mostly handle these.
  • Retry tries again, but should use exponential backoff and jitter so you do not flood the server.
  • Timeout stops a slow call from freezing your app. Use both a per-attempt and a total timeout.
  • Circuit breaker trips when too many calls fail, giving a sick service time to recover. It has three states: Closed, Open, and Half-Open.
  • Polly is the engine; Microsoft.Extensions.Http.Resilience is the friendly wrapper for HttpClient.
  • For HTTP, AddStandardResilienceHandler() gives you a strong, well-ordered pipeline in one line.
  • For non-HTTP code, build your own pipeline with ResiliencePipelineBuilder, and remember that order matters.
  • Only retry the faults that can actually recover, always pair retries with a breaker and a timeout, and watch your telemetry.
