Retries and Resilience in .NET with Polly and Microsoft Resilience
Learn retries, timeouts, and circuit breakers in .NET using Polly v8 and Microsoft.Extensions.Http.Resilience, with simple examples a beginner can follow.
When the shopkeeper is busy
Imagine you go to your favourite tea stall in the evening. You ask for one chai. The shopkeeper is busy and does not hear you. What do you do? You do not walk away forever. You wait a moment and ask again. Maybe you ask a third time. But you also know when to stop. If the shop is closed and the shutter is down, asking ten more times is pointless. You go to the next stall.
This is exactly how a good program should talk to other services on the network. Sometimes a call fails for a tiny reason, like a slow Wi-Fi moment, and asking again usually works. But if the other service is truly down, asking again and again only wastes time and makes things worse.
The skill of knowing when to retry, how long to wait, and when to give up is called resilience. In .NET we get this skill for free using two tools: Polly and Microsoft.Extensions.Http.Resilience. This article will teach you both in simple steps.
What can go wrong on a network
When your app calls another service over the internet, many small things can fail. Most of these failures are temporary. The fancy name for a temporary failure is a transient fault.
A transient fault is like a phone call that drops because you went under a bridge. Calling back works. A non-transient fault is like dialling a number that does not exist. Calling back will never help. Resilience patterns are mostly about handling the first kind well.
Here are the faults you will meet most often:
| Fault | What it looks like | Retry helps? |
|---|---|---|
| Timeout | The call takes too long and never answers | Often yes |
| Connection dropped | Network blip, socket closed | Often yes |
| HTTP 503 | Service says "I am too busy right now" | Yes, after a wait |
| HTTP 429 | "You are sending too many requests" | Yes, after a wait |
| HTTP 404 | "That thing does not exist" | No, retrying is pointless |
| HTTP 400 | "Your request is wrong" | No, fix the request |
The big lesson: retry only the faults that have a chance of fixing themselves.
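To make the "Retry helps?" column concrete, here is a minimal C# sketch of that decision. It assumes you treat 408, 429, and every 5xx status as transient, plus two failures that never produce a status code: an `HttpRequestException` (a dropped connection) and an `HttpClient` timeout, which .NET 5 and later report as a `TaskCanceledException` wrapping a `TimeoutException`. `IsTransient` and `IsTransientException` are hypothetical helper names; with Polly you would express the same rule in a `ShouldHandle` predicate.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper: is this HTTP status worth retrying?
// Assumption: 408, 429, and every 5xx count as transient; other 4xx codes do not.
static bool IsTransient(HttpStatusCode status)
{
    int code = (int)status;
    return status == HttpStatusCode.RequestTimeout   // 408
        || status == HttpStatusCode.TooManyRequests  // 429
        || code is >= 500 and <= 599;                // 5xx, e.g. 503 Service Unavailable
}

// Hypothetical helper: failures that never produced a status code at all.
static bool IsTransientException(Exception error) =>
    error is HttpRequestException                                        // dropped connection, closed socket
    || error is TaskCanceledException { InnerException: TimeoutException }; // HttpClient timeout

foreach (HttpStatusCode status in new[]
{
    HttpStatusCode.ServiceUnavailable, HttpStatusCode.TooManyRequests,
    HttpStatusCode.NotFound, HttpStatusCode.BadRequest
})
{
    Console.WriteLine($"{(int)status} {status}: retry? {IsTransient(status)}");
}
```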
Meet the three core patterns
Almost all resilience work comes down to three friendly patterns. Let us meet them one by one.
1. Retry
A retry simply tries the call again when it fails. But a naive retry can be dangerous. If you retry instantly, you might send a flood of calls all at once.
So good retries use two extra ideas:
- Backoff: wait a little longer between each try. First wait 1 second, then 2, then 4. This is called exponential backoff.
- Jitter: add a small random delay so that many clients do not all retry at the exact same moment. Without jitter, a thousand apps could retry together and slam the server.
How a retry with backoff behaves: the call fails (first attempt errors) → wait (backoff delay) → retry 1 (try again) → wait more (longer delay) → retry 2 (last try).
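Backoff and jitter are just arithmetic on the delay. The sketch below is plain C#, not Polly's code: it doubles a base delay on each attempt (attempt counted from 0, so 1 s, 2 s, 4 s) and then scales it by a random factor between 0.75 and 1.25. The ±25% jitter range and the helper names `Backoff` and `WithJitter` are assumptions for illustration; Polly's `UseJitter` uses its own formula, but the goal of spreading clients out is the same.

```csharp
using System;

// Exponential backoff: baseDelay * 2^attempt, with attempt counted from 0.
static TimeSpan Backoff(int attempt, TimeSpan baseDelay) =>
    TimeSpan.FromTicks(baseDelay.Ticks * (1L << attempt));

// Jitter: scale the delay by a random factor between 0.75 and 1.25 (±25%).
static TimeSpan WithJitter(TimeSpan delay, Random random) =>
    TimeSpan.FromTicks((long)(delay.Ticks * (0.75 + random.NextDouble() * 0.5)));

var random = new Random();
for (int attempt = 0; attempt < 3; attempt++)
{
    TimeSpan plain = Backoff(attempt, TimeSpan.FromSeconds(1));
    TimeSpan jittered = WithJitter(plain, random);
    Console.WriteLine($"Retry {attempt + 1}: backoff {plain.TotalSeconds}s, with jitter {jittered.TotalSeconds:F2}s");
}
```

Two clients that fail at the same moment now wait slightly different amounts of time, so their retries no longer land on the server together.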
2. Timeout
A timeout says "if this call does not finish in X seconds, stop waiting." Without it, one stuck call can hang indefinitely while holding a connection and other resources, and a pile of stuck calls can slowly freeze your whole app. There are two flavours:
- Per-attempt timeout: limits each single try.
- Total timeout: limits the whole operation, including all the retries added up.
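Under the hood, a timeout in .NET is usually a `CancellationTokenSource` that cancels itself after a delay. This is a minimal hand-rolled sketch, not Polly's implementation; `WithTimeoutAsync` is a hypothetical name. It is cooperative: the work must pass the token on (here to `Task.Delay`), and code that ignores the token keeps running in the background even after the caller has stopped waiting.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Run `work`, but stop waiting after `timeout` by cancelling the token it receives.
// If the time limit (not the caller) caused the cancellation, report a TimeoutException.
static async Task<T> WithTimeoutAsync<T>(
    Func<CancellationToken, Task<T>> work, TimeSpan timeout, CancellationToken callerToken)
{
    using var cts = CancellationTokenSource.CreateLinkedTokenSource(callerToken);
    cts.CancelAfter(timeout);
    try
    {
        return await work(cts.Token);
    }
    catch (OperationCanceledException) when (!callerToken.IsCancellationRequested)
    {
        throw new TimeoutException($"Gave up after {timeout.TotalMilliseconds} ms.");
    }
}

// Demo: a fake call that would take 2 seconds, cut off after 200 ms.
try
{
    await WithTimeoutAsync(async token =>
    {
        await Task.Delay(TimeSpan.FromSeconds(2), token);
        return "done";
    }, TimeSpan.FromMilliseconds(200), CancellationToken.None);
}
catch (TimeoutException ex)
{
    Console.WriteLine(ex.Message);
}
```

Polly's `AddTimeout` plays the same role and throws a `TimeoutRejectedException` when the limit is hit.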
3. Circuit breaker
A circuit breaker is the smartest of the three. It watches your calls. If too many fail in a short window, it trips and stops all calls for a while. This gives the struggling service room to breathe. After a rest, it lets one test call through. If that works, normal traffic resumes.
Think of the circuit breaker like the trip switch in your home. When there is a short circuit, the switch cuts power to protect the house. You do not keep flipping it back instantly. You wait, fix the issue, then turn it on.
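The three states are easier to see in code. This is a deliberately simplified breaker in plain C#, not Polly's implementation: it trips after a fixed number of failures in a row, whereas Polly's breaker trips on a failure ratio measured over a sampling window once a minimum number of calls has been seen. Once the break duration has passed, it moves to half-open and lets a trial call decide whether to close again. Passing the current time in as `now` keeps it testable without real waiting. All names and numbers are hypothetical.

```csharp
using System;

// Illustrative settings, not Polly's defaults.
const int FailureThreshold = 3;                      // trip after 3 failures in a row
TimeSpan breakDuration = TimeSpan.FromSeconds(30);   // how long to stay open

string state = "Closed";                             // Closed, Open, or HalfOpen
int consecutiveFailures = 0;
DateTime openedAt = DateTime.MinValue;

// Call before each request: may it go out at time `now`?
bool AllowCall(DateTime now)
{
    if (state == "Open" && now - openedAt >= breakDuration)
        state = "HalfOpen";       // the rest is over: let a trial call through
    return state != "Open";       // while open, fail fast without calling
}

// Call after each request that was allowed, reporting how it went.
void RecordResult(bool success, DateTime now)
{
    if (success)
    {
        state = "Closed";         // healthy again: normal traffic resumes
        consecutiveFailures = 0;
        return;
    }
    consecutiveFailures++;
    if (state == "HalfOpen" || consecutiveFailures >= FailureThreshold)
    {
        state = "Open";           // trip: block calls for breakDuration
        openedAt = now;
    }
}

// Demo: four calls against a dead service, then a successful trial call later.
var t0 = new DateTime(2024, 1, 1, 12, 0, 0);
for (int i = 1; i <= 4; i++)
{
    if (!AllowCall(t0)) { Console.WriteLine($"Call {i}: rejected, breaker {state}"); continue; }
    RecordResult(false, t0);
    Console.WriteLine($"Call {i}: failed, breaker {state}");
}
if (AllowCall(t0.AddSeconds(30))) RecordResult(true, t0.AddSeconds(30));
Console.WriteLine($"After the trial call: breaker {state}");
```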
Polly and Microsoft Resilience: who does what
There are two libraries you will hear about. They work together.
| Library | What it gives you | When to reach for it |
|---|---|---|
| Polly | The core engine: retry, timeout, circuit breaker, fallback, rate limiter | Any code, not only HTTP |
| Microsoft.Extensions.Http.Resilience | A neat wrapper that plugs Polly into HttpClient and DI | When you make HTTP calls |
Polly v8 was built as a joint effort between the Polly maintainers and Microsoft. So when you use the Microsoft package, you are still using Polly underneath. The Microsoft package just saves you from wiring things by hand.
How the pieces stack: HttpClient (you call SendAsync) → resilience handler (wraps the call) → Polly pipeline (retry, timeout, breaker) → network (the real request).
The easiest win: the standard resilience handler
If your problem is HTTP calls, the fastest and safest path is the standard resilience handler. It is one method call and it gives you a well-tuned pipeline that Microsoft built and tested.
First, add the package:
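```shell
dotnet add package Microsoft.Extensions.Http.Resilience
```

Now wire it onto a typed HttpClient in your Program.cs:

```csharp
using Microsoft.Extensions.Http.Resilience;

var builder = WebApplication.CreateBuilder(args);

builder.Services
    .AddHttpClient<WeatherClient>(client =>
    {
        client.BaseAddress = new Uri("https://api.example.com");
    })
    // This single line adds the full resilience pipeline.
    .AddStandardResilienceHandler();

var app = builder.Build();
app.Run();

// Hypothetical typed client: IHttpClientFactory injects the configured HttpClient.
public class WeatherClient(HttpClient http)
{
    // Hypothetical endpoint, for illustration only.
    public Task<string> GetForecastAsync(CancellationToken ct = default) =>
        http.GetStringAsync("/forecast", ct);
}
```

That one line, AddStandardResilienceHandler(), gives you five strategies stacked in the right order: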
- Rate limiter — caps how many calls go out at once.
- Total request timeout — an overall time limit for the whole operation.
- Retry — automatic retries with exponential backoff and jitter.
- Circuit breaker — trips when the failure ratio gets too high.
- Per-attempt timeout — a time limit for each single try.
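The defaults are sensible. The rate limiter allows up to 1,000 concurrent requests with no queue. The retry makes up to 3 retries with exponential backoff starting at 2 seconds, plus jitter. Each attempt gets 10 seconds and the whole operation gets 30 seconds. The circuit breaker samples a 30-second window and, once it has seen at least 100 calls, breaks for 5 seconds if 10% or more of them failed. The retry predicate treats HttpRequestException, timeouts, and HTTP 408, 429, and 5xx responses as worth retrying. For many apps you can stop right here.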
Tuning the standard handler
The defaults are good, but sometimes you need to nudge them. You can pass options without rebuilding the whole pipeline.
```csharp
builder.Services
    .AddHttpClient<WeatherClient>()
    .AddStandardResilienceHandler(options =>
    {
        // Give each single attempt 3 seconds.
        options.AttemptTimeout.Timeout = TimeSpan.FromSeconds(3);

        // Cap the whole operation, including retries, at 15 seconds.
        options.TotalRequestTimeout.Timeout = TimeSpan.FromSeconds(15);

        // Retry up to 4 times after the first try (at most 5 attempts in total).
        options.Retry.MaxRetryAttempts = 4;

        // Trip the breaker when 30% of sampled calls fail.
        options.CircuitBreaker.FailureRatio = 0.3;
    });
```

Two rules the handler enforces for you: the total timeout must be larger than the per-attempt timeout, and the circuit breaker's sampling duration must be at least twice the per-attempt timeout. If your numbers do not make sense together, you get a clear options validation error instead of a strange bug in production.
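A quick calculation shows why the total timeout matters. Microsoft's documentation describes two rules the standard handler validates: the total timeout must be greater than the attempt timeout, and the circuit breaker's sampling duration must be at least twice the attempt timeout. The sketch below checks those rules and computes how long the tuned settings above could run if nothing capped them, assuming the retry strategy's default 2-second base delay, exponential growth, and no jitter. `WorstCase` and `TimeoutsAreConsistent` are hypothetical helpers, not library APIs.

```csharp
using System;

// Worst case for one logical call if every attempt runs until its timeout.
// Assumes exponential backoff of baseDelay * 2^n between attempts and ignores jitter.
static TimeSpan WorstCase(int maxRetries, TimeSpan attemptTimeout, TimeSpan baseDelay)
{
    TimeSpan total = attemptTimeout;                               // the first attempt
    for (int n = 0; n < maxRetries; n++)
    {
        total += TimeSpan.FromTicks(baseDelay.Ticks * (1L << n));  // backoff before the retry
        total += attemptTimeout;                                   // the retry itself
    }
    return total;
}

// The two consistency rules described in the Microsoft docs.
static bool TimeoutsAreConsistent(TimeSpan total, TimeSpan attempt, TimeSpan sampling) =>
    total > attempt && sampling >= attempt * 2;

// Tuned values from above, plus an assumed 2 s base delay and the 30 s default sampling window.
TimeSpan worst = WorstCase(4, TimeSpan.FromSeconds(3), TimeSpan.FromSeconds(2));
Console.WriteLine($"Worst case without a total cap: {worst.TotalSeconds}s");
Console.WriteLine("Settings consistent: " +
    TimeoutsAreConsistent(TimeSpan.FromSeconds(15), TimeSpan.FromSeconds(3), TimeSpan.FromSeconds(30)));
```

With 4 retries, 3-second attempts, and backoff of 2, 4, 8, and 16 seconds, the worst case adds up to 45 seconds, so the 15-second total timeout ends the operation long before the last retry could run.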
Building your own pipeline with Polly v8
Sometimes you want full control, or you need resilience for code that is not an HTTP call, like a database query or a message queue read. For that, you build a pipeline directly with Polly v8 and its ResiliencePipelineBuilder.
Here the order you add strategies is the order they wrap each other, from outside to inside.
```csharp
using Polly;
using Polly.CircuitBreaker;
using Polly.Retry;
using Polly.Timeout;

ResiliencePipeline pipeline = new ResiliencePipelineBuilder()
    // Outer: cap the whole thing.
    .AddTimeout(TimeSpan.FromSeconds(15))
    // Then retry with backoff and jitter.
    .AddRetry(new RetryStrategyOptions
    {
        MaxRetryAttempts = 3,
        Delay = TimeSpan.FromSeconds(1),
        BackoffType = DelayBackoffType.Exponential,
        UseJitter = true,
        // Retry the transient exception your own code throws (swap in your type here),
        // plus TimeoutRejectedException from the inner per-attempt timeout.
        ShouldHandle = new PredicateBuilder()
            .Handle<HttpRequestException>()
            .Handle<TimeoutRejectedException>()
    })
    // Then the circuit breaker.
    .AddCircuitBreaker(new CircuitBreakerStrategyOptions
    {
        FailureRatio = 0.5,
        SamplingDuration = TimeSpan.FromSeconds(10),
        MinimumThroughput = 5,
        BreakDuration = TimeSpan.FromSeconds(30)
    })
    // Inner: each single attempt gets 4 seconds.
    .AddTimeout(TimeSpan.FromSeconds(4))
    .Build();

// Use it to run any risky piece of code.
// CallTheDatabaseAsync is a placeholder for your own data-access call.
await pipeline.ExecuteAsync(async token =>
{
    await CallTheDatabaseAsync(token);
});
```

Read that builder from top to bottom and you get the nesting: total timeout wraps retry wraps circuit breaker wraps per-attempt timeout wraps your actual code. This is the same shape the standard HTTP handler uses, minus its outer rate limiter, just written by hand.
Why the order is not just a detail
Picture the difference if you swapped retry and the per-attempt timeout.
- Retry outside, timeout inside (good): every single try gets its own fresh time limit. A slow try is cut short, and the retry starts a clean attempt.
- Timeout outside, retry inside: the whole set of retries must finish inside one timeout. If your first try eats most of the time, later retries may never get a fair chance.
Both can be valid choices, but you must pick on purpose. The default Microsoft order is a safe starting point for most teams.
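The difference is easy to see in a small experiment. This sketch is plain C# with `CancellationTokenSource`, not Polly, and every name in it is hypothetical: a fake call hangs on its first try and answers quickly after that. With a fresh 300 ms limit per attempt inside the retry loop, the second try succeeds. With one shared 300 ms budget around the whole loop, the first try spends it all and no retry ever runs.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Retry `work` up to maxRetries extra times. Each attempt gets its own fresh
// time limit, i.e. the per-attempt timeout sits INSIDE the retry loop.
static async Task<T> RetryAsync<T>(
    Func<CancellationToken, Task<T>> work, int maxRetries, TimeSpan perAttempt, CancellationToken outer)
{
    for (int attempt = 0; ; attempt++)
    {
        using var attemptCts = CancellationTokenSource.CreateLinkedTokenSource(outer);
        attemptCts.CancelAfter(perAttempt);
        try
        {
            return await work(attemptCts.Token);
        }
        catch (OperationCanceledException) when (!outer.IsCancellationRequested && attempt < maxRetries)
        {
            // Only this attempt ran out of time: loop round for a clean retry.
        }
    }
}

int calls = 0;

// Hypothetical flaky call: the first try hangs for 2 s, later tries answer in 50 ms.
async Task<string> SlowThenFastAsync(CancellationToken token)
{
    calls++;
    await Task.Delay(calls == 1 ? 2000 : 50, token);
    return $"ok after {calls} calls";
}

// Retry outside, timeout inside: the slow first try is cut at 300 ms and the retry succeeds.
string result = await RetryAsync(t => SlowThenFastAsync(t), 2, TimeSpan.FromMilliseconds(300), CancellationToken.None);
Console.WriteLine(result);

// Timeout outside, retry inside: one shared 300 ms budget for every attempt.
calls = 0;
using var shared = new CancellationTokenSource(TimeSpan.FromMilliseconds(300));
try
{
    await RetryAsync(t => SlowThenFastAsync(t), 2, TimeSpan.FromSeconds(10), shared.Token);
}
catch (OperationCanceledException)
{
    Console.WriteLine($"Gave up after {calls} call(s): the first try used the whole budget.");
}
```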
Registering a named pipeline for reuse
If many parts of your app need the same pipeline, register it once in DI and ask for it by name. This keeps your settings in one place.
```csharp
using Polly;
using Polly.Registry;
using Polly.Retry;

// AddResiliencePipeline comes from the Polly.Extensions package.
builder.Services.AddResiliencePipeline("db-pipeline", pipeline =>
{
    pipeline
        .AddRetry(new RetryStrategyOptions
        {
            MaxRetryAttempts = 3,
            BackoffType = DelayBackoffType.Exponential,
            UseJitter = true
        })
        .AddTimeout(TimeSpan.FromSeconds(10));
});

// Later, inject ResiliencePipelineProvider<string> and fetch the pipeline by name.
// IOrderRepository and Order are placeholders for your own types.
public class OrderService(
    ResiliencePipelineProvider<string> provider,
    IOrderRepository repository)
{
    public async Task SaveAsync(Order order, CancellationToken ct)
    {
        ResiliencePipeline pipeline = provider.GetPipeline("db-pipeline");
        await pipeline.ExecuteAsync(
            async token => await repository.SaveAsync(order, token), ct);
    }
}
```

A picture of one resilient call
Let us trace a single call through the pipeline so the flow feels real.
One call through the pipeline: start (the total timer begins) → attempt 1 (503 from the server) → wait (backoff delay) → attempt 2 (200 success) → return (the caller gets the data).
If failures kept piling up instead, the circuit breaker would eventually trip, and later calls would fail fast without reaching the server until the break duration had passed.
Watching what your pipeline does
A resilience pipeline that you cannot see is a little scary. You want to know when retries happen and when the breaker trips. When pipelines are registered through DI (for example with AddStandardResilienceHandler or AddResiliencePipeline), Polly reports these events through the standard .NET logging and metrics APIs, so OpenTelemetry and tools such as Seq can show you the story.
You can also hook callbacks for learning or alerting:
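```csharp
using Polly;
using Polly.Retry;

ResiliencePipeline pipeline = new ResiliencePipelineBuilder()
    .AddRetry(new RetryStrategyOptions
    {
        MaxRetryAttempts = 3,
        OnRetry = args =>
        {
            // AttemptNumber is zero-based, so add 1 for a human-friendly count.
            Console.WriteLine(
                $"Retry {args.AttemptNumber + 1} after {args.RetryDelay}: " +
                $"{args.Outcome.Exception?.Message}");
            return default; // OnRetry returns a ValueTask; default means "done".
        }
    })
    .Build();
```

In production, prefer real logging over Console.WriteLine, and watch the breaker state. A breaker that trips often is a signal that a downstream service is unwell, and that is useful information for your whole team.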
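To watch the breaker from code, Polly's `CircuitBreakerStrategyOptions` also accept `OnOpened`, `OnHalfOpened`, and `OnClosed` callbacks, and a `CircuitBreakerStateProvider` lets you read the current state. A minimal runnable sketch, assuming the Polly.Core package (`dotnet add package Polly.Core`); the tiny thresholds and the always-failing call are illustrative only, chosen so the breaker trips after two calls.

```csharp
using System;
using Polly;
using Polly.CircuitBreaker;

// Lets us read the breaker's current state from outside the pipeline.
var stateProvider = new CircuitBreakerStateProvider();

ResiliencePipeline pipeline = new ResiliencePipelineBuilder()
    .AddCircuitBreaker(new CircuitBreakerStrategyOptions
    {
        FailureRatio = 0.5,                  // trip when half the sampled calls fail...
        MinimumThroughput = 2,               // ...once at least 2 calls were sampled
        SamplingDuration = TimeSpan.FromSeconds(10),
        BreakDuration = TimeSpan.FromSeconds(30),
        StateProvider = stateProvider,
        OnOpened = args =>
        {
            Console.WriteLine($"Breaker opened for {args.BreakDuration.TotalSeconds}s");
            return default;
        },
        OnHalfOpened = args =>
        {
            Console.WriteLine("Breaker half-open: letting a trial call through");
            return default;
        },
        OnClosed = args =>
        {
            Console.WriteLine("Breaker closed: normal traffic resumes");
            return default;
        }
    })
    .Build();

for (int i = 1; i <= 3; i++)
{
    try
    {
        // Simulated downstream failure on every call.
        pipeline.Execute(() => throw new InvalidOperationException("downstream failed"));
    }
    catch (BrokenCircuitException)
    {
        Console.WriteLine($"Call {i}: rejected without calling, breaker is {stateProvider.CircuitState}");
    }
    catch (InvalidOperationException)
    {
        Console.WriteLine($"Call {i}: failed, breaker is {stateProvider.CircuitState}");
    }
}
```

The third call never reaches the failing code: the open breaker rejects it immediately with a `BrokenCircuitException`.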
Common mistakes to avoid
A few traps catch almost everyone the first time:
- Retrying non-transient errors. Retrying an HTTP 400 or 404 just wastes time. Use a ShouldHandle predicate to retry only the right faults.
- Retrying without a circuit breaker. Pure retries against a dying service create a retry storm and can take the service fully down. Always pair them.
- Forgetting the timeout. Without a timeout, one frozen call can slowly starve your thread pool.
- No jitter. Many clients retrying in lockstep create sharp spikes. Jitter spreads them out.
- Retrying non-idempotent writes blindly. Retrying a "charge the card" call could charge twice. Make such calls idempotent first, or do not retry them.
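The idempotency fix from the last point can be sketched without any library. This is a toy, in-memory model with a hypothetical `Charge` function standing in for a payment service: the client creates one key per logical operation and reuses it on every retry, and the service ignores keys it has already processed. Real APIs that support this usually take the key in a request header (Stripe, for example, uses `Idempotency-Key`), so check what your own API expects.

```csharp
using System;
using System.Collections.Generic;

// Toy in-memory "payment service" that remembers the idempotency keys it has handled.
var processed = new Dictionary<string, decimal>();
int charges = 0;

string Charge(string idempotencyKey, decimal amount)
{
    if (processed.ContainsKey(idempotencyKey))
        return "duplicate ignored";   // a retry of a request that already succeeded
    processed[idempotencyKey] = amount;
    charges++;                        // the card is charged only here
    return "charged";
}

// Client side: create the key ONCE per logical operation, outside any retry loop.
string key = Guid.NewGuid().ToString();
Console.WriteLine(Charge(key, 499m));  // first attempt; imagine its response got lost
Console.WriteLine(Charge(key, 499m));  // the retry carries the same key
Console.WriteLine($"Card charged {charges} time(s)");
```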
When should you use which approach
| Situation | Best choice |
|---|---|
| Calling another HTTP service | Standard resilience handler on the typed client |
| Database or queue code, not HTTP | Hand-built Polly pipeline |
| Same policy reused in many places | Named pipeline registered in DI |
| Very special, tuned HTTP behaviour | AddResilienceHandler with custom strategies |
Start simple. For most HTTP work, the standard handler is the right answer, and you only drop down to custom pipelines when you truly need the control.
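For the last row of the table, AddResilienceHandler lets you assemble your own HTTP pipeline from the HTTP-aware option types in Microsoft.Extensions.Http.Resilience. A sketch, assuming the `builder` and `WeatherClient` from the Program.cs example earlier; the pipeline name `"weather-pipeline"` and every number are illustrative, not recommendations. Strategies are added outermost first, mirroring the standard handler's order without its rate limiter.

```csharp
using Microsoft.Extensions.Http.Resilience;
using Polly;

builder.Services
    .AddHttpClient<WeatherClient>()
    .AddResilienceHandler("weather-pipeline", pipeline =>
    {
        // Outer: a total time limit for the whole call.
        pipeline.AddTimeout(TimeSpan.FromSeconds(10));

        // HttpRetryStrategyOptions already treats HttpRequestException, timeouts,
        // and HTTP 408, 429, and 5xx responses as transient by default.
        pipeline.AddRetry(new HttpRetryStrategyOptions
        {
            MaxRetryAttempts = 2,
            BackoffType = DelayBackoffType.Exponential,
            UseJitter = true
        });

        pipeline.AddCircuitBreaker(new HttpCircuitBreakerStrategyOptions
        {
            FailureRatio = 0.2,
            SamplingDuration = TimeSpan.FromSeconds(30),
            BreakDuration = TimeSpan.FromSeconds(15)
        });

        // Inner: a limit for each single attempt.
        pipeline.AddTimeout(TimeSpan.FromSeconds(3));
    });
```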
Quick recap
- A transient fault is a temporary failure that often fixes itself. Resilience patterns mostly handle these.
- Retry tries again, but should use exponential backoff and jitter so you do not flood the server.
- Timeout stops a slow call from freezing your app. Use both a per-attempt and a total timeout.
- Circuit breaker trips when too many calls fail, giving a sick service time to recover. It has three states: Closed, Open, and Half-Open.
- Polly is the engine; Microsoft.Extensions.Http.Resilience is the friendly wrapper for HttpClient.
- For HTTP, AddStandardResilienceHandler() gives you a strong, well-ordered pipeline in one line.
- For non-HTTP code, build your own pipeline with ResiliencePipelineBuilder, and remember that order matters.
- Only retry the faults that can actually recover, always pair retries with a breaker and a timeout, and watch your telemetry.