Horizontally Scaling ASP.NET Core APIs With YARP Load Balancing
Learn how to scale ASP.NET Core APIs horizontally using YARP load balancing, with policies, health checks, and a full Program.cs setup explained simply.
One ticket counter is not enough
Picture a busy railway station in the morning. There is one ticket counter, and a long line of people behind it. The one clerk is working as fast as he can, but the line keeps growing. People get angry. Some give up and leave.
The station manager has two choices. He can ask the clerk to work faster and give him a better computer. That helps a little, but there is a limit to how fast one person can go. This is called vertical scaling — making one worker bigger and stronger.
Or the manager can open five more counters and put a clerk at each one. Now a guard stands at the front of the hall and waves each new person to whichever counter is free. The same crowd is served much faster, and if one clerk goes for tea, the guard simply stops sending people to that counter. This is horizontal scaling — adding more workers and sharing the crowd between them.
That guard at the front, the one deciding which counter you go to, is a load balancer. In the .NET world, one easy way to build that guard is YARP.
What is YARP?
YARP stands for Yet Another Reverse Proxy. It is a free, open-source library from Microsoft. With it, you build a small ASP.NET Core app whose only job is to receive requests and pass them to your real API servers behind the scenes.
A reverse proxy is just a polite middleman. The outside world talks to the proxy. The proxy talks to your servers. The outside world never needs to know how many servers you have, or which one answered. To them it looks like one single API.
YARP works well on .NET 8, .NET 9, and the current LTS release, .NET 10. Because it is plain C# inside a normal web app, you can read it, debug it, and change its rules using code you already understand.
Two words: cluster and destination
YARP uses two simple words a lot, so let us learn them first.
A destination is one copy of your API. It is one running server with its own address, like https://localhost:5101. In our railway story, a destination is one ticket counter.
A cluster is a group of destinations that all do the same job. The whole row of ticket counters together is one cluster. When a request comes in, YARP picks one destination from the cluster to handle it.
One more term ties these together. A route is the rule that says "requests that look like this should go to that cluster." For example, "any request starting with /api goes to the orders cluster."
| Term | Railway analogy | In YARP |
|---|---|---|
| Destination | One ticket counter | One API server with an address |
| Cluster | The whole row of counters | A group of equal API servers |
| Route | The sign pointing you to the right hall | A rule matching the URL path |
| Policy | How the guard chooses a counter | The load balancing algorithm |
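To make the three terms concrete, here is a tiny toy model in plain C#. It is only a sketch for illustration: the types Destination, Cluster, Route, and ToyProxy are made-up names, not YARP's own classes, and the matching rule (a simple path prefix) is much simpler than what YARP really does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical toy types for illustration; these are NOT YARP's own classes.
public record Destination(string Name, string Address);
public record Cluster(string Id, IReadOnlyList<Destination> Destinations);
public record Route(string Id, string PathPrefix, string ClusterId);

public static class ToyProxy
{
    // Find the first route whose prefix matches the path, then return its cluster.
    // Returns null when no route matches or the cluster id is unknown.
    public static Cluster? Resolve(
        string path,
        IEnumerable<Route> routes,
        IReadOnlyDictionary<string, Cluster> clusters)
    {
        var route = routes.FirstOrDefault(r =>
            path.StartsWith(r.PathPrefix, StringComparison.OrdinalIgnoreCase));

        return route is not null && clusters.TryGetValue(route.ClusterId, out var cluster)
            ? cluster
            : null;
    }
}
```

With a route whose prefix is /api pointing at a cluster called orders, a call like ToyProxy.Resolve("/api/orders/42", routes, clusters) returns the orders cluster. Choosing one destination inside that cluster is the job of the policy.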
Setting up YARP step by step
Let us build the proxy. First, create a brand new empty web project. This project will be only the proxy. Your real API stays in its own project.
dotnet new web -n Gateway
cd Gateway
dotnet add package Yarp.ReverseProxy
Now open Program.cs. We tell ASP.NET Core to load YARP and read its settings from configuration. This is the whole startup file, and it is short.
var builder = WebApplication.CreateBuilder(args);
// Register YARP and load routes and clusters from appsettings.json.
builder.Services
    .AddReverseProxy()
    .LoadFromConfig(builder.Configuration.GetSection("ReverseProxy"));
var app = builder.Build();
// Add the proxy into the request pipeline.
app.MapReverseProxy();
app.Run();
That is it for the code. The interesting part lives in appsettings.json, where we describe the route and the cluster. Here we send everything under / to a cluster with three destinations.
{
  "ReverseProxy": {
    "Routes": {
      "api-route": {
        "ClusterId": "api-cluster",
        "Match": { "Path": "{**catch-all}" }
      }
    },
    "Clusters": {
      "api-cluster": {
        "LoadBalancingPolicy": "RoundRobin",
        "Destinations": {
          "d1": { "Address": "https://localhost:5101/" },
          "d2": { "Address": "https://localhost:5102/" },
          "d3": { "Address": "https://localhost:5103/" }
        }
      }
    }
  }
}
Read it slowly. There is one route called api-route. It matches every path and sends it to api-cluster. The cluster has three destinations, d1, d2, and d3, each pointing at a copy of your API on a different port. The LoadBalancingPolicy tells YARP how to choose between them. We will look at policies next.
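If you prefer code over JSON, the same route and cluster can be described in C#. This is a sketch, assuming a recent YARP version that includes the LoadFromMemory extension and a web project with implicit usings enabled. It replaces the LoadFromConfig call; you would normally pick one style, not both.

```csharp
using Yarp.ReverseProxy.Configuration;

var builder = WebApplication.CreateBuilder(args);

// One route: match every path and send it to "api-cluster".
var routes = new[]
{
    new RouteConfig
    {
        RouteId = "api-route",
        ClusterId = "api-cluster",
        Match = new RouteMatch { Path = "{**catch-all}" }
    }
};

// One cluster with three destinations, balanced with RoundRobin.
var clusters = new[]
{
    new ClusterConfig
    {
        ClusterId = "api-cluster",
        LoadBalancingPolicy = "RoundRobin",
        Destinations = new Dictionary<string, DestinationConfig>(StringComparer.OrdinalIgnoreCase)
        {
            ["d1"] = new DestinationConfig { Address = "https://localhost:5101/" },
            ["d2"] = new DestinationConfig { Address = "https://localhost:5102/" },
            ["d3"] = new DestinationConfig { Address = "https://localhost:5103/" }
        }
    }
};

// Register YARP with the in-memory routes and clusters instead of appsettings.json.
builder.Services.AddReverseProxy().LoadFromMemory(routes, clusters);

var app = builder.Build();
app.MapReverseProxy();
app.Run();
```

The JSON version is usually easier to change without a rebuild. The code version can be handy when the list of destinations is computed at startup.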
From zero to a working YARP gateway
1. New project: run dotnet new web for the gateway.
2. Add package: add Yarp.ReverseProxy.
3. Program.cs: call AddReverseProxy and MapReverseProxy.
4. appsettings.json: define the route and the cluster.
5. Run: traffic now spreads across the copies.
How does YARP choose a destination?
When a request arrives, the cluster may have three healthy destinations. YARP must pick exactly one. The rule it uses is the load balancing policy. YARP ships with a few, and you choose by name.
If you do not set a policy at all, YARP uses PowerOfTwoChoices. It picks two destinations at random, looks at how busy each one is, and sends the request to the less busy of the two. This sounds simple, but it spreads load surprisingly well and costs almost nothing to compute. For most teams it is the best default.
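The idea is small enough to sketch in a few lines of plain C#. This is not YARP's source code, only an illustration of the technique. The Backend type and its InFlight counter are hypothetical names invented for this example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type: one API copy plus a counter of requests currently in flight.
public sealed class Backend
{
    public Backend(string name) => Name = name;
    public string Name { get; }
    public int InFlight { get; set; }
}

public static class PowerOfTwo
{
    // Pick two different backends at random and return the less busy one.
    public static Backend Pick(IReadOnlyList<Backend> backends, Random random)
    {
        if (backends.Count == 0)
            throw new ArgumentException("No backends available.", nameof(backends));
        if (backends.Count == 1)
            return backends[0];

        var firstIndex = random.Next(backends.Count);
        // Offset by 1..Count-1 so the second index never equals the first.
        var secondIndex = (firstIndex + 1 + random.Next(backends.Count - 1)) % backends.Count;

        var first = backends[firstIndex];
        var second = backends[secondIndex];
        return first.InFlight <= second.InFlight ? first : second;
    }
}
```

Notice that the method never scans the whole list. It looks at only two candidates, which is why the cost stays tiny even with many destinations.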
| Policy | How it picks | Good when |
|---|---|---|
| PowerOfTwoChoices | Two random, pick the quieter one | Default; great all-rounder |
| RoundRobin | Next one in line, then loop | You want even, predictable rotation |
| LeastRequests | The one with fewest in-flight requests | Some requests are much slower than others |
| Random | A random one each time | Simple and stateless |
| FirstAlphabetical | Always the first available by name | Testing or a clear primary order |
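Two of the other policies in the table are also easy to picture in code. The sketch below shows the general technique behind round robin and least requests in plain C#. The Server record and both picker classes are hypothetical names, and YARP's real implementations may differ in detail.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

// Hypothetical type for illustration, not part of YARP.
public record Server(string Name, int InFlight);

public sealed class RoundRobinPicker
{
    private int _counter = -1;

    // Hand out servers in order and wrap around at the end of the list.
    public Server Pick(IReadOnlyList<Server> servers)
    {
        if (servers.Count == 0)
            throw new ArgumentException("No servers available.", nameof(servers));

        // Interlocked keeps the counter correct when many requests arrive at once.
        var next = (uint)Interlocked.Increment(ref _counter);
        return servers[(int)(next % (uint)servers.Count)];
    }
}

public static class LeastRequestsPicker
{
    // Scan every server and return the one with the fewest in-flight requests.
    public static Server Pick(IReadOnlyList<Server> servers) =>
        servers.MinBy(s => s.InFlight)
        ?? throw new ArgumentException("No servers available.", nameof(servers));
}
```

Round robin ignores how busy each server is, so it is cheap and predictable. Least requests looks at every server on each pick, which costs a little more but copes better when some requests are slow.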
You can change the policy with one line in appsettings.json, no rebuild of your API needed. That flexibility is one of the nicest things about keeping the proxy in configuration.
Why health checks matter
Here is a problem. Imagine one of your three API copies crashes, but the proxy does not know. YARP keeps cheerfully sending one out of every three requests to a dead server. One in three users gets an error. That is bad.
The fix is health checks. A health check is YARP's way of asking "are you okay?" and only sending traffic to servers that say yes. There are two kinds, and you can use both together.
Active health checks are proactive. On a timer, YARP sends a small request to a health endpoint on every destination, such as /health. If the server answers with a success code (2xx), it is marked healthy. If it fails or times out, it is marked unhealthy and taken out of rotation until it recovers.
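Passive health checks are reactive. YARP simply watches the real responses flowing through it. If a destination starts returning lots of failures, YARP marks it unhealthy without needing a separate probe. Passive checks need the passive health middleware, which the parameterless MapReverseProxy() call adds for you automatically. If you build a custom proxy pipeline, you add it yourself with UsePassiveHealthChecks(). Like active checks, passive checks stay off until you enable them on the cluster.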
Here is what active health checks look like in configuration. We add a HealthCheck block to the cluster.
{
  "ReverseProxy": {
    "Clusters": {
      "api-cluster": {
        "LoadBalancingPolicy": "PowerOfTwoChoices",
        "HealthCheck": {
          "Active": {
            "Enabled": true,
            "Interval": "00:00:10",
            "Timeout": "00:00:05",
            "Policy": "ConsecutiveFailures",
            "Path": "/health"
          }
        },
        "Destinations": {
          "d1": { "Address": "https://localhost:5101/" },
          "d2": { "Address": "https://localhost:5102/" }
        }
      }
    }
  }
}
This says: every 10 seconds, ping /health on each destination, give it 5 seconds to answer, and use the ConsecutiveFailures policy to decide when a destination has failed enough times to be marked unhealthy. For this to work, your API needs a real /health endpoint. In ASP.NET Core you add one with a couple of lines.
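// In your API project (not the gateway), expose a health endpoint.
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddHealthChecks();
var app = builder.Build();
// Returns 200 while the app is healthy, which is what YARP's probe expects.
app.MapHealthChecks("/health");
app.Run();
What happens when a server goes down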
1. Probe: YARP pings /health on a timer.
2. Detect: d2 fails repeatedly.
3. Remove: d2 is marked unhealthy and gets no traffic.
4. Reroute: users are served by d1 and d3.
5. Recover: d2 passes a probe and returns to the pool.
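The detect, remove, and recover steps follow a simple counting rule. Here is a minimal sketch of a consecutive-failures rule in plain C#. ConsecutiveFailureTracker is a hypothetical class written for this example, and the threshold of 2 used in the usage note is an assumption; check the YARP health check documentation for the real default and how to change it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical tracker for illustration; YARP's real policy lives inside the library.
public sealed class ConsecutiveFailureTracker
{
    private readonly int _threshold;
    private readonly Dictionary<string, int> _failures = new();

    public ConsecutiveFailureTracker(int threshold) => _threshold = threshold;

    // Record one probe result and return whether the destination counts as healthy.
    public bool Report(string destination, bool probeSucceeded)
    {
        if (probeSucceeded)
        {
            // One good probe resets the streak, so the destination rejoins the pool.
            _failures[destination] = 0;
            return true;
        }

        _failures.TryGetValue(destination, out var count);
        _failures[destination] = ++count;

        // Unhealthy only once the failures in a row reach the threshold.
        return count < _threshold;
    }
}
```

With a threshold of 2, one failed probe for d2 still reports healthy, a second failed probe in a row reports unhealthy, and the next successful probe brings d2 back. A single slow response therefore does not knock a server out of rotation.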
Running real copies to test it
To actually see load balancing, you need more than one copy of your API running. On one machine, the easy trick is to start the same API on different ports. You can do this from the terminal by setting the URL each time.
# Terminal 1
dotnet run --project MyApi --urls "https://localhost:5101"
# Terminal 2
dotnet run --project MyApi --urls "https://localhost:5102"
# Terminal 3
dotnet run --project MyApi --urls "https://localhost:5103"
Now start the gateway too. Add a line in your API that logs the port it is running on, then send several requests to the gateway. You will see the responses come from different ports as YARP shares the load. In real production you would not run copies by hand. A container platform like Kubernetes, Azure Container Apps, or even .NET Aspire during development would start and manage the copies for you, and YARP would balance across them.
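One way to log the port is a small endpoint in the API project. This is only a sketch: the /whoami path is a made-up name for this example, and it assumes a minimal API project with implicit usings enabled.

```csharp
// Program.cs of the API project (not the gateway).
var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

// Hypothetical endpoint: reports which local port handled the request.
app.MapGet("/whoami", (HttpContext context) =>
{
    var port = context.Connection.LocalPort;
    app.Logger.LogInformation("Request served on port {Port}", port);
    return Results.Ok(new { port });
});

app.Run();
```

Call /whoami through the gateway a few times. With the RoundRobin policy and all three copies running, the port in the response should rotate between 5101, 5102, and 5103.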
A note on shared state
Horizontal scaling brings one new question you must answer: where does shared data live? If a user logs in on copy 1, and their next request lands on copy 2, copy 2 must also know they are logged in.
The golden rule is to keep your API copies stateless. Do not store session data, locks, or caches in the memory of one copy. Instead push that shared state into something all copies can reach, like a database or a Redis cache. That way it does not matter which copy serves you. Any of them can do the job, which is exactly what makes adding more copies safe and easy.
| Where state lives | Safe to scale? | Why |
|---|---|---|
| In one server's memory | No | Other copies cannot see it |
| In a shared database | Yes | Every copy reads the same data |
| In Redis or a cache | Yes | Every copy shares one source of truth |
| In a sticky cookie only | Risky | Breaks if that one server dies |
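The first two rows of the table can be shown with a small experiment in plain C#. This is a toy sketch: ApiCopy is a hypothetical stand-in for one running copy of your API, and a shared dictionary plays the role of the database or Redis.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for one running copy of the API.
public sealed class ApiCopy
{
    private readonly Dictionary<string, string> _localSessions = new();
    private readonly IDictionary<string, string> _sharedStore;

    public ApiCopy(IDictionary<string, string> sharedStore) => _sharedStore = sharedStore;

    // Stateful: the login is remembered only inside this copy's memory.
    public void LoginLocal(string user) => _localSessions[user] = "logged-in";
    public bool IsLoggedInLocal(string user) => _localSessions.ContainsKey(user);

    // Stateless: the login goes to a store that every copy can reach.
    public void LoginShared(string user) => _sharedStore[user] = "logged-in";
    public bool IsLoggedInShared(string user) => _sharedStore.ContainsKey(user);
}
```

Create two copies over the same shared store. After LoginLocal on copy 1, copy 2 does not know the user. After LoginShared on copy 1, copy 2 sees the login right away. In a real API the shared store would be an external service rather than a dictionary in the same process.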
Where YARP fits in your stack
You might already have a cloud load balancer at the very edge of your system. That is fine. YARP does not have to replace it. Many teams keep the cloud balancer for raw network spreading and TLS, and put YARP just behind it for smart, app-aware routing in C#. For example, YARP can route /orders to one cluster and /payments to another, add headers, or apply rate limits, all in code your team controls.
So the choice is not always "YARP or Nginx." It is often "what is the right tool at each layer." For pure app logic and team-owned routing rules, having it in your .NET code is a real advantage.
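The path-based routing mentioned above might look like this in appsettings.json. Treat it as a sketch: the route names, cluster names, and ports are all invented for this example.

```json
{
  "ReverseProxy": {
    "Routes": {
      "orders-route": {
        "ClusterId": "orders-cluster",
        "Match": { "Path": "/orders/{**catch-all}" }
      },
      "payments-route": {
        "ClusterId": "payments-cluster",
        "Match": { "Path": "/payments/{**catch-all}" }
      }
    },
    "Clusters": {
      "orders-cluster": {
        "Destinations": {
          "orders-1": { "Address": "https://localhost:5201/" }
        }
      },
      "payments-cluster": {
        "Destinations": {
          "payments-1": { "Address": "https://localhost:5301/" }
        }
      }
    }
  }
}
```

Each cluster can then be scaled on its own. If orders gets busy, you add more destinations to orders-cluster and leave payments alone.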
Quick recap
- Horizontal scaling means running many copies of your API and sharing traffic between them, instead of making one server bigger.
- YARP is Microsoft's free reverse proxy library. You build the load balancer inside a normal ASP.NET Core app.
- A cluster is a group of equal destinations (API copies). A route matches a URL and sends it to a cluster.
- The load balancing policy decides which destination handles a request. PowerOfTwoChoices is a great default; RoundRobin and LeastRequests are common alternatives.
- Health checks keep traffic away from broken servers. Use active checks (timed probes to /health) and passive checks (watching real responses).
- Keep your API copies stateless. Push shared data into a database or Redis so any copy can serve any request.
- YARP can work alongside a cloud load balancer, not only replace it. Use each tool where it fits best.