Horizontally Scaling ASP.NET Core APIs With YARP Load Balancing

Learn how to scale ASP.NET Core APIs horizontally using YARP load balancing, with policies, health checks, and a full Program.cs setup explained simply.

12 min read · Updated November 4, 2025

One ticket counter is not enough

Picture a busy railway station in the morning. There is one ticket counter, and a long line of people behind it. The one clerk is working as fast as he can, but the line keeps growing. People get angry. Some give up and leave.

The station manager has two choices. He can ask the clerk to work faster and give him a better computer. That helps a little, but there is a limit to how fast one person can go. This is called vertical scaling — making one worker bigger and stronger.

Or the manager can open five more counters and put a clerk at each one. Now a guard stands at the front of the hall and waves each new person to whichever counter is free. The same crowd is served much faster, and if one clerk goes for tea, the guard simply stops sending people to that counter. This is horizontal scaling — adding more workers and sharing the crowd between them.

That guard at the front, the one deciding which counter you go to, is a load balancer. In the .NET world, one easy way to build that guard is YARP.

What is YARP?

YARP stands for Yet Another Reverse Proxy. It is a free, open-source library from Microsoft. With it, you build a small ASP.NET Core app whose only job is to receive requests and pass them to your real API servers behind the scenes.

A reverse proxy is just a polite middleman. The outside world talks to the proxy. The proxy talks to your servers. The outside world never needs to know how many servers you have, or which one answered. To them it looks like one single API.

YARP works well on .NET 8, .NET 9, and the current LTS release, .NET 10. Because it is plain C# inside a normal web app, you can read it, debug it, and change its rules using code you already understand.

Figure 1: The reverse proxy is the guard at the front. Clients talk only to it, and it shares requests across many API copies.

Two words: cluster and destination

YARP uses two simple words a lot, so let us learn them first.

A destination is one copy of your API. It is one running server with its own address, like https://localhost:5101. In our railway story, a destination is one ticket counter.

A cluster is a group of destinations that all do the same job. The whole row of ticket counters together is one cluster. When a request comes in, YARP picks one destination from the cluster to handle it.

A route is the rule that says "requests that look like this should go to that cluster." For example, "any request starting with /api goes to the orders cluster."

| Term | Railway analogy | In YARP |
| --- | --- | --- |
| Destination | One ticket counter | One API server with an address |
| Cluster | The whole row of counters | A group of equal API servers |
| Route | The sign pointing you to the right hall | A rule matching the URL path |
| Policy | How the guard chooses a counter | The load balancing algorithm |

Setting up YARP step by step

Let us build the proxy. First, create a brand new empty web project. This project will be only the proxy. Your real API stays in its own project.

dotnet new web -n Gateway
cd Gateway
dotnet add package Yarp.ReverseProxy

Now open Program.cs. We tell ASP.NET Core to load YARP and read its settings from configuration. This is the whole startup file, and it is short.

var builder = WebApplication.CreateBuilder(args);
 
// Register YARP and load routes and clusters from appsettings.json.
builder.Services
    .AddReverseProxy()
    .LoadFromConfig(builder.Configuration.GetSection("ReverseProxy"));
 
var app = builder.Build();
 
// Add the proxy into the request pipeline.
app.MapReverseProxy();
 
app.Run();

That is it for the code. The interesting part lives in appsettings.json, where we describe the route and the cluster. Here we send everything under / to a cluster with three destinations.

{
  "ReverseProxy": {
    "Routes": {
      "api-route": {
        "ClusterId": "api-cluster",
        "Match": { "Path": "{**catch-all}" }
      }
    },
    "Clusters": {
      "api-cluster": {
        "LoadBalancingPolicy": "RoundRobin",
        "Destinations": {
          "d1": { "Address": "https://localhost:5101/" },
          "d2": { "Address": "https://localhost:5102/" },
          "d3": { "Address": "https://localhost:5103/" }
        }
      }
    }
  }
}

Read it slowly. There is one route called api-route. It matches every path and sends it to api-cluster. The cluster has three destinations, d1, d2, and d3, each pointing at a copy of your API on a different port. The LoadBalancingPolicy tells YARP how to choose between them. We will look at policies next.

From zero to a working YARP gateway

1. New project: dotnet new web for the gateway
2. Add package: add Yarp.ReverseProxy
3. Program.cs: AddReverseProxy + MapReverseProxy
4. appsettings: define route and cluster
5. Run: traffic now spreads across copies

Five small steps take you from an empty project to a running load balancer.

How does YARP choose a destination?

When a request arrives, the cluster may have three healthy destinations. YARP must pick exactly one. The rule it uses is the load balancing policy. YARP ships with a few, and you choose by name.

If you do not set a policy at all, YARP uses PowerOfTwoChoices. It picks two destinations at random, compares how many requests each one is currently handling, and sends the request to the less busy of the two. This sounds simple, but it spreads load surprisingly well and costs almost nothing to compute. For most teams it is a sensible default.
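To make the idea concrete, here is a small simulation of the "two random choices" rule. It is a sketch of the technique, not YARP's actual source code. The names inFlight and PickPowerOfTwo are made up for this example, and the simulated requests never finish, so the counters only grow.

```csharp
using System;
using System.Linq;

// Hypothetical in-flight request counters, one per destination.
var inFlight = new int[] { 0, 0, 0 };
var rng = new Random(42);

// Pick two distinct destinations at random and return the index
// of the one that is currently handling fewer requests.
int PickPowerOfTwo(int[] load, Random random)
{
    if (load.Length == 1) return 0;            // nothing to compare

    int first = random.Next(load.Length);
    int second = random.Next(load.Length - 1);
    if (second >= first) second++;             // shift so the two picks differ

    return load[first] <= load[second] ? first : second;
}

// Simulate 3000 requests arriving one after another.
for (int i = 0; i < 3000; i++)
{
    inFlight[PickPowerOfTwo(inFlight, rng)]++;
}

Console.WriteLine(string.Join(", ", inFlight));
```

Each pick looks at only two counters, yet the three totals printed at the end should land close to each other. That is the appeal of the policy: nearly the balance of always choosing the least busy destination, without scanning every destination on every request.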

| Policy | How it picks | Good when |
| --- | --- | --- |
| PowerOfTwoChoices | Two random, pick the quieter one | Default; great all-rounder |
| RoundRobin | Next one in line, then loop | You want even, predictable rotation |
| LeastRequests | The one with fewest in-flight requests | Some requests are much slower than others |
| Random | A random one each time | Simple and stateless |
| FirstAlphabetical | Always the first available by name | Testing or a clear primary order |

Figure 2: Round robin hands each new request to the next destination, then loops back to the start.

You can change the policy with one line in appsettings.json, no rebuild of your API needed. That flexibility is one of the nicest things about keeping the proxy in configuration.

Why health checks matter

Here is a problem. Imagine one of your three API copies crashes, but the proxy does not know. YARP keeps cheerfully sending one out of every three requests to a dead server. One in three users gets an error. That is bad.

The fix is health checks. A health check is YARP's way of asking "are you okay?" and only sending traffic to servers that say yes. There are two kinds, and you can use both together.

Active health checks are proactive. On a timer, YARP sends a small request to a health endpoint on every destination, such as /health. If the server answers with a success code (2xx), the probe passes. If probes fail or time out often enough to trip the health policy, the destination is marked unhealthy and taken out of rotation until it recovers.

Passive health checks are reactive. YARP simply watches the real responses flowing through it. If a destination starts returning lots of failures, YARP marks it unhealthy without needing a separate probe. Passive checks rely on the passive health middleware, which the parameterless MapReverseProxy() adds for you automatically, and they stay off until you enable them in the cluster's HealthCheck settings.
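As a rough sketch, turning passive checks on for a cluster could look like the fragment below. The cluster name api-cluster matches the earlier example, and the two-minute ReactivationPeriod is only an illustrative value, not a recommendation.

```json
{
  "ReverseProxy": {
    "Clusters": {
      "api-cluster": {
        "HealthCheck": {
          "Passive": {
            "Enabled": true,
            "Policy": "TransportFailureRate",
            "ReactivationPeriod": "00:02:00"
          }
        }
      }
    }
  }
}
```

TransportFailureRate is YARP's built-in passive policy. It watches the share of proxied requests that fail. ReactivationPeriod is how long an unhealthy destination waits before YARP gives it another chance.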

Figure 3: A destination moves between healthy and unhealthy. YARP only sends real traffic to healthy ones.

Here is what active health checks look like in configuration. We add a HealthCheck block to the cluster.

{
  "ReverseProxy": {
    "Clusters": {
      "api-cluster": {
        "LoadBalancingPolicy": "PowerOfTwoChoices",
        "HealthCheck": {
          "Active": {
            "Enabled": true,
            "Interval": "00:00:10",
            "Timeout": "00:00:05",
            "Policy": "ConsecutiveFailures",
            "Path": "/health"
          }
        },
        "Destinations": {
          "d1": { "Address": "https://localhost:5101/" },
          "d2": { "Address": "https://localhost:5102/" },
          "d3": { "Address": "https://localhost:5103/" }
        }
      }
    }
  }
}

This says: every 10 seconds, ping /health on each destination, give it 5 seconds to answer, and use the ConsecutiveFailures policy to decide when a destination has failed enough times to be marked unhealthy. For this to work, your API needs a real /health endpoint. In ASP.NET Core you add one with a couple of lines.

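// In your API project (not the gateway), expose a health endpoint.
var builder = WebApplication.CreateBuilder(args);

// Register the built-in health check services.
builder.Services.AddHealthChecks();

var app = builder.Build();

// Answer on /health with a success code while the app is healthy.
app.MapHealthChecks("/health");

app.Run();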
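The ConsecutiveFailures policy marks a destination unhealthy after a set number of failed probes in a row. If the default threshold does not suit you, it can be tuned through cluster metadata. The fragment below is a sketch, and the value 3 is just an example.

```json
{
  "ReverseProxy": {
    "Clusters": {
      "api-cluster": {
        "HealthCheck": {
          "Active": {
            "Enabled": true,
            "Policy": "ConsecutiveFailures",
            "Path": "/health"
          }
        },
        "Metadata": {
          "ConsecutiveFailuresHealthPolicy.Threshold": "3"
        }
      }
    }
  }
}
```

A higher threshold tolerates short hiccups but reacts more slowly to a real outage. A lower one reacts fast but may remove a server over a single slow probe.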

What happens when a server goes down

1. Probe: YARP pings /health on a timer
2. Detect: d2 fails repeatedly
3. Remove: d2 marked unhealthy, no traffic
4. Reroute: users served by d1 and d3
5. Recover: d2 passes probe, returns to pool

Health checks let YARP route around a sick destination automatically, then bring it back.

Running real copies to test it

To actually see load balancing, you need more than one copy of your API running. On one machine, the easy trick is to start the same API on different ports. You can do this from the terminal by setting the URL each time.

# Terminal 1
dotnet run --project MyApi --urls "https://localhost:5101"
 
# Terminal 2
dotnet run --project MyApi --urls "https://localhost:5102"
 
# Terminal 3
dotnet run --project MyApi --urls "https://localhost:5103"

Now start the gateway too. Add a line in your API that logs the port it is running on, then send several requests to the gateway. You will see the responses come from different ports as YARP shares the load.

In real production you would not run copies by hand. A container platform like Kubernetes or Azure Container Apps would start and manage the copies for you, and .NET Aspire can do the same during development. YARP would balance across them.
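One possible way to do the port-logging step is a tiny endpoint in the API project. This is only a sketch. The /whoami path is a made-up name for this example, not something YARP or ASP.NET Core requires.

```csharp
var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

// Hypothetical endpoint: report which local port handled the request.
app.MapGet("/whoami", (HttpContext context, ILogger<Program> logger) =>
{
    var port = context.Connection.LocalPort;
    logger.LogInformation("Handled {Path} on port {Port}", context.Request.Path, port);
    return $"Served by port {port}";
});

app.Run();
```

Call /whoami through the gateway's address a few times. The port in the response should change from one request to the next as YARP moves between the copies.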

A note on shared state

Horizontal scaling brings one new question you must answer: where does shared data live? If a user logs in on copy 1, and their next request lands on copy 2, copy 2 must also know they are logged in.

The golden rule is to keep your API copies stateless. Do not store session data, locks, or caches in the memory of one copy. Instead push that shared state into something all copies can reach, like a database or a Redis cache. That way it does not matter which copy serves you. Any of them can do the job, which is exactly what makes adding more copies safe and easy.

| Where state lives | Safe to scale? | Why |
| --- | --- | --- |
| In one server's memory | No | Other copies cannot see it |
| In a shared database | Yes | Every copy reads the same data |
| In Redis or a cache | Yes | Every copy shares one source of truth |
| In a sticky cookie only | Risky | Breaks if that one server dies |
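As an illustration of pushing state out of one copy's memory, the sketch below stores a value in Redis through ASP.NET Core's IDistributedCache. It assumes the Microsoft.Extensions.Caching.StackExchangeRedis package is installed and a Redis server is reachable at localhost:6379. The /profile endpoints and the key format are hypothetical.

```csharp
using Microsoft.Extensions.Caching.Distributed;

var builder = WebApplication.CreateBuilder(args);

// Assumption: Redis is running locally on its default port.
builder.Services.AddStackExchangeRedisCache(options =>
{
    options.Configuration = "localhost:6379";
});

var app = builder.Build();

// Hypothetical endpoint: save a display name in the shared cache.
// The name is read from the query string, for example ?name=Asha.
app.MapPut("/profile/{userId}/name", async (string userId, string name, IDistributedCache cache) =>
{
    await cache.SetStringAsync($"profile:{userId}:name", name);
    return Results.NoContent();
});

// Hypothetical endpoint: any copy can read the value back.
app.MapGet("/profile/{userId}/name", async (string userId, IDistributedCache cache) =>
{
    var name = await cache.GetStringAsync($"profile:{userId}:name");
    return name is null ? Results.NotFound() : Results.Ok(name);
});

app.Run();
```

Because every copy talks to the same Redis server, a value written through the copy on port 5101 can be read through the copy on port 5102. The load balancer is then free to send each request anywhere.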

Where YARP fits in your stack

You might already have a cloud load balancer at the very edge of your system. That is fine. YARP does not have to replace it. Many teams keep the cloud balancer for raw network spreading and TLS, and put YARP just behind it for smart, app-aware routing in C#. For example, YARP can route /orders to one cluster and /payments to another, add headers, or apply rate limits, all in code your team controls.

So the choice is not always "YARP or Nginx." It is often "what is the right tool at each layer." For pure app logic and team-owned routing rules, having it in your .NET code is a real advantage.
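The path-based split mentioned above could be sketched in appsettings.json like this. The cluster names, destination names, and ports are invented for the example.

```json
{
  "ReverseProxy": {
    "Routes": {
      "orders-route": {
        "ClusterId": "orders-cluster",
        "Match": { "Path": "/orders/{**catch-all}" }
      },
      "payments-route": {
        "ClusterId": "payments-cluster",
        "Match": { "Path": "/payments/{**catch-all}" }
      }
    },
    "Clusters": {
      "orders-cluster": {
        "Destinations": {
          "o1": { "Address": "https://localhost:5201/" },
          "o2": { "Address": "https://localhost:5202/" }
        }
      },
      "payments-cluster": {
        "Destinations": {
          "p1": { "Address": "https://localhost:5301/" }
        }
      }
    }
  }
}
```

Each cluster can then scale on its own. A busy orders service can run more copies without touching payments.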

Quick recap

  • Horizontal scaling means running many copies of your API and sharing traffic between them, instead of making one server bigger.
  • YARP is Microsoft's free reverse proxy library. You build the load balancer inside a normal ASP.NET Core app.
  • A cluster is a group of equal destinations (API copies). A route matches a URL and sends it to a cluster.
  • The load balancing policy decides which destination handles a request. PowerOfTwoChoices is a great default; RoundRobin and LeastRequests are common alternatives.
  • Health checks keep traffic away from broken servers. Use active checks (timed probes to /health) and passive checks (watching real responses).
  • Keep your API copies stateless. Push shared data into a database or Redis so any copy can serve any request.
  • YARP can work alongside a cloud load balancer, not only replace it. Use each tool where it fits best.
