What is YARP and what is it used for?

YARP stands for Yet Another Reverse Proxy. It is a free library from Microsoft that lets you build your own reverse proxy and load balancer inside a normal ASP.NET Core app. You use it to sit in front of several copies of your API and share incoming requests between them, so no single copy gets overloaded.

What is the difference between horizontal and vertical scaling?

Vertical scaling means making one machine bigger by adding more CPU and memory. Horizontal scaling means adding more machines that each run a copy of your app. Horizontal scaling is usually cheaper at large sizes and keeps working even if one machine dies, but it needs a load balancer like YARP to spread the traffic.

Which load balancing policy should I pick in YARP?

If you are not sure, leave it on the default, which is PowerOfTwoChoices. It picks two destinations at random and sends the request to the less busy one, which spreads load well with very little cost. Use RoundRobin if you want simple, even, predictable rotation, and LeastRequests if some requests are much slower than others.

Do I need health checks with YARP load balancing?

Yes, in production you really do. Without health checks YARP keeps sending traffic to a broken server because it does not know it is broken. Active health checks ping a health endpoint on each server on a timer, and passive health checks watch real responses for failures. Both let YARP stop using a sick server until it recovers.

Is YARP a replacement for Nginx or a cloud load balancer?

It can be, but it does not have to be. YARP is great when you want routing logic written in C# and shared with your team, or when you run inside Kubernetes or App Service. Many teams still keep a cloud load balancer at the very edge and use YARP for smarter, app-aware routing behind it. Both approaches are fine.


Horizontally Scaling ASP.NET Core APIs With YARP Load Balancing

Learn how to scale ASP.NET Core APIs horizontally using YARP load balancing, with policies, health checks, and a full Program.cs setup explained simply.

12 min read. Updated November 4, 2025.

One ticket counter is not enough

Picture a busy railway station in the morning. There is one ticket counter, and a long line of people behind it. The one clerk is working as fast as he can, but the line keeps growing. People get angry. Some give up and leave.

The station manager has two choices. He can ask the clerk to work faster and give him a better computer. That helps a little, but there is a limit to how fast one person can go. This is called vertical scaling — making one worker bigger and stronger.

Or the manager can open five more counters and put a clerk at each one. Now a guard stands at the front of the hall and waves each new person to whichever counter is free. The same crowd is served much faster, and if one clerk goes for tea, the guard simply stops sending people to that counter. This is horizontal scaling — adding more workers and sharing the crowd between them.

That guard at the front, the one deciding which counter you go to, is a load balancer. In the .NET world, one easy way to build that guard is YARP.

What is YARP?

YARP stands for Yet Another Reverse Proxy. It is a free, open-source library from Microsoft. With it, you build a small ASP.NET Core app whose only job is to receive requests and pass them to your real API servers behind the scenes.

A reverse proxy is just a polite middleman. The outside world talks to the proxy. The proxy talks to your servers. The outside world never needs to know how many servers you have, or which one answered. To them it looks like one single API.

YARP works well on .NET 8, .NET 9, and the current LTS release, .NET 10. Because it is plain C# inside a normal web app, you can read it, debug it, and change its rules using code you already understand.

Figure 1: The reverse proxy is the guard at the front. Clients talk only to it, and it shares requests across many API copies.

Two words: cluster and destination

YARP uses two simple words a lot, so let us learn them first.

A destination is one copy of your API. It is one running server with its own address, like https://localhost:5101. In our railway story, a destination is one ticket counter.

A cluster is a group of destinations that all do the same job. The whole row of ticket counters together is one cluster. When a request comes in, YARP picks one destination from the cluster to handle it.

A route is the rule that says "requests that look like this should go to that cluster." For example, "any request starting with /api goes to the orders cluster."

Term	Railway analogy	In YARP
Destination	One ticket counter	One API server with an address
Cluster	The whole row of counters	A group of equal API servers
Route	The sign pointing you to the right hall	A rule matching the URL path
Policy	How the guard chooses a counter	The load balancing algorithm

Setting up YARP step by step

Let us build the proxy. First, create a brand new empty web project. This project will be only the proxy. Your real API stays in its own project.

dotnet new web -n Gateway
cd Gateway
dotnet add package Yarp.ReverseProxy

Now open Program.cs. We tell ASP.NET Core to load YARP and read its settings from configuration. This is the whole startup file, and it is short.

var builder = WebApplication.CreateBuilder(args);
 
// Register YARP and load routes and clusters from appsettings.json.
builder.Services
    .AddReverseProxy()
    .LoadFromConfig(builder.Configuration.GetSection("ReverseProxy"));
 
var app = builder.Build();
 
// Add the proxy into the request pipeline.
app.MapReverseProxy();
 
app.Run();

That is it for the code. The interesting part lives in appsettings.json, where we describe the route and the cluster. Here we send everything under / to a cluster with three destinations.

{
  "ReverseProxy": {
    "Routes": {
      "api-route": {
        "ClusterId": "api-cluster",
        "Match": { "Path": "{**catch-all}" }
      }
    },
    "Clusters": {
      "api-cluster": {
        "LoadBalancingPolicy": "RoundRobin",
        "Destinations": {
          "d1": { "Address": "https://localhost:5101/" },
          "d2": { "Address": "https://localhost:5102/" },
          "d3": { "Address": "https://localhost:5103/" }
        }
      }
    }
  }
}

Read it slowly. There is one route called api-route. It matches every path and sends it to api-cluster. The cluster has three destinations, d1, d2, and d3, each pointing at a copy of your API on a different port. The LoadBalancingPolicy tells YARP how to choose between them. We will look at policies next.
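If you prefer to see the same idea in C#, here is a minimal sketch of the route and cluster built in code instead of appsettings.json. It assumes a recent YARP version that includes the LoadFromMemory extension, and it uses only two destinations to stay short. Treat it as an illustration of how the pieces fit together, not as a replacement for the configuration file above.

```csharp
using Yarp.ReverseProxy.Configuration;

var builder = WebApplication.CreateBuilder(args);

// One route: every path goes to the cluster named "api-cluster".
var routes = new[]
{
    new RouteConfig
    {
        RouteId = "api-route",
        ClusterId = "api-cluster",
        Match = new RouteMatch { Path = "{**catch-all}" }
    }
};

// One cluster: a group of destinations that all run the same API.
var clusters = new[]
{
    new ClusterConfig
    {
        ClusterId = "api-cluster",
        // The policy is chosen by name, just like in appsettings.json.
        LoadBalancingPolicy = "RoundRobin",
        Destinations = new Dictionary<string, DestinationConfig>
        {
            ["d1"] = new DestinationConfig { Address = "https://localhost:5101/" },
            ["d2"] = new DestinationConfig { Address = "https://localhost:5102/" }
        }
    }
};

// Assumption: LoadFromMemory is available in the installed YARP package.
builder.Services.AddReverseProxy().LoadFromMemory(routes, clusters);

var app = builder.Build();
app.MapReverseProxy();
app.Run();
```

Configuration in a file is usually easier to change later, so the in-code form is mostly useful for tests or for building routes dynamically.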

From zero to a working YARP gateway

Steps

New project: dotnet new web for the gateway
Add package: add Yarp.ReverseProxy
Program.cs: AddReverseProxy + MapReverseProxy
appsettings: define route and cluster
Run: traffic now spreads across copies

Five small steps take you from an empty project to a running load balancer.

How does YARP choose a destination?

When a request arrives, the cluster may have three healthy destinations. YARP must pick exactly one. The rule it uses is the load balancing policy. YARP ships with a few, and you choose by name.

If you do not set a policy at all, YARP uses PowerOfTwoChoices. It picks two destinations at random, looks at how busy each one is, and sends the request to the less busy of the two. This sounds simple, but it spreads load surprisingly well and costs almost nothing to compute. For most teams it is the best default.
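To make the "pick two, choose the quieter one" idea concrete, here is a small standalone sketch. It is not YARP's own code. The class name PowerOfTwoSketch and the inFlight array are made up for illustration, and the array simply stands in for the number of requests each destination is handling right now.

```csharp
using System;

public static class PowerOfTwoSketch
{
    // inFlight[i] is the number of requests destination i is handling right now.
    // Returns the index of the chosen destination.
    public static int Pick(int[] inFlight, Random random)
    {
        if (inFlight.Length == 0)
            throw new ArgumentException("No destinations.", nameof(inFlight));
        if (inFlight.Length == 1)
            return 0;

        // Draw two different candidates at random.
        int first = random.Next(inFlight.Length);
        int second = random.Next(inFlight.Length - 1);
        if (second >= first) second++; // skip over 'first' so the two differ

        // Send the request to whichever candidate is less busy.
        return inFlight[first] <= inFlight[second] ? first : second;
    }
}
```

Calling PowerOfTwoSketch.Pick(new[] { 12, 3, 7 }, Random.Shared) looks at only two of the three destinations, so it never has to scan the whole list. That is why the policy stays cheap even when a cluster is large.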

Policy	How it picks	Good when
PowerOfTwoChoices	Two random, pick the quieter one	Default; great all-rounder
RoundRobin	Next one in line, then loop	You want even, predictable rotation
LeastRequests	The one with fewest in-flight requests	Some requests are much slower than others
Random	A random one each time	Simple and stateless
FirstAlphabetical	Always the first available by name	Testing or a clear primary order

Figure 2: Round robin hands each new request to the next destination, then loops back to the start.

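You can change the policy by editing one line in appsettings.json, with no rebuild of your API needed. YARP also watches its configuration for changes, so the gateway can usually pick up the new policy without a restart. That flexibility is one of the nicest things about keeping the proxy in configuration.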

Why health checks matter

Here is a problem. Imagine one of your three API copies crashes, but the proxy does not know. With even rotation, YARP keeps cheerfully sending roughly one out of every three requests to a dead server, so about one in three requests fails. That is bad.

The fix is health checks. A health check is YARP's way of asking "are you okay?" and only sending traffic to servers that say yes. There are two kinds, and you can use both together.

Active health checks are proactive. On a timer, YARP sends a small request to a health endpoint on every destination, such as /health. If the server answers with a success code (2xx), it is marked healthy. If it fails or times out, it is marked unhealthy and taken out of rotation until it recovers.

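Passive health checks are reactive. YARP simply watches the real responses flowing through it. If a destination starts returning lots of failures, YARP marks it unhealthy without needing a separate probe. Passive checks stay off until you enable them in the cluster's HealthCheck settings, and they need the passive health middleware. A plain MapReverseProxy() call adds that middleware for you. If you pass your own pipeline to MapReverseProxy, add UsePassiveHealthChecks() to it yourself.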

Figure 3: A destination moves between healthy and unhealthy. YARP only sends real traffic to healthy ones.

Here is what active health checks look like in configuration. We add a HealthCheck block to the cluster.

{
  "ReverseProxy": {
    "Clusters": {
      "api-cluster": {
        "LoadBalancingPolicy": "PowerOfTwoChoices",
        "HealthCheck": {
          "Active": {
            "Enabled": true,
            "Interval": "00:00:10",
            "Timeout": "00:00:05",
            "Policy": "ConsecutiveFailures",
            "Path": "/health"
          }
        },
        "Destinations": {
          "d1": { "Address": "https://localhost:5101/" },
          "d2": { "Address": "https://localhost:5102/" }
        }
      }
    }
  }
}

This says: every 10 seconds, ping /health on each destination, give it 5 seconds to answer, and use the ConsecutiveFailures policy to decide when a destination has failed enough times to be marked unhealthy. For this to work, your API needs a real /health endpoint. In ASP.NET Core you add one with a couple of lines.

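// In your API project (not the gateway), expose a health endpoint.
var builder = WebApplication.CreateBuilder(args);

// Register the built-in health check services.
builder.Services.AddHealthChecks();

var app = builder.Build();

// Answers with a success code on /health while the app is healthy.
app.MapHealthChecks("/health");

app.Run();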
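The passive checks described earlier are switched on in the same HealthCheck block of the gateway's configuration. Here is a minimal sketch. TransportFailureRate is the built-in passive policy name, and the 30 second ReactivationPeriod is only an example value chosen for this illustration, so tune it for your own system.

```json
{
  "ReverseProxy": {
    "Clusters": {
      "api-cluster": {
        "HealthCheck": {
          "Passive": {
            "Enabled": true,
            "Policy": "TransportFailureRate",
            "ReactivationPeriod": "00:00:30"
          }
        },
        "Destinations": {
          "d1": { "Address": "https://localhost:5101/" },
          "d2": { "Address": "https://localhost:5102/" }
        }
      }
    }
  }
}
```

The Active and Passive blocks can sit side by side under HealthCheck, which is how you use both kinds together.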

What happens when a server goes down

Steps

Probe: YARP pings /health on a timer
Detect: d2 fails repeatedly
Remove: d2 marked unhealthy, no traffic
Reroute: users served by d1 and d3
Recover: d2 passes probe, returns to pool

Health checks let YARP route around a sick destination automatically, then bring it back.

Running real copies to test it

To actually see load balancing, you need more than one copy of your API running. On one machine, the easy trick is to start the same API on different ports. You can do this from the terminal by setting the URL each time.

# Terminal 1
dotnet run --project MyApi --urls "https://localhost:5101"
 
# Terminal 2
dotnet run --project MyApi --urls "https://localhost:5102"
 
# Terminal 3
dotnet run --project MyApi --urls "https://localhost:5103"

Now add a line in your API that logs the port it is running on, start the gateway too, and send several requests through it. You will see the responses come from different ports as YARP shares the load. In production you would not start copies by hand. A container platform such as Kubernetes or Azure Container Apps, or .NET Aspire during development, would start and manage the copies for you, and YARP would balance across them.
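One simple way to see which copy answered is to have the API report its own port. The sketch below is a complete, tiny stand-in for MyApi. The /whoami path is a made-up name for this example, and it returns the port in the response body as well as logging it, so you can watch it from the client side too.

```csharp
var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

// Hypothetical endpoint: reports which local port handled the request.
app.MapGet("/whoami", (HttpContext context) =>
{
    var port = context.Connection.LocalPort;
    app.Logger.LogInformation("Request handled on port {Port}", port);
    return $"Served by the copy on port {port}";
});

app.Run();
```

Call /whoami through the gateway a few times, using whatever address your gateway listens on. The port in the reply should change from one request to the next as the policy moves between copies.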

A note on shared state

Horizontal scaling brings one new question you must answer: where does shared data live? If a user logs in on copy 1, and their next request lands on copy 2, copy 2 must also know they are logged in.

The golden rule is to keep your API copies stateless. Do not store session data, locks, or caches in the memory of one copy. Instead push that shared state into something all copies can reach, like a database or a Redis cache. That way it does not matter which copy serves you. Any of them can do the job, which is exactly what makes adding more copies safe and easy.

Where state lives	Safe to scale?	Why
In one server's memory	No	Other copies cannot see it
In a shared database	Yes	Every copy reads the same data
In Redis or a cache	Yes	Every copy shares one source of truth
In a sticky cookie only	Risky	Breaks if that one server dies
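Here is a minimal sketch of the "shared cache" row in practice. It assumes the Microsoft.Extensions.Caching.StackExchangeRedis package is installed and that a Redis server is reachable at localhost:6379. The /theme endpoints and the key format are invented for this example.

```csharp
using Microsoft.Extensions.Caching.Distributed;

var builder = WebApplication.CreateBuilder(args);

// Assumption: Redis is running at localhost:6379.
builder.Services.AddStackExchangeRedisCache(options =>
{
    options.Configuration = "localhost:6379";
});

var app = builder.Build();

// Hypothetical endpoint: saves a user's theme in Redis, not in this copy's memory.
// 'value' is read from the query string, for example ?value=dark.
app.MapPut("/theme/{user}", async (string user, string value, IDistributedCache cache) =>
{
    await cache.SetStringAsync($"theme:{user}", value);
    return Results.NoContent();
});

// Any copy can read the value back, because every copy talks to the same Redis.
app.MapGet("/theme/{user}", async (string user, IDistributedCache cache) =>
{
    var value = await cache.GetStringAsync($"theme:{user}");
    return value is null ? Results.NotFound() : Results.Ok(value);
});

app.Run();
```

If the PUT lands on copy 1 and the GET lands on copy 2, the user still gets their theme back. That is the whole point of keeping state outside the copies.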

Where YARP fits in your stack

You might already have a cloud load balancer at the very edge of your system. That is fine. YARP does not have to replace it. Many teams keep the cloud balancer for raw network spreading and TLS, and put YARP just behind it for smart, app-aware routing in C#. For example, YARP can route /orders to one cluster and /payments to another, add headers, or apply rate limits, all in code your team controls.
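As a rough sketch of that kind of app-aware routing, the configuration below sends /orders and /payments to two separate clusters and sets a header on requests to the orders cluster. The cluster names, ports, and the X-Forwarded-By header are all made up for illustration.

```json
{
  "ReverseProxy": {
    "Routes": {
      "orders-route": {
        "ClusterId": "orders-cluster",
        "Match": { "Path": "/orders/{**catch-all}" },
        "Transforms": [
          { "RequestHeader": "X-Forwarded-By", "Set": "gateway" }
        ]
      },
      "payments-route": {
        "ClusterId": "payments-cluster",
        "Match": { "Path": "/payments/{**catch-all}" }
      }
    },
    "Clusters": {
      "orders-cluster": {
        "Destinations": {
          "o1": { "Address": "https://localhost:5201/" },
          "o2": { "Address": "https://localhost:5202/" }
        }
      },
      "payments-cluster": {
        "Destinations": {
          "p1": { "Address": "https://localhost:5301/" }
        }
      }
    }
  }
}
```

Each cluster can then have its own load balancing policy and health checks, so a busy orders service can scale out without touching payments.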

So the choice is not always "YARP or Nginx." It is often "what is the right tool at each layer." For pure app logic and team-owned routing rules, having it in your .NET code is a real advantage.

Quick recap

Horizontal scaling means running many copies of your API and sharing traffic between them, instead of making one server bigger.
YARP is Microsoft's free reverse proxy library. You build the load balancer inside a normal ASP.NET Core app.
A cluster is a group of equal destinations (API copies). A route matches a URL and sends it to a cluster.
The load balancing policy decides which destination handles a request. PowerOfTwoChoices is a great default; RoundRobin and LeastRequests are common alternatives.
Health checks keep traffic away from broken servers. Use active checks (timed probes to /health) and passive checks (watching real responses).
Keep your API copies stateless. Push shared data into a database or Redis so any copy can serve any request.
YARP can work alongside a cloud load balancer, not only replace it. Use each tool where it fits best.
