SEMastery
ASP.NET · intermediate

Rate Limiting in ASP.NET Core: A Simple, Complete Guide

Learn rate limiting in ASP.NET Core with simple examples. Understand fixed window, sliding window, token bucket, and concurrency limiters, with diagrams, code, and real-world advice on which to pick.

11 min read · Updated November 14, 2025

A theme park ride with limited seats

Picture a popular ride at a theme park. The ride can take only a fixed number of people every minute. If everyone rushed in at once, the ride would break and nobody would have a safe, fun time. So the staff control the flow — only so many people per minute, and the rest wait in line.

A web API is just like that ride. If too many requests arrive at once — whether from a busy client, a buggy script, or an attacker — your server can slow down or crash, hurting everyone. Rate limiting is the staff member who controls the flow: it allows a fair number of requests and politely tells the rest to wait.

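When a client goes over the limit, the server does not do the work. Instead it replies with HTTP 429 Too Many Requests, which means "you are going too fast, please slow down." ASP.NET Core has had built-in rate limiting since .NET 7, and the API has stayed stable through .NET 8, 9, and 10. One default to know about: the built-in middleware rejects with 503 Service Unavailable unless you set RejectionStatusCode to 429 or set the status yourself in OnRejected.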

Let us learn how it works and which type to choose.

What rate limiting does, in one picture

Every incoming request passes through the limiter first. The limiter checks: is this client under their limit? If yes, the request continues. If no, it is rejected with 429.

Figure 1: The limiter sits in front of your API. Allowed requests pass through; over-limit requests get a 429.
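The allow-or-reject decision can be tried without a web server by using a limiter directly. This is a minimal sketch, assuming the System.Threading.RateLimiting types are available (they ship with the ASP.NET Core shared framework; a plain console project would need the NuGet package of the same name). The limit of 3 per minute is only for illustration.

```csharp
using System;
using System.Threading.RateLimiting;

// A limiter that allows 3 requests per minute and never queues.
using var limiter = new FixedWindowRateLimiter(new FixedWindowRateLimiterOptions
{
    PermitLimit = 3,
    Window = TimeSpan.FromMinutes(1),
    QueueLimit = 0,
});

int allowed = 0, rejected = 0;
for (int i = 1; i <= 5; i++)
{
    // AttemptAcquire never waits: the lease either holds a permit or it does not.
    using RateLimitLease lease = limiter.AttemptAcquire(1);
    string outcome = lease.IsAcquired ? "allowed" : "rejected (the middleware would answer 429)";
    if (lease.IsAcquired) allowed++; else rejected++;
    Console.WriteLine($"request {i}: {outcome}");
}

Console.WriteLine($"allowed={allowed}, rejected={rejected}");
```

The first three requests are allowed and the last two are rejected, which is the same check the middleware performs before your endpoint code runs.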

The four built-in limiters

ASP.NET Core gives you four ready-made limiter types. Three of them cap requests per time period; the fourth caps how many requests run at the same time.

The Four Built-In Rate Limiters

  1. Fixed Window: N requests per fixed period; resets on the clock
  2. Sliding Window: N requests over a rolling period; smoother and fairer
  3. Token Bucket: A jar of tokens refilled steadily; allows short bursts
  4. Concurrency: Only N requests may run at the same time

Three limiters control requests over time. The concurrency limiter controls how many run at once.

Fixed window

The simplest one. You allow, say, 100 requests per minute. A counter starts at 0, climbs with each request, and resets to 0 when the minute ends.

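using Microsoft.AspNetCore.RateLimiting; // AddFixedWindowLimiter and the other Add...Limiter extensions

builder.Services.AddRateLimiter(options =>
{
    // Rejected requests get 503 by default; switch to 429.
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    options.AddFixedWindowLimiter("fixed", limiter =>
    {
        limiter.PermitLimit = 100;                // 100 requests
        limiter.Window = TimeSpan.FromMinutes(1); // per 1 minute
        limiter.QueueLimit = 0;                   // no waiting queue
    });
});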

Fixed window is easy but has a fairness flaw, which we will see next.

Sliding window

The fixed window has a sneaky problem: the boundary burst. A client can send 100 requests in the last second of one window and another 100 in the first second of the next — 200 requests in about two seconds, even though the "limit" is 100 per minute.

Figure 2: The fixed-window boundary burst. 100 requests at the end of window 1 and 100 at the start of window 2 means 200 in a tiny span.
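The boundary burst is easy to reproduce with a toy counter. The sketch below is a simplified model with explicit timestamps, not the library's implementation: it assumes clock-aligned windows and passes the time in by hand so the result does not depend on real delays.

```csharp
using System;

// Simplified fixed-window counter (a model for illustration, not FixedWindowRateLimiter).
const int permitLimit = 100;
TimeSpan window = TimeSpan.FromMinutes(1);

long currentWindow = -1; // index of the window the counter belongs to
int used = 0;            // requests counted in that window

bool TryAcquire(TimeSpan now)
{
    long index = now.Ticks / window.Ticks; // which window "now" falls in
    if (index != currentWindow)
    {
        currentWindow = index; // a new window started: reset the counter
        used = 0;
    }
    if (used >= permitLimit) return false;
    used++;
    return true;
}

// Sends "count" requests at the same instant and returns how many were accepted.
int Burst(TimeSpan at, int count)
{
    int accepted = 0;
    for (int i = 0; i < count; i++)
        if (TryAcquire(at)) accepted++;
    return accepted;
}

int endOfFirst = Burst(TimeSpan.FromSeconds(59), 150);    // last second of window 1
int startOfSecond = Burst(TimeSpan.FromSeconds(61), 150); // just inside window 2

Console.WriteLine($"accepted at 59s: {endOfFirst}");
Console.WriteLine($"accepted at 61s: {startOfSecond}");
Console.WriteLine($"accepted within two seconds: {endOfFirst + startOfSecond}");
```

Each burst is capped at 100, yet 200 requests get through within two seconds because the counter resets in between.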

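The sliding window fixes this by tracking requests over a rolling period instead of resetting on the clock. It breaks the window into small segments and keeps a running total across them, so a burst just before a boundary still counts against the limit just after it. The back-to-back boundary burst is no longer possible. The count is only as precise as the segment length, though: permits come back one whole segment at a time, so more segments give a smoother, more accurate limit.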

options.AddSlidingWindowLimiter("sliding", limiter =>
{
    limiter.PermitLimit = 100;
    limiter.Window = TimeSpan.FromMinutes(1);
    limiter.SegmentsPerWindow = 6; // tracks 10-second segments
});

It uses a little more memory (it tracks per-segment counts), but the fairness is worth it for most public APIs.
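To see the per-segment counting at work, here is a simplified model of a segmented sliding window. It is a sketch for illustration, not the library's implementation: timestamps are passed in by hand, and old segments are never pruned because the run is short.

```csharp
using System;
using System.Collections.Generic;

// Simplified sliding window: 100 requests per minute, tracked in six 10-second segments.
const int permitLimit = 100;
const int segmentsPerWindow = 6;
TimeSpan segment = TimeSpan.FromSeconds(10);

var perSegment = new Dictionary<long, int>(); // segment index -> requests accepted in it

bool TryAcquire(TimeSpan now)
{
    long current = now.Ticks / segment.Ticks;      // segment that "now" falls in
    long oldest = current - segmentsPerWindow + 1; // oldest segment still inside the window

    int inWindow = 0;
    foreach (KeyValuePair<long, int> entry in perSegment)
        if (entry.Key >= oldest && entry.Key <= current) inWindow += entry.Value;

    if (inWindow >= permitLimit) return false;

    perSegment.TryGetValue(current, out int soFar);
    perSegment[current] = soFar + 1;
    return true;
}

// Sends "count" requests at the same instant and returns how many were accepted.
int Burst(TimeSpan at, int count)
{
    int accepted = 0;
    for (int i = 0; i < count; i++)
        if (TryAcquire(at)) accepted++;
    return accepted;
}

int at59 = Burst(TimeSpan.FromSeconds(59), 150);   // fills the whole budget
int at61 = Burst(TimeSpan.FromSeconds(61), 150);   // the 59s burst is still inside the window
int at110 = Burst(TimeSpan.FromSeconds(110), 150); // the segment holding the 59s burst has expired

Console.WriteLine($"59s: {at59}, 61s: {at61}, 110s: {at110}");
```

The burst at 61 seconds is fully rejected, unlike in the fixed window. The budget returns at 110 seconds, when the segment containing the first burst leaves the window, which shows the segment-sized granularity of the count.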

Token bucket

The token bucket is the most flexible. Imagine a jar that holds up to 100 tokens. Each request takes one token. Tokens are added back steadily — say, 10 every 10 seconds. If the jar is empty, requests are refused until more tokens drip in.

This shape allows a short burst (spend the whole jar at once) but enforces a steady average rate (the refill speed). It is perfect for clients that are normally quiet but occasionally need to send a quick burst.

Figure 3: The token bucket. Requests spend tokens; the bucket refills at a fixed rate. Empty bucket means wait.

options.AddTokenBucketLimiter("token", limiter =>
{
    limiter.TokenLimit = 100;                               // jar holds 100
    limiter.TokensPerPeriod = 10;                           // add 10...
    limiter.ReplenishmentPeriod = TimeSpan.FromSeconds(10); // ...every 10s
    limiter.QueueLimit = 0;
});
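The burst-then-steady behaviour can be traced with a small model of the jar. This is a sketch with the same numbers as above and hand-supplied timestamps; it is a simplified model, not the code inside TokenBucketRateLimiter, and it assumes tokens are added once per whole period.

```csharp
using System;

// Simplified token bucket: holds 100 tokens, gains 10 every 10 seconds.
const int tokenLimit = 100;
const int tokensPerPeriod = 10;
TimeSpan period = TimeSpan.FromSeconds(10);

int tokens = tokenLimit;            // the bucket starts full
TimeSpan lastRefill = TimeSpan.Zero;

bool TryAcquire(TimeSpan now)
{
    long periods = (now - lastRefill).Ticks / period.Ticks; // whole periods elapsed
    if (periods > 0)
    {
        // Refill, but never beyond the size of the jar.
        tokens = (int)Math.Min(tokenLimit, tokens + periods * tokensPerPeriod);
        lastRefill += TimeSpan.FromTicks(periods * period.Ticks);
    }
    if (tokens == 0) return false;
    tokens--;
    return true;
}

// Sends "count" requests at the same instant and returns how many were accepted.
int Burst(TimeSpan at, int count)
{
    int accepted = 0;
    for (int i = 0; i < count; i++)
        if (TryAcquire(at)) accepted++;
    return accepted;
}

int at0 = Burst(TimeSpan.Zero, 150);                // spends the whole jar
int at5 = Burst(TimeSpan.FromSeconds(5), 50);       // nothing has dripped in yet
int at10 = Burst(TimeSpan.FromSeconds(10), 50);     // one period: 10 tokens
int at40 = Burst(TimeSpan.FromSeconds(40), 50);     // three periods: 30 tokens
int afterIdle = Burst(TimeSpan.FromHours(1), 150);  // long idle: refilled, capped at 100

Console.WriteLine($"0s: {at0}, 5s: {at5}, 10s: {at10}, 40s: {at40}, 1h: {afterIdle}");
```

The jar size sets the largest possible burst (100), while the refill rate sets the long-run average (one request per second here).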

Concurrency limiter

The odd one out. It does not care about requests per minute — it cares about how many requests are running at the same time. Each request takes one slot; when it finishes, the slot is freed. This is great for protecting an expensive operation (like a heavy report) from being run by too many people at once.

options.AddConcurrencyLimiter("concurrency", limiter =>
{
    limiter.PermitLimit = 10;  // only 10 requests in flight at once
    limiter.QueueLimit = 20;   // up to 20 more may wait
});
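Slots and the waiting queue can be observed by using ConcurrencyLimiter directly. This is a small sketch with deliberately tiny limits (one slot, one queue place) so each outcome is visible; it assumes the System.Threading.RateLimiting types are referenced, as they are in an ASP.NET Core project.

```csharp
using System;
using System.Threading.RateLimiting;
using System.Threading.Tasks;

using var limiter = new ConcurrencyLimiter(new ConcurrencyLimiterOptions
{
    PermitLimit = 1, // one request in flight
    QueueLimit = 1,  // one more may wait
    QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
});

RateLimitLease first = limiter.AttemptAcquire(1);            // takes the only slot
ValueTask<RateLimitLease> waiting = limiter.AcquireAsync(1); // no slot free: joins the queue
RateLimitLease third = await limiter.AcquireAsync(1);        // queue is full: rejected at once

bool firstAcquired = first.IsAcquired;
bool waitingCompletedEarly = waiting.IsCompleted; // still queued while the first request runs
bool thirdAcquired = third.IsAcquired;

first.Dispose();                       // the first request finishes and frees its slot...
RateLimitLease second = await waiting; // ...which is handed to the queued request
bool secondAcquired = second.IsAcquired;

Console.WriteLine($"first={firstAcquired}, queued completed early={waitingCompletedEarly}, " +
                  $"third={thirdAcquired}, second after release={secondAcquired}");

second.Dispose();
third.Dispose();
```

Unlike the time-based limiters, a concurrency lease gives its permit back when it is disposed, which is why the queued request proceeds as soon as the first one ends.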

Choosing the right limiter

Here is a quick comparison to help you pick:

Limiter | Controls | Allows bursts? | Best for
Fixed window | Requests per fixed period | Yes (boundary burst) | Simple internal limits
Sliding window | Requests per rolling period | No | Most public APIs (fair default)
Token bucket | Average rate + burst size | Yes (controlled) | Clients with occasional bursts
Concurrency | Requests running at once | N/A | Protecting heavy operations

And a plain-language rule of thumb:

If you want to… | Use…
Be fair and simple for a public API | Sliding window
Allow a quick burst but a steady average | Token bucket
Limit how many heavy jobs run together | Concurrency
Just get started quickly | Fixed window

Wiring it up and limiting per user

Defining a limiter is half the job. You also turn the middleware on and apply it. The most useful real-world setup limits each client separately, so one noisy client cannot use up everyone else's budget. You do this by partitioning on something that identifies the client, like their API key or IP address.

using System.Globalization;
using System.Threading.RateLimiting; // PartitionedRateLimiter, RateLimitPartition, MetadataName

builder.Services.AddRateLimiter(options =>
{
    // A separate limit per API key (or per IP if no key).
    // Validate the key first in real code: an unchecked header can be changed to get a fresh budget.
    options.GlobalLimiter = PartitionedRateLimiter.Create<HttpContext, string>(
        context =>
        {
            var key = context.Request.Headers["X-Api-Key"].ToString();
            if (string.IsNullOrEmpty(key))
                key = context.Connection.RemoteIpAddress?.ToString() ?? "anonymous";

            return RateLimitPartition.GetSlidingWindowLimiter(key, _ =>
                new SlidingWindowRateLimiterOptions
                {
                    PermitLimit = 100,
                    Window = TimeSpan.FromMinutes(1),
                    SegmentsPerWindow = 6,
                });
        });

    // Send a friendly 429 with a Retry-After header.
    options.OnRejected = async (context, token) =>
    {
        context.HttpContext.Response.StatusCode = StatusCodes.Status429TooManyRequests;

        // Use the limiter's own hint when it provides one; fall back to the window length.
        var seconds = context.Lease.TryGetMetadata(MetadataName.RetryAfter, out var retryAfter)
            ? (int)Math.Ceiling(retryAfter.TotalSeconds)
            : 60;
        context.HttpContext.Response.Headers.RetryAfter =
            seconds.ToString(CultureInfo.InvariantCulture);

        await context.HttpContext.Response.WriteAsync(
            "Too many requests. Please try again later.", token);
    };
});

var app = builder.Build();
app.UseRateLimiter(); // turn the middleware on

💡 Always send a Retry-After header with your 429 responses. It tells well-behaved clients how long to wait before trying again, which makes their retries smarter and reduces wasted traffic.

You can also apply a named limiter to specific endpoints instead of globally:

app.MapGet("/report", () => GenerateHeavyReport())
   .RequireRateLimiting("concurrency");
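Named policies can also be attached to a whole route group, and individual endpoints can opt out. The following is a configuration sketch, assuming a minimal API project with the Web SDK's implicit usings; the routes and handlers are hypothetical placeholders.

```csharp
using Microsoft.AspNetCore.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddRateLimiter(options =>
{
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests; // default is 503

    options.AddFixedWindowLimiter("fixed", limiter =>
    {
        limiter.PermitLimit = 100;
        limiter.Window = TimeSpan.FromMinutes(1);
    });
});

var app = builder.Build();
app.UseRateLimiter();

// Every endpoint mapped on the group uses the "fixed" policy.
var api = app.MapGroup("/api").RequireRateLimiting("fixed");
api.MapGet("/orders", () => "orders");     // hypothetical endpoint
api.MapGet("/customers", () => "customers"); // hypothetical endpoint

// Health checks are often exempt so monitoring is never throttled.
app.MapGet("/health", () => "ok").DisableRateLimiting();

app.Run();
```

For MVC controllers, the [EnableRateLimiting("fixed")] and [DisableRateLimiting] attributes play the same role as these two extension methods.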

A few important details

  • It is in-memory by default. Each server keeps its own counts. If you run several instances behind a load balancer and need a shared limit, use a distributed store like Redis (through a custom limiter or a library). Otherwise a client could get "limit × number-of-servers" requests.
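  • Order matters. Put app.UseRateLimiter() early in the pipeline so you reject floods before spending work on database calls and the rest. Two constraints apply: it must come after app.UseRouting() when you use endpoint-specific policies such as RequireRateLimiting, and after app.UseAuthentication() if you partition on the signed-in user. If you partition on an API key header or IP address, it can run before authentication.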
  • Pick limits from real data. Start with a generous limit, watch your traffic, and tighten it. Limits that are too strict frustrate honest users; too loose and they do not protect you.
  • Combine with caching and auth. Rate limiting is one layer. Pair it with caching to reduce load and authentication to know who each client is.

The full request lifecycle through the limiter

It helps to picture where the limiter sits and what happens at each step when a request arrives:

A Request's Journey Through the Rate Limiter

  1. Arrive: A request hits the rate limiter middleware first
  2. Identify: Pick the partition key (API key or IP address)
  3. Check: Is this client under their limit in this window?
  4. Allow: Under limit → pass through; over limit with queue space → wait in the queue
  5. Reject: Over limit and no queue space → return 429 with a Retry-After header

The limiter runs early. It identifies the client, checks their budget, and either lets the request through or returns a 429.
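The identify-check-allow-reject steps above can be exercised without a web server by keying a partitioned limiter on a plain string. This is a sketch: the client names are hypothetical, the limit of 2 per minute is only for illustration, and it assumes the System.Threading.RateLimiting types are referenced.

```csharp
using System;
using System.Threading.RateLimiting;

// One fixed-window budget of 2 requests per minute for each distinct client key.
using PartitionedRateLimiter<string> limiter = PartitionedRateLimiter.Create<string, string>(
    clientKey => RateLimitPartition.GetFixedWindowLimiter(
        clientKey,
        _ => new FixedWindowRateLimiterOptions
        {
            PermitLimit = 2,
            Window = TimeSpan.FromMinutes(1),
            QueueLimit = 0,
        }));

// Returns true when the request would pass and false when it would get a 429.
bool Handle(string clientKey)
{
    using RateLimitLease lease = limiter.AttemptAcquire(clientKey);
    return lease.IsAcquired;
}

bool alice1 = Handle("alice"); // under the limit
bool alice2 = Handle("alice"); // uses the last permit
bool alice3 = Handle("alice"); // over the limit
bool bob1 = Handle("bob");     // separate partition, untouched by alice

Console.WriteLine($"alice: {alice1}, {alice2}, {alice3}; bob: {bob1}");
```

The third request from "alice" is rejected while "bob" still gets through, which is the isolation that partitioning is meant to provide.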

Watch your 429s and tune the limits

A rate limiter is only useful if you can see it working. Treat the number of 429 responses as a health signal:

  • Too many 429s for normal users means your limit is too strict — honest people are being blocked, and you should raise it.
  • Almost no 429s ever might mean the limit is too loose to protect you during a real spike.
  • A sudden flood of 429s from one client is a useful alarm — it could be a runaway script or an attack.

Log every rejection with the client key and the endpoint, and add a simple counter metric for 429s so you can graph it. Over a week or two of real traffic, this data tells you exactly where to set each limit. Start generous, watch the graph, and tighten gradually — guessing limits up front almost always gets them wrong.
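A counter for rejections can be built with System.Diagnostics.Metrics. The sketch below is one possible shape: the meter name, instrument name, and tag names are hypothetical, and RecordRejection stands for a call you would make from OnRejected. The MeterListener only stands in for a real metrics exporter so the example runs on its own. It tags by endpoint and policy rather than client key, on the assumption that per-client detail fits better in logs than in metric tags.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Hypothetical names; choose ones that match your own conventions.
using var meter = new Meter("MyApi.RateLimiting");
Counter<long> rejections = meter.CreateCounter<long>("ratelimit.rejections");

// Call this from OnRejected with the request path and the policy that rejected it.
void RecordRejection(string endpoint, string policy)
{
    rejections.Add(1,
        new KeyValuePair<string, object?>("endpoint", endpoint),
        new KeyValuePair<string, object?>("policy", policy));
}

// Stand-in for an exporter: sums every measurement published by the meter above.
long total = 0;
using var listener = new MeterListener();
listener.InstrumentPublished = (instrument, l) =>
{
    if (instrument.Meter.Name == "MyApi.RateLimiting") l.EnableMeasurementEvents(instrument);
};
listener.SetMeasurementEventCallback<long>((instrument, value, tags, state) => total += value);
listener.Start();

RecordRejection("/report", "concurrency");
RecordRejection("/api/orders", "fixed");
RecordRejection("/api/orders", "fixed");

Console.WriteLine($"429s recorded: {total}");
```

Graphing this counter per endpoint over a week or two gives the data the paragraph above recommends using to tune each limit.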

ℹ️ A quick way to test your limiter locally is to fire many requests in a loop with a tool like curl, k6, or bombardier, and confirm you start seeing 429 responses once you cross the limit. This proves the limiter is actually wired in before you rely on it in production.
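If you would rather stay in C#, the same check is a short loop that tallies status codes. This is a sketch: ProbeAsync and FakeEndpoint are hypothetical helpers, and the fake endpoint (200 for the first five calls, then 429) stands in for a real server so the example runs on its own. Against a running app you would pass a delegate that calls HttpClient.GetAsync on your own URL and returns the numeric status code.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Sends "attempts" requests one after another and counts each status code seen.
async Task<Dictionary<int, int>> ProbeAsync(Func<Task<int>> sendOnce, int attempts)
{
    var tally = new Dictionary<int, int>();
    for (int i = 0; i < attempts; i++)
    {
        int status = await sendOnce();
        tally.TryGetValue(status, out int seen);
        tally[status] = seen + 1;
    }
    return tally;
}

// Stand-in for an endpoint limited to 5 requests: 200 for the first five, then 429.
int served = 0;
Task<int> FakeEndpoint() => Task.FromResult(served++ < 5 ? 200 : 429);

Dictionary<int, int> result = await ProbeAsync(FakeEndpoint, 8);

foreach (KeyValuePair<int, int> entry in result)
    Console.WriteLine($"HTTP {entry.Key}: {entry.Value}");
```

Seeing the 429 count rise only after the limit is crossed is the signal that the limiter is wired in.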

When you need rate limiting

You almost always want some rate limiting on a public API. It protects you from traffic spikes, runaway scripts, brute-force login attempts, and simple denial-of-service abuse. It also keeps one heavy customer from starving everyone else.

You can relax it for trusted internal services on a private network, where every caller is known and well-behaved — though even there, a generous concurrency limit on expensive endpoints is wise insurance.

Quick recap

  • Rate limiting controls how many requests a client may make, replying with 429 Too Many Requests when they go over.
  • ASP.NET Core has four built-in limiters: fixed window, sliding window, token bucket, and concurrency.
  • Sliding window is a fair default; token bucket allows controlled bursts; concurrency caps simultaneous heavy work; fixed window is simplest but allows boundary bursts.
  • Partition per client (API key or IP) so one client cannot eat everyone's budget, and always send a Retry-After header.
  • The built-in limiters are in-memory — use Redis for a shared limit across many servers.

Like the friendly staff at a theme park ride, rate limiting keeps the flow safe and fair so every visitor gets a good experience — even on the busiest day.
