Rate Limiting in ASP.NET Core: A Simple, Complete Guide
Learn rate limiting in ASP.NET Core with simple examples. Understand fixed window, sliding window, token bucket, and concurrency limiters, with diagrams, code, and real-world advice on which to pick.
A theme park ride with limited seats
Picture a popular ride at a theme park. The ride can take only a fixed number of people every minute. If everyone rushed in at once, the ride would break and nobody would have a safe, fun time. So the staff control the flow — only so many people per minute, and the rest wait in line.
A web API is just like that ride. If too many requests arrive at once — whether from a busy client, a buggy script, or an attacker — your server can slow down or crash, hurting everyone. Rate limiting is the staff member who controls the flow: it allows a fair number of requests and politely tells the rest to wait.
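When a client goes over the limit, the server does not do the work. Instead it replies with HTTP 429 Too Many Requests, which means "you are going too fast, please slow down." Note that the ASP.NET Core middleware rejects with 503 Service Unavailable by default, so you set RejectionStatusCode to 429 (or write the status yourself in OnRejected) to get this behavior. ASP.NET Core has had built-in rate limiting since .NET 7, and the API has stayed stable through .NET 8, 9, and 10.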
Let us learn how it works and which type to choose.
What rate limiting does, in one picture
Every incoming request passes through the limiter first. The limiter checks: is this client under their limit? If yes, the request continues. If no, it is rejected with 429.
The four built-in limiters
ASP.NET Core gives you four ready-made limiter types. Three of them cap requests per time period; the fourth caps requests at the same moment.
The four built-in rate limiters at a glance:
- Fixed window: N requests per fixed period; resets on the clock
- Sliding window: N requests over a rolling period; smoother and fairer
- Token bucket: a jar of tokens refilled steadily; allows short bursts
- Concurrency: only N requests may run at the same time
Fixed window
The simplest one. You allow, say, 100 requests per minute. A counter starts at 0, climbs with each request, and resets to 0 when the minute ends.
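```csharp
using Microsoft.AspNetCore.RateLimiting;

builder.Services.AddRateLimiter(options =>
{
    // The middleware rejects with 503 by default; switch it to 429.
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    options.AddFixedWindowLimiter("fixed", limiter =>
    {
        limiter.PermitLimit = 100;                 // 100 requests
        limiter.Window = TimeSpan.FromMinutes(1);  // per 1 minute
        limiter.QueueLimit = 0;                    // no waiting queue
    });
});
```
Fixed window is easy but has a fairness flaw, which we will see next.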
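To watch the counter fill up without starting a web server, the limiter class behind the middleware can be driven by hand. This is a minimal console sketch, assuming the System.Threading.RateLimiting package is referenced; the limit of 3 is chosen only to keep the output short.

```csharp
using System;
using System.Threading.RateLimiting;

// A tiny limit (assumed for the demo) so the rejection shows up quickly.
using var limiter = new FixedWindowRateLimiter(new FixedWindowRateLimiterOptions
{
    PermitLimit = 3,                    // 3 requests
    Window = TimeSpan.FromMinutes(1),   // per 1 minute
    QueueLimit = 0,                     // no waiting queue
    AutoReplenishment = true            // the limiter resets itself on a timer
});

int allowed = 0, rejected = 0;
for (int i = 1; i <= 5; i++)
{
    // AttemptAcquire never waits: it either takes a permit or fails at once.
    using RateLimitLease lease = limiter.AttemptAcquire();
    if (lease.IsAcquired) allowed++; else rejected++;
    Console.WriteLine($"request {i}: {(lease.IsAcquired ? "allowed" : "rejected")}");
}

Console.WriteLine($"allowed={allowed}, rejected={rejected}"); // allowed=3, rejected=2
```

The first three attempts take the window's permits and the last two are refused until the window resets.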
Sliding window
The fixed window has a sneaky problem: the boundary burst. A client can send 100 requests in the last second of one window and another 100 in the first second of the next — 200 requests in about two seconds, even though the "limit" is 100 per minute.
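The sliding window fixes this by tracking requests over a rolling period instead of resetting on the clock. It breaks the window into small segments and keeps a running total, and the permits used in a segment only come back once that segment ages out of the window. A burst at the boundary therefore still counts against the limit, so the back-to-back double burst from the fixed window is no longer possible. More segments give finer-grained tracking.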
```csharp
options.AddSlidingWindowLimiter("sliding", limiter =>
{
    limiter.PermitLimit = 100;
    limiter.Window = TimeSpan.FromMinutes(1);
    limiter.SegmentsPerWindow = 6; // tracks 10-second segments
});
```
It uses a little more memory (it tracks per-segment counts), but the fairness is worth it for most public APIs.
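The boundary burst is easier to believe with numbers. The sketch below is a simplified hand-written model of the two counting schemes, not the library's implementation: it replays two bursts of 100 requests, at second 59 and second 61, against a limit of 100 per 60 seconds. The function names and the burst timings are made up for the illustration.

```csharp
using System;
using System.Collections.Generic;

const int Limit = 100;
const int WindowSeconds = 60;
const int Segments = 6;

// Two bursts of 100 requests, just before and just after the minute boundary.
var bursts = new[] { (Second: 59, Count: 100), (Second: 61, Count: 100) };

int fixedAccepted = FixedWindowAccepted(bursts);
int slidingAccepted = SlidingWindowAccepted(bursts);

Console.WriteLine($"fixed window accepted:   {fixedAccepted}");   // 200
Console.WriteLine($"sliding window accepted: {slidingAccepted}"); // 100

// Fixed window model: one counter per clock-aligned window.
int FixedWindowAccepted(IEnumerable<(int Second, int Count)> input)
{
    var used = new Dictionary<int, int>();   // window index -> requests used
    int accepted = 0;
    foreach (var (second, count) in input)
    {
        int window = second / WindowSeconds;
        used.TryGetValue(window, out int soFar);
        int take = Math.Min(count, Limit - soFar);
        used[window] = soFar + take;
        accepted += take;
    }
    return accepted;
}

// Sliding window model: one counter per segment, summed over the last
// `Segments` segments (the current one included).
int SlidingWindowAccepted(IEnumerable<(int Second, int Count)> input)
{
    int segmentSeconds = WindowSeconds / Segments;
    var used = new Dictionary<int, int>();   // segment index -> requests used
    int accepted = 0;
    foreach (var (second, count) in input)
    {
        int segment = second / segmentSeconds;
        int inWindow = 0;
        for (int s = segment - Segments + 1; s <= segment; s++)
        {
            used.TryGetValue(s, out int n);
            inWindow += n;
        }
        int take = Math.Max(0, Math.Min(count, Limit - inWindow));
        used.TryGetValue(segment, out int current);
        used[segment] = current + take;
        accepted += take;
    }
    return accepted;
}
```

In the fixed model the second burst lands in a fresh window and is accepted in full. In the segmented model the first burst is still inside the rolling window at second 61, so the second burst is refused.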
Token bucket
The token bucket is the most flexible. Imagine a jar that holds up to 100 tokens. Each request takes one token. Tokens are added back steadily — say, 10 every 10 seconds. If the jar is empty, requests are refused until more tokens drip in.
This shape allows a short burst (spend the whole jar at once) but enforces a steady average rate (the refill speed). It is perfect for clients that are normally quiet but occasionally need to send a quick burst.
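The burst-then-refuse behavior can be seen by driving a token bucket by hand. This is a minimal console sketch, assuming the System.Threading.RateLimiting package is referenced; the jar size of 5 is chosen only to keep the demo short.

```csharp
using System;
using System.Threading.RateLimiting;

using var bucket = new TokenBucketRateLimiter(new TokenBucketRateLimiterOptions
{
    TokenLimit = 5,                                  // jar holds 5
    TokensPerPeriod = 1,                             // add 1...
    ReplenishmentPeriod = TimeSpan.FromSeconds(10),  // ...every 10s
    QueueLimit = 0,
    AutoReplenishment = true
});

// Spend the whole jar at once: this is the allowed burst.
int burstAllowed = 0;
for (int i = 0; i < 5; i++)
{
    if (bucket.AttemptAcquire().IsAcquired) burstAllowed++;
}

// The jar is now empty, so the next request is refused until a token drips in.
bool sixthAllowed = bucket.AttemptAcquire().IsAcquired;

Console.WriteLine($"burst allowed: {burstAllowed}");   // 5
Console.WriteLine($"sixth allowed: {sixthAllowed}");   // False
```

In an ASP.NET Core app the same options are registered through the middleware: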
```csharp
options.AddTokenBucketLimiter("token", limiter =>
{
    limiter.TokenLimit = 100;                                // jar holds 100
    limiter.TokensPerPeriod = 10;                            // add 10...
    limiter.ReplenishmentPeriod = TimeSpan.FromSeconds(10);  // ...every 10s
    limiter.QueueLimit = 0;
});
```
Concurrency limiter
The odd one out. It does not care about requests per minute — it cares about how many requests are running at the same time. Each request takes one slot; when it finishes, the slot is freed. This is great for protecting an expensive operation (like a heavy report) from being run by too many people at once.
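The take-a-slot, free-a-slot cycle looks like this when the limiter is used directly. This is a minimal console sketch, assuming the System.Threading.RateLimiting package is referenced; two slots are used only to keep it short.

```csharp
using System;
using System.Threading.RateLimiting;

using var slots = new ConcurrencyLimiter(new ConcurrencyLimiterOptions
{
    PermitLimit = 2,   // only 2 requests in flight at once
    QueueLimit = 0     // nobody waits in this demo
});

RateLimitLease first = slots.AttemptAcquire();
RateLimitLease second = slots.AttemptAcquire();
RateLimitLease third = slots.AttemptAcquire();    // both slots are taken
bool thirdAllowed = third.IsAcquired;

// Disposing a lease is how a finished request hands its slot back.
first.Dispose();
RateLimitLease retry = slots.AttemptAcquire();
bool retryAllowed = retry.IsAcquired;

Console.WriteLine($"third allowed: {thirdAllowed}");   // False
Console.WriteLine($"retry allowed: {retryAllowed}");   // True
```

Unlike the time-based limiters, nothing here depends on a clock: a slot comes back only when a running request finishes. In an ASP.NET Core app the registration looks like this: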
```csharp
options.AddConcurrencyLimiter("concurrency", limiter =>
{
    limiter.PermitLimit = 10; // only 10 requests in flight at once
    limiter.QueueLimit = 20;  // up to 20 more may wait
});
```
Choosing the right limiter
Here is a quick comparison to help you pick:
| Limiter | Controls | Allows bursts? | Best for |
|---|---|---|---|
| Fixed window | Requests per fixed period | Yes (boundary burst) | Simple internal limits |
| Sliding window | Requests per rolling period | No | Most public APIs (fair default) |
| Token bucket | Average rate + burst size | Yes (controlled) | Clients with occasional bursts |
| Concurrency | Requests running at once | N/A | Protecting heavy operations |
And a plain-language rule of thumb:
| If you want to… | Use… |
|---|---|
| Be fair and simple for a public API | Sliding window |
| Allow a quick burst but a steady average | Token bucket |
| Limit how many heavy jobs run together | Concurrency |
| Just get started quickly | Fixed window |
Wiring it up and limiting per user
Defining a limiter is half the job. You also turn the middleware on and apply it. The most useful real-world setup limits each client separately, so one noisy client cannot use up everyone else's budget. You do this by partitioning on something that identifies the client, like their API key or IP address.
```csharp
using System.Threading.RateLimiting;

builder.Services.AddRateLimiter(options =>
{
    // A separate limit per API key (or per IP if no key).
    options.GlobalLimiter = PartitionedRateLimiter.Create<HttpContext, string>(
        context =>
        {
            var key = context.Request.Headers["X-Api-Key"].ToString();
            if (string.IsNullOrEmpty(key))
            {
                key = context.Connection.RemoteIpAddress?.ToString() ?? "anonymous";
            }

            return RateLimitPartition.GetSlidingWindowLimiter(key, _ =>
                new SlidingWindowRateLimiterOptions
                {
                    PermitLimit = 100,
                    Window = TimeSpan.FromMinutes(1),
                    SegmentsPerWindow = 6,
                });
        });

    // Send a friendly 429 with a Retry-After header.
    options.OnRejected = async (context, token) =>
    {
        context.HttpContext.Response.StatusCode = StatusCodes.Status429TooManyRequests;
        context.HttpContext.Response.Headers.RetryAfter = "60";
        await context.HttpContext.Response.WriteAsync(
            "Too many requests. Please try again later.", token);
    };
});

var app = builder.Build();
app.UseRateLimiter(); // turn the middleware on
```
Always send a Retry-After header with your 429 responses. It tells well-behaved clients exactly how long to wait before trying again, which makes their retries smarter and reduces wasted traffic.
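A hard-coded wait is a reasonable start, but a rejected lease can carry a suggested wait of its own. The sketch below reads it through TryGetMetadata and falls back to a fixed value when it is absent. It assumes a web project with implicit usings; the 60-second fallback, the "fixed" policy name, and the "/" endpoint are placeholders, and not every limiter type is guaranteed to populate this metadata.

```csharp
using System.Globalization;
using System.Threading.RateLimiting;
using Microsoft.AspNetCore.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddRateLimiter(options =>
{
    options.AddFixedWindowLimiter("fixed", limiter =>
    {
        limiter.PermitLimit = 100;
        limiter.Window = TimeSpan.FromMinutes(1);
    });

    options.OnRejected = async (context, token) =>
    {
        var response = context.HttpContext.Response;
        response.StatusCode = StatusCodes.Status429TooManyRequests;

        // Use the limiter's own hint when the lease has one;
        // otherwise fall back to an assumed 60 seconds.
        int seconds = context.Lease.TryGetMetadata(MetadataName.RetryAfter, out TimeSpan retryAfter)
            ? (int)Math.Ceiling(retryAfter.TotalSeconds)
            : 60;
        response.Headers.RetryAfter = seconds.ToString(CultureInfo.InvariantCulture);

        await response.WriteAsync("Too many requests. Please try again later.", token);
    };
});

var app = builder.Build();
app.UseRateLimiter();
app.MapGet("/", () => "ok").RequireRateLimiting("fixed");
app.Run();
```

With this in place, a client rejected 20 seconds before the window resets is told to wait about 20 seconds rather than a full minute.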
You can also apply a named limiter to specific endpoints instead of globally:
```csharp
app.MapGet("/report", () => GenerateHeavyReport())
   .RequireRateLimiting("concurrency");
```
A few important details
- It is in-memory by default. Each server keeps its own counts. If you run several instances behind a load balancer and need a shared limit, use a distributed store like Redis (through a custom limiter or a library). Otherwise a client could get "limit × number-of-servers" requests.
- Order matters. Put app.UseRateLimiter() early in the pipeline so you reject floods before spending work on authentication, database calls, and the rest. If you call app.UseRouting() explicitly and apply limiters to specific endpoints, UseRateLimiter() must come after UseRouting().
- Pick limits from real data. Start with a generous limit, watch your traffic, and tighten it. Limits that are too strict frustrate honest users; too loose and they do not protect you.
- Combine with caching and auth. Rate limiting is one layer. Pair it with caching to reduce load and authentication to know who each client is.
The full request lifecycle through the limiter
It helps to picture where the limiter sits and what happens at each step when a request arrives:
A request's journey through the rate limiter:
- Arrive: the request hits the rate limiter middleware first.
- Identify: the limiter picks the partition key, such as the API key or the IP address.
- Check: is this client under their limit in this window?
- Allow: if under the limit, the request passes through. If a queue is configured, an over-limit request may wait there for a permit.
- Reject: if over the limit with no room to wait, the server returns 429 with a Retry-After header.
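The identify, check, allow-or-reject steps above can be sketched without a web server by using a partitioned limiter keyed on a plain string. This is a minimal console sketch, assuming the System.Threading.RateLimiting package is referenced. It swaps in a fixed window with a limit of 2 to keep the output short, and the client names and the Handle helper are made up for the illustration.

```csharp
using System;
using System.Threading.RateLimiting;

// Identify: each distinct client key gets its own limiter (its own budget).
using PartitionedRateLimiter<string> limiter = PartitionedRateLimiter.Create<string, string>(
    clientKey => RateLimitPartition.GetFixedWindowLimiter(clientKey, _ =>
        new FixedWindowRateLimiterOptions
        {
            PermitLimit = 2,
            Window = TimeSpan.FromMinutes(1),
            QueueLimit = 0
        }));

// Check, then allow or reject.
string Handle(string clientKey)
{
    using RateLimitLease lease = limiter.AttemptAcquire(clientKey);
    return lease.IsAcquired ? "200 OK" : "429 Too Many Requests";
}

var results = new[]
{
    Handle("alice"),   // 200 OK
    Handle("alice"),   // 200 OK
    Handle("alice"),   // 429 Too Many Requests (alice's budget is spent)
    Handle("bob")      // 200 OK (bob has his own budget)
};

foreach (string result in results)
{
    Console.WriteLine(result);
}
```

The last line is the point of partitioning: alice running out of permits has no effect on bob.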
Watch your 429s and tune the limits
A rate limiter is only useful if you can see it working. Treat the number of 429 responses as a health signal:
- Too many 429s for normal users means your limit is too strict — honest people are being blocked, and you should raise it.
- Almost no 429s ever might mean the limit is too loose to protect you during a real spike.
- A sudden flood of 429s from one client is a useful alarm — it could be a runaway script or an attack.
Log every rejection with the client key and the endpoint, and add a simple counter metric for 429s so you can graph it. Over a week or two of real traffic, this data tells you exactly where to set each limit. Start generous, watch the graph, and tighten gradually — guessing limits up front almost always gets them wrong.
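One way to get both the log line and the counter is to record them inside OnRejected. This is a sketch under assumptions: it targets a web project with implicit usings, the meter name, counter name, and logger category are invented for the example, and it identifies the client by IP address on the view that raw API keys should stay out of logs.

```csharp
using System.Diagnostics.Metrics;
using Microsoft.AspNetCore.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

// Hypothetical names; pick ones that match your own metric conventions.
var meter = new Meter("MyApi.RateLimiting");
var rejections = meter.CreateCounter<long>("rate_limit_rejections");

builder.Services.AddRateLimiter(options =>
{
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    options.AddSlidingWindowLimiter("sliding", limiter =>
    {
        limiter.PermitLimit = 100;
        limiter.Window = TimeSpan.FromMinutes(1);
        limiter.SegmentsPerWindow = 6;
    });

    options.OnRejected = (context, _) =>
    {
        var http = context.HttpContext;
        string path = http.Request.Path;
        string client = http.Connection.RemoteIpAddress?.ToString() ?? "anonymous";

        // One increment per rejection, tagged with the endpoint so it can be graphed.
        rejections.Add(1, new KeyValuePair<string, object?>("endpoint", path));

        var logger = http.RequestServices
            .GetRequiredService<ILoggerFactory>()
            .CreateLogger("RateLimiting");
        logger.LogWarning("Rate limit hit: client {Client}, endpoint {Endpoint}", client, path);

        return ValueTask.CompletedTask;
    };
});

var app = builder.Build();
app.UseRateLimiter();
app.MapGet("/", () => "ok").RequireRateLimiting("sliding");
app.Run();
```

The counter can then be read by whatever metrics tooling you already use, for example dotnet-counters or an OpenTelemetry exporter.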
A quick way to test your limiter locally is to fire many requests in a loop with a tool like curl, k6, or bombardier, and confirm you start seeing 429 responses once you cross the limit. This proves the limiter is actually wired in before you rely on it in production.
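If you would rather stay in C# than reach for a separate tool, a small console probe does the same job. This is a minimal sketch: ProbeAsync is a made-up helper name, the target URL comes from the first command-line argument, and 150 requests is an arbitrary number picked to exceed a limit of 100 per minute.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Sends `total` requests one after another and tallies the outcomes.
// The sender is passed in so the loop can target any endpoint.
static async Task<(int Allowed, int Rejected)> ProbeAsync(
    Func<Task<HttpStatusCode>> send, int total)
{
    int allowed = 0, rejected = 0;
    for (int i = 0; i < total; i++)
    {
        HttpStatusCode status = await send();
        if (status == HttpStatusCode.TooManyRequests) rejected++;
        else allowed++;
    }
    return (allowed, rejected);
}

// Runs only when a URL is supplied, e.g. the local address your app listens on.
if (args.Length == 1)
{
    using var client = new HttpClient();
    string url = args[0];

    var (ok, limited) = await ProbeAsync(async () =>
    {
        using HttpResponseMessage response = await client.GetAsync(url);
        return response.StatusCode;
    }, total: 150);

    Console.WriteLine($"allowed: {ok}, rejected with 429: {limited}");
}
```

If the rejected count stays at zero after you cross the configured limit, the middleware or the endpoint policy is probably not wired in.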
When you need rate limiting
You almost always want some rate limiting on a public API. It protects you from traffic spikes, runaway scripts, brute-force login attempts, and simple denial-of-service abuse. It also keeps one heavy customer from starving everyone else.
You can relax it for trusted internal services on a private network, where every caller is known and well-behaved — though even there, a generous concurrency limit on expensive endpoints is wise insurance.
Quick recap
- Rate limiting controls how many requests a client may make, replying with 429 Too Many Requests when they go over.
- ASP.NET Core has four built-in limiters: fixed window, sliding window, token bucket, and concurrency.
- Sliding window is a fair default; token bucket allows controlled bursts; concurrency caps simultaneous heavy work; fixed window is simplest but allows boundary bursts.
- Partition per client (API key or IP) so one client cannot eat everyone's budget, and always send a Retry-After header.
- The built-in limiters are in-memory — use Redis for a shared limit across many servers.
Like the friendly staff at a theme park ride, rate limiting keeps the flow safe and fair so every visitor gets a good experience — even on the busiest day.