SEMastery

How to Scale Long-Running API Requests in .NET: A Beginner's Guide

Learn how to handle slow, long-running API requests in .NET using the 202 Accepted pattern, background services, channels, and status polling.

11 min read | Updated April 26, 2026

Imagine you walk into a busy sweet shop to order a big box of fresh jalebis for a wedding. The shopkeeper cannot make 500 jalebis while you stand at the counter. If he tried, the whole queue behind you would be stuck for an hour. Nobody could buy even a single laddu.

So a smart shopkeeper does something else. He takes your order, gives you a token number, and says "come back in an hour, your box will be ready." You walk away free. He keeps serving other customers at the counter. In the back kitchen, his team slowly fries your jalebis. When you return and show your token, your hot box is waiting.

A long-running API request is exactly this problem. Some work is just too slow to finish while the caller waits. In this guide you will learn how to build the "token system" for your API in .NET 10, so slow jobs never block your server. We will go step by step, in plain language.

What is a long-running request?

Most API calls are fast. You ask for a user's profile, the server reads one row, and replies in a few milliseconds. Easy.

But some requests ask for slow work:

  • Generating a 200-page PDF report.
  • Resizing or converting a large video.
  • Sending 50,000 emails.
  • Calling a slow third-party service many times.

If your API tries to do this slow work while the caller waits, three bad things happen.

  1. The caller's connection may time out before you finish.
  2. The request ties up a connection, memory, and often a thread for the whole time, so the server has less capacity for others.
  3. If the caller retries, you might do the same heavy work twice.

Here is the difference between a fast request and a slow one.

A fast request replies right away. A slow one blocks everything if done the wrong way.

The goal is simple. We want the slow path to behave like the fast path: reply quickly, then do the heavy work somewhere else.

The big idea: accept now, work later

The trick is the same as the sweet shop. Do not do the slow work inside the request. Instead:

  1. Accept the request and save what needs to be done.
  2. Return a job id and an HTTP 202 Accepted status right away.
  3. A background worker picks up the job and does the slow work.
  4. The caller polls a status endpoint to check progress.
  5. When the job is done, the caller fetches the result.

This is called the Asynchronous Request-Reply pattern. Microsoft documents it in the Azure Architecture Center. Let us see the whole flow.

The accept-now, work-later pattern with a job id and status polling.

Notice the API never waits for the slow work. It hands the job to a queue and returns a token (the job id). This keeps your server fast and free.

The job lifecycle

  1. Accepted: the API saves the job and returns 202 + id.
  2. Queued: the job waits in the queue.
  3. Running: the worker does the slow work.
  4. Completed: the result is ready to fetch.

Every long-running job moves through these states.

Step 1: Accept the request and return 202

Let us start with the endpoint the caller hits. It should be tiny. It only validates the input, creates a job id, drops the job on a queue, and returns.

app.MapPost("/reports", async (
    ReportRequest request,
    IBackgroundQueue queue,
    IJobStore jobs) =>
{
    // 1. Make a job id and remember it.
    var jobId = Guid.NewGuid();
    await jobs.CreateAsync(jobId, JobStatus.Queued);
 
    // 2. Put the work on the queue. Do NOT run it here.
    await queue.EnqueueAsync(new ReportJob(jobId, request.CustomerId));
 
    // 3. Tell the caller where to check progress.
    return Results.Accepted($"/jobs/{jobId}", new { id = jobId });
});

Three small things happen here. We create a job record, we enqueue the work, and we return 202 Accepted. The Results.Accepted helper also sets a Location header pointing at the status endpoint, so the caller knows exactly where to look next.

This endpoint finishes in a few milliseconds, even though the real work might take two minutes.
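
The endpoint above relies on a few types that this guide never spells out. Here is a minimal sketch of what they might look like. Everything in it is an assumption made for illustration: the `int` type of `CustomerId`, the `JobRecord` shape, and the `InMemoryJobStore` class are invented here to match the calls used in this guide, not taken from a real library.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical shapes for the types used by the endpoints in this guide.
public enum JobStatus { Queued, Running, Completed, Failed }

public sealed record ReportRequest(int CustomerId);
public sealed record ReportJob(Guid Id, int CustomerId);
public sealed record JobRecord(
    Guid Id, JobStatus Status, string? ResultUrl = null, string? Error = null);

public interface IJobStore
{
    Task CreateAsync(Guid id, JobStatus status);
    Task UpdateAsync(Guid id, JobStatus status);
    Task CompleteAsync(Guid id, string resultUrl);
    Task FailAsync(Guid id, string error);
    Task<JobRecord?> GetAsync(Guid id);
}

// In-memory store: fine for a demo, but every record is lost on restart.
public sealed class InMemoryJobStore : IJobStore
{
    private readonly ConcurrentDictionary<Guid, JobRecord> _jobs = new();

    public Task CreateAsync(Guid id, JobStatus status)
    {
        _jobs[id] = new JobRecord(id, status);
        return Task.CompletedTask;
    }

    public Task UpdateAsync(Guid id, JobStatus status) =>
        Mutate(id, job => job with { Status = status });

    public Task CompleteAsync(Guid id, string resultUrl) =>
        Mutate(id, job => job with { Status = JobStatus.Completed, ResultUrl = resultUrl });

    public Task FailAsync(Guid id, string error) =>
        Mutate(id, job => job with { Status = JobStatus.Failed, Error = error });

    public Task<JobRecord?> GetAsync(Guid id) =>
        Task.FromResult<JobRecord?>(_jobs.TryGetValue(id, out var job) ? job : null);

    // Atomically swaps in the changed record; unknown ids are ignored.
    private Task Mutate(Guid id, Func<JobRecord, JobRecord> change)
    {
        while (_jobs.TryGetValue(id, out var current) &&
               !_jobs.TryUpdate(id, change(current), current))
        {
            // Another thread changed the record first, so read it again and retry.
        }
        return Task.CompletedTask;
    }
}
```

If you use a store like this, it also needs a registration, for example `builder.Services.AddSingleton<IJobStore, InMemoryJobStore>();`. A singleton makes sense here because the dictionary must be shared by the endpoint and the worker.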

Step 2: A queue inside your app with Channels

We need a place to hold jobs between "accepted" and "running". For work that lives inside a single app, .NET gives us a perfect tool: System.Threading.Channels. A channel is like a safe pipe. One side writes jobs in, the other side reads them out. It is async-friendly and fast.

We will use a bounded channel. Bounded means it has a maximum size. This gives us backpressure: if the queue is full, new work waits or is rejected instead of piling up forever and crashing the server.

public sealed class BackgroundQueue : IBackgroundQueue
{
    private readonly Channel<ReportJob> _channel =
        Channel.CreateBounded<ReportJob>(new BoundedChannelOptions(100)
        {
            FullMode = BoundedChannelFullMode.Wait
        });
 
    public async ValueTask EnqueueAsync(ReportJob job) =>
        await _channel.Writer.WriteAsync(job);
 
    public IAsyncEnumerable<ReportJob> ReadAllAsync(CancellationToken ct) =>
        _channel.Reader.ReadAllAsync(ct);
}

The number 100 is the capacity. With FullMode = Wait, if 100 jobs are already waiting, the next enqueue pauses until there is room. This protects your memory during a traffic spike.
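
With FullMode = Wait, a full queue makes `WriteAsync` pause, which also means the HTTP request pauses. If you would rather reject new work right away, `TryWrite` is one option: it never waits and returns false when the channel is full. The small console sketch below shows that behavior, using a capacity of 2 (an arbitrary number picked so the full case is easy to see).

```csharp
using System;
using System.Threading.Channels;

// A tiny bounded channel (capacity 2) so the "full" case is easy to see.
var channel = Channel.CreateBounded<string>(new BoundedChannelOptions(2)
{
    FullMode = BoundedChannelFullMode.Wait
});

// TryWrite never waits: it returns false when the channel is full.
bool first = channel.Writer.TryWrite("job-1");
bool second = channel.Writer.TryWrite("job-2");
bool third = channel.Writer.TryWrite("job-3");   // no room left

Console.WriteLine($"{first} {second} {third}");  // True True False

// Reading one item frees a slot, so the next write succeeds again.
channel.Reader.TryRead(out _);
bool fourth = channel.Writer.TryWrite("job-4");
Console.WriteLine(fourth);                       // True
```

In an endpoint, a false result could be turned into a "try again later" response such as 503 or 429. Which status code fits best is a design choice for your API.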

Why bounded queues protect you

  1. Spike: many requests arrive at once.
  2. Bounded Queue: only N jobs are held, the rest wait.
  3. Steady Work: the worker drains at a safe rate.
  4. Safe Server: memory stays under control.

A bounded channel turns a dangerous flood into a calm, steady line.

Step 3: A background worker that does the slow work

Now we need something that reads jobs from the channel and actually does them. In ASP.NET Core this is a BackgroundService, a class that runs quietly in the background for the whole life of the app.

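public sealed class ReportWorker(
    IBackgroundQueue queue,
    IJobStore jobs,
    IServiceProvider services) : BackgroundService
{
    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        // ReadAllAsync waits for jobs and stops when stoppingToken is cancelled.
        await foreach (var job in queue.ReadAllAsync(stoppingToken))
        {
            await jobs.UpdateAsync(job.Id, JobStatus.Running);
            try
            {
                // A fresh scope per job lets us resolve scoped services safely.
                using var scope = services.CreateScope();
                var maker = scope.ServiceProvider.GetRequiredService<IReportMaker>();
                var url = await maker.BuildAsync(job, stoppingToken);
 
                await jobs.CompleteAsync(job.Id, url);
            }
            catch (OperationCanceledException) when (stoppingToken.IsCancellationRequested)
            {
                // The app is shutting down: leave the loop instead of recording a failure.
                break;
            }
            catch (Exception ex)
            {
                // One bad job must not stop the worker, so record it and move on.
                await jobs.FailAsync(job.Id, ex.Message);
            }
        }
    }
}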

The worker loops forever. Each time a job arrives, it marks the job Running, does the heavy work, and then stores the result link. If something breaks, it marks the job Failed with a message. Notice we create a scope so the worker can safely use scoped services like a database context.

Register both pieces in Program.cs:

builder.Services.AddSingleton<IBackgroundQueue, BackgroundQueue>();
builder.Services.AddHostedService<ReportWorker>();

Step 4: Let the caller check progress

Returning 202 is only half the story. The caller now holds a job id but does not know when the work is done. So we add a status endpoint.

app.MapGet("/jobs/{id:guid}", async (Guid id, IJobStore jobs) =>
{
    var job = await jobs.GetAsync(id);
    if (job is null)
        return Results.NotFound();
 
    return job.Status switch
    {
        JobStatus.Completed => Results.Redirect(job.ResultUrl!),  // 302 to result
        JobStatus.Failed    => Results.Problem(job.Error),
        _                   => Results.Ok(new { id, status = job.Status.ToString() })
    };
});

The caller polls GET /jobs/{id} every few seconds. While the job is Queued or Running, it gets 200 OK with the status. When the job is Completed, it gets redirected to the finished result. Clean and simple.
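
On the caller's side, polling is just a loop with a pause in it. The helper below is a rough sketch of that loop. `Poller` and `WaitForAsync` are hypothetical names, not part of .NET, and the doubling delay is one possible choice for being gentle on the server.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class Poller
{
    // Calls checkAsync until it returns a non-null result, pausing between tries.
    // The pause doubles each time, up to maxDelay, so a slow job is not hammered.
    public static async Task<T> WaitForAsync<T>(
        Func<CancellationToken, Task<T?>> checkAsync,
        TimeSpan initialDelay,
        TimeSpan maxDelay,
        CancellationToken ct = default) where T : class
    {
        var delay = initialDelay;
        while (true)
        {
            var result = await checkAsync(ct);
            if (result is not null)
                return result;

            await Task.Delay(delay, ct);

            var next = TimeSpan.FromTicks(delay.Ticks * 2);
            delay = next < maxDelay ? next : maxDelay;
        }
    }
}
```

With HttpClient, `checkAsync` might send GET /jobs/{id} and return null while the status is still Queued or Running. Pass a CancellationToken with a deadline so the caller does not poll forever.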

Here is the state machine the job follows on the server.

The job status states and how a job moves between them.
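
If you want the server to enforce these moves, a small guard function is one way to do it. This sketch assumes the transitions implied by the code in this guide (Queued to Running, then Running to Completed or Failed). `JobTransitions` and `CanMove` are hypothetical names.

```csharp
using System;

public enum JobStatus { Queued, Running, Completed, Failed }

public static class JobTransitions
{
    // Allowed moves: Queued -> Running, then Running -> Completed or Failed.
    // Completed and Failed are final states in this sketch.
    public static bool CanMove(JobStatus from, JobStatus to) => (from, to) switch
    {
        (JobStatus.Queued, JobStatus.Running) => true,
        (JobStatus.Running, JobStatus.Completed) => true,
        (JobStatus.Running, JobStatus.Failed) => true,
        _ => false
    };
}
```

A job store could call `CanMove` before every update and ignore or log any move that is not allowed, for example a late "Running" update arriving after the job is already Completed.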

Polling vs pushing: how callers get updates

Polling is the easiest way for a caller to learn that a job is done, but it is not the only way. Here is how the common options compare.

| Method | How it works | Best for |
| --- | --- | --- |
| Polling | Caller asks GET /jobs/{id} on a timer | Simple clients, public APIs |
| Webhook | Server calls the caller's URL when done | Server-to-server systems |
| SignalR / WebSocket | Server pushes a live message | Dashboards, live progress bars |

Start with polling. It works everywhere and needs no special setup. Move to webhooks or SignalR only when you truly need instant updates. Keep your first version boring and reliable.

When in-process is not enough

The channel-and-worker setup is great, but it has one weakness. If your server restarts, every job sitting in memory is lost. For a "generate report" feature that may be fine, because the caller can just ask again. For "charge the customer" it is not fine at all.

When jobs must survive restarts or run across many machines, you need a durable queue. This means the jobs are stored in a database, Redis, or a cloud queue, not just in memory. A popular .NET choice is Hangfire, which stores jobs in SQL Server, or in PostgreSQL or Redis through separate storage packages, and can run workers on several servers at once.

Scaling out: many API servers and many workers share one durable job store.

With a shared store, you can add more API servers to accept work and more workers to process it, each scaling on its own. This is how you handle real load.

A quick honesty note about licensing, because it matters when you choose tools. Some well-known libraries changed their terms recently. MediatR and MassTransit have moved to commercial licensing for newer versions, so check the license before adding them to a company project. Hangfire has a free open-source core with a paid Pro edition for advanced features. Always read the license page first.

Here is a simple way to compare your options.

| Option | Survives restart? | Scales across machines? | Setup effort |
| --- | --- | --- | --- |
| Channel + BackgroundService | No | No | Very low |
| Hangfire + database | Yes | Yes | Medium |
| Cloud queue + workers | Yes | Yes | Medium to high |

Three rules that keep you safe

As your jobs grow, three habits will save you a lot of pain.

Make jobs idempotent. Idempotent is a big word for a simple idea: running the same job twice should not cause double trouble. Use the job id as an idempotency key. Before charging a customer, check "did I already finish job 42?" If yes, skip it. Networks retry, workers crash and restart, and the same job can arrive twice. Safe jobs survive that.
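
To make the "did I already finish job 42?" check concrete, here is a minimal sketch. `IdempotentRunner` is a hypothetical helper, and it keeps its memory in a dictionary only to stay short. A real system would keep these ids in a durable store so the check still works after a restart.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public sealed class IdempotentRunner
{
    // Job ids that have been claimed: either finished or running right now.
    private readonly ConcurrentDictionary<Guid, bool> _claimed = new();

    // Runs work only if this job id has not been claimed before.
    // Returns true if the work ran, false if it was skipped as a duplicate.
    public async Task<bool> RunOnceAsync(Guid jobId, Func<Task> work)
    {
        // TryAdd is atomic, so only one caller can claim a given id.
        if (!_claimed.TryAdd(jobId, true))
            return false;

        try
        {
            await work();
            return true;
        }
        catch
        {
            // The work failed, so release the claim and allow a later retry.
            _claimed.TryRemove(jobId, out _);
            throw;
        }
    }
}
```

Calling `RunOnceAsync` twice with the same job id runs the work once. The second call returns false without touching the customer's card.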

Always set a timeout and cancellation. A slow job should not run forever. Pass the CancellationToken into every async call so the work stops cleanly when the app shuts down or a deadline passes.
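
One common way to get both behaviors is a linked token: it fires when the app is stopping or when the job's own deadline passes, whichever comes first. The sketch below shows the idea. `JobTimeout` and `RunAsync` are hypothetical names, and the timeout value is up to you.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class JobTimeout
{
    // Runs work with a token that is cancelled when either the app is
    // stopping or the per-job timeout elapses, whichever comes first.
    public static async Task RunAsync(
        Func<CancellationToken, Task> work,
        TimeSpan timeout,
        CancellationToken stoppingToken)
    {
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(stoppingToken);
        cts.CancelAfter(timeout);

        // The work must pass this token to its own async calls to stop early.
        await work(cts.Token);
    }
}
```

When the deadline passes, the awaited call throws an OperationCanceledException, provided the work passed the token along to its own async calls. The worker can catch it and mark the job as Failed with a "timed out" message.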

Watch your queue depth. If the number of waiting jobs keeps climbing, your workers cannot keep up. That is an early warning. Track queue depth, failure rate, and job duration, and set an alert before users feel the slowdown.
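
For the in-process channel from Step 2, the reader can report how many jobs are waiting. The console sketch below reads that number and also publishes it as a gauge through System.Diagnostics.Metrics. The meter name "Reports.Jobs" and the instrument name "jobs.queue.depth" are made up for this example.

```csharp
using System;
using System.Diagnostics.Metrics;
using System.Threading.Channels;

var channel = Channel.CreateBounded<string>(100);
channel.Writer.TryWrite("job-1");
channel.Writer.TryWrite("job-2");

// Reader.Count reports how many items are waiting right now.
// CanCount says whether this kind of channel supports counting.
int depth = channel.Reader.CanCount ? channel.Reader.Count : -1;
Console.WriteLine($"Queue depth: {depth}");   // Queue depth: 2

// Publish the depth as a gauge so a metrics collector can read it on demand.
using var meter = new Meter("Reports.Jobs");
meter.CreateObservableGauge<int>("jobs.queue.depth", () => channel.Reader.Count);
```

A monitoring tool that listens to this meter can then alert when the depth stays high for too long.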

A safe job, step by step

  1. Check idempotency: skip if already done.
  2. Run with timeout: honor the cancellation token.
  3. Record result: store status and output.
  4. Emit metrics: track depth and failures.

Each guardrail prevents a common production failure.

Putting it together

Let us retell the whole journey in plain words, the sweet shop way.

A caller asks for slow work. Your API takes the order, writes a job record, drops it on a bounded queue, and hands back a token with 202 Accepted. Your server stays fast and free for everyone else. A background worker quietly takes the job, does the heavy lifting, and saves the result. The caller checks the status endpoint now and then. When the job is done, they collect the finished result.

That is the entire pattern. The same shape works whether you handle ten jobs a day or ten thousand. You only swap the in-memory channel for a durable store when jobs must not be lost.

Quick recap

  • Never do slow work while the caller waits. It times out and blocks your server.
  • Accept the request, save a job, and return HTTP 202 Accepted with a job id.
  • Use System.Threading.Channels plus a BackgroundService for in-process work.
  • Use a bounded channel for backpressure so spikes cannot flood your memory.
  • Give callers a status endpoint to poll, like GET /jobs/{id}.
  • Move to a durable queue (Hangfire, Redis, or a cloud queue) when jobs must survive restarts or scale across machines.
  • Make jobs idempotent, add timeouts and cancellation, and monitor queue depth.
  • Check tool licenses: MediatR and MassTransit are now commercially licensed for newer versions.
