Do I need microservices to scale a monolith?

No. Most monoliths can scale very far with simple steps: give the server more power (vertical scaling), make the app stateless, run several copies behind a load balancer (horizontal scaling), add caching, and use database read replicas. Microservices are a much bigger change, and you only need them when team structure or scaling needs truly demand them.

What does it mean for an app to be stateless?

Stateless means the app server keeps no per-user data in its own memory between requests. Every request carries everything needed to handle it, and shared data lives in a database or a distributed cache like Redis. This lets any copy of the app handle any request, which is the key to running many copies safely.

What is the difference between vertical and horizontal scaling?

Vertical scaling means making one machine stronger by adding more CPU, memory, or faster disks. Horizontal scaling means running many copies of the app on several machines behind a load balancer. Vertical is the simplest first step; horizontal gives you fault tolerance and near-unlimited growth, but the app must be stateless first.

Is caching always a good idea when scaling?

Caching is powerful but not free. It works best for data that is read often and changes rarely. The hard part is invalidation — knowing when to throw old data away. Start by caching slow, popular, rarely-changing data, set sensible expiry times, and measure the hit rate before caching everything.

Architecture · Intermediate

Scaling Monoliths: A Practical Guide for Growing .NET Systems

Learn how to scale a .NET monolith step by step: vertical scaling, stateless apps, load balancing with YARP, caching with Redis, and read replicas — in simple words.

15 min read · Updated January 18, 2026

A small tea shop that grew popular

Imagine a small tea shop near a railway station. One person runs it. He takes your order, makes the chai, pours it, and hands it over. When only a few people come, this works beautifully.

Then exams end, a festival begins, and suddenly a huge crowd arrives. One person cannot keep up. The line grows longer and longer. People get angry and leave.

What does the owner do? He has a few choices, and they are exactly the choices we have when a software system grows.

First, he can work faster and buy a bigger stove so he makes chai quicker. That is vertical scaling — making one worker stronger.

Second, he can hire more people and put a manager at the front who sends each new customer to whoever is free. That is horizontal scaling with a load balancer.

Third, he can make a big pot of chai in advance so popular orders are ready instantly. That is caching.

A software monolith grows the same way. You do not have to tear it apart into tiny pieces to make it fast. You just apply the right trick at the right time. Let us learn each one slowly.

What "scaling a monolith" really means

A monolith is one application that you build and deploy as a single unit. People sometimes think a monolith cannot scale. That is a myth. A well-built monolith can serve millions of users.

Scaling means one thing: handling more work without slowing down or breaking. More work can mean more users, more data, or heavier requests.

There is no single magic switch. Scaling is a ladder. You climb one rung at a time, and you only climb higher when the rung you are on stops being enough.

The Scaling Ladder for a Monolith

Measure: Find the real bottleneck before changing anything
Scale Up: Give the server more CPU and memory (vertical)
Go Stateless: Remove in-memory session so any copy can serve any request
Scale Out: Run many copies behind a load balancer (horizontal)
Cache: Store hot, slow-changing data in memory or Redis
Scale the DB: Add read replicas so reads do not fight writes

Climb one rung at a time. Most systems never need the top rungs.

The most important rule is the bottom rung: measure first. Never guess. Use logging, metrics, and profiling tools to find the slow part. Often the problem is one bad database query, not the whole app. Fixing that is cheaper than any fancy architecture change.
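As a minimal sketch of what "measure first" can look like in code, the program here times a piece of work with Stopwatch and reports the median and 95th-percentile latency. The names LatencyProbe, Measure, Percentile, and SimulatedWork are hypothetical helpers invented for this illustration, and the simulated workload stands in for a real handler or query. In a real system you would more likely read these numbers from your existing metrics tooling.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Demo: time a stand-in workload 200 times and print p50 and p95.
var samples = LatencyProbe.Measure(() => SimulatedWork(), runs: 200);
Console.WriteLine($"p50 = {LatencyProbe.Percentile(samples, 50):F3} ms");
Console.WriteLine($"p95 = {LatencyProbe.Percentile(samples, 95):F3} ms");

// Hypothetical stand-in for a real request handler or database call.
static void SimulatedWork()
{
    double x = 0;
    for (int i = 0; i < 50_000; i++) x += Math.Sqrt(i);
    GC.KeepAlive(x);
}

public static class LatencyProbe
{
    // Runs the action repeatedly and returns each duration in milliseconds.
    public static List<double> Measure(Action action, int runs)
    {
        var results = new List<double>(runs);
        for (int i = 0; i < runs; i++)
        {
            var sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            results.Add(sw.Elapsed.TotalMilliseconds);
        }
        return results;
    }

    // Nearest-rank percentile over the recorded samples.
    public static double Percentile(IReadOnlyList<double> samples, double p)
    {
        if (samples.Count == 0) throw new ArgumentException("No samples.", nameof(samples));
        var sorted = samples.OrderBy(v => v).ToArray();
        int rank = (int)Math.Ceiling(p / 100.0 * sorted.Length);
        return sorted[Math.Clamp(rank - 1, 0, sorted.Length - 1)];
    }
}
```

Looking at p95 rather than the average is a deliberate choice: a single slow query often hides inside a healthy-looking mean.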

Rung 1: Vertical scaling (make the machine bigger)

Vertical scaling means giving your app more power on the same machine: more CPU cores, more memory, faster disks. It is the simplest step and often the most effective first move.

Think of the tea shop owner buying a faster stove. Nothing about how he works changes. He just has better tools.

In the cloud, this is usually a few clicks or one config line. You move from a small server size to a bigger one. No code change needed.

Aspect	Vertical scaling	Horizontal scaling
What changes	One machine gets bigger	More machines are added
Code changes needed	Usually none	App must be stateless
Cost pattern	Jumps in big steps	Grows smoothly
Limit	A single machine has a ceiling	Almost unlimited
Fault tolerance	One machine fails = all down	One copy fails, others serve

Vertical scaling has two weak points. It hits a ceiling — there is a biggest machine you can buy. And it gives no fault tolerance — if that one machine dies, everything is down.

Before you buy bigger hardware, also remove waste in your code. A common one is the N+1 query problem, where you accidentally hit the database once per row in a loop.

// Assumes EF Core: needs `using Microsoft.EntityFrameworkCore;` for ToListAsync and Include.

// Slow: one query for orders, then one MORE query per order (N+1).
var orders = await db.Orders.ToListAsync();
foreach (var order in orders)
{
    // Each iteration can fire a separate database round trip. Painful at scale.
    order.Customer = await db.Customers.FindAsync(order.CustomerId);
}

// Fast: load everything in a single query with a join.
var ordersWithCustomers = await db.Orders
    .Include(o => o.Customer)
    .ToListAsync();

Fixing waste like this can make an app feel "scaled" without buying anything. Always clean the code before you spend money on hardware.
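To make the cost of N+1 visible without a real database, here is a small sketch that counts round trips against an in-memory stand-in. FakeDb, its RoundTrips counter, and the sample data are all hypothetical and exist only for this illustration; a real app would see the same pattern in its SQL logs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var db = new FakeDb();

// N+1 style: one trip for the orders, then one trip per order.
var orders = db.GetOrders();
foreach (var order in orders) db.GetCustomer(order.CustomerId);
Console.WriteLine($"N+1 style: {db.RoundTrips} round trips"); // 1 + 3 orders = 4

// Batched style: one trip for the orders, one trip for all customers.
db.ResetCounter();
orders = db.GetOrders();
var customers = db.GetCustomers(orders.Select(o => o.CustomerId));
Console.WriteLine($"Batched: {db.RoundTrips} round trips");   // 2

public record Order(int Id, int CustomerId);
public record Customer(int Id, string Name);

// Hypothetical in-memory "database" that counts every call as one round trip.
public sealed class FakeDb
{
    private readonly List<Order> _orders = new()
    {
        new Order(1, 10), new Order(2, 11), new Order(3, 10),
    };
    private readonly Dictionary<int, Customer> _customers = new()
    {
        [10] = new Customer(10, "Asha"),
        [11] = new Customer(11, "Ravi"),
    };

    public int RoundTrips { get; private set; }
    public void ResetCounter() => RoundTrips = 0;

    public List<Order> GetOrders() { RoundTrips++; return _orders.ToList(); }

    public Customer GetCustomer(int id) { RoundTrips++; return _customers[id]; }

    // One trip that fetches every requested customer at once.
    public Dictionary<int, Customer> GetCustomers(IEnumerable<int> ids)
    {
        RoundTrips++;
        return ids.Distinct().ToDictionary(id => id, id => _customers[id]);
    }
}
```

The N+1 count grows with the number of rows, while the batched count stays at two no matter how many orders there are.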

Rung 2: Make the app stateless

Before you can run many copies of your app, you must make it stateless. This is the single most important idea in this whole guide, so let us go slowly.

Stateful means the app remembers things about a user inside its own memory between requests. For example, it might keep your shopping cart in the server's RAM.

Stateless means the server keeps no such memory. Every request carries what it needs, and any shared data lives somewhere outside the app — a database or a shared cache.

Why does this matter? Imagine three copies of the tea shop, and a manager sending each customer to a free worker. If worker A wrote your order on a sticky note in his own pocket, and your next request goes to worker B, then worker B has no idea what you ordered. Chaos.

The fix: workers write orders on a shared board that everyone can see. Now it does not matter who serves you next.

Stateful apps trap user data in one copy's memory. Stateless apps share state outside the app, so any copy can serve any request.

In ASP.NET Core, the most common "hidden state" is session stored in memory. To go stateless, move it to a distributed cache.

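var builder = WebApplication.CreateBuilder(args);

// Store session in Redis, not in each server's local memory.
// Now any app copy can read the same session.
// Requires the Microsoft.Extensions.Caching.StackExchangeRedis NuGet package.
builder.Services.AddStackExchangeRedisCache(options =>
{
    options.Configuration = builder.Configuration.GetConnectionString("Redis");
    options.InstanceName = "TeaShop:";
});

// Session uses the registered IDistributedCache (Redis here) as its backing store.
// Note: the session cookie is protected with ASP.NET Core Data Protection,
// so all copies must also share the same Data Protection keys.
builder.Services.AddSession();

var app = builder.Build();
app.UseSession();
app.Run();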

A quick checklist for "am I stateless?":

No important data kept in a static field or in-memory dictionary across requests.
Session and cache live in Redis or the database, not local RAM.
Uploaded files go to shared storage (like blob storage), not the local disk.
Background work uses a shared queue, not an in-memory list.

Once you tick these boxes, you are ready to run many copies safely.
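The sticky-note problem can be reproduced in a few lines. This is a sketch only: StatefulCopy, StatelessCopy, and SharedStore are hypothetical classes, and SharedStore is an in-process stand-in for Redis or a database. It shows what happens when the first request lands on copy A and the follow-up lands on copy B.

```csharp
using System;
using System.Collections.Generic;

var shared = new SharedStore();
var statefulA = new StatefulCopy();
var statefulB = new StatefulCopy();
var statelessA = new StatelessCopy(shared);
var statelessB = new StatelessCopy(shared);

// The first request is served by copy A.
statefulA.AddToCart("user-1", "masala chai");
statelessA.AddToCart("user-1", "masala chai");

// The follow-up request is served by copy B.
int seenByStatefulB = statefulB.GetCart("user-1").Count;
int seenByStatelessB = statelessB.GetCart("user-1").Count;
Console.WriteLine($"Stateful copy B sees {seenByStatefulB} item(s)");   // 0: the cart is stuck in copy A
Console.WriteLine($"Stateless copy B sees {seenByStatelessB} item(s)"); // 1: the cart lives in the shared store

// Stand-in for Redis or a database: one store that every copy talks to.
public sealed class SharedStore
{
    private readonly object _gate = new();
    private readonly Dictionary<string, List<string>> _carts = new();

    public void Append(string userId, string item)
    {
        lock (_gate)
        {
            if (!_carts.TryGetValue(userId, out var cart))
                _carts[userId] = cart = new List<string>();
            cart.Add(item);
        }
    }

    public IReadOnlyList<string> Read(string userId)
    {
        lock (_gate)
        {
            return _carts.TryGetValue(userId, out var cart) ? cart.ToArray() : Array.Empty<string>();
        }
    }
}

// Stateful: the cart lives in this copy's own memory.
public sealed class StatefulCopy
{
    private readonly Dictionary<string, List<string>> _localCarts = new();

    public void AddToCart(string userId, string item)
    {
        if (!_localCarts.TryGetValue(userId, out var cart))
            _localCarts[userId] = cart = new List<string>();
        cart.Add(item);
    }

    public IReadOnlyList<string> GetCart(string userId) =>
        _localCarts.TryGetValue(userId, out var cart) ? cart.ToArray() : Array.Empty<string>();
}

// Stateless: the copy holds no user data, only a reference to the shared store.
public sealed class StatelessCopy
{
    private readonly SharedStore _store;
    public StatelessCopy(SharedStore store) => _store = store;

    public void AddToCart(string userId, string item) => _store.Append(userId, item);
    public IReadOnlyList<string> GetCart(string userId) => _store.Read(userId);
}
```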

Rung 3: Horizontal scaling (run many copies)

Horizontal scaling, also called scale-out, means running several copies of your app on different machines, with a load balancer in front. The load balancer is the manager at the tea shop: it sends each request to whichever copy is free.

This gives two big wins. You get near-linear scaling — roughly, two copies handle about twice the traffic. And you get fault tolerance — if one copy crashes, the others keep serving while it restarts.
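A rough way to turn those two wins into a number is sketched here: enough copies to carry the peak load, plus spare copies so one can fail without hurting users. CapacityPlan and CopiesNeeded are hypothetical names, the traffic figures are made up, and the formula assumes perfectly linear scaling, which real systems only approximate.

```csharp
using System;

// Hypothetical numbers: replace them with what your own measurements show.
int copies = CapacityPlan.CopiesNeeded(
    peakRequestsPerSecond: 1800,
    perCopyRequestsPerSecond: 500,
    spareCopies: 1);
Console.WriteLine($"Run {copies} copies"); // ceil(1800 / 500) = 4, plus 1 spare = 5

public static class CapacityPlan
{
    // Copies needed to carry the peak, plus spares for fault tolerance.
    // Assumes each extra copy adds the same capacity (linear scaling).
    public static int CopiesNeeded(double peakRequestsPerSecond, double perCopyRequestsPerSecond, int spareCopies)
    {
        if (perCopyRequestsPerSecond <= 0)
            throw new ArgumentOutOfRangeException(nameof(perCopyRequestsPerSecond));
        int forLoad = (int)Math.Ceiling(peakRequestsPerSecond / perCopyRequestsPerSecond);
        return Math.Max(forLoad, 1) + spareCopies;
    }
}
```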

A load balancer spreads incoming requests across many identical app copies. Add more copies to handle more load.

For .NET teams, a great in-house option is YARP (Yet Another Reverse Proxy), Microsoft's own reverse proxy library. It runs as an ASP.NET Core app and can route and balance traffic across your instances. In the cloud you can also use Azure Application Gateway, AWS ALB, or Google Cloud Load Balancing.

Here is a tiny YARP setup that spreads load across two app copies.

var builder = WebApplication.CreateBuilder(args);
 
// YARP reads its routes and clusters from configuration.
builder.Services.AddReverseProxy()
    .LoadFromConfig(builder.Configuration.GetSection("ReverseProxy"));
 
var app = builder.Build();
 
// This single line makes the app act as a load-balancing proxy.
app.MapReverseProxy();
 
app.Run();

And the matching appsettings.json cluster, which lists the two copies to balance between:

{
  "ReverseProxy": {
    "Routes": {
      "all": { "ClusterId": "teashop", "Match": { "Path": "{**catch-all}" } }
    },
    "Clusters": {
      "teashop": {
        "LoadBalancingPolicy": "RoundRobin",
        "Destinations": {
          "copy1": { "Address": "https://localhost:5101" },
          "copy2": { "Address": "https://localhost:5102" }
        }
      }
    }
  }
}

The RoundRobin policy simply takes turns: copy 1, then copy 2, then copy 1 again. Other policies pick the least-busy copy. Either way, the user never knows or cares which copy served them — and that is exactly the point of being stateless.
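To show the idea behind these two policies, here is a small sketch of a round-robin picker and a least-busy picker. This is not YARP's implementation; RoundRobinPicker and LeastBusyPicker are hypothetical classes written only to illustrate how each policy chooses a copy.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

var copies = new[] { "copy1", "copy2" };
var picker = new RoundRobinPicker(copies.Length);
for (int i = 0; i < 4; i++)
    Console.WriteLine(copies[picker.Next()]); // copy1, copy2, copy1, copy2

// In-flight request counts per copy; the second copy is the least busy.
Console.WriteLine(LeastBusyPicker.Pick(new[] { 3, 1, 2 })); // 1

public sealed class RoundRobinPicker
{
    private readonly int _count;
    private int _counter = -1;

    public RoundRobinPicker(int count)
    {
        if (count <= 0) throw new ArgumentOutOfRangeException(nameof(count));
        _count = count;
    }

    // Thread-safe: Interlocked.Increment hands each caller a unique ticket.
    // The unchecked cast keeps the index valid even after the counter wraps.
    public int Next()
    {
        uint ticket = unchecked((uint)Interlocked.Increment(ref _counter));
        return (int)(ticket % (uint)_count);
    }
}

public static class LeastBusyPicker
{
    // Returns the index of the copy with the fewest in-flight requests.
    // Ties go to the lowest index.
    public static int Pick(IReadOnlyList<int> inFlight)
    {
        if (inFlight.Count == 0) throw new ArgumentException("No copies.", nameof(inFlight));
        int best = 0;
        for (int i = 1; i < inFlight.Count; i++)
            if (inFlight[i] < inFlight[best]) best = i;
        return best;
    }
}
```

Round robin needs no knowledge of the copies at all, which is why it is a common default. A least-busy policy needs live request counts but copes better when some requests are much slower than others.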

The flow of a single request through this system looks like this.

Life of One Request Under Horizontal Scaling

Request: User sends a request to one public address
Load Balancer: Picks a healthy, free app copy
Free Copy: Runs the logic; keeps no private memory
Shared State: Reads and writes session and data in Redis or the DB
Response: Result returns; the next request may go elsewhere

The user talks to the load balancer only. Copies are interchangeable because state is shared.
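The "picks a healthy copy" step can be sketched as a pool that takes turns over the copies but skips any that are marked unhealthy. CopyPool and its methods are hypothetical names for this illustration; real load balancers decide health through periodic health checks, which this sketch leaves out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var pool = new CopyPool(new[] { "copy1", "copy2", "copy3" });
pool.MarkUnhealthy("copy2");            // for example, after a failed health check
for (int i = 0; i < 4; i++)
    Console.WriteLine(pool.Next());     // copy1, copy3, copy1, copy3

public sealed class CopyPool
{
    private readonly object _gate = new();
    private readonly List<string> _all;
    private readonly HashSet<string> _unhealthy = new();
    private int _cursor;

    public CopyPool(IEnumerable<string> copies) => _all = copies.ToList();

    public void MarkUnhealthy(string copy) { lock (_gate) _unhealthy.Add(copy); }
    public void MarkHealthy(string copy) { lock (_gate) _unhealthy.Remove(copy); }

    // Walks the ring from the cursor and returns the first healthy copy.
    public string Next()
    {
        lock (_gate)
        {
            for (int step = 0; step < _all.Count; step++)
            {
                string candidate = _all[_cursor];
                _cursor = (_cursor + 1) % _all.Count;
                if (!_unhealthy.Contains(candidate)) return candidate;
            }
            throw new InvalidOperationException("No healthy copies available.");
        }
    }
}
```

Because the copies are stateless, taking one out of rotation costs capacity only. No user loses their session.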

Rung 4: Caching (don't redo slow work)

Caching means storing the result of slow work so you can reuse it instantly. The tea shop owner makes a big pot of the most popular chai in advance. When someone orders it, he pours from the pot instead of making a fresh cup. Fast.

In software, you cache data that is read often and changes rarely. Good examples: a product catalog, a list of countries, a user's profile that barely changes.

There are two common kinds in .NET:

Cache type	Where it lives	Best for
In-memory cache	Inside one app copy's RAM	Fast, tiny, per-copy data
Distributed cache (Redis)	A shared server all copies use	Data shared across many copies

With many app copies, prefer a distributed cache like Redis. If each copy had its own private memory cache, they could disagree with each other, which confuses users.

Here is a simple "cache-aside" pattern. You look in the cache first; if it is missing, you load from the database and then save it in the cache for next time.

public async Task<Product?> GetProductAsync(int id)
{
    var key = $"product:{id}";
 
    // 1. Try the cache first.
    var cached = await _cache.GetStringAsync(key);
    if (cached is not null)
        return JsonSerializer.Deserialize<Product>(cached);
 
    // 2. Cache miss: go to the slow source of truth.
    var product = await _db.Products.FindAsync(id);
    if (product is null)
        return null;
 
    // 3. Save it for next time, with an expiry so it cannot go stale forever.
    await _cache.SetStringAsync(key, JsonSerializer.Serialize(product),
        new DistributedCacheEntryOptions
        {
            AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(10)
        });
 
    return product;
}

Caching has one famously hard part: invalidation. That means knowing when to throw old data away. If a product's price changes but the cache still holds the old price, users see wrong information.

Two safe habits help a lot. First, always set an expiry time so data cannot live forever. Second, clear the cache key whenever you update that item. Start small: cache only your slowest, most popular, rarely-changing data, and watch your cache hit rate before caching more.
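Both habits, plus hit-rate tracking, fit in a small sketch. ExpiringCache is a hypothetical in-memory class written for this illustration, not a replacement for Redis or IDistributedCache, and it is not thread-safe. The clock is injectable so that expiry can be tested without waiting.

```csharp
using System;
using System.Collections.Generic;

var prices = new Dictionary<int, decimal> { [1] = 20m };   // stand-in for the database
var cache = new ExpiringCache<decimal>(TimeSpan.FromMinutes(10));
decimal Load() => prices[1];

cache.GetOrLoad("product:1", Load);     // miss: loads 20 from the "database"
cache.GetOrLoad("product:1", Load);     // hit: served from the cache

prices[1] = 25m;                        // update the source of truth...
cache.Invalidate("product:1");          // ...and clear the key in the same code path

Console.WriteLine(cache.GetOrLoad("product:1", Load)); // 25
Console.WriteLine($"hit rate = {cache.HitRate:P0}");   // 1 hit out of 3 lookups

public sealed class ExpiringCache<TValue>
{
    private readonly Dictionary<string, (TValue Value, DateTimeOffset ExpiresAt)> _entries = new();
    private readonly TimeSpan _ttl;
    private readonly Func<DateTimeOffset> _now;
    private long _hits;
    private long _misses;

    public ExpiringCache(TimeSpan ttl) : this(ttl, () => DateTimeOffset.UtcNow) { }

    public ExpiringCache(TimeSpan ttl, Func<DateTimeOffset> now)
    {
        _ttl = ttl;
        _now = now;
    }

    // Cache-aside: return a fresh entry if present, otherwise load and store it.
    public TValue GetOrLoad(string key, Func<TValue> load)
    {
        if (_entries.TryGetValue(key, out var entry) && entry.ExpiresAt > _now())
        {
            _hits++;
            return entry.Value;
        }
        _misses++;
        var value = load();
        _entries[key] = (value, _now() + _ttl);   // habit 1: every entry expires
        return value;
    }

    // Habit 2: call this whenever the underlying item is updated.
    public void Invalidate(string key) => _entries.Remove(key);

    public double HitRate
    {
        get
        {
            long total = _hits + _misses;
            return total == 0 ? 0.0 : (double)_hits / total;
        }
    }
}
```

If the hit rate stays low for a key pattern, that data is probably not worth caching.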

Rung 5: Scale the database

Very often, the real bottleneck is not the app at all — it is the database. You can have ten app copies, but if they all hammer one tired database, the database becomes the new line at the tea shop.

A powerful fix is a read replica. This is a copy of your database that stays in sync with the main one but only answers read queries. Writes still go to the primary. Reads spread across replicas, so reads stop fighting with writes.

Writes go to the primary database. Reads are spread to replicas, easing load on the primary.

One thing to know: replicas can be a tiny bit behind the primary. This is called replication lag. For most reads (showing a product, a list, a profile) a delay of a fraction of a second is perfectly fine. For something that must be exactly current right after a write, read from the primary instead.
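One way to act on this is a small router that sends a user's reads to the primary for a short window after that user writes, and to a replica otherwise. This is a sketch under assumptions: ReadRouter and Target are hypothetical names, and the two-second window is a guess that should be replaced by the replication lag you actually measure.

```csharp
using System;
using System.Collections.Concurrent;

var clock = DateTimeOffset.UtcNow;
var router = new ReadRouter(TimeSpan.FromSeconds(2), () => clock);

Console.WriteLine(router.ForRead("user-1"));   // Replica: no recent write
router.RecordWrite("user-1");
Console.WriteLine(router.ForRead("user-1"));   // Primary: the user just wrote
clock = clock.AddSeconds(5);
Console.WriteLine(router.ForRead("user-1"));   // Replica: the window has passed

public enum Target { Primary, Replica }

public sealed class ReadRouter
{
    private readonly ConcurrentDictionary<string, DateTimeOffset> _lastWrite = new();
    private readonly TimeSpan _stickyWindow;
    private readonly Func<DateTimeOffset> _now;

    public ReadRouter(TimeSpan stickyWindow, Func<DateTimeOffset> now)
    {
        _stickyWindow = stickyWindow;
        _now = now;
    }

    // Call after every successful write made on behalf of this user.
    public void RecordWrite(string userId) => _lastWrite[userId] = _now();

    // Chooses where a read should go. Pass mustBeCurrent for reads that
    // can never tolerate lag, whatever the timing.
    public Target ForRead(string userId, bool mustBeCurrent = false)
    {
        if (mustBeCurrent) return Target.Primary;
        if (_lastWrite.TryGetValue(userId, out var writtenAt) && _now() - writtenAt < _stickyWindow)
            return Target.Primary;
        return Target.Replica;
    }
}
```

Note that the last-write timestamps are themselves per-user state. With many app copies they would need to live in shared storage, or travel with the request, for the same reason session does.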

Other database moves on this rung include adding the right indexes (so queries do not scan whole tables), connection pooling (reusing connections instead of opening new ones), and archiving very old data so tables stay lean.

Putting it together: a simple decision guide

When your monolith feels slow, do not panic and rewrite everything. Walk the ladder calmly.

What Should I Do When My App Is Slow?

Measure: Use metrics to find the true bottleneck
Fix queries: Kill N+1 and add missing indexes first
Scale up: Bigger machine buys easy time
Stateless + scale out: Many copies behind a load balancer
Cache + replicas: Reuse hot data; spread database reads

A calm, ordered response. Each step is cheaper than the next big rewrite.

Notice what is not here: "rewrite into microservices." Scaling a monolith is mostly about these five honest, well-understood steps. Microservices solve team and deployment independence, not raw performance. You can scale a clean monolith astonishingly far before you ever need them.

A small note on tooling and licensing, since it surprises people: some popular .NET libraries changed their terms. MediatR and MassTransit moved to commercial licensing for newer versions. They are still fine tools, but if you adopt them in a growing system, check the license and budget for it — or use the built-in alternatives, like plain interfaces for messaging inside the process and a simple queue for background work.

A quick word on the modern .NET stack

If you are building today, you are in a good spot for scaling. .NET 10 is the current LTS (long-term support) release, so it gets years of updates and patches — a sensible base for a system you expect to grow. C# 14 has shipped. Looking ahead, C# 15 (with union types) is in .NET 11 preview.

None of these versions change the ladder above. The steps — measure, scale up, go stateless, scale out, cache, scale the database — are timeless. Newer runtimes mostly make each rung faster and cheaper, which is exactly what you want.

Common mistakes to avoid

Scaling out before going stateless. Running many copies while keeping session in local memory causes random, hard-to-find bugs. Go stateless first, always.
Caching everything. A cache full of rarely-read data wastes memory and hides bugs. Cache hot, slow, stable data only.
Guessing the bottleneck. Teams often "optimize" the wrong thing. Measure first, every time.
Ignoring the database. The fanciest app scaling cannot save a single overloaded database. Watch query times closely.
Reaching for microservices too early. They add real complexity and operational cost. Climb the monolith ladder fully first.

Quick recap

A monolith can scale very far. You do not need microservices to handle growth.
Measure first. Find the true bottleneck before changing anything; it is often one bad query.
Vertical scaling (a bigger machine) is the easiest first step, but it has a ceiling and no fault tolerance.
Statelessness is the key that unlocks everything else: keep no user memory inside the app; share state in Redis or the database.
Horizontal scaling runs many copies behind a load balancer like YARP, giving near-linear growth and fault tolerance.
Caching reuses slow results; cache hot, rarely-changing data and always set an expiry. Invalidation is the hard part.
Read replicas spread database reads so they stop fighting with writes; mind a little replication lag.
Walk the ladder one rung at a time, and add complexity only when the current rung truly runs out.
