Scaling Monoliths: A Practical Guide for Growing .NET Systems
Learn how to scale a .NET monolith step by step: vertical scaling, stateless apps, load balancing with YARP, caching with Redis, and read replicas — in simple words.
A small tea shop that grew popular
Imagine a small tea shop near a railway station. One person runs it. He takes your order, makes the chai, pours it, and hands it over. When only a few people come, this works beautifully.
Then exams end, a festival begins, and suddenly a huge crowd arrives. One person cannot keep up. The line grows longer and longer. People get angry and leave.
What does the owner do? He has a few choices, and they are exactly the choices we have when a software system grows.
First, he can work faster and buy a bigger stove so he makes chai quicker. That is vertical scaling — making one worker stronger.
Second, he can hire more people and put a manager at the front who sends each new customer to whoever is free. That is horizontal scaling with a load balancer.
Third, he can make a big pot of chai in advance so popular orders are ready instantly. That is caching.
A software monolith grows the same way. You do not have to tear it apart into tiny pieces to make it fast. You just apply the right trick at the right time. Let us learn each one slowly.
What "scaling a monolith" really means
A monolith is one application that you build and deploy as a single unit. People sometimes think a monolith cannot scale. That is a myth. A well-built monolith can serve millions of users.
Scaling means one thing: handling more work without slowing down or breaking. More work can mean more users, more data, or heavier requests.
There is no single magic switch. Scaling is a ladder. You climb one rung at a time, and you only climb higher when the rung you are on stops being enough.
The Scaling Ladder for a Monolith
1. Measure: find the real bottleneck before changing anything.
2. Scale Up: give the server more CPU and memory (vertical).
3. Go Stateless: remove in-memory session so any copy can serve any request.
4. Scale Out: run many copies behind a load balancer (horizontal).
5. Cache: store hot, slow-changing data in memory or Redis.
6. Scale the DB: add read replicas so reads do not fight writes.
The most important rule is the bottom rung of the ladder: measure first. Never guess. Use logging, metrics, and profiling tools to find the slow part. Often the problem is one bad database query, not the whole app. Fixing that is cheaper than any fancy architecture change.
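Measuring can start very small. The sketch below times a few named operations with `Stopwatch` and reports which one took longest. The `Timings` class and the labels are made up for illustration, and the `Thread.Sleep` calls only stand in for real work. A production system would more likely rely on its logging and metrics tooling.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;

// Hypothetical usage: wrap suspect operations, then ask which one was slowest.
var timings = new Timings();
timings.Measure("load-orders", () => Thread.Sleep(50)); // stand-in for a slow query
timings.Measure("render-page", () => Thread.Sleep(5));  // stand-in for cheap work
Console.WriteLine($"Slowest: {timings.Slowest()}");

// Collects total elapsed time per label. Not thread-safe; a sketch only.
public sealed class Timings
{
    private readonly Dictionary<string, TimeSpan> _totals = new();

    public void Measure(string label, Action work)
    {
        var watch = Stopwatch.StartNew();
        try
        {
            work();
        }
        finally
        {
            watch.Stop();
            _totals.TryGetValue(label, out var soFar); // TimeSpan.Zero if new
            _totals[label] = soFar + watch.Elapsed;
        }
    }

    // Returns the label with the largest total time, or null if nothing was measured.
    public string? Slowest() =>
        _totals.Count == 0 ? null : _totals.MaxBy(pair => pair.Value).Key;

    public TimeSpan Total(string label) =>
        _totals.TryGetValue(label, out var total) ? total : TimeSpan.Zero;
}
```

Even a crude table like this usually points at one or two operations, and those are the ones worth fixing first.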
Rung 1: Vertical scaling (make the machine bigger)
Vertical scaling means giving your app more power on the same machine: more CPU cores, more memory, faster disks. It is the simplest step and often the most effective first move.
Think of the tea shop owner buying a faster stove. Nothing about how he works changes. He just has better tools.
In the cloud, this is usually a few clicks or one config line. You move from a small server size to a bigger one. No code change needed.
| Aspect | Vertical scaling | Horizontal scaling |
|---|---|---|
| What changes | One machine gets bigger | More machines are added |
| Code changes needed | Usually none | App must be stateless |
| Cost pattern | Jumps in big steps | Grows smoothly |
| Limit | A single machine has a ceiling | Almost unlimited |
| Fault tolerance | One machine fails = all down | One copy fails, others serve |
Vertical scaling has two weak points. It hits a ceiling — there is a biggest machine you can buy. And it gives no fault tolerance — if that one machine dies, everything is down.
Before you buy bigger hardware, also remove waste in your code. A common one is the N+1 query problem, where you accidentally hit the database once per row in a loop.
```csharp
// Requires: using Microsoft.EntityFrameworkCore;
// Here `db` is assumed to be your application's DbContext.

// Slow: one query for orders, then one MORE query per order (N+1).
var orders = await db.Orders.ToListAsync();
foreach (var order in orders)
{
    // Each iteration fires a separate database trip. Painful at scale.
    order.Customer = await db.Customers.FindAsync(order.CustomerId);
}

// Fast: load everything in a single query with a join.
var ordersWithCustomers = await db.Orders
    .Include(o => o.Customer)
    .ToListAsync();
```

Fixing waste like this can make an app feel "scaled" without buying anything. Always clean the code before you spend money on hardware.
Rung 2: Make the app stateless
Before you can run many copies of your app, you must make it stateless. This is the single most important idea in this whole guide, so let us go slowly.
Stateful means the app remembers things about a user inside its own memory between requests. For example, it might keep your shopping cart in the server's RAM.
Stateless means the server keeps no such memory. Every request carries what it needs, and any shared data lives somewhere outside the app — a database or a shared cache.
Why does this matter? Imagine three copies of the tea shop, and a manager sending each customer to a free worker. If worker A wrote your order on a sticky note in his own pocket, and your next request goes to worker B, then worker B has no idea what you ordered. Chaos.
The fix: workers write orders on a shared board that everyone can see. Now it does not matter who serves you next.
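The sticky note and the shared board can be sketched in a few lines. This is a toy model, not ASP.NET Core code: `StatefulWorker`, `StatelessWorker`, and the dictionary standing in for Redis or a database are all invented for the illustration.

```csharp
using System;
using System.Collections.Generic;

// Two copies of the app share one board (standing in for Redis or a database).
var board = new Dictionary<string, string>();
var workerA = new StatelessWorker(board);
var workerB = new StatelessWorker(board);
workerA.TakeOrder("customer-1", "masala chai");
Console.WriteLine(workerB.FindOrder("customer-1") ?? "(unknown)"); // masala chai

// The stateful version keeps a private note, so a second copy cannot see it.
var pocketA = new StatefulWorker();
var pocketB = new StatefulWorker();
pocketA.TakeOrder("customer-1", "masala chai");
Console.WriteLine(pocketB.FindOrder("customer-1") ?? "(unknown)"); // (unknown)

// Stateful: the order lives in this worker's own memory (the sticky note).
public sealed class StatefulWorker
{
    private readonly Dictionary<string, string> _pocket = new();

    public void TakeOrder(string customer, string order) => _pocket[customer] = order;

    public string? FindOrder(string customer) =>
        _pocket.TryGetValue(customer, out var order) ? order : null;
}

// Stateless: the worker holds no orders itself; it only talks to the shared board.
public sealed class StatelessWorker
{
    private readonly IDictionary<string, string> _board;

    public StatelessWorker(IDictionary<string, string> board) => _board = board;

    public void TakeOrder(string customer, string order) => _board[customer] = order;

    public string? FindOrder(string customer) =>
        _board.TryGetValue(customer, out var order) ? order : null;
}
```

The two classes have the same methods. The only difference is where the data lives, and that is the whole idea of going stateless.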
In ASP.NET Core, the most common "hidden state" is session stored in memory. To go stateless, move it to a distributed cache.
```csharp
// Requires the Microsoft.Extensions.Caching.StackExchangeRedis package.
var builder = WebApplication.CreateBuilder(args);

// Store session in Redis, not in each server's local memory.
// Now any app copy can read the same session.
builder.Services.AddStackExchangeRedisCache(options =>
{
    options.Configuration = builder.Configuration.GetConnectionString("Redis");
    options.InstanceName = "TeaShop:";
});
builder.Services.AddSession();

var app = builder.Build();
app.UseSession();
app.Run();
```

A quick checklist for "am I stateless?":
- No important data kept in a static field or in-memory dictionary across requests.
- Session and cache live in Redis or the database, not local RAM.
- Uploaded files go to shared storage (like blob storage), not the local disk.
- Background work uses a shared queue, not an in-memory list.
Once you tick these boxes, you are ready to run many copies safely.
Rung 3: Horizontal scaling (run many copies)
Horizontal scaling, also called scale-out, means running several copies of your app on different machines, with a load balancer in front. The load balancer is the manager at the tea shop: it sends each request to whichever copy is free.
This gives two big wins. You get near-linear scaling: two copies handle roughly twice the traffic, as long as shared resources such as the database keep up. And you get fault tolerance: if one copy crashes, the others keep serving while it restarts.
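Both wins can be turned into a rough sizing rule: enough copies for peak traffic, plus a spare or two so a crash does not overload the rest. The sketch below assumes perfectly linear scaling, which real systems only approximate. The `Capacity` class and all of the numbers are hypothetical.

```csharp
using System;

// Hypothetical numbers: 900 requests/second at peak, 250 per copy, 1 spare copy.
Console.WriteLine(Capacity.CopiesNeeded(900, 250, spareCopies: 1)); // 5

public static class Capacity
{
    // Copies needed = ceil(peak / perCopy), at least 1, plus spares kept for failures.
    public static int CopiesNeeded(double peakRps, double perCopyRps, int spareCopies = 1)
    {
        if (peakRps < 0) throw new ArgumentOutOfRangeException(nameof(peakRps));
        if (perCopyRps <= 0) throw new ArgumentOutOfRangeException(nameof(perCopyRps));
        if (spareCopies < 0) throw new ArgumentOutOfRangeException(nameof(spareCopies));

        var forTraffic = (int)Math.Ceiling(peakRps / perCopyRps);
        return Math.Max(forTraffic, 1) + spareCopies;
    }
}
```

Here 900 / 250 is 3.6, which rounds up to 4 copies for traffic, and the spare makes 5. Measure what one copy can really handle before trusting any number like this.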
For .NET teams, a great in-house option is YARP (Yet Another Reverse Proxy), Microsoft's own reverse proxy library. It runs as an ASP.NET Core app and can route and balance traffic across your instances. In the cloud you can also use Azure Application Gateway, AWS ALB, or Google Cloud Load Balancing.
Here is a tiny YARP setup that spreads load across two app copies.
```csharp
// Requires the Yarp.ReverseProxy package.
var builder = WebApplication.CreateBuilder(args);

// YARP reads its routes and clusters from configuration.
builder.Services.AddReverseProxy()
    .LoadFromConfig(builder.Configuration.GetSection("ReverseProxy"));

var app = builder.Build();

// This single line makes the app act as a load-balancing proxy.
app.MapReverseProxy();
app.Run();
```

And the matching appsettings.json cluster, which lists the two copies to balance between:
```json
{
  "ReverseProxy": {
    "Routes": {
      "all": { "ClusterId": "teashop", "Match": { "Path": "{**catch-all}" } }
    },
    "Clusters": {
      "teashop": {
        "LoadBalancingPolicy": "RoundRobin",
        "Destinations": {
          "copy1": { "Address": "https://localhost:5101" },
          "copy2": { "Address": "https://localhost:5102" }
        }
      }
    }
  }
}
```

The RoundRobin policy simply takes turns: copy 1, then copy 2, then copy 1 again. Other policies, such as LeastRequests, pick the copy with the fewest requests in flight. Either way, the user never knows or cares which copy served them, and that is exactly the point of being stateless.
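To make the turn-taking concrete, here is a minimal round-robin picker. It is an illustration of the idea only, not YARP's implementation. The `RoundRobinPicker` class is invented for this sketch, and the addresses reuse the two copies from the configuration above.

```csharp
using System;
using System.Threading;

var picker = new RoundRobinPicker(new[] { "https://localhost:5101", "https://localhost:5102" });
for (var i = 0; i < 4; i++)
    Console.WriteLine(picker.Next()); // 5101, 5102, 5101, 5102

// Illustrative only: hands out destinations in turn.
public sealed class RoundRobinPicker
{
    private readonly string[] _destinations;
    private int _counter = -1;

    public RoundRobinPicker(string[] destinations)
    {
        if (destinations is null || destinations.Length == 0)
            throw new ArgumentException("At least one destination is required.", nameof(destinations));
        _destinations = (string[])destinations.Clone();
    }

    public string Next()
    {
        // Interlocked keeps the counter correct when many requests arrive at once.
        // Casting to uint keeps the index non-negative after the int overflows.
        var ticket = (uint)Interlocked.Increment(ref _counter);
        return _destinations[(int)(ticket % (uint)_destinations.Length)];
    }
}
```

A real load balancer also skips copies that fail health checks, which this sketch leaves out.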
The flow of a single request through this system looks like this.
Life of One Request Under Horizontal Scaling
1. Request: the user sends a request to one public address.
2. Load Balancer: picks a healthy, free app copy.
3. Free Copy: runs the logic; keeps no private memory.
4. Shared State: reads and writes session and data in Redis or the DB.
5. Response: the result returns; the next request may go elsewhere.
Rung 4: Caching (don't redo slow work)
Caching means storing the result of slow work so you can reuse it instantly. The tea shop owner makes a big pot of the most popular chai in advance. When someone orders it, he pours from the pot instead of making a fresh cup. Fast.
In software, you cache data that is read often and changes rarely. Good examples: a product catalog, a list of countries, a user's profile that barely changes.
There are two common kinds in .NET:
| Cache type | Where it lives | Best for |
|---|---|---|
| In-memory cache | Inside one app copy's RAM | Fast, tiny, per-copy data |
| Distributed cache (Redis) | A shared server all copies use | Data shared across many copies |
With many app copies, prefer a distributed cache like Redis. If each copy had its own private memory cache, they could disagree with each other, which confuses users.
Here is a simple "cache-aside" pattern. You look in the cache first; if it is missing, you load from the database and then save it in the cache for next time.
```csharp
// Requires: using System.Text.Json;
//           using Microsoft.Extensions.Caching.Distributed;
// Here `_cache` is an injected IDistributedCache and `_db` is your DbContext.
public async Task<Product?> GetProductAsync(int id)
{
    var key = $"product:{id}";

    // 1. Try the cache first.
    var cached = await _cache.GetStringAsync(key);
    if (cached is not null)
        return JsonSerializer.Deserialize<Product>(cached);

    // 2. Cache miss: go to the slow source of truth.
    var product = await _db.Products.FindAsync(id);
    if (product is null)
        return null;

    // 3. Save it for next time, with an expiry so it cannot go stale forever.
    await _cache.SetStringAsync(key, JsonSerializer.Serialize(product),
        new DistributedCacheEntryOptions
        {
            AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(10)
        });

    return product;
}
```

Caching has one famously hard part: invalidation. That means knowing when to throw old data away. If a product's price changes but the cache still holds the old price, users see wrong information.
Two safe habits help a lot. First, always set an expiry time so data cannot live forever. Second, clear the cache key whenever you update that item. Start small: cache only your slowest, most popular, rarely-changing data, and watch your cache hit rate before caching more.
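Both habits, plus the hit-rate check, fit in one small sketch. Everything here is a stand-in: `PriceCatalog` is a made-up class, and two in-memory dictionaries play the roles of the database and Redis so the example runs on its own. It is not thread-safe.

```csharp
using System;
using System.Collections.Generic;

var catalog = new PriceCatalog(TimeSpan.FromMinutes(10));
catalog.UpdatePrice("chai", 10m);
Console.WriteLine(catalog.GetPrice("chai")); // 10 (miss, loaded from the store)
Console.WriteLine(catalog.GetPrice("chai")); // 10 (hit)
catalog.UpdatePrice("chai", 12m);            // write, then invalidate
Console.WriteLine(catalog.GetPrice("chai")); // 12 (miss again, fresh value)
Console.WriteLine($"Hits: {catalog.Hits}, misses: {catalog.Misses}"); // Hits: 1, misses: 2

public sealed class PriceCatalog
{
    // Stand-in for the database (the source of truth).
    private readonly Dictionary<string, decimal> _store = new();
    // Stand-in for Redis: the value plus the moment it expires.
    private readonly Dictionary<string, (decimal Price, DateTimeOffset ExpiresAt)> _cache = new();
    private readonly TimeSpan _ttl;

    public PriceCatalog(TimeSpan ttl) => _ttl = ttl;

    public int Hits { get; private set; }
    public int Misses { get; private set; }
    public double HitRate => Hits + Misses == 0 ? 0 : (double)Hits / (Hits + Misses);

    public decimal? GetPrice(string item)
    {
        // Habit 1: an expired entry counts as a miss, so data cannot live forever.
        if (_cache.TryGetValue(item, out var entry) && entry.ExpiresAt > DateTimeOffset.UtcNow)
        {
            Hits++;
            return entry.Price;
        }

        Misses++;
        if (!_store.TryGetValue(item, out var price))
            return null;

        _cache[item] = (price, DateTimeOffset.UtcNow + _ttl);
        return price;
    }

    public void UpdatePrice(string item, decimal price)
    {
        _store[item] = price; // write to the source of truth first
        _cache.Remove(item);  // Habit 2: drop the cached copy so the next read is fresh
    }
}
```

If the hit rate stays low for a key, that data is probably not hot enough to be worth caching.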
Rung 5: Scale the database
Very often, the real bottleneck is not the app at all — it is the database. You can have ten app copies, but if they all hammer one tired database, the database becomes the new line at the tea shop.
A powerful fix is a read replica. This is a copy of your database that stays in sync with the main one but only answers read queries. Writes still go to the primary. Reads spread across replicas, so reads stop fighting with writes.
One thing to know: replicas can be a tiny bit behind the primary. This is called replication lag. For most reads (showing a product, a list, a profile) a delay of a fraction of a second is perfectly fine. For something that must be exactly current right after a write, read from the primary instead.
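In code, this often comes down to choosing a connection string per query. The sketch below shows one way to do that. The `ReadRouter` class and the connection strings are hypothetical, and many database drivers and cloud services offer their own read-routing features that may fit better.

```csharp
using System;
using System.Threading;

var router = new ReadRouter(
    primary: "Host=primary.example;Database=teashop",
    replicas: new[] { "Host=replica1.example;Database=teashop", "Host=replica2.example;Database=teashop" });

Console.WriteLine(router.ForWrite());                   // primary
Console.WriteLine(router.ForRead(mustBeCurrent: true)); // primary (read right after a write)
Console.WriteLine(router.ForRead());                    // replica1
Console.WriteLine(router.ForRead());                    // replica2

public sealed class ReadRouter
{
    private readonly string _primary;
    private readonly string[] _replicas;
    private int _next = -1;

    public ReadRouter(string primary, string[]? replicas)
    {
        _primary = primary ?? throw new ArgumentNullException(nameof(primary));
        _replicas = replicas is null ? Array.Empty<string>() : (string[])replicas.Clone();
    }

    // Writes always go to the primary.
    public string ForWrite() => _primary;

    // Reads go to replicas in turn, unless the caller cannot tolerate replication lag.
    public string ForRead(bool mustBeCurrent = false)
    {
        if (mustBeCurrent || _replicas.Length == 0)
            return _primary;

        var ticket = (uint)Interlocked.Increment(ref _next);
        return _replicas[(int)(ticket % (uint)_replicas.Length)];
    }
}
```

The `mustBeCurrent` flag makes the trade-off explicit at each call site: most reads accept a little lag, and only a few must see the latest write.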
Other database moves on this rung include adding the right indexes (so queries do not scan whole tables), connection pooling (reusing connections instead of opening new ones), and archiving very old data so tables stay lean.
Putting it together: a simple decision guide
When your monolith feels slow, do not panic and rewrite everything. Walk the ladder calmly.
What Should I Do When My App Is Slow?
1. Measure: use metrics to find the true bottleneck.
2. Fix queries: kill N+1 and add missing indexes first.
3. Scale up: a bigger machine buys easy time.
4. Stateless + scale out: many copies behind a load balancer.
5. Cache + replicas: reuse hot data; spread database reads.
Notice what is not here: "rewrite into microservices." Scaling a monolith is mostly about these five honest, well-understood steps. Microservices solve team and deployment independence, not raw performance. You can scale a clean monolith astonishingly far before you ever need them.
A small note on tooling and licensing, since it surprises people: some popular .NET libraries changed their terms. MediatR and MassTransit moved to commercial licensing for newer versions. They are still fine tools, but if you adopt them in a growing system, check the license and budget for it — or use the built-in alternatives, like plain interfaces for messaging inside the process and a simple queue for background work.
A quick word on the modern .NET stack
If you are building today, you are in a good spot for scaling. .NET 10 is the current LTS (long-term support) release, so it gets years of updates and patches — a sensible base for a system you expect to grow. C# 14 has shipped. Looking ahead, C# 15 (with union types) is in .NET 11 preview.
None of these versions change the ladder above. The steps — measure, scale up, go stateless, scale out, cache, scale the database — are timeless. Newer runtimes mostly make each rung faster and cheaper, which is exactly what you want.
Common mistakes to avoid
- Scaling out before going stateless. Running many copies while keeping session in local memory causes random, hard-to-find bugs. Go stateless first, always.
- Caching everything. A cache full of rarely-read data wastes memory and hides bugs. Cache hot, slow, stable data only.
- Guessing the bottleneck. Teams often "optimize" the wrong thing. Measure first, every time.
- Ignoring the database. The fanciest app scaling cannot save a single overloaded database. Watch query times closely.
- Reaching for microservices too early. They add real complexity and operational cost. Climb the monolith ladder fully first.
Quick recap
- A monolith can scale very far. You do not need microservices to handle growth.
- Measure first. Find the true bottleneck before changing anything; it is often one bad query.
- Vertical scaling (a bigger machine) is the easiest first step, but it has a ceiling and no fault tolerance.
- Statelessness is the key that unlocks everything else: keep no user memory inside the app; share state in Redis or the database.
- Horizontal scaling runs many copies behind a load balancer like YARP, giving near-linear growth and fault tolerance.
- Caching reuses slow results; cache hot, rarely-changing data and always set an expiry. Invalidation is the hard part.
- Read replicas spread database reads so they stop fighting with writes; mind a little replication lag.
- Walk the ladder one rung at a time, and add complexity only when the current rung truly runs out.