What does production-ready monitoring with health checks actually mean?

It means you have not just a single /health URL, but a small system. You have a fast liveness endpoint that tells an orchestrator if the app is stuck, a readiness endpoint that tells it if the app can take traffic, real checks for your database and cache, a place that stores the history of those checks, and an alert that wakes someone up when things stay red. Production-ready means the checks are honest, fast, safe to expose, and connected to the tools that act on them.

How fast should a health check be?

Very fast. A liveness check should finish in a few milliseconds because it runs every few seconds and a slow check can make an orchestrator think the app is dead. Keep heavy work like a full database query out of liveness. Readiness checks can do a little more, like a quick SELECT 1 against the database, but still aim to finish well under a second. Always set a timeout so one slow dependency cannot block the whole response.

Do I need the Xabaril health checks packages or just the built-in API?

The built-in API in Microsoft.Extensions.Diagnostics.HealthChecks ships with ASP.NET Core and is enough for custom checks. The community Xabaril packages save you time by giving ready-made checks for SQL Server, PostgreSQL, Redis, RabbitMQ, and many more, plus a small dashboard UI. Use the built-in API for your own logic and add Xabaril packages for common dependencies so you do not rewrite the same code.

Should health check endpoints be public?

The simple liveness and readiness endpoints should stay open because Kubernetes probes do not send auth headers by default. If you require authentication on them, the probes fail: a failing liveness probe restarts your pod in a loop, and a failing readiness probe keeps it out of rotation. Detailed endpoints that show database names, versions, or timings should be protected or exposed only on an internal port, because that information helps attackers.

What is the difference between Degraded and Unhealthy?

Healthy means everything works. Unhealthy means a required dependency failed and the app cannot do its job, so it returns HTTP 503. Degraded means the app still works but something is not great, like a slow cache, and by default it still returns HTTP 200. Use Degraded for warnings you want to see on a dashboard without taking the app out of rotation.

DevOps · Intermediate

How to Set Up Production-Ready Monitoring With ASP.NET Core Health Checks

A friendly, step-by-step guide to production-ready monitoring with ASP.NET Core health checks: liveness, readiness, dependency checks, a UI, and probes.

12 min read · Updated March 23, 2026

A night watchman doing his rounds

Picture a big housing society in your city. At night, one watchman walks around with a torch. He does not inspect every single flat. He does a quick round: is the main gate locked, is the water pump running, are the stairwell lights on, is the lift working? In two minutes he knows if the society is fine, if something small is off, or if there is a real problem that needs the manager right now.

A health check in ASP.NET Core is that night watchman, but for your web app. Instead of a person, a robot does the round — usually Kubernetes, a load balancer, or an uptime monitor. Every few seconds it visits a special URL and asks one simple question: "Are you okay?"

Your app runs a few quick checks — can it reach the database, is Redis awake, is there enough disk space — and answers with one of three words: Healthy, Degraded, or Unhealthy. The robot then decides what to do next: keep sending people to your app, slow down, or restart it.

A single /health URL is a good start, but it is not enough for real production. The watchman needs a proper routine, a logbook, and a way to call the manager. This guide shows you how to build that full routine step by step, in plain language.

What "production-ready" really adds

Many tutorials stop after one endpoint. Real systems need more. Here is the difference between a toy setup and a production setup.

Concern	Toy setup	Production-ready setup
Endpoints	One `/health` URL	Separate liveness and readiness URLs
Checks	Always returns "OK"	Real database, cache, and disk checks
Speed	No timeout	Per-check timeout, fast liveness
History	None	Stored results you can look back at
Safety	Public detailed output	Open probes, protected details
Alerts	Someone notices later	Automatic alert when red

The goal is honesty. A health check that always says "Healthy" is worse than none, because it gives false comfort. Your checks must tell the truth even when it hurts.

Figure: Who calls your health endpoints and what they do with the answer

Step 1: Add the built-in health check service

The core health check API ships inside ASP.NET Core, so you do not need any extra package to begin. You register the service and map an endpoint.

var builder = WebApplication.CreateBuilder(args);
 
builder.Services.AddHealthChecks();
 
var app = builder.Build();
 
app.MapHealthChecks("/healthz");
 
app.Run();

Now visit /healthz in a browser. With no checks added yet, it returns the plain text Healthy and HTTP status 200. That is the watchman saying "I exist." Useful, but he has not looked at anything yet.

The three possible results are worth remembering. They map to HTTP status codes by default like this.

Status	Meaning	Default HTTP code
Healthy	Everything works	200
Degraded	Works, but not great	200
Unhealthy	A required part failed	503

Notice that Degraded still returns 200. That is on purpose. A degraded app should keep serving people while you investigate. Only Unhealthy pulls it out of rotation.
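
The mapping in the table is a default, not a fixed rule. As a small sketch, here is the same mapping written out explicitly through the ResultStatusCodes property of HealthCheckOptions. It assumes a standard ASP.NET Core web project with implicit usings, and it changes nothing by itself; it only shows where you would edit a code if, say, your load balancer should treat Degraded differently.

```csharp
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

var builder = WebApplication.CreateBuilder(args);
builder.Services.AddHealthChecks();

var app = builder.Build();

app.MapHealthChecks("/healthz", new HealthCheckOptions
{
    // These three entries repeat the defaults from the table above.
    ResultStatusCodes =
    {
        [HealthStatus.Healthy] = StatusCodes.Status200OK,
        [HealthStatus.Degraded] = StatusCodes.Status200OK,
        [HealthStatus.Unhealthy] = StatusCodes.Status503ServiceUnavailable
    }
});

app.Run();
```

Changing the Degraded entry to 503 would pull a degraded app out of rotation, so think twice before doing that on a readiness endpoint.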

Step 2: Split liveness from readiness

This is the most important idea in the whole guide. Two questions sound similar but are very different.

Liveness: "Is the app alive, or is it stuck and needs a restart?"
Readiness: "Is the app ready to take requests right now?"

An app can be alive but not ready. Think of a shop where the shutter is up and the lights are on (alive), but the staff are still counting the cash register and have not opened the counter yet (not ready). You would not send customers in yet, but you also would not knock the building down.

Kubernetes treats these very differently. If liveness fails, it restarts the pod. If readiness fails, it stops sending traffic but leaves the pod running so it can recover.

Liveness vs readiness decisions

What the orchestrator does with each answer:

Probe result	Orchestrator action
Live fails	Restart the pod
Ready fails	Stop traffic, keep pod
Both pass	Send traffic normally

We separate them using tags. Every check gets a tag like live or ready, and each endpoint runs only the checks with the matching tag.

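using Microsoft.AspNetCore.Diagnostics.HealthChecks; // HealthCheckOptions
using Microsoft.Extensions.Diagnostics.HealthChecks; // HealthCheckResult
 
builder.Services.AddHealthChecks()
    // A tiny check that proves the app loop is alive.
    .AddCheck("self", () => HealthCheckResult.Healthy(), tags: ["live"])
    // A placeholder readiness check; Step 3 replaces it with real dependency checks.
    .AddCheck("ready-gate", () => HealthCheckResult.Healthy(), tags: ["ready"]);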
 
app.MapHealthChecks("/healthz/live", new HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("live")
});
 
app.MapHealthChecks("/healthz/ready", new HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("ready")
});

The Predicate is the filter. The live endpoint runs only live-tagged checks, so it stays tiny and fast. The ready endpoint runs the heavier dependency checks. Keep liveness almost empty — often just a "self" check that returns Healthy. If liveness does a database query and the database hiccups, Kubernetes will kill a perfectly fine pod for no reason.

Figure: How tags route checks to the right endpoint

Step 3: Check your real dependencies

A health check that only returns Healthy is dishonest. The watchman must actually look at the water pump. For common dependencies, the community Xabaril project gives you ready-made checks so you do not write the same code again. Install the packages you need.

dotnet add package AspNetCore.HealthChecks.SqlServer
dotnet add package AspNetCore.HealthChecks.Npgsql
dotnet add package AspNetCore.HealthChecks.Redis

Then wire them up. Give each one the ready tag so it runs on the readiness endpoint, not on liveness.

var sql = builder.Configuration.GetConnectionString("Sql")!;
var redis = builder.Configuration.GetConnectionString("Redis")!;
 
builder.Services.AddHealthChecks()
    .AddCheck("self", () => HealthCheckResult.Healthy(), tags: ["live"])
    .AddSqlServer(
        connectionString: sql,
        name: "sql-database",
        tags: ["ready"],
        timeout: TimeSpan.FromSeconds(3))
    .AddRedis(
        redisConnectionString: redis,
        name: "redis-cache",
        tags: ["ready"],
        timeout: TimeSpan.FromSeconds(2));

Two things matter here. First, every dependency check has a timeout. Without one, a frozen database could make your readiness response hang forever, which is as bad as the app being down. Second, the checks are tagged ready, so a slow cache will only stop new traffic, not trigger a restart.
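
One more knob is worth knowing. By default a failed check reports Unhealthy, so a dead cache would fail readiness and stop traffic. If your app can run without the cache, you can ask the check to report Degraded instead. The sketch below assumes the same "Redis" connection string as above and the AspNetCore.HealthChecks.Redis package; whether the cache really is optional is a decision only you can make.

```csharp
using Microsoft.Extensions.Diagnostics.HealthChecks;

var builder = WebApplication.CreateBuilder(args);
var redis = builder.Configuration.GetConnectionString("Redis")!;

builder.Services.AddHealthChecks()
    .AddRedis(
        redisConnectionString: redis,
        name: "redis-cache",
        // A failing cache becomes a warning instead of a readiness failure.
        failureStatus: HealthStatus.Degraded,
        tags: ["ready"],
        timeout: TimeSpan.FromSeconds(2));
```

With this setting the readiness endpoint still answers 200 when Redis is down, and the warning shows up in your detailed report and dashboard instead.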

Writing your own custom check

Sometimes you need to check something specific, like whether a downstream payment API answers. You write a small class that implements IHealthCheck.

public sealed class PaymentApiHealthCheck(IHttpClientFactory factory) : IHealthCheck
{
    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        try
        {
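            // "/ping" is a relative URI, so the named client "payments"
            // must be registered with a BaseAddress.
            var client = factory.CreateClient("payments");
            using var response = await client.GetAsync("/ping", cancellationToken);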
 
            if (response.IsSuccessStatusCode)
                return HealthCheckResult.Healthy("Payment API responded.");
 
            // Still up, but the dependency is unhappy — warn, do not kill.
            return HealthCheckResult.Degraded(
                $"Payment API returned {(int)response.StatusCode}.");
        }
        catch (Exception ex)
        {
            return HealthCheckResult.Unhealthy("Payment API unreachable.", ex);
        }
    }
}

builder.Services.AddHealthChecks()
    .AddCheck<PaymentApiHealthCheck>("payment-api", tags: ["ready"]);

Notice the three return paths. A clean success is Healthy. An odd status code is Degraded — the app still works, but you want to see the warning. A thrown exception is Unhealthy. Choosing the right level is the real skill. Mark something Unhealthy only if the app truly cannot do its job without it.

Choosing the right status

A simple rule for which result to return:

App state	Result to return
Works fully	Return Healthy
Works partly	Return Degraded
Cannot work	Return Unhealthy

Step 4: Return useful JSON, but keep it safe

The default response is the single word Healthy. Robots are happy with that, but humans debugging an incident want detail: which check failed, how long it took, and why. You can shape the response with a custom writer.

app.MapHealthChecks("/healthz/ready", new HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("ready"),
    ResponseWriter = async (context, report) =>
    {
        context.Response.ContentType = "application/json";
        var payload = new
        {
            status = report.Status.ToString(),
            totalDurationMs = report.TotalDuration.TotalMilliseconds,
            checks = report.Entries.Select(e => new
            {
                name = e.Key,
                status = e.Value.Status.ToString(),
                durationMs = e.Value.Duration.TotalMilliseconds,
                description = e.Value.Description
            })
        };
        await context.Response.WriteAsJsonAsync(payload);
    }
});

Now the readiness URL returns a tidy JSON object that a person can read during an incident. But here is the safety rule: this detailed output is also useful to attackers. It can leak server names, dependency versions, and your internal structure.

So follow this pattern:

Keep /healthz/live and /healthz/ready simple and open. Kubernetes probes do not send auth headers by default, so locking these down breaks the probes. In practice that means leaving the default plain-text response on both.
Put the rich JSON writer shown above and any dashboard on a separate, protected endpoint: behind authorization, or on an internal-only port that the public cannot reach.
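
Here is one way that pattern could look, as a sketch rather than a finished setup. The path /healthz/details and the ports 8080 and 9090 are made-up names for this example. It uses RequireHost to answer the detailed endpoint only on the internal port. Note that RequireHost matches on the Host header, which a client can fake, so treat it as a routing filter and make sure your network rules really keep port 9090 private.

```csharp
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

var builder = WebApplication.CreateBuilder(args);

// Hypothetical ports: 8080 for public traffic and probes, 9090 for internal tools.
builder.WebHost.UseUrls("http://*:8080", "http://*:9090");

builder.Services.AddHealthChecks()
    .AddCheck("self", () => HealthCheckResult.Healthy(), tags: ["live"]);

var app = builder.Build();

// Open probe: default plain-text answer, nothing to leak.
app.MapHealthChecks("/healthz/live", new HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("live")
});

// Detailed report: all checks, JSON body, matched only on the internal port.
app.MapHealthChecks("/healthz/details", new HealthCheckOptions
{
    ResponseWriter = (context, report) => context.Response.WriteAsJsonAsync(new
    {
        status = report.Status.ToString(),
        checks = report.Entries.Select(e => new
        {
            name = e.Key,
            status = e.Value.Status.ToString(),
            description = e.Value.Description
        })
    })
}).RequireHost("*:9090");

app.Run();
```

If your app already has authentication and authorization configured, calling RequireAuthorization() on the detailed endpoint is the other common choice.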

Figure: Public probes stay open, rich detail stays protected

Step 5: Add a dashboard and store history

A single check tells you "now." Production needs "what happened." The Xabaril UI package gives you a small web dashboard that polls your endpoints and draws a history.

dotnet add package AspNetCore.HealthChecks.UI
dotnet add package AspNetCore.HealthChecks.UI.Client
dotnet add package AspNetCore.HealthChecks.UI.InMemory.Storage

builder.Services
    .AddHealthChecksUI(setup =>
    {
        setup.AddHealthCheckEndpoint("ready", "/healthz/ready");
        setup.SetEvaluationTimeInSeconds(15);
    })
    .AddInMemoryStorage();
 
app.MapHealthChecksUI(options => options.UIPath = "/health-ui");
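
One detail is easy to miss. As far as I know, the dashboard expects the endpoint it polls to answer in its own JSON format, which is what the UIResponseWriter class from the AspNetCore.HealthChecks.UI.Client package produces. A custom writer like the one in Step 4 has a different shape. The sketch below gives the dashboard its own endpoint; the path /healthz/ui-data is a made-up name, and the single "self" check stands in for your real ready-tagged checks.

```csharp
using HealthChecks.UI.Client;
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddHealthChecks()
    .AddCheck("self", () => HealthCheckResult.Healthy(), tags: ["ready"]);

builder.Services
    .AddHealthChecksUI(setup =>
    {
        // Point the dashboard at the endpoint that speaks its format.
        setup.AddHealthCheckEndpoint("ready", "/healthz/ui-data");
        setup.SetEvaluationTimeInSeconds(15);
    })
    .AddInMemoryStorage();

var app = builder.Build();

// Same ready-tagged checks, written in the format the dashboard reads.
app.MapHealthChecks("/healthz/ui-data", new HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("ready"),
    ResponseWriter = UIResponseWriter.WriteHealthCheckUIResponse
});

app.MapHealthChecksUI(options => options.UIPath = "/health-ui");

app.Run();
```

This endpoint carries detailed output, so it belongs with the protected endpoints, not with the open probes.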

One warning about storage. AddInMemoryStorage is easy, but it forgets everything when the app restarts. If you want history that survives restarts and crashes — which is exactly when you most want it — use a database-backed store instead, such as AspNetCore.HealthChecks.UI.SqlServer.Storage or the PostgreSQL equivalent. With persistent storage, after an outage you can look back and see precisely when the database started failing.
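
Switching stores is a small change. This sketch assumes you have installed AspNetCore.HealthChecks.UI.SqlServer.Storage and that a connection string named "HealthChecksUi" exists in your configuration; that name is invented for the example.

```csharp
var builder = WebApplication.CreateBuilder(args);

// "HealthChecksUi" is a hypothetical connection string name for this sketch.
var historyDb = builder.Configuration.GetConnectionString("HealthChecksUi")!;

builder.Services
    .AddHealthChecksUI(setup =>
    {
        setup.AddHealthCheckEndpoint("ready", "/healthz/ready");
        setup.SetEvaluationTimeInSeconds(15);
    })
    // Takes the place of AddInMemoryStorage(), so history survives a restart.
    .AddSqlServerStorage(historyDb);

var app = builder.Build();

app.MapHealthChecksUI(options => options.UIPath = "/health-ui");

app.Run();
```

It is usually wise to keep this history in a different database from the one you are checking, so an outage of the main database does not also take the logbook down.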

Protect the /health-ui path the same way you protect the detailed endpoint. It shows your internal map, so it is not for the public.

Step 6: Push results instead of waiting

Everything so far is pull-based: a robot calls your URL. ASP.NET Core also supports push-based monitoring through IHealthCheckPublisher. When you register a publisher, the framework runs your checks on a timer and hands the result to your code. You can then push that result anywhere — a metrics system, a logging pipeline, or a Slack alert.

builder.Services.Configure<HealthCheckPublisherOptions>(options =>
{
    options.Delay = TimeSpan.FromSeconds(5);
    options.Period = TimeSpan.FromSeconds(30);
    options.Predicate = check => check.Tags.Contains("ready");
});
 
builder.Services.AddSingleton<IHealthCheckPublisher, SlackAlertPublisher>();

This is how you turn "someone will notice eventually" into "someone gets paged in 30 seconds." The publisher runs even when no one is calling your endpoints, so it works well with dashboards like Grafana or with an on-call tool.
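
The SlackAlertPublisher registered above is your own class, so here is a minimal sketch of what it might look like. Several things are assumptions: the configuration key "Slack:WebhookUrl" is made up, the class posts a simple text message to a Slack incoming webhook, and you have called builder.Services.AddHttpClient() so that IHttpClientFactory can be injected. It only sends a message when the overall status changes, so you are not pinged every 30 seconds.

```csharp
using System.Net.Http.Json;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Sketch only: "Slack:WebhookUrl" is a hypothetical configuration key.
public sealed class SlackAlertPublisher(IHttpClientFactory factory, IConfiguration config)
    : IHealthCheckPublisher
{
    private HealthStatus? _lastStatus; // null until the first report arrives

    public async Task PublishAsync(HealthReport report, CancellationToken cancellationToken)
    {
        // Stay quiet while nothing changes.
        if (_lastStatus == report.Status) return;

        // Do not announce a healthy first run.
        if (_lastStatus is null && report.Status == HealthStatus.Healthy)
        {
            _lastStatus = report.Status;
            return;
        }

        var webhookUrl = config["Slack:WebhookUrl"]
            ?? throw new InvalidOperationException("Slack:WebhookUrl is not configured.");

        var client = factory.CreateClient();
        using var response = await client.PostAsJsonAsync(
            webhookUrl, new { text = BuildMessage(report) }, cancellationToken);
        response.EnsureSuccessStatusCode();

        // Remember the status only after a successful post, so a failed post is retried.
        _lastStatus = report.Status;
    }

    // Builds the alert text: the overall status plus every check that is not Healthy.
    public static string BuildMessage(HealthReport report)
    {
        var failing = report.Entries
            .Where(e => e.Value.Status != HealthStatus.Healthy)
            .Select(e => $"{e.Key} is {e.Value.Status}");

        return $"Health is now {report.Status}. {string.Join("; ", failing)}".TrimEnd();
    }
}
```

Because the publisher is a singleton, the remembered status lives for the life of the process. A restart forgets it, which is fine for alerts but is another reason not to use it as your history store.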

Figure: Pull-based probes versus push-based publishing

Step 7: Wire the probes into Kubernetes

Finally, tell Kubernetes which URL is which. This is where all the earlier work pays off. The liveness probe points at the tiny live endpoint; the readiness probe points at the heavier ready endpoint.

livenessProbe:
  httpGet:
    path: /healthz/live
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /healthz/ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5

If your app needs a long warm-up (loading a big cache, running migrations), add a startup probe as well. It gives the app extra time to boot before the liveness probe starts judging it, so a slow start does not get mistaken for a crash.

Probe lifecycle on deploy

The order probes run during a rollout:

Probe	What it does
Startup	Wait for boot to finish
Readiness	Open traffic when ready
Liveness	Restart if stuck later

Common mistakes to avoid

Heavy liveness checks. Querying the database in liveness causes pointless restarts when the database blips. Keep liveness to a self-check.
No timeouts. A frozen dependency without a timeout makes the whole response hang, which looks like a crash.
Auth on probes. Kubernetes probes send no auth headers by default. Locking the endpoints down makes every probe fail, which means restarts from liveness and no traffic from readiness.
Leaking details publicly. Rich JSON helps attackers. Keep it behind protection.
Forgetting history. In-memory storage loses data on restart — the worst moment to lose it.
Always-green checks. A check that never fails is worse than no check, because you trust a lie.

Quick recap

A health check is a quick "are you okay?" round that robots run against your app.
Split liveness (restart if stuck) from readiness (stop traffic if not ready) using tags and a Predicate.
Keep liveness tiny and fast; put real database, cache, and API checks on readiness with a timeout.
Return three honest levels: Healthy, Degraded (still 200), and Unhealthy (503).
Keep simple probe endpoints open; protect rich JSON and the UI dashboard.
Use persistent storage so check history survives restarts.
Add an IHealthCheckPublisher to push alerts instead of waiting to be asked.
Map livenessProbe and readinessProbe (and a startupProbe for slow boots) in Kubernetes.
