SEMastery

Working With LLMs in .NET Using Microsoft.Extensions.AI

A beginner-friendly guide to calling large language models in .NET with Microsoft.Extensions.AI, using one simple IChatClient interface for any provider.

12 min read · Updated April 6, 2026

Imagine you travel around India and you want to ask for directions. In one town people speak Hindi, in another Tamil, in another Bengali. If you had to learn a brand new language in every town, travel would be exhausting. But imagine you had one kind translator who stands beside you everywhere. You speak to the translator once, and the translator handles every local language for you. Your job stays simple. You just ask your question.

That translator is exactly what Microsoft.Extensions.AI is for AI models in .NET.

Today there are many AI model providers: OpenAI, Azure OpenAI, Ollama, and others. Each one ships its own .NET SDK with its own method names and code style. If you wrote your app for one and later wanted to switch, you had to rewrite a lot. Microsoft.Extensions.AI gives you one common way to talk to all of them. You learn it once. You write your app once. You can swap the model later with a small change to the setup code.

In this guide we go slow, use simple words, and build understanding step by step. By the end you will be able to call a large language model from .NET and feel comfortable doing it.

What is an LLM, in plain words

LLM stands for large language model. It is an AI that has read a huge amount of text and learned how words usually follow each other. When you give it some words (a question, a sentence to finish, a document to summarize), it predicts good words to send back.

You can think of it like a very well-read friend. You ask, "Explain photosynthesis to a child," and the friend writes a clear, kind answer. The friend does not truly "know" facts the way a database does. Instead, it is very good at producing helpful text. That is both its strength and the reason we must check important answers.
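Here is a toy C# sketch that makes "predicts the next word" a little more concrete. It counts which word follows which in a tiny made-up sentence and then picks the most frequent follower. Real LLMs use huge neural networks and look at much longer context, so treat this only as an illustration of the idea. It is not how any real model is built. The names `follows` and `PredictNext` are invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A tiny made-up "training text" (toy example, not real model data).
string text = "the cat sat on the mat the cat ran to the door";
string[] words = text.Split(' ');

// Count how often each word is followed by each other word.
var follows = new Dictionary<string, Dictionary<string, int>>();
for (int i = 0; i < words.Length - 1; i++)
{
    string current = words[i], next = words[i + 1];
    if (!follows.ContainsKey(current))
        follows[current] = new Dictionary<string, int>();
    follows[current][next] = follows[current].GetValueOrDefault(next) + 1;
}

// "Predict" by picking the follower that was seen most often.
string PredictNext(string word) =>
    follows[word].OrderByDescending(pair => pair.Value).First().Key;

Console.WriteLine($"After 'the', the toy model predicts '{PredictNext("the")}'.");
```

In the sample text, "the" is followed by "cat" twice and by "mat" and "door" once each, so the toy model picks "cat".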

What problem Microsoft.Extensions.AI solves

Before this library, every AI provider had its own .NET package with its own method names and its own object shapes. Your code became glued to one provider. Switching meant pain.

Microsoft.Extensions.AI fixes this with abstractions. An abstraction is a shared shape that hides the messy details underneath. The two most important shapes are:

  • IChatClient — for chatting with a model.
  • IEmbeddingGenerator — for turning text into numbers for search.

Any provider can implement these shapes. Your app only talks to the shape, never to the provider directly.

One interface in the middle lets your app talk to many providers.

Picture your app on one side and the providers on the other. Your app only ever knows IChatClient, while the providers behind it (OpenAI, Azure OpenAI, Ollama, and so on) are interchangeable. This is the whole idea, and it is powerful because it keeps your code clean and free to move.
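The same idea can be sketched in plain C# without the library. In this hypothetical example, a `Func<string, string>` plays the role of the shared shape. Two fake "SDKs" with different method names stand in for real providers, and small adapter lambdas do the translating. All of the names here are invented for illustration. The real library uses the IChatClient interface rather than a `Func`.

```csharp
using System;

// Two hypothetical provider "SDKs" with different method names and parameters.
string CloudSdkComplete(string input) => $"cloud says: {input}";
string LocalSdkGenerate(string prompt, string modelName) => $"{modelName} says: {prompt}";

// Adapters translate each SDK into one shared shape. Here the shape is a Func;
// in Microsoft.Extensions.AI it is the IChatClient interface.
Func<string, string> cloudClient = question => CloudSdkComplete(question);
Func<string, string> localClient = question => LocalSdkGenerate(question, "llama");

// App code depends only on the shared shape, never on a specific SDK.
string AskForDirections(Func<string, string> chat) => chat("Which way to the station?");

Console.WriteLine(AskForDirections(cloudClient));
Console.WriteLine(AskForDirections(localClient));
```

Switching providers means passing a different adapter. `AskForDirections` itself never changes.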

The packages you will hear about

The library is split into a few NuGet packages. You do not need all of them at once, but it helps to know the names.

Package | What it gives you
Microsoft.Extensions.AI.Abstractions | The shared shapes like IChatClient and IEmbeddingGenerator. Tiny and provider-free.
Microsoft.Extensions.AI | The shapes plus helpers: caching, logging, tool calling, dependency injection.
Microsoft.Extensions.AI.OpenAI | A bridge so OpenAI and Azure OpenAI act as an IChatClient (and as an IEmbeddingGenerator).
Microsoft.Extensions.AI.Ollama | A bridge so a local Ollama model acts as an IChatClient. This preview package has since been deprecated. Its suggested replacement is the OllamaSharp package, whose OllamaApiClient implements IChatClient directly.

A simple rule for beginners: install Microsoft.Extensions.AI plus the one provider package you want. That is enough to start.

Your first chat call

Let us write the smallest possible example. We will use OpenAI here, but remember, the same IChatClient code works for other providers too. Only the setup line changes.

using Microsoft.Extensions.AI;
using OpenAI;
 
// 1. Build a chat client. This one line picks the provider.
IChatClient client =
    new OpenAIClient("YOUR_API_KEY")
        .GetChatClient("gpt-4o-mini")
        .AsIChatClient();
 
// 2. Ask a question and wait for the full answer.
ChatResponse response =
    await client.GetResponseAsync("Explain what an API is to a 10 year old.");
 
// 3. Print the text the model sent back.
Console.WriteLine(response.Text);

Read it slowly. The first block builds the client. The method AsIChatClient() is the magic translator step. It wraps the OpenAI-specific object so that the rest of your code only sees the shared IChatClient shape. The second block sends one message and waits. The third block prints the answer.

Notice that the only provider-specific lines are in step one. If you later wanted Ollama, you would change only that block. Everything below stays the same.

Messages and roles

In real chats you do not send just one line. You send a small history of the conversation, and each message has a role. Roles tell the model who is speaking.

Role | Meaning
System | Setup instructions for the model, like "You are a polite tutor."
User | What the human says.
Assistant | What the model said earlier, kept so it remembers context.

Here is how you build a small conversation with roles.

var messages = new List<ChatMessage>
{
    new(ChatRole.System, "You are a kind teacher. Keep answers short and simple."),
    new(ChatRole.User, "What is gravity?")
};
 
ChatResponse reply = await client.GetResponseAsync(messages);
Console.WriteLine(reply.Text);
 
// Keep the chat going by adding the reply and a new question.
messages.AddMessages(reply);
messages.Add(new(ChatRole.User, "Why don't we float away then?"));
 
ChatResponse followUp = await client.GetResponseAsync(messages);
Console.WriteLine(followUp.Text);

The System message sets the tone once. After that, you keep adding to the list and send the whole list on every call. The model does not remember earlier calls on its own, so this growing list is what lets a chatbot hold a real conversation.
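The key point is that the list holds the memory, not the model. The hypothetical sketch below uses plain `(Role, Text)` tuples in place of `ChatMessage`. A fake model function can only see the list it is handed, so you can watch the history grow between calls. `FakeModel` is invented for illustration and calls no real provider.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A conversation as (role, text) pairs, standing in for List<ChatMessage>.
var history = new List<(string Role, string Text)>
{
    ("system", "You are a kind teacher. Keep answers short and simple."),
    ("user", "What is gravity?")
};

// Hypothetical fake model: it keeps no state and only sees the list it is given.
string FakeModel(IReadOnlyList<(string Role, string Text)> messages)
{
    int userTurns = messages.Count(m => m.Role == "user");
    return $"Reply #{userTurns}, having seen {messages.Count} messages.";
}

string first = FakeModel(history);
history.Add(("assistant", first));                      // keep the reply, like AddMessages(reply)
history.Add(("user", "Why don't we float away then?"));  // then ask the follow-up

string second = FakeModel(history);
Console.WriteLine(first);
Console.WriteLine(second);
```

The second call "knows" about the first turn only because the earlier messages are still in the list.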

One chat turn: what happens each time you call GetResponseAsync
  1. Build messages: System + user roles.
  2. Call client: GetResponseAsync.
  3. Model thinks: the provider runs the LLM.
  4. Read response: use response.Text.

Streaming: words as they arrive

When you ask a long question, waiting for the whole answer can feel slow. Streaming lets you show words the moment they arrive, like watching someone type. The user sees progress immediately, which feels much faster and friendlier.

await foreach (ChatResponseUpdate update in
    client.GetStreamingResponseAsync("Write a short poem about the monsoon."))
{
    // Each update is a small piece of the answer. Print it right away.
    Console.Write(update.Text);
}
Console.WriteLine();

The difference is small in code but big in feeling. GetResponseAsync waits for everything. GetStreamingResponseAsync hands you pieces as they come. Use streaming for chat windows where the person is watching.

Streaming sends many small chunks instead of one big reply.
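A stream like this is consumed with `await foreach` over an `IAsyncEnumerable`. In the sketch below, `FakeStreamAsync` is a hypothetical stand-in for GetStreamingResponseAsync. It just yields a few pieces with a short delay, but the consuming loop has the same shape as the real one. The sketch also collects the pieces in a `StringBuilder`. That is handy when you want to show text live and still keep the whole answer afterwards.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;

// Hypothetical stand-in for GetStreamingResponseAsync: yields the answer in small pieces.
async IAsyncEnumerable<string> FakeStreamAsync()
{
    foreach (string chunk in new[] { "Rain ", "taps ", "the ", "roof." })
    {
        await Task.Delay(50); // pretend each piece takes a moment to arrive
        yield return chunk;
    }
}

var fullText = new StringBuilder();
await foreach (string piece in FakeStreamAsync())
{
    Console.Write(piece);   // show each piece the moment it arrives
    fullText.Append(piece); // also keep the whole answer for later
}
Console.WriteLine();
```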

Setting up with dependency injection

Real apps, like an ASP.NET Core web app, use dependency injection (DI). DI is a way to set up your services in one place so the rest of the app can just ask for them. Microsoft.Extensions.AI fits this style naturally, which is one reason it feels so at home in .NET.

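using Microsoft.Extensions.AI;
using OpenAI;

var builder = WebApplication.CreateBuilder(args);

// Register one IChatClient for the whole app. The key comes from configuration, not code.
builder.Services.AddChatClient(services =>
    new OpenAIClient(builder.Configuration["OpenAI:Key"]!)
        .GetChatClient("gpt-4o-mini")
        .AsIChatClient());

var app = builder.Build();

// Anywhere in your app, just ask for IChatClient.
// A simple string parameter like "question" is read from the query string: POST /ask?question=...
app.MapPost("/ask", async (IChatClient client, string question) =>
{
    ChatResponse answer = await client.GetResponseAsync(question);
    return Results.Ok(answer.Text);
});

app.Run();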

Notice the endpoint only asks for IChatClient. It has no idea OpenAI is behind it. That is the clean separation we keep talking about. To switch to Ollama, you change only the AddChatClient setup, and the /ask endpoint never knows the difference.

The middleware pipeline: adding powers

Here is where the library really shines. Just like ASP.NET Core lets you stack middleware for web requests, Microsoft.Extensions.AI lets you stack middleware around your chat client. Each layer adds a power without changing your core code.

Common ready-made layers include:

  • Logging — record what was asked and answered.
  • Caching — remember repeated questions to save time and money.
  • Telemetry — measure how long calls take using OpenTelemetry.
  • Function (tool) invocation — let the model call your C# methods.

Middleware wraps your client in layers, like an onion.

You build this pipeline with AsBuilder(). Each Use... call adds a layer, and the order matters, just like stacking real boxes: the first layer you add becomes the outermost one, so it sees each request first.

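// cache is an IDistributedCache and loggerFactory is an ILoggerFactory,
// for example both taken from dependency injection.
// The first Use... call becomes the outermost layer.
IChatClient client =
    new OpenAIClient("YOUR_API_KEY")
        .GetChatClient("gpt-4o-mini")
        .AsIChatClient()
        .AsBuilder()
        .UseDistributedCache(cache)      // outermost: remember repeated answers
        .UseLogging(loggerFactory)       // write logs
        .UseFunctionInvocation()         // allow tool calling
        .Build();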

The lovely part is that every layer works for any provider, because they all wrap the same IChatClient. Write the caching once, and it works whether you use OpenAI today or Ollama tomorrow.

Building a client pipeline: how AsBuilder stacks features step by step
  1. Base client: the provider client.
  2. Add cache: UseDistributedCache.
  3. Add logging: UseLogging.
  4. Add tools: UseFunctionInvocation.
  5. Build: the final IChatClient.
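To see why order matters, here is a hypothetical sketch of the same onion pattern. It uses plain C# delegates instead of the library's ChatClientBuilder. `useCache` and `useLogging` are invented stand-ins for the real UseDistributedCache and UseLogging layers. The "model" is a fake function that records each call. The layers are applied in reverse, so the first one listed ends up outermost.

```csharp
using System;
using System.Collections.Generic;

var log = new List<string>();

// Innermost "client": a fake model call that records every request it really handles.
Func<string, string> baseClient = prompt =>
{
    log.Add($"model called: {prompt}");
    return $"answer to {prompt}";
};

// Caching layer: answers repeated prompts from memory, otherwise calls inward.
Func<Func<string, string>, Func<string, string>> useCache = next =>
{
    var cache = new Dictionary<string, string>();
    return prompt =>
    {
        if (cache.TryGetValue(prompt, out var hit)) return hit;
        string answer = next(prompt);
        cache[prompt] = answer;
        return answer;
    };
};

// Logging layer: records every request that reaches it, then calls inward.
Func<Func<string, string>, Func<string, string>> useLogging = next => prompt =>
{
    log.Add($"logged: {prompt}");
    return next(prompt);
};

// Wrap from the last layer to the first, so the first layer listed is outermost.
var layers = new[] { useCache, useLogging };
Func<string, string> client = baseClient;
for (int i = layers.Length - 1; i >= 0; i--)
{
    client = layers[i](client);
}

client("What is rain?");
client("What is rain?"); // answered by the cache layer

Console.WriteLine(string.Join(Environment.NewLine, log));
```

Here the cache sits outside the logger. The repeated question is answered from the cache and never reaches the logging layer, so only two log lines appear. If you list `useLogging` first, both requests get logged.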

Tool calling: letting the model use your code

LLMs are good with words but bad at fresh facts. They do not know today's weather or the price in your database. Tool calling fixes this. You give the model some C# methods it is allowed to call. When it needs a fact, it asks to run your method, your code runs, and the answer goes back to the model.

using System.ComponentModel;
using Microsoft.Extensions.AI;

// A normal C# method. The description helps the model know when to use it.
// This stand-in returns a fixed sentence; a real app would call a weather service.
[Description("Gets the current weather for a city")]
static string GetWeather(string city) =>
    $"It is 31 degrees and sunny in {city}.";

var options = new ChatOptions
{
    Tools = [AIFunctionFactory.Create(GetWeather)]
};

// The client must have UseFunctionInvocation() in its pipeline.
ChatResponse response = await client.GetResponseAsync(
    "What is the weather in Chennai?", options);

Console.WriteLine(response.Text);

Behind the scenes, the model reads your question, decides it needs weather, asks to call GetWeather("Chennai"), your method runs, and the result flows back so the model can write a friendly final answer. The UseFunctionInvocation() layer handles all that back-and-forth for you automatically.
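The back-and-forth that UseFunctionInvocation() automates can be sketched as a simple loop. This is a hypothetical illustration, not the library's implementation. `FakeModel` is scripted to ask for `GetWeather` once and then answer. A real model decides for itself when to request a tool, and it sends structured function-call content rather than plain strings.

```csharp
using System;
using System.Collections.Generic;

// The tools the "model" may ask for, looked up by name.
var tools = new Dictionary<string, Func<string, string>>
{
    ["GetWeather"] = city => $"It is 31 degrees and sunny in {city}."
};

// Hypothetical scripted model: with no tool result yet it requests a tool call;
// once a tool result is in the transcript it writes a final answer.
(string? ToolName, string? Argument, string? FinalText) FakeModel(List<string> messages)
{
    string? toolResult = messages.Find(line => line.StartsWith("tool: ", StringComparison.Ordinal));
    if (toolResult is null)
        return ("GetWeather", "Chennai", null);
    return (null, null, "Good news! " + toolResult.Substring("tool: ".Length));
}

var transcript = new List<string> { "user: What is the weather in Chennai?" };
string finalAnswer;
while (true)
{
    var step = FakeModel(transcript);
    if (step.FinalText is not null)
    {
        finalAnswer = step.FinalText;
        break;
    }

    // The model asked for a tool: run the matching C# function and send the result back.
    string result = tools[step.ToolName!](step.Argument!);
    transcript.Add($"tool: {result}");
}

Console.WriteLine(finalAnswer);
```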

Embeddings: turning text into numbers

So far we used IChatClient to talk. The other big shape is IEmbeddingGenerator. It turns text into a list of numbers called an embedding. Text with similar meaning gets similar numbers. This is the foundation of search, recommendations, and the "find related" features you see everywhere.

using Microsoft.Extensions.AI;
using OpenAI;

IEmbeddingGenerator<string, Embedding<float>> generator =
    new OpenAIClient("YOUR_API_KEY")
        .GetEmbeddingClient("text-embedding-3-small")
        .AsIEmbeddingGenerator();

// Passing a single string returns a single Embedding<float>.
Embedding<float> embedding =
    await generator.GenerateAsync("A cat sat on a warm roof.");

// embedding.Vector is the list of numbers that captures the meaning.
Console.WriteLine($"This embedding has {embedding.Vector.Length} numbers.");

To search, you embed your documents once and store the numbers. When a user asks something, you embed their question and find the stored numbers that are closest. Those closest items are the most related in meaning. This is called semantic search. It is far smarter than matching exact words, because a question about cats can find a document about kittens even when the two share no words.
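"Closest" is commonly measured with cosine similarity, which is near 1 when two vectors point the same way. The sketch below ranks three documents against a question. It uses tiny made-up 3-number vectors, while real embeddings come from a model such as text-embedding-3-small and have many more numbers. `CosineSimilarity` is written by hand here, so the example needs nothing beyond the .NET base library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cosine similarity: near 1 when two vectors point the same way, near 0 when unrelated.
static float CosineSimilarity(float[] a, float[] b)
{
    float dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
}

// Made-up 3-number "embeddings", stored once for each document.
var documents = new Dictionary<string, float[]>
{
    ["A cat sat on a warm roof."] = new[] { 0.9f, 0.1f, 0.0f },
    ["Kittens love sunny spots."] = new[] { 0.8f, 0.2f, 0.1f },
    ["How to file income tax."] = new[] { 0.0f, 0.1f, 0.9f },
};

// Pretend this is the embedding of the question "Where do cats like to nap?".
float[] query = { 0.85f, 0.15f, 0.05f };

// Rank documents from most to least similar in meaning.
List<string> ranked = documents
    .OrderByDescending(doc => CosineSimilarity(query, doc.Value))
    .Select(doc => doc.Key)
    .ToList();

Console.WriteLine($"Best match: {ranked[0]}");
```

Both cat-related documents score close to 1. The tax document scores close to 0, so it lands last.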

A safe mental model for production

LLMs are helpful but not perfect. They can be confidently wrong, which people call hallucination. Keep these gentle habits:

  • Never trust important answers blindly. Verify facts that matter.
  • Set a System message to guide tone and rules.
  • Add caching to avoid paying for the same question twice.
  • Log requests so you can debug strange answers later.
  • Keep secret API keys out of your code, in configuration or a secret store.
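One simple way to keep a key out of your code is to read it from an environment variable at startup, and to stop with a clear message if it is missing. `RequireSetting` below is a hypothetical helper. The variable name `OPENAI_API_KEY` is an assumption (a common convention), not something Microsoft.Extensions.AI requires.

```csharp
using System;

// Hypothetical helper: read a secret from the environment and fail loudly if it is missing,
// so the key never has to appear in source code.
static string RequireSetting(string name) =>
    Environment.GetEnvironmentVariable(name) is { Length: > 0 } value
        ? value
        : throw new InvalidOperationException($"Set the {name} environment variable before running.");

// Check whether the (assumed) OPENAI_API_KEY variable is available, without printing it.
bool hasKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY") is { Length: > 0 };
Console.WriteLine(hasKey ? "OPENAI_API_KEY is set." : "OPENAI_API_KEY is not set yet.");

// Usage in a real app:
// string apiKey = RequireSetting("OPENAI_API_KEY");
// IChatClient client = new OpenAIClient(apiKey).GetChatClient("gpt-4o-mini").AsIChatClient();
```

In ASP.NET Core, `builder.Configuration["OpenAI:Key"]` already reads from several sources, environment variables among them, so a helper like this is mainly useful in small console apps.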

Because Microsoft.Extensions.AI uses the normal .NET configuration and DI patterns, all of these habits fit in cleanly. You are not learning a strange new world. You are using the .NET you already know, with AI plugged in.

Picking a provider

A quick beginner guide to choosing where your model runs:

  • Ollama (local) is free and private. Great for learning and for data you must keep on your own machine. Needs a decent computer.
  • OpenAI is fast and powerful, billed per use. Great when you want top quality and do not want to manage servers.
  • Azure OpenAI is the same models inside Microsoft Azure, with enterprise controls. Great for companies already on Azure.

The best part: because of the shared IChatClient, you can start on free local Ollama while learning, then move to a cloud provider later by changing one setup block. Your app logic never has to change.

Quick recap

  • Microsoft.Extensions.AI gives you one common way to talk to many AI providers in .NET.
  • IChatClient is for chatting. IEmbeddingGenerator is for turning text into numbers for search.
  • You build a client once with a provider, then your whole app only sees the shared interface.
  • GetResponseAsync waits for the full answer. GetStreamingResponseAsync shows words as they arrive.
  • Messages carry roles: System, User, and Assistant.
  • AsBuilder() lets you stack middleware like caching, logging, telemetry, and tool calling, and these work for any provider.
  • Tool calling lets the model run your C# methods to get fresh facts.
  • Embeddings power semantic search, where similar meaning gives similar numbers.
  • You can start on free local Ollama and move to a cloud provider later by changing one setup block.
