Working With LLMs in .NET Using Microsoft.Extensions.AI
A beginner-friendly guide to calling large language models in .NET with Microsoft.Extensions.AI, using one simple IChatClient interface for any provider.
Imagine you travel around India and you want to ask for directions. In one town people speak Hindi, in another Tamil, in another Bengali. If you had to learn a brand new language in every town, travel would be exhausting. But imagine you had one kind translator who stands beside you everywhere. You speak to the translator once, and the translator handles every local language for you. Your job stays simple. You just ask your question.
That translator is exactly what Microsoft.Extensions.AI is for AI models in .NET.
Today there are many AI model providers: OpenAI, Azure OpenAI, Ollama, and others. Each one used to have its own code style. If you wrote your app for one and later wanted to switch, you had to rewrite a lot. Microsoft.Extensions.AI gives you one common way to talk to all of them. You learn it once. You write your app once. You can swap the model later with a tiny change.
In this guide we go slow, use simple words, and build understanding step by step. By the end you will be able to call a large language model from .NET and feel comfortable doing it.
What is an LLM, in plain words
LLM stands for large language model. It is an AI that has read a huge amount of text and learned how words usually follow each other. When you give it some words (a question, a sentence to finish, a document to summarize), it predicts good words to send back.
You can think of it like a very well-read friend. You ask, "Explain photosynthesis to a child," and the friend writes a clear, kind answer. The friend does not truly "know" facts the way a database does. It is very good at producing helpful text. That is both its strength and the reason we must check important answers.
What problem Microsoft.Extensions.AI solves
Before this library, every AI provider had its own .NET package with its own method names and its own object shapes. Your code became glued to one provider. Switching meant pain.
Microsoft.Extensions.AI fixes this with abstractions. An abstraction is a shared shape that hides the messy details underneath. The two most important shapes are:
- IChatClient: for chatting with a model.
- IEmbeddingGenerator: for turning text into numbers for search.
Any provider can implement these shapes. Your app only talks to the shape, never to the provider directly.
Picture your app on the left and three providers on the right. Your app only ever knows IChatClient, and the providers behind it are interchangeable. This is the whole idea, and it is powerful because it keeps your code clean and free to move.
The packages you will hear about
The library is split into a few NuGet packages. You do not need all of them at once, but it helps to know the names.
| Package | What it gives you |
|---|---|
| Microsoft.Extensions.AI.Abstractions | The shared shapes like IChatClient and IEmbeddingGenerator. Tiny and provider-free. |
| Microsoft.Extensions.AI | The shapes plus helpers: caching, logging, tool calling, dependency injection. |
| Microsoft.Extensions.AI.OpenAI | A bridge so OpenAI and Azure OpenAI act as an IChatClient. |
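| Microsoft.Extensions.AI.Ollama | A bridge so a local Ollama model acts as an IChatClient. This package has been marked deprecated on NuGet, and the OllamaSharp package is the suggested replacement. |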
A simple rule for beginners: install Microsoft.Extensions.AI plus the one provider package you want. That is enough to start.
Your first chat call
Let us write the smallest possible example. We will use OpenAI here, but remember, the same IChatClient code works for other providers too. Only the setup line changes.
```csharp
using Microsoft.Extensions.AI;
using OpenAI;

// 1. Build a chat client. This one line picks the provider.
IChatClient client =
    new OpenAIClient("YOUR_API_KEY")
        .GetChatClient("gpt-4o-mini")
        .AsIChatClient();

// 2. Ask a question and wait for the full answer.
ChatResponse response =
    await client.GetResponseAsync("Explain what an API is to a 10 year old.");

// 3. Print the text the model sent back.
Console.WriteLine(response.Text);
```

Read it slowly. The first block builds the client. The method AsIChatClient() is the magic translator step. It wraps the OpenAI-specific object so that the rest of your code only sees the shared IChatClient shape. The second block sends one message and waits. The third block prints the answer.
Notice that the only provider-specific lines are in step one. If you later wanted Ollama, you would change only that block. Everything below stays the same.
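To make that swap concrete, here is a minimal sketch of the same three steps pointed at a local Ollama model. It assumes the community OllamaSharp package, whose OllamaApiClient class implements IChatClient (the Microsoft.Extensions.AI.Ollama package from the table exposes a similar OllamaChatClient class instead). The URL is Ollama's usual local default, and the model name "llama3.2" is only a placeholder for whatever model you have pulled.

```csharp
using System;
using Microsoft.Extensions.AI;
using OllamaSharp;

// 1. Build a chat client. Only this block differs from the OpenAI version.
//    Assumptions: Ollama is running locally on its default port, and the
//    placeholder model "llama3.2" has already been pulled.
IChatClient client =
    new OllamaApiClient(new Uri("http://localhost:11434"), "llama3.2");

// 2. Ask a question and wait for the full answer (unchanged).
ChatResponse response =
    await client.GetResponseAsync("Explain what an API is to a 10 year old.");

// 3. Print the text the model sent back (unchanged).
Console.WriteLine(response.Text);
```

Steps 2 and 3 are copied word for word from the OpenAI example, which is the point of the shared interface.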
Messages and roles
In real chats you do not send just one line. You send a small history of the conversation, and each message has a role. Roles tell the model who is speaking.
| Role | Meaning |
|---|---|
| System | Setup instructions for the model, like "You are a polite tutor." |
| User | What the human says. |
| Assistant | What the model said earlier, kept so it remembers context. |
Here is how you build a small conversation with roles.
```csharp
var messages = new List<ChatMessage>
{
    new(ChatRole.System, "You are a kind teacher. Keep answers short and simple."),
    new(ChatRole.User, "What is gravity?")
};

ChatResponse reply = await client.GetResponseAsync(messages);
Console.WriteLine(reply.Text);

// Keep the chat going by adding the reply and a new question.
messages.AddMessages(reply);
messages.Add(new(ChatRole.User, "Why don't we float away then?"));

ChatResponse followUp = await client.GetResponseAsync(messages);
Console.WriteLine(followUp.Text);
```

The System message sets the tone once. After that, you keep adding to the list so the model remembers what was already said. This is how a chatbot holds a real conversation.
One chat turn, step by step:

1. Build messages: System + user roles.
2. Call client: GetResponseAsync.
3. Model thinks: the provider runs the LLM.
4. Read response: use response.Text.
Streaming: words as they arrive
When you ask a long question, waiting for the whole answer can feel slow. Streaming lets you show words the moment they arrive, like watching someone type. The user sees progress immediately, which feels much faster and friendlier.
```csharp
await foreach (ChatResponseUpdate update in
    client.GetStreamingResponseAsync("Write a short poem about the monsoon."))
{
    // Each update is a small piece of the answer. Print it right away.
    Console.Write(update.Text);
}
Console.WriteLine();
```

The difference is small in code but big in feeling. GetResponseAsync waits for everything. GetStreamingResponseAsync hands you pieces as they come. Use streaming for chat windows where the person is watching.
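Often you want both things at once: show each piece immediately and still keep the full answer for later, for example to save it in the chat history. The sketch below shows one way to do that. To stay runnable without an API key it uses a made-up FakeStreamAsync helper in place of the model. With a real client you would loop over client.GetStreamingResponseAsync(...) and append update.Text instead.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;

// Hypothetical stand-in for a model stream: yields a few text pieces,
// pausing briefly between them the way a network stream would.
static async IAsyncEnumerable<string> FakeStreamAsync()
{
    foreach (string piece in new[] { "Rain ", "taps ", "the ", "roof." })
    {
        await Task.Delay(50);
        yield return piece;
    }
}

var fullText = new StringBuilder();

await foreach (string piece in FakeStreamAsync())
{
    Console.Write(piece);    // show each piece the moment it arrives
    fullText.Append(piece);  // and keep it so the whole answer survives
}
Console.WriteLine();

string answer = fullText.ToString();
Console.WriteLine($"Received {answer.Length} characters in total.");
```

A StringBuilder is used because it avoids creating a new string for every small piece.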
Setting up with dependency injection
Real apps, like an ASP.NET Core web app, use dependency injection (DI). DI is a way to set up your services in one place so the rest of the app can just ask for them. Microsoft.Extensions.AI fits this style naturally, which is one reason it feels so at home in .NET.
```csharp
using Microsoft.Extensions.AI;
using OpenAI;

var builder = WebApplication.CreateBuilder(args);

// Register one IChatClient for the whole app. The key comes from configuration.
builder.Services.AddChatClient(services =>
    new OpenAIClient(builder.Configuration["OpenAI:Key"]!)
        .GetChatClient("gpt-4o-mini")
        .AsIChatClient());

var app = builder.Build();

// Anywhere in your app, just ask for IChatClient.
// "question" is a plain string, so minimal APIs read it from the query string.
app.MapPost("/ask", async (IChatClient client, string question) =>
{
    ChatResponse answer = await client.GetResponseAsync(question);
    return Results.Ok(answer.Text);
});

app.Run();
```

Notice the endpoint only asks for IChatClient. It has no idea OpenAI is behind it. That is the clean separation we keep talking about. To switch to Ollama, you change only the AddChatClient setup, and the /ask endpoint never knows the difference.
The middleware pipeline: adding powers
Here is where the library really shines. Just like ASP.NET Core lets you stack middleware for web requests, Microsoft.Extensions.AI lets you stack middleware around your chat client. Each layer adds a power without changing your core code.
Common ready-made layers include:
- Logging — record what was asked and answered.
- Caching — remember repeated questions to save time and money.
- Telemetry — measure how long calls take using OpenTelemetry.
- Function (tool) invocation — let the model call your C# methods.
You build this pipeline with AsBuilder(). Each call adds a layer. The order matters: a layer added earlier wraps the layers added after it, so a request passes through the layers in the order you wrote them.
```csharp
// Assumes "cache" (an IDistributedCache) and "loggerFactory" (an ILoggerFactory)
// already exist, for example resolved from dependency injection.
IChatClient client =
    new OpenAIClient("YOUR_API_KEY")
        .GetChatClient("gpt-4o-mini")
        .AsIChatClient()
        .AsBuilder()
        .UseDistributedCache(cache)   // remember repeated answers
        .UseLogging(loggerFactory)    // write logs
        .UseFunctionInvocation()      // allow tool calling
        .Build();
```

The lovely part is that every layer works for any provider, because they all wrap the same IChatClient. Write the caching once, and it works whether you use OpenAI today or Ollama tomorrow.
Building a client pipeline, step by step:

1. Base client: the provider client.
2. Add cache: UseDistributedCache.
3. Add logging: UseLogging.
4. Add tools: UseFunctionInvocation.
5. Build: the final IChatClient.
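Why does the order of layers matter? The toy sketch below wraps a pretend model in a logging layer and a caching layer, using plain functions instead of the real library. Everything in it (model, WithLogging, WithCache) is made up for illustration and is not how Microsoft.Extensions.AI is implemented. It assumes, as the library's builder is understood to do, that the first layer added ends up on the outside.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

int modelCalls = 0;
var log = new List<string>();

// Innermost layer: a stand-in for the real provider call.
Func<string, Task<string>> model = question =>
{
    modelCalls++;
    return Task.FromResult($"Answer to: {question}");
};

// Logging layer: records every question that reaches it.
Func<string, Task<string>> WithLogging(Func<string, Task<string>> inner) =>
    async question =>
    {
        log.Add(question);
        return await inner(question);
    };

// Caching layer: answers repeated questions without calling inward.
Func<string, Task<string>> WithCache(Func<string, Task<string>> inner)
{
    var cache = new Dictionary<string, string>();
    return async question =>
    {
        if (cache.TryGetValue(question, out var hit)) return hit;
        string answer = await inner(question);
        cache[question] = answer;
        return answer;
    };
}

// Cache on the outside, logging inside it, model at the centre.
var pipeline = WithCache(WithLogging(model));

await pipeline("What is gravity?");
await pipeline("What is gravity?"); // served from the cache

Console.WriteLine($"Model calls: {modelCalls}, log entries: {log.Count}");
// prints "Model calls: 1, log entries: 1"
```

The second question never reaches the logging layer, because the cache sits outside it. If you wanted every request logged, cached or not, you would put logging on the outside instead.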
Tool calling: letting the model use your code
LLMs are good with words but bad at fresh facts. They do not know today's weather or the price in your database. Tool calling fixes this. You give the model some C# methods it is allowed to call. When it needs a fact, it asks to run your method, your code runs, and the answer goes back to the model.
```csharp
using System.ComponentModel;   // for [Description]
using Microsoft.Extensions.AI;

// A normal C# method. The description helps the model know when to use it.
[Description("Gets the current weather for a city")]
static string GetWeather(string city) =>
    $"It is 31 degrees and sunny in {city}.";

var options = new ChatOptions
{
    Tools = [AIFunctionFactory.Create(GetWeather)]
};

// The client must have UseFunctionInvocation() in its pipeline.
ChatResponse response = await client.GetResponseAsync(
    "What is the weather in Chennai?", options);

Console.WriteLine(response.Text);
```

Behind the scenes, the model reads your question, decides it needs weather, asks to call GetWeather("Chennai"), your method runs, and the result flows back so the model can write a friendly final answer. The UseFunctionInvocation() layer handles all that back-and-forth for you automatically.
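If that back-and-forth feels abstract, this toy loop may help. It is only a sketch of the idea and not the library's real code: FakeModel is a made-up stand-in that first asks for a tool and then, once it sees a tool result, writes a final answer. The "tool:" prefix and the transcript of plain strings are also invented for this example.

```csharp
using System;
using System.Collections.Generic;

// The tools the "model" is allowed to call, keyed by name.
var tools = new Dictionary<string, Func<string, string>>
{
    ["GetWeather"] = city => $"It is 31 degrees and sunny in {city}."
};

// Hypothetical stand-in for the model. It returns either a tool request
// (Tool and Argument set) or a final answer (Text set).
static (string? Tool, string? Argument, string? Text) FakeModel(List<string> transcript)
{
    string last = transcript[transcript.Count - 1];
    if (last.StartsWith("tool:"))
        return (null, null, "Here is the forecast. " + last.Substring("tool:".Length));
    return ("GetWeather", "Chennai", null);
}

var transcript = new List<string> { "user: What is the weather in Chennai?" };
string finalAnswer = "";

// The loop: keep calling the model until it stops asking for tools.
while (true)
{
    var step = FakeModel(transcript);
    if (step.Text is not null)
    {
        finalAnswer = step.Text;
        break;
    }

    // Run the requested C# method and feed its result back to the model.
    string result = tools[step.Tool!](step.Argument!);
    transcript.Add("tool:" + result);
}

Console.WriteLine(finalAnswer);
```

The real layer does the same kind of loop with proper message types, so you never have to write it yourself.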
Embeddings: turning text into numbers
So far we used IChatClient to talk. The other big shape is IEmbeddingGenerator. It turns text into a list of numbers called an embedding. Text with similar meaning gets similar numbers. This is the foundation of search, recommendations, and the "find related" features you see everywhere.
```csharp
IEmbeddingGenerator<string, Embedding<float>> generator =
    new OpenAIClient("YOUR_API_KEY")
        .GetEmbeddingClient("text-embedding-3-small")
        .AsIEmbeddingGenerator();

Embedding<float> vector =
    await generator.GenerateAsync("A cat sat on a warm roof.");

// vector.Vector is the list of numbers that captures the meaning.
Console.WriteLine($"This embedding has {vector.Vector.Length} numbers.");
```

To search, you embed your documents once and store the numbers. When a user asks something, you embed their question and find the stored numbers that are closest. Those closest items are the most related in meaning. This is called semantic search, and it is far smarter than matching exact words.
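What does "closest" mean in practice? A common choice is cosine similarity, which scores how much two lists of numbers point in the same direction. The sketch below ranks three documents against a question. The three-number vectors are invented by hand so the example runs on its own. Real embeddings come from the generator and have hundreds or thousands of numbers, which you could read with vector.Vector.ToArray().

```csharp
using System;
using System.Linq;

// Cosine similarity: close to 1 means similar meaning, close to 0 means unrelated.
static double CosineSimilarity(float[] a, float[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Toy "embeddings", made up by hand for illustration only.
var documents = new (string Text, float[] Vector)[]
{
    ("A cat sat on a warm roof.",    new[] { 0.9f, 0.1f, 0.0f }),
    ("Kittens love sunny rooftops.", new[] { 0.8f, 0.2f, 0.1f }),
    ("Quarterly tax filing rules.",  new[] { 0.0f, 0.1f, 0.9f }),
};

// Pretend this is the embedding of the question "Where do cats nap?".
float[] query = { 0.85f, 0.15f, 0.05f };

// Score every document against the question, best match first.
var ranked = documents
    .Select(d => (d.Text, Score: CosineSimilarity(query, d.Vector)))
    .OrderByDescending(r => r.Score)
    .ToArray();

foreach (var r in ranked)
    Console.WriteLine($"{r.Score:F3}  {r.Text}");
```

The two cat sentences score near 1 and the tax sentence scores near 0, even though the question shares no exact words with the second sentence. For more than a few thousand documents you would normally hand this job to a vector database instead of a loop.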
A safe mental model for production
LLMs are helpful but not perfect. They can be confidently wrong, which people call hallucination. Keep these gentle habits:
- Never trust important answers blindly. Verify facts that matter.
- Set a System message to guide tone and rules.
- Add caching to avoid paying for the same question twice.
- Log requests so you can debug strange answers later.
- Keep secret API keys out of your code, in configuration or a secret store.
Because Microsoft.Extensions.AI uses the normal .NET configuration and DI patterns, all of these habits fit in cleanly. You are not learning a strange new world. You are using the .NET you already know, with AI plugged in.
Picking a provider
A quick beginner guide to choosing where your model runs:
- Ollama (local) is free and private. Great for learning and for data you must keep on your own machine. Needs a decent computer.
- OpenAI is fast and powerful, billed per use. Great when you want top quality and do not want to manage servers.
- Azure OpenAI is the same models inside Microsoft Azure, with enterprise controls. Great for companies already on Azure.
The best part: because of the shared IChatClient, you can start on free local Ollama while learning, then move to a cloud provider later by changing one setup block. Your app logic never has to change.
Quick recap
- Microsoft.Extensions.AI gives you one common way to talk to many AI providers in .NET.
- IChatClient is for chatting. IEmbeddingGenerator is for turning text into numbers for search.
- You build a client once with a provider, then your whole app only sees the shared interface.
- GetResponseAsync waits for the full answer. GetStreamingResponseAsync shows words as they arrive.
- Messages carry roles: System, User, and Assistant.
- AsBuilder() lets you stack middleware like caching, logging, telemetry, and tool calling, and these work for any provider.
- Tool calling lets the model run your C# methods to get fresh facts.
- Embeddings power semantic search, where similar meaning gives similar numbers.
- You can start on free local Ollama and move to a cloud provider later by changing one setup block.