How to Extract Structured Data From Images Using Ollama in .NET
A beginner-friendly guide to reading text and fields from images using a local Ollama vision model in .NET, returning clean, typed JSON every time.
Think about a kind shopkeeper near your home. You hand him a crumpled paper receipt and ask, "How much did I spend, and on what day?" He glances at it for a second and says, "Two hundred and forty rupees, last Tuesday, from the grocery store." He looked at a messy piece of paper and gave you back clean, useful answers.
That is exactly what we want our program to do. We will give it a photo (a receipt, an ID card, a form) and we want back tidy fields: shop name, date, total. Not one big paragraph. Neat values we can save in a database.
To do this we will use Ollama, a free tool that runs AI models on your own computer, and .NET. The best part is that the picture never leaves your machine. No cloud bill. No data sent away. We go slow, use simple words, and build it step by step.
What we are building
We are building a small .NET app that:
- Loads an image from disk.
- Sends it to a local vision model running in Ollama.
- Asks for the answer in a fixed shape (a schema).
- Gets back a clean C# object we can use right away.
Let us understand the words first, because they sound scary but the ideas are simple.
Three small ideas
A vision model is an AI model that can "see" pictures, not just read text. You give it an image and a question, and it answers in words. Normal language models only read text. Vision models read images too. They are also called multimodal models, because they handle more than one kind of input.
Ollama is a free program that downloads these models and runs them on your computer. Think of it like a music app, but instead of songs it stores AI models, and instead of playing music it answers your questions. It listens on your machine at the address http://localhost:11434.
Structured output means we force the answer into a fixed shape. Without it, the model might reply, "This looks like a grocery bill for about 240 rupees." That sentence is hard for code to use. With structured output, we say, "Give me a JSON object with shopName, date, and total." Now the answer is predictable, and our program can read it safely.
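To make the difference concrete, here is a small sketch that needs no model at all. It only uses System.Text.Json from the .NET base library. The names ReceiptFields and StructuredReplyDemo are made up for this illustration, and the JSON you pass in stands for what a structured reply could look like.

```csharp
using System.Text.Json;

// Illustrative type: the fixed shape we want the answer to have.
public record ReceiptFields(string ShopName, string Date, decimal Total);

public static class StructuredReplyDemo
{
    // Web defaults make property matching case-insensitive,
    // so "shopName" in the JSON fills ShopName on the record.
    private static readonly JsonSerializerOptions Options =
        new(JsonSerializerDefaults.Web);

    // Returns the parsed fields, or null when the text is not valid JSON
    // (for example, a free-text sentence).
    public static ReceiptFields? TryParse(string reply)
    {
        try
        {
            return JsonSerializer.Deserialize<ReceiptFields>(reply, Options);
        }
        catch (JsonException)
        {
            return null;
        }
    }
}
```

A JSON reply with shopName, date, and total parses into a ReceiptFields value. A sentence such as "This looks like a grocery bill for about 240 rupees." gives null, because there is nothing for code to hold on to.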
[Figure: From messy photo to clean fields. Photo (a real receipt image) → Model reads (vision model sees the text) → Schema applied (we demand fixed fields) → Typed object (clean C# values to use).]
Step 1: Install Ollama and a vision model
First, install Ollama from the official website. It works on Windows, macOS, and Linux. Once it is installed, it runs quietly in the background.
Next, pull a vision model. We will start with a small, popular one. Open your terminal and run this:
ollama pull llava
This downloads the LLaVA model. It is a friendly starting point for reading images. If you later need better accuracy on documents and receipts, you can try a stronger model:
ollama pull qwen2.5vl
You can check that the model is ready with ollama list. Here is a quick guide to help you choose.
| Model | Size (approx) | Good for | Memory needed |
|---|---|---|---|
| llava | 7B | General images, simple text | 8 GB+ |
| qwen2.5vl | 7B | Documents, receipts, charts | 8 GB+ |
| llama3.2-vision | 11B | Detailed reading, harder layouts | 12 GB+ |
Start small. A 7B model runs on most laptops. Only move to a bigger one if your test images come out wrong. Bigger models are slower and need more memory, so do not jump to the largest one without a reason.
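If you would rather check for the model from code than from the terminal, a rough sketch follows. It assumes Ollama's /api/tags endpoint, which returns a JSON object with a "models" array whose entries carry a "name" such as "llava:latest". The class OllamaModels and its method names are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

public static class OllamaModels
{
    // Pulls the "name" of every entry in the "models" array.
    public static List<string> ParseModelNames(string tagsJson)
    {
        using JsonDocument doc = JsonDocument.Parse(tagsJson);
        var names = new List<string>();
        if (doc.RootElement.TryGetProperty("models", out JsonElement models)
            && models.ValueKind == JsonValueKind.Array)
        {
            foreach (JsonElement model in models.EnumerateArray())
            {
                if (model.TryGetProperty("name", out JsonElement name)
                    && name.GetString() is string value)
                {
                    names.Add(value);
                }
            }
        }
        return names;
    }

    // True when the model is installed under any tag, so "llava"
    // matches "llava:latest" as well as "llava:13b".
    public static bool IsInstalled(IEnumerable<string> names, string model) =>
        names.Any(n => n.Equals(model, StringComparison.OrdinalIgnoreCase)
            || n.StartsWith(model + ":", StringComparison.OrdinalIgnoreCase));

    // Asks the local server which models it has (default address assumed).
    public static async Task<List<string>> ListAsync(HttpClient http)
    {
        string json = await http.GetStringAsync("http://localhost:11434/api/tags");
        return ParseModelNames(json);
    }
}
```

Calling OllamaModels.ListAsync with an HttpClient and then IsInstalled(names, "llava") tells you whether the pull worked. If the call throws, Ollama is most likely not running.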
Step 2: Set up the .NET project
Now we create a console app. We will use .NET 10, which is the current long-term support version. Run these commands:
dotnet new console -n ReceiptReader
cd ReceiptReader
dotnet add package OllamaSharp
dotnet add package Microsoft.Extensions.AI
Two packages do the heavy lifting:
- OllamaSharp is a .NET library that talks to Ollama for us. It is the easiest way to use Ollama from C#.
- Microsoft.Extensions.AI gives us a shared interface called IChatClient. This is a common "shape" that many AI providers follow. Because OllamaSharp follows it too, our code looks the same as it would for other providers. If you ever switch providers, you change very little.
The reason this layering matters is simple: your code only ever speaks to IChatClient. It does not care which library or model is behind it. That keeps your app tidy and easy to change later.
Step 3: Connect to Ollama
Let us write the connection code. We point OllamaSharp at the local Ollama address and tell it which model to use.
using Microsoft.Extensions.AI;
using OllamaSharp;
// Point at the local Ollama server and pick the vision model.
IChatClient client = new OllamaApiClient(
    new Uri("http://localhost:11434/"),
    "llava");
// A quick text-only test to confirm the connection works.
var hello = await client.GetResponseAsync("Say hello in one short sentence.");
Console.WriteLine(hello.Text);
Notice the type is IChatClient, not OllamaApiClient. We store it in the interface. This is good practice. Our later code does not need to know the brand of the engine, only that it can chat.
If this prints a friendly hello, your setup works. If it fails, make sure Ollama is running and that you pulled the model. Most early errors are just a model that was not downloaded yet, or Ollama not started.
Step 4: Send an image to the model
Now the fun part. We load an image from disk and send it along with a question. In Microsoft.Extensions.AI, an image is wrapped in a DataContent. We put it inside a chat message together with our text.
// Read the image bytes from disk.
byte[] imageBytes = await File.ReadAllBytesAsync("receipt.jpg");
// Build a message that holds BOTH text and the image.
var message = new ChatMessage(ChatRole.User,
[
    new TextContent("Read this receipt and list the shop, date, and total."),
    new DataContent(imageBytes, "image/jpeg")
]);
var response = await client.GetResponseAsync(message);
Console.WriteLine(response.Text);
This already works. The model will look at the picture and describe it in words. But there is a problem. The reply is free text. One time it might say "Total: 240". Another time it might say "The amount comes to about 240 rupees". Our code cannot rely on a moving target. We need fixed fields. That is the next step.
[Figure: Why free text is risky. Three runs, three different replies: Run 1 'Total is 240', Run 2 'About Rs 240', Run 3 'Amount: 240.00'.]
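The examples in this guide hard-code "image/jpeg" as the media type given to DataContent. If your photos are not all JPEGs, a small helper can pick the media type from the file name. This is a minimal sketch. ImageMediaType is a made-up name, and it only knows JPEG and PNG. Whether your chosen model accepts other formats is something to test yourself.

```csharp
using System;
using System.IO;

public static class ImageMediaType
{
    // Maps a file extension to the media type passed to DataContent.
    // Throws for extensions this sketch does not know about.
    public static string FromPath(string path)
    {
        string extension = Path.GetExtension(path).ToLowerInvariant();
        return extension switch
        {
            ".jpg" or ".jpeg" => "image/jpeg",
            ".png" => "image/png",
            _ => throw new NotSupportedException(
                $"Unsupported image extension: '{extension}'")
        };
    }
}
```

With this in place, the message could be built with new DataContent(imageBytes, ImageMediaType.FromPath(path)) instead of a fixed string.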
Step 5: Ask for structured output
Here is the magic step. We define a C# class that describes the shape we want. Then we ask for that shape directly. OllamaSharp and Microsoft.Extensions.AI work together to send a schema to the model and to parse the answer back into our class.
First, the class. Keep it small and clear.
// This class IS our schema. Each property is a field we want back.
public class Receipt
{
    public string ShopName { get; set; } = "";
    public string Date { get; set; } = "";
    public decimal Total { get; set; }
    public string Currency { get; set; } = "";
}
Now we call the typed version, GetResponseAsync<Receipt>. The <Receipt> part tells the library, "I want the answer shaped like this class."
byte[] imageBytes = await File.ReadAllBytesAsync("receipt.jpg");
var message = new ChatMessage(ChatRole.User,
[
    new TextContent(
        "Extract the shop name, date, total amount, and currency " +
        "from this receipt."),
    new DataContent(imageBytes, "image/jpeg")
]);
// Ask for a strongly typed Receipt, not free text.
ChatResponse<Receipt> response =
    await client.GetResponseAsync<Receipt>(message);
if (response.TryGetResult(out Receipt? receipt))
{
    Console.WriteLine($"Shop: {receipt.ShopName}");
    Console.WriteLine($"Date: {receipt.Date}");
    Console.WriteLine($"Total: {receipt.Total} {receipt.Currency}");
}
else
{
    Console.WriteLine("The model did not return a valid Receipt.");
}
Look at what changed. We no longer read a sentence and try to guess the numbers. We get a real Receipt object with a Total of type decimal. We can save it, add it up, or put it in a report straight away.
The library does three jobs for us behind the scenes. It builds a JSON schema from our class. It tells Ollama to follow that schema. Then it reads the JSON reply and turns it into a Receipt. We just write normal C#.
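To get a feel for those three jobs, here is a rough, library-free approximation. It assumes .NET 9 or later for JsonSchemaExporter, and it assumes the shape of Ollama's /api/chat request, where the schema goes in "format" and the image is sent as base64 text in "images". The real libraries build their schema and request in their own way, so treat the exact JSON as illustrative. ReceiptShape and RawRequestSketch are made-up names.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Nodes;
using System.Text.Json.Schema;

// Same shape as the Receipt class in this guide, under an illustrative name.
public class ReceiptShape
{
    public string ShopName { get; set; } = "";
    public string Date { get; set; } = "";
    public decimal Total { get; set; }
    public string Currency { get; set; } = "";
}

public static class RawRequestSketch
{
    // Job 1: build a JSON schema from the class.
    public static JsonNode BuildSchema() =>
        JsonSerializerOptions.Default.GetJsonSchemaAsNode(
            typeof(ReceiptShape),
            new JsonSchemaExporterOptions { TreatNullObliviousAsNonNullable = true });

    // Job 2: put the schema in "format" and the base64 image in "images"
    // on the user message of a chat request body.
    public static JsonObject BuildChatBody(string model, string prompt, byte[] image) =>
        new JsonObject
        {
            ["model"] = model,
            ["stream"] = false,
            ["format"] = BuildSchema(),
            ["messages"] = new JsonArray(
                new JsonObject
                {
                    ["role"] = "user",
                    ["content"] = prompt,
                    ["images"] = new JsonArray(
                        JsonValue.Create(Convert.ToBase64String(image)))
                })
        };

    // Job 3: turn the JSON text of the reply into a typed object.
    public static ReceiptShape? ParseReply(string json) =>
        JsonSerializer.Deserialize<ReceiptShape>(json);
}
```

Printing BuildChatBody(...).ToJsonString() shows roughly what travels to the local server. You do not need any of this in your app, since GetResponseAsync<Receipt> handles it, but it helps to know there is no magic inside.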
Always check the result
There is one honest truth about AI models: they are not perfect. Most of the time they follow the schema. But sometimes a model returns something that does not fit, especially with a blurry image or a tiny model.
That is why we used TryGetResult. It returns true only when the reply was valid and could be turned into a Receipt. If it returns false, we handle it kindly instead of crashing. This small habit makes your app safe in the real world.
Here is a simple comparison of the two ways to read a reply.
| Approach | What you get | Safety |
|---|---|---|
| response.Text | A sentence of free text | You parse it yourself, easy to break |
| GetResponseAsync<Receipt> | A typed Receipt object | Schema enforced, checked with TryGetResult |
For real apps, always prefer the typed call with a check. It is the difference between a toy and a tool you can trust.
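TryGetResult tells you the reply fits the shape. It does not tell you the values make sense. A reply can be valid JSON and still have an empty shop name or a date like "last Tuesday". A minimal sketch of a second check follows. ReceiptChecks is a made-up helper, and the list of date formats is an assumption you should adjust to the receipts you actually see.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class ReceiptChecks
{
    // Date formats this sketch accepts; extend the list for your receipts.
    private static readonly string[] DateFormats =
        ["yyyy-MM-dd", "dd/MM/yyyy", "dd-MM-yyyy", "d MMM yyyy"];

    // Returns a list of problems; an empty list means the values look usable.
    public static List<string> Validate(string shopName, string date, decimal total)
    {
        var problems = new List<string>();
        if (string.IsNullOrWhiteSpace(shopName))
            problems.Add("Shop name is empty.");
        if (total <= 0)
            problems.Add("Total must be greater than zero.");
        if (!TryParseDate(date, out _))
            problems.Add($"Date '{date}' is not in a known format.");
        return problems;
    }

    // Parses the date text using the fixed formats above.
    public static bool TryParseDate(string text, out DateOnly date) =>
        DateOnly.TryParseExact(text?.Trim(), DateFormats,
            CultureInfo.InvariantCulture, DateTimeStyles.None, out date);
}
```

After a successful TryGetResult, you could call ReceiptChecks.Validate(receipt.ShopName, receipt.Date, receipt.Total) and only save the receipt when the list comes back empty.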
A complete tiny example
Let us put the pieces together into one short program you can run. It reads a receipt and prints the fields, with a safe fallback.
using Microsoft.Extensions.AI;
using OllamaSharp;
IChatClient client = new OllamaApiClient(
    new Uri("http://localhost:11434/"),
    "qwen2.5vl"); // a model that reads documents well
byte[] imageBytes = await File.ReadAllBytesAsync("receipt.jpg");
var message = new ChatMessage(ChatRole.User,
[
    new TextContent(
        "You are reading an Indian grocery receipt. Extract the shop " +
        "name, the date as text, the total amount as a number, and the " +
        "currency code. If a field is missing, leave it empty."),
    new DataContent(imageBytes, "image/jpeg")
]);
ChatResponse<Receipt> response =
    await client.GetResponseAsync<Receipt>(message);
if (response.TryGetResult(out Receipt? receipt))
{
    Console.WriteLine($"Shop: {receipt.ShopName}");
    Console.WriteLine($"Date: {receipt.Date}");
    Console.WriteLine($"Total: {receipt.Total} {receipt.Currency}");
}
else
{
    Console.WriteLine("Could not read the receipt. Try a clearer photo.");
}
public class Receipt
{
    public string ShopName { get; set; } = "";
    public string Date { get; set; } = "";
    public decimal Total { get; set; }
    public string Currency { get; set; } = "";
}
Notice the prompt gives gentle hints: it mentions the kind of receipt, and it tells the model what to do when a field is missing. Clear prompts give better answers. You are talking to the model like a helpful teacher giving instructions to a student.
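If you end up writing many prompts like this, one option is to build them from a list of fields and their meanings so that every field gets a clear instruction. The sketch below is only one way to do it. PromptBuilder and its wording are made up for this example, and you should tune the text against your own images.

```csharp
using System.Collections.Generic;
using System.Text;

public static class PromptBuilder
{
    // Builds one instruction line per field so the model knows
    // exactly what each field means.
    public static string Build(
        string documentKind,
        IEnumerable<KeyValuePair<string, string>> fields)
    {
        var prompt = new StringBuilder();
        prompt.Append($"You are reading {documentKind}. Extract these fields:\n");
        foreach (var (name, meaning) in fields)
            prompt.Append($"- {name}: {meaning}\n");
        prompt.Append("If a field is missing, leave it empty.");
        return prompt.ToString();
    }
}
```

For example, passing "an Indian grocery receipt" and a dictionary entry of total mapped to "total amount as a number, with no currency symbol" yields a prompt with one bullet line for that field. The result goes straight into a TextContent.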
Tips that make a real difference
Small habits raise your accuracy a lot. Here are the ones that matter most for beginners.
Use clear images. A sharp, well-lit photo helps the model far more than a fancy model on a blurry photo. Crop out the background if you can.
Keep your schema small. Ask for the fields you truly need. A class with four clear properties works better than one with twenty. Fewer fields means fewer mistakes.
Write plain prompts. Say exactly what each field means. "Total amount as a number, with no currency symbol" is better than just "total".
Pick the right model for the job. For documents and receipts, a model like Qwen2.5-VL reads structure better than a general one. Test on your own images and trust the results you see.
Always handle failure. Use TryGetResult and show a friendly message. Never assume the reply is perfect.
A sensible loop looks like this. Load the image, send it, check the result. If it is valid, save it. If not, you might retry once with a clearer prompt before giving up. This keeps your app calm even when one image is hard to read.
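The retry idea can be sketched as a tiny helper that knows nothing about Ollama. ExtractionRetry is a made-up name. The delegate you pass in does the real call and returns null when the reply was not valid.

```csharp
using System;
using System.Threading.Tasks;

public static class ExtractionRetry
{
    // Runs 'attempt' up to maxAttempts times. The delegate receives the
    // 1-based attempt number, so the caller can switch to a clearer prompt
    // on the second try. Returns null if every attempt fails.
    public static async Task<T?> RunAsync<T>(
        Func<int, Task<T?>> attempt, int maxAttempts = 2) where T : class
    {
        for (int i = 1; i <= maxAttempts; i++)
        {
            T? result = await attempt(i);
            if (result is not null)
                return result;
        }
        return null;
    }
}
```

Inside the delegate you would call GetResponseAsync<Receipt>, then return the receipt when TryGetResult succeeds and null otherwise. Keeping maxAttempts small matters, because each attempt is a full model run and can take several seconds on a laptop.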
Where you can use this
Once you can read fields from an image, many real tasks open up:
- Receipts for expense tracking, like our example.
- ID cards to read a name and number into a form.
- Invoices to pull line items and totals for accounting.
- Forms that people fill by hand and scan.
- Screenshots where you want the text turned into data.
All of it runs on your own machine. For a small business in India counting daily bills, or a student building a project, that means zero cloud cost and full privacy. The photos stay with you.
[Figure: Common real uses. Receipts (shop, date, total), ID cards (name, number), invoices (items, amounts), forms (handwritten fields).]
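For a different document, only the class changes. As a sketch, here is what an invoice shape with line items could look like, along with a cheap cross-check that the items add up to the total. Invoice, LineItem, and InvoiceDemo are made-up names for this illustration. Nested lists are harder for small models than four flat fields, so expect to test this more carefully.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

public class LineItem
{
    public string Description { get; set; } = "";
    public int Quantity { get; set; }
    public decimal Amount { get; set; }
}

public class Invoice
{
    public string Vendor { get; set; } = "";
    public List<LineItem> Items { get; set; } = [];
    public decimal Total { get; set; }

    // True when the line amounts add up to the stated total,
    // a cheap cross-check on what the model read.
    public bool ItemsMatchTotal() => Items.Sum(i => i.Amount) == Total;
}

public static class InvoiceDemo
{
    // Parses JSON text shaped like Invoice; stands in for a model reply.
    public static Invoice? Parse(string json) =>
        JsonSerializer.Deserialize<Invoice>(json);
}
```

In the real app you would call GetResponseAsync<Invoice> instead of InvoiceDemo.Parse, and treat a false ItemsMatchTotal as a sign to look at the image again.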
A note on privacy and cost
When you use a cloud AI service, your image travels over the internet to someone else's computer, and you often pay per request. With Ollama, the model runs on your machine. The image is read locally and never uploaded. There is no per-request bill. You pay only for the electricity and the hardware you already own.
This is a big deal for sensitive documents like ID cards or medical forms. Keeping data on your own device is often the safest and simplest choice. It also means your app keeps working even with a weak internet connection, because nothing needs to be sent away.
Quick recap
Here is everything in short, easy points:
- A vision model is an AI that can read pictures, not just text.
- Ollama runs these models for free on your own computer at http://localhost:11434.
- In .NET, OllamaSharp plus Microsoft.Extensions.AI give you a simple IChatClient.
- Send an image with DataContent inside a ChatMessage.
- Use GetResponseAsync<T> with a small class to get structured, typed output instead of messy text.
- Always check the reply with TryGetResult so your app stays safe.
- Use clear photos, small schemas, plain prompts, and the right model.
- Everything runs locally, so your data stays private and there are no API bills.
You started with a crumpled photo and ended with clean fields, just like the friendly shopkeeper. That is the whole idea, and now you can build it yourself.