SEMastery
Fundamentals (beginner)

Building Semantic Search With Amazon S3 Vectors and Semantic Kernel

A beginner-friendly guide to building semantic search in .NET using Amazon S3 Vectors for cheap storage and Semantic Kernel for embeddings.

13 min read, updated May 10, 2026

Think about a friendly librarian in your school. You walk up and say, "I want a book about a boy who flies on a broom and goes to a magic school." You never said the title. You never said "Harry Potter". But the librarian smiles and brings you the right book, because she understood the meaning of what you asked.

A normal computer search is not like that librarian. It looks for the exact words you typed. If you type "magic school broom boy" and the book description says "young wizard at Hogwarts", a word-matching search may find nothing.

Semantic search is how we teach our app to act like that kind librarian. It searches by meaning, not by exact words. In this article we will build semantic search in .NET using two tools that work nicely together:

  • Amazon S3 Vectors to store the data cheaply.
  • Semantic Kernel (and the .NET vector data libraries) to turn text into meaning and to run the search.

We will go slow, use simple words, and build it step by step.

What is an embedding?

Before we search by meaning, we need a way to measure meaning. Computers are good at numbers, not feelings. So we turn each piece of text into a long list of numbers. That list is called an embedding (or a vector).

The clever part is this: text with similar meaning gives numbers that are close together. Text with different meaning gives numbers that are far apart.

Imagine a giant map. "Cat" and "kitten" sit near each other. "Cat" and "rocket" sit far apart. An embedding is just the address of a word or sentence on that map.

Text is turned into a vector. Similar meanings land near each other.
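To make "close together" concrete, here is a small C# sketch that scores toy vectors with cosine similarity, a common way to measure how close two embeddings are. The three-number vectors for "cat", "kitten", and "rocket" are invented by hand for illustration. A real embedding model returns hundreds or thousands of numbers per text.

```csharp
using System;

// Hand-made toy "embeddings" (hypothetical values, not from a real model).
float[] cat    = { 0.9f, 0.8f, 0.1f };
float[] kitten = { 0.85f, 0.9f, 0.15f };
float[] rocket = { 0.1f, 0.05f, 0.95f };

// Cosine similarity: close to 1 means "pointing the same way" (similar meaning),
// close to 0 means "unrelated".
static double CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same length.");

    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot   += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

Console.WriteLine($"cat vs kitten: {CosineSimilarity(cat, kitten):F3}");
Console.WriteLine($"cat vs rocket: {CosineSimilarity(cat, rocket):F3}");
```

With these made-up numbers, "cat" scores much higher against "kitten" than against "rocket", which is exactly the "near on the map" idea.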

A model that creates these numbers is called an embedding model. Examples are OpenAI's text-embedding-3-small or models you run with Amazon Bedrock or Ollama. Each model returns a fixed number of values per text; text-embedding-3-small, for example, returns 1536 by default. That count is called the number of dimensions.

What is Amazon S3 Vectors?

Once we have these number lists, we need a place to keep them. We could use a special vector database. But Amazon released something simpler and cheaper: Amazon S3 Vectors.

Amazon S3 is the famous storage service that holds files (called objects) in the cloud. S3 Vectors adds native support to store and query vectors right inside S3. It is built to be very cheap at huge scale. AWS says it can cut storage and query costs by up to 90 percent compared with running your own vector engine, and it can hold billions of vectors.

It works best when you have a lot of data but do not search every second. Think of a document archive, a media library, or a product catalog. You pay little to keep the vectors, and you pay a small amount each time you search.

Here are the main building blocks in S3 Vectors.

| Term | What it means | Everyday picture |
| --- | --- | --- |
| Vector bucket | A special S3 bucket that holds vector indexes | A big cupboard |
| Vector index | A named place inside the bucket where vectors live | One drawer in the cupboard |
| Vector | One embedding plus a key and some metadata | One labelled card in the drawer |
| Query | A search for the nearest vectors to your input | Asking "which cards are most like this one?" |

The S3 Vectors API gives you a small set of actions. The ones we care about most are listed below.

| API action | What it does |
| --- | --- |
| CreateVectorBucket | Makes a new vector bucket in your AWS region |
| CreateIndex | Makes a new index and sets its dimensions and distance type |
| PutVectors | Adds up to 500 vectors at a time, each with a key and metadata |
| QueryVectors | Finds the nearest vectors to a query vector (the search step) |
| GetVectors | Reads vectors back by their keys |
| DeleteVectors | Removes vectors you no longer want |

What is Semantic Kernel?

Semantic Kernel is Microsoft's open-source SDK for adding AI features to your apps, including .NET apps. For our task, two parts of the .NET AI stack that Semantic Kernel builds on matter:

  • An embedding generator (the IEmbeddingGenerator interface from Microsoft.Extensions.AI), which calls an embedding model and gives you the number list.
  • The vector data abstractions in the Microsoft.Extensions.VectorData.Abstractions package, which give a common, provider-agnostic way to store and search vectors.

The nice thing about these abstractions is that you write your search code once. You can start with a free in-memory store on your laptop, then switch to a real service such as Azure AI Search or Qdrant, or to S3 Vectors through a connector that implements the same abstractions, with very little code change.

How the whole thing fits together

Let us look at the full picture before we write code. There are two journeys. The first is ingestion: we read our documents, turn them into vectors, and save them. The second is search: a user asks a question, we turn the question into a vector, and we find the closest stored vectors.

The two journeys: first we store documents as vectors, then we search them.

Ingestion pipeline: Read → Chunk → Embed → Store

  1. Read: load text from files or a database.
  2. Chunk: split long text into small pieces.
  3. Embed: turn each chunk into a vector.
  4. Store: call PutVectors to save the vectors in the S3 index.

How a document becomes a searchable vector.
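The Chunk step can be sketched in a few lines of plain C#. ChunkBySentences is a hypothetical, deliberately naive helper: it splits on sentence endings and groups a fixed number of sentences per chunk. Production pipelines often count tokens instead and let neighbouring chunks overlap, so treat this as a starting point only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Naive chunker: split after '.', '!' or '?', then group N sentences per chunk.
static List<string> ChunkBySentences(string text, int sentencesPerChunk)
{
    if (sentencesPerChunk < 1)
        throw new ArgumentOutOfRangeException(nameof(sentencesPerChunk));

    // Split at whitespace that follows sentence-ending punctuation.
    string[] sentences = Regex.Split(text.Trim(), @"(?<=[.!?])\s+")
        .Where(s => s.Length > 0)
        .ToArray();

    return sentences
        .Chunk(sentencesPerChunk)                 // LINQ batching, .NET 6 or later
        .Select(group => string.Join(" ", group)) // glue each group back together
        .ToList();
}

string book = "Harry is a young wizard. He goes to a magic school. " +
              "He plays a sport on a flying broom. His best friends help him. " +
              "Together they fight a dark wizard.";

foreach (string chunk in ChunkBySentences(book, 2))
    Console.WriteLine($"[{chunk}]");
```

Five sentences with two per chunk give three chunks; the last one simply holds the leftover sentence.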

Search pipeline: Ask → Embed → Query → Rank → Show

  1. Ask: the user types a natural question.
  2. Embed: turn the question into a query vector.
  3. Query: QueryVectors finds the nearest stored vectors.
  4. Rank: order the results by closeness score.
  5. Show: return the matching documents.

How a user question finds the right answer.
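In the real pipeline, QueryVectors performs the Query and Rank steps on the AWS side. To show the idea, this sketch does them by hand: score every stored vector against the question and keep the k closest. The index contents, the question vector, and the TopK helper are all made up for illustration. It is a teaching model only, not a description of how S3 Vectors searches internally, and a full scan like this gets slower as the number of vectors grows.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A tiny in-memory "index" of key -> vector. The 2-number vectors are invented;
// real embeddings have hundreds or thousands of numbers.
var index = new Dictionary<string, float[]>
{
    ["wizard-school"] = new[] { 0.9f, 0.1f },
    ["space-rocket"]  = new[] { 0.1f, 0.9f },
    ["magic-broom"]   = new[] { 0.8f, 0.3f },
};

// Cosine distance = 1 - cosine similarity (smaller means closer).
static double CosineDistance(float[] a, float[] b)
{
    double dot = 0, na = 0, nb = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        na  += a[i] * a[i];
        nb  += b[i] * b[i];
    }
    return 1.0 - dot / (Math.Sqrt(na) * Math.Sqrt(nb));
}

// Query + Rank: score every stored vector and keep the k smallest distances.
static List<(string Key, double Distance)> TopK(
    IReadOnlyDictionary<string, float[]> store, float[] query, int k) =>
    store.Select(kv => (Key: kv.Key, Distance: CosineDistance(kv.Value, query)))
         .OrderBy(hit => hit.Distance)
         .Take(k)
         .ToList();

// Pretend embedding of "a boy on a broom at a magic school" (hypothetical).
float[] question = { 0.9f, 0.15f };
foreach (var (key, distance) in TopK(index, question, 2))
    Console.WriteLine($"{key}: {distance:F3}");
```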

Step 1: Add the packages

Start a new console app and add the libraries. We need the AWS SDK for S3 Vectors, the .NET vector data abstractions, and an embedding generator. Here we use OpenAI for the embeddings, but you can swap it for Bedrock or Ollama later.

// In your terminal, from the project folder:
// dotnet add package AWSSDK.S3Vectors
// dotnet add package Microsoft.Extensions.VectorData.Abstractions
// dotnet add package Microsoft.Extensions.AI
// dotnet add package Microsoft.Extensions.AI.OpenAI --prerelease
 
using Amazon.Runtime.Documents;   // Document, used for vector metadata
using Amazon.S3Vectors;
using Amazon.S3Vectors.Model;
using Microsoft.Extensions.AI;
using OpenAI;                     // OpenAIClient (pulled in by Microsoft.Extensions.AI.OpenAI)

A small note for honesty: package names in the AI space change often because the tools are young and move fast. Always check the current name on NuGet before you copy. The ideas in this article stay the same even if a name changes.

Step 2: Define what a "document" looks like

We want to store more than just numbers. For each chunk of text we keep an id, the original text (so we can show it back), and the embedding. We will keep the human-readable fields in S3 Vectors metadata, and the embedding as the vector itself.

// A simple record for one searchable chunk.
public sealed record SearchDocument
{
    public required string Id { get; init; }       // unique key
    public required string Title { get; init; }     // shown to the user
    public required string Text { get; init; }      // the chunk content
    public required float[] Embedding { get; init; } // the vector
}

Why keep the original text? Because the vector is just numbers. When we find a match, we want to show the human the real sentence, not a list of floats.

Step 3: Create the bucket and index

Before storing anything, we make a vector bucket and an index inside it. The index needs two important settings: the dimension count (must match your embedding model) and the distance type (how we measure "closeness", usually cosine).

var s3v = new AmazonS3VectorsClient(); // uses your AWS credentials
 
await s3v.CreateVectorBucketAsync(new CreateVectorBucketRequest
{
    VectorBucketName = "library-search"
});
 
await s3v.CreateIndexAsync(new CreateIndexRequest
{
    VectorBucketName = "library-search",
    IndexName        = "book-chunks",
    Dimension        = 1536,            // must match the embedding model
    DistanceMetric   = DistanceMetric.Cosine,
    DataType         = DataType.Float32
});

The dimension must match your model. If your model gives 1536 numbers but your index expects 384, the store will reject the data. Think of it like a key and a lock: the shape has to fit.
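One cheap safety net is to check the length of every vector before you upload it, so a model mix-up fails early with a clear message instead of a rejected request. This is a sketch: the IndexDimension constant and the EnsureDimension helper are hypothetical names, and keeping the number in a single constant is the "write it down in one place" habit in code form.

```csharp
using System;
using System.Collections.Generic;

// The one place where the index dimension is written down (hypothetical name).
const int IndexDimension = 1536;

// Throw a clear error if a vector does not fit the index.
static void EnsureDimension(string key, IReadOnlyList<float> vector, int expected)
{
    if (vector.Count != expected)
        throw new InvalidOperationException(
            $"Vector '{key}' has {vector.Count} values, but the index expects {expected}.");
}

// Simulate output from a smaller, 384-dimension model.
float[] fromSmallModel = new float[384];
try
{
    EnsureDimension("chunk-001", fromSmallModel, IndexDimension);
}
catch (InvalidOperationException ex)
{
    Console.WriteLine(ex.Message);
}
```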

Step 4: Turn text into embeddings

Now we create the embedding generator and turn a piece of text into a vector. With Microsoft.Extensions.AI, the interface is small and clean.

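// Read the key from an environment variable instead of hard-coding it.
string apiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY")
    ?? throw new InvalidOperationException("Set OPENAI_API_KEY first.");
 
IEmbeddingGenerator<string, Embedding<float>> embedder =
    new OpenAIClient(apiKey)
        .GetEmbeddingClient("text-embedding-3-small")
        .AsIEmbeddingGenerator();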
 
// Turn one sentence into its vector.
Embedding<float> result =
    await embedder.GenerateAsync("A young wizard goes to a magic school.");
 
float[] vector = result.Vector.ToArray();
// vector.Length is 1536, matching our index dimension.

When you have many chunks, generate their embeddings in batches. That is faster and cheaper than one call per chunk, because each network round trip costs time.
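Here is a sketch of batching. So that it runs without an API key, a hypothetical FakeEmbedBatchAsync stands in for the real call (with Microsoft.Extensions.AI, the generator's GenerateAsync also accepts a list of strings). The fake counts round trips so you can see the saving. Embedding providers limit how much a single request can carry, so check your provider's limits before you pick a batch size.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

int roundTrips = 0;

// Hypothetical stand-in for an embedding call that accepts many texts at once.
// It returns a fake 3-number vector per text and counts simulated network calls.
Task<List<float[]>> FakeEmbedBatchAsync(IReadOnlyList<string> texts)
{
    roundTrips++;
    return Task.FromResult(texts.Select(t => new float[] { t.Length, 0f, 1f }).ToList());
}

// Embed many chunks with one call per batch instead of one call per chunk.
async Task<List<float[]>> EmbedInBatchesAsync(IReadOnlyList<string> chunks, int batchSize)
{
    var vectors = new List<float[]>(chunks.Count);
    foreach (string[] batch in chunks.Chunk(batchSize))
        vectors.AddRange(await FakeEmbedBatchAsync(batch));
    return vectors;
}

var chunks = Enumerable.Range(1, 1000).Select(i => $"chunk {i}").ToList();
var allVectors = await EmbedInBatchesAsync(chunks, 100);
Console.WriteLine($"{allVectors.Count} vectors in {roundTrips} calls");
```

With a batch size of 100, the 1,000 chunks need 10 calls instead of 1,000.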

Step 5: Store the vectors (ingestion)

Now we push our vectors into the S3 Vectors index. The PutVectors action takes up to 500 vectors per call. Each vector carries a key, the float data, and optional metadata we can show later.

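async Task StoreAsync(IEnumerable<SearchDocument> docs)
{
    // PutVectors accepts at most 500 vectors per call, so send the
    // documents in batches (Enumerable.Chunk needs .NET 6 or later).
    foreach (SearchDocument[] batch in docs.Chunk(500))
    {
        var items = batch.Select(d => new PutInputVector
        {
            Key  = d.Id,
            Data = new VectorData { Float32 = d.Embedding.ToList() },
            // Build metadata from a dictionary so quotes and line breaks
            // in the text cannot break it (Amazon.Runtime.Documents.Document).
            Metadata = new Document(new Dictionary<string, Document>
            {
                ["title"] = d.Title,
                ["text"]  = d.Text
            })
        }).ToList();

        await s3v.PutVectorsAsync(new PutVectorsRequest
        {
            VectorBucketName = "library-search",
            IndexName        = "book-chunks",
            Vectors          = items   // at most 500 per request
        });
    }
}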

This is the "fill the drawer" step. Each card (vector) has a label (key), a meaning (the floats), and a sticky note (metadata).
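Metadata is where the original text lives, and real text is full of quotes and line breaks. Pasting values into a JSON string by hand breaks on the first quote. This standard-library sketch shows the failure and one fix: let a serializer (System.Text.Json here) do the escaping. Handing the SDK a dictionary of values instead of a hand-built JSON string avoids the problem in the same way. The sample title and text are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

string title = "The \"Flying\" Broom";
string text  = "He said: \"Hold on!\"\nThen he flew.";

// Fragile: string interpolation does not escape quotes or newlines.
string handMade = $"{{\"title\":\"{title}\",\"text\":\"{text}\"}}";

// Safer: the serializer escapes every value for us.
string serialized = JsonSerializer.Serialize(new Dictionary<string, string>
{
    ["title"] = title,
    ["text"]  = text,
});

// Returns true only if the string parses as JSON.
static bool IsValidJson(string json)
{
    try
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        return true;
    }
    catch (JsonException)
    {
        return false;
    }
}

Console.WriteLine($"hand-made valid?  {IsValidJson(handMade)}");
Console.WriteLine($"serialized valid? {IsValidJson(serialized)}");
```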

Step 6: Search by meaning (query)

Finally, the fun part. A user types a question. We embed the question, then call QueryVectors to find the nearest stored vectors. We ask for the top few results and request the metadata so we can show the title and text.

async Task<IReadOnlyList<(string Key, double Score)>> SearchAsync(string question)
{
    // 1. Turn the question into a query vector.
    var q = (await embedder.GenerateAsync(question)).Vector.ToArray();
 
    // 2. Ask S3 Vectors for the nearest matches.
    var response = await s3v.QueryVectorsAsync(new QueryVectorsRequest
    {
        VectorBucketName = "library-search",
        IndexName        = "book-chunks",
        QueryVector      = new VectorData { Float32 = q.ToList() },
        TopK             = 5,             // return the 5 closest
        ReturnDistance   = true,
        ReturnMetadata   = true
    });
 
    return response.Vectors
        .Select(v => (v.Key, v.Distance ?? 0d))
        .ToList();
}

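A smaller distance means a closer match. Cosine distance is usually defined as 1 minus the cosine similarity: a value near 0 means "almost the same meaning", a value near 1 means "unrelated", and values can reach 2 for vectors that point in opposite directions. So we keep the results with the smallest distances; QueryVectors already limits the answer to the TopK nearest matches.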
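To see where those numbers come from, this sketch computes cosine distance (1 minus cosine similarity) for three hand-picked pairs of 2-number vectors: one pair pointing the same way, one at a right angle, and one pointing in opposite directions. The vectors are chosen only to make the arithmetic exact.

```csharp
using System;

// Cosine distance = 1 - cosine similarity, so it ranges from 0 to 2.
static double CosineDistance(double[] a, double[] b)
{
    double dot = 0, na = 0, nb = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        na  += a[i] * a[i];
        nb  += b[i] * b[i];
    }
    return 1.0 - dot / (Math.Sqrt(na) * Math.Sqrt(nb));
}

double[] q = { 1, 0 };
Console.WriteLine(CosineDistance(q, new double[] { 2, 0 }));  // same direction: prints 0
Console.WriteLine(CosineDistance(q, new double[] { 0, 3 }));  // right angle:    prints 1
Console.WriteLine(CosineDistance(q, new double[] { -1, 0 })); // opposite:       prints 2
```

Note that length does not matter: { 2, 0 } is twice as long as { 1, 0 } but still has distance 0, because cosine only compares direction.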

The search request and response flow at runtime.

Using the Semantic Kernel vector abstractions

The AWS SDK code above talks to S3 directly. That is good for learning. But in a bigger app you may prefer the provider-agnostic style from Microsoft.Extensions.VectorData. You describe your model with attributes, and the same search code works across many stores.

using Microsoft.Extensions.VectorData;
 
public sealed class BookChunk
{
    [VectorStoreKey]
    public required string Id { get; set; }
 
    [VectorStoreData]
    public required string Title { get; set; }
 
    [VectorStoreData]
    public required string Text { get; set; }
 
    [VectorStoreVector(1536, DistanceFunction = DistanceFunction.CosineSimilarity)]
    public required ReadOnlyMemory<float> Embedding { get; set; }
}

With a connector that supports this model, you get a VectorStoreCollection and call simple methods like UpsertAsync to save and SearchAsync to find. You can even attach an embedding generator to the collection so it makes the vectors for you during upsert and search. The big win is that you can start with the free InMemoryVectorStore on your laptop, prove your idea works, then point at a real backing store when you scale up.

Why the abstraction helps: Prototype → Test → Scale

  1. Prototype: use the in-memory store, no cloud needed.
  2. Test: same code, real embeddings.
  3. Scale: swap to S3 Vectors or another store.

Write once, switch stores later.

Tips for good results

Building the pipeline is only half the job. Here are simple habits that make your search feel smart instead of random.

  • Chunk your text well. Do not embed a whole 50-page PDF as one vector. Split it into small, meaningful pieces, maybe a few sentences each. Small chunks give sharper matches.
  • Keep dimensions matched. The model dimension and the index dimension must be equal. Write the number down in one place so you never mix it up.
  • Store the original text. Always keep the real sentence in metadata so you can show it to the user.
  • Batch your embeddings. Generate many at once to save money and time.
  • Pick the right distance. Cosine is a safe default for text. Stay consistent between storing and searching.
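Why does consistency matter more than the exact choice? Some embedding models, including OpenAI's, return vectors already scaled to length 1. For such unit vectors, squared Euclidean distance equals 2 minus 2 times the cosine similarity, so both metrics rank results the same way. This sketch checks that on a few made-up vectors after normalizing them; the helper names are illustrative.

```csharp
using System;
using System.Linq;

// Scale a vector to length 1.
static double[] Normalize(double[] v)
{
    double len = Math.Sqrt(v.Sum(x => x * x));
    return v.Select(x => x / len).ToArray();
}

// For unit vectors, the dot product IS the cosine similarity.
static double Dot(double[] a, double[] b) => a.Zip(b, (x, y) => x * y).Sum();

static double SquaredEuclidean(double[] a, double[] b) =>
    a.Zip(b, (x, y) => (x - y) * (x - y)).Sum();

double[] query = Normalize(new double[] { 3, 1, 0 });
double[][] docs =
{
    Normalize(new double[] { 2, 1, 0 }),
    Normalize(new double[] { 0, 1, 4 }),
    Normalize(new double[] { 1, 1, 1 }),
};

// Highest similarity first vs. smallest distance first.
int[] byCosine    = Enumerable.Range(0, docs.Length).OrderByDescending(i => Dot(query, docs[i])).ToArray();
int[] byEuclidean = Enumerable.Range(0, docs.Length).OrderBy(i => SquaredEuclidean(query, docs[i])).ToArray();

Console.WriteLine($"cosine:    {string.Join(", ", byCosine)}");
Console.WriteLine($"euclidean: {string.Join(", ", byEuclidean)}");
```

If your vectors are not normalized, the two metrics can disagree, which is one more reason to pick one metric and use it for both storing and searching.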

When is S3 Vectors the right choice?

S3 Vectors shines when you have lots of vectors but query them less often, and you care about cost. Document archives, support knowledge bases, media catalogs, and agent memory are great fits. AWS positions it for retrieval-augmented generation (RAG), agent memory, and semantic search at very large scale and low cost, with sub-second responses for occasional queries.

If you need extremely fast, high-volume queries every second, a purpose-built, in-memory vector engine may suit you better. The good news: if your search code talks to the .NET vector abstractions rather than calling the AWS SDK directly, switching later is mostly a configuration change, not a rewrite.


Quick recap

  • Semantic search finds results by meaning, not by exact words, like a kind librarian who understands what you really want.
  • An embedding turns text into a list of numbers. Similar meanings give numbers that are close together.
  • Amazon S3 Vectors stores and queries vectors right in S3. It is cheap at large scale and great for archives, catalogs, and agent memory.
  • Semantic Kernel and Microsoft.Extensions.VectorData give clean .NET tools to make embeddings and run searches with one common API.
  • The flow has two journeys: ingest (read, chunk, embed, store) and search (ask, embed, query, rank, show).
  • Match your dimensions, chunk your text well, keep the original text, batch your embeddings, and use cosine distance for text.
  • Using the .NET vector abstractions means you can start small on your laptop and switch stores later with little code change.
