5 Ways to Check for Duplicates in C# Collections

Learn 5 simple ways to find duplicates in C# collections using HashSet, LINQ Any, GroupBy, and Distinct, with clear examples and a speed comparison.

12 min read · Updated February 20, 2026

Introduction

Imagine you are a teacher taking attendance in class. You read names from a list one by one. As you call each name, you put a small tick next to it in your notebook. If you ever reach a name and see that it already has a tick, you know that name was written twice. You found a duplicate.

That little notebook with ticks is exactly how a computer finds duplicates. In C#, there are several tools that act like that notebook. Some are fast, some are slow, and some just read very nicely. In this guide we will look at 5 ways to check whether a collection has duplicate items.

A duplicate simply means the same value appears more than once. For example, the list [2, 5, 2, 9] has a duplicate because 2 shows up twice. The list [2, 5, 9] has no duplicates.

By the end, you will know which method to pick and why. Let us start with a picture of the basic idea.

Figure: The basic idea is to keep a 'seen' notebook and check each item against it.

A quick word about what we are checking

There are really two questions people ask:

  1. Are there any duplicates at all? (a yes/no answer)
  2. Which items are the duplicates? (a list of repeated values)

Most of this article answers the first question, because that is the most common need. Near the end we also show how to list the actual repeated values. Keep this difference in your mind as you read.

Here is the sample data we will use in our examples.

// Our test data for every example below.
int[] numbersWithDuplicate = { 1, 2, 3, 4, 2, 5 };  // 2 appears twice
int[] numbersAllUnique     = { 1, 2, 3, 4, 5, 6 };  // no repeats

Way 1 — HashSet with a foreach loop (the fast one)

A HashSet<T> is a special collection in .NET that refuses to hold the same value twice. It is the computer version of our tick notebook.

The magic is in its Add method. When you call Add:

  • It returns true if the item was new and got added.
  • It returns false if the item was already there.

So if Add ever returns false, we have found a duplicate. We can stop right away.
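To see that return value for yourself, here is a minimal sketch that adds the same number twice. The variable names are only for illustration.

```csharp
using System;
using System.Collections.Generic;

var seen = new HashSet<int>();

bool firstAdd  = seen.Add(2); // 2 is new, so it gets stored
bool secondAdd = seen.Add(2); // 2 is already stored, so the set is unchanged

Console.WriteLine(firstAdd);   // True
Console.WriteLine(secondAdd);  // False
Console.WriteLine(seen.Count); // 1
```

The set still holds only one 2 after both calls, which is why the second Add can act as our duplicate alarm.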

public static bool HasDuplicates<T>(IEnumerable<T> items)
{
    var seen = new HashSet<T>();
 
    foreach (var item in items)
    {
        // Add returns false when the item is already in the set.
        if (!seen.Add(item))
        {
            return true; // Found a repeat. Stop early.
        }
    }
 
    return false; // Walked the whole list, no repeats.
}

Why is this fast? Two reasons. First, checking and adding to a HashSet is an O(1) operation on average, which means it takes the same tiny amount of time no matter how big the set grows. Second, the loop stops the moment it finds the first duplicate. It does not waste time looking at the rest.
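One way to watch the early stop happen is to feed the method a lazy sequence that counts how many items were actually read. This is only a sketch: CountedNumbers and itemsRead are made-up names for this demo, and the data is the sample array from earlier.

```csharp
using System;
using System.Collections.Generic;

int itemsRead = 0;

// A lazy sequence that counts how many items the caller actually pulls.
IEnumerable<int> CountedNumbers()
{
    foreach (var n in new[] { 1, 2, 3, 4, 2, 5 })
    {
        itemsRead++;
        yield return n;
    }
}

static bool HasDuplicates<T>(IEnumerable<T> items)
{
    var seen = new HashSet<T>();
    foreach (var item in items)
    {
        if (!seen.Add(item))
        {
            return true; // stop at the first repeat
        }
    }
    return false;
}

bool found = HasDuplicates(CountedNumbers());

Console.WriteLine(found);     // True
Console.WriteLine(itemsRead); // 5 (the last item, 5, is never read)
```

The second 2 is the fifth item, so the loop quits there and never asks for the sixth.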

Figure: HashSet foreach stops as soon as Add says false.

This is the method I reach for most of the time. It is clear, it is quick, and it stops early.

Way 2 — HashSet with LINQ Any (short and sweet)

The first method works, but it is a few lines long. We can write the very same idea in a single expression using LINQ's Any method.

Any walks through a collection and asks a yes/no question about each item. The moment one item answers "yes", Any stops and returns true. This is called short-circuiting, and it is exactly the early-stop behaviour we want.

public static bool HasDuplicates<T>(IEnumerable<T> items)
{
    var seen = new HashSet<T>();
 
    // !seen.Add(item) is true the moment we hit a repeat.
    return items.Any(item => !seen.Add(item));
}

Read the line slowly. For each item, we try to add it to seen. If Add returns false, then !false becomes true, and Any stops and reports a duplicate. If we get through every item with no false, then Any returns false.
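If it helps, here is the same expression stretched out so that each step is recorded. This is a sketch for learning only: the trace list and the isRepeat variable are additions for the demo and are not part of the method above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] numbers = { 1, 2, 3, 4, 2, 5 };
var seen = new HashSet<int>();
var trace = new List<string>();

bool hasDuplicates = numbers.Any(item =>
{
    bool isRepeat = !seen.Add(item); // true only when Add returns false
    trace.Add($"{item}: {isRepeat}");
    return isRepeat;                 // the first true makes Any stop
});

Console.WriteLine(string.Join(", ", trace));
// 1: False, 2: False, 3: False, 4: False, 2: True
Console.WriteLine(hasDuplicates); // True
```

Notice that the final 5 never shows up in the trace. Any stopped at the second 2.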

This is the same speed as Way 1 in practice, but with less code. Some people find it harder to read because of the clever !seen.Add(item) trick. That is a fair point. Pick the version your team finds easiest to understand.

Way 3 — GroupBy (best for listing the repeats)

GroupBy is a LINQ method that groups items into buckets by a key. All the 2s go in one bucket, all the 5s in another, and so on. After grouping, any bucket holding more than one item tells you that value is duplicated.

public static bool HasDuplicates<T>(IEnumerable<T> items)
{
    return items
        .GroupBy(item => item)       // bucket items by value
        .Any(group => group.Count() > 1); // any bucket with 2+?
}

GroupBy shines when you do not just want a yes/no answer, but the actual list of repeated values. Here is how to get them.

// Return every value that appears more than once.
public static IEnumerable<T> FindDuplicates<T>(IEnumerable<T> items)
{
    return items
        .GroupBy(item => item)
        .Where(group => group.Count() > 1)
        .Select(group => group.Key);
}

The trade-off is cost. GroupBy has to read the whole collection and build every bucket before it can answer. It cannot stop early the way the HashSet loop does. So it is slower for a simple yes/no check, but very handy when you need the duplicate values themselves.
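Because each bucket already holds every copy of a value, you can also report how many times each repeat appears. Here is a small sketch of that idea. CountDuplicates is a made-up helper name, and the test data has extra 5s added so there is more than one repeated value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: each repeated value together with how often it appears.
static List<(T Value, int Count)> CountDuplicates<T>(IEnumerable<T> items)
{
    return items
        .GroupBy(item => item)                                     // bucket by value
        .Select(group => (Value: group.Key, Count: group.Count())) // size of each bucket
        .Where(pair => pair.Count > 1)                             // keep only repeats
        .ToList();
}

var repeats = CountDuplicates(new[] { 1, 2, 3, 4, 2, 5, 5, 5 });

foreach (var (value, count) in repeats)
{
    Console.WriteLine($"{value} appears {count} times");
}
// 2 appears 2 times
// 5 appears 3 times
```

GroupBy hands back the buckets in the order their keys first appeared, so 2 is listed before 5.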

Figure: GroupBy groups items into buckets, then checks bucket sizes.

Way 4 — Distinct and Count (easy to read, slow to run)

Distinct gives you back a collection with all the repeats removed. So if the distinct count is smaller than the total count, some items must have been removed, which means there were duplicates.

public static bool HasDuplicates<T>(IEnumerable<T> items)
{
    // If unique count is less than total count, repeats exist.
    return items.Distinct().Count() != items.Count();
}

This one reads almost like plain English: "if the number of unique items is not the same as the total, there are duplicates." Beginners love it for that reason.

But it is slower than the HashSet methods, for two reasons:

  1. It always looks at every item. It can never stop early, even when a duplicate sits at the very start.
  2. It may walk the collection twice: once for Distinct().Count() and once for items.Count(). For an array or a List<T> the second Count() just reads a stored length, but if items is a lazy query, the whole query runs again.

So use Distinct().Count() when the collection is small and you care more about clear code than raw speed.
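The double walk is easy to measure with a lazy sequence that counts how often it is started. The sketch below compares the method as written with a variant that copies the data into a list first. LazyNumbers, passes, and both method names are made up for this demo.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int passes = 0;

// A lazy sequence that counts how many times it is walked from the start.
IEnumerable<int> LazyNumbers()
{
    passes++;
    foreach (var n in new[] { 1, 2, 3, 4, 2, 5 })
    {
        yield return n;
    }
}

static bool HasDuplicatesTwoPasses<T>(IEnumerable<T> items)
{
    return items.Distinct().Count() != items.Count();
}

static bool HasDuplicatesOnePass<T>(IEnumerable<T> items)
{
    var list = items.ToList();                    // walk the source once
    return list.Distinct().Count() != list.Count; // list.Count is a stored number
}

bool first = HasDuplicatesTwoPasses(LazyNumbers());
int passesForOriginal = passes;

passes = 0;
bool second = HasDuplicatesOnePass(LazyNumbers());
int passesForBuffered = passes;

Console.WriteLine($"{first}, passes: {passesForOriginal}");  // True, passes: 2
Console.WriteLine($"{second}, passes: {passesForBuffered}"); // True, passes: 1
```

Both versions give the same answer. The buffered one trades a little memory for reading the source only once, which matters when each read is expensive.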

Way 5 — A manual nested loop (only for learning)

Before HashSet and LINQ existed, people compared every item to every other item by hand. It is good to understand this method so you can see why the others are better.

public static bool HasDuplicates<T>(IList<T> items)
{
    for (int i = 0; i < items.Count; i++)
    {
        for (int j = i + 1; j < items.Count; j++)
        {
            if (EqualityComparer<T>.Default.Equals(items[i], items[j]))
            {
                return true; // Found a matching pair.
            }
        }
    }
 
    return false;
}

The outer loop picks one item. The inner loop compares it to every item that comes after it. If any two match, we found a duplicate.

This works, but it is O(n²), which means the work grows with the square of the list size. In the worst case, when nothing repeats, it compares every pair, which is n(n-1)/2 comparisons. For 10 items that is 45 comparisons. For 1,000 items it is about half a million. For 10,000 items it is about 50 million. The HashSet method only does about as many steps as there are items. That is a huge difference on big data.

The lesson: never use the nested loop on large collections. It is here only to show the slow path that HashSet saves you from.
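To make the growth concrete, here is a sketch of the same nested loop with a counter added. HasDuplicatesCounted is a made-up name, and the inputs are all-unique lists, which is the worst case because the loop can never exit early.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The nested loop from above, but it also reports how many comparisons it made.
static (bool Found, long Comparisons) HasDuplicatesCounted<T>(IList<T> items)
{
    long comparisons = 0;
    for (int i = 0; i < items.Count; i++)
    {
        for (int j = i + 1; j < items.Count; j++)
        {
            comparisons++;
            if (EqualityComparer<T>.Default.Equals(items[i], items[j]))
            {
                return (true, comparisons);
            }
        }
    }
    return (false, comparisons);
}

// Worst case: every value is unique, so no early exit is possible.
var ten = HasDuplicatesCounted(Enumerable.Range(0, 10).ToList());
var thousand = HasDuplicatesCounted(Enumerable.Range(0, 1_000).ToList());

Console.WriteLine(ten.Comparisons);      // 45
Console.WriteLine(thousand.Comparisons); // 499500
```

The list got 100 times longer, but the comparison count grew about 11,000 times. That is what quadratic growth looks like.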

Comparing all five methods

Let us put them side by side. "Stops early" means the method can quit the moment it spots the first duplicate.

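| Method | How it works | Speed (average) | Stops early? |
| --- | --- | --- | --- |
| HashSet + foreach | Tick notebook, check Add result | Fast (O(n)) | Yes |
| HashSet + LINQ Any | Same as above, one line | Fast (O(n)) | Yes |
| GroupBy | Bucket by value, check counts | Medium (O(n), always a full pass) | No |
| Distinct + Count | Compare unique vs total | Slower (O(n), full pass, may walk twice) | No |
| Nested loop | Compare every pair | Very slow (O(n²)) | Yes |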

Here is a second table to help you choose based on what you actually need.

| Your goal | Best pick | Why |
| --- | --- | --- |
| Just need yes or no, fast | HashSet + foreach | Quickest and stops early |
| Want short, clean code | HashSet + LINQ Any | One readable line |
| Need the repeated values | GroupBy + Where | Gives you the actual duplicates |
| Tiny list, readability first | Distinct + Count | Easiest to understand |
| Learning how it works | Nested loop | Shows the slow baseline |

Choosing a duplicate-check method

  1. Need the duplicate values? If yes, use GroupBy.
  2. Need only yes/no? Then ask whether you care about speed.
  3. Speed matters: use HashSet + Add.
  4. Readability matters: Distinct + Count is fine.

Figure: A quick decision path from your need to the right tool.

A note on equality and custom types

All these methods rely on knowing when two items are "the same." For numbers and strings, .NET already knows. But for your own classes, you must tell it how.

By default, two objects of a custom class are equal only if they are the exact same object in memory. So two different Student objects with the same name would not count as duplicates unless you teach .NET otherwise.

public record Student(int Id, string Name);
 
// Records compare by their values automatically.
// Two Student(1, "Asha") objects are treated as equal,
// so HashSet and GroupBy will catch them as duplicates.

Using a record is the simplest fix in modern C#, because records compare by their contents out of the box. If you use a normal class, you would override Equals and GetHashCode, or pass an IEqualityComparer<T> to the HashSet.
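Here is a sketch that puts the three options side by side: a plain class, a record, and a class checked with a custom comparer. StudentClass, StudentRecord, and StudentIdComparer are made-up types for this demo, and the comparer assumes that two students with the same Id are the same student.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same HashSet check as before, but the equality rule is passed in.
static bool HasDuplicates<T>(IEnumerable<T> items, IEqualityComparer<T> comparer)
{
    var seen = new HashSet<T>(comparer);
    return items.Any(item => !seen.Add(item));
}

var classStudents  = new[] { new StudentClass(1, "Asha"), new StudentClass(1, "Asha") };
var recordStudents = new[] { new StudentRecord(1, "Asha"), new StudentRecord(1, "Asha") };

bool classResult    = HasDuplicates(classStudents, EqualityComparer<StudentClass>.Default);
bool recordResult   = HasDuplicates(recordStudents, EqualityComparer<StudentRecord>.Default);
bool comparerResult = HasDuplicates(classStudents, new StudentIdComparer());

Console.WriteLine(classResult);    // False (two separate objects in memory)
Console.WriteLine(recordResult);   // True  (records compare by value)
Console.WriteLine(comparerResult); // True  (the comparer matches on Id)

// Hypothetical types used only for this demo.
public sealed class StudentClass
{
    public StudentClass(int id, string name) { Id = id; Name = name; }
    public int Id { get; }
    public string Name { get; }
}

public record StudentRecord(int Id, string Name);

// Assumption: two students are "the same" when their Id matches.
public sealed class StudentIdComparer : IEqualityComparer<StudentClass>
{
    public bool Equals(StudentClass? x, StudentClass? y) =>
        ReferenceEquals(x, y) || (x is not null && y is not null && x.Id == y.Id);

    public int GetHashCode(StudentClass obj) => obj.Id.GetHashCode();
}
```

A comparer is handy when you cannot change the class itself, or when different parts of your program need different ideas of "the same."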

How equality drives duplicate checks

  1. Take two items: A and B.
  2. Ask the rule: Equals + GetHashCode.
  3. Compare: same key means same item.
  4. Decide: equal items are duplicates.

Figure: Each method asks the equality rules whether two items match.

Watch out for these common mistakes

A few small traps catch many beginners. Keep this state diagram in mind: an item is either new or already seen, and only "already seen" means duplicate.

Figure: The two states of an item while scanning.

  • Calling Contains then Add. Some people write if (!set.Contains(x)) set.Add(x);. This checks the set twice. Just use set.Add(x) and read its return value. It does both jobs in one step.
  • Re-running a lazy query. Methods like Distinct().Count() can walk a database query or IEnumerable more than once. If each walk hits the database, that is slow and wasteful. Call .ToList() first if you must reuse the data.
  • Forgetting equality for custom types. As shown above, two different objects are not equal by default. Use a record, or override Equals and GetHashCode.
  • Using the nested loop on big data. It looks innocent but explodes on large lists. Reach for HashSet instead.
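The first trap in the list above is easiest to see side by side. This sketch counts repeats both ways on the sample data. The variable names are only for illustration.

```csharp
using System;
using System.Collections.Generic;

int[] numbers = { 1, 2, 3, 4, 2, 5 };

// Two lookups per new item: Contains, then Add.
var twoStepSet = new HashSet<int>();
int repeatsTwoStep = 0;
foreach (var n in numbers)
{
    if (!twoStepSet.Contains(n))
    {
        twoStepSet.Add(n);
    }
    else
    {
        repeatsTwoStep++;
    }
}

// One lookup per item: Add alone reports whether the value was new.
var oneStepSet = new HashSet<int>();
int repeatsOneStep = 0;
foreach (var n in numbers)
{
    if (!oneStepSet.Add(n))
    {
        repeatsOneStep++;
    }
}

Console.WriteLine(repeatsTwoStep);                   // 1
Console.WriteLine(repeatsOneStep);                   // 1
Console.WriteLine(twoStepSet.SetEquals(oneStepSet)); // True
```

Both loops end with the same set and the same count, so the shorter one costs you nothing.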

Putting it together

Here is a tiny program that runs the fast method on both sample arrays so you can see the output.

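// These usings are implicit in newer project templates; listed here for completeness.
using System;
using System.Collections.Generic;
using System.Linq;

int[] withDup = { 1, 2, 3, 4, 2, 5 };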
int[] noDup   = { 1, 2, 3, 4, 5, 6 };
 
Console.WriteLine(HasDuplicates(withDup)); // True
Console.WriteLine(HasDuplicates(noDup));   // False
 
static bool HasDuplicates<T>(IEnumerable<T> items)
{
    var seen = new HashSet<T>();
    return items.Any(item => !seen.Add(item));
}

For almost every real project, this HashSet approach is the right default. It is fast, it stops early, and it works on any collection type.

Quick recap

  • A duplicate means the same value appears more than once in a collection.
  • HashSet + foreach is the fast favourite. Add returns false on a repeat, and the loop can stop early.
  • HashSet + LINQ Any does the same thing in one short line using short-circuiting.
  • GroupBy is the best choice when you need the actual list of repeated values, not just yes or no.
  • Distinct + Count reads the cleanest but is slower than the HashSet methods, because it always checks everything and may walk the data twice.
  • The nested loop is O(n²) and should only be used for learning, never on big data.
  • For custom classes, use a record or override Equals and GetHashCode so the methods know when two items are equal.
  • Avoid the Contains then Add pattern; a single Add call already does both checks.
