Debunking the Filter Early, JOIN Later SQL Performance Myth
The 'filter before you JOIN' tip is mostly a myth. Modern SQL optimizers already push predicates down. Learn what really happens and how to write clean SQL.
The tea stall and the long queue
Picture a busy tea stall outside a railway station. A long line of people waits. The stall owner has a helper who takes orders.
Now imagine someone gives the helper this advice: "Always pour the tea before you check if the customer has money. It is faster that way!"
That sounds silly, right? A smart helper would never pour first. He naturally checks the small, quick thing (does this person actually want tea, do they have the coins) before doing the heavy work of boiling and pouring. He does this on his own. Nobody has to remind him.
A modern SQL database works just like that smart helper. When you write a query, you do not need to tell it "please filter the rows before you join the tables." It already knows that filtering early is cheaper, and it rearranges the work by itself. The famous tip "filter early, JOIN later" is mostly a leftover habit from very old databases. Today it is, in most cases, a myth.
Let us understand why.
What the myth actually claims
The advice usually sounds like one of these:
- "Put your
WHEREfilter before theJOINso fewer rows get joined." - "Wrap the table in a subquery that filters first, then join the small result."
- "Order your joins so the smallest table comes first."
The belief is that the order you type things controls the order the database runs things. If that were true, typing the filter first would mean the database filters first, and you would join fewer rows.
But that is not how a modern SQL engine works.
SQL is declarative, not a recipe
Here is the key idea. SQL is a declarative language. You describe what result you want. You do not write the steps to get there.
This is different from a normal program where you write step 1, step 2, step 3 and the computer runs them in that exact order. In SQL, you hand the engine a goal, and a special part of the engine called the query optimizer decides the steps.
So the text order of your JOIN and WHERE is just a description. The optimizer is free to ignore that order and pick its own.
The logical order is not the physical order
People often learn the "logical order of operations" in SQL. It looks like this:
FROMandJOINWHEREGROUP BYHAVINGSELECTORDER BY
This order says JOIN comes before WHERE. That is where the worry starts. People think: "Oh no, the join runs first on ALL the rows, then the filter throws most of them away. What a waste!"
But this logical order is only a rule for meaning. It tells you what the result should be. It does not tell you the actual machine steps. The real steps are called the physical plan, and the optimizer is allowed to move the filter earlier.
The trick the optimizer uses here has a name: predicate pushdown.
Predicate pushdown, the helper that filters early for you
A predicate is just a fancy word for a condition, like City = 'Mumbai' or Age > 18.
Predicate pushdown means the optimizer takes that condition and pushes it down close to the table it belongs to. It applies the filter before the join, even if you typed the filter after the join.
So when you write this:
SELECT c.Name, o.Total
FROM Customers c
JOIN Orders o ON o.CustomerId = c.Id
WHERE c.City = 'Mumbai';The optimizer does not join every customer to every order and then filter. Instead it filters customers down to Mumbai first, then joins only those. You did not have to ask. It happened for free.
How predicate pushdown rewrites the work
Steps
Read query
Optimizer reads JOIN and WHERE together
Spot filter
Finds City = Mumbai on Customers
Push down
Applies filter before the join
Join small set
Joins only Mumbai customers to orders
This is exactly the tea stall helper checking the coins before pouring. The database is the helper. You do not need to micromanage it.
A quick proof: two queries, one plan
Let us compare the "myth way" and the "clean way".
The myth way filters first inside a subquery, then joins:
SELECT m.Name, o.Total
FROM (
SELECT Id, Name
FROM Customers
WHERE City = 'Mumbai'
) AS m
JOIN Orders o ON o.CustomerId = m.Id;The clean way just writes a normal join with a WHERE:
SELECT c.Name, o.Total
FROM Customers c
JOIN Orders o ON o.CustomerId = c.Id
WHERE c.City = 'Mumbai';If you turn on the execution plan in SQL Server (or use EXPLAIN ANALYZE in PostgreSQL), you will almost always see the same physical plan for both. Same operators. Same cost. Same speed. The optimizer flattens the subquery and pushes the predicate down in both cases.
So the extra typing in the "myth way" buys you nothing. It only makes the query harder to read.
When the order really did matter (history)
The advice was not always wrong. Decades ago, some database engines had weak or simple optimizers. They sometimes followed your text more literally. On those old systems, hand-tuning the join and filter order could genuinely help.
That is where the habit came from. It made sense then. But engines like SQL Server, PostgreSQL, MySQL, and Oracle have spent years building cost-based optimizers. These look at table size, indexes, and statistics, and they pick a smart plan no matter how you typed the query.
| Era | Optimizer | Did text order matter? |
|---|---|---|
| Old / simple engines | Weak or rule-based | Yes, sometimes |
| Modern engines | Cost-based, smart | Almost never |
| Edge cases today | Stale stats, huge queries | Rarely, fix the root cause |
What a cost-based optimizer really uses
The optimizer does not guess. It uses statistics: small summaries of your data, like how many rows are in a table and how many different values a column has.
With those numbers it estimates the cost of many possible plans and picks the cheapest. The order you typed your joins is just one starting idea, not a command.
This is why two queries that look very different can run at the exact same speed. They land on the same chosen plan.
What this means for EF Core and LINQ
If you write C# with Entity Framework Core, the same lesson applies. You write LINQ, EF Core turns it into SQL, and the database optimizer does the rest. You do not need to twist your LINQ to "filter before include."
Here is a clean, readable query:
var mumbaiOrders = await db.Customers
.Where(c => c.City == "Mumbai")
.SelectMany(c => c.Orders, (c, o) => new
{
c.Name,
o.Total
})
.ToListAsync();EF Core sends a normal JOIN with a WHERE City = 'Mumbai'. The database pushes that filter down before the join. You get a fast plan without any hand-tuning.
The same is true if you write the filter "later" in your LINQ chain. EF Core builds one SQL statement, and the optimizer reorders the work. So focus on clarity, not on the position of your .Where.
One real EF Core tip that does help: only select the columns you need. That genuinely reduces the data sent back. That is a different, real optimization, not the join-order myth.
// Good: ask only for what you use
var names = await db.Customers
.Where(c => c.City == "Mumbai")
.Select(c => c.Name)
.ToListAsync();When you DO need to step in
The myth is mostly false, but "mostly" is not "always." A few real cases need your attention. The fix is almost never reordering text. It is fixing the root cause.
What to do when a query is genuinely slow
Steps
Slow query
You measured it, it is truly slow
Read plan
Look at the actual execution plan
Find cause
Missing index? Stale stats? Bad estimate?
Fix root
Add index, update stats, rewrite logic
Here are the honest edge cases:
| Real problem | Sign you see | The right fix |
|---|---|---|
| Stale statistics | Bad row estimates in the plan | Update statistics |
| Missing index | Big table scan on a filter column | Add a suitable index |
| Very complex query | Optimizer times out searching plans | Simplify or split the query |
| Non-sargable filter | WHERE YEAR(date) = 2026 ignores index | Rewrite to a range filter |
Notice that none of these fixes is "type the filter before the join." That habit does not appear anywhere in the real fixes.
A small note on join order specifically
Sometimes people say "put the smallest table first." Modern optimizers reorder joins for you using cost estimates, so this is usually pointless too. The optimizer already considers many join orders and picks a good one.
The rare exception is a query with so many joins that the optimizer cannot explore every order in time. Even then, the better answer is to simplify the query or update statistics, not to memorize a seating order for your tables.
Why clean SQL wins in the long run
There is one more reason to drop the myth. Clean SQL is easier for people to read. When you wrap a table in a filtering subquery "for speed," the next person who reads your code has to stop and figure out why. They wonder if there is a hidden reason. They are scared to change it.
A plain JOIN with a clear WHERE tells the truth in one glance. It says exactly what data you want. The optimizer turns it into the same fast plan anyway, so you lose nothing on speed and gain a lot on clarity.
Good code is read many more times than it is written. A query that is easy to understand will be easier to fix, easier to extend, and easier to trust. That is a real, lasting benefit, and it costs you no performance at all.
So the simple rule is this: write the clearest query you can, measure only when something feels slow, and reach for the execution plan instead of folklore. Your future self, and your teammates, will thank you.
How to actually look under the hood
The best habit is not memorizing tips. It is reading the execution plan. That shows you the truth.
- In SQL Server: turn on "Include Actual Execution Plan", or run
SET STATISTICS IO ON;to see reads. - In PostgreSQL: run
EXPLAIN ANALYZEbefore your query. - In MySQL: run
EXPLAIN ANALYZEas well.
When you read the plan, you will often find your "filter early" rewrite changed nothing. That is the optimizer quietly telling you it already had things covered.
Quick recap
- SQL is declarative. You describe the result; the optimizer picks the steps.
- The logical order (JOIN before WHERE) is about meaning, not real machine steps.
- Predicate pushdown means the optimizer filters early for you, automatically.
- "Filter early, JOIN later" rewrites usually compile to the same plan. They do not help.
- The advice came from old, weak optimizers. Modern engines are cost-based and smart.
- Real slow queries come from missing indexes, stale statistics, or non-sargable filters, not text order.
- Write clear, readable SQL and LINQ. Then check the actual execution plan when something is slow.
- One real win: select only the columns you need. That is a true optimization, unlike join-order folklore.
References and further reading
- Query Processing Architecture Guide — SQL Server (Microsoft Learn)
- Showplan Logical and Physical Operators Reference (Microsoft Learn)
- Debunking the "Filter Early, JOIN Later" SQL Performance Myth — Milan Jovanović
- Pushing down filters to make queries faster — DoltHub Blog
- Order of Execution of SQL Queries — GeeksforGeeks
Related Posts
A Complete Guide to Different Types of Joins in SQL
Learn SQL joins the easy way: INNER, LEFT, RIGHT, FULL OUTER, CROSS and SELF joins explained with simple tables, diagrams, and C# examples.
EF Core Query Splitting: Fix Slow Queries and Cartesian Explosion
Learn how EF Core query splitting (AsSplitQuery) fixes the cartesian explosion problem with simple examples, diagrams, and real performance numbers. Know when to split and when not to.
Complete Guide to Transaction Isolation Levels in SQL
Learn SQL transaction isolation levels the easy way: dirty reads, non-repeatable reads, phantoms, snapshot, and serializable with simple diagrams and C# code.
Optimizing SQL Performance with Indexing Strategies for Faster Queries
Learn SQL indexing the easy way: clustered, nonclustered, covering and composite indexes, with simple diagrams and C# examples to make your queries fast.
Introduction to Locking and Concurrency Control in .NET 6
A beginner-friendly guide to locking and concurrency control in .NET 6 and EF Core, with a simple analogy, diagrams, code, and optimistic vs pessimistic locking.
EF Core Query Optimization: From 30 Seconds to 30 Milliseconds
Learn EF Core query optimization step by step: fix N+1 queries, use projections, AsNoTracking, indexes, and compiled queries to turn a 30-second query into 30ms.