In MySQL, a subquery is a query nested inside another statement such as SELECT, INSERT, UPDATE, or DELETE. The enclosing statement is known as the "outer query", and the subquery is the "inner query". Conceptually, a non-correlated subquery is evaluated first and its result feeds the outer query. A correlated subquery, by contrast, depends on values from the outer query and is evaluated against the outer rows, and the optimizer is free to rewrite either form.
This comprehensive guide for developers will cover:
- Can we use subquery in WHERE clause in MySQL?
- Types of subqueries in MySQL
- How to use subquery in MySQL WHERE clause
- Using MySQL subquery with comparison operators
- Using MySQL subquery with IN/NOT IN operators
- Subquery vs JOIN performance
- Pros and cons of subqueries vs joins
- Optimizing subquery performance
Can We Use Subquery in WHERE Clause in MySQL?
Yes, we can use a subquery in the WHERE clause in MySQL. The WHERE clause is used to filter records that meet specified criteria.
Below is the syntax for using a subquery in the WHERE clause:
SELECT columns
FROM table1
WHERE column1 operator (SELECT column1
FROM table2
WHERE condition);
Let's go through the main types of subqueries first, and then some examples to understand the usage better.
Types of Subqueries in MySQL
Broadly, subqueries can be categorized into the following types:
1. Single-row subquery
Returns one column value from one row. Used with operators like =, <, > etc.
2. Multiple-row subquery
Returns one column value from multiple rows. Used with IN/NOT IN operators.
3. Multi-column subquery
Returns multiple column values from one or more rows. Used in WHERE clause with multiple columns.
4. Correlated subquery
Outer query values are passed to inner query. Results depend on outer table values.
5. Nested subquery
A subquery inside another subquery. Can have multiple levels of nesting.
Now let's go through examples of the different types of subqueries.
Single-Row Subquery
A single-row subquery returns only one row with one column's value. It is used with comparison operators like =, < or > in the WHERE clause.
For example:
SELECT *
FROM employees
WHERE salary > (SELECT AVG(salary)
FROM employees);
Here the inner query returns just the average salary value from the same table, which is then compared in the outer query using > operator to filter records.
This will return details of employees earning above average salary.
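To try the query above without a MySQL server, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. The table contents are hypothetical sample rows, and SQLite is an assumption made for convenience; the SQL itself is the same shape as the MySQL query.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO employees (name, salary) VALUES (?, ?)",
    [("John", 75000), ("Sarah", 50000), ("Mark", 65000), ("Lisa", 40000)],
)

# The scalar subquery yields a single value (the overall average, 57500 here),
# which the outer query compares against each row's salary.
above_average = conn.execute(
    """
    SELECT name, salary
    FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees)
    ORDER BY name
    """
).fetchall()
print(above_average)
```

With these rows only John and Mark earn more than the 57500 average.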
Multiple-Row Subquery
A multiple-row subquery returns multiple rows or values in one column, which can then be used with IN/NOT IN operators.
For example:
SELECT first_name, last_name
FROM employees
WHERE salary IN (SELECT salary
FROM employees
WHERE job_id = 'IT_PROG');
The subquery retrieves the salary values of all employees with job_id = 'IT_PROG'. The outer query uses the IN operator to return the names of every employee, in any job, whose salary equals one of those values.
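The sketch below runs the same IN pattern against a few hypothetical rows, again using Python's sqlite3 as a stand-in for MySQL (an assumption). It is meant to show one easy-to-miss consequence: the outer query is not restricted to IT_PROG, so an employee in another job with a matching salary is returned too.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (first_name TEXT, last_name TEXT, job_id TEXT, salary INTEGER)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        ("Ann", "Lee", "IT_PROG", 6000),
        ("Bob", "Ray", "IT_PROG", 4800),
        ("Cara", "Diaz", "SA_REP", 6000),
        ("Dan", "Kim", "SA_REP", 7000),
    ],
)

# The subquery returns the set {6000, 4800}; IN matches any row whose
# salary equals one of those values, whatever its job_id.
matches = conn.execute(
    """
    SELECT first_name, last_name
    FROM employees
    WHERE salary IN (SELECT salary FROM employees WHERE job_id = 'IT_PROG')
    ORDER BY first_name
    """
).fetchall()
print(matches)
```

Cara is a SA_REP but appears in the result because her salary equals Ann's.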
Multi-Column Subquery
We can also return values from multiple columns in a subquery, to be used in the outer WHERE clause.
For example:
SELECT *
FROM employees
WHERE (job_id, salary) IN (SELECT job_id, AVG(salary)
FROM employees
GROUP BY job_id);
Here the subquery returns two columns, job_id and the average salary for that job_id, with one row per job. Because it returns several rows, the row comparison must use IN; with = MySQL raises "Subquery returns more than 1 row". The outer query compares both columns together to filter records.
This returns the records whose salary exactly equals the average salary of their own job_id.
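A minimal sketch of the row comparison, using Python's sqlite3 as a stand-in for MySQL (an assumption; row values need SQLite 3.15 or later). To keep the comparison between exact integers, this version swaps AVG for MAX and finds the top earner in each job. The rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT, job_id TEXT, salary INTEGER)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ann", "IT_PROG", 6000),
        ("Bob", "IT_PROG", 4000),
        ("Cara", "SA_REP", 7000),
        ("Dan", "SA_REP", 6000),
    ],
)

# Row comparison: both job_id and salary must match the same subquery row.
pair_match = conn.execute(
    """
    SELECT first_name
    FROM employees
    WHERE (job_id, salary) IN (SELECT job_id, MAX(salary)
                               FROM employees
                               GROUP BY job_id)
    ORDER BY first_name
    """
).fetchall()

# Single-column comparison for contrast: any salary equal to some job's maximum.
salary_only = conn.execute(
    """
    SELECT first_name
    FROM employees
    WHERE salary IN (SELECT MAX(salary) FROM employees GROUP BY job_id)
    ORDER BY first_name
    """
).fetchall()
print(pair_match, salary_only)
```

Dan earns 6000, which is the IT_PROG maximum but not the SA_REP maximum, so he matches only the single-column version. That difference is the reason to compare both columns as a pair.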
Correlated Subquery
A correlated subquery uses values from the outer query table in its WHERE clause filtering. So it is related to or dependent on the outer query.
For example:
SELECT first_name, last_name
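FROM employees e_outer
WHERE salary > (SELECT AVG(salary)
FROM employees e_inner
WHERE e_outer.job_id = e_inner.job_id);
The inner query references the outer table through its alias to compare job_id values, so it calculates the average salary for the current row's job. The outer query then compares the row's salary with this per-job average. The aliases are e_outer and e_inner because OUTER and INNER are reserved words in MySQL and cannot be used as unquoted aliases.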
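The correlated pattern can be sketched as follows, with Python's sqlite3 standing in for MySQL (an assumption) and hypothetical rows. The alias names e_outer and e_inner are chosen for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (first_name TEXT, last_name TEXT, job_id TEXT, salary INTEGER)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        ("Ann", "Lee", "IT_PROG", 6000),
        ("Bob", "Ray", "IT_PROG", 4000),
        ("Cara", "Diaz", "SA_REP", 7000),
        ("Dan", "Kim", "SA_REP", 9000),
    ],
)

# For each outer row, the inner query averages only the rows that share
# that row's job_id (5000 for IT_PROG, 8000 for SA_REP).
above_job_average = conn.execute(
    """
    SELECT first_name, last_name
    FROM employees e_outer
    WHERE salary > (SELECT AVG(salary)
                    FROM employees e_inner
                    WHERE e_inner.job_id = e_outer.job_id)
    ORDER BY first_name
    """
).fetchall()
print(above_job_average)
```

Cara earns more than Ann in absolute terms but is below her own job's average, so only Ann and Dan are returned.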
Nested Subquery
We can have queries nested multiple levels deep using nested subqueries.
For example:
SELECT * FROM employees
WHERE salary > (SELECT avg_sal
FROM (SELECT AVG(salary) AS avg_sal
FROM employees) AS overall);
The innermost query computes the overall average salary as a derived table, which MySQL requires to have an alias (overall here). The next level selects that single avg_sal column from it. The outermost query then compares each employee's salary with that average.
Multiple levels of nesting are possible for complex logical filtering.
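As a slightly more useful variation on nesting, the sketch below compares each salary with the average of the per-job averages rather than the plain overall average. It uses Python's sqlite3 as a stand-in for MySQL (an assumption), hypothetical rows, and a hypothetical derived-table alias per_job.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT, job_id TEXT, salary INTEGER)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ann", "IT_PROG", 6800), ("Bob", "IT_PROG", 3200), ("Dan", "SA_REP", 9000)],
)

# Two levels of nesting: the derived table per_job holds one average per job
# (5000 and 9000), and the middle query averages those to 7000.
above_avg_of_job_avgs = conn.execute(
    """
    SELECT first_name
    FROM employees
    WHERE salary > (SELECT AVG(avg_sal)
                    FROM (SELECT AVG(salary) AS avg_sal
                          FROM employees
                          GROUP BY job_id) AS per_job)
    ORDER BY first_name
    """
).fetchall()

# One level of nesting for contrast: the plain overall average (about 6333).
above_overall_avg = conn.execute(
    """
    SELECT first_name
    FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees)
    ORDER BY first_name
    """
).fetchall()
print(above_avg_of_job_avgs, above_overall_avg)
```

The two thresholds differ because IT_PROG has more rows than SA_REP, so Ann clears the overall average but not the average of job averages.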
Now let's explore some real-world examples of using subqueries in the WHERE clause.
Example 1: Subquery using Comparison Operators
Consider the following product pricing data:
products
id | name | price | category |
---|---|---|---|
1 | Keyboard | 50 | IT |
2 | Mouse | 20 | IT |
3 | Flash Drive | 70 | IT |
4 | Shirt | 15 | Clothing |
categories
name | discount |
---|---|
IT | 10% |
Clothing | 20% |
To retrieve the names and prices of products priced above the discounted average price of products in the IT category, we can compare price against a subquery using the > operator:
SELECT name, price
FROM products
WHERE price > (SELECT AVG(price) * 0.9
FROM products
WHERE category = 'IT');
The subquery calculates the average price of IT products (about 46.67) and multiplies it by 0.9 to apply the 10% IT discount listed in the categories table. Note that the factor is hard-coded in the query rather than read from that table.
This returns the records priced above the discounted average of about $42:
name | price |
---|---|
Keyboard | 50 |
Flash Drive | 70 |
By using different comparison operators like =, <> etc. in WHERE clause along with subquery, we can retrieve precise result sets from tables.
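The numbers in Example 1 can be checked with a short sketch. It uses Python's sqlite3 as a stand-in for MySQL (an assumption) and, unlike the query above, reads the discount from the categories table through a second scalar subquery. Storing the discount as the fraction 0.10 instead of the text "10%" is also an assumption made for this sketch.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price INTEGER, category TEXT);
    -- Assumption: discount stored as a fraction (0.10) rather than the text '10%'.
    CREATE TABLE categories (name TEXT PRIMARY KEY, discount REAL);
    INSERT INTO products VALUES
        (1, 'Keyboard', 50, 'IT'), (2, 'Mouse', 20, 'IT'),
        (3, 'Flash Drive', 70, 'IT'), (4, 'Shirt', 15, 'Clothing');
    INSERT INTO categories VALUES ('IT', 0.10), ('Clothing', 0.20);
    """
)

# Discounted average of IT prices: (50 + 20 + 70) / 3 * (1 - 0.10), about 42.
threshold = conn.execute(
    """
    SELECT (SELECT AVG(price) FROM products WHERE category = 'IT')
         * (1 - (SELECT discount FROM categories WHERE name = 'IT'))
    """
).fetchone()[0]

# Products priced above that threshold.
rows = conn.execute(
    """
    SELECT name, price
    FROM products
    WHERE price > (SELECT AVG(price) FROM products WHERE category = 'IT')
                * (1 - (SELECT discount FROM categories WHERE name = 'IT'))
    ORDER BY id
    """
).fetchall()
print(round(threshold, 2), rows)
```

Reading the discount from the table keeps the query correct if the discount changes, at the cost of one more scalar subquery.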
Example 2: Subquery with IN Operator
Consider the employee and salary band data below:
employees
id | name | job_role | salary |
---|---|---|---|
1 | John | Developer | 75000 |
2 | Sarah | Designer | 50000 |
3 | Mark | Developer | 65000 |
4 | Lisa | Marketing | 40000 |
salary_bands
band | min_salary | max_salary |
---|---|---|
High | 60000 | 150000 |
Medium | 40000 | 80000 |
Low | 20000 | 50000 |
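To retrieve the names of employees whose salary exactly equals one of the band boundaries in the table, we can use a subquery with the IN operator. An IN subquery compared against a single column must itself return a single column, so the minimum and maximum values are combined with UNION:
SELECT name
FROM employees
WHERE salary IN (SELECT min_salary FROM salary_bands
UNION
SELECT max_salary FROM salary_bands);
The subquery returns every min and max salary value from the salary_bands table. IN tests for equality with any of those values. It does not test whether a salary lies between them; a range check needs BETWEEN or a pair of comparisons instead.
This returns the employees whose salary sits exactly on a band boundary:
name |
---|
Sarah |
Lisa |
So by using IN or NOT IN with a multi-row subquery, we can check whether a value matches any value returned by the subquery. Take care with NOT IN when the subquery can return NULL, because the condition then never evaluates to true.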
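The difference between an equality match with IN and a range check can be sketched with the sample data from this example. The code uses Python's sqlite3 as a stand-in for MySQL (an assumption). The range check for the High band, written with EXISTS and BETWEEN, is an illustrative addition rather than part of the example above.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, job_role TEXT, salary INTEGER);
    CREATE TABLE salary_bands (band TEXT PRIMARY KEY, min_salary INTEGER, max_salary INTEGER);
    INSERT INTO employees VALUES
        (1, 'John', 'Developer', 75000), (2, 'Sarah', 'Designer', 50000),
        (3, 'Mark', 'Developer', 65000), (4, 'Lisa', 'Marketing', 40000);
    INSERT INTO salary_bands VALUES
        ('High', 60000, 150000), ('Medium', 40000, 80000), ('Low', 20000, 50000);
    """
)

# IN is an equality test: the salary must equal one of the boundary values.
on_boundary = conn.execute(
    """
    SELECT name
    FROM employees
    WHERE salary IN (SELECT min_salary FROM salary_bands
                     UNION
                     SELECT max_salary FROM salary_bands)
    ORDER BY id
    """
).fetchall()

# A range test needs BETWEEN; here a correlated EXISTS checks the 'High' band.
in_high_band = conn.execute(
    """
    SELECT e.name
    FROM employees e
    WHERE EXISTS (SELECT 1
                  FROM salary_bands b
                  WHERE b.band = 'High'
                    AND e.salary BETWEEN b.min_salary AND b.max_salary)
    ORDER BY e.id
    """
).fetchall()
print(on_boundary, in_high_band)
```

Sarah and Lisa sit exactly on a boundary value, while John and Mark fall inside the High range without equalling any boundary.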
When to Use Subquery Over JOIN?
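Subqueries allow filtering records with data from other tables without adding those tables to the outer query's FROM clause. A join is not inherently slower than a subquery; the choice mostly depends on which columns you need in the result and which form expresses the intent more clearly.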
Use subquery over join when:
- Only few columns are required from other tables
- Aggregate values like count, sum, or avg are needed from another table
- Outer query requires filtered data, not complete joined table
Use join over subquery when:
- Multiple columns are required from other tables
- Same table is filtered/aggregated multiple times
- Result dataset is large – join performs better
Let's analyze a scenario to understand the performance difference between a subquery and a join.
Subquery vs Join Performance Analysis
employees table
id | name | dept | salary |
---|---|---|---|
1 | John | IT | 75000 |
2 | Mark | Admin | 55000 |
3 | Lisa | HR | 50000 |
departments table
dept_id | dept_name | location | employees_count |
---|---|---|---|
IT | IT | New York | 15 |
Admin | Admin | Chicago | 6 |
HR | HR | Washington | 12 |
Consider a query to find employees with salary more than average salary of their respective departments.
Using a correlated subquery:
SELECT e.name, e.salary
FROM employees e
WHERE e.salary > (SELECT AVG(e2.salary)
FROM employees e2
WHERE e2.dept = e.dept);
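Using a join against a derived table of per-department averages:
SELECT e.name, e.salary
FROM employees e
JOIN (SELECT dept, AVG(salary) AS avg_salary
FROM employees
GROUP BY dept) dept_avg
ON e.dept = dept_avg.dept
WHERE e.salary > dept_avg.avg_salary;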
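Let's compare the two forms on a large dataset by running EXPLAIN on each:
Subquery EXPLAIN
- A select_type of DEPENDENT SUBQUERY means the inner query is re-evaluated for outer rows
- Cost grows with the number of outer rows unless an index on the correlated column (dept here) keeps each evaluation cheap
- Often fine when the outer query is already selective
Join EXPLAIN
- A select_type of DERIVED shows the per-department averages being computed once
- The derived result is then joined to employees on dept
- Usually scales better when most rows of the table are examined
So for this query, which form wins depends on row counts, indexes, and the MySQL version. A figure such as the subquery running ~40% faster describes one dataset only and should be measured rather than assumed.
In general, neither form is inherently faster. MySQL's optimizer can convert many IN and EXISTS subqueries into semi-joins, and joins do not read entire tables when usable indexes and filters exist.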
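Before comparing performance, it helps to confirm that the two formulations return the same rows. The sketch below does that with Python's sqlite3 as a stand-in for MySQL (an assumption), using a correlated subquery and a join against a derived table of per-department averages. The rows extend the sample table with hypothetical employees so that each department average is meaningful, and dept_avg is a hypothetical alias.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER)")
# Hypothetical rows: two employees in IT and Admin, one in HR.
conn.executemany(
    "INSERT INTO employees (name, dept, salary) VALUES (?, ?, ?)",
    [
        ("John", "IT", 75000), ("Amy", "IT", 55000),
        ("Mark", "Admin", 55000), ("Raj", "Admin", 45000),
        ("Lisa", "HR", 50000),
    ],
)

# Correlated subquery: the average is computed for the outer row's department.
subquery_rows = conn.execute(
    """
    SELECT e.name, e.salary
    FROM employees e
    WHERE e.salary > (SELECT AVG(e2.salary)
                      FROM employees e2
                      WHERE e2.dept = e.dept)
    ORDER BY e.name
    """
).fetchall()

# Join: per-department averages are computed once, then joined back.
join_rows = conn.execute(
    """
    SELECT e.name, e.salary
    FROM employees e
    JOIN (SELECT dept, AVG(salary) AS avg_salary
          FROM employees
          GROUP BY dept) dept_avg
      ON e.dept = dept_avg.dept
    WHERE e.salary > dept_avg.avg_salary
    ORDER BY e.name
    """
).fetchall()
print(subquery_rows == join_rows, subquery_rows)
```

Once both forms are known to agree, running EXPLAIN on each in MySQL against realistic data volumes is the reliable way to pick one.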
Pros and Cons of Subqueries vs Joins
Pros of Subqueries:
- Encapsulate query logic easily
- Pre-filter data before outer query
- Aggregate subquery table once
- Can be written as correlated subquery
- Easy to maintain without affecting external query
Cons of Subqueries:
- Overuse can reduce readability
- Nested subqueries may reduce performance
- Some forms, especially correlated subqueries, may be optimized less effectively than joins
Pros of Joins:
- SQL engine handles optimization well
- Joins multiple tables easily
- Can retrieve multiple columns smoothly
- Required when data from all tables is needed
Cons of Joins:
- Complex queries with many table joins
- Increases load when joining large tables
- One-to-many joins can duplicate rows in the result set, which then needs DISTINCT
For highly complex data analysis – joins are often better. For focused filtering scenarios – subqueries have an edge. Based on query needs and data sizes – the right approach must be evaluated.
Optimizing Subquery Performance
When using subqueries – especially nested ones – SQL performance can take a hit. Here are some best practices to write efficient subqueries:
- Limit subquery table size – Filter rows inside the subquery with a WHERE clause so it processes as little data as possible, and index the columns it filters or correlates on.
- Consider EXISTS/NOT EXISTS – EXISTS can stop at the first matching row. Current MySQL versions often optimize IN and EXISTS the same way, so compare the plans. Prefer NOT EXISTS over NOT IN when the subquery column can be NULL.
- Materialize subquery results – Store filtered subquery output in a temporary table first, then use it from the outer query. This can also improve readability.
- Unnest nested subqueries – Simplify queries to avoid very deep nesting, which is harder to read and harder to optimize.
- Use correlated subqueries deliberately – A correlated subquery may be evaluated once per outer row, so it pays off mainly when the outer query is selective and the correlated column is indexed.
- Compare execution plans – Check the EXPLAIN output of the join and subquery versions to identify and optimize slow operations.
Following these practices helps keep subqueries in MySQL fast.
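Beyond speed, the choice between the IN and EXISTS families also affects results when NULLs are involved. The sketch below shows this using Python's sqlite3 as a stand-in for MySQL (an assumption; both follow standard SQL three-valued logic here). The customers and orders tables are hypothetical and exist only for this illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical tables for illustration only.
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cara');
    -- One order has no customer recorded (NULL).
    INSERT INTO orders VALUES (1, 1), (2, NULL);
    """
)

def names(sql):
    """Run a query and return the first column of each row as a list."""
    return [row[0] for row in conn.execute(sql)]

# IN and EXISTS agree when looking for matches.
with_orders_in = names(
    "SELECT name FROM customers WHERE id IN (SELECT customer_id FROM orders) ORDER BY id"
)
with_orders_exists = names(
    "SELECT name FROM customers c WHERE EXISTS "
    "(SELECT 1 FROM orders o WHERE o.customer_id = c.id) ORDER BY c.id"
)

# NOT IN returns nothing because comparing with the NULL yields unknown.
without_orders_not_in = names(
    "SELECT name FROM customers WHERE id NOT IN (SELECT customer_id FROM orders) ORDER BY id"
)
# NOT EXISTS is unaffected by the NULL row.
without_orders_not_exists = names(
    "SELECT name FROM customers c WHERE NOT EXISTS "
    "(SELECT 1 FROM orders o WHERE o.customer_id = c.id) ORDER BY c.id"
)
print(with_orders_in, with_orders_exists, without_orders_not_in, without_orders_not_exists)
```

If NOT IN is required, adding WHERE customer_id IS NOT NULL inside the subquery restores the expected result.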
Conclusion
In summary, subqueries provide a powerful way to filter and analyze MySQL data flexibly:
- Subqueries can be used in WHERE clause for conditional filtering
- Various types of subqueries help handle different use cases
- With suitable indexes, subqueries can match or outperform joins in some scenarios
- But joins are still better suited for complex multi-table data analysis
I hope this guide gave you a comprehensive overview of using subqueries in MySQL WHERE clause like a pro developer!