SQL GROUP BY Statement
Definition
In SQL, the GROUP BY
statement is used in conjunction with aggregate functions (like SUM
, COUNT
, AVG
, MAX
, MIN
, etc.) to group the result-set by one or more columns. The aggregate functions are used to calculate a single value for each group of rows.
Example
For example, consider the following table named "orders":
id | customer_name | product | quantity | price |
---|---|---|---|---|
1 | John Smith | Hammer | 2 | 10 |
2 | John Smith | Screws | 100 | 5 |
3 | Jane Doe | Saw | 1 | 50 |
4 | Jane Doe | Hil | 200 | 5 |
5 | John Smith | Ben | 50 | 5 |
If we want to find out the total quantity of each product ordered, we can use the SUM()
function with GROUP BY
:
SELECT product, SUM(quantity)
FROM orders
GROUP BY product;
The result of this query would be:
product | SUM(quantity) |
---|---|
Hammer | 2 |
Saw | 1 |
Screws | 350 |
Notice that the SELECT
statement only includes the columns mentioned in the GROUP BY
clause and the aggregate function. The columns that are not included in the GROUP BY clause, like "customer_name" and "price", are not displayed in the result set.
You can also combine multiple column in GROUP BY, for example:
SELECT customer_name, product, SUM(quantity)
FROM orders
GROUP BY customer_name, product;
The result of this query would be:
id | customer_name | product | SUM (quantity) |
---|---|---|---|
1 | Jane Doe | Saw | 1 |
2 | Jane Doe | Screws | 200 |
3 | John Smith | Hammer | 2 |
4 | John Smith | Screws | 150 |
It will create a separate group for each combination of the columns mentioned in the GROUP BY clause.
Best Practices
Here are some best practices to keep in mind when using the GROUP BY statement:
Always include a
WHERE
clause before theGROUP BY
clause, if possible, to filter the data. This will make the data more manageable and improve performance.Use the correct aggregate function for the type of data you're working with. For example, use
SUM()
for numerical data andCOUNT()
for categorical data.When selecting multiple columns in the
SELECT
statement, make sure that the column that is not included in theGROUP BY
clause is an aggregate function or a constant.Make sure that you understand how
NULL
values are handled by the aggregate function you're using. Some functions (likeCOUNT()
) treatNULL
values as distinct from other values, while others (likeSUM()
) ignoreNULL
values.Use the
GROUP BY
clause wisely, and make sure you understand how it affects the results of your query. It's very easy to produce incorrect results if you're not careful.Optimize and debug your query, by using the database management system's appropriate
EXPLAIN
command, testing with a small dataset, and also by checking the performance of the query by seeing how much IO, CPU and Memory it is consuming.