SQL SELECT DISTINCT Statements
Definition
The SELECT DISTINCT statement in SQL is used to return only unique values in the result set of a query. It eliminates duplicate rows from the result set. The DISTINCT
keyword is used in the SELECT
statement to accomplish this.
Here is the basic syntax of the SELECT DISTINCT
statement:
SELECT DISTINCT column1, column2
FROM table_name;
Example
If you have a table called employees
with columns first_name
, last_name
, and age
, you could use the following query to select all the unique first names in the table:
SELECT DISTINCT first_name
FROM employees;
This will return a list of all the distinct first names in the "employees" table, without any duplicates.
You could also use multiple columns in the SELECT DISTINCT
statement. For example, the following query will return a list of all the unique first name and last name combinations in the employees
table:
SELECT DISTINCT first_name, last_name
FROM employees;
You can use DISTINCT in combination with other clauses such as WHERE
and ORDER BY
, for example:
SELECT DISTINCT first_name, last_name
FROM employees
WHERE age > 25
ORDER BY last_name;
It would select distinct first_name
and last_name
combinations from the table employees
, filtering only the rows where age
is greater than 25, ordering the results by last_name
Best Practices
There are a few best practices to keep in mind when using the SELECT DISTINCT statement in SQL:
- Use
DISTINCT
only when necessary: TheDISTINCT
keyword is used to eliminate duplicate rows from the result set, but it can also make a query slower by increasing the amount of work that the database server needs to do. So, you should use it only when it is really needed. - Use the right column or columns: If you are only interested in unique values of a single column, you should use that column in the
DISTINCT
clause. If you want to see unique combinations of multiple columns, you should use those columns in the DISTINCT clause. - Use indexing: If you are using a
DISTINCT
clause on a column that is not indexed, it can make the query slow. Therefore, it is a good idea to make sure that the column or columns you are using in theDISTINCT
clause are indexed. - Avoid using
DISTINCT
with aggregate functions: Aggregate functions such asSUM
,AVG
,MAX
andMIN
already return a single value for each group of rows. Therefore, it is not necessary to useDISTINCT
with them. - Be aware of sorting: When using the DISTINCT clause, the order of the rows in the result set is undefined. If the order is important, you should use the
GROUP BY
clause instead of theDISTINCT
clause. - Keep an eye on performance:
DISTINCT
statements can be heavy in terms of performance, since it is required to look at all rows and compare the data to discard duplicates. If it is an option, is suggested to useLIMIT
and/or to combine it with index-backed queries to achieve better performance.