1. sql
  2. /references
  3. /distinct

SQL SELECT DISTINCT Statements

Definition

The SELECT DISTINCT statement in SQL is used to return only unique values in the result set of a query. It eliminates duplicate rows from the result set. The DISTINCT keyword is used in the SELECT statement to accomplish this.

Here is the basic syntax of the SELECT DISTINCT statement:

SELECT DISTINCT column1, column2 
FROM table_name;

Example

If you have a table called employees with columns first_name, last_name, and age, you could use the following query to select all the unique first names in the table:

SELECT DISTINCT first_name 
FROM employees;

This will return a list of all the distinct first names in the "employees" table, without any duplicates.

You could also use multiple columns in the SELECT DISTINCT statement. For example, the following query will return a list of all the unique first name and last name combinations in the employees table:

SELECT DISTINCT first_name, last_name 
FROM employees;

You can use DISTINCT in combination with other clauses such as WHERE and ORDER BY, for example:

SELECT DISTINCT first_name, last_name 
FROM employees 
WHERE age > 25 
ORDER BY last_name;

It would select distinct first_name and last_name combinations from the table employees, filtering only the rows where age is greater than 25, ordering the results by last_name

Best Practices

There are a few best practices to keep in mind when using the SELECT DISTINCT statement in SQL:

  • Use DISTINCT only when necessary: The DISTINCT keyword is used to eliminate duplicate rows from the result set, but it can also make a query slower by increasing the amount of work that the database server needs to do. So, you should use it only when it is really needed.
  • Use the right column or columns: If you are only interested in unique values of a single column, you should use that column in the DISTINCT clause. If you want to see unique combinations of multiple columns, you should use those columns in the DISTINCT clause.
  • Use indexing: If you are using a DISTINCT clause on a column that is not indexed, it can make the query slow. Therefore, it is a good idea to make sure that the column or columns you are using in the DISTINCT clause are indexed.
  • Avoid using DISTINCT with aggregate functions: Aggregate functions such as SUM, AVG, MAX and MIN already return a single value for each group of rows. Therefore, it is not necessary to use DISTINCT with them.
  • Be aware of sorting: When using the DISTINCT clause, the order of the rows in the result set is undefined. If the order is important, you should use the GROUP BY clause instead of the DISTINCT clause.
  • Keep an eye on performance: DISTINCT statements can be heavy in terms of performance, since it is required to look at all rows and compare the data to discard duplicates. If it is an option, is suggested to use LIMIT and/or to combine it with index-backed queries to achieve better performance.