SQL Practice Problems to Train Your Teams
Prepare your team for advanced SQL tasks
In today's data-driven world, the ability to harness the power of data is an invaluable skill. And SQL is the language of databases. It's the key that unlocks the treasure troves of information stored in data repositories, allowing us to retrieve, manipulate, and analyze data with precision and efficiency.
To help your team develop a more thorough understanding of SQL, we have compiled a series of five SQL practice problems. These problems cover a range of advanced scenarios, allowing your team to develop SQL knowledge that will be useful in practical contexts.
Problem #1: SQL query processing with grouping, aggregation, filtering, and sorting
Consider a simplified database system with two tables: Employees and Departments. The Employees table contains information about employees, including their EmployeeID, EmployeeName, Salary, and DepartmentID. The Departments table contains information about departments, including DepartmentID and DepartmentName.
Write a step-by-step description of the logical query processing of the following SQL query:
Explain each step of how this query is processed logically, including any temporary results or intermediate stages.
Solution
The SQL query provided retrieves the average salary (AvgSalary) of employees in each department, but only for departments where the average salary is greater than $50,000. The results are sorted in descending order of the average salary.
Here's a step-by-step explanation of the logical query processing:
- FROM Clause:
- The first step is to identify the source tables for the query. In this case, we have the Departments table aliased as D and the Employees table aliased as E.
- The JOIN operation is used to combine the Departments and Employees tables based on the DepartmentID column, linking employees to their respective departments.
- WHERE Clause (if applicable):
- There is no WHERE clause in this query, so we move on to the next step.
- GROUP BY Clause:
- The GROUP BY clause is used to group the result set by the DepartmentName.
- This means that all rows with the same DepartmentName will be treated as a single group.
- SELECT Clause:
- In the SELECT clause, we calculate the average salary for each department using the AVG(Salary) function. The result is aliased as AvgSalary.
- We also select the DepartmentName.
- HAVING Clause:
- The HAVING clause is applied to the grouped results. It filters out groups (departments) where the average salary is not greater than $50,000.
- Departments that do not meet this condition are excluded from the result.
- ORDER BY Clause:
- The final step is to sort the remaining rows by the AvgSalary in descending order. This means that departments with the highest average salary will appear at the top of the result set.
- Result:
- The final result of the query is a list of department names (DepartmentName) and their corresponding average salaries (AvgSalary) where the average salary is greater than $50,000, sorted in descending order by average salary.
This query processing logic helps you understand how SQL queries are executed step by step to produce the desired result.
Recommended courses: Logical Query Processing, Part 1 and Logical Query Processing, Part 2 with Ami Levin
Problem #2: Calculating running total of sales using window functions
You have a database with a table named Sales that contains information about sales transactions. The Sales table has the following columns: SaleID, ProductID, SaleDate, and SaleAmount.
Write an SQL query that retrieves the following information for each sale:
The running total of sales (RunningTotal) for each product, sorted by SaleDate.
Your query should return all the sales and include the running total of sales for each product at the time of that sale, ordered by SaleDate.
Data set
Sales Table:
SaleID | ProductID | SaleDate | SaleAmount |
---|---|---|---|
101 | 1 | 2023-01-15 | 200.00 |
102 | 2 | 2023-01-16 | 375.00 |
103 | 3 | 2023-01-18 | 144.00 |
104 | 4 | 2023-01-20 | 600.00 |
105 | 5 | 2023-01-22 | 264.00 |
106 | 6 | 2023-01-23 | 504.00 |
107 | 7 | 2023-01-25 | 216.00 |
108 | 8 | 2023-01-27 | 462.00 |
109 | 9 | 2023-01-29 | 297.00 |
110 | 10 | 2023-01-30 | 266.00 |
Solution
You can use a window function, specifically the SUM window function, to calculate the running total of sales for each product. Here's the SQL query to solve this problem:
This query:
- Retrieves the SaleID, SaleDate, SaleAmount, and ProductID columns from the Sales table.
- Uses the SUM window function to calculate the running total of SaleAmount for each product. The PARTITION BY ProductID clause ensures that the running total is calculated separately for each product, and the ORDER BY SaleDate clause specifies the order in which the running total is calculated (in ascending order of SaleDate).
- Orders the results by SaleDate to get the sales and their running totals in chronological order.
This query will give you a result set with each sale, its date, amount, product, and the running total of sales for that product at the time of the sale.
SaleID | SaleDate | SaleAmount | ProductID | RunningTotal |
---|---|---|---|---|
101 | 2023-01-15 | 200 | 1 | 200 |
102 | 2023-01-16 | 375 | 2 | 575 |
103 | 2023-01-18 | 144 | 3 | 719 |
104 | 2023-01-20 | 600 | 4 | 1319 |
105 | 2023-01-22 | 264 | 5 | 1583 |
106 | 2023-01-23 | 504 | 6 | 2087 |
107 | 2023-01-25 | 216 | 7 | 2303 |
108 | 2023-01-27 | 462 | 8 | 2765 |
109 | 2023-01-29 | 297 | 9 | 3062 |
110 | 2023-01-30 | 266 | 10 | 3328 |
Recommended course: Advanced SQL – Window Functions with Ami Levin
Problem #3: Query tuning and optimization for customer order count
You have a database with two tables: Customers and Orders. The Customers table contains information about customers, including their CustomerID, CustomerName, and Country. The Orders table contains information about orders, including OrderID, CustomerID (which links to the CustomerID in the Customers table), and OrderDate.
You are given the following SQL query, which is intended to retrieve the list of customer names along with the total count of orders they have made:
However, you have noticed that this query is performing poorly, particularly when there are a large number of customers and orders in the database. Your task is to optimize this query to improve its performance.
Data set
Customers Table:
SaleID | CustomerName | Country |
---|---|---|
1 | John Smith | USA |
2 | Marry Johnson | Canada |
3 | David Wilson | UK |
4 | Sarah Brown | Australia |
5 | James Lee | Germany |
6 | Linda Davis | France |
7 | Richard White | Spain |
8 | Susan Miller | Italy |
9 | Michael Taylor | Brazil |
10 | Karen Anderson | Mexico |
Orders Table:
OrderID | CustomerID | OrderDate |
---|---|---|
1001 | 1 | 2023-10-01 |
1002 | 1 | 2023-10-02 |
1003 | 1 | 2023-10-03 |
1004 | 2 | 2023-10-04 |
1005 | 2 | 2023-10-05 |
1006 | 3 | 2023-10-06 |
1007 | 3 | 2023-10-07 |
1008 | 3 | 2023-10-08 |
1009 | 4 | 2023-10-09 |
1010 | 4 | 2023-10-10 |
1011 | 4 | 2023-10-11 |
1012 | 5 | 2023-10-12 |
1013 | 5 | 2023-10-13 |
1014 | 5 | 2023-10-14 |
1015 | 5 | 2023-10-15 |
1016 | 6 | 2023-10-16 |
1017 | 6 | 2023-10-17 |
1018 | 7 | 2023-10-18 |
Solution
Here are some optimization steps you can take:
Indexing: Ensure that there are appropriate indexes on the join columns (CustomerID in both tables) to speed up the join operation.
Selecting Only Necessary Columns: In the SELECT clause, you are only interested in the CustomerName and the count of orders. Therefore, you can rewrite the query to avoid selecting unnecessary columns. This can reduce the amount of data that needs to be processed.
Aggregating with INNER JOIN: If the database allows it, you can use an INNER JOIN instead of a LEFT JOIN when you're only interested in customers who have placed orders. This can potentially optimize the query.
Regular Database Maintenance: Perform regular database maintenance tasks like index reorganization and statistics updates to ensure the database is running efficiently.
Query Profiling: Use query profiling tools to analyze the execution plan of the query and identify performance bottlenecks. This can help you see where the query might be consuming the most resources and where you can make further improvements.
Optimizing this query involves database-specific considerations, but these general steps can help you get started. The goal is to reduce the amount of data processed and to make efficient use of indexes and joins.
As with any optimization scenario, effective query optimization may involve trade-offs depending on your specific use case and database system.
CustomerName | TotalOrders |
---|---|
John Smith | 3 |
Mary Johnson | 2 |
David Wilson | 3 |
Sarah Brown | 3 |
James Lee | 4 |
Linda Davis | 2 |
Richard White | 5 |
Susan Miller | 3 |
Michael Taylor | 3 |
Karen Anderson | 3 |
Recommended course: Advanced SQL for Query Tuning and Performance Optimization with Dan Sullivan
Problem #4: Retrieving top-selling books by sales quantity
You are developing a web-based e-commerce application for a bookstore. You need to write an SQL query to retrieve a list of the top-selling books from the database.
Each book's sales data is stored in the Books and Sales tables. The Books table contains information about the books, including BookID, Title, Author, and Price. The Sales table contains information about sales, including SaleID, BookID, Quantity, and SaleDate.
Write an SQL query to retrieve the top 10 best-selling books by the total number of copies sold, along with their titles, authors, and total sales quantity. Ensure that the list is sorted in descending order of sales quantity.
Data set
Books Table:
BookID | Title | Author | Price |
---|---|---|---|
1 | "The Adventure" | "John Smith" | 20.00 |
2 | "Mystery Unveiled" | "Emily Davis" | 25.00 |
3 | "Secrets of Nature" | "Michael Clark" | 18.00 |
4 | "Tech Titans" | "Sophia Brown" | 30.00 |
5 | "Love in Paris" | "David Wilson" | 22.00 |
6 | "The Lost Key" | "Lucy Turner" | 28.00 |
7 | "Time Travelers" | "Andrew Green" | 24.00 |
8 | "Beyond the Stars" | "Olivia White" | 21.00 |
9 | "The Enigma" | "Henry Adams" | 27.00 |
10 | "Hidden Treasures" | "Linda Hall" | 19.00 |
11 | "The Puzzle Box" | "George Parker" | 23.00 |
12 | "Tales of Wonder" | "Emma Lewis" | 26.00 |
13 | "Infinite Realms" | "Benjamin Scott" | 30.00 |
14 | "The Sapphire Crown" | "Alice King" | 22.00 |
15 | "The Quantum Paradox" | "William Reed" | 25.00 |
16 | "Dreams of Tomorrow" | "Sophie Turner" | 20.00 |
17 | "The Secret Garden" | "Daniel White" | 18.00 |
18 | "City of Miracles" | "Sarah Brown" | 29.00 |
19 | "The Celestial Code" | "Michael Harris" | 27.00 |
20 | "Echoes of Eternity" | "Ella Parker" | 24.00 |
Solution
You can use the following SQL query to solve this problem:
This query:
- Joins the Books and Sales tables on the BookID column to associate each sale with the corresponding book.
- Groups the results by book title and author using the GROUP BY clause.
- Calculates the total sales quantity for each book using the SUM function.
- Orders the results in descending order of total sales quantity using the ORDER BY clause.
- Limits the result set to the top 10 records using the LIMIT clause.
The query will provide you with a list of the top-selling books in descending order of sales quantity, along with their titles, authors, and total sales quantity.
SaleID | BookID | Quantity | SaleDate |
---|---|---|---|
116 | 16 | 25 | 2023-02-09 |
113 | 13 | 23 | 2023-02-04 |
108 | 8 | 22 | 2023-01-27 |
118 | 18 | 21 | 2023-02-13 |
104 | 4 | 20 | 2023-01-20 |
112 | 12 | 19 | 2023-02-02 |
106 | 6 | 18 | 2023-01-23 |
115 | 15 | 16 | 2023-02-07 |
102 | 2 | 15 | 2023-01-16 |
110 | 10 | 14 | 2023-01-30 |
Recommended course: Advanced SQL for Application Development with Dan Sullivan
Problem #5: Analyzing customer transactions for e-commerce insights
You are a data scientist working with a dataset of customer transactions for an e-commerce platform. The dataset contains two tables: Customers and Transactions. The Customers table contains information about customers, including CustomerID, CustomerName, and Country. The Transactions table contains information about individual transactions, including TransactionID, CustomerID (which links to the CustomerID in the Customers table), TransactionDate, and Amount.
Your task is to write an SQL query to analyze the data and find insights regarding customer behavior. Specifically, you need to find the following information:
- The total number of transactions in the dataset.
- The total revenue generated from all transactions.
- The average transaction amount.
- The country with the highest total revenue from transactions.
- The customer who has made the highest number of transactions.
Write an SQL query that retrieves this information from the Customers and Transactions tables.
Data set
Customers Table:
CustomerID | CustomerName | Country |
---|---|---|
101 | Alice | USA |
102 | Bob | Canada |
103 | Carol | USA |
104 | Dave | UK |
105 | Eve | Canada |
Transactions Table:
OrderID | CustomerID | OrderDate | Amount |
---|---|---|---|
101 | 1 | 2023-01-15 | 50.00 |
102 | 2 | 2023-01-16 | 30.00 |
103 | 3 | 2023-01-18 | 75.00 |
104 | 1 | 2023-01-20 | 45.00 |
105 | 4 | 2023-01-22 | 60.00 |
106 | 5 | 2023-01-23 | 40.00 |
107 | 1 | 2023-01-25 | 55.00 |
108 | 2 | 2023-01-27 | 35.00 |
109 | 3 | 2023-01-29 | 80.00 |
110 | 1 | 2023-01-30 | 70.00 |
Solution
You can use the following SQL query to obtain the desired insights:
This query:
- Counts the total number of transactions in the dataset.
- Calculates the total revenue from all transactions.
- Computes the average transaction amount.
- Identifies the country with the highest total revenue from transactions.
- Determines the customer who has made the highest number of transactions.
These insights can help data scientists better understand customer behavior and the financial performance of the e-commerce platform.
Metric | Result |
---|---|
Total number of transactions | 10 |
Total revenue generated | $540.00 |
Average transaction amount | $54.00 |
Country with the highest revenue | USA ($375.00) |
Customer with the most transactions | Alice (4 transactions) |
Recommended course: Advanced SQL for Data Scientists with Dan Sullivan