Answering Business Questions Using SQL

In this project walkthrough, we’ll explore how to use SQL for data analysis from a digital music store and answer critical business questions. By working with the Chinook database—a sample database that represents a digital media store similar to iTunes—we’ll demonstrate how SQL can drive data-informed decision-making in a business context.

The Chinook database contains information about artists, albums, tracks, customers, and sales data. Through strategic SQL queries, we’ll help the business understand its market trends, evaluate employee performance, and identify growth opportunities. This project showcases real-world SQL applications that data analysts encounter daily.

We’ll take you through writing increasingly complex SQL queries, from basic exploratory analysis to advanced queries using Common Table Expressions (CTEs) and subqueries.

What You’ll Learn

By the end of this tutorial, you’ll know how to:

Navigate complex relational database schemas with multiple tables
Write SQL queries using joins to connect data across multiple tables
Use Common Table Expressions (CTEs) to organize complex queries
Apply subqueries to calculate percentages and comparative metrics
Analyze business data to provide actionable insights
Connect SQL queries to Python in Jupyter

Before You Start: Pre-Instruction

To make the most of this project walkthrough, follow these preparatory steps:

Review the Project
Prepare Your Environment
- If you’re using the Dataquest platform, everything is already set up for you
- If you’re working locally, you’ll need:
Get Comfortable with SQL Fundamentals
- You should be familiar with basic SQL keywords: SELECT, FROM, GROUP BY, and JOIN
- Some experience with CTEs and subqueries will be helpful, but not required
- New to Markdown? We recommend learning the basics: Markdown Guide

Setting Up Your Environment

Before we get into our analysis, let’s set up our Jupyter environment to work with SQL. We’ll use some SQL magic commands that allow us to write SQL directly in Jupyter cells.

%%capture
%load_ext sql
%sql sqlite:///chinook.db

Learning Insight: The %%capture magic command suppresses any output messages from the cell, keeping our notebook clean. The %load_ext sql command loads the SQL extension, and %sql sqlite:///chinook.db connects us to our database.

Now let’s verify our connection and explore what tables are available in our database:

%%sql

SELECT name 
FROM sqlite_master 
WHERE type='table';

This special SQLite query shows us all the table names in our database. The Chinook database contains 11 tables representing different aspects of a digital music store:

album: Album details
artist: Artist information
customer: Customer information with assigned support representatives
employee: Store employees, including sales support agents
genre: Music genres
invoice: Sales transactions
invoice_line: Individual items within each invoice
media_type: Format types (MP3, AAC, etc.)
playlist: Curated playlists
playlist_track: Tracks within each playlist
track: Song information

Understanding the Database Schema

Working with relational databases means understanding how tables connect to each other. The Chinook database uses primary and foreign keys to establish these relationships. Here’s a simplified view of the key relationships between the tables we’ll be working with:

chinook-schema

customer is linked to employee through support_rep_id
invoice is linked to customer through customer_id
invoice_line is linked to invoice through invoice_id
track is linked to album, invoice_line, and genre through album_id, track_id, and genre_id, respectively

Let’s preview some of our key tables to understand the data we’re working with:

%%sql

SELECT * 
FROM track 
LIMIT 5;

track_id	name	album_id	media_type_id	genre_id	composer	milliseconds	bytes	unit_price
1	For Those About To Rock (We Salute You)	1	1	1	Angus Young, Malcolm Young, Brian Johnson	343719	11170334	0.99
2	Balls to the Wall	2	2	1	None	342562	5510424	0.99
3	Fast As a Shark	3	2	1	F. Baltes, S. Kaufman, U. Dirkscneider & W. Hoffman	230619	3990994	0.99
4	Restless and Wild	3	2	1	F. Baltes, R.A. Smith-Diesel, S. Kaufman, U. Dirkscneider & W. Hoffman	252051	4331779	0.99
5	Princess of the Dawn	3	2	1	Deaffy & R.A. Smith-Diesel	375418	6290521	0.99

%%sql

SELECT * 
FROM invoice_line 
LIMIT 5;

invoice_line_id	invoice_id	track_id	unit_price	quantity
1	1	1158	0.99	1
2	1	1159	0.99	1
3	1	1160	0.99	1
4	1	1161	0.99	1
5	1	1162	0.99	1

Learning Insight: When working with a new database, always preview your tables with LIMIT to understand the data structure before writing complex queries. This helps you identify column names, data types, and potential relationships without flooding your output with hundreds of rows.

Business Question 1: Which Music Genres Should We Focus on in the USA?

The Chinook store wants to understand which music genres are most popular in the United States market. This information will help them decide which new albums to add to their catalog. Let’s build a query to analyze genre popularity by sales.

Building Our Analysis with a CTE

We’ll use a Common Table Expression (CTE) to create a temporary result set that combines data from multiple tables:

%%sql

WITH genre_usa_tracks AS (
    SELECT
        il.invoice_line_id,
        g.name AS genre,
        t.track_id,
        i.billing_country AS country
    FROM track t
    JOIN genre g ON t.genre_id = g.genre_id
    JOIN invoice_line il ON t.track_id = il.track_id
    JOIN invoice i ON il.invoice_id = i.invoice_id
    WHERE i.billing_country = 'USA'
)
SELECT
    genre,
    COUNT(*) AS tracks_sold,
    COUNT(*) * 100.0 / (SELECT COUNT(*) FROM genre_usa_tracks) AS percentage
FROM genre_usa_tracks
GROUP BY genre
ORDER BY tracks_sold DESC;

genre	tracks_sold	percentage
Rock	561	53.37773549000951
Alternative & Punk	130	12.369172216936251
Metal	124	11.798287345385347
R&B/Soul	53	5.042816365366318
Blues	36	3.4253092293054235
Alternative	35	3.330161750713606
Latin	22	2.093244529019981
Pop	22	2.093244529019981
Hip Hop/Rap	20	1.9029495718363463
Jazz	14	1.3320647002854424
Easy Listening	13	1.236917221693625
Reggae	6	0.570884871550904
Electronica/Dance	5	0.47573739295908657
Classical	4	0.38058991436726924
Heavy Metal	3	0.285442435775452
Soundtrack	2	0.19029495718363462
TV Shows	1	0.09514747859181731

Learning Insight: CTEs make complex queries more readable by breaking them into logical steps. Here, we first create a filtered dataset of USA purchases, then analyze it. The 100.0 in our percentage calculation ensures we get decimal results instead of integer division.

Our results show that Rock music dominates the USA market with over 50% of sales, followed by Latin, Metal, and Alternative & Punk. This suggests the store should prioritize these genres when selecting new inventory.

Key Insights from Genre Analysis

Rock dominates: With 561 tracks sold (53.4%), Rock is by far the most popular genre
Latin music surprise: The second most popular genre is Latin (10.3%), indicating a significant market segment
Long tail effect: Many genres have very small percentages, suggesting niche markets

Business Question 2: Analyzing Employee Sales Performance

The company wants to evaluate its sales support agents’ performance to identify top performers and areas for improvement. Let’s analyze which employees generate the most revenue.

%%sql

SELECT
    e.first_name || ' ' || e.last_name AS employee_name,
    e.hire_date,
    COUNT(DISTINCT c.customer_id) AS customer_count,
    SUM(i.total) AS total_sales_dollars,
    SUM(i.total) / COUNT(DISTINCT c.customer_id) AS avg_dollars_per_customer
FROM customer c
JOIN invoice i ON c.customer_id = i.customer_id
JOIN employee e ON c.support_rep_id = e.employee_id
GROUP BY e.employee_id, e.hire_date
ORDER BY total_sales_dollars DESC;

employee_name	hire_date	customer_count	total_sales_dollars	avg_dollars_per_customer
Jane Peacock	2017-04-01 00:00:00	21	1731.5100000000039	82.45285714285733
Margaret Park	2017-05-03 00:00:00	20	1584.0000000000034	79.20000000000017
Steve Johnson	2017-10-17 00:00:00	18	1393.920000000002	77.44000000000011

Learning Insight: When using GROUP BY with aggregate functions, remember to include all non-aggregated columns in your GROUP BY clause. This is required in most SQL flavors (though SQLite is more forgiving). The || operator concatenates strings in SQLite.

Performance Analysis Results

Our analysis reveals interesting patterns:

Jane Peacock leads with the highest average dollars per customer, despite not having the most customers
Margaret Park’s performance is solid, with metrics close to Jane’s, suggesting a consistent level of customer value delivery
Steve Johnson, the newest employee, shows promising performance with metrics similar to more experienced staff

Business Question 3: Combining SQL with Python for Visualization

While SQL excels at data retrieval and transformation, combining it with Python enables powerful visualizations. Let’s demonstrate how to pass SQL query results to Python:

import pandas as pd

# Store our query as a string
query = """
SELECT
    genre,
    COUNT(*) AS tracks_sold
FROM genre_usa_tracks
GROUP BY genre
ORDER BY tracks_sold DESC
LIMIT 10;
"""

# Execute the query and store results
result = %sql \$query

# Convert to pandas DataFrame
df = result.DataFrame()

Learning Insight: The %sql inline magic (single percent sign) allows us to execute SQL and capture the results in Python. The dollar sign syntax (\$query) lets us reference Python variables within SQL magic commands.

Challenges and Considerations

During our analysis, we encountered several important SQL concepts worth highlighting:

1. Integer Division Pitfall

When calculating percentages, SQL performs integer division by default:

-- This returns 0 for all percentages
SELECT COUNT(*) / (SELECT COUNT(*) FROM table) AS percentage

-- This returns proper decimals
SELECT COUNT(*) * 100.0 / (SELECT COUNT(*) FROM table) AS percentage

2. JOIN Selection Matters

We used INNER JOIN throughout because we only wanted records that exist in all related tables. If we needed to include customers without invoices, we would use LEFT JOIN instead.

3. Subquery Performance

Our percentage calculation uses a subquery that executes for each row. For larger datasets, consider using window functions or pre-calculating totals in a CTE.

Sharing Your Work with GitHub Gists

GitHub Gists provide an excellent way to share your SQL projects without the complexity of full repositories. Here’s how to share your work:

Navigate to gist.github.com
Create a new gist
Name your file with the .ipynb extension for Jupyter notebooks or .sql for SQL scripts
Paste your code and create either a public or secret gist

Gists automatically render Jupyter notebooks with all outputs preserved, making them perfect for sharing analysis results with stakeholders or including in your portfolio of projects.

Summary of Analysis

In this project, we’ve demonstrated how SQL can answer critical business questions for a digital music store:

Genre Analysis: We identified Rock as the dominant genre in the USA market (53.4%), with Latin music as a surprising second place
Employee Performance: We evaluated sales representatives, discovering that Jane Peacock leads in average revenue per customer
Technical Skills: We applied CTEs, subqueries, multiple joins, and aggregate functions to solve real business problems

These insights enable data-driven decisions about inventory management, employee training, and market strategy.

Next Steps

To extend this analysis and deepen your SQL skills, consider these challenges:

Time-based Analysis: How do sales trends change over time? Add date filtering to identify seasonal patterns
Customer Segmentation: Which customers are the most valuable? Create customer segments based on purchase behavior
Product Recommendations: Which tracks are commonly purchased together? Use self-joins to find associations
International Markets: Expand the genre analysis to compare preferences across different countries

If you’re new to SQL and found this project challenging, start with our SQL Fundamentals skill path to build the foundational skills needed for complex analysis. The course covers essential topics like joins, aggregations, and subqueries that we’ve used throughout this project.

Happy querying!

Source link

Education & Learning

Answering Business Questions Using SQL

What You’ll Learn

Before You Start: Pre-Instruction

Setting Up Your Environment

Understanding the Database Schema

Business Question 1: Which Music Genres Should We Focus on in the USA?

Building Our Analysis with a CTE

Key Insights from Genre Analysis

Business Question 2: Analyzing Employee Sales Performance

Performance Analysis Results

Business Question 3: Combining SQL with Python for Visualization

Challenges and Considerations

1. Integer Division Pitfall

2. JOIN Selection Matters

3. Subquery Performance

Sharing Your Work with GitHub Gists

Summary of Analysis

Next Steps

Pay Equity and the US Women's Soccer Team

The battle of balancing affordability, autonomy, and accountability, ET Education

Leave A Reply Cancel reply

Subscribe our Newsletter

Contact Us

The Asha Modern School

Links

Recommend

Education & Learning

What You’ll Learn

Before You Start: Pre-Instruction

Setting Up Your Environment

Understanding the Database Schema

Business Question 1: Which Music Genres Should We Focus on in the USA?

Building Our Analysis with a CTE

Key Insights from Genre Analysis

Business Question 2: Analyzing Employee Sales Performance

Performance Analysis Results

Business Question 3: Combining SQL with Python for Visualization

Challenges and Considerations

1. Integer Division Pitfall

2. JOIN Selection Matters

3. Subquery Performance

Sharing Your Work with GitHub Gists

Summary of Analysis

Next Steps

Pay Equity and the US Women's Soccer Team

The battle of balancing affordability, autonomy, and accountability, ET Education

You may also like

Focus On What’s Wrong: Designing Like You Mean It

Satisfaction Score 4.67 Is Pretty Good, Right? Right?

Q&A With Dr. RK Prasad: Exploring Classroom Training’s Comeback

Leave A Reply Cancel reply

Subscribe our Newsletter

Contact Us

The Asha Modern School

Links

Recommend

Login with your site account

Register a new account