🎵 Spotify Data Analysis - SQL Project

A SQL-based analysis of Spotify music data, combining exploratory data analysis (EDA) with queries grouped by difficulty level (Easy, Medium, and Advanced).

📋 Table of Contents

Project Overview
Features
Database Schema
Project Structure
Technologies Used
Getting Started
Query Categories
Analysis Highlights
Dataset

🎯 Project Overview

This project demonstrates SQL skills by analyzing Spotify music data including:

Audio features (danceability, energy, tempo, etc.)
Streaming statistics (Spotify vs YouTube)
Artist and album analytics
Track performance metrics

The analysis is organized into three difficulty levels to showcase progressive SQL techniques from basic queries to advanced window functions.

✨ Features

Exploratory Data Analysis (EDA): Initial data exploration and cleaning
Easy Queries: Basic SELECT, WHERE, GROUP BY operations
Medium Queries: Aggregations, conditional logic, and joins
Advanced Queries: Window functions, CTEs, subqueries, and complex analytics
Data Cleaning: Removal of invalid records (zero-duration tracks)
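
The data cleaning step listed above can be sketched as follows. This is a minimal PostgreSQL sketch, assuming the spotify table and the duration_min column described in the Database Schema section; the exact statements in spotify_data_analysis.sql may differ.

```sql
-- Count rows before cleaning (a basic EDA check).
SELECT COUNT(*) AS total_rows
FROM spotify;

-- Inspect the records that would be removed.
SELECT artist, track, duration_min
FROM spotify
WHERE duration_min = 0;

-- Remove invalid records: tracks with zero duration.
DELETE FROM spotify
WHERE duration_min = 0;
```

Running the SELECT before the DELETE makes it easy to confirm which rows are affected before anything is removed.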

🗄️ Database Schema

The spotify table contains the following columns:

Column	Type	Description
`artist`	VARCHAR(255)	Artist name
`track`	VARCHAR(255)	Track name
`album`	VARCHAR(255)	Album name
`album_type`	VARCHAR(50)	Type of album (album, single, etc.)
`danceability`	FLOAT	Danceability score (0-1)
`energy`	FLOAT	Energy score (0-1)
`loudness`	FLOAT	Loudness in dB
`speechiness`	FLOAT	Speechiness score (0-1)
`acousticness`	FLOAT	Acousticness score (0-1)
`instrumentalness`	FLOAT	Instrumentalness score (0-1)
`liveness`	FLOAT	Liveness score (0-1)
`valence`	FLOAT	Valence (positivity) score (0-1)
`tempo`	FLOAT	Tempo in BPM
`duration_min`	FLOAT	Track duration in minutes
`title`	VARCHAR(255)	Video title
`channel`	VARCHAR(255)	YouTube channel name
`views`	FLOAT	Number of views
`likes`	BIGINT	Number of likes
`comments`	BIGINT	Number of comments
`licensed`	BOOLEAN	Licensed content flag
`official_video`	BOOLEAN	Official video flag
`stream`	BIGINT	Total streams
`energy_liveness`	FLOAT	Energy to liveness ratio
`most_played_on`	VARCHAR(50)	Primary platform (Spotify/YouTube)
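
For reference, the column list above can be written out as a table definition. This is a sketch transcribed from the table in this README, not a copy of the script; the CREATE TABLE statement in spotify_data_analysis.sql remains the authoritative version.

```sql
-- Table definition transcribed from the schema table in this README.
CREATE TABLE spotify (
    artist            VARCHAR(255),  -- artist name
    track             VARCHAR(255),  -- track name
    album             VARCHAR(255),  -- album name
    album_type        VARCHAR(50),   -- album, single, etc.
    danceability      FLOAT,         -- 0-1
    energy            FLOAT,         -- 0-1
    loudness          FLOAT,         -- dB
    speechiness       FLOAT,         -- 0-1
    acousticness      FLOAT,         -- 0-1
    instrumentalness  FLOAT,         -- 0-1
    liveness          FLOAT,         -- 0-1
    valence           FLOAT,         -- 0-1
    tempo             FLOAT,         -- BPM
    duration_min      FLOAT,         -- minutes
    title             VARCHAR(255),  -- video title
    channel           VARCHAR(255),  -- YouTube channel name
    views             FLOAT,
    likes             BIGINT,
    comments          BIGINT,
    licensed          BOOLEAN,
    official_video    BOOLEAN,
    stream            BIGINT,        -- total streams
    energy_liveness   FLOAT,         -- energy to liveness ratio
    most_played_on    VARCHAR(50)    -- primary platform
);
```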

📁 Project Structure

Spotify_Data_Analysis_SQL/
│
├── spotify_data_analysis.sql    # Main SQL script with all queries
├── cleaned_data.csv             # Dataset file
├── spotify-piano-and-headphones-psvd4nwl5u4foka8.jpg  # Project image
└── README.md                     # Project documentation

🛠️ Technologies Used

SQL (PostgreSQL/MySQL compatible)
CSV data format
SQL features used:
- Basic SQL operations (SELECT, WHERE, GROUP BY, ORDER BY)
- Aggregate functions (COUNT, SUM, AVG, MAX, MIN)
- Window functions (DENSE_RANK, SUM OVER)
- Common Table Expressions (CTEs)
- Subqueries
- Conditional logic (CASE statements)
- Data type handling

🚀 Getting Started

Prerequisites

SQL database system (PostgreSQL)
CSV file import capability
SQL client/interface (pgAdmin)

Installation Steps

Clone the repository

git clone https://github.com/Sd7070/Spotify_Data_Analysis_SQL.git
cd Spotify_Data_Analysis_SQL

Import the dataset
- Create the spotify table using the CREATE TABLE statement in spotify_data_analysis.sql
- Load cleaned_data.csv into the spotify table

Run the SQL queries
- Open spotify_data_analysis.sql in your SQL client
- Execute the queries section by section, or run the entire script

Example: Creating Table and Importing Data

-- Create the table (from spotify_data_analysis.sql)
CREATE TABLE spotify (
    artist VARCHAR(255),
    track VARCHAR(255),
    -- ... (see full schema in SQL file)
);
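
Once the table exists, the CSV can be loaded. The following is a hedged sketch for PostgreSQL: the file path is a placeholder, and it assumes the columns in cleaned_data.csv appear in the same order as the table columns.

```sql
-- Server-side load: the file must be readable by the PostgreSQL server process.
-- '/path/to/cleaned_data.csv' is a placeholder, not a real path.
COPY spotify
FROM '/path/to/cleaned_data.csv'
WITH (FORMAT csv, HEADER true);

-- Quick sanity check after loading.
SELECT COUNT(*) AS loaded_rows
FROM spotify;
```

If the file lives on the client machine rather than the server, psql's `\copy spotify FROM 'cleaned_data.csv' WITH (FORMAT csv, HEADER true)` reads it client-side instead. pgAdmin's import dialog is another option.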

📊 Query Categories

🔵 Easy Category (5 Queries)

High-Streaming Tracks: Find tracks with more than 1 billion streams
Album-Artist Mapping: List all albums with their respective artists
Licensed Content Comments: Calculate total comments for licensed tracks
Single Albums: Find all tracks belonging to single-type albums
Artist Track Count: Count total tracks by each artist

Skills Demonstrated: Basic SELECT, WHERE, GROUP BY, aggregate functions
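
To give a flavor of this category, here is a sketch of three of the queries above. It is written against the schema in this README and may not match the script's wording exactly.

```sql
-- High-Streaming Tracks: more than 1 billion streams.
SELECT artist, track, stream
FROM spotify
WHERE stream > 1000000000
ORDER BY stream DESC;

-- Licensed Content Comments: total comments across licensed tracks.
SELECT SUM(comments) AS total_comments
FROM spotify
WHERE licensed = TRUE;

-- Artist Track Count: number of tracks per artist.
SELECT artist, COUNT(*) AS total_tracks
FROM spotify
GROUP BY artist
ORDER BY total_tracks DESC;
```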

🟡 Medium Category (5 Queries)

Album Danceability: Calculate average danceability per album
Top Energy Tracks: Find top 5 tracks with highest energy values
Official Video Metrics: List tracks with views and likes for official videos
Album Views: Calculate total views for tracks in each album
Platform Comparison: Find tracks streamed more on Spotify than YouTube

Skills Demonstrated: Aggregations, GROUP BY with multiple columns, conditional aggregation (CASE statements)
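
A sketch of how some of these queries might look follows. The platform comparison is one possible reading of the task using conditional aggregation, and the literals 'Spotify' and 'YouTube' are assumptions that must match the values actually stored in most_played_on.

```sql
-- Album Danceability: average danceability per album.
SELECT album, AVG(danceability) AS avg_danceability
FROM spotify
GROUP BY album
ORDER BY avg_danceability DESC;

-- Top Energy Tracks: five tracks with the highest energy values.
SELECT track, energy
FROM spotify
WHERE energy IS NOT NULL
ORDER BY energy DESC
LIMIT 5;

-- Platform Comparison: tracks streamed more on Spotify than on YouTube.
-- Assumption: most_played_on holds 'Spotify' or 'YouTube' exactly as written here.
SELECT track, streamed_on_spotify, streamed_on_youtube
FROM (
    SELECT
        track,
        COALESCE(SUM(CASE WHEN most_played_on = 'Spotify' THEN stream END), 0) AS streamed_on_spotify,
        COALESCE(SUM(CASE WHEN most_played_on = 'YouTube' THEN stream END), 0) AS streamed_on_youtube
    FROM spotify
    GROUP BY track
) AS per_track
WHERE streamed_on_spotify > streamed_on_youtube
  AND streamed_on_youtube <> 0;
```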

🔴 Advanced Category (5 Queries)

Top Tracks per Artist: Find top 3 most-viewed tracks for each artist using window functions
Above-Average Liveness: Find tracks with liveness above the dataset average using subqueries
Energy Range Analysis: Calculate energy difference (max-min) per album using CTEs
Energy-Liveness Ratio: Find tracks where energy-to-liveness ratio > 1.2
Cumulative Likes: Calculate cumulative sum of likes ordered by views using window functions

Skills Demonstrated: Window functions (DENSE_RANK, SUM OVER), CTEs (WITH clauses), subqueries, advanced analytics
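
The techniques named above can be sketched as follows. These are illustrative queries against the README's schema, assuming PostgreSQL (or MySQL 8.0 and later); the descending sort in the cumulative query is an assumption, since the task only says "ordered by views".

```sql
-- Top Tracks per Artist: top 3 most-viewed tracks using DENSE_RANK.
WITH ranked AS (
    SELECT
        artist,
        track,
        SUM(views) AS total_views,
        DENSE_RANK() OVER (
            PARTITION BY artist
            ORDER BY SUM(views) DESC
        ) AS view_rank
    FROM spotify
    GROUP BY artist, track
)
SELECT artist, track, total_views, view_rank
FROM ranked
WHERE view_rank <= 3
ORDER BY artist, view_rank;

-- Above-Average Liveness: scalar subquery computes the dataset average.
SELECT track, artist, liveness
FROM spotify
WHERE liveness > (SELECT AVG(liveness) FROM spotify);

-- Energy Range Analysis: max-min energy per album using a CTE.
WITH album_energy AS (
    SELECT album,
           MAX(energy) AS highest_energy,
           MIN(energy) AS lowest_energy
    FROM spotify
    GROUP BY album
)
SELECT album, highest_energy - lowest_energy AS energy_diff
FROM album_energy
ORDER BY energy_diff DESC;

-- Cumulative Likes: running total of likes, ordered by views.
-- The explicit ROWS frame adds one row at a time, even when views tie.
SELECT track, views, likes,
       SUM(likes) OVER (
           ORDER BY views DESC
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS cumulative_likes
FROM spotify;
```

Because DENSE_RANK assigns the same rank to ties, an artist can return more than three rows when several tracks share a view count.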

📈 Analysis Highlights

Data Exploration: Comprehensive EDA including data quality checks and cleaning
Performance Metrics: Analysis of streaming, views, likes, and comments
Audio Feature Analysis: Insights into musical characteristics (energy, danceability, liveness)
Platform Analytics: Comparison between Spotify and YouTube performance
Ranking Analysis: Top-performing tracks and artists using advanced ranking functions
Statistical Analysis: Average calculations, cumulative sums, and comparative metrics

📦 Dataset

File: cleaned_data.csv
Format: CSV with headers
Content: Spotify music data with audio features and streaming metrics
Data Quality: Pre-cleaned dataset (zero-duration tracks removed)

📝 Notes

Queries are written for PostgreSQL and use mostly standard SQL, so they should also run on MySQL and other databases with minor syntax adjustments
Window functions and CTEs require MySQL 8.0 or later
The dataset must be imported before running the analysis queries
Queries are grouped by difficulty, and each one can be run independently once the data is loaded

🤝 Contributing

Contributions, issues, and feature requests are welcome! Feel free to check the issues page.

📄 License

This project is open source and available for educational purposes.

⭐ If you find this project helpful, please consider giving it a star!
