Description
I delved into the Forbes List of February 2023, conducting comprehensive Exploratory Data Analysis (EDA) to extract valuable insights. These insights were subsequently transformed into a dynamic and interactive dashboard, providing an engaging visualization of the data.
Skills Applied: SQL, Excel, Tableau
Implementation
- I proficiently employed a diverse range of SQL functions, including SUM, MAX, MIN, and COUNT, to gain insights during the data exploration process. Additionally, I utilized advanced techniques such as Common Table Expressions (CTE) to enhance the analysis and generate comprehensive results.
- Utilize Excel for further data cleansing to improve the usability and quality of the information.
- Utilizing Tableau, I created a dynamic dashboard that simplifies the analysis breakdown, making it easier to comprehend and explore the insights.
Breakdown
SQL
Exploratory Data Analysis
- To kickstart my project, I commenced by importing the dataset into SQL and devising analytical queries.
- Following that, I proceeded with my exploratory data analysis (EDA) by utilizing various aggregate functions such as MAX, MIN, SUM, and COUNT to address several inquiries.
- Additionally, I performed data cleaning tasks to facilitate the transformation of the data into a more meaningful format that would assist in my analysis. For instance, I introduced a new column called "Source_First_Name."
- Please refer to the code section below (double-click) to access the complete EDA process and the corresponding list of questions.
-- Dataset
select *
from [Forbes List]..Forbes
order by rank
-- Exploratory Data Analysis
-- What is the average, max, min age of these people in Forbes List?
select max(Age) as Highest_Age, min(Age) as Lowest_Age, cast(avg(Age) as int) as Average_Age
from [Forbes List]..Forbes
-- Who is below the average age?
select *
from [Forbes List]..Forbes
where age <= (select avg(age) from [Forbes List]..Forbes)
order by rank
-- I'm 24 y/o at the point of writing this SQL code, how many people below or equal to my age are in this list?
select count(Age) as Count_People_My_Age
from [Forbes List]..Forbes
where Age <= 24
-- What is the lowest Net Worth and the average of the net worth on the list?
-- First we need to make sure the Net Worth column is converted to FLOAT
WITH Net as (
select cast(REPLACE(REPLACE([Net Worth], '$', ''), ' B', '') as float) as Net_Worth_F
from [Forbes List]..Forbes
)
-- Now we can find the Lowest Net Worth and Average Net Worth using 'Net_Worth_F'
select CONCAT('$', round(avg(Net_Worth_F), 2), ' B') as Average_Net_Worth, CONCAT('$', min(Net_Worth_F), ' B') as Minimum_Net_Worth
from Net
-- How many people are above average? ($4.76 B)
WITH Net as (
select cast(REPLACE(REPLACE([Net Worth], '$', ''), ' B', '') as float) as Net_Worth_F
from [Forbes List]..Forbes
)
select count(Net_Worth_F) as Count_Above_Average
from Net
where Net_Worth_F > (select avg(Net_Worth_F) from Net)
-- What countries do these individuals come from?
select DISTINCT(Country)
from [Forbes List]..Forbes
-- How many individuals come from their respective country?
select DISTINCT(Country), count(Country) as Count_Individuals
from [Forbes List]..Forbes
group by Country
order by count(Country) desc
-- What is the average age for the respective country to be on Forbes List?
select DISTINCT(Country), count(Country) as Count_Individuals, cast(avg(Age) as int) as Average_Age
from [Forbes List]..Forbes
group by Country
order by count(Country) desc
-- Sources in Forbes List and most populour source
-- The original Source column has too many different data, let's create a new column to ease my findings
select DISTINCT(Source),
case
WHEN CHARINDEX(',', Source) > 0 THEN SUBSTRING(Source, 1, CHARINDEX(',', Source) - 1)
WHEN CHARINDEX(' ', Source) > 0 THEN SUBSTRING(Source, 1, CHARINDEX(' ', Source) - 1)
ELSE Source
end as First_Word
from [Forbes List]..Forbes
group by Source;
-- Now I want to group by 'First_Word' distinctively, and do a count to find out which is the most popular
select First_Word, count(First_Word) as Source_Count
from (
select DISTINCT Source,
case
WHEN CHARINDEX(',', Source) > 0 THEN SUBSTRING(Source, 1, CHARINDEX(',', Source) - 1)
WHEN CHARINDEX(' ', Source) > 0 THEN SUBSTRING(Source, 1, CHARINDEX(' ', Source) - 1)
ELSE Source
end as First_Word
from [Forbes List]..Forbes
group by Source
) as subquery
group by First_Word
Order by count(First_Word) desc
-- Now I want to add this new column to my existing Forbes table.
ALTER TABLE [Forbes List]..Forbes ADD Source_First_Word NVARCHAR(255)
UPDATE [Forbes List]..Forbes
SET Source_First_Word = case
WHEN CHARINDEX(',', Source) > 0 THEN SUBSTRING(Source, 1, CHARINDEX(',', Source) - 1)
WHEN CHARINDEX(' ', Source) > 0 THEN SUBSTRING(Source, 1, CHARINDEX(' ', Source) - 1)
ELSE Source
end
-- What is the most popular industry?
select DISTINCT(Industry), count(Industry) as Industry_Count
from [Forbes List]..Forbes
group by Industry
order by count(Industry) desc
-- Which industry has the highest average net worth?
WITH Net as (
select Industry, count(Industry) as Industry_Count, round(avg(cast(REPLACE(REPLACE([Net Worth], '$', ''), ' B', '') as float)), 2) as Average_Net_Worth_F
from [Forbes List]..Forbes
group by Industry
)
select Industry, Industry_Count, CONCAT('$', Average_Net_Worth_F, ' B') as Average_Net_Worth
from Net
order by Average_Net_Worth desc
Excel
Data Cleaning
- I have chosen to utilize Excel for additional data cleaning in order to enhance the usability and quality of the information.
- Firstly, I imported my SQL dataset into Excel by carrying out the following steps: Get Data -> From Database -> From SQL Server Database
- I observed that the "Source" column contains multiple instances of duplicates with variations in spelling, such as additional letters or spelling errors.
- Subsequently, I employed the REPLACE function to perform the required cleaning operations on the "Source" column until all entries became unique by resolving variations in spelling and eliminating duplicates.
Tableau
Create Dashboard
- To enhance the accessibility and ease of reference for all the information, I utilized Tableau to develop an interactive dashboard.
- In order to maximize dashboard interactivity, I integrated dynamic filters into relevant charts.
- By leveraging parameters and calculated fields, users are empowered to actively explore and refine the presentation of data according to their individual requirements, thereby enhancing their overall interactive experience.
** Please note that there is a display error with the parameter shown in the picture ("Select Country"). To view the correct display, kindly visit my Tableau profile.