Forbes List Project

Description

I delved into the Forbes List of February 2023, conducting comprehensive Exploratory Data Analysis (EDA) to extract valuable insights. These insights were subsequently transformed into a dynamic and interactive dashboard, providing an engaging visualization of the data.

Skills Applied: SQL, Excel, Tableau

Implementation

I proficiently employed a diverse range of SQL functions, including SUM, MAX, MIN, and COUNT, to gain insights during the data exploration process. Additionally, I utilized advanced techniques such as Common Table Expressions (CTE) to enhance the analysis and generate comprehensive results.
Utilize Excel for further data cleansing to improve the usability and quality of the information.
Utilizing Tableau, I created a dynamic dashboard that simplifies the analysis breakdown, making it easier to comprehend and explore the insights.

Breakdown

SQL

Exploratory Data Analysis

To kickstart my project, I commenced by importing the dataset into SQL and devising analytical queries.
Following that, I proceeded with my exploratory data analysis (EDA) by utilizing various aggregate functions such as MAX, MIN, SUM, and COUNT to address several inquiries.
Additionally, I performed data cleaning tasks to facilitate the transformation of the data into a more meaningful format that would assist in my analysis. For instance, I introduced a new column called "Source_First_Name."
Please refer to the code section below (double-click) to access the complete EDA process and the corresponding list of questions.

Complete EDA Code

-- Dataset
select *
from [Forbes List]..Forbes
order by rank

-- Exploratory Data Analysis

	-- What is the average, max, min age of these people in Forbes List?
		select max(Age) as Highest_Age, min(Age) as Lowest_Age, cast(avg(Age) as int) as Average_Age
		from [Forbes List]..Forbes


	-- Who is below the average age?
		select *
		from [Forbes List]..Forbes
		where age <= (select avg(age) from [Forbes List]..Forbes)
		order by rank


	-- I'm 24 y/o at the point of writing this SQL code, how many people below or equal to my age are in this list?
		select count(Age) as Count_People_My_Age
		from [Forbes List]..Forbes
		where Age <= 24


	-- What is the lowest Net Worth and the average of the net worth on the list?

		-- First we need to make sure the Net Worth column is converted to FLOAT
		WITH Net as (
		select cast(REPLACE(REPLACE([Net Worth], '$', ''), ' B', '') as float) as Net_Worth_F
		from [Forbes List]..Forbes
		)

		-- Now we can find the Lowest Net Worth and Average Net Worth using 'Net_Worth_F'
		select CONCAT('$', round(avg(Net_Worth_F), 2), ' B') as Average_Net_Worth, CONCAT('$', min(Net_Worth_F), ' B') as Minimum_Net_Worth
		from Net


	-- How many people are above average? ($4.76 B)
		WITH Net as (
		select cast(REPLACE(REPLACE([Net Worth], '$', ''), ' B', '') as float) as Net_Worth_F
		from [Forbes List]..Forbes
		)

		select count(Net_Worth_F) as Count_Above_Average
		from Net
		where Net_Worth_F > (select avg(Net_Worth_F) from Net)


	-- What countries do these individuals come from?
		select DISTINCT(Country)
		from [Forbes List]..Forbes
		

	-- How many individuals come from their respective country?
		select DISTINCT(Country), count(Country) as Count_Individuals
		from [Forbes List]..Forbes
		group by Country
		order by count(Country) desc


	-- What is the average age for the respective country to be on Forbes List?
		select DISTINCT(Country), count(Country) as Count_Individuals, cast(avg(Age) as int) as Average_Age 
		from [Forbes List]..Forbes
		group by Country
		order by count(Country) desc


	-- Sources in Forbes List and most populour source
		
		-- The original Source column has too many different data, let's create a new column to ease my findings
			select DISTINCT(Source),
			case
				WHEN CHARINDEX(',', Source) > 0 THEN SUBSTRING(Source, 1, CHARINDEX(',', Source) - 1)
				WHEN CHARINDEX(' ', Source) > 0 THEN SUBSTRING(Source, 1, CHARINDEX(' ', Source) - 1)			
				ELSE Source
			end as First_Word
			from [Forbes List]..Forbes
			group by Source;

		-- Now I want to group by 'First_Word' distinctively, and do a count to find out which is the most popular
			select First_Word, count(First_Word) as Source_Count
			from (
				select DISTINCT Source,
				case
					WHEN CHARINDEX(',', Source) > 0 THEN SUBSTRING(Source, 1, CHARINDEX(',', Source) - 1)
					WHEN CHARINDEX(' ', Source) > 0 THEN SUBSTRING(Source, 1, CHARINDEX(' ', Source) - 1)			
					ELSE Source
				end as First_Word
				from [Forbes List]..Forbes
				group by Source
			) as subquery
			group by First_Word
			Order by count(First_Word) desc

		-- Now I want to add this new column to my existing Forbes table.
			ALTER TABLE [Forbes List]..Forbes ADD Source_First_Word NVARCHAR(255)

			UPDATE [Forbes List]..Forbes
			SET Source_First_Word = case
							WHEN CHARINDEX(',', Source) > 0 THEN SUBSTRING(Source, 1, CHARINDEX(',', Source) - 1)
							WHEN CHARINDEX(' ', Source) > 0 THEN SUBSTRING(Source, 1, CHARINDEX(' ', Source) - 1)			
							ELSE Source
						end
			

	-- What is the most popular industry?
		select DISTINCT(Industry), count(Industry) as Industry_Count
		from [Forbes List]..Forbes
		group by Industry
		order by count(Industry) desc
		
	-- Which industry has the highest average net worth?
		WITH Net as (
			select Industry, count(Industry) as Industry_Count, round(avg(cast(REPLACE(REPLACE([Net Worth], '$', ''), ' B', '') as float)), 2) as Average_Net_Worth_F
			from [Forbes List]..Forbes
			group by Industry			
		)	

		select Industry, Industry_Count, CONCAT('$', Average_Net_Worth_F, ' B') as Average_Net_Worth
		from Net 
		order by Average_Net_Worth desc

Excel

Data Cleaning

I have chosen to utilize Excel for additional data cleaning in order to enhance the usability and quality of the information.

Firstly, I imported my SQL dataset into Excel by carrying out the following steps: Get Data -> From Database -> From SQL Server Database

I observed that the "Source" column contains multiple instances of duplicates with variations in spelling, such as additional letters or spelling errors.

Subsequently, I employed the REPLACE function to perform the required cleaning operations on the "Source" column until all entries became unique by resolving variations in spelling and eliminating duplicates.

Tableau

Create Dashboard

To enhance the accessibility and ease of reference for all the information, I utilized Tableau to develop an interactive dashboard.
In order to maximize dashboard interactivity, I integrated dynamic filters into relevant charts.
By leveraging parameters and calculated fields, users are empowered to actively explore and refine the presentation of data according to their individual requirements, thereby enhancing their overall interactive experience.

** Please note that there is a display error with the parameter shown in the picture ("Select Country"). To view the correct display, kindly visit my Tableau profile.