Description

I delved into the Forbes List of February 2023, conducting comprehensive Exploratory Data Analysis (EDA) to extract valuable insights. These insights were subsequently transformed into a dynamic and interactive dashboard, providing an engaging visualization of the data.

Skills Applied: SQL, Excel, Tableau


Implementation

  • I proficiently employed a diverse range of SQL functions, including SUM, MAX, MIN, and COUNT, to gain insights during the data exploration process. Additionally, I utilized advanced techniques such as Common Table Expressions (CTE) to enhance the analysis and generate comprehensive results.
  • Utilize Excel for further data cleansing to improve the usability and quality of the information.
  • Utilizing Tableau, I created a dynamic dashboard that simplifies the analysis breakdown, making it easier to comprehend and explore the insights.

Breakdown

SQL

Exploratory Data Analysis

  • To kickstart my project, I commenced by importing the dataset into SQL and devising analytical queries.
  • Following that, I proceeded with my exploratory data analysis (EDA) by utilizing various aggregate functions such as MAX, MIN, SUM, and COUNT to address several inquiries.
  • Additionally, I performed data cleaning tasks to facilitate the transformation of the data into a more meaningful format that would assist in my analysis. For instance, I introduced a new column called "Source_First_Name."
  • Please refer to the code section below (double-click) to access the complete EDA process and the corresponding list of questions.

-- Dataset
select *
from [Forbes List]..Forbes
order by rank

-- Exploratory Data Analysis

	-- What is the average, max, min age of these people in Forbes List?
		select max(Age) as Highest_Age, min(Age) as Lowest_Age, cast(avg(Age) as int) as Average_Age
		from [Forbes List]..Forbes


	-- Who is below the average age?
		select *
		from [Forbes List]..Forbes
		where age <= (select avg(age) from [Forbes List]..Forbes)
		order by rank


	-- I'm 24 y/o at the point of writing this SQL code, how many people below or equal to my age are in this list?
		select count(Age) as Count_People_My_Age
		from [Forbes List]..Forbes
		where Age <= 24


	-- What is the lowest Net Worth and the average of the net worth on the list?

		-- First we need to make sure the Net Worth column is converted to FLOAT
		WITH Net as (
		select cast(REPLACE(REPLACE([Net Worth], '$', ''), ' B', '') as float) as Net_Worth_F
		from [Forbes List]..Forbes
		)

		-- Now we can find the Lowest Net Worth and Average Net Worth using 'Net_Worth_F'
		select CONCAT('$', round(avg(Net_Worth_F), 2), ' B') as Average_Net_Worth, CONCAT('$', min(Net_Worth_F), ' B') as Minimum_Net_Worth
		from Net


	-- How many people are above average? ($4.76 B)
		WITH Net as (
		select cast(REPLACE(REPLACE([Net Worth], '$', ''), ' B', '') as float) as Net_Worth_F
		from [Forbes List]..Forbes
		)

		select count(Net_Worth_F) as Count_Above_Average
		from Net
		where Net_Worth_F > (select avg(Net_Worth_F) from Net)


	-- What countries do these individuals come from?
		select DISTINCT(Country)
		from [Forbes List]..Forbes
		

	-- How many individuals come from their respective country?
		select DISTINCT(Country), count(Country) as Count_Individuals
		from [Forbes List]..Forbes
		group by Country
		order by count(Country) desc


	-- What is the average age for the respective country to be on Forbes List?
		select DISTINCT(Country), count(Country) as Count_Individuals, cast(avg(Age) as int) as Average_Age 
		from [Forbes List]..Forbes
		group by Country
		order by count(Country) desc


	-- Sources in Forbes List and most populour source
		
		-- The original Source column has too many different data, let's create a new column to ease my findings
			select DISTINCT(Source),
			case
				WHEN CHARINDEX(',', Source) > 0 THEN SUBSTRING(Source, 1, CHARINDEX(',', Source) - 1)
				WHEN CHARINDEX(' ', Source) > 0 THEN SUBSTRING(Source, 1, CHARINDEX(' ', Source) - 1)			
				ELSE Source
			end as First_Word
			from [Forbes List]..Forbes
			group by Source;

		-- Now I want to group by 'First_Word' distinctively, and do a count to find out which is the most popular
			select First_Word, count(First_Word) as Source_Count
			from (
				select DISTINCT Source,
				case
					WHEN CHARINDEX(',', Source) > 0 THEN SUBSTRING(Source, 1, CHARINDEX(',', Source) - 1)
					WHEN CHARINDEX(' ', Source) > 0 THEN SUBSTRING(Source, 1, CHARINDEX(' ', Source) - 1)			
					ELSE Source
				end as First_Word
				from [Forbes List]..Forbes
				group by Source
			) as subquery
			group by First_Word
			Order by count(First_Word) desc

		-- Now I want to add this new column to my existing Forbes table.
			ALTER TABLE [Forbes List]..Forbes ADD Source_First_Word NVARCHAR(255)

			UPDATE [Forbes List]..Forbes
			SET Source_First_Word = case
							WHEN CHARINDEX(',', Source) > 0 THEN SUBSTRING(Source, 1, CHARINDEX(',', Source) - 1)
							WHEN CHARINDEX(' ', Source) > 0 THEN SUBSTRING(Source, 1, CHARINDEX(' ', Source) - 1)			
							ELSE Source
						end
			

	-- What is the most popular industry?
		select DISTINCT(Industry), count(Industry) as Industry_Count
		from [Forbes List]..Forbes
		group by Industry
		order by count(Industry) desc
		
	-- Which industry has the highest average net worth?
		WITH Net as (
			select Industry, count(Industry) as Industry_Count, round(avg(cast(REPLACE(REPLACE([Net Worth], '$', ''), ' B', '') as float)), 2) as Average_Net_Worth_F
			from [Forbes List]..Forbes
			group by Industry			
		)	

		select Industry, Industry_Count, CONCAT('$', Average_Net_Worth_F, ' B') as Average_Net_Worth
		from Net 
		order by Average_Net_Worth desc
											

Excel

Data Cleaning

  • I have chosen to utilize Excel for additional data cleaning in order to enhance the usability and quality of the information.
  • Firstly, I imported my SQL dataset into Excel by carrying out the following steps: Get Data -> From Database -> From SQL Server Database

  • I observed that the "Source" column contains multiple instances of duplicates with variations in spelling, such as additional letters or spelling errors.
  • Subsequently, I employed the REPLACE function to perform the required cleaning operations on the "Source" column until all entries became unique by resolving variations in spelling and eliminating duplicates.

Tableau

Create Dashboard

  • To enhance the accessibility and ease of reference for all the information, I utilized Tableau to develop an interactive dashboard.
  • In order to maximize dashboard interactivity, I integrated dynamic filters into relevant charts.
  • By leveraging parameters and calculated fields, users are empowered to actively explore and refine the presentation of data according to their individual requirements, thereby enhancing their overall interactive experience.

** Please note that there is a display error with the parameter shown in the picture ("Select Country"). To view the correct display, kindly visit my Tableau profile.