Creating data visualisation beyond default
Similar to take-home exercise 1, we are still interested in the demographics of the city of Engagement, Ohio, USA, but this time we will evaluate and make over the data visualisations made by one of our classmates.
The data is processed using the appropriate packages from the tidyverse family, whereas the statistical graphics are prepared using ggplot2 and its extensions.
Before we get started, we need to ensure that the required R packages have been installed. If they are already installed, we load them. If they have yet to be installed, we install them first and then load them into the R environment.
The packages required for this exercise are tidyverse and ggridges.
packages <- c('tidyverse', 'ggridges')
for (p in packages) {
  # Install the package only if it cannot be loaded
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
The original datasets were obtained from VAST Challenge 2022 in csv format. They contain basic information about the residents of Engagement, OH who have agreed to participate in this study.
The code chunk below imports two datasets, namely
Participants.csv and Jobs.csv, into R by using
read_csv() of readr and saves them as tibble
data frames called demographics and jobs respectively.
The demographics dataset consists of 1,011 records, whereas the jobs dataset
consists of 1,328 records, as shown below.
demographics <- read_csv("data/Participants.csv")
glimpse(demographics)
Rows: 1,011
Columns: 7
$ participantId <dbl> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,~
$ householdSize <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ~
$ haveKids <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRU~
$ age <dbl> 36, 25, 35, 21, 43, 32, 26, 27, 20, 35, 48, 2~
$ educationLevel <chr> "HighSchoolOrCollege", "HighSchoolOrCollege",~
$ interestGroup <chr> "H", "B", "A", "I", "H", "D", "I", "A", "G", ~
$ joviality <dbl> 0.001626703, 0.328086500, 0.393469590, 0.1380~
jobs <- read_csv("data/Jobs.csv")
glimpse(jobs)
Rows: 1,328
Columns: 7
$ jobId <dbl> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 1~
$ employerId <dbl> 379, 379, 380, 380, 381, 381, 381, 381,~
$ hourlyRate <dbl> 10.00000, 22.21763, 10.00000, 15.31207,~
$ startTime <time> 07:46:00, 07:31:00, 08:00:00, 07:39:00~
$ endTime <time> 15:46:00, 15:31:00, 16:00:00, 15:39:00~
$ daysToWork <chr> "[Monday,Tuesday,Wednesday,Thursday,Fri~
$ educationRequirement <chr> "HighSchoolOrCollege", "Bachelors", "Ba~
The columns are renamed so that the first letter of each word is
capitalised and the "Id" suffix is written as "ID". To do this, we use
rename() of dplyr, which takes pairs in the form new name = old name.
demographics <- demographics %>%
  rename('ParticipantID' = 'participantId',
         'HouseholdSize' = 'householdSize',
         'HaveKids' = 'haveKids',
         'Age' = 'age',
         'EducationLevel' = 'educationLevel',
         'InterestGroup' = 'interestGroup',
         'Joviality' = 'joviality')
jobs <- jobs %>%
  rename('JobID' = 'jobId',
         'EmployerID' = 'employerId',
         'HourlyRate' = 'hourlyRate',
         'StartTime' = 'startTime',
         'EndTime' = 'endTime',
         'DaysToWork' = 'daysToWork',
         'EducationRequirement' = 'educationRequirement')
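When there are many columns, the same renaming rule can be applied programmatically instead of listing every pair by hand. The following is only a minimal sketch of that idea: the toy tibble and the helper to_title_case() are hypothetical and not part of the original exercise, and it assumes dplyr 1.0.0 or later for rename_with().

```r
library(dplyr)

# Toy tibble with the same camelCase names as Participants.csv
# (the values are made up for illustration only).
toy <- tibble(participantId = 0:1,
              householdSize = c(3, 2),
              haveKids = c(TRUE, FALSE))

# Hypothetical helper: capitalise the first letter of each name
# and write a trailing "Id" as "ID".
to_title_case <- function(x) {
  x <- paste0(toupper(substring(x, 1, 1)), substring(x, 2))
  sub("Id$", "ID", x)
}

# rename_with() applies the helper to every column name
toy_renamed <- toy %>% rename_with(to_title_case)
names(toy_renamed)
```

Listing the pairs explicitly with rename(), as done above, remains easier to read for a handful of columns; the helper is mainly useful when the rule has to be applied consistently to many columns.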
ORIGINAL DATA VISUALISATION
CLARITY
AESTHETIC
MAKEOVER DESIGN
In this makeover design, we set the color argument of
geom_histogram() to draw an outline around each bar. In
addition, we use theme() to rotate the y-axis title so that it
reads horizontally, geom_vline() to draw a dashed line at the
mean age, and geom_text() to display the text
‘Average’ next to that line.
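ggplot(data = demographics,
       aes(x = Age, fill = HaveKids)) +
  geom_histogram(bins = 20,
                 color = "grey20") +
  labs(x = "Age",
       y = "No. of\n People",
       title = "Most People Do Not Have Kids",
       fill = "Have Kids") +
  theme(axis.title.y = element_text(angle = 0)) +
  # Dashed vertical line at the mean age
  # (on ggplot2 >= 3.4.0, linewidth is preferred over size for lines)
  geom_vline(aes(xintercept = mean(Age,
                                   na.rm = TRUE)),
             color = "black",
             linetype = "dashed",
             size = 0.5) +
  # geom_text() would otherwise draw the label once per row of the data;
  # check_overlap = TRUE keeps only the first copy
  geom_text(aes(x = 42, y = 85,
                label = "Average"), size = 3.5,
            check_overlap = TRUE)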
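As an alternative way to place the mean line and its label, the mean can be computed once and passed as a constant, with annotate() drawing a single label that follows the line rather than sitting at hard-coded coordinates. This is only a sketch under stated assumptions: the toy data frame is randomly generated for illustration and is not the VAST Challenge data, and the name mean_age is introduced here.

```r
library(ggplot2)

# Toy data standing in for the demographics tibble (values are simulated)
set.seed(1)
toy <- data.frame(Age = sample(18:60, 200, replace = TRUE),
                  HaveKids = sample(c(TRUE, FALSE), 200, replace = TRUE))

# Compute the mean once so the line and the label share the same x position
mean_age <- mean(toy$Age, na.rm = TRUE)

p <- ggplot(toy, aes(x = Age, fill = HaveKids)) +
  geom_histogram(bins = 20, color = "grey20") +
  geom_vline(xintercept = mean_age, linetype = "dashed") +
  # annotate() draws exactly one label; y = Inf pins it to the top of the panel
  annotate("text", x = mean_age, y = Inf, label = "Average",
           hjust = -0.1, vjust = 1.5, size = 3.5)
p
```

Because the label position is derived from mean_age, it stays next to the line if the data changes, whereas fixed coordinates such as (42, 85) have to be adjusted by hand.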
ORIGINAL DATA VISUALISATION
CLARITY
AESTHETIC
MAKEOVER DESIGN
In this makeover design, we sort the bars by descending frequency
with reorder(), and use theme() to rotate the y-axis title and
geom_text() to display the frequency on top of each
bar.
ggplot(data = demographics,
       # Order the education levels by descending number of participants
       aes(x = reorder(EducationLevel, EducationLevel, function(x) -length(x)))) +
  geom_bar(fill = "lightblue") +
  labs(x = "Education Level",
       y = "No. of\n People",
       title = "Most People Have High School or College Degree") +
  theme(axis.title.y = element_text(angle = 0)) +
  # after_stat(count) replaces the deprecated ..count.. notation
  geom_text(aes(label = after_stat(count)),
            stat = "count",
            vjust = -0.3)
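The reorder() call in the chart can look cryptic, so here is a small sketch of what it does on its own. The vector edu below is made-up toy data using the same category names; the point is only that the negated group size is used as the sort score, so the most frequent level comes first.

```r
# Toy vector of education levels (made up for illustration)
edu <- c("Low",
         "Bachelors", "Bachelors",
         "HighSchoolOrCollege", "HighSchoolOrCollege", "HighSchoolOrCollege")

# For each level, the score is minus the number of occurrences;
# reorder() sorts the factor levels by ascending score,
# which is the same as descending frequency.
ordered_edu <- reorder(edu, edu, function(x) -length(x))
levels(ordered_edu)
```

Here the levels come out as HighSchoolOrCollege, Bachelors, Low. For readers who prefer a named helper, fct_infreq() from forcats (part of the tidyverse) orders levels by frequency in a similar way.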
ORIGINAL DATA VISUALISATION
CLARITY
AESTHETIC
MAKEOVER DESIGN
In this makeover design, we multiply joviality by 100 to express it
as a percentage and order the boxplots by descending mean joviality.
Next, we use theme() to rotate the y-axis title and
stat_summary() to mark the mean on each boxplot with a red point
and a text label.
ggplot(data = demographics,
       aes(x = reorder(EducationLevel, -Joviality), y = Joviality * 100)) +
  geom_boxplot() +
  # Red point at the mean of each group
  stat_summary(geom = "point",
               fun = "mean",
               colour = "red",
               size = 1.5) +
  # Text label showing the mean; fun, show.legend and after_stat() replace
  # the deprecated fun.y, show_guide and ..y..
  stat_summary(fun = mean, colour = "darkred", geom = "text", show.legend = FALSE,
               vjust = -0.7, aes(label = round(after_stat(y), digits = 3))) +
  labs(x = "Education Level",
       y = "Joviality\n Percentage",
       title = "Graduates are the Most Jovial on Average") +
  theme(axis.title.y = element_text(angle = 0))
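The values that stat_summary() prints on the chart, and the order produced by reorder(EducationLevel, -Joviality), can be cross-checked with a plain dplyr summary. The sketch below uses a made-up toy tibble rather than the real data, and the column name MeanJovialityPct is introduced here for illustration.

```r
library(dplyr)

# Toy data with two education levels (values are made up)
toy <- tibble(EducationLevel = c("Low", "Low", "Graduate", "Graduate"),
              Joviality = c(0.2, 0.4, 0.6, 0.8))

# Mean joviality percentage per education level, highest first;
# this mirrors the left-to-right order of the boxplots.
mean_joviality <- toy %>%
  group_by(EducationLevel) %>%
  summarise(MeanJovialityPct = mean(Joviality * 100)) %>%
  arrange(desc(MeanJovialityPct))
mean_joviality
```

Running the same pipeline on demographics gives a quick check that the chart title matches the group that actually has the highest mean.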
ORIGINAL DATA VISUALISATION
CLARITY
AESTHETIC
MAKEOVER DESIGN
Similar to the first chart, in this ridge plot makeover design, we
add some functions such as theme() to rotate the y-axis
label, geom_vline() to create a dashed mean line, and
geom_text() to display the text ‘Average’.
ggplot(data = jobs,
       aes(x = HourlyRate, y = reorder(EducationRequirement, -HourlyRate))) +
  geom_density_ridges(rel_min_height = 0.01) +
  labs(x = "Hourly Rate",
       y = "Education\n Requirement",
       title = "Graduates and Bachelors Earn Higher Hourly Wage") +
  theme(axis.title.y = element_text(angle = 0)) +
  # Dashed vertical line at the overall mean hourly rate
  geom_vline(aes(xintercept = mean(HourlyRate,
                                   na.rm = TRUE)),
             color = "black",
             linetype = "dashed",
             size = 0.5) +
  # check_overlap = TRUE avoids drawing the same label once per row
  geom_text(aes(x = 25, y = 5.5, label = "Average"), size = 3,
            check_overlap = TRUE)