1. Overview

In Take-home Exercise 4, we analyse and describe the daily routines of two selected participants from the city of Engagement, Ohio, USA, by using appropriate visual analytics methods.

The data is processed using packages from the tidyverse family, and the statistical graphics are prepared using ggplot2 and its extensions, or ViSiElse.

2. Getting Started

Before we get started, we need to ensure that the required R packages are installed. The code chunk below installs any package that is missing and then loads all of them into the R environment.

The packages required for this exercise are tidyverse, data.table, ViSiElse, zoo and patchwork.

packages <- c('tidyverse', 'data.table', 'ViSiElse', 'zoo', 'patchwork')

for (p in packages) {
  # require() returns FALSE when the package is not installed
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}

3. Data

3.1 Data Source

The original datasets were obtained from VAST Challenge 2022 in CSV format. They record the daily routines of 1011 residents of Engagement, OH who agreed to participate in this study.

3.2 Importing Data

The code chunk below imports the datasets from a folder called ActivityLogs into R and saves the result as a tibble data frame called logs. First, list.files() lists the paths of the data files in that folder. Then map_df() of purrr applies fread() of data.table to each path and binds the resulting data frames together by row.

# pattern is a regular expression, so "\\.csv$" matches file names ending in .csv
logs <- list.files(path = "./data/ActivityLogs/",
                   pattern = "\\.csv$",
                   full.names = TRUE) %>%
  map_df(~fread(.))
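As a minimal sketch of the import pattern described above, the example below writes two tiny CSV files to a temporary folder and binds them into one table. The folder name, file names and values are made up for illustration; only list.files(), fread() and map_df() are taken from the exercise.

```r
library(data.table)
library(purrr)

# Hypothetical demo folder holding two tiny CSV files
demo_dir <- file.path(tempdir(), "ActivityLogsDemo")
dir.create(demo_dir, showWarnings = FALSE)
fwrite(data.frame(participantId = 1L, currentMode = "AtHome"),
       file.path(demo_dir, "part1.csv"))
fwrite(data.frame(participantId = 2L, currentMode = "AtWork"),
       file.path(demo_dir, "part2.csv"))

# List the full paths of every file whose name ends in .csv
demo_files <- list.files(path = demo_dir,
                         pattern = "\\.csv$",
                         full.names = TRUE)

# Read each file with fread() and bind the results together by row
demo_import <- map_df(demo_files, ~ fread(.x))
print(demo_import)
```

Each input file contributes its rows to the combined table, so the result here has one row per file.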

As the datasets are large and importing them is time-consuming, we will save the output into a file in rds format. Before saving, we select 2 participants from the combined dataset, namely Participant #1 and Participant #2, by using filter() of dplyr. To see what a typical Monday and a typical Sunday look like for each participant, we keep only May 2, 2022 and May 8, 2022. To do this, we create a new column called Date by using as.Date() of zoo.

logs <- logs %>%
  filter(participantId == '1' | participantId == '2') %>%
  mutate(Date=as.Date(timestamp)) %>%
  filter(Date == '2022-05-02' | Date == '2022-05-08')
saveRDS(logs, 'data/logs.rds')

In the future, we just need to read the saved rds data file into R by using readRDS().

logs <- readRDS('data/logs.rds')
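The save-once, read-later idea can be sketched as a simple cache check. In this sketch the cache path and the import_logs() function are hypothetical stand-ins, not part of the exercise; a real script would put the slow fread() import inside that function.

```r
# Hypothetical cache path inside the session's temporary directory
cache_file <- file.path(tempdir(), "logs_demo.rds")

# Stand-in for the slow import step (assumption for illustration only)
import_logs <- function() {
  data.frame(participantId = c(1L, 2L),
             currentMode = c("AtHome", "AtWork"))
}

if (file.exists(cache_file)) {
  cached_logs <- readRDS(cache_file)   # fast path: reuse the saved copy
} else {
  cached_logs <- import_logs()         # slow path: rebuild the data
  saveRDS(cached_logs, cache_file)     # then save it for next time
}
```

With this layout the import runs only when the rds file is missing, so deleting the file is enough to force a fresh import.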

3.3 Data Wrangling

In this section, we combine two columns, currentMode and sleepStatus, into one column called Activity by using unite() of tidyr.
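To show what unite() produces, here is a minimal sketch on a toy data frame. The three rows are invented for illustration; the column names and arguments mirror the ones used in this exercise.

```r
library(tidyr)

# Toy data with the two columns that will be combined
toy_status <- data.frame(
  currentMode = c("AtHome", "AtHome", "AtWork"),
  sleepStatus = c("Sleeping", "Awake", "Awake")
)

# Join the two columns with "-" and keep the originals (remove = FALSE)
toy_united <- unite(toy_status, Activity, c(currentMode, sleepStatus),
                    sep = "-", remove = FALSE)

print(toy_united$Activity)
# "AtHome-Sleeping" "AtHome-Awake" "AtWork-Awake"
```

Because remove = FALSE, currentMode and sleepStatus stay in the data frame next to the new Activity column.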

The code chunk below will select participant #1 and his/her activities on Monday and save it as a new data frame called logs1_mon.

logs1_mon <- logs %>%
  filter(participantId == 1) %>%
  unite(Activity, c(currentMode,sleepStatus), sep = '-', remove = FALSE) %>%
  mutate(StartTime = format(timestamp,"%H:%M:%S"),
        EndTime = timestamp+(5*60)) %>%
  filter(Date == '2022-05-02')

The code chunk below will select participant #1 and his/her activities on Sunday and save it as a new data frame called logs1_sun.

logs1_sun <- logs %>%
  filter(participantId == 1) %>%
  unite(Activity, c(currentMode,sleepStatus), sep = '-', remove = FALSE) %>%
  mutate(StartTime = format(timestamp,"%H:%M:%S"),
        EndTime = timestamp+(5*60)) %>%
  filter(Date == '2022-05-08')

The code chunk below will select participant #2 and his/her activities on Monday and save it as a new data frame called logs2_mon.

logs2_mon <- logs %>%
  filter(participantId == 2) %>%
  unite(Activity, c(currentMode,sleepStatus), sep = '-', remove = FALSE) %>%
  mutate(StartTime = format(timestamp,"%H:%M:%S"),
        EndTime = timestamp+(5*60)) %>%
  filter(Date == '2022-05-02')

The code chunk below will select participant #2 and his/her activities on Sunday and save it as a new data frame called logs2_sun.

logs2_sun <- logs %>%
  filter(participantId == 2) %>%
  unite(Activity, c(currentMode,sleepStatus), sep = '-', remove = FALSE) %>%
  mutate(StartTime = format(timestamp,"%H:%M:%S"),
        EndTime = timestamp+(5*60)) %>%
  filter(Date == '2022-05-08')
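The four code chunks above differ only in the participant and the date, so the same steps could be wrapped in a small helper. The sketch below is one possible way to do that; get_day_logs() and the toy data are hypothetical and are not part of the original exercise.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: same steps as above for one participant and one date
get_day_logs <- function(data, pid, day) {
  data %>%
    filter(participantId == pid) %>%
    unite(Activity, c(currentMode, sleepStatus), sep = "-", remove = FALSE) %>%
    mutate(StartTime = format(timestamp, "%H:%M:%S"),
           EndTime = timestamp + (5 * 60)) %>%   # each record covers 5 minutes
    filter(Date == as.Date(day))
}

# Toy stand-in for logs (values invented for illustration)
toy_logs <- tibble(
  participantId = c(1L, 1L, 2L),
  timestamp = as.POSIXct(c("2022-05-02 00:00:00",
                           "2022-05-08 00:00:00",
                           "2022-05-02 00:00:00"), tz = "UTC"),
  currentMode = c("AtHome", "AtHome", "AtWork"),
  sleepStatus = c("Sleeping", "Awake", "Awake")
) %>%
  mutate(Date = as.Date(timestamp))

toy1_mon <- get_day_logs(toy_logs, 1, "2022-05-02")
toy1_sun <- get_day_logs(toy_logs, 1, "2022-05-08")
```

Keeping the steps in one function means a change, such as a different separator, only has to be made in one place.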

4. Daily Routines Visualisation

4.1 Typical Monday for Each Participant

m1 <- ggplot(data = logs1_mon,
       aes(x = timestamp, y = Activity)) +
  geom_point() +
  labs(x = "Time",
       y = "Activity",
       title = "Typical Monday for Participant #1") +
  theme(axis.title.y= element_text(angle=0))

m2 <- ggplot(data = logs2_mon,
       aes(x = timestamp, y = Activity)) +
  geom_point() +
  labs(x = "Time",
       y = "Activity",
       title = "Typical Monday for Participant #2") +
  theme(axis.title.y= element_text(angle=0))

m1 / m2

It is observed that participant #2 wakes up earlier than participant #1 and goes to sleep earlier at night. In the morning, participant #2 arrives at the office and immediately goes to a restaurant for breakfast, while participant #1 does not go to a restaurant at all throughout the day. The working hours of both participants are similar in length, but participant #2 starts and stops working earlier than participant #1.

4.2 Typical Sunday for Each Participant

s1 <- ggplot(data = logs1_sun,
       aes(x = timestamp, y = Activity)) +
  geom_point() +
  labs(x = "Time",
       y = "Activity",
       title = "Typical Sunday for Participant #1") +
  theme(axis.title.y= element_text(angle=0))

s2 <- ggplot(data = logs2_sun,
       aes(x = timestamp, y = Activity)) +
  geom_point() +
  labs(x = "Time",
       y = "Activity",
       title = "Typical Sunday for Participant #2") +
  theme(axis.title.y= element_text(angle=0))

s1 / s2

Similar to Monday's routines, on Sunday participant #2 also wakes up and goes to sleep earlier than participant #1. Participant #2 spends his/her entire day at home, while participant #1 spends some time in the morning going to a restaurant for breakfast.
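The Monday and Sunday charts in this section share the same layers, so the plotting code could be written once and reused. The sketch below assumes a hypothetical helper called plot_day() and a toy data frame; it is not the code used to produce the charts above.

```r
library(ggplot2)
library(patchwork)

# Hypothetical helper: the same point chart as above, with a custom title
plot_day <- function(data, plot_title) {
  ggplot(data = data,
         aes(x = timestamp, y = Activity)) +
    geom_point() +
    labs(x = "Time",
         y = "Activity",
         title = plot_title) +
    theme(axis.title.y = element_text(angle = 0))
}

# Toy records at 5-minute intervals (values invented for illustration)
toy_day <- data.frame(
  timestamp = as.POSIXct("2022-05-02 00:00:00", tz = "UTC") + (0:3) * 300,
  Activity = c("AtHome-Sleeping", "AtHome-Sleeping",
               "AtHome-Awake", "Transport-Awake")
)

p1 <- plot_day(toy_day, "Toy day A")
p2 <- plot_day(toy_day, "Toy day B")

# patchwork's "/" operator stacks the two charts vertically
stacked <- p1 / p2
```

Printing stacked draws the two charts one above the other, in the same way as m1 / m2 and s1 / s2.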

5. Future Work

Daily activities could also be visualised by using the ViSiElse method. However, we must specify the duration of each activity and put it on the x-axis instead of the timestamp. The snapshot below shows an example of a ViSiElse graph.

The initial code chunk is shown below. As a first step towards a total duration for each activity, we compute a new column called Duration, which is the time in minutes between each record and the next one.

logs1_v <- logs %>%
  filter(participantId == '1') %>%
  unite(Activity, c(currentMode,sleepStatus), sep = '-', remove = FALSE) %>%
  mutate(Date=as.Date(timestamp),
         EndTime = lead(timestamp),
         Duration = difftime(EndTime,timestamp,units="mins")) %>%
  filter(Date == '2022-05-02') %>%
  pivot_wider(names_from = "Activity",
              values_from = "Duration")

print(logs1_v)

# A tibble: 288 x 18
   timestamp           currentLocation       participantId currentMode
   <dttm>              <chr>                         <int> <chr>      
 1 2022-05-02 00:00:00 POINT (-1526.9372331~             1 AtHome     
 2 2022-05-02 00:05:00 POINT (-1526.9372331~             1 AtHome     
 3 2022-05-02 00:10:00 POINT (-1526.9372331~             1 AtHome     
 4 2022-05-02 00:15:00 POINT (-1526.9372331~             1 AtHome     
 5 2022-05-02 00:20:00 POINT (-1526.9372331~             1 AtHome     
 6 2022-05-02 00:25:00 POINT (-1526.9372331~             1 AtHome     
 7 2022-05-02 00:30:00 POINT (-1526.9372331~             1 AtHome     
 8 2022-05-02 00:35:00 POINT (-1526.9372331~             1 AtHome     
 9 2022-05-02 00:40:00 POINT (-1526.9372331~             1 AtHome     
10 2022-05-02 00:45:00 POINT (-1526.9372331~             1 AtHome     
# ... with 278 more rows, and 14 more variables: hungerStatus <chr>,
#   sleepStatus <chr>, apartmentId <int>, availableBalance <dbl>,
#   jobId <int>, financialStatus <chr>, dailyFoodBudget <int>,
#   weeklyExtraBudget <dbl>, Date <date>, EndTime <dttm>,
#   `AtHome-Sleeping` <drtn>, `AtHome-Awake` <drtn>,
#   `Transport-Awake` <drtn>, `AtWork-Awake` <drtn>
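The output above still has one row per timestamp, so the durations have not yet been summed. One possible next step, sketched here on toy data, is to add up Duration by Activity with group_by() and summarise(). The toy values and the TotalMins column name are assumptions for illustration only.

```r
library(dplyr)

# Six toy records at 5-minute intervals (values invented for illustration)
toy_activity <- tibble(
  timestamp = as.POSIXct("2022-05-02 00:00:00", tz = "UTC") + (0:5) * 300,
  Activity = c("AtHome-Sleeping", "AtHome-Sleeping", "AtHome-Awake",
               "Transport-Awake", "AtWork-Awake", "AtWork-Awake")
)

totals <- toy_activity %>%
  arrange(timestamp) %>%
  # minutes between each record and the next one
  mutate(Duration = as.numeric(difftime(lead(timestamp), timestamp,
                                        units = "mins"))) %>%
  filter(!is.na(Duration)) %>%   # the last record has no successor
  group_by(Activity) %>%
  summarise(TotalMins = sum(Duration), .groups = "drop")

print(totals)
```

In this toy example the last record is dropped because it has no following timestamp, so the totals cover 25 minutes in all.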

6. References

Marsja, E. (2021, February 14). How to Concatenate Two Columns (or More) in R – stringr, tidyr. https://www.marsja.se/how-to-concatenate-two-columns-or-more-in-r-stringr-tidyr/
