Base R vs Tidyverse

Author

Filippo Gambarota

Problem Description

In this exercise, we will use two datasets:

  1. The iris dataset for complex operations on grouped data.
  2. The mtcars dataset for reshaping between long and wide formats.

This will allow us to compare different data manipulation tasks using base R and the tidyverse.

Task 1: Complex Operations on Grouped Data

We will calculate the mean and standard deviation for each measurement (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) by species.

Base R Solution

# Load the iris dataset
data(iris)

# Base R approach using aggregate()
mean_sd_base <- aggregate(cbind(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) ~ Species, data = iris, 
                          FUN = function(x) c(mean = mean(x), sd = sd(x)))

# Flatten the results
mean_sd_base <- do.call(data.frame, mean_sd_base)

# Display the result
mean_sd_base
     Species Sepal.Length.mean Sepal.Length.sd Sepal.Width.mean Sepal.Width.sd
1     setosa             5.006       0.3524897            3.428      0.3790644
2 versicolor             5.936       0.5161711            2.770      0.3137983
3  virginica             6.588       0.6358796            2.974      0.3224966
  Petal.Length.mean Petal.Length.sd Petal.Width.mean Petal.Width.sd
1             1.462       0.1736640            0.246      0.1053856
2             4.260       0.4699110            1.326      0.1977527
3             5.552       0.5518947            2.026      0.2746501
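
For comparison, the same grouped statistics can also be computed with tapply(), which handles one variable and one statistic per call. The following is a minimal sketch, not part of the original solution; the names measures, means_by_species, and sds_by_species are illustrative.

```r
data(iris)

# Illustrative helper: the four measurement columns to summarize
measures <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# tapply() returns one value per species for a single column,
# so sapply() loops over the columns and binds the results into a matrix
means_by_species <- sapply(iris[measures], function(x) tapply(x, iris$Species, mean))
sds_by_species   <- sapply(iris[measures], function(x) tapply(x, iris$Species, sd))

# Rows are species, columns are measurements
means_by_species
sds_by_species
```

Compared with aggregate(), this needs one pass per statistic and returns matrices rather than a single data frame.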

Tidyverse Solution

# Load the tidyverse package
library(tidyverse)

# Tidyverse approach using dplyr
mean_sd_tidy <- iris %>%
  group_by(Species) %>%
  summarize(across(starts_with("Sepal") | starts_with("Petal"), 
                   list(mean = ~mean(.x), sd = ~sd(.x)), 
                   .names = "{.col}_{.fn}"))

# Display the result
mean_sd_tidy
# A tibble: 3 × 9
  Species    Sepal.Length_mean Sepal.Length_sd Sepal.Width_mean Sepal.Width_sd
  <fct>                  <dbl>           <dbl>            <dbl>          <dbl>
1 setosa                  5.01           0.352             3.43          0.379
2 versicolor              5.94           0.516             2.77          0.314
3 virginica               6.59           0.636             2.97          0.322
# ℹ 4 more variables: Petal.Length_mean <dbl>, Petal.Length_sd <dbl>,
#   Petal.Width_mean <dbl>, Petal.Width_sd <dbl>
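
The two results differ only in column naming (a "." separator in base R, "_" in the tidyverse) and in printed precision. A minimal sketch of how one might confirm that the numbers agree is shown below; it recomputes both summaries so that it runs on its own, and the names measures, num_cols, and max_diff are illustrative.

```r
library(dplyr)

data(iris)
measures <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# Base R summary, flattened into ordinary columns
mean_sd_base <- aggregate(cbind(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) ~ Species,
                          data = iris,
                          FUN = function(x) c(mean = mean(x), sd = sd(x)))
mean_sd_base <- do.call(data.frame, mean_sd_base)

# Tidyverse summary
mean_sd_tidy <- iris %>%
  group_by(Species) %>%
  summarize(across(all_of(measures),
                   list(mean = ~mean(.x), sd = ~sd(.x)),
                   .names = "{.col}_{.fn}"))

# Harmonize names: turn the trailing ".mean" / ".sd" into "_mean" / "_sd"
names(mean_sd_base) <- sub("\\.(mean|sd)$", "_\\1", names(mean_sd_base))

# Largest absolute difference across all numeric cells
num_cols <- setdiff(names(mean_sd_base), "Species")
max_diff <- max(abs(as.matrix(mean_sd_base[num_cols]) - as.matrix(mean_sd_tidy[num_cols])))
max_diff
```

A value of max_diff at or near zero indicates that the two approaches return the same statistics.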

Task 2: Reshaping Data (Wide to Long and Back)

We will reshape the mtcars dataset by converting it into a long format where each measurement is recorded separately for each car model, and then back into a wide format.

Base R Solution

# Load the mtcars dataset
data(mtcars)

# Add car names as a column instead of row names
mtcars$car <- rownames(mtcars)

# Base R approach to long format
mtcars_long_base <- reshape(mtcars, idvar = "car", varying = names(mtcars)[1:11], 
                            v.names = "value", timevar = "variable", 
                            times = names(mtcars)[1:11], direction = "long")

# Back to wide format
mtcars_wide_base <- reshape(mtcars_long_base, idvar = "car", timevar = "variable", 
                            direction = "wide")

# Display results
head(mtcars_long_base)
                                    car variable value
Mazda RX4.mpg                 Mazda RX4      mpg  21.0
Mazda RX4 Wag.mpg         Mazda RX4 Wag      mpg  21.0
Datsun 710.mpg               Datsun 710      mpg  22.8
Hornet 4 Drive.mpg       Hornet 4 Drive      mpg  21.4
Hornet Sportabout.mpg Hornet Sportabout      mpg  18.7
Valiant.mpg                     Valiant      mpg  18.1
head(mtcars_wide_base)
                                    car value.mpg value.cyl value.disp value.hp
Mazda RX4.mpg                 Mazda RX4      21.0         6        160      110
Mazda RX4 Wag.mpg         Mazda RX4 Wag      21.0         6        160      110
Datsun 710.mpg               Datsun 710      22.8         4        108       93
Hornet 4 Drive.mpg       Hornet 4 Drive      21.4         6        258      110
Hornet Sportabout.mpg Hornet Sportabout      18.7         8        360      175
Valiant.mpg                     Valiant      18.1         6        225      105
                      value.drat value.wt value.qsec value.vs value.am
Mazda RX4.mpg               3.90    2.620      16.46        0        1
Mazda RX4 Wag.mpg           3.90    2.875      17.02        0        1
Datsun 710.mpg              3.85    2.320      18.61        1        1
Hornet 4 Drive.mpg          3.08    3.215      19.44        1        0
Hornet Sportabout.mpg       3.15    3.440      17.02        0        0
Valiant.mpg                 2.76    3.460      20.22        1        0
                      value.gear value.carb
Mazda RX4.mpg                  4          4
Mazda RX4 Wag.mpg              4          4
Datsun 710.mpg                 4          1
Hornet 4 Drive.mpg             3          1
Hornet Sportabout.mpg          3          2
Valiant.mpg                    3          1
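
The wide result keeps the row names from the long format (for example "Mazda RX4.mpg") and prefixes every column with "value.". A minimal sketch of the extra cleanup this requires is shown below. It works on a copy named cars so the original mtcars object is left untouched; cars, measure_cols, cars_long, and cars_wide are illustrative names.

```r
data(mtcars)

# Work on a copy and store the car names in a column
cars <- mtcars
cars$car <- rownames(cars)
measure_cols <- names(cars)[1:11]

# Wide -> long -> wide, as in the base R solution
cars_long <- reshape(cars, idvar = "car", varying = measure_cols,
                     v.names = "value", timevar = "variable",
                     times = measure_cols, direction = "long")
cars_wide <- reshape(cars_long, idvar = "car", timevar = "variable",
                     direction = "wide")

# Cleanup: drop the leftover row names and the "value." prefix
rownames(cars_wide) <- NULL
names(cars_wide) <- sub("^value\\.", "", names(cars_wide))

head(cars_wide)
```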

Tidyverse Solution

# Tidyverse approach to long format (uses the car column added to mtcars above)
mtcars_long_tidy <- mtcars %>% 
  pivot_longer(cols = -car, names_to = "variable", values_to = "value")

# Back to wide format
mtcars_wide_tidy <- mtcars_long_tidy %>% 
  pivot_wider(names_from = variable, values_from = value)

# Display results
head(mtcars_long_tidy)
# A tibble: 6 × 3
  car       variable  value
  <chr>     <chr>     <dbl>
1 Mazda RX4 mpg       21   
2 Mazda RX4 cyl        6   
3 Mazda RX4 disp     160   
4 Mazda RX4 hp       110   
5 Mazda RX4 drat       3.9 
6 Mazda RX4 wt         2.62
head(mtcars_wide_tidy)
# A tibble: 6 × 12
  car            mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
  <chr>        <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Mazda RX4     21       6   160   110  3.9   2.62  16.5     0     1     4     4
2 Mazda RX4 W…  21       6   160   110  3.9   2.88  17.0     0     1     4     4
3 Datsun 710    22.8     4   108    93  3.85  2.32  18.6     1     1     4     1
4 Hornet 4 Dr…  21.4     6   258   110  3.08  3.22  19.4     1     0     3     1
5 Hornet Spor…  18.7     8   360   175  3.15  3.44  17.0     0     0     3     2
6 Valiant       18.1     6   225   105  2.76  3.46  20.2     1     0     3     1
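
To check that the round trip is lossless, the wide result can be compared with the starting data. The sketch below is one way to do this under a few assumptions: it uses tibble::rownames_to_column() instead of assigning mtcars$car, so the car column comes first and the column order matches the output of pivot_wider(); the names cars, cars_long, cars_wide, and round_trip_ok are illustrative.

```r
library(tidyr)
library(tibble)

# Move the row names into a leading "car" column
cars <- rownames_to_column(mtcars, var = "car")

# Wide -> long -> wide
cars_long <- pivot_longer(cars, cols = -car, names_to = "variable", values_to = "value")
cars_wide <- pivot_wider(cars_long, names_from = variable, values_from = value)

# Compare values only, ignoring the tibble vs data.frame class difference
round_trip_ok <- isTRUE(all.equal(as.data.frame(cars_wide), cars,
                                  check.attributes = FALSE))
round_trip_ok
```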

Comparison of Base R and Tidyverse

Pros and Cons of Base R

Pros:

  • Flexibility: Base R allows detailed control over transformations.
  • No external dependencies: No need to install additional packages.
  • Suitable for simple tasks: If transformations are minimal, base R can be effective.

Cons:

  • Verbose: Base R code for reshaping data is long and requires multiple parameters.
  • Less intuitive: The syntax for reshape() can be confusing.
  • More manual work: Intermediate steps often need to be managed explicitly.

Pros and Cons of Tidyverse

Pros:

  • Concise and readable: Functions like pivot_longer() and pivot_wider() are intuitive.
  • Streamlined workflow: Tidyverse simplifies common operations like grouping and reshaping.
  • Better suited for modern data analysis: Works well with pipes and declarative transformations.

Cons:

  • Requires package installation: Tidyverse needs additional dependencies.
  • Learning curve: Users new to functional programming might need time to adapt.
  • May not cover every niche use case: Highly specific transformations might need workarounds.

Summary

We compared base R and tidyverse methods for grouped summaries and for reshaping data between wide and long formats. In these examples, the tidyverse code is shorter and easier to read, particularly for reshaping. Base R remains useful when fine control over transformations is needed or when extra dependencies should be avoided, but its code can be more verbose, and reshape() in particular is hard to follow for users unfamiliar with its arguments.