12 Advanced skills
There are many useful and fun skills that we will not have time to cover during this workshop.
12.1 Version control
Version control tracks the changes you make to your code over time, so that you can restore a previous version if you break something. Check out Happy Git and GitHub for the useR for more information.
12.2 Interactive visualization
With big data sets, it is sometimes useful and fun to create an interactive visualization of your data, both when you are exploring the data and when you are sharing it with others. Check out shiny applications for more information.
12.3 Advanced tidyverse
12.3.1 nest
ChickWeight %>%
  glimpse()
# Rows: 578
# Columns: 4
# $ weight <dbl> 42, 51, 59, 64, 76, 93, 106, 125, 149, 171, 199, 205, 40, 49, 5…
# $ Time <dbl> 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 21, 0, 2, 4, 6, 8, 10, 1…
# $ Chick <ord> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, …
# $ Diet <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
ChickWeight %>%
  group_by(Chick, Diet) %>%
  nest()
# # A tibble: 50 × 3
# # Groups: Chick, Diet [50]
# Chick Diet data
# <ord> <fct> <list>
# 1 1 1 <tibble [12 × 2]>
# 2 2 1 <tibble [12 × 2]>
# 3 3 1 <tibble [12 × 2]>
# 4 4 1 <tibble [12 × 2]>
# 5 5 1 <tibble [12 × 2]>
# 6 6 1 <tibble [12 × 2]>
# 7 7 1 <tibble [12 × 2]>
# 8 8 1 <tibble [11 × 2]>
# 9 9 1 <tibble [12 × 2]>
# 10 10 1 <tibble [12 × 2]>
# # … with 40 more rows
ChickWeight_nest <- ChickWeight %>%
  group_by(Chick, Diet) %>%
  nest()
ChickWeight_nest$data[1:2]
# [[1]]
# # A tibble: 12 × 2
# weight Time
# <dbl> <dbl>
# 1 42 0
# 2 51 2
# 3 59 4
# 4 64 6
# 5 76 8
# 6 93 10
# 7 106 12
# 8 125 14
# 9 149 16
# 10 171 18
# 11 199 20
# 12 205 21
#
# [[2]]
# # A tibble: 12 × 2
# weight Time
# <dbl> <dbl>
# 1 40 0
# 2 49 2
# 3 58 4
# 4 72 6
# 5 84 8
# 6 103 10
# 7 122 12
# 8 138 14
# 9 162 16
# 10 187 18
# 11 209 20
# 12 215 21
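Nesting is reversible. The following is a minimal sketch, assuming the dplyr and tidyr packages are installed; the object names `chick_nested` and `chick_flat` are made up for illustration. It nests the data as above and then uses `unnest()` to expand the `data` list-column back into one row per measurement.

```r
library(dplyr)
library(tidyr)

# Nest: one row per chick, with its measurements stored in the `data` list-column
chick_nested <- ChickWeight %>%
  group_by(Chick, Diet) %>%
  nest()

# Unnest: expand the list-column back into one row per measurement
chick_flat <- chick_nested %>%
  unnest(data) %>%
  ungroup()

nrow(chick_nested)  # number of chicks
nrow(chick_flat)    # number of individual measurements
```

The round trip is what makes nesting convenient: you can work on one row per chick, then return to the long format when you need it.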
12.3.2 broom
# Load libraries
require(broom)
# Loading required package: broom
Check out the broom vignette, and the broom and dplyr vignette.
tidy: constructs a tibble that summarizes the model’s statistical findings. This includes coefficients and p-values for each term in a regression, per-cluster information in clustering applications, or per-test information for multtest functions.
glance: constructs a concise one-row summary of the model. This typically contains values such as R^2, adjusted R^2, and residual standard error that are computed once for the entire model.
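To make the difference concrete, here is a small sketch on a single model, assuming the broom package is installed. The pooled model `fit_all` is an illustration only (it ignores which chick each measurement came from) and is not part of the analysis that follows.

```r
library(broom)

# Fit one linear model of weight on time, pooling all chicks together
fit_all <- lm(weight ~ Time, data = ChickWeight)

# tidy(): one row per model term (the intercept and the Time slope)
tidy(fit_all)

# glance(): a single row of model-level summaries such as r.squared
glance(fit_all)
```

Because both functions return tibbles, their output can be stored in list-columns and combined across many models.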
ChickWeight %>%
  group_by(Chick, Diet) %>%
  nest() %>%
  mutate(
    fit = map(data, ~ lm(weight ~ Time, data = .x)),
    tidied = map(fit, tidy),
    glanced = map(fit, glance)
  ) %>%
  unnest(tidied)
# # A tibble: 100 × 10
# # Groups: Chick, Diet [50]
# Chick Diet data fit term estimate std.error statistic p.value
# <ord> <fct> <list> <list> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 1 1 <tibble> <lm> (Intercept) 24.5 6.73 3.64 4.56e- 3
# 2 1 1 <tibble> <lm> Time 7.99 0.524 15.3 2.97e- 8
# 3 2 1 <tibble> <lm> (Intercept) 24.7 4.93 5.01 5.26e- 4
# 4 2 1 <tibble> <lm> Time 8.72 0.384 22.7 6.15e-10
# 5 3 1 <tibble> <lm> (Intercept) 23.2 5.08 4.56 1.04e- 3
# 6 3 1 <tibble> <lm> Time 8.49 0.396 21.5 1.08e- 9
# 7 4 1 <tibble> <lm> (Intercept) 32.9 4.01 8.21 9.42e- 6
# 8 4 1 <tibble> <lm> Time 6.09 0.312 19.5 2.70e- 9
# 9 5 1 <tibble> <lm> (Intercept) 16.9 7.56 2.24 4.93e- 2
# 10 5 1 <tibble> <lm> Time 10.1 0.588 17.1 9.88e- 9
# # … with 90 more rows, and 1 more variable: glanced <list>
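The `glanced` list-column can be unnested in the same way to get one row of model-level statistics per chick. This is a sketch, assuming dplyr, tidyr, purrr, and broom are installed; the name `chick_fits` and the choice of columns to keep are illustrative.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Fit one linear model per chick, then keep the one-row model summaries
chick_fits <- ChickWeight %>%
  group_by(Chick, Diet) %>%
  nest() %>%
  mutate(
    fit = map(data, ~ lm(weight ~ Time, data = .x)),
    glanced = map(fit, glance)
  ) %>%
  unnest(glanced) %>%
  ungroup() %>%
  select(Chick, Diet, r.squared, adj.r.squared)

chick_fits
```

Chicks with very few measurements can produce a perfect or undefined fit, so it is worth checking the number of rows in each nested tibble before interpreting these values.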
ChickWeight %>%
  ggplot() +
  geom_line(aes(x = Time,
                y = weight,
                color = Chick)) +
  facet_wrap(~ Diet)