7 Creating a script
Now that we have a bunch of fundamental skills, we are going to integrate them from top-to-bottom in the form of a brand new script. This will help us practice what it actually feels like to explore and analyze a data set on our own.
- Each saved script must be able to be run sequentially from top to bottom without failing.
source()
function or the Source button should run without error.- The document should proceed in a rational order.
7.1 Pseudo-script form
# ------------------------------------------------------------------------------
# Title: The general form template of a script
# Date
# Author
# ------------------------------------------------------------------------------
# Load libraries
require(tidyverse)
# Load data
<- read_csv("file.csv")
df
# Analyze data
%>%
df group_by(groups) %>%
summarise(avg = mean(dep_var))
# Visualize
%>%
df ggplot() +
geom_boxplot(aes(x = ind_var, y = dep_var))
# Statistics
<- aov(dep_var ~ ind_var, data = df) mod
This is just an example of a general structure of the script. When you are actually exploring your data, you should have more exploration, analysis, and visualization in your script.
7.2 Script for CTmax data
# CT max data ------------------------------------------------------------------
# Thomas O'Leary
# Load library
library(tidyverse)
# Load data
<- read_delim("Populations_CTmax.csv", delim = ";")
df # Rows: 336 Columns: 3
# ── Column specification ────────────────────────────────────────────────────────
# Delimiter: ";"
# chr (2): Population, Treatment
# dbl (1): CTmax
#
# ℹ Use `spec()` to retrieve the full column specification for this data.
# ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Get averages and standard deviations
%>%
df group_by(Population, Treatment) %>%
summarise(ct_max = mean(CTmax),
sd = sd(CTmax))
# `summarise()` has grouped output by 'Population'. You can override using the
# `.groups` argument.
# # A tibble: 14 × 4
# # Groups: Population [7]
# Population Treatment ct_max sd
# <chr> <chr> <dbl> <dbl>
# 1 Chorges Control 37.7 0.373
# 2 Chorges Hardening 38.3 0.460
# 3 Frankfurt Control 37.9 0.515
# 4 Frankfurt Hardening 38.5 0.458
# 5 Hamburg Control 37.3 1.10
# 6 Hamburg Hardening 38.1 0.301
# 7 Hannover Control 37.6 0.948
# 8 Hannover Hardening 38.3 0.372
# 9 Mols Control 37.1 0.395
# 10 Mols Hardening 38.1 0.316
# 11 Siena Control 38.1 0.349
# 12 Siena Hardening 38.7 0.391
# 13 Skagen Control 36.8 0.517
# 14 Skagen Hardening 37.8 0.408
# Make plots ----
# Make a box plot
%>%
df ggplot() +
geom_boxplot(aes(x = Treatment,
y = CTmax)) +
theme_classic()
# Color by population
%>%
df ggplot() +
geom_boxplot(aes(x = Treatment,
y = CTmax,
color = Population)) +
theme_classic()
# Run analysis
<- aov(CTmax ~ Population*Treatment,
mod data = df)
::tidy(mod)
broom# # A tibble: 4 × 6
# term df sumsq meansq statistic p.value
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 Population 6 40.1 6.68 22.7 2.75e-22
# 2 Treatment 1 48.9 48.9 166. 6.99e-31
# 3 Population:Treatment 6 2.63 0.439 1.49 1.82e- 1
# 4 Residuals 322 94.9 0.295 NA NA