Today will be a fast tour of modern R.
R is a moving target
- Focus on “Tidyverse”: dplyr, ggplot2, tidyr, etc.
- Mostly written by Hadley Wickham
S - 1976
R - 1993
Complex experimental designs, large \(n\) and/or large \(p\).
Understanding is a cyclic process of exploration.
In biology you may want to join together many different views of a process: DNA, RNA, epigenetics, proteins, metabolome, cell morphology, …
Your data can be viewed in the context of many other data sets.
In biology, reference genome and gene annotations are the key to joining different types of data.
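As a hedged illustration of joining data sets on a shared key, here is a minimal dplyr sketch. The two tables, the gene IDs (`g1`, `g2`, ...) and the symbols are invented placeholders standing in for real measurements and a real annotation source.

```r
library(dplyr)

# Hypothetical measurements, keyed by gene ID
expression <- data.frame(
  gene_id = c("g1", "g2", "g3"),
  count   = c(120, 45, 3)
)

# Hypothetical annotation table, e.g. derived from a reference gene annotation
annotation <- data.frame(
  gene_id = c("g1", "g2", "g4"),
  symbol  = c("geneA", "geneB", "geneD")
)

# left_join() keeps every row of `expression`;
# genes with no matching annotation get NA in the `symbol` column
annotated <- left_join(expression, annotation, by = "gene_id")
print(annotated)
```

A left join is used here so that no measurements are lost when the annotation is incomplete; `inner_join()` would instead drop unannotated genes such as `g3`.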
See also: The “R for Data Science” book
Key packages: readr, tidyr, dplyr
The greatest value of a picture is when it forces us to notice what we never expected to see.
– John Tukey
Key packages: ggplot2, shiny
Key packages: dplyr
Base R functions: mean, min, max, sd, lm, glm, anova, …
Specialized packages: too many to name
Note: Model fitting and hypothesis testing won’t be covered today.
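A minimal sketch of how dplyr and base R summary functions combine: `group_by()` splits the rows by a grouping column and `summarise()` applies `mean`, `sd`, `min` and `max` within each group. The `measurements` table and its values are hypothetical.

```r
library(dplyr)

# Hypothetical measurements: two treatment groups, three replicates each
measurements <- data.frame(
  group = rep(c("control", "treated"), each = 3),
  value = c(1, 2, 3, 4, 6, 8)
)

# One output row per group, one column per summary statistic
summary_table <- measurements %>%
  group_by(group) %>%
  summarise(
    n          = n(),
    mean_value = mean(value),
    sd_value   = sd(value),
    min_value  = min(value),
    max_value  = max(value)
  )
print(summary_table)
```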
Tidy data doesn’t mean tidy for a person to read; it means the form that is easiest for the computer to work with.
Similar to database design.
The experimental design is in the body of the table alongside the data, not in row names or column names.
Example from: Wickham, H. (2014) Tidy data. Journal of Statistical Software, 59(10).
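A small made-up example (not the one from the paper) of moving part of the experimental design out of the column names and into the body of the table. It assumes tidyr 1.0.0 or later, where `pivot_longer()` is available; older code uses `gather()` for the same job. The gene and sample names are hypothetical.

```r
library(tidyr)

# Hypothetical "wide" table: one row per gene, one column per sample.
# The sample, part of the experimental design, is hidden in the column names.
wide <- data.frame(
  gene     = c("g1", "g2"),
  sample_A = c(10, 20),
  sample_B = c(30, 40)
)

# pivot_longer() moves the sample names into a `sample` column,
# giving one row per gene-sample observation
tidy_counts <- pivot_longer(
  wide,
  cols      = c(sample_A, sample_B),
  names_to  = "sample",
  values_to = "count"
)
print(tidy_counts)
```

In the long form, a question such as “total count per sample” becomes a simple `group_by(sample)` rather than a loop over column names.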
If every step of your analysis is recorded in an R script, with no manual steps:
Open, reproducible data science.
Show your code. It may surprise you who picks it up and runs with it.
Document analysis in Rmarkdown
devtools::check
sessionInfo()
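One possible habit, sketched here as an assumption rather than a prescribed workflow: end each analysis script by saving the output of `sessionInfo()`, so the R version and package versions used for a run are recorded alongside the results. The file name `session_info.txt` is a hypothetical choice.

```r
# Capture the printed sessionInfo() report (R version, platform,
# attached packages and their versions) as a character vector
info_lines <- capture.output(sessionInfo())

# Save it next to the analysis outputs; file name is hypothetical
writeLines(info_lines, "session_info.txt")
```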
Rmarkdown documents
Programming
Tidying data and visualizing it
Sharing data interactively
Working with DNA sequences and genomic feature data