This lesson covers packages primarily by Hadley Wickham for tidying data and then working with it in tidy form, collectively known as the “tidyverse”.
The packages we are using in this lesson are all from CRAN, so we can install them with install.packages(). Don’t run this if you are using our biotraining server, where the packages are already installed!
# install.packages(c(
# "tidyverse",
# "viridis",
# "broom"
# ))
library(tidyverse) # Load all "tidyverse" libraries.
# OR
# library(readr) # Read tabular data.
# library(tidyr) # Data frame tidying functions.
# library(dplyr) # General data frame manipulation.
# library(ggplot2) # Flexible plotting.
library(viridis) # Viridis color scale.
These packages usually have useful documentation in the form of “vignettes”. These are readable on the CRAN website, or within R:
vignette()
vignette(package="dplyr")
vignette("dplyr", package="dplyr")
Let’s continue our examination of the FastQC output. If you’re starting fresh for this lesson, you can load the necessary data frame with:
bigtab <- read_csv("r-more-files/fastqc.csv")
## Parsed with column specification:
## cols(
## grade = col_character(),
## test = col_character(),
## file = col_character()
## )
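The message above lists the column types that read_csv guessed. As a minimal sketch, the same types can be stated explicitly through the col_types argument, which suppresses the message. The temporary file and its two rows are invented stand-ins for fastqc.csv, not real FastQC results.

```r
library(readr)

# Hypothetical two-row file with the same columns as fastqc.csv (values invented).
tmp <- tempfile(fileext = ".csv")
writeLines(c(
    "grade,test,file",
    "PASS,Basic Statistics,A.fastq",
    "FAIL,Per base sequence quality,A.fastq"
), tmp)

# Declare each column type up front instead of letting read_csv guess.
toy <- read_csv(tmp, col_types = cols(
    grade = col_character(),
    test = col_character(),
    file = col_character()
))
toy
```

Declaring the types is optional here, since the guesses were already correct, but it documents what the rest of the code expects.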
We saw ggplot2 in the introductory R day. Recall that we could assign columns of a data frame to aesthetics (x and y position, color, etc.) and then add “geom”s to draw the data.
With ggplot2 we can easily view the whole data set.
ggplot(bigtab, aes(x=file, y=test, color=grade)) +
    geom_point()
With categorical data on the x and y axes, a better geom to use is geom_tile.
ggplot(bigtab, aes(x=file, y=test, fill=grade)) +
    geom_tile()
ggplot2 offers a very wide variety of ways to adjust a plot. For categorical aesthetics, usually the first step is ensuring the relevant column is a factor with a meaningful level order.
y_order <- sort(unique(bigtab$test), decreasing=TRUE) # y axis plots from bottom to top, so reverse
bigtab$test <- factor(bigtab$test, levels=y_order)
x_order <- unique(bigtab$file)
bigtab$file <- factor(bigtab$file, levels=x_order)
# Only necessary if not continuing from previous lesson on programming!
color_order <- c("FAIL", "WARN", "PASS")
bigtab$grade <- factor(bigtab$grade, levels=color_order)
myplot <- ggplot(bigtab, aes(x=file, y=test, fill=grade)) +
    # Black border on tiles. linewidth needs ggplot2 >= 3.4.0; use size=0.5 on older versions.
    geom_tile(color="black", linewidth=0.5) +
    scale_fill_manual(                  # Colors, as color hex codes, in level order: FAIL, WARN, PASS
        values=c("#ee0000","#ffee00","#00aa00")) +
    labs(x="", y="", fill="") +         # Remove axis labels
    coord_fixed() +                     # Square tiles
    theme_minimal() +                   # Minimal theme, no grey background
    theme(panel.grid=element_blank(),   # No underlying grid lines
          axis.text.x=element_text(     # Vertical text on x axis
              angle=90, vjust=0.5, hjust=0))
myplot
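The level order set for grade is what ties each grade to its color: unnamed values given to scale_fill_manual are matched to the factor levels in order. As a small sketch of how the levels argument controls that order, here is a toy vector (the values are invented for illustration) with and without explicit levels.

```r
# Invented grades, just to show how levels control ordering.
grades <- c("PASS", "FAIL", "WARN", "PASS")

# Without explicit levels, factor() sorts the levels alphabetically.
default_levels <- levels(factor(grades))
default_levels
# "FAIL" "PASS" "WARN"

# With explicit levels, the order is exactly the one we give.
grade_factor <- factor(grades, levels = c("FAIL", "WARN", "PASS"))
levels(grade_factor)
# "FAIL" "WARN" "PASS"

# Each value is stored as the position of its level.
as.integer(grade_factor)
# 3 1 2 3
```

With the default alphabetical order, the second color (yellow) would have been assigned to PASS rather than WARN, which is why the factor is set up before plotting.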