This lesson covers packages, primarily by Hadley Wickham, for tidying data and then working with it in tidy form. Collectively these packages are known as the “tidyverse”.

The packages we are using in this lesson are all on CRAN, so we can install them with install.packages. Don’t run this if you are using our biotraining server: the packages are already installed there!

# install.packages(c(
#     "tidyverse",
#     "viridis",
#     "broom"
# ))
library(tidyverse) # Load the core "tidyverse" packages.
# OR
# library(readr)   # Read tabular data.
# library(tidyr)   # Data frame tidying functions.
# library(dplyr)   # General data frame manipulation.
# library(ggplot2) # Flexible plotting.

library(viridis)   # Viridis color scale.

These packages usually have useful documentation in the form of “vignettes”. These are readable on the CRAN website, or within R:

vignette()
vignette(package="dplyr")
vignette("dplyr", package="dplyr")

Let’s continue our examination of the FastQC output. If you’re starting fresh for this lesson, you can load the necessary data frame with:

bigtab <- read_csv("r-more-files/fastqc.csv")
## Parsed with column specification:
## cols(
##   grade = col_character(),
##   test = col_character(),
##   file = col_character()
## )

ggplot2 revisited

We saw ggplot2 in the introductory R day. Recall that we could assign columns of a data frame to aesthetics (x and y position, color, etc.) and then add “geom”s to draw the data.

With ggplot2 we can easily view the whole data set.

ggplot(bigtab, aes(x=file,y=test,color=grade)) + 
    geom_point()

With categorical data on the x and y axes, a better geom to use is geom_tile.

ggplot(bigtab, aes(x=file,y=test,fill=grade)) + 
    geom_tile()

Publication quality images

ggplot2 offers a very wide variety of ways to adjust a plot. For categorical aesthetics, usually the first step is ensuring the relevant column is a factor with a meaningful level order.
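
As a minimal sketch of why the level order matters, the following uses a small made-up vector (`grade_toy` and the other names here are hypothetical, not part of the lesson data). By default `factor` sorts the levels alphabetically; passing `levels` sets the order explicitly, and that order is what ggplot2 follows for discrete axes and legends.

```r
# Made-up grades for illustration only; not the lesson data.
grade_toy <- c("PASS", "FAIL", "WARN", "PASS")

# Default: levels are sorted alphabetically.
default_levels <- levels(factor(grade_toy))
default_levels
## [1] "FAIL" "PASS" "WARN"

# Explicit: levels appear in the order we supply.
grade_factor <- factor(grade_toy, levels=c("FAIL", "WARN", "PASS"))
chosen_levels <- levels(grade_factor)
chosen_levels
## [1] "FAIL" "WARN" "PASS"

# The underlying integer codes follow the chosen level order.
toy_codes <- as.integer(grade_factor)
toy_codes
## [1] 3 1 2 3
```

The same idea is applied to the columns of bigtab in the code that follows.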

y_order <- sort(unique(bigtab$test), decreasing=TRUE)  # y axis plots from bottom to top, so reverse the order
bigtab$test <- factor(bigtab$test, levels=y_order)

x_order <- unique(bigtab$file)
bigtab$file <- factor(bigtab$file, levels=x_order)

# Only necessary if not continuing from previous lesson on programming!
color_order <- c("FAIL", "WARN", "PASS")
bigtab$grade <- factor(bigtab$grade, levels=color_order)

myplot <- ggplot(bigtab, aes(x=file, y=test, fill=grade)) + 
    geom_tile(color="black", linewidth=0.5) +      # Black border on tiles (ggplot2 >= 3.4.0; older versions use size=)
    scale_fill_manual(                             # Colors, as color hex codes
        values=c("#ee0000","#ffee00","#00aa00")) +
    labs(x="", y="", fill="") +                    # Remove axis labels
    coord_fixed() +                                # Square tiles
    theme_minimal() +                              # Minimal theme, no grey background
    theme(panel.grid=element_blank(),              # No underlying grid lines
          axis.text.x=element_text(                # Vertical text on x axis
              angle=90,vjust=0.5,hjust=0))              
myplot
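
The colors passed to scale_fill_manual above are matched to the levels of grade by position, so they depend on the order in color_order. A hedged alternative, sketched here on a made-up data frame (`toy`, `grade_colors` and `toy_plot` are hypothetical names), is to name each color by its level. ggplot2 then matches colors to levels by name.

```r
library(ggplot2)

# Made-up data for illustration only; not the FastQC results.
toy <- data.frame(
    file=c("a", "b", "c"),
    test=c("t1", "t1", "t1"),
    grade=factor(c("PASS", "FAIL", "WARN"), levels=c("FAIL", "WARN", "PASS")))

# Colors named by grade level, so the mapping does not rely on level order.
grade_colors <- c(PASS="#00aa00", FAIL="#ee0000", WARN="#ffee00")

toy_plot <- ggplot(toy, aes(x=file, y=test, fill=grade)) +
    geom_tile(color="black") +
    scale_fill_manual(values=grade_colors)
toy_plot
```

Because the vector is named, reordering grade_colors or the factor levels leaves each grade with the same color.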