Automation with makefiles

Introduction
rmarkdown package
- rmarkdown from the shell
Makefiles
Python
Discussion

Introduction

This section is aiming for a happy medium of flexibility for producing training material. The approach is inspired by how Software Carpentry structure their git repositories, however by using the rmarkdown package we can keep things simple. If you want to go simpler still, rmarkdown also has a render_site function which can build a whole website at once, which might be sufficient to your needs.

We will first look at the rmarkdown package, then how to use this package from a Makefile to automate building of .html files from .Rmd files.

rmarkdown package

Under the hood, when you press the “Knit” button in RStudio the rmarkdown package’s render function is used. We can also use this package from the R console.

rmarkdown::render("foo.Rmd")

The rmarkdown package in turn uses the knitr package to actually run R code, and external program pandoc to create HTML. When creating a PDF, pandoc in turn uses pdflatex.

rmarkdown from the shell

We can run this from a shell using:

Rscript -e 'rmarkdown::render("foo.Rmd")'

Makefiles

A programmer’s natural instinct will be to place this in a “Makefile” so that the make command can make the HTML document whenever we need to.

Rules

Create a file called Makefile (in RStudio, select “File/New File/Text File”, and save it with the name Makefile). Enter the following in the file:

foo.html : foo.Rmd
    Rscript -e 'rmarkdown::render("foo.Rmd")'

Important: The indentation in a Makefile is important, and it must be a tab character. You can check this from R with the command readLines("Makefile") – you should see the indent shown as a \t escape sequence.

We have described how to create the file foo.html from the input file foo.Rmd. Multiple input files can be given if necessary, separated by spaces.

Now in a shell, type:

make foo.html

Nothing happens, as the .html file was created after out last edit of the .Rmd. Make thinks the file is up to date, and does not run the command.

Edit the .Rmd file, and again run make foo.html in the shell. This time the command should run.

If you just type make on its own, the first rule in the Makefile is used.

Exercise

Add some code to foo.Rmd that deliberately contains an error. What happens when you try to make it?

“Pattern” rules

Create a second R Markdown file, bar.Rmd. We could add a second rule to the Makefile to convert this into HTML, but this becomes repetitive. A better option is the use a “pattern rule”:

%.html : %.Rmd
    Rscript -e 'rmarkdown::render("$<")'

Note here:

% symbol in the input and output filenames indicates this is a pattern rule.
In the command we’ve use the special variable $< to insert the input filename into the command. More on variables later.

Now to render our second file we can type:

make bar.html

A “phony” rule to make all files

Consider this Makefile:

all : foo.html bar.html
    echo All files are now up to date

%.html : %.Rmd
    Rscript -e 'rmarkdown::render("$<")'

The all rule depends on some files, so make will ensure these are up to date.
The command echo in the rule just prints out a message (this can even be left out entirely). make does not check that a file called all was actually created.
The all rule is the first rule in the Makefile, so it is the default rule that is run when you type make.

Try editing one of your Rmarkdown files and running make again. Notice only the command to build that file’s corresponding HTML file is run.

Cleaning up

It’s common practice to also have a rule to clean away all the files that have been created. This gives a way to test everything works from a blank slate.

all : foo.html bar.html
    echo All files are now up to date

clean : 
    rm -f foo.html bar.html 

%.html : %.Rmd
    Rscript -e 'rmarkdown::render("$<")'

Now we can type make clean to clean up, and make clean ; make will test that everything can be built without errors.

Variables

A final element of a Makefile is use of variables. With this, you should be able to understand most of what is going on in Makefiles you encounter in the wild.

In our last Makefile, we typed foo.html bar.html twice. It would be nice to avoid this repetition. This can be achieved with a variable:

HTML_FILES=foo.html bar.html

all : $(HTML_FILES)
    echo All files are now up to date

clean : 
    rm -f $(HTML_FILES) 

%.html : %.Rmd
    Rscript -e 'rmarkdown::render("$<")'

Exercise

Create a file index.Rmd, with hyperlinks to foo.html and bar.html. Recall that hyperlinks can be created in markdown with [link text](destination.html).

Adjust your Makefile so this file is also built into an HTML file.

Now you have a website. To test your website, from the “Files” pane in RStudio, click on index.html and then “View in Web Browser”.

You could make this public for example by using GitHub Pages, or by running your own webserver (such as Apache or nginx).

Python

The discussion above has focussed on R Markdown. In Python, a similar role is played by Jupyter notebooks. These are created in a web-based editor, which can be started by typing jupyter notebook in the shell. It is also possible to run a multi-user Jupyter Hub on a server. Notebooks are stored in JSON format, and have extension .ipynb.

Suppose you have created a notebook called my_notebook.ipynb. Then the following shell command can be used to run the code in it and convert the notebook to HTML:

jupyter nbconvert --execute --to html my_notebook.ipynb

One difference between .Rmd and .ipynb files is that .ipynb files also contain cached output of code chunks. Specifying the --execute option here ensures all of the code in the notebook is run afresh. Using this command in a Makefile would ensure the build will halt with an error if there are any problems with the code.

Discussion

make clean ; make can be used to test all of the code in our tutorial. We will see an error message and the process will abort immediately if anything goes wrong as the HTML files are built. This is also a handy way to test that a training server has all the necessary R or Python packages installed!

You might want to add some further rules, for example:

Creating a ZIP file of files used in the tutorial for easy download.
Creating a PDF verions of the notes, by creating markdown files using either rmarkdown or knitr and then using pandoc to create a PDF using LaTeX.
Running scripts to save results of lengthy calculations, so that rendering Rmd files remains fast.

If someone wants to adapt our tutorial, they can easily do so. Having made the changes they want, the HTML files can be re-built with a single command. Give your notes an appropriate licence, such as a Creative Commons license, so that people know they can do this without having to ask your permission.

Setting up tutorial notes in this way gives us desirable features of open science. We have left nothing out or the code will not run. They are easily reproducable and sharable. They are written plain text formats that anyone can edit, and built with tools that are all freely available. They can serve as a base that anyone who wants to can build on.

Home