For loops

We are not covering much about the programming side of R today. However, for loops are useful even for interactive work.

If you intend to take your knowledge of R further, you should also investigate writing your own functions, and if statements.

For loops are the way we tell a computer to perform a repetitive task. Under the hood, many of the functions we have been using today use for loops.

If we can’t find a ready-made function to do what we want, we may need to write our own for loop.

Preliminary: blocks of code

Suppose we want to print each word in a sentence, and for some reason we want to do this all at once. One way is to use six calls to print:

sentence <- c("Let", "the", "computer", "do", "the", "work")

{
  print(sentence[1])
  print(sentence[2])
  print(sentence[3])
  print(sentence[4])
  print(sentence[5])
  print(sentence[6])
}
## [1] "Let"
## [1] "the"
## [1] "computer"
## [1] "do"
## [1] "the"
## [1] "work"

R treats the code between the { and the } as a single “block”. It reads it in as a single unit, and then executes each line in turn with no further interaction.

For loops

What we did above was quite repetitive. It’s always better when the computer does repetitive work for us.

Here’s a better approach, using a for loop:

for(word in sentence) {
    print(word)
}
## [1] "Let"
## [1] "the"
## [1] "computer"
## [1] "do"
## [1] "the"
## [1] "work"

The general form of a loop is:

for(variable in vector) {
  do things with variable
}

We can name the loop variable anything we like (with a few restrictions, e.g. the name of the variable cannot start with a digit). in is part of the for syntax. Note that the body of the loop is enclosed in curly braces { }. For a single-line loop body, as here, the braces aren’t needed, but it is good practice to include them as we did.
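
As a small illustration of the point about braces, the sketch below writes the same loop both ways. It reuses the sentence vector from above; nothing else is assumed.

```r
sentence <- c("Let", "the", "computer", "do", "the", "work")

# With braces (recommended): the loop body is an explicit block.
for (word in sentence) {
  print(word)
}

# Without braces: allowed only because the body is a single expression.
for (word in sentence) print(word)
```

Both loops print the same six words. The braced form is easier to extend later, since adding a second statement to the unbraced form would silently leave that statement outside the loop.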

Accumulating a result

Here’s another loop that repeatedly updates a variable:

len <- 0
vowels <- c("a", "e", "i", "o", "u")
for(v in vowels) {
  len <- len + 1
}
# Number of vowels
len
## [1] 5

It’s worth tracing the execution of this little program step by step. Since there are five elements in the vector vowels, the statement inside the loop will be executed five times. The first time around, len is zero (the value assigned to it on line 1) and v is "a". The statement adds 1 to the old value of len, producing 1, and updates len to refer to that new value. The next time around, v is "e" and len is 1, so len is updated to be 2. After three more updates, len is 5; since there is nothing left in the vector vowels for R to process, the loop finishes.

By inserting calls to print or cat in the code, we can see that this is exactly what has happened:

len <- 0
vowels <- c("a", "e", "i", "o", "u")
for(v in vowels) {
  len <- len + 1
  cat("v is", v ,"and len is now", len, "\n")
}
## v is a and len is now 1 
## v is e and len is now 2 
## v is i and len is now 3 
## v is o and len is now 4 
## v is u and len is now 5

Note that a loop variable is just a variable that’s being used to record progress in a loop. It still exists after the loop is over, and we can re-use variables previously defined as loop variables as well:

letter <- "z"
for(letter in c("a", "b", "c")) {
  print(letter)
}
## [1] "a"
## [1] "b"
## [1] "c"
# after the loop, letter is
letter
## [1] "c"

Challenge - Using loops

  1. Recall that we can use : to create a sequence of numbers.
1:5
## [1] 1 2 3 4 5

Suppose the variable n has been set with some value, and we want to print out the numbers up to that value, one per line.

n <- 7

Write a for loop to achieve this.

  2. Suppose we have a vector called vec and we want to find the total of all the numbers in vec.
vec <- c(7, 30, 300, 1000)

Write a for loop to calculate this total.

(R has a built-in function called sum that does this for you. Please don’t use it for this exercise.)

  3. Multiplication.

Suppose variables a and b have been set to whole numbers:

a <- 6
b <- 7

Use a for loop to calculate a times b. Do not use *.

Hint: In challenge 1 you found a way to do something n times!

Try your loop with various different values in a and b.

Loading a set of files

Let’s look at a more practical example of a for loop, following the pattern of accumulating a result that we’ve just seen. We have been given some demographic data from the Gapminder project, but unfortunately it is split into one file per year, named intro-r/gapminder-NNNN.csv, where NNNN is the year. We would like to load all of these CSV files into a single data frame.

read.csv can only read one file at a time, so we will need to call read.csv many times.

I will be using a couple of useful functions we haven’t seen before, seq and paste0. As usual, you can look these up in the help system with ?seq and ?paste0.

years <- seq(1952, 2007, 5)
years
##  [1] 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 2002 2007
# We could also have written
# years <- c(1952, 1957, <etc> )

We will make filenames with paste0, which pastes several values together into a single character string.

paste0("intro-r/gapminder-", 1952, ".csv")
## [1] "intro-r/gapminder-1952.csv"
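
It may help to know that paste0 also accepts a vector, in which case it builds one string per element. The sketch below is only an aside: the variable name filenames is introduced here for illustration and is not used in the rest of this section.

```r
years <- seq(1952, 2007, 5)

# paste0 is vectorised: one filename is produced for each year.
filenames <- paste0("intro-r/gapminder-", years, ".csv")

length(filenames)   # one filename per year
filenames[1]        # the first filename, for 1952
```

Inside a loop we only need one filename at a time, so the loop in this section calls paste0 with a single year on each pass.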

We will loop over all of the years, and build up a data frame. We start with NULL, which is a special value in R meaning nothing at all. We add to this with rbind, which concatenates the rows of data frames.

gap <- NULL
for(year in years) {
    filename <- paste0("intro-r/gapminder-", year, ".csv")
    gap_year <- read.csv(filename)
    gap <- rbind(gap, gap_year)
}
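
If the CSV files are not to hand, the same accumulate-with-rbind pattern can be tried on small data frames built in memory. This is a minimal sketch: pieces, piece and combined are hypothetical names, and the values are made up rather than taken from Gapminder.

```r
# Toy stand-ins for the per-year files (made-up values, not Gapminder data).
pieces <- list(
  data.frame(year = 1952, value = c(1, 2)),
  data.frame(year = 1957, value = c(3, 4))
)

combined <- NULL                       # start with nothing at all
for (piece in pieces) {
  combined <- rbind(combined, piece)   # append this piece's rows
}

nrow(combined)
```

On the first pass rbind ignores the NULL and simply returns the first data frame, which is why NULL works as a starting value.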

Again, print or cat can be used to check everything is working correctly.

gap <- NULL
for(year in years) {
    filename <- paste0("intro-r/gapminder-", year, ".csv")

    cat("Loading", filename, "\n")

    gap_year <- read.csv(filename)

    cat("Read", nrow(gap_year), "rows\n")

    gap <- rbind(gap, gap_year)

    cat("Now have", nrow(gap), "rows in gap\n")
}
## Loading intro-r/gapminder-1952.csv 
## Read 142 rows
## Now have 142 rows in gap
## Loading intro-r/gapminder-1957.csv 
## Read 142 rows
## Now have 284 rows in gap
## Loading intro-r/gapminder-1962.csv 
## Read 142 rows
## Now have 426 rows in gap
## Loading intro-r/gapminder-1967.csv 
## Read 142 rows
## Now have 568 rows in gap
## Loading intro-r/gapminder-1972.csv 
## Read 142 rows
## Now have 710 rows in gap
## Loading intro-r/gapminder-1977.csv 
## Read 142 rows
## Now have 852 rows in gap
## Loading intro-r/gapminder-1982.csv 
## Read 142 rows
## Now have 994 rows in gap
## Loading intro-r/gapminder-1987.csv 
## Read 142 rows
## Now have 1136 rows in gap
## Loading intro-r/gapminder-1992.csv 
## Read 142 rows
## Now have 1278 rows in gap
## Loading intro-r/gapminder-1997.csv 
## Read 142 rows
## Now have 1420 rows in gap
## Loading intro-r/gapminder-2002.csv 
## Read 142 rows
## Now have 1562 rows in gap
## Loading intro-r/gapminder-2007.csv 
## Read 142 rows
## Now have 1704 rows in gap
nrow(gap)
## [1] 1704
head(gap)
##       country continent year lifeExp      pop  gdpPercap
## 1 Afghanistan      Asia 1952  28.801  8425333   779.4453
## 2     Albania    Europe 1952  55.230  1282697  1601.0561
## 3     Algeria    Africa 1952  43.077  9279525  2449.0082
## 4      Angola    Africa 1952  30.015  4232095  3520.6103
## 5   Argentina  Americas 1952  62.485 17876956  5911.3151
## 6   Australia   Oceania 1952  69.120  8691212 10039.5956

When to use for loops

Many of the functions and operators we have been using are implemented using for loops, so often in R we are able to use these rather than directly writing a for loop. However, when we need to do something complicated, for loops are there for us. Some real-world reasons you might use a for loop:

  • To create a collection of similar plots.

  • To load and process a collection of files, all in the same way.

  • To run a program outside of R, such as a read aligner, with each of a collection of files as input. Programs outside of R can be run using system.

  • To perform resampling such as a permutation test or a bootstrap, to assure yourself that some result you have calculated is not simply due to chance.
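
To make the last item concrete, here is a rough sketch of a permutation test written as a for loop. The data are made up for illustration, and the names group, value, n_perm and perm_diff are assumptions of this example, not part of the workshop data. It uses the base R functions sample, rep, mean and numeric.

```r
set.seed(1)  # make the random shuffles reproducible

# Made-up measurements for two groups (illustrative only).
group <- rep(c("a", "b"), each = 5)
value <- c(5.1, 4.8, 5.6, 5.0, 5.3, 6.2, 5.9, 6.4, 6.0, 6.3)

# Observed difference between the group means.
observed <- mean(value[group == "b"]) - mean(value[group == "a"])

n_perm <- 1000
perm_diff <- numeric(n_perm)   # pre-allocate space for the results
for (i in 1:n_perm) {
  shuffled <- sample(group)    # randomly reassign the group labels
  perm_diff[i] <- mean(value[shuffled == "b"]) - mean(value[shuffled == "a"])
}

# Proportion of shuffles with a difference at least as extreme as observed.
p_value <- mean(abs(perm_diff) >= abs(observed))
p_value
```

Here the loop fills in a vector of known length rather than growing a result with rbind, which is a common variation on the accumulating pattern seen earlier.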
