For loops
We are not covering much about the programming side of R today. However for
loops are useful even for interactive work.
If you intend to take your knowledge of R further, you should also investigate writing your own function
s, and if
statements.
For loops are the way we tell a computer to perform a repetitive task. Under the hood, many of the functions we have been using today use for loops.
If we can’t find a ready made function to do what we want, we may need to write our own for loop.
Preliminary: blocks of code
Suppose we want to print each word in a sentence, and for some reason we want to do this all at once. One way is to use six calls to print
:
sentence <- c("Let", "the", "computer", "do", "the", "work")
{
print(sentence[1])
print(sentence[2])
print(sentence[3])
print(sentence[4])
print(sentence[5])
print(sentence[6])
}
## [1] "Let"
## [1] "the"
## [1] "computer"
## [1] "do"
## [1] "the"
## [1] "work"
R treats the code between the {
and the }
as a single “block”. It reads it in as a single unit, and then executes each line in turn with no further interaction.
For loops
What we did above was quite repetitive. It’s always better when the computer does repetitive work for us.
Here’s a better approach, using a for loop:
for(word in sentence) {
print(word)
}
## [1] "Let"
## [1] "the"
## [1] "computer"
## [1] "do"
## [1] "the"
## [1] "work"
The general form of a loop is:
for(variable in vector) {
do things with variable
}
We can name the loop variable anything we like (with a few restrictions, e.g. the name of the variable cannot start with a digit). in
is part of the for
syntax. Note that the body of the loop is enclosed in curly braces { }
. For a single-line loop body, as here, the braces aren’t needed, but it is good practice to include them as we did.
Accumulating a result
Here’s another loop that repeatedly updates a variable:
len <- 0
vowels <- c("a", "e", "i", "o", "u")
for(v in vowels) {
len <- len + 1
}
# Number of vowels
len
## [1] 5
It’s worth tracing the execution of this little program step by step. Since there are five elements in the vector vowels
, the statement inside the loop will be executed five times. The first time around, len
is zero (the value assigned to it on line 1) and v
is "a"
. The statement adds 1 to the old value of len
, producing 1, and updates len
to refer to that new value. The next time around, v
is "e"
and len
is 1, so len
is updated to be 2. After three more updates, len
is 5; since there is nothing left in the vector vowels
for R to process, the loop finishes.
By inserting calls to print
or cat
in the code, we can see that this is exactly what has happened:
len <- 0
vowels <- c("a", "e", "i", "o", "u")
for(v in vowels) {
len <- len + 1
cat("v is", v ,"and len is now", len, "\n")
}
## v is a and len is now 1
## v is e and len is now 2
## v is i and len is now 3
## v is o and len is now 4
## v is u and len is now 5
Note that a loop variable is just a variable that’s being used to record progress in a loop. It still exists after the loop is over, and we can re-use variables previously defined as loop variables as well:
letter <- "z"
for(letter in c("a", "b", "c")) {
print(letter)
}
## [1] "a"
## [1] "b"
## [1] "c"
# after the loop, letter is
letter
## [1] "c"
Challenge - Using loops
- Recall that we can use
:
to create a sequence of numbers.
1:5
## [1] 1 2 3 4 5
Suppose the variable n
has been set with some value, and we want to print out the numbers up to that value, one per line.
n <- 7
Write a for loop to achieve this.
- Suppose we have a vector called
vec
and we want to find the total of all the numbers invec
.
vec <- c(7, 30, 300, 1000)
Write a for loop to calculate this total.
(R has a built-in function called sum
that does this for you. Please don’t use it for this exercise.)
- Multiplication.
Suppose variables a
and b
have been set to whole numbers:
a <- 6
b <- 7
Use a for loop to calculate a
times b
. Do not use *
.
Hint: In challenge 1 you found a way to do something n times!
Try your loop with various different values in a
and b
.
Loading a set of files
Let’s look at a more practical example of a for loop, following the pattern of accumulating a result that we’ve just seen. We have been given some demographic data from the Gapminder project, but unfortunately it is split into individual years intro-r/gapminder-NNNN.csv
. We would like to load all of these CSV files into a single data frame.
read.csv
can only read one file at a time, so we will need to call read.csv
many times.
I will be using a couple of useful functions we haven’t seen before, seq
and paste0
. As usual, you can look these up in the help system with ?seq
and ?paste0
.
years <- seq(1952, 2007, 5)
years
## [1] 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 2002 2007
# We could also have written
# years <- c(1952, 1957, <etc> )
We will make filenames with paste0
which pastes several values together as a character string.
paste0("intro-r/gapminder-", 1952, ".csv")
## [1] "intro-r/gapminder-1952.csv"
We will loop over all of the years, and build up a data frame. We start with NULL
, which is a special value in R meaning nothing at all. We add to this with rbind
, which concatenates the rows of data frames.
gap <- NULL
for(year in years) {
filename <- paste0("intro-r/gapminder-", year, ".csv")
gap_year <- read.csv(filename)
gap <- rbind(gap, gap_year)
}
Again, print
or cat
can be used to check everything is working correctly.
gap <- NULL
for(year in years) {
filename <- paste0("intro-r/gapminder-", year, ".csv")
cat("Loading", filename, "\n")
gap_year <- read.csv(filename)
cat("Read", nrow(gap_year), "rows\n")
gap <- rbind(gap, gap_year)
cat("Now have", nrow(gap), "rows in gap\n")
}
## Loading intro-r/gapminder-1952.csv
## Read 142 rows
## Now have 142 rows in gap
## Loading intro-r/gapminder-1957.csv
## Read 142 rows
## Now have 284 rows in gap
## Loading intro-r/gapminder-1962.csv
## Read 142 rows
## Now have 426 rows in gap
## Loading intro-r/gapminder-1967.csv
## Read 142 rows
## Now have 568 rows in gap
## Loading intro-r/gapminder-1972.csv
## Read 142 rows
## Now have 710 rows in gap
## Loading intro-r/gapminder-1977.csv
## Read 142 rows
## Now have 852 rows in gap
## Loading intro-r/gapminder-1982.csv
## Read 142 rows
## Now have 994 rows in gap
## Loading intro-r/gapminder-1987.csv
## Read 142 rows
## Now have 1136 rows in gap
## Loading intro-r/gapminder-1992.csv
## Read 142 rows
## Now have 1278 rows in gap
## Loading intro-r/gapminder-1997.csv
## Read 142 rows
## Now have 1420 rows in gap
## Loading intro-r/gapminder-2002.csv
## Read 142 rows
## Now have 1562 rows in gap
## Loading intro-r/gapminder-2007.csv
## Read 142 rows
## Now have 1704 rows in gap
nrow(gap)
## [1] 1704
head(gap)
## country continent year lifeExp pop gdpPercap
## 1 Afghanistan Asia 1952 28.801 8425333 779.4453
## 2 Albania Europe 1952 55.230 1282697 1601.0561
## 3 Algeria Africa 1952 43.077 9279525 2449.0082
## 4 Angola Africa 1952 30.015 4232095 3520.6103
## 5 Argentina Americas 1952 62.485 17876956 5911.3151
## 6 Australia Oceania 1952 69.120 8691212 10039.5956
When to use for loops
Many of the functions and operators we have been using are implemented using for loops, so often in R we are able to use these rather than directly writing a for loop. However when we need to do something complicated, for loops are there for us. Some real world reasons you might use a for loop:
To create a collection of similar plots.
To load and process a collection of files, all in the same way.
To run a program outside of R, such as a read aligner, with each of a collection of files as input. Programs outside of R can be run using
system
.To perform resampling such as a permutation test or a bootstrap, to assure yourself that some result you have calculated is not simply due to chance.