1 Starting out in R
R is both a programming language and an interactive environment for data exploration and statistics. Today we will be concentrating on R as an interactive environment.
Working with R is primarily text-based. The basic mode of use for R is that the user types in a command in the R language and presses enter, and then R computes and displays the result.
We will be working in RStudio. The easiest way to get started is to go to RStudio Cloud and create a new project. Monash staff and students can log in using their Monash Google account.
The main way of working with R is the console, where you enter commands and view results. RStudio surrounds this with various conveniences. In addition to the console panel, RStudio provides panels containing:
- A text editor, where R commands can be recorded for future reference.
- A history of commands that have been typed on the console.
- An “environment” pane with a list of variables, which contain values that R has been told to save from previous commands.
- A file manager.
- Help on the functions available in R.
- A panel to show plots.
Open RStudio, click on the “Console” pane, type 1+1 and press enter. R displays the result of the calculation. In this document, we will show such an interaction with R as below.
## [1] 2
+ is called an operator. R has the operators you would expect for basic mathematics: + - * / ^. It also has operators that do more obscure things.
* has higher precedence than +. We can use brackets ( ) if necessary. Try 1+2*3 and (1+2)*3.
Spaces can be used to make code easier to read.
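As a minimal sketch of the points above, the following lines can be typed into the console one at a time. The particular expressions beyond 1+1, 1+2*3 and (1+2)*3 are our own illustrations, and the comments give the value R should print.

```r
# Basic arithmetic at the console; R prints each result as "[1] <value>"
1 + 1          # 2
1 + 2 * 3      # 7, because * is evaluated before +
(1 + 2) * 3    # 9, brackets force the addition to happen first
2 ^ 3          # 8, ^ raises to a power

# Spaces are optional: these two lines mean exactly the same thing
1+2*3
1 + 2 * 3
```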
We can compare with == < > <= >=. This produces a logical value, TRUE or FALSE. Note the double equals, ==, for equality comparison.
## [1] TRUE
There are also character strings such as "string". A character string must be surrounded by either single or double quotes.
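The comparison that produced the TRUE shown above is not given here, so the following is only a sketch with comparisons of our own choosing. It also shows that single and double quotes produce the same character string.

```r
# Comparisons return a logical value, TRUE or FALSE
2 * 2 == 4     # TRUE: arithmetic is done first, then the comparison
1 + 1 < 1      # FALSE
3 >= 3         # TRUE

# Single and double quotes both create character strings
"string" == 'string'   # TRUE
```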
1.1 Variables
A variable is a name for a value. We can create a new variable by assigning a value to it using <-.
RStudio helpfully shows us the variable in the “Environment” pane. We can also print it by typing the name of the variable and hitting enter. In general, R will print to the console any object returned by a function or operation unless we assign it to a variable.
## [1] 5
Examples of valid variable names: hello, subject_id, subject.ID, x42. Spaces aren’t ok inside variable names. Dots (.) are ok in R, unlike in many other languages. Numbers are ok, except as the first character. Punctuation is not allowed, with two exceptions: _ and .
We can do arithmetic with the variable:
## [1] 25
and even save the result in another variable:
We can also change a variable’s value by assigning it a new value:
## [1] 10
## [1] 25
Notice that the value of area we calculated earlier hasn’t been updated. Assigning a new value to one variable does not change the values of other variables. This is different to a spreadsheet, but usual for programming languages.
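The commands behind the outputs in this section are not shown, so here is a minimal sketch that would reproduce them. The name area comes from the text above; the name width is an assumption made for illustration.

```r
width <- 5              # create a variable (the name "width" is assumed)
width                   # prints 5
width * width           # arithmetic with the variable: prints 25

area <- width * width   # save the result in another variable

width <- 10             # assign a new value to width
width                   # prints 10
area                    # still prints 25: area was not recalculated
```

To bring area up to date, we would need to run the line that calculates it again.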
1.2 Saving code in an R script
Once we’ve created a few variables, it becomes important to record how they were calculated so we can reproduce them later.
The usual workflow is to save your code in an R script (“.R file”). Go to “File/New File/R Script” to create a new R script. Code in your R script can be sent to the console by selecting it or placing the cursor on the correct line, and then pressing Control-Enter (Command-Enter on a Mac).
Tip
Add comments to code, using lines starting with the # character. This makes it easier for others to follow what the code is doing (and also for us the next time we come back to it).
1.3 Vectors
A vector of numbers is a collection of numbers. “Vector” means different things in different fields (mathematics, geometry, biology), but in R it is a fancy name for a collection of numbers. We call the individual numbers elements of the vector.
We can make vectors with c( ), for example c(1,2,3). c means “combine”. R is obsessed with vectors: in R, even single numbers are vectors of length one. Many things that can be done with a single number can also be done with a vector. For example, arithmetic can be done on vectors as it can be on single numbers.
## [1] 10 20 30 40 50
## [1] 11 21 31 41 51
## [1] 20 40 60 80 100
## [1] 5
## [1] 60 10 20 30 40 50
## [1] 10 20 30 40 50 10 20 30 40 50
When we talk about the length of a vector, we are talking about the number of numbers in the vector.
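The outputs above are consistent with commands along the following lines. This is a sketch rather than a record of what was typed: the name myvec is used later in the text, and the individual commands are inferred from the printed results.

```r
myvec <- c(10, 20, 30, 40, 50)   # combine five numbers into one vector
myvec                # 10 20 30 40 50

myvec + 1            # 11 21 31 41 51: 1 is added to every element
myvec + myvec        # 20 40 60 80 100: elements are added pairwise

length(myvec)        # 5: the number of elements

c(60, myvec)         # 60 10 20 30 40 50: c( ) can also extend a vector
c(myvec, myvec)      # 10 20 30 40 50 10 20 30 40 50
```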
1.4 Types of vector
We will also encounter vectors of character strings, for example "hello" or c("hello","world"). Also we will encounter “logical” vectors, which contain TRUE and FALSE values. R also has “factors”, which are categorical vectors, and behave much like character vectors (think the factors in an experiment).
Challenge: mixing types
Sometimes the best way to understand R is to try some examples and see what it does.
What happens when you try to make a vector containing different types, using c( )? Make a vector with some numbers, and some words (e.g. character strings like "test", or "hello").
Why does the output show the numbers surrounded by quotes " " like character strings are?
Because vectors can only contain one type of thing, R chooses a lowest common denominator type of vector, a type that can contain everything we are trying to put in it. A different language might stop with an error, but R tries to soldier on as best it can. A number can be represented as a character string, but a character string cannot be represented as a number, so when we try to put both in the same vector R converts everything to a character string.
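One way to see this conversion is sketched below. The vector name mixed and its contents are made up for illustration.

```r
# Mixing numbers and character strings in one vector
mixed <- c(1, 2, "test", "hello")   # "mixed" is a hypothetical name
mixed          # "1" "2" "test" "hello": the numbers have become strings

class(mixed)   # "character": the whole vector has a single type
```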
1.5 Indexing vectors
Access elements of a vector with [ ], for example myvec[1] to get the first element. You can also assign to a specific element of a vector.
## [1] 10
## [1] 20
## [1] 10 5 30 40 50
Can we use a vector to index another vector? Yes!
## [1] 40 30 5
We could equivalently have written:
## [1] 40 30 5
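The following sketch would produce the outputs shown in this section, assuming myvec starts out as c(10, 20, 30, 40, 50). The name indexes is an assumption; any variable holding the positions would do.

```r
myvec <- c(10, 20, 30, 40, 50)

myvec[1]             # 10: the first element
myvec[2]             # 20: the second element

myvec[2] <- 5        # assign to a specific element
myvec                # 10 5 30 40 50

myvec[c(4, 3, 2)]    # 40 30 5: index with a vector of positions

indexes <- c(4, 3, 2)   # the same positions, stored in a variable first
myvec[indexes]       # 40 30 5
```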
Challenge: indexing
We can create and index character vectors as well. A cafe is using R to create their menu.
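The definition of items is not shown here. A plausible reconstruction is sketched below: the first four elements match the output shown in the Sequences section, and placing "sausage" fifth is an assumption based on its mention in this challenge.

```r
# Assumed menu for the challenge (fifth element is a guess)
items <- c("spam", "eggs", "beans", "bacon", "sausage")
items
length(items)   # 5
```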
- What does items[-3] produce? Based on what you find, use indexing to create a version of items without "spam".
- Use indexing to create a vector containing spam, eggs, sausage, spam, and spam.
- Add a new item, “lobster”, to items.
1.6 Sequences
Another way to create a vector is with the : operator:
## [1] 1 2 3 4 5 6 7 8 9 10
This can be useful when combined with indexing:
## [1] "spam" "eggs" "beans" "bacon"
Sequences are useful for other things, such as a starting point for calculations:
## [1] 1 4 9 16 25 36 49 64 81 100
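The three outputs in this section could come from commands like these. This is a sketch: the variable name x and the fifth element of items are assumptions.

```r
1:10                 # 1 2 3 4 5 6 7 8 9 10

# Assumed contents of items; only the first four elements are confirmed
items <- c("spam", "eggs", "beans", "bacon", "sausage")
items[1:4]           # "spam" "eggs" "beans" "bacon"

x <- 1:10            # "x" is a hypothetical name
x * x                # 1 4 9 16 25 36 49 64 81 100
```

In an interactive session, plot(x, x * x) would draw these points in the plot panel.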
1.7 Functions
Functions are the things that do all the work for us in R: calculate, manipulate data, read and write to files, produce plots. R has many built in functions and we will also be loading more specialized functions from “packages”.
We’ve already seen several functions: c( ), length( ), and plot( ). Let’s now have a look at sum( ).
## [1] 135
We called the function sum with the argument myvec, and it returned the value 135. We can get help on how to use sum with:
?sum
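As a sketch of the call described above, assuming myvec still holds the values left over from the indexing section:

```r
myvec <- c(10, 5, 30, 40, 50)   # assumed state of myvec after indexing
sum(myvec)                      # 135: the total of all elements

total <- sum(myvec)             # the returned value can be saved too
total                           # 135 ("total" is a hypothetical name)
```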
Some functions take more than one argument. Let’s look at the function rep, which means “repeat”, and which can take a variety of different arguments. In the simplest case, it takes a value and the number of times to repeat that value.
## [1] 42 42 42 42 42 42 42 42 42 42
As with many functions in R—which is obsessed with vectors—the thing to be repeated can be a vector with multiple elements.
## [1] 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3
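The two outputs above match the following positional calls to rep. The calls are inferred from the printed results rather than quoted.

```r
rep(42, 10)             # 42 repeated ten times

rep(c(1, 2, 3), 10)     # the whole vector 1 2 3 repeated ten times
```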
So far we have used positional arguments, where R determines which argument is which by the order in which they are given. We can also give arguments by name. For example, the above is equivalent to
## [1] 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3
## [1] 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3
## [1] 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3
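A sketch of three equivalent ways to name the arguments follows. x and times are the argument names documented in ?rep; the variable names on the left are our own.

```r
by_position <- rep(c(1, 2, 3), 10)              # both arguments by position

times_named <- rep(c(1, 2, 3), times = 10)      # second argument by name
x_named     <- rep(x = c(1, 2, 3), 10)          # first argument by name
both_named  <- rep(x = c(1, 2, 3), times = 10)  # both arguments by name

# All four calls give the same result
identical(by_position, times_named)   # TRUE
identical(by_position, x_named)       # TRUE
identical(by_position, both_named)    # TRUE
```

Once arguments are named, their order no longer matters, so rep(times = 10, x = c(1, 2, 3)) gives the same result.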
Arguments can have default values, and a function may have many different possible arguments that make it do obscure things. For example, rep can also take an argument each=. It’s typical for a function to be invoked with some number of positional arguments, which are always given, plus some less commonly used arguments, typically given by name.
## [1] 1 1 1 2 2 2 3 3 3
## [1] 1 1 1 2 2 2 3 3 3 1 1 1 2 2 2 3 3 3 1 1 1 2 2 2 3 3 3 1 1 1 2 2 2 3 3 3 1 1
## [39] 1 2 2 2 3 3 3
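The two outputs above are consistent with the calls sketched here, which are inferred from the printed results. each= repeats every element in turn, and combining it with times= repeats the resulting pattern.

```r
rep(c(1, 2, 3), each = 3)              # 1 1 1 2 2 2 3 3 3

# each is applied first, then the 9-element result is repeated 5 times
rep(c(1, 2, 3), each = 3, times = 5)   # 45 elements in total
```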
Challenge: using functions
- Use sum to sum from 1 to 10,000.
- Look at the documentation for the seq function. What does seq do? Give an example of using seq with either the by or length.out argument.