# Programming with R

## Basic Operation

• `# this is a comment in R`
• Use `x <- 3` to assign a value, `3`, to a variable, `x`
• R counts from 1 when indexing, unlike many other programming languages (e.g., Python), which count from 0
• `c(value1, value2, value3)` creates a vector
• `length(myvec)` returns the number of elements contained in the variable `myvec`, if it is a vector.
• `nrow(mymat)` and `ncol(mymat)` give the number of rows or columns in a matrix or data frame.
• `myvec[i]` selects the i’th element from the variable `myvec`, if it is a vector.
• `mymat[i,j]` selects the cell at the i’th row and j’th column of `mymat`, if `mymat` is a matrix or data frame.
• `mylist$thing` selects the element named “thing” if `mylist` is a list. This also works for selecting a column of a data frame.
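A short sketch of these operations on made-up values. The names `myvec`, `mymat` and `mylist` follow the bullets above; `df` is an illustrative name for a small data frame.

```r
# Vectors: c() combines values; indexing starts at 1
myvec <- c(10, 20, 30)
length(myvec)   # 3
myvec[1]        # 10, the first element

# A 2 x 3 matrix, filled column by column from 1:6
mymat <- matrix(1:6, nrow = 2)
nrow(mymat)     # 2
ncol(mymat)     # 3
mymat[2, 3]     # 6: row 2, column 3

# A list whose named elements have different types
mylist <- list(thing = "hello", count = 3)
mylist$thing    # "hello"

# $ also selects a column of a data frame
df <- data.frame(name = c("a", "b"), score = c(1.5, 2.5))
df$score        # 1.5 2.5
```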

• `print(thing)` prints the variable `thing` to the console.
• `cat(thing1, thing2, "\n")` is another, more flexible way to print things. `"\n"` is the “newline” character; it starts a new line of output.
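A quick comparison of the two, using a hypothetical variable `x`: `print()` shows the `[1]` position marker and puts quotes around text, while `cat()` writes its arguments separated by spaces (its default `sep = " "`) and only ends the line when given `"\n"`.

```r
x <- 3
print(x)              # [1] 3
print("hello")        # [1] "hello"
cat("x is", x, "\n")  # x is 3
cat("hello\n")        # hello
```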

• `?name` gets help on the function called `name`

## Data types and structures

What you can do with an object largely depends on what “type” it is.

Check what type of thing you are dealing with: `class(x)`

• “numeric”, “character”, “factor”, “integer”, “logical” - Vectors, containing a sequence of values all of the same basic type. In R, a single value is a vector of length 1.

• “matrix” - Matrices, a tabular 2D data structure, elements are all of the same basic type.

• “list” - A list of things. Unlike vectors, lists can contain different types of things, and can contain other container data structures. If a function needs to return several different things of different types, a list is a great container to put them all in.

• “data.frame” - Another tabular 2D data structure. A sequence of records (rows), each having certain attributes (columns). Each column is all the same type of thing, but different columns can contain different types of thing. You could think of a data.frame as a list of column vectors.
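Calling `class()` on small example objects shows each of these types. One detail not in the list above: in R 4.0 and later, `class()` of a matrix returns both `"matrix"` and `"array"`.

```r
class(3.5)                    # "numeric"
class(3L)                     # "integer" (the L suffix makes an integer)
class("apple")                # "character"
class(TRUE)                   # "logical"
class(factor(c("a", "b")))    # "factor"
class(matrix(1:4, nrow = 2))  # "matrix" "array"
class(list(1, "a"))           # "list"
class(data.frame(x = 1:2))    # "data.frame"
length(42)                    # 1: a single value is a length-1 vector
```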

For more information on the structure of a thing: `str(x)`

To convert a thing to a certain type: `as.vector(x)` `as.matrix(x)` `as.data.frame(x)` `as.numeric(x)` `as.character(x)`
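A few conversions on made-up values. Two behaviours worth knowing: text that cannot be read as a number becomes `NA` (with a warning, suppressed here), and `as.numeric()` on a factor returns the internal level codes rather than the labels, a classic silent failure.

```r
as.numeric("3.5")                  # 3.5
as.character(1:3)                  # "1" "2" "3"

# "two" cannot be parsed, so it becomes NA
# (without suppressWarnings, R warns "NAs introduced by coercion")
suppressWarnings(as.numeric(c("1", "two")))   # 1 NA

f <- factor(c("10", "20"))
as.numeric(f)                      # 1 2: the level codes, not the values
as.numeric(as.character(f))        # 10 20: convert to text first
```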

Lists (and vectors) may have names for elements: `names(mylist)`

Matrices and data frames may have row names and column names: `rownames(mymat)` `colnames(mymat)`
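A minimal sketch of setting and using names; `scores` and its values are hypothetical. Names can be used in place of numeric positions when indexing.

```r
# Names for vector elements
scores <- c(alice = 90, bob = 85)
names(scores)        # "alice" "bob"
scores["bob"]        # select by name

# Row and column names for a matrix
mymat <- matrix(1:4, nrow = 2)
rownames(mymat) <- c("r1", "r2")
colnames(mymat) <- c("c1", "c2")
mymat["r2", "c1"]    # 2
```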

## Control Flow

• create a `for` loop to process elements in a collection one at a time

```r
for (i in 1:5) {
  print(i)
}
```

This will print:

```
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
```
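A `for` loop can also run directly over the values of a vector, or over positions with `seq_along()`. The fruit names below are made up. `seq_along()` is the safer way to loop over positions because `1:length(x)` gives `c(1, 0)` when `x` is empty.

```r
fruits <- c("apple", "banana", "cherry")

# Loop over the values themselves
for (fruit in fruits) {
  cat("I like", fruit, "\n")
}

# Loop over positions; seq_along() yields nothing for an empty vector
total <- 0
for (i in seq_along(fruits)) {
  total <- total + nchar(fruits[i])   # add the number of characters
}
total   # 17 (5 + 6 + 6)
```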
• Create a conditional using `if`, `else if`, and `else`

```r
if (x > 0) {
  print("value is positive")
} else if (x < 0) {
  print("value is negative")
} else {
  print("value is neither positive nor negative")
}
```
• equal: Use `==` to test for equality
  • `3 == 3` will return `TRUE`
  • `'apple' == 'orange'` will return `FALSE`
• not-equal: Use `!=` to test for non-equality
  • `3 != 3` will return `FALSE`
  • `'apple' != 'orange'` will return `TRUE`
• other comparisons: `<`, `>`, `<=`, `>=`

• and: `X & Y` is `TRUE` if both X and Y are true

• or: `X | Y` is `TRUE` if either X or Y (or both) is true

• not: `!X` is `TRUE` if X is `FALSE` and vice versa
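Comparisons and `&`, `|`, `!` work element by element on whole vectors, which makes them handy for selecting elements. The values in `x` are made up. `if ()` itself expects a single `TRUE` or `FALSE` (recent versions of R signal an error for a longer condition), so here it is applied to one value.

```r
x <- c(-2, 0, 5, 12)

x > 0               # FALSE FALSE  TRUE  TRUE
x > 0 & x < 10      # FALSE FALSE  TRUE FALSE
x < 0 | x > 10      #  TRUE FALSE FALSE  TRUE
!(x == 0)           #  TRUE FALSE  TRUE  TRUE

# A logical vector can select the matching elements
x[x > 0 & x < 10]   # 5

# Use if () on a single value
value <- x[3]
if (value > 0 & value < 10) {
  print("value is between 0 and 10")
}
```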

## Functions

• Defining a function:

```r
is_positive <- function(value) {
  if (value > 0) {
    return(TRUE)
  } else {
    return(FALSE)
  }
}
```
• Use `return` to return a value. (Alternatively, R automatically returns the value of the last expression evaluated in the function.)
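Two hypothetical functions illustrate both styles. `is_positive2` behaves like `is_positive` but relies on the implicit return of its last expression; `safe_sqrt` uses `return()` to leave the function early.

```r
# Implicit return: the value of `value > 0` is returned
is_positive2 <- function(value) {
  value > 0
}

# Explicit return() exits early for negative input
safe_sqrt <- function(value) {
  if (value < 0) {
    return(NA)
  }
  sqrt(value)   # last expression, returned automatically
}

is_positive2(3)   # TRUE
is_positive2(-1)  # FALSE
safe_sqrt(9)      # 3
safe_sqrt(-4)     # NA
```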

• Specifying a default value for a function argument

```r
increment_me <- function(value_to_increment, value_to_increment_by = 1) {
  return(value_to_increment + value_to_increment_by)
}
```

`increment_me(4)` will return 5

`increment_me(4, 6)` will return 10
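Arguments can also be passed by name, in which case their order no longer matters. The function is repeated here so the snippet runs on its own.

```r
increment_me <- function(value_to_increment, value_to_increment_by = 1) {
  return(value_to_increment + value_to_increment_by)
}

increment_me(4)                             # 5: uses the default of 1
increment_me(4, 6)                          # 10: matched by position
increment_me(4, value_to_increment_by = 6)  # 10: matched by name
increment_me(value_to_increment_by = 6,
             value_to_increment = 4)        # 10: named, so order is free
```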

• Call a function by using `function_name(function_arguments)`

• The apply family of functions are sometimes useful as an alternative to `for` loops:

`apply()` `sapply()` `lapply()` `mapply()`

`apply(dat, MARGIN = 2, mean)` will return the average (`mean`) of each column in `dat`
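A sketch of each member of the family on a small made-up matrix `dat` (3 rows, 2 columns). `MARGIN = 1` works over rows and `MARGIN = 2` over columns; `sapply()` simplifies its result to a vector where it can, while `lapply()` always returns a list.

```r
dat <- matrix(c(1, 2, 3, 10, 20, 30), nrow = 3)

apply(dat, MARGIN = 2, mean)   # column means: 2 20
apply(dat, MARGIN = 1, sum)    # row sums: 11 22 33

# sapply(): apply a function to each element, simplify to a vector
sapply(1:4, function(i) i^2)   # 1 4 9 16

# lapply(): same idea, but the result is always a list
lapply(c(a = 1, b = 2), function(v) v * 10)   # list(a = 10, b = 20)

# mapply(): walk over several vectors in parallel
mapply(function(x, y) x + y, 1:3, 4:6)   # 5 7 9
```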

## .R files

• Run all the code in a .R file with

`source("filename.R")`

• From bash, run a .R file with

`Rscript filename.R`
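To see `source()` in action without an existing file, this sketch writes a tiny hypothetical script to a temporary file and runs it. Variables the script creates (here `squares`) remain available in the session afterwards. The same file could instead be run from bash with `Rscript` followed by its path.

```r
# Write a two-line example script to a temporary .R file
script <- tempfile(fileext = ".R")
writeLines(c(
  "squares <- (1:5)^2",
  "cat('Sum of squares:', sum(squares), '\\n')"
), script)

# Run every line of the file; prints "Sum of squares: 55"
source(script)
squares   # 1 4 9 16 25, created by the script

unlink(script)   # remove the temporary file
```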

## Packages

• Install package by using `install.packages("package-name")`
• Update all installed packages by using `update.packages()`, or only selected ones with `update.packages(oldPkgs = "package-name")`
• Load packages by using `library("package-name")`
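A common pattern is to install a package only when it is missing and then load it. `use_package` below is a hypothetical helper, not a base R function. It relies on `requireNamespace()`, which returns `FALSE` when a package is not installed, and `library(..., character.only = TRUE)`, which lets the package name come from a variable.

```r
# Hypothetical helper: install pkg if needed, then load it
use_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
}

# "stats" ships with R, so nothing is downloaded here
use_package("stats")
```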

## Glossary

argument
A value given to a function or program when it runs. The term is often used interchangeably (and inconsistently) with parameter.
call stack
A data structure inside a running program that keeps track of active function calls. Each call’s variables are stored in a stack frame; a new stack frame is put on top of the stack for each call, and discarded when the call is finished.
comma-separated values (CSV)
A common textual representation for tables in which the values in each row are separated by commas.
comment
A remark in a program that is intended to help human readers understand what is going on, but is ignored by the computer. Comments in Python, R, and the Unix shell start with a `#` character and run to the end of the line; comments in SQL start with `--`, and other languages have other conventions.
conditional statement
A statement in a program that might or might not be executed depending on whether a test is true or false. In R, these are `if` statements.
documentation
Human-language text written to explain what software does, how it works, or how to use it.
for loop
A loop that is executed once for each value in some kind of set, list, or range. See also: while loop.
function body
The statements that are executed inside a function.
function call
A use of a function in another piece of software, such as the R console or another function.
index
In mathematics, a subscript that specifies the location of a single value in a collection, such as a single pixel in an image. In R, data is indexed using square brackets `[ ]`.
loop variable
The variable that keeps track of the progress of the loop.
notional machine
An abstraction of a computer used to think about what it can and will do. Your “mental model”.
parameter
A variable named in the function’s declaration that is used to hold a value passed into the call. The term is often used interchangeably (and inconsistently) with argument.
pipe
A connection from the output of one program to the input of another. When two or more programs are connected in this way, they are called a “pipeline”.
return statement
A statement that causes a function to stop executing and return a value to its caller immediately.
shape (of an array)
An array’s dimensions, represented as a vector. For example, a 5×3 array’s shape is `c(5, 3)`. In R we get the shape of a matrix or data frame with `dim`, or `nrow` and `ncol`, and the length of a vector with `length`.
silent failure
Failing without producing any warning messages. Silent failures are hard to detect and debug.
slice
A regular subsequence of a larger sequence, such as the first five elements or every second element.
stack frame
A data structure that provides storage for a function’s local variables. Each time a function is called, a new stack frame is created and put on the top of the call stack. When the function returns, the stack frame is discarded.
standard input (stdin)
A process’s default input stream. In interactive command-line applications, it is typically connected to the keyboard; in a pipe, it receives data from the standard output of the preceding process.
standard output (stdout)
A process’s default output stream. In interactive command-line applications, data sent to standard output is displayed on the screen; in a pipe, it is passed to the standard input of the next process.
string
Short for “character string”, a sequence of zero or more characters. In R, this is the “character” data type.
while loop
A loop that keeps executing as long as some condition is true. See also: for loop.
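The "slice" and "shape" entries translate into R as follows, using made-up values. `seq()` with `by = 2` generates the positions of every second element.

```r
x <- c(5, 8, 2, 9, 4, 7, 1)

x[1:5]                         # first five elements: 5 8 2 9 4
x[seq(1, length(x), by = 2)]   # every second element: 5 2 4 1

m <- matrix(0, nrow = 5, ncol = 3)
dim(m)                         # shape: 5 3
nrow(m)                        # 5
ncol(m)                        # 3
length(x)                      # 7
```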

## Next steps

• Look at the `ggplot2` package for high quality graphics. The syntax is a little unusual, but the online documentation has many examples.

• Bioconductor is a collection of bioinformatics related packages, including the popular `limma` and `edgeR` packages for RNA-Seq analysis developed at the Walter and Eliza Hall Institute.

Stack Overflow-style sites are great for getting help:

Online tutorials:

Books:

• “The R Book” by Michael J. Crawley for general reference.

• “Linear Models with R” and “Extending the Linear Model with R” by Julian J. Faraway cover linear models, with many practical examples. Linear models are a central part of R; many familiar statistical tests can be expressed in terms of a linear model.
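As one illustration of that last point, a two-sample t-test with equal variances gives the same result as a linear model with a group factor. The measurements below are invented purely for the example. `lm()` reports the difference as group b minus group a, while `t.test()` reports a minus b, so the t statistics have the same size but opposite signs, and the p-values match.

```r
# Invented measurements for two groups of four
y     <- c(4.1, 5.0, 4.6, 5.3, 6.2, 6.8, 5.9, 7.1)
group <- factor(c("a", "a", "a", "a", "b", "b", "b", "b"))

# Classic two-sample t-test assuming equal variances
tt <- t.test(y ~ group, var.equal = TRUE)

# The same comparison as a linear model; the "groupb" coefficient
# is the difference in means (b minus a)
fit <- lm(y ~ group)
coef_table <- summary(fit)$coefficients

tt$statistic                      # t statistic from t.test
coef_table["groupb", "t value"]   # same magnitude, opposite sign
tt$p.value
coef_table["groupb", "Pr(>|t|)"]  # same p-value
```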

Other languages:

• Python is another popular language in the scientific community.