Basics

R is a programming language

  • You might not use R for programming, but it will help if you understand roughly how it works inside
  • What is a program?
    • A program is a collection of simple instructions (read, write, send, manipulate, calculate) that should be executed in a logical order
    • A program’s flow is controlled by control structures (instructions about what instruction to do next)

Calculation

R works as a (very fancy) calculator.

2+2
2439*41
259*3861
2+sqrt(92)

What about:

7+3*9
32/4*4

Figure out what the second one does, and then explain how you would write it more clearly.

Assigning values

R can remember numbers (and other things) and give them back to you. This is the foundation of being able to write programs that do many things in a specified order.

x <- 23
print(x)
x + 17        ## no assignment!
print(x)
y <- x + 17
y             ## symbol by itself is shorthand for "print"

Objects


R keeps track of a variety of objects. You can see all of the objects you’ve defined by typing ls() (short for “list”: the parentheses mean that you are calling a function; more on this later), or looking in the “Environment” tab in RStudio.

Rstudio view

You can also switch to the “Grid View” in the Environment tab in RStudio.

Values

In addition to numbers, R has other kinds of values. The main ones we’re interested in are character, numeric and logical (ie., TRUE or FALSE). str() (short for “structure”) is one way to figure out what your R object is.

num <- 3
str(num) 
 
char <- "Hello, class?"
str(char)
 
logic <- TRUE ## No quotes
str(logic)

Other types

At the simplest level, R objects are divided into types. Other than the ones above, we’re mostly interested in vectors, lists and functions.

Vectors

A vector consists of zero or more values of the same storage mode:

words <- c("Mary", "had", "a", "little", "lamb")
# c() puts elements together into a vector
str(words)

v <- 1:5  ## 'm:n' creates a sequence from m to n
str(v)

Vector math

R does math on vectors directly.

  • If we add (for example) two vectors we add each pair of corresponding elements.
  • If we multiply a vector by a scalar (a single number, in R really just a vector of length 1!) we multiply each element by the scalar.
  • Arithmetic involving two vectors of different length is possible but dangerous (what happens??)
v <- 1:5
w <- c(0, 1, 1, 2, 4)
v+w
2*w

Lists

A list is a bunch of things in order, not necessarily of the same type. Those things can be any R object - vectors, functions, other lists …

L <- list(1:3, "Apple tree", TRUE)
str(L)
str(L[[1]]) ## L[1] picks out the first element of the list
LL <- list(L, c(2, 7, 9))
str(LL)
str(LL[1])

Functions

Functions are “called” using parentheses. A function is a set of commands that uses whatever “arguments” are inside the parentheses and does things (including possibly “returning” a result).

What functions have we seen so far?

Example: mean() takes a vector argument and returns the mean.

mean(c(2, 5, 11))
 
x <- 1:10
m_x <- mean(x)
print(m_x)

About functions

You can learn about any built-in function by using R’s help: type ?"mean" or help("mean"), or find the Help tab in RStudio.

Arguments can be passed to functions in order, or by using names. For example, mean takes an optional argument trim.

x <- (1:10)^2
mean(x)
mean(x, trim=0.1)

Function syntax

= is a synonym for <- at the top level (and many people use it). Be aware that they are not synonyms inside a function call: <- still means assignment, but = is used to name an argument. mean(x, trim <- 0.1) does not do what you think it does.

Variable names

R variable names are case-sensitive: e.g. m and M are different variables. In general, you should not try to take advantage of this, because it may cause confusion.

R variable names have to start with a letter, and certain characters (particularly space characters) are not allowed. Good naming conventions include:

  • Use mostly letters, underscores (_) and dots (.)
  • Avoid potentially confusing names like l or O.
  • Avoid built-in names (like c, t, list, or data) for variables.
  • Make readable variable names using camelCase, snake_case, or kebab.case
  • Avoid variableNamesThatAreExcessivelyLong
  • Be consistent! (Bååth 2012)

Style

In addition to function naming there are lot of details (such as indentation) that make your code easier to read. Check out the style guide

Compound objects


R is a sophisticated system that builds complicated structures on top of these simple objects. Hopefully we’ll develop an intuition for more of that as we go along. For now, we’ll talk about just a few structures.

Data frames

Data frames are lists of vectors organized into a rectangle (each column is a vector; the columns all have to be the same length and every column has to be homogeneous but every column can be a different mode: in particular you can mix numbers, factors (see below), dates, …).

We’ll talk more about data frames when we start dealing with data.

tibbles are a special “tidyverse” version of data frames that act a little bit differently.

Factors

Factors are used to describe items that can have a discrete, known set of values (ice cream flavour, species, social class, etc.) - categorical variables in statistical terms. We will also talk more about them later.

More stuff

Control structures

Control structures in R include:

  • loops: we will use for, lapply and apply, among others. lapply stands for list apply.
    • tidyverse also has apply analogs
  • conditionals:
    • if controls program flow
    • ifelse operates on vectors, and can be tricky – use with care!

v <- 1:10
for (i in v) {
  print(i^2)
}
for (i in v) {
    if(i>=4) {
        print(i^2)
    }
}

y <- ifelse(v>4, v^2, v)
print(y)

Control syntax

R uses == to test whether two objects are equal, and != to test if they are not equal.

Boolean operators

  • Use & for “and”, and | for “or” operate on vectors, they go well with ifelse()
  • && and || expect single conditions; they go well with if().

Syntactic principles

You may have noticed that we sometimes just type a variable’s name in order to print its value. It’s better practice to always say print, when that’s what you want.

  • Clearer, and less fragile

Similarly, you usually don’t need to say return(x) to return a function value … but you should.

Clarity and explicitness are worth more than saving a few keystrokes.

Selecting

You can select single elements from an R object using the element operator [[]], and subsets using the subset operator []. If the object has rows and columns, you can separate them with a comma.

You can also select things by name, using the syntax v["name"] or v[n], if n is a variable that contains the name. Use names instead of numbers whenever you can, because (1) your code will be easier to read (x["temperature"] instead of x[21]) and (2) your code will be more robust (e.g., if something changes so that temperature is now in position 22 rather than 21).

x[[21]] and x[21] are usually synonyms: the latter “collapses” to the former. If you get in the habit of using single brackets for selecting, you could get burned – not often, but maybe hard. What’s the advantage?

Philosophical principles


The tidyverse

basic concepts

  • a set of R packages: https://www.tidyverse.org/
  • advantages
    • expressiveness
    • speed
    • new hotness
  • disadvantages
    • incompatibilities with base R
    • instability (rapid change)
    • non-standard evaluation

Sample data

We want to get some data. Before executing the code below:

  • think about where you want to keep your stuff for this class
  • set your working directory accordingly (setwd(), or in RStudio Session / Set Working Directory / Choose Directory)
  • create a data/ subdirectory if you want, or edit it out of the destfile argument below
download.file(url="https://ndownloader.figshare.com/files/2292169",
              destfile = "data/portal_data_joined.csv")

also here

And now …

We will continue with the Data Carpentry lesson for R in ecology

References

Bååth, Rasmus. 2012. “The State of Naming Conventions in R.” The R Journal 4 (2): 74–75. https://journal.r-project.org/archive/2012/RJ-2012-018/index.html.