11 March 2024
R works as a (very fancy) calculator.
2+2 2439*41 259*3861 2+sqrt(92)
What about:
7+3*9 32/4*4
Figure out what the second one does, and then explain how you would write it more clearly.
R can remember numbers (and other things) and give them back to you. This is the foundation of being able to write programs that do many things in a specified order.
x <- 23 print(x) x + 17 ## no assignment! print(x) y <- x + 17 y ## symbol by itself is shorthand for "print"
R keeps track of a variety of objects. You can see all of the objects you’ve defined by typing ls()
(short for “list”: the parentheses mean that you are calling a function; more on this later), or looking in the “Environment” tab in RStudio.
You can also switch to the “Grid View” in the Environment tab in RStudio.
In addition to numbers, R has other kinds of values. The main ones we’re interested in are character
, numeric
and logical
(ie., TRUE
or FALSE
). str()
(short for “structure”) is one way to figure out what your R object is.
num <- 3 str(num) char <- "Hello, class?" str(char) logic <- TRUE ## No quotes str(logic)
At the simplest level, R objects are divided into types. Other than the ones above, we’re mostly interested in vectors, lists and functions.
A vector consists of zero or more values of the same storage mode:
words <- c("Mary", "had", "a", "little", "lamb") # c() puts elements together into a vector str(words) v <- 1:5 ## 'm:n' creates a sequence from m to n str(v)
R does math on vectors directly.
v <- 1:5 w <- c(0, 1, 1, 2, 4) v+w 2*w
A list is a bunch of things in order, not necessarily of the same type. Those things can be any R object - vectors, functions, other lists …
L <- list(1:3, "Apple tree", TRUE) str(L) str(L[[1]]) ## L[1] picks out the first element of the list LL <- list(L, c(2, 7, 9)) str(LL) str(LL[1])
Functions are “called” using parentheses. A function is a set of commands that uses whatever “arguments” are inside the parentheses and does things (including possibly “returning” a result).
What functions have we seen so far?
Example: mean()
takes a vector argument and returns the mean.
mean(c(2, 5, 11)) x <- 1:10 m_x <- mean(x) print(m_x)
You can learn about any built-in function by using R’s help: type ?"mean"
or help("mean")
, or find the Help tab in RStudio.
Arguments can be passed to functions in order, or by using names. For example, mean
takes an optional argument trim
.
x <- (1:10)^2 mean(x) mean(x, trim=0.1)
=
is a synonym for <-
at the top level (and many people use it). Be aware that they are not synonyms inside a function call: <-
still means assignment, but =
is used to name an argument. mean(x, trim <- 0.1)
does not do what you think it does.
R variable names are case-sensitive: e.g. m
and M
are different variables. In general, you should not try to take advantage of this, because it may cause confusion.
R variable names have to start with a letter, and certain characters (particularly space characters) are not allowed. Good naming conventions include:
_
) and dots (.
)l
or O
.c
, t
, list
, or data
) for variables.camelCase
, snake_case
, or kebab.case
variableNamesThatAreExcessivelyLong
In addition to function naming there are lot of details (such as indentation) that make your code easier to read. Check out the style guide
R is a sophisticated system that builds complicated structures on top of these simple objects. Hopefully we’ll develop an intuition for more of that as we go along. For now, we’ll talk about just a few structures.
Data frames are lists of vectors organized into a rectangle (each column is a vector; the columns all have to be the same length and every column has to be homogeneous but every column can be a different mode: in particular you can mix numbers, factors (see below), dates, …).
We’ll talk more about data frames when we start dealing with data.
tibbles
are a special “tidyverse” version of data frames that act a little bit differently.
Factors are used to describe items that can have a discrete, known set of values (ice cream flavour, species, social class, etc.) - categorical variables in statistical terms. We will also talk more about them later.
Control structures in R include:
for
, lapply
and apply
, among others. lapply
stands for list apply.
apply
analogsif
controls program flowifelse
operates on vectors, and can be tricky – use with care!v <- 1:10 for (i in v) { print(i^2) } for (i in v) { if(i>=4) { print(i^2) } } y <- ifelse(v>4, v^2, v) print(y)
R uses ==
to test whether two objects are equal, and !=
to test if they are not equal.
&
for “and”, and |
for “or” operate on vectors, they go well with ifelse()
&&
and ||
expect single conditions; they go well with if()
.You may have noticed that we sometimes just type a variable’s name in order to print its value. It’s better practice to always say print
, when that’s what you want.
Similarly, you usually don’t need to say return(x)
to return a function value … but you should.
Clarity and explicitness are worth more than saving a few keystrokes.
You can select single elements from an R object using the element operator [[]]
, and subsets using the subset operator []
. If the object has rows and columns, you can separate them with a comma.
You can also select things by name, using the syntax v["name"]
or v[n]
, if n
is a variable that contains the name. Use names instead of numbers whenever you can, because (1) your code will be easier to read (x["temperature"]
instead of x[21]
) and (2) your code will be more robust (e.g., if something changes so that temperature is now in position 22 rather than 21).
x[[21]]
and x[21]
are usually synonyms: the latter “collapses” to the former. If you get in the habit of using single brackets for selecting, you could get burned – not often, but maybe hard. What’s the advantage?
We want to get some data. Before executing the code below:
setwd()
, or in RStudio Session / Set Working Directory / Choose Directory
)data/
subdirectory if you want, or edit it out of the destfile
argument belowdownload.file(url="https://ndownloader.figshare.com/files/2292169", destfile = "data/portal_data_joined.csv")
also here
We will continue with the Data Carpentry lesson for R in ecology
Bååth, Rasmus. 2012. “The State of Naming Conventions in R.” The R Journal 4 (2): 74–75. https://journal.r-project.org/archive/2012/RJ-2012-018/index.html.