R works as a (very fancy) calculator.
2+2
2439*41
259*3861
2+sqrt(92)
What about:
7+3*9
32/4*4
Figure out what the second one does, and then explain how you would write it more clearly.
R can remember numbers (and other things) and give them back to you. This is the foundation of being able to write programs that do many things in a specified order.
x <- 23
print(x)
x + 17 ## no assignment!
print(x)
y <- x + 17
y ## symbol by itself is shorthand for "print"
R keeps track of a variety of objects. You can see all of the objects you’ve defined by typing ls() (short for “list”; the parentheses mean that you are calling a function, more on this later), or by looking in the “Environment” tab in RStudio.
You can also switch to the “Grid View” in the Environment tab in RStudio.
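As a small illustration of listing objects, here is a sketch; the names height and greeting are hypothetical and chosen only for this example.

```r
## Define two objects, then list what is in the workspace
height <- 23
greeting <- "hello"
objs <- ls()              # character vector of object names
print(objs)
"height" %in% objs        # TRUE: height is among the defined objects
rm(greeting)              # rm() removes an object
"greeting" %in% ls()      # FALSE once greeting has been removed
```

The exact contents of ls() depend on what else you have defined in your session.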
In addition to numbers, R has other kinds of values. The main ones we’re interested in are character, numeric and logical (i.e., TRUE or FALSE).
str() (short for “structure”) is one way to figure out what your R object is.
num <- 3
str(num)
char <- "Hello, class?"
str(char)
logic <- TRUE ## No quotes
str(logic)
At the simplest level, R objects are divided into types. Other than the ones above, we’re mostly interested in vectors, lists and functions.
A vector consists of zero or more values of the same storage mode:
words <- c("Mary", "had", "a", "little", "lamb")
# c() puts elements together into a vector
str(words)
v <- 1:5 ## 'm:n' creates a sequence from m to n
str(v)
R does math on vectors directly.
v <- 1:5
w <- c(0, 1, 1, 2, 4)
v+w
2*w
A list is a bunch of things in order, not necessarily of the same type. Those things can be any R object - vectors, functions, other lists …
L <- list(1:3, "Apple tree", TRUE)
str(L)
str(L[[1]]) ## L[[1]] picks out the first element of the list
LL <- list(L, c(2, 7, 9))
str(LL)
str(LL[1])
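The difference between single and double brackets on a list is easy to miss in the str() output above, so here is a minimal sketch that makes it explicit. The names first_sub and first_elt are introduced only for this illustration.

```r
L <- list(1:3, "Apple tree", TRUE)
first_sub <- L[1]      # single brackets: a list of length 1
first_elt <- L[[1]]    # double brackets: the integer vector 1:3 itself
str(first_sub)
str(first_elt)
is.list(first_sub)     # TRUE
is.list(first_elt)     # FALSE
sum(L[[1]])            # 6; sum(L[1]) would fail because L[1] is a list
```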
Functions are “called” using parentheses. A function is a set of commands that uses whatever “arguments” are inside the parentheses and does things (including possibly “returning” a result).
What functions have we seen so far?
Example: mean() takes a vector argument and returns the mean.
mean(c(2, 5, 11))
x <- 1:10
m_x <- mean(x)
print(m_x)
You can learn about any built-in function by using R’s help: type ?"mean" or help("mean"), or find the Help tab in RStudio.
Arguments can be passed to functions in order, or by using names. For example, mean takes an optional argument trim.
x <- (1:10)^2
mean(x)
mean(x, trim=0.1)
= is a synonym for <- at the top level (and many people use it). Be aware that they are not synonyms inside a function call: <- still means assignment, but = is used to name an argument. mean(x, trim <- 0.1) does not do what you think it does: it creates a variable called trim in your workspace and passes the value by position rather than by name.
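To see the difference between = and <- inside a call, here is a sketch using a hypothetical helper, raise(), defined only for this illustration (it is not a built-in function).

```r
## Hypothetical helper, defined only for this illustration
raise <- function(num, pow = 2) {
    return(num^pow)
}
by_name <- raise(pow = 3, num = 2)      # arguments matched by name: 2^3
by_arrow <- raise(pow <- 3, num <- 2)   # matched by position: 3^2
print(by_name)    # 8
print(by_arrow)   # 9
## Side effect: the arrows also created variables in the workspace
print(pow)        # 3
print(num)        # 2
```

With <-, the names on the left are ignored for argument matching, so the values land wherever their position puts them.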
R variable names are case-sensitive: e.g. m and M are different variables. In general, you should not try to take advantage of this, because it may cause confusion.
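R variable names have to start with a letter, and certain characters (particularly space characters) are not allowed. Good naming conventions include:
- Use only letters, numbers, underscores (_) and dots (.).
- Avoid easily confused names such as l or O.
- Avoid the names of built-in objects (e.g. c, t, list, or data) for variables.
- Pick one convention and stick to it: camelCase, snake_case, or kebab.case.
- Avoid variableNamesThatAreExcessivelyLong.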
In addition to naming, there are a lot of details (such as indentation) that make your code easier to read. Check out the style guide.
R is a sophisticated system that builds complicated structures on top of these simple objects. Hopefully we’ll develop an intuition for more of that as we go along. For now, we’ll talk about just a few structures.
Data frames are lists of vectors organized into a rectangle: each column is a vector, the columns all have to be the same length, and every column has to be homogeneous. Different columns can have different modes; in particular, you can mix numbers, factors (see below), dates, …
We’ll talk more about data frames when we start dealing with data.
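As a preview, here is a minimal sketch of building a small data frame by hand. The column names and values are made up for this illustration.

```r
## Hypothetical data, made up for this illustration
d <- data.frame(
    species = c("cat", "dog", "emu"),
    weight_kg = c(4.2, 20.5, 38),
    has_fur = c(TRUE, TRUE, FALSE)
)
str(d)            # one line per column, each with its own mode
nrow(d)           # 3 rows
ncol(d)           # 3 columns
d$weight_kg       # a column is an ordinary vector
is.list(d)        # TRUE: a data frame is a list of columns
```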
Tibbles are a special “tidyverse” version of data frames that act a little bit differently.
Factors are used to describe items that can have a discrete, known set of values (ice cream flavour, species, social class, etc.) - categorical variables in statistical terms. We will also talk more about them later.
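A quick sketch of what a factor looks like, using made-up ice cream flavours:

```r
## Hypothetical values, made up for this illustration
flavour <- factor(c("vanilla", "chocolate", "vanilla", "mint"))
str(flavour)
levels(flavour)       # "chocolate" "mint" "vanilla" (sorted alphabetically by default)
table(flavour)        # counts per level
as.integer(flavour)   # underlying integer codes: 3 1 3 2
```

The levels are the known set of possible values; each element is stored as an integer code pointing at one of them.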
Control structures in R include:
- for, lapply and apply, among others. lapply stands for list apply, and there are other apply analogs.
- if controls program flow.
- ifelse operates on vectors, and can be tricky: use with care!
v <- 1:10
for (i in v) {
print(i^2)
}
for (i in v) {
if(i>=4) {
print(i^2)
}
}
y <- ifelse(v>4, v^2, v)
print(y)
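For comparison with the for loops above, here is a sketch of the same squaring done with lapply(). It also uses sapply(), which is not mentioned above; it is a base R relative of lapply() that tries to simplify the result to a vector.

```r
v <- 1:10
squares_list <- lapply(v, function(i) i^2)   # returns a list of length 10
squares_vec <- sapply(v, function(i) i^2)    # simplifies the result to a vector
str(squares_list[1:2])
print(squares_vec)
## Vectorized arithmetic gives the same values without any loop
identical(squares_vec, v^2)
```

For simple arithmetic like this, plain v^2 is the clearest choice; the apply family is most useful when each step is more complicated.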
R uses == to test whether two objects are equal, and != to test if they are not equal.
& for “and”, and | for “or” operate on vectors; they go well with ifelse().
&& and || expect single conditions; they go well with if().
You may have noticed that we sometimes just type a variable’s name in order to print its value. It’s better practice to always say print when that’s what you want.
Similarly, you usually don’t need to say return(x) to return a function value … but you should. Clarity and explicitness are worth more than saving a few keystrokes.
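The following sketch puts the logical operators and the explicit print() and return() habits together. describe_number() is a hypothetical helper written only for this illustration.

```r
v <- 1:10
## & is vectorized: one TRUE/FALSE per element, a natural fit for ifelse()
middle <- ifelse(v > 3 & v < 8, "middle", "edge")
print(middle)

## && works on single conditions, a natural fit for if()
## describe_number() is a hypothetical helper for this illustration
describe_number <- function(n) {
    if (is.numeric(n) && n > 0) {
        return("positive number")
    }
    return("something else")
}
print(describe_number(5))
print(describe_number("five"))   # && stops at the first FALSE, so n > 0 is never evaluated
```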
You can select single elements from an R object using the element operator [[]], and subsets using the subset operator []. If the object has rows and columns, you can separate them with a comma.
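A minimal sketch of these operators on a vector and on a matrix (the values are arbitrary):

```r
v <- c(10, 20, 30, 40, 50)
v[[2]]        # single element: 20
v[2:4]        # subset: 20 30 40
m <- matrix(1:6, nrow = 2)   # 2 rows, 3 columns, filled column by column
m[2, 3]       # row 2, column 3: 6
m[1, ]        # all of row 1: 1 3 5
m[, 2]        # all of column 2: 3 4
```

Leaving one side of the comma empty means "everything" along that dimension.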
You can also select things by name, using the syntax v["name"] or v[n], if n is a variable that contains the name. Use names instead of numbers whenever you can, because (1) your code will be easier to read (x["temperature"] instead of x[21]) and (2) your code will be more robust (e.g., if something changes so that temperature is now in position 22 rather than 21).
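x[[21]] and x[21] are usually synonyms for simple vectors: the latter “collapses” to the former. For a list, however, x[21] returns a list of length one rather than the element itself. If you get in the habit of using single brackets for selecting single elements, you could get burned: not often, but maybe hard. What’s the advantage?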
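Here is a sketch of selecting by name, and of where single and double brackets part ways. The objects x and obs hold made-up measurements for this illustration.

```r
## Hypothetical measurements, made up for this illustration
x <- c(temperature = 21.5, humidity = 0.4, pressure = 101.3)
x["temperature"]        # single brackets keep the name
x[["temperature"]]      # double brackets return the bare value
n <- "humidity"
x[n]                    # n holds the name to look up

obs <- list(temperature = 21.5, site = "A")
str(obs["temperature"])     # still a list of length 1
str(obs[["temperature"]])   # the number itself
```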
We want to get some data. Before executing the code below:
- set your working directory (using setwd(), or in RStudio Session / Set Working Directory / Choose Directory)
- create a data/ subdirectory if you want, or edit it out of the destfile argument below
download.file(url = "https://ndownloader.figshare.com/files/2292169",
              destfile = "data/portal_data_joined.csv")
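The preparation steps listed before the download can also be scripted. This is only a sketch, assuming your working directory is already set where you want it; it creates the data/ subdirectory and builds the destination path, and does not download anything itself.

```r
## Create the data/ subdirectory only if it does not already exist
if (!dir.exists("data")) {
    dir.create("data")
}
destfile <- file.path("data", "portal_data_joined.csv")
print(destfile)
## After the download, file.exists(destfile) should be TRUE
```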
We will continue with the Data Carpentry lesson for R in ecology.