As in many programming languages, understanding how data are stored and manipulated is important to getting the most out of the experience. In these next few sections, we will introduce some basic R data types and structures as well as some general approaches for working with them.

# Vectors

In R, even a single value is a vector with length=1.

```{r}
z = 1
z
length(z)
```

In the code above, we "assigned" the value 1 to the variable named `z`. Typing `z` by itself is an "expression" that returns a result which is, in this case, the value that we just assigned. The `length` function takes an R object and returns its length. There are numerous ways of asking R about what an object represents, and `length` is one of them.

Vectors can contain numbers, strings (character data), logical values (`TRUE` and `FALSE`), or other "atomic" data types (Table \@ref(tab:simpletypes)). *Vectors cannot contain a mix of types!* We will introduce another data structure, the R `list`, for situations when we need to store a mix of base R data types.

Table: (\#tab:simpletypes) Atomic (simplest) data types in R.

Data type    Stores
-----------  ------------------------
numeric      floating point numbers
integer      integers
complex      complex numbers
factor       categorical data
character    strings
logical      TRUE or FALSE
NA           missing
NULL         empty
function     function type

## Creating vectors

Character vectors (also sometimes called "string" vectors) are entered with each value surrounded by single or double quotes; either is acceptable, but they must match. They are always displayed by R with double quotes. Here are some examples of creating vectors:

```{r}
# examples of vectors
c('hello','world')
c(1,3,4,5,1,2)
c(1.12341e7,78234.126)
c(TRUE,FALSE,TRUE,TRUE)
# note how in the next case the TRUE is converted to "TRUE"
# with quotes around it.
c(TRUE,'hello')
```

We can also create vectors as "regular sequences" of numbers.
For example:

```{r}
# create a vector of integers from 1 to 10
x = 1:10
# and backwards
x = 10:1
```

The `seq` function can create more flexible regular sequences.

```{r}
# create a vector of numbers from 1 to 4 skipping by 0.3
y = seq(1,4,0.3)
```

And creating a new vector by concatenating existing vectors is possible, as well.

```{r}
# create a sequence by concatenating two other sequences
z = c(y,x)
z
```

## Vector Operations

Operations on a single vector are typically done element-by-element. For example, when we add `2` to a vector, `2` is added to each element of the vector and a new vector of the same length is returned.

```{r}
x = 1:10
x + 2
```

If the operation involves two vectors, the following rules apply. If the vectors are the same length, R simply applies the operation to each pair of elements.

```{r}
x + x
```

If the vectors are different lengths, but one length is a multiple of the other, R reuses the shorter vector as needed.

```{r}
x = 1:10
y = c(1,2)
x * y
```

If the vectors are different lengths, but one length is *not* a multiple of the other, R reuses the shorter vector as needed *and* delivers a warning.

```{r}
x = 1:10
y = c(2,3,4)
x * y
```

Typical operations include multiplication ("\*"), addition, subtraction, division, and exponentiation ("\^"), but many operations in R operate on vectors and are then called "vectorized".

## Logical Vectors

Logical vectors are vectors composed of only the values `TRUE` and `FALSE`. Note the all-upper-case and no quotation marks.

```{r}
a = c(TRUE,FALSE,TRUE)

# we can also create a logical vector from a numeric vector
# 0 becomes FALSE, everything else becomes TRUE
b = c(1,0,217)
d = as.logical(b)
d
# test if a and d are the same at every element
all.equal(a,d)

# We can also convert from logical to numeric
as.numeric(a)
```

### Logical Operators

Some operators like `<, >, ==, >=, <=, !=` can be used to create logical vectors.
```{r}
# create a numeric vector
x = 1:10
# testing whether x > 5 creates a logical vector
x > 5
x <= 5
x != 5
x == 5
```

We can also assign the results to a variable:

```{r}
y = (x == 5)
y
```

## Indexing Vectors

In R, an index is used to refer to a specific element or set of elements in a vector (or other data structure). R uses `[` and `]` to perform indexing, although other approaches to getting subsets of larger data structures are common in R.

```{r}
x = seq(0,1,0.1)
# create a new vector from the 4th element of x
x[4]
```

We can even use other vectors to perform the "indexing".

```{r}
x[c(3,5,6)]
y = 3:6
x[y]
```

Combining the concept of indexing with the concept of logical vectors results in a very powerful combination.

```{r}
# use help('rnorm') to figure out what is happening next
myvec = rnorm(10)

# create logical vector that is TRUE where myvec is >0.25
gt1 = (myvec > 0.25)
sum(gt1)
# and use our logical vector to create a vector of myvec values that are >0.25
myvec[gt1]
# or <=0.25 using the logical "not" operator, "!"
myvec[!gt1]
# shorter, one line approach
myvec[myvec > 0.25]
```

## Character Vectors, A.K.A. Strings

R uses the `paste` function to concatenate strings.

```{r}
paste("abc","def")
paste("abc","def",sep="THISSEP")
paste0("abc","def")
## [1] "abcdef"
paste(c("X","Y"),1:10)
paste(c("X","Y"),1:10,sep="_")
```

We can count the number of characters in a string.

```{r}
nchar('abc')
nchar(c('abc','d',123456))
```

Pulling out parts of strings is also sometimes useful.

```{r}
substr('This is a good sentence.',start=10,stop=15)
```

Another common operation is to replace something in a string with something else (a find-and-replace).

```{r}
sub('This','That','This is a good sentence.')
```

When we want to find all strings that match some other string, we can use `grep`, whose name comes from "globally search for a regular expression and print".
```{r}
grep('bcd',c('abcdef','abcd','bcde','cdef','defg'))
grep('bcd',c('abcdef','abcd','bcde','cdef','defg'),value=TRUE)
```

Read about the `grepl` function (`?grepl`). Use that function to return a logical vector (TRUE/FALSE) for each entry above with an `a` in it.

## Missing Values, AKA “NA”

R has a special value, “NA”, that represents a “missing” value, or *Not Available*, in a vector or other data structure. Here, we just create a vector to experiment.

```{r}
x = 1:5
x
length(x)
```

```{r}
is.na(x)
x[2] = NA
x
```

The length of `x` is unchanged, but there is one value that is marked as "missing" by virtue of being `NA`.

```{r}
length(x)
is.na(x)
```

We can remove `NA` values by using indexing. In the following, `is.na(x)` returns a logical vector the length of `x`. The `!` is the logical _NOT_ operator and converts `TRUE` to `FALSE` and vice-versa.

```{r}
x[!is.na(x)]
```
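
As a small follow-up sketch, the chunk below illustrates why removing `NA` values with indexing is useful: summaries such as `sum` and `mean` return `NA` when any element is missing. The `na.rm = TRUE` argument is a base R option not covered in the text above, and the variable name `x_complete` is only illustrative.

```{r}
x = 1:5
x[2] = NA

# a calculation that involves an NA is itself NA
sum(x)

# drop the missing values with indexing, then summarize
x_complete = x[!is.na(x)]
sum(x_complete)

# many summary functions can skip missing values via na.rm = TRUE
sum(x, na.rm = TRUE)
mean(x, na.rm = TRUE)

# count how many values are missing
sum(is.na(x))
```

Indexing with `!is.na(x)` produces a new, shorter vector, whereas `na.rm = TRUE` leaves `x` untouched and only ignores the missing values for that one calculation.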