As in many programming languages, understanding how data are stored and manipulated is key to using R effectively. In the next few sections, we introduce some basic R data types and structures, along with some general approaches for working with them.

# Vectors

In R, even a single value is a vector of length 1.

```{r}
z = 1
z
length(z)
```

In the code above, we "assigned" the value 1 to the variable named `z`. Typing `z` by itself is an "expression" that returns a result which is, in this case, the value that we just assigned. The `length` function takes an R object and returns its length, that is, the number of elements it contains. There are numerous ways of asking R what an object represents, and `length` is one of them.
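
Beyond `length`, a few other base R functions report what an object is. The short sketch below uses `class`, `typeof`, `is.vector`, and `str`; the variable name `one_value` is chosen only for illustration.

```{r}
# a single number is a numeric vector of length 1
one_value = 1
class(one_value)      # the class of the object
typeof(one_value)     # how R stores the value internally
is.vector(one_value)  # a single value is still a vector
str(one_value)        # compact summary of the structure
```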

Vectors can contain numbers, strings (character data), logical values (`TRUE` and `FALSE`), or other "atomic" data types (Table \@ref(tab:simpletypes)). *Vectors cannot contain a mix of types!* We will introduce another data structure, the R `list`, for situations when we need to store a mix of base R data types.

Table: (\#tab:simpletypes) Atomic (simplest) data types in R.

  Data type   Stores
  ----------- ------------------------
  numeric     floating point numbers
  integer     integers
  complex     complex numbers
  factor      categorical data
  character   strings
  logical     TRUE or FALSE
  NA          missing
  NULL        empty
  function    function type

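Because a vector holds a single type, R quietly converts ("coerces") mixed input to one common type. The following is a minimal sketch, with illustrative variable names, that contrasts `c` with `list`:

```{r}
# mixing logical and numeric values gives a numeric vector
mixed_num = c(TRUE, 2, 3.5)
class(mixed_num)
# adding a string turns every element into character data
mixed_chr = c(TRUE, 2, 'three')
class(mixed_chr)
# a list keeps the original type of each element
mixed_list = list(TRUE, 2, 'three')
sapply(mixed_list, class)
```
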
## Creating vectors

Character vectors (also sometimes called "string" vectors) are entered with each value
surrounded by single or double quotes; either is acceptable, but they
must match. They are always displayed by R with double quotes. Here are some examples of creating vectors:

```{r}
# examples of vectors
c('hello','world')
c(1,3,4,5,1,2)
c(1.12341e7,78234.126)
c(TRUE,FALSE,TRUE,TRUE)
# note how in the next case the TRUE is converted to "TRUE"
# with quotes around it.
c(TRUE,'hello')
```

We can also create vectors as "regular sequences" of numbers. For example:

```{r}
# create a vector of integers from 1 to 10
x = 1:10
# and backwards
x = 10:1
```

The `seq` function can create more flexible regular sequences. 
  
```{r}
# create a vector of numbers from 1 to 4 in steps of 0.3
y = seq(1,4,0.3)
```    
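
As a further sketch, `seq` can also be told how many values to produce (the `length.out` argument) or be given a negative step. The base R function `rep`, which is not covered in the text above, builds a vector by repetition.

```{r}
# five evenly spaced values between 0 and 1
seq(0, 1, length.out = 5)
# count down from 10 in steps of 3
seq(10, 1, by = -3)
# repeat a short vector three times
rep(c(1, 2), times = 3)
```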


We can also create a new vector by concatenating existing vectors.

```{r}
# create a sequence by concatenating two other sequences
z = c(y,x)
z
```


## Vector Operations

Operations on a single vector are typically done element-by-element. For example, when we add `2` to a vector, `2` is added to each element of the vector and a new vector of the same length is returned.

```{r}
x = 1:10
x + 2
```

If the operation involves two vectors, the following rules apply. If the vectors are the same length, R simply applies the operation to each pair of elements.

```{r}
x + x
```

If the vectors are different lengths, but one length is a multiple of the other, R
reuses ("recycles") the shorter vector as needed.

```{r}
x = 1:10
y = c(1,2)
x * y
```
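
One way to picture this reuse is to write out the recycled vector by hand. The sketch below uses illustrative names (`long_vec`, `short_vec`, `recycled`) and the base R function `rep`; it is meant only to show that the explicit and automatic versions agree.

```{r}
long_vec = 1:10
short_vec = c(1, 2)
# write out the recycled version of the shorter vector explicitly
recycled = rep(short_vec, length.out = length(long_vec))
recycled
# the explicit product matches the automatic one at every element
all(long_vec * recycled == long_vec * short_vec)
```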

If the vectors are different lengths, and one length is *not* a multiple of the
other, R reuses the shorter vector as needed *and* issues a
warning.

```{r}
x = 1:10
y = c(2,3,4)
x * y
```


Typical operations include addition, subtraction, multiplication ("\*"),
division, and exponentiation ("\^"). Many other functions in R also
operate on whole vectors at once; such functions are called "vectorized".
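
A few base R functions illustrate the idea. In this sketch (the name `vals` is only for illustration), some functions return one result per element while others summarize the whole vector into a single value.

```{r}
vals = c(1, 4, 9, 16)
# these are applied to every element
sqrt(vals)
vals^2
# these summarize the whole vector into one value
sum(vals)
mean(vals)
```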

## Logical Vectors

Logical vectors are vectors composed of only the values `TRUE` and
`FALSE`. Note the all-upper-case and no quotation marks.

```{r}
a = c(TRUE,FALSE,TRUE)

# we can also create a logical vector from a numeric vector
# 0 becomes FALSE, every other number becomes TRUE
b = c(1,0,217)
d = as.logical(b)
d
# test if a and d are the same at every element
all.equal(a,d)

# We can also convert from logical to numeric
as.numeric(a)
```

### Logical Operators

Comparison operators such as `<`, `>`, `==`, `>=`, `<=`, and `!=` can be used to create logical
vectors.

```{r}
# create a numeric vector
x = 1:10
# testing whether x > 5 creates a logical vector
x > 5
x <= 5
x != 5
x == 5
```

We can also assign the results to a variable:

```{r}
y = (x == 5)
y
```
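
Logical vectors can themselves be combined. As a brief sketch, the element-wise operators `&` (and) and `|` (or) join two comparisons, while `any` and `all` reduce a logical vector to a single value. The name `nums` is only illustrative.

```{r}
nums = 1:10
# element-wise AND: greater than 3 and less than 8
nums > 3 & nums < 8
# element-wise OR: less than 3 or greater than 8
nums < 3 | nums > 8
# is at least one element greater than 9?
any(nums > 9)
# are all elements greater than 0?
all(nums > 0)
```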


## Indexing Vectors

In R, an index is used to refer to a specific element or
set of elements in a vector (or other data structure). R uses `[` and `]` to perform indexing,
although other approaches to getting subsets of larger data
structures are common in R.

```{r}
x = seq(0,1,0.1)
# create a new vector from the 4th element of x
x[4]
```

We can even use other vectors to perform the "indexing".

```{r}
x[c(3,5,6)]
y = 3:6
x[y]
```
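
Two related uses of `[` and `]` are sketched here with an illustrative vector, `letters_vec`: a negative index drops elements rather than selecting them, and indexing on the left-hand side of an assignment replaces elements.

```{r}
letters_vec = c('a', 'b', 'c', 'd', 'e')
# a negative index drops that element
letters_vec[-1]
# drop several elements at once
letters_vec[-c(1, 2)]
# indexing on the left-hand side replaces an element
letters_vec[2] = 'Z'
letters_vec
```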


Combining the concept of indexing with the concept of logical vectors
results in a very powerful combination.


```{r}
# use help('rnorm') to figure out what is happening next
myvec = rnorm(10)

# create logical vector that is TRUE where myvec is >0.25
gt1 = (myvec > 0.25)
sum(gt1)
# and use our logical vector to create a vector of myvec values that are >0.25
myvec[gt1]
# or <=0.25 using the logical "not" operator, "!"
myvec[!gt1]
# shorter, one line approach
myvec[myvec > 0.25]
```
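
The same idea works for changing values, not just extracting them. This sketch uses a small fixed vector (the hypothetical `scores`) so that the results are the same on every run; `which` reports the positions of the `TRUE` values.

```{r}
scores = c(-1.2, 0.4, 3.1, -0.3, 2.2)
# set every negative value to zero
scores[scores < 0] = 0
scores
# count the values above 1
sum(scores > 1)
# and find their positions
which(scores > 1)
```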


## Character Vectors, A.K.A. Strings

R uses the `paste` function to concatenate strings.

```{r}
paste("abc","def")
paste("abc","def",sep="THISSEP")
paste0("abc","def")
paste(c("X","Y"),1:10)
paste(c("X","Y"),1:10,sep="_")
```
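
The `sep` argument goes between the pieces being pasted together. To join the elements of a single vector into one string, `paste` also accepts a `collapse` argument, as in this short sketch (the name `words` is illustrative):

```{r}
words = c('a', 'b', 'c')
# collapse joins the elements of one vector into a single string
paste(words, collapse = '-')
# sep and collapse can be used together
paste('sample', 1:3, sep = '_', collapse = ', ')
```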

We can count the number of characters in a string.

```{r}
nchar('abc')
nchar(c('abc','d',123456))
```
        
Pulling out parts of strings is also sometimes useful.

```{r}
substr('This is a good sentence.',start=10,stop=15)
```

Another common operation is to replace something in a string with something else (a find-and-replace).

```{r}
sub('This','That','This is a good sentence.')
```
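
Note that `sub` replaces only the first match it finds. The related base R function `gsub` replaces every match, as this small sketch with an illustrative `sentence` shows:

```{r}
sentence = 'one fish, two fish'
# sub replaces only the first match
sub('fish', 'cat', sentence)
# gsub replaces every match
gsub('fish', 'cat', sentence)
```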

When we want to find all strings that match a pattern, we can use `grep`, whose name comes from "global regular expression print".

```{r}
grep('bcd',c('abcdef','abcd','bcde','cdef','defg'))
grep('bcd',c('abcdef','abcd','bcde','cdef','defg'),value=TRUE)
```
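
Because the first argument to `grep` is a regular expression, it can do more than match a fixed piece of text. As a small sketch, `^` anchors a pattern to the start of a string and `$` anchors it to the end; the name `strs` is only for illustration.

```{r}
strs = c('abcdef', 'abcd', 'bcde', 'cdef', 'defg')
# '^' requires the match to be at the start of the string
grep('^bcd', strs, value = TRUE)
# '$' requires the match to be at the end of the string
grep('ef$', strs, value = TRUE)
```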

Read about the `grepl` function (`?grepl`). Use that function to
return a logical vector (TRUE/FALSE) indicating whether each entry above
has an `a` in it.

## Missing Values, AKA "NA"

R has a special value, `NA`, that represents a "missing" value, or *Not Available*, in a
vector or other data structure. Here, we just create a vector to experiment.

```{r}
x = 1:5
x
length(x)
```

```{r}
is.na(x)
x[2] = NA
x
```

The length of `x` is unchanged, but there is one value that is marked as "missing" by virtue of being `NA`.

```{r}
length(x)
is.na(x)
```

We can remove `NA` values by using indexing. In the following, `is.na(x)` returns a logical vector with the
same length as `x`. The `!` is the logical _NOT_ operator and converts `TRUE` to `FALSE` and vice-versa.

```{r}
x[!is.na(x)]
```
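
Missing values also affect calculations. In this sketch (with an illustrative vector, `with_na`), a summary of a vector containing `NA` is itself `NA` unless we ask the function to drop missing values with `na.rm = TRUE`.

```{r}
with_na = c(1, NA, 3, 4, 5)
# a missing value makes the result missing too
sum(with_na)
mean(with_na)
# na.rm = TRUE drops the NA values before computing
sum(with_na, na.rm = TRUE)
mean(with_na, na.rm = TRUE)
```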