## Control Structures in R

-   [R has several types of control structures that control the order
    in which statements are evaluated, such as loops and conditionals.]{}

-   [For loops]{}

        for (x in set) {operations}

-   [While loops]{}

        while (condition) {operations}

-   [If statements (conditional)]{}

        if (condition) {
          some operations
        } else {
          other operations
        }

### Control Structure and Looping Examples

```{r}
    x<-1:9
    length(x)
    # a simple conditional then two expressions
    if (length(x)<=10) {
       x<-c(x,10:20);print(x)}
    # more complex 
    if (length(x)<5) {
        print(x)
    } else {
        print(x[5:20])
    }           
    # print the values of x, one at a time
    for (i in x) print(i) 
    for (i in x) i   # shows nothing: values are not auto-printed inside a loop

    # loop over a character vector
    y<-c('a','b','hi there')            
    for (i in y) print(i)

    # and a while loop
    j<-1                
    while(j<10) { # do this while j<10      
      print(j)
      j<-j+2} # at each iteration, increase j by 2
```

## Applying

### Why Does R Have Apply Functions?

-   [Often we want to apply the same function to all the rows or columns
    of a matrix, or all the elements of a list.]{}

-   [We could do this in a loop, but explicit loops are more verbose and
    can be slow in an interpreted language like R.]{}

-   [R has built-in functions for this, the apply functions, which are
    more concise and often more efficient.]{}

Example: if `mat` is a matrix and `fun` is a function (such as `mean`,
`var`, `median` ...) that takes a vector as its argument, then you can:

    apply(mat,1,fun) # over rows--second argument is 1      
    apply(mat,2,fun) # over columns--second argument is 2

In either case, the output is a *vector* when `fun` returns a single
value for each row or column.
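
As a minimal sketch of the two calls above, assume a small hypothetical matrix `mat` and use `sum` as the function. The explicit loop at the end computes the same row sums for comparison.

```r
# Hypothetical 2 x 3 matrix, filled column by column:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
mat <- matrix(1:6, nrow = 2)

row_sums <- apply(mat, 1, sum)  # one value per row:    9 12
col_sums <- apply(mat, 2, sum)  # one value per column: 3 7 11

# The same row sums written as an explicit loop
loop_sums <- numeric(nrow(mat))
for (i in seq_len(nrow(mat))) {
  loop_sums[i] <- sum(mat[i, ])
}

print(row_sums)
print(col_sums)
print(all(row_sums == loop_sums))  # TRUE
```

The loop needs a pre-allocated result vector and an index, while `apply` expresses the same operation in one line.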

### Apply Function Exercise

1.  [Using the matrix and rnorm functions, create a matrix with 20 rows
    and 10 columns (200 values total) of random normal deviates.]{}

2.  [Compute the mean for each row of the matrix.]{}

3.  [Compute the median for each column.]{}

### Related Apply Functions

-   [`lapply(list, function)` applies the function to every element of
    list]{}

-   [`sapply(list or vector, function)` applies the function to every
    element of list or vector, and returns a vector, when possible
    (easier to process)]{}
    
-   [`vapply(list or vector, function, FUN.VALUE)` works like `sapply`,
    but `FUN.VALUE` declares the type and length of each result, so the
    return type is predictable and a mismatch raises an error]{}

-   [`tapply(x, factor, fun)` uses the factor to split vector x into
    groups, and then applies fun to each group]{}


```{r}
    # create a list
    my.list <- list(a=1:3,b=5:10,c=11:20)
    my.list
    # Get the mean for each member of the list
    # return a vector
    sapply( my.list, mean)
    # Get the full summary for each member of
    # the list, returned as a list
    lapply( my.list, summary)
    # Find the mean for each group defined by a factor
    my.vector <- 1:10
    my.factor <- factor(
      c(1,1,1,2,2,2,3,3,3,3))
    tapply(my.vector, my.factor, mean)
```
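
The chunk above does not exercise `vapply`. As a small sketch, the code below uses a hypothetical list `my.list2` with the same values as `my.list`, so that it stands alone. The third argument, `FUN.VALUE`, is a template for what each result must look like.

```r
my.list2 <- list(a = 1:3, b = 5:10, c = 11:20)

# Each mean must be a single numeric value
means <- vapply(my.list2, mean, FUN.VALUE = numeric(1))
print(means)  # a = 2, b = 7.5, c = 15.5

# range() returns two values per element, so declaring numeric(1)
# makes vapply() stop with an error instead of changing shape silently
result <- tryCatch(
  vapply(my.list2, range, FUN.VALUE = numeric(1)),
  error = function(e) "error: result does not match FUN.VALUE"
)
print(result)
```

With `sapply`, the second call would have quietly returned a matrix; `vapply` trades that flexibility for a guaranteed return type.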

## Functions

-   [Functions are objects and are assigned to names, just like data.]{}

        myFunction = function(argument1,argument2) {
          expression1
          expression2
        }

-   [We write functions for anything we need to do again and again.]{}

-   [You may test your commands interactively at first, and then use the
    `history()` feature and an editor to create the function.]{}

-   [It is wise to include a comment at the start of each function to
    say what it does and to document functions of more than a few
    lines.]{}

### Example Functions

    add1 = function(x) {
        # this function adds one to the first argument and returns it
        x + 1
    }
    add1(17)
    ## [1] 18
    add1(c(17,18,19,20))
    ## [1] 18 19 20 21
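
Functions can also take several arguments, and an argument can be given a default value. The sketch below uses a hypothetical function `addN` (not part of the original examples) that generalizes `add1`.

```r
addN <- function(x, n = 1) {
  # this function adds n to every element of x; n defaults to 1
  x + n
}

addN(17)                # 18, same result as add1(17)
addN(c(17, 18), n = 5)  # 22 23
```

Because `n` has a default, callers who only want the `add1` behavior can omit it.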

## Exercises

- Use `system.time()` to compare the two pieces of code below. Both accomplish the same thing: adding 1 to every value of the vector `rn`.

```{r eval=TRUE, results='hide'}
rn = rnorm(1e6)
system.time(
for (i in seq_along(rn)) {
  rn[i] = rn[i] + 1
}
)
# vectorized
system.time(
    {rn = rn + 1}
)
```

- Create a function that takes a numeric vector and calculates the mean without using the R `mean` function.

- Modify the function above so that it can calculate the "trimmed mean" by adding a second argument that specifies the proportion of data to trim from each end of the sorted numeric vector before calculating the mean. The definition of trimmed mean is:

> A trimmed mean (similar to an adjusted mean) is a method of averaging that removes a small designated percentage of the largest and smallest values before calculating the mean. After removing the specified outlier observations, the trimmed mean is found using a standard arithmetic averaging formula. The use of a trimmed mean helps eliminate the influence of outliers or data points on the tails that may unfairly affect the traditional mean.

- Use the `system.time()` function to time your mean function with a vector of length 1000. Do the same with the built-in R function `mean()`. Is there a difference in timings? Do you believe that timings this short could reliably show a difference?

- Use the [microbenchmark package](https://www.r-bloggers.com/using-the-microbenchmark-package-to-compare-the-execution-time-of-r-expressions/) to compare the performance of your mean function to that of the `mean()` function built into R.

- Write a function that takes as input a string (character vector of length 1) and counts the number of occurrences of each letter (after converting to lower case). Take a look at the `tolower()`, `strsplit()`, and `table()` functions to help you with this task. Then, modify the function to return the proportion of each letter rather than the count. Would this be useful for any biological data?