R has multiple types of control structures that allow for sequential evaluation of statements.
For loops
for (x in set) {operations}
while loops
while (condition) {operations}
If statements (conditional)
if (condition) {
some operations
} else { other operations }
x<-1:9
length(x)
## [1] 9
# a simple conditional then two expressions
if (length(x)<=10) {
x<-c(x,10:20);print(x)}
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
# more complex
if (length(x)<5) {
print(x)
} else {
print(x[5:20])
}
## [1] 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
# print the values of x, one at a time
for (i in x) print(i)
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
## [1] 11
## [1] 12
## [1] 13
## [1] 14
## [1] 15
## [1] 16
## [1] 17
## [1] 18
## [1] 19
## [1] 20
for(i in x) i # note R will not echo in a loop
# loop over a character vector
y<-c('a','b','hi there')
for (i in y) print(i)
## [1] "a"
## [1] "b"
## [1] "hi there"
# and a while loop
j<-1
while(j<10) { # do this while j<10
print(j)
j<-j+2} # at each iteration, increase j by 2
## [1] 1
## [1] 3
## [1] 5
## [1] 7
## [1] 9
Often we want to apply the same function to all the rows or columns of a matrix, or all the elements of a list.
We could do this in a loop, but explicit loops are verbose and can be slow in an interpreted language like R.
R has more compact, and often more efficient, built-in alternatives: the apply functions.
Example: if mat is a matrix and fun is a function (such as mean, var, lm …) that takes a vector as its argument, then you can call:
apply(mat,1,fun) # over rows--second argument is 1
apply(mat,2,fun) # over columns--second argument is 2
In either case, the output is a vector, provided fun returns a single value for each row or column.
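As a minimal sketch of the calls above, the following uses a small hypothetical matrix (mat) and list (lst) that are not part of the original material. It applies sum over rows and columns, then uses sapply(), another member of the apply family, for the list case.

```r
# a small 2 x 3 matrix, filled column by column (illustrative data only)
mat <- matrix(1:6, nrow = 2)
mat
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6

row_sums <- apply(mat, 1, sum)   # over rows
row_sums
## [1]  9 12

col_sums <- apply(mat, 2, sum)   # over columns
col_sums
## [1]  3  7 11

# for the elements of a list, sapply() plays the same role
lst <- list(a = 1:3, b = c(10, 20))
list_means <- sapply(lst, mean)  # named vector: a = 2, b = 15
```

Because sum returns one value per row or column, each apply() call here gives a plain vector whose length matches the number of rows or columns.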
Using the matrix and rnorm functions, create a matrix with 20 rows and 10 columns (200 values total) of random normal deviates.
Compute the mean for each row of the matrix.
Compute the median for each column.
Functions are objects and are assigned to names, just like data.
myFunction = function(argument1,argument2) {
expression1
expression2
}
We write functions for anything we need to do again and again.
You may test your commands interactively at first, and then use the history() feature and an editor to create the function.
It is wise to include a comment at the start of each function to say what it does and to document functions of more than a few lines.
add1 = function(x) {
# this function adds one to the first argument and returns it
x + 1
}
add1(17)
## [1] 18
add1(c(17,18,19,20))
## [1] 18 19 20 21
rn = rnorm(1e6)
# loop
system.time(
for (i in seq_along(rn)) {
rn[i] = rn[i] + 1
}
)
# vectorized
system.time(
{rn = rn + 1}
)
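Before comparing timings, it can help to confirm that the two approaches compute the same thing. The sketch below assumes a smaller vector and an arbitrary seed, and the names v, v_loop and v_vec are illustrative only.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
v <- rnorm(1e4)                  # smaller than the vector above, to keep it quick

# loop version: update one element at a time
v_loop <- v
for (i in seq_along(v_loop)) {
  v_loop[i] <- v_loop[i] + 1
}

# vectorized version: a single expression operates on the whole vector
v_vec <- v + 1

same <- identical(v_loop, v_vec)
same
## [1] TRUE
```

Since both versions give identical results, any difference reported by system.time() reflects only how the work is expressed, not what is computed.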
Create a function that takes a numeric vector and calculates the mean without using the R mean function.
Modify the function above so that it can calculate the “trimmed mean” by adding a second argument that specifies the proportion of data to trim from each end of the numeric vector before calculating the mean. The definition of trimmed mean is:
A trimmed mean (similar to an adjusted mean) is a method of averaging that removes a small designated percentage of the largest and smallest values before calculating the mean. After removing the specified outlier observations, the trimmed mean is found using a standard arithmetic averaging formula. The use of a trimmed mean helps eliminate the influence of outliers or data points on the tails that may unfairly affect the traditional mean.
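To make the definition concrete, here is a small worked example using the trim argument of the built-in mean(), which may also serve as a reference when checking your own function. The data in x are made up for illustration.

```r
x <- c(1:9, 1000)                  # hypothetical data with one extreme value

plain <- mean(x)                   # ordinary mean, pulled up by the outlier
plain
## [1] 104.5

trimmed <- mean(x, trim = 0.1)     # drop 10% of the values from each end first
trimmed
## [1] 5.5

by_hand <- mean(2:9)               # same result with 1 and 1000 removed by hand
by_hand
## [1] 5.5
```

With ten values and a trim proportion of 0.1, one value is removed from each end, so the trimmed mean is the average of the remaining eight.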
Use the system.time() function to time your mean function with a vector of length 1000. Do the same with the built-in R version of mean, mean(). Is there a difference in timings?
Do you believe that these timings could show a difference?
Use the microbenchmark package to compare the performance of your mean function to that of R's built-in mean().
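If you have not used microbenchmark before, the sketch below shows the general shape of a call. It assumes the package is installed (it is skipped otherwise) and deliberately compares two unrelated expressions, sqrt(x) and x^0.5, rather than the mean functions from the exercise.

```r
x <- runif(100)   # illustrative input only

# run only if the microbenchmark package is available
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  res <- microbenchmark::microbenchmark(
    sqrt(x),      # first expression to time
    x^0.5,        # second expression to time
    times = 100   # number of times each expression is evaluated
  )
  print(res)      # summary of the timing distribution per expression
}
```

Each expression is evaluated many times, so differences far smaller than system.time() can resolve in a single run become visible.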
Write a function that takes as input a string (character vector of length 1) and counts the number of occurrences of each letter (after converting to lower case). Take a look at the tolower(), strsplit(), and table() functions to help you with this task. Then, modify the function to return the proportion of each letter rather than the count. Would this be useful for any biological data?