What is R?

  • A software package
  • A programming language
  • A toolkit for developing statistical and analytical tools
  • An extensive library of statistical and mathematical software and algorithms
  • A scripting language

Why R?

  • R is cross-platform and runs on Windows, Mac, and Linux (as well as more obscure systems).
  • R provides a vast number of useful statistical tools, many of which have been painstakingly tested.
  • R produces publication-quality graphics in a variety of formats.
  • R plays well with FORTRAN, C, and scripts in many languages.
  • R scales, making it useful for small and large projects. It is NOT Excel.
  • R eschews the GUI.

I can develop code for analysis on my Mac laptop. I can then install the same code on our massive computer cluster and run it in parallel on 1000 samples, monitor the process, and then update a database with R when complete.

Why not R?

  • R cannot do everything.
  • R is not always the "best" tool for the job.
  • R will not hold your hand.
  • The documentation can be opaque.
  • R can drive you crazy (on a good day) or age you prematurely (on a bad one).
  • Finding the right package to do the job you want to do can be challenging; worse, some contributed packages are unreliable.
  • R eschews the GUI.

R License and the Open Source Ideal

  • R is free!
  • Distributed under the GNU General Public License (GPL)
    • You may download the source code.
    • You may modify the source code to your heart's content.
    • You may distribute the modified source code and even charge money for it,
    • but you must distribute the modified source code under the original GNU license.

Take-home Message

This license means that R will always be available, will always be open source, and can grow organically without constraint.

Getting started


Installing RStudio

The next step is to install RStudio, a program for viewing and running R scripts. Technically you can run all the code shown here without installing RStudio, but we highly recommend this integrated development environment (IDE).

R on your own

Try out the swirl tutorial, which teaches you R programming and data science interactively, at your own pace and in the R console. Once you have R installed, you can install swirl and run it the following way:
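
A minimal sketch of installing and starting swirl, assuming an internet connection and a reachable CRAN mirror. The `interactive()` guard is an addition so the snippet also runs harmlessly as a script:

```r
# One-time installation from CRAN (skipped if swirl is already installed)
if (!requireNamespace("swirl", quietly = TRUE)) {
  install.packages("swirl")
}

# Load the package and start the tutorial.
# swirl() needs an interactive console, so it is only called there.
library(swirl)
if (interactive()) swirl()
```

Once started, swirl prompts for a name and a course and then guides you from the console.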


Quick R references

There are also many open and free resources and reference guides for R. Two examples are:

  • Quick-R: a quick online reference for data input, basic statistics and plots
  • R reference card PDF by Tom Short

R from zero

Follow along

  • In RStudio, copy and paste the following:
  • In the file pane, you can choose the HourOfCode.Rmd file and it will open in the RStudio text pane.

Interacting with R


1 + pi + sin(3.7)
## [1] 3.611757



  • The <-, -> and = are all assignment operators.
x = 1
y <- 2
3 -> z
  • If a line is not a complete R command, R shows a + prompt on the next line and waits for the rest of the command.
1 + pi +

Getting help

R has extensive help functionality built in.

  • For any new function that you see, type help(newfunction).
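
A few common ways to reach the built-in help, shown as a sketch. The functions looked up here (`mean`, `read.csv`) are only illustrations:

```r
help(mean)        # open the help page for mean()
?mean             # shorthand for help(mean)
example(mean)     # run the examples from the bottom of the help page
help.search("standard deviation")  # search installed help pages by keyword
args(read.csv)    # print a function's argument list without opening help
```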

First steps in R

Paths and the Working Directory

When you are working in R it is useful to know your working directory. This is the directory or folder in which R will save or look for files by default. You can see your working directory by typing:
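
A short sketch of checking the working directory with `getwd()`. The path passed to `setwd()` is hypothetical and left commented out:

```r
wd <- getwd()   # current working directory as a character string
print(wd)

# To change it, pass a path to setwd(); the path below is hypothetical
# setwd("~/projects/r-intro")

# List the files R can see in the working directory
list.files()
```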


Loading data into R

R can read files of many different types and from many different sources.

Directly from the web

dir <- ""
url <- paste0(dir, "femaleMiceWeights.csv")
dat <- read.csv(url)

Download first

library(downloader) # install once with install.packages("downloader")
dir <- ""
filename <- "femaleMiceWeights.csv"
url <- paste0(dir, filename)
# Download only if the file is not already in the working directory
if (!file.exists(filename)) download(url, destfile = filename)
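
Once a CSV file is on disk, `read.csv` turns it into a data.frame. The sketch below writes a tiny stand-in file to a temporary location so it runs without a download. The four rows reuse values that appear in the femaleMiceWeights.csv output in these slides, and the name `mice` is only illustrative:

```r
# Write a small CSV to a temporary file so the example is self-contained
tmp <- tempfile(fileext = ".csv")
writeLines(c("Diet,Bodyweight",
             "chow,21.51",
             "chow,28.14",
             "hf,29.58",
             "hf,30.92"), tmp)

mice <- read.csv(tmp)  # parse the file into a data.frame
str(mice)              # show column names and types
```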

Working with data

head(dat) # first six rows
##   Diet Bodyweight
## 1 chow      21.51
## 2 chow      28.14
## 3 chow      24.04
## 4 chow      23.45
## 5 chow      23.68
## 6 chow      19.79

tail(dat) # last six rows
##    Diet Bodyweight
## 19   hf      29.58
## 20   hf      30.92
## 21   hf      34.02
## 22   hf      21.90
## 23   hf      31.53
## 24   hf      20.73

summary(dat) # per-column summary
##    Diet      Bodyweight
##  chow:12   Min.   :19.79
##  hf  :12   1st Qu.:22.36
##            Median :25.16
##            Mean   :25.32
##            3rd Qu.:28.14
##            Max.   :34.02

dim(dat) # number of rows and columns
## [1] 24  2


dplyr filter

library(dplyr) # provides filter, select, and %>%
chow <- filter(dat, Diet == "chow") # keep only the rows with the chow diet
head(chow)
##   Diet Bodyweight
## 1 chow      21.51
## 2 chow      28.14
## 3 chow      24.04
## 4 chow      23.45
## 5 chow      23.68
## 6 chow      19.79

dplyr select

chowVals <- select(chow, Bodyweight) # keep only the Bodyweight column
head(chowVals)
##   Bodyweight
## 1      21.51
## 2      28.14
## 3      24.04
## 4      23.45
## 5      23.68
## 6      19.79


chowVals <- filter(dat, Diet=="chow") %>% select(Bodyweight) %>% unlist
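
The pipe `%>%` passes the result on its left as the first argument of the function on its right, so a piped chain is just another way to write nested calls. A small sketch of that equivalence, using a hypothetical four-row stand-in for `dat` (named `dat_small` here) built from values shown above:

```r
library(dplyr)

# Small stand-in for dat, reusing a few values from the output above
dat_small <- data.frame(
  Diet = c("chow", "chow", "hf", "hf"),
  Bodyweight = c(21.51, 28.14, 29.58, 30.92)
)

# Piped form: read left to right
piped <- dat_small %>%
  filter(Diet == "chow") %>%
  select(Bodyweight) %>%
  unlist()

# Equivalent nested form: read from the inside out
nested <- unlist(select(filter(dat_small, Diet == "chow"), Bodyweight))

identical(piped, nested)  # compare the two results
```

The final `unlist` step turns the one-column data frame into a plain numeric vector, which is what most statistical functions expect.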

Plotting with ggplot2

ggplot2 package

The ggplot2 package is a relatively novel approach to generating highly informative publication-quality graphics. The "gg" stands for "Grammar of Graphics". In short, instead of thinking about a single function that produces a plot, ggplot2 uses a "grammar" approach, akin to building more and more complex sentences to layer on more information or nuance.
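
To make the "grammar" idea concrete, here is a minimal sketch that builds one plot layer by layer. It assumes ggplot2 is installed and uses mtcars, a dataset that ships with R. The choice of variables and layers is only an illustration:

```r
library(ggplot2)

# Start with the data and the mapping from variables to aesthetics
p <- ggplot(mtcars, aes(x = wt, y = mpg))

# Add one layer at a time, like extending a sentence
p <- p + geom_point()                            # raw observations
p <- p + geom_smooth(method = "lm", se = FALSE)  # fitted linear trend
p <- p + labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

print(p)  # draw the plot
```

Each `+` adds a component to the same plot object, so layers can be added, removed, or swapped without rewriting the rest.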

Data Model

The ggplot2 package assumes that data are in the form of a data.frame. In some cases, the data will need to be manipulated into a form that matches assumptions that ggplot2 uses. In particular, if one has a matrix of numbers associated with different subjects (samples, people, etc.), the data will usually need to be transformed into a "long" data frame.
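
A sketch of the matrix-to-long transformation in base R. The matrix `m`, its subject and time labels, and its values are all hypothetical:

```r
# Hypothetical wide matrix: one row per subject, one column per time point
m <- matrix(c(5.1, 4.8,
              6.0, 5.5,
              6.4, 5.9),
            nrow = 2,
            dimnames = list(c("s1", "s2"), c("t1", "t2", "t3")))

# as.vector() reads the matrix column by column, so the subject labels
# cycle fastest and each time label repeats once per subject
long <- data.frame(
  subject = rep(rownames(m), times = ncol(m)),
  time    = rep(colnames(m), each = nrow(m)),
  value   = as.vector(m)
)

long  # one row per (subject, time) measurement
```

Packages such as tidyr offer helpers for the same reshaping (for example `pivot_longer`), but the base R version shows what the "long" form actually contains.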

Getting started

To use the ggplot2 package, it must be installed and loaded. Assuming that installation has been done already, we can load the package directly:
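
A minimal sketch of the loading step, with the one-time installation left as a comment:

```r
# install.packages("ggplot2")  # one-time installation, if needed
library(ggplot2)               # attach the package for this session
```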


Playing with ggplot2

mtcars data

We are going to use the mtcars dataset, included with R, to experiment with ggplot2.

  • Exercise: Explore the mtcars dataset using View, summary, dim, class, etc.
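
One possible starting point for the exercise, as a sketch. `View` is left commented out because it only works in an interactive session such as RStudio:

```r
class(mtcars)        # what kind of object is it?
dim(mtcars)          # number of rows and columns
head(mtcars, 3)      # first three rows
summary(mtcars$mpg)  # quartiles and mean for one column
# View(mtcars)       # spreadsheet-style viewer; interactive sessions only
```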

Pairs plot

We can also take a quick look at the relationships between the variables using the pairs plotting function.
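
A sketch of the pairs plot on mtcars. The four-column subset in the second call is an arbitrary choice to keep the panels readable:

```r
# Scatterplot matrix of every pair of columns in mtcars
pairs(mtcars)

# With 11 columns the panels get small; a subset is easier to read
pairs(mtcars[, c("mpg", "wt", "hp", "disp")])
```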


Go to the vignette



Literate programming


