16 Graphics and Visualization

Load the BRFSS-subset.csv data

path <- "BRFSS-subset.csv"   # or file.choose()
brfss <- read.csv(path)

Clean the data by coercing Year to a factor

brfss$Year <- factor(brfss$Year)
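
The effect of the coercion can be checked on a small example. The sketch below uses a made-up stand-in data frame (`toy`, with hypothetical values, not the BRFSS data) so that it runs without the CSV file. It shows that a factor stores the distinct years as labelled levels, and that comparisons such as `Year == 2010` still work after coercion.

```r
# Toy stand-in for the Year column (hypothetical values, not the BRFSS data)
toy <- data.frame(Year = c(1990, 2010, 2010, 1990))
is.numeric(toy$Year)          # numeric before coercion

toy$Year <- factor(toy$Year)  # same coercion as applied to brfss$Year
levels(toy$Year)              # the distinct years, now stored as labels
table(toy$Year)               # number of rows per level

# comparisons against a factor match on the level label
sum(toy$Year == 2010)         # rows recorded in 2010
```

Treating Year as a factor is what lets later plots handle it as a grouping variable rather than a continuous number.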

16.1 Base R Graphics

Useful for quick exploration during a normal workflow.

  • Main functions: plot(), hist(), boxplot(), …
  • Graphical parameters – see ?par, but often provided as arguments to plot(), etc.
  • Construct complicated plots by layering information, e.g., points, regression line, annotation.

    brfss2010Male <- subset(brfss, (Year == 2010) & (Sex == "Male"))
    fit <- lm(Weight ~ Height, brfss2010Male)
    
    plot(Weight ~ Height, brfss2010Male, main="2010, Males")
    abline(fit, lwd=2, col="blue")
    points(180, 90, pch=20, cex=3, col="red")

  • Approach to complicated graphics: create a grid of panels (e.g., par(mfrow=c(1, 2))), populate with plots, restore original layout.

    brfssFemale <- subset(brfss, Sex=="Female")
    
    opar <- par(mfrow=c(2, 1))    # layout: 2 'rows' and 1 'column'
    hist(                         # first panel -- 1990
        brfssFemale[ brfssFemale$Year == 1990, "Weight" ],
        main = "Female, 1990")
    hist(                         # second panel -- 2010
        brfssFemale[ brfssFemale$Year == 2010, "Weight" ],
        main = "Female, 2010")

    par(opar)                      # restore original layout
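
The set / plot / restore pattern can be wrapped in a function, so that the layout is restored even if one of the plotting calls fails. The following is a minimal sketch, not part of the original material: `stackedHist()` is a hypothetical helper name, and the choice of about 20 shared break points is an assumption. Sharing the breaks puts both panels on the same x-axis, which makes the two histograms easier to compare.

```r
# Hypothetical helper: two stacked histograms on a common x-axis.
stackedHist <- function(x1, x2, main1 = "", main2 = "") {
    opar <- par(mfrow = c(2, 1))   # par() returns the previous settings
    on.exit(par(opar))             # restore them when the function exits

    # break points computed from both vectors, so the panels share a scale
    breaks <- pretty(range(c(x1, x2), na.rm = TRUE), n = 20)

    h1 <- hist(x1, breaks = breaks, main = main1, xlab = "")
    h2 <- hist(x2, breaks = breaks, main = main2, xlab = "")
    invisible(list(h1, h2))        # return the histogram objects quietly
}

# With the data from this section the call would be, for example:
# stackedHist(
#     brfssFemale[ brfssFemale$Year == 1990, "Weight" ],
#     brfssFemale[ brfssFemale$Year == 2010, "Weight" ],
#     "Female, 1990", "Female, 2010")
```

Using on.exit() rather than a trailing par(opar) is a design choice: the restore step runs however the function ends.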

16.2 What makes for a good graphical display?

  • Common scales for comparison
  • Efficient use of space
  • Careful color choice – qualitative, gradient, divergent schemes; color blind aware; …
  • Emphasis on data rather than labels
  • Convey statistical uncertainty
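
The three kinds of color scheme can be previewed with functions that ship with base R. This is a sketch under stated assumptions: it needs R 4.0 or later for palette.colors(), the particular palettes ("Okabe-Ito", "viridis", "Blue-Red") are one possible choice rather than a recommendation from the original text, and `showPalette()` is a hypothetical helper defined only for this illustration.

```r
# One palette of each kind, using base R (grDevices) only
qualitative <- palette.colors(4, palette = "Okabe-Ito")  # unordered groups; designed to be color-blind aware
gradient    <- hcl.colors(7, palette = "viridis")        # ordered, low to high
divergent   <- hcl.colors(7, palette = "Blue-Red")       # two directions away from a midpoint

# Hypothetical helper: draw each color as one bar of equal height
showPalette <- function(cols, title) {
    barplot(rep(1, length(cols)), col = cols, border = NA,
            axes = FALSE, main = title)
}

opar <- par(mfrow = c(3, 1), mar = c(1, 1, 2, 1))
showPalette(qualitative, "qualitative")
showPalette(gradient, "gradient")
showPalette(divergent, "divergent")
par(opar)                                                # restore original layout
```

The resulting vectors of color codes can be passed to the col= argument of plot(), hist(), and similar functions.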

16.3 Grammar of Graphics: ggplot2

library(ggplot2)

‘Grammar of graphics’

  • Specify data and ‘aesthetics’ (aes()) to be plotted
  • Add layers (geom_*()) of information

    ggplot(brfss2010Male, aes(x=Height, y=Weight)) +
        geom_point() +
        geom_smooth(method="lm")

  • Capture a plot and augment it

    plt <- ggplot(brfss2010Male, aes(x=Height, y=Weight)) +
        geom_point() +
        geom_smooth(method="lm")
    plt + labs(title = "2010 Male")

  • Use facet_*() for layouts

    ggplot(brfssFemale, aes(x=Height, y=Weight)) +
        geom_point() + geom_smooth(method="lm") +
        facet_grid(. ~ Year)

  • Choose display to emphasize relevant aspects of data

    ggplot(brfssFemale, aes(Weight, fill=Year)) +
        geom_density(alpha=.2)
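
The geom_smooth() layers above also convey statistical uncertainty: by default (se = TRUE) the fitted line is drawn with a shaded confidence band, and the `level` argument sets its width. The sketch below uses R's built-in `women` data set (columns `height` and `weight`) as a stand-in so that it runs without the BRFSS file; that substitution is an assumption made for illustration only.

```r
library(ggplot2)

# Base plot, captured so that different layers can be added to it
base <- ggplot(women, aes(x = height, y = weight)) +
    geom_point()

# Fitted line with the default 95% confidence band
band95 <- base + geom_smooth(method = "lm", formula = y ~ x, level = 0.95)

# Same fit without the band, for comparison
noBand <- base + geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

print(band95)
print(noBand)
```

Replacing `women`, `height`, and `weight` with `brfss2010Male`, `Height`, and `Weight` should give the corresponding plots for the data used in this section.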