16 Graphics and Visualization
Load the BRFSS-subset.csv data
path <- "BRFSS-subset.csv" # or file.choose()
brfss <- read.csv(path)
Clean it by coercing Year to a factor
brfss$Year <- factor(brfss$Year)
16.1 Base R Graphics
Useful for quick exploration during a normal workflow.
- Main functions: plot(), hist(), boxplot(), …
- Graphical parameters – see ?par, but often provided as arguments to plot(), etc.
- Construct complicated plots by layering information, e.g., points, regression line, annotation.
brfss2010Male <- subset(brfss, (Year == 2010) & (Sex == "Male"))
fit <- lm(Weight ~ Height, brfss2010Male)
plot(Weight ~ Height, brfss2010Male, main="2010, Males")
abline(fit, lwd=2, col="blue")
points(180, 90, pch=20, cex=3, col="red")
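The point about graphical parameters can be illustrated with a small sketch. It contrasts passing parameters as arguments to plot() with setting them through par(). The data are simulated stand-ins (an assumption, so that the example runs without the BRFSS file), and the names height and weight are hypothetical.

```r
# Simulated stand-in data (an assumption; not the BRFSS data)
set.seed(1)
height <- rnorm(100, mean = 175, sd = 7)
weight <- 0.9 * height - 75 + rnorm(100, sd = 8)

# Per-call: parameters passed as arguments affect only this plot
plot(height, weight, pch = 20, col = "grey40", cex = 0.8)

# Via par(): changes the defaults used by subsequent plots on this device;
# par() returns the previous values so they can be restored afterwards
opar <- par(pch = 20, col = "grey40")
plot(height, weight)
par(opar)   # restore the previous settings
```

Passing arguments per call is usually enough for a single figure; par() is more convenient when several plots should share the same settings.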
Approach to complicated graphics: create a grid of panels (e.g., par(mfrow=c(1, 2))), populate with plots, restore original layout.
brfssFemale <- subset(brfss, Sex=="Female")
opar = par(mfrow=c(2, 1)) # layout: 2 'rows' and 1 'column'
hist( # first panel -- 1990
    brfssFemale[ brfssFemale$Year == 1990, "Weight" ],
    main = "Female, 1990")
hist( # second panel -- 2010
    brfssFemale[ brfssFemale$Year == 2010, "Weight" ],
    main = "Female, 2010")
par(opar) # restore original layout
16.2 What makes for a good graphical display?
- Common scales for comparison
- Efficient use of space
- Careful color choice – qualitative, gradient, divergent schemes; color blind aware; …
- Emphasis on data rather than labels
- Convey statistical uncertainty
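As one possible illustration of the first point in this list, the sketch below draws two histograms with the same breaks and x-axis limits, so that the panels can be compared directly. The data are simulated (an assumption, not the BRFSS values), and the names w1990, w2010 and breaks are hypothetical.

```r
# Simulated stand-in weights for two years (an assumption; not BRFSS data)
set.seed(42)
w1990 <- rnorm(500, mean = 65, sd = 12)
w2010 <- rnorm(500, mean = 73, sd = 16)

# One set of breaks covering both samples, in bins of width 5
rng <- range(w1990, w2010)
breaks <- seq(floor(rng[1]), ceiling(rng[2]) + 5, by = 5)

opar <- par(mfrow = c(2, 1))   # 2 rows, 1 column
h1 <- hist(w1990, breaks = breaks, xlim = range(breaks),
           main = "Simulated, 1990", xlab = "Weight")
h2 <- hist(w2010, breaks = breaks, xlim = range(breaks),
           main = "Simulated, 2010", xlab = "Weight")
par(opar)                      # restore original layout
```

With shared breaks and limits, a shift in location or spread between the panels reflects the data rather than differences in how each panel's axis was chosen.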
16.3 Grammar of Graphics: ggplot2
library(ggplot2)
‘Grammar of graphics’
- Specify data and ‘aesthetics’ (aes()) to be plotted
- Add layers (geom_*()) of information
ggplot(brfss2010Male, aes(x=Height, y=Weight)) +
    geom_point() +
    geom_smooth(method="lm")
Capture a plot and augment it
plt <- ggplot(brfss2010Male, aes(x=Height, y=Weight)) +
    geom_point() +
    geom_smooth(method="lm")
plt + labs(title = "2010 Male")
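A captured plot is an ordinary R object, and adding to it with + returns a new object rather than modifying the original. The sketch below shows this on simulated data (an assumption, so that it runs without the BRFSS file). The names df, plt and plt2 are hypothetical.

```r
library(ggplot2)

# Simulated stand-in data (an assumption; not the BRFSS data)
set.seed(1)
df <- data.frame(Height = rnorm(200, mean = 175, sd = 7))
df$Weight <- 0.9 * df$Height - 75 + rnorm(200, sd = 8)

# Capture a plot with a single layer
plt <- ggplot(df, aes(x = Height, y = Weight)) + geom_point()

# Augmenting creates a new object; plt itself is left unchanged
plt2 <- plt + geom_smooth(method = "lm") + labs(title = "Simulated data")

length(plt$layers)    # layers in the original plot
length(plt2$layers)   # layers in the augmented plot
print(plt2)           # render explicitly, e.g., inside a script or function
```

Because the original object is untouched, one base plot can be reused to build several variants.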
Use facet_*() for layouts
ggplot(brfssFemale, aes(x=Height, y=Weight)) +
    geom_point() +
    geom_smooth(method="lm") +
    facet_grid(. ~ Year)
Choose display to emphasize relevant aspects of data
ggplot(brfssFemale, aes(Weight, fill=Year)) + geom_density(alpha=.2)
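The earlier advice about color-blind-aware color choice can be applied to a density plot like the one above. The sketch below is one option among several: it swaps the default fill colors for the viridis palette using ggplot2's scale_fill_viridis_d(). The data are simulated (an assumption, not the BRFSS values), and the names sim and p are hypothetical.

```r
library(ggplot2)

# Simulated stand-in data with a two-level Year factor (an assumption)
set.seed(2)
sim <- data.frame(
    Year = factor(rep(c(1990, 2010), each = 300)),
    Weight = c(rnorm(300, mean = 65, sd = 12),
               rnorm(300, mean = 73, sd = 16))
)

# Overlaid densities, filled using a discrete viridis scale
p <- ggplot(sim, aes(Weight, fill = Year)) +
    geom_density(alpha = .4) +
    scale_fill_viridis_d(end = 0.8)   # end < 1 avoids the palest yellow
print(p)
```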