Visit these sites for some ideas.
The command plot(x,y)
will plot vector x as the
independent variable and vector y as the dependent variable. Within the
command line, you can specify the title of the graph, the name of the
x-axis, and the name of the y-axis. - main=’title’ - xlab=’name of x
axis’ - ylab=’name of y axis’
The command lines(x,y)
adds a line segment to an
existing plot. The command points(x,y)
adds points to the
plot. A legend can be created using legend
, though getting
the legend right for base graphics can be a bit challenging. To get a
basic idea of what R offers, it has a build-in demo that can be run with
demo(graphics)
.
Try this yourself:
x = 1:100
y = rnorm(100,3,1) # 100 random normal deviates with mean=3, sd=1
plot(x,y)
plot(x,y,main='My First Plot')
# change point type
plot(x,y,pch=3)
# change color
plot(x,y,pch=4,col=2)
# draw lines between points
lines(x,y,col=3)
z=sort(y)
# plot a sorted variable vs x
plot(x,z,main='Random Normal Numbers',
xlab='Index',ylab='Random Number')
# A basic histogram
hist(z, main="Histogram",
sub="Random normal")
# A "density" plot
plot(density(z), main="Density plot",
sub="Random normal")
# A smaller "bandwidth" to capture more detail
plot(density(z, adjust=0.5),
sub="smaller bandwidth")
The ggplot2
package is a relatively novel approach to
generating highly informative publication-quality graphics. The “gg”
stands for “Grammar of Graphics”. In short, instead of thinking about a
single function that produces a plot, ggplot2
uses a
“grammar” approach, akin to building more and more complex sentences to
layer on more information or nuance. See the ggplot2
graphics gallery for some examples with accompanying
code.
The ggplot2
package assumes that data are in the form of
a data.frame. In some cases, the data will need to be manipulated into a
form that matches assumptions that ggplot2
uses. In
particular, if one has a matrix of numbers associated with
different subjects (samples, people, etc.), the data will usually need
to be transformed into a “long” data frame.
To use the ggplot2
package, it must be installed and
loaded. Assuming that installation has been done already, we can load
the package directly:
library(ggplot2)
We are going to use the mtcars dataset, included with R, to
experiment with ggplot2
.
data(mtcars)
mtcars
dataset using
View
, summary
, dim
,
class
, etc.We can also take a quick look at the relationships between the
variables using the pairs
plotting function.
pairs(mtcars)
That is a useful view of the data. We want to use
ggplot2
to make an informative plot, so let’s approach this
in a piecewise fashion. We first need to decide what type of plot to
produce and what our basic variables will be. In this case, we have a
number of choices.
ggplot(mtcars,aes(x=disp,y=hp))
First, a little explanation is necessary. The ggplot
function takes as its first argument a data.frame
. The
second argument is the “aesthetic”, aes
. The x
and y
take column names from the mtcars
data.frame
and will form the basis of our scatter plot.
But why did we get that “Error: No layers in plot”? Remember that ggplot2 is a “grammar of graphics”. We supplied a subject, but no verb (called a layer by ggplot2). So, to generate a plot, we need to supply a verb. There are many possibilities. Each “verb” or layer typically starts with “geom” and then a descriptor. An example is necessary.
ggplot(mtcars,aes(x=disp,y=hp)) + geom_point()
We finally produced a plot. The power of ggplot2, though, is
the ability to make very rich plots by adding “grammar” to the “plot
sentence”. We have a number of other variables in our
mtcars
data.frame
. How can we add another
value to a two-dimensional plot?
ggplot(mtcars,aes(x=disp,y=hp,color=cyl)) + geom_point()
The color of the points is a based on the numeric variable
wt
, the weight of the car. Can we do more? We can change
the size of the points, also.
ggplot(mtcars,aes(x=disp,y=hp,color=wt,size=mpg)) + geom_point()
So, on our 2D plot, we are now plotting four variables. Can we do more? We can manipulate the shape of the points in addition to the color and the size.
ggplot(mtcars,aes(x=disp,y=hp)) + geom_point(aes(size=mpg,color=wt,shape=cyl))
Why did we get that error? Ggplot2 is trying to be helpful by telling
us that a “continuous varialbe cannot be mapped to ‘shape’”. Well, in
our mtcars
data.frame
, we can look at
cyl
in detail.
class(mtcars$cyl)
## [1] "numeric"
summary(mtcars$cyl)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.000 4.000 6.000 6.188 8.000 8.000
table(mtcars$cyl)
##
## 4 6 8
## 11 7 14
The cyl
variable is “kinda” continuous in that it is
numeric, but it could also be thought of as a “category” of engines. R
has a specific data type for “category” data, called a factor.
We can easily convert the cyl
column to a factor like
so:
mtcars$cyl = as.factor(mtcars$cyl)
Now, we can go ahead with our previous approach to make a 2-dimensional plot that displays the relationships between five variables.
ggplot(mtcars,aes(x=disp,y=hp)) + geom_point(aes(size=mpg,color=wt,shape=cyl))
geom_text
to add labels to your plot.I leave this section open-ended for you to explore further options with the ggplot2 package. The data represent the on-time data for all flights that departed New York City in 2013.
# install.packages('nycflights13')
library(nycflights13)
data(flights)
head(flights)
## # A tibble: 6 × 19
## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
## <int> <int> <int> <int> <int> <dbl> <int> <int>
## 1 2013 1 1 517 515 2 830 819
## 2 2013 1 1 533 529 4 850 830
## 3 2013 1 1 542 540 2 923 850
## 4 2013 1 1 544 545 -1 1004 1022
## 5 2013 1 1 554 600 -6 812 837
## 6 2013 1 1 554 558 -4 740 728
## # … with 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
## # tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
## # hour <dbl>, minute <dbl>, time_hour <dttm>
Use ggplot and other plotting tools to explore the data and look for features that might contribute to airport delays. Consider using other “geoms” during your exploration.
To make a plot directly to a file use: png()
,
postscript()
, etc.
png(file="myplot.png",width=480,height=480)
plot(density(z,adjust=2.0),sub="larger bandwidth")
dev.off()
## quartz_off_screen
## 2
On your own, save a pdf to a file. NOTE: The dimensions in
pdf()
are in inches.
To put multiple plots on a page, we can set the mfrow
graphics parameter.
par(mfrow=c(2,1))
plot(density(z,adjust=2.0),sub="larger bandwidth")
hist(z)
# use dev.off() to turn off the two-row plotting
R can have multiple graphics “devices” open.
dev.list()
dev.off()
dev.off(5)
dev.set(5)