5 Rectangular Data
5.0.1 Matrices and Data Frames
A matrix is a rectangular array. It can be viewed as a collection of column vectors, all of the same length and the same type (e.g. numeric, character, or logical).
A data frame is also a rectangular array. All of the columns must be the same length, but they may be of different types.
The rows and columns of a matrix or data frame can be given names.
However, matrices and data frames are implemented differently in R, so many operations work for one but not the other.
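As a minimal sketch of an operation that works on one structure but not the other, consider the example below. The objects `m` and `df` are illustrative names chosen here, not objects used elsewhere in this chapter.

```r
# Hypothetical example objects holding the same values
m  <- cbind(a = 1:3, b = c(2.5, 3.5, 4.5))        # numeric matrix
df <- data.frame(a = 1:3, b = c(2.5, 3.5, 4.5))   # data frame

# The dollar-sign operator works on a data frame ...
df$a
# ... but signals an error on a matrix (an atomic vector with dimensions)
dollar_on_matrix <- try(m$a, silent = TRUE)
inherits(dollar_on_matrix, "try-error")

# Matrix multiplication works on a matrix ...
crossprod_m <- t(m) %*% m
dim(crossprod_m)
# ... but not directly on a data frame
mult_on_df <- try(t(m) %*% df, silent = TRUE)
inherits(mult_on_df, "try-error")
```

Converting with as.matrix(df) or as.data.frame(m) is the usual way to move between the two when an operation needs the other structure.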
5.1 Matrix Operations
5.1.1 Matrix Operations
x<-1:10
y<-rnorm(10)
# make a matrix by column binding two numeric vectors
mat<-cbind(x,y)
mat
## x y
## [1,] 1 0.3842634
## [2,] 2 -1.2536470
## [3,] 3 -1.4098608
## [4,] 4 1.7139083
## [5,] 5 -0.1876347
## [6,] 6 -0.3343492
## [7,] 7 1.6553040
## [8,] 8 0.5974976
## [9,] 9 -1.8443792
## [10,] 10 1.4203497
# And the names of the rows and columns
rownames(mat)
## NULL
colnames(mat)
## [1] "x" "y"
5.1.2 Matrix Operations
Indexing for matrices works as for vectors except that we now need to include both the row and column (in that order).
# The 2nd element of the 1st row of mat
mat[1,2]
## y
## 0.3842634
# The first ROW of mat
mat[1,]
## x y
## 1.0000000 0.3842634
# The first COLUMN of mat
mat[,1]
## [1] 1 2 3 4 5 6 7 8 9 10
# and all elements of mat that are > 4; note no comma
mat[mat>4]
## [1] 5 6 7 8 9 10
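The logical index above returns a plain vector. A related pattern, sketched here under the assumption that whole rows are wanted, puts the logical vector in the row position and leaves the column position empty. The seed and the names `big_rows` and `one_row` are arbitrary choices for this illustration.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible
x <- 1:10
y <- rnorm(10)
mat <- cbind(x, y)

# Rows of mat whose x value exceeds 4: logical vector before the comma
big_rows <- mat[mat[, "x"] > 4, ]
dim(big_rows)

# A single row normally collapses to a vector;
# drop = FALSE keeps the result a one-row matrix
one_row <- mat[1, , drop = FALSE]
dim(one_row)
```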
5.1.3 Matrix Operations
# create a matrix with 2 columns and 10 rows
# filled with random normal deviates
m = matrix(rnorm(20),nrow=10)
# multiply all values in the matrix by 20
m = m*20
# and add 100 to the first column of m
m[,1] = m[,1] + 100
# summarize m
summary(m)
## V1 V2
## Min. : 44.58 Min. :-42.4628
## 1st Qu.: 93.26 1st Qu.:-19.8712
## Median :100.72 Median : 3.6691
## Mean : 95.27 Mean : -0.4271
## 3rd Qu.:106.33 3rd Qu.: 17.3559
## Max. :111.29 Max. : 31.2418
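Beyond summary, column-wise statistics can be computed directly. The following is a small sketch using colMeans and apply on a matrix built the same way as m; the seed and the names `col_means`, `col_sds`, and `centered` are assumptions made for the example.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible
m <- matrix(rnorm(20), nrow = 10)
m <- m * 20
m[, 1] <- m[, 1] + 100

# Mean of each column
col_means <- colMeans(m)

# Standard deviation of each column (MARGIN = 2 means "by column")
col_sds <- apply(m, 2, sd)

# Subtract each column's mean from that column; rep(..., each = nrow(m))
# lays the means out in the same column-major order as the matrix
centered <- m - rep(col_means, each = nrow(m))
round(colMeans(centered), 10)
```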
5.2 Data Frames
5.2.1 Matrices Versus Data Frames
mat<-cbind(x,y)
class(mat[,1])
## [1] "numeric"
z = paste0('a',1:10)
tab<-cbind(x,y,z)
class(tab)
## [1] "matrix"
mode(tab[,1])
## [1] "character"
head(tab,4)
## x y z
## [1,] "1" "0.384263423333191" "a1"
## [2,] "2" "-1.25364703104426" "a2"
## [3,] "3" "-1.40986082624299" "a3"
## [4,] "4" "1.7139082579506" "a4"
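The output above shows the numbers turned into strings. The sketch below, with shortened vectors and the illustrative names `tab_chr` and `x_back`, shows that coercion and one way to recover the numeric values.

```r
# Hypothetical small vectors mirroring x and z above
x <- 1:3
z <- paste0("a", 1:3)

# cbind() on mixed types coerces everything to character
tab_chr <- cbind(x, z)
typeof(tab_chr)

# The numbers are now strings, so convert before doing arithmetic
x_back <- as.numeric(tab_chr[, "x"])
x_back + 1
```

A matrix can hold only one type, so the numeric column has to be converted back before any arithmetic.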
5.2.2 Matrices Versus Data Frames
tab<-data.frame(x,y,z)
class(tab)
## [1] "data.frame"
head(tab)
## x y z
## 1 1 0.3842634 a1
## 2 2 -1.2536470 a2
## 3 3 -1.4098608 a3
## 4 4 1.7139083 a4
## 5 5 -0.1876347 a5
## 6 6 -0.3343492 a6
mode(tab[,1])
## [1] "numeric"
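# strings were converted to a factor here; from R 4.0.0 on, data.frame()
# keeps them as "character" unless stringsAsFactors=TRUE is given
class(tab[,3])
## [1] "factor"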
rownames(tab)
## [1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10"
rownames(tab)<-paste0("row",1:10)
rownames(tab)
## [1] "row1" "row2" "row3" "row4" "row5" "row6" "row7" "row8"
## [9] "row9" "row10"
5.2.3 Data Frames, Continued
Data frame columns can be referred to by name using the “dollar sign” operator.
tab$x
## [1] 1 2 3 4 5 6 7 8 9 10
tab$y
## [1] 0.3842634 -1.2536470 -1.4098608 1.7139083 -0.1876347 -0.3343492
## [7] 1.6553040 0.5974976 -1.8443792 1.4203497
Column names can be set, which can be useful for referring to data later.
colnames(tab)
## [1] "x" "y" "z"
colnames(tab) = paste0('col',1:3)
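The sketch below, using a small hypothetical data frame `df`, shows three equivalent ways to reach a column. It also shows that a column's old name stops working once the columns are renamed.

```r
# Hypothetical data frame for illustration
df <- data.frame(x = 1:3, y = c(0.5, 1.5, 2.5))

# Three equivalent ways to extract the first column
identical(df$x, df[["x"]])
identical(df$x, df[, 1])

# After renaming, the old name is gone: df$x quietly returns NULL
colnames(df) <- paste0("col", 1:2)
is.null(df$x)
df$col1
```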
5.2.4 Exercise: Subsetting Data Frames
Try these
ncol(tab)
nrow(tab)
dim(tab)
summary(tab)
tab[1:3,]
tab[,2:3]
tab[,1]>7
tab[tab[,1]>7,]
tab[tab[,1]>7,3]
tab[tab[,1]>7,2:3]
tab[tab$col1>7,3]
tab$col3[tab$col1>3]
5.3 Basic Textual Input and Output
5.3.1 Reading and Writing Data Frames to Disk
The write.table function and friends write a data.frame or matrix to disk as a text file.
write.table(tab,file='tab.txt',sep="\t",col.names=TRUE)
# remove tab from the workspace
rm(tab)
# make sure it is gone
ls(pattern="tab")
## character(0)
The read.table function and friends read a data.frame or matrix from a text file.
tab = read.table('tab.txt',sep="\t",header=TRUE)
head(tab,3)
## col1 col2 col3
## row1 1 0.3842634 a1
## row2 2 -1.2536470 a2
## row3 3 -1.4098608 a3
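As a hedged sketch of a full round trip, the example below writes to temporary files (via tempfile) instead of the working directory and also tries the comma-separated variants write.csv and read.csv. The data frame `df` and the file variables are hypothetical stand-ins for tab and 'tab.txt'.

```r
# Hypothetical data frame standing in for tab
df <- data.frame(col1 = 1:3,
                 col2 = c(0.5, -1.25, 2),
                 col3 = c("a1", "a2", "a3"))
rownames(df) <- paste0("row", 1:3)

# Tab-delimited round trip through a temporary file
tab_file <- tempfile(fileext = ".txt")
write.table(df, file = tab_file, sep = "\t", col.names = TRUE)
df_back <- read.table(tab_file, sep = "\t", header = TRUE)
identical(rownames(df_back), rownames(df))
all.equal(df_back$col2, df$col2)

# Comma-separated round trip; row.names = 1 tells read.csv that
# the first column of the file holds the row names
csv_file <- tempfile(fileext = ".csv")
write.csv(df, file = csv_file)
df_csv <- read.csv(csv_file, row.names = 1)
dim(df_csv)

# Clean up the temporary files
unlink(c(tab_file, csv_file))
```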
5.4 Lists and Objects
5.4.1 Lists
A list is a collection of objects that may be the same or different types.
The objects generally have names, and may be indexed either by name (e.g. my.list$name3) or by component number (e.g. my.list[[3]]).
A data frame is a list of matched column vectors.
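The two indexing styles mentioned above can be sketched as follows. The contents of `my.list` are made up for illustration; only the names my.list and name3 come from the text.

```r
# Hypothetical list with three named components
my.list <- list(name1 = 1, name2 = "b", name3 = c(1, 2, 3))

# Indexing by name and by component number reach the same object
by_name   <- my.list$name3
by_number <- my.list[[3]]
identical(by_name, by_number)

# Single brackets return a sub-list; double brackets return the component
class(my.list[3])
class(my.list[[3]])
```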
5.4.2 Lists in Practice
Create a list, noting the different data types involved.
a = list(1,"b",c(1,2,3))
a
## [[1]]
## [1] 1
##
## [[2]]
## [1] "b"
##
## [[3]]
## [1] 1 2 3
length(a)
## [1] 3
class(a)
## [1] "list"
a[[3]]
## [1] 1 2 3
5.4.3 Lists in Practice
A data frame is a list.
# test if our friend "tab" is a list
is.list(tab)
## [1] TRUE
tab[[2]]
## [1] 0.3842634 -1.2536470 -1.4098608 1.7139083 -0.1876347 -0.3343492
## [7] 1.6553040 0.5974976 -1.8443792 1.4203497
names(tab)
## [1] "col1" "col2" "col3"
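Because a data frame is a list of columns, functions that iterate over lists also iterate over columns. A minimal sketch, using a hypothetical data frame `df` rather than tab:

```r
# Hypothetical data frame for illustration
df <- data.frame(col1 = 1:3,
                 col2 = c(0.5, 1.5, 2.5),
                 col3 = c("a1", "a2", "a3"))

# length() of a data frame counts its columns (list components)
length(df)

# sapply() visits each column in turn, here reporting its class
col_classes <- sapply(df, class)
col_classes

# lengths() gives the length of each component; all columns match
col_lengths <- lengths(df)
col_lengths
```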
5.4.4 Summary of Simple Data Types
Data type | Stores |
---|---|
numeric (double) | floating point numbers |
integer | integers |
complex | complex numbers |
factor | categorical data |
character | strings |
logical | TRUE or FALSE |
NA | missing |
NULL | empty |
function | function type |
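
The types in the table can be inspected with class and typeof. The values below are arbitrary examples chosen for this sketch.

```r
# class() of one example value per row of the table
simple_classes <- c(
  class(3.14),         # floating point number
  class(2L),           # integer (note the L suffix)
  class(1 + 2i),       # complex number
  class(factor("a")),  # categorical data
  class("a"),          # string
  class(TRUE),         # logical
  class(mean)          # function
)
simple_classes

# Floating point numbers have class "numeric" but storage type "double"
typeof(3.14)

# A bare NA is a logical missing value; NULL is the empty object
class(NA)
is.null(NULL)
length(NULL)
```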
5.4.5 Summary of Aggregate Data Types
Data type | Stores |
---|---|
vector | one-dimensional data, single data type |
matrix | two-dimensional data, single data type |
data frame | two-dimensional data, multiple data types |
list | ordered collection of objects, which need not be of the same type |
object | a list with attributes and potentially slots and methods |
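
As a rough sketch of how these aggregate types relate, the example below builds one of each and inspects its attributes. All object names are illustrative assumptions.

```r
# One hypothetical object per aggregate type
v  <- 1:6                                        # vector
m  <- matrix(v, nrow = 2)                        # matrix: a vector plus a dim attribute
df <- data.frame(n = 1:2, s = c("a", "b"))       # data frame
l  <- list(v, "text", df)                        # list mixing types

# Type checks
is.vector(v)
is.matrix(m)
is.data.frame(df)
is.list(l)

# A matrix carries its shape as an attribute
attributes(m)$dim

# A data frame is a list with names, class, and row.names attributes
df_attr_names <- names(attributes(df))
df_attr_names
```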