10  Factors

Published

July 26, 2024

10.1 Factors

A factor is a special type of vector, normally used to hold a categorical variable (such as smoker/nonsmoker, state of residency, or ZIP code) for use in many statistical functions. Such vectors have class “factor”. Factors are primarily used in Analysis of Variance (ANOVA) and in other situations where “categories” are needed. When a factor is used as a predictor variable in a model, R creates the corresponding indicator variables automatically (more later).
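As a brief preview of the indicator variables mentioned above, the following sketch uses a hypothetical `smoker` vector (made up for illustration, not part of this chapter's data) together with base R's `model.matrix()`. It assumes R's default treatment contrasts, under which the first level serves as the reference category.

```r
# hypothetical two-level categorical variable (illustration only)
smoker <- factor(c("no", "yes", "yes", "no", "yes"))

# model.matrix() builds the design matrix that a modeling function
# such as lm() would use for the formula ~ smoker
X <- model.matrix(~ smoker)

# with default contrasts, "no" is the reference level, so the matrix
# holds an intercept column plus one 0/1 indicator column for "yes"
colnames(X)
X[, "smokeryes"]
```

The indicator column is 1 for every observation whose level is "yes" and 0 otherwise.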

A note of caution: factors in R often look like character vectors when printed, but their values are shown without double quotes. They are stored in R as integer codes together with a key of level names, so a factor will sometimes behave like a numeric vector.

# create the character vector
citizen <- c("uk", "us", "no", "au", "uk", "us", "us", "no", "au")

# convert to factor (levels default to the sorted unique values)
citizenf <- factor(citizen)
citizen
[1] "uk" "us" "no" "au" "uk" "us" "us" "no" "au"
citizenf
[1] uk us no au uk us us no au
Levels: au no uk us
# convert factor back to character vector
as.character(citizenf)
[1] "uk" "us" "no" "au" "uk" "us" "us" "no" "au"
# convert to numeric vector (returns the underlying level codes, not the labels)
as.numeric(citizenf)
[1] 3 4 2 1 3 4 4 2 1
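The numeric behavior shown above matters most when the labels themselves look like numbers. The sketch below uses a hypothetical `dose` factor (not part of the chapter's data) to show that `as.numeric()` returns the level codes, and that converting through `as.character()` first recovers the intended values.

```r
# hypothetical factor whose labels happen to look like numbers
dose <- factor(c("10", "20", "10", "5"))

# levels are sorted as character strings, so "5" comes last
levels(dose)

# returns the integer codes (positions in levels), not the doses
as.numeric(dose)

# go through character first to recover the values written in the labels
as.numeric(as.character(dose))
```

Here the direct conversion gives 1 2 1 3, while the two-step conversion gives 10 20 10 5.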

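R stores many data structures as vectors with “attributes”, one of which is the “class”. A factor is an integer vector with a “levels” attribute and class “factor” (this is mentioned here only so that you have seen it).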

attributes(citizenf)
$levels
[1] "au" "no" "uk" "us"

$class
[1] "factor"
class(citizenf)
[1] "factor"
# note that after unclassing, we can see the 
# underlying numeric structure again
unclass(citizenf)
[1] 3 4 2 1 3 4 4 2 1
attr(,"levels")
[1] "au" "no" "uk" "us"
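To make the link between the integer codes and the level names concrete, the following sketch rebuilds the original character vector by hand. It recreates `citizen` and `citizenf` so that it runs on its own; the names `codes`, `labs`, and `rebuilt` are introduced here only for illustration.

```r
citizen  <- c("uk", "us", "no", "au", "uk", "us", "us", "no", "au")
citizenf <- factor(citizen)

# the storage type underneath the factor class is integer
typeof(citizenf)

# pull out the two pieces: integer codes and the level names
codes <- as.integer(citizenf)
labs  <- levels(citizenf)

# indexing the level names by the codes reproduces the original values
rebuilt <- labs[codes]
identical(rebuilt, citizen)
```

This lookup of labels by code is essentially what `as.character()` does for a factor.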

Tabulating a factor with table() counts the observations in each level, which is a useful way to get a sense of the “sample” set available.

table(citizenf)
citizenf
au no uk us 
 2  2  2  3
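Because `table()` counts by level, a level that is declared but never observed still appears, with a count of zero. The sketch below assumes a hypothetical extra level "ca" that does not occur in the data, and uses base R's `prop.table()` and `droplevels()`.

```r
citizen <- c("uk", "us", "no", "au", "uk", "us", "us", "no", "au")

# declare the levels explicitly, including "ca" (hypothetical, never observed)
citizenf2 <- factor(citizen, levels = c("au", "ca", "no", "uk", "us"))

# counts per level; the unused level "ca" is reported with a count of 0
counts <- table(citizenf2)
counts

# proportions instead of raw counts
prop.table(counts)

# droplevels() removes unused levels before tabulating
table(droplevels(citizenf2))
```

Keeping the empty level can be useful when a report should show every possible category; dropping it gives a more compact table.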