11 Factors

Author: Sean Davis (University of Colorado Anschutz School of Medicine)

Published: June 1, 2024

Modified: June 4, 2025

11.1 Factors

A factor is a special type of vector used to hold a categorical variable, such as smoker/nonsmoker, state of residence, or zip code. Many statistical functions expect categorical variables in this form. Factor vectors have class “factor”. They are used mainly in analysis of variance (ANOVA) and in other situations where “categories” are needed. When a factor is used as a predictor variable in a model, R automatically creates the corresponding indicator (dummy) variables (more on this later).
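Here is a small, hypothetical illustration of those indicator variables. The sketch builds a made-up `smoker` factor and passes it to base R's `model.matrix()`, which shows the design matrix that a model formula would use. The variable and its values are invented for this example.

```r
# hypothetical example data: a two-level categorical variable
smoker <- factor(c("yes", "no", "no", "yes"))
class(smoker)
# -> "factor"

# model.matrix() expands the factor into an intercept column plus one
# indicator column per non-baseline level; "no" is the baseline here
# because levels are sorted alphabetically by default
mm <- model.matrix(~ smoker)
colnames(mm)
# -> "(Intercept)" "smokeryes"

# the indicator is 1 where smoker == "yes" and 0 otherwise
unname(mm[, "smokeryes"])
# -> 1 0 0 1
```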

A note of caution: factors in R often look like character vectors when printed, but notice that their values are printed without double quotes. R stores a factor as a vector of integer codes together with a key of level names, so a factor sometimes behaves like a numeric vector.
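A minimal sketch of the “integer codes plus key” idea, using a made-up `size` factor: `typeof()` reports the integer storage, `levels()` returns the key, and indexing the key with the codes recovers the labels.

```r
# hypothetical example factor
size <- factor(c("small", "large", "small", "medium"))

typeof(size)        # storage type of the underlying codes
# -> "integer"
levels(size)        # the key: sorted unique labels
# -> "large" "medium" "small"
as.integer(size)    # one code per element, pointing into levels(size)
# -> 3 1 3 2

# looking up each code in the key reproduces the original labels
levels(size)[as.integer(size)]
# -> "small" "large" "small" "medium"
```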

# create the character vector
citizen <- c("uk", "us", "no", "au", "uk", "us", "us", "no", "au")

# convert to factor
citizenf <- factor(citizen)
citizen

[1] "uk" "us" "no" "au" "uk" "us" "us" "no" "au"

citizenf

[1] uk us no au uk us us no au
Levels: au no uk us
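By default, `factor()` builds the levels from the sorted unique values, which is why `au` comes first above. The `levels` argument of `factor()` sets a different order. The order used below (and the name `citizenf2`) is an arbitrary choice for illustration.

```r
citizen <- c("uk", "us", "no", "au", "uk", "us", "us", "no", "au")

# default: levels are the sorted unique values
levels(factor(citizen))
# -> "au" "no" "uk" "us"

# explicit level order (arbitrary choice for this example)
citizenf2 <- factor(citizen, levels = c("us", "uk", "no", "au"))
levels(citizenf2)
# -> "us" "uk" "no" "au"

# the labels are unchanged, but the integer codes follow the new order
as.integer(citizenf2)
# -> 2 1 3 4 2 1 1 3 4
```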

# convert factor back to character vector
as.character(citizenf)

[1] "uk" "us" "no" "au" "uk" "us" "us" "no" "au"

# convert to numeric vector (this returns the integer codes, not the labels)
as.numeric(citizenf)

[1] 3 4 2 1 3 4 4 2 1
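Because `as.numeric()` returns the integer codes, it is a common trap when a factor's labels are themselves numbers. A sketch with made-up `dose` values:

```r
# hypothetical factor whose labels look like numbers
dose <- factor(c("10", "20", "10", "5"))

# levels are sorted as character strings, so "5" comes last
levels(dose)
# -> "10" "20" "5"

as.numeric(dose)                  # integer codes, not the doses
# -> 1 2 1 3

# convert through character first to recover the actual values
as.numeric(as.character(dose))
# -> 10 20 10 5
```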

R stores many data structures as vectors with “attributes”, one of which is the “class” (shown here just so you have seen it). For a factor, the attributes are the levels and the class.

attributes(citizenf)

$levels
[1] "au" "no" "uk" "us"

$class
[1] "factor"

class(citizenf)

[1] "factor"

# after unclass(), the underlying integer codes and the
# levels attribute are visible again
unclass(citizenf)

[1] 3 4 2 1 3 4 4 2 1
attr(,"levels")
[1] "au" "no" "uk" "us"
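To make the “vector plus attributes” point concrete, a factor can be assembled by hand from integer codes, a `levels` attribute, and a `class` attribute using base R's `structure()`. This is only a demonstration (the name `manual` is made up); in practice `factor()` is the normal way to create one.

```r
citizen <- c("uk", "us", "no", "au", "uk", "us", "us", "no", "au")
citizenf <- factor(citizen)

# the same integer codes and levels that unclass(citizenf) shows
manual <- structure(c(3L, 4L, 2L, 1L, 3L, 4L, 4L, 2L, 1L),
                    levels = c("au", "no", "uk", "us"),
                    class = "factor")

identical(manual, citizenf)
# -> TRUE
```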

Tabulating a factor with table() is a useful way to see which categories are present in the “sample” and how many observations fall into each.

table(citizenf)

citizenf
au no uk us 
 2  2  2  3
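One difference from tabulating a character vector is that `table()` reports every level of a factor, including levels with no observations. In the sketch below, subsetting with `[` keeps all of the original levels, and base R's `droplevels()` removes the empty ones. The subset name `uk_us` is made up for the example.

```r
citizenf <- factor(c("uk", "us", "no", "au", "uk", "us", "us", "no", "au"))

# subsetting keeps all four levels, even those no longer present
uk_us <- citizenf[citizenf %in% c("uk", "us")]
table(uk_us)
# -> au 0, no 0, uk 2, us 3

# droplevels() discards the empty levels before tabulating
table(droplevels(uk_us))
# -> uk 2, us 3
```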