9  Lists

Published

June 1, 2024

Modified

June 6, 2025

9.1 The Power of a “Catch-All” Container

So far in our journey through R’s data structures, we’ve dealt with vectors and matrices. These are fantastic tools, but they have one strict rule: all their elements must be of the same data type. You can have a vector of numbers or a matrix of characters, but you can’t mix and match.

But what about real-world biological data? A single experiment can generate a dizzying variety of information. Imagine you’re studying a particular gene. You might have:

  • The gene’s name (text).
  • Its expression level across several samples (a set of numbers).
  • A record of whether it’s a known cancer-related gene (a simple TRUE/FALSE).
  • The raw fluorescence values from your qPCR machine (a matrix of numbers).
  • Some personal notes about the experiment (a paragraph of text).

How could you possibly store all of this related, yet different, information together? You could create many separate variables, but that would be clunky and hard to manage. This is exactly the problem that lists are designed to solve.

A list in R is like a flexible, multi-compartment container. It’s a single object that can hold a collection of other R objects, and those objects can be of any type, length, or dimension. You can put vectors, matrices, logical values, and even other lists inside a single list. This makes them one of the most fundamental and powerful data structures for bioinformatics analysis.

The key features of lists are:

  • Flexibility: They can contain a mix of any data type.
  • Organization: You can and should name the elements of a list, making your data self-describing.
  • Hierarchy: Because lists can contain other lists, you can create complex, nested data structures to represent sophisticated relationships in your data.

9.2 Creating a List

You create a list with the list() function. The best practice is to name the elements as you create them. This makes your code infinitely more readable and your data easier to work with.

Let’s create a list to store the information for our hypothetical gene study.

# An experiment tracking list for the gene TP53
experiment_data <- list(
  experiment_id = "EXP042",
  gene_name = "TP53",
  read_counts = c(120, 155, 98, 210),
  is_control = FALSE,
  sample_matrix = matrix(1:4, nrow = 2, dimnames = list(c("Treated", "Untreated"), c("Replicate1", "Replicate2")))
)

# --- Function Explainer: print() ---
# The print() function displays the contents of an R object in the console. 
# For a list, it shows each element and its contents. It's the default action 
# when you just type the variable's name and hit Enter.
print(experiment_data)
$experiment_id
[1] "EXP042"

$gene_name
[1] "TP53"

$read_counts
[1] 120 155  98 210

$is_control
[1] FALSE

$sample_matrix
          Replicate1 Replicate2
Treated            1          3
Untreated          2          4

9.3 Inspecting Your List: What’s Inside?

When someone hands you a tube in the lab, the first thing you do is look at the label. When R gives you a complex object like a list, you need to do the same. R provides several “introspection” functions to help you understand the contents and structure of your lists.

9.3.1 str(): The Structure Function

This is arguably the most useful function for inspecting any R object, especially lists.

# --- Function Explainer: str() ---
# The str() function provides a compact, human-readable summary of an 
# object's internal "str"ucture. It's your best friend for understanding 
# what's inside a list, including the type and a preview of each element.
str(experiment_data)
List of 5
 $ experiment_id: chr "EXP042"
 $ gene_name    : chr "TP53"
 $ read_counts  : num [1:4] 120 155 98 210
 $ is_control   : logi FALSE
 $ sample_matrix: int [1:2, 1:2] 1 2 3 4
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:2] "Treated" "Untreated"
  .. ..$ : chr [1:2] "Replicate1" "Replicate2"

The output of str() tells us everything we need to know: it’s a “List of 5”, and for each of the 5 elements, it shows the name (e.g., experiment_id), the data type (e.g., chr for character, num for numeric), and a preview of the content.

9.3.2 length(), names(), and class()

These functions give you more specific information about the list itself.

# --- Function Explainer: length() ---
# For a list, length() tells you how many top-level elements it contains.
length(experiment_data)
[1] 5
# --- Function Explainer: names() ---
# The names() function extracts the names of the elements in a list as a 
# character vector. It's a great way to see what you can access.
names(experiment_data)
[1] "experiment_id" "gene_name"     "read_counts"   "is_control"   
[5] "sample_matrix"
# --- Function Explainer: class() ---
# The class() function tells you the type of the object itself. 
# This is useful to confirm you are indeed working with a list.
class(experiment_data)
[1] "list"

9.4 Accessing List Elements: Getting Things Out

Okay, you’ve packed your experimental data into a list. Now, how do you get specific items out? This is a critical concept, and R has a few ways to do it, each with a distinct purpose.

9.4.1 The Mighty [[...]] and $ for Single Items

To pull out a single element from a list in its original form, you use either double square brackets [[...]] or the dollar sign $ (for named lists). Think of this as carefully reaching into a specific compartment of your container and taking out the item itself.

Let’s use our experiment_data list.

# Get the gene name using [[...]]
gene <- experiment_data[["gene_name"]]
print(gene)
[1] "TP53"
class(gene) # It's a character vector, just as it was when we put it in.
[1] "character"
# Get the read counts using the $ shortcut. This is often easier to read.
reads <- experiment_data$read_counts
print(reads)
[1] 120 155  98 210
class(reads) # It's a numeric vector.
[1] "numeric"
# The [[...]] has a neat trick: you can use a variable to specify the name.
element_to_get <- "read_counts"
experiment_data[[element_to_get]]
[1] 120 155  98 210

The key takeaway is that [[...]] and $ extract the element. The result is the object that was stored inside the list.

9.4.2 The Subsetting [...] for New Lists

The single square bracket [...] behaves differently. It always returns a new, smaller list that is a subset of the original list. It’s like taking a whole compartment, label and all, out of your larger container.

# Get the gene name using [...]
gene_sublist <- experiment_data["gene_name"]

print(gene_sublist)
$gene_name
[1] "TP53"
# --- Note the class! ---
# The result is another list, which contains the gene_name element.
class(gene_sublist) 
[1] "list"

This distinction is vital. If you want to perform a calculation on an element (like finding the mean() of read_counts), you must extract it with [[...]] or $. If you tried mean(experiment_data["read_counts"]), R would give you an error because you can’t calculate the mean of a list!

9.5 Modifying Lists

Your data is rarely static. You can easily add, remove, or update elements in a list after you’ve created it.

9.5.1 Adding and Updating Elements

You can add a new element or change an existing one by using the $ or [[...]] assignment syntax.

# Add the date of the experiment
experiment_data$date <- "2024-06-05"

# Add some notes using the [[...]] syntax
experiment_data[["notes"]] <- "Initial pilot experiment. High variance in read counts."

# Let's update the control status
experiment_data$is_control <- TRUE

# Let's look at the structure now
str(experiment_data)
List of 7
 $ experiment_id: chr "EXP042"
 $ gene_name    : chr "TP53"
 $ read_counts  : num [1:4] 120 155 98 210
 $ is_control   : logi TRUE
 $ sample_matrix: int [1:2, 1:2] 1 2 3 4
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:2] "Treated" "Untreated"
  .. ..$ : chr [1:2] "Replicate1" "Replicate2"
 $ date         : chr "2024-06-05"
 $ notes        : chr "Initial pilot experiment. High variance in read counts."

9.5.2 Removing Elements

To remove an element from a list, you simply assign NULL to it. NULL is R’s special object representing nothingness.

# We've decided the matrix isn't needed for this summary object.
experiment_data$sample_matrix <- NULL

# See the final structure of our list
str(experiment_data)
List of 6
 $ experiment_id: chr "EXP042"
 $ gene_name    : chr "TP53"
 $ read_counts  : num [1:4] 120 155 98 210
 $ is_control   : logi TRUE
 $ date         : chr "2024-06-05"
 $ notes        : chr "Initial pilot experiment. High variance in read counts."

9.6 A Biological Example: A Self-Contained Gene Record

Let’s put this all together. Lists are perfect for creating self-contained records that you can easily pass to functions or combine into larger lists.

# --- Function Explainer: log2() ---
# The log2() function calculates the base-2 logarithm. It's very common in 
# gene expression analysis to transform skewed count data to make it more 
# symmetric and easier to model.

brca1_gene <- list(
  gene_symbol = "BRCA1",
  full_name = "BRCA1 DNA repair associated",
  chromosome = "17",
  expression_log2 = log2(c(45, 50, 30, 88, 120)),
  related_diseases = c("Breast Cancer", "Ovarian Cancer")
)

# Now we can easily work with this structured information

# --- Function Explainer: cat() ---
# The cat() function concatenates and prints its arguments to the console.
# Unlike print(), it allows you to seamlessly join text and variables, and 
# the "\n" character is used to add a newline (a line break).
cat("Analyzing gene:", brca1_gene$gene_symbol, "\n")
Analyzing gene: BRCA1 
cat("Located on chromosome:", brca1_gene$chromosome, "\n")
Located on chromosome: 17 
# Calculate the average log2 expression
# --- Function Explainer: mean() ---
# The mean() function calculates the arithmetic average of a numeric vector.
avg_expression <- mean(brca1_gene$expression_log2)
cat("Average log2 expression:", avg_expression, "\n")
Average log2 expression: 5.881784 

This simple brca1_gene list is now a complete, portable record. You could imagine creating a list of these gene records, creating a powerful, hierarchical database for your entire project.