1  Introducing R and RStudio

Author

Sean Davis

Published

June 6, 2024

Questions

  • What is R?
  • Why use R?
  • Why not use R?
  • Why use RStudio and how does it differ from R?

Learning Objectives

  • Know advantages of analyzing data in R
  • Know advantages of using RStudio
  • Be able to start RStudio on your computer
  • Identify the panels of the RStudio interface
  • Be able to customize the RStudio layout

1.1 Introduction

In this chapter, we will discuss the basics of R and RStudio, two essential tools in genomics data analysis. We will cover the advantages of using R and RStudio, how to set up RStudio, and the different panels of the RStudio interface.

1.2 What is R?

[R](https://en.wikipedia.org/wiki/R_(programming_language)) is a programming language and software environment designed for statistical computing and graphics. It is widely used by statisticians, data scientists, and researchers for data analysis and visualization. R is an open-source language, which means it is free to use, modify, and distribute. Over the years, R has become particularly popular in the fields of genomics and bioinformatics, owing to its extensive libraries and powerful data manipulation capabilities.

The R language is a dialect of the S language, which was developed in the 1970s at Bell Laboratories. The first version of R was written by Robert Gentleman and Ross Ihaka and released in 1995 (see this slide deck for Ross Ihaka’s take on R’s history). Since then, R has been continuously developed by the R Core Team, a group of statisticians and computer scientists. The R Core Team releases a new version of R every year.

1.3 Why use R?

There are several reasons why R is a popular choice for data analysis, particularly in genomics and bioinformatics. These include:

  1. Open-source: R is free to use and has a large community of developers who contribute to its growth and development. What is “open-source”?
  2. Extensive libraries: There are thousands of R packages available for a wide range of tasks, including specialized packages for genomics and bioinformatics. Many of these packages have been extensively tested, and they are available for free.
  3. Data manipulation: R has powerful data manipulation capabilities, making it easy (or at least possible) to clean, process, and analyze large datasets.
  4. Graphics and visualization: R has excellent tools for creating high-quality graphics and visualizations that can be customized to meet the specific needs of your analysis. In most cases, graphics produced by R are publication-quality.
  5. Reproducible research: R enables you to create reproducible research by recording your analysis in a script, which can be easily shared and executed by others. In addition, R does not have a meaningful graphical user interface (GUI), which renders analysis in R much more reproducible than tools that rely on GUI interactions.
  6. Cross-platform: R runs on Windows, Mac, and Linux (as well as more obscure systems).
  7. Interoperability with other languages: R can interface with FORTRAN, C, and many other languages.
  8. Scalability: R is useful for small and large projects.

I can develop code for analysis on my Mac laptop. I can then install the same code on our 20k core cluster and run it in parallel on 100 samples, monitor the process, and then update a database (for example) with R when complete.
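
As a minimal sketch of what recording an analysis in a script can look like, the example below combines data manipulation, a summary, and a plot in a few lines of base R. The built-in `mtcars` dataset, the object names, and the output file name are assumptions chosen only for illustration; they are not part of any genomics workflow.

```r
# A small, self-contained analysis using only base R and the built-in
# `mtcars` dataset (chosen purely for illustration).

# Data manipulation: keep cars with 4 or 6 cylinders and add a derived column
small_engines <- subset(mtcars, cyl %in% c(4, 6))
small_engines$kpl <- small_engines$mpg * 0.4251  # miles per gallon -> km per litre

# Summary: mean fuel efficiency for each cylinder count
kpl_by_cyl <- aggregate(kpl ~ cyl, data = small_engines, FUN = mean)
print(kpl_by_cyl)

# Graphics: write a scatter plot to a PDF file so the script can run unattended
out_file <- file.path(tempdir(), "mpg_vs_weight.pdf")
pdf(out_file)
plot(small_engines$wt, small_engines$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "Fuel efficiency vs. weight", pch = 19)
dev.off()
```

Because every step is written down, anyone with R installed can rerun the script and should obtain the same table and figure.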

1.4 Why not use R?

  • R cannot do everything.
  • R is not always the “best” tool for the job.
  • R will not hold your hand. Often, it will slap your hand instead.
  • The documentation can be opaque (but there is documentation).
  • R can drive you crazy (on a good day) or age you prematurely (on a bad one).
  • Finding the right package to do the job you want to do can be challenging; worse, some contributed packages are unreliable.
  • R does not have a meaningfully useful graphical user interface (GUI).
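
The documentation mentioned above is available from within R itself. The following sketch shows a few base R functions for finding it; the function `mean` and the search terms are arbitrary examples.

```r
# Open the help page for a function (the same as typing ?mean)
help("mean")

# Search the installed documentation by keyword when the function name is unknown
help.search("standard deviation")

# List available objects whose names match a pattern (here, names starting with "read.")
read_functions <- apropos("^read\\.")
print(read_functions)

# Run the examples that appear at the bottom of a help page
example("mean")
```

Running the examples on a help page is often the quickest way to make sense of an otherwise terse description.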

1.5 R License and the Open Source Ideal

R is free (yes, totally free!) and distributed under the GNU General Public License (GPL). In particular, this license allows one to:

  • Download the source code
  • Modify the source code to your heart’s content
  • Distribute the modified source code and even charge money for it, but you must distribute the modified source code under the original GNU license

This license means that R will always be available, will always be open source, and can grow organically without constraint.

1.6 RStudio

RStudio is an integrated development environment (IDE) for R. It provides a graphical user interface (GUI) for R, making it easier to write and execute R code. RStudio also provides several other useful features, including a built-in console, syntax-highlighting editor, and tools for plotting, history, debugging, workspace management, and workspace viewing. RStudio is available in both free and commercial editions; the commercial edition provides some additional features, including support for multiple sessions and enhanced debugging.

1.6.1 Getting started with RStudio

To get started with RStudio, you first need to install both R and RStudio on your computer. Follow these steps:

  1. Download and install R from the official R website.
  2. Download and install RStudio from the official RStudio website.
  3. Launch RStudio. You should see the RStudio interface with four panels.
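
One way to confirm that the installation worked is to type a few commands into the console after launching RStudio. This is only a quick check, not an official installation test; the exact version string and session details will differ between computers.

```r
# Print the version of R that is running
R.version.string

# Summarize the R version, operating system, and attached packages
sessionInfo()

# A trivial calculation to confirm that commands are evaluated
1 + 1
```

If each command prints a result rather than an error, R is installed and RStudio has found it.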

1.6.2 The RStudio Interface

RStudio’s interface consists of four panels (see Figure 1.2):

  • Console
    This panel displays the R console, where you can enter and execute R commands directly. The console also shows the output of your code, error messages, and other information.
  • Source
    This panel is where you write and edit your R scripts. You can create new scripts, open existing ones, and run your code from this panel.
  • Environment
    This panel displays your current workspace, including all variables, data objects, and functions that you have created or loaded in your R session.
  • Plots, Packages, Help, and Viewer
    These panels display plots, installed packages, help files, and web content, respectively.

Figure 1.2: The RStudio interface. In this layout, the source pane is in the upper left, the console is in the lower left, the environment panel is in the top right and the viewer/help/files panel is in the bottom right.

Do I need to use RStudio?

No. You can use R without RStudio. However, RStudio makes it easier to write and execute R code, and it provides several useful features that are not available in the basic R console. Note that the only part of RStudio that is actually interacting with R directly is the console. The other panels are simply providing a GUI that enhances the user experience.
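
To see how the console relates to the other panels, you might type commands like the following. The object names and values are made up for illustration. The same commands run in plain R; RStudio only changes where the results are displayed.

```r
# Create two objects; in RStudio they appear in the Environment panel
heights_cm <- c(152, 171, 165, 180)
mean_height <- mean(heights_cm)

# ls() lists the objects in the workspace, the same information
# that the Environment panel displays
ls()

# Draw a plot; in RStudio it appears in the Plots panel
hist(heights_cm, main = "Example heights", xlab = "Height (cm)")

# Open a help page; in RStudio it appears in the Help panel
?hist
```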

Customizing the RStudio Interface

You can customize the layout of RStudio to suit your preferences. To do so, go to Tools > Global Options. Under Appearance, you can change the theme and font size; under Pane Layout, you can choose which panel appears in each quadrant. You can also resize the panels as needed to gain screen real estate (see Figure 1.3).

Figure 1.3: Dealing with limited screen real estate can be a challenge, particularly when you want to open another window to, for example, view a web page. You can resize the panes by sliding the center divider (red arrows) or by clicking on the minimize/maximize buttons (see blue arrow).

In summary, R and RStudio are powerful tools for genomics data analysis. By understanding the advantages of using R and RStudio and familiarizing yourself with the RStudio interface, you can efficiently analyze and visualize your data. In the following chapters, we will delve deeper into the functionality of R, Bioconductor, and various statistical methods to help you gain a comprehensive understanding of genomics data analysis.