5  Reading and writing data files

Published

July 26, 2024

5.1 Introduction

In this chapter, we will discuss how to read and write data files in R. Data files are essential for storing and sharing data across different platforms and applications. R provides a variety of functions and packages to read and write data files in different formats, such as text files, CSV files, Excel files. By mastering these functions, you can efficiently import and export data in R, enabling you to perform data analysis and visualization tasks effectively.

5.2 CSV files

Comma-Separated Values (CSV) files are a common file format for storing tabular data. They consist of rows and columns, with each row representing a record and each column representing a variable or attribute. CSV files are widely used for data storage and exchange due to their simplicity and compatibility with various software applications. In R, you can read and write CSV files using the read.csv() and write.csv() functions, respectively. A commonly used alternative is to use the readr package, which provides faster and more user-friendly functions for reading and writing CSV files.

5.2.1 Writing a CSV file

Since we are going to use the readr package, we need to install it first. You can install the readr package using the following command:

Once the package is installed, you can load it into your R session using the library() function:

Since we don’t have a CSV file sitting around, let’s create a simple data frame to write to a CSV file. Here’s an example data frame:

df <- data.frame(
  id = c(1, 2, 3, 4, 5),
  name = c("Alice", "Bob", "Charlie", "David", "Eve"),
  age = c(25, 30, 35, 40, 45)
)

Now, you can write this data frame to a CSV file using the write_csv() function from the readr package. Here’s how you can do it:

write_csv(df, "data.csv")

You can check the current working directory to see if the CSV file was created successfully. If you want to specify a different directory or file path, you can provide the full path in the write_csv() function.

# see what the current working directory is
getwd()
[1] "/Users/seandavis/Documents/git/RBiocBook"
# and check to see that the file was created
dir(pattern = "data.csv")
[1] "data.csv"

5.2.2 Reading a CSV file

Now that we have a CSV file, let’s read it back into R using the read_csv() function from the readr package. Here’s how you can do it:

df2 <- read_csv("data.csv")
Rows: 5 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): name
dbl (2): id, age

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

You can check the structure of the data frame df2 to verify that the data was read correctly:

df2
# A tibble: 5 × 3
     id name      age
  <dbl> <chr>   <dbl>
1     1 Alice      25
2     2 Bob        30
3     3 Charlie    35
4     4 David      40
5     5 Eve        45

The readr package can read CSV files with various delimiters, headers, and data types, making it a versatile tool for handling tabular data in R. It can also read CSV files directly from web locations like so:

df3 <- read_csv("https://data.cdc.gov/resource/pwn4-m3yp.csv")
Rows: 1000 Columns: 10
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (1): state
dbl  (6): tot_cases, new_cases, tot_deaths, new_deaths, new_historic_cases, ...
dttm (3): date_updated, start_date, end_date

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

The dataset that you just downloaded is described here: Covid-19 data from CDC

5.3 Excel files

Microsoft Excel files are another common file format for storing tabular data. Excel files can contain multiple sheets, formulas, and formatting options, making them a popular choice for data storage and analysis. In R, you can read and write Excel files using the readxl package. This package provides functions to import and export data from Excel files, enabling you to work with Excel data in R.

5.3.1 Reading an Excel file

To read an Excel file in R, you need to install and load the readxl package. You can install the readxl package using the following command:

Once the package is installed, you can load it into your R session using the library() function:

Now, you can read an Excel file using the read_excel() function from the readxl package. We don’t have an excel file available, so let’s download one from the internet. Here’s an example:

download.file('https://www.w3resource.com/python-exercises/pandas/excel/SaleData.xlsx', 'SaleData.xlsx')

Now, you can read the Excel file into R using the read_excel() function:

df_excel <- read_excel("SaleData.xlsx")

You can check the structure of the data frame df_excel to verify that the data was read correctly:

df_excel
# A tibble: 45 × 8
   OrderDate           Region  Manager SalesMan  Item  Units Unit_price Sale_amt
   <dttm>              <chr>   <chr>   <chr>     <chr> <dbl>      <dbl>    <dbl>
 1 2018-01-06 00:00:00 East    Martha  Alexander Tele…    95       1198   113810
 2 2018-01-23 00:00:00 Central Hermann Shelli    Home…    50        500    25000
 3 2018-02-09 00:00:00 Central Hermann Luis      Tele…    36       1198    43128
 4 2018-02-26 00:00:00 Central Timothy David     Cell…    27        225     6075
 5 2018-03-15 00:00:00 West    Timothy Stephen   Tele…    56       1198    67088
 6 2018-04-01 00:00:00 East    Martha  Alexander Home…    60        500    30000
 7 2018-04-18 00:00:00 Central Martha  Steven    Tele…    75       1198    89850
 8 2018-05-05 00:00:00 Central Hermann Luis      Tele…    90       1198   107820
 9 2018-05-22 00:00:00 West    Douglas Michael   Tele…    32       1198    38336
10 2018-06-08 00:00:00 East    Martha  Alexander Home…    60        500    30000
# ℹ 35 more rows

The readxl package provides various options to read Excel files with multiple sheets, specific ranges, and data types, making it a versatile tool for handling Excel data in R.

5.3.2 Writing an Excel file

To write an Excel file in R, you can use the write_xlsx() function from the writexl package. You can install the writexl package using the following command:

install.packages("writexl")

Once the package is installed, you can load it into your R session using the library() function:

The write_xlsx() function allows you to write a data frame to an Excel file. Here’s an example:

write_xlsx(df, "data.xlsx")

You can check the current working directory to see if the Excel file was created successfully. If you want to specify a different directory or file path, you can provide the full path in the write_xlsx() function.

# see what the current working directory is
getwd()
[1] "/Users/seandavis/Documents/git/RBiocBook"
# and check to see that the file was created
dir(pattern = "data.xlsx")
[1] "data.xlsx"

5.4 Additional options

  • Google Sheets: You can read and write data from Google Sheets using the googlesheets4 package. This package provides functions to interact with Google Sheets, enabling you to import and export data from Google Sheets to R.
  • JSON files: You can read and write JSON files using the jsonlite package. This package provides functions to convert R objects to JSON format and vice versa, enabling you to work with JSON data in R.
  • Database files: You can read and write data from database files using the DBI and RSQLite packages. These packages provide functions to interact with various database systems, enabling you to import and export data from databases to R.