Databases, and relational databases in particular, are designed to handle large amounts of data and to maintain data consistency across concurrent users. For this reason, they are often used to store structured data. Connecting to a relational database from R is relatively straightforward and lets you retrieve, filter, manipulate, and visualize the data it holds. In addition, because data can be filtered in the database before it is loaded into R, this approach makes it possible to work with datasets much larger than would fit into memory. I will discuss the basics of relational databases and present a short tutorial on working with them from R using standard SQL and dplyr.
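As a rough illustration of filtering in the database before loading into R, here is a minimal sketch. It assumes the DBI, RSQLite, dplyr, and dbplyr packages are installed; the in-memory SQLite database, the `surveys` table, and its `species` and `weight` columns are hypothetical stand-ins and are not taken from the course materials.

```r
library(DBI)
library(dplyr)  # dbplyr must also be installed for tbl() to work on a database

# Assumption: an in-memory SQLite database stands in for a real database server
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical example table; a real database would already hold the data
dbWriteTable(con, "surveys", data.frame(
  species = c("DM", "DM", "PF", "PF"),
  weight  = c(40, 48, 7, 9)
))

# Standard SQL: the WHERE clause is evaluated by the database,
# so only the matching rows are transferred into R
heavy_sql <- dbGetQuery(
  con,
  "SELECT species, weight FROM surveys WHERE weight > 10"
)

# dplyr: the pipeline is translated to SQL and run in the database;
# collect() is the step that pulls the filtered result into R
heavy_dplyr <- tbl(con, "surveys") %>%
  filter(weight > 10) %>%
  select(species, weight) %>%
  collect()

dbDisconnect(con)
```

Both approaches should return the same rows. With a real server, only the `dbConnect()` call would need to change, using whichever DBI driver matches that database.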
The course materials include:
Thanks to the folks at Software Carpentry, whose extended SQL tutorial inspired many of the ideas here.