What is R?
R is several things at once. Depending on whom you ask, R is:
- A software package
- A programming language
- A toolkit for developing statistical and analytical tools
- An extensive library of statistical and mathematical software and algorithms
- A scripting language
- A complete data science environment
- …
Why use R?
- R is cross-platform and runs on Windows, Mac, and Linux (as well as more obscure systems).
- R provides a vast number of useful statistical tools, many of which have been painstakingly tested.
- R produces publication-quality graphics in a variety of formats.
- R interoperates well with Fortran, C, and scripts written in many other languages.
- R scales, making it useful for small and large projects. It is NOT Excel.
- R does not have a meaningfully useful graphical user interface (GUI).
Why not use R?
- R cannot do everything.
- R is not always the “best” tool for the job.
- R will not hold your hand. Often, it will slap your hand instead.
- The documentation can be opaque (but there is documentation).
- R can drive you crazy (on a good day) or age you prematurely (on a bad one).
- Finding the right package to do the job you want to do can be challenging; worse, some contributed packages are unreliable.
- R does not have a meaningfully useful graphical user interface (GUI).
R License and the Open Source Ideal
R is free (yes, totally free!) and is distributed under the GNU General Public License (GPL). In particular, this license allows you to:
- Download the source code
- Modify the source code to your heart’s content
- Distribute the modified source code and even charge money for it, provided that you distribute it under the original GNU license
This license means that R will always be available, will always be open source, and can grow organically without constraint.
R and Data Science
Data science is both a field of study and a process. All of us practice data science when we use R.
R has capabilities for, and shines in, every stage of the data science process.