Introduction to R

Len Goff




Where does R fit into the constellation of software for economics research?

  • What most economists currently use:
    • Stata and R for applied microeconomics
    • Matlab for macro, computational
  • Increasingly, economists are using other general purpose programming languages for stats/machine learning:
    • Python, Julia
  • Other software you may have heard of (mostly older):
    • SAS, SPSS, GAUSS, Mathematica


Most of these software environments can do most of the things you might want. Main considerations: ease of coding, cost, speed, strength of user-generated libraries.

The workflow of a programming research project

Before we dive into the nuts-and-bolts of R, let's discuss the important issue of staying organized in a research project.

This is both a practical matter and important for the principle of replicability.

There should always be a complete enough record of the steps from your initial input of data (e.g. from the web) to the final results that somebody could repeat your whole analysis and arrive at numerically identical results.
  • This person will likely be you later, after you've forgotten exactly what you did.
  • Replicability is a strong reason to use code-based software rather than point-and-click (e.g. Excel)
  • Submitting code (and data, when possible) is now a requirement at many journals
  • Subtle issue: when using randomization-based methods (e.g. bootstrap standard errors, simulations), you should set a "seed" so that the random draws, and hence the results, are the same on every run
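
To illustrate the point about seeds, here is a minimal sketch in R. The function name bootstrap_se, the seed values, and the simulated data are all made up for illustration; the only thing it is meant to show is that fixing the seed with set.seed() makes a randomization-based result repeat exactly.

```r
# Bootstrap standard error of a sample mean, made reproducible with a seed.
# bootstrap_se() is a hypothetical helper written for this illustration.
bootstrap_se <- function(seed, n_obs = 100, n_boot = 500) {
  set.seed(seed)                            # fix the random number stream
  x <- rnorm(n_obs, mean = 5, sd = 2)       # simulated "data"
  # Resample the data with replacement and record the mean each time
  boot_means <- replicate(
    n_boot,
    mean(sample(x, size = n_obs, replace = TRUE))
  )
  sd(boot_means)                            # bootstrap standard error
}

se_run1  <- bootstrap_se(seed = 12345)
se_run2  <- bootstrap_se(seed = 12345)      # same seed as the first run
se_other <- bootstrap_se(seed = 54321)      # different seed

print(identical(se_run1, se_run2))          # TRUE: same seed, same answer
print(c(se_run1, se_other))                 # a different seed gives different draws
```

Without the call to set.seed(), two runs of the same code would generally give slightly different standard errors, and nobody could reproduce your table to the last digit.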

The project folder

When embarking on a research project (e.g. your senior thesis), I like to create a master folder for the project. Then I keep different types of things separate within it, e.g.

My Project Name/
  • raw data/
  • processed data/
  • code/
  • results/
  • notes/
  • literature/
  • presentations/
  • paper/
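
If you'd rather not create these folders by hand, the layout above could be set up from R along the following lines. This is only a sketch: it builds the project inside a temporary directory so that it can be run safely, and in practice you would point project_root at wherever you keep your research (that location is an assumption, not part of the recipe).

```r
# Create the project folder layout from R.
# tempdir() is used here only so the example is safe to run anywhere;
# replace it with your own location, e.g. a synchronized research folder.
project_root <- file.path(tempdir(), "My Project Name")

subfolders <- c("raw data", "processed data", "code", "results",
                "notes", "literature", "presentations", "paper")

for (folder in subfolders) {
  # recursive = TRUE also creates project_root itself if it is missing;
  # showWarnings = FALSE keeps re-runs quiet when folders already exist
  dir.create(file.path(project_root, folder),
             recursive = TRUE, showWarnings = FALSE)
}

# List what was created
print(list.dirs(project_root, full.names = FALSE, recursive = FALSE))
```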

Try not to rely too heavily on filenames to remind you what a thing is. Rather, keep a record of how each file was produced (see below).

Inside the raw data folder

If a file was produced by code (e.g. an intermediate dataset, table, figure) then there is already a record of how it got there.

If I downloaded data from the web, I like to include a "_datasource.txt" file that describes the URL, date of download, and any notes.
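
One way to make this a habit is to write the "_datasource.txt" file from the same R code that fetches the data. The sketch below is an assumption about how one might do it: record_datasource is a hypothetical helper, the URL is a placeholder rather than a real data source, and the actual download is left commented out so that the example runs without a network connection.

```r
# Write a small "_datasource.txt" file next to downloaded raw data.
# record_datasource() is a hypothetical helper, not a built-in function.
record_datasource <- function(folder, url, notes = "") {
  dir.create(folder, recursive = TRUE, showWarnings = FALSE)
  lines <- c(
    paste("URL:", url),
    paste("Date of download:", format(Sys.Date(), "%Y-%m-%d")),
    paste("Notes:", notes)
  )
  out_file <- file.path(folder, "_datasource.txt")
  writeLines(lines, out_file)
  invisible(out_file)   # return the path without printing it
}

# A temporary folder stands in for the project's "raw data" folder
raw_folder <- file.path(tempdir(), "raw data")
data_url   <- "https://example.com/mydata.csv"   # placeholder URL (assumption)

# The download itself would be something like (needs a network connection):
# download.file(data_url, file.path(raw_folder, "mydata.csv"))

source_file <- record_datasource(raw_folder, data_url,
                                 notes = "Downloaded as-is, no manual edits.")
print(readLines(source_file))
```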

Similarly, if you had to do anything "by hand" to your data (e.g. format it somehow in Excel before import to R), it's good to keep a record of exactly what you did (e.g. in some sort of "notes" file or perhaps as comments in your R code).

Considerations:

Wise to work within a folder that's synchronized to the cloud (e.g. Dropbox, iCloud, etc.). This also makes collaboration much easier.

If you have sensitive data, you may want to add "end-to-end" encryption. I use BoxCryptor.

"Version control": most popular software for this is called Git.
  • Additionally, may want an "/old" folder within each subfolder (e.g. /code) to save versions periodically and at salient points.


If you're going to be doing several research projects:
  • Nice to put all projects in parent folder, e.g. Dropbox/Research
  • Makes sense to keep raw data in a folder that's outside of project folders, if using some of the same data across projects

Now on to R!

Getting and running R

R is open-source software. Yay! You can download it for free here. Comes with a simple interface for editing and running code.

Usually, it's easiest to use an interactive interface to edit and run your code (scroll down for more info):
  • RStudio -- the most popular among economists
  • Jupyter notebooks
  • RGui (comes by default with R)
  • Using another "IDE", e.g. Eclipse, Atom


R can also be run from the command line, e.g. "R.exe CMD BATCH myscript.R" in Windows Command Prompt. Might be necessary if running on a server.
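
When a script runs in batch mode there is nobody at the console, so it should save whatever it produces to files. Below is a rough sketch of what a "myscript.R" might contain. The regression, the output filename, and the use of the built-in mtcars dataset are illustrative assumptions, and a temporary directory stands in for the project's results folder.

```r
# myscript.R: a script meant to run start-to-finish with no interaction

# In a real project this would be the project's results/ folder;
# a temporary directory is used so the example is safe to run anywhere.
results_dir <- file.path(tempdir(), "results")
dir.create(results_dir, recursive = TRUE, showWarnings = FALSE)

# A toy analysis on a dataset that ships with R
fit <- lm(mpg ~ wt, data = mtcars)
coef_table <- as.data.frame(summary(fit)$coefficients)

# Save the results to disk rather than relying on the console
out_file <- file.path(results_dir, "regression_mpg_on_wt.csv")
write.csv(coef_table, out_file)

# Anything printed still ends up in the batch log
print(coef_table)
```

With R CMD BATCH, the printed output is by default collected in a log file named after the script (here myscript.Rout), which is handy for checking afterwards that the run completed.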

RGui

RStudio

A Jupyter notebook

You can install Jupyter notebooks by downloading the Anaconda software package. Then follow these instructions to integrate R.

We'll shortly discuss some basic syntax and tips for getting started quickly in R. But more important is knowing where to turn and how to make use of resources as you go.

stackoverflow.com is your friend!

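For the syntax of a particular function, use "?" at the R prompt, e.g. ?read.csv. To search the help pages of all installed packages for a keyword, use "??", e.g. ??readstata13. Both work in RStudio and in any other R console. Online, official PDF documentation will look like: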
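
As a quick sketch of the built-in help tools, using only functions that ship with base R (read.csv is picked here purely as a familiar example):

```r
# Open the help page for one specific function
?read.csv
help("read.csv")        # same thing, written as a function call

# Search the help pages of all installed packages for a keyword
??readstata13           # shorthand for help.search("readstata13")

# Quick reminders without leaving the console
args(read.csv)                          # show the function's arguments
csv_args <- names(formals(read.csv))    # the argument names as a vector
print(csv_args)
```

The "?" form only finds functions from packages that are currently loaded, whereas "??" searches everything installed, which is useful when you can't remember which package a function lives in.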

There are many tutorials on the web that cover basic syntax in R. Here is a nice one that presents it side-by-side with Python.

Getting started in R

The rest of this tutorial will be displayed as a Jupyter notebook.

Click here to open the notebook as a static webpage.

If you'd like to download the Jupyter notebook to run it interactively, click here.
  • This requires installing Jupyter (e.g. through Anaconda)
  • Note: Jupyter notebooks can also be run interactively in the cloud through the binder project