2.2 Brief Intro to R

Throughout this class, we use R and RStudio to visualize, analyze, and model real data. To keep the two straight: R is the programming language itself (its syntax, vocabulary, etc.), and RStudio is a convenient software interface that you'll use on the computer to write and run R code.

While you’ll be learning about and using R throughout the course, this is not a course on R. Our focus will be on data and statistical modelling. We will be using R and RStudio as tools to help us get information from data.

2.2.1 Basic Syntax

In this class, we will have data that we want to pass to a function, which performs a particular operation on that data (does something cool). We pass these inputs to the function as arguments:

FunctionName(argument1 = a1, argument2 = a2,..., argumentk = ak)

Note the function name (FunctionName) followed by parentheses. Inside the parentheses, each argument name (e.g., argument1) comes first, followed by = and then the value you are passing as input (e.g., a1). Arguments are separated by commas.
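As a concrete instance of this template, here is a small sketch using base R's round() function, whose arguments are x (the number to round) and digits (how many decimal places to keep). The value 3.14159 is just an illustrative input.

```r
# Call round() with named arguments:
# x is the value to round, digits is the number of decimal places to keep
round(x = 3.14159, digits = 2)
#> [1] 3.14
```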

We may want to save the output of the function by assigning it a name using the assignment operator, <-:

OutputName <- FunctionName(argument1 = a1, argument2 = a2,..., argumentk = ak)
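For example, this sketch saves the output of round() under the name rounded_pi (a name chosen here purely for illustration) and then reuses the saved value in a later calculation.

```r
# Save the output of round() under the name rounded_pi;
# the assignment itself prints nothing
rounded_pi <- round(x = 3.14159, digits = 2)

# Typing the name prints the stored value
rounded_pi
#> [1] 3.14

# The saved output can be used as an input to later code
rounded_pi * 2
#> [1] 6.28
```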

R also lets us be lazy and leave out the argument names, as long as we provide the inputs in the order the function expects:

OutputName <- FunctionName(a1, a2,..., ak)
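To see why order matters, here is a sketch with base R's seq(), whose first three arguments are from, to, and by. Named arguments can be written in any order, but unnamed inputs are matched to arguments by position.

```r
# Named arguments: order does not matter
seq(from = 1, to = 10, by = 3)
#> [1]  1  4  7 10
seq(by = 3, to = 10, from = 1)
#> [1]  1  4  7 10

# Unnamed inputs are matched by position: from, to, by
seq(1, 10, 3)
#> [1]  1  4  7 10

# Swapping positions changes the meaning: now from = 10 and to = 1
seq(10, 1)
#> [1] 10  9  8  7  6  5  4  3  2  1
```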

We can also nest functions: perform one operation and pass its result as an input to another function. In the code below, Function1() runs first on the input data, and its output is then passed as the first input to Function2(). In other words, R evaluates nested functions from the inside out.

Function2(Function1(data))
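Here is a sketch of nesting with base R functions, using a small made-up vector (the name heights and its values are hypothetical). mean() runs first, and its result becomes the first input to round(); the second version does the same thing in two explicit steps.

```r
# Made-up heights (in inches), for illustration only
heights <- c(64, 70.5, 68, 72.25)

# Inside out: mean() runs first, then round() rounds its result to 1 decimal place
round(mean(heights), 1)
#> [1] 68.7

# The same computation written as two separate steps
avg_height <- mean(heights)
round(avg_height, 1)
#> [1] 68.7
```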

As we work through real examples below, notice the names of the functions we use. The function name comes right before the opening parenthesis (, and the inputs we pass in go inside the parentheses.

Additionally, we will use a shortcut that makes our code more readable: the pipe, which looks like %>% (it comes from the magrittr package and is made available when you load dplyr). The pipe passes the output on its left as the first argument to the function on its right. The following two pieces of code do exactly the same thing, but the second is easier to read. In both, we take data and use summarize() to compute the mean of the variable height.

summarize(data, mean(height))

data %>%
  summarize(mean(height))
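To try both versions yourself, here is a sketch that assumes the dplyr package is installed (it provides summarize(), filter(), and %>%); the data frame and its height values are made up for illustration. The last two commands add a filter() step, which shows where the pipe helps most: nested code must be read from the inside out, while piped code reads top to bottom, one step per line.

```r
library(dplyr)  # assumes dplyr is installed; provides summarize(), filter(), and %>%

# Made-up data for illustration only
data <- data.frame(height = c(64, 70.5, 68, 72.25))

# Without the pipe: data is written as the first argument of summarize()
res_nested <- summarize(data, mean(height))

# With the pipe: data is passed as the first argument of summarize()
res_piped <- data %>%
  summarize(mean(height))

# Multi-step example: keep heights above 65, then compute their mean.
# Both commands return a one-row data frame with avg_height = 70.25.
summarize(filter(data, height > 65), avg_height = mean(height))

data %>%
  filter(height > 65) %>%
  summarize(avg_height = mean(height))
```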

There is so much more we could say about functions in R, but we will stop here for now.

Throughout this class, we'll point you to external references if you'd like to deepen your understanding of R as a programming language.

To get a broad sense of R, you can work through the R primers (https://rstudio.cloud/learn/primers) in RStudio Cloud alongside your coursework and refer to the R cheatsheets available online (https://rstudio.cloud/learn/cheat-sheets).