Checkpoint 3
You should work with your time series project partner on the R code, but I want each of you to write your own paragraphs about what you learn. Having separate insights will benefit you when you write up the mini-project.
Github Setup
If you haven’t already, create a free Github account by going to github.com. Choose a username that you’d like to use for a long time (something you’d be willing to share with future employers) and send Brianna your username. See Github Setup for more information.
To create a shared repository (for you, your partner, and me), each person should go to https://classroom.github.com/a/0LJLLFiq. One of you should create a new team named TS-Name1-Name2, filling in your and your partner’s names for Name1 and Name2. Once the team is created, the other person can join that team.
Once the repository is created, you’ll see an Rmd template file and the csv file of the data in it. You can work with the repository by setting it up in Github Desktop (my recommendation) or by uploading files directly on github.com.
Think of Github as an online storage location (like Google Drive) that keeps version history for code files more elegantly than Google Drive does, which makes collaboration efficient. Keep in mind, though, that you have a “local version” on your computer. For your collaborator to see your updates, you must make a commit (a snapshot of the file) and push it to Github (the cloud). To see any updates your collaborator makes, you must pull the changes from Github to your local machine.
To commit, push, and pull, I recommend using Github Desktop, set up by following the Github Setup instructions.
If that feels like too much right now, you can commit (and push) directly on github.com by uploading files, and you can pull by downloading files.
Revisit Checkpoint 2
- With your partner, decide on one meter on Macalester’s campus you’d like to explore. You may choose one of the meters you used individually or choose another. Please list the Meter Name and Property Name you decided on together.
ANSWER:
- Do a brief search on electricity usage and type (e.g., seasonality, sustainability goals on college campuses). Find 2 reputable sources (journal articles, news sources, or websites) on the topic. Write a short paragraph introducing the general topic of electricity use and explaining why you think it is interesting and important to investigate usage over time.
ANSWER:
Source 1:
Source 2:
Paragraph [Person A]:
Paragraph [Person B]:
Visualize
- Create 2 interesting and informative visualizations of the time series. Each of you should write a paragraph summarizing what you learn about the data from the visualizations.
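As a starting point, here is a minimal R sketch of two common views of a monthly series: the series over time, and its distribution by calendar month. It assumes monthly data stored in a `ts` object called `y`; the values, start date, and units below are hypothetical simulated stand-ins, not Macalester meter data.

```r
# HYPOTHETICAL stand-in data: 8 years of simulated monthly usage.
# Replace y with a monthly ts built from your own meter's csv.
set.seed(1)
tt <- 1:96                                                  # time index (months)
y <- ts(1000 + 2 * tt + 50 * cos(2 * pi * tt / 12) +        # trend + seasonal pattern
          as.numeric(arima.sim(list(ar = 0.6), n = 96, sd = 10)),  # AR(1) noise
        start = c(2016, 1), frequency = 12)

# Plot 1: usage over time (overall level, trend, and repeating swings)
plot(y, ylab = "Usage (placeholder units)", main = "Monthly usage over time")

# Plot 2: usage by calendar month (which months tend to run high or low)
month <- cycle(y)  # 1 = January, ..., 12 = December
boxplot(as.numeric(y) ~ as.numeric(month),
        xlab = "Month", ylab = "Usage (placeholder units)")
```

The by-month boxplot is often more informative than a second time plot, because it separates the seasonal pattern from the year-to-year changes.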
Paragraph of Plot 1 [Person A]:
Paragraph of Plot 2 [Person B]:
Detrend & Decompose
- Estimate the trend of the time series; make a plot of the estimated trend and a plot of the leftover residuals. Justify the method you used to estimate the trend, and write a brief paragraph about what you learn about the data from the visualizations.
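A minimal sketch of estimating a trend with lm(), using a B-spline basis from the splines package (a quadratic polynomial is shown as a commented alternative). The data are hypothetical simulated stand-ins, and the spline's 4 degrees of freedom is an illustrative choice; part of your justification is explaining why your choice follows the trend without chasing the seasonal swings.

```r
library(splines)  # bs(): B-spline basis (ships with R)

# HYPOTHETICAL stand-in data: 8 years of simulated monthly usage.
set.seed(1)
tt <- 1:96
y <- ts(1000 + 2 * tt + 50 * cos(2 * pi * tt / 12) +
          as.numeric(arima.sim(list(ar = 0.6), n = 96, sd = 10)),
        start = c(2016, 1), frequency = 12)
usage <- as.numeric(y)

# B-spline trend with 4 degrees of freedom (illustrative setting)
trend_fit <- lm(usage ~ bs(tt, df = 4))
# Alternative: quadratic polynomial trend
# trend_fit <- lm(usage ~ poly(tt, 2))

trend_hat   <- fitted(trend_fit)  # estimated trend
resid_trend <- usage - trend_hat  # what is left after removing the trend

par(mfrow = c(1, 2))
plot(tt, usage, type = "l", xlab = "Month index", ylab = "Usage",
     main = "Estimated trend")
lines(tt, trend_hat, col = "red", lwd = 2)
plot(tt, resid_trend, type = "l", xlab = "Month index", ylab = "Residual",
     main = "After removing trend")
abline(h = 0, lty = 2)
```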
ANSWER:
Paragraph [Person A]:
Paragraph [Person B]:
- Estimate the seasonality; make a plot of the estimated seasonality and a plot of the leftover residuals. Justify the method you used to estimate the seasonality, and write a brief paragraph about what you learn about the data from the visualizations.
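One way to estimate seasonality, sketched below on hypothetical simulated monthly data, is to regress the detrended series on month indicator variables so each calendar month gets its own average offset. The B-spline trend step is repeated so the block runs on its own; sine/cosine terms are a smoother alternative when the pattern looks sinusoidal.

```r
library(splines)  # bs(): B-spline basis (ships with R)

# HYPOTHETICAL stand-in data: 8 years of simulated monthly usage.
set.seed(1)
tt <- 1:96
y <- ts(1000 + 2 * tt + 50 * cos(2 * pi * tt / 12) +
          as.numeric(arima.sim(list(ar = 0.6), n = 96, sd = 10)),
        start = c(2016, 1), frequency = 12)
usage <- as.numeric(y)
resid_trend <- residuals(lm(usage ~ bs(tt, df = 4)))  # detrended series

month <- factor(cycle(y))               # calendar month as a categorical variable
season_fit <- lm(resid_trend ~ month)   # one average offset per month
season_hat <- fitted(season_fit)
resid_both <- resid_trend - season_hat  # left after removing trend and seasonality

season_by_month <- tapply(season_hat, month, mean)

par(mfrow = c(1, 2))
plot(1:12, season_by_month, type = "b", xlab = "Month",
     ylab = "Seasonal effect", main = "Estimated seasonality")
plot(tt, resid_both, type = "l", xlab = "Month index", ylab = "Residual",
     main = "After removing trend and seasonality")
abline(h = 0, lty = 2)
```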
ANSWER:
Paragraph [Person A]:
Paragraph [Person B]:
- Now go back to the original data and use differencing to remove the trend and seasonality. Make a plot of the leftover residuals, and write a brief paragraph about what you learn about the data from the visualization.
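A minimal sketch of differencing with base R's diff(), on hypothetical simulated monthly data. A lag-12 difference removes a stable monthly pattern and also turns a linear trend into a constant, so it is worth checking whether an additional lag-1 difference is actually needed before applying it.

```r
# HYPOTHETICAL stand-in data: 8 years of simulated monthly usage.
set.seed(1)
tt <- 1:96
y <- ts(1000 + 2 * tt + 50 * cos(2 * pi * tt / 12) +
          as.numeric(arima.sim(list(ar = 0.6), n = 96, sd = 10)),
        start = c(2016, 1), frequency = 12)

y_seas <- diff(y, lag = 12)      # y_t - y_{t-12}: removes a repeating monthly pattern
y_both <- diff(y_seas, lag = 1)  # extra first difference: removes any remaining trend

par(mfrow = c(1, 2))
plot(y_seas, ylab = "Difference", main = "Lag-12 difference")
abline(h = mean(y_seas), lty = 2)
plot(y_both, ylab = "Difference", main = "Lag-12, then lag-1 difference")
abline(h = 0, lty = 2)
```

If the lag-12 plot already fluctuates around a constant level, the extra lag-1 difference may be unnecessary.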
ANSWER:
Paragraph [Person A]:
Paragraph [Person B]:
- Lastly, plot the sample autocorrelation function and the sample partial autocorrelation function (acf2()) of the errors after removing both the trend and seasonality [choose the errors from differencing or from estimating & removing]. Describe the patterns you see and comment on any insights they give you about how to model the errors. The partial autocorrelation function gives the correlation between points k lags apart, conditional on the data in between. [If we haven’t yet talked about how to use the information from the PACF, you can still comment on what you observe.]
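A minimal sketch of the call, assuming the astsa package (which provides acf2()) is installed and using the seasonally differenced version of a hypothetical simulated monthly series as the errors.

```r
library(astsa)  # provides acf2()

# HYPOTHETICAL stand-in data: 8 years of simulated monthly usage.
set.seed(1)
tt <- 1:96
y <- ts(1000 + 2 * tt + 50 * cos(2 * pi * tt / 12) +
          as.numeric(arima.sim(list(ar = 0.6), n = 96, sd = 10)),
        start = c(2016, 1), frequency = 12)

y_seas <- diff(y, lag = 12)  # errors after removing trend and seasonality by differencing

# Sample ACF and PACF side by side. For a monthly ts the lag axis is in years,
# so 1 on the axis corresponds to 12 months.
acf2(y_seas, max.lag = 36)
```

Spikes near lags 1, 2, 3 on that axis point to seasonal (12-month) structure; the pattern at small lags points to the non-seasonal part.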
ANSWER:
Paragraph [Person A]:
Paragraph [Person B]:
Modeling Errors
- Come up with a list of candidate models for the errors based on the ACF and PACF. Justify those choices.
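The usual rules of thumb: an AR(p) has an ACF that tails off and a PACF that cuts off after lag p; an MA(q) has an ACF that cuts off after lag q and a PACF that tails off; an ARMA(p, q) has both tailing off. Spikes at seasonal lags (12, 24, ... months) suggest seasonal terms. The sketch below checks the AR(1) case on a simulated series (coefficient 0.7 is arbitrary), using base R's acf() and pacf() so the values can be printed.

```r
set.seed(2)
x <- arima.sim(list(ar = 0.7), n = 1000)  # simulated AR(1) with phi = 0.7

# Sample ACF (dropping lag 0) and sample PACF for lags 1 to 10
r_acf  <- as.numeric(acf(x, lag.max = 10, plot = FALSE)$acf)[-1]
r_pacf <- as.numeric(pacf(x, lag.max = 10, plot = FALSE)$acf)

# ACF decays roughly like 0.7^k; PACF is large at lag 1 and near zero afterward
round(rbind(ACF = r_acf, PACF = r_pacf), 2)
```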
ANSWER:
Paragraph [Person A]:
Paragraph [Person B]:
- Fit the candidate models for the errors and compare them. Write a paragraph justifying your choice of one model over the others.
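A minimal sketch of fitting and comparing candidates with astsa's sarima(), assuming the errors are residuals from a regression on a linear trend and month indicators. The data are hypothetical and simulated, so the true error process is AR(1); the three candidates are illustrative. To keep every model on one scale, AIC and BIC are computed with the standard AIC() and BIC() generics on each fit's underlying arima object (the fit element of sarima()'s output); lower is better for both.

```r
library(astsa)  # provides sarima()

# HYPOTHETICAL stand-in data: 8 years of simulated monthly usage.
set.seed(1)
tt <- 1:96
y <- ts(1000 + 2 * tt + 50 * cos(2 * pi * tt / 12) +
          as.numeric(arima.sim(list(ar = 0.6), n = 96, sd = 10)),
        start = c(2016, 1), frequency = 12)
month <- factor(cycle(y))

# Errors after removing a linear trend and monthly seasonality
errs <- ts(residuals(lm(as.numeric(y) ~ tt + month)),
           start = c(2016, 1), frequency = 12)

# Candidate error models (no constant: regression residuals have mean zero)
fits <- list(
  AR1    = sarima(errs, 1, 0, 0, no.constant = TRUE, details = FALSE),
  MA1    = sarima(errs, 0, 0, 1, no.constant = TRUE, details = FALSE),
  ARMA11 = sarima(errs, 1, 0, 1, no.constant = TRUE, details = FALSE)
)

# Information criteria for each candidate (rows: AIC, BIC; columns: models)
ic <- sapply(fits, function(f) c(AIC = AIC(f$fit), BIC = BIC(f$fit)))
round(ic, 1)
```

Information criteria are only part of the argument; the residual diagnostics that sarima() can plot are worth checking for the leading candidates too.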
ANSWER:
Paragraph [Person A]:
Paragraph [Person B]:
- Now fit your chosen model, incorporating the trend estimation or differencing into the model fit. If you used B-splines or a polynomial linear model, incorporate your estimates of the trend and seasonality into the model fit using some example code below (consolidate your lm models into one). If you are using the differenced data, incorporate your differencing through the d (trend) and D (seasonality) arguments in sarima(). Rerun the final models.
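A minimal sketch of both routes with astsa's sarima(), on hypothetical simulated monthly data; the AR(1) error model and the seasonal-difference specification are illustrative, not the answer for your meter. In the regression route, the trend and month indicators go in through xreg, so the regression coefficients and the error model are estimated together. The intercept column is dropped from the design matrix on the assumption that the fit (via stats::arima()) adds its own mean term for an undifferenced series; check the printed coefficients to confirm.

```r
library(astsa)  # provides sarima()

# HYPOTHETICAL stand-in data: 8 years of simulated monthly usage.
set.seed(1)
tt <- 1:96
y <- ts(1000 + 2 * tt + 50 * cos(2 * pi * tt / 12) +
          as.numeric(arima.sim(list(ar = 0.6), n = 96, sd = 10)),
        start = c(2016, 1), frequency = 12)
month <- factor(cycle(y))

# Route 1: regression with AR(1) errors; linear trend + month indicators in xreg.
# (With B-splines, build X from bs(tt, df = ...) + month instead.)
X <- model.matrix(~ tt + month)[, -1]  # drop intercept column (assumed added by the fit)
fit_reg <- sarima(y, 1, 0, 0, xreg = X, details = FALSE)

# Route 2: differencing inside the model; D = 1 with S = 12 is a lag-12 difference.
# Set d = 1 as well if a trend remains after seasonal differencing.
fit_diff <- sarima(y, 1, 0, 0, P = 0, D = 1, Q = 0, S = 12, details = FALSE)

coef(fit_reg$fit)
coef(fit_diff$fit)
```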
ANSWER:
Predicting the Future
Try this out if you have time; otherwise, you can incorporate it into the mini-project.
- Create a prediction for the next 24 months using sarima.for(). Make a plot of those predictions and tell a brief story about what they can tell you.
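A minimal sketch with astsa's sarima.for(), using a seasonal-difference model with AR(1) errors on hypothetical simulated monthly data; the model order is illustrative. sarima.for() plots the forecasts after the recent data and returns the predictions (pred) and their standard errors (se). For a regression-style model, sarima.for() also accepts xreg and newxreg arguments, with newxreg built the same way for the 24 future months (see ?sarima.for).

```r
library(astsa)  # provides sarima.for()

# HYPOTHETICAL stand-in data: 8 years of simulated monthly usage (2016 to 2023).
set.seed(1)
tt <- 1:96
y <- ts(1000 + 2 * tt + 50 * cos(2 * pi * tt / 12) +
          as.numeric(arima.sim(list(ar = 0.6), n = 96, sd = 10)),
        start = c(2016, 1), frequency = 12)

# 24-month forecast from a model with a lag-12 difference and AR(1) errors
fc <- sarima.for(y, n.ahead = 24, 1, 0, 0, P = 0, D = 1, Q = 0, S = 12)

# Approximate 95% prediction interval from the returned standard errors
upper <- fc$pred + 1.96 * fc$se
lower <- fc$pred - 1.96 * fc$se
```

The widening gap between upper and lower is often the most useful part of the story: it shows how quickly certainty about future usage fades.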
ANSWER:
Paragraph [Person A]:
Paragraph [Person B]: