library(tidyverse)
my_data <- read_csv("~/Desktop/112/data/my_data.csv")
File organization
Change the default file download location for your internet browser
- By default, internet browsers generally save all files to the Downloads folder on your computer. This does not encourage good file organization practices. You need to change this option so that your browser asks you where to save each file before downloading it.
- This page has information on how to do this for the most common browsers.
Folder/directory structure
Another word for a computer folder is a directory.
Why is a good directory structure important?
- Organization: It helps you keep track of your files and folders.
- Collaboration: It makes it easier for others to understand your work.
- Reproducibility: It helps you and others reproduce your work.
- Efficiency: It saves you time when you need to find something.
Course Materials Directory Structure
When starting a new semester and a new course, I recommend setting up a directory structure such as the one below. Sub-bullets indicate folders that are inside other folders.
- Documents
  - Course_Work
    - 2024_Fall
      - STAT212
        - class_activities
        - homework
        - project
Data Science Project Directory Structure
When working on any data science project, I recommend setting up the directory structure below. Sub-bullets indicate folders that are inside other folders.
- Documents (This should be some place you can find easily through your Finder (Mac) or File Explorer (Windows).)
  - descriptive_project_name
    - code: All code files (.R, .Rmd, .qmd) should go here. Recommendation:
      - raw: For messy code that you’re actively working on or used for exploration
        - explore_visualizations.qmd: for exploratory plots
        - explore_modeling.qmd: for any statistical or predictive modeling
      - clean: For code that you have cleaned up, documented, organized, and tested to run as expected
        - cleaning.qmd: for data acquisition and wrangling. Save (write) the cleaned dataset at the end of this file with readr::write_csv().
        - final_visualizations.qmd: for final plots
        - final_modeling.qmd: for any statistical or predictive modeling
    - data: All data files go here (raw and cleaned versions)
      - raw: Original data that hasn’t been cleaned (this might involve large files that can’t be pushed to GitHub)
      - clean: Any non-original data that has been processed in some way
    - results: e.g., written narratives, plots saved as images, results tables
      - report.qmd: for written narrative
      - figures: Saved plots (e.g., png, jpeg, svg, pdf, tiff) from using ggsave() that will be used in communicating your project conclusions should go here. (Using screenshots of output in RStudio is not a good practice.)
      - tables: Any sort of plain text file results (e.g., CSVs)
      - interactive: for interactive shiny apps
From this point onward, we will use a simplified version of this directory structure for all of our class activities.
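The project layout above can be created from R in one pass. The following is a minimal sketch using base R's dir.create(). The folder names come from the list above; placing the project under tempdir() is an assumption made here so that the example is safe to run anywhere. In practice you would point project_root at a folder inside Documents.

```r
# Hypothetical location: a temporary folder, so running this does not
# touch your real Documents folder. Swap in your own path in practice.
project_root <- file.path(tempdir(), "descriptive_project_name")

# Subfolders taken from the recommended structure above
subfolders <- c(
  "code/raw",
  "code/clean",
  "data/raw",
  "data/clean",
  "results/figures",
  "results/tables"
)

# recursive = TRUE also creates any missing parent folders (code, data, results)
for (folder in subfolders) {
  dir.create(file.path(project_root, folder), recursive = TRUE, showWarnings = FALSE)
}

# List the folders that now exist, relative to the project root
created <- list.dirs(project_root, full.names = FALSE, recursive = TRUE)
print(created)
```

Running this once at the start of a project gives every later file an obvious place to live.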
File paths
When you read in data from a source on your computer, you need to specify the file path correctly. A file path is an address to where the file is stored on your computer.
Consider “1600 Grand Ave, St. Paul, MN 55105”. Think about how different parts of the address give increasingly more specific information about the location. “St. Paul, MN 55105” tells us the city and smaller region within the city, “Grand Ave” tells us the street, and “1600” tells us the specific location on the street.
In the file path ~/Desktop/112/data/my_data.csv, the location becomes more and more specific as you read it from left to right.
- “~” on an Apple computer tells you that you are looking in the user’s home directory.
- “Desktop” tells you to go to the Desktop within that home directory.
- “112” tells you that you are looking in the 112 folder on the Desktop.
- “data” tells you to next go in the data folder in the 112 folder.
- “my_data.csv” tells you that you are looking for a file called my_data.csv located within the data folder.
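The left-to-right reading described above can be checked directly in R. This is a small sketch using base R functions only; the path is the one from the list above, and the file does not need to exist for the code to run.

```r
path <- "~/Desktop/112/data/my_data.csv"

# Split the path at each "/" to see its parts, from least to most specific
parts <- strsplit(path, "/", fixed = TRUE)[[1]]
print(parts)

# basename() keeps only the last part: the file name
file_name <- basename(path)
print(file_name)

# path.expand() shows which folder "~" stands for on your computer
print(path.expand("~"))
```

The last line prints a different folder on each computer, because the home directory is specific to each user.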
There are two types of paths: absolute and relative.
Absolute file paths start at the “root” directory in a computer system. Examples:
- Mac: ~/Desktop/Course_Work/2024_fall/STAT212/class_activities/advanced_maps/us_states_hexgrid.geojson
  - On a Mac the tilde ~ in a file path refers to the “Home” directory, which is typically a user-specific directory.
- Windows: C:/Users/lesliemyint/Documents/Course_Work/2024_fall/STAT212/class_activities/advanced_maps/us_states_hexgrid.geojson
  - Note: Windows uses both / (forward slash) and \ (backward slash) to separate folders in a file path.
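In R code specifically, a backslash inside a quoted string starts an escape sequence, so a Windows path written with backslashes must double each one. Forward slashes avoid the issue. The sketch below compares the two spellings using a shortened version of the Windows path above (shortened here only for readability).

```r
# Each backslash must be doubled inside an R string
path_backslash <- "C:\\Users\\lesliemyint\\Documents\\Course_Work"

# Forward slashes need no escaping, and R accepts them on Windows too
path_forward <- "C:/Users/lesliemyint/Documents/Course_Work"

# Replacing every backslash with a forward slash gives the same path
converted <- gsub("\\", "/", path_backslash, fixed = TRUE)
print(converted == path_forward)
```

Writing paths with forward slashes keeps the same code usable on both Mac and Windows.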
Relative file paths start wherever you are right now (the working directory). Note that the working directory when you’re working in a code file may be different from the working directory specified in the Console.
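To see where relative paths currently start, you can ask R for the working directory. The sketch below uses base R only; data/data.csv is a hypothetical relative path chosen for illustration, and the file does not need to exist.

```r
# The working directory: the folder that relative paths start from
wd <- getwd()
print(wd)

# A relative path is interpreted as if it were attached to the working directory
relative_path <- "data/data.csv"
resolved <- file.path(wd, relative_path)
print(resolved)

# file.exists() is a quick check that a path points to a real file
print(file.exists(relative_path))
```

Running getwd() once in the Console and once inside a code chunk is a simple way to check whether the two working directories differ.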
Directory setup 1: Data is in same folder as code file
- project_folder
  - your_code_file.qmd
  - data.csv
There are two options for specifying the relative path:
- ./data.csv (The ./ refers to the current working directory.)
- data.csv
Directory setup 2: Data is within a subfolder called data
- project_folder
  - your_code_file.qmd
  - data
    - data.csv
The relative path would be data/data.csv. (Note: ./data/data.csv would also work.)
Directory setup 3: Need to go to a “parent” folder first to get to the data
- project_folder
  - data
    - data.csv
  - code
    - your_code_file.qmd
From your_code_file.qmd, you must go “up” from code to its parent folder, project_folder, and then back “down” into the data folder. To go “up” to a parent folder in a relative path, we use ../.
The relative path to the data file would be ../data/data.csv.
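Directory setup 3 can be tried out without touching your own files. The sketch below rebuilds it inside tempdir() (an assumption made here for safety), writes a tiny made-up dataset, and reads it back through the ../ path. It uses base R's read.csv() so that it runs without extra packages; read_csv() accepts the same relative path.

```r
# Build directory setup 3 inside a temporary folder so the example
# does not depend on any files already on your computer
project_folder <- file.path(tempdir(), "project_folder")
dir.create(file.path(project_folder, "data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project_folder, "code"), recursive = TRUE, showWarnings = FALSE)

# Put a tiny made-up dataset at project_folder/data/data.csv
write.csv(
  data.frame(x = 1:3),
  file.path(project_folder, "data", "data.csv"),
  row.names = FALSE
)

# Work from the code folder, as your_code_file.qmd would
old_wd <- setwd(file.path(project_folder, "code"))

# "../" goes up to project_folder, then "data/" goes back down
my_data <- read.csv("../data/data.csv")
print(nrow(my_data))

# Return to the original working directory
setwd(old_wd)
```

If the working directory were project_folder instead of code, the same file would be reached with data/data.csv, as in directory setup 2.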