Project Overview


I. Goals


  • Tell a story using data, on a topic of your choice, and at your own direction.

    • Why? Working outside the context of homework prompts can be overwhelming. The best way to confront this feeling with confidence is through practice.

    • How? Keep an open, creative mind and be kind to yourself. Know that there’s not one “right” answer to this analysis. Instead of pursuing the “perfect” analysis, pursue a logical set of decisions which lead to a reasonable analysis.

  • Showcase your interests, skills, etc. to future employers, graduate programs, and friends / family. No matter your field of interest, it is important to have some independent projects to highlight, share, or discuss.

  • Practice critical skills in independent research: a growth mindset, teamwork, time-management, communication, and identification and use of important materials.

  • Be creative & have fun.





II. Structure

You will work in groups of 3-4 students, which I will help form using your input.

  • Why groups?

    • Projects are more successful when you have a team that can bounce ideas around, help when you get stuck, catch mistakes, complement your skill set, etc.

    • No matter your eventual career path, it will involve teamwork.

  • Each group member will be graded on their own contribution to every project stage. Thus grades may vary among group members.





III. General content

Your data story should include the following, all within context.

  • Motivation, research question, & background

    • What question are you trying to answer?

    • Why? Why is it important or interesting?

    • As needed: What background information does the audience need? What assumptions, terms, or acronyms need to be clarified?

  • Data

    • Data collection: What was collected? When, why, and how was it originally collected? Who collected it?

    • Data acquisition: Where / how did you get the data? What is the source?

    • Data understanding: How much data do you have? What types of measurements? Was there anything you needed to clean before getting started?

  • Data insights
    It’s your job to explicitly identify and discuss key insights. Don’t simply present the audience with some code and output and expect them to do that work. Specifically:

    • What are the important takeaways from the data? What was interesting?

    • Why do these takeaways matter?

    • Was there anything surprising?

    • Overall, what do you want the audience to walk away with? What do you want them to understand about your data and research questions?

  • Conclusions / big picture

    • How do the insights connect to answer your research question?

    • What improvements might someone make to your analysis? Are there any limitations or weaknesses of your data / analysis?





IV. Assessment


Your project will be assessed according to the following components:



| Learning outcome | Requirements |
| --- | --- |
| Collaboration | You fully engage with your group, support your group members, and contribute to each stage of the project. |
| Progress | You successfully complete all individual & group project milestones. |
| Depth & growth | Your topic selection, analysis, and communication reflect depth and growth. Growth will require you to learn new things, and this looks like different things for different people, depending upon your data / computing experience prior to this course. Sometimes, these are quick tools (e.g., learning how to angle axis labels if they’re not readable). Sometimes, these are bigger ideas: |
| Final report | Due by 5pm on Last Day of Classes. Content and delivery meet the expectations. (Details below) |
| Final presentation | Final Exam Period. Content and delivery meet the expectations. (Details below) |
| Supporting documentation | Due by 5pm on Last Day of Classes. You submit a zip folder with all supplemental material needed to reproduce your final report (e.g. data files, Rmd with all code, etc). |





V. Final report

Content

Your final report must cover the general content areas outlined in Section III.

Audience

People like your 112 peers – not familiar with your project, but comfortable with data.

Format

Unless discussed with me, your report must be submitted as an HTML file in the form of a technical blog post. Here are some examples that give you a decent sense of an appropriate format and delivery, but that don’t follow all structural requirements (e.g., most don’t include R code in the body of the paper, but you will).

German election issues | NYC Ubers | Trump tweets | Fortune cookies | Crossword puzzle scandal | Yelp tool for restaurant owners

Length

Roughly 1000 words.

Style

Writing
Your report should be: engaging (a broad audience will be turned off by an overly technical post), concise (a broad audience has limited time), and professional (grammar, spelling, and appropriate citations are always important). It should tell a cohesive story – don’t simply present a list of things you did or get distracted by elements that aren’t relevant to your research question.

Aesthetics
Your report should be visually pleasing and easy to follow. Be sure to use graphics, tables, etc. to help illustrate your findings.

Code and Reproducibility
Your audience might want to reproduce your results. You must weave code throughout your report. This code must be properly commented, formatted, efficient, and easy to follow. Do not include any code that is unnecessary to your final report – this is distracting.

Accessible, professional graphics
Your graphs must all have thoughtful axis labels (not just the default variable names), alt text, and figure captions OR titles. They must use color-blind friendly color palettes.
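
As a minimal sketch of these requirements, the plot below sets thoughtful axis labels, a title, alt text, and a color-blind friendly palette. It assumes ggplot2 version 3.3.4 or later (needed for the alt label) and uses the mpg data that ships with ggplot2 purely as a stand-in for your own data. The object name fuel_plot and all label text are illustrative choices, not requirements.

```r
library(ggplot2)

# mpg ships with ggplot2; it stands in for your own project data
fuel_plot <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  # viridis is one color-blind friendly palette, not the only option
  scale_color_viridis_d() +
  labs(
    # replace the default variable names with readable labels and units
    x = "Engine displacement (liters)",
    y = "Highway fuel economy (miles per gallon)",
    color = "Drive train",
    title = "Cars with larger engines tend to get fewer highway miles per gallon",
    # alt text describes the plot for readers using screen readers
    alt = "Scatterplot of highway miles per gallon versus engine displacement, colored by drive train. Fuel economy falls as displacement rises."
  )

fuel_plot
```

In an R Markdown report, the knitr chunk options fig.alt and fig.cap are another way to attach alt text and a figure caption to a plot.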





VI. Final presentation

Content

Your final presentation must cover the general content areas outlined in Section III.

Audience

People like your 112 peers – not familiar with your project, but comfortable with data.

Length

8-9 minutes for groups of 3 and 10-11 minutes for groups of 4.

Attendance & participation

Each group member must speak for roughly equal amounts of time.
You must be present at & engaged in all other presentations, not just your own.

Speaking

When speaking yourself, work toward the following: confidence, steady pacing, eye contact, body language (don’t speak to the board), accessible volume.

Format (slides)

Your talk should utilize a set of Google slides (shared with me). These slides should…

  • be organized and informative

  • be free of spelling and grammar errors

  • be clear and engaging (utilize pictures, avoid excessive text, etc)

  • only include R code when that code, not its output, is the point

  • utilize effective, accessible, & professional graphics (e.g. use thoughtful axis
    labels, figure captions or titles, and color-blind friendly palettes)

Style

Tell an engaging and cohesive story – don’t simply present a list of things you did or get distracted by elements that aren’t relevant to your research question.





VII. Data

You can work with any dataset that (1) you haven’t used in this or other MSCS courses and (2) is rich enough to produce an engaging project. You can take any of the following approaches to find data:

CAVEAT: You cannot collect your own data via a survey. This requires a lot of paperwork that isn’t feasible in our time frame.





VIII. General project timeline

  • Week 1: Establish foundations. Get in groups, finalize topics, get data, do preliminary data checks / cleaning, do some preliminary analysis (wrangling and plots). Do more if possible!!

  • Week 2: Complete the bulk of the analysis, and start thinking about final report / presentation.

  • Week 3: Finalize analysis and chip away at the final report / presentation.

  • Week 4: Finalize final report and presentation.