14  Multiple linear regression: model building (part 2)

Settling In

  • Sit with people you might want to do the project with
    • This is not a commitment…
    • Introduce yourself
    • Check in with each other.
  • Help each other get ready to take notes!
    • Open your notebook.
    • Open the online manual to the “Course Schedule” and click on today’s activity. That brings you here!
    • You don’t need to work in a qmd file today! Take notes in your notebook.





Recap

Learning goals

By the end of this lesson, you should be able to:

  • Distinguish between descriptive, predictive, and causal research questions
  • Iterate on a research question to make it more precise and answerable
  • Choose appropriate model(s) for addressing a research question

Q & A

What questions do you have about:

  • Interpretation of Coefficients
    • MLR (without interaction)
    • MLR (with interaction)
  • Model Building
    • R squared, Adjusted R squared
    • Redundancy, Multicollinearity
    • Causal Questions
      • Causal Diagrams
      • Relationship of Interest
      • Variable Roles (Confounder, Effect Modifier, Precision, Collider, Mediator)
    • Predictive Questions
      • Overfitting







Exercises

Step 0: Choose a data set

Look at the data options for the project (see project description on Moodle).

For the purposes of today, choose 1 data set for your table group to talk about.

Step 1: Brainstorm

Brainstorm 4-5 possible research questions about this data context:

Record possible research questions here

Now choose 1 of these questions to focus on for today.

Record chosen research question here

For this question, answer the following questions as a group:

  • What variables do you need in a dataset to address this question?
  • What data summaries (not models) would help you answer this question, and why?
  • What plots (not models) would help you address your research question, and why?

Step 2: Descriptive Research Questions

Descriptive research questions seek to better understand the relationships between variables, without interest in causality. In practice, nearly every research question is ultimately interested in causality, but practical constraints (such as unmeasured confounding) lead us to ask descriptive questions instead.

If we’re only interested in associations (not causality), we don’t need to adjust for potential confounding variables in our model.
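As a rough illustration of what a descriptive model statement can look like, here is a minimal sketch with a single predictor. The symbols $Y$ (outcome) and $X$ (predictor of interest) are placeholders, not variables from any of the project data sets, and a single-predictor model is just one possible choice.

```latex
E[Y \mid X] = \beta_0 + \beta_1 X
```

Under this sketch, $\beta_1$ describes the average difference in $Y$ associated with a 1-unit difference in $X$. It is a statement about association only, since no potential confounders have been adjusted for.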

For your group’s chosen research question, write a model statement that would address a descriptive version of your research question below:

Model statement for a descriptive question here

Step 3: Predictive Research Questions

Predictive research questions seek to determine if (and how well) we can predict outcomes for new / future events, using the information we already have. We’ve seen a bit of prediction in this course when we talked about fitted values!
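One way to see what "how well can we predict new events" means is to fit a model on some cases and check its predictions on cases it has not seen. The sketch below uses simulated data, so the variable names, sample sizes, and true coefficients are all assumptions made up for illustration. It is written in plain Python rather than the course's usual tools.

```python
import random

random.seed(0)

# Simulated (hypothetical) data: one predictor x and a noisy linear outcome y.
n = 200
x = [random.uniform(0, 10) for _ in range(n)]
y = [3.0 + 2.0 * xi + random.gauss(0, 2.0) for xi in x]

# The first 150 cases are used to fit the model; the last 50 play the
# role of "new" cases that the model has never seen.
x_train, y_train = x[:150], y[:150]
x_new, y_new = x[150:], y[150:]


def mean(values):
    return sum(values) / len(values)


# Least-squares slope and intercept, using the training cases only.
x_bar, y_bar = mean(x_train), mean(y_train)
slope = (
    sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x_train, y_train))
    / sum((xi - x_bar) ** 2 for xi in x_train)
)
intercept = y_bar - slope * x_bar


def r_squared(x_vals, y_vals):
    """1 - SSE/SST for the fitted line, on whichever cases are passed in."""
    fitted = [intercept + slope * xi for xi in x_vals]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y_vals, fitted))
    sst = sum((yi - mean(y_vals)) ** 2 for yi in y_vals)
    return 1 - sse / sst


r2_train = r_squared(x_train, y_train)
r2_new = r_squared(x_new, y_new)
print(f"R-squared on the cases used to fit the model: {r2_train:.3f}")
print(f"R-squared on the held-out 'new' cases: {r2_new:.3f}")
```

With a simple model like this one, the two values tend to be similar. A large gap, with much worse performance on the held-out cases, is one symptom of overfitting.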

With your groups, discuss the following:

  • Is your research question predictive or inferential? Inferential questions seek to understand the relationships between variables.
  • If your question were predictive, who would be interested/invested in the results from your project? How could the results from your project be used in practice?
  • Are there any variables that are not available to you in your data that you would include in your predictive model if you could? Why or why not?

Step 4: Causal Research Questions

Causal research questions are ultimately what most inferential statistics is interested in, regardless of whether or not we end up being able to make causal conclusions. From the videos for today, you learned about different types of variables, and whether or not they should be included or excluded from a model, depending on your causal research question.
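To make the role of a confounder concrete, here is a small simulation sketch. Everything in it is an assumption chosen for illustration: z is a hypothetical confounder that affects both x and y, and the true effect of x on y is set to 1.0. It compares the coefficient on x from a model that leaves z out with the coefficient from a model that includes z.

```python
import random

random.seed(1)

# Hypothetical simulation: z causes both x and y, so z confounds x -> y.
# The true causal effect of x on y is 1.0 (a value chosen for illustration).
n = 5000
z = [random.gauss(0, 1) for _ in range(n)]
x = [2.0 * zi + random.gauss(0, 1) for zi in z]
y = [1.0 * xi + 3.0 * zi + random.gauss(0, 1) for xi, zi in zip(x, z)]


def cross(a, b):
    """Sum of centered cross-products of two equal-length lists."""
    a_bar, b_bar = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))


# Unadjusted: least-squares slope from the model y ~ x.
b_unadjusted = cross(x, y) / cross(x, x)

# Adjusted: least-squares coefficient on x from the model y ~ x + z,
# using the closed-form solution for a model with two predictors.
sxx, szz, sxz = cross(x, x), cross(z, z), cross(x, z)
sxy, szy = cross(x, y), cross(z, y)
b_adjusted = (sxy * szz - szy * sxz) / (sxx * szz - sxz ** 2)

print(f"coefficient on x, z left out: {b_unadjusted:.2f}")
print(f"coefficient on x, z included: {b_adjusted:.2f}")
```

In this setup the unadjusted coefficient lands well above 1.0 because it absorbs part of the effect of z, while the adjusted coefficient should be close to the true value. The same adjustment would not be appropriate for a collider or a mediator, which is why the DAG comes first.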

With your groups, make a causal diagram (DAG) on the whiteboard for your research question. Consider including all variables you wish you had access to, even if they aren’t available in your data (this will help you later when talking about limitations of your analysis in your final paper), but certainly include relevant variables that are available in your data.

For each variable in your DAG that is available in your dataset, determine whether it should be included or excluded from your model. Use this to update your descriptive model statement from Step 2.

Model statement for a causal question here

Now look back at your DAG, and note if any of the variables that are not available in your data are potential confounders. If so, record them here (this means you likely won’t be able to draw causal conclusions):

List of “unmeasured” confounding variables here

Step 5: Repeat!

Repeat some of the above thinking for other interesting research questions that group members came up with in your brainstorming.

Step 6: Reflection

Today was all about iterating on a research question, and using those questions to guide the way we explore data and fit statistical models. How confident do you feel in distinguishing between descriptive, predictive, and causal research questions? How confident do you feel in knowing which components of a model matter more or less, in each specific case? What might help you feel more confident?

Response: Put your response here.





Wrap Up

This Friday:

  • No Class
  • Complete project group preference form (link on Moodle)
  • Attend 2 MSCS Capstone Talks
    • See MSCS Capstone Schedule in Moodle & on Slack!
    • Complete Reflections about talks in Capstone Reflection Moodle assignment

Next Monday, 3/9:

  • CP 10, PS 4 Due