Maintain the low bias of an unpruned tree while decreasing variance.
APPROACH
Build a bunch of unpruned trees from different data.
This way, our final result isn’t overfit to our sample data.
THE RUB
We only have 1 set of data…
EXAMPLE 4: Take a REsample of candy
We only have 1 sample of data.
But we can resample it (basically pretending we have a different sample).
Let’s each take our own unique candy resample (also known as bootstrapping):
Take a sample of 85 candies from the original 85 candies, with replacement.
Some data points will be sampled multiple times while others aren’t sampled at all.
On average, 2/3 of the original data points will show up in the resample and 1/3 will be left out. [Mathematical proof of this on course website]
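A quick sketch of where that 2/3 comes from (the full proof is on the course website): each of the n draws misses a given candy with probability \(1 - \frac{1}{n}\), so the chance a candy never appears in the resample is

\[
\left(1 - \frac{1}{n}\right)^n \approx e^{-1} \approx 0.368 \quad \text{for large } n,
\]

meaning roughly 1/3 of the original candies are left out and roughly 2/3 appear at least once.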
Take your resample:
# Set the seed to YOUR phone number (just the numbers)
set.seed(123456789)

# Take a REsample of candies from our sample
my_candy <- sample_n(candy, size = nrow(candy), replace = TRUE)

# Check it out
head(my_candy, 3)
chocolate fruity caramel nutty nougat wafer hard bar
Snickers Crisper...1 TRUE FALSE TRUE TRUE FALSE TRUE FALSE TRUE
Fruit Chews...2 FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
Nestle Crunch...3 TRUE FALSE FALSE FALSE FALSE TRUE FALSE TRUE
pluribus sugar price popularity
Snickers Crisper...1 FALSE 0.604 0.651 med
Fruit Chews...2 TRUE 0.127 0.034 med
Nestle Crunch...3 FALSE 0.313 0.767 high
In the next exercise, we’ll each build a tree of popularity using our own resample data.
First, check your intuition:
TRUE / FALSE: All of our trees will be the same.
TRUE / FALSE: Our trees will use the same predictor (but possibly a different cut-off) in the first split.
TRUE / FALSE: Our trees will use the same predictors in all splits.
FALSE, FALSE, FALSE
EXAMPLE 5: Build & share YOUR tree
Build and plot a tree using your unique sample (my_candy).
Use your tree to classify Baby Ruth, the 7th candy in the original data.
Finally, share your results!
Record your prediction and paste a picture of your tree into this document.
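One way to build, plot, and use your tree (a sketch only; it assumes the rpart and rpart.plot packages, which may differ from the syntax used elsewhere in this course):

```r
library(rpart)       # tree building
library(rpart.plot)  # tree plotting

# Build an unpruned classification tree of popularity from YOUR resample
# (cp = 0 turns off cost-complexity pruning)
my_tree <- rpart(popularity ~ ., data = my_candy, method = "class", cp = 0)

# Plot your tree
rpart.plot(my_tree)

# Classify Baby Ruth, the 7th candy in the ORIGINAL data
predict(my_tree, newdata = candy[7, ], type = "class")
```

Because each of us used a different resample (and a different seed), the trees and the predictions will generally differ.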
EXAMPLE 6: Using our FOREST
We now have a group of multiple trees – a forest!
These trees…
differ from resample to resample
don’t use the same predictor in each split (not even in the first split)!
produce different popularity predictions for Baby Ruth
Based on our forest of trees (not just your 1 tree), what’s your prediction for Baby Ruth’s popularity?
What do you think are the advantages of predicting candy popularity using a forest instead of a single tree?
Can you anticipate any drawbacks of using forests instead of trees?
Bagging (Bootstrap AGGregatING) & Random Forests
To classify a categorical response variable y using a set of p predictors x:
Take B resamples from the original sample.
Sample WITH replacement
Sample size = original sample size n
Use each resample to build an unpruned tree.
For bagging: consider all p predictors in each split of each tree
For random forests: at each split in each tree, randomly select and consider only a subset of the predictors (often roughly p/2 or \(\sqrt{p}\))
Use each of the B trees to classify y at a set of predictor values x.
Aggregate the classifications by majority vote: classify y as the most common classification among the B trees.
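In R, both algorithms can be run with the randomForest package (a sketch under the assumption that popularity is stored as a factor; the mtry argument controls how many predictors are considered at each split):

```r
library(randomForest)

# Bagging: consider ALL p predictors at each split (mtry = p)
bag_model <- randomForest(popularity ~ ., data = candy,
                          mtry = ncol(candy) - 1, ntree = 500)

# Random forest: consider only a random subset at each split
# (for classification, the default mtry is roughly sqrt(p))
forest_model <- randomForest(popularity ~ ., data = candy, ntree = 500)

# Classify Baby Ruth (7th candy) by majority vote across the B trees
predict(forest_model, newdata = candy[7, ])
```

Setting mtry to the full number of predictors turns the random forest algorithm into bagging; smaller mtry values decorrelate the trees.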
Ensemble Methods
Bagging and random forest algorithms are ensemble methods.
They combine the outputs of multiple machine learning algorithms.
As a result, they decrease variability from sample to sample, hence provide more stable predictions / classifications than any single algorithm alone might produce.
EXAMPLE 7: pros & cons
Order trees, forests, & bagging algorithms from least to most computationally expensive.
What results will be easier to interpret: trees or forests?
Which algorithm, bagging or random forests, will produce a collection of trees that tend to look very similar to each other (and to the original tree)? Hence, which algorithm is more dependent on the sample data, and thus will vary more if the data change? [Both questions have the same answer.]
Small Group Activity
For the rest of the class, work together on Exercises 1-7.