It's time for my 3rd annual NHL playoff predictions! My model and methodology are basically unchanged from last year, so check it out here. In the two years I've been doing this, I've learned that it's pretty hard to predict the outcome of any one series, but if we look at it from a stratification standpoint, we can gain some valuable information. That is, instead of saying, "team x has a 60% chance of beating team y," it may be more constructive to say, "teams w, x, y, and z are more likely than the other teams to make deep playoff runs and win the Stanley Cup." Picking individual winners is hard, but we can see which teams rise to the top.
When I developed this model, I ran a tenfold cross-validation, which basically tells us how good the model is at predicting the outcome of new data. Over the past two years, the cross-validated prediction accuracy has been about 0.66, so the model should correctly predict the series winner about 66% of the time. That's not great, but it's much better than just using point totals. Over the past two seasons, there have been 30 playoff series, and the model has predicted the winner correctly 20 times, for an accuracy rate of 67%. The model is performing as expected. And while 67% isn't great for an individual series, the model does seem to be able to tell us which teams are more likely to win 3 or 4 playoff series. Anyway, I ran the model again for this year, and here are the results:
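To put the "performing as expected" claim in numbers, here is a minimal sketch that checks whether 20 correct picks out of 30 is in line with a 0.66 cross-validated accuracy. It treats each series as an independent pick with the same hit rate, which is my simplifying assumption and not part of the model itself. The 50% coin-flip baseline is also just an illustrative comparison, not the point-totals baseline mentioned above.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent tries with hit rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def prob_at_least(k, n, p):
    """Probability of k or more successes in n independent tries."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

n_series = 30        # playoff series over the past two seasons (from the post)
n_correct = 20       # series the model called correctly (from the post)
cv_accuracy = 0.66   # cross-validated accuracy (from the post)

observed_accuracy = n_correct / n_series      # 20 / 30
expected_correct = n_series * cv_accuracy     # picks we'd expect to get right

# How likely is a result at least this good if the true hit rate is 0.66?
p_if_model = prob_at_least(n_correct, n_series, cv_accuracy)

# Same question for a coin flip (assumed baseline, for illustration only).
p_if_coin = prob_at_least(n_correct, n_series, 0.5)

print(f"observed accuracy: {observed_accuracy:.2f}")
print(f"expected correct picks at 0.66: {expected_correct:.1f}")
print(f"P(>= 20 correct | accuracy 0.66): {p_if_model:.3f}")
print(f"P(>= 20 correct | coin flip):     {p_if_coin:.3f}")
```

Under those assumptions, 20 of 30 sits right next to the expected count, while a coin flip would reach 20 or more only rarely. Thirty series is still a small sample, so this is a sanity check more than proof.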
The main takeaway is this: STL, PIT, LA, ANA, and WSH are the teams that rise to the top. They are the ones most likely to go deep into the playoffs and win the Cup.
I was a little surprised that San Jose, Chicago, and Philadelphia came out as low as they did. These are good teams, but they have brutal paths to the Final. San Jose would have to go through LA, then likely Anaheim and St. Louis just to get to the Final. Philly would have to go through Washington and Pittsburgh.
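The reason a tough path hurts so much is that the chance of a deep run is roughly the product of the chances of winning each series along the way. The sketch below illustrates that arithmetic. The per-series probabilities are made-up numbers for illustration, not output from my model, and it assumes the opponents are fixed and the series are independent.

```python
from math import prod

def prob_of_run(series_win_probs):
    """Chance of winning every series in the list, assuming independence."""
    return prod(series_win_probs)

# Hypothetical per-series win probabilities (NOT model output).
# Three series must be won to reach the Final, four to win the Cup.
hard_path = [0.50, 0.45, 0.45]   # good team, strong opponents every round
easy_path = [0.60, 0.55, 0.55]   # similar team, softer draw

p_hard = prob_of_run(hard_path)
p_easy = prob_of_run(easy_path)

# Even a team favored at 60% in all four rounds rarely wins the Cup.
p_cup_at_60 = prob_of_run([0.60] * 4)

print(f"reach the Final, hard path: {p_hard:.3f}")
print(f"reach the Final, easy path: {p_easy:.3f}")
print(f"win the Cup at 60% per series: {p_cup_at_60:.3f}")
```

Small differences in each round compound quickly, which is why a good team stuck behind two or three strong opponents can end up well down the list.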
One other thing that stands out is that the model thinks the Pens have a 91% chance of beating the Rangers. I don't think any team has a true probability of winning that is so high. The model's estimate is that high because there are only a few series since 2007-2008 where the gap between the teams in score-adjusted Fenwick % (SAF) was this large, and in every one of them the team with the better SAF won. The Pens are favorites in this series, but they are probably not 91% likely to win.
Enjoy the 1st round!