Data Sources (check out these sites)
Man Games Lost
This year's model was similar to last year's model despite my ability to test many more variables. Within the past year, we've seen some great hockey data sites (notably war-on-ice) come online, and we have more data than ever before. Last year's model consisted of score-adjusted Fenwick percentage, 5v5 save percentage, and penalty killing percentage. I tested more variables this year, including: score-adjusted Fenwick percentage over last 20 regular season games, score-adjusted Fenwick for per 60 mins, score-adjusted Fenwick against per 60 mins, late season (March and April) score-adjusted Fenwick for per 60 mins, late season score-adjusted Fenwick against per 60 mins, 5v5 save percentage of anticipated series goalie, 5v5 adjusted save percentage of anticipated goalie, 5v5 high danger save percentage of anticipated goalie, shorthanded Fenwick against per 60 mins, adjusted short-handed save percentage of anticipated goalie, power play Fenwick for per 60 mins, and Time Missed Impact to Team.
I ran a logistic regression model and used ten-fold cross validation to test its predictive ability. Model details may be coming in another post. The final model consisted of the following 4 predictors: score-adjusted Fenwick percentage in last 20 games, team 5v5 save percentage, penalty killing percentage, and power play percentage. Like last year, power play percentage was not statistically significant, but it did slightly help the predictive ability of the model, so I left it in. I used data from the 2007-2008 season up through the 2013-2014 season to construct the model. The sample size is 105 playoff series over that time period.
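For readers curious what "logistic regression plus ten-fold cross validation" looks like in practice, here is a minimal sketch. It is not the actual model code. The data is synthetic, the coefficients in `true_w` are made up, and the assumption that each of the four predictors enters as a standardized difference between the two teams is mine, purely for illustration. Only the sample size (105 series) and the number of predictors (four) come from the description above.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(w, xi):
    # w[0] is the intercept; w[1:] are the predictor weights.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights by batch gradient descent on the log loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def k_fold_accuracy(X, y, k=10, seed=0):
    """Average out-of-fold classification accuracy over k folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        w = fit_logistic([X[i] for i in train], [y[i] for i in train])
        correct = sum((predict_proba(w, X[i]) >= 0.5) == (y[i] == 1) for i in fold)
        scores.append(correct / len(fold))
    return sum(scores) / k

def make_synthetic_series(n=105, seed=1):
    """SYNTHETIC data: 4 made-up predictor differences per series, 1 = first team won."""
    rng = random.Random(seed)
    true_w = [0.0, 1.2, 0.8, 0.5, 0.2]  # invented coefficients, not the fitted model
    X, y = [], []
    for _ in range(n):
        xi = [rng.gauss(0, 1) for _ in range(4)]
        X.append(xi)
        y.append(1 if rng.random() < predict_proba(true_w, xi) else 0)
    return X, y

X, y = make_synthetic_series()
cv_accuracy = k_fold_accuracy(X, y)
print(f"10-fold CV accuracy on synthetic data: {cv_accuracy:.3f}")
```

Each fold is held out once while the model is fit on the other nine, so every series gets predicted by a model that never saw it. With only 105 series, that matters a lot more than in-sample fit.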
The logistic model can calculate the probability of a team winning a playoff series. It can take any potential match-up, input the teams' peripheral statistics, and calculate a probability of outcome. Using those probabilities, we can simulate the playoffs a whole bunch of times (10,000 in our case) and see how often each team wins.
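The simulation step can be sketched roughly like this. Everything specific here is a placeholder: the team names, the single `ratings` number per team, and the `series_win_prob` function are hypothetical stand-ins for the fitted logistic model and the real team statistics. The sketch also uses an 8-team bracket for brevity, where the real playoffs have 16 teams.

```python
import math
import random
from collections import Counter

def series_win_prob(rating_a, rating_b):
    """Hypothetical stand-in for the fitted model: logistic in the rating gap."""
    return 1.0 / (1.0 + math.exp(-(rating_a - rating_b)))

def simulate_bracket(bracket, ratings, rng):
    """Play one fixed single-elimination bracket; adjacent entries meet each round."""
    alive = list(bracket)
    while len(alive) > 1:
        next_round = []
        for a, b in zip(alive[0::2], alive[1::2]):
            p_a = series_win_prob(ratings[a], ratings[b])
            next_round.append(a if rng.random() < p_a else b)
        alive = next_round
    return alive[0]

def title_odds(bracket, ratings, n_sims=10_000, seed=0):
    """Share of simulations in which each team wins the whole bracket."""
    rng = random.Random(seed)
    wins = Counter(simulate_bracket(bracket, ratings, rng) for _ in range(n_sims))
    return {team: wins[team] / n_sims for team in bracket}

# Made-up ratings for placeholder teams (not real team data).
ratings = {
    "Team A": 0.6, "Team B": -0.2, "Team C": 0.3, "Team D": 0.1,
    "Team E": 0.4, "Team F": -0.4, "Team G": 0.0, "Team H": 0.2,
}
bracket = list(ratings)  # bracket order: A vs B, C vs D, E vs F, G vs H

odds = title_odds(bracket, ratings)
for team, p in sorted(odds.items(), key=lambda kv: -kv[1]):
    print(f"{team}: {p:.1%}")
```

Because exactly one team wins each simulated bracket, the odds across all teams sum to 1, and the chance that the winner comes from any group of teams is just the sum of their individual odds.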
We don't know everything. We can't accurately capture everything that happens on the ice and turn it into sure-fire predictions. There's a lot of inherent randomness in hockey, and a lot of data that we're not yet able to collect. But what we can say is that teams that are good at puck possession heading into the playoffs, good on the penalty kill, and good at 5v5 save percentage are more likely to win than teams that aren't as good at those things.
Of course, a goalie can go on an incredible run and carry a team. That's how the Bruins won the Cup in 2011. Any team can beat any other team in a small sample size series. I have also not adjusted for injuries. We know that Kris Letang is out for the Pens, and that Christian Ehrhoff and Derrick Pouliot are also banged up, and this really hampers their defense corps. Max Pacioretty may not play for Montreal. The predictions should be able to give us an idea of which teams may be overrated or overlooked, and it should give us an idea of which teams are more likely than others to go deep into the playoffs. The inherent randomness in hockey makes a lot of individual series too close to call, and it also makes for a lot of drama and fun.
The model loves the Pens. Among playoff teams, the Pens were the best at score-adjusted Fenwick percentage in their last 20 games. They were third-best on the penalty kill. The Rangers have the edge in terms of save percentage, but they have struggled with puck possession. It seems crazy to think that the Pens, who struggled mightily down the stretch, are favorites against the team that won the Presidents' Trophy. But here we are.
The model also loves the Caps. They are slightly better than the Islanders at puck possession, but their odds are so good because of goaltending and special teams. The Islanders are terrible on the penalty kill, and the Caps are great on the power play. Braden Holtby has had a very solid season for the Caps. Meanwhile, the Islanders have the worst 5v5 save percentage of any playoff team.
Anaheim is another team that looks vulnerable. The Jets have been great down the stretch. Both have been good puck possession teams in their last 20 games, but Winnipeg has a slight edge. Winnipeg also has a slight edge in goaltending and special teams.
Now let's take a look at the conference and Cup predictions from the simulations:
I know what you're thinking. It's similar to what I'm thinking. It's kind of shocking that the Pens are at the top of this list, but their underlying stats have been very good. They've been snake-bitten by low shooting percentage, bad luck, injuries to key players, and salary cap mismanagement that forced them to play several games with only five defensemen. A couple of teams with very good records, the Rangers and Canadiens, are near the bottom of this list. They're underwater possession teams, and the model does not think these teams are likely to win three or four playoff series. Of course, with Henrik Lundqvist and Carey Price, anything's possible.
The Ducks have been a decent team of late. The model is down on their chances because of their goaltending and their path to the finals. They'd have to beat a good Jets team and likely one of Chicago or St. Louis to get to the finals.
It's probably most constructive to think of these results in terms of a group of teams that rise to the top. Looking at the probabilities, there's a 77 percent chance that the Stanley Cup winner comes from this group: Pittsburgh, St. Louis, Washington, Winnipeg, and Chicago.
Looking back to my model from last year, my initial predictions showed an 85 percent chance of the Cup winner coming from this group: Boston, Los Angeles, NY Rangers, St. Louis, and San Jose.
I'll keep updating the predictions as the playoffs progress. Enjoy the first round!