Wednesday, April 16, 2014

FTC's NHL Playoffs Preview and Predictions

The 2014 NHL playoffs are here! To mark the ultimate celebration of hockey, we've fired up the ol' FTC9000 and built a statistical model to make some data-driven predictions of what we might see this spring.

The Data

The objective of our model is to use regular season data to predict playoff performance. So I've gone over to stats.hockeyanalysis.com and downloaded a bunch of data. I've also collected some data from espn.com. Much of the possession data I use only goes back to the 2007-2008 season, so our data set consists of 90 playoff series over the past 6 seasons.

Possession statistics

Hockey analysts have been figuring out that the biggest driver of winning hockey is puck possession. Teams that direct more shots at the opposition goal than they allow on their own goal tend to win more games. Statistics like Fenwick and Corsi can give a relatively good approximation of how much a team controls the puck. However, teams play differently when they have the lead than they do when they're behind. Teams with a lead tend to go into a defensive shell and stop launching shots at the opposing net. Teams that are behind take more chances. So there's a real score effect when it comes to puck possession. Thus, I've calculated score-adjusted Fenwick (SAF) and score-adjusted Corsi (SAC) to test in our model. The adjusted stats are weighted according to how much time each team spends in a certain scenario (tied, up 1, up 2+, down 1, down 2+).
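The post does not spell out the exact weighting, so here is a minimal sketch of one common way to do a score adjustment: compare a team's Fenwick share in each score state to the league average for that state, then weight the differences by the share of ice time spent in that state. The function name, the dictionary keys, and every number below are made up for illustration; they are not the values used in the model.

```python
SCORE_STATES = ("down_2plus", "down_1", "tied", "up_1", "up_2plus")

def score_adjusted_fenwick(fenwick_pct, minutes, league_avg):
    """Time-weighted Fenwick share relative to league average in each score state.

    All three arguments are dicts keyed by SCORE_STATES. Fenwick shares are
    fractions (0.52 means 52% of unblocked shot attempts).
    """
    total = sum(minutes[s] for s in SCORE_STATES)
    if total <= 0:
        raise ValueError("team has no recorded ice time")
    adjustment = sum(
        (minutes[s] / total) * (fenwick_pct[s] - league_avg[s])
        for s in SCORE_STATES
    )
    # Centre on 50% so the result reads like an ordinary Fenwick share.
    return 0.5 + adjustment

# Illustrative (invented) inputs: trailing teams out-shoot leading teams on average.
league_avg = {"down_2plus": 0.56, "down_1": 0.54, "tied": 0.50,
              "up_1": 0.46, "up_2plus": 0.44}
team_pct = {"down_2plus": 0.58, "down_1": 0.55, "tied": 0.53,
            "up_1": 0.49, "up_2plus": 0.45}
team_minutes = {"down_2plus": 300, "down_1": 700, "tied": 1500,
                "up_1": 900, "up_2plus": 600}

print(round(score_adjusted_fenwick(team_pct, team_minutes, league_avg), 4))
```

A team that matches the league average in every score state comes out at exactly 50%, no matter how its ice time is split, which is the point of the adjustment.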

The Model

Our goal is to test which variables are associated with winning a playoff series. Along with SAF and SAC, I tested 5 on 5 save percentage, power play % (PP), penalty kill % (PK), total points, and points in the last 20 games (to see if trade deadline acquisitions or team hotness mattered). I used logistic regression to model the outcome of a home team series victory. I used backward stepwise regression as a model building technique, and our final model consisted of SAF, save percentage, and PK. I then used tenfold cross-validation to test the predictive ability of the model.
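To make the final model concrete: a logistic regression turns a weighted sum of its inputs into a probability between 0 and 1. The sketch below assumes each of the three retained variables enters as a home-minus-away difference, which is a guess about the setup rather than something stated above. The coefficient values are placeholders invented for illustration, not the fitted ones.

```python
import math

# Placeholder coefficients, invented for illustration only.
# They are NOT the fitted values from the model described in the post.
HYPOTHETICAL_COEFS = {
    "intercept": 0.2,  # home-ice edge when the teams are otherwise equal
    "saf": 20.0,       # score-adjusted Fenwick, as a fraction (e.g. 0.53)
    "sv_pct": 30.0,    # 5 on 5 save percentage, as a fraction (e.g. 0.925)
    "pk_pct": 5.0,     # penalty kill rate, as a fraction (e.g. 0.84)
}

def home_series_win_prob(home, away, coefs=HYPOTHETICAL_COEFS):
    """P(home team wins the series) from home-minus-away stat differences."""
    z = coefs["intercept"]
    for stat in ("saf", "sv_pct", "pk_pct"):
        z += coefs[stat] * (home[stat] - away[stat])
    # Logistic (sigmoid) link maps the linear score z to a probability.
    return 1.0 / (1.0 + math.exp(-z))

# Made-up stat lines for two hypothetical teams.
home_team = {"saf": 0.54, "sv_pct": 0.930, "pk_pct": 0.85}
away_team = {"saf": 0.51, "sv_pct": 0.922, "pk_pct": 0.82}
print(round(home_series_win_prob(home_team, away_team), 3))
```

Backward stepwise selection and cross-validation sit on top of this: the first decides which stats get a coefficient at all, the second checks how well the resulting probabilities hold up on series the model was not fitted to.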

The Simulations

The logistic model calculates the probability of a team winning a playoff series. For any potential match-up, it takes the two teams' regular season statistics and returns the probability of each outcome. Using those probabilities, we can simulate the playoffs a whole bunch of times (10,000 in our case) and see how often each team wins.
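The simulation step can be sketched roughly as follows. This is a toy version under stated assumptions: a fixed single-elimination bracket with no reseeding, and a caller-supplied `win_prob(a, b)` function standing in for the logistic model. The team names and strength ratings in the example are invented, and the function names are hypothetical.

```python
import random
from collections import Counter

def simulate_bracket(teams, win_prob, rng):
    """Play one fixed bracket; adjacent entries in `teams` meet in round 1."""
    alive = list(teams)
    while len(alive) > 1:
        next_round = []
        for i in range(0, len(alive), 2):
            a, b = alive[i], alive[i + 1]
            # win_prob(a, b) is the chance that `a` wins the series against `b`.
            next_round.append(a if rng.random() < win_prob(a, b) else b)
        alive = next_round
    return alive[0]

def title_odds(teams, win_prob, n_sims=10_000, seed=0):
    """Share of simulated brackets won by each team."""
    n = len(teams)
    if n < 2 or n & (n - 1):
        raise ValueError("bracket size must be a power of two")
    rng = random.Random(seed)
    wins = Counter(simulate_bracket(teams, win_prob, rng) for _ in range(n_sims))
    return {team: wins[team] / n_sims for team in teams}

# Toy example with invented strength ratings (not model output).
ratings = {"A": 3.0, "B": 1.0, "C": 2.0, "D": 2.0}

def toy_win_prob(a, b):
    return ratings[a] / (ratings[a] + ratings[b])

print(title_odds(["A", "B", "C", "D"], toy_win_prob))
```

Because a team's later opponents depend on who survives the earlier rounds, simulating the whole bracket is an easy way to get conference and Cup odds without enumerating every possible path by hand.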

The Caveats 

First of all, we have issues with the quality and quantity of data. Our sample size is only 90. Also, the data collection in hockey is nowhere near what it is in other sports like baseball and basketball. Ideally, we'd like to be able to track where the players are on the ice, track where the shots are coming from, etc. But this is what we have. In the future, I may look at using power play and short-handed possession statistics instead of conversion rates. Second, there is a lot of inherent randomness in hockey. Third, I have not adjusted for injuries. So if Tuukka Rask and Zdeno Chara get hurt while moving a bookshelf, it won't be reflected in the projections. If anything, the projections should be able to give us an idea of which teams may be overrated or overlooked, and it should give us an idea of which teams are more likely than others to go deep into the playoffs. The inherent randomness in hockey makes a lot of series too close to call, and it also makes for a lot of drama and fun.

First Round Projections

Boston Bruins vs. Detroit Red Wings
The Wings are a dangerous wild-card team, but the Bruins are just very good.
79% chance of a Bruins victory.
21% chance of a Red Wings victory.

Tampa Bay Lightning vs. Montreal Canadiens
These teams are pretty even. TB is the better possession team, but the Habs have had the edge in goaltending and penalty killing. This is in toss-up territory, but if Ben Bishop is out for the Lightning, that could push the needle further toward Montreal.
54% chance of a Canadiens victory.
46% chance of a Lightning victory.

Pittsburgh Penguins vs. Columbus Blue Jackets
The model has this series as a true toss-up. The Pens were not a particularly good team at 5 on 5 play this season, and it could catch up to them. Their reliance on the power play and unsustainable penalty killing rates could lead to a swift demise. The model simply does not like the Pens' chances in these playoffs. However, the injury situation is looking better. Kris Letang, Paul Martin, and Beau Bennett are back. Evgeni Malkin should be back for Game 1. Meanwhile, Nathan Horton, R.J. Umberger, and Nick Foligno have been banged up for the Jackets. That could push the needle a bit more toward the Pens. If the Pens are to make a deep run, they have to play a fundamentally different, and better, brand of hockey than we've seen all season. Are they capable of doing that? Maybe. But it's kind of shocking to think that the Pens are essentially a coin flip away from a failure that could potentially lead to big changes.
51% chance of a Blue Jackets victory.
49% chance of a Penguins victory.

New York Rangers vs. Philadelphia Flyers
The Rangers are a better possession team than the Flyers. They also have Henrik Lundqvist. The Rangers could be a dark horse; don't be surprised to see them go deep in the playoffs.
73% chance of a Rangers victory.
27% chance of a Flyers victory.

Anaheim Ducks vs. Dallas Stars
This could be a closer series than the seeding would indicate. The Ducks over-performed all year, and Dallas has decent possession numbers. Don't be surprised by an upset.
57% chance of a Ducks victory.
43% chance of a Stars victory.

San Jose Sharks vs. Los Angeles Kings
This should be a great series. These are two of the best teams in the NHL. It's unfortunate that they have to play each other in the first round. The model loves LA. The Kings have great numbers across the board. The team that wins this series will probably have faced its toughest opposition in round 1.
61% chance of a Kings victory.
39% chance of a Sharks victory.

Colorado Avalanche vs. Minnesota Wild
The Avs put up gaudy numbers in the regular season, and combined with a late Blues collapse, won the Central Division. But don't be fooled; these guys aren't that good. This is a toss-up. If the Wild can win this, they'll be partying like it's 2003 in the Twin Cities. Big upset potential here.
52% chance of a Wild victory.
48% chance of an Avalanche victory.

St. Louis Blues vs. Chicago Blackhawks
Another series involving two of the best teams the NHL has to offer. The model shows the Blues as a favorite, but injuries to guys like T.J. Oshie, David Backes, Patrik Berglund, and Vladimir Tarasenko, combined with the return to health of Jonathan Toews and Patrick Kane, could change things. The Blues don't need to worry about their late season slide. The Bruins sputtered to the finish line in 2011 and went on to win the Cup. The injury situation, however, could meaningfully alter the picture, and it seems like the model's limitations could be underestimating Chicago. The Blackhawks are a great possession team, but they've had difficulties with goaltending and penalty killing. The model isn't high on the Hawks, but if they can get past the Blues, I think their chances may improve dramatically.
66% chance of a Blues victory.
34% chance of a Blackhawks victory.

Conference winning probabilities

Eastern                           Western
Boston Bruins            44%      Los Angeles Kings        37%
New York Rangers         27%      St. Louis Blues          24%
Montreal Canadiens        7%      San Jose Sharks          18%
Columbus Blue Jackets     5%      Chicago Blackhawks        7%
Tampa Bay Lightning       5%      Anaheim Ducks             6%
Pittsburgh Penguins       4%      Dallas Stars              3%
Philadelphia Flyers       4%      Minnesota Wild            3%
Detroit Red Wings         4%      Colorado Avalanche        2%

Stanley Cup probabilities


Boston Bruins            28%
Los Angeles Kings        23%
New York Rangers         14%
St. Louis Blues          11%
San Jose Sharks           9%
Chicago Blackhawks        2%
Montreal Canadiens        2%
Anaheim Ducks             2%
Tampa Bay Lightning       2%
Pittsburgh Penguins       1%
Columbus Blue Jackets     1%
Detroit Red Wings         1%
Philadelphia Flyers       1%
Dallas Stars              1%
Minnesota Wild            1%
Colorado Avalanche        1%

1 comment:

Nilesh said...

Found a small bug that couldn't identify the home team in a Habs-Kings final, resulting in a few missing data points. Fixed bug, re-ran simulations. Pretty much the same results; some %s may have changed minimally.