My favourite sport is football (soccer). But my favourite sport to talk about is NBA basketball. I even occasionally podcast about the NBA here. The core statistics in basketball are easy to comprehend: points, rebounds, assists, steals, blocks, and wins. And they are full of exciting action. In football, by contrast, anything that isn't a goal or a save (fouls, number of passes, caps) makes for a rather boring statistic. Both football and basketball have game flow, unlike American football, which stops and starts constantly.
Moreover, in American football, the players wear helmets and padding, so it’s hard to identify them as they move across the field. Identifying NBA players is the easiest thing in the world. LeBron James. Kevin Durant. Kobe. Boom, easy, you probably were able to visualize their faces in milliseconds. But do I really know what Super Bowl champion Antonio Brown looks like? Basketball courts are also much smaller than EPL football pitches or NFL fields, so it is even easier to identify and track players as they move, score, rebound, and dunk on TV. Dunk. Jam. Fun, action verbs! Nuanced number crunching combined with personalized player narratives is what fuels basketball conversations. And that’s why it’s so much fun to talk about the NBA.
The purpose of this project is to communicate NBA stories with statistical evidence. Here are my opinions on the greatest NBA players of all time:
- Michael Jordan
- Magic / Bird (I can’t decide!)
- Bill Russell
- Tim Duncan
- Shaq / Hakeem / Moses Malone (I change my order on these three every week it seems)
Nothing too controversial in that ranking, and I imagine there is very little controversy in almost anyone's top 10. But how do you decide if player X is the 27th best player of all time and player Y is the 28th? What separates the two players? I think I can use those fun NBA stats to solve the problem.
Here’s the code on my GitHub if you are interested.
I began by scraping Basketball Reference (a fantastic site full of statistical data) with the Python package BeautifulSoup for several players, including LeBron, Joel Embiid, Giannis, and Steph Curry. After extracting player information such as name, points, rebounds, and assists, I created a pandas DataFrame for each player. Following a normalization step, I created a new statistic, called M_VALUE, that combines several of those traditional stats (points, rebounds, assists) with more advanced ones (PER, true shooting percentage) for each season of each player's career.
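To make the normalization and combination steps concrete, here is a minimal sketch. The exact formula behind M_VALUE isn't spelled out in this post, so treat everything here as an assumption for illustration: the min-max scaling, the equal weights, the column names (`pts`, `reb`, `ast`, `per`, `ts_pct`), the helper names `min_max_normalize` and `m_value`, and the made-up season numbers.

```python
import pandas as pd

# Made-up per-season numbers for illustration only (not real player data).
seasons = pd.DataFrame(
    {
        "season": ["Year 1", "Year 2", "Year 3", "Year 4"],
        "pts": [20.0, 25.0, 30.0, 27.0],
        "reb": [5.0, 7.0, 8.0, 6.0],
        "ast": [6.0, 6.5, 7.0, 8.0],
        "per": [20.0, 25.0, 31.0, 28.0],
        "ts_pct": [0.55, 0.58, 0.64, 0.60],
    }
).set_index("season")


def min_max_normalize(df):
    """Rescale every column to the 0-1 range so no stat dominates by its units."""
    span = df.max() - df.min()
    # A constant column has zero span; swap 0 for 1 to avoid dividing by zero.
    return (df - df.min()) / span.replace(0, 1)


def m_value(df, weights=None):
    """Weighted mean of the normalized stats (equal weights are an assumption)."""
    normalized = min_max_normalize(df)
    if weights is None:
        weights = {col: 1.0 for col in df.columns}
    w = pd.Series(weights)
    # Multiplying a DataFrame by a Series aligns on column names.
    return (normalized[w.index] * w).sum(axis=1) / w.sum()


scores = m_value(seasons)
print(scores.round(3))
print("Best single season:", scores.idxmax())
```

Scaling each stat to the same 0-1 range before averaging keeps a large-valued stat like points from swamping a small-valued one like true shooting percentage. Passing a `weights` dict would let you emphasize some stats over others.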
Finding an NBA player’s prime then became a matter of sliding a fixed-size window over the newly created M_VALUE statistic and keeping the stretch of consecutive seasons that scores highest. The window size specifies the number of prime years: a window size of 3 finds a player’s 3-year prime, and a window size of 4 finds a player’s 4-year prime.
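Here is a rough sketch of how a window search like this could work. I'm assuming the "prime" is the run of consecutive seasons with the highest total M_VALUE; the function name `find_prime`, the season labels, and the scores are all made up for illustration.

```python
def find_prime(m_values, window):
    """Return (start_index, total) for the best run of `window` consecutive seasons."""
    if window <= 0 or window > len(m_values):
        raise ValueError("window must be between 1 and the number of seasons")

    # Total for the first window, then slide one season at a time.
    current = sum(m_values[:window])
    best_start, best_total = 0, current
    for start in range(1, len(m_values) - window + 1):
        # Add the season entering the window, drop the one leaving it.
        current += m_values[start + window - 1] - m_values[start - 1]
        if current > best_total:
            best_start, best_total = start, current
    return best_start, best_total


# Made-up M_VALUE scores, one per season, in chronological order.
season_labels = ["Y1", "Y2", "Y3", "Y4", "Y5", "Y6"]
m_scores = [0.40, 0.55, 0.80, 0.90, 0.85, 0.60]

start, total = find_prime(m_scores, window=3)
prime_seasons = season_labels[start:start + 3]
print(prime_seasons, round(total, 2))
```

Because the window size is fixed, comparing totals gives the same answer as comparing averages. Ties go to the earliest window, which is an arbitrary choice on my part.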
As an example, here are the results for LeBron’s 3-year prime:
Perhaps the results are not too surprising, considering that during this 3-year stretch, LeBron won two championships, two MVP awards, and led the Miami Heat to a 27-game win streak (the second-longest single-season win streak in NBA history).
And here are several charts simply illustrating LeBron’s individual stats over different seasons.
Once I had the M_VALUE data calculated for several current players, I generated a heatmap in order to visualize their prime years in comparison to each other over the years 2010-present.
The progressive leaps Joel Embiid and Giannis made over the years are nothing short of amazing. Blake Griffin has tailed off noticeably, and Steph’s 2015–16 MVP campaign continues to jump out of the history books.
Lastly, here’s a list of future work I hope to incorporate into this project:
- Add a SQL database backend and weekly scheduled runs to pull the latest player stats
- Expand the dataset to 100+ players
- Use historical players like MJ, Bird, Magic, and Shaq to complete my top-10 ranking