A deep dive into sports analytics research. Hosted by Canzhi Ye, former Brooklyn Nets basketball analytics associate.
I was joined by Kostya Medvedovsky, the creator of the basketball projection system DARKO. We got into the weeds on the methodology behind DARKO and answered questions about it from Twitter. We then picked Kostya's brain on a wide range of NBA analytics ideas, mostly focused on prediction problems. Finally, we answered more questions from Twitter. You can find Kostya on Twitter: @kmedved
[00:00:00] - DARKO
[00:05:05] - exponential decay
[00:08:10] - kalman filters
[00:11:10] - gradient boosted trees go brrrr?
[00:13:40] - data infra
[00:21:30] - seasonality effects
[00:30:45] - rule changes, big external shifts like COVID
[00:37:50] - aging effects
[00:42:40] - measuring defense with box score data
[00:44:50] - Canzhi rants on RAPM, SPM
[00:52:35] - garbage time
[00:57:00] - how to structure data, wax poetic on pbpstats
[01:02:20] - player career and longevity projections
[01:09:30] - another GBT package
[01:10:50] - DFS, betting
[01:19:40] - when DARKO struggles / can learn from betting markets
[01:26:55] - college basketball DARKO
[01:35:00] - NBA preseason
[01:41:30] - coach RAPM lol
[01:43:10] - playoff adjustments
[01:46:25] - playoff adjustments
[01:49:05] - hot hand (fallacy? paper linked below), midwits
[01:55:40] - Sengun
[02:00:50] - team building, quantifying lineup ~synergies~
Things we brought up in the episode:
DARKO: https://darko.app
DRIP: https://theanalyst.com/na/2021/10/nba-stats-and-player-projections/
Hot hand discussion: https://marketing.wharton.upenn.edu/wp-content/uploads/2018/11/Paper-Joshua-Miller.pdf
I was joined by Udit Ranasaria, who was part of a team that worked on the Passing Value in Expectation project for the Big Data Bowl. They went deep into the tracking data and built physics-based models of ball and player trajectories to estimate passing value for all possible passes, not just the ones that were actually thrown. We also opined on the state of sports analytics built on public tracking data, and Udit touched on some exciting new projects in the works. You can find him on Twitter @uditranasaria
Project: https://www.kaggle.com/dutta64/pave-the-way-for-nfl-passing-analytics
In this episode, I was joined by the winners of the Big Data Bowl 2021, although we recorded this well before the results were announced. We talked about their project, Weighted Assessment of Defender Effectiveness, which uses tracking data to allocate credit to individual defenders in coverage on pass plays. We touched on what it was like to work on a data project with others virtually during a pandemic before going deep on methods for defensive target probability over expectation (dTPOE) and defensive catch probability over expectation (dCPOE), and on ideas for credit allocation.
Their write-up is here: https://www.kaggle.com/asmaetoumi/weighted-assessment-of-defender-effectiveness
You can find the guests on Twitter!
I was joined by Alex Stern, a current data science master's student and football analytics researcher at the University of Virginia. We talk about his Big Data Bowl project, which goes beyond the box score to quantify the value of defensive backs. The work extends last year's winning Big Data Bowl project with some cool Bayesian hierarchical models.
The project: https://www.kaggle.com/acs4wq/algorithms-in-the-war-room
I was joined by Charlie Gelman, a recent computer science and stats graduate from Duke. We talk about his Big Data Bowl project on defensive backs playing press man coverage. There is a bonus segment at the end about wrestling analytics.
0:24 Intro, background
4:00 Paper
54:00 Wrestling analytics
The project: https://www.kaggle.com/charlesgelman/hip-reaction-drill
I was joined by Dani Chu, a graduate student in statistics at Simon Fraser University and part of the team that won the NFL's inaugural Big Data Bowl. We talk about his fascinating research using NFL tracking data to identify routes. We also speculate, somewhat wishfully, about the metrics that could be developed if the tracking data ever became public.
The paper: https://arxiv.org/abs/1908.02423
The Josh Hermsmeyer article we mentioned: https://fivethirtyeight.com/features/can-nfl-coaches-overuse-play-action-they-havent-yet/
My guest today is Kanaad, a friend of mine from Berkeley and a Cavs and RL enthusiast. We talk about a fascinating paper from the 2018 Sloan Sports Analytics Conference. The main idea is that by modeling each possession as a Markov Decision Process, it becomes easy to build a simulator that can help us answer questions like "How much more efficient would a team be if they took 20% fewer midrange shots early in the shot clock?"
The paper: http://www.lukebornn.com/papers/sandholtz_ssac_2018.pdf
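To make the possession-as-MDP idea concrete, here is a toy sketch. It is not the paper's model: the shot-clock phases, action probabilities, make rates, and point values are all hypothetical, and free throws and offensive rebounds are ignored. With the team's policy held fixed, expected points per possession follow from a small backward recursion over the phases, and a simulator just samples possessions from the same policy. The "20% fewer early midrange shots" counterfactual is modeled by spreading the removed probability proportionally over the other actions in that phase, which is an assumption made here for illustration.

```python
import random

# Hypothetical shot-clock phases, in order; "pass" moves to the next phase.
PHASES = ["early", "mid", "late"]

# Hypothetical (make probability, points) for each shot type.
SHOTS = {"rim": (0.62, 2), "midrange": (0.40, 2), "three": (0.36, 3)}

# Hypothetical baseline policy: action probabilities per phase (each sums to 1).
BASELINE = {
    "early": {"rim": 0.20, "midrange": 0.10, "three": 0.15, "turnover": 0.05, "pass": 0.50},
    "mid": {"rim": 0.25, "midrange": 0.15, "three": 0.20, "turnover": 0.05, "pass": 0.35},
    "late": {"rim": 0.25, "midrange": 0.35, "three": 0.30, "turnover": 0.10},
}


def expected_points(policy):
    """Exact expected points per possession under a fixed policy (policy evaluation)."""
    assert "pass" not in policy[PHASES[-1]], "the last phase must end the possession"
    value_next = 0.0  # value of reaching the phase after the current one
    for phase in reversed(PHASES):
        value = 0.0
        for action, p in policy[phase].items():
            if action in SHOTS:
                make, pts = SHOTS[action]
                value += p * make * pts
            elif action == "pass":
                value += p * value_next
            # "turnover" ends the possession with 0 points
        value_next = value
    return value_next


def shift_away(policy, phase, action, frac):
    """Copy of policy with `frac` of `action`'s probability in `phase`
    redistributed proportionally to the other actions in that phase."""
    new = {ph: dict(acts) for ph, acts in policy.items()}
    probs = new[phase]
    moved = probs[action] * frac
    others = {a: p for a, p in probs.items() if a != action}
    total_others = sum(others.values())
    probs[action] -= moved
    for a, p in others.items():
        probs[a] += moved * p / total_others
    return new


def simulate_possession(policy, rng):
    """Play out one possession by sampling an action in each phase."""
    for phase in PHASES:
        actions = list(policy[phase])
        weights = [policy[phase][a] for a in actions]
        action = rng.choices(actions, weights=weights)[0]
        if action == "pass":
            continue
        if action == "turnover":
            return 0
        make, pts = SHOTS[action]
        return pts if rng.random() < make else 0
    return 0  # only reached if the last phase allowed "pass"


if __name__ == "__main__":
    base = expected_points(BASELINE)
    fewer_mid = expected_points(shift_away(BASELINE, "early", "midrange", 0.20))
    print(f"baseline: {base:.3f}  fewer early midrange: {fewer_mid:.3f}")
```

With these made-up numbers the counterfactual comes out slightly higher, because an early midrange attempt is worth less than the rim, three, or "keep the possession going" alternatives it is traded for. The exact recursion and the simulator should agree on average; the simulator becomes the useful tool once the state space gets too rich to enumerate by hand.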
Introducing the Sports on Paper podcast, hosted by Canzhi Ye.
First real episode coming soon...
Follow the pod on Twitter @sportsonpaper!