This course provides an introduction to using Python to analyze team performance in sports. Learners will discover a variety of techniques that can be used to represent sports data and how to extract narratives based on these analytical techniques. The main focus is on the use of regression analysis to analyze team and player performance data, using examples drawn from the National Football League (NFL), the National Basketball Association (NBA), the National Hockey League (NHL), the English Premier League (EPL, soccer) and the Indian Premier League (IPL, cricket).
This course does not simply explain methods and techniques; it enables learners to apply them to sports datasets of interest so that they can generate their own results, rather than relying on the data processing performed by others. As a consequence, the learner will be empowered to explore their own ideas about sports team performance, test them out using the data, and so become a producer of sports analytics rather than a consumer.
While the course materials have been developed using Python, code has also been produced to derive all of the results in R, for those who prefer that environment.
Introduction to Sports Performance and Data
This week introduces a simple example of sports analytics in practice: the calculation of the Pythagorean expectation to model winning in team sports. This can also be used for the purposes of prediction. Examples are developed for five different sports leagues: Major League Baseball (MLB), the National Basketball Association (NBA), the National Hockey League (NHL), the English Premier League (EPL, soccer) and the Indian Premier League (IPL, cricket).
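As a rough illustration, the Pythagorean expectation in its classic form estimates a team's win fraction as scored^2 / (scored^2 + allowed^2). The sketch below assumes that form with an exponent of 2. The function name, the default exponent, and the season totals are hypothetical and are not taken from the course materials, which may use a different specification for each sport.

```python
def pythagorean_expectation(scored, allowed, exponent=2.0):
    """Expected win fraction from totals scored and allowed.

    Assumption: the classic form scored^k / (scored^k + allowed^k),
    with k = 2 by default. Other exponents are sometimes used.
    """
    if scored < 0 or allowed < 0:
        raise ValueError("totals must be non-negative")
    if scored == 0 and allowed == 0:
        raise ValueError("at least one total must be positive")
    s = scored ** exponent
    a = allowed ** exponent
    return s / (s + a)


# Hypothetical season totals (scored, allowed), invented for illustration.
teams = {
    "Team A": (800, 700),
    "Team B": (650, 650),
    "Team C": (600, 750),
}

for name, (scored, allowed) in teams.items():
    expected = pythagorean_expectation(scored, allowed)
    print(f"{name}: expected win fraction = {expected:.3f}")
```

Comparing this expected win fraction with a team's actual win fraction is one simple way to see how well the model describes a season.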
Introduction to Data Sources
This week uses NBA data to introduce essential Python code for data cleaning and data preparation. It also covers summary and descriptive analyses, using statistics and graphs to understand the distribution of the data, the characteristics and patterns of individual variables, and the relationship between two variables. At the end of this week, we will introduce correlation coefficients to summarize the linear relationship between two variables.
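To make the idea of a correlation coefficient concrete, here is a minimal sketch of the Pearson correlation written in plain Python. The function name and the per-game numbers are hypothetical and invented for illustration. In practice a data analysis library would normally be used for this calculation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical per-game values, invented for illustration only.
field_goal_attempts = [80, 85, 90, 88, 95, 92]
points_scored = [98, 101, 110, 104, 115, 109]

r = pearson_r(field_goal_attempts, points_scored)
print(f"Pearson r = {r:.3f}")
```

A value near 1 or -1 indicates a strong linear relationship, while a value near 0 indicates a weak one. The coefficient says nothing about nonlinear relationships or about causation.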
Introduction to Sports Data and Plots in Python
This module introduces some ways of representing data using examples from MLB, the NBA and the IPL. MLB data is used to analyze the spatial distribution of different hits. NBA data is used to generate heatmaps to illustrate the different ways in which players contribute. IPL data is used to show how team performances can be compared graphically.
Introduction to Sports Data and Regression Using Python
This week introduces the fundamentals of regression analysis. We will discuss how to perform regression analysis using Python and how to interpret regression output. We will use NHL data to estimate multiple regression models that identify the team-level performance factors affecting a team's winning percentage. We will also use cricket data from the Indian Premier League to run regression analyses that examine whether player performance impacts player salary.
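As a simplified illustration of what a regression estimates, the sketch below fits an ordinary least squares line with a single predictor in plain Python. This is deliberately narrower than the multiple regression models described above. The function name and the team figures are hypothetical and invented for illustration.

```python
def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r_squared). One predictor only; a
    multiple regression would need a statistics library or matrix algebra.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has no variation")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared: share of the variation in y explained by the fitted line.
    fitted = [intercept + slope * xi for xi in x]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return slope, intercept, r_squared


# Hypothetical team-level data: goals scored per game and win percentage.
goals_per_game = [2.4, 2.7, 2.9, 3.1, 3.3, 3.5]
win_pct = [0.40, 0.45, 0.50, 0.52, 0.58, 0.61]

slope, intercept, r_squared = fit_simple_ols(goals_per_game, win_pct)
print(f"win_pct = {intercept:.3f} + {slope:.3f} * goals_per_game")
print(f"R-squared = {r_squared:.3f}")
```

The slope is read as the change in winning percentage associated with one additional goal per game in this invented sample. It describes an association, not a causal effect.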
More on Regressions
This module uses regression analysis to investigate the relationship between team salary spending and team performance in the NBA, NHL, EPL and IPL. The module explores different ways of defining the regression model, and how to interpret competing regression model results.
Is There a Hot Hand in Basketball?
This week studies an interesting topic in sport: the hot hand. We will introduce the concept of the hot hand and discuss the academic research that examines whether it is a real phenomenon or a fallacy. We will demonstrate how to test for the hot hand analytically using NBA shot log data. We will test whether NBA players have a hot hand by computing conditional probabilities and autocorrelation coefficients, as well as by performing regression analyses.
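Two of the tests mentioned above can be sketched on a single shot sequence coded as 1 for a made shot and 0 for a miss. The sketch below assumes that coding. The function names and the shot sequence are hypothetical and are not drawn from the NBA shot log data used in the course.

```python
def conditional_hit_rates(shots):
    """Return (rate after a hit, rate after a miss) for a 0/1 shot sequence.

    Each rate is None when there are no shots of that kind to condition on.
    """
    pairs = list(zip(shots, shots[1:]))  # (previous shot, current shot)
    after_hit = [cur for prev, cur in pairs if prev == 1]
    after_miss = [cur for prev, cur in pairs if prev == 0]

    def rate(outcomes):
        return sum(outcomes) / len(outcomes) if outcomes else None

    return rate(after_hit), rate(after_miss)


def lag1_autocorrelation(shots):
    """Lag-1 autocorrelation of a 0/1 shot sequence (None if no variation)."""
    n = len(shots)
    mean = sum(shots) / n
    variance_sum = sum((s - mean) ** 2 for s in shots)
    if variance_sum == 0:
        return None
    covariance_sum = sum(
        (a - mean) * (b - mean) for a, b in zip(shots, shots[1:])
    )
    return covariance_sum / variance_sum


# Hypothetical shot sequence for one player in one game (1 = made, 0 = missed).
shots = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0]

p_after_hit, p_after_miss = conditional_hit_rates(shots)
print(f"P(hit | previous hit)  = {p_after_hit:.3f}")
print(f"P(hit | previous miss) = {p_after_miss:.3f}")
print(f"Lag-1 autocorrelation  = {lag1_autocorrelation(shots):.3f}")
```

A hot hand would show up as a higher hit rate after a hit than after a miss, and as a positive autocorrelation. A sequence this short cannot support any conclusion; it only shows how the quantities are computed.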