Benchmarking Machine Learning for Basketball Performance

A recent study published in Knowledge and Information Systems presents a comparative analysis of machine learning (ML) models for predicting the future performance of basketball players. The study evaluated 14 different ML models against 18 basketball statistics and Key Performance Indicators (KPIs).

Methodology and Data Focus

The researchers applied their methodology to a pool of 90 NBA players. To qualify, players had to meet strict criteria: high averages across crucial KPIs such as GameScore (GMSC), Four Factors, TENDEX, Fantasy Points (FP), and Efficiency (EFF), along with substantial participation time (15+ minutes per game) over three recent seasons (2019–20 through 2021–22).

The study developed individual forecasting scenarios for each player, using each player's averages from past games to predict performance in upcoming ones. The 14 ML models tested spanned several categories: linear models, tree-based models, non-parametric models (K-nearest neighbors, or KNN), and online-learning models. Within each category, the study used the model variants best suited to the metric being measured.
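To make the four model categories above concrete, the sketch below fits one common representative from each using scikit-learn on synthetic data. The features, data, and hyperparameters here are illustrative assumptions, not the study's actual configuration, which tuned model variants per metric.

```python
# Illustrative only: one scikit-learn representative per model category
# from the study (linear, tree-based, non-parametric, online-learning).
# Synthetic data and default-ish settings stand in for the real pipeline.
import numpy as np
from sklearn.linear_model import Lasso, SGDRegressor
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))  # stand-in for rolling averages of past stats
y = X @ np.array([2.0, -1.0, 0.5, 0.0, 1.5]) + rng.normal(scale=0.1, size=200)

models = {
    "Linear (LASSO)": Lasso(alpha=0.01),
    "Tree-based (Extra Trees)": ExtraTreesRegressor(n_estimators=100, random_state=0),
    "Non-parametric (KNN)": KNeighborsRegressor(n_neighbors=5),
    "Online-learning (SGD)": SGDRegressor(max_iter=1000, random_state=0),
}
for name, model in models.items():
    model.fit(X[:150], y[:150])            # train on the first 150 "games"
    score = model.score(X[150:], y[150:])  # R^2 on held-out "games"
    print(f"{name}: R^2 = {score:.3f}")
```

The held-out split mirrors the study's emphasis on evaluating against unseen data rather than training-set fit.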

To rank the models against one another, the authors constructed an evaluation metric, the weighted average percentage error (WAPE), derived from the weighted Mean Absolute Percentage Error (MAPE) results for each statistic and model.
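The idea can be sketched in a few lines: compute a MAPE per statistic, then collapse those per-metric scores into one ranking number per model. The equal-weight default below is an assumption for illustration; the paper's exact weighting scheme may differ.

```python
# Sketch of the WAPE idea: average per-metric MAPE values for one model
# into a single ranking score. Equal weights are an illustrative assumption.

def mape(actual, predicted):
    """Mean Absolute Percentage Error over paired observations, as a %."""
    return 100.0 * sum(
        abs(a - p) / abs(a) for a, p in zip(actual, predicted)
    ) / len(actual)

def wape(mape_by_metric, weights=None):
    """Weighted average of per-metric MAPE scores for one model."""
    metrics = list(mape_by_metric)
    if weights is None:
        weights = {m: 1.0 for m in metrics}  # equal weights by default
    total_w = sum(weights[m] for m in metrics)
    return sum(weights[m] * mape_by_metric[m] for m in metrics) / total_w

# Hypothetical per-metric MAPEs for one model
scores = {"PTS": 30.0, "REB": 35.0, "AST": 40.0}
print(round(wape(scores), 2))  # 35.0 with equal weights
```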

Key Findings: Tree-Based Models Dominate

The research clearly showed that tree-based models were the best performers across most of the tested performance indicators.

The best models, based on performance against unseen data, were ranked as follows by WAPE score:

  1. Extra Trees (ET): Achieved the best performance with a WAPE of 34.14%. ET was particularly strong on metrics such as AST RATIO, PIE, PM, and TOV.
  2. Random Forest (RF): Scored a WAPE of 34.23%.
  3. Decision Tree (DT): Scored a WAPE of 34.41%.

While tree-based models led overall, certain linear models, such as LASSO (34.54% WAPE) and LARS (34.53% WAPE), also showed exceptional overall performance, demonstrating good accuracy for specific metrics.

Optimizing KPI Forecasting

A crucial finding of the study was the success of a comprehensive forecasting approach designed to improve KPI prediction results, specifically focusing on Fantasy Points (FP).

Instead of forecasting FP as a single combined metric, the researchers experimented with forecasting the six constituent individual metrics (Points, Rebounds, Assists, Steals, Blocks, and Turnovers) separately, and then constructing the FP KPI from these predictions.

This optimized approach led to a significant improvement. The overall forecasting accuracy for FP improved by 3.6% MAPE on unseen data when utilizing this method. The Gradient Boosting Machine (GBM) model, for instance, achieved a MAPE of 29.81% when using the reformatted approach, which outperformed the best result from the single-metric forecast approach (LARS at 33.39% MAPE).
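The reconstruction step described above can be sketched as a weighted sum of the six per-stat forecasts. The scoring weights below follow a common fantasy-basketball format and are an illustrative assumption, not necessarily the study's exact FP formula.

```python
# Sketch of the decomposed-forecast idea: predict the six constituent
# stats individually, then reconstruct Fantasy Points (FP) from them.
# Weights follow a common fantasy format (assumed, not from the paper).

FP_WEIGHTS = {
    "PTS": 1.0,   # points
    "REB": 1.2,   # rebounds
    "AST": 1.5,   # assists
    "STL": 3.0,   # steals
    "BLK": 3.0,   # blocks
    "TOV": -1.0,  # turnovers (penalized)
}

def fantasy_points(forecasts):
    """Combine six per-stat forecasts into a single FP prediction."""
    return sum(FP_WEIGHTS[stat] * value for stat, value in forecasts.items())

# Hypothetical per-stat forecasts for one player's next game
next_game = {"PTS": 22.0, "REB": 6.0, "AST": 5.0,
             "STL": 1.0, "BLK": 0.5, "TOV": 2.5}
print(fantasy_points(next_game))  # 38.7 under these weights
```

Because each constituent stat gets its own model, errors in one stat no longer contaminate the others, which is one plausible reason the decomposed approach outperformed the single-metric forecast.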

Implications

The findings offer valuable insights for the sports analytics industry, including researchers, coaches, data scientists, and organizations that invest in data science departments for scouting. The research not only benchmarks current ML model performance but also introduces an innovative approach to improving KPI forecasting accuracy, showing that advanced statistical techniques combined with practical methods can set a new benchmark for predicting professional basketball player performance. Future work suggested by the authors includes incorporating external factors such as sentiment analysis, betting odds, and motion-capture data to further refine predictions.

What does this mean for the future of sports? 

We are heading toward a future where statistical analysis can help teams attract the best talent and evaluate and retain top performers. The benchmarks are only getting better and more accurate. This raises a question of fairness: would it be fair for a team to cut a player's contract based on output from a machine learning model? As these models become more and more involved in team environments, the ethical considerations of using them will come into focus as well. So, what do you think: are these models doing more good than harm for sports?

 

Reference

Papageorgiou, G., Sarlis, V., & Tjortjis, C. (2024). Evaluating the effectiveness of machine learning models for performance forecasting in basketball: a comparative study. Knowledge and Information Systems, 66, 4333–4375. https://doi.org/10.1007/s10115-024-02092-9
