# Predicting Ice Hockey Goals Using Random Forest and XGBoost
## Intro & Background
Our project aims to predict goal outcomes in ice hockey using more than 500,000 event-level entries from the Linhac24-25 dataset. We frame goal prediction as a binary classification task and focus on identifying the key factors, such as puck position, game state, and player context, that influence the likelihood of scoring. The motivation is to improve tactical insight, player evaluation, and fan engagement by understanding which in-game situations most often lead to goals.
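To make the task concrete, here is a minimal sketch of how the event data could be framed as a binary classification problem. The file name and column names (`event_type`, `x_coord`, `y_coord`, `game_state`) are assumptions for illustration only; the actual Linhac24-25 schema may differ.

```python
# Minimal sketch: framing event-level hockey data as binary classification.
# Column and file names are hypothetical placeholders.
import pandas as pd

df = pd.read_csv("linhac24-25_events.csv")  # hypothetical file name

# Binary target: 1 if the event is a goal, 0 otherwise.
y = (df["event_type"] == "goal").astype(int)

# Example feature set: puck position plus game-state context.
feature_cols = ["x_coord", "y_coord", "game_state"]
X = pd.get_dummies(df[feature_cols], columns=["game_state"])  # one-hot encode categorical context

print(f"Events: {len(df):,}  Goal rate: {y.mean():.3%}")
```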
## Algorithms
To model goal-scoring events, the study employs two ensemble learning algorithms: Random Forest (RF) and XGBoost (XGB). Random Forest aggregates many decision trees and is robust to overfitting; its class_weight option helps counteract the heavy class imbalance between goals and non-goals. XGBoost, a gradient boosting method, is tuned for performance and interpretability through regularization and one-hot encoding of categorical features. Among the variants tested, XGBoost outperformed Random Forest, particularly at identifying the rare positive (goal) events, making it better suited for tactical prediction tasks.
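As an illustration, the sketch below shows one way the two models might be configured with scikit-learn and the xgboost package, reusing the `X` and `y` from the earlier sketch. The hyperparameter values are illustrative defaults, not the settings used in the project.

```python
# Minimal sketch: fitting the two ensemble models on a prepared feature matrix X
# and binary goal label y. Hyperparameter values are illustrative, not tuned.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from xgboost import XGBClassifier

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Random Forest: class_weight="balanced" counteracts the rarity of goal events.
rf = RandomForestClassifier(
    n_estimators=300, class_weight="balanced", random_state=42, n_jobs=-1
)
rf.fit(X_train, y_train)

# XGBoost: L1/L2 regularization (reg_alpha / reg_lambda) helps control overfitting;
# categorical features are assumed to be one-hot encoded in X beforehand.
xgb = XGBClassifier(
    n_estimators=300,
    learning_rate=0.1,
    max_depth=6,
    reg_alpha=0.1,
    reg_lambda=1.0,
    eval_metric="logloss",
    random_state=42,
)
xgb.fit(X_train, y_train)

# Compare the two models on the held-out split.
for name, model in [("Random Forest", rf), ("XGBoost", xgb)]:
    print(name)
    print(classification_report(y_test, model.predict(X_test), digits=3))
```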
## Summary & Future
Our project uses ensemble machine learning models to predict goal-scoring events in professional ice hockey. Using spatial and contextual features from the Linhac24-25 dataset, we trained and compared Random Forest and XGBoost models. The results show that XGBoost classifies the rare goal events more reliably, with key predictors including puck location, game situation, and expected goals. Visualizations and feature-importance analysis align with hockey domain knowledge, supporting both the models' accuracy and their practical use in strategic decision-making.
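For illustration, a feature-importance comparison of the kind described above could be pulled directly from the fitted models as shown below. This reuses the hypothetical `rf`, `xgb`, and `X` from the earlier sketches and is only a sketch of the approach, not the project's exact analysis.

```python
# Minimal sketch: comparing feature importances from the two fitted models.
import pandas as pd

importances = pd.DataFrame(
    {
        "random_forest": rf.feature_importances_,
        "xgboost": xgb.feature_importances_,
    },
    index=X.columns,
)

# Rank features by their average importance across the two models.
top = importances.assign(mean=importances.mean(axis=1)).sort_values("mean", ascending=False)
print(top.head(10))
```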