Addressing Bias in Bagging and Boosting Regression Models: Article No. 18452

Research output: Contribution to journal › Article › peer-review

Abstract

As artificial intelligence (AI) becomes widespread, there is increasing attention on investigating bias in machine learning (ML) models. Previous research concentrated on classification problems, with little emphasis on regression models. This paper presents an easy-to-apply and effective methodology for mitigating bias in bagging and boosting regression models that is also applicable to any model trained by minimizing a differentiable loss function. Our methodology measures bias rigorously and extends the ML model's loss function with a regularization term that penalizes high correlations between model errors and protected attributes. We applied our approach to three popular tree-based ensemble models: a random forest (RF), a gradient-boosted tree model (GBT), and an extreme gradient boosting model (XGBoost). We implemented our methodology on a case study for predicting road-level traffic volume, where the RF, GBT, and XGBoost models achieved high accuracy. Despite this accuracy, the models performed poorly on roads in minority-populated areas. Our bias mitigation approach reduced minority-related bias by over 50%.
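
The paper's exact regularizer is not reproduced on this page. As a rough illustration of the general idea described in the abstract, the sketch below augments XGBoost's squared-error objective with a penalty on the squared covariance between residuals and a centered protected attribute (a differentiable stand-in for the correlation-based term). The function name `make_fair_objective`, the penalty weight `lam`, the synthetic data, and the diagonal Hessian approximation are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np
import xgboost as xgb

def make_fair_objective(protected, lam=1.0):
    """Hypothetical custom XGBoost objective: squared error plus a penalty
    on the covariance between residuals and a centered protected attribute.
    This is a sketch of the abstract's idea, not the paper's exact loss."""
    s = protected - protected.mean()  # center the protected attribute
    n = len(s)

    def objective(preds, dtrain):
        y = dtrain.get_label()
        resid = preds - y
        cov = (resid * s).mean()  # covariance of errors with the attribute
        # Gradient of 0.5*resid_i^2 + lam*cov^2 w.r.t. each prediction
        grad = resid + 2.0 * lam * cov * s / n
        # Diagonal Hessian approximation (cross-terms between points dropped)
        hess = np.ones(n) + 2.0 * lam * (s ** 2) / (n ** 2)
        return grad, hess

    return objective

# Illustrative usage on synthetic data (not the paper's traffic dataset)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
s = rng.integers(0, 2, size=500).astype(float)  # stand-in protected attribute
y = X @ rng.normal(size=5) + 0.5 * s + rng.normal(scale=0.1, size=500)

dtrain = xgb.DMatrix(X, label=y)
booster = xgb.train({"max_depth": 3, "eta": 0.1},
                    dtrain, num_boost_round=100,
                    obj=make_fair_objective(s, lam=10.0))
```

Under this sketch, `lam` governs the accuracy-fairness trade-off: larger values drive the covariance between residuals and the protected attribute toward zero at some cost in predictive error.
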
Original language: American English
Number of pages: 12
Journal: Scientific Reports
Volume: 14
DOIs
State: Published - 2024

NREL Publication Number

  • NREL/JA-2C00-91027

Keywords

  • artificial intelligence
  • bias in machine learning
  • fair machine learning
  • gradient-boosted trees
  • random forest
  • XGBoost
