Boasting About Boosting
The origins of the XGBoost algorithm lie in decision trees, a type of flow chart used to visualize decision-making by mapping out different courses of action and their potential outcomes. Decision trees can also be implemented as mathematical models: a model is produced by training on a set of features in the data to predict a target characteristic or outcome. Dr. Leo Breiman of the University of California, Berkeley and his colleagues coined the umbrella term Classification and Regression Trees (CART) for these models in the mid-1980s. When the target is a set of discrete values, a classification model is derived; when the target is continuous, a regression model is produced.
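To make the classification-versus-regression distinction concrete, here is a minimal CART-style sketch using scikit-learn. The datasets and settings are illustrative assumptions, not part of the original discussion: a discrete target yields a decision tree classifier, a continuous target a decision tree regressor.

```python
# A minimal CART-style illustration with scikit-learn; the datasets here
# (iris, diabetes) are assumed for demonstration only.
from sklearn.datasets import load_diabetes, load_iris
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Discrete target (iris species) -> classification tree
X_cls, y_cls = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_cls, y_cls)
print("classification accuracy:", clf.score(X_cls, y_cls))

# Continuous target (disease progression score) -> regression tree
X_reg, y_reg = load_diabetes(return_X_y=True)
reg = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_reg, y_reg)
print("regression R^2:", reg.score(X_reg, y_reg))
```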
Dr. Breiman was instrumental in forging a bridge from the field of statistics to the algorithmic approaches developing in computer science. He was especially well known for his early work combining the predictions of multiple decision trees, an approach now called “ensemble learning.” His “bootstrap aggregation,” or “bagging,” method for building the ensemble involves training multiple decision trees on random samples of the data drawn with replacement and then combining their decisions by consensus. However, it was Tin Kam Ho, a computer scientist then at Bell Labs, who is credited with building the first random forest algorithm in 1995; her approach randomized over subsets of the features rather than resampling the data. Breiman and a former student of his, Adele Cutler at Utah State University, extended Ho’s ensemble learning work and trademarked the phrase “random forests.”
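The bagging idea can be sketched in a few lines of scikit-learn. The dataset and hyperparameters below are illustrative assumptions; the point is that each tree sees a bootstrap sample of the data and the ensemble votes on the final prediction, with a random forest additionally sampling a subset of features at each split.

```python
# Illustrative sketch of bagging vs. a random forest on a synthetic dataset
# (all data and hyperparameters here are assumptions for demonstration).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Bootstrap aggregation: each decision tree (the default base estimator) is
# trained on a sample drawn with replacement; predictions are combined by vote.
bagging = BaggingClassifier(n_estimators=100, bootstrap=True, random_state=0)
bagging.fit(X, y)

# A random forest also samples a random subset of features at each split.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

print("bagging accuracy:", bagging.score(X, y))
print("random forest accuracy:", forest.score(X, y))
```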
Boosting is another approach to ensemble learning. The idea is to build the ensemble stepwise, adding new trees that focus on the training samples the previous ensemble got wrong. In essence, these models learn iteratively from their mistakes. Gradient boosting likewise concentrates on the harder-to-predict training cases, but it does so by fitting each new tree to the residual errors (the gradient of the loss) of the current ensemble and adding that tree’s contribution in small steps, so each new member improves the ensemble without disrupting what it has already learned.
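A hand-rolled sketch of gradient boosting for squared-error regression may help make this concrete. The dataset, tree depth, and learning rate below are assumptions for illustration only; each new tree is fit to the residuals left by the current ensemble and contributes only a small step toward the final prediction.

```python
# Hand-rolled gradient boosting for squared-error regression (illustrative
# data and hyperparameters): each tree is fit to the current residuals and
# added with a small learning rate.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

learning_rate, n_trees = 0.1, 100
prediction = np.full(len(y), y.mean())  # start from the mean prediction
trees = []
for _ in range(n_trees):
    residuals = y - prediction                      # where the ensemble is still wrong
    tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, residuals)
    prediction += learning_rate * tree.predict(X)   # small step toward correcting errors
    trees.append(tree)

print("training MSE after boosting:", np.mean((y - prediction) ** 2))
```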
One concern with gradient boosting is over-fitting the model to the training data, so that the model won’t generalize to other data. That’s where Extreme Gradient Boosting (XGBoost) comes in. The method was initially developed in research by Tianqi Chen, who was a graduate student at the University of Washington at the time (there’s another story there). XGBoost uses “regularization” to improve generalization by penalizing the largest weights and to reduce model complexity by driving the weakest weights to zero. In practice, XGBoost reliably produces significantly better-performing models, and it runs remarkably fast.
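As a brief illustration, the xgboost Python package exposes these regularization controls directly; the dataset and parameter values below are assumptions chosen for demonstration, not recommendations. reg_lambda applies an L2 penalty that shrinks large leaf weights, while reg_alpha applies an L1 penalty that can zero out the weakest ones.

```python
# Sketch of XGBoost with explicit regularization (assumed synthetic data and
# illustrative parameter values; reg_lambda and reg_alpha are parameters of
# the xgboost Python package).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = XGBClassifier(
    n_estimators=200,
    learning_rate=0.1,
    max_depth=4,
    reg_lambda=1.0,        # L2 penalty: shrinks the largest leaf weights
    reg_alpha=0.1,         # L1 penalty: can drive the weakest weights to zero
    eval_metric="logloss",
)
model.fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```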
Random forest models greatly improve prediction accuracy, but at the cost of interpretability: it becomes harder to see what contributes to the predicted outcome. In many cases, though not all, understanding a model’s underlying mechanisms is essential to its utility, not only in scientific research but in business. During a pivotal period for machine learning, Breiman addressed the trade-off between prediction accuracy and interpretability in a 2001 paper entitled “Statistical Modeling: The Two Cultures,” published in the journal Statistical Science. In summary, he argued that machine learning algorithms deserve greater attention from statisticians for two reasons:
- The conclusions (interpretations) drawn from traditional statistical methods can be suspect when those methods predict poorly.
- The higher accuracy of the newer “black box” algorithms ultimately provides more reliable insight into the mechanisms underlying the prediction.
A great deal of work continues today on methods for interpreting machine learning and artificial intelligence models. This includes work on the variable importance measures used to analyze the black box mechanisms of random forests. Even so, these algorithms have seen broad acceptance and adoption.
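As an example of what such interpretation methods look like in practice, the sketch below (using an assumed synthetic dataset) compares two common variable importance views of a random forest: the impurity-based importances built into scikit-learn and model-agnostic permutation importance.

```python
# Two common variable importance views of a random forest (assumed synthetic
# data): scikit-learn's impurity-based importances and permutation importance.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=500, n_features=8, n_informative=3,
                           random_state=0)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Importance accumulated from impurity reductions at each split
print("impurity-based:", forest.feature_importances_.round(3))

# Importance measured by the drop in score when a feature is shuffled
perm = permutation_importance(forest, X, y, n_repeats=10, random_state=0)
print("permutation:   ", perm.importances_mean.round(3))
```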