Extreme Gradient Boosting vs Algorithm models - PhishingHunters

Extreme Gradient Boosting vs Algorithm models

The previous Extreme Gradient Boosting blog post described a PhishingHunters study that compared several predictive classification models, each of which labeled credit card transactions as fraudulent or non-fraudulent. The goal was to determine which algorithm performed best. The Extreme Gradient Boosting classification algorithm obtained the best results in our study.

Both the previous study and this one have been carried out with data from credit card transactions, although the studies and conclusions could be applied to any type of fraud.

Having identified the best single algorithm, we asked ourselves: what if we compare it with a group of algorithms, that is, with a model that combines several algorithms? Would it still be the best option?

Based on this premise, three new models have been implemented, and each of them involves a combination of different classification algorithms: 

  • Model A is based on the Decision Tree, Logistic Regression, k-nearest neighbors and Support Vector Classification (SVC) algorithms.
  • Model B has been created using the Random Forest, Extreme Gradient Boosting (XGB) and Artificial Neural Network algorithms.
  • Model C has been built using the four algorithms that obtained the highest individual performance among the seven classification algorithms used in the two previous models: k-nearest neighbors, SVC, Random Forest and XGB.
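
The post does not state how the individual algorithms are combined inside each model. One common option is hard (majority) voting, where each base algorithm votes fraud or non-fraud and the most voted class wins. The sketch below illustrates that idea only; the function name `majority_vote`, the tie-breaking rule, and the sample predictions are assumptions made for illustration and are not taken from the study.

```python
from collections import Counter


def majority_vote(predictions_per_model, tie_label=1):
    """Combine per-model class predictions by hard (majority) voting.

    predictions_per_model: list of equal-length lists, one per base model,
        where 1 means fraud and 0 means non-fraud for each transaction.
    tie_label: class returned when the vote is split evenly. Assumption:
        ties are flagged as fraud so that they get reviewed.
    """
    combined = []
    # zip(*...) groups the votes of all models for the same transaction.
    for votes in zip(*predictions_per_model):
        counts = Counter(votes)
        fraud_votes, legit_votes = counts[1], counts[0]
        if fraud_votes == legit_votes:
            combined.append(tie_label)
        else:
            combined.append(1 if fraud_votes > legit_votes else 0)
    return combined


# Hypothetical predictions from three base models on five transactions.
random_forest = [1, 0, 0, 1, 0]
xgb = [1, 0, 1, 1, 0]
neural_net = [0, 0, 1, 1, 0]

ensemble = majority_vote([random_forest, xgb, neural_net])
print(ensemble)
```

Other combination schemes, such as averaging predicted probabilities or stacking, would follow the same pattern of feeding several base predictions into a single decision.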


How is each predictive model evaluated?


The evaluation metrics were computed on the test sample (the training sample is used to build the model, and the test sample is used to evaluate it). Based on the class predicted (fraud or non-fraud) for each transaction in the test sample, the performance of each model is evaluated by calculating these two measures:

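  • The percentage of fraud detected, that is, the share of actual fraud cases that the model flags (commonly called recall).
  • The probability, as a percentage, that a transaction flagged as fraud by the model is actually fraud (commonly called precision).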
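
As a minimal sketch of how these two measures can be computed from test-sample predictions, assuming labels are encoded as 1 for fraud and 0 for non-fraud (the function name `fraud_metrics` and the sample data are hypothetical):

```python
def fraud_metrics(y_true, y_pred):
    """Return (percentage of fraud detected, probability of being fraud).

    y_true: actual classes of the test transactions (1 = fraud, 0 = non-fraud).
    y_pred: classes predicted by the model for the same transactions.
    """
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)   # fraud caught
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)  # fraud missed
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms

    actual_fraud = true_pos + false_neg
    flagged = true_pos + false_pos
    # Guard against empty denominators (no fraud present, or nothing flagged).
    detected_pct = 100 * true_pos / actual_fraud if actual_fraud else 0.0
    probability_pct = 100 * true_pos / flagged if flagged else 0.0
    return detected_pct, probability_pct


# Hypothetical test sample: 4 fraudulent and 6 legitimate transactions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

detected_pct, probability_pct = fraud_metrics(y_true, y_pred)
print(detected_pct, probability_pct)
```

In this toy sample the model catches 3 of the 4 fraud cases and raises 4 alerts, 3 of which are real, so both measures come out at 75%.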

The following table shows the value of both measures for each of the models created.

Model   | Percentage of fraud detected (%) | Probability of being fraud (%)
Model A | 74.2                             | 97.3
Model B | 79.6                             | 94.3
Model C | 76.2                             | 95.7

Table 1. Percentage of fraud detected and probability of being fraud for each model.

In Table 1, we can see that Model A detects a lower percentage of fraud than either of the other two models, but the cases it flags are the most likely to be actual fraud. Model B detects the highest percentage of fraud, but the cases it flags are the least likely to be actual fraud. Model C falls between Models A and B on both measures.

It is important to note that organizations often prefer a model that detects more fraud, even if each flagged case is somewhat less likely to be actual fraud. Detecting more instances of fraud is preferable, even though more cases that ultimately turn out not to be fraud will have to be reviewed. Therefore, our project uses Model B for fraud detection, as it detects a higher percentage of fraud than the other models.
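
To make this trade-off concrete, the two measures in Table 1 can be turned into approximate counts of caught frauds, missed frauds, and false alarms. The sketch below assumes a hypothetical batch containing 1,000 actual fraud cases; that figure and the function name `review_workload` are illustrative assumptions, and only the percentages come from Table 1.

```python
def review_workload(detected_pct, probability_pct, actual_frauds=1000):
    """Estimate (caught, false_alarms, missed) for a batch of transactions.

    detected_pct: percentage of actual fraud that the model detects.
    probability_pct: percentage of flagged cases that are actual fraud.
    actual_frauds: hypothetical number of real fraud cases in the batch.
    """
    caught = actual_frauds * detected_pct / 100
    # If probability_pct of all flagged cases are real fraud, then
    # flagged = caught / (probability_pct / 100).
    flagged = caught / (probability_pct / 100)
    false_alarms = flagged - caught
    missed = actual_frauds - caught
    return caught, false_alarms, missed


# Percentages taken from Table 1.
table_1 = {
    "Model A": (74.2, 97.3),
    "Model B": (79.6, 94.3),
    "Model C": (76.2, 95.7),
}

for name, (detected_pct, probability_pct) in table_1.items():
    caught, false_alarms, missed = review_workload(detected_pct, probability_pct)
    print(f"{name}: caught={caught:.0f} false_alarms={false_alarms:.0f} missed={missed:.0f}")
```

Under this assumption, Model B catches 54 more fraud cases than Model A at the cost of roughly 28 additional false alarms to review, which is the kind of exchange the paragraph above describes.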


Extreme Gradient Boosting vs Model B


Model B was then compared with the Extreme Gradient Boosting (XGB) classification algorithm used on its own, with the following results:

Model                     | Percentage of fraud detected (%) | Probability of being fraud (%)
Model B                   | 79.6                             | 94.3
Extreme Gradient Boosting | 78.2                             | 94.3

Table 2. Percentage of fraud detected and probability of being fraud for Model B and the Extreme Gradient Boosting algorithm.

Table 2 shows that the probability that detected fraud is actually fraud is the same for both (94.3%), while Model B detects a higher percentage of fraud (79.6%) than the XGB algorithm alone (78.2%).

Although these are preliminary results and no definitive conclusion can be reached, they suggest that models combining several algorithms can perform better than a single algorithm, at the cost of more complex model maintenance.