09 Jun Insurance fraud
One of the sectors most affected by fraud is insurance and, without doubt, the product most targeted by cybercrime is car insurance, which accounts for more than 63% of fraud cases.
Every year, large amounts of money are invested in detecting fraud, which generates losses in the millions. The latest studies, however, have shown that such investments are profitable: for every euro invested, more than 48 euros are recovered.
A typical car insurance fraud occurs when the insured deceives the company by reporting a claim with the hidden intention of making a financial gain from it. Examples are a claim for material damage or physical injuries that in fact never took place or, where an accident did occur, a claim in which the consequences are aggravated and exaggerated.
Searching for automobile accident fraud is highly specialized work that consumes a great deal of time and resources. Thanks to data science, much of this work can be automated by focusing attention on the possible cases of fraud and breaking the problem down into smaller parts to reach an optimal solution.
The magic of data science
How does this magic work? Starting from historical data on real incidents, both fraudulent and non-fraudulent, their characteristics are analyzed in search of patterns. These patterns are the habits of fraudsters, and they may indicate that a case is fraudulent. The analysis is done by creating what is called a model. To make the model as accurate as possible, the data should be provided by the insurance company and should be as extensive and precise as possible.
Let’s suppose that a car insurance company has provided us with a data set (dataset) made up of the different characteristics surrounding an automobile incident. This type of data usually includes: the type of policy, the number of months the insured has been a client, the sex and age of the insured, the type of incident, the number of vehicles involved in the accident, the type of collision (if any), the severity of the incident, the number of witnesses, etc. These data give us clues, can be studied as a whole, and help us to determine whether a case is fraudulent.
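As an illustration, one record of such a dataset could be sketched as follows. The field names, the sample values and the numeric coding of severity are assumptions made for this example. The text only tells us that group 3 means high cost and group 2 average cost, so group 1 as low cost is a guess.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Claim:
    """One incident record. All field names are hypothetical."""
    policy_type: str
    months_as_client: int
    insured_sex: str
    insured_age: int
    incident_type: str
    vehicles_involved: int
    collision_type: Optional[str]  # None when there was no collision
    severity: int                  # assumed coding: 1 = low, 2 = average, 3 = high cost
    witnesses: int

    def numeric_features(self) -> List[float]:
        """Return the numeric fields as a vector, e.g. for grouping similar cases."""
        return [
            float(self.months_as_client),
            float(self.insured_age),
            float(self.vehicles_involved),
            float(self.severity),
            float(self.witnesses),
        ]


# Invented sample record, for illustration only.
example = Claim(
    policy_type="comprehensive",
    months_as_client=36,
    insured_sex="F",
    insured_age=41,
    incident_type="collision",
    vehicles_involved=2,
    collision_type="rear",
    severity=2,
    witnesses=1,
)
print(example.numeric_features())
```

Categorical fields such as the type of policy would need to be encoded as numbers before they could take part in a distance-based comparison. That step is left out of this sketch.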
If we try to group cases with similar data, we can discover those whose characteristics do not fit any group, that is, any cluster. These cases are called outliers, and they are possible cases of fraud.
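The idea of a case that "does not fit any group" can be sketched with a very simple rule: a case is an outlier if too few other cases lie close to it. This is a simplified stand-in for density-based clustering, not necessarily the method used on the real dataset. The function name, the two features chosen, the toy values and the thresholds are all assumptions.

```python
import math


def find_outliers(points, radius, min_neighbours):
    """Return the indices of points with fewer than `min_neighbours`
    other points within `radius` (Euclidean distance)."""
    outliers = []
    for i, p in enumerate(points):
        neighbours = sum(
            1
            for j, q in enumerate(points)
            if j != i and math.dist(p, q) <= radius
        )
        if neighbours < min_neighbours:
            outliers.append(i)
    return outliers


# Invented toy data: (severity, number of witnesses) for six incidents.
# Real features would normally be scaled to comparable ranges first.
claims = [(2, 1), (2, 2), (2, 1), (1, 2), (1, 1), (3, 5)]

print(find_outliers(claims, radius=1.5, min_neighbours=2))  # -> [5]
```

Only the last incident has no neighbour within the chosen radius, so it is the one flagged for manual review. With real data, the radius and the minimum number of neighbours would have to be tuned.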
In our case, we have discovered a characteristic that distinguishes the cases that are outliers from those that are not. The characteristic “severity of the incident” represents the cost assumed by the insurance company in an incident and is often key to detecting possible fraudulent cases.
Image 1. Counts of the values of the characteristic ‘severity of the incident’ for fraudulent (left image) and non-fraudulent (right image) incidents.
As can be seen at first sight, the severity of the incident is distributed differently in the cases considered outliers (possible cases of fraud) and in those considered real incidents. In the possible cases of fraud, the severity group that stands out is number 3 (high cost), whereas in the real incidents it is severity 2 (average cost) that stands out the most.
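A comparison of this kind comes down to counting the severity values separately for each set of cases. The sketch below uses invented records, chosen only so that the result mirrors the pattern described above. They are not the values behind Image 1, and the function name is hypothetical.

```python
from collections import Counter

# Invented records: (severity group, flagged as outlier?).
records = [
    (3, True), (3, True), (2, True),
    (2, False), (2, False), (2, False), (1, False), (3, False),
]


def severity_counts(records, outlier):
    """Count severity values among the cases whose outlier flag equals `outlier`."""
    return Counter(severity for severity, flag in records if flag == outlier)


suspected = severity_counts(records, outlier=True)
regular = severity_counts(records, outlier=False)

print(suspected.most_common(1))  # -> [(3, 2)]: severity 3 dominates the outliers
print(regular.most_common(1))    # -> [(2, 3)]: severity 2 dominates the rest
```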
With this first approach, we can already derive figures that may be of interest, such as the probability that an incident in each severity group is fraudulent (Table 1).
|Group|Probability of fraud (%)|
Table 1. Probability of fraud from each group.
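A per-group probability like the one in Table 1 can be estimated as the share of suspected cases within each severity group. The following is a minimal sketch under that assumption. The records are invented and do not reproduce the figures of Table 1, and the function name is hypothetical.

```python
from collections import defaultdict


def fraud_probability_by_group(records):
    """Return {group: percentage of cases in that group flagged as fraud}."""
    totals = defaultdict(int)
    frauds = defaultdict(int)
    for group, is_fraud in records:
        totals[group] += 1
        if is_fraud:
            frauds[group] += 1
    return {g: 100.0 * frauds[g] / totals[g] for g in sorted(totals)}


# Invented records: (severity group, flagged as possible fraud?).
records = (
    [(1, False)] * 4 + [(1, True)]
    + [(2, False)] * 3 + [(2, True)]
    + [(3, False), (3, True)]
)

print(fraud_probability_by_group(records))  # -> {1: 20.0, 2: 25.0, 3: 50.0}
```

Such an estimate is only as reliable as the number of cases in each group, so very small groups should be read with caution.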
Naturally, since “just because you wear cowboy boots doesn’t mean you can ride a horse”, a single characteristic does not determine whether a case is fraudulent or shows strong indications of being so. A set of characteristics, however, can help to narrow down the problem.
Data Science to protect the insurance industry from fraud
Data science gives us an opportunity to combat fraud in sectors as sensitive as insurance, but that is not its only field of application. At PhishingHunters we apply this technology to detect fraud in other sectors, such as the financial sector, and to search for possible insiders in a company. Our extensive experience in the fraud sector allows us to carry out predictive and detection work with high reliability. Shall we talk? Ask us!