The Importance of Self-learning in Card Fraud Prevention
Fraud detection systems find fraud using two distinct approaches: user-written fraud detection rules and mathematical models. A given institution may use rules alone, a model alone, or both rules and a model together. Typically, smaller institutions adopt a rules-only approach, whereas larger institutions use both rules and models.
The good things about rules are that they are defined by the user, who is normally a fraud analyst, and that their effect is easy to understand, i.e. you can readily understand why a rule has triggered a fraud alert for a given card transaction.
The not-so-good things are that false positive ratios tend to be high (e.g. 60:1 to 80:1 or higher), which means that for every fraud found by a rule, 60 to 80 false alerts are generated, imposing a considerable load on the institution’s fraud department. Users also tend to create large numbers of rules (perhaps many hundreds), which generates a considerable administrative and management burden. A further limitation of rules is that they are fixed and cannot adapt as time passes and fraud patterns evolve, so while a rule may be effective when first deployed, it can rapidly become ineffective as fraud patterns change.
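The alert load implied by these false positive ratios can be made concrete with a little arithmetic. The sketch below uses the 80:1 ratio quoted above; the fraud count is a hypothetical illustration, not a figure from any institution.

```python
# Illustrative arithmetic only: the 80:1 false positive ratio is taken
# from the text above; the fraud count is hypothetical.

def alert_load(frauds_caught: int, fp_ratio: int) -> dict:
    """Given the frauds caught by a rule and its false positive ratio,
    estimate the total alert volume the fraud team must review."""
    false_alerts = frauds_caught * fp_ratio
    return {
        "frauds": frauds_caught,
        "false_alerts": false_alerts,
        "total_alerts": frauds_caught + false_alerts,
    }

# A rule catching 10 real frauds at an 80:1 ratio generates 800 false
# alerts, i.e. 810 cases in total for analysts to work.
load = alert_load(frauds_caught=10, fp_ratio=80)
print(load)
```

Multiply that by many hundreds of rules and the administrative burden described above becomes apparent.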
When a mathematical fraud detection model is used, it is first necessary to ‘train’ the model to identify fraud patterns by feeding it several months of card transaction data in which the fraudulent transactions have been marked or tagged. Once trained, a model is generally more effective than rules: it will detect more fraud with far fewer false positives.
The major plus point about using a model is that, typically, a good custom model (a model trained on a single institution’s transaction data) can detect around 70% of fraud with a false positive ratio of about 20:1 when it is first deployed after being trained [1]. A not-so-good aspect is that models can take a long time, and considerable expense, to train (particularly if the model uses a neural network).
Also, similar to rules, a model may be effective when first deployed post-training, but it is generally fixed and does not adapt as fraud patterns change, so it becomes less effective as time goes on (i.e. the detection rate falls and the false positive ratio worsens). Given that the typical time between retrains of a neural network model is 12-18 months, it is easy to see how model detection performance can drop off markedly towards the end of such a period.
Alaric has been running a long-term research programme to find ways to mitigate or eliminate the model issues described above. As a result of this research, Alaric has developed proprietary self-learning techniques that have proved very effective in live production.
What we mean by ‘self-learning’ is the ability of a model to change dynamically (instantaneously, in real time) and to optimise itself as and when a fraud analyst marks a transaction as fraudulent, learning new fraud patterns as they start to emerge. So as frauds are flagged, the model automatically adapts to maximise the detection rate while minimising false alerts. Other vendors train and optimise their models infrequently, every 12 to 18 months; the innovation here is that such optimisation is performed dynamically, without having to wait several months for a periodic model retrain.
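Alaric’s actual self-learning techniques are proprietary and not described in this paper. Purely as a hedged illustration of the general idea — a model that updates itself the moment an analyst tags a transaction, rather than waiting for a periodic retrain — the sketch below uses simple online logistic regression; the feature values and learning rate are arbitrary assumptions.

```python
import math

# Hedged sketch only: this is NOT Alaric's proprietary method, just an
# illustration of per-transaction online learning using logistic
# regression updated by stochastic gradient descent.

class OnlineFraudModel:
    def __init__(self, n_features: int, learning_rate: float = 0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.lr = learning_rate

    def score(self, features):
        """Return a fraud probability in [0, 1] for a transaction."""
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-z))

    def learn(self, features, is_fraud: bool):
        """Single gradient step, applied the moment an analyst tags the
        transaction -- no wait for a periodic (12-18 month) retrain."""
        error = self.score(features) - (1.0 if is_fraud else 0.0)
        self.bias -= self.lr * error
        self.weights = [w - self.lr * error * x
                        for w, x in zip(self.weights, features)]

# Each confirmed fraud immediately shifts the model toward the new pattern.
model = OnlineFraudModel(n_features=3)
suspicious = [1.0, 0.8, 0.0]          # hypothetical engineered features
before = model.score(suspicious)
model.learn(suspicious, is_fraud=True)
after = model.score(suspicious)       # now scores higher than before
```

The design point the sketch captures is that learning happens per tagged transaction, in the transaction-processing path, rather than in an offline batch retrain.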
The effects of introducing self-learning are very striking. The primary effects are:
An important secondary effect of introducing self-learning is that model detection performance becomes so strong that users find writing rules produces little incremental benefit. The result is that users write fewer rules, meaning there are fewer rules to maintain and administer. Therefore, as well as overcoming some of the issues experienced with models, a side effect of self-learning is that the administrative and management costs associated with rules are also reduced.
The high first-time detection rate achievable using self-learning makes it possible for an institution to intercept a high proportion of fraud runs on the first fraud and to block the first bad transaction ‘in flight’, before any loss has been incurred. Real-time or in-flight interception in itself is known to reduce fraud losses by about 25% for a typical institution, even when a straightforward model (without self-learning) is used, and is an essential feature of any aggressive fraud reduction programme [2].
It is also worth noting that, in Europe, around 75% of fraudulent transactions are card not present (CNP), so the introduction of self-learning models can have a very beneficial effect in clamping down on CNP fraud, to the benefit of card holders and online merchants alike.
To give a rough order of magnitude comparison, assume that an institution would lose US$1m to fraud in a given month without a fraud detection system.
So the financial benefit of self-learning is plain to see, and its implementation is immediately justifiable on economic grounds: the savings far outweigh the costs of introducing it.
Card issuers who want to radically reduce their fraud losses should:
[1] A ‘consortium’ model can be created by pooling the data from several institutions. Consortium models generally have markedly inferior detection performance compared with a custom model for a specific institution, achieving detection rates in the range 35%-45% with a false positive ratio of about 20:1.
[2] ‘Timing: A Critical Factor in Fraud Prevention’, Marcel Drescher, Head of Risk and Compliance, UBS Card Centre, 8 April 2011.