Improve Performance of Fraud Models
Using Synthesized SDK
Customers realize millions in cost savings by improving model performance by up to 15%.
The Synthesized SDK Difference
Banking customer accelerated model performance across entire fraud model portfolio by 4-17%.
Retail banking customer realized $5M in cost savings with a 2% performance increase in a single fraud model.
Customers eliminate 2-4 months from model delivery cycles with fast synthetic data generation.
Reduce the Noise and Amplify the Signal in Your Training Data
Synthesized makes it easy to reshape and rebalance training data to amplify the fraud signal, critical to improving model performance.
With data bootstrapping, Synthesized delivers remarkable statistical accuracy across every dimension of your data.
Models retrained with Synthesized simply perform better.
Read the Docs
Automatically Improve the Quality of Your Training Data
Missing data is another curse of model training.
Synthesized data imputation instantly replaces missing values with synthetic values learned from the patterns of the existing data.
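To make the idea of imputation concrete, here is a minimal sketch in plain Python. It is not the Synthesized imputation method: it swaps in a much simpler stand-in (drawing each missing value at random from the values observed in the same column), and the `impute_missing` helper and the sample rows are hypothetical.

```python
import random


def impute_missing(rows, seed=0):
    """Return copies of `rows` with each None replaced by a value drawn
    at random from the observed (non-missing) values of the same column.

    Simple stand-in for illustration only; a generative model would
    also use the relationships between columns.
    """
    rng = random.Random(seed)
    columns = list(rows[0].keys())
    # Collect the observed values for every column.
    observed = {c: [r[c] for r in rows if r[c] is not None] for c in columns}

    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the input is left untouched
        for c in columns:
            if new_row[c] is None and observed[c]:
                new_row[c] = rng.choice(observed[c])
        filled.append(new_row)
    return filled


# Hypothetical transactions with one gap per incomplete row.
transactions = [
    {"age": 34, "category": "food", "amount": 12.5},
    {"age": None, "category": "travel", "amount": 230.0},
    {"age": 51, "category": None, "amount": 40.0},
    {"age": 27, "category": "food", "amount": None},
]
completed = impute_missing(transactions)
```

Each filled value comes from the same column's observed values, so the column's overall distribution is roughly preserved, though per-row patterns are not.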
Improving training data quality means better model performance.
Generate any Volume of Synthetic Training Data
Most companies simply don’t have enough real data to train their models effectively.
Synthesized allows you to generate any volume of high-quality synthetic data in minutes.
And Synthesized is proven to be the fastest data generation platform.
See Synthesized SDK in Action
Today we're going to be diving into an example of how Synthesized can be used to improve the performance of your fraud detection models.
In this example, we have a transactions dataset with 8 columns. You can see fraud is the binary target on the left.
Let’s see how good a simple model is on the existing dataset. We'll use age, gender, category, and amount as explanatory variables to try to predict fraudulent transactions.
So we get an ROC AUC of about 88%. Not bad! But we can do better by adding in some synthetic data.
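For anyone new to the metric: ROC AUC is the probability that the model scores a randomly chosen fraudulent transaction above a randomly chosen legitimate one. The sketch below computes it directly from that definition in plain Python. The `roc_auc` helper and the labels and scores are made up for illustration and are not taken from the walkthrough.

```python
def roc_auc(labels, scores):
    """ROC AUC as the share of (fraud, non-fraud) pairs in which the
    fraud case gets the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one example of each class")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative labels (1 = fraud) and model scores.
labels = [0, 0, 1, 0, 1, 1, 0, 0]
scores = [0.1, 0.4, 0.35, 0.2, 0.8, 0.7, 0.5, 0.05]
auc = roc_auc(labels, scores)  # 13 of 15 pairs ranked correctly
```

A score of 0.5 means the model ranks no better than chance, and 1.0 means every fraud case outranks every legitimate one. The pairwise loop is quadratic, so it suits small examples rather than production data.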
First, we'll need to extract some metadata, build a generative model and train it. But this is easy with Synthesized. The training process itself doesn't take long either. Here we have 8 columns and about 20 thousand rows, and on a 4-core CPU, it's going to take about 3 to 5 minutes.
Once the generative model is trained we can use it to *upsample* the number of fraudulent transactions in our training dataset and thereby amplify the signal of fraud in the dataset. Fraud datasets are typically very imbalanced with a weak signal. Synthesized can be used to highlight this signal and improve model performance.
It's finished training now. Let's use a Conditional Sampler to generate a dataset with the proportion of fraud rebalanced to 50:50.
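The effect of rebalancing to 50:50 can be sketched without the SDK. The example below is not the Conditional Sampler: it uses plain random oversampling, which repeats existing fraud rows rather than generating new synthetic ones from a trained model. The `rebalance` helper, its parameters, and the toy data are hypothetical.

```python
import random


def rebalance(rows, target="fraud", fraud_share=0.5, seed=0):
    """Keep every legitimate row and resample fraud rows with
    replacement until fraud makes up `fraud_share` of the result."""
    if not 0 < fraud_share < 1:
        raise ValueError("fraud_share must be strictly between 0 and 1")
    rng = random.Random(seed)
    fraud = [r for r in rows if r[target] == 1]
    legit = [r for r in rows if r[target] == 0]
    if not fraud or not legit:
        raise ValueError("need at least one row of each class")

    # Solve n_fraud / (n_fraud + len(legit)) == fraud_share for n_fraud.
    n_fraud = round(len(legit) * fraud_share / (1 - fraud_share))
    resampled = [dict(rng.choice(fraud)) for _ in range(n_fraud)]

    result = [dict(r) for r in legit] + resampled
    rng.shuffle(result)
    return result


# Toy dataset: 95 legitimate rows and 5 fraudulent ones (5% fraud).
rows = [{"amount": float(i), "fraud": 0} for i in range(95)]
rows += [{"amount": 1000.0 + i, "fraud": 1} for i in range(5)]

balanced = rebalance(rows)
fraud_count = sum(r["fraud"] for r in balanced)
```

Here `balanced` holds 190 rows, half of them fraud. Because oversampling only repeats the five original fraud rows, it adds no new variety, which is the gap a generative model is meant to fill.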
Now that we've created the new dataset, let's validate what it looks like compared to the original. We can do that with the Assessor class. Let’s save that figure and have a look.
As you can see, the fraud in the new dataset has been upsampled to a 50:50 split.
Now we can reevaluate our model, comparing its performance when trained on the rebalanced synthetic dataset with its performance when trained on the original dataset. In both cases, the model is evaluated on held-out original data.
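One way to set up this kind of comparison is to split off the held-out data first and rebalance only the training portion, so the evaluation set keeps the original fraud rate. The sketch below assumes that arrangement. The `stratified_holdout` and `fraud_rate` helpers and the toy data are hypothetical, and random oversampling again stands in for synthetic generation.

```python
import random


def stratified_holdout(rows, target="fraud", test_fraction=0.25, seed=0):
    """Split rows into train and test, preserving the class ratio in each."""
    rng = random.Random(seed)
    train, test = [], []
    for label in (0, 1):
        group = [r for r in rows if r[target] == label]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def fraud_rate(rows, target="fraud"):
    """Fraction of rows labelled as fraud."""
    return sum(r[target] for r in rows) / len(rows)


# Toy dataset: 200 rows, every tenth one fraudulent (10% fraud).
rows = [{"amount": float(i), "fraud": 1 if i % 10 == 0 else 0} for i in range(200)]
train, test = stratified_holdout(rows)

# Rebalance the training portion only; the test set stays untouched.
rng = random.Random(1)
train_fraud = [r for r in train if r["fraud"] == 1]
train_legit = [r for r in train if r["fraud"] == 0]
train_balanced = train_legit + [rng.choice(train_fraud) for _ in train_legit]
```

Splitting before rebalancing means that no row used for evaluation, or a copy of it, ends up in the training data, and the measured score reflects the fraud rate the model would see in practice.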
We've improved the performance here from 88% to 95%, an absolute improvement of 7 percentage points. And it only took about 5 minutes to do!
This has been a walkthrough of just one of the ways Synthesized can help you extract the most out of your data. Thank you for listening.
Join the Community on Slack
Connect with other Synthesized users and directly with our engineers.
Join our Community