Bank Marketing Dataset

To make the best use of available resources, a marketing campaign must focus its efforts on the potential customers who are most likely to accept the proposed offer. In this case, a Portuguese banking institution called its users to offer them a new product and recorded whether the outcome of each call was positive. A proper analysis of this data can drastically increase acceptance rates by targeting customers who are more interested in the offer and not wasting resources on customers who are not.

Dataset

This bank marketing dataset contains 11,162 records of telephone calls made to the bank's clients. The target variable deposit indicates whether the call was successful, that is, whether the client subscribed to a term deposit, and there are 16 explanatory variables for predicting the outcome of the call. These variables contain personal information about the user (such as age, job, family status and education), other products that they signed up for, and detailed information about the current and previous marketing campaigns.

Use Case

The objective is to train an ML model that returns the probability that a customer accepts the offered product. This is a binary classification task, and the F1-score is a good metric for evaluating models trained on this bank marketing dataset: it is the harmonic mean of precision and recall, so it weights both equally, and a good classifier must keep both high at the same time.
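As a minimal sketch of how the F1-score follows from a model's predicted probabilities, the snippet below thresholds the probabilities, counts true positives, false positives and false negatives, and combines precision and recall. The function name `f1_at_threshold`, the 0.5 threshold and the toy labels and probabilities are assumptions made for illustration; they do not come from the dataset or from any particular model.

```python
from typing import Sequence


def f1_at_threshold(
    y_true: Sequence[int],
    y_prob: Sequence[float],
    threshold: float = 0.5,
) -> float:
    """F1-score of the positive class after thresholding probabilities."""
    # Turn probabilities into hard 0/1 predictions.
    y_pred = [1 if p >= threshold else 0 for p in y_prob]

    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # With no true positives, precision or recall is zero (or undefined),
    # so F1 is conventionally reported as 0.
    if tp == 0:
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: 1 means the customer subscribed to the deposit.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.4, 0.6, 0.3, 0.7, 0.1, 0.8, 0.2]
score = f1_at_threshold(y_true, y_prob)
print(f"F1 at threshold 0.5: {score:.2f}")
```

Because the model outputs probabilities, the F1-score depends on the chosen threshold, so in practice the threshold is usually tuned on a validation set rather than fixed at 0.5.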

Data Problems and Synthesized Solutions

Although this dataset can make a huge difference to the banking institution's performance, it has some problems that complicate its use. Luckily, Synthesized can solve these problems in a fast and intuitive way.

Privacy. This bank marketing dataset contains personal information about users, which makes it difficult to work with and share. With Synthesized we can generate a synthetic dataset that preserves statistical information (95% utility across multiple ML tasks compared to the original data) in under 10 minutes, while removing all risk of non-compliance with data regulations such as GDPR, HIPAA and CCPA.
Imbalanced Dataset. This bank marketing dataset contains some columns that are highly imbalanced (for example, only 1.5% of customers have defaulted). If not treated carefully, this imbalance may heavily reduce the performance of the model on the minority subsample. With Synthesized's Data Manipulation tool we can manipulate the output distribution of such a column and generate more samples to balance the population. Read more about the benefits of data rebalancing in our blog post.
Fairness and Biases. AI models can unintentionally (and potentially illegally) discriminate against certain sensitive groups of people if the underlying training data is biased. Synthesized can help assess how biased a dataset is, find where the biases are and flag them to the user. Read more about discrimination by AI in our blog post.
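To make the imbalance problem above concrete, here is a rough sketch that counts the values of an imbalanced column and rebalances it by naive random oversampling (duplicating minority rows). This is a plain stand-in technique and not Synthesized's Data Manipulation tool, which generates new synthetic samples and whose API is not shown here. The helper `oversample_minority`, the column name `default` with values "yes" and "no", and the toy rows are all assumptions for illustration.

```python
import random
from collections import Counter


def oversample_minority(rows, column, seed=0):
    """Duplicate rows at random until every value of `column` is equally frequent."""
    rng = random.Random(seed)

    # Group the rows by their value in the imbalanced column.
    groups = {}
    for row in rows:
        groups.setdefault(row[column], []).append(row)

    # Every group is grown to the size of the largest one.
    target = max(len(group) for group in groups.values())

    balanced = []
    for group in groups.values():
        balanced.extend(group)
        # Sample with replacement to fill the gap (k is 0 for the largest group).
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


# Hypothetical toy data: 3 of 100 customers have defaulted.
rows = [{"age": 30 + i, "default": "no"} for i in range(97)]
rows += [{"age": 50 + i, "default": "yes"} for i in range(3)]

before = Counter(row["default"] for row in rows)
balanced = oversample_minority(rows, "default")
counts = Counter(row["default"] for row in balanced)
print("before:", dict(before))
print("after: ", dict(counts))
```

Duplicating rows adds no new information and can encourage overfitting to the few minority examples, which is one reason to prefer generating new samples instead. Any rebalancing should be applied to the training split only, so that evaluation still reflects the real class distribution.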

References

This bank marketing dataset is publicly available in the UCI Machine Learning Repository as "Bank Marketing".