Nicolai Baldin, CEO and Founder of Synthesized, recently joined Open Finance, a podcast series by Finastra that explores how fintech can help tackle climate change, and what embedded finance means for the future of banks. The host of the day is Helen Driver, Founder of Money Ready, a financial education platform, and a former fund manager.
Helen and Nicolai are joined in the discussion by Adam Lieberman, Head of Artificial Intelligence and Machine Learning at Finastra. They look at what can be done to stamp out AI bias in financial services. AI and Machine Learning are revolutionizing the way the financial services industry operates. But this huge opportunity comes with a risk: if left unchecked, these technologies can perpetuate existing human biases.
Helen starts off the discussion by citing Renee Sattiewhite, President and CEO of the African-American Credit Union Coalition: “What I'm concerned about is that the same bright people who have honest intentions are making these software programs, but they haven't addressed their own implicit bias. So it's like a hamster on a wheel. It just doesn't stop. And so from that aspect, my concern is the people who are making it don’t look like me, and they're not the colored majority. So I think if people who are making the AI modeling partner with people of color, they're clear that they're not putting implicit bias in their data findings”.
Nicolai: I'm the CEO and Founder of Synthesized, a software company that provides a development framework for any company to create optimized datasets for use in development and testing practices. We provide the data needed by any test engineer or software engineer for the development and testing of applications and services. The data is optimized not only for quality, but also to eliminate bias and to make sure it's compliant for use in different test environments.
Adam: I'm the Head of Artificial Intelligence and Machine Learning at Finastra, and together with my data science organization we lead the application of machine learning, research, and development to innovate around the needs of the financial services industry. Part of my role is strategy, serving the field, and identifying the areas of opportunity for machine learning at Finastra.
The other part is hands-on development with my data science teams to develop and productize our models as APIs on our platform or to integrate them into the products, as well as building internal MLOps tools to create a consistently smooth machine learning engineering environment. Ultimately it's strategy mixed with a ton of statistical modeling and machine learning engineering, and it's a lot of fun.
Nicolai: I think it's very important to first define what data bias is and how it's different from AI bias. It's clear that data bias causes AI bias, and it occurs when there is some discrepancy in terms of distributions for some subsets of the data we deal with. But at the same time, all data is biased, and the problem arises when the bias is in fact against legally protected attributes. Such attributes are clearly defined in the UK, Europe, and the U.S., and that leads to pinpointing possible discrimination.
And needless to say, many face such issues today, and fairness and data transparency should be at the core of practices we have today.
The problem is even worse due to unintentionally hidden biases in data, and implicitly in the AI algorithms. More concretely, from the application point of view, as a result of hidden biases in data, we have skewed models with implications on the customer experience. There are inevitable compliance and legal risks with potential reputational and financial damages as well.
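To make the "discrepancy in terms of distributions" concrete, here is a minimal sketch that compares the rate of a positive outcome across the values of one protected attribute. It is an illustration only, not the method used by Synthesized or Finastra; the function names, the field names `sex` and `approved`, and the toy loan records are all hypothetical.

```python
from collections import defaultdict


def positive_rate_by_group(records, group_key, outcome_key):
    """Return the share of positive outcomes for each value of group_key."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in records:
        group = row[group_key]
        totals[group] += 1
        if row[outcome_key]:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


def largest_gap(rates):
    """Difference between the best-treated and worst-treated group."""
    return max(rates.values()) - min(rates.values())


# Hypothetical toy data: eight loan decisions, four per group.
loans = [
    {"sex": "F", "approved": 1}, {"sex": "F", "approved": 0},
    {"sex": "F", "approved": 0}, {"sex": "F", "approved": 0},
    {"sex": "M", "approved": 1}, {"sex": "M", "approved": 1},
    {"sex": "M", "approved": 1}, {"sex": "M", "approved": 0},
]

rates = positive_rate_by_group(loans, "sex", "approved")
gap = largest_gap(rates)
print(rates, gap)
```

A large gap on a legally protected attribute is a signal to investigate further; on its own it does not prove discrimination, because proxies and confounding variables may be involved.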
Adam: Definitely, and I agree one hundred percent with what Nicolai just said. Bias is a problem for fintechs and other organizations that don't know that the data they're modeling from is biased. It can stem from countless causes and can be hidden underneath proxies and confounding variables. It can be very hard to detect if you don't look for it within your data. And these unaware companies are training models that replicate this bias and propagate it forward, causing a vicious cycle of bias.
As machine learning engineers we're so focused on performance. We want the best F1 score, the highest accuracy, the lowest RMSE. And we have these great statistical tricks to help achieve our performance criteria. However, I believe we need to take a step back and understand that we need to prioritize people without lowering our standards on performance.
Nicolai: I fully agree with Adam, it's very important to tackle the issue of AI bias now. Simply put, in three to five years, or potentially sooner, nobody will be able to say that they do not care about their algorithms being fair or their data being unbiased. It’s highly possible that regulations will become even stricter, so it’s better for businesses to adjust now so that they can save resources and protect users and customers sooner. And I believe that companies can quickly get a competitive advantage if they make sure the algorithms and data aren’t biased first and foremost.
There are very clear incentives for companies to self-regulate. Even though some countries are still on a journey to get regulations in place, the incentives are there. When building your teams, make sure you provide equal opportunities for everyone. At Synthesized, we want to ensure that these principles work well and are respected by all of our clients and partners as part of the wider ecosystem as well.
Nicolai: In terms of the regulatory framework, we see growing momentum, with working groups in Europe and in the UK working on different proposals with regard to data bias and fairness. I think there is still lots to be done beyond the theoretical discussions. There is a need for practical solutions and actions in this space as well.
Adam: The first thing we do when we take on any new modeling project is what's called a Legal and Governance Review, where we classify all bits of data, look for the PII and the sensitive attributes, highlight the use case, and determine what bias and fairness look like. We'll work with our product domain experts and legal experts to craft any fairness criteria beyond the standard measures we look at, such as statistical parity or equalized odds, that can help us measure the degree of bias in our modeling problem. Then we start with the root, the data, and we thoroughly inspect it to determine if we see any unfairness to particular groups or individual data points that we'll be modeling.
It's a lot of heavy interaction with our domain experts to help us detect historical bias, human bias, measurement bias, or even proxy bias. And for us it’s about digging deep into the data, ensuring that we have the right population for a modeling problem. Additionally, depending on the use case and how severe the outcome is for the end-user, we'll leverage post-processing tactics, like some adversarial de-biasing, to reduce the impact of any confounding variables or sensitive attributes that could contribute. And within our data science organization, we assess fairness on all models and leverage these tools and techniques to ensure that any model we put into production is fair for our end-user.
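The two standard measures Adam names, statistical parity and equalized odds, can be sketched in a few lines. This is a simplified illustration under stated assumptions (binary labels and predictions, two groups), not Finastra's implementation; the function names and the toy data are hypothetical.

```python
def _rate(flags):
    """Mean of a sequence of 0/1 values; 0.0 if the sequence is empty."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


def statistical_parity_difference(y_pred, groups, a, b):
    """P(prediction = 1 | group a) minus P(prediction = 1 | group b)."""
    rate_a = _rate(p for p, g in zip(y_pred, groups) if g == a)
    rate_b = _rate(p for p, g in zip(y_pred, groups) if g == b)
    return rate_a - rate_b


def equalized_odds_difference(y_true, y_pred, groups, a, b):
    """Larger of the two between-group gaps: true-positive rate and false-positive rate."""
    gaps = []
    for label in (1, 0):  # label 1 gives the TPR gap, label 0 the FPR gap
        rate_a = _rate(p for t, p, g in zip(y_true, y_pred, groups)
                       if g == a and t == label)
        rate_b = _rate(p for t, p, g in zip(y_true, y_pred, groups)
                       if g == b and t == label)
        gaps.append(abs(rate_a - rate_b))
    return max(gaps)


# Hypothetical toy data: four applicants in each of two groups.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

spd = statistical_parity_difference(y_pred, groups, "a", "b")
eod = equalized_odds_difference(y_true, y_pred, groups, "a", "b")
print(spd, eod)
```

A value of 0 on either measure means the two groups are treated identically by that criterion. The two measures can disagree, which is one reason the fairness criteria are chosen per use case with domain and legal experts.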
Nicolai: Adam, well done on your approach to AI bias! I think we are very complementary in the sense that we look at the data bias, which often causes AI bias. And for us, the reason to look at the data biases is also quite important because we understand that data has been used not only by ML engineers and data scientists to develop models, but also by marketing teams, sales teams, software engineering teams, and testing teams. And we never want our software engineers, test engineers, or any other stakeholders to work with biased data. We focus on eliminating data bias from data pipelines. And for that purpose, we released FairLens, which allows any engineer to discover and visualize hidden biases within data, in minutes, as opposed to hours, days, or even a week.
We provide very clear reporting of all potential and hidden biases in data pipelines with regard to different groups of data. It's also important to focus on legally protected attributes in the US, UK, and Europe.
Nicolai: We saw the need for a solution to identify and visualize bias in data pipelines and it was clear to us that we want to work with the community. We are firm believers that there should not be a single party defining bias. It's a collaborative effort that requires a collaborative solution. That is why we wanted to open-source the framework, make the tooling available to others, and work together on what the formal definition of bias should be, how to take action, what that means for different countries and different jurisdictions. It's for society to define fairness and how inequalities can be ironed out.
First, we need to seek an understanding of bias and prioritize it; otherwise, we're going to replicate it. As I mentioned before, machine learning is about using historical data, drawing patterns and insights, and making predictions. And often our predictions become new data points that will go into a new version of the model or even a completely different model. So if we don't define what bias looks like for the problem, and we don’t look for it within our data to ensure our model is as bias-free as possible, then we're going to be learning to replicate the bias that lives in our dataset and create biased predictions, resulting in biased actions, which then become biased data points that are used to train a further biased model.
This is a cycle of bias that could be prevented by proper understanding, inspection, and planning, and then by leveraging the correct mitigation techniques.
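The cycle described here can be illustrated with a deliberately crude simulation. Everything in it is an assumption made for illustration: the "model" is a single rule that approves every applicant from a group whose historical approval rate is at least 50%, and the starting counts are invented. Real models are far subtler, but the feedback mechanism is the same: decisions become the next round's training data.

```python
def simulate_feedback_loop(approved, total, rounds, batch=100):
    """Track the approval-rate gap between groups as decisions are fed back as data.

    approved / total: per-group counts of approved and total historical cases.
    Returns the gap (max rate minus min rate) measured at the start of each round.
    """
    approved = dict(approved)  # copy so the caller's data is not modified
    total = dict(total)
    gaps = []
    for _ in range(rounds):
        rates = {g: approved[g] / total[g] for g in total}
        gaps.append(max(rates.values()) - min(rates.values()))
        for g in total:
            # Toy "model" (hypothetical): approve the whole batch if the group's
            # historical approval rate is at least 50%, otherwise reject it.
            decision = 1 if rates[g] >= 0.5 else 0
            approved[g] += decision * batch
            total[g] += batch
    return gaps


# Hypothetical starting point: a 20-point gap in historical approval rates.
gaps = simulate_feedback_loop(
    approved={"A": 60, "B": 40},
    total={"A": 100, "B": 100},
    rounds=4,
)
print(gaps)
```

In this toy setup the gap widens every round, because each round's biased decisions are added to the data that drives the next round.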
Nicolai: I think it's a very good point about iterating on the issue, and I think it's also about how some of the components of our algorithms in the open-source framework work. As a simple example, if you mitigate bias with regard to some groups, it may produce bias with regard to some other groups which you haven't seen before.
So it's very important to iterate. In computer science it’s called a greedy algorithm, to look at those situations and elements in an iterative way. It's important to make sure you understand how bias propagates, what influences it, and, if you’ve already changed some of the variables, what else you need to look at, and that's exactly what we focused on. The first step is to have full visibility of hidden biases across the pipelines. Bias mitigation is then an iterative process where different stakeholders need to work collaboratively. The system is fairly complex: we have data pipelines, databases, data warehouses, and data lakes, which are already well interconnected. We need to ensure the work that's being done in one database or one data pipeline also propagates across other databases.
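The point about mitigation for one group producing bias for another can be shown with a small audit that is re-run after every change. This is a sketch under stated assumptions, not the FairLens API: the helper names, the 0.1 threshold, the attribute names, and both hand-made datasets are hypothetical.

```python
def rate_gap(records, attribute, outcome="approved"):
    """Largest difference in positive-outcome rate between values of one attribute."""
    totals, positives = {}, {}
    for row in records:
        key = row[attribute]
        totals[key] = totals.get(key, 0) + 1
        positives[key] = positives.get(key, 0) + row[outcome]
    rates = [positives[key] / totals[key] for key in totals]
    return max(rates) - min(rates)


def flag_attributes(records, attributes, threshold=0.1):
    """Return every protected attribute whose rate gap exceeds the threshold."""
    return [a for a in attributes if rate_gap(records, a) > threshold]


def row(sex, age_band, approved):
    return {"sex": sex, "age_band": age_band, "approved": approved}


protected = ["sex", "age_band"]

# Hypothetical data before mitigation: a gap by sex, none by age band.
before = [
    row("F", "young", 0), row("F", "young", 0), row("F", "old", 1), row("F", "old", 0),
    row("M", "young", 1), row("M", "young", 1), row("M", "old", 0), row("M", "old", 1),
]

# Hypothetical naive fix: approve the two young women. The gap by sex
# disappears, but young applicants are now favoured over old ones.
after = [
    row("F", "young", 1), row("F", "young", 1), row("F", "old", 1), row("F", "old", 0),
    row("M", "young", 1), row("M", "young", 1), row("M", "old", 0), row("M", "old", 1),
]

flagged_before = flag_attributes(before, protected)
flagged_after = flag_attributes(after, protected)
print(flagged_before, flagged_after)
```

Auditing every protected attribute after each mitigation step, rather than only the one just fixed, is what makes the process iterative.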
Nicolai: I think there is a clear understanding that it is impossible that in three to five years people won’t care about algorithms being fair, or data being unbiased. It's also highly possible that the regulations will only become stricter, and in such a rapidly growing economy, commercial, operational, and reputational risks arise very quickly when AI and machine learning models rely on limited, poor-quality, skewed, and biased data.
And we are going to see more interest and active work from regulators, academic environments, AI labs, and machine learning labs at different corporations as well. Already there are industry conferences specifically dedicated to the fairness of machine learning models and AI data bias, and more and more people are contributing to the topic, either at a theoretical level or with practical solutions.
Adam: There's a quote that I read and love, which says that algorithms don't remember incidents of unfair bias, but our customers do. Naturally, as we start deploying more models, consumers or end-users start understanding that the algorithms produce the output that's becoming the deciding factor for them.
They're naturally going to be asking more questions, and fairness is going to continue to be a hot topic in our end users' minds. And I think in terms of the future, we look at machine learning, and it's been such a model-centric approach. We're trying to figure out different statistical properties for models or trying to build them bigger and better. Look at the field of natural language processing, where, in the news, it's all about who can build the biggest model.
But this year we've started taking more of a data-centric approach, to ensure we have the right data for the problem we try to solve. Let's not try to hyperparameter tune these models on a poor-quality dataset, but rather let’s see if and how we can inspect our data better and faster and ultimately fix the data. I believe we will continue to see an increase in tools designed to help us inspect our data, which typically is a time-consuming process. A dataset that doesn't have some bias is going to be hard to find! And so, I expect to see an increased focus on data-centric fairness.
Nicolai: We have a hard-working, diverse team, and we continue to invest in equal opportunities for everyone, to grow in our company. We believe in having fundamental principles and this is part of our DNA and our culture. But also it’s something which I personally saw that was and is still missing in some organizations, especially when you come from under-represented groups, as many of my close friends do. I personally consider myself an immigrant. It was very important for us to make sure we put the right foundation in place and be clear on how we see our company grow.
Adam: When I look at doctors and surgeons, they're responsible for providing their patients with the best medical care and advice possible. They've taken an oath to do right by their customers, which are their patients. And when I look at us as machine learning engineers, we have a responsibility too. We have a responsibility to deliver high-quality models that we hope serve the needs of millions of people. I want to make people's lives better through machine learning. I want to help automate the redundant and boring, create insight that leads to action, and make the world a better place through machine learning. And I'm such a big proponent of finance for good and helping the world, and being able to leverage statistical learning makes it twice as good.
The models I build are deployed and touch the lives of people directly and indirectly. I need to ensure that these models are just, direct, equitable, and fair. And ultimately that leads to better decision-making and better outcomes.