August 9, 2021

Announcing FairLens: an Open-Source Package for Data Bias Discovery

Author:
Nicolai Baldin

Today, we announced the release of FairLens as part of the Synthesized SDK.

FairLens is an open-source Python library for automatically discovering biases in data products.

The goal of FairLens is to help data practitioners gain a deeper understanding of their data and to help ensure fair use of data in analytics and data science tasks.

FairLens is, in fact, the world’s first data-centric open-source library for identifying data bias and driving fairness in decision-making. It comes with full documentation and can be downloaded here.

FairLens can be tried in the Synthesized SDK Colab alongside the SDK's other capabilities.

Problem It Solves

Machine learning and data science can be a force for good in the world, and yet many data science models rely on biased and skewed datasets for their development and training.

When trained on limited, poor-quality or skewed datasets, data-driven applications often fail to achieve their intended purpose, because they inherit the biases of that data.

Data bias leads to poor predictive performance and functional failures, which can carry legal and reputational consequences.

The Core Features

  • Measuring Bias - FairLens can measure the extent and significance of biases in a dataset using a wide range of metrics.
  • Sensitive Attribute and Proxy Detection - Data scientists can automatically identify and flag sensitive columns, as well as hidden correlations between columns that could act as proxies, to help protect sensitive attributes.
  • Visualization Tools - FairLens provides a range of tools for visualizing data and spotting biases before measuring them in depth. For instance, it can visualize the distribution of a variable across different sensitive demographic groups, or draw a correlation heatmap.
  • Fairness Scorer - By selecting a target variable, data scientists can highlight hidden biases and correlations in a dataset with respect to that variable.
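To make the name-based part of sensitive attribute detection concrete, here is a minimal standard-library sketch that flags columns whose names resemble sensitive keywords, using fuzzy matching from `difflib` so that typos such as "gendr" are still caught. This is not FairLens's API: the keyword list, the `detect_sensitive_columns` function and the 0.8 similarity cutoff are hypothetical choices for illustration, and the sketch only inspects column names, not the values or correlations that proxy detection would rely on.

```python
import difflib

# Hypothetical keyword list; FairLens's own configuration may differ.
SENSITIVE_KEYWORDS = {
    "gender": ["gender", "sex"],
    "age": ["age", "dob", "birth"],
    "ethnicity": ["ethnicity", "race", "nationality"],
    "religion": ["religion", "faith"],
}


def detect_sensitive_columns(columns, cutoff=0.8):
    """Map each column whose name resembles a sensitive keyword to its category."""
    detected = {}
    for column in columns:
        # Split names such as "applicant_gender" or "date-of-birth" into tokens.
        tokens = column.lower().replace("-", "_").split("_")
        for category, keywords in SENSITIVE_KEYWORDS.items():
            # A token matches if it is similar enough to any keyword in the category.
            if any(difflib.get_close_matches(token, keywords, n=1, cutoff=cutoff)
                   for token in tokens):
                detected[column] = category
                break  # first matching category wins
    return detected


flagged = detect_sensitive_columns(
    ["applicant_gender", "Age", "income", "Race", "credit_score", "gendr"]
)
```

Here `income` and `credit_score` are left unflagged, while the misspelled `gendr` is still mapped to the gender category. A real detector would also look at column values, since a column named `code` can just as well hold sensitive information.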

The current version of FairLens assumes that the dataset as a whole is a “true” representation of the underlying phenomenon, i.e. that the "true" distribution can be estimated from the data itself. Removing this assumption is on the roadmap: as the package develops and its users learn more about different true distributions, the aim is to aggregate that knowledge so the tool can operate without it.
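Under this assumption, bias in a target variable can be measured by comparing its distribution within each sensitive group to its distribution over the whole dataset, which plays the role of the "true" distribution. The sketch below illustrates the idea in plain Python using total variation distance, one possible metric chosen here for simplicity. The helper names and the list-of-dicts data layout are hypothetical and are not FairLens's API.

```python
from collections import Counter


def distribution(values):
    """Empirical probability of each outcome in a list of categorical values."""
    counts = Counter(values)
    total = len(values)
    return {outcome: count / total for outcome, count in counts.items()}


def total_variation_distance(p, q):
    """Half the L1 distance between two discrete distributions; lies in [0, 1]."""
    outcomes = set(p) | set(q)
    return 0.5 * sum(abs(p.get(o, 0.0) - q.get(o, 0.0)) for o in outcomes)


def group_bias(rows, sensitive_attr, target_attr):
    """Distance between each group's target distribution and the whole dataset's.

    The entire dataset stands in for the "true" distribution of the target.
    """
    reference = distribution([row[target_attr] for row in rows])
    groups = {row[sensitive_attr] for row in rows}
    return {
        group: total_variation_distance(
            distribution([row[target_attr] for row in rows
                          if row[sensitive_attr] == group]),
            reference,
        )
        for group in groups
    }


# Toy data: the overall approval rate is 50%.
rows = [
    {"gender": "F", "approved": "yes"},
    {"gender": "F", "approved": "no"},
    {"gender": "F", "approved": "no"},
    {"gender": "F", "approved": "no"},
    {"gender": "M", "approved": "yes"},
    {"gender": "M", "approved": "yes"},
    {"gender": "M", "approved": "yes"},
    {"gender": "M", "approved": "no"},
]
biases = group_bias(rows, "gender", "approved")
```

In this toy data, women are approved 25% of the time and men 75% of the time, so each group sits at a distance of 0.25 from the dataset-wide distribution. If the dataset itself were skewed, that reference would be skewed too, which is exactly the limitation the assumption above describes.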

Jobs to Be Done

The core features of FairLens let any data practitioner perform a number of different jobs straight away, and all of them can be tried in Google Colab.

Fairness Score - all biases identified in the dataset are aggregated into a single Fairness Score. It weighs positive and negative biases equally and provides a simple way to compare biases across multiple datasets. The Fairness Score ranges from 0 to 1, where 1 means perfectly unbiased and 0 means heavily biased.
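The exact aggregation formula FairLens uses is not given here, so the following is only a hypothetical scheme with the properties described above. Each group's signed bias is its positive-outcome rate minus the overall rate. Taking absolute values counts positive and negative biases equally, and one minus the mean magnitude gives a score between 0 (heavily biased) and 1 (perfectly unbiased). The function names, the data layout and the choice of an unweighted mean are all assumptions for illustration.

```python
def signed_biases(rows, sensitive_attr, target_attr, positive):
    """Per-group difference between the group's positive rate and the overall rate.

    A positive value means the group receives the positive outcome more often
    than the dataset as a whole; a negative value means less often.
    """
    overall = sum(row[target_attr] == positive for row in rows) / len(rows)
    biases = {}
    for group in {row[sensitive_attr] for row in rows}:
        members = [row for row in rows if row[sensitive_attr] == group]
        rate = sum(row[target_attr] == positive for row in members) / len(members)
        biases[group] = rate - overall
    return biases


def fairness_score(bias_dicts):
    """Hypothetical aggregate: 1 minus the mean absolute bias over all groups.

    Each bias lies in [-1, 1], so the score lies in [0, 1]; 1 means no bias.
    """
    magnitudes = [abs(b) for biases in bias_dicts for b in biases.values()]
    return 1.0 - sum(magnitudes) / len(magnitudes)


# Usage: aggregate biases over two sensitive attributes.
rows = [
    {"gender": "F", "age_group": "young", "approved": True},
    {"gender": "F", "age_group": "old", "approved": False},
    {"gender": "M", "age_group": "young", "approved": True},
    {"gender": "M", "age_group": "old", "approved": True},
]
gender_biases = signed_biases(rows, "gender", "approved", True)
age_biases = signed_biases(rows, "age_group", "approved", True)
score = fairness_score([gender_biases, age_biases])
```

Here every group deviates from the overall 75% approval rate by 0.25 in one direction or the other, giving a score of 0.75. Because the result is a single number on a fixed scale, scores computed in the same way for different datasets can be compared directly.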
