FairLens is an open-source Python library for automatically discovering biases in data products.
The goal of FairLens is to enable data practitioners to gain a deeper understanding of their data and to help ensure the fair use of data in analytics and data science tasks.
FairLens is the first data-centric open-source library for identifying data bias and driving fairness in decision-making. FairLens comes with full documentation and can be downloaded here.
FairLens can be tried in Synthesized SDK Colab alongside other capabilities.
Machine learning and data science can be a force for good in the world, and yet many data science models rely on biased and skewed datasets for their development and training.
With limited, poor-quality or skewed datasets, data-driven applications often fail to achieve their intended purpose because they are inherently biased.
Data bias results in poor predictive capability and functional failures, with legal and reputational consequences.
This version of FairLens assumes that the entire dataset is a "true" representation of the underlying phenomena, i.e. that the "true" distribution can be estimated from the data itself. As the package develops and its users contribute knowledge about different true distributions, the tool should be able to operate without this assumption; aggregating that knowledge to eventually remove the assumption is part of the roadmap.
The core features of FairLens allow any data practitioner to perform a number of tasks straight away, and they can be tried in the Google Colab environment.
Fairness score - all identified biases in the dataset are aggregated to form the Fairness Score. It takes positive and negative biases into account equally and provides a method to readily compare biases across multiple datasets. The Fairness Score ranges from 0 to 1, with 1 meaning perfectly unbiased and 0 meaning heavily biased.
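To make the idea concrete, here is a minimal sketch of how such an aggregate score can be computed. It is illustrative only and not FairLens's actual implementation: the `fairness_score` helper, the choice of the Kolmogorov-Smirnov statistic as the distance measure, and the simple averaging step are all assumptions made for this example.

```python
import pandas as pd
from scipy.stats import ks_2samp

def fairness_score(df: pd.DataFrame, target: str, protected: str) -> float:
    """Aggregate per-subgroup biases into a single score in [0, 1].

    For each subgroup of the protected attribute, measure how far the
    target's distribution in that subgroup deviates from the overall
    distribution (two-sample Kolmogorov-Smirnov statistic), then average
    the distances and invert so that 1 = unbiased and 0 = heavily biased.
    This is a hypothetical sketch, not the FairLens API.
    """
    overall = df[target]
    distances = [
        ks_2samp(group[target], overall).statistic
        for _, group in df.groupby(protected)
    ]
    return 1.0 - sum(distances) / len(distances)

# Toy data: the target is identically distributed across both groups,
# so each subgroup matches the overall distribution and the score is 1.
data = pd.DataFrame({
    "group": ["a"] * 50 + ["b"] * 50,
    "score": list(range(50)) * 2,
})
print(round(fairness_score(data, "score", "group"), 2))  # → 1.0
```

Equal weighting of subgroup distances is one possible design choice; a real scorer might weight subgroups by size or use a different distance for categorical targets.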