Today, data powers the enterprise: it provides competitive advantage, informs business decisions, and drives innovation. However, accessing high-quality data can be costly and time-consuming, and using it often means meeting strict regulatory requirements.
Synthesized helps organizations gain faster access to data and navigate compliance restrictions by using generative AI to create shareable, compliant snapshots of large datasets. These snapshots can then be used to make faster and more informed business decisions, and to power application development and testing. Synthesized does this by helping organizations overcome many of the obstacles to fast and compliant insights:
Synthesized uses generative AI to help customers across healthcare, financial services, insurance, government, and more generate a new and accurate view of their data with confidentiality restrictions automatically applied.
The solution applies data transformations such as masking, subsetting, redaction, or generation to create high-fidelity snapshots of large datasets that can be used for modeling and testing. Synthesized uses generative AI to capture deep statistical properties and patterns, which are often hidden in the data, and to recreate them in synthetic data. At the same time, Synthesized helps ensure adherence to enterprise data privacy regulations, because the output data is programmatically designed to be fully anonymized. The result is easy and fast access to high-quality data, which enables better decision-making.
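To make the transformation vocabulary concrete, here is a minimal Python sketch of masking, redaction, and subsetting applied to a toy table. It is an illustration only, not Synthesized's implementation: the function names (`mask_email`, `redact`, `subset`), the field names, and the sample records are all hypothetical, and hashing an identifier is a simple pseudonymization stand-in rather than full anonymization.

```python
import hashlib
import random


def mask_email(email: str) -> str:
    """Masking: replace the local part with a stable hash, keep the domain."""
    local, _, domain = email.partition("@")
    digest = hashlib.sha256(local.encode("utf-8")).hexdigest()[:8]
    return f"user_{digest}@{domain}"


def redact(record: dict, fields: set) -> dict:
    """Redaction: blank out the listed fields entirely."""
    return {k: ("[REDACTED]" if k in fields else v) for k, v in record.items()}


def subset(records: list, n: int, seed: int = 0) -> list:
    """Subsetting: draw a reproducible random sample of at most n records."""
    rng = random.Random(seed)
    return rng.sample(records, min(n, len(records)))


# Hypothetical source table with two sensitive columns.
customers = [
    {"id": i, "email": f"person{i}@example.com", "ssn": f"000-00-{i:04d}", "balance": 100 * i}
    for i in range(1, 11)
]

# Build a small snapshot: subset first, then redact and mask each row.
snapshot = [
    {**redact(row, {"ssn"}), "email": mask_email(row["email"])}
    for row in subset(customers, n=3)
]
```

Each row in `snapshot` keeps its non-sensitive columns, while the `ssn` column is blanked and the `email` column is replaced by a masked value that is the same every time for a given input.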
With the click of a button, organizations can access insights from a synthetic snapshot that is representative of the entire original dataset, in a way that’s fast and compliant. In other words, the solution addresses the “chicken-and-egg” problem of data access: data consumers have to formulate their request for data access as a SQL query, but they can’t write the query without access to the data in the first place.
The newly generated synthetic data can be used for a variety of purposes, including:
One of the largest banks in Germany turned to Synthesized to give its engineers and data science teams fast access to synthetic test data. The bank wanted to reduce the preparation time needed to query the data so that it could speed up testing and time to market, and increase accuracy. Synthesized provided non-traceable snapshots of the original datasets, enabling the bank to start data analysis, app migration and testing in the cloud, and to experiment with large datasets for new AI/ML use cases and technologies.
Likewise, a leading insurance company wanted to move away from highly manual and resource-intensive data processes to help it remain competitive. Synthesized helped the company generate millions of highly representative test datasets that could be shared safely with third-party vendors for product development. The company was able to accelerate product development, save 200 man-hours per project and drastically reduce its volume of work.
Synthesized extends the functions already available in BigQuery. For example, BigQuery covers masking and data loss prevention for redaction, while Synthesized applies transformations like subsetting and generation. Integrating Synthesized and BigQuery can help organizations to gain fast and secure access to ready-to-query datasets, extracting only the snapshots they need to inform testing or business intelligence. Once the snapshots are ready to be shared safely from a compliance perspective, they can be stored in an organization's own systems, or shared with third parties for analysis.
Because these snapshots remain in BigQuery, they can be easily used with the full range of Google Data and AI products, including training AI models with BigQuery ML and Vertex AI.
Synthesized has API access to BigQuery, so extracting snapshots and provisioning data is easy and automated. Synthesized also uses a generative model to synthesize data and create balanced datasets from unbalanced datasets, providing the necessary distribution of examples that are ready for sharing. This generative model is stored within the customer's tenant and can also be shared along with the data.
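To illustrate what creating a balanced dataset from an unbalanced one means, the following sketch rebalances a toy labeled table by random oversampling of the minority class. This is a deliberately simple stand-in technique, not the generative model described above, which would synthesize new records rather than duplicate existing ones. The `oversample` function and the `fraud` column are hypothetical names chosen for the example.

```python
import random
from collections import Counter, defaultdict


def oversample(rows: list, label_key: str, seed: int = 0) -> list:
    """Duplicate rows of under-represented classes until every class
    has as many rows as the largest one."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Top up smaller classes with randomly chosen duplicates.
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    rng.shuffle(balanced)
    return balanced


# Hypothetical unbalanced table: 95 legitimate rows, 5 fraudulent rows.
transactions = (
    [{"amount": a, "fraud": False} for a in range(95)]
    + [{"amount": 1000 + a, "fraud": True} for a in range(5)]
)

balanced = oversample(transactions, "fraud")
counts = Counter(row["fraud"] for row in balanced)
```

After the call, `counts` holds the same number of rows for both classes. A generative approach aims for the same class distribution while avoiding exact copies of original records.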
Here is a simple illustrative example query to generate a fast and compliant snapshot with 1,000 rows from an input table:
Built with BigQuery helps ISVs and data providers build innovative applications with Google’s Data Cloud. Participating companies can:
BigQuery gives ISVs the advantage of a powerful, highly scalable unified AI lakehouse that’s integrated with Google Cloud’s open, secure, sustainable platform. Click here to learn more about Built with BigQuery.
Originally published at cloud.google.com/blog/ on February 2, 2024.