Pricing: from data imputation to data-driven testing

Compare plans, feature by feature

Free developer package

30 days

—

On-premise, private/public cloud

Developer hub

Details

The essentials

Enterprise compliance

Enterprise success

License length

Data types

Data structures

String

Integer

Float

Boolean

Datetime

Timestamp

Person

Address

FormattedString

JSON

Tabular data

Time-series data

Event-based data

Hosting

CLI

REST API

Web interface

ETL integration
(Airflow, Dataproc, Spark, etc.)

Management & reporting

Plugins & integrations

Code management integration

Kubernetes

Automation

Automated data imputation

Automated data rebalancing

Labelled tabular data generation

Re-usable policies for data transformations

Enterprise data masking and obfuscation

Enterprise privacy-preserving data generation

Rich API

Support & services

Bespoke end user training

Manuals, guides and refreshed materials

SLA - M-F 9-5 or 24/7

Enterprise

Annual or multi-year

On-premise, private/public cloud

Details

The essentials

Enterprise compliance

Enterprise success

License length

Data types

Data structures

String

Integer

Float

Boolean

Datetime

Timestamp

Person

Address

FormattedString

JSON

Tabular data

Time-series data

Event-based data

Hosting

CLI

REST API

Web interface

ETL integration
(Airflow, Dataproc, Spark, etc.)

Management & reporting

Plugins & integrations

Code management integration

Kubernetes

Automation

Automated data imputation

Automated data rebalancing

Labelled tabular data generation

Re-usable policies for data transformations

Enterprise data masking and obfuscation

Enterprise privacy-preserving data generation

Rich API

Support & services

Bespoke end user training

Manuals, guides and refreshed materials

SLA - M-F 9-5 or 24/7

Free developer package

Unlimited

10k

up to 200%

On-premise, private/public cloud

Developer hub

Details

The essentials

Enterprise compliance

Enterprise success

License length

Database transformations

Masking

Generation

Subsetting

Max tables

Max rows per table

Supported databases

Open-source relational databases (PostgreSQL, MySQL, etc.)

Enterprise relational databases (Oracle, MSSQL, etc.)

Management & reporting

Hosting

CLI

REST API

Web interface

Plugins & integrations

CI/CD integrations (GitHub, GitLab, Jenkins)

Testcontainers

BigID

Code management integration

Kubernetes

Automation

Automated preservation of key properties within one database

Preserving properties between databases

Re-usable policies for data transformations

Enterprise data masking and obfuscation

Enterprise privacy-preserving data generation

Rich API

Support & services

Bespoke end user training

Manuals, guides and refreshed materials

SLA - M-F 9-5 or 24/7

Enterprise

Annual or multi-year

Unlimited

On-premise, private/public cloud

Details

The essentials

Enterprise compliance

Enterprise success

License length

Database transformations

Masking

Generation

Subsetting

Max tables

Max rows per table

Supported databases

Open-source relational databases (PostgreSQL, MySQL, etc.)

Enterprise relational databases (Oracle, MSSQL, etc.)

Management & reporting

Hosting

CLI

REST API

Web interface

Plugins & integrations

CI/CD integrations (GitHub, GitLab, Jenkins)

Testcontainers

BigID

Code management integration

Kubernetes

Automation

Automated preservation of key properties within one database

Preserving properties between databases

Re-usable policies for data transformations

Enterprise data masking and obfuscation

Enterprise privacy-preserving data generation

Rich API

Support & services

Bespoke end user training

Manuals, guides and refreshed materials

SLA - M-F 9-5 or 24/7

Questions?

Which data sources is Synthesized capable of connecting to for the purpose of creating test/training data?

Available immediately: Oracle, Postgres, DB2, Sybase, SQL Server, MySQL. Sources enabled upon request: SharePoint / file shares / SMB, Drill, Druid, Hive, Solr, CockroachDB, CrateDB, Exasol, Elasticsearch, Firebird, BigQuery, Google Sheets, Informix, Netezza, MonetDB, ASE, HANA, Snowflake, Teradata. Developing new data connectors is straightforward, and we would be happy to work with you on any specific requirements.

For structured data, can Synthesized read the data schema to understand the physical structure of the data?

Synthesized is capable of reading the data schema. It understands the underlying data model, so different protection techniques can be applied at the attribute level. Users can also provide additional information, either by annotating the data to indicate that a given field has a particular format (e.g. addresses, names) or by providing rules about the structure of the data itself (e.g. column A > column B).
Available immediately: Database Schemas/DDL, JSON Schemas.
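
As a rough illustration of the "column A > column B" kind of rule mentioned above, the Python sketch below expresses a structural rule as plain data and checks rows against it. The names `rules` and `find_violations`, and the column names, are hypothetical; they are not part of the Synthesized SDK, whose rule syntax is not described here.

```python
# Hypothetical structural rule: column_a must be greater than column_b.
rules = {"column_a > column_b": lambda row: row["column_a"] > row["column_b"]}


def find_violations(rows, rules):
    """Return (row_index, rule_name) pairs for every rule that a row breaks."""
    return [
        (index, name)
        for index, row in enumerate(rows)
        for name, check in rules.items()
        if not check(row)
    ]


# Illustrative data: the second row breaks the rule.
rows = [
    {"column_a": 5, "column_b": 3},
    {"column_a": 1, "column_b": 2},
]
print(find_violations(rows, rules))
```

Generated data could be checked the same way, so that a rule holding in the source also holds in the output.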

Can Synthesized acquire and protect data from multiple data sources, i.e. multiple tables and databases? When creating cleansed versions of these tables / databases, will referential integrity be preserved?

Yes, Synthesized is able to acquire and protect data from multiple data sources including multiple tables and databases and can be integrated into ETL pipelines. In creating test/training data, referential integrity is preserved. This is also true if sensitive data is used for references/foreign keys.

Does Synthesized support attribute-level data anonymization, where information relating to a data subject (e.g. a client's name) is removed, thereby eliminating the possibility of identifying the data subject?

Yes, Synthesized supports two approaches to attribute-level anonymization.

Data obfuscation.
- Partial masking. Values can be partially (or totally) substituted by a placeholder character, "x" by default. For example, the value "4905 9328 9320 4630" would be replaced by "xxxx xxxx xxxx 4630".
- Nulling. The contents of a column can be completely removed, so the output dataset contains an empty column.
- Swapping. The output column contains the same unique values as the input one, but they are randomly shuffled so that correlations with other columns are completely lost.
- Random strings. Random strings are generated with a format similar to the input values. For example, "490GH830L" could be transformed into "L3N8O3H2M".
- Generalization. Individual attribute values are replaced with a broader category. For example, the value '19' of the attribute 'Age' may be replaced by '≤ 20', the value '23' by '20 < Age ≤ 30', etc.

Fake data.
Synthesized also supports substituting production values with realistic "fake" data. Continuity can be maintained across attributes in a row using the Synthesized annotation feature (e.g. a real name is replaced by a "fake" name, and the same "fake" name is used to create a "fake" email address).
Generated "fake" data is coherent across columns.
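
The obfuscation techniques listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not the Synthesized implementation: all function names and parameters are hypothetical, and `random_like` assumes that "similar format" means keeping digits as digits and letters as letters at each position, which is stricter than the example given above.

```python
import random
import string


def partial_mask(value, visible=4, placeholder="x"):
    """Partial masking: hide all letters and digits except the last `visible` ones."""
    to_mask = max(sum(ch.isalnum() for ch in value) - visible, 0)
    masked = []
    for ch in value:
        if ch.isalnum() and to_mask > 0:
            masked.append(placeholder)
            to_mask -= 1
        else:
            masked.append(ch)  # keep separators and the visible tail
    return "".join(masked)


def null_column(values):
    """Nulling: every value in the column is removed."""
    return [None] * len(values)


def swap_column(values, seed=None):
    """Swapping: the same values in a randomly shuffled order."""
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def random_like(value, seed=None):
    """Random string: digits become random digits, letters random uppercase letters."""
    rng = random.Random(seed)
    return "".join(
        rng.choice(string.digits) if ch.isdigit()
        else rng.choice(string.ascii_uppercase) if ch.isalpha()
        else ch
        for ch in value
    )


def generalize_age(age, width=10):
    """Generalization: replace an exact age with a broader bucket."""
    if age <= 20:
        return "≤ 20"
    lower = ((age - 1) // width) * width
    return f"{lower} < Age ≤ {lower + width}"


print(partial_mask("4905 9328 9320 4630"))  # xxxx xxxx xxxx 4630
print(generalize_age(23))                   # 20 < Age ≤ 30
```

Passing a fixed `seed` makes the shuffled and random outputs reproducible, which is convenient in tests; omit it for non-deterministic output.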

Does Synthesized ensure referential integrity when it replaces sensitive values with "fake" equivalents?

Yes, Synthesized ensures that referential integrity is preserved when generating new data. In addition, Synthesized handles referential integrity across multiple data sources. Circular references may require additional configuration, but we have handled these successfully with several customers.
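
One common way to keep references intact when a sensitive key is replaced is to use a deterministic mapping, so the same original value always yields the same substitute in every table. The Python sketch below illustrates that general idea with a keyed hash; it is an assumption for illustration only, not a description of how Synthesized works internally, and the table contents, `pseudonymize` and `mask_key` are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(value, secret):
    """Map a key to a stable substitute: the same input always gives the same output."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "id_" + digest[:12]  # truncated for readability in this sketch


def mask_key(rows, column, secret):
    """Return copies of the rows with `column` replaced by its pseudonym."""
    return [{**row, column: pseudonymize(row[column], secret)} for row in rows]


# Illustrative tables: orders reference customers through customer_id.
customers = [
    {"customer_id": "C-1001", "segment": "retail"},
    {"customer_id": "C-1002", "segment": "corporate"},
]
orders = [
    {"order_id": 1, "customer_id": "C-1001"},
    {"order_id": 2, "customer_id": "C-1002"},
    {"order_id": 3, "customer_id": "C-1001"},
]

secret = b"example-secret"  # placeholder; a real key would be managed securely
masked_customers = mask_key(customers, "customer_id", secret)
masked_orders = mask_key(orders, "customer_id", secret)

# Every masked foreign key still points at an existing masked customer.
customer_keys = {row["customer_id"] for row in masked_customers}
print(all(row["customer_id"] in customer_keys for row in masked_orders))
```

Because the mapping depends only on the value and the secret, it also holds across separate databases as long as the same secret is used.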

How is data security defined for derived data compared to original data points?

Synthesized is highly secure and we continuously work to ensure we adopt the latest security protocols and techniques. Security at the platform level applies universally to original and Synthesized data points and includes:

Technology
- We support the latest security protocols, such as JWT tokenization, SSL and Bcrypt cryptography, to keep data secure
Access Management
- Access to both sensitive original data and Synthesized data is controlled through role-based administration, including functional privileges (such as who can modify or edit data sets) and sharing privileges (such as who can see the original data attributes or the Synthesized ones)
- Single sign-on integration with SAML 2.0, OpenID and Active Directory

The derived data can then be exported to enterprise databases for consumption and usage. At that point it contains neither sensitive data nor links to the databases or connectors holding the original data.

When using another tenant’s data, is access control/approval managed by the original data owner or mirrored?

Access control and approval can be managed either by the original Data Product Owner or in a mirrored fashion, where the owner sets up a mirrored environment for any user.

Language support: What are the main languages supported by the platform? e.g. Python, Spark, Hive SQL, …

There are three ways to interact with the Synthesized engine:

- The core SDK is a Python package, which can easily be integrated into any Python pipeline
- The core TDK is a Java package
- The engines can run in a Docker container, so the user can communicate via API from any language with network access. ETL integration is also straightforward in this case

Additionally, we offer a web interface for more interactive and user-friendly communication with the engines.

What are the logical data models vs. the physical models (e.g. tables, files, etc.)?

The lineage plot includes the following types of nodes:

Physical models:

Host
Database
Schema
Dataset
Transformation

Logical models:

Data Model
Data Entity

The plot is a nested structure of physical and logical models connected with edges representing flows of data. We are happy to provide any additional detail.

Describe expected performance estimates for the following scenarios, assuming a reasonable component deployment:

- Small & Simple: single table, independent attributes, 30 columns, 100k records. Analysis: < 10 seconds. Synthesis: < 10 seconds.
- Small & Connected: single table, several attributes that are dependent and must be processed consistently, 30 columns, 100k records. Analysis: < 5 minutes. Synthesis: < 1 minute.
- Medium & Simple: 10 tables with no circular references, references only by primary key (normalized), 20 columns per table, 50k records per table. Analysis: < 20 minutes. Synthesis: < 1 minute.
- Medium & Complex: 30 tables with circular references, references only by primary key (normalized), 20 columns per table, 100k records per table. Supported by a separate module. Analysis: < 1 hr. Synthesis: < 10 minutes.
- Large & Simple: single table, 40 columns, 1 TB of data. Analysis: < 3 hr. Synthesis: < 1 hr.
- Large & Complex: 40 tables with circular references, multiple dependencies between tables (data copied between tables), 30 columns per table, 1m records. Supported by a separate module only. Analysis: < 3 hr. Synthesis: < 1 hr.
- Very Large & Complex: full data warehouse, hundreds of tables, lots of interdependent data, 10 TB of data. Supported by a separate module only. Analysis: < 24 hr. Synthesis: < 4 hr.
