Automated Test Data Generation and Masking Automation for a Pathology Software Provider

About the Customer
A well‑known HealthTech software provider delivers a mission‑critical Laboratory Information System (LIS) used by pathology labs, where every second affects patient care timelines and lab revenue.
The LIS orchestrates end‑to‑end sample‑to‑diagnosis workflows across large, complex relational databases ranging from 30 to 200 GB per customer environment. The engineering teams responsible faced two software quality challenges: testing with production‑realistic data, and reproducing bugs in order to resolve high‑priority incidents.
The Challenge
The provider’s legacy LIS monolith suffered from performance issues in production because there was no safe way to test against large, representative datasets that mirrored 30–200 GB customer environments.
At the same time, a high volume of support tickets was driven by issues the support team could not reproduce, as they were blocked from accessing live customer data for regulatory and security reasons.
This constraint was particularly critical given the LIS's role in end‑to‑end sample‑to‑diagnosis workflows, where downtime or slowdowns directly affected patient care and lab revenue.
Support engineers had resorted to handcrafting data scenarios, a slow and inconsistent process that still failed to surface deeper performance bottlenecks.
To address this, the HealthTech provider defined clear test data management requirements.
They needed to:
- Work with very large relational databases (30–200 GB) without degrading performance or sacrificing referential integrity across complex LIS schemas.
- Automatically detect and anonymize PII across nested JSON fields, documents, and binary blobs to ensure zero patient or customer data exposure in non‑production.
- Generate production‑realistic, human‑readable test data for performance testing, regression, and live debugging while preserving the statistical properties of real workloads.
- Provision compliant datasets to the offshore support team so they could reproduce and fix high‑priority issues without access to raw production data.
- Partner with a mature, responsive vendor, chosen after evaluating alternative tools, that could meet their scale and timelines while offering strong support and pricing flexibility.
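To make the nested JSON requirement above more concrete, here is a minimal sketch of how candidate PII fields might be located inside a nested JSON document. It is not the Synthesized implementation. The function name `find_pii_paths`, the key hints, the email pattern, and the sample record are all assumptions made for illustration, and a real discovery tool would use far richer detection.

```python
import re

# Hypothetical heuristics: key names that hint at PII, plus a simple email pattern.
PII_KEY_HINTS = re.compile(r"(name|email|phone|address|dob|ssn)", re.IGNORECASE)
EMAIL_VALUE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def find_pii_paths(node, path=""):
    """Walk a parsed JSON value and return the paths of values that look like PII."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            if not isinstance(value, (dict, list)) and PII_KEY_HINTS.search(key):
                # Scalar stored under a suspicious key name.
                hits.append(child)
            else:
                hits.extend(find_pii_paths(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            hits.extend(find_pii_paths(item, f"{path}[{index}]"))
    elif isinstance(node, str) and EMAIL_VALUE.fullmatch(node):
        # Value looks like an email address even though the key gave no hint.
        hits.append(path)
    return hits


# Made-up record, loosely shaped like a lab order stored in a JSON column.
record = {
    "sample_id": "S-1",
    "patient": {
        "full_name": "Jane Example",
        "contacts": [{"type": "work", "value": "jane@example.org"}],
    },
    "notes": "ok",
}

pii_paths = find_pii_paths(record)
```

The sketch checks both key names and values because sensitive data often sits under generic keys such as `value`, where a key-name check alone would miss it.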
The Solution
Synthesized Platform was implemented as an AI‑native test data automation layer on top of the company's production databases. Using Synthesized, the team ingested and anonymized full‑scale customer databases in the 30–200 GB range, leveraging automated sensitive‑data discovery to find PII even in JSON fields, documents, and other complex structures.
Synthesized applied privacy‑by‑design transformations that removed all PII while preserving relational integrity and business logic so anonymized datasets behaved like the original LIS environments. The platform generated synthetic, human‑readable values (for example, realistic names and addresses) to keep test data understandable for engineers and support staff while remaining fully compliant.
With this foundation, the provider was able to:
- Stand up representative performance test environments and run load and regression tests at realistic scale, surfacing previously hidden performance bottlenecks and enabling focused technical debt reduction.
- Provision safe, anonymized copies of customer datasets to the offshore support team so they could live debug critical tickets against realistic data without breaching data residency or privacy controls.
This approach aligned with Synthesized’s strengths in AI‑powered schema analysis, automated masking, and production‑quality data generation at scale, delivered through a unified, automation‑ready platform.
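One common way to remove identifiers while preserving relational integrity is deterministic pseudonymization: the same source value always maps to the same replacement, so foreign keys still join after masking. The sketch below illustrates that general idea using keyed hashing (HMAC) from the Python standard library. It is an assumption-laden illustration, not a description of how Synthesized works internally. The table layouts, the name lists, and the `SECRET` key are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be kept outside the masked environment.
SECRET = b"example-masking-key"

# Small illustrative lists used to produce human-readable replacement names.
FIRST_NAMES = ["Alex", "Sam", "Jordan", "Taylor", "Morgan", "Casey"]
LAST_NAMES = ["Reed", "Hale", "Brooks", "Lane", "Frost", "Marsh"]


def _digest(value: str) -> bytes:
    """Keyed hash of the source value; the same input always gives the same output."""
    return hmac.new(SECRET, value.encode("utf-8"), hashlib.sha256).digest()


def mask_id(value: str) -> str:
    """Replace an identifier with a stable pseudonym (truncated for readability)."""
    return "P-" + _digest(value).hex()[:12]


def mask_name(value: str) -> str:
    """Replace a real name with a readable synthetic one; collisions are possible."""
    digest = _digest(value)
    first = FIRST_NAMES[digest[0] % len(FIRST_NAMES)]
    last = LAST_NAMES[digest[1] % len(LAST_NAMES)]
    return f"{first} {last}"


# Made-up rows standing in for two related LIS tables.
patients = [{"patient_id": "MRN-1001", "name": "Jane Example"}]
samples = [{"sample_id": "S-1", "patient_id": "MRN-1001"}]

masked_patients = [
    {"patient_id": mask_id(p["patient_id"]), "name": mask_name(p["name"])}
    for p in patients
]
masked_samples = [
    {"sample_id": s["sample_id"], "patient_id": mask_id(s["patient_id"])}
    for s in samples
]
```

Because `mask_id` is applied identically in both tables, the masked sample row still references the masked patient row, and neither contains the original identifier.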
The Impact
By adopting Synthesized:
- Customer frustration reduced. The HealthTech software provider resolved LIS performance issues faster, cutting support escalations.
- Manual effort eliminated. Support engineers moved from manual data creation to precise debugging on anonymized environments, accelerating resolution of high‑priority incidents without increasing compliance risk.
- Performance bottlenecks uncovered. Engineering teams found and fixed deep performance bottlenecks, improving system responsiveness under real‑world workloads.
- Test data available on demand. Production‑realistic test data was generated and masked on demand and at scale, in minutes, making large‑scale anonymized test datasets available with zero compliance risk.
Summary
By deploying Synthesized Platform as an AI‑native test data automation layer, the company automated PII detection and anonymization at scale, preserved LIS business logic, and generated human‑readable, production‑realistic data for performance testing and compliant offshore debugging.
The result was a shift from test data as a blocker to a safe, on‑demand asset that improved system performance, accelerated issue resolution, and provided a scalable foundation for future growth.
