AI Bias: Shifting from Old Misconceptions to New Practical Ways to Mitigate Business Risks And Drive Impact - Blog

In this episode, we welcome Ken and Matt from Speedscale. They provide a service tool that enables any company to stress test their APIs. They are technical co-founders and being technical founders ourselves, we find it quite exciting to share the learnings and challenges of addressing certain pain points in the markets right now, and building a great business. We’re excited about this episode because whenever we talk about test data, synthetic data, and automating QA processes, we inevitably end up dealing with questions about whether we can help with the stress testing of APIs. It's a massive problem that software engineers, test engineers, and QA engineers inevitably face. It's a great opportunity to discuss how test data can be utilized together with Speedscale to enable very fast and efficient stress testing of APIs.

The full recording of their discussion is available here.

Nicolai: I'm joined by Marc and Denis from Synthesized. Marc is our COO, based in Atlanta, Georgia, and Denis is our CTO based in London. Both Synthesized and Speedscale are addressing similar pain points using complementary technologies in terms of stress testing of APIs and test data. We had a call last week with a potential customer and we thought that they needed test data, but we realized that they needed test data for testing APIs. And the project we did for them was stress-testing for APIs before using test data. Approaching this fairly challenging problem from a technical standpoint is interesting, and I’m sure we can share great ideas.

Tell us a bit more about your work and whether you consider Speedscale to be a tool or a framework?

Matt: We consider ourselves as a tool because whenever there's a new wave of technology that comes along there are always the early adopters. And there are organizations such as startups that run out ahead. As Ken likes to say, they run through the haunted house, get the cobwebs all over themselves and eventually figure out the right way to do something. What we have right now, especially in cloud-native testing and this space as a whole, are a lot of frameworks that people use and they're bolting them together while trying to figure things out themselves. We believe in the cloud-native philosophy, which is automation.

Everything that can be automated should be automated, and that sounds more like a tool. What we're doing for our customers as we're growing is we're figuring out the right way to do things such as automatic service mocks driven by traffic. We replay production traffic, and we put it into automatically creating service mocks, creating tests, etc, which is hard. And if you read Facebook and Twitter's blogs, they're trying to figure that out too. They are spending a lot of energy doing that. We're doing that with all of our early customers.

And we’re codifying that into a best practice in the tool. We have framework elements in what we do, but the automation approach means we're more like something you pick off the shelf, and it runs and refreshes on its own, and does what it's supposed to do without humans.
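To make the idea of "service mocks driven by traffic" concrete, here is a minimal sketch in Python. It is not Speedscale's implementation; the recorded traffic, the paths, and the function names are all hypothetical. It only shows the basic shape: observed request/response pairs are indexed, and the index answers later calls in place of the real dependency.

```python
# Hypothetical recorded traffic: (method, path, status, body) tuples.
# In practice these would be captured from real traffic, not typed by hand.
RECORDED = [
    ("GET", "/customers/42", 200, '{"id": 42, "name": "Test User"}'),
    ("GET", "/customers/99", 404, '{"error": "not found"}'),
]


def build_mock(recorded):
    """Index recorded responses by (method, path) so they can be replayed."""
    table = {}
    for method, path, status, body in recorded:
        table[(method, path)] = (status, body)
    return table


def mock_call(table, method, path):
    """Return the recorded response, or a 501 for traffic that was never observed."""
    return table.get((method, path), (501, '{"error": "no recording"}'))


mock = build_mock(RECORDED)
print(mock_call(mock, "GET", "/customers/42"))
print(mock_call(mock, "POST", "/customers"))
```

A real tool would also have to match on headers, bodies, and ordering; this sketch keys only on method and path to keep the idea visible.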

In terms of stress testing, tell us a bit more about the pain points you are trying to address and tell us about the challenges you currently see in the market regarding the stress testing of APIs.

Ken: One of the big challenges that are happening nowadays is that development teams are reaching for new types of architecture, such as cloud-native architectures and microservices, where we can assemble an application out of many different parts. We're going to internal APIs and third-party APIs, and the development of this is rapid because there are so many reusable components. And because it's so fast, you want to release as fast as possible, which means that there’s no time for any kind of manual processes, manual QA testing.

You don't have time to sit down and write test cases for a month before your release. You want to release the code the day it's done. Companies end up realizing they can't use their traditional approaches and many start moving to this concept of testing in production and shifting the problem to the right.

We're not sure about the quality of this code. We want to try to get it into production in a safe way. And you'll hear teams use phrases such as having a “limited blast radius.” Worrying about things that have a limited blast radius when they explode is not the way you want to think about your application exploding after you spend all the development time building it. That's one of the big industry trends I see, as people want to release fast, they want to use the newest technology and they want it to blow up the least amount of times.

Matt: Some questions that we’d ask our customers are if they’re testing in production, or if they’re A/B testing or not, and how it’s going. Then we might ask if their customers like being treated like crash test dummies. And the answer is almost always no. They need quality. They want the release velocity of cloud-native, but they need the quality practices that are now starting to catch up.

Nicolai: That relates to how we address this pain point. We simply ask companies if they still use production data in their testing and QA processes, and what they think about that. In terms of the stress testing of APIs, we see that with most of our customers right now. It’s a huge pain point and it's a great market for Speedscale right now.

In terms of your previous experiences, is this an idea that you discovered recently?

Ken: At my last company, New Relic, we ran a very large SaaS service with more than 10,000 customers on the platform, and we were continuously ingesting an enormous amount of data.

The non-prod environment was well-known, and the staging environment was 1% of the size of production. With the quality signal you get out of putting things into staging, you can figure out if it functionally works, but you can't figure out if it can scale. We invested heavily in feature flag technology and a lot of the new development had to be released only to a subset of customers. All of a sudden the customers can tell the difference. And that happened frequently, especially with anything that was computationally expensive.

At times there was even some functionality that couldn't get rolled out to the entire customer base. At first, it seems like a good idea, since you have the feature flags to limit it in some way. But if you can't build a product that's available to all of your customers, then you're going to have a hard time selling it to everyone. I experienced this firsthand having worked with engineering teams to get things deployed.

Nicolai: That helps when you truly understand the pain point. And it helps with the product, talking to customers and partners as well.

Marc, do you want to introduce Synthesized, the pain points we address, and the challenges we see right now in the market regarding test data?

Marc: It's interesting when you think about every company of a certain size that gets to benefit from economies of scale. However, with that size come more operational bottlenecks in the handling of data, which poses challenges for security, regulatory compliance, and structural reasons. The trade-off most enterprises eventually make is to give up data agility for lower risk, which is wrong when it comes to being innovative and strategic. What we do at Synthesized is allow you to use synthetic data to help you mitigate risk while reducing time to data and increasing value.

When you look at where we see the most success, there are two core use cases: one being agile test data management for software testing, where we're providing unlimited synthetic volumes of high-quality test data in the cloud, ensuring up to 100% coverage of all of the individual test cases, both functional and non-functional. We provide regulatory governance and compliance for privacy frameworks, including GDPR, HIPAA, CCPA, and more. The other core use case is around data science. When I look at the data science piece, this was the most interesting piece about the organization. It's always been around test data management, but the same technology can be used for machine learning, data science, and more, where during a data science project, many of these similar data-related problems can arise. At Synthesized, we tackle problems such as imbalanced, biased, and/or small data sets and overfitting, among others. It’s about the ability to create full production-quality or subsetted production-quality instances for data pipelines in these use cases.

Nicolai: It's the problem I experienced for about 10 years. In data science and machine learning, there’s a common problem, which is understanding how to train and test machine learning models, but you need data for that. You need to train and test, and ultimately you want to make sure data is available right away, but due to different operational bottlenecks, it's not easy. Marc approached this from the software testing point of view and I approached that from the data science and machine learning point of view. We are trying to address this from the data pipeline perspective, ensuring that we create those data pipelines for machine learning and analytics use cases, but also some testing use cases.

We see right now in the space that Test Data Management is a huge market with the stress testing of APIs, which we see with our customers. At the same time, we see two types of movements. Some companies are trying to move to the left whereby software engineers run component tests and they basically run stress tests of APIs, but you also see companies doing tests in production, by deploying it and letting go.

Why do enterprise companies still need TDM and how do you see the market for manual testing evolving over time?

Denis: If you attend conferences and listen to what big technology companies report, they’re explaining an idealistic picture of both radical shift-left and shift-right testing. It means developers test and test in production using A/B tests, canary deployments, etc. They don't need QA functions at all. It ends up being an ideal DevOps and Agile paradise. What happens in the industry, especially in regulated industries such as financial services, is that shift-right testing is not an option. As a result, they cannot test financial products in production. Sometimes it's not only risky but also makes a customer unhappy, and it might not be possible from a regulatory point of view. Some companies have regulations where they have to test before they go to production.

In reality, an idealistic picture is quite far away for a lot of companies. That means you have to do a lot of production-like testing. You're not gonna test in production, but you need production-like environments to do E2E or stress testing, for which you need test data. You need production-like data to simulate production behavior, and Synthesized provides this type of data.

How have cloud-native technologies changed the way that companies approach testing?

Matt: Whenever there's a new wave of technology everybody rushes in thinking everything is going to be wonderful. They subsequently roll with it, get some experience, and all of a sudden 100 important customers now have all these problems because we're testing in production.

What cloud-native technologies are fundamentally about is elegant scaling, and release velocity. So the question becomes how do you get them that release velocity without throwing away all of your quality practices? It's an exciting time for both of our companies because we're solving two critical aspects of the same problem with forms of automation.

You need the test data, which is your department at Synthesized, and you need reliable test data that matches. An example from one of our customers a few months back was when they said, “we can create machine learning models using data that we synthesize, but it doesn't match the actual pattern of what’s in production.” What I'm excited about with Synthesized is that as you continue to evolve, you can take the production data and keep the right patterns to train the models properly without actually having the sensitive data. That’s the holy grail for a lot of the folks we have talked to. The other piece you need to get this testing function is the ability to replicate the production environment and real production in your CI pipeline. This is what Speedscale focuses on.

One company cannot solve all of this, but the ideal state as we get mature is that enough people are hitting a wall, not so much with banks, which have a different model. But as they hit that wall and they come back a little bit, they realized that there's enormous innovation. We can get the speed without giving up the safety.

Ken: I also want to point out one other big trend that’s part of cloud-native, which is the use of third-party APIs. You end up with a complex environment that has a first-party database and key data in there.

With third-party systems, the most obvious ones are CRMs. You have to ask if the record is in Salesforce, and does the ID in Salesforce match the ID that's in our internal system? This isn't going away, it's skyrocketing the use of APIs. We've focused a lot around that part of the problem and getting the APIs that you own, as well as the third-party APIs, back under your control. Bringing that back and shifting that left, because the only time when everything is all together is in production right now for most companies. If you can bring those environments back, with data in your databases, along with the Synthesized data through the third-party APIs, you all of a sudden have environments that you can use. So that's an area where we focus.

Denis: And in terms of provisioning environments, one of the interesting trends that we've seen before is the creation of a production-like environment, such as a staging environment. But nowadays with Kubernetes and similar technologies we are talking about the on-demand creation of those environments, which pushes this idea of democratization and testing by developers even further.

Ken: Denis, that’s a great point. The majority of our customers are using the Kubernetes environment and the default state of the environment is off. Nothing is running. If you're using the cloud (most people are using the cloud) then you pay for usage, so it's off most of the time. And right when your CI pipeline runs for your code, all of a sudden there’s a proper environment set up with your new app, and also the dependencies behind it. We then run a replay and turn the whole thing off when you're done. This wasn't possible before, when you might have to wait for tens of minutes, or hours in some cases, to get the environments configured in the cloud, let alone the data center.

Matt: To your original question, a lot of that is being made possible by all these cloud-native technologies. In our product, we have two modes. You might be running with more tried and true technologies, such as Docker Desktop, virtual machines (VMs), or whatever it is. We love that. But if you're running in Kubernetes, we can do something clever. There’s a button the developer can press to bypass having to understand all the environments and run it instantly. And that's what Kubernetes makes possible, because all that stuff can be orchestrated and automated more than in a traditional environment.

Nicolai: All of us need to sell our product and demonstrate the value of the product to our customers. But we do that when we approach QA leads, QA functions, and QA engineers, and ultimately what they want is to automate the QA process and make sure that it's smooth, robust, and stable.

In terms of QA automation, what, in your opinion, can be automated, and what's hard to automate?

Ken: We work a lot with a new role that you see called an SDET, which is the modern QA engineer, and we focus a lot around APIs because APIs are highly structured, and easier to automate the replay of traffic into the APIs. One of our interesting observations was that when we started Speedscale, we talked to our first design partner customers, we said that because you already have a lot of testing in place, we're going to help you with this environmental problem. They asked us if we can make tests too. And, we've learned that it takes too long to sit down and write and code everything by hand.

So there's this trade-off where we spend some time scripting and building things, but as much time as possible on more of a no-code assembly type of model. We spend most of our customer development time on the CI pipelines. As soon as we can get something in the CI pipeline, everyone in the team benefits. APIs are critical for this because of their highly structured nature, but then getting the data into continuous integration is critical too.

Denis: APIs are one of the sweet spots for automation. UI is traditionally much harder in terms of automation, even though there is a significant amount of work that has been done recently. Here is my view on automation and what cannot be automated. There are two different types of tests: at the higher level, it's regression testing and exploratory testing. So regression testing is the subject ultimately. Sometimes it's easier for APIs.

Regression testing should be automated. You shouldn't have somebody going through a list of pre-scripted tests and executing them manually one by one. That's a no-go. There’s exploratory testing too, which is hard to do without an actual human or a person with a hacker's mentality. It needs someone who thinks about overlooked pieces, in terms of how you can break the system, or what can go wrong. They subsequently give this feedback to developers so they can improve coverage. This is hard, but users are good hackers. If companies do shift-right testing they’re using their users as exploratory testers. But then it's a question of can you afford that?

Matt: I wonder if it’s fair, Denis, to think about it as having all the testing you're going to need to do, and then having the exploratory stuff which is out on the bleeding edge. Automation means you are trying to get it to waterfall down into the automation.

How do you see that exploratory activity becoming more automated?

Denis: It should be easier and more accessible. For instance, we've seen a lot of work in terms of exploratory testing done by developers. It's not only the testers that need production-like data, but also developers need it. If you have production-like data on your machine, you do unit testing, component testing, and some API testing and then you have production-like data. You can also see from the user's point of view what you might’ve missed if you need to have more unit tests to cover some cases.

Ken: I agree, Denis, and it's not only testing but also the development work itself. I'm working on a new algorithm and want to run it, but I’m not trying to write a test. I need to run it and see what it does. When you run it and there's one record in the database, it takes one millisecond. But that's not how the production database is configured.

I need a real environment on my machine so I can run it and say I wrote the new code, and this algorithm is fantastic. Why does it take 5 seconds? You can get that feedback right away as a developer. You don't have to be doing an official unit test and check it in etc. That's one thing that we have seen. As mentioned earlier, one of the areas we focus on is third-party APIs. We were talking to a customer whose business is built on top of Salesforce APIs. It's not simple for them to make a whole copy of the CRM and an entire sandbox that's properly configured and ephemeral, where they can throw it away and developers can get in and out. That's the use case for Speedscale: to bring Salesforce in a box. It runs as a Docker container on the developer's machine.

For you at Synthesized, there’s always cutting-edge QA and testing. How does it change when you talk to a customer in a highly-regulated environment and startups?

Nicolai: It’s slightly tied to the overall topic of a shift to the right. Normally, QA people join startups when you have about 10 or more engineers. Before that, testing is done by the engineers or by customers, and you don't want your customers to test the software. When you have more to do, you need more engineers and start hiring QA functions and QA leads. There are different benchmarks for different companies. When you have the complexity, you need to automate the process as well. That’s when Speedscale and Synthesized become important. A customer recently said that some of the teams they are working with are trying to shift to the right and enable testing in production.

What are the challenges of shifting to the right when you're a large enterprise?

Ken: There's good use of the tool and approach, and then there is overuse and using it for the wrong problems. One example is that we have a couple of different ways we can approach this, such as using a screen layout or the way people will engage with different versions, and you need to collect the data from real users, which is difficult. It's hard to run surveys and get results, and that's a traditional A/B testing predicament.

My other example is that we have to turn the feature flags on because this new feature under development can’t be rolled out to everybody. And since we have these tools, some people take them to extremes and let the developer write a new piece of code, put a feature flag on it, and release it to production that day. And you think that one developer doing that is not the end of the world, but over time you will accumulate too many of these features. Additionally, a human has to name the features. Matt might name them differently, and I won't understand what they are called and if they are all linked.

And has anyone tested the combination of them altogether? No. So we start flipping these flags because there's a support ticket, and all of a sudden the customer is in this broken condition, and the only one experiencing this is your important customer. Ideally, since you're not confident in the code and you're unsure if it might break, you want to shift that work to the left to try and get a sense of the quality early. If we input this change, we want to know if it would break other components in production, or have a different performance profile than our current service does. That's where we see a good fit for Speedscale. People who are already shifting right know all of this stuff themselves.
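The point about untested flag combinations is easy to quantify. As a minimal sketch in Python, assuming independent on/off flags with hypothetical names, the number of states doubles with every flag added:

```python
from itertools import product


def flag_combinations(flags):
    """Yield every on/off assignment for the given flag names."""
    for values in product([False, True], repeat=len(flags)):
        yield dict(zip(flags, values))


# Hypothetical flag names, for illustration only.
flags = ["new_checkout", "fast_search", "beta_reports"]
combos = list(flag_combinations(flags))

print(len(combos))  # 2 ** 3 = 8
print(2 ** 20)      # 1048576 combinations for 20 independent flags
```

Three flags give 8 states to test; twenty give over a million, which is why flipping a flag for a support ticket can put a customer into a state nobody has ever exercised.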

Someone asked us if we can’t just do a rollback. Rollbacks are difficult, especially if you run database migration. We wrote all the new records with the new fields and now we have to roll it back, which is not possible. So now I have to restore from backup and everything.

Matt: Something we notice as companies scale is that everybody loves breaking monoliths into smaller services. For some people, microservices is a bad word, but for other folks, they're taking certain best practices from microservices. Some people are all the way in and have 500 microservices. What happens in big companies is there's a law called Conway's law, which says that any software architecture will mimic the organizational structure of the company.

That is fine, and the size of a service shrinks to the size of the team, resulting in hundreds of agile teams and hundreds of microservices. Development becomes easy because the code base has shrunk down. A human can hold it in their head and understand their code. But the problem is that there are now 500 of them, and they all interact. The moment something changes it becomes an explosion that fans out across a variety of different systems. So the pathology of these diseases in production becomes harder to trace to its source, because the issue is no longer inside one monolith. It's somewhere between the services.

One of the challenges with traditional approaches is that most tools are not set up for that. Most engineers do not think that way, except for the leading-edge ones. An opportunity arises if companies such as Synthesized and Speedscale can make that process safe, get the benefit of shrinking it down, and have the velocity to do it safely. This is a huge opportunity, and that's how the market's evolving. Nobody's figured that out yet, but we're nibbling at the problem and so is Synthesized.

Nicolai: Absolutely, and for the stress testing of APIs we ultimately need data. But we still see many companies using prod data for stress testing of their APIs. And that’s considered a taboo right now.

What do you think are the main challenges of using prod data in the stress testing of APIs? And what are the challenges in getting the data to a test environment?

Ken: You can't just take your prod data and reuse it. A lot of people think the only data is in the database, the system of record, but it's also in your third-party APIs and your test case. If we've got Matt's name in the record, we're going to send a call into the payment system and it's going to look up Matt. If we do a traditional approach, we've got to replace Matt's name with Nicolai’s, and we've got to replace it in the test case, the database, and the third-party API.

Suddenly, you start to see that the project work for replacing names is bigger than the test project we're trying to run, and we don't have time for that. So you have to use synthetic data as part of this environment, and you have to consider all the places where things are.
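One way to picture the "all the places" problem is a deterministic replacement: the same real value always maps to the same stand-in, so the database, the test case, and the third-party payload stay in sync without hand editing. The sketch below is an assumption for illustration, not how Synthesized or Speedscale do it; the key, field names, and record shapes are hypothetical.

```python
import hashlib
import hmac

# Hypothetical key for the demo; a real key would live outside source control.
SECRET = b"demo-key"


def pseudonym(real_value: str) -> str:
    """Map a real value to a stable stand-in: same input, same output."""
    digest = hmac.new(SECRET, real_value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "user_" + digest[:12]


# The same name appears in three places, as in the example above.
db_row = {"id": 1, "name": "Matt"}
test_case = {"lookup": "Matt"}
api_payload = {"customer": "Matt", "amount": 25}

db_row["name"] = pseudonym(db_row["name"])
test_case["lookup"] = pseudonym(test_case["lookup"])
api_payload["customer"] = pseudonym(api_payload["customer"])

print(db_row["name"] == test_case["lookup"] == api_payload["customer"])
```

Keyed hashing only hides the value; it does not preserve realistic names or statistical patterns, which is the gap that generated synthetic data is meant to fill.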

You've heard us talk about our focus on APIs, and the integrations between components. That is a great spot, as you can see that one part is in scope, another part is out of scope, and you can draw the line and use the Speedscale component, and then you need Synthesized to do the same activity around all your databases. You need to ensure that they are all in sync, but now it can all be automated. Subsequently, the generation can be instant, as opposed to a one-month project each time you want to do it.

Matt: For the engineering mind, that's called a non-idempotent transaction. If you have a non-idempotent transaction, the data is going to get messed up in a lot of places. That's something we had to learn early on with Speedscale. We have to mock all of the connections, not just the inbound or the outbound, because you should treat it holistically.
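For readers less familiar with the term, here is a minimal Python sketch of the difference; the account structure and function names are hypothetical. An idempotent operation leaves the same state no matter how many times it is replayed, while a non-idempotent one changes state on every replay.

```python
def set_balance(account, amount):
    """Idempotent: replaying the call leaves the same final state."""
    account["balance"] = amount


def add_to_balance(account, amount):
    """Non-idempotent: every replay changes the state again."""
    account["balance"] += amount


a = {"balance": 0}
set_balance(a, 100)
set_balance(a, 100)  # replayed request

b = {"balance": 0}
add_to_balance(b, 100)
add_to_balance(b, 100)  # replayed request

print(a["balance"], b["balance"])  # 100 200
```

This is why replaying recorded traffic against real dependencies can corrupt data, and why the downstream calls need to be mocked as well.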

Ken: And I want to mention that regulatory requirements and privacy are a big deal everywhere. Not just in Europe, but in the U.S. as well. Even at the individual state level, California has its own rule that looks a lot like GDPR. Grabbing production data and copying it or doing a bit of find and replace is not okay. That's not going to work. It's a major privacy problem and you see companies nowadays talk about their privacy policies in a much more detailed way. Especially since we work with a lot of B2B SaaS companies that publish this information about their security policies, what they use, what the data is used for, and how you can't use your production data for testing.

Nicolai: There is no need to use production data to stress test APIs, and there is access to high-quality synthetic data on-demand, which can be immediately used to stress test APIs, and conduct performance and load tests, which many companies understand. It's a huge market evolving in the test space.

Denis: One of the interesting use cases of synthetic data when it comes to stress testing, is capacity planning. You can test how the system will perform when it’s 10 times bigger in terms of your data volume etc. You can always estimate, and with synthetic data, you can measure it and see how it works.

Ken: You can estimate, or you can run a real transaction through to get the real data. And we work a lot around Kubernetes environments and it's very well known in Kubernetes that if you don't configure your limits correctly (CPU and memory limits for your pods) your stuff will break. It's one of the most common ways that things break. When you ask the teams, how they are configuring it, they say it’s a guess. And then we wait for all the alerts to go off and then we start changing them. Or when the new code is coming down you could run a stress test through the environment, take measurements so you know what it is, and then you can decide, for example, if there is a need to increase the limits or what the service will cost to run as well.
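As a rough illustration of replacing the guess with a measurement, the Python sketch below takes memory samples from a stress test run and suggests a limit from a high percentile plus headroom. The sample values, the 99th percentile, and the 20% headroom are all assumptions for the example; this is a simple heuristic, not a Kubernetes or Speedscale feature.

```python
import math


def suggest_limit(samples, percentile=0.99, headroom=1.2):
    """Suggest a resource limit from measured usage.

    Uses the nearest-rank percentile of the samples, multiplied by a
    headroom factor. Both defaults are assumptions, not recommendations.
    """
    ordered = sorted(samples)
    rank = max(1, math.ceil(percentile * len(ordered)))
    return ordered[rank - 1] * headroom


# Hypothetical memory usage in MiB, sampled during a stress test.
memory_mib = [210, 225, 240, 238, 260, 255, 300, 245, 230, 250]

print(suggest_limit(memory_mib))
print(suggest_limit(memory_mib, percentile=0.5, headroom=1.0))
```

The resulting number would then be written into the pod's resource limits by hand or by the pipeline, and re-measured whenever new code changes the performance profile.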

Nicolai: Thank you for an incredibly insightful and valuable discussion. We had a great conversation about market trends in the testing space, how we can automate different steps in the QA process, and which steps are hard to automate for companies shifting to the left.

Avoid testing in production with Synthesized & Speedscale