February 15, 2022

How to Build a Modern Testing Organization


Seva: We sat down with Denis, the CTO of Synthesized, and Ivan, Staff Software Engineer, to discuss how to establish a proper quality gate and how to start writing tests and configuring linters, even in the earliest-stage projects. We also looked at how to do this without spending all of your resources, which is crucial for early-stage projects.

Why do we need quality gates and a test automation process?

Denis: I often see developers hoping that even if they start with low quality, they can improve it in the future. While you can refactor, keep in mind that code is never ideal: you always have to keep improving it. Yet it is important to get some pieces right from the very beginning; otherwise, they will be expensive to change later. And, importantly, things such as static code analysis and quality gates are cheaper to adopt straight away.

Quality gates are not hard to configure or start using at all. If you have not configured them from the beginning, then later, when you have thousands of lines of code, it’s going to be a headache to pass all of these quality gates.

Seva: At this point, some might dispute that everything is easy to configure. We know how it happens. You add a tool like SonarCloud or a linter for Kotlin, and you fall into a rabbit hole of configuring it so that you can write code that passes the gate.

Denis: Yes, the tool might be too restrictive and produce a lot of false positives or false negatives. The good news is that some tools have good defaults. From my experience, SonarCloud has a very liberal configuration by default. You most likely won't have any problems if you enable it, and that's already a very good start.

Seva: Ivan, what's your experience with it? 

Ivan: Well, first of all, I have to agree with Denis: if you are not writing tests from the very beginning, you end up writing untestable code, and then you cannot do anything about it. And if you are not using static analysis from the very beginning but choose to attach an analyzer later, you’ll encounter many issues that you will not be able to fix.

So it's very important to start with all those quality gates, from the very beginning! 

Some say that quality gates slow down the actual writing of code and hurt productivity. Developers will complain that they cannot merge their changes quickly. Your responsibility as a senior engineer is to configure everything correctly. If you are using Checkstyle, for example, you have to choose very carefully what you will prohibit and what you will accept. If you are considering certain test coverage requirements, then they should not be so restrictive that they prevent developers from doing their work.

Seva: I agree with this. If you haven't been writing code according to the linter from the beginning, later you may see a lot of warnings in the build logs, and you will never be able to distinguish the real issues from the inessential ones.

Denis: Overall, the hardest and most expensive part of testing is always the starting point: introducing testing to your project. Once you have the infrastructure in place, it's relatively easy to write tests.

Seva: That's true, you need to have the infrastructure in place, the linters configured, and some configuration written for the test automation. You need a step-by-step approach that gets you to the point where you are comfortable writing more tests.

Ivan: And while it may slow you down at the very beginning, eventually it will speed up your development dramatically.

Seva: You need tests not because the manager wants big coverage, but for faster feedback. You want to debug your code faster, without spinning up the whole environment locally using Docker. Overall, tests allow you to do things much faster and smarter.

Denis: I have worked in startups several times, and I understand what it takes: there is often pressure, for example to show the investors your prototype next week. You need to finish some coding, and you’re working under high time pressure. Of course, you will sacrifice some quality, and quite often you'll compromise on some tests. So it’s hard to judge people for not following all development practices, because business conditions differ, especially in the startup world. So what does it mean? Even if you don't have time, you can’t just relax, say you don't have the time, and decide you're not going to write tests.

Even if you work under time pressure and have very little time, you can still start testing. We would call it the “zero level” of testing.

How do you start testing if you don’t have many resources and you’re in a rush? 

At this stage, you should be very smart about your return on investment. What would give you the biggest return, given a small investment? This is your strategy.

Of course, there are optimal strategies: we know about the test pyramid and what it should look like. But for now, let’s focus on quick wins: how to get results without spending significant time (half an hour at most). First of all, I would recommend setting up some smoke tests. They will tell you whether your project is alive or not, which matters especially when you are under pressure to show progress. Whenever you make a change, you need to quickly see if it's still alive.

Secondly, look into static analysis and quality gates, both of which are easy to start with and implement. You don't need to write any test code. The beauty of static analysis is that it comes for free once it's configured. You just need to follow the analyser's recommendations, and that's all: it tests the code for you!

Thirdly, I recommend focusing on integration testing, as it’ll answer some high-level questions, for instance, whether your code works or not.

These measures won’t provide you with very good coverage, but high-level answers at this stage are more important.
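As an illustration of the smoke test mentioned above, here is a minimal sketch in Python (chosen for brevity; the speakers mostly discuss Java tooling). It assumes the application exposes an HTTP endpoint at /health, which is a hypothetical convention and not something stated in the interview. HealthHandler is only a stand-in for the real service so that the example runs on its own.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Stand-in for the real application; only /health is implemented."""

    def do_GET(self):
        if self.path == "/health":
            body = b"OK"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the test output quiet


def is_alive(base_url, timeout=2.0):
    """Smoke check: True if the service answers /health with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url + "/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # connection refused, timeout, HTTP error status
        return False


def run_smoke_test():
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), HealthHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return is_alive(f"http://127.0.0.1:{server.server_port}")
    finally:
        server.shutdown()
        server.server_close()
```

In a real project, is_alive would point at the deployed or locally started application rather than at a stand-in, and the check would run after every change.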

Denis, what tools would you recommend for this “zero stage”? 

Denis: Some integration and additional E2E testing, depending on your language and your technology. If it's Spring, then you’re probably going to have a Spring component test using the Spring context. If it's the UI part of your code, then it’s going to be Playwright.

Seva: At this stage, you don't want to write a lot of tests, but rather cover some cases and add some smoke tests. E2E tests are hard to write but very helpful, so it's a good balance.

Denis: I think E2E tests become trickier for bigger applications with a lot of components. For some applications, end-to-end tests are probably the only test set you need. For example, if it’s a CRUD application, you enter some data and get it back, which tests the application almost entirely.

Ivan: We mentioned static analysis and end-to-end tests as minimal requirements for each project. We should add that a Dockerfile and a docker-compose file are equally important to have from the very beginning. They are the best documentation for the project and for the team that will later deploy the application into the cloud. They tell the other developers what is needed to start your application.

I would also add documentation coverage requirements, which are no less important than test coverage. Never forget about documentation: you should think about it as code from stage zero. All you need at first is to set up a separate folder in your project with a bunch of Asciidoctor files, and it will evolve with your project. With all these in place, you don’t need to catch up later when your project has evolved too much.

Denis: Yes, and I think the good part here is that if you use a modern CI system, you get it for free. You just need to configure it. No actual work, no actual coding, just configuration that needs to be done.

Seva: Do you think it's feasible to manage all this with only one middle-level engineer? 

Denis: I think it's feasible, even if you're under time pressure. If a backend engineer is working on a REST service, you can create Swagger test lists. You can even have some unit tests for complex logic.

Ivan: You must have the experience. In the beginning, if you’ve never done these things, it may take you some time, while later it will take only a couple of days.

Moving from “stage zero” to “stage one”, what improvements would you make?

Ivan: Now you can plan your testing activity, make it part of your development process, and maybe start thinking about how your testing pyramid should look. 

I think that if at “zero level” you omitted unit testing, now is the best time to introduce it and do it right. So you should select a tool for your tests, depending on the language of choice. If we are talking about the world of Java, then I’d prefer JUnit 5 because it's the most elaborate testing framework for Java. In general, based on your experience, you should select the testing framework that will help you do unit testing most effectively. When you write your code together with unit tests, you will not be able to write untestable code, that is, code you will not be able to cover with tests later. So this is the most important thing at this stage.

Denis: The holy grail of testing is tests that run very fast! They give feedback very quickly, shape your code, and help you design.

Ivan: At this point, you will also face the problem of external dependencies in your unit tests or component tests. Your program may talk to a database, or it may depend on Kafka or Redis. In Java, we have some fine tools for this, like TestContainers.

Seva: Yes, and you can probably find similar tools for JavaScript or Python development. At the same time, I think it's good to think about coverage: how you measure it, how many tests you want to write, and how many things haven’t been covered yet. You most likely care more about which methods you forgot to cover and where something might crash than about the numbers themselves.

Ivan: I agree. Numbers do not make any sense when we are measuring coverage, except for a legacy project that has never been tested before. When at some point you attach a coverage report to an untested legacy project the numbers will show you the progress that you make in writing the tests. 

Aside from that, for a greenfield project that was tested from the beginning, knowing the number alone doesn't add much value. You might have close to 100% coverage and still have lots of bugs. Besides that, it’s generally difficult to achieve a high percentage of coverage. So I think that the most important thing here is to have a clear coverage report that you can watch at any time to see if you forgot to cover some branches of your code.

When talking about Java, we have JaCoCo which calculates the coverage of branches. You don’t need to cover everything! It’s ok if something is not covered when you can see with your own eyes that the uncovered code is correct. What you should cover are complicated methods. And if you explain these points to your programmers, they will complain less about the need to cover their code.
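To make the point about branches concrete, here is a small sketch in Python (the interview names JaCoCo for Java; in Python, coverage.py offers a comparable branch mode). The function price_tag is hypothetical and contains a deliberate bug. A single test with a discount executes every line, so line coverage reports 100%, while a branch-aware report would still flag the path where the condition is false.

```python
def price_tag(price, discount_percent=None):
    """Format a price; deliberately buggy when no discount is given."""
    suffix = None
    if discount_percent is not None:
        price = price * (100 - discount_percent) / 100
        suffix = f" (-{discount_percent}%)"
    # Bug: when the 'if' above is skipped, suffix is still None here.
    return f"{price:.2f}" + suffix


def test_with_discount():
    # This one test touches every line of price_tag (100% line coverage),
    # but never takes the branch where discount_percent is None.
    assert price_tag(200, 25) == "150.00 (-25%)"
```

Calling price_tag(200) raises a TypeError, which is exactly the kind of forgotten branch that a readable coverage report helps you spot.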

Seva: Some people believe that component and integration tests are superior to unit tests because they run in an environment closer to production and produce higher code coverage. What do you think? 

Ivan: I think the most important problem is terminology. What do we call a unit test? What do we call component tests? What do we call integration tests? Because the borders are very vague here, at a high level I’d prefer to talk of two kinds of tests: those that don't need our program to run in order to test it, and those that do.

The first category includes, for example, unit and component tests, while end-to-end tests belong to the second category.

Then, we can subdivide these two categories into a hierarchy based on the complexity of setup which is needed to run the tests. When we talk about unit and component tests, some of them don’t require anything. No databases, just mocks. And some of them will require a database, Kafka, Redis and all the modern stuff to run. 

As you move up this hierarchy of tests, you get more and more restrictions, because more and more of your tests become asynchronous. If you are running actual systems, like databases, they are black boxes for you. At the top of the hierarchy, there is a web browser, which is a complex and poorly manageable system. You ask your browser to do something and then you just wait for it to happen. And if it doesn't happen, you never know whether it's not going to happen at all or whether you just haven't waited long enough and should wait a bit longer. And this is a fundamental, insurmountable problem of all asynchronous tests.

It is the same problem for all the so-called “integration” tests which involve “real” systems running. And although such tests are closer to “real-world” scenarios, they are not a “silver bullet” of testing.

Take Apache Kafka. If you are running a real Kafka message broker and in, say, a five-second interval your test didn’t receive any messages, you can never tell for sure that you won’t receive a message just one millisecond later. Thus, you just cannot write a reliable test for verification of some complex logic, and you will have to rely on lower-level tests for this.
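The timeout problem can be reproduced without any broker. The sketch below is in Python and uses an in-process queue as a stand-in for a message broker; the producer delay, the message name, and the helper receive_within are all hypothetical. It shows that "nothing arrived within the timeout" and "nothing will ever arrive" look identical to the test.

```python
import queue
import threading


def receive_within(q, timeout_s):
    """Return the next message, or None if nothing arrived within timeout_s."""
    try:
        return q.get(timeout=timeout_s)
    except queue.Empty:
        return None


def demo():
    q = queue.Queue()
    # Hypothetical slow producer: publishes its message after 0.5 seconds.
    threading.Timer(0.5, q.put, args=("order-created",)).start()

    # The test gives up before the message exists and sees "no message".
    too_early = receive_within(q, timeout_s=0.05)
    # With a longer wait, the same message turns out to have been merely late.
    eventually = receive_within(q, timeout_s=5.0)
    return too_early, eventually
```

A negative assertion ("no message is ever produced") can therefore only be approximated by picking a timeout, which is why complex logic is better verified in lower-level, synchronous tests.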

Answering your question: to test your system thoroughly, you always need all kinds of tests, low-level and high-level, because they test different aspects of your system, and one cannot substitute the other. 

Seva: There is an opinion coming from web development circles that if the only external system you have is a database, and you work with that database through an ORM, then you can use an in-memory database for the tests. Those tests are simple, and they sit somewhere in between unit and integration tests. Strictly speaking, they’re not unit tests: they test several units. But they're also quite lightweight. They work with data in memory and don't do any network communication. So they’re in a very sweet spot. A lot of people argue that they are the only tests you need.

Ivan: The problem is that your mock or in-memory database never behaves exactly like a real database. So if you use something like H2, you have no guarantee that code which works on H2 will also work on PostgreSQL. And this is a huge problem with mocks and things like H2.
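Both sides of this argument can be sketched with Python's built-in sqlite3 module, used here only as a stand-in for an in-memory database such as H2 (an assumption for illustration; the table and the helper functions are hypothetical). The test is fast and needs no network, yet the in-memory engine accepts data that a stricter production database such as PostgreSQL would reject.

```python
import sqlite3


def make_db():
    """Create a throwaway in-memory database for one test."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, age INTEGER)")
    return conn


def save_user(conn, age):
    cur = conn.execute("INSERT INTO users (age) VALUES (?)", (age,))
    conn.commit()
    return cur.lastrowid


def load_age(conn, user_id):
    row = conn.execute("SELECT age FROM users WHERE id = ?", (user_id,)).fetchone()
    return row[0]


def demo():
    conn = make_db()
    good_id = save_user(conn, 42)
    # SQLite's flexible typing stores this text in an INTEGER column without error;
    # PostgreSQL would refuse the same insert into an integer column.
    bad_id = save_user(conn, "forty-two")
    return load_age(conn, good_id), load_age(conn, bad_id)
```

The second insert succeeding is the kind of behavioural difference that only shows up once a test runs against the real database engine.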

But you still have to use both approaches.

If you use an H2 in-memory database, then your tests will be lightning fast. But if you are using TestContainers to run MS SQL Server, it will take you a couple of minutes just to start SQL Server in a container. Programmers will not run slow tests very often. And this is another problem with ‘real’ systems used for testing: the tests are just slow.

There are also things that you actually cannot test without mocks. In the Kafka world, for example, we have a great mock tool called TopologyTestDriver, and it runs very fast. It has rich functionality and allows you to verify things you cannot verify asynchronously on real Kafka clusters, but it still doesn't guarantee that it will behave like a real Kafka cluster. What should we do then? Write most of the tests using TopologyTestDriver and a couple of end-to-end tests using a real Kafka in TestContainers.

The same with Redis. We have a tool called JedisMock and it is a re-implementation of Redis in Java, and no doubt it is a buggy re-implementation. Why do we need it then? First, because when you use JedisMock you don’t need Docker and everything runs much faster. Then, it’s not a black box any more. You can intercept and verify the commands that were sent to Redis by your application. You can easily return any data from that mock that you need for testing or even emulate failure. These things cannot be easily achieved with a ‘real’ Redis.
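The three advantages listed above (no Docker, visible commands, easy failure emulation) can be sketched with a hand-written fake. The Python class below is hypothetical and is not JedisMock's API; cached_greeting is an invented piece of application code used only to have something to test.

```python
class FakeKeyValueStore:
    """Hypothetical in-process stand-in for a Redis-like store."""

    def __init__(self):
        self.data = {}
        self.commands = []      # every command the application sent
        self.fail_next = False  # set to True to emulate an outage

    def _record(self, *command):
        self.commands.append(command)
        if self.fail_next:
            self.fail_next = False
            raise ConnectionError("simulated store failure")

    def set(self, key, value):
        self._record("SET", key, value)
        self.data[key] = value

    def get(self, key):
        self._record("GET", key)
        return self.data.get(key)


def cached_greeting(store, name):
    """Application code under test: read-through cache with a fallback."""
    key = f"greeting:{name}"
    try:
        cached = store.get(key)
        if cached is None:
            cached = f"Hello, {name}!"
            store.set(key, cached)
        return cached
    except ConnectionError:
        return f"Hello, {name}!"  # degrade gracefully when the store is down
```

A test can then assert on store.commands to check exactly what was sent, or set store.fail_next = True to verify the fallback path, neither of which is easy against a real server.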

Denis: Also, I used to be a believer in this idea that if you have the right component tests with an in-memory database, all will be fine. But as soon as you get something on top of that, your worldview is ruined. You suddenly need to write complex mocks, and you don't know how to. For example, if you have Kafka, your code stays untested because you can't properly test it.

On top of low-level tests that utilize mocks you should also have a test that utilizes a real system. That's basically what I was trying to say (you need both types of testing, one cannot substitute the other), but from another point of view.

Key takeaways?

Denis: I think you should think thoroughly about the things that you are going to test. You should build some framework or platform and test it thoroughly with TestContainers. By using an API or a platform that guarantees the same behavior with mock and real data, you can write lots of unit tests using mocks that will test your complex business logic.

Seva: At stage one, you have to write more unit tests and component tests, and use the test coverage reports. This will ensure you cover everything you want to!

How large of a team do you assign to this? 

Ivan: It depends on the experience of engineers. 

Denis: To invest in the infrastructure you need at least two engineers, one focused on business tasks and the other one on technical debt backlog. 

Moving to the next stage or “stage two”, how can we improve?

Denis: Well, I guess the third level is already about having a systematic approach to testing organization-wide. You should start thinking about things such as efficiency at scale.

For example, for the end-to-end tests, if you prefer using Selenium, you can start to think about how to run them in parallel. Or if you have, for example, Playwright tests or other tests that check the UI, you could build dynamically created environments, allowing you to run these tests against each branch before merging. This requires more time to configure, but lets you perform more precise end-to-end tests. In order not to waste several hours running such tests, you select them case by case or requirement by requirement.

Should we start removing these tests at the same time as writing more on the low-level?

Ivan: Yes, we should keep the set of end-to-end tests as small as possible, because they are flaky and expensive, and not something one would run too often. So you should keep them short. And at this point, we have many people and a mature code base, and our whole pipeline is going to run longer. A developer should not wait more than 5 to 10 minutes to get that green check mark or red cross. So at this point, we should think about how to achieve this: maybe by not running all the tests, running tests in parallel, cutting the end-to-end tests, and writing more fast-running unit tests.

What about manual testing? 

Seva: If you're thinking about manual testing as a role, which you can do at least two hours a day while you're developing something, that probably makes sense.

Manual testing is common sense: you write the code, you run the environment, or you code your API. This is something you do manually while you're developing, and I’m not quite sure you need to do it repeatedly, stage by stage. As soon as you have some test automation and quality gates in place, there is no value in having a team member dedicated to this.

Denis: Yes, speaking of activities, manual testing exists in general as a practice rather than as a profession. And if you look at all types of testing, there are two big groups: a) regression testing and b) exploratory testing. Regression testing is a subject for automation, so you would automate your regression testing completely. Automation of exploratory testing is tricky, but on the other hand, exploratory testing is less resource-intensive, as it can be done by team members or product owners. In most technology companies using modern practices, testing is performed by developers, product owners, or both.

Ivan: I agree, no person should be specifically appointed to just read some script and repeat it without meaning.   

Denis: Still, while we do not have a dedicated person to perform this activity, we need a time and a place to do so, and for that, it would be nice to have an environment. While the developer may have finished their task, you or the product owner will need to deploy it somewhere. An interesting trend nowadays is getting rid of static test environments. As you scale and more teams work within the same set of test environments, it can cause frustration, and people start creating separate environments. A more modern approach is a dynamic environment that is created for each pull request.

It’s very interesting, but might be challenging, depending on your system. Monolithic applications are relatively simple. If your system requires a lot of services, it can be trickier, especially if a service is stateful: it means that for each pull request, you have to create your database and completely isolated instances.

Ivan: You need Kubernetes and cloud “magicians” to do this. But if you have such engineers in your team, it works nicely. This is the point where your docker-compose file from “stage zero” will help you a lot. If your whole application can still be run on a single laptop, you can use your docker-compose.yaml to run the application, test it, explore it, and make demos. And if you have engineers skilled in dealing with Kubernetes in the cloud, they can take this docker-compose file and automate things, and then you just open a pull request and get a link to an environment made especially for that branch.

Denis: Another challenge is creating the test database. And here I have a chance to talk a little bit about Synthesized, and what we are building. It allows you to create test databases very quickly. You can potentially use it for creating this dynamic environment. Whenever you create pull requests, you can create a test database for this environment very quickly and automatically.

Seva: To sum up, at “stage two”, we are removing end-to-end tests, writing fewer of them, and moving the checks down to the unit and component levels. We are also working more with environments and establishing a way to parallelise these tests.

At this point, dedicated resources are needed: engineers who maintain and work with all of that. This goes beyond building new features and running some tests. At this point, you need to create and support such infrastructure.

Denis: On the topic of creating a platform team, the reality of scaling calls for dedicated people who look after it.

Seva: The next level, “stage three”, is where you continue to improve at scale and look into pricing, contract testing, and so on.

Let us briefly discuss what quality gate means today.

Denis: The way developers typically work is that they create a pull request with code and somebody reviews it. But as part of this review process, you have automated checks. And typically you want to make the manual review as small as possible and do all possible checks automatically. This includes running tests, linters, static analysis, security analysis, and so on.

Then there are even dedicated solutions that provide these checks out of the box. You connect them and they create this quality gate for you. So they provide a set of tools that they run automatically for each build. And this is what a quality gate is.
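Reduced to its core, such a gate is a function from the results of the automated checks to a pass or fail decision. The Python sketch below is purely illustrative: BuildReport, its fields, and the 60% coverage threshold are hypothetical and do not describe how any particular tool works.

```python
from dataclasses import dataclass


@dataclass
class BuildReport:
    """Hypothetical summary a CI job might collect for one pull request."""
    tests_failed: int
    lint_errors: int
    branch_coverage: float  # fraction between 0.0 and 1.0


def quality_gate(report, min_branch_coverage=0.6):
    """Return the reasons the gate fails; an empty list means the gate is green."""
    reasons = []
    if report.tests_failed > 0:
        reasons.append(f"{report.tests_failed} test(s) failed")
    if report.lint_errors > 0:
        reasons.append(f"{report.lint_errors} lint error(s)")
    if report.branch_coverage < min_branch_coverage:
        reasons.append(
            f"branch coverage {report.branch_coverage:.0%} is below {min_branch_coverage:.0%}"
        )
    return reasons
```

For example, quality_gate(BuildReport(0, 0, 0.8)) returns an empty list, so the pull request would get its green check mark, while any failed test or lint error produces a readable reason for the red cross.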

Ivan: It is the thing that allows you to pass your code to the next stage after checking everything. And after the green check mark, you can push it to production. That doesn't mean that it will not break at some point but it does mean that you did everything to catch the bugs. Nothing more can be done.

Denis: Yes, and there is the management side beyond that. You have something called the “definition of done”. So if all checks pass, it means your increment is of sufficient quality, and it can be pushed to production.

Ivan: And in many projects, we still have a manual code review and discussion, even if all the CI tests are green. But when we are talking about most business projects, if the code is green then you can push.

Denis: Finally, there are the AI-driven code reviews, quite experimental as a concept. There are projects which are trying to automate that part to some extent. Overall, in the future, you'll be just creating pull requests and you'll get the green check box. And then you just push it to production. 

Ivan: Talking about the definition of quality gates… You know that even my five-year-old kid knows pretty well what a quality gate is. I’m working remotely from home sometimes and he knows that daddy is always busy and worried about red crosses, but as soon as that big green check mark appears dad finally has some time to play. So that’s what quality gates mean for him.