Clean unit tests in 7 lessons

It’s been a while since my last post here, simply due to lack of time. In the last months, I was busy preparing the Cantiga Project for its first release. Because of the amount of work, I sacrificed automated tests. Now that the first release is out, I’m trying to catch up and write tests for the various components.

At work, I’m an expert in building automated testing environments of different scales: from unit tests to large-scale acceptance tests of distributed systems. I’m also a mentor for new employees and interns, so I have a good overview of the knowledge they bring at the beginning. Do they know how to write big test suites? Unfortunately, they do not. And not because they forgot to learn it, but because it’s not that easy - I’ve been collecting this experience for the last 7 years, and I’m still learning something new. This inspired me to share my thoughts about building and maintaining automated test suites, organized into 7 lessons I have learned over time.

Lesson 1: Tests must be stable

‘Yesterday I fixed the tests for module XYZ’

‘Oh, and they work now? What did you do?’

‘I just deleted those 5 pseudotests that were failing randomly’

‘Dele… WHAT?’

It’s amazing how many wasted hours could be saved if developers were not so worried about deleting tests that have no chance to work. A test that fails randomly brings absolutely no value to developers, except frustration. Moreover, if they cannot understand what the test is doing, they cannot distinguish between a false positive and a real problem (and what about maintenance?). In my test automation projects, I follow a simple rule: if the developer is not able to make his/her test stable within a few days of adding it to the test suite, the test is removed. This way, we do not pollute the test suite with broken tests, and we focus on the areas where automation is simpler. If you take over a poorly maintained test environment, this strategy also helps with getting it back to work.

I have had the chance to work with several test suites that suffered from stability problems for a long time. All of them had three things in common:

  • developers had a lot of excuses not to touch the suites: these tests are important for us, these are complex scenarios (oh really?), we need more time to fix them (OH REALLY?), etc.,
  • the problems were caused by a small group of conceptually broken tests. These tests were introduced in violation of all the good software development principles: no code review, developers afraid to tell the author that his idea sucks, etc.,
  • over time, developers got used to committing new test cases that lacked stability, and nobody seemed to care.

I always start the stabilization by eliminating the bad habits: enforcing code review, and challenging authors to make their tests deterministic until they finally see that the idea doesn’t work. The next step is throwing away some tests. I do it by showing the test code to several developers and asking them whether they understand what the test is supposed to do, and what they would fix if they saw it fail. If the response is negative, the test is deleted. Once developers see that the tests have been stabilized, they start following these rules on their own.

Lesson 2: Don’t test experiments

TDD is a wonderful approach. Unfortunately, when you apply it to a highly experimental prototype, it usually fails. Experiments tend to change APIs and interface contracts so rapidly that maintaining an automated test suite becomes impossible within the time and budget constraints. Only a few days ago I heard yet another story like this, and I have had very similar experiences myself.

We do experiments for two reasons:

  1. to learn how the application and its API should look in order to solve the problem,
  2. to get the results as soon as possible.

Unit tests are not practically maintainable if the API is not stable. If you have experience in the given problem domain, you are able to start with a relatively stable API and write unit tests, and they help you achieve point #2. If not, you first need to gain that experience - and the more experiments you can do, the faster you gain it.

My current approach looks like this:

  1. for experimental projects, I start without any automated tests,
  2. when the project starts looking nice, and I see that it has a chance to work, I start thinking about making the API stable,
  3. here come unit tests - I use them as a tool for making the API stable: I specify the contracts, add missing methods, remove inconsistencies and let the unit tests ‘certify’ it,
  4. over time, as the project matures, I move towards a real TDD approach.

If the project is not that experimental, I can start with a relatively stable API, so I immediately jump to step 2 or 3.

Lesson 3: Clean code helps

Tests are about giving feedback to developers. The faster a developer can figure out what’s wrong, the better. To understand the problem, one must also know and understand the use case. This is a good place to introduce clean code habits into our test suites. One of my favorite ideas is dividing a unit test into given-when-then sections, because it clearly shows what’s going on and enforces testing one thing at a time. Such a unit test basically looks like this:

@Test
public void testSomething() {
   // Given
   someInitialSituation();

   // When
   Result result = component.doSomething();

   // Then
   assertSomething(result);
}

After a couple of years, I see a big readability difference when I get back to old tests that follow this practice, compared to those that do not. Other practices, such as paying attention to code formatting, or encapsulating repeated code into reusable methods, apply here as well. Treat your unit tests as regular production code that needs proper care and maintenance, and they will work for you for a very long time.
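As a small illustration of pulling repeated setup into a reusable method, a sketch could look like this (the Order, ShippingResult and shippingService names are invented for the example, not taken from any real project):

private Order newPaidOrder(String product, int quantity) {
   // Repeated setup lives in one place instead of being copied into every test.
   Order order = new Order(product, quantity);
   order.markAsPaid();
   return order;
}

@Test
public void shouldShipPaidOrder() {
   // Given
   Order order = newPaidOrder("book", 1);

   // When
   ShippingResult result = shippingService.ship(order);

   // Then
   assertTrue(result.isAccepted());
}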

Lesson 4: Follow SOLID principles

Last year, I read the book Building Maintainable Software by Joost Visser. Joost is Head of Research at SIG, an institute focused on measuring the maintainability of software. Initially, I was a bit skeptical about it, remembering bad experiences with pointless metrics at work, but I gave it a try. It was worth it. The book showed me how small things make a big difference in terms of software maintainability. The author starts with a metric, then explains why it matters, showing both numbers from real-life projects and practical examples of refactoring the source code of Jenkins CI.

I especially liked two metrics:

  1. write units of code (methods) that are no longer than 15 lines,
  2. limit the number of unit (method) arguments to at most 4.

I tried them in one of my projects and, surprisingly, writing unit tests became a lot easier. I looked at my code and noticed that these two limits enforce following the SOLID principles. If you can’t pass more than four things into a method, you need to introduce another class that wraps more values. If you can’t make methods longer than 15 lines, you notice that certain parts of the algorithm can now be moved to the new class. The Single Responsibility Principle comes in.
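A rough sketch of what that refactoring looks like (the names and the pricing formula are invented for illustration): instead of passing loose values around, group them into a small class and move the calculation next to the data it needs:

import java.math.BigDecimal;

// Before: calculatePrice(amount, discount, taxRate, roundingMode, currency) - five loose
// arguments that every caller has to provide and every mock has to expect.

// After: the values travel together, and the calculation becomes its own small unit.
public class PriceRequest {
   private final BigDecimal amount;
   private final BigDecimal discount;
   private final BigDecimal taxRate;

   public PriceRequest(BigDecimal amount, BigDecimal discount, BigDecimal taxRate) {
      this.amount = amount;
      this.discount = discount;
      this.taxRate = taxRate;
   }

   // The algorithm sits next to the data it needs: Single Responsibility in action.
   public BigDecimal finalPrice() {
      return amount.subtract(discount).multiply(BigDecimal.ONE.add(taxRate));
   }
}

A unit test for finalPrice() now needs no mocks at all - it constructs a PriceRequest and asserts on the result.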

More, smaller units mean that your problem is split into smaller subproblems and that you can test them independently. Smaller units and fewer arguments mean fewer mocks to write. Fewer mocks and smaller subproblems mean fewer mock interactions to program, and that means simpler and more readable unit tests.

Lesson 5: I/O is not stable

I often see ‘unit’ tests that initialize half of an application, set up a server, open network connections and do strange things. This has little to do with unit testing, and it definitely does not help make tests stable. I/O makes our tests depend on the state of the underlying operating system. Of course, there are harmless I/O operations, such as reading from a predefined configuration file, but in general, I/O is dangerous. In particular, I recommend avoiding socket operations. Sockets require acquiring a free TCP/UDP port, and in a continuous integration environment it is very hard to ensure that each job uses a distinct port. I was working with a Jenkins instance that managed hundreds of jobs, and there was a group of modules that caused problems because they often ran concurrently and tried to use the same ports. UDP is even more dangerous, because it does not guarantee delivery. And here we go back to lesson 1 - if you can’t make a test stable, delete it.

Personally, I avoid any intensive I/O in my tests, and I achieve this through careful API design in my applications: I separate the I/O calls from the algorithms that process the data by introducing simple facade interfaces that can be mocked in unit tests. There is a place for I/O in higher-level functional tests of the entire application or system, where there is a different tolerance for failures. By doing so, I can test the algorithms independently in unit tests, giving fast and deterministic feedback to the developers. Once I know that the algorithms work, it is very easy to spot any bug in the I/O code through functional or manual tests.
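A minimal sketch of that separation (the interface and class names are my own, not from any particular project): the algorithm depends on a tiny facade, the real implementation does the socket or file work, and the unit test substitutes an in-memory fake:

import java.util.Arrays;
import java.util.List;

// The facade is the only place that touches the network or the disk.
public interface MeasurementSource {
   List<Double> fetchSamples();
}

// The algorithm under test knows nothing about sockets or files.
public class AverageCalculator {
   private final MeasurementSource source;

   public AverageCalculator(MeasurementSource source) {
      this.source = source;
   }

   public double average() {
      List<Double> samples = source.fetchSamples();
      return samples.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
   }
}

// In the unit test, the facade is replaced by an in-memory fake - no ports, no timeouts.
@Test
public void shouldComputeAverageOfSamples() {
   // Given
   MeasurementSource fakeSource = () -> Arrays.asList(2.0, 4.0);
   AverageCalculator calculator = new AverageCalculator(fakeSource);

   // When
   double result = calculator.average();

   // Then
   assertEquals(3.0, result, 0.0001);
}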

Lesson 6: Simple scenarios first

If you are about to start a new test automation project, don’t automate complex scenarios first. Complex scenarios require deep expertise in the problem domain to make them stable, and they usually need advanced testing tools, too (plus the knowledge of how to use them). At the beginning, you have neither. Test automation projects that start with the goal of automating complex scenarios end up, after a couple of months, with one or two tests that hardly work and, on their own, bring very little value. This is because the team wasted most of the time fighting unexpected problems related to their lack of knowledge.

Start with simple scenarios. If it’s a GUI application, write a test that checks that all the menu options are in place. Then proceed to a more advanced test that opens all the pages to see if they work. Do it in small steps, learn, and after a while you will be able to automate even the most complex scenarios - and over time, you will build the necessary tools for that. Simple tests DO HAVE value, especially if they crawl the entire application, because they can spot the most obvious problems, such as broken links or pages that do not load at all.
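As an illustration only, such a simple crawling test with Selenium WebDriver could look roughly like this (the URL and the CSS selector are placeholders; the point is the modest scope, not the exact setup):

import java.util.List;
import java.util.stream.Collectors;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

@Test
public void allMenuPagesShouldLoad() {
   WebDriver driver = new ChromeDriver();
   try {
      driver.get("http://localhost:8080/");

      // Collect the menu links first, then visit each of them.
      List<String> menuUrls = driver.findElements(By.cssSelector("nav a")).stream()
         .map(link -> link.getAttribute("href"))
         .collect(Collectors.toList());
      assertFalse("The main menu should not be empty", menuUrls.isEmpty());

      for (String url : menuUrls) {
         driver.get(url);
         // A very crude check: the page loaded and is not an error page.
         assertFalse("Broken page: " + url, driver.getTitle().contains("404"));
      }
   } finally {
      driver.quit();
   }
}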

Lesson 7: Separate GUI tests

Every couple of months I have to push back on a developer who comes with an idea: hey, let’s write a cross-functional test that makes a change in the GUI and checks the response of 5 other modules. GUI tests are unstable by definition. While writing GUI tests, you have to deal with time-outs and the non-deterministic behavior of input event processing. So write two separate tests:

  • one for the GUI interactions,
  • one that checks the rest of the system, with the necessary actions triggered directly through API calls (see the sketch below).
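In practice, that split might look like this (a hedged sketch; the page object, services and methods are all invented names): a thin GUI test that only checks the user interaction, and a separate test that triggers the same action through the API and verifies the downstream effects:

// GUI test: only checks that the user interaction itself works.
@Test
public void submitButtonShouldShowConfirmation() {
   ordersPage.open();
   ordersPage.clickSubmit();
   assertTrue(ordersPage.confirmationIsVisible());
}

// Backend test: the same action triggered directly through the API,
// with the effects on the other modules verified without any GUI in between.
@Test
public void submittedOrderShouldReachWarehouseAndBilling() {
   orderService.submit(new Order("book", 1));
   assertTrue(warehouseService.hasPendingShipmentFor("book"));
   assertTrue(billingService.hasInvoiceFor("book"));
}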

A good study of the topic can be found in the book Experiences of Test Automation by Dorothy Graham and Mark Fewster. The book describes several big test automation projects, the history of their development and the final conclusions. Most of the chapters are not actually written by the authors themselves, but by developers and testers involved in those projects. The topic of GUI testing is covered in most of the chapters, and if you are interested in it, I can recommend reading this book.

Summary

Test automation projects have a complexity level similar to regular software development. At the same time, there are far fewer good books and publications covering this topic, and managers in many software companies tend to think that testing is good for interns. When I started working for my current company, the approach was very similar, and I spent a lot of time changing it. After a couple of years, my current team not only has a well-maintained automated test suite, but also the knowledge of how to keep it in good shape. Interns still come to the company and are involved in automated testing, but now they have an opportunity to learn something useful and work with excellent, mature tools. And now it’s time for me to bring the same experience to the Cantiga Project.