Common test smells

Most of us are familiar with the problems exhibited by systems that contain design smells (using the term “smell” as defined in Martin Fowler’s book Refactoring: Improving the Design of Existing Code).

Some of these problems are:

The system is hard to understand.
The system is hard to change because every simple change forces you to modify many other parts of the system.
The system is immobile because its parts are so tightly coupled that they cannot be reused independently.

Many books offer techniques to deal with these problems. A notable example is Martin Fowler’s refactoring book, which describes ways to improve the structure of code without altering its observable behavior. Other examples are Code Complete by Steve McConnell and Working Effectively with Legacy Code by Michael C. Feathers.

Reading books about good Software Engineering practices is a great way to grow in your career. However, I have observed that some developers tend to apply the acquired knowledge to production code only: they do not treat test code with the same care.

This is unfortunate. Test code should be held to the same quality standards as production code. Otherwise, the quality of the production code will decline over time, because you cannot refactor safely without good tests.

This is why smells in test code are as bad as smells in production code, and they can be even worse.

If the production code has design smells but the tests don’t, the tests give you a safety net that allows you to fix the smells through refactoring. But, when the problem is in the tests, how do you fix it?

This post is not about techniques to improve test code. Rather, it describes five common test smells that can help you identify when it may be necessary to take corrective actions.

1. Fragile tests

When some behavior changes in the system, it is expected that the tests that assert the old behavior fail. After all, the system does not exhibit the old behavior anymore.

This is the very reason why we write tests in the first place. We may change the behavior of the system unintentionally, and, when this happens, we want the tests to fail and warn us. This is how tests prevent bugs.

But, if we modify code without changing behavior and tests fail anyway, then the tests are failing for no valid reason.

These tests are fragile.

A fragile test is a test that breaks easily. It is a test that fails when it should not fail. It is a test that imposes a heavy burden because we are forced to revisit it often.

When we are forced to revisit tests more often than we should, we cannot change and improve code comfortably.

A good suite of tests makes refactoring easier. When the tests are fragile, the effect is the opposite.
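
As a minimal sketch of the difference, consider the Python example below. The ShoppingCart class and both tests are hypothetical, invented only for this illustration. The first test is coupled to how the cart stores its items, so it would break if the list were replaced by a dictionary, even though the behavior stays the same. The second test only checks observable behavior.

```python
class ShoppingCart:
    """Hypothetical class, invented for this illustration."""

    def __init__(self):
        # Internal detail: this list could become a dict in a refactoring.
        self._items = []

    def add(self, name, price, quantity=1):
        self._items.append((name, price, quantity))

    def total(self):
        return sum(price * quantity for _, price, quantity in self._items)


def test_add_fragile():
    # Fragile: inspects private storage, so a pure refactoring of
    # _items breaks this test without any change in behavior.
    cart = ShoppingCart()
    cart.add("book", 10, 2)
    assert cart._items == [("book", 10, 2)]


def test_total_reflects_added_items():
    # Robust: depends only on what the cart promises to its callers.
    cart = ShoppingCart()
    cart.add("book", 10, 2)
    assert cart.total() == 20
```

Both tests pass today. Only the second one is likely to keep passing after the internals are refactored.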

2. Slow tests

Tests may be slow.

A common cause is access to external sources, such as the file system, a database, or a distributed service.

The most obvious consequence of slow tests is that productivity decreases. We waste precious seconds every time we run the tests.

But there are other, less obvious, consequences: our ability to prevent bugs decreases and bugs become more expensive to fix.

Our ability to prevent bugs decreases because the key to bug prevention is getting immediate feedback about code changes. If we don’t get this feedback, we can introduce bugs and discover them the next day, or the next week, if we discover them at all. We get immediate feedback by running the tests with every change to the system; however, this is not practical if the tests are slow.

A nasty side effect is that bugs suddenly become more expensive to fix because they are discovered late, when the context is not fresh in our minds.

When tests take too long to run, we can't run them often enough. Productivity decreases and we lose our ability to deal with bugs in a cost-effective way.
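
The sketch below illustrates the cause mentioned above. All names are hypothetical: SlowUserRepository stands in for a real database, and its sleep call simulates the latency of an external source. The same behavior can be checked against an in-memory stand-in, which is one possible way (not the only one) to keep the feedback loop short.

```python
import time


class SlowUserRepository:
    """Hypothetical stand-in for a database; the sleep simulates latency."""

    def active_users(self):
        time.sleep(0.1)
        return ["ana", "luis"]


class InMemoryUserRepository:
    """Hypothetical in-memory replacement used only by the tests."""

    def __init__(self, users):
        self._users = list(users)

    def active_users(self):
        return list(self._users)


def count_active_users(repository):
    return len(repository.active_users())


def test_count_with_slow_repository():
    assert count_active_users(SlowUserRepository()) == 2


def test_count_with_in_memory_repository():
    assert count_active_users(InMemoryUserRepository(["ana", "luis"])) == 2


def elapsed_seconds(test):
    # Helper for the demonstration: how long does one test take?
    start = time.perf_counter()
    test()
    return time.perf_counter() - start
```

A tenth of a second looks harmless for one test, but it adds up quickly across a suite of hundreds or thousands of tests that runs on every change.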

3. Obscure tests

Automated tests have some non-obvious benefits.

One of them is that automated tests give us examples of how the system is used at the code level. Therefore, they help us understand the system better.

Another benefit is that tests offer defect localization. Tests must fail when we introduce defects; that is, when the behavior of the system changes in ways that it should not change. Ideally, the tests will help us locate the problem easily.

If we want the tests to be useful code examples and to help us locate defects effectively, the tests must be readable, not obscure. If the tests are obscure, their benefits are minimized because it is not easy to tell what the tests are testing. We can only reap the benefits of testing when the cause-effect relationship between the inputs and the outcomes of the tests is crystal-clear and easy to identify at first sight.

Tests must be intent-revealing; otherwise, they lose most of their value. The system becomes harder to understand and defects harder to diagnose.
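
To make the contrast concrete, here is a small sketch. The shipping_cost function and its pricing rule are assumptions made up for this example. Both tests check the same fact, but only the second one shows the relationship between input and outcome at first sight.

```python
def shipping_cost(weight_kg, express=False):
    """Hypothetical rule: 5 base plus 2 per kg, doubled for express."""
    cost = 5 + 2 * weight_kg
    return cost * 2 if express else cost


def test_1():
    # Obscure: cryptic name, unexplained values, and no visible link
    # between the inputs and the expected result.
    d = [3, True]
    assert shipping_cost(*d) == 22


def test_express_parcel_costs_twice_the_standard_rate():
    # Intent-revealing: the name states the behavior, and the
    # expected value is derived in a way the reader can follow.
    standard_cost_for_3_kg = 11

    cost = shipping_cost(weight_kg=3, express=True)

    assert cost == 2 * standard_cost_for_3_kg
```

If the second test fails, its name alone already tells us which rule was broken.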

4. Tests with conditional logic

When you introduce conditional statements and loops in a test, the complexity of the test increases. After a certain (very low) threshold, you cannot be sure that the test is bug-free and works as expected.

You need automated tests for the tests.

But then you need tests for the tests of the tests. When does this recursion stop?

The “trick” is to write tests that are so simple that they can easily be seen to be correct, thereby not requiring testing. This happens, for example, when tests contain only a few sequential statements.

When a test contains branches or loops, it is more complex than it should be. Complex tests can hide subtle bugs.

5. Assertion roulette

Tests must verify a single condition.

Another way of expressing the same concept is that tests must assert a single expected behavior.

This is easy to say. However, it is hard to pin down what exactly a single condition (or a single behavior) is. These notions are subjective.

What works best for me is thinking about tests in terms of their three well-known phases: arrange, act, and assert.

Testing a single condition does not imply that there is only one “physical” assert statement. It implies that there is a single act phase and a single assert phase within the same test. We avoid a series of act-assert pairs.

The problem with alternating act and assert phases in the same test is that, when the test fails, it is hard to determine the failing assertion because the test verifies multiple behaviors. When this happens, we say that we are experiencing assertion roulette.

Tests should have only one unambiguous reason to fail.
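
As an illustration, the sketch below uses a hypothetical Account class. The first test alternates act and assert phases. The other two each have a single act phase and a single assert phase, so a failure points directly at one behavior.

```python
class Account:
    """Hypothetical class, invented for this illustration."""

    def __init__(self, balance=0):
        self.balance = balance

    def deposit(self, amount):
        self.balance += amount

    def withdraw(self, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount


def test_account():
    # Smell: a series of act-assert pairs in one test.
    account = Account()
    account.deposit(100)           # act
    assert account.balance == 100  # assert
    account.withdraw(30)           # act again
    assert account.balance == 70   # assert again


def test_deposit_increases_balance():
    account = Account(balance=0)   # arrange
    account.deposit(100)           # act
    assert account.balance == 100  # assert


def test_withdraw_decreases_balance():
    account = Account(balance=100)  # arrange
    account.withdraw(30)            # act
    assert account.balance == 70    # assert
```

When test_account fails, we first have to work out which behavior broke. When test_withdraw_decreases_balance fails, the name already says it.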

Conclusion

If you think there are test smells that do not appear in this post, you are right. The nature of the problems that can arise during testing is too diverse to fit in a single post.

For example, another type of test smell is erratic tests: tests that behave in an apparently non-deterministic way. Sometimes they pass and sometimes they fail, and it is not clear why this happens or how to obtain predictable results.

Despite the necessary incompleteness of this post, I hope that it gives you an idea of what bad tests look like and the problems they can bring. Hopefully, this will motivate you and, next time you come across a bad test, you will feel the urge to act and fix it.