Characterization testing: adding tests to legacy code

Some people feel uneasy when they test-drive code, so they favor the traditional workflow where testing is an after-development activity. Other people, on the contrary, believe that adding automated tests after development is more challenging, so they favor a test-first approach.
Even though I belong to the second group, when I am dealing with legacy code I do not get to choose.
Adding tests to legacy code can be painful. Very painful.
When you write tests first, you know (or, at least, you have a fair idea about) what the new code needs to do. You have expectations for the new code, so you can capture these expectations as tests. In contrast, when you add tests to legacy code, your knowledge about the code can be a major limiting factor. Legacy code is convoluted, obscure, and hard to read, so, most often, you don’t understand the code well.
Lack of understanding is an insidious problem. Without a well-grounded approach, adding tests to an unfamiliar system is like crossing a busy avenue blindfolded: you may succeed, but the chances of a negative outcome are high. You may produce tests that are not significantly better than no tests at all, and this will make leaving the code untested a reasonable alternative.
This is a serious problem. As Frederick Brooks says: “the only constancy is change itself” [1]. The need for change always arises, and you need a comprehensive suite of tests to modify code safely.
As a solution to this problematic situation, Michael Feathers proposes a less-conventional approach to testing that is called characterization testing [2].
In this post, I describe what characterization tests are, what benefits they bring, and how we can write characterization tests effectively.
What are characterization tests?
A conventional way to look at automated tests is as documents that describe the expected behavior of the system. The feat of “tests as documentation” is difficult to achieve in legacy code, however. If we don’t understand the code well, the intent of the tests is not in our head; therefore, we don’t have expectations that we can easily articulate. Writing readable and intent-revealing tests seems impossible in this context.
The way out of this dead end is a shift in perspective. If you, just for a moment, relegate documentary value to a secondary role, you realize that writing tests just to invoke the system and observe its output offers insight and learning.
This perspective is what characterization tests exploit. Characterization testing relaxes the initial readability focus so that you can observe the actual system in operation and capture your observations as automated tests. As your knowledge grows, you will improve the readability of the tests, but the focus will be on making your observations more explicit and intent-revealing.
Characterization tests allow you to invoke the legacy code, observe what it does, and understand its behavior. When you write characterization tests, you do not document your expectations. You characterize the actual behavior of the system.
Benefits of characterization testing
The main benefit of characterization testing is that you improve your understanding of the legacy code. You add one test at a time, and this allows you to learn about the system incrementally.
This is not the only benefit, however. While you improve understanding, you gain a suite of tests almost for free. This suite, similarly to any other form of automated testing, helps you preserve the behavior of the system. If you change some behavior unintentionally, the characterization tests will notice and warn you. Characterization tests give you a safety net that reduces risk when applying changes.
An equivalent way to look at risk reduction is as bug prevention. If you run the tests often, every error they catch has lived for only a few seconds or minutes, instead of days, weeks, or even months. Characterization tests help you detect errors early, when they are cheaper to fix.
Characterization tests help you improve your understanding of legacy code, reduce risk when applying changes, and make errors easier to detect and fix.
Preserving behavior of legacy code
A common argument against characterization testing is that legacy code is often buggy. Why would you want to preserve behavior?
When a system has been in operation for a non-negligible amount of time, users depend on the way it works. Some behavior may look defective to you, but it is possible that users rely on that behavior.
Michael Feathers says:
“When a system goes into production, in a way, it becomes its own specification. We need to know when we are changing existing behavior regardless of whether we think it's right or not.”
Preserving behavior is important, but, when you write characterization tests, it is common to come across behavior whose correctness raises reasonable doubt. If you suspect that some behavior is a bug, get the opinion of other stakeholders. If it is a bug, go ahead and fix it.
How to write characterization tests
There is one thing that you will not do when you write characterization tests: look at functional specifications.
Functional specifications, if they exist, state what the system is supposed to do, not what it actually does. We don’t look for mismatches between the behavior that we expect and the behavior that the system exhibits. Characterization testing is not bug search; it is characterization of actual behavior. Therefore, we will look where the only truth about the system behavior lies: the code.
Looking at the code to write tests is not a bad thing. Characterization tests are white-box tests, and we can use this fact to our advantage. For example, we can use code coverage and mutation testing tools to help us decide what tests to write next. Assisted by these tools, we can achieve a comprehensive suite of tests more easily.
An algorithm for characterization testing
Michael Feathers suggests the following algorithm to write characterization tests:
1. Put a piece of code in a test harness; that is, call it from a test.
2. Write an assertion that you know will fail.
3. Let the failure tell you what the behavior is.
4. Change the test so that it expects the behavior that the code produces.
5. Repeat.
Step 1 is usually the hardest. You may want to call a method from a test, but, to do so, you need to instantiate the class that contains the method. Instantiating this class can be tough if it has undesired side effects, such as accessing a database or loading an expensive resource. You may need to break dependencies first.
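To make the idea of breaking a dependency more concrete, here is a minimal sketch in Python. All the names in it (ProductionDatabase, ReportBuilder, FakeDatabase, greeting) are hypothetical and invented for illustration. It assumes a legacy class that used to create its own database connection in the constructor, and it uses one possible dependency-breaking technique: passing the dependency in as a constructor parameter that defaults to the original behavior.

```python
import unittest


class ProductionDatabase:
    """Hypothetical dependency with a side effect that a test must avoid."""

    def __init__(self):
        # Stands in for opening a real connection.
        raise RuntimeError("would open a real database connection")

    def customer_name(self, customer_id: int) -> str:
        raise NotImplementedError


class ReportBuilder:
    """Hypothetical legacy class that originally built its own ProductionDatabase."""

    def __init__(self, database=None):
        # Callers that pass nothing keep the original behavior;
        # a test can pass a substitute instead.
        self.database = database if database is not None else ProductionDatabase()

    def greeting(self, customer_id: int) -> str:
        return "Dear " + self.database.customer_name(customer_id).upper()


class FakeDatabase:
    """In-memory substitute used only by the test."""

    def customer_name(self, customer_id: int) -> str:
        return "Ada"


class ReportBuilderTest(unittest.TestCase):
    def test_can_be_called_from_a_test(self):
        builder = ReportBuilder(database=FakeDatabase())
        self.assertEqual("Dear ADA", builder.greeting(1))
```

With the substitute in place, the class can be instantiated in a test without touching the database, and the code is ready for the remaining steps of the algorithm.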
An example of a characterization test
Steps 2 to 5 are the steps where you write characterization tests.
Suppose you want to add tests for a function called “padString”. You look at the code of this function, and, apparently, it pads a numeric input string with 0s so that it contains a certain number of digits. It also looks like it removes certain characters. However, you are not sure because the code is hard to understand.
Looking at the code, you are almost certain that, if you pass “3.45” into the function, it will not return “abc”, so you write the following failing test:

The name of the test is deliberately vague. You don’t have enough knowledge at this point to come up with a good intent-revealing name that states the behavior under test.
You run the test, and, as expected, it fails. You look at the assertion failure and observe that the actual output of the method is “0000000345”. Now you know more than you did before running the test.
At this point, you can update the test so that it asserts (and preserves) the actual behavior of the “padString” method. And, if you feel comfortable enough, you can also improve the test name:
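A possible version of the updated test, again as a Python sketch, is shown below. As before, pad_string is a hypothetical stand-in for the legacy function, and the new test name is just one reasonable guess based on the single observation made so far:

```python
import unittest


def pad_string(value: str) -> str:
    """Hypothetical stand-in for the legacy function, not the real code."""
    digits = "".join(ch for ch in value if ch.isdigit())
    return digits.zfill(10)


class PadStringTest(unittest.TestCase):
    def test_strips_dot_and_left_pads_with_zeros(self):
        # The expected value is the output observed in the earlier failure.
        self.assertEqual("0000000345", pad_string("3.45"))
```

The test now passes, and it will fail if a later change alters this behavior.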

After the test update, you can continue to write more tests. You will stop when you are satisfied with your understanding of the “padString” method and when the tests allow you to safely apply the changes that you want to make.
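As observations accumulate, one option is to keep them in a small table and check them all from one test. The sketch below assumes the same hypothetical pad_string stand-in as before. Only the “3.45” row comes from the example above; the other rows are observations of the stand-in and say nothing about what the real legacy code would return.

```python
import unittest


def pad_string(value: str) -> str:
    """Hypothetical stand-in for the legacy function, not the real code."""
    digits = "".join(ch for ch in value if ch.isdigit())
    return digits.zfill(10)


# Each pair is (input, output observed by running the code).
OBSERVATIONS = [
    ("3.45", "0000000345"),
    ("12-34", "0000001234"),
    ("", "0000000000"),
    ("12345678901", "12345678901"),
]


class PadStringCharacterizationTest(unittest.TestCase):
    def test_recorded_observations(self):
        for given, observed in OBSERVATIONS:
            # subTest reports each failing input separately.
            with self.subTest(given=given):
                self.assertEqual(observed, pad_string(given))
```

Each new row follows the same loop: write an expectation that fails, read the actual output from the failure, and record it.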
Conclusion
Characterization tests offer a different perspective on testing.
Unlike test-first approaches (such as Test-Driven Development), the focus is not on driving development with tests. Unlike other test-later approaches (such as manual exploratory testing), the focus is not on finding bugs. Instead, characterization testing focuses on (1) characterizing the behavior of legacy code to understand what it actually does and (2) preserving its behavior under changes.
Increasing our understanding of the code is key to minimizing the chance of errors when we modify it. This chance is also greatly reduced by the suite of tests that we get almost for free.
References
[1] The Mythical Man-Month: Essays on Software Engineering. Brooks, F. P. (1975, 2nd ed. 1995).
[2] Working Effectively with Legacy Code. Feathers, M. Prentice Hall Professional (2004).



