Code Structure vs Behavior in TDD

Recently, I wrote a post about my talk at the 1st International Conference on Test-Driven Development (TDD).

The post covers the part of the talk where I discuss the notion of robust tests and how unit size affects robustness.

In that post, I leave one question unanswered: how can we decide the right size of a unit?

This question is important. After all, there seems to be a correlation between the size of the units under test and test robustness.

Here, I address the question from two different perspectives — code structure and behavior — and I show that each perspective aligns with a different style of TDD — outside-in and classic.

As we will see, both TDD styles help us write tests that are robust.

Outside-in TDD is about code structure

In outside-in TDD, you write system-level scenarios (i.e., customer tests), and then you make them pass by proceeding towards the inner layers of the software, guided by lower-level tests. At each step, you identify responsibilities and distribute them across different modules (the systems under test, or SUTs). This approach requires test doubles (often loosely called mocks) to stand in for the modules that are not developed yet.

Test doubles can be seen as a design tool. When we are working on a module, we identify its needs; then, we decide whether the current module implements these needs or delegates them to depended-on modules. In the latter case, test doubles help us design the required interfaces.

Observe the relevance of code structure. We analyze responsibilities, and then we decide which modules to create, how they interact, and where to place test doubles. The resulting modules and test doubles impact the size of our units and the tests that we will subsequently write.

Deciding unit size

In this talk, Sandro Mancuso explains how we can reason about where to place test doubles.

Consider a class A that uses classes B and C. If you remember UML, you know that there are two types of associations: composition and aggregation.

In composition, B and C can be considered as part of A. If we incorporated B and C into A, A would still be cohesive. In this case, it makes sense to consider the three classes as a unit.

For example, let’s suppose A implements a tax-calculation algorithm. The algorithm checks marital status, sources of income, etc. These checks can go in different classes (B and C) but, if they didn’t, A would still be cohesive because all the logic has the same responsibility.
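As a rough sketch of the composition case, the tax example could look like the code below. The class names, the allowance amounts, and the 20% rate are all made up for illustration. The point is only that the test talks to A (here `TaxCalculator`) and treats B and C as internal parts of the same unit.

```python
class MaritalStatusRule:
    """Hypothetical class B: part of the tax algorithm."""

    def allowance(self, married: bool) -> int:
        return 2000 if married else 1000


class IncomeRule:
    """Hypothetical class C: also part of the tax algorithm."""

    def taxable(self, salary: int, dividends: int) -> int:
        return salary + dividends


class TaxCalculator:
    """Class A: owns B and C (composition)."""

    RATE_PERCENT = 20  # invented rate, for illustration only

    def __init__(self) -> None:
        self._marital = MaritalStatusRule()
        self._income = IncomeRule()

    def tax_due(self, salary: int, dividends: int, married: bool) -> int:
        base = self._income.taxable(salary, dividends)
        base -= self._marital.allowance(married)
        return max(base, 0) * self.RATE_PERCENT // 100


def test_married_taxpayer_owes_less_than_single_taxpayer():
    # No test doubles: the three classes are exercised together as one unit.
    calculator = TaxCalculator()
    married = calculator.tax_due(30000, 0, married=True)
    single = calculator.tax_due(30000, 0, married=False)
    assert married < single
```

If B and C were later folded back into A, this test would not need to change.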

When we have aggregation, B and C are not part of A. If we incorporated B and C into A, this class would not be cohesive. In this case, three units may make more sense.

For example, let’s suppose A implements the check-out process of an e-commerce application. This process involves several steps: payment, user notification, etc. Each step is complex, so it can go in a separate class (e.g., B can deal with payment and C can be an email service). If we incorporated B and C into A, this class would deal with unrelated responsibilities. Therefore, it makes sense to test the check-out process in isolation and mock the other classes.
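A minimal sketch of the aggregation case, using `unittest.mock.Mock` from the Python standard library. The `Checkout` class, its `place_order` method, and the `charge` and `send` methods of the collaborators are hypothetical names chosen for this example.

```python
from unittest.mock import Mock


class Checkout:
    """Class A: coordinates the steps but does not own their logic."""

    def __init__(self, payments, mailer):
        self._payments = payments  # class B: hypothetical payment gateway
        self._mailer = mailer      # class C: hypothetical email service

    def place_order(self, customer_email: str, amount_cents: int) -> bool:
        if not self._payments.charge(customer_email, amount_cents):
            return False
        self._mailer.send(customer_email, "Order confirmed")
        return True


def test_confirms_order_when_payment_succeeds():
    payments = Mock()
    payments.charge.return_value = True
    mailer = Mock()

    assert Checkout(payments, mailer).place_order("ann@example.com", 4200) is True
    mailer.send.assert_called_once_with("ann@example.com", "Order confirmed")


def test_sends_no_email_when_payment_fails():
    payments = Mock()
    payments.charge.return_value = False
    mailer = Mock()

    assert Checkout(payments, mailer).place_order("ann@example.com", 4200) is False
    mailer.send.assert_not_called()
```

Here the doubles also play the design role described earlier: writing the test forces a decision about what `charge` and `send` should look like before the real payment and email modules exist.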

Common pitfalls

We must watch out for common pitfalls in outside-in TDD.

Small units: overuse of test doubles leads to small units, higher coupling, and tests that are less robust.
Test class per production class: in extreme cases, the structure of the tests will mirror the structure of the code. This negatively affects robustness: every time we remove classes, tests will fail.
Breaking encapsulation: when we focus on code structure, it is tempting to test every method in every class, even private ones. Private logic changes often, so tests will break often.

How to increase robustness?

Backtracking. Once the modules in the inner layers are available, we can remove test doubles to increase the size of the units. We can also delete unnecessary tests.
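To make backtracking concrete, here is a small sketch with hypothetical `Pricing` and `LoyaltyDiscount` classes. The first test is what we might write while the inner module does not exist yet. The second is what it can become once the real module is available.

```python
from unittest.mock import Mock


class LoyaltyDiscount:
    """Hypothetical inner-layer module, developed after Pricing."""

    def percent_for(self, orders_so_far: int) -> int:
        return 10 if orders_so_far >= 5 else 0


class Pricing:
    def __init__(self, discount):
        self._discount = discount

    def total(self, subtotal_cents: int, orders_so_far: int) -> int:
        percent = self._discount.percent_for(orders_so_far)
        return subtotal_cents * (100 - percent) // 100


# Before backtracking: the collaborator is a test double, and the test
# knows how Pricing talks to it.
def test_total_applies_discount_from_double():
    discount = Mock()
    discount.percent_for.return_value = 10
    assert Pricing(discount).total(10000, orders_so_far=5) == 9000
    discount.percent_for.assert_called_once_with(5)


# After backtracking: the real module replaces the double, so the unit
# under test grows and the test only states the observable outcome.
def test_loyal_customer_pays_ten_percent_less():
    assert Pricing(LoyaltyDiscount()).total(10000, orders_so_far=5) == 9000
```

The second test keeps passing if the discount logic is later moved or renamed, as long as the outcome is the same. The first one does not.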

But remember that small units also have benefits. You will not always remove test doubles (or tests), but the option is worth keeping in your mental toolbox.

Classic TDD is about behavior

Classic TDD, as described by Kent Beck in the book “Test-Driven Development: By Example”, focuses on behavior, and code structure plays a less prominent role.

This behavior-centric perspective was not obvious to me when I read the book. I became fully aware when I watched this brilliant talk by Ian Cooper, where he makes the following observation:

Adding a new class is not the trigger for writing tests. The trigger is implementing a requirement.

Creating a new structure of code, such as a function or a class, is not sufficient reason to write new tests. You write new tests when the system must exhibit new behavior.

This is how behavior drives development. You specify one behavior at a time, as automated tests; and, every time you write a new test, you make it pass, making sure that the test runs in isolation from other tests — when tests are independent, they are more robust.

And you don’t use test doubles, except to replace external entities that introduce nondeterminism or slow the tests down. You don’t isolate units of code. The unit of isolation is the test, not the system under test.
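One case where a double still makes sense in this style is the system clock. The sketch below assumes a hypothetical `Greeter` class whose result depends on the time of day. The only thing replaced is the source of the current time, so the test gives the same result on every run.

```python
from datetime import datetime


class Greeter:
    """Hypothetical example: behavior that depends on the current time."""

    def __init__(self, now=datetime.now):
        # The clock is injected so that a test can pin it to a fixed instant.
        self._now = now

    def greeting(self) -> str:
        return "Good morning" if self._now().hour < 12 else "Good afternoon"


def fixed_clock(hour: int):
    # A hand-written stub standing in for the real clock.
    return lambda: datetime(2024, 1, 1, hour, 0)


def test_greets_with_good_morning_before_noon():
    assert Greeter(now=fixed_clock(9)).greeting() == "Good morning"


def test_greets_with_good_afternoon_from_noon_on():
    assert Greeter(now=fixed_clock(12)).greeting() == "Good afternoon"
```

Nothing inside the application is isolated here. The stub exists only because the real clock would make the test depend on when it runs.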

This is why the popular notion of unit test does not match classic TDD.

This is explicit in Kent Beck’s book. He uses the term “unit test” only once, to say that tests in TDD are not unit tests.

Kent Beck uses the term “small-scale tests”. Other terms are “developer tests” (by Ian Cooper) and “micro tests” (by GeePaw Hill). The common factor is the focus on behavior, not code structure.

To give us a feel for what behavior looks like, Kent Beck offers some examples in his book.

When he discusses tests, he states behavior: “we need to be able to add amounts in two different currencies”. He does not say: “we need to call an add method in a MoneyCalculator class”. No one gives you such a requirement.

So, when you formulate tests (e.g., using the Given-When-Then notation), make sure they state behavior:

GIVEN: two positive integer numbers.
WHEN: we multiply the numbers.
THEN: we obtain their product as a result.

Not code structure and implementation details:

GIVEN: two 32-bit positive integer numbers.
WHEN: we invoke the multiply method on the Calculator class.
THEN: the method returns the correct result as a 32-bit integer number.
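As a sketch, the behavior-oriented formulation might translate into a test like the one below. The `multiply` function is a hypothetical entry point, implemented with repeated addition only to stress that the test does not care how the result is produced.

```python
def multiply(a: int, b: int) -> int:
    # Hypothetical implementation: repeated addition. It could be swapped
    # for `a * b`, or split across several classes, without touching the test.
    result = 0
    for _ in range(b):
        result += a
    return result


def test_multiplying_two_positive_integers_gives_their_product():
    # GIVEN two positive integer numbers
    a, b = 6, 7
    # WHEN we multiply the numbers
    result = multiply(a, b)
    # THEN we obtain their product
    assert result == 42
```

The test still needs some entry point to reach the behavior, but its name and its assertion say nothing about classes, method names, or integer widths.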

Advantages of behavior-oriented tests

The focus on behavior of classic TDD has several advantages:

Tests are more robust: code structure is a volatile implementation detail; behavior is at a higher and more stable level of abstraction.
Tests are more intent-revealing: when tests verify small units of code, it may be hard to see the big picture. Tests that specify high-level behavior allow you to understand requirements better.

Conclusion

Classic TDD focuses on behavior, while code structure plays a more central role in outside-in TDD.

In outside-in TDD, you make some design decisions upfront, and these decisions affect the resulting code structure, the size of the units, and the tests that you write. Test robustness depends strongly on unit size. By contrast, in classic TDD, upfront design is minimized; design emerges as you make tests pass and refactor. The units of isolation are the tests, so test robustness depends on this isolation and on the focus on high-level behavior.

Outside-in TDD is not all about code structure; you also specify behavior: the behavior of isolated units of code. Classic TDD is not all about behavior; you need code structure to access the behavior in the system. But tests are unaware of internal implementation details; that is, tests have no knowledge of how the behavior is partitioned.

For more details on the topic of this blog post, you can watch my talk at the TDD conference here:

https://www.youtube.com/watch?v=APFbb5MwLEU&list=PLJ3Q-TNrdsXi-och0A0PaXKojDlxv4YsB

Mario Cervera's Blog