Part 1: How to Not Write Bad Unit Tests


0 Intro

First, some definitions:

With these definitions in mind, let us discuss some ways to write unit tests that are clear, effective, and reliable. The discussion below will cover general principles. Most examples are in Java and jUnit, but the ideas involved should be translatable to other languages and unit testing frameworks.

1 Stick to a single unit.

The whole point of unit tests is to verify one thing at a time. If the success of a particular test depends on the correct operation of more than one software component, then the cause of a failure of that test cannot be determined. Here's a contrived example of what not to do:

@Test
public void testBadCoupling() {
    // given
    Thing thing = new Thing(2);
    Sprocket sprocket = new Sprocket("sprocket", 3);
    // when
    int result = thing.getOffset() + sprocket.getVolume();
    // then
    assertEquals(5, result);
}

The problem lies in the when section, in that it calls two separate methods and checks the sum of the results. What should instead happen, is that the two methods should be checked separately.

Here's a sneaky example:

@Test
public void testSneakyMultipleUnits () {
    assertEquals(123, Thing.createInstance(456).doSomething(789));
}

That's two calls, again. It may seem trivial to call a static method to create an object, but it nevertheless is still a separate operation from calling methods on that object afterwards. The instance creation should go in the given portion of the test, and should also have test methods of its own. Here's what it should look like, expanded somewhat:

@Test
public void testThingCreateInstance() {
    // when
    Thing t = Thing.createInstance(456);
    // then
    assertNotNull(t);
    assertEquals(456, t.getOffset());
}

@Test
public void testThingDoSomething() {
    // given
    Thing t = Thing.createInstance(456);
    // when
    int result = t.doSomething(789);
    // then
    assertEquals(123, result);
}

The modified form actually tests instance creation and the doSomething method call separately. More tests are warranted, but at least no test relies on any feature of any unit that isn't already tested somewhere else.

At the far extreme, there is sometimes a temptation to put as much into a test as possible, to test as many components as possible at one time, with the idea that, if it passes --if-- then everything is OK. This might even be combined with high code coverage. But this approach is that of the foolhardy. Should the test fail, it will take a lot of effort to determine why it failed. Which piece caused the failure? If someone looking at the test can't determine why it failed, then the test is of no use to them.

2 Use the Given-When-Then Formulation.

Tests should be structured into three sections: assemble, act, and assert. Another way of labeling it is given/when/then, similar to formal logic proofs. This structure breaks up the logic of an individual test into discrete parts, making it easier to read, understand, and reason about.

@Test
public testSomething() {
    // given
    Thing thing = new Thing();
    // when
    int result = thing.calculate(13);
    // then
    assertEquals(81, result);
}

A test should specify an optional starting state, what unit (class + method) is being exercised, and what the expected result is. This typically takes the form of variable initializations, method calls, and assertions, respectively. Furthermore, subdividing a test method into these three sections (preferably with comments to denote the sections) helps organize and make the test more readable and understandable.

In some cases it may also be permissible to combine the second and third parts (when + then = expect), as when one calls a simple method and compares the return value immediately to an expected quantity, without bothering to store the result in a local variable:

@Test
public testSomethingRevisited() {
    // given
    Thing thing = new Thing();
    // expect
    assertEquals(81, thing.calculate(13)); // only do this when it's short
}

The Spock Framework has a given/when/then labelling mechanism built in. Various behind-the-scenes AST transformations are activated by the labels, making for simpler and more elegant test methods. For example, assertions can be written simply as expressions, without any need to call various assert* methods. This set of features alone is reason enough to recommend Spock.

def "test the calculate method"() {
    given:
    def thing = new Thing()
    when:
    def result = thing.calculate(13)
    then:
    81 == result
}

def "test the calculate method using an expect block"() {
    given:
    def thing = new Thing()
    expect:
    81 == thing.calculate(13)
}

There is also an optional piece in between given and when which serves to confirm the state of some object before the action takes place. This is called the "precondition" or "guard assertion". It is not meant to test the correctness of the given section; that should be covered by the then section of a separate test specific to that end. Rather, it documents a change of state of an object, by making assertions both before and after the action. A precondition is not useful for stateless actions, such as a pure functions.

@Test
public void testThingConstructorWithOffset() {
    // when
    Thing thing = new Thing(4);
    // then
    assertEquals(4, thing.getOffset());
}

@Test
public void testSetOffset() {
    // given
    Thing thing = new Thing(2);
    // precondition
    assertEquals(2, thing.getOffset());
    // when
    thing.setOffset(3);
    // then
    assertEquals(3, thing.getOffset());
}

http://wiki.c2.com/?ArrangeActAssert
https://en.wikipedia.org/wiki/Given-When-Then
http://spockframework.org/spock/docs/1.0/spock_primer.html (look for "blocks")

3 Use mocks and dependency injection.

Hard-coding dependencies hinders isolation of units. It's like forcing the testing of multiple units. This is related to #1 above. A unit test should cover as few pieces as possible, so that if it fails, we know exactly which component was the problem, which requirement about the operation of the software does not meet the requirements. If an object instantiates a number of sub-objects in its constructor, then it is effectively forcing the testing of multiple hidden units together.

Instead, make components that can be composed from other components as parts. Then, in the tests, substitute carefully defined stand-ins in place of the parts that would be used for normal operation. This allows for testing objects and sub-objects separately.

A fairly standard example of this is a database connection. If a component needs to talk to a database, have the constructor take an argument for the database connection. If the code is written in a statically-typed language, said argument should be an interface or abstract base class. At run-time, if no value is specified for the argument, use a default (likely whatever class that was going to be used anyways). At test-time, specify a purpose-built object that returns predefined answers, without talking to the database at all.

Let's work through an example:

Class App {
    Private DatabaseConnection connection;
    Public App() {
        connection = new DatabaseConnection("hard coded connection string");
    }
    // ...
}

This is just poor design to begin with. What if the db has to be changed? The only way to test this is to let it connect to the database specified by the connection string. Presumably, since it's hard-coded, that means that any tests running would be interacting the the same database as production. Dear reader, you would be right to be terrified of such a proposition.

class App {
    private DatabaseConnection connection;
    public App(connectionString) {
        connection = new DatabaseConnection(connectionString);
    }
    public static void Main(String[] args) {
        App app = new App("hard coded connection string");
        // ...
    }
    // ...
}

This is a little more flexible. It doesn't force tests to share the production database, but it still requires us to stand up a database instance to run the tests. Let's pull the DatabaseConnection reference out of the constructor completely:

interface Connection {
    // ...
}

class DatabaseConnection implements Connection {
    public DatabaseConnection(String connectionString) {
        // ...
    }
    // ...
}

class App {
    private Connection connection;
    public App(connectionString) {
        this(new DatabaseConnection(connectionString));
    }
    public App(connection) {
        this.connection = connection;
    }
    public static void Main(String[] args) {
        App app = new App("hard coded connection string");
        // ...
    }
    // ...
}

... and then create a stand-in class to use in a test:

class MockConnection implements Connection {
    // ...
}

class AppTest {
    @Test
    public testSomething() {
        // given
        Connection conn = new MockConnection(/* ... */);
        App app = new App(conn);

        // when
        // ...

        // then
        // ...
    }
    // ...
}

That is much more composable. Now the App class does not have to know anything about the actual connection being used. The app in production can still use the same connection string as in the original (but really, that should be configurable from an env var or properties file or something, though that's another topic entirely). The tests can use stand-in replacements for actual database connections, testing the underlying logic without needing a live DB instance. Everybody's happy.

The above examples use custom-made mock objects. There also exist mocking libraries that make it very easy to create mocks that match a certain signature or behave a certain way. For example, Mockito can use Java reflection to construct custom testing objects that inherit from a given production class on the fly.

Also, the term "mock" is being used here somewhat loosely, to mean any object that stands in during tests for what would be used in production. There is a veritable panoply of so called test doubles (e.g. mock objects, fake objects, dummy objects, test stubs, and more), each with subtle distinctions as to what particular fakery they perform. The distinctions are a subject for another time. They each serve the general purpose, though, of facilitating isolation testing of code.

https://en.wikipedia.org/wiki/Dependency_injection
https://en.wikipedia.org/wiki/Mock_object
https://en.wikipedia.org/wiki/Mockito
https://en.wikipedia.org/wiki/Test_double

4 Resist the temptation to eliminate duplication.

In other words, "Don't Don't Repeat Yourself." In normal programming, code duplication is a bad thing. Many practices in general programming are intended to reduce that duplication where appropriate. In tests, however, various forms of simple duplication can in fact increase isolation, making tests independent and more useful. Perhaps surprisingly, replacing duplicate test code can make tests coupled to each other and harder to understand.

https://en.wikipedia.org/wiki/Duplicate_code
https://en.wikipedia.org/wiki/Don%27t_repeat_yourself

4.1 Static variables

Gathering constants into static variables is frowned upon, but will work well enough. It may require some annoying refactoring later if and when requirements change, and tests have to change with them:

static final int OFFSET = 456; // this doesn't really improve anything
@Test
public void testThingCreateInstanceWithStaticValues() {
    // when
    Thing t = Thing.createInstance(OFFSET);
    // then
    assertNotNull(t);
    assertEquals(OFFSET, t.getOffset());
}

@Test
public void testThingDoSomethingWithStaticValues() {
    // given
    Thing t = Thing.createInstance(OFFSET);
    // when
    int result = t.doSomething(789);
    // then
    assertEquals(123, result);
}

This has the same function as using the literal values, but has a few minor, long-term downsides. In particular, if the requirements for the software should change such that each of the above tests methods requires different numerical values when constructing the objects, the code will have to be changed more broadly than if each method had simply used the raw numbers. An acceptable compromise would be to assign the value to local variables within each function. That way, the variable can be used to document the meaning of the numbers, without coupling the tests together.

@Test
public void testThingCreateInstanceWithStaticValues() {
    // given
    int meaningfulOffsetValue = 654;
    // when
    Thing t = Thing.createInstance(meaningfulOffsetValue);
    // then
    assertNotNull(t);
    assertEquals(meaningfulOffsetValue, t.getOffset());
}

@Test
public void testThingDoSomethingWithStaticValues() {
    // given
    int differentMeaningfulOffsetValue = 456;
    Thing t = Thing.createInstance(differentMeaningfulOffsetValue);
    // when
    int result = t.doSomething(789);
    // then
    assertEquals(123, result);
}

4.2 Assertions in other methods

On the other hand, extracting sets of assertions into methods to be called multiple times is a major red flag.

public void assertWidget(Component sub) {
    assertNotNull(sub);
    assertTrue(sub instanceof Widget);
    Widget widget = (Widget)sub;
    assertEquals(1, widget.getVolume());
}

public void assertSprocket(Component sub, String name) {
    assertNotNull(sub);
    assertTrue(sub instanceof Sprocket);
    Sprocket sprocket = (Sprocket)sub;
    assertEquals(name, sprocket.getName());
    assertEquals(1, sprocket.getVolume());
    assertEquals(0, sprocket.getValue());
}

public void assertThingInstance(Thing thing, int offset) {
    assertNotNull(thing);
    assertEquals(4, thing.countSubThings(Component.Type.All));
    assertWidget(thing.getSubThingAtIndex(0));
    assertSprocket(thing.getSubThingAtIndex(1), "base1");
    assertSprocket(thing.getSubThingAtIndex(2), "base2");
    assertSprocket(thing.getSubThingAtIndex(3), "base3");
}

@Test
public void testThingCreateInstanceWithAssertionsInMethod() {
    // when
    Thing t = Thing.createInstance(456);
    // then
    assertThingInstance(t, 456);
}

It's not technically incorrect. All of the warranted assertions exist, somewhere. But anyone looking at the method will have to go looking elsewhere to find where the assertions are located. The intention of the test is not clear at a glance. What's worse, if one of the assertions in assertSprocket fails, it won't be clear which Sprocket instance was the offending one.

4.3 Loops

Worse by far than extracting assertions into methods is putting assertions into a loop. With method calls, at least the point of failure can maybe be divined from a stacktrace. With loops, even that is not possible.

@Test
public void testThingCreateInstanceWithLoop() {
    // when
    Thing t = Thing.createInstance(456);
    // then
    assertNotNull(t);
    assertEquals(4, t.countSubThings(Component.Type.All));
    int i;
    for (i = 1; i < 4; i++) {
        Component sub = t.getSubThingAtIndex(i);
        assertNotNull(sub);
        assertTrue(sub instanceof Sprocket);
        Sprocket sprocket = (Sprocket)sub;
        assertEquals("base1" + i, sprocket.getName());
        assertEquals(0, sprocket.getVolume());
        assertEquals(1, sprocket.getValue());  // failure here, but which loop iteration?
    }
}

Which gives the following failure:

java.lang.AssertionError:
Expected :1
Actual   :2
at com.izrik.examples.ThingTest.testThingCreateInstanceWithLoop(ThingTest.java:140)

Woe betide the poor programmer who is tasked with uncovering the cause of failure. It is almost impossible to determine what has gone wrong without carefully examining the code to be tested, or running the test in a debugger, both of which defeat the entire purpose of automatic unit tests. Please don't ever do this.

4.4 Setup methods

Most testing libraries provide some kind of setup() method; try to avoid it if possible. The downsides to this are the same as the example above concerning static fields. Most test methods should be able to confine their setup to the given section of the test method. If not, then the units being tested are probably too complicated.

In jUnit, this is done with the @Before annotation:

public class WithSetupTest {

    Thing thing;

    @Before
    public void setup() {
        thing = new Thing(456);
    }

    @Test
    public void testThingCreateInstanceWithSetup() {
        // then ?
        assertNotNull(thing);
        assertEquals(456, thing.getOffset());
    }

    @Test
    public void testThingDoSomethingWithSetup() {
        // when
        int result = thing.doSomething(789);
        // then
        assertEquals(123, result);
    }
}

The setup() method gets called once for each test method. Again, there's not much benefit to the setup method, at least in this example. The separate test methods are now coupled; to make changes in one without changing the other, the setup code will have to be re-duplicated. What's more, in the example, the initialization of the object would normally go in the when section but is now separate from the test completely, which makes the purpose of the test harder to understand without looking at multiple methods.

4.5 When to duplicate

There is one case wherein it is allowable to de-duplicate code: testing libraries will usually provide a tearDown() method (corollary to setup()) to clean up resources that were allocated during the test. It is a good idea to use it if there is anything that needs to be properly disposed of at the end of the test(s). It can close files, sockets, database connections, and the like (not that the tests should be using these, but if they do...). It will be guaranteed to run after each test method, even if the test incurred a fatal error, such as an uncaught exception. The code that runs in the teardown method won't affect the outcome of the test, and using it is more reliable than trying to re-create exception-handling code in every test method.

In jUnit, this is done using the @After annotation.

public class WithTearDownTest {

    Thing thing;

    @Before
    public void setup() {
        thing = new Thing(123);
    }

    @Test
    public void testCalculate() {
        // when
        int result = thing.calculate(234);
        // then
        assertEquals(497, result);
    }

    @After
    public void tearDown() {
        thing.closeAllConnections();
    }
}

Even if there is an exception thrown by calculate(234), the tearDown method will still be called. This does tend to require the use of a setup method, but hopefully it shouldn't be needed often. Using the above principles should reduce the need for a tearDown.

To Be Continued

These suggestions are enough for now. They should give you some direction in what not to do to avoid some common obstacles to good tests. In the next installment, we will discuss some positive practices that will make tests better, in addition to the above bad practices to avoid. Until then, I bid you good day.