From Example-based unit-tests to property testing

I recently wrote about how many unit-tests one needs to write to reduce the number of faults in programs.

One conclusion is that this adds up to a large number of tests to write!

Surely, nobody has time to write that many tests by hand!

That’s where property testing enters the scene.

1. A simple function: toUpper

Imagine you have to write a new function named toUpper that takes a string as input, and returns the same string, but in upper case.

The place you work at uses Kotlin, so you decide to implement it in that language, and you write its signature:

fun toUpper(input: String): String

However you implement this, you can already write some tests to ensure that it works as expected.

2. Hand-written unit-tests

First, you think about a simple valid case, and you write it down, following the AAA (Arrange, Act, Assert) format:

@Test
fun nominalCase() {
  // Arrange
  val input = "hello"

  // Act
  val output = toUpper(input)

  // Assert
  output shouldBe "HELLO"
}

You try it, and all is well, the function seems to work.

However, you think about a weird case: what happens when an empty string is provided? Well, in this case, it makes sense to also expect an empty string.

So, again, you write your test:

@Test
fun emptyStringShouldBeValid() {
  val input = ""

  val output = toUpper(input)

  output shouldBe ""
}

3. Bugs happen

This function is shipped and used, but one day, someone says that it is sometimes broken. Indeed, when the function is given "é", it returns "E", but it should actually return "É".

That’s a new test case, so you write it, then fix the implementation so that it passes:

@Test
fun accentedLetterShouldKeepItsAccent() {
  val input = "é"

  val output = toUpper(input)

  output shouldBe "É"
}

It’s now fixed. While you were at it, you thought about all the other accented letters and fixed the code for those as well, but there are too many of them to test one by one. If it works for one, it must work for all of them. Plus, your code has been reviewed, so your confidence is high.

Yet, you get a new bug report. This time, a user says that given "´e", the function should return "´E", but instead it incorrectly returns "É".

You argue about the weirdness of this behaviour, but eventually accept the case, since Unicode allows it:

@Test
fun accentBeforeLetterShouldNotBeMerged() {
  val input = "´e"

  val output = toUpper(input)

  output shouldBe "´E"
}
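The two bug reports differ only in how the accent is encoded, which is easy to miss when reading the strings on screen. Here is a minimal sketch that makes the difference visible, assuming Kotlin 1.5 or later on the JVM. The article does not show how toUpper is implemented, so `toUpperSketch` is a stand-in that simply calls the standard library’s `uppercase()`, and `describe` is a hypothetical helper. The sketch also assumes that the accent in the second report is the standalone acute accent U+00B4.

```kotlin
// Hypothetical helper: lists the UTF-16 code units of a string as U+XXXX.
fun describe(s: String): String =
    s.map { "U+%04X".format(it.code) }.joinToString(" ")

// Stand-in for toUpper (assumption): the standard library's uppercase().
fun toUpperSketch(input: String): String = input.uppercase()
```

With these definitions, `describe("\u00E9")` shows a single precomposed letter, while `describe("\u00B4e")` shows two separate characters, an accent followed by a plain "e". Only the "e" has an uppercase form, so the accent has to stay where it is.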

4. What now?

This is a typical example of how development goes, with several things at play:

the original specification for your function wasn’t precise enough, leading to incorrect expectations (so bugs);
tests are added as users discover problems over time, covering only a few particular cases.

One could argue that the developer should have thought about all cases, but that is unrealistic.

Instead, we can ask computers to generate a large number of tests for us.

5. Property testing

To be able to ask our computer to generate cases for us, we need to think about tests a bit differently.

The computer can generate a set of values that we don’t necessarily know in advance.

For each value in that set, we should test something called a property.

A property is something that is true for the entire set of values generated. A good way to find properties is to think about the assumptions we made when writing the code.

Here, a simple one seems to be that the length of the input and that of the output are always equal. Let’s write this test with the help of kotlintest, a testing library that supports property testing:

@Test
fun inputAndOutputShouldHaveTheSameLength() {
  forAll(Gen.string().take(1000)) { input ->

    val output = toUpper(input)

    output.length shouldBe input.length
  }
}

The only real difference is that the arrange section of the test is a bit more complicated. It reads as follows: generate 1000 random strings, and check that the property holds for all of them.

Here, the library itself will generate those strings for us. The good thing is that this library can be improved over time, so all projects using it will benefit from tricky inputs other projects might have discovered as well. It is also very fast.
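To make the idea less magical, here is a rough sketch of what such a check boils down to, written without any testing library. It is not how kotlintest is implemented. The names `randomAsciiString`, `findCounterExample`, `toUpperSketch` and `lengthIsPreserved` are all hypothetical, the generator is deliberately limited to printable ASCII, and `toUpperSketch` stands in for toUpper by calling the standard library’s `uppercase()` (Kotlin 1.5 or later assumed).

```kotlin
import kotlin.random.Random

// Hypothetical generator: a random string of 0 to maxLength printable ASCII characters.
fun randomAsciiString(random: Random, maxLength: Int = 20): String {
    val length = random.nextInt(maxLength + 1)
    return (1..length)
        .map { (' '..'~').random(random) }
        .joinToString("")
}

// Runs the property on `iterations` generated inputs.
// Returns the first input for which it fails, or null if it held every time.
fun findCounterExample(
    iterations: Int,
    random: Random,
    property: (String) -> Boolean
): String? {
    repeat(iterations) {
        val input = randomAsciiString(random)
        if (!property(input)) return input
    }
    return null
}

// Stand-in for toUpper (assumption): the standard library's uppercase().
fun toUpperSketch(input: String): String = input.uppercase()

// The property under test: upper-casing does not change the length.
fun lengthIsPreserved(input: String): Boolean =
    toUpperSketch(input).length == input.length
```

With the generator restricted to ASCII, `findCounterExample(1000, Random(42), ::lengthIsPreserved)` returns `null`. A generator that covers more of Unicode can surface surprises: on the JVM, `"ß".uppercase()` is `"SS"`, so the length property would not hold for that input. Finding this kind of case is exactly what a richer generator is for.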

6. Other properties?

At first, finding properties that hold for your code doesn’t seem that easy, but it becomes easier with time, and it lets you reap the benefits of an automatic battery of property tests at very little cost.

In our case, trying to convert twice in a row should do nothing the second time. As a property, this can be written as toUpper(toUpper(input)) == toUpper(input).
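This idempotence property can be sketched as a plain function and checked on random inputs, again without relying on any library API. The names below are hypothetical, `toUpperSketch` is a stand-in for toUpper built on the standard library’s `uppercase()`, and the generator arbitrarily draws characters from U+0020 to U+00FF so that accented letters are included.

```kotlin
import kotlin.random.Random

// Stand-in for toUpper (assumption): the standard library's uppercase().
fun toUpperSketch(input: String): String = input.uppercase()

// Idempotence: upper-casing an already upper-cased string changes nothing.
fun isIdempotentOn(input: String): Boolean =
    toUpperSketch(toUpperSketch(input)) == toUpperSketch(input)

// Checks the property on `iterations` random strings of 0 to 19 characters
// drawn from U+0020..U+00FF.
fun idempotenceHolds(iterations: Int, random: Random): Boolean =
    (1..iterations).all {
        val input = List(random.nextInt(0, 20)) { ('\u0020'..'\u00FF').random(random) }
            .joinToString("")
        isIdempotentOn(input)
    }
```

A call such as `idempotenceHolds(1000, Random(1))` then plays the role of the test body. Note that the property never mentions what the expected output is, only how two calls relate to each other.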

More generally, we can even say that for some set of values, the input and the output are identical: toUpper(input) == input. This is true for strings that are already in uppercase, but also for strings containing only characters that do not have an uppercase variant, such as Chinese or Korean characters.

The opposite is also true: there is a set of values whose uppercase output is always different from the input, so in this case we would have toUpper(input) != input.
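Both of these properties depend on restricting the generator to a well-chosen set of inputs. Here is a minimal sketch under stated assumptions: `toUpperSketch` stands in for toUpper by calling the standard library’s `uppercase()`, and the two alphabets (`caseless` and `lowercase`) are small hypothetical samples, not an exhaustive description of either set.

```kotlin
import kotlin.random.Random

// Stand-in for toUpper (assumption): the standard library's uppercase().
fun toUpperSketch(input: String): String = input.uppercase()

// Hypothetical generator: a non-empty string (1 to 19 characters) drawn from an alphabet.
// Non-empty matters: the empty string is always equal to its own uppercase form.
fun randomStringFrom(alphabet: List<Char>, random: Random): String =
    List(random.nextInt(1, 20)) { alphabet.random(random) }.joinToString("")

// Characters without an uppercase variant: digits and a few CJK ideographs.
val caseless: List<Char> = ('0'..'9').toList() + ('\u4E00'..'\u4E2F').toList()

// Characters that always have a different uppercase variant.
val lowercase: List<Char> = ('a'..'z').toList()

// toUpper(input) == input for inputs made only of caseless characters.
fun unchangedForCaseless(iterations: Int, random: Random): Boolean =
    (1..iterations).all {
        val input = randomStringFrom(caseless, random)
        toUpperSketch(input) == input
    }

// toUpper(input) != input for non-empty inputs made only of lowercase letters.
fun changedForLowercase(iterations: Int, random: Random): Boolean =
    (1..iterations).all {
        val input = randomStringFrom(lowercase, random)
        toUpperSketch(input) != input
    }
```

The design choice here is that each property comes with its own generator. The stricter the claim, the narrower the set of inputs it can be checked against.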

7. Conclusion

Property testing lets us automate some unit-tests, and is likely to discover cases we didn’t think about. It is close to the concept of fuzzing.

Thinking about properties, or invariants that hold before and after the execution of a unit of code, is a slight departure from thinking about examples. Doing so helps us improve our designs and grasp them better.