Hedgehog - State Machine Testing

Hedgehog will eat all your bugs.

State machine testing in Hedgehog is an interesting and expressive testing paradigm for R.

The goal is to test object oriented code or complex systems involving state and hidden or abstracted functionality.

John Hughes has a series of excellent talks regarding testing of state based and non-deterministic systems using QuviQ’s proprietary QuickCheck implementation, which has been using these techniques to great effect for many years.

Here’s a quick example: the following code is a reference class which maintains a lookup for values, with integer keys. One could also imagine writing a fast implementation of this in c. If you haven’t seen reference classes before, that’s ok – one can think of them as akin to C++ classes.

refs <- setRefClass("Refs",
    fields = list(
        num = "numeric"
      , refs = "list"
      )
  , methods = list(
        initialize = function() .self$reset()
      , newRef = function() {
        .self$num <- .self$num + 1
        .self$refs[[.self$num]] <- 0
        return ( .self$num )
      }
      , readRef = function(i) {
        return ( .self$refs[[i]] )
      }
      , writeRef = function(i, a) {
        .self$refs[[i]] <- a
        invisible(NULL)
      }
      , reset = function() {
        .self$num = 0
        .self$refs = list()
        invisible(NULL)
      }
    )
)
grefs <- refs$new()

We use the grefs value as a global system which we will interact with.

How would we test this object? Using hedgehog, we will simulate a model for this class as a list of tuples, one for naming the map entry, and one for its value. For example:

list(key = 2, val = 0)

## $key
## [1] 2
## 
## $val
## [1] 0

The commands we can use to interact with the reference class api are creating a new map entry, writing to a map entry, and reading a map entry.

We will define a hedgehog command which will be used to test the newRef functionality of the class.

new  <- command ( "New",
    generator = function( state ) list()
  , execute   = function() grefs$newRef()
  , update    = function( state, output )
      snoc( state, list(key = output, val = 0))
  )

The command function takes a name of the function, which will be shown when presenting counterexamples, as well as a set of functions which are used when building and exercising this class’s functionality.

The generator function is provided with the current state of the system while we are running the generators and generates the list of arguments which should be passed to the new function while it is being executed.

Here, the new command doesn’t take an input, so this is the empty list (the arguments are curried to the property functions using do.call).

The execute function will take the inputs generated by the generator, and applies them. In our case, there is no input (as mentioned above), so this function takes no arguments. Inside the execute function we actually call the newRef() function of the class.

Finally the update function shown will add a new value to the test’s model of the system. This function is run both at generation time and during execution of the property. This is interesting because during generation, there isn’t actually a value for the output which we can pass to this function. Instead, we pass a symbolic value representing the output and what it will be during execution. This does have one interesting side effect, in that we can not permit one to look at or examine the value of the output in this function. It must be agnostic to whether the value is the type it expects.

For new the update function is rather simple, and just adds a new map entry with value 0 to the model.

We can now test this functionality with an initial state.

initialmodel <- list()

test_that( "Registry State Machine Model",
  forall( gen.actions ( initialmodel, list(new) ), function( actions ) {
    grefs$reset()
    expect_sequential( initialmodel, actions )
  })
)

## Test passed 🌈

And it passes. Hedgehog has generated a random list of commands (all of which are new) and run them.

Let’s look at the functions we have used: gen.actions is a generator which will build a list of commands to call, ensuring that the preconditions are sound, and providing sensible shrinks; the test, expect_sequential, is an expectation which will run the actions and the post condition expectations. Notice that inside the forall, we also call grefs$reset(); this ensures that the global state is in a pristine condition before the tests are run.

There aren’t actually any expectations in this model, so it’s not too surprising that the test passes.

Now we can make this test more interesting and will add a command for the readRef functionality of the system.

read <- command ( "Read",
    generator = function( state ) {
      if ( length(state) == 0 )
        return(NULL)
      list(
        key = gen.with(gen.element( state ), function(i) i$key )
      )}
  , require = function( state, key )
      !is.null ( Find( function( proc ) { proc$key == key } , state ) )
  , execute = function( key ) grefs$readRef(key)
  , ensure  = function( state, output, key ) {
      expected <- Find( function( proc ) { proc$key == key } , state )$val
      expect_equal( expected, output)
    }
  )

There quite a few interesting things going on here. Firstly the generator now is a function which will choose one of the keys to read. Notice that if the model is currently empty, we return NULL. This is one of the ways we ensure we only run read after a call to new has been performed. The require function is also applicable here, but furthermore tests that the generated function inputs are valid – this is important during shrinking, as we don’t want to cull a command during shrinking and then end up with an invalid state.

The execute function now takes the key to read, and can pass it to readRef.

Finally, we have an ensure function. After execution, we run this function, and any testthat expectations within it.

Ok, let’s add the final command:

write <- command ( "Write",
    generator = function( state ) {
      if ( length(state) == 0 )
        return(NULL)
      list (
        key = gen.map( function(i) i$key, gen.element( state ))
      , val = gen.int(10)
      )}
  , require = function( state, key, val )
      !is.null ( Find( function( proc ) { proc$key == key } , state ) )
  , execute = function( key, val ) grefs$writeRef( key, val )
  , update  = function( state, output, key, val )
      lapply( state, function(proc)
        if (proc$key == key) list(key = proc$key, val = val) else proc
      )
  )

Now we can run the tests with new, read, and write commands.

test_that( "Registry State Machine Model",
  forall( gen.actions ( initialmodel, list(new, read, write) ), function( actions ) {
    grefs$reset()
    expect_sequential( initialmodel, actions )
  })
)

## Test passed 😀

These tests now pass, meaning our model and our system are consistent. Let’s now subtly break something, so we can show how counterexamples and shrinking work. What we’ll do is write an incorrect write function, which writes a value incremented by 1, instead of the correct value. This will cause the model and reality to clash.

writeIncorrect <- command ( "Write (Broken)",
    generator = function( state ) {
      if ( length(state) == 0 )
        return(NULL)
      list (
        key = gen.with( gen.element( state ), function(i) i$key)
      , val = gen.int(10)
      )}
  , require = function( state, key, val )
      !is.null ( Find( function( proc ) { proc$key == key } , state ) )
  , execute = function( key, val ) grefs$writeRef( key, val + 1)
  , update  = function( state, output, key, val )
      lapply( state, function(proc)
        if (proc$key == key) list(key = proc$key, val = val) else proc
      )
  )

Now we’ll run the expectation with the incorrect write function as well.

test_that( "Registry State Machine Model",
  forall( gen.actions ( initialmodel, list(new, read, write, writeIncorrect) ), function( actions ) {
    grefs$reset()
    expect_sequential( initialmodel, actions )
  })
)

## ── Failure: Registry State Machine Model ───────────────────────────────────────
## Falsifiable after 14 tests, and 2 shrinks
## <expectation_failure/expectation/error/condition>
## `expected` not equal to `output`.
## 1/1 mismatches
## [1] 3 - 4 == -1
## Backtrace:
##      ▆
##   1. └─hedgehog::forall(...)
##   2.   └─hedgehog:::run.prop(property, counterexample$smallest, curry)
##   3.     ├─base::tryCatch(...)
##   4.     │ └─base (local) tryCatchList(expr, classes, parentenv, handlers)
##   5.     │   └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
##   6.     │     └─base (local) doTryCatch(return(expr), name, parentenv, handler)
##   7.     ├─base::withCallingHandlers(...)
##   8.     ├─base::do.call(property, arguments)
##   9.     └─`<fn>`(`<list>`)
##  10.       └─hedgehog::expect_sequential(initialmodel, actions)
##  11.         └─base::Reduce(...)
##  12.           └─hedgehog (local) f(init, x[[i]])
##  13.             └─hedgehog:::execute(acc$state, acc$environment, action)
##  14.               ├─base::do.call(...)
##  15.               └─global `<fn>`(state = `<list>`, output = 4, key = 1)
##  16.                 └─testthat::expect_equal(expected, output)
## Counterexample:
## [[1]]
## New 
## output variable: 1 
## 
## [[2]]
## Write (Broken) 
## inputs:
## $key
## Var 1 (symbolic)
## $val
## [1] 3
## 
## output variable: 2 
## 
## [[3]]
## Read 
## inputs:
## $key
## Var 1 (symbolic)
## output variable: 3

## Error:
## ! Test failed

One can see that the minimal shrink for incorrect behaviour has been found: where we create a new value; write to it with the broken function; and read the value, finding it is inconsistent with our model.