Hedgehog will eat all your bugs.
State machine testing in Hedgehog is an interesting and expressive testing paradigm for R.
The goal is to test object oriented code or complex systems involving state and hidden or abstracted functionality.
John Hughes has a series of excellent talks regarding testing of state based and non-deterministic systems using QuviQ’s proprietary QuickCheck implementation, which has been using these techniques to great effect for many years.
Here’s a quick example: the following code is a reference class which maintains a lookup for values, with integer keys. One could also imagine writing a fast implementation of this in c. If you haven’t seen reference classes before, that’s ok – one can think of them as akin to C++ classes.
refs <- setRefClass("Refs",
fields = list(
num = "numeric"
, refs = "list"
)
, methods = list(
initialize = function() .self$reset()
, newRef = function() {
.self$num <- .self$num + 1
.self$refs[[.self$num]] <- 0
return ( .self$num )
}
, readRef = function(i) {
return ( .self$refs[[i]] )
}
, writeRef = function(i, a) {
.self$refs[[i]] <- a
invisible(NULL)
}
, reset = function() {
.self$num = 0
.self$refs = list()
invisible(NULL)
}
)
)
grefs <- refs$new()
We use the grefs
value as a global system which we will
interact with.
How would we test this object? Using hedgehog, we will simulate a model for this class as a list of tuples, one for naming the map entry, and one for its value. For example:
## $key
## [1] 2
##
## $val
## [1] 0
The commands we can use to interact with the reference class api are creating a new map entry, writing to a map entry, and reading a map entry.
We will define a hedgehog command which will be used to test the
newRef
functionality of the class.
new <- command ( "New",
generator = function( state ) list()
, execute = function() grefs$newRef()
, update = function( state, output )
snoc( state, list(key = output, val = 0))
)
The command
function takes a name of the function, which
will be shown when presenting counterexamples, as well as a set of
functions which are used when building and exercising this class’s
functionality.
The generator function is provided with the current state of the
system while we are running the generators and generates the list of
arguments which should be passed to the new
function while
it is being executed.
Here, the new
command doesn’t take an input, so this is
the empty list (the arguments are curried to the property functions
using do.call
).
The execute
function will take the inputs generated by
the generator, and applies them. In our case, there is no input (as
mentioned above), so this function takes no arguments. Inside the
execute
function we actually call the newRef()
function of the class.
Finally the update
function shown will add a new value
to the test’s model of the system. This function is run both at
generation time and during execution of the property. This is
interesting because during generation, there isn’t actually a value for
the output which we can pass to this function. Instead, we pass a
symbolic value representing the output and what it will be during
execution. This does have one interesting side effect, in that we can
not permit one to look at or examine the value of the output in this
function. It must be agnostic to whether the value is the type it
expects.
For new
the update function is rather simple, and just
adds a new map entry with value 0
to the model.
We can now test this functionality with an initial state.
test_that( "Registry State Machine Model",
forall( gen.actions ( initialmodel, list(new) ), function( actions ) {
grefs$reset()
expect_sequential( initialmodel, actions )
})
)
## Test passed 🌈
And it passes. Hedgehog has generated a random list of commands (all
of which are new
) and run them.
Let’s look at the functions we have used: gen.actions
is
a generator which will build a list of commands to call, ensuring that
the preconditions are sound, and providing sensible shrinks; the test,
expect_sequential
, is an expectation which will run the
actions and the post condition expectations. Notice that inside the
forall
, we also call grefs$reset()
; this
ensures that the global state is in a pristine condition before the
tests are run.
There aren’t actually any expectations in this model, so it’s not too surprising that the test passes.
Now we can make this test more interesting and will add a command for
the readRef
functionality of the system.
read <- command ( "Read",
generator = function( state ) {
if ( length(state) == 0 )
return(NULL)
list(
key = gen.with(gen.element( state ), function(i) i$key )
)}
, require = function( state, key )
!is.null ( Find( function( proc ) { proc$key == key } , state ) )
, execute = function( key ) grefs$readRef(key)
, ensure = function( state, output, key ) {
expected <- Find( function( proc ) { proc$key == key } , state )$val
expect_equal( expected, output)
}
)
There quite a few interesting things going on here. Firstly the
generator now is a function which will choose one of the
keys
to read. Notice that if the model is currently empty,
we return NULL
. This is one of the ways we ensure we only
run read
after a call to new
has been
performed. The require
function is also applicable here,
but furthermore tests that the generated function inputs are valid –
this is important during shrinking, as we don’t want to cull a command
during shrinking and then end up with an invalid state.
The execute
function now takes the key
to
read, and can pass it to readRef
.
Finally, we have an ensure
function. After execution, we
run this function, and any testthat
expectations within
it.
Ok, let’s add the final command:
write <- command ( "Write",
generator = function( state ) {
if ( length(state) == 0 )
return(NULL)
list (
key = gen.map( function(i) i$key, gen.element( state ))
, val = gen.int(10)
)}
, require = function( state, key, val )
!is.null ( Find( function( proc ) { proc$key == key } , state ) )
, execute = function( key, val ) grefs$writeRef( key, val )
, update = function( state, output, key, val )
lapply( state, function(proc)
if (proc$key == key) list(key = proc$key, val = val) else proc
)
)
Now we can run the tests with new
, read
,
and write
commands.
test_that( "Registry State Machine Model",
forall( gen.actions ( initialmodel, list(new, read, write) ), function( actions ) {
grefs$reset()
expect_sequential( initialmodel, actions )
})
)
## Test passed 😀
These tests now pass, meaning our model and our system are
consistent. Let’s now subtly break something, so we can show how
counterexamples and shrinking work. What we’ll do is write an incorrect
write
function, which writes a value incremented by
1
, instead of the correct value. This will cause the model
and reality to clash.
writeIncorrect <- command ( "Write (Broken)",
generator = function( state ) {
if ( length(state) == 0 )
return(NULL)
list (
key = gen.with( gen.element( state ), function(i) i$key)
, val = gen.int(10)
)}
, require = function( state, key, val )
!is.null ( Find( function( proc ) { proc$key == key } , state ) )
, execute = function( key, val ) grefs$writeRef( key, val + 1)
, update = function( state, output, key, val )
lapply( state, function(proc)
if (proc$key == key) list(key = proc$key, val = val) else proc
)
)
Now we’ll run the expectation with the incorrect write function as well.
test_that( "Registry State Machine Model",
forall( gen.actions ( initialmodel, list(new, read, write, writeIncorrect) ), function( actions ) {
grefs$reset()
expect_sequential( initialmodel, actions )
})
)
## ── Failure: Registry State Machine Model ───────────────────────────────────────
## Falsifiable after 14 tests, and 2 shrinks
## <expectation_failure/expectation/error/condition>
## `expected` not equal to `output`.
## 1/1 mismatches
## [1] 3 - 4 == -1
## Backtrace:
## ▆
## 1. └─hedgehog::forall(...)
## 2. └─hedgehog:::run.prop(property, counterexample$smallest, curry)
## 3. ├─base::tryCatch(...)
## 4. │ └─base (local) tryCatchList(expr, classes, parentenv, handlers)
## 5. │ └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
## 6. │ └─base (local) doTryCatch(return(expr), name, parentenv, handler)
## 7. ├─base::withCallingHandlers(...)
## 8. ├─base::do.call(property, arguments)
## 9. └─`<fn>`(`<list>`)
## 10. └─hedgehog::expect_sequential(initialmodel, actions)
## 11. └─base::Reduce(...)
## 12. └─hedgehog (local) f(init, x[[i]])
## 13. └─hedgehog:::execute(acc$state, acc$environment, action)
## 14. ├─base::do.call(...)
## 15. └─global `<fn>`(state = `<list>`, output = 4, key = 1)
## 16. └─testthat::expect_equal(expected, output)
## Counterexample:
## [[1]]
## New
## output variable: 1
##
## [[2]]
## Write (Broken)
## inputs:
## $key
## Var 1 (symbolic)
## $val
## [1] 3
##
## output variable: 2
##
## [[3]]
## Read
## inputs:
## $key
## Var 1 (symbolic)
## output variable: 3
## Error:
## ! Test failed
One can see that the minimal shrink for incorrect behaviour has been found: where we create a new value; write to it with the broken function; and read the value, finding it is inconsistent with our model.