Hack w/ Limbic Media

Interactive Electronica and IoT

Visit Limbic Media

hack.limbicmedia.ca was created by the developers and engineers at Limbic Media to share their knowledge and experiences with DIY and hacker community-at-large.

by Jorge Aranda

Property-based testing with Hypothesis

We are big fans of automated testing at Limbic—our projects routinely have test coverage in the high-nineties, and we use our test suites as part of our continuous integration process.

However, until recently, most of our testing has been of the traditional, example-based kind. If you are at all familiar with unit testing, chances are this is what you are doing too, and there's nothing bad about it, but there are ways to improve your testing efforts significantly with a bit of effort. This post is about one of those techniques—property-based testing.

Let me illustrate with an example inspired by our work: suppose that in a Python project you have a utility function

point_in_polygon(point, polygon)

...which takes a point (in our case, a list of two floats) and a polygon (a list of at least three points) and returns True if the point is within the polygon and False otherwise1. This is useful for us to see, for instance, if a device reporting GPS coordinates has left or entered a region of interest.

Of course, we want to test our code. Basic tests would look like this:

def test_point_in_square():
    square = [[0.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]
    square_center = [0.5, 0.5]
    assert point_in_polygon(square_center, square) is True

def test_point_outside():
    square = [[0.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]
    outside = [1.5, 1.5]
    assert point_in_polygon(outside, square) is False

...and they pass. We get 100% test coverage on the function, too!

At this point two competing thoughts arise. First: we should not let our guard down; there are a bunch of potential problems here that we are not testing for. Second: on the other hand, we do need to deliver our code, and it seems to be working fairly well on these cases. It's unreasonable to add a test case for every single corner case all the time, especially as the combination of arguments for more complex functions explodes.

What to do in each case is a matter of judgment, but some tools help ease the dilemma. Let's see how Hypothesis and property-based testing help in this case.2

The first thing to realize is that tests of the kind I wrote above are example-based: we give the test framework some examples and we see if our code handles them successfully. With Hypothesis we do property-based testing: we define the inputs to our test as a function with some characteristics, and we see if our code handles a large sample of them in the way we expected.

A key technique of property-based testing is generating the right inputs for your test. If we wanted to run property-based tests on the example above, we would need a way to generate points and polygons. Since we represent polygons as lists of points, let's tackle points first. Hypothesis gives us a number of strategies to generate test inputs, one of them is called floats (it predictably generates floating-point numbers):

>>> from hypothesis.strategies import floats
>>> floats().example()
0.3333333333333333

Yes, that's a nice floating-point number. One more?

>>> floats().example()
nan

Oh right, nan is a valid float (so is inf, infinity). Do we allow nans in our function? Do we check for them? We're just getting started and we've found a potentially important thing to check. For now, let's merrily assume nan and inf are not valid inputs to our function. The floats strategy allows us to filter them out:

>>> floats(allow_nan=False, allow_infinity=False).example()
5.712024203845889e+18

That is a very big number, and we want our utility function to work in a geographical plane. For simplicity, let's constrain ourselves to numbers between -180 and 180 (which is out of bounds for latitude, but we don't care about that for now):

>>> floats(min_value=-180, max_value=180).example()
138.354040512508

Note that we can take out allow_nan and allow_inf as we are being explicit on the values we want. Now let's put those numbers on lists:

>>> from hypothesis.strategies import lists
>>> lists(floats(min_value=-180, max_value=180)).example()
[-133.70204562676602, -66.09805767846638, 107.1351659065852, 166.78629406401296]

The lists strategy allows us to declare the type of content we want, and gives us lists of arbitrary length with that type of content. In our case though, we want the list to have a specific length of two:

>>> lists(floats(min_value=-180, max_value=180), min_size=2, max_size=2).example()
[-143.25840064385602, 180.0]

That is pretty good. In our test file, we write:

points = lists(floats(min_value=-180, max_value=180), min_size=2, max_size=2)

Now onto polygons, which are hopefully easier once that we have their foundations ready. In our system, a polygon is a list of at least three vertices. The following should do the trick:

polygons = lists(points, min_size=3)

The first thing to try is to see if there is anything terribly wrong with the code—does it crash unexpectedly?

from hypothesis import given, reject

@given(points, polygons)
def test_point_in_polygon_does_not_go_down_in_flames(point, polygon):
    try:
        point_in_polygon(point, polygon)
    except:
        reject()

This code will generate random points and polygons (about 200 of them)3 and run them against our code. In our case, this works. If it hadn't, Hypothesis keeps the offending example in a database, and it'll be the first thing it will try next time you run it again.

Let's move on to a somewhat tougher test. Assuming we have a triangle, the point in the middle of all three vertices (mean(x), mean(y)) should be inside the triangle. Let's test for that.

triangles = lists(points, min_size=3, max_size=3)

@given(triangles)
def test_average_is_inside(triangle):
    average = [(triangle[0][0] + triangle[1][0] + triangle[2][0]) / 3.0,
               (triangle[0][1] + triangle[1][1] + triangle[2][1]) / 3.0]
    assert point_in_polygon(average, triangle) is True

In the snippet above we (inelegantly) calculate the mean on each dimension, call it average, and feed that along with our triangle to the function. Test away!

======================================================================
FAIL: test_geography.test_average_is_inside  
----------------------------------------------------------------------
Traceback (most recent call last):  
  File "/Users/jaranda/anaconda/envs/hypothesis/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/Users/jaranda/Projects/hypothesis-test/test/test_geography.py", line 52, in test_average_is_inside
    def test_average_is_inside(triangle):
  File "/Users/jaranda/anaconda/envs/hypothesis/lib/python2.7/site-packages/hypothesis/core.py", line 520, in wrapped_test
    print_example=True, is_final=True
  File "/Users/jaranda/anaconda/envs/hypothesis/lib/python2.7/site-packages/hypothesis/executors.py", line 58, in default_new_style_executor
    return function(data)
  File "/Users/jaranda/anaconda/envs/hypothesis/lib/python2.7/site-packages/hypothesis/core.py", line 110, in run
    return test(*args, **kwargs)
  File "/Users/jaranda/Projects/hypothesis-test/test/test_geography.py", line 55, in test_average_is_inside
    assert point_in_polygon(average, triangle) is True
AssertionError:  
-------------------- >> begin captured stdout << ---------------------
Falsifying example: test_average_is_inside(triangle=[[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]])

--------------------- >> end captured stdout << ----------------------

Hypothesis is telling us that it found a way to break our code: feeding it a triangle where all points are the same (in this case [0.0, 0.0]). This turned out to be a legitimate problem, not necessarily in point_in_polygon, but in our approach to handle these polygons in the rest of our system: while by definition this is not a valid polygon, polygons are created by our users, and we were not rejecting nor checking for "polygons" like these anywhere in our code. Another important improvement courtesy of following this testing approach.

What do we do about this? Let's say we decide to handle these potentially malformed polygons elsewhere, and consider only valid polygons here. Hypothesis gives us a filter() method to filter out generated inputs that we don't need.

def unique(vertices):  
    for index, coords in enumerate(vertices):
        if coords in vertices[index + 1:]:
            return False
    else:
        return True

triangles = lists(points, min_size=3, max_size=3).filter(unique)  

We now check that each point in the list of vertices is unique, and if it is not, we don't let it go through. Onwards!

...F
======================================================================
FAIL: test_geography.test_average_is_inside  
----------------------------------------------------------------------
Traceback (most recent call last):  
  File "/Users/jaranda/anaconda/envs/hypothesis/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/Users/jaranda/Projects/hypothesis-test/test/test_geography.py", line 52, in test_average_is_inside
    def test_average_is_inside(triangle):
  File "/Users/jaranda/anaconda/envs/hypothesis/lib/python2.7/site-packages/hypothesis/core.py", line 520, in wrapped_test
    print_example=True, is_final=True
  File "/Users/jaranda/anaconda/envs/hypothesis/lib/python2.7/site-packages/hypothesis/executors.py", line 58, in default_new_style_executor
    return function(data)
  File "/Users/jaranda/anaconda/envs/hypothesis/lib/python2.7/site-packages/hypothesis/core.py", line 110, in run
    return test(*args, **kwargs)
  File "/Users/jaranda/Projects/hypothesis-test/test/test_geography.py", line 55, in test_average_is_inside
    assert point_in_polygon(average, triangle) is True
AssertionError:  
-------------------- >> begin captured stdout << ---------------------
Falsifying example: test_average_is_inside(triangle=[[0.0, 0.0], [0.0, 5e-324], [0.0, 1e-323]])

--------------------- >> end captured stdout << ----------------------

Gut reaction: you got to be kidding me, Hypothesis. These are pretty much the same point; it should not be allowed! Also, how is this happening? After reviewing the text in the algorithm description website, we discover the following (emphasis added):

Don't mess with this code unless you're familiar with the idea of Simulation of Simplicity. This pretends to shift the ray infinitesimally down so that it either clearly intersects, or clearly doesn't touch. Since this is merely a conceptual, infinitesimal, shift, it never creates an intersection that didn't exist before, and never destroys an intersection that clearly existed before.

Well I'm afraid I'm not familiar with the idea of Simulation of Simplicity, and I am still unsure what is causing the error. But I suspect now that these infinitesimal details are actually important when the inputs have over three hundred leading zeroes.

Two consequences of this. First, fine, when we check for whether points in a polygon are the same, we should also check if they are pretty close, not just identical.4 Second, after assuming that, we still need to fix our test generation code. Here is one approach, overwriting our unique check (the code for isclose comes from this PEP proposal; this is an easier problem in Python 3):

def isclose(a, b, rel_tol=1e-09, abs_tol=0.0001):  
    return abs(a-b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)


def unique(vertices):  
    for index, coords in enumerate(vertices):
        for compare_coords in vertices[index + 1:]:
            if isclose(coords[0], compare_coords[0]) or isclose(coords[1], compare_coords[1]):
                return False
    else:
        return True

Let's run it...

....
----------------------------------------------------------------------
Ran 4 tests in 0.739s

OK  

It works!

Or, it kind of works. I've discovered other issues that I'm skipping here—for instance, it turns out that vertices along a line are also not valid polygons, and we also need to check for that—but that we need to deal with in our actual projects. I'm happy I've found them.

So, bottom line assessment. After spending some time doing property-based testing on seemingly innocuous code, we found:

  • We'll need to validate our inputs, filtering out nan and inf
  • When creating or updating these polygons, we need to reject inputs that have any identical vertices
  • ...even better, we need to reject inputs that have almost-identical vertices

This came out only at the costs of struggling somewhat with creating inputs in the right formats (which I find gets easier as you gain more practice), and of somewhat longer test run times (which are caused by running about 200 tests per test function). To be fair, we might have found these problems with example-based testing too, but property-based testing makes it harder to blind ourselves to the corner cases that we would otherwise blissfully avoid. The balance, in my view, is clearly in favour of introducing this kind of testing, not exclusively perhaps, but as a component of our automated test suites.


  1. Our implementation follows the PNPOLY algorithm as described here

  2. Hypothesis is mainly the brainchild of David MacIver, and I first heard about it in Michael Kennedy's excellent Talk Python to Me podcast

  3. Semi-random, rather. Hypothesis is good at finding and focusing on devious corner cases

  4. This is an illustration of David MacIver's warning that working with floating point numbers is pretty hard

Need help with a similar project?

Innovative Digital Art

Limbic Media

Consulting Engineering

Limbic Consulting