Afterwood

There is a classic puzzle that goes:

The poor have it, the rich need it, and if you eat it you’ll die. What is it?

If you haven’t come across this before and Google is out of reach because you’re reading the printed edition while going through a tunnel or an internet blackspot, the answer is “nothing”. I think it would be fairly easy to come up with a programming-specific version of this puzzle, as there appear to be quite a few variants of “nothing” in our world, many of which occupy a significant amount of our time thanks to their cunningly camouflaged outfits.

It probably seems strange to us now but once upon a time there was no zero. Essentially you had something which was countable or you had nothing. There were no negative numbers either and nothing preceded one (no pun intended, for once). I recently read The Nothing that Is: A Natural History of Zero and it got me thinking about the ways we represent nothingness in programming and the problems it causes.

Zero is almost certainly our go-to choice for representing a lack of something. It has been with us since our early days of schooling, and support for integer values has been around in programming pretty much since the beginning, whether you’re using assembly or a high-level language. Even most children could tell you that “numApples = 0” means you don’t have any apples.

Where it starts to go astray is when we have to squeeze the concept of “none” into a domain that doesn’t really support it because you are no longer representing countable things. A classic example here is representing dates where day zero represents some epoch like 1st January 1970 or 1st January 1900 and negative values stretch backwards in time. Here every value in the domain represents a valid value and so if we want something to accommodate the notion of “no value” we probably have to repurpose one or other end of the spectrum, e.g. INT_MIN or INT_MAX.
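As a minimal Python sketch of that repurposing, assume dates are held as a signed 32-bit count of days since 1st January 1970 and that INT_MIN has been borrowed to mean “no date”. The names NO_DATE and days_overdue are hypothetical, invented purely for illustration.

```python
from datetime import date

# Hypothetical representation: a date is a count of days since the epoch.
EPOCH = date(1970, 1, 1)
INT_MIN = -2**31      # smallest signed 32-bit value
NO_DATE = INT_MIN     # assumption: one end of the range repurposed as "no date"


def days_overdue(due: int, today: int) -> int:
    """Days by which `today` has passed `due` (negative if not yet due)."""
    return today - due


today = (date(2018, 7, 9) - EPOCH).days

# An ordinary date behaves as expected.
print(days_overdue(today - 3, today))

# The sentinel is still just an integer, so the arithmetic happily
# "works" and reports something millions of years overdue.
print(days_overdue(NO_DATE, today))
```

Nothing in the signature of days_overdue hints that one particular integer must be checked for first; that knowledge lives only in the heads of the people who wrote it.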

What about if we’re searching an array for some value or object and there is nothing to be found? If our array starts at index 1 (e.g. Visual Basic) we could use the value 0, but many languages have adopted the 0-based approach, and Stan Kelly-Bootle’s suggestion of using 0.5 has never really received any uptake. Languages like Java and C#, which are inherently based on signed integers, can return any negative value for the “none found” answer. In C++, where unsigned integers are the preferred choice for sizes and indices, we have no such luck and have instead concocted a magical value by the name of “npos” for strings, which (implementation-wise) sits within the valid range of values but on the precipice, such that you’d probably run out of memory long before it could ever be a valid result.
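The signed and unsigned conventions are closer than they look. The Python sketch below assumes a 64-bit unsigned size type, and uses the hypothetical names NPOS, to_unsigned and find_index to mimic a search that reports “not found” with -1.

```python
SIZE_BITS = 64             # assumption: a 64-bit unsigned size type
NPOS = 2**SIZE_BITS - 1    # largest representable value, mimicking C++'s npos


def to_unsigned(value: int) -> int:
    """Reinterpret a signed integer as an unsigned SIZE_BITS-wide one."""
    return value % 2**SIZE_BITS


def find_index(items: list, target) -> int:
    """Return the 0-based index of `target`, or -1 when it is absent."""
    for index, item in enumerate(items):
        if item == target:
            return index
    return -1


# Viewed through an unsigned lens, the signed "not found" answer lands
# on the very last value in the range.
print(to_unsigned(find_index(["a", "b"], "z")) == NPOS)
```

In other words the unsigned sentinel is the same -1 wearing a different outfit, which is how the C++ standard library defines npos in the first place.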

Sadly the use of -1 (in either of its signed or unsigned guises) as both a perfectly good response to a question and as a way of signalling an error has only succeeded in muddying the waters further. The Windows API, for example, uses the constants LB_ERR (and CB_ERR) in this way, which means you often stumble across code that initialises variables with it, e.g. index = LB_ERR, because it allows us to exploit a duality of semantics (“no value, yet” and “not found”) and write less code, irrespective of whether it makes the code any easier to comprehend. (You might argue not finding it is an error; either way you still have the same type representing two different domains – index and error.)

With enumerated types we often walk right into the same trap with our eyes wide open thinking we’re being clever by adding an explicit value called None or Default (usually with a value of 0 in languages that zero-initialise values and references for “safety’s sake”).
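A small Python sketch of the trap, using a hypothetical Channel enumeration and describe function that I have made up for the purpose:

```python
from enum import Enum


class Channel(Enum):
    NONE = 0     # the "clever" extra member meaning "no channel"
    EMAIL = 1
    SMS = 2


def describe(channel: Channel) -> str:
    """Return a human-readable label for a real channel."""
    labels = {Channel.EMAIL: "e-mail", Channel.SMS: "text message"}
    # Channel.NONE is a perfectly valid Channel as far as the type is
    # concerned, yet it has no label, so this lookup raises KeyError.
    return labels[channel]


print(describe(Channel.EMAIL))
print(len(Channel))   # the pseudo-value is counted as a member too
```

Every function that accepts a Channel now has to decide what NONE means to it, even those for which “no channel” makes no sense at all.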

Of course, when you’re forced to abuse the type system it will get its own back. By passing off two different kinds of result as a single type in the normal course of events, you trick callers into believing it’s safe to simply compose functions, when in reality they’re just storing up a world of pain in the form of an IndexOutOfBounds exception, an access violation or, if really unlucky, undefined behaviour and subsequent data corruption.
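Python offers a particularly quiet version of this failure, because str.find returns -1 when nothing is found and negative indices are perfectly legal. The function first_char_after below is a hypothetical, deliberately naive composition of a search and an index operation.

```python
def first_char_after(text: str, marker: str) -> str:
    """Naively return the character that follows `marker` in `text`."""
    position = text.find(marker)           # -1 when the marker is absent
    return text[position + len(marker)]    # trusts position blindly


# The marker is present, so the answer is the one we wanted.
print(first_char_after("key=value", "="))

# The marker is absent: -1 + 1 is 0, a valid index, so we are handed
# the first character of the string with no complaint from anyone.
print(first_char_after("key=value", ":"))
```

There is no exception and no crash here, just a plausible-looking wrong answer flowing on into the rest of the program.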

The NullPointerException, or “NPE” as it is commonly referred to in the Java world, is a blight on modern programming, caused in part by our use of programming languages which embrace reference types over value types, meaning that any of our objects can potentially exist or not exist. Unless the use is entirely local it can be difficult to reason about an object’s existence, and therefore null checks can easily come to dominate a codebase in an act of overly defensive programming. The introduction of the “?.” operator in some languages might reduce the noise, but it’s just a case of treating the symptoms, not the disease.

This particular foe likes to disguise itself by changing its name too, but whether it goes by null, NULL, nullptr, nil, 0, end, -1, npos, “”, NaN, None, etc. we should be on our guard and be ready to banish it to computing history or, at the very least, quarantine it.

But what can be done? Surely we can’t undo the past? Well, maybe we can. Over the years awareness of the Optional (Option, Maybe, etc.) type has grown, so that it’s no longer just a niche technique used by Comp Sci purists. The desire to right this wrong is so strong in some circles that there is currently a preview of C# [1] where reference types have been given the Nullable makeover, thereby allowing us to finally consider deleting our own homebrew variants and deprecating our static analysis annotations in favour of a kosher type annotation. Surprised?
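To give a flavour of the idea, here is a minimal sketch using Python’s typing.Optional; the function names try_find_index and item_after are hypothetical. Python does not enforce these annotations at run time, so the benefit only materialises when a static type checker is run over the code.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def try_find_index(items: Sequence[T], target: T) -> Optional[int]:
    """Return the index of `target`, or None when it is absent."""
    for index, item in enumerate(items):
        if item == target:
            return index
    return None


def item_after(items: Sequence[T], target: T) -> Optional[T]:
    """Return the item that follows `target`, if there is one."""
    index = try_find_index(items, target)
    # Compare against None explicitly: index 0 is a valid answer
    # that also happens to be falsy.
    if index is None:
        return None
    following = index + 1
    return items[following] if following < len(items) else None


print(item_after(["a", "b", "c"], "b"))
print(item_after(["a", "b", "c"], "z"))
```

The return type now admits out loud that there may be no answer, so a caller who skips the check is contradicting the signature rather than merely forgetting a convention.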

One of Shakespeare’s most famous comedies is titled Much Ado About Nothing. Given the amount of time we’ve lost over the years debugging issues caused by our inability to express “nothing” in a way that is obvious to our fellow programmers, I’d say it was no laughing matter. We need to realise that failure can indeed be an option and that the type system should be there to help us, nothing more, nothing less.

[1] https://github.com/dotnet/csharplang/wiki/Nullable-Reference-Types-Preview

Chris Oldwood
09 July 2018

Biography

Chris is a freelance programmer who started out as a bedroom coder in the ’80s writing assembler on 8-bit micros. These days it’s enterprise-grade technology in plush corporate offices. He also commentates on the Godmanchester duck race and can be easily distracted via gort@cix.co.uk or @chrisoldwood.