List Incomprehension

 

I’ve never really got on very well with the “list” data type. Back when I started programming professionally I was writing code in C and we had to create many of the basic data types ourselves. Generally speaking, at the time, if you needed a variable sized collection you opted for a linked-list, and usually a doubley-linked list at that. This gave you more flexibility when it came to traversing, and also the same underlying structure could be used for a Stack or Queue depending on whether you needed to add or remove from the head or tail.

 

As a consequence I grew up for many years believing a list always had a head, a body and a tail; where the head and tail were a single item at the front and back respectively and the body was all the elements in between. The side-effect of this distorted mental model was a great difficulty [1] understanding the concept of Typelists when Andrei Alexandrescu published his seminal work Modern C++ Design.

 

I lay some of the blame for my early programming ills firmly at the door of MFC. The following snippet shows what MFC’s idea of a list class does (of course it also reinforces my own already misguided belief of what a list was at that point too):-

 

CList list;

 

list.AddHead(42);

list.AddTail(99);

 

ASSERT(list.GetHead() == 42);

ASSERT(list.GetTail() == 99);

 

Around the same time as I was struggling to grasp what I later discovered was a “recursive model” of lists (single item head + multi-itemed tail) I was also trying to learn Python. Unlike C++, where the default container type was an array (std::vector), Python was presented around the list type and terms like “list comprehension” and “cons operator” did very little to help me understand what was going on. It didn’t help either that the list type smelt an awful lot like an array to me and further talk of slices and shallow copies just seemed a far cry from the safe world of C++ that I was used to.

 

It’s probably no surprise I didn’t really grok the list stuff in Python when I didn’t exactly understand the true power of iterators in C++ either. For that light bulb moment to happen I had to read Matthew Wilson’s Extended STL book. Up until that point iterators seemed to me mostly just “clever” pointers, but what Matthew’s book did was to show me the more the powerful idea of representing other types of “collections” as sequences such as a file-system directory or a database table. For example, to iterate over the files in a folder using C++ 98 might go something like this:-

 

FolderIterator it(“C:\\Temp”);

FolderIterator end;

 

for (; it != end; ++it)

{

    const std::string& filename = *it;

    . . .

}

 

At around the same time I joined ACCU and the first local branch meeting I attended was in Cambridge. This was hosted by Jez Higgins and the title of the talk was “Iteration: It's just one damn thing after another”. While I can’t claim to have understood all the non-C++ variations it did begin to help cement my realisation that containers are often just used as temporary storage for consuming a sequence.

 

With this revolutionary (to me) mindset under my belt the eventual transition from C++ to C# was somewhat easier because its ubiquitous IEnumerable type felt very similar to an iterator in C++ and the C# List<T> type was really just a std::vector<T>. In fact it wasn’t until I hit a performance anomaly in PowerShell that I even knew C# had a classic linked-list type.

 

Even with help from Boosts’ iterator_facade implementing simple iterators in C++ are still a chore. In contrast the “yield returnconstruct in C# makes it a doddle because the compiler builds a state machine under the covers for you so that your function can become the internal algorithm for an iterator:-

 

public IEnumerable<int> NumbersTo10()

{

    for (int i = 1; i != 11; ++i)

        yield return i;

}

. . .

IEnumerable<int> range = NumbersTo10();

foreach (var number in range)

    sum += number;

 

Manually looping over a range like the numbers 1 to 10 is an algorithm that is fairly well ingrained in my muscle memory. When I first saw this way of doing it I was bewildered:-

 

var range = Enumerable.Range(1, 10);

 

But of course it’s highly unlikely you’d pass that to a foreach loop, you’d use LINQ all the way and hide the loop entirely:-

 

var sum = Enumerable.Range(1, 10)

                    .Sum();

 

I still felt deeply uncomfortable with this. The classic “for” loop is simple and my C++ background meant I was fairly certain the (JIT) compiler would translate it into some equally simple machine code. With the Enumerable method I’m not so sure, it feels like there are a few memory allocations going on there which just seems excessive. No matter how much I might remind myself of the Parento Principle [2] (a.k.a. the 80 / 20 rule) it just seems gratuitous.

 

Despite attending Oliver Sturm’s ACCU conference session on F# way back in 2009 and listening to Don Syme himself talk about F# to the BCS a couple of months later it has taken me until now to try and pick it up seriously. I purchased a second hand copy of Programming F# and started reading it in the background mostly as a way of gently introducing myself to some of the functional programming concepts that I was aware of, like currying and partial function application, but didn’t really know in depth.

 

And then on page 33 I hit that term “list comprehension” again, but this time the example (slightly modified below) didn’t seem all that confusing:-

 

let numbersTo10 =

    [

        for i in 1..10 do

            yield i

    ]

 

In fact it looked very similar to the C# equivalent. Suddenly the mystique was gone – a “list comprehension” is just a list generated by some algorithm. I remember reading a tweet once suggesting “list comprehension” was probably up there in the top 10 badly named computer science terms. To me “sequence generation” seems a more suitable term.

 

But hold on, that term doesn’t seem quite right either; perhaps “list generation” would be more appropriate. But why would you generate a list, which takes up valuable space, from a lazily evaluate-able sequence which takes virtually no space at all, except as an optimisation for further iteration?

 

It feels like I’ve discovered the beauty of lazily evaluated sequences with iterators in C++ and IEnumerable in C# and now consequently discovered what list comprehensions are. But in return I’m now questioning why so much importance seems to be placed on them when sequences and pipelines appear to offer so much more value. Will my journey never end?

References

[1] http://chrisoldwood.blogspot.co.uk/2013/06/a-tail-of-two-lists.html

[2] http://en.wikipedia.org/wiki/Pareto_principle

 

Chris Oldwood

17 April 2014

 

Bio

Chris is a freelance developer who started out as a bedroom coder in the 80’s writing assembler on 8-bit micros; these days it’s C++ and C#. He also commentates on the Godmanchester duck race and can be contacted via gort@cix.co.uk or @chrisoldwood.