simple design made complex

Written by Thomas Henderson
Published on 25 August 2017

The Feynman Problem-Solving Algorithm:

1. Write down the problem.
2. Think very hard.
3. Write down the answer.

Attributed to Murray Gell-Mann

It is tempting to think that programming works like this:

  1. Write down the behavior that you want;
  2. Write down the code for that behavior;
  3. There is no step three.

But, that’s not how it works. Why? And if it doesn’t work, what can we do instead?

Understanding the Four Rules of Simple Design

Understanding the Four Rules of Simple Design takes as a fundamental premise, “The only thing we truly know about software development is that we can expect changes to our system.” It presents programming as an iterative process: write a test, write code that makes the tests pass, refactor, and repeat.

Now, when I see a simple, iterative process, I get interested in seeing it through a lens of formal systems. If I can formalize a system, then I am confident I understand it, or at least some part of it; if I can’t formalize it, then I know that I still have learning to do, and/or that I need to accept a degree of uncertainty and ambiguity. Additionally, if I can formalize a system, I can frequently draw one or more diagrams representing that system, which I love doing.

The four rules from the book’s title, in compact form, are:

  1. Tests Pass
  2. Expresses Intent
  3. No Duplication
  4. Small

Today I will examine the first rule, Tests Pass, in appalling detail, through the lens of category theory, before discussing the other three rules in that light. Category theory is a mathematical framework for organizing complex information. It makes simple things difficult, in order to help make difficult things simple. (A modern classic on the distinctions among ‘simple,’ ‘easy,’ ‘complex,’ and ‘difficult’ is Rich Hickey’s 2011 talk, Simple Made Easy. It’s worth a listen.) Category theory is by no means the only way to think precisely about code that improves iteratively over time, but it’s a framework that works well for my algebraic and diagrammatic sensibilities. With that caveat, let’s jump in and try to hold two or three infinite spaces in our heads.

On Code and Behavior

I want you to imagine two very large ‘spaces’ of things. (A ‘space’ in this context just means a sort of collection, but one that might be too big to count, even in principle.)

The first space is the collection of all possible programs. Let’s make that a tiny bit more manageable, and consider only the collection of all possible programs in your favorite language. My favorite language is Ruby these days, so when I talk about “the space of all programs,” I mean the space of programs written in syntactically correct Ruby.

The second space is even more ridiculous than the first: it’s the collection of all possible behaviors of the computer you’re programming. Now we’re getting completely ridiculous. What makes your computer special is that it is a machine that can behave like many other machines. Your computer can compute prime numbers; it can encrypt messages so that no one on Earth can decode them without great effort; it can show you a picture of your dog; it can download the 1945 version of “The Big Sleep” (inferior to the more widely released 1946 version, whose re-shot Lauren Bacall scenes are tighter and whose chemistry between Bacall and Humphrey Bogart is better, but interesting from a film history point of view); it can order creamed corn for next-day delivery. In fact, you might argue that this space is so large as to be not well-defined at all. The space of all programs, at least, is just a collection of finite sets of finite text files which follow the rules of Ruby: an infinite collection, but a relatively well-behaved one. So, how can we possibly get any kind of handle on this absurdly huge space with ill-defined boundaries?

We write a test.

The space of all behaviors of our computer is unfathomably large. But when we write a new test, we introduce a constraint on that space. That is to say, there are behaviors which include passing the new test as a sub-behavior, and there are behaviors which do not. “Write down the behaviors we want, then write the code that produces them” may be an impossible process. But when we write a test, we’re writing a function from our code to a two-element set, { Pass, Fail }.

That function creates an induced constraint on the space of all programs: there’s code that passes, and there’s code that doesn’t.
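As a minimal sketch of this idea, a test really can be written as a literal function to a two-element set. The predicate `cell_alive_next?` below is a hypothetical Game of Life rule I made up for illustration; it is not code from the book.

```ruby
# A hypothetical Game of Life rule, standing in for "the code".
def cell_alive_next?(alive, neighbors)
  neighbors == 3 || (alive && neighbors == 2)
end

# A test, viewed as a function from the code under test to { Pass, Fail }.
test = lambda do
  ok = cell_alive_next?(true, 2) && !cell_alive_next?(true, 4)
  ok ? :pass : :fail
end

puts test.call  # pass
```

The test doesn’t enumerate every behavior we want; it just carves the space into two pieces.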

I propose, then, that we introduce a new space, one that will protect us from the uncertain horrors of the poorly defined space of all behaviors: the space of all finite collections of tests. When we begin to write a system, we have the empty set as our test suite; it is no constraint at all. When we write our first test, we constrain both our computer’s behavior and the programs which encode for that behavior. When we write our second, we constrain both spaces further than they were with one test. And so on, and so on. And since tests are just a kind of code, this space is just as nice as the space of all programs: infinite, sure, but at least it’s countably infinite and reasonably well-defined. (Tests with certain side effects, like requesting a web page outside of our control, might fall into a gray area, but not one I’ll take the time to examine.)
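Here is a tiny sketch of a suite as a growing collection of constraints, with programs modeled as plain Ruby lambdas. The names `doubler` and `broken` are mine, chosen only to make the constraint visible.

```ruby
# The test suite: a finite, growing collection of constraints.
suite = []

# A program "passes the suite" when every test maps it to :pass.
passes_suite = ->(program) { suite.all? { |t| t.call(program) == :pass } }

doubler = ->(n) { n * 2 }
broken  = ->(n) { n + 2 }

puts passes_suite.call(doubler)  # true: the empty suite constrains nothing
puts passes_suite.call(broken)   # true: ditto

# Writing a test shrinks the space of acceptable programs.
suite << ->(program) { program.call(3) == 6 ? :pass : :fail }

puts passes_suite.call(doubler)  # true
puts passes_suite.call(broken)   # false: broken.call(3) == 5
```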

Now, it’s nice that we’re not staring into the abyss of infinite behavior and uncountable side effects — we’re protected by our test suite, like a radiologist in a lead apron. But, what about actually writing code? What about passing those tests?


Let’s pause here and recognize a fact about text files, which you may have already thought about if you’ve worked with the git version control system. Suppose there are two programs in the space of all Ruby programs, which I will name C and C'. Then it must be the case that there is some sequence of transformations I can perform in my text editor that will transform C into C'. How do I know this? Not for any deep reason: I just know that I could very easily delete every file in C, and write anew the files of C'. And so it seems reasonable to say that, in ordinary circumstances, there are likely to be less drastic transformations to get from C to C'. Drawing on the book’s running example of Conway’s Game of Life, one simple transformation would be to take every instance of the word location and replace it with the word position. From the point of view of the interpreter, this would be “the same” code. But to a human reader, there might be a slightly different connotation between the two words. I’ll return to this notion of “the same but different” when I address the rule Expresses Intent.
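The rename can even be demonstrated as a literal text transformation. The snippet below treats a made-up fragment of source (not code from the book) as a string and edits it, the way an editor’s find-and-replace would.

```ruby
# A program is just text; an editor transformation maps it to another program.
source_c = <<~RUBY
  Location = Struct.new(:x, :y) do
    def neighbor_of?(location)
      (x - location.x).abs <= 1 && (y - location.y).abs <= 1
    end
  end
RUBY

# One simple transformation from C to C': rename location to position.
source_c_prime = source_c.gsub("Location", "Position")
                         .gsub("location", "position")

puts source_c_prime
```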

Recall that my goal here is to put a categorical structure on the space of all programs. If we only consider the transformation, “X can be turned into Y in a text editor”, then there is a transformation (a.k.a. an arrow, a.k.a. a morphism) from every program to every other program. It’s not a very interesting structure, because it is unconstrained.

But wait: we wrote a test! In attempting to wrestle with the impossibly ill-defined and huge space of behaviors, we wrote a test (okay, okay, we imagined writing a test), which is a function that takes a program C to the set { Pass, Fail }. So that function induces a constraint on the space of programs: programs that do pass, and programs that don’t pass. (At least, it should; if the empty program passes your test, you need to be a bit more aggressive about defining failure.)

This suggests two special kinds of program-to-program morphisms. One might be called writing code: given a program that fails a collection of tests, transform it into a program that passes that collection of tests. When we begin, we have the empty program, which had better fail our test. Through work and creativity, we struggle from the red zone to the subspace of code that runs green.
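A sketch of that red-to-green morphism, again with programs as lambdas. The test and both programs here are made-up examples of mine, not from the book.

```ruby
# A test: a function from programs to { Pass, Fail }.
test = ->(program) { program.call(3) == "Fizz" ? :pass : :fail }

# The empty program, which had better fail.
empty = ->(n) { nil }
puts test.call(empty)  # fail

# "Writing code" transforms it into a program that passes.
fizz = ->(n) { (n % 3).zero? ? "Fizz" : n.to_s }
puts test.call(fizz)   # pass
```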

Another program-to-program morphism is called refactoring: given a program C that passes a collection of tests, transform it into a new program C' that also passes the tests. These programs are “the same” from the perspective of your tests. But from the perspective of you, the programmer, and your colleagues, they ought to be different. The new program should be “better.”
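Sketching the refactoring morphism the same way: C and C' below are indistinguishable to the suite, but one is clearer to a human. Both toy programs are mine, invented for illustration.

```ruby
# A suite that both endpoints of the refactoring must satisfy.
suite = ->(f) { f.call(0, 0) == 0 && f.call(3, 4) == 49 ? :pass : :fail }

# C: correct, but the intent is buried.
c = ->(a, b) { a * a + 2 * a * b + b * b }

# C': the same behavior, expressed directly.
c_prime = ->(a, b) { (a + b) ** 2 }

puts suite.call(c)        # pass
puts suite.call(c_prime)  # pass
```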

≥, abstractly

What does “better” mean? It isn’t a notion that can be easily formalized and categorized — maybe it cannot be mathematically formalized at all. But the remaining rules give us guidance:

Does C' use an improved name for a concept (Express Intent)? Does it combine two similar ideas into one abstraction with variations (No Duplication)? Does it break out an inline method into a well-named helper method (Express Intent)? Does it move a helper method that is now used only once back inline (Small)? We have an enormous number of choices that we can make to rewrite C into C', but we do not have every choice, thanks to the first rule: we can only morph our code from passing to passing states.
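One of those choices made concrete: merging two similar ideas into one abstraction with a variation, per No Duplication. The formatting methods are invented for illustration.

```ruby
# Before: two near-duplicate methods.
def format_pass(name)
  "[PASS] #{name}"
end

def format_fail(name)
  "[FAIL] #{name}"
end

# After: one abstraction, with the difference promoted to a parameter.
def format_result(status, name)
  "[#{status.to_s.upcase}] #{name}"
end

puts format_result(:pass, "empty board stays empty")  # [PASS] empty board stays empty
```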


The space of all behaviors of our computer is of innumerable size and indeterminate boundaries. If we try to write out a nice compact list of behaviors in advance, at the point in time when we know the least about what we’re trying to do, we are bound to fail.

But we still have hope. We can write tests, which are code that constrains those innumerable behaviors. We can use those tests to guide us as we write more code, by letting our test suite be the fixed point by which we judge failure and success. And we can use our experience at reading words, thinking abstractly, and trimming excess to steer our code toward flexibility and readability, within the constraints provided by our tests.

From unknowable; to constrained; to successful; to better; to more constrained. Trusting the Four Rules requires one to be comfortable with ambiguity, uncertainty, and change, but it provides assurance that your work is heading in the right direction. I’ll conclude with one of my favorite quotes about writing and uncertainty, from author E.L. Doctorow:

Writing is like driving at night in the fog. You can only see as far as your headlights, but you can make the whole trip that way.