Strata of the World—moved to: https://nosetgauge.com/

2019-09-27

Growth and civilisation

3.0k words (≈ 12 minutes)

It is often said that continuous exponential economic growth cannot be sustainable in the long run. This may well be so. But are our values sustainable without growth?

The zero-sum world

Game theorists distinguish between zero-sum games and non-zero-sum (positive-sum or negative-sum) games. In a zero-sum game, one player’s gain is another’s loss, and visa versa. The sum of the player’s gains is zero; it is impossible for the world at large to gain.

A world without growth is a zero-sum game. If the resources available at time $$T_2$$ are the same as those available at time $$T_1$$, the only way to increase your share of those resources is to take them from someone else.

For most of human history, the world was largely zero-sum. Before the industrial revolution, economic and technological progress were generally slow enough that major increases in resources (or human power more generally) did not happen over an individual’s lifespan.

A well-managed estate or a hard-working farmer could, of course, beat the averages without hurting others. However, if you sought to become rich, creating value was a bad bet; you were far better off trying to become friends with the powerful. The powerful had only so many resources at their disposal, so this generally meant – directly or indirectly – worsening someone else’s access to riches. If you were a king seeking to make your nation great, you were probably better off trying to seek control over the resources of other nations (whether through royal marriage, warfare, or other means) than figuring out how to best create wealth within your nation. In a world of slow growth, the first strategy might net you France; the second strategy might mean that your descendants see agricultural efficiency improve by 10%.

Land was essential in premodern societies. Populations generally grew to the maximum density that the land would support, so in the long run land also meant people. Land is an inherently zero-sum game – very little productive land was unoccupied (even historically) and you can’t make more, so gains in land for one party are always losses for another.

Look at premodern societies through a modern lens, and the zero-sum thinking inherent in them is striking. If you were a member of the elite, you squeezed as much value out of the land and labour you have control over as you can; there’s no reason to invest in the future, because productivity would not change much anyways. The ultimate institution in a zero-sum world is the military, because that is how you grab value from others and stop others from grabbing it from you. Hence military culture was venerated.

A note on the above historical claims
All of these things are, of course, vast generalisations to which there are innumerable exceptions and which, in a more thorough piece, would require plenty of asterisks. Below I’ve gestured at data that supports the general gist of the points made above (feel free to skip this section):

The transition from a zero- to positive-sum world is indisputable. Consider for instance English per capita GDP over the past 700-and-some years: from 1270 to 1800, wealth per person rose about 3-fold, for an average growth rate of 0.2% per year, compared to an average 1.1% since then. Over a 70-year life starting in the year 1400, you’d observe average income dip a few percent; over the same life starting in 1900, you’d see it almost triple. Note that such charts don’t measure money; they measure wealth, including the value of home-grown food, etc. See this excellent write-up for more on the methodology.

Importance of land: There is a very nice graph I once saw showing, for some roughly medieval historical period, almost no correlation between arability of land and per capita wealth but a strong correlation between arability and population density. I was unable to locate this graph, but be assured it exists (at least in my imagination). Nevertheless, I hope you will agree that 1) pre-industrial agrarian societies had a rather Malthusian relationship with land, thus 2) land was dreadfully important, and thus 3) there was a lot of non-value-creating politicking and fighting over land. The issue of land has not stopped being important (or divisive), but today lack thereof is no longer nearly as much of a cap on economic power
EDIT [2020]: I have found the graph! Behold:

The source, as usual, is the excellent website Our World in Data. Original here.

Military values: I was unable to find quantitative data on this, but the general pattern seems to be that the military played a more central role in pre-industrial societies than today, and that military values like bravery, martial prowess, discipline, and aggression have declined in importance since the industrial revolution.

Tendency towards exploitation: Historical data on GINI coefficients suggests that they were often about as high as they could get (in societies with average wealth close to the subsistence level, inequality is limited by the fact that you can’t take very much from people before they start starving to death, and when the poorest no longer exist, inequality goes down; the wealthier a society, the higher the rate of inequality that is “sustainable” in this sense). The Great Leveler by Walter Scheidel provides a good summary of this data. A summary of the summary might be the following fact: in 28 pre-industrial societies (including places like 1290s England, Byzantium in the year 1000, 1730s Holland, 1860s Chile), the average extraction rate was 77% of the theoretical maximum (for comparison, today’s OECD countries are roughly in the 20-40% range). I consider this strong evidence for a general tendency towards maximum extraction of resources by the elite in a zero-growth world. However, it’s clear that the causes of any shift are likely more complex than just the zero- to positive-sum transition (for instance, democracy makes ruthless exploitation of the masses harder, and knowledge work is less amenable to forceful extraction than agricultural work).

Corruption as the best get-rich-scheme in pre-industrial societies: In the same book (in fact, on the same page I linked above), Scheidel states that pre-industrial fortunes were usually extremely closely tied to political power, to an extent far greater than today.

Things change

The industrial revolution was the first time in human history during which the world saw prolonged economic growth at a rate fast enough to be obvious over a single human life.

If we step back and look at the grand sweep of human economic history, we see something like this:

Figure taken from this page on the phenomenal website Our World in Data.

Of course, there is much more to life than economics. However, the past few hundred years have also been ones of immense ethical change. Since the industrial revolution, we have gone from a world were war, slavery, racism, sexism, and religious intolerance are the norm and even celebrated to one where all of these things are rightly condemned.

A large part of this is because prosperous people living comfortable lives tend to care a lot more about others than poor people in bad conditions. Thus, even if growth were to suddenly stop, a large part of the moral gains we have made would likely remain. It is also true that the effect is not one way – in fact, one study found that secularisation often preceded economic growth.

However, there is a case to be made that, regardless of the level of prosperity, whether wealth is increasing or not is an important factor for what sort of attitudes prevail in the long run.

Intuitively, this makes sense. It’s much easier to be altruistic and tolerant when the ceiling of human capacity keeps rising. Economic troubles are among the first explanations cited by political pundits as a cause of the recent rise in intolerant populism. Whether the world is stagnant or growing also has an effect on what sort of strategies make sense.

We can capture this intuition with a thought experiment.

Blue vs red strategies

A shift from positive- to zero-sum games is also a shift in what sort of strategies are successful, and hence what sort of strategies will govern society in the long run.

Consider two different starting scenarios with the same players, one in an (almost) zero-sum world and the other in a strongly positive-sum world. Imagine, in each, three different factions, each following a specific strategy:

Blue invests in future growth to create value.
Red tries to capture value from others.
Green sits around being captured by Red.

In a positive-sum world like our current one, the future might unfold something like the graph on the right side in the image above. Red captures a bit of Green, but Blue makes enormous gains.

In a zero-sum world, like our past, or a hypothetical no-growth future, the future might unfold more like in the graph on the left. Blue succeeds in creating some value, but its gains are dwarfed by Red’s gains from conquering Green.

The key point is this: in the long run and in a positive-sum world, the Blue strategy will dominate, and Blue players – individuals, companies, institutions, governments, whatever – are the ones who dictate what the future looks like. In the long run and in a zero-sum world, the Red strategy will dominate, and Red players will have the most say in what the future looks like.

Thus, when the industrial revolution made the world economy shift from a zero- to a positive-sum game, a shift from Red to Blue strategies inevitably followed. The fact that society was wired for a zero-sum world slowed the spread of Blue strategies, but in the long run existing zero-sum values and customs were often swept aside by the greater success of the Blue strategy at capturing future value. Given a sufficiently long time scale, it is hard to resist this kind of harsh evolutionary logic.

In medieval Europe, there certainly were people who believed in peaceful cooperation and investing in the future. Unfortunately, in that time and place, this is not the strategy that maximises its adherents’ share of future power, and so these people were largely trampled underfoot by those who followed a Red strategy of capturing value from others.

To take another example: today, war is no longer the best way to make your nation greater. This doesn’t just mean that peaceful, tolerant, growth- and future-investing nations are the winners – it also means that, because they are the winners, they get a lot of say in how the world works. After all, it is human nature to spread your values to others. No surprise, then, when the post-industrial world order gradually shifts from one where war is simply politics by other means, to one where it is rare and condemned. Things like treaties, international organisations, and cross-border trade now dominate international politics. Ease-of-doing-business indices matter more than troop numbers.

Not everyone got the memo; some of those who didn’t even ended up in charge of big nations and started a few world wars, before being crushed by the Allies’ economic superiority. Being defeated in war forced Japan and Germany to become even more peaceful and growth-oriented than the rest, and now they’re among the richest countries in the world. Nowadays no serious up-and-coming nation even considers going warpath. Instead they compete to hit double-digit GDP growth, usually by first trying to build products for everyone else and then worrying a lot about things like investing in education to maximise the human potential of their citizens.

The transition is far from absolute. Win-win cooperation and future investment were never entirely absent, just as zero-sum fights are still very much part of our world. However, I’d argue that a shift in which type of interaction tends to have more power over the long run has happened.

Zero-sum thinking - a mistake?

Many foolish mistakes we now scorn are only mistakes because we live in a positive-sum world. For example, Donald Trump thinks in zero-sum terms: China gains a lot from trade, therefore that trade must be hurting someone, and most likely that someone is the United States, China’s largest trade partner; immigrants are moving into the country, they consume resources and take jobs when they live there, and therefore they must be a net drain on Americans; and so on. The critical mistake in all such lines of reasoning is that they ignore the fact that trade and immigration are often positive-sum situations. Trump’s suspicion for win-win cooperation would be a perfectly reasonable attitude in a negative- or zero-sum world.

A tendency for zero-sum thinking seems partly innate to humans. This is because a strongly positive-sum world has existed for less than two centuries, and is not the one our brains evolved to deal with. Many of the worst tendencies that zero-sum thinking brings with it are kept at bay only because (for the time being) growth is now a regular part of our world.

If the world turns back into a zero-sum world (or society turns zero-sum for a large enough section of the population), the danger isn’t just that zero-sum thinkers will be the winners. The danger is that they’ll also be right.

Sustainability vs values?

The idea that there is a serious contradiction between the ever-accelerating growth of human civilisation and the finite resources of our planet has become mainstream.

This view is broadly correct. A civilisation powered by fossil fuels cannot even maintain our current prosperity level without causing serious environmental issues (the finiteness of fossil fuels might eventually be a problem, but only long after the impacts on the climate have become catastrophic). It is also true that being naively optimistic about technological solutions is not wise.

Thus the early-21st-century dream for the future might look something like a prosperous sustainable planetary civilisation that has outgrown its hubristic drive towards ever greater capabilities, inhabited by people who coexist peacefully and hold on to altruistic liberal values.

However, like most dreams, something is off about this vision. We should not expect a stagnant, zero-sum world to be one where openness, altruism, and a future-oriented outlook are winning strategies.

This is not to say that a zero-sum world would revert back to medieval levels of warfare and violence. However, in the long run value-capturing players will gain at the expense of others. If history is any guide, a world where it is difficult to create value will tend towards one where connections and loyalty are everything, and those without are increasingly exploited. Most likely this would manifest more as politicking than outright bloodshed: a steadily rising tide of influence struggles, political dynasties, and moralising about who deserves what.

But even if we want to ensure that growth continues, what can we do about it? Environmental limits are very real, and a stagnant future is better than no future at all.

The only solution is to think bigger.

The physical limits are a lot further out than they may seem. Humanity’s energy consumption is about $$2 \times 10^{13}$$ watts (20 trillion joules per second). Harvesting 1% of the solar radiation that falls on Earth would net us on the order of $$10^{15}$$ watts (a thousand trillion joules per second). Relying only on this small sliver of solar energy, we can keep up a growth in energy consumption of 2% per year for the next 200 years, roughly as long as humanity has been making significant use of fossil fuels. After we reach this limit, we will have captured an infinitesimal slice of the energy output of one star in a galaxy of hundreds of billions.

(Ultimately, however, exponential growth is impossible. Physics sets an upper limit on the maximum density of computation, and presumably we need computation to create value – most fundamentally, you can't experience anything without computation going on somewhere (e.g. a brain). The finite speed of light means that the volume of space we can influence from the present grows in proportion to the cube of elapsed time. In the extremely long run, we are limited to cubic growth, which is polynomial, not exponential.)

There’s no guarantee that we will ever have the technology (or the will) to harness such power. However, it’s important to understand that the problems standing in the way are not fundamental physical limits. We do not lack energy – we lack the organisation, will, and ingenuity needed to harness the right energy sources. Given enough of these elements, the capacities of future humans may be as far removed from us as ours are from hunter-gatherers.

In the shorter run, the most critical task is transitioning to a sustainable civilisation, because what is not sustainable must eventually end, and certainly cannot grow without limit.

I think we should also make a greater effort to recognise and promote the non-zero-sumness of our world. Some problems genuinely are zero-sum, but many only seem that way because of our cognitive biases.

We must also make sure that the right variables are positive-sum. It is of little use if GDP keeps growing, but the benefits accrue only to a small number or are outweighed by non-economic costs. Growth in indicators like Green GDP or the Genuine Progress Indicator is likely a far better measure of the type of positive-sumness discussed here than raw GDP growth figures.

Finally, I want to draw attention to a simplification made in this discussion. I’ve written about zero- or positive-sumness as if they were immutable properties of the world that have a one-way causal effect on what happens. In reality there’s no magical ceiling on growth that constrains human activity. Human wealth increases when people go out and make things – life-saving medicines, time-saving devices, whatever.

Of course, different societies in different times can be more or less hospitable to growth. A peasant in medieval Europe would have a hard time making a significant contribution to human capacities. The industrial revolution relied on a critical mass of scientific understanding and Enlightenment values to get going.

Today, we have this immense legacy to thank for our ability to (on average) raise living standards by a few percent each year and keep the self-improving loops of both technology and values going.

The best future is not a stagnant one, but a growing one: a world where human capabilities stretch a bit further every year, and where the winners are those who create value rather than those who take it from others.

2019-09-08

Review: Structure and Interpretation of Computer Programs

Book: Structure and Interpretation of Computer Programs,
by Harold Abelson, Gerald Jay Sussman, and Julie Sussman (1996, 2nd ed.)

2.7k words (≈10 minutes)

Many regard Structure and Interpretation of Computer Programs (SICP) as the bible of programming. For good reason, as it turns out.

Beware the wizards

The Wizard Book. (Credit: MIT Press)

SICP is sometimes called the “Wizard Book”, because there’s a wizard on the cover (if your job is making an interesting cover for a programming book, what would you do?). However, this does not mean that the book has anything to do with –

“[L]earning to program is considerably less dangerous than learning sorcery, because the spirits we deal with are conveniently contained in a secure way.”

Um. Okay, I rest my case. Proceed with caution.

Contrarian SICP

For most subjects there is a standard way to present it that most books, lectures, etc. will follow.

For programming, the standard way seems to be to take some “mainstream” language, show how to print “Hello, World!” onto the screen, then start introducing things like assigning values to variables, conditionals, and so on. Pretty soon you can be doing some pretty impressive things.

SICP does not follow this route.

Why Lisp?

The first thing that might strike you about SICP is that the programming language of choice is Scheme, a dialect of Lisp (short for “LISt Processor”), which is commonly known as that obscure language invented in 1958 that wears down the parentheses keys on your keyboard.

Comic by Randall Munroe of xkcd. This comic can be found here.

However, the authors are not just being contrarian here; there are many good arguments for using Lisp in a book like this.

First, Lisp is the closest a programming language can get to having no syntax. You don’t have to learn where curly brackets are used, or which operators/functions follow which type of syntax, or a multitude of special characters that perform arcane pointer logic (I’m looking at you, C++).

If you have an expression in parentheses, the first thing inside the parentheses is the name of the function that is being called. Everything after it is an argument to be passed to that function. Something not in parentheses represents either just itself (e.g. a string, number, or boolean), or is the name of a variable that in turn represents something.

For example: (+ 1 (* 2 3) var) evaluates to the sum of the numbers 1, the product of 2 and 3, and whichever number the variable var has been set to.

Now you know approximately 90% of Lisp syntax (there’s also a few other things, like a special syntax that stands in for an unnamed function, and some shortcuts for things you’d otherwise have to type out repeatedly).

If you follow along with SICP, Lisp is self-explanatory.

The second point in favour of Lisp follows immediately from the first: the near-absence of syntax means you don’t have to think about it. Once you get used to it, writing in Lisp feels almost like transcribing pure thought into code.

When a language implements various special syntaxes, it generally privileges certain design patterns and ways of thinking; if for-loops are unavoidable, the programmer will think in for-loops. A near-absence of syntax means neutrality. Some might call it blandness; fair enough, but Lisp’s blandness is very powerful when used right. It makes it a very useful language for a book like SICP, which tries to teach you (for example) many different ways of abstracting data, rather than the one that is made most convenient by a language’s syntax.

The third point in favour of Lisp is that what little syntax it has was chosen carefully, namely in such a way that Lisp code is also Lisp data. The example function call (+ 1 (* 2 3) var) given above is just a list of the elements +, 1, the list of the elements *, 2, and 3, and var. This means that it’s very easy to write Lisp code that operates on Lisp code, something that comes in handy when SICP walks through the operation of a Lisp interpreter (in more practical situations, it also enables Lisp’s powerful macro system). To put it another way, introspection is easier in Lisp than other languages.

Finally, as the (perhaps biased) authors write: “Above and beyond these considerations, programming in Lisp is great fun.”

Executable math

Once you’ve gotten over all the parentheses, the second thing you’ll notice about SICP is the order in which topics are presented.

The first chapter is entirely devoted to creating abstractions by defining functions. Only function (and variable) definition and function calling are used – no mention is made of data structures or changing the values of variables.

If you think it’s impossible to do anything interesting by just calling functions, you are wrong, and SICP will prove it.

The chapter runs through the very basics of function application, variable definitions, and the substitution model of how to apply functions (this last point will latter be amended). It discusses iterative and recursive processes, and how iterative processes can be described by recursive functions.

A lot of the things you can do by just calling function are quite math-y. SICP does not shy away from this: Newton’s method for square roots, numerical integration, and finding fixed points of (mathematical) functions are prominent examples. No prior knowledge about the math is assumed, but this may still put off many readers because it’s abstract and not directly relevant to most real-world problems. “Executable math” is a pretty good summary of what most of this chapter is about.

However, the chapter really is striking. Using just one type of abstraction (defining functions) and not too many pages, SICP scales from the very basics to solving fairly involved problems with techniques, like extensive use of higher-order functions, that would be left for much later in a more conventional work.

Finally: data!

Only in the second chapter does SICP turn to data structures. Once again the format is the same: introduce exactly one type of abstraction, and systematically introduce examples of how it’s useful and what can be done with it.

The basic Lisp data structure is creating cells that link together two values. The primitive function for this is cons. If we want to chain together many values, for instance to create a list of the elements 1, 2, and 3, we can do this with (cons 1 (cons 2 (cons 3 null))) (of course, there’s also a function – list – that creates lists like this automatically ).

Additionally, Lisp provides primitive functions for accessing the first and second element in a cons cell. For historical reasons, these functions are called car (returns the first element) and cdr (returns the second element). This means that the cdr of a list defined in the same way as above would be all but the first element of the list.

But what is data? Or do we even care? After all, all that interests us about cons, car, and cdr is that if we define, say, x as (cons 1 2), then (car x) should be 1 and (cdr x) should be 2.

One clever way of implementing this – and one that will likely seem both weird and ingenious the first time you see it – is the following:

(define (cons x1 x2) ; define cons as a function on two inputs
  (define (dispatch n) ; define a function inside cons
    (if (= n 1)
      x1  ; return x1 if n = 1
      x2)) ; else, return x2
  dispatch) ; the cons function returns the dispatch function

(define (car x)
  (x 1))

(define (cdr x)
  (x 2))

What’s happening is this: cons returns the function dispatch. Let’s say x is a cons cell that we have made with cons, consisting of the elements x1 and x2.

Now we’ve defined the car of x to be whatever you get when you call the function x with 1 as the argument. x is what the cons function returned, in other words the dispatch function, and when we call that with 1 as the argument, it will return x1. Likewise, when we call x with the argument 2, the dispatch function that x represents will return x2. We have satisfied all the properties that we wanted cons, car, and cdr to have.

Is this how any reasonable Lisp implementation actually works? No.

(If you’re confused about the previous example: note that we’ve snuck in an assumption about how variable scoping works in functions. When the dispatch function is created inside the cons function, the variables x1 and x2 are obviously bound to whatever values we inputted into cons. What’s not obvious is that dispatch can access these values when it’s called later – after all, x1 and x2 were local variables for a function call that will have ended by then (and the cons function might have been called many times, meaning many x1s and x2s). However, in Lisp the environment at the time a function is called is bound to that function. When that function is called again, any local variables like x1 and x2 present in a parent function in which that function was defined will remain accessible. This type of thing is called a closure.)

Mutability (changing variable values after they’ve been defined) is only introduced in the third chapter; up until then, the book focuses purely on functional programming.

The third chapter is the culmination of the first half of the book: now that functions, data abstraction, and mutability have all been discussed, the authors introduce many examples of the structures that are now possible.

The what evaluator?

SICP walks the reader through the process of writing a Lisp evaluator in Lisp, something that is called a “metacircular evaluator”.

Writing a Lisp evaluator in Lisp might seem pointless, but remember that a programming language, especially one like Lisp, is just as much a language for setting down our thoughts about procedures as it is something to be executed by computers. A Lisp-to-Lisp interpreter has the advantage that it is one of the simplest interpreters that it is possible to write. Interpreters for Lisp benefit greatly from the simplicity of Lisp’s syntax, while interpreters written in Lisp benefit from the expressiveness and flexibility of the language. Thus, with our Lisp-to-Lisp interpreter, the essence of an evaluator is laid about as barely before our eyes as it can be.

If you're willing to forget about code readability and leave out some syntactic sugar like cond expressions, you can literally behold the metacircular evaluator at a glance. (Note the #lang sicp line – DrRacket has a package that implements the exact version of Scheme used in SICP.)

The authors write:

“It is no exaggeration to regard this as the most fundamental idea in programming:

‘The evaluator, which determines the meaning of expressions in a programming language, is just another program.’

To appreciate this point is to change our images of ourselves as programmers. We come to see ourselves as designers of languages, rather than only users of languages designed by others.”

After presenting the metacircular evaluator (and an optimisation), the authors go on to discuss three “variations on a Scheme” (haha …):

Making the evaluator lazier. More precisely, delaying the evaluation of an expression until it is needed (“lazy evaluation”). This allows, for example, the convenient representation of infinite lists (“streams”), and more flexibility in creating new conditionals.
Non-deterministic computing, in which the language has built-in capabilities to handle statements like “pick one of these three items”, or “search through these options until some permutation matches this condition”. With such a language, some logic puzzles can be solved by simply stating the requirements and pressing enter.
A logic programming language, which can process queries about data.

Programming often involves wanting to do something, and then taking that task and “building down” to the demands of whatever programming language is used. A powerful alternative method is to also build up the language to customise it for the needs of the task at hand. The boundary between language and program blurs.

It’s almost as if …

“The evaluator, which determines the meaning of expressions in a programming language, is just another program.”

What do we say to compilers? Not today

There’s a fifth chapter to SICP, in which a register machine simulator is constructed, and then used to implement – surprise surprise – a Lisp compiler.

In a way, this completes the loop: the first three chapters show what kinds of things various programming abstractions allow, the fourth shows how these abstractions can be used to implement themselves, and the fifth looks “under the hood” of Lisp itself to consider how it can be implemented with elements simpler than itself. Of course, the question of how the simpler register machine itself can be implemented is left unanswered, but this is already starting to brings us into the realm of hardware, for which another book might be better suited.

For the first four chapters I did perhaps half of the exercises; for the last, I just read the main text. The chapter feels more theoretical than the previous ones. Even though the Lisp-to-Lisp evaluator of the fourth chapter is purely academic, I found it more interesting (and also more practical, since I recently wrote an interpreter for a project) than the construction of a compiler from simulated versions of very restrictive components. Hopefully I will return to the chapter at a later point, but for now a more thorough reading will have to wait.

First Principles of Computer Programming

SICP is a rather unconventional programming book. I think this is largely because the authors seem to have started from first principles and asked “what should a good book on deep principles in high-level programming languages look like?”, rather than making all the safest choices.

Therefore, Lisp.

Therefore, presenting one element at a time (functions, data abstraction, mutability) with care and depth, rather than the (admittedly faster and more practical) approach of introducing all the simplest things first.

Therefore, spending a lot of time hammering in the point that what evaluates/compiles your program is just another program.

SICP is not about showing you the fastest route to making an app. Unless you’re of a theoretical bent, it might not even be a particularly good introduction to programming in general (on the other hand, on several occasions I was slowed down by prior misconceptions; those with a fresher perspective may avoid some difficulties).

However, it excels as a deep dive into the principles of programming. Especially if you have experience with programming but haven't yet read a systematic treatment of the topic, SICP will be invaluable in straightening out and unifying many concepts.

Links & resources

SICP is available for free online in a variety of formats:
- MIT’s official web version and other SICP stuff
- PDF and EPUB conversions are available here

The SICP lectures are on YouTube.

I’m not aware of an official SICP solution set, but you will find many on the internet. This one seems to be the most complete, often featuring many solutions to a given exercise.

How to Design Programs: an alternative book

A similar, first-principles-driven, Lisp-based book on programming called How to Design Programs (HTDP) also exists (I have not read it). This book was consciously designed to emulate SICP in the good while fixing the bad, particularly in the context of being used as an introduction to programming (the authors of HTDP have written an article called The Structure and Interpretation of the Computer Science Curriculum in which they summarise their case).

Incredibly, HTDP is also available available for free online. Either MIT Press has been overrun by communists, or the people who write good programming books are far more charitable than the average textbook writer.

2019-08-20

Review: The Pleasures of Counting

Book: The Pleasures of Counting, by Thomas William Körner (1996)

1.8k words (≈6 minutes)

On his website, T. W. Körner introduces The Pleasures of Counting as follows:

Longer than “With Rod and Line Through the Gobi Desert”, funnier than “The Wit and Wisdom of the German General Staff” and with more formulae than “A Brief History of Time” [The Pleasures of Counting] was voted Book of the Year by a panel consisting of Mrs E. Körner, Mrs W. Körner, Miss K. Körner and Dr A. Altman (née Körner).

The Pleasures of Counting is hard to categorise. On one hand, its flow and lucidity match the best works of general non-fiction, even though the book features more tangents than an intro to derivatives course. On the other hand, in contrast to books that merely tell about math, Körner has the gall to make the reader do exercises.

The result is 500 pages of insights, proofs, exercises, and real-world applications about topics ranging from cholera to submarine warfare to weather prediction, all delivered in Körner’s personable style and with a generous heaping of witty anecdotes and the occasional bit of verse. And it is glorious.

Warning: may contain math – but don’t worry

A first question about any book involving math is how much you need to know beforehand for it to be comprehensible.

Most mathematical arguments in The Pleasures of Counting can be followed with straightforward algebra. This does not mean the results themselves are straightforward – some, for instance the derivation of the Lorentz transformation or an outline of Shannon’s theorem, require careful thought and are easy to get lost in. Some exercises also either require or benefit greatly from prior exposure to calculus. However, in general The Pleasures of Counting manages to be both accessible and fairly deep. A casual reader can skip exercises and tricky arguments while still getting the gist, while other readers will find much to dig into in the more intricate proofs and exercises. All notation used is explained in an appendix.

Most people can gain something from this book, and given the breadth of the material, I expect very few will encounter nothing new.

The pleasures of everything under the sun

Körner discusses many common examples of mathematical reasoning and results, including special relativity, Galileo’s arguments about motion, Engima machines, Turing’s work, fractals, sorting algorithms, and the effects of scaling on biology, though always with his own spin on each topic.

I particularly enjoyed Körner’s discussion of dimensional analysis in physics – a fancy way of saying that you figure out what variables some quantity should depend on, fiddle with them until you get an equation where the units (mass, length, time) check out, and then go design bridges with it.

This is an example of the “dangerous but fascinating past-time” of what Körner calls science “in a darkened room” – trying to derive scientific facts from pure thought alone. Science requires both reason and observation; relying on one alone is like trying to walk with one leg. That’s not to say it’s impossible to go places by hopping with one leg: Körner relishes in showing how you can start from small, abstract assumptions and hop over to interesting conclusions, such as why helicopters have long blades or how spacetime works.

The most unique and refreshing parts of The Pleasures of Counting are Körner’s presentation of the works of several somewhat less well-known scientific figures, such as G. I. Taylor, Lewis Fry Richardson, and Patrick Blackett. I have a feeling Körner’s pick of figures to examine is not random – all three are British mathematicians, physicists, or mathematical physicists who lived from the late 1800s to the mid/late-1900s, worked on war-related issues (Blackett was a major advisor on military strategy and operational research in World War II, Taylor participated in the Manhattan Project, and Richardson was an ardent pacifist who was a conscientious objector during World War I and later attempted a mathematical analysis of the causes of war), and studied/taught at Cambridge, like Körner. The timelines make it possible that Taylor and Blackett could have worked in Cambridge at the same time as Körner studied there, though I cannot recall Körner mentioning any personal knowledge of them in The Pleasures of Counting.

The pleasures of digression

Körner does not restrict himself to purely mathematical matters. At one point a Socratic dialogue on the axioms of number theory segues into a discussion on the purpose of university:

TEACHER: […] When Mill wrote On The Subjection of Women [alright, the dialogue may have been going on a slight tangent even before the university stuff], he was consciously following Plato in this, and, still more importantly, in his view that everything is open to question and that positive good may come from rational discussion.

STUART [a student]: And that is what university is all about.

TEACHER: Not really.

STUART: But that is what university ought to be all about.

TEACHER: So you think the taxpayer is parting with large sums of money so that young ladies and gentlemen can sit around discussing life, the universe and everything. You are here to learn mathematics and more mathematics – not to row, play bridge, act or even to find yourselves – and that is what I am going to teach you.

STUART: But, even if that is what the taxpayers want, is it what they ought to get? A university which just trains technicians is not a university; it is a technical college.

TEACHER: Better a good technical college than a corrupt university. What ought you to learn at university besides mathematics?

STUART: Students learn to question received opinions.

TEACHER: So, after I have made you write out 100 times: ‘I must not accept authority,’, what do we do next?

ELEANOR [another student]: That’s simple. You make us write out: ‘I really, really must not accept authority.”

TEACHER: Besides which, asking questions is the easy bit. It’s finding good answers which is hard. A university is at least as much a repository for the accumulation of human experience and an instrument for passing it on as it is a device for adding to it.

STUART: But just teaching mathematics is not enough. A lot of us will go on to be engineers and managers and will have to take moral decisions. So why don’t you teach us ethics?

TEACHER: But would you actually go to lectures on ethics?

STUART: If the lecturer was good, yes.

TEACHER: But anybody would go to hear Sir Isaiah Berlin lecturing on how to watch paint dry. The question is, would you go listen to your ordinary lecturers talking about ethics?

ELEANOR: Not unless it was for examinations.

STUART: So why not examine it?

TEACHER: What would the examination questions look like? ‘Is it wrong to steal from widows and orphans? Answer yes or no and give brief reasons.’

STUART: There are lots of difficult and interesting moral problems.

TEACHER: Yes, but the problems of the human race are not those of finding the answer to moral problems in hard cases but of acting on the answer in simple ones. American law schools now include courses on ethics, but the only observable result is that the defence in cases of fraud now begins ‘My client’s behaviour has throughout been not merely legal but ethical.’ […] If wisdom were teachable it would surely be our duty to teach it. Since it is not, we simply try to teach mathematics.

Opinionated? Yes. Controversial? Perhaps. Does he have a point? Definitely.

Körner also discusses how to persuade bureaucratic committees (and when to give up), the principles of successful smalltalk, and the philosophical issue of whether and how we should discount future values.

And then, after you’ve been nodding along at one of these digressions for a while, you snap out of a Körner-induced trance, realise you’re halfway through a proof, and that you’ve been enjoying it all the way.

The idle mathematician of an empty day

Ultimately, The Pleasures of Counting is not about the usefulness or applicability of mathematics, but the joy of it. Deriving truths from other truths, or looking at the messiness of the real world and capturing its broad strokes with a few symbols is not just a means to an end but also an art form, a way of thinking, and a purpose in itself.

Körner closes The Pleasures of Counting with the prologue of William Morris’s The Earthly Paradise. This poem is perhaps the best (and certainly most poetic) argument for the importance of “useless” endeavours like math or poetry. My idle blogging cannot beat Morris’s verse, so here is the poem in full:

Of Heaven and Hell I have no power to sing,
I cannot ease the burdens of your fears,
Or make quick-coming death a little thing,
Or bring again the pleasures of past years,
Nor for my words shall ye forget your tears,
Or hope again for aught that I can say,
The idle singer of an empty day.

But rather, when aweary of your mirth,
From full hearts still unsatisfied ye sigh,
And, feeling kindly onto all the earth,
Grudge every minute as it passes by,
Made the more mindful that the sweet days die –
– Remember me a little then I pray,
The idle singer of an empty day.

The heavy trouble, the bewildering care
That weighs us down who live and earn our bread,
These idle verses have no power to bear;
So let me sign of names remembered,
Because they, living not, can ne’er be dead,
Or long time take their memory quite away
From us poor singers of an empty day.

Dreamer of dreams, born out of my due time,
Why should I strive to set the crooked straight?
Let it suffice me that my murmuring rhyme
Beats with light wings against the ivory gate,
Telling a tale not too importunate
To those who in the sleepy region stay,
Lulled by the singer of an empty day.

Folk say, the wizard to a northern king,
At Christmas-tide such wondrous things did show,
That through one window men beheld the spring,
And through another saw the summer glow,
And through a third the fruited vines a-row,
While still unheard, but in its wonted way,
Piped the drear wind of that December day.

So with this Earthly Paradise it is,
If ye read aright and pardon me,
Who strives to build a shadowy isle of bliss
Midmost the beatings of the steely sea,
Where tossed about all hearts of men must be,
Whose ravenous monsters mighty men shall slay,
Not the poor singer of an empty day.

2019-08-15

Review: From Nand to Tetris

Book: The Elements of Computing Systems, by Noam Nissam and Shimon Schocken (2005)

5.9k words (≈30 minutes)

A brief rant about the title

“From Nand to Tetris” (Nand2Tetris for short) is the name of the course, website, and overall project that the book The Elements of Computing Systems is part of. It’s an excellent name – catchy, concise, and expertly captures the content.

However, apparently it’s a law that a textbook must have a stodgy title consisting of a reference to the subject matter (bonus points for ostentatious circumlocution, like writing “computing system” instead of “computer”), perhaps attached to a generic word like “concepts” or “elements” that doesn’t mean much.

For the rest of this post, I will pointedly ignore the name “The Elements of Computing Sytems” and refer to it as “From Nand to Tetris” or “Nand2Tetris” instead.

You’re a wizard

At first glance, computers are basically magic.

Science fiction author Arthur C. Clarke once said:

Any sufficiently advanced technology is indistinguishable from magic.

A more accurate phrase might be “any sufficiently advanced technology appears to be indistinguishable from magic”. Of magic, all you can say is it just works. With technology (or, for that matter, anything in our world), there is always a reason.

The goal of Nand2Tetris is to take computers from the realm of magic into the realm of understanding.

This is a difficult task, since the technological stack connecting the physical world to a desktop operating system is perhaps the highest and most abstract technological stack humans have created. The functioning of computers is also a topic split into many layers, each of them its own broad field, from chip design to compilers to programming languages to operating systems.

What Nand2Tetris does is presents one example of a path from logic gates to chips to a machine language to virtual machines to high-level languages to an operating system. The aim is not so much to explore every nook and cranny of the computational jungle, or even to provide a map, but instead to demonstrate that such a path is even possible.

A path through the jungle

Logic gates

Boolean logic and basic gates

Most of the function of a computer can be constructed from just two pieces of hardware. The first is the NAND gate.

The only thing we need to know about the NAND gate is that it takes two inputs, each of which takes a binary value (we call the values 0 and 1), and produces a 1 as an output except when both inputs are 1, in which case the output is a 0.

We will not discuss the implementation of the NAND gate, but instead assume that such a device can be implemented by electrical engineers who are clever enough (we have to start somewhere).

In fact, it is barely relevant that the NAND gate is a physical device. We can instead think of it – and other logic gates – as a mathematical function, which maps some set of 0s and 1s to an output value.

In the case of a NAND gate, it takes two inputs and maps it to one output in the manner specified by the following table:

0, 0 -> 1
0, 1 -> 1
1, 0 -> 1
1, 1 -> 0

(The name “NAND” is an abbreviation of “not and”, since the NAND of A and B is true except when the AND of A and B is true)

Such tables are called truth tables. From Nand to Tetris provides a handy list of all 2^4 = 16 two-argument boolean functions:

If you have studied boolean logic before, you have doubtlessly spent time manipulating sets of 0s and 1s (or truths and falsities) linked together by AND, OR, and NOT operators. Why these three? In addition to them being straightforward to understand, it turns out that it is possible to specify any boolean function with AND, OR, and NOT operators.

(How? If we have a truth table of the function (if we don’t or can’t make one, then we haven’t properly specified the function!), we can simply take every row for which the value of a function is a 1, build an expression for identifying that row out of ANDs, and then chain together all of these expressions with some ORs. For example, if we want the input sequence a=1, b=0, and c=1 map onto a 1, the expression (a AND ((NOT b) AND c)) will be true for this and only this sequence. If we have a bunch of such expressions, say expressions w, x, y, and z, and we want to figure out if at least one of them is true, we can do so with the expression (w OR (x OR (y OR z))). If we have AND and OR functions/gates that can take more than two arguments, this becomes simpler, since we don’t have to nest ANDs and ORs and can instead write something like OR(AND(a, (NOT b), c), expression2, …).)

It turns out that the NAND function itself is sufficient for defining AND, OR, and NOT (NOR – the negation of OR, in the same way that NAND is the negation of AND – has the same property). This implies that if we have a logic gate that implements the NAND function, we can use it – and it alone – to build chips that implement AND, OR, and NOT functions, and hence any boolean function.

How? (NOT x) can be implemented as (x NAND x). Using this definition, we can write (x AND y) as (NOT (x NAND y)), and (x OR y) as ((NOT x) NAND (NOT y)). Using logic gate symbols, we can represent these gates and some others as follows:

(Warning: the book makes you design all of these.)

Note that the demultiplexor has two outputs. Hence “demultiplexing” is not a function (a function has only one output), and we call it a chip rather than a logic gate.

Note that the concerns of designing a chip out of NAND gates are not precisely the same as that of specifying the boolean function out of NAND operations. For instance, since we defined (NOT x) as (x NAND x) and (x AND y) as (NOT (x NAND y)), the NAND representation of (x AND y) is ((x NAND y) NAND (x NAND y)). There are three NAND operations, so it looks like we need 3 NAND gates – but no, we can split wires as in the above diagram and do it with two. Similar concerns apply to the implementation of the XOR gate in the above diagram.

There are many subtleties in optimising chips, which we (following the example of Nand2Tetris) will skip in order to get on with our journey.

Multi-bit and multi-way chips

A multi-way version of a basic gate allows for applying the function of the gate to many inputs at once. A multi-way AND outputs true if and only if every input is a 1; a multi-way OR outputs true if at least one input is a 1. The implementation is simple: for a 8-bit AND, for instance, take the AND of the first two inputs (call it A), then the AND of A and the third (call it B), then the AND of B and the fourth, and so on.

A multi-bit version of a chip is basically many of those chips in parallel, applying their function to every piece of the input.

For example, a 4-bit AND chip fed 1101 and 0100 as its inputs will output 0100 – the first output is the AND of the first digit of the two inputs, and so on. The implementation is even simpler: send bit 1 of inputs A and B through one AND gate, bit 2 of both through another, and so on.

It gets a bit more complicated when dealing with multiplexors that are both multi-way and multi-bit, but the basic principle is the same: we have a bunch of binary values that we want to group together (perhaps they represent a number), and so we build chips that allow us to deal with them together.

In addition, we don’t want to deal with the wires representing each binary digit of a number individually, so we group them into “buses” that transfer many bits at once from component to component (basically just a clump of wires, as far as I understand). On diagrams, a bus looks like a wire, except with a slash through it.

Arithmetic

Now that we have constructed our basic gates, we can begin doing something interesting – at least if addition counts as interesting.

Since our chips are built of gates that deal with binary values – true or false, 1 and 0, whatever – any reasonable low-level implementation of arithmetic will be confined to base-2 rather than our standard base-10 number system.

(To convert binary to decimal, just remember that the value of each digit goes up by a factor of 2 rather than 10 as you move from right to left; for example, 1011 (base 2) = 1 x 1 + 2 x 1 + 4 x 0 + 8 x 1 = 11 (base 10))

This turns out to make things much simpler. The algorithm for addition is the same (add corresponding digits, output a result and a carry, take the carry into account when adding the next two corresponding digits), but we have far fewer cases:

0 + 0 --> 0, carry 0
0 + 1 --> 1, carry 0
1 + 0 --> 1, carry 0
1 + 1 --> 0, carry 1

We can see that the result bit has the same truth table as the XOR function, and the carry bit has the same truth table as the AND function. Hence, for the simple purpose of determining the result digit and carry bit of two binary digits, the following chip (called a half-adder) is sufficient:

Now we have to figure out how to chain such chips to create a multi-bit adder that can deal with carry bits. Observing that, at most, we will have two input bits and one carry bit to deal with to determine the resulting bit, let’s construct a chip that takes three bits as input and outputs the result and the carry bit. If we add a 0, a 1, and a 1, the corresponding result digit is a 0 and the carry is a 1; if we add a 1, a 1, and a 1, the result bit and the carry bit are both a 1.

The result, called a full-adder, can be constructed like so:

Now that we have a full-adder, we can construct a multi-bit addition chip by simply chaining them together, feeding the carry bit from the previous full-adder as one of the three inputs into the next one.

The only complication is that we have to connect a constant-0 input to the first full-adder to fill its third input, and the final carry bit goes nowhere.

A 4-bit adder looks like this:

This is pretty cool – after all this messing around with logic and logic gates, we have finally built a piece of hardware that does something real.

We still have to consider some issues. The most important is how we represent negative numbers. If are smart enough about it, this also comes with a bonus: we don’t have to construct a separate chip to handle subtraction, but can instead subtract A from B by converting B to the representation of -B and then adding it to A.

The standard method is called the two’s complement method, and it says: in an n-bit system, represent -x as the binary representation of $$2^n - x$$.

For example, let’s say our system is 8-bit. 0000 0000 is 0 (space inserted for readability), 0000 0001 is 1, 0000 0010 is 2, 0000 0011 is 3, and so on. -0 is $$2^8$$ - 0 = 1 0000 0000, but we keep only 8 bits so this turns into 0000 0000 as well. -1 is $$2^8 - 1$$ = 255 = 1111 1111. -2 is $$2^8 - 2$$ = 254 = 1111 1110. And so on.

Another example: From Nand to Tetris presents the complete table for a 4-bit system:

The most important consequence of this is that our addition circuit can properly add negative and positive numbers together. -1 + 1 would be represented as inputting two buses, one containing 1111 1111 and the other 0000 0001, to our addition circuit. Adding these up gives 0000 0000 (since our adder only deals with the first 8 bits), or 0, just as intended. -1 + -1 becomes 1111 1110, which is -2. The reader can verify other examples if they so wish.

Another consequence of this is that the largest number we can represent in our n-bit system is $$2^n / 2 - 1$$, and the most negative number is $$-(2^n) / 2$$. Our 8-bit system only gave us the integers -256 to 255 to play with, but the growth is exponential. In a 16-bit system, we can represent the numbers -32768 to 32767; in a 32-bit system, -2 147 483 648 to 2 147 483 647; in a 64-bit system, -9 223 372 036 854 775 808 to 9 223 372 036 854 775 807.

(Of course, when necessary we can implement logic for handling even larger numbers at a higher level in the computer, just as we can implement logic for handling any other data type).

Yet another consequence is that if we add together positive numbers that exceed the limit, the result will be a negative number (large enough negative numbers will also add to a positive). This feature has lead to countless incidents, with some of the more notable ones (ranging from exploding rockets to nuke-obsessed Gandhis in the Civilization game series) listed in this Wikipedia article. As always: beware leaky abstractions.

The Arithmetic Logic Unit

The final piece of our journey that relies on logic gates alone is the construction of an arithmetic logic unit (ALU). Though it has a fancy name, all it does is implements some basic operations we will need: adding, ANDing, negating, and zeroing input buses.

The ALU constructed in Nand2Tetris operates on two 16-bit buses, and also takes 4 bits to configure the inputs (to determine whether to negate and/or zero the x and y inputs), 1 bit to change the function (switches between ANDing and adding the 16-bit input buses), and 1 bit to determine whether or not to negate the output. In addition to outputting a 16-bit bus with the result of the computation, it also outputs 1 bit saying whether or not the output is zero, and another saying whether or not it’s a negative number.

(Note that “negating” a binary bus means flipping each bit (e.g. 0101 --> 1010), and is a different process from switching the sign of a number (e.g. 0101 (5) --> 1011 (-5) with the two’s complement method))

My implementation of such an ALU looks like this (for clarity, I have first defined a separate “ALUPreP” ALU pre-processor chip to perform the negating/zeroing of the x/y inputs):

Slashes on lines indicate a 16-bit bus, not a single wire. The 16s on various bits indicate that they are 16-bit versions of the chip (the AND-gate marked MW 16 is instead a multiway chip; see above discussion on multibit vs multiway chips). The splitting of the first bit from the result bus is a bit questionable as a chip design element, but it works since in the 2's complement method all negative numbers begin with a 1.

Having a bunch of bits to zero and negate outputs and inputs and whatever may seem pointless. However, such a design allows us to compute many different functions in only one chip, including x + y, x AND y, x OR y, -x, x+1, y-x, and so on. From Nand to Tetris provides a table (where the notation & = AND, | = OR, and ! = NOT is used):

Remember that the only piece of hardware needed to implement all of this is the humble NAND gate.

(To give a sense of scale: by my count, my ALU design above requires 768 NAND gates if we simply substitute in the NAND-only versions of other gates (768 happens to be 2^9 + 2^8, but this is just a coincidence).)

Flip-flops

I mentioned earlier that only two pieces of hardware are required to implement most of our computer. In the previous section, we examined what we can do with NAND gates; now, we will turn to flip-flops (no, not that type of flip-flop).

NAND gates allow us to perform any feat of (binary) logic that we wish, but they do not allow for memory.

The way in which the Nand2Tetris computer implements memory is with a data flip-flop (DFF). The principle of a DFF, like a NAND gate, is simple: its output at one “tick” of the computer’s central clock is its input at the previous tick.

Thus, to add DFFs to our computer, we need to assume the existence of some type of clock, which broadcasts a signal to all DFFs. This allows us to divide time into discrete chunks.

Real electronics always involves a delay in passing the signal from one component to another. Thus, when we pass inputs to our ALU, there’s a brief moment before the ALU stabilises to the true result. Inputs arriving from different parts of the computer also do not arrive simultaneously. Dividing time into ticks allows us to abstract away all such concerns (as long as the ticks are long enough for everything to stabilise); all we care about is the state of the computer at each tick, not what happens in between two ticks while the state is transitioning.

A DFF and a multiplexor (a logic gate with two inputs and one selector bit, outputting the first input if the selector bit is 0 and the second if the selector bit is 1) can be combined to create a 1-bit register:

The operation of this register is as follows:

If the selector bit is 1, the DFF’s output at time t is the input value at time t-1.
If the selector bit is 0, the DFF’s output at time t is its output at time t-1.

Hence, we can set a value (either a 0 or a 1) into the 1-bit register, and it will keep outputting that value until we tell it to change to a new value (by sending it the new value and sending a “1” as the input to the multiplexor’s selector bit).

Of course, a storage of 1 bit doesn’t allow for very many funny cat GIFs, so clearly there’s still some work to be done.

The first thing we do is we make the registers bigger, simply by adding many 1-bit registers in parallel. Most useful elements on which we do computations (numbers, letters (which are stored as numbers), etc.) take more than one bit to specify, so it’s useful to split memory into chunks – 16 bit chunks in the case of the Nand2Tetris computer.

Next, let’s take many of these bigger registers, and put them in parallel with each other. The problem now is accessing and setting registers in the memory independently of each other. We can add a series of address bits as inputs to our memory chip and then build some circuitry so that the output will be the contents of the memory with the address specified by the address bits, and if we load a new input, the input will be loaded into the register with the address that is being inputted.

A simple memory unit of four 16-bit registers, each uniquely identified by 2 address bits (address are: 00, 01, 10, and 11), and its control logic can be implemented as follows:

Less complicated than it looks. The input is split so that it reaches every register. If load=1, the three multiplexors on the left route it to one of the registers, and the input is loaded into that register (if load=0, nothing is meant to be loaded this tick and all load?-inputs into the registers are 0, so nothing happens). The register output buses are passed through a series of multiplexors to select which one's output is finally sent out of the memory chip.

To construct larger memory chips, all we need to do is add registers and expand our address access logic. If we want a memory chip with, say, 64 registers, we need log2(64) = 6 bits to specify which address we are talking about, and hence 6 address bits (n address bits gives you 2^n unique addresses, which is why computer memory sizes are usually powers of 2: 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, etc.).

Since we can access and set any part of the memory at the same speed and in any order, we call this “random access memory” (RAM). RAM is not permanent memory – maintaining state requires DFFs constantly passing looping signals to each other, which in turn requires power. Turn off power, and you lose memory contents.

(RAM based on DFFs is called static RAM or SRAM, and is faster but more expensive and power-hungry than the alternative capacitor-based DRAM design. Hence SRAM is mainly used for small CPU caches (in the kilobyte or single-digit megabyte range), while the main system memory – what people generally think of when they hear “RAM” – uses DRAM, with capacities in the one or two-digit gigabytes.)

Nand2Tetris does not examine the functioning of hard disk drives (HDDs) or solid state drives (SSDs) or other more permanent data storage devices.

Instruction sets & machine language

So far, using nothing but NAND gates and DFFs (and, well, wires and buses and clocks and so on), we have figured out how to:

perform arbitrary logic operations on binary data (and hence also perform basic arithmetic in base-2), and
store arbitrary binary data in memory of arbitrary size.

All this arbitrariness gives us a lot of power. Let’s put it to use.

The next thing we want to implement is the ability to give our computer instructions. We have already shown that it is possible to build chips that carry out arithmetic on binary numbers, so if we encode instructions as binary strings, we can identify and handle them just fine (though the control logic is complex).

In Nand2Tetris, two basic types of instructions are implemented, each 16 bits long. I list them here to give you an impression of what they’re like:

If the first bit is a 0, the next 15 bits are interpreted as a memory address (in a memory of size 2^15 = 32768 bits), and the contents in memory at that point (a 16-bit value in our 16-bit system) are loaded into a special address register inside our CPU.
If the first bit is a 1, then:
- Bits 2 and 3 do nothing.
- Bit 4 determines whether the second input we will pass to the ALU is the contents of the address register, or the contents of the memory location that the address register points to (the first input to the ALU is always the contents of a special data register in our CPU).
- Bits 5-10 are the 6 bits passed to the ALU to determine which function it will compute on its inputs (see ALU table above).
- Bits 11, 12, and 13 determine whether or not to send the output of the ALU to, respectively: i) the address register, ii) the data register, and/or iii) the memory location that the address register points to.
- Bits 14, 15, and 16 determine which of 8 (=2^3) possible jump conditions apply. If all three bits are zero, do not jump (the next instruction executed is the next one in the program); the remaining 7 possibilities encode things like “jump if ALU output is negative”, “jump if ALU output is zero”, “jump regardless of ALU output”, and so on. The destination of the jump is always to the instruction in instruction memory* with the address currently in the address register.

(*For simplicity, the computer design in Nand2Tetris features a separate memory space for the instructions that make up the program and for the data the program operates on)

The Nand2Tetris CPU instruction set is a rather minimalist one, but even so it allows for real computation.

If you need convincing that this is true, consider a program that adds all numbers from 1 to 100. An annotated instruction sequence which achieves this is given below, interspersed with what the code might look like in a more readable language:


D refers to the data register, A to the address register, and M[A] to the memory contents that the address register points to. Before each machine language segment are one or two lines of higher-level code which may translate into the machine code underneath. The choice of M[0] and M[1] as the places where we store the two variables is arbitrary.

Such a list of instructions is called machine language.

With machine language, we have finally risen from the abyss of hardware to the surface world of software. Having achieved this, all that remains (to misquote Winston Churchill) is the proper application of overwhelming abstraction.

Assemblers & virtual machines

Machine language, though powerful, suffers from a significant flaw: no one wants to write machine language.

Thankfully (these days), practically no one has to.

The first layer we can add on machine language is ditching the ones and zeroes for something marginally more readable, but keeping a one-to-one mapping between statements in our programming language and machine instructions. For example, instead of “0000 0000 0010 1011” to load the contents of memory location 43 into the address register, we write “LOAD 43”, and use a program that converts such statements to the machine language equivalents (if such a program does not exist yet, we have to do it manually).

We can also write programs that let us define variables as stand-ins for memory addresses or data values, and then convert these to the corresponding memory locations for us. A massive benefit for the programmer is also ditching the insistence on one-to-one correspondence between lines and machine instructions. A single statement in a high-level language translates into many machine language instructions.

Programming languages that retain a strong correspondence between statements and the computer’s machine language are termed assembly languages. The program that performs the work of converting an assembly language into machine language is called an assembler.

In general, a program that converts another program written in language A into a version that runs in language B are called compilers. The process of running any program eventually ends with it being compiled, often through many intermediate steps, into machine language.

Virtual machines

Often, we want our programs to work not just on one processor type and one computer, but on many computers with different processors and hence possibly different underlying instruction sets.

One way to achieve this is to have a specification for a virtual machine (VM). For any language that we want to implement, we write a compiler that converts that language into instructions that the VM can understand. For all computer platforms we need to support, we write a compiler that converts the VM instruction language into the machine language of that computer platform.

This means that whenever we want to add a new programming language, we need to write only one compiler (language -> VM), rather than having to write separate compilers for every platform (assuming VM -> platform machine language compilers exist for every platform). The standardised VM specification serves as a station through which all programs on the journey to execution.

(The Java Virtual Machine (JVM) is one famous – or infamous – example of a VM.)

There are two main models for a VM: register-based and stack-based. Nand2Tetris implements a stack-based VM.

A stack is a data structure on which we have two operations: push, which adds an element to the top of the stack, and pop, which returns and removes the topmost element. We also assume we can implement functions that pop two elements off the stack, add them together, and push the result onto the stack.

The first surprising fact about stacks is that it is possible to compute any arithmetic expression with stacks. From Nand To Tetris gives an example:

Likewise, we can determine the truth or falsity of logical expressions by implementing, for example, an equal-to function that pops two elements of the stack, pushes “true” to the stack if they’re equal, and pushes “false” otherwise (boolean values may be implemented as false being 0 and true being any other value, for instance).

The second surprising fact about stacks is that it also lends itself naturally to modelling program flow.

Consider what flow control must do. Most simply, the flow must split if we have an if-statement. Even our machine language can handle this with its jump commands.

The trickier part is calling functions. If a program encounters a function call while executing, the program should now switch to running the commands that make up that function, but not before figuring out what the arguments to the function are supposed to be. After the function has finished executing, its value should be returned to the expression that was executing when it was called, and execution should continue where it left off before the function call.

All of this gets more complicated when we consider that the function we call may itself call many different functions and these functions may call the original function yet again, or that some functions may call themselves, and so on. How can we possibly implement such complicated logic with a stack machine?

Consider – what is the natural representation of a nested series of function calls? A stack. We want the ability to add things to the top of the called-function stack when new functions are called (pushing), and removing them from the stack when they finish (popping).

The details of implementing function calling is rather complex. The Nand2Tetris implementation involves maintaining various special memory segments, used for things like function arguments, local variables, and so on, which I will not discuss here.

It should be noted that though VMs are a useful abstraction, they are not strictly necessary and many compilers still work by compiling directly to machine language.

High-level languages & operating systems

Now that we have our VM, we can start implementing high-level programming languages (ones that are made for humans to write, not computers to execute) on top of it.

The implementation process typically looks like writing a specification for a language and then writing a compiler that translates the new language into VM or machine code that does what the language specification says it should do. Alternatively, we can have an interpreted language, where a lower-level programming language determines the values and effects of statements in the new language on the fly, though this is typically far slower.

Armed with a high-level language, we can now implement an operating system in a reasonable amount of time and without driving the developers crazy (imagine writing 100 000 lines of code for a stack machine, let alone in machine language).

(However, to implement an operating system, it is also important that the programming language is sufficiently low-level that we can deal with hardware with the required finesse.)

The task of an operating system is to provide commonly-needed functions like mathematical operations (in the Nand2Tetris platform, only addition is implemented in hardware, so even multiplication must be implemented in software), string handling, parsing of input from other devices, sending instructions to other devices, and allocating time on the processor. Imagine if the programmer had to specify exactly which pixels on the screen should be black to render the letter ‘a’ – not fun.

We’ve come a long way – and skipped steps here and there for brevity – but now we’re finally ready to implement Tetris.


Credit to Randall Munroe of xkcd. Original can be found here.

Tetris not included

The book’s website has free software for implementing (a software emulation of) everything discussed in the book. In theory, this allows the reader to build (an emulation of) the computer hierarchy specified in the book, all the way from logic gates to a simple Pong game (in an example of patently criminal false advertising, Tetris is never actually implemented in the book).

In practice, I found the software to be rather slow and finicky to use (it might run better in Windows). Also, if you follow along with the book, you won’t actually build a self-contained computer emulation yourself, but rather go through the implementation level by level, implementing each level yourself but using the pre-built implementation for all levels below. This is a rather trivial difference, but I find it does remove some of the motivation from the project (though I am sure that doing so would still be of enormous educational value).

After deciding to ditch the companion software, I used Racket (a language in the Lisp family that comes with a very friendly IDE) to write code that emulates logic gates. Once this was no longer sufficient, I followed the book with circuit diagrams for a few more chapters, after which I resorted to just reading (often many times, until things sunk in).

I found the first part of the book to be the most interesting (you may have noticed that my discussion of chips and ALUs was much more in-depth than that of later topics; rest assured that this is my own personal bias rather than the book’s).

The primary value of From Nand to Tetris is that it dispels any sense of magical mystery surrounding the functioning of a computer. After you’ve designed the hardware and implemented a machine language, all that follows is the implementation of ever fancier languages and programs on top of those. If you have decent programming experience, though you may have no idea about the details, it does not require you to suspend much disbelief that such a thing is possible. I read From Nand to Tetris after having already spent a lot of time programming, but having only the most rudimentary understanding of logic gates and no clue what DFFs were; hence the first chapters were eye-opening, and while the later chapters contained a wealth of new information, they weren’t captivating in the same way.

I wholeheartedly recommend From Nand to Tetris for people who have done some programming and want to see how such a miracle can be implemented in hardware in the first place. The first part of the book is remarkably self-contained, assuming no knowledge of electronics (logic gates and DFFs are assumed as primitive objects and the details of their construction not discussed), and though prior exposure to boolean logic is helpful, it is not necessary.

The key point to understand with From Nand to Tetris is that it’s goal is to get you from NAND-gates and DFFs to operating systems and high-level languages as quickly as possible. It almost entirely omits discussion of alternative design decisions, though entirely different architecture possibilities exist on virtually every step of the journey. Optimisation is not discussed. If you want an in-depth understanding of some part of the journey, you are better off reading something else.

If you want to see how some steps are even possible, or to see one complete example of what the entire computer technology stack looks like at a (300-page) glance, then From Nand to Tetris is an impressively complete and concise guide.

If you’re ever stuck on a desert island with nothing but a large supply of NAND-gates and DFFs, electrical wiring, a copy of From Nand to Tetris, and a lot of patience, don’t worry – just build a computer, program it, and play Tetris to pass the time.