On this page:
The Rational Programmer
Language Design, Rationally
Teaching Programming, Rationally
Comparing Tooling, Rationally

1 The Rational Programmer🔗

Christos Dimoulas & Matthias Felleisen

31 Aug 2021



Mainstream programming language research rests on many “obvious” claims about programming that, upon closer inspection, turn out to be not all that relevant to the “developer on the street.”

Correctness matters. We all know that every non-trivial software system comes with bugs, and neither developers nor their employers care too much about bugs that rarely if ever affect users. Indeed, software systems don’t even have proper specifications, so why should correctness matter.

But, you say, compiler correctness must matter. Does it? How many software projects died because the underlying programming languages didn’t come with formally verified implementations? Developers have worked around compiler bugs for as long as compilers and programming have been around.

Performance matters. Would people really program in Python if performance mattered that much? This is not to say that performance doesn’t matter. It matters as much as correctness. Get lots of “it” right, and developers will be happy.

The performance argument is as old as compilers. And yet, developers have steadily adopted languages at increasingly high levels of expressiveness and deceasing levels of performance. After a long period of Fortran and C programming, developers moved to C++ and then Java. Over the past 20 years, JavaScript and Python have made tremendous gains in mindshare; now it’s TypeScript’s turn and optional typing in general; even Python comes with features that resemble types.

How much of this development is due to programming language research? Some, there is no doubt. Considering the proportion of research effort that goes into certain topics, however, it seems like very little of what programming language research serves up matters, even given the long lag between invention and industrial adoption.

So why do developers switch from one language to the next? Pragmatics.

When a linguist says “pragmatics,” it is a reference to a contextual use of language. Developers understand this idea at an intuitive level. If the context is the early 2000s and the web browser, a programmer can’t use Fortran. If a developer wishes to communicate thoughts about code to other developers, Assembly won’t do. If a biologist-programmer needs a script for analyzing some experiment’s data, he won’t reach for JavaScript. Platforms and libraries, the tool chain and clarity of expression matter most in the real world of software development.

Developers want to balance development speed with delivery quality. The former covers both the creation of software and altering existing software, often just called “maintenance.” Delivery quality comprises many factors: reasonably correct functionality, sufficient speed, affordable cost, and benefits from being “first.”

And that brings us to a question that programming language research has avoided for several decades:

how can we study the pragmatics of programming languages?

As Jim Morris noted a long time ago (1967) in his dissertation, language researchers must understand the syntax, semantics, and pragmatics of their objects of study. Sadly, he himself identified the behavior of an executing machine as the pragmatics—a notional machine so to speak—of a language. But we all know that this is just a different form of semantics and therefore doesn’t seem to qualify as a true exploration of pragmatics. And ever since, most language researchers have essentially been happy to identify semantics and pragmatics, to the detriment of language users in the world of software.

The Rational Programmer🔗

The research question is similar to the one economists faced when they began to think about the effectiveness of policies—the pragmatics of government. In this context, pragmatics is to be understood as the way government interacts with, and speaks to, citizens, including the implicatures of these conversations. From the early days, economists figured they had to understand people’s economic behavior, first at the level of individuals and then at an aggregate level.

A century back and some, John Stuart Mills proposed to make up a “model human being,” homo economicus.Although Homo economicus is the foundation of classic economics, models resting on it can explain only some economic phenomena and miss many others. Behavioral economicsdue to Kahnemann and Thaler—starts from the alternative assumption, namely, that economics must study the non-rational decision-making process. This approachj is complementary to classic economics and is able to explain economic phenomena that escape the classicists. See below. He postulated that this economic being acted rationally, using all available information to make beneficial decisions in the realm of economics. Much later, economists identified benefit with maximal profit or minimal cost. To this day, homo economicus is a major assumption of all (neo)classical models of economics, and the assumption has furthered the development of the discipline.

In translation, programming languages needs investigative tools beyond mathematical modeling and performance evaluation. Here we propose the foundation of such a tool: the rational programmer. Like homo economicus, the rational programmer is an idealization—a programmer who exploits all available information about a program and its execution to make progress on some task. The next section illustrates how we can use the rational programmer to compare a modern language design—gradual typing—in the context of a particular task.

At a high level, the rational programmer is a computable process that attempts to complete a programming task. It does so by transforming a program either step-by-step or in one go. Specifically, given a utility function U, the goal of a rational programmer RT is to turn a program p that, according to U, has utility below a threshold TH to a program p’ that has utility above TH. Both the utility function U and the rational programmer RT depend on a feature extraction function F. Such a function plays the role of abstracting the information related to the program (and its evaluation) that is available to the RT as it attemptes to complete its task. Putting these pieces together a rational programmer RT can be described as a function RT[U,TH, F](p) where U[F](p) < TH and U[F](RT[U,TH, F](p)) > TH.

Language Design, Rationally🔗

The design of a language (feature) consists of two pieces: syntax and semantics. Hence, if the same syntax comes with two different semantics, a developer may wonder in which manner each semantics supports the various aspects of software development, which is “use in context” and thus pragmatics.

While a programmer edits code, the static semantics—variable declared, types specified and checked—plays the dominant role. If a piece of syntax comes with two different static semantics, an exploration of the editing practicalities is of interest.

While a programmer debugs code, the dynamic semantics—observable behavior—is key. If the syntax comes with two dynamic semantics, the ways the language is used will differ with respect to several development activities: testing, finding the logical source of runtime exceptions, performance measurements, etc.We consider gradually typed languages with manifest types as ill-suited for gradually migrating an untyped code base to the typed world or even just optionally annotating code with types.

Let’s make this idea concrete. Research and development of gradually typed languages has produced three major different forms of semantics for systems with first-class objects and convenient ways to connect typed and untyped code fragments:
  • no runtime checking of types

    This semantic approach emphasizes the editing benefits of types. As a developer edits code, an IDE can use type information with many tasks, especially name completion and checking for typo-level mistakes.

  • wrapper-based runtime checking of types

    While this approach comes with all the benefits of the first one, it also aims to provide the runtime benefits of types. Specifically, a sound type system prevents runtime conflicts between operations and data. In a context where typed and untyped code fragments co-exist with boundaries between them drawn in arbitrary ways, all kinds of values flow back and forth between those. The question becomes how to realize type soundness.

    Static checking is no longer enough, because an untyped piece of code can call a method on an object that originates from a typed piece of code and who knows whether this method exists or whether the arguments conform to the specified types. Hence, the original academic designs of gradual typing come with a semantics that wraps objects as they flow from typed to untyped code and vice versa. These wrappers can then perform runtime checks that correspond to static type checks.

  • first-order runtime checking of types

    Wrapping objects is obviously expensive in terms of both time and space. The question is whether the tests inside of wrappers can somehow be distributed over the code and thus avoid the cost of wrapping. The so-called transient semantics of gradual type systems accomplishes just that and seems to offer similar guarantees.

When typed and untyped code are in conflict—say, untyped code calls an integer-typed method with a string—these semantics exhibit different behavior. The first one relies on the checks in the underlying untyped language implementation to eventually expose this mismatch; it may or may not do so. In the second one, the string enters the wrapper, which checks its integerness and then signals a type mismatch. Finally, the transparent semantics has an intergeness check at the entry point of the method, which also catches the problem. If the argument is an object itself, however, the two semantics exhibit dramatically different behavior.

More generally, the two semantics with runtime checks discover problems at different times and report different information about the failure. The wrapper semantics can pinpoint the exact place where typed and untyped code “agreed” to exchange an object under certain expectations with respect to the failed type tests. The first-order checks do not support such precise reporting of information.

Even at this point, it is completely clear that pragmatics is not (just) semantics. A “student” of programming language pragmatics would now ask the question

how a semantics serves the developer in the context of debugging a type mismatch.

Specifically, he might wish to figure out which semantics is most helpful in finding the source of the runtime exceptions that indicate a conflict between typed and untyped code. And that is pragmatics.

A rational-programmer investigation can answer this question (to some extent). Here is how. Imagine running a gradually typed code base with a known type mismatch, that is, an untyped piece of code treating a value from the type side in an inappropriate manner. Thus the above question is specialized to this:

Which of the semantics helps discover the problem and track down its source?

The “rationality” of the postulated programmer turns this re-statement into two truly operational questions:
  1. What information can or should the rational programmer act on?

  2. How can this information be used to find the source of the problem?

Let’s take a look at the three distinct semantics from the “information” angle:
  • The no-runtime-checking semantics defers to the underlying untyped programming language to catch type mismatches. In the simple scenario from above an integer-typed method may stash away the given string in some array and some other method may try to use it later as an int. It is at that point that the underlying machinery signals an error and points the developer to the runtime stack.

    In short, the rational programmer must use the runtime stack information to find the actual source of the problem as opposed to its symptom. The astute reader might be wondering now whether the runtime stack is useful in our example or in general.

  • The semantics using wrapper-based runtime checks realizes a rather different behavior. For plain types, such as int, it checks the argument value as soon as it crosses from untyped code to typed code. So in the example from the previous bullet, the presence of a string is discovered immediately not when some other method retrieves the string from some container.

    A starker illustration requires a modified example. Imagine a method md that accepts an object and eventually calls some method on this object. Several things can go wrong now when an untyped code fragment calls md. The supplied object may not have the desired callback method. Even if the method exists, it may take a different number of arguments than expected. It may take the correct number of arguments but expect a different type of argument. The reader can easily imagine other variants of such mismatches.

    While the no-runtime-checks semantics can get lost in arbitrary ways for such a higher-order scenarios, the wrapper-based one can catch any form of mistake here. Indeed, wrappers can also enclose information about where the object crossed from one side of the code base to the other, and failing checks can display this information in the error message. The question is whether this information is helpful.

  • The semantics using first-order checks is directly comparable to the wrapper semantics for simple cases. Checks catch int-String mismatches immediately. For object-based mismatches, however, the semantics may discover a problem eventually and, for some classes of mismatches, it never does. Even if it does, there is much less information available about the source of the problem. There simply is no place to store comparable information about an object as it flows through the program.

At least at first glance, the three semantics seem to have a rather different impact on how developers debug type-mismatches they migrate a code base from the untyped world to the typed one. They catch these problems at the different execution stages—if they discover them at all—and provide different kinds of information about where the type mismatch comes from.

This brings us to the second question, namely, whether and how the developer can exploit this error information during debugging sessions. Since the errors are about type mismatches between typed pieces of code and untyped pieces, the most systematic way to eliminate them is to add more type annotations and to let the type checker discover where the developer makes faulty assumptions about the addition of types to a piece of code. The error message of each semantics—if any—comes with some information about where the mismatch was discovered and how it relates to the code base. More technically, it tends to identify a boundary between typed and untyped code that causes a mismatch. By typing the untyped side of this boundary, the developer ought to be able to discover the faulty assumption or at least get closer to its source.

Based on this analysis, we can see that these two questions are algorithmic in nature. That is, we can implement a rational programmer and run it on thousands and tens of thousands scenarios to find out whether the semantics differ effectively during debugging sessions. In other words, we can find answers to the original pragmatics question in an automated manner.

To go into details here would distract from the main goal of the essay. The interested reader should check out Lazarek et al.’s ICFP article on this precise idea. The basic insight a reader should take away from this section is this:

by assuming rationality in a programmer—using all available information during development activities—it may become possible to computationally simulate activities and to inspect a large number of scenarios automatically.

Doing so suggests what a well-trained programmer may do and may shed light on how different semantics compare from a pragmatic angle.

Teaching Programming, Rationally🔗

Programming languages exist so that people can create useful software systems. Hence, programming language research should inform programming, at both the level of experienced programmers as well as novices. That is, insights into programming language design should provide instructors of programming courses with insights on how to teach the subject.

This analysis raises two questions: Given the recent popularity of synthesis in the programming languages field, we could specialize the first question and ask it slightly differently. We leave this task and reflecting about this question to our readers.
  • what does mathematical modeling of programming languages and associated tools tell instructors of programming courses about their work?

  • what does the construction of compilers and optimizers tell instructors of programming courses about their work?

While the field provides some partial answers to these questions, they are unsatisfactory to people who have taught introductory programming for decades. The rational-programmer concept, however, seems to immediately provide ideas.

Let us look at debugging again, this time in the context of an introductory course. Debugging should be a rational activity, and it is definitely a task that programming instructors must teach. To make it all concrete, consider a simple example, taken from How to Design Programs (HtDP). One of the book’s early examples of “design by composition” is the add-polygon-to function, which solves the following problem:

Design a function that consumes a (representation of a) polygon together with an image and adds the former to the latter. Here is the data definition for Polygons:
; A Polygon is one of:
;  (cons Posn '[])
;  (cons Posn (tech "Polygon"))
; INTERPRETATION The Posns represent the corners of the
; polygon, in order.

Following the design recipe for function design, the first pieces needed are data examples. The first one is a list of Posns:

(list (make-posn 10 10) (make-posn 90 90) (make-posn 90 10))

Three Posns means the list represents a triangle. The second is a background image:


Just combining these two data examples yields a functional example:

Adding SP from figure 13 to image should yield image

because, thinking in terms of the yellow square, (10,10) is the Posn near the top-left, (90,90) is near the bottom right, and (90,10) is the Posn close to the top right.

; LANGUAGE: htdp/isl
(require 2htdp/image)
(define BG
  (rectangle 100 100 'solid 'yellow))
(define SP
  (list (make-posn 10 10) (make-posn 90 90) (make-posn 90 10)))
; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
; {Polygon Image -> Image}
(define (add-polygon-to lop background)
  (local ((define closed-path (add-at-end (first lop) lop))
          (define background+ (add-path closed-path background)))
; A ClosedPath is a Polygon where the first and last
; Posn are identical.
; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
; {ClosedPath Image -> Image}
(define (add-path lop background)
    [(empty? (rest lop)) background]
     (local ((define others (add-path (rest lop) background))
             (define link++ (link (first lop) (second lop) background)))
; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
; [Posn Posn Image -> Image]
(define (link fm to bg)
  (add-line bg (posn-x fm) (posn-y fm) (posn-x to) (posn-y to) 'red))
; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
; {X [Listof X] -> [NEListof X]}
(define (add-at-end x lox)
    [(empty? lox) (list x)]
    [else (cons (first lox) (add-at-end x (rest lox)))]))

Figure 13: Adding a polygon to an image

The point of the problem in HtDP is to introduce design by composition, that is, the design of a function that performs its computation by calling other functions—built-in operations or user-designed ones—and compose their results into the final answer. In this context, “composing” isn’t just the usual mathematical notion of function composition, but it means any way of using several designed functions.

Figure 13 displays the solution; the actual includes this dependency diagram, too:
; dependencies:
;  add-polygon-to
;      |
;      |      
;      |-> add-at-end
;      |
;      |
;      |-> add-path * bug here
;                |
;                |-> link
ItHere the word “design” is used in the spirit of HtDP. shows how the top-level function depends on three user-designed helper functions. The critical hint is to add the first Posn of the Polygon to its end, turning it into a (closed) ClosedPath and to render this ClosedPath into the given image.

HtDP also demands unit tests for each user-designed function. For function design by composition, the book recommends testing the functions “bottom up,” meaning the one without dependencies gets tested first, then those that depend on the just tested functions, and so on. Figure 14 displays the first few tests, all of which succeed.

(check-expect (add-at-end 'd '[a b c])
              '[a b c d])
(check-expect (link (make-posn 10 10) (make-posn 90 90) BG)
              (add-line BG 10 10 90 90 'red))
(check-expect (add-path (list (make-posn 10 10)) BG)

Figure 14: Testing a program, bottom up

The final and critical test, though, the one for add-polygon-to fails:DrRacket’s shows expression coverage and that alone clarifies immediately where the error is. We ignore this aspect here but a rational, information-maximizing programmer would not.
(check-expect (add-polygon-to SP BG)
              (let* ((s BG)
                     (s (add-line s 10 10 90 90 'red))
                     (s (add-line s 90 90 90 10 'red))
                     (s (add-line s 90 10 10 10 'red)))
A failing test case tells us that something’s wrong with the collection of function, and the challenge is to find the bug.

What is a rational programmer to do?

In this case, maximizing information refers two different ideas: (1) the failing test case and (2) the design of the function itself. The first one is given. The second one is “by composition,” meaning the buggy function calls other functions, each designed systematically. Rationality suggests that the bug is either a part of the function under investigation or one of the functions it calls. Furthermore, rationality tells us that the composition is at fault only if the called functions work properly for the given inputs.

The rational approach is to derive test cases for the called functions from the failing test case of the calling function. Derive means to construct the inputs of the new test case from the inputs of the failing test using the shape of the embedded function call; the rational programmer must also calculate the expected output for these inputs. Once these derived test cases exist, the rational programmer re-runs the test suite to see whether the derived tests work properly.

By looking at the two composed calls, our rational programmer can create two test cases from the failing one: the first one for the call (add-at-end (first lop) lop) and the second one for the call (add-path closed-path background). The construction of the first derived test is straightforward; the second one takes a bit of symbolic manipulation of the program.A human programmer would take a short cut here. Here are the results of these derivations:
  (add-at-end (make-posn 10 10) SP)
  (list (make-posn 10 10)
    (make-posn 90 90)
    (make-posn 90 10)
    (make-posn 10 10)))
  (local ((define closed-path (add-at-end (make-posn 10 10) SP)))
    (add-path closed-path background))
  (let* ((s BG)
         (s (add-line s 10 10 90 90 'red))
         (s (add-line s 90 90 90 10 'red))
         (s (add-line s 90 10 10 10 'red)))
The first test checks whether add-at-end works correctly for the given polygon. It succeeds. The second test checks whether drawing the closed path into the given background image yields the desired triangle. It doesn’t, that is, this second test fails.

Our rational programmer now understands that add-path is buggy too. He hasn’t ruled out yet that there are two errors: one in add-poylgon-to and one in add-path. But he knows that there’s one in the second function or its dependencies.

(define BG+1link
  (let* ((s BG)
         (s (add-line s 10 10 90 90 'red)))
  (link (make-posn 10 10) (make-posn 90 90) BG)
(define BG+2links
  (let* ((s BG)
         (s (add-line s 10 10 90 90 'red))
         (s (add-line s 90 90 90 10 'red)))
  (link (make-posn 90 90) (make-posn 90 10) BG+1link)
(define BG+tri
  (let* ((s BG)
         (s (add-line s 10 10 90 90 'red))
         (s (add-line s 90 90 90 10 'red))
         (s (add-line s 90 10 10 10 'red)))
  (link (make-posn 90 10) (make-posn 10 10) BG+2links)

Figure 15: Derived tests for link

The add-path function depends on one user-designed function: link. To rule out a bug in this dependency, the rational programmer once again derives tests for the dependency (link) from the buggy test for add-path: see figure 15. The tests add one side of the triangle, two, and all three, respectively. All of them pass.

At this point, the rational programmer knows that add-path is guaranteed to have a bug. In other words, the composition of add-path (recursively) and link does not work properly. Looking closely, the programmer discovers that link should be called on others not backgrounda common mistake. Once this problem is eliminated, all tests pass, including the one for add-polygon-to.We consider systematic design also a consequence of the rational-programmer assumption, but illustrating this idea is not straightforward.

In sum, this second example suggests that

assuming rationality in a programmer—using all available information during development activities—can inform the pedagogy of programming. Specifically, systematic design (a la HtDP) naturally yields systematic debugging.

Note how this statement does not claim that novice programmers are rational. Instead it says that the rational-programmer assumptions informs instructors of novices how to teach debugging (and perhaps bring the thinking skills of novices in line with some rationality).

Postscript A “real” programmer would not waste bits on writing three helper functions when all of this functionality can be implemented with a function that uses a single “loop” (or comprehension). Figure 16 displays this variant of add-polygon-to. The variant includes a bug that is analogous to the one above.

How should this “real” programmer tackle the bug search in this context?

; {[Listof Posn] Image -> Image}
(define (add-polygon-to lop background)
  (define poly (append lop (list (first lop))))
  (define-values (background+ _)
    (for/fold ([img background] [from (first poly)]) ([to (rest poly)])
      (define x.from (posn-x from))
      (define y.from (posn-y from))
      (define x.to   (posn-x to))
      (define y.to   (posn-y to))
      (define img++  (add-line background x.from y.from x.to y.to 'red))
      (values img++ to)))

Figure 16: The “concise” or “real” version of add-poly-to

No, this function isn’t very large. Indeed, to allude to the possibility of a large function, we named all coordinates explicitly instead of using nested expressions. But every instructor knows that the standard curriculum introduces loops first, functions second. Worse, it continues to spread the nonsense that function calls are expensiveSo why are these people using Python then?, pushing student towards writing large, unmanageable functions—which cannot be debugged rationally.

Comparing Tooling, Rationally🔗

Software Developers no longer distinguish between a programming language itself and the tool chain it comes with. Of course a language comes with a compiler.Some have only an interpreter. We ignore those here. But, a language must also come with support for a developer’s favorite IDE. It must have a debugger. It must supply a profiler. Indeed, to an individual developer there is often the IDE, the debugger, the profiler, and so on.

In the meantime, language researchers work on enhancing compilers, lifting the level of debugging, and experimenting with alternative ways of profiling code. For compilers, researchers can often show that for some collection of well-known benchmark an enhanced compiler performs better on many programs, but not all. For other tools, however, they face the same problem as language designers: they often cannot validate that developers would be better off with one or the other variant of the same tool.

For a concrete example, consider the task of profiling