23 --- CEK, CESK

Tuesday, 31 March 2020

Presenters (1) Kaylin Devchand & Kate Rupa (2) Sean Wallace, Zachary Wolfe

Our next step is to re-introduce variables. First we deal with variables à la mathematics and then with assignable ones.

CEK

The syntax extension is the familiar one:

(struct decl [var rhs body])

e	=	...
	\|	x
	\|	[decl x e e]

x	=	Variables

That is, the language is extended with variable occurrences and with variable declarations. The scope of a variable in a decl syntax node is only the body sub-expression. Let’s refer to this language as VarExpr.

We know from our work with interpreters that an expression with variables must be interpreted within the scopes of all of its variable occurrences.

An expression is represented with abstract syntax trees, and a scope is represented with an environment. Even if we ignore the exact nature of environments we can still equip the existing machine rules for ArithmeticExpr with an environment register; see figure 87. The side conditions on these expressions are as before, which is why they are omitted.

current			next
C	E	K	C	E	K
(e1 o n2)	E	K	e1	E	K, (L E o n2)
(e1 o e2)	E	K	e2	E	K, (R E o e1)
(n1 o n2)	E	K	n	∅	K
n	∅	K, (L E o n1)	(n o n1)	E	K
n	∅	K, (R E o e1)	(e1 o n)	E	K
Figure 87: A Tabular Description of the CEK Machine

Here is how these rules differ from those of the CK machine:

the machine has an environment register, which holds values for all variables in the control (C) register;
when a sub-expression gets shifted from the C register to the K register, it must be accompanied by the current E—because the scope of an expression never changes;
when the machine returns a number, the value of E is ∅, the empty environment, because numbers aren’t variables;
when a return state is merged with a frame from the control context, the machine puts a possibly full-fledged expression in the C register and therefore it must put the corresponding E (from K) back into the environment register.

Thus far the environment was just along for the ride, without being used in any way. Clearly, before the machine can use the environment to find the value of variables, such values need to get into the environment. And this can happen only when the machine evaluates a decl tree. Specifically, the machine must evaluate the right-hand side sub-expression and remember to put its value into the environment; after that happens, it may interpret the body sub-expression of the decl tree.

So, the first concept we need is an extension of Frame:

(D E x e) is a frame that denotes (decl x (--) e). When a value v shows up in a return state, this context specifies that it is the value of x.

The second concept is an environment:

E is a sequence of variable mappings, each of which is written as (x ↦ v). We use ∅ for the empty environment. The notation (E+ (x ↦ v)), written E, (x ↦ v) in the transition tables, means that a variable binding for x is added or replaces an existing one.
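
As a minimal sketch, assuming an association list as the representation (the course's own environment module may well differ), the two pieces of notation can be read as follows. The names env-add and env-lookup are made up for this illustration.

```racket
#lang racket

;; An environment as an association list; '() plays the role of ∅.
;; env-add and env-lookup are hypothetical names for this sketch.

;; (E+ (x ↦ v)): add a binding for x, replacing an existing one
(define (env-add E x v)
  (cons (cons x v)
        (filter (λ (binding) (not (eq? (car binding) x))) E)))

;; find the v such that (x ↦ v) ∈ E
(define (env-lookup E x)
  (define binding (assq x E))
  (if binding
      (cdr binding)
      (error 'env-lookup "unbound variable: ~a" x)))

(define E1 (env-add '() 'x 1))
(define E2 (env-add E1 'y 2))
(define E3 (env-add E2 'x 3)) ; replaces the binding for x
```

Because env-add builds a new list, E1 and E2 remain usable after E3 is created, which is what the machine needs when a frame remembers an older environment.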

current			next			if
C	E	K	C	E	K
(decl x e1 e)	E	K	e1	E	K, (D E x e)
n	∅	K, (D E x e)	e	E, (x ↦ n)	K
x	E	K	v	∅	K	(x ↦ v) ∈ E
Figure 88: The CEK Machine Transitions for Vars and Decls

With these in place, we can formulate the three rules that explain how the CEK machine deals with variable declarations and references to variables; see figure 88. Let’s look at them in order:

The transition rule for a decl tree shifts the rhs field to the control register and remembers the rest of the expression with the new kind of D frame on the K.
The introduction of a new kind of frame demands a new “return state” transition. The second transition rule takes care of the new frame on K by shifting the body field of the decl struct (as remembered in the frame) to the control register; the returned value gets put into the environment E as the value of x (also as remembered in the frame).
The third rule deals with variables in the control register. The value of the variable sits in E, the environment register. So the machine extracts the value and places it into the control register.
Stop! Why is the environment register set to ∅?

Added post lecture

The Nature of CEK States

The specification of the CEK machine leaves implicit how to load a program into the machine—establish an initial state—and how to recognize a final state and unload it:

An initial state consists of a complete program, that is, one in which every variable reference points to a decl declaration, plus an empty environment and an empty stack. The stack register must obviously be empty. But why is the environment empty, too?
A final state has a value in the C register and an empty stack in the K register. The content of the environment register does not matter.

In general, the environment register must provide meaning—a value—for all variable references without corresponding declaration in the code register. Environments in a stack frame explain the meaning of the variable references without decl in its expressions.
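
The condition on states can be phrased as a check on free variables. The following is only a sketch under assumptions: expressions are numbers, symbols, three-element infix lists, and decl structs as in figure 89; environments are association lists; and free-vars, complete-program?, and covers? are hypothetical helper names.

```racket
#lang racket

(struct decl [var rhs body] #:transparent)

;; VarExpr = Number | Symbol | (list VarExpr Op VarExpr) | (decl Symbol VarExpr VarExpr)
;; free-vars: the variables that occur in e without an enclosing decl
(define (free-vars e)
  (match e
    [(? number?) (set)]
    [(? symbol? x) (set x)]
    [(decl x rhs body)
     ;; the scope of x is the body only, not the right-hand side
     (set-union (free-vars rhs) (set-remove (free-vars body) x))]
    [(list e1 _ e2) (set-union (free-vars e1) (free-vars e2))]))

;; a complete program has no free variables
(define (complete-program? e) (set-empty? (free-vars e)))

;; E (an association list here) must cover the free variables of C
(define (covers? E C)
  (for/and ([x (in-set (free-vars C))])
    (and (assq x E) #t)))
```

For instance, (complete-program? (decl 'x 1 (list 'x '+ 2))) holds, while (decl 'x 'x 1) is not complete because the right-hand side lies outside the scope of x.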

Stop! How can we get back to a form of CK machine?

Stack vs Tree

In the previous lecture we observed that we can use a stack to represent the rest of the instructions, that is, those instructions that must be executed when the instructions in the control register are all done. This remains the case for the language with declared variables.

At first glance, the content of the environment register seems to only grow; at least only one transition rule adds (or replaces) an (existing) association of a variable with a value. But this impression is wrong because (1) a return state wipes out the environment register and (2) every time the machine pops a frame from the stack register, it extracts an environment E and sticks it into the environment register.

Nevertheless, with only decl in the language, each decl expression is evaluated exactly once and we know that it is evaluated in the context of its original stack. Hence, one could “optimize” this machine and place variable bindings onto this stack until the instructions in decl’s body are all run. At that point, it would be safe to pop the variable binding from the stack.
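
The push-and-pop discipline can be illustrated with a small recursive evaluator. This is a sketch under assumptions, not the machine itself: the control stack is the implicit one of Racket, and the bindings live in a separate list that grows and shrinks in lockstep with it. The names bindings, lookup, and evaluate are made up for this illustration, and only +, - and * are supported.

```racket
#lang racket

(struct decl [var rhs body] #:transparent)

;; the one stack that holds variable bindings; newest binding first
(define bindings '())

(define (lookup x)
  (define b (assq x bindings))
  (if b (cdr b) (error 'lookup "unbound variable: ~a" x)))

(define (evaluate e)
  (match e
    [(? number? n) n]
    [(? symbol? x) (lookup x)]
    [(decl x rhs body)
     (define v (evaluate rhs))
     (set! bindings (cons (cons x v) bindings)) ; push on entry to the body
     (define result (evaluate body))
     (set! bindings (rest bindings))            ; pop once the body is done
     result]
    [(list e1 o e2)
     (define r (evaluate e2)) ; right operand first, as in the machine
     (define l (evaluate e1))
     (case o
       [(+) (+ l r)]
       [(-) (- l r)]
       [(*) (* l r)]
       [else (error 'evaluate "unknown operator: ~a" o)])]))
```

Evaluating (decl 'x 2 (list (decl 'x 3 (list 'x '* 'x)) '+ 'x)) pushes the inner binding of x on top of the outer one, pops it when the inner body is done, and leaves the list of bindings empty at the end.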

This observation was exploited in the early days of compiler work. Once a stack was accepted (which took two decades), it was used to represent both the control context and the variable context. Indeed, language designers turned this idea around and rejected language features for a long time if they destroyed this unity of control and variable information.

Procedures are one critical example of this kind. They—not the closures or objects found in modern languages—were added to languages in a rather restricted manner. C is one such traditional language, and Rust has in some ways inherited this idea.

Adding closures or objects destroys this unity and demands an environment representation that resembles a tree.

Adding First-Class Functions

Here is our old syntax for adding (non-recursive) functions and function calls:

(struct fun [parameter body])
(struct cal [fun arg])

e	=	...
	\|	[fun x e]
	\|	[cal e e]

The addition of functions to our language calls for three changes. First, the evaluation of a function call demands two new kinds of stack frames:

one for evaluating the argument, and it must remember the function expression and its environment;
one for evaluating the function expression, and it must remember the argument value.

Second, the set of values must include closures, pairings of fun ASTs with environments. Third, the transition that realizes a function call must shift the closure’s encapsulated environment into the environment register and extend it with the value for the parameter.

Stop! Work out the details with paper and pencil.
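
One possible outcome of this exercise, restricted to numbers, variables, fun and cal, is sketched below. Everything beyond the two structs from the syntax definition is an assumption: the closure struct, the frame tags 'ARG and 'FUN, association-list environments, and the names step and run.

```racket
#lang racket

(struct fun [parameter body] #:transparent)
(struct cal [fun arg] #:transparent)
(struct closure [fun env] #:transparent) ; a fun AST paired with an environment

(define (val? x) (or (number? x) (closure? x)))

;; one transition; E is an association list, K a list of frames (top first)
(define (step C E K)
  (match* (C E K)
    ;; a variable: fetch its value, return with the empty environment
    [((? symbol? x) E K) (values (cdr (assq x E)) '() K)]
    ;; a fun AST becomes a closure over the current environment
    [((? fun? f) E K) (values (closure f E) '() K)]
    ;; evaluate the argument first; remember the function expression and E
    [((cal f a) E K) (values a E (cons (list 'ARG E f) K))]
    ;; argument done: evaluate the function expression; remember the value
    [((? val? v) _ (cons (list 'ARG E-f f) K))
     (values f E-f (cons (list 'FUN v) K))]
    ;; the call: shift the closure's environment in and extend it
    [((closure (fun x body) E-c) _ (cons (list 'FUN v) K))
     (values body (cons (cons x v) E-c) K)]))

(define (run program)
  (let loop ([C program] [E '()] [K '()])
    (if (and (val? C) (null? K))
        C
        (let-values ([(C E K) (step C E K)])
          (loop C E K)))))
```

With these rules, (run (cal (cal (fun 'x (fun 'y 'x)) 1) 2)) yields 1: the inner function finds x in the environment captured by its closure, not in the environment of the call site. A number in function position has no matching rule, so the sketch gets stuck with a match error.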

Implementing the CEK Machine

The code in figure 89 uses the same environment that we always used. Otherwise it is a transliteration of the transition rules into the usual code schema.

Lectures/23/cek.rkt
  #lang racket

  (require "../21/while.rkt" "../22/stack.rkt")
  (require "../6/environment.rkt")
  (require "simple-show.rkt")

  ;; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

  (define LEFT "left")
  (define RGHT "right")

  (define DECL "let")

  (struct decl [var rhs body] #:transparent)
  #; {type VarExpr = ... || [decl Var VarExpr VarExpr] || Var}
  #; {type Var = Symbol}
  (define var? symbol?)

  #; {type Env : mapping Var to Val}
  #; {type Val = Number}
  (define val? number?)

  #; {VarExpr -> Number}
  ;; print each step of the calculation that reduces `expr` to a number
  (define (driver initial)
    (define-values (*C *E *K) (load initial))
    (show-state *C *E *K)
    (while (not (final? *C *E *K)) do
      (set!-values (*C *E *K) (transition *C *E *K))
      (show-state *C *E *K))
    (unload *C *E *K))

  #; {VarExpr Env Stack -> VarExpr Env Stack}
  (define (transition C E K)
    (match* { C E K }
      [{(? var?) E stack}
       (define val (lookup C E))
       (show-return-state val '[] stack)
       (values val '[] stack)]
      [{(decl x ae bdy) E stack }
       (values ae E (push stack (list DECL x bdy E)))]
      [{(? val? n) E (app pop `((,(? (is? DECL)) ,x ,bdy ,E-d) ,K))}
       (values bdy (add x n E-d) K)]

      [{(list (? not-number? ae_1) o (? number? ae_2)) E stack}
       (values ae_1 E (push stack (list LEFT o ae_2 E)))]
      [{(list ae_1 o (? not-number? ae_2)) E stack }
       (values ae_2 E (push stack (list RGHT o ae_1 E)))]
      [{(list (? number? l) o (? number? r)) E stack }
       (define n (reduce C))
       ;; a return state carries the empty environment '[], as in `load`
       (show-return-state n '[] stack)
       (values n '[] stack)]
      [{(? val? n) E (app pop `((,(? (is? RGHT)) ,o ,ae_1 ,E-r) ,K))}
       (values (list ae_1 o n) E-r K)]
      [{(? val? n) E (app pop `((,(? (is? LEFT)) ,o ,ae_2 ,E-l) ,K))}
       (values (list n o ae_2) E-l K)]))

  #; {VarExpr -> VarExpr Env Stack}
  (define (load ae) (values ae '[] mt))

  #; {VarExpr Env Stack -> Number}
  (define (unload n _1 _2) n)

  #; {VarExpr Env Stack -> Boolean}
  ;; a value in C and an empty stack; the content of E does not matter
  (define (final? control E stack)
    (and (val? control) (equal? mt stack)))

  #; {(list Number o Number) -> Number}
  ;; okay this is a trick, but almost every language has this trick
  (define ns (make-base-namespace))
  (define (reduce ae)
    ( (eval (second ae) ns) (first ae) (third ae)))

  (define not-number? (compose not number?))

  ;; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
  (provide driver decl)

Figure 89: Implementing the CEK Machine
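
Figure 89 relies on course files that are not shown here. For experimenting without them, the following is a minimal self-contained sketch of the same transitions. It assumes association lists for environments, plain lists for the stack (top frame first), symbol tags 'L, 'R and 'D for frames, and support for +, - and * only; apply-op and run are hypothetical names.

```racket
#lang racket

(struct decl [var rhs body] #:transparent)

(define (apply-op o l r)
  (case o
    [(+) (+ l r)]
    [(-) (- l r)]
    [(*) (* l r)]
    [else (error 'apply-op "unknown operator: ~a" o)]))

(define (transition C E K)
  (match* (C E K)
    ;; the rules of figure 88
    [((? symbol? x) E K) (values (cdr (assq x E)) '() K)]
    [((decl x rhs body) E K) (values rhs E (cons (list 'D E x body) K))]
    [((? number? n) _ (cons (list 'D E-d x body) K))
     (values body (cons (cons x n) E-d) K)]
    ;; the rules of figure 87
    [((list (? number? n1) o (? number? n2)) _ K)
     (values (apply-op o n1 n2) '() K)]
    [((list e1 o (? number? n2)) E K) (values e1 E (cons (list 'L E o n2) K))]
    [((list e1 o e2) E K) (values e2 E (cons (list 'R E o e1) K))]
    [((? number? n) _ (cons (list 'L E-l o n1) K)) (values (list n o n1) E-l K)]
    [((? number? n) _ (cons (list 'R E-r o e1) K)) (values (list e1 o n) E-r K)]))

;; run a complete program; returns the final value and the number of transitions
(define (run program)
  (let loop ([C program] [E '()] [K '()] [steps 0])
    (cond
      [(and (number? C) (null? K)) (values C steps)] ; final: E does not matter
      [else
       (define-values (C+ E+ K+) (transition C E K))
       (loop C+ E+ K+ (add1 steps))])))
```

For example, (run (decl 'x '(1 + 2) '(x * 4))) passes through the states (1 + 2), 3, (x * 4), x, 3, (3 * 4) and 12 in the control register before it stops.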

CESK

Adding assignments to VarExpr does not, in principle, require a store. Every decl is evaluated exactly once. Hence, if we systematically rename variables (in modern parlance, “refactor”) before evaluation, we can make sure that no variable name is ever in scope twice. Updating the environment directly would suffice; indeed, we could get away with updating it functionally.
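
The renaming step might look as follows. This is a sketch under assumptions: the function name rename-apart and the naming scheme x_1, x_2, ... are invented here, and the scheme presumes that the program does not already use names of this shape.

```racket
#lang racket

(struct decl [var rhs body] #:transparent)

;; rename every declared variable to a fresh name so that no name is declared
;; twice; `renaming` maps the original names in scope to their replacements
(define (rename-apart e)
  (define counter 0)
  (define (fresh x)
    (set! counter (add1 counter))
    (string->symbol (format "~a_~a" x counter)))
  (let rename ([e e] [renaming (hash)])
    (match e
      [(? number? n) n]
      [(? symbol? x) (hash-ref renaming x x)] ; free variables stay as they are
      [(decl x rhs body)
       (define x+ (fresh x))
       ;; the right-hand side is outside the scope of x
       (decl x+
             (rename rhs renaming)
             (rename body (hash-set renaming x x+)))]
      [(list e1 o e2) (list (rename e1 renaming) o (rename e2 renaming))])))
```

Applied to (decl 'x 1 (decl 'x (list 'x '+ 1) 'x)), the outer x becomes x_1 and the inner one x_2, so the inner right-hand side refers to x_1 and the inner body to x_2.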

The point of a model is to express realistic ideas with minimal effort. We know that first-class functions or objects would demand the separation of environment and store. Instead of those, however, we pick allocated structures as representative and yet slightly simpler linguistic constructs that demand such a separation. As we proceed, imagine the presence of the complicated ones, too.

We write the new constructs as infix forms, which makes it easy to modify the CEK machine without changing the environment or the stack. Here is the language extension, dubbed Expr, in terms of syntax:

e	=	...
	\|	(e alloc e)
	\|	(e dot left)
	\|	(e dot right)
	\|	(e setleft e)
	\|	(e setright e)

Syntactically these language extensions merely add new forms of “arithmetic (infix) expressions” although we know of course that their meaning is nothing like those of ordinary arithmetic operations.

Semantically, alloc puts two values into the store. The two are treated as a pair. The dot expressions retrieve the left or right part of such a pair. The set... expressions modify the left or right part of the pair. The setters return the old values of the fields of a pair that they modify.

The representation of the store is like that of the environment:

Locations are taken from an unspecified set. Given a store, we can always pick a new location for now. l+1 means a location next to l.
S stands for a store, a sequence of location mappings, each of which is written as (l ↦ v). The notation (S+ (l ↦ v)) means a location binding for l is added or replaces an existing one.
We use ∅ for the empty store.

Like numbers, Locs are values. We use v when we mean an arbitrary value, number or location; n1 and n2 mean number only; and l is a location.
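
To make the notation concrete, here is a minimal sketch of one possible store representation. It assumes an immutable hash table keyed by a loc struct that wraps a natural number; the store module used in figure 92 is not shown and may be organized differently. The names store-add, store-ref and fresh-pair-loc are hypothetical.

```racket
#lang racket

(struct loc [address] #:transparent) ; locations are a separate kind of value

(define (val? x) (or (number? x) (loc? x)))
(define (loc+1 l) (loc (add1 (loc-address l)))) ; l+1

(define empty-store (hash))                 ; ∅
(define (store-add S l v) (hash-set S l v)) ; (S+ (l ↦ v))
(define (store-ref S l) (hash-ref S l))     ; the v such that (l ↦ v) ∈ S

;; pick a location l such that neither l nor l+1 is in S
(define (fresh-pair-loc S)
  (loc (add1 (for/fold ([m -1]) ([l (in-hash-keys S)])
               (max m (loc-address l))))))

;; put two values into neighboring locations; return the first location
;; together with the extended store
(define (alloc S v1 v2)
  (define l (fresh-pair-loc S))
  (values l (store-add (store-add S l v1) (loc+1 l) v2)))
```

Because the hash table is immutable, store-add returns a new store and leaves the old one intact, which matches the functional reading of (S+ (l ↦ v)).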

As above, we first introduce stores into the machine; see figure 90. So far, the store is simply along for the ride.

current				next
C	E	S	K	C	E	S	K
(e1 o v2)	E	S	K	e1	E	S	K, (L E o v2)
(e1 o e2)	E	S	K	e2	E	S	K, (R E o e1)
(n1 o n2)	E	S	K	n	∅	S	K
v	∅	S	K, (L E o v1)	(v o v1)	E	S	K
v	∅	S	K, (R E o e1)	(e1 o v)	E	S	K
(decl x e1 e)	E	S	K	e1	E	S	K, (D E x e)
v	∅	S	K, (D E x e)	e	E, (x ↦ v)	S	K
x	E	S	K	v	∅	S	K	where (x ↦ v) ∈ E
Figure 90: A Tabular Description of the CESK Machine

But, once we have a machine with a store, we can use it to interpret the additional “arithmetic” operations in Expr; see figure 91.

current				next
C	E	S	K	C	E	S	K
(v1 alloc v2)	E	S	K	l	∅	S, (l ↦ v1, l+1 ↦ v2)	K	where l and l+1 are not in S
(l dot left)	E	S	K	v	∅	S	K	where (l ↦ v) ∈ S
(l dot right)	E	S	K	v	∅	S	K	where (l+1 ↦ v) ∈ S
(l setleft v1)	E	S	K	v	∅	S, (l ↦ v1)	K	where (l ↦ v) ∈ S
(l setright v1)	E	S	K	v	∅	S, (l+1 ↦ v1)	K	where (l+1 ↦ v) ∈ S
Figure 91: The CESK Machine Transitions for Mutable Pairs

Here are interpretations of these five new rules:

the alloc transition reserves two neighboring places in the store and places the given values there;
the dot left transition retrieves the current value from the first slot of the given pair (location) that the corresponding allocation reserved;
the dot right transition retrieves the current value from the second slot of the given pair (location) that the corresponding allocation reserved;
the setleft transition sets the first slot of the (location of the) given pair to the right-hand value of the operation;
the setright transition sets the second slot of the (location of the) given pair to the right-hand value of the operation.

The notation l+1 suggests that locations are numbers, but this is not necessarily so. In some languages this is the case. If such languages also lack (type) soundness, it is even possible to “confuse” numbers that represent locations with actual numbers, multiply or divide two such locations (meaningless operations!), and get into all kinds of trouble. C, C++, and Objective C are still-prominent languages that take this view.
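
A small sketch shows how this confusion can arise when locations are plain natural numbers. The representation (a hash table keyed by numbers) and the names alloc, dot-left, p, S and confused are assumptions made for this illustration only.

```racket
#lang racket

;; ASSUMPTION for illustration: locations are plain natural numbers
(define (alloc S v1 v2)
  (define l (hash-count S)) ; the next unused address (nothing is ever freed)
  (values l (hash-set (hash-set S l v1) (add1 l) v2)))

(define (dot-left S l) (hash-ref S l))

(define-values (p S) (alloc (hash) 10 20)) ; p is 0: a location and a number

;; arithmetic on a location is accepted without complaint, and the result
;; can be used as a location again: this reads the right slot via dot-left
(define confused (dot-left S (+ p 1)))
```

Nothing in this representation distinguishes the location p from the number 0, so (dot-left S 0) is accepted as well.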

Example Let’s take a look at a trace of a sample program in a language with allocation:

*C	*E	*S	*K
(2 * ((1 alloc 2) dot left))	∅	∅
((1 alloc 2) dot left)	∅	∅	[R ∅ * 2]
(1 alloc 2)	∅	∅	[R ∅ * 2], [L ∅ dot left]
0	∅	0 ↦ 1, 1 ↦ 2	[R ∅ * 2], [L ∅ dot left]	pop
0	∅	0 ↦ 1, 1 ↦ 2	[R ∅ * 2], [L ∅ dot left]
(0 dot left)	∅	0 ↦ 1, 1 ↦ 2	[R ∅ * 2]
1	∅	0 ↦ 1, 1 ↦ 2	[R ∅ * 2]	pop
1	∅	0 ↦ 1, 1 ↦ 2	[R ∅ * 2]
(2 * 1)	∅	0 ↦ 1, 1 ↦ 2
2	∅	0 ↦ 1, 1 ↦ 2		pop
2	∅	0 ↦ 1, 1 ↦ 2

The machine pushes the context of the alloc expression on the stack and places 1 and 2 into the store in neighboring locations. Here the store uses natural numbers as locations, and hence the alloc expression returns the first of the two locations as its result. This location is then placed into the context ((--) dot left), which extracts the first value: 1. Finally, the machine executes the multiplication expression.

Stop! Is the CESK interpretation of Expr sound?

Wait! There are no types, what could “type sound” mean here? In the absence of types, sound means the “interpretation” of all programs has an expected outcome: a result, a loop, or an erroneous situation that we accept, such as not being able to add a number to a function. And this is the hint.

Can the machine produce nonsensical results? Are there states in which it should get stuck but doesn’t, even though the state is neither final nor one for which the machine is really defined?

This is one of two topics we will discuss next time.

Implementing the CESK Machine

Figure 92 displays a fairly standard implementation. The key difference between the CESK machine and the CEK machine concerns the transitions for alloc, dot left, dot right, setleft and setright. Some fit the standard format of infix arithmetic notation (intentionally so); the one for dot notation doesn’t and therefore needs two special-case transitions.

Lectures/23/cesk.rkt
  #lang racket

  (require "store.rkt")
  (require "../21/while.rkt" "../22/stack.rkt" "../6/environment.rkt")
  (require "show-with-store.rkt")

  ;; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
  (define LEFT  "left")
  (define RGHT "right")
  (define DECL  "let")

  (struct decl [var rhs body] #:transparent)
  #; {type Expr = ...
           || [Expr alloc Expr]
           || [Expr dot left]
           || [Expr dot right]
           || [Expr setleft Expr]
           || [Expr setright Expr]}
  #; {type Var = Symbol}
  (define var? symbol?)
  #; {type Env mapping Var to Val}
  #; {type Stack = [Listof Frame]}
  #; {type Frame = [list {LEFT || RGHT || DECL} Env X Expr]}
  #; {type Loc}
  #; {type Store mapping Locs to Val}
  #; {type Val = Number || Loc}
  (define (val? x) (or (number? x) (loc? x)))

  #; {Expr -> Number}
  ;; print each step of the calculation that reduces `expr` to a number
  (define (driver initial)
    (define-values (*C *E *S *K) (load initial))
    (show-state *C *E *S *K)
    (with-handlers ([exn:fail? (λ (xn) (show-core-dump xn *C *E *S *K))])
      (while (not (final? *C *E *S *K)) do
        (set!-values (*C *E *S *K) (transition *C *E *S *K))
        (show-state *C *E *S *K))
      (unload *C *E *S *K)))

  #; {Expr Env Store Stack -> Expr Env Store Stack}
  (define (transition C E S K)
    (match* { C E S K }
      ;; - - - special case for 'dot [left | right] - - - - - - - - -
      [{(list (? val? v) 'dot (or 'left 'right))  E S K}
       (define-values (n S+) (reduce C S))
       (values n empty S+ K)]
      [{(list ae_1 'dot (and which (or 'left 'right))) E S K }
       (values ae_1 E S (push K (list LEFT E 'dot which)))]
      ;; - - - because left and right are not expressions - - - - - -

      [{(? var?) E S K }
       (define val (lookup C E))
       (values val empty S K)]
      [{(decl x ae bdy) E S K }
       (values ae E S (push K (list DECL E x bdy)))]
      [{(? val? n) E S (app pop `((,(? (is? DECL)) ,E-d ,x ,bdy) ,K))}
       (values bdy (add x n E-d) S K)]
      [{(list (? not-val? ae_1) o (? val? v)) E S K}
       (values ae_1 E S (push K (list LEFT E o v)))]
      [{(list ae_1 o (? not-val? ae_2)) E S K }
       (values ae_2 E S (push K (list RGHT E o ae_1)))]
      [{(list (? val? l) o (? val? r)) E S K }
       (define-values (n S+) (reduce C S))
       (values n empty S+ K)]
      [{(? val? v) E S (app pop `((,(? (is? RGHT)) ,E-r ,o ,ae_1) ,K))}
       (values (list ae_1 o v) E-r S K)]
      [{(? val? v) E S (app pop `((,(? (is? LEFT)) ,E-l ,o ,ae_2) ,K))}
       (values (list v o ae_2) E-l S K)]))

  #; {Expr -> Expr Env Store Stack}
  (define (load ae) (values ae '[] plain mt))

  #; {Expr Env Store Stack -> Number}
  (define (unload n _1 S _3)
    (if (loc? n) (retrieve-pair S n) n))

  #; {Expr Env Store Stack -> Boolean}
  (define (final? control E S K)
    (and (val? control) (equal? mt K)))

  #; {(list Number o Number) Store -> Number Store}
  (define ns (make-base-namespace))
  (define (reduce ae S)
    (match ae
      [(list val1 'alloc val2) (alloc S val1 val2)]

      ;; the operator names match the grammar: setleft and setright
      [(list l 'setleft  v) (values (retrieve S l)
                                    (update S l v))]
      [(list l 'setright v) (values (retrieve S (loc+1 l))
                                    (update S (loc+1 l) v))]
      [(list l 'set sel nu)  (error 'set "")]

      [(list l 'dot 'left)  (values (retrieve S l) S)]
      [(list l 'dot 'right) (values (retrieve S (loc+1 l)) S)]
      [(list l 'dot sel)           (error 'dot "")]

      ;; rely on meta-language for arithmetic
      [(list lft o rgt) (values ((eval o ns) lft rgt) S)]))

  (define not-val? (compose not val?))

  ;; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
  (provide driver decl)

Figure 92: Implementing the CESK Machine

The other change concerns the interpretation of instructions. Before, instructions were arithmetic operations such as those provided by Racket and other languages. The allocation primitives work on the store, and therefore reduce now accepts and returns the store. Otherwise the conditional inside of reduce performs precisely the computations specified in the above table.

Added post lecture

The Nature of CESK States

The initial and final states of the CESK machine are similar to those of the CEK machine. An initial state consists of the program’s instructions plus an empty environment, store, and stack.

A final state is again recognized as one with a value in the code register and an empty stack. Such a value might be a location, though, and in that case unloading the machine may have to reach into the store and extract the two parts of the pair. The CESK code in figure 92 indicates how such an unloading may work. Of course, if one of these parts is a location, the unloading may have to continue.
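
A continued unloading of this kind might be sketched as follows. The sketch assumes a store represented as an immutable hash table keyed by a loc struct, and it assumes that no pair in the store reaches itself; unload-value is a hypothetical name and is not the retrieve-pair function used in figure 92.

```racket
#lang racket

(struct loc [address] #:transparent)
(define (loc+1 l) (loc (add1 (loc-address l))))

;; replace a location by the pair it stands for, continuing into both parts;
;; ASSUMPTION: no pair in S reaches itself
(define (unload-value v S)
  (cond
    [(loc? v) (cons (unload-value (hash-ref S v) S)
                    (unload-value (hash-ref S (loc+1 v)) S))]
    [else v]))

;; a store with two pairs: the right part of the first is the second pair
(define S (hash (loc 0) 1 (loc 1) (loc 2)
                (loc 2) 3 (loc 3) 4))
```

Here (unload-value (loc 0) S) produces (cons 1 (cons 3 4)), while a plain number is returned unchanged.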

Stop! When will this process stop?

Two Answers

Added post lecture

To get from a CEK machine state to a CK machine state, we replace all free variables in the expressions in C and inside of stack frames with their values from the “covering” environments. More generally this transformation yields a revised CK machine from a CEK machine. The revised CK machine uses substitution to explain variables and scope instead of environments.
At this point you may wonder whether we can get from the CESK machine to something like a CEK machine and back to a C machine, meaning an explanation of assignment statements in terms of a step-by-step calculation. While the answer is “yes,” the introduction of a store (to explain the effects of allocating objects and mutating their fields) changes the nature of the machine more than any other transformation to our machines (CC, CK, CEK). The answer was first given in my dissertation, which shows that we can calculate with expressions that contain assignment statements basically as much as we calculate with the kinds of expressions we get to know in middle school. Covering this material in an undergraduate course goes beyond “principles” and I therefore skip it.
The unloading function of a CESK machine could go into an infinite loop if a pair is somehow involved in a cycle. Then again, we know that the graph involved in such a cycle is finite, and starting in Fundamentals I you learn about techniques for recognizing loops in finite graphs. Modern programming languages therefore typically implement fast mechanisms for rendering such graphs.
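
The first of these answers, replacing free variables with their values from the covering environments, can be sketched in code. The sketch assumes association-list environments, frames of the shape (list tag E ...) with the tags 'L, 'R and 'D, and numbers as the only values; close, remove-binding, close-frame and cek->ck are hypothetical names.

```racket
#lang racket

(struct decl [var rhs body] #:transparent)

(define (remove-binding x E)
  (filter (λ (b) (not (eq? (car b) x))) E))

;; replace the free variables of e with their values from E
(define (close e E)
  (match e
    [(? number? n) n]
    [(? symbol? x)
     (define b (assq x E))
     (if b (cdr b) x)]
    [(decl x rhs body)
     ;; x is bound in body, so E's binding for x must not reach into it
     (decl x (close rhs E) (close body (remove-binding x E)))]
    [(list e1 o e2) (list (close e1 E) o (close e2 E))]))

;; a frame loses its environment; the expression it remembers gets closed
(define (close-frame frame)
  (match frame
    [(list 'L E o n) (list 'L o n)]
    [(list 'R E o e1) (list 'R o (close e1 E))]
    [(list 'D E x body) (list 'D x (close body (remove-binding x E)))]))

;; a CEK state becomes a CK state
(define (cek->ck C E K)
  (values (close C E) (map close-frame K)))
```

For example, the CEK state with x in the control register, the environment (x ↦ 3), and the single frame (L (x ↦ 3) * 4) turns into the CK state with 3 in the control register and the frame (L * 4).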
