23 — CEK, CESK
Tuesday, 31 March 2020
Presenters (1) Kaylin Devchand & Kate Rupa (2) Sean Wallace, Zachary Wolfe
Our next step is to re-introduce variables. First we deal with variables à la mathematics and then with assignable ones.
CEK
(struct decl [var rhs body])
e = ...
  | x
  | [decl x e e]
x = Variables
We know from our work with interpreters that an expression with variables must be interpreted within the scope of all of its variable occurrences.
Expressions are represented as abstract syntax trees, and a scope is represented as an environment. Even if we ignore the exact nature of environments, we can still equip the existing machine rules for ArithmeticExpr with an environment register; see figure 87. The side conditions on these expressions are as before, which is why they are omitted.
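| current C | E | K | next C | E | K |
|---|---|---|---|---|---|
| (e1 o n2) | E | K | e1 | E | K, (L E o n2) |
| (e1 o e2) | E | K | e2 | E | K, (R E o e1) |
| (n1 o n2) | E | K | n | ∅ | K |
| n | ∅ | K, (L E o n1) | (n o n1) | E | K |
| n | ∅ | K, (R E o e1) | (e1 o n) | E | K |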
the machine has an environment register, which holds values for all variables in the control (C) register;
when a sub-expression gets shifted from the C register to the K register, it must be accompanied by the current E, because the scope of an expression never changes;
when the machine returns a number, the value of E is ∅, the empty environment, because numbers contain no variables;
when a return state is merged with a frame from the control context, the machine puts a possibly full-fledged expression in the C register and therefore it must put the corresponding E (from K) back into the environment register.
Thus far the environment was just along for the ride, without being used in any way. Clearly, before the machine can use the environment to find the value of variables, such values need to get into the environment. And this can happen only when the machine evaluates a decl tree. Specifically, the machine must evaluate the right-hand side sub-expression and remember to put its value into the environment; after that happens, it may interpret the body sub-expression of the decl tree.
(D E x e) is a frame that denotes (decl x (--) e). When a value v shows up in a return state, this frame specifies that v is the value of x.
E is a sequence of variable mappings, each of which is written as (x ↦ v). We use ∅ for the empty environment. The notation (E+ (x ↦ v)) means a variable binding for x is added or replaces an existing one.
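The environment notation can be sketched in Racket as an association list. This is only an illustration: the names `empty-env`, `add`, and `lookup` and the list representation are assumptions of this sketch, chosen to mirror how the machine code calls its environment library, not the contents of that library.

```racket
#lang racket

;; ASSUMPTION: an environment is an association list of (variable . value)
;; pairs; the names empty-env, add, and lookup are hypothetical.
(define empty-env '())

;; (E + (x ↦ v)): the new binding goes in front, so it shadows any older
;; binding for the same variable, which has the effect of replacing it
(define (add x v env)
  (cons (cons x v) env))

;; find the v such that (x ↦ v) ∈ E; signal an error for an unbound variable
(define (lookup x env)
  (define binding (assq x env))
  (unless binding
    (error 'lookup "unbound variable: ~a" x))
  (cdr binding))

(define E1 (add 'y 2 (add 'x 1 empty-env)))
(define E2 (add 'x 10 E1))

(lookup 'x E1) ; 1
(lookup 'x E2) ; 10, the newer binding wins
```

Because `add` never mutates its argument, `E1` still maps x to 1 after `E2` is built; that is the property the machine relies on when it stores an environment inside a frame.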
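| current C | E | K | next C | E | K | if |
|---|---|---|---|---|---|---|
| (decl x e1 e) | E | K | e1 | E | K, (D E x e) | |
| n | ∅ | K, (D E x e) | e | E, (x ↦ n) | K | |
| x | E | K | v | ∅ | K | (x ↦ v) ∈ E |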
The transition rule for a decl tree shifts the rhs field to the control register and remembers the rest of the expression with the new kind of D frame on the K.
The introduction of a new kind of frame demands a new “return state” transition. The second transition rule takes care of the new frame on K by shifting the body field of the decl struct (as remembered in the frame) to the control register; the returned value gets put into the environment E as the value of x (also as remembered in the frame).
The third rule deals with variables in the control register. The value of the variable sits in E, the environment register. So the machine extracts the value and places it into the control register.
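The rules seen so far can be put together into a small, self-contained machine. The following is a minimal sketch, not the lecture's implementation: the frame tags `'L`, `'R`, `'D`, the helper `apply-op` (which knows only `+`, `-`, and `*`), and the list representations of E and K are all assumptions made for illustration.

```racket
#lang racket

(struct decl [var rhs body] #:transparent)

;; ASSUMPTIONS of this sketch: E is an association list, K is a list of
;; frames with the top frame first, and only + - * are supported.
(define (add x v env) (cons (cons x v) env))
(define (lookup x env)
  (define binding (assq x env))
  (unless binding (error 'lookup "unbound variable: ~a" x))
  (cdr binding))

(define (apply-op o l r)
  (case o
    [(+) (+ l r)]
    [(-) (- l r)]
    [(*) (* l r)]
    [else (error 'apply-op "unknown operator: ~a" o)]))

;; one transition of the machine: C E K -> C E K
(define (step C E K)
  (match* (C E K)
    ;; a variable: fetch its value, return with the empty environment
    [((? symbol? x) E K) (values (lookup x E) '() K)]
    ;; a declaration: evaluate the right-hand side first
    [((decl x rhs body) E K) (values rhs E (cons (list 'D E x body) K))]
    ;; two numbers: compute, return with the empty environment
    [((list (? number? l) o (? number? r)) E K) (values (apply-op o l r) '() K)]
    ;; shift a sub-expression to C; the frame remembers the current E
    [((list e1 o (? number? n2)) E K) (values e1 E (cons (list 'L E o n2) K))]
    [((list e1 o e2) E K) (values e2 E (cons (list 'R E o e1) K))]
    ;; return states: merge the number with the top frame and restore its E
    [((? number? n) _ (cons (list 'D E-d x body) K)) (values body (add x n E-d) K)]
    [((? number? n) _ (cons (list 'L E-l o n1) K)) (values (list n o n1) E-l K)]
    [((? number? n) _ (cons (list 'R E-r o e1) K)) (values (list e1 o n) E-r K)]))

;; load the program, step until a number meets an empty stack, unload
(define (run program)
  (let loop ([C program] [E '()] [K '()])
    (if (and (number? C) (null? K))
        C
        (let-values ([(C+ E+ K+) (step C E K)])
          (loop C+ E+ K+)))))

(run (decl 'x '(1 + 2) '(x * 3)))               ; 9
(run (decl 'x 1 (list (decl 'x 2 'x) '+ 'x)))   ; 3
```

The second program declares x twice. The inner declaration is visible only inside its own body, because the frame for the outer `+` carries the environment in which x is 1.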
Stop! Why is the environment register set to ∅?
Added post lecture
The Nature of CEK States
An initial state consists of a complete program, in which every variable reference points to a decl declaration, plus an empty environment and an empty stack. The stack register must obviously be empty. But why is the environment empty, too?
A final state has a value in the C register and an empty stack in the K register. The content of the environment register does not matter.
In general, the environment register must provide meaning—
Stop! How can we get back to a form of CK machine?
Stack vs Tree
In the previous lecture we observed that we can use a stack to represent the rest of the instructions, that is, those instructions that must be executed when the instructions in the control register are all done. This remains the case for the language with declared variables.
At first glance, the content of the environment register seems to only grow; after all, only one transition rule adds (or replaces) an association of a variable with a value. But this impression is wrong because (1) a return state wipes out the environment register and (2) every time the machine pops a frame from the stack register, it extracts an environment E and sticks it into the environment register.
Nevertheless, with only decl in the language, the body of each decl is evaluated exactly once, and we know that it is evaluated in the context of its original stack. Hence, one could “optimize” this machine and place variable bindings onto this stack until the instructions in decl’s body have all run. At that point, it would be safe to pop the variable binding from the stack.
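One way to picture this optimization is a machine without an E register whose stack holds binding frames next to the control frames. The sketch below is an illustration under assumptions, not a design from the lecture: the `'B` binding frame, `lookup-on-stack`, and `apply-op` are hypothetical names, and the approach is sound only because the language has nothing but decl and arithmetic.

```racket
#lang racket

(struct decl [var rhs body] #:transparent)

(define (apply-op o l r)
  (case o
    [(+) (+ l r)]
    [(-) (- l r)]
    [(*) (* l r)]
    [else (error 'apply-op "unknown operator: ~a" o)]))

;; search the stack, top first, for the nearest binding frame (B x v)
(define (lookup-on-stack x K)
  (match K
    ['() (error 'lookup-on-stack "unbound variable: ~a" x)]
    [(cons (list 'B (== x) v) _) v]
    [(cons _ rest) (lookup-on-stack x rest)]))

;; one transition: C K -> C K (there is no separate environment register)
(define (step C K)
  (match* (C K)
    [((? symbol? x) K) (values (lookup-on-stack x K) K)]
    [((decl x rhs body) K) (values rhs (cons (list 'D x body) K))]
    [((list (? number? l) o (? number? r)) K) (values (apply-op o l r) K)]
    [((list e1 o (? number? n2)) K) (values e1 (cons (list 'L o n2) K))]
    [((list e1 o e2) K) (values e2 (cons (list 'R o e1) K))]
    ;; right-hand side done: push a binding frame and run the body
    [((? number? n) (cons (list 'D x body) K)) (values body (cons (list 'B x n) K))]
    ;; body done: the binding is no longer needed, pop it
    [((? number? n) (cons (list 'B _ _) K)) (values n K)]
    [((? number? n) (cons (list 'L o n1) K)) (values (list n o n1) K)]
    [((? number? n) (cons (list 'R o e1) K)) (values (list e1 o n) K)]))

(define (run program)
  (let loop ([C program] [K '()])
    (if (and (number? C) (null? K))
        C
        (let-values ([(C+ K+) (step C K)])
          (loop C+ K+)))))

(run (decl 'x '(1 + 2) '(x * 3)))               ; 9
(run (decl 'x 1 (list (decl 'x 2 'x) '+ 'x)))   ; 3
```

The lookup walks past control frames until it meets the nearest binding for the variable, which is the innermost declaration still in scope.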
This observation was exploited in the early days of compiler work. Once a stack was accepted (which took two decades), it was used to represent both the control context and the variable context. Indeed, language designers turned this idea around and rejected language features for a long time if they destroyed this unity of control and variable information.
Procedures are one critical example of this kind. They—
Adding closures or objects destroys this unity and demands an environment representation that resembles a tree.
Adding First-Class Functions
e = ...
  | [fun x e]
  | [cal e e]
Evaluating a cal expression calls for two new kinds of frames:
one for evaluating the argument, and it must remember the function expression and its environment;
one for evaluating the function expression, and it must remember the argument value.
Stop! Work out the details on paper and pencil.
Implementing the CEK Machine
The code in figure 89 uses the same environment that we always used. Otherwise it is a transliteration of the transition rules into the usual code schema.
Lectures/23/cek.rkt
#lang racket

(require "../21/while.rkt" "../22/stack.rkt")
(require "../6/environment.rkt")
(require "simple-show.rkt")

;; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
(define LEFT "left")
(define RGHT "right")
(define DECL "let")

(struct decl [var rhs body] #:transparent)
#; {type VarExpr = ... || [decl Var VarExpr VarExpr] || Var}
#; {type Var = Symbol}
(define var? symbol?)

#; {type Env : mapping Var to Val}
#; {type Val = Number}
(define val? number?)

#; {VarExpr -> Number}
;; print each step of the calculation that reduces `initial` to a number
(define (driver initial)
  (define-values (*C *E *K) (load initial))
  (show-state *C *E *K)
  (while (not (final? *C *E *K)) do
    (set!-values (*C *E *K) (transition *C *E *K))
    (show-state *C *E *K))
  (unload *C *E *K))

#; {VarExpr Env Stack -> VarExpr Env Stack}
(define (transition C E K)
  (match* { C E K }
    ;; a variable: look up its value and return it with the empty environment
    [{(? var?) E stack}
     (define val (lookup C E))
     (show-return-state val '[] stack)
     (values val '[] stack)]
    ;; a declaration: evaluate the right-hand side, remember the rest in a frame
    [{(decl x ae bdy) E stack}
     (values ae E (push stack (list DECL x bdy E)))]
    ;; the right-hand side is done: bind x and run the body
    [{(? val? n) E (app pop `((,(? (is? DECL)) ,x ,bdy ,E-d) ,K))}
     (values bdy (add x n E-d) K)]
    [{(list (? not-number? ae_1) o (? number? ae_2)) E stack}
     (values ae_1 E (push stack (list LEFT o ae_2 E)))]
    [{(list ae_1 o (? not-number? ae_2)) E stack}
     (values ae_2 E (push stack (list RGHT o ae_1 E)))]
    ;; two numbers: return the result with the empty environment '[],
    ;; the same value that load and final? use (the original returned mt here)
    [{(list (? number? l) o (? number? r)) E stack}
     (define n (reduce C))
     (show-return-state n '[] stack)
     (values n '[] stack)]
    ;; return states: merge the number with the top frame, restore its environment
    [{(? val? n) E (app pop `((,(? (is? RGHT)) ,o ,ae_1 ,E-r) ,K))}
     (values (list ae_1 o n) E-r K)]
    [{(? val? n) E (app pop `((,(? (is? LEFT)) ,o ,ae_2 ,E-l) ,K))}
     (values (list n o ae_2) E-l K)]))

#; {VarExpr -> VarExpr Env Stack}
(define (load ae)
  (values ae '[] mt))

#; {VarExpr Env Stack -> Number}
(define (unload n _1 _2)
  n)

#; {VarExpr Env Stack -> Boolean}
(define (final? control E stack)
  (and (val? control) (equal? '[] E) (equal? mt stack)))

#; {(list Number o Number) -> Number}
;; okay this is a trick, but almost every language has this trick
(define ns (make-base-namespace))
(define (reduce ae)
  ((eval (second ae) ns) (first ae) (third ae)))

(define not-number? (compose not number?))

;; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
(provide driver decl)
CESK
Adding assignments to VarExpr does not, in principle, require a store. Every decl is evaluated exactly once. Hence, if we systematically rename variables (in modern parlance, “refactor”) before evaluation, we can make sure that no variable name is ever in scope twice. Updating the environment directly would suffice; indeed, we could get away with updating it functionally.
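Such a systematic renaming might look as follows. This is a sketch under assumptions: it covers only decl, variables, and infix arithmetic (an assignment form would rename its target like any other variable reference), and the function name `rename-all` and the `x_0`, `x_1` naming scheme are hypothetical.

```racket
#lang racket

(struct decl [var rhs body] #:transparent)

;; VarExpr -> VarExpr
;; give every declared variable a fresh name of the shape x_0, x_1, ...
;; ASSUMPTION: the program does not already use names of this shape
(define (rename-all e)
  (define counter 0)
  (define (fresh x)
    (define name (string->symbol (format "~a_~a" x counter)))
    (set! counter (add1 counter))
    name)
  ;; `renaming` maps each original name in scope to its replacement
  (define (rename e renaming)
    (match e
      [(? number?) e]
      ;; a variable without a declaration stays as it is
      [(? symbol? x) (hash-ref renaming x x)]
      [(decl x rhs body)
       (define x* (fresh x))
       ;; x is in scope in the body only, not in the right-hand side
       (decl x*
             (rename rhs renaming)
             (rename body (hash-set renaming x x*)))]
      [(list e1 o e2)
       (list (rename e1 renaming) o (rename e2 renaming))]))
  (rename e (hash)))

(rename-all (decl 'x 1 (list (decl 'x 2 'x) '+ 'x)))
;; (decl 'x_0 1 (list (decl 'x_1 2 'x_1) '+ 'x_0))
```

After renaming, the two declarations of x no longer share a name, so an update to one can never be mistaken for an update to the other.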
The point of a model is to express realistic ideas with minimal effort. We know that first-class functions or objects would demand the separation of environment and store. Instead of those, however, we pick allocated structures as representative and yet slightly simpler linguistic constructs that demand such a separation. As we proceed, imagine the presence of the complicated ones, too.
e = ...
  | (e alloc e)
  | (e dot left)
  | (e dot right)
  | (e setleft e)
  | (e setright e)
Semantically, alloc puts two values into the store. The two are treated as a pair. The dot expressions retrieve the left or right part of such a pair. The set... expressions modify the left or right part of the pair. The setters return the old values of the fields of a pair that they modify.
Locations are taken from an unspecified set. Given a store, we can always pick a new location for now. l+1 means a location next to l.
S stands for a store, a sequence of location mappings, each of which is written as (l ↦ v). The notation (S+ (l ↦ v)) means a location binding for l is added or replaces an existing one.
We use ∅ for the empty store.
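The store notation can be sketched with an immutable hash table. Everything here is an assumption for illustration: the `loc` struct, the helper names, and the way `fresh-location` picks an unused location are hypothetical and need not match the store library that the machine code uses.

```racket
#lang racket

;; ASSUMPTION: a location is a wrapped natural number, so that it cannot
;; be confused with a plain number
(struct loc [address] #:transparent)

;; l+1: the location next to l
(define (loc+1 l) (loc (add1 (loc-address l))))

;; ∅, the empty store
(define empty-store (hash))

;; find the v such that (l ↦ v) ∈ S
(define (retrieve S l)
  (hash-ref S l (λ () (error 'retrieve "no such location: ~a" l))))

;; (S + (l ↦ v)): add a binding for l or replace the existing one
(define (update S l v)
  (hash-set S l v))

;; pick a location that is not in S
;; ASSUMPTION: locations are handed out consecutively from 0 and never freed
(define (fresh-location S)
  (loc (hash-count S)))

(define l0 (fresh-location empty-store))
(define S1 (update empty-store l0 1))
(define S2 (update S1 l0 2))

(retrieve S1 l0) ; 1
(retrieve S2 l0) ; 2
```

Since `update` is functional, `S1` keeps its old content; a machine that threads the store through its states therefore always knows which store belongs to which state.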
As above, we first introduce stores into the machine; see figure 90. So far, the store is simply along for the ride.
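| current C | E | S | K | next C | E | S | K | where |
|---|---|---|---|---|---|---|---|---|
| (e1 o v2) | E | S | K | e1 | E | S | K, (L E o v2) | |
| (e1 o e2) | E | S | K | e2 | E | S | K, (R E o e1) | |
| (n1 o n2) | E | S | K | n | ∅ | S | K | |
| v | ∅ | S | K, (L E o v1) | (v o v1) | E | S | K | |
| v | ∅ | S | K, (R E o e1) | (e1 o v) | E | S | K | |
| (decl x e1 e) | E | S | K | e1 | E | S | K, (D E x e) | |
| v | ∅ | S | K, (D E x e) | e | E, (x ↦ v) | S | K | |
| x | E | S | K | v | ∅ | S | K | (x ↦ v) ∈ E |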
But, once we have a machine with a store, we can use it to interpret the additional “arithmetic” operations in Expr; see figure 91.
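| current C | E | S | K | next C | E | S | K | where |
|---|---|---|---|---|---|---|---|---|
| (v1 alloc v2) | E | S | K | l | ∅ | S, (l ↦ v1, l+1 ↦ v2) | K | l and l+1 are not in S |
| (l dot left) | E | S | K | v | ∅ | S | K | (l ↦ v) ∈ S |
| (l dot right) | E | S | K | v | ∅ | S | K | (l+1 ↦ v) ∈ S |
| (l setleft v1) | E | S | K | v | ∅ | S, (l ↦ v1) | K | (l ↦ v) ∈ S |
| (l setright v1) | E | S | K | v | ∅ | S, (l+1 ↦ v1) | K | (l+1 ↦ v) ∈ S |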
the alloc transition reserves two neighboring places in the store and places the given values there;
the dot left transition retrieves the current value from the first slot of the given pair (location) that the corresponding allocation reserved;
the dot right transition retrieves the current value from the second slot of the given pair (location) that the corresponding allocation reserved;
the setleft transition sets the first slot of the (location of the) given pair to the right-hand value of the operation;
the setright transition sets the second slot of the (location of the) given pair to the right-hand value of the operation.
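The five store transitions can be written down as one function from an instruction and a store to a value and a new store. This is a minimal sketch under assumptions: the store is an immutable hash, locations are instances of a hypothetical `loc` struct, and new locations are chosen by counting the store's entries.

```racket
#lang racket

;; ASSUMPTION: locations are wrapped natural numbers
(struct loc [address] #:transparent)
(define (next l) (loc (add1 (loc-address l))))

;; Expr Store -> Val Store
;; S is an immutable hash from locations to values
(define (reduce C S)
  (match C
    [(list v1 'alloc v2)
     ;; pairs are never freed in this sketch, so the number of entries
     ;; is an address that is not yet in S, and neither is its neighbor
     (define l (loc (hash-count S)))
     (values l (hash-set* S l v1 (next l) v2))]
    [(list (? loc? l) 'dot 'left)
     (values (hash-ref S l) S)]
    [(list (? loc? l) 'dot 'right)
     (values (hash-ref S (next l)) S)]
    ;; the setters return the old value of the slot that they overwrite
    [(list (? loc? l) 'setleft v1)
     (values (hash-ref S l) (hash-set S l v1))]
    [(list (? loc? l) 'setright v1)
     (values (hash-ref S (next l)) (hash-set S (next l) v1))]))

(define-values (p S1) (reduce '(1 alloc 2) (hash)))
(define-values (old S2) (reduce (list p 'setright 5) S1))
(define-values (now S3) (reduce (list p 'dot 'right) S2))

(list old now) ; '(2 5)
```

An instruction that fits none of the clauses, such as `(3 dot left)`, makes `match` raise an error, which corresponds to the machine being stuck.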
The notation l+1 suggests that locations are numbers, but this is not necessarily so. In some languages this is the case. If such languages also lack (type) soundness, it is even possible to “confuse” numbers that represent locations with actual numbers, multiply or divide two such locations (meaningless operations!), and get into all kinds of trouble. C, C++, and Objective C are still-prominent languages that take this view.
| *C | *E | *S | *K | |
|---|---|---|---|---|
| (2 * ((1 alloc 2) dot left)) | ∅ | ∅ | | |
| ((1 alloc 2) dot left) | ∅ | ∅ | [R ∅ * 2] | |
| (1 alloc 2) | ∅ | ∅ | [R ∅ * 2], [L ∅ dot left] | |
| 0 | ∅ | 0 ↦ 1, 1 ↦ 2 | [R ∅ * 2], [L ∅ dot left] | pop |
| 0 | ∅ | 0 ↦ 1, 1 ↦ 2 | [R ∅ * 2], [L ∅ dot left] | |
| (0 dot left) | ∅ | 0 ↦ 1, 1 ↦ 2 | [R ∅ * 2] | |
| 1 | ∅ | 0 ↦ 1, 1 ↦ 2 | [R ∅ * 2] | pop |
| 1 | ∅ | 0 ↦ 1, 1 ↦ 2 | [R ∅ * 2] | |
| (2 * 1) | ∅ | 0 ↦ 1, 1 ↦ 2 | | |
| 2 | ∅ | 0 ↦ 1, 1 ↦ 2 | | pop |
| 2 | ∅ | 0 ↦ 1, 1 ↦ 2 | | |
Stop! Is the CESK interpretation of Expr sound?
Wait! There are no types, what could “type sound” mean here? In the absence of types, sound means the “interpretation” of all programs has an expected outcome: a result, a loop, or an erroneous situation that we accept, such as not being able to add a number to a function. And this is the hint.
Can the machine produce nonsensical results? Are there states in which it should get stuck but does not, even though the state is not final and the machine is not really defined for it?
This is one of two topics we will discuss next time.
Implementing the CESK Machine
Figure 92 displays a fairly standard implementation. The key difference between the CESK machine and the CEK machine concerns the transitions for alloc, dot left, dot right, setleft and setright. Some fit the standard format of infix arithmetic notation (intentionally so); the one for dot notation doesn’t and therefore needs two special-case transitions.
Lectures/23/cesk.rkt
#lang racket

(require "store.rkt")
(require "../21/while.rkt" "../22/stack.rkt" "../6/environment.rkt")
(require "show-with-store.rkt")

;; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
(define LEFT "left")
(define RGHT "right")
(define DECL "let")

(struct decl [var rhs body] #:transparent)
#; {type Expr = ...
               || [Expr alloc Expr]
               || [Expr dot left]
               || [Expr dot right]
               || [Expr setleft Expr]
               || [Expr setright Expr]}
#; {type Var = Symbol}
(define var? symbol?)

#; {type Env mapping Var to Val}
#; {type StacK = [Listof Frame]}
#; {type Frame = [list {LEFT || RGHT || DECL} Env X Expr]}
#; {type Loc}
#; {type Store mapping Locs to Val}
#; {type Val = Number || Loc}
(define (val? x)
  (or (number? x) (loc? x)))

#; {Expr -> Number}
;; print each step of the calculation that reduces `initial` to a number
(define (driver initial)
  (define-values (*C *E *S *K) (load initial))
  (show-state *C *E *S *K)
  (with-handlers ([exn:fail? (λ (xn) (show-core-dump xn *C *E *S *K))])
    (while (not (final? *C *E *S *K)) do
      (set!-values (*C *E *S *K) (transition *C *E *S *K))
      (show-state *C *E *S *K))
    (unload *C *E *S *K)))

#; {Expr Env Store Stack -> Expr Env Store Stack}
(define (transition C E S K)
  (match* { C E S K }
    ;; - - - special case for 'dot [left | right] - - - - - - - - -
    [{(list (? val? v) 'dot (or 'left 'right)) E S K}
     (define-values (n S+) (reduce C S))
     (values n empty S+ K)]
    [{(list ae_1 'dot (and which (or 'left 'right))) E S K}
     (values ae_1 E S (push K (list LEFT E 'dot which)))]
    ;; - - - because left and right are not expressions - - - - - -
    [{(? var?) E S K}
     (define val (lookup C E))
     (values val empty S K)]
    [{(decl x ae bdy) E S K}
     (values ae E S (push K (list DECL E x bdy)))]
    [{(? val? n) E S (app pop `((,(? (is? DECL)) ,E-d ,x ,bdy) ,K))}
     (values bdy (add x n E-d) S K)]
    [{(list (? not-val? ae_1) o (? val? v)) E S K}
     (values ae_1 E S (push K (list LEFT E o v)))]
    [{(list ae_1 o (? not-val? ae_2)) E S K}
     (values ae_2 E S (push K (list RGHT E o ae_1)))]
    ;; two values: compute; the instruction may read or change the store
    [{(list (? val? l) o (? val? r)) E S K}
     (define-values (n S+) (reduce C S))
     (values n empty S+ K)]
    ;; return states: merge the value with the top frame, restore its environment
    [{(? val? v) E S (app pop `((,(? (is? RGHT)) ,E-r ,o ,ae_1) ,K))}
     (values (list ae_1 o v) E-r S K)]
    [{(? val? v) E S (app pop `((,(? (is? LEFT)) ,E-l ,o ,ae_2) ,K))}
     (values (list v o ae_2) E-l S K)]))

#; {Expr -> Expr Env Store Stack}
(define (load ae)
  (values ae '[] plain mt))

#; {Expr Env Store Stack -> Number}
;; a location is unloaded by extracting its pair from the store
(define (unload n _1 S _3)
  (if (loc? n) (retrieve-pair S n) n))

#; {Expr Env Store Stack -> Boolean}
(define (final? control E S K)
  (and (val? control) (equal? mt K)))

#; {(list Number o Number) Store -> Number Store}
(define ns (make-base-namespace))
(define (reduce ae S)
  (match ae
    [(list val1 'alloc val2) (alloc S val1 val2)]
    ;; the grammar spells the setters setleft and setright
    ;; (the original clauses matched 'set-left and 'set-right)
    [(list l 'setleft v) (values (retrieve S l) (update S l v))]
    [(list l 'setright v) (values (retrieve S (loc+1 l)) (update S (loc+1 l) v))]
    [(list l 'set sel nu) (error 'set "")]
    [(list l 'dot 'left) (values (retrieve S l) S)]
    [(list l 'dot 'right) (values (retrieve S (loc+1 l)) S)]
    [(list l 'dot sel) (error 'dot "")]
    ;; rely on meta-language for arithmetic
    [(list lft o rgt) (values ((eval o ns) lft rgt) S)]))

(define not-val? (compose not val?))

;; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
(provide driver decl)
The other change concerns the interpretation of instructions. Before, instructions were arithmetic operations such as those provided by Racket and other languages. The allocation primitives work on the store, and therefore reduce now accepts and returns the store. Otherwise the conditional inside of reduce performs precisely the computations specified in the above table.
Added post lecture
The Nature of CESK States
The initial and final states of the CESK machine are similar to those of the CEK machine. An initial state consists of the program’s instructions plus an empty environment, store, and stack.
A final state is again recognized as one with a value in the control register and an empty stack. Such a value might be a location, though, and in that case unloading the machine may have to reach into the store and extract the two parts of the pair. The CESK code in figure 92 indicates how such an unloading may work.
Stop! When will this process stop?
Two Answers
Added post lecture
To get from a CEK machine state to a CK machine state, we replace all free variables in the expressions in C and inside of stack frames with their values from the “covering” environments. More generally this transformation yields a revised CK machine from a CEK machine. The revised CK machine uses substitution to explain variables and scope instead of environments.
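The replacement of free variables can be sketched as a function over states. The sketch rests on assumptions: environments are association lists, frames are tagged lists `(L E o n)`, `(R E o e)`, and `(D E x e)`, and the names `close`, `remove-binding`, `unframe`, and `cek->ck` are hypothetical.

```racket
#lang racket

(struct decl [var rhs body] #:transparent)

;; drop the binding for x, because a decl for x covers it in its body
(define (remove-binding x env)
  (filter (λ (b) (not (eq? (car b) x))) env))

;; VarExpr Env -> VarExpr
;; replace the free variables of e with their values from env
(define (close e env)
  (match e
    [(? number?) e]
    [(? symbol? x)
     (define binding (assq x env))
     (if binding (cdr binding) x)]
    [(decl x rhs body)
     ;; x is bound in the body but not in the right-hand side
     (decl x (close rhs env) (close body (remove-binding x env)))]
    [(list e1 o e2) (list (close e1 env) o (close e2 env))]))

;; turn a CEK frame into a CK frame by closing its expression with its E
(define (unframe frame)
  (match frame
    [(list 'L E o n) (list 'L o n)]
    [(list 'R E o e1) (list 'R o (close e1 E))]
    [(list 'D E x body) (list 'D x (close body (remove-binding x E)))]))

;; C E K -> C K
(define (cek->ck C E K)
  (values (close C E) (map unframe K)))

(define-values (C1 K1)
  (cek->ck 'x
           '((x . 3))
           (list (list 'R '((y . 4)) '+ '(y * 2)))))

(list C1 K1) ; '(3 ((R + (4 * 2))))
```

Each expression is closed with the environment that covers it: the control string with E, and a frame's expression with the environment stored in that frame.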
At this point you may wonder whether we can get from the CESK machine to something like a CEK machine and back to a C machine, meaning an explanation of assignment statements in terms of a step-by-step calculation. While the answer is “yes,” the introduction of a store (to explain the effects of allocation of objects and mutation of their fields) changes the nature of the machine more than any other transformation to our machines (CC, CK, CEK). The answer was first given in my dissertation, which shows that we can calculate with expressions that contain assignment statements basically as much as we calculate with the kinds of expressions we get to know in middle school. Covering this material in an undergraduate course goes beyond “principles” and I therefore skip it.
The unloading function of a CESK machine could go into an infinite loop if a pair is somehow involved in a cycle. Then again, we know that the graph involved in such a cycle is finite, and starting in Fundamentals I you learn about techniques for recognizing loops in finite graphs. Modern programming languages therefore typically implement fast mechanisms for rendering such graphs.
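One simple technique of this kind is a depth-first traversal that remembers which locations it is currently rendering. The following is a sketch under assumptions, not the unloading code of figure 92: the `loc` struct, the hash-based store, and the `(pair ...)` and `(cycle ...)` output shapes are all hypothetical.

```racket
#lang racket

;; ASSUMPTION: locations are wrapped natural numbers and the store is an
;; immutable hash from locations to values
(struct loc [address] #:transparent)
(define (loc+1 l) (loc (add1 (loc-address l))))

;; Val Store -> S-expression
;; render v; a location becomes (pair left right), and a location that is
;; already being rendered becomes (cycle address) instead of looping
(define (unload v S)
  (let render ([v v] [in-progress (set)])
    (cond
      [(not (loc? v)) v]
      [(set-member? in-progress v) (list 'cycle (loc-address v))]
      [else
       (define seen (set-add in-progress v))
       (list 'pair
             (render (hash-ref S v) seen)
             (render (hash-ref S (loc+1 v)) seen))])))

;; a plain pair, and a pair whose left slot points back to itself,
;; as (p setleft p) would produce
(define S-plain (hash (loc 0) 1 (loc 1) 2))
(define S-loop (hash (loc 0) (loc 0) (loc 1) 2))

(unload (loc 0) S-plain) ; '(pair 1 2)
(unload (loc 0) S-loop)  ; '(pair (cycle 0) 2)
```

The traversal terminates because the set of locations on the current path can only grow along a path and the store contains finitely many locations.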