
22 — C, CC, CK

Friday, 27 March 2020

Presenters (1) Mason Eldridge & Kia Zafar, (2) Mitchell Gamburg & Da-Jin Chu

The single assignable variable in our stepper state machine contains the code of the program, that is, the “machine instructions.” These instructions control every machine transition, which is why the content of this variable is known as the control code.

; print each step of the calculation that reduces expr to a number
(define (driver initial)
  (display "   ") (displayln initial)
  (define *control-code initial)
  (while (not (final? *control-code)) do
    (set! *control-code (transition *control-code))
    (display " = ") (displayln *control-code)))

Figure 81: The C Machine Driver

Here I use the convention of starting the names of registers with *. The best way of looking at this machine, or really its driver, is to think of the assignable variable as a register that points to the place where the next instruction can be found. Because this machine has a single register, containing control code, it is known as the C Machine.

Over the course of this lecture we will tease apart how the machine searches for the next instruction by adding another register and then changing the data representation of what we put into this register. In the next lecture, we will add two registers to deal with variable declarations and assignments to variables.

The CC Machine

The CC, CK, CEK, and CESK machines are from my dissertation, where they played the role of proof tools. Since then they have become ubiquitous elements in the PL literature, which is why I teach them here.

When you look closely at the C machine’s tabular presentation or the sample “execution” of 21 — State Machines, you realize that the machine repeatedly performs the following actions:
  • stop, if the control code is just a number;

  • partition the expression into an instruction proper and the evaluation context;

  • produce the number from a proper instruction;

  • place the number back into the evaluation context;

  • go back to step 1.

The key point to recognize, though, is that we know exactly which part of the evaluation context the machine uses next: the arithmetic expression surrounding the hole.

By separating the arithmetic expression in which the instruction proper can be found from its evaluation context, we can gain insight into a machine’s workings. We don’t actually need new definitions to write down a machine that keeps these two pieces separate and explicates the search.
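The partition step from the list above can be sketched as one recursive function. This is a minimal sketch and not code from the lecture: the name decompose and the symbol hole are hypothetical, and the function assumes the infix list representation of ArithmeticExpr as well as a search that looks at the right operand first unless it is already a number.

```racket
#lang racket

;; An ArithmeticExpr is a Number or (list ArithmeticExpr Op ArithmeticExpr).
;; The symbol 'hole is a hypothetical marker for the hole of a context.

#; {ArithmeticExpr -> (list ArithmeticExpr Context)}
;; split a non-number expression into its next proper instruction and
;; the evaluation context that surrounds this instruction
(define (decompose ae)
  (match ae
    ;; both operands are numbers: this is the instruction proper
    [(list (? number?) _ (? number?)) (list ae 'hole)]
    ;; the right operand is a number: keep searching on the left
    [(list left op (? number? right))
     (match-define (list instr ctx) (decompose left))
     (list instr (list ctx op right))]
    ;; otherwise search on the right
    [(list left op right)
     (match-define (list instr ctx) (decompose right))
     (list instr (list left op ctx))]))

(decompose '((1 + 2) * (3 - (84 / 21))))
;; produces '((84 / 21) ((1 + 2) * (3 - hole)))
```

The function redoes the entire search for every instruction, which is exactly the work that a second register can make explicit.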

Figure 82 displays its complete tabular description. Each state consists of two pieces of data and is therefore represented by two columns. The condition in the fifth column specifies additional constraints on the control code or how to evaluate it. As you read this table, keep in mind that n, n1, and n2 stand for numbers while e1 and e2 stand for arbitrary ArithmeticExpr.

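current Control | current Context | next Control | next Context  | if
----------------+-----------------+--------------+---------------+-------------
(e1 op n2)      | E               | e1           | E [(-) op n2] | !(e1 ∈ N)
(e1 op e2)      | E               | e2           | E [e1 op (-)] | !(e2 ∈ N)
(n1 op n2)      | E               | n            | E             | n = n1 op n2
n               | E [(-) op n1]   | (n op n1)    | E             |
n               | E [e1 op (-)]   | (e1 op n)    | E             |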

Figure 82: A Tabular Description of the CC Machine

Let’s reformulate this table as the usual four-piece description:
  • states a state consists of two pieces of data: an ArithmeticExpr and an evaluation context E

  • transitions the five transitions fall into three categories:
    • with the first two rules, the machine searches for a proper instruction.

      If the control register contains an arithmetic expression whose right side is already a number, it searches on the left; otherwise it searches on the right.

      A search step fills the existing evaluation context with the remainder of the arithmetic expression, that is, one with a hole at the place where the machine searches next.

    • with the third rule, it executes a proper instruction.

      Here we know how to execute a +, -, *, or / instruction on two numbers. When the set of proper instructions becomes more complicated, we may have to add machine instructions.

    • with the last two rules, the machine “returns” the result of a proper instruction to the context.

      It removes the partial arithmetic operation that surrounds the hole. This is also a context and can therefore be filled to obtain another expression, and this expression becomes the new control code.

  • an initial state consists of an expression and an empty context

  • a final state consists of a number and an empty context

While the transitions look more complicated, they explain in much more detail than the C machine how to find an instruction proper, execute it, and resume the search.
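The three categories can be read off the control register alone. The following minimal sketch assumes the infix list representation of ArithmeticExpr; the function name rule-kind and the four result symbols are hypothetical and exist only for illustration.

```racket
#lang racket

#; {ArithmeticExpr -> Symbol}
;; name the category of the transition that applies to a control code;
;; a number in the control register means "return" only if the context
;; is not empty, otherwise the state is final
(define (rule-kind control)
  (match control
    [(? number?)                      'return]
    [(list (? number?) _ (? number?)) 'execute]
    [(list _ _ (? number?))           'search-left]
    [(list _ _ _)                     'search-right]))

(rule-kind '((1 + 2) * (3 - (84 / 21)))) ; 'search-right
(rule-kind '((1 + 2) * -1))              ; 'search-left
(rule-kind '(84 / 21))                   ; 'execute
(rule-kind 4)                            ; 'return
```

The order of the clauses matters: the execute clause must come before the search-left clause, because both accept an expression whose right side is a number.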

Finally, here is a sample execution of the expression we used in 21 — State Machines to illustrate the workings of the C machine:

*control                    | *context              |
----------------------------+-----------------------+-------------
((1 + 2) * (3 - (84 / 21))) | (-)                   |
(3 - (84 / 21))             | ((1 + 2) * (-))       |
(84 / 21)                   | ((1 + 2) * (3 - (-))) |
4                           | ((1 + 2) * (3 - (-))) | return state
(3 - 4)                     | ((1 + 2) * (-))       |
-1                          | ((1 + 2) * (-))       | return state
((1 + 2) * -1)              | (-)                   |
(1 + 2)                     | ((-) * -1)            |
3                           | ((-) * -1)            | return state
(3 * -1)                    | (-)                   |
-3                          | (-)                   | return state
-3                          | (-)                   |

Implementing the CC Machine

Figure 83 displays the implementation of the CC machine. Its driver uses two registers: *control and *context. As before, the implementation represents the set of transitions with a function. The five match* clauses precisely correspond to the five rules in the table.

Racket Note Racket’s match* matches two values against two patterns, so the two “current” columns of the table become two patterns. Each clause returns two values, which go into the two registers next.

The (app split pat) pattern applies split to the context and then matches the result with pat, which is how we can match the tabular rules and the function so closely. End
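To see match* and app in isolation, here is a minimal sketch that is unrelated to the machine itself. The helper halve and the function step are hypothetical names chosen for this illustration only.

```racket
#lang racket

#; {Number -> (list Number Number)}
;; hypothetical helper: quotient and remainder of n divided by 2
(define (halve n) (list (quotient n 2) (remainder n 2)))

#; {Any Number -> Any Number}
(define (step a b)
  (match* (a b)
    ;; the first pattern tests a; the second one applies halve to b
    ;; and then matches the resulting list against (list q r)
    [((? number? x) (app halve (list q r))) (values (+ x q) r)]
    ;; fall through: return both values unchanged
    [(x y) (values x y)]))

;; both results go into two variables at once, like the two registers
(define-values (next-a next-b) (step 10 7))
(list next-a next-b) ; '(13 1)
```

The app pattern lets a clause look at a computed view of a value, here the list that halve returns, without a separate let in front of the match*.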

Since splitting an evaluation context into the arithmetic operation that surrounds the hole and the rest is not a simple step (the data definition is self-referential), the implementation relies on an auxiliary function to accomplish this goal: split, whose recursive inner loop is named return.

Lectures/22/cc.rkt

  #lang racket
   
  (require "../21/while.rkt" "show-state.rkt")
  ;; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
   
  #; {ArithmeticExpr -> Number}
  ;; print each step of the calculation that reduces `expr` to a number
  (define (driver initial)
    (define-values (*control *context) (load initial))
    (show-state *control *context)
    (while (not (final? *control *context)) do
      (set!-values (*control *context) (transition *control *context))
      (show-state *control *context))
    (unload *control *context))
   
  #; {ArithmeticExpr E -> ArithmeticExpr E}
  (define (transition control context)
    (match* ( control context )
      [{ (list (? not-number? ae_1) o (? number? ae_2)) E}
       (values ae_1 (fill context (list HOLE o ae_2)))]
      [{ (list ae_1 o (? not-number? ae_2)) E}
       (values ae_2 (fill context (list ae_1 o HOLE)))]
      [{ (list (? number? l) o (? number? r)) E}
       (define n (reduce control))
       (show-return-state n context)
       (values n context)]
      [{ (? number? n) (app split `((,(? (is? HOLE)) ,o ,ae_2) ,E))}
       (values (list n o ae_2) E)]
      [{ (? number? n) (app split `((,ae_1 ,o ,(? (is? HOLE))) ,E))}
       (values (list ae_1 o n) E)]))
   
  #; {E -> (list E E)}
  (define (split context)
    (define-values (inside E)
      (let return ([context context])
        (match context
          [(list (? (is? HOLE)) o ae_2)
           (values context HOLE)]
          [(list ae_1 o (? (is? HOLE)))
           (values context HOLE)]
          [(list ae_1 o (? number? k))
           (define-values (ae E) (return ae_1))
           (values ae (list E o k))]
          [(list ae_1 o ae_2)
           (define-values (ae E) (return ae_2))
           (values ae (list ae_1 o E))])))
    (list inside E))
   
  #; {E ArithmeticExpr -> ArithmeticExpr}
  (define (fill context inside)
    (match context
      [(? (is? HOLE))   inside]
      [(list E o (? number? n)) (list (fill E inside) o n)]
      [(list ae_1 o E)          (list ae_1 o (fill E inside))]))
   
  #; {ArithmeticExpr -> ArithmeticExpr E}
  (define (load ae) (values ae HOLE))
   
  #; {ArithmeticExpr E -> Number}
  (define (unload n _) n)
   
  #; {ArithmeticExpr E -> Boolean}
  (define (final? control context)
    (and (number? control) (equal? HOLE context)))
   
  #; {(list Number o Number) -> Number}
  ;; okay this is a trick, but almost every language has this trick
  (define ns (make-base-namespace))
  (define (reduce ae)
    ( (eval (second ae) ns) (first ae) (third ae)))
   
  (define not-number? (compose not number?))
   
  (define HOLE '[---])
   
  ;; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
  (module+ test
    (require rackunit)
    (check-equal? (driver '((1 + 2) * (3 - (84 / 21)))) -3))
   

Figure 83: Implementing The CC Machine

One other small detail distinguishes this implementation from the C machine. The machine itself accepts an ArithmeticExpr but runs a two-register configuration. To bridge the gap, we use a load function. When the machine reaches a final state, we unload the registers to obtain the final result for the driver.

Note When you step through the machine’s execution, you notice that it is in a “return” state twice in a row. The first produces the value, the second one actually returns it to the surrounding context. An implementation could merge these two steps—but then the code would no longer look like the tabular description.

The CK Machine

When the CC machine goes into a return state, it must split the evaluation context into the innermost arithmetic operation, which is of the shape ((---) op n) or (ae op (---)), and the remainder of the context. The first context is filled with the number that resulted from the basic instruction step; the second one becomes the next context. The point is that this split operation has to traverse the entire context to find the innermost arithmetic operation, even though this is also the last operation the machine added to the context.

In short, the CC machine treats the evaluation context as a “last-in, first-out” data structure—and that, as you know, is called a stack.

The representation as a context—an expression with a hole—hides the intention and turns the “in” and “out” part into a rather expensive traversal of a good portion of the abstract syntax tree.

The idea then is to use an actual stack instead of an evaluation context. Each element of the stack must contain enough information to reconstruct what would be the innermost arithmetic operation of a context. While we could use small contexts directly, it is even better to use a simpler representation:

A Frame is one of:
  • (L op n) which denotes the context ((--) op n)

  • (R op e) which stands for the context (e op (--))

A stacK (or just K) is a sequence of Frames to which we add and from which we take on the right.
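The correspondence between a stack of Frames and an evaluation context can be made concrete with a small conversion function. This is a sketch under stated assumptions: the names frame->context and stack->context and the symbol hole are hypothetical, and the stack is represented as a plain list whose last element is the most recently added frame.

```racket
#lang racket

;; A Frame is (list 'L op Number) or (list 'R op ArithmeticExpr).
;; A stack is a list of Frames; the LAST element is the newest frame.

#; {Frame Context -> Context}
;; plug a context into the position of the hole that the frame denotes
(define (frame->context frame inner)
  (match frame
    [(list 'L op n) (list inner op n)]
    [(list 'R op e) (list e op inner)]))

#; {[Listof Frame] -> Context}
;; rebuild the evaluation context that a stack of frames stands for,
;; starting with the newest (innermost) frame and working outward
(define (stack->context frames)
  (for/fold ([ctx 'hole]) ([f (reverse frames)])
    (frame->context f ctx)))

(stack->context '((R * (1 + 2)) (R - 3)))
;; produces '((1 + 2) * (3 - hole))
```

The newest frame always ends up next to the hole, which is why removing it requires a traversal in the context representation but only a single step in the stack representation.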

Figure 84 presents the tabular description of the revised machine.

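current Control | current StacK | next Control | next StacK  | if
----------------+---------------+--------------+-------------+------------
(e1 o n2)       | K             | e1           | K, (L o n2) | !(e1 ∈ N)
(e1 o e2)       | K             | e2           | K, (R o e1) | !(e2 ∈ N)
(n1 o n2)       | K             | n            | K           | n = n1 o n2
n               | K, (L o n1)   | (n o n1)     | K           |
n               | K, (R o e1)   | (e1 o n)     | K           |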

Figure 84: A Tabular Description of the CK Machine

With Ks, the transition describes clearly what is happening: the machine adds frames to the right and it takes them from there when it returns.

Tracing this machine is also a bit clearer than tracing the CC machine:

*control                    | *stack                         |
----------------------------+--------------------------------+-----
((1 + 2) * (3 - (84 / 21))) |                                |
(3 - (84 / 21))             | [right * (1 + 2)]              |
(84 / 21)                   | [right * (1 + 2)], [right - 3] |
4                           | [right * (1 + 2)], [right - 3] | pop
(3 - 4)                     | [right * (1 + 2)]              |
-1                          | [right * (1 + 2)]              | pop
((1 + 2) * -1)              |                                |
(1 + 2)                     | [left * -1]                    |
3                           | [left * -1]                    | pop
(3 * -1)                    |                                |
-3                          |                                | pop
-3                          |                                |

Implementing the CK Machine

The implementation of the CK machine relies on the existence of a stack data structure. Sadly, most existing programming languages commingle the idea of creating and using a stack with mutating a data structure directly. When we use them for a state machine, however, we wish to separate the register assignment from the operations on stacks as data. In short, we follow Josh Bloch’s advice for Java programmers and “favor immutability.” This permits thorough testing of the stack data structure independently of changing a variable that contains one.

; Frame = [list 'right op ArithmeticExpr]
;       || [list 'left op Number]
; 
; type Stack[Frame]
(provide
 ; Stack
 mt
 ; Stack[X] X -> Stack[X]
 push
 ; Stack[X] -> (list X Stack[X])
 pop)

Figure 85: A Functional Stack

Figure 85 displays everything the machine needs from an implementation of a K of Frames. The signatures are in Fundamentals I style:
  • mt is the empty stack constant

  • push adds an X to a K

  • pop simultaneously returns the last X added and the remainder of the stack.

Each of these functions is about one line in Racket and one 3-line class in Java a la Fundamentals.
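The three exports of Figure 85 could be realized as follows. This is one plausible sketch and not necessarily the content of stack.rkt: it assumes that a stack is a plain Racket list whose first element is the top.

```racket
#lang racket

(provide mt push pop)

;; ASSUMPTION: a Stack[X] is a list whose first element is the top.

;; Stack
(define mt '())

#; {Stack[X] X -> Stack[X]}
;; add x on top without changing the given stack
(define (push stack x) (cons x stack))

#; {Stack[X] -> (list X Stack[X])}
;; return the last X added together with the remainder of the stack;
;; the given stack must not be empty
(define (pop stack) (list (first stack) (rest stack)))

(pop (push (push mt 'a) 'b)) ; '(b (a))
```

Because push and pop build new lists instead of mutating one, a test can compare their results with equal?, and the machine can compare its stack register with mt to detect the empty stack.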

With this kind of stack, the CC machine can easily be turned into a CK machine, and the result can be seen in figure 86.

Lectures/22/ck.rkt

  #lang racket
   
  (require "../21/while.rkt" "stack.rkt" "show-stack-state.rkt")
   
  ;; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
   
  (define LEFT "left")
  (define RGHT "right")
   
  #; {ArithmeticExpr -> Number}
  ;; print each step of the calculation that reduces `expr` to a number
  (define (driver initial)
    (define-values (*control *stack) (load initial))
    (show-state *control *stack)
    (while (not (final? *control *stack)) do
      (set!-values (*control *stack) (transition *control *stack))
      (show-state *control *stack))
    (unload *control *stack))
   
  #; {ArithmeticExpr Stack -> ArithmeticExpr Stack}
  (define (transition control stack)
    (match* { control stack }
      [{ (list (? not-number? ae_1) o (? number? ae_2)) stack }
       (values ae_1 (push stack (list LEFT o ae_2)))]
      [{ (list ae_1 o (? not-number? ae_2)) stack }
       (values ae_2 (push stack (list RGHT o ae_1)))]
      [{ (list (? number? l) o (? number? r)) stack }
       (define n (reduce control))
       (show-return-state n stack)
       (values n stack)]
      [{ (? number? n) (app pop `((,(? (is? LEFT)) ,o ,ae_2) ,S)) }
       (values (list n o ae_2) S)]
      [{ (? number? n) (app pop `((,(? (is? RGHT)) ,o ,ae_1) ,S)) }
       (values (list ae_1 o n) S)]))
   
  #; {ArithmeticExpr -> ArithmeticExpr Stack}
  (define (load ae) (values ae mt))
   
  #; {ArithmeticExpr Stack -> Number}
  (define (unload n _) n)
   
  #; {ArithmeticExpr Stack -> Boolean}
  (define (final? control stack)
    (and (number? control) (equal? mt stack)))
   
  #; {(list Number o Number) -> Number}
  ;; okay this is a trick, but almost every language has this trick
  (define ns (make-base-namespace))
  (define (reduce ae)
    ( (eval (second ae) ns) (first ae) (third ae)))
   
  (define not-number? (compose not number?))
   
  ;; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
  (module+ test
    (require rackunit)
    (check-equal? (driver '((1 + 2) * (3 - (84 / 21)))) -3))
   

Figure 86: Implementing the CK Machine

The transition function still distinguishes five cases with a match* and returns the two new values: a Control and a K. The key difference concerns the “return state.” Instead of the recursive traversal function of the CC machine, the CK machine uses just pop to extract the top frame, from which it constructs the next control code. As figure 86 shows, the CK implementation is shorter than the CC implementation because it needs neither split nor fill.