
8 — Assignments

Friday, 31 January 2020

Presenters (1) Elizazijin Huang, Iman Moreira (2) E Ogra, S Wisniowiecki

When you first encounter statements such as

    x = x + 1

your stomach should turn over. After all, by this time your math teacher has taught you at least this much:

    x       = x + 1

    // subtracting x from both sides preserves equality:
    (x) - x = (x + 1) - x

    // simplifying both sides yields
    x - x   = x + 1 - x
    0       = 1

and you know that 0 != 1. So what gives?

Assignment statements are not equations; the (usual) syntax is deceiving, misleading, and just stupid. Okay, we got this out of our system. Now we swallow and accept it.

Once you’re over this first shock and you wrap your head around this x = x + 1 idea, you may feel joy because you just realized you got new, magical powers. We shall return to this idea of new powers in a few weeks. You may think "oh, that’s easy, variables can now be varied." Indeed, because you also get sequences of steps at the same time, say

    x = x + 1;
    y = y + 1;
    printf("new location: [%d,%d]\n", x, y);

you think "oh great, I can finally break down these big mathematical expressions into small steps and understand every one of them."

Far from it, though. The addition of assignment statements to algebraic expressions may have been simple in 1960, when life was simple and FORTRAN IV, in which functions aren’t values, ruled the world. Sadly, it is now 2020; JavaScript, Python, Java, and other big monsters roam the planet, and they turn assignment statements into complex beasts.

Don’t Judge the Book by its Cover (Terminology)

This happened just recently on Piazza.

People often call something like int x = x + 1; or in our notation

  ["let", "x", "=", ["x", "+", 1]]

an assignment statement. These phrases are not assignments; they are variable declarations with initialization.

Not everything that looks like "x = y" is an assignment statement. Some language designers have preserved their sanity (they do not use "x = y" at all), and some language designers have gone maddeningly sick in this context.

Definition Some languages permit variable declarations with initialization. For example, in Java you may write

  Point p = q;

and the same idea looks deceptively similar in C++ (which is why CE students think they know how to program in Java after taking a course on C++):

  Point p = q;

but it means something totally different. In Rust, you may write

  let x : i32 = 0;

and in OCaml this looks quite similar again

  let x : int = 0;

and also means something different from the same thing in Rust.

Definition So-called imperative languages come with assignment statements, which have a variety of shapes:

  x = x + 1;       // C-style languages
  x := x + 1;      // Algol/Pascal-style languages
  x <- x + 1;      // R
  x := !x + 1;     // OCaml (x is a ref)
  (setq x (+ x 1)) // Lisp-style languages

and there are probably some I forgot.

Here is an example of how different these two ideas are. If you separate the initialization of a Point in C++, as in,

  Point p = q; // assume q is of type Point and in scope

from its declaration with the use of an immediate assignment, as in,

  Point p;

  p = q;     // assume q is of type Point and in scope

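you get different behaviors from seemingly identical programs. The first invokes the copy constructor of Point, so p starts out as a separate copy of q; mutating p will (usually) not mutate q. The second default-constructs p and then invokes the copy-assignment operator of Point (operator=). Since a class may define these two operations independently, the two programs may behave differently. In Java, by contrast, both forms make p and q refer to the very same object (and bits) in the computer’s memory.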
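Racket, the language of these notes, behaves like Java in this respect: a variable holds a reference to an object. The following sketch, built around a hypothetical mutable point struct, contrasts the two behaviors at stake: sharing one object versus copying its fields into a fresh object (roughly what a default C++ copy constructor does).

```racket
#lang racket

;; A hypothetical mutable point, for illustration only.
(struct point [x y] #:mutable #:transparent)

(define q (point 1 2))

;; Sharing: p-alias and q refer to the very same object.
(define p-alias q)

;; Copying: a fresh object whose fields start out equal to q's.
(define p-copy (point (point-x q) (point-y q)))

(set-point-x! p-alias 10) ; visible through q as well
(set-point-x! p-copy 20)  ; q is unaffected

(point-x q)      ; 10
(point-x p-copy) ; 20
```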

Short Don’t be deceived by the surface syntax of programming languages.

New Syntax

As always, we first extend the language of abstract syntax trees with appropriate new structs for representing the new syntactic constructs.

Lectures/8/ass-as-data.rkt

  #lang racket
   
  ;; internal representation of a language with
  ;; -- arithmetic
  ;; -- variables
  ;; -- functions
  ;; -- assignment statements
   
  (struct node [op left right] #:transparent)
  (struct decl [variable value scope] #:transparent)
  (struct fun  [parameter body] #:transparent)
  (struct call [fname argument] #:transparent)
  (struct if-0 [test then else] #:transparent)
  (struct set  [lhs rhs] #:transparent)
  (struct sequ [fst rst] #:transparent)
   
  #; {AssExpr  = Int || (node O AssExpr AssExpr) ||
               (decl Var AssExpr AssExpr) ||
               Var ||
               (fun Var AssExpr) ||
               (call AssExpr AssExpr) ||
               (if-0 AssExpr AssExpr AssExpr) ||
               (set Var AssExpr) ||
               (sequ AssExpr AssExpr)}
  #; {O = + || *}
  #; {Var = String}
         
  #; {Any -> Boolean}
  ;; is x an AssExpr? (a shallow check: inspects only the outermost shape)
  (define (ass-expr x)
    (or (integer? x) (node? x) (decl? x) (string? x) (if-0? x)
        (fun? x) (call? x)
        ;; - - - - - - - - - - - - - - - - - - - - - - - - -
        (set? x) (sequ? x)))
   
  ;; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
  ;; SCOPE is changed:
  #; (decl var rhs-expr body-expr)
  ;; var is visible in _both_ rhs-expr and _body-expr_
   
  ;; neither set nor sequ affects the scope of variables 
   
  ;; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
  (provide
   ass-expr
   (struct-out node)
   (struct-out decl)
   (struct-out fun)
   (struct-out call)
   (struct-out if-0)
   (struct-out set)
   (struct-out sequ))
   

Figure 31: Assignments in a Language with First-Class Functions

The language of AssExpr expressions introduces two new constructs: assignment statements and sequences of statements. For simplicity, we do not distinguish between statements and expressions; we stick to expressions. That is, we think of assignments and sequences as expressions.

Scope A new syntactic construct in a programming language must always raise the question of scope, especially if it involves variables. Assignments do not create a new scope. But the variable (called the left-hand side here) should be in the scope of an appropriate declaration.

Informal Meaning Just to make sure we are on the same page, here are informal descriptions of the new constructs:
  • (set x (node + x 1)) looks up the current value of x, say vo; evaluates the right-hand side to a value vn; associates x with vn until the next assignment to x takes place or the program ends; and returns vo.

    Note An ordinary assignment statement skips the last step.

  • (sequ e1 e2) evaluates the expression e1 and throws away its result. Then it evaluates e2 and returns its value as the value of the entire expression.
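Racket’s own set! returns #<void> rather than the old value, so the following sketch mimics the informal meaning of set with a hypothetical macro, set/old!, while Racket’s begin plays the role of sequ. It models the intended behavior in plain Racket and is not part of the lecture’s interpreter.

```racket
#lang racket

;; Hypothetical macro: like the AssExpr `set`, it looks up x,
;; evaluates rhs, stores the new value in x, and returns the old value.
(define-syntax-rule (set/old! x rhs)
  (let ([old x])   ; look up the current value
    (set! x rhs)   ; evaluate rhs and update x
    old))          ; return the old value

(define x 5)

;; (set x (node + x 1))  ~  (set/old! x (+ x 1))
(define old-value (set/old! x (+ x 1)))            ; 5; x is now 6

;; (sequ (set x (node + x 1)) x)  ~  (begin (set/old! x (+ x 1)) x)
(define sequ-value (begin (set/old! x (+ x 1)) x)) ; 7; the old value 6 is discarded
```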

A True Blue Example

Let’s use a single example to understand the power of assignments.

Lectures/8/examples.rkt

  #lang racket
   
  (require "ass-as-data.rkt")
   
  ;; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
  ;; an example that covers it all and is almost realistic
   
  (define example1
    (decl
     "count!" ;; calls f on x and adds how many times f has been called
     (fun "f"
          (decl "count" 0
                (fun "x"
                     (sequ (set "count" (node + "count" 1))
                           (node + "count" (call "f" "x"))))))
     ;; - - - 
     (decl "square" (fun "x" (node * "x" "x"))
           (decl "double" (fun "x" (node + "x" "x"))
                 ;; - - - 
                 (decl "f" (call "count!" "square")
                       (decl "g" (call "count!" "double")
                             ;; - - - 
                             (node *
                                   (node + (call "f" 2) (call "f" 3))
                                   (node + (call "g" 2) (call "g" 3))
    )))))))
   
  ;; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
  (provide example1)
   

Figure 32: A Full-Powered Example in the Full Language

Since our abstract syntax is a bit obscure, it’s best to look at this example in the syntax of a real language and in two styles. Figure 33 shows what this example looks like in a "procedural" style.

Lectures/8/examples-fun.rkt

  #lang racket
   
  ;; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
  ;; in a functional language with assignment statements (ASL)
  (define (example1-asl)
    (define (count! f)
      (define count 0)
      (lambda (x)
        (set! count (+ count 1))
        (+ count (f x))))
    (define (square x)
      (* x x))
    (define f (count! square))
    (define (double x)
      (+ x x))
    (define g (count! double))
    (* (+ (f 2) (f 3))
       (+ (g 2) (g 3))))
   
  ;; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
  (provide example1-asl)
   

Figure 33: A Full-Powered Example, Written in Procedural Racket

Figure 34 displays an object-oriented version, which, without the parentheses, could be a C++ program or, with some judicious use of lambda, could be written in Java.

Lectures/8/examples-ood.rkt

  #lang racket
   
  (require "examples-fun.rkt")
   
  ;; in an object-oriented language with function defs 
  (define (example1-object)
    (define count%
      (class object%
        (init-field f)
        (super-new)
        (field [count 0])
        (define/public (apply x)
          (set! count (+ count 1))
          (+ count (f x)))))
    (define (square x)
      (* x x))
    (define f (new count% [f square]))
    (define (double x)
      (+ x x))
    (define g (new count% [f double]))
    (* (+ (send f apply 2) (send f apply 3))
       (+ (send g apply 2) (send g apply 3))))
   
  ;; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
  (provide example1-asl example1-object)
   
  (module+ test
    (require rackunit)
    (check-equal? (example1-object) (example1-asl)))
   

Figure 34: A Full-Powered Example, Written in Object-Oriented Racket

Here is a list of questions about this example that you must now be able to answer:
  • how many instances of count exist during the evaluation of the program?

  • how many times is each of them assigned a new value?

  • does the order of the function calls matter?

The example demonstrates how scope is distinct from the concept of a mutable variable. Evaluating the same scope twice while retaining access to its local variable yields two separate places where values change. But accessing the function that results from one such evaluation twice mutates the same variable twice, a phenomenon called aliasing. Aliasing is the source of many bugs in plain sequential programs, and when a language supports parallelism, it gets much worse.
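A small sketch makes the distinction concrete. It reuses count! and square from figure 33; the names f1, f2, and h are hypothetical. Two evaluations of count! create two separate counters, while a second name for the same function shares its counter.

```racket
#lang racket

;; count! as in figure 33: each call allocates a fresh, private count.
(define (count! f)
  (define count 0)
  (lambda (x)
    (set! count (+ count 1))
    (+ count (f x))))

(define (square x) (* x x))

(define f1 (count! square)) ; one evaluation of the scope: one count
(define f2 (count! square)) ; another evaluation: a separate count
(define h f1)               ; no evaluation at all: h shares f1's count

(define r1 (f1 2)) ; f1's count becomes 1: 1 + 4 = 5
(define r2 (h 2))  ; the same count becomes 2: 2 + 4 = 6
(define r3 (f2 2)) ; f2's own count becomes 1: 1 + 4 = 5
```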

The Idea: Box the Values in the Environments

If the assignment statement (set "count" (node + "count" 1)) is to affect the next use of "count" via the addition expression

(node + (call "f" 2) (call "f" 3))

the interpreter must be able to change the association of a variable with a value in the environment.

One way to implement mutability would be to make the environment mutable. Doing so, however, would conflate two concerns:
  • the environment represents the scope of a variable declaration, which never changes

  • assignments may modify the value with which a variable is associated.

Since these two concerns are separate ideas, we reject this implementation.

A second way to implement mutability is to place a mutable object into the environment. Then the environment associates the scope of a variable with one and the same object for the entire execution, including variables introduced dynamically, such as function parameters. The interpreter realizes mutability by changing what is inside this one object per variable.
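Here is a minimal sketch of this second approach. It assumes an environment represented as an immutable association list from names to Racket boxes; the lecture’s own environment.rkt may differ, and env-add and env-lookup are hypothetical names. An assignment changes the content of a box, while the environment itself, which records scope, never changes.

```racket
#lang racket

;; Assumed representation: an immutable association list that maps
;; each variable name to a Racket box.
(define (env-add x v env) (cons (cons x (box v)) env))

(define (env-lookup x env)
  (define binding (assoc x env))
  (if binding
      (cdr binding)
      (error 'env-lookup "undeclared variable ~e" x)))

(define env0 (env-add "count" 0 '()))

;; An assignment changes what is inside the box;
;; env0 itself is never modified.
(define the-box (env-lookup "count" env0))
(set-box! the-box (+ (unbox the-box) 1))

(unbox (env-lookup "count" env0)) ; 1
```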

While Racket already supplies such mutable boxes, figure 35 shows how to realize them in an object-oriented language.

Lectures/8/left-hand-side-value.rkt

  #lang racket
   
  (provide
   boxed-value%)
   
  (define boxed-value%
    (class object%
      (init-field content)
      (super-new)
   
      #; {-> Value}
      (define/public (get)
        content)
   
      #; {Value -> Value}
      (define/public (get-then-set nu)
        (begin0
          content
          (set! content nu)))))
   

Figure 35: Boxing Values
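For comparison, the get-then-set protocol of figure 35 can be expressed with Racket’s built-in boxes (box, unbox, set-box!). The function name box-get-then-set! is hypothetical; this sketch is not part of the lecture code.

```racket
#lang racket

;; get-then-set on a built-in box: return the old content,
;; then install the new content.
(define (box-get-then-set! b nu)
  (begin0
    (unbox b)          ; the result: the old content
    (set-box! b nu)))  ; the effect: store the new content

(define b (box 1))
(define old (box-get-then-set! b 2))
old       ; 1
(unbox b) ; 2
```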

No matter which implementation we choose, the extensional consequences remain the same. Choosing the second one is merely cleaner with respect to the goal of understanding what the idea of mutability is really all about.

Revising the Interpreter: Attempt 1

So the idea is to associate an instance of boxed-value% with a variable in the environment, which then makes all of our variables assignable.

Concretely, this idea implies that we need to inspect all places where the environment is extended and of course the place where the interpreter retrieves a variable’s value from the environment.

Our existing interpreter extends the environment in two places. The first and simpler one happens when a function is called, i.e., in fun-apply:

  [(function-value fpara fbody env)
   (interpret fbody (add fpara argument-value env))]

Wrapping the argument value in a box is straightforward:

  [(function-value fpara fbody env)
   (define boxed-value (new boxed-value% [content argument-value]))
   (interpret fbody (add fpara boxed-value env))]

This change obviously implies that a variable lookup retrieves a boxed value, which is not the expected result. The fix is to retrieve the value from the box:

  [(? string?)
   (unless (defined? ae env)
     (error 'value-of "undeclared variable ~e" ae))
   (define boxed-value (lookup ae env))
   (send boxed-value get)]

This second change also suggests how the interpreter realizes an imperative assignment:

  [(set lhs rhs)
   (define box (lookup lhs env))
   (define val (interpret rhs env))
   (send box get-then-set val)]

In conventional OO style, this code may look like this:

  box = env.lookup(lhs)
  val = rhs.interpret(env)
  old = box.get_then_set(val)
  return old

And this leaves us with one last problem, namely, how to fix the decl interpretation. Applying the same straightforward transformation to the interpreter yields this revision:

  [(decl x a1 a2)
   (define boxer (λ (env) (new boxed-value% [content (interpret a1 env)])))
   (define new-env (add-rec x boxer env))
   (interpret a2 new-env)]

Note 1 A careful programmer will not revise the interpreter in this order. The proper ordering is to first revise the existing interpreter without adding cases for set and sequ and then run the existing test suite. The good news is that this test suite will pass.

Note 2 After adding the cases for the new language constructs (assignment and sequence), the same programmer adds tests that use these constructs. This programmer might start with one that is a bit simpler than the above but has similar characteristics:

  (decl "f" (decl "z" 1
                  (fun "y"
                       (sequ (set "z" (node + "z" 1))
                             (node + "y" "z"))))
        (node + (call "f" 1) (call "f" 1)))

Here the function "f" increments a local counter and adds this counter to the given argument. A by-hand calculation of an experienced Python or JavaScript or Java programmer tells us that the expected result is 7: the first call bumps "z" to 2 and returns 1 + 2 = 3, the second bumps it to 3 and returns 1 + 3 = 4, and 3 + 4 = 7.

Problem is, the actual result is 4.

The interpreter is broken.

Why Attempt 1 Fails

The problem is the one we discussed at the end of 6 — Recursive Functions. Whenever the interpreter looks up a declared variable, the environment calls the stored function with the extended environment, which interprets the initialization expression all over again.

Stop! Why does this work in the context of 6 — Recursive Functions?

It doesn’t work once we have an internally observable computational effect such as assignments to variables because effects—such as mutations of variables or allocations—are repeated.

Here every lookup of a declared variable puts the re-computed value into a newly allocated boxed-value% object. Hence the "counter" "z" is re-initialized for every call to "f" (indeed, for every reference to "z"), which explains the wrong result.

This behavior is not what we know from Python or JavaScript or Java objects. In these languages, the right-hand side value is retained between function calls and every call increments this counter and does not start over from the initial value.
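The difference can be replayed in plain Racket using the test from Note 2. In this sketch (all names hypothetical), z-attempt-1 models Attempt 1: every reference to z re-runs the initializer and allocates a fresh box. z-once models what a Python, JavaScript, or Java programmer expects: one box, allocated once.

```racket
#lang racket

;; Attempt 1 in miniature: each reference to z re-evaluates the
;; initialization expression 1 and allocates a fresh box.
(define (z-attempt-1) (box 1))

(define (f-attempt-1 y)
  ;; the increment lands in a box that no later reference sees
  (set-box! (z-attempt-1) (+ (unbox (z-attempt-1)) 1))
  (+ y (unbox (z-attempt-1)))) ; yet another fresh box, holding 1

;; The expected behavior: evaluate the initializer once, keep one box.
(define z-once (box 1))

(define (f-expected y)
  (set-box! z-once (+ (unbox z-once) 1))
  (+ y (unbox z-once)))

(define actual   (+ (f-attempt-1 1) (f-attempt-1 1))) ; 2 + 2 = 4
(define expected (+ (f-expected 1) (f-expected 1)))   ; 3 + 4 = 7
```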

Revising the Interpreter: Attempt 2

The analysis of the first interpreter delivers the key insight for a fix. Our interpreter must
  • evaluate the initialization expression exactly once

  • allocate a new box for this value once and for all

If it does so, it will get the current value of the variable for every reference to the declared variable.

Here are the essential lines then:

  [(decl x a1 a2)
   (define the-lhs (new boxed-value% [content #f]))
   (define env++   (add x the-lhs env))
   (define lhs-val (interpret a1 env++))
   (send the-lhs get-then-set lhs-val)
   (interpret a2 env++)]

The first line allocates a new boxed-value% and the second line sticks it into the environment. Now we have a meaning for the declared variable reference, though not a real value. So the interpreter computes the initial value next and uses an assignment to stick it into the-lhs. At this point, the interpreter is ready to evaluate the body of the given declaration. Figure 36 shows the complete revision.

Lectures/8/interpreter-b.rkt

  #lang racket
   
  ;; an interpreter that sticks mutable objects
  ;; into the environment, interpreting assignment
  ;; statements via meta-assignment statements
   
  ;; INTERPRET RHS OF DECL ONLY ONCE 
   
  (require "../6/environment.rkt")
  (require "ass-as-data.rkt")
  (require "examples.rkt" "examples-fun.rkt" "examples-ood.rkt")
  (require "left-hand-side-value.rkt")
  (require "../4/possible-values.rkt")
  (require SwDev/Debugging/spy)
   
  #; {Value = Number || (function-value parameter AssExpr Env)}
   
  (define UNDECLARED "undeclared variable ~e")
   
  #; {AssExpr -> Value}
  ;; determine the value of ae0 via an environment-based interpretation
  (define (interpret ae0)
   
    #; {AssExpr Env -> Value}
    ;; ACCUMULATOR env tracks all declarations between ae and ae0
    (define (interpret ae env)
      (match ae
        [(? integer?)
         ae]
        [(node o a1 a2)
         (define right (number> (interpret a2 env)))
         (define left  (number> (interpret a1 env)))
         (o left right)]
        [(decl x a1 a2)
         (define the-lhs (new boxed-value% [content #f]))
         (define env++   (add x the-lhs env))
         (define lhs-val (interpret a1 env++))
         (send the-lhs get-then-set lhs-val)
         (interpret a2 env++)]
        [(? string?)
         (if (defined? ae env)
             (send (lookup ae env) get)
             (error 'vo UNDECLARED ae))]
        [(call ae1 ae2)
         (define right (interpret ae2 env))
         (define left  (function> (interpret ae1 env)))
         (fun-apply left right)]
        [(fun para body)
         (function-value para body env)]
        [(if-0 t thn els)
         (define test-value (interpret t env))
         (if (and (number? test-value) (= test-value 0))
             (interpret thn env)
             (interpret els env))]
        [(set lhs rhs)
         ;; env.lookup(lhs).get_then_set(rhs.interpret(env))
         (send (lookup lhs env) get-then-set (interpret rhs env))]
        [(sequ fst rst)
         (interpret fst env)
         (interpret rst env)]))
   
    #; {Value Value -> Value}
    (define (fun-apply function-representation argument-value)
      (match function-representation
        [(function-value fpara fbody env)
         (define new-box (new boxed-value% [content argument-value]))
         (interpret fbody (add fpara new-box env))]))
   
    (interpret ae0 empty))
   
  #; {Any -> Number}
  (define (number> x)
    (if (number? x)
        x
        (error 'interpreter "number expected, given ~e " x)))
   
  #; {Any -> Function}
  (define (function> x)
    (if (function-value? x)
        x
        (error 'interpreter "closure expected, given ~e " x)))
   
  ;; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
  (module+ test
    (require rackunit)
   
    (check-equal? (interpret (if-0 0 1 2)) 1)
    (check-equal? (interpret (if-0 1 0 2)) 2)
    (check-equal? (interpret example1) (example1-asl))
    (check-equal? (interpret example1) (example1-object)))
   

Figure 36: The Interpreter
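To see the decl clause of figure 36 in isolation, here is a sketch under two assumptions: the environment is an immutable hash from names to Racket boxes (standing in for boxed-value%), and "expressions" are Racket functions of the environment. The names interpret-decl and lookup are hypothetical. The counter confirms that the initialization expression runs exactly once, no matter how often the body refers to the variable.

```racket
#lang racket

;; Look up a variable's current value (assumed representation).
(define (lookup x env) (unbox (hash-ref env x)))

;; decl in the style of Attempt 2
(define (interpret-decl x rhs body env)
  (define the-lhs (box #f))               ; allocate the box first
  (define env++ (hash-set env x the-lhs)) ; x is visible in rhs and body
  (set-box! the-lhs (rhs env++))          ; evaluate the initializer once
  (body env++))

;; Count how often the initializer runs.
(define rhs-evaluations 0)

(define result
  (interpret-decl
   "x"
   (lambda (env) (set! rhs-evaluations (add1 rhs-evaluations)) 42)
   (lambda (env) (+ (lookup "x" env) (lookup "x" env) (lookup "x" env)))
   (hash)))

result          ; 126
rhs-evaluations ; 1
```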

Thinking about the Interpretation of AssExpr

Stop! This second interpreter solves the problem explained above. How again?

Stop again! The interpreter also solves the problems with the interpretation of recursive functions. Why? How?

But this second interpreter introduces a problem of its own that clearly contradicts everybody’s inner mathematical sense. Can you see it?