On this page:
2.1 Definition
2.2 Lists are special structures
2.3 S-expressions and Other Nestings
2.4 Quote and Quine

2 Some More Design, Some More Racket

Let us work through one more exercises to see how the design recipe works.

#lang racket

(require rackunit)

#|
   PROBLEM STATEMENT

   A Russian doll is either a solid wooden figure or a
   Russian doll encased in a hollow wooden dress. Each
   dress has a distinct color.

   Design a program that consumes the representation of a
   Russian doll and counts how many layer wrap the core
   figure.

|#

;; DATA DEFINITION

(struct dress (content) #:transparent)
;; A Russian-doll (RD) is one of:
;; "wooden figure"
;; (dress Russian-doll)


;; RD -> Number
;; how many dresses are around the "wooden figure" in rd
;; given (dress (dress (dress (dress "wooden figure"))))
;; expect 4
(define (count-dresses rd)
  (cond
    [(string? rd) 0]
    [else ; (dress? rd)
     (+
      (count-dresses (dress-content rd))
      1)]))

(check-equal? (count-dresses (dress (dress (dress (dress "wooden figure"))))) 4)
(check-equal? (count-dresses "woodern figure") 0)

Figure 5: The dresses of a Russian doll

Figure 5 isn’t about anything realistic. It is a finger exercise in a minimalistic representation of a naturally occurring nested piece of "information." The key is, however, that by carefully following the design recipe, the definition of the function itself is a matter of a few keystrokes. The question is whether/how to come up with these keystrokes.

2.1 Definition

The step from the template to the full definition is all about combing the values that the template expressions compute. Here are some basic ideas on how to go about that.

What are the answers for the non-recursive cond clauses?

The examples should tell you which values you need here. If not, formulate appropriate examples and tests.

What do the selector expressions in the recursive clauses compute?

The data definitions tell you what kind of data these expressions extract and the interpretations of the data definitions tell you what this data represents.

What do the natural recursions compute?

Use the purpose statement of the function to determine what the value of the recursion means not how it computes this answer. If the purpose statement doesn't tell you the answer, improve the purpose statement.

How can you combine these values into the desired answer?

If you are stuck here, arrange the examples from the third step in a table. Place the given input in the first column and the desired output in the last column. In the intermediate columns enter the values of the selector expressions and the natural recursion(s). Add examples until you see a pattern emerge that suggests a ``combinator'' function.

Figure 6: How to turn the template into a function definition

In addition, keep in mind that you should design auxiliary functions when a function gets large. Conversely, keep functions small and the "combination" functions are natural candidates for separate helper functions. HtDP/2e has some additional guidelines on this topic.

Finally, on some occasions the “combinator” cannot be just a function. Because the arguments to functions are evaluated first – Racket is a call-by-value language – you may have to create nested conditional expressions on occasion.

Work through the example in figure 7 using the question-answer games for creating templates and function definitions.

#lang racket

(require rackunit)

#|
   PROBLEM STATEMENT

   An arithmetic expression is either a number, a string
   (representing a variable), an addition of two
   arithmetic expressions, or a subtraction of two
   arithmetic expressions. Design a function that
   replaces some given string with the number 5 in an
   arithmetic expression.

|#

;; DATA DEFINITION

(struct plus (left right) #:transparent)
(struct subt (left right) #:transparent)
;; An AE is one of:
;; number
;; string
;; (plus AE AE)
;; (subt AE AE)


;; String AE -> AE
;; eeplace all occurrences of string s with 5 in ae
(define (replace s ae)
  (cond
    [(number? ae) ae]
    [(string? ae) (if (string=? s ae) 5 ae)]
    [(plus? ae)
     (plus (replace s (plus-left ae))
           (replace s (plus-right ae)))]
    [else
     (subt (replace s (subt-left ae))
           (replace s (subt-right ae)))]))


(check-equal? (replace "hari" 0) 0)
(check-equal? (replace "hari" "hari") 5)
(check-equal? (replace "hari" (plus 42 "hari")) (plus 42 5))
(check-equal? (replace "hari" (subt 42 "hari")) (subt 42 5))

;; the second cond line needs another test case:
(check-equal? (replace "matthias" (plus 42 "hari")) (plus 42 "hari"))

Figure 7: Arithmetic expressions and their value

The example illustrates a first important point about programming languages, too. Instead of working through syntax-as-strings, a programming language researcher thinks of the syntax of a language as trees and of functions/programs that process a program as tree-walking functions. This form of syntax is called abstract syntax tree (AST).

2.2 Lists are special structures

Racket provides one prominent form of data, lists. Lists aren’t really special, however. And it isn’t lists per se that Racket provides but cons structures, which combine two arbitrary values. Thus

(cons 12 23)

creates an instance of cons as does

(cons "hello" (cons 42 "world"))

The key to lists is the nesting of cons instances inside of each other. In particular, lists nest cons structures in the right field and the chain of right fields is the spine of the list.

The end of the list is a special token: empty also written as '().

Naturally, cons structures come with a predicate, cons?, as does the special token empty, namely empty?. The only unusual aspect of lists is that the cons selector names don’t start with cons-. Instead they are called first and restas long as they are dealing with "real" lists.

Traditionally the selectors for cons structures are known as car and cdr, and yes, these selectors are available in Racket. We will deal with proper lists whenever possible and ignore these two selectors therefore.

Here is the data definition for "real" lists:
; A List is one of:
;  empty
;  (cons AnyValue List)
This definition immediately rules out the values above: yes they are instances of cons, but no they are not elements of List.

The definition also shows how to create special subsets, say lists of strings:
; A LOS  (list of strings) is one of:
;  empty
;  (cons String LOS)
Another example is a list of list of numbers:
; A LON  (list of numbers) is one of:
;  empty
;  (cons Number LON)
 
; A LOLON  (list of list of numbers) is one of:
;  empty
;  (cons LON LOLON)

If you are wondering why the collection of lists contains the empty list, it doesn’t have to. Here is a definition that introduces a set of non-empty lists of strings:
; A NELOS is one of:
;  (cons String empty)
;  (cons String NELOS)

Using non-empty lists may strike you as pragmatic. A function that counts the number of strings on a list of strings is a good example:
(define (count los)
  (cond
    [(empty? los) 0]
    [else (+ (count (rest los)) 1)]))
(define (count nl)
  (cond
    [(empty? (rest nl)) 1]
    [else (+ (count (rest nl)) 1)]))
The difference is really only visible in the first condition. The version for general lists—on the left-hand side—uses (empty? los) to check whether the list is empty. In contrast, the version for non-empty lists—on the right-hand side—needs to extract the rest and then check whether it is empty. Generally, however, it is much easier to deal with the general set of lists because it greatly simplifies the design of functions.

Consider a function that looks for the presence of "xxx" in a list of strings:
(define (xxx? los)
  (cond
    [(empty? los) false]
    [else
      (or (string=? (first los) "xxx")
          (xxx? (rest los)))]))
(define (xxx? nl)
  (cond
    [(empty? (rest nl))
     (string=? (first los) "xxx")]
    [else
      (or (string=? (first los) "xxx")
          (xxx? (rest los)))]))
This comparison demonstrates how a function must check even the last item on a list, because it could just be the one that is the key to the computation.

You will find that this kind of code repetition between the two cond clauses is common, and the version for plain lists almost always looks simpler than the one for non-empty lists.

Exercise: Racket is untyped so cons doesn’t have to be used with the linear list format, and Racketeers use cons in all kinds of ways, because it is convenient. See below.

Here is an example. Represent Russian dolls with a nested cons representation like this:
;  A S-Russian-doll is one of:
;  "wooden"
;  (cons Russian-doll empty)
Now design a function that counts the number of layers around the wooden figure.

2.3 S-expressions and Other Nestings

Lisp introduced S-expressions as Lisp’s syntax and as its major datatype. In Racket the collection of S-expressions is just a subset of the universe of data:
; An S-expression is one of:
;  Atom
;  LS-expression
; 
; An LS-expression is one of:
;  empty
;  (cons S-expression LS-expression)
; 
; An Atom is ...
This data definition refers to Atom, however, which is an unspecified set. There are two ways to deal with this problem:
  1. consider Atom a parameter of the definition and specify it differently at different times; or

  2. define Atom once and for all as some set of atomic data.

The literature knows both approaches and both are used in Racket programming.

The definition also uses two interrelated data definitions, but still programming with S-expressions isn’t any different from programming with arithmetic expressions or Russian dolls. The key is to create two interrelated templates that refer to each other where the data definitions refer to each other.

To create each template, answer the questions from figure 4. Doing so says that the template needs a three-pronged cond; that empty? and atom? of first distinguish the first two cases (which is enough); that the second and third clause involve structs but the first one is about atomic data; and that the three self-references of the data definition imply three recursions in the template:
(define (s-template s)
  (cond
    [(atom? s) ...]
    [else ... (ls-template s) ...]))
 
(define (ls-template s)
  (cond
    [(empty? s) ...]
    [else ... (s-template (first s)) ... (ls-template (rest s)) ...]))
Other than that, there is little special about S-expressions.

2.4 Quote and Quine

Well, S-expressions are an extremely convenient representation for (extensible) languages. It is noteworthy here to say that the XML community discovered this fact some 50 years after the Lispers. What would the world know if they listened to Lispers instead of complaining about the parentheses?

The key to S-expressions are three special notations for lists: quote (’), quasiquote (`), and unquote (,). There is more but these three ideas are a good start.

In addition, you need to know about one more atomic datatype: symbols. A symbol is a quote (’) form followed by keys other than these: ( ) , ` ’ and a few more.

So '(1 2 3) is short for (cons 1 (cons 2 (cons 3 '()))); see lists-as-sequences-of-cons are easy to create with a single stroke. In reality quote is distributed over the pieces of a parenthesized form, so '(a 2 "hello") is short for (cons 'a (cons 2 (cons "hello" '()))). In other words, the letter a turned into a symbol, while quote numbers and string remain what they are. To show how powerful it all is, look at this S-expression:
'(html ()
       (head ()
             (title () "a first web page"))
       (body ((bgcolor "white"))
            "My  first web page. Please admire it."))
The quote just creates all the necessary conses and symbols and otherwise leaves things alone. The result is a concise representation of the HTML for a simple web page.

When the quote is distributed over a parenthesized expression, it makes no exception. Thus

'(a b (+ 1 2) c d)

turns into

(cons 'a (cons 'b (cons '(+ 1 2)  (cons 'c (cons 'd '())))))

and then the inner quoted expression is dealt with in the same manner:
(cons 'a
  (cons 'b
    (cons (cons '+ (cons 1 (cons 2 '())))
      (cons 'c
        (cons 'd '())))))

In some cases, however, the program needs the value of this inner expression not its symbolic representation. This is what quasiquote and unquote accomplish:

`(a b ,(+ 1 2) c d)

When this shorthand is expanded, the comma exempts the nested quotation and the expansion is this expression:

(cons 'a (cons 'b (cons (+ 1 2)  (cons 'c (cons 'd '())))))

Note how the inner addition expression isn’t quoted, and hence is evaluated once the program runs. The result is this piece of data:

(cons 'a (cons 'b (cons 3 (cons 'c (cons 'd '())))))

For an example of how quasiquote and unquote work in functions, take a look at figure 8.

#lang racket

;; String String -> S-expression
(define (a-web-page t stuff)
  `(html ()
         (head ()
               (title () "A " ,t " web page"))
         (body ((bgcolor "white"))
               "My " ,stuff " web page. Please admire it.")))

(a-web-page "first" "first")
(a-web-page "second" "beautiful")
(a-web-page "last" "final")

Figure 8: Quasiquote and unquote at work