5 Language Extensions

7.8.0.8

← prev up next →

5 Language Extensions

Goals

— when language extension is appropriate

— creating language extensions

— making language extensions well behaved

— generating small code for language extensions

5.1 When and Why

Extending a language is necessary to eliminate repeated programming patterns—often derisive called boilerplate—that cannot be hidden with the available abstraction mechanisms: functions, class inheritance, and so on. In addition to abstracting over repeated programming patterns, language extensions come with additional advantages, such as enforcing invariants that an implementation may exploit.

_f: addq %r7, %r7
    cmpq %r7, $32
    jle _f
    retq

_g: movq %r7, $8
    callq _f
    movq %r8, %r7
    callq _f
    addq %r8, %r7
    retq

int f (int x) {
while ( (x += x) <= 32 ) {}
return x;
}

int g () {
int t = f(8);
return t + f(t);
}
Figure 1: The Procedure Pattern

Language extensions add abstractions to a programming language that it does not already have. For example, in most assembly languages, there is the abstraction of the stack and the program counter, but there is not an abstraction of a procedure. In C, we add the language extension of procedure calls by integrating the many features of pushing arguments on the stack, jumping while pushing a return address, and extending the stack to allocate local variables. Although these features exist in assembly, their use does not constitute an abstraction, because assembly does not allow us to enforce the integrity of the procedure. For example, in C, it is not possible to goto a label in another procedure, while this is permissible inside of procedure-style assembly. Enforcing these abstractions is valuable because it allows us to implement the abstraction in ways that cannot be scalably done at the level of the pattern. For example, in C, we can perform exotic optimizations on our program because we know that the stack will always have the same structure at the start of every label, which is not the case in assembly. A fundamental aspect of language-oriented programming is identifying these intended abstractions and the invariants that enforce their integrity, then exploiting those invariants to produce a better program than you would have done without the abstraction.

button.onclick =
function () {
  channel
   .send("Where's my money?",
    function (ans) {
     box.text = ans; }); }

button.onclick =
async function () {
  var ans = await channel
   .send("Where's my money?");

  box.text = ans; }
Figure 2: The Async Callback Pattern

It is typically the case that prior to the creation of a language extension, programmers that wish to use the concept employ a programming pattern, as mentioned above. For example, in JavaScript prior to ES7, it was typical for asynchronous functions to accept an explicit callback argument and extensive libraries and patterns built up to consume and produce these callbacks. In 2016, however, ES7 introduced the async and await keywords that automatically produce these callbacks from callback-less functions. Identifying an appropriate opportunity for language extension typically involves identifying a pattern like this in many programs.

As already mentioned, language extensions are distinct from existing abstractions. They can neither be implemented as functions nor via class inheritance or other abstraction mechanisms. Consider the asyncronous callback pattern, await is not a function that is definable inside of JavaScript. It is not a function that is definable inside of Racket either, because it consumes a syntactic block that contains a function call, rather than a value. It is, however, definable as a syntactix extension in Racket.

5.2 Two Simple Language Extensions

Don’t forget to add (require (for-syntax syntax/parse)) to your program.

A simple re-occurring programming pattern in programs in the timer pattern. In this pattern, you want to know how long an expression took to evaluate, so you capture the time before, and the time after, then display the difference. For example:

(let* ([before (current-inexact-milliseconds)]
[answer (factorial (fibonacci 10))]
[after (current-inexact-milliseconds)])
(printf "It took ~a ms to compute.\n" (- after before))
answer)

This code sequence is awkward, especially when the expression to be timed is embedded inside of a larger context, such as

(printf "The factorial of the tenth fibonacci is: ~a\n"
(factorial (fibonacci 10)))

because the surrounding context is not already suited to define the local variables to capture the time and the answer.

If you attempt to abstract this programming pattern into a function, as

(define (try-to-time-it computation)
(let* ([before (current-inexact-milliseconds)]
        [answer computation]
        [after (current-inexact-milliseconds)])
  (printf "It took ~a ms to compute.\n" (- after before))
  answer))

(printf "The factorial of the tenth fibonacci is: ~a\n"
        (try-to-time-it (factorial (fibonacci 10))))

then you will find that your computer suddenly got a lot faster. This is because try-to-time-it is not executed until after the computation is finished evaluating. This means that to make an abstraction out of this timer, you must employ the thunk pattern:

(define (thunk-time-it do-computation)
(let* ([before (current-inexact-milliseconds)]
        [answer (do-computation)]
        [after (current-inexact-milliseconds)])
  (printf "It took ~a ms to compute.\n" (- after before))
  answer))

(printf "The factorial of the tenth fibonacci is: ~a\n"
        (thunk-time-it (λ () (factorial (fibonacci 10)))))

This implementation of thunk-time-it fails to be a satisfying abstraction, because it exposes to the user that it is a function and that the user cannot simply "time the evaluation of expression", but must also explicitly cooperate with thunk-time-it and delay the evaluation of the computation.

Let’s use our macro knowledge then. Here is a time-it form with which programmers can directly measure how long the evaluation of an expression takes:

(define-syntax (time-it stx)
(syntax-parse stx
[(_ computation) #'(thunk-time-it (λ () computation))]))

Equipped with this form, a programmer write simple expressions like this

(printf "The factorial of the tenth fibonacci is: ~a\n"
(time-it (factorial (fibonacci 10))))

and get the desired timing information. While the author of the time-it macro still needs to know about the need to delay the evaluation of the given expression, the programmer can happily supply arbitrary expressions to time-it.

Another commonly occurring programming pattern in functional Racket is the iteration-producing-a-list pattern. In this pattern, you iterate through a list and perform the same computation on every element, collecting the values into a list. For example:

(define (burn-them-all l)
  (cond [(empty? l) '()]
        [else
         (cons (string-append "Burnt " (first l))
               (burn-them-all (rest l)))]))

(burn-them-all (list "Rickard" "Brandon"))

This programming pattern is so common that experienced programmers immediately replace it with a different one:

(map (λ (s) (string-append "Burnt" s)) (list "Rickard" "Brandon"))

Although we, as functional programmers, are extremely proud of this map-lambda programming pattern, it is important to recognize it as a syntactic programming pattern. And these patterns cry out for syntactic abstraction.

Stop! Explain why (map (lambda (x) _) _) has the characteristic of a pattern.

Racket comes with an awesome abstraction for this programming pattern: for/list, a relatively recent addition to the language. Since it is familiar to you, it serves as a good case study for explaining how developers recognize the need for language extensions and how to create them via the familiar iterative refinement process.

Here is a first, simple implementation:

> (define-syntax (simple-for/list0 stx)
    (syntax-parse stx
     [(_ (elem-name a-list) computation)
      #'(map (λ (elem-name) computation) a-list)]))
> (simple-for/list0 [s (list "Rickard" "Brandon")]
   (string-append "Burnt" s))
'("BurntRickard" "BurntBrandon")

Note how this macro introduces parentheses around the elem-name and a-list. The point of these parentheses is to suggest a connection between these two pieces of the syntax. Even if you don’t know Racket’s for/list, you will recall that let pairs its variables with its initialization expressions. It is Racket’s way of drawing eyes to related notions.

Note Observers of languages such as Racket often describe them as “lacking syntax” because everything is indicated with parentheses. This observation is superficial at best, however, as the rules about parentheses suggest. These rules are the core of Racket’s syntax.

Stop! Rewrite the above macro so that it does not pair the loop identifier and its initialization expression. Show how to use it. Judge the Racket-y-ness of the code.

What Racket lacks are notational keywords. It is possible, of course, for Racket language extensions to add such keywords. For example, we could have defined our macro as

> (define-syntax (simple-for/list-verbose stx)
    (syntax-parse stx
     [(_ (~literal with) elem-name (~literal from) a-list (~literal in) comp)
      #'(map (λ (elem-name) comp) a-list)]))
> (simple-for/list-verbose with s from (list "Rickard" "Brandon") in
   (string-append "Burnt" s))
'("BurntRickard" "BurntBrandon")

And as the interaction demonstrates, this version works just as well as ours. It just isn’t the Racket way to add such superfluous words.

5.3 Evolving a Language Extension

Racket’s let enables programmers to define several variables at once, not just one at a time. It is thus natural to wonder whether simple-for/list could benefit from an analogous idea. Generally speaking, a macro often starts as a simple short-hand for some programming pattern; but, as we come to understand it as an abstraction on its own, we discover additional possibilities. Returning to the let analogy, we may get the idea that the loop should support iterating over two lists simultaneously:

(simple-for/list1 ([s (list "Rickard" "Brandon")]
[i (list 1 2)])
(string-append (number->string i) ". Burnt" s))

One option to implement this extension is to write another macro that allows two lists at the same time. Another option is to have a single macro that allows an arbitrary number of lists. Clearly this second option is better than the first, because it both covers the first one and generalizes to potential other uses.

A macro for iterating over an arbitrary number of lists is clearly a case for an ellipsis:

(define-syntax (simple-for/list1 stx)
  (syntax-parse stx
    [(_ ([elem-name a-list] ...) computation)
     #'(map (λ (elem-name ...) computation) a-list ...)]))

In the syntax pattern, the ellipsis signifies the repetition of identifier-list pairs. Hence the syntax-pattern variables elem-name and a-list stand for equally long sequences of syntax objects. In the syntax template, the sequence of elem-names becomes the parameters of a function that is mapped over an equally long sequence of list expressions.

Stop! Equip the syntax pattern with annotations that ensure simple-for/list1 reports syntax errors in terms of itself not lambda or something else.

While map supports iterating through multiple lists, let’s suppose for the rest of this section that it did not.

Here is one way to get around this (imaginary) restriction on map:

(define-simple-macro (simple-for/list1* ([elem-name a-list] ...) computation)
  (map (λ (elems)
         (define-values (elem-name ...) (split elems))
          computation)
       (transpose (list a-list ...))))

The tranpose function takes a list of N lists each with M elements; it then produces a list of M lists each with N elements. Similarly, the split function takes a list of N elements and returns them as N values.

This implementation is reasonably simple. Its downside is a drastic change to the performance of the language extension. Specifically, while a programmer may imagine a simple parallel traversal of two lists, the macro actually hides an extra linear traversal of the data and the creation of a large intermediate data structure. As every performance-aware functional programmer knows, however, allocating large intermediate data structures is a bad idea. Even if a deforesting compiler might be able to remove the traversals and constructions, it is inappropriate for macro authors to ignore such performance impacts when alternatives exist.

In this case, the alternative is to generate and apply a looping function directly:

(define-synatx (simple-for/list2 stx)
  (syntax-parse stx
    [(_ ([elem-name a-list] ...) computation)
     #'(letrec ([iteration
                  (λ (elem-name ...) computation)]
                [looping
                  (λ (elem-name ...)
                     (cond
                       [(or (empty? elem-name) ...) '()]
                       [else
                         (cons (iteration (first elem-name) ...)
                               (looping (rest elem-name) ...))]))])
         (looping a-list ...))]))

This elaborate use of ... arranges for each list to be bound to name elem-name inside the looping function, then each first element to be bound to the same name via the iteration function, which represents the body of the loop.

By exploring alternative implementation strategies, we have discovered a slightly different semantics. When map is called with multiple lists, each must have the same length. When our new macro iterates through multiple lists, it merely stops when one of the list runs out of elements. That is, the following use of the macro works just fine:

(simple-for/list2 ([s (list "Rickard" "Brandon")]
[i (list 1 2 3)])
(string-append (number->string i) ". Burnt" s))

And as Racket programmers know, this semantics is the one that for/list supports, and it is more pragmatic than the one that expanding to map would afford.

Our iteration macro is needlessly complicated with its re-use of the element name to bind the lists themselves. It only does this because we need to have an equal number of arguments to the iteration function as there are lists and we know that the number of element names is the same as the number of lists.

We can eliminate this small annoyance with a use #:with:

(define-syntax (simple-for/list3 stx)
  (syntax-parse stx
    [(_ ([elem-name a-list] ...) computation)
     #:with (list-name ...) (generate-temporaries #'(elem-name ...))
     #'(letrec ([iteration
                  (λ (elem-name ...) computation)]
                [looping
                  (λ (list-name ...)
                     (cond
                       [(or (empty? list-name) ...) '()]
                       [else
                         (cons (iteration (first list-name) ...)
                               (looping (rest list-name) ...))]))])
         (looping a-list ...))]))

Just as a reminder, let’s carefully step through the elements of this revision.

First, this macro performs computations during macro expansion with the call to the function generate-temporaries.

Second, we are constructing syntax objects with syntax/#' that do not get returned from the macro. The arguments of generate-temporaries is a syntax object that represents the sequence of names. For each element of this list, the function will generate a new name.

Stop! Could we have used #'(a-list ...) instead?

Third, the use of #:with turns the new names into syntax-pattern variables that syntax/#' may substitute when it is instantiating the output template. As in the preceding chapter, the #:with form uses a list pattern on the left to deconstruct a list of syntax objects—new names—on the right.

Finally, we should ask if it is possible to use these new names, list-name, in the computation? The answer is no, and it is not because they are actually obscured as identifiers like temporary_list-name1924_abcd. Instead, it is because Racket systematically tracks the definition of variables and ensures that variables bound in different scopes do not shadow or capture variables defined in other scopes.

5.4 Another Evolution

Iterating through a data structure is a common programming pattern in all languages. A typical iteration employs a conditional and dispatches on expected values, computing one value for the '() case, another for cons, and maybe a different thing for other atomic values, such as numbers.

(define (sum a-tree)
  (cond
    [(null? a-tree) 0]
    [(cons? a-tree)
     (define elem (first a-tree))
     (define more (rest a-tree))
     (+ (sum elem) (sum more))]
    [(number? a-tree) a-tree]
    [else
     (error "..., given: ~e" a-tree)]))

(define (in a-tree)
  (cond
    [(null? a-tree) '()]
    [(cons? a-tree)
     (define elem (first a-tree))
     (define more (rest a-tree))
     (append (in elem) (in more))]
    [(number? a-tree) (list a-tree)]
    [else
     (error "..., given: ~e" a-tree)]))
Figure 3: Traversing a Data Structure

Figure 3 displays two such iteration function with similar function bodies. This conditional is tedious to get right. Programmers tend to forget the error case, fail to name pieces of the data structure and thus re-extract them, or make other small mistakes.

At this point, you know what to do: create a language extension. The purpose of the extension is to hide the tedious details of this conditional. Doing so shouldn’t be difficult at this point:

(define-syntax (tree-match stx)
  (syntax-parse stx
    [(_ a-tree
        [(#:null) null-case]
        [(#:cons (~var one id) (~var more id)) cons-case]
        [(#:number) num-case])
     #'(cond
         [(null? a-tree) null-case]
         [(cons? a-tree)
          (define one  (first a-tree))
          (define more (rest a-tree))
          cons-case]
         [(number? a-tree) num-case]
         [else (error "..., given: ~e" a-tree)])]))

The macro uses keywords to distinguish the three cases, an appropriate use of syntax within Racket.

Using this extension, the definition of sum becomes much simpler:

(define (sum a-tree)
  (tree-match a-tree
   [(#:null) 0]
   [(#:cons elem more) (+ (sum elem) (sum more))]
   [(#:number) a-tree]))

Stop! Rewrite the in function using tree-match.

Although this macro does the job at hand, it is too precisely tied to its exact task. To begin with, it demands an exact ordering of the three cases, which makes the keywords a burden for the programmer. Furthermore, not every function that traverses a tree of cons cells, '(), and numbers must deal with all cases. Here are two alternative scenarios and sketches of functions that could benefit from an improved tree-match extension:

If numbers occur only in the left part of a cons, the data structure is list-like and a summation function might look like this:
(define (sum-list a-tree)
  (simple-match a-tree
   [(#:null) 0]
   [(#:cons elem more) (+ elem (sum-list more))]))
That is, the function body would use only the #:null and #:cons case.
If the numbers are always in the leaves, the data structure is a binary tree and the summation function would use just the #:cons and the #:number clauses of the extension:
(define (sum-leaves a-tree)
  (simple-match a-tree
   [(#:cons left right)
    (+ (sum-leaves left) (sum-leaves right))]
   [(#:number) a-tree]))

While a programmer could just use match for traversing this data structure, let’s suppose for the rest of this section that Racket did not come with such a fancy library.

We have been faced with such a situation

The way to implement this is with a macro that has one case for each different kind of clause and expands to itself recursively:

(define-syntax (simple-match stx)
  (syntax-parse stx
    [(_ var)
     #'(error "Input was not matched ~e" var)]
    [(_ var [(#:null) null-case] others ...)
     #'(if (null? var) null-case (simple-match var others ...))]
    [(_ var [(#:number) num-case] others ...)
     #'(if (number? var) num-case (simple-match var others ...))]
    [(_ var [(#:cons (~var one id) (~var more id)) cons-case] others ...)
     #'(if (cons? var)
           (let ([one (first var)]
                 [more (rest var)])
             cons-case)
           (simple-match var others ...))]))

Since this macro is the first “real” use of this technique, we briefly discuss the four cases:

The syntax pattern in the first clauses matches (simple-match x) where x is any identifier. The generated code calls the error function to report that no valid simple matches can be found.
The second pattern catches the case when the first clause of simple-match deals with the empty list. Note the others syntax variable followed by an ellipsis, which matches the remaining clauses in the simple-match. The template uses these remaining clauses to generate a recursive use of the macro in the else branch of the standard if expression.
The third syntax-parse clause is for the #:number case. It is similar to the second case.
The last syntax pattern deals with the appearance of #:cons case at the beginning of the simple-match clauses. It also generates an if expression, though with a complex let expression for the positive branch. Otherwise the template is like the preceding two cases.

Stop! How can the expansion of a simple-match macro reach the first clause of this syntax-parse expression?

5.5 Language Extensions Must Behave Properly

The macros of this chapter aren’t proper language extensions yet. A proper extension is indistinguishable from a linguistic construct that comes with the core language. In particular, like such constructs, a proper extension does not reveal—especially not through error messages—that it is just programmer-defined syntax.

Our simple-for/list0 macro violates this rule in several ways.

(define-syntax (simple-for/list0 stx)
  (syntax-parse stx
   [(_ (elem-name a-list) computation)
    #'(map (λ (elem-name) computation) a-list)]))

The intention is that each element of a-list is bound to the identifier elem-name inside of computation. However, our implementation fails to respect this intention, exposing implementation details to programmers and possibly confusing them:

First, it reveals that it is using map in the implementation by not checking that seq evaluates to a list:
> (simple-for/list0 [x 5] (add1 x))
map: contract violation
expected: list?
given: 5
Second, it exposes that it places elem-name into the formals positions of a lambda:
> (simple-for/list0 [5 (list 1 2)] (add1 x))
λ: not an identifier, identifier with default, or keyword
at: 5
in: (λ (5) (add1 x))

You know how to fix the second one by adding an annotation. For the first problem you might be tempted to inject a run-time test like this:

> (define-syntax (robust-for/list0 stx)
    (syntax-parse stx
     [(_ ((~var elem-name id) a-list) computation)
      #'(if (list? a-list)
            (map (λ (elem-name) computation) a-list)
            (error 'robust-for/list0 "expected a list, given ~e" a-list))]))

But watch what happens when someone writes code like this:

> (define (tee x)
    (displayln `(debugging ,x) (current-error-port))
    x)
> (define (run l)
    (robust-for/list0 (i (tee l))
      (add1 i)))
> (run '(1 2 3))
(debugging (1 2 3))
(debugging (1 2 3))
'(2 3 4)

The two lines of output must disturb you. An initialization expression, such as the one for a loop variable, should never be evaluated twice.

Improving on robust-for/list0 is easy. Instead of using the initialization expression in three places, we evaluate it once via a let expression and use the local variable three times:

(define-syntax (robust-for/list1 stx)
  (syntax-parse stx
    [(_ ((~var elem-name id) a-list) computation)
     #'(let ([a-list-v a-list])
         (if (list? a-list-v)
             (map (λ (elem-name) computation) a-list-v)
             (error 'robust-for/list1 "Expected list, given ~e" a-list-v)))]))

Now the world is fine. Or is it?

Suppose for a moment that you have learned enough to place this wonderful new macro into a library and to provide it from there to the numerous Racket programmers who can’t wait to use it to improve their code. When one of these coders doesn’t provide an initialization expression that evaluates to a list, the error message from the macro might not be as informative as one from contracted library function. That is, it may not blame the module that contains the actual bug.

For the final hardening step of our simple loop macro, we need to explain more of the expressive power of syntax-parse:

the #:declare directive, which allows for the for declaration of annotations;
another syntax class, expr/c, which can also be used to annotate syntax-pattern variables; and
the notion of syntax-class attributes.

With all these ideas in place, the macro definition can be formulated as follows:

> (define-syntax (robust-for/list2 stx)
    (syntax-parse stx
      [(_ ((~var elem-name id) a-list) computation)
       #:declare a-list (expr/c #'list?)
       #'(map (λ (elem-name) computation) a-list.c)]))

While short, this code is a mouthful and deserves a thorough explanation.

Before we explain this final version of the macro, though, we demonstrate that it behaves properly in all the erroneous situation mentioned above:

> (robust-for/list2 [5 (list 1 2)] (add1 x))
robust-for/list2: expected identifier
  at: 5
  in: (robust-for/list2 (5 (list 1 2)) (add1 x))
> (robust-for/list2 [x 5] (add1 x))
robust-for/list2: contract violation
  expected: list?
  given: 5
  in: (listof any/c)
      macro argument contract
  contract from: 'program
  blaming: (quote program)
   (assuming the contract is correct)
  at: eval:87.0
> (define (run2 l)
    (robust-for/list2 (i (tee l))
      (add1 i)))
> (run2 '(1 2 3))
(debugging (1 2 3))
'(2 3 4)

As you can see, the macro properly catches syntactic mistakes, discovers when the initialization expression delivers something else than a list, assigns blame like a well-contracted library function, and evaluates this expression only once.

Stop! Can you think of any other syntactic mistake that a programmer could make in conjunction with robust-for/list2? Do the compile-time error message make sense? Do they reveal that the construct is programmer-defined when compared to error messages from Racket’s core constructs?

It’s time to explain the implementation of robust-for/list2. First, we turn to expr/c, which is somewhat like id but consumes the syntax object representation of a predicate. Its result is an actual syntax class. Second, we can now declare a syntax-pattern variable such as a-list to have this syntax class. Third, the effect of this declaration is to attach an attribute to the matched syntax-pattern variable. Here the attribute’s name is c, its value is the matched expression wrapped with a contract check, and it is retrieved by using a-list.c. And all this explains the small revision of the template:

(define-syntax (robust-for/list2 stx)
  (syntax-parse stx
    [(_ ((~var elem-name id) a-list) computation)
     #:declare a-list (expr/c #'list?)
     #'(map (λ (elem-name) computation) a-list.c)]))

By in-lining a version of a-list wrapped in a contract, we get two things at once: a single evaluation of the original expression and a proper contract error if the result isn’t a list.

5.6 Language Extensions Must Generate Properly Sized Code

Figure 4 displays two implementations of the time-it language extension side by side. At first glance the two look equivalent. Both The original one on the left and the new one on the right essentially perform the same computation steps: they grab the current time, run the computation, grab the current time again, compute how many milliseconds have passed, print this number, and finally return the result of the computation. The new definition is more compact than the old one, and thus you may wonder whether the new one is an acceptable implementation of the time-it extension.

original macro definition

alternative definition
(define-syntax (time-it stx)
  (syntax-parse stx
   [(_ computation)
    #'(thunk-time-it
        (λ () computation))]))

(define (thunk-time-it do-c)
  (define before (cim))
  (define answer (do-c))
  (define after  (cim))
  (define delta  (- after before))
  (printf "time: ~a ms\n" delta)
  answer)

(define-syntax (alt-time-it stx)
  (syntax-parse stx
    [(_ computation)
     (let()
       (define before (cim))
       (define answer computation)
       (define after  (cim))
       (define delta (- after before))
       (printf "time: ~a ms\n" delta)
       answer)]))
To keep things short:
(define cim current-inexact-milliseconds)
Figure 4: Code Size Matters

Let’s compare the two implementations in detail. The original definition generates a function application whose function position refers to a run-time function definition and whose argument us a generated, null-ary procedure. The alternative definition in-lines the body of the function and thus does not need to create the procedure; it merely places the computation code at the right place.

Based on this detailed description, we can now describe the effects in simple terms. While time-it generates a small amount of code and shares most of the needed code via a run-time library, alt-time-it duplicates the generated code for measuring the time at every single use site of the macro. If this language extension is used a lot, the resulting Racket code may become large, and large source code may affect the compiler’s back-end, e.g., put pressure on a register allocator. The downside of time-it is that it allocates a closure for every call site, which may create pressure on the memory manager at run-time.

Although time-it is a small language extensions, we can still generalize from these observations. We observe two points. On one hand, the generation of code is the point of language extensions via macros. On the other hand, keeping this code small is a good idea, and like compiler writers, language extenders create auxiliary run-time functions to achieve this goal.

Stop! Doesn’t the multi-list variant of our simple-for/list extension generate large code blocks with recursive functions? Can this be simplified?

5.7 Macros in Libraries

Say you have developed your macros, protected them against potential compile-time and run-time mistakes, and reduced the size of generated syntax as much as possible in the given time. You are ready to create your first extension library. Figure 5 demonstrates how to accomplish this last step. You place the code into a module, both the macro definitions and the run-time definitions that your generated code needs; and if you need compile-time only definitions, place them into this module, too. Then you specify those identifiers that you wish to export in a provide specification. This may include macros, as the figure illustrates. Be a good citizen and document the functionality of the exported identifiers.

Stop! The figure uses a new syntax-class annotation. What do you think it means?

"extensions.rkt"
#lang racket

(provide
  ; (my-for/list ((~var elem-name id) (~var a-list expr))
  ;   (~var computation))
  ; evaluates computation with elem-name successively
  ; bound to the elements of a-list, which must evaluate to a list
  my-for/list

  ; (time-it (~var computation expr))
  ; evaluates computation and returns the result
  ; —it measure the time it takes and prints the number of ms consumed
  time-it)

; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
(require (for-syntax syntax/parse))

; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
(define-syntax (my-for/list stx)
  (syntax-parse stx
    [(_ ((~var elem-name id) a-list) computation)
     #:declare a-list (expr/c #'list?)
     #'(map (λ (elem-name) computation) a-list.c)]))

(define-syntax (time-it stx)
  (syntax-parse stx
   [(_ computation)
    #'(thunk-time-it
        (λ () computation))]))

(define (thunk-time-it do-c)
  (define before (cim))
  (define answer (do-c))
  (define after  (cim))
  (define delta  (- after before))
  (printf "time: ~a ms\n" delta)
  answer)

; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
; local tests
Figure 5: An Extension Library

Figure 6 shows how to program with these extensions. It simply suffices to require the extension library. The use of a relative path name in the example relies on the assumption that the two files—"extensions.rkt" and "extensions-client.rkt"—exist in the same directory. With the macros imported, the rest of the module may use them as if they had always existed. The require specification is the only evidence that the forms used in the module are language extensions.

"extensions-client.rkt"
#lang racket

(require "extensions.rkt")
(require math)

; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
(time-it
(my-for/list (p (build-list 10000 (λ (i) (+ 50000000000 i))))
`(,p ,(prime? p))))
Figure 6: An Extension Client

If you know how to create a collection with raco, you may create one for the extension library and use a collection-based path for importing the language extensions.

5.8 Lessons Learned

This chapter is about extending the Racket language with macros. Like an ordinary library function, a language extension doesn’t come about easily and deserves careful design. And also like a library function, once the extension exists, people will make use of it and the uses may even surprise the creator.

So here are our first lessons on language extensions and the process of creating them:

A discerning programmer looks out for programming patterns. While functional and object-oriented abstraction can eliminate patterns that differ in values only, some patterns are of syntactic nature and hiding those calls for language extensions.
The implementation of a language extension uses macros. Like an ordinary language construct, a language extension needs compile-time and run-time components.
The compile-time component consists of macros and compile-time functions. A macro rewrites pieces of code into different pieces of code. These generated pieces of code may refer to identifiers from run-time libraries as well as identifiers locally defined for just this macro.
The run-time component consists of collections of arbitrary definitions. The use of these run-time values is usually restricted by the macro-generated code and may thus make certain logical assumptions. Conversely, the macro may also make assumptions about the run-time components.
Once the language extension is developed, it is imperative to use it and explore its capabilities and limitations. Like libraries, language extensions need several iterations to settle on a good interface and services. This evolution usually adds syntactic conveniences and expands the facilities’ functionality. Additionally this phase of the process calls for paying attention to the language extension’s error behavior and its impact on compilation time.
The creator of a language extension is the compiler writer’s sub-contractor. Like the compiler writer, this creator must make sure that the extension behaves well, even when programmers do not use it properly. And like the creation of a compiler, this process must pay attention to both syntax and run-time errors.
When it is possible to discover an error at compile time, the macro’s implementation should do so. The syntax-class annotations of syntax-parse are a macro’s writer’s friends for this task. So far we have used some built-in ones; in the next chapter we show how to program syntax classes and that they can do more than check the properties of syntax objects.
When a constraint must be checked at run-time, the expr/c syntax class—technical a parametrized syntax class—provides services analogous to Racket’s contract system. Using this syntax class also tends to ensure that run-time checking does not accidentally evaluate a given expression more than necessary.
Like a compiler writer, the creator of language extensions must pay attention to the effects of a new linguistic construct on compilation and run time.
To reduce the effect on compilation time, a macro must keep the generated code—after all the use of a macro in client module generates code, unlike the use of a library function, which is just a function call. It is thus best to move code into a run-time definition if it doesn’t depend on a macro’s arguments. The generated code may freely refer to this factored out code.
To reduce the effect on run time, the generated code should follow the usual rules of programming.

An Insight

You may have noticed that the combination of define-syntax and syntax-pattern lead to a repetitive a style of programming that in many cases, employs a syntactic programming pattern. Specifically, the pattern is to write down

(define-syntax (macro-name stx) _)
(syntax-parse stx _) for the dash in the preceding line
write down one clause consisting of a one-line syntax pattern and a template that starts with syntax.

Clearly, this programming patterns calls for an abstraction.

Stop! Can you use define-syntax and syntax-parse to create a language extension that hides this pattern? It might be good to call it define-simple-macro.

← prev up next →

	Prelude
1	Programming Languages
2	A Simple Model
3	Onward to Racket
4	Syntax Pattern
5	Language Extensions
6	Syntax Classes
7	Macro Conspiracies
8	Scope and Hygiene
9	Embedded Languages
10	Module Languages
	Postlude

5.1	When and Why
5.2	Two Simple Language Extensions
5.3	Evolving a Language Extension
5.4	Another Evolution
5.5	Language Extensions Must Behave Properly
5.6	Language Extensions Must Generate Properly Sized Code
5.7	Macros in Libraries
5.8	Lessons Learned