5 Language Extensions
Goals |
— |
— |
— |
— |
5.1 When and Why
Extending a language is necessary to eliminate repeated programming
patterns—
_f: addq %r7, %r7
cmpq %r7, $32
jle _f
retq
_g: movq %r7, $8
callq _f
movq %r8, %r7
callq _f
addq %r8, %r7
retq
int f (int x) {
while ( (x += x) <= 32 ) {}
return x;
}
int g () {
int t = f(8);
return t + f(t);
}
Language extensions add abstractions to a programming language that it does not already have. For example, in most assembly languages, there is the abstraction of the stack and the program counter, but there is not an abstraction of a procedure. In C, we add the language extension of procedure calls by integrating the many features of pushing arguments on the stack, jumping while pushing a return address, and extending the stack to allocate local variables. Although these features exist in assembly, their use does not constitute an abstraction, because assembly does not allow us to enforce the integrity of the procedure. For example, in C, it is not possible to goto a label in another procedure, while this is permissible inside of procedure-style assembly. Enforcing these abstractions is valuable because it allows us to implement the abstraction in ways that cannot be scalably done at the level of the pattern. For example, in C, we can perform exotic optimizations on our program because we know that the stack will always have the same structure at the start of every label, which is not the case in assembly. A fundamental aspect of language-oriented programming is identifying these intended abstractions and the invariants that enforce their integrity, then exploiting those invariants to produce a better program than you would have done without the abstraction.
button.onclick =
function () {
channel
.send("Where's my money?",
function (ans) {
box.text = ans; }); }
button.onclick =
async function () {
var ans = await channel
.send("Where's my money?");
box.text = ans; }
It is typically the case that prior to the creation of a language extension, programmers that wish to use the concept employ a programming pattern, as mentioned above. For example, in JavaScript prior to ES7, it was typical for asynchronous functions to accept an explicit callback argument and extensive libraries and patterns built up to consume and produce these callbacks. In 2016, however, ES7 introduced the async and await keywords that automatically produce these callbacks from callback-less functions. Identifying an appropriate opportunity for language extension typically involves identifying a pattern like this in many programs.
As already mentioned, language extensions are distinct from existing abstractions. They can neither be implemented as functions nor via class inheritance or other abstraction mechanisms. Consider the asyncronous callback pattern, await is not a function that is definable inside of JavaScript. It is not a function that is definable inside of Racket either, because it consumes a syntactic block that contains a function call, rather than a value. It is, however, definable as a syntactix extension in Racket.
5.2 Two Simple Language Extensions
Don’t forget to add (require (for-syntax syntax/parse)) to your program.
A simple re-occurring programming pattern in programs in the timer pattern. In this pattern, you want to know how long an expression took to evaluate, so you capture the time before, and the time after, then display the difference. For example:
(let* ([before (current-inexact-milliseconds)] [answer (factorial (fibonacci 10))] [after (current-inexact-milliseconds)]) (printf "It took ~a ms to compute.\n" (- after before)) answer)
This code sequence is awkward, especially when the expression to be timed is embedded inside of a larger context, such as
(printf "The factorial of the tenth fibonacci is: ~a\n" (factorial (fibonacci 10)))
because the surrounding context is not already suited to define the local variables to capture the time and the answer.
If you attempt to abstract this programming pattern into a function, as
(define (try-to-time-it computation) (let* ([before (current-inexact-milliseconds)] [answer computation] [after (current-inexact-milliseconds)]) (printf "It took ~a ms to compute.\n" (- after before)) answer)) (printf "The factorial of the tenth fibonacci is: ~a\n" (try-to-time-it (factorial (fibonacci 10))))
then you will find that your computer suddenly got a lot faster. This is because try-to-time-it is not executed until after the computation is finished evaluating. This means that to make an abstraction out of this timer, you must employ the thunk pattern:
(define (thunk-time-it do-computation) (let* ([before (current-inexact-milliseconds)] [answer (do-computation)] [after (current-inexact-milliseconds)]) (printf "It took ~a ms to compute.\n" (- after before)) answer)) (printf "The factorial of the tenth fibonacci is: ~a\n" (thunk-time-it (λ () (factorial (fibonacci 10)))))
This implementation of thunk-time-it fails to be a satisfying abstraction, because it exposes to the user that it is a function and that the user cannot simply "time the evaluation of expression", but must also explicitly cooperate with thunk-time-it and delay the evaluation of the computation.
(define-syntax (time-it stx) (syntax-parse stx [(_ computation) #'(thunk-time-it (λ () computation))]))
(printf "The factorial of the tenth fibonacci is: ~a\n" (time-it (factorial (fibonacci 10))))
(define (burn-them-all l) (cond [(empty? l) '()] [else (cons (string-append "Burnt " (first l)) (burn-them-all (rest l)))])) (burn-them-all (list "Rickard" "Brandon"))
(map (λ (s) (string-append "Burnt" s)) (list "Rickard" "Brandon"))
Stop! Explain why (map (lambda (x) _) _) has the characteristic of a pattern.
Racket comes with an awesome abstraction for this programming pattern: for/list, a relatively recent addition to the language. Since it is familiar to you, it serves as a good case study for explaining how developers recognize the need for language extensions and how to create them via the familiar iterative refinement process.
> (define-syntax (simple-for/list0 stx) (syntax-parse stx [(_ (elem-name a-list) computation) #'(map (λ (elem-name) computation) a-list)]))
> (simple-for/list0 [s (list "Rickard" "Brandon")] (string-append "Burnt" s)) '("BurntRickard" "BurntBrandon")
Note Observers of languages such as Racket often describe them as “lacking syntax” because everything is indicated with parentheses. This observation is superficial at best, however, as the rules about parentheses suggest. These rules are the core of Racket’s syntax.
Stop! Rewrite the above macro so that it does not pair the loop identifier and its initialization expression. Show how to use it. Judge the Racket-y-ness of the code.
> (define-syntax (simple-for/list-verbose stx) (syntax-parse stx [(_ (~literal with) elem-name (~literal from) a-list (~literal in) comp) #'(map (λ (elem-name) comp) a-list)]))
> (simple-for/list-verbose with s from (list "Rickard" "Brandon") in (string-append "Burnt" s)) '("BurntRickard" "BurntBrandon")
5.3 Evolving a Language Extension
(simple-for/list1 ([s (list "Rickard" "Brandon")] [i (list 1 2)]) (string-append (number->string i) ". Burnt" s))
One option to implement this extension is to write another macro that allows two lists at the same time. Another option is to have a single macro that allows an arbitrary number of lists. Clearly this second option is better than the first, because it both covers the first one and generalizes to potential other uses.
(define-syntax (simple-for/list1 stx) (syntax-parse stx [(_ ([elem-name a-list] ...) computation) #'(map (λ (elem-name ...) computation) a-list ...)]))
Stop! Equip the syntax pattern with annotations that ensure simple-for/list1 reports syntax errors in terms of itself not lambda or something else.
While map supports iterating through multiple lists, let’s suppose for the rest of this section that it did not.
(define-simple-macro (simple-for/list1* ([elem-name a-list] ...) computation) (map (λ (elems) (define-values (elem-name ...) (split elems)) computation) (transpose (list a-list ...))))
This implementation is reasonably simple. Its downside is a drastic change to the performance of the language extension. Specifically, while a programmer may imagine a simple parallel traversal of two lists, the macro actually hides an extra linear traversal of the data and the creation of a large intermediate data structure. As every performance-aware functional programmer knows, however, allocating large intermediate data structures is a bad idea. Even if a deforesting compiler might be able to remove the traversals and constructions, it is inappropriate for macro authors to ignore such performance impacts when alternatives exist.
(define-synatx (simple-for/list2 stx) (syntax-parse stx [(_ ([elem-name a-list] ...) computation) #'(letrec ([iteration (λ (elem-name ...) computation)] [looping (λ (elem-name ...) (cond [(or (empty? elem-name) ...) '()] [else (cons (iteration (first elem-name) ...) (looping (rest elem-name) ...))]))]) (looping a-list ...))]))
(simple-for/list2 ([s (list "Rickard" "Brandon")] [i (list 1 2 3)]) (string-append (number->string i) ". Burnt" s))
Our iteration macro is needlessly complicated with its re-use of the element name to bind the lists themselves. It only does this because we need to have an equal number of arguments to the iteration function as there are lists and we know that the number of element names is the same as the number of lists.
(define-syntax (simple-for/list3 stx) (syntax-parse stx [(_ ([elem-name a-list] ...) computation) #:with (list-name ...) (generate-temporaries #'(elem-name ...)) #'(letrec ([iteration (λ (elem-name ...) computation)] [looping (λ (list-name ...) (cond [(or (empty? list-name) ...) '()] [else (cons (iteration (first list-name) ...) (looping (rest list-name) ...))]))]) (looping a-list ...))]))
Just as a reminder, let’s carefully step through the elements of this revision.
First, this macro performs computations during macro expansion with the call to the function generate-temporaries.
Second, we are constructing syntax objects with syntax/#' that do not get returned from the macro. The arguments of generate-temporaries is a syntax object that represents the sequence of names. For each element of this list, the function will generate a new name.
Stop! Could we have used #'(a-list ...) instead?
Third, the use of #:with turns the new names into syntax-pattern
variables that syntax/#' may substitute when it is
instantiating the output template. As in the preceding chapter, the
#:with form uses a list pattern on the left to deconstruct a list of
syntax objects—
Finally, we should ask if it is possible to use these new names, list-name, in the computation? The answer is no, and it is not because they are actually obscured as identifiers like temporary_list-name1924_abcd. Instead, it is because Racket systematically tracks the definition of variables and ensures that variables bound in different scopes do not shadow or capture variables defined in other scopes.
5.4 Another Evolution
Iterating through a data structure is a common programming pattern in all languages. A typical iteration employs a conditional and dispatches on expected values, computing one value for the '() case, another for cons, and maybe a different thing for other atomic values, such as numbers.
(define (sum a-tree) (cond [(null? a-tree) 0] [(cons? a-tree) (define elem (first a-tree)) (define more (rest a-tree)) (+ (sum elem) (sum more))] [(number? a-tree) a-tree] [else (error "..., given: ~e" a-tree)]))
(define (in a-tree) (cond [(null? a-tree) '()] [(cons? a-tree) (define elem (first a-tree)) (define more (rest a-tree)) (append (in elem) (in more))] [(number? a-tree) (list a-tree)] [else (error "..., given: ~e" a-tree)]))
Figure 3 displays two such iteration function with similar function bodies. This conditional is tedious to get right. Programmers tend to forget the error case, fail to name pieces of the data structure and thus re-extract them, or make other small mistakes.
(define-syntax (tree-match stx) (syntax-parse stx [(_ a-tree [(#:null) null-case] [(#:cons (~var one id) (~var more id)) cons-case] [(#:number) num-case]) #'(cond [(null? a-tree) null-case] [(cons? a-tree) (define one (first a-tree)) (define more (rest a-tree)) cons-case] [(number? a-tree) num-case] [else (error "..., given: ~e" a-tree)])]))
(define (sum a-tree) (tree-match a-tree [(#:null) 0] [(#:cons elem more) (+ (sum elem) (sum more))] [(#:number) a-tree]))
- If numbers occur only in the left part of a cons, the data structure is list-like and a summation function might look like this:
(define (sum-list a-tree) (simple-match a-tree [(#:null) 0] [(#:cons elem more) (+ elem (sum-list more))])) That is, the function body would use only the #:null and #:cons case. - If the numbers are always in the leaves, the data structure is a binary tree and the summation function would use just the #:cons and the #:number clauses of the extension:
While a programmer could just use match for traversing this data structure, let’s suppose for the rest of this section that Racket did not come with such a fancy library.
We have been faced with such a situation
(define-syntax (simple-match stx) (syntax-parse stx [(_ var) #'(error "Input was not matched ~e" var)] [(_ var [(#:null) null-case] others ...) #'(if (null? var) null-case (simple-match var others ...))] [(_ var [(#:number) num-case] others ...) #'(if (number? var) num-case (simple-match var others ...))] [(_ var [(#:cons (~var one id) (~var more id)) cons-case] others ...) #'(if (cons? var) (let ([one (first var)] [more (rest var)]) cons-case) (simple-match var others ...))]))
The syntax pattern in the first clauses matches (simple-match x) where x is any identifier. The generated code calls the error function to report that no valid simple matches can be found.
The second pattern catches the case when the first clause of simple-match deals with the empty list. Note the others syntax variable followed by an ellipsis, which matches the remaining clauses in the simple-match. The template uses these remaining clauses to generate a recursive use of the macro in the else branch of the standard if expression.
The third syntax-parse clause is for the #:number case. It is similar to the second case.
The last syntax pattern deals with the appearance of #:cons case at the beginning of the simple-match clauses. It also generates an if expression, though with a complex let expression for the positive branch. Otherwise the template is like the preceding two cases.
5.5 Language Extensions Must Behave Properly
The macros of this chapter aren’t proper language extensions yet. A proper
extension is indistinguishable from a linguistic construct that comes with the
core language. In particular, like such constructs, a proper extension does not
reveal—
(define-syntax (simple-for/list0 stx) (syntax-parse stx [(_ (elem-name a-list) computation) #'(map (λ (elem-name) computation) a-list)]))
> (define-syntax (robust-for/list0 stx) (syntax-parse stx [(_ ((~var elem-name id) a-list) computation) #'(if (list? a-list) (map (λ (elem-name) computation) a-list) (error 'robust-for/list0 "expected a list, given ~e" a-list))]))
> (define (tee x) (displayln `(debugging ,x) (current-error-port)) x)
> (define (run l) (robust-for/list0 (i (tee l)) (add1 i))) > (run '(1 2 3))
(debugging (1 2 3))
(debugging (1 2 3))
'(2 3 4)
(define-syntax (robust-for/list1 stx) (syntax-parse stx [(_ ((~var elem-name id) a-list) computation) #'(let ([a-list-v a-list]) (if (list? a-list-v) (map (λ (elem-name) computation) a-list-v) (error 'robust-for/list1 "Expected list, given ~e" a-list-v)))]))
Suppose for a moment that you have learned enough to place this wonderful new macro into a library and to provide it from there to the numerous Racket programmers who can’t wait to use it to improve their code. When one of these coders doesn’t provide an initialization expression that evaluates to a list, the error message from the macro might not be as informative as one from contracted library function. That is, it may not blame the module that contains the actual bug.
the #:declare directive, which allows for the for declaration of annotations;
another syntax class, expr/c, which can also be used to annotate syntax-pattern variables; and
the notion of syntax-class attributes.
> (define-syntax (robust-for/list2 stx) (syntax-parse stx [(_ ((~var elem-name id) a-list) computation) #:declare a-list (expr/c #'list?) #'(map (λ (elem-name) computation) a-list.c)]))
> (robust-for/list2 [5 (list 1 2)] (add1 x))
robust-for/list2: expected identifier
at: 5
in: (robust-for/list2 (5 (list 1 2)) (add1 x))
> (robust-for/list2 [x 5] (add1 x))
robust-for/list2: contract violation
expected: list?
given: 5
in: (listof any/c)
macro argument contract
contract from: 'program
blaming: (quote program)
(assuming the contract is correct)
at: eval:87.0
> (define (run2 l) (robust-for/list2 (i (tee l)) (add1 i))) > (run2 '(1 2 3)) (debugging (1 2 3))
'(2 3 4)
Stop! Can you think of any other syntactic mistake that a programmer could make in conjunction with robust-for/list2? Do the compile-time error message make sense? Do they reveal that the construct is programmer-defined when compared to error messages from Racket’s core constructs?
(define-syntax (robust-for/list2 stx) (syntax-parse stx [(_ ((~var elem-name id) a-list) computation) #:declare a-list (expr/c #'list?) #'(map (λ (elem-name) computation) a-list.c)]))
5.6 Language Extensions Must Generate Properly Sized Code
Figure 4 displays two implementations of the time-it language extension side by side. At first glance the two look equivalent. Both The original one on the left and the new one on the right essentially perform the same computation steps: they grab the current time, run the computation, grab the current time again, compute how many milliseconds have passed, print this number, and finally return the result of the computation. The new definition is more compact than the old one, and thus you may wonder whether the new one is an acceptable implementation of the time-it extension.
original macro definition
alternative definition
(define-syntax (time-it stx) (syntax-parse stx [(_ computation) #'(thunk-time-it (λ () computation))])) (define (thunk-time-it do-c) (define before (cim)) (define answer (do-c)) (define after (cim)) (define delta (- after before)) (printf "time: ~a ms\n" delta) answer)
(define-syntax (alt-time-it stx) (syntax-parse stx [(_ computation) (let() (define before (cim)) (define answer computation) (define after (cim)) (define delta (- after before)) (printf "time: ~a ms\n" delta) answer)])) To keep things short:
Let’s compare the two implementations in detail. The original definition generates a function application whose function position refers to a run-time function definition and whose argument us a generated, null-ary procedure. The alternative definition in-lines the body of the function and thus does not need to create the procedure; it merely places the computation code at the right place.
Based on this detailed description, we can now describe the effects in simple terms. While time-it generates a small amount of code and shares most of the needed code via a run-time library, alt-time-it duplicates the generated code for measuring the time at every single use site of the macro. If this language extension is used a lot, the resulting Racket code may become large, and large source code may affect the compiler’s back-end, e.g., put pressure on a register allocator. The downside of time-it is that it allocates a closure for every call site, which may create pressure on the memory manager at run-time.
Although time-it is a small language extensions, we can still generalize from these observations. We observe two points. On one hand, the generation of code is the point of language extensions via macros. On the other hand, keeping this code small is a good idea, and like compiler writers, language extenders create auxiliary run-time functions to achieve this goal.
Stop! Doesn’t the multi-list variant of our simple-for/list extension generate large code blocks with recursive functions? Can this be simplified?
5.7 Macros in Libraries
Say you have developed your macros, protected them against potential compile-time and run-time mistakes, and reduced the size of generated syntax as much as possible in the given time. You are ready to create your first extension library. Figure 5 demonstrates how to accomplish this last step. You place the code into a module, both the macro definitions and the run-time definitions that your generated code needs; and if you need compile-time only definitions, place them into this module, too. Then you specify those identifiers that you wish to export in a provide specification. This may include macros, as the figure illustrates. Be a good citizen and document the functionality of the exported identifiers.
Stop! The figure uses a new syntax-class annotation. What do you think it means?
"extensions.rkt"
#lang racket (provide ; (my-for/list ((~var elem-name id) (~var a-list expr)) ; (~var computation)) ; evaluates computation with elem-name successively ; bound to the elements of a-list, which must evaluate to a list my-for/list ; (time-it (~var computation expr)) ; evaluates computation and returns the result ; — it measure the time it takes and prints the number of ms consumed time-it) ; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - (require (for-syntax syntax/parse)) ; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - (define-syntax (my-for/list stx) (syntax-parse stx [(_ ((~var elem-name id) a-list) computation) #:declare a-list (expr/c #'list?) #'(map (λ (elem-name) computation) a-list.c)])) (define-syntax (time-it stx) (syntax-parse stx [(_ computation) #'(thunk-time-it (λ () computation))])) (define (thunk-time-it do-c) (define before (cim)) (define answer (do-c)) (define after (cim)) (define delta (- after before)) (printf "time: ~a ms\n" delta) answer) ; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ; local tests
Figure 6 shows how to program with these
extensions. It simply suffices to require the extension
library. The use of a relative path name in the example relies on the
assumption that the two files—
"extensions-client.rkt"
#lang racket (require "extensions.rkt") (require math) ; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - (time-it (my-for/list (p (build-list 10000 (λ (i) (+ 50000000000 i)))) `(,p ,(prime? p))))
If you know how to create a collection with raco, you may create one for the extension library and use a collection-based path for importing the language extensions.
5.8 Lessons Learned
This chapter is about extending the Racket language with macros. Like an ordinary library function, a language extension doesn’t come about easily and deserves careful design. And also like a library function, once the extension exists, people will make use of it and the uses may even surprise the creator.
A discerning programmer looks out for programming patterns. While functional and object-oriented abstraction can eliminate patterns that differ in values only, some patterns are of syntactic nature and hiding those calls for language extensions.
The implementation of a language extension uses macros. Like an ordinary language construct, a language extension needs compile-time and run-time components.
The compile-time component consists of macros and compile-time functions. A macro rewrites pieces of code into different pieces of code. These generated pieces of code may refer to identifiers from run-time libraries as well as identifiers locally defined for just this macro.
The run-time component consists of collections of arbitrary definitions. The use of these run-time values is usually restricted by the macro-generated code and may thus make certain logical assumptions. Conversely, the macro may also make assumptions about the run-time components.
Once the language extension is developed, it is imperative to use it and explore its capabilities and limitations. Like libraries, language extensions need several iterations to settle on a good interface and services. This evolution usually adds syntactic conveniences and expands the facilities’ functionality. Additionally this phase of the process calls for paying attention to the language extension’s error behavior and its impact on compilation time.
The creator of a language extension is the compiler writer’s sub-contractor. Like the compiler writer, this creator must make sure that the extension behaves well, even when programmers do not use it properly. And like the creation of a compiler, this process must pay attention to both syntax and run-time errors.
When it is possible to discover an error at compile time, the macro’s implementation should do so. The syntax-class annotations of syntax-parse are a macro’s writer’s friends for this task. So far we have used some built-in ones; in the next chapter we show how to program syntax classes and that they can do more than check the properties of syntax objects.
When a constraint must be checked at run-time, the expr/c syntax class—
technical a parametrized syntax class— provides services analogous to Racket’s contract system. Using this syntax class also tends to ensure that run-time checking does not accidentally evaluate a given expression more than necessary. Like a compiler writer, the creator of language extensions must pay attention to the effects of a new linguistic construct on compilation and run time.
To reduce the effect on compilation time, a macro must keep the generated code—
after all the use of a macro in client module generates code, unlike the use of a library function, which is just a function call. It is thus best to move code into a run-time definition if it doesn’t depend on a macro’s arguments. The generated code may freely refer to this factored out code. To reduce the effect on run time, the generated code should follow the usual rules of programming.
An Insight
(define-syntax (macro-name stx) _)
(syntax-parse stx _) for the dash in the preceding line
write down one clause consisting of a one-line syntax pattern and a template that starts with syntax.
Stop! Can you use define-syntax and syntax-parse to create a language extension that hides this pattern? It might be good to call it define-simple-macro.