4 Syntax Pattern



Goals

— the basics of syntax patterns and templates

— matching patterns, instantiating templates

— optional patterns, default templates

— pattern matching and compile-time functions

4.1 The Basics of Syntax Pattern Matching

The macros and compile-time functions in the preceding chapter use syntax-e and primitive Racket functions to de-structure the given syntax nodes. Once they have the pieces, they compose the new syntax node using #' and friends. We know from defining run-time functions in programs that this sequence of de-structuring a given compound form of data and constructing new data naturally leads to repeated programming patterns. In the case of run-time functions, we use match and similar facilities to eliminate these patterns, greatly enhancing the readability of code.

Unsurprisingly, Racket provides a similarly powerful sub-language for defining macros and compile-time functions, namely, the language of syntax-parse. It is an embedded sub-language, defined via primitive macro facilities, a fact that we ignore here. The construct is tuned to help with syntax-processing functions.

As a matter of fact, Racket supplies several libraries for defining macros and compile-time functions, including some that are more primitive than syntax-parse as well as derived forms.

Let’s reformulate the one “manual” example from the preceding chapter with syntax-parse:

; (define-hello-v2 name) is like define-hello
(define-syntax (define-hello-v2 stx)
  (syntax-parse stx
    ((_ x) #'(define x "world"))))

If this doesn’t work for you, you need to add (require (for-syntax syntax/parse)) to your Definitions Window.

Morally, define-hello-v2 is the same macro as define-hello. But look how much easier it is to write it down. Instead of three definitions to de-structure the given syntax and two to construct the new one, syntax-parse allows us to take apart the syntax node with a simple pattern and a template that is to be filled in:

The pattern is (_ x), matching a syntax node that contains a two-element list. The first element is the new keyword, define-hello-v2, and the pattern emphasizes this with _. The second element of the pattern is x, a pattern variable that matches any sub-tree of the given syntax node in this position.
For example, if the given syntax node contains (define-hello-v2 a), then the syntax-pattern variable x is instantiated as the syntax object containing a.
The template is #'(define x "world") (short for (syntax (define x "world"))), which uses Racket’s define and the string "world" to create a definition from the syntax-pattern variable x. Once x is instantiated due to a successful match, the expander substitutes the x in the template with its value. We say the template gets instantiated.
For example, if x stands for a, then the result of instantiating the template is (define a "world").
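Because syntax-parse is an ordinary library form, it can also run at phase 0 on a quoted syntax object, which makes it easy to watch one match and one instantiation in isolation. The following is a minimal sketch under that assumption; the helper name instantiate-hello is hypothetical, and syntax->datum merely turns the instantiated template back into an S-expression for display.

```racket
#lang racket
;; Sketch: run syntax-parse at run time (phase 0) on a quoted syntax
;; object instead of inside a macro, so that the result can be inspected.
(require syntax/parse)

;; instantiate-hello (hypothetical helper) : Syntax -> S-expression
;; matches the pattern (_ x) and returns the instantiated template
;; #'(define x "world") as a plain datum
(define (instantiate-hello stx)
  (syntax-parse stx
    [(_ x) (syntax->datum #'(define x "world"))]))

(instantiate-hello #'(define-hello-v2 a))
;; returns '(define a "world")
```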

Like functions written with match, macros written with syntax-parse are thus much easier to read than macros that use ordinary selectors.

Here are two working instances of our new macro:

> (define-hello-v2 a)
> a
"world"
> (define-hello-v2 a-variable)
> a-variable
"world"

Stop! Use define-hello-v2 in inappropriate ways: with too few or too many sub-terms, with a number as a sub-term, and so on. What happens in these cases?

The syntax-parse pattern language allows developers to add annotations that tell the pattern matcher that the second part of the syntax tree must be an identifier, or id for short:

; like (define-hello-v2 name) but expresses errors in terms of itself
(define-syntax (define-hello-v3 stx)
  (syntax-parse stx
    ((_ (~var x id)) #'(define x "world"))))

The ~var annotation emphasizes that x is a syntax-pattern variable, and the id part, a so-called syntax class, enforces that it matches only identifiers.
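To see the id syntax class accept and reject sub-terms outside of a macro, here is a small sketch that again assumes syntax-parse is used at phase 0. The helper hello-target is hypothetical; it catches the syntax error that syntax-parse raises and returns its message so that we can look at it.

```racket
#lang racket
;; Sketch: the id syntax class checked at run time (phase 0).
(require syntax/parse)

;; hello-target (hypothetical helper) : Syntax -> (U Symbol String)
;; returns the defined name as a symbol, or the message of the syntax
;; error that syntax-parse raises when the sub-term is not an identifier
(define (hello-target stx)
  (with-handlers ([exn:fail:syntax? exn-message])
    (syntax-parse stx
      [(_ (~var x id)) (syntax-e #'x)])))

(hello-target #'(define-hello-v3 a)) ; returns 'a
(hello-target #'(define-hello-v3 1)) ; returns a message containing "expected identifier"
```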

When a developer now uses define-hello-v3 the wrong way, the error message explains the problem in terms of define-hello-v3, not in terms of the broken code that the macro would otherwise generate from a bad syntax tree:

> (define-hello-v3 1)
define-hello-v3: expected identifier
at: 1
in: (define-hello-v3 1)

We can even express this as a unit test:

> (require rackunit)
> (require syntax/macro-testing)
> (check-exn #rx"define-hello-v3: expected identifier"
             (lambda () (convert-syntax-error (define-hello-v3 1))))

There is no output, because the test succeeds.

Sample Problem Suppose we want define-hello to define more than one name to stand for "world", this most amazing string of all.

With syntax-parse, creating this extension is also straightforward. Let’s first write down a pattern for a construct that deals with multiple definitions only:

; (define-hello* name ...) defines every name ... to stand for "world"
(define-syntax (define-hello* stx)
  (syntax-parse stx
    [(_ (~var x id) ...) #'???]))

The pattern in this definition is mostly like the first one except for the ... (pronounced ellipsis) part:

In the context of a pattern, ... means “matches the term to my immediate left repeated 0 or more times.”
Here (~var x id) is to the immediate left, that is, a syntax-pattern variable with an annotation.
Thus the pattern matches any syntax list that starts with define-hello* followed by 0 or more identifiers. As far as the pattern is concerned, the syntax-pattern variable stands for the entire sequence of identifiers.
For example, it matches (define-hello*), (define-hello* a), and (define-hello* aa a). In these three cases, the syntax-pattern variable x stands for the empty sequence of identifiers, the sequence that contains only a, and a two-identifier sequence. Stop! Which are the two identifiers?
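One way to inspect what the ellipsis-annotated variable x stands for is to run the same pattern at phase 0 and print x back out. This is a sketch under that assumption; matched-ids is a hypothetical helper. Because x sits to the left of an ellipsis in the pattern, the template must also mention it as (x ...).

```racket
#lang racket
;; Sketch: what the pattern variable x stands for in (_ (~var x id) ...),
;; checked at phase 0 on quoted syntax.
(require syntax/parse)

;; matched-ids (hypothetical helper) : Syntax -> (Listof Symbol)
(define (matched-ids stx)
  (syntax-parse stx
    ;; x has ellipsis depth 1, so it is used as (x ...) in the template
    [(_ (~var x id) ...) (syntax->datum #'(x ...))]))

(matched-ids #'(define-hello*))      ; returns '()
(matched-ids #'(define-hello* a))    ; returns '(a)
(matched-ids #'(define-hello* aa a)) ; returns '(aa a)
```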

If a term matches a pattern with ellipsis, the macro expander instantiates the template in a slightly different fashion from the normal case.

There are different templates that solve our sample problem; here is a Racket-y one:

(define-syntax (define-hello* stx)
  (syntax-parse stx
    [(_ (~var x id) ...) #'(begin (define x "world") ...)]))

This template exploits two ideas. First, when a begin appears in a definition context, its sub-terms are spliced into that context. Second, a sub-template might be followed by an ellipsis just like a pattern:

In the context of a template, ... means “instantiate the sub-template to my immediate left with the sequence found in the syntax-pattern variable contained in this term.”
Here (define x "world") is the template sub-term to the immediate left of .... This sub-template contains x, which indeed stands for a sequence of identifiers.
The macro expander instantiates the sub-template for every element of the sequence that the syntax-pattern variable stands for.
For example, if x stands for the two-element sequence of identifiers aa and a, then the resulting sequence of terms is (define aa "world") and (define a "world").
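The instantiation step can be observed in the same phase-0 style. In this sketch, expand-hello* is a hypothetical helper that performs only the match and the template instantiation of define-hello*, without handing the result to the expander.

```racket
#lang racket
;; Sketch: instantiating a template with an ellipsis at phase 0.
(require syntax/parse)

;; expand-hello* (hypothetical helper) : Syntax -> S-expression
;; instantiates (define x "world") once per identifier that x stands for
(define (expand-hello* stx)
  (syntax-parse stx
    [(_ (~var x id) ...)
     (syntax->datum #'(begin (define x "world") ...))]))

(expand-hello* #'(define-hello* aa a))
;; returns '(begin (define aa "world") (define a "world"))
```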

In short, when the define-hello* macro matches, it generates a begin that contains a sequence of definitions. When it is used in a definition context, these definitions are added to the context.

Let’s watch this macro in action:

> (define-hello* aa a)
> aa
"world"
> a
"world"

As promised, all specified identifiers now stand for our favorite string.

Exercise 3. One alternative way to create definitions for several identifiers is to use define-values:
(define-syntax (define-hello*-v2 stx)
  (syntax-parse stx
    [(_ (~var x id) ...)
     #'(define-values (x ...)
        (values (begin 'x  "world") ...))]))
Explain whether and how the term (define-hello*-v2 world good bye) matches the pattern. Then step through the instantiation process for the template. Confirm your explanation in DrRacket. Rationalize why this implementation works. End

Sample Problem In principle we can use define-hello* without supplying any identifiers, like this: (define-hello*). Using the macro in this way doesn’t make much sense, though.
So the question is how we can demand that every use supplies at least one identifier. Let’s call the revised macro define-hello+.

The pattern language of syntax-parse has two ways to define define-hello+. (The + convention goes back to Kleene, who used a* to denote possibly empty sequences of a and a+ for non-empty sequences.) The first one uses a different ellipsis notation for the syntax pattern:

> (define-syntax (define-hello+-v1 stx)
    (syntax-parse stx
      [(_ (~var x id) ...+) #'(begin (define x "world") ...)]))
> (define-hello+-v1 b)
> b
"world"

The + in ...+ means that the sequence of sub-patterns cannot be empty.
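A quick phase-0 sketch makes the difference between ... and ...+ concrete. The helper ids+ is hypothetical; it returns the matched identifiers or the symbol 'no-match when syntax-parse raises a syntax error.

```racket
#lang racket
;; Sketch: ...+ demands at least one match, checked at phase 0.
(require syntax/parse)

;; ids+ (hypothetical helper): uses the pattern of define-hello+-v1 and
;; maps the syntax error for a failed match to the symbol 'no-match
(define (ids+ stx)
  (with-handlers ([exn:fail:syntax? (lambda (e) 'no-match)])
    (syntax-parse stx
      [(_ (~var x id) ...+) (syntax->datum #'(x ...))])))

(ids+ #'(define-hello+ b c)) ; returns '(b c)
(ids+ #'(define-hello+))     ; returns 'no-match
```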

The second way of defining define-hello+ uses a more verbose pattern notation than the first one:

> (define-syntax (define-hello+-v2 stx)
    (syntax-parse stx
      [(_ (~var x-1 id) (~var x-2 id) ...)
        #'(begin (define x-1 "world") (define x-2 "world") ...)]))
> (define-hello+-v2 c)
> (define-hello+-v2 d e)
> (list c d e)
'("world" "world" "world")

Note how this pattern consists of four parts: the underscore continues to match the new keyword, the second sub-pattern is a syntax-pattern variable that matches only identifiers, the third one is also such a variable, and the fourth is an ellipsis. Since only the second pattern variable, x-2, is to the immediate left of the ellipsis, the complete pattern demands at least one identifier in the second position followed by a possibly empty sequence of identifiers.

Stop! Try to use both variants of define-hello+ without supplying an identifier. If you are using DrRacket, turn on the on-line syntax checker and watch the error messages in the status line near the bottom of the window.

4.2 More Pattern Matching, More Templating

Racket’s match supports far more than simple patterns and one-clause pattern-matching, and so does syntax-parse. The point of the next few little exercises is to start expanding your knowledge of syntax-parse’s pattern-matching and templating facilities.

Sample Problem Let us revise define-hello+ so that it allows the optional specification of string prefixes. Specifically, the revised macro should allow the optional prelude of a clause that looks like (pre s) for the literal identifier pre and any string s.

Like the preceding sample problem, this one also has two solutions. The classical one is to use two pattern-matching clauses in syntax-parse:

> (define-syntax (define-hello+-v3 stx)
    (syntax-parse stx
      [(_ (~var x id) ...+)
       #'(begin (define x "world") ...)]
      [(_ ((~literal pre) (~var p str)) (~var x id) ...+)
       #'(begin (define x (string-append p "world")) ...)]))
> (define-hello+-v3 (pre "hello ") c d)
> c
"hello world"
> d
"hello world"

When a syntax-parse expression comes with several pattern-matching clauses, the expander looks for the first one that matches the given syntax object. Once it finds such a pattern, it evaluates the right-hand side and uses the resulting syntax object in place of the given one. The example shows how we can prefix "world" with "hello" for all defined identifiers. To complete the task, we should also check that the old behavior is preserved:

> (define-hello+-v3 e)
> e
"world"

For this example, the first clause of the syntax-parse matches, and thus e stands for just "world".
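To observe which clause fires, we can copy the two patterns of define-hello+-v3 into a phase-0 function whose right-hand sides merely report a symbol. This is a sketch; which-clause is a hypothetical name. Since syntax-parse tries the clauses in order, the first clause wins whenever both could apply, and the prefixed use falls through to the second clause because (pre "hello ") is not an identifier.

```racket
#lang racket
;; Sketch: clause selection in syntax-parse, observed at phase 0.
(require syntax/parse)

;; which-clause (hypothetical helper): reports which of the two clauses
;; of define-hello+-v3 matches the given use
(define (which-clause stx)
  (syntax-parse stx
    [(_ (~var x id) ...+) 'plain]
    [(_ ((~literal pre) (~var p str)) (~var x id) ...+) 'with-prefix]))

(which-clause #'(define-hello+-v3 e))                  ; returns 'plain
(which-clause #'(define-hello+-v3 (pre "hello ") c d)) ; returns 'with-prefix
```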

The second solution uses another pattern annotation and a novel template constructor:

> (define-syntax (define-hello+-v4 stx)
    (syntax-parse stx
      [(_ (~optional ((~literal pre) (~var p str))) (~var x id) ...+)
       #'(begin (define x (string-append (~? p "") "world")) ...)]))

The ~optional annotation says that the entire sub-pattern is optional and, if a corresponding element isn’t part of the given syntax object, the pattern matches anyway.

The result of the macro depends on the presence of this clause, though, and in particular the content of the pattern variable p. When the optional pre clause does not exist, there is no prefix for "world". The template language of syntax-parse therefore allows a conditional that checks whether a syntax-pattern variable in an optional sub-pattern has been matched:

(~? p default)

If p is a pattern variable and has been matched, this template is instantiated with the content of p; otherwise the default sub-template is instantiated in its place. Note that default is itself a template, not an expression that gets evaluated. In other words, #' not only knows to replace syntax-pattern variables with their content but also to handle templates of the shape (~? p default).
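Here is a phase-0 sketch that isolates the (~? p "") part of define-hello+-v4. The helper prefix-of is hypothetical; it reuses the macro’s pattern and returns only the instantiated conditional template as a datum.

```racket
#lang racket
;; Sketch: ~optional in a pattern plus ~? in a template, at phase 0.
(require syntax/parse)

;; prefix-of (hypothetical helper): (~? p "") yields p's content when
;; the optional (pre s) clause was present, and the template "" otherwise
(define (prefix-of stx)
  (syntax-parse stx
    [(_ (~optional ((~literal pre) (~var p str))) (~var x id) ...+)
     (syntax->datum #'(~? p ""))]))

(prefix-of #'(define-hello+-v4 (pre "good ") d)) ; returns "good "
(prefix-of #'(define-hello+-v4 e))               ; returns ""
```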

Let’s wrap up this section with a couple of interactions that demonstrate the workings of define-hello+-v4:

> (define-hello+-v4 (pre "good ") d)
> d
"good world"
> (define-hello+-v4 e)
> e
"world"

Stop! Make ill-formed instances of define-hello+-v4 and check what kind of syntax error syntax-parse issues. In particular, make sure to create an instance that does not match any variant of the pattern.

4.3 Yet More Pattern Matching, Yet More Templating

There is still much more to learn about macros and syntax-parse. Macros can be recursive, and syntax-parse may compute the resulting code in a procedural manner, not just by instantiating a template. We’ll introduce these ideas by revising our sample macro once again.

Sample Problem Revise the define-hello macro so that it permits the optional postfixing of each string. That is, every individual identifier will still be defined to stand for "world", but an identifier paired with a string in parentheses stands for "world" postfixed with this string. The revised macro, called define-hello-post, allows empty sequences of sub-terms and does not accommodate optional prefixes.

This time we show three solutions. The goal is to bring across different techniques. The first, and oldest, one solves the problem using a recursive macro. The second one uses plain compile-time syntax processing to generate some of the pieces of the desired result, mixing the procedural techniques of the preceding chapter with the templating style of this one. The last one simplifies the second one by using additional syntax-parse facilities, and it is the most direct approach.

Before we turn to the three solutions, let’s look closely at the problem. We start with examples of the desired macro:

(define-hello-post)
; is equivalent to (begin)

(define-hello-post a b)
; defines a and b to stand for "world"

(define-hello-post (c " bye"))
; defines c as "world bye"

(define-hello-post g [f ", hello"] h [i "---done"])
; defines g and h to stand for "world",
; f as "world, hello", and i as "world---done"

As always, brackets are interchangeable with parentheses.

If define-hello-post were a function and its sub-terms were a list, we would write a recursive function that iterates through the terms until the list is exhausted. Depending on the shape of the first term in the list, the function would compute a different result.
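For comparison, here is a sketch of that run-time function. It is an analogy only: post-defs is a hypothetical helper that consumes plain S-expressions rather than syntax, uses match in place of syntax-parse, and returns the definitions that define-hello-post is meant to generate as data.

```racket
#lang racket
;; Sketch of the run-time analogue: a recursive function over a list of
;; S-expressions (hypothetical helper post-defs), one clause per shape.
(define (post-defs terms)
  (match terms
    ['() '()]
    ;; first term is an identifier paired with a postfix string
    [(cons (list (? symbol? x) (? string? p)) others)
     (cons `(define ,x (string-append "world" ,p)) (post-defs others))]
    ;; first term is a plain identifier
    [(cons (? symbol? x) others)
     (cons `(define ,x "world") (post-defs others))]))

(post-defs '(g [f ", hello"] h))
;; returns '((define g "world")
;;           (define f (string-append "world" ", hello"))
;;           (define h "world"))
```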

Macros defined via syntax-parse can process their sub-terms in this way, too. We already know that we can write several syntax-parse clauses and that the first matching pattern determines which template a macro picks to generate the new code. Hence, to make this idea work, the macro needs two distinct patterns to match the two possible kinds of first sub-terms and a third one for when the sequence of sub-terms is exhausted:

> (define-syntax (define-hello-post stx)
    (syntax-parse stx
      [(_ ((~var x id) (~var p str)) others ...)
       #'(begin
           (define x (string-append "world" p))
           (define-hello-post others ...))]
      [(_ (~var x id) others ...)
       #'(begin
           (define x "world")
           (define-hello-post others ...))]
      [(_)
       #'(begin)]))

Stop! Inspect the first two patterns closely. What do they match? What kind of macro use does the third pattern match?

The first pattern of define-hello-post matches uses whose first sub-term consists of a pair of an identifier and a string, followed by a possibly empty sequence of other sub-terms:

(define-hello-post (x "done"))
(define-hello-post (x "done") y)
(define-hello-post (x "done") y (z "wow") w u v)

The second pattern matches uses whose first sub-term is a plain identifier, followed by arbitrary sub-terms.

Now that we understand the patterns, we can turn to the templates. When the leading term consists of a pair of an identifier and a string p, the macro must generate a definition for this variable that initializes it to (string-append "world" p). Similarly, when the leading term is just an identifier, the macro defines it to stand for "world". In both cases, the pattern also mentions others, the sequence of remaining sub-terms. To deal with these sub-terms, the macro generates another instance of itself: (define-hello-post others ...).

Stop! Why can a macro produce code that uses itself?

Now that you know this much, you also understand that a Racket programmer has the power to make the compiler diverge. Write a short macro that makes the Racket compiler run forever.

At this point, recall from the model of macro expansion that the result of a macro needs to be a syntax node; there are no other requirements. If this node happens to be a macro use, the expander retrieves the corresponding transformer function and defers to it again. If the syntax node refers to the same macro or contains a sub-term that does, we get recursion. And it actually works as expected:

> (define-hello-post g [f ", hello"] h)
> (list g f h)
'("world" "world, hello" "world")

Stop! Experiment with the other examples from above.

For the second solution, we start with a brief detour to drive home the nature of syntax-parse. Thus far, we have acted as if syntax-parse had to have a syntax template on the right-hand side of its clauses. But this isn’t the case; syntax-parse is like any other conditional, meaning we can place any expression there. For a macro, the result of this expression must be a syntax object. When syntax-parse is used somewhere else, say in a for/list loop, a syntax-parse conditional may return anything.
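As a small illustration of both points, the following sketch runs syntax-parse at phase 0 inside for/list, and its clauses return plain symbols instead of syntax. The variable name kinds and the symbols 'plain and 'with-post are made up for this example.

```racket
#lang racket
;; Sketch: syntax-parse as an ordinary conditional whose right-hand
;; sides return non-syntax values, used inside a for/list loop.
(require syntax/parse)

(define kinds
  (for/list ([t (syntax->list #'(g [f ", hello"] h))])
    (syntax-parse t
      [(~var x id) 'plain]
      [((~var x id) (~var p str)) 'with-post])))

kinds
;; returns '(plain with-post plain)
```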

Let’s apply these two ideas to an alternative implementation of the define-hello-post macro. To get this solution to work, you need to add (require (for-syntax racket/list)) to your file:

(define-syntax (define-hello-post-v1 stx)
  (syntax-parse stx
    [(_ x-or-x+post ...)
     (define xs+ps
       (for/list ((one (syntax-e #'(x-or-x+post ...))))
         (syntax-parse one
           [(~var x id) (list #'x #'"")]
           [((~var x id) (~var p str)) (list #'x #'p)])))

     #`(begin
         #,@(for/list ((y+q xs+ps))
              (define y (first y+q))
              (define q (second y+q))
              #`(define #,y (string-append "world" #,q))))]))

Instead of a multi-pronged syntax-parse, the second implementation re-uses the simple pattern of the first few macros in this chapter. The right-hand side, however, consists of two pieces: a local definition followed by a template formed with #` (quasisyntax), #,@ (unsyntax-splicing), and #, (unsyntax).

The local definition names a list of pairs, each combining an identifier from the macro’s sub-terms with a string to be appended to "world". It computes this list by iterating over the list of the macro’s sub-terms. Note how this for/list loop uses syntax-parse to analyze each term. If the term is just an identifier, the loop forms a list of the identifier and the syntax of the empty string; otherwise it forms a pair of the given identifier and string.

The syntax template generates the desired list of definitions from the computed list of pairs and splices it into a begin expression. This second loop effectively simulates an ellipsis in a template. It is needed because the list of pairs isn’t formed from an ellipsis pattern. The body of the loop takes apart the pair, naming the identifier y and its post string q. Each iteration of the loop produces a single definition with another quasisyntax template.
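The quasisyntax part can be tried on its own at phase 0. This sketch assumes a hand-written list of pairs, ids+posts, standing in for the computed xs+ps; the loop and the #`, #, and #,@ forms are the same as in the macro’s template.

```racket
#lang racket
;; Sketch: #` (quasisyntax), #, (unsyntax), and #,@ (unsyntax-splicing)
;; at phase 0; the for/list loop plays the role of a template ellipsis.
(define ids+posts ; hypothetical input, shaped like xs+ps in the macro
  (list (list #'a #'"") (list #'b #'", bye")))

(define result
  #`(begin
      #,@(for/list ([y+q ids+posts])
           #`(define #,(first y+q) (string-append "world" #,(second y+q))))))

(syntax->datum result)
;; returns '(begin (define a (string-append "world" ""))
;;                 (define b (string-append "world" ", bye")))
```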

Stop! This macro definition is by far the most complicated one you have seen so far. Make sure to experiment with it. Use it in all the ways that define-hello-post is intended to be used.

Our final solution simplifies the second one. We start by lifting the body of the first for/list into a separate compile-time function:

> (define-for-syntax (fill-in-option x-or-x+post)
    (syntax-parse x-or-x+post
      [(~var x id) (list #'x #'"")]
      [((~var x id) (~var p str)) (list #'x #'p)]))

This function consumes a syntax object and produces a pair consisting of an identifier (syntax object) and a string (syntax object). Using this function we could now rewrite the auxiliary definition in define-hello-post-v1 like this:

(define xs+ps
  (for/list ((one (syntax-e #'(x-or-x+post ...))))
    (fill-in-option one)))

or even more simply as

(define xs+ps (map fill-in-option (syntax-e #'(x-or-x+post ...))))
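To check what fill-in-option produces, here is a sketch that copies it to phase 0 with an ordinary define (an assumption made only for experimentation; the macro itself needs the define-for-syntax version) and maps it over a sample sequence of sub-terms.

```racket
#lang racket
;; Sketch: fill-in-option moved to phase 0 for experimentation.
(require syntax/parse)

;; maps a sub-term to a two-element list of syntax objects:
;; an identifier and a string to be appended to "world"
(define (fill-in-option x-or-x+post)
  (syntax-parse x-or-x+post
    [(~var x id) (list #'x #'"")]
    [((~var x id) (~var p str)) (list #'x #'p)]))

(define pairs (map fill-in-option (syntax->list #'(i [j ", hello"] k))))

(map (lambda (pr) (map syntax->datum pr)) pairs)
;; returns '((i "") (j ", hello") (k ""))
```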

The second step is about turning this list of pairs into pattern variables as if they had been part of an ellipsis pattern. To this end, we need to introduce another one of syntax-parse’s facilities, #:with. Roughly speaking, #:with matches a syntax pattern against the value of an arbitrary expression. If the match succeeds, it introduces new syntax-pattern variables that may be used in a syntax template; otherwise the syntax-parse clause acts as if its pattern had failed.
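The failure half of this description is easy to observe at phase 0. In the following sketch, the helper classify and its result symbols are hypothetical: the first clause’s main pattern always matches, but its #:with pattern demands an identifier, so for a number syntax-parse moves on to the second clause.

```racket
#lang racket
;; Sketch: a failing #:with makes syntax-parse try the next clause.
(require syntax/parse)

;; classify (hypothetical helper)
(define (classify stx)
  (syntax-parse stx
    [(_ e)
     ;; match the pattern (~var x id) against the syntax bound to e
     #:with (~var x id) #'e
     'identifier]
    [(_ e) 'something-else]))

(classify #'(m a)) ; returns 'identifier
(classify #'(m 1)) ; returns 'something-else
```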

Without further ado then, here is the revised macro definition:

> (define-syntax (define-hello-post-v2 stx)
    (syntax-parse stx
      [(_ x-or-x+post ...)
       #:with ((x p) ...) (map fill-in-option (syntax-e #'(x-or-x+post ...)))
       #'(begin (define x (string-append "world" p)) ...)]))

In this definition, #:with employs a pattern that consists of the sub-pattern (x p) followed by an ellipsis. The outer parentheses mean that the pattern expects a list of (x p) pairings. Now recall that (map fill-in-option _) produces a list of pairs, that is, each element is a list that consists of an identifier (syntax object) and a string (syntax object). Hence the syntax pattern matches, meaning x and p represent sequences of identifiers and strings, respectively.

The rest is straightforward. The template is an ellipsis sequence of defines within begin. An ellipsis in a template expects the sub-template to its immediate left to contain a syntax-pattern variable that stands for a sequence, so that the expander can instantiate the sub-template into a sequence of syntax objects. In this example, there are two such variables, x and p, and they represent sequences of the same length. The result then is the expected sequence of definitions.

And it all works just fine:

> (define-hello-post-v2 i [j ", hello"] k)
> (list i j k)
'("world" "world, hello" "world")

Stop! Re-read the explanation of how these solutions work and then solve the following exercise.

Exercise 4. The three solutions to our current sample problem generate different sequences of definitions. Show the sequences of definitions that
(define-hello-post one [two ", 2"]),
(define-hello-post-v1 one [two ", 2"]), and
(define-hello-post-v2 one [two ", 2"])
generate. If string-append were an expensive operation, the first solution might look more efficient to you now. Can you modify one of the other two solutions so that it generates the same sequence as the first one? End

One Last Thought

Suppose you were to enter the following into DrRacket’s definitions window:

(define-hello-post-v2 m m)

The IDE would then display this error message in its status line:

identifier already defined at: m in:
(define-values (m) (string-append "world" ""))

Note that the error message mentions define-values but not define-hello-post-v2. A similar error message would show up if you used either of the other two implementations.

You might think that you have seen enough variations on the define-hello theme by now, and you are ready for something, anything interesting. Let’s go there but keep this problem in mind because we will have to return to this idea one more time to get things right. In the meantime, let’s get started on “good stuff.”
