10 Module Languages
Goals |
— |
— |
— |
As a new running example for building languages in Racket, let’s look at a toy shell language. You can run external programs in Racket using functions like find-executable-path and system*, but it’s not nearly as convenient as running external programs in a language like bash. We can make a shell language that’s more streamlined like bash for running programs, but that has enough parentheses to make it beautiful. We’ll call it “pfsh” for “parenthesis-friendly shell.” Pronunciation: the “p” in “pfsh” is silent.
"demo.rkt"
#lang pfsh ; List all in the current directory (ls -l) ; Hello, world! (echo "Hello, world!") ; Hello to me (define me (whoami)) (echo -n Hello to me) ; Count how many things are in the current directory (define l (ls)) (wc -l < l) ; Run some other program in our path (racket)
This language looks something like Racket, but identifiers behave
differently. When an identifier appears after an open parenthesis, it
is normally treated as an external program name, instead of a
reference to a definition. When an identifier is in an argument
position, it turns into a string argument—
To get started on this language, we have to first learn about Racket modules and about how #lang turns into a module import.
10.1 Defining Modules
Let’s create a run function that combines find-executable-path and system* so that, for example,
(run "ls" "-l")
lists the content of the current directory in long format. We’ll put run in its own module, so we can use it in multiple programs. Here’s the module:
#lang racket (provide run) (define (run prog . args) (apply system* (find-program prog) args)) (define (find-program str) (or (find-executable-path str) (error 'pfsh "could not find program: ~a" str)))
This module defines run and uses the provide form to export it for use by other modules. The module also defines a find-program helper function, but that function is not exported for external use.
To use run, another module imports it with require. Assuming that "use-run.rkt" is in the same directory, we can reference the "run.rkt" module using a relative path:
Windows users: try (run "cmd.exe" "/c" "dir"), instead.
"use-run.rkt"
#lang racket (require "run.rkt") (run "ls" "-l")
If we want to be able to write (run ls -l), then we can’t implement run as a function, because the run form’s pieces are not expressions in the usual Racket sense. Of course, we can implement the revised run as a macro in terms of the run function that we have:
"pfsh-run.rkt"
#lang racket (require "run.rkt" (for-syntax syntax/parse)) (provide (rename-out [pfsh:run run])) (define-syntax (pfsh:run stx) (syntax-parse stx [(_ prog:id arg:id ...) #'(run (symbol->string 'prog) (symbol->string 'arg) ...)]))
"use-pfsh-run.rkt"
#lang racket (require "pfsh-run.rkt") (run ls -l)
Note how "pfsh-run.rkt" defines pfsh:run but renames it to run when exporting via provide. That way, the implementation of the macro can refer to the run function imported from "run.rkt". The macro system manages bindings properly to ensure that run in the expansion of pfsh:run will always refer to the run function, even if pfsh:run is used in a module like "use-pfsh-run.rkt" where run does not refer to the function.
10.2 From Modules to Languages
The #lang that starts a Racket-program file determines what the rest of the file means. Specifically, the identifier immediately after #lang selects an meaning for the rest of the file, and it gets control at the character level. The only constraint on a #lang’s meaning is that it denotes a Racket module that can be referenced using the file’s path.
The implementation of a language doesn’t have to go all the way from characters to the machine-code representation of a module, however. Instead, it compiles the module text to a syntax object that represents a primitive module form.
It turns out that you can write the primitive module form directly in DrRacket. If you leave out any #lang line and write
(module example racket (#%module-begin (+ 1 2)))
then it’s the same as
#lang racket (+ 1 2)
and if you write the latter form, then it essentially turns into the former form. Both forms have the same (+ 1 2) because #lang racket uses the native syntax for the module body.
Technically, there’s a difference in intent in the above two chunks of text showing programs. In the second case witth #lang, the parentheses are meant as actual parenthesis characters that reside in a file. In the first case with module, the parentheses are just a way to write a text representation of the actual value, which is a syntax object that contains a lists of syntax objects that contain symbols, and so on. A language implementation has to actually parse the parentheses in the second block of code to produce the first.
Other languages show a bigger difference between the #lang and module forms. For example,
#lang scribble/base Hello, world!
turns into
(module example scribble/base/lang (#%module-begin (doc-begin doc values () "\n" "Hello, world!" "\n")))
You can see that the Hello, world! text and even the newlines have been turned into syntax-object strings here, but not much else has happened. In general, that’s a good strategy for a #lang: perform just enough parsing to get into syntax objects, and then use macros to finish the language’s compilation.
We’ll define pfsh so that the original pfsh example program corresponds to
(module example pfsh (#%module-begin (ls -l) (echo "Hello, world!") (define me (whoami)) (echo -n Hello to me) (define l (ls)) (wc -l < l) (racket)))
(module example "pfsh.rkt" (#%module-begin (define me (whoami)) (echo -n Hello to me)))
Without creating a "pfsh.rkt" file, copy the #lang s-exp "pfsh.rkt" example into DrRacket and click the Macro Stepper button. The stepper will immediately error, since there’s no "pfsh.rkt" module, but it will show you the parsed form.
10.3 The Core module Form
Module | = |
| |||||
| |
|
For a module that comes from a file, the name turns out to be ignored, because the file path acts as the actual module name. The key part is initial-import-module. The module named by initial-import-module gives meaning to some set of identifiers that can be used in the module body. There are absolutely no pre-defined identifiers for the body of a module. Even things like lambda or #%module-begin must be exported by initial-import-module if they are going to be used in the module body’s forms.
If require is provided by initial-import-module, then it can be used to pull in additional names for use by forms. If there’s no way to get at require, define, or other binding forms from the exports of initial-import-module, then nothing but the exports of initial-import-module will ever be available to the forms.
Since every module for has an explicit or implicit #%module-begin, initial-import-module had better provide #%module-begin. If a language should allow the same sort of definition-or-expression sequence as racket, then it can just re-export #%module-begin from racket. As we will see, there are some other implicit forms, all of which start with #%, and initial-import-module must provide those forms if they’re going to be triggered.
"simple.rkt"
#lang racket (provide #%module-begin)
10.4 A First Implementation of pfsh
It’s going to take a few steps to get to the pfsh language in all of its glory. As a first step, let’s create a variant "pfsh0.rkt" that has a run form to run an external program:
#lang s-exp "pfsh0.rkt" (run ls -l)
Since that’s equivalent to
(module example "pfsh0.rkt" (#%module-begin (run ls -l)))
then we need to create a "pfsh0.rkt" module that provides #%module-begin and run. The run macro’s job is to treat its identifiers as strings and deliver them to the run function that we defined in "run.rkt":
"pfsh0.rkt"
#lang racket (require "run.rkt" (for-syntax syntax/parse)) (provide #%module-begin (rename-out [pfsh:run run])) (define-syntax (pfsh:run stx) (syntax-parse stx [(_ prog:id arg:id ...) #'(void (run (symbol->string 'prog) (symbol->string 'arg) ...))]))
We’ve wrapped void around the call to run to suppress the success or failure boolean that would otherwise print after the run program’s output.
10.5 Compile Time and Run Time
Notice that pfsh so far has two parts:
the compile-time part that is about dealing with the syntax of the language, here implemented by the pfsh:run macro in "pfsh0.rkt"; and
the run-time part that is called by generated code, here implemented by the run function in "run.rkt".
Although we happen to have implemented the two parts in different modules, they don’t have to be different. We could just as well have put the run function’s implementation directly in "pfsh0.rkt":
"pfsh0-alt.rkt"
#lang racket (require (for-syntax syntax/parse)) (provide #%module-begin (rename-out [pfsh:run run])) (define-syntax (pfsh:run stx) (syntax-parse stx [(_ prog:id arg:id ...) #'(void (run (symbol->string 'prog) (symbol->string 'arg) ...))])) (define (run prog . args) (apply system* (find-program prog) args)) (define (find-program str) (or (find-executable-path str) (error 'pfsh "could not find program: ~a" str)))
At this point, it’s worth double-checking that we have appropriately sorted computation in the compile and run phases. Generally, it’s better to perform a computation at compile time instead of run time, if possible. In this case, the pfsh:run macro generates symbol->string expressions to convert symbols to strings at run time,It’s a good idea to let the compiler optimize away computations when it can. Unfortunately, symbol->string is defined to generate a fresh mutable string every time it’s called, and the compiler cannot tell that the freshness is unnecessary here, so it won’t optimize the symbol->string calls to literal strings. but that conversion could be performed at compile time, instead. Let’s improve pfsh:run to perform that work at compile time.
The most obvious way to move the computation is to immediately escape back to compile time in the result template for pfsh:run:
"pfsh1a.rkt"
#lang racket (require "run.rkt" (for-syntax syntax/parse)) (provide #%module-begin (rename-out [pfsh:run run])) (define-syntax (pfsh:run stx) (syntax-parse stx [(_ prog:id arg:id ...) #`(void (run #,(symbol->string (syntax-e #'prog)) #,@(map symbol->string (map syntax-e (syntax->list #'(arg ...))))))]))
Alternatively, we can stay within the template language in pfsh:run and defer the compile-time escape to a helper macro:
#lang racket (require "run.rkt" (for-syntax syntax/parse)) (provide #%module-begin (rename-out [pfsh:run run])) (define-syntax (pfsh:run stx) (syntax-parse stx [(_ prog:id arg:id ...) #`(void (run (as-string prog) (as-string arg) ...))])) (define-syntax (as-string stx) (syntax-parse stx [(_ sym:id) #`#,(symbol->string (syntax-e #'sym))]))
10.6 More Language Variations
Let’s continue implementing pfsh by solving the following problem.
Sample Problem Add input redirection to run by allowing < followed by a defined identifier at the end of a run form. You’ll probably find syntax/parse’s ~datum or #:datum-literals handy, as well as with-input-from-string from racket/port. Call the extended language "pfsh3.rkt".
"use-pfsh3.rkt"
#lang s-exp "pfsh3.rkt" (define l (run ls)) (run wc -l < l)
Here is a solution.
"pfsh3.rkt"
#lang racket (require "run.rkt" racket/port (for-syntax syntax/parse)) (provide #%module-begin (rename-out [pfsh:run run] [pfsh:define define])) (define-syntax (pfsh:run stx) (syntax-parse stx #:datum-literals (<) [(_ prog:id arg:id ... < stream:id) #'(with-input-from-string stream (lambda () (pfsh:run prog arg ...)))] [(_ prog:id arg:id ...) #`(void (run (as-string prog) (as-string arg) ...))])) (define-syntax (as-string stx) (syntax-parse stx [(_ sym:id) #`#,(symbol->string (syntax-e #'sym))])) (define-syntax (pfsh:define stx) (syntax-parse stx [(_ stream:id expr) #'(define stream (with-output-to-string (lambda () expr)))]))
10.7 The Application Form
So far, the biggest difference between the pfsh that we’ve implemented and the pfsh that we want is that we have to put run before every program name. Instead of (run ls), we want to write (ls).
Since macros can do any kind of work at compile time, you might imagine changing pfsh so that it scans the filesystem and builds up a set of definitions based on the programs that are currently available via the PATH environment variable. That’s not how scripting languages are meant to work, though. Also, it’s likely to cause trouble to use the filesystem and environment-variable state at such a fine granularity to determine bindings of a module.
Instead, we would like to change the default meaning of parentheses. In Racket, a pair of parentheses mean a function call by default. In pfsh, a pair of parentheses should mean running an external program by default. The “by default” part concedes that an identifier after an open parenthesis can change the meaning of the parenthesis, such as when define appears after an open parenthesis. Otherwise, though, it’s as if a function-call identifier appears after the open parenthesis to specify a function-call form... and function-call exists, except that it’s spelled #%app.
(+ 1 2)
"pfsh7.rkt"
#lang racket .... (provide .... (rename-out [pfsh:run #%app] ....)) ....
10.8 More Implicit Forms
You can have seen two implicit forms that a language can adjust, #%module-begin and #%app, so you may wonder how many implicit forms there are. The others are #%datum, #%top, and #%top-interaction.
10.8.1 #%datum
The #%datum form is implicitly wrapped around a literal constant such as 0, #true, or "apple" when it appears in a place where an expression is expected. Since the #%datum form always has a single subform, it takes advantage of a performance hack internally by being written with parentheses and a ., which corresponds to a non-list pair instead of a list; so, 0 is implicitly (#%datum . 0), and so on.
Let’s not allow numbers in pfsh, but let’s allow literal strings, which can be useful for piping to a program’s input. Since a literal string is useful as a program’s input, let’s also change #%app to allow any expression after a < redirection.
"pfsh8.rkt"
#lang racket .... (provide #%module-begin (rename-out [pfsh:run #%app] [pfsh:define define] [pfsh:datum #%datum])) (define-syntax (pfsh:run stx) (syntax-parse stx #:datum-literals (<) [(_ prog:id arg:id ... < stream:expr) #'(with-input-from-string stream (lambda () (pfsh:run prog arg ...)))] [(_ prog:id arg:id ...) #`(void (run (as-string prog) (as-string arg) ...))])) .... (define-syntax (pfsh:datum stx) (syntax-parse stx [(_ . (~var s str)) #'(#%datum . s)] [(_ . other) (raise-syntax-error 'pfsh "only literal strings are allowed" #'other)]))
10.8.2 #%top
(define-syntax (complain-top stx) (syntax-parse stx [(_ . x:id) (raise-syntax-error 'variable "misplaced" #'x)]))
"pfsh9.rkt"
#lang racket (require "run.rkt" racket/port (for-syntax syntax/parse)) (provide #%module-begin (rename-out [pfsh:run #%app] [pfsh:top #%top] [pfsh:define define] [pfsh:datum #%datum])) (define-syntax (pfsh:run stx) (syntax-parse stx #:datum-literals (<) [(_ prog arg ... < stream:expr) #'(with-input-from-string stream (lambda () (pfsh:run prog arg ...)))] [(_ prog arg ...) #`(void (run prog arg ...))])) (define-syntax (pfsh:top stx) (syntax-parse stx [(_ #,dot sym:id) #`#,(symbol->string (syntax-e #'sym))])) (define-syntax (pfsh:define stx) (syntax-parse stx [(_ stream:id expr) #'(define stream (with-output-to-string (lambda () expr)))])) (define-syntax (pfsh:datum stx) (syntax-parse stx [(_ #,dot (~var s str)) #'(#%datum . s)] [(_ #,dot other) (raise-syntax-error 'pfsh "only literal strings are allowed" #'other)]))
10.8.3 #%top-interaction
Finally, you may have noticed that when you run any of the working programs with "pfsh9.rkt" and earlier variants, DrRacket usually reports “Interactions disabled: language does not support a REPL (no #%top-interaction).”
"pfsh9.rkt"
.... (provide .... (rename-out .... [pfsh:top-interaction #%top-interaction])) (define-syntax (pfsh:top-interaction stx) (syntax-parse stx [(_ . form) #'form])) ....
10.9 Defining Functions
Our pfsh implementation can now run the original example script, but let’s go a little further. An advantage of a parenthesis-friendly shell is that we can mix in more of Racket to better support abstraction in a script. At a minimum, we’d like to be able to define and call functions in pfsh scripts:
"use-pfsh11.rkt"
#lang s-exp "pfsh11.rkt" (define (double x) (string-append x x)) (define l (ls -l)) (wc -l < l) (wc -l < (double l))
It’s easy to make the string-append function available.
It’s also easy to change define to match and distinguish
function and stream shapes—
Here are two ways to make the adaptation work:
We can change #%app so that it inspects an identifier in the “function” position to check whether the identifier is bound. If so, the #%app corresponds to a function call.
In this case, a define for a function can expand to a regular define.
We can make define bind a function name as a macro, so that using the name after an open parenthesis triggers a function-specific macro instead of the generic #%app form.
In this case, a define for a function needs to expand to define-syntax to bind the function name as a macro.
Slightly different behaviors fall out from each of these strategies. With the first strategy, an identifier that is bound to a string for a program name cannot be used to run the program, because using the identifier after an open parenthesis would trigger a function call. With the second strategy, a name bound to a string still works as a program name, but a function identifier doesn’t work as an argument to another function (unless we do a little more work to make that possible). Both approaches are viable, and either could be made to fit a preferred behavior, so let’s try both of them.
10.9.1 Detecting Bindings
To try the first strategy, we need #%app to recognize whether an identifier has a binding or not. Since the #%app macro receives only an immediate application form, how can it know what definitions are in the rest of the module? That is, although the #%app macro can do any work its wants at compile time, it doesn’t have a handle on the whole module to inspect it. The macro expander itself must know about bindings, because it uses binding information to determine which macro should handle an expansion. Happily for our #%app, the macro expander shares its binding information with macros in several ways, including through a identifier-binding function.
The identifier-binding function takes an identifier and reports #f if the identifier has no binding. Otherwise, it reports some information about the binding, such as which module (possibly the current one) contains a definition of the identifier. For our purposes, we do not care about the additional details, so we can just check whether identifier-binding returns #f.
(define-syntax (pfsh:run stx) (syntax-parse stx #:datum-literals (<) [(_ prog arg ... < stream:expr) #'(with-input-from-string stream (lambda () (pfsh:run prog arg ...)))] [(_ prog:id arg ...) #:when (identifier-binding #'prog) #'(prog arg ...)] [(_ prog arg ...) #`(void (run prog arg ...))]))
(define-syntax (pfsh:define stx) (syntax-parse stx [(_ stream:id expr) #'(define stream (with-output-to-string (lambda () expr)))] [(_ (proc:id arg:id ...) expr) #'(define (proc arg ...) expr)]))
(provide .... string-append)
10.9.2 Macro-Defining Macros
With the strategy where define binds a function name as a macro, we don’t have to change #%app. We just have to change pfsh:define to compile a pfsh function definition into a racket macro definition.
(define-syntax (pfsh:define stx) (syntax-parse stx [(_ stream:id expr) #'(define stream (with-output-to-string (lambda () expr)))] [(_ (proc:id arg:id ...) expr) #'(define-syntax (proc stx) (syntax-parse stx [(_ arg ...) #'expr]))]))
(define-syntax (pfsh:define stx) (syntax-parse stx [(_ stream:id expr) #'(define stream (with-output-to-string (lambda () expr)))] [(_ (proc:id arg:id ...) expr) #'(begin (define (actual-proc arg ...) expr) (define-syntax (proc stx) (syntax-parse stx [(_ arg ...) #'(actual-proc arg ...)])))]))
(define (double x) (string-append x x))
(define (actual-proc x) (string-append x x)) (define-syntax (double stx) (syntax-parse stx [(_ arg ...) #'(actual-proc arg ...)]))
(provide .... (rename-out .... [pfsh:string-append string-append])) (pfsh:define (pfsh:string-append arg1 arg2) (string-append arg1 arg2))
10.10 Installing a Language
Let’s take the last step in defining a language, which will let use switch from #lang s-exp "pfsh11.rkt" to #lang pfsh. To enable writing #lang pfsh, we must do two things:
Adjust our language implementation so that it explicitly specifies S-expression parsing, instead of having S-expression parsing imposed externally.
Install our language as a package so that #lang pfsh will work from anywhere.
The part of a language that specifies its parsing from characters to syntax objects is called a reader. A language’s reader is implemented by a reader submodule (i.e., a nested module) inside the language’s module. That submodule must export a read-syntax function that takes an input port, reads characters from it, and constructs a module form as a syntax object. For historical reasons, the submodule should also provide a read function that does the same thing but returns a plain S-expression instead of a syntax object.
(module reader racket (provide (rename-out [pfsh:read-syntax read-syntax] [pfsh:read read])) (define (pfsh:read-syntax name in) (datum->syntax #f `(module anything pfsh (#%module-begin ,@(read-body name in))))) (define (read-body name in) (define e (read-syntax name in)) (if (eof-object? e) '() (cons e (read-body name in)))) (define (pfsh:read in) (syntax->datum (pfsh:read-syntax 'src in))))
(module reader syntax/module-reader pfsh)
In short, we just need to add those two lines to our current pfsh implementation, and then save it as "main.rkt" in a "pfsh" directory. Here’s the complete implementation:
#lang racket (require "run.rkt" racket/port (for-syntax syntax/parse)) (provide #%module-begin (rename-out [pfsh:run #%app] [pfsh:top #%top] [pfsh:define define] [pfsh:datum #%datum] [pfsh:top-interaction #%top-interaction] [pfsh:string-append string-append])) (module reader syntax/module-reader pfsh) (define-syntax (pfsh:run stx) (syntax-parse stx #:datum-literals (<) [(_ prog arg ... < stream:expr) #'(with-input-from-string stream (lambda () (pfsh:run prog arg ...)))] [(_ prog arg ...) #`(void (run prog arg ...))])) (define-syntax (pfsh:top stx) (syntax-parse stx [(_ #,dot sym:id) #`#,(symbol->string (syntax-e #'sym))])) (define-syntax (pfsh:define stx) (syntax-parse stx [(_ stream:id expr) #'(define stream (with-output-to-string (lambda () expr)))] [(_ (proc:id arg:id ...) expr) #'(begin (define (actual-proc arg ...) expr) (define-syntax (proc stx) (syntax-parse stx [(_ arg ...) #'(actual-proc arg ...)])))])) (pfsh:define (pfsh:string-append arg1 arg2) (string-append arg1 arg2)) (define-syntax (pfsh:datum stx) (syntax-parse stx [(_ #,dot (~var s str)) #'(#%datum . s)] [(_ #,dot other) (raise-syntax-error 'pfsh "only literal strings are allowed" #'other)])) (define-syntax (pfsh:top-interaction stx) (syntax-parse stx [(_ #,dot form) #'form]))
You’ll also need "run.rkt" in the same "pfsh" directory.
After either of those steps, you can run
#lang pfsh (echo Hello!)