10 Module Languages

7.8.0.8

10 Module Languages

Goals

— a review of plain Racket modules

— exporting and important macros

— interposition points

As a new running example for building languages in Racket, let’s look at a toy shell language. You can run external programs in Racket using functions like find-executable-path and system*, but it’s not nearly as convenient as running external programs in a language like bash. We can make a shell language that’s more streamlined like bash for running programs, but that has enough parentheses to make it beautiful. We’ll call it “pfsh” for “parenthesis-friendly shell.” Pronunciation: the “p” in “pfsh” is silent.

"demo.rkt"
#lang pfsh

; List all in the current directory
(ls -l)

; Hello, world!
(echo "Hello, world!")

; Hello to me
(define me (whoami))
(echo -n Hello to me)

; Count how many things are in the current directory
(define l (ls))
(wc -l < l)

; Run some other program in our path
(racket)

This language looks something like Racket, but identifiers behave differently. When an identifier appears after an open parenthesis, it is normally treated as an external program name, instead of a reference to a definition. When an identifier is in an argument position, it turns into a string argument—but a reference to a defined name gets the string from the output of the command used to define the name. The language is sufficiently like Racket that we’ll be able to mix in some Racket functionality, but it doesn’t start out with any of Racket’s usual constructs; when we write our own language, we get full control.

To get started on this language, we have to first learn about Racket modules and about how #lang turns into a module import.

10.1 Defining Modules

Let’s create a run function that combines find-executable-path and system* so that, for example,

(run "ls" "-l")

lists the content of the current directory in long format. We’ll put run in its own module, so we can use it in multiple programs. Here’s the module:

"run.rkt"
#lang racket

(provide run)

(define (run prog . args)
  (apply system* (find-program prog) args))

(define (find-program str)
  (or (find-executable-path str)
      (error 'pfsh "could not find program: ~a" str)))

This module defines run and uses the provide form to export it for use by other modules. The module also defines a find-program helper function, but that function is not exported for external use.

To use run, another module imports it with require. Assuming that "use-run.rkt" is in the same directory, we can reference the "run.rkt" module using a relative path:

Windows users: try (run "cmd.exe" "/c" "dir"), instead.

"use-run.rkt"
#lang racket
(require "run.rkt")

(run "ls" "-l")

If we want to be able to write (run ls -l), then we can’t implement run as a function, because the run form’s pieces are not expressions in the usual Racket sense. Of course, we can implement the revised run as a macro in terms of the run function that we have:

"pfsh-run.rkt"
#lang racket
(require "run.rkt"
         (for-syntax syntax/parse))

(provide (rename-out [pfsh:run run]))

(define-syntax (pfsh:run stx)
  (syntax-parse stx
    [(_ prog:id arg:id ...)
     #'(run (symbol->string 'prog) (symbol->string 'arg) ...)]))

"use-pfsh-run.rkt"
#lang racket
(require "pfsh-run.rkt")

(run ls -l)

Note how "pfsh-run.rkt" defines pfsh:run but renames it to run when exporting via provide. That way, the implementation of the macro can refer to the run function imported from "run.rkt". The macro system manages bindings properly to ensure that run in the expansion of pfsh:run will always refer to the run function, even if pfsh:run is used in a module like "use-pfsh-run.rkt" where run does not refer to the function.

10.2 From Modules to Languages

The #lang that starts a Racket-program file determines what the rest of the file means. Specifically, the identifier immediately after #lang selects an meaning for the rest of the file, and it gets control at the character level. The only constraint on a #lang’s meaning is that it denotes a Racket module that can be referenced using the file’s path.

The implementation of a language doesn’t have to go all the way from characters to the machine-code representation of a module, however. Instead, it compiles the module text to a syntax object that represents a primitive module form.

It turns out that you can write the primitive module form directly in DrRacket. If you leave out any #lang line and write

(module example racket
(#%module-begin
(+ 1 2)))

then it’s the same as

#lang racket
(+ 1 2)

and if you write the latter form, then it essentially turns into the former form. Both forms have the same (+ 1 2) because #lang racket uses the native syntax for the module body.

Technically, there’s a difference in intent in the above two chunks of text showing programs. In the second case witth #lang, the parentheses are meant as actual parenthesis characters that reside in a file. In the first case with module, the parentheses are just a way to write a text representation of the actual value, which is a syntax object that contains a lists of syntax objects that contain symbols, and so on. A language implementation has to actually parse the parentheses in the second block of code to produce the first.

Other languages show a bigger difference between the #lang and module forms. For example,

#lang scribble/base
Hello, world!

turns into

(module example scribble/base/lang
(#%module-begin
(doc-begin doc values () "\n" "Hello, world!" "\n")))

You can see that the Hello, world! text and even the newlines have been turned into syntax-object strings here, but not much else has happened. In general, that’s a good strategy for a #lang: perform just enough parsing to get into syntax objects, and then use macros to finish the language’s compilation.

We’ll define pfsh so that the original pfsh example program corresponds to

(module example pfsh
  (#%module-begin
   (ls -l)
   (echo "Hello, world!")
   (define me (whoami))
   (echo -n Hello to me)
   (define l (ls))
   (wc -l < l)
   (racket)))

For now, we don’t want to bother parsing at the level of parentheses, so we’ll actually write

"example.rkt"
#lang s-exp "pfsh.rkt"
(define me (whoami))
(echo -n Hello to me)

The s-exp language doesn’t do anything but parse parentheses into syntax objects. For this example, it directly generates the syntax object

(module example "pfsh.rkt"
  (#%module-begin
   (define me (whoami))
   (echo -n Hello to me)))

Without creating a "pfsh.rkt" file, copy the #lang s-exp "pfsh.rkt" example into DrRacket and click the Macro Stepper button. The stepper will immediately error, since there’s no "pfsh.rkt" module, but it will show you the parsed form.

which is half-way to where we want to be: the define and echo syntax objects are still here to be expanded by macros, but we no longer have to worry about parsing characters. (The change from pfsh to "pfsh.rkt" just lets us work with relative paths, for now, instead of installing a pfsh collection.)

10.3 The Core module Form

The core module grammar is

Module

(module name initial-import-module

(#%module-begin

form ...))

(module _name _initial-import-module

form ...)

The second variant is a shorthand for the first, and it is automatically converted to the first variant by adding #%module-begin.

For a module that comes from a file, the name turns out to be ignored, because the file path acts as the actual module name. The key part is initial-import-module. The module named by initial-import-module gives meaning to some set of identifiers that can be used in the module body. There are absolutely no pre-defined identifiers for the body of a module. Even things like lambda or #%module-begin must be exported by initial-import-module if they are going to be used in the module body’s forms.

If require is provided by initial-import-module, then it can be used to pull in additional names for use by forms. If there’s no way to get at require, define, or other binding forms from the exports of initial-import-module, then nothing but the exports of initial-import-module will ever be available to the forms.

Since every module for has an explicit or implicit #%module-begin, initial-import-module had better provide #%module-begin. If a language should allow the same sort of definition-or-expression sequence as racket, then it can just re-export #%module-begin from racket. As we will see, there are some other implicit forms, all of which start with #%, and initial-import-module must provide those forms if they’re going to be triggered.

Here is the simplest possible Racket language module:

"simple.rkt"
#lang racket
(provide #%module-begin)

Since "simple.rkt" provides #%module-begin, it’s a valid initial import. You can use it in the empty program

"use-simple.rkt"
#lang s-exp "simple.rkt"

as long as "use-simple.rkt" is saved in the same directory as "simple.rkt" (so that the relative path works). You can add comments after the #lang line, since comments are stripped away by the parser. Nothing else in the body is going to work, though. Actually, (#%module-begin) will work, since #%module-begin is bound and since s-exp relies on the implicit introduction of #%module-begin instead of adding it explicitly. That’s a flaw in s-exp.

10.4 A First Implementation of pfsh

It’s going to take a few steps to get to the pfsh language in all of its glory. As a first step, let’s create a variant "pfsh0.rkt" that has a run form to run an external program:

#lang s-exp "pfsh0.rkt"
(run ls -l)

Since that’s equivalent to

(module example "pfsh0.rkt"
(#%module-begin
(run ls -l)))

then we need to create a "pfsh0.rkt" module that provides #%module-begin and run. The run macro’s job is to treat its identifiers as strings and deliver them to the run function that we defined in "run.rkt":

"pfsh0.rkt"
#lang racket
(require "run.rkt"
         (for-syntax syntax/parse))

(provide #%module-begin
         (rename-out [pfsh:run run]))

(define-syntax (pfsh:run stx)
  (syntax-parse stx
    [(_ prog:id arg:id ...)
     #'(void (run (symbol->string 'prog) (symbol->string 'arg) ...))]))

We’ve wrapped void around the call to run to suppress the success or failure boolean that would otherwise print after the run program’s output.

10.5 Compile Time and Run Time

Notice that pfsh so far has two parts:

the compile-time part that is about dealing with the syntax of the language, here implemented by the pfsh:run macro in "pfsh0.rkt"; and
the run-time part that is called by generated code, here implemented by the run function in "run.rkt".

Although we happen to have implemented the two parts in different modules, they don’t have to be different. We could just as well have put the run function’s implementation directly in "pfsh0.rkt":

"pfsh0-alt.rkt"
#lang racket
(require (for-syntax syntax/parse))

(provide #%module-begin
         (rename-out [pfsh:run run]))

(define-syntax (pfsh:run stx)
  (syntax-parse stx
    [(_ prog:id arg:id ...)
     #'(void (run (symbol->string 'prog) (symbol->string 'arg) ...))]))

(define (run prog . args)
  (apply system* (find-program prog) args))

(define (find-program str)
  (or (find-executable-path str)
      (error 'pfsh "could not find program: ~a" str)))

At this point, it’s worth double-checking that we have appropriately sorted computation in the compile and run phases. Generally, it’s better to perform a computation at compile time instead of run time, if possible. In this case, the pfsh:run macro generates symbol->string expressions to convert symbols to strings at run time,It’s a good idea to let the compiler optimize away computations when it can. Unfortunately, symbol->string is defined to generate a fresh mutable string every time it’s called, and the compiler cannot tell that the freshness is unnecessary here, so it won’t optimize the symbol->string calls to literal strings. but that conversion could be performed at compile time, instead. Let’s improve pfsh:run to perform that work at compile time.

The most obvious way to move the computation is to immediately escape back to compile time in the result template for pfsh:run:

"pfsh1a.rkt"
#lang racket
(require "run.rkt"
         (for-syntax syntax/parse))

(provide #%module-begin
         (rename-out [pfsh:run run]))

(define-syntax (pfsh:run stx)
  (syntax-parse stx
    [(_ prog:id arg:id ...)
     #`(void (run #,(symbol->string (syntax-e #'prog))
                  #,@(map symbol->string
                          (map syntax-e
                               (syntax->list #'(arg ...))))))]))

Alternatively, we can stay within the template language in pfsh:run and defer the compile-time escape to a helper macro:

"pfsh1.rkt"
#lang racket
(require "run.rkt"
         (for-syntax syntax/parse))

(provide #%module-begin
         (rename-out [pfsh:run run]))

(define-syntax (pfsh:run stx)
  (syntax-parse stx
    [(_ prog:id arg:id ...)
     #`(void (run (as-string prog) (as-string arg) ...))]))

(define-syntax (as-string stx)
  (syntax-parse stx
    [(_ sym:id)
     #`#,(symbol->string (syntax-e #'sym))]))

10.6 More Language Variations

Let’s continue implementing pfsh by solving the following problem.

Sample Problem Add input redirection to run by allowing < followed by a defined identifier at the end of a run form. You’ll probably find syntax/parse’s ~datum or #:datum-literals handy, as well as with-input-from-string from racket/port. Call the extended language "pfsh3.rkt".
"use-pfsh3.rkt"
#lang s-exp "pfsh3.rkt"
(define l (run ls))
(run wc -l < l)

Here is a solution.

"pfsh3.rkt"
#lang racket
(require "run.rkt"
         racket/port
         (for-syntax syntax/parse))

(provide #%module-begin
         (rename-out [pfsh:run run]
                     [pfsh:define define]))

(define-syntax (pfsh:run stx)
  (syntax-parse stx
    #:datum-literals (<)
    [(_ prog:id arg:id ... < stream:id)
     #'(with-input-from-string
         stream
         (lambda ()
           (pfsh:run prog arg ...)))]
    [(_ prog:id arg:id ...)
     #`(void (run (as-string prog) (as-string arg) ...))]))

(define-syntax (as-string stx)
  (syntax-parse stx
    [(_ sym:id)
     #`#,(symbol->string (syntax-e #'sym))]))

(define-syntax (pfsh:define stx)
  (syntax-parse stx
    [(_ stream:id expr)
     #'(define stream (with-output-to-string
                        (lambda ()
                          expr)))]))

10.7 The Application Form

So far, the biggest difference between the pfsh that we’ve implemented and the pfsh that we want is that we have to put run before every program name. Instead of (run ls), we want to write (ls).

Since macros can do any kind of work at compile time, you might imagine changing pfsh so that it scans the filesystem and builds up a set of definitions based on the programs that are currently available via the PATH environment variable. That’s not how scripting languages are meant to work, though. Also, it’s likely to cause trouble to use the filesystem and environment-variable state at such a fine granularity to determine bindings of a module.

Instead, we would like to change the default meaning of parentheses. In Racket, a pair of parentheses mean a function call by default. In pfsh, a pair of parentheses should mean running an external program by default. The “by default” part concedes that an identifier after an open parenthesis can change the meaning of the parenthesis, such as when define appears after an open parenthesis. Otherwise, though, it’s as if a function-call identifier appears after the open parenthesis to specify a function-call form... and function-call exists, except that it’s spelled #%app.

In other words, in the racket language, when you write

(+ 1 2)

since + is not bound as a macro or core syntactic form, that expands to

(#%app + 1 2)

The #%app provided by racket is defined as a macro that expands to the core syntactic form for function calls. That core form is also called #%app internally, but in the rare case that we have to refer to the core form, we use the alias #%plain-app.

To change the default meaning of parentheses for pfsh, then, we can rename pfsh:run to #%app on export:

"pfsh7.rkt"
#lang racket
....
(provide ....
(rename-out [pfsh:run #%app]
....))
....

After that small adjustment, we conceptually change each run in a pfsh module to #%app, but we don’t actually have to write the #%app, since it’s added automatically by the expander:

"pfsh7.rkt"
#lang s-exp "pfsh7.rkt"
(define l (ls))
(wc -l < l)

10.8 More Implicit Forms

You can have seen two implicit forms that a language can adjust, #%module-begin and #%app, so you may wonder how many implicit forms there are. The others are #%datum, #%top, and #%top-interaction.

10.8.1 #%datum

Try including a number in a "pfsh7.rkt" program like this:

#lang s-exp "pfsh7.rkt"
0

The complaint is “literal data is not allowed; no #%datum syntax transformer is bound.”

The #%datum form is implicitly wrapped around a literal constant such as 0, #true, or "apple" when it appears in a place where an expression is expected. Since the #%datum form always has a single subform, it takes advantage of a performance hack internally by being written with parentheses and a ., which corresponds to a non-list pair instead of a list; so, 0 is implicitly (#%datum . 0), and so on.

Let’s not allow numbers in pfsh, but let’s allow literal strings, which can be useful for piping to a program’s input. Since a literal string is useful as a program’s input, let’s also change #%app to allow any expression after a < redirection.

"pfsh8.rkt"
#lang racket
....

(provide #%module-begin
         (rename-out [pfsh:run #%app]
                     [pfsh:define define]
                     [pfsh:datum #%datum]))

(define-syntax (pfsh:run stx)
  (syntax-parse stx
    #:datum-literals (<)
    [(_ prog:id arg:id ... < stream:expr)
     #'(with-input-from-string
         stream
         (lambda ()
           (pfsh:run prog arg ...)))]
    [(_ prog:id arg:id ...)
     #`(void (run (as-string prog) (as-string arg) ...))]))

....

(define-syntax (pfsh:datum stx)
  (syntax-parse stx
    [(_ . (~var s str)) #'(#%datum . s)]
    [(_ . other)
     (raise-syntax-error 'pfsh
                         "only literal strings are allowed"
                         #'other)]))

Now, this program works:

"use-pfsh8.rkt"
#lang s-exp "pfsh8.rkt"
(wc -w < "a b c")

and a program that has a literal number reports a better error message.

10.8.2 #%top

If you use an identifier that isn’t provided by "pfsh8.rkt" and isn’t between parentheses,

#lang s-exp "pfsh8.rkt"
oops

then you’ll get a message that mentions #%top. The #%top form is wrapped around an identifier that has no binding.

We could improve the error for users so that it doesn’t mention the implicit name #%top:

(define-syntax (complain-top stx)
  (syntax-parse stx
    [(_ . x:id)
     (raise-syntax-error 'variable "misplaced" #'x)]))

Or we could go a different direction, which is to treat an unbound identifier anywhere the same as a string. Then, #%app doesn’t need to insert any symbol conversions. That’s the direction we take in "pfsh9.rkt":

"pfsh9.rkt"
#lang racket
(require "run.rkt"
         racket/port
         (for-syntax syntax/parse))

(provide #%module-begin
         (rename-out [pfsh:run #%app]
                     [pfsh:top #%top]
                     [pfsh:define define]
                     [pfsh:datum #%datum]))

(define-syntax (pfsh:run stx)
  (syntax-parse stx
    #:datum-literals (<)
    [(_ prog arg ... < stream:expr)
     #'(with-input-from-string
         stream
         (lambda ()
           (pfsh:run prog arg ...)))]
    [(_ prog arg ...)
     #`(void (run prog arg ...))]))

(define-syntax (pfsh:top stx)
  (syntax-parse stx
    [(_ #,dot sym:id)
     #`#,(symbol->string (syntax-e #'sym))]))

(define-syntax (pfsh:define stx)
  (syntax-parse stx
    [(_ stream:id expr)
     #'(define stream (with-output-to-string
                        (lambda ()
                          expr)))]))

(define-syntax (pfsh:datum stx)
  (syntax-parse stx
    [(_ #,dot (~var s str)) #'(#%datum . s)]
    [(_ #,dot other)
     (raise-syntax-error 'pfsh
                         "only literal strings are allowed"
                         #'other)]))

A benefit of this strategy is that we can now use defined variables as program arguments. A bound identifier as an argument is replaced with its value, while an unbound identifier is converted to a string:

"use-pfsh9.rkt"
#lang s-exp "pfsh9.rkt"
(define me (whoami))
(echo Hello to me)

10.8.3 #%top-interaction

Finally, you may have noticed that when you run any of the working programs with "pfsh9.rkt" and earlier variants, DrRacket usually reports “Interactions disabled: language does not support a REPL (no #%top-interaction).”

The #%top-interaction form is wrapped around any expression entered into the interactions window, and DrRacket notices that it will never work in our pfsh implementations, so it doesn’t provide a prompt. We could enable the interactions window to have the same kinds of forms as a program by providing a #%top-interaction that just removes itself:

"pfsh9.rkt"
....

(provide ....
         (rename-out ....
                     [pfsh:top-interaction #%top-interaction]))

(define-syntax (pfsh:top-interaction stx)
  (syntax-parse stx
    [(_ . form) #'form]))

....

Now, when you run a program, you can keep interacting after the script completes:

"use-pfsh10.rkt"
#lang s-exp "pfsh10.rkt"
(echo Ready!)

10.9 Defining Functions

Our pfsh implementation can now run the original example script, but let’s go a little further. An advantage of a parenthesis-friendly shell is that we can mix in more of Racket to better support abstraction in a script. At a minimum, we’d like to be able to define and call functions in pfsh scripts:

"use-pfsh11.rkt"
#lang s-exp "pfsh11.rkt"
(define (double x)
(string-append x x))

(define l (ls -l))
(wc -l < l)
(wc -l < (double l))

It’s easy to make the string-append function available. It’s also easy to change define to match and distinguish function and stream shapes—but what should a function definition expand to? We have defined #%app so that it treats its “function” position as a string name of an external program. When an open parenthesis is followed by the name of a function that we have defined within the script, then we’d like the application form to mean a function call, instead.

Here are two ways to make the adaptation work:

We can change #%app so that it inspects an identifier in the “function” position to check whether the identifier is bound. If so, the #%app corresponds to a function call.
In this case, a define for a function can expand to a regular define.
We can make define bind a function name as a macro, so that using the name after an open parenthesis triggers a function-specific macro instead of the generic #%app form.
In this case, a define for a function needs to expand to define-syntax to bind the function name as a macro.

Slightly different behaviors fall out from each of these strategies. With the first strategy, an identifier that is bound to a string for a program name cannot be used to run the program, because using the identifier after an open parenthesis would trigger a function call. With the second strategy, a name bound to a string still works as a program name, but a function identifier doesn’t work as an argument to another function (unless we do a little more work to make that possible). Both approaches are viable, and either could be made to fit a preferred behavior, so let’s try both of them.

10.9.1 Detecting Bindings

To try the first strategy, we need #%app to recognize whether an identifier has a binding or not. Since the #%app macro receives only an immediate application form, how can it know what definitions are in the rest of the module? That is, although the #%app macro can do any work its wants at compile time, it doesn’t have a handle on the whole module to inspect it. The macro expander itself must know about bindings, because it uses binding information to determine which macro should handle an expansion. Happily for our #%app, the macro expander shares its binding information with macros in several ways, including through a identifier-binding function.

The identifier-binding function takes an identifier and reports #f if the identifier has no binding. Otherwise, it reports some information about the binding, such as which module (possibly the current one) contains a definition of the identifier. For our purposes, we do not care about the additional details, so we can just check whether identifier-binding returns #f.

Specifically, we add a new clause to the syntax-parse form in pfsh:app, where the clause has a #:when guard so that it only applies when identifier-binding produces a non-#f value:

(define-syntax (pfsh:run stx)
  (syntax-parse stx
    #:datum-literals (<)
    [(_ prog arg ... < stream:expr)
     #'(with-input-from-string
         stream
         (lambda ()
           (pfsh:run prog arg ...)))]
    [(_ prog:id arg ...)
     #:when (identifier-binding #'prog)
     #'(prog arg ...)]
    [(_ prog arg ...)
     #`(void (run prog arg ...))]))

Meanwhile, pfsh:define recognizes a function definition and passes it on to racket’s define:

(define-syntax (pfsh:define stx)
  (syntax-parse stx
    [(_ stream:id expr)
     #'(define stream (with-output-to-string
                        (lambda ()
                          expr)))]
    [(_ (proc:id arg:id ...) expr)
     #'(define (proc arg ...) expr)]))

The string-append function can be provided as-is:

(provide ....
string-append)

10.9.2 Macro-Defining Macros

With the strategy where define binds a function name as a macro, we don’t have to change #%app. We just have to change pfsh:define to compile a pfsh function definition into a racket macro definition.

Here’s a first attempt, but it’s not right:

(define-syntax (pfsh:define stx)
  (syntax-parse stx
    [(_ stream:id expr)
     #'(define stream (with-output-to-string
                        (lambda ()
                          expr)))]
    [(_ (proc:id arg:id ...) expr)
     #'(define-syntax (proc stx)
         (syntax-parse stx
           [(_ arg ...) #'expr]))]))

This broken attempt directly substitutes the body of the pfsh function in place of a function call. That causes the expression to be “inlined” in every call, which is probably a bad idea. Even worse, it causes each argument expressions to be copied in place of every use of the argument in the body expression.

To solve those problems, even though a pfsh function definition needs to expand to a racket macro definition, we also want a racket function. So, a pfsh definition should expand to both:

(define-syntax (pfsh:define stx)
  (syntax-parse stx
    [(_ stream:id expr)
     #'(define stream (with-output-to-string
                        (lambda ()
                          expr)))]
    [(_ (proc:id arg:id ...) expr)
     #'(begin
         (define (actual-proc arg ...)
           expr)
         (define-syntax (proc stx)
           (syntax-parse stx
             [(_ arg ...) #'(actual-proc arg ...)])))]))

For example, the definition

(define (double x)
(string-append x x))

expands to

(define (actual-proc x)
  (string-append x x))

(define-syntax (double stx)
  (syntax-parse stx
    [(_ arg ...) #'(actual-proc arg ...)]))

Since the #'(actual-proc arg ...) form originates from a module in the racket language (i.e., the implementation of pfsh), it uses the normal #%app from racket, so the macro expansion is always a regular function call. Even though the function always is named actual-proc, the macro system will arrange for different bindings for different expansions, so it’s ok to define multiple functions in a pfsh script.

There’s a small catch using this approach. We can’t just export string-append for use in pfsh scripts, because string-append is not a macro. Instead, in the implementation of pfsh, we need to use pfsh:define to define a variant that is implemented with string-append and then provide the new variant:

(provide ....
         (rename-out ....
                     [pfsh:string-append string-append]))

(pfsh:define (pfsh:string-append arg1 arg2)
             (string-append arg1 arg2))

10.10 Installing a Language

Let’s take the last step in defining a language, which will let use switch from #lang s-exp "pfsh11.rkt" to #lang pfsh. To enable writing #lang pfsh, we must do two things:

Adjust our language implementation so that it explicitly specifies S-expression parsing, instead of having S-expression parsing imposed externally.
Install our language as a package so that #lang pfsh will work from anywhere.

The part of a language that specifies its parsing from characters to syntax objects is called a reader. A language’s reader is implemented by a reader submodule (i.e., a nested module) inside the language’s module. That submodule must export a read-syntax function that takes an input port, reads characters from it, and constructs a module form as a syntax object. For historical reasons, the submodule should also provide a read function that does the same thing but returns a plain S-expression instead of a syntax object.

Here’s one way to implement the reader suubmodule:

(module reader racket
  (provide (rename-out [pfsh:read-syntax read-syntax]
                       [pfsh:read read]))

  (define (pfsh:read-syntax name in)
    (datum->syntax #f `(module anything pfsh
                         (#%module-begin
                          ,@(read-body name in)))))

  (define (read-body name in)
    (define e (read-syntax name in))
    (if (eof-object? e)
        '()
        (cons e (read-body name in))))

  (define (pfsh:read in)
    (syntax->datum (pfsh:read-syntax 'src in))))

Notice that pfsh:read-syntax constructs a module that uses pfsh as the initial import. Otherwise, it doesn’t really do anything specific to pfsh, and most of the work is performed by the built-in read-syntax function that reads a single term (such an an identifier or parenthesized form) as a syntax object. In fact, since this pattern is so common, Racket provides a syntax/module-reader language that expects just the pfsh part and builds the rest of the submodule around that. #;

(module reader syntax/module-reader
pfsh)

In short, we just need to add those two lines to our current pfsh implementation, and then save it as "main.rkt" in a "pfsh" directory. Here’s the complete implementation:

"pfsh/main.rkt"
#lang racket
(require "run.rkt"
         racket/port
         (for-syntax syntax/parse))

(provide #%module-begin
         (rename-out [pfsh:run #%app]
                     [pfsh:top #%top]
                     [pfsh:define define]
                     [pfsh:datum #%datum]
                     [pfsh:top-interaction #%top-interaction]
                     [pfsh:string-append string-append]))

(module reader syntax/module-reader
  pfsh)

(define-syntax (pfsh:run stx)
  (syntax-parse stx
    #:datum-literals (<)
    [(_ prog arg ... < stream:expr)
     #'(with-input-from-string
         stream
         (lambda ()
           (pfsh:run prog arg ...)))]
    [(_ prog arg ...)
     #`(void (run prog arg ...))]))

(define-syntax (pfsh:top stx)
  (syntax-parse stx
    [(_ #,dot sym:id)
     #`#,(symbol->string (syntax-e #'sym))]))

(define-syntax (pfsh:define stx)
  (syntax-parse stx
    [(_ stream:id expr)
     #'(define stream (with-output-to-string
                        (lambda ()
                          expr)))]
    [(_ (proc:id arg:id ...) expr)
     #'(begin
         (define (actual-proc arg ...)
           expr)
         (define-syntax (proc stx)
           (syntax-parse stx
             [(_ arg ...) #'(actual-proc arg ...)])))]))

(pfsh:define (pfsh:string-append arg1 arg2)
             (string-append arg1 arg2))

(define-syntax (pfsh:datum stx)
  (syntax-parse stx
    [(_ #,dot (~var s str)) #'(#%datum . s)]
    [(_ #,dot other)
     (raise-syntax-error 'pfsh
                         "only literal strings are allowed"
                         #'other)]))

(define-syntax (pfsh:top-interaction stx)
  (syntax-parse stx
    [(_ #,dot form) #'form]))

You’ll also need "run.rkt" in the same "pfsh" directory.

To install this as a package, select Install Package... from the DrRacket File menu, click the Browse button to select a Directory, and select the "pfsh" directory. Alternatively, run

raco pkg install pfsh/

on the command line—and beware that the trailing slash is necessary (otherwise, raco pkg will consult a remote server to look for a registered pfsh package).

After either of those steps, you can run

#lang pfsh
(echo Hello!)

← prev up next →

	Prelude
1	Programming Languages
2	A Simple Model
3	Onward to Racket
4	Syntax Pattern
5	Language Extensions
6	Syntax Classes
7	Macro Conspiracies
8	Scope and Hygiene
9	Embedded Languages
10	Module Languages
	Postlude

10.1	Defining Modules
10.2	From Modules to Languages
10.3	The Core module Form
10.4	A First Implementation of pfsh
10.5	Compile Time and Run Time
10.6	More Language Variations
10.7	The Application Form
10.8	More Implicit Forms
10.9	Defining Functions
10.10	Installing a Language