1 Programming Languages

Goals

— the many ways of language-oriented programming

— programming languages in Racket

1.1 One Project, Many Programming Languages

In this day and age, a software project employs many different programming languages. A “full stack” web application has software running in the web browser and on the web server, and this software is rarely written in the same language. Furthermore, the server is hardly ever a monolithic piece of software in one language. It typically glues together code that interprets requests (from the network), turns them into database queries, checks some business logic, and performs many more actions. Each of these pieces might have been written in a different language (and possibly in a different era).

A few years ago a colleague from IBM presented (roughly) the following stack during his keynote address at POPL:

Programming with multiple external programming languages has been a reality for decades. It still is. Here are reasons why projects use many languages:

history: someone started a project in a language, and the language has since fallen out of favor.
platform: a new platform appears and demands attention. The new platform (terminal, PC, browser, mobile) does not support the language in which the software is currently written.
expressiveness, productivity: on rare occasions, a team of developers can convince the project manager that a new programming language will make them more productive than the one that has been used so far.

There are probably more such reasons but, regardless, adding a new component in a new language comes with one condition: there must be a “natural” way to separate the two components. Usually “natural” means that the two components may communicate via an easy-to-use FFI or that they are forced to communicate via some form of input/output anyway (say, the network for a view written for a web browser).

While one could call this form of programming “language-oriented programming,” this is a wider notion than the one we cover here. Indeed, in a way we would like to help programmers avoid such “language stacks” by internalizing them into Racket, at least as much as possible. What we do accept from this historic development is that the use of multiple languages makes a software developer’s life easier. A close look at programming languages shows that designers recognized this fact a long time ago, possibly from the very first moment they abstracted over assembly.

1.2 One Programming Language, Many Languages

Almost every programming language comes with several distinct sub-languages that deal with distinct programming domains. Let’s look at a couple of examples in Racket, illustrating how far back this idea dates.

1.2.1 Embedded Programs as Strings

Strings are the easiest way for a programming language to embed programs from a separate language. The first example is the familiar one of format strings, dating back at least to platform-specific dialects of Algol 60:

"~a :: ~a\n"

By itself, such a format string is pointless. But, if a programming language supports interpreters for such strings, they greatly facilitate the rendering of values into strings for output devices:

; formatting strings to prepare for printing
(printf "~a :: ~a\n" "hello" 'String)

The printf function really plays the role of an interpreter for a program written as a string, whose input is a sequence of arbitrary Racket values. Of course, neither the Racket compiler nor the IDE understands the program because it is a string. Hence they can’t statically analyze it (well), and the developer is left without much assistance.
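The lack of static assistance can be seen in a small sketch. The function name render-badly is hypothetical; its format string asks for two values but receives only one. Assuming a standard Racket installation, the module compiles without complaint and the mismatch is reported only when the call runs:

```racket
#lang racket

;; hypothetical example: the format string has two ~a directives,
;; but the call supplies only one value
(define (render-badly)
  (format "~a :: ~a\n" "hello"))

;; the mistake surfaces as an exception when the call is evaluated;
;; the handler turns it into a symbol so that the module still runs
(define outcome
  (with-handlers ([exn:fail? (lambda (e) 'run-time-error)])
    (render-badly)))
```

Here outcome ends up as 'run-time-error rather than a string, because the embedded "program" is checked only while it is being interpreted.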

In Racket there are several interpreters for such embedded string-programs:

(format "~a :: ~a\n" "hello" 'String)

And this is a common phenomenon.

The language of regular expressions is a second string-based example that is equally common in modern languages. Many (all?) modern programming languages come with functions that interpret certain strings as regular expressions and match such expressions against strings:

; String -> False or [List String Char-String Char-String]
(define (extract-digits year)
  (regexp-match "20(.)(.)" year))

This is a function that extracts the last two digits of a string that represents a 21st-century year:

> (extract-digits "2018")
'("2018" "1" "8")
> (extract-digits "1999")
#f

Again, regexp-match serves as an interpreter for an embedded program here. Racket comes with different embedded languages of regular expressions, and some facilitate solving this problem even more than plain regular-expressions-as-strings:

; String -> False or [List String Digit-String Digit-String]
(define (extract-digits-version-2 year)
  (regexp-match #px"20(\\d)(\\d)" year))

In the “#px” language of regular expressions, we know that “\d” really matches just digits so this version of the function is “more correct” than the previous one.
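The difference between the two versions shows up on malformed input. The following sketch repeats both definitions from above and applies them to a hypothetical input, "20ab", which starts with "20" but does not end in digits:

```racket
#lang racket

;; both definitions are copied from the text above
(define (extract-digits year)
  (regexp-match "20(.)(.)" year))

(define (extract-digits-version-2 year)
  (regexp-match #px"20(\\d)(\\d)" year))

;; a hypothetical malformed input: "20" followed by two letters
(define malformed "20ab")

;; the string-based pattern accepts any two characters
(define loose-result (extract-digits malformed))            ; '("20ab" "a" "b")

;; the #px pattern insists on digits and therefore rejects the input
(define strict-result (extract-digits-version-2 malformed)) ; #f
```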

Stop! Pick a natural number n. Enumerate n disadvantages of embedded programs as strings, not just for the formatting and regular-expression languages, but any such embedded domain-specific language you know.

1.2.2 Embedded Programs as Grammatical Productions

Extending a language’s grammar is an alternative to strings for adding small sub-languages. For example, many modern languages come with sub-languages for matching tree-shaped patterns of data and others for dealing with external events. Others support “frameworks” that are nearly indistinguishable from separate languages, though those do not enforce certain basic grammatical constraints and instead use run-time checking. This section illustrates this point with Racket examples, mostly to point out immediately that non-experts can program such grammatical extensions.

Pattern matching is one prominent example of a grammatical sub-language. While pattern matching is a necessity in a language with type inference, developers have come to love the idea of de-structuring deeply nested trees of (algebraic) data with just a bit of notation, and pattern matching has thus made it into all kinds of programming languages.

The rest of these notes expects readers to be comfortable with programs such as the one below.

Here is an example from our world:

#lang racket

(module+ test (require rackunit))

; LAMBDA is one of:
; – symbol, but not 'function
; – a list of two LAMBDAs, or
; – a list of this shape: (list 'function (list symbol) LAMBDA)

(define simple-tree '((function (x) (x x)) (f y)))

; how many times 'function occurs as part of LAMBDA data
(define (how-many-functions tree)
  (match tree
    [(? (and/c symbol? (compose not (curry eq? 'function)))) 0]
    [`(,function-expression ,argument-expression)
     (+ (how-many-functions function-expression)
        (how-many-functions argument-expression))]
    [`(function (,(? symbol? function-parameter)) ,function-body)
     (+ (how-many-functions function-body) 1)]
    [_ (error 'how-many-functions "LAMBDA expected, given: ~e" tree)]))

(module+ test (check-equal? (how-many-functions simple-tree) 1))

A match expression consists of an expression followed by a sequence of match clauses. The patterns on the left-hand side of such clauses are a brand-new category of syntactic things that “match programmers” can write down. In the above example, the last proper pattern demonstrates particularly well how easy it is to extract both the embedded symbol and the embedded LAMBDA from a 'function list.

Patterns are a new syntactic category, not comparable to anything that exists in Racket. Clearly, one immediate advantage of this arrangement is that the compiler can check the validity of the “pattern program”; there is no need to wait until run time to discover problems. Equally important, if a programmer accidentally places a pattern outside of a match expression, the compiler can issue an explanatory error message.

Additionally, Racket patterns allow escapes to Racket, as the first pattern in the example shows. The Racket code can be arbitrarily complicated, using elements of any library. And naturally, this Racket code could use match again.

We call such embeddings of sub-languages fine-grained, because elements of one grammatical category (patterns) can embed Racket expressions and vice versa.

Our second example originates from the Racket-based teaching languages for “How to Design Programs.” Those supply a domain-specific language for dealing with events such as clock ticks, key strokes, mouse movements and clicks, and more. We can use this language of events inside of regular Racket programs:

; dealing with events from the environment

(require 2htdp/universe)
(require 2htdp/image)

(define (main s0)
  (big-bang s0
    [on-tick   sub1]
    [stop-when zero?]
    [to-draw   (lambda (s) (circle (+ 100 (* s 10)) 'solid 'red))]))

Run (main 40) and watch how this program deals with clock-tick events.

Writing down a keyword such as on-key or even a complete on-key clause outside a big-bang context is a syntactic error:

> [on-key (lambda (s ke) s)]
on-key: used out of context
in: (on-key (lambda (s ke) s))

Moving this clause inside the above big-bang allows us to stop the shrinking-circle animation in mid-sequence.

Stop! What kind of embedded domain-specific languages for programmers does your favorite programming language support?

Hint. Consider how the JavaScript world has developed many such domain-specific embedded framework-languages to deal with queries of the DOM as a database rather than a recursive tree, or with event-handling via virtual DOMs.

1.3 Why Embedded Languages Matter

Programming languages come with many different sub-languages because language designers accept the communication role of code. Every piece of code that moves from the initial prototyping stage to the maintenance phase needs a lot of attention; developers must repeatedly re-visit the code, read it, comprehend it, modify it, get it through unit and integration testing. Language designers embrace the idea that the language in which developers express ideas matters; and they go further in that they recognize the need for specialized sub-languages for the different aspects of programs.

The advantage of internal sub-languages over external languages is also clear. Combining such special-purpose languages into a coherent whole is much easier than linking programs via input/output code:

Composition is a mere syntactic act.
Computation is accomplished via translation into the host.
Communication is easy because embedded programs compute host values. Of course, this form of communicating poses its own problems.

In short, internal languages take away a lot of the pain of program linking.
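A tiny sketch may make the three points concrete. The data definition Shape and the names area and total-area are hypothetical and serve only as an illustration; the pattern language of match is simply written inside an ordinary definition, and its results are ordinary Racket numbers:

```racket
#lang racket

;; Shape (hypothetical) is one of:
;; - (list 'circle radius)
;; - (list 'rect width height)

;; composition: the match expression sits inside an ordinary definition
(define (area shape)
  (match shape
    [`(circle ,r) (* pi r r)]
    [`(rect ,w ,h) (* w h)]))

;; communication: each match expression yields a plain Racket number,
;; which ordinary functions such as map and + consume directly
(define total-area
  (apply + (map area '((rect 2 3) (rect 1 1)))))
```

No serialization, parsing, or input/output code is needed to move values between the pattern sub-language and the host.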

In general though, only the designers of a programming language can extend the language with new sub-languages. They do not enable ordinary developers to program such languages on their own. On one hand, they may not trust developers to create languages with the necessary coherence. On the other hand, equipping a language with the tools for programming new well-designed (implementations of) sub-languages has been a research problem for several decades, and we seem to have figured it out only recently.

1.4 One Racket Programmer, Many Languages

Racket translates these insights into an explicit design goal:

Racket empowers developers to add (sub)languages, and the process of adding these languages to the existing eco-system is free of any friction.

When we speak of language-oriented programming (LOP), we mean this specific idea.

Racket comes with an expressive interface for the front-end of its implementation, that is, the syntax system. It enables developers to write compile-time functions and to hook such functions into the compiler. As a result, Racket is easy to extend with forms that abstract over recurring patterns in code that cannot be abstracted over with functions (or other conventional means of linguistic abstraction). Over the past two decades, we have figured out how to use these tools to program a wide spectrum of new sub-languages within Racket.

Let’s go over these ideas with some simplistic examples before we dive in. Here is an example of two similar syntactic phrases:

(define (bigger-string x)
  (number->string (add1 x)))

(provide
 (contract-out
  [bigger-string
   (-> number? string?)]))

(define (smaller-than-5 x)
  (< x 5))

(provide
 (contract-out
  [smaller-than-5
   (-> number? boolean?)]))

Racket’s notion of language extension goes back to the primitive Lisp macros from 1964. The idea has been thoroughly studied in the intervening 55 years, and Racketeers have advanced it more than any other community in the Lisp family. While functional (or OO) abstraction doesn’t work to hide these two similar phrases, syntactic extension does. And with a well-designed mechanism for language extension the result of building an extension is a new form of abstraction.
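As a minimal sketch of such a syntactic extension, the following module abstracts over the two similar phrases with a single new form. The name define-with-contract and the shape of the form are hypothetical, chosen only for this illustration, and the form is assumed to be used at the module level, where provide is allowed:

```racket
#lang racket

;; hypothetical form: (define-with-contract (name param) domain range body)
;; expands into a function definition plus a contracted export
(define-syntax-rule (define-with-contract (name param) domain range body)
  (begin
    (define (name param) body)
    (provide (contract-out [name (-> domain range)]))))

;; the two phrases from above, now written with the new form
(define-with-contract (bigger-string x) number? string?
  (number->string (add1 x)))

(define-with-contract (smaller-than-5 x) number? boolean?
  (< x 5))
```

A function cannot play this role because the abstraction has to introduce a definition and an export, both of which are syntactic forms rather than values.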

One direction of advancement concerns the creation of language modules. Like all modern languages, Racket supports modules and a modular style of programming. Unlike in other languages, a Racket programmer chooses the programming language for each module in a software base independently of what languages the other components are written in. Conversely, a language is just a module. So to construct languages we write modules and to write modules we use languages. This notion of language-modules is key to writing software systems in Racket.
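The idea that a language is just a module can be sketched with two submodules in one file. The names shouting, client, say, and greeting are hypothetical; the first submodule acts as a deliberately tiny language, and the second one is written in it:

```racket
#lang racket/base

;; a (hypothetical) language is a module that exports the forms
;; its client modules may use
(module shouting racket/base
  (provide #%module-begin #%app #%datum
           define provide
           (rename-out [string-upcase say])))

;; a module written in the shouting language:
;; only the exports listed above are available here
(module client (submod ".." shouting)
  (provide greeting)
  (define greeting (say "hello")))

;; the enclosing module imports the client like any other module
(require 'client)
```

Inside client, say is the only function in scope, and it is racket/base's string-upcase under a different name. Realistic languages export far more, but the mechanism is the same.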

Here are two modules that use languages other than plain racket:

#lang datalog

edge(a,b).
edge(b,c).
edge(c,d).
edge(d,a).

path(X,Y) :- edge(X,Y).
path(X,Y) :-
  edge(X,Z),path(Z,Y).

path(X,Y)?

#lang typed/racket

(provide string->er)

(define-type ER Exact-Rational)

(: string->er (String -> (U ER False)))
; convert string to exact rational if possible
(define (string->er s)
  (define r
    (parameterize ([read-decimal-as-inexact #f])
      (string->number s)))
  (and (rational? r) (exact? r) (ann r ER)))

Creating and experimenting with such languages has become straightforward in the Racket eco-system.

Another direction of advancement is about the creation of fine-grained language embeddings. The preceding section presents the idea of a language of match patterns. Two other sub-languages are about exporting from, and importing into, Racket. In Racket, provide and require specifications employ their own rich sub-languages.

Programming both kinds of languages in Racket (module languages and embedded languages) benefits tremendously from what we refer to as linguistic inheritance. A new language implementation may inherit many notions from an existing one. This principle is what makes programming languages in Racket so effective and quick.

One particular feature that module languages and embedded languages can inherit is extensibility. That is, we may want sub-languages to be as extensible as Racket itself, and we can sometimes achieve this, at least in part, by inheriting Racket’s own form of extensibility.

For example, Racket’s match pattern language is extensible. Here is a concrete example, an extension that allows the expression (adder 1 2 3) to mean one thing in a Racket expression context:

> (adder 1 2 3)
'(2 3 4)

and something different in a Racket pattern context:

> (match '(2 3 4)
    [(adder 1 2 3) "success!"]
    [_ "** failure **"])
"success!"

That is, in a regular context the phrase is evaluated as a regular function application, while in a pattern context within a match the same phrase turns into a pattern (that sets up a match with the result of the function call).
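One way to obtain this behavior is Racket's define-match-expander, which accepts one transformer for pattern contexts and an optional second one for expression contexts. The following is a sketch under assumptions, not necessarily the definition behind the interactions above; in particular, the helper add1-to-all is a hypothetical name:

```racket
#lang racket

;; hypothetical helper that does the actual work: add 1 to every argument
(define (add1-to-all . numbers)
  (map add1 numbers))

(define-match-expander adder
  ;; pattern context: match any value equal to the result of the call
  (lambda (stx)
    (syntax-case stx ()
      [(_ arg ...) #'(== (add1-to-all arg ...))]))
  ;; expression context: an ordinary function application
  (lambda (stx)
    (syntax-case stx ()
      [(_ arg ...) #'(add1-to-all arg ...)])))

;; the same phrase in an expression context ...
(define as-expression (adder 1 2 3))

;; ... and in a pattern context
(define as-pattern
  (match '(2 3 4)
    [(adder 1 2 3) "success!"]
    [_ "** failure **"]))
```

The pattern transformer rewrites the phrase into match's == pattern, so the extension is itself expressed in the existing pattern language.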

Similarly, the provide and require sub-languages are extensible, too. See provide Macros and require Macros for the documentation on how to extend these languages.

1.5 Programming A Language, Programming Many Languages

The goal of these notes is to introduce you to the idea of programming a language extension, programming a complete language, and eventually programming as many languages as needed for any project.

In short, you can think of these notes as an introduction to the art of hacking your own languages (in Racket).
