1 — Programming Languages: Research and Teaching
Tuesday, 07 January 2020
To this day, most developers consider programming languages their primary tools of the trade. When they think of their favorite language, they tend to include the entire ecosystem: the interactive development environment or at least the language-specific mode; the libraries; the on-line repositories; the social community; and perhaps the part of Stack Overflow dedicated to their community.
Software system architects recognize the value of individual languages but they also understand that the language can only break a project, it can’t make it. They know that, in this day and age, languages once again play special roles on particular platforms: JavaScript (really: jQuery, Angular, React, D3, etc) for the front-end; Java or C# for the back-end; SQL for the database connection; Python for the machine learning people who need to script some library or other because they can’t program anyways; and a few fringe languages that are taking over some aspect or other.
The managers at software companies hate programming languages. They absolutely hate them. What they want is the one last and forever language. If they had that, hiring and firing programmers would just be like hiring and firing interchangeable assembly-line workers. They want researchers to find this language, badly.
Programming languages is the oldest and most central research area in computer science. Researchers in this field know that there is no such thing as the one and only language; otherwise they would move on to another research field. They do recognize that a programming language, like a human being, has a dual nature: on one hand, a language is a mathematical idea; on the other hand, it is a technical artifact that exists on a computer. Due to a variety of reasons, a large majority of researchers focus on the mathematical aspect of languages, many on the technical aspect, and only a few attempt to bridge the gap.
In this course I will try to convey both sides of programming languages. I firmly embrace the idea that every programmer must understand the mathematical idea and, to do so, programming is a necessity. In this spirit, you will program a lot and thus begin to understand some of the many ideas that float around in this vast, spectacular field called programming languages.
Principles of Programming Languages
What makes up a programming language?
One lecture in Fundamentals I covers the basic idea using Beginning Student Language as an example. According to this lecture, a language comes with an alphabet (for making new words), a basic vocabulary, grammatical rules that govern the composition of words, and meaning, often called semantics. What do the linguistic constructs of my chosen programming language mean? (The phrase "it’s just semantics" has no meaning for programming language researchers.)
Like a natural language, a programming language exists to convey thoughts from one programmer (the creator of code) to many others (the maintainers). Once these thoughts are expressed in a programming language, a computer can also understand and interpret them on behalf of a consumer. Some people call an interpreter an operational semantics. But the mathematics of programming languages also knows of mathematical-operational semantics and denotational semantics and even other forms.
Hence one first idea for studying the meaning of language constructs is to use the interpretation procedure found in computers:
- Interpretation is the process of determining the answer and outputs of a program that is supplied with the desired inputs.
- Compilation is the process of translating a program P written in language A to language B. To determine the answer of P in B it is now necessary to run a B interpreter on this new code.
A-Interpreter = B-Interpreter o A-to-B-compiler
Currently most languages compile to the language of a virtual machine, which then interprets these codes, possibly with intermediate compilations to the virtual machine in the hardware, also known as firmware.
Once we understand the basics of interpretation, we can discuss differences between expected meaning and actual meaning in a precise fashion, and we might be able to pinpoint their cause.
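The equation A-Interpreter = B-Interpreter o A-to-B-compiler can be made concrete with a small sketch in Java. It is only an illustration under assumptions: the names Expr, Num, Add, and StackMachine are hypothetical, language A is a tiny expression language with numbers and addition, and language B is an invented stack-machine code made of the strings "PUSH n" and "ADD".

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Language A: tiny arithmetic expressions (hypothetical names, for illustration only).
interface Expr {
    int interpret();                 // A-interpreter: walk the tree directly
    void compile(List<String> out);  // A-to-B compiler: emit stack-machine code
}

class Num implements Expr {
    final int n;
    Num(int n) { this.n = n; }
    public int interpret() { return n; }
    public void compile(List<String> out) { out.add("PUSH " + n); }
}

class Add implements Expr {
    final Expr left, right;
    Add(Expr left, Expr right) { this.left = left; this.right = right; }
    public int interpret() { return left.interpret() + right.interpret(); }
    public void compile(List<String> out) {
        left.compile(out);   // code that leaves the left value on the stack
        right.compile(out);  // code that leaves the right value on the stack
        out.add("ADD");      // replace the two values with their sum
    }
}

// Language B: a list of instructions for an invented stack machine.
class StackMachine {
    // B-interpreter: run the instructions and return the top of the stack.
    static int run(List<String> code) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String instr : code) {
            if (instr.startsWith("PUSH ")) {
                stack.push(Integer.parseInt(instr.substring(5)));
            } else if (instr.equals("ADD")) {
                int b = stack.pop();
                int a = stack.pop();
                stack.push(a + b);
            } else {
                throw new IllegalArgumentException("unknown instruction: " + instr);
            }
        }
        return stack.pop();
    }

    // The composition: B-interpreter applied to the output of the A-to-B compiler.
    static int compileThenRun(Expr e) {
        List<String> code = new ArrayList<>();
        e.compile(code);
        return run(code);
    }
}
```

For every expression e, e.interpret() and StackMachine.compileThenRun(e) are meant to produce the same answer; that agreement is what the equation asserts.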
How can a programming language (not) help developers convey intent?
The Church-Turing hypothesis states that the class of all full-fledged programming languages can express the exact same class of functions on the natural numbers. Thus far, all basic models of computation we have discovered satisfy this hypothesis; the proof always proceeds by constructing a translation from the new model to an existing one that is known to fall into this class.
While many computer scientists believe that this hypothesis describes our limits of programming (if they are still aware of it), I consider it completely useless for the working programmer. Time permitting, I will explain my alternative idea of comparing programming languages, which is much more in tune with how developers work and thus perceive programming languages.
What properties do languages guarantee for all programs?
Weak forms of "intent" come with universal guarantees, that is, guarantees that apply to all programs. If a language comes with such guarantees, it is easier to understand some aspects of the written text clearly. For a simple example, some languages are memory safe, meaning programs cannot accidentally mess up and reference memory outside their boundaries.
Types are the most widely studied "guarantee system". A type system is a mechanism for making and validating claims about (pieces of) programs. If an implementation is in sync with the type system of the language, the behavior of programs satisfies certain properties that are helpful to programmers as they cope with run-time errors. C++ is, for example, a language whose type system and implementation are out of sync; Java and its type system live mostly in harmony.
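A small Java sketch may help illustrate what such universal guarantees look like at run time. It is an illustration, not an example from the lecture; the class name GuaranteeDemo and its methods are hypothetical. The first method shows memory safety: an out-of-bounds access raises an exception instead of touching unrelated memory. The second shows the implementation enforcing a claim of the type system: a store that the (covariant) array types permit statically is checked and rejected when the program runs.

```java
class GuaranteeDemo {
    // Memory safety: reading past the end of an array is detected and signaled.
    static boolean outOfBoundsIsCaught() {
        int[] cells = new int[3];
        try {
            int unused = cells[3];   // index 3 is outside 0..2
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    // Type guarantee: a String[] never ends up holding a non-String.
    static boolean badStoreIsCaught() {
        Object[] cells = new String[1];       // allowed: String[] is a subtype of Object[]
        try {
            cells[0] = Integer.valueOf(42);   // type-checks, but is rejected at run time
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }
}
```

Both methods return true on a standard JVM. The corresponding C++ programs have undefined behavior, which is one way to read the remark that a type system and an implementation can be out of sync.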
A recent development is the emergence of languages that permit both a typed and an untyped style of software development. They come under various slogans, e.g., optional typing, pluggable type systems, gradual typing, and migratory typing. Hack and TypeScript are two prominent industrial examples; Typed Racket (the first such language) is the most complete academic research vehicle in this area. The emergence of such languages has re-opened the question of what it means for type systems to relate to the behavior of all programs.
How do we implement programming languages/language constructs efficiently?
When I taught at Rice, known as the Department of Compiler Science at the time, we ran one undergraduate course on Compilers and three, non-overlapping follow-up courses.
This is a wide open question and still subject to research. I will touch on this topic, but Compilers (taught by Ben Lerner and Olin Shivers) addresses this question in some depth.
As in most contexts, researchers focus on the questions they can answer at the cost of ignoring those that are of true interest to people who must use these technical artifacts. Fortunately the two parts overlap to some extent. This course will focus on well-established ideas from the overlap.
Administrivia I
You must pair-program on all homework assignments. We will attempt to set up partnerships during the first lecture.
If you wish to earn a grade in this course, read General and act on it immediately, together with your partner.
Next, as soon as you are assigned a github repository, start working on
1 —
Representing Programs as Data
An interpreter for a language is often called a meta-interpreter, especially when the interpreter is written in the same language as the one it is interpreting.
Say we wish to understand a programming language by writing programs in another programming language. First we need words to express this idea. The first language is called the object language, as in the "object of study" or the "object of interest"; the second one is called the meta-language, as in the language in which we think about language.
The design recipes of Fundamentals I and II originate in programming language research. Naturally when people study programming languages, they also ask how to best program in any given language. They then use these insights as they write programs for their research and beyond.
According to the design recipe philosophy, the first step is to determine a data representation for the information in the application domain. For us, the application domain is a programming language or actually programming languages. This form of reflective (or recursive) notion is common in the foundational areas of computer science, and admittedly, it needs some getting used to.
Focusing on a single part or aspect of a language is often referred to as modeling; here we think of modeling as leaving out (currently) uninteresting details.
Example Let’s make this concrete. Say we wish to understand a minuscule part of an object language, such as plain arithmetic.
ae = integer -- an integer
   | (ae + ae)
   | (ae * ae)
where ae represents an arithmetic expression.
Luckily, whoever described this language of arithmetic expressions knows that this form of description lends itself to a simple translation into the kind of data definitions Fundamentals I and II employ. So here are three ways of picking a data representation in three different languages.
- Racket, which is not the language you used in Fundamentals I but represents the large class of scripting languages currently in vogue.
Almost every interesting form of information can be data-represented in several different ways. Here are three, one of which you should be able to translate into your favorite language (but see below for alternative languages):
  - quoted
This data representation is convenient for writing data examples, step 1a of the design recipe: '(1 + 1).
  - with structures
#lang racket
(struct plus [left right] #:transparent)
(struct mult [left right] #:transparent)
; AE = Integer | (plus AE AE) | (mult AE AE)
This one has a different advantage; rendering the information as data programmatically eliminates vocabulary and grammatical mistakes once and for all.
  - with classes
#lang racket
(define ae% (class object% (super-new)))
(define const% (class ae% (init-field value) (super-new)))
(define plus% (class ae% (init-field left right) (super-new)))
(define mult% (class ae% (init-field left right) (super-new)))
This representation is one that a Pythonista could easily write, too. (Of course, both Racketeers and Pythonistas would actually abstract over the similar expressions because classes really are just first-class objects.)
- Typed Racket, which is somewhat representative of typed functional languages:
#lang typed/racket
(struct plus [{left : AE} {right : AE}] #:transparent)
(struct mult [{left : AE} {right : AE}] #:transparent)
(define-type Plus plus)
(define-type Mult mult)
(define-type AE (U Integer Plus Mult))
Here we use the sub-language of structs and a related sub-language of types to specify the data representation; as we will see, using types allows us to check some properties of the program before we run it.
- Java, a representative of the typed object-oriented family of languages:
interface IAE {}
class Cons implements IAE {
int i;
Cons(int i) { this.i = i; }
}
class Plus implements IAE {
IAE left;
IAE right;
Plus(IAE left, IAE right) { this.left = left; this.right = right;}
}
class Mult implements IAE {
IAE left;
IAE right;
Mult(IAE left, IAE right) { this.left = left; this.right = right;}
}
Interpretation
Sample Problem
The AE language determines the value of an arithmetic expression in exactly the same way that a grade school student calculates.
Design a program that models this calculation process.
Lectures/1/interpreter.rkt
#lang racket

(module+ test
  (require rackunit))

;; AE = Integer | (list AE '+ AE) | (list AE '* AE)

;; data examples
;; --------------
(define ex1 42)
(define ex2 '(1 + 1))
(define ex3 '((1 + 1) * (1 + 1)))

;; interpreter
;; -----------

;; AE -> Number
;; evaluate this arithmetic expression
(module+ test
  (check-equal? (value ex1) 42)
  (check-equal? (value ex2) 2)
  (check-equal? (value ex3) 4))

(define (value an-ae)
  (match an-ae
    [(? number?) an-ae]
    [(list left-ae '+ right-ae) (+ (value left-ae) (value right-ae))]
    [(list left-ae '* right-ae) (* (value left-ae) (value right-ae))]))
Figure 2 displays the complete solution of this sample problem in Racket using the list representation introduced above. It consists of one function, value, and the design of this function follows the design recipe of Fundamentals I to the letter. In Programming Languages, value is called an interpreter.
Figure 3 shows a partial use of the design recipe of Fundamentals II. While the code in the figure is the essence of the solution, it is missing both examples and tests.
interface IAE {
// determine the value of this arithmetic expression
int value();
}
class Cons implements IAE {
int i;
Cons(int i) { this.i = i; }
public int value() { return this.i; }
}
class Plus implements IAE {
IAE left;
IAE right;
Plus(IAE left, IAE right) { this.left = left; this.right = right;}
public int value() { return this.left.value() + this.right.value(); }
}
class Mult implements IAE {
IAE left;
IAE right;
Mult(IAE left, IAE right) { this.left = left; this.right = right;}
public int value() { return this.left.value() * this.right.value(); }
}
Stop! If you wish to use an object-oriented meta-language for this course, figure out how to formulate unit tests in this language. You may wish to go back to your notes from Fundamentals II and complete the figure above with the test harness used there; alternatively, use JUnit to formulate the tests.
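As a starting point, here is a minimal sketch of what examples and tests for the Java interpreter could look like without any test framework. It repeats the classes of the figure so that it stands on its own; the class Examples and the helper checkEqual are hypothetical names, and a real solution would more likely use the Fundamentals II tester or JUnit.

```java
// The classes of the figure, repeated so that the sketch is self-contained.
interface IAE {
    // determine the value of this arithmetic expression
    int value();
}

class Cons implements IAE {
    int i;
    Cons(int i) { this.i = i; }
    public int value() { return this.i; }
}

class Plus implements IAE {
    IAE left;
    IAE right;
    Plus(IAE left, IAE right) { this.left = left; this.right = right; }
    public int value() { return this.left.value() + this.right.value(); }
}

class Mult implements IAE {
    IAE left;
    IAE right;
    Mult(IAE left, IAE right) { this.left = left; this.right = right; }
    public int value() { return this.left.value() * this.right.value(); }
}

// Data examples and tests; checkEqual is a hand-rolled stand-in for a framework.
class Examples {
    static final IAE ex1 = new Cons(42);                        // 42
    static final IAE ex2 = new Plus(new Cons(1), new Cons(1));  // (1 + 1)
    static final IAE ex3 = new Mult(ex2, ex2);                  // ((1 + 1) * (1 + 1))

    static int failures = 0;

    static void checkEqual(int actual, int expected, String label) {
        if (actual != expected) {
            failures++;
            System.out.println("FAILED " + label + ": expected " + expected + ", got " + actual);
        }
    }

    // run all tests and return the number of failures
    static int runTests() {
        failures = 0;
        checkEqual(ex1.value(), 42, "ex1");
        checkEqual(ex2.value(), 2, "ex2");
        checkEqual(ex3.value(), 4, "ex3");
        return failures;
    }

    public static void main(String[] args) {
        System.out.println(runTests() == 0 ? "all tests passed" : "some tests failed");
    }
}
```

The three examples mirror ex1, ex2, and ex3 of the Racket figure, so the two interpreters can be compared test by test.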
Stop again! The two programs actually differ in an important aspect. While Racket comes with a precise representation of the integers (dubbed "bignums"), Java’s int type represents only a minuscule portion of the integers. This is our first example of how the choice of meta-language may influence our answers to questions about an object language, here the question of how AE determines the value of an expression.
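The difference shows up as soon as an AE expression leaves the range of int, for example (2147483647 + 1). The following sketch contrasts Java's int addition with java.math.BigInteger, which plays the role of Racket's bignums; the class name Overflow and its methods are hypothetical.

```java
import java.math.BigInteger;

class Overflow {
    // addition as the Java interpreter above performs it: 32-bit, wraps around silently
    static int plusInt(int a, int b) {
        return a + b;
    }

    // addition on arbitrary-precision integers, comparable to Racket's bignums
    static BigInteger plusBig(int a, int b) {
        return BigInteger.valueOf(a).add(BigInteger.valueOf(b));
    }

    public static void main(String[] args) {
        System.out.println(plusInt(Integer.MAX_VALUE, 1));  // wraps to Integer.MIN_VALUE
        System.out.println(plusBig(Integer.MAX_VALUE, 1));  // the mathematical sum
    }
}
```

A Java model of AE that is meant to agree with the Racket one on all inputs would therefore have to compute with BigInteger rather than int.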
Administrivia II
One goal of first homework (1 —
From Information to Data
#lang racket

; data representation
; AE = Integer | (list AE "+" AE)
(define example1 '(1 "+" 1))

; information JSON text
; AE-J is one of:
; (1) a number
; (2) an array of 3 elements: an AE-J, the string "+", and another AE-J
(define json1 "[1,\"+\",1]")

; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
(require json)
(require rackunit)

(check-equal? (with-input-from-string json1 read-json)
              example1
              "representation (from information to data)")
(check-equal? (with-output-to-string (λ () (write-json example1)))
              json1
              "interpretation (from data to information)")
Administrivia III
You must find a JSON library for your chosen language that allows you to read and write JSON as demonstrated above. For example, Java programmers can use the Gson library (from Google) or the JSON-P API (javax.json, later jakarta.json); neither ships with the standard JDK, so if Java is your choice, explore both and decide.
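To show the two directions (representation and interpretation) in Java without depending on any particular library, here is a sketch with a hand-written reader for AE-J text. It is a stand-in only: the course asks for a real JSON library, and the names AeReader, read, and toJson are hypothetical. As in the Racket figure, the sketch covers integers and "+" only.

```java
// Data representation: AE = integer | Plus(AE, AE), mirroring the Racket figure.
interface IAE {
    String toJson();   // interpretation: from data to information
}

class Cons implements IAE {
    final int i;
    Cons(int i) { this.i = i; }
    public String toJson() { return Integer.toString(i); }
}

class Plus implements IAE {
    final IAE left, right;
    Plus(IAE left, IAE right) { this.left = left; this.right = right; }
    public String toJson() { return "[" + left.toJson() + ",\"+\"," + right.toJson() + "]"; }
}

// Hand-written reader for AE-J text; a stand-in for a real JSON library.
class AeReader {
    private final String text;
    private int pos = 0;

    private AeReader(String text) { this.text = text; }

    // representation: from information (JSON text) to data (IAE)
    static IAE read(String json) {
        AeReader r = new AeReader(json);
        IAE result = r.ae();
        r.skipSpace();
        if (r.pos != json.length()) {
            throw new IllegalArgumentException("unexpected trailing text");
        }
        return result;
    }

    // AE-J is either an integer or an array [AE-J, "+", AE-J]
    private IAE ae() {
        skipSpace();
        if (pos < text.length() && text.charAt(pos) == '[') {
            expect("[");
            IAE left = ae();
            expect(",");
            expect("\"+\"");
            expect(",");
            IAE right = ae();
            expect("]");
            return new Plus(left, right);
        }
        return new Cons(integer());
    }

    private int integer() {
        int start = pos;
        if (pos < text.length() && text.charAt(pos) == '-') pos++;
        while (pos < text.length() && Character.isDigit(text.charAt(pos))) pos++;
        // parseInt rejects an empty or sign-only token with NumberFormatException
        return Integer.parseInt(text.substring(start, pos));
    }

    private void expect(String token) {
        skipSpace();
        if (!text.startsWith(token, pos)) {
            throw new IllegalArgumentException("expected " + token + " at position " + pos);
        }
        pos += token.length();
    }

    private void skipSpace() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) pos++;
    }
}
```

Reading json1 and writing the result back with toJson yields the original text, which corresponds to the two check-equal? tests of the Racket figure.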
In general, the second goal of first homework (1 —
Test Harnesses and Programs
In this course, we use the term program for code that operates on (internal) data representations. By contrast, a test harness reads (external) information, turns it into data for the program, and writes the program's result back out as information.
test-Harness.rkt
#lang racket
; EFFECT
; - read JSON array of numbers from STDIN,
; - compute sum
; - WRITE sum to STDOUT as JSON number
(provide main)
(require json "program.rkt")
(define (main)
  (write-json (program (read-json))))
program.rkt
#lang racket
(provide program)
; [Listof Number] -> Number
(define (program lon)
  (apply + lon))
Figure 4 illustrates the two terms with two files (called modules). The first reads a JSON array of numbers into a list of n numbers, uses the program to add them up, and then prints the resulting number (as JSON). The knowledge that a JSON array reads as a list is documented in the json library.
Administrivia IV
You must determine for your chosen language how to read JSON from STDIN and write JSON to STDOUT. All of our test harnesses will use these capabilities.
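For Java, the split between program and test harness might look like the following sketch. It is written under assumptions: the class names Program and TestHarness are hypothetical, and readArray is a naive stand-in for a JSON library that understands only a flat array of integers such as [1, 2, 3].

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// The program: operates on internal data only.
class Program {
    static long program(List<Long> numbers) {
        long sum = 0;
        for (long n : numbers) sum += n;
        return sum;
    }
}

// The test harness: turns external information into data and back.
class TestHarness {
    // Naive reader for a flat JSON array of integers;
    // a stand-in for a JSON library, not a general JSON parser.
    static List<Long> readArray(String json) {
        String body = json.trim();
        if (!body.startsWith("[") || !body.endsWith("]")) {
            throw new IllegalArgumentException("expected a JSON array");
        }
        body = body.substring(1, body.length() - 1).trim();
        List<Long> numbers = new ArrayList<>();
        if (body.isEmpty()) return numbers;
        for (String piece : body.split(",")) {
            numbers.add(Long.parseLong(piece.trim()));
        }
        return numbers;
    }

    // information in, information out: JSON array text to JSON number text
    static String respond(String input) {
        return Long.toString(Program.program(readArray(input)));
    }

    // EFFECT: read all of STDIN, write the sum to STDOUT
    public static void main(String[] args) {
        Scanner stdin = new Scanner(System.in).useDelimiter("\\A");
        String input = stdin.hasNext() ? stdin.next() : "";
        System.out.print(respond(input));
    }
}
```

Keeping respond free of I/O means the harness logic can be tested with plain strings, while main alone deals with STDIN and STDOUT.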