Referential Transparency

8.10.0.1

Referential Transparency

Matthias Felleisen

Jan 21 2019

Changed in version 1.2: Thu Feb 14 12:30:56 EST 2019, Tony Garnock-Jones
suggested some additional credits.

Changed in version 1.1: Tue Jan 22 12:30:08 CET 2019, thanks to Tony Garnock-Jones
for digging up the link to Quine’s book and pointing out typos

Changed in version 1.0: Sun Jan 20 20:00:11 GMT 2019, initial release

Abelson and Sussman’s highly influential Structure and Interpretation, particularly the section on Sameness and Change, has probably planted the seed for this argument in many people’s mind.

For decades now, programming-language researchers and others have used the phrases “referential transparency” or “referentially transparent” to praise certain programming languages (those that have “it”) and condemn others (those that don’t). Unfortunately, most don’t know how to define these terms properly; if they could, they might think twice about using these phrases in this way.

Substituting Equals for Equals

The most common response to “what is referential transparency” involves the sentence “it is always possible to substitute equals for equals.” The problem is that this reply is too simplistic. programming-language researchers, of all people, should know that the operation of substituting one term for another in a program requires a careful definition, and whether two terms are equal is an entirely separate question. Let’s look at both ideas carefully.

Referentially Transparent Positions

This example is inspired by Albert Meyer, Puzzles in Programming Logic, presented at the Workshop on Formal Software Development: Combining Specification Methods, Nyborg, Denmark, May, 1984.

Clearly, 2 + 2 is always equal to 4. Right?

Well, not really. Consider this silly little example:

(printf "2 + 2 = ~a\n" (+ 2 2))

Substituting the 2 + 2 with 4 would be wrong because between the "s, the programmer really meant to say 2 plus 2. And the point of this code snippet is to state that 2 plus 2 equals 4—not 4 equals 4, which is a completely different statement.

See Willard Quine’s Word and Object. Studies in Communication (§30 page 144). Cambridge, Massachusetts: Massachusetts Institute of Technology, 1960. The bottom half of page 144 is particularly enlightening.

Quine, a philosopher, pointed out a long time ago that all languages have referentially transparent positions and opaque ones. A position is a place in a “sentence” of the language. A referentially transparent position is a place where we can substitute a reference with its meaning. Positions are best represented as a complete “essay” with a single sentence removed. Quine calls these “essays with a hole” contexts. Since the inside of a quote(d sentence) is not a complete sentence, ripping out a part of a quote does not produce a context. Filling the hole of a context with a sentence yields a complete essay; filling the hole with a different sentence may or may not yield an equivalent essay.

While Quine was thinking about natural languages, it turns out his work applies to artificial ones, too. In an programming language, contexts are programs with one “programming-language sentence” removed. Here a “programming-language sentence” means any of the following: a definition such as val x = 1, an expression such as 1 + 1, a statement such as x := x + 1 or echo(x), or similar pieces. As in natural languages, we cannot think of pieces of a (quoted) string as an expression, so the above example is ruled out. So, not all positions in programs are referentially transparent, and that should raise a little red flag in everyone’s mind.

What Are Equals

Let’s turn to the second question, namely,

when are two programming-language sentences equal?

Going back to Quine and the idea that referentially transparent positions are those where we can replace references with their meaning suggests we should use the meaning of programming-language sentences as the ultimate arbiter for deciding equality.

Obviously we know that 2 + 2 evaluates to 4. But, programming-language sentences may have free variables, that is, they contain references to other “meanings” in the context of the sentence. For example, 2 + x uses x to refer to some yet-to-be-known number and x := 4 refers to some yet-to-be-known assignable variable (location) in the context. Should substituting equals for equals rule out such programming-language sentences? Since that would be rather limiting, we need an equality relationship that accounts for programming-language sentences with references whose meanings we don’t know.

So, we are looking for an equivalence relation that can evaluate programming-language sentences and that can replace any such sentence with its meaning—and thus by any other sentence with the same meaning—at referentially transparent positions. A bit more technically, we want a binary relation on the set of programming-language sentences that

is an equivalence relation (reflexive, symmetric, transitive)
is a congruence relation (can be used at any referentially transparent position), and
can equate sentences without free variables with their meaning.

Wow, this is a bit of a mouthful but we’re almost done.

See James Morris’s Lambda-calculus models of programming languages [page 53], Cambridge, Massachusetts: Massachusetts Institute of Technology, PhhD dissertation, Sloan School of Management, 1969.

Morris, an early programming-language researcher, investigated these binary relations and found that for any programming language, there are many binary relations that satisfy the above constraints. And, he also proved that there is one unique greatest relation that rules them all:

operational equivalence.

If eval is the function that maps all terminating programs to 0, then

a sentence e in a programming language is operationally equal to sentence e′

if Remember that functions are sets of pairs. We could write eval(C[e]) = 0 but then we would always have to say if eval is defined. Switching perspective to sets of pairs simplifies life.

(C[e],0) is in eval iff (C[e′],0) is in eval

for all program contexts C (where C[e] means C filled with e).

Since operational equivalence is the only unique binary relation of the kind we want and since all other such binary relation are subset of this greatest one, it is absolutely natural that we use it to define “equals.”

And this means that all programming languages have referentially transparent positions, including those with side effects and other strange behaviors. Ouch.

Referential Transparency, Definiteness and Unfoldability

Sondergaard and SestoftActa Informatica 27(6) 1990, pages 505-–517 deserve credit for a particularly careful attempt. Starting from Quine’s work, they roughly apply “referentially transparent” to a language whose entire syntax respects Quine’s idea of referentially transparent position. This attempt falls short, however, because it fails to pursue the idea of equality to the very end. Indeed, the definition is so synthetic that it may not apply to any realistic language at all.

contents ← prev up next →

	Modular Programming
	The Rational Programmer
	The Design Recipe
	Apology
	All As
	Teaching PL
	NSF
	The Laffer Curve of Types
	Referential Transparency
	Why/ How do I get a Ph D
	Free Speech
	Developing Developers
	Growing a Programmer
	Unions for Ph D Students?
	Racket is ...
	Why teach Ph D core courses, with programming languages
	Danger: unsafe languages
	What should the core achieve?
	Python for Asset-Backed Securities
	Measuring education
	Refereed conferences, not!
	Why teach programming languages
	.