On this page:
Church and Rosser Prove that their Calculus is Sound
From λ-Calculus to Programming Languages
Call-by-name, Call-by-value and the λ-calculus
What is the Role of Church-Rosser in this Context?
What is The Truth?
Don’t We Want Equations for Optimizing Compilers?
Don’t We then Get Nondeterministic Compilers?
Is it a Good Idea to Define a Semantics via Church-style Axioms?
And What About Nondeterminism?

Church Rosser🔗

Matthias Felleisen

Jun 2 2024

Changed in version 1.0: Sun Jun 2 14:04:32 EDT 2024, initial release

Over the past couple of months, several people have brought up the Church-Rosser property in conversation, and I was somewhat amazed at the uninformed opinions that these people reported.

As someone who has worked out his fair share of Church-Rosser theorems for several variants of λ-calculus-based languages, I figured I would write down what I learned in this regard over the course of 40 years.

Church and Rosser Prove that their Calculus is Sound🔗

Neither Church nor Rosser designed a programming language. The λ-calculus per se is a programming language only as much as Turing machines are. So when someone says “people in 1936 knew how to deal with property X in programming languages via the Church-Rosser theorem,” this statement is (1) anachronistic and (2) wrong, as history showed 40 years later.

Church and Rosser understood that writing down n axioms A to generate an equational proof system did not imply that working out proofs in this system would make sense. They had to show that

‘A ⊢ true = false’ is impossible.

Because once we can prove this equation, we can prove every other equation. But, a proof system is about discriminating falsehoods from truths, so we don’t want this equation.

A proof system that validates only true statements is sound.

Church and his colleagues had come up with several such systems, which Barendregt (North Holland, 1980) labels as λI, λK, λβ, λβη, and so on. Also, orienting these axioms into notions of reduction generates the same set of equations as using a plain proof system starting from equational axioms.
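For concreteness, the central axiom of such systems (in some variants restricted) is β, which equates an application (λx.e) e′ with e[x := e′], the result of substituting e′ for x in e; read from left to right, it becomes the β notion of reduction.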

From λ-Calculus to Programming Languages🔗

In the late 1950s and 1960s our colleagues started to realize that the λ-calculus is an interesting basis for discussing aspects of programming languages. One aspect was scope: when is a variable occurrence x bound by a λ x. in its context—that is, lexical scope vs dynamic scope. Another one was language syntax: why not include functions as values in a programming language and mark them with the keyword λ or lambda. Once Scott had figured out how to construct a mathematical model of the syntactic λ-calculus, Strachey could use the λ notation as a framework for writing down a semantics of (pieces of) existing programming languages—but the meta-notation really just referred to elements of domains. (Yes, this is where store-passing and continuation-passing come from, as well as plain “direct” semantics.) For a short while, PL mistook Scott domains as The Truth behind the λ-calculus, but the equations valid in typical Scott domains are also just a part of The Truth; see below.

And finally, Böhm, McCarthy, Landin, Reynolds, Morris, and others decided to design programming languages directly based on the λ-calculus. They added literal constants and libraries of functions, plus various conditionals and other forms. They were clever in explaining the meaning of their programs: they used evaluation functions (interpreters), often based on abstract machines; Landin’s SECD machine is a well-known example.

Notably they did not use Church’s axioms to specify the semantics of their designs.

Call-by-name, Call-by-value and the λ-calculus🔗

This last step—designing languages based on λ but using abstract machines to define the semantics—raises the obvious question of how these languages relate to the systems of equations generated by Church’s axioms.

The question had become interesting because some language designs used Algol’60’s call-by-name regime, evaluating the argument expression at every occurrence of the parameter. Others used call-by-value, meaning the argument expression was evaluated once and for all to a value before the actual function call took place. (Algol’60 itself implements call-by-value as a compiler transformation, adding an assignment statement to the parameter at the entry of the procedure.)
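To make the two regimes concrete, here is a minimal sketch in OCaml of call-by-name and call-by-value evaluation for pure λ-terms. It is a toy reconstruction for this post, not Plotkin’s development; substitution is deliberately naive and assumes the example terms avoid variable capture.

  (* λ-terms with named variables *)
  type term =
    | Var of string
    | Lam of string * term
    | App of term * term

  (* naive substitution t[x := v]; fine for the capture-free examples below *)
  let rec subst x v t =
    match t with
    | Var y -> if x = y then v else t
    | Lam (y, b) -> if x = y then t else Lam (y, subst x v b)
    | App (f, a) -> App (subst x v f, subst x v a)

  (* call-by-name: the argument is substituted unevaluated *)
  let rec eval_name t =
    match t with
    | App (f, a) ->
        (match eval_name f with
         | Lam (x, b) -> eval_name (subst x a b)
         | f' -> App (f', a))
    | _ -> t

  (* call-by-value: the argument is evaluated to a value before the call *)
  let rec eval_value t =
    match t with
    | App (f, a) ->
        (match eval_value f with
         | Lam (x, b) -> eval_value (subst x (eval_value a) b)
         | f' -> App (f', a))
    | _ -> t

  (* Ω loops forever; (λx. λy. y) Ω returns λy. y under call-by-name
     but diverges under call-by-value, which evaluates Ω first *)
  let omega =
    App (Lam ("x", App (Var "x", Var "x")),
         Lam ("x", App (Var "x", Var "x")))
  let example = App (Lam ("x", Lam ("y", Var "y")), omega)

  let () =
    match eval_name example with
    | Lam _ -> print_endline "call-by-name: a value"
    | _ -> print_endline "call-by-name: stuck"
    (* eval_value example would not terminate *)

The two evaluators define two different semantic functions for the same syntax, which is exactly the situation Plotkin sorted out.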

Plotkin (1975, TCS) answered this question in a definitive manner. Here is the essence of his finding:
  • it is possible to define the meaning of a language via a set of Church-style axioms;

  • but, these notions of reduction are not the semantics;

  • the semantics is a function from programs to values, meaning it is always necessary to identify the subset of values in a language’s set of expressions;

  • different systems of axioms define different evaluation functions;

  • one system of axioms (λn), similar to Church’s, defines what language researchers label a language with a call-by-name convention;

  • a different set of axioms (λv) defines what language researchers label a language with a call-by-value convention;

  • the functions defined via sets of axioms (notions of reduction) are equal to the functions defined by abstract machines that implement the various call-by regimes.

What is the Role of Church-Rosser in this Context?🔗

To extract an efficient evaluation function, Plotkin proves that both calculi also satisfy the Curry-Feys Standardization property. A semantics generated from notions of reduction is a relation. To prove that it is a function—that evaluating the same program on the same input consistently produces the same result—requires a mathematical proof.

One of several lemmas in this proof is a variant of Church and Rosser’s lemma. That’s it.
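For the record, the property itself says, for a given notion of reduction: if e reduces (in any number of steps) to both e₁ and e₂, then there exists an e₃ to which both e₁ and e₂ reduce. That confluence property is what the consistency argument appeals to; by itself it says nothing about call-by-name, call-by-value, or what a program means.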

What is The Truth?🔗

From Church and Rosser’s work, we know that λ ⊢ e = e′ is a proof that e = e′ is a true statement. But what is The Truth in the context of programming languages?

Following Morris’s dissertation (MIT 1968), Plotkin goes on to define observational equivalence for each semantics, a congruent equivalence relation on expressions. Two expressions e and e′ are observationally equal, written e ≈ e′, if plugging e and e′ into any arbitrary program context C (an expression with a hole) and applying the evaluation function to the resulting programs C[e] and C[e′] yields the same outcome.
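In symbols: e ≈ e′ iff, for every program context C, the evaluation function produces the same observable result for C[e] and for C[e′], where “the same” covers the case that both diverge.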

Observational equivalence is the largest such relation, and as such it is recognized as The Truth. The Truth for any programming language is vastly larger than the set of equations of an axiomatic syntactic calculus. Indeed, The Truth for such λ-ish languages cannot be axiomatized. Period.

Don’t We Want Equations for Optimizing Compilers?🔗

Sure. We want true equations in the compiler. A λ-style calculus might be one way to validate such equations. But, given that
  • The Truth (e ≈ e′) is so much larger than this one particular proof system (λ ⊢ e = e′),

  • and we want powerful optimizing passes,

the axioms play a minor role in the design of an optimizing compiler.

Instead, we want as many true equations e ≈ e′ as possible turned into transformations e ⟶ e′ if they can somehow improve the performance of a program (without changing its outcome). So what we really want are proof techniques for establishing The Truth between two expressions, one improving on the other.

Church-Rosser and notions of reduction are too simplistic for this job.

Use denotational semantics if you want lots of equations. Use logical relations to validate your favorite transformations. Use bisimulations. Use any proof technique that works.

Church-Rosser for axioms oriented into notions of reduction is way too weak.
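Still, to make the compiler story concrete, here is a minimal sketch of what turning one validated equation into a transformation can look like. It reuses the term type and subst from the interpreter sketch above, stands in for no particular compiler, and applies the call-by-value β equation, (λx.b) v ≈ b[x := v] for a value v, from left to right in a single bottom-up pass.

  (* values of the call-by-value language: variables and λ-abstractions *)
  let is_value = function
    | Lam _ | Var _ -> true
    | App _ -> false

  (* one bottom-up pass; it is not iterated, so it terminates on every term;
     subst is the naive substitution from above; a real compiler renames to
     avoid capture *)
  let rec simplify t =
    match t with
    | App (Lam (x, b), a) when is_value a -> subst x a (simplify b)
    | App (f, a) -> App (simplify f, simplify a)
    | Lam (x, b) -> Lam (x, simplify b)
    | Var _ -> t

Whether such a pass fires, and in which order relative to other passes, is a scheduling decision; the equation only guarantees that the observable outcome of the program does not change.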

Don’t We then Get Nondeterministic Compilers?🔗

Yes.

Depending on when a transformation kicks in, we get different object code. But because the transformations are validated elements of The Truth, running any of these object programs produces the same value.

And that’s true in reality. I conjecture that all large compilers are non-deterministic translators.

Is it a Good Idea to Define a Semantics via Church-style Axioms?🔗

No.

Plotkin’s work shows that we want an evaluation function first. Define it via an abstract machine. Use relations on the surface syntax. Write an interpreter. But define the function first.

If, for some reason, you want a simple calculus such as λ, develop it after you have worked out the semantic function. Then use Church-Rosser to prove consistency of the reduction-axiom-based function and the original one.

Does this really work? Yes. Starting in 1985, I followed this program, developed λ-style systems for existing semantics, and proved a Church-Rosser theorem for these calculi:
  1. for call/cc-style continuations (LICS ’86, with Kohlbecker, Duba, and Friedman);

  2. a simpler calculus than the one in item 1 (TCS ’92, with Hieb);

  3. for delimited continuations (POPL ’88);

  4. for assignable variables in a λ-calculus-based language (POPL ’87, with Friedman);

  5. a simpler calculus than the one in item 4 (TCS ’92, with Hieb);

  6. for a language with exceptions (Info & Comp ’94, with Wright); and

  7. for call-by-need (JFP ’97, with Ariola; ESOP ’12, with Chang).

Items 2, 5, and 7 really show that Church-Rosser is a brittle syntactic property, and we can come up with several different axiom systems, for each of which it is more or less elegant to prove Church-Rosser.

I also had one for futures (POPL ’95, with Flanagan; but it’s not in the paper).

And What About Nondeterminism?🔗

A nondeterministic language may evaluate the same program (and input) to distinct values. The semantics is not a function but a relation.

Designing a Church-Rosser calculus for such a language should be possible, using reduction on sets instead of expressions directly. But, this is just a boring exercise in proving some people wrong about the relationship between Church-Rosser and programming-language research. I don’t have time for this and there aren’t enough bits in this laptop to write the proof down in the margin.