Lecture 12: Assertions and Contracts
Types are Your Friends

A type checker is a theorem prover. As it crawls over your program, it confirms your claims about your functions and variables once and for all. Example:
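A minimal sketch of such a fragment, with illustrative names:

    class Example {
        // the types claim that f consumes an int and produces a boolean
        static boolean f(int n) {
            return n > 0;
        }

        static void use() {
            boolean ok = f(42);       // accepted: the claims hold
            // boolean bad = f("x");  // rejected by the type checker
        }
    }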
When the Java type checker checks the program that contains this code fragment, it confirms two of your claims:
First, the function f is only ever applied to arguments of its declared parameter type; second, if a call to f returns, it yields a value of the declared result type.
Knowledge like this comes in handy when you are looking for errors in
your program. As you inspect your program for potential problems, you do
not have to re-confirm that f receives and produces values of the declared types; the type checker has already proved it.

WARNING: The above is only true for programming languages with sound
type systems. Examples of such languages are Java, ML, and Haskell. In
contrast, C and C++ do not have sound type systems; a function's declared
type can be circumvented, for example with unchecked casts, so its claims
may fail at run time.
In the ideal world, we should therefore express everything we know about our program in the type system. Then, as the type checker blesses the program, it would confirm that the program runs correctly forever. Unfortunately, this is impossible in the abstract (if we wish to have terminating type checkers) and in the concrete: the type languages of programming languages are just not expressive enough. Consider this example:
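If Java had a prime type, the fragment might look roughly like this (hypothetical code; it does not compile):

    // hypothetical: "prime" is not a Java type
    prime h(prime p) {
        return nextPrimeAfter(p);   // nextPrimeAfter is hypothetical as well
    }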
This code fragment specifies that h consumes and produces
prime numbers. No current type language supports a prime type,
and including one in a type language would soon run into
problems with the undecidable theory of arithmetic. Thus, even though such
function signatures would be highly useful in, say, the realm of
cryptography, it is impossible to state such facts and to have them
confirmed by the type checker.
Assertions (Types are not friendly enough)

To overcome this deficiency, we need to generalize the language of types to a language of arbitrary assertions. Roughly speaking, an assertion is a claim about the values in our computer programs as the (abstract) computer executes them. Historically, people stated assertions about program variables in plain English. Examples:
the variable x holds a number greater than 0
or
after the assignment, the variable y holds a number that is exactly one larger than x
or
after one more iteration of the loop, sum holds the sum of the array elements inspected so far
The last two examples also show that assertions are established as the
result of executing a statement, and using assertions, we can argue that
code fragments will execute properly. Due to this role, assertions are
often referred to as pre-conditions and post-conditions (of statements).
Since English is ambiguous and since assertions (claims of truth) are the elements that logic deals with, computer scientists quickly adopted the language of logic to express facts. Thus, instead of the above, people write:
x > 0
and
y == x + 1
or
sum == a[0] + ... + a[i-1]
Next people formulated rules for reasoning about program statements, especially assignment statements, if statements, and loops.

Assertions Describe the Behavior of Programs

Take a look at this program fragment:
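A minimal sketch of such a fragment, with the assertions written as numbered comments:

    int x = 10;
    int y = x;
    // (1) x == 10 and y == 10    -- the state after initialization
    y = y + 1;
    // (2) x == 10 and y == 11    -- the state after the assignment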
The comments are logical assertions about the state of the program. Indeed,
they actually describe the execution of the program. Specifically, the
first assertion describes the entire state after initialization and the
second one the state after the assignment statement.
For if-statements, we need to take into account a case split:
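A sketch of such a fragment, computing the larger of x and y, with numbered assertions as comments:

    static int maxOf(int x, int y) {
        int max;
        // (1) x and y are arbitrary ints
        if (x > y) {
            max = x;
            // (2a) x > y and max == x, hence max >= x and max >= y
        } else {
            max = y;
            // (2b) x <= y and max == y, hence max >= x and max >= y
        }
        // (3) max >= x and max >= y and (max == x or max == y)
        return max;
    }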
In each branch, we can add the condition that succeeded (and calculate with
it: see 2b). Then, at the end of the if-statement, we can combine the two
branches' conclusions with a logical disjunction: at least one of them holds. In this example, we may conclude that
no matter what x and y are, max is
equal to the larger of the two.
Last but not least, we can also use assertions to describe the execution of loops:
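A sketch of such a loop, summing the numbers in an array a, with the invariant as a comment:

    static int sumOf(int[] a) {
        int sum = 0;
        int i = 0;
        // (2) sum == a[0] + ... + a[i-1]   (trivially true: the empty sum)
        while (i < a.length) {
            sum = sum + a[i];
            i = i + 1;
            // (3) sum == a[0] + ... + a[i-1]
        }
        // after the loop: i == a.length, so sum == a[0] + ... + a[a.length-1]
        return sum;
    }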
Here we see that one and the same assertion holds before the loop starts
(2), at the end of each loop body (3), and after the loop stops. The
conclusion is then that sum is the sum of all numbers in the
array a. Because this assertion holds at all these places, it
is also called a loop invariant (i.e., an assertion that is true across
several statements).
(Note: the assertion is false at the entry of the loop body. At that
point, sum accounts only for the elements inspected so far, not for all of a.)

For years (the 1960's through the early 1980's), the holy grail of programming language research was the development of a calculus that would empower programmers to specify the initial state and the final state, and to create the program from these statements. Alternatively, the programmers would code something and an automatic theorem prover would verify the result. (See Dijkstra's monograph.) The effort collapsed when people couldn't figure out how to easily scale this kind of reasoning to programming languages with procedure definitions and procedure calls (and beyond).

The idea of programming triples, however, survived. To this day, it is a good idea to describe small code fragments as Hoare triples:

    PRE { programStatement } POST

PRE and POST are logical statements describing the state of the abstract machine before and after the statement runs. Since this kind of reasoning did not scale to programs with many procedures, it is also natural that people started describing the behavior of individual procedures with triples.
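For instance, a triple for a single assignment might read like this (an illustrative example):

    { x >= 0 }  y = x + 1;  { y >= 1 }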
Assertions for Method Boundaries

Although research on program correctness per se failed, the primary use of such triples survives at method boundaries, where assertions describe what a method expects from its caller and what it promises in return. Let us look at a canonical example:
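A minimal sketch of such an interface; the exact method names and signatures are assumptions:

    interface Queue {
        // adds an item to the queue
        void enq(Object item);

        // retrieves an item from the queue
        Object deq();

        // the number of items currently in the queue
        int size();

        // is the queue empty? (a newly created queue is)
        boolean isEmpty();
    }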
This Java interface specifies a collection of data structures with four
methods. A newly created queue is empty. With enq, we can
add an item; with deq, we can retrieve an item. The descriptions of
the two actions imply that they increase and decrease the size of
the queue. Their names and the name of the data structure finally suggest that
deq retrieves the item that has been in the queue the longest. In
short, our historical knowledge suggests a first-in/first-out data
structure.
In principle, it is possible to supplement the signatures and purpose statements of this interface with assertions in an equational logic that describe the behavior we just described informally. Here are just some sample equations:
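For illustration, reading the methods functionally, so that enq(q, x) stands for queue q with x added and deq(q) for the element that deq would return, such equations might look like this:

    size(empty)         = 0
    size(enq(q, x))     = size(q) + 1
    deq(enq(empty, x))  = x
    deq(enq(q, x))      = deq(q)        if q is not empty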
Given the stateful specification of the queue interface, however, it is
impossible to formulate a simple set of equations or logical assertions that
describe the complete behavior of Queue.
Still, many aspects of our informal descriptions can be translated into logical assertions, specifically into pre- and postconditions. Let's practice this idea with two questions: what may a method assume when it is called, and what must it guarantee when it returns? Here is a rewrite of the same interface with pre- and postconditions.
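A minimal sketch of such an annotated interface, with the conditions written as comments; the specific clauses are illustrative:

    interface Queue {
        // post: size() == OLD[size()] + 1
        void enq(Object item);

        // pre:  size() > 0
        // post: size() == OLD[size()] - 1
        Object deq();

        // post: RESULT >= 0
        int size();

        // post: RESULT == (size() == 0)
        boolean isEmpty();
    }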
The conditions contain two special notations. The first is
OLD[...]. It denotes the value of the expression before
the method call took place. The second one is RESULT. It is the
value that the method produces. Given these explanations, the pre- and
postconditions in the revised interface express all those conditions that we
mentioned above, except for those that provide a full-fledged formal
description of the methods' behavior. Still, experience shows that such
additional conditions quickly catch mistakes in the general logic, especially
in conjunction with other pre- and postconditions.
Let us look at a second example, the conversion of dollar amounts into
strings and vice versa. Since money amounts require precise calculations, it is
a good practice to develop a separate class of amounts, say Amount, with a
method print that renders an amount as a check-style string.
Furthermore, we may also want to provide a static factory method that reads a
String into an Amount:
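A sketch of such a class, assuming dollars and cents are kept as exact integers and that a check-style string looks like "1234.56":

    class Amount {
        private final long dollars;
        private final int cents;     // between 0 and 99

        Amount(long dollars, int cents) {
            this.dollars = dollars;
            this.cents = cents;
        }

        // renders this amount as a check-style string, e.g. "1234.56"
        String print() {
            return dollars + "." + (cents < 10 ? "0" + cents : "" + cents);
        }

        // static factory: parses a check-style string back into an Amount
        static Amount read(String s) {
            int dot = s.indexOf('.');
            return new Amount(Long.parseLong(s.substring(0, dot)),
                              Integer.parseInt(s.substring(dot + 1)));
        }
    }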
In either case, it makes sense to describe the obligation of print
or the obligation of read 's client with assertions that catch basic
mistakes.
The informal purpose statement implies two major characteristics about the string so far:
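Plausibly, these are that the string contains a dot exactly three characters from the end, separating dollars from cents, and that every other character is a digit. A sketch of such an assertion about print's result, written with a ∀-style comment and plain Java:

    // for s = amount.print():
    //   s.charAt(s.length() - 3) == '.'
    //   and  ∀ i ≠ s.length() - 3 :  Character.isDigit(s.charAt(i))
    static boolean isCheckStyle(String s) {
        if (s.length() < 4 || s.charAt(s.length() - 3) != '.') return false;
        for (int i = 0; i < s.length(); i++) {
            if (i != s.length() - 3 && !Character.isDigit(s.charAt(i))) return false;
        }
        return true;
    }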
As before, the assertion uses program variables and even programming
notation to express basic ideas, but also mathematical symbols, e.g.,
∀. Furthermore, it does not express the complete behavior of either
print or read but it expresses essential
characteristics. In particular, it clarifies what it means to supply or receive
a check-style string.
If this condition fails, something is seriously wrong with the method or its caller. Hence monitoring conditions such as these helps us defend ourselves against miscommunication among programmers. The key is then to not only add such statements but to turn them into a part of the running code.

From Assertions to Contracts

In the business world, a contract is a document that spells out the obligations of two (or more) participants in a joint activity. For example, a contract may specify that an oil company delivers oil to a residence and that the owner in turn pays for the oil. If either of the two parties doesn't live up to its obligations, the other one will go to a neutral arbiter who decides whether the claim is correct. If so, we say that one of the two parties got blamed.

This scenario suggests that turning assertions into useful program monitoring tools requires three things: roughly, we must make assertions executable, we must agree on which party is responsible for establishing which assertion, and, when an assertion fails, we must blame the guilty party with a useful explanation.
We can go about making assertions executable in two ways. On one hand, we can create a notation for assertions, have programmers express their thoughts in this notation, and translate it into code. On the other hand, we can simply say that all boolean expressions of the underlying programming language are assertions and that the monitor just executes them at opportune moments.

The boolean-expression approach is dominant in industrial languages, though only Eiffel treats these assertions as contracts. Meyer chose to equate boolean expressions with assertions for pragmatic reasons, even though he was keenly aware of the problems. The major problem is of course that a call to a boolean-valued function is an assertion, but the execution of this assertion may have side effects. That is, it may diverge, raise an exception, or change the state of a variable.

Research projects tend to use separate languages for contracts and assertions. Two prominent examples are the ESC/Java project at the former DEC SRC lab (short for extended static checking), which uses the Java Modeling Language (JML), and the Spec# project at Microsoft Research, which uses its own notation. The latter provides a programming language, a run-time monitoring system, and a theorem prover. If the theorem prover can verify that a party always meets its obligations, it can inform the compiler, which will then disable the corresponding monitoring code. Conversely, if the theorem prover cannot establish that the party lives up to the contract in all situations, the compiler will issue the appropriate monitoring code.

No matter how contract violations are found, it is important that the guilty party is blamed and that the error message is a good explanation of the problem. After all, the purpose of contracts is to protect programmers against miscommunication and to help them eliminate problems.

Note 1: Unfortunately, most contract systems for Java and Eiffel lack a semantic foundation. As a result, they may not notice a contract violation at all, blame the wrong party, or blame the proper party with an incorrect explanation. Furthermore, the pun between assertion variables and program variables has prevented programmers from expressing and monitoring assertions about objects (rather than just plain values). (See Findler's dissertation and research for detailed explanations.)

Note 2: Monitoring contracts can get expensive. It is therefore
important to think about what you want your contract system to check. The goal
is to find the proper compromise between checking basic properties and checking
them without much overhead in time, space, and energy. A basic property of a
method is an assertion whose violation suggests something fundamental went
wrong. For example, if there is no dot at the proper position in the result of print, the conversion is fundamentally broken, and checking for the dot costs almost nothing.
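As a sketch of how such a cheap check can become part of the running code, a caller of print (using the Amount sketch from above) might rely on Java's assert statement, enabled with the -ea flag:

    Amount amount = new Amount(1234, 56);
    String s = amount.print();
    // blame print if the result lacks a dot at the proper position
    assert s.charAt(s.length() - 3) == '.'
        : "print broke its contract: no dot at the proper position in \"" + s + "\"";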
Sequence Contracts

In addition to types (signatures) and basic conditions about the before and after state of the world, we often also want to ensure that methods are called in the proper order. Take a look at this simple integer file interface:
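A sketch of such an interface; the method names are assumptions:

    interface IPort {
        int readInt();        // produce the next integer from the file
        boolean hasMore();    // are there more integers to read?
        void close();         // release the file; no further calls are allowed afterwards
    }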
Presumably, we create an IPort with a name and then work with it:
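For instance, with a hypothetical factory method Ports.open and an illustrative file name:

    IPort p = Ports.open("numbers.dat");
    int sum = 0;
    while (p.hasMore()) {
        sum = sum + p.readInt();
    }
    p.close();
    // any further call to p.readInt() would violate the intended protocol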
Clearly, the intention is that consumers call the methods of IPort in a
particular sequence: first create the port, then read integers for as long as desired, and finally issue one call to close, after which no further method calls are allowed.

We call this a sequence contract and write something like the above in a compact notation.
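One plausible rendering borrows from regular expressions, where * means zero or more repetitions and create stands for the construction of the port:

    create (hasMore readInt)* close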
An alternative (but equivalent) notation is that of finite state machines:
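For illustration, such a machine might have just two states, again with the method names assumed above:

    OPEN:    readInt  -> OPEN
             hasMore  -> OPEN
             close    -> CLOSED
    CLOSED:  no method call is allowed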
This notation suggests an implementation via the state pattern [GoF]:
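A minimal sketch of that idea, assuming the IPort interface from above and an int array standing in for the file's contents; the port delegates every call to its current state object, and close swaps in a state that rejects all further calls:

    class PortClosedException extends RuntimeException {
        PortClosedException(String msg) { super(msg); }
    }

    class StatefulPort implements IPort {
        private interface State {    // the two states of the protocol
            int readInt();
            boolean hasMore();
            void close();
        }

        private State state;

        StatefulPort(int[] contents) {
            this.state = new Open(contents);
        }

        public int readInt()     { return state.readInt(); }
        public boolean hasMore() { return state.hasMore(); }
        public void close()      { state.close(); }

        private class Open implements State {
            private final int[] contents;
            private int next = 0;

            Open(int[] contents) { this.contents = contents; }

            public int readInt()     { return contents[next++]; }
            public boolean hasMore() { return next < contents.length; }
            public void close()      { state = new Closed(); }   // transition to CLOSED
        }

        private class Closed implements State {
            public int readInt()     { throw new PortClosedException("readInt after close"); }
            public boolean hasMore() { throw new PortClosedException("hasMore after close"); }
            public void close()      { throw new PortClosedException("close called twice"); }
        }
    }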
Contract Violations and Fault-Tolerant Programming

When contracts go wrong, the system is in trouble. Contracts monitor basic properties that should never go wrong. At the same time, however, the purpose of contracts is to check on the communication between disjoint parts of the system, and in common cases, the various parts of a system are not equals. If a lesser part fails, the overall system might still work at acceptable levels without the failing part.

This reasoning suggests a natural combination of contracts with techniques from fault-tolerant systems. Specifically, imagine that the composition of the system is flexible. While it runs, various components can join or leave the system. Naturally, the system contains essential components for which this is not true, but it might be true for peripheral or redundant parts. Then, when one of those redundant or peripheral subsystems fails, the kernel of the system can "unplug" the failing part and inform the rest of the components to work without it. Indeed, depending on how the system is configured, the remaining components may never need to know.

Consider a system that is playing a game. The game administrator is obviously an essential component. If it fails, nothing else can replace it. In contrast, if one of the players fails, perhaps due to a bad connection or perhaps due to cheating, the game system can obviously do without that player.

The internet is another example of a fault-tolerant system. When computers notice that some other computer no longer responds with acknowledgments, they route network traffic around the failing computer. If the failing computer is a mail server, they may hold the mail for several days and attempt to redeliver it on a regular basis.

In general, the use of assertions (invariants) and contracts has led to a number of recent attempts to produce self-healing or self-repairing systems. While some ideas have already appeared in industrial contexts, for many it is too early to tell whether they will make it out of the research lab.

Bibliography
last updated on Tue Jun 9 22:03:19 EDT 2009; generated with PLT Scheme