Alternative Data Designs
12 June 2011
Early this year, Stephen Bloch posted the following problem on the mailing list:
The game presents the player with a collection of randomly located dots on a canvas. Every second the collection changes and one point is added. The player uses the mouse to “chase” the points. The game ends when the player clicks on one of the dots, at which point the game software should display the text “Congratulations!” in the center of the canvas.
- a tick handler:
; DotGameWorld -> DotGameWorld ; create a random world that is one posn longer than this one - a rendering function:
; DotGameWorld -> Image ; render this world as a scene of dots - a mouse handler:
; DotGameWorld Nat Nat MouseEvent -> DotGameWorld ; is the player's mouse click on a dot in this world? - a function that checks whether the game is over:
; DotGameWorld -> Boolean ; is the game over?
; Nat -> DotGameWorld ; start the world with n+1 dots (define (main n) (big-bang (random-world (add1 n)) (on-tick add-one 1) (to-draw draw-dots) (on-mouse catch-one) (stop-when over?)))
With a bit of further reflection, it becomes clear that the mouse event handler plays two distinct roles. The API dictates that it processes a mouse click, meaning it consumes an element of DotGameWorld and some extra arguments and that it produces an element of DotGameWorld. The game specification demands that, in case the mouse click is on a dot, the world it returns signals that the game is over. Similarly, the stop-when handler wants to know which worlds signal that the game is over. Nothing in the data definition indicates, however, which lists represent a world in which the player has caught a dot. In short, it is unclear when the game should end.
The purpose of this note is to show how different data representations for one and the same “world” work differently with the constraints of a library’s interface. In this case, one “obvious” fix is to add a distinct set of “final” worlds to the data representation; another one is to designate one kind of world as special. Each choice comes with advantages and disadvantages, and your students need to learn to appreciate that choices of data representation have implications for the design of functions.
1 Strings are Final
(define (catch-one w x y me) (cond [(mouse=? "button-down" me) (if (is-on-a-dot? w x y) "CONGRATULATIONS!" w)] [else w]))
; DotGameWorld.v2 is one of ; – String ; – [Listof Posn] ; interpretation: a string represents a final world ; and a list represents the centers of the dots
; DotGameWorld.v2 -> DotGameWorld.v2 (define (add-one w) (cond [(string? w) ...] [(list? w) ...]))
; DotGameWorld.v2 -> Boolean ; is the game over? (define (over? w) (string? w))
What this exercise teaches us is that this kind of data representation is bad. It forces the designer of the functions to account for pieces of data that never show up. Even for a small program like this one, the extra overhead reduces the readability of the program. The situation calls for going beyond the first, obvious design.
2 Worlds and Final Worlds
At this point it is important to recall that a data definition describes a set of data. More precisely, it describes a subset of all possible values in our programming language, and even better, it provides instructions for how to construct all the elements. The question is how this understanding helps with our problem.
Here is how. We describe handlers as functions that consume elements of
DotGameWorld and allow the mouse event handler to produce elements
of DotGameWorld.v2, which is a superset of
DotGameWorld. Fortunately, the API tells us that the
stop-when function will process the result of a handler before it
flows back to other handlers—
- a tick handler:
; DotGameWorld -> DotGameWorld ; create a random world that is one posn longer than this one - a rendering function:
; DotGameWorld -> Image ; render this world (list of posns) as a scene of dots - a mouse handler:
; DotGameWorld Nat Nat MouseEvent -> DotGameWorld.v2 ; is the player's mouse click on a dot in this world? ; if so produce a string! - a function that checks whether the game is over, and that is only the case when the input is possibly a string:
; DotGameWorld.v2 -> Boolean ; is the game over?
; String -> Image ; draw the final world state, which is just a string (define (draw-final w) (overlay (text w 22 'red) BGRD))
; Nat -> World.v1 ; start the world with n+1 dots (define (main.v2 n) (big-bang (random-world (add1 n)) (on-tick add-one 1) (to-draw draw-dots) (on-mouse catch-one) (stop-when over? draw-final)))
And that is all there is to dealing with strings a bit more naturally. This treatment would also work in a language such as Java where subtyping is similar to the subset reasoning we use here, but it would fail in a strictly hierarchical language such as ML. In the latter world, we have to reason about the data representation of the game world in a third way.
3 Empty is Useless
This third way exploits a different insight, namely, that the game—
; DotGameWorld = [Listof Posn] ; Posn = (make-posn Nat Nat) ; interpretation: empty represents the final world, ; a non-empty list represents the centers of the dots
; Nat -> World.v1 ; start the world with n+1 dots (define (main.v3 n) (big-bang (random-world (add1 n)) (on-tick add-one 1) (to-draw draw-dots) (on-mouse catch-one.v3) (stop-when empty? draw-final.v3)))
; DotGameWorld -> Image ; draw the final scene, which is always the empty list (define (draw-final.v3 w) (overlay (text "CONGRATULATIONS!" 22 'red) BGRD))
The best point is that with this change, the “naive” design from the beginning of the essay works out perfectly. The rest of the program consists of some 10 lines of additional code.
4 Conclusion
The three sections show how the choice of a data representation and the existing APIs of a language are intertwined. In this particular example, the world teachpack allows a separation of rendering for intermediate state from the rendering for final states; because of this twist, we can use empty as the final world. In general, it is important to explore data representations for real programs and not to stick to the first one that comes to mind.
Another point to observe is that different representations come with different advantages. Consider a small change to the problem statement:
... at which point the software should display the text “Congratulations: <nm>” in the center of the canvas, where <nm> is the number of dots on the screen. Players can now measure and compare their performance.
(string-append "Congratulations: your game world contains " (number->string (length w)) " dots. Can you do better next time?")
When you teach a course that allows sufficiently interesting trade-offs like the ones discussed here, formulate exercises that tease out these problems. Your students will greatly benefit from exploring and comparing alternative data representation ideas. That’s how real life works, and with DrRacket it is easy to explore trade-offs in a technically meaningful way.