5 — Help!
Guest Presenters: Suzanne Becker, Julia Belyakova
Tuesday, 21 January 2020
Let’s take a breather and slowly walk through what we have learned.
Basic Approach to Assignment 1
Count. Let us recall the design recipe. Here is an ISL solution to count we implemented in class.
We start with a data definition.
#lang racket

; An S is one of:
; - Name
; - [List S]

Provide a type signature and a purpose statement.

; S -> Number
; Counts the number of Names in the given S

Work thru examples and formulate the results as test cases.

(check-expect (count "a") 1)
(check-expect (count `("a" "b" "c")) 3)
(check-expect (count `("a" ("b") "c")) 3)

If you’re stuck, recall the template step: one clause per case of S, one selector per field, one recursion per self-reference.

Complete the solution.

; S -> Number
; Counts the number of Names in the given S
(define (count s)
  (cond
    [(string? s) 1]
    [else (foldr + 0 (map count s))]))

(check-expect (count "a") 1)
(check-expect (count `("a" "b" "c")) 3)
(check-expect (count `("a" ("b") "c")) 3)

Confirm with tests.
At this point, we have an implementation of count and unit tests. The only thing we are missing is a test harness that calls count on input read from stdin.
Let us write one in Racket:
#lang racket

(provide
 ; Read in JSON as string from stdin, convert to s-exp,
 ; apply the given func to input received,
 ; convert output to str and write to stdout
 run)

(require json)
; imports:
; type JSexpr = String || [Listof JSexpr] || ...
; read-json: -> JSexpr ; from default port STDIN
; write-json: JSexpr -> Void ; to default port STDOUT

(define (run func)
  (write-json (func (read-json)))
  (newline))
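For readers working in a different language, the same harness can be sketched in Python. This is only an illustration under stated assumptions, not the course's reference solution: run and count mirror the Racket names, while the source and sink parameters are additions (not in the original) that let the harness be exercised without real stdin and stdout.

```python
import io
import json
import sys


def count(s):
    """S -> int: count the Names (strings) in a nested list of strings."""
    if isinstance(s, str):
        return 1
    return sum(count(e) for e in s)


def run(func, source=sys.stdin, sink=sys.stdout):
    """Read one JSON value from source, apply func, write the result as JSON.

    source and sink default to stdin/stdout; they are parameters only so
    that tests can pass in-memory streams (an assumption of this sketch).
    """
    json.dump(func(json.load(source)), sink)
    sink.write("\n")


# Usage with in-memory streams instead of real stdin/stdout:
demo_out = io.StringIO()
run(count, io.StringIO('["a", ["b"], "c"]'), demo_out)
print(demo_out.getvalue(), end="")  # prints 3
```

As in the Racket version, the harness knows nothing about count itself; any function from decoded JSON to JSON-encodable data can be passed in.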
Context. Here, we had to use an accumulator.
#lang racket

; S -> T
; Replaces every Name in S with its depth in the arrays
(define (context sg)
  (local
    ; S N -> T
    ; dep tracks the depth of the current s in sg
    ((define (context/acc s dep)
       (cond
         [(string? s) dep]
         [else (for/list ([e s])
                 (context/acc e (add1 dep)))])))
    (context/acc sg 0)))

(check-expect (context "a") 0)
(check-expect (context `("s" "a")) `(1 1))
(check-expect (context `(("s" ("bb" "t")) "a")) `((2 (3 3)) 1))
JSON Representation vs Internal Representation
In Assignment 1, we have a one-to-one correspondence between constructors of valid JSON inputs as read by read-json and constructors of S:
JSON S |
| | |
|-- String |-- Name |
| | |
|-- Array of JSON |-- List of S |
And the same holds for outputs.
Now, consider the data flow in the ISL solution for context:
JSON |
‖ |
‖ (JSON reader) |
▽ |
JSexpr representation of input |
‖ |
‖ (context) |
▽ |
JSexpr representation of output |
‖ |
‖ (JSON printer) |
▽ |
JSON |
Because S and JSON are so similar, we implicitly reuse the JSON representation (information) as S (data):
JSON |
‖ |
‖ (JSON reader) |
▽ |
JSexpr Representation of input |
┆ |
┆ |
▽ |
S input |
‖ |
‖ (context) |
▽ |
T output |
┆ |
┆ |
▽ |
JSexpr Representation of output |
‖ |
‖ (JSON printer) |
▽ |
String |
Can we use the same approach in Assignment 2?
Let us think about the information encoded in VExpr and distill relevant bits from its JSON representation.
JSON VExpr |
| | |
|-- String |-- Var |
| | |
|-- Integer |-- Int |
| | |
|-- Array |-- Arithmetic Operation |
|- [JSON, "+", JSON] |-- Addition(VExpr, VExpr) |
|- [JSON, "*", JSON] |-- Multiplication(VExpr, VExpr) |
|- [ | |
[let,String,"=",JSON], |-- Scope( |
…, Array of Declaration(Var, VExpr), |
JSON VExpr |
] ) |
If we work directly on the JSON representation of VExpr, we run into several problems:
difficult discrimination on data (consider the case of the JSON array);
potentially repetitive and thus inefficient matching of array values;
clutter when dealing with JSON arrays;
superfluous pieces of data such as let and "=".
To avoid those problems, we can explicitly define a better data representation of VExpr and work with it instead of your language’s representation of JSON.
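To make the idea concrete, here is one possible internal representation of VExpr, sketched in Python, together with a parser that confines all discrimination on JSON arrays to a single place. The class and field names, the helper names parse_vexpr and parse_declaration, and the assumption that let arrives as the JSON string "let" are choices made for this sketch; the exact Assignment 2 grammar may differ.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Int:
    value: int


@dataclass(frozen=True)
class Addition:
    left: "VExpr"
    right: "VExpr"


@dataclass(frozen=True)
class Multiplication:
    left: "VExpr"
    right: "VExpr"


@dataclass(frozen=True)
class Declaration:
    var: Var
    rhs: "VExpr"


@dataclass(frozen=True)
class Scope:
    declarations: tuple  # tuple of Declaration
    body: "VExpr"


VExpr = Union[Var, Int, Addition, Multiplication, Scope]


def parse_declaration(d):
    """Decoded JSON -> Declaration; drops the superfluous "let" and "="."""
    if (isinstance(d, list) and len(d) == 4 and d[0] == "let"
            and isinstance(d[1], str) and d[2] == "="):
        return Declaration(Var(d[1]), parse_vexpr(d[3]))
    raise ValueError(f"not a declaration: {d!r}")


def parse_vexpr(j):
    """Decoded JSON -> VExpr; the only place that inspects JSON arrays."""
    if isinstance(j, str):
        return Var(j)
    if isinstance(j, int) and not isinstance(j, bool):
        return Int(j)
    if isinstance(j, list) and j:
        if len(j) == 3 and j[1] == "+":
            return Addition(parse_vexpr(j[0]), parse_vexpr(j[2]))
        if len(j) == 3 and j[1] == "*":
            return Multiplication(parse_vexpr(j[0]), parse_vexpr(j[2]))
        *decls, body = j  # everything before the last element is a declaration
        return Scope(tuple(parse_declaration(d) for d in decls),
                     parse_vexpr(body))
    raise ValueError(f"not a VExpr: {j!r}")
```

Once the input is parsed, the rest of the program works with Var, Int, Addition, Multiplication, and Scope only, and never has to look at array lengths or the strings "let" and "=" again.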
To walk through a concrete example, let’s get back to Assignment 1.
If we have internal representations of S and T, the data flow looks like this:
JSON |
‖ |
‖ (JSON reader) |
▽ |
JSexpr representation of input |
‖ |
‖ (S parser) |
▽ |
S (input) |
‖ |
‖ (context) |
▽ |
T (output) |
‖ |
‖ (T printer) |
▽ |
JSexpr Representation of output |
‖ |
‖ (JSON printer) |
▽ |
String |
If we are in an OO setting, a good way to encode S and T would be to use classes.
For example, here is a class-based solution of context in Racket:
#lang racket

; An S is one of: Array% or Name%.
(define S<%>
  (interface ()
    ; S<%> -> T<%>
    context))

; represent JSON arrays of S JSONs
(define Array%
  (class* object% (S<%>)
    (init-field array)
    (super-new)
    (define/public (context depth)
      (new Array%
           [array (for/vector ([x array])
                    (send x context (+ depth 1)))]))))

; represent JSON strings of S JSONs
; (Int%, the class for a depth in T, is not shown here)
(define Name%
  (class* object% (S<%>)
    (init-field string)
    (super-new)
    (define/public (context depth)
      (new Int% [depth depth]))))
Note that context is defined as a method.
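The full data flow from the diagram above (S parser, context, T printer) can also be sketched in Python with classes. This is a hedged illustration, not a translation of the course solution: the names parse_s, to_json, Depth, and TArray are hypothetical, and unlike the Racket code this sketch uses separate classes for the T side.

```python
from dataclasses import dataclass
from typing import List


class S:
    """An S is a Name or an Array of S."""

    def context(self, depth=0):
        raise NotImplementedError


class T:
    """A T is a Depth or a TArray of T."""

    def to_json(self):
        raise NotImplementedError


@dataclass
class Depth(T):
    depth: int

    def to_json(self):  # T printer, integer case
        return self.depth


@dataclass
class TArray(T):
    items: List[T]

    def to_json(self):  # T printer, array case
        return [t.to_json() for t in self.items]


@dataclass
class Name(S):
    string: str

    def context(self, depth=0):
        return Depth(depth)


@dataclass
class Array(S):
    items: List[S]

    def context(self, depth=0):
        # Elements of an array sit one level deeper than the array itself.
        return TArray([s.context(depth + 1) for s in self.items])


def parse_s(j):
    """S parser: decoded JSON (str or nested list) -> S."""
    if isinstance(j, str):
        return Name(j)
    if isinstance(j, list):
        return Array([parse_s(e) for e in j])
    raise ValueError(f"not an S: {j!r}")


# JSexpr -> S -> T -> JSexpr
print(parse_s([["s", ["bb", "t"]], "a"]).context().to_json())
# prints [[2, [3, 3]], 1]
```

The only function that inspects raw JSON data is parse_s; context and to_json are selected by method dispatch on the class of the receiver.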
A Solution in Julia, an Emerging, OO Programming Language
Next, let us look at a different solution to count, in a completely new language, Julia.
# S -> Int |
# Returns the number of Names in the given S |
count(s::SName) = 1 |
count(s::SArr) = sum(map(count, s.val)) |
The right-hand sides of the two definitions are straightforward, and, at first glance, count appears to pattern match on the SName and SArr constructors. But let’s look at the definition of S:
# Abstract S |
abstract type S end |
|
# Name case of S |
struct SName <: S |
val :: String |
end |
|
# Array case of S |
struct SArr <: S |
val :: Vector{S} # like Array<S> in Java |
end |
We can see that structs look more like Java classes, and <: S is reminiscent of inheritance or interface implementation.
So, what happens in count is (multiple) dynamic dispatch, similar to the dispatch of Java methods. The main difference between the Julia code and the OO Racket code we saw earlier is that count is defined externally as a function, not internally as a method.
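A rough Python analogue of an externally defined, dispatched function is functools.singledispatch from the standard library. The sketch below mirrors the Julia definitions of S, SName, and SArr. Note one difference: singledispatch selects an implementation based on the type of the first argument only, so it is single dispatch, not Julia's multiple dispatch.

```python
from dataclasses import dataclass
from functools import singledispatch
from typing import List


class S:
    """Abstract S (plays the role of Julia's abstract type S)."""


@dataclass
class SName(S):
    val: str


@dataclass
class SArr(S):
    val: List[S]


@singledispatch
def count(s):
    """S -> int: returns the number of Names in the given S."""
    raise TypeError(f"not an S: {s!r}")


@count.register
def _(s: SName):
    return 1


@count.register
def _(s: SArr):
    return sum(map(count, s.val))


example = SArr([SName("a"), SArr([SName("b")]), SName("c")])
print(count(example))  # prints 3
```

As in Julia, count lives outside the classes, and no case of count inspects the type of its argument by hand.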
Respect Your Data Representation
Depending on the programming language, there can be one or more ways to encode the same information as a data representation and to discriminate on the data.
Object-oriented. Data representation: interfaces/abstract classes and classes/structs. Algorithm: methods. Discrimination mechanism: method-based dynamic dispatch. (Java, Racket, Rust, Python, JavaScript)
(Typed) functional. Data representation: algebraic data types, possibly w/o type checking. Algorithm: function. Discrimination mechanism: pattern matching. (Haskell, OCaml, Racket)
Dynamically typed. Data representation: structs/classes. Algorithm: function with if/cond/switch. Discrimination mechanism: manual dynamic dispatch. (ISL+, JavaScript, Python) Note: do not ever use instanceof in OOPLs.
Multiple dispatch. Data representation: abstract types and structs. Algorithm: functions. Discrimination mechanism: multiple dynamic dispatch. (Julia, Cecil)
Note that while it is usually possible to do manual dynamic dispatch in languages such as Java or Julia, this is not recommended. Language designers and implementors work hard to make primary language mechanisms (such as dynamic dispatch of methods) work well and fast, but the same cannot be said about manual dispatch.
In languages such as Racket and ISL+, on the other hand, manual dispatch using <name>? predicates is a primary mechanism of discriminating on data. (The pattern matching in Racket conforms to the coding patterns you saw in Fundamentals I.)
Bad Programming Mistakes
Finally, let us look at a bad Julia solution to replace and discuss what’s wrong with it.
This solution combines several bad elements found in your solutions.
function replace(s::SName) |
str = s.val |
r = Vector{Char}(str) |
for i in 1:length(r)-1 |
r[i] = 218 - Int(str[i]) |
end |
SName(String(r)) |
end |
|
function replace(s::SArr) |
ts = Vector{S}(undef, length(s.val)) |
for i in 0:length(ts)-1 |
ts[i] = replace(s.val[i]) |
end |
SArr(ts) |
end |
This solution has the following problems:
the lack of a helper function for constructing a dual string;
the use of for loops;
the use of (bad) indexes;
an impossible-to-read formula instead of indexing;
the occurrence of a magic constant (218 is actually incorrect; it should be 219 because it is Int('z') + Int('a'), which is equal to 122 + 97, and 219 is also a magic constant);
meaningless variable names.
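For contrast, here is a sketch in Python of how the listed problems could be avoided: a helper for the dual string, named constants instead of 218 or 219, and no index arithmetic. It assumes that Names consist of lowercase ASCII letters only, which is what the constant 219 in the original code implies. The names dual_char, dual_string, FIRST_LETTER, and LAST_LETTER are hypothetical.

```python
from dataclasses import dataclass

FIRST_LETTER = "a"
LAST_LETTER = "z"


def dual_char(c):
    """Mirror a lowercase letter within the alphabet: a <-> z, b <-> y, ..."""
    return chr(ord(FIRST_LETTER) + ord(LAST_LETTER) - ord(c))


def dual_string(name):
    """Replace every letter of name with its dual."""
    return "".join(dual_char(c) for c in name)


@dataclass
class SName:
    val: str

    def replace(self):
        return SName(dual_string(self.val))


@dataclass
class SArr:
    val: list  # list of SName or SArr

    def replace(self):
        # No indexes: build the result directly from the elements.
        return SArr([s.replace() for s in self.val])


print(dual_string("abcw"))  # prints zyxd
```

Because every character is visited by iteration, the off-by-one errors of the index-based loops cannot occur here.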
Finally, let’s talk about testing.
We noticed that some of you tested equality by rendering objects as strings and comparing strings, either directly or indirectly in the context of unit tests. For example, in the following unit test for replace, S is rendered as a string via JSON:
@test JSON.json(S2json(replace(exSArr2))) == "[\"zyxd\]" |
This style of testing has several problems:
If the JSON library changes its JSON-to-string printer, all string-based tests will break.
In the example above, we are testing three things at once: replace, S2json, and JSON.json. The last two might interfere with the result of replace.
It is extremely easy to make mistakes in a string representation (as demonstrated by the test above), and you might spend a lot of time figuring out whether the problem is in your tests or program.
It is hard to read long strings that represent structured data.
Note: This issue is the same as with reading code that uses strings to represent information that comes with some structure. Strings don’t have structure, and reading a structure-free representation forces a person to discern the structure in parallel.
Solution: You want to compare your chosen data representations directly. If your language does not automatically support (structural) equality, you need to define it. For example, in Julia, we can do it like this:
# for redefining equality |
import Base.== |
|
# Structural equality for S |
==(s1::SName, s2::SName) = s1.val == s2.val |
==(s1::SArr, s2::SArr) = s1.val == s2.val |
If your language requires a lot of boilerplate for creating data structures (to use in unit tests), it might be handy to define some helper functions. Besides that, if you have a nice concrete syntax in mind, you can implement a reader to convert information from this syntax into the data structure.
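As an illustration in Python: classes declared with the standard @dataclass decorator get a field-by-field __eq__ generated automatically, so tests can compare S values directly. The helper s below is a hypothetical example of the kind of reader mentioned above; it builds an S from nested Python lists of strings so that test data stays short.

```python
from dataclasses import dataclass


@dataclass
class SName:
    val: str


@dataclass
class SArr:
    val: list  # list of SName or SArr


def s(x):
    """Test helper: nested lists of strings -> S."""
    if isinstance(x, str):
        return SName(x)
    return SArr([s(e) for e in x])


# Structural comparison of data, with no rendering to strings involved.
assert s(["a", ["b"]]) == SArr([SName("a"), SArr([SName("b")])])
assert s("a") != s("b")
```

A unit test for replace can then state its expected result as data, for example as s(["zyxd"]), instead of as a hand-written JSON string.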