A Spelling Corrector
For this homework set, you will design a Rust version of Norvig’s spelling corrector.
The objective of this homework is to deepen your knowledge of Rust and its module system.
Deadline: spelling_corrector is due on Friday 20 February NOON.
The final deadline for the memo is Friday 20 February NOON.
Your Favorite Project
Describe the purpose of your chosen code base in a one-page memo. The memo should address three points: its user-facing purpose, its internal organization, and the goal of your Rust extension.
Dos: Imagine the chosen code base as a concrete object—
Don'ts: Use judgmental language; try to convince the reader why you chose this code base.
Draft: The draft of the memo is due on Friday 13 February NOON on Ms. Biron's desk. Drop off a printed page in person. Make sure the memo displays your email addresses so that Ms. Biron can contact you and meet with you to discuss her corrections.
spelling_corrector
The purpose of spelling_corrector is to find possible corrections for misspelled words. It consists of two parts. The first part is a training module that consumes a corpus of correctly spelled words and counts how often each word occurs. The second part uses the results of the first to check individual words. Specifically, it checks whether a given word is spelled correctly according to the training data and, if not, whether "small edits" create a variant that is correctly spelled. A small edit is one of the following:
the deletion of one letter;
the transposition of two neighboring letters;
the replacement of one letter with another letter; and
the insertion of a letter at any position.
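The four kinds of small edits above can be generated mechanically. The following is a minimal sketch, not a prescribed solution: the function name `edits1` is borrowed from Norvig's essay, and the assumption that replacements and insertions only use the lowercase letters a to z is ours.

```rust
/// Letters used for replacements and insertions.
/// Assumption: the corpus is lowercase English, so a to z suffices.
const ALPHABET: &str = "abcdefghijklmnopqrstuvwxyz";

/// Returns every string one "small edit" away from `word`.
/// The result may contain duplicates (e.g. inserting the same letter
/// before or after an identical letter), which is harmless for lookup.
fn edits1(word: &str) -> Vec<String> {
    let chars: Vec<char> = word.chars().collect();
    let n = chars.len();
    let mut out: Vec<String> = Vec::new();

    // Deletion of one letter.
    for i in 0..n {
        let mut v = chars.clone();
        v.remove(i);
        out.push(v.into_iter().collect());
    }
    // Transposition of two neighboring letters.
    for i in 0..n.saturating_sub(1) {
        let mut v = chars.clone();
        v.swap(i, i + 1);
        out.push(v.into_iter().collect());
    }
    // Replacement of one letter with a different letter.
    for i in 0..n {
        for c in ALPHABET.chars().filter(|&c| c != chars[i]) {
            let mut v = chars.clone();
            v[i] = c;
            out.push(v.into_iter().collect());
        }
    }
    // Insertion of a letter at any of the n + 1 positions.
    for i in 0..=n {
        for c in ALPHABET.chars() {
            let mut v = chars.clone();
            v.insert(i, c);
            out.push(v.into_iter().collect());
        }
    }
    out
}
```

For a word of n letters this yields n deletions, n - 1 transpositions, 25n replacements, and 26(n + 1) insertions. Working on a `Vec<char>` rather than raw bytes keeps any multi-byte characters in the input intact.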
Once the second part has generated all possible candidates for a potentially misspelled word, it picks the one that occurs most frequently in the training corpus. If none of the candidates is a correct word, spelling_corrector reports a failure.
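The training step and the selection step could live in separate modules, which also exercises Rust's module system. The sketch below is hypothetical: the module names `train` and `correct`, the tokenization rule (runs of ASCII letters, lowercased, as in Norvig's essay), and the tie-breaking rule are our assumptions, not part of the assignment.

```rust
/// Hypothetical training module: turns a corpus into word counts.
mod train {
    use std::collections::HashMap;

    /// Counts occurrences of each word. Assumption (following Norvig):
    /// a word is a maximal run of ASCII letters, compared in lowercase.
    pub fn count_words(corpus: &str) -> HashMap<String, usize> {
        let mut counts: HashMap<String, usize> = HashMap::new();
        for word in corpus
            .split(|c: char| !c.is_ascii_alphabetic())
            .filter(|w| !w.is_empty())
        {
            *counts.entry(word.to_ascii_lowercase()).or_insert(0) += 1;
        }
        counts
    }
}

/// Hypothetical checking module: picks among candidate spellings.
mod correct {
    use std::collections::HashMap;

    /// Returns the known candidate with the highest training count,
    /// or `None` (a failure) if no candidate occurs in the corpus.
    /// Ties go to the candidate that appears first in `candidates`.
    pub fn best<I>(candidates: I, counts: &HashMap<String, usize>) -> Option<String>
    where
        I: IntoIterator<Item = String>,
    {
        let mut winner: Option<(String, usize)> = None;
        for cand in candidates {
            if let Some(&n) = counts.get(&cand) {
                // Replace the current winner only on a strictly higher count.
                if winner.as_ref().map_or(true, |w| n > w.1) {
                    winner = Some((cand, n));
                }
            }
        }
        winner.map(|(word, _)| word)
    }
}
```

A caller would first check `counts.contains_key(word)` and, only if the word is unknown, pass its one-edit variants to `correct::best`; a `None` result corresponds to the failure case described above.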
Mechanically, spelling_corrector consumes a training file on the command line and then reads words—
train.txt for the training corpus;
test.txt for testing the working program; and
output.txt for testing the working program.
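Because the rest of the input and output description is cut off above, the following I/O loop is only a sketch under stated assumptions: one word per input line, and an output format (`word` if it is correct, `word, fix` if a correction was found, `word, -` on failure) that we chose for illustration. The function `run` is hypothetical and takes the correction logic as a closure so that it can be tested without files.

```rust
use std::io::{self, BufRead, Write};

/// Reads one word per line from `input` and writes one result line per
/// word to `output`. `correct` returns the word itself if it is spelled
/// correctly, a different word if it found a correction, and `None` on
/// failure. The output format here is an assumption, not the spec.
fn run<R: BufRead, W: Write>(
    input: R,
    mut output: W,
    correct: impl Fn(&str) -> Option<String>,
) -> io::Result<()> {
    for line in input.lines() {
        let line = line?;
        let word = line.trim();
        if word.is_empty() {
            continue; // skip blank lines
        }
        match correct(word) {
            Some(fix) if fix == word => writeln!(output, "{}", word)?,
            Some(fix) => writeln!(output, "{}, {}", word, fix)?,
            None => writeln!(output, "{}, -", word)?,
        }
    }
    Ok(())
}
```

A `main` could obtain the training file path with `std::env::args().nth(1)`, load the corpus with `std::fs::read_to_string`, build the word counts, and then call `run(io::stdin().lock(), io::stdout().lock(), ...)`. Accepting any `BufRead` and `Write` lets the same function read from standard input in the real program and from an in-memory buffer, or a file such as test.txt, in a test.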