A Spelling Corrector

For this homework set, you will design a Rust version of Norvig’s spelling corrector.

The objective of this homework is to deepen your knowledge of Rust and its module system.

Deadline: spelling_corrector is due on Friday 20 February NOON.

The final deadline for the memo is Friday 20 February NOON.

Your Favorite Project

Describe the purpose of your chosen code base in a one-page memo. The memo should address three points: its user-facing purpose, its internal organization, and the goal of your Rust extension.

Dos: Imagine the chosen code base as a concrete object (a painting, a vase, an automobile) and consider how you would describe its distinguishing features.

Don'ts: Use any judgmental language. Try to convince the reader why you chose this code base.

Draft: The draft of the memo is due on Ms. Biron’s desk on Friday 13 February NOON. Drop off a page in person. Make sure the memo displays your email addresses so that Ms. Biron can contact you and meet with you to discuss her corrections.

spelling_corrector

The purpose of spelling_corrector is to find possible corrections for misspelled words. It consists of two parts. The first part is a training module that consumes a corpus of correctly spelled words and counts the number of occurrences of these words. The second part uses the results of the first to check individual words. Specifically, it checks whether some given word is spelled correctly according to the training module and, if not, whether "small edits" create a variant that is correctly spelled.
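The training part described above could be sketched as follows. The function name `train` and the definition of a word (a maximal run of ASCII letters, lowercased) are assumptions of this sketch; the assignment does not fix either.

```rust
use std::collections::HashMap;

/// Count how often each word occurs in a training corpus.
/// Assumption: a "word" is a maximal run of ASCII letters, compared
/// case-insensitively by lowercasing it first.
fn train(corpus: &str) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for word in corpus
        .split(|c: char| !c.is_ascii_alphabetic())
        .filter(|w| !w.is_empty())
    {
        // Insert the word with a count of 0 if it is new, then bump the count.
        *counts.entry(word.to_ascii_lowercase()).or_insert(0) += 1;
    }
    counts
}
```

With this representation, a word counts as correctly spelled exactly when it appears as a key in the resulting map.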

Given a word, an edit action is one of the following:
  • the deletion of one letter;

  • the transposition of two neighboring letters;

  • the replacement of one letter with another letter; and

  • the insertion of a letter at any position.
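The four edit actions can be illustrated with one function that produces every result of a single edit. This is a minimal sketch, not a required design: the name `edits1` is borrowed from Norvig's essay, and the restriction to lowercase ASCII letters is an assumption that keeps the byte-based slicing safe.

```rust
use std::collections::HashSet;

/// All strings reachable from `word` by exactly one edit action.
/// Assumption: `word` consists of lowercase ASCII letters only, so that
/// byte offsets and letter positions coincide.
fn edits1(word: &str) -> HashSet<String> {
    let letters = "abcdefghijklmnopqrstuvwxyz";
    let mut out = HashSet::new();
    // Consider every split point, including the one after the last letter.
    for i in 0..=word.len() {
        let (left, right) = word.split_at(i);
        // Deletion: drop the first letter of `right`.
        if !right.is_empty() {
            out.insert(format!("{}{}", left, &right[1..]));
        }
        // Transposition: swap the first two letters of `right`.
        if right.len() >= 2 {
            out.insert(format!("{}{}{}{}", left, &right[1..2], &right[..1], &right[2..]));
        }
        for c in letters.chars() {
            // Replacement: substitute a different letter for the first letter of `right`.
            if !right.is_empty() && !right.starts_with(c) {
                out.insert(format!("{}{}{}", left, c, &right[1..]));
            }
            // Insertion: put `c` between `left` and `right`.
            out.insert(format!("{}{}{}", left, c, right));
        }
    }
    out
}
```

Collecting into a `HashSet` removes the duplicates that different edit actions produce for the same word.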

In this context, Norvig suggests that "small edits" means the application of one edit action possibly followed by the application of a second one to the result of the first.
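One way to express "one edit action possibly followed by a second" is to apply a single-edit function twice and collect everything it yields. In the sketch below, `small_edits` and its parameter `edit_once` are hypothetical names; `edit_once` stands for any function that returns all results of one edit action.

```rust
use std::collections::HashSet;

/// Apply `edit_once` to `word`, then apply it again to every result of the
/// first round, and return the union of both rounds.
fn small_edits<F>(word: &str, edit_once: F) -> HashSet<String>
where
    F: Fn(&str) -> HashSet<String>,
{
    let first = edit_once(word);
    // Start from the one-edit results, then add the two-edit results.
    let mut all = first.clone();
    for w in &first {
        all.extend(edit_once(w));
    }
    all
}
```

The second round multiplies the number of candidates considerably, so an implementation may prefer to filter against the known words as it goes rather than materialize the whole set.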

Once the second part has generated all possible candidates for a potentially misspelled word, it picks the most frequently used one from the training corpus. If none of the candidates is a correct word, spelling_corrector reports a failure.
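The selection step might look like the following sketch. The name `best_candidate` and the use of `Option` to signal failure are assumptions. The text does not say how to break ties between equally frequent candidates; this sketch resolves them by string order, which is an arbitrary choice.

```rust
use std::collections::{HashMap, HashSet};

/// Return the candidate with the highest training count, or `None` if no
/// candidate occurs in the training data.
/// Assumption: ties are broken in favor of the lexicographically greatest
/// word, simply because the (count, word) pairs are compared as tuples.
fn best_candidate<'a>(
    candidates: &'a HashSet<String>,
    counts: &HashMap<String, usize>,
) -> Option<&'a str> {
    candidates
        .iter()
        // Keep only candidates that are known words, paired with their counts.
        .filter_map(|c| counts.get(c).map(|&n| (n, c.as_str())))
        .max()
        .map(|(_, word)| word)
}
```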

Mechanically, spelling_corrector consumes a training file on the command line and then reads words, one per line, from standard input. For each word from standard input, spelling_corrector prints one line. The line consists of just the word if it is spelled correctly. If the word is not correctly spelled, spelling_corrector prints the word and the best improvement or "–" if there aren’t any improvements.
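The read-and-print loop can be sketched independently of the correction logic. Everything named here is hypothetical: the `Verdict` type, the `check_all` function, and the `check` callback that decides the fate of one word. The separator between the word and its improvement (", ") and the plain ASCII "-" for the no-improvement marker are also assumptions, since the text above does not pin down the exact output characters.

```rust
use std::io::{self, BufRead, Write};

/// Outcome of checking one word (hypothetical type for this sketch).
enum Verdict {
    Correct,
    Suggestion(String),
    NoSuggestion,
}

/// Read one word per line from `input` and write one line per word to `output`.
fn check_all<R, W, F>(input: R, mut output: W, check: F) -> io::Result<()>
where
    R: BufRead,
    W: Write,
    F: Fn(&str) -> Verdict,
{
    for line in input.lines() {
        let line = line?;
        let word = line.trim();
        // Assumption: blank lines carry no word and are skipped.
        if word.is_empty() {
            continue;
        }
        match check(word) {
            // A correctly spelled word is echoed on its own.
            Verdict::Correct => writeln!(output, "{}", word)?,
            // Assumed format: the word, a separator, then the best improvement.
            Verdict::Suggestion(best) => writeln!(output, "{}, {}", word, best)?,
            // Assumed marker for "no improvement found".
            Verdict::NoSuggestion => writeln!(output, "{}, -", word)?,
        }
    }
    Ok(())
}
```

In a full program, `main` would presumably take the training file path from `std::env::args()`, build the word counts, and then call a function like this with `io::stdin().lock()` and `io::stdout()`. Keeping the loop generic over `BufRead` and `Write` makes it testable without real files.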

Use the following files for your program: