Table of Contents
cs19s23as05: Rhyming Dictionary
Goals
- Practice with leveraging STL types.
- Work with an interesting dataset.
- Write a program with time-efficiency in mind.
Prerequisites
This assignment requires familiarity with the lecture materials presented in class through week 05.
Background
CMUdict
The Carnegie Mellon University Pronouncing Dictionary is an open-source machine-readable pronunciation dictionary for North American English that contains over 134,000 words and their pronunciations.
We have the CMUdict dataset available on our server, most importantly in file /srv/datasets/cmudict/cmudict.dict
(also available via HTTP).
The format of file cmudict.dict
is as follows:
Lines consist of a word followed by its pronunciation as a sequence of phonemes in ARPAbet format, e.g.:
dinosaur D AY1 N AH0 S AO2 R superstore S UW1 P ER0 S T AO2 R
Words with multiple pronunciations are suffixed with (n), e.g.:
subject S AH0 B JH EH1 K T subject(2) S AH1 B JH IH0 K T
Phonemes that indicate a syllable have integer suffixes:
- Unstressed syllables:
0
- Primary stressed syllables:
1
- Secondary stressed syllables:
2
Assignment
You shall write a C++ program that acts as a rhyming dictionary. The program shall:
- Expect a one-word command-line argument that will serve as a case-insensitive query.
- Consider rhymes to be those words that share identical phoneme patterns from the final stressed phoneme to the end of the word.
- Print, in lowercase and in alphabetical order, all words from the CMUdict dataset that rhyme with the query, excluding the query itself.
- By default, print only those rhymes that also have the same number of syllables as the query.
- Otherwise, if a second command-line exists and is the string
-a
, print all words that rhyme with the query, regardless of syllable count.
Make sure to choose your data types and logic carefully! Review the materials from week 05 in which we discussed the strengths and weaknesses of the different STL containers. Your program will be expected to terminate in no more than five times the running time of the sample executable, and consume no more than ten times its amount of memory, when compiled with -Ofast
.
Sample Executable
Sample executable cs19_cmudict_rhymes
exists on the server and demonstrates the expected behavior of your program:
$ cs19_cmudict_rhymes rhyme beim chime climb crime dime grime haim heim hime i'm kime lime lyme mime prime rime seim sime slime syme thyme time $ cs19_cmudict_rhymes rhyme -a all-time anticrime beim chime climb clothestime crime dime grime haim heim hime i'm kime lime lyme mime one-time onetime part-time prime rime seim sime slime sublime syme thyme time $ cs19_cmudict_rhymes putin shootin' $ cs19_cmudict_rhymes putin -a highfalutin rasputin shootin'
Leaderboard
As submissions are received, this leaderboard will be updated with the top-performing fully/nearly functional solutions, with regard to execution speed.
Rank | Test Time (s) | Memory Usage (kB) | SLOC (lines) | User |
---|
Submission
Submit your source-code file(s) via turnin.
Feedback Robot
This project has a feedback robot that will run some tests on your submission and provide you with a feedback report via email within roughly one minute.
Please read the feedback carefully.
Due Date and Point Value
Due at 23:59:59 on the date listed on the syllabus.
Assignment 05
is worth 60 points.
Possible point values per category: --------------------------------------- Correct output... 40 ...under memory threshold 20 Possible deductions: Exceeding time threshold 100% Style and practices 10–20% Possible extra credit: Submission via Git 5% ---------------------------------------