====== cs19s23as05: Rhyming Dictionary ======
===== Goals =====
* Practice with leveraging STL types.
* Work with an interesting dataset.
* Write a program with time-efficiency in mind.
-----
====== Prerequisites ======
This assignment requires familiarity with the [[:lecture materials/]] presented in class through [[:lecture materials/week 05]].
-----
====== Background ======
===== CMUdict =====
{{:sphinx-plate.png?nolink|Image of the sphinx-plate logo of CMU's "Sphinx knowledge base tools": http://www.speech.cs.cmu.edu/tools/}}
> [[http://www.speech.cs.cmu.edu/cgi-bin/cmudict|The Carnegie Mellon University Pronouncing Dictionary]] is an open-source machine-readable pronunciation dictionary for North American English that contains over 134,000 words and their pronunciations.
We have the [[https://github.com/cmusphinx/cmudict|CMUdict dataset]] available on our server, most importantly in file ''/srv/datasets/cmudict/cmudict.dict'' (also available [[http://jeff.cis.cabrillo.edu/datasets/cmudict/cmudict.dict|via HTTP]]).
The format of file ''cmudict.dict'' is as follows:
Lines consist of a word followed by its pronunciation as a sequence of [[https://en.wikipedia.org/wiki/Phoneme|phonemes]] in [[https://en.wikipedia.org/wiki/ARPABET|ARPAbet]] format, e.g.:
dinosaur D AY1 N AH0 S AO2 R
superstore S UW1 P ER0 S T AO2 R
Words with multiple pronunciations are suffixed with (n), e.g.:
subject S AH0 B JH EH1 K T
subject(2) S AH1 B JH IH0 K T
Phonemes that indicate a syllable have integer suffixes:
* Unstressed syllables: ''0''
* Primary stressed syllables: ''1''
* Secondary stressed syllables: ''2''
-----
====== Assignment ======
You shall write a C++ program that acts as a **rhyming dictionary**. The program shall:
- Expect a one-word command-line argument that will serve as a case-insensitive **query**.
- Consider //rhymes// to be those words that share identical phoneme patterns from the final stressed phoneme to the end of the word.
- Print, in lowercase and in alphabetical order, all words from the [[#CMUdict]] dataset that rhyme with the query, excluding the query itself.
- By default, print only those rhymes that also have the same number of syllables as the query.
- Otherwise, if a second command-line exists and is the string ''-a'', print **all** words that rhyme with the query, regardless of syllable count.
Make sure to choose your data types and logic carefully! Review the materials from week 05 in which we discussed the strengths and weaknesses of the different STL containers. Your program will be expected to terminate in **no more than five times the running time** of the sample executable, and consume **no more than ten times its amount of memory**, when compiled with ''-Ofast''.
===== Sample Executable =====
Sample executable ''cs19_cmudict_rhymes'' exists on the server and demonstrates the expected behavior of your program:
$ cs19_cmudict_rhymes rhyme
beim
chime
climb
crime
dime
grime
haim
heim
hime
i'm
kime
lime
lyme
mime
prime
rime
seim
sime
slime
syme
thyme
time
$ cs19_cmudict_rhymes rhyme -a
all-time
anticrime
beim
chime
climb
clothestime
crime
dime
grime
haim
heim
hime
i'm
kime
lime
lyme
mime
one-time
onetime
part-time
prime
rime
seim
sime
slime
sublime
syme
thyme
time
$ cs19_cmudict_rhymes putin
shootin'
$ cs19_cmudict_rhymes putin -a
highfalutin
rasputin
shootin'
===== Leaderboard =====
As submissions are received, this leaderboard will be updated with the top-performing fully/nearly functional solutions, with regard to execution speed.
Rank | Test Time (s) | Memory Usage (kB) | SLOC (lines) | User |
-----
====== Submission ======
Submit your source-code file(s) via [[info:turnin]].
{{https://jeff.cis.cabrillo.edu/images/feedback-robot.png?nolink }} //**Feedback Robot**//
This project has a feedback robot that will run some tests on your submission and provide you with a feedback report via email within roughly one minute.
Please read the feedback carefully.
====== Due Date and Point Value ======
Due at 23:59:59 on the date listed on the [[:syllabus|syllabus]].
''Assignment 05'' is worth 60 points.
Possible point values per category:
---------------------------------------
Correct output... 40
...under memory threshold 20
Possible deductions:
Exceeding time threshold 100%
Style and practices 10–20%
Possible extra credit:
Submission via Git 5%
---------------------------------------