User Tools

Site Tools


cs19s23as05

cs19s23as05: Rhyming Dictionary

Goals

  • Practice with leveraging STL types.
  • Work with an interesting dataset.
  • Write a program with time-efficiency in mind.

Prerequisites

This assignment requires familiarity with the lecture materials presented in class through week 05.


Background

CMUdict

Image of the sphinx-plate logo of CMU's "Sphinx knowledge base tools": http://www.speech.cs.cmu.edu/tools/

The Carnegie Mellon University Pronouncing Dictionary is an open-source machine-readable pronunciation dictionary for North American English that contains over 134,000 words and their pronunciations.

We have the CMUdict dataset available on our server, most importantly in file /srv/datasets/cmudict/cmudict.dict (also available via HTTP).

The format of file cmudict.dict is as follows:

Lines consist of a word followed by its pronunciation as a sequence of phonemes in ARPAbet format, e.g.:

dinosaur D AY1 N AH0 S AO2 R
superstore S UW1 P ER0 S T AO2 R

Words with multiple pronunciations are suffixed with (n), e.g.:

subject S AH0 B JH EH1 K T
subject(2) S AH1 B JH IH0 K T

Phonemes that indicate a syllable have integer suffixes:

  • Unstressed syllables: 0
  • Primary stressed syllables: 1
  • Secondary stressed syllables: 2

Assignment

You shall write a C++ program that acts as a rhyming dictionary. The program shall:

  1. Expect a one-word command-line argument that will serve as a case-insensitive query.
  2. Consider rhymes to be those words that share identical phoneme patterns from the final stressed phoneme to the end of the word.
  3. Print, in lowercase and in alphabetical order, all words from the CMUdict dataset that rhyme with the query, excluding the query itself.
    1. By default, print only those rhymes that also have the same number of syllables as the query.
    2. Otherwise, if a second command-line exists and is the string -a, print all words that rhyme with the query, regardless of syllable count.

Make sure to choose your data types and logic carefully! Review the materials from week 05 in which we discussed the strengths and weaknesses of the different STL containers. Your program will be expected to terminate in no more than five times the running time of the sample executable, and consume no more than ten times its amount of memory, when compiled with -Ofast.

Sample Executable

Sample executable cs19_cmudict_rhymes exists on the server and demonstrates the expected behavior of your program:

$ cs19_cmudict_rhymes rhyme
beim
chime
climb
crime
dime
grime
haim
heim
hime
i'm
kime
lime
lyme
mime
prime
rime
seim
sime
slime
syme
thyme
time

$ cs19_cmudict_rhymes rhyme -a
all-time
anticrime
beim
chime
climb
clothestime
crime
dime
grime
haim
heim
hime
i'm
kime
lime
lyme
mime
one-time
onetime
part-time
prime
rime
seim
sime
slime
sublime
syme
thyme
time

$ cs19_cmudict_rhymes putin
shootin'

$ cs19_cmudict_rhymes putin -a
highfalutin
rasputin
shootin'

Leaderboard

As submissions are received, this leaderboard will be updated with the top-performing fully/nearly functional solutions, with regard to execution speed.

<html>

<table> <thead> <tr><th>Rank</th><th>Test Time (s)</th><th>Memory Usage (kB)</th><th>SLOC (lines)</th><th>User</th></tr> </thead> <tbody id=“leaderboard-table”> </tbody> </table> <script> function updateLeaderboard() { window.fetch(`/~turnin/leaderboard-cs19s23as05.txt?t=${new Date().getTime()}`, {

method: 'get'

}).then(response ⇒

response.text()

).then(text ⇒ {

let updated = document.getElementById('leaderboard-updated');
updated.innerText = `(Last updated: ${new Date()})`;
let lines = text.split('\n');
let table = document.getElementById('leaderboard-table');
while (table.firstChild)
  table.removeChild(table.firstChild);
for (let i = 0; i < lines.length; ++i) {
  let tokens = lines[i].split(' ').filter(token => token.length > 0);
  if (tokens.length < 2)
    continue;
  let tdRank = document.createElement('td');
  tdRank.textContent = i + 1;
  let tdTime = document.createElement('td');
  tdTime.textContent = Number(tokens[0]).toFixed(4);
  let tdMemUsage = document.createElement('td');
  tdMemUsage.textContent = tokens[1];
  let tdSloc = document.createElement('td');
  tdSloc.textContent = tokens[2];
  let tdUser = document.createElement('td');
  let userLink = document.createElement('a');
  userLink.href = `/~${tokens[3]}/`;
  userLink.target = '_blank';
  userLink.textContent = tokens[3];
  tdUser.appendChild(userLink);
  let tr = document.createElement('tr');
  tr.appendChild(tdRank);
  tr.appendChild(tdTime);
  tr.appendChild(tdMemUsage);
  tr.appendChild(tdSloc);
  tr.appendChild(tdUser);
  table.appendChild(tr);
}

}).catch(err ⇒ {

console.log('Something bad happened: ' + err);

});

window.setTimeout(updateLeaderboard, 60000);

} updateLeaderboard(); </script> </html>


Submission

Submit your source-code file(s) via turnin.

Feedback Robot

This project has a feedback robot that will run some tests on your submission and provide you with a feedback report via email within roughly one minute.

Please read the feedback carefully.

Due Date and Point Value

Due at 23:59:59 on the date listed on the syllabus.

Assignment 05 is worth 60 points.

Possible point values per category:
---------------------------------------
Correct output...                    40
  ...under memory threshold          20
Possible deductions:
  Exceeding time threshold         100%
  Style and practices            10–20%
Possible extra credit:
  Submission via Git                 5%
---------------------------------------
cs19s23as05.txt · Last modified: 2023-03-27 17:22 by 127.0.0.1