cs20ps23as04: Scrabble® Generators

Goals

Practice with generator functions, generator expressions, and functional programming in general.
Implement part of the logic for a NPC in a well-known board game.

Prerequisites

This assignment requires familiarity with the lecture materials presented in class through week 04.

Background

Scrabble® Datasets

If you've ever played Scrabble®, you may know that there are official Scrabble® dictionaries containing lists of legal words. We have the equivalent in file /srv/datasets/scrabble-hybrid (also available via HTTP), which contains 275381 legal Scrabble® words, one word per line. Here are the first five lines:

LARRIGAN
KUNE
SINGLETON
ACHENIUMS
LOCAL

Words in this file shall constitute the entirety of legal Scrabble® words for the purposes of this assignment.

Additionally, file /srv/datasets/scrabble-letter-values describes the value of each letter tile in Scrabble®.

Using a regex to extract potential Scrabble® words

In order to take full advantage of the space and time efficiency on this one, consider using re.finditer() to create an iterator that will yield all potential scrabble words in a string. A regular expression matching all contiguous sequences of uppercase English letters is simple enough:

[A-Z]+

You might consider combining this with itertools.chain.from_iterable() to “flatten” the results of iterating a file object:

>>> import itertools, re
>>> sum(1 for _ in itertools.chain.from_iterable(
...     re.finditer('[A-Z]+', line.upper()) for line in open('/srv/datasets/party.txt')))
20

Note you could verify that there are 20 “words” in that file using some command-line utilities, and/or view the words:

$ tr '[:lower:]' '[:upper:]' </srv/datasets/party.txt | grep -oP '[A-Z]+' | wc -l
20
$ tr '[:lower:]' '[:upper:]' </srv/datasets/party.txt | grep -oP '[A-Z]+'
THE
PARTY
TOLD
YOU
TO
REJECT
THE
EVIDENCE
OF
YOUR
EYES
AND
EARS
IT
WAS
THEIR
FINAL
MOST
ESSENTIAL
COMMAND

Assignment

You shall implement a module named scrabble containing definitions for various Scrabble®-related functions. Take care to accommodate the following:

The two dataset files must be read once and only once, when the module is imported/executed by the interpreter. Multiple calls to any of the required module functions shall not result in reading from those files multiple times.
The doctests in the function docstrings will help explain how the functions are supposed to behave, and also constitute embedded runnable tests. To run the tests and get a summary of what passed/failed:
```
python -m doctest scrabble.py     # only produces output if any tests fail
python -m doctest -v scrabble.py  # gives a detailed summary of all passed/failed tests
```

Starter Code

scrabble.py

"""
Generator and other functions related to Scrabble® words.
"""
 
from io import TextIOBase  # for type hints
from typing import Iterable, Iterator  # for type hints
import itertools  # suggested for permutations() and chain.from_iterable()
import re  # suggested for finditer()
 
# You will want to establish some module-scoped variables representing objects with the information
# from files /srv/datasets/scrabble-letter-values and /srv/datasets/scrabble-hybrid
# so that the various functions below have access to them as necessary.
 
 
def tokenize_words(file: TextIOBase) -> Iterator[str]:
  """
  Tokenizes the contents of a text-file object (e.g. from open() or sys.stdin) and yields
  all contiguous sequences of English letters from the file, in uppercase.
 
  >>> from pprint import pprint
  >>> pprint(list(tokenize_words(open('/srv/datasets/party.txt'))), compact=True)
  ['THE', 'PARTY', 'TOLD', 'YOU', 'TO', 'REJECT', 'THE', 'EVIDENCE', 'OF', 'YOUR',
   'EYES', 'AND', 'EARS', 'IT', 'WAS', 'THEIR', 'FINAL', 'MOST', 'ESSENTIAL',
   'COMMAND']
  >>> pprint(list(tokenize_words(open('/srv/datasets/phonewords-e.161.txt'))))
  ['ABC', 'DEF', 'GHI', 'JKL', 'MNO', 'PQRS', 'TUV', 'WXYZ']
  """
  pass  # TODO
 
 
def legal_words(words: Iterable[str]) -> Iterator[str]:
  """
  Selects from an iterable collection of strings only those that are legal Scrabble® words.
 
  >>> from pprint import pprint
  >>> pprint(list(legal_words(tokenize_words(open('/srv/datasets/party.txt')))), compact=True)
  ['THE', 'PARTY', 'TOLD', 'YOU', 'TO', 'REJECT', 'THE', 'EVIDENCE', 'OF', 'YOUR',
   'EYES', 'AND', 'EARS', 'IT', 'WAS', 'THEIR', 'FINAL', 'MOST', 'ESSENTIAL',
   'COMMAND']
  >>> pprint(list(legal_words(tokenize_words(open('/srv/datasets/phonewords-e.161.txt')))))
  ['DEF', 'GHI']
  >>> pprint(list(legal_words(tokenize_words(open('/srv/datasets/empty')))))
  []
  >>> list(legal_words(['all', 'in', 'lowercase']))
  []
  """
  pass  # TODO
 
 
def word_score(word: str) -> int:
  """
  Computes the sum of the tile values for a given word, or 0 if the word is illegal.
 
  >>> from pprint import pprint
  >>> pprint([(w, word_score(w)) for w in tokenize_words(open('/srv/datasets/party.txt'))], \
             compact=True)
  [('THE', 6), ('PARTY', 10), ('TOLD', 5), ('YOU', 6), ('TO', 2), ('REJECT', 15),
   ('THE', 6), ('EVIDENCE', 14), ('OF', 5), ('YOUR', 7), ('EYES', 7), ('AND', 4),
   ('EARS', 4), ('IT', 2), ('WAS', 6), ('THEIR', 8), ('FINAL', 8), ('MOST', 6),
   ('ESSENTIAL', 9), ('COMMAND', 14)]
  >>> pprint([(w, word_score(w)) \
                  for w in tokenize_words(open('/srv/datasets/phonewords-e.161.txt'))], \
             compact=True)
  [('ABC', 0), ('DEF', 7), ('GHI', 7), ('JKL', 0), ('MNO', 0), ('PQRS', 0),
   ('TUV', 0), ('WXYZ', 0)]
  >>> word_score('lowercase')
  0
  """
  pass  # TODO
 
 
def highest_value_word(words: Iterable[str]) -> str:
  """
  Selects from an iterable collection of strings the highest-valued Scrabble® word,
  as returned by the word_score() function.
  If multiple words are maximal, the function returns the first one encountered.
  If no words are present, raises an exception of some kind.
 
  >>> highest_value_word(tokenize_words(open('/srv/datasets/party.txt')))
  'REJECT'
  >>> highest_value_word(tokenize_words(open('/srv/datasets/phonewords-e.161.txt')))
  'DEF'
  >>> try:
  ...   highest_value_word(tokenize_words(open('/srv/datasets/empty')))
  ... except Exception as error:
  ...   print('error')
  ... 
  error
  """
  pass  # TODO
 
 
def legal_tile_words(tiles: str) -> list[str]:
  """
  Returns a sorted list of all the legal Scrabble® words that could be formed from the "tiles"
  represented by the argument string.
 
  >>> legal_tile_words('JTQHDEZ')
  ['DE', 'ED', 'EDH', 'EH', 'ET', 'ETH', 'HE', 'HET', 'JET', 'TE', 'TED', 'THE', 'ZED']
  """
  pass  # TODO
 
 
if __name__ == '__main__':
  # Print the total number of legal words on stdin and their total value in points, just for fun
  import sys
  (i1, i2) = itertools.tee(legal_words(tokenize_words(sys.stdin)))
  print(sum(1 for _ in i1), 'legal Scrabble words worth', sum(map(word_score, i2)), 'points')

Leaderboard

As submissions are received, this leaderboard will be updated with the top-performing fully/nearly functional solutions, with regard to execution speed.

Rank	Test Time (s)	Memory Usage (kB)	SLOC (lines)	User

Submission

Submit scrabble.py via turnin.

Feedback Robot

This project has a feedback robot that will run some tests on your submission and provide you with a feedback report via email within roughly one minute.

Please read the feedback carefully.

Due Date and Point Value

Due at 23:59:59 on the date listed on the syllabus.

Assignment 04 is worth 60 points.

Possible point values per category:
---------------------------------------
Properly implemented functions       60
  (split evenly)
Possible deductions:
  Style and practices            10–20%
Possible extra credit:
  Submission via Git                 5%
---------------------------------------

portfolio

Table of Contents