====== cs20ps23as04: Scrabble® Generators ====== ===== Goals ===== * Practice with generator functions, generator expressions, and functional programming in general. * Implement part of the logic for a NPC in a well-known board game. ----- ====== Prerequisites ====== This assignment requires familiarity with the [[:lecture materials/]] presented in class through [[:lecture materials/week 04]]. ----- ====== Background ====== ===== Scrabble® Datasets =====

If you've ever played [[https://en.wikipedia.org/wiki/Scrabble|Scrabble®]], you may know that there are official Scrabble® dictionaries containing lists of legal words. We have the equivalent in file ''/srv/datasets/scrabble-hybrid'' (also available [[http://jeff.cis.cabrillo.edu/datasets/scrabble-hybrid|via HTTP]]), which contains 275381 legal Scrabble® words, one word per line. Here are the first five lines:


LARRIGAN
KUNE
SINGLETON
ACHENIUMS
LOCAL

Words in this file shall constitute the entirety of //legal// Scrabble® words for the purposes of this assignment. Additionally, file ''[[http://jeff.cis.cabrillo.edu/datasets/scrabble-letter-values|/srv/datasets/scrabble-letter-values]]'' describes the value of each letter tile in Scrabble®. ===== Using a regex to extract potential Scrabble® words ===== In order to take full advantage of the space and time efficiency on this one, consider using ''[[https://docs.python.org/3/library/re.html#re.finditer|re.finditer()]]'' to create an iterator that will yield all potential scrabble words in a string. A regular expression matching all contiguous sequences of uppercase English letters is simple enough: [A-Z]+ You might consider combining this with ''[[https://docs.python.org/3/library/itertools.html#itertools.chain.from_iterable|itertools.chain.from_iterable()]]'' to "flatten" the results of iterating a file object:


>>> import itertools, re
>>> sum(1 for _ in itertools.chain.from_iterable(
...     re.finditer('[A-Z]+', line.upper()) for line in open('/srv/datasets/party.txt')))
20

Note you could verify that there are 20 "words" in that file using some command-line utilities, and/or view the words:


$ tr '[:lower:]' '[:upper:]' 

-----


====== Assignment ======

You shall implement a module named ''scrabble'' containing definitions for various Scrabble®-related functions. Take care to accommodate the following:

  - The two dataset files must be read once and only once, when the module is imported/executed by the interpreter. Multiple calls to any of the required module functions shall not result in reading from those files multiple times.
  - The [[https://docs.python.org/3/library/doctest.html|doctests]] in the function docstrings will help explain how the functions are supposed to behave, and also constitute embedded runnable tests. To run the tests and get a summary of what passed/failed: python -m doctest scrabble.py     # only produces output if any tests fail
python -m doctest -v scrabble.py  # gives a detailed summary of all passed/failed tests


===== Starter Code =====


"""
Generator and other functions related to Scrabble® words.
"""

from io import TextIOBase  # for type hints
from typing import Iterable, Iterator  # for type hints
import itertools  # suggested for permutations() and chain.from_iterable()
import re  # suggested for finditer()

# You will want to establish some module-scoped variables representing objects with the information
# from files /srv/datasets/scrabble-letter-values and /srv/datasets/scrabble-hybrid
# so that the various functions below have access to them as necessary.


def tokenize_words(file: TextIOBase) -> Iterator[str]:
  """
  Tokenizes the contents of a text-file object (e.g. from open() or sys.stdin) and yields
  all contiguous sequences of English letters from the file, in uppercase.

  >>> from pprint import pprint
  >>> pprint(list(tokenize_words(open('/srv/datasets/party.txt'))), compact=True)
  ['THE', 'PARTY', 'TOLD', 'YOU', 'TO', 'REJECT', 'THE', 'EVIDENCE', 'OF', 'YOUR',
   'EYES', 'AND', 'EARS', 'IT', 'WAS', 'THEIR', 'FINAL', 'MOST', 'ESSENTIAL',
   'COMMAND']
  >>> pprint(list(tokenize_words(open('/srv/datasets/phonewords-e.161.txt'))))
  ['ABC', 'DEF', 'GHI', 'JKL', 'MNO', 'PQRS', 'TUV', 'WXYZ']
  """
  pass  # TODO


def legal_words(words: Iterable[str]) -> Iterator[str]:
  """
  Selects from an iterable collection of strings only those that are legal Scrabble® words.

  >>> from pprint import pprint
  >>> pprint(list(legal_words(tokenize_words(open('/srv/datasets/party.txt')))), compact=True)
  ['THE', 'PARTY', 'TOLD', 'YOU', 'TO', 'REJECT', 'THE', 'EVIDENCE', 'OF', 'YOUR',
   'EYES', 'AND', 'EARS', 'IT', 'WAS', 'THEIR', 'FINAL', 'MOST', 'ESSENTIAL',
   'COMMAND']
  >>> pprint(list(legal_words(tokenize_words(open('/srv/datasets/phonewords-e.161.txt')))))
  ['DEF', 'GHI']
  >>> pprint(list(legal_words(tokenize_words(open('/srv/datasets/empty')))))
  []
  >>> list(legal_words(['all', 'in', 'lowercase']))
  []
  """
  pass  # TODO


def word_score(word: str) -> int:
  """
  Computes the sum of the tile values for a given word, or 0 if the word is illegal.

  >>> from pprint import pprint
  >>> pprint([(w, word_score(w)) for w in tokenize_words(open('/srv/datasets/party.txt'))], \
             compact=True)
  [('THE', 6), ('PARTY', 10), ('TOLD', 5), ('YOU', 6), ('TO', 2), ('REJECT', 15),
   ('THE', 6), ('EVIDENCE', 14), ('OF', 5), ('YOUR', 7), ('EYES', 7), ('AND', 4),
   ('EARS', 4), ('IT', 2), ('WAS', 6), ('THEIR', 8), ('FINAL', 8), ('MOST', 6),
   ('ESSENTIAL', 9), ('COMMAND', 14)]
  >>> pprint([(w, word_score(w)) \
                  for w in tokenize_words(open('/srv/datasets/phonewords-e.161.txt'))], \
             compact=True)
  [('ABC', 0), ('DEF', 7), ('GHI', 7), ('JKL', 0), ('MNO', 0), ('PQRS', 0),
   ('TUV', 0), ('WXYZ', 0)]
  >>> word_score('lowercase')
  0
  """
  pass  # TODO


def highest_value_word(words: Iterable[str]) -> str:
  """
  Selects from an iterable collection of strings the highest-valued Scrabble® word,
  as returned by the word_score() function.
  If multiple words are maximal, the function returns the first one encountered.
  If no words are present, raises an exception of some kind.

  >>> highest_value_word(tokenize_words(open('/srv/datasets/party.txt')))
  'REJECT'
  >>> highest_value_word(tokenize_words(open('/srv/datasets/phonewords-e.161.txt')))
  'DEF'
  >>> try:
  ...   highest_value_word(tokenize_words(open('/srv/datasets/empty')))
  ... except Exception as error:
  ...   print('error')
  ... 
  error
  """
  pass  # TODO


def legal_tile_words(tiles: str) -> list[str]:
  """
  Returns a sorted list of all the legal Scrabble® words that could be formed from the "tiles"
  represented by the argument string.

  >>> legal_tile_words('JTQHDEZ')
  ['DE', 'ED', 'EDH', 'EH', 'ET', 'ETH', 'HE', 'HET', 'JET', 'TE', 'TED', 'THE', 'ZED']
  """
  pass  # TODO


if __name__ == '__main__':
  # Print the total number of legal words on stdin and their total value in points, just for fun
  import sys
  (i1, i2) = itertools.tee(legal_words(tokenize_words(sys.stdin)))
  print(sum(1 for _ in i1), 'legal Scrabble words worth', sum(map(word_score, i2)), 'points')


===== Leaderboard =====

As submissions are received, this leaderboard will be updated with the top-performing fully/nearly functional solutions, with regard to execution speed.





Rank Test Time (s) Memory Usage (kB) SLOC (lines) User







-----

====== Submission ======

Submit ''scrabble.py'' via [[info:turnin]].


{{https://jeff.cis.cabrillo.edu/images/feedback-robot.png?nolink }} //**Feedback Robot**//

This project has a feedback robot that will run some tests on your submission and provide you with a feedback report via email within roughly one minute.

Please read the feedback carefully.


====== Due Date and Point Value ======


Due at 23:59:59 on the date listed on the [[:syllabus|syllabus]].

''Assignment 04'' is worth 60 points.


Possible point values per category:
---------------------------------------
Properly implemented functions       60
  (split evenly)
Possible deductions:
  Style and practices            10–20%
Possible extra credit:
  Submission via Git                 5%
---------------------------------------