====== cs20ps23as04: Scrabble® Generators ======
===== Goals =====
* Practice with generator functions, generator expressions, and functional programming in general.
* Implement part of the logic for a NPC in a well-known board game.
-----
====== Prerequisites ======
This assignment requires familiarity with the [[:lecture materials/]] presented in class through [[:lecture materials/week 04]].
-----
====== Background ======
===== Scrabble® Datasets =====
If you've ever played [[https://en.wikipedia.org/wiki/Scrabble|Scrabble®]], you may know that there are official Scrabble® dictionaries containing lists of legal words. We have the equivalent in file ''/srv/datasets/scrabble-hybrid'' (also available [[http://jeff.cis.cabrillo.edu/datasets/scrabble-hybrid|via HTTP]]), which contains 275381 legal Scrabble® words, one word per line. Here are the first five lines:
LARRIGAN
KUNE
SINGLETON
ACHENIUMS
LOCAL
Words in this file shall constitute the entirety of //legal// Scrabble® words for the purposes of this assignment.
Additionally, file ''[[http://jeff.cis.cabrillo.edu/datasets/scrabble-letter-values|/srv/datasets/scrabble-letter-values]]'' describes the value of each letter tile in Scrabble®.
===== Using a regex to extract potential Scrabble® words =====
In order to take full advantage of the space and time efficiency on this one, consider using ''[[https://docs.python.org/3/library/re.html#re.finditer|re.finditer()]]'' to create an iterator that will yield all potential scrabble words in a string. A regular expression matching all contiguous sequences of uppercase English letters is simple enough: [A-Z]+
You might consider combining this with ''[[https://docs.python.org/3/library/itertools.html#itertools.chain.from_iterable|itertools.chain.from_iterable()]]'' to "flatten" the results of iterating a file object:
>>> import itertools, re
>>> sum(1 for _ in itertools.chain.from_iterable(
... re.finditer('[A-Z]+', line.upper()) for line in open('/srv/datasets/party.txt')))
20
Note you could verify that there are 20 "words" in that file using some command-line utilities, and/or view the words:
$ tr '[:lower:]' '[:upper:]'
-----
====== Assignment ======
You shall implement a module named ''scrabble'' containing definitions for various Scrabble®-related functions. Take care to accommodate the following:
- The two dataset files must be read once and only once, when the module is imported/executed by the interpreter. Multiple calls to any of the required module functions shall not result in reading from those files multiple times.
- The [[https://docs.python.org/3/library/doctest.html|doctests]] in the function docstrings will help explain how the functions are supposed to behave, and also constitute embedded runnable tests. To run the tests and get a summary of what passed/failed: python -m doctest scrabble.py # only produces output if any tests fail
python -m doctest -v scrabble.py # gives a detailed summary of all passed/failed tests
===== Starter Code =====
"""
Generator and other functions related to Scrabble® words.
"""
from io import TextIOBase # for type hints
from typing import Iterable, Iterator # for type hints
import itertools # suggested for permutations() and chain.from_iterable()
import re # suggested for finditer()
# You will want to establish some module-scoped variables representing objects with the information
# from files /srv/datasets/scrabble-letter-values and /srv/datasets/scrabble-hybrid
# so that the various functions below have access to them as necessary.
def tokenize_words(file: TextIOBase) -> Iterator[str]:
"""
Tokenizes the contents of a text-file object (e.g. from open() or sys.stdin) and yields
all contiguous sequences of English letters from the file, in uppercase.
>>> from pprint import pprint
>>> pprint(list(tokenize_words(open('/srv/datasets/party.txt'))), compact=True)
['THE', 'PARTY', 'TOLD', 'YOU', 'TO', 'REJECT', 'THE', 'EVIDENCE', 'OF', 'YOUR',
'EYES', 'AND', 'EARS', 'IT', 'WAS', 'THEIR', 'FINAL', 'MOST', 'ESSENTIAL',
'COMMAND']
>>> pprint(list(tokenize_words(open('/srv/datasets/phonewords-e.161.txt'))))
['ABC', 'DEF', 'GHI', 'JKL', 'MNO', 'PQRS', 'TUV', 'WXYZ']
"""
pass # TODO
def legal_words(words: Iterable[str]) -> Iterator[str]:
"""
Selects from an iterable collection of strings only those that are legal Scrabble® words.
>>> from pprint import pprint
>>> pprint(list(legal_words(tokenize_words(open('/srv/datasets/party.txt')))), compact=True)
['THE', 'PARTY', 'TOLD', 'YOU', 'TO', 'REJECT', 'THE', 'EVIDENCE', 'OF', 'YOUR',
'EYES', 'AND', 'EARS', 'IT', 'WAS', 'THEIR', 'FINAL', 'MOST', 'ESSENTIAL',
'COMMAND']
>>> pprint(list(legal_words(tokenize_words(open('/srv/datasets/phonewords-e.161.txt')))))
['DEF', 'GHI']
>>> pprint(list(legal_words(tokenize_words(open('/srv/datasets/empty')))))
[]
>>> list(legal_words(['all', 'in', 'lowercase']))
[]
"""
pass # TODO
def word_score(word: str) -> int:
"""
Computes the sum of the tile values for a given word, or 0 if the word is illegal.
>>> from pprint import pprint
>>> pprint([(w, word_score(w)) for w in tokenize_words(open('/srv/datasets/party.txt'))], \
compact=True)
[('THE', 6), ('PARTY', 10), ('TOLD', 5), ('YOU', 6), ('TO', 2), ('REJECT', 15),
('THE', 6), ('EVIDENCE', 14), ('OF', 5), ('YOUR', 7), ('EYES', 7), ('AND', 4),
('EARS', 4), ('IT', 2), ('WAS', 6), ('THEIR', 8), ('FINAL', 8), ('MOST', 6),
('ESSENTIAL', 9), ('COMMAND', 14)]
>>> pprint([(w, word_score(w)) \
for w in tokenize_words(open('/srv/datasets/phonewords-e.161.txt'))], \
compact=True)
[('ABC', 0), ('DEF', 7), ('GHI', 7), ('JKL', 0), ('MNO', 0), ('PQRS', 0),
('TUV', 0), ('WXYZ', 0)]
>>> word_score('lowercase')
0
"""
pass # TODO
def highest_value_word(words: Iterable[str]) -> str:
"""
Selects from an iterable collection of strings the highest-valued Scrabble® word,
as returned by the word_score() function.
If multiple words are maximal, the function returns the first one encountered.
If no words are present, raises an exception of some kind.
>>> highest_value_word(tokenize_words(open('/srv/datasets/party.txt')))
'REJECT'
>>> highest_value_word(tokenize_words(open('/srv/datasets/phonewords-e.161.txt')))
'DEF'
>>> try:
... highest_value_word(tokenize_words(open('/srv/datasets/empty')))
... except Exception as error:
... print('error')
...
error
"""
pass # TODO
def legal_tile_words(tiles: str) -> list[str]:
"""
Returns a sorted list of all the legal Scrabble® words that could be formed from the "tiles"
represented by the argument string.
>>> legal_tile_words('JTQHDEZ')
['DE', 'ED', 'EDH', 'EH', 'ET', 'ETH', 'HE', 'HET', 'JET', 'TE', 'TED', 'THE', 'ZED']
"""
pass # TODO
if __name__ == '__main__':
# Print the total number of legal words on stdin and their total value in points, just for fun
import sys
(i1, i2) = itertools.tee(legal_words(tokenize_words(sys.stdin)))
print(sum(1 for _ in i1), 'legal Scrabble words worth', sum(map(word_score, i2)), 'points')
===== Leaderboard =====
As submissions are received, this leaderboard will be updated with the top-performing fully/nearly functional solutions, with regard to execution speed.
Rank | Test Time (s) | Memory Usage (kB) | SLOC (lines) | User |
-----
====== Submission ======
Submit ''scrabble.py'' via [[info:turnin]].
{{https://jeff.cis.cabrillo.edu/images/feedback-robot.png?nolink }} //**Feedback Robot**//
This project has a feedback robot that will run some tests on your submission and provide you with a feedback report via email within roughly one minute.
Please read the feedback carefully.
====== Due Date and Point Value ======
Due at 23:59:59 on the date listed on the [[:syllabus|syllabus]].
''Assignment 04'' is worth 60 points.
Possible point values per category:
---------------------------------------
Properly implemented functions 60
(split evenly)
Possible deductions:
Style and practices 10–20%
Possible extra credit:
Submission via Git 5%
---------------------------------------