Table of Contents
cs20ps23as07: Pursuing an Efficient Word Search
Goals
- Apply your knowledge of Python's built-in data structures and their search functionality.
- Write a program with efficiency (i.e., minimal time complexity) in mind.
Prerequisites
This assignment requires familiarity with the lecture materials presented in class through week 07.
Background
The animations below show words found in a word-search grid by traveling sequences of adjacent grid squares in any horizontal, vertical, or diagonal direction.
Assignment
You shall write a Python 3 program that reads a word-search grid from standard input, and prints all valid words in the grid to standard output. Specifically:
- Use
cs20p_word_search
as the name the module intended to be run as__main__
. - The grid will consist of any number of lines of text on standard input, containing uppercase English letters.
- You may assume that the length of each line is also the total number of lines, i.e. that the grid is square.
- The program shall expect two command-line arguments, which together dictate the characteristics of a valid word:
- Argument 1 shall be a positive integer indicating the minimum length of a valid word.
- Argument 2 shall be a path to a spell-checking dictionary file.
- You may assume that the dictionary file contains one word per line, though there are no guarantees on case or contents otherwise.
- The output shall consist of all unique valid words in the grid, one per line, in alphabetical order.
- Words may be oriented in any horizontal, vertical, or diagonal direction from the initial letter (i.e., 8 possible orientations).
FYI, no need to overcomplicate this. My straightforward/unoptimized solution is 19 lines of code, as measured by sloccount
.
Design Considerations
Make sure that you choose an approach with appropriate data types and algorithms so that the time complexity of your solution is:
- Good enough as a baseline to terminate within 5 seconds given 20×20 grids and dictionary sizes up to 1 million words.
- O(n) or better where n is the total size of the grid
- O(log n) or better where n is the total size of the dictionary
- O(n) or better where is the difference between the dimension of the grid and the minimum length of a valid word (e.g.
8
for a10×10
grid with a minimum length of2
)
Testing and Demos
Two sample executables are available on the server:
cs20p_word_search_gen
creates grids compatible with your program.- It expects three command-line arguments:
n
: The dimension of the grid (i.e. output will consistn
lines, each of which has lengthn
)min_length
: The minimum length of a valid worddict_file
: A path to the same kind of dictionary file your program expects.
- It produces two distinct outputs:
- The grid is emitted on standard output.
- The dictionary words are emitted on standard error, one per line, sorted alphabetically1).
cs20p_word_search
demonstrates the expected behavior of your program.
For example, try running cs20p_word_search_gen
to create a 10×10 grid using words at least 5 letters long from /usr/share/dict/american-english
, redirecting standard output to file test_grid
and standard error to file test_words
:
cs20p_word_search_gen 10 5 /usr/share/dict/american-english >test_grid 2>test_words
The contents of the two files will then be something like the following
$ cat test_grid ATSANACHZF UGALLAGHER JSSERYPGAS DRFPMQNMVR UEVHREFYEK BDJXIOBWRE ARZSPWMRRN IASPHHWOEI PWLGWKNSDR ALISTAIRTB
$ cat test_words ALISTAIR AVERRED BRINE CANASTA DUBAI GALLAGHER GNEISS PROMO PYRES WARDERS
You could then run cs20p_word_search
and your own program to verify that the output matches:
$ cs20p_word_search 5 /usr/share/dict/american-english <test_grid ALISTAIR AVERRED BRINE CANASTA DRAWL DUBAI ERRED GALLAGHER GNEISS PROMO PYRES REDRAW STAIR WARDER WARDERS
You could also automate some testing with diff
, to lessen the amount of typing. The following consists of two commands, the first of which generates a grid in file test_grid
, and the second of which runs diff
to compare your output with the expected output. Lines only in the expected output are prefixed with -
, and lines only in your output are prefixed with +
:
DIM=5 LEN=2 DICT=/usr/share/dict/american-english bash -c ' cs20p_word_search_gen $DIM $LEN $DICT >test_grid 2>/dev/null; diff -u <(cs20p_word_search $LEN $DICT <test_grid) \ <(python cs20p_word_search.py $LEN $DICT <test_grid) '
Leaderboard
As submissions are received, this leaderboard will be updated with the top-performing solutions that are fully functional in at least 90% of tests.
Rank | Average Test Time | Average Memory Usage (kB) | SLOC (lines) | User |
---|
Submission
Submit cs20p_word_search.py
(and any other source-code file(s), if you write multiple modules) via turnin.
Feedback Robot
This project has a feedback robot that will run some tests on your submission and provide you with a feedback report via email within roughly one minute.
Please read the feedback carefully.
Due Date and Point Value
Due at 23:59:59 on the date listed on the syllabus.
Assignment 07
is worth 60 points.
Possible point values per category: --------------------------------------- Correct output 30 Acceptable time complexity 30 Possible deductions: Style and practices 10–20% Possible extra credit: Submission via Git 5% ---------------------------------------