cs20ps23as07: Pursuing an Efficient Word Search

Goals

Apply your knowledge of Python's built-in data structures and their search functionality.
Write a program with efficiency (i.e., minimal time complexity) in mind.

Prerequisites

This assignment requires familiarity with the lecture materials presented in class through week 07.

Background

The animations below show words found in a word-search grid by traveling sequences of adjacent grid squares in any horizontal, vertical, or diagonal direction.

Assignment

You shall write a Python 3 program that reads a word-search grid from standard input, and prints all valid words in the grid to standard output. Specifically:

Use cs20p_word_search as the name the module intended to be run as __main__.
The grid will consist of any number of lines of text on standard input, containing uppercase English letters.
1. You may assume that the length of each line is also the total number of lines, i.e. that the grid is square.
The program shall expect two command-line arguments, which together dictate the characteristics of a valid word:
1. Argument 1 shall be a positive integer indicating the minimum length of a valid word.
2. Argument 2 shall be a path to a spell-checking dictionary file.
  1. You may assume that the dictionary file contains one word per line, though there are no guarantees on case or contents otherwise.
The output shall consist of all unique valid words in the grid, one per line, in alphabetical order.
1. Words may be oriented in any horizontal, vertical, or diagonal direction from the initial letter (i.e., 8 possible orientations).

FYI, no need to overcomplicate this. My straightforward/unoptimized solution is 19 lines of code, as measured by sloccount.

Design Considerations

Make sure that you choose an approach with appropriate data types and algorithms so that the time complexity of your solution is:

Good enough as a baseline to terminate within 5 seconds given 20×20 grids and dictionary sizes up to 1 million words.
O(n) or better where n is the total size of the grid
O(log n) or better where n is the total size of the dictionary
O(n) or better where is the difference between the dimension of the grid and the minimum length of a valid word (e.g. 8 for a 10×10 grid with a minimum length of 2)

Testing and Demos

Two sample executables are available on the server:

cs20p_word_search_gen creates grids compatible with your program.
- It expects three command-line arguments:
  1. n: The dimension of the grid (i.e. output will consist n lines, each of which has length n)
  2. min_length: The minimum length of a valid word
  3. dict_file: A path to the same kind of dictionary file your program expects.
- It produces two distinct outputs:
  1. The grid is emitted on standard output.
  2. The dictionary words are emitted on standard error, one per line, sorted alphabetically¹⁾.
cs20p_word_search demonstrates the expected behavior of your program.

For example, try running cs20p_word_search_gen to create a 10×10 grid using words at least 5 letters long from /usr/share/dict/american-english, redirecting standard output to file test_grid and standard error to file test_words:

cs20p_word_search_gen 10 5 /usr/share/dict/american-english >test_grid 2>test_words

The contents of the two files will then be something like the following

$ cat test_grid 
ATSANACHZF
UGALLAGHER
JSSERYPGAS
DRFPMQNMVR
UEVHREFYEK
BDJXIOBWRE
ARZSPWMRRN
IASPHHWOEI
PWLGWKNSDR
ALISTAIRTB

$ cat test_words 
ALISTAIR
AVERRED
BRINE
CANASTA
DUBAI
GALLAGHER
GNEISS
PROMO
PYRES
WARDERS

You could then run cs20p_word_search and your own program to verify that the output matches:

$ cs20p_word_search 5 /usr/share/dict/american-english <test_grid 
ALISTAIR
AVERRED
BRINE
CANASTA
DRAWL
DUBAI
ERRED
GALLAGHER
GNEISS
PROMO
PYRES
REDRAW
STAIR
WARDER
WARDERS

You could also automate some testing with diff, to lessen the amount of typing. The following consists of two commands, the first of which generates a grid in file test_grid, and the second of which runs diff to compare your output with the expected output. Lines only in the expected output are prefixed with -, and lines only in your output are prefixed with +:

DIM=5 LEN=2 DICT=/usr/share/dict/american-english bash -c '
  cs20p_word_search_gen $DIM $LEN $DICT >test_grid 2>/dev/null;
  diff -u <(cs20p_word_search $LEN $DICT <test_grid) \
          <(python cs20p_word_search.py $LEN $DICT <test_grid)
'

Leaderboard

As submissions are received, this leaderboard will be updated with the top-performing solutions that are fully functional in at least 90% of tests.

Rank	Average Test Time	Average Memory Usage (kB)	SLOC (lines)	User

Submission

Submit cs20p_word_search.py (and any other source-code file(s), if you write multiple modules) via turnin.

Feedback Robot

This project has a feedback robot that will run some tests on your submission and provide you with a feedback report via email within roughly one minute.

Please read the feedback carefully.

Due Date and Point Value

Due at 23:59:59 on the date listed on the syllabus.

Assignment 07 is worth 60 points.

Possible point values per category:
---------------------------------------
Correct output                       30
Acceptable time complexity           30
Possible deductions:
  Style and practices            10–20%
Possible extra credit:
  Submission via Git                 5%
---------------------------------------

¹⁾

Note that this only includes the dictionary words intentionally inserted into the grid, rather than all that are coincidentally present either due to the letters randomly selected to fill in the remainder of the grid or as shorter segments of chosen words.

portfolio

Table of Contents