====== cs20ps23as07: Pursuing an Efficient Word Search ======
===== Goals =====
* Apply your knowledge of Python's **built-in data structures** and their search functionality.
* Write a program with **efficiency** (i.e., **minimal time complexity**) in mind.
-----
====== Prerequisites ======
This assignment requires familiarity with the [[:lecture materials/]] presented in class through [[:lecture materials/week 07]].
-----
====== Background ======
The animations below show words found in a word-search grid by traveling sequences of adjacent grid squares in any horizontal, vertical, or diagonal direction.
-----
====== Assignment ======
You shall write a Python 3 program that reads a word-search **grid** from standard input, and prints all **valid words** in the grid to standard output. Specifically:
- Use ''cs20p_word_search'' as the name the module intended to be run as ''__main__''.
- The **grid** will consist of any number of **lines of text on standard input**, containing uppercase English letters.
- You may assume that the length of each line is also the total number of lines, i.e. that the grid is **square**.
- The program shall expect **two [[https://docs.python.org/3/library/sys.html#sys.argv|command-line arguments]]**, which together dictate the characteristics of a **valid word**:
- Argument 1 shall be a positive integer indicating the **minimum length** of a valid word.
- Argument 2 shall be a path to a spell-checking **dictionary file**.
- You may assume that the dictionary file contains **one word per line**, though there are no guarantees on case or contents otherwise.
- The output shall consist of **all unique valid words** in the grid, one per line, in alphabetical order.
- Words may be oriented in **any horizontal, vertical, or diagonal direction** from the initial letter (i.e., 8 possible orientations).
FYI, no need to overcomplicate this. My straightforward/unoptimized solution is 19 lines of code, as measured by ''sloccount''.
===== Design Considerations =====
Make sure that you choose an approach with appropriate **data types** and **algorithms** so that the time complexity of your solution is:
- Good enough as a baseline to terminate within 5 seconds given 20x20 grids and dictionary sizes up to 1 million words.
- O(//n//) or better where //n// is the total size of the grid
- O(log //n//) or better where //n// is the total size of the dictionary
- O(//n//) or better where is the difference between the dimension of the grid and the minimum length of a valid word (e.g. ''8'' for a ''10x10'' grid with a minimum length of ''2'')
===== Testing and Demos =====
Two **sample executables** are available on the server:
- ''cs20p_word_search_gen'' creates grids compatible with your program.
* It expects three command-line arguments:
- ''n'': The dimension of the grid (i.e. output will consist ''n'' lines, each of which has length ''n'')
- ''min_length'': The minimum length of a valid word
- ''dict_file'': A path to the same kind of dictionary file your program expects.
* It produces two distinct outputs:
- The grid is emitted on standard output.
- The dictionary words are emitted on standard error, one per line, sorted alphabetically((Note that this only includes the dictionary words intentionally inserted into the grid, rather than all that are coincidentally present either due to the letters randomly selected to fill in the remainder of the grid or as shorter segments of chosen words.)).
- ''cs20p_word_search'' demonstrates the expected behavior of your program.
For example, try running ''cs20p_word_search_gen'' to create a 10x10 grid using words at least 5 letters long from ''/usr/share/dict/american-english'', redirecting standard output to file ''test_grid'' and standard error to file ''test_words'':
cs20p_word_search_gen 10 5 /usr/share/dict/american-english >test_grid 2>test_words
The contents of the two files will then be something like the following
$ cat test_grid
ATSANACHZF
UGALLAGHER
JSSERYPGAS
DRFPMQNMVR
UEVHREFYEK
BDJXIOBWRE
ARZSPWMRRN
IASPHHWOEI
PWLGWKNSDR
ALISTAIRTB
$ cat test_words
ALISTAIR
AVERRED
BRINE
CANASTA
DUBAI
GALLAGHER
GNEISS
PROMO
PYRES
WARDERS
You could then run ''cs20p_word_search'' and your own program to verify that the output matches:
$ cs20p_word_search 5 /usr/share/dict/american-english
You could also automate some testing with ''diff'', to lessen the amount of typing. The following consists of two commands, the first of which generates a grid in file ''test_grid'', and the second of which runs ''diff'' to compare your output with the expected output. Lines only in the expected output are prefixed with ''-'', and lines only in your output are prefixed with ''+'':
DIM=5 LEN=2 DICT=/usr/share/dict/american-english bash -c '
cs20p_word_search_gen $DIM $LEN $DICT >test_grid 2>/dev/null;
diff -u <(cs20p_word_search $LEN $DICT
-----
===== Leaderboard =====
As submissions are received, this leaderboard will be updated with the top-performing solutions that are fully functional in at least 90% of tests.
Rank | Average Test Time | Average Memory Usage (kB) | SLOC (lines) | User |
====== Submission ======
Submit ''cs20p_word_search.py'' (and any other source-code file(s), if you write multiple modules) via [[info:turnin]].
{{https://jeff.cis.cabrillo.edu/images/feedback-robot.png?nolink }} //**Feedback Robot**//
This project has a feedback robot that will run some tests on your submission and provide you with a feedback report via email within roughly one minute.
Please read the feedback carefully.
====== Due Date and Point Value ======
Due at 23:59:59 on the date listed on the [[:syllabus|syllabus]].
''Assignment 07'' is worth 60 points.
Possible point values per category:
---------------------------------------
Correct output 30
Acceptable time complexity 30
Possible deductions:
Style and practices 10–20%
Possible extra credit:
Submission via Git 5%
---------------------------------------