Table of Contents
cs20pf22as11: Implementing a Hash Table
Goals
- Practice designing a hash table that works similarly to Python's built-in
set
, but is written in pure Python. - Get some experience creating a data structure that performs constant-time (O(1)) insertion, search, and deletion operations.
Prerequisites
This assignment requires familiarity with the lecture materials presented in class through week 11.
Assignment
You shall define a class named HashSet
in a module named hash_set
that re-implements a basic version of Python's built-in set()
type as a hash table. The class must offer the following functionality:
- As discussed in lecture, an initial table size of 8 (you'll want to use a
list
internally), a maximum load factor of ⅔, and a new table size of the set length * 3 when the load factor is exceeded 1).- Make sure the class doesn't hash an inserted key more than once, i.e. caches hashes to reuse when it is time to rebuild a bigger table.
- Method
__init__()
that accepts an optional iterable object containing initial keys for the set, à la the built-inset()
constructor. - A linear probing sequence that alternates looking first forward and then backward an increasing distance from the collision index. e.g. if there is a collision at index 0, the probing sequence to find an unused index is 1, -1, 2, -2, etc.
- Method
__bool__()
that returnsTrue
if the set is non-empty, andFalse
otherwise. - Method
__contains__()
that tests for membership in the set. - Method
__getitem__()
that returns the key at a given index of the internal table. - Method
__iter__()
that yields keys from the internal table, in the order encountered in the table. - Method
__len__()
that returns the number of keys in the set. - Method
add()
that adds a key to the set (keys must be hashable). - Method
table_size()
that returns the size of the internal table.
It should go without saying, but you may not use built-in set
, dict
, or anything built on top of those. Implement your hash table using a list
, as described in lecture.
Demonstration
Here is a REPL session demonstrating the expected behavior of the class:
>>> h = HashSet() >>> len(h) 0 >>> h.table_size() 8 >>> list(h) [] >>> list(enumerate(h[i] for i in range(h.table_size()))) [(0, None), (1, None), (2, None), (3, None), (4, None), (5, None), (6, None), (7, None)] >>> >>> h.add(1) >>> len(h) 1 >>> h.table_size() 8 >>> list(h) [1] >>> list(enumerate(h[i] for i in range(h.table_size()))) [(0, None), (1, 1), (2, None), (3, None), (4, None), (5, None), (6, None), (7, None)] >>> >>> h.add(10) >>> len(h) 2 >>> h.table_size() 8 >>> list(enumerate(h[i] for i in range(h.table_size()))) [(0, None), (1, 1), (2, 10), (3, None), (4, None), (5, None), (6, None), (7, None)] >>> list(h) [1, 10] >>> >>> h.add(9) # collision at index 1, should probe 2 (used) and 0 (unused) >>> len(h) 3 >>> list(h) [9, 1, 10] >>> h.table_size() 8 >>> list(enumerate(h[i] for i in range(h.table_size()))) [(0, 9), (1, 1), (2, 10), (3, None), (4, None), (5, None), (6, None), (7, None)] >>> >>> h.add(10) # duplicate, should be ignored >>> len(h) 3 >>> list(h) [9, 1, 10] >>> list(enumerate(h[i] for i in range(h.table_size()))) [(0, 9), (1, 1), (2, 10), (3, None), (4, None), (5, None), (6, None), (7, None)] >>> >>> h.add(18) # collision at index 2, should probe 3 (unused) >>> list(enumerate(h[i] for i in range(h.table_size()))) [(0, 9), (1, 1), (2, 10), (3, 18), (4, None), (5, None), (6, None), (7, None)] >>> len(h) 4 >>> list(h) [9, 1, 10, 18] >>> >>> h.add (123) # load factor 5/8, almost exceeds maximum... >>> list(h) [9, 1, 10, 18, 123] >>> list(enumerate(h[i] for i in range(h.table_size()))) [(0, 9), (1, 1), (2, 10), (3, 18), (4, 123), (5, None), (6, None), (7, None)] >>> >>> h.add(456) # max load factor will exceed 2/3, must rebuild table >>> len(h) 6 >>> list(h) [18, 1, 456, 9, 10, 123] >>> h.table_size() 18 >>> list(enumerate(h[i] for i in range(h.table_size()))) [(0, 18), (1, 1), (2, None), (3, None), (4, None), (5, None), (6, 456), (7, None), (8, None), (9, 9), (10, 10), (11, None), (12, None), (13, None), (14, None), (15, 123), (16, None), (17, None)] >>> >>> words = HashSet(map(str.rstrip, open('/srv/datasets/many-english-words.txt'))) >>> len(words) 704240 >>> words.table_size() 1376253 >>> 'PALOMA' in words True >>> 'PELOTA' in words True >>> 'PERICO' in words False >>> 'PICNIC' in words True
Submission
Submit your source-code file(s) via turnin.
Feedback Robot
This project has a feedback robot that will run some tests on your submission and provide you with a feedback report via email within roughly one minute.
Please read the feedback carefully.
Due Date and Point Value
Due at 23:59:59 on the date listed on the syllabus.
Assignment 11
is worth 60 points.
Possible point values per category: --------------------------------------- Method functionality 60 (split roughly evenly) Possible deductions: Style and practices 10–20% Possible extra credit: Submission via Git 5% ---------------------------------------