set
, but is written in pure Python.This assignment requires familiarity with the lecture materials presented in class through week 11.
You shall define a class named HashSet
in a module named hash_set
that re-implements a basic version of Python's built-in set()
type as a hash table. The class must offer the following functionality:
list
internally), a maximum load factor of ⅔, and a new table size of the set length * 3 when the load factor is exceeded 1).__init__()
that accepts an optional iterable object containing initial keys for the set, à la the built-in set()
constructor.__bool__()
that returns True
if the set is non-empty, and False
otherwise.__contains__()
that tests for membership in the set.__getitem__()
that returns the key at a given index of the internal table.__iter__()
that yields keys from the internal table, in the order encountered in the table.__len__()
that returns the number of keys in the set.add()
that adds a key to the set (keys must be hashable).table_size()
that returns the size of the internal table.
It should go without saying, but you may not use built-in set
, dict
, or anything built on top of those. Implement your hash table using a list
, as described in lecture.
Here is a REPL session demonstrating the expected behavior of the class:
>>> h = HashSet() >>> len(h) 0 >>> h.table_size() 8 >>> list(h) [] >>> list(enumerate(h[i] for i in range(h.table_size()))) [(0, None), (1, None), (2, None), (3, None), (4, None), (5, None), (6, None), (7, None)] >>> >>> h.add(1) >>> len(h) 1 >>> h.table_size() 8 >>> list(h) [1] >>> list(enumerate(h[i] for i in range(h.table_size()))) [(0, None), (1, 1), (2, None), (3, None), (4, None), (5, None), (6, None), (7, None)] >>> >>> h.add(10) >>> len(h) 2 >>> h.table_size() 8 >>> list(enumerate(h[i] for i in range(h.table_size()))) [(0, None), (1, 1), (2, 10), (3, None), (4, None), (5, None), (6, None), (7, None)] >>> list(h) [1, 10] >>> >>> h.add(9) # collision at index 1, should probe 2 (used) and 0 (unused) >>> len(h) 3 >>> list(h) [9, 1, 10] >>> h.table_size() 8 >>> list(enumerate(h[i] for i in range(h.table_size()))) [(0, 9), (1, 1), (2, 10), (3, None), (4, None), (5, None), (6, None), (7, None)] >>> >>> h.add(10) # duplicate, should be ignored >>> len(h) 3 >>> list(h) [9, 1, 10] >>> list(enumerate(h[i] for i in range(h.table_size()))) [(0, 9), (1, 1), (2, 10), (3, None), (4, None), (5, None), (6, None), (7, None)] >>> >>> h.add(18) # collision at index 2, should probe 3 (unused) >>> list(enumerate(h[i] for i in range(h.table_size()))) [(0, 9), (1, 1), (2, 10), (3, 18), (4, None), (5, None), (6, None), (7, None)] >>> len(h) 4 >>> list(h) [9, 1, 10, 18] >>> >>> h.add (123) # load factor 5/8, almost exceeds maximum... >>> list(h) [9, 1, 10, 18, 123] >>> list(enumerate(h[i] for i in range(h.table_size()))) [(0, 9), (1, 1), (2, 10), (3, 18), (4, 123), (5, None), (6, None), (7, None)] >>> >>> h.add(456) # max load factor will exceed 2/3, must rebuild table >>> len(h) 6 >>> list(h) [18, 1, 456, 9, 10, 123] >>> h.table_size() 18 >>> list(enumerate(h[i] for i in range(h.table_size()))) [(0, 18), (1, 1), (2, None), (3, None), (4, None), (5, None), (6, 456), (7, None), (8, None), (9, 9), (10, 10), (11, None), (12, None), (13, None), (14, None), (15, 123), (16, None), (17, None)] >>> >>> words = HashSet(map(str.rstrip, open('/srv/datasets/many-english-words.txt'))) >>> len(words) 704240 >>> words.table_size() 1376253 >>> 'PALOMA' in words True >>> 'PELOTA' in words True >>> 'PERICO' in words False >>> 'PICNIC' in words True
Submit your source-code file(s) via turnin.
Feedback Robot
This project has a feedback robot that will run some tests on your submission and provide you with a feedback report via email within roughly one minute.
Please read the feedback carefully.
Due at 23:59:59 on the date listed on the syllabus.
Assignment 11
is worth 60 points.
Possible point values per category: --------------------------------------- Method functionality 60 (split roughly evenly) Possible deductions: Style and practices 10–20% Possible extra credit: Submission via Git 5% ---------------------------------------