<WRAP round info>
====== cs20pf22as11: Implementing a Hash Table ======

===== Goals =====

  * Practice designing a hash table that works similarly to Python's built-in ''set'', but is written in pure Python.
  * Get some experience creating a data structure that performs constant-time (O(1)) insertion, search, and deletion operations.
</WRAP>

-----

<WRAP round help>
====== Prerequisites ======

This assignment requires familiarity with the [[:lecture materials/]] presented in class through [[:lecture materials/week 11]].
</WRAP>

-----

<WRAP round todo>
====== Assignment ======

You shall define a class named ''HashSet'' in a module named ''hash_set'' that re-implements a basic version of Python's built-in ''set()'' type as a hash table. The class must offer the following functionality:

  - As discussed in lecture, an initial table size of 8 (you'll want to use a ''list'' internally), a maximum load factor of ⅔, and a new table size of the set length * 3 when the load factor is exceeded ((i.e. the table sizes as set size increases should be 8, 18, 39, 81, 165, 333, 669, 1341, 2685, 5373, 10749, 21501, 43005, 86013, 172029, 344061, 688125, 1376253, 2752509, 5505021, 11010045, 22020093, etc.)).
    - <wrap important>Make sure the class doesn't hash an inserted key more than once, i.e. caches hashes to reuse when it is time to rebuild a bigger table.</wrap>
  - Method ''<nowiki>__init__()</nowiki>'' that accepts an optional iterable object containing initial keys for the set, à la the built-in ''[[https://docs.python.org/3/library/functions.html#func-set|set()]]'' constructor.
  - A [[https://en.wikipedia.org/wiki/Linear_probing|linear probing]] sequence that alternates looking first forward and then backward an increasing distance from the collision index. e.g. if there is a collision at index 0, the probing sequence to find an unused index is 1, -1, 2, -2, etc.
  - Method ''<nowiki>__bool__()</nowiki>'' that returns ''True'' if the set is non-empty, and ''False'' otherwise.
  - Method ''<nowiki>__contains__()</nowiki>'' that tests for membership in the set.
  - Method ''<nowiki>__getitem__()</nowiki>'' that returns the key at a given index of the internal table.
  - Method ''<nowiki>__iter__()</nowiki>'' that yields keys from the internal table, in the order encountered in the table.
  - Method ''<nowiki>__len__()</nowiki>'' that returns the number of keys in the set.
  - Method ''<nowiki>add()</nowiki>'' that adds a key to the set (keys must be [[https://docs.python.org/3.1/glossary.html#term-hashable|hashable]]).
  - Method ''<nowiki>table_size()</nowiki>'' that returns the size of the internal table.
</WRAP>

<wrap important>It should go without saying, but you may not use built-in ''set'', ''dict'', or anything built on top of those. Implement your hash table using a ''list'', as described in lecture.</wrap>

===== Demonstration =====

Here is a REPL session demonstrating the expected behavior of the class:

<code python>
>>> h = HashSet()
>>> len(h)
0
>>> h.table_size()
8
>>> list(h)
[]
>>> list(enumerate(h[i] for i in range(h.table_size())))
[(0, None), (1, None), (2, None), (3, None), (4, None), (5, None), (6, None), (7, None)]
>>> 
>>> h.add(1)
>>> len(h)
1
>>> h.table_size()
8
>>> list(h)
[1]
>>> list(enumerate(h[i] for i in range(h.table_size())))
[(0, None), (1, 1), (2, None), (3, None), (4, None), (5, None), (6, None), (7, None)]
>>> 
>>> h.add(10)
>>> len(h)
2
>>> h.table_size()
8
>>> list(enumerate(h[i] for i in range(h.table_size())))
[(0, None), (1, 1), (2, 10), (3, None), (4, None), (5, None), (6, None), (7, None)]
>>> list(h)
[1, 10]
>>> 
>>> h.add(9)  # collision at index 1, should probe 2 (used) and 0 (unused)
>>> len(h)
3
>>> list(h)
[9, 1, 10]
>>> h.table_size()
8
>>> list(enumerate(h[i] for i in range(h.table_size())))
[(0, 9), (1, 1), (2, 10), (3, None), (4, None), (5, None), (6, None), (7, None)]
>>> 
>>> h.add(10)  # duplicate, should be ignored
>>> len(h)
3
>>> list(h)
[9, 1, 10]
>>> list(enumerate(h[i] for i in range(h.table_size())))
[(0, 9), (1, 1), (2, 10), (3, None), (4, None), (5, None), (6, None), (7, None)]
>>> 
>>> h.add(18)  # collision at index 2, should probe 3 (unused)
>>> list(enumerate(h[i] for i in range(h.table_size())))
[(0, 9), (1, 1), (2, 10), (3, 18), (4, None), (5, None), (6, None), (7, None)]
>>> len(h)
4
>>> list(h)
[9, 1, 10, 18]
>>> 
>>> h.add (123)  # load factor 5/8, almost exceeds maximum...
>>> list(h)
[9, 1, 10, 18, 123]
>>> list(enumerate(h[i] for i in range(h.table_size())))
[(0, 9), (1, 1), (2, 10), (3, 18), (4, 123), (5, None), (6, None), (7, None)]
>>> 
>>> h.add(456)  # max load factor will exceed 2/3, must rebuild table
>>> len(h)
6
>>> list(h)
[18, 1, 456, 9, 10, 123]
>>> h.table_size()
18
>>> list(enumerate(h[i] for i in range(h.table_size())))
[(0, 18), (1, 1), (2, None), (3, None), (4, None), (5, None), (6, 456), (7, None), (8, None), (9, 9), (10, 10), (11, None), (12, None), (13, None), (14, None), (15, 123), (16, None), (17, None)]
>>>
>>> words = HashSet(map(str.rstrip, open('/srv/datasets/many-english-words.txt')))
>>> len(words)
704240
>>> words.table_size()
1376253
>>> 'PALOMA' in words
True
>>> 'PELOTA' in words
True
>>> 'PERICO' in words
False
>>> 'PICNIC' in words
True
</code>

-----

====== Submission ======

Submit your source-code file(s) via [[info:turnin]].

<WRAP round box>
{{https://jeff.cis.cabrillo.edu/images/feedback-robot.png?nolink }} //**Feedback Robot**//

This project has a feedback robot that will run some tests on your submission and provide you with a feedback report via email within roughly one minute.

<wrap em>Please read the feedback carefully.</wrap>
</WRAP>

====== Due Date and Point Value ======

<WRAP round important>
Due at 23:59:59 on the date listed on the [[:syllabus|syllabus]].

''Assignment 11'' is worth 60 points.

<code>
Possible point values per category:
---------------------------------------
Method functionality                 60
  (split roughly evenly)
Possible deductions:
  Style and practices            10–20%
Possible extra credit:
  Submission via Git                 5%
---------------------------------------
</code>
</WRAP>