Your browser doesn't support the features required by impress.js, so you are presented with a simplified version of this presentation.

For the best experience please use the latest Chrome, Safari or Firefox browser.

Hardware Meetup - Sputnik, 24. October 2019

Hash tables

... the basic background of their implementations.


Samo Penič

About me...

What is a hash table

Sometimes also called associative array or dictionary. It is a data structure used to map key into values



phone_number["Yoda"]


stores and returns us the information stored for varible, object, ...



It is widely used to build things such us dictionaries, telephone books, book indexes, ...


Hash table provide Insertion, Deletion and Retreival in constant time!

However...

The computer memory is accessible by addressing the data numerically.

phone[0]
Yoda
(01)321-4321
phone[1]
Obi
(01)222-2222
phone[2]
Luke
(01)123-1234
phone[3]
Darth
(01)111-1111

e.g.: Accessing the item at memory location phone[0] yields name Yoda and his phone number: (01)321-4321 (order reversed it is)


Finding phone number of Darth requires linear search through the array, which is not efficient.

Builing a phonebook...

    Name:
    Name:
    Name:
    Name:
    Name:
    Name:
    Name:
    Name:
    Name:
    Name:
    Name:
    Name:
    Name:

Mapping key to index

... is performed by so called hash functions. It can be as simple as sum(ascii_code[i]) % array_size


Let's try it out for our phone directory (array_size=13):





phone[0]
 
 
phone[1]
 
 
phone[2]
 
 
phone[3]
 
 
phone[4]
 
 
phone[5]
 
 
phone[6]
 
 
phone[7]
 
 
phone[8]
 
 
phone[9]
 
 
phone[10]
 
 
phone[11]
 
 
phone[12]
 
 

Collisions!

They will occur. The probability depends on Load factor (LF=number_of_items/array_size)


We can cope with collisions in different of ways:


Closed addressing

data[0]
data[1]
data[2]
data[3]
data[4]
data[5]
data[6]
data[7]
data[8]
data[9]
data[10]
data[11]
data[12]
data[13]

Open addressing




data[0]
 
 
data[1]
 
 
data[2]
 
 
data[3]
 
 
data[4]
 
 
data[5]
 
 
data[6]
 
 
data[7]
 
 
data[8]
 
 
data[9]
 
 
data[10]
 
 
data[11]
 
 
data[12]
 
 

Hash function has to...

Python dictionaries

Python relys heavily on dictionaries, so it is important they are efficient. Uses open addressing multihash to minimize collisions.


table = [None] * n;
for hash, key, value in entries:
    perturb = hash
    i = hash % n
    while table[i] is not None:
        #collision
        i = (5 * i * perturb + 1) % n
        perturb >>= 5
    table[i] = (key, value)

See talk: Raymond Hettinger -- Modern dictionaries (SF Python 2006)

Hash tables in firmware

Use a spacebar or arrow keys to navigate.
Press 'P' to launch speaker console.