skip to content
Site header image Minh Ton

Priority Queues and Hash Tables - DSA #8

Let’s take a deep-dive at two wonderful data structures that drive many popular algorithms: Priority Queues and Hash Tables!

Last Updated:

👋
Hey there! Welcome to my series of notes on Data Structures and Algorithms!


In this note, we’ll explore two powerful data structures: Priority Queues and Hash Tables! These structures drive many popular algorithms, making them both fascinating and essential to understand.

Priority Queues

We’re all familiar with the basic concept of a Queue—a linear data structure that follows the FIFO (First-In, First-Out) principle, meaning the first element inserted is the first to be removed. However, Priority Queues operate differently despite their similar name. But why do we need them?

Imagine you need to develop software to manage a hospital's emergency room. The software must track patients and determine the order in which they receive care. When a patient arrives, the staff creates a record in the database. What data structure would you use to store these records? 

Initially, you might consider a sorted list to treat patients alphabetically by name. However, it's not fair to prioritize a patient with a minor issue (e.g., “Donald” with a splinter) over a patient with a critical condition (e.g., “Mickey” with a high fever). A simple queue based on the order of arrival is also unsuitable. 

Emergency room staff assign an urgency or priority level to patients. The patient with the highest priority is treated by the next available doctor. Therefore, we need a data structure that can efficiently provide the patient with the highest priority. A Priority Queue is the appropriate data structure for this scenario. 

Basic Concept of Priority Queues

A Priority Queue is a data structure for maintaining a set of elements, each with an associated value called a priority.
A priority value indicates a task’s priority for completion.

In the above scenario, the priority value indicates a patient’s priority for treatment.

A priority value of natural numbers (N\in ℕ ) should be good enough in most cases. There are two ways for the value to indicate the priority, including smallest to highest for increasing priority, and vice-versa. In this note, we’ll assume that the smallest priority value indicates the highest priority.

Priority Queue Operations

The Queue data structure provides three operations:

  • isEmpty for checking whether a priority queue is empty.
  • enQueue for inserting a new element into the queue.
  • deQueue for removing and returning the element at the beginning of the queue.

A Priority Queue has equivalent operations, often referred to as Test, Insertion, and Extraction. Additionally, it supports two more operations:

  • Get: for getting the element in the priority queue with the highest priority.
  • Update: update the ithi^{th} element’s priority value with the new value.

In this note, we’ll be focusing on the Test, Insertion, and Extraction operations.

Implementation of Priority Queues

Using Singly-Linked List or Array

A singly-linked list is a relatively simple way to implement a priority queue. You can create a new list and insert new elements at the beginning.

Here's a singly-linked list definition for implementing a Priority Queue:

typedef struct Node* Ref;
struct Node {
	int pV; // The priority value
	Node* next;
};

Since we're arranging elements based on priority, should we use a sorted singly-linked list? Let's start with an unsorted list.


Using Unsorted Singly-Linked List

Let’s implement the function Test to check if a priority queue is empty. The Test function has a time complexity of Θ(1)\Theta(1).

Test(Ref pQ) {
	return (pQ == NULL);
}

For the insertion operation, we insert a new node at the beginning of the priority queue.

Insertion(Ref &pQ, Ref p) {
	p->next = pQ;
	pQ = p;
}

For the Extraction operation, we need to identify and remove the element with the highest priority. This requires O(n)\mathcal{O}(n) time. However, this operation should be as fast as possible. Therefore, an unsorted singly-linked list is not ideal.


Using Sorted Singly-Linked List

This singly-linked list is going to be arranged in ascending order of the elements based on their priority.

pQ13579\begin{array}{c}\text{pQ} \rightarrow \fbox{1} \rightarrow \fbox{3} \rightarrow \fbox{5} \rightarrow \fbox{7} \rightarrow \fbox{9} \rightarrow \otimes\end{array}

The Test function is simply just check if the first element reference of the priority queue is NULL, basically the same as the one that we’ve done previously.

Since we’re using a sorted singly-linked list, the first node of the list is the one with the smallest priority value. By extracting the first element, we can get the element whose priority is the highest. This can be done in Θ(1)\Theta(1) time.

Extraction(Ref &pQ) {
	if (!pQ) return;
	Ref p = pQ;
	p = pQ->next;
	p->next = NULL;
	return p;
}

For the Insertion function, there’s actually a great technique to insert a new node into a sorted singly-linked list demonstrated by Niklaus Wirth in his book Algorithms + Data Structures = Programs. It’s actually utilizing two different pointers running in parallel. This is how this works:

This is our sorted singly-linked list priority queue…

pQ579\begin{array}{c}\text{pQ} \rightarrow \fbox{5} \rightarrow \fbox{7} \rightarrow \fbox{9} \rightarrow \otimes\end{array}

…and the node that we’re going to insert into the linked list.

p8\begin{array}{c}\text{p} \rightarrow \fbox{8} \rightarrow \otimes\end{array}

We’ll define two pointers, p2 which points to the first element of our linked list, and p1 which points to the next element of the first one. We’re going to refer p1 as the “current” node while traversing the linked list.

p1pQ579p2\hspace{0.2cm} p_1 \\ \hspace{0.2cm} \downarrow \\ \begin{array}{c}\text{pQ} \rightarrow \fbox{5} \rightarrow \fbox{7} \rightarrow \fbox{9} \rightarrow \otimes\end{array} \\ \hspace{-1.7cm} \uparrow \\ \hspace{-1.7cm} p_2

Now, we keep traversing through the list while we hasn’t reached the tail (p1 ≠ NULL) and the priority value of the “current” node is smaller than that of the node to be inserted (p1->pV < p->pV).

p1pQ579p2\hspace{2.1cm} p_1 \\ \hspace{2.1cm} \downarrow \\ \begin{array}{c}\text{pQ} \rightarrow \fbox{5} \rightarrow \fbox{7} \rightarrow \fbox{9} \rightarrow \otimes\end{array} \\ \hspace{0.2cm} \uparrow \\ \hspace{0.2cm} p_2

When we reach the node where its priority value is larger or equal to the one to be inserted, stop traversing though the linked list. At this point, we insert the new node between the previous node p2 and the “current” node p1 by updating the pointers: p2->next = p links the previous node to the new node, and p->next = p1 connects the new node to the “current” node.

p1pQ5769p2\hspace{3cm} p_1 \\ \hspace{3cm} \downarrow \\ \begin{array}{c}\text{pQ} \rightarrow \fbox{5} \rightarrow \fbox{7} \rightarrow \fbox{6} \rightarrow \fbox{9} \rightarrow \otimes\end{array} \\ \hspace{-0.7cm} \uparrow \\ \hspace{-0.7cm} p_2
Insertion(Ref &pQ, Ref p) {
	Ref p2 = pQ, p1 = pQ->next;
	while (p1 && p1->pV < p->pV) {
		p2 = p1; p1 = p1->next;
	}
	p2->next = p; p->next = p1;
}

What happens if the priority queue is empty? In that case, pQ->next doesn't exist. To handle this, we can use a dummy head node for our linked list, which doesn't store data. This allows us to call the Insertion operation without errors.

Ref pQ = new Node;
pQ->next = NULL;
Insertion(pQ, p);

Using Min-Heap

In reality, we use a min-heap for implementing the Priority Queue data structure. In a min-heap, the smallest element is always at the top (the root).

a1a2a3a4a5a6a7a8a9123574589\begin{array}{|c|c|c|c|c|c|c|c|c|} \hline a_1 & a_2 & a_3 & a_4 & a_5 & a_6 & a_7 & a_8 & a_9 \\ \hline 1 & 2 & 3 & 5 & 7 & 4 & 5 & 8 & 9 \\ \hline \end{array}

The Extraction operation involves retrieving and removing this minimum element. For the Insertion operation, we can use the same approach as in Heap Sort in Advanced Sorting Algorithms - DSA #6: place the new element at the end of the array and swap it with the first element. After both operations, we must reconstruct the heap using the sifting algorithm to maintain its structure. 

Since we use the sifting algorithm, both operations have a time complexity of O(logn)\mathcal{O}(\log n).

Using Balanced Binary Search Tree

This implementation of the Priority Queue using Balanced Binary Search Tree will be introduced in a future note. However, this implementation isn’t efficient in practice. More information can be found in the source below.

Hash Tables

Introduction to Hash Tables

In computing, the most common operations on a set of elements include searching for a specific element, inserting a new element, and deleting an existing element. A data structure that supports these three operations is called a dictionary. There are quite a few ways a dictionary can be implemented, and one of them is the hashing technique.

Hashing

Firstly, let’s assume that we have to implement a dictionary of nn records with keys k1,k2,,knk_1, k_2, \dots, k_n.

💡
A record is a data structure which contains several fields, possibly of different data types. In order to keep the information of a specific entity, a record always has a special field called “key” for identification.

In the hashing technique, we only care about the keys, which makes every record distinct from each other.

Hashing is based on the idea of distributing keys among a one-dimensional array H[0..m1]H[0..m - 1] called a Hash Table. Given a predefined function hh (called hash function), the distribution is done by computing the value h(ki),i[1,n]h(k_i), \forall i \in [1,n]. To be more specific, h(ki)[0,m1]h(k_i) \in [0, m - 1] is called a hash value and determines the location where the key is stored in the hash table.

Hash Function

There are a lot of simple hash functions, and each can be used for different types of keys, including integers, letters, and character strings.

  • If the keys are non-negative integers, a simple hash function can be defined as:
    h(k)=kmodmh(k) = k \mod m

    where kk is the key, mm is the size of the hash table, and modmod denotes the modulus operator. For example, if k=37k = 37 and m=10m = 10, then h(37)=37mod10=7h(37) = 37 \mod 10 = 7.

  • If the keys are letters from some alphabet Σ\Sigma, we need to first convert them into numerical values. A simple approach is to use their ordinal value, or their position in the alphabet:
    h(k)=ord(k)modmh(k) = \text{ord}(k) \mod m

    where ord(k)\text{ord}(k) gives the position of the letter in the alphabet, and mm is the table size. For example, if k=Dk = D, and assuming ord(D)=4\text{ord}(D) = 4, then h(D)=4mod10=4h(D) = 4 \mod 10 = 4.

  • When keys are strings (e.g., words), we extend our approach by summing the ordinal values of each character and then applying the modulus operation:
    h(k)=(i=1nord(ci))modmh(k) = \left( \sum_{i=1}^{n} \text{ord}(c_i) \right) \mod m

    where cic_i is the ithi^{th} character of the string, ord(ci)\text{ord}(c_i) is the ordinal value of character cic_i, nn is the length of the string, and mm is the hash table size. For example, if k=“CAT”k = \text{“CAT”} and using ord(C)=3\text{ord}(C) = 3, ord(A)=1\text{ord}(A) = 1, ord(T)=20\text{ord}(T) = 20, h(“CAT”)=(3+1+20)mod10=24mod10=4h(\text{“CAT”}) = (3+1+20) \mod 10 = 24 \mod 10 = 4.

Let’s see the hash function in action! We’ll use the most basic hash function h(k)=kmodmh(k) = k \mod m for demonstration purpose.

Hash Function: h(k)=kmodmSince we’ll use a hash table of size 7m=7Our list of keys: k1k2k3k4k5361117Calculate the hash value of each keys: h(k1)=3mod7=3h(k2)=6mod7=6h(k3)=1mod7=1h(k4)=11mod7=4h(k5)=7mod7=0h(k1)h(k2)h(k3)h(k4)h(k5)36140Distributing the keys into the hash table: KeyIndexk5k3k1k4k20123456\text{Hash Function: } h(k) = k \mod m \\ \text{Since we'll use a hash table of size 7} \rightarrow m = 7 \\[10pt] \text{Our list of keys: } \\ \begin{array}{|c|c|c|c|c|} \hline k_1 & k_2 & k_3 & k_4 & k_5 \\ \hline 3 & 6 & 1 & 11 & 7 \\ \hline \end{array} \\ \\[10pt] \text{Calculate the hash value of each keys: } \\ \begin{array}{|c|} \hline h(k_1) = 3 \mod 7 = 3 \\ \hline h(k_2) = 6 \mod 7 = 6 \\ \hline h(k_3) = 1 \mod 7 = 1 \\ \hline h(k_4) = 11 \mod 7 = 4 \\ \hline h(k_5) = 7 \mod 7 = 0 \\ \hline \end{array} \\ \\[5pt] \begin{array}{|c|c|c|c|c|} \hline h(k_1) & h(k_2) & h(k_3) & h(k_4) & h(k_5) \\ \hline 3 & 6 & 1 & 4 & 0 \\ \hline \end{array} \\ \\[10pt] \text{Distributing the keys into the hash table: } \\ \begin{array}{c} \text{Key} \quad \\ \hline \text{Index} \quad \end{array} \begin{array}{|c|c|c|c|c|c|c|} \hline k_5 & k_3 & & k_1 & k_4 & & k_2 \\ \hline 0 & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline \end{array} \\[10pt]

In general, a hash function needs to satisfy some requirements. It should evenly distribute keys across the hash table, be computationally efficient, and operate in constant time, regardless of the number of keys or the table’s size.

Collisions

Collision is a phenomenon of two (or more) keys being hashed into the same cell of the hash table.

If the hash table size mm is smaller than the number of keys nn, collisions are inevitable. Even when mm is significantly larger than nn, collisions can still occur. However, with a well-chosen table size and an effective hash function, collisions become rare.

Every hashing scheme requires a collision resolution mechanism, which differs between the two primary types of hashing:

  • Open Hashing (or Separate Chaining)
  • Closed Hashing (or Open Addressing)

Open Hashing (Separate Chaining)

In open hashing, the keys are stored on linked lists (or buckets) attached to cells of the hash table. Each bucket contains all the keys hashed to its cell.

We will use these definitions for the pseudo-code implementations:

typedef struct HashNode* Ref;
struct HashNode {
	string key;
	int value;
	HashNode* next;
};

Ref table[1..m]

Search Operation of Open Hashing

To search for a key kk in the hash table, we first compute its hash value h(k)h(k), using a hash function.

If the bucket at index h(k)h(k) is empty, the key is not present in the table. However, if the bucket is occupied, it may contain kk or other keys that have hashed to the same index due to collisions. In such cases, we traverse the bucket to check for an occurrence of kk, ensuring that the correct key is found.

Hash Function: h(k)=kmodmbucket at h(k) = 0: 1015202530bucket at h(k) = 1: 1116212631bucket at h(k) = 2: 1217222732bucket at h(k) = 3: 030813182328bucket at h(k) = 4: 091419242934Search for k=28 with h(28)=28mod5=3Bucket at index 3 is not NULL, so we traverse list at index 3030813182328\text{Hash Function: } h(k) = k \mod m \\[10pt] \begin{array}{cl} \text{bucket at h(k) = 0: } & \quad \fbox{10} \rightarrow \fbox{15} \rightarrow \fbox{20} \rightarrow \fbox{25} \rightarrow \fbox{30} \\ \text{bucket at h(k) = 1: } & \quad \fbox{11} \rightarrow \fbox{16} \rightarrow \fbox{21} \rightarrow \fbox{26} \rightarrow \fbox{31} \\ \text{bucket at h(k) = 2: } & \quad \fbox{12} \rightarrow \fbox{17} \rightarrow \fbox{22} \rightarrow \fbox{27} \rightarrow \fbox{32} \\ \text{bucket at h(k) = 3: } & \quad \fbox{03} \rightarrow \fbox{08} \rightarrow \fbox{13} \rightarrow \fbox{18} \rightarrow \fbox{23} \rightarrow \fbox{28} \\ \text{bucket at h(k) = 4: } & \quad \fbox{09} \rightarrow \fbox{14} \rightarrow \fbox{19} \rightarrow \fbox{24} \rightarrow \fbox{29} \rightarrow \fbox{34} \\ \end{array} \\ \\[10pt] \textcolor{cyan}{\text{Search for } k = 28 \text{ with } h(28) = 28 \mod 5 = 3} \\ \text{Bucket at index 3 is not NULL, so we traverse list at index } 3 \\ \\[5pt] \quad \fbox{03} \rightarrow \fbox{08} \rightarrow \fbox{13} \rightarrow \fbox{18} \rightarrow \fbox{23} \rightarrow \textcolor{red}{\fbox{28}} \\
Search(key) {
	index = hashFunction(key);
	Ref p = table[index];
	
	// Traverse the bucket
	while (p) { 
		if (p->key == key) return p;
		p = p->next;
	}
	
	return NULL;
}

Insertion Operation of Open Hashing

To insert a key kk into a hash table, we first check whether it is already present. This involves computing its hash value h(k)h(k) and searching the corresponding bucket.

If the key is found, no further action is needed. If the search is unsuccessful, meaning kk is not yet in the table, it is added to the bucket at index h(k)h(k).

Hash Function: h(k)=kmodmbucket at h(k) = 0: 1015202530bucket at h(k) = 1: 1116212631bucket at h(k) = 2: 1217222732bucket at h(k) = 3: 030813182328bucket at h(k) = 4: 091419242934Insert k=37 with h(37)=37mod5=2Bucket at index 3 does not contain 37, so we insert 37to the beginning of the list at index 3371217222732bucket at h(k) = 0: 1015202530bucket at h(k) = 1: 1116212631bucket at h(k) = 2: 371217222732bucket at h(k) = 3: 030813182328bucket at h(k) = 4: 091419242934\text{Hash Function: } h(k) = k \mod m \\[10pt] \begin{array}{cl} \text{bucket at h(k) = 0: } & \quad \fbox{10} \rightarrow \fbox{15} \rightarrow \fbox{20} \rightarrow \fbox{25} \rightarrow \fbox{30} \\ \text{bucket at h(k) = 1: } & \quad \fbox{11} \rightarrow \fbox{16} \rightarrow \fbox{21} \rightarrow \fbox{26} \rightarrow \fbox{31} \\ \text{bucket at h(k) = 2: } & \quad \fbox{12} \rightarrow \fbox{17} \rightarrow \fbox{22} \rightarrow \fbox{27} \rightarrow \fbox{32} \\ \text{bucket at h(k) = 3: } & \quad \fbox{03} \rightarrow \fbox{08} \rightarrow \fbox{13} \rightarrow \fbox{18} \rightarrow \fbox{23} \rightarrow \fbox{28} \\ \text{bucket at h(k) = 4: } & \quad \fbox{09} \rightarrow \fbox{14} \rightarrow \fbox{19} \rightarrow \fbox{24} \rightarrow \fbox{29} \rightarrow \fbox{34} \\ \end{array} \\ \\[10pt] \textcolor{cyan}{\text{Insert } k = 37 \text{ with } h(37) = 37 \mod 5 = 2} \\ \text{Bucket at index 3 does not contain 37, so we insert 37} \\ \text{to the beginning of the list at index } 3 \\[5pt] \quad \textcolor{red}{\fbox{37}} \rightarrow \fbox{12} \rightarrow \fbox{17} \rightarrow \fbox{22} \rightarrow \fbox{27} \rightarrow \fbox{32} \\[10pt] \begin{array}{cl} \text{bucket at h(k) = 0: } & \quad \fbox{10} \rightarrow \fbox{15} \rightarrow \fbox{20} \rightarrow \fbox{25} \rightarrow \fbox{30} \\ \text{bucket at h(k) = 1: } & \quad \fbox{11} \rightarrow \fbox{16} \rightarrow \fbox{21} \rightarrow \fbox{26} \rightarrow \fbox{31} \\ \text{bucket at h(k) = 2: } & \quad \textcolor{red}{\fbox{37}} \rightarrow \fbox{12} \rightarrow \fbox{17} \rightarrow \fbox{22} \rightarrow \fbox{27} \rightarrow \fbox{32} \\ \text{bucket at h(k) = 3: } & \quad \fbox{03} \rightarrow \fbox{08} \rightarrow \fbox{13} \rightarrow \fbox{18} \rightarrow \fbox{23} \rightarrow \fbox{28} \\ \text{bucket at h(k) = 4: } & \quad \fbox{09} \rightarrow \fbox{14} \rightarrow \fbox{19} \rightarrow \fbox{24} \rightarrow \fbox{29} \rightarrow \fbox{34} \\ \end{array} \\
Insertion(key, value) {
	index = hashFunction(key);
	
	// Do nothing if the key is already present
	if (Search(key)) return;
	
	// Add the key to the beginning of the bucket
	Ref p = new HashNode{ key, value, table[index] };
	table[index] = p;
}

Deletion Operation of Open Hashing

To remove a key kk in the hash table, we first search for it by computing its hash value h(k)h(k) and checking the corresponding bucket. If the key is found, the node containing kk is deleted from the bucket at index h(k)h(k). If the key is not found, no action is needed, as it is not present in the table.

Hash Function: h(k)=kmodmbucket at h(k) = 0: 1015202530bucket at h(k) = 1: 1116212631bucket at h(k) = 2: 1217222732bucket at h(k) = 3: 030813182328bucket at h(k) = 4: 091419242934Delete k=19 with h(19)=19mod5=4Bucket at index 3 does contain 19, so wetraverse the list at index 4 and delete 1991419242934bucket at h(k) = 0: 1015202530bucket at h(k) = 1: 1116212631bucket at h(k) = 2: 1217222732bucket at h(k) = 3: 030813182328bucket at h(k) = 4: 0914242934\text{Hash Function: } h(k) = k \mod m \\[10pt] \begin{array}{cl} \text{bucket at h(k) = 0: } & \quad \fbox{10} \rightarrow \fbox{15} \rightarrow \fbox{20} \rightarrow \fbox{25} \rightarrow \fbox{30} \\ \text{bucket at h(k) = 1: } & \quad \fbox{11} \rightarrow \fbox{16} \rightarrow \fbox{21} \rightarrow \fbox{26} \rightarrow \fbox{31} \\ \text{bucket at h(k) = 2: } & \quad \fbox{12} \rightarrow \fbox{17} \rightarrow \fbox{22} \rightarrow \fbox{27} \rightarrow \fbox{32} \\ \text{bucket at h(k) = 3: } & \quad \fbox{03} \rightarrow \fbox{08} \rightarrow \fbox{13} \rightarrow \fbox{18} \rightarrow \fbox{23} \rightarrow \fbox{28} \\ \text{bucket at h(k) = 4: } & \quad \fbox{09} \rightarrow \fbox{14} \rightarrow \fbox{19} \rightarrow \fbox{24} \rightarrow \fbox{29} \rightarrow \fbox{34} \\ \end{array} \\ \\[10pt] \textcolor{cyan}{\text{Delete } k = 19 \text{ with } h(19) = 19 \mod 5 = 4} \\ \text{Bucket at index 3 does contain 19, so we} \\ \text{traverse the list at index 4 and delete 19} \\[5pt] \quad \fbox{9} \rightarrow \fbox{14} \rightarrow \textcolor{red}{\cancel{\fbox{19}}} \rightarrow \fbox{24} \rightarrow \fbox{29} \rightarrow \fbox{34} \\ \\[10pt] \begin{array}{cl} \text{bucket at h(k) = 0: } & \quad \fbox{10} \rightarrow \fbox{15} \rightarrow \fbox{20} \rightarrow \fbox{25} \rightarrow \fbox{30} \\ \text{bucket at h(k) = 1: } & \quad \fbox{11} \rightarrow \fbox{16} \rightarrow \fbox{21} \rightarrow \fbox{26} \rightarrow \fbox{31} \\ \text{bucket at h(k) = 2: } & \quad \fbox{12} \rightarrow \fbox{17} \rightarrow \fbox{22} \rightarrow \fbox{27} \rightarrow \fbox{32} \\ \text{bucket at h(k) = 3: } & \quad \fbox{03} \rightarrow \fbox{08} \rightarrow \fbox{13} \rightarrow \fbox{18} \rightarrow \fbox{23} \rightarrow \fbox{28} \\ \text{bucket at h(k) = 4: } & \quad \fbox{09} \rightarrow \fbox{14} \rightarrow \fbox{24} \rightarrow \fbox{29} \rightarrow \fbox{34} \\ \end{array} \\
Deletion(key) {
	index = hashFunction(key);
	
	// Bucket is empty -> Key does not exist
	if (!table[index]) return;
	
	// Key is found at the head of the linked list
	if (table[index]->key == key) {
		// Delete the head
		Ref p = table[index];
		table[index] = table[index]->next;
		delete p;
		return;
	} 
	
	// Use two pointers to traverse the linked list
	// Similar to the Insertion operation of Priority Queue
	Ref p2 = table[index], p1 = table[index]->next;
	while (p1 && p1->key != key) {
		p2 = p1; p1 = p1->next;
	}
	
	// If found, delete the node containing the key
	if (!p1) return;
	p2->next = p1->next;
	delete p1;
}

Analysis of Open Hashing

The efficiency of Open Hashing depends on the lengths of the buckets, which, in turn, depend on the table size and the quality of the hash function.

If nn keys are distributed among mm cells of the hash table evenly, the length of each bucket will be about α=nm\alpha = \frac{n}{m}, where α\alpha is called the load factor of the hash table and should not be far from 11.

  • α1\alpha \ll 1 indicates an efficient use of space, as there are too many empty singly-linked list in the hash table.
  • α1\alpha \gg 1 indicates that the linked lists are very long, which will take more time to perform operations on these lists.
  • α1\alpha \approx 1 is our preferred load factor, most of the time we only need one comparison (or constant time) to perform the three operations on the hash table.

Closed Hashing (Open Addressing)

In closed hashing, all keys are stored in the hash table itself without the use of linked lists. Therefore, the table size mm must be at least as large as the number of records nn.

There are different strategies that can be employed for collision resolution, with the simplest one being called linear probing. The idea of this strategy is to check the cell located next to the position where collision occurs.

We will use these definitions for the implementation of the hash table:

struct HashNode {
	string key;
	int value;
}

HashNode* table[1..m]

Search Operation of Closed Hashing

To search for a key kk in a hash table, its hash value h(k)h(k) is first computed.

If the cell at index h(k)h(k) is empty, the search is unsuccessful.

Otherwise, kk is compared with the key stored in that cell.

If they match, the search is successful.

If not, the search continues by checking the next cell.

This process repeats until either a matching key is found or an empty cell is encountered, indicating an unsuccessful search. If the end of the hash table is reached, the search wraps around to the beginning of the table. In other words, the hash table is now a circular array!

Hash Function: h(k)=kmodmKeyIndex10182228901234Search for k=18 with h(18)=18mod5=3First Probe:{index = 3mod5=3;table[3] = 28not foundSecond Probe:{index = (3 + 1)mod5=4;table[4] = 9not foundThird Probe:{index = (3 + 2)mod5=0;table[0] = 10not foundFourth Probe:{index = (3 + 3)mod5=1;table[1] = 18Found at table[1]\text{Hash Function: } h(k) = k \mod m \\[10pt] \begin{array}{c} \text{Key} \quad \\ \hline \text{Index} \quad \end{array} \begin{array}{|c|c|c|c|c|} \hline 10 & 18 & 22 & 28 & 9 \\ \hline 0 & 1 & 2 & 3 & 4 \\ \hline \end{array} \\[10pt] \textcolor{cyan}{\text{Search for } k = 18 \text{ with } h(18) = 18 \mod 5 = 3} \\ \\[10pt] \text{First Probe:} \begin{cases} \text{index = 3} \mod 5 = 3; \\ \text{table[3] = 28} \rightarrow \text{not found} \end{cases} \\ \text{Second Probe:} \begin{cases} \text{index = (3 + 1)} \mod 5 = 4; \\ \text{table[4] = 9} \rightarrow \text{not found} \end{cases} \\ \text{Third Probe:} \begin{cases} \text{index = (3 + 2)} \mod 5 = 0; \\ \text{table[0] = 10} \rightarrow \text{not found} \end{cases} \\ \text{Fourth Probe:} \begin{cases} \text{index = (3 + 3)} \mod 5 = 1; \\ \text{table[1] = 18} \rightarrow \textcolor{red}{\text{Found at table[1]}} \end{cases} \\
Search(key) {
	hashValue = hashFunction(key);
	
	// The 'for' loop for probing
	for (i = 0; i < m; i++) {
		// Calculate the index of the cell
		index = (hashValue + i) % m;
		
		if (!table[index]) 
			return NULL;
		if (table[index]->key == key) 
			return table[index];
	}
	return NULL;
}

Insertion Operation of Closed Hashing

To insert a key kk into a hash table, the process begins by searching for an empty cell. This involves computing the hash value h(k)h(k) and checking if the corresponding index is available. If the cell is occupied, the search continues by checking the next cell, until an empty cell is found. Once located, kk is inserted into the empty cell.

KeyIndexTomMickeyPlutoJerryBalooMinnieTeddy012345678Insert k=‘Donald’ with h(k)=3First Probe:{index = 3mod9=3;table[3] is emptyInsert ‘Donald’KeyIndexTomMickeyDonaldPlutoJerryBalooMinnieTeddy012345678Insert k=‘Trump’ with h(k)=7First Probe:{index = 7mod9=7;table[7] is occupiedcontinueSecond Probe:{index = (7 + 1)mod9=8;table[8] is occupiedcontinueThird Probe:{index = (7 + 2)mod9=0;table[0] is emptyInsert ‘Trump’KeyIndexTrumpTomMickeyDonaldPlutoJerryBalooMinnieTeddy012345678\begin{array}{c} \text{Key} \quad \\ \hline \text{Index} \quad \end{array} \begin{array}{|c|c|c|c|c|c|c|c|c|} \hline \hspace{0.6cm} & \text{\small{Tom}} & \text{\small{Mickey}} & \hspace{0.6cm} & \text{\small{Pluto}} & \text{\small{Jerry}} & \text{\small{Baloo}} & \text{\small{Minnie}} & \text{\small{Teddy}}\\ \hline 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ \hline \end{array} \\[10pt] \textcolor{cyan}{\text{Insert } k = \text{‘Donald’ with } h(k) = 3} \\ \\[10pt] \text{First Probe:} \begin{cases} \text{index = 3} \mod 9 = 3; \\ \text{table[3] is empty} \rightarrow \textcolor{red}{\text{Insert ‘Donald’}} \end{cases} \\ \\[10pt] \begin{array}{c} \text{Key} \quad \\ \hline \text{Index} \quad \end{array} \begin{array}{|c|c|c|c|c|c|c|c|c|} \hline \hspace{0.6cm} & \text{\small{Tom}} & \text{\small{Mickey}} & \textcolor{red}{\text{\small{Donald}}} & \text{\small{Pluto}} & \text{\small{Jerry}} & \text{\small{Baloo}} & \text{\small{Minnie}} & \text{\small{Teddy}}\\ \hline 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ \hline \end{array} \\[10pt] \textcolor{cyan}{\text{Insert } k = \text{‘Trump’ with } h(k) = 7} \\ \\[10pt] \text{First Probe:} \begin{cases} \text{index = 7} \mod 9 = 7; \\ \text{table[7] is occupied} \rightarrow \text{continue} \end{cases} \\ \text{Second Probe:} \begin{cases} \text{index = (7 + 1)} \mod 9 = 8; \\ \text{table[8] is occupied} \rightarrow \text{continue} \end{cases} \\ \text{Third Probe:} \begin{cases} \text{index = (7 + 2)} \mod 9 = 0; \\ \text{table[0] is empty} \rightarrow \textcolor{red}{\text{Insert ‘Trump’}} \end{cases} \\ \\[10pt] \begin{array}{c} \text{Key} \quad \\ \hline \text{Index} \quad \end{array} \begin{array}{|c|c|c|c|c|c|c|c|c|} \hline \textcolor{red}{\text{\small{Trump}}} & \text{\small{Tom}} & \text{\small{Mickey}} & \text{\small{Donald}} & \text{\small{Pluto}} & \text{\small{Jerry}} & \text{\small{Baloo}} & \text{\small{Minnie}} & \text{\small{Teddy}}\\ \hline 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ \hline \end{array}
Insertion(key, value) {
	hashValue = hashFunction(key);
	
	for (i = 0; i < m; i++) {
		index = (hashValue + i) % m;
		
		if (!table[index]) { 
			// Insert into the empty cell
			table[index] = new hashNode{ key, value };
			return;
		} else if (table[index]->key == key) {
			// The key already exists --> Update its key value
			table[index]->value = value;
			return;
		}
	}
}

Rehashing

If the hash table is full, operations like the Insertion operation will fail.

Rehashing is a process used to handle situations where a hash table becomes full or nearly full. When rehashing occurs, the current hash table is scanned, and all its key are relocated into a larger table.

Deletion Operation of Closed Hashing

To remove a key kk, we first search for it in within the hash table by computing the hash value h(k)h(k). If the search is successful, kk is deleted from the cell.

KeyIndexTrumpTomMickeyDonaldPlutoJerryBalooMinnieTeddy012345678Delete k=‘Trump’ with h(k)=7First Probe:{index = 7mod9=7;table[7]  ‘Trump’continueSecond Probe:{index = (7 + 1)mod9=8;table[8]  ‘Trump’continueThird Probe:{index = (7 + 2)mod9=0;table[0] = ‘Trump’Delete ‘Trump’KeyIndexTomMickeyDonaldPlutoJerryBalooMinnieTeddy012345678\begin{array}{c} \text{Key} \quad \\ \hline \text{Index} \quad \end{array} \begin{array}{|c|c|c|c|c|c|c|c|c|} \hline \text{\small{Trump}} & \text{\small{Tom}} & \text{\small{Mickey}} & \text{\small{Donald}} & \text{\small{Pluto}} & \text{\small{Jerry}} & \text{\small{Baloo}} & \text{\small{Minnie}} & \text{\small{Teddy}}\\ \hline 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ \hline \end{array} \\[10pt] \textcolor{cyan}{\text{Delete } k = \text{‘Trump’ with } h(k) = 7} \\ \\[10pt] \text{First Probe:} \begin{cases} \text{index = 7} \mod 9 = 7; \\ \text{table[7] $\neq$ ‘Trump’} \rightarrow \text{continue} \end{cases} \\ \text{Second Probe:} \begin{cases} \text{index = (7 + 1)} \mod 9 = 8; \\ \text{table[8] $\neq$ ‘Trump’} \rightarrow \text{continue} \end{cases} \\ \text{Third Probe:} \begin{cases} \text{index = (7 + 2)} \mod 9 = 0; \\ \text{table[0] = ‘Trump’} \rightarrow \textcolor{red}{\text{Delete ‘Trump’}} \end{cases} \\ \\[10pt] \begin{array}{c} \text{Key} \quad \\ \hline \text{Index} \quad \end{array} \begin{array}{|c|c|c|c|c|c|c|c|c|} \hline \hspace{0.6cm} & \text{\small{Tom}} & \text{\small{Mickey}} & \text{\small{Donald}} & \text{\small{Pluto}} & \text{\small{Jerry}} & \text{\small{Baloo}} & \text{\small{Minnie}} & \text{\small{Teddy}}\\ \hline 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ \hline \end{array}

However, a challenge arises when kk is in the table, but the search is unsuccessful due to the probing method used! This issue occurs because deletion can disrupt the search chain, leading to failed lookups for existing keys!

KeyIndexTomMickeyDonaldPlutoJerryBalooMinnieTeddy012345678Delete k=‘Tom’ with h(k)=8First Probe:{index = 8mod9=8;table[8]  ‘Tom’continueSecond Probe:{index = (8 + 1)mod9=0;table[0] is emptyStop‘Tom’ is not deleted!\begin{array}{c} \text{Key} \quad \\ \hline \text{Index} \quad \end{array} \begin{array}{|c|c|c|c|c|c|c|c|c|} \hline \hspace{0.6cm} & \text{\small{Tom}} & \text{\small{Mickey}} & \text{\small{Donald}} & \text{\small{Pluto}} & \text{\small{Jerry}} & \text{\small{Baloo}} & \text{\small{Minnie}} & \text{\small{Teddy}}\\ \hline 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ \hline \end{array} \\[10pt] \textcolor{cyan}{\text{Delete } k = \text{‘Tom’ with } h(k) = 8} \\ \\[10pt] \text{First Probe:} \begin{cases} \text{index = 8} \mod 9 = 8; \\ \text{table[8] $\neq$ ‘Tom’} \rightarrow \text{continue} \end{cases} \\ \text{Second Probe:} \begin{cases} \text{index = (8 + 1)} \mod 9 = 0; \\ \text{table[0] is empty} \rightarrow \textcolor{red}{\text{Stop}} \end{cases} \\[10pt] \textcolor{red}{\text{‘Tom’ is not deleted!}}

To handle this, previously occupied locations should be marked with a special symbol to distinguish them from locations that have not been occupied!

KeyIndexTrumpTomMickeyDonaldPlutoJerryBalooMinnieTeddy012345678Delete k=‘Trump’ with h(k)=7First Probe:{index = 7mod9=7;table[7]  ‘Trump’continueSecond Probe:{index = (7 + 1)mod9=8;table[8]  ‘Trump’continueThird Probe:{index = (7 + 2)mod9=0;table[0] = ‘Trump’Delete ‘Trump’KeyIndexTomMickeyDonaldPlutoJerryBalooMinnieTeddy012345678Mark the previously occupied location with -∞ Delete k=‘Tom’ with h(k)=8First Probe:{index = 8mod9=8;table[8]  ‘Tom’continueSecond Probe:{index = (8 + 1)mod9=0;table[0] = -∞Still continueThird Probe:{index = (8 + 2)mod9=1;table[1] = ‘Tom’Delete ‘Tom’KeyIndexMickeyDonaldPlutoJerryBalooMinnieTeddy012345678\begin{array}{c} \text{Key} \quad \\ \hline \text{Index} \quad \end{array} \begin{array}{|c|c|c|c|c|c|c|c|c|} \hline \text{\small{Trump}} & \text{\small{Tom}} & \text{\small{Mickey}} & \text{\small{Donald}} & \text{\small{Pluto}} & \text{\small{Jerry}} & \text{\small{Baloo}} & \text{\small{Minnie}} & \text{\small{Teddy}}\\ \hline 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ \hline \end{array} \\[10pt] \textcolor{cyan}{\text{Delete } k = \text{‘Trump’ with } h(k) = 7} \\ \\[10pt] \text{First Probe:} \begin{cases} \text{index = 7} \mod 9 = 7; \\ \text{table[7] $\neq$ ‘Trump’} \rightarrow \text{continue} \end{cases} \\ \text{Second Probe:} \begin{cases} \text{index = (7 + 1)} \mod 9 = 8; \\ \text{table[8] $\neq$ ‘Trump’} \rightarrow \text{continue} \end{cases} \\ \text{Third Probe:} \begin{cases} \text{index = (7 + 2)} \mod 9 = 0; \\ \text{table[0] = ‘Trump’} \rightarrow \textcolor{red}{\text{Delete ‘Trump’}} \end{cases} \\ \\[10pt] \begin{array}{c} \text{Key} \quad \\ \hline \text{Index} \quad \end{array} \begin{array}{|c|c|c|c|c|c|c|c|c|} \hline -∞ & \text{\small{Tom}} & \text{\small{Mickey}} & \text{\small{Donald}} & \text{\small{Pluto}} & \text{\small{Jerry}} & \text{\small{Baloo}} & \text{\small{Minnie}} & \text{\small{Teddy}}\\ \hline 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ \hline \end{array} \\[10pt] \textcolor{orange}{\text{Mark the previously occupied location with -∞ }} \\[10pt] \textcolor{cyan}{\text{Delete } k = \text{‘Tom’ with } h(k) = 8} \\ \\[10pt] \text{First Probe:} \begin{cases} \text{index = 8} \mod 9 = 8; \\ \text{table[8] $\neq$ ‘Tom’} \rightarrow \text{continue} \end{cases} \\ \text{Second Probe:} \begin{cases} \text{index = (8 + 1)} \mod 9 = 0; \\ \text{table[0] = -∞} \rightarrow \textcolor{green}{\text{Still continue}} \end{cases} \\ \text{Third Probe:} \begin{cases} \text{index = (8 + 2)} \mod 9 = 1; \\ \text{table[1] = ‘Tom’} \rightarrow \textcolor{red}{\text{Delete ‘Tom’}} \end{cases} \\ \\[10pt] \begin{array}{c} \text{Key} \quad \\ \hline \text{Index} \quad \end{array} \begin{array}{|c|c|c|c|c|c|c|c|c|} \hline -∞ & -∞ & \text{\small{Mickey}} & \text{\small{Donald}} & \text{\small{Pluto}} & \text{\small{Jerry}} & \text{\small{Baloo}} & \text{\small{Minnie}} & \text{\small{Teddy}}\\ \hline 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ \hline \end{array}
Deletion(key) {
	hashValue = hashFunction(key);
	for (i = 0; i < m; i++) {
		index = (hashValue + i) % m;
		if (!table[index]) return;
		if (table[index] != -&& table[index]->key == key) {
			// Mark the cell as deleted with -∞
			table[index] = -
			return;
		}
	}
}

Since we’re marking deleted keys with -∞, we will also need to modify the Search and Insertion function to treat the cells with -∞ as empty cells.

// For the 'Search' function
if (!table[index] || table[index] == -∞) 
	return NULL; // .... and the rest of the function
		
		
// For the 'Insertion' function
if (!table[index] || table[index] == -∞) { 
	table[index] = new hashNode{ key, value };
	return;
} else { // .... and the rest of the function

Analysis of Closed Hashing

Analyzing linear probing is much harder than open hashing because collisions can cause clusters of occupied cells, slowing things down. When the table isn’t too full, operations usually take about one cell access on average.



👏 Thanks for reading!