Huffman Coding

You are given an array 'ARR' of Integers having 'N' elements. The array contains an encoded message. For each index 'i', 'ARR[i]' denotes the frequency of the 'i'th' character in the message. The characters are of an alien language having 'N' alphabets. Given the frequency of each of the 'N' alphabets in the message, your task is to find out the Huffman codes for each of the 'N' alphabets in the message.

The Huffman Code for a message is the set of codes such that :

1) All codes are binary strings.
2) Each code should be able to determine its corresponding character uniquely.
3) The total numbers of bits used to represent the message are minimized.

Note:

If there are multiple sets of valid Huffman codes for a message. You can print any of them.

For example:

Consider the array ARR = [ 1, 4, 2 ] having 3 elements. 
The array containing Huffman Codes for the above array will be [ '10', '0', '11' ]. Other Valid Huffman Codes are [ '01', '1', '00' ], [ '00', '1', '01' ] etc. Codes like [ '1', '0', '01' ], [ '1', '10' , '0' ] are some of the invalid Huffman Codes.
Input Format:
The first line of the input contains an integer, 'T,’ denoting the number of test cases.

The first line of each test case contains an integer 'N', denoting the number of elements in the array 'ARR'.

The second line of each test case contains 'N' space-separated integers denoting the array elements.
Output Format:
The checker will print 1 if the returned Huffman codes are correct and follow all the rules, otherwise, it will print 0.

Print the output of each test case in a new line.
Note :
You do not need to print anything. It has already been taken care of. Just implement the given function.
Constraints:
1 <= T <= 10
1 <= N <= 10^4
1 <= ARR[i]  <= 10^4

Where 'T' denotes the number of test cases, 'N' denotes the elements in the array 'ARR', and 'ARR[i]' denotes the 'i'th' element of the array 'ARR'.

Time Limit: 1 sec
AnswerBot
1y

The task is to find the Huffman codes for each alphabet in an encoded message based on their frequencies.

  • The Huffman code is a binary string that uniquely represents each character in the message.

  • The ...read more

CodingNinjas
author
2y
Optimized Solution

We will be dividing our solution into three parts for better understanding. We will use a Min-Heap to build a binary tree and then traverse the binary tree to assign Huffman codes to each element.

 

 

  1. Understanding the need of using a Min-Heap 

A basic idea would be to sort the frequency array and repeatedly give the smallest not used code to the character having maximum frequency. This idea looks decent, but it does not always provide the right answer.

Consider the sorted array ARR = [ 3, 4, 4, 5 ]. Using the above idea, we can give the codes [ '111', '110',  '10', '0' ]. The number of bits used will be 34 ( 3*3 + 4*3 + 4*2 + 5*1 ). But if we give the codes [ '11', '10', '00', '01' ]. The number of bits will be 32 ( 2*3 + 2*4 +2*4 +2*5) which is lesser than the result obtained by previous approach. 

From here, we can conclude that the sorting algorithm fails. But giving the characters having the most frequency the smallest codes makes sense. Where are we wrong?

We need to see that there can be combinations in which the character having the largest frequency is not given the smallest possible code (i.e. '0'). There are other ways possible too that are providing better answers. Though the character having the largest frequency will always have the smallest code among codes of other characters in the message, still it will not always have the smallest code possible (i.e. '0'). 

Therefore we need to greedily select two nodes having minimum frequency, assign them the largest codes and use the sum of the two nodes as a new node for further process. This can be done efficiently with the help of a Min Heap.

 

2. Structure of the heap node and how  to build Huffman Tree 

 

The node in the Heap will have an index denoting the index of its corresponding character in the message, frequency denoting the frequency of the node in the message, and the two nodes left and right denoting its left and right child, respectively. The basic idea is to take the two elements having the smallest frequency, make a new node having frequency as the sum of the frequencies of the removed elements, and add it back to the Heap. We will make the removed elements as the left child and the right child of the newly built node. In the end, we will stop when only one element remains.

 

Steps : 

  • Make an empty Heap.
  • Iterate through i = 0 to N - 1
    • Create a new node having frequency = ARR [i], index = i, and mark its left child and right child as NULL.
    • Insert the node into the heap.
  • While the number of elements in the heap are more than 1
    • Pop the node having minimum frequency from the heap. Let the node be node1.
    • Pop another node having minimum frequency from the heap. Let the node be node2. 
    • Create a new node having frequency = node1.frequency + node1.frequency, index = -1 ( as it does not point to any single index ) . Make node1 as its left child and node2 as its right child or vice-versa.
    • Insert the newly created node into the heap.

 

3. Finding the Huffman codes from the Huffman Tree 

 

We can write a recursive function that will traverse the Huffman Tree and assign the Huffman codes for every character. Let ans be the array that stores the Huffman codes for each character.

 

Working of the Recursive function 

  • The recursive function takes the root of the tree root, the built string str, as its parameters.
  • If the root is a leaf node, then set ans [ root.index ] as str and end the function.
  • Otherwise, call the recursive function for the left subtree and the right subtree. To call the left subtree use (root.left, str + "0") as the arguments and to call for the right subtree use (root.right,str + "1") as the arguments. We can also use str + "1" for the left subtree and str + "0" for the right subtree as this will also produce a valid set of Huffman codes.

 

Space Complexity: O(n)Explanation:

O(N), where N is the number of elements in the array.

 

When all elements of the input array are inserted in the heap, the number of elements in the heap are N. Hence, the overall Space Complexity is O(N). 

Time Complexity: O(nlogn)Explanation:

O(N*log(N)), where N is the number of elements in the array.

 

When all elements of the input array are inserted in the heap, the number of elements in the heap are N, and it takes O(log(N)) time to remove or insert an element from the heap. Hence, the overall Time Complexity is O(N*log(N)).


Java (SE 1.8)

/*
Time Complexity: O(N*log(N))
Space Complexity: O(N)

Where N is the number of elements in the array.
*/

import java.util.ArrayList;
import java.util.Comparator;
import java.util.PriorityQueue;

public class Solution {

// Defining the Heapnode class
static class HeapNode implements Comparable {
int frequency;
int index;
HeapNode left;
HeapNode right;

HeapNode(int frequency, int idx) {
this.frequency = frequency;
index = idx;
left = null;
right = null;
}

// Defining the class to compare two nodes based on frequency
@Override
public int compareTo(HeapNode a) {
return this.frequency - a.frequency;
}

}

// Function to parse the built Huffman Tree to assign Huffman Codes
private static void assignHuffmanCode(HeapNode root, String str, String[] ans)
{

// Checking if the current node is leaf node
if (root.left == null && root.right == null) {
ans[root.index] = str;
return;
}

// Calling for left subtree
if (root.left != null) {
assignHuffmanCode(root.left, str + "0", ans);
}

// Calling for right subtree
if (root.right != null) {
assignHuffmanCode(root.right, str + "1", ans);
}
}

public static String[] huffmanCode(int[] arr)
{

// Finding the number of elements
int n = arr.length;

if (n == 1) {
String[] ans = new String[1];
ans[0] = "1";
return ans;
}

// Defining the Min Heap
PriorityQueue heap = new PriorityQueue();

// Iterating through the array and adding elements to MinHeap
for (int i = 0; i < n; i++) {
HeapNode newNode = new HeapNode(arr[i], i);
heap.add(newNode);
}

while (heap.size() > 1)
{

// Extracting the two nodes having minimum value from the heap
HeapNode node1 = heap.peek();
heap.poll();

HeapNode node2 = heap.peek();
heap.poll();

// Creating a new node combining extracted nodes
HeapNode newNode = new HeapNode(node1.frequency + node2.frequency, -1);

// Setting its left and right children
newNode.left = node1;
newNode.right = node2;

// Inserting new node into the Min Heap
heap.add(newNode);
}

// Finding the root of Huffman string
HeapNode root = heap.peek();

// Defining the output array
String[] ans = new String[n];

// Assigning Huffman Code for every character
assignHuffmanCode(root, "", ans);

return ans;

}

}

C++ (g++ 5.4)
/*

Time Complexity: O(N*log(N))
Space Complexity: O(N)

Where N is the number of elements in the array.
*/
#include

// Defining the node class
template
class TreeNode
{
public:
T frequency;
int index;
TreeNode *left;
TreeNode *right;

TreeNode(T frequency, int idx)
{
this->frequency = frequency;
this->index = idx;
left = NULL;
right = NULL;
}

~TreeNode()
{
if (left != NULL)
{
delete left;
}
if (right != NULL)
{
delete right;
}
}
};

// Defining the class to compare two nodes based on frequency
class compare
{
public:
bool operator()(TreeNode *a, TreeNode *b)
{
return a->frequency > b->frequency;
}
};

// Function to parse the built Huffman Tree to assign Huffman Codes
void assignHuffmanCode(TreeNode *root, string &str, vector &ans)
{
// Checking if the current node is leaf node
if (root->left == NULL && root->right == NULL)
{
ans[root->index] = str;
return;
}

// Calling for left subtree
if (root->left)
{
str.push_back('0');
assignHuffmanCode(root->left, str, ans);
str.pop_back();
}

// Calling for right subtree
if (root->right)
{
str.push_back('1');
assignHuffmanCode(root->right, str, ans);
str.pop_back();
}
}

vector huffmanCode(vector &arr)
{
// Finding the number of elements
int n = arr.size();

// Handling the case when only one element is in the array
if (n == 1)
{
vector ans(1);
ans[0] = "1";
return ans;
}

// Defining the Min Heap
priority_queue *, vector *>, compare> heap;

// Iterating through the array and adding elements to MinHeap
for (int i = 0; i < n; i++)
{
TreeNode *newNode = new TreeNode(arr[i], i);
heap.push(newNode);
}

while (heap.size() > 1)
{
// Extracting the two nodes having minimum value from the heap
TreeNode *node1 = heap.top();
heap.pop();

TreeNode *node2 = heap.top();
heap.pop();

// Creating a new node combining extracted nodes
TreeNode *newNode = new TreeNode(node1->frequency + node2->frequency, -1);

// Setting its left and right children
newNode->left = node1;
newNode->right = node2;

// Inserting new node into the Min Heap
heap.push(newNode);
}

// Finding the root of Huffman string
TreeNode *root = heap.top();

// Defining the output array
vector ans(n);

string str="";

// Assigning Huffman Code for every character
assignHuffmanCode(root, str, ans);

delete root;

return ans;
}

Python (3.5)
'''

Time Complexity: O(N*log(N))
Space Complexity: O(N)

Where N is the number of elements in the array.
'''
import heapq

# Defining the node structure
class TreeNode:
def __init__(self, frequency, idx):
self.frequency = frequency
self.index = idx
self.left = None
self.right = None

def __del__(self):
if (self.left != None):
del self.left
if (self.right != None):
del self.right

def __lt__(self, other):
return self.frequency > other.frequency


# Function to parse the built Huffman Tree to assign Huffman Codes
def assignHuffmanCode(root, str, ans):

# Checking if the current node is leaf node
if (root.left == None and root.right == None):
ans[root.index] = str
return

# Calling for left subtree
if (root.left):
assignHuffmanCode(root.left, str + '0', ans)

# Calling for right subtree
if (root.right):
assignHuffmanCode(root.right, str + '1', ans)


def huffmanCode(arr):

# Finding the number of elements
n = len(arr)

# Handling the case when only one element is in the array
if (n == 1):
return ['1']

# Defining the Min Heap
heap = []

# Iterating through the array and adding elements to MinHeap
for i in range(n):
newNode = TreeNode(arr[i], i)
heap.append(newNode)

heapq.heapify(heap)

while (len(heap) > 1):

# Extracting the two nodes having minimum value from the heap
node1 = heapq.heappop(heap)
node2 = heapq.heappop(heap)

# Creating a new node combining extracted nodes
newNode = TreeNode(node1.frequency + node2.frequency, -1)

# Setting its left and right children
newNode.left = node1
newNode.right = node2

# Inserting new node into the Min Heap
heapq.heappush(heap, newNode)

# Finding the root of Huffman string
root = heapq.heappop(heap)

# Defining the output array
ans = [None] * n

# Assigning Huffman Code for every character
assignHuffmanCode(root, '', ans)

del root

return ans
Help your peers!
Add answer anonymously...
Myntra Mern Stack Developer Interview Questions
Stay ahead in your career. Get AmbitionBox app
qr-code
Helping over 1 Crore job seekers every month in choosing their right fit company
65 L+

Reviews

4 L+

Interviews

4 Cr+

Salaries

1 Cr+

Users/Month

Contribute to help millions
Get AmbitionBox app

Made with ❤️ in India. Trademarks belong to their respective owners. All rights reserved © 2024 Info Edge (India) Ltd.

Follow us
  • Youtube
  • Instagram
  • LinkedIn
  • Facebook
  • Twitter