Find top k (or most frequent) numbers in a stream

Question

You are given an Integer array ‘ARR’ and an Integer ‘K’. Your task is to find the ‘K’ most frequent elements in ‘ARR’. Return the elements sorted in ascending order.

For Example:

You are given ‘ARR’ = {1, 2, 2, 3, 3} and ‘K’ = 2. Then the answer will be {2, 3}, since 2 and 3 are the elements that occur most often.

Input Format :

The first line of the input contains a single integer 'T', representing the number of test cases.

The first line of each test case contains two space-separated integers, ‘N’ and ‘K’, representing the size of ‘ARR’ and given integer ‘K’, respectively.

The second line of each test case contains ‘N’ space-separated integers representing the elements of ‘ARR’.

Output Format:

For each test case, print the ‘K’ most frequent elements in ‘ARR’.

Print the output of each test case in a separate line.

Note:

It is guaranteed that a unique answer exists.

You do not need to print anything. It has already been taken care of. Just implement the given function.

Constraints:

1 <= T <= 10
1 <= N <= 5000
1 <= K <= Number of unique elements in ‘ARR’
1 <= ARR[i] <= 10^6

Time Limit: 1 sec
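
For checking answers on small inputs, a brute-force baseline can be handy. The sketch below is not one of the accepted answers; it relies on Python's collections.Counter, and the function name k_most_frequent_baseline is made up for illustration.

```python
from collections import Counter
from typing import List


def k_most_frequent_baseline(arr: List[int], k: int) -> List[int]:
    # Counter.most_common(k) returns the k (element, count) pairs
    # with the highest counts.
    top = Counter(arr).most_common(k)
    # The problem asks for the elements in ascending order.
    return sorted(value for value, _count in top)


print(k_most_frequent_baseline([1, 2, 2, 3, 3], 2))  # [2, 3]
```

Because the problem guarantees a unique answer, it does not matter how ties between equal counts are ordered inside most_common.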

Accepted Answer

The idea is to keep the k elements with the highest frequency seen so far. A vector or an array can hold them, and a HashMap of element-frequency pairs tracks the counts. When a new element arrives from the stream, update its frequency in the HashMap and place the element at the end of the list of k numbers (k + 1 elements in total). Then compare adjacent elements of the list and swap them whenever the element with the higher frequency is stored later.

Algorithm:
1. Create a HashMap hm and an array of length k + 1.
2. Traverse the input array from start to end.
3. Insert the element at the (k + 1)th position of the array and update its frequency in the HashMap.
4. Traverse the temporary array from start to end - 1:
For every element, compare its frequency with that of the next element and swap the two if the next element has the higher frequency; if the frequencies are equal, swap if the next element is greater.
5. After processing each element of the original array, print the top k elements.
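
The steps above can be sketched roughly as follows. This is an illustration under assumptions, not the original author's code: the name top_k_stream and the snapshot list are hypothetical, the updated element is bubbled from its current slot toward the front rather than in a full forward pass, and ties move the greater value forward as the rule is worded above (flip that comparison if the opposite order is wanted).

```python
from typing import Dict, List


def top_k_stream(stream: List[int], k: int) -> List[List[int]]:
    """Return the top-k list observed after each element of the stream."""
    freq: Dict[int, int] = {}
    top: List[int] = []            # at most k + 1 candidates, best first
    snapshots: List[List[int]] = []

    for value in stream:
        freq[value] = freq.get(value, 0) + 1
        if value not in top:
            top.append(value)      # a new candidate enters at the last slot

        # Bubble the updated element toward the front of the list.
        i = top.index(value)
        while i > 0:
            prev, cur = top[i - 1], top[i]
            higher = freq[cur] > freq[prev]
            tie_and_greater = freq[cur] == freq[prev] and cur > prev
            if not (higher or tie_and_greater):
                break
            top[i - 1], top[i] = cur, prev
            i -= 1

        del top[k:]                # drop the (k + 1)-th candidate, if any
        snapshots.append(list(top))

    return snapshots


# Final top-2 for the example array, sorted as the problem requires.
print(sorted(top_k_stream([1, 2, 2, 3, 3], 2)[-1]))  # [2, 3]
```

Only the element that just arrived changes frequency, so moving that one element is enough to keep the candidate list ordered.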

Accepted Answer

QuickSelect

Prerequisite: Quick Sort

In this approach, we reduce the problem to a smaller one at every step. Let n be the number of unique elements. When the unique elements are ordered by ascending frequency, the kth most frequent element sits at index (n - k). So we partially sort the unique elements from least frequent to most frequent, stopping once the element that belongs at index (n - k) is in place. To do so, we use quickselect, an algorithm for finding the kth smallest element in an array. Here we modify the general quickselect algorithm so that it compares elements by frequency.

We perform quickselect on an array containing all the keys of a hash map, where each key is an element of ‘ARR’ and its value is how often that element appears in 'ARR'. A partition scheme puts all less frequent elements to the left of the pivot and all elements that are at least as frequent to its right, which places the pivot at its correct position in the array sorted by frequency. When the pivot lands on the target index (n - k), the pivot and all the elements to its right are the k most frequent elements.

Algorithm :

Initialize a hash map ‘MAP’ where the key is the element and value is how often this element appears in ‘ARR’.
Iterate every element in ‘ARR’:
- Add the current element to ‘MAP’ and increase its value by 1.
Set ‘SIZE’ as ‘MAP.SIZE’.
Initialize an integer array ‘UNIQUE’ and add all the keys of ‘MAP’ to it.
Call ‘QUICKSELECT’(0, ‘SIZE’ - 1, ‘SIZE’ - ‘K’).
Return the elements of ‘UNIQUE’ from index ‘SIZE’ - ‘K’ to ‘SIZE’ - 1, sorted in ascending order.

Maintain a function ‘QUICKSELECT’(‘LEFT’, ‘RIGHT’, ‘KSMALL’):

If ‘LEFT’ is equal to ‘RIGHT’, return.
Select a random pivot index ‘PIVOT’ between ‘LEFT’ and ‘RIGHT’.
Set ‘PIVOT’ as ‘PARTITION’(‘LEFT’, ‘RIGHT’, ‘PIVOT’).
If ‘KSMALL’ is equal to ‘PIVOT’, return.
Otherwise, if ‘KSMALL’ is less than ‘PIVOT’, call quickselect on the left partition.
Otherwise, call quickselect on the right partition.

Maintain a function ‘PARTITION’(‘LEFT’, ‘RIGHT’, ‘PIVOT’):

Set ‘PIVOTFREQUENCY’ as the frequency of ‘UNIQUE’[‘PIVOT’].
Swap ‘UNIQUE’[‘PIVOT’] and ‘UNIQUE’[‘RIGHT’].
Set ‘IDX’ as ‘LEFT’.
Iterate ‘I’ from ‘LEFT’ to ‘RIGHT’:
- If the frequency of ‘UNIQUE’[‘I’] is less than ‘PIVOTFREQUENCY’, swap ‘UNIQUE’[‘IDX’] and ‘UNIQUE’[‘I’] and increment ‘IDX’ by 1.
Swap ‘UNIQUE’[‘IDX’] and ‘UNIQUE’[‘RIGHT’].
Return ‘IDX’.
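
To see what one call to ‘PARTITION’ does, here is a small standalone trace. The frequency table and the fixed pivot index are invented for illustration, and the function takes the array and map as parameters instead of using shared state, which is an assumption of this sketch.

```python
from typing import Dict, List


def partition(unique: List[int], freq: Dict[int, int],
              left: int, right: int, pivot: int) -> int:
    pivot_frequency = freq[unique[pivot]]
    # Move the pivot to the end of the range.
    unique[pivot], unique[right] = unique[right], unique[pivot]
    idx = left
    # Collect every less frequent element at the front of the range.
    for i in range(left, right):
        if freq[unique[i]] < pivot_frequency:
            unique[idx], unique[i] = unique[i], unique[idx]
            idx += 1
    # Put the pivot between the two groups.
    unique[idx], unique[right] = unique[right], unique[idx]
    return idx


# Hypothetical data: element -> how often it appears.
freq = {7: 3, 8: 1, 9: 2, 10: 5}
unique = [7, 8, 9, 10]
pos = partition(unique, freq, 0, 3, 2)   # pivot is element 9 (frequency 2)
print(pos, unique)                        # 1 [8, 9, 10, 7]
```

After the call, the only element left of index 1 has frequency 1, and everything to the right has frequency at least 2. The right-hand group is not sorted, which is fine because quickselect only needs the pivot to be in its final place.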

Space Complexity: O(n)

Explanation:

O(N).

The Hash map and the array of unique elements can have at most N elements. Hence, the total space complexity is O(N).

Time Complexity: O(n^2)

Explanation:

O(N ^ 2), where 'N' is the size of the input array.

The worst-case time complexity of quickselect is O(N ^ 2), which occurs when the chosen pivots repeatedly fail to split the problem roughly in half. When the pivots are chosen randomly, the probability of such a worst case is small. The average-case time complexity of quickselect is O(N).

Python (3.5)

'''
    Time Complexity: O(N ^ 2)
    Space Complexity: O(N)

    where 'N' is the size of the input array.
'''

import random
from typing import List

uniq = []
mp = {}


def swapUnique(a: int, b: int) -> None:
    # Swap two positions of the unique-element array.
    uniq[a], uniq[b] = uniq[b], uniq[a]


def partition(left: int, right: int, pivot: int) -> int:
    pivotFrequency = mp[uniq[pivot]]

    # Move pivot to end.
    swapUnique(pivot, right)
    idx = left

    # Move all less frequent elements to the left of the pivot.
    for i in range(left, right + 1):
        if mp[uniq[i]] < pivotFrequency:
            swapUnique(idx, i)
            idx += 1

    # Move pivot to its final place.
    swapUnique(idx, right)
    return idx


def quickselect(left: int, right: int, kSmall: int) -> None:
    # If the list contains only one element.
    if left == right:
        return

    # Select a random index in [left, right] as pivot.
    pivot = random.randint(left, right)

    # Find the position of pivot in the sorted list.
    pivot = partition(left, right, pivot)

    # If the pivot is in its final sorted position.
    if kSmall == pivot:
        return
    elif kSmall < pivot:
        # Move in the left direction.
        quickselect(left, pivot - 1, kSmall)
    else:
        # Move in the right direction.
        quickselect(pivot + 1, right, kSmall)


def KMostFrequent(n: int, k: int, arr: List[int]) -> List[int]:
    global uniq
    mp.clear()

    # Build map where the key is element
    # and value is how often this element appears in 'ARR'.
    for ele in arr:
        mp[ele] = mp.get(ele, 0) + 1

    # Build array of unique elements.
    size = len(mp)
    uniq = list(mp)

    # Perform quickselect.
    quickselect(0, size - 1, size - k)

    # Return top 'K' frequent elements in ascending order.
    return sorted(uniq[size - k:])

C++ (g++ 5.4)

/*
    Time Complexity: O(N ^ 2)
    Space Complexity: O(N)

    where 'N' is the size of the input array.
*/

#include <algorithm>
#include <cstdlib>
#include <map>
#include <vector>

using namespace std;

vector<int> uniq;
map<int, int> mp;

void swapUnique(int a, int b) {
    int temp = uniq[a];
    uniq[a] = uniq[b];
    uniq[b] = temp;
}

int partition(int left, int right, int pivot) {

    int pivotFrequency = mp[uniq[pivot]];

    // Move pivot to end.
    swapUnique(pivot, right);
    int idx = left;

    // Move all less frequent elements to the left of the pivot.
    for (int i = left; i <= right; i++) {

        if (mp[uniq[i]] < pivotFrequency) {

            swapUnique(idx, i);
            idx++;
        }
    }

    // Move pivot to its final place.
    swapUnique(idx, right);

    return idx;
}

void quickselect(int left, int right, int kSmall) {

    // If the list contains only one element.
    if (left == right) {
        return;
    }

    // Select a random index as pivot.
    int pivot = left + rand()%(right - left);

    // Find the position of pivot in the sorted list.
    pivot = partition(left, right, pivot);

    // If the pivot is in its final sorted position.
    if (kSmall == pivot) {

        return;
    }
    else if (kSmall < pivot) {

        // Move in the left direction.
        quickselect(left, pivot - 1, kSmall);
    }
    else {

        // Move in the right direction.
        quickselect(pivot + 1, right, kSmall);
    }
}

vector<int> KMostFrequent(int n, int k, vector<int> &arr)
{
    mp.clear();

    // Build map where the key is element
    // and value is how often this element appears in 'ARR'.
    for (int ele : arr) {
        mp[ele]++;
    }

    int size = mp.size();
    uniq.assign(size, 0);
    int i = 0;

    // Build array of uniq elements.
    for (auto x: mp) {

        uniq[i] = x.first;
        i++;
    }

    // Perform quickselect.
    quickselect(0, size - 1, size - k);

    // Return top 'K' frequent elements
    vector<int> topK;
    for(int i = size-k; i < size; i++){
        topK.push_back(uniq[i]);
    }
    
    sort(topK.begin(), topK.end());
    return topK;
}

Java (SE 1.8)

/*
    Time Complexity: O(N ^ 2)
    Space Complexity: O(N)

    where 'N' is the size of the input array.
*/

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

public class Solution {

	static int[] unique;
	static Map<Integer, Integer> map;

	public static int[] KMostFrequent(int n, int k, int[] arr) {

        map = new HashMap<>();

		// Build map where the key is element
		// and value is how often this element appears in 'ARR'.
		for (int ele : arr) {

			map.put(ele, map.getOrDefault(ele, 0) + 1);
		}

        int size = map.size();
        unique = new int[size];
        int i = 0;

		// Build array of unique elements.
        for (int num: map.keySet()) {

			unique[i] = num;
            i++;
        }

		// Perform quickselect.
        quickselect(0, size - 1, size - k);

        // Return top 'K' frequent elements in ascending order.
        int[] topK = Arrays.copyOfRange(unique, size - k, size);
        Arrays.sort(topK);
        return topK;

	}

	public static void quickselect(int left, int right, int kSmall) {

        // If the list contains only one element.
        if (left == right) {
			return;
		}

        Random r = new Random();

        // Select a random index as pivot.
        int pivot = left + r.nextInt(right - left);

        // Find the position of pivot in the sorted list.
        pivot = partition(left, right, pivot);

        // If the pivot is in its final sorted position.
        if (kSmall == pivot) {

            return;
        }
		else if (kSmall < pivot) {

			// Move in the left direction.
            quickselect(left, pivot - 1, kSmall);
        }
		else {

			// Move in the right direction.
            quickselect(pivot + 1, right, kSmall);
        }
    }

    public static int partition(int left, int right, int pivot) {

		int pivotFrequency = map.get(unique[pivot]);

		// Move pivot to end.
        swap(pivot, right);
        int idx = left;

        // Move all less frequent elements to the left of the pivot.
        for (int i = left; i <= right; i++) {

			if (map.get(unique[i]) < pivotFrequency) {

				swap(idx, i);
                idx++;
            }
        }

        // Move pivot to its final place.
        swap(idx, right);

        return idx;
    }

	public static void swap(int a, int b) {
        int temp = unique[a];
        unique[a] = unique[b];
        unique[b] = temp;
    }
}

Accepted Answer

Heap

In this approach, we order the unique elements of the input array by how often each one appears in 'ARR'. We use a hash map where the key is an element and the value is how often that element appears in 'ARR'. Then we initialize a heap that orders elements by descending frequency. We add all the keys of the hash map to the heap and take the first ‘K’ elements popped from the heap as the answer.

Algorithm :

If ‘K’ is equal to ‘N’, every element of ‘ARR’ is part of the answer, so return ‘ARR’ sorted in ascending order.
Initialize a hash map ‘MAP’ where the key is the element and value is how often this element appears in ‘ARR’.
Iterate every element in ‘ARR’:
- Add the current element to ‘MAP’ and increase its value by 1.
Initialize a priority queue ‘HEAP’ where elements will be sorted in descending order according to the element’s frequency.
Add all the keys of ‘MAP’ in ‘HEAP’.
Initialize an array ‘ANS’.
Remove the first ‘K’ elements from ‘HEAP’ and add them to ‘ANS’.
Return ‘ANS’ sorted in ascending order.
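
A common variant, which is not what the listings in this answer do, keeps the heap at no more than ‘K’ entries by using a min-heap and evicting the least frequent candidate whenever the heap grows past ‘K’. The sketch below assumes Python's heapq and collections.Counter; the function name is hypothetical.

```python
import heapq
from collections import Counter
from typing import List, Tuple


def k_most_frequent_bounded_heap(arr: List[int], k: int) -> List[int]:
    counts = Counter(arr)
    heap: List[Tuple[int, int]] = []     # min-heap of (frequency, element)
    for element, frequency in counts.items():
        heapq.heappush(heap, (frequency, element))
        if len(heap) > k:
            heapq.heappop(heap)          # evict the least frequent candidate
    # The k survivors are the most frequent; return them in ascending order.
    return sorted(element for _frequency, element in heap)


print(k_most_frequent_bounded_heap([1, 2, 2, 3, 3], 2))  # [2, 3]
```

With U unique elements, each push or pop costs O(LOG(K)), so the heap work is O(U * LOG(K)) instead of O(U * LOG(U)).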

Space Complexity: O(n)

Explanation:

O(N).

The Hash map and the heap can have at most N elements. Hence, the total space complexity is O(N).

Time Complexity: O(nlogn)

Explanation:

O(N * LOG(N)), where 'N' is the size of the input array.

Building the hash map takes O(N) time and, in the worst case, building the heap takes O(N * LOG(N)) time, since adding an element to the heap takes O(LOG(N)) time. Hence, the total time complexity is O(N * LOG(N)).

Java (SE 1.8)

/*
    Time Complexity: O(N * LOG(N))
    Space Complexity: O(N)

    where 'N' is the size of the input array.
*/

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Queue;

public class Solution {

	public static int[] KMostFrequent(int n, int k, int[] arr) {

		if (k == n) {
			// Every element is part of the answer; return them in ascending order.
			int[] all = arr.clone();
			Arrays.sort(all);
			return all;
		}

		Map<Integer, Integer> map = new HashMap<>();

		// Build map where the key is element
		// and value is how often this element appears in 'ARR'.
		for (int ele : arr) {

			map.put(ele, map.getOrDefault(ele, 0) + 1);
		}

		// Elements in heap will be sorted in descending order
		// according to the frequency of the element.
		Queue<Integer> heap = new PriorityQueue<>((n1, n2) -> map.get(n2) - map.get(n1));

		// Add every unique element to the heap.
		for (int key : map.keySet()) {

			heap.add(key);
		}

		int[] ans = new int[k];

		// Build output array from the 'K' most frequent elements.
		for (int i = 0; i < k; i++) {

			ans[i] = heap.poll();
		}

		// Return the elements in ascending order.
		Arrays.sort(ans);
		return ans;
	}

}

Python (3.5)

'''
    Time Complexity: O(N * LOG(N))
    Space Complexity: O(N)

    where 'N' is the size of the input array.
'''

from typing import List
from queue import PriorityQueue


def KMostFrequent(n: int, k: int, arr: List[int]) -> List[int]:
    # If 'K' equals 'N', every element is part of the answer.
    if k == n:
        return sorted(arr)

    # Build map where the key is element
    # and value is how often this element appears in 'ARR'.
    mp = {}
    for ele in arr:
        mp[ele] = mp.get(ele, 0) + 1

    # PriorityQueue returns the smallest entry first, so frequencies are
    # negated to retrieve the most frequent elements first.
    heap = PriorityQueue()

    # Add every unique element to the heap.
    for x in mp:
        heap.put((-mp[x], -x))

    # Build output array from the 'K' most frequent elements.
    ans = [-heap.get()[1] for _ in range(k)]

    ans.sort()
    return ans

C++ (g++ 5.4)

/*
    Time Complexity: O(N * LOG(N))
    Space Complexity: O(N)

    where 'N' is the size of the input array.
*/

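#include <algorithm>
#include <map>
#include <queue>
#include <vector>

using namespace std;

vector<int> KMostFrequent(int n, int k, vector<int> &arr)
{
    if (k == n) {
        // Every element is part of the answer; return them in ascending order.
        vector<int> all(arr);
        sort(all.begin(), all.end());
        return all;
    }

    map<int, int> mp;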

    // Build map where the key is element
    // and value is how often this element appears in 'ARR'.
    for (int ele : arr) {

        mp[ele]++;
    }

    // Elements in heap will be sorted in descending order
    // according to the frequency of the element.
    priority_queue<pair<int, int>> heap;

    // Add every unique element to the heap as a {frequency, element} pair.
    for (auto x : mp) {
        heap.push({x.second, x.first});
    }

    vector<int> ans(k);

    // Build output array.
    for (int i = 0; i < k; i++) {

        ans[i] = heap.top().second;
        heap.pop();
    }

    sort(ans.begin(), ans.end());
    return ans;
}

Accepted Answer

Bucket Sort

We can observe that the frequency of any element in ‘ARR’ is at least one and at most ‘N’. So, we can create one bucket per possible frequency and add each unique element to the bucket that matches its frequency. For example, an element with frequency X goes into BUCKET[X]. After adding all the unique elements to their respective buckets, the first ‘K’ elements collected from the rightmost (highest-frequency) buckets downwards will be our answer.

Algorithm :

Initialize a hash map ‘MAP’ where the key is the element and value is how often this element appears in ‘ARR’.
Iterate every element in ‘ARR’:
- Add the current element to ‘MAP’ and increase its value by 1.
Initialize an array of lists ‘BUCKETS’.
Add all the keys of the hash map to their respective buckets according to their frequency.
Add 'K' elements to the answer array ‘ANS’, starting from the rightmost bucket.
Return ‘ANS’ sorted in ascending order.
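
As a quick illustration of the bucket layout for the example input, the snippet below builds only the buckets. It is a sketch, not part of the accepted answer; build_buckets is a hypothetical helper name and it uses collections.Counter for the counting step.

```python
from collections import Counter
from typing import List


def build_buckets(arr: List[int]) -> List[List[int]]:
    counts = Counter(arr)
    # Index f holds the elements that appear exactly f times; index 0 stays empty.
    buckets: List[List[int]] = [[] for _ in range(len(arr) + 1)]
    for element, frequency in counts.items():
        buckets[frequency].append(element)
    return buckets


buckets = build_buckets([1, 2, 2, 3, 3])
for frequency, elements in enumerate(buckets):
    if elements:
        print(frequency, sorted(elements))
```

For ‘ARR’ = {1, 2, 2, 3, 3}, bucket 1 holds 1 and bucket 2 holds 2 and 3, so reading from the right with ‘K’ = 2 yields 2 and 3.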

Space Complexity: O(n)

Explanation:

O(N).

The hash map and the array of buckets can together hold at most N elements. Hence, the total space complexity is O(N).

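Time Complexity: O(n)

Explanation:

O(N), where 'N' is the size of the input array.

Building the hash map, filling the buckets, and collecting the answer each visit every element a constant number of times. Hence, the selection itself takes O(N); the final ascending sort of the ‘K’ selected elements is not counted in this bound.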

Python (3.5)

'''
    Time Complexity: O(N)
    Space Complexity: O(N)

    where 'N' is the size of the input array.
'''

from typing import List


def KMostFrequent(n: int, k: int, arr: List[int]) -> List[int]:
    # Build map where the key is element
    # and value is how often this element appears in 'ARR'.
    mp = {}
    for ele in arr:
        mp[ele] = mp.get(ele, 0) + 1

    # bucket[f] holds the elements that appear exactly f times.
    bucket = [[] for _ in range(n + 1)]
    for x in mp:
        bucket[mp[x]].append(x)

    ans = []

    # Add 'K' elements to answer array starting from the rightmost bucket.
    for i in range(n, 0, -1):
        for x in bucket[i]:
            if len(ans) == k:
                break
            ans.append(x)
        if len(ans) == k:
            break

    ans.sort()
    return ans

C++ (g++ 5.4)

/*
    Time Complexity: O(N)
    Space Complexity: O(N)

    where 'N' is the size of the input array.
*/

#include <algorithm>
#include <map>
#include <vector>

using namespace std;

vector<int> KMostFrequent(int n, int k, vector<int> &arr)
{
    map<int, int> mp;

    // Build map where the key is element
    // and value is how often this element appears in 'ARR'.
    for (int ele : arr) {

        mp[ele]++;
    }

    // bucket[f] holds the elements that appear exactly f times.
    vector<vector<int>> bucket(n + 1);

    for (auto x : mp) {

        int freq = x.second;

        // Add in correct bucket.
        bucket[freq].push_back(x.first);
    }

    vector<int> ans(k);
    int cur = 0;

    // Add 'K' elements to answer array starting from the rightmost bucket.
    for (int i = n; i > 0 && k > 0; i--) {

        if (bucket[i].size() == 0) {
            continue;
        }

        for (int num : bucket[i]) {

            ans[cur++] = num;
            k--;
            if(k == 0){
                break;
            }
        }
    }
    
    sort(ans.begin(), ans.end());
    return ans;
}

Java (SE 1.8)

/*
    Time Complexity: O(N)
    Space Complexity: O(N)

    where 'N' is the size of the input array.
*/

import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

public class Solution {

	public static int[] KMostFrequent(int n, int k, int[] arr) {

		Map<Integer, Integer> map = new HashMap<>();

		// Build map where the key is element
		// and value is how often this element appears in 'ARR'.
		for (int ele : arr) {

			map.put(ele, map.getOrDefault(ele, 0) + 1);
		}

		// bucket[f] holds the elements that appear exactly f times.
		@SuppressWarnings("unchecked")
		List<Integer>[] bucket = new List[n + 1];

		for (int key : map.keySet()) {

			int freq = map.get(key);

			if (bucket[freq] == null) {

				bucket[freq] = new LinkedList<>();
			}

			// Add in correct bucket.
			bucket[freq].add(key);
		}

		int[] ans = new int[k];
		int cur = 0;

		// Add 'K' elements to answer array starting from the rightmost bucket.
		for (int i = bucket.length - 1; i > 0 && k > 0; i--) {

			if (bucket[i] == null) {
				continue;
			}

			for (int num : bucket[i]) {

				ans[cur++] = num;
				k--;
				if (k == 0) {
					break;
				}
			}
		}

		// Return the elements in ascending order.
		Arrays.sort(ans);
		return ans;
	}
}
