Parallel Quick Sort

Rahul Mondal
7 min read · Dec 7, 2022


What is Parallel Programming?

Many practical algorithms depend on sorting as a fundamental building block, and large volumes of data need to be sorted before we can handle them effectively.

On serial processors, quicksort is usually implemented by carrying out each step one at a time until the work is finished and the program ends. The CPU starts a single process that runs the code line by line.

In parallel programming, a task is divided into concurrent processes that run simultaneously on several threads or processors, which requires coordination between them.

Why do we need parallel processing?

· It is efficient.

· Processes finish in less time.

· The workload per processor is lower than in sequential programming.

In general, parallel processing is used in situations where performance takes priority over cost. It has been employed in video rendering and cryptocurrency mining.

Why does this matter?

The majority of algorithms can be easily implemented on serial processors, but occasionally we need to implement certain algorithms in parallel. Some widely used algorithms, particularly recursive ones such as quicksort, don’t map well onto GPUs (parallel processors).

On serial CPUs, recursive calls are recorded on a call stack, but GPUs have no such hardware stack; recursion must instead be emulated in a large contiguous block of memory by keeping track of a stack-top pointer.

Sequential Quick Sort Algorithm

  1. Find a random pivot p.
  2. Place elements less than the pivot to its left, elements bigger than the pivot to its right, and elements equal to the pivot in the centre of the list: < p, = p, > p.
  • i.e. initialize i to the first element and j to the last one.
  • Increase i until list[i] > pivot.
  • Decrease j until list[j] < pivot.
  • If i <= j, swap list[i] and list[j].
  • Repeat the above steps until i > j.
  • Finally, swap the pivot element with list[j].

3. Recurse on each partition.

4. When a sublist’s size reaches 1, the recursion stops; this is the base case. Because the partitions are in sorted order relative to one another, they now combine into a fully sorted list.

Each step of the algorithm above is carried out sequentially by a single process; the previous step must be completed before the next may begin.
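The sequential steps above can be sketched in Python. This is a minimal Hoare-style variant: the pivot is compared by value rather than being swapped into its final place at the end, which slightly simplifies the last step of the list above.

```python
import random

def quicksort(lst, lo=0, hi=None):
    """In-place sequential quicksort with a random pivot and
    Hoare-style two-pointer partitioning."""
    if hi is None:
        hi = len(lst) - 1
    if lo >= hi:                       # base case: sublist of size <= 1
        return
    pivot = lst[random.randint(lo, hi)]
    i, j = lo, hi
    while i <= j:
        while lst[i] < pivot:          # advance i until lst[i] >= pivot
            i += 1
        while lst[j] > pivot:          # retreat j until lst[j] <= pivot
            j -= 1
        if i <= j:                     # swap the out-of-place pair
            lst[i], lst[j] = lst[j], lst[i]
            i += 1
            j -= 1
    quicksort(lst, lo, j)              # recurse on each partition
    quicksort(lst, i, hi)

data = [5, 2, 9, 1, 7, 3]
quicksort(data)
print(data)  # [1, 2, 3, 5, 7, 9]
```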

Analysis of sequential quick sort:

Time Complexity = O(n log n).

Space Complexity = O(log n).

Approach 1: Naive Parallel Quick Sort

In this method, the algorithm spawns new processes at each step so that the partitions are processed simultaneously.

Steps:

  1. Initiate a task to find a pivot and divide the list into three parts: < p, = p, > p.
  2. At each stage, start new processes in proportion to the number of partitions.
  3. Each process finds a pivot element in its partition and splits the partition around it.

4. After merging the processed partitions, a sorted list is returned.
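The naive scheme can be sketched in Python. Threads stand in here for the spawned processes purely to show the structure; because of Python’s GIL this gains no real speedup, and in practice one worker per partition is exactly the overhead that makes this approach slow.

```python
import random
import threading

def naive_parallel_quicksort(lst):
    """Spawn a new worker for every partition -- one per recursive call."""
    if len(lst) <= 1:
        return lst
    pivot = random.choice(lst)
    left   = [x for x in lst if x < pivot]
    middle = [x for x in lst if x == pivot]
    right  = [x for x in lst if x > pivot]
    results = {}

    def work(key, part):
        results[key] = naive_parallel_quicksort(part)

    workers = [threading.Thread(target=work, args=("lo", left)),
               threading.Thread(target=work, args=("hi", right))]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results["lo"] + middle + results["hi"]

print(naive_parallel_quicksort([5, 2, 9, 1, 7, 3]))  # [1, 2, 3, 5, 7, 9]
```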

Analysis:

Time Complexity = O(n²)

Approach 2: Optimized Parallel Quick Sort

With this method, we alter one minor aspect: the number of processes employed at each phase. Rather than doubling the number of processes at each stage, this method uses the same n processes throughout the whole algorithm to find pivot elements and rearrange the list. At every stage of sorting, all of these processes run simultaneously.

Steps:

1. Start n processes that will divide the list into sections and partition each section using a chosen pivot element.

2. From the beginning of the algorithm until the list is sorted, the same n processes work on all partitions.

3. Each process locates a pivot and divides its partition according to the chosen pivot.

4. The partitions are then combined to create a sorted list.

Code
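A minimal sketch of this idea in Python, using a fixed thread-backed pool (`multiprocessing.dummy.Pool`, which shares the `Pool` API) in place of n real processes. At each stage the same pool re-partitions every current piece concurrently, rather than spawning new workers per partition.

```python
import random
from multiprocessing.dummy import Pool  # thread-backed pool, same API as multiprocessing.Pool

def partition_step(piece):
    """Partition one sublist around a random pivot; pieces that are
    already trivial (size <= 1, or all equal values) pass through."""
    if len(piece) <= 1 or min(piece) == max(piece):
        return [piece]
    pivot = random.choice(piece)
    return [[x for x in piece if x < pivot],
            [x for x in piece if x == pivot],
            [x for x in piece if x > pivot]]

def optimized_parallel_quicksort(lst, n_workers=4):
    """A fixed pool of workers re-partitions every current piece
    concurrently at each stage, instead of spawning new processes."""
    pieces = [lst]
    with Pool(n_workers) as pool:
        while any(len(p) > 1 and min(p) != max(p) for p in pieces):
            # one parallel stage: partition all oversized pieces at once
            pieces = [q for sub in pool.map(partition_step, pieces)
                      for q in sub if q]
    # pieces are in order relative to one another, so concatenation sorts
    return [x for p in pieces for x in p]

print(optimized_parallel_quicksort([5, 2, 9, 1, 7, 3, 2]))  # [1, 2, 2, 3, 5, 7, 9]
```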

Analysis of Parallel quick sort:

There are O(log n) partitioning stages. With n processes working concurrently, each stage takes roughly constant time per element, so the parallel running time is O(log n) while the overall work is still O(n log n).

Although we now have an algorithm that can run on parallel processors, and so will operate much faster at scale, the total work has not changed from the sequential version.

Space Complexity = O(log n)

Approach 3: Using sequential and parallel approaches at the same time

This method divides the initial list into n smaller sublists that are explicitly sent to p remote processors for parallel execution. Once the execution at a distributed processor is complete, the sorted sublist is sent back to the central processing node, which then combines the results from the processes to give a fully sorted list.

Steps:

1. A list of size n should be divided into a number of sublists that correspond to the number of processors that are available, p.

2. Create p threads based on the number of processors that are available.

3. Each of the p threads gets a sublist containing n/p consecutive entries from the original list.

4. Randomly select a pivot element and broadcast it to all partner processes.

5. Each process partitions its elements into two groups according to the selected pivot, group1 <= pivot <= group2. This happens concurrently across all processes.

6. Each process in upper half of the process list sends its “low list” to a partner process in the lower half of the process list and it receives the “high list” in return.

7. Elements in the top half will have values larger than the pivot while those in the lower half will have values less than the pivot.

8. After log p recursions, each process holds an unsorted sublist whose values are disjoint from those of every other process.
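The exchange pattern above can be sketched as a single-process simulation in Python. A real implementation would send the "low" and "high" lists between processors (e.g. over MPI); here plain list operations stand in for the messages, and the number of processes p is assumed to be a power of two.

```python
import random

def hypercube_partition(sublists):
    """One stage of the exchange: every process partitions its sublist
    around a shared pivot, the lower half of the process group keeps the
    low lists, the upper half keeps the high lists, and each half
    recurses until every group holds a single process."""
    if len(sublists) == 1:
        return [sorted(sublists[0])]        # final local sequential sort
    everything = [x for s in sublists for x in s]
    pivot = random.choice(everything) if everything else 0
    half = len(sublists) // 2               # assumes a power-of-two group
    lows  = [[x for x in s if x <= pivot] for s in sublists]
    highs = [[x for x in s if x > pivot] for s in sublists]
    # simulated send/receive between partner processes i and i + half
    lower = [lows[i] + lows[i + half] for i in range(half)]
    upper = [highs[i] + highs[i + half] for i in range(half)]
    return hypercube_partition(lower) + hypercube_partition(upper)

def parallel_quicksort(lst, p=4):
    """Split the list into p contiguous sublists, one per process."""
    chunk = (len(lst) + p - 1) // p
    sublists = [lst[i * chunk:(i + 1) * chunk] for i in range(p)]
    return [x for g in hypercube_partition(sublists) for x in g]

print(parallel_quicksort([5, 2, 9, 1, 7, 3, 8, 6]))  # [1, 2, 3, 5, 6, 7, 8, 9]
```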

Analysis:

In the method described above, each of the p processes works on a sublist of n/p items, and there are O(log n) steps.

Time Complexity = O((n/p) log n)

Space Complexity = O(log n)

Note:

Pivot: because the load-balancing performance of this technique is subpar, we must select a pivot close to the true median in order to partition the list into roughly equal parts and preserve balance.

Combining processes: by locating the beginning and end of each block and linking each to the beginning of the next process’s block, we can concatenate the blocks in process order.

Assuming a system has 64 threads, a list of size n can be processed in parallel up to the point where each sublist is sorted sequentially and merged. As n grows, so does the size of the list that each thread handles.

Time Complexity = O(log n)

Space Complexity = O(log n)

Approach 4: Hyperquicksort

This strategy outperforms the prior strategy. Previously, load balancing was a concern.

This procedure increases the likelihood of discovering a true median: every process sorts its sublist sequentially before partitioning, so the single pivot broadcast to all processes at the start of each stage can be chosen as the median of a sorted sublist.

Steps:

1. A list of size n is split among p processes. Consider a 16-item list with 4 processes, each of which will handle 4 entries.

2. One of the four processes is in charge of locating the pivot element: it selects a pivot (the median of its sorted sublist) and broadcasts it to all other processes, each of which has first sequentially sorted its own sublist. This step makes it much easier to find pivots close to the genuine median.

3. Repeat steps 4–6 from the previous method.

4. For each partner process, the received top half from the other partner process is combined with the remaining top half from the first partner process to create a local sublist.

5. Recurse on the top and bottom halves of each process group until every process holds a sorted sublist.

6. The sublists must then be combined to get a completely sorted list.
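A simulated sketch of hyperquicksort in the same style as before: each sublist is sorted up front, the pivot is the median of one process’s sorted sublist, and exchanged halves are merged (not concatenated) so every sublist stays sorted. Message passing is again simulated with list operations, and p is assumed to be a power of two.

```python
from heapq import merge

def hyperquicksort_groups(sublists):
    """Each sublist stays sorted throughout.  The pivot for a group is
    the median of the first process's sorted sublist; exchanged halves
    are merged, not concatenated, so sortedness is preserved."""
    if len(sublists) == 1:
        return sublists
    pivot_src = next((s for s in sublists if s), [0])
    pivot = pivot_src[len(pivot_src) // 2]   # median of a sorted sublist
    half = len(sublists) // 2                # assumes a power-of-two group
    lows  = [[x for x in s if x <= pivot] for s in sublists]
    highs = [[x for x in s if x > pivot] for s in sublists]
    # partner exchange, with a merge on arrival
    lower = [list(merge(lows[i], lows[i + half])) for i in range(half)]
    upper = [list(merge(highs[i], highs[i + half])) for i in range(half)]
    return hyperquicksort_groups(lower) + hyperquicksort_groups(upper)

def hyperquicksort(lst, p=4):
    chunk = (len(lst) + p - 1) // p
    # step 1: each process sequentially sorts its own sublist first
    sublists = [sorted(lst[i * chunk:(i + 1) * chunk]) for i in range(p)]
    return [x for g in hyperquicksort_groups(sublists) for x in g]

print(hyperquicksort([5, 2, 9, 1, 7, 3, 8, 6]))  # [1, 2, 3, 5, 6, 7, 8, 9]
```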

Note:

· The passing of values between partner processes results in communication overhead.

· Even though load imbalance may still happen, the method is an improvement over the old strategy, which was far less effective at load balancing.

Analysis:

Time Complexity = O(n log n)

Space Complexity = O(log n)

Parallel quicksort by regular sampling

In this method, each process first sorts its sublist sequentially, then chooses a set of regular samples that are used for further partitioning and element exchange in subsequent steps.

Steps:

1. The initial list is split amongst n processes.

2. Each process uses sequential quicksort to sort its sublist.

3. Each process chooses regular samples from its sorted sublist.

4. One process gathers the samples, sorts them, selects the pivots, and broadcasts the chosen pivots to all processes.

5. All processes partition their sublists concurrently using the selected pivots.

6. Processes communicate to exchange partitioned values with their partner processes.

7. The sorted pieces are merged to give a fully sorted list.
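The sampling steps can be sketched as a single-process simulation in Python; in a real implementation, steps 2, 3, 5 and 6 run concurrently on separate processors and step 6 is actual inter-process communication.

```python
from bisect import bisect_right
from heapq import merge

def psrs(lst, p=4):
    """Parallel Sorting by Regular Sampling, simulated with p sublists."""
    chunk = (len(lst) + p - 1) // p
    # steps 1-2: split the list and sort each sublist (in parallel, really)
    sublists = [sorted(lst[i * chunk:(i + 1) * chunk]) for i in range(p)]
    # step 3: each process draws p evenly spaced samples from its sublist
    samples = []
    for s in sublists:
        step = max(1, len(s) // p)
        samples.extend(s[::step][:p])
    # step 4: one process sorts the samples and picks p - 1 pivots
    samples.sort()
    pivots = [samples[(i + 1) * len(samples) // p] for i in range(p - 1)]
    # steps 5-6: split every sublist at the pivots and route each piece
    # to the process responsible for that pivot range
    buckets = [[] for _ in range(p)]
    for s in sublists:
        cuts = [0] + [bisect_right(s, piv) for piv in pivots] + [len(s)]
        for i in range(p):
            buckets[i].append(s[cuts[i]:cuts[i + 1]])
    # step 7: each process merges its sorted pieces; concatenating the
    # buckets in order yields the fully sorted list
    return [x for b in buckets for x in merge(*b)]

print(psrs([5, 2, 9, 1, 7, 3, 8, 6, 4, 0], p=2))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```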

Advantages of Parallel quicksort regular sampling

1. Though not perfect, load balance is improved.

2. Each element is moved at most once, so there is no overhead from repeatedly swapping the same values.

3. Depending on the sequence in which elements and pivots are chosen, some processes may be freed up early, and the number of processes does not have to be a power of 2.

Analysis:

Time Complexity = O(n log n).

Space Complexity = O(log n).

Summary

In this blog we have discussed three parallel quicksort algorithms.

1. Parallel quicksort, efficiency = low.

2. Hyper quicksort, efficiency = better.

3. Parallel quicksort regular sampling, efficiency = best.
