Hoare Partition¶
Hoare partition with strict condition¶
Let us consider the simplest version of the problem.
Given an array of integers, rearrange the elements such that the left part contains elements less than or equal to a pivot value and the right part contains values greater than the pivot value.
Consider this example array and a pivot value of 4 :
Example
[ 1 , 9, 3, 5, 7, 4, 8 ]
If this array were already partitioned s.t. the left partition has only elements less than or equal to the pivot. These would hold true :
Any element in left partition <= pivot
Any element in right partition > pivot
Pointers
[ 1, 3, 4, 5, 7, 8, 9 ]
L/T L/T L/T *L/F* # L = Left pointer, T = arr[left]<= pivot is True , F = Its False
*R/F* R/T R/T R/T R/T # R = right pointer, T = arr[right] > pivot is True, F = Its False
If left evaluates to True while its less than right, it would mean that this element is in the wrong partition. This also implies that there must be an element in the right partition which belongs to the left partition.
Pointers
[ 1 , 9, 3, 5, 7, 4, 8 ]
L/T **L/F** # L = Left pointer, T = arr[left]<= pivot is True, F = Its False
**R/F** R/T # R = right pointer, T = arr[right] > pivot is True, F = Its False
We can simply swap these and keep moving until left and right pass each other.
So, the algorithm would be:
Algorithm
#Initialize pivot = arr[0]
#Invariant 1) [0,right] only contains elements less than or equal to pivot
#Invariant 2) (right, len(arr) -1] only contains elements greater than pivot
While left <= right :
while left<=right and (Invaraint 1) is True :
left +=1
while left<=right and (Invariant 2) is True :
right -=1
if left < right :
swap left and right
return right
This implementation does not necessarily place the pivot in its sorted position.
Or rather, the problem is not asking us to do this,
This array is partitioned by 4 but 4 is not in its sorted position.
Example
[ 1, 4, 3, |5, 7, 8, 9 ]
We can put the pivot in its sorted position by swapping it with the right pointer after running the algorithm. A good way to handle this is :
- Ensure that the pivot at the lowest index.
- Run the partitioning algorithm on the rest of the array.
- Swap right with low.
But again, this will not group multiple instances of pivot together if pivot is duplicated. That is another problem called three way partitioning.
Here is the python implementation for both cases :
Code
from typing import List
#Given a list return partition index after partition on the pivot index value
def partition(nums: List[int],lo,hi) -> int :
"""
Hoare partition with strong condition.
Picks the first element as the pivot value.
There is no guarantee that pivot will be in its right place in this implementation.
All we get is that the first half has elements less than or equal to pivot.
Pivot can be any value in the array. (Pivot Value can actually be any arbitrary value in this version).
"""
pivot = nums[lo]
left,right = lo+1 ,hi
while left <= right :
while left<= right and nums[left] <= pivot :
left+=1
while left <=right and nums[right] > pivot :
right-=1
if left < right :
nums[left],nums[right] = nums[right],nums[left]
print(f"Pivot Value: {pivot} , Partitioned list : {nums}, Partition Index = {right}")
return right
if __name__ == "__main__" :
nums = [2,4,5,1,4,8,9]
partition(nums, 0, len(nums)-1)
nums = [2,2,2,2,2,2]
partition(nums, 0, len(nums)-1)
from typing import List
import random
# Given a list return partition index after partition on the pivot index value
def partition(nums: List[int],lo,hi) -> int :
"""
Hoare partition with strong condition.
Here we modififed the code slightly to:
Fix pivot value at index zero.
Partition by the pivot value.
Finally place the pivot at the end of the left partition.
This implementation puts pivot in the sorted position as well after partitioning.
This version is not entropy optimal . That is, it does not lead to balanced partitions when there are a lot of duplicates.
Why? Consider, if this partition scheme is used in quickselect for finding the smallest element in an array of size n).
What happens when all elements are duplicates (disregarding random pivot selection):
[1,1,1,1,1,1]
l r Iter 1 : 6 comparisons
l r Iter 2 : 5 comparisons
l r Iter 3 : 4 comparisons
l r Iter 4 : 3 comparisons
l r Iter 5 : 2 comparisons
lr Iter 6 : 1 comparisons
Number of comparisons = SUM([1.....n]) = n(n+1)/2 = (n*2 + n ) /2
Therefore, Complexity is O(n^2)
"""
print(f"Input List : {nums}")
left, right = lo,hi
# RANDOM PIVOT SELECTION
# Always keep pivot at index lo
# pivot_index = random.randint(lo,hi)
# nums[lo],nums[pivot_index] = nums[pivot_index],nums[lo]
#PARTITIONING
left+=1
pivot_val = nums[lo]
while left <= right :
while left <= right and nums[left] <= pivot_val :
left+=1
while left <= right and nums[right] > pivot_val :
right-=1
if left < right :
#swap
nums[left],nums[right] = nums[right],nums[left]
#Put pivot Value in sorted position
nums[lo] , nums[right] = nums[right] ,nums[lo]
print(f"Pivot Value: {pivot_val} ,Partitioned list : {nums}, Partition Index = {right}")
return right
if __name__ == "__main__" :
# nums= [1,2,3]
# partition(nums, 0, len(nums)-1)
# nums = [2,2,2,2]
# partition(nums, 0, len(nums)-1)
# nums = [2,4,5,1,4,8,9]
# partition(nums, 0, len(nums)-1)
# nums= [3,2]
# partition(nums, 0, len(nums)-1)
# nums= [3]
# partition(nums, 0, len(nums)-1)
nums = [5,2,1,1,1,1,1,1,1,1,1,5,5,-3,-2,-5]
partition(nums, 0, len(nums)-1)
Hoare partition with weak condition¶
The problem with the partition scheme with a strict condition (where we always put pivot values in one of the partitions) is that it produces unbalanced partitions when used in quickselect or quicksort.
Consider if the above partition scheme were used in quickselect for finding the smallest element in an array with k=0 and fixed pivot selection at arr[0].
What happens when all elements are duplicates :
Dry Run
l and r indicate the search space for each iteration of quickselect loop/each recursive call to quickselect. Comparisons refer to those made by the partitioning subroutine.
[1,1,1,1,1,1]
l r Iter 1 : 5 comparisons
l r Iter 2 : 4 comparisons
l r Iter 3 : 3 comparisons
l r Iter 4 : 2 comparisons
l r Iter 5 : 1 comparisons
lr Iter 6 : 0 comparisons
The Hoare partition, after performing r-l+1 comparisons, would return the rightmost position for each call recursive call to quickselect. r would reduce linearly until it meets l, thus giving a quadratic running time!
Number of comparisons = SUM([1.....n]) = n(n+1)/2 = (n*2 + n ) /2 Therefore, Complexity is O(n^2) in this case.
To mitigate this, we could slightly modify the definition of our partitions :
Any element in left partition <= pivot
Any element in right partition >= pivot
Invariants
Invariant 1) [low,right] only contains elements less than OR EQUAL to pivot
Invariant 2) (right, high] only contains elements greater OR EQUAL to pivot
That is, pivot values are allowed to be in either partition.
Our problem now becomes :
Reframed Problem
Given an array of integers, rearrange the elements such that the left part contains elements less than or equal to a pivot value and the right part contains values greater than or equal to the pivot value.
The implementation is tricky when handling values equal to pivot. We stop scanning when left or right are equal to pivot in either partition. We swap and move the left and right pointers. After any swap, both partitions will increase by 1. Pivot values might end up in any partition. This will also be the case when, left and right are both pointing to pivot value i.e. both partitions increase by 1 and there is a redundant swap between two pivot values.
Algorithm
While left <= right :
while left<=right and left < pivot :
left +=1
while left<=right and right > pivot:
right -=1
if left <= right :
swap left and right
left+=1
right-=1
return right
Let us see the behavior of quickselect with this scheme for the same case :
Dry Run
[1,1,1,1,1,1]
l r Iter 1 : 5 comparisons
l r Iter 2 : 3 comparisons; This iteration returns r=0 and ends the loop
rl Iter 2 END
Now the hoare partitioning subroutine will reduce r by half in each iteration. And quickselect is called LOG(n) times in the worst case, when k = 0. The complexity is reduced to LOG(n)*LOG(n).
Here is the implementation :
Code
from typing import List
import random
def partition(arr, lo, hi):
"""
Entropy Optimal Hoare partition.
Produces balanced partitions when there are large number of duplicates.
[lo,right] contains elements less than or equal to pivot.
(right,hi] contains elements greater than or equal to pivot.
Invariants for the while loop :
[lo] has pivot
[lo, left) <= pivot #Has values <= pivot
(right, hi] >= pivot
[pivot|--- <=pivot-----|-----Undetermined-------|--->=pivot----]
left right
After execution of while loop :
[pivot|----<=pivot------|----->=pivot------]
lo right hi
[lo,right] <= pivot
After Putting pivot in sorted position :
[----<=pivot----|pivot|----->=pivot------]
lo right hi
Finally, return right.
"""
#PIVOT SELECTION
#Pick a random pivot index and always keep pivot at index lo
#NB: random.randint(0,0) is 0.
pivot_index = random.randint(lo,hi)
arr[lo],arr[pivot_index] = arr[pivot_index],arr[lo]
#read pivot value
pivot = arr[lo]
#PARTITIONING
#partition [lo+1,hi] ;
#NB : when lo == hi , while loop will not be executed
left,right = lo+1, hi
while left<=right:
#Move left ahead if arr[left] is strictly less than pivot value
while left <= right and arr[left] < pivot :
left+=1
#Move right to the left if it is strictly higher than pivot
while left <= right and arr[right] > pivot :
right-=1
#Swap left and right and move pointers
#If both values are equal to pivot this will do a swap,move pointers and effectively leave pivot values where they are.
if left <= right :
arr[left], arr[right] = arr[right], arr[left]
right-=1
left+=1
#Put pivot in sorted position
arr[lo], arr[right] = arr[right], arr[lo]
print(f"Pivot Value: {pivot} ,Partitioned list : {arr}, Partition Index = {right}")
return right
if __name__ == "__main__" :
# nums= [1,2,3]
# partition(nums, 0, len(nums)-1)
# nums = [2,2,2,2]
# partition(nums, 0, len(nums)-1)
# nums = [2,4,5,1,4,8,9]
# partition(nums, 0, len(nums)-1)
# nums= [3,2]
# partition(nums, 0, len(nums)-1)
# nums= [3]
# partition(nums, 0, len(nums)-1)
nums = [5,2,1,1,1,1,1,1,1,1,1,5,5,-3,-2,-5]
partition(nums, 0, len(nums)-1)
Important things to Note:¶
-
This algorithm is not stable. The relative order of elements will not be preserved. Naive partitioning using extra space is the only algorithm which preserves relative order of elements.
-
When used in quicksort/quickselect, instead of fixed pivot selection either random pivot selection or selection algorithms like Median of Medians are used to pick the pivot.