QuickSelect¶
Defining the problem.¶
Given an array, find the sorted value in non decreasing order at the kth index (Lets assume indices are counted from zero).
In other words what would be the value at the kth index if the array were sorted in non decreasing order?
This is also called the kth order statistic.
The Algorithm¶
The core of the quickselect algorithm is the partitioning scheme. There are different ways of implementing quickselet based on whether the partitioning scheme puts the pivot in its sorted position or not. Here we will consider only the implementation which uses partition schemes which do, because this is simpler and more intuitive.
Assume, you have a partitioning scheme which returns some partition index / pivot index s.t. the value at the pivot index is in its sorted position. The quickeselect algorithm makes one recursive call based on the relative position of k to the pivot index. The base case is when the pivot index is k.
Algorithm
def quickselect(arr,lo,hi, k ) :
p = partition(arr,lo,hi)
if p == k :
return arr[k]
elif k< p :
return quickselect(arr,lo,p-1,k)
else :
return quickselect(arr,p+1,hi,k)
Here is the implementation using Hoare partitionining.Notice the difference that random pivot selection makes in the comments :
Code
from typing import List
import random
def quickselect(arr: List[int],lo: int,hi: int,k : int) -> int :
"""
Quickselect using Hoare partition with a weak condition and fixed pivot.
This is the implementation which you will find in textbooks using do..while style loops which in python becomes while True loops.
I prefer avoiding white True loops.
It's still using Hoare partition with a weak condition. But the pivot selection is not random.
Notice how the performance degrades to O(n^2) for sorted input array because the pivot selection is not random.
Compare this with next version which uses a random pivot.
"""
v = arr[lo]
i = lo
j = hi+1
while True:
while True:
i += 1
if not (i < hi and arr[i] < v):
break
while True:
j -= 1
if not (j > lo and arr[j] > v):
break
if i >= j:
break
arr[i], arr[j] = arr[j], arr[i]
arr[lo], arr[j] = arr[j], arr[lo]
print(f"Pivot Value: {v} ,Partitioned list : {arr}, Partition Index = {j} , le = {lo}, hi = {hi}, k = {1}")
if k -1 == j :
print(f"Returning index {j}")
return nums[j]
elif k -1 > j :
return quickselect(nums,j+1,hi,k)
else :
return quickselect(nums,lo,j-1,k)
if __name__ == "__main__" :
# nums = [5,2,1,1,1,1,1,1,1,1,1,5,5,-3,1,1,1,1,1,1,1,1,1,1,1-2,-5] # O(nlogn)
nums = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] # O(nlogn)
# nums = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24] #degrades to O(n^2)
# nums = [24,23,22,21,20,19,18,17,16,15,14,13,12,11,10,9,8,7,6,5,4,3,2,1] #O(nlogn)
print(
f"kth_largest = { quickselect(nums, 0, len(nums)-1 ,23) }"
)
# nums = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] # O(nlogn)
# print(
# f"kth_largest = { quickselect(nums, 0, len(nums)-1 ,1) }"
# )
from typing import List
import random
def hoare_partition(arr, lo, hi):
"""
Entropy Optimal Hoare partition.
Produces balanced partitions when there are large number of duplicates.
[lo,right] contains elements less than or equal to pivot.
(right,hi] contains elements greater than or equal to pivot.
Invariants for the while loop :
[lo] has pivot
[lo, left) <= pivot #Has values <= pivot
(right, hi] >= pivot
[pivot|--- <=pivot-----|-----Undetermined-------|--->=pivot----]
left right
After execution of while loop :
[pivot|----<=pivot------|----->=pivot------]
lo right hi
[lo,right] <= pivot
After Putting pivot in sorted position :
[----<=pivot----|pivot|----->=pivot------]
lo right hi
Finally, return right.
"""
#PIVOT SELECTION
#Pick a random pivot index and always keep pivot at index lo
#NB: random.randint(0,0) is 0.
pivot_index = random.randint(lo,hi)
arr[lo],arr[pivot_index] = arr[pivot_index],arr[lo]
#read pivot value
pivot = arr[lo]
#PARTITIONING
#partition [lo+1,hi] ;
#NB : when lo == hi , while loop will not be executed
left,right = lo+1, hi
while left<=right:
#Move left ahead if arr[left] is strictly less than pivot value
while left <= right and arr[left] < pivot :
left+=1
#Move right to the left if it is strictly higher than pivot
while left <= right and arr[right] > pivot :
right-=1
#Swap left and right and move pointers
#If both values are equal to pivot this will do a swap,move pointers and effectively leave pivot values where they are.
if left <= right :
arr[left], arr[right] = arr[right], arr[left]
right-=1
left+=1
#Put pivot in sorted position
arr[lo], arr[right] = arr[right], arr[lo]
return right
def quickselect(arr: List[int],lo: int,hi: int,k : int) -> int :
"""
Quicselect using Hoare partition with weak condition and random pivot.
Return vlaue at Kth SMALLEST Index,
Returns the non decreasingly sorted value at kth Index with indices starting from lo. [lo ,hi] is inclusive.
This uses Hoare's partition scheme with a weak condition and which also puts pivot in its sorted position.
"""
pivot_index = hoare_partition(arr,lo,hi)
print(f"Pivot Value: {arr[pivot_index]} ,Partitioned list : {arr}, Partition Index = {pivot_index} , le = {lo}, hi = {hi}, k = {k}")
if k == pivot_index :
return arr[k]
if k < pivot_index :
return quickselect(arr,lo, pivot_index-1,k)
elif k > pivot_index :
return quickselect(arr,pivot_index+1,hi,k)
if __name__ == "__main__" :
# nums = [5,2,1,1,1,1,1,1,1,1,1,5,5,-3,1,1,1,1,1,1,1,1,1,1,1-2,-5] # O(nlogn)
# nums = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] # Close to O(nlogn)
nums = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24] #STill close to O(nlogn) . WIthout random pivot degrades to O(n^2)
# nums = [24,23,22,21,20,19,18,17,16,15,14,13,12,11,10,9,8,7,6,5,4,3,2,1] #O(nlogn)
print(
f"kth_largest = { quickselect(nums, 0, len(nums)-1 ,23) }"
)