What is Mergesort?
Introduction to Mergesort
Mergesort is a divide-and-conquer algorithm that divides the unsorted list into n sublists, each containing one element (a list of one element is considered sorted), and then repeatedly merges sublists to produce newly sorted sublists until there is only one sublist remaining. This final merged list is the sorted list. The algorithm was invented by John von Neumann in 1945.
Key Characteristics
- Divide-and-Conquer Strategy: Breaks down a problem into two or more sub-problems of the same or related type, until these become simple enough to be solved directly.
- Stable Sorting Algorithm: Maintains the relative order of records with equal keys (i.e., values).
- Non-in-place Algorithm: Requires additional memory space proportional to the input size for temporary storage during the merge process.
How Mergesort Works
The Mergesort algorithm operates as follows:
- Divide: If the list is of length 0 or 1, it is already sorted. Otherwise:
- Divide the unsorted list into two roughly equal halves.
- Conquer: Recursively sort both halves.
- Combine: Merge the two sorted halves back into a single sorted list.
Detailed Steps
Step 1: Divide
The list is recursively divided into two halves until each sublist contains a single element. Since a list of one element is, by definition, sorted, this step ultimately produces a set of sorted lists.
Step 2: Conquer
Each half of the list is sorted independently using recursive calls to Mergesort. The base case for the recursion is a list of one element.
Step 3: Combine
The core operation of Mergesort is the merging of two sorted lists. This is done by comparing the smallest elements of each list and moving the smaller one to the output list. The process repeats until all elements have been moved to the output list.
Example Merge Process
Given two sorted lists [2, 5, 7]
and [1, 3, 8]
, the merge process would work as follows:
- Compare
2
and1
. Move1
to the output list. - Compare
2
and3
. Move2
to the output list. - Compare
5
and3
. Move3
to the output list. - Compare
5
and8
. Move5
to the output list. - Compare
7
and8
. Move7
to the output list. - Move the last element
8
to the output list.
The resulting merged list is [1, 2, 3, 5, 7, 8]
.
Pseudocode
function mergesort(array)
if length(array) <= 1
return array
mid = length(array) / 2
left_half = mergesort(subarray from 0 to mid - 1)
right_half = mergesort(subarray from mid to end)
return merge(left_half, right_half)
function merge(left, right)
result = []
while not empty(left) and not empty(right)
if first(left) <= first(right)
append first(left) to result
remove first(left)
else
append first(right) to result
remove first(right)
// At least one of left and right may still contain elements.
// Append the remaining elements.
append remaining elements of left to result
append remaining elements of right to result
return result
Time Complexity
- Best Case: O(n log n)
- Average Case: O(n log n)
- Worst Case: O(n log n)
Where n
is the number of elements in the list. Mergesort's time complexity is logarithmic due to the divide step and linear because of the merge step, making it highly efficient on large datasets.
Space Complexity
- Space Complexity: O(n)
Mergesort requires additional space for the temporary arrays used during the merge process, which is proportional to the size of the input array.
Advantages and Disadvantages
Advantages
- Guaranteed Performance: Always performs at O(n log n), making it predictable and reliable.
- Stability: Preserves the relative order of equal elements.
- Efficiency on Large Datasets: Particularly effective for sorting linked lists and in situations where data cannot fit into main memory.
Disadvantages
- Extra Space Requirement: Needs additional memory for the temporary arrays.
- Not In-place: Unlike some other sorting algorithms, such as quicksort, Mergesort does not sort in place.
Applications
- External Sorting: Suitable for sorting data sets that are too large to fit into memory.
- Linked List Sorting: Efficiently sorts linked lists without requiring extra space for indexing.
- Parallel Processing: Can be easily parallelized because of its divide-and-conquer nature.
Conclusion
Mergesort is a powerful and efficient sorting algorithm that leverages the divide-and-conquer strategy to provide stable sorting with guaranteed performance. Despite requiring additional space, it remains an excellent choice for sorting large datasets and is widely used in various applications, including external sorting and parallel processing environments.