This repository contains implementations of string sorting algorithms that do not rely solely on comparisons. It focuses on Key-Indexed Counting, which serves as the foundation for LSD and MSD Radix Sorts.
These algorithms exploit the structure of strings and characters to achieve better-than-comparison-based performance in suitable scenarios.
Key-Indexed Counting is a linear-time, stable sorting algorithm that works under specific constraints.
-
The key is a small integer
-
Keys come from a fixed, known range
Examples
-
Section numbers: 0–5
-
ASCII characters: 0–255
-
DNA bases: {A, C, G, T} → {0,1,2,3}
If the key range is small, we can use the key itself as an array index.
👉 This eliminates comparisons entirely.
Key-Indexed Counting proceeds in four dependent phases:
-
Count frequencies of each key
-
Compute cumulative counts (prefix sums)
-
Distribute records into their correct positions
-
Copy back to the original array
Each phase is simple, but correctness depends on performing them in this exact order.
Time Complexity: O(N + R)
Space Complexity: O(N + R)
where:
N = number of items
R = range of keys
In practice, when R ≪ N, this behaves as linear time.
-
✅ Stable
-
❌ Not comparison-based
-
🔧 Foundation for radix sorts (LSD and MSD)
LSD Radix Sort sorts strings by processing characters from the rightmost (least significant) position to the leftmost (most significant) position.
It uses Key-Indexed Counting as a subroutine at each character position.
-
All strings have the same fixed length W
-
Alphabet size R is known
-
Sort by the last character
-
Then by the second-last character
-
Continue until the first character
Each pass is stable, so earlier order is preserved.
Time Complexity: Θ(W · (N + R)) (usually written as Θ(WN) when R ≪ N)
Space Complexity: O(N + R)
where:
N = number of strings
W = length of each string
R = alphabet size
-
Always examines every character of every string
-
Data-independent performance
-
Simple and predictable
MSD Radix Sort processes strings from the leftmost (most significant) character to the right.
At each character position d, strings are grouped by that character. Once strings fall into different groups, their relative order is permanently fixed.
-
Strings that differ at position d never need to be compared again.
-
This directly mirrors the definition of lexicographic order.
-
Group strings by the character at position d
-
Recursively sort each group by position d + 1
-
Stop recursion when:
-
Subarray size is small
-
End of string is reached
-
-
Handling variable-length strings
Strings shorter than d are treated as having a special end-of-string marker that is smaller than any character.
-
High overhead for small subarrays
-
Large number of recursive calls
-
Costly initialization of counting arrays (especially for large alphabets like Unicode)
-
Cutoff to Insertion Sort for small subarrays (typically size ≤ 15 or 20)
-
Insertion sort compares strings starting at character d, skipping known-equal prefixes
Worst-case Time Complexity: Θ(W · (N + R)) (e.g., when all strings are identical)
Typical / Average Case: Θ(N + total characters examined) Often sublinear in W·N
Space Complexity: O(N + R)
where:
N = number of strings
W = maximum string length
R = alphabet size
-
Data-sensitive performance
-
Examines only as many characters as needed
-
Often much faster than comparison-based sorts on real data
This algorithm makes use of the quick sorting technique to partition the input array into parts strictly less than the d character, equal to d character, and greater than d character. The algorithm runs from the right(MSD) to left.
-
Take the first string and find its true place by partitioning the array.
-
3 partitions are formed: group with character strictly less than dth character of the first string, strictly equal to the dth character and strictly greater than the dth character.
-
Use recursion to sort the strings in different partitions.
-
It is cache-efficient unlike MSD sort.
-
It is in-place.
- It is not stable.
Time Complexity: 1.39 W N lg N Space Complexity: log N + W
| Algorithm | Direction | Time Complexity | Space | Stable? |
|---|---|---|---|---|
| Key-Indexed Counting | N/A | O(N + R) |
O(N + R) |
yes |
| LSD Radix Sort | Right → Left | O(WN) |
O(N + R) |
yes |
| MSD Radix Sort | Left → Right | Worst: Θ(WN)Typical: much less |
O(N + R) |
yes |
| 3-way string Quicksort | Left -> Right | 1.39 W N lg N Random: 1.39 N lg N | log N + W | no |