String Sorting Algorithms

This repository contains implementations of string sorting algorithms that do not rely solely on comparisons. It focuses on Key-Indexed Counting, which serves as the foundation for LSD and MSD Radix Sorts.

These algorithms exploit the structure of strings and characters to achieve better-than-comparison-based performance in suitable scenarios.

Key-Indexed Counting (Keyed-Index Sort)

Key-Indexed Counting is a linear-time, stable sorting algorithm that works under specific constraints.

When it works

The key is a small integer
Keys come from a fixed, known range

Examples

Section numbers: 0–5
Extended ASCII characters: 0–255
DNA bases: {A, C, G, T} → {0,1,2,3}

Core idea

If the key range is small, we can use the key itself as an array index.

👉 This eliminates comparisons entirely.
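
As a minimal illustration of using a key as an array index, the sketch below tallies DNA bases without ever comparing two records. The class name `KeyAsIndexSketch` and the helper `baseToKey` are hypothetical names chosen for this example; the encoding follows the {A, C, G, T} → {0,1,2,3} mapping listed above.

```java
// Hypothetical illustration; not taken from the repository's sources.
final class KeyAsIndexSketch {
    // Map a DNA base to its small-integer key: A=0, C=1, G=2, T=3.
    static int baseToKey(char base) {
        switch (base) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("Not a DNA base: " + base);
        }
    }

    // Tally each base by indexing directly into the counts array.
    static int[] countBases(String dna) {
        int[] counts = new int[4];
        for (int i = 0; i < dna.length(); i++) {
            counts[baseToKey(dna.charAt(i))]++;
        }
        return counts;
    }
}
```

Calling `KeyAsIndexSketch.countBases("GATTACA")` yields the tallies for A, C, G, T in that order; no two bases are compared with each other at any point.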

Algorithm phases

Key-Indexed Counting proceeds in four dependent phases:

Count frequencies of each key
Compute cumulative counts (prefix sums)
Distribute records into their correct positions
Copy back to the original array

Each phase is simple, but correctness depends on performing them in this exact order.
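
The four phases can be sketched in Java roughly as follows. This is an illustrative version, not necessarily what `KeyedIndexSort.java` contains: the class name, the method signature, and the choice to pass keys in a parallel `int[]` are assumptions made for the example.

```java
// A sketch of key-indexed counting; names are illustrative, not the repository's API.
final class KeyIndexedCountingSketch {
    // Sorts a[] by key, where keys[i] in [0, R) is the key of a[i].
    // keys[] is rearranged alongside a[] so the two arrays stay aligned.
    static void sort(String[] a, int[] keys, int R) {
        int n = a.length;
        String[] auxItems = new String[n];
        int[] auxKeys = new int[n];
        int[] count = new int[R + 1];

        // Phase 1: count frequencies, offset by one so phase 2 yields start indices.
        for (int i = 0; i < n; i++) count[keys[i] + 1]++;

        // Phase 2: prefix sums; count[r] becomes the first output index for key r.
        for (int r = 0; r < R; r++) count[r + 1] += count[r];

        // Phase 3: distribute; scanning left to right keeps equal keys in input order.
        for (int i = 0; i < n; i++) {
            int pos = count[keys[i]]++;
            auxItems[pos] = a[i];
            auxKeys[pos] = keys[i];
        }

        // Phase 4: copy back to the original arrays.
        System.arraycopy(auxItems, 0, a, 0, n);
        System.arraycopy(auxKeys, 0, keys, 0, n);
    }
}
```

The order matters because each phase consumes the previous one's output: the prefix sums need complete counts, and distribution needs the finished start indices.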

Complexity

Time Complexity: O(N + R)

Space Complexity: O(N + R)

where:

N = number of items

R = range of keys

In practice, when R ≪ N, the N term dominates and the running time is effectively linear in N.

Properties

✅ Stable
❌ Not comparison-based
🔧 Foundation for radix sorts (LSD and MSD)

LSD Radix Sort (Least Significant Digit)

LSD Radix Sort sorts strings by processing characters from the rightmost (least significant) position to the leftmost (most significant) position.

It uses Key-Indexed Counting as a subroutine at each character position.

Assumptions

All strings have the same fixed length W
Alphabet size R is known

How it works

Sort by the last character
Then by the second-last character
Continue until the first character

Each pass is stable, so earlier order is preserved.
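
A compact sketch of these passes is shown below. It assumes an extended-ASCII alphabet (R = 256) and that every string has exactly `w` characters; the class name `LsdSketch` is hypothetical and the code is not necessarily identical to `LSD.java`.

```java
// A sketch of LSD radix sort for fixed-length strings; names are illustrative.
final class LsdSketch {
    // Sorts strings that all have length w, assuming characters in [0, 256).
    static void sort(String[] a, int w) {
        final int R = 256; // assumed alphabet size (extended ASCII)
        int n = a.length;
        String[] aux = new String[n];

        // One stable key-indexed counting pass per position, right to left.
        for (int d = w - 1; d >= 0; d--) {
            int[] count = new int[R + 1];
            for (int i = 0; i < n; i++) count[a[i].charAt(d) + 1]++;         // frequencies
            for (int r = 0; r < R; r++) count[r + 1] += count[r];            // start indices
            for (int i = 0; i < n; i++) aux[count[a[i].charAt(d)]++] = a[i]; // distribute
            System.arraycopy(aux, 0, a, 0, n);                               // copy back
        }
    }
}
```

Because each pass is stable, strings that tie on position d stay in the order established by the passes on positions d + 1 and beyond, which is why the final pass on position 0 leaves the array fully sorted.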

Complexity

Time Complexity: Θ(W · (N + R)) (usually written as Θ(WN) when R ≪ N)

Space Complexity: O(N + R)

where:

N = number of strings

W = length of each string

R = alphabet size

Characteristics

Always examines every character of every string
Data-independent performance
Simple and predictable

MSD Radix Sort (Most Significant Digit)

MSD Radix Sort processes strings from the leftmost (most significant) character to the right.

At each character position d, strings are grouped by that character. Once strings fall into different groups, their relative order is permanently fixed.

Key idea

Strings that differ at position d never need to be compared again.
This directly mirrors the definition of lexicographic order.

How it works

Group strings by the character at position d
Recursively sort each group by position d + 1
Stop recursion when:
- Subarray size is small
- End of string is reached

Handling variable-length strings

Strings with no character at position d (length ≤ d) are treated as having a special end-of-string marker at that position, which sorts before any real character.
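
One common way to realize the end-of-string marker is a helper that returns -1 past the end of a string, with the counting array widened by one slot to hold that extra key. The sketch below assumes an extended-ASCII alphabet (R = 256) and omits the small-subarray cutoff; `MsdSketch` is a hypothetical name, and the repository's `MSD.java` and `MSDPure.java` may differ in detail.

```java
// A sketch of MSD radix sort for variable-length strings; names are illustrative.
final class MsdSketch {
    private static final int R = 256; // assumed alphabet size (extended ASCII)

    // Returns the character at position d, or -1 if the string has already ended.
    private static int charAt(String s, int d) {
        return d < s.length() ? s.charAt(d) : -1;
    }

    static void sort(String[] a) {
        String[] aux = new String[a.length];
        sort(a, aux, 0, a.length - 1, 0);
    }

    // Sorts a[lo..hi], whose strings all share the same first d characters.
    private static void sort(String[] a, String[] aux, int lo, int hi, int d) {
        if (hi <= lo) return;

        int[] count = new int[R + 2]; // one extra slot for the end-of-string key
        for (int i = lo; i <= hi; i++) count[charAt(a[i], d) + 2]++;
        for (int r = 0; r < R + 1; r++) count[r + 1] += count[r];
        for (int i = lo; i <= hi; i++) aux[count[charAt(a[i], d) + 1]++] = a[i];
        for (int i = lo; i <= hi; i++) a[i] = aux[i - lo];

        // Recurse into each character group. Strings that ended at position d
        // sit at the front of the range and need no further sorting.
        for (int r = 0; r < R; r++) {
            sort(a, aux, lo + count[r], lo + count[r + 1] - 1, d + 1);
        }
    }
}
```

Shifting every key by one (so -1 maps to slot 0) is what places shorter strings before longer strings with the same prefix, for example "sea" before "seashells".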

Limitations

High overhead for small subarrays
Large number of recursive calls
Costly initialization of counting arrays (especially for large alphabets like Unicode)

Practical optimizations

Cutoff to Insertion Sort for small subarrays (typically size ≤ 15 or 20)
Insertion sort compares strings starting at character d, skipping known-equal prefixes
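
The cutoff routine might look like the sketch below. It assumes the caller guarantees that all strings in `a[lo..hi]` agree on their first `d` characters, which is exactly the situation inside an MSD recursion; the class and method names are hypothetical.

```java
// A sketch of insertion sort that skips a known-equal prefix; names are illustrative.
final class InsertionFromDSketch {
    // Sorts a[lo..hi], assuming these strings share the same first d characters.
    static void sort(String[] a, int lo, int hi, int d) {
        for (int i = lo; i <= hi; i++) {
            for (int j = i; j > lo && less(a[j], a[j - 1], d); j--) {
                String t = a[j];
                a[j] = a[j - 1];
                a[j - 1] = t;
            }
        }
    }

    // Compares v and w starting at position d; a proper prefix sorts first.
    private static boolean less(String v, String w, int d) {
        int n = Math.min(v.length(), w.length());
        for (int i = d; i < n; i++) {
            if (v.charAt(i) != w.charAt(i)) return v.charAt(i) < w.charAt(i);
        }
        return v.length() < w.length();
    }
}
```

In an MSD sort, a call such as `InsertionFromDSketch.sort(a, lo, hi, d)` would replace the recursive step whenever `hi - lo` falls below the chosen cutoff, avoiding the cost of allocating and scanning a counting array for a handful of strings.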

Complexity

Worst-case Time Complexity: Θ(W · (N + R)) (e.g., when all strings are identical)

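Typical / Average Case: about N log_R N character examinations for random strings. In general the cost is proportional to N plus the number of characters actually examined, which is often far less than the W · N characters in the input.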

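Space Complexity: O(N + R) for the auxiliary array and one counting array. A recursive implementation that allocates a counting array per call needs O(N + D · R), where D is the recursion depth.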

where:

N = number of strings

W = maximum string length

R = alphabet size

Characteristics

Data-sensitive performance
Examines only as many characters as needed
Often much faster than comparison-based sorts on real data

3-way String Quicksort

This algorithm applies 3-way quicksort partitioning to one character position at a time. Using the d-th character of a pivot string, it splits the array into strings whose d-th character is less than, equal to, or greater than the pivot's. Like MSD radix sort, it proceeds from the leftmost (most significant) character to the right.

How it works

Use the d-th character of the first string as the partitioning value.
Partition the array into three groups: strings whose d-th character is less than, equal to, or greater than that value.
Recursively sort the less-than and greater-than groups on the same position d, and the equal group on position d + 1.
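
These steps can be sketched as follows. The version below takes the first string of each subarray as the pivot and treats end-of-string as -1; the class name `Quick3StringSketch` is hypothetical, and `ThreeWayStringQuicksort.java` in this repository may be organized differently.

```java
// A sketch of 3-way string quicksort; names are illustrative.
final class Quick3StringSketch {
    // Returns the character at position d, or -1 if the string has already ended.
    private static int charAt(String s, int d) {
        return d < s.length() ? s.charAt(d) : -1;
    }

    static void sort(String[] a) {
        sort(a, 0, a.length - 1, 0);
    }

    // Sorts a[lo..hi], whose strings all share the same first d characters.
    private static void sort(String[] a, int lo, int hi, int d) {
        if (hi <= lo) return;

        int lt = lo, gt = hi;
        int v = charAt(a[lo], d); // partitioning character
        int i = lo + 1;
        while (i <= gt) {
            int c = charAt(a[i], d);
            if (c < v) swap(a, lt++, i++);
            else if (c > v) swap(a, i, gt--);
            else i++;
        }

        // Now a[lo..lt-1] < v, a[lt..gt] == v, a[gt+1..hi] > v at position d.
        sort(a, lo, lt - 1, d);
        if (v >= 0) sort(a, lt, gt, d + 1); // equal group advances one character
        sort(a, gt + 1, hi, d);
    }

    private static void swap(String[] a, int i, int j) {
        String t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}
```

The `v >= 0` check stops the recursion on the equal group once its strings have ended, since strings that are identical through their last character are already in their final positions.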

Advantages

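It is cache-friendly, unlike MSD radix sort, which scatters accesses across an auxiliary array and counting arrays.
It sorts in place; the only extra space is the recursion stack.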

Limitations

It is not stable.

Complexity

Time Complexity: about 1.39 W N lg N character compares in the worst case; about 1.39 N lg N for random strings

Space Complexity: O(log N + W)

Summary Comparison

| Algorithm | Direction | Time Complexity | Space | Stable? |
| --- | --- | --- | --- | --- |
| Key-Indexed Counting | N/A | `O(N + R)` | `O(N + R)` | yes |
| LSD Radix Sort | Right → Left | `O(WN)` | `O(N + R)` | yes |
| MSD Radix Sort | Left → Right | Worst: `Θ(WN)`; Typical: much less | `O(N + R)` | yes |
| 3-way String Quicksort | Left → Right | Worst: `1.39 W N lg N`; Random: `1.39 N lg N` | `log N + W` | no |
