Fast and stable sort algorithm that uses O(1) memory. Public domain.

Last update: Jan 1, 2023

Related tags

Overview

WikiSort

WikiSort is an implementation of "block merge sort", which is a stable merge sort based on the work described in "Ratio based stable in-place merging", by Pok-Son Kim and Arne Kutzner [PDF]. It's generally as fast as a standard merge sort while using O(1) memory, and can be modified to use additional memory optionally provided to it which can further improve its speed.

C, C++, and Java versions are currently available, and you have permission from me and the authors of the paper (Dr. Kim and Dr. Kutzner) to do whatever you want with this code.

Related: Check out the GrailSort project for a similar algorithm based on a paper by Huang and Langston, or the Rewritten Grailsort project which continues its work.

If you want to learn how it works, check out the documentation:
  • Chapter 1: Tools
  • Chapter 2: Merging
  • Chapter 3: In-Place
  • Chapter 4: Faster!

Or you can check out the Wikipedia page for block merge sort. WikiSort was actually (poorly) named out of the hope that you'll update and improve its various components almost like a Wikipedia article, since this is still very much an open area of research that could use your expertise!

WikiSort vs. std::stable_sort()
^{(clang++ version 3.2, sorting 0 to 1.5 million items)}

Using a 512-item fixed-size cache for O(1) memory:

Test             Fast comparisons   Slow comparisons   150,000,000 items    0-32 items
Random               6% faster        95% as fast         35% faster        45% faster
RandomFew            5% faster        16% faster          20% faster        45% faster
MostlyDescending    97% as fast       13% faster          99% as fast       53% faster
MostlyAscending    149% faster       117% faster         286% faster        47% faster
Ascending         1280% faster       518% faster        1101% faster       242% faster
Descending          23% faster       121% faster          12% faster       164% faster
Equal             1202% faster       418% faster        1031% faster       227% faster
Jittered           526% faster       298% faster         733% faster        70% faster
MostlyEqual         15% faster        57% faster          10% faster        42% faster
Append             153% faster        90% faster         348% faster       112% faster

Using a dynamically allocated half-size cache:

Test             Fast comparisons   Slow comparisons
Random              11% faster         3% faster
RandomFew           10% faster         5% faster
MostlyDescending    19% faster        26% faster
MostlyAscending     98% faster        79% faster
Ascending          861% faster       463% faster
Descending          39% faster       142% faster
Equal              837% faster       460% faster
Jittered           326% faster       243% faster
MostlyEqual         15% faster         2% faster
Append             159% faster        94% faster

Comments

C++ implementation is broken for non-PODs

The C++ implementation relies on std::memcpy and std::memmove in its implementation. Unfortunately, this is undefined behaviour unless the argument type passed to the functions are pointers to POD types.

opened by klmr 11
Random-access iterators support for the C++ sort.
The C++ version of WikiSort currently only works with contiguous iterators (random-access iterators iterating over a contiguous chunk of memory), which is ok for std::vector but doesn't which makes the algorithm fail with std::deque even though the code compiles. This pull request makes the code more iterator-friendly and ensures that it works with std::deque and any random-access container.

Since I was doing big modifications to the C++ code, I also improved its correctness. There are now less warnings and it's more likey to work with more compilers and to avoid some errors. Here is what has changed besides the generic random-access iterators support:

I replaced the uint8_t by an int. It doesn't make a performance difference and makes the code compile with g++ in pre-C++98 mode.

I replaced calls to std::swap by calls to std::iter_swap so that optimized ALD-found swap functions can be used.

I fully qualified library functions and types with std::. Some compilers are stricter and won't accept size_t without the std:: prefix by default in C++ mode.

I reimplemented FloorPowerOfTwo so that it is more generic and renamed it to Hyperfloor. The new version works with unsigned integers of any size.

I made sure that the macros SWAP and PULL are #undefed once we don't need them anymore.

I const-qualified some functions in Iterator so that they can also eventually be used in a const context.

I used a constructor initialization list in Iterator. I don't think it will improve performance, but it cannot pessimize it and it makes the code a bit cleaner.

I replaced some functions by direct calls to standard library functions std::rotate, std::swap_ranges, std::merge, std::upper_bound, std::lower_bound and std::reverse.

And that's pretty much it. My IDE strips the useless blank spaces at the end of lines, which makes the diff a bit bigger than it should, but don't mind it.
opened by Morwenn 7
Needs faster sorting for small data sets

Finally ready to start caring about how well WikiSort performs for small data sets. The worst case was obviously going to be Testing::Descending with slow comparisons, as that turns the InsertionSort call into the O(n^2) worst case. And yes, it's a LOT slower than std::stable_sort() at the moment – about 5x, actually.

opened by BonzaiThePenguin 7
Create a makefile and gen directory for classes/objects
Right now, class files and objects are compiled manually into the root directory. This pull request comes with a makefile and a new directory gen. The object and class files generated by make are placed in gen. Everything in gen is ignored by git (see gen/.gitignore).

To compile all three languages, run make or make wiki from the root directory.

To compile (if needed) and run the tests for java, c, or cpp, run make {lang}.

You may want to move the readme instructions higher up. But, it's up to you. Also, the c compilation command is slightly different since I needed to add the gcc -lm flag. Mileage may vary.
opened by tbpalsulich 6
Ignore File, Java Printing, and Verify Cleanup

Created an ignore file to ignore classes and a.out.

Use System.out.format to print floats like seconds or percentages to 2 decimal places.

Updated verify method to check the Wiki sorted array against the Merge sorted array. Before, the same for loop was repeated after both verify() calls.

opened by tbpalsulich 4
Merging many repeated values

Just remembered that the block-based merge requires sqrt(n) unique values to exist, so an array with 1,000,000 items needs at least 1,000 unique values. The original paper offers a few techniques to deal with failing to extract enough unique values, but they have not been implemented yet. Looking into it now...

opened by BonzaiThePenguin 4
Copy contiguous areas of memory at once

If I understand correctly, in the part I modified, A1.end == B1.start, A2.end == B2.start and A3.end == B3.start, which means than when we copy such contiguous iterators, we can simply copy for example [A1.start, B1.end) at once instead of copying [A1.start, A1.end) then [B1.start, B1.end). While it doesn't change anything from an algorithmic point of view, it may end up in one call to std::memcpy instead of two for small types, which may be more optimized.

This commit does exactly that. All the tests passed successfully.

opened by Morwenn 2
Reducing the number of comparisons

Finally getting around to replacing those linear searches with something more intelligent, to drastically lower the number of comparisons needed when there aren't very many unique values within the array. Right now even after a few comparison optimizations it's still using 30% more compares than __inplace_stable_sort() in one test case. Should be able to do a lot better!
enhancement

opened by BonzaiThePenguin 2
Performance Percentage Meanings

When comparing the performance of the two sorts, the seconds percentage uses the ratio time2/time1, while the compares percentage uses compares1/compares2. Shouldn't these two match? Should they be comparing Wiki to Merge, or Merge to Wiki?

On another note, would it be beneficial to time the test cases?

opened by tbpalsulich 2
In the C implementation, is Test::index required?
In the C implementation, the type is defined as typedef struct { int value, index; } Test. Is Test::index required or is it just for the testing purpose? If we are allowed to add an index field to the array, it is trivial to achieve stable sort with any sorting algorithms, by using a comparison function

#define stable_lt(a,b) ((a).value < (b).value || ((a).value==(b).value \ && (a).index<(b).index))

to break any ties, but this is not the true in-place stable sort.
opened by lh3 2
Consistently use std::iter_swap to performs the swaps.

I wanted to add a block sort to my sorting library and WikiSort seemed like the perfect implementation, so I took it and improved some things on-the-fly. While some things are not really compatible with C++03, this pull request is fully compatible with C++03.

Basically, I simply replaced every call to std::swap by a call to std::iter_swap. The latter uses ADL to check whether the type to be swapped has a swap function in its namespace. If such a function exists, it uses it, otherwise it falls back to std::swap. While it doesn't change the correctness of the program, ADL-found swaps are generally optimized swaps for some classes, which means that WikiSort could end up being faster for free for some types while still being as fast as before for the other ones.

As a side note, my IDE strips all the useless spaces before a newline when I save, but removing useless spaces can't be a bad thing, right?

opened by Morwenn 1

Description of reverse

Thank you for publishing your work on this algorithm. It is very fascinating. I am reading Chapter 1. I would just like to get some clarification about the reverse method. I know that the method is simple, but does this really do what it says it does? I have written a small example to test it.

public class WikiSortTest{

    static class Range{
        public int start;
        public int end;

        public Range(int start1, int end1){
            start = start1;
            end = end1;
        }

        public Range(){
            start =0;
            end =0;
        }

        void set(int start1, int end1){
            start = start1;
            end = end1;
        }

        int length(){
            return end-start;
        }

    }

    static void swap(int[] A, int i, int j){
        int temp = A[i];
        A[i] = A[j];
        A[j] = temp;
    }

    static void reverse(int[] A, Range range){
        for(int index=range.length()/2 -1; index >=0; index--){
            swap(A,A[range.start+index],A[range.end-index-1]);
        }
    }

    static void printArray(int[] A){
        System.out.println();
        for(int i=0; i < A.length; i++){
            System.out.printf(A[i]+" | ");
        }
        System.out.println();
    }

    public static void main(String[] args){
        int[] A = new int[]{0,1,2,3,4};
        printArray(A);
        Range range = new Range(0,3);
        reverse(A,range);
        printArray(A);
    }
}

The output from this program is

PS D:\Algorithm Analysis> javac WikiSortTest.java
PS D:\Algorithm Analysis> java WikiSortTest      

0 | 1 | 2 | 3 | 4 | 

2 | 1 | 0 | 3 | 4 |

Is this to be expected?

opened by EvanGertis 1

Looking to improve Grailsort with ideas from Wiki; help wanted!!
Hello, Bonzai!

I've been quite interested in Wikisort and Block (Merge) Sort in general for a while now. Back in AP Comp Sci, you could say I was a bit floored when I found out a stable, O(1) space, and worst-case O(n log n) time sort actually existed. :P

I've also done a bit of personal studying into sorts, and forked over a decently popular "algorithm animation" program, which I've been working on for quite some time now! In fact, here are some videos I made regarding Wikisort (Feel free to use them!): WikiSort - Bar Graph: https://youtu.be/ciQG_uUG6O4 WikiSort - Scatter Plot: https://youtu.be/mMr4b0Yg4yg WikiSort - Color Circle: https://youtu.be/OWxuXJQ3Guw Wikisorting over 16k Numbers: https://youtu.be/X3SyGvfj1d8

I've been doing a lot of research on and off not only into Wikisort, but also Andrey Astrelin's Grailsort (I saw he talked to you in another post), which turns out to be a bit of a different implementation of Block Merge Sort. What it sacrifices in terms of adaptability in some places (best-case O(n log n) time instead of Wikisort's O(n) best-case), it makes up with raw speed in other cases (I would wager Grailsort's "Build Blocks" function is one of the fastest merge sorts out there).

I've done some visualizations of the sort here; hopefully they demonstrate the major differences between Wiki and Grail: Grail Sorting 2,048 Integers: https://youtu.be/LJbXsV_qGbs GrailSort Redistributing Its Internal Buffer: https://youtu.be/8SV18oH8kPc A Detailed Visual of GrailSort: https://youtu.be/U29NB96Y-9w Grailsorting over 16k numbers: https://youtu.be/kZJ7h307bcQ

I also have some refactored repos for Grailsort. They're rough drafts, but hopefully the code is easier to read than the original: Java Version: https://github.com/MusicTheorist/Grail-Sorting-for-Java C/C++ Version: https://github.com/MusicTheorist/Grail-Sort-Refactored

I'm also looking to create some documentation for both Grail and Block Merge Sort to make them much more intuitive, like what you've done with your docs.

I basically agree with Mr. Astrelin on the major changes between Grail and Wiki: "[Grail] mainly differs by swapping blocks and their tags in parallel before merging, and shifting the position of an internal buffer used for locally merging/appending portions of an array".

While Grailsort is a great algorithm, I do see some significant room for improvement, and I thought I would ask you for some help if you're interested! There are definitely aspects of Wikisort I think Grailsort would benefit from, and hopefully a conversation would help to make sure I understand Blocksort alright!

My questions for now would be:

What do you consider the fastest/most efficient step(s) in Wikisort? What about the slowest/least efficient?

I saw one of your closed issues asking if a backwards merge was possible. Is it, and even if it is, would it possibly break stability?

What was your strategy for getting Wikisort to be adaptive?

Do you have any updates on why Java's JIT compiler slows down Wikisort?

Are there any important edge cases to watch out for when reordering/swapping A and B blocks?

What's the complexity of your "extract keys" method? Grailsort's "find keys" method alone is O(n log n) comparisons in the worst-case, i.e. when there are less than 2 * sqrt(n) unique keys in the array.

Feel free to answer whenever and however detailed you want. Hope you're staying well during the pandemic!

Thanks for reading!!

John
opened by MusicTheorist 2
Getting a crash in the Rotate function

The line that says Rotate(array, Range_length(range) - count, range, cache, cache_size); calls into memmove(&array[range2.end - Range_length(range1)], &array[range1.start], Range_length(range1) * sizeof(array[0])); with a range where the end is less than the beginning, so it causes a crash. Any ideas why? I'm building for the iOS architecture, and on there it appears that size_t is unsigned, is it signed for you?

opened by michaeleisel 3
Gah, why is the Java version so slow?

This has been bothering me for a while, but the Java version is nowhere near the performance of a standard merge sort. I spent most of yesterday removing all class allocations (no Range, no Pull, and no Iterator), then when that didn't help I started ripping out chunks of code bit by bit, and eventually I got it down to a standard bottom-up merge sort and realized it was still a lot slower than a standard recursive merge sort. This doesn't make any sense.

There has to be something weird about this Java VM's internal design that causes it to not like the way the items in the array are being accessed, or something. The recursive version always accesses elements in the same areas, but the iterative version iterates over the array log(n) times. The number of accesses are no different, but the order of the accesses changes quite a bit. In the C version the iterative merge sort is much faster than the recursive one, which is what I would expect.

Maybe Java has its own caching system running behind the scenes, separately from any bare metal hardware caching...? Or maybe it isn't communicating its access intentions to the system properly, resulting in a ton of cache misses?

Anyway, I asked about it on Stack Overflow here: http://stackoverflow.com/questions/23121831/why-is-my-bottom-up-merge-sort-so-slow-in-java

opened by BonzaiThePenguin 6
Needs better in-place stable merging

The merge operation that is currently in place needs to be redone with a more intelligent algorithm, since it has an O(n^2) worst case. Right now I'm looking at a paper called "Optimizing stable in-place merging", by Jingchao Chen, but there are many similar papers online.
enhancement

opened by BonzaiThePenguin 20

Fast and stable sort algorithm that uses O(1) memory. Public domain.

Related tags

Overview

WikiSort

Comments

Owner

Mike McFadden

Solutions for some common algorithm problems written in Java.

Algorithms and Data Structures implemented in Java

The quickstarts demonstrate JBoss EAP, Jakarta EE 8 and a few additional technologies. They provide small, specific, working examples that can be used as a reference for your own project.

MCQs and coding questions solutions of Object-Oriented Programming java of coding ninjas

Repository for Bryn and Ethan's Java with MicroServices Batch

Computer science data structures and algorithms implementation from scratch

http://kodlama.io "Java & React Bootcamp" up to date Lectures and Homeworks.

VnV Bootcamp for ITJuana documentation and other tools

Castled is an open source reverse ETL solution that helps you to periodically sync the data in your warehouses and databases to sales, marketing, support or custom apps without any help from engineering teams

100 Days of Code Learning program to keep a habit of coding daily and learn things at your own pace with help from our remote community.

A free opensource domain tracker with a breakdown of which countries players connected with for each domain versions 1.13+

RTree2D is a 2D immutable R-tree with STR (Sort-Tile-Recursive) packing for ultra-fast nearest and intersection queries

A DJL Algorithm used to detect if a Image contains a person such as Dream, Sapnap, George Not Found, TommyInnit, Tubbo or Ranboo. This Project has been created for a YouTube Video which is not yet finished, and neither is the Algorithm.

Java & Spring based cryptocurrency trading robot (RPA) that uses the public Binance API

Java & Spring based cryptocurrency trading robot (RPA) that uses the public Binance API

Reference implementation for MINAS (MultI-class learNing Algorithm for data Streams), an algorithm to address novelty detection in data streams multi-class problems.

A Recruitment bot that uses K-Nearest Neighbors Algorithm to determine whether you should hire the applicant or not

Lightweight installer written in java, made for minecraft mods, The installer uses JPanel and uses a URL to install to the specific area (Discord URL's work the best i've seen)

Parallel programming quick sort and parallel sum examples with Fork-join, RecursiveTask, RecursiveAction