青龙男是什么意思| 火命适合佩戴什么首饰| 吩可以组什么词| 什么是头寸| 做梦房子倒塌什么预兆| 秋天吃什么| 为什么一个月来两次月经| 解神是什么意思| 葡萄什么时候成熟| 骨骼肌是什么意思| 得数是什么意思| 人生的尽头是什么| 3p是什么意思| 资产负债率高说明什么| 焦虑吃什么药| 足是什么结构| 膀胱壁毛糙是什么意思| 破窗效应是什么意思| 清对什么| 乳房结节吃什么药| 市政府办公室主任是什么级别| 菜粥里面放什么菜最好| 什么样的牙齿需要矫正| 巨蟹女跟什么星座最配| 太平猴魁是什么茶| 吃什么能升血压| 什么叫处方药| bmg是什么意思| 咳嗽可以喝什么| 入盆是什么意思| 花椒水泡脚有什么好处| 有两把刷子是什么意思| 吃环孢素有什么副作用| 五更是什么生肖| 海燕是什么鸟| 什么发什么强| 眼睛发炎吃什么药| 胸部正位片检查什么| 女人吃善存有什么好处| 白子画什么时候爱上花千骨的| 成都机场叫什么名字| 去湿气吃什么药| gly是什么氨基酸| 钾偏低是什么原因| 什么东西吃蟑螂| 张菲和费玉清什么关系| 痰的颜色代表什么| 外阴瘙痒吃什么药| 再生牙技术什么时候能实现| 丘疹性荨麻疹用什么药| 脾不好吃什么药最见效| 什么叫前列腺炎| 不但而且是什么关系| 阳历九月份是什么星座| 脚筋膜炎什么办法恢复的最快| 什么是代理| 光顾是什么意思| 七里香是什么| 十二月二十号是什么星座| 心咒是什么意思| 剖腹产可以吃什么| 原发性和继发性是什么意思| 挽联是什么意思| 晚上7点是什么时辰| 胃疼想吐是什么原因| 血糖高是什么原因引起的| 诸君是什么意思| 30年婚姻叫什么婚| 书店买不到的书是什么书| 什么是肩袖损伤| 相忘于江湖是什么意思| 舌头起泡是什么原因引起的| 冬字五行属什么| 梦遗是啥意思是什么| 大腿内侧痒是什么原因| 白玫瑰代表什么意思| 机关党委是干什么的| 鸭子喜欢吃什么食物| 鼻子发痒是什么原因引起的| 素鲍鱼是什么做的| 尿道炎吃什么药最好| 臼是什么意思| 肝不好吃什么药最好| 升米恩斗米仇什么意思| 什么是铅中毒| 芒果不能跟什么一起吃| 诸君是什么意思| 25年是什么婚| 人心叵测什么意思| 脑白质是什么病| 为什么要吃叶酸| 晦气是什么意思| 发飙是什么意思| 护照拍照穿什么衣服| 大象像什么| 粉尘螨过敏是什么意思| 晚上睡觉小腿抽筋是什么原因| 乳腺4a是什么意思| 眼睛不好吃什么补眼睛| 经常晕倒是什么原因引起的| 激素吃多了对身体有什么副作用| 鸭子烧什么配菜好吃| 痣挂什么科| 辅酶q10什么时候吃最好| 十二月二十三是什么星座| 深喉是什么感觉| 女人吃什么补肾| 甲功能5项检查是查的什么| 梦见纸钱是什么预兆| 彩超低回声是什么意思| 痛风频繁发作说明什么| 什么是蛋白尿| 被马蜂蛰了用什么药| 紫薇花什么季节开花| 风热感冒咳嗽吃什么药| 包皮过长会有什么影响| negative什么意思| 吃什么长头发又密又多| 晚上七点半是什么时辰| 肛门潮湿瘙痒用什么药最好| 92年是什么生肖| beer是什么意思| 榴莲什么时候吃最好| 七六年属什么| 外阴痒用什么药膏| 警察是什么编制| 为什么医生说直肠炎不用吃药| 好整以暇什么意思| 宫腔粘连是什么原因引起的| 胡麻是什么植物| 十二月十号是什么星座| 冒菜和麻辣烫有什么区别| 再接再厉是什么意思| 广东有什么好玩的地方| 拧巴什么意思| 青霉素是什么| 木薯淀粉是什么粉| dl是什么意思| 为什么不建议光子嫩肤| 常喝三七粉有什么好处| 射手属于什么象星座| 什么是横纹肌肉瘤| 霉菌性阴道炎吃什么消炎药| 冰心原名什么| 吉士粉是什么东西| 为什么会尿道感染| 用什么香皂洗脸可以祛痘| 坐月子能吃什么蔬菜| 为什么会长鸡眼| 关节退行性变是什么意思| 胶原蛋白什么时候喝最好| 新生儿血糖低是什么原因| 大姨妈来了喝红糖水有什么功效| 排卵期出血是什么原因引起的| 水肿吃什么药| 动车跟高铁有什么区别| 肝郁化火吃什么药| 脖子左侧疼是什么前兆| 怀孕为什么要建档| 乳头瘤是什么病| 四九城是什么意思| 微生物是什么| 卵泡回声什么意思| 一个王一个月念什么| 喝茶失眠是什么原因| ect是什么意思| nox是什么意思| 硫酸钠是什么| 蜘蛛结网预示着什么| 人发胖的原因是什么引起的| 为什么要做羊水穿刺检查| 三原色是什么| 野格是什么酒| 377是什么意思| 有缘无分是什么意思| 郑州有什么玩的| 肺占位病变是什么意思| 养殖业什么最赚钱| 血压是什么| 恩怨是什么意思| 感染hpv用什么药| 半夜呕吐是什么原因| 猫为什么流眼泪| 加德纳菌阳性是什么意思| 玄学是什么意思| 宫外孕和宫内孕有什么区别| 哪吒长什么样子| 彪马属于什么档次| 肩胛骨痛挂什么科| 综合能力测试考什么| 菠萝蜜什么季节成熟| 绿豆不能跟什么一起吃| 头发为什么会变黄| 晶莹的近义词是什么| 心率低于60说明什么| 扩招是什么意思| esr是什么意思| 为什么蚊子不咬我| 冠心吃什么药好| 氯雷他定是什么药| 做空是什么意思啊| 羁押是什么意思| 血常规24项能查出什么病| 神经病是什么意思| 玄关是什么位置| 甲状腺囊肿是什么病| 拉肚子吃什么药| 大脑供血不足头晕吃什么药最好| 中元节出什么生肖| 时尚是什么意思| 生命的本质是什么| 屋上土是什么意思| 小孩子发烧吃什么药| 双相是什么| 很low是什么意思| 甘草长什么样子图片| 今年71岁属什么生肖| 两个叉念什么| 红花是什么| 湿热体质吃什么药| 康桑密达是什么意思| ph值偏高是什么意思| 为什么会闰月| 吃二甲双胍为什么会瘦| 1996年1月属什么生肖| 蛇的天敌是什么| uu是什么意思| 六腑指的是什么| 子宫小是什么原因引起的| 唐氏综合症是什么原因| 宫腔粘连有什么危害| 31岁属什么生肖| 1月28日什么星座| 朋友的反义词是什么| 退化是什么意思| 佛心是什么意思| 为什么吃芒果会过敏| 喝什么养肝护肝| 男人喜欢什么样的女人做老婆| atp是什么意思| 瑶浴spa是什么意思| 七九年属什么的| 真人是什么意思| 月经期间喝什么好排毒排污血| 肝火旺吃什么降火最快| 咽干是什么原因造成的| 夜晚的星星像什么| 男性尿道炎是什么原因引起的| 纤维蛋白原偏低吃什么| 宝宝手足口病吃什么药| 闷骚是什么意思| 总胆固醇偏高是什么原因| 胸腔里面像岔气了的疼是什么原因| 02年是什么年| 谷维素是什么| 增强免疫力打什么针| 月经期吃什么| 站姐是什么意思| 拉肚子适合吃什么食物| 藏红花什么时候喝最好| 教头菜有什么功效| 青鱼和草鱼有什么区别| 风湿性关节炎用什么药| 闰六月是什么意思| 百度
Skip to content

scandum/quadsort

Folders and files

NameName
Last commit message
Last commit date

Latest commit

?
?
?
?
?
?
?
?
?

Repository files navigation

Intro

This document describes a stable bottom-up adaptive branchless merge sort named quadsort. A visualisation and benchmarks are available at the bottom.

The quad swap analyzer

Quadsort starts out with an analyzer that has the following tasks:

  1. Detect ordered data with minimal comparisons.
  2. Detect reverse order data with minimal comparisons.
  3. Do the above without impacting performance on random data.
  4. Exit the quad swap analyzer with sorted blocks of 8 elements.

Ordered data handling

Quadsort's analyzer examines the array 8 elements at a time. It performs 4 comparisons on elements (1,2), (3,4), (5,6), and (7,8) of which the result is stored and a bitmask is created with a value between 0 and 15 for all the possible combinations. If the result is 0 it means the 4 comparisons were all in order.

What remains is 3 more comparisons on elements (2,3), (4,5), and (6,7) to determine if all 8 elements are in order. Traditional sorts would do this with 7 branched individual comparisons, which should result in 3.2 branch mispredictions on random data on average. Using quadsort's method should result in 0.2 branch mispredictions on random data on average.

If the data is in order quadsort moves on to the next 8 elements. If the data turns out to be neither in order or in reverse order, 4 branchless swaps are performed using the stored comparison results, followed by a branchless parity merge. More on that later.

Reverse order handling

Reverse order data is typically moved using a simple reversal function, as following.

int reverse(int array[], int start, int end, int swap)
{
    while (start < end)
    {
        swap = array[start];
        array[start++] = array[end];
        array[end--] = swap;
    }
}

While random data can only be sorted using n log n comparisons and n log n moves, reverse-order data can be sorted using n comparisons and n moves through run detection and a reversal routine. Without run detection the best you can do is sort it in n comparisons and n log n moves.

Run detection, as the name implies, comes with a detection cost. Thanks to the laws of probability a quad swap can cheat however. The chance of 4 separate pairs of elements being in reverse order is 1 in 16. So there's a 6.25% chance quadsort makes a wasteful run check.

What about run detection for in-order data? While we're turning n log n moves into n moves with reverse order run detection, we'd be turning 0 moves into 0 moves with forward run detection. So there's no point in doing so.

The next optimization is to write the quad swap analyzer in such a way that we can perform a simple check to see if the entire array was in reverse order, if so, the sort is finished.

At the end of the loop the array has been turned into a series of ordered blocks of 8 elements.

Ping-Pong Quad Merge

Most textbook mergesort examples merge two blocks to swap memory, then copy them back to main memory.

main memory ┌────────┐┌────────┐
            └────────┘└────────┘
                  ↓ merge ↓
swap memory ┌──────────────────┐
            └──────────────────┘
                  ↓ copy ↓
main memory ┌──────────────────┐
            └──────────────────┘

This doubles the amount of moves and we can fix this by merging 4 blocks at once using a quad merge / ping-pong merge like so:

main memory ┌────────┐┌────────┐┌────────┐┌────────┐
            └────────┘└────────┘└────────┘└────────┘
                  ↓ merge ↓           ↓ merge ↓
swap memory ┌──────────────────┐┌──────────────────┐
            └──────────────────┘└──────────────────┘
                            ↓ merge ↓
main memory ┌──────────────────────────────────────┐
            └──────────────────────────────────────┘

It is possible to interleave the two merges to swap memory for increased memory-level parallelism, but this can both increase and decrease performance.

Skipping

Just like with the quad swap it is beneficial to check whether the 4 blocks are in-order.

In the case of the 4 blocks being in-order the merge operation is skipped, as this would be pointless. Because reverse order data is handled in the quad swap there is no need to check for reverse order blocks.

This allows quadsort to sort in-order sequences using n comparisons instead of n * log n comparisons.

Parity merge

A parity merge takes advantage of the fact that if you have two n length arrays, you can fully merge the two arrays by performing n merge operations on the start of each array, and n merge operations on the end of each array. The arrays must be of equal length. Another way to describe the parity merge would be as a bidirectional unguarded merge.

The main advantage of a parity merge over a traditional merge is that the loop of a parity merge can be fully unrolled.

If the arrays are not of equal length a hybrid parity merge can be performed. One way to do so is using n parity merges where n is the size of the smaller array, before switching to a traditional merge.

Branchless parity merge

Since the parity merge can be unrolled it's very suitable for branchless optimizations to speed up the sorting of random data. Another advantage is that two separate memory regions are accessed in the same loop, allowing memory-level parallelism. This makes the routine up to 2.5 times faster for random data on most hardware.

Increasing the memory regions from two to four can result in both performance gains and performance losses.

The following is a visualization of an array with 256 random elements getting turned into sorted blocks of 32 elements using ping-pong parity merges.

quadsort visualization

Cross merge

While a branchless parity merge sorts random data faster, it sorts ordered data slower. One way to solve this problem is by using a method with a resemblance to the galloping merge concept first introduced by timsort.

The cross merge works in a similar way to the quad swap. Instead of merging two arrays two items at a time, it merges four items at a time.

┌───┐┌───┐┌───┐    ┌───┐┌───┐┌───┐            ╭───╮  ┌───┐┌───┐┌───┐
│ A ││ B ││ C │    │ X ││ Y ││ Z │        ┌───│B<X├──┤ A ││ B ││C/X│
└─┬─┘└─┬─┘└───┘    └─┬─┘└─┬─┘└───┘        │   ╰─┬─╯  └───┘└───┘└───┘
  └────┴─────────────┴────┴───────────────┘     │  ╭───╮  ┌───┐┌───┐┌───┐
                                                └──│A>Y├──┤ X ││ Y ││A/Z│
                                                   ╰─┬─╯  └───┘└───┘└───┘
                                                     │    ┌───┐┌───┐┌───┐
                                                     └────│A/X││X/A││B/Y│
                                                          └───┘└───┘└───┘

When merging ABC and XYZ it first checks if B is smaller or equal to X. If that's the case A and B are copied to swap. If not, it checks if A is greater than Y. If that's the case X and Y are copied to swap.

If either check is false it's known that the two remaining distributions are A X and X A. This allows performing an optimal branchless merge. Since it's known each block still has at least 1 item remaining (B and Y) and there is a high probability of the data to be random, another (sub-optimal) branchless merge can be performed.

In C this looks as following:

while (l < l_size - 1 && r < r_size - 1)
{
    if (left[l + 1] <= right[r])
    {
        swap[s++] = left[l++];
        swap[s++] = left[l++];
    }
    else if (left[l] > right[r + 1])
    {
        swap[s++] = right[r++];
        swap[s++] = right[r++];
    }
    else
    {
        a = left[l] > right[r];
        x = !a;
        swap[s + a] = left[l++];
        swap[s + x] = right[r++];
        s += 2;
    }
}

Overall the cross merge gives a decent performance gain for both ordered and random data, particularly when the two arrays are of unequal length. When two arrays are of near equal length quadsort looks 8 elements ahead, and performs an 8 element parity merge if it can't skip ahead.

Merge strategy

Quadsort will merge blocks of 8 into blocks of 32, which it will merge into blocks of 128, 512, 2048, 8192, etc.

For each ping-pong merge quadsort will perform two comparisons to see if it will be faster to use a parity merge or a cross merge, and pick the best option.

Tail merge

When sorting an array of 33 elements you end up with a sorted array of 32 elements and a sorted array of 1 element in the end. If a program sorts in intervals it should pick an optimal array size (32, 128, 512, etc) to do so.

To minimalize the impact the remainder of the array is sorted using a tail merge.

Big O

                 ┌───────────────────────┐┌───────────────────────┐
                 │comparisons            ││swap memory            │
┌───────────────┐├───────┬───────┬───────┤├───────┬───────┬───────┤┌──────┐┌─────────┐┌─────────┐
│name           ││min    │avg    │max    ││min    │avg    │max    ││stable││partition││adaptive │
├───────────────┤├───────┼───────┼───────┤├───────┼───────┼───────┤├──────┤├─────────┤├─────────┤
│mergesort      ││n log n│n log n│n log n││n      │n      │n      ││yes   ││no       ││no       │
├───────────────┤├───────┼───────┼───────┤├───────┼───────┼───────┤├──────┤├─────────┤├─────────┤
│quadsort       ││n      │n log n│n log n││1      │n      │n      ││yes   ││no       ││yes      │
├───────────────┤├───────┼───────┼───────┤├───────┼───────┼───────┤├──────┤├─────────┤├─────────┤
│quicksort      ││n log n│n log n│n2     ││1      │1      │1      ││no    ││yes      ││no       │
└───────────────┘└───────┴───────┴───────┘└───────┴───────┴───────┘└──────┘└─────────┘└─────────┘

Quadsort makes n comparisons when the data is fully sorted or reverse sorted.

Data Types

The C implementation of quadsort supports long doubles and 8, 16, 32, and 64 bit data types. By using pointers it's possible to sort any other data type, like strings.

Interface

Quadsort uses the same interface as qsort, which is described in man qsort.

In addition to supporting (l - r) and ((l > r) - (l < r)) for the comparison function, (l > r) is valid as well. Special note should be taken that C++ sorts use (l < r) for the comparison function, which is incompatible with the C standard. When porting quadsort to C++ or Rust, switch (l, r) to (r, l) for every comparison.

Quadsort comes with the quadsort_prim(void *array, size_t nmemb, size_t size) function to perform primitive comparisons on arrays of 32 and 64 bit integers. Nmemb is the number of elements, while size should be either sizeof(int) or sizeof(long long) for signed integers, and sizeof(int) + 1 or sizeof(long long) + 1 for unsigned integers. Support for the char, short, float, double, and long double types can be easily added in quadsort.h.

Quadsort comes with the quadsort_size(void *array, size_t nmemb, size_t size, CMPFUNC *cmp) function to sort elements of any given size. The comparison function needs to be by reference, instead of by value, as if you are sorting an array of pointers.

Memory

By default quadsort uses n swap memory. If memory allocation fails quadsort will switch to sorting in-place through rotations. The minimum memory requirement is 32 elements of stack memory.

Rotations can be performed with minimal performance loss by using branchless binary searches and trinity / bridge rotations.

Sorting in-place through rotations will increase the number of moves from n log n to n log2 n. The overall impact on performance is minor on array sizes below 1M elements.

Performance

Quadsort is one of the fastest merge sorts written to date. It is faster than quicksort for most data distributions, with the notable exception of generic data. Data type is important as well, and overall quadsort is faster for sorting referenced objects.

Compared to Timsort, Quadsort has similar overall adaptivity while being much faster on random data, even without branchless optimizations.

Quicksort has a slight advantage on random data as the array size increases beyond the L1 cache. For small arrays quadsort has a significant advantage due to quicksort's inability to cost effectively pick a reliable pivot. Subsequently, the only way for quicksort to rival quadsort is to cheat and become a hybrid sort, by using branchless merges to sort small partitions.

When using the clang compiler quadsort can use branchless ternary comparisons. Since most programming languages only support ternary merges ? : and not ternary partitions : ? this gives branchless mergesorts an additional advantage over branchless quicksorts. However, since the gcc compiler doesn't support branchless ternary merges, and the hack to perform branchless merges is less efficient than the hack to perform branchless partitions, branchless quicksorts have an advantage for gcc.

To take full advantage of branchless operations the cmp macro needs to be uncommented in bench.c, which will increase the performance by 30% on primitive types. The quadsort_prim function can be used to access primitive comparisons directly.

Variants

  • blitsort is a hybrid stable in-place rotate quicksort / quadsort.

  • crumsort is a hybrid unstable in-place quicksort / quadsort.

  • fluxsort is a hybrid stable quicksort / quadsort.

  • gridsort is a hybrid stable cubesort / quadsort. Gridsort is an online sort and might be of interest to those interested in data structures and sorting very large arrays.

  • twinsort is a simplified quadsort with a much smaller code size. Twinsort might be of use to people who want to port or understand quadsort; it does not use pointers or gotos. It is a bit dated and isn't branchless.

  • piposort is a simplified branchless quadsort with a much smaller code size and complexity while still being very fast. Piposort might be of use to people who want to port quadsort. This is a lot easier when you start out small.

  • wolfsort is a hybrid stable radixsort / fluxsort with improved performance on random data. It's mostly a proof of concept that only works on unsigned 32 bit integers.

  • Robin Hood Sort is a hybrid stable radixsort / dropsort with improved performance on random and generic data. It has a compilation option to use quadsort for its merging.

Credits

I personally invented the quad swap analyzer, cross merge, parity merge, branchless parity merge, monobound binary search, bridge rotation, and trinity rotation.

The ping-pong quad merge had been independently implemented in wikisort prior to quadsort, and likely by others as well.

The monobound binary search has been independently implemented, often referred to as a branchless binary search. I published a working concept in 2014, which appears to pre-date others.

Special kudos to Musiccombo and Co for getting me interested in rotations and branchless logic.

Visualization

In the visualization below nine tests are performed on 256 elements.

  1. Random order
  2. Ascending order
  3. Ascending Saw
  4. Generic random order
  5. Descending order
  6. Descending Saw
  7. Random tail
  8. Random half
  9. Ascending tiles.

The upper half shows the swap memory and the bottom half shows the main memory. Colors are used to differentiate various operations. Quad swaps are in cyan, reversals in magenta, skips in green, parity merges in orange, bridge rotations in yellow, and trinity rotations are in violet.

quadsort benchmark

The visualization is available on YouTube and there's also a YouTube video of a java port of quadsort in ArrayV on a wide variety of data distributions.

Benchmark: quadsort vs std::stable_sort vs timsort

The following benchmark was on WSL 2 gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04) using the wolfsort benchmark. The source code was compiled using g++ -O3 -w -fpermissive bench.c. Stablesort is g++'s std:stablesort function. Each test was ran 100 times on 100,000 elements. A table with the best and average time in seconds can be uncollapsed below the bar graph.

Graph

data table
Name Items Type Best Average Compares Samples Distribution
stablesort 100000 128 0.010958 0.011215 0 100 random order
fluxsort 100000 128 0.008589 0.008837 0 100 random order
timsort 100000 128 0.012799 0.013185 0 100 random order
Name Items Type Best Average Compares Samples Distribution
stablesort 100000 64 0.006134 0.006232 0 100 random order
fluxsort 100000 64 0.001945 0.001994 0 100 random order
timsort 100000 64 0.007824 0.008070 0 100 random order
Name Items Type Best Average Loops Samples Distribution
stablesort 100000 32 0.005995 0.006068 0 100 random order
fluxsort 100000 32 0.001841 0.001890 0 100 random order
timsort 100000 32 0.007593 0.007773 0 100 random order
stablesort 100000 32 0.003815 0.003841 0 100 random % 100
fluxsort 100000 32 0.000655 0.000680 0 100 random % 100
timsort 100000 32 0.005608 0.005666 0 100 random % 100
stablesort 100000 32 0.000672 0.000733 0 100 ascending order
fluxsort 100000 32 0.000044 0.000045 0 100 ascending order
timsort 100000 32 0.000045 0.000045 0 100 ascending order
stablesort 100000 32 0.001360 0.001410 0 100 ascending saw
fluxsort 100000 32 0.000328 0.000330 0 100 ascending saw
timsort 100000 32 0.000840 0.000852 0 100 ascending saw
stablesort 100000 32 0.001121 0.001154 0 100 pipe organ
fluxsort 100000 32 0.000205 0.000207 0 100 pipe organ
timsort 100000 32 0.000465 0.000469 0 100 pipe organ
stablesort 100000 32 0.000904 0.000920 0 100 descending order
fluxsort 100000 32 0.000055 0.000055 0 100 descending order
timsort 100000 32 0.000088 0.000092 0 100 descending order
stablesort 100000 32 0.001603 0.001641 0 100 descending saw
fluxsort 100000 32 0.000418 0.000427 0 100 descending saw
timsort 100000 32 0.000788 0.000816 0 100 descending saw
stablesort 100000 32 0.002029 0.002095 0 100 random tail
fluxsort 100000 32 0.000623 0.000627 0 100 random tail
timsort 100000 32 0.001996 0.002041 0 100 random tail
stablesort 100000 32 0.003491 0.003539 0 100 random half
fluxsort 100000 32 0.001071 0.001078 0 100 random half
timsort 100000 32 0.004025 0.004056 0 100 random half
stablesort 100000 32 0.000918 0.000940 0 100 ascending tiles
fluxsort 100000 32 0.000293 0.000296 0 100 ascending tiles
timsort 100000 32 0.000850 0.000931 0 100 ascending tiles
stablesort 100000 32 0.001168 0.001431 0 100 bit reversal
fluxsort 100000 32 0.001700 0.001731 0 100 bit reversal
timsort 100000 32 0.002261 0.002940 0 100 bit reversal

The following benchmark was on WSL 2 gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04) using the wolfsort benchmark. The source code was compiled using g++ -O3 -w -fpermissive bench.c. It measures the performance on random data with array sizes ranging from 1 to 1024. It's generated by running the benchmark using 1024 0 0 as the argument. The benchmark is weighted, meaning the number of repetitions halves each time the number of items doubles. A table with the best and average time in seconds can be uncollapsed below the bar graph.

Graph

data table
Name Items Type Best Average Compares Samples Distribution
stablesort 4 32 0.005569 0.005899 0.0 50 random 1-4
quadsort 4 32 0.001144 0.001189 0.0 50 random 1-4
timsort 4 32 0.002301 0.002491 0.0 50 random 1-4
stablesort 8 32 0.005731 0.005950 0.0 50 random 5-8
quadsort 8 32 0.002064 0.002200 0.0 50 random 5-8
timsort 8 32 0.004958 0.005165 0.0 50 random 5-8
stablesort 16 32 0.006360 0.006415 0.0 50 random 9-16
quadsort 16 32 0.001862 0.001927 0.0 50 random 9-16
timsort 16 32 0.006578 0.006663 0.0 50 random 9-16
stablesort 32 32 0.007809 0.007885 0.0 50 random 17-32
quadsort 32 32 0.003177 0.003258 0.0 50 random 17-32
timsort 32 32 0.008597 0.008698 0.0 50 random 17-32
stablesort 64 32 0.008846 0.008918 0.0 50 random 33-64
quadsort 64 32 0.004144 0.004195 0.0 50 random 33-64
timsort 64 32 0.011459 0.011560 0.0 50 random 33-64
stablesort 128 32 0.010065 0.010131 0.0 50 random 65-128
quadsort 128 32 0.005131 0.005184 0.0 50 random 65-128
timsort 128 32 0.013917 0.014022 0.0 50 random 65-128
stablesort 256 32 0.011217 0.011305 0.0 50 random 129-256
quadsort 256 32 0.004937 0.005010 0.0 50 random 129-256
timsort 256 32 0.015785 0.015912 0.0 50 random 129-256
stablesort 512 32 0.012544 0.012637 0.0 50 random 257-512
quadsort 512 32 0.005545 0.005618 0.0 50 random 257-512
timsort 512 32 0.017533 0.017652 0.0 50 random 257-512
stablesort 1024 32 0.013871 0.013979 0.0 50 random 513-1024
quadsort 1024 32 0.005664 0.005755 0.0 50 random 513-1024
timsort 1024 32 0.019176 0.019270 0.0 50 random 513-1024
stablesort 2048 32 0.010961 0.011018 0.0 50 random 1025-2048
quadsort 2048 32 0.004527 0.004580 0.0 50 random 1025-2048
timsort 2048 32 0.015289 0.015338 0.0 50 random 1025-2048
stablesort 4096 32 0.010854 0.010917 0.0 50 random 2049-4096
quadsort 4096 32 0.003974 0.004018 0.0 50 random 2049-4096
timsort 4096 32 0.015051 0.015132 0.0 50 random 2049-4096

Benchmark: quadsort vs qsort (mergesort)

The following benchmark was on WSL 2 gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04). The source code was compiled using gcc -O3 bench.c. Each test was ran 100 times. It's generated by running the benchmark using 100000 100 1 as the argument. In the benchmark quadsort is compared against glibc qsort() using the same general purpose interface and without any known unfair advantage, like inlining. A table with the best and average time in seconds can be uncollapsed below the bar graph.

Graph

data table
Name Items Type Best Average Compares Samples Distribution
qsort 100000 64 0.016881 0.017052 1536381 100 random string
quadsort 100000 64 0.010615 0.010756 1655772 100 random string
qsort 100000 64 0.015387 0.015550 1536491 100 random double
quadsort 100000 64 0.008648 0.008751 1655904 100 random double
qsort 100000 64 0.011165 0.011375 1536491 100 random long
quadsort 100000 64 0.006024 0.006099 1655904 100 random long
qsort 100000 64 0.010775 0.010928 1536634 100 random int
quadsort 100000 64 0.005313 0.005375 1655948 100 random int
Name Items Type Best Average Compares Samples Distribution
qsort 100000 128 0.018214 0.018843 1536491 100 random order
quadsort 100000 128 0.011098 0.011185 1655904 100 random order
Name Items Type Best Average Compares Samples Distribution
qsort 100000 64 0.009522 0.009748 1536491 100 random order
quadsort 100000 64 0.004073 0.004118 1655904 100 random order
Name Items Type Best Average Compares Samples Distribution
qsort 100000 32 0.008946 0.009149 1536634 100 random order
quadsort 100000 32 0.003342 0.003391 1655948 100 random order
qsort 100000 32 0.006868 0.007059 1532324 100 random % 100
quadsort 100000 32 0.002690 0.002740 1381730 100 random % 100
qsort 100000 32 0.002612 0.002845 815024 100 ascending order
quadsort 100000 32 0.000160 0.000161 99999 100 ascending order
qsort 100000 32 0.003396 0.003622 915020 100 ascending saw
quadsort 100000 32 0.000904 0.000925 368457 100 ascending saw
qsort 100000 32 0.002672 0.002803 884462 100 pipe organ
quadsort 100000 32 0.000466 0.000469 277443 100 pipe organ
qsort 100000 32 0.002469 0.002587 853904 100 descending order
quadsort 100000 32 0.000164 0.000165 99999 100 descending order
qsort 100000 32 0.003302 0.003453 953892 100 descending saw
quadsort 100000 32 0.000929 0.000941 380548 100 descending saw
qsort 100000 32 0.004250 0.004501 1012003 100 random tail
quadsort 100000 32 0.001188 0.001208 564953 100 random tail
qsort 100000 32 0.005960 0.006133 1200707 100 random half
quadsort 100000 32 0.002047 0.002078 980778 100 random half
qsort 100000 32 0.003903 0.004352 1209200 100 ascending tiles
quadsort 100000 32 0.002072 0.002170 671191 100 ascending tiles
qsort 100000 32 0.005165 0.006168 1553378 100 bit reversal
quadsort 100000 32 0.003146 0.003197 1711215 100 bit reversal

The following benchmark was on WSL 2 gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04). The source code was compiled using gcc -O3 bench.c. Each test was ran 100 times. It's generated by running the benchmark using 10000000 0 0 as the argument. The benchmark is weighted, meaning the number of repetitions halves each time the number of items doubles. A table with the best and average time in seconds can be uncollapsed below the bar graph.

Graph

data table
Name Items Type Best Average Compares Samples Distribution
qsort 10 32 0.218310 0.224505 22 10 random 10
quadsort 10 32 0.091750 0.092312 29 10 random 10
qsort 100 32 0.391962 0.396639 541 10 random 100
quadsort 100 32 0.173928 0.177794 646 10 random 100
qsort 1000 32 0.558055 0.566364 8707 10 random 1000
quadsort 1000 32 0.220395 0.222146 9817 10 random 1000
qsort 10000 32 0.735528 0.741353 120454 10 random 10000
quadsort 10000 32 0.267860 0.269924 131668 10 random 10000
qsort 100000 32 0.907161 0.910446 1536421 10 random 100000
quadsort 100000 32 0.339541 0.340942 1655703 10 random 100000
qsort 1000000 32 1.085275 1.089068 18674532 10 random 1000000
quadsort 1000000 32 0.401715 0.403860 19816270 10 random 1000000
qsort 10000000 32 1.313922 1.319911 220105921 10 random 10000000
quadsort 10000000 32 0.599393 0.601635 231513131 10 random 10000000

Benchmark: quadsort vs pdqsort vs fluxsort

The following benchmark was on WSL 2 gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04) using the wolfsort benchmark. The source code was compiled using g++ -O3 -w -fpermissive bench.c. Pdqsort is a branchless quicksort/heapsort/insertionsort hybrid. Fluxsort is a branchless quicksort/mergesort hybrid. Each test was ran 100 times on 100,000 elements. Comparisons are fully inlined. A table with the best and average time in seconds can be uncollapsed below the bar graph.

Graph

data table
Name Items Type Best Average Compares Samples Distribution
pdqsort 100000 128 0.005773 0.005859 0 100 random order
quadsort 100000 128 0.009813 0.009882 0 100 random order
fluxsort 100000 128 0.008603 0.008704 0 100 random order
Name Items Type Best Average Compares Samples Distribution
pdqsort 100000 64 0.002671 0.002686 0 100 random order
quadsort 100000 64 0.002516 0.002534 0 100 random order
fluxsort 100000 64 0.001978 0.002003 0 100 random order
Name Items Type Best Average Loops Samples Distribution
pdqsort 100000 32 0.002589 0.002607 0 100 random order
quadsort 100000 32 0.002447 0.002466 0 100 random order
fluxsort 100000 32 0.001851 0.001873 0 100 random order
pdqsort 100000 32 0.000780 0.000788 0 100 random % 100
quadsort 100000 32 0.001788 0.001812 0 100 random % 100
fluxsort 100000 32 0.000675 0.000688 0 100 random % 100
pdqsort 100000 32 0.000084 0.000085 0 100 ascending order
quadsort 100000 32 0.000051 0.000051 0 100 ascending order
fluxsort 100000 32 0.000042 0.000043 0 100 ascending order
pdqsort 100000 32 0.003378 0.003402 0 100 ascending saw
quadsort 100000 32 0.000615 0.000618 0 100 ascending saw
fluxsort 100000 32 0.000327 0.000337 0 100 ascending saw
pdqsort 100000 32 0.002772 0.002792 0 100 pipe organ
quadsort 100000 32 0.000271 0.000271 0 100 pipe organ
fluxsort 100000 32 0.000214 0.000215 0 100 pipe organ
pdqsort 100000 32 0.000187 0.000192 0 100 descending order
quadsort 100000 32 0.000059 0.000059 0 100 descending order
fluxsort 100000 32 0.000053 0.000053 0 100 descending order
pdqsort 100000 32 0.003148 0.003165 0 100 descending saw
quadsort 100000 32 0.000614 0.000626 0 100 descending saw
fluxsort 100000 32 0.000327 0.000331 0 100 descending saw
pdqsort 100000 32 0.002498 0.002520 0 100 random tail
quadsort 100000 32 0.000813 0.000842 0 100 random tail
fluxsort 100000 32 0.000624 0.000627 0 100 random tail
pdqsort 100000 32 0.002573 0.002590 0 100 random half
quadsort 100000 32 0.001451 0.001462 0 100 random half
fluxsort 100000 32 0.001064 0.001075 0 100 random half
pdqsort 100000 32 0.002256 0.002281 0 100 ascending tiles
quadsort 100000 32 0.000815 0.000823 0 100 ascending tiles
fluxsort 100000 32 0.000313 0.000315 0 100 ascending tiles
pdqsort 100000 32 0.002570 0.002589 0 100 bit reversal
quadsort 100000 32 0.002230 0.002259 0 100 bit reversal
fluxsort 100000 32 0.001718 0.001744 0 100 bit reversal

The following benchmark was on WSL clang version 10 (10.0.0-4ubuntu1~18.04.2) using rhsort's wolfsort benchmark. The source code was compiled using clang -O3. The bar graph shows the best run out of 100 on 131,072 32 bit integers. Comparisons for quadsort, fluxsort and glidesort are inlined.

Some additional context is required for this benchmark. Glidesort is written and compiled in Rust which supports branchless ternary operations, subsequently fluxsort and quadsort are compiled using clang with branchless ternary operations in place for the merge and small-sort routines. Since fluxsort and quadsort are optimized for gcc there is a performance penalty, with some of the routines running 2-3x slower than they do in gcc.

fluxsort vs glidesort

data table
Name Items Type Best Average Loops Samples Distribution
quadsort 131072 32 0.002174 0.002209 0 100 random order
fluxsort 131072 32 0.002189 0.002205 0 100 random order
glidesort 131072 32 0.003065 0.003125 0 100 random order
quadsort 131072 32 0.001623 0.001646 0 100 random % 100
fluxsort 131072 32 0.000837 0.000856 0 100 random % 100
glidesort 131072 32 0.001031 0.001037 0 100 random % 100
quadsort 131072 32 0.000061 0.000063 0 100 ascending order
fluxsort 131072 32 0.000058 0.000060 0 100 ascending order
glidesort 131072 32 0.000091 0.000093 0 100 ascending order
quadsort 131072 32 0.000345 0.000353 0 100 ascending saw
fluxsort 131072 32 0.000341 0.000349 0 100 ascending saw
glidesort 131072 32 0.000351 0.000358 0 100 ascending saw
quadsort 131072 32 0.000231 0.000245 0 100 pipe organ
fluxsort 131072 32 0.000222 0.000228 0 100 pipe organ
glidesort 131072 32 0.000228 0.000235 0 100 pipe organ
quadsort 131072 32 0.000074 0.000076 0 100 descending order
fluxsort 131072 32 0.000073 0.000076 0 100 descending order
glidesort 131072 32 0.000106 0.000110 0 100 descending order
quadsort 131072 32 0.000373 0.000380 0 100 descending saw
fluxsort 131072 32 0.000355 0.000371 0 100 descending saw
glidesort 131072 32 0.000363 0.000369 0 100 descending saw
quadsort 131072 32 0.000685 0.000697 0 100 random tail
fluxsort 131072 32 0.000720 0.000726 0 100 random tail
glidesort 131072 32 0.000953 0.000966 0 100 random tail
quadsort 131072 32 0.001192 0.001204 0 100 random half
fluxsort 131072 32 0.001251 0.001266 0 100 random half
glidesort 131072 32 0.001650 0.001679 0 100 random half
quadsort 131072 32 0.001472 0.001507 0 100 ascending tiles
fluxsort 131072 32 0.000578 0.000589 0 100 ascending tiles
glidesort 131072 32 0.002559 0.002576 0 100 ascending tiles
quadsort 131072 32 0.002210 0.002231 0 100 bit reversal
fluxsort 131072 32 0.002042 0.002053 0 100 bit reversal
glidesort 131072 32 0.002787 0.002807 0 100 bit reversal
quadsort 131072 32 0.001237 0.001278 0 100 random % 2
fluxsort 131072 32 0.000227 0.000233 0 100 random % 2
glidesort 131072 32 0.000449 0.000455 0 100 random % 2
quadsort 131072 32 0.001123 0.001153 0 100 signal
fluxsort 131072 32 0.001269 0.001285 0 100 signal
glidesort 131072 32 0.003760 0.003776 0 100 signal
quadsort 131072 32 0.001911 0.001956 0 100 exponential
fluxsort 131072 32 0.001134 0.001142 0 100 exponential
glidesort 131072 32 0.002355 0.002373 0 100 exponential

About

Quadsort is a branchless stable adaptive mergesort faster than quicksort.

Topics

Resources

License

Stars

Watchers

Forks

Languages

孕反一般什么时候开始 19年属什么 神奇是什么意思 牛仔外套搭配什么裤子好看 壁虎的尾巴有什么作用
商鞅变法是什么时期 社恐到底在害怕什么 军官是什么意思 r值是什么 什么是双一流
铠字五行属什么 冲服是什么意思 建卡需要带什么证件 喉咙有异物感看什么科 脑梗挂什么科
风湿性心脏病是什么原因引起的 孕早期可以吃什么水果 一甲子是什么意思 什么的童年 小结是什么意思
梦见很多狗是什么意思hcv8jop3ns7r.cn 去医院查怀孕挂什么科hcv7jop9ns1r.cn 口周皮炎用什么药膏hcv8jop0ns3r.cn 湿气重什么原因hcv9jop2ns9r.cn 安徽属于什么地区hcv8jop2ns3r.cn
一饿就心慌是什么原因引起的hcv9jop5ns8r.cn 十面埋伏是什么生肖1949doufunao.com 稀松平常是什么意思hcv8jop2ns8r.cn 性是什么意思hcv9jop3ns9r.cn 卧轨是什么意思hcv8jop9ns6r.cn
猫肉什么味道hcv7jop6ns6r.cn 血氨低是什么原因hcv8jop3ns8r.cn 唐筛和无创有什么区别hcv8jop9ns2r.cn 鼻子经常出血是什么原因hcv7jop6ns1r.cn AB型血型有什么优势hcv8jop3ns2r.cn
胖大海配什么喝治咽炎hcv8jop8ns4r.cn 白色糠疹用什么药膏hcv8jop5ns0r.cn 1975年属什么生肖hcv7jop7ns1r.cn 婴儿什么时候会走路hcv8jop2ns0r.cn 什么叫做亚健康hcv8jop9ns9r.cn
百度