libzahl

big integer library
git clone git://git.suckless.org/libzahl
Log | Files | Refs | README | LICENSE

STATUS (7064B)


      1 Optimisation progress for libzahl. Benchmarks are done in the
      2 range 1 to 4097 bits. So far all benchmarks on libzahl are
      3 done with the following combinations of cc and libc:
      4 
      5     gcc + glibc
      6     gcc + musl
      7     clang + glibc
      8 
      9 Benchmarks on the other libraries are done with gcc and glibc.
     10 
     11 All benchmarks are done on an x86-64 (specifically an Intel
     12 Core 2 Quad CPU Q9300), without any extensions turned on
     13 during compilation, and without any use of extensions in
     14 assembly code. The benchmarks are performed with Linux as
     15 the OS's kernel with 50 µs timer slack, and the benchmarking
     16 processes are fixed to one CPU.
     17 
     18 
     19   The following functions are probably implemented optimally:
     20 
     21 zset .................... always fastest
     22 zseti(a, +) ............. tomsfastmath is faster
     23 zseti(a, -) ............. tomsfastmath is faster
     24 zsetu ................... tomsfastmath is faster
     25 zswap ................... always fastest
     26 zzero ................... always fastest (shared with gmp)
     27 zsignum ................. always fastest (shared with gmp)
     28 zeven ................... always fastest (shared with gmp)
     29 zodd .................... always fastest (shared with gmp)
     30 zeven_nonzero ........... always fastest (shared with gmp)
     31 zodd_nonzero ............ always fastest (shared with gmp)
     32 zbtest .................. always fastest
     33 zsave ................... always fastest
     34 zload ................... always fastest
     35 
     36 
     37   The following functions are probably implemented optimally, but
     38   depends on other functions or call-cases for better performance:
     39 
     40 zneg(a, b) .............. always fastest
     41 zabs(a, b) .............. always fastest
     42 ztrunc(a, b, c) ......... always fastest
     43 zbset(a, b, 1) .......... always fastest
     44 zbset(a, b, 0) .......... always fastest
     45 zbset(a, b, -1) ......... always fastest
     46 zsplit .................. alternating with gmp for fastest, but gmp is a bit faster on average
     47 
     48 
     49   The following functions are probably implemented close to
     50   optimally, further optimisation should not be a priority:
     51 
     52 zadd_unsigned ........... fastest after ~140 (depends on cc and libc) compared against zadd too
     53 ztrunc(a, a, b) ......... fastest until ~100, then 77 % (gcc) or 68 % (clang) of tomsfastmath
     54 zbset(a, a, 1) .......... always fastest
     55 zbset(a, a, 0) .......... always fastest
     56 zbset(a, a, -1) ......... always fastest (only marginally faster than gmp with clang)
     57 zlsb .................... always fastest <<suspicious>>
     58 zlsh .................... not too fast anymore
     59 zand .................... fastest after ~400, tomsfastmath before
     60 zor ..................... fastest after ~1150, tomsfastmath before
     61 zxor .................... alternative with gmp after ~700, tomsfastmath before (musl), a bit slow with glibc
     62 znot .................... always fastest
     63 
     64 
     65   The following functions require structural changes for
     66   further optimisations:
     67 
     68 zneg(a, a) .............. always fastest (shared with gmp (gcc))
     69 zabs(a, a) .............. 34 % (clang) or 8 % (gcc) of tomsfastmath
     70 
     71 
     72   The following functions are probably implemented optimally
     73   or close to optimally, except they contain some code that
     74   should not be necessary after some bugs have been fixed:
     75 
     76 zbits ................... always fastest
     77 zcmpi(a, +) ............. always fastest
     78 zcmpi(a, -) ............. always fastest
     79 zcmpu ................... always fastest
     80 
     81 
     82   It may be possible optimise the following functions
     83   further:
     84 
     85 zadd .................... fastest after ~90 (clang), ~260 (gcc+musl), or ~840 (gcc+glibc) (gcc+glibc is slow)
     86 zcmp .................... almost always fastest (musl), almost always slowest (glibc) <<suspicious (clang)>>
     87 zcmpmag ................. always fastest <<suspicious, see zcmp>>
     88 
     89 
     90   The following functions could be optimised further:
     91 
     92 zrsh .................... gmp is almost always faster; also tomsfastmath after ~3000 (gcc+glibc) or ~2800 (clang)
     93 zsub_unsigned ........... always fastest (compared against zsub too)
     94 zsub .................... always fastest (slower with gcc+glibc than gcc+musl or clang)
     95 
     96 
     97   The following functions could probably be optimised further,
     98   but their performance can be significantly improved by
     99   optimising their dependencies:
    100 
    101 zmul .................... slowest
    102 zsqr .................... slowest
    103 zstr_length(a, 10) ...... gmp is faster (clang is faster than gcc, musl is faster than glibc)
    104 zstr(a, b, n) ........... slowest
    105 
    106 
    107 musl has more stable performance than glibc. clang is better at
    108 inlining than gcc. (Which is better at optimising must be judged
    109 on a per-function basis.)
    110 
    111 
    112 
    113 {{{ [out of date legacy area, this being phased out]
    114 Optimisation progress for libzahl, compared to other big integer
    115 libraries. These comparisons are for 152-bit integers. Functions
    116 in parenthesis the right column are functions that needs
    117 optimisation to improve the peformance of the function in the
    118 left column. Double-parenthesis means there may be a better way
    119 to do it. Inside square-brackets, there are some comments on
    120 multi-bit comparisons.
    121 
    122 zgcd .................... 21 % of gmp (zcmpmag)
    123 zmodmul(big mod) ........ slowest ((zmul, zmod))
    124 zmodsqr(big mod) ........ slowest ((zmul, zmod))
    125 zmodmul(tiny mod) ....... slowest ((zmul))
    126 zmodsqr(tiny mod) ....... slowest ((zmul))
    127 zpow .................... slowest (zmul, zsqr)
    128 zpowu ................... slowest (zmul, zsqr)
    129 zmodpow ................. slowest (zmul, zsqr. zmod)
    130 zmodpowu ................ slowest (zmul, zsqr, zmod)
    131 zsets ................... 13 % of gmp
    132 zrand(default uniform) .. 51 % of gmp
    133 zptest .................. slowest (zrand, zmodpow, zsqr, zmod)
    134 zdiv(big denum) ......... tomsfastmath is faster (zdivmod)
    135 zmod(big denum) ......... fastest (zdivmod)
    136 zdivmod(big denum) ...... fastest
    137 zdiv(tiny denum) ........ slowest
    138 zmod(tiny denum) ........ slowest
    139 zdivmod(tiny denum) ..... slowest
    140 }}}
    141 
    142 
    143 
    144 Note, some corresponding functions are not implemented in
    145 some other libraries. In such cases, they have been implemented
    146 in the translation layers (found under bench/). Those
    147 implementations are often suboptimal, but probably in style
    148 with what you would write if you need that functionality.
    149 Note further, if for example, you want do perform addition
    150 and you know that your operands are non-negative, you would
    151 choose zadd_unsigned in libzahl, but if you are using a
    152 library that does not have the corrsponding function, you are
    153 better of with the regular addition (zadd). This is however
    154 reflected in the comment column.
    155 
    156 Also note, TomsFastMath does not support arbitrarily large
    157 integers, the limit is set at compile-time, which gives is
    158 a significant performance advantage. Furthermore, no failure
    159 check is done with GMP. Additionally, hebimath has been
    160 excluded from these comparison because it is not fully
    161 operational.
    162 
    163 Also note, NOT does not mean the same thing in all libraries,
    164 for example in GMP it means (-x - 1), thus, znot does not
    165 use GMP's NOT in the translations layer.
    166 
    167 
    168 The following optimisation flags have been tested:
    169 
    170 -O0 ...................... Bad
    171 -O1 ...................... Bad
    172 -O2 ...................... Not so good
    173 -O3 ...................... Good
    174 -fno-builtin ............. Bad