libzahls-design.tex (7474B)
1 \chapter{libzahl's design} 2 \label{chap:libzahl's design} 3 4 In this chapter, the design of libzahl is discussed. 5 6 \vspace{1cm} 7 \minitoc 8 9 10 \newpage 11 \section{Memory pool} 12 \label{sec:Memory pool} 13 14 Allocating memory dynamically is an expensive operation. 15 To improve performance, libzahl never deallocates memory 16 before the library is uninitialised, instead it pools 17 memory, that is no longer needed, for reuse. 18 19 Because of the memory pooling, this is a pattern to the 20 allocation sizes. In an allocation, a power of two 21 elements, plus a few elements that are discussed in 22 \secref{sec:Integer structure}, are allocated. That is, 23 the number multiplied by the size of an element. 24 Powers of two (growth factor 2) is not the most memory 25 efficient way to do this, but it is the simplest and 26 performance efficient. This power of two (sans the few 27 extra elements) is used to calculate — getting the index 28 of the only set bit — the index of the bucket in 29 which the allocation is stored when pooled. The buckets 30 are dynamic arrays with the growth factor 1.5. The 31 growth factor 1.5 is often used for dynamic arrays, it 32 is a good compromise between memory usage and performance. 33 34 libzahl also avoids allocating memory by having a set 35 of temporary variables predefined. 36 37 38 \newpage 39 \section{Error handling} 40 \label{sec:Error handling} 41 42 In C, it is traditional to return a sentiel value 43 in case an error has occurred, and set the value 44 of a global variable to describe the error that 45 has occurred. The programmer can choose whether to 46 check for errors, ignore errors where it does not 47 matter, or simple ignore errors altogether and let 48 the program eventually crash. This is a simple 49 technique that gives the programmer a better 50 understanding of what can happen. A great advantage 51 C has over most programming languages. 52 53 Another technique is to use long jumps on error. 54 This technique is not too common, but is has one 55 significant advantage. Error-checks need only be 56 preformed where the error can first be detected. 57 There is no need to check the return value at every 58 function return. This leads to cleaner code, if 59 there are many functions that can raise exceptional 60 conditions, and greater performance under some 61 conditions. This is why this technique is sometimes 62 used in high-performance libraries. libzahl uses 63 this technique. 64 65 Rather than writing 66 67 \begin{alltt} 68 if (zadd(a, b, c)) 69 goto out; 70 \end{alltt} 71 72 \noindent 73 or a bit cleaner, if there are a lot of calls, 74 75 \begin{alltt} 76 #define TRY(...) do if (__VA_ARGS__) goto out; while (0) 77 \textcolor{c}{/* \textrm{\ldots} */} 78 TRY(zadd(a, b, c)); 79 \end{alltt} 80 81 \noindent 82 we write 83 84 \begin{alltt} 85 jmp_buf env; 86 if (setjmp(env)) 87 goto out; 88 zsetup(env); 89 \textcolor{c}{/* \textrm{\ldots} */} 90 zadd(a, b, c); 91 \end{alltt} 92 93 You only need to call {\tt setjmp} and {\tt zsetup} 94 once, but can update the return point by calling 95 them once more. 96 97 If you don't need to check for errors, you can 98 disable error detection at compile-time. By defining 99 the {\tt ZAHL\_UNSAFE} C preprocessor definition 100 when compiling libzahl, and when compiling your 101 software that uses libzahl. 102 103 104 \newpage 105 \section{Integer structure} 106 \label{sec:Integer structure} 107 108 The data type used to represent a big integer with 109 libzahl is {\tt z\_t},\footnote{This name actually 110 violates the naming convention; it should be {\tt Z}, 111 or {\tt Zahl} to avoid single-letter names. But this 112 violation is common-place.} defined as 113 114 \begin{alltt} 115 typedef struct zahl z_t[1]; 116 \end{alltt} 117 118 \noindent 119 where {\tt struct zahl} is defined as 120 121 \begin{alltt} 122 struct zahl \{ 123 int sign; \textcolor{c}{/* \textrm{\emph{not} short for `signum'} */} 124 size_t used; 125 size_t alloced; \textcolor{c}{/* \textrm{short for `allocated'} */} 126 zahl_char_t *chars; \textcolor{c}{/* \textrm{short for `characters'} */} 127 \}; 128 \end{alltt} 129 130 \noindent 131 where {\tt zahl\_char\_t} is defined as 132 133 \begin{alltt} 134 typedef uint64_t zahl_char_t; 135 \end{alltt} 136 137 \noindent 138 As a user, try not to think about anything else than 139 140 \begin{alltt} 141 typedef \textcolor{c}{/* \textrm{ignore what is here} */} z_t[1]; 142 \end{alltt} 143 144 \noindent 145 details can change in future versions of libzahl. 146 147 {\tt z\_t} is defined as a single-element array. This 148 is often called a reference, or a call-by-reference. 149 There are some flexibility issues with this, why 150 {\tt struct zahl} has beed added, but for most uses 151 with big integers, it makes things simpler. Particularly, 152 you need not work prepend {\tt \&} to variable when making 153 function calls, but the existence of {\tt struct zahl} 154 allows you do so if you so choose. 155 156 The {\tt .sign} member, is either $-1$, 0, or 1, 157 when the integer is negative, zero, or positive, 158 respectively. Whenever {\tt .sign} is 0, the value 159 of {\tt .used} and {\tt .chars} are undefined. 160 161 {\tt .used} holds to the number of elements used in 162 {\tt .chars}, and {\tt .alloced} holds the allocation 163 side of {\tt .chars} measured in elements minus a few 164 extra elements that are always added to the allocation. 165 {\tt .chars} is a little-endian array of 64-bit digits, 166 these 64-bit digits are called `characters' in libzahl. 167 {\tt .chars} holds the absolute value of the 168 represented value. 169 170 Unless {\tt .sign} is 0, {\tt .chars} always contains 171 four extra elements, refered to as fluff. These are 172 merely allocated so functions can assume that they can 173 always manipulate groups of four characters, and need 174 not care about cases where the number of characters is 175 not a multiple of four. There are of course a few cases 176 when the precise number of characters is important. 177 178 179 \newpage 180 \section{Parameters} 181 \label{sec:Parameters} 182 183 The general order of parameters in libzahl functions 184 are: output integers, input integers, input data, 185 output data, parametric values. For example, in 186 addition, the out parameter is the first parameter. 187 But for marshalling and unmarshalling the buffer 188 is last. For random number generation the order is: 189 output, device, distribution, distribution parameters. 190 Whilst the distribution parameters are big integers, 191 they are not considered input integers. The order 192 of the input parameters are that of the order you 193 would write them using mathematical notation, this 194 also holds true if you include the output parameter 195 (as long as there is exactly one output), for example 196 197 \vspace{1em} 198 $a \gets b^c \mod d$ 199 \vspace{1em} 200 201 \noindent 202 is written 203 204 \begin{verbatim} 205 zmodpow(a, b, c, d); 206 \end{verbatim} 207 208 \noindent 209 or 210 211 \begin{verbatim} 212 zmodpowu(a, b, c, d); 213 \end{verbatim} 214 215 Like any self respecting bignum library, libzahl 216 supports using the same big integer reference as 217 for output as input, as long as all the output 218 parameters are mutually unique. For example 219 220 \begin{alltt} 221 a += b; 222 \end{alltt} 223 224 \noindent 225 or 226 227 \begin{alltt} 228 a = a + b; 229 \end{alltt} 230 231 \noindent 232 is written, using libzahl, as 233 234 \begin{alltt} 235 zadd(a, a, b); 236 \end{alltt} 237 238 For commutative functions, like {\tt zadd}, the 239 implementation is optimised to assume that this 240 order is more likely to be used than the alternative. 241 That is, we should, for example, write 242 243 \begin{alltt} 244 zadd(a, a, b); 245 \end{alltt} 246 247 \noindent 248 rather than 249 250 \begin{alltt} 251 zadd(a, b, a); 252 \end{alltt} 253 254 \noindent 255 This assumption is not made for non-commutative 256 functions. 257 258 When writting your own functions, be aware, 259 input parameters are generally not declared {\tt const} 260 in libzahl. Currently, some functions actually make 261 modifications (that do not affect the value) to 262 input parameters.