.TH AWK 1
.SH NAME
awk \- pattern-directed scanning and processing language
.SH SYNOPSIS
.B awk
[
.BI -F fs
]
[
.BI -v
.I var=value
]
[
.BI -mr n
]
[
.BI -mf n
]
[
.B -f
.I progfile
|
.I prog
]
[
.I file ...
]
.SH DESCRIPTION
.I Awk
scans each input
.I file
for lines that match any of a set of patterns specified literally in
.IR prog
or in one or more files
specified as
.B -f
.IR file .
With each pattern
there can be an associated action that will be performed
when a line of a
.I file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.L -
means the standard input.
Any
.IR file
of the form
.I var=value
is treated as an assignment, not a file name,
and is executed at the time it would have been opened if it were a file name.
The option
.B -v
followed by
.I var=value
is an assignment to be done before
.I prog
is executed;
any number of
.B -v
options may be present.
The
.B \-F
.IR fs
option defines the input field separator to be the regular expression
.IR fs .
.PP
An input line is normally made up of fields separated by white space,
or by regular expression
.BR FS .
The fields are denoted
.BR $1 ,
.BR $2 ,
\&..., while
.B $0
refers to the entire line.
If
.BR FS
is null, the input line is split into one field per character.
.PP
To compensate for inadequate implementation of storage management,
the
.B \-mr
option can be used to set the maximum size of the input record,
and the
.B \-mf
option to set the maximum number of fields.
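.PP
As a minimal sketch of the
.B -F
and
.B -v
options together with field references
(the input text and the variable name
.B n
are illustrative assumptions):
.EX
echo 'a:b:c' | awk -F: -v n=2 '{ print $n, NF }'
.EE
.PP
With
.B :
as the field separator,
.B $n
selects the second field and
.B NF
is 3, so this should print
.BR "b 3" .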
.PP
A pattern-action statement has the form
.IP
.IB pattern " { " action " }"
.PP
A missing
.BI { " action " }
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
.PP
An action is a sequence of statements.
A statement can be one of the following:
.PP
.EX
.ta \w'\fLdelete array[expression]'u
if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP
while(\fI expression \fP)\fI statement\fP
for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP
for(\fI var \fPin\fI array \fP)\fI statement\fP
do\fI statement \fPwhile(\fI expression \fP)
break
continue
{\fR [\fP\fI statement ... \fP\fR] \fP}
\fIexpression\fP	#\fR commonly\fP\fI var = expression\fP
print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
return\fR [ \fP\fIexpression \fP\fR]\fP
next	#\fR skip remaining patterns on this input line\fP
nextfile	#\fR skip rest of this file, open next, start at top\fP
delete\fI array\fP[\fI expression \fP]	#\fR delete an array element\fP
delete\fI array\fP	#\fR delete all elements of array\fP
exit\fR [ \fP\fIexpression \fP\fR]\fP	#\fR exit immediately; status is \fP\fIexpression\fP
.EE
.DT
.PP
Statements are terminated by
semicolons, newlines or right braces.
An empty
.I expression-list
stands for
.BR $0 .
String constants are quoted \&\fL"\ "\fR,
with the usual C escapes recognized within.
Expressions take on string or numeric values as appropriate,
and are built using the operators
.B + \- * / % ^
(exponentiation), and concatenation (indicated by white space).
The operators
.B
! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.IB x [ i ] )
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.B [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.BR SUBSEP .
.PP
The
.B print
statement prints its arguments on the standard output
(or on a file if
.BI > file
or
.BI >> file
is present or on a pipe if
.BI | cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.I file
and
.I cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.B printf
statement formats its expression list according to the format
(see
.IR fprintf (2)).
The built-in function
.BI close( expr )
closes the file or pipe
.IR expr .
The built-in function
.BI fflush( expr )
flushes any buffered output for the file or pipe
.IR expr .
.PP
The mathematical functions
.BR exp ,
.BR log ,
.BR sqrt ,
.BR sin ,
.BR cos ,
and
.BR atan2
are built in.
Other built-in functions:
.TF length
.TP
.B length
the length of its argument
taken as a string,
or of
.B $0
if no argument.
.TP
.B rand
random number on (0,1)
.TP
.B srand
sets seed for
.B rand
and returns the previous seed.
.TP
.B int
truncates to an integer value
.TP
.B utf
converts its numerical argument, a character number, to a
.SM UTF
string
.TP
.BI substr( s , " m" , " n\fL)"
the
.IR n -character
substring of
.I s
that begins at position
.IR m
counted from 1.
.TP
.BI index( s , " t" )
the position in
.I s
where the string
.I t
occurs, or 0 if it does not.
.TP
.BI match( s , " r" )
the position in
.I s
where the regular expression
.I r
occurs, or 0 if it does not.
The variables
.B RSTART
and
.B RLENGTH
are set to the position and length of the matched string.
.TP
.BI split( s , " a" , " fs\fL)"
splits the string
.I s
into array elements
.IB a [1]\f1,
.IB a [2]\f1,
\&...,
.IB a [ n ]\f1,
and returns
.IR n .
The separation is done with the regular expression
.I fs
or with the field separator
.B FS
if
.I fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.TP
.BI sub( r , " t" , " s\fL)"
substitutes
.I t
for the first occurrence of the regular expression
.I r
in the string
.IR s .
If
.I s
is not given,
.B $0
is used.
.TP
.B gsub
same as
.B sub
except that all occurrences of the regular expression
are replaced;
.B sub
and
.B gsub
return the number of replacements.
.TP
.BI sprintf( fmt , " expr" , " ...\fL)"
the string resulting from formatting
.I expr ...
according to the
.I printf
format
.I fmt
.TP
.BI system( cmd )
executes
.I cmd
and returns its exit status
.TP
.BI tolower( str )
returns a copy of
.I str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.TP
.BI toupper( str )
returns a copy of
.I str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.PD
.PP
The ``function''
.B getline
sets
.B $0
to the next input record from the current input file;
.B getline
.BI < file
sets
.B $0
to the next record from
.IR file .
.B getline
.I x
sets variable
.I x
instead.
Finally,
.IB cmd " | getline"
pipes the output of
.I cmd
into
.BR getline ;
each call of
.B getline
returns the next line of output from
.IR cmd .
In all cases,
.B getline
returns 1 for a successful input,
0 for end of file, and \-1 for an error.
.PP
Patterns are arbitrary Boolean combinations
(with
.BR "! || &&" )
of regular expressions and
relational expressions.
Regular expressions are as in
.IR regexp (6).
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.BR ~
and
.BR !~ .
.BI / re /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
.PP
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
.PP
A relational expression is one of the following:
.IP
.I expression matchop regular-expression
.br
.I expression relop expression
.br
.IB expression " in " array-name
.br
.BI ( expr , expr,... ") in " array-name
.PP
where a
.I relop
is any of the six relational operators in C,
and a
.I matchop
is either
.B ~
(matches)
or
.B !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
.PP
The special patterns
.B BEGIN
and
.B END
may be used to capture control before the first input line is read
and after the last.
.B BEGIN
and
.B END
do not combine with other patterns.
.PP
Variable names with special meanings:
.TF FILENAME
.TP
.B CONVFMT
conversion format used when converting numbers
(default
.BR "%.6g" )
.TP
.B FS
regular expression used to separate fields; also settable
by option
.BI \-F fs\f1.
.TP
.BR NF
number of fields in the current record
.TP
.B NR
ordinal number of the current record
.TP
.B FNR
ordinal number of the current record in the current file
.TP
.B FILENAME
the name of the current input file
.TP
.B RS
input record separator (default newline)
.TP
.B OFS
output field separator (default blank)
.TP
.B ORS
output record separator (default newline)
.TP
.B OFMT
output format for numbers (default
.BR "%.6g" )
.TP
.B SUBSEP
separates multiple subscripts (default 034)
.TP
.B ARGC
argument count, assignable
.TP
.B ARGV
argument array, assignable;
non-null members are taken as file names
.TP
.B ENVIRON
array of environment variables; subscripts are names.
.PD
.PP
Functions may be defined (at the position of a pattern-action statement) thus:
.IP
.L
function foo(a, b, c) { ...; return x }
.PP
Parameters are passed by value if scalar and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
.SH EXAMPLES
.TP
.L
length($0) > 72
Print lines longer than 72 characters.
.TP
.L
{ print $2, $1 }
Print first two fields in opposite order.
.PP
.EX
BEGIN { FS = ",[ \et]*|[ \et]+" }
{ print $2, $1 }
.EE
.ns
.IP
Same, with input fields separated by comma and/or blanks and tabs.
.PP
.EX
{ s += $1 }
END { print "sum is", s, " average is", s/NR }
.EE
.ns
.IP
Add up first column, print sum and average.
.TP
.L
/start/, /stop/
Print all lines between start/stop pairs.
.PP
.EX
BEGIN {	# Simulate echo(1)
	for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
	printf "\en"
	exit }
.EE
.SH SOURCE
.B /sys/src/cmd/awk
.SH SEE ALSO
.IR sed (1),
.IR regexp (6),
.br
A. V. Aho, B. W. Kernighan, P. J. Weinberger,
.I
The AWK Programming Language,
Addison-Wesley, 1988.  ISBN 0-201-07981-X
.SH BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
\&\fL""\fP to it.
.br
The scope rules for variables in functions are a botch;
the syntax is worse.
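.PP
As a minimal sketch of the number and string coercions described above
(the variable names and values are illustrative assumptions):
.EX
awk 'BEGIN {
	x = "10"; y = "9"
	a = (x < y)	# both are strings, so compared as strings
	b = (x+0 < y+0)	# adding 0 forces a numeric comparison
	print a, b
}'
.EE
.PP
As strings, "10" sorts before "9"; as numbers, 10 is not less than 9,
so this should print
.BR "1 0" .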