.\" 9base: revived minimalist port of Plan 9 userland to Unix
.\" git clone git://git.suckless.org/9base
.TH AWK 1
.SH NAME
awk \- pattern-directed scanning and processing language
.SH SYNOPSIS
.B awk
[
.BI -F fs
]
[
.BI -v
.I var=value
]
[
.BI -mr n
]
[
.BI -mf n
]
[
.B -f
.I progfile
|
.I prog
]
[
.I file ...
]
.SH DESCRIPTION
.I Awk
scans each input
.I file
for lines that match any of a set of patterns specified literally in
.IR prog
or in one or more files
specified as
.B -f
.IR progfile .
With each pattern
there can be an associated action that will be performed
when a line of a
.I file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.L -
means the standard input.
Any
.IR file
of the form
.I var=value
is treated as an assignment, not a file name,
and is executed at the time it would have been opened if it were a file name.
The option
.B -v
followed by
.I var=value
is an assignment to be done before
.I prog
is executed;
any number of
.B -v
options may be present.
The
.B \-F
.IR fs
option defines the input field separator to be the regular expression
.IR fs .
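.PP
The options can be sketched as follows, assuming a POSIX shell and an
.I awk
on the search path that behaves as described here.
The trailing comments give the expected output.
.EX
echo a:b:c | awk -F: -v n=2 '{ print $n }'	# prints b
echo x | awk '{ print v, $0 }' v=1 -	# prints 1 x
.EE
In the second command the operand
.L v=1
takes effect just before the file
.L -
(the standard input) is read.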
.PP
An input line is normally made up of fields separated by white space,
or by the regular expression
.BR FS .
The fields are denoted
.BR $1 ,
.BR $2 ,
\&..., while
.B $0
refers to the entire line.
If
.BR FS
is null, the input line is split into one field per character.
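.PP
For example, assume a POSIX shell and an
.I awk
that, like this one, splits on a null
.BR FS .
Default splitting then discards leading and trailing blanks,
while a null
.B FS
gives one field per character:
.EX
echo '  a   b  ' | awk '{ print NF, $1 }'	# prints 2 a
echo abc | awk 'BEGIN { FS = "" } { print NF, $2 }'	# prints 3 b
.EE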
.PP
To compensate for inadequate implementation of storage management,
the
.B \-mr
option can be used to set the maximum size of the input record,
and the
.B \-mf
option to set the maximum number of fields.
.PP
A pattern-action statement has the form
.IP
.IB pattern " { " action " }
.PP
A missing
.BI { " action " }
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
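.PP
A short sketch of both abbreviations, assuming a POSIX shell.
The first program is a pattern with no action.
The second is an action with no pattern, followed after a semicolon by an
.B END
statement.
.EX
{ echo apple; echo banana; echo cherry; } | awk '/an/'	# prints banana
{ echo apple; echo banana; echo cherry; } | awk '{ n++ }; END { print n }'	# prints 3
.EE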
.PP
An action is a sequence of statements.
A statement can be one of the following:
.PP
.EX
.ta \w'\fLdelete array[expression]'u
if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP
while(\fI expression \fP)\fI statement\fP
for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP
for(\fI var \fPin\fI array \fP)\fI statement\fP
do\fI statement \fPwhile(\fI expression \fP)
break
continue
{\fR [\fP\fI statement ... \fP\fR] \fP}
\fIexpression\fP	#\fR commonly\fP\fI var = expression\fP
print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
return\fR [ \fP\fIexpression \fP\fR]\fP
next	#\fR skip remaining patterns on this input line\fP
nextfile	#\fR skip rest of this file, open next, start at top\fP
delete\fI array\fP[\fI expression \fP]	#\fR delete an array element\fP
delete\fI array\fP	#\fR delete all elements of array\fP
exit\fR [ \fP\fIexpression \fP\fR]\fP	#\fR exit immediately; status is \fP\fIexpression\fP
.EE
.DT
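.PP
The following sketch uses several of these statements together.
It assumes a POSIX shell; the array name
.L a
is arbitrary.
.EX
{ echo 1; echo skip; echo 2; } | awk '
/skip/	{ next }	# skip the remaining patterns for this line
	{ a[NR] = $1 }
END	{
	for (k in a) n++	# n counts the 2 stored elements
	delete a	# remove every element
	for (k in a) m++	# the array is empty, so m stays unset
	print n, m + 0	# prints 2 0
}'
.EE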
.PP
Statements are terminated by
semicolons, newlines or right braces.
An empty
.I expression-list
stands for
.BR $0 .
String constants are quoted \&\fL"\ "\fR,
with the usual C escapes recognized within.
Expressions take on string or numeric values as appropriate,
and are built using the operators
.B + \- * / % ^
(exponentiation), and concatenation (indicated by white space).
The operators
.B
! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.IB x  [ i ] )
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.B [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.BR SUBSEP .
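.PP
A minimal sketch of a multiple subscript, assuming a POSIX shell.
Here
.B split
(described below) recovers the constituents by splitting on
.BR SUBSEP ,
and the array name
.L count
is arbitrary.
.EX
awk 'BEGIN {
	count["x", 1] = 5	# stored under "x" SUBSEP "1"
	for (k in count) {
		split(k, part, SUBSEP)
		print part[1], part[2], count[k]	# prints x 1 5
	}
	if (("x", 1) in count) print "present"
}'
.EE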
.PP
The
.B print
statement prints its arguments on the standard output
(or on a file if
.BI > file
or
.BI >> file
is present or on a pipe if
.BI | cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.I file
and
.I cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.B printf
statement formats its expression list according to the format
(see
.IR fprintf (2)) .
The built-in function
.BI close( expr )
closes the file or pipe
.IR expr .
The built-in function
.BI fflush( expr )
flushes any buffered output for the file or pipe
.IR expr .
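.PP
As a sketch, assume a POSIX shell with
.IR sort (1)
on the search path.
Both
.B print
statements below name the same pipe.
Closing it lets
.I sort
finish before the last line, formatted with
.BR sprintf ,
is printed.
.EX
awk 'BEGIN {
	print "pear" | "sort"
	print "apple" | "sort"	# same command string, same open pipe
	close("sort")	# sort now prints apple, then pear
	print sprintf("%-5s|%3d|", "ab", 7)	# prints ab   |  7|
}'
.EE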
.PP
The mathematical functions
.BR exp ,
.BR log ,
.BR sqrt ,
.BR sin ,
.BR cos ,
and
.BR atan2
are built in.
Other built-in functions:
.TF length
.TP
.B length
the length of its argument
taken as a string,
or of
.B $0
if no argument.
.TP
.B rand
random number on [0,1)
.TP
.B srand
sets seed for
.B rand
and returns the previous seed.
.TP
.B int
truncates to an integer value
.TP
.B utf
converts its numerical argument, a character number, to a
.SM UTF
string
.TP
.BI substr( s , " m" , " n\fL)
the
.IR n -character
substring of
.I s
that begins at position
.IR m
counted from 1.
.TP
.BI index( s , " t" )
the position in
.I s
where the string
.I t
occurs, or 0 if it does not.
.TP
.BI match( s , " r" )
the position in
.I s
where the regular expression
.I r
occurs, or 0 if it does not.
The variables
.B RSTART
and
.B RLENGTH
are set to the position and length of the matched string.
.TP
.BI split( s , " a" , " fs\fL)
splits the string
.I s
into array elements
.IB a [1]\f1,
.IB a [2]\f1,
\&...,
.IB a [ n ]\f1,
and returns
.IR n .
The separation is done with the regular expression
.I fs
or with the field separator
.B FS
if
.I fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.TP
.BI sub( r , " t" , " s\fL)
substitutes
.I t
for the first occurrence of the regular expression
.I r
in the string
.IR s .
If
.I s
is not given,
.B $0
is used.
.TP
.B gsub
same as
.B sub
except that all occurrences of the regular expression
are replaced;
.B sub
and
.B gsub
return the number of replacements.
.TP
.BI sprintf( fmt , " expr" , " ...\fL)
the string resulting from formatting
.I expr ...
according to the
.I printf
format
.I fmt
.TP
.BI system( cmd )
executes
.I cmd
and returns its exit status
.TP
.BI tolower( str )
returns a copy of
.I str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.TP
.BI toupper( str )
returns a copy of
.I str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.PD
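.PP
A sketch that exercises several of these functions, assuming a POSIX shell.
Each trailing comment gives the expected output of its line.
.EX
awk 'BEGIN {
	s = "hello, world"
	print length(s), index(s, "world"), substr(s, 1, 5)	# 12 8 hello
	if (match(s, /o, w/)) print RSTART, RLENGTH	# 5 4
	n = split("a:b:c", part, ":"); print n, part[3]	# 3 c
	t = s; c = gsub(/o/, "0", t); print c, t	# 2 hell0, w0rld
	print toupper(substr(s, 8, 5)), int(3.9)	# WORLD 3
}'
.EE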
.PP
The ``function''
.B getline
sets
.B $0
to the next input record from the current input file;
.B getline
.BI < file
sets
.B $0
to the next record from
.IR file .
.B getline
.I x
sets variable
.I x
instead.
Finally,
.IB cmd " | getline
pipes the output of
.I cmd
into
.BR getline ;
each call of
.B getline
returns the next line of output from
.IR cmd .
In all cases,
.B getline
returns 1 for a successful input,
0 for end of file, and \-1 for an error.
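.PP
Two sketches of
.BR getline ,
assuming a POSIX shell; the variable names
.L nxt
and
.L line
are arbitrary.
.EX
{ echo one; echo two; echo three; } |
	awk 'NR == 1 { getline nxt; print $0 "+" nxt }'	# prints one+two
awk 'BEGIN { while (("echo a b" | getline line) > 0) n++; print n, line }'	# prints 1 a b
.EE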
.PP
Patterns are arbitrary Boolean combinations
(with
.BR "! || &&" )
of regular expressions and
relational expressions.
Regular expressions are as in
.IR regexp (6).
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.BR ~
and
.BR !~ .
.BI / re /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
.PP
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
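.PP
For instance, assuming a POSIX shell, the range below prints the lines from
.L start
through
.L stop
inclusive:
.EX
{ echo a; echo start; echo b; echo stop; echo c; } | awk '/start/, /stop/'	# prints start, b, stop
.EE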
.PP
A relational expression is one of the following:
.IP
.I expression matchop regular-expression
.br
.I expression relop expression
.br
.IB expression " in " array-name
.br
.BI ( expr , expr,... ") in " array-name
.PP
where a
.I relop
is any of the six relational operators in C,
and a
.I matchop
is either
.B ~
(matches)
or
.B !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
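.PP
A sketch, assuming a POSIX shell, of two forms.
The first is a pattern that combines a match operator with a relational operator.
The second uses the
.B in
test:
.EX
{ echo 'x 10'; echo 'y 20'; echo 'z 5'; } | awk '$1 ~ /^[xy]$/ && $2 > 12'	# prints y 20
awk 'BEGIN { a["k"] = 1; if ("k" in a) print "yes"; if (!("j" in a)) print "no" }'	# prints yes, then no
.EE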
.PP
The special patterns
.B BEGIN
and
.B END
may be used to capture control before the first input line is read
and after the last.
.B BEGIN
and
.B END
do not combine with other patterns.
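.PP
For example, assuming a POSIX shell:
.EX
{ echo 3; echo 4; } | awk 'BEGIN { print "start" } { s += $1 } END { print "total", s, NR }'
.EE
This prints
.L start
before any input is read and
.L "total 7 2"
after the last line.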
.PP
Variable names with special meanings:
.TF FILENAME
.TP
.B CONVFMT
conversion format used when converting numbers to strings
(default
.BR "%.6g" )
.TP
.B FS
regular expression used to separate fields; also settable
by option
.BI \-F fs\f1.
.TP
.BR NF
number of fields in the current record
.TP
.B NR
ordinal number of the current record
.TP
.B FNR
ordinal number of the current record in the current file
.TP
.B FILENAME
the name of the current input file
.TP
.B RS
input record separator (default newline)
.TP
.B OFS
output field separator (default blank)
.TP
.B ORS
output record separator (default newline)
.TP
.B OFMT
output format for numbers (default
.BR "%.6g" )
.TP
.B SUBSEP
separates multiple subscripts (default octal 034)
.TP
.B ARGC
argument count, assignable
.TP
.B ARGV
argument array, assignable;
non-null members are taken as file names
.TP
.B ENVIRON
array of environment variables; subscripts are names.
.PD
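.PP
A sketch using a few of these variables, assuming a POSIX shell.
The environment variable
.L GREETING
is an arbitrary example.
.EX
echo a b c | awk 'BEGIN { OFS = "-" } { $1 = $1; print; print NF, NR }'	# prints a-b-c, then 3-1
GREETING=hi awk 'BEGIN { print ENVIRON["GREETING"] }'	# prints hi
.EE
Assigning to a field, here
.BR $1 ,
rebuilds
.B $0
with
.B OFS
between the fields.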
.PP
Functions may be defined (at the position of a pattern-action statement) thus:
.IP
.L
function foo(a, b, c) { ...; return x }
.PP
Parameters are passed by value if scalar and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
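.PP
A sketch, assuming a POSIX shell, of three features:
a recursive function, an array passed by reference, and a local variable
.L i
declared as an excess parameter.
The function names are arbitrary.
.EX
awk '
function fact(n) { return n <= 1 ? 1 : n * fact(n - 1) }
function fill(arr, n,    i) { for (i = 1; i <= n; i++) arr[i] = i * i }
BEGIN {
	i = "global"
	split("", sq)	# make sq an empty array
	fill(sq, 3)	# sq is passed by reference
	print fact(5), sq[3], i	# prints 120 9 global
}'
.EE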
.SH EXAMPLES
.TP
.L
length($0) > 72
Print lines longer than 72 characters.
.TP
.L
{ print $2, $1 }
Print first two fields in opposite order.
.PP
.EX
BEGIN { FS = ",[ \et]*|[ \et]+" }
      { print $2, $1 }
.EE
.ns
.IP
Same, with input fields separated by comma and/or blanks and tabs.
.PP
.EX
	{ s += $1 }
END	{ print "sum is", s, " average is", s/NR }
.EE
.ns
.IP
Add up first column, print sum and average.
.TP
.L
/start/, /stop/
Print all lines between start/stop pairs.
.PP
.EX
BEGIN	{	# Simulate echo(1)
	for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
	printf "\en"
	exit }
.EE
.SH SOURCE
.B /sys/src/cmd/awk
.SH SEE ALSO
.IR sed (1),
.IR regexp (6),
.br
A. V. Aho, B. W. Kernighan, P. J. Weinberger,
.I
The AWK Programming Language,
Addison-Wesley, 1988.  ISBN 0-201-07981-X
.SH BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
\&\fL""\fP to it.
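.PP
For example, assuming a POSIX shell, the string constants below compare as strings
until 0 is added to each.
Concatenating a null string turns the number 12 into a two-character string:
.EX
awk 'BEGIN { a = "10"; b = "9"; s = (a < b); n = (a + 0 < b + 0); print s, n, length(12 "") }'	# prints 1 0 2
.EE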
.br
The scope rules for variables in functions are a botch;
the syntax is worse.