9base

revived minimalist port of Plan 9 userland to Unix
git clone git://git.suckless.org/9base

regexp.7 (2184B)


.TH REGEXP 7
.SH NAME
regexp \- Plan 9 regular expression notation
.SH DESCRIPTION
This manual page describes the regular expression
syntax used by the Plan 9 regular expression library
.IR regexp (3).
It is the form used by
.IR egrep (1)
before
.I egrep
got complicated.
.PP
A
.I "regular expression"
specifies
a set of strings of characters.
A member of this set of strings is said to be
.I matched
by the regular expression.  In many applications
a delimiter character, commonly
.LR / ,
bounds a regular expression.
In the following specification for regular expressions,
the word `character' means any character (rune) except newline.
.PP
The syntax for a regular expression
.B e0
is
.IP
.EX
e3:  literal | charclass | '.' | '^' | '$' | '(' e0 ')'

e2:  e3
  |  e2 REP

REP: '*' | '+' | '?'

e1:  e2
  |  e1 e2

e0:  e1
  |  e0 '|' e1
.EE
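.PP
The layering of the rules makes the
.B REP
operators bind tightest, concatenation next and alternation loosest;
because each rule is left-recursive, each operator groups to the left.
The C sketch below is a hypothetical recursive-descent parser, not the one in
.IR regexp (3):
each left-recursive rule becomes a loop, and the function
.B parse
returns the expression in an ad hoc prefix notation
.RB ( star ,
.BR plus ,
.BR quest ,
.BR cat ,
.BR alt )
or NULL for a syntax error.
It assumes ASCII patterns, omits out-of-memory checks and leaks memory on error paths.
.EX
```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical recursive-descent parser for the e0..e3 grammar: each
   left-recursive rule becomes a loop, so operators group to the left.
   ASCII only; no out-of-memory checks; memory leaks on error paths. */
enum { BACKSLASH = 92 };	/* ASCII code of the escape character */
static const char *p;		/* current position in the pattern */
static char *e0(void);

static char *node(const char *op, char *a, char *b)	/* "op(a)" or "op(a,b)" */
{
	size_t n = strlen(op) + strlen(a) + (b != NULL ? strlen(b) : 0) + 4;
	char *s = malloc(n);

	snprintf(s, n, "%s(%s%s%s)", op, a, b != NULL ? "," : "", b != NULL ? b : "");
	free(a);		/* the operands are consumed */
	free(b);
	return s;
}

/* e3: literal | charclass | '.' | '^' | '$' | '(' e0 ')' */
static char *e3(void)
{
	const char *start = p, *body;
	char *s;

	if (*p == 0 || strchr("*+?|)]", *p) != NULL)
		return NULL;			/* nothing here can begin an e3 */
	if (*p == '(') {			/* parentheses only group */
		p++;
		if ((s = e0()) == NULL || *p++ != ')')
			return NULL;
		return s;
	}
	if (*p == '[') {			/* charclass: scan to the closing ']' */
		if (*++p == '^')
			p++;
		body = p;
		while (*p != 0 && *p != ']')
			p += (*p == BACKSLASH && p[1] != 0) ? 2 : 1;
		if (*p != ']' || p == body)
			return NULL;		/* unterminated or empty class */
		p++;
	} else {				/* '.', '^', '$' or a literal */
		if (*p == BACKSLASH && p[1] != 0)
			p++;			/* escaped metacharacter */
		p++;
	}
	s = malloc(p - start + 1);
	memcpy(s, start, p - start);
	s[p - start] = 0;
	return s;
}

static char *e2(void)		/* e2: e3 | e2 REP */
{
	char *s = e3();

	while (s != NULL && *p != 0 && strchr("*+?", *p) != NULL) {
		const char *op = *p == '*' ? "star" : *p == '+' ? "plus" : "quest";
		p++;
		s = node(op, s, NULL);
	}
	return s;
}

static char *e1(void)		/* e1: e2 | e1 e2 */
{
	char *s = e2(), *t;

	while (s != NULL && *p != 0 && *p != '|' && *p != ')') {
		if ((t = e2()) == NULL)
			return NULL;
		s = node("cat", s, t);
	}
	return s;
}

static char *e0(void)		/* e0: e1 | e0 '|' e1 */
{
	char *s = e1(), *t;

	while (s != NULL && *p == '|') {
		p++;
		if ((t = e1()) == NULL)
			return NULL;
		s = node("alt", s, t);
	}
	return s;
}

char *parse(const char *pattern)	/* whole pattern; NULL on syntax error */
{
	char *s;

	p = pattern;
	if ((s = e0()) != NULL && *p != 0) {	/* trailing text, e.g. ')' */
		free(s);
		s = NULL;
	}
	return s;
}
```
.EE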
.PP
A
.B literal
is any non-metacharacter, or a metacharacter
(one of
.BR .*+?[]()|\e^$ ),
or the delimiter
preceded by
.LR \e .
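.PP
Escaping is mechanical, so a program that builds a pattern from arbitrary text
can quote that text first.
The hypothetical C helper
.B quote
below, which is not part of
.IR regexp (3),
returns a newly allocated pattern in which every metacharacter, and the
delimiter if one is given, is preceded by a backslash, so that each character
of the text becomes a literal.
It works byte by byte, which is adequate for ASCII text.
.EX
```c
#include <stdlib.h>
#include <string.h>

enum { BACKSLASH = 92 };	/* ASCII code of the escape character */

/* Hypothetical helper: return a malloc'd pattern that matches text literally.
   delim is the delimiter to escape as well, or 0 if there is none. */
char *quote(const char *text, int delim)
{
	static const char meta[] = ".*+?[]()|^$";	/* plus the backslash itself */
	char *out = malloc(2 * strlen(text) + 1);	/* worst case: every byte escaped */
	char *q = out;

	if (out == NULL)
		return NULL;
	for (; *text != 0; text++) {
		int c = (unsigned char)*text;

		if (c == BACKSLASH || strchr(meta, c) != NULL || (delim != 0 && c == delim))
			*q++ = BACKSLASH;
		*q++ = (char)c;
	}
	*q = 0;
	return out;
}
```
.EE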
.PP
A
.B charclass
is a nonempty string
.I s
bracketed
.BI [ \|s\| ]
(or
.BI [^ s\| ]\fR);
it matches any character in (or not in)
.IR s .
A negated character class never
matches newline.
A substring
.IB a - b\f1,
with
.I a
and
.I b
in ascending
order, stands for the inclusive
range of
characters between
.I a
and
.IR b .
In
.IR s ,
the metacharacters
.LR - ,
.LR ] ,
an initial
.LR ^ ,
and the regular expression delimiter
must be preceded by a
.LR \e ;
other metacharacters
have no special meaning and
may appear unescaped.
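.PP
These rules translate into a short membership test.
The C sketch below is one hypothetical reading of them, not the
.IR regexp (3)
code: the function
.B inclass
takes the class without its brackets, so
.L [^a-z]
is passed as
.LR ^a-z ,
and handles escapes, ranges and negation as described above,
never letting a negated class match newline.
It works on bytes rather than runes and treats an unescaped
.L -
that does not form a range as an ordinary character.
.EX
```c
enum { BACKSLASH = 92, NEWLINE = 10 };	/* ASCII codes */

/* Hypothetical sketch: does byte c belong to the class whose text, without
   the brackets, is cls?  A leading '^' negates the class. */
int inclass(const char *cls, int c)
{
	int negated = 0, found = 0;

	if (*cls == '^') {
		negated = 1;
		cls++;
	}
	while (*cls != 0) {
		int lo, hi;

		lo = (unsigned char)*cls++;
		if (lo == BACKSLASH && *cls != 0)
			lo = (unsigned char)*cls++;	/* escaped: taken literally */
		hi = lo;
		if (*cls == '-' && cls[1] != 0) {	/* an inclusive range lo-hi */
			cls++;
			hi = (unsigned char)*cls++;
			if (hi == BACKSLASH && *cls != 0)
				hi = (unsigned char)*cls++;
		}
		if (lo <= c && c <= hi)
			found = 1;
	}
	if (negated)
		return c != NEWLINE && !found;	/* never matches newline */
	return found;
}
```
.EE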
.PP
A
.L .
matches any character.
.PP
A
.L ^
matches the beginning of a line;
.L $
matches the end of a line.
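.PP
Since a `character' excludes newline,
.L .
never matches newline.
The C sketch below, loosely modelled on the small matcher in Kernighan and Pike's
.I "The Practice of Programming"
rather than on
.IR regexp (3),
handles only literals,
.LR . ,
.LR ^ ,
.L $
and
.LR * .
It assumes that the beginning of a line is the start of the text or the position after a newline,
and that the end of a line is the end of the text or the position before a newline;
the hypothetical function
.B match
reports whether the pattern matches anywhere in the text.
.EX
```c
enum { NEWLINE = 10 };		/* ASCII newline */

static const char *bot;		/* beginning of the text being searched */

/* Does the one-character item c ('.' or a literal) match *t? */
static int matchone(int c, const char *t)
{
	if (*t == 0 || *t == NEWLINE)
		return 0;		/* a 'character' is never newline */
	return c == '.' || c == *t;
}

static int matchhere(const char *re, const char *t);

/* c* followed by re: try the longest run of c first, then shorter ones. */
static int matchstar(int c, const char *re, const char *t)
{
	const char *e = t;

	while (matchone(c, e))
		e++;
	for (;;) {
		if (matchhere(re, e))
			return 1;
		if (e == t)
			return 0;
		e--;
	}
}

/* Does re match at position t? */
static int matchhere(const char *re, const char *t)
{
	if (re[0] == 0)
		return 1;
	if (re[1] == '*')
		return matchstar(re[0], re + 2, t);
	if (re[0] == '^')	/* beginning of a line */
		return (t == bot || t[-1] == NEWLINE) && matchhere(re + 1, t);
	if (re[0] == '$')	/* end of a line */
		return (*t == 0 || *t == NEWLINE) && matchhere(re + 1, t);
	return matchone(re[0], t) && matchhere(re + 1, t + 1);
}

/* Hypothetical entry point: does re match anywhere in text? */
int match(const char *re, const char *text)
{
	const char *t = text;

	bot = text;
	do {
		if (matchhere(re, t))
			return 1;
	} while (*t++ != 0);
	return 0;
}
```
.EE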
.PP
The
.B REP
operators match zero or more
.RB ( * ),
one or more
.RB ( + ),
or zero or one
.RB ( ? )
instances, respectively, of the preceding regular expression
.BR e2 .
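.PP
Each
.B REP
operator amounts to a pair of repetition bounds:
.L *
allows 0 or more instances,
.L +
1 or more, and
.L ?
0 or 1.
The hypothetical C sketch below restricts the preceding expression to a single literal or
.L .
with at most one
.B REP
operator after it;
.B fullmatch
reports whether the whole text matches, trying the largest number of instances first.
.EX
```c
enum { NEWLINE = 10 };	/* ASCII newline */

/* Repetition bounds for a REP operator; max < 0 means no upper limit.
   Returns 0, with bounds 1..1, if op is not a REP operator. */
static int bounds(int op, int *min, int *max)
{
	switch (op) {
	case '*': *min = 0; *max = -1; return 1;
	case '+': *min = 1; *max = -1; return 1;
	case '?': *min = 0; *max = 1; return 1;
	}
	*min = *max = 1;
	return 0;
}

/* Does item c ('.' or a literal) match the character at t? */
static int one(int c, const char *t)
{
	return *t != 0 && *t != NEWLINE && (c == '.' || c == *t);
}

/* Hypothetical sketch: does re match all of t? */
int fullmatch(const char *re, const char *t)
{
	int item, min, max, n = 0;

	if (*re == 0)
		return *t == 0;
	item = *re++;
	if (bounds(*re, &min, &max))
		re++;
	while ((max < 0 || n < max) && one(item, t + n))
		n++;			/* take as many instances as allowed */
	for (; n >= min; n--)		/* then back off toward min */
		if (fullmatch(re, t + n))
			return 1;
	return 0;
}
```
.EE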
.PP
A concatenated regular expression,
.BR "e1\|e2" ,
matches a match to
.B e1
followed by a match to
.BR e2 .
.PP
An alternative regular expression,
.BR "e0\||\|e1" ,
matches either a match to
.B e0
or a match to
.BR e1 .
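.PP
Both definitions can be read in terms of where a match can end.
If ends(e, i) is the set of positions at which a match to
.I e
starting at position
.I i
can end, then ends(e1 e2, i) is the union of ends(e2, j) over every
.I j
in ends(e1, i), and ends(e0|e1, i) is the union of ends(e0, i) and ends(e1, i).
The C sketch below computes this for hypothetical expression trees built from single-character literals;
it keeps a set of positions in the bits of a 64-bit integer, so the text may hold at most 63 characters.
.EX
```c
#include <string.h>

/* Hypothetical expression tree: literals, concatenation, alternation. */
enum { LIT, CAT, ALT };
typedef struct Node Node;
struct Node {
	int kind;		/* LIT, CAT or ALT */
	int c;			/* the character, for LIT */
	const Node *l, *r;	/* operands, for CAT and ALT */
};

typedef unsigned long long Set;	/* bit j set: a match can end at position j */

/* Positions where a match to n that starts at position i can end. */
Set ends(const Node *n, const char *text, int i)
{
	Set s = 0, mid;
	int j;

	switch (n->kind) {
	case LIT:
		if (text[i] != 0 && text[i] == n->c)
			s = 1ULL << (i + 1);
		break;
	case CAT:	/* a match to l followed by a match to r */
		mid = ends(n->l, text, i);
		for (j = 0; j < 64; j++)
			if ((mid >> j) & 1)
				s |= ends(n->r, text, j);
		break;
	case ALT:	/* a match to l or a match to r */
		s = ends(n->l, text, i) | ends(n->r, text, i);
		break;
	}
	return s;
}

/* Does n match all of text (at most 63 characters)? */
int matches(const Node *n, const char *text)
{
	return (int)((ends(n, text, 0) >> strlen(text)) & 1);
}
```
.EE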
.PP
A match to any part of a regular expression
extends as far as possible without preventing
a match to the remainder of the regular expression.
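.PP
For example, when
.L a*ab
is matched against
.LR aaab ,
the
.L a*
part takes two characters: taking all three would leave no match for
.LR ab .
The hypothetical C sketch below applies the rule to patterns of single characters, each optionally followed by
.LR * ,
matched at the start of the text only:
.B parts
tries each starred item's longest run first, gives characters back only when the rest of the pattern would otherwise fail,
and records in
.I len
how many characters each item consumed.
.EX
```c
enum { NEWLINE = 10 };	/* ASCII newline */

/* Does item c ('.' or a literal) match the character at t? */
static int one(int c, const char *t)
{
	return *t != 0 && *t != NEWLINE && (c == '.' || c == *t);
}

/* Hypothetical sketch: match re against the start of text.  Items are
   single characters, each optionally followed by '*'.  len[k] receives the
   number of characters item k consumed.  Returns the total length matched,
   or -1 if there is no match. */
int parts(const char *re, const char *text, int *len)
{
	int n = 0, rest;

	if (re[0] == 0)
		return 0;
	if (re[1] == '*') {
		while (one(re[0], text + n))
			n++;			/* longest run first */
		for (; n >= 0; n--) {		/* give back only if the rest fails */
			rest = parts(re + 2, text + n, len + 1);
			if (rest >= 0) {
				len[0] = n;
				return n + rest;
			}
		}
		return -1;
	}
	if (!one(re[0], text))
		return -1;
	rest = parts(re + 1, text + 1, len + 1);
	if (rest < 0)
		return -1;
	len[0] = 1;
	return 1 + rest;
}
```
.EE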
.SH "SEE ALSO"
.IR regexp (3)