

Character Sets

This section introduces the special characters . and [.

. matches any single character except the NUL character. For example:

p.ck

matches

pick
pack
puck
pbck
pcck
p.ck

...
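The list above can be checked by running the pattern against a few candidates. The sketch below uses Python's `re` module as a stand-in for the matcher described here; that choice is an assumption, and Python's `.` differs in one respect: by default it refuses a newline rather than the NUL character.

```python
import re

# Python's re module stands in for the regexp matcher (an assumption).
pattern = re.compile(r"p.ck")

candidates = ["pick", "pack", "puck", "pbck", "pcck", "p.ck", "pck", "paack"]

# fullmatch requires the whole string to match, so "." must consume
# exactly one character: "pck" is too short and "paack" is too long.
matches = [s for s in candidates if pattern.fullmatch(s)]
print(matches)  # ['pick', 'pack', 'puck', 'pbck', 'pcck', 'p.ck']
```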

[ begins a character set. Like ., a character set matches not a single, literal character, but any one of a set of characters. Unlike ., a character set lets you define that set of characters explicitly.

There are three basic forms a character set can take.

In the first form, the character set is spelled out:

[<cset-spec>]	-- every character in <cset-spec> is in the set.

In the second form, the character set is the negation of a set that is explicitly spelled out:

[^<cset-spec>]	-- every character *not* in <cset-spec> is in the set.

A <cset-spec> is more or less an explicit enumeration of a set of characters. It can be written as a string of individual characters:

[aeiou]

or as a range of characters:

[0-9]

These two forms can be mixed:

[A-Za-z0-9_$]
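As a rough illustration of the enumerated, range, mixed, and negated forms, the following sketch filters a sample string through each kind of set. It assumes Python's `re` module, whose set syntax agrees with the forms shown here; the helper `accepted` and the variable names are made up for the example. The mixed set is written with the range `A-Z`, since in ASCII a range `A-z` would also admit the punctuation characters that lie between `Z` and `a`.

```python
import re

vowels = re.compile(r"[aeiou]")            # enumerated set
digit = re.compile(r"[0-9]")               # range
ident_char = re.compile(r"[A-Za-z0-9_$]")  # ranges and single characters mixed
non_vowel = re.compile(r"[^aeiou]")        # negated set

def accepted(pattern, chars):
    """Return the characters of `chars` that the one-character pattern matches."""
    return "".join(c for c in chars if pattern.fullmatch(c))

sample = "aZ7_$-e "
print(accepted(vowels, sample))      # ae
print(accepted(digit, sample))       # 7
print(accepted(ident_char, sample))  # aZ7_$e
print(accepted(non_vowel, sample))   # Z7_$- (followed by the space)
```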

Note that special regexp characters (such as *) are not special within a character set. As illustrated above, - is special within a set: it denotes a range. The exception, illustrated below, is when - is the first character mentioned; there it stands for itself.

This is a four-character set:

[-+*/]
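The effect of the position of - can be seen in a small experiment. This is a sketch using Python's `re` module (an assumption; its treatment of a leading - matches the rule stated here), and the sample strings are invented for illustration.

```python
import re

# "-" comes first, so it is a literal; "*" and "+" are ordinary inside a set.
ops = re.compile(r"[-+*/]")
found = ops.findall("3*4-2/1+x")
print(found)  # ['*', '-', '/', '+']

# With "-" in the middle, the same characters spell a range from "+" to "/",
# which in ASCII also takes in "," and ".".
as_range = re.compile(r"[+-/]")
extra = [c for c in ",." if as_range.fullmatch(c)]
print(extra)  # [',', '.']
```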

The third form of a character set makes use of a pre-defined "character class":

[[:class-name:]] -- every character described by class-name is in the set.

The supported character classes are:

alnum	- the set of alpha-numeric characters
alpha	- the set of alphabetic characters
blank	- tab and space
cntrl	- the control characters
digit	- decimal digits
graph	- all printable characters except space
lower	- lower case letters
print	- the "printable" characters
punct	- punctuation
space	- whitespace characters
upper	- upper case letters
xdigit	- hexadecimal digits

Finally, character class sets can also be inverted:

[^[:space:]] - all non-whitespace characters
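To make the class names concrete, the sketch below spells each one out as an explicit set. It is only an approximation under stated assumptions: ASCII text and the "C" locale, where the classes have fixed contents. Python's `re` module is used for consistency, and since it does not accept the `[[:class-name:]]` notation, the table `POSIX_CLASSES` and the helper `class_set` are hypothetical names introduced here, not part of any library.

```python
import re

# Explicit ASCII equivalents of the named classes, assuming the "C" locale.
# Each value is the body of a character set, without the enclosing brackets.
POSIX_CLASSES = {
    "alnum":  r"A-Za-z0-9",
    "alpha":  r"A-Za-z",
    "blank":  r" \t",
    "cntrl":  r"\x00-\x1f\x7f",
    "digit":  r"0-9",
    "graph":  r"\x21-\x7e",
    "lower":  r"a-z",
    "print":  r"\x20-\x7e",
    "punct":  r"!-/:-@\[-`{-~",
    "space":  r" \t\n\r\f\v",
    "upper":  r"A-Z",
    "xdigit": r"0-9A-Fa-f",
}

def class_set(name, negate=False):
    """Compile a set standing in for [[:name:]], or [^[:name:]] if negate is set."""
    body = POSIX_CLASSES[name]
    return re.compile("[" + ("^" if negate else "") + body + "]")

print(bool(class_set("xdigit").fullmatch("f")))              # True
print(bool(class_set("xdigit").fullmatch("g")))              # False
print(bool(class_set("punct").fullmatch("_")))               # True
print(bool(class_set("space", negate=True).fullmatch("x")))  # True
print(bool(class_set("space", negate=True).fullmatch(" ")))  # False
```

In other locales the alphabetic classes may contain more characters than these ASCII ranges, which is the main reason to prefer the named classes over hand-written ranges.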

Character sets can be used in a regular expression anywhere a literal character can.
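For instance, a set can take the place of a single letter inside a longer pattern, or be followed by an operator such as *. The sketch below again assumes Python's `re` module as the matcher; the patterns and sample strings are invented for illustration.

```python
import re

# A set in place of the literal vowel: only these three spellings are accepted.
word = re.compile(r"p[aiu]ck")
words = [s for s in ["pick", "pack", "puck", "pbck", "p.ck"] if word.fullmatch(s)]
print(words)  # ['pick', 'pack', 'puck']

# A set followed by "*" behaves just as a literal followed by "*" would:
# one digit, then zero or more further digits.
number = re.compile(r"[0-9][0-9]*")
numbers = number.findall("room 42, floor 7")
print(numbers)  # ['42', '7']
```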

