Rx - Backreferences


Backreferences, Extractions and Substitutions

A backreference is written \n where n is some single digit other than 0. To be a valid backreference, there must be at least n parenthesized subexpressions in the pattern prior to the backreference.

A backreference matches a literal copy of whatever was matched by the corresponding subexpression. For example,

\(.*\)-\1

matches:

go-go
ha-ha
wakka-wakka
...
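As a rough illustration, the same pattern can be tried in Python's re module. Note the assumptions: Python writes groups as ( ... ) rather than \( ... \), and the helper name is_doubled is made up for this sketch. fullmatch is used so that the whole string must fit the pattern; an unanchored search would also accept a bare "-" with an empty subexpression.

```python
import re

# Python's re syntax uses ( ... ) for grouping where this manual writes \( ... \).
PATTERN = re.compile(r"(.*)-\1")

def is_doubled(s):
    """Return True if the whole string has the form X-X (hypothetical helper)."""
    return PATTERN.fullmatch(s) is not None

for word in ["go-go", "ha-ha", "wakka-wakka", "go-ha"]:
    print(word, is_doubled(word))
```

The last word, "go-ha", is rejected because the text after the hyphen is not a literal copy of what subexpression 1 matched.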

In some applications, subexpressions are used to extract substrings. For example, Emacs has the functions match-beginning and match-end, which report the positions of strings matched by subexpressions. These functions use the same numbering scheme for subexpressions as backreferences, with the additional rule that subexpression 0 is defined to be the whole regexp.
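The same idea can be sketched outside Emacs. The example below uses Python's re module as a stand-in (an assumption of this sketch, not part of Rx or Emacs): the start and end methods of a match object play roughly the role of match-beginning and match-end, and group 0 refers to the whole match. The sample text is invented for illustration.

```python
import re

text = "say wakka-wakka now"

# Python groups are written ( ... ); \1 refers back to subexpression 1.
m = re.search(r"(.*)-\1", text)

# Subexpression 0 is the whole match; subexpression 1 is the parenthesized part.
whole = text[m.start(0):m.end(0)]
first = text[m.start(1):m.end(1)]

print(whole)
print(first)
```

Slicing the original string with the reported positions recovers the same substrings that m.group(0) and m.group(1) return directly.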

In some applications, subexpressions are used in string substitution. This again uses the backreference numbering scheme. For example, this sed command:

s/From:.*<\(.*\)>/To: \1/

first matches the line:

From: Joe Schmoe <schmoe@uspringfield.edu>

When it does, subexpression 1 matches "schmoe@uspringfield.edu". The command then replaces the matched text (here, the entire line) with "To: \1", after substituting the text matched by subexpression 1 for \1, to get:

To: schmoe@uspringfield.edu
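For comparison, here is a minimal sketch of the same substitution using Python's re.sub. This is an assumption made for illustration, not how sed is implemented; Python writes the group as ( ... ) instead of \( ... \), but the \1 in the replacement follows the same numbering.

```python
import re

line = "From: Joe Schmoe <schmoe@uspringfield.edu>"

# \1 in the replacement stands for the text matched by subexpression 1.
result = re.sub(r"From:.*<(.*)>", r"To: \1", line)

print(result)  # To: schmoe@uspringfield.edu
```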
