Sometimes a regular expression appears to be ambiguous. For example, if we compare the pattern:
begin\|beginning
to the string
beginning
either just the first 5 characters will match, or the whole string will match.
In every case like this, the longer match is preferred. The whole string will match.
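The longest-match preference can be sketched in Python. This is only an illustration, not the matcher's actual implementation: the helper name longest_alternative_match is hypothetical, and it only handles a top-level choice between simple alternatives. Python's own re module is used for contrast, since it takes the first alternative that succeeds rather than the longest one (and writes alternation as | rather than \|).

```python
import re


def longest_alternative_match(alternatives, text):
    """Return the longest prefix of text matched by any alternative, or None.

    Hypothetical helper: tries each alternative separately at the start of
    text and keeps the longest result, mimicking the longest-match rule.
    """
    best = None
    for alt in alternatives:
        m = re.match(alt, text)
        if m and (best is None or len(m.group(0)) > len(best)):
            best = m.group(0)
    return best


# Python's backtracking matcher stops at the first alternative that works.
first_wins = re.match(r"begin|beginning", "beginning").group(0)

# The longest-match rule described above prefers the whole string.
longest_wins = longest_alternative_match(["begin", "beginning"], "beginning")

print(first_wins)
print(longest_wins)
```

Here first_wins covers only the first 5 characters, while longest_wins covers the whole string, which is the result the rule above calls for.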
Sometimes there is ambiguity not about how many characters to match, but about where the subexpressions occur within the match. This can affect extraction functions like Emacs' match-beginning or rewrite functions like sed's s command. For example, consider matching the pattern:
b\([^q]*\)\(ing\)?
against the string
beginning
One possibility is that the first subexpression matches "eginning" and the second is skipped. Another possibility is that the first subexpression matches "eginn" and the second matches "ing".
The rule is that, consistent with matching as many characters as possible, the length of lower-numbered subexpressions is maximized in preference to maximizing the length of later subexpressions.
In the case of the above example, the two possible matches are equal in overall length. Therefore, it comes down to maximizing the lower-numbered subexpression, \1. The correct answer is that \1 matches "eginning" and \2 is skipped.
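The tie-breaking rule can be checked by brute force. The sketch below is an assumption-laden illustration, not how a real matcher works: it hard-codes this one pattern (a literal b, a run of non-q characters as \1, and an optional ing as \2), enumerates every way the pattern can match at the start of the string, and then ranks the candidates by overall length first, then by the length of \1, then \2. The function names candidate_matches and preferred_match are hypothetical.

```python
def candidate_matches(text):
    """Enumerate every way b([^q]*)(ing)? can match a prefix of text.

    Yields (overall, group1, group2); group2 is None when it is skipped.
    """
    if not text.startswith("b"):
        return
    rest = text[1:]
    # Longest run of non-'q' characters available to subexpression 1.
    limit = 0
    while limit < len(rest) and rest[limit] != "q":
        limit += 1
    for n in range(limit + 1):
        g1 = rest[:n]
        yield "b" + g1, g1, None                 # subexpression 2 skipped
        if rest.startswith("ing", n):
            yield "b" + g1 + "ing", g1, "ing"    # subexpression 2 matches


def preferred_match(text):
    """Pick the candidate favored by the rule: longest overall match,
    then longest \\1, then longest \\2."""
    def rank(candidate):
        overall, g1, g2 = candidate
        return (len(overall), len(g1), len(g2 or ""))
    return max(candidate_matches(text), key=rank, default=None)


print(preferred_match("beginning"))
```

For the string beginning, both ("beginning", "eginn", "ing") and ("beginning", "eginning", None) appear among the candidates with the same overall length, and the ranking selects the second because its \1 is longer.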