

Ambiguous Patterns

Sometimes a regular expression appears to be ambiguous. For example, suppose we compare the pattern:

begin\|beginning

to the string

beginning

Either just the first 5 characters ("begin") will match, or the whole string will match.

In every case like this, the longer match is preferred. The whole string will match.
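Not every regex engine follows this rule. Python's `re` module, for instance, is a backtracking matcher that takes the first alternative that succeeds, so it stops at "begin". The sketch below contrasts that behavior with a brute-force longest-match search. It assumes Python syntax, where alternation is written as a plain `|` rather than `\|`. The helper `longest_prefix_match` is hypothetical and exists only for illustration. It is not a library function, and real matchers that implement the longest-match rule do not work this way.

```python
import re

def longest_prefix_match(pattern, text):
    """Hypothetical helper: longest match of pattern anchored at the start of text.

    Brute force: try every candidate end position from longest to shortest and
    ask re.fullmatch whether the whole pattern can match exactly that prefix.
    The first success is therefore the longest match.
    """
    for end in range(len(text), -1, -1):
        m = re.fullmatch(pattern, text[:end])
        if m is not None:
            return m
    return None

# Python's re takes the first alternative that works, so "begin" wins.
first_alternative = re.match(r"begin|beginning", "beginning")
print(first_alternative.group(0))  # begin

# The longest-match rule picks the whole string instead.
longest = longest_prefix_match(r"begin|beginning", "beginning")
print(longest.group(0))  # beginning
```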

Sometimes the ambiguity is not about how many characters to match, but about where the subexpressions occur within the match. This can affect extraction functions like Emacs' match-beginning or rewrite functions like sed's s command. For example, consider matching the pattern:

b\([^q]*\)\(ing\)?

against the string

beginning

One possibility is that the first subexpression matches "eginning" and the second is skipped. Another possibility is that the first subexpression matches "eginn" and the second matches "ing".

The rule is that, consistent with matching as many characters as possible, the length of lower-numbered subexpressions is maximized in preference to the length of later subexpressions.

In the case of the above example, the two possible matches are equal in overall length. Therefore, it comes down to maximizing the lower-numbered subexpression, \1. The correct answer is that \1 matches "eginning" and \2 is skipped.
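The selection rule can be made concrete by listing every way the pattern can match and then ranking the candidates. The Python sketch below does this by brute force. `parses` and `choose` are hypothetical helpers written by hand for this one pattern (`b([^q]*)(ing)?` in Python notation), not a general matcher. Each candidate is a tuple of (overall match length, \1, \2), with `None` standing for a skipped \2. Candidates are ranked by overall length first, then by the length of \1, then by the length of \2.

```python
def parses(text):
    """Hypothetical helper: yield (match_length, group1, group2) for every way
    the example pattern b([^q]*)(ing)? can match a prefix of text.

    group2 is None when the second subexpression is skipped.
    """
    if not text.startswith("b"):
        return
    for end1 in range(1, len(text) + 1):
        g1 = text[1:end1]
        if "q" in g1:
            break  # [^q]* cannot extend past a q, so longer \1 values fail too
        yield (end1, g1, None)           # \2 skipped
        if text.startswith("ing", end1):
            yield (end1 + 3, g1, "ing")  # \2 matches "ing"

def choose(candidates):
    # Longest overall match first; ties go to the longer \1, then the longer \2.
    return max(candidates, key=lambda p: (p[0], len(p[1]), len(p[2] or "")))

all_parses = list(parses("beginning"))
print([p for p in all_parses if p[0] == 9])  # [(9, 'eginn', 'ing'), (9, 'eginning', None)]
print(choose(all_parses))                    # (9, 'eginning', None)
```

Both full-length candidates cover all 9 characters, so the tie is broken by \1, as the rule describes.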

