Go to the first, previous, next, last section, table of contents.

Rx Theory

There are two match algorithms. One is for truly regular regexps (those that can be reduced to a dfa). The other is for non-regular regexps.

The dfa algorithm implements the idea suggested in Compilers by Aho, Sethi and Ullman:

[One] approach [to pattern matching regular expressions] is to use a DFA, but avoid constructing all of the transition table by using a technique called "lazy transition evaluation". Here, transitions are computed at run time [when] actually needed. [T]ransitions are stored in a cache. [....] If the cache becomes full, we can erase some previously computed transition to make room for the new transition.

The implementation in Rx is generalized from that, but the above description covers what is used for Posix patterns.

The non-dfa algorithm implements a "recursive decomposition" technique described in email by Henry Spencer. For a given pattern, this algorithm first checks to see if a simpler, superset language, DFA-pattern matches. If it does, then this algorithm does the detail-work to see if the non-DFA pattern matches.

The detail work usually involves recursing on subpatterns. For example, a concatentation of two subexpressions matches a string if the string can be divided into two parts, each matching one subexpression, in the right order. More than one solution is often possible for a given pattern. This ambiguity is the subject of the "leftmost longest" rules in the spec, and the back-tracking oriented stream-of-solution functions rx_make_solutions, rx_next_solution and rx_free_solutions.

rxspencer.[ch]	 			-- The non-DFA algorithm
rxanal.[ch] rxsuper.[ch] rxnfa.[ch]	-- The DFA algorithm

Go to the first, previous, next, last section, table of contents.