performance

Pattern matching obviously comes at a price: there is some overhead involved in matching, with respect to the equivalent hand-coded tests and accesses to sub-data. For custom notations to be usable in practice, it is a very important question whether the performance price to be paid is acceptable.

linear & non-linear matching

When talking performance, it is important to know that there are two kinds of patterns: linear and non-linear: in linear patterns, each metavariable can occur only once; in non-linear patterns, a metavariable may occur several times, standing for the same value. Thus, linear patterns are a subset of non-linear patterns.

For instance, the non-linear pattern [%x,%x] stands for lists of length two containing the same element twice.

Correspondingly, there are two kinds of resulting substitutions in myPatterns, both implementing the abstract class Subst: LinearSubst and NonLinearSubst.

There are several differences between these:

a variable in a non-linear substitution cannot be bound twice to different values; on the contrary, a linear substitution keeps no track whether a variable is already bound or not
a non-linear substitution provides a meaningful toString() method, and can thus be printed easily; a linear substitution prints just as a hexadecimal address in memory
a linear substitution is more basic, but also considerably more efficient

myPatterns can work in either linear or non-linear mode, as selected by method Matchbox.setLinear() (for interpreted matching), and by the kind of substitutions used (for parsed matching). The recommended usage is:

during development and debugging, always use non-linear substitutions, that are always safe and can be printed easily
when turning to production mode, switch to linear substitutions for efficiency if the patterns are guaranteed to be linear

benchmark

Since myPatterns generalize regex pattern matching of strings, let us take this as a reference.

The overhead of regex matching in Java (as provided by the java.util.regex package) can be easily measured using a simple benchmark. On our platform, we obtain an overhead of 10 - 13 times(!) over a hand-crafted version that doesn't use regular expressions.

Matching with myPatterns on a very matching-intensive benchmark (red-black trees) incurs an overhead of 2 - 4 times when patterns are parsed, and 4 - 6 times when patterns are interpreted, in comparison to hand-crafted code that doesn't use pattern matching. Thus, with respect to regex matching, myPatterns offer excellent speed. Of course, in this example, the patterns did not contain any regex; for patterns containing regexes, the additional overhead of myPatterns would likely be small in comparison to the overhead of regex matching.

This excellent speed as compared with regexes recommends custom notations for any practical application. Thus, most programmers can benefit from the extra generality related to matching general data structures, especially when combined with custom notations.

Up: myPatterns tutorial