performance
Pattern matching obviously comes at a price: there is some overhead
involved in matching, with respect to the equivalent hand-coded tests
and accesses to sub-data. For custom notations to be usable in
practice, it is a very important question whether the performance
price to be paid is acceptable.
linear & non-linear matching
When talking performance, it is
important to know that there are two kinds of patterns: linear and
non-linear: in linear patterns, each metavariable can occur only once;
in non-linear patterns, a metavariable may occur several times,
standing for the same value. Thus, linear patterns are a subset of
non-linear patterns.
For instance, the non-linear pattern [%x,%x] stands for lists of
length two containing the same element twice.
Correspondingly, there are two kinds of resulting substitutions in
myPatterns, both implementing the abstract class Subst: LinearSubst
and NonLinearSubst.
There are several differences between these:
- a variable in a non-linear substitution cannot be bound twice to
different values; on the contrary, a linear substitution keeps no
track whether a variable is already bound or not
- a non-linear substitution provides a meaningful toString()
method, and can thus be printed easily; a linear substitution
prints just as a hexadecimal address in memory
- a linear substitution is more basic, but also considerably
more efficient
myPatterns can work in either linear or non-linear mode, as selected
by method Matchbox.setLinear() (for interpreted matching), and by the
kind of substitutions used (for parsed matching). The recommended
usage is:
- during development and debugging, always use non-linear
substitutions, that are always safe and can be printed easily
- when turning to production mode, switch to linear substitutions
for efficiency if the patterns are guaranteed to be linear
benchmark
Since myPatterns generalize regex pattern matching of strings, let
us take this as a reference.
The overhead of regex matching in Java (as provided by the
java.util.regex package) can be easily measured using
a simple benchmark. On our platform, we
obtain an overhead of 10 - 13 times(!) over a hand-crafted version
that doesn't use regular expressions.
Matching with myPatterns on a very matching-intensive benchmark
(red-black trees) incurs an overhead of 2
- 4 times when patterns are parsed, and 4 - 6 times when patterns are
interpreted, in comparison to hand-crafted code that doesn't use
pattern matching. Thus, with respect to regex matching, myPatterns
offer excellent speed. Of course, in this example, the patterns did
not contain any regex; for patterns containing regexes, the additional
overhead of myPatterns would likely be small in comparison to the
overhead of regex matching.
This excellent speed as compared with regexes recommends custom
notations for any practical application. Thus, most programmers can
benefit from the extra generality related to matching general data
structures, especially when combined with custom notations.
Up: myPatterns tutorial