The match command can be called with the following syntax:
match [-t] tree [pattern]
Arguments:
tree is a parse tree given
as a multi-level list of concrete syntax tokens and sub-trees. Lists
are written using parentheses, and whitespace as delimiters. Tokens
are strings delimited by simple quotes.
For example, “(('count')
'=' ('count') '+' ('1') ';')” may represent a parse tree for the
C/C++/Java code fragment “count = count + 1;”, depending on how
a particular tool parses this fragment.
pattern is an unparsed
pattern, represented as a string.
Unparsed patterns are code
fragments written in the native syntax of the “subject” language
to be matched except that (1) meta-variables may replace subtrees in
the corresponding parse tree, and (2) meta-parentheses may
optionally group sub-constructs corresponding to such subtrees.
Meta-variables consist of a single letter preceded by a '%'.
Meta-parentheses are noted “%(“ and “%)”. For example
“a[i]=0;”, “%x = %x + 1;”, “%f(%l)”, and “%f(%(%x,
%y%))”, are unparsed patterns corresponding to some C/C++/Java
code fragments.
Options:
-t (trace): prints a trace of the matching steps
A parse tree t is said to match an unparsed pattern p under a substitution sigma, mapping the (meta-)variables in p to subtrees of t, if the flattened form of the tree, TXT(t), computed by enumerating all the tokens in t in depth-first in-order, is equal to p in which any meta-variable has been substituted by the unparsed form of its mapped subtree. In other words, t matches p iff TXT(t) = p[TXT(sigma(xi)/xi].
When both a parse tree and a pattern are provided, the tree is matched against the pattern. This may either fail, if no match was found, or succeed with a computed substitution. For instance, matching the tree “(('i') '=' ('i') '+' ('1') ';')” with the pattern “%x = %x + 1;” succeeds with substitution {x --> ( 'i' )}.
The unparsed pattern matching algorithm is incomplete, in the sense that the match may not be found even it there is one. However, completeness can be obtained by adding sufficiently many meta-parentheses in the pattern. The trivial way is to add meta-parentheses around each sub-construct of the fragment. Of course, this trivial solution is inconvenient because it makes unparsed patterns very difficult tor read. The unparsed matching algorithm uses a lookahead of one token to limit the number of required meta-parentheses to only a few sub-constructs.
The minimally parenthesized unparsed pattern corresponding to a classic tree pattern t, noted trans(t), can be computed by invoking match with a parse tree pattern (a parse tree containing meta-variables), and providing no unparsed pattern. Meta-variables in the parse tree are noted by single letters preceded by a '%' (and not surrounded by quotes). The computed unparsed pattern is guaranteed to match any instance of the parse tree pattern, i.e., any tree obtained by substituting meta-variables in the tree by (ground) parse trees. For instance, for the tree pattern “(%x '=' (%x '+' %y))”, the minimally parenthesized unparsed pattern is “%x = %x + %y”; for the tree pattern “(%x '=' ((%x '-' %y) '-' %z))”, the corresponding unparsed pattern is “%x = %(%(%x - %y%) - %z%)”.
Examples:
$
match "(('a') '=' ((('a') '-' (('b') '*' ('c'))) '-' ('d')))"
"%x = %y - %z"
yes, sigma = { x<-( 'a' ) y<-( (
'a' ) '-' ( ( 'b' ) '*' ( 'c' ) ) ) z<-( 'd' ) }.
$
match "(('a') '=' ((('a') '-' (('b') '*' ('c'))) '-' ('d')))"
"%x = %x - %y - %z"
no.
$
match "(%x '=' ((%x '-' %y) '-' %z))"
trans(t)="%x
= %(%(%x - %y%) - %z%)"
ok, sigma = { x<-mvar( '%x' ) y<-mvar( '%y' ) z<-mvar( '%z' ) }
$
match "(('a') '=' ((('a') '-' (('b') '*' ('c'))) '-' ('d')))"
"%x = %(%(%x - %y%) - %z%)"
yes, sigma = { x<-( 'a' )
y<-( ( 'b' ) '*' ( 'c' ) ) z<-( 'd' ) }.