Unparsed pattern matching is a new paradigm for source code pattern matching that is very easy to implement in existing program manipulation tools such as: compilers, model checkers, code inspectors, refactoring tools, tools for legacy code understanding and transformation, etc.
Source code patterns serve to easily search for code fragments having a specific form. They are used in many different tools for program manipulation. Source code patterns in these tools are generally written in a specific notation for syntax trees, either private to the tool, or publicly known, such as LISP or XML. Writing such source code patterns usually require an intimate knowledge of the syntax tree representation within that tool and of its notation.
Concrete syntax patterns are a specific kind of source code patterns that are very conveniently expressed in the native syntax of the "subject" programming language: they are simply source code fragments with holes. Thus, concrete syntax patterns are trivial to write and understand by any programmer. They are also portable, because they are independent of the syntax tree representation within different tools. However, they are suprisingly difficult to implement efficiently. That is the reason why just a few tools implement traditional concrete syntax patterns without imposing severe restrictions on their form.
Unparsed patterns are a new form of concrete syntax patterns that are not only easy to use, but also very easy to implement:
Although written in concrete syntax, they can be efficiently matched without being parsed. By avoiding the parsing, they are much easier to implement, because (1) they don't require any parser generator framework or tool and (2) they don't require extending the grammar of the subject language in order to accomodate fragments and holes, and (3) their only dependency on the subject language to be matched is an unparsing function, easy to write or even generate automatically.
They are completely language-independent with respect to the host language, which represents them as plain strings. As a result, once implemented, they can be used within any host language.
Unparsed patterns have first been implement in an extensible version of the GCC compiler, called mygcc. They are now available within the free myPatterns matching library.
Publications:
N. Volanschi, C. Rinderknecht. “Unparsed Patterns: Easy User-Extensibility of Program Manipulation Tools (Extended Version)”. In ACM SIGPLAN 2008 Workshop on Partial Evaluation and Program Manipulation (PEPM '08), San Francisco, January 2008.
Prototype
Contact us. Last updated on 30/5/2010.