the predefined notation

The predefined notation in myPatterns is a superset of the standard JSON notation (JavaScript Object Notation).

Using this notation, one can write, for instance:

  import java.util.HashMap;

  HashMap<String, Integer> t = new HashMap<String, Integer>();
  t.put("a", 11);
  t.put("b", 22);
  Subst s = match(t, "{b:%x,a:%y}"); // binds %x to 22 (the value of b) and %y to 11 (the value of a)
  if (s != null) use(s.get('x'), s.get('y')); // retrieve the bound variables

As this example shows, the main matching function is match(). It takes some data and a pattern written as a string, and returns a substitution that binds the pattern variables to parts of the data (its "sub-data"). If the data does not match the pattern, match() returns null.

Any type of object can be matched using this notation, which is described in detail below.

base types

The predefined notations for base types (Number, Boolean, and String) are simply their native notations. More precisely, their notation is the one returned by the standard Java method toString(), except for strings, whose content must be wrapped in single or double quotes.

For instance, number 123 is matched by the pattern "123", the boolean true is matched by "true", and the string 'who am I' is matched by both "'who am I'" and "\"who am I\"".

Note that none of these patterns contains variables. Therefore, a successful match on a base type with such a pattern always returns the empty substitution: not null, but a substitution that contains no bindings.
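To make the difference between "no match" (null) and "match without bindings" (empty substitution) concrete, here is a minimal sketch in plain Java of the base-type rules described above. It is not the myPatterns implementation: the class BaseMatchSketch and the method matchBase are hypothetical, and a substitution is modeled as a plain Map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of base-type matching; not the myPatterns implementation.
// A substitution is modeled as a Map; null means "no match".
class BaseMatchSketch {

    static Map<String, Object> matchBase(Object data, String pattern) {
        if (data instanceof String) {
            String s = (String) data;
            // Strings must appear within single or double quotes in the pattern.
            if (pattern.equals("'" + s + "'") || pattern.equals("\"" + s + "\"")) {
                return new HashMap<>(); // success, but no variables to bind
            }
            return null;
        }
        if (data instanceof Number || data instanceof Boolean) {
            // Numbers and booleans use the notation returned by toString().
            if (pattern.equals(data.toString())) {
                return new HashMap<>();
            }
            return null;
        }
        return null; // other types are outside the scope of this sketch
    }

    public static void main(String[] args) {
        System.out.println(matchBase(123, "123"));                  // {}
        System.out.println(matchBase(true, "true"));                // {}
        System.out.println(matchBase("who am I", "'who am I'"));    // {}
        System.out.println(matchBase("who am I", "\"who am I\""));  // {}
        System.out.println(matchBase(123, "124"));                  // null
    }
}
```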

As an extension to JSON, strings may also be matched against regular expressions (regexes), written between slash ("/") characters. As in standard regex matching, the result of matching a string against a regex pattern is an array of values, one for each "capturing subgroup" (a sub-pattern within parentheses "(...)"). This array is represented as a substitution that binds the indices 0, 1, ... to the captured values.

For instance, the same string "who am I" can be matched by the patterns "/who/", "/^who (am|are|is)/", or "/^who (\\S+) (\\S+)$/". The first match returns the empty substitution {}, the second returns the substitution {0:"am"}, and the third returns the substitution {0:"am",1:"I"}. Note that, unlike in traditional regex matching, the first element of the resulting array is not the string matched by the whole regex. To obtain that string, wrap the whole regex in a capturing group.
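The regex behavior above can be approximated with the standard java.util.regex package. This is a minimal sketch, not the myPatterns implementation: RegexMatchSketch and matchRegex are hypothetical names, the regex is passed without the surrounding slashes, and it assumes the regex may match anywhere in the string (as with Matcher.find()), which is consistent with "/who/" matching "who am I".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of regex-pattern matching on strings.
// Assumption: unanchored search semantics, as with Matcher.find().
class RegexMatchSketch {

    // Binds 0, 1, ... to capturing groups 1, 2, ...; group 0 (the whole match)
    // is deliberately left out. Returns null if the regex does not match.
    static Map<Integer, String> matchRegex(String data, String regex) {
        Matcher m = Pattern.compile(regex).matcher(data);
        if (!m.find()) {
            return null;
        }
        Map<Integer, String> subst = new HashMap<>();
        for (int g = 1; g <= m.groupCount(); g++) {
            subst.put(g - 1, m.group(g));
        }
        return subst;
    }

    public static void main(String[] args) {
        String s = "who am I";
        System.out.println(matchRegex(s, "who"));                  // {}
        System.out.println(matchRegex(s, "^who (am|are|is)"));     // {0=am}
        System.out.println(matchRegex(s, "^who (\\S+) (\\S+)$"));  // {0=am, 1=I}
        // Wrapping the whole regex in a group recovers the full match at index 0.
        System.out.println(matchRegex(s, "(^who (\\S+))"));        // {0=who am, 1=am}
    }
}
```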

objects and maps

The notation for objects smoothly generalizes the JSON notation:

the pattern "{fld1:%x1,...,fldN:%xN}" matches any object containing at least the (visible) fields fld1...fldN, and binds the pattern variables to the values of the respective fields; in particular, the pattern "{}" matches any object

Exactly the same notation can be used for any object implementing the Map interface. In this case, the keys play the role of fields.

See the example at the start of this page.
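For the Map case, the "at least these keys" rule can be sketched in a few lines of plain Java. This is an illustration only, not the myPatterns implementation: MapMatchSketch and matchMap are hypothetical, the pattern is assumed to be already parsed into a map from key to variable name, and matching of ordinary object fields (which would need reflection) is not covered.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of "{k1:%v1,...,kN:%vN}" matching on a Map, with the
// pattern pre-parsed into key -> variable name (no string parsing here).
class MapMatchSketch {

    // Returns the bindings, or null if a key required by the pattern is missing.
    // Extra keys in the data are ignored, so the empty pattern matches any map.
    static Map<String, Object> matchMap(Map<?, ?> data, Map<String, String> keyToVar) {
        Map<String, Object> subst = new HashMap<>();
        for (Map.Entry<String, String> e : keyToVar.entrySet()) {
            if (!data.containsKey(e.getKey())) {
                return null;
            }
            subst.put(e.getValue(), data.get(e.getKey()));
        }
        return subst;
    }

    public static void main(String[] args) {
        Map<String, Integer> t = new HashMap<>();
        t.put("a", 11);
        t.put("b", 22);

        // Corresponds to the pattern "{b:%x,a:%y}".
        Map<String, String> pattern = new LinkedHashMap<>();
        pattern.put("b", "x");
        pattern.put("a", "y");

        System.out.println(matchMap(t, pattern));                       // {x=22, y=11}
        System.out.println(matchMap(t, new HashMap<String, String>())); // {}
    }
}
```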

lists

Any object implementing the List interface can be matched with a smooth generalization of the JSON notation for lists/arrays:

the pattern "[%x1,...,%xN]" matches any list of length N, and the N pattern variables are bound to the elements of the list, in their order; in particular, the pattern "[]" matches any empty list
the pattern "[%x1,...,%xN-1|%xN]" matches any list of length at least N-1; the first N-1 pattern variables are bound to the first N-1 elements of the list, in their order, and the last variable is bound to the rest of the list (the possibly empty subsequence starting with the N-th element).
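The two list rules differ only in the length check: exact length without a rest variable, minimum length with one. The following plain-Java sketch illustrates them; it is not the myPatterns implementation. ListMatchSketch and matchList are hypothetical, the pattern is assumed to be pre-parsed into a list of variable names plus an optional rest variable (null when the pattern has no "|"), and substitutions are modeled as Maps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Vector;

// Hypothetical sketch of "[%x1,...,%xN]" and "[%x1,...,%xK|%rest]" matching.
class ListMatchSketch {

    static Map<String, Object> matchList(List<?> data, List<String> vars, String restVar) {
        int k = vars.size();
        // No rest variable: length must be exactly k. With one: at least k.
        if (restVar == null ? data.size() != k : data.size() < k) {
            return null;
        }
        Map<String, Object> subst = new HashMap<>();
        for (int i = 0; i < k; i++) {
            subst.put(vars.get(i), data.get(i)); // bind elements in order
        }
        if (restVar != null) {
            // The rest is a copy of the remaining elements; it may be empty.
            subst.put(restVar, new ArrayList<>(data.subList(k, data.size())));
        }
        return subst;
    }

    public static void main(String[] args) {
        Vector<Integer> v = new Vector<>(Arrays.asList(1, 2, 3));
        System.out.println(matchList(v, Arrays.asList("x"), "t"));            // {t=[2, 3], x=1}
        System.out.println(matchList(v, Arrays.asList("x", "y", "z"), "t"));  // {t=[], x=1, y=2, z=3}
        System.out.println(matchList(v, Arrays.asList("x", "y", "z"), null)); // {x=1, y=2, z=3}
        System.out.println(matchList(v, Arrays.asList("x", "y"), null));      // null
    }
}
```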

For instance, the following matches on a Vector object:

  Vector v = new Vector();
  v.add(1);
  v.add(2);
  v.add(3);
  System.out.println(match(v, "[%x|%t]"));
  System.out.println(match(v, "[%x,%y|%t]"));
  System.out.println(match(v, "[%x,%y,%z|%t]"));
  System.out.println(match(v, "[%x,%y,%z]"));

print the following resulting substitutions:

{ t:[2, 3] x:1 }
{ t:[3] x:1 y:2 }
{ t:[] x:1 y:2 z:3 }
{ x:1 y:2 z:3 }

composing the predefined notations

Of course, the predefined notations can be freely composed by replacing any variable in a pattern with a sub-pattern. When such a composed pattern is matched, each sub-pattern is recursively matched against the corresponding sub-data.

For instance, in the following code:

  Vector v2 = new Vector();
  v2.add("Johnson");
  v2.add("Albertson");
  v2.add("Smithson");
  s = match(v2, "[/(.*)son/,/(.*)son/,/(.*)son/]");

the resulting substitution s is {0:"John", 1:"Albert", 2:"Smith"}. That is, each sub-pattern (all of which happen to be regular expressions) was matched against the corresponding vector element.
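The recursion in this example can be sketched for the special case of a list whose sub-patterns are all regexes. This is a plain-Java illustration, not the myPatterns implementation: ComposeSketch and matchListOfRegexes are hypothetical, regexes are written without slashes, and the rule that capture indices keep counting across sub-patterns is inferred from the substitution shown above rather than from a documented specification.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a composed pattern "[/r1/,...,/rN/]": each regex
// sub-pattern is matched against the corresponding list element.
class ComposeSketch {

    static Map<Integer, String> matchListOfRegexes(List<String> data, List<String> regexes) {
        if (data.size() != regexes.size()) {
            return null; // a list pattern without "|" needs exactly N elements
        }
        Map<Integer, String> subst = new HashMap<>();
        int next = 0; // assumption: capture indices continue across sub-patterns
        for (int i = 0; i < data.size(); i++) {
            Matcher m = Pattern.compile(regexes.get(i)).matcher(data.get(i));
            if (!m.find()) {
                return null; // one failing sub-match makes the whole match fail
            }
            for (int g = 1; g <= m.groupCount(); g++) {
                subst.put(next++, m.group(g));
            }
        }
        return subst;
    }

    public static void main(String[] args) {
        List<String> v2 = Arrays.asList("Johnson", "Albertson", "Smithson");
        List<String> p = Arrays.asList("(.*)son", "(.*)son", "(.*)son");
        System.out.println(matchListOfRegexes(v2, p)); // {0=John, 1=Albert, 2=Smith}
    }
}
```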

summary

Without any initial investment, anyone can use pattern matching on any object type through the predefined notations in myPatterns. These notations are either exactly the native notations of the language (for numbers and booleans) or backward-compatible extensions of them (for strings, objects, and arrays).
