a custom notation for red-black trees

Consider the elegant functional implementation of red-black trees described by Okasaki, which heavily exploits pattern matching in functional languages to simplify the (otherwise very complicated) insertion algorithm. We can easily obtain a very similar implementation in Java with the help of myPatterns, by defining a custom notation for red-black trees.

Red-black trees are binary search trees whose nodes contain values, and are colored black or red, subject to some coloring constraints. They can be implemented in Java as objects of the following class rbTree:

public class rbTree {
    final static int red = 0, black = 1;
    Integer color, value;
    rbTree left, right;

    rbTree(int color, rbTree left, int value, rbTree right) {
        this.color = color;
        this.left = left;
        this.value = value;
        this.right = right;
    }
}

In plain Java (without pattern matching of data structures), the core of the red-black tree implementation, the balance() method, which re-balances a tree upon the insertion of a new node, has to be coded as follows.

rbTree balance() {
    if(color == black && left != null && left.left != null &&
       left.color == red && left.left.color == red)
        return new rbTree(red, new rbTree(black, left.left.left, left.left.value, 
                          left.left.right), 
                          left.value, 
                          new rbTree(black, left.right, value, right));
    if(color == black && left != null && left.right != null &&
       left.color == red && left.right.color == red)
        return new rbTree(red, new rbTree(black, left.left, left.value, 
                          left.right.left), 
                          left.right.value, 
                          new rbTree(black, left.right.right, value, right));
    if(color == black && right != null && right.left != null &&
       right.color == red && right.left.color == red)
        return new rbTree(red, new rbTree(black, left, value, right.left.left), 
                          right.left.value, 
                          new rbTree(black, right.left.right, right.value, 
                          right.right));
    if(color == black && right != null && right.right != null &&
       right.color == red && right.right.color == red)
        return new rbTree(red, new rbTree(black, left, value, right.left), 
                          right.value, 
                          new rbTree(black, right.right.left, right.right.value, 
                          right.right.right));
      return this;
}

As can be seen, the code is very verbose (although much more concise than a standard imperative implementation). This degree of verbosity makes writing such a function tedious to write and error-prone.

To demonstrate the advantage of using custom patterns, let us define the following notation for red-black trees:

(%l %v %r) denotes a tree whose root is colored red and contains the value v, whose left subtree is l, and whose right subtree is r
[%l %v %r] denotes a tree whose root is colored black and contains the value v, whose left subtree is l, and whose right subtree is r

Note how the color of a tree is compactly encoded in the kind of surrounding parentheses.

This notation can be easily implemented by the following matches() method:

public boolean matches(String pat, ParsePosition pos, Subst sub) {
    return Matchbox.matchChar((color == black? '[': '('), pat, pos) &&
        Matchbox.matchData(left, pat, pos, sub) &&
        Matchbox.matchChar(' ', pat, pos) &&
        Matchbox.matchData(value, pat, pos, sub) &&
        Matchbox.matchChar(' ', pat, pos) &&
        Matchbox.matchData(right, pat, pos, sub) &&
        Matchbox.matchChar((color == black? ']': ')'), pat, pos);
}

Using the above notation, the balance() method shown above can be reduced as much as follows:

static final String pats[] = {"[((%a %x %b) %y %c) %z %d]",
                              "[(%a %x (%b %y %c)) %z %d]",
                              "[%a %x ((%b %y %c) %z %d)]",
                              "[%a %x (%b %y (%c %z %d))]"};

rbTree balance2() {
    Subst s = Matchbox.matchAny(this, pats);
    if(s != null)
        return new rbTree(red, 
                          new rbTree(black, (rbTree)s.get('a'), (Integer)s.get('x'), (rbTree)s.get('b')), 
                          (Integer)s.get('y'), 
                          new rbTree(black, (rbTree)s.get('c'), (Integer)s.get('z'), (rbTree)s.get('d')));
    return this;
}

Note that pattern matching in this version of the balance() method is not done using the standard function match(), but using a disjunctive version of it, matchAny(), which tries a whole list of patterns until the first match.

a parsed notation

The same notation for red-black trees can be implemented in parsed form by the following matcher() method and matcher class:

public static Matcher matcher(String pat, ParsePosition pos) throws ParseFail {
    Matcher res = Matchbox.parseRef(pat, pos);
    if(res != null) return res;
    int off = pos.getIndex();
    char c = pat.charAt(off); 
    if(c != '[' && c != '(') throw new ParseFail();
    Integer color = (c == '[')? black: red;
    pos.setIndex(off + 1);
    Matcher fun1 = matcher(pat, pos);
    Matchbox.parseChar(' ', pat, pos);
    Matcher fun2 = Matchbox.integerMatcher(pat, pos);
    Matchbox.parseChar(' ', pat, pos);
    Matcher fun3 = matcher(pat, pos);
    Matchbox.parseChar((color == black)? ']': ')', pat, pos);
    return new TreeMatcher(color, fun1, fun2, fun3);
}

class TreeMatcher implements Matcher {
    Integer color;
    Matcher fun1, fun2, fun3;
    TreeMatcher(Integer color, Matcher fun1, Matcher fun2, Matcher fun3) {
        this.color = color;
        this.fun1 = fun1;
        this.fun2 = fun2;
        this.fun3 = fun3;
    }
    public boolean match(Object obj, Subst sub) {
        if(!(obj instanceof rbTree)) return false;
        rbTree data = (rbTree)obj;
        return
            data.color == color &&
            fun1.match(data.left, sub) &&
            fun2.match(data.value, sub) &&
            fun3.match(data.right, sub);
    }
}

The matchers resulting from the parsing phase are stored in a static member of class rbTree to ensure that the patterns are parsed only once, and are used as follows in the parsed version of the balance() method:

static Matcher[] matchers = {null, null, null, null};
static {
    try { 
        matchers[0] = matcher("[((%a %x %b) %y %c) %z %d]", new ParsePosition(0));
        matchers[1] = matcher("[(%a %x (%b %y %c)) %z %d]", new ParsePosition(0));
        matchers[2] = matcher("[%a %x ((%b %y %c) %z %d)]", new ParsePosition(0));
        matchers[3] = matcher("[%a %x (%b %y (%c %z %d))]", new ParsePosition(0));
    } catch(ParseFail err) { System.out.println("Internal error: bad predefined patterns"); }
}

rbTree balance3() {
    LinearSubst s = sub;
    for(int i = 0; i < 4; i++) {
        if(matchers[i].match(this, s)) {
            return tree(red, 
                    tree(black, (rbTree)s.get('a'), (Integer)s.get('x'), (rbTree)s.get('b')), 
                    (Integer)s.get('y'), 
                    tree(black, (rbTree)s.get('c'), (Integer)s.get('z'), (rbTree)s.get('d')));
        }
    }
    return this;
}

assessment

As far as performance is concerned, we benchmarked the three versions of the red-black trees (the complete code is here) and obtained a slowdown factor (with respect to the first version using no pattern matching) of 2 (when using parsed notations) and of 6 (when using interpreted notations). The overhead of parsed notations is excellent when compared with the overhead of standard regular expression matching in Java.

As can be seen, the interpreted notation for red-black trees is trivial to implement, and the parsed notation is only slightly more complex to write, while the payoff in performance is clearly worth it.