Computing the nearest common ancestor

This post is about computing the nearest common ancestor.  It is the result of a month or so of reading papers and implementation.  Although there has been a lot of research into the problem, there are no implementations online of the two main algorithms presented.  Worse, examples in the papers are practically useless.  For example, the Alstrup-Gavoille-Kaplan-Rauhe (AGKR) algorithm was first described in a technical report in August 2001, but it did not contain an example.  In 2002, it was presented at a conference but again it did not contain an example.  Finally, in 2004, it was published in a peer-reviewed journal and contained an example, albeit still incomplete.

Nearest common ancestor

Imagine meeting another person who has the same last name as you and suggests the possibility that you are related.  You and your acquaintance create your own family trees and ask whether you have any common ancestors.  If you are related, which relative is the most recent common ancestor?  This is the problem of the nearest common ancestor (NCA), which is also known as the least common ancestor, or lowest common ancestor, first presented by Aho et al. [1].

The nearest common ancestor problem begins with a rooted general tree, G = (V, E, r ∈ V), and two nodes in the tree.  The set of ancestors of node is the set of nodes in the parent chain that starts at the node and ends at the root.  The set of ancestors a(x) of a node x ∈ V is

a(x) = (x = r) ? {x} : {x} ∪ a(Parent(x))

The proper ancestors of a node is the set of nodes in the parent chain that starts at the parent of the node and ends at the root, i.e., the set of ancestors for the node but not including the node itself.  The set of proper ancestors p(x) of a node x ∈V is

p(x) = (x = r) ? ∅ : {Parent(x)} ∪ p(Parent(x))

The set of common ancestors ca(x, y) of node x, y ∈V is

ca(x, y) = a(x) ∩ a(y)

The nearest common ancestor nca(x, y) = w if w is a common ancestor of x and y with maximal depth d(w).  So, if ca(x, y) is the set of nodes v1, v2, v3, …, vk, then vk is the least common ancestor if d(v1) < d(v2) < d(v3) < … < d(vk).

nca(x, y) = vk where vk ∈ ca(x, y) = { v1, v2, v3, …, vk } and d(vn) < d(vk) for n ≠ k

In Figure 1, the ancestors of e is the set {a, b, e}, and the ancestors of m is the set {a, d, m}.  The proper ancestors of e is the set {a, b}, and the proper ancestors of m is the set {a, d}.  The common ancestors of nodes e and m is {a}; the nearest common ancestor of nodes e and m is node a.

The nearest common ancestor problem has many different applications: compiler construction [1], string search algorithms [2], and evolutionary trees [3].  It also has applications in graph drawing in the Walker algorithm [4] and planarity testing [5, 6], which we will discuss later on.

Figure 1.  A rooted general tree with ancestors of two leafs.

nca-three-by-three

Simple solution for the nearest common ancestor

The simplest solution to the nearest common ancestor problem is computed using two stacks.  In a loop, the ancestors for one node are pushed onto one stack.  In a second loop, the ancestors for the other node are pushed onto the other stack.  Finally, the top entries of each stack are popped and compared.  If the top entries are the same, then the ancestors are the same, and we note the ancestor in a temporary variable.  However, if the top entries differ, the nearest common ancestor has been found, and it is retained in the temporary variable.

While this algorithm works fine for many applications, the issue can become probative in computational time when the nearest common ancestor is computed for many different node pairings.  It computes the nearest common ancestor in O(n) time.

Implementation in C#


1     public Vertex NcaNaive(Vertex x, Vertex y)
2     {
3         Stack<Vertex> s1 = new Stack<Vertex>();
4         Stack<Vertex> s2 = new Stack<Vertex>();
5         Vertex lca = null;
6
7         do
8         {
9             s1.Push(x);
10            x = x.Parent;

11        } while (x != null);
12        do
13        {
14            s2.Push(y);
15            y = y.Parent;
16        } while (y != null);
17        while (s1.Top() == s2.Top())
18        {
19            lca = s1.Pop();
20            s2.Pop();
21        }
22        return lca;
23    }

Each “do-while” block (lines 7-11 and 12-16) computes a list of ancestors in order from a node to the root of the tree.  The “while” block at the end of the method (lines 17-21) compares each list of ancestors, processing each node from the root to the children until there is a difference in the ancestor list.  The last node that is equal is the nearest common ancestor, and is the temporary variable lca.

Schieber-Vishkin solution for the nearest common ancestor

The Schieber-Vishkin algorithm is probably the first solution for the nearest common ancestor that is considered efficient, simple, and understandable.  It is an extension of an earlier but more complicated solution by Harel and Tarjan [7].  It computes the nearest common ancestor in O(1) time after preprocessing the tree once in O(n) time.  Both of these solutions are based on the reduction of the tree into a complete binary tree that has been labeled with an inorder numbering, and is also known as a “sideways heap” [8].  The complete binary tree has several interesting properties which we will now illustrate by example (see Figure 2).

Figure 2.  A full binary tree of 31 nodes, inorder numbering.

Sideways-heap

The first property concerns the node inorder numbering and the depth of the node in the complete binary tree.  If the label of a particular node in represented as a binary integer, the level of the node in the tree can be determined by the position of the rightmost 1-bit in the label.  In Figure 2, node 19 has the binary representation 10011, and it is a leaf node because the rightmost 1-bit is in the 1’s position.  Node 20 has the binary representation 10100, and it is two levels up from the leaves of the tree because the rightmost 1-bit is in the 4’s position.  We define the height of a node relative to the leaves of the tree, height(v) as the following function:

height(v) = ⌊ log2(v & –v) ⌋

where & is the bitwise-and binary operator, and ⌊ x ⌋ is the integer floor function.

Using this formula, height(3) = ⌊ log2(00011 & -00011) ⌋ = ⌊ log2(00011 & -11101) ⌋ = ⌊ log2(00001) ⌋ = 0 (i.e., it is a leaf node).  The height(20) = ⌊ log2(10100 & -10100) ⌋ = ⌊ log2(10100 & -01100) ⌋ = ⌊ log2(00100) ⌋ = 2 (i.e., two nodes up from a leaf node).

A second property concerns the ancestor of a node and the difference in the height between the node and its ancestor.  Given a node v and the difference in the height h, the ancestor is

a(v, h) = ((v >> h) | 1) << h

Several other examples: the ancestor of node 3 with height h = 4 is a(3,0) = ((3 >> 4) | 1) << 4 = ((0) | 1) << 4 = 1 << 4 = 16; the ancestor of node 3 (00011) with height h = 1 is a(3,1) = ((3 >> 1) | 1) << 1 = ((1) | 1) << 1 = 1 << 1 = 2; and, the ancestor of node 3 with height h = 0 is a(3,0) = ((3 >> 0) | 1) << 0 = ((3) | 1) << 0 = 3 << 0 = 3.

Because of these two properties, it is possible to compute the nearest common ancestor for two nodes in the complete binary tree for any two nodes x and y using the following formula:

Let h = ⌊ log2(x & –y) ⌋ where the label for x is less than or equal to the label for y, then

NCAcbt(x, y) = ((x >> h) | 1) << h

We now turn our attention back to the nearest common ancestor problem for a general tree.  Given a tree T = (V, E, r ∈V), the Schieber-Vishkin algorithm requires the following functions to be computed in a preprocessing step:

1)      Preorder: V → â„•

Function Preorder(v) maps nodes of the tree into integers that are the preorder numbers of the nodes in the tree.

2)      Max: V → â„•

Function Max(v) maps nodes of the tree into the maximum preorder number of a node in the subtree for v.

3)      Inlabel: V → â„•

Function Inlabel(v), known as the inlabel of a node, maps nodes of the tree into integer labels which correspond to the inorder numbers for nodes in the complete binary tree.  Function Inlabel(v) is defined as:

Inlabel(v) = ⌊ log2(Preorder(v) & – Max(v)) ⌋

4)      Ascendant: V → â„•

Function Ascendant(v), known as the ascendant of a node, maps nodes of the tree into integers.  The ascendant is a bit map defined according to the following expression:

Ascendant(v) = Ascendant(Parent(v)) | (Inlabel(v) & -Inlabel(v)), where Ascendant(r) = 0

The idea of the Ascendant(v) mapping is that it encodes the inlabel numbers of all ancestors of node v. A 1-bit in Ascendant(v) encodes the height of an ancestor of the node; a 0-bit indicates there is no ancestor at that level in the complete binary tree.

5)      Head:  â„•  → V

Function Head(v), known as the head of a node, maps an integer label that is the result of a simple calculation for the nearest common ancestor in the complete binary tree into a node in the general tree.  Function Head(v) is defined as:

Head(x) = v such that Inlabel(v) ≠ Inlabel(Parent(v)) ∩ Inlabel(v) = x

Another way to understand this definition is that Head(x) is the node in the general tree that has the node that is an ancestor of x but has a different inlabel value as x.

The Inlabel(v) relationship maps a node in the general tree to a node in the complete binary tree.  It is important to understand that the mapping Inlabel(v) is not a unique mapping, i.e., two nodes in the general tree can map to the same node in the complete binary tree.  In order to uniquely identify the nearest common ancestor of two nodes in the general tree, Ascendant(v), and Head(v) are needed.  Inlabel(v) maintains two properties which are crucial for the algorithm to compute the nearest common ancestor.  First, Inlabel(v) partitions paths in the general tree, which are known as inlabel paths.  All of these paths partition the general tree into distinct paths.  This property is known as the path-partition property.  Second, a node d that is a descendant of node p in the general tree map to nodes Inlabel(d) and Inlabel(p) in the complete binary tree, respectively, such that Inlabel(d) is a descendant of (or the same as) Inlabel(p).  This is known as the descendance-preservation property.

The preprocessing step can be done in two passes over the general tree in a depth-first search.  During the traversal, the values for Preorder(v), Max(v), Inlabel(v), and Head(v) are computed.  Preorder(v) can be computed using a global variable for the node visited, incremented after associating the value with the node.  Max(v) is computed after traversing a node’s entire subtree, associating the value of the global variable (minus 1 since it is post-incremented).  After computing Preorder(v) and Max(v), Inlabel(v) can then be computed using its definition because all dependencies have been computed.  Head(v) can be computed in a couple different ways.  However, one method is to assign the node to an array indexed by the Inlabel(v) value.  As the depth-first search returns, the array entry is replaced with nodes in a particular inlabel path higher up in the general tree.  In the second pass, function Ascendant(v) can be computed, using information from the previous pass (using Ascendant(Parent(v)) and Inlabel(v)).

The nearest common ancestor is computed in the following algorithm.



// Let x and y be the two nodes in the general tree.
// Compute the height of the nearest common ancestor in the complete binary tree:
let height(x,y) =  ⌊ log2(Inlabel(x) & -Inlabel(y)) ⌋
// Note, if Inlabel(x) is greater than Inlabel(y), then reverse the assignments of x and y.

// Compute the true height of the nearest common ancestor in the general tree:
let k = Ascendant(x) & Ascendant(y) & (1 << height(x, y)), and trueheight(x, y) =  ⌊ log2(k & -k) ⌋.
// Compute two candidates for the nearest common ancestor.
let j = ((Inlabel(x) >> trueheight(x, y)) | 1) << trueheight(x, y).
if j = Inlabel(x) then let xˆ = x
else
let L =  ⌊ log2(Ascendant(x) & ((1 << trueheight(x, y)) - 1) ⌊
x̂ = Parent(Head((Inlabel(x) >> L) | 1) << L)) 
if j = Inlabel (y) then let   ŷ  = y
else
let L =  ⌊ log2(Ascendant(y) & ((1 << trueheight(x, y)) - 1) ⌋
ŷ = Parent(Head((Inlabel(y) >> L) | 1) << L)) 
let NCASV(x,y) = z, where z = xif Preorder( x̂ ) < Preorder( ŷ ), else  ŷ.

asd

Implementation in C#



1    using System;
2    using System.Collections.Generic;
3    using System.Text;
4
5    namespace GeneralTreeLayout
6    {
7        class NCA
8        {
9            Tree tree;
10           Dictionary<Vertex, int> Preorder;
11           Dictionary<Vertex, int> Inlabel;
12           Dictionary<Vertex, int> Ascendant;
13           Dictionary<int, Vertex> Head;
14
15           public NCA(Tree t)
16           {
17               this.tree = t;
18               this.Preorder = new Dictionary<Vertex, int>();
19               this.Inlabel = new Dictionary<Vertex, int>();
20               this.Ascendant = new Dictionary<Vertex, int>();
21               this.Head = new Dictionary<int, Vertex>();
22           }
23
24           public void Preprocess()
25           {
26               this.PreorderTraverse(this.tree.Root);
27               this.ComputeAscendants(this.tree.Root, 0);
28           }
29
30           int number = 1;
31
32           public void PreorderTraverse(Vertex vertex)
33           {
34               int low = number;
35               this.Preorder.Add(vertex, number++);
36               foreach (Edge edge in vertex.Successors)
37               {
38                   this.PreorderTraverse(edge.To);
39               }
40               int high = number - 1;
41               int lh = (high & -low);
42               int h = this.FloorLog2((uint)lh);
43               int inlabel = ((high >> h) | 1) << h;
44               this.Inlabel.Add(vertex, inlabel);
45               if (!this.Head.ContainsKey(inlabel))
46                   this.Head.Add(inlabel, vertex);
47               else
48                   this.Head[inlabel] = vertex;
49           }
50
51           void ComputeAscendants(Vertex vertex, int pascendant)
52           {
53               int inlabel = this.Inlabel[vertex];
54               int lsb = inlabel & -inlabel;
55               int ascendant = pascendant | lsb;
56               this.Ascendant.Add(vertex, ascendant);
57               foreach (Edge edge in vertex.Successors)
58               {
59                   this.ComputeAscendants(edge.To, ascendant);
60               }
61           }
62
63           public Vertex ComputeNCA(Vertex x, Vertex y)
64           {
65               int inlabelx = this.Inlabel[x];
66               int inlabely = this.Inlabel[y];
67               int xy = (inlabely & -inlabelx);
68               int ch = FloorLog2((uint)xy);
69               int ancestor = ((inlabely >> ch) | 1) << ch;
70               int ax = this.Ascendant[x];
71               int ay = this.Ascendant[y];
72               int k = ax & ay & -(1 << ch);
73               int th = FloorLog2((uint)(k & -k));
74               int j = ((inlabelx >> th) | 1) << th;
75               Vertex xhat = null;
76               Vertex yhat = null;
77               if (j == inlabelx)
78                   xhat = x;
79               else
80               {
81                   int l = this.FloorLog2((uint)( ax & ((1 << th) - 1)));
82                   int anc = ((inlabelx >> l) | 1) << l;
83                   xhat = this.Head[anc].Parent;
84               }
85               if (j == inlabely)
86                   yhat = y;
87               else
88               {
89                   int l = this.FloorLog2((uint)( ay & ((1 << th) - 1)));
90                   int anc = ((inlabely >> l) | 1) << l;
91                   yhat = this.Head[anc].Parent;
92               }
93               Vertex z = null;
94               if (this.Preorder[xhat] <= this.Preorder[yhat])
95                   z = xhat;
96               else
97                   z = yhat;
98               return z;
99           }
100
101          int FloorLog2(uint v)
102          {
103              uint x = 0;
104              while ((v = (v >> 1)) != 0)
105              {
106                  x++;
107              }
108              return (int)x;
109          }
110      }
111  }

Example

Figure 3.  Finding the nearest common ancestor using the Schieber-Vishkin algorithm.

(a)

LCA1

(b)

LCA4

(c)

LCA5

Consider the example tree displayed in Figure 3(a).  Compute the nearest common ancestor of node F and node N.

First, label each node in the tree using a preorder numbering (line 26, and line 32 through 49).  These results are shown in Table 1.  For each node, the method PreorderTraverse also calculates the Inlabel(v) values.  The Head(v) mapping is also calculated.  These results are shown in Table 2.

Table 1.  Shrieber-Vishkin computation for Figure 3

Node v P(v) M(v) I(v) A(v)
A 1 18 16 16
B 2 18 16 16
C 3 6 4 20
D 7 12 8 24
E 13 18 16 16
F 4 4 4 20
G 5 5 5 21
H 6 6 6 22
I 8 11 8 24
J 12 12 12 28
K 14 17 16 16
L 18 18 18 18
M 9 9 9 25
N 10 10 10 26
O 11 11 11 25
P 15 15 15 17
Q 16 16 16 16
R 17 17 17 17

Table 2.  Shrieber-Vishkin head(x) for Figure 3

Complete binary tree node x H(x)
4 C
5 G
6 H
9 M
10 N
11 O
8 D
12 J
15 P
16 A
17 R
18 L

This method uses the function FloorLog2(v), which calculates the expression ⌊ log2(v) ⌋.  (Note, there are several other ways to compute the expression that are faster than the one show here).  Next, the Ascendant mapping is computed (line  27 and lines 51 through 61).

The function ComputeNCA computes the nearest common ancestor (lines 63 through 99).  For nodes F and N, the height of the nearest common ancestor in the complete binary tree is height(F, N) = ⌊ log2(Inlabel(F) & –Inlabel(N)) ⌋ = 4.  The true height is first found via k = Ascendant(F) & Ascendant(N) & (1 << height(F,  N)) = 20 & 26 & (1 << 4) = 10100 & 11010 & 10000 = 10000 = 16.  Then trueheight(F, N) = ⌊ log2(16 & -16) ⌋ = 3.  Notice that the height in the general tree is different from the height in the complete binary tree.  Next, compute the candidates for the NCA.  Let j = ((Inlabel(F) >> trueheight(F, N)) | 1) << trueheight(F, N) = ((4 >> 3) | 1) << 3 = 16.  The results of computing  and  both are node B.  So, NcaSV(F, N) = B.

Alstrup-Gavoille-Kaplan-Rauhe solution for the nearest common ancestor

The Alstrup-Gavoille-Kaplan-Rauhe (AGKR) algorithm [9, 10] is another solution to the nearest common ancestor problem, and is similar to the Schieber-Vishkin algorithm.  After preprocessing the general tree in O(n) time, the solution can be computed in O(1) time.  The main difference between the AGKR algorithm and the Schieber-Vishkin algorithm is that the Schieber-Vishkin algorithm maps the general tree into a complete binary tree, whereas the AGKR algorithm does not.  Instead, the AGKR algorithm is based on the computation of a common prefix between the labels of the two nodes in the general tree.

The AGKR algorithm creates labels for each node in the general tree by partitioning the nodes and edges by classify each as either light or heavy, a scheme that was introduced by Harel and Tarjan [7].  A heavy node is a node has the greatest subtree size of all its siblings.  For each internal node v, we pick a child w of v, where size(w) = max{size(z) | z ∈children(v)} and classify w as heavy.  All other nodes are light nodes.  All nodes that have no siblings are light nodes, and the root is defined to be a light node.  A heavy path is an edge from a node of any class to a node of heavy class.  All other paths are light paths.

Given a tree T = (V, E, r ∈V), the AGKR algorithm defines the following functions:

1)      Size: V → â„•

Function Size(v) returns the size of the entire subtree of a node.

2)      IsHeavy: V → bool

Function IsHeavy(v) partitions nodes of the general tree into heavy (denoted by true) or light (denoted by false):

IsHeavy(v) = true if  |Children(Parent(v))| > 1 and [ ∀ c ∈Children(Parent(v)), where c ≠ v and Size(c) ≤ Size(v)], else false.

In other words, a node v is function heavy if there is exists a sibling of v, and all siblings of v have a subtree size less than the subtree size for v.

3)      Apex: V → V

Function Apex(v) finds the apex of a node (see below).

4)      LightLabel: V → â„•

Function LightLabel(v) returns the light label for a node.  The light label of a node is:

LightLabel(v) = Size(v)

5)      HeavyLabel: V → â„•

Function HeavyLabel(v) returns the heavy label for a node.  The heavy label of a node is:

HeavyLabel(v) = Size(v) – Size(h), where h Î Children(v) and IsHeavy(h) = true

6)      FullLabel: V → â„•

Function FullLabel(v) maps nodes of the tree into integer labels.  Labels for the node are binary strings that are aligned to the leftmost (most significant) bit of an integer.  The integer labels are scaled to the appropriate size so that they fit into an appropriate integer unit on a machine (e.g., 32-bit unsigned integers).

7)      Mask: V → â„•

Function Mask(v) maps nodes of the tree into integer masks.  The mask is a binary string that corresponds to the label FullLabel(v) and notes portions of the label that are either light or heavy, explained below.

8)      Reverse:  â„• → V

Function Reverse(v) maps an integer that is the result of a simple calculation for the nearest common ancestor into a node in the tree.  The mapping should always exist, since the two nodes in the nearest common ancestor problem are in the tree.

In the preprocessing step, the AGKR algorithm first classifies nodes and edges of the tree as either light or heavy, a scheme that was introduced by Harel and Tarjan [7].  A heavy node is a node that must have at least one sibling and the node has the greatest subtree size of all its siblings.  All other nodes are light nodes.  The nearest ancestor a of a node v in which node a is light and node v is heavy is known as an apex node.  The sequence of heavy nodes from the apex node is known as a heavy path.  The heavy path starts at a light node, called the apex.  Using this partitioning, two types of labels are constructed for each node: a heavy label, and a light label.  The heavy and light labels are encoded in binary strings by considering the labels in a sequence:

a)       For heavy label encoding, the sequence is formed from the heavy labels for nodes in the heavy path.  For example, the heavy labels for the heavy path A->B->C are encoded by considering the labels in sequence: HeavyLabel(A), HeavyLabel(B), HeavyLabel(C).  There is only one encoding for each heavy label because the heavy node can belong only to one heavy path.  The encoding for heavy labels of light nodes is the empty string.

b)      For light label encoding, the sequence is formed from the light labels for sibling nodes that are light nodes.  For example, supposed the children of node A are B, C, and D.  Node C is a heavy node, and nodes B and D are light nodes.  The sequence for encoding the light labels is: LightLabel(B), LightLabel(D).  The encoding for light labels of heavy nodes is the empty string.

The encoding is calculated as follows:

1)      Create a sequence yi, I = 1 to k, from the integer values, either from the heavy labels, or the light children of a node. For example: heavy labels for heavy chain of nodes A->B->D->I->M are 1, 11, 2, 3, 1 (see Figure 4).

2)      Compute si = ∑ yj, j=1 .. i, and i=1 .. k-1, s0 = 0.  From the example in the previous step, si = 0, 1, 12, 14, 17, 18.

3)      Compute fi = ⌊ log2(yi) ⌋, i = 1 .. k.   From the example in the previous steps, fi = 0, 1, 8, 2, 2, 1.

4)      Compute zi, i = 0 .. k:

zi = (si-1 mod 2fi = 0) ? si-1 : si-1 – (si-1 mod 2fi) + 2fi

From the example in the previous steps, zi = 0, 0, 8, 12, 14, 17.

5)      Compute m = max(zi) for all i = 0 .. k.  From the example in the previous steps, m = 17.

6)      Compute w = ⌊ log2(m) ⌋ + 1.  From the example in the previous steps, w = 5.

7)      Compute encoding ei = (zi >> fi) encoded with w – fi bits, i = 1 .. k.  From the example in the previous steps, the encodings for the labels in step 1 are 00000, 01, 0110, 0111, 10001.

We define functions EncodedHeavyLabel(v) and EncodedLightLabel(v) as the encoded heavy and light labels for node v, respectively.

Computing the NCA.

Each label for a node uniquely identifies all of the edges from the root to the node.  The encodings constructed are a combination of the labels down a “heavy” path in the tree and across the “light” edges for siblings of a node.  For any node, the encoding is defined as

FullLabel(v) = FullLabel(Parent(Apex(v)) · EncodedLightLabel(Apex(v)) · EncodedHeavyLabel(v)

Given two nodes x and y that are in the tree the NCA is computed by computing the bitwise-AND between the two full labels and noting the location of the first difference from the left end of the labels.  The difference between the two labels can occur either within a light substring or within a heavy substring.

If the difference occurs in the light label substrings, then let FullLabel(x) = L0 H0 L1 H1 L2 H2 …Li-1 Hi-1 Li … and FullLabel(y) = L0 H0 L1 H1 L2 H2 …Li-1 Hi-1 L’i …, where Li ¹ L’I.  Then FullLabel(NCA(x, y)) = L0 H0 L1 H1 L2 H2 …Li-1 Hi-1.

If the difference occurs in the heavy label substrings, then let FullLabel(x) = L0 H0 L1 H1 L2 H2 …Li-1 Hi-1 Li Hi … and FullLabel(y) = L0 H0 L1 H1 L2 H2 …Li-1 Hi-1 Li H’i …, where Hi ¹ H’I.  Then FullLabel(NCA(x, y)) = L0 H0 L1 H1 L2 H2 …Li-1 Hi-1 min(H1, H’1).  Note that the last substring of the label for the NCA is the minimum substring via alphabetic ordering.

Mask(v) is used to indicate which bits in the label are part of the heavy or light substrings.  The NCA is computed by the expression Reverse(FullLabel(NCA(x, y))).

Implementation in C#


1   using System;
2   using System.Collections.Generic;
3   using System.Linq;
4   using System.Text;
5
6   namespace GeneralTreeLayout
7   {
8       class Alstrup : NCA
9
10          Dictionary<Vertex, int> Number;
11          List<Vertex> heavy;
12          List<Vertex> light;
13          Dictionary<Vertex, Vertex> apex;
14          List<Vertex> UniqueApex;
15          Dictionary<Vertex, int> hlabel;
16          Dictionary<Vertex, int> llabel;
17          Dictionary<Vertex, String> lightLabelEncodingString;
18          Dictionary<Vertex, String> heavyLabelEncodingString;
19          Dictionary<Vertex, String> labelEncodingString;
20          Dictionary<Vertex, uint> labelEncodingBinary;
21          Dictionary<Vertex, uint> kEncodingBinary;
22
23          public Alstrup(Tree t)
24              : base(t)
25          {
26              this.Number = new Dictionary<Vertex,int>();
27              this.heavy = new List<Vertex>();
28              this.light = new List<Vertex>();
29              this.apex = new Dictionary<Vertex, Vertex>();
30              this.UniqueApex = new List<Vertex>();
31              this.hlabel = new Dictionary<Vertex, int>();
32              this.llabel = new Dictionary<Vertex, int>();
33              this.lightLabelEncodingString = new Dictionary<Vertex, string>();
34              this.heavyLabelEncodingString = new Dictionary<Vertex, string>();
35              this.labelEncodingString = new Dictionary<Vertex, string>();
36              this.labelEncodingBinary = new Dictionary<Vertex, uint>();
37              this.kEncodingBinary = new Dictionary<Vertex, uint>();
38          }
39
40          override public void Preprocess()
41          {
42              this.ComputeSize(this.tree.Root);
43              this.light.Add(this.tree.Root);
44              this.Partition(this.tree.Root);
45              this.ComputeApex(this.tree.Root, this.tree.Root);
46              foreach (Vertex v in this.UniqueApex)
47              {
48                  this.ComputeHeavyLabels(v);
49              }
50              this.ComputeLightLabels(this.tree.Root);
51              foreach (Vertex v in this.UniqueApex)
52              {
53                  this.EncodeHeavyLabels(v);
54              }
55              this.lightLabelEncodingString.Add(this.tree.Root, "");
56              this.EncodeLightLabels(this.tree.Root);
57              this.ComputeFullLabels(this.tree.Root);
58          }
59
60          private int ComputeSize(Vertex v)
61          {
62              // Count the number of children in the whole subtree.
63              int sum = 1;
64              foreach (Edge e in v.Successors)
65              {
66                  Vertex c = e.To;
67                  sum += this.ComputeSize(c);
68              }
69              this.Number.Add(v, sum);
70              return sum;
71          }
72
73          private void ComputeBinaryLabels(Vertex v, String s)
74          {
75              // Convert s into a binary and create a binary for h/l substring
76              // boundaries.
77              uint bin = 0;
78              uint hl = 0;
79              uint hlbin = 0;
80              int len = 0;
81              for (int i = 0; ; ++i)
82              {
83                  if (i == s.Length)
84                      break;
85                  if (s[i] == '1')
86                  {
87                      bin = bin << 1;
88                      bin |= 1;
89                      len++;
90                      hlbin = hlbin << 1;
91                      hlbin |= hl;
92                  }
93                  else if (s[i] == '0')
94                  {
95                      bin = bin << 1;
96                      len++;
97                      hlbin = hlbin << 1;
98                      hlbin |= hl;
99                  }
100                 else if (s[i] == 'L')
101                 {
102                     hl = 0;
103                 }
104                 else if (s[i] == 'H')
105                 {
106                     hl = 1;
107                 }
108             }
109             // Shift binaries all the way to the MSB, and
110             // set up hl to note end of label.
111             int diff = 32 - len;
112             hl = (hl == 0) ? (uint)1 : (uint)0;
113             for (int i = 0; i < diff; ++i)
114             {
115                 bin = bin << 1;
116                 hlbin = hlbin << 1;
117                 hlbin |= hl;
118             }
119             this.labelEncodingBinary.Add(v, bin);
120             this.kEncodingBinary.Add(v, hlbin);
121         }
122
123         private void ComputeFullLabels(Vertex v)
124         {
125             // Compute the labelEncodingString for a node v.
126             Vertex ap = this.apex[v];
127             Vertex parent = ap.Parent;
128             String t = "";
129             if (parent != null)
130             {
131                 if (this.labelEncodingString.ContainsKey(parent))
132                     t = this.labelEncodingString[parent];
133             }
134             String ll = "";
135             if (this.lightLabelEncodingString.ContainsKey(ap))
136                 ll = this.lightLabelEncodingString[ap];
137             String hl = "";
138             if (this.heavyLabelEncodingString.ContainsKey(v))
139                 hl = this.heavyLabelEncodingString[v];
140             String total = t + "L" + ll + "H" + hl;
141             this.labelEncodingString.Add(v, total);
142             this.ComputeBinaryLabels(v, total);
143             foreach (Edge e in v.Successors)
144             {
145                 this.ComputeFullLabels(e.To);
146             }
147         }
148
149         private void Partition(Vertex v)
150         {
151             // Find child with max size.
152             int maxsize = 0;
153             Vertex max = null;
154             foreach (Edge e in v.Successors)
155             {
156                 Vertex to = e.To;
157                 int value = this.Number[to];
158                 if (value > maxsize)
159                 {
160                     maxsize = value;
161                     max = to;
162                 }
163             }
164             if (max != null)
165             {
166                 this.heavy.Add(max);
167             }
168             foreach (Edge e in v.Successors)
169             {
170                 if (e.To != max)
171                 {
172                     this.light.Add(e.To);
173                 }
174             }
175             foreach (Edge e in v.Successors)
176             {
177                 this.Partition(e.To);
178             }
179         }
180
181         private void ComputeApex(Vertex v, Vertex a)
182         {
183             if (this.light.Contains(v))
184             {
185                 // A light node is an apex by definition.
186                 this.UniqueApex.Add(v);
187                 this.apex.Add(v, v);
188                 a = v;
189             }
190             else
191             {
192                 this.apex.Add(v, a);
193             }
194             foreach (Edge e in v.Successors)
195             {
196                 Vertex c = e.To;
197                 this.ComputeApex(c, a);
198             }
199         }
200
201         private void ComputeHeavyLabels(Vertex v)
202         {
203             int size = this.Number[v];
204             int h = 0;
205             foreach (Edge e in v.Successors)
206             {
207                 if (this.heavy.Contains(e.To))
208                 {
209                     h = this.Number[e.To];
210                     this.ComputeHeavyLabels(e.To);
211                     break;
212                 }
213             }
214             size -= h;
215             this.hlabel.Add(v, size);
216         }
217
218         private void ComputeLightLabels(Vertex v)
219         {
220             if (this.light.Contains(v))
221                 this.llabel.Add(v, this.Number[v]);
222             foreach (Edge e in v.Successors)
223             {
224                 this.ComputeLightLabels(e.To);
225             }
226         }
227
228         private void EncodeHeavyLabels(Vertex a)
229         {
230             List<Vertex> hp = this.GetHP(a);
231             List<int> hpHLabel = this.GetHeavyLabels(hp);
232             List<String> b = IntegerAlphabeticEncoding.Encoding(hpHLabel);
233             for (int i = 0; i < hp.Count; ++i)
234             {
235                 this.heavyLabelEncodingString.Add(hp[i], b[i]);
236             }
237         }
238
239         private List<Vertex> GetHP(Vertex v)
240         {
241             if (!this.UniqueApex.Contains(v))
242                 throw new Exception("Not an apex.");
243             List<Vertex> list = new List<Vertex>();
244             list.Add(v);
245             Vertex next = null;
246             do
247             {
248                 next = null;
249                 foreach (Edge e in v.Successors)
250                 {
251                     if (this.heavy.Contains(e.To))
252                     {
253                         next = e.To;
254                         list.Add(next);
255                         break;
256                     }
257                 }
258                 v = next;
259             } while (next != null);
260             return list;
261         }
262
263         private List<int> GetHeavyLabels(List<Vertex> list)
264         {
265             // Convert a list of vertices into a list of heavy labels.
266             List<int> newlist = new List<int>();
267             foreach (Vertex v in list)
268             {
269                 newlist.Add(this.hlabel[v]);
270             }
271             return newlist;
272         }
273
274         private void EncodeLightLabels(Vertex v)
275         {
276             List<Vertex> lightvertices = new List<Vertex>();
277             foreach (Edge e in v.Successors)
278             {
279                 if (this.light.Contains(e.To))
280                 {
281                     lightvertices.Add(e.To);
282                 }
283             }
284             List<int> lightlabels = this.GetLightLabels(lightvertices);
285             List<String> labels = IntegerAlphabeticEncoding.Encoding(lightlabels);
286             this.AssociateLLabels(lightvertices, labels);
287             foreach (Edge e in v.Successors)
288             {
289                 this.EncodeLightLabels(e.To);
290             }
291         }
292
293         private List<int> GetLightLabels(List<Vertex> list)
294         {
295             // Convert a list of vertices into a list of light labels.
296             List<int> newlist = new List<int>();
297             foreach (Vertex v in list)
298             {
299                 newlist.Add(this.llabel[v]);
300             }
301             return newlist;
302         }
303
304         private void AssociateLLabels(List<Vertex> vertices, List<String> labels)
305         {
306             for (int i = 0; i < vertices.Count; ++i)
307             {
308                 this.lightLabelEncodingString.Add(vertices[i], labels[i]);
309             }
310         }
311
312         override public Vertex ComputeNCA(Vertex x, Vertex y)
313         {
314             // NCA label to be found.
315             uint ncalabel = 0;
316             uint ncak = 0;
317
318             // Get labels for each vertex.
319             uint xs = this.labelEncodingBinary[x];
320             uint ys = this.labelEncodingBinary[y];
321
322             // Get heavy/light masks for each vertex.
323             // These are needed to know whether the first difference
324             // in the labels is on a heavy label substring, or a light
325             // label substring.
326             uint xk = this.kEncodingBinary[x];
327             uint yk = this.kEncodingBinary[y];
328
329             // Compute the common prefix and the location
330             // of the first difference from the left.
331             int h = Bithacks.FloorLog2(xs ^ ys);
332             int hk = Bithacks.FloorLog2(xk ^ yk);
333             if (h == 0)
334                 h = hk;
335             if (h == 0)
336                 return x;
337             uint v = (uint)(1 << h);
338             uint commonprefix = (xs >> h) << h;
339
340             // Compute if the bit that is different is heavy or light
341             // in either of the two labels.
342             // ... or both!
343             bool xheavy = (xk & v) != 0;
344             bool yheavy = (yk & v) != 0;
345
346             // Count number of heavy/light changes in each label.
347             // We need to know the first actual difference.
348             int xcount = 0;
349             {
350                 int i = 31;
351                 for (; i >= h; --i)
352                 {
353                     bool hv = (xk & (1 << i)) != 0;
354                     if (hv)
355                     {
356                         xcount++;
357                         for (; i >= h; --i)
358                             if ((xk & (1 << i)) == 0)
359                             {
360                                 xcount++;
361                                 break;
362                             }
363                     }
364                     else
365                     {
366                         xcount++;
367                         for (; i >= h; --i)
368                             if ((xk & (1 << i)) != 0)
369                             {
370                                 xcount++;
371                                 break;
372                             }
373                     }
374                 }
375             }
376             int ycount = 0;
377             {
378                 int i = 31;
379                 for (; i >= h; --i)
380                 {
381                     bool hv = (yk & (1 << i)) != 0;
382                     if (hv)
383                     {
384                         ycount++;
385                         for (; i >= h; --i)
386                             if ((yk & (1 << i)) == 0)
387                             {
388                                 ycount++;
389                                 break;
390                             }
391                     }
392                     else
393                     {
394                         ycount++;
395                         for (; i >= h; --i)
396                             if ((yk & (1 << i)) != 0)
397                             {
398                                 ycount++;
399                                 break;
400                             }
401                     }
402                 }
403             }
404
405             // Determine which is different, the L bits or the H bits.
406             int j;
407             bool xboundary, yboundary;
408             for (j = h + 1; j <= 31; j++)
409             {
410                 // Are we on a boundary between the heavy/light labels?
411                 uint vm1 = (uint)(1 << j);
412                 xboundary = ((xk & vm1) != 0 ? !xheavy : xheavy);
413                 yboundary = ((yk & vm1) != 0 ? !yheavy : yheavy);
414                 if (xboundary || yboundary)
415                     break;
416             }
417
418             // Determine what kind of boundary.
419             uint k;
420             bool isheavy;
421             if (xcount < ycount)
422             {
423                 k = xk;
424                 isheavy = xheavy;
425             }
426             else
427             {
428                 k = yk;
429                 isheavy = yheavy;
430             }
431
432             if (!isheavy)
433             {
434                 // L bits differ.
435                 // back up before L.
436                 uint xmask = (uint)-(1 << j);
437                 ncalabel = xs & xmask;
438                 ncak = xk & xmask;
439             }
440             else
441             {
442                 // H bits differ.
443                 // First, back up before H.
444                 ncalabel = xs & (uint)-(1 << j);
445
446                 // move forward picking alphabetic minimum H bits
447                 // until next L.
448                 uint xsm = xs;
449                 uint ysm = ys;
450                 uint xmask;
451                 uint ymask;
452                 {
453                     // Find specific masks to get prefix plus heavy label.
454                     // Scan ahead for 0-bit.
455                     int i;
456                     for (i = Bithacks.FloorLog2((uint)v); ; --i)
457                     {
458                         if ((xk & (1 << i)) == 0)
459                             break;
460                     }
461                     i++; // backup.
462                     // get mask of all 1's up to heavy label and apply to the label.
463                     xmask = (uint)-(1 << i);
464                     xsm &= xmask;
465                 }
466                 {
467                     // Find specific masks to get prefix plus heavy label.
468                     // Scan ahead for 0-bit.
469                     int i;
470                     for (i = Bithacks.FloorLog2((uint)v); ; --i)
471                     {
472                         if ((yk & (1 << i)) == 0)
473                             break;
474                     }
475                     i++; // backup.
476                     // get mask of all 1's up to heavy label and apply to the label.
477                     ymask = (uint)-(1 << i);
478                     ysm &= ymask;
479                 }
480                 // Got two labels, pick the one that is alphabetic.
481                 if (xsm < ysm)
482                 {
483                     ncalabel = xsm;
484                     ncak = xk & xmask;
485                 }
486                 else
487                 {
488                     ncalabel = ysm;
489                     ncak = yk & ymask;
490                 }
491             }
492
493             Vertex found = null;
494             foreach (KeyValuePair<Vertex, uint> match in this.labelEncodingBinary)
495             {
496                 if (match.Value == ncalabel)
497                 {
498                     if (this.kEncodingBinary[match.Key] == ncak)
499                     {
500                         found = match.Key;
501                         break;
502                     }
503                 }
504             }
505             return found;
506         }
507     }
508 }
509

Example

Figure 4. Finding the nearest common ancestor using the Alstrup-Gavoille-Kaplan-Rauhe algorithm.

LCA6

Consider the example tree displayed in Figure 4.  Compute the nearest common ancestor of nodes F and N.

In a series of passes over the tree, the algorithm performs preprocessing (lines 40-58): Size(v) at lines 42 and 60-71; IsHeavy(v) at lines 43, 44, and 149-179; Apex(v) at lines 45 and 181-199; HeavyLabel(v) at lines 46-49 and 201-216; LightLabel(v) at lines 50 and 218-226; EncodeHeavyLabel(v) at lines 51-54 and 228-272; EncodeLightLabel(v) at lines 56 and 274-310; and FullLabel(v) at lines 57 and 73-147.  The results of the computation are displayed in Table 3.

The NCA for nodes F and N is now computed (lines 312-508).  The full labels for the two nodes are found (lines 326 and 327).  FullLabel(F) = 580016.  FullLabel(N) = 700016.  FullLabel(NCA(F, N)) = FullLabel(F) & FullLabel(N) = 580016 & 700016 = 500016.  Mask(F) = D80016.  Mask(N) = F40016.  The most significant bit that is different is bit 13 (bit 0 is the least significant bit, bit 15 is the most significant), found at line 331.  This bit corresponds to a bit difference in the heavy label substring of FullLabel(N) but in the light label substring of FullLabel(F) (lines 343-344).  The number of heavy/light substring boundary crossings is next counted starting from the left edge of each full label, xcount = 2, ycount = 1 (lines 348-403). The bit that is different is on a heavy/light substring boundary for FullLabel(N), but not for FullLabel(F).  All this information leads one to conclude that the bit that is different is in the heavy label substring of FullLabel(N).  In this case, the minimum between the two heavy label substrings must be chosen (lines 441-491).  The labels are computed and compared (line 481) and the FullLabel(NCA(F, N)) = 400016.  Therefore, NCA(F, N) = Reverse(FullLabel(NCA(F, N))) = Reverse(4000) = B (line 500).

Table 3.  Alstrup-Gavoille-Kaplan-Rauhe computation for Figure 4.

Node, v Size(v) Class(v) Light label(v) Encoded Light label(v) Heavy label(v) Encoded Heavy label(v) Label definition Full Label with light and heavy boundary indicators Full Label (16-bit value) Heavy/light regions (16-bit value)
A 18 L 18 e 1 00000 L(A).H(A) LH00000 0000 F800
B 17 H 11 01 L(A).H(B) LH01 4000 C000
C 4 L 4 0 3 0 L(A).H(B).L(C).H(C) LH01L0H0 4000 D000
D 6 H 2 0110 L(A).H(D) LH0110 6000 F000
E 6 L 6 1 2 00 L(A).H(B).L(E).H(E) LH01L1H00 6000 D800
F 1 H 1 11 L(A).H(B).L(C).H(F) LH01L0H11 5800 D800
G 1 L 1 0 1 0 L(A).H(B).L(C).H(C).L(G).H(G) LH01L0H0L0H0 4000 D400
H 1 L 1 1 1 0 L(A).H(B).L(C).H(C).L(H).H(H) LH01L0H0L1H0 4800 D400
I 4 H 3 0111 L(A).H(I) LH0111 7000 F000
J 1 L 1 0 1 0 L(A).H(D).L(J).H(J) LH0110L0H0 6000 F400
K 4 H 3 01 L(A).H(B).L(E).H(K) LH01L1H01 6800 D800
L 1 L 1 0 1 0 L(A).H(B).L(E).H(E).L(L).H(L) LH01L1H00L0H0 6000 Da00
M 1 H 1 10001 L(A).H(M) LH10001 8800 F800
N 1 L 1 0 1 0 L(A).H(I).L(N).H(N) LH0111L0H0 7000 F400
O 1 L 1 1 1 0 L(A).H(I).L(O).H(O) LH0111L1H0 7800 F400
P 1 H 1 101 L(A).H(B).L(E).H(P) LH01L1H101 7400 Dc00
Q 1 L 1 0 1 0 L(A).H(B).L(E).H(K).L(Q).H(Q) LH01L1H01L0H0 6800 Da00
R 1 L 1 1 1 0 L(A).H(B).L(E).H(K).L(R).H(R) LH01L1H01L1H0 6c00 Da00

References

[1] Aho, A., Hopcroft, J. and Ullman, J. On finding lowest common ancestors in trees. SIAM J. Comput., 51976), 115-132.

[2] Gusfield, D. Algorithms on Stings, Trees, and Sequences: Computer Science and Computational Biology. ACM SIGACT News, 28, 4 1997), 41-60.

[3] Farach, M., Kannan, S. and Warnow, T. A robust model for finding optimal evolutionary trees. Algorithmica, 13, 1 1995), 155-179.

[4] Walker II, J. A Node-positioning Algorithm for General Trees. Software – Practice and Experience, 20, 7 1990), 685-705.

[5] Di Battista, G. and Tamassia, R. On-line planarity testing. SIAM Journal on Computing, 251996), 956-997.

[6] Westbrook, J. Fast incremental planarity testing. Springer, City, 1992.

[7] Harel, D. and Tarjan, R. Fast algorithms for finding nearest common ancestors. SIAM Journal on Computing, 131984), 338.

[8] Knuth, D. Combinatorial Algorithms. City, 2008.

[9] Alstrup, S., Gavoille, C., Kaplan, H. and Rauhe, T. Nearest common ancestors: A survey and a new distributed algorithm. ACM New York, NY, USA, City, 2002.

[10] Alstrup, S., Gavoille, C., Kaplan, H. and Rauhe, T. Nearest common ancestors: A survey and a new algorithm for a distributed environment. Theory of Computing Systems, 37, 3 2004), 441-456.

Leave a comment

Your email address will not be published.