Shape Indexing Using Approximate Nearest-Neighbour Search in High-Dimensional Spaces

Jeffrey S. Beis and David G. Lowe
Department of Computer Science
University of British Columbia
Vancouver, B.C., Canada V6T 1Z4
beis/[email protected]

Abstract

Shape indexing is a way of making rapid associations between features detected in an image and object models that could have produced them. When model databases are large, the use of high-dimensional features is critical, due to the improved level of discrimination they can provide. Unfortunately, finding the nearest neighbour to a query point rapidly becomes inefficient as the dimensionality of the feature space increases. Past indexing methods have used hash tables for hypothesis recovery, but only in low-dimensional situations. In this paper, we show that a new variant of the k-d tree search algorithm makes indexing in higher-dimensional spaces practical. This Best Bin First, or BBF, search is an approximate algorithm which finds the nearest neighbour for a large fraction of the queries, and a very close neighbour in the remaining cases. The technique has been integrated into a fully developed recognition system, which is able to detect complex objects in real, cluttered scenes in just a few seconds.

1. Introduction

Shape-based indexing uses feature vectors from an image to access an index structure, rapidly recovering possible matches to a database of object models. This procedure is much more efficient than the alternative of exhaustive comparison, at a cost of some offline preprocessing to create the index, and the extra memory required to store it. The goal of indexing is to recover from the index the most similar model shapes to a given image shape. In terms of feature vectors, or points in a feature space, this corresponds to finding a set of nearest neighbours (NN) to a query point. Virtually all previous indexing approaches in model-based vision [4, 5, 9, 10, 16, 18, 19] have used hash tables for this task. This is somewhat surprising, since it is well known in other communities (e.g. pattern recognition, algorithms) that tree structures do the job much more efficiently.
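For reference, the exhaustive comparison that indexing is designed to replace amounts to a linear scan over every stored feature vector. A minimal sketch (function and variable names are illustrative, not from the original system):

```python
import math

def nearest_neighbour(points, query):
    """Exhaustive nearest-neighbour search: compare the query
    against every stored point, costing O(n) per query."""
    best, best_d = None, float("inf")
    for p in points:
        d = math.dist(p, query)
        if d < best_d:
            best, best_d = p, d
    return best, best_d

points = [(0.1, 0.2), (0.8, 0.9), (0.4, 0.5)]
best, dist = nearest_neighbour(points, (0.35, 0.45))  # nearest stored point is (0.4, 0.5)
```

With a large model database this per-query scan dominates recognition time, which is what motivates precomputing an index structure offline.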
In large part this oversight can be explained by the fact that indexing techniques are generally applied in low-dimensional spaces, where hash table search can be quite efficient. Such spaces are adequate when the number of objects is small, but higher-dimensional feature vectors are essential when the model database becomes large, because they provide a much greater degree of discrimination. Unfortunately, nearest-neighbour search times depend exponentially on the dimension of the space.

One data structure that has been used extensively for NN lookup is the k-d tree [6]. While the "curse of dimensionality" is also a problem for k-d trees, the effects are not as severe. Hash table inefficiency is mainly due to the fact that bin sizes are fixed, whereas those in a k-d tree are adaptive to the local density of stored points. Thus, in some cases (high-density regions), a hashing approach will have to do a long linear search through many points contained in the bin in which the query point lands; in other cases (low-density regions), an extensive search through adjacent bins may be required before the best neighbour can be determined. In addition, there is the difficulty of choosing an appropriate bin size.

In a previous paper [2], we presented an indexing method which uses k-d trees to recover a small set of nearest neighbours to an image feature vector. The neighbours are used to compute probabilities for each model-image match hypothesis, which allows the hypotheses to be ranked so that the most likely of them can be verified first. This process was shown to be effective in a 4D feature space, with a small model database of 4 objects. In this paper we will demonstrate that the technique is feasible in spaces with up to 10 or 20 dimensions. The combination of highly specific feature vectors and probabilistic hypothesis generation provides extremely efficient indexing.

Our method relies on the rapid recovery of nearest neighbours from the index.
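The adaptivity of k-d tree bins comes from the construction itself: each node splits the remaining points at the median of one coordinate, cycling through dimensions, so bin boundaries follow the local density of the data. A minimal sketch in the style of [6] (dictionary representation and names are our own, not the construction used in the system described later):

```python
def build_kdtree(points, depth=0):
    """Build a k-d tree by splitting at the median of one coordinate,
    cycling through dimensions with depth; bins adapt to point density."""
    if not points:
        return None
    k = len(points[0])                      # dimensionality of the space
    axis = depth % k                        # split dimension at this level
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                  # median index
    return {
        "point": points[mid],               # split point stored at the node
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }
```

Because every split halves the points rather than the space, dense regions end up covered by many small bins and sparse regions by a few large ones, avoiding the fixed-bin-size problem of hashing.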
In high-dimensional spaces, standard k-d tree search often performs poorly, having to examine a large fraction of the points in the space to find the exact nearest neighbour. However, a variant of this search which efficiently finds approximate neighbours will be used to limit the search time. The algorithm, which we have called Best Bin First (BBF) search, finds the nearest neighbour for a large fraction of queries, and finds a very good neighbour the remaining times. This type of search has wider application than for shape indexing alone: another vision-related use would be for closest point matching in appearance-based recognition (see e.g. [15]).

2. Previous work

We first discuss several prominent indexing methods which have used hash tables for indexing. Forsythe, et al. [5], outlined several types of projective invariant features for indexing planar objects viewed in arbitrary 3D orientations. However, their experiments were carried out using a 2D index space generated by pairs of conics. Rothwell, et al. [16], used 4D indices defined by area moments of planar curves that were first transformed into canonical reference frames. Clemens and Jacobs [4] generated 4D and 6D index spaces from hand-grouped point sets. For each of the methods above, the dimensionality of the spaces and the number of models were too low for the inefficiency of hashing to be critical.

The geometric hashing indexing technique (e.g. [10]) is also generally applied in low- (2- or 3-) dimensional index spaces. While this technique differs substantially from other indexing methods in that voting overcomes some of the difficulty with bin boundaries, Grimson [7] notes that performance is poor even with small amounts of noise and clutter. This is due to the hash table becoming overloaded with entries, and indicates that the use of higher-dimensional spaces is important. In [9], geometric hashing is applied with larger (8D) indices generated from planar curves ("footprints").
However, the experiments did not truly test indexing performance because only a few models were used, with 3 or 4 features each.

Stein and Medioni presented a method for 2D-from-2D [18] and 3D-from-3D [19] indexing that also used hash table lookup. While the former method used index spaces of between 3 and 6 dimensions, the latter work considered higher- (up to about 14-) dimensional spaces. They avoided the exponential (with dimension) search for NN by using extremely coarse quantization of the bins (e.g., 60° for angle features), and looking only in the single bin containing the query point. This leads to the recovery of a large number of hypotheses, with a low signal-to-noise ratio, since highly dissimilar shapes are allowed to match. Nor does it preclude missing a significant percentage of good hypotheses which lie just across one of the many bin boundaries.

In [3], Califano and Mohan argue for the use of higher-dimensional spaces. Their analysis indicates a dramatic speedup from increasing the size of the feature vectors used for indexing. However, they again use hash tables for the lookup and do no search. Thus, in their method a pose clustering stage is required to accumulate results, presumably because of the high likelihood that a query will land in a different bin than that of the best neighbour.

In the pattern classification literature, various types of tree have been used to find nearest neighbours. Just as in hashing, these methods divide the space into bins; unlike hashing, the methods are adaptive in that the partition of the data space depends on the data points to be stored. The best-known and most widely used of these is the k-d tree [6], a complete binary tree having smaller bins in the higher-density regions of the space.
The analysis in the original paper shows expected logarithmic time lookup, but experiments in higher dimensions [17] have shown that the method often suffers from inefficient search, in many cases having to examine a large fraction of the stored points to determine the exact nearest neighbour. Other trees specialized to fast NN lookup exist (e.g. [8, 14]), but are sub-optimal because more constraints are set on where bin boundaries can be placed.

Instead of finding the exact nearest neighbour, if we are willing to accept an approximate neighbour in some fraction of the cases, then processing time can be greatly reduced. One early method [13] uses a hierarchical tree data structure in which internal nodes are the centers of mass of the nodes at the next lower level. The point recovered from the first leaf node encountered provides an approximate NN in a very short search time, but the quality of this neighbour is relatively low.

In [1], Arya presents an approach based on neighborhood graphs. While his main results concern the asymptotic behavior of one particular algorithm, which contains impractically large constant factors, he does give a practical variant (RNG*) that demonstrates good results for moderate dimensionalities (8-16). Independent of our own work, he also develops the equivalent of BBF search, which he terms "priority k-d tree search." Comparisons between RNG* and the priority algorithm show comparable performance for moderate dimensionalities.

Most recently, Nene and Nayar [15] have used a different type of approximation to NN lookup. They only guarantee recovery of the best neighbour if it is within ε of the query point, by using a set of sorted lists of the coordinates of the stored points, and successively eliminating points further away than ε, one dimension at a time.
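The priority-driven idea underlying BBF (and Arya's equivalent priority k-d tree search) can be sketched as follows: branches not taken during descent are kept on a heap, keyed by their distance from the query along the split dimension, and bins are then examined closest-first until a fixed budget of node examinations is exhausted. This is an illustrative sketch under our own simplifications, not the exact implementation evaluated in the experiments:

```python
import heapq
import math

def build(points, depth=0):
    # Median-split k-d tree, stored as (point, axis, left, right) tuples.
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    m = len(points) // 2
    return (points[m], axis,
            build(points[:m], depth + 1),
            build(points[m + 1:], depth + 1))

def bbf_search(root, query, max_checks=10):
    """Best Bin First sketch: visit tree branches in order of their
    distance to the query, stopping after max_checks node examinations."""
    heap = [(0.0, 0, root)]        # (bin distance, tiebreak counter, node)
    best, best_d = None, float("inf")
    checks = count = 0
    while heap and checks < max_checks:
        _, _, node = heapq.heappop(heap)
        if node is None:
            continue
        point, axis, left, right = node
        d = math.dist(point, query)
        if d < best_d:
            best, best_d = point, d
        checks += 1
        diff = query[axis] - point[axis]
        # The branch containing the query costs nothing to enter; the
        # other branch is at least |diff| away along the split axis.
        near, far = (left, right) if diff < 0 else (right, left)
        count += 1
        heapq.heappush(heap, (0.0, count, near))
        count += 1
        heapq.heappush(heap, (abs(diff), count, far))
    return best, best_d
```

With an unlimited budget this degenerates to exact search; capping `max_checks` is what trades a small chance of missing the true nearest neighbour for a bounded search time.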
The method of Nene and Nayar is effective if ε can be kept small (≈ 0.25 when each dimension has unit extent), but in higher-dimensional spaces that is only possible when the number of points is extremely large, on the order of (1/ε)^k.

Figure 1. Examples of feature groupings, including a parallel segment grouping (upper left) and a segment chain grouping (lower right). Each model segment may be redundantly included in several groupings, which may be of various cardinalities and extend to different areas of the object.

3. Shape-based indexing

We now outline our method of shape-based indexing [2], within which the BBF algorithm provides an important component. To create a shape index, a preprocessing stage is required in which a set of feature vectors is extracted from the database of object models. These are stored in the index, each with a pointer to the model feature set that generated it. At runtime, feature vectors are extracted from an image and used to access the index, recovering a set of neighbours to each image vector that provide model-image match hypotheses.

Our approach is somewhat unusual in that most other methods have used feature vectors that are invariant to the full set of model transformations (rotation, translation, scaling, and projection). Presumably this has been due to fears of excessive memory usage or lookup time. We have found that neither fear is justified, and that requiring objects to include invariant feature sets is too restrictive.

Our method instead uses features that are partially invariant (i.e. to translation, scaling, and image-plane rotation), but vary with out-of-plane rotations. Simple perceptual grouping methods [11] that use the properties of parallelism and cotermination in various combinations have been used to build complex feature sets, with straight edge features serving as the primitives (Figure 1).
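As a concrete illustration of partial invariance, measurements such as inter-segment angles and length ratios are unchanged when a grouping is translated, scaled, or rotated in the image plane, while out-of-plane rotation alters them. A hypothetical sketch (the segment representation and feature ordering here are our own, not the paper's):

```python
import math

def segment_angle(seg):
    # Orientation of a straight edge segment ((x1, y1), (x2, y2)).
    (x1, y1), (x2, y2) = seg
    return math.atan2(y2 - y1, x2 - x1)

def grouping_features(segments):
    """Relative angles and length ratios between consecutive segments
    of a grouping: invariant to translation, scaling, and image-plane
    rotation, but not to out-of-plane rotation."""
    lengths = [math.dist(a, b) for a, b in segments]
    angles = [segment_angle(s) for s in segments]
    feats = []
    for i in range(1, len(segments)):
        feats.append(angles[i] - angles[i - 1])    # relative angle
        feats.append(lengths[i] / lengths[i - 1])  # length ratio
    return feats
```

Scaling all endpoints by a common factor, or rotating them about any image point, leaves the returned vector unchanged, which is the property the index relies on.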
Angles and edge length ratios, both of which satisfy the partial invariance property, are extracted from these groupings and used to form high-dimensional feature vectors.

Each view of an object in general contains several, possibly overlapping feature groupings. As well, all groupings satisfy the additional perceptual grouping property of proximity. This combination of locality and redundancy of the feature sets means that indexing can succeed even with significant occlusion and noise. A second form of redundancy, the use of several different cardinalities of grouping simultaneously, provides robustness for those cases where none of the larger segment groupings are faithfully detected.

Figure 2. Example of recognition system showing correct and incorrect indexing hypotheses, and properly determined pose for the correct hypotheses. The initial shape match hypotheses are highlighted (bold lines).

To deal with the variation of shape with viewpoint, it is necessary to store feature vectors for a set of views that provides a good coverage of the viewing sphere. This introduces two problems: (a) the need to interpolate between nearby views; and (b) the possibility that so many views will be required that memory or NN lookup times will be excessive. We solve these problems as follows. At runtime, a set of nearest neighbours is found using a new form of k-d tree search (BBF) which sets an absolute time limit on the search. Then, a kernel-based density estimation method (weighted k-NN) uses the recovered neighbours to interpolate between views, at the same time generating a posteriori probability estimates for the likelihood that a given match hypothesis is correct. Hypotheses can thus be prioritized so that the least ambiguous are verified first, an important aspect of any efficient recognition system [20].

From each match hypothesis, it is possible to estimate object pose [12]. False positives are then rejected using an iterative back-projection procedure.
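The kernel-based ranking step described above can be sketched as a distance-weighted vote over the model labels of the recovered neighbours; the Gaussian kernel and the bandwidth parameter here are illustrative assumptions, not the estimator specified in [2]:

```python
import math

def match_probability(neighbours, query, sigma=0.1):
    """Kernel-weighted k-NN vote: each recovered neighbour carries a
    model label, and Gaussian-weighted votes give normalized estimates
    of the probability that the query matches each model."""
    votes = {}
    total = 0.0
    for point, label in neighbours:
        w = math.exp(-math.dist(point, query) ** 2 / (2 * sigma ** 2))
        votes[label] = votes.get(label, 0.0) + w
        total += w
    return {label: w / total for label, w in votes.items()}
```

Because the weights decay with distance in feature space, a hypothesis supported by several close neighbours of one model scores well above one supported only by distant neighbours, which is what allows the least ambiguous hypotheses to be verified first.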
At each stage of the back-projection, the image is examined for further matches near the predicted object feature locations, and a least-squares fit of predicted versus actual image locations is computed. This fit is over-constrained, so it is highly unlikely that an incorrect initial match will lead to many additional feature correspondences. Therefore, the final recognition can be robust and accurate despite the fact that the initial indexing is probabilistic.

All of the above components (grouping, indexing, hypothesis ranking, and verification) have been integrated into a fully functioning recognition system. Rapid recovery of nearest neighbours is a key to making the entire process efficient.

Figure 3. k-d tree with 8 points, k = 2. In BBF search, we examine the bins closest to the query point q first. More than the standard search, this is likely to maximize the overlap of (A) the hypersphere centered on q with radius D_cur, and (B) the hyperrectangle of the bin to be searched. In the case above, BBF would reduce the number of leaf nodes examined, since the NN is in the closest adjacent bin in space (directly below q), rather than the closest bin in the tree structure (to the left of q).

The NN search described below keeps l