# Spherical Hashing (2012)

Jae-Pil Heo¹, Youngwoon Lee¹, Junfeng He², Shih-Fu Chang², Sung-Eui Yoon¹
¹ KAIST, ² Columbia University

## Abstract

Many binary code encoding schemes based on hashing have been actively studied recently, since they can provide efficient similarity search, especially nearest neighbor search, and compact data representations suitable for handling large-scale image databases in many computer vision problems. Existing hashing techniques encode high-dimensional data points by using hyperplane-based hashing functions. In this paper we propose a novel hypersphere-based hashing function, spherical hashing, to map more spatially coherent data points into a binary code compared to hyperplane-based hashing functions. Furthermore, we propose a new binary code distance function, spherical Hamming distance, that is tailored to our hypersphere-based binary coding scheme, and design an efficient iterative optimization process to achieve balanced partitioning of data points for each hash function and independence between hashing functions. Our extensive experiments show that our spherical hashing technique significantly outperforms six state-of-the-art hashing techniques based on hyperplanes across various image benchmarks ranging in size from one million to 75 million GIST descriptors. The performance gains are consistent and large, up to 100% improvements. The excellent results confirm the unique merits of the proposed idea in using hyperspheres to encode proximity regions in high-dimensional spaces. Finally, our method is intuitive and easy to implement.

## 1. Introduction

Thanks to rapid advances of digital cameras and various image processing tools, we can easily create new pictures and images for various purposes. This in turn results in a huge amount of images available online.
These huge image databases pose a significant challenge in terms of scalability to many computer vision applications, especially those that require efficient similarity search. For similarity search, nearest neighbor search techniques have been widely studied, and tree-based techniques [6] have been used for low-dimensional data points. Unfortunately, these techniques are not scalable to high-dimensional data points. Hence, hashing techniques have recently been actively studied to provide efficient solutions for such high-dimensional data points [11, 23, 26].

Figure 1. The first two images show partitioning examples of spherical hashing functions in a 2D space for 2 and 3 bit binary codes. The rectangle in the images represents the boundary of the data points. The rightmost figure shows partitioning boundaries of non-linear hashing functions used in the GSPICA method [10], which is based on hyperplanes defined in its RBF kernel space. The partitioning boundary is visualized in the 2D original data space.

Encoding high-dimensional data points into binary codes based on hashing techniques enables higher scalability thanks to both the compact data representation and the efficient indexing mechanism. Similar high-dimensional data points are mapped to similar binary codes, and thus by looking into only those similar binary codes (based on the Hamming distance), we can efficiently identify approximate nearest neighbors.

Existing hashing techniques can be broadly categorized into data-independent and data-dependent schemes. In data-independent techniques, hashing functions are chosen independently from the input points. Locality-Sensitive Hashing (LSH) [11] is one of the most widely known techniques in this category. This technique has been extended to various hashing functions [3, 1, 13, 2, 20]. Recent research attention has shifted to developing data-dependent techniques that consider the distribution of data points and design better hashing functions.
Notable examples include spectral hashing [26], semi-supervised hashing [25], iterative quantization [7], joint optimization [10], and random maximum margin hashing [15].

In all of these existing hashing techniques, hyperplanes are used to partition the data points (located in the original data space or a kernel space) into two sets, and two different binary codes (e.g., −1 or +1) are assigned depending on which set each point belongs to. Departing from this conventional approach, we propose a novel hypersphere-based scheme, spherical hashing, for computing binary codes. Intuitively, hyperspheres provide much stronger power in defining a tighter closed region in the original data space than hyperplanes. For example, at least d + 1 hyperplanes are needed to define a closed region in a d-dimensional space, while a single hypersphere can form such a closed region even in an arbitrarily high-dimensional space. Hyperplanes in a kernel space can map to non-linear hashing functions in the original space (Fig. 1). However, we have found that the proposed simple spherical hashing in the original space achieves more spatially coherent partitioning than the non-linear hashing functions used in recent works [10, 15, 20].

Our paper has the following contributions:

1. We propose a novel spherical hashing scheme, analyze its ability in terms of similarity search, and compare it against the state-of-the-art hyperplane-based techniques (Sec. 3.1).
2. We develop a new binary distance function tailored for the spherical hashing method (Sec. 3.2).
3. We formulate an optimization problem that achieves both balanced partitioning for each hashing function and independence between any two hashing functions (Sec. 3.3). Also, an efficient iterative process is proposed to construct spherical hashing functions (Sec. 3.4).
In order to highlight the benefits of our method, we have tested it against different benchmarks that consist of one million to 75 million image feature points with varying dimensions. We have also compared our method with many state-of-the-art techniques and found that it significantly outperforms all the tested techniques, confirming its superior ability to define closed regions with tighter bounds compared to conventional hyperplane-based hashing functions (Sec. 4).

## 2. Related Work

In this section we discuss prior work related to image representations and nearest neighbor search techniques.

### 2.1. Image Representations

To identify visually similar images for a query image, many image representations have been studied [4]. Examples of the most popular schemes include the bag-of-visual-words representation [21] and the GIST descriptor [19], which have been known to work well in practice. These image descriptors have high dimensionality (e.g., hundreds to thousands), and identifying similar images is typically reduced to finding nearest neighbor points in those high-dimensional image descriptor spaces [17]. Since these image descriptor spaces have high dimensions, finding nearest neighbor image descriptors has been known to be very hard because of the 'curse of dimensionality' [11].

### 2.2. Tree-based Methods

Space-partitioning tree structures such as kd-trees [6] have been used to find nearest neighbors and have been optimized for higher performance [16]. It has been widely known, however, that kd-tree based search can run slower than linear scan for high-dimensional points. Nistér and Stewénius [18] proposed another tree-based nearest neighbor search scheme based on hierarchical k-means trees. Although these techniques achieve reasonably high accuracy and efficiency, they have been demonstrated only on small image databases consisting of about one million images.
Also, these techniques do not consider compression of image descriptors to handle large-scale image databases.

### 2.3. Binary Hashing Methods

Binary hashing methods aim to embed points in binary codes while preserving relative distances among them. One of the most popular hashing techniques is LSH [11]. Its hash function is based on projection onto random vectors drawn from a specific distribution. Many variations of LSH have been proposed for learned metrics [13], min-hash [2], random Fourier features [20], etc. [3, 1]. Since hashing functions in LSH are drawn independently from the data points, it can be inefficient, especially for short binary code lengths.

There have been a number of research efforts to develop data-dependent hashing methods that reflect data distributions to improve performance. Weiss et al. [26] proposed data-dependent spectral hashing, motivated by spectral graph partitioning. It improved performance over LSH, especially for compact bit lengths (i.e., 8 and 16 bits). Wang et al. [25] proposed a semi-supervised hashing method to improve image retrieval performance by exploiting label information of the training set. Gong and Lazebnik [7] introduced a procrustean approach that directly minimizes quantization error by rotating zero-centered PCA-projected data. He et al. [10] presented a hashing method that jointly optimizes both search accuracy and search time. Joly and Buisson [15] constructed hash functions by using large margin classifiers with arbitrarily sampled data points that are randomly separated into two sets.

All the mentioned hashing techniques compute binary codes by partitioning data points (located in the original feature space or a kernel space) into two different sets based on hyperplanes. Departing from this conventional approach, we adopt a novel approach of partitioning data points by hyperspheres.

We found a similarly named method, spherical LSH [22].
Our method is totally different from this spherical LSH, which is a specialized technique for data points located on the unit hypersphere.

### 2.4. Distance-based Indexing Methods

The database community has been designing efficient techniques for indexing high-dimensional points and supporting various proximity queries. Filho et al. [5] index points with distances from fixed pivot points. As a result, the region sharing a same index given a pivot becomes a ring shape. This method reduces the region further by using multiple pivot points. They then built various hierarchical structures (e.g., R-trees) to support various proximity queries. The efficiency of this method highly depends on the locations of the pivots. For choosing pivots, Jagadish et al. [12] used k-means clustering, and Venkateswaran et al. [24] adopted other heuristics such as maximizing the variance of distances from pivots to data points.

This line of work uses a concept similar to ours in terms of using distances from pivots for indexing high-dimensional points. However, our approach is drastically different from these techniques, since ours aims to compute compact binary codes preserving the original metric spaces of feature points by using hashing, while theirs targets designing hierarchical indexing structures supporting efficient proximity queries.

## 3. Spherical Hashing

Let us first define notations. Given a set of n data points in a D-dimensional space, we use X = {x_1, ..., x_n}, x_i ∈ R^D to denote those data points. A binary code corresponding to each data point x_i is defined by b_i ∈ {−1, +1}^c, where c is the length of the code.

### 3.1. Binary Code Embedding Function

Our hashing function H(x) = (h_1(x), ..., h_c(x)) maps points in R^D into the binary cube {−1, +1}^c. We use a hypersphere to define a spherical hashing function.
Each spherical hashing function h_k(x) is defined by a pivot p_k ∈ R^D and a distance threshold t_k ∈ R^+ as follows:

$$
h_k(x) = \begin{cases} -1 & \text{when } d(p_k, x) > t_k \\ +1 & \text{when } d(p_k, x) \le t_k, \end{cases}
$$

where d(·,·) denotes the Euclidean distance between two points in R^D; various distance metrics (e.g., L_p metrics) can be used instead of the Euclidean distance. The value of each spherical hashing function h_k(x) indicates whether the point x is inside the hypersphere whose center is p_k and whose radius is t_k.

The key difference between using hyperplanes and hyperspheres for computing binary codes is their ability to define a closed region in R^D that can be indexed by a binary code. To define a closed region in a d-dimensional space, at least d + 1 hyperplanes are needed, while a single hypersphere is sufficient to form such a closed region in an arbitrarily high-dimensional space. Furthermore, unlike with multiple hyperplanes, a higher number of closed regions can be constructed by using multiple hyperspheres, while the distances between points located in each region are bounded.

Figure 2. The left figure shows how the avg. of the max. distances among points having the same binary code changes with different code lengths based on hyperspheres or hyperplanes. We randomly sample 1000 different binary codes to compute the avg. of the max. distances. The right figure shows how having more common +1 bits in our method effectively forms tighter closed regions. For the right curve we randomly sample one million pairs of binary codes. For each pair of binary codes (b_i, b_j) we compute the max. distance between pairs of points, (x, y), where H(x) = b_i and H(y) = b_j. We report the avg. of the max. distances as a function of the number of common +1 bits, i.e. |b_i ∧ b_j|. Both figures are obtained with the GIST-1M-960D dataset (Sec. 4.1).
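The embedding above can be sketched in a few lines. This is a minimal illustrative implementation, not the paper's code; the array layout (pivots as a c × D matrix, thresholds as a length-c vector) is an assumption made for the example:

```python
import numpy as np

def spherical_hash(X, pivots, thresholds):
    """Encode points X (n x D) into binary codes in {-1, +1}^c.

    Bit k of a point's code is +1 when the point lies inside the
    hypersphere with center pivots[k] and radius thresholds[k],
    and -1 otherwise, matching h_k(x) above.
    """
    # Euclidean distances from every point to every pivot: shape (n, c)
    dists = np.linalg.norm(X[:, None, :] - pivots[None, :, :], axis=2)
    return np.where(dists <= thresholds, 1, -1)

# Toy example: two 2D points, two hyperspheres (c = 2, D = 2)
X = np.array([[0.0, 0.0], [3.0, 0.0]])
pivots = np.array([[0.0, 0.0], [2.0, 0.0]])
thresholds = np.array([1.0, 2.0])
codes = spherical_hash(X, pivots, thresholds)
# The first point lies inside both spheres; the second only inside the second.
```

The broadcasted subtraction computes all point-to-pivot differences at once, so the whole code matrix falls out of a single thresholding step.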
For example, the number of bounded regions created by c hyperspheres goes up to $\binom{c-1}{d} + \sum_{i=0}^{d} \binom{c}{i}$ [27]. In addition, we can approximate a hyperplane with a large hypersphere (e.g., one with a large radius and a far-away center).

In nearest neighbor search, the capability of forming closed regions with tighter distance bounds is very important in terms of effectively locating nearest neighbors for a query point. When we construct such tighter closed regions, a region indexed by the binary code of the query point can contain more promising candidates for the nearest neighbors.

We also empirically measure how tightly hyperspheres and hyperplanes bound regions. For this purpose, we measure the maximum distance between any two points that have the same binary code and take the average of the maximum distances among different binary codes. As can be seen in the left figure of Fig. 2, hyperspheres bound regions of binary codes more tightly compared to hyperplanes used in LSH [3]. Across all the tested code lengths, hyperspheres show about two times tighter bounds over the hyperplane-based approach.

### 3.2. Distance between Binary Codes

Most hyperplane-based binary embedding methods use the Hamming distance between two binary codes, which measures the number of differing bits, i.e. |b_i ⊕ b_j|, where ⊕ is the XOR bit operation and |·| denotes the number of +1 bits in a given binary code. This distance metric measures the number of hyperplanes for which the two given points lie on opposite sides. The Hamming distance, however, does not well reflect the property related to defining closed regions with tighter bounds, which is the core benefit of using our spherical
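The standard Hamming distance |b_i ⊕ b_j| described above is typically computed by packing each binary code into an integer and counting the bits set in the XOR. A small sketch (the packing into Python integers is an illustrative choice, not the paper's implementation):

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits |a XOR b| between two packed binary codes."""
    # XOR leaves a 1 exactly at positions where the codes disagree;
    # counting those 1s gives the Hamming distance.
    return bin(a ^ b).count("1")

# Two 8-bit codes that disagree in exactly two bit positions
a = 0b10110100
b = 0b10011100
d = hamming_distance(a, b)  # 2
```

Because the whole comparison reduces to one XOR and one popcount per machine word, this is what makes Hamming-space search over millions of codes fast in practice.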