# 2011ORB_ an efficient alternative to SIFT or SURF.pdf

ORB: an efficient alternative to SIFT or SURF Ethan Rublee Vincent Rabaud Kurt Konolige Gary Bradski Willow Garage, Menlo Park, California {erublee}{vrabaud}{konolige}{bradski}@willowgarage.com Abstract Feature matching is at the base of many computer vi- sion problems, such as object recognition or structure from motion. Current methods rely on costly descriptors for de- tection and matching. In this paper, we propose a very fast binary descriptor based on BRIEF, called ORB, which is rotation invariant and resistant to noise. We demonstrate through experiments how ORB is at two orders of magni- tude faster than SIFT, while performing as well in many situations. The efficiency is tested on several real-world ap- plications, including object detection and patch-tracking on a smart phone. 1. Introduction The SIFT keypoint detector and descriptor [17], al- though over a decade old, have proven remarkably success- ful in a number of applications using visual features, in- cluding object recognition [17], image stitching [28], visual mapping [25], etc. However, it imposes a large computa- tional burden, especially for real-time systems such as vi- sual odometry, or for low-power devices such as cellphones. This has led to an intensive search for replacements with lower computation cost; arguably the best of these is SURF [2]. There has also been research aimed at speeding up the computation of SIFT, most notably with GPU devices [26]. In this paper, we propose a computationally-efficient re- placement to SIFT that has similar matching performance, is less affected by image noise, and is capable of being used for real-time performance. Our main motivation is to en- hance many common image-matching applications, e.g., to enable low-power devices without GPU acceleration to per- form panorama stitching and patch tracking, and to reduce the time for feature-based object detection on standard PCs. Our descriptor performs as well as SIFT on these tasks (and better than SURF), while being almost two orders of mag- nitude faster. Our proposed feature builds on the well-known FAST keypoint detector [23] and the recently-developed BRIEF descriptor [6]; for this reason we call it ORB (Oriented Figure 1. Typical matching result using ORB on real-world im- ages with viewpoint change. Green lines are valid matches; red circles indicate unmatched points. FAST and Rotated BRIEF). Both these techniques are at- tractive because of their good performance and low cost. In this paper, we address several limitations of these tech- niques vis-a-vis SIFT, most notably the lack of rotational invariance in BRIEF. Our main contributions are: ? The addition of a fast and accurate orientation compo- nent to FAST. ? The efficient computation of oriented BRIEF features. ? Analysis of variance and correlation of oriented BRIEF features. ? A learning method for de-correlating BRIEF features under rotational invariance, leading to better perfor- mance in nearest-neighbor applications. To validate ORB, we perform experiments that test the properties of ORB relative to SIFT and SURF, for both raw matching ability, and performance in image-matching applications. We also illustrate the efficiency of ORB by implementing a patch-tracking application on a smart phone. An additional benefit of ORB is that it is free from the licensing restrictions of SIFT and SURF. 2. Related Work Keypoints FAST and its variants [23, 24] are the method of choice for finding keypoints in real-time systems that match visual features, for example, Parallel Tracking and Mapping [13]. It is efficient and finds reasonable corner keypoints, although it must be augmented with pyramid 1 schemes for scale [14], and in our case, a Harris corner filter [11] to reject edges and provide a reasonable score. Many keypoint detectors include an orientation operator (SIFT and SURF are two prominent examples), but FAST does not. There are various ways to describe the orientation of a keypoint; many of these involve histograms of gradient computations, for example in SIFT [17] and the approxi- mation by block patterns in SURF [2]. These methods are either computationally demanding, or in the case of SURF, yield poor approximations. The reference paper by Rosin [22] gives an analysis of various ways of measuring orienta- tion of corners, and we borrow from his centroid technique. Unlike the orientation operator in SIFT, which can have multiple value on a single keypoint, the centroid operator gives a single dominant result. Descriptors BRIEF [6] is a recent feature descriptor that uses simple binary tests between pixels in a smoothed image patch. Its performance is similar to SIFT in many respects, including robustness to lighting, blur, and perspective dis- tortion. However, it is very sensitive to in-plane rotation. BRIEF grew out of research that uses binary tests to train a set of classification trees [4]. Once trained on a set of 500 or so typical keypoints, the trees can be used to re- turn a signature for any arbitrary keypoint [5]. In a similar manner, we look for the tests least sensitive to orientation. The classic method for finding uncorrelated tests is Princi- pal Component Analysis; for example, it has been shown that PCA for SIFT can help remove a large amount of re- dundant information [12]. However, the space of possible binary tests is too big to perform PCA and an exhaustive search is used instead. Visual vocabulary methods [21, 27] use offline clustering to find exemplars that are uncorrelated and can be used in matching. These techniques might also be useful in finding uncorrelated binary tests. The closest system to ORB is [3], which proposes a multi-scale Harris keypoint and oriented patch descriptor. This descriptor is used for image stitching, and shows good rotational and scale invariance. It is not as efficient to com- pute as our method, however. 3. oFAST: FAST Keypoint Orientation FAST features are widely used because of their compu- tational properties. However, FAST features do not have an orientation component. In this section we add an efficiently- computed orientation. 3.1. FAST Detector We start by detecting FAST points in the image. FAST takes one parameter, the intensity threshold between the center pixel and those in a circular ring about the center. We use FAST-9 (circular radius of 9), which has good per- formance. FAST does not produce a measure of cornerness, and we have found that it has large responses along edges. We em- ploy a Harris corner measure [11] to order the FAST key- points. For a target number N of keypoints, we first set the threshold low enough to get more than N keypoints, then order them according to the Harris measure, and pick the top N points. FAST does not produce multi-scale features. We employ a scale pyramid of the image, and produce FAST features (filtered by Harris) at each level in the pyramid. 3.2. Orientation by Intensity Centroid Our approach uses a simple but effective measure of cor- ner orientation, the intensity centroid [22]. The intensity centroid assumes that a corner’s intensity is offset from its center, and this vector may be used to impute an orientation. Rosin defines the moments of a patch as: mpq = summationdisplay x,y xpyqI(x,y), (1) and with these moments we may find the centroid: C = parenleftbiggm 10 m00, m01 m00 parenrightbigg (2) We can construct a vector from the corner’s center, O, to the centroid, vectorOC. The orientation of the patch then simply is: θ = atan2(m01,m10), (3) where atan2 is the quadrant-aware version of arctan. Rosin mentions taking into account whether the corner is dark or light; however, for our purposes we may ignore this as the angle measures are consistent regardless of the corner type. To improve the rotation invariance of this measure we make sure that moments are computed with x and y re- maining within a circular region of radius r. We empirically choose r to be the patch size, so that that x and y run from [?r,r]. As |C| approaches 0, the measure becomes unsta- ble; with FAST corners, we have found that this is rarely the case. We compared the centroid method with two gradient- based measures, BIN and MAX. In both cases, X and Y gradients are calculated on a smoothed image. MAX chooses the largest gradient in the keypoint patch; BIN forms a histogram of gradient directions at 10 degree inter- vals, and picks the maximum bin. BIN is similar to the SIFT algorithm, although it picks only a single orientation. The variance of the orientation in a simulated dataset (in-plane rotation plus added noise) is shown in Figure 2. Neither of the gradient measures performs very well, while the cen- troid gives a uniformly good orientation, even under large image noise. 5 10 15 20 25 30 35 40 0 5 10 15 20 25 Standard Deviation (degrees) Image Intensity Noise (gaussian noise for an 8 bit image) Standard Deviation in Angle Error IC MAX BIN Figure 2. Rotation measure. The intensity centroid (IC) per- forms best on recovering the orientation of artificially rotated noisy patches, compared to a histogram (BIN) and MAX method. 4. rBRIEF: Rotation-Aware Brief In this section, we first introduce a steered BRIEF de- scriptor, show how to compute it efficiently and demon- strate why it actually performs poorly with rotation. We then introduce a learning step to find less correlated binary tests leading to the better descriptor rBRIEF, for which we offer comparisons to SIFT and SURF. 4.1. Efficient Rotation of the BRIEF Operator Brief overview of BRIEF The BRIEF descriptor [6] is a bit string description of an image patch constructed from a set of binary intensity tests. Consider a smoothed image patch, p. A binary test τ is defined by: τ(p;x,y) := braceleftbigg 1 : p(x) p(y) 0 : p(x) ≥p(y) , (4) where p(x) is the intensity of p at a point x. The feature is defined as a vector of n binary tests: fn(p) := summationdisplay 1≤i≤n 2i?1τ(p;xi,yi) (5) Many different types of distributions of tests were consid- ered in [6]; here we use one of the best performers, a Gaus- sian distribution around the center of the patch. We also choose a vector length n = 256. It is important to smooth the image before performing the tests. In our implementation, smoothing is achieved us- ing an integral image, where each test point is a 5× 5 sub- window of a 31×31 pixel patch. These were chosen from our own experiments and the results in [6]. Steered BRIEF We would like to allow BRIEF to be invariant to in-plane rotation. Matching performance of BRIEF falls off sharply for in-plane rotation of more than a few degrees (see Figure 7). Calonder [6] suggests computing a BRIEF descriptor for a set of rotations and perspective warps of each patch, 0 20 40 60 80 100 120 140 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 Number of Bits Bit Response Mean Histogram of Descriptor Bit Mean Values BRIEF rBRIEF steered BRIEF Figure 3. Distribution of means for feature vectors: BRIEF, steered BRIEF (Section 4.1), and rBRIEF (Section 4.3). The X axis is the distance to a mean of 0.5 but this solution is obviously expensive. A more efficient method is to steer BRIEF according to the orientation of keypoints. For any feature set of n binary tests at location (xi,yi), define the 2×n matrix S = parenleftbiggx 1,.,xn y1,.,yn parenrightbigg Using the patch orientationθ and the corresponding rotation matrix Rθ, we construct a “steered” version Sθ of S: Sθ = RθS, Now the steered BRIEF operator becomes gn(p,θ) := fn(p)|(xi,yi) ∈ Sθ (6) We discretize the angle to increments of 2π/30 (12 de- grees), and construct a lookup table of precomputed BRIEF patterns. As long at the keypoint orientation θ is consistent across views, the correct set of points Sθ will be used to compute its descriptor. 4.2. Variance and Correlation One of the pleasing properties of BRIEF is that each bit feature has a large variance and a mean near 0.5. Figure 3 shows the spread of means for a typical Gaussian BRIEF pattern of 256 bits over 100k sample keypoints. A mean of 0.5 gives the maximum sample variance 0.25 for a bit feature. On the other hand, once BRIEF is oriented along the keypoint direction to give steered BRIEF, the means are shifted to a more distributed pattern (again, Figure 3). One way to understand this is that the oriented corner keypoints present a more uniform appearance to binary tests. High variance makes a feature more discriminative, since it responds differentially to inputs. Another desirable prop- erty is to have the tests uncorrelated, since then each test will contribute to the result. To analyze the correlation and 0 5 10 15 20 25 30 35 400 2 4 6 8 Component Eigenvalue BRIEF Steered BRIEF rBRIEF Figure 4. Distribution of eigenvalues in the PCA decomposition over 100k keypoints of three feature vectors: BRIEF, steered BRIEF (Section 4.1), and rBRIEF (Section 4.3). 0 0.005 0.01 0.015 0.02 0.025 0.03 0.035 0.04 0 64 128 192 256 Relative Frequency Descriptor Distance Distance Distribution rBRIEF steered BRIEF BRIEF Figure 5. The dotted lines show the distances of a keypoint to out- liers, while the solid lines denote the distances only between inlier matches for three feature vectors: BRIEF, steered BRIEF (Section 4.1), and rBRIEF (Section 4.3). variance of tests in the BRIEF vector, we looked at the re- sponse to 100k keypoints for BRIEF and steered BRIEF. The results are shown in Figure 4. Using PCA on the data, we plot the highest 40 eigenvalues (after which the two de- scriptors converge). Both BRIEF and steered BRIEF ex- hibit high initial eigenvalues, indicating correlation among the binary tests – essentially all the information is contained in the first 10 or 15 components. Steered BRIEF has signif- icantly lower variance, however, since the eigenvalues are lower, and thus is not as discriminative. Apparently BRIEF depends on random orientation of keypoints for good per- formance. Another view of the effect of steered BRIEF is shown in the distance distributions between inliers and out- liers (Figure 5). Notice that for steered BRIEF, the mean for outliers is pushed left, and there is more of an overlap with the inliers. 4.3. Learning Good Binary Features To recover from the loss of variance in steered BRIEF, and to reduce correlation among the binary tests, we de- velop a learning method for choosing a good subset of bi- nary tests. One possible strategy is to use PCA or some other dimensionality-reduction method, and starting from a large set of binary tests, identify 256 new features that have high variance and are uncorrelated over a large training set. However, since the new features are composed from a larger number of binary tests, they would be less efficient to com- pute than steered BRIEF. Instead, we search among all pos- sible binary tests to find ones that both have high variance (and means close to 0.5), as well as being uncorrelated. The method is as follows. We first set up a training set of some 300k keypoints, drawn from imag