Revisiting Binary Local Image Description for Resource Limited Devices

Abstract

The advent of a panoply of resource limited devices opens up new challenges in the design of computer vision algorithms with a clear compromise between accuracy and computational requirements. In this paper we address this problem and introduce binary image descriptors that establish new operating points on the state-of-the-art accuracy vs. resources trade-off curve. We revisit descriptors based on pixel differences and gradients to introduce, respectively, BAD (Box Average Difference), the fastest binary descriptor in the literature, and HashSIFT. They are trained using triplet ranking loss, hard negative mining and anchor swap, combined with a new efficient feature selection algorithm. In our experiments we evaluate the accuracy, execution time and energy consumption of the proposed descriptors. We show that they are the most accurate when compared with competing techniques with similar computational requirements. Further, in a planar image registration task, HashSIFT performs on par with the top deep learning-based descriptors while being several orders of magnitude more efficient.

Learning Efficient Local Descriptors

The goal of any local feature descriptor is to learn a similarity function \( \mathcal{S}(\cdot, \cdot) \) between local features. We define the training objective \( \mathcal{L}_{\text{TRL}} \) of our descriptors with the Triplet Ranking Loss (TRL). It brings different descriptions (\( \mathbf{a}_i \), \( \mathbf{p}_i \)) of the same scene point closer together while pushing away the descriptors \( \mathbf{n}_i \) of other scene points. Its benefit over a contrastive pair-wise loss is that it is more closely related to the nearest-neighbor matching task, where a good keypoint match is produced only if the correct corresponding keypoint is the closest in descriptor distance.
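As an illustration, a triplet ranking loss is commonly written as \( \max(0, \mu + d(\mathbf{a}_i, \mathbf{p}_i) - d(\mathbf{a}_i, \mathbf{n}_i)) \). The minimal sketch below assumes this common form; the margin \( \mu \) (here `margin=1.0`), the Euclidean distance and the function names are assumptions made for the example and are not taken from the paper.

```python
import math


def l2_distance(x, y):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def triplet_ranking_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: zero once the negative is at least
    `margin` farther from the anchor than the positive is.
    The margin value is an assumption for this sketch."""
    d_pos = l2_distance(anchor, positive)
    d_neg = l2_distance(anchor, negative)
    return max(0.0, margin + d_pos - d_neg)


anchor = [0.0, 0.0]
positive = [0.0, 1.0]        # same scene point, distance 1 from the anchor
easy_negative = [3.0, 4.0]   # other scene point, distance 5 from the anchor
hard_negative = [1.0, 1.0]   # other scene point, distance sqrt(2) from the anchor

loss_easy = triplet_ranking_loss(anchor, positive, easy_negative)
loss_hard = triplet_ranking_loss(anchor, positive, hard_negative)
print(loss_easy, loss_hard)
```

The easy negative already lies far enough away, so it contributes no loss, whereas the hard negative still produces a positive loss and therefore a training signal.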

Hard Negative Mining challenges the TRL with the descriptors of different scene points that lie closest to the anchor. At each iteration, we choose our negative \( \mathbf{n}_i \) as the hardest in the batch (i.e., the one with the smallest descriptor distance).
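The in-batch selection described above could be sketched as follows. This is an illustrative version under stated assumptions: the candidate negatives for anchor i are taken to be the positives of all other pairs in the batch, the distance is the Hamming distance between binary descriptors, and the names `hamming` and `hardest_negatives` are hypothetical. The actual training code may use a different candidate pool or distance.

```python
def hamming(x, y):
    """Number of differing bits between two equal-length binary descriptors."""
    return sum(a != b for a, b in zip(x, y))


def hardest_negatives(anchors, positives, dist):
    """For each anchor i, return the index j != i of the closest descriptor
    among the other pairs' positives (assumed candidate pool).
    Requires a batch of at least two pairs."""
    indices = []
    for i, a in enumerate(anchors):
        candidates = [(dist(a, p), j) for j, p in enumerate(positives) if j != i]
        indices.append(min(candidates)[1])  # smallest distance = hardest negative
    return indices


# Toy batch of three (anchor, positive) pairs with 4-bit descriptors.
anchors = [[0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 0, 0]]
positives = [[0, 0, 1, 1], [1, 1, 1, 0], [1, 0, 0, 0]]

negative_idx = hardest_negatives(anchors, positives, hamming)
print(negative_idx)
```

Each selected index points at the descriptor that would be used as \( \mathbf{n}_i \) for that anchor in the triplet loss.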

Results

Here we add some extra results comparing the performance of the proposed descriptors with other state-of-the-art approaches:

BAD-256 Reconstruction of Madrid Metropolis

BAD-512 Fundamental matrix estimation (EuRoC)

ETH Benchmark

Full results table for the ETH Benchmark:

Descriptor	# Registered	# Sparse Points	# Observations	Track Length	Reproj. Error	# Inlier Pairs	# Inlier Matches	# Dense Points
Fountain (11 images)
ORB	11	15001	71171	4.744417	0.384306	55	125033	306277
BEBLID-256	11	15539	74044	4.765043	0.394489	55	133838	303771
LATCH	11	15384	73907	4.804147	0.401214	55	135643	307421
BAD-256	11	15574	74404	4.77745	0.397276	55	135943	307932
BAD-512	11	15741	75613	4.80357	0.407335	55	141365	305564
RSIFT	11	16167	77879	4.817158	0.433049	55	154688	307027
Binboost	11	15391	73011	4.743746	0.397668	55	129571	302792
LDAHash-DIF	11	15134	70865	4.682503	0.389491	55	122713	304385
HashSIFT-256	11	16086	77507	4.818289	0.427431	55	149103	306132
HashSIFT-512	11	16385	79082	4.826488	0.438388	55	156135	305520
TFeat-m*	11	16278	78880	4.845804	0.431607	55	153725	305073
HardNet	11	17071	83973	4.919044	0.477603	55	183331	305701
CDbin-256b	11	16607	81360	4.899139	0.455184	55	168946	305534
Herzjesu (8 images)
ORB	8	7619	31475	4.13112	0.41019	28	46625	237948
BEBLID-256	8	7922	33414	4.217874	0.429793	28	51720	241862
LATCH	8	7871	33058	4.199975	0.430669	28	50739	240523
BAD-256	8	8056	34038	4.225174	0.435542	28	53059	242998
BAD-512	8	8220	34893	4.244891	0.448551	28	55866	236171
RSIFT	8	8533	36279	4.251611	0.476318	28	60808	241740
Binboost	8	7630	32009	4.195151	0.454498	28	47763	233824
LDAHash-DIF	8	7912	32683	4.130814	0.435268	28	48765	244861
HashSIFT-256	8	8560	36392	4.251402	0.473129	28	59246	240978
HashSIFT-512	8	8769	37376	4.262288	0.479877	28	62297	240154
TFeat-m*	8	8631	36727	4.255243	0.476186	28	60675	239675
HardNet	8	9444	40483	4.286637	0.517284	28	74867	239362
CDbin-256b	8	8997	38650	4.295876	0.497678	28	67802	242179
South Building (128 images)
ORB	128	137627	695789	5.055614	0.496237	8128	2285089	2137625
BEBLID-256	128	141604	710290	5.016031	0.500718	8128	2347648	2134091
LATCH	128	139584	716808	5.135316	0.521234	8128	2345677	2144368
BAD-256	128	145771	727953	4.993812	0.515675	8128	2435017	2145993
BAD-512	128	148491	744604	5.014472	0.527237	8128	2533879	2127316
RSIFT	128	155195	798456	5.144856	0.58171	8128	2836156	2139778
Binboost	128	135186	690751	5.109634	0.510165	8128	2220460	2156847
LDAHash-DIF	128	141248	705928	4.997791	0.511755	8128	2469511	2132395
HashSIFT-256	128	149102	764699	5.128697	0.563444	8128	2718812	2116461
HashSIFT-512	128	156888	798948	5.092474	0.581466	8128	2904787	2142022
TFeat-m*	128	152834	775159	5.071902	0.574171	8128	2721956	2149925
HardNet	128	168536	878847	5.214595	0.642522	8128	3344759	2122914
CDbin-256b	128	160589	832281	5.182678	0.616106	8128	3124870	2128460
Madrid Metropolis (1344 images)
ORB	457	135826	576138	4.241736	0.641296	898475	77323855	1085693
BEBLID-256	549	174257	705651	4.049484	0.656167	898491	78223028	1153261
LATCH	573	186886	759581	4.064408	0.655908	898825	66395879	1245053
BAD-256	600	192638	789466	4.098184	0.675328	898344	72555880	1236144
BAD-512	622	189523	812243	4.285723	0.677531	898327	68242893	1268840
RSIFT	729	286519	1136306	3.965901	0.678011	898184	77745627	1349061
Binboost	514	143622	629993	4.386466	0.668252	897792	67946197	1129936
LDAHash-DIF	592	233862	804944	3.441961	0.642139	898544	95827306	1046695
HashSIFT-256	720	298920	1075450	3.597785	0.667772	898464	88150278	1202895
HashSIFT-512	720	305237	1160738	3.802743	0.686795	898459	87769325	1387138
TFeat-m*	690	262790	986470	3.753834	0.677615	897709	75823683	1233791
HardNet	849	359610	1438909	4.001304	0.701354	898257	79144113	1436234
CDbin-256b	769	260690	1108018	4.250328	0.696556	898274	79222034	1347656
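As a reading aid, the Track Length column appears consistent with the ratio of # Observations to # Sparse Points, i.e., the mean number of images in which each reconstructed 3D point is observed. The short check below uses three rows copied from the table; the helper name `mean_track_length` is hypothetical, and the interpretation is inferred from the numbers rather than stated on this page.

```python
def mean_track_length(num_observations, num_sparse_points):
    """Average number of observations per reconstructed sparse point."""
    return num_observations / num_sparse_points


# (sparse points, observations, track length as reported in the table)
rows = {
    "ORB, Fountain": (15001, 71171, 4.744417),
    "BAD-256, South Building": (145771, 727953, 4.993812),
    "HardNet, Madrid Metropolis": (359610, 1438909, 4.001304),
}

for name, (points, observations, reported) in rows.items():
    computed = mean_track_length(observations, points)
    print(f"{name}: computed {computed:.6f}, reported {reported:.6f}")
```

Under this reading, a descriptor can register more images and points while showing a shorter track length, as happens in several Madrid Metropolis rows.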

Citation

If you use this project, please cite:

@article{suarez2021revisiting,
  title={Revisiting Binary Local Image Description for Resource Limited Devices},
  author={Su{\'a}rez, Iago and Buenaposada, Jos{\'e} M and Baumela, Luis},
  journal={IEEE Robotics and Automation Letters},
  volume={6},
  number={4},
  pages={8317--8324},
  year={2021},
  publisher={IEEE}
}