
Abstract

The advent of a panoply of resource-limited devices opens up new challenges in the design of computer vision algorithms with a clear compromise between accuracy and computational requirements. In this paper we address this and introduce binary image descriptors that establish new operating points in the state-of-the-art's accuracy vs. resources trade-off curve. We revisit descriptors based on pixel differences and gradients to introduce, respectively, BAD (Box Average Difference), the fastest binary descriptor in the literature, and HashSIFT. They are trained using triplet ranking loss, hard negative mining and anchor swap, combined with a new efficient feature selection algorithm. In our experiments we evaluate the accuracy, execution time and energy consumption of the proposed descriptors. We show that they are the most accurate when compared with competing techniques with similar computational requirements. Further, in a planar image registration task, HashSIFT performs on par with the top deep learning-based descriptors while being several orders of magnitude more efficient.

Video

Learning Efficient Local Descriptors

The goal of any local feature descriptor is to learn a similarity function \( \mathcal{S}(\cdot, \cdot) \) between local features. We define the training objective \( \mathcal{L}_{\text{TRL}} \) of our descriptors with the Triplet Ranking Loss (TRL). It brings different descriptions (\( \mathbf{a}_i \), \( \mathbf{p}_i \)) of the same scene point closer while pushing apart descriptors from other scene points \( \mathbf{n}_i \). Its benefit compared with the contrastive pair-wise loss is that it is more closely related to the nearest-neighbor matching task, where a good keypoint match is produced only if the correct corresponding keypoint is the closest in descriptor distance.
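
As a rough illustration of the objective described above, here is a minimal sketch of a margin-based triplet ranking loss. The Euclidean distance, the `margin` value and the function name `triplet_ranking_loss` are assumptions made for this example, not details taken from the paper.

```python
import math

def triplet_ranking_loss(anchors, positives, negatives, margin=1.0):
    """Mean hinge loss max(0, margin + d(a, p) - d(a, n)) over a batch.

    Assumption: descriptors are plain lists of floats compared with the
    Euclidean distance; the distance actually used in training may differ.
    """
    losses = []
    for a, p, n in zip(anchors, positives, negatives):
        d_pos = math.dist(a, p)  # same scene point: should be small
        d_neg = math.dist(a, n)  # other scene point: should be large
        losses.append(max(0.0, margin + d_pos - d_neg))
    return sum(losses) / len(losses)

# Toy batch with two triplets of 2-D "descriptors".
anchors = [[0.0, 0.0], [1.0, 1.0]]
positives = [[0.0, 1.0], [1.0, 1.0]]
negatives = [[3.0, 4.0], [1.0, 2.0]]
loss = triplet_ranking_loss(anchors, positives, negatives, margin=2.0)
```

A triplet contributes zero loss once its negative is farther from the anchor than its positive by at least the margin, which is the same ordering that nearest-neighbor matching relies on.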

Hard negative mining challenges the TRL with descriptors of different scene points that lie closest to the anchor. At each iteration, we choose our negative \( \mathbf{n}_i \) as the hardest in the batch (i.e., the one with the smallest descriptor distance).
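
The hardest-in-batch selection can be sketched as follows. This is a simplified illustration under stated assumptions: row `i` of `anchors` and `positives` describes the same scene point, negatives are drawn only from the positives of other rows, and the Euclidean distance stands in for the descriptor distance. The function name `hardest_negative_indices` is hypothetical, and the anchor swap mentioned in the abstract is not included.

```python
import math

def hardest_negative_indices(anchors, positives):
    """For each anchor i, return the index j != i of the closest positive.

    positives[j] with j != i belongs to another scene point, so the closest
    one is the hardest negative available in the batch for anchor i.
    """
    indices = []
    for i, a in enumerate(anchors):
        best_j, best_d = None, math.inf
        for j, p in enumerate(positives):
            if j == i:
                continue  # skip the true match of this anchor
            d = math.dist(a, p)
            if d < best_d:
                best_j, best_d = j, d
        indices.append(best_j)
    return indices

# Toy batch with three scene points.
anchors = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
positives = [[0.0, 1.0], [9.0, 0.0], [1.0, 9.0]]
hard = hardest_negative_indices(anchors, positives)
negatives = [positives[j] for j in hard]
```

The selected `negatives` can then be fed to the triplet loss in place of randomly sampled ones.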

Figure panels: contrastive learning; triplet ranking loss; hard negative mining.

Results

Here we add some extra results comparing the proposed descriptors with other state-of-the-art approaches:

BAD-256 Reconstruction of Madrid Metropolis

BAD-512 Fundamental matrix estimation (EuRoC)

ETH Benchmark

Full results table for the ETH Benchmark:

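Fountain (11 images)

| Descriptor | # Registered | # Sparse Points | # Observations | Track Length | Reproj. Error | # Inlier Pairs | # Inlier Matches | # Dense Points |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ORB | 11 | 15001 | 71171 | 4.744417 | 0.384306 | 55 | 125033 | 306277 |
| BEBLID-256 | 11 | 15539 | 74044 | 4.765043 | 0.394489 | 55 | 133838 | 303771 |
| LATCH | 11 | 15384 | 73907 | 4.804147 | 0.401214 | 55 | 135643 | 307421 |
| BAD-256 | 11 | 15574 | 74404 | 4.77745 | 0.397276 | 55 | 135943 | 307932 |
| BAD-512 | 11 | 15741 | 75613 | 4.80357 | 0.407335 | 55 | 141365 | 305564 |
| RSIFT | 11 | 16167 | 77879 | 4.817158 | 0.433049 | 55 | 154688 | 307027 |
| Binboost | 11 | 15391 | 73011 | 4.743746 | 0.397668 | 55 | 129571 | 302792 |
| LDAHash-DIF | 11 | 15134 | 70865 | 4.682503 | 0.389491 | 55 | 122713 | 304385 |
| HashSIFT-256 | 11 | 16086 | 77507 | 4.818289 | 0.427431 | 55 | 149103 | 306132 |
| HashSIFT-512 | 11 | 16385 | 79082 | 4.826488 | 0.438388 | 55 | 156135 | 305520 |
| TFeat-m* | 11 | 16278 | 78880 | 4.845804 | 0.431607 | 55 | 153725 | 305073 |
| HardNet | 11 | 17071 | 83973 | 4.919044 | 0.477603 | 55 | 183331 | 305701 |
| CDbin-256b | 11 | 16607 | 81360 | 4.899139 | 0.455184 | 55 | 168946 | 305534 |

Herzjesu (8 images)

| Descriptor | # Registered | # Sparse Points | # Observations | Track Length | Reproj. Error | # Inlier Pairs | # Inlier Matches | # Dense Points |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ORB | 8 | 7619 | 31475 | 4.13112 | 0.41019 | 28 | 46625 | 237948 |
| BEBLID-256 | 8 | 7922 | 33414 | 4.217874 | 0.429793 | 28 | 51720 | 241862 |
| LATCH | 8 | 7871 | 33058 | 4.199975 | 0.430669 | 28 | 50739 | 240523 |
| BAD-256 | 8 | 8056 | 34038 | 4.225174 | 0.435542 | 28 | 53059 | 242998 |
| BAD-512 | 8 | 8220 | 34893 | 4.244891 | 0.448551 | 28 | 55866 | 236171 |
| RSIFT | 8 | 8533 | 36279 | 4.251611 | 0.476318 | 28 | 60808 | 241740 |
| Binboost | 8 | 7630 | 32009 | 4.195151 | 0.454498 | 28 | 47763 | 233824 |
| LDAHash-DIF | 8 | 7912 | 32683 | 4.130814 | 0.435268 | 28 | 48765 | 244861 |
| HashSIFT-256 | 8 | 8560 | 36392 | 4.251402 | 0.473129 | 28 | 59246 | 240978 |
| HashSIFT-512 | 8 | 8769 | 37376 | 4.262288 | 0.479877 | 28 | 62297 | 240154 |
| TFeat-m* | 8 | 8631 | 36727 | 4.255243 | 0.476186 | 28 | 60675 | 239675 |
| HardNet | 8 | 9444 | 40483 | 4.286637 | 0.517284 | 28 | 74867 | 239362 |
| CDbin-256b | 8 | 8997 | 38650 | 4.295876 | 0.497678 | 28 | 67802 | 242179 |

South Building (128 images)

| Descriptor | # Registered | # Sparse Points | # Observations | Track Length | Reproj. Error | # Inlier Pairs | # Inlier Matches | # Dense Points |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ORB | 128 | 137627 | 695789 | 5.055614 | 0.496237 | 8128 | 2285089 | 2137625 |
| BEBLID-256 | 128 | 141604 | 710290 | 5.016031 | 0.500718 | 8128 | 2347648 | 2134091 |
| LATCH | 128 | 139584 | 716808 | 5.135316 | 0.521234 | 8128 | 2345677 | 2144368 |
| BAD-256 | 128 | 145771 | 727953 | 4.993812 | 0.515675 | 8128 | 2435017 | 2145993 |
| BAD-512 | 128 | 148491 | 744604 | 5.014472 | 0.527237 | 8128 | 2533879 | 2127316 |
| RSIFT | 128 | 155195 | 798456 | 5.144856 | 0.58171 | 8128 | 2836156 | 2139778 |
| Binboost | 128 | 135186 | 690751 | 5.109634 | 0.510165 | 8128 | 2220460 | 2156847 |
| LDAHash-DIF | 128 | 141248 | 705928 | 4.997791 | 0.511755 | 8128 | 2469511 | 2132395 |
| HashSIFT-256 | 128 | 149102 | 764699 | 5.128697 | 0.563444 | 8128 | 2718812 | 2116461 |
| HashSIFT-512 | 128 | 156888 | 798948 | 5.092474 | 0.581466 | 8128 | 2904787 | 2142022 |
| TFeat-m* | 128 | 152834 | 775159 | 5.071902 | 0.574171 | 8128 | 2721956 | 2149925 |
| HardNet | 128 | 168536 | 878847 | 5.214595 | 0.642522 | 8128 | 3344759 | 2122914 |
| CDbin-256b | 128 | 160589 | 832281 | 5.182678 | 0.616106 | 8128 | 3124870 | 2128460 |

Madrid Metropolis (1344 images)

| Descriptor | # Registered | # Sparse Points | # Observations | Track Length | Reproj. Error | # Inlier Pairs | # Inlier Matches | # Dense Points |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ORB | 457 | 135826 | 576138 | 4.241736 | 0.641296 | 898475 | 77323855 | 1085693 |
| BEBLID-256 | 549 | 174257 | 705651 | 4.049484 | 0.656167 | 898491 | 78223028 | 1153261 |
| LATCH | 573 | 186886 | 759581 | 4.064408 | 0.655908 | 898825 | 66395879 | 1245053 |
| BAD-256 | 600 | 192638 | 789466 | 4.098184 | 0.675328 | 898344 | 72555880 | 1236144 |
| BAD-512 | 622 | 189523 | 812243 | 4.285723 | 0.677531 | 898327 | 68242893 | 1268840 |
| RSIFT | 729 | 286519 | 1136306 | 3.965901 | 0.678011 | 898184 | 77745627 | 1349061 |
| Binboost | 514 | 143622 | 629993 | 4.386466 | 0.668252 | 897792 | 67946197 | 1129936 |
| LDAHash-DIF | 592 | 233862 | 804944 | 3.441961 | 0.642139 | 898544 | 95827306 | 1046695 |
| HashSIFT-256 | 720 | 298920 | 1075450 | 3.597785 | 0.667772 | 898464 | 88150278 | 1202895 |
| HashSIFT-512 | 720 | 305237 | 1160738 | 3.802743 | 0.686795 | 898459 | 87769325 | 1387138 |
| TFeat-m* | 690 | 262790 | 986470 | 3.753834 | 0.677615 | 897709 | 75823683 | 1233791 |
| HardNet | 849 | 359610 | 1438909 | 4.001304 | 0.701354 | 898257 | 79144113 | 1436234 |
| CDbin-256b | 769 | 260690 | 1108018 | 4.250328 | 0.696556 | 898274 | 79222034 | 1347656 |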

Citation

If you use this project please cite:

@article{suarez2021revisiting,
  title={Revisiting Binary Local Image Description for Resource Limited Devices},
  author={Su{\'a}rez, Iago and Buenaposada, Jos{\'e} M and Baumela, Luis},
  journal={IEEE Robotics and Automation Letters},
  volume={6},
  number={4},
  pages={8317--8324},
  year={2021},
  publisher={IEEE}
}