Line segments are powerful features, complementary to points. They provide structural cues, are robust to drastic viewpoint and illumination changes, and can be present even in texture-less areas. However, describing and matching them is more challenging than for points due to partial occlusions, lack of texture, or repetitiveness. This paper introduces a new matching paradigm, where points, lines, and their descriptors are unified into a single wireframe structure. We propose GlueStick, a deep matching Graph Neural Network (GNN) that takes two wireframes from different images and leverages the connectivity information between nodes to better glue them together. In addition to the increased efficiency compared to independently matching points and lines, we also demonstrate a large boost of performance when leveraging the complementary nature of these two features in a single architecture. We show that our matching strategy outperforms the state-of-the-art approaches that independently match line segments and points, across a wide variety of datasets and tasks.
Keypoints, dense descriptors, and lines are extracted from the two images and unified into a wireframe for each image (front-end). We then take the two corresponding wireframes and enrich the features of their nodes via self-, line-, and cross-attention inside a Graph Neural Network (GNN). Finally, points and lines are matched separately via two dual-softmax modules.
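To make this pipeline more concrete, here is a minimal PyTorch sketch of how node features could be alternately refined with self- and cross-attention before the assignment step. Module names, feature dimensions, and the number of layers are our own assumptions, not the official GlueStick implementation, and the dedicated line attention along wireframe edges is only hinted at in a comment.

```python
# Illustrative sketch of the GNN refinement loop (hypothetical names and dims).
import torch
import torch.nn as nn


class AttentionBlock(nn.Module):
    """Multi-head attention that updates `x` with messages aggregated from `context`."""

    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x, context):
        msg, _ = self.attn(x, context, context)            # aggregate messages
        return x + self.mlp(torch.cat([x, msg], dim=-1))   # residual update


class WireframeGNN(nn.Module):
    """Alternates self- and cross-attention over the nodes of two wireframes."""

    def __init__(self, dim=256, num_layers=9):
        super().__init__()
        self.self_layers = nn.ModuleList(AttentionBlock(dim) for _ in range(num_layers))
        self.cross_layers = nn.ModuleList(AttentionBlock(dim) for _ in range(num_layers))

    def forward(self, feats0, feats1):
        # feats0 / feats1: (B, N, dim) node features of the two wireframes.
        # A full implementation would also propagate messages along wireframe
        # edges (the "line attention"); it is omitted here for brevity.
        for self_l, cross_l in zip(self.self_layers, self.cross_layers):
            feats0, feats1 = self_l(feats0, feats0), self_l(feats1, feats1)
            feats0, feats1 = cross_l(feats0, feats1), cross_l(feats1, feats0)
        return feats0, feats1
```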
The first step in our pipeline is to build a wireframe from points and lines.
This process lifts the unstructured line cloud into an interconnected wireframe. After this step, each keypoint and line endpoint is represented as a node in the wireframe.
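As a rough illustration of such a front-end (not the actual GlueStick code), the sketch below builds a wireframe by keeping every keypoint as a node, merging line endpoints that fall within a small pixel distance of each other, and adding one edge per segment. The data layout and the merging threshold are assumptions.

```python
# Hypothetical sketch of wireframe construction.
import numpy as np


def build_wireframe(keypoints, lines, merge_thresh=3.0):
    """keypoints: (K, 2) pixels, lines: (L, 2, 2) endpoint pairs in pixels."""
    nodes = [np.asarray(kp, float) for kp in keypoints]   # keypoints are nodes as-is
    endpoint_nodes = []                                    # indices of endpoint nodes
    edges = []                                             # one (i, j) edge per segment

    def endpoint_node(pt):
        # Merge endpoints that fall within `merge_thresh` pixels of an existing
        # endpoint node (threshold value is an assumption).
        for i in endpoint_nodes:
            if np.linalg.norm(nodes[i] - pt) < merge_thresh:
                return i
        nodes.append(np.asarray(pt, float))
        endpoint_nodes.append(len(nodes) - 1)
        return len(nodes) - 1

    for p0, p1 in lines:
        edges.append((endpoint_node(p0), endpoint_node(p1)))

    return np.stack(nodes), edges
```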
Instead of the commonly used optimal transport assignment, we use a dual-softmax approach, which brings higher efficiency with similar or better matching results.
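For intuition, a dual-softmax assignment can be sketched as follows: a similarity matrix is normalized with a softmax along each of its two dimensions, the two results are multiplied, and mutual nearest neighbors above a confidence threshold are kept. The temperature and threshold values below are illustrative assumptions.

```python
# Illustrative dual-softmax matching (hyper-parameters are assumptions).
import torch


def dual_softmax_match(desc0, desc1, temperature=0.1, min_score=0.2):
    """desc0: (N, D), desc1: (M, D) L2-normalized node descriptors."""
    sim = desc0 @ desc1.t() / temperature              # (N, M) similarity matrix
    scores = sim.softmax(dim=0) * sim.softmax(dim=1)   # dual-softmax confidences
    # Keep mutual nearest neighbors with sufficient confidence.
    max0, idx0 = scores.max(dim=1)                     # best candidate in image 1 for each node of image 0
    idx1 = scores.max(dim=0).indices                   # best candidate in image 0 for each node of image 1
    mutual = idx1[idx0] == torch.arange(scores.shape[0])
    valid = mutual & (max0 > min_score)
    matches = torch.stack([torch.nonzero(valid).squeeze(1), idx0[valid]], dim=1)
    return matches, max0                               # (num_matches, 2) index pairs, confidences
```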
GlueStick matches both points and lines in a single forward pass. We propose to match nodes and lines separately through two independent dual-softmax assignments. On the one hand, all nodes (keypoints and line endpoints) are matched against each other using the final features output by the GNN. On the other hand, lines are matched in a similar way, except that each line is represented by its two endpoint features. To make the matching agnostic to the endpoint ordering, we take the maximum of the two configurations in the line assignment matrix.
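Under the same assumptions, the endpoint-order-invariant line scoring could look like the sketch below: each candidate line pair is scored from its endpoint descriptors under both possible orderings, and the maximum of the two is kept.

```python
# Sketch of endpoint-order-invariant line scoring (illustrative only).
import torch


def line_similarity(ep_desc0, ep_desc1):
    """ep_desc0: (L0, 2, D), ep_desc1: (L1, 2, D) endpoint descriptors of each line."""
    # Scores when matching endpoints in the same order...
    s_aa = torch.einsum('ld,md->lm', ep_desc0[:, 0], ep_desc1[:, 0])
    s_bb = torch.einsum('ld,md->lm', ep_desc0[:, 1], ep_desc1[:, 1])
    # ...and when matching them in flipped order.
    s_ab = torch.einsum('ld,md->lm', ep_desc0[:, 0], ep_desc1[:, 1])
    s_ba = torch.einsum('ld,md->lm', ep_desc0[:, 1], ep_desc1[:, 0])
    # Taking the maximum over the two orderings makes the score order-agnostic.
    return torch.maximum(s_aa + s_bb, s_ab + s_ba)     # (L0, L1) line score matrix
```

The resulting score matrix can then be passed to the same kind of dual-softmax assignment as for nodes.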
We use 3D data to train and evaluate our point and line matches. Obtaining ground truth matches between lines, even with 3D data, is a tricky process. We determine whether two lines are a correct match by sampling points along each line (cyan dots in the left image). We compute their 3D locations in the world and re-project them into the other image (green points). If a sufficient number of green points fall close to a 2D segment in the second image, we generate a GT match!
Ground truth (GT) line assignments. Line segments with the same color are labeled as matches in our GT.
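The sketch below illustrates this labeling procedure under simplifying assumptions: the unprojection/projection helpers (based on depth maps and camera poses), the number of samples, and the distance and inlier thresholds are all hypothetical.

```python
# Hypothetical sketch of the GT line-match labeling.
import numpy as np


def point_to_segment_dist(p, a, b):
    """Distance from 2D point p to segment [a, b]."""
    ab, ap = b - a, p - a
    t = np.clip(np.dot(ap, ab) / max(np.dot(ab, ab), 1e-9), 0.0, 1.0)
    return np.linalg.norm(p - (a + t * ab))


def is_line_match(line1, line2, unproject1, project2,
                  num_samples=10, px_thresh=3.0, min_inlier_ratio=0.5):
    """line1 / line2: (2, 2) endpoints in image 1 / image 2.
    unproject1: pixel -> 3D world point (uses the depth map of image 1).
    project2:   3D world point -> pixel in image 2 (or None if invalid)."""
    ts = np.linspace(0.0, 1.0, num_samples)
    samples = line1[0] + ts[:, None] * (line1[1] - line1[0])   # cyan dots
    inliers = 0
    for p in samples:
        X = unproject1(p)                  # lift to 3D with the depth map
        q = project2(X)                    # re-project into the other image (green dots)
        if q is not None and point_to_segment_dist(q, line2[0], line2[1]) < px_thresh:
            inliers += 1
    return inliers / num_samples >= min_inlier_ratio
```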
Qualitative comparison of line matches obtained with LBD, SOLD2, LineTR, L2D2, and GlueStick.
Examples of GlueStick matches on image pairs of SUN360. We provide the point and line matches, as well as the stitching of the two images using the resulting matches.
@inproceedings{pautrat_suarez_2023_gluestick,
  title={{GlueStick}: Robust Image Matching by Sticking Points and Lines Together},
  author={Pautrat, R{\'e}mi and Su{\'a}rez, Iago and Yu, Yifan and Pollefeys, Marc and Larsson, Viktor},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2023}
}