GlueStick: Robust Image Matching by Sticking Points and Lines Together

Rémi Pautrat*¹, Iago Suárez*², Yifan Yu¹, Marc Pollefeys¹,³, Viktor Larsson⁴

*Equal contribution
¹ETH Zürich, ²Qualcomm XR Labs Europe, ³Microsoft Mixed Reality and AI Zurich Lab, ⁴Lund University

ICCV 2023


Method Overview

Keypoints, dense descriptors, and lines are extracted from two images and unified into a wireframe for each image (front-end). We then take the two corresponding wireframes and enrich the features of their nodes via self-attention, line message passing, and cross-attention inside a Graph Neural Network (GNN). Finally, points and lines are matched separately via two dual-softmax modules.

1. From Points and Lines to Wireframes

The first step in our pipeline is to build a wireframe from points and lines:

We use SuperPoint (SP) to predict keypoints and a dense descriptor map.
We detect segments with the general-purpose LSD detector.
Keypoints located close to line endpoints are redundant, so we remove SP keypoints that are within a small distance of existing line endpoints.
We merge close-by endpoints, again with a distance threshold.

This process lifts the unstructured line cloud into an interconnected wireframe. After this step, each keypoint and line endpoint is represented as a node in the wireframe.
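The wireframe construction steps can be sketched in a few lines. This is a minimal illustration, not the GlueStick implementation: SuperPoint and LSD are not called (keypoints and segments are passed in as plain coordinate lists), the function name `build_wireframe` and the thresholds `kp_thresh` and `merge_thresh` are hypothetical, and merged nodes simply keep the position of the first endpoint seen rather than an averaged position.

```python
import math


def _dist(p, q):
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def build_wireframe(keypoints, segments, kp_thresh=3.0, merge_thresh=3.0):
    """Hypothetical helper: combine keypoints and line segments into a wireframe.

    keypoints: list of (x, y) points (stand-in for SuperPoint detections).
    segments:  list of ((x1, y1), (x2, y2)) segments (stand-in for LSD output).
    Returns (nodes, edges): nodes is a list of (x, y) positions, edges a list of
    (i, j) node-index pairs, one per surviving segment.
    """
    endpoints = [p for seg in segments for p in seg]

    # Step 1: drop keypoints that lie within kp_thresh of any line endpoint.
    kept_kps = [k for k in keypoints
                if all(_dist(k, e) > kp_thresh for e in endpoints)]

    # Step 2: greedily merge endpoints; an endpoint reuses the first existing
    # node within merge_thresh, otherwise it creates a new node.
    nodes, edges = [], []

    def node_index(p):
        for i, n in enumerate(nodes):
            if _dist(p, n) <= merge_thresh:
                return i
        nodes.append(p)
        return len(nodes) - 1

    for a, b in segments:
        i, j = node_index(a), node_index(b)
        if i != j:  # skip segments whose two endpoints collapsed together
            edges.append((i, j))

    # Step 3: remaining keypoints become isolated nodes (no line edges).
    nodes.extend(kept_kps)
    return nodes, edges
```

For example, two segments meeting at a corner share one node, and a keypoint sitting on a line endpoint is discarded: `build_wireframe([(0.5, 0.5), (50, 50)], [((0, 0), (10, 0)), ((10, 1), (10, 10))])` yields four nodes connected by two edges, with `(50, 50)` as an isolated keypoint node.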

2. Attention-based Graph Neural Network

Graph Neural Network (GNN) architecture. Node features of the wireframe are enriched via several communication layers. Our proposed Line Message Passing exchanges information between neighboring nodes that are connected by a line segment in the wireframe.
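The idea of passing messages only along wireframe edges can be illustrated with a deliberately simplified sketch. Here each node adds the mean feature of its line-connected neighbors to its own feature; the learned message function of the actual layer is replaced by the identity, and the name `line_message_passing` is hypothetical. Isolated keypoints, which have no line neighbors, are left unchanged by this step (they still communicate through self- and cross-attention).

```python
def line_message_passing(features, edges):
    """Simplified, hypothetical line message passing.

    features: list of per-node feature vectors (lists of floats).
    edges:    list of (i, j) node-index pairs, one per line segment.
    Returns new features: x_i + mean of x_j over nodes j sharing a line with i.
    """
    n = len(features)
    dim = len(features[0]) if features else 0

    # Build an undirected adjacency list from the wireframe segments.
    neighbors = [[] for _ in range(n)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    out = []
    for i in range(n):
        if not neighbors[i]:
            # Isolated keypoint: no line neighbors, feature passes through.
            out.append(list(features[i]))
            continue
        k = len(neighbors[i])
        mean = [sum(features[j][d] for j in neighbors[i]) / k for d in range(dim)]
        # Residual update: own feature plus aggregated neighbor message.
        out.append([features[i][d] + mean[d] for d in range(dim)])
    return out
```

Averaging keeps the update magnitude independent of how many segments meet at a junction, which is one plausible reason to prefer a mean over a sum here.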

3. Dual-Softmax for Points and Lines

Instead of the frequently used optimal transport assignment, we use a dual-softmax approach, which is more efficient and yields similar or better matching results.
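A dual-softmax assignment can be sketched as follows: a softmax is applied over each row and each column of the score matrix, the two are multiplied element-wise, and mutual nearest neighbors above a confidence threshold are kept as matches. This sketch assumes plain Python lists instead of tensors, has no dustbin for unmatched features, and uses a hypothetical `threshold=0.2`; GlueStick's exact post-processing may differ.

```python
import math


def _softmax(xs):
    """Numerically stable softmax of a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def dual_softmax(scores):
    """P[i][j] = softmax over row i at j * softmax over column j at i."""
    rows = [_softmax(r) for r in scores]
    cols = [_softmax(list(c)) for c in zip(*scores)]  # cols[j][i]
    return [[rows[i][j] * cols[j][i] for j in range(len(scores[0]))]
            for i in range(len(scores))]


def mutual_matches(prob, threshold=0.2):
    """Keep (i, j) if j is best for i, i is best for j, and P[i][j] >= threshold."""
    matches = []
    for i, row in enumerate(prob):
        j = max(range(len(row)), key=row.__getitem__)
        best_i = max(range(len(prob)), key=lambda k: prob[k][j])
        if best_i == i and row[j] >= threshold:
            matches.append((i, j))
    return matches
```

With `scores = [[5.0, 4.0], [5.0, 0.0]]`, both rows prefer column 0, but column 0 prefers row 1, so only `(1, 0)` survives the mutual check.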

GlueStick matches both points and lines in a single forward pass. We propose to match nodes and lines separately through two independent dual-softmax assignments. On the one hand, all nodes (keypoints and line endpoints) are matched against each other using the final features output by the GNN. On the other hand, lines are matched in a similar way, except that each line is represented by the features of its two endpoints. To make the matching agnostic to the endpoint ordering, we take the maximum of the two configurations in the line assignment matrix.

Line matching with order-agnostic endpoints. For each pair of lines, we keep the maximum score over the two possible endpoint correspondences.
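The order-agnostic line score can be sketched by comparing endpoints in both configurations (start-start/end-end and start-end/end-start) and keeping the larger score. The dot-product similarity, the averaging of the two endpoint scores, and the name `line_scores` are assumptions of this sketch; the resulting line score matrix would then go through its own dual-softmax.

```python
def dot(u, v):
    """Dot product of two feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def line_scores(lines0, lines1):
    """Order-agnostic line score matrix (hypothetical sketch).

    lines0, lines1: lists of (feat_start, feat_end) endpoint feature pairs.
    Entry [i][j] is the max over the two endpoint configurations of the mean
    endpoint similarity, so flipping a line's endpoints does not change it.
    """
    scores = []
    for a1, a2 in lines0:
        row = []
        for b1, b2 in lines1:
            same = (dot(a1, b1) + dot(a2, b2)) / 2     # start-start, end-end
            flipped = (dot(a1, b2) + dot(a2, b1)) / 2  # start-end, end-start
            row.append(max(same, flipped))
        scores.append(row)
    return scores
```

Because of the `max`, a line detected as `A→B` in one image and as `B→A` in the other receives the same score, which is exactly the invariance the assignment needs.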

4. Ground Truth Generation

We use 3D data to train and evaluate our point and line matches. Obtaining ground-truth line matches is tricky, even with 3D data. We determine whether two lines are a correct match by sampling points along each line (cyan dots in the left image). We compute the 3D locations of these points in the world and re-project them into the other image (green points). If a sufficient fraction of the green points falls close to a 2D segment in the second image, we label the pair as a ground-truth (GT) match!
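The line check above can be sketched with a pinhole camera model. The sketch assumes a depth lookup for the first image (`depth0`, returning `None` where depth is missing), intrinsics given as `(fx, fy, cx, cy)`, and a relative pose `R, t` mapping camera-0 coordinates into camera-1 coordinates. All function names, the number of samples `n`, the pixel threshold `px_thresh`, and the required fraction `min_ratio` are hypothetical choices, not the values used for GlueStick.

```python
import math


def sample_segment(p, q, n):
    """n evenly spaced points from p to q (inclusive)."""
    return [(p[0] + (q[0] - p[0]) * k / (n - 1),
             p[1] + (q[1] - p[1]) * k / (n - 1)) for k in range(n)]


def backproject(pt, depth, K):
    """Pixel + depth -> 3D point in the camera frame (pinhole model)."""
    fx, fy, cx, cy = K
    return ((pt[0] - cx) / fx * depth, (pt[1] - cy) / fy * depth, depth)


def transform(X, R, t):
    """Apply rigid transform X' = R X + t."""
    return tuple(sum(R[r][c] * X[c] for c in range(3)) + t[r] for r in range(3))


def project(X, K):
    """3D point in the camera frame -> pixel (pinhole model)."""
    fx, fy, cx, cy = K
    return (fx * X[0] / X[2] + cx, fy * X[1] / X[2] + cy)


def point_segment_dist(pt, a, b):
    """Distance from pt to the segment [a, b]."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    l2 = dx * dx + dy * dy
    s = 0.0 if l2 == 0 else max(0.0, min(1.0, ((pt[0] - a[0]) * dx + (pt[1] - a[1]) * dy) / l2))
    return math.hypot(pt[0] - (a[0] + s * dx), pt[1] - (a[1] + s * dy))


def is_gt_match(seg0, seg1, depth0, K0, K1, R, t, n=10, px_thresh=3.0, min_ratio=0.5):
    """Hypothetical GT test: enough re-projected samples of seg0 lie near seg1."""
    close = 0
    for pt in sample_segment(seg0[0], seg0[1], n):
        d = depth0(pt)
        if d is None or d <= 0:
            continue  # no valid depth at this sample
        X1 = transform(backproject(pt, d, K0), R, t)
        if X1[2] <= 0:
            continue  # point falls behind the second camera
        if point_segment_dist(project(X1, K1), seg1[0], seg1[1]) <= px_thresh:
            close += 1
    return close / n >= min_ratio
```

Counting against all `n` samples (rather than only those with valid depth) is a conservative choice: a line with mostly missing depth cannot become a GT match.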

Ground-truth (GT) line assignments. Line segments with the same color are labeled as matches in our GT.

Some Cool Results 🚀

Line Matching Comparison

Compared line matchers: LBD, SOLD2, LineTR, L2D2, and GlueStick.

Image Stitching

Examples of GlueStick matches on image pairs of SUN360. We provide the point and line matches, as well as the stitching of the two images using the resulting matches.