Accurately matching points across different images of the same object or scene is a crucial step in many computer vision tasks. Local image descriptors must handle image transformations to remain invariant to changing viewing conditions, such as illumination and perspective changes. In this project, we focus on improving local image description using multi-modal data and data-driven solutions. For example, we can improve the invariance of an image descriptor to non-rigid surfaces by employing geodesic awareness, or by explicitly modelling deformations via a learned function.

Datasets

Copyright Notice

The datasets available for download on this page are published under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License. This means you must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use. You may not use the material for commercial purposes.

Dataset of Deformable RGB-D Images (Presented at ICCV’19):
Kinect 1 Sequences (38 MB): This dataset contains six real-world objects under varying deformation levels and illumination changes. The RGB-D images were acquired at 640 x 480 resolution with a Kinect 1 sensor. Each image has approximately 50 manually annotated keypoints.

Simulation (26 MB): This dataset is composed of simulated RGB-D sequences (640 x 480 pixels) generated with a cloth physics engine. Several textured cloths are subjected to challenging non-rigid deformations, illumination, rotation, and scale changes. The keypoints in each sequence are selected by Harris score in the first (reference) texture image, and their exact correspondences over time are tracked in the simulation.

Extended Dataset (Proposed in CVIU’22):
Kinect 2 Sequences (1.1 GB): This dataset contains five additional real-world objects acquired with a Kinect 2 sensor at 1920 x 1080 resolution. For each of the five objects, we provide image sequences at three deformation levels: light, medium, and heavy. Eighty accurate pointwise correspondences are automatically obtained with a motion capture system.

Dataset File Format: All datasets follow the same format: color images are stored as 8-bit PNG files, and depth images as 16-bit PNG files with values in millimetres. The intrinsics.xml file contains the intrinsic parameters of the camera, allowing reconstruction of the point cloud. Each image also has a corresponding .csv file, where each line consists of a keypoint number (ID), its 2D image coordinates, and a boolean flag indicating whether the keypoint is visible in the current frame. The keypoints are selected in the reference image; therefore, all keypoints are visible in the reference frame.
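
As an illustration, the following minimal Python sketch loads one frame with its keypoints and back-projects the depth map to a point cloud. The file names, the CSV column order, and the node name inside intrinsics.xml are assumptions made for the example and may not match the released files exactly.

    # Minimal loading sketch. File names, the CSV column order (no header row),
    # and the XML node name are assumptions; adapt them to the dataset layout.
    import cv2
    import numpy as np

    # Depth is a 16-bit PNG in millimetres; IMREAD_UNCHANGED preserves the 16 bits.
    color = cv2.imread("seq/color_000.png", cv2.IMREAD_COLOR)
    depth = cv2.imread("seq/depth_000.png", cv2.IMREAD_UNCHANGED).astype(np.float32)

    # Read the 3x3 camera matrix from the OpenCV XML storage (node name assumed).
    fs = cv2.FileStorage("seq/intrinsics.xml", cv2.FILE_STORAGE_READ)
    K = fs.getNode("intrinsics").mat()
    fs.release()
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]

    # Back-project every pixel with valid depth using the pinhole model:
    # X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
    v, u = np.nonzero(depth > 0)
    z = depth[v, u] / 1000.0              # millimetres -> metres
    cloud = np.stack([(u - cx) * z / fx, (v - cy) * z / fy, z], axis=1)

    # Keypoints: one line per keypoint with ID, x, y, visibility flag.
    kps = np.loadtxt("seq/color_000.csv", delimiter=",")
    visible = kps[kps[:, 3] > 0]          # keypoints visible in this frame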

Dense TPS warps: In addition to the pixel-accurate landmarks contained in the .csv files, we also provide a dense thin-plate spline (TPS) warp from the reference image to the target frames. The dense TPS warps are obtained by first using the landmarks as control points for a coarse TPS estimation; these warps are then progressively refined by minimizing a photometric cost. This script demonstrates how to use the TPS warp files to generate ground-truth correspondences for SIFT-detected keypoints.

Kinect 1 (TPS files, 2.9 MB) | Kinect 2 (TPS files, 43 MB) | Simulation (TPS files, 4.0 MB)
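
The format of the released TPS files is not described here, so the sketch below instead illustrates the underlying idea: fitting the coarse, landmark-based TPS stage with SciPy's thin-plate-spline interpolator and using it to transfer SIFT keypoints from the reference image to a target frame (without the photometric refinement). File names and the CSV layout are the same assumptions as above.

    # Coarse TPS warp from landmark correspondences, used to transfer SIFT
    # keypoints into a target frame. This is an illustrative reimplementation,
    # not a reader for the released TPS warp files.
    import cv2
    import numpy as np
    from scipy.interpolate import RBFInterpolator

    def load_landmarks(csv_path):
        """Return (xy, visible) from a keypoint .csv (assumed column order)."""
        data = np.loadtxt(csv_path, delimiter=",")
        return data[:, 1:3], data[:, 3] > 0

    # Landmark rows are assumed to be aligned by keypoint ID across frames.
    ref_xy, _ = load_landmarks("seq/color_000.csv")
    tgt_xy, tgt_vis = load_landmarks("seq/color_042.csv")

    # Fit a 2D thin-plate-spline warp on the landmarks visible in the target.
    warp = RBFInterpolator(ref_xy[tgt_vis], tgt_xy[tgt_vis],
                           kernel="thin_plate_spline")

    # Detect SIFT keypoints in the reference image and warp them to the target,
    # yielding ground-truth correspondences for descriptor evaluation.
    ref_img = cv2.imread("seq/color_000.png", cv2.IMREAD_GRAYSCALE)
    ref_pts = np.array([kp.pt for kp in cv2.SIFT_create().detect(ref_img, None)])
    gt_in_target = warp(ref_pts)          # expected (x, y) in the target frame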

Publications

[NeurIPS 2021] Guilherme Potje, Renato Martins, Felipe Chamone, and Erickson R. Nascimento. Extracting Deformation-Aware Local Features by Learning to Deform. Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS), 2021.
Visit the page for more information and paper access.

[ICCV 2019] Erickson R. Nascimento, Guilherme Potje, Renato Martins, Felipe Chamone, Mario F. M. Campos, and Ruzena Bajcsy. GEOBIT: A Geodesic-Based Binary Descriptor Invariant to Non-Rigid Deformations for RGB-D Images. IEEE International Conference on Computer Vision (ICCV), 2019.
Visit the page for more information and paper access.

Acknowledgments

This project is supported by CAPES, CNPq, and FAPEMIG.

Team

Renato José Martins, Professor at Université de Bourgogne
Felipe Cadar Chamone, PhD Student
Ruzena Bajcsy, Professor