Scene understanding is crucial for developing robots that can act effectively in uncontrolled, dynamic, unstructured, and unknown environments, such as those found in real-world scenarios. In this context, effective navigation and perception usually require a higher-level understanding of the scene, such as identifying and localizing objects and people.
Here, we propose an open framework for building hybrid maps that combine environment structure (a metric map) with environment semantics (object classes) to support autonomous robot perception and navigation tasks.
We detect and model objects in the environment from RGB-D images, using convolutional neural networks to capture higher-level information. The metric map is then augmented with semantic information derived from the detected object categories.
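As a rough illustration of how a detection in an RGB-D image could end up as a label on a metric map, the sketch below back-projects a detected pixel with the standard pinhole camera model, moves the point into the map frame, and records the class label in a grid cell. It is a minimal sketch under stated assumptions, not the framework's implementation: the function names, the planar robot pose `(x, y, yaw)`, the camera mounted at the robot origin and facing forward, the grid resolution, and the intrinsics values are all hypothetical.

```python
import math
from collections import defaultdict


def backproject(u, v, depth_m, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) with depth in metres.

    Returns a point in the camera optical frame (x right, y down, z forward).
    """
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return x, y, depth_m


def camera_to_map(point_cam, robot_pose):
    """Project a camera-frame point onto the 2D map plane.

    Assumption: the camera sits at the robot origin and looks along the
    robot's heading; robot_pose is (x, y, yaw) in the map frame.
    """
    x_c, _, z_c = point_cam
    rx, ry, yaw = robot_pose
    forward, left = z_c, -x_c  # optical frame -> robot frame
    mx = rx + math.cos(yaw) * forward - math.sin(yaw) * left
    my = ry + math.sin(yaw) * forward + math.cos(yaw) * left
    return mx, my


def annotate(semantic_layer, label, point_map, resolution=0.05, origin=(0.0, 0.0)):
    """Count one observation of `label` in the grid cell containing point_map."""
    col = math.floor((point_map[0] - origin[0]) / resolution)
    row = math.floor((point_map[1] - origin[1]) / resolution)
    semantic_layer[(row, col)][label] += 1
    return row, col


# Hypothetical semantic layer: cell -> {label: number of observations}.
semantic_layer = defaultdict(lambda: defaultdict(int))

# Hypothetical detection: class "chair" at the image centre, 2 m away,
# seen by a robot at (1, 1) facing along the map x axis.
point_cam = backproject(320, 240, 2.0, fx=525.0, fy=525.0, cx=320.0, cy=240.0)
point_map = camera_to_map(point_cam, robot_pose=(1.0, 1.0, 0.0))
cell = annotate(semantic_layer, "chair", point_map, resolution=0.5)
```

Keeping per-cell observation counts, rather than a single label, is one simple way to let repeated detections reinforce or outvote each other. A real system would take the pixel and class from the detector output and the poses from the robot's localization.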
We also make available datasets containing robot sensor data in rosbag format, allowing the experiments to be reproduced on any machine without the physical robot and its sensors.
Dataset and Source Code
In preparation for the JINT submission.
This project is supported by CAPES and FAPEMIG.