Complete Guide to Neural Non-Rigid Tracking

The Neural Non-Rigid Tracking mechanism is robust in performance and cheaper to deploy in real-world object-tracking applications

Published on April 5, 2021

by Rajkumar Lakshmanamoorthy

Augmented Reality (AR) and Virtual Reality (VR) applications are growing rapidly in number. These applications rely chiefly on the reconstruction of 2D/3D images and scenes. Although capture and reconstruction have progressed steadily, they remain challenging tasks in computer vision. Static objects can be captured and reconstructed with great accuracy by many architectures. However, capturing and reconstructing dynamic objects is still a domain that needs substantial development.

Dynamic object tracking and reconstruction are roughly classified into rigid and non-rigid object tracking and reconstruction. Rigid object tracking assumes a shape prior and tracks that predefined shape anywhere in the given frame. Non-rigid object tracking, on the other hand, looks only for predefined characteristics in the given frame rather than a fixed shape. Commercial colour-and-depth cameras, commonly called RGB-D sensors, such as Microsoft’s Kinect and Intel’s RealSense, make real-time non-rigid object tracking deployable. But existing systems require a large array of RGB-D cameras and a computationally expensive setup to capture non-rigid dynamic objects. These limitations have hindered the commercialization of real-time non-rigid tracking applications.
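
To make the rigid versus non-rigid distinction concrete, here is a minimal sketch in plain Python on 2D points. All function names are hypothetical and are not taken from the NeuralTracking repository. The rigid case applies one shared rotation and translation to every point, while the non-rigid case moves each point by its own offset, so distances between points are preserved only in the rigid case.

```python
import math

def rigid_transform(points, angle_rad, translation):
    """Apply ONE rotation and ONE translation to every 2D point."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    tx, ty = translation
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def non_rigid_transform(points, per_point_offsets):
    """Move each 2D point by its own offset, so the shape can deform."""
    return [(x + dx, y + dy)
            for (x, y), (dx, dy) in zip(points, per_point_offsets)]

def distance(p, q):
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]

# Rigid motion: rotate by 90 degrees and shift; the shape is unchanged.
rigid = rigid_transform(points, math.pi / 2, (2.0, 0.0))

# Non-rigid motion: every point moves differently; the shape deforms.
deformed = non_rigid_transform(points, [(0.0, 0.0), (0.5, 0.0), (0.0, -0.5)])

print("original edge length:", distance(points[0], points[1]))
print("after rigid motion:  ", distance(rigid[0], rigid[1]))
print("after deformation:   ", distance(deformed[0], deformed[1]))
```

A tracker for rigid objects only has to recover the single shared transform, whereas a non-rigid tracker has to recover how every part of the surface moved.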

Aljaž Božic, Pablo Palafox, Angela Dai, Justus Thies and Matthias Niessner of the Technical University of Munich and Michael Zollhöfer of Facebook Reality Labs have introduced Neural Non-Rigid Tracking, a mechanism that is robust in performance and cheaper to deploy in real-world applications. It demonstrates state-of-the-art non-rigid reconstruction, outperforming existing methods.

An overview of Neural Non-Rigid Tracking

The proposed Neural Non-Rigid Tracking captures from only a single RGB-D sensor, which leads to a cheaper setup. Given a pair of frames, the earlier frame is treated as the source and the following frame as its target. The source and the target advance in time as the captured input streams in. Since the object of interest is non-rigid, the tracker needs correspondences to map and follow the object. A correspondence is the predicted location, in the target image, of a specific source pixel. Neural Non-Rigid Tracking predicts correspondences pixel-wise and then weights them. In correspondence weighting, each predicted correspondence is given a real-valued weight between 0 and 1 so that the tracker can suppress outliers. According to the authors, this approach makes correspondence prediction about 85 times faster than comparable deep-learning-based methods.
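
The idea of weighting correspondences can be illustrated with a small, self-contained sketch. The data, the `score` field and the use of a sigmoid to obtain weights between 0 and 1 are assumptions made for illustration; they are not the network outputs or the exact formulation used by the authors. The sketch averages source-to-target offsets and shows that a low-weight outlier barely affects the result.

```python
import math

def sigmoid(score):
    """Squash an unbounded score into a weight strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical per-pixel predictions: a source pixel, its predicted location
# in the target image, and a raw confidence score (assumed, for illustration).
predictions = [
    {"source": (10, 10), "target": (12, 11), "score": 4.0},
    {"source": (20, 15), "target": (22, 16), "score": 3.5},
    {"source": (30, 40), "target": (90, 5), "score": -6.0},  # an outlier
]

def weighted_mean_offset(preds):
    """Average the source-to-target offsets, weighting each by its confidence."""
    total_w = sum_dx = sum_dy = 0.0
    for p in preds:
        w = sigmoid(p["score"])
        dx = p["target"][0] - p["source"][0]
        dy = p["target"][1] - p["source"][1]
        total_w += w
        sum_dx += w * dx
        sum_dy += w * dy
    return sum_dx / total_w, sum_dy / total_w

weights = [sigmoid(p["score"]) for p in predictions]
offset = weighted_mean_offset(predictions)

print("weights:", [round(w, 4) for w in weights])
print("weighted mean offset:", offset)
```

The two consistent correspondences both move by (2, 1), and the weighted average stays close to that value because the third correspondence receives a weight near zero.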

Finally, the weighted correspondences are passed to a differentiable solver. Because gradients can be back-propagated through the solver, the network can be trained end-to-end, and the correspondence weights can be learned in a self-supervised manner so that outliers are down-weighted automatically. Thus there is no need for a separate pre-trained model to learn correspondences or to track non-rigid objects. End-to-end training helps Neural Non-Rigid Tracking achieve greater performance even with reduced computational capacity and a single RGB-D sensor. This architecture employs densely connected convolutional neural networks throughout.

Tracking strategy: the end-to-end training strategy in Neural Non-Rigid Tracking, with correspondence prediction, correspondence weighting and a differentiable solver, governed by the Correspondence Map Loss, Graph Loss and Warp Loss.
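
How a solver can consume weighted correspondences is sketched below in a deliberately simplified form. This is not the authors' solver: it estimates a single 2D rotation angle with weighted Gauss-Newton iterations, and the function names and data are hypothetical. Every step is built from smooth arithmetic operations, which is the property that would let an automatic-differentiation framework pass gradients through such a solver back to the weights.

```python
import math

def rotate(point, theta):
    """Rotate a 2D point about the origin by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    x, y = point
    return (c * x - s * y, s * x + c * y)

def gauss_newton_rotation(sources, targets, weights, iterations=10):
    """Estimate a rotation angle by weighted Gauss-Newton.

    Minimises sum_i w_i * ||R(theta) p_i - q_i||^2 over theta.
    """
    theta = 0.0
    for _ in range(iterations):
        c, s = math.cos(theta), math.sin(theta)
        jtr = 0.0  # accumulates weighted J^T r
        jtj = 0.0  # accumulates weighted J^T J
        for (x, y), (qx, qy), w in zip(sources, targets, weights):
            rx = c * x - s * y - qx   # residual, x component
            ry = s * x + c * y - qy   # residual, y component
            jx = -s * x - c * y       # d(rx) / d(theta)
            jy = c * x - s * y        # d(ry) / d(theta)
            jtr += w * (jx * rx + jy * ry)
            jtj += w * (jx * jx + jy * jy)
        theta -= jtr / jtj            # Gauss-Newton update
    return theta

true_theta = 0.3
sources = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
targets = [rotate(p, true_theta) for p in sources]
targets[3] = (-3.0, 4.0)  # corrupt one correspondence to act as an outlier

naive = gauss_newton_rotation(sources, targets, [1.0, 1.0, 1.0, 1.0])
robust = gauss_newton_rotation(sources, targets, [1.0, 1.0, 1.0, 0.001])

print("true angle:            ", true_theta)
print("uniform weights:       ", naive)
print("outlier down-weighted: ", robust)
```

With uniform weights the single bad correspondence pulls the estimate far from the true angle, while down-weighting it brings the estimate back close to 0.3. In a learned setting, the weights would come from the network rather than being set by hand.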

Python Implementation of Neural Non-rigid Tracking

Download the source code from the official repository to the local machine.

!git clone https://github.com/DeformableFriends/NeuralTracking.git

Change into the downloaded NeuralTracking directory and list its contents.

%cd NeuralTracking/
!ls -p

If the local machine does not already have Anaconda, install the Anaconda3 distribution using the following commands.

!wget https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh
!bash Anaconda3-2020.02-Linux-x86_64.sh

Open a bash shell with the following command and, from the base conda environment, create the development environment. Installing the dependencies takes some time.

!bash

Then run the following inside the base environment:

conda env create --file resources/env.yml

The following commands, run from the base environment, activate the nnrt conda environment and run the setup file to install the C++ dependencies.

conda activate nnrt
cd csrc
python setup.py install
cd ..

To run the Neural Non-Rigid Tracking model and evaluate it on two frames, execute the following command.

%%bash
python example_viz.py

To train the model from scratch, download the officially recommended dataset to the local machine and preprocess it using the following command.

%%bash
python create_graph_data.py

Training can be started using the following command. Note that the training time depends on the available memory and the device configuration.

%%bash
./run_train.sh

Once training is finished, evaluation can be performed using the following command.

%%bash
./run_generate.sh

Performance of Neural Non-Rigid Tracking

Qualitative analysis of Neural Non-Rigid Tracking

Neural Non-Rigid Tracking is trained and evaluated on the DeepDeform benchmark. Competing methods, including DynamicFusion, VolumeDeform and DeepDeform, are evaluated under identical conditions and device configurations for comparison.

Qualitative comparison of Neural Non-Rigid Tracking with DynamicFusion and DeepDeform models.

Neural Non-Rigid Tracking achieves state-of-the-art performance in non-rigid reconstruction, with lower deformation and geometry errors than DynamicFusion, VolumeDeform and DeepDeform, while predicting correspondences about 85 times faster than comparable deep-learning-based methods.
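
For readers who want a feel for what an error of this kind measures, the following minimal sketch computes a mean 3D end-point error between predicted and reference point positions. It assumes that a deformation-style error can be summarised as the average Euclidean distance between corresponding 3D points; the benchmark's exact definitions and units should be taken from its own documentation. The function name and the sample coordinates are hypothetical.

```python
import math

def mean_endpoint_error(predicted, reference):
    """Average Euclidean distance between corresponding 3D points."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("point lists must be non-empty and of equal length")
    total = sum(math.dist(p, r) for p, r in zip(predicted, reference))
    return total / len(predicted)

# Hypothetical tracked positions and reference positions, in metres.
predicted = [(0.0, 0.0, 1.00), (0.1, 0.00, 1.2)]
reference = [(0.0, 0.0, 1.03), (0.1, 0.04, 1.2)]

error = mean_endpoint_error(predicted, reference)
print("mean end-point error (m):", round(error, 4))
```

Lower values mean the tracked surface points land closer to where they should be, which is the sense in which one method "has a lower error" than another.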