
What Is NeX? Guide To Real-Time View Synthesis With Python Code

NeX is a new scene representation based on MPI that models view-dependent effects by performing basis expansion on the pixel representation.


NeX

NeX is a new scene representation based on the multiplane image (MPI) that models view-dependent effects by performing basis expansion on the pixel representation. Rather than simply storing static colour values as in a traditional MPI, NeX represents each colour as a function of the viewing angle and approximates this function using a linear combination of learnable spherical basis functions. Moreover, it uses a hybrid parameter modeling strategy that models high-frequency details in an explicit structure within an implicit MPI modeling framework. This helps improve fine details that are difficult to model by a neural network and produces sharper results in fewer training iterations. NeX also introduced a new dataset, Shiny, designed to test the limits of view-dependent modeling with significantly more challenging effects such as rainbow reflections on a CD and refraction through a test tube.

Approach & Architecture 

A multiplane image (MPI) is a 3D scene representation consisting of a collection of D planar images, each with dimensions H × W × 4, where the last dimension contains RGB values and an alpha transparency value. The planes are scaled and placed equidistantly either in depth space (for bounded close-up objects) or in inverse depth space (for scenes that extend out to infinity) along a reference viewing frustum.
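To make the plane placement concrete, here is a minimal NumPy sketch (the function name is illustrative, not from the NeX codebase) that spaces D planes uniformly either in depth or in inverse depth (disparity):

```python
import numpy as np

def plane_depths(near, far, num_planes, inverse=True):
    """Depths of num_planes MPI planes between near and far.

    inverse=True spaces the planes uniformly in disparity (1/depth),
    which allocates more planes close to the camera -- the choice used
    for scenes that extend out to infinity. inverse=False spaces them
    uniformly in depth, suitable for bounded close-up objects.
    """
    if inverse:
        return 1.0 / np.linspace(1.0 / near, 1.0 / far, num_planes)
    return np.linspace(near, far, num_planes)

depths = plane_depths(1.0, 100.0, 4)
# depths increase monotonically from near (1.0) to far (100.0),
# with most planes concentrated near the camera
```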

An RGBα MPI can be rendered from any target viewpoint by first warping all its planes to the target view via a homography that relates the reference and target views, and then applying the composite operator. Let c_i ∈ R^(H×W×3) and α_i ∈ R^(H×W×1) be the RGB and alpha “images” of the i-th plane respectively, ordered from back to front, and let A = {α_1, α_2, …, α_D} and C = {c_1, c_2, …, c_D} be the sets of these images. The MPI is rendered in a new view as Î = O(W(C), W(A)), where W is the homography warping function and the composite operator O is:

O(C, A) = Σ_{i=1}^{D} c_i α_i ∏_{j=i+1}^{D} (1 − α_j)
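The composite operator can be sketched in a few lines of NumPy; the homography warping step is omitted here and the helper name is illustrative:

```python
import numpy as np

def composite(colors, alphas):
    """Over-composite MPI planes ordered back to front.

    colors: (D, H, W, 3) RGB image per plane
    alphas: (D, H, W, 1) alpha image per plane
    Returns the rendered (H, W, 3) image:
        O(C, A) = sum_i c_i * a_i * prod_{j > i} (1 - a_j)
    """
    num_planes = colors.shape[0]
    out = np.zeros(colors.shape[1:], dtype=colors.dtype)
    for i in range(num_planes):
        # transmittance through all planes in front of plane i
        if i + 1 < num_planes:
            trans = np.prod(1.0 - alphas[i + 1:], axis=0)
        else:
            trans = 1.0
        out += colors[i] * alphas[i] * trans
    return out

# Two planes: an opaque red plane behind a half-transparent green plane.
back = np.tile([1.0, 0.0, 0.0], (1, 2, 2, 1))
front = np.tile([0.0, 1.0, 0.0], (1, 2, 2, 1))
colors = np.concatenate([back, front])
alphas = np.stack([np.ones((2, 2, 1)), np.full((2, 2, 1), 0.5)])
img = composite(colors, alphas)  # every pixel -> [0.5, 0.5, 0.0]
```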

One main limitation of multiplane images is that they can only model Lambertian surfaces, i.e., surfaces whose colours appear constant regardless of the viewing angle. In real-world scenarios, many objects are non-Lambertian, such as a CD, a glass table, or a metal spoon. These objects exhibit view-dependent effects such as reflection and refraction. Reconstructing them with an MPI makes them appear unrealistically dull, without reflections, or even break down completely due to the violation of the brightness constancy assumption used for invariant matching and 3D reconstruction.

NeX versus standard MPI performance on a scene with CD

To allow for view-dependent modeling in NeX, the pixel colour representation is modified by parameterizing each colour value as a function of the viewing direction v = (v_x, v_y, v_z). This results in a mapping function C(v): R^3 → R^3 for every pixel. However, storing this mapping explicitly is limiting and does not generalize to new, unobserved angles. Regressing the colour directly from v (and the pixel location) with a neural network is possible but too inefficient for real-time rendering. The key idea behind NeX is to approximate this function with a linear combination of learnable basis functions {H_n(v): R^3 → R} over the spherical domain described by vector v:

C_p(v) = k_0^p + Σ_{n=1}^{N} k_n^p H_n(v)

Here k_n^p ∈ R^3 are the RGB coefficients, or reflectance parameters, of the N global basis functions for pixel p. There are several ways to define a suitable set of basis functions; the spherical harmonics basis is one common choice used heavily in computer graphics, and the Fourier or Taylor basis can also be used.
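A minimal NumPy sketch of this per-pixel basis expansion, assuming the basis values H_n(v) have already been evaluated for a given viewing direction (names and shapes are illustrative):

```python
import numpy as np

def view_dependent_color(k0, k, basis_vals):
    """Evaluate C(v) = k0 + sum_n k_n * H_n(v) for every pixel at once.

    k0:         (H, W, 3)    explicit base colour per pixel
    k:          (N, H, W, 3) per-pixel RGB coefficients of N basis functions
    basis_vals: (N,)         the N global basis values H_n(v) at direction v
    Returns the (H, W, 3) view-dependent colour image.
    """
    # contract the basis axis: sum_n basis_vals[n] * k[n]
    return k0 + np.tensordot(basis_vals, k, axes=1)

H = W = 4
N = 2
k0 = np.full((H, W, 3), 0.2)          # uniform base colour
k = np.full((N, H, W, 3), 0.1)        # uniform coefficients
basis_vals = np.array([0.5, -0.5])    # H_1(v), H_2(v) for some v
color = view_dependent_color(k0, k, basis_vals)
# the two basis terms cancel here, leaving the base colour 0.2 everywhere
```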

PSNR scores versus the number of basis coefficients for NeX (learnable neural basis functions), Fourier series (FS), Jacobi spherical harmonics (JH), hemispherical harmonics (HSH), spherical harmonics (SH), and Taylor series (TS).

However, these “fixed” basis functions have one shortcoming: the number of basis functions required to capture high-frequency changes within a narrow viewing angle can be very high. This in turn requires more reflectance parameters, which makes both learning the parameters and rendering them more difficult. With learnable basis functions, the modified NeX MPI outperforms alternative versions that use fixed bases with the same number of coefficients.

NeX uses two separate MLPs: one for predicting per-pixel parameters given the pixel location, and another for predicting all global basis functions given the viewing angle. The motivation for the second network is to ensure that the prediction of the basis functions, which are global, is not a function of the pixel location. The first MLP is modeled as Fθ with parameters θ:

F_θ(x) = (α, k_1, k_2, …, k_N)

Here x = (x, y, d) contains the location information of pixel (x, y) at plane d. The second network is modeled as Gɸ with parameter ɸ:

G_ɸ(v) = (H_1(v), H_2(v), …, H_N(v))

Here v is the normalized viewing direction. 
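The two networks can be sketched with plain NumPy as below; the layer widths, the number of basis functions N, and the positional-encoding levels are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8        # number of global basis functions (assumed)
LEVELS = 4   # positional-encoding levels (assumed)

def positional_encoding(x, levels):
    """Map each coordinate to [x, sin(2^l x), cos(2^l x)] for l < levels."""
    feats = [x]
    for l in range(levels):
        feats += [np.sin(2.0 ** l * x), np.cos(2.0 ** l * x)]
    return np.concatenate(feats, axis=-1)

def mlp(x, weights):
    """Tiny MLP: alternating linear layers with ReLU in between."""
    for i, w in enumerate(weights):
        x = x @ w
        if i < len(weights) - 1:
            x = np.maximum(x, 0.0)
    return x

in_dim = 3 * (2 * LEVELS + 1)  # encoded (x, y, d) or (vx, vy, vz)
# F_theta: encoded pixel location -> alpha plus N RGB coefficients k_1..k_N
f_theta = [rng.normal(size=(in_dim, 128)), rng.normal(size=(128, 1 + 3 * N))]
# G_phi: encoded viewing direction -> the N global basis values H_n(v)
g_phi = [rng.normal(size=(in_dim, 64)), rng.normal(size=(64, N))]

x = positional_encoding(rng.random((1, 3)), LEVELS)  # a pixel location
v = positional_encoding(rng.random((1, 3)), LEVELS)  # a viewing direction
params = mlp(x, f_theta)  # shape (1, 1 + 3N): alpha and coefficients
basis = mlp(v, g_phi)     # shape (1, N): H_1(v) .. H_N(v)
```

Because G_ɸ depends only on v, it is evaluated once per view at render time, while F_θ's outputs can be precomputed per pixel.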

Fine details are lost when a traditional MLP is used to model the k_n, or “coefficient images”. In view-synthesis problems, these fine details tend to come from the surface texture itself rather than from complex scene geometry. NeX uses positional encoding to regress these images, which helps to an extent but still produces blurry results. During experimentation, the authors found a simple fix: storing the first coefficient k_0, or “base color,” explicitly reduces the network’s burden of compressing and reproducing detail and leads to sharper results in fewer iterations. With this implicit-explicit modeling strategy, NeX predicts every parameter with MLPs except k_0, which is optimized explicitly as a learnable parameter with a total variation regularizer.
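A minimal sketch of the total variation term applied to the explicit base-colour image (the function name and shapes are illustrative):

```python
import numpy as np

def total_variation(k0):
    """Total variation of an explicit base-colour image k0, shape (H, W, 3).

    Sums absolute differences between vertically and horizontally adjacent
    pixels; adding this term to the loss keeps the directly optimized k0
    smooth while the MLPs handle the remaining parameters.
    """
    dh = np.abs(np.diff(k0, axis=0)).sum()  # vertical neighbours
    dw = np.abs(np.diff(k0, axis=1)).sum()  # horizontal neighbours
    return dh + dw

flat = np.full((4, 4, 3), 0.5)   # constant image
edge = np.zeros((2, 2, 3))
edge[1, :, :] = 1.0              # one vertical step of height 1
tv_flat = total_variation(flat)  # 0.0: no neighbour differences
tv_edge = total_variation(edge)  # 6.0: 2 pixels x 3 channels x step of 1
```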

Real-time View Synthesis using NeX

Requirements
  1. Install COLMAP and lpips. FFmpeg and the other Python dependencies are already installed in Colab.
 !pip install lpips
 !apt install colmap 
  2. Clone the NeX GitHub repository and navigate into the newly created nex-code directory.
 !git clone https://github.com/nex-mpi/nex-code
 %cd nex-code
  3. Select a scene, make running directories, and download the selected dataset from OneDrive.

You can also use your own images, but you’ll need at least 12 images for NeX to work. In addition, downscaling the images to a 400-pixel width is recommended for fast upload and training.

 scene_urls = {
         'cake': 'https://vistec-my.sharepoint.com/:u:/g/personal/pakkapon_p_s19_vistec_ac_th/ESg8LNsTqmtFmKO-9X4dUsUBVgfw_TbuAheVAEKnsiouug?download=1',
         'crest': 'https://vistec-my.sharepoint.com/:u:/g/personal/pakkapon_p_s19_vistec_ac_th/EYqAlbiZqO1GsiAg-HgEi34B3cBL3tuaFQxg5fyrV5Prew?download=1',
         'giants':  'https://vistec-my.sharepoint.com/:u:/g/personal/pakkapon_p_s19_vistec_ac_th/EUx6wPzSVRtMhpinHKF9ArcBE_4c98xxJLAGSCaM54MiJQ?download=1',
         'room': 'https://vistec-my.sharepoint.com/:u:/g/personal/pakkapon_p_s19_vistec_ac_th/ERVHMv2NeOtKgFLGRJ22jgMBdo3BqCQIfd27MFgLvNOW5w?download=1',
         'seasoning': 'https://vistec-my.sharepoint.com/:u:/g/personal/pakkapon_p_s19_vistec_ac_th/EedXEIqliIZGk-6fxd-cb9cBsUjidu9G5du1TIYOF5FOyQ?download=1',
         'sushi': 'https://vistec-my.sharepoint.com/:u:/g/personal/pakkapon_p_s19_vistec_ac_th/EZZA-3nyCBVLtIra5yMZzC0BFx3f4wqg1cm8rKzTAt2x0g?download=1',
     }


 scene = "room"
 onedrive_dataset = scene_urls[scene]

 # make directories for running
 !mkdir -p data/demo
 !mkdir -p runs 

 # download the dataset
 get_ipython().system_raw('wget -O data/demo/data.zip {}'.format(onedrive_dataset))
 get_ipython().system_raw('unzip -o -d data/demo/ data/demo/data.zip')
 get_ipython().system_raw('rm data/demo/data.zip') 
  4. Set parameters for training.
 epochs = 40
 image_width = 400
 import math
 pos_level = math.ceil(math.log(image_width) / math.log(2))
 num_offset = int(image_width / 5.0)
 web_width = 4096 if image_width <= 400 else 16000
  5. Train NeX on the downloaded images.
!python train.py -scene data/demo -model_dir demo -layers 12 -sublayers 6 -epochs $epochs -offset $num_offset -tb_toc 1 -hidden 128 -pos_level $pos_level -depth_level 7 -tb_saveimage 2 -num_workers 2 -llff_width $image_width -web_width=$web_width

Training takes around 10 minutes for the preset images and around 20 minutes for your own images.

  6. Display the generated video.
 from IPython.display import HTML
 from base64 import b64encode
 video_path = "runs/video_output/demo/video.mp4"
 mp4 = open(video_path, "rb").read()
 data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
 HTML(f"""
 <video width=400 controls playsinline autoplay muted loop>
       <source src="{data_url}" type="video/mp4">
 </video>
 """) 
Output video created by NeX

Last Epoch (Endnote)

Performance of NeX on different datasets

This article discussed NeX, a new approach to novel view synthesis that combines a multiplane image (MPI) representation with neural basis expansion. Although NeX is effective at capturing and reproducing complex view-dependent effects, it is based on the MPI and inherits its limitations. When viewed from an angle too far from the center, “stack of cards” artifacts expose the individual MPI planes. NeX also cannot fully reproduce the hardest scenes in the Shiny dataset, which include effects like light sparkles, extremely sharp highlights, and refraction through test tubes.

References

To learn more about NeX refer to the following resources:

Want to learn more about view-synthesis? Check out our guide to Intel’s Stable View Synthesis.


Aditya Singh

A machine learning enthusiast with a knack for finding patterns. In my free time, I like to delve into the world of non-fiction books and video essays.