We propose a method for fast scene radiance field reconstruction with strong novel view synthesis performance and convenient scene editing functionality. The key idea is to fully exploit semantic parsing and primitive extraction to constrain and accelerate the radiance field reconstruction process. To this end, we propose a primitive-aware hybrid rendering strategy that enjoys the best of both volumetric and primitive rendering. We further contribute a reconstruction pipeline that conducts primitive parsing and radiance field learning iteratively for each input frame, successfully fusing semantic, primitive, and radiance information into a single framework. Extensive evaluations demonstrate the fast reconstruction, high rendering quality, and convenient editing functionality of our method.
Standard volume-based rendering methods such as NeRF can model complex scenes but suffer from heavy sampling and ambiguous geometry. Primitive-based rendering methods such as NeurMips enjoy fast rendering but have difficulty representing complex geometry. We propose a novel hybrid representation that takes advantage of both kinds of methods. Specifically, we represent the scene with a semantic volume consisting of three kinds of voxels, each with its own sampling strategy: within D-voxels we use dense sampling as in NeRF; within P-voxels we apply a sparse, primitive-aware sampling strategy; and we skip sampling entirely within E-voxels.
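The three per-voxel sampling strategies above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the voxel labels, the `voxel_type_at` / `primitive_hit_depth` callbacks, and all step and count parameters are our assumptions.

```python
import numpy as np

# Illustrative voxel labels for the semantic volume (not from the paper's code).
E_VOXEL, D_VOXEL, P_VOXEL = 0, 1, 2  # empty, dense, primitive

def sample_along_ray(t_near, t_far, voxel_type_at, primitive_hit_depth,
                     step=0.05, n_dense=8, n_prim=4, eps=0.01):
    """Pick sample depths along one ray according to the voxel types it traverses.

    voxel_type_at(t): voxel label at depth t along the ray.
    primitive_hit_depth(t): depth of the ray-primitive intersection near t
    (only queried inside P-voxels).
    """
    ts = []
    n_steps = int(round((t_far - t_near) / step))
    for i in range(n_steps):
        t = t_near + i * step
        vt = voxel_type_at(t)
        if vt == E_VOXEL:
            continue  # skip sampling entirely in empty space
        if vt == D_VOXEL:
            # dense stratified samples within the voxel, as in standard NeRF
            ts.extend(np.linspace(t, t + step, n_dense, endpoint=False).tolist())
        else:  # P_VOXEL
            # a handful of samples clustered around the ray-primitive intersection
            t_hit = primitive_hit_depth(t)
            ts.extend((t_hit + np.linspace(-eps, eps, n_prim)).tolist())
    return sorted(ts)
```

A ray crossing only E-voxels thus contributes no samples at all, while a ray hitting a P-voxel spends only a few samples near the primitive surface, which is where the speedup over uniform dense sampling comes from.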
We show the optimization framework for incremental radiance reconstruction. Given each RGB-D frame as input, we first detect primitives such as planes, then merge the newly detected primitives into a global primitive list. Next, we back-project the semantic frame into a 3D semantic volume to update the global semantic state. Finally, primitive-aware hybrid rendering produces RGB, depth, and semantic values, which are supervised by the input frames.
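The primitive-merging step of this per-frame loop could look like the sketch below. The `Primitive` record, the unit-normal assumption, and the angle/distance tolerances are illustrative choices of ours, not details from the paper.

```python
from dataclasses import dataclass

@dataclass
class Primitive:
    """A plane primitive n·x = d, with n assumed to be a unit normal."""
    normal: tuple
    offset: float

def merge_primitives(global_list, new_prims, angle_tol=0.1, dist_tol=0.05):
    """Merge newly detected primitives into the global list,
    skipping near-duplicates of planes already present."""
    for p in new_prims:
        duplicate = False
        for q in global_list:
            dot = sum(a * b for a, b in zip(p.normal, q.normal))
            # same orientation (dot close to 1) and similar offset -> same plane
            if abs(1.0 - dot) < angle_tol and abs(p.offset - q.offset) < dist_tol:
                duplicate = True
                break
        if not duplicate:
            global_list.append(p)
    return global_list
```

In the full loop, this merge runs once per frame after primitive detection, and the updated global list then drives both the semantic-volume update and the primitive-aware sampling during rendering.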
We provide a performance comparison between PARF and NeRF-SLAM, a depth-supervised version of Instant-NGP. Note that PARF converges much faster with the help of the primitive-aware hybrid representation.
We also show the extrapolation ability of our method. Under extrapolated views, PARF renders robustly with the help of the primitive-aware representation, while NeRF-SLAM produces blurry results due to its ambiguous geometry.
We show that our method is capable of real-time rendering and interactions.
Given only sparse views as input, PARF shows robust rendering performance thanks to the primitive-aware hybrid representation.
Our primitive-aware hybrid representation also enables convenient scene editing.
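One way such editing could be exposed is by transforming primitives directly, since geometry attached to a primitive moves with it. The sketch below shows a plane translation; the `(n, d)` plane encoding and the function itself are our assumptions, not the paper's editing interface.

```python
import numpy as np

def translate_primitive(plane, delta):
    """Translate a plane primitive encoded as (n, d) with n·x = d
    by a world-space offset `delta`.

    Moving every point x to x + delta keeps the normal n and
    shifts the offset to d + n·delta."""
    n, d = plane
    n = np.asarray(n, dtype=float)
    return (n, d + float(n @ np.asarray(delta, dtype=float)))
```

Removing an object is analogous: delete its primitive from the global list and mark the corresponding voxels as empty, after which hybrid rendering simply skips them.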
We introduce PARF, a Primitive-Aware Radiance Fusion method for indoor scene radiance field reconstruction and editing. By combining volumetric and primitive rendering in a hybrid neural representation, we successfully merge semantic parsing, primitive extraction, and radiance fusion into a single framework. PARF simultaneously achieves a significant improvement in convergence speed, strong view extrapolation performance, and realistic semantic editing effects. Since the discrete semantic volume may lead to jagged primitive boundaries in novel view synthesis, future work includes incorporating the semantic information in a more compact manner and supporting more kinds of primitives for more effective reconstruction.