# SemanticSingleViewReconstruction

## 3D Semantic Scene Reconstruction from a Single Viewport

Maximilian Denninger and Rudolph Triebel

Accepted paper at IMPROVE 2023. [paper](MISSING_LINK)
## Overview

<p align="center">
<img src="docu_images/main_overview.jpg" alt="data overview image" width=800>
</p>
### Abstract

We introduce a novel method for semantic volumetric reconstruction from a single RGB image. To overcome
the problem of semantically reconstructing regions in 3D that are occluded in the 2D image, we propose to
combine both in an implicit encoding. By relying on a headless autoencoder, we are able to encode semantic
categories and implicit TSDF values into a compressed latent representation. A second network then uses
these as a reconstruction target and learns to convert color images into these latent representations, which are
decoded after inference. Additionally, we introduce a novel loss-shaping technique for this implicit
representation. In our experiments on the realistic Replica benchmark dataset, we achieve a full reconstruction
of a scene that is better than current methods, both visually and in terms of quantitative measures, while using
only synthetic data during training. On top of that, we evaluate our approach on color images recorded in the wild.
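
The pipeline can be pictured as two networks trained in sequence. Below is a minimal, conceptual sketch of that structure, not the code in this repository: all layer sizes, the latent dimension, and the class count are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Conceptual sketch of the two-stage pipeline -- NOT the networks used in
# this repository. All shapes and layer choices are illustrative assumptions.

LATENT_DIM = 128  # assumed size of the compressed latent code

# Stage 1: an autoencoder compresses semantic class scores and an implicit
# TSDF value into a single latent vector and reconstructs them.
class SemanticTSDFAutoencoder(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        in_dim = num_classes + 1  # semantic scores + one TSDF value
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, LATENT_DIM))
        self.decoder = nn.Sequential(
            nn.Linear(LATENT_DIM, 256), nn.ReLU(), nn.Linear(256, in_dim))

    def forward(self, x):
        z = self.encoder(x)        # compressed latent representation
        return self.decoder(z), z  # reconstruction + latent target

# Stage 2: a second network regresses these latent codes directly from a
# color image; at test time only this network and the decoder are needed.
class ColorToLatent(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, LATENT_DIM))

    def forward(self, image):
        return self.backbone(image)  # predicted latent code

# Inference: image -> latent -> decoded semantics + TSDF.
autoencoder, img_net = SemanticTSDFAutoencoder(), ColorToLatent()
image = torch.rand(1, 3, 512, 512)
latent = img_net(image)
decoded = autoencoder.decoder(latent)  # semantic scores + TSDF value
```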

### Network overview

<p align="center">
<img src="docu_images/architecture.jpg" alt="data overview image" width=800>
</p>

### Content description

This repository contains the models used to reproduce the main results presented in the paper.
We also include the code to generate the data and train the models.

### Quick start

If you just want to test this method on your own images, only a few steps are necessary:

Head over to the [Setup section](svr/README.md), set up the conda environment, start the server, and wait for the prediction.
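
Querying the running server could then look roughly like the sketch below. This is a hypothetical example: the endpoint URL, the port, the request field name, and the response format are all assumptions; the actual interface is documented in the setup section.

```python
import requests

# Hypothetical client sketch -- the real interface is described in
# svr/README.md. The URL, port, field name, and response format below
# are assumptions for illustration only.
SERVER_URL = "http://localhost:8000/predict"  # assumed endpoint

with open("my_image.jpg", "rb") as f:
    response = requests.post(SERVER_URL, files={"image": f})
response.raise_for_status()

# Assumed: the server answers with the reconstructed scene as a binary mesh.
with open("reconstruction.ply", "wb") as out:
    out.write(response.content)
```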

## Citation

If you find our work useful, please cite us with:

```
@inproceedings{denninger2023,
  title={3D Semantic Scene Reconstruction from a Single Viewport},
  author={Denninger, Maximilian and Triebel, Rudolph},
  booktitle={Proceedings of the 3rd International Conference on Image Processing and Vision Engineering (IMPROVE)},
  year={2023}
}
```

## Train your own network

Everything you need to retrain these methods on your own data is provided in this repository.
Before you can start the training, you need to generate the data, which is almost fully automated.
For this, head over to [data generation](data_generation/README.md).
After you have generated the data for the network you want to retrain, head over to the specific network:

* [U-Net for the surface normals](svr/u_net_normal/README.md)
* [Implicit TSDF point cloud compression](svr/implicit_tsdf_decoder/README.md)
* [Full 3D Scene Reconstruction](svr/scene_reconstruction/README.md)

Be aware that the data generation takes roughly 15,000 GPU hours and needs around 15 TB of storage space.
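
To give a feeling for the kind of target values involved, the snippet below computes truncated signed distance (TSDF) values for query points around a toy analytic sphere. It only illustrates the truncation idea; the actual data generation works on full scene meshes, and the truncation threshold here is an assumption.

```python
import numpy as np

# Toy illustration of truncated signed distance (TSDF) values -- the actual
# pipeline (see data_generation/README.md) works on full scene meshes; the
# sphere and the truncation threshold here are assumptions.

def sphere_sdf(points, center, radius):
    """Signed distance to a sphere: negative inside, positive outside."""
    return np.linalg.norm(points - center, axis=-1) - radius

def tsdf(points, center=np.zeros(3), radius=0.5, truncation=0.1):
    """Clamp the signed distance to [-truncation, truncation] and normalize
    to [-1, 1], so a network only has to model values near the surface."""
    d = sphere_sdf(points, center, radius)
    return np.clip(d, -truncation, truncation) / truncation

# Random query points in the unit cube around the sphere.
pts = np.random.uniform(-1.0, 1.0, size=(5, 3))
print(tsdf(pts))  # values in [-1, 1]; +/-1 means "far from the surface"
```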