Understanding the 3D geometry of transparent objects from RGB images is challenging due to their inherent physical properties, such as reflection and refraction. To address these difficulties, especially in scenarios with sparse views and dynamic environments, we introduce TRAN-D, a novel 2D Gaussian Splatting-based depth reconstruction method for transparent objects. Our key insight lies in separating transparent objects from the background, enabling focused optimization of Gaussians corresponding to the object. We mitigate artifacts with an object-aware loss that places Gaussians in obscured regions, ensuring coverage of invisible surfaces while reducing overfitting. Furthermore, we incorporate a physics-based simulation that refines the reconstruction in just a few seconds, effectively handling object removal and chain-reaction movement of remaining objects without the need for rescanning.
TRAN-D is a three-stage framework for reconstructing transparent objects from sparse RGB views. First, the segmentation module isolates transparent object instances using Grounded SAM, trained with a category-specific prompting strategy. Next, 2D Gaussians are randomly initialized and optimized in the object-aware 2D Gaussian Splatting module using differentiable tile rasterization and a novel object-aware 3D loss. This step produces dense and artifact-free object reconstructions. Finally, the scene update module employs physics-based simulation to refine the reconstruction after object removal, handling chain-reaction movements without requiring re-scanning.
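To make the object-focused optimization concrete, below is a minimal PyTorch sketch of a photometric loss restricted to segmented object pixels. It is only an illustrative stand-in under simplified assumptions: the paper's object-aware 3D loss additionally places Gaussians in occluded regions, and the function and tensor names here are hypothetical rather than part of the released code.

import torch

def object_masked_l1(rendered, target, instance_masks):
    # Photometric L1 restricted to pixels covered by any object mask, so the
    # gradient only supervises Gaussians that project onto segmented objects.
    # rendered, target: (3, H, W) images; instance_masks: (K, H, W) booleans.
    fg = instance_masks.any(dim=0, keepdim=True).float()   # (1, H, W)
    diff = (rendered - target).abs() * fg
    return diff.sum() / (3.0 * fg.sum().clamp(min=1.0))

# Toy usage with random tensors standing in for the rasterizer output.
H, W, K = 64, 64, 3
rendered = torch.rand(3, H, W, requires_grad=True)
target = torch.rand(3, H, W)
masks = torch.rand(K, H, W) > 0.7
loss = object_masked_l1(rendered, target, masks)
loss.backward()
print(float(loss))

In the full pipeline, such a masked image term would be combined with the 2DGS geometry regularizers and the rendered image would come from the differentiable tile rasterizer.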
The top two rows show the RGB images and segmentation results at t=0, while the bottom two rows show the scene at t=1. As the scene transitions from t=0 to t=1, some of the objects are removed. During this process, some of the remaining objects that were in contact with the removed ones are relocated. Our Grounded SAM not only accurately distinguishes each object in cluttered scenes containing transparent objects, but also reliably tracks objects whose positions change after some items are removed.
We fine-tuned Grounded SAM using synthetic data generated from the TransPose dataset. Despite being trained exclusively on synthetic images, the model generalizes well to real-world scenarios, accurately segmenting transparent objects across diverse scenes. It also continues to recognize objects whose positions have changed as the same instances, such as the plastic bottle (blue mask) that remained but shifted after the beaker (pink mask) was removed in the second-row scene.
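As one simple illustration of keeping instance identities consistent between t=0 and t=1, the sketch below matches segmentation masks across the two time steps by maximizing mask IoU with the Hungarian algorithm. This is a generic association baseline assuming moderate object motion; it is not necessarily the mechanism used inside the fine-tuned Grounded SAM, and the function name is illustrative.

import numpy as np
from scipy.optimize import linear_sum_assignment

def associate_masks(masks_t0, masks_t1, min_iou=0.1):
    # Optimal one-to-one matching of instance masks between two time steps,
    # maximizing total IoU; returns (index at t=0, index at t=1) pairs.
    M, N = len(masks_t0), len(masks_t1)
    iou = np.zeros((M, N))
    for i in range(M):
        for j in range(N):
            inter = np.logical_and(masks_t0[i], masks_t1[j]).sum()
            union = np.logical_or(masks_t0[i], masks_t1[j]).sum()
            iou[i, j] = inter / union if union else 0.0
    rows, cols = linear_sum_assignment(-iou)   # Hungarian on negated IoU
    return [(int(i), int(j)) for i, j in zip(rows, cols) if iou[i, j] > min_iou]

# Toy example: object 0 shifts between t=0 and t=1, object 1 stays put.
H = W = 32
m0 = np.zeros((2, H, W), bool)
m1 = np.zeros((2, H, W), bool)
m0[0, 4:12, 4:12] = True
m1[0, 6:14, 6:14] = True        # the shifted object
m0[1, 20:28, 20:28] = True
m1[1, 20:28, 20:28] = True      # the static object
print(associate_masks(m0, m1))  # [(0, 0), (1, 1)]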
We evaluated TRAN-D’s robustness and versatility on scenes with diverse backgrounds and textures. The first row shows the bird's-eye-view (BEV) images included among the sparse-view training images, while the second row shows a zoomed-in view of the first test pose at which we perform depth reconstruction. The last row presents the object depth reconstruction results along a spiral sequence of test poses. The segmentation cleanly separates the objects from the background, enabling object-focused reconstruction and avoiding the influence of floaters.
Below are the results at t=1; the images inside the red boxes show the corresponding scene at t=0 for comparison. We remove objects using the one-hot instance vectors obtained via segmentation, predict the motion of the remaining objects through physics simulation, and then update the Gaussian representation with only a few iterations based on a single image of the changed scene. In the result video, the removed objects disappear correctly, while the remaining objects are also finely adjusted to their new poses.
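The snippet below sketches the geometric part of this scene update: Gaussians belonging to removed objects are dropped, and the remaining ones are moved by the per-object rigid transforms predicted by a physics simulator. All names are illustrative, Gaussian orientations and covariances are left untouched for brevity, and the subsequent few-iteration refinement on the single new image is omitted.

import numpy as np

def update_gaussians(means, obj_ids, removed_ids, transforms):
    # means: (G, 3) Gaussian centers; obj_ids: (G,) instance id per Gaussian;
    # removed_ids: instance ids deleted from the scene;
    # transforms: {id: (R, t)} rigid motions of the remaining objects,
    # e.g. produced by the physics simulation after the removal.
    keep = ~np.isin(obj_ids, removed_ids)          # drop removed objects
    means, obj_ids = means[keep].copy(), obj_ids[keep]
    for oid, (R, t) in transforms.items():         # chain-reaction motion
        sel = obj_ids == oid
        means[sel] = means[sel] @ R.T + t          # apply rigid transform
    return means, obj_ids

# Toy example: object 1 is removed and object 2 settles 5 cm lower.
means = np.array([[0.0, 0.0, 0.10], [0.2, 0.0, 0.10], [0.2, 0.0, 0.20]])
obj_ids = np.array([1, 2, 2])
R, t = np.eye(3), np.array([0.0, 0.0, -0.05])
new_means, new_ids = update_gaussians(means, obj_ids, [1], {2: (R, t)})
print(new_means)   # the two object-2 Gaussians are shifted down by 0.05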
Unlike the scenes above, the scenes below show results on textured backgrounds. Even at t=1, depth reconstruction remains reliable.
@inproceedings{jeongyun2025TRAN-D,
  title     = {2D Gaussian Splatting-based Sparse-view Transparent Object Depth Reconstruction via Physics Simulation for Scene Update},
  author    = {Jeongyun Kim and Seunghoon Jeong and Giseop Kim and Myung-Hwan Jeon and Eunji Jun and Ayoung Kim},
  booktitle = {ICCV},
  year      = {2025}
}
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. RS-2024-00461409), in part by Hyundai Motor Company and Kia, and in part by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2022-0-00480, Development of Training and Inference Methods for Goal-Oriented Artificial Intelligence Agents).