CustomTex: High-fidelity Indoor Scene Texturing via Multi-Reference Customization

Weilin Chen, Jiahao Rao, Wenhao Wang, Xinyang Li, Xuan Cheng^*, Liujuan Cao

Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University
CVPR 2026
^*Corresponding Author -- chengxuan@xmu.edu.cn


CustomTex generates high-fidelity textures for a 3D scene mesh, driven by instance-specific reference images.

Abstract

The creation of high-fidelity, customizable 3D indoor scene textures remains a significant challenge. While text-driven methods offer flexibility, they often lack the precision required for fine-grained, instance-level control, and tend to produce textures with artifacts and baked-in shading. To overcome these limitations, we introduce CustomTex, a novel framework for instance-level, high-fidelity scene texturing driven by reference images. CustomTex takes an untextured 3D scene together with reference images specifying the desired appearance for each object instance, and generates a unified, high-resolution texture map. The core of our method is a dual-distillation approach that decouples semantic control from pixel-level enhancement. We employ semantic-level distillation, equipped with instance-aware cross attention, to ensure semantic plausibility and reference-instance alignment, and pixel-level distillation to enforce high visual fidelity. Both are unified within a Variational Score Distillation (VSD) optimization framework. Experiments demonstrate that CustomTex achieves precise instance-level consistency with reference images and produces textures with superior sharpness, reduced artifacts, and minimal baked-in shading compared to state-of-the-art methods. Our work establishes a more direct and user-friendly path to high-quality, customizable 3D scene appearance editing.

Method Overview

Pipeline of CustomTex. CustomTex textures a complete 3D indoor scene by optimizing a texture map in UV space through a dual-distillation training approach. In each iteration, the 3D scene with its current texture is rendered from a random viewpoint, producing an RGB image, a depth map, and instance masks. The instance masks are used to align each reference image's features with the correct object instance in the rendered RGB image via a specialized cross-attention. The Variational Score Distillation gradient and the Super-Resolution gradient are then computed, conditioned on the aligned reference images, and used to update the texture field.
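The instance-aware alignment step can be illustrated with a small sketch. This page does not specify how the cross-attention is implemented, so the code below is only one plausible reading, assuming that each rendered pixel is restricted to attend to tokens from the reference image of its own instance. The function name `instance_masked_cross_attention`, its arguments, and the use of plain NumPy arrays are all hypothetical and chosen for illustration; they are not taken from the CustomTex code.

```python
import numpy as np

def instance_masked_cross_attention(queries, ref_keys, ref_values,
                                    ref_instance_ids, pixel_instance_ids):
    """Illustrative masked cross-attention (not the authors' implementation).

    queries:            (P, d)   features of P rendered pixels
    ref_keys:           (T, d)   keys of reference tokens, all references concatenated
    ref_values:         (T, d_v) values of the same reference tokens
    ref_instance_ids:   (T,)     instance id of the reference each token comes from
    pixel_instance_ids: (P,)     instance id of each pixel, read from the instance masks
    """
    d = queries.shape[-1]
    logits = queries @ ref_keys.T / np.sqrt(d)                      # (P, T)

    # A pixel may only attend to tokens of the reference for its own instance.
    allowed = pixel_instance_ids[:, None] == ref_instance_ids[None, :]
    logits = np.where(allowed, logits, -np.inf)

    # Numerically stable softmax; pixels with no matching reference get zero output.
    has_ref = allowed.any(axis=1, keepdims=True)
    row_max = np.where(has_ref, np.max(logits, axis=1, keepdims=True), 0.0)
    weights = np.exp(logits - row_max)
    denom = weights.sum(axis=1, keepdims=True)
    weights = weights / np.where(denom > 0, denom, 1.0)

    return weights @ ref_values                                     # (P, d_v)
```

With this masking, features from the reference for one object cannot leak into pixels belonging to another object, which is the behavior the pipeline description attributes to the instance masks. In the actual method this step would sit inside a diffusion model's attention layers, which the sketch does not attempt to reproduce.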

Reference-Guided Comparison

[Figure: two comparison examples, each with panels Reference Image, Paint3D, HY3D-2.1, SceneTex-IPA, and Ours.]

Stylization for More Scenes

[Figure: five examples, each pairing a Reference Image with the Textured Result (Ours).]

High-Quality Renderings

The "living room" texture generated by our method, rendered as a 2,000 × 2,000 image.

The "bedroom" texture generated by our method, rendered as a 2,000 × 2,000 image.

BibTeX

@misc{CustomTex2025,
  title={CustomTex: High-fidelity Indoor Scene Texturing via Multi-Reference Customization},
  author={Weilin Chen and Jiahao Rao and Wenhao Wang and Xinyang Li and Xuan Cheng and Liujuan Cao},
  year={2025},
  url={https://chenweilinx.github.io/CustomTex/},
  note={Preprint}
}