CroCoDiLight: Repurposing Cross-View Completion Encoders for Relighting

1University of York, 2pxld.ai
ICLR 2026

Drag the divider to compare the original glacier timelapse (left) with the lighting-stabilised output (right). Each output frame uses the structure of the corresponding original frame and is relit using the lighting conditions of the initial frame.

Abstract

Cross-view completion (CroCo) has proven effective as pre-training for geometric downstream tasks such as stereo depth, optical flow, and point cloud prediction. In this paper we show that it also learns photometric understanding, owing to training pairs with differing illumination. We propose a method to disentangle CroCo latent representations into a single latent vector representing illumination and patch-wise latent vectors representing intrinsic properties of the scene. To do so, we use self-supervised cross-lighting and intrinsic consistency losses on a dataset two orders of magnitude smaller than that used to train CroCo. This dataset comprises pixel-wise aligned image pairs captured under different illumination. We further show that the lighting latent can be used and manipulated for tasks such as interpolation between lighting conditions, shadow removal, and albedo estimation. This clearly demonstrates the feasibility of using cross-view completion as pre-training for photometric downstream tasks where training data is more limited.

Hypothesis


Left: data flow through CroCo. Right: the relighting that we hypothesise CroCo must implicitly perform. To predict a masked patch (?), the target illumination must be estimated from unmasked patches (green). Patches containing the same scene content (blue) must be delit using the source lighting estimated from the second view (purple) and relit (orange) using the estimated target illumination.

Method

The architecture of the model comprises four main components: the frozen CroCo encoder; the decoder D, which is separately pre-trained and then frozen to decode from CroCo latent space to RGB; and the delighting and relighting transformers, I and R respectively, which disentangle lighting from intrinsics and then recombine them. During training, as shown here, pairs of images are encoded and each is relit to match the lighting of the other.
CroCoDiLight architecture diagram
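The cross-lighting and intrinsic consistency objectives described above can be sketched numerically. The following toy numpy sketch is purely illustrative: the dimensions are invented, and the transformers I and R are replaced by fixed linear maps, not the paper's learned networks. It shows only the shape of the training signal: latents of a pixel-aligned pair are each split into intrinsics plus a lighting vector, then recombined with swapped lighting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the paper does not specify these here.
N_PATCHES, D_LATENT, D_LIGHT = 196, 768, 256

W_light = rng.standard_normal((D_LIGHT, D_LATENT)) / np.sqrt(D_LATENT)
W_intr = rng.standard_normal((D_LATENT, D_LATENT)) / np.sqrt(D_LATENT)

def delight(z):
    """Linear stand-in for the delighting transformer I: splits CroCo
    patch latents z into patch-wise intrinsic latents and a single
    pooled lighting vector."""
    lighting = W_light @ z.mean(axis=0)   # one lighting latent per image
    intrinsics = z @ W_intr               # per-patch intrinsic latents
    return intrinsics, lighting

def relight(intrinsics, lighting):
    """Linear stand-in for the relighting transformer R: recombines
    intrinsics with a lighting latent back into CroCo latent space."""
    return intrinsics @ W_intr.T + W_light.T @ lighting

# A pixel-aligned pair of the same scene under lighting A and B
# (random placeholders for real CroCo encoder outputs).
z_a = rng.standard_normal((N_PATCHES, D_LATENT))
z_b = rng.standard_normal((N_PATCHES, D_LATENT))

intr_a, light_a = delight(z_a)
intr_b, light_b = delight(z_b)

# Cross-lighting loss: relight A's intrinsics with B's lighting and
# compare against the latent of the image actually captured under B.
z_a_as_b = relight(intr_a, light_b)
cross_loss = np.mean((z_a_as_b - z_b) ** 2)

# Intrinsic consistency loss: the same scene content should produce
# the same intrinsics regardless of illumination.
consistency_loss = np.mean((intr_a - intr_b) ** 2)
```

In the actual method both losses would be backpropagated through I and R; here the maps are fixed, so the snippet only demonstrates the data flow.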

Task-Specific Training

For shadow removal and albedo estimation, we train components S and A to transform the input lighting latent into the desired output latent, which is derived from the ground-truth output image.

Task-specific training diagram for shadow removal and albedo estimation

Albedo Estimation

Drag the divider to compare input (left) with predicted albedo (right)

Examples from IIW: 597, 2504, 10291, 104664, 117618, 118411.

Shadow Removal

Here, the shadow removal component S transforms the lighting latents of input images to remove their shadows. S was trained jointly on the SRD, ISTD+, and WSRD+ datasets to produce a single general-purpose shadow removal model, which we compare against other major shadow removal methods. Despite being trained once across multiple datasets, our model produces comparatively strong shadow removal even against models fine-tuned on individual datasets.

Drag the divider to compare input (left) with shadow-free output (right)

Examples: ISTD+ (113-2), ISTD+ (116-11), SRD (MG_6507), WSRD+ (0006), WSRD+ (0009), WSRD+ (0051).

Method Comparison

Each pair of rows shows the method outputs (top) and a signed difference heatmap against the ground truth (bottom).

Comparison grid: for each example, the top row shows the input followed by outputs from CroCoDiLight (ours), StableShadowRemoval, OmniSR, and HomoFormer; the bottom row shows the ground truth followed by each method's signed-difference heatmap (see colour bar).

Relighting

Freeze Intrinsics, Vary Lighting

Intrinsic patches from a single frame are kept fixed and relit with the lighting latents from subsequent frames. In the original timelapse the clock hands move; with frozen intrinsics they remain static while the shadows change in the relit frames. Original timelapse from this video.

Freeze Lighting, Vary Intrinsics

The lighting latent is extracted from one reference frame and applied to all other frames. Each frame retains its own intrinsic content (geometry, materials) but is relit to match the reference lighting. Original timelapse from this video.
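Both relighting modes amount to choosing which factor to hold constant when recombining latents. A toy numpy sketch of the two modes (all names and sizes are hypothetical, and the relighting transformer R is replaced by a trivial additive map for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
N_FRAMES, N_PATCHES, D_LATENT, D_LIGHT = 5, 196, 64, 16

# Placeholder per-frame outputs of the delighting stage: intrinsic
# patch latents and one lighting latent per frame.
intrinsics = rng.standard_normal((N_FRAMES, N_PATCHES, D_LATENT))
lighting = rng.standard_normal((N_FRAMES, D_LIGHT))

W = rng.standard_normal((D_LIGHT, D_LATENT))  # toy lighting-to-latent map

def relight(intr, light):
    """Trivial stand-in for the relighting transformer R."""
    return intr + light @ W

# Mode 1 - freeze intrinsics, vary lighting: one frame's scene content
# rendered under every frame's illumination (the clock-tower timelapse).
vary_lighting = [relight(intrinsics[0], lighting[t]) for t in range(N_FRAMES)]

# Mode 2 - freeze lighting, vary intrinsics: every frame's own content
# rendered under one reference frame's illumination (the stabilised
# glacier timelapse).
ref_light = lighting[2]
vary_intrinsics = [relight(intrinsics[t], ref_light) for t in range(N_FRAMES)]
```

In the actual pipeline each relit latent would then be passed through the frozen decoder D to produce an RGB frame.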

Reference: a midday frame, shown both original and relit to itself.

Night, sunrise, afternoon, and evening frames are each shown original (top) and relit to the reference lighting (bottom).

BibTeX

@inproceedings{foggin2026crocodilight,
  title={{CroCoDiLight}: Repurposing Cross-View Completion Encoders for Relighting},
  author={Foggin, Alistair J and Smith, William A P},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=GKvb3HCyNk}
}