Comparing Correspondences: Video Prediction with
Correspondence-wise Losses
Daniel Geng
Andrew Owens
[Paper]
[GitHub]
Image losses typically compare pixels to pixels, or patches to patches, at the same absolute location. What happens if we instead compare corresponding pixels and patches, matched across the two images? We propose correspondence-wise losses and compare them against traditional pixel-wise and patch-wise losses.

Abstract

Today's image prediction methods struggle to change the locations of objects in a scene, producing blurry images that average over the many positions they might occupy. In this paper, we propose a simple change to existing image similarity metrics that makes them more robust to positional errors: we match the images using optical flow, then measure the visual similarity of corresponding pixels. This change leads to crisper and more perceptually accurate predictions, and can be used with any image prediction network. We apply our method to predicting future frames of a video, where it outperforms previous methods with simple, off-the-shelf architectures.
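As a rough illustration (not the paper's released code), a correspondence-wise L1 loss might look like the sketch below: estimate optical flow between the prediction and the target, warp the prediction onto the target, and then compare corresponding pixels. The flow estimator `flow_net` is a placeholder for any off-the-shelf optical flow network (e.g. RAFT); its interface and the choice to stop gradients through the flow are assumptions for this sketch.

```python
import torch
import torch.nn.functional as F

def warp(image, flow):
    """Backward-warp `image` (B, C, H, W) using `flow` (B, 2, H, W) in pixels."""
    b, _, h, w = image.shape
    # Base sampling grid of pixel coordinates.
    ys, xs = torch.meshgrid(
        torch.arange(h, device=image.device),
        torch.arange(w, device=image.device),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=0).float()            # (2, H, W)
    coords = grid.unsqueeze(0) + flow                      # shift by the flow
    # Normalize coordinates to [-1, 1] for grid_sample.
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    sample_grid = torch.stack((coords_x, coords_y), dim=-1)  # (B, H, W, 2)
    return F.grid_sample(image, sample_grid, align_corners=True)

def correspondencewise_l1(pred, target, flow_net):
    """Match `pred` to `target` with optical flow, then compare
    corresponding pixels rather than pixels at the same location."""
    with torch.no_grad():
        # Hypothetical interface: flow from `target` pixels to `pred` pixels.
        flow = flow_net(target, pred)
    pred_aligned = warp(pred, flow)      # pred resampled onto target's layout
    return F.l1_loss(pred_aligned, target)
```

Once the prediction is aligned to the target, any standard pixel-wise or patch-wise comparison can in principle be applied to the corresponding locations; the L1 distance above is just one choice.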



Ablations

All else being equal, we find that using a correspondence-wise loss improves image quality.


Code


 [GitHub]


Paper and Supplementary Material

Daniel Geng, Andrew Owens.
Comparing Correspondences: Video Prediction with Correspondence-wise Losses.
(hosted on arXiv)


[Bibtex]


Acknowledgements

This template was originally made by Phillip Isola and Richard Zhang for a colorful ECCV project; the code can be found here.