We present a method for magnifying subtle motions in video that can be trained without ground truth magnified videos. Our method also allows users to "target" the magnification, magnifying only specific regions or objects. Since our method is self-supervised, we can finetune the model at test time to improve its performance.
Given a reference frame \(I_0\), a frame to be magnified \(I_t\), and a map of per-pixel magnification factors \(\alpha\), we predict a magnified frame \(\tilde{I}_t\). We minimize two losses, each of which uses an off-the-shelf optical flow estimator \(\mathcal{F}(\cdot, \cdot)\). First, a magnification loss \(\mathcal{L}_{\text{mag}}\) encourages the optical flow of the generated video to be \(\alpha\) times as large as that of the input video. Second, a color consistency loss \(\mathcal{L}_{\text{color}}\) measures the visual similarity of corresponding pixels in \(I_t\) and \(\tilde{I}_t\).
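To make the objective concrete, here is a minimal PyTorch sketch of how the two losses might be computed. The names flow_net, magnifier, and backward_warp are placeholders of our own, and the exact loss forms (norms, masking, regularization) are simplified; see the paper for the precise formulation.

import torch
import torch.nn.functional as F

def backward_warp(img, flow):
    # Sample img at (x + u, y + v) for each pixel (x, y); flow is (B, 2, H, W).
    B, _, H, W = img.shape
    ys, xs = torch.meshgrid(
        torch.arange(H, device=img.device, dtype=img.dtype),
        torch.arange(W, device=img.device, dtype=img.dtype),
        indexing="ij",
    )
    grid = torch.stack([xs, ys]).unsqueeze(0) + flow   # (B, 2, H, W), pixel coords
    gx = 2.0 * grid[:, 0] / (W - 1) - 1.0              # normalize to [-1, 1]
    gy = 2.0 * grid[:, 1] / (H - 1) - 1.0
    return F.grid_sample(img, torch.stack([gx, gy], dim=-1), align_corners=True)

def self_supervised_losses(flow_net, magnifier, I0, It, alpha):
    # flow_net:  frozen off-the-shelf flow estimator, F(a, b) -> (B, 2, H, W);
    #            gradients are backpropagated through it.
    # magnifier: the network being trained, (I0, It, alpha) -> magnified frame.
    # alpha:     per-pixel magnification factors, (B, 1, H, W).
    It_mag = magnifier(I0, It, alpha)

    flow_in = flow_net(I0, It)        # motion of the input pair
    flow_out = flow_net(I0, It_mag)   # motion of the magnified pair

    # Magnification loss: the generated flow should be alpha times the input flow.
    L_mag = F.l1_loss(flow_out, alpha * flow_in)

    # Color loss: a point at p in I_0 appears at p + flow_in in I_t and at
    # p + flow_out in the magnified frame; pull both back to I_0's frame and compare.
    L_color = F.l1_loss(backward_warp(It_mag, flow_out),
                        backward_warp(It, flow_in))
    return L_mag, L_color

Because both losses are self-supervised, the same objective can also be minimized for a few extra steps on the test video itself, which is how the test-time finetuning mentioned above works.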
Our model gives us the ability to vary the magnification factor spatially within an image: we simply provide a different value of \(\alpha\) for each pixel at inference time. This is useful when we want to focus on a specific object, or when we want to ignore an object that is challenging to magnify.
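As a concrete (hypothetical) sketch, given a binary mask of the target object, the map of per-pixel magnification factors can be built by filling the mask with the desired factor and leaving every other pixel at 1, which leaves its motion unchanged:

import torch

def targeted_alpha(mask, alpha_obj=16.0, alpha_bg=1.0):
    # mask:      (B, 1, H, W), 1 inside the target region, 0 elsewhere.
    # alpha_obj: magnification factor for the target region.
    # alpha_bg:  factor elsewhere (1.0 leaves those pixels unmagnified).
    return mask * alpha_obj + (1.0 - mask) * alpha_bg

In practice, one might soften the mask boundary (e.g., with a Gaussian blur) so that \(\alpha\) varies smoothly and no visible seam appears at the object border.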
While our method performs well on most videos, there are cases where it falls short. One failure mode occurs when a video contains thin structures such as tree branches and guitar strings. Another common failure is spuriously magnifying static background, which should have zero flow; this happens when small errors from the optical flow estimator are amplified, as shown in the camera sequence.
There is a large body of excellent work relevant to ours, and how best to frame the motion magnification problem remains an open question. While our technique focuses on training without ground truth videos, there are also well-established Eulerian methods and supervised learning-based approaches (see [1], [2], and [3]). Check out this website for an excellent collection of work on motion magnification.
[1] Totally Normal. "12 hours of cats sleeping." YouTube video, 2018.
[2] Feng, Brandon Y., et al. "3D Motion Magnification: Visualizing Subtle Motions from Time-Varying Radiance Fields." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023.
[9] Yang, Linjie, et al. "The 2nd Large-scale Video Object Segmentation Challenge - video object segmentation track." 2019.
[4] Wadhwa, Neal, et al. "Riesz pyramids for fast phase-based video magnification." 2014 IEEE International Conference on Computational Photography (ICCP). IEEE, 2014.
[5] Wadhwa, Neal, et al. "Phase-based video motion processing." ACM Transactions on Graphics (ToG) 32.4 (2013): 1-10.
[6] Mai, Long, and Feng Liu. "Motion-adjustable neural implicit video representation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
[7] Terem, Itamar, et al. "Revealing sub-voxel motions of brain tissue using phase-based amplified MRI (aMRI)." Magnetic Resonance in Medicine 80.6 (2018): 2549-2559.
[8] Vibration, dynamics and noise. "Motion amplification camera examples." YouTube video, 2018.
[9] RDI Technologies. "Motion Amplification of an engine starting." YouTube video, 2016.
[10] Xue, Tianfan, et al. "Refraction wiggles for measuring fluid depth and velocity from video." Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part III 13. Springer International Publishing, 2014.
[11] Samed Gojak. "An Electric Train Passing by." Pexels video, 2022.
@inproceedings{pan2023selfsupervised,
title={Self-Supervised Motion Magnification by Backpropagating Through Optical Flow},
author={Zhaoying Pan and Daniel Geng and Andrew Owens},
booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
year={2023},
url={https://arxiv.org/abs/2311.17056}
}