Depth-Conditional GANs for Video Generation

Several generative adversarial networks (GANs) for video generation have been proposed in recent years, but most of them train the generative model on color videos alone. However, for a model to understand scene dynamics accurately, three-dimensional geometric information is important in addition to color (optical) information. In this paper, we propose a GAN architecture for video generation that uses depth video together with color video. In the generator of our architecture, the first stage generates the depth video, and the second stage generates the color video by solving a domain translation problem from depth to color. By modeling scene dynamics with a focus on depth information, our method produces videos of higher quality than the conventional method. Furthermore, we show that our method generates better video samples than the conventional method in terms of both variety and quality when evaluated on facial expression and hand gesture datasets.
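The two-stage generator described above can be sketched in code. The following is a minimal, illustrative PyTorch sketch, not the authors' implementation: the class names (`DepthGenerator`, `DepthToColorGenerator`), layer sizes, and output resolution (16 frames at 32×32) are assumptions chosen for brevity. Stage 1 maps a latent vector to a single-channel depth video with 3D transposed convolutions; stage 2 translates that depth video into an RGB video with 3D convolutions, mirroring the depth-to-color domain translation step.

```python
# Illustrative sketch only: architecture details are assumptions,
# not the configuration used in the DCVGAN paper.
import torch
import torch.nn as nn


class DepthGenerator(nn.Module):
    """Stage 1: latent vector z -> depth video of shape (N, 1, T, H, W)."""

    def __init__(self, z_dim=100, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            # Project z into a small 3D feature volume, then upsample.
            nn.ConvTranspose3d(z_dim, ch * 4, kernel_size=(2, 4, 4)),    # -> (ch*4, 2, 4, 4)
            nn.BatchNorm3d(ch * 4), nn.ReLU(inplace=True),
            nn.ConvTranspose3d(ch * 4, ch * 2, 4, stride=2, padding=1),  # -> (ch*2, 4, 8, 8)
            nn.BatchNorm3d(ch * 2), nn.ReLU(inplace=True),
            nn.ConvTranspose3d(ch * 2, ch, 4, stride=2, padding=1),      # -> (ch, 8, 16, 16)
            nn.BatchNorm3d(ch), nn.ReLU(inplace=True),
            nn.ConvTranspose3d(ch, 1, 4, stride=2, padding=1),           # -> (1, 16, 32, 32)
            nn.Tanh(),
        )

    def forward(self, z):
        # Reshape (N, z_dim) to (N, z_dim, 1, 1, 1) for the 3D deconvolutions.
        return self.net(z.view(z.size(0), -1, 1, 1, 1))


class DepthToColorGenerator(nn.Module):
    """Stage 2: depth video (N, 1, T, H, W) -> RGB video (N, 3, T, H, W)."""

    def __init__(self, ch=64):
        super().__init__()
        # Shape-preserving 3D convolutions perform the depth-to-color
        # domain translation frame volume to frame volume.
        self.net = nn.Sequential(
            nn.Conv3d(1, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(ch, 3, 3, padding=1),
            nn.Tanh(),
        )

    def forward(self, depth):
        return self.net(depth)


if __name__ == "__main__":
    z = torch.randn(2, 100)
    depth = DepthGenerator()(z)             # (2, 1, 16, 32, 32)
    color = DepthToColorGenerator()(depth)  # (2, 3, 16, 32, 32)
    print(depth.shape, color.shape)
```

In this sketch both stages are trained adversarially end to end, with the depth video acting as the conditioning input to the color stage; the actual discriminator design and losses follow the papers cited below.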

GitHub project page

Y. Nakahira and K. Kawamoto, DCVGAN: Depth Conditional Video Generation, IEEE International Conference on Image Processing (ICIP), pp. 749-753, 2019.

Y. Nakahira and K. Kawamoto, Generative adversarial networks for generating RGB-D videos, Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 1276-1281, 2018.