Compositional Zero-Shot Video Generation by Deep Learning

We propose a conditional generative adversarial network (GAN) for zero-shot video generation. We study a zero-shot conditional generation setting: the model is trained on samples from which some classes are missing, and it must still generate videos of those unseen classes. The task is an extension of conditional data generation. The key idea is to learn disentangled representations in the latent space of a GAN. To this end, we build on the motion and content decomposed GAN (MoCoGAN) for video generation and on conditional GANs for image generation. The resulting model is designed to learn better-disentangled representations and to generate higher-quality videos. We demonstrate its effectiveness through experiments on the Weizmann action database and the MUG facial expression database.
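The latent-space decomposition described above can be sketched as follows. This is a minimal NumPy illustration under our own assumptions, not the authors' implementation. Following MoCoGAN, a content code is sampled once per video and shared across frames, while a motion code changes from frame to frame; here a toy tanh recurrence stands in for MoCoGAN's recurrent motion network. One-hot class codes are appended as in a conditional GAN. The two label types (an appearance label such as the subject and a motion label such as the action), the helper names `zero_shot_split` and `sample_conditional_latents`, and all dimensions are hypothetical.

```python
import numpy as np


def zero_shot_split(num_content_classes, num_motion_classes, held_out):
    """Split all (content, motion) label pairs into seen (training) and unseen pairs."""
    all_pairs = [(c, m) for c in range(num_content_classes)
                 for m in range(num_motion_classes)]
    seen = [p for p in all_pairs if p not in held_out]
    unseen = [p for p in all_pairs if p in held_out]
    return seen, unseen


def sample_conditional_latents(num_frames, content_label, motion_label,
                               num_content_classes, num_motion_classes,
                               dim_content=50, dim_motion=10, seed=0):
    """Return a (num_frames, latent_dim) array of per-frame generator inputs.

    Hypothetical sketch: dimensions and the recurrence are illustrative only.
    """
    rng = np.random.default_rng(seed)
    # Content code: sampled once and shared by every frame of the video.
    z_content = rng.standard_normal(dim_content)
    # Motion codes: a toy tanh recurrence driven by per-frame noise
    # (a stand-in for MoCoGAN's recurrent motion network).
    w_h = 0.1 * rng.standard_normal((dim_motion, dim_motion))
    w_e = 0.1 * rng.standard_normal((dim_motion, dim_motion))
    h = np.zeros(dim_motion)
    motion_codes = []
    for _ in range(num_frames):
        eps = rng.standard_normal(dim_motion)
        h = np.tanh(w_h @ h + w_e @ eps)
        motion_codes.append(h)
    # One-hot class codes, appended as in a conditional GAN.
    c_content = np.eye(num_content_classes)[content_label]
    c_motion = np.eye(num_motion_classes)[motion_label]
    frames = [np.concatenate([z_content, c_content, z_m, c_motion])
              for z_m in motion_codes]
    return np.stack(frames)


# Hold out two (content, motion) pairs from training, then request one of them.
seen_pairs, unseen_pairs = zero_shot_split(3, 4, held_out={(0, 1), (2, 3)})
z = sample_conditional_latents(16, *unseen_pairs[0],
                               num_content_classes=3, num_motion_classes=4)
```

Because the content and motion labels enter the latent vector independently, a pair that never appears in training can still be requested at generation time. Whether the generator renders such a pair convincingly depends on how well the learned representation disentangles content from motion, which is the property the proposed model targets.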

  • Shun Kimura and Kazuhiko Kawamoto, "Conditional Motion and Content Decomposed GAN for Zero-Shot Video Generation," in Proc. of the 7th International Workshop on Advanced Computational Intelligence and Intelligent Informatics, 2021.
  • Shun Kimura and Kazuhiko Kawamoto, "Conditional MoCoGAN for Zero-Shot Video Generation," arXiv:2109.05864, 2021.