THE FUTURE IS HERE

Prafulla Dhariwal (OpenAI) – Jukebox: A Generative Model for Music

Presentation recorded June 19, 2020

Abstract: Music is an extremely challenging domain for generative modeling: it's highly diverse, listeners are sensitive to small errors, and it has extremely long-range dependencies to learn when generated as raw audio. We show it's possible to generate music with singing directly in the raw audio domain. We tackle the long sequence lengths of raw audio by using a multi-scale VQ-VAE to compress it to discrete codes, and model those codes with autoregressive Transformers. We show that the combined model at scale can generate high-fidelity, diverse songs with coherence up to multiple minutes. We can condition on artist and genre to steer the musical and vocal style, and on unaligned lyrics to make the singing more controllable.
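The core compression idea the abstract mentions can be illustrated with a toy sketch. This is not the actual Jukebox code (which uses learned convolutional encoders and large codebooks); it is a minimal, hypothetical vector-quantization example showing how frames of raw audio are mapped to discrete codebook indices, which a Transformer could then model as a token sequence:

```python
# Hypothetical toy sketch of vector quantization (not the actual Jukebox
# implementation): raw-audio frames are replaced by the index of their
# nearest codebook vector, turning continuous audio into discrete tokens.
import math

def quantize(frames, codebook):
    """Map each frame to the index of its nearest codebook vector."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return [min(range(len(codebook)), key=lambda k: dist(f, codebook[k]))
            for f in frames]

def decode(codes, codebook):
    """Lossily reconstruct frames by looking codes back up in the codebook."""
    return [codebook[k] for k in codes]

# Tiny illustrative codebook and "audio" frames (values are made up).
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, -1.0)]
frames = [(0.1, -0.1), (0.9, 1.2), (-0.8, -1.1)]
codes = quantize(frames, codebook)   # discrete token sequence
recon = decode(codes, codebook)      # lossy reconstruction
```

In Jukebox this quantization is applied at multiple temporal scales, so the coarsest code sequence is short enough for a Transformer to capture minutes-long structure.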

Bio: Prafulla Dhariwal is a research scientist at OpenAI leading work on generative models under the guidance of Ilya Sutskever. His work focuses on modeling high-dimensional data while preserving fidelity and diversity, with prominent works including Glow, a normalizing flow that generates high-resolution images with fast sampling; and the Variational Lossy Autoencoder, a way to understand and prevent latent collapse with autoregressive decoders in VAEs. In the past, he has also worked on reinforcement learning, including PPO, a popular on-policy RL algorithm; and GamePad, an environment that makes it easier to apply RL to formal theorem proving. He obtained his undergraduate degree from MIT in 2017 with a double major in Computer Science and Mathematics.