Arun Mallya

Currently a Research Scientist at Meta (Gen AI → MSL) working on video generation, video editing models, and model auto-evaluation.

Previously a Senior Research Scientist in the Deep Imagination Research (DIR) group at NVIDIA (now Cosmos Lab). A member of the lab since its inception, when it had just 3 members.

PhD from the University of Illinois at Urbana-Champaign, advised by Prof. Svetlana Lazebnik. M.S. in CS at UIUC; B.Tech in CSE at IIT Kharagpur.

My research currently focuses on generative content creation and its evaluation with vision-language models. I have previously dabbled in neural rendering and neural network efficiency, as well as probing the unexpected properties of neural networks.


Selected Research

See all on Google Scholar →

Image/Video Generation

Movie Gen: A Cast of Media Foundation Models
Technical Report, Meta, 2024
Text-to-video and video-editing diffusion models
Edify Image Generation
Technical Report, NVIDIA, 2024
Image generation with pixel-diffusion models

Facial Animation

SPACE: Speech-driven Portrait Animation with Controllable Expression
International Conference on Computer Vision (ICCV), 2023
Photo animation with lip-sync using just input speech
Implicit Warping for Animation with Image Sets
Arun Mallya, Ting-Chun Wang, Ming-Yu Liu
Neural Information Processing Systems (NeurIPS), 2022
Complex motion-transfer warping replaced by a single cross-attention layer
One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing (Oral)
Ting-Chun Wang, Arun Mallya, Ming-Yu Liu
Computer Vision and Pattern Recognition (CVPR), 2021
3-D controllable photo animation using motion transfer from video

Neural Rendering

Implicit Neural Representations with Levels-of-Experts
Neural Information Processing Systems (NeurIPS), 2022
Like MoE, but using a hierarchy of position-dependent and periodic weights
GANcraft: Unsupervised 3D Neural Rendering of Minecraft Worlds (Oral)
International Conference on Computer Vision (ICCV), 2021
Generating realistic view-consistent renders of 3D block worlds

Model Efficiency / Interesting Properties

See through Gradients: Image Batch Recovery via GradInversion
Hongxu Yin, Arun Mallya, Arash Vahdat, Jose Alvarez, Pavlo Molchanov, Jan Kautz
Computer Vision and Pattern Recognition (CVPR), 2021
We can invert an entire batch of images just from averaged gradients
Dreaming to Distill: Data-free Knowledge Transfer via DeepInversion (Oral)
Hongxu Yin, Pavlo Molchanov, Zhizhong Li, Jose Alvarez, Arun Mallya, Derek Hoiem, Niraj Jha, Jan Kautz
Computer Vision and Pattern Recognition (CVPR), 2020
We can reproduce training data just from a trained neural network
Piggyback: Adding Multiple Tasks to a Single, Fixed Network by Learning to Mask
Arun Mallya, Dillon Davis, Svetlana Lazebnik
European Conference on Computer Vision (ECCV), 2018
We can make a neural network learn new tasks simply by masking its existing, fixed weights
PackNet: Adding Multiple Tasks to a Single Network by Iterative Pruning
Arun Mallya, Svetlana Lazebnik
Computer Vision and Pattern Recognition (CVPR), 2018
We can add new tasks to a neural network by pruning and training a small number of weights

Tutorials & Workshops

  1. Machine Learning with Synthetic Data, CVPR 2022
  2. Accelerating Computer Vision with Mixed Precision, ECCV 2020
  3. Accelerating Computer Vision with Mixed Precision, ICCV 2019

Writeups & Notes

Hosted on GitHub. Edit requests, additions, and corrections are welcome.

  1. A Backpropagation Refresher
  2. An Illustrated Explanation of the LSTM Forward-Backward Pass
  3. Introduction to RNNs
  4. Introduction to RNNs — II
  5. Jupyter notebook to find Receptive Field Size and Effective Stride (supports dilated convs)
  6. Visualization of neuron connections and receptive field of a CNN (including dilation)