Russ Salakhutdinov

Multimodal AI Agents

In recent years, the rise of Large Language Models (LLMs) with advanced general capabilities has accelerated progress toward building language-guided agents capable of performing complex, multi-step tasks, much like human assistants. Developing agents that can perceive, plan, and act autonomously has long been a central goal of artificial intelligence research.

In this talk, I will introduce multimodal AI agents that can plan, reason, and execute actions on the web, and that not only comprehend textual information but also navigate and interact with visual settings. I will present VisualWebArena, a novel framework for evaluating multimodal autonomous language agents, along with an inference-time search algorithm that enables explicit exploration and multi-step planning in interactive web environments. Next, I will demonstrate how an automated data pipeline can facilitate Internet-scale web-agent training by generating web navigation tasks across 150,000 live websites, deploying LLM agents, and assessing their performance. Finally, I will discuss some insights for developing more capable autonomous agents in both digital and physical environments.

Bio:

Russ Salakhutdinov earned his PhD in Computer Science from the University of Toronto under the supervision of Nobel Laureate Geoffrey Hinton. After completing a postdoctoral fellowship at MIT, he joined the University of Toronto before moving to Carnegie Mellon University. He also served as Director of AI Research at Apple and is currently the VP of Research at Meta. Russ's research focuses on deep learning, machine learning, and generative AI. He is an action editor for the Journal of Machine Learning Research and has served on the senior program committees of top-tier conferences, including NeurIPS, ICLR, and ICML. He was the program co-chair for ICML 2019 and the general chair for ICML 2024. He has published over 250 research papers, and his work has received over 200,000 citations according to Google Scholar. He is an Alfred P. Sloan Research Fellow and a Microsoft Research Faculty Fellow, and a recipient of the Early Researcher Award, the Google Faculty Award, and Nvidia's Pioneers of AI award.