Hadi Daneshmand

 

Assistant Professor of Computer Science

Contact

503 UVA Rice Hall
85 Engineer's Way, Charlottesville, VA 22903
Email: dhadiATvirginiaDOTedu


I am an Assistant Professor of Computer Science at the University of Virginia. My research focuses on the theoretical foundations of machine learning, with an emphasis on the theoretical analysis of the explainability and reliability of generative AI, using tools from probability theory, applied mathematics, and mathematical physics. Prior to joining UVA, I was a FODSI postdoctoral researcher hosted by MIT and Boston University. Before that, I was a postdoctoral researcher at Princeton University and at INRIA Paris. I completed my Ph.D. in computer science at ETH Zurich in 2020. I had the privilege of being advised early in my research training by Professors Francis Bach and Thomas Hofmann.

Research

Foundations of Generative AI

In a world increasingly shaped by generative AI, it is crucial to understand the underlying mechanisms that govern these models in order to rigorously characterize their strengths and limitations. My research focuses on the theoretical foundations of generative AI: analyzing its mechanisms, formulating challenges for reliable and efficient data generation, and designing new generative methods that are mathematically tractable and provably reliable.
Analyzable Generative Models Using Mathematical Physics
[Figure: generative models animation]
We use tools from mathematical physics to design novel generative methods that can be analyzed rigorously in asymptotic regimes. These methods provably generate samples from the underlying distribution under weak assumptions, without noise injection or neural networks; instead, they are implemented with first-order optimization. The figure illustrates how interacting particles evolve to generate data from the Swiss roll dataset.
See [22]
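As a toy illustration of this idea (a sketch chosen for exposition, not the method of [22]), the snippet below moves particles toward Swiss-roll samples by plain gradient descent on a maximum mean discrepancy objective, with no neural network and no noise injection.

```python
# Toy sketch (not the method of [22]): particles driven by plain gradient
# descent on a maximum mean discrepancy (MMD) objective toward Swiss-roll
# samples, with a Gaussian kernel. No neural network and no noise injection.
import numpy as np

rng = np.random.default_rng(0)

# Target samples: a 2-D Swiss roll, rescaled to a moderate range.
t = 1.5 * np.pi * (1 + 2 * rng.random(300))
target = np.stack([t * np.cos(t), t * np.sin(t)], axis=1) / 10.0

# Particles start from a standard Gaussian and are moved by gradient steps.
particles = rng.standard_normal((300, 2))

def kernel_grad(x, y, bandwidth=0.5):
    """Gradient w.r.t. x of sum_j k(x, y_j) for the Gaussian kernel k."""
    diff = x[:, None, :] - y[None, :, :]                  # shape (n, m, 2)
    k = np.exp(-(diff ** 2).sum(-1) / (2 * bandwidth ** 2))
    return -(k[..., None] * diff).sum(axis=1) / bandwidth ** 2

step = 0.2
for _ in range(1000):
    # Gradient of the MMD: repulsion among particles, attraction to the target.
    grad = kernel_grad(particles, particles) / len(particles) \
         - kernel_grad(particles, target) / len(target)
    particles -= step * grad                              # first-order update
```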
In-Context Learning and Mechanistic Analysis of Large Language Models
[Figure: in-context learning animation]
This line of work examines how large language models internally generate data. By analyzing their attention patterns and internal representations, we aim to uncover mechanistic explanations for how transformers perform tasks such as sorting, token alignment, reinforcement learning, and regression without changing their parameters. Our goal is to move beyond black-box performance and develop a mathematically grounded understanding of what these models compute and when they fail. The figure shows the attention heat map for token alignment across layers of a large language model.
See [23], [21], [20] and [19]
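The minimal sketch below shows how such attention heat maps can be extracted in practice; it assumes the Hugging Face transformers package and the public GPT-2 model, which stand in for, but are not, the specific systems studied in these papers.

```python
# Minimal sketch (assumes the Hugging Face `transformers` package and the
# public GPT-2 model, not the specific systems studied in [19]-[23]): extract
# the per-layer attention maps that underlie heat maps like the one above.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")
model.eval()

# A small prompt with a repeated pattern, loosely in the spirit of in-context tasks.
prompt = "a b c -> a b c ; x y z ->"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

# `outputs.attentions` holds one tensor per layer with shape
# (batch, num_heads, seq_len, seq_len); averaging over heads gives a heat map.
for layer, attn in enumerate(outputs.attentions):
    head_avg = attn[0].mean(dim=0)
    print(f"layer {layer}: attention map of shape {tuple(head_avg.shape)}")
```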
Markov Chain Analysis of Random Deep Neural Networks
[Figure: Markov chain dynamics in deep nets]
The intermediate data representations in deep neural networks form a Markov chain at initialization. In this project, we apply tools from Markov chain theory to analyze this chain. This perspective provides a principled explanation of a key architectural component in deep neural networks and large language models: normalization layers. By establishing a stochastic stability analysis, we show that normalization inherently biases representations toward being whitened across layers. This finding helps explain why normalization plays a crucial role in improving optimization and training stability in foundation models.
See papers [18], [17], [16], [14] and [13]
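The toy simulation below illustrates the whitening effect in a deliberately simplified setting (a linear network with Gaussian random weights and batch normalization, assumed here for the demo rather than taken from the papers): nearly collinear inputs become increasingly orthogonal across layers.

```python
# Toy illustration (assumed setting for this demo: a linear network with
# Gaussian random weights and batch normalization; a simplification of the
# settings analyzed in [13]-[18]): starting from nearly collinear inputs,
# the representations of distinct samples become increasingly orthogonal,
# i.e., the batch Gram matrix approaches a scaled identity, as depth grows.
import numpy as np

rng = np.random.default_rng(0)
n, d, depth = 32, 256, 60          # batch size, width, number of layers

def batch_norm(h):
    """Normalize each coordinate to zero mean and unit variance over the batch."""
    return (h - h.mean(axis=0)) / (h.std(axis=0) + 1e-8)

def mean_abs_cosine(h):
    """Mean |cosine similarity| between representations of distinct samples."""
    g = h @ h.T
    norms = np.sqrt(np.diag(g))
    c = g / np.outer(norms, norms)
    off_diag = ~np.eye(len(h), dtype=bool)
    return np.abs(c[off_diag]).mean()

# Nearly collinear inputs: a shared direction with random signs plus small noise.
shared = rng.standard_normal(d)
signs = rng.choice([-1.0, 1.0], size=(n, 1))
h = signs * shared + 0.01 * rng.standard_normal((n, d))
print(f"input: mean |cos| = {mean_abs_cosine(h):.3f}")

for layer in range(depth):
    w = rng.standard_normal((d, d)) / np.sqrt(d)   # random Gaussian weights
    h = batch_norm(h @ w)                          # one step of the Markov chain
    if (layer + 1) % 15 == 0:
        print(f"layer {layer + 1}: mean |cos| = {mean_abs_cosine(h):.3f}")
```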
Optimization Challenges in Generative AI
[Figure: optimization in generative AI]
Generative AI models are trained using highly non-convex objective functions that induce complex optimization dynamics. In this research direction, we investigate the fundamental optimization challenges that arise in generative AI. Generative adversarial networks, in particular, optimize a min–max objective using first-order methods. We introduce a notion of local optimality for min–max problems and prove that first-order dynamics can converge to stable attractors that are not locally optimal. This stands in sharp contrast to smooth optimization in supervised learning, where local optima are precisely the stable attractors of the dynamics.
See paper [11]
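The short simulation below reproduces the flavor of this phenomenon on a two-dimensional function chosen here for illustration (not necessarily the construction in [11]): simultaneous gradient descent-ascent converges to the origin, a stable attractor of the dynamics, even though the origin is not a locally optimal min–max point.

```python
# Numerical sketch on a 2-D example chosen here for illustration (see [11] for
# the precise statements): simultaneous gradient descent-ascent on f(x, y)
# converges to the origin, a stable attractor of the first-order dynamics,
# although the origin is not locally optimal: the curvature in y is positive
# there, so the max player could still improve locally.

def grad(x, y):
    """Gradients of f(x, y) = 2x^2 + y^2 + 4xy + (4/3)y^3 - (1/4)y^4."""
    gx = 4 * x + 4 * y
    gy = 2 * y + 4 * x + 4 * y ** 2 - y ** 3
    return gx, gy

x, y, step = 0.1, -0.1, 0.01
for _ in range(5000):
    gx, gy = grad(x, y)
    x, y = x - step * gx, y + step * gy   # descent in x, ascent in y

print(f"iterates converge to ({x:.4f}, {y:.4f})")   # approximately the origin
print("d2f/dy2 at the origin = 2 > 0, so (0, 0) is not a locally optimal min-max point")
```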

News

Awards

Teaching

  • Machine Learning at UVA: an undergraduate course for which we developed a series of machine learning puzzles that build understanding of the foundations of ML.
  • "Neural Networks: A Theory Lab" at UVA: A course designed to introduce the theoretical foundations of neural computing through hands-on, in-class coding exercises (website, notes).

Publications (Google Scholar)

Talks

  • Forthcoming Talk
    • "Data Generation without Function Estimation" will be presented at the NeurIPS 2025 Workshop on Optimization for Machine Learning.
  • Past
    • "Learning to Compute" presented at Seminar Series "Youth in High-Dimensions", The Abdus Salam International Centre for Theoretical Physics, 2025
    • "Understanding Test-time Inference in LLMs" presented at Conference on Parsimony and Learning, Stanford University, 2025
    • Invited to the final presentation for Vienna Research Groups for Young Investigators Grant (1.6 million euro), Austria, 2024. My sincere thanks go to Benjamin Roth and Sebastian Schuster for their excellent advice and help
    • "What makes neural networks statistically powerful, and optimizable?" presented at Extra Seminar on Artificial Intelligence of University of Groningen and The University of Edinburgh 2024 (slides)
    • "Algorithmic View on Neural Information Processing". Mathematics, Information, and Computation Seminar, New York University 2023 (slides)
    • Beyond Theoretical Mean-field Neural Networks at ISL Colloquium, Stanford University, July 23 (slides)
    • Data representation in deep random neural networks at ML tea talks, MIT, March 23 (slides)
    • The power of depth in random neural networks at Princeton University, April 22
    • Batch normalization orthogonalizes representations in deep random neural networks, spotlight at NeurIPs 21 (slides)
    • Representations in Random Deep Neural Networks at INRIA 21 (slides)
    • Escaping Saddles with Stochastic Gradients at ICML 18 (slides)

Academic Services

  • Area chair for
    • Conference on Neural Information Processing Systems 2023, 2024, and 2025
    • International Conference on Machine Learning 2025
  • Other service roles
    • Session chair at INFORMS/IOS 2024
    • Session chair at NeurIPS 2023
    • Talk reviewer for NeurIPS 2023
    • Member of the organizing team for the ICLR 2024 Workshop on Bridging the Gap Between Practice and Theory in Deep Learning, led by Jingzhao Zhang
    • Member of the organizing team for the TILOS & OPTML++ seminars at MIT, 2023
  • Reviewer for ICML, NeurIPS, ICLR, and AISTATS
  • Journal reviewer for Data Mining and Knowledge Discovery, Neurocomputing, TPAMI, and TSIPN

Research Group

Mentorship

I am privileged to work with excellent students who are potential future leaders of machine learning research.
  • Amir Joudaki, Ph.D. student at ETH Zurich (20-24), admitted to a postdoctoral position at the Broad Institute
  • Alexandru Meterez, Master's student at ETH (22-23), joined Harvard University for a Ph.D.
  • Alec Massimo Flowers, Master's student at ETH (23), joined NVIDIA
  • Jonas Kohler, Ph.D. student at ETH (18-20), joined Meta
  • Antonio Orvieto, Ph.D. student at ETH Zurich (20-21)
  • Peiyuan Zhang, Master's student at ETH (19-21), joined Yale University for a Ph.D.
  • Leonard Adolphs, Master's student at ETH (18-19), joined ETH Zurich for a Ph.D.
  • Alexandre Bense, Master's student at ETH (22)
  • Alireza Amani, intern at ETH (18), joined London Business School for a Ph.D.