Hadi Daneshmand
Assistant Professor of Computer Science
Contact
503 UVA Rice Hall
85 Engineer's Way, Charlottesville, VA 22903
Email: dhadiATvirginiaDOTedu
I am an Assistant Professor of Computer Science at the University of Virginia. My research focuses on the theoretical foundations of machine learning, with an emphasis on the explainability and reliability of generative AI, using tools from probability theory, applied mathematics, and mathematical physics.
Prior to joining UVA, I was a FODSI postdoctoral researcher hosted by MIT and Boston University. Before that, I was a postdoctoral researcher at Princeton University and at INRIA Paris. I completed my PhD in computer science at ETH Zurich in 2020. I had the privilege of being advised early in my research training by Professors Francis Bach and Thomas Hofmann.
Research
Foundations of Generative AI
In a world increasingly shaped by generative AI, it is crucial to understand
the underlying mechanisms that govern these models in order to rigorously
characterize their strengths and limitations. My research focuses on the theoretical foundations of generative AI:
analyzing its mechanisms, formulating challenges for reliable and efficient data generation,
and designing new generative methods that are mathematically tractable and provably reliable.
Analyzable Generative Models Using Mathematical Physics
We use tools from mathematical physics to design novel generative methods that can be analyzed rigorously in asymptotic regimes. These methods provably generate samples from the underlying distribution under weak assumptions, without noise injection or neural networks; instead, they are implemented with first-order optimization. The figure illustrates how interacting particles evolve to generate data from the Swiss roll dataset.
See [22]
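For intuition only, here is a minimal numerical sketch in the same spirit, not the method of [22]: a cloud of particles follows plain gradient descent on a kernel discrepancy (squared MMD with a Gaussian kernel) toward a two-dimensional Swiss roll. The swiss_roll and mmd_grad helpers, the kernel bandwidth, and the step size are illustrative choices rather than details taken from the paper.

import numpy as np

rng = np.random.default_rng(0)

def swiss_roll(n):
    # Two-dimensional Swiss roll: the radius grows with the angle
    t = 1.5 * np.pi * (1.0 + 2.0 * rng.random(n))
    return np.stack([t * np.cos(t), t * np.sin(t)], axis=1) / 10.0

def mmd_grad(particles, data, sigma=0.5):
    # Gradient (up to a positive constant) of the squared MMD with a Gaussian kernel
    def pair(a, b):
        diff = a[:, None, :] - b[None, :, :]
        k = np.exp(-(diff ** 2).sum(-1) / (2.0 * sigma ** 2))
        return (k[:, :, None] * diff).mean(axis=1)
    return (pair(particles, data) - pair(particles, particles)) * 2.0 / sigma ** 2

data = swiss_roll(500)
particles = rng.normal(size=(200, 2))              # start from a Gaussian cloud
for _ in range(3000):                              # deterministic first-order dynamics
    particles -= 0.1 * mmd_grad(particles, data)   # particles drift toward the data

The dynamics use no noise injection and no neural network: the particles interact through the kernel, repelling each other while being attracted to the data, and the cloud gradually spreads along the Swiss roll.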
In-Context Learning and Mechanistic Analysis of Large Language Models
This line of work examines how large language models internally generate data. By analyzing their attention patterns and internal representations, we aim to uncover mechanistic explanations for how transformers perform tasks such as sorting, token alignment, reinforcement learning, and regression without changing their parameters. Our goal is to move beyond black-box performance and develop a mathematically grounded understanding of what these models compute and when they fail.
The figure shows the attention heat map for token alignment across layers of a large language model.
See [23], [21], [20] and [19]
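As a concrete illustration of the kind of measurement involved, not the experiments in these papers, the sketch below reads out per-layer attention heat maps from a public checkpoint. It assumes the Hugging Face transformers library and the gpt2 model, both of which are illustrative choices.

import torch
from transformers import AutoModel, AutoTokenizer

# Load a small public checkpoint and ask it to return attention weights
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_attentions=True)

inputs = tok("3 1 2 -> 1 2 3", return_tensors="pt")   # a toy sorting-style prompt
with torch.no_grad():
    out = model(**inputs)

# out.attentions holds one (batch, heads, seq, seq) tensor per layer;
# averaging over heads gives a per-layer attention heat map over the tokens.
heat_maps = [a[0].mean(dim=0) for a in out.attentions]
print(len(heat_maps), heat_maps[0].shape)

Inspecting how such maps change across layers, and which tokens attend to which, is the starting point for the mechanistic analyses described above.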
Markov Chain Analysis of Random Deep Neural Networks
The intermediate data representations in deep neural networks form a Markov chain at initialization. In this project, we apply tools from Markov chain theory to analyze this chain. This perspective provides a principled explanation of a key architectural component in deep neural networks and large language models: normalization layers. By establishing a stochastic stability analysis, we show that normalization inherently biases representations toward being whitened across layers. This finding helps explain why normalization plays a crucial role in improving optimization and training stability in foundation models.
See papers [18], [17], [16], [14] and [13]
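The following is a small numerical sketch of this effect, not the analysis in these papers: a deep network with random Gaussian weights and ReLU activations is run on a batch of inputs, with and without batch normalization, and the deviation of the trace-normalized Gram matrix of representations from the identity is recorded. The isometry_gap diagnostic and all sizes are illustrative choices.

import numpy as np

rng = np.random.default_rng(0)
n, d, depth = 64, 256, 50                 # batch size, width, number of layers

def isometry_gap(h):
    # Distance of the trace-normalized Gram matrix of representations from identity
    g = h @ h.T
    g = g / np.trace(g) * n
    return np.linalg.norm(g - np.eye(n)) / n

x = rng.normal(size=(n, d))
h_plain, h_bn = x.copy(), x.copy()
for _ in range(depth):
    w = rng.normal(size=(d, d)) / np.sqrt(d)              # random Gaussian layer
    h_plain = np.maximum(h_plain @ w, 0.0)                # ReLU, no normalization
    h_bn = np.maximum(h_bn @ w, 0.0)
    h_bn = (h_bn - h_bn.mean(0)) / (h_bn.std(0) + 1e-6)   # batch normalization

# Without normalization the representations tend to collapse toward a rank-one
# structure; with batch normalization the Gram matrix stays closer to identity.
print(isometry_gap(h_plain), isometry_gap(h_bn))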
Optimization Challenges in Generative AI
Generative AI models are trained using highly non-convex objective functions that induce complex optimization dynamics. In this research direction, we investigate the fundamental optimization challenges that arise in generative AI. Generative adversarial networks, in particular, optimize a min–max objective using first-order methods. We introduce the notion of local optimization for min–max problems and prove that first-order dynamics can converge to stable attractors that are not locally optimal. This stands in sharp contrast to smooth optimization in supervised learning, where local optima are precisely the stable attractors of the dynamics.
See paper [11]
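For concreteness, here is a tiny toy example of the phenomenon, a generic quadratic rather than the construction in [11]: simultaneous gradient descent-ascent converges to the origin even though the objective is concave in x there, so the origin is not a local min-max point.

# Toy min-max objective f(x, y) = -x**2 / 2 + 2 * x * y - 3 * y**2 / 2.
# At the origin f_xx = -1 < 0, so (0, 0) is not a local minimum in x
# (hence not locally optimal for min-max), yet it attracts the dynamics below.
def grad(x, y):
    return -x + 2.0 * y, 2.0 * x - 3.0 * y   # (df/dx, df/dy)

x, y, lr = 1.0, 1.0, 0.05
for _ in range(2000):
    gx, gy = grad(x, y)
    x, y = x - lr * gx, y + lr * gy          # simultaneous gradient descent-ascent

print(x, y)                                  # numerically converges to (0, 0)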
News
Awards
- Research
- Service: reviewer awards from ICML 22, NeurIPS 20, and ICML 19
Teaching
- Machine Learning at UVA: an undergraduate course for which we developed a series of machine learning puzzles that build understanding of the foundations of ML
- "Neural Networks: A Theory Lab" at UVA: a course designed to introduce the theoretical foundations of neural computing through hands-on, in-class coding exercises (website, notes).
- Foundations of Generative AI
Publications
- Bridging the gap between theory and practice in deep learning (slides)
- [18] Towards Training Without Depth Limits: Batch Normalization Without Gradient Explosion with Alexandru Meterez, Amir Joudaki, Francesco Orabona, Alexander Immer, Gunnar Rätsch. ICLR 24
- [17] On the impact of activation and normalization in obtaining isometric embeddings at initialization with Amir Joudaki and Francis Bach. NeurIPS 23
- [16] On bridging the gap between mean field and finite width in deep random neural networks with batch normalization with Amir Joudaki and Francis Bach. ICML 23 (code, poster)
- [15] Efficient displacement convex optimization with particle gradient descent with Jason D. Lee and Chi Jin. ICML 23 (code, poster)
- Data representations in deep random neural networks (slides)
- [14] Batch normalization orthogonalizes representations in deep random networks with Amir Joudaki and Francis Bach. (spotlight) NeurIPS 21 (code, poster)
- [13] Batch normalization provably avoids rank collapse for randomly initialized deep networks with Jonas Kohler, Francis Bach, Thomas Hofmann, Aurelien Lucchi. NeurIPS 20 (code, poster)
- Learning dynamics in deep neural networks
- [12] Exponential convergence rates for batch normalization: The power of length-direction decoupling in non-convex optimization with Jonas Kohler, Aurelien Lucchi, Thomas Hofmann, Ming Zhou and Klaus Neymeyr. AISTATS 19
- [11] Local saddle point optimization: A curvature exploitation approach with Leonard Adolphs, Aurelien Lucchi and Thomas Hofmann. AISTATS 19 (poster)
- [10] Polynomial-time Sparse Measure Recovery: From Mean Field Theory to Algorithm Design with Francis Bach, arXiv 2022
- Modeling accelerated optimization methods by ordinary differential equations
- [9] Rethinking the Variational Interpretation of Accelerated Optimization Methods with Peiyuan Zhang, and Antonio Orvieto. NeurIPS 21 (poster)
- [8] Revisiting the role of Euler numerical integration on acceleration and stability in convex optimization with Peiyuan Zhang, Antonio Orvieto, Thomas Hofmann and Roy S Smith. AISTATS 21 (poster)
- Stochastic optimization for machine learning
- [7] Escaping saddles with stochastic gradients with Jonas Kohler, Aurelien Lucchi and Thomas Hofmann. (long presentation) ICML 18 (slides)
- [6] Adaptive Newton method for empirical risk minimization to statistical accuracy with Aryan Mokhtari, Hadi Daneshmand, Aurelien Lucchi, Thomas Hofmann and Alejandro Ribeiro. NeurIPS 16
- [5] Starting small-learning with adaptive sample sizes with Aurelien Lucchi and Thomas Hofmann. ICML 16
- Inference of graphs from diffusion processes
- [4] Estimating diffusion network structures: Recovery conditions, sample complexity & soft-thresholding algorithm with Manuel Gomez-Rodriguez, Le Song and Bernhard Schoelkopf. ICML 14 (Recommended to JMLR)
- [3] Estimating diffusion networks: Recovery conditions, sample complexity & soft-thresholding algorithm with Manuel Gomez-Rodriguez, Le Song and Bernhard Schoelkopf. JMLR 16
- [2] A Time-Aware Recommender System Based on Dependency Network of Items with Amin Javari, Seyed Ebrahim Abtahi and Mahdi Jalili. The Computer Journal 14
- [1] Inferring causal molecular networks: empirical assessment through a community-based effort with Steven M Hill, Laura M Heiser et al. Nature Methods 16
Talks
- Forthcoming Talk
- "Data Generation without Function Estimation" will be presented at the NeurIPS 2025 Workshop on Optimization for Machine Learning.
- Past
- "Learning to Compute" presented at Seminar Series "Youth in High-Dimensions", The Abdus Salam International Centre for Theoretical Physics, 2025
- "Understanding Test-time Inference in LLMs" presented at Conference on Parsimony and Learning, Stanford University, 2025
- Invited to the final presentation for Vienna Research Groups for Young Investigators Grant (1.6 million euro), Austria, 2024. My sincere thanks go to Benjamin Roth and Sebastian Schuster for their excellent advice and help
- "What makes neural networks statistically powerful, and optimizable?" presented at Extra Seminar on Artificial Intelligence of University of Groningen and The University of Edinburgh 2024 (slides)
- "Algorithmic View on Neural Information Processing". Mathematics, Information, and Computation Seminar, New York University 2023 (slides)
- Beyond Theoretical Mean-field Neural Networks at ISL Colloquium, Stanford University, July 23 (slides)
- Data representation in deep random neural networks at ML tea talks, MIT, March 23 (slides)
- The power of depth in random neural networks at Princeton University, April 22
- Batch normalization orthogonalizes representations in deep random neural networks, spotlight at NeurIPS 21 (slides)
- Representations in Random Deep Neural Networks at INRIA 21 (slides)
- Escaping Saddles with Stochastic Gradients at ICML 18 (slides)
Academic Services
- Area chair for
- Conference on Neural Information Processing Systems 2023, 2024, and 2025
- International Conference on Machine Learning 2025
- A member of the organizing team for
- Session chair at INFORMS/IOS 24
- Session chair at NeurIPS 23
- Reviewing talks for NeurIPS 23
- ICLR 24 Workshop on Bridging the Gap Between Practice and Theory in Deep Learning led by Jingzhao Zhang
- TILOS & OPTML++ seminars at MIT, 2023
- Reviewer for ICML, NeurIPS, ICLR and AISTATS
- Journal reviewer for Data Mining and Knowledge Discovery, Neurocomputing, TPAMI, and TSIPN
I am privileged to work with excellent students who are potential future leaders of machine learning research.
- Amir Joudaki, Ph.D. student at ETH Zurich (20-24), admitted to a postdoctoral position at the Broad Institute
- Alexandru Meterez, Master's student at ETH (22-23), joined Harvard University for Ph.D.
- Alec Massimo Flowers, Master's student at ETH (23), joined NVIDIA
- Jonas Kohler, former Ph.D. student at ETH (18-20), joined Meta
- Antonio Orvieto, Ph.D. student at ETH Zurich (20-21)
- Peiyuan Zhang, Master's student at ETH (19-21), joined Yale University for Ph.D.
- Leonard Adolphs, Master's student at ETH (18-19), joined ETH Zurich for Ph.D.
- Alexandre Bense, Master's student at ETH (22)
- Alireza Amani, Intern at ETH (18), joined London Business School for Ph.D.