Hadi Daneshmand
Assistant Professor of Computer Science
Contact
503 UVA Rice Hall
85 Engineer's Way, Charlottesville, VA 22903
Email: dhadiATvirginiaDOTedu
I am an Assistant Professor of Computer Science at the University of Virginia. My research focuses on the theoretical foundations of machine learning, with an emphasis on the explainability and reliability of generative AI, using tools from probability theory, applied mathematics, and mathematical physics.
Prior to joining UVA, I was a FODSI postdoctoral researcher hosted by MIT and Boston University. Before that, I was a postdoctoral researcher at Princeton University and at INRIA Paris. I completed my PhD in computer science at ETH Zurich in 2020. I had the privilege of being advised early in my research training by Professors Francis Bach and Thomas Hofmann.
Research
Foundations of Generative AI
In a world increasingly shaped by generative AI, it is crucial to understand
the underlying mechanisms that govern these models in order to rigorously
characterize their strengths and limitations. My research focuses on the theoretical foundations of generative AI:
analyzing its mechanisms, formulating challenges for reliable and efficient data generation,
and designing new generative methods that are mathematically tractable and provably reliable.
Analyzable Generative Models Using Mathematical Physics
We use tools from mathematical physics to design novel generative methods that can be analyzed rigorously in asymptotic regimes. These methods provably generate samples from the underlying distribution under weak assumptions, without noise injection or neural networks; instead, they are implemented with first-order optimization. The figure illustrates how interacting particles evolve to generate data from the Swiss roll dataset.
See [22]
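For intuition only, here is a minimal numerical sketch in the same spirit, not the method of [22]: a cloud of particles follows plain gradient descent on a kernel discrepancy (squared MMD with a Gaussian kernel) toward a two-dimensional Swiss roll. The swiss_roll and mmd_grad helpers, the kernel bandwidth, and the step size are illustrative choices rather than details taken from the paper.

import numpy as np

rng = np.random.default_rng(0)

def swiss_roll(n):
    # Two-dimensional Swiss roll: the radius grows with the angle
    t = 1.5 * np.pi * (1.0 + 2.0 * rng.random(n))
    return np.stack([t * np.cos(t), t * np.sin(t)], axis=1) / 10.0

def mmd_grad(particles, data, sigma=0.5):
    # Gradient (up to a positive constant) of the squared MMD with a Gaussian kernel
    def pair(a, b):
        diff = a[:, None, :] - b[None, :, :]
        k = np.exp(-(diff ** 2).sum(-1) / (2.0 * sigma ** 2))
        return (k[:, :, None] * diff).mean(axis=1)
    return (pair(particles, data) - pair(particles, particles)) * 2.0 / sigma ** 2

data = swiss_roll(500)
particles = rng.normal(size=(200, 2))              # start from a Gaussian cloud
for _ in range(3000):                              # deterministic first-order dynamics
    particles -= 0.1 * mmd_grad(particles, data)   # particles drift toward the data

The dynamics use no noise injection and no neural network: the particles interact through the kernel, repelling each other while being attracted to the data, and the cloud gradually spreads along the Swiss roll.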
In-Context Learning and Mechanistic Analysis of Large Language Models
This line of work examines how large language models internally generate data. By analyzing their attention patterns and internal representations, we aim to uncover mechanistic explanations for how transformers perform tasks such as sorting, token alignment, reinforcement learning, and regression without changing their parameters. Our goal is to move beyond black-box performance and develop a mathematically grounded understanding of what these models compute and when they fail.
The figure shows the attention heat map for token alignment across layers of a large language model.
See [23], [21], [20] and [19]
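As a concrete illustration of the kind of measurement involved, not the experiments in these papers, the sketch below reads out per-layer attention heat maps from a public checkpoint. It assumes the Hugging Face transformers library and the gpt2 model, both of which are illustrative choices.

import torch
from transformers import AutoModel, AutoTokenizer

# Load a small public checkpoint and ask it to return attention weights
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_attentions=True)

inputs = tok("3 1 2 -> 1 2 3", return_tensors="pt")   # a toy sorting-style prompt
with torch.no_grad():
    out = model(**inputs)

# out.attentions holds one (batch, heads, seq, seq) tensor per layer;
# averaging over heads gives a per-layer attention heat map over the tokens.
heat_maps = [a[0].mean(dim=0) for a in out.attentions]
print(len(heat_maps), heat_maps[0].shape)

Inspecting how such maps change across layers, and which tokens attend to which, is the starting point for the mechanistic analyses described above.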
Markov Chain Analysis of Random Deep Neural Networks
The intermediate data representations in deep neural networks form a Markov chain at initialization. In this project, we apply tools from Markov chain theory to analyze this chain. This perspective provides a principled explanation of a key architectural component in deep neural networks and large language models: normalization layers. By establishing a stochastic stability analysis, we show that normalization inherently biases representations toward being whitened across layers. This finding helps explain why normalization plays a crucial role in improving optimization and training stability in foundation models.
See papers [18], [17], [16], [14] and [13]
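The following is a small numerical sketch of this effect, not the analysis in these papers: a deep network with random Gaussian weights and ReLU activations is run on a batch of inputs, with and without batch normalization, and the deviation of the trace-normalized Gram matrix of representations from the identity is recorded. The isometry_gap diagnostic and all sizes are illustrative choices.

import numpy as np

rng = np.random.default_rng(0)
n, d, depth = 64, 256, 50                 # batch size, width, number of layers

def isometry_gap(h):
    # Distance of the trace-normalized Gram matrix of representations from identity
    g = h @ h.T
    g = g / np.trace(g) * n
    return np.linalg.norm(g - np.eye(n)) / n

x = rng.normal(size=(n, d))
h_plain, h_bn = x.copy(), x.copy()
for _ in range(depth):
    w = rng.normal(size=(d, d)) / np.sqrt(d)              # random Gaussian layer
    h_plain = np.maximum(h_plain @ w, 0.0)                # ReLU, no normalization
    h_bn = np.maximum(h_bn @ w, 0.0)
    h_bn = (h_bn - h_bn.mean(0)) / (h_bn.std(0) + 1e-6)   # batch normalization

# Without normalization the representations tend to collapse toward a rank-one
# structure; with batch normalization the Gram matrix stays closer to identity.
print(isometry_gap(h_plain), isometry_gap(h_bn))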
Optimization Challenges in Generative AI
Generative AI models are trained using highly non-convex objective functions that induce complex optimization dynamics. In this research direction, we investigate the fundamental optimization challenges that arise in generative AI. Generative adversarial networks, in particular, optimize a min–max objective using first-order methods. We introduce the notion of local optimization for min–max problems and prove that first-order dynamics can converge to stable attractors that are not locally optimal. This stands in sharp contrast to smooth optimization in supervised learning, where local optima are precisely the stable attractors of the dynamics.
See paper [11]
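For concreteness, here is a tiny toy example of the phenomenon, a generic quadratic rather than the construction in [11]: simultaneous gradient descent-ascent converges to the origin even though the objective is concave in x there, so the origin is not a local min-max point.

# Toy min-max objective f(x, y) = -x**2 / 2 + 2 * x * y - 3 * y**2 / 2.
# At the origin f_xx = -1 < 0, so (0, 0) is not a local minimum in x
# (hence not locally optimal for min-max), yet it attracts the dynamics below.
def grad(x, y):
    return -x + 2.0 * y, 2.0 * x - 3.0 * y   # (df/dx, df/dy)

x, y, lr = 1.0, 1.0, 0.05
for _ in range(2000):
    gx, gy = grad(x, y)
    x, y = x - lr * gx, y + lr * gy          # simultaneous gradient descent-ascent

print(x, y)                                  # numerically converges to (0, 0)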
News
Awards
- Research
- Service: reviewer awards from ICML 22, NeurIPS 20, and ICML 19
Teaching
- Machine Learning at UVA: an undergraduate course for which we developed a series of machine learning puzzles that build understanding of the foundations of ML
- "Neural Networks: A Theory Lab" at UVA: a course designed to introduce the theoretical foundations of neural computing through hands-on, in-class coding exercises (website, notes).
- Foundations of Generative AI
Publications
- Bridging the gap between theory and practice in deep learning (slides)
- [18] Towards Training Without Depth Limits: Batch Normalization Without Gradient Explosion with Alexandru Meterez, Amir Joudaki, Francesco Orabona, Alexander Immer, Gunnar Rätsch. ICLR 24
- [17] On the impact of activation and normalization in obtaining isometric embeddings at initialization with Amir Joudaki and Francis Bach. NeurIPS 23
- [16] On bridging the gap between mean field and finite width in deep random neural networks with batch normalization with Amir Joudaki and Francis Bach. ICML 23 (code, poster)
- [15] Efficient displacement convex optimization with particle gradient descent with Jason D. Lee and Chi Jin. ICML 23 (code, poster)
- Data representations in deep random neural networks (slides)
- [14] Batch normalization orthogonalizes representations in deep random networks with Amir Joudaki and Francis Bach. (spotlight) NeurIPS 21 (code, poster)
- [13] Batch normalization provably avoids rank collapse for randomly initialized deep networks with Jonas Kohler, Francis Bach, Thomas Hofmann, Aurelien Lucchi. NeurIPS 20 (code, poster)
- Learning dynamics in deep neural networks
- [12] Exponential convergence rates for batch normalization: The power of length-direction decoupling in non-convex optimization with Jonas Kohler, Aurelien Lucchi, Thomas Hofmann, Ming Zhou and Klaus Neymeyr. AISTATS 19
- [11] Local saddle point optimization: A curvature exploitation approach with Leonard Adolphs, Aurelien Lucchi and Thomas Hofmann. AISTATS 19 (poster)
- [10] Polynomial-time Sparse Measure Recovery: From Mean Field Theory to Algorithm Design with Francis Bach, arXiv 2022
- Modeling accelerated optimization methods by ordinary differential equations
- [9] Rethinking the Variational Interpretation of Accelerated Optimization Methods with Peiyuan Zhang, and Antonio Orvieto. NeurIPS 21 (poster)
- [8] Revisiting the role of Euler numerical integration on acceleration and stability in convex optimization with Peiyuan Zhang, Antonio Orvieto, Thomas Hofmann and Roy S Smith. AISTATS 21 (poster)
- Stochastic optimization for machine learning
- [7] Escaping saddles with stochastic gradients with Jonas Kohler, Aurelien Lucchi and Thomas Hofmann. (long presentation) ICML 18 (slides)
- [6] Adaptive Newton method for empirical risk minimization to statistical accuracy with Aryan Mokhtari, Hadi Daneshmand, Aurelien Lucchi, Thomas Hofmann and Alejandro Ribeiro. NeurIPS 16
- [5] Starting small-learning with adaptive sample sizes with Aurelien Lucchi and Thomas Hofmann. ICML 16
- Inference of graphs from diffusion processes
- [4] Estimating diffusion network structures: Recovery conditions, sample complexity & soft-thresholding algorithm with Manuel Gomez-Rodriguez, Le Song and Bernhard Schoelkopf. ICML 14 (Recommended to JMLR)
- [3] Estimating diffusion networks: Recovery conditions, sample complexity & soft-thresholding algorithm with Manuel Gomez-Rodriguez, Le Song and Bernhard Schoelkopf. JMLR 16
- [2] A Time-Aware Recommender System Based on Dependency Network of Items with Amin Javari, Seyed Ebrahim Abtahi and Mahdi Jalili. The Computer Journal 14
- [1] Inferring causal molecular networks: empirical assessment through a community-based effort with Steven M Hill, Laura M Heiser et al. Nature Methods 16
Talks
- Forthcoming Talk
- "Data Generation without Function Estimation" will be presented at the NeurIPS 2025 Workshop on Optimization for Machine Learning.
- Past
- "Learning to Compute" presented at Seminar Series "Youth in High-Dimensions", The Abdus Salam International Centre for Theoretical Physics, 2025
- "Understanding Test-time Inference in LLMs" presented at Conference on Parsimony and Learning, Stanford University, 2025
- Invited to the final presentation for Vienna Research Groups for Young Investigators Grant (1.6 million euro), Austria, 2024. My sincere thanks go to Benjamin Roth and Sebastian Schuster for their excellent advice and help
- "What makes neural networks statistically powerful, and optimizable?" presented at Extra Seminar on Artificial Intelligence of University of Groningen and The University of Edinburgh 2024 (slides)
- "Algorithmic View on Neural Information Processing". Mathematics, Information, and Computation Seminar, New York University 2023 (slides)
- Beyond Theoretical Mean-field Neural Networks at ISL Colloquium, Stanford University, July 23 (slides)
- Data representation in deep random neural networks at ML tea talks, MIT, March 23 (slides)
- The power of depth in random neural networks at Princeton University, April 22
- Batch normalization orthogonalizes representations in deep random neural networks, spotlight at NeurIPS 21 (slides)
- Representations in Random Deep Neural Networks at INRIA 21 (slides)
- Escaping Saddles with Stochastic Gradients at ICML 18 (slides)
Academic Services
- Area chair for
- Conference on Neural Information Processing Systems 2023, 2024, and 2025
- International Conference on Machine Learning 2025
- A member of the organizing team for
- Session chair at INFORMS/IOS 24
- Session chair at NeurIPS 23
- Reviewing talks for NeurIPS 23
- ICLR 24 Workshop on Bridging the Gap Between Practice and Theory in Deep Learning led by Jingzhao Zhang
- TILOS & OPTML++ seminars at MIT, 2023
- Reviewer for ICML, NeurIPS, ICLR and AISTATS
- Journal reviewer for Data Mining and Knowledge Discovery, Neurocomputing, TPAMI, and TSIPN
I am privileged to work with excellent students who are potential future leaders of machine learning research.
- Amir Joudaki, Ph.D. student at ETH Zurich (20-24), admitted to a postdoctoral position at the Broad Institute
- Alexandru Meterez, Master's student at ETH (22-23), joined Harvard University for Ph.D.
- Alec Massimo Flowers, Master's student at ETH (23), joined NVIDIA
- Jonas Kohler, former Ph.D. student at ETH (18-20), joined Meta
- Antonio Orvieto, Ph.D. student at ETH Zurich (20-21)
- Peiyuan Zhang, Master's student at ETH (19-21), joined Yale University for Ph.D.
- Leonard Adolphs, Master's student at ETH (18-19), joined ETH Zurich for Ph.D.
- Alexandre Bense, Master's student at ETH (22)
- Alireza Amani, Intern at ETH (18), joined London Business School for Ph.D.