Hadi Daneshmand
Postdoctoral Researcher at the Foundations of Data Science Institute (FODSI)
Contact
MIT D32-588
Cambridge, USA
Boston University, 665 Commonwealth Ave.
hdanesh at mit dot edu
I am a postdoctoral researcher at FODSI, hosted by MIT and Boston University. Before that, I was an SNSF postdoctoral researcher at Princeton University and INRIA Paris. I completed my PhD in computer science at ETH Zurich.
Research
I am developing theoretical guarantees for deep neural networks and studying their mechanisms. While deep nets are often viewed as parametric statistical models, I study neural networks from a computational perspective, linking their feature extraction to continuous optimization methods. Using this approach, I have established the following results:
- Language models can provably solve optimal transport and can therefore sort lists of arbitrary length, up to an approximation error.
- Key building blocks of neural networks, known as normalization layers, inherently whiten data (see the first sketch below).
- Language models can solve regression and evaluation problems "in context" by simulating gradient descent (see the second sketch below).
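Below is a minimal NumPy sketch of the whitening/orthogonalization effect. It is a toy illustration under my own simplified assumptions (a purely linear network with batch normalization and Gaussian weights), not code from the papers.

```python
# Toy sketch (my own simplified setup, not code from the papers):
# a deep random linear network with batch normalization progressively
# decorrelates a batch of strongly correlated inputs.
import numpy as np

rng = np.random.default_rng(0)
n, d, depth = 16, 256, 50          # batch size, width, number of layers

def batch_norm(h):
    # normalize each feature (column) to zero mean and unit variance over the batch
    return (h - h.mean(axis=0)) / (h.std(axis=0) + 1e-8)

def mean_abs_cosine(h):
    # average |cosine similarity| over all pairs of distinct samples
    hn = h / np.linalg.norm(h, axis=1, keepdims=True)
    gram = hn @ hn.T
    return np.abs(gram[~np.eye(n, dtype=bool)]).mean()

# inputs that share one dominant direction, hence are highly correlated
u = rng.normal(size=(1, d))
h = rng.normal(size=(n, 1)) @ u + 0.3 * rng.normal(size=(n, d))
print(f"similarity of inputs:         {mean_abs_cosine(h):.3f}")

for _ in range(depth):
    W = rng.normal(size=(d, d)) / np.sqrt(d)   # random Gaussian weights
    h = batch_norm(h @ W)
print(f"similarity after {depth} layers: {mean_abs_cosine(h):.3f}")
```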
These slides present an overview of my research up to February 2024.
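A second minimal sketch illustrates the in-context gradient descent picture. It is my own toy construction in the spirit of [19], not the paper's released code: a single linear-attention head that uses the context inputs as keys and the labels as values reproduces exactly one gradient-descent step, from zero initialization, on the in-context least-squares loss.

```python
# Toy construction (my own setup, not released code): one linear-attention head
# matches the prediction of a single gradient-descent step on the in-context
# least-squares loss, started from the zero weight vector.
import numpy as np

rng = np.random.default_rng(1)
n, d, eta = 16, 4, 0.1                      # context length, dimension, step size

w_star = rng.normal(size=d)                 # ground-truth linear map
X = rng.normal(size=(n, d))                 # in-context inputs x_1, ..., x_n
y = X @ w_star                              # in-context labels y_i = <w*, x_i>
x_q = rng.normal(size=d)                    # query point

# one gradient step on L(w) = 1/(2n) * sum_i (<w, x_i> - y_i)^2 from w = 0
w_one_step = (eta / n) * X.T @ y
pred_gd = w_one_step @ x_q

# linear attention: context inputs as keys, query input as query, labels as values
scores = X @ x_q                            # <x_i, x_q> for every context example
pred_attention = (eta / n) * scores @ y     # attention output at the query token

print(pred_gd, pred_attention)              # identical up to floating-point error
```

Stacking such attention layers corresponds to taking further (preconditioned) gradient steps, which is the regime analyzed in [19].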
Awards
- Research
- Service
  - Reviewer awards for my service to ICML 22, NeurIPS 20, and ICML 19
Publications
- Understanding Large Language Models
- [21] Provable optimal transport with transformers: The essence of depth and prompt engineering. arXiv 2024 (code)
- [20] Transformers Learn Temporal Difference Methods for In-Context Reinforcement Learning with Jiuqi Wang, Ethan Blaser, and Shangtong Zhang. ICML workshop on In-context Learning (spotlight award)
- [19] Transformers learn to implement preconditioned gradient descent for in-context learning with Kwangjun Ahn, Xiang Cheng, and Suvrit Sra. NeurIPS 23 (slides)
- Bridging the gap between theory and practice in deep learning (slides)
- [18] Towards Training Without Depth Limits: Batch Normalization Without Gradient Explosion with Alexandru Meterez, Amir Joudaki, Francesco Orabona, Alexander Immer, and Gunnar Rätsch. ICLR 24
- [17] On the impact of activation and normalization in obtaining isometric embeddings at initialization with Amir Joudaki and Francis Bach. NeurIPS 23
- [16] On bridging the gap between mean field and finite width in deep random neural networks with batch normalization with Amir Joudaki and Francis Bach. ICML 23 (code, poster)
- [15] Efficient displacement convex optimization with particle gradient descent with Jason D. Lee and Chi Jin. ICML 23 (code, poster)
- Data representations in deep random neural networks (slides)
- [14] Batch normalization orthogonalizes representations in deep random networks with Amir Joudaki and Francis Bach. NeurIPS 21 (spotlight) (code, poster)
- [13] Batch normalization provably avoids ranks collapse for randomly initialised deep networks with Jonas Kohler, Francis Bach, Thomas Hofmann, and Aurelien Lucchi. NeurIPS 20 (code, poster)
- Learning dynamics in deep neural networks
- [12] Exponential convergence rates for batch normalization: The power of length-direction decoupling in non-convex optimization with Jonas Kohler, Aurelien Lucchi, Thomas Hofmann, Ming Zhou and Klaus Neymeyr. AISTATS 19
- [11] Local saddle point optimization: A curvature exploitation approach with Leonard Adolphs, Aurelien Lucchi and Thomas Hofmann. AISTATS 19 (poster)
- [10] Polynomial-time Sparse Measure Recovery: From Mean Field Theory to Algorithm Design with Francis Bach. arXiv 2022
- Modeling accelerated optimization methods by ordinary differential equations
- [9] Rethinking the Variational Interpretation of Accelerated Optimization Methods with Peiyuan Zhang and Antonio Orvieto. NeurIPS 21 (poster)
- [8] Revisiting the role of Euler numerical integration on acceleration and stability in convex optimization with Peiyuan Zhang, Antonio Orvieto, Thomas Hofmann, and Roy S. Smith. AISTATS 21 (poster)
- Stochastic optimization for machine learning
- [7] Escaping saddles with stochastic gradients with Jonas Kohler, Aurelien Lucchi, and Thomas Hofmann. ICML 18 (long presentation, slides)
- [6] Adaptive Newton method for empirical risk minimization to statistical accuracy with Aryan Mokhtari, Aurelien Lucchi, Thomas Hofmann, and Alejandro Ribeiro. NeurIPS 16
- [5] Starting small - learning with adaptive sample sizes with Aurelien Lucchi and Thomas Hofmann. ICML 16
- Inference of graphs from diffusion processes
- [4] Estimating diffusion network structures: Recovery conditions, sample complexity & soft-thresholding algorithm with Manuel Gomez-Rodriguez, Le Song, and Bernhard Schoelkopf. ICML 14 (recommended to JMLR)
- [3] Estimating diffusion networks: Recovery conditions, sample complexity & soft-thresholding algorithm with Manuel Gomez-Rodriguez, Le Song and Bernhard Schoelkopf. JMLR 16
- [2] A Time-Aware Recommender System Based on Dependency Network of Items with Amin Javari, Seyed Ebrahim Abtahi and Mahdi Jalili. The Computer Journal 14
- [1] Inferring causal molecular networks: empirical assessment through a community-based effort with Steven M. Hill, Laura M. Heiser, et al. Nature Methods
Talks
- Invited to give the final presentation for the Vienna Research Groups for Young Investigators grant (1.6 million euros), Austria, 2024. My sincere thanks go to Benjamin Roth and Sebastian Schuster for their excellent advice and help.
- What makes neural networks statistically powerful and optimizable? Extra Seminar on Artificial Intelligence, University of Groningen, and The University of Edinburgh, 2024 (slides)
- Algorithmic View on Neural Information Processing. Mathematics, Information, and Computation Seminar, New York University 2023 (slides)
- Beyond Theoretical Mean-field Neural Networks at ISL Colloquium, Stanford University, July 23 (slides)
- Data representation in deep random neural networks at ML tea talks, MIT, March 23 (slides)
- The power of depth in random neural networks at Princeton University, April 22
- Batch normalization orthogonalizes representations in deep random neural networks, spotlight at NeurIPS 21 (slides)
- Representations in Random Deep Neural Networks at INRIA 21 (slides)
- Escaping Saddles with Stochastic Gradients at ICML 18 (slides)
Academic Services
- Area chair for
- A member of the organizing team for
  - the ICLR 24 Workshop on Bridging the Gap Between Practice and Theory in Deep Learning, led by Jingzhao Zhang
  - the TILOS & OPTML++ seminars at MIT, 2023
- Session chair at INFORMS/IOS 24
- Session chair at NeurIPS 23
- Reviewing talks for NeurIPS 23
- Reviewer for ICML, NeurIPS, ICLR, and AISTATS
- Journal reviewer for Data Mining and Knowledge Discovery, Neurocomputing, TPAMI, and TSIPN
Students
I am privileged to work with excellent students, who are potential future leaders of machine learning research.
- Amir Joudaki, Ph.D. student at ETH Zurich (20-24), admitted to a postdoctoral position at the Broad Institute
- Alexandru Meterez, Master's student at ETH (22-23), joined Harvard University for a Ph.D.
- Flowers Alec Massimo, Master's student at ETH (23), joined NVIDIA
- Jonas Kohler, former Ph.D. student at ETH (18-20), joined Meta
- Antonio Orvieto, Ph.D. student at ETH Zurich (20-21)
- Peiyuan Zhang, Master's student at ETH (19-21), joined Yale University for Ph.D.
- Leonard Adolphs, Master's student at ETH (18-19), joined ETH Zurich for Ph.D.
- Alexandre Bense, Master's student at ETH (22)
- Alireza Amani, Intern at ETH (18), joined London Business School for Ph.D.