Optimization for Data Science
This course is taught with Alexandre Gramfort, and the course materials were
written by both of us, building on earlier documents by Pierre Ablin and
Robert M. Gower.
Course overview:
- Introduction to optimization for data science [PDF]
- ML optimization problems and linear algebra recap
- Optimization problems and their properties (Convexity, smoothness)
- Smooth optimization: Gradient descent [PDF] (see the illustrative sketch after this outline)
- First order algorithms, convergence for smooth and strongly convex functions
- Smooth optimization: Quadratic problems and linesearch methods [ZIP]
- Solvers for quadratic problems, conjugate gradient [PDF]
- Linesearch methods [PDF]
- Non-smooth optimization: Proximal methods [PDF]
- Proximal operator and proximal algorithms
- Lab 1: Lasso and group Lasso
- Stochastic Gradient Descent [PDF]
- SGD and variance reduction techniques
- Lab 2: SGD for Logistic regression
- Standard formulation of constrained optimization problems [PDF]
- LP, QP and Mixed Integer Programming
- Coordinate descent
- Newton and quasi-Newton methods [PDF] [ZIP]
- Second order methods and Labs
- Beyond convex optimization
- Nonconvex regularization, Frank-Wolfe, DC programming, autodiff
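As a quick taste of the gradient descent lecture, here is a minimal NumPy sketch on a least-squares problem. The function name, step-size choice, and synthetic data are illustrative assumptions and are not taken from the lecture notes or labs.

```python
import numpy as np

# Minimal gradient descent sketch for least squares: min_w 0.5 * ||X w - y||^2.
# Illustrative only; not the official course or lab code.

def gradient_descent(X, y, n_iter=200, step=None):
    """Plain gradient descent on the least-squares objective."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    if step is None:
        # Use 1 / L, where L = ||X||_2^2 is the Lipschitz constant of the gradient.
        step = 1.0 / np.linalg.norm(X, ord=2) ** 2
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)
        w -= step * grad
    return w

# Small synthetic example
rng = np.random.RandomState(0)
X = rng.randn(50, 10)
w_true = rng.randn(10)
y = X @ w_true
w_hat = gradient_descent(X, y)
print("recovery error:", np.linalg.norm(w_hat - w_true))
```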
Exercises
- Intro + gradient descent [PDF]
- Stochastic gradient descent [PDF]
Labs / Practical sessions
Practical sessions require a working Python environment with the NumPy, SciPy,
and Matplotlib libraries installed. You can get such an environment for
Windows, Linux, and macOS from Anaconda.
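Before the first session, you can run a quick sanity check like the one below (a suggestion on my part, not part of the lab material) to confirm the packages import correctly; the printed versions are only informational.

```python
# Check that the scientific Python stack required for the labs is available.
import numpy as np
import scipy
import matplotlib

print("numpy     ", np.__version__)
print("scipy     ", scipy.__version__)
print("matplotlib", matplotlib.__version__)
```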
Here is a list of useful references and Python tutorials:
Acknowledgements
I would like to thank the following people for their help in preparing this
course: Matthieu Terris and Joël Garde. I also want to thank Grégoire Mourre
for spotting typos.
Bibliography
- Boyd & Vandenberghe: Convex Optimization. Chapters 2, 3 and 4 for a revision
on convexity and Chapter 9 for a revision on unconstrained optimization.
Freely available here.
- Shalev-Shwartz & Ben-David: Understanding Machine Learning: From Theory to
Algorithms. Chapters 1 and 2 for a frequentist introduction to Machine
Learning. Freely available here.
- Bubeck: Convex Optimization: Algorithms and Complexity. Chapter 6 for
additional proofs for stochastic gradient methods including SGD and SVRG. Freely available here.
- Amir Beck and Marc Teboulle (2009), SIAM J. Imaging Sciences, A Fast
Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. Freely available here.
- Robert Gower et al. (2019), Proceedings of Machine Learning Research, Volume
97, SGD: General Analysis and Improved Rates. Freely available here.