Optimization for Data Science

This course is taught with Alexandre Gramfort, and the course materials were written by both of us, based on earlier documents by Pierre Ablin and Robert M. Gower.

Course overview:

  1. Introduction to optimization for data science [PDF]
    • ML optimization problems and linear algebra recap
    • Optimization problems and their properties (convexity, smoothness)
  2. Smooth optimization: Gradient descent [PDF]
    • First-order algorithms, convergence for smooth and strongly convex functions (a gradient descent sketch is given after this list)
  3. Smooth optimization: Quadratic problems and linesearch methods [ZIP]
    • Solvers for quadratic problems, conjugate gradient [PDF]
    • Linesearch methods [PDF]
  4. Non-smooth optimization: Proximal methods [PDF]
    • Proximal operator and proximal algorithms (see the ISTA sketch after this list)
    • Lab 1: Lasso and group Lasso
  5. Stochastic Gradient Descent
    • SGD and variance reduction techniques (an SGD sketch is given after this list)
    • Lab 2: SGD for logistic regression
  6. Standard formulation of constrained optimization problems
    • Linear programming (LP), quadratic programming (QP) and mixed-integer programming
  7. Coordinate descent
    • Algorithms and Labs
  8. Newton and quasi-Newton methods
    • Second order methods and Labs
  9. Beyond convex optimization
    • Nonconvex regularization, Frank-Wolfe, DC programming, automatic differentiation
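
To make the smooth optimization part concrete, here is a minimal sketch of fixed-step gradient descent on a least-squares problem, written with NumPy only. It illustrates the classical step-size choice 1/L, with L the Lipschitz constant of the gradient; it is not code from the course material, and all names (gradient_descent, grad, x_hat) are illustrative.

    import numpy as np

    def gradient_descent(grad, x0, step, n_iter=100):
        # Fixed-step gradient descent: x_{k+1} = x_k - step * grad(x_k)
        x = x0.copy()
        for _ in range(n_iter):
            x = x - step * grad(x)
        return x

    # Example: least squares 0.5 * ||A x - b||^2, whose gradient is A^T (A x - b)
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 10))
    b = rng.standard_normal(50)
    grad = lambda x: A.T @ (A @ x - b)
    L = np.linalg.norm(A, 2) ** 2  # squared spectral norm = Lipschitz constant of grad
    x_hat = gradient_descent(grad, np.zeros(10), step=1.0 / L)

For a smooth function, the step 1/L guarantees convergence, and for strongly convex functions the convergence is linear.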
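
For the non-smooth part, here is a minimal sketch of the proximal gradient method (ISTA) applied to the Lasso, using the closed-form soft-thresholding expression of the l1 proximal operator. ISTA is the unaccelerated counterpart of the FISTA algorithm from the Beck & Teboulle reference in the bibliography; the names (ista, soft_thresholding, lam) are illustrative, and Lab 1 may be organized differently.

    import numpy as np

    def soft_thresholding(x, tau):
        # Proximal operator of tau * ||.||_1 (applied coordinate-wise)
        return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

    def ista(A, b, lam, n_iter=200):
        # Proximal gradient for min_x 0.5 * ||A x - b||^2 + lam * ||x||_1
        L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the smooth part
        x = np.zeros(A.shape[1])
        for _ in range(n_iter):
            # Gradient step on the smooth part, then proximal step on the l1 part
            x = soft_thresholding(x - (A.T @ (A @ x - b)) / L, lam / L)
        return x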
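
Finally, for the stochastic part, a minimal sketch of plain SGD on the logistic loss, assuming labels in {-1, +1} and a decreasing step size; this illustrates the update rule only, it is not the Lab 2 code, and all names are illustrative.

    import numpy as np

    def sgd_logreg(X, y, n_epochs=20, step0=1.0):
        # Plain SGD: one randomly ordered sample per update, step step0 / (t + 1)
        n, d = X.shape
        w = np.zeros(d)
        rng = np.random.default_rng(0)
        t = 0
        for _ in range(n_epochs):
            for i in rng.permutation(n):
                # Stochastic gradient of log(1 + exp(-y_i <x_i, w>))
                g = -y[i] * X[i] / (1.0 + np.exp(y[i] * (X[i] @ w)))
                w -= step0 / (t + 1) * g
                t += 1
        return w

The decreasing step size is what makes plain SGD converge despite the gradient noise; variance reduction techniques such as SVRG (see the Bubeck reference in the bibliography) allow constant step sizes and faster rates.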

Exercises

  • Intro + gradient descent [PDF]

Labs / Practical sessions

Practical sessions require a working Python environment with the NumPy/SciPy and Matplotlib libraries installed. Such an environment is available for Windows/Linux/MacOSX through Anaconda.
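
A quick sanity check of the installation, in Python:

    import numpy, scipy, matplotlib
    print(numpy.__version__, scipy.__version__, matplotlib.__version__)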

Here is a list of useful references:

Bibliography

  • Boyd & Vandenberghe: Convex Optimization. Chapters 2, 3 and 4 for a refresher on convexity, and chapter 9 for a refresher on unconstrained optimization. Freely available online.
  • Shalev-Shwartz & Ben-David: Understanding Machine Learning: From Theory to Algorithms. Chapters 1 and 2 for a frequentist introduction to machine learning. Freely available online.
  • Bubeck: Convex Optimization: Algorithms and Complexity. Chapter 6 for additional proofs for stochastic gradient methods, including SGD and SVRG. Freely available online.
  • Beck & Teboulle (2009): A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. SIAM J. Imaging Sciences. Freely available online.
  • Gower et al. (2019): SGD: General Analysis and Improved Rates. Proceedings of Machine Learning Research, Volume 97. Freely available online.