Rémi Flamary

Professional website

I am a Professor in the Applied Mathematics department and the CMAP Laboratory at École Polytechnique. I was previously an associate professor at Université Côte d'Azur, in the Department of Electronics and in the Lagrange Laboratory, which is part of the Observatoire de la Côte d'Azur. Before that, I was a PhD student and teaching assistant at the LITIS Laboratory, and my PhD advisor was Alain Rakotomamonjy at Rouen University.

On this website, you can find a list of my publications and download the corresponding software/code. Some of my teaching and presentation material is also available.

Research Interests

  • Machine learning and statistical signal processing
    • Optimal transport for ML, OT on graphs
    • Domain adaptation, transfer and multi-task learning, data shift
    • Optimization with sparsity, variable selection, mixed norms, non convex regularization
    • Learning representations, deep neural networks
  • Applications of machine learning
    • Biomedical engineering, Brain-Computer Interfaces
    • Energy and climate
    • Remote sensing and hyperspectral imaging
    • Astronomical image processing

Recent work

P. Krzakala, J. Yang, R. Flamary, F. d'Alché-Buc, C. Laclau, M. Labeau, Any2Graph: Deep End-To-End Supervised Graph Prediction With An Optimal Transport Loss, Neural Information Processing Systems (NeurIPS), 2024.
Abstract: We propose Any2Graph, a generic framework for end-to-end Supervised Graph Prediction (SGP) i.e. a deep learning model that predicts an entire graph for any kind of input. The framework is built on a novel Optimal Transport loss, the Partially-Masked Fused Gromov-Wasserstein, that exhibits all necessary properties (permutation invariance, differentiability and scalability) and is designed to handle any-sized graphs. Numerical experiments showcase the versatility of the approach, which outperforms existing competitors on a novel challenging synthetic dataset and a variety of real-world tasks such as map construction from satellite image (Sat2Graph) or molecule prediction from fingerprint (Fingerprint2Graph).
BibTeX:
@inproceedings{krzakala2024endtoend,
author = {Paul Krzakala and Junjie Yang and Rémi Flamary and Florence d'Alché-Buc and Charlotte Laclau and Matthieu Labeau},
title = {Any2Graph: Deep End-To-End Supervised Graph Prediction With An Optimal Transport Loss},
booktitle = {Neural Information Processing Systems (NeurIPS)},
year = {2024}
}
T. Gnassounou, R. Flamary, A. Gramfort, Convolutional Monge Mapping Normalization for learning on biosignals, Neural Information Processing Systems (NeurIPS), 2023.
Abstract: In many machine learning applications on signals and biomedical data, especially electroencephalogram (EEG), one major challenge is the variability of the data across subjects, sessions, and hardware devices. In this work, we propose a new method called Convolutional Monge Mapping Normalization (CMMN), which consists in filtering the signals in order to adapt their power spectrum density (PSD) to a Wasserstein barycenter estimated on training data. CMMN relies on novel closed-form solutions for optimal transport mappings and barycenters and provides individual test time adaptation to new data without needing to retrain a prediction model. Numerical experiments on sleep EEG data show that CMMN leads to significant and consistent performance gains independent from the neural network architecture when adapting between subjects, sessions, and even datasets collected with different hardware. Notably our performance gain is on par with much more numerically intensive Domain Adaptation (DA) methods and can be used in conjunction with those for even better performances.
BibTeX:
@inproceedings{gnassounou2023convolutional,
author = {Gnassounou, Théo and Flamary, Rémi and Gramfort, Alexandre},
title = {Convolutional Monge Mapping Normalization for learning on biosignals},
booktitle = {Neural Information Processing Systems (NeurIPS)},
year = {2023}
}
H. Van Assel, T. Vayer, R. Flamary, N. Courty, SNEkhorn: Dimension Reduction with Symmetric Entropic Affinities, Neural Information Processing Systems (NeurIPS), 2023.
Abstract: Many approaches in machine learning rely on a weighted graph to encode the similarities between samples in a dataset. Entropic affinities (EAs), which are notably used in the popular Dimensionality Reduction (DR) algorithm t-SNE, are particular instances of such graphs. To ensure robustness to heterogeneous sampling densities, EAs assign a kernel bandwidth parameter to every sample in such a way that the entropy of each row in the affinity matrix is kept constant at a specific value, whose exponential is known as perplexity. EAs are inherently asymmetric and row-wise stochastic, but they are used in DR approaches after undergoing heuristic symmetrization methods that violate both the row-wise constant entropy and stochasticity properties. In this work, we uncover a novel characterization of EA as an optimal transport problem, allowing a natural symmetrization that can be computed efficiently using dual ascent. The corresponding novel affinity matrix derives advantages from symmetric doubly stochastic normalization in terms of clustering performance, while also effectively controlling the entropy of each row thus making it particularly robust to varying noise levels. Following, we present a new DR algorithm, SNEkhorn, that leverages this new affinity matrix. We show its clear superiority to state-of-the-art approaches with several indicators on both synthetic and real-world datasets.
BibTeX:
@inproceedings{van2023snekhorn,
author = {Van Assel, Hugues and Vayer, Titouan and Flamary, Rémi and Courty, Nicolas},
title = {SNEkhorn: Dimension Reduction with Symmetric Entropic Affinities},
booktitle = {Neural Information Processing Systems (NeurIPS)},
year = {2023}
}
C. Vincent-Cuaz, R. Flamary, M. Corneli, T. Vayer, N. Courty, Template based Graph Neural Network with Optimal Transport Distances, Neural Information Processing Systems (NeurIPS), 2022.
Abstract: Current Graph Neural Networks (GNN) architectures generally rely on two important components: node features embedding through message passing, and aggregation with a specialized form of pooling. The structural (or topological) information is implicitly taken into account in these two steps. We propose in this work a novel point of view, which places distances to some learnable graph templates at the core of the graph representation. This distance embedding is constructed thanks to an optimal transport distance: the Fused Gromov-Wasserstein (FGW) distance, which encodes simultaneously feature and structure dissimilarities by solving a soft graph-matching problem. We postulate that the vector of FGW distances to a set of template graphs has a strong discriminative power, which is then fed to a non-linear classifier for final predictions. Distance embedding can be seen as a new layer, and can leverage on existing message passing techniques to promote sensible feature representations. Interestingly enough, in our work the optimal set of template graphs is also learnt in an end-to-end fashion by differentiating through this layer. After describing the corresponding learning procedure, we empirically validate our claim on several synthetic and real life graph classification datasets, where our method is competitive or surpasses kernel and GNN state-of-the-art approaches. We complete our experiments by an ablation study and a sensitivity analysis to parameters.
BibTeX:
@inproceedings{vincentcuaz2022template,
author = {Vincent-Cuaz, Cédric and Flamary, Rémi and Corneli, Marco and Vayer, Titouan and Courty, Nicolas},
title = {Template based Graph Neural Network with Optimal Transport Distances},
booktitle = {Neural Information Processing Systems (NeurIPS)},
year = {2022}
}
A. Thual, H. Tran, T. Zemskova, N. Courty, R. Flamary, S. Dehaene, B. Thirion, Aligning individual brains with Fused Unbalanced Gromov-Wasserstein, Neural Information Processing Systems (NeurIPS), 2022.
Abstract: Individual brains vary in both anatomy and functional organization, even within a given species. Inter-individual variability is a major impediment when trying to draw generalizable conclusions from neuroimaging data collected on groups of subjects. Current co-registration procedures rely on limited data, and thus lead to very coarse inter-subject alignments. In this work, we present a novel method for inter-subject alignment based on Optimal Transport, denoted as Fused Unbalanced Gromov Wasserstein (FUGW). The method aligns cortical surfaces based on the similarity of their functional signatures in response to a variety of stimulation settings, while penalizing large deformations of individual topographic organization. We demonstrate that FUGW is well-suited for whole-brain landmark-free alignment. The unbalanced feature allows to deal with the fact that functional areas vary in size across subjects. Our results show that FUGW alignment significantly increases between-subject correlation of activity for independent functional data, and leads to more precise mapping at the group level.
BibTeX:
@inproceedings{thual2022aligning,
author = {Thual, Alexis and Tran, Huy and Zemskova, Tatiana and Courty, Nicolas and Flamary, Rémi and Dehaene, Stanislas and Thirion, Bertrand},
title = {Aligning individual brains with Fused Unbalanced Gromov-Wasserstein},
booktitle = {Neural Information Processing Systems (NeurIPS)},
year = {2022}
}
C. Vincent-Cuaz, R. Flamary, M. Corneli, T. Vayer, N. Courty, Semi-relaxed Gromov Wasserstein divergence with applications on graphs, International Conference on Learning Representations (ICLR), 2022.
Abstract: Comparing structured objects such as graphs is a fundamental operation involved in many learning tasks. To this end, the Gromov-Wasserstein (GW) distance, based on Optimal Transport (OT), has proven to be successful in handling the specific nature of the associated objects. More specifically, through the nodes connectivity relations, GW operates on graphs, seen as probability measures over specific spaces. At the core of OT is the idea of conservation of mass, which imposes a coupling between all the nodes from the two considered graphs. We argue in this paper that this property can be detrimental for tasks such as graph dictionary or partition learning, and we relax it by proposing a new semi-relaxed Gromov-Wasserstein divergence. Aside from immediate computational benefits, we discuss its properties, and show that it can lead to an efficient graph dictionary learning algorithm. We empirically demonstrate its relevance for complex tasks on graphs such as partitioning, clustering and completion.
BibTeX:
@inproceedings{vincent2022semi,
author = {Vincent-Cuaz, Cédric and Flamary, Rémi and Corneli, Marco and Vayer, Titouan and Courty, Nicolas},
title = {Semi-relaxed Gromov Wasserstein divergence with applications on graphs},
booktitle = {International Conference on Learning Representations (ICLR)},
year = {2022}
}
L. Chapel, R. Flamary, H. Wu, C. Févotte, G. Gasso, Unbalanced Optimal Transport through Non-negative Penalized Linear Regression, Neural Information Processing Systems (NeurIPS), 2021.
Abstract: This paper addresses the problem of Unbalanced Optimal Transport (UOT) in which the marginal conditions are relaxed (using weighted penalties in lieu of equality) and no additional regularization is enforced on the OT plan. In this context, we show that the corresponding optimization problem can be reformulated as a non-negative penalized linear regression problem. This reformulation allows us to propose novel algorithms inspired from inverse problems and nonnegative matrix factorization. In particular, we consider majorization-minimization which leads in our setting to efficient multiplicative updates for a variety of penalties. Furthermore, we derive for the first time an efficient algorithm to compute the regularization path of UOT with quadratic penalties. The proposed algorithm provides a continuity of piece-wise linear OT plans converging to the solution of balanced OT (corresponding to infinite penalty weights). We perform several numerical experiments on simulated and real data illustrating the new algorithms, and provide a detailed discussion about more sophisticated optimization tools that can further be used to solve OT problems thanks to our reformulation.
BibTeX:
@inproceedings{chapel2021unbalanced,
author = {Chapel, Laetitia and Flamary, Rémi and Wu, Haoran and Févotte, Cédric and Gasso, Gilles},
title = {Unbalanced Optimal Transport through Non-negative Penalized Linear Regression},
booktitle = {Neural Information Processing Systems (NeurIPS)},
year = {2021}
}
K. Fatras, B. Bhushan Damodaran, S. Lobry, R. Flamary, D. Tuia, N. Courty, Wasserstein Adversarial Regularization for learning with label noise, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021.
Abstract: Noisy labels often occur in vision datasets, especially when they are obtained from crowdsourcing or Web scraping. We propose a new regularization method, which enables learning robust classifiers in presence of noisy data. To achieve this goal, we propose a new adversarial regularization scheme based on the Wasserstein distance. Using this distance allows taking into account specific relations between classes by leveraging the geometric properties of the labels space. Our Wasserstein Adversarial Regularization (WAR) encodes a selective regularization, which promotes smoothness of the classifier between some classes, while preserving sufficient complexity of the decision boundary between others. We first discuss how and why adversarial regularization can be used in the context of label noise and then show the effectiveness of our method on five datasets corrupted with noisy labels: in both benchmarks and real datasets, WAR outperforms the state-of-the-art competitors.
BibTeX:
@article{damodaran2021wasserstein,
author = {Fatras, Kilian and Bhushan Damodaran, Bharath and Lobry, Sylvain and Flamary, Rémi and Tuia, Devis and Courty, Nicolas},
title = {Wasserstein Adversarial Regularization for learning with label noise},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
year = {2021}
}
C. Vincent-Cuaz, T. Vayer, R. Flamary, M. Corneli, N. Courty, Online Graph Dictionary Learning, International Conference on Machine Learning (ICML), 2021.
Abstract: Dictionary learning is a key tool for representation learning that explains the data as linear combination of few basic elements. Yet, this analysis is not amenable in the context of graph learning, as graphs usually belong to different metric spaces. We fill this gap by proposing a new online Graph Dictionary Learning approach, which uses the Gromov Wasserstein divergence for the data fitting term. In our work, graphs are encoded through their nodes' pairwise relations and modeled as convex combination of graph atoms, i.e. dictionary elements, estimated thanks to an online stochastic algorithm, which operates on a dataset of unregistered graphs with potentially different number of nodes. Our approach naturally extends to labeled graphs, and is completed by a novel upper bound that can be used as a fast approximation of Gromov Wasserstein in the embedding space. We provide numerical evidences showing the interest of our approach for unsupervised embedding of graph datasets and for online graph subspace estimation and tracking.
BibTeX:
@inproceedings{vincent2021online,
author = {Vincent-Cuaz, Cédric and Vayer, Titouan and Flamary, Rémi and Corneli, Marco and Courty, Nicolas},
title = {Online Graph Dictionary Learning},
booktitle = {International Conference on Machine Learning (ICML)},
year = {2021}
}
K. Fatras, T. Séjourné, N. Courty, R. Flamary, Unbalanced minibatch Optimal Transport; applications to Domain Adaptation, International Conference on Machine Learning (ICML), 2021.
Abstract: Optimal transport distances have found many applications in machine learning for their capacity to compare non-parametric probability distributions. Yet their algorithmic complexity generally prevents their direct use on large scale datasets. Among the possible strategies to alleviate this issue, practitioners can rely on computing estimates of these distances over subsets of data, i.e. minibatches. While computationally appealing, we highlight in this paper some limits of this strategy, arguing it can lead to undesirable smoothing effects. As an alternative, we suggest that the same minibatch strategy coupled with unbalanced optimal transport can yield more robust behavior. We discuss the associated theoretical properties, such as unbiased estimators, existence of gradients and concentration bounds. Our experimental study shows that in challenging problems associated to domain adaptation, the use of unbalanced optimal transport leads to significantly better results, competing with or surpassing recent baselines.
BibTeX:
@inproceedings{fatras2021unbalanced,
author = {Fatras, Kilian and Séjourné, Thibault and Courty, Nicolas and Flamary, Rémi},
title = {Unbalanced minibatch Optimal Transport; applications to Domain Adaptation},
booktitle = {International Conference on Machine Learning (ICML)},
year = {2021}
}
R. Flamary, N. Courty, A. Gramfort, M. Z. Alaya, A. Boisbunon, S. Chambon, L. Chapel, A. Corenflos, K. Fatras, N. Fournier, L. Gautheron, N. T. Gayraud, H. Janati, A. Rakotomamonjy, I. Redko, A. Rolet, A. Schutz, V. Seguy, D. J. Sutherland, R. Tavenard, A. Tong, T. Vayer, POT: Python Optimal Transport, Journal of Machine Learning Research, Vol. 22, N. 78, pp 1-8, 2021.
Abstract: Optimal transport has recently been reintroduced to the machine learning community thanks in part to novel efficient optimization procedures allowing for medium to large scale applications. We propose a Python toolbox that implements several key optimal transport ideas for the machine learning community. The toolbox contains implementations of a number of founding works of OT for machine learning such as Sinkhorn algorithm and Wasserstein barycenters, but also provides generic solvers that can be used for conducting novel fundamental research. This toolbox, named POT for Python Optimal Transport, is open source with an MIT license.
BibTeX:
@article{flamary2021pot,
author = {Rémi Flamary and Nicolas Courty and Alexandre Gramfort and
  Mokhtar Z. Alaya and Aurélie Boisbunon and Stanislas Chambon and Laetitia
  Chapel and Adrien Corenflos and Kilian Fatras and Nemo Fournier and Léo
  Gautheron and Nathalie T.H. Gayraud and Hicham Janati and Alain Rakotomamonjy
  and Ievgen Redko and Antoine Rolet and Antony Schutz and Vivien Seguy and
  Danica J. Sutherland and Romain Tavenard and Alexander Tong and Titouan
  Vayer},
title = {POT: Python Optimal Transport},
journal = {Journal of Machine Learning Research},
volume = {22},
number = {78},
pages = {1-8},
year = {2021}
}

News

NeurIPS 2023

2023-12-01

I will be at NeurIPS 2023 in New Orleans, where I will present two posters with my awesome co-authors and give an invited talk at the Optimal Transport for Machine Learning Workshop (OTML).

Feel free to come and see me and my collaborators at our posters or during the OTML workshop (we also have posters there).

T. Gnassounou, R. Flamary, A. Gramfort, Convolutional Monge Mapping Normalization for learning on biosignals, Neural Information Processing Systems (NeurIPS), 2023.
H. Van Assel, T. Vayer, R. Flamary, N. Courty, SNEkhorn: Dimension Reduction with Symmetric Entropic Affinities, Neural Information Processing Systems (NeurIPS), 2023.

The least effort theory and its applications to artificial intelligence

2023-04-12

On March 13, 2023, Gabriel Peyré and I gave a public lecture at Sorbonne University (Jussieu campus) in Paris, where we discussed the use of optimal transport and the least effort theory in artificial intelligence applications.

The slides of the presentation (in French) and a link to the YouTube video are available here.

Oral presentation at NeurIPS 2022

2022-11-20

The thesis work of Cédric Vincent-Cuaz on Optimal Transport for Graph Neural Networks has been accepted for a highly selective oral presentation at NeurIPS 2022.

Cédric and I will be in New Orleans for NeurIPS. Feel free to come and see us at our poster.

C. Vincent-Cuaz, R. Flamary, M. Corneli, T. Vayer, N. Courty, Template based Graph Neural Network with Optimal Transport Distances, Neural Information Processing Systems (NeurIPS), 2022.

Optimal Transport for Machine Learning tutorial at Hi! Paris Summer School 2022

2022-06-15

I will give a tutorial on optimal transport for machine learning at the Hi! Paris Summer School 2022 on July 4, 2022 at École Polytechnique in Paris/Saclay, France.

The presentation slides are available below:

  • Part 1: Intro to Optimal Transport [PDF].
  • Part 2: Optimal Transport for Machine Learning [PDF].