Submitted and preprints
2021
Q. H. Tran, H. Janati, I. Redko, R. Flamary, N. Courty, Factored couplings in multi-marginal optimal transport via difference of convex programming, NeurIPS 2021 Optimal Transport and Machine Learning Workshop (OTML), 2021.
Abstract: Optimal transport (OT) theory underlies many emerging machine learning (ML) methods that nowadays solve a wide range of tasks such as generative modeling, transfer learning and information retrieval. These works, however, usually build upon the traditional OT setup with two distributions, while leaving the more general multi-marginal OT formulation somewhat unexplored. In this paper, we study the multi-marginal OT (MMOT) problem and unify several popular OT methods under its umbrella by promoting structural information on the coupling. We show that incorporating such structural information into MMOT results in an instance of a difference of convex (DC) programming problem, allowing us to solve it numerically. Despite the high computational cost of the latter procedure, the solutions provided by DC optimization are usually of comparable quality to those obtained using currently employed optimization schemes.
BibTeX:
@conference{tran2021factored, author = {Tran, Quang Huy and Janati, Hicham and Redko, Ievgen and Flamary, Rémi and Courty, Nicolas}, title = {Factored couplings in multi-marginal optimal transport via difference of convex programming}, howpublished = {NeurIPS 2021 Optimal Transport and Machine Learning Workshop (OTML)}, year = {2021} }
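The difference-of-convex machinery the abstract refers to can be illustrated in isolation. Below is a minimal sketch of the generic DC algorithm (DCA) on a toy objective: to minimize f = g - h with g, h convex, one repeatedly linearizes h at the current iterate and minimizes the resulting convex surrogate. The decomposition and all names are illustrative, not the paper's MMOT formulation.

```python
import numpy as np
from scipy.optimize import minimize

# Toy DC decomposition (illustrative only): f(x) = g(x) - h(x),
# with g(x) = sum(x**4) convex and h(x) = 2 * sum(x**2) convex.
g = lambda x: np.sum(x**4)
grad_h = lambda x: 4.0 * x  # gradient of h(x) = 2 * ||x||^2

def dca(x0, n_iter=50):
    """Generic DC algorithm: at each step, linearize h at the current
    iterate x_k and minimize the convex surrogate g(x) - <grad_h(x_k), x>."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        gk = grad_h(x)
        x = minimize(lambda z: g(z) - gk @ z, x).x
    return x

print(dca(np.array([0.3, -1.7])))  # converges to a critical point of f
```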
2019
R. Flamary, Optimal Transport for Machine Learning, Université Côte d'Azur, 2019.
Abstract: In this document, I present several recent contributions to machine learning using optimal transport (OT) theory. The first part of the document introduces the optimal transport problem and discusses several algorithms designed to solve its original and regularized formulations. Next, I present contributions to machine learning that focus on four different aspects of OT. I first introduce the use of approximate Monge mappings for domain adaptation, and then the use of OT divergences such as the Wasserstein distance for histogram and empirical data. Finally, I briefly discuss recent results that aim at extending OT as a distance between structured data such as labeled graphs.
BibTeX:
@phdthesis{flamary2019hdr, author = {Flamary, R.}, title = {Optimal Transport for Machine Learning}, school = {Université Côte d'Azur}, year = {2019} }
R. Flamary, K. Lounici, A. Ferrari, Concentration bounds for linear Monge mapping estimation and optimal transport domain adaptation, 2019.
Abstract: This article investigates the quality of the estimator of the linear Monge mapping between distributions. We provide the first concentration result on the linear mapping operator and prove a sample complexity of n^(-1/2) when using empirical estimates of first- and second-order moments. This result is then used to derive a generalization bound for domain adaptation with optimal transport. As a consequence, this method approaches the performance of the theoretical Bayes predictor under mild conditions on the covariance structure of the problem. We also discuss the computational complexity of the linear mapping estimation and show that when the source and target are stationary, the mapping is a convolution that can be estimated very efficiently using fast Fourier transforms. Numerical experiments reproduce the behavior of the proven bounds on simulated and real data for mapping estimation and domain adaptation on images.
BibTeX:
@techreport{flamary2019concentration, author = {Flamary, Rémi and Lounici, Karim and Ferrari, André}, title = {Concentration bounds for linear Monge mapping estimation and optimal transport domain adaptation}, year = {2019} }
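The linear Monge mapping whose empirical estimator the paper analyzes has a closed form between Gaussian approximations of the two distributions, built from empirical first- and second-order moments. A minimal NumPy sketch (the function name and regularization parameter are my own illustrative choices, not the paper's code):

```python
import numpy as np
from scipy.linalg import sqrtm

def linear_monge_map(Xs, Xt, reg=1e-6):
    """Closed-form linear Monge mapping between Gaussian approximations
    of source and target samples, estimated from empirical moments."""
    ms, mt = Xs.mean(0), Xt.mean(0)
    Cs = np.cov(Xs, rowvar=False) + reg * np.eye(Xs.shape[1])
    Ct = np.cov(Xt, rowvar=False) + reg * np.eye(Xt.shape[1])
    Cs12 = np.real(sqrtm(Cs))
    Cs12inv = np.linalg.inv(Cs12)
    # A = Cs^{-1/2} (Cs^{1/2} Ct Cs^{1/2})^{1/2} Cs^{-1/2}
    A = Cs12inv @ np.real(sqrtm(Cs12 @ Ct @ Cs12)) @ Cs12inv
    return lambda x: mt + (x - ms) @ A.T

rng = np.random.default_rng(0)
Xs = rng.normal(size=(500, 2))
Xt = rng.normal(size=(500, 2)) @ np.array([[2.0, 0.5], [0.0, 1.0]]) + 3.0
T = linear_monge_map(Xs, Xt)
print(T(Xs).mean(0), Xt.mean(0))  # mapped source mean matches the target
```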
L. Dragoni, R. Flamary, K. Lounici, P. Reynaud-Bouret, Large scale Lasso with windowed active set for convolutional spike sorting, 2019.
Abstract: Spike sorting is a fundamental preprocessing step in neuroscience that is central to accessing simultaneous but distinct neuronal activities and therefore to better understanding the animal or even human brain. However, numerical complexity limits studies that require processing large-scale datasets in terms of number of electrodes, neurons, spikes and length of the recorded signals. We propose in this work a novel active set algorithm aimed at solving the Lasso for a classical convolutional model. Our algorithm can be implemented efficiently on parallel architectures and has a linear complexity w.r.t. the temporal dimension, which ensures scaling and opens the door to online spike sorting. We provide theoretical results about the complexity of the algorithm and illustrate it in numerical experiments, along with results about the accuracy of the spike recovery and robustness to the regularization parameter.
BibTeX:
@techreport{dragoni2019large, author = {Dragoni, Laurent and Flamary, Rémi and Lounici, Karim and Reynaud-Bouret, Patricia}, title = {Large scale Lasso with windowed active set for convolutional spike sorting}, year = {2019} }
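To make the optimization problem at the core of this paper concrete, here is a plain coordinate-descent Lasso solver. The paper's windowed active set is a different, more scalable strategy tailored to the convolutional case, so this sketch only illustrates the problem being solved, not the proposed algorithm.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Plain coordinate descent for 0.5 * ||y - X b||^2 + lam * ||b||_1.
    The windowed active set of the paper restricts such updates to small
    temporal windows to scale to very long signals (not shown here)."""
    n, d = X.shape
    b = np.zeros(d)
    r = y.copy()                       # residual y - X @ b
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            bj_old = b[j]
            rho = X[:, j] @ r + col_sq[j] * bj_old
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            r += X[:, j] * (bj_old - b[j])
    return b
```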
2018
A. Rakotomamonjy, A. Traore, M. Berar, R. Flamary, N. Courty, Distance Measure Machines, 2018.
Abstract: This paper presents a distance-based discriminative framework for learning with probability distributions. Instead of using kernel mean embeddings or generalized radial basis kernels, we introduce embeddings based on the dissimilarity of distributions to some reference distributions denoted as templates. Our framework extends the theory of similarity of Balcan (2008) to the population distribution case, and we prove that, for some learning problems, the Wasserstein distance achieves low-error linear decision functions with high probability. Our key result is to prove that the theory also holds for empirical distributions. Algorithmically, the proposed approach is very simple as it consists in computing a mapping based on pairwise Wasserstein distances and then learning a linear decision function. Our experimental results show that this Wasserstein distance embedding performs better than kernel mean embeddings and that computing the Wasserstein distance is far more tractable than estimating the pairwise Kullback-Leibler divergences of empirical distributions.
BibTeX:
@techreport{rakotomamonjy2018wasserstein, author = {Rakotomamonjy, Alain and Traore, Abraham and Berar, Maxime and Flamary, Remi and Courty, Nicolas}, title = {Distance Measure Machines}, year = {2018} }
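The pipeline described in the abstract (embed each distribution by its Wasserstein distances to template distributions, then learn a linear decision function) can be sketched with the POT library. The template choice and all names below are illustrative, not the authors' code.

```python
import numpy as np
import ot  # POT: Python Optimal Transport
from sklearn.linear_model import LogisticRegression

def wasserstein_embedding(samples, templates):
    """Embed each empirical distribution (a point cloud) by its exact OT
    cost to each reference (template) distribution."""
    emb = np.zeros((len(samples), len(templates)))
    for i, X in enumerate(samples):
        for j, T in enumerate(templates):
            a = np.full(len(X), 1.0 / len(X))   # uniform weights
            b = np.full(len(T), 1.0 / len(T))
            M = ot.dist(X, T)                   # squared Euclidean costs
            emb[i, j] = ot.emd2(a, b, M)        # exact OT cost
    return emb

# Usage sketch: distributions as point clouds, one label per distribution.
# Z = wasserstein_embedding(train_clouds, templates)
# clf = LogisticRegression().fit(Z, y_train)   # linear decision function
```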
2017
R. Mourya, A. Ferrari, R. Flamary, P. Bianchi, C. Richard, Distributed Deblurring of Large Images of Wide Field-Of-View, 2017.
Abstract: Image deblurring is an economic way to reduce certain degradations (blur and noise) in acquired images. Thus, it has become an essential tool in high-resolution imaging in many applications, e.g., astronomy, microscopy or computational photography. In applications such as astronomy and satellite imaging, the size of acquired images can be extremely large (up to gigapixels), covering a wide field-of-view and suffering from shift-variant blur. Most of the existing image deblurring techniques are designed and implemented to work efficiently on centralized computing systems with multiple processors and a shared memory. Thus, the largest image that can be handled is limited by the size of the physical memory available on the system. In this paper, we propose a distributed nonblind image deblurring algorithm in which several connected processing nodes (with reasonable computational resources) simultaneously process different portions of a large image while maintaining a certain coherency among them to finally obtain a single crisp image. Unlike the existing centralized techniques, image deblurring in a distributed fashion raises several issues. To tackle these issues, we consider certain approximations that trade off between the quality of the deblurred image and the computational resources required to achieve it. The experimental results show that our algorithm produces images of similar quality to the existing centralized techniques while allowing distribution, and thus being cost-effective for extremely large images.
BibTeX:
@techreport{mourya2017distdeblur, author = {Mourya, Rahul and Ferrari, Andre and Flamary, Remi and Bianchi, Pascal and Richard, Cedric}, title = {Distributed Deblurring of Large Images of Wide Field-Of-View}, year = {2017} }
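The overall split-process-merge setting can be sketched with overlapping tiles and a simple per-tile Wiener filter. This is a loose illustration of the distributed idea only; the authors' algorithm additionally enforces coherency between neighboring nodes and handles shift-variant blur, neither of which is shown here.

```python
import numpy as np

def wiener_deblur(tile, psf, eps=1e-2):
    """Non-blind deconvolution of one tile with a simple Wiener filter
    (psf assumed centered at the origin, i.e. wrapped). This stands in
    for the per-node solver; it is not the paper's algorithm."""
    H = np.fft.fft2(psf, s=tile.shape)
    Y = np.fft.fft2(tile)
    return np.real(np.fft.ifft2(np.conj(H) * Y / (np.abs(H) ** 2 + eps)))

def deblur_by_tiles(img, psf, tile=256, pad=32):
    """Split a large image into overlapping tiles, deblur each tile
    independently (in practice on different nodes), and keep only the
    interior of each tile to hide boundary artifacts."""
    out = np.zeros_like(img, dtype=float)
    H, W = img.shape
    for i in range(0, H, tile):
        for j in range(0, W, tile):
            i0, j0 = max(i - pad, 0), max(j - pad, 0)
            i1, j1 = min(i + tile + pad, H), min(j + tile + pad, W)
            block = wiener_deblur(img[i0:i1, j0:j1], psf)
            hi, wj = min(tile, H - i), min(tile, W - j)
            out[i:i + hi, j:j + wj] = block[i - i0:i - i0 + hi,
                                            j - j0:j - j0 + wj]
    return out
```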
2016
D. Mary, R. Flamary, C. Theys, C. Aime, Mathematical Tools for Instrumentation and Signal Processing in Astronomy, 2016.
Abstract: This book is a collection of 13 articles corresponding to lectures and research works presented at the CNRS Summer school « Bases mathématiques pour l'instrumentation et le traitement du signal en astronomie », which took place in Nice and Porquerolles, France, from June 1 to 5, 2015. The book contains three parts.
I. Astronomy in the coming decade and beyond. The three chapters of this part emphasize the strong interdisciplinary nature of astrophysics, both at the theoretical and observational levels, and the increasingly large sizes of data sets produced by increasingly complex instruments and infrastructures. These remarkable features call at the same time for more mathematical tools in signal processing and instrumentation, in particular in statistical modeling, large-scale inference, data mining and machine learning, and for efficient processing solutions allowing their implementation.
II. Mathematical concepts, methods and tools. The first chapter of this part starts with an example of how pure mathematics can lead to new instrumental concepts, in this case for exoplanet detection. The four other chapters provide a detailed introduction to four main topics: orthogonal functions as a powerful tool for modeling signals and images, covering Fourier, Fourier-Legendre and Fourier-Bessel series for 1D signals and spherical harmonic series for 2D signals; optimization and machine learning methods with applications to inverse problems, denoising and classification, with on-line numerical experiments; large-scale statistical inference with adaptive procedures allowing control of the False Discovery Rate, like the Benjamini-Hochberg procedure, its Bayesian interpretation and some variations; and processing solutions for large data sets, covering the Hadoop framework and YARN, the main tools for managing both the storage and computing capacities of a cluster of machines, as well as recent solutions like Spark.
III. Applications: tools in action. This part collects a number of current research works where some of the tools above are presented in action: optimization for deconvolution, statistical modeling, multiple testing, optical and instrumental models. The applications of this part include astronomical imaging, detection and estimation of circumgalactic structures, and detection of exoplanets.
BibTeX:
@book{mary2016mathematical, author = {Mary, David and Flamary, Remi and Theys, Celine and Aime, Claude}, title = {Mathematical Tools for Instrumentation and Signal Processing in Astronomy}, publisher = {EDP Sciences}, year = {2016} }
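Of the tools named in part II, the Benjamini-Hochberg procedure is compact enough to state in a few lines. A minimal sketch of the standard step-up procedure (not taken from the book's chapters):

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure controlling the False
    Discovery Rate at level q. Returns a boolean mask of rejections."""
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)
    thresh = q * np.arange(1, m + 1) / m      # k/m * q for k = 1..m
    below = p[order] <= thresh
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])      # largest k passing the test
        reject[order[:k + 1]] = True          # reject the k smallest p-values
    return reject
```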
2015
R. Flamary, I. Harrane, M. Fauvel, S. Valero, M. Dalla Mura, Discrimination périodique à partir d’observations multi-temporelles, GRETSI, 2015.
Abstract: In this work, we propose a novel linear classification scheme for non-stationary periodic data. We express the classifier in a temporal basis while regularizing its temporal complexity, leading to a convex optimization problem. Numerical experiments show very good results on a simulated example and on a real-life remote sensing image classification problem.
BibTeX:
@conference{flamary2015discrimination, author = {Flamary, R. and Harrane, I. and Fauvel, M. and Valero, S. and Dalla Mura, M.}, title = {Discrimination périodique à partir d’observations multi-temporelles}, booktitle = {GRETSI}, year = {2015} }
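A time-varying linear classifier expressed in a periodic temporal basis, as the abstract describes, amounts to a feature expansion on which an ordinary regularized linear model is trained. The basis choice, period and names below are illustrative assumptions; the paper's exact model and regularizer may differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fourier_basis(t, period, n_harmonics):
    """Periodic temporal basis phi(t): constant plus sin/cos harmonics."""
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        cols += [np.sin(2 * np.pi * k * t / period),
                 np.cos(2 * np.pi * k * t / period)]
    return np.stack(cols, axis=-1)

def temporal_features(X, t, period, n_harmonics=3):
    """Expand each sample so a *linear* classifier on the expanded
    features equals a time-varying classifier w(t) = sum_k c_k phi_k(t)."""
    Phi = fourier_basis(t, period, n_harmonics)        # (n, K)
    return np.einsum('nk,nd->nkd', Phi, X).reshape(len(X), -1)

# Usage sketch: Z = temporal_features(X, acquisition_times, period=365.0)
#               clf = LogisticRegression(C=1.0).fit(Z, y)
# The L2 penalty on the expansion coefficients limits the temporal
# complexity of w(t).
```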
A. Rakotomamonjy, R. Flamary, N. Courty, Generalized conditional gradient: analysis of convergence and applications, 2015.
BibTeX:
@techreport{rakotomamonjy2015generalized, author = {Rakotomamonjy, Alain and Flamary, Rémi and Courty, Nicolas}, title = {Generalized conditional gradient: analysis of convergence and applications}, year = {2015} }
2014
R. Flamary, N. Courty, D. Tuia, A. Rakotomamonjy, Optimal transport with Laplacian regularization: Applications to domain adaptation and shape matching, NIPS Workshop on Optimal Transport and Machine Learning (OTML), 2014.
Abstract: We propose a method based on optimal transport for empirical distributions with Laplacian regularization (LOT). Laplacian regularization is a graph-based regularization that can encode neighborhood similarity between samples, either on the final positions of the transported samples or on their displacements as in the work of Ferradans et al. In both cases, LOT is expressed as a quadratic programming problem and can be solved with a Frank-Wolfe algorithm with optimal step size. Results on domain adaptation and shape matching problems show the interest of using this regularization in optimal transport.
BibTeX:
@conference{flamary2014optlaplace, author = {Flamary, R. and Courty, N. and Tuia, D. and Rakotomamonjy, A.}, title = {Optimal transport with Laplacian regularization: Applications to domain adaptation and shape matching}, howpublished = {NIPS Workshop on Optimal Transport and Machine Learning (OTML)}, year = {2014} }
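The Frank-Wolfe scheme with optimal step size mentioned in the abstract can be sketched generically for a quadratic OT objective using POT. A plain squared Frobenius term stands in for the graph Laplacian regularizer here, so this illustrates the solver rather than LOT itself.

```python
import numpy as np
import ot  # POT: Python Optimal Transport

def fw_quadratic_ot(a, b, C, eta=1.0, n_iter=100):
    """Frank-Wolfe on f(T) = <C, T> + (eta / 2) * ||T||_F^2 over the
    transport polytope, with exact OT (ot.emd) as the linear oracle and
    the closed-form optimal step for a quadratic objective."""
    T = np.outer(a, b)                       # feasible starting point
    for _ in range(n_iter):
        G = C + eta * T                      # gradient of f at T
        S = ot.emd(a, b, G)                  # linear minimization oracle
        D = S - T
        denom = eta * np.sum(D * D)
        # exact line search for the quadratic objective, clipped to [0, 1]
        gamma = 1.0 if denom == 0 else min(1.0, max(0.0, -np.sum(G * D) / denom))
        T = T + gamma * D
    return T
```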
2011
R. Flamary, Apprentissage statistique pour le signal: applications aux interfaces cerveau-machine, Laboratoire LITIS, Université de Rouen, 2011.
Abstract: Brain-Computer Interfaces (BCI) require the use of statistical learning methods for signal recognition. In this thesis we propose a general approach using prior knowledge on the problem at hand through regularization. To this end, we learn jointly the classifier and the feature extraction step in a unique optimization problem. We focus on the problem of sensor selection, and propose several regularization terms adapted to the problem. Our first contribution is a filter learning method called large margin filtering. It consists in learning a filtering maximizing the margin between samples of each class so as to adapt to the properties of the features. In addition, this approach is easy to interpret and can lead to the selection of the most relevant sensors. Numerical experiments on a real-life BCI problem and a 2D image classification problem show the good behaviour of our method both in terms of performance and interpretability. The second contribution is a general sparse multitask learning approach. Several classifiers are learned jointly, and discriminant kernels for all the tasks are automatically selected. We propose some efficient algorithms, and numerical experiments show the interest of our approach. Finally, the third contribution is a direct application of sparse multitask learning to a BCI event-related potential classification problem. We propose an adapted regularization term that promotes both sensor selection and similarity between the classifiers. Numerical experiments show that the calibration time of a BCI can be drastically reduced thanks to the proposed multitask approach.
BibTeX:
@phdthesis{thesis2011, author = {Flamary, R.}, title = {Apprentissage statistique pour le signal: applications aux interfaces cerveau-machine}, school = {Laboratoire LITIS, Université de Rouen}, year = {2011} }
2010
R. Flamary, B. Labbé, A. Rakotomamonjy, Filtrage vaste marge pour l'étiquetage séquentiel de signaux, Conférence en Apprentissage (CAp), 2010.
Abstract: This paper deals with the sequential labeling of signals, that is, discrimination for temporal samples. In this context, we propose a method for learning a large-margin filtering that best separates the classes. We thus jointly learn an SVM on the samples and a temporal filtering of these samples. This method enables the online labeling of temporal samples. An optimal offline sequence decoding using the Viterbi algorithm is also proposed. We introduce different regularization terms that allow weighting or selecting channels automatically with respect to the large-margin criterion. Finally, our approach is tested on a toy example of non-linear signals as well as on real Brain-Computer Interface data. These experiments show the interest of supervised learning of a temporal filtering for sequence labeling.
BibTeX:
@conference{flamcap2010, author = {Flamary, R. and Labbé, B. and Rakotomamonjy, A.}, title = {Filtrage vaste marge pour l'étiquetage séquentiel de signaux}, booktitle = {Conférence en Apprentissage (CAp)}, year = {2010} }
2009
R. Flamary, B. Labbé, A. Rakotomamonjy, Large margin filtering for signal segmentation, NIPS Workshop on Temporal Segmentation, 2009.
BibTeX:
@conference{nipsworkshop2009, author = {Flamary, R. and Labbé, B. and Rakotomamonjy, A.}, title = {Large margin filtering for signal segmentation}, howpublished = {NIPS Workshop on Temporal Segmentation}, year = {2009} }
R. Flamary, A. Rakotomamonjy, G. Gasso, S. Canu, Selection de variables pour l'apprentissage simultanée de tâches, Conférence en Apprentissage (CAp'09), 2009.
Abstract: This paper deals with variable selection for the simultaneous learning of SVM discrimination tasks. We formulate this problem as multi-task learning with a mixed-norm regularization term of ℓp-ℓ2 type with p ≤ 1. The latter yields discrimination models for each task that use a common subset of the variables. We first propose an algorithm to solve the learning problem when the mixed norm is convex (p = 1). Then, using DC programming, we address the non-convex case (p < 1). We show that the latter case can be solved by an iterative algorithm where, at each iteration, a problem based on the ℓ1-ℓ2 mixed norm is solved. Our experiments show the interest of the method on several simultaneous discrimination problems.
BibTeX:
@conference{cap09, author = {Flamary, R. and Rakotomamonjy, A. and Gasso, G. and Canu, S.}, title = {Selection de variables pour l'apprentissage simultanée de tâches}, booktitle = {Conférence en Apprentissage (CAp'09)}, year = {2009} }
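The joint variable selection effect of the convex (p = 1) mixed norm above can be seen directly from its proximal operator, which zeroes entire rows (variables) across all tasks at once. A minimal sketch, with names of my own choosing:

```python
import numpy as np

def prox_l1_l2(W, lam):
    """Proximal operator of the mixed l1-l2 norm lam * sum_j ||W[j, :]||_2
    (rows = variables, columns = tasks): row-wise group soft-thresholding.
    Zeroed rows are variables discarded jointly for all tasks; this
    illustrates the convex case only, not the paper's DC algorithm."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(1.0 - lam / np.maximum(norms, 1e-12), 0.0)
    return W * scale
```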
R. Flamary, A. Rakotomamonjy, G. Gasso, S. Canu, SVM Multi-Task Learning and Non convex Sparsity Measure, The Learning Workshop (Snowbird), 2009.
BibTeX:
@conference{snowbird09, author = {Flamary, R. and Rakotomamonjy, A. and Gasso, G. and Canu, S.}, title = {SVM Multi-Task Learning and Non convex Sparsity Measure}, howpublished = {The Learning Workshop (Snowbird)}, year = {2009} }
2008
R. Flamary, Filtrage de surfaces obtenues à partir de structures M-Rep (M-Rep obtained surface filtering), Laboratoire CREATIS-LRMN, INSA de Lyon, 2008. |
BibTeX:
@mastersthesis{mrep08, author = { Flamary, R.}, title = {Filtrage de surfaces obtenues à partir de structures M-Rep (M-Rep obtained surface filtering)}, school = { Laboratoire CREATIS-LRMN, INSA de Lyon}, year = {2008} } |