Rémi Flamary / Publications / Journals and book chapters

Submitted and preprints

S. Boïté, E. Tanguy, J. Delon, A. Desolneux, R. Flamary, Differentiable expectation-maximisation and applications to Gaussian mixture model optimal transport (Submitted), 2025.

Abstract: The Expectation-Maximisation (EM) algorithm is a central tool in statistics and machine learning, widely used for latent-variable models such as Gaussian Mixture Models (GMMs). Despite its ubiquity, EM is typically treated as a non-differentiable black box, preventing its integration into modern learning pipelines where end-to-end gradient propagation is essential. In this work, we present and compare several differentiation strategies for EM, from full automatic differentiation to approximate methods, assessing their accuracy and computational efficiency. As a key application, we leverage this differentiable EM in the computation of the Mixture Wasserstein distance MW2 between GMMs, allowing MW2 to be used as a differentiable loss in imaging and machine learning tasks. To complement our practical use of MW2, we contribute a novel stability result which provides theoretical justification for the use of MW2 with EM, and also introduce a novel unbalanced variant of MW2. Numerical experiments on barycentre computation, colour and style transfer, image generation, and texture synthesis illustrate the versatility of the proposed approach in different settings.

BibTeX:

@article{boite2025differentiable,
author = {Boïté, Samuel and Tanguy, Eloi and Delon, Julie and Desolneux, Agnès and Flamary, Rémi},
title = {Differentiable expectation-maximisation and applications to Gaussian mixture model optimal transport},
year = {2025 (Submitted)}
}

T. Gnassounou, A. Collas, R. Flamary, K. Lounici, A. Gramfort, Multi-Source and Test-Time Domain Adaptation on Multivariate Signals using Spatio-Temporal Monge Alignment (Submitted), 2024.

Abstract: Machine learning applications on signals such as computer vision or biomedical data often face significant challenges due to the variability that exists across hardware devices or session recordings. This variability poses a Domain Adaptation (DA) problem, as training and testing data distributions often differ. In this work, we propose Spatio-Temporal Monge Alignment (STMA) to mitigate these variabilities. This Optimal Transport (OT) based method adapts the cross-power spectrum density (cross-PSD) of multivariate signals by mapping them to the Wasserstein barycenter of source domains (multi-source DA). Predictions for new domains can be done with a filtering without the need for retraining a model with source data (test-time DA). We also study and discuss two special cases of the method, Temporal Monge Alignment (TMA) and Spatial Monge Alignment (SMA). Non-asymptotic concentration bounds are derived for the mappings estimation, which reveal a bias-plus-variance error structure with a variance decay rate of $O(n^{-1/2})$ with $n$ the signal length. This theoretical guarantee demonstrates the efficiency of the proposed computational schema. Numerical experiments on multivariate biosignals and image data show that STMA leads to significant and consistent performance gains between datasets acquired with very different settings. Notably, STMA is a pre-processing step complementary to state-of-the-art deep learning methods.

BibTeX:

@article{gnassounou2024multisourcetesttimedomainadaptation,
author = {Théo Gnassounou and Antoine Collas and Rémi Flamary and Karim Lounici and Alexandre Gramfort},
title = {Multi-Source and Test-Time Domain Adaptation on Multivariate Signals using Spatio-Temporal Monge Alignment},
year = {2024 (Submitted)}
}

C. Le Coz, A. Tantet, R. Flamary, R. Plougonven, A barycenter-based approach for the multi-model ensembling of subseasonal forecasts (Submitted), 2023.

Abstract: Ensemble forecasts and their combination are explored from the perspective of a probability space. Manipulating ensemble forecasts as discrete probability distributions, multi-model ensembles (MMEs) are reformulated as barycenters of these distributions. Barycenters are defined with respect to a given distance. The barycenter with respect to the L2-distance is shown to be equivalent to the pooling method. Then, the barycenter-based approach is extended to a different distance with interesting properties in the distribution space: the Wasserstein distance. Another interesting feature of the barycenter approach is the possibility to give different weights to the ensembles and so to naturally build weighted MMEs. As a proof of concept, the L2- and the Wasserstein-barycenters are applied to combine two models from the S2S database, namely the European Centre for Medium-Range Weather Forecasts (ECMWF) and the National Centers for Environmental Prediction (NCEP) models. The performance of the two (weighted-) MMEs is evaluated for the prediction of weekly 2m-temperature over Europe for seven winters. The weights given to the models in the barycenters are optimized with respect to two metrics, the CRPS and the proportion of skilful forecasts. These weights have an important impact on the skill of the two barycenter-based MMEs. Although the ECMWF model has an overall better performance than NCEP, the barycenter-ensembles are generally able to outperform both. However, the best MME method, as well as the weights, depends on the metric. These results constitute a promising first implementation of this methodology before moving to the combination of more models.

BibTeX:

@article{coz2023barycenterbased,
author = {Le Coz, Camille and Tantet, Alexis and Flamary, Rémi and Plougonven, Riwal},
title = {A barycenter-based approach for the multi-model ensembling of subseasonal forecasts},
year = {2023 (Submitted)}
}

2026

T. Germain, R. Flamary, V. R. Kostic, K. Lounici, A Spectral-Grassmann Wasserstein metric for operator representations of dynamical systems, International Conference on Learning Representations (ICLR), 2026.

Abstract: The geometry of dynamical systems estimated from trajectory data is a major challenge for machine learning applications. Koopman and transfer operators provide a linear representation of nonlinear dynamics through their spectral decomposition, offering a natural framework for comparison. We propose a novel approach representing each system as a distribution of its joint operator eigenvalues and spectral projectors and defining a metric between systems leveraging optimal transport. The proposed metric is invariant to the sampling frequency of trajectories. It is also computationally efficient, supported by finite-sample convergence guarantees, and enables the computation of Fréchet means, providing interpolation between dynamical systems. Experiments on simulated and real-world datasets show that our approach consistently outperforms standard operator-based distances in machine learning applications, including dimensionality reduction and classification, and provides meaningful interpolation between dynamical systems.

BibTeX:

@inproceedings{germain2026spectral,
author = {Thibaut Germain and Rémi Flamary and Vladimir R. Kostic and Karim Lounici},
title = {A Spectral-Grassmann Wasserstein metric for operator representations of dynamical systems},
booktitle = {International Conference on Learning Representations (ICLR)},
year = {2026}
}

V. Kondratyev, A. Fishkov, N. Kotelevskii, M. Hegazy, R. Flamary, M. Panov, E. Moulines, Neural Optimal Transport Meets Multivariate Conformal Prediction, International Conference on Learning Representations (ICLR), 2026.

Abstract: We propose a framework for conditional vector quantile regression (CVQR) that combines neural optimal transport with amortized optimization, and apply it to multivariate conformal prediction. Classical quantile regression does not extend naturally to multivariate responses, while existing approaches often ignore the geometry of joint distributions. Our method parametrizes the conditional vector quantile function as the gradient of a convex potential implemented by an input-convex neural network, ensuring monotonicity and uniform ranks. To reduce the cost of solving high-dimensional variational problems, we introduce amortized optimization of the dual potentials, yielding efficient training and faster inference. We then exploit the induced multivariate ranks for conformal prediction, constructing distribution-free predictive regions with finite-sample validity. Unlike coordinatewise methods, our approach adapts to the geometry of the conditional distribution, producing tighter and more informative regions. Experiments on benchmark datasets show improved coverage-efficiency trade-offs compared to baselines, highlighting the benefits of integrating neural optimal transport with conformal prediction.

BibTeX:

@inproceedings{kondratyev2026neural,
author = {Kondratyev, Vladimir and Fishkov, Alexander and Kotelevskii, Nikita and Hegazy, Mahmoud and Flamary, Rémi and Panov, Maxim and Moulines, Eric},
title = {Neural Optimal Transport Meets Multivariate Conformal Prediction},
booktitle = {International Conference on Learning Representations (ICLR)},
year = {2026}
}

T. Gnassounou, A. Collas, R. Flamary, A. Gramfort, PSDNorm: Test-Time Temporal Normalization for Deep Learning in Sleep Staging, International Conference on Learning Representations (ICLR), 2026.

Abstract: Distribution shift poses a significant challenge in machine learning, particularly in biomedical applications such as EEG signals collected across different subjects, institutions, and recording devices. While existing normalization layers (BatchNorm, LayerNorm and InstanceNorm) help address distribution shifts, they fail to capture the temporal dependencies inherent in temporal signals. In this paper, we propose PSDNorm, a layer that leverages Monge mapping and temporal context to normalize feature maps in deep learning models. Notably, the proposed method operates as a test-time domain adaptation technique, addressing distribution shifts without additional training. Evaluations on 10 sleep staging datasets using the U-Time model demonstrate that PSDNorm achieves state-of-the-art performance at test time on datasets not seen during training while being 4x more data-efficient than the best baseline. Additionally, PSDNorm provides a significant improvement in robustness, achieving markedly higher F1 scores for the 20% hardest subjects.

BibTeX:

@inproceedings{gnassounou2026psdnorm,
author = {Théo Gnassounou and Antoine Collas and Rémi Flamary and Alexandre Gramfort},
title = {PSDNorm: Test-Time Temporal Normalization for Deep Learning in Sleep Staging},
booktitle = {International Conference on Learning Representations (ICLR)},
year = {2026}
}

2025

Y. Lalou, T. Gnassounou, A. Collas, A. de Mathelin, O. Kachaiev, A. Odonnat, A. Gramfort, T. Moreau, R. Flamary, SKADA-Bench: Benchmarking Unsupervised Domain Adaptation Methods with Realistic Validation, Transactions on Machine Learning Research (TMLR), 2025.

Abstract: Unsupervised Domain Adaptation (DA) consists of adapting a model trained on a labeled source domain to perform well on an unlabeled target domain with some data distribution shift. While many methods have been proposed in the literature, fair and realistic evaluation remains an open question, particularly due to methodological difficulties in selecting hyperparameters in the unsupervised setting. With SKADA-Bench, we propose a framework to evaluate DA methods and present a fair evaluation of existing shallow algorithms, including reweighting, mapping, and subspace alignment. Realistic hyperparameter selection is performed with nested cross-validation and various unsupervised model selection scores, on both simulated datasets with controlled shifts and real-world datasets across diverse modalities, such as images, text, biomedical, and tabular data with specific feature extraction. Our benchmark highlights the importance of realistic validation and provides practical guidance for real-life applications, with key insights into the choice and impact of model selection approaches. SKADA-Bench is open-source, reproducible, and can be easily extended with novel DA methods, datasets, and model selection criteria without requiring re-evaluating competitors. SKADA-Bench is available on GitHub at

BibTeX:

@article{lalou2025skadabench,
author = {Yanis Lalou and Théo Gnassounou and Antoine Collas and de Mathelin, Antoine and Oleksii Kachaiev and Ambroise Odonnat and Alexandre Gramfort and Thomas Moreau and Rémi Flamary},
title = {SKADA-Bench: Benchmarking Unsupervised Domain Adaptation Methods with Realistic Validation},
journal = {Transactions on Machine Learning Research (TMLR)},
year = {2025}
}

H. Van Assel, C. Vincent-Cuaz, N. Courty, R. Flamary, P. Frossard, T. Vayer, Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein Projection, Transactions on Machine Learning Research (TMLR), 2025.

Abstract: Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets. Traditionally, this involves using dimensionality reduction methods to project data onto interpretable spaces or organizing points into meaningful clusters. In practice, these methods are used sequentially, without guaranteeing that the clustering aligns well with the conducted dimensionality reduction. In this work, we offer a fresh perspective: that of distributions. Leveraging tools from optimal transport, particularly the Gromov-Wasserstein distance, we unify clustering and dimensionality reduction into a single framework called distributional reduction. This allows us to jointly address clustering and dimensionality reduction with a single optimization problem. Through comprehensive experiments, we highlight the versatility and interpretability of our method and show that it outperforms existing approaches across a variety of image and genomics datasets.

BibTeX:

@article{vanassel2024distributional,
author = {Van Assel, Hugues and Cédric Vincent-Cuaz and Nicolas Courty and Rémi Flamary and Pascal Frossard and Titouan Vayer},
title = {Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein Projection},
journal = {Transactions on Machine Learning Research (TMLR)},
year = {2025}
}

2024

E. Tanguy, R. Flamary, J. Delon, Properties of Discrete Sliced Wasserstein Losses, Mathematics of Computation, 2024.

Abstract: The Sliced Wasserstein (SW) distance has become a popular alternative to the Wasserstein distance for comparing probability measures. Widespread applications include image processing, domain adaptation and generative modelling, where it is common to optimise some parameters in order to minimise SW, which serves as a loss function between discrete probability measures (since measures admitting densities are numerically unattainable). All these optimisation problems bear the same sub-problem, which is minimising the Sliced Wasserstein energy. In this paper we study the properties of the SW distance between two uniform discrete measures with the same number of points as a function $E$ of the support $Y$ of one of the measures. We investigate the regularity and optimisation properties of this energy, as well as its Monte-Carlo approximation $E_p$ (estimating the expectation in SW using only $p$ samples) and show convergence results on the critical points of $E_p$ to those of $E$, as well as an almost-sure uniform convergence. Finally, we show that in a certain sense, Stochastic Gradient Descent methods minimising $E$ and $E_p$ converge towards (Clarke) critical points of these energies.

BibTeX:

@article{tanguy2024properties,
author = {Eloi Tanguy and Rémi Flamary and Julie Delon},
title = {Properties of Discrete Sliced Wasserstein Losses},
journal = {Mathematics of Computation},
year = {2024}
}

A. Trovato, É. Chassande-Mottin, M. S. Bejger, R. Flamary, N. Courty, Neural network time-series classifiers for gravitational-wave searches in single-detector periods, Classical and Quantum Gravity, 2024.

Abstract: The search for gravitational-wave signals is limited by non-Gaussian transient noises that mimic astrophysical signals. Temporal coincidence between two or more detectors is used to mitigate contamination by these instrumental glitches. However, when a single detector is in operation, coincidence is impossible, and other strategies have to be used. We explore the possibility of using neural network classifiers and present the results obtained with three types of architectures: convolutional neural network, temporal convolutional network, and inception time. The last two architectures are specifically designed to process time-series data. The classifiers are trained on a month of data from the LIGO Livingston detector during the first observing run (O1) to identify data segments that include the signature of a binary black hole merger. Their performances are assessed and compared. We then apply trained classifiers to the remaining three months of O1 data, focusing specifically on single-detector times. The most promising candidate from our search is 2016-01-04 12:24:17 UTC. Although we are not able to constrain the significance of this event to the level conventionally followed in gravitational-wave searches, we show that the signal is compatible with the merger of two black holes with masses $m_1 = 50.7^{+10.4}_{-8.9}\,M_\odot$ and $m_2 = 24.4^{+20.2}_{-9.3}\,M_\odot$ at the luminosity distance of $d_L = 564^{+812}_{-338}$ Mpc.

BibTeX:

@article{trovato2024neural,
author = {Trovato, Agata and Chassande-Mottin, Éric and Bejger, Michal Stanislaw and Flamary, Rémi and Courty, Nicolas},
title = {Neural network time-series classifiers for gravitational-wave searches in single-detector periods},
journal = {Classical and Quantum Gravity},
year = {2024}
}

E. Tanguy, R. Flamary, J. Delon, Reconstructing discrete measures from projections. Consequences on the empirical Sliced Wasserstein Distance, Comptes Rendus. Mathématique, Vol. 362, pp 1121--1129, 2024.

Abstract: This paper deals with the reconstruction of a discrete measure $\gamma_Z$ on $\mathbb{R}^d$ from the knowledge of its pushforward measures $P_i\#\gamma_Z$ by linear maps $P_i: \mathbb{R}^d \rightarrow \mathbb{R}^{d_i}$ (for instance projections onto subspaces). The measure $\gamma_Z$ being fixed, assuming that the rows of the matrices $P_i$ are independent realizations of laws which do not give mass to hyperplanes, we show that if $\sum_i d_i > d$, this reconstruction problem almost surely has a unique solution. This holds for any number of points in $\gamma_Z$. A direct consequence of this result is an almost-sure separability property on the empirical Sliced Wasserstein distance.

BibTeX:

@article{tanguy2024reconstructing,
author = {Eloi Tanguy and Rémi Flamary and Julie Delon},
title = {Reconstructing discrete measures from projections. Consequences on the empirical Sliced Wasserstein Distance},
journal = {Comptes Rendus. Mathématique},
volume = {362},
pages = {1121--1129},
publisher = {Académie des sciences, Paris},
year = {2024}
}

2023

D. Bouche, R. Flamary, F. d'Alché-Buc, R. Plougonven, M. Clausel, J. Badosa, P. Drobinski, Wind power predictions from nowcasts to 4-hour forecasts: a learning approach with variable selection, Renewable Energy, 2023.

Abstract: We study the prediction of short-term wind speed and wind power (every 10 minutes up to 4 hours ahead). Accurate forecasts for those quantities are crucial to mitigate the negative effects of wind farms' intermittent production on energy systems and markets. For those time scales, outputs of numerical weather prediction models are usually overlooked even though they should provide valuable information on higher-scale dynamics. In this work, we combine those outputs with local observations using machine learning. So as to make the results usable for practitioners, we focus on simple and well-known methods which can handle a high volume of data. We first study variable selection through two simple techniques, a linear one and a nonlinear one. Then we exploit those results to forecast wind speed and wind power, still with an emphasis on linear models versus nonlinear ones. For the wind power prediction, we also compare the indirect approach (wind speed predictions passed through a power curve) and the direct one (directly predict wind power).

BibTeX:

@article{bouche2023wind,
author = {Bouche, Dimitri and Flamary, Rémi and d'Alché-Buc, Florence and Plougonven, Riwal and Clausel, Marianne and Badosa, Jordi and Drobinski, Philippe},
title = {Wind power predictions from nowcasts to 4-hour forecasts: a learning approach with variable selection},
journal = {Renewable Energy},
year = {2023}
}

2022

T. Vayer, L. Chapel, N. Courty, R. Flamary, Y. Soullard, R. Tavenard, Time Series Alignment with Global Invariances, Transactions on Machine Learning Research (TMLR), 2022.

Abstract: In this work we address the problem of comparing time series while taking into account both feature space transformation and temporal variability. The proposed framework combines a latent global transformation of the feature space with the widely used Dynamic Time Warping (DTW). The latent global transformation captures the feature invariance while the DTW (or its smooth counterpart soft-DTW) deals with the temporal shifts. We cast the problem as a joint optimization over the global transformation and the temporal alignments. The versatility of our framework allows for several variants depending on the invariance class at stake. Among our contributions we define a differentiable loss for time series and present two algorithms for the computation of time series barycenters under our new geometry. We illustrate the interest of our approach on both simulated and real world data.

BibTeX:

@article{vayer2022time,
author = {Titouan Vayer and Laetitia Chapel and Nicolas Courty and Rémi Flamary and Yann Soullard and Romain Tavenard},
title = {Time Series Alignment with Global Invariances},
journal = {Transactions on Machine Learning Research (TMLR)},
year = {2022}
}

L. Dragoni, R. Flamary, K. Lounici, P. Reynaud-Bouret, Sliding window strategy for convolutional spike sorting with Lasso: Algorithm, theoretical guarantees and complexity, Acta Applicandae Mathematicae, Vol. 179, N. 78, 2022.

Abstract: We present a fast algorithm for the resolution of the Lasso for convolutional models in high dimension, with a particular focus on the problem of spike sorting in neuroscience. Making use of biological properties related to neurons, we explain how the particular structure of the problem allows several optimizations, leading to an algorithm with a temporal complexity which grows linearly with respect to the size of the recorded signal and can be performed online. Moreover the spatial separability of the initial problem allows to break it into subproblems, further reducing the complexity and making possible its application on the latest recording devices which comprise a large number of sensors. We provide several mathematical results: the size and numerical complexity of the subproblems can be estimated mathematically by using percolation theory. We also show under reasonable assumptions that the Lasso estimator retrieves the true support with large probability. Finally the theoretical time complexity of the algorithm is given. Numerical simulations are also provided in order to illustrate the efficiency of our approach.

BibTeX:

@article{dragoni2022sliding,
author = {Dragoni, Laurent and Flamary, Rémi and Lounici, Karim and Reynaud-Bouret, Patricia},
title = {Sliding window strategy for convolutional spike sorting with Lasso: Algorithm, theoretical guarantees and complexity},
journal = {Acta Applicandae Mathematicae},
volume = {179},
number = {78},
year = {2022}
}

C. Brouard, J. Mariette, R. Flamary, N. Vialaneix, Feature selection for kernel methods in systems biology, NAR Genomics and Bioinformatics, Vol. 4, N. 1, pp lqac014, 2022.

Abstract: The substantial development of high-throughput bio-technologies has rendered large-scale multi-omics datasets increasingly available. New challenges have emerged to process and integrate this large volume of information, often obtained from widely heterogeneous sources. Kernel methods have proven successful in handling the analysis of different types of datasets obtained on the same individuals. However, they usually suffer from a lack of interpretability since the original description of the individuals is lost due to the kernel embedding. We propose novel feature selection methods that are adapted to the kernel framework and go beyond the well-established work in supervised learning by addressing the more difficult tasks of unsupervised learning and kernel output learning. The method is expressed in the form of a non-convex optimization problem with an L1 penalty, which is solved with a proximal gradient descent approach. It is tested on several systems biology datasets and shows good performance in selecting relevant and less redundant features compared to existing alternatives. It also proved relevant for identifying important governmental measures best explaining the time series of the Covid-19 reproduction number evolution during the first months of 2020. The proposed feature selection method is embedded in the R package mixKernel version 0.7, published on CRAN.

BibTeX:

@article{brouard2022feature,
author = {Brouard, Céline and Mariette, Jérôme and Flamary, Rémi and Vialaneix, Nathalie},
title = {Feature selection for kernel methods in systems biology},
journal = {NAR Genomics and Bioinformatics},
volume = {4},
number = {1},
pages = {lqac014},
publisher = {Oxford University Press},
year = {2022}
}

2021

A. Rakotomamonjy, R. Flamary, G. Gasso, M. Z. Alaya, M. Berar, N. Courty, Optimal Transport for Conditional Domain Matching and Label Shift, Machine Learning, 2021.

Abstract: We address the problem of unsupervised domain adaptation under the setting of generalized target shift (both class-conditional and label shifts occur). We show that in that setting, for good generalization, it is necessary to learn with similar source and target label distributions and to match the class-conditional probabilities. For this purpose, we propose an estimation of target label proportion by blending mixture estimation and optimal transport. This estimation comes with theoretical guarantees of correctness. Based on the estimation, we learn a model by minimizing an importance-weighted loss and a Wasserstein distance between weighted marginals. We prove that this minimization allows matching class-conditionals given mild assumptions on their geometry. Our experimental results show that our method performs better on average than competitors across a range of domain adaptation problems including digits, VisDA and Office.

BibTeX:

@article{rakotomamonjy2021optimal,
author = {Rakotomamonjy, Alain and Flamary, Rémi and Gasso, Gilles and Alaya, Mokhtar Z and Berar, Maxime and Courty, Nicolas},
title = {Optimal Transport for Conditional Domain Matching and Label Shift},
journal = {Machine Learning},
year = {2021}
}

R. Flamary, N. Courty, A. Gramfort, M. Z. Alaya, A. Boisbunon, S. Chambon, L. Chapel, A. Corenflos, K. Fatras, N. Fournier, L. Gautheron, N. T. Gayraud, H. Janati, A. Rakotomamonjy, I. Redko, A. Rolet, A. Schutz, V. Seguy, D. J. Sutherland, R. Tavenard, A. Tong, T. Vayer, POT: Python Optimal Transport, Journal of Machine Learning Research, Vol. 22, N. 78, pp 1-8, 2021.

Abstract: Optimal transport has recently been reintroduced to the machine learning community thanks in part to novel efficient optimization procedures allowing for medium to large scale applications. We propose a Python toolbox that implements several key optimal transport ideas for the machine learning community. The toolbox contains implementations of a number of founding works of OT for machine learning such as Sinkhorn algorithm and Wasserstein barycenters, but also provides generic solvers that can be used for conducting novel fundamental research. This toolbox, named POT for Python Optimal Transport, is open source with an MIT license.

BibTeX:

@article{flamary2021pot,
author = {Rémi Flamary and Nicolas Courty and Alexandre Gramfort and Mokhtar Z. Alaya and Aurélie Boisbunon and Stanislas Chambon and Laetitia Chapel and Adrien Corenflos and Kilian Fatras and Nemo Fournier and Léo Gautheron and Nathalie T.H. Gayraud and Hicham Janati and Alain Rakotomamonjy and Ievgen Redko and Antoine Rolet and Antony Schutz and Vivien Seguy and Danica J. Sutherland and Romain Tavenard and Alexander Tong and Titouan Vayer},
title = {POT: Python Optimal Transport},
journal = {Journal of Machine Learning Research},
volume = {22},
number = {78},
pages = {1-8},
year = {2021}
}

J.C. Burnel, K. Fatras, R. Flamary, N. Courty, Generating natural adversarial Remote Sensing Images, IEEE Transactions on Geoscience and Remote Sensing, 2021.

Abstract: Over the last few years, Remote Sensing Images (RSI) analysis has started resorting to deep neural networks to solve most of the commonly faced problems, such as detection, land cover classification or segmentation. As critical decision making can be based upon the results of RSI analysis, it is important to clearly identify and understand potential security threats occurring in those machine learning algorithms. Notably, it has recently been found that neural networks are particularly sensitive to carefully designed attacks, generally crafted given the full knowledge of the considered deep network. In this paper, we consider the more realistic but challenging case where one wants to generate such attacks in the case of a black-box neural network. In this case, only the prediction score of the network is accessible, given a specific input. Examples that lure away the network's prediction, while being perceptually similar to real images, are called natural or unrestricted adversarial examples. We present an original method to generate such examples, based on a variant of the Wasserstein Generative Adversarial Network. We demonstrate its effectiveness on natural adversarial hyper-spectral image generation and image modification for fooling a state-of-the-art detector. Among others, we also conduct a perceptual evaluation with human annotators to better assess the effectiveness of the proposed method.

BibTeX:

@article{burnel2021generating,
author = {Burnel, Jean-Christophe and Fatras, Kilian and Flamary, Rémi and Courty, Nicolas},
title = {Generating natural adversarial Remote Sensing Images},
journal = {IEEE Transactions on Geoscience and Remote Sensing},
year = {2021}
}

K. Fatras, B. Bhushan Damodaran, S. Lobry, R. Flamary, D. Tuia, N. Courty, Wasserstein Adversarial Regularization for learning with label noise, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021.

Abstract: Noisy labels often occur in vision datasets, especially when they are obtained from crowdsourcing or Web scraping. We propose a new regularization method, which enables learning robust classifiers in presence of noisy data. To achieve this goal, we propose a new adversarial regularization scheme based on the Wasserstein distance. Using this distance allows taking into account specific relations between classes by leveraging the geometric properties of the labels space. Our Wasserstein Adversarial Regularization (WAR) encodes a selective regularization, which promotes smoothness of the classifier between some classes, while preserving sufficient complexity of the decision boundary between others. We first discuss how and why adversarial regularization can be used in the context of label noise and then show the effectiveness of our method on five datasets corrupted with noisy labels: in both benchmarks and real datasets, WAR outperforms the state-of-the-art competitors.

BibTeX:

@article{damodaran2021wasserstein,
author = {Fatras, Kilian and Bhushan Damodaran, Bharath and Lobry, Sylvain and Flamary, Rémi and Tuia, Devis and Courty, Nicolas},
title = {Wasserstein Adversarial Regularization for learning with label noise},
journal = {Pattern Analysis and Machine Intelligence, IEEE Transactions on},
year = {2021}
}

2020

T. Vayer, L. Chapel, R. Flamary, R. Tavenard, N. Courty, Fused Gromov-Wasserstein Distance for Structured Objects, Algorithms, Vol. 13 (9), pp 212, 2020.

[Abstract] [BibTeX] [DOI] [PDF] [Code]

Abstract: Optimal transport theory has recently found many applications in machine learning thanks to its capacity to meaningfully compare various machine learning objects that are viewed as distributions. The Kantorovitch formulation, leading to the Wasserstein distance, focuses on the features of the elements of the objects, but treats them independently, whereas the Gromov–Wasserstein distance focuses on the relations between the elements, depicting the structure of the object, yet discarding its features. In this paper, we study the Fused Gromov-Wasserstein distance that extends the Wasserstein and Gromov–Wasserstein distances in order to encode simultaneously both the feature and structure information. We provide the mathematical framework for this distance in the continuous setting, prove its metric and interpolation properties, and provide a concentration result for the convergence of finite samples. We also illustrate and interpret its use in various applications, where structured objects are involved.

BibTeX:

@article{vayer2020fused,
author = {Vayer, Titouan and Chapel, Laetitia and Flamary, Rémi and Tavenard, Romain and Courty, Nicolas},
title = {Fused Gromov-Wasserstein Distance for Structured Objects},
journal = { Algorithms},
volume = {13 (9)},
pages = {212},
year = {2020}
}

2019

R. Rougeot, R. Flamary, D. Mary, C. Aime, Influence of surface roughness on diffraction in the externally occulted Lyot solar coronagraph, Astronomy and Astrophysics, 2019.

[Abstract] [BibTeX] [DOI] [PDF] [Code]

Abstract: Context. The solar coronagraph ASPIICS will fly on the future ESA formation flying mission Proba-3. The instrument combines an external occulter of diameter 1.42m and a Lyot solar coronagraph of 5cm diameter, located downstream at a distance of 144m. Aims. The theoretical performance of the externally occulted Lyot coronagraph has been computed by assuming perfect optics. In this paper, we improve related modelling by introducing roughness scattering effects from the telescope. We have computed the diffraction at the detector, which we compare to the ideal case without perturbation to estimate the performance degradation. We have also investigated the influence of sizing the internal occulter and the Lyot stop, and we performed a sensitivity analysis on the roughness. Methods. We have built on a recently published numerical model of diffraction propagation. The micro-structures of the telescope are built by filtering a white noise with a power spectral density following an isotropic ABC function, suggested by Harvey scatter theory. The parameters were tuned to fit experimental data measured on ASPIICS lenses. The computed wave front error was included in the Fresnel wave propagation of the coronagraph. A circular integration over the solar disk was performed to reconstruct the complete diffraction intensity. Results. The level of micro-roughness is 1.92nm root-mean-square. Compared to the ideal case, in the plane of the internal occulter, the diffraction peak intensity is reduced by ≃ 0.001%. However, the intensity outside the peak increases by 12% on average, up to 20% at 3R⊙, where the mask does not filter out the diffraction. At detector level, the diffraction peak remains ≃ 10^-6 at 1.1R⊙, similar to the ideal case, but the diffraction tail at large solar radius is much higher, up to one order of magnitude. Sizing the internal occulter and the Lyot stop does not improve the rejection, as opposed to the ideal case. Conclusions. Besides these results, this paper provides a methodology to implement roughness scattering in the wave propagation model for the solar coronagraph.

BibTeX:

@article{rougeot2019influence,
author = { Rougeot, Raphael and Flamary, Remi and Mary, David and Aime, Claude},
title = {Influence of surface roughness on diffraction in the externally occulted Lyot solar coronagraph},
journal = { Astronomy and Astrophysics},
year = {2019}
}

B. B. Damodaran, R. Flamary, V. Seguy, N. Courty, An Entropic Optimal Transport Loss for Learning Deep Neural Networks under Label Noise in Remote Sensing Images, Computer Vision and Image Understanding, 2019.

[Abstract] [BibTeX] [PDF]

Abstract: Deep neural networks have established themselves as a powerful tool for large scale supervised classification tasks. The state-of-the-art performances of deep neural networks are conditioned on the availability of a large number of accurately labeled samples. In practice, collecting large scale accurately labeled datasets is a challenging and tedious task in most scenarios of remote sensing image analysis, thus cheap surrogate procedures are employed to label the dataset. Training deep neural networks on such datasets with inaccurate labels easily overfits to the noisy training labels and degrades the performance of the classification tasks drastically. To mitigate this effect, we propose an original solution with entropic optimal transportation. It allows learning, in an end-to-end fashion, deep neural networks that are, to some extent, robust to inaccurately labeled samples. We demonstrate this empirically on several remote sensing datasets, where both scene and pixel-based hyperspectral images are considered for classification. Our method proves to be highly tolerant to significant amounts of label noise and achieves favorable results against state-of-the-art methods.

BibTeX:

@article{damodaran2019entropic,
author = {B. Damodaran, Bharath and Flamary, Rémi and Seguy, Vivien and Courty, Nicolas},
title = {An Entropic Optimal Transport Loss for Learning Deep Neural Networks under Label Noise in Remote Sensing Images},
journal = {Computer Vision and Image Understanding},
year = {2019}
}

R. B. Metcalf, M. Meneghetti, C. Avestruz, F. Bellagamba, C. R. Bom, E. Bertin, R. Cabanac, E. Decencière, R. Flamary, R. Gavazzi, others, The Strong Gravitational Lens Finding Challenge, Astronomy and Astrophysics, Vol. 625, pp A119, 2019.

[Abstract] [BibTeX] [DOI] [PDF]

Abstract: Large scale imaging surveys will increase the number of galaxy-scale strong lensing candidates by maybe three orders of magnitude beyond the number known today. Finding these rare objects will require picking them out of at least tens of millions of images, and deriving scientific results from them will require quantifying the efficiency and bias of any search method. To achieve these objectives, automated methods must be developed. Because gravitational lenses are rare objects, reducing false positives will be particularly important. We present a description and results of an open gravitational lens finding challenge. Participants were asked to classify 100,000 candidate objects as to whether they were gravitational lenses or not, with the goal of developing better automated methods for finding lenses in large data sets. A variety of methods were used including visual inspection, arc and ring finders, support vector machines (SVM) and convolutional neural networks (CNN). We find that many of the methods will be easily fast enough to analyse the anticipated data flow. In test data, several methods are able to identify upwards of half the lenses after applying some thresholds on the lens characteristics such as lensed image brightness, size or contrast with the lens galaxy without making a single false-positive identification. This is significantly better than direct inspection by humans was able to do. (abridged)

BibTeX:

@article{metcalf2019strong,
author = {Metcalf, R Benton and Meneghetti, M and Avestruz, Camille and Bellagamba, Fabio and Bom, Clécio R and Bertin, Emmanuel and Cabanac, Rémi and Decencière, Etienne and Flamary, Rémi and Gavazzi, Raphael and others},
title = {The Strong Gravitational Lens Finding Challenge},
journal = {Astronomy and Astrophysics},
volume = {625},
pages = {A119},
publisher = {EDP Sciences},
year = {2019}
}

2018

I. Harrane, R. Flamary, C. Richard, On reducing the communication cost of the diffusion LMS algorithm, IEEE Transactions on Signal and Information Processing over Networks (SIPN), Vol. 5, pp 100-112, 2018.

[Abstract] [BibTeX] [DOI] [PDF]

Abstract: The rise of digital and mobile communications has recently made the world more connected and networked, resulting in an unprecedented volume of data flowing between sources, data centers, or processes. While these data may be processed in a centralized manner, it is often more suitable to consider distributed strategies such as diffusion as they are scalable and can handle large amounts of data by distributing tasks over networked agents. Although it is relatively simple to implement diffusion strategies over a cluster, it appears to be challenging to deploy them in an ad-hoc network with limited energy budget for communication. In this paper, we introduce a diffusion LMS strategy that significantly reduces communication costs without compromising the performance. Then, we analyze the proposed algorithm in the mean and mean-square sense. Next, we conduct numerical experiments to confirm the theoretical findings. Finally, we perform large scale simulations to test the algorithm efficiency in a scenario where energy is limited.

BibTeX:

@article{harrane2018reducing,
author = {Harrane, Ibrahim and Flamary, R. and Richard, C.},
title = {On reducing the communication cost of the diffusion LMS algorithm},
journal = {IEEE Transactions on Signal and Information Processing over Networks (SIPN)},
volume = {5},
pages = {100-112},
year = {2018}
}

R. Flamary, M. Cuturi, N. Courty, A. Rakotomamonjy, Wasserstein Discriminant Analysis, Machine Learning, Vol. 107, pp 1923-1945, 2018.

[Abstract] [BibTeX] [DOI] [PDF] [Code]

Abstract: Wasserstein Discriminant Analysis (WDA) is a new supervised method that can improve classification of high-dimensional data by computing a suitable linear map onto a lower dimensional subspace. Following the blueprint of classical Linear Discriminant Analysis (LDA), WDA selects the projection matrix that maximizes the ratio of two quantities: the dispersion of projected points coming from different classes, divided by the dispersion of projected points coming from the same class. To quantify dispersion, WDA uses regularized Wasserstein distances, rather than cross-variance measures which have been usually considered, notably in LDA. Thanks to the underlying principles of optimal transport, WDA is able to capture both global (at distribution scale) and local (at samples scale) interactions between classes. Regularized Wasserstein distances can be computed using the Sinkhorn matrix scaling algorithm; we show that the optimization of WDA can be tackled using automatic differentiation of Sinkhorn iterations. Numerical experiments show promising results both in terms of prediction and visualization on toy examples and real life datasets such as MNIST and on deep features obtained from a subset of the Caltech dataset.
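
As a rough illustration of the Sinkhorn matrix scaling mentioned in this abstract, here is a minimal NumPy sketch of the forward iterations for an entropy-regularized transport plan. It is a generic textbook version, not the authors' implementation: the function name `sinkhorn`, the parameter names, the toy data and the stopping rule are all assumptions made for the example, and the automatic differentiation step used in WDA is not shown.

```python
import numpy as np

def sinkhorn(a, b, M, reg, n_iter=1000, tol=1e-9):
    """Entropy-regularized transport plan between histograms a and b.

    a, b : 1-D arrays of positive weights, each summing to 1
    M    : cost matrix of shape (len(a), len(b))
    reg  : strength of the entropic regularization (> 0)
    """
    K = np.exp(-M / reg)              # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        v = b / (K.T @ u)             # rescale columns to match b
        u = a / (K @ v)               # rescale rows to match a
        P = u[:, None] * K * v[None, :]
        # rows match a after the u update, so only columns are checked
        if np.abs(P.sum(axis=0) - b).max() < tol:
            break
    return P

# Toy data (hypothetical): two small point clouds with uniform weights.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))
y = rng.normal(size=(7, 2)) + 1.0
M = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
M = M / M.max()                       # normalize costs for numerical stability
a = np.full(5, 1 / 5)
b = np.full(7, 1 / 7)

P = sinkhorn(a, b, M, reg=0.5)
cost = float((P * M).sum())           # transport cost of the regularized plan
```

The returned plan `P` has row sums `a` and column sums `b` up to the tolerance; every step is a differentiable array operation, which is what makes differentiating through the iterations possible in an autodiff framework.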

BibTeX:

@article{flamary2017wasserstein,
author = {Flamary, Remi and Cuturi, Marco and Courty, Nicolas and Rakotomamonjy, Alain},
title = {Wasserstein Discriminant Analysis},
journal = {Machine Learning},
volume = {107},
pages = {1923-1945},
year = {2018}
}

2017

P. Hartley, R. Flamary, N. Jackson, A. S. Tagore, R. B. Metcalf, Support Vector Machine classification of strong gravitational lenses, Monthly Notices of the Royal Astronomical Society (MNRAS), 2017.

[Abstract] [BibTeX] [DOI] [PDF]

Abstract: The imminent advent of very large-scale optical sky surveys, such as Euclid and LSST, makes it important to find efficient ways of discovering rare objects such as strong gravitational lens systems, where a background object is multiply gravitationally imaged by a foreground mass. As well as finding the lens systems, it is important to reject false positives due to intrinsic structure in galaxies, and much work is in progress with machine learning algorithms such as neural networks in order to achieve both these aims. We present and discuss a Support Vector Machine (SVM) algorithm which makes use of a Gabor filterbank in order to provide learning criteria for separation of lenses and non-lenses, and demonstrate using blind challenges that under certain circumstances it is a particularly efficient algorithm for rejecting false positives. We compare the SVM engine with a large-scale human examination of 100000 simulated lenses in a challenge dataset, and also apply the SVM method to survey images from the Kilo-Degree Survey.

BibTeX:

@article{hartley2017support,
author = {Hartley, Philippa and Flamary, Remi and Jackson, Neal and Tagore, A. S. and Metcalf, R. B.},
title = {Support Vector Machine classification of strong gravitational lenses},
journal = {Monthly Notices of the Royal Astronomical Society (MNRAS)},
year = {2017}
}

R. Rougeot, R. Flamary, D. Galano, C. Aime, Performance of hybrid externally occulted Lyot solar coronagraph, Application to ASPIICS, Astronomy and Astrophysics, 2017.

[Abstract] [BibTeX] [DOI] [PDF] [Code]

Abstract: Context. The future ESA Formation Flying mission Proba-3 will fly the solar coronagraph ASPIICS which couples a Lyot coronagraph of 50mm and an external occulter of 1.42m diameter set 144m before. Aims. We perform a numerical study on the theoretical performance of a hybrid coronagraph such as ASPIICS. In this system, an internal occulter is set on the image of the external occulter instead of a Lyot mask on the solar image. First, we determine the rejection due to the external occulter alone. Second, the effects of sizing the internal occulter and the Lyot stop are analyzed. This work also applies to the classical Lyot coronagraph alone and the external solar coronagraph. Methods. The numerical computation uses the parameters of ASPIICS. First we take the approach of Aime, C. 2013, A&A 558, A138, to express the wave front from Fresnel diffraction at the entrance aperture of the Lyot coronagraph. From there, each wave front coming from a given point of the Sun is propagated through the Lyot coronagraph in three steps, from the aperture to the image of the external occulter, where the internal occulter is set, from this plane to the image of the entrance aperture, where the Lyot stop is set, and from there to the final observing plane. Making use of the axis-symmetry, wave fronts originating from one radius of the Sun are computed and the intensities circularly averaged. Results. As expected, the image of the external occulter appears as a bright circle, which locally exceeds the brightness of the Sun observed without external occulter. However, residual sunlight is below 10e-8 outside 1.5R. The Lyot coronagraph effectively complements the external occultation. At the expense of a small reduction in flux and resolution, reducing the Lyot stop allows a clear gain in rejection. Oversizing the internal occulter produces a similar effect but tends to exclude observations very close to the limb.
We provide a graph that allows simply estimating the performance as a function of sizes of the internal occulter and Lyot stop.

BibTeX:

@article{rougeot2016performance,
author = { Rougeot, Raphael and Flamary, Remi and Galano, Damien and Aime, Claude},
title = {Performance of hybrid externally occulted Lyot solar coronagraph, Application to ASPIICS},
journal = { Astronomy and Astrophysics},
year = {2017}
}

2016

N. Courty, R. Flamary, D. Tuia, A. Rakotomamonjy, Optimal transport for domain adaptation, Pattern Analysis and Machine Intelligence, IEEE Transactions on, 2016.

[Abstract] [BibTeX] [DOI] [PDF] [Supp] [Slides] [Code]

Abstract: Domain adaptation is one of the most challenging tasks of modern data analytics. If the adaptation is done correctly, models built on a specific data representation become more robust when confronted with data depicting the same semantic concepts (the classes), but observed by another observation system with its own specificities. Among the many strategies proposed to adapt a domain to another, finding domain-invariant representations has shown excellent properties, as a single classifier can use labelled samples from the source domain under this representation to predict the unlabelled samples of the target domain. In this paper, we propose a regularized unsupervised optimal transportation model to perform the alignment of the representations in the source and target domains. We learn a transportation plan matching both PDFs, which constrains labelled samples in the source domain to remain close during transport. This way, we exploit at the same time the limited labelled information in the source domain and the distributions of the input/observation variables observed in both domains. Experiments on toy and challenging real visual adaptation examples show the interest of the method, which consistently outperforms state-of-the-art approaches.

BibTeX:

@article{courty2016optimal,
author = {Courty, N. and Flamary, R. and Tuia, D. and Rakotomamonjy, A.},
title = {Optimal transport for domain adaptation},
journal = { Pattern Analysis and Machine Intelligence, IEEE Transactions on },
year = {2016}
}

D. Tuia, R. Flamary, M. Barlaud, Non-convex regularization in remote sensing, Geoscience and Remote Sensing, IEEE Transactions on, 2016.

[Abstract] [BibTeX] [PDF] [Code]

Abstract: In this paper, we study the effect of different regularizers and their implications in high dimensional image classification and sparse linear unmixing. Although kernelization or sparse methods are globally accepted solutions for processing data in high dimensions, we present here a study on the impact of the form of regularization used and its parametrization. We consider regularization via traditional squared (l2) and sparsity-promoting (l1) norms, as well as more unconventional nonconvex regularizers (lp and Log Sum Penalty). We compare their properties and advantages on several classification and linear unmixing tasks and provide advice on the choice of the best regularizer for the problem at hand. Finally, we also provide a fully functional toolbox for the community.

BibTeX:

@article{tuia2016nonconvex,
author = {Tuia, D. and Flamary, R. and Barlaud, M.},
title = {Non-convex regularization in remote sensing},
journal = {Geoscience and Remote Sensing, IEEE Transactions on},
year = {2016}
}

A. Rakotomamonjy, R. Flamary, G. Gasso, DC Proximal Newton for Non-Convex Optimization Problems, Neural Networks and Learning Systems, IEEE Transactions on, Vol. 27, N. 3, pp 636-647, 2016.

[Abstract] [BibTeX] [DOI] [PDF] [Code]

Abstract: We introduce a novel algorithm for solving learning problems where both the loss function and the regularizer are non-convex but belong to the class of difference of convex (DC) functions. Our contribution is a new general purpose proximal Newton algorithm that is able to deal with such a situation. The algorithm consists in obtaining a descent direction from an approximation of the loss function and then in performing a line search to ensure sufficient descent. A theoretical analysis is provided showing that the iterates of the proposed algorithm admit as limit points stationary points of the DC objective function. Numerical experiments show that our approach is more efficient than current state of the art for a problem with a convex loss function and a non-convex regularizer. We have also illustrated the benefit of our algorithm in a high-dimensional transductive learning problem where both the loss function and the regularizers are non-convex.

BibTeX:

@article{rakoto2015dcprox,
author = { Rakotomamonjy, A. and Flamary, R. and Gasso, G.},
title = {DC Proximal Newton for Non-Convex Optimization Problems},
journal = { Neural Networks and Learning Systems, IEEE Transactions on},
volume = {27},
number = {3},
pages = {636-647},
year = {2016}
}

2015

D. Tuia, R. Flamary, N. Courty, Multiclass feature learning for hyperspectral image classification: sparse and hierarchical solutions, ISPRS Journal of Photogrammetry and Remote Sensing, 2015.

[Abstract] [BibTeX] [DOI] [PDF] [Code]

Abstract: In this paper, we tackle the question of discovering an effective set of spatial filters to solve hyperspectral classification problems. Instead of fixing a priori the filters and their parameters using expert knowledge, we let the model find them within random draws in the (possibly infinite) space of possible filters. We define an active set feature learner that includes in the model only features that improve the classifier. To this end, we consider a fast and linear classifier, multiclass logistic classification, and show that with a good representation (the filters discovered), such a simple classifier can reach at least state-of-the-art performances. We apply the proposed active set learner in four hyperspectral image classification problems, including agricultural and urban classification at different resolutions, as well as multimodal data. We also propose a hierarchical setting, which allows generating more complex banks of features that can better describe the nonlinearities present in the data.

BibTeX:

@article{tuia2015multiclass,
author = {Tuia, D. and Flamary, R. and Courty, N.},
title = {Multiclass feature learning for hyperspectral image classification: sparse and hierarchical solutions},
journal = {ISPRS Journal of Photogrammetry and Remote Sensing},
year = {2015}
}

R. Flamary, M. Fauvel, M. Dalla Mura, S. Valero, Analysis of multi-temporal classification techniques for forecasting image times series, Geoscience and Remote Sensing Letters (GRSL), Vol. 12, N. 5, pp 953-957, 2015.

[Abstract] [BibTeX] [DOI] [PDF]

Abstract: The classification of an annual time series by using data from past years is investigated in this paper. Several classification schemes based on data fusion, sparse learning and semi-supervised learning are proposed to address the problem. Numerical experiments are performed on a MODIS image time series and show that while several approaches have statistically equivalent performances, SVM with l1 regularization leads to a better interpretation of the results due to its inherent sparsity in the temporal domain.

BibTeX:

@article{flamary2014analysis,
author = { Flamary, R. and Fauvel, M. and Dalla Mura, M. and Valero, S.},
title = {Analysis of multi-temporal classification techniques for forecasting image times series},
journal = { Geoscience and Remote Sensing Letters (GRSL)},
volume = {12},
number = {5},
pages = {953-957},
year = {2015}
}

2014

R. Flamary, C. Aime, Optimization of starshades: focal plane versus pupil plane, Astronomy and Astrophysics, Vol. 569, N. A28, pp 10, 2014.

[Abstract] [BibTeX] [URL] [DOI] [PDF]

Abstract: We search for the best possible transmission for an external occulter coronagraph that is dedicated to the direct observation of terrestrial exoplanets. We show that better observation conditions are obtained when the flux in the focal plane is minimized in the zone in which the exoplanet is observed, instead of the total flux received by the telescope. We describe the transmission of the occulter as a sum of basis functions. For each element of the basis, we numerically computed the Fresnel diffraction at the aperture of the telescope and the complex amplitude at its focus. The basis functions are circular disks that are linearly apodized over a few centimeters (truncated cones). We complemented the numerical calculation of the Fresnel diffraction for these functions by a comparison with pure circular discs (cylinder) for which an analytical expression, based on a decomposition in Lommel series, is available. The technique of deriving the optimal transmission for a given spectral bandwidth is a classical regularized quadratic minimization of intensities, but linear optimizations can be used as well. Minimizing the integrated intensity on the aperture of the telescope or for selected regions of the focal plane leads to slightly different transmissions for the occulter. For the focal plane optimization, the resulting residual intensity is concentrated behind the geometrical image of the occulter, in a blind region for the observation of an exoplanet, and the level of background residual starlight becomes very low outside this image. Finally, we provide a tolerance analysis for the alignment of the occulter to the telescope which also favors the focal plane optimization. This means that telescope offsets of a few decimeters do not strongly reduce the efficiency of the occulter.

BibTeX:

@article{flamary2014starshade,
author = { Flamary, Remi and Aime, Claude},
title = {Optimization of starshades: focal plane versus pupil plane},
journal = { Astronomy and Astrophysics},
volume = {569},
number = {A28},
pages = { 10},
year = {2014}
}

R. Flamary, N. Jrad, R. Phlypo, M. Congedo, A. Rakotomamonjy, Mixed-Norm Regularization for Brain Decoding, Computational and Mathematical Methods in Medicine, Vol. 2014, N. 1, pp 1-13, 2014.

[Abstract] [BibTeX] [DOI] [PDF] [Slides] [Code]

Abstract: This work investigates the use of mixed-norm regularization for sensor selection in event-related potential (ERP) based brain-computer interfaces (BCI). The classification problem is cast as a discriminative optimization framework where sensor selection is induced through the use of mixed-norms. This framework is extended to the multitask learning situation where several similar classification tasks related to different subjects are learned simultaneously. In this case, multitask learning helps mitigate the data scarcity issue, yielding more robust classifiers. For this purpose, we have introduced a regularizer that induces both sensor selection and classifier similarities. The different regularization approaches are compared on three ERP datasets, showing the interest of mixed-norm regularization in terms of sensor selection. The multitask approaches are evaluated when a small number of learning examples are available, yielding significant performance improvements, especially for subjects performing poorly.

BibTeX:

@article{flamary2014mixed,
author = {Flamary, R. and Jrad, N. and Phlypo, R. and Congedo, M. and Rakotomamonjy, A.},
title = {Mixed-Norm Regularization for Brain Decoding},
journal = {Computational and Mathematical Methods in Medicine},
volume = {2014},
number = {1},
pages = {1-13},
year = {2014}
}

E. Niaf, R. Flamary, O. Rouvière, C. Lartizien, S. Canu, Kernel-Based Learning From Both Qualitative and Quantitative Labels: Application to Prostate Cancer Diagnosis Based on Multiparametric MR Imaging, Image Processing, IEEE Transactions on, Vol. 23, N. 3, pp 979-991, 2014.

[Abstract] [BibTeX] [DOI] [PDF] [Code]

Abstract: Building an accurate training database is challenging in supervised classification. For instance, in medical imaging, radiologists often delineate malignant and benign tissues without access to the histological ground truth, leading to uncertain data sets. This paper addresses the pattern classification problem arising when available target data include some uncertainty information. Target data considered here are both qualitative (a class label) and quantitative (an estimation of the posterior probability). In this context, usual discriminative methods, such as the support vector machine (SVM), fail either to learn a robust classifier or to predict accurate probability estimates. We generalize the regular SVM by introducing a new formulation of the learning problem to take into account class labels as well as class probability estimates. This original reformulation into a probabilistic SVM (P-SVM) can be efficiently solved by adapting existing flexible SVM solvers. Furthermore, this framework allows deriving a unique learned prediction function for both decision and posterior probability estimation providing qualitative and quantitative predictions. The method is first tested on synthetic data sets to evaluate its properties as compared with the classical SVM and fuzzy-SVM. It is then evaluated on a clinical data set of multiparametric prostate magnetic resonance images to assess its performances in discriminating benign from malignant tissues. P-SVM is shown to outperform classical SVM as well as the fuzzy-SVM in terms of probability predictions and classification performances, and demonstrates its potential for the design of an efficient computer-aided decision system for prostate cancer diagnosis based on multiparametric magnetic resonance (MR) imaging.

BibTeX:

@article{niaf2014kernel,
author = {Niaf, E. and Flamary, R. and Rouvière, O. and Lartizien, C. and Canu, S.},
title = {Kernel-Based Learning From Both Qualitative and Quantitative Labels: Application to Prostate Cancer Diagnosis Based on Multiparametric MR Imaging},
journal = {Image Processing, IEEE Transactions on},
volume = {23},
number = {3},
pages = {979-991},
year = {2014}
}

D. Tuia, M. Volpi, M. Dalla Mura, A. Rakotomamonjy, R. Flamary, Automatic Feature Learning for Spatio-Spectral Image Classification With Sparse SVM, Geoscience and Remote Sensing, IEEE Transactions on, Vol. 52, N. 10, pp 6062-6074, 2014.

[Abstract] [BibTeX] [URL] [DOI] [PDF] [Code]

Abstract: Including spatial information is a key step for successful remote sensing image classification. In particular, when dealing with high spatial resolution, if local variability is strongly reduced by spatial filtering, the classification performance results are boosted. In this paper, we consider the triple objective of designing a spatial/spectral classifier, which is compact (uses as few features as possible), discriminative (enhances class separation), and robust (works well in small sample situations). We achieve this triple objective by discovering the relevant features in the (possibly infinite) space of spatial filters by optimizing a margin-maximization criterion. Instead of imposing a filter bank with predefined filter types and parameters, we let the model figure out which set of filters is optimal for class separation. To do so, we randomly generate spatial filter banks and use an active-set criterion to rank the candidate features according to their benefits to margin maximization (and, thus, to generalization) if added to the model. Experiments on multispectral very high spatial resolution (VHR) and hyperspectral VHR data show that the proposed algorithm, which is sparse and linear, finds discriminative features and achieves at least the same performances as models using a large filter bank defined in advance by prior knowledge.

BibTeX:

@article{tuia2014automatic,
author = {Tuia, D. and Volpi, M. and Dalla Mura, M. and Rakotomamonjy, A. and Flamary, R.},
title = {Automatic Feature Learning for Spatio-Spectral Image Classification With Sparse SVM},
journal = {Geoscience and Remote Sensing, IEEE Transactions on},
volume = {52},
number = {10},
pages = {6062-6074},
year = {2014}
}

L. Laporte, R. Flamary, S. Canu, S. Déjean, J. Mothe, Nonconvex Regularizations for Feature Selection in Ranking With Sparse SVM, Neural Networks and Learning Systems, IEEE Transactions on, Vol. 25, N. 6, pp 1118-1130, 2014.

[Abstract] [BibTeX] [URL] [DOI] [PDF] [Code]

Abstract: Feature selection in learning to rank has recently emerged as a crucial issue. Whereas several preprocessing approaches have been proposed, only a few works have been focused on integrating the feature selection into the learning process. In this work, we propose a general framework for feature selection in learning to rank using SVM with a sparse regularization term. We investigate both classical convex regularizations such as l1 or weighted l1 and non-convex regularization terms such as log penalty, Minimax Concave Penalty (MCP) or lp pseudo norm with p lower than 1. Two algorithms are proposed, first an accelerated proximal approach for solving the convex problems, second a reweighted l1 scheme to address the non-convex regularizations. We conduct intensive experiments on nine datasets from Letor 3.0 and Letor 4.0 corpora. Numerical results show that the use of non-convex regularizations we propose leads to more sparsity in the resulting models while prediction performance is preserved. The number of features is decreased by up to a factor of six compared to the l1 regularization. In addition, the software is publicly available on the web.

BibTeX:

@article{tnnls2014,
author = { Laporte, L. and Flamary, R. and Canu, S. and Déjean, S. and Mothe, J.},
title = {Nonconvex Regularizations for Feature Selection in Ranking With Sparse SVM},
journal = { Neural Networks and Learning Systems, IEEE Transactions on},
volume = {25},
number = {6},
pages = {1118-1130},
year = {2014}
}

2013

A. Rakotomamonjy, R. Flamary, F. Yger, Learning with infinitely many features, Machine Learning, Vol. 91, N. 1, pp 43-66, 2013.

[Abstract] [BibTeX] [URL] [DOI] [PDF] [Slides] [Code]

Abstract: We propose a principled framework for learning with infinitely many features, situations that are usually induced by continuously parametrized feature extraction methods. Such cases occur for instance when considering Gabor-based features in computer vision problems or when dealing with Fourier features for kernel approximations. We cast the problem as the one of finding a finite subset of features that minimizes a regularized empirical risk. After having analyzed the optimality conditions of such a problem, we propose a simple algorithm which has the flavour of a column-generation technique. We also show that using Fourier-based features, it is possible to perform approximate infinite kernel learning. Our experimental results on several datasets show the benefits of the proposed approach in several situations including texture classification and large-scale kernelized problems (involving about 100 thousand examples).

BibTeX:

@article{ml2012,
author = { Rakotomamonjy, A. and Flamary, R. and Yger, F.},
title = {Learning with infinitely many features},
journal = { Machine Learning},
volume = {91},
number = {1},
pages = {43-66},
year = {2013}
}

2012

R. Flamary, A. Rakotomamonjy, Decoding finger movements from ECoG signals using switching linear models, Frontiers in Neuroscience, Vol. 6, N. 29, 2012.

[Abstract] [BibTeX] [URL] [DOI] [PDF]

Abstract: One of the most interesting challenges in ECoG-based Brain-Machine Interface is movement prediction. Being able to perform such a prediction paves the way to high-precision command of a machine such as a robotic arm or robotic hands. As evidence of the BCI community's increasing interest in this problem, the fourth BCI Competition provides a dataset whose aim is to predict individual finger movements from ECoG signals. The difficulty of the problem lies in the fact that there is no simple relation between ECoG signals and finger movements. In this paper, we propose to estimate and decode these finger flexions using switching models controlled by a hidden state. Switching models can integrate prior knowledge about the decoding problem and help in predicting fine and precise movements. Our model is thus based on a first block which estimates which finger is moving and another block which, knowing which finger is moving, predicts the movements of all other fingers. Numerical results that have been submitted to the Competition show that the model yields high decoding performances when the hidden state is well estimated. This approach achieved the second place in the BCI competition with a correlation measure between real and predicted movements of 0.42.

BibTeX:

@article{frontiers2012,
author = { Flamary, R. and  Rakotomamonjy, A.},
title = {Decoding finger movements from ECoG signals using switching linear models},
journal = { Frontiers in Neuroscience},
volume = { 6},
number = { 29},
year = {2012}
}

R. Flamary, D. Tuia, B. Labbé, G. Camps-Valls, A. Rakotomamonjy, Large Margin Filtering, IEEE Transactions Signal Processing, Vol. 60, N. 2, pp 648-659, 2012.

[Abstract] [BibTeX] [URL] [PDF] [Code]

Abstract: Many signal processing problems are tackled by filtering the signal for subsequent feature classification or regression. Both steps are critical and need to be designed carefully to deal with the particular statistical characteristics of both signal and noise. Optimal design of the filter and the classifier are typically addressed separately, thus leading to suboptimal classification schemes. This paper proposes an efficient methodology to learn an optimal signal filter and a support vector machine (SVM) classifier jointly. In particular, we derive algorithms to solve the optimization problem, prove its theoretical convergence, and discuss different filter regularizers for automated scaling and selection of the feature channels. The latter gives rise to different formulations with the appealing properties of sparseness and noise-robustness. We illustrate the performance of the method in several problems. First, linear and nonlinear toy classification examples, under the presence of both Gaussian and convolutional noise, show the robustness of the proposed methods. The approach is then evaluated on two challenging real life datasets: BCI time series classification and multispectral image segmentation. In all the examples, large margin filtering shows competitive classification performances while offering the advantage of interpretability of the filtered channels retrieved.

BibTeX:

@article{ieeesp2012,
author = { Flamary, R. and Tuia, D. and Labbé, B. and Camps-Valls, G. and Rakotomamonjy, A.},
title = {Large Margin Filtering},
journal = { IEEE Transactions Signal Processing},
volume = {60},
number = {2},
pages = {648-659},
year = {2012}
}

2011

A. Rakotomamonjy, R. Flamary, G. Gasso, S. Canu, lp-lq penalty for sparse linear and sparse multiple kernel multi-task learning, IEEE Transactions on Neural Networks, Vol. 22, N. 8, pp 1307-1320, 2011.

[Abstract] [BibTeX] [URL] [PDF] [Code]

Abstract: Recently, there has been a lot of interest around the multi-task learning (MTL) problem with the constraint that tasks should share a common sparsity profile. Such a problem can be addressed through a regularization framework where the regularizer induces a joint-sparsity pattern between task decision functions. We follow this principled framework and focus on $\ell_p-\ell_q$ (with $0 \leq p \leq 1$ and $1 \leq q \leq 2$) mixed-norms as sparsity-inducing penalties. Our motivation for addressing such a larger class of penalties is to adapt the penalty to the problem at hand, thus leading to better performances and a better sparsity pattern. For solving the problem in the general multiple kernel case, we first derive a variational formulation of the $\ell_1-\ell_q$ penalty which helps us in proposing an alternate optimization algorithm. Although very simple, the latter algorithm provably converges to the global minimum of the $\ell_1-\ell_q$ penalized problem. For the linear case, we extend existing works considering accelerated proximal gradient to this penalty. Our contribution in this context is to provide an efficient scheme for computing the $\ell_1-\ell_q$ proximal operator. Then, for the more general case when $0 < p < 1$, we solve the resulting non-convex problem through a majorization-minimization approach. The resulting algorithm is an iterative scheme which, at each iteration, solves a weighted $\ell_1-\ell_q$ sparse MTL problem. Empirical evidence from a toy dataset and real-world datasets dealing with BCI single trial EEG classification and protein subcellular localization shows the benefit of the proposed approaches and algorithms.

BibTeX:

@article{tnn2011,
author = { Rakotomamonjy, A. and Flamary, R. and Gasso, G. and Canu, S.},
title = {lp-lq penalty for sparse linear and sparse multiple kernel multi-task learning},
journal = { IEEE Transactions on Neural Networks},
volume = {22},
number = {8},
pages = {1307-1320},
year = {2011}
}

N. Jrad, M. Congedo, R. Phlypo, S. Rousseau, R. Flamary, F. Yger, A. Rakotomamonjy, sw-SVM: sensor weighting support vector machines for EEG-based brain–computer interfaces, Journal of Neural Engineering, Vol. 8, N. 5, pp 056004, 2011.

[Abstract] [BibTeX] [URL] [PDF]

Abstract: In many machine learning applications, like brain–computer interfaces (BCI), high-dimensional sensor array data are available. Sensor measurements are often highly correlated and signal-to-noise ratio is not homogeneously spread across sensors. Thus, collected data are highly variable and discrimination tasks are challenging. In this work, we focus on sensor weighting as an efficient tool to improve the classification procedure. We present an approach integrating sensor weighting in the classification framework. Sensor weights are considered as hyper-parameters to be learned by a support vector machine (SVM). The resulting sensor weighting SVM (sw-SVM) is designed to satisfy a margin criterion, that is, the generalization error. Experimental studies on two data sets are presented, a P300 data set and an error-related potential (ErrP) data set. For the P300 data set (BCI competition III), for which a large number of trials is available, the sw-SVM proves to perform equivalently with respect to the ensemble SVM strategy that won the competition. For the ErrP data set, for which a small number of trials is available, the sw-SVM shows superior performances as compared to three state-of-the-art approaches. Results suggest that the sw-SVM promises to be useful in event-related potentials classification, even with a small number of training trials.

BibTeX:

@article{jrad2011swsvm,
author = {N. Jrad and M. Congedo and R. Phlypo and S. Rousseau and R. Flamary and F. Yger and A. Rakotomamonjy},
title = {sw-SVM: sensor weighting support vector machines for EEG-based brain–computer interfaces},
journal = {Journal of Neural Engineering},
volume = {8},
number = {5},
pages = {056004},
year = {2011}
}
