Rémi Flamary

Professional website

Publications / Conferences


    Submitted and preprints

    P. Krzakala, J. Yang, R. Flamary, F. d'Alché-Buc, C. Laclau, M. Labeau, End-to-end Supervised Prediction of Arbitrary-size Graphs with Partially-Masked Fused Gromov-Wasserstein Matching (Submitted), 2024.
    Abstract: We present a novel end-to-end deep learning-based approach for Supervised Graph Prediction (SGP). We introduce an original Optimal Transport (OT)-based loss, the Partially-Masked Fused Gromov-Wasserstein loss (PM-FGW), that directly leverages graph representations such as adjacency and feature matrices. PM-FGW exhibits all the desirable properties for SGP: it is node permutation invariant, sub-differentiable and handles graphs of different sizes by comparing their padded representations as well as their masking vectors. Moreover, we present a flexible transformer-based architecture that easily adapts to different types of input data. In the experimental section, three different tasks (a novel and challenging synthetic dataset, image2graph, and two real-world tasks, image2map and fingerprint2molecule) showcase the efficiency and versatility of the approach compared to competitors.
    BibTeX:
    @inproceedings{krzakala2024endtoend,
    author = {Paul Krzakala and Junjie Yang and Rémi Flamary and Florence d'Alché-Buc and Charlotte Laclau and Matthieu Labeau},
    title = {End-to-end Supervised Prediction of Arbitrary-size Graphs with Partially-Masked Fused Gromov-Wasserstein Matching},
    year = {2024 (Submitted)}
    }
    H. Van Assel, C. Vincent-Cuaz, N. Courty, R. Flamary, P. Frossard, T. Vayer, Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein Projection (Submitted), 2024.
    Abstract: Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets. Traditionally, this involves using dimensionality reduction methods to project data onto interpretable spaces or organizing points into meaningful clusters. In practice, these methods are used sequentially, without guaranteeing that the clustering aligns well with the conducted dimensionality reduction. In this work, we offer a fresh perspective: that of distributions. Leveraging tools from optimal transport, particularly the Gromov-Wasserstein distance, we unify clustering and dimensionality reduction into a single framework called distributional reduction. This allows us to jointly address clustering and dimensionality reduction with a single optimization problem. Through comprehensive experiments, we highlight the versatility and interpretability of our method and show that it outperforms existing approaches across a variety of image and genomics datasets.
    BibTeX:
    @inproceedings{vanassel2024distributional,
    author = {Hugues Van Assel and Cédric Vincent-Cuaz and Nicolas Courty and Rémi Flamary and Pascal Frossard and Titouan Vayer},
    title = {Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein Projection},
    year = {2024 (Submitted)}
    }

    2023

    T. Gnassounou, R. Flamary, A. Gramfort, Convolutional Monge Mapping Normalization for learning on biosignals, Neural Information Processing Systems (NeurIPS), 2023.
    Abstract: In many machine learning applications on signals and biomedical data, especially electroencephalogram (EEG), one major challenge is the variability of the data across subjects, sessions, and hardware devices. In this work, we propose a new method called Convolutional Monge Mapping Normalization (CMMN), which consists in filtering the signals in order to adapt their power spectrum density (PSD) to a Wasserstein barycenter estimated on training data. CMMN relies on novel closed-form solutions for optimal transport mappings and barycenters and provides individual test-time adaptation to new data without needing to retrain a prediction model. Numerical experiments on sleep EEG data show that CMMN leads to significant and consistent performance gains independent of the neural network architecture when adapting between subjects, sessions, and even datasets collected with different hardware. Notably, our performance gain is on par with much more numerically intensive Domain Adaptation (DA) methods and can be used in conjunction with those for even better performances.
    BibTeX:
    @inproceedings{gnassounou2023convolutional,
    author = {Gnassounou, Théo and Flamary, Rémi and Gramfort, Alexandre},
    title = {Convolutional Monge Mapping Normalization for learning on biosignals},
    booktitle = {Neural Information Processing Systems (NeurIPS)},
    year = {2023}
    }
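    For readers who want to experiment with the idea, here is a minimal NumPy/SciPy sketch of the barycenter-mapping filter as described in the abstract (the function name and the Welch-based PSD estimation are illustrative choices, not the authors' code):

    import numpy as np
    from scipy.signal import welch

    def cmmn_filters(signals, fs=100.0, nperseg=256):
        # Estimate each signal's power spectral density (PSD)
        psds = np.array([welch(x, fs=fs, nperseg=nperseg)[1] for x in signals])
        # Closed-form Wasserstein barycenter of the PSDs (squared mean of square roots)
        psd_bar = np.mean(np.sqrt(psds), axis=0) ** 2
        # Frequency response of the filter mapping each PSD onto the barycenter;
        # applying it to the signals (e.g. by FIR filtering) is omitted here
        return np.sqrt(psd_bar / np.maximum(psds, 1e-12))

    rng = np.random.default_rng(0)
    H = cmmn_filters([rng.standard_normal(2000) for _ in range(5)])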
    H. Van Assel, T. Vayer, R. Flamary, N. Courty, SNEkhorn: Dimension Reduction with Symmetric Entropic Affinities, Neural Information Processing Systems (NeurIPS), 2023.
    Abstract: Many approaches in machine learning rely on a weighted graph to encode the similarities between samples in a dataset. Entropic affinities (EAs), which are notably used in the popular Dimensionality Reduction (DR) algorithm t-SNE, are particular instances of such graphs. To ensure robustness to heterogeneous sampling densities, EAs assign a kernel bandwidth parameter to every sample in such a way that the entropy of each row in the affinity matrix is kept constant at a specific value, whose exponential is known as perplexity. EAs are inherently asymmetric and row-wise stochastic, but they are used in DR approaches after undergoing heuristic symmetrization methods that violate both the row-wise constant entropy and stochasticity properties. In this work, we uncover a novel characterization of EA as an optimal transport problem, allowing a natural symmetrization that can be computed efficiently using dual ascent. The corresponding novel affinity matrix derives advantages from symmetric doubly stochastic normalization in terms of clustering performance, while also effectively controlling the entropy of each row thus making it particularly robust to varying noise levels. We then present a new DR algorithm, SNEkhorn, that leverages this new affinity matrix. We show its clear superiority to state-of-the-art approaches with several indicators on both synthetic and real-world datasets.
    BibTeX:
    @inproceedings{van2023snekhorn,
    author = {Van Assel, Hugues and Vayer, Titouan and Flamary, Rémi and Courty, Nicolas},
    title = {SNEkhorn: Dimension Reduction with Symmetric Entropic Affinities},
    booktitle = {Neural Information Processing Systems (NeurIPS)},
    year = {2023}
    }
    A. Collas, T. Vayer, R. Flamary, A. Breloy, Entropic Wasserstein component analysis, IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 2023.
    Abstract: Dimension reduction (DR) methods provide systematic approaches for analyzing high-dimensional data. A key requirement for DR is to incorporate global dependencies among original and embedded samples while preserving clusters in the embedding space. To achieve this, we combine the principles of optimal transport (OT) and principal component analysis (PCA). Our method seeks the best linear subspace that minimizes reconstruction error using entropic OT, which naturally encodes the neighborhood information of the samples. From an algorithmic standpoint, we propose an efficient block-majorization-minimization solver over the Stiefel manifold. Our experimental results demonstrate that our approach can effectively preserve high-dimensional clusters, leading to more interpretable and effective embeddings. Python code of the algorithms and experiments is available online.
    BibTeX:
    @inproceedings{collas2023entropic,
    author = {Collas, Antoine and Vayer, Titouan and Flamary, Rémi and Breloy, Arnaud},
    title = {Entropic Wasserstein component analysis},
    booktitle = {IEEE International Workshop on Machine Learning for Signal Processing (MLSP)},
    year = {2023}
    }
    Q. H. Tran, H. Janati, N. Courty, R. Flamary, I. Redko, P. Demetci, R. Singh, Unbalanced CO-Optimal Transport, Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI), 2023.
    Abstract: Optimal transport (OT) compares probability distributions by computing a meaningful alignment between their samples. CO-optimal transport (COOT) takes this comparison further by inferring an alignment between features as well. While this approach leads to better alignments and generalizes both OT and Gromov-Wasserstein distances, we provide a theoretical result showing that it is sensitive to outliers that are omnipresent in real-world data. This prompts us to propose unbalanced COOT for which we provably show its robustness to noise in the compared datasets. To the best of our knowledge, this is the first such result for OT methods in incomparable spaces. With this result in hand, we provide empirical evidence of this robustness for the challenging tasks of heterogeneous domain adaptation with and without varying proportions of classes and simultaneous alignment of samples and features across single-cell measurements.
    BibTeX:
    @inproceedings{tran2023unbalanced,
    author = {Tran, Quang Huy and Janati, Hicham and Courty, Nicolas and Flamary, Rémi and Redko, Ievgen and Demetci, Pinar and Singh, Ritambhara},
    title = {Unbalanced CO-Optimal Transport},
    booktitle = {Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI)},
    year = {2023}
    }

    2022

    C. Vincent-Cuaz, R. Flamary, M. Corneli, T. Vayer, N. Courty, Template based Graph Neural Network with Optimal Transport Distances, Neural Information Processing Systems (NeurIPS), 2022.
    Abstract: Current Graph Neural Networks (GNN) architectures generally rely on two important components: node features embedding through message passing, and aggregation with a specialized form of pooling. The structural (or topological) information is implicitly taken into account in these two steps. We propose in this work a novel point of view, which places distances to some learnable graph templates at the core of the graph representation. This distance embedding is constructed thanks to an optimal transport distance: the Fused Gromov-Wasserstein (FGW) distance, which encodes simultaneously feature and structure dissimilarities by solving a soft graph-matching problem. We postulate that the vector of FGW distances to a set of template graphs has a strong discriminative power, which is then fed to a non-linear classifier for final predictions. Distance embedding can be seen as a new layer, and can leverage existing message passing techniques to promote sensible feature representations. Interestingly enough, in our work the optimal set of template graphs is also learnt in an end-to-end fashion by differentiating through this layer. After describing the corresponding learning procedure, we empirically validate our claim on several synthetic and real-life graph classification datasets, where our method is competitive or surpasses kernel and GNN state-of-the-art approaches. We complete our experiments with an ablation study and a sensitivity analysis to parameters.
    BibTeX:
    @inproceedings{vincentcuaz2022template,
    author = {Vincent-Cuaz, Cédric and Flamary, Rémi and Corneli, Marco and Vayer, Titouan and Courty, Nicolas},
    title = {Template based Graph Neural Network with Optimal Transport Distances},
    booktitle = {Neural Information Processing Systems (NeurIPS)},
    year = {2022}
    }
    A. Thual, H. Tran, T. Zemskova, N. Courty, R. Flamary, S. Dehaene, B. Thirion, Aligning individual brains with Fused Unbalanced Gromov-Wasserstein, Neural Information Processing Systems (NeurIPS), 2022.
    Abstract: Individual brains vary in both anatomy and functional organization, even within a given species. Inter-individual variability is a major impediment when trying to draw generalizable conclusions from neuroimaging data collected on groups of subjects. Current co-registration procedures rely on limited data, and thus lead to very coarse inter-subject alignments. In this work, we present a novel method for inter-subject alignment based on Optimal Transport, denoted as Fused Unbalanced Gromov Wasserstein (FUGW). The method aligns cortical surfaces based on the similarity of their functional signatures in response to a variety of stimulation settings, while penalizing large deformations of individual topographic organization. We demonstrate that FUGW is well-suited for whole-brain landmark-free alignment. The unbalanced feature allows to deal with the fact that functional areas vary in size across subjects. Our results show that FUGW alignment significantly increases between-subject correlation of activity for independent functional data, and leads to more precise mapping at the group level.
    BibTeX:
    @inproceedings{thual2022aligning,
    author = {Thual, Alexis and Tran, Huy and Zemskova, Tatiana and Courty, Nicolas and Flamary, Rémi and Dehaene, Stanislas and Thirion, Bertrand},
    title = {Aligning individual brains with Fused Unbalanced Gromov-Wasserstein},
    booktitle = {Neural Information Processing Systems (NeurIPS)},
    year = {2022}
    }
    L. Brogat-Motte, R. Flamary, C. Brouard, J. Rousu, F. d'Alché-Buc, Learning to Predict Graphs with Fused Gromov-Wasserstein Barycenters, International Conference on Machine Learning (ICML), 2022.
    Abstract: This paper introduces a novel and generic framework to solve the flagship task of supervised labeled graph prediction by leveraging Optimal Transport tools. We formulate the problem as regression with the Fused Gromov-Wasserstein (FGW) loss and propose a predictive model relying on a FGW barycenter whose weights depend on inputs. First we introduce a non-parametric estimator based on kernel ridge regression for which theoretical results such as consistency and excess risk bound are proved. Next we propose an interpretable parametric model where the barycenter weights are modeled with a neural network and the graphs on which the FGW barycenter is calculated are additionally learned. Numerical experiments show the strength of the method and its ability to interpolate in the labeled graph space on simulated data and on a difficult metabolic identification problem where it can reach very good performance with very little engineering.
    BibTeX:
    @inproceedings{brogat2022learning,
    author = {Brogat-Motte, Luc and Flamary, Rémi and Brouard, Céline and Rousu, Juho and d'Alché-Buc, Florence},
    title = {Learning to Predict Graphs with Fused Gromov-Wasserstein Barycenters},
    booktitle = {International Conference on Machine Learning (ICML)},
    year = {2022}
    }
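    The FGW barycenter at the core of this predictive model can be computed with POT. Below is a toy sketch with fixed barycenter weights (the paper learns them from inputs); it assumes ot.gromov.fgw_barycenters with this signature, which may vary across POT versions:

    import numpy as np
    import ot

    rng = np.random.default_rng(0)
    Cs, Ys, ps = [], [], []
    for n in (6, 8):
        C = rng.random((n, n))
        Cs.append((C + C.T) / 2)        # symmetric structure matrices
        Ys.append(rng.random((n, 3)))   # node features
        ps.append(ot.unif(n))           # node weights
    lambdas = [0.3, 0.7]                # barycenter weights

    # barycenter graph with 5 nodes, alpha trading structure against features
    X, C = ot.gromov.fgw_barycenters(5, Ys, Cs, ps, lambdas, alpha=0.5)
    print(X.shape, C.shape)  # (5, 3) features and (5, 5) structure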
    R. Turrisi, R. Flamary, A. Rakotomamonjy, M. Pontil, Multi-source Domain Adaptation via Weighted Joint Distributions Optimal Transport, Conference on Uncertainty in Artificial Intelligence (UAI), 2022.
    Abstract: The problem of domain adaptation on an unlabeled target dataset using knowledge from multiple labelled source datasets is becoming increasingly important. A key challenge is to design an approach that overcomes the covariate and target shift both among the sources, and between the source and target domains. In this paper, we address this problem from a new perspective: instead of looking for a latent representation invariant between source and target domains, we exploit the diversity of source distributions by tuning their weights to the target task at hand. Our method, named Weighted Joint Distribution Optimal Transport (WJDOT), aims at finding simultaneously an Optimal Transport-based alignment between the source and target distributions and a re-weighting of the sources distributions. We discuss the theoretical aspects of the method and propose a conceptually simple algorithm. Numerical experiments indicate that the proposed method achieves state-of-the-art performance on simulated and real-life datasets.
    BibTeX:
    @inproceedings{turrisi2022multisource,
    author = {Rosanna Turrisi and Rémi Flamary and Alain Rakotomamonjy and Massimiliano Pontil},
    title = {Multi-source Domain Adaptation via Weighted Joint Distributions Optimal Transport},
    booktitle = {Conference on Uncertainty in Artificial Intelligence (UAI)},
    year = {2022}
    }
    A. Rakotomamonjy, R. Flamary, G. Gasso, J. Salmon, Convergent Working Set Algorithm for Lasso with Non-Convex Sparse Regularizers, International Conference on Artificial Intelligence and Statistics (AISTATS), 2022.
    Abstract: Owing to their statistical properties, non-convex sparse regularizers have attracted much interest for estimating a sparse linear model from high dimensional data. Given that the solution is sparse, for accelerating convergence, a working set strategy addresses the optimization problem through an iterative algorithm by incrementing the number of variables to optimize until the identification of the solution support. While those methods have been well-studied and theoretically supported for convex regularizers, this paper proposes a working set algorithm for non-convex sparse regularizers with convergence guarantees. The algorithm, named FireWorks, is based on a non-convex reformulation of a recent primal-dual approach and leverages the geometry of the residuals. Our theoretical guarantees derive from a lower bound of the objective function decrease between two inner solver iterations and show the convergence to a stationary point of the full problem. More importantly, we also show that convergence is preserved even when the inner solver is inexact, under sufficient decay of the error across iterations. Our experimental results demonstrate high computational gain when using our working set strategy compared to the full problem solver for both block-coordinate descent and proximal gradient solvers.
    BibTeX:
    @inproceedings{rakotomamonjy2022provably,
    author = {Rakotomamonjy, Alain and Flamary, Rémi and Gasso, Gilles and Salmon, Joseph},
    title = {Convergent Working Set Algorithm for Lasso with Non-Convex Sparse Regularizers},
    booktitle = {International Conference on Artificial Intelligence and Statistics (AISTATS)},
    year = {2022}
    }
    C. Vincent-Cuaz, R. Flamary, M. Corneli, T. Vayer, N. Courty, Semi-relaxed Gromov Wasserstein divergence with applications on graphs, International Conference on Learning Representations (ICLR), 2022.
    Abstract: Comparing structured objects such as graphs is a fundamental operation involved in many learning tasks. To this end, the Gromov-Wasserstein (GW) distance, based on Optimal Transport (OT), has proven to be successful in handling the specific nature of the associated objects. More specifically, through the nodes connectivity relations, GW operates on graphs, seen as probability measures over specific spaces. At the core of OT is the idea of conservation of mass, which imposes a coupling between all the nodes from the two considered graphs. We argue in this paper that this property can be detrimental for tasks such as graph dictionary or partition learning, and we relax it by proposing a new semi-relaxed Gromov-Wasserstein divergence. Aside from immediate computational benefits, we discuss its properties, and show that it can lead to an efficient graph dictionary learning algorithm. We empirically demonstrate its relevance for complex tasks on graphs such as partitioning, clustering and completion.
    BibTeX:
    @inproceedings{vincent2022semi,
    author = {Vincent-Cuaz, Cédric and Flamary, Rémi and Corneli, Marco and Vayer, Titouan and Courty, Nicolas},
    title = {Semi-relaxed Gromov Wasserstein divergence with applications on graphs},
    booktitle = {International Conference on Learning Representations (ICLR)},
    year = {2022}
    }
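    A minimal usage sketch, assuming a POT version (>= 0.9) that ships this solver as ot.gromov.semirelaxed_gromov_wasserstein:

    import numpy as np
    import ot

    rng = np.random.default_rng(0)
    C1 = rng.random((10, 10)); C1 = (C1 + C1.T) / 2   # source graph structure
    C2 = rng.random((4, 4));   C2 = (C2 + C2.T) / 2   # target graph structure
    p = ot.unif(10)                                   # only the source marginal is fixed

    T = ot.gromov.semirelaxed_gromov_wasserstein(C1, C2, p, loss_fun='square_loss')
    print(T.sum(axis=0))  # the free second marginal: a soft partition of the source mass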

    2021

    L. Chapel, R. Flamary, H. Wu, C. Févotte, G. Gasso, Unbalanced Optimal Transport through Non-negative Penalized Linear Regression, Neural Information Processing Systems (NeurIPS), 2021.
    Abstract: This paper addresses the problem of Unbalanced Optimal Transport (UOT) in which the marginal conditions are relaxed (using weighted penalties in lieu of equality) and no additional regularization is enforced on the OT plan. In this context, we show that the corresponding optimization problem can be reformulated as a non-negative penalized linear regression problem. This reformulation allows us to propose novel algorithms inspired from inverse problems and nonnegative matrix factorization. In particular, we consider majorization-minimization which leads in our setting to efficient multiplicative updates for a variety of penalties. Furthermore, we derive for the first time an efficient algorithm to compute the regularization path of UOT with quadratic penalties. The proposed algorithm provides a continuity of piece-wise linear OT plans converging to the solution of balanced OT (corresponding to infinite penalty weights). We perform several numerical experiments on simulated and real data illustrating the new algorithms, and provide a detailed discussion about more sophisticated optimization tools that can further be used to solve OT problems thanks to our reformulation.
    BibTeX:
    @inproceedings{chapel2021unbalanced,
    author = {Chapel, Laetitia and Flamary, Rémi and Wu, Haoran and Févotte, Cédric and Gasso, Gilles},
    title = {Unbalanced Optimal Transport through Non-negative Penalized Linear Regression},
    booktitle = {Neural Information Processing Systems (NeurIPS)},
    year = {2021}
    }
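    The multiplicative majorization-minimization updates derived in this paper are available in recent POT versions as ot.unbalanced.mm_unbalanced; a toy call (data and parameter values are illustrative):

    import numpy as np
    import ot

    rng = np.random.default_rng(0)
    xs, xt = rng.normal(0, 1, (20, 2)), rng.normal(3, 1, (30, 2))
    a, b = ot.unif(20), ot.unif(30)
    M = ot.dist(xs, xt); M /= M.max()   # normalized squared Euclidean cost

    G = ot.unbalanced.mm_unbalanced(a, b, M, reg_m=1.0, div='kl')
    print(G.sum())  # total transported mass; marginals are only approximately matched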
    C. Vincent-Cuaz, T. Vayer, R. Flamary, M. Corneli, N. Courty, Online Graph Dictionary Learning, International Conference on Machine Learning (ICML), 2021.
    Abstract: Dictionary learning is a key tool for representation learning that explains the data as a linear combination of a few basic elements. Yet, this analysis is not amenable in the context of graph learning, as graphs usually belong to different metric spaces. We fill this gap by proposing a new online Graph Dictionary Learning approach, which uses the Gromov-Wasserstein divergence for the data fitting term. In our work, graphs are encoded through their nodes' pairwise relations and modeled as convex combinations of graph atoms, i.e. dictionary elements, estimated thanks to an online stochastic algorithm, which operates on a dataset of unregistered graphs with potentially different numbers of nodes. Our approach naturally extends to labeled graphs, and is completed by a novel upper bound that can be used as a fast approximation of Gromov-Wasserstein in the embedding space. We provide numerical evidence showing the interest of our approach for unsupervised embedding of graph datasets and for online graph subspace estimation and tracking.
    BibTeX:
    @inproceedings{vincent2021online,
    author = {Vincent-Cuaz, Cédric and Vayer, Titouan and Flamary, Rémi and Corneli, Marco and Courty, Nicolas},
    title = {Online Graph Dictionary Learning},
    booktitle = {International Conference on Machine Learning (ICML)},
    year = {2021}
    }
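    A heavily hedged usage sketch: recent POT versions expose this method as ot.gromov.gromov_wasserstein_dictionary_learning (check your version's signature and return values before relying on this):

    import numpy as np
    import ot

    rng = np.random.default_rng(0)
    Cs = []
    for n in (6, 7, 8, 9):          # a small dataset of unregistered graphs
        C = rng.random((n, n))
        Cs.append((C + C.T) / 2)

    # learn 2 dictionary atoms, each a 6-node structure matrix
    Cdict, log = ot.gromov.gromov_wasserstein_dictionary_learning(Cs, 2, 6, epochs=5)
    print(Cdict.shape)  # (2, 6, 6)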
    K. Fatras, T. Séjourné, N. Courty, R. Flamary, Unbalanced minibatch Optimal Transport; applications to Domain Adaptation, International Conference on Machine Learning (ICML), 2021.
    Abstract: Optimal transport distances have found many applications in machine learning for their capacity to compare non-parametric probability distributions. Yet their algorithmic complexity generally prevents their direct use on large scale datasets. Among the possible strategies to alleviate this issue, practitioners can rely on computing estimates of these distances over subsets of data, i.e. minibatches. While computationally appealing, we highlight in this paper some limits of this strategy, arguing it can lead to undesirable smoothing effects. As an alternative, we suggest that the same minibatch strategy coupled with unbalanced optimal transport can yield more robust behavior. We discuss the associated theoretical properties, such as unbiased estimators, existence of gradients and concentration bounds. Our experimental study shows that in challenging problems associated to domain adaptation, the use of unbalanced optimal transport leads to significantly better results, competing with or surpassing recent baselines.
    BibTeX:
    @inproceedings{fatras2021unbalanced,
    author = {Fatras, Kilian and Séjourné, Thibault and Courty, Nicolas and Flamary, Rémi},
    title = {Unbalanced minibatch Optimal Transport; applications to Domain Adaptation},
    booktitle = {International Conference on Machine Learning (ICML)},
    year = {2021}
    }
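    A sketch of the estimator studied here: average entropic unbalanced OT costs over random minibatches (the function name and parameter values are illustrative):

    import numpy as np
    import ot

    def minibatch_uot(xs, xt, m=32, k=10, reg=0.1, reg_m=1.0, seed=0):
        rng = np.random.default_rng(seed)
        a = b = ot.unif(m)
        losses = []
        for _ in range(k):
            M = ot.dist(xs[rng.choice(len(xs), m, replace=False)],
                        xt[rng.choice(len(xt), m, replace=False)])
            M /= M.max()  # normalize the cost for numerical stability
            losses.append(ot.unbalanced.sinkhorn_unbalanced2(a, b, M, reg, reg_m))
        return np.mean(losses)

    rng = np.random.default_rng(1)
    print(minibatch_uot(rng.normal(0, 1, (200, 2)), rng.normal(1, 1, (300, 2))))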

    2020

    I. Redko, T. Vayer, R. Flamary, N. Courty, CO-Optimal Transport, Neural Information Processing Systems (NeurIPS), 2020.
    Abstract: Optimal transport (OT) is a powerful geometric and probabilistic tool for finding correspondences and measuring similarity between two distributions. Yet, its original formulation relies on the existence of a cost function between the samples of the two distributions, which makes it impractical for comparing data distributions supported on different topological spaces. To circumvent this limitation, we propose a novel OT problem, named COOT for CO-Optimal Transport, that aims to simultaneously optimize two transport maps between both samples and features. This is different from other approaches that either discard the individual features by focussing on pairwise distances (e.g. Gromov-Wasserstein) or need to model explicitly the relations between the features. COOT leads to interpretable correspondences between both samples and feature representations and holds metric properties. We provide a thorough theoretical analysis of our framework and establish rich connections with the Gromov-Wasserstein distance. We demonstrate its versatility with two machine learning applications in heterogeneous domain adaptation and co-clustering/data summarization, where COOT leads to performance improvements over the competing state-of-the-art methods.
    BibTeX:
    @inproceedings{redko2020cooptimal,
    author = {Ievgen Redko and Titouan Vayer and Rémi Flamary and Nicolas Courty},
    title = {CO-Optimal Transport},
    booktitle = {Neural Information Processing Systems (NeurIPS)},
    year = {2020}
    }
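    Usage sketch, assuming a POT version that includes the ot.coot module implementing this solver:

    import numpy as np
    import ot

    rng = np.random.default_rng(0)
    X = rng.random((20, 5))   # 20 samples, 5 features
    Y = rng.random((30, 8))   # 30 samples, 8 features, in a different space

    pi_samp, pi_feat = ot.coot.co_optimal_transport(X, Y)
    print(pi_samp.shape, pi_feat.shape)  # (20, 30) sample and (5, 8) feature couplings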
    D. Marcos, R. Fong, S. Lobry, R. Flamary, N. Courty, D. Tuia, Contextual Semantic Interpretability, Asian Conference on Computer Vision (ACCV), 2020.
    Abstract: Convolutional neural networks (CNN) are known to learn an image representation that captures concepts relevant to the task, but do so in an implicit way that hampers model interpretability. However, one could argue that such a representation is hidden in the neurons and can be made explicit by teaching the model to recognize semantically interpretable attributes that are present in the scene. We call such an intermediate layer a semantic bottleneck. Once the attributes are learned, they can be re-combined to reach the final decision and provide both an accurate prediction and an explicit reasoning behind the CNN decision. In this paper, we look into semantic bottlenecks that capture context: we want attributes to be in groups of a few meaningful elements and participate jointly to the final decision. We use a two-layer semantic bottleneck that gathers attributes into interpretable, sparse groups, allowing them to contribute differently to the final output depending on the context. We test our contextual semantic interpretable bottleneck (CSIB) on the task of landscape scenicness estimation and train the semantic interpretable bottleneck using an auxiliary database (SUN Attributes). Our model yields predictions as accurate as a non-interpretable baseline when applied to a real-world test set of Flickr images, all while providing clear and interpretable explanations for each prediction.
    BibTeX:
    @inproceedings{marcos2020contextual,
    author = {Diego Marcos and Ruth Fong and Sylvain Lobry and Remi Flamary and Nicolas Courty and Devis Tuia},
    title = {Contextual Semantic Interpretability},
    booktitle = {Asian Conference on Computer Vision (ACCV)},
    year = {2020}
    }
    K. Fatras, Y. Zine, R. Flamary, R. Gribonval, N. Courty, Learning with minibatch Wasserstein: asymptotic and gradient properties, International Conference on Artificial Intelligence and Statistics (AISTATS), 2020.
    Abstract: Optimal transport distances are powerful tools to compare probability distributions and have found many applications in machine learning. Yet their algorithmic complexity prevents their direct use on large scale datasets. To overcome this challenge, practitioners compute these distances on minibatches, i.e. they average the outcome of several smaller optimal transport problems. We propose in this paper an analysis of this practice, whose effects are not well understood so far. We notably argue that it is equivalent to an implicit regularization of the original problem, with appealing properties such as unbiased estimators, gradients and a concentration bound around the expectation, but also with defects such as loss of distance property. Along with this theoretical analysis, we also conduct empirical experiments on gradient flows, GANs or color transfer that highlight the practical interest of this strategy.
    BibTeX:
    @inproceedings{fatras2019learning,
    author = {Kilian Fatras and Younes Zine and Rémi Flamary and Rémi Gribonval and Nicolas Courty},
    title = {Learning with minibatch Wasserstein: asymptotic and gradient properties},
    booktitle = {International Conference on Artificial Intelligence and Statistics (AISTATS)},
    year = {2020}
    }
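    The analyzed estimator in a few lines: average exact OT costs over k random minibatches of size m (a sketch, not the authors' code):

    import numpy as np
    import ot

    def minibatch_wass(xs, xt, m=32, k=10, seed=0):
        rng = np.random.default_rng(seed)
        a = b = ot.unif(m)
        # each term is an exact OT cost between two random minibatches
        return np.mean([ot.emd2(a, b, ot.dist(xs[rng.choice(len(xs), m, replace=False)],
                                              xt[rng.choice(len(xt), m, replace=False)]))
                        for _ in range(k)])

    rng = np.random.default_rng(1)
    print(minibatch_wass(rng.normal(0, 1, (500, 2)), rng.normal(2, 1, (500, 2))))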

    2019

    T. Vayer, R. Flamary, R. Tavenard, L. Chapel, N. Courty, Sliced Gromov-Wasserstein, Neural Information Processing Systems (NeurIPS), 2019.
    Abstract: Recently used in various machine learning contexts, the Gromov-Wasserstein distance (GW) allows for comparing distributions that do not necessarily lie in the same metric space. However, this Optimal Transport (OT) distance requires solving a complex non convex quadratic program which is most of the time very costly both in time and memory. Contrary to GW, the Wasserstein distance (W) enjoys several properties (e.g. duality) that permit large scale optimization. Among those, the Sliced Wasserstein (SW) distance exploits the direct solution of W on the line, that only requires sorting discrete samples in 1D. This paper proposes a new divergence based on GW akin to SW. We first derive a closed form for GW when dealing with 1D distributions, based on a new result for the related quadratic assignment problem. We then define a novel OT discrepancy that can deal with large scale distributions via a slicing approach and we show how it relates to the GW distance while being O(n^2) to compute. We illustrate the behavior of this so-called Sliced Gromov-Wasserstein (SGW) discrepancy in experiments where we demonstrate its ability to tackle similar problems as GW while being several orders of magnitude faster to compute.
    BibTeX:
    @inproceedings{vayer2019sliced,
    author = {Vayer, Titouan and Flamary, Rémi and Tavenard, Romain and Chapel, Laetitia and Courty, Nicolas},
    title = {Sliced Gromov-Wasserstein},
    booktitle = {Neural Information Processing Systems (NeurIPS)},
    year = {2019}
    }
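    A naive NumPy sketch of the construction described in the abstract, using the paper's result that 1D GW (square loss, equal sample sizes) is solved by sorting both samples in either the same or opposite order; everything below is illustrative code, not the authors' implementation:

    import numpy as np

    def gw1d(x, y):
        # optimal 1D assignment: sorted/sorted or sorted/anti-sorted
        xs, ys = np.sort(x), np.sort(y)
        Dx = np.abs(xs[:, None] - xs[None, :])
        def cost(y_):
            Dy = np.abs(y_[:, None] - y_[None, :])
            return np.mean((Dx - Dy) ** 2)   # O(n^2) per projection, as noted above
        return min(cost(ys), cost(ys[::-1]))

    def sliced_gw(X, Y, n_proj=50, seed=0):
        # average 1D GW over random projections (same dimension and same
        # number of points assumed here for simplicity)
        rng = np.random.default_rng(seed)
        thetas = rng.standard_normal((n_proj, X.shape[1]))
        thetas /= np.linalg.norm(thetas, axis=1, keepdims=True)
        return np.mean([gw1d(X @ t, Y @ t) for t in thetas])

    rng = np.random.default_rng(0)
    print(sliced_gw(rng.random((100, 3)), rng.random((100, 3))))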
    T. Vayer, L. Chapel, R. Flamary, R. Tavenard, N. Courty, Optimal Transport for structured data with application on graphs, International Conference on Machine Learning (ICML), 2019.
    Abstract: This work considers the problem of computing distances between structured objects such as undirected graphs, seen as probability distributions in a specific metric space. We consider a new transportation distance (i.e. that minimizes a total cost of transporting probability masses) that unveils the geometric nature of the structured objects space. Unlike Wasserstein or Gromov-Wasserstein metrics that focus solely and respectively on features (by considering a metric in the feature space) or structure (by seeing structure as a metric space), our new distance jointly exploits both sources of information, and is consequently called Fused Gromov-Wasserstein (FGW). After discussing its properties and computational aspects, we show results on a graph classification task, where our method outperforms both graph kernels and deep graph convolutional networks. Further exploiting the metric properties of FGW, interesting geometric objects such as Fréchet means or barycenters of graphs are illustrated and discussed in a clustering context.
    BibTeX:
    @inproceedings{vayer2019optimal,
    author = {Vayer, Titouan and Chapel, Laetitia and Flamary, Rémi and Tavenard, Romain and Courty, Nicolas},
    title = {Optimal Transport for structured data with application on graphs},
    booktitle = {International Conference on Machine Learning (ICML)},
    year = {2019}
    }
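    FGW is implemented in POT; a minimal soft graph-matching example:

    import numpy as np
    import ot

    rng = np.random.default_rng(0)
    n1, n2 = 8, 10
    C1 = rng.random((n1, n1)); C1 = (C1 + C1.T) / 2        # structure matrices
    C2 = rng.random((n2, n2)); C2 = (C2 + C2.T) / 2
    M = ot.dist(rng.random((n1, 2)), rng.random((n2, 2)))  # node feature cost
    p, q = ot.unif(n1), ot.unif(n2)

    # alpha trades off structure (GW term) against features (Wasserstein term)
    T = ot.gromov.fused_gromov_wasserstein(M, C1, C2, p, q, alpha=0.5)
    print(T.shape)  # (8, 10) soft matching between the two graphs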
    I. Redko, N. Courty, R. Flamary, D. Tuia, Optimal Transport for Multi-source Domain Adaptation under Target Shift, International Conference on Artificial Intelligence and Statistics (AISTATS), 2019.
    Abstract: In this paper, we propose to tackle the problem of reducing discrepancies between multiple domains referred to as multi-source domain adaptation and consider it under the target shift assumption: in all domains we aim to solve a classification problem with the same output classes, but with labels' proportions differing across them. We design a method based on optimal transport, a theory that is gaining momentum to tackle adaptation problems in machine learning due to its efficiency in aligning probability distributions. Our method performs multi-source adaptation and target shift correction simultaneously by learning the class probabilities of the unlabeled target sample and the coupling allowing to align two (or more) probability distributions. Experiments on both synthetic and real-world data related to satellite image segmentation task show the superiority of the proposed method over the state-of-the-art.
    BibTeX:
    @inproceedings{redko2018optimal,
    author = {Redko, I. and Courty, N. and Flamary, R. and Tuia, D.},
    title = {Optimal Transport for Multi-source Domain Adaptation under Target Shift},
    booktitle = {International Conference on Artificial Intelligence and Statistics (AISTATS)},
    year = {2019}
    }
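    This method is available in POT's domain adaptation module as ot.da.JCPOTTransport; a toy sketch (hyperparameter values are illustrative):

    import numpy as np
    import ot

    rng = np.random.default_rng(0)
    Xs1, ys1 = rng.normal(0, 1, (40, 2)), rng.integers(0, 2, 40)   # labeled source 1
    Xs2, ys2 = rng.normal(1, 1, (60, 2)), rng.integers(0, 2, 60)   # labeled source 2
    Xt = rng.normal(0.5, 1, (50, 2))                               # unlabeled target

    jcpot = ot.da.JCPOTTransport(reg_e=1.0, max_iter=20)
    jcpot.fit(Xs=[Xs1, Xs2], ys=[ys1, ys2], Xt=Xt)
    print(jcpot.proportions_)  # estimated target class proportions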

    2018

    I. Harrane, R. Flamary, C. Richard, R. Couillet, Random matrix theory for diffusion LMS analysis, Asilomar Conference on Signals, Systems and Computers (ASILOMAR), 2018.
    Abstract: Multi-agent networks usually consist of a large number of interconnected agents or nodes. Interconnections between the agents allow them to share information and collaborate in order to solve complex tasks collectively. Examples abound in the realm of social, economic and biological networks. Distributed algorithms over such networks offer a valuable alternative to centralized solutions with useful properties such as scalability, robustness, and decentralization. Among the existing cooperation rules, we are interested in this paper in diffusion strategies since they are scalable, robust, and enable agents to continuously learn and adapt in an online manner to concept drifts in their data streams. The performances of diffusion strategies have been widely studied in the literature, but never in the asymptotic regime of extremely large networks and very high dimensional data. In this paper we explore this regime with the Random Matrix Theory (RMT) and analyze the performance of the diffusion LMS accordingly. Then we conduct numerical simulations to support the theoretical findings and to determine their applicability when RMT conditions are violated.
    BibTeX:
    @inproceedings{harrane2018random,
    author = {Harrane, Ibrahim and Flamary, R. and Richard, C. and Couillet, R.},
    title = {Random matrix theory for diffusion LMS analysis},
    booktitle = {Asilomar Conference on Signals, Systems and Computers (ASILOMAR)},
    year = {2018}
    }
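    For context, here is the standard adapt-then-combine (ATC) diffusion LMS baseline whose large-network, high-dimensional regime the paper analyzes (a plain NumPy sketch, not the RMT analysis itself):

    import numpy as np

    rng = np.random.default_rng(0)
    N, d, T = 10, 5, 2000                    # nodes, dimension, iterations
    w_true = rng.standard_normal(d)
    A = np.full((N, N), 1.0 / N)             # combination weights (fully connected)
    W = np.zeros((N, d))                     # local estimates
    mu = 0.01                                # step size

    for _ in range(T):
        psi = np.empty_like(W)
        for k in range(N):                   # adapt: one LMS step per node
            x = rng.standard_normal(d)
            dk = x @ w_true + 0.1 * rng.standard_normal()
            psi[k] = W[k] + mu * (dk - x @ W[k]) * x
        W = A @ psi                          # combine neighbors' estimates

    print(np.linalg.norm(W - w_true, axis=1).mean())  # mean estimation error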
    B. B. Damodaran, B. Kellenberger, R. Flamary, D. Tuia, N. Courty, DeepJDOT: Deep Joint distribution optimal transport for unsupervised domain adaptation, European Conference on Computer Vision (ECCV), 2018.
    Abstract: In computer vision, one is often confronted with problems of domain shifts, which occur when one applies a classifier trained on a source dataset to target data sharing similar characteristics (e.g. same classes), but also different latent data structures (e.g. different acquisition conditions). In such a situation, the model will perform poorly on the new data, since the classifier is specialized to recognize visual cues specific to the source domain. In this work we explore a solution, named DeepJDOT, to tackle this problem: through a measure of discrepancy on joint deep representations/labels based on optimal transport, we not only learn new data representations aligned between the source and target domain, but also simultaneously preserve the discriminative information used by the classifier. We applied DeepJDOT to a series of visual recognition tasks, where it compares favorably against state-of-the-art deep domain adaptation methods.
    BibTeX:
    @inproceedings{damodaran2018deepjdot,
    author = {Damodaran, Bharath B. and Kellenberger, Benjamin and Flamary, Rémi and Tuia, Devis and Courty, Nicolas},
    title = {DeepJDOT: Deep Joint distribution optimal transport for unsupervised domain adaptation},
    booktitle = {European Conference on Computer Vision (ECCV)},
    year = {2018}
    }
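    A sketch of the per-minibatch alignment loss under my reading of the abstract (the embeddings gs, gt and target predictions pt would come from the network being trained; variable names and weights are illustrative):

    import numpy as np
    import ot

    rng = np.random.default_rng(0)
    m, d, c = 16, 32, 3
    gs, gt = rng.random((m, d)), rng.random((m, d))   # source/target embeddings
    ys = np.eye(c)[rng.integers(0, c, m)]             # one-hot source labels
    pt = rng.dirichlet(np.ones(c), m)                 # target class predictions

    alpha, lam = 0.001, 1.0
    # joint cost: embedding distances plus cross-entropy between ys and predictions
    C = alpha * ot.dist(gs, gt) - lam * ys @ np.log(pt + 1e-12).T
    gamma = ot.emd(ot.unif(m), ot.unif(m), C)
    print(np.sum(gamma * C))  # the loss backpropagated through the network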
    R. Rougeot, C. Aime, C. Baccani, S. Fineschi, R. Flamary, D. Galano, C. Galy, V. Kirschner, F. Landini, M. Romoli, others, Straylight analysis for the externally occulted Lyot solar coronagraph ASPIICS, Space Telescopes and Instrumentation 2018: Optical, Infrared, and Millimeter Wave, Vol. 10698, pp 106982T, 2018.
    Abstract: The ESA formation-flying mission Proba-3 will fly the giant solar coronagraph ASPIICS. The instrument is composed of a 1.4 meter diameter external occulting disc mounted on the Occulter Spacecraft and a Lyot-style solar coronagraph with a 50 mm diameter aperture carried by the Coronagraph Spacecraft positioned 144 meters behind. The system will observe the inner corona of the Sun, as close as 1.1 solar radii. For a solar coronagraph, the most critical source of straylight is the residual diffracted sunlight, which drives the scientific performance of the observation. This is especially the case for ASPIICS because of its reduced field-of-view close to the solar limb. The light from the Sun is first diffracted by the edge of the external occulter, and then propagates and scatters inside the instrument. There is a crucial need to estimate both the intensity and the distribution of the diffraction on the focal plane. Because of the very large size of the coronagraph, one cannot rely on a representative full-scale test campaign. Moreover, usual optics software packages are not designed to perform such diffraction computation with the required accuracy. Therefore, dedicated approaches have been developed in the context of ASPIICS. First, novel numerical models compute the diffraction profile on the entrance pupil plane and instrument detector plane (Landini et al., Rougeot et al.), assuming perfect optics in the sense of multi-reflection and scattering. Results are confronted to experimental measurements of diffraction. The paper reports the results of the different approaches.
    BibTeX:
    @inproceedings{rougeot2018straylight,
    author = {Rougeot, Raphaël and Aime, Claude and Baccani, Cristian and Fineschi, Silvano and Flamary, Rémi and Galano, Damien and Galy, Camille and Kirschner, Volker and Landini, Federico and Romoli, Marco and others},
    title = {Straylight analysis for the externally occulted Lyot solar coronagraph ASPIICS},
    volume = {10698},
    pages = {106982T},
    booktitle = {Space Telescopes and Instrumentation 2018: Optical, Infrared, and Millimeter Wave},
    organization = {International Society for Optics and Photonics},
    year = {2018}
    }
    V. Seguy, B. B. Damodaran, R. Flamary, N. Courty, A. Rolet, M. Blondel, Large-Scale Optimal Transport and Mapping Estimation, International Conference on Learning Representations (ICLR), 2018.
    Abstract: This paper presents a novel two-step approach for the fundamental problem of learning an optimal map from one distribution to another. First, we learn an optimal transport (OT) plan, which can be thought as a one-to-many map between the two distributions. To that end, we propose a stochastic dual approach of regularized OT, and show empirically that it scales better than a recent related approach when the amount of samples is very large. Second, we estimate a Monge map as a deep neural network learned by approximating the barycentric projection of the previously-obtained OT plan. We prove two theoretical stability results of regularized OT which show that our estimations converge to the OT plan and Monge map between the underlying continuous measures. We showcase our proposed approach on two applications: domain adaptation and generative modeling.
    BibTeX:
    @inproceedings{seguy2018large,
    author = {Seguy, Vivien and Damodaran, Bharath B. and Flamary, Remi and Courty, Nicolas and Rolet, Antoine and Blondel, Mathieu},
    title = {Large-Scale Optimal Transport and Mapping Estimation},
    booktitle = {International Conference on Learning Representations (ICLR)},
    year = {2018}
    }
    N. Courty, R. Flamary, M. Ducoffe, Learning Wasserstein Embeddings, International Conference on Learning Representations (ICLR), 2018.
    Abstract: The Wasserstein distance received a lot of attention recently in the community of machine learning, especially for its principled way of comparing distributions. It has found numerous applications in several hard problems, such as domain adaptation, dimensionality reduction or generative models. However, its use is still limited by a heavy computational cost. Our goal is to alleviate this problem by providing an approximation mechanism that allows to break its inherent complexity. It relies on the search of an embedding where the Euclidean distance mimics the Wasserstein distance. We show that such an embedding can be found with a siamese architecture associated with a decoder network that allows to move from the embedding space back to the original input space. Once this embedding has been found, computing optimization problems in the Wasserstein space (e.g. barycenters, principal directions or even archetypes) can be conducted extremely fast. Numerical experiments supporting this idea are conducted on image datasets, and show the wide potential benefits of our method.
    BibTeX:
    @inproceedings{courty2018learning,
    author = {Courty, Nicolas and Flamary, Remi and Ducoffe, Melanie},
    title = {Learning Wasserstein Embeddings},
    booktitle = {International Conference on Learning Representations (ICLR)},
    year = {2018}
    }

    2017

    N. Courty, R. Flamary, A. Habrard, A. Rakotomamonjy, Joint Distribution Optimal Transportation for Domain Adaptation, Neural Information Processing Systems (NIPS), 2017.
    Abstract: This paper deals with the unsupervised domain adaptation problem, where one wants to estimate a prediction function f in a given target domain without any labeled sample by exploiting the knowledge available from a source domain where labels are known. Our work makes the following assumption: there exists a non-linear transformation between the joint feature/label space distributions of the two domains Ps and Pt. We propose a solution to this problem with optimal transport, which allows us to recover an estimated target Pft=(X,f(X)) by simultaneously optimizing the optimal coupling and f. We show that our method corresponds to the minimization of a bound on the target error, and provide an efficient algorithmic solution, for which convergence is proved. The versatility of our approach, both in terms of class of hypothesis or loss functions is demonstrated with real world classification and regression problems, for which we reach or surpass state-of-the-art results.
    BibTeX:
    @inproceedings{courty2017joint,
    author = {Courty, Nicolas and Flamary, Remi and Habrard, Amaury and Rakotomamonjy, Alain},
    title = {Joint Distribution Optimal Transportation for Domain Adaptation},
    booktitle = {Neural Information Processing Systems (NIPS)},
    year = {2017}
    }
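    A hedged sketch of one JDOT alternation for least-squares regression with a linear model f(x) = x @ w (fix f, solve OT on the joint cost; fix the plan, refit f on barycentrically mapped labels):

    import numpy as np
    import ot

    rng = np.random.default_rng(0)
    ns, nt = 50, 60
    Xs = rng.normal(0, 1, (ns, 2))
    ys = Xs @ np.array([1.0, -2.0]) + 0.1 * rng.standard_normal(ns)
    Xt = rng.normal(0.5, 1, (nt, 2))      # unlabeled target samples

    w, alpha = np.zeros(2), 0.1
    for _ in range(10):
        # joint cost: feature distance plus label loss under the current f
        C = alpha * ot.dist(Xs, Xt) + (ys[:, None] - (Xt @ w)[None, :]) ** 2
        G = ot.emd(ot.unif(ns), ot.unif(nt), C)
        yt_hat = (G.T @ ys) / G.sum(axis=0)             # barycentric pseudo-labels
        w = np.linalg.lstsq(Xt, yt_hat, rcond=None)[0]  # refit f on the target
    print(w)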
    R. Mourya, A. Ferrari, R. Flamary, P. Bianchi, C. Richard, Distributed Approach for Deblurring Large Images with Shift-Variant Blur, European Conference on Signal Processing (EUSIPCO), 2017.
    Abstract: Image deblurring techniques are effective tools to obtain high quality image from acquired image degraded by blur and noise. In applications such as astronomy and satellite imaging, size of acquired images can be extremely large (up to gigapixels) covering a wide field-of-view suffering from shift-variant blur. Most of the existing deblurring techniques are designed to be cost effective on a centralized computing system having a shared memory and possibly multicore processor. The largest image they can handle is then conditioned by the memory capacity of the system. In this paper, we propose a distributed shift-variant image deblurring algorithm in which several connected processing units (each with reasonable computational resources) can deblur simultaneously different portions of a large image while maintaining a certain coherency among them to finally obtain a single crisp image. The proposed algorithm is based on a distributed Douglas-Rachford splitting algorithm with a specific structure of the penalty parameters used in the proximity operator. Numerical experiments show that the proposed algorithm produces images of similar quality as the existing centralized techniques while being distributed and being cost effective for extremely large images.
    BibTeX:
    @inproceedings{mourya2017distributed,
    author = {Mourya, Rahul and Ferrari, Andre and Flamary, Remi and Bianchi, Pascal and Richard, Cedric},
    title = {Distributed Approach for Deblurring Large Images with Shift-Variant Blur},
    booktitle = {European Conference on Signal Processing (EUSIPCO)},
    year = {2017}
    }
    R. Flamary, Astronomical image reconstruction with convolutional neural networks, European Conference on Signal Processing (EUSIPCO), 2017.
    Abstract: State of the art methods in astronomical image reconstruction rely on the resolution of a regularized or constrained optimization problem. Solving this problem can be computationally intensive and usually leads to a quadratic or at least superlinear complexity w.r.t. the number of pixels in the image. We investigate in this work the use of convolutional neural networks for image reconstruction in astronomy. With neural networks, the computationally intensive task is the training step, but the prediction step has a fixed complexity per pixel, i.e. a linear complexity. Numerical experiments show that our approach is both computationally efficient and competitive with other state of the art methods in addition to being interpretable.
    BibTeX:
    @inproceedings{flamary2017astro,
    author = {Flamary, Remi},
    title = {Astronomical image reconstruction with convolutional neural networks},
    booktitle = {European Conference on Signal Processing (EUSIPCO)},
    year = {2017}
    }
    R. Ammanouil, A. Ferrari, R. Flamary, C. Ferrari, D. Mary, Multi-frequency image reconstruction for radio-interferometry with self-tuned regularization parameters, European Conference on Signal Processing (EUSIPCO), 2017.
    Abstract: As the world's largest radio telescope, the Square Kilometer Array (SKA) will provide radio interferometric data with unprecedented detail. Image reconstruction algorithms for radio interferometry are challenged to scale well with TeraByte image sizes never seen before. In this work, we investigate one such 3D image reconstruction algorithm known as MUFFIN (MUlti-Frequency image reconstruction For radio INterferometry). In particular, we focus on the challenging task of automatically finding the optimal regularization parameter values. In practice, finding the regularization parameters using classical grid search is computationally intensive and nontrivial due to the lack of ground- truth. We adopt a greedy strategy where, at each iteration, the optimal parameters are found by minimizing the predicted Stein unbiased risk estimate (PSURE). The proposed self-tuned version of MUFFIN involves parallel and computationally efficient steps, and scales well with large- scale data. Finally, numerical results on a 3D image are presented to showcase the performance of the proposed approach.
    BibTeX:
    @inproceedings{ammanouil2017multi,
    author = {Ammanouil, Rita and Ferrari, Andre and Flamary, Remi and Ferrari, Chiara and Mary, David},
    title = {Multi-frequency image reconstruction for radio-interferometry with self-tuned regularization parameters},
    booktitle = {European Conference on Signal Processing (EUSIPCO)},
    year = {2017}
    }

    2016

    R. Flamary, C. Févotte, N. Courty, V. Emiya, Optimal spectral transportation with application to music transcription, Neural Information Processing Systems (NIPS), 2016.
    Abstract: Many spectral unmixing methods rely on the non-negative decomposition of spectral data onto a dictionary of spectral templates. In particular, state-of-the-art music transcription systems decompose the spectrogram of the input signal onto a dictionary of representative note spectra. The typical measures of fit used to quantify the adequacy of the decomposition compare the data and template entries frequency-wise. As such, small displacements of energy from a frequency bin to another as well as variations of timbre can disproportionally harm the fit. We address these issues by means of optimal transportation and propose a new measure of fit that treats the frequency distributions of energy holistically as opposed to frequency-wise. Building on the harmonic nature of sound, the new measure is invariant to shifts of energy to harmonically-related frequencies, as well as to small and local displacements of energy. Equipped with this new measure of fit, the dictionary of note templates can be considerably simplified to a set of Dirac vectors located at the target fundamental frequencies (musical pitch values). This in turn gives ground to a very fast and simple decomposition algorithm that achieves state-of-the-art performance on real musical data.
    BibTeX:
    @inproceedings{flamary2016ost,
    author = {Flamary, Remi and Févotte, Cédric and Courty, Nicolas and Emiya, Valentin},
    title = {Optimal spectral transportation with application to music transcription},
    booktitle = {Neural Information Processing Systems (NIPS)},
    year = {2016}
    }
    M. Perrot, N. Courty, R. Flamary, A. Habrard, Mapping estimation for discrete optimal transport, Neural Information Processing Systems (NIPS), 2016.
    Abstract: We are interested in the computation of the transport map of an Optimal Transport problem. Most of the computational approaches of Optimal Transport use the Kantorovich relaxation of the problem to learn a probabilistic coupling but do not address the problem of learning the transport map linked to the original Monge problem. Consequently, it lowers the potential usage of such methods in contexts where out-of-samples computations are mandatory. In this paper we propose a new way to jointly learn the coupling and an approximation of the transport map. We use a jointly convex formulation which can be efficiently optimized. Additionally, jointly learning the coupling and the transport map allows to smooth the result of the Optimal Transport and generalize it on out-of-samples examples. Empirically, we show the interest and the relevance of our method in two tasks: domain adaptation and image editing.
    BibTeX:
    @inproceedings{perrot2016mapping,
    author = {Perrot, M. and Courty, N. and Flamary, R. and Habrard, A.},
    title = {Mapping estimation for discrete optimal transport},
    booktitle = {Neural Information Processing Systems (NIPS)},
    year = {2016}
    }
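    This estimator is implemented in POT as ot.da.MappingTransport; a minimal usage sketch (hyperparameter values are illustrative):

    import numpy as np
    import ot

    rng = np.random.default_rng(0)
    Xs = rng.normal(0, 1, (50, 2))   # source samples
    Xt = rng.normal(2, 1, (60, 2))   # target samples

    mapper = ot.da.MappingTransport(kernel="gaussian", mu=1.0, eta=1e-2)
    mapper.fit(Xs=Xs, Xt=Xt)
    Xs_mapped = mapper.transform(Xs=Xs)  # the learned map handles out-of-sample points
    print(Xs_mapped.mean(axis=0))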
    I. Harrane, R. Flamary, C. Richard, Doubly partial-diffusion LMS over adaptive networks, Asilomar Conference on Signals, Systems and Computers (ASILOMAR), 2016.
    Abstract: Diffusion LMS is an efficient strategy for solving distributed optimization problems with cooperating agents. Nodes are interested in estimating the same parameter vector and exchange information with their neighbors to improve their local estimates. However, successful implementation of such applications depends on a substantial amount of communication resources. In this paper, we introduce diffusion algorithms that have a significantly reduced communication load without compromising performance. We also perform analyses in the mean and mean-square sense. Simulation results are provided to confirm the theoretical findings.
    BibTeX:
    @inproceedings{harrane2016doubly,
    author = {Harrane, Ibrahim and Flamary, R. and Richard, C.},
    title = {Doubly partial-diffusion LMS over adaptive networks},
    booktitle = {Asilomar Conference on Signals, Systems and Computers (ASILOMAR)},
    year = {2016}
    }
    S. Nakhostin, N. Courty, R. Flamary, D. Tuia, T. Corpetti, Supervised planetary unmixing with optimal transport, Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS), 2016.
    Abstract: This paper focuses on spectral unmixing and presents an original technique based on Optimal Transport. Optimal Transport consists in estimating a plan that transports a spectrum onto another with minimal cost, enabling the computation of an associated distance (Wasserstein distance) that can be used as an alternative metric to compare hyperspectral data. This is exploited for spectral unmixing, where abundances in each pixel are estimated on the basis of their projections in a Wasserstein sense (Bregman projections) onto known endmembers. In this work an over-complete dictionary is used to deal with internal variability between endmembers, while a regularization term, also based on the Wasserstein distance, is used to promote prior proportion knowledge in the endmember groups. Experiments are performed on real hyperspectral data of asteroid 4-Vesta.
    BibTeX:
    @inproceedings{nakhostin2016planetary,
    author = {Nakhostin, Sina and Courty, Nicolas and Flamary, Remi and Tuia, D. and Corpetti, Thomas},
    title = {Supervised planetary unmixing with optimal transport},
    booktitle = {Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS)},
    year = {2016}
    }
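    POT contains a Wasserstein unmixing solver, ot.bregman.unmix, referenced to this line of work; a heavily hedged toy call (argument conventions may differ across versions, check the docstring):

    import numpy as np
    import ot

    rng = np.random.default_rng(0)
    d, n = 20, 3                                  # spectral bins, endmembers
    grid = np.linspace(0, 1, d)[:, None]
    M = ot.dist(grid, grid); M /= M.max()         # cost between spectral bins
    D = rng.random((d, n)); D /= D.sum(axis=0)    # column-normalized endmembers
    a = D @ np.array([0.6, 0.3, 0.1])             # observed mixed spectrum
    M0 = ot.dist(np.eye(n), np.eye(n))            # cost between endmember groups
    h0 = ot.unif(n)                               # prior proportions

    h = ot.bregman.unmix(a, D, M, M0, h0, reg=0.1, reg0=0.1, alpha=0.5)
    print(h)  # estimated abundances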
    N. Courty, R. Flamary, D. Tuia, T. Corpetti, Optimal transport for data fusion in remote sensing, International Geoscience and Remote Sensing Symposium (IGARSS), 2016.
    Abstract: One of the main objectives of data fusion is the integration of several acquisitions of the same physical object, in order to build a new consistent representation that embeds all the information from the different modalities. In this paper, we propose the use of optimal transport theory as a powerful means of establishing correspondences between the modalities. After reviewing important properties and computational aspects, we showcase its application to three remote sensing fusion problems: domain adaptation, time series averaging and change detection in LIDAR data.
    BibTeX:
    @inproceedings{courty2016optimalrs,
    author = {Courty, N. and Flamary, R. and Tuia, D. and Corpetti, T.},
    title = {Optimal transport for data fusion in remote sensing},
    booktitle = {International Geoscience and Remote Sensing Symposium (IGARSS)},
    year = {2016}
    }
    I. Harrane, R. Flamary, C. Richard, Toward privacy-preserving diffusion strategies for adaptation and learning over networks, European Conference on Signal Processing (EUSIPCO), 2016.
    Abstract: Distributed optimization allows to address inference problems in a decentralized manner over networks, where agents can exchange information with their neighbors to improve their local estimates. Privacy preservation has become an important issue in many data mining applications. It aims at protecting the privacy of individual data in order to prevent the disclosure of sensitive information during the learning process. In this paper, we derive a diffusion strategy of the LMS type to solve distributed inference problems in the case where agents are also interested in preserving the privacy of the local measurements. We carry out a detailed mean and mean-square error analysis of the algorithm. Simulations are provided to check the theoretical findings.
    BibTeX:
    @inproceedings{haranne2016toward,
    author = {Harrane, I. and Flamary, R. and Richard, C.},
    title = {Toward privacy-preserving diffusion strategies for adaptation and learning over networks},
    booktitle = {European Conference on Signal Processing (EUSIPCO)},
    year = {2016}
    }
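    For context, the baseline the paper builds on is diffusion LMS; below is a minimal adapt-then-combine (ATC) version on a toy fully connected network with uniform combination weights. The privacy-preserving modifications analyzed in the paper (what agents actually exchange) are not reproduced here.

    # Adapt-then-combine diffusion LMS: each agent takes a local LMS step,
    # then averages the intermediate estimates of its neighbors.
    import numpy as np

    rng = np.random.default_rng(1)
    n_agents, dim, mu = 5, 4, 0.01
    w_true = rng.standard_normal(dim)
    A = np.full((n_agents, n_agents), 1.0 / n_agents)  # combination matrix
    w = np.zeros((n_agents, dim))

    for t in range(2000):
        psi = np.empty_like(w)
        for k in range(n_agents):                      # adaptation step
            x = rng.standard_normal(dim)
            d = x @ w_true + 0.05 * rng.standard_normal()
            psi[k] = w[k] + mu * (d - x @ w[k]) * x
        w = A @ psi                                    # combination step

    print(np.linalg.norm(w.mean(axis=0) - w_true))     # small residual error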

    2015

    R. Flamary, A. Rakotomamonjy, G. Gasso, Importance Sampling Strategy for Non-Convex Randomized Block-Coordinate Descent, IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), 2015.
    Abstract: As the number of samples and the dimensionality of optimization problems related to statistics and machine learning explode, block coordinate descent algorithms have gained popularity since they reduce the original problem to several smaller ones. Coordinates to be optimized are usually selected at random according to a given probability distribution. We introduce an importance sampling strategy that helps randomized coordinate descent algorithms focus on blocks that are still far from convergence. The framework applies to problems composed of the sum of two possibly non-convex terms, one being separable and non-smooth. We compare our algorithm to a full-gradient proximal approach, to a randomized block coordinate algorithm with uniform sampling, and to cyclic block coordinate descent. Experimental evidence shows the clear benefit of using an importance sampling strategy. A simplified sketch of the sampling scheme follows the BibTeX entry below.
    BibTeX:
    @inproceedings{flamary2015importance,
    author = {Flamary, R. and Rakotomamonjy, A. and Gasso, G.},
    title = {Importance Sampling Strategy for Non-Convex Randomized Block-Coordinate Descent},
    booktitle = {IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP)},
    year = {2015}
    }
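    A simplified sketch of the sampling scheme on a convex lasso instance (the paper covers non-convex separable terms): block probabilities are refreshed from each block's last observed progress, so blocks far from convergence are picked more often. The progress score below is an illustrative stand-in for the paper's criterion.

    # Importance-sampled randomized proximal block-coordinate descent on
    # 0.5*||Xw - y||^2 + lam*||w||_1.
    import numpy as np

    rng = np.random.default_rng(0)
    n, dim, n_blocks, lam = 200, 60, 6, 0.1
    X = rng.standard_normal((n, dim))
    y = X @ (rng.standard_normal(dim) * (rng.random(dim) < 0.2))
    blocks = np.array_split(np.arange(dim), n_blocks)
    L = [np.linalg.norm(X[:, b], 2) ** 2 for b in blocks]  # block Lipschitz constants
    w = np.zeros(dim)
    score = np.ones(n_blocks)                              # sampling scores

    for it in range(500):
        j = rng.choice(n_blocks, p=score / score.sum())    # importance sampling
        b = blocks[j]
        g = X[:, b].T @ (X @ w - y)                        # partial gradient
        z = w[b] - g / L[j]
        w_new = np.sign(z) * np.maximum(np.abs(z) - lam / L[j], 0)  # l1 prox
        score[j] = np.linalg.norm(w_new - w[b]) + 1e-8     # progress-based score
        w[b] = w_new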
    D. Tuia, R. Flamary, A. Rakotomamonjy, N. Courty, Multitemporal classification without new labels: a solution with optimal transport, International Workshop on the Analysis of Multitemporal Remote Sensing Images (Multitemp), 2015.
    Abstract: We propose to adapt distributions between pairs of remote sensing images with regularized optimal transport: we apply two forms of regularization, namely an entropy-based regularization and a class-based regularization, to a series of classification problems involving very high resolution images acquired by the WorldView2 satellite. We study the effect of the two regularizers on the quality of the transport. Both couplings are sketched with the POT library after the BibTeX entry below.
    BibTeX:
    @inproceedings{tuia2015multitemporal,
    author = {Tuia, D. and Flamary, R. and Rakotomamonjy, A. and Courty, N.},
    title = {Multitemporal classification without new labels: a solution with optimal transport},
    booktitle = {International Workshop on the Analysis of Multitemporal Remote Sensing Images (Multitemp)},
    year = {2015}
    }
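    Both regularized couplings studied in the paper are implemented in the POT library; the sketch below applies them to random stand-ins for the source and target image pixels (the paper uses WorldView2 acquisitions).

    # Entropic (Sinkhorn) and class-based (Lp-l1) regularized OT for
    # adapting source pixels toward a target image (POT library).
    import numpy as np
    import ot

    rng = np.random.default_rng(0)
    Xs = rng.standard_normal((100, 5))          # source pixels (toy)
    ys = rng.integers(0, 3, 100)                # source class labels
    Xt = rng.standard_normal((120, 5)) + 0.5    # shifted target pixels (toy)

    ent = ot.da.SinkhornTransport(reg_e=1.0)    # entropy-based regularization
    ent.fit(Xs=Xs, Xt=Xt)
    Xs_ent = ent.transform(Xs=Xs)

    cls = ot.da.SinkhornLpl1Transport(reg_e=1.0, reg_cl=0.1)  # + class-based term
    cls.fit(Xs=Xs, ys=ys, Xt=Xt)
    Xs_cls = cls.transform(Xs=Xs)               # labels shape the coupling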
    D. Tuia, R. Flamary, M. Barlaud, To be or not to be convex? A study on regularization in hyperspectral image classification, International Geoscience and Remote Sensing Symposium (IGARSS), 2015.
    Abstract: Hyperspectral image classification has long been dominated by convex models, which provide accurate decision functions exploiting all the features in the input space. However, the need for high geometrical details, which is often satisfied by using spatial filters, and the need for compact models (i.e. relying on models built from reduced input spaces) have pushed research toward alternatives such as sparsity-inducing regularization, which promotes models using only a subset of the input features. Although successful in reducing the number of active inputs, these models can be biased and sometimes offer sparsity at the cost of reduced accuracy. In this paper, we study the possibility of using non-convex regularization, which limits the bias induced by the regularization. We present and compare four regularizers, and then apply them to hyperspectral classification with different cost functions. A toy illustration of how a non-convex penalty limits the l1 bias follows the BibTeX entry below.
    BibTeX:
    @inproceedings{tuia2015tobe,
    author = {Tuia, D. and Flamary, R. and Barlaud, M.},
    title = {To be or not to be convex? A study on regularization in hyperspectral image classification},
    booktitle = {International Geoscience and Remote Sensing Symposium (IGARSS)},
    year = {2015}
    }
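    One standard way to see how a non-convex penalty limits the bias of l1 is iterative reweighting: coefficients that are already large receive small weights and are barely shrunk. The sketch below is our illustration with log-sum-style weights, not the paper's exact solver or regularizers.

    # Proximal-gradient loop with reweighted soft-thresholding, which
    # corresponds to majorizing the non-convex log-sum penalty.
    import numpy as np

    def soft(z, t):
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    rng = np.random.default_rng(0)
    X = rng.standard_normal((80, 40))
    w0 = np.zeros(40); w0[:4] = 3.0                 # sparse ground truth
    y = X @ w0 + 0.01 * rng.standard_normal(80)
    L = np.linalg.norm(X, 2) ** 2                   # Lipschitz constant
    lam, eps = 0.5, 1e-2
    w = np.zeros(40)

    for it in range(300):
        weights = 1.0 / (np.abs(w) + eps)           # small coefs penalized more
        g = X.T @ (X @ w - y)
        w = soft(w - g / L, lam * weights / L)      # weighted l1 prox step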

    2014

    A. Boisbunon, R. Flamary, A. Rakotomamonjy, A. Giros, J. Zerubia, Large scale sparse optimization for object detection in high resolution images, IEEE Workshop in Machine Learning for Signal Processing (MLSP), 2014.
    Abstract: In this work, we address the problem of detecting objects in images by expressing the image as convolutions between activation matrices and dictionary atoms. The activation matrices are estimated through sparse optimization and correspond to the position of the objects. In particular, we propose an efficient algorithm based on an active set strategy that is easily scalable and can be computed in parallel. We apply it to a toy image and a satellite image where the aim is to detect all the boats in a harbor. These results show the benefit of using nonconvex penalties, such as the log-sum penalty, over the convex l1 penalty.
    BibTeX:
    @inproceedings{boisbunon2014largescale,
    author = {Boisbunon, A. and Flamary, R. and Rakotomamonjy, A. and Giros, A. and Zerubia, J.},
    title = {Large scale sparse optimization for object detection in high resolution images},
    booktitle = {IEEE Workshop in Machine Learning for Signal Processing (MLSP)},
    year = {2014}
    }
    E. Niaf, R. Flamary, A. Rakotomamonjy, O. Rouvière, C. Lartizien, SVM with feature selection and smooth prediction in images: application to CAD of prostate cancer, IEEE International Conference on Image Processing (ICIP), 2014.
    Abstract: We propose a new computer-aided detection scheme for prostate cancer screening on multiparametric magnetic resonance (mp-MR) images. Based on an annotated training database of mp-MR images from thirty patients, we train a novel support vector machine (SVM)-inspired classifier which simultaneously learns an optimal linear discriminant and a subset of predictor variables (or features) that are most relevant to the classification task, while promoting spatial smoothness of the malignancy prediction maps. The approach uses an $\ell_1$-norm in the regularization term of the optimization problem, which rewards sparsity. Spatial smoothness is promoted via an additional cost term that encodes the spatial neighborhood of the voxels, to avoid noisy prediction maps. Experimental comparisons of the proposed $\ell_1$-Smooth SVM scheme to the regular $\ell_2$-SVM scheme demonstrate a clear visual and numerical gain on our clinical dataset. The loss structure is sketched on toy data after the BibTeX entry below.
    BibTeX:
    @inproceedings{niaf2014svmsmooth,
    author = {Niaf, E. and Flamary, R. and Rakotomamonjy, A. and Rouvière, O. and Lartizien, C.},
    title = {SVM with feature selection and smooth prediction in images: application to CAD of prostate cancer},
    booktitle = {IEEE International Conference on Image Processing (ICIP)},
    year = {2014}
    }
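    A hedged sketch of the three ingredients above, on invented toy data with a chain-graph Laplacian standing in for the voxel neighborhood: a hinge loss for classification, an l1 term for feature selection, and a Laplacian quadratic term that smooths the prediction map. Plain subgradient descent replaces the paper's optimization scheme.

    # Linear scores with hinge loss + l1 sparsity + spatial smoothness
    # of predictions encoded by a graph Laplacian.
    import numpy as np

    rng = np.random.default_rng(0)
    n, dim = 200, 30
    X = rng.standard_normal((n, dim))
    y = rng.choice([-1.0, 1.0], n)
    Lap = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # toy 1-D neighborhood
    lam1, lam2, lr = 0.01, 0.1, 1e-3
    w = np.zeros(dim)

    for it in range(500):
        marg = y * (X @ w)
        viol = marg < 1                                      # active hinge terms
        g_hinge = -(X[viol] * y[viol][:, None]).sum(axis=0) / n
        g_smooth = 2 * lam2 * X.T @ (Lap @ (X @ w)) / n      # smoothness gradient
        w -= lr * (g_hinge + g_smooth + lam1 * np.sign(w))   # l1 subgradient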
    D. Tuia, N. Courty, R. Flamary, A group-lasso active set strategy for multiclass hyperspectral image classification, Photogrammetric Computer Vision (PCV), 2014.
    Abstract: Hyperspectral images have a strong potential for landcover/landuse classification, since the spectra of the pixels can highlight subtle differences between materials and provide information beyond the visible spectrum. Yet, a limitation of most current approaches is the hypothesis of spatial independence between samples: images are spatially correlated and the classification map should exhibit spatial regularity. One way of integrating spatial smoothness is to augment the input spectral space with filtered versions of the bands. However, open questions remain, such as the selection of the bands to be filtered, or the filterbank to be used. In this paper, we consider the entirety of the possible spatial filters by using an incremental feature learning strategy that assesses whether a candidate feature would improve the model if added to the current input space. Our approach is based on a multiclass logistic classifier with group-lasso regularization. The optimization of this classifier yields an optimality condition that can easily be used to assess the interest of a candidate feature without retraining the model, thus allowing drastic savings in computational time. We apply the proposed method to three challenging hyperspectral classification scenarios, including agricultural and urban data, and study both the ability of the incremental setting to learn features that always improve the model and the nature of the features selected. The optimality-condition test at the core of the method is sketched after the BibTeX entry below.
    BibTeX:
    @inproceedings{tuia2014grouplasso,
    author = {Tuia, D. and Courty, N. and Flamary, R.},
    title = {A group-lasso active set strategy for multiclass hyperspectral image classification},
    booktitle = {Photogrammetric Computer Vision (PCV)},
    year = {2014}
    }
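    The computational trick is that the group-lasso optimality conditions give a cheap inclusion test: a candidate filtered feature (group) can improve the model only if the norm of the loss gradient restricted to it exceeds the regularization level. A minimal sketch, with a squared-loss residual standing in for the paper's multinomial logistic gradient:

    # Active-set inclusion test derived from group-lasso optimality.
    import numpy as np

    def candidate_violation(phi_cand, residual, lam):
        # phi_cand: (n, g) features of one candidate group;
        # residual: (n,) current loss gradient w.r.t. the scores.
        grad_block = phi_cand.T @ residual / len(residual)
        return np.linalg.norm(grad_block) - lam  # > 0 means: add group, retrain

    rng = np.random.default_rng(0)
    res = rng.standard_normal(500)               # stand-in for the residual
    cand = rng.standard_normal((500, 3))         # candidate filtered band
    print(candidate_violation(cand, res, lam=0.05) > 0)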
    J. Lehaire, R. Flamary, O. Rouvière, C. Lartizien, Computer-aided diagnostic for prostate cancer detection and characterization combining learned dictionaries and supervised classification, IEEE International Conference on Image Processing (ICIP), 2014.
    Abstract: This paper presents results of a computer-aided diagnostic (CAD) system for voxel-based detection and characterization of prostate cancer in the peripheral zone based on multiparametric magnetic resonance (mp-MR) imaging. We propose an original scheme combining a feature extraction step based on a sparse dictionary learning (DL) method and a supervised classification, in order to discriminate normal (N) and normal but suspect (NS) tissues as well as different classes of cancer tissue whose aggressiveness is characterized by the Gleason score, ranging from 6 (GL6) to 9 (GL9). We compare the classification performance of two supervised methods, the linear support vector machine (SVM) and the multinomial logistic regression (MLR) classifiers, in a binary classification task. Classification performance was evaluated over an mp-MR image database of 35 patients where each voxel was labeled, based on a ground truth, by an expert radiologist. Results show that the proposed method, in addition to being interpretable thanks to the sparse representation of the voxels, compares favorably (AUC>0.8) with recent state-of-the-art performance. Preliminary results on example patient data also indicate that the output cancer probability maps are correlated with the Gleason score.
    BibTeX:
    @inproceedings{lehaire2014dicolearn,
    author = {Lehaire, J. and Flamary, R. and Rouvière, O. and Lartizien, C.},
    title = {Computer-aided diagnostic for prostate cancer detection and characterization combining learned dictionaries and supervised classification},
    booktitle = {IEEE International Conference on Image Processing (ICIP)},
    year = {2014}
    }
    A. Ferrari, D. Mary, R. Flamary, C. Richard, Distributed image reconstruction for very large arrays in radio astronomy, IEEE Sensor Array and Multichannel Signal Processing Workshop (SAM), 2014.
    Abstract: Current and future radio interferometric arrays such as LOFAR and SKA are characterized by a paradox. Their large number of receptors (up to millions) theoretically allows unprecedented imaging resolution. At the same time, the ultra-massive amount of samples makes the data transfer and computational loads (correlation and calibration) orders of magnitude too high for any currently existing image reconstruction algorithm to achieve, or even approach, the theoretical resolution. We investigate here decentralized and distributed image reconstruction strategies which select, transfer and process only a fraction of the total data. The loss in MSE incurred by the proposed approach is evaluated theoretically and numerically on simple test cases.
    BibTeX:
    @inproceedings{ferrari2014distributed,
    author = {Ferrari, A. and Mary, D. and Flamary, R. and Richard, C.},
    title = {Distributed image reconstruction for very large arrays in radio astronomy},
    booktitle = {IEEE Sensor Array and Multichannel Signal Processing Workshop (SAM)},
    year = {2014}
    }
    N. Courty, R. Flamary, D. Tuia, Domain adaptation with regularized optimal transport, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), 2014.
    Abstract: We present a new and original method to solve the domain adaptation problem using optimal transport. By searching for the best transportation plan between the probability distribution functions of a source and a target domain, a non-linear and invertible transformation of the learning samples can be estimated. Any standard machine learning method can then be applied on the transformed set, which makes our method very generic. We propose a new optimal transport algorithm that incorporates label information in the optimization: this is achieved by combining an efficient matrix scaling technique with a majorization of a non-convex regularization term. By using the proposed optimal transport with label regularization, we obtain a significant increase in performance compared to the original transport solution. The proposed algorithm is computationally efficient and effective, as illustrated by its evaluation on a toy example and a challenging real-life vision dataset, against which it achieves competitive results with respect to state-of-the-art methods. A sketch using the POT implementation of this coupling follows the BibTeX entry below.
    BibTeX:
    @inproceedings{courty2014domain,
    author = {Courty, N. and Flamary, R. and Tuia, D.},
    title = {Domain adaptation with regularized optimal transport},
    booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD)},
    year = {2014}
    }
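    The label-regularized coupling proposed in this paper is available in the POT library as ot.da.sinkhorn_lpl1_mm (matrix scaling plus majorization of the non-convex term). The sketch below computes it on toy Gaussian data and maps the source samples with the usual barycentric projection.

    # Label-regularized OT coupling and barycentric mapping of the source.
    import numpy as np
    import ot

    rng = np.random.default_rng(0)
    Xs = rng.standard_normal((60, 2))                         # source samples (toy)
    ys = rng.integers(0, 2, 60)                               # source labels
    Xt = rng.standard_normal((80, 2)) + np.array([2.0, 0.0])  # shifted target
    a, b = ot.unif(60), ot.unif(80)                           # uniform weights
    M = ot.dist(Xs, Xt)                                       # squared Euclidean cost

    G = ot.da.sinkhorn_lpl1_mm(a, ys, b, M, reg=1.0, eta=0.1)
    Xs_mapped = G @ Xt / G.sum(axis=1, keepdims=True)         # barycentric mapping
    # any standard classifier can now be trained on (Xs_mapped, ys)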
    A. Boisbunon, R. Flamary, A. Rakotomamonjy, Active set strategy for high-dimensional non-convex sparse optimization problems, International Conference on Acoustic, Speech and Signal Processing (ICASSP), 2014.
    Abstract: The use of non-convex sparse regularization has attracted much interest when estimating a very sparse model on high-dimensional data. In this work, we express the optimality conditions of the optimization problem for a large class of non-convex regularizers. From those conditions, we derive an efficient active set strategy that avoids computing unnecessary gradients. Numerical experiments on both simulated and real-life datasets show a clear gain in computational cost w.r.t. the state of the art when using our method to obtain very sparse solutions. The overall shape of the active-set loop is sketched after the BibTeX entry below.
    BibTeX:
    @inproceedings{boisbunon2014active,
    author = {Boisbunon, A. and Flamary, R. and Rakotomamonjy, A.},
    title = {Active set strategy for high-dimensional non-convex sparse optimization problems},
    booktitle = {International Conference on Acoustic, Speech and Signal Processing (ICASSP)},
    year = {2014}
    }
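    The overall shape of the active-set loop, sketched on the convex l1 case for brevity (the paper's point is that the same optimality-condition test extends to a large class of non-convex regularizers): gradients of inactive variables are only touched to check optimality, and the restricted problem is re-solved when a violator enters.

    # Active-set loop for 0.5/n*||Xw - y||^2 + lam*||w||_1.
    import numpy as np

    def soft(z, t):
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    def active_set_lasso(X, y, lam, n_outer=20, n_inner=200):
        n, dim = X.shape
        w, active = np.zeros(dim), []
        for _ in range(n_outer):
            g = X.T @ (X @ w - y) / n              # gradient, used only for the test
            viol = np.abs(g)
            viol[active] = 0
            j = int(viol.argmax())
            if viol[j] <= lam + 1e-8:
                break                              # optimality conditions hold
            active.append(j)
            Xa = X[:, active]
            L = np.linalg.norm(Xa, 2) ** 2 / n
            wa = w[active]
            for _ in range(n_inner):               # solve the restricted problem
                wa = soft(wa - Xa.T @ (Xa @ wa - y) / (n * L), lam / L)
            w[:] = 0
            w[active] = wa
        return w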

    2013

    W. Gao, J. Chen, C. Richard, J. Huang, R. Flamary, Kernel LMS algorithm with Forward-Backward splitting for dictionary learning, International Conference on Acoustic, Speech and Signal Processing (ICASSP), 2013.
    Abstract: Nonlinear adaptive filtering with kernels has become a topic of high interest over the last decade. A characteristic of kernel-based techniques is that they deal with kernel expansions whose number of terms equals the number of input data, making them unsuitable for online applications. Kernel-based adaptive filtering algorithms generally rely on a two-stage process at each iteration: a model order control stage that limits the increase in the number of terms by including only valuable kernels into the so-called dictionary, and a filter parameter update stage. It is surprising to note that most existing strategies for dictionary update can only incorporate new elements into the dictionary. This unfortunately means that they cannot discard obsolete kernel functions, in particular within the context of a time-varying environment. Recently, to remedy this drawback, it has been proposed to associate an l1-norm regularization criterion with the mean-square error criterion. The aim of this paper is to provide theoretical results on the convergence of this approach. The mechanism is sketched on toy data after the BibTeX entry below.
    BibTeX:
    @inproceedings{gao2013kernel,
    author = {Gao, W. and Chen, J. and Richard, C. and Huang, J. and Flamary, R.},
    title = {Kernel LMS algorithm with Forward-Backward splitting for dictionary learning},
    booktitle = {International Conference on Acoustic, Speech and Signal Processing (ICASSP)},
    year = {2013}
    }
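    A toy sketch of the mechanism whose convergence the paper analyzes: a Gaussian-kernel LMS whose coefficients receive an l1 proximal (soft-threshold) step after each LMS update, so obsolete atoms can be driven to zero and pruned from the dictionary. The coherence-based admission rule and all constants are illustrative choices.

    # Kernel LMS with forward (LMS) / backward (l1 prox) splitting.
    import numpy as np

    def kern(x, y, sigma=1.0):
        return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

    rng = np.random.default_rng(0)
    mu, lam = 0.1, 1e-3
    dico, alpha = [], np.empty(0)

    for t in range(500):
        x = rng.uniform(-1, 1, 2)
        d = np.sin(3 * x[0]) * x[1] + 0.01 * rng.standard_normal()  # toy target
        kx = np.array([kern(x, c) for c in dico])
        e = d - (alpha @ kx if len(dico) else 0.0)        # prediction error
        if len(dico) == 0 or kx.max() < 0.8:              # admit new atom (coherence)
            dico.append(x)
            alpha = np.append(alpha, 0.0)
            kx = np.append(kx, kern(x, x))
        alpha = alpha + mu * e * kx                       # forward: LMS step
        alpha = np.sign(alpha) * np.maximum(np.abs(alpha) - lam, 0)  # backward: l1 prox
        keep = alpha != 0
        if not keep.all():                                # prune zeroed atoms
            dico = [c for c, kp in zip(dico, keep) if kp]
            alpha = alpha[keep]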
    R. Flamary, A. Rakotomamonjy, Support Vector Machine with spatial regularization for pixel classification, International Workshop on Advances in Regularization, Optimization, Kernel Methods and Support Vector Machines: theory and applications (ROKS), 2013.
    Abstract: We propose in this work to regularize the output of an SVM classifier on pixels in order to promote smoothness in the predicted image. The learning problem can be cast as a semi-supervised SVM with a particular structure encoding pixel neighborhood in the regularization graph. We provide several optimization schemes to solve the problem for linear SVMs with l2 or l1 regularization, and show the interest of the approach on an image classification example with very few labeled pixels.
    BibTeX:
    @inproceedings{ROKS2013,
    author = {Flamary, R. and Rakotomamonjy, A.},
    title = {Support Vector Machine with spatial regularization for pixel classification},
    booktitle = {International Workshop on Advances in Regularization, Optimization, Kernel Methods and Support Vector Machines: theory and applications (ROKS)},
    year = {2013}
    }
    D. Tuia, M. Volpi, M. Dalla Mura, A. Rakotomamonjy, R. Flamary, Create the relevant spatial filterbank in the hyperspectral jungle, IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2013.
    Abstract: Inclusion of spatial information is known to be beneficial to the classification of hyperspectral images. However, given the high dimensionality of the data, it is difficult to know beforehand which bands to filter or which filters to apply. In this paper, we propose an active set algorithm based on an $l_1$ support vector machine that explores the (possibly infinite) space of spatial filters and automatically retrieves the filters that maximize class separation. Experiments on hyperspectral imagery confirm the power of the method, which reaches state-of-the-art performance with small feature sets generated automatically and without prior knowledge.
    BibTeX:
    @inproceedings{IGARSS2013,
    author = {Tuia, D. and Volpi, M. and Dalla Mura, M. and Rakotomamonjy, A. and Flamary, R.},
    title = {Create the relevant spatial filterbank in the hyperspectral jungle},
    booktitle = {IEEE International Geoscience and Remote Sensing Symposium (IGARSS)},
    year = {2013}
    }

    2012

    D. Tuia, R. Flamary, M. Volpi, M. Dalla Mura, A. Rakotomamonjy, Discovering relevant spatial filterbanks for VHR image classification, International Conference on Pattern Recognition (ICPR), 2012.
    Abstract: In very high resolution (VHR) image classification it is common to use spatial filters to enhance the discrimination among landuses related to similar spectral properties but different spatial characteristics. However, the filter types that can be used are numerous (e.g. textural, morphological, Gabor, wavelets, etc.) and the user must pre-select a family of features, as well as their specific parameters. This results in feature spaces that are high-dimensional and redundant, thus requiring long and suboptimal feature selection phases. In this paper, we propose to discover the relevant filters as well as their parameters with a sparsity-promoting regularization and an active set algorithm that iteratively adds to the model the most promising features. This way, we explore the filters/parameters input space efficiently (which is infinitely large for continuous parameters) and construct the optimal filterbank for classification without any other information than the types of filters to be used.
    BibTeX:
    @inproceedings{ICPR2012,
    author = {Tuia, D. and Flamary, R. and Volpi, M. and Dalla Mura, M. and Rakotomamonjy, A.},
    title = {Discovering relevant spatial filterbanks for VHR image classification},
    booktitle = {International Conference on Pattern Recognition (ICPR)},
    year = {2012}
    }
    E. Niaf, R. Flamary, S. Canu, O. Rouvière, C. Lartizien, Handling learning samples uncertainties in SVM: application to MRI-based prostate cancer Computer-Aided Diagnosis, IEEE International Symposium on Biomedical Imaging, 2012.
    Abstract: Building an accurate training database is challenging in supervised classification. Radiologists often delineate malignant and benign tissues without access to the ground truth, thus leading to uncertain datasets. We propose to deal with this uncertainty by introducing probabilistic labels in the learning stage. We introduce a probabilistic support vector machine (P-SVM) inspired from the regular C-SVM formulation, allowing class labels to be taken into account through a hinge loss and probability estimates through an epsilon-insensitive cost function, together with a minimum-norm (maximum-margin) objective. The solution is used for both decision and posterior probability estimation. The loss structure is sketched on toy data after the BibTeX entry below.
    BibTeX:
    @inproceedings{isbi2012,
    author = {Niaf, E. and Flamary, R. and Canu, S. and Rouvière, O. and Lartizien, C.},
    title = {Handling learning samples uncertainties in SVM: application to MRI-based prostate cancer Computer-Aided Diagnosis},
    booktitle = {IEEE International Symposium on Biomedical Imaging},
    year = {2012}
    }
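    A hedged sketch of the loss structure described in the abstract, optimized with plain subgradient descent on toy data: a hinge loss on hard-labeled samples, an epsilon-insensitive loss pulling scores toward targets derived from the probabilistic labels, and a minimum-norm term. The logit mapping from probabilities to score targets is our illustrative choice.

    # P-SVM-style objective: hinge on labels, eps-insensitive on
    # probability-derived targets, plus 0.5*||w||^2.
    import numpy as np

    rng = np.random.default_rng(0)
    n, dim = 150, 10
    X = rng.standard_normal((n, dim))
    y = rng.choice([-1.0, 1.0], n)               # hard labels (first half used)
    p = np.clip(rng.random(n), 0.05, 0.95)       # probabilistic labels (second half)
    hard = np.arange(n) < n // 2
    t = np.log(p / (1 - p))                      # score targets from probabilities
    eps, C, lr = 0.1, 1.0, 1e-3
    w = np.zeros(dim)

    for it in range(1000):
        s = X @ w
        g = w.copy()                             # gradient of 0.5*||w||^2
        m1 = hard & (y * s < 1)                  # active hinge terms
        g -= C * (X[m1] * y[m1][:, None]).sum(axis=0)
        r = s - t
        m2 = ~hard & (np.abs(r) > eps)           # active eps-insensitive terms
        g += C * (X[m2] * np.sign(r[m2])[:, None]).sum(axis=0)
        w -= lr * g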

    2011

    R. Flamary, F. Yger, A. Rakotomamonjy, Selecting from an infinite set of features in SVM, European Symposium on Artificial Neural Networks, 2011.
    Abstract: Dealing with the continuous parameters of a feature extraction method has always been a difficult task that is usually solved by cross-validation. In this paper, we propose an active set algorithm for automatically selecting these parameters in an SVM classification context. Our experiments on texture recognition and BCI signal classification show that optimizing the feature parameters in a continuous space while learning the decision function yields better performance than using fixed parameters obtained from grid sampling.
    BibTeX:
    @inproceedings{ESANN2011,
    author = {Flamary, R. and Yger, F. and Rakotomamonjy, A.},
    title = {Selecting from an infinite set of features in SVM},
    booktitle = {European Symposium on Artificial Neural Networks},
    year = {2011}
    }
    R. Flamary, X. Anguera, N. Oliver, Spoken WordCloud: Clustering Recurrent Patterns in Speech, International Workshop on Content-Based Multimedia Indexing, 2011.
    Abstract: The automatic summarization of speech recordings is typically carried out as a two-step process: the speech is first decoded using an automatic speech recognition system and the resulting text transcripts are processed to create the summary. However, this approach might not be suitable under adverse acoustic conditions or for languages with limited training resources. In order to address these limitations, we propose in this paper an automatic speech summarization method that is based on the automatic discovery of patterns in the speech: recurrent acoustic patterns are first extracted from the audio and then clustered and ranked according to the number of repetitions in the recording. This approach allows us to build what we call a Spoken WordCloud because of its similarity with text-based word clouds. We present an algorithm that achieves a cluster purity of up to 90% and an inverse purity of 71% in preliminary experiments using a small dataset of connected spoken words.
    BibTeX:
    @inproceedings{CBMI2011,
    author = {Flamary, R. and Anguera, X. and Oliver, N.},
    title = {Spoken WordCloud: Clustering Recurrent Patterns in Speech},
    booktitle = {International Workshop on Content-Based Multimedia Indexing},
    year = {2011}
    }
    E. Niaf, R. Flamary, C. Lartizien, S. Canu, Handling uncertainties in SVM classification, IEEE Workshop on Statistical Signal Processing, 2011.
    Abstract: This paper addresses the pattern classification problem arising when available target data include some uncertainty information. Target data considered here are either qualitative (a class label) or quantitative (an estimate of the posterior probability). Our main contribution is an SVM-inspired formulation of this problem, taking class labels into account through a hinge loss and probability estimates through an epsilon-insensitive cost function, together with a minimum-norm (maximum-margin) objective. This formulation admits a dual form leading to a quadratic problem, and allows the use of a representer theorem and associated kernel. The solution provided can be used for both decision and posterior probability estimation. Based on empirical evidence, our method outperforms the regular SVM in terms of probability prediction and classification performance.
    BibTeX:
    @inproceedings{ssp2011,
    author = {Niaf, E. and Flamary, R. and Lartizien, C. and Canu, S.},
    title = {Handling uncertainties in SVM classification},
    booktitle = {IEEE Workshop on Statistical Signal Processing},
    year = {2011}
    }

    2010

    R. Flamary, B. Labbé, A. Rakotomamonjy, Large margin filtering for signal sequence labeling, International Conference on Acoustic, Speech and Signal Processing (ICASSP), 2010.
    Abstract: Signal Sequence Labeling consists in predicting a sequence of labels given an observed sequence of samples. A naive way is to filter the signal in order to reduce the noise and to apply a classification algorithm on the filtered samples. We propose in this paper to jointly learn the filter and the classifier, leading to a large-margin filtering for classification. This method makes it possible to learn the optimal cutoff frequency and phase of the filter, which may differ from zero. Two methods are proposed and tested on a toy dataset and on a real-life BCI dataset from BCI Competition III. A single-channel toy version of the joint learning is sketched after the BibTeX entry below.
    BibTeX:
    @inproceedings{flamaryicassp210,
    author = {Flamary, R. and Labbé, B. and Rakotomamonjy, A.},
    title = {Large margin filtering for signal sequence labeling},
    booktitle = {International Conference on Acoustic, Speech and Signal Processing (ICASSP)},
    year = {2010}
    }
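    A heavily simplified, single-channel sketch of the joint learning: the FIR filter taps and a linear decision rule are updated together by subgradient descent on the hinge loss, so the filter is shaped by the classification margin rather than fixed beforehand. Data, step sizes and the scalar classifier are toy choices.

    # Joint large-margin learning of a temporal filter and a classifier.
    import numpy as np

    rng = np.random.default_rng(0)
    n, F, lr = 400, 8, 1e-2
    labels = np.where(np.sin(np.arange(n) / 25) >= 0, 1.0, -1.0)  # slow labels
    x = labels + rng.standard_normal(n)               # noisy observed signal
    f = np.zeros(F); f[0] = 1.0                       # filter, identity init
    w, bias = 1.0, 0.0                                # scalar decision rule

    for it in range(300):
        xf = np.convolve(x, f)[:n]                    # filtered signal
        s = w * xf + bias
        m = labels * s < 1                            # margin violations
        if not m.any():
            break
        gw = -(labels[m] * xf[m]).mean()              # hinge subgradients
        gb = -labels[m].mean()
        gf = np.zeros(F)
        for k in range(F):                            # d xf[t] / d f[k] = x[t-k]
            xk = np.concatenate([np.zeros(k), x[:n - k]])
            gf[k] = -(labels[m] * w * xk[m]).mean()
        w -= lr * gw; bias -= lr * gb; f -= lr * gf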
    D. Tuia, G. Camps-Valls, R. Flamary, A. Rakotomamonjy, Learning spatial filters for multispectral image segmentation, IEEE Workshop in Machine Learning for Signal Processing (MLSP), 2010.
    Abstract: We present a novel filtering method for multispectral satellite image classification. The proposed method learns a set of spatial filters that maximize the class separability of a binary support vector machine (SVM) through a gradient descent approach. Regularization issues are discussed in detail and a Frobenius-norm regularization is proposed to efficiently exclude uninformative filter coefficients. Experiments carried out on multiclass one-against-all classification and target detection show the capabilities of the learned spatial filters.
    BibTeX:
    @inproceedings{mlsp10,
    author = {Tuia, D. and Camps-Valls, G. and Flamary, R. and Rakotomamonjy, A.},
    title = {Learning spatial filters for multispectral image segmentation},
    booktitle = {IEEE Workshop in Machine Learning for Signal Processing (MLSP)},
    year = {2010}
    }

    2009

    R. Flamary, J. Rose, A. Rakotomamonjy, S. Canu, Variational Sequence Labeling, IEEE Workshop in Machine Learning for Signal Processing (MLSP), 2009.
    Abstract: Sequence labeling is concerned with processing an input data sequence and producing an output sequence of discrete labels which characterize it. Common applications include speech recognition, language processing (tagging, chunking) and bioinformatics. Many solutions have been proposed to partially cope with this problem. These include probabilistic models (HMMs, CRFs) and machine learning algorithms (SVMs, neural nets). In practice, the best results have been obtained by combining several of these methods. However, fusing different signal segmentation methods is not straightforward, particularly when integrating prior information. In this paper, the sequence labeling problem is viewed as a multi-objective optimization task. Each objective targets a different aspect of sequence labeling, such as good classification, temporal stability and change detection. The resulting optimization problem turns out to be non-convex and plagued with numerous local minima. A region-growing algorithm is proposed as a method for finding a solution to this multi-objective optimization task. The proposed algorithm is evaluated on both synthetic and real data (BCI dataset). Results are encouraging and better than those previously reported on these datasets.
    BibTeX:
    @inproceedings{mlsp09,
    author = {Flamary, R. and Rose, J.L. and Rakotomamonjy, A. and Canu, S.},
    title = {Variational Sequence Labeling},
    booktitle = {IEEE Workshop in Machine Learning for Signal Processing (MLSP)},
    year = {2009}
    }