Rémi Flamary

Professional website

Publications

    Submitted or in-press works

    P. Krzakala, J. Yang, R. Flamary, F. d'Alché-Buc, C. Laclau, M. Labeau, End-to-end Supervised Prediction of Arbitrary-size Graphs with Partially-Masked Fused Gromov-Wasserstein Matching (Submitted), 2024.
    Abstract: We present a novel end-to-end deep learning-based approach for Supervised Graph Prediction (SGP). We introduce an original Optimal Transport (OT)-based loss, the Partially-Masked Fused Gromov-Wasserstein loss (PM-FGW), that directly leverages graph representations such as adjacency and feature matrices. PM-FGW exhibits all the desirable properties for SGP: it is node permutation invariant, sub-differentiable and handles graphs of different sizes by comparing their padded representations as well as their masking vectors. Moreover, we present a flexible transformer-based architecture that easily adapts to different types of input data. In the experimental section, three different tasks (a novel and challenging synthetic dataset, image2graph, and two real-world tasks, image2map and fingerprint2molecule) showcase the efficiency and versatility of the approach compared to competitors.
    BibTeX:
    @inproceedings{krzakala2024endtoend,
    author = {Paul Krzakala and Junjie Yang and Rémi Flamary and Florence d'Alché-Buc and Charlotte Laclau and Matthieu Labeau},
    title = {End-to-end Supervised Prediction of Arbitrary-size Graphs with Partially-Masked Fused Gromov-Wasserstein Matching},
    year = {2024 (Submitted)}
    }
    H. V. Assel, C. Vincent-Cuaz, N. Courty, R. Flamary, P. Frossard, T. Vayer, Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein Projection (Submitted), 2024.
    Abstract: Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets. Traditionally, this involves using dimensionality reduction methods to project data onto interpretable spaces or organizing points into meaningful clusters. In practice, these methods are used sequentially, without guaranteeing that the clustering aligns well with the conducted dimensionality reduction. In this work, we offer a fresh perspective: that of distributions. Leveraging tools from optimal transport, particularly the Gromov-Wasserstein distance, we unify clustering and dimensionality reduction into a single framework called distributional reduction. This allows us to jointly address clustering and dimensionality reduction with a single optimization problem. Through comprehensive experiments, we highlight the versatility and interpretability of our method and show that it outperforms existing approaches across a variety of image and genomics datasets.
    BibTeX:
    @inproceedings{vanassel2024distributional,
    author = {Hugues Van Assel and Cédric Vincent-Cuaz and Nicolas Courty and Rémi Flamary and Pascal Frossard and Titouan Vayer},
    title = {Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein Projection},
    year = {2024 (Submitted)}
    }
    C. Le Coz, A. Tantet, R. Flamary, R. Plougonven, A barycenter-based approach for the multi-model ensembling of subseasonal forecasts (Submitted), 2023.
    Abstract: Ensemble forecasts and their combination are explored from the perspective of a probability space. Manipulating ensemble forecasts as discrete probability distributions, multi-model ensembles (MMEs) are reformulated as barycenters of these distributions. Barycenters are defined with respect to a given distance. The barycenter with respect to the L2-distance is shown to be equivalent to the pooling method. Then, the barycenter-based approach is extended to a different distance with interesting properties in the distribution space: the Wasserstein distance. Another interesting feature of the barycenter approach is the possibility to give different weights to the ensembles and thus to naturally build weighted MMEs. As a proof of concept, the L2- and the Wasserstein-barycenters are applied to combine two models from the S2S database, namely the European Centre for Medium-Range Weather Forecasts (ECMWF) and the National Centers for Environmental Prediction (NCEP) models. The performances of the two (weighted) MMEs are evaluated for the prediction of weekly 2m-temperature over Europe for seven winters. The weights given to the models in the barycenters are optimized with respect to two metrics, the CRPS and the proportion of skilful forecasts. These weights have an important impact on the skill of the two barycenter-based MMEs. Although the ECMWF model has an overall better performance than NCEP, the barycenter-ensembles are generally able to outperform both. However, the best MME method, and also the weights, depend on the metric. These results constitute a promising first implementation of this methodology before moving to the combination of more models.
    BibTeX:
    @article{coz2023barycenterbased,
    author = {Le Coz, Camille and Tantet, Alexis and Flamary, Rémi and Plougonven, Riwal},
    title = {A barycenter-based approach for the multi-model ensembling of subseasonal forecasts},
    year = {2023 (Submitted)}
    }
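    Code sketch (Python): not from the paper; a minimal numpy illustration of the two barycenters in the 1D case. The toy ensembles and the weight value are invented, and the paper works with full forecast fields rather than scalars.
    import numpy as np

    rng = np.random.default_rng(0)
    # Two toy "ensemble forecasts" of 2m-temperature, 11 members each (deg C)
    ens_a = rng.normal(3.0, 1.5, size=11)   # stand-in for the ECMWF ensemble
    ens_b = rng.normal(4.2, 2.5, size=11)   # stand-in for the NCEP ensemble
    w = 0.7                                 # weight given to the first model

    # L2 barycenter = the "pooled ensemble": a weighted mixture keeping all members
    pooled = np.concatenate([ens_a, ens_b])
    pooled_w = np.concatenate([np.full(11, w / 11), np.full(11, (1 - w) / 11)])

    # Wasserstein-2 barycenter in 1D = weighted average of quantiles,
    # i.e. of the sorted members when the ensembles have equal sizes
    w2_bary = w * np.sort(ens_a) + (1 - w) * np.sort(ens_b)

    print("pooled mean:", np.average(pooled, weights=pooled_w))
    print("W2 barycenter members:", np.round(w2_bary, 2))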
    E. Tanguy, R. Flamary, J. Delon, Properties of Discrete Sliced Wasserstein Losses (Submitted), 2023.
    Abstract: The Sliced Wasserstein (SW) distance has become a popular alternative to the Wasserstein distance for comparing probability measures. Widespread applications include image processing, domain adaptation and generative modelling, where it is common to optimise some parameters in order to minimise SW, which serves as a loss function between discrete probability measures (since measures admitting densities are numerically unattainable). All these optimisation problems bear the same sub-problem, which is minimising the Sliced Wasserstein energy E. In this paper we study the properties of the SW distance between two uniform discrete measures with the same number of points as a function of the support Y of one of the measures. We investigate the regularity and optimisation properties of this energy, as well as its Monte-Carlo approximation E_p (estimating the expectation in SW using only p samples) and show convergence results on the critical points of E_p to those of E, as well as an almost-sure uniform convergence. Finally, we show that in a certain sense, Stochastic Gradient Descent methods minimising E and E_p converge towards (Clarke) critical points of these energies.
    BibTeX:
    @article{tanguy2023properties,
    author = {Eloi Tanguy and Rémi Flamary and Julie Delon},
    title = {Properties of Discrete Sliced Wasserstein Losses},
    year = {2023 (Submitted)}
    }
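    Code sketch (Python): a minimal numpy Monte-Carlo estimator of the squared Sliced Wasserstein distance between two uniform discrete measures with the same number of points, i.e. the quantity E_p studied above. Function name and toy data are ours; POT's ot.sliced_wasserstein_distance provides a tested implementation.
    import numpy as np

    def sw2_monte_carlo(X, Y, n_proj=100, seed=0):
        # p random directions drawn uniformly on the unit sphere
        rng = np.random.default_rng(seed)
        theta = rng.normal(size=(n_proj, X.shape[1]))
        theta /= np.linalg.norm(theta, axis=1, keepdims=True)
        # 1D Wasserstein-2 between uniform discrete measures reduces to
        # comparing sorted projected samples
        Xp, Yp = X @ theta.T, Y @ theta.T
        Xp.sort(axis=0)
        Yp.sort(axis=0)
        return np.mean((Xp - Yp) ** 2)

    X = np.random.default_rng(1).normal(size=(200, 3))
    Y = np.random.default_rng(2).normal(loc=1.0, size=(200, 3))
    print(sw2_monte_carlo(X, Y))   # E_p estimate with p = 100 projections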
    A. Trovato, É. Chassande-Mottin, M. Bejger, R. Flamary, N. Courty, Neural network time-series classifiers for gravitational-wave searches in single-detector periods (Submitted), 2023.
    Abstract: The search for gravitational-wave signals is limited by non-Gaussian transient noises that mimic astrophysical signals. Temporal coincidence between two or more detectors is used to mitigate contamination by these instrumental glitches. However, when a single detector is in operation, coincidence is impossible, and other strategies have to be used. We explore the possibility of using neural network classifiers and present the results obtained with three types of architectures: convolutional neural network, temporal convolutional network, and inception time. The last two architectures are specifically designed to process time-series data. The classifiers are trained on a month of data from the LIGO Livingston detector during the first observing run (O1) to identify data segments that include the signature of a binary black hole merger. Their performances are assessed and compared. We then apply the trained classifiers to the remaining three months of O1 data, focusing specifically on single-detector times. The most promising candidate from our search is 2016-01-04 12:24:17 UTC. Although we are not able to constrain the significance of this event to the level conventionally followed in gravitational-wave searches, we show that the signal is compatible with the merger of two black holes with masses m1 = 50.7(+10.4/-8.9) M⊙ and m2 = 24.4(+20.2/-9.3) M⊙ at a luminosity distance of dL = 564(+812/-338) Mpc.
    BibTeX:
    @article{trovato2023neural,
    author = {Trovato, Agata and Chassande-Mottin, Éric and Bejger, Michal and Flamary, Rémi and Courty, Nicolas},
    title = {Neural network time-series classifiers for gravitational-wave searches in single-detector periods},
    year = {2023 (Submitted)}
    }
    E. Tanguy, R. Flamary, J. Delon, Reconstructing discrete measures from projections. Consequences on the empirical Sliced Wasserstein Distance (Submitted), 2023.
    Abstract: This paper deals with the reconstruction of a discrete measure $\gamma_Z$ on $\mathbb{R}^d$ from the knowledge of its pushforward measures $P_i\#\gamma_Z$ by linear applications $P_i: \mathbb{R}^d \rightarrow \mathbb{R}^{d_i}$ (for instance projections onto subspaces). The measure $\gamma_Z$ being fixed, assuming that the rows of the matrices $P_i$ are independent realizations of laws which do not give mass to hyperplanes, we show that if $\sum_i d_i > d$, this reconstruction problem almost surely has a unique solution. This holds for any number of points in $\gamma_Z$. A direct consequence of this result is an almost-sure separability property of the empirical Sliced Wasserstein distance.
    BibTeX:
    @article{tanguy2023reconstructing,
    author = {Eloi Tanguy and Rémi Flamary and Julie Delon},
    title = {Reconstructing discrete measures from projections. Consequences on the empirical Sliced Wasserstein Distance},
    year = {2023 (Submitted)}
    }

    2024

    A. Collas, R. Flamary, A. Gramfort, Weakly supervised covariance matrices alignment through Stiefel matrices estimation for MEG applications, 2024.
    Abstract: This paper introduces a novel domain adaptation technique for time series data, called Mixing model Stiefel Adaptation (MSA), specifically addressing the challenge of limited labeled signals in the target dataset. Leveraging a domain-dependent mixing model and the optimal transport domain adaptation assumption, we exploit abundant unlabeled data in the target domain to ensure effective prediction by establishing pairwise correspondence with equivalent signal variances between domains. Theoretical foundations are laid for identifying crucial Stiefel matrices, essential for recovering underlying signal variances from a Riemannian representation of observed signal covariances. We propose an integrated cost function that simultaneously learns these matrices, pairwise domain relationships, and a predictor, classifier, or regressor, depending on the task. Applied to neuroscience problems, MSA outperforms recent methods in brain-age regression with task variations using magnetoencephalography (MEG) signals from the Cam-CAN dataset.
    BibTeX:
    @techreport{collas2024weakly,
    author = {Antoine Collas and Rémi Flamary and Alexandre Gramfort},
    title = {Weakly supervised covariance matrices alignment through Stiefel matrices estimation for MEG applications},
    institution = {INRIA, École Polytechnique},
    year = {2024}
    }
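    Code sketch (Python): the paper's cost function and pairing scheme are not reproduced here; this only illustrates the basic manifold operation it builds on, projecting an arbitrary matrix onto the Stiefel manifold via its polar factor.
    import numpy as np

    def project_stiefel(A):
        # Nearest matrix with orthonormal columns (polar factor of A),
        # for A of shape (n, p) with n >= p
        U, _, Vt = np.linalg.svd(A, full_matrices=False)
        return U @ Vt

    A = np.random.default_rng(0).normal(size=(8, 3))
    Q = project_stiefel(A)
    print(np.allclose(Q.T @ Q, np.eye(3)))   # True: Q lies on St(8, 3)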

    2023

    T. Gnassounou, R. Flamary, A. Gramfort, Convolutional Monge Mapping Normalization for learning on biosignals, Neural Information Processing Systems (NeurIPS), 2023.
    Abstract: In many machine learning applications on signals and biomedical data, especially electroencephalogram (EEG), one major challenge is the variability of the data across subjects, sessions, and hardware devices. In this work, we propose a new method called Convolutional Monge Mapping Normalization (CMMN), which consists in filtering the signals in order to adapt their power spectral density (PSD) to a Wasserstein barycenter estimated on training data. CMMN relies on novel closed-form solutions for optimal transport mappings and barycenters and provides individual test-time adaptation to new data without needing to retrain a prediction model. Numerical experiments on sleep EEG data show that CMMN leads to significant and consistent performance gains independent of the neural network architecture when adapting between subjects, sessions, and even datasets collected with different hardware. Notably, our performance gain is on par with much more numerically intensive Domain Adaptation (DA) methods and can be used in conjunction with those for even better performance.
    BibTeX:
    @inproceedings{gnassounou2023convolutional,
    author = {Gnassounou, Théo and Flamary, Rémi and Gramfort, Alexandre},
    title = {Convolutional Monge Mapping Normalization for learning on biosignals},
    booktitle = {Neural Information Processing Systems (NeurIPS)},
    year = {2023}
    }
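    Code sketch (Python): a rough reading of CMMN with numpy/scipy, under the closed form stated in the paper for stationary signals: the Wasserstein barycenter of the PSDs is the squared mean of their square roots, and the per-signal Monge mapping is a linear filter with gain sqrt(psd_bar / psd_i). Sampling rate, filter length and test signals are arbitrary choices of ours.
    import numpy as np
    from scipy.signal import welch, firwin2, lfilter

    def cmmn_filters(signals, fs, numtaps=129, nperseg=256):
        # Welch PSD estimate for every signal
        psds = []
        for x in signals:
            f, p = welch(x, fs=fs, nperseg=nperseg)
            psds.append(p)
        # Wasserstein barycenter of the PSDs (closed form)
        psd_bar = np.mean(np.sqrt(psds), axis=0) ** 2
        # One FIR mapping filter per signal, fitted to the target gain
        filters = []
        for p in psds:
            gain = np.sqrt(psd_bar / np.maximum(p, 1e-12))
            filters.append(firwin2(numtaps, f / (fs / 2), gain))
        return filters

    rng = np.random.default_rng(0)
    sigs = [rng.normal(size=4096), 3.0 * rng.normal(size=4096)]
    for x, h in zip(sigs, cmmn_filters(sigs, fs=128.0)):
        y = lfilter(h, [1.0], x)   # adapted signal, PSD close to the barycenter
        print(round(float(y.std()), 2))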
    H. Van Assel, T. Vayer, R. Flamary, N. Courty, SNEkhorn: Dimension Reduction with Symmetric Entropic Affinities, Neural Information Processing Systems (NeurIPS), 2023.
    Abstract: Many approaches in machine learning rely on a weighted graph to encode the similarities between samples in a dataset. Entropic affinities (EAs), which are notably used in the popular Dimensionality Reduction (DR) algorithm t-SNE, are particular instances of such graphs. To ensure robustness to heterogeneous sampling densities, EAs assign a kernel bandwidth parameter to every sample in such a way that the entropy of each row in the affinity matrix is kept constant at a specific value, whose exponential is known as perplexity. EAs are inherently asymmetric and row-wise stochastic, but they are used in DR approaches after undergoing heuristic symmetrization methods that violate both the row-wise constant entropy and stochasticity properties. In this work, we uncover a novel characterization of EA as an optimal transport problem, allowing a natural symmetrization that can be computed efficiently using dual ascent. The corresponding novel affinity matrix derives advantages from symmetric doubly stochastic normalization in terms of clustering performance, while also effectively controlling the entropy of each row, thus making it particularly robust to varying noise levels. Building on this, we present a new DR algorithm, SNEkhorn, that leverages this new affinity matrix. We show its clear superiority to state-of-the-art approaches with several indicators on both synthetic and real-world datasets.
    BibTeX:
    @inproceedings{van2023snekhorn,
    author = {Van Assel, Hugues and Vayer, Titouan and Flamary, Rémi and Courty, Nicolas},
    title = {SNEkhorn: Dimension Reduction with Symmetric Entropic Affinities},
    booktitle = {Neural Information Processing Systems (NeurIPS)},
    year = {2023}
    }
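    Code sketch (Python): a simplified stand-in for the symmetric entropic affinity, written in plain numpy. It projects a Gaussian kernel onto doubly stochastic matrices with a symmetric Sinkhorn iteration; the paper additionally keeps the per-row entropy (the perplexity) fixed, which this sketch does not do.
    import numpy as np
    from scipy.spatial.distance import cdist

    def sym_sinkhorn_affinity(X, eps=1.0, n_iter=200):
        C = cdist(X, X, "sqeuclidean")
        np.fill_diagonal(C, np.inf)             # no self-affinity
        logK = -C / eps
        f = np.zeros(len(X))                    # symmetric dual potential
        target = np.log(1.0 / len(X))           # every row must sum to 1/n
        for _ in range(n_iter):
            lse = np.logaddexp.reduce(logK + f[None, :], axis=1)
            f = 0.5 * (f + target - lse)        # damped symmetric update
        return np.exp(logK + f[:, None] + f[None, :])

    P = sym_sinkhorn_affinity(np.random.default_rng(0).normal(size=(50, 5)))
    print(P.sum(axis=1)[:3])                    # marginals close to 1/50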
    H. Van Assel, C. Vincent-Cuaz, T. Vayer, R. Flamary, N. Courty, Interpolating between Clustering and Dimensionality Reduction with Gromov-Wasserstein, Optimal Transport and Machine Learning Workshop at NeurIPS 2023, 2023.
    Abstract: We present a versatile adaptation of existing dimensionality reduction (DR) objectives, enabling the simultaneous reduction of both sample and feature sizes. Correspondences between input and embedding samples are computed through a semi-relaxed Gromov-Wasserstein optimal transport (OT) problem. When the embedding sample size matches that of the input, our model recovers classical popular DR models. When the embedding's dimensionality is unconstrained, we show that the OT plan delivers a competitive hard clustering. We emphasize the importance of intermediate stages that blend DR and clustering for summarizing real data and apply our method to visualize datasets of images.
    BibTeX:
    @conference{van2023interpolating,
    author = {Van Assel, Hugues and Vincent-Cuaz, Cédric and Vayer, Titouan and Flamary, Rémi and Courty, Nicolas},
    title = {Interpolating between Clustering and Dimensionality Reduction with Gromov-Wasserstein},
    howpublished = {Optimal Transport and Machine Learning Workshop at NeurIPS 2023},
    year = {2023}
    }
    H. Van Assel, T. Vayer, R. Flamary, N. Courty, Optimal Transport with Adaptive Regularisation, Optimal Transport and Machine Learning Workshop at NeurIPS 2023, 2023.
    Abstract: Regularising the primal formulation of optimal transport (OT) with a strictly convex term leads to enhanced numerical complexity and a denser transport plan. Many formulations impose a global constraint on the transport plan, for instance by relying on entropic regularisation. As it is more expensive to diffuse mass for outlier points compared to central ones, this typically results in a significant imbalance in the way mass is spread across the points. This can be detrimental for some applications where a minimum of smoothing is required per point. To remedy this, we introduce OT with Adaptive RegularIsation (OTARI), a new formulation of OT that imposes constraints on the mass going into and/or out of each point. We then showcase the benefits of this approach for domain adaptation.
    BibTeX:
    @conference{van2023optimal,
    author = {Van Assel, Hugues and Vayer, Titouan and Flamary, Rémi and Courty, Nicolas},
    title = {Optimal Transport with Adaptive Regularisation},
    howpublished = {Optimal Transport and Machine Learning Workshop at NeurIPS 2023},
    year = {2023}
    }
    C. Le Coz, A. Tantet, R. Flamary, R. Plougonven, Optimal transport for the multi-model combination of sub-seasonal ensemble forecasts, European Geosciences Union (EGU) General Assembly 2023, 2023.
    Abstract: Combining ensemble forecasts from several models has been shown to improve the skill of S2S predictions. One of the most widely used methods for such a combination is the “pooled ensemble” method, i.e. the concatenation of the ensemble members from the different models. The members of the new multi-model ensemble can simply have the same weights or be given different weights based on the skills of the models. If one sees the ensemble forecasts as discrete probability distributions, then the “pooled ensemble” is their (weighted-)barycenter with respect to the L2 distance. Here, we investigate whether using a different metric when computing the barycenter may help improve the skill of S2S predictions. We consider in this work a second barycenter with respect to the Wasserstein distance. This distance is defined as the cost of the optimal transport between two distributions and has interesting properties in the distribution space, such as the possibility to preserve the temporal consistency of the ensemble members. We compare the L2 and Wasserstein barycenters for the combination of two models from the S2S database, namely ECMWF and NCEP. Their performances are evaluated for the weekly 2m-temperature over seven winters in Europe (land) in terms of different scores. The weights of the models in the barycenters are estimated from the data using grid search with cross-validation. We show that the estimation of these weights is critical as it greatly impacts the score of the barycenters. Although the NCEP ensemble generally has poorer skills than the ECMWF one, the barycenter ensembles are able to improve on both single-model ensembles (although not for all scores). In the end, the best ensemble depends on the score and on the location. These results constitute a promising first step before implementing this methodology with more than two ensembles, and with ensembles having less contrasting skills.
    BibTeX:
    @conference{le2023optimal,
    author = {Le Coz, Camille and Tantet, Alexis and Flamary, Rémi and Plougonven, Riwal},
    title = {Optimal transport for the multi-model combination of sub-seasonal ensemble forecasts},
    howpublished = {European Geosciences Union (EGU) General Assembly 2023},
    year = {2023}
    }
    A. Collas, T. Vayer, R. Flamary, A. Breloy, Entropic Wasserstein component analysis, IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 2023.
    Abstract: Dimension reduction (DR) methods provide systematic approaches for analyzing high-dimensional data. A key requirement for DR is to incorporate global dependencies among original and embedded samples while preserving clusters in the embedding space. To achieve this, we combine the principles of optimal transport (OT) and principal component analysis (PCA). Our method seeks the best linear subspace that minimizes reconstruction error using entropic OT, which naturally encodes the neighborhood information of the samples. From an algorithmic standpoint, we propose an efficient block-majorization-minimization solver over the Stiefel manifold. Our experimental results demonstrate that our approach can effectively preserve high-dimensional clusters, leading to more interpretable and effective embeddings. Python code of the algorithms and experiments is available online.
    BibTeX:
    @inproceedings{collas2023entropic,
    author = {Collas, Antoine and Vayer, Titouan and Flamary, Rémi and Breloy, Arnaud},
    title = {Entropic Wasserstein component analysis},
    booktitle = { IEEE International Workshop on Machine Learning for Signal Processing (MLSP)},
    year = {2023}
    }
    Q. H. Tran, H. Janati, N. Courty, R. Flamary, I. Redko, P. Demetci, R. Singh, Unbalanced CO-Optimal Transport, Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI), 2023.
    Abstract: Optimal transport (OT) compares probability distributions by computing a meaningful alignment between their samples. CO-optimal transport (COOT) takes this comparison further by inferring an alignment between features as well. While this approach leads to better alignments and generalizes both OT and Gromov-Wasserstein distances, we provide a theoretical result showing that it is sensitive to outliers that are omnipresent in real-world data. This prompts us to propose unbalanced COOT for which we provably show its robustness to noise in the compared datasets. To the best of our knowledge, this is the first such result for OT methods in incomparable spaces. With this result in hand, we provide empirical evidence of this robustness for the challenging tasks of heterogeneous domain adaptation with and without varying proportions of classes and simultaneous alignment of samples and features across single-cell measurements.
    BibTeX:
    @inproceedings{tran2023unbalanced,
    author = { Tran, Quang Huy and Janati, Hicham and Courty, Nicolas and Flamary, Rémi and Redko, Ievgen and Demetci, Pinar and Singh, Ritambhara},
    title = {Unbalanced CO-Optimal Transport},
    booktitle = { Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI)},
    year = {2023}
    }
    D. Bouche, R. Flamary, F. d'Alché-Buc, R. Plougonven, M. Clausel, J. Badosa, P. Drobinski, Wind power predictions from nowcasts to 4-hour forecasts: a learning approach with variable selection, Renewable Energy, 2023.
    Abstract: We study the prediction of short term wind speed and wind power (every 10 minutes up to 4 hours ahead). Accurate forecasts for those quantities are crucial to mitigate the negative effects of wind farms' intermittent production on energy systems and markets. For those time scales, outputs of numerical weather prediction models are usually overlooked even though they should provide valuable information on larger-scale dynamics. In this work, we combine those outputs with local observations using machine learning. So as to make the results usable for practitioners, we focus on simple and well known methods which can handle a high volume of data. We first study variable selection through two simple techniques, a linear one and a nonlinear one. Then we exploit those results to forecast wind speed and wind power, still with an emphasis on linear models versus nonlinear ones. For the wind power prediction, we also compare the direct approach (directly predicting wind power) and the indirect one (wind speed predictions passed through a power curve).
    BibTeX:
    @article{bouche2023wind,
    author = {Bouche, Dimitri and Flamary, Rémi and d'Alché-Buc, Florence and Plougonven, Riwal and Clausel, Marianne and Badosa, Jordi and Drobinski, Philippe},
    title = {Wind power predictions from nowcasts to 4-hour forecasts: a learning approach with variable selection},
    journal = {Renewable Energy},
    year = {2023}
    }

    2022

    T. Vayer, L. Chapel, N. Courty, R. Flamary, Y. Soullard, R. Tavenard, Time Series Alignment with Global Invariances, Transactions on Machine Learning Research (TMLR), 2022.
    Abstract: In this work we address the problem of comparing time series while taking into account both feature space transformation and temporal variability. The proposed framework combines a latent global transformation of the feature space with the widely used Dynamic Time Warping (DTW). The latent global transformation captures the feature invariance while the DTW (or its smooth counterpart soft-DTW) deals with the temporal shifts. We cast the problem as a joint optimization over the global transformation and the temporal alignments. The versatility of our framework allows for several variants depending on the invariance class at stake. Among our contributions we define a differentiable loss for time series and present two algorithms for the computation of time series barycenters under our new geometry. We illustrate the interest of our approach on both simulated and real world data.
    BibTeX:
    @article{vayer2022time,
    author = {Titouan Vayer and Laetitia Chapel and Nicolas Courty and Rémi Flamary and Yann Soullard and Romain Tavenard},
    title = {Time Series Alignment with Global Invariances},
    journal = { Transactions on Machine Learning Research (TMLR)},
    year = {2022}
    }
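    Code sketch (Python): a naive version of the joint optimization, alternating plain DTW with an orthogonal Procrustes update of the feature map (the rotation-invariance case; soft-DTW and the barycenter algorithms of the paper are not sketched).
    import numpy as np
    from scipy.linalg import orthogonal_procrustes

    def dtw_path(A, B):
        # Textbook DTW between series A (n, d) and B (m, d); returns the path
        n, m = len(A), len(B)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                c = np.sum((A[i - 1] - B[j - 1]) ** 2)
                D[i, j] = c + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        path, (i, j) = [], (n, m)
        while (i, j) != (0, 0):
            path.append((i - 1, j - 1))
            step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
            i, j = [(i - 1, j - 1), (i - 1, j), (i, j - 1)][step]
        return path[::-1]

    def dtw_global_invariance(A, B, n_iter=10):
        # Alternate: align in time given the map, then refit the map
        R = np.eye(A.shape[1])
        for _ in range(n_iter):
            path = dtw_path(A, B @ R)
            ia, ib = zip(*path)
            R, _ = orthogonal_procrustes(B[list(ib)], A[list(ia)])
        return R, path

    rng = np.random.default_rng(0)
    A = rng.normal(size=(40, 2))
    R_true = np.array([[0.0, -1.0], [1.0, 0.0]])   # 90-degree rotation
    R, _ = dtw_global_invariance(A @ R_true.T, A)
    print(np.round(R, 2))                          # recovers the rotation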
    C. Vincent-Cuaz, R. Flamary, M. Corneli, T. Vayer, N. Courty, Semi-relaxed Gromov-Wasserstein divergence for graphs classification, Colloque GRETSI 2022-XXVIIIème Colloque Francophone de Traitement du Signal et des Images, 2022.
    Abstract: Comparing structured objects such as graphs is a fundamental operation involved in many learning tasks. To this end, the Gromov-Wasserstein (GW) distance, based on Optimal Transport (OT), has been successful in providing meaningful comparison between such entities. GW operates on graphs, seen as probability measures over spaces depicted by their nodes connectivity relations. At the core of OT is the idea of mass conservation, which imposes a coupling between all the nodes from the two considered graphs. We argue in this paper that this property can be detrimental for tasks such as graph dictionary learning (DL), and we relax it by proposing a new semi-relaxed Gromov-Wasserstein divergence. The latter leads to immediate computational benefits and naturally induces a new graph DL method, shown to be relevant for unsupervised representation learning and classification of graphs.
    BibTeX:
    @conference{vincent2022semiclassif,
    author = {Vincent-Cuaz, Cédric and Flamary, Rémi and Corneli, Marco and Vayer, Titouan and Courty, Nicolas},
    title = {Semi-relaxed Gromov-Wasserstein divergence for graphs classification},
    howpublished = {Colloque GRETSI 2022-XXVIIIème Colloque Francophone de Traitement du Signal et des Images},
    year = {2022}
    }
    C. Vincent-Cuaz, R. Flamary, M. Corneli, T. Vayer, N. Courty, Template based Graph Neural Network with Optimal Transport Distances, Neural Information Processing Systems (NeurIPS), 2022.
    Abstract: Current Graph Neural Networks (GNN) architectures generally rely on two important components: node features embedding through message passing, and aggregation with a specialized form of pooling. The structural (or topological) information is implicitly taken into account in these two steps. We propose in this work a novel point of view, which places distances to some learnable graph templates at the core of the graph representation. This distance embedding is constructed thanks to an optimal transport distance: the Fused Gromov-Wasserstein (FGW) distance, which encodes simultaneously feature and structure dissimilarities by solving a soft graph-matching problem. We postulate that the vector of FGW distances to a set of template graphs has a strong discriminative power, which is then fed to a non-linear classifier for final predictions. Distance embedding can be seen as a new layer, and can leverage existing message passing techniques to promote sensible feature representations. Interestingly enough, in our work the optimal set of template graphs is also learnt in an end-to-end fashion by differentiating through this layer. After describing the corresponding learning procedure, we empirically validate our claim on several synthetic and real-life graph classification datasets, where our method is competitive with or surpasses kernel and GNN state-of-the-art approaches. We complete our experiments with an ablation study and a sensitivity analysis to parameters.
    BibTeX:
    @inproceedings{vincentcuaz2022template,
    author = { Vincent-Cuaz, Cédric and Flamary, Rémi and Corneli, Marco and Vayer, Titouan and Courty, Nicolas},
    title = {Template based Graph Neural Network with Optimal Transport Distances},
    booktitle = {Neural Information Processing Systems (NeurIPS)},
    year = {2022}
    }
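    Code sketch (Python): the distance embedding at the core of TFGW, computed with POT's fused_gromov_wasserstein2. Templates are fixed random graphs here, whereas the paper learns them (and the node features) end-to-end by differentiating through this layer.
    import numpy as np
    import ot

    def fgw_embedding(C, F, templates, alpha=0.5):
        # Vector of FGW distances from graph (C, F) to each template (Ck, Fk)
        p = ot.unif(len(C))
        emb = []
        for Ck, Fk in templates:
            M = ot.dist(F, Fk)   # feature cost between the two node sets
            emb.append(ot.gromov.fused_gromov_wasserstein2(
                M, C, Ck, p, ot.unif(len(Ck)), alpha=alpha))
        return np.array(emb)

    rng = np.random.default_rng(0)
    C = (rng.random((6, 6)) < 0.4).astype(float)
    C = np.maximum(C, C.T)                       # symmetric adjacency
    F = rng.normal(size=(6, 2))                  # node features
    templates = []
    for _ in range(3):
        Ck = (rng.random((4, 4)) < 0.5).astype(float)
        templates.append((np.maximum(Ck, Ck.T), rng.normal(size=(4, 2))))
    print(fgw_embedding(C, F, templates))        # 3-dimensional embedding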
    A. Thual, H. Tran, T. Zemskova, N. Courty, R. Flamary, S. Dehaene, B. Thirion, Aligning individual brains with Fused Unbalanced Gromov-Wasserstein, Neural Information Processing Systems (NeurIPS), 2022.
    Abstract: Individual brains vary in both anatomy and functional organization, even within a given species. Inter-individual variability is a major impediment when trying to draw generalizable conclusions from neuroimaging data collected on groups of subjects. Current co-registration procedures rely on limited data, and thus lead to very coarse inter-subject alignments. In this work, we present a novel method for inter-subject alignment based on Optimal Transport, denoted as Fused Unbalanced Gromov Wasserstein (FUGW). The method aligns cortical surfaces based on the similarity of their functional signatures in response to a variety of stimulation settings, while penalizing large deformations of individual topographic organization. We demonstrate that FUGW is well-suited for whole-brain landmark-free alignment. The unbalanced feature makes it possible to handle the fact that functional areas vary in size across subjects. Our results show that FUGW alignment significantly increases between-subject correlation of activity for independent functional data, and leads to more precise mapping at the group level.
    BibTeX:
    @inproceedings{thual2022aligning,
    author = { Thual, Alexis and Tran, Huy and Zemskova, Tatiana and Courty, Nicolas and Flamary, Rémi and Dehaene, Stanislas and Thirion, Bertrand},
    title = {Aligning individual brains with Fused Unbalanced Gromov-Wasserstein},
    booktitle = {Neural Information Processing Systems (NeurIPS)},
    year = {2022}
    }
    L. Brogat-Motte, R. Flamary, C. Brouard, J. Rousu, F. d'Alché-Buc, Learning to Predict Graphs with Fused Gromov-Wasserstein Barycenters, International Conference on Machine Learning (ICML), 2022.
    Abstract: This paper introduces a novel and generic framework to solve the flagship task of supervised labeled graph prediction by leveraging Optimal Transport tools. We formulate the problem as regression with the Fused Gromov-Wasserstein (FGW) loss and propose a predictive model relying on a FGW barycenter whose weights depend on inputs. First we introduce a non-parametric estimator based on kernel ridge regression for which theoretical results such as consistency and excess risk bound are proved. Next we propose an interpretable parametric model where the barycenter weights are modeled with a neural network and the graphs on which the FGW barycenter is calculated are additionally learned. Numerical experiments show the strength of the method and its ability to interpolate in the labeled graph space on simulated data and on a difficult metabolic identification problem where it can reach very good performance with very little engineering.
    BibTeX:
    @inproceedings{brogat2022learning,
    author = {Brogat-Motte, Luc and Flamary, Rémi and Brouard, Céline and Rousu, Juho and d'Alché-Buc, Florence},
    title = {Learning to Predict Graphs with Fused Gromov-Wasserstein Barycenters},
    booktitle = {International Conference on Machine Learning (ICML)},
    year = {2022}
    }
    R. Turrisi, R. Flamary, A. Rakotomamonjy, M. Pontil, Multi-source Domain Adaptation via Weighted Joint Distributions Optimal Transport, Conference on Uncertainty in Artificial Intelligence (UAI), 2022.
    Abstract: The problem of domain adaptation on an unlabeled target dataset using knowledge from multiple labelled source datasets is becoming increasingly important. A key challenge is to design an approach that overcomes the covariate and target shift both among the sources, and between the source and target domains. In this paper, we address this problem from a new perspective: instead of looking for a latent representation invariant between source and target domains, we exploit the diversity of source distributions by tuning their weights to the target task at hand. Our method, named Weighted Joint Distribution Optimal Transport (WJDOT), aims at finding simultaneously an Optimal Transport-based alignment between the source and target distributions and a re-weighting of the source distributions. We discuss the theoretical aspects of the method and propose a conceptually simple algorithm. Numerical experiments indicate that the proposed method achieves state-of-the-art performance on simulated and real-life datasets.
    BibTeX:
    @inproceedings{turrisi2022multisource,
    author = {Rosanna Turrisi and Rémi Flamary and Alain Rakotomamonjy and Massimiliano Pontil},
    title = {Multi-source Domain Adaptation via Weighted Joint Distributions Optimal Transport},
    booktitle = { Conference on Uncertainty in Artificial Intelligence (UAI)},
    year = {2022}
    }
    L. Dragoni, R. Flamary, K. Lounici, P. Reynaud-Bouret, Sliding window strategy for convolutional spike sorting with Lasso: Algorithm, theoretical guarantees and complexity, Acta Applicandae Mathematicae, Vol. 179, N. 78, 2022.
    Abstract: We present a fast algorithm for the resolution of the Lasso for convolutional models in high dimension, with a particular focus on the problem of spike sorting in neuroscience. Making use of biological properties related to neurons, we explain how the particular structure of the problem allows several optimizations, leading to an algorithm with a temporal complexity which grows linearly with respect to the size of the recorded signal and can be performed online. Moreover, the spatial separability of the initial problem allows breaking it into subproblems, further reducing the complexity and making its application possible on the latest recording devices, which comprise a large number of sensors. We provide several mathematical results: the size and numerical complexity of the subproblems can be estimated mathematically by using percolation theory. We also show under reasonable assumptions that the Lasso estimator retrieves the true support with large probability. Finally, the theoretical time complexity of the algorithm is given. Numerical simulations are also provided in order to illustrate the efficiency of our approach.
    BibTeX:
    @article{dragoni2022sliding,
    author = {Dragoni, Laurent and Flamary, Rémi and Lounici, Karim and Reynaud-Bouret, Patricia},
    title = {Sliding window strategy for convolutional spike sorting with Lasso: Algorithm, theoretical guarantees and complexity},
    journal = { Acta Applicandae Mathematicae},
    volume = { 179},
    number = { 78},
    year = {2022}
    }
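    Code sketch (Python): a toy instance of the convolutional Lasso model (known spike waveform convolved with a sparse activation vector), solved densely with scikit-learn. The paper's contribution is precisely the sliding-window solver that avoids forming this dense problem on long multi-sensor recordings.
    import numpy as np
    from scipy.linalg import toeplitz
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    T, w = 300, 20
    waveform = np.hanning(w)                 # known action-potential shape
    col = np.zeros(T)
    col[:w] = waveform
    D = toeplitz(col, np.zeros(T))           # convolution written as a matrix
    activ = np.zeros(T)
    activ[[50, 120, 200]] = [1.0, 0.8, 1.2]  # true spike times and amplitudes
    y = D @ activ + 0.05 * rng.normal(size=T)

    lasso = Lasso(alpha=0.01, positive=True, max_iter=10000).fit(D, y)
    print(np.nonzero(lasso.coef_ > 0.1)[0])  # support: should sit near 50, 120, 200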
    C. Brouard, J. Mariette, R. Flamary, N. Vialaneix, Feature selection for kernel methods in systems biology, NAR Genomics and Bioinformatics, Vol. 4, N. 1, pp lqac014, 2022.
    Abstract: The substantial development of high-throughput bio-technologies has rendered large-scale multi-omics datasets increasingly available. New challenges have emerged to process and integrate this large volume of information, often obtained from widely heterogeneous sources. Kernel methods have proven successful to handle the analysis of different types of datasets obtained on the same individuals. However, they usually suffer from a lack of interpretability since the original description of the individuals is lost due to the kernel embedding. We propose novel feature selection methods that are adapted to the kernel framework and go beyond the well established work in supervised learning by addressing the more difficult tasks of unsupervised learning and kernel output learning. The method takes the form of a non-convex optimization problem with an L1 penalty, which is solved with a proximal gradient descent approach. It is tested on several systems biology datasets and shows good performance in selecting relevant and less redundant features compared to existing alternatives. It also proved relevant for identifying important governmental measures best explaining the time series of the Covid-19 reproduction number during the first months of 2020. The proposed feature selection method is embedded in the R package mixKernel version 0.7, published on CRAN.
    BibTeX:
    @article{brouard2022feature,
    author = {Brouard, Céline and Mariette, Jérôme and Flamary, Rémi and Vialaneix, Nathalie},
    title = {Feature selection for kernel methods in systems biology},
    journal = {NAR Genomics and Bioinformatics},
    volume = {4},
    number = {1},
    pages = {lqac014},
    publisher = {Oxford University Press},
    year = {2022}
    }
    A. Rakotomamonjy, R. Flamary, G. Gasso, J. Salmon, Convergent Working Set Algorithm for Lasso with Non-Convex Sparse Regularizers, International Conference on Artificial Intelligence and Statistics (AISTATS), 2022.
    Abstract: Owing to their statistical properties, non-convex sparse regularizers have attracted much interest for estimating a sparse linear model from high dimensional data. Given that the solution is sparse, for accelerating convergence, a working set strategy addresses the optimization problem through an iterative algorithm by incrementing the number of variables to optimize until the identification of the solution support. While those methods have been well-studied and theoretically supported for convex regularizers, this paper proposes a working set algorithm for non-convex sparse regularizers with convergence guarantees. The algorithm, named FireWorks, is based on a non-convex reformulation of a recent primal-dual approach and leverages the geometry of the residuals. Our theoretical guarantees derive from a lower bound on the objective function decrease between two inner solver iterations and show convergence to a stationary point of the full problem. More importantly, we also show that convergence is preserved even when the inner solver is inexact, under sufficient decay of the error across iterations. Our experimental results demonstrate high computational gains when using our working set strategy compared to the full problem solver, for both block-coordinate descent and proximal gradient solvers.
    BibTeX:
    @inproceedings{rakotomamonjy2022provably,
    author = {Rakotomamonjy, Alain and Flamary, Rémi and Gasso, Gilles and Salmon, Joseph},
    title = {Convergent Working Set Algorithm for Lasso with Non-Convex Sparse Regularizers},
    booktitle = {International Conference on Artificial Intelligence and Statistics (AISTATS)},
    year = {2022}
    }
    C. Vincent-Cuaz, R. Flamary, M. Corneli, T. Vayer, N. Courty, Semi-relaxed Gromov Wasserstein divergence with applications on graphs, International Conference on Learning Representations (ICLR), 2022.
    Abstract: Comparing structured objects such as graphs is a fundamental operation involved in many learning tasks. To this end, the Gromov-Wasserstein (GW) distance, based on Optimal Transport (OT), has proven to be successful in handling the specific nature of the associated objects. More specifically, through the nodes connectivity relations, GW operates on graphs, seen as probability measures over specific spaces. At the core of OT is the idea of conservation of mass, which imposes a coupling between all the nodes from the two considered graphs. We argue in this paper that this property can be detrimental for tasks such as graph dictionary or partition learning, and we relax it by proposing a new semi-relaxed Gromov-Wasserstein divergence. Aside from immediate computational benefits, we discuss its properties, and show that it can lead to an efficient graph dictionary learning algorithm. We empirically demonstrate its relevance for complex tasks on graphs such as partitioning, clustering and completion.
    BibTeX:
    @inproceedings{vincent2022semi,
    author = {Vincent-Cuaz, Cédric and Flamary, Rémi and Corneli, Marco and   Vayer, Titouan and Courty, Nicolas},
    title = {Semi-relaxed Gromov Wasserstein divergence with applications on graphs},
    booktitle = {International Conference on Learning Representations (ICLR)},
    year = {2022}
    }
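    Code sketch (Python): the semi-relaxed GW divergence as exposed in recent POT releases (ot.gromov.semirelaxed_gromov_wasserstein; we believe it appeared around POT 0.9, so treat the exact API as an assumption). Dropping the second marginal constraint lets the induced target marginal act as a soft partition of the first graph.
    import numpy as np
    import ot

    rng = np.random.default_rng(0)
    C1 = (rng.random((10, 10)) < 0.3).astype(float)
    C1 = np.maximum(C1, C1.T)              # graph to explain (10 nodes)
    C2 = (rng.random((4, 4)) < 0.5).astype(float)
    C2 = np.maximum(C2, C2.T)              # small target structure (4 nodes)

    T = ot.gromov.semirelaxed_gromov_wasserstein(C1, C2, ot.unif(10),
                                                 loss_fun="square_loss")
    print(T.sum(axis=0))                   # freely chosen second marginal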

    2021

    Q. H. Tran, H. Janati, I. Redko, R. Flamary, N. Courty, Factored couplings in multi-marginal optimal transport via difference of convex programming, NeurIPS 2021 Optimal Transport and Machine Learning Workshop (OTML), 2021.
    Abstract: Optimal transport (OT) theory underlies many emerging machine learning (ML) methods nowadays solving a wide range of tasks such as generative modeling, transfer learning and information retrieval. These latter works, however, usually build upon a traditional OT setup with two distributions, while leaving a more general multi-marginal OT formulation somewhat unexplored. In this paper, we study the multi-marginal OT (MMOT) problem and unify several popular OT methods under its umbrella by promoting structural information on the coupling. We show that incorporating such structural information into MMOT results in an instance of a difference of convex (DC) programming problem, allowing us to solve it numerically. Despite the high computational cost of the latter procedure, the solutions provided by DC optimization are usually as good as those obtained using currently employed optimization schemes.
    BibTeX:
    @conference{tran2021factored,
    author = {Tran, Quang Huy and Janati, Hicham and Redko, Ievgen and Flamary, Rémi and Courty, Nicolas},
    title = {Factored couplings in multi-marginal optimal transport via difference of convex programming},
    howpublished = { NeurIPS 2021 Optimal Transport and Machine Learning Workshop (OTML)},
    year = {2021}
    }
    L. Chapel, R. Flamary, H. Wu, C. Févotte, G. Gasso, Unbalanced Optimal Transport through Non-negative Penalized Linear Regression, Neural Information Processing Systems (NeurIPS), 2021.
    Abstract: This paper addresses the problem of Unbalanced Optimal Transport (UOT) in which the marginal conditions are relaxed (using weighted penalties in lieu of equality) and no additional regularization is enforced on the OT plan. In this context, we show that the corresponding optimization problem can be reformulated as a non-negative penalized linear regression problem. This reformulation allows us to propose novel algorithms inspired from inverse problems and nonnegative matrix factorization. In particular, we consider majorization-minimization which leads in our setting to efficient multiplicative updates for a variety of penalties. Furthermore, we derive for the first time an efficient algorithm to compute the regularization path of UOT with quadratic penalties. The proposed algorithm provides a continuity of piece-wise linear OT plans converging to the solution of balanced OT (corresponding to infinite penalty weights). We perform several numerical experiments on simulated and real data illustrating the new algorithms, and provide a detailed discussion about more sophisticated optimization tools that can further be used to solve OT problems thanks to our reformulation.
    BibTeX:
    @inproceedings{chapel2021unbalanced,
    author = {Chapel, Laetitia and Flamary, Rémi and Wu, Haoran and Févotte, Cédric   and Gasso, Gilles},
    title = {Unbalanced Optimal Transport through Non-negative Penalized Linear Regression},
    booktitle = {Neural Information Processing Systems (NeurIPS)},
    year = {2021}
    }
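    Code sketch (Python): the multiplicative (MM) updates derived in this paper are implemented in POT as ot.unbalanced.mm_unbalanced; here with a KL penalty on the marginals and toy Gaussian clouds.
    import numpy as np
    import ot

    rng = np.random.default_rng(0)
    x = rng.normal(0.0, 1.0, size=(30, 2))
    y = rng.normal(3.0, 1.0, size=(40, 2))     # shifted cloud
    M = ot.dist(x, y)

    G = ot.unbalanced.mm_unbalanced(ot.unif(30), ot.unif(40), M,
                                    reg_m=1.0, div="kl")
    print(G.sum())   # typically < 1: destroying mass can beat moving it far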
    A. Rakotomamonjy, R. Flamary, G. Gasso, M. Z. Alaya, M. Berar, N. Courty, Optimal Transport for Conditional Domain Matching and Label Shift, Machine Learning, 2021.
    Abstract: We address the problem of unsupervised domain adaptation under the setting of generalized target shift (both class-conditional and label shifts occur). We show that in that setting, for good generalization, it is necessary to learn with similar source and target label distributions and to match the class-conditional probabilities. For this purpose, we propose an estimation of target label proportion by blending mixture estimation and optimal transport. This estimation comes with theoretical guarantees of correctness. Based on the estimation, we learn a model by minimizing an importance-weighted loss and a Wasserstein distance between weighted marginals. We prove that this minimization allows matching class-conditionals given mild assumptions on their geometry. Our experimental results show that our method performs better on average than competitors across a range of domain adaptation problems including digits, VisDA and Office.
    BibTeX:
    @article{rakotomamonjy2021optimal,
    author = {Rakotomamonjy, Alain and Flamary, Rémi and Gasso, Gilles and Alaya, Mokhtar Z and Berar, Maxime and Courty, Nicolas},
    title = {Optimal Transport for Conditional Domain Matching and Label Shift},
    journal = {Machine Learning},
    year = {2021}
    }
    C. Vincent-Cuaz, T. Vayer, R. Flamary, M. Corneli, N. Courty, Online Graph Dictionary Learning, International Conference on Machine Learning (ICML), 2021.
    Abstract: Dictionary learning is a key tool for representation learning that explains the data as a linear combination of a few basic elements. Yet, this analysis is not amenable to graph learning, as graphs usually belong to different metric spaces. We fill this gap by proposing a new online Graph Dictionary Learning approach, which uses the Gromov-Wasserstein divergence for the data fitting term. In our work, graphs are encoded through their nodes' pairwise relations and modeled as convex combinations of graph atoms, i.e. dictionary elements, estimated thanks to an online stochastic algorithm, which operates on a dataset of unregistered graphs with potentially different numbers of nodes. Our approach naturally extends to labeled graphs, and is completed by a novel upper bound that can be used as a fast approximation of Gromov-Wasserstein in the embedding space. We provide numerical evidence showing the interest of our approach for unsupervised embedding of graph datasets and for online graph subspace estimation and tracking.
    BibTeX:
    @inproceedings{vincent2021online,
    author = {Vincent-Cuaz, Cédric and Vayer, Titouan and Flamary, Rémi and Corneli, Marco and Courty, Nicolas},
    title = {Online Graph Dictionary Learning},
    booktitle = {International Conference on Machine Learning (ICML)},
    year = {2021}
    }
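    Code sketch (Python): GW linear unmixing of one graph on a fixed dictionary, using ot.gromov.gromov_wasserstein_linear_unmixing as shipped in recent POT versions (the return signature below is our best recollection, so check it against the installed release). The companion function gromov_wasserstein_dictionary_learning estimates the atoms themselves with the online stochastic algorithm of the paper.
    import numpy as np
    import ot

    rng = np.random.default_rng(0)
    C = (rng.random((10, 10)) < 0.3).astype(float)
    C = np.maximum(C, C.T)                      # graph to embed
    Cdict = rng.random((3, 6, 6))               # 3 atoms of 6 nodes each
    Cdict = (Cdict + Cdict.transpose(0, 2, 1)) / 2

    w, Cemb, T, err = ot.gromov.gromov_wasserstein_linear_unmixing(C, Cdict)
    print(np.round(w, 3), err)                  # convex weights over the atoms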
    K. Fatras, T. Séjourné, N. Courty, R. Flamary, Unbalanced minibatch Optimal Transport; applications to Domain Adaptation, International Conference on Machine Learning (ICML), 2021.
    Abstract: Optimal transport distances have found many applications in machine learning for their capacity to compare non-parametric probability distributions. Yet their algorithmic complexity generally prevents their direct use on large scale datasets. Among the possible strategies to alleviate this issue, practitioners can rely on computing estimates of these distances over subsets of data, i.e. minibatches. While computationally appealing, we highlight in this paper some limits of this strategy, arguing it can lead to undesirable smoothing effects. As an alternative, we suggest that the same minibatch strategy coupled with unbalanced optimal transport can yield more robust behavior. We discuss the associated theoretical properties, such as unbiased estimators, existence of gradients and concentration bounds. Our experimental study shows that in challenging problems associated to domain adaptation, the use of unbalanced optimal transport leads to significantly better results, competing with or surpassing recent baselines.
    BibTeX:
    @inproceedings{fatras2021unbalanced,
    author = {Fatras, Kilian and Séjourné, Thibault and Courty, Nicolas and   Flamary, Rémi},
    title = {Unbalanced minibatch Optimal Transport; applications to Domain Adaptation},
    booktitle = {International Conference on Machine Learning (ICML)},
    year = {2021}
    }
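    Code sketch (Python): the minibatch unbalanced-OT estimator, averaged over random batches with POT's sinkhorn_unbalanced2 (loss values only; the gradient-based training loops of the paper are omitted, and all sizes are arbitrary).
    import numpy as np
    import ot

    def minibatch_uot(xs, xt, batch=64, n_batches=20, reg=0.05, reg_m=1.0):
        rng = np.random.default_rng(0)
        a = b = ot.unif(batch)
        total = 0.0
        for _ in range(n_batches):
            i = rng.choice(len(xs), batch, replace=False)
            j = rng.choice(len(xt), batch, replace=False)
            M = ot.dist(xs[i], xt[j])
            total += float(ot.unbalanced.sinkhorn_unbalanced2(a, b, M, reg, reg_m))
        return total / n_batches

    xs = np.random.default_rng(1).normal(size=(500, 2))
    xt = np.random.default_rng(2).normal(loc=2.0, size=(500, 2))
    print(minibatch_uot(xs, xt))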
    R. Flamary, N. Courty, A. Gramfort, M. Z. Alaya, A. Boisbunon, S. Chambon, L. Chapel, A. Corenflos, K. Fatras, N. Fournier, L. Gautheron, N. T. Gayraud, H. Janati, A. Rakotomamonjy, I. Redko, A. Rolet, A. Schutz, V. Seguy, D. J. Sutherland, R. Tavenard, A. Tong, T. Vayer, POT: Python Optimal Transport, Journal of Machine Learning Research, Vol. 22, N. 78, pp 1-8, 2021.
    Abstract: Optimal transport has recently been reintroduced to the machine learning community thanks in part to novel efficient optimization procedures allowing for medium to large scale applications. We propose a Python toolbox that implements several key optimal transport ideas for the machine learning community. The toolbox contains implementations of a number of founding works of OT for machine learning such as Sinkhorn algorithm and Wasserstein barycenters, but also provides generic solvers that can be used for conducting novel fundamental research. This toolbox, named POT for Python Optimal Transport, is open source with an MIT license.
    BibTeX:
    @article{flamary2021pot,
    author = {Rémi Flamary and Nicolas Courty and Alexandre Gramfort and Mokhtar Z. Alaya and Aurélie Boisbunon and Stanislas Chambon and Laetitia Chapel and Adrien Corenflos and Kilian Fatras and Nemo Fournier and Léo Gautheron and Nathalie T.H. Gayraud and Hicham Janati and Alain Rakotomamonjy and Ievgen Redko and Antoine Rolet and Antony Schutz and Vivien Seguy and Danica J. Sutherland and Romain Tavenard and Alexander Tong and Titouan Vayer},
    title = {POT: Python Optimal Transport},
    journal = { Journal of Machine Learning Research},
    volume = { 22},
    number = { 78},
    pages = { 1-8},
    year = {2021}
    }
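    Code sketch (Python): minimal POT usage covering the founding blocks named in the abstract, exact OT and the Sinkhorn algorithm, between two small point clouds.
    import numpy as np
    import ot

    rng = np.random.default_rng(0)
    xs = rng.normal(size=(50, 2))
    xt = rng.normal(loc=1.0, size=(60, 2))
    a, b = ot.unif(50), ot.unif(60)           # uniform sample weights
    M = ot.dist(xs, xt)                       # squared Euclidean cost

    G = ot.emd(a, b, M)                       # exact transport plan
    loss = ot.emd2(a, b, M)                   # exact transport cost
    Gs = ot.sinkhorn(a, b, M, reg=0.1)        # entropic (Sinkhorn) plan
    print(loss, G.sum(), Gs.sum())            # both plans carry total mass 1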
    J.C. Burnel, K. Fatras, R. Flamary, N. Courty, Generating natural adversarial Remote Sensing Images, Geoscience and Remote Sensing, IEEE Transactions on, 2021.
    Abstract: Over the last years, Remote Sensing Images (RSI) analysis has started resorting to deep neural networks to solve most of the commonly faced problems, such as detection, land cover classification or segmentation. Since critical decision making can be based upon the results of RSI analysis, it is important to clearly identify and understand potential security threats occurring in those machine learning algorithms. Notably, it has recently been found that neural networks are particularly sensitive to carefully designed attacks, generally crafted given the full knowledge of the considered deep network. In this paper, we consider the more realistic but challenging case where one wants to generate such attacks against a black-box neural network. In this case, only the prediction score of the network is accessible, given a specific input. Examples that lure away the network's prediction, while being perceptually similar to real images, are called natural or unrestricted adversarial examples. We present an original method to generate such examples, based on a variant of the Wasserstein Generative Adversarial Network. We demonstrate its effectiveness on natural adversarial hyper-spectral image generation and image modification for fooling a state-of-the-art detector. Among others, we also conduct a perceptual evaluation with human annotators to better assess the effectiveness of the proposed method.
    BibTeX:
    @article{burnel2021generating,
    author = {Burnel, Jean-Christophe and Fatras, Kilian and Flamary, Rémi and Courty, Nicolas},
    title = {Generating natural adversarial Remote Sensing Images},
    journal = {Geoscience and Remote Sensing, IEEE Transactions on},
    year = {2021}
    }
    K. Fatras, B. Bhushan Damodaran, S. Lobry, R. Flamary, D. Tuia, N. Courty, Wasserstein Adversarial Regularization for learning with label noise, Pattern Analysis and Machine Intelligence, IEEE Transactions on, 2021.
    Abstract: Noisy labels often occur in vision datasets, especially when they are obtained from crowdsourcing or Web scraping. We propose a new regularization method, which enables learning robust classifiers in presence of noisy data. To achieve this goal, we propose a new adversarial regularization scheme based on the Wasserstein distance. Using this distance allows taking into account specific relations between classes by leveraging the geometric properties of the labels space. Our Wasserstein Adversarial Regularization (WAR) encodes a selective regularization, which promotes smoothness of the classifier between some classes, while preserving sufficient complexity of the decision boundary between others. We first discuss how and why adversarial regularization can be used in the context of label noise and then show the effectiveness of our method on five datasets corrupted with noisy labels: in both benchmarks and real datasets, WAR outperforms the state-of-the-art competitors.
    BibTeX:
    @article{damodaran2021wasserstein,
    author = { Fatras, Kilian and Bhushan Damodaran, Bharath and Lobry, Sylvain and Flamary, Rémi and Tuia, Devis and Courty, Nicolas},
    title = {Wasserstein Adversarial Regularization for learning with label noise},
    journal = { Pattern Analysis and Machine Intelligence, IEEE Transactions on },
    year = {2021}
    }

    2020

    X. Li, Y. Grandvalet, R. Flamary, N. Courty, D. Dou, Representation transfer by optimal transport, 2020.
    Abstract: Learning generic representations with deep networks requires massive training samples and significant computer resources. To learn a new specific task, an important issue is to transfer the generic teacher's representation to a student network. In this paper, we propose to use a metric between representations that is based on a functional view of neurons. We use optimal transport to quantify the match between two representations, yielding a distance that embeds some invariances inherent to the representation of deep networks. This distance defines a regularizer promoting the similarity of the student's representation with that of the teacher. Our approach can be used in any learning context where representation transfer is applicable. We experiment here on two standard settings: inductive transfer learning, where the teacher's representation is transferred to a student network of the same architecture for a new related task, and knowledge distillation, where the teacher's representation is transferred to a student of simpler architecture for the same task (model compression). Our approach also lends itself to solving new learning problems; we demonstrate this by showing how to directly transfer the teacher's representation to a simpler architecture student for a new related task.
    BibTeX:
    @techreport{li2020representation,
    author = {Li, Xuhong and Grandvalet, Yves and Flamary, Rémi and Courty, Nicolas and Dou, Dejing},
    title = {Representation transfer by optimal transport},
    institution = {arXiv preprint arXiv:2007.06737},
    year = {2020}
    }
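    Code sketch (Python): a simplified reading of the functional view of neurons, with an invented helper: each neuron is a point described by its activations over a batch, and two layers are compared through an OT distance between the resulting point clouds.
    import numpy as np
    import ot

    def neuron_ot_distance(acts_teacher, acts_student):
        # Activation matrices are (batch, neurons); neurons become points
        M = ot.dist(acts_teacher.T, acts_student.T)
        a = ot.unif(acts_teacher.shape[1])
        b = ot.unif(acts_student.shape[1])
        return ot.emd2(a, b, M)

    rng = np.random.default_rng(0)
    acts_t = rng.normal(size=(128, 64))   # teacher layer, 64 neurons
    acts_s = rng.normal(size=(128, 32))   # student layer, 32 neurons
    print(neuron_ot_distance(acts_t, acts_s))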
    I. Redko, T. Vayer, R. Flamary, N. Courty, CO-Optimal Transport, Neural Information Processing Systems (NeurIPS), 2020.
    Abstract: Optimal transport (OT) is a powerful geometric and probabilistic tool for finding correspondences and measuring similarity between two distributions. Yet, its original formulation relies on the existence of a cost function between the samples of the two distributions, which makes it impractical for comparing data distributions supported on different topological spaces. To circumvent this limitation, we propose a novel OT problem, named COOT for CO-Optimal Transport, that aims to simultaneously optimize two transport maps between both samples and features. This is different from other approaches that either discard the individual features by focussing on pairwise distances (e.g. Gromov-Wasserstein) or need to model explicitly the relations between the features. COOT leads to interpretable correspondences between both samples and feature representations and holds metric properties. We provide a thorough theoretical analysis of our framework and establish rich connections with the Gromov-Wasserstein distance. We demonstrate its versatility with two machine learning applications in heterogeneous domain adaptation and co-clustering/data summarization, where COOT leads to performance improvements over the competing state-of-the-art methods.
    BibTeX:
    @inproceedings{redko2020cooptimal,
    author = {Ievgen Redko and Titouan Vayer and Rémi Flamary and Nicolas Courty},
    title = {CO-Optimal Transport},
    booktitle = { Neural Information Processing Systems (NeurIPS)},
    year = {2020}
    }
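    Code sketch (Python): a minimal NumPy version of the COOT alternation described above, assuming uniform marginals and a squared-error cost; each inner linear OT problem is solved exactly with ot.emd from POT (recent POT releases also ship a dedicated COOT solver).
    import numpy as np
    import ot

    def coot(X1, X2, n_iter=20):
        n1, d1 = X1.shape
        n2, d2 = X2.shape
        w1, w2 = np.ones(n1) / n1, np.ones(n2) / n2  # sample weights
        v1, v2 = np.ones(d1) / d1, np.ones(d2) / d2  # feature weights
        Tv = np.outer(v1, v2)                        # initial feature coupling
        for _ in range(n_iter):
            # Linear OT over samples, cost induced by the feature coupling
            Ms = ((X1**2) @ Tv.sum(1))[:, None] \
               + ((X2**2) @ Tv.sum(0))[None, :] - 2 * X1 @ Tv @ X2.T
            Ts = ot.emd(w1, w2, Ms)
            # Linear OT over features, cost induced by the sample coupling
            Mv = ((X1**2).T @ Ts.sum(1))[:, None] \
               + ((X2**2).T @ Ts.sum(0))[None, :] - 2 * X1.T @ Ts @ X2
            Tv = ot.emd(v1, v2, Mv)
        return Ts, Tv  # couplings between samples and between features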
    D. Marcos, R. Fong, S. Lobry, R. Flamary, N. Courty, D. Tuia, Contextual Semantic Interpretability, Asian Conference on Computer Vision (ACCV), 2020.
    Abstract: Convolutional neural networks (CNN) are known to learn an image representation that captures concepts relevant to the task, but do so in an implicit way that hampers model interpretability. However, one could argue that such a representation is hidden in the neurons and can be made explicit by teaching the model to recognize semantically interpretable attributes that are present in the scene. We call such an intermediate layer a semantic bottleneck. Once the attributes are learned, they can be re-combined to reach the final decision and provide both an accurate prediction and an explicit reasoning behind the CNN decision. In this paper, we look into semantic bottlenecks that capture context: we want attributes to be in groups of a few meaningful elements and participate jointly in the final decision. We use a two-layer semantic bottleneck that gathers attributes into interpretable, sparse groups, allowing them to contribute differently to the final output depending on the context. We test our contextual semantic interpretable bottleneck (CSIB) on the task of landscape scenicness estimation and train the semantic interpretable bottleneck using an auxiliary database (SUN Attributes). Our model yields predictions as accurate as a non-interpretable baseline when applied to a real-world test set of Flickr images, all while providing clear and interpretable explanations for each prediction.
    BibTeX:
    @inproceedings{marcos2020contextual,
    author = {Diego Marcos and Ruth Fong and Sylvain Lobry and Remi Flamary and Nicolas Courty and Devis Tuia},
    title = {Contextual Semantic Interpretability},
    booktitle = { Asian Conference on Computer Vision (ACCV)},
    year = {2020}
    }
    T. Vayer, L. Chapel, R. Flamary, R. Tavenard, N. Courty, Fused Gromov-Wasserstein Distance for Structured Objects, Algorithms, Vol. 13 (9), pp 212, 2020.
    Abstract: Optimal transport theory has recently found many applications in machine learning thanks to its capacity to meaningfully compare various machine learning objects that are viewed as distributions. The Kantorovitch formulation, leading to the Wasserstein distance, focuses on the features of the elements of the objects, but treats them independently, whereas the Gromov–Wasserstein distance focuses on the relations between the elements, depicting the structure of the object, yet discarding its features. In this paper, we study the Fused Gromov-Wasserstein distance that extends the Wasserstein and Gromov–Wasserstein distances in order to encode simultaneously both the feature and structure information. We provide the mathematical framework for this distance in the continuous setting, prove its metric and interpolation properties, and provide a concentration result for the convergence of finite samples. We also illustrate and interpret its use in various applications, where structured objects are involved.
    BibTeX:
    @article{vayer2020fused,
    author = {Vayer, Titouan and Chapel, Laetitia and Flamary, Rémi and Tavenard, Romain and Courty, Nicolas},
    title = {Fused Gromov-Wasserstein Distance for Structured Objects},
    journal = { Algorithms},
    volume = {13 (9)},
    pages = {212},
    year = {2020}
    }
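    Code sketch (Python): hedged usage of the FGW solver distributed with POT (names follow the POT documentation for recent releases) on two toy attributed graphs, with shortest-path matrices as structures, random node features, and alpha trading the feature cost against the structure cost.
    import numpy as np
    import networkx as nx
    import ot

    g1, g2 = nx.cycle_graph(6), nx.path_graph(5)
    C1 = np.asarray(nx.floyd_warshall_numpy(g1))  # structure: shortest paths
    C2 = np.asarray(nx.floyd_warshall_numpy(g2))
    F1, F2 = np.random.rand(6, 3), np.random.rand(5, 3)  # toy node features
    M = ot.dist(F1, F2)                           # feature cost matrix
    p, q = np.ones(6) / 6, np.ones(5) / 5         # uniform node weights
    T, log = ot.gromov.fused_gromov_wasserstein(
        M, C1, C2, p, q, alpha=0.5, log=True)     # T: node-to-node coupling
    print(log['fgw_dist'])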
    K. Fatras, Y. Zine, R. Flamary, R. Gribonval, N. Courty, Learning with minibatch Wasserstein: asymptotic and gradient properties, International Conference on Artificial Intelligence and Statistics (AISTATS), 2020.
    Abstract: Optimal transport distances are powerful tools to compare probability distributions and have found many applications in machine learning. Yet their algorithmic complexity prevents their direct use on large scale datasets. To overcome this challenge, practitioners compute these distances on minibatches, i.e., they average the outcome of several smaller optimal transport problems. We propose in this paper an analysis of this practice, whose effects are not well understood so far. We notably argue that it is equivalent to an implicit regularization of the original problem, with appealing properties such as unbiased estimators, gradients and a concentration bound around the expectation, but also with defects such as the loss of the distance property. Along with this theoretical analysis, we also conduct empirical experiments on gradient flows, GANs and color transfer that highlight the practical interest of this strategy.
    BibTeX:
    @inproceedings{fatras2019learning,
    author = {Kilian Fatras and Younes Zine and Rémi Flamary and Rémi Gribonval and Nicolas Courty},
    title = {Learning with minibatch Wasserstein: asymptotic and gradient properties},
    booktitle = { International Conference on Artificial Intelligence and Statistics (AISTATS)},
    year = {2020}
    }
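    Code sketch (Python): the minibatch estimator analysed above in its simplest form: average exact OT losses (ot.emd2 from POT) over random equal-size minibatches instead of solving one large problem. Batch size and batch count are illustrative.
    import numpy as np
    import ot

    def minibatch_w(X, Y, batch_size=64, n_batches=100, seed=0):
        rng = np.random.default_rng(seed)
        a = np.ones(batch_size) / batch_size   # uniform batch weights
        total = 0.0
        for _ in range(n_batches):
            xb = X[rng.choice(len(X), batch_size, replace=False)]
            yb = Y[rng.choice(len(Y), batch_size, replace=False)]
            M = ot.dist(xb, yb)                # squared Euclidean cost
            total += ot.emd2(a, a, M)          # exact OT on the small problem
        return total / n_batches               # Monte Carlo estimate of the minibatch loss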

    2019

    R. Flamary, Optimal Transport for Machine Learning, Université Côte d'Azur, 2019.
    Abstract: In this document, I present several recent contributions to machine learning using optimal transport (OT) theory. The first part of the document introduces the optimal transport problem and discusses several algorithms designed to solve its original and regularized formulations. Next, I present contributions to machine learning that focus on four different aspects of OT. I first introduce the use of approximate Monge mappings for domain adaptation, and then the use of OT divergences such as the Wasserstein distance on histograms and empirical data. Finally, I briefly discuss recent results that aim at extending OT as a distance between structured data such as labeled graphs.
    BibTeX:
    @phdthesis{flamady2019hdr,
    author = { Flamary, R.},
    title = {Optimal Transport for Machine Learning},
    school = { Université Côte d'Azur},
    year = {2019}
    }
    R. Flamary, K. Lounici, A. Ferrari, Concentration bounds for linear Monge mapping estimation and optimal transport domain adaptation, 2019.
    Abstract: This article investigates the quality of the estimator of the linear Monge mapping between distributions. We provide the first concentration result on the linear mapping operator and prove a sample complexity of n^−1/2 when using empirical estimates of first and second order moments. This result is then used to derive a generalization bound for domain adaptation with optimal transport. As a consequence, this method approaches the performance of the theoretical Bayes predictor under mild conditions on the covariance structure of the problem. We also discuss the computational complexity of the linear mapping estimation and show that when the source and target are stationary, the mapping is a convolution that can be estimated very efficiently using fast Fourier transforms. Numerical experiments reproduce the behavior of the proven bounds on simulated and real data for mapping estimation and domain adaptation on images.
    BibTeX:
    @techreport{flamary2019concentration,
    author = { Flamary, Rémi and Lounici, Karim and Ferrari, André},
    title = {Concentration bounds for linear Monge mapping estimation and optimal transport domain adaptation},
    year = {2019}
    }
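    Code sketch (Python): the closed-form linear Monge map between Gaussian approximations of the source and target distributions, estimated from empirical moments as studied above; the small ridge added to the covariances is an illustrative stabilizer.
    import numpy as np
    from scipy.linalg import sqrtm, inv

    def linear_monge_map(Xs, Xt, reg=1e-8):
        ms, mt = Xs.mean(0), Xt.mean(0)
        d = Xs.shape[1]
        S1 = np.cov(Xs.T) + reg * np.eye(d)  # source covariance
        S2 = np.cov(Xt.T) + reg * np.eye(d)  # target covariance
        S1h = sqrtm(S1)
        # A = S1^{-1/2} (S1^{1/2} S2 S1^{1/2})^{1/2} S1^{-1/2}
        A = np.real(inv(S1h) @ sqrtm(S1h @ S2 @ S1h) @ inv(S1h))
        return lambda x: mt + (x - ms) @ A   # A is symmetric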
    L. Dragoni, R. Flamary, K. Lounici, P. Reynaud-Bouret, Large scale Lasso with windowed active set for convolutional spike sorting, 2019.
    Abstract: Spike sorting is a fundamental preprocessing step in neuroscience that is central to accessing simultaneous but distinct neuronal activities, and therefore to better understanding the animal or even human brain. But numerical complexity limits studies that require processing large scale datasets in terms of number of electrodes, neurons, spikes and length of the recorded signals. We propose in this work a novel active set algorithm aimed at solving the Lasso for a classical convolutional model. Our algorithm can be implemented efficiently on parallel architectures and has a linear complexity w.r.t. the temporal dimensionality, which ensures scaling and opens the door to online spike sorting. We provide theoretical results about the complexity of the algorithm and illustrate it in numerical experiments, along with results about the accuracy of the spike recovery and robustness to the regularization parameter.
    BibTeX:
    @techreport{dragoni2019large,
    author = {Dragoni, Laurent and Flamary, Rémi and Lounici, Karim and Reynaud-Bouret, Patricia},
    title = {Large scale Lasso with windowed active set for convolutional spike sorting},
    year = {2019}
    }
    T. Vayer, R. Flamary, R. Tavenard, L. Chapel, N. Courty, Sliced Gromov-Wasserstein, Neural Information Processing Systems (NeurIPS), 2019.
    Abstract: Recently used in various machine learning contexts, the Gromov-Wasserstein distance (GW) allows for comparing distributions that do not necessarily lie in the same metric space. However, this Optimal Transport (OT) distance requires solving a complex non-convex quadratic program, which is most of the time very costly both in time and memory. Contrary to GW, the Wasserstein distance (W) enjoys several properties (e.g. duality) that permit large scale optimization. Among those, the Sliced Wasserstein (SW) distance exploits the direct solution of W on the line, which only requires sorting discrete samples in 1D. This paper proposes a new divergence based on GW akin to SW. We first derive a closed form for GW when dealing with 1D distributions, based on a new result for the related quadratic assignment problem. We then define a novel OT discrepancy that can deal with large scale distributions via a slicing approach, and we show how it relates to the GW distance while being O(n log(n)) to compute. We illustrate the behavior of this so-called Sliced Gromov-Wasserstein (SGW) discrepancy in experiments where we demonstrate its ability to tackle similar problems as GW while being several orders of magnitude faster to compute.
    BibTeX:
    @inproceedings{vayer2019sliced,
    author = { Vayer, Titouan and Flamary, Rémi and Tavenard, Romain and Chapel, Laetitia and Courty, Nicolas},
    title = {Sliced Gromov-Wasserstein},
    booktitle = {Neural Information Processing Systems (NeurIPS)},
    year = {2019}
    }
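    Code sketch (Python): the slicing mechanism that SGW builds on, shown here for the plain Sliced Wasserstein distance between two same-size point clouds: each random direction reduces OT to a 1D sort. SGW replaces this sorted 1D Wasserstein step with the paper's closed form for 1D GW, which is not reproduced here.
    import numpy as np

    def sliced_wasserstein2(X, Y, n_proj=200, seed=0):
        rng = np.random.default_rng(seed)
        total = 0.0
        for _ in range(n_proj):
            theta = rng.normal(size=X.shape[1])
            theta /= np.linalg.norm(theta)     # random unit direction
            x1d = np.sort(X @ theta)           # O(n log n) per direction
            y1d = np.sort(Y @ theta)
            total += np.mean((x1d - y1d) ** 2)
        return total / n_proj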
    T. Vayer, L. Chapel, R. Flamary, R. Tavenard, N. Courty, Optimal Transport for structured data with application on graphs, International Conference on Machine Learning (ICML), 2019.
    Abstract: This work considers the problem of computing distances between structured objects such as undirected graphs, seen as probability distributions in a specific metric space. We consider a new transportation distance (i.e. one that minimizes a total cost of transporting probability masses) that unveils the geometric nature of the structured objects space. Unlike the Wasserstein or Gromov-Wasserstein metrics, which focus solely and respectively on features (by considering a metric in the feature space) or on structure (by seeing structure as a metric space), our new distance exploits both pieces of information jointly, and is consequently called Fused Gromov-Wasserstein (FGW). After discussing its properties and computational aspects, we show results on a graph classification task, where our method outperforms both graph kernels and deep graph convolutional networks. Exploiting further the metric properties of FGW, interesting geometric objects such as Fréchet means or barycenters of graphs are illustrated and discussed in a clustering context.
    BibTeX:
    @inproceedings{vayer2019optimal,
    author = { Vayer, Titouan and Chapel, Laetitia and Flamary, Rémi and Tavenard, Romain and Courty, Nicolas},
    title = {Optimal Transport for structured data with application on graphs},
    booktitle = {International Conference on Machine Learning (ICML)},
    year = {2019}
    }
    R. Rougeot, R. Flamary, D. Mary, C. Aime, Influence of surface roughness on diffraction in the externally occulted Lyot solar coronagraph, Astronomy and Astrophysics, 2019.
    Abstract: Context. The solar coronagraph ASPIICS will fly on the future ESA formation flying mission Proba-3. The instrument combines an external occulter of diameter 1.42 m and a Lyot solar coronagraph of 5 cm diameter, located downstream at a distance of 144 m. Aims. The theoretical performance of the externally occulted Lyot coronagraph has been computed by assuming perfect optics. In this paper, we improve the related modelling by introducing roughness scattering effects from the telescope. We have computed the diffraction at the detector, which we compare to the ideal case without perturbation to estimate the performance degradation. We have also investigated the influence of sizing the internal occulter and the Lyot stop, and we performed a sensitivity analysis on the roughness. Methods. We have built on a recently published numerical model of diffraction propagation. The micro-structures of the telescope are built by filtering a white noise with a power spectral density following an isotropic ABC function, suggested by Harvey scatter theory. The parameters were tuned to fit experimental data measured on ASPIICS lenses. The computed wave front error was included in the Fresnel wave propagation of the coronagraph. A circular integration over the solar disk was performed to reconstruct the complete diffraction intensity. Results. The level of micro-roughness is 1.92 nm root-mean-square. Compared to the ideal case, in the plane of the internal occulter, the diffraction peak intensity is reduced by about 0.001%. However, the intensity outside the peak increases by 12% on average, up to 20% at 3 solar radii, where the mask does not filter out the diffraction. At detector level, the diffraction peak remains at about 10^-6 at 1.1 solar radii, similar to the ideal case, but the diffraction tail at large solar radius is much higher, up to one order of magnitude. Sizing the internal occulter and the Lyot stop does not improve the rejection, as opposed to the ideal case. Conclusions. Besides these results, this paper provides a methodology to implement roughness scattering in the wave propagation model for the solar coronagraph.
    BibTeX:
    @article{rougeot2019influence,
    author = { Rougeot, Raphael and Flamary, Remi and Mary, David and Aime, Claude},
    title = {Influence of surface roughness on diffraction in the externally occulted Lyot solar coronagraph},
    journal = { Astronomy and Astrophysics},
    year = {2019}
    }
    B. B. Damodaran, R. Flamary, V. Seguy, N. Courty, An Entropic Optimal Transport Loss for Learning Deep Neural Networks under Label Noise in Remote Sensing Images, Computer Vision and Image Understanding, 2019.
    Abstract: Deep neural networks have become established as a powerful tool for large scale supervised classification tasks. The state-of-the-art performance of deep neural networks is conditioned on the availability of a large number of accurately labeled samples. In practice, collecting large scale accurately labeled datasets is a challenging and tedious task in most scenarios of remote sensing image analysis, thus cheap surrogate procedures are employed to label the dataset. Training deep neural networks on such datasets with inaccurate labels easily overfits the noisy training labels and drastically degrades the performance of the classification tasks. To mitigate this effect, we propose an original solution based on entropic optimal transportation. It allows learning, in an end-to-end fashion, deep neural networks that are, to some extent, robust to inaccurately labeled samples. We empirically demonstrate our method on several remote sensing datasets, where both scene and pixel-based hyperspectral images are considered for classification. Our method proves to be highly tolerant to significant amounts of label noise and achieves favorable results against state-of-the-art methods.
    BibTeX:
    @article{damodaran2019entropic,
    author = { Damodaran, Bharath B. and Flamary, Rémi and Seguy, Vivien and Courty, Nicolas},
    title = {An Entropic Optimal Transport Loss for Learning Deep Neural Networks under Label Noise in Remote Sensing Images},
    journal = {Computer Vision and Image Understanding},
    year = {2019}
    }
    I. Redko, N. Courty, R. Flamary, D. Tuia, Optimal Transport for Multi-source Domain Adaptation under Target Shift, International Conference on Artificial Intelligence and Statistics (AISTATS), 2019.
    Abstract: In this paper, we propose to tackle the problem of reducing discrepancies between multiple domains, referred to as multi-source domain adaptation, and consider it under the target shift assumption: in all domains we aim to solve a classification problem with the same output classes, but with label proportions differing across them. We design a method based on optimal transport, a theory that is gaining momentum to tackle adaptation problems in machine learning due to its efficiency in aligning probability distributions. Our method performs multi-source adaptation and target shift correction simultaneously by learning the class probabilities of the unlabeled target sample and the coupling that aligns two (or more) probability distributions. Experiments on both synthetic and real-world data related to a satellite image segmentation task show the superiority of the proposed method over the state-of-the-art.
    BibTeX:
    @inproceedings{redko2018optimal,
    author = { Redko, I. and Courty, N. and Flamary, R. and Tuia, D.},
    title = {Optimal Transport for Multi-source Domain Adaptation under Target Shift},
    booktitle = { International Conference on Artificial Intelligence and Statistics (AISTATS)},
    year = {2019}
    }
    R. B. Metcalf, M. Meneghetti, C. Avestruz, F. Bellagamba, C. R. Bom, E. Bertin, R. Cabanac, E. Decencière, R. Flamary, R. Gavazzi, others, The Strong Gravitational Lens Finding Challenge, Astronomy and Astrophysics, Vol. 625, pp A119, 2019.
    Abstract: Large scale imaging surveys will increase the number of galaxy-scale strong lensing candidates by perhaps three orders of magnitude beyond the number known today. Finding these rare objects will require picking them out of at least tens of millions of images, and deriving scientific results from them will require quantifying the efficiency and bias of any search method. To achieve these objectives, automated methods must be developed. Because gravitational lenses are rare objects, reducing false positives will be particularly important. We present a description and results of an open gravitational lens finding challenge. Participants were asked to classify 100,000 candidate objects as to whether they were gravitational lenses or not, with the goal of developing better automated methods for finding lenses in large data sets. A variety of methods were used, including visual inspection, arc and ring finders, support vector machines (SVM) and convolutional neural networks (CNN). We find that many of the methods will easily be fast enough to analyse the anticipated data flow. In test data, several methods are able to identify upwards of half the lenses after applying some thresholds on the lens characteristics such as lensed image brightness, size or contrast with the lens galaxy, without making a single false-positive identification. This is significantly better than direct inspection by humans was able to achieve. (abridged)
    BibTeX:
    @article{metcalf2019strong,
    author = {Metcalf, R Benton and Meneghetti, M and Avestruz, Camille and Bellagamba, Fabio and Bom, Clécio R and Bertin, Emmanuel and Cabanac, Rémi and Decencière, Etienne and Flamary, Rémi and Gavazzi, Raphael and others},
    title = {The Strong Gravitational Lens Finding Challenge},
    journal = {Astronomy and Astrophysics},
    volume = {625},
    pages = {A119},
    publisher = {EDP Sciences},
    year = {2019}
    }

    2018

    I. Harrane, R. Flamary, C. Richard, R. Couillet, Random matrix theory for diffusion LMS analysis., Asilomar Conference on Signals, Systems and Computers (ASILOMAR), 2018.
    Abstract: Multi-agent networks usually consist of a large number of interconnected agents or nodes. Interconnections between the agents allow them to share information and collaborate in order to solve complex tasks collectively. Examples abound in the realm of social, economic and biological networks. Distributed algorithms over such networks offer a valuable alternative to centralized solutions, with useful properties such as scalability, robustness, and decentralization. Among the existing cooperation rules, we are interested in this paper in diffusion strategies, since they are scalable, robust, and enable agents to continuously learn and adapt in an online manner to concept drifts in their data streams. The performance of diffusion strategies has been widely studied in the literature, but never in the asymptotic regime of extremely large networks and very high dimensional data. In this paper we explore this regime using Random Matrix Theory (RMT) and analyze the performance of the diffusion LMS accordingly. We then conduct numerical simulations to support the theoretical findings and to determine their applicability when RMT conditions are violated.
    BibTeX:
    @inproceedings{harrane2018random,
    author = {Harrane, Ibrahim and Flamary, R. and Richard, C. and Couillet, R.},
    title = {Random matrix theory for diffusion LMS analysis.},
    booktitle = {Asilomar Conference on Signals, Systems and Computers (ASILOMAR)},
    year = {2018}
    }
    B. B. Damodaran, B. Kellenberger, R. Flamary, D. Tuia, N. Courty, DeepJDOT: Deep Joint distribution optimal transport for unsupervised domain adaptation, European Conference in Computer Visions (ECCV), 2018.
    Abstract: In computer vision, one is often confronted with problems of domain shifts, which occur when one applies a classifier trained on a source dataset to target data sharing similar characteristics (e.g. same classes), but also different latent data structures (e.g. different acquisition conditions). In such a situation, the model will perform poorly on the new data, since the classifier is specialized to recognize visual cues specific to the source domain. In this work we explore a solution, named DeepJDOT, to tackle this problem: through a measure of discrepancy on joint deep representations/labels based on optimal transport, we not only learn new data representations aligned between the source and target domain, but also simultaneously preserve the discriminative information used by the classifier. We applied DeepJDOT to a series of visual recognition tasks, where it compares favorably against state-of-the-art deep domain adaptation methods.
    BibTeX:
    @inproceedings{damodaran2018deepjdot,
    author = { Damodaran, Bharath B. and Kellenberger, Benjamin and Flamary, Rémi and Tuia, Devis and Courty, Nicolas},
    title = {DeepJDOT: Deep Joint distribution optimal transport for unsupervised domain adaptation},
    booktitle = {European Conference in Computer Visions (ECCV)},
    year = {2018}
    }
    A. Rakotomamonjy, A. Traore, M. Berar, R. Flamary, N. Courty, Distance Measure Machines, 2018.
    Abstract: This paper presents a distance-based discriminative framework for learning with probability distributions. Instead of using kernel mean embeddings or generalized radial basis kernels, we introduce embeddings based on the dissimilarity of distributions to some reference distributions denoted as templates. Our framework extends the theory of similarity of Balcan et al. (2008) to the population distribution case, and we prove that, for some learning problems, the Wasserstein distance achieves low-error linear decision functions with high probability. Our key result is to prove that the theory also holds for empirical distributions. Algorithmically, the proposed approach is very simple, as it consists in computing a mapping based on pairwise Wasserstein distances and then learning a linear decision function. Our experimental results show that this Wasserstein distance embedding performs better than kernel mean embeddings, and computing the Wasserstein distance is far more tractable than estimating pairwise Kullback-Leibler divergences of empirical distributions.
    BibTeX:
    @techreport{rakotomamonjy2018wasserstein,
    author = {Rakotomamonjy, Alain and Traore, Abraham and Berar, Maxime and Flamary, Remi and Courty, Nicolas},
    title = {Distance Measure Machines},
    year = {2018}
    }
    R. Rougeot, C. Aime, C. Baccani, S. Fineschi, R. Flamary, D. Galano, C. Galy, V. Kirschner, F. Landini, M. Romoli, others, Straylight analysis for the externally occulted Lyot solar coronagraph ASPIICS, Space Telescopes and Instrumentation 2018: Optical, Infrared, and Millimeter Wave, Vol. 10698, pp 106982T, 2018.
    Abstract: The ESA Formation Flying mission Proba-3 will fly the giant solar coronagraph ASPIICS. The instrument is composed of a 1.4 meter diameter external occulting disc mounted on the Occulter Spacecraft and a Lyot-style solar coronagraph of 50 mm diameter aperture carried by the Coronagraph Spacecraft positioned 144 meters behind. The system will observe the inner corona of the Sun, as close as 1.1 solar radii. For a solar coronagraph, the most critical source of straylight is the residual diffracted sunlight, which drives the scientific performance of the observation. This is especially the case for ASPIICS because of its reduced field-of-view close to the solar limb. The light from the Sun is first diffracted by the edge of the external occulter, and then propagates and scatters inside the instrument. There is a crucial need to estimate both the intensity and the distribution of the diffraction on the focal plane. Because of the very large size of the coronagraph, one cannot rely on a representative full-scale test campaign. Moreover, usual optics software packages are not designed to perform such diffraction computations with the required accuracy. Therefore, dedicated approaches have been developed in the frame of ASPIICS. First, novel numerical models compute the diffraction profile on the entrance pupil plane and instrument detector plane (Landini et al., Rougeot et al.), assuming perfect optics in the sense of multi-reflection and scattering. Results are confronted with experimental measurements of diffraction. The paper reports the results of the different approaches.
    BibTeX:
    @inproceedings{rougeot2018straylight,
    author = {Rougeot, Raphaël and Aime, Claude and Baccani, Cristian and Fineschi, Silvano and Flamary, Rémi and Galano, Damien and Galy, Camille and Kirschner, Volker and Landini, Federico and Romoli, Marco and others},
    title = {Straylight analysis for the externally occulted Lyot solar coronagraph ASPIICS},
    volume = {10698},
    pages = {106982T},
    booktitle = {Space Telescopes and Instrumentation 2018: Optical, Infrared, and Millimeter Wave},
    organization = {International Society for Optics and Photonics},
    year = {2018}
    }
    I. Harrane, R. Flamary, C. Richard, On reducing the communication cost of the diffusion LMS algorithm, IEEE Transactions on Signal and Information Processing over Networks (SIPN), Vol. 5, pp 100-112, 2018.
    Abstract: The rise of digital and mobile communications has recently made the world more connected and networked, resulting in an unprecedented volume of data flowing between sources, data centers, or processes. While these data may be processed in a centralized manner, it is often more suitable to consider distributed strategies such as diffusion, as they are scalable and can handle large amounts of data by distributing tasks over networked agents. Although it is relatively simple to implement diffusion strategies over a cluster, it appears to be challenging to deploy them in an ad-hoc network with a limited energy budget for communication. In this paper, we introduce a diffusion LMS strategy that significantly reduces communication costs without compromising the performance. Then, we analyze the proposed algorithm in the mean and mean-square sense. Next, we conduct numerical experiments to confirm the theoretical findings. Finally, we perform large scale simulations to test the algorithm efficiency in a scenario where energy is limited.
    BibTeX:
    @article{harrane2018reducing,
    author = {Harrane, Ibrahim and Flamary, R. and Richard, C.},
    title = {On reducing the communication cost of the diffusion LMS algorithm},
    journal = {IEEE Transactions on Signal and Information Processing over Networks (SIPN)},
    volume = {5},
    pages = {100-112},
    year = {2018}
    }
    V. Seguy, B. B. Damodaran, R. Flamary, N. Courty, A. Rolet, M. Blondel, Large-Scale Optimal Transport and Mapping Estimation, International Conference on Learning Representations (ICLR), 2018.
    Abstract: This paper presents a novel two-step approach for the fundamental problem of learning an optimal map from one distribution to another. First, we learn an optimal transport (OT) plan, which can be thought of as a one-to-many map between the two distributions. To that end, we propose a stochastic dual approach to regularized OT and show empirically that it scales better than a recent related approach when the number of samples is very large. Second, we estimate a Monge map as a deep neural network learned by approximating the barycentric projection of the previously obtained OT plan. We prove two theoretical stability results of regularized OT which show that our estimations converge to the OT plan and Monge map between the underlying continuous measures. We showcase our proposed approach on two applications: domain adaptation and generative modeling.
    BibTeX:
    @inproceedings{seguy2018large,
    author = {Seguy, Vivien and Damodaran, Bharath B. and Flamary, Remi and Courty, Nicolas and Rolet, Antoine and Blondel, Mathieu},
    title = {Large-Scale Optimal Transport and Mapping Estimation},
    booktitle = {International Conference on Learning Representations (ICLR)},
    year = {2018}
    }
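    Code sketch (Python): stochastic gradient ascent on the dual of entropic OT, a discrete toy version of the first step above; u and v are the dual potentials and index pairs are sampled from the (uniform) marginals. Step size, epsilon and iteration count are illustrative.
    import numpy as np

    def stochastic_dual_ot(C, eps=0.1, n_iter=100_000, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        n, m = C.shape
        u, v = np.zeros(n), np.zeros(m)
        for _ in range(n_iter):
            i, j = rng.integers(n), rng.integers(m)
            g = 1.0 - np.exp((u[i] + v[j] - C[i, j]) / eps)
            u[i] += lr * g                    # ascent on the concave dual
            v[j] += lr * g
        a, b = np.ones(n) / n, np.ones(m) / m
        # Entropic primal plan recovered from the potentials
        P = a[:, None] * b[None, :] * np.exp((u[:, None] + v[None, :] - C) / eps)
        return P, u, v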
    N. Courty, R. Flamary, M. Ducoffe, Learning Wasserstein Embeddings, International Conference on Learning Representations (ICLR), 2018.
    Abstract: The Wasserstein distance has received a lot of attention recently in the machine learning community, especially for its principled way of comparing distributions. It has found numerous applications in several hard problems, such as domain adaptation, dimensionality reduction or generative models. However, its use is still limited by a heavy computational cost. Our goal is to alleviate this problem by providing an approximation mechanism that allows breaking its inherent complexity. It relies on the search of an embedding where the Euclidean distance mimics the Wasserstein distance. We show that such an embedding can be found with a siamese architecture associated with a decoder network that allows moving from the embedding space back to the original input space. Once this embedding has been found, optimization problems in the Wasserstein space (e.g. barycenters, principal directions or even archetypes) can be solved extremely fast. Numerical experiments supporting this idea are conducted on image datasets and show the wide potential benefits of our method.
    BibTeX:
    @inproceedings{courty2018learning,
    author = {Courty, Nicolas and Flamary, Remi and Ducoffe, Melanie},
    title = {Learning Wasserstein Embeddings},
    booktitle = {International Conference on Learning Representations (ICLR)},
    year = {2018}
    }
    R. Flamary, M. Cuturi, N. Courty, A. Rakotomamonjy, Wasserstein Discriminant Analysis, Machine learning , Vol. 107, pp 1923-1945, 2018.
    Abstract: Wasserstein Discriminant Analysis (WDA) is a new supervised method that can improve classification of high-dimensional data by computing a suitable linear map onto a lower dimensional subspace. Following the blueprint of classical Linear Discriminant Analysis (LDA), WDA selects the projection matrix that maximizes the ratio of two quantities: the dispersion of projected points coming from different classes, divided by the dispersion of projected points coming from the same class. To quantify dispersion, WDA uses regularized Wasserstein distances, rather than the cross-variance measures which have usually been considered, notably in LDA. Thanks to the underlying principles of optimal transport, WDA is able to capture both global (at distribution scale) and local (at sample scale) interactions between classes. Regularized Wasserstein distances can be computed using the Sinkhorn matrix scaling algorithm; we show that the optimization of WDA can be tackled using automatic differentiation of Sinkhorn iterations. Numerical experiments show promising results both in terms of prediction and visualization on toy examples and real-life datasets such as MNIST and on deep features obtained from a subset of the Caltech dataset.
    BibTeX:
    @article{flamary2017wasserstein,
    author = {Flamary, Remi and Cuturi, Marco and Courty, Nicolas and Rakotomamonjy, Alain},
    title = {Wasserstein Discriminant Analysis},
    journal = { Machine learning },
    volume = {107},
    pages = {1923-1945},
    year = {2018}
    }
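    Code sketch (Python): hedged usage of the WDA solver that ships with POT (module ot.dr, which relies on optional dependencies such as pymanopt and autograd; names and defaults follow the POT documentation). It learns a p-dimensional projection by optimizing the Wasserstein dispersion ratio described above.
    import ot.dr
    from sklearn.datasets import load_iris

    X, y = load_iris(return_X_y=True)
    # reg: entropic regularization; k: Sinkhorn iterations (per POT docs)
    P, proj = ot.dr.wda(X, y, p=2, reg=1.0, k=10, maxiter=100)
    X2 = proj(X)  # data projected on the learned 2D discriminant subspace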

    2017

    N. Courty, R. Flamary, A. Habrard, A. Rakotomamonjy, Joint Distribution Optimal Transportation for Domain Adaptation, Neural Information Processing Systems (NIPS), 2017.
    Abstract: This paper deals with the unsupervised domain adaptation problem, where one wants to estimate a prediction function f in a given target domain without any labeled sample by exploiting the knowledge available from a source domain where labels are known. Our work makes the following assumption: there exists a non-linear transformation between the joint feature/label space distributions of the two domains Ps and Pt. We propose a solution to this problem with optimal transport, which allows recovering an estimated target Pft=(X,f(X)) by simultaneously optimizing the optimal coupling and f. We show that our method corresponds to the minimization of a bound on the target error, and provide an efficient algorithmic solution, for which convergence is proved. The versatility of our approach, both in terms of hypothesis classes and loss functions, is demonstrated with real world classification and regression problems, for which we reach or surpass state-of-the-art results.
    BibTeX:
    @inproceedings{courty2017joint,
    author = {Courty, Nicolas and Flamary, Remi and Habrard, Amaury and Rakotomamonjy, Alain},
    title = {Joint Distribution Optimal Transportation for Domain Adaptation},
    booktitle = {Neural Information Processing Systems (NIPS)},
    year = {2017}
    }
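    Code sketch (Python): a simplified JDOT alternation for classification: couple source and target with a cost mixing feature distance and label disagreement, propagate soft labels through the coupling, and refit the classifier. The linear model and alpha are illustrative, and the sketch assumes class labels 0..K-1 with every class staying represented across iterations.
    import numpy as np
    import ot
    from sklearn.linear_model import LogisticRegression

    def jdot(Xs, ys, Xt, alpha=0.1, n_iter=10):
        K = ys.max() + 1
        Ys = np.eye(K)[ys]                             # one-hot source labels
        ws, wt = np.ones(len(Xs)) / len(Xs), np.ones(len(Xt)) / len(Xt)
        clf = LogisticRegression(max_iter=1000).fit(Xs, ys)  # source-only init
        for _ in range(n_iter):
            # Joint cost: feature distance + squared label loss of current f
            M = alpha * ot.dist(Xs, Xt) + ot.dist(Ys, clf.predict_proba(Xt))
            G = ot.emd(ws, wt, M)                      # optimal coupling
            Yt = G.T @ Ys                              # soft target labels
            clf.fit(Xt, Yt.argmax(1))                  # refit f on the target
        return clf, G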
    P. Hartley, R. Flamary, N. Jackson, A. S. Tagore, R. B. Metcalf, Support Vector Machine classification of strong gravitational lenses, Monthly Notices of the Royal Astronomical Society (MNRAS), 2017.
    Abstract: The imminent advent of very large-scale optical sky surveys, such as Euclid and LSST, makes it important to find efficient ways of discovering rare objects such as strong gravitational lens systems, where a background object is multiply gravitationally imaged by a foreground mass. As well as finding the lens systems, it is important to reject false positives due to intrinsic structure in galaxies, and much work is in progress with machine learning algorithms such as neural networks in order to achieve both these aims. We present and discuss a Support Vector Machine (SVM) algorithm which makes use of a Gabor filterbank in order to provide learning criteria for separation of lenses and non-lenses, and demonstrate using blind challenges that under certain circumstances it is a particularly efficient algorithm for rejecting false positives. We compare the SVM engine with a large-scale human examination of 100,000 simulated lenses in a challenge dataset, and also apply the SVM method to survey images from the Kilo-Degree Survey.
    BibTeX:
    @article{hartley2017support,
    author = {Hartley, Philippa and Flamary, Remi and Jackson, Neal and Tagore, A. S. and Metcalf, R. B.},
    title = {Support Vector Machine classification of strong gravitational lenses},
    journal = {Monthly Notices of the Royal Astronomical Society (MNRAS)},
    year = {2017}
    }
    R. Mourya, A. Ferrari, R. Flamary, P. Bianchi, C. Richard, Distributed Deblurring of Large Images of Wide Field-Of-View, 2017.
    Abstract: Image deblurring is an economic way to reduce certain degradations (blur and noise) in acquired images. Thus, it has become an essential tool in high resolution imaging in many applications, e.g., astronomy, microscopy or computational photography. In applications such as astronomy and satellite imaging, the size of acquired images can be extremely large (up to gigapixels), covering a wide field-of-view and suffering from shift-variant blur. Most of the existing image deblurring techniques are designed and implemented to work efficiently on centralized computing systems with multiple processors and a shared memory. Thus, the largest image that can be handled is limited by the size of the physical memory available on the system. In this paper, we propose a distributed nonblind image deblurring algorithm in which several connected processing nodes (with reasonable computational resources) simultaneously process different portions of a large image while maintaining a certain coherency among them to finally obtain a single crisp image. Unlike the existing centralized techniques, image deblurring in a distributed fashion raises several issues. To tackle these issues, we consider certain approximations that trade off between the quality of the deblurred image and the computational resources required to achieve it. The experimental results show that our algorithm produces images of similar quality to the existing centralized techniques while allowing distribution, and thus being cost effective for extremely large images.
    BibTeX:
    @techreport{mourya2017distdeblur,
    author = {Mourya, Rahul and Ferrari, Andre and Flamary, Remi and Bianchi, Pascal and Richard, Cedric},
    title = {Distributed Deblurring of Large Images of Wide Field-Of-View},
    year = {2017}
    }
    R. Mourya, A. Ferrari, R. Flamary, P. Bianchi, C. Richard, Distributed Approach for Deblurring Large Images with Shift-Variant Blur, European Conference on Signal Processing (EUSIPCO), 2017.
    Abstract: Image deblurring techniques are effective tools to obtain high quality images from acquired images degraded by blur and noise. In applications such as astronomy and satellite imaging, the size of acquired images can be extremely large (up to gigapixels), covering a wide field-of-view suffering from shift-variant blur. Most of the existing deblurring techniques are designed to be cost effective on a centralized computing system having a shared memory and possibly a multicore processor. The largest image they can handle is then conditioned by the memory capacity of the system. In this paper, we propose a distributed shift-variant image deblurring algorithm in which several connected processing units (each with reasonable computational resources) can simultaneously deblur different portions of a large image while maintaining a certain coherency among them to finally obtain a single crisp image. The proposed algorithm is based on a distributed Douglas-Rachford splitting algorithm with a specific structure of the penalty parameters used in the proximity operator. Numerical experiments show that the proposed algorithm produces images of similar quality to the existing centralized techniques while being distributed and cost effective for extremely large images.
    BibTeX:
    @inproceedings{mourya2017distributed,
    author = {Mourya, Rahul and Ferrari, Andre and Flamary, Remi and Bianchi, Pascal and Richard, Cedric},
    title = {Distributed Approach for Deblurring Large Images with Shift-Variant Blur},
    booktitle = {European Conference on Signal Processing (EUSIPCO)},
    year = {2017}
    }
    R. Flamary, Astronomical image reconstruction with convolutional neural networks, European Conference on Signal Processing (EUSIPCO), 2017.
    Abstract: State of the art methods in astronomical image reconstruction rely on the resolution of a regularized or constrained optimization problem. Solving this problem can be computationally intensive and usually leads to a quadratic or at least superlinear complexity w.r.t. the number of pixels in the image. We investigate in this work the use of convolutional neural networks for image reconstruction in astronomy. With neural networks, the computationally intensive task is the training step, but the prediction step has a fixed complexity per pixel, i.e. a linear complexity. Numerical experiments show that our approach is both computationally efficient and competitive with other state of the art methods, in addition to being interpretable.
    BibTeX:
    @inproceedings{flamary2017astro,
    author = {Flamary, Remi},
    title = {Astronomical image reconstruction with convolutional neural networks},
    booktitle = {European Conference on Signal Processing (EUSIPCO)},
    year = {2017}
    }
    R. Ammanouil, A. Ferrari, R. Flamary, C. Ferrari, D. Mary, Multi-frequency image reconstruction for radio-interferometry with self-tuned regularization parameters, European Conference on Signal Processing (EUSIPCO), 2017.
    Abstract: As the world's largest radio telescope, the Square Kilometer Array (SKA) will provide radio interferometric data with unprecedented detail. Image reconstruction algorithms for radio interferometry are challenged to scale well with terabyte image sizes never seen before. In this work, we investigate one such 3D image reconstruction algorithm known as MUFFIN (MUlti-Frequency image reconstruction For radio INterferometry). In particular, we focus on the challenging task of automatically finding the optimal regularization parameter values. In practice, finding the regularization parameters using classical grid search is computationally intensive and nontrivial due to the lack of ground truth. We adopt a greedy strategy where, at each iteration, the optimal parameters are found by minimizing the predicted Stein unbiased risk estimate (PSURE). The proposed self-tuned version of MUFFIN involves parallel and computationally efficient steps, and scales well with large-scale data. Finally, numerical results on a 3D image are presented to showcase the performance of the proposed approach.
    BibTeX:
    @inproceedings{ammanouil2017multi,
    author = {Ammanouil, Rita and Ferrari, Andre and Flamary, Remi and Ferrari, Chiara and Mary, David},
    title = {Multi-frequency image reconstruction for radio-interferometry with self-tuned regularization parameters},
    booktitle = {European Conference on Signal Processing (EUSIPCO)},
    year = {2017}
    }
    R. Rougeot, R. Flamary, D. Galano, C. Aime, Performance of hybrid externally occulted Lyot solar coronagraph, Application to ASPIICS, Astronomy and Astrophysics, 2017.
    Abstract: Context. The future ESA Formation Flying mission Proba-3 will fly the solar coronagraph ASPIICS, which couples a Lyot coronagraph of 50 mm and an external occulter of 1.42 m diameter set 144 m in front of it. Aims. We perform a numerical study of the theoretical performance of a hybrid coronagraph such as ASPIICS. In this system, an internal occulter is set on the image of the external occulter instead of a Lyot mask on the solar image. First, we determine the rejection due to the external occulter alone. Second, the effects of sizing the internal occulter and the Lyot stop are analyzed. This work also applies to the classical Lyot coronagraph alone and the external solar coronagraph. Methods. The numerical computation uses the parameters of ASPIICS. First we take the approach of Aime, C. 2013, A&A 558, A138, to express the wave front from Fresnel diffraction at the entrance aperture of the Lyot coronagraph. From there, each wave front coming from a given point of the Sun is propagated through the Lyot coronagraph in three steps: from the aperture to the image of the external occulter, where the internal occulter is set; from this plane to the image of the entrance aperture, where the Lyot stop is set; and from there to the final observing plane. Making use of the axial symmetry, wave fronts originating from one radius of the Sun are computed and the intensities circularly averaged. Results. As expected, the image of the external occulter appears as a bright circle, which locally exceeds the brightness of the Sun observed without the external occulter. However, residual sunlight is below 10^-8 outside 1.5 solar radii. The Lyot coronagraph effectively complements the external occultation. At the expense of a small reduction in flux and resolution, reducing the Lyot stop allows a clear gain in rejection. Oversizing the internal occulter produces a similar effect but tends to exclude observations very close to the limb. We provide a graph that allows simple estimation of the performance as a function of the sizes of the internal occulter and the Lyot stop.
    BibTeX:
    @article{rougeot2016performance,
    author = { Rougeot, Raphael and Flamary, Remi and Galano, Damien and Aime, Claude},
    title = {Performance of hybrid externally occulted Lyot solar coronagraph, Application to ASPIICS},
    journal = { Astronomy and Astrophysics},
    year = {2017}
    }

    2016

    D. Mary, R. Flamary, C. Theys, C. Aime, Mathematical Tools for Instrumentation and Signal Processing in Astronomy, 2016.
    Abstract: This book is a collection of 13 articles corresponding to lectures and research works presented at the CNRS Summer school « Bases mathématiques pour l'instrumentation et le traitement du signal en astronomie ». The school took place in Nice and Porquerolles, France, from June 1 to 5, 2015. This book contains three parts: I. Astronomy in the coming decade and beyond. The three chapters of this part emphasize the strong interdisciplinary nature of Astrophysics, both at theoretical and observational levels, and the increasingly larger sizes of data sets produced by increasingly more complex instruments and infrastructures. These remarkable features call at the same time for more mathematical tools in signal processing and instrumentation, in particular in statistical modeling, large scale inference, data mining, machine learning, and for efficient processing solutions allowing their implementation. II. Mathematical concepts, methods and tools. The first chapter of this part starts with an example of how pure mathematics can lead to new instrumental concepts, in this case for exoplanet detection. The four other chapters of this part provide a detailed introduction to four main topics: orthogonal functions as a powerful tool for modeling signals and images, covering Fourier, Fourier-Legendre and Fourier-Bessel series for 1D signals and Spherical Harmonic series for 2D signals; optimization and machine learning methods with application to inverse problems, denoising and classification, with on-line numerical experiments; large scale statistical inference with adaptive procedures for controlling the False Discovery Rate, like the Benjamini-Hochberg procedure, its Bayesian interpretation and some variations; and processing solutions for large data sets, covering the Hadoop framework and YARN, the main tools for managing both the storage and computing capacities of a cluster of machines, as well as recent solutions like Spark. III. Applications: tools in action. This part collects a number of current research works where some of the tools above are presented in action: optimization for deconvolution, statistical modeling, multiple testing, optical and instrumental models. The applications of this part include astronomical imaging, detection and estimation of circumgalactic structures, and detection of exoplanets.
    BibTeX:
    @book{mary2016mathematical,
    author = {Mary, David and Flamary, Remi and Theys, Celine and Aime, Claude},
    title = {Mathematical Tools for Instrumentation and Signal Processing in Astronomy},
    publisher = {EDP Sciences},
    year = {2016}
    }
    R. Flamary, C. Févotte, N. Courty, V. Emiya, Optimal spectral transportation with application to music transcription, Neural Information Processing Systems (NIPS), 2016.
    Abstract: Many spectral unmixing methods rely on the non-negative decomposition of spectral data onto a dictionary of spectral templates. In particular, state-of-the-art music transcription systems decompose the spectrogram of the input signal onto a dictionary of representative note spectra. The typical measures of fit used to quantify the adequacy of the decomposition compare the data and template entries frequency-wise. As such, small displacements of energy from one frequency bin to another, as well as variations of timbre, can disproportionally harm the fit. We address these issues by means of optimal transportation and propose a new measure of fit that treats the frequency distributions of energy holistically, as opposed to frequency-wise. Building on the harmonic nature of sound, the new measure is invariant to shifts of energy to harmonically-related frequencies, as well as to small and local displacements of energy. Equipped with this new measure of fit, the dictionary of note templates can be considerably simplified to a set of Dirac vectors located at the target fundamental frequencies (musical pitch values). This in turn gives ground to a very fast and simple decomposition algorithm that achieves state-of-the-art performance on real musical data.
    BibTeX:
    @inproceedings{flamary2016ost,
    author = {Flamary, Remi and Févotte, Cédric and Courty, N. and Emiya, Valentin},
    title = {Optimal spectral transportation with application to music transcription},
    booktitle = { Neural Information Processing Systems (NIPS)},
    year = {2016}
    }
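    Code sketch (Python): the core OST mechanism in its unregularized form: with Dirac note templates and a transport cost that charges the distance to the nearest harmonic of a candidate pitch, decomposing the spectrum reduces to assigning each frequency bin to its cheapest note. Bin frequencies, candidate fundamentals and the harmonic count are illustrative.
    import numpy as np

    def transcribe(spectrum, freqs, f0s, n_harmonics=8):
        harm = np.arange(1, n_harmonics + 1)
        # cost[k, i]: distance from bin freqs[i] to the nearest harmonic of f0s[k]
        cost = np.array([np.min((freqs[None, :] - f0 * harm[:, None]) ** 2, axis=0)
                         for f0 in f0s])
        assign = cost.argmin(axis=0)            # cheapest note for each bin
        activations = np.array([spectrum[assign == k].sum()
                                for k in range(len(f0s))])
        return activations / max(spectrum.sum(), 1e-12)  # note abundances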
    M. Perrot, N. Courty, R. Flamary, A. Habrard, Mapping estimation for discrete optimal transport, Neural Information Processing Systems (NIPS), 2016.
    Abstract: We are interested in the computation of the transport map of an Optimal Transport problem. Most computational approaches to Optimal Transport use the Kantorovich relaxation of the problem to learn a probabilistic coupling, but do not address the problem of learning the transport map linked to the original Monge problem. Consequently, this lowers the potential usage of such methods in contexts where out-of-sample computations are mandatory. In this paper we propose a new way to jointly learn the coupling and an approximation of the transport map. We use a jointly convex formulation which can be efficiently optimized. Additionally, jointly learning the coupling and the transport map allows smoothing the result of the Optimal Transport and generalizing it to out-of-sample examples. Empirically, we show the interest and the relevance of our method in two tasks: domain adaptation and image editing.
    BibTeX:
    @inproceedings{perrot2016mapping,
    author = {Perrot, M. and Courty, N. and Flamary, R. and Habrard, A.},
    title = {Mapping estimation for discrete optimal transport},
    booktitle = {Neural Information Processing Systems (NIPS)},
    year = {2016}
    }
    N. Courty, R. Flamary, D. Tuia, A. Rakotomamonjy, Optimal transport for domain adaptation, Pattern Analysis and Machine Intelligence, IEEE Transactions on , 2016.
    Abstract: Domain adaptation is one of the most challenging tasks of modern data analytics. If the adaptation is done correctly, models built on specific data representations become more robust when confronted with data depicting the same semantic concepts (the classes), but observed by another observation system with its own specificities. Among the many strategies proposed to adapt one domain to another, finding domain-invariant representations has shown excellent properties, as a single classifier can use labelled samples from the source domain under this representation to predict the unlabelled samples of the target domain. In this paper, we propose a regularized unsupervised optimal transportation model to perform the alignment of the representations in the source and target domains. We learn a transportation plan matching both PDFs, which constrains labelled samples in the source domain to remain close during transport. This way, we exploit at the same time the scarce labelled information in the source and the distributions of the input/observation variables observed in both domains. Experiments on toy and challenging real visual adaptation examples show the interest of the method, which consistently outperforms state of the art approaches.
    BibTeX:
    @article{courty2016optimal,
    author = { Courty, N. and Flamary, R. and Tuia, D. and Rakotomamonjy, A.},
    title = {Optimal transport for domain adaptation},
    journal = { Pattern Analysis and Machine Intelligence, IEEE Transactions on },
    year = {2016}
    }
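    Code sketch (Python): the domain-adaptation API of POT that grew out of this line of work; SinkhornLpl1Transport adds the class-based regularization that keeps same-class source samples together during transport. Toy Gaussian data stand in for the source and target domains.
    import numpy as np
    import ot

    rng = np.random.default_rng(0)
    Xs = rng.normal(size=(100, 2))                        # source samples
    ys = rng.integers(0, 2, 100)                          # source labels
    Xt = rng.normal(size=(80, 2)) + np.array([3.0, 0.0])  # shifted target

    mapper = ot.da.SinkhornLpl1Transport(reg_e=1.0, reg_cl=0.1)
    mapper.fit(Xs=Xs, ys=ys, Xt=Xt)
    Xs_adapted = mapper.transform(Xs=Xs)  # source mapped to the target domain
    # A classifier trained on (Xs_adapted, ys) can then be applied to Xt.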
    I. Harrane, R. Flamary, C. Richard, Doubly partial-diffusion LMS over adaptive networks, Asilomar Conference on Signals, Systems and Computers (ASILOMAR), 2016.
    Abstract: Diffusion LMS is an efficient strategy for solving distributed optimization problems with cooperating agents. Nodes are interested in estimating the same parameter vector and exchange information with their neighbors to improve their local estimates. However, the successful implementation of such applications depends on a substantial amount of communication resources. In this paper, we introduce diffusion algorithms that have a significantly reduced communication load without compromising performance. We also perform analyses in the mean and mean-square sense. Simulation results are provided to confirm the theoretical findings.
    BibTeX:
    @inproceedings{harrane2016doubly,
    author = {Harrane, Ibrahim and Flamary, R. and Richard, C.},
    title = {Doubly partial-diffusion LMS over adaptive networks},
    booktitle = {Asilomar Conference on Signals, Systems and Computers (ASILOMAR)},
    year = {2016}
    }
    S. Nakhostin, N. Courty, R. Flamary, D. Tuia, T. Corpetti, Supervised planetary unmixing with optimal transport, Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS), 2016.
    Abstract: This paper focuses on spectral unmixing and presents an original technique based on Optimal Transport. Optimal Transport consists in estimating a plan that transports a spectrum onto another with minimal cost, enabling the computation of an associated distance (the Wasserstein distance) that can be used as an alternative metric to compare hyperspectral data. This is exploited for spectral unmixing, where the abundances in each pixel are estimated on the basis of their projections in a Wasserstein sense (Bregman projections) onto known endmembers. In this work an over-complete dictionary is used to deal with internal variability between endmembers, while a regularization term, also based on the Wasserstein distance, is used to promote prior proportion knowledge in the endmember groups. Experiments are performed on real hyperspectral data of asteroid 4-Vesta.
    BibTeX:
    @inproceedings{nakhostin2016planetary,
    author = {Nakhostin, Sina and Courty, Nicolas and Flamary, Remi and Tuia, D. and Corpetti, Thomas},
    title = {Supervised planetary unmixing with optimal transport},
    booktitle = {Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS)},
    year = {2016}
    }
    S. Canu, R. Flamary, D. Mary, Introduction to optimization with applications in astronomy and astrophysics, Mathematical tools for instrumentation and signal processing in astronomy, 2016.
    Abstract: This chapter aims at providing an introduction to numerical optimization with some applications in astronomy and astrophysics. We provide important preliminary definitions that will guide the reader towards different optimization procedures. We discuss three families of optimization problems and describe numerical algorithms allowing, when this is possible, to solve these problems. For each family, we present in detail simple examples and more involved advanced examples. As a final illustration, we focus on two worked-out examples of optimization applied to astronomical data. The first application is a supervised classification of RR-Lyrae stars. The second one is the denoising of galactic spectra formulated by means of sparsity inducing models in a redundant dictionary.
    BibTeX:
    @incollection{canu2016introduction,
    author = { Canu, Stephane and Flamary, Remi and Mary, David},
    title = {Introduction to optimization with applications in astronomy and astrophysics},
    booktitle = { Mathematical tools for instrumentation and signal processing in astronomy},
    editor = {Mary, David and Flamary, Remi and Theys, Celine and Aime, Claude},
    year = {2016}
    }
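    The second worked-out example of the chapter (sparse denoising in a redundant dictionary) boils down to an l1-regularized least-squares problem that proximal gradient descent solves; here is a minimal ISTA sketch on synthetic data, where the dictionary, sizes and regularization weight are illustrative assumptions.

        import numpy as np

        def soft(x, t):
            # Soft-thresholding: the proximal operator of the l1 norm.
            return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

        rng = np.random.default_rng(0)
        n, p, lam = 100, 300, 0.1
        D = rng.standard_normal((n, p)) / np.sqrt(n)   # redundant dictionary
        a_true = np.zeros(p)
        a_true[rng.choice(p, 5, replace=False)] = 1.0
        y = D @ a_true + 0.05 * rng.standard_normal(n) # noisy observed spectrum

        L = np.linalg.norm(D, 2) ** 2                  # Lipschitz constant of grad
        a = np.zeros(p)
        for _ in range(500):                           # ISTA iterations
            a = soft(a - D.T @ (D @ a - y) / L, lam / L)
        y_denoised = D @ a                             # sparse reconstruction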
    R. Flamary, A. Rakotomamonjy, M. Sebag, Apprentissage statistique pour les BCI, Les interfaces cerveau-ordinateur 1, fondements et méthodes, pp 197-215, 2016.
    Abstract: This chapter introduces statistical learning and its application to brain-machine interfaces. First, the general principle of supervised learning is presented and the practical difficulties are discussed, in particular those related to sensor selection and multi-subject learning. The chapter also details the validation of a learning approach, including the different performance measures and the optimization of the hyperparameters of the considered algorithm. The reader is invited to experiment with the algorithms described: a Matlab/Octave toolbox reproduces the experiments illustrating the chapter and contains the implementation details of the different methods.
    BibTeX:
    @incollection{flamary2016apprentissage,
    author = {Flamary, Remi and Rakotomamonjy, Alain and Sebag, Michele},
    title = {Apprentissage statistique pour les BCI},
    pages = { 197-215},
    booktitle = { Les interfaces cerveau-ordinateur 1, fondements et méthodes},
    editor = {Clerc, Maureen and Bougrain, Laurent and Lotte, Fabien},
    publisher = {ISTE Editions},
    year = {2016}
    }
    R. Flamary, A. Rakotomamonjy, M. Sebag, Statistical learning for BCIs, Brain Computer Interfaces 1: Fundamentals and Methods, pp 185-206, 2016.
    Abstract: This chapter introduces statistical learning and its applications to brain–computer interfaces. We begin by presenting the general principles of supervised learning and discussing the difficulties raised by its implementation, with a particular focus on aspects related to selecting sensors and multisubject learning. This chapter also describes in detail how a learning approach may be validated, including various metrics of performance and optimization of the hyperparameters of the considered algorithms. We invite the reader to experiment with the algorithms described here: the illustrative experiments included in this chapter may be reproduced using a Matlab/Octave toolbox, which contains the implementation details of the various different methods.
    BibTeX:
    @incollection{flamary2016statistical,
    author = {Flamary, Remi and Rakotomamonjy, Alain and Sebag, Michele},
    title = {Statistical learning for BCIs},
    pages = { 185-206},
    booktitle = { Brain Computer Interfaces 1: Fundamentals and Methods},
    editor = {Clerc, Maureen and Bougrain, Laurent and Lotte, Fabien},
    publisher = {ISTE Ltd and John Wiley and Sons Inc},
    year = {2016}
    }
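    The validation methodology discussed in the chapter (performance measures plus hyperparameter optimization) is the kind of procedure that scikit-learn automates; a hedged Python sketch on stand-in data follows (the chapter's own companion toolbox is Matlab/Octave, and the features below are random placeholders for real epoched EEG trials).

        import numpy as np
        from sklearn.model_selection import GridSearchCV, cross_val_score
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import LinearSVC

        rng = np.random.default_rng(0)
        X = rng.standard_normal((200, 64))      # trials x features (placeholder)
        y = rng.integers(0, 2, 200)             # binary labels (placeholder)

        pipe = make_pipeline(StandardScaler(), LinearSVC(dual=False))
        grid = GridSearchCV(pipe, {"linearsvc__C": np.logspace(-3, 2, 6)}, cv=5)
        # Nested cross-validation: the hyperparameter search never sees test folds.
        print(cross_val_score(grid, X, y, cv=5).mean())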
    D. Tuia, R. Flamary, M. Barlaud, Non-convex regularization in remote sensing, Geoscience and Remote Sensing, IEEE Transactions on, 2016.
    Abstract: In this paper, we study the effect of different regularizers and their implications in high-dimensional image classification and sparse linear unmixing. Although kernelization or sparse methods are globally accepted solutions for processing data in high dimensions, we present here a study on the impact of the form of regularization used and of its parametrization. We consider regularization via traditional squared (l2) and sparsity-promoting (l1) norms, as well as more unconventional non-convex regularizers (lp and Log Sum Penalty). We compare their properties and advantages on several classification and linear unmixing tasks and provide advice on the choice of the best regularizer for the problem at hand. Finally, we also provide a fully functional toolbox for the community.
    BibTeX:
    @article{tuia2016nonconvex,
    author = {Tuia, D. and  Flamary, R. and Barlaud, M.},
    title = {Non-convex regularization in remote sensing},
    journal = {Geoscience and Remote Sensing, IEEE Transactions on},
    year = {2016}
    }
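    To make the compared penalties concrete, the sketch below evaluates their profiles and performs one reweighted-l1 step, a standard device for handling the non-convex ones; the epsilon value and weights are illustrative assumptions, not the paper's exact parametrization.

        import numpy as np

        # Penalty profiles on a scalar weight w.
        def pen_l2(w):            return w ** 2
        def pen_l1(w):            return np.abs(w)
        def pen_lp(w, p=0.5):     return np.abs(w) ** p
        def pen_lsp(w, eps=1e-2): return np.log(1.0 + np.abs(w) / eps)  # Log Sum Penalty

        # Non-convex penalties are commonly optimized by iterative reweighted l1:
        w = np.array([0.0, 0.05, 1.0])      # current weights
        rw = 1.0 / (np.abs(w) + 1e-2)       # LSP-induced weights: small entries get
        print(rw)                           # penalized harder in the next l1 subproblem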
    N. Courty, R. Flamary, D. Tuia, T. Corpetti, Optimal transport for data fusion in remote sensing, International Geoscience and Remote Sensing Symposium (IGARSS), 2016.
    Abstract: One of the main objectives of data fusion is the integration of several acquisitions of the same physical object, in order to build a new consistent representation that embeds all the information from the different modalities. In this paper, we propose the use of optimal transport theory as a powerful means of establishing correspondences between the modalities. After reviewing important properties and computational aspects, we showcase its application to three remote sensing fusion problems: domain adaptation, time series averaging and change detection in LIDAR data.
    BibTeX:
    @inproceedings{courty2016optimalrs,
    author = {Courty, N. and Flamary, R. and Tuia, D. and Corpetti, T.},
    title = {Optimal transport for data fusion in remote sensing},
    booktitle = {International Geoscience and Remote Sensing Symposium (IGARSS)},
    year = {2016}
    }
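    A toy sketch of the correspondence-building step with the POT library follows; the data and sizes are illustrative stand-ins for real co-registered modalities such as optical and LIDAR acquisitions.

        import numpy as np
        import ot  # POT: Python Optimal Transport

        rng = np.random.default_rng(0)
        X1 = rng.standard_normal((50, 3))                    # modality 1 samples
        perm = rng.permutation(50)
        X2 = X1[perm] + 0.05 * rng.standard_normal((50, 3))  # modality 2 samples

        a, b = ot.unif(50), ot.unif(50)                      # uniform sample weights
        G = ot.emd(a, b, ot.dist(X1, X2))                    # coupling = correspondences
        match = G.argmax(axis=1)                             # hard assignment per sample
        print((match == perm).mean())                        # recovered matching rate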
    I. Harrane, R. Flamary, C. Richard, Toward privacy-preserving diffusion strategies for adaptation and learning over networks, European Conference on Signal Processing (EUSIPCO), 2016.
    Abstract: Distributed optimization allows inference problems to be addressed in a decentralized manner over networks, where agents can exchange information with their neighbors to improve their local estimates. Privacy preservation has become an important issue in many data mining applications. It aims at protecting the privacy of individual data in order to prevent the disclosure of sensitive information during the learning process. In this paper, we derive a diffusion strategy of the LMS type to solve distributed inference problems in the case where agents are also interested in preserving the privacy of the local measurements. We carry out a detailed mean and mean-square error analysis of the algorithm. Simulations are provided to check the theoretical findings.
    BibTeX:
    @inproceedings{haranne2016toward,
    author = {Harrane, I. and Flamary, R. and Richard, C.},
    title = {Toward privacy-preserving diffusion strategies for adaptation and learning over networks},
    booktitle = {European Conference on Signal Processing (EUSIPCO)},
    year = {2016}
    }
    A. Rakotomamonjy, R. Flamary, G. Gasso, DC Proximal Newton for Non-Convex Optimization Problems, Neural Networks and Learning Systems, IEEE Transactions on, Vol. 27, N. 3, pp 636-647, 2016.
    Abstract: We introduce a novel algorithm for solving learning problems where both the loss function and the regularizer are non-convex but belong to the class of difference of convex (DC) functions. Our contribution is a new general-purpose proximal Newton algorithm able to deal with such a situation. The algorithm consists in obtaining a descent direction from an approximation of the loss function and then performing a line search to ensure sufficient descent. A theoretical analysis is provided showing that the iterates of the proposed algorithm admit as limit points stationary points of the DC objective function. Numerical experiments show that our approach is more efficient than the current state of the art for a problem with a convex loss function and a non-convex regularizer. We also illustrate the benefit of our algorithm on a high-dimensional transductive learning problem where both the loss function and the regularizer are non-convex.
    BibTeX:
    @article{rakoto2015dcprox,
    author = { Rakotomamonjy, A. and Flamary, R. and Gasso, G.},
    title = {DC Proximal Newton for Non-Convex Optimization Problems},
    journal = { Neural Networks and Learning Systems, IEEE Transactions on},
    volume = {27},
    number = {3},
    pages = {636-647},
    year = {2016}
    }

    2015

    R. Flamary, A. Rakotomamonjy, G. Gasso, Importance Sampling Strategy for Non-Convex Randomized Block-Coordinate Descent, IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), 2015.
    Abstract: As the number of samples and the dimensionality of optimization problems related to statistics and machine learning explode, block coordinate descent algorithms have gained popularity since they reduce the original problem to several smaller ones. Coordinates to be optimized are usually selected randomly according to a given probability distribution. We introduce an importance sampling strategy that helps randomized coordinate descent algorithms focus on blocks that are still far from convergence. The framework applies to problems composed of the sum of two possibly non-convex terms, one being separable and non-smooth. We compared our algorithm to a full gradient proximal approach as well as to a randomized block coordinate algorithm with uniform sampling and to cyclic block coordinate descent. Experimental evidence shows the clear benefit of using an importance sampling strategy.
    BibTeX:
    @inproceedings{flamary2015importance,
    author = {Flamary, R. and Rakotomamonjy, A. and  Gasso, G.},
    title = {Importance Sampling Strategy for Non-Convex Randomized Block-Coordinate Descent},
    booktitle = {IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP)},
    year = {2015}
    }
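    The paper derives a principled sampling distribution; the sketch below only illustrates the mechanism of biasing block draws towards blocks that are still moving, on a convex lasso toy problem with ad hoc progress scores.

        import numpy as np

        # Proximal block-coordinate descent on 0.5*||Xw - y||^2 + lam*||w||_1
        # with blocks drawn from an adaptive, non-uniform distribution.
        rng = np.random.default_rng(0)
        n, p, n_blocks, lam = 200, 100, 10, 0.1
        X = rng.standard_normal((n, p)); y = rng.standard_normal(n)
        blocks = np.array_split(np.arange(p), n_blocks)
        L = np.array([np.linalg.norm(X[:, b], 2) ** 2 for b in blocks])
        w, scores = np.zeros(p), np.ones(n_blocks)
        for it in range(2000):
            j = rng.choice(n_blocks, p=scores / scores.sum())
            b = blocks[j]
            z = w[b] - X[:, b].T @ (X @ w - y) / L[j]            # gradient step
            w_new = np.sign(z) * np.maximum(np.abs(z) - lam / L[j], 0.0)
            scores[j] = np.linalg.norm(w_new - w[b]) + 1e-8      # recent progress
            w[b] = w_new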
    R. Flamary, I. Harrane, M. Fauvel, S. Valero, M. Dalla Mura, Discrimination périodique à partir d’observations multi-temporelles, GRETSI, 2015.
    Abstract: In this work, we propose a novel linear classification scheme for non-stationary periodic data. We express the classifier in a temporal basis while regularizing its temporal complexity, leading to a convex optimization problem. Numerical experiments show very good results on a simulated example and on a real-life remote sensing image classification problem.
    BibTeX:
    @conference{flamary2015discrimination,
    author = {Flamary, R. and Harrane, I. and Fauvel, M. and Valero, S. and Dalla Mura, M.},
    title = {Discrimination périodique à partir d’observations multi-temporelles},
    booktitle = {GRETSI},
    year = {2015}
    }
    D. Tuia, R. Flamary, A. Rakotomamonjy, N. Courty, Multitemporal classification without new labels: a solution with optimal transport, International Workshop on the Analysis of Multitemporal Remote Sensing Images (Multitemp), 2015.
    Abstract: We propose to adapt distributions between couples of remote sensing images with regularized optimal transport: we apply two forms of regularization, namely an entropy-based regularization and a class-based regularization, to a series of classification problems involving very high resolution images acquired by the WorldView2 satellite. We study the effect of the two regularizers on the quality of the transport.
    BibTeX:
    @inproceedings{tuia2015multitemporal,
    author = {Tuia, D. and Flamary, R. and Rakotomamonjy, A. and  Courty, N.},
    title = {Multitemporal classification without new labels: a solution with optimal transport},
    booktitle = {International Workshop on the Analysis of Multitemporal Remote Sensing Images (Multitemp)},
    year = {2015}
    }
    A. Rakotomamonjy, R. Flamary, N. Courty, Generalized conditional gradient: analysis of convergence and applications, 2015.
    Abstract: The objective of this technical report is to provide additional results on the generalized conditional gradient methods introduced by Bredies et al. [BLM05]. Indeed, when the objective function is smooth, we provide a novel certificate of optimality and we show that the algorithm has a linear convergence rate. Applications of this algorithm are also discussed.
    BibTeX:
    @techreport{rakotomamonjy2015generalized,
    author = {Rakotomamonjy, Alain and Flamary, Rémi and Courty, Nicolas},
    title = {Generalized conditional gradient: analysis of convergence and applications},
    year = {2015}
    }
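    A compact numpy sketch of the scheme analyzed in the report: only the smooth part f is linearized, while the non-smooth term g enters the direction-finding step in full. The quadratic f, the box constraint and the 2/(t+2) step size are illustrative assumptions.

        import numpy as np

        # Generalized conditional gradient for min f(w) + g(w) over |w_i| <= R,
        # with f(w) = 0.5*||Xw - y||^2 (smooth) and g(w) = lam*||w||_1.
        rng = np.random.default_rng(0)
        n, p, lam, R = 100, 50, 0.1, 10.0
        X = rng.standard_normal((n, p)); y = rng.standard_normal(n)
        w = np.zeros(p)
        for t in range(200):
            grad = X.T @ (X @ w - y)
            # Direction finding: argmin_s <grad, s> + g(s) over the box, in
            # closed form since the problem is separable per coordinate.
            s = np.where(np.abs(grad) > lam, -R * np.sign(grad), 0.0)
            gamma = 2.0 / (t + 2.0)        # simple step; a line search also works
            w = (1.0 - gamma) * w + gamma * s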
    D. Tuia, R. Flamary, M. Barlaud, To be or not to be convex? A study on regularization in hyperspectral image classification, International Geoscience and Remote Sensing Symposium (IGARSS), 2015.
    Abstract: Hyperspectral image classification has long been dominated by convex models, which provide accurate decision functions exploiting all the features in the input space. However, the need for high geometrical details, which are often satisfied by using spatial filters, and the need for compact models (i.e., relying on models issued from reduced input spaces) have pushed research to study alternatives such as sparsity-inducing regularization, which promotes models using only a subset of the input features. Although successful in reducing the number of active inputs, these models can be biased and sometimes offer sparsity at the cost of reduced accuracy. In this paper, we study the possibility of using non-convex regularization, which limits the bias induced by the regularization. We present and compare four regularizers, and then apply them to hyperspectral classification with different cost functions.
    BibTeX:
    @inproceedings{tuia2015tobe,
    author = {Tuia, D. and Flamary, R. and Barlaud, M.},
    title = {To be or not to be convex? A study on regularization in   hyperspectral image classification},
    booktitle = {International Geoscience and Remote Sensing Symposium (IGARSS)},
    year = {2015}
    }
    D. Tuia, R. Flamary, N. Courty, Multiclass feature learning for hyperspectral image classification: sparse and hierarchical solutions, ISPRS Journal of Photogrammetry and Remote Sensing, 2015.
    Abstract: In this paper, we tackle the question of discovering an effective set of spatial filters to solve hyperspectral classification problems. Instead of fixing a priori the filters and their parameters using expert knowledge, we let the model find them within random draws in the (possibly infinite) space of possible filters. We define an active set feature learner that includes in the model only features that improve the classifier. To this end, we consider a fast and linear classifier, multiclass logistic classification, and show that with a good representation (the discovered filters), such a simple classifier can reach at least state-of-the-art performance. We apply the proposed active set learner to four hyperspectral image classification problems, including agricultural and urban classification at different resolutions, as well as multimodal data. We also propose a hierarchical setting, which allows generating more complex banks of features that can better describe the nonlinearities present in the data.
    BibTeX:
    @article{tuia2015multiclass,
    author = {Tuia, D. and Flamary, R. and  Courty, N.},
    title = {Multiclass feature learning for hyperspectral image classification: sparse and hierarchical solutions},
    journal = {ISPRS Journal of Photogrammetry and Remote Sensing},
    year = {2015}
    }
    R. Flamary, M. Fauvel, M. Dalla Mura, S. Valero, Analysis of multi-temporal classification techniques for forecasting image time series, Geoscience and Remote Sensing Letters (GRSL), Vol. 12, N. 5, pp 953-957, 2015.
    Abstract: The classification of an annual time series by using data from past years is investigated in this paper. Several classification schemes based on data fusion, sparse learning and semi-supervised learning are proposed to address the problem. Numerical experiments are performed on a MODIS image time series and show that while several approaches have statistically equivalent performances, SVM with l1 regularization leads to a better interpretation of the results due to its inherent sparsity in the temporal domain.
    BibTeX:
    @article{flamary2014analysis,
    author = { Flamary, R. and Fauvel, M. and Dalla Mura, M. and Valero, S.},
    title = {Analysis of multi-temporal classification techniques for forecasting image time series},
    journal = { Geoscience and Remote Sensing Letters (GRSL)},
    volume = {12},
    number = {5},
    pages = {953-957},
    year = {2015}
    }

    2014

    R. Flamary, N. Courty, D. Tuia, A. Rakotomamonjy, Optimal transport with Laplacian regularization: Applications to domain adaptation and shape matching, NIPS Workshop on Optimal Transport and Machine Learning OTML, 2014.
    Abstract: We propose a method for optimal transport between empirical distributions with Laplacian regularization (LOT). Laplacian regularization is a graph-based regularization that can encode neighborhood similarity between samples, either on the final position of the transported samples or on their displacement as in the work of Ferradans et al. In both cases, LOT is expressed as a quadratic programming problem and can be solved with a Frank-Wolfe algorithm with optimal step size. Results on domain adaptation and shape matching problems show the interest of using this regularization in optimal transport.
    BibTeX:
    @conference{flamary2014optlaplace,
    author = {Flamary, R. and Courty, N. and Tuia, D. and Rakotomamonjy, A.},
    title = {Optimal transport with Laplacian regularization: Applications to domain adaptation and shape matching},
    howpublished = { NIPS Workshop on Optimal Transport and Machine Learning OTML},
    year = {2014}
    }
    R. Flamary, A. Rakotomamonjy, G. Gasso, Learning Constrained Task Similarities in Graph-Regularized Multi-Task Learning, Regularization, Optimization, Kernels, and Support Vector Machines, 2014.
    Abstract: This chapter addresses the problem of learning constrained task relatedness in a graph-regularized multi-task learning framework. In such a context, the weighted adjacency matrix of a graph encodes the knowledge on task similarities and each entry of this matrix can be interpreted as a hyperparameter of the learning problem. This task relation matrix is learned via a bilevel optimization procedure where the outer level optimizes a proxy of the generalization errors over all tasks with respect to the similarity matrix and the inner level estimates the parameters of the tasks knowing this similarity matrix. Constraints on task similarities are also taken into account in this optimization framework, and they allow the task similarity matrix to be more interpretable, for instance by imposing a sparse similarity matrix. Since the global problem is non-convex, we propose a non-convex proximal algorithm that provably converges to a stationary point of the problem. Empirical evidence illustrates that the approach is competitive compared to existing methods that also learn task relations and that it exhibits enhanced interpretability of the learned task similarity matrix.
    BibTeX:
    @incollection{flamary2014learning,
    author = {  Flamary, R. and  Rakotomamonjy, A. and Gasso, G.},
    title = {Learning Constrained Task Similarities in Graph-Regularized Multi-Task Learning},
    booktitle = { Regularization, Optimization, Kernels, and Support Vector Machines},
    editor = {Suykens, J. A. K. and Signoretto, M. and Argyriou, A.},
    year = {2014}
    }
    R. Flamary, C. Aime, Optimization of starshades: focal plane versus pupil plane, Astronomy and Astrophysics, Vol. 569, N. A28, pp 10, 2014.
    Abstract: We search for the best possible transmission for an external occulter coronagraph that is dedicated to the direct observation of terrestrial exoplanets. We show that better observation conditions are obtained when the flux in the focal plane is minimized in the zone in which the exoplanet is observed, instead of the total flux received by the telescope. We describe the transmission of the occulter as a sum of basis functions. For each element of the basis, we numerically computed the Fresnel diffraction at the aperture of the telescope and the complex amplitude at its focus. The basis functions are circular disks that are linearly apodized over a few centimeters (truncated cones). We complemented the numerical calculation of the Fresnel diffraction for these functions by a comparison with pure circular discs (cylinder) for which an analytical expression, based on a decomposition in Lommel series, is available. The technique of deriving the optimal transmission for a given spectral bandwidth is a classical regularized quadratic minimization of intensities, but linear optimizations can be used as well. Minimizing the integrated intensity on the aperture of the telescope or for selected regions of the focal plane leads to slightly different transmissions for the occulter. For the focal plane optimization, the resulting residual intensity is concentrated behind the geometrical image of the occulter, in a blind region for the observation of an exoplanet, and the level of background residual starlight becomes very low outside this image. Finally, we provide a tolerance analysis for the alignment of the occulter to the telescope which also favors the focal plane optimization. This means that telescope offsets of a few decimeters do not strongly reduce the efficiency of the occulter.
    BibTeX:
    @article{flamary2014starshade,
    author = { Flamary, Remi and Aime, Claude},
    title = {Optimization of starshades: focal plane versus pupil plane},
    journal = { Astronomy and Astrophysics},
    volume = {569},
    number = {A28},
    pages = { 10},
    year = {2014}
    }
    A. Boisbunon, R. Flamary, A. Rakotomamonjy, A. Giros, J. Zerubia, Large scale sparse optimization for object detection in high resolution images, IEEE Workshop in Machine Learning for Signal Processing (MLSP), 2014.
    Abstract: In this work, we address the problem of detecting objects in images by expressing the image as convolutions between activation matrices and dictionary atoms. The activation matrices are estimated through sparse optimization and correspond to the positions of the objects. In particular, we propose an efficient algorithm based on an active set strategy that is easily scalable and can be computed in parallel. We apply it to a toy image and to a satellite image where the aim is to detect all the boats in a harbor. The results show the benefit of using non-convex penalties, such as the log-sum penalty, over the convex l1 penalty.
    BibTeX:
    @inproceedings{boisbunon2014largescale,
    author = {Boisbunon, A. and Flamary, R. and Rakotomamonjy, A. and Giros, A. and Zerubia, J.},
    title = {Large scale sparse optimization for object detection in high resolution images},
    booktitle = {IEEE Workshop in Machine Learning for Signal Processing (MLSP)},
    year = {2014}
    }
    E. Niaf, R. Flamary, A. Rakotomamonjy, O. Rouvière, C. Lartizien, SVM with feature selection and smooth prediction in images: application to CAD of prostate cancer, IEEE International Conference on Image Processing (ICIP), 2014.
    Abstract: We propose a new computer-aided detection scheme for prostate cancer screening on multiparametric magnetic resonance (mp-MR) images. Based on an annotated training database of mp-MR images from thirty patients, we train a novel support vector machine (SVM)-inspired classifier which simultaneously learns an optimal linear discriminant and a subset of predictor variables (or features) that are most relevant to the classification task, while promoting spatial smoothness of the malignancy prediction maps. The approach uses a $\ell_1$-norm in the regularization term of the optimization problem that rewards sparsity. Spatial smoothness is promoted via an additional cost term that encodes the spatial neighborhood of the voxels, to avoid noisy prediction maps. Experimental comparisons of the proposed $\ell_1$-Smooth SVM scheme to the regular $\ell_2$-SVM scheme demonstrate a clear visual and numerical gain on our clinical dataset.
    BibTeX:
    @inproceedings{niaf2014svmsmooth,
    author = {Niaf, E. and Flamary, R. and Rakotomamonjy, A. and Rouvière, O. and Lartizien, C.},
    title = {SVM with feature selection and smooth prediction in images: application to CAD of prostate cancer},
    booktitle = {IEEE International Conference on Image Processing (ICIP)},
    year = {2014}
    }
    D. Tuia, N. Courty, R. Flamary, A group-lasso active set strategy for multiclass hyperspectral image classification, Photogrammetric Computer Vision (PCV), 2014.
    Abstract: Hyperspectral images have a strong potential for landcover/landuse classification, since the spectra of the pixels can highlight subtle differences between materials and provide information beyond the visible spectrum. Yet, a limitation of most current approaches is the hypothesis of spatial independence between samples: images are spatially correlated and the classification map should exhibit spatial regularity. One way of integrating spatial smoothness is to augment the input spectral space with filtered versions of the bands. However, open questions remain, such as the selection of the bands to be filtered, or the filterbank to be used. In this paper, we consider the entirety of the possible spatial filters by using an incremental feature learning strategy that assesses whether a candidate feature would improve the model if added to the current input space. Our approach is based on a multiclass logistic classifier with group-lasso regularization. The optimization of this classifier yields an optimality condition, that can easily be used to assess the interest of a candidate feature without retraining the model, thus allowing drastic savings in computational time. We apply the proposed method to three challenging hyperspectral classification scenarios, including agricultural and urban data, and study both the ability of the incremental setting to learn features that always improve the model and the nature of the features selected.
    BibTeX:
    @inproceedings{tuia2014grouplasso,
    author = {Tuia, D. and Courty, N. and Flamary, R.},
    title = {A group-lasso active set strategy for multiclass hyperspectral image classification},
    booktitle = {Photogrammetric Computer Vision (PCV)},
    year = {2014}
    }
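    The computational shortcut comes from the group-lasso optimality condition: a still-inactive feature group can improve the objective only if the loss gradient restricted to that group has norm larger than the regularization strength. A minimal sketch, with hypothetical names and a multiclass-logistic residual as input:

        import numpy as np

        def candidate_is_useful(phi, R, lam):
            # phi: (n_samples,) candidate filtered band.
            # R: (n_samples, n_classes) residual of the multiclass logistic loss
            #    (predicted probabilities minus one-hot labels).
            g = phi @ R                         # gradient block of the new group
            return np.linalg.norm(g) > lam      # True -> worth adding and retraining

        rng = np.random.default_rng(0)
        R = 0.1 * rng.standard_normal((100, 4))
        phi = rng.standard_normal(100)
        print(candidate_is_useful(phi, R, lam=5.0))  # False -> skip, no retraining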
    J. Lehaire, R. Flamary, O. Rouvière, C. Lartizien, Computer-aided diagnostic for prostate cancer detection and characterization combining learned dictionaries and supervised classification, IEEE International Conference on Image Processing (ICIP), 2014.
    Abstract: This paper aims at presenting results of a computer-aided diagnostic (CAD) system for voxel-based detection and characterization of prostate cancer in the peripheral zone based on multiparametric magnetic resonance (mp-MR) imaging. We propose an original scheme combining a feature extraction step based on a sparse dictionary learning (DL) method and a supervised classification in order to discriminate normal (N) and normal but suspect (NS) tissues as well as different classes of cancer tissue whose aggressiveness is characterized by the Gleason score ranging from 6 (GL6) to 9 (GL9). We compare the classification performance of two supervised methods, the linear support vector machine (SVM) and the multinomial logistic regression (MLR) classifiers, in a binary classification task. Classification performances were evaluated over an mp-MR image database of 35 patients where each voxel was labeled, based on a ground truth, by an expert radiologist. Results show that the proposed method, in addition to being interpretable thanks to the sparse representation of the voxels, compares favorably (AUC>0.8) with recent state-of-the-art performances. Preliminary results on example patient data also indicate that the output cancer probability maps are correlated with the Gleason score.
    BibTeX:
    @inproceedings{lehaire2014dicolearn,
    author = {Lehaire, J. and Flamary, R. and Rouvière, O. and Lartizien, C.},
    title = {Computer-aided diagnostic for prostate cancer detection and characterization combining learned dictionaries and supervised classification},
    booktitle = {IEEE International Conference on Image Processing (ICIP)},
    year = {2014}
    }
    A. Ferrari, D. Mary, R. Flamary, C. Richard, Distributed image reconstruction for very large arrays in radio astronomy, IEEE Sensor Array and Multichannel Signal Processing Workshop (SAM), 2014.
    Abstract: Current and future radio interferometric arrays such as LOFAR and SKA are characterized by a paradox. Their large number of receptors (up to millions) theoretically allows unprecedented imaging resolution. At the same time, the ultra-massive amount of samples makes the data transfer and computational loads (correlation and calibration) orders of magnitude too high for any currently existing image reconstruction algorithm to achieve, or even approach, the theoretical resolution. We investigate here decentralized and distributed image reconstruction strategies which select, transfer and process only a fraction of the total data. The loss in MSE incurred by the proposed approach is evaluated theoretically and numerically on simple test cases.
    BibTeX:
    @inproceedings{ferrari2014distributed,
    author = {Ferrari, A. and Mary, D. and Flamary, R. and Richard, C.},
    title = {Distributed image reconstruction for very large arrays in radio astronomy},
    booktitle = {IEEE Sensor Array and Multichannel Signal Processing Workshop (SAM)},
    year = {2014}
    }
    N. Courty, R. Flamary, D. Tuia, Domain adaptation with regularized optimal transport, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), 2014.
    Abstract: We present a new and original method to solve the domain adaptation problem using optimal transport. By searching for the best transportation plan between the probability distribution functions of a source and a target domain, a non-linear and invertible transformation of the learning samples can be estimated. Any standard machine learning method can then be applied on the transformed set, which makes our method very generic. We propose a new optimal transport algorithm that incorporates label information in the optimization: this is achieved by combining an efficient matrix scaling technique with a majorization of a non-convex regularization term. By using the proposed optimal transport with label regularization, we obtain a significant increase in performance compared to the original transport solution. The proposed algorithm is computationally efficient and effective, as illustrated by its evaluation on a toy example and a challenging real-life vision dataset, on which it achieves competitive results with respect to state-of-the-art methods.
    BibTeX:
    @inproceedings{courty2014domain,
    author = {Courty, N. and Flamary, R. and Tuia, D.},
    title = {Domain adaptation with regularized optimal transport},
    booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD)},
    year = {2014}
    }
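    The label-regularized transport introduced here later shipped in the POT library; a minimal usage sketch on synthetic data follows (sizes, shift and regularization values are illustrative assumptions).

        import numpy as np
        import ot  # POT: Python Optimal Transport

        rng = np.random.default_rng(0)
        Xs = rng.standard_normal((100, 5)); ys = rng.integers(0, 3, 100)  # source
        Xt = rng.standard_normal((80, 5)) + 1.0                 # shifted target

        # Entropic OT plus the class-based regularization of this paper.
        mapper = ot.da.SinkhornLpl1Transport(reg_e=1.0, reg_cl=0.1)
        mapper.fit(Xs=Xs, ys=ys, Xt=Xt)
        Xs_adapted = mapper.transform(Xs=Xs)  # train any classifier on (Xs_adapted, ys)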
    A. Boisbunon, R. Flamary, A. Rakotomamonjy, Active set strategy for high-dimensional non-convex sparse optimization problems, International Conference on Acoustic, Speech and Signal Processing (ICASSP), 2014.
    Abstract: The use of non-convex sparse regularization has attracted much interest when estimating a very sparse model on high-dimensional data. In this work we express the optimality conditions of the optimization problem for a large class of non-convex regularizers. From those conditions, we derive an efficient active set strategy that avoids computing unnecessary gradients. Numerical experiments on both generated and real-life datasets show a clear gain in computational cost w.r.t. the state of the art when using our method to obtain very sparse solutions.
    BibTeX:
    @inproceedings{boisbunon2014active,
    author = {Boisbunon, A. and Flamary, R. and Rakotomamonjy, A.},
    title = {Active set strategy for high-dimensional non-convex sparse optimization problems},
    booktitle = {International Conference on Acoustic, Speech and Signal Processing (ICASSP)},
    year = {2014}
    }
    R. Flamary, N. Jrad, R. Phlypo, M. Congedo, A. Rakotomamonjy, Mixed-Norm Regularization for Brain Decoding, Computational and Mathematical Methods in Medicine, Vol. 2014, N. 1, pp 1-13, 2014.
    Abstract: This work investigates the use of mixed-norm regularization for sensor selection in event-related potential (ERP) based brain-computer interfaces (BCI). The classification problem is cast as a discriminative optimization framework where sensor selection is induced through the use of mixed norms. This framework is extended to the multitask learning situation where several similar classification tasks related to different subjects are learned simultaneously. In this case, multitask learning helps to mitigate the data scarcity issue, yielding more robust classifiers. For this purpose, we have introduced a regularizer that induces both sensor selection and classifier similarities. The different regularization approaches are compared on three ERP datasets, showing the interest of mixed-norm regularization in terms of sensor selection. The multitask approaches are evaluated when a small number of learning examples is available, yielding significant performance improvements, especially for subjects performing poorly.
    BibTeX:
    @article{flamary2014mixed,
    author = {Flamary, R. and Jrad, N. and Phlypo, R. and Congedo, M. and Rakotomamonjy, A.},
    title = {Mixed-Norm Regularization for Brain Decoding},
    journal = {Computational and Mathematical Methods in Medicine},
    volume = {2014},
    number = {1},
    pages = {1-13},
    year = {2014}
    }
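    The sensor-selection effect of the mixed norm comes from its proximal operator, which zeroes whole sensor groups at once; a small sketch follows, where grouping the weight matrix with one row per sensor is an illustrative convention.

        import numpy as np

        def prox_l1l2(W, t):
            # Proximal operator of t * sum_g ||W_g||_2 (l1-l2 mixed norm), one
            # group per row: rows whose norm is below t are set exactly to zero,
            # i.e. the corresponding sensors are discarded.
            norms = np.linalg.norm(W, axis=1, keepdims=True)
            return W * np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0)

        rng = np.random.default_rng(0)
        scales = np.array([[2.0]] * 3 + [[0.1]] * 5)      # 3 strong, 5 weak sensors
        W = scales * rng.standard_normal((8, 20))
        print(np.linalg.norm(prox_l1l2(W, 1.0), axis=1))  # weak sensors -> exactly 0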
    E. Niaf, R. Flamary, O. Rouvière, C. Lartizien, S. Canu, Kernel-Based Learning From Both Qualitative and Quantitative Labels: Application to Prostate Cancer Diagnosis Based on Multiparametric MR Imaging, Image Processing, IEEE Transactions on, Vol. 23, N. 3, pp 979-991, 2014.
    Abstract: Building an accurate training database is challenging in supervised classification. For instance, in medical imaging, radiologists often delineate malignant and benign tissues without access to the histological ground truth, leading to uncertain data sets. This paper addresses the pattern classification problem arising when available target data include some uncertainty information. Target data considered here are either qualitative (a class label) or quantitative (an estimation of the posterior probability). In this context, usual discriminative methods, such as the support vector machine (SVM), fail either to learn a robust classifier or to predict accurate probability estimates. We generalize the regular SVM by introducing a new formulation of the learning problem to take into account class labels as well as class probability estimates. This original reformulation into a probabilistic SVM (P-SVM) can be efficiently solved by adapting existing flexible SVM solvers. Furthermore, this framework allows deriving a unique learned prediction function for both decision and posterior probability estimation, providing qualitative and quantitative predictions. The method is first tested on synthetic data sets to evaluate its properties as compared with the classical SVM and fuzzy-SVM. It is then evaluated on a clinical data set of multiparametric prostate magnetic resonance images to assess its performance in discriminating benign from malignant tissues. P-SVM is shown to outperform classical SVM as well as the fuzzy-SVM in terms of probability predictions and classification performance, and demonstrates its potential for the design of an efficient computer-aided decision system for prostate cancer diagnosis based on multiparametric magnetic resonance (MR) imaging.
    BibTeX:
    @article{niaf2014kernel,
    author = {Niaf, E. and Flamary, R. and Rouvière, O. and Lartizien, C. and  Canu, S.},
    title = {Kernel-Based Learning From Both Qualitative and Quantitative Labels: Application to Prostate Cancer Diagnosis Based on Multiparametric MR Imaging},
    journal = {Image Processing, IEEE Transactions on},
    volume = {23},
    number = {3},
    pages = {979-991},
    year = {2014}
    }
    D. Tuia, M. Volpi, M. Dalla Mura, A. Rakotomamonjy, R. Flamary, Automatic Feature Learning for Spatio-Spectral Image Classification With Sparse SVM, Geoscience and Remote Sensing, IEEE Transactions on, Vol. 52, N. 10, pp 6062-6074, 2014.
    Abstract: Including spatial information is a key step for successful remote sensing image classification. In particular, when dealing with high spatial resolution, classification performance is boosted if local variability is strongly reduced by spatial filtering. In this paper, we consider the triple objective of designing a spatial/spectral classifier which is compact (uses as few features as possible), discriminative (enhances class separation), and robust (works well in small sample situations). We achieve this triple objective by discovering the relevant features in the (possibly infinite) space of spatial filters by optimizing a margin-maximization criterion. Instead of imposing a filter bank with predefined filter types and parameters, we let the model figure out which set of filters is optimal for class separation. To do so, we randomly generate spatial filter banks and use an active-set criterion to rank the candidate features according to their benefit to margin maximization (and, thus, to generalization) if added to the model. Experiments on multispectral very high spatial resolution (VHR) and hyperspectral VHR data show that the proposed algorithm, which is sparse and linear, finds discriminative features and achieves at least the same performance as models using a large filter bank defined in advance by prior knowledge.
    BibTeX:
    @article{tuia2014automatic,
    author = {Tuia, D. and Volpi, M. and Dalla Mura, M. and Rakotomamonjy, A. and Flamary, R.},
    title = {Automatic Feature Learning for Spatio-Spectral Image Classification With Sparse SVM},
    journal = {Geoscience and Remote Sensing, IEEE Transactions on},
    volume = {52},
    number = {10},
    pages = {6062-6074},
    year = {2014}
    }
    L. Laporte, R. Flamary, S. Canu, S. Déjean, J. Mothe, Nonconvex Regularizations for Feature Selection in Ranking With Sparse SVM, Neural Networks and Learning Systems, IEEE Transactions on, Vol. 25, N. 6, pp 1118-1130, 2014.
    Abstract: Feature selection in learning to rank has recently emerged as a crucial issue. Whereas several preprocessing approaches have been proposed, only a few works have focused on integrating feature selection into the learning process. In this work, we propose a general framework for feature selection in learning to rank using SVM with a sparse regularization term. We investigate both classical convex regularizations, such as l1 or weighted l1, and non-convex regularization terms, such as the log penalty, the Minimax Concave Penalty (MCP) or the lp pseudo-norm with p lower than 1. Two algorithms are proposed: first, an accelerated proximal approach for solving the convex problems; second, a reweighted l1 scheme to address the non-convex regularizations. We conduct intensive experiments on nine datasets from the Letor 3.0 and Letor 4.0 corpora. Numerical results show that the non-convex regularizations we propose lead to sparser models while prediction performance is preserved. The number of features is decreased by up to a factor of six compared to the l1 regularization. In addition, the software is publicly available on the web.
    BibTeX:
    @article{tnnls2014,
    author = { Laporte, L. and Flamary, R. and Canu, S. and Déjean, S. and Mothe, J.},
    title = {Nonconvex Regularizations for Feature Selection in Ranking With Sparse SVM},
    journal = { Neural Networks and Learning Systems, IEEE Transactions on},
    volume = {25},
    number = {6},
    pages = {1118-1130},
    year = {2014}
    }

    2013

    W. Gao, J. Chen, C. Richard, J. Huang, R. Flamary, Kernel LMS algorithm with Forward-Backward splitting for dictionary learning, International Conference on Acoustic, Speech and Signal Processing (ICASSP), 2013.
    Abstract: Nonlinear adaptive filtering with kernels has become a topic of high interest over the last decade. A characteristic of kernel-based techniques is that they deal with kernel expansions whose number of terms equals the number of input data, making them unsuitable for online applications. Kernel-based adaptive filtering algorithms generally rely on a two-stage process at each iteration: a model order control stage that limits the increase in the number of terms by including only valuable kernels into the so-called dictionary, and a filter parameter update stage. It is surprising to note that most existing strategies for dictionary update can only incorporate new elements into the dictionary. This unfortunately means that they cannot discard obsolete kernel functions, in particular within the context of a time-varying environment. Recently, to remedy this drawback, it has been proposed to associate an l1-norm regularization criterion with the mean-square error criterion. The aim of this paper is to provide theoretical results on the convergence of this approach.
    BibTeX:
    @inproceedings{gao2013kernel,
    author = {Gao, W. and Chen, J. and Richard, C. and Huang, J. and Flamary, R.},
    title = {Kernel LMS algorithm with Forward-Backward splitting for dictionary learning},
    booktitle = {International Conference on Acoustic, Speech and Signal Processing (ICASSP)},
    year = {2013}
    }
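    A toy sketch of the mechanism whose convergence the paper studies: a kernel LMS update on the expansion coefficients followed by an l1 proximal (soft-thresholding) step, so that obsolete dictionary atoms shrink to zero and are pruned. The kernel width, step size and threshold are illustrative assumptions.

        import numpy as np

        def gk(x, D, sigma=0.3):
            # Gaussian kernel between a sample x and every dictionary atom in D.
            return np.exp(-np.sum((D - x) ** 2, axis=1) / (2.0 * sigma ** 2))

        rng = np.random.default_rng(0)
        D, alpha = np.empty((0, 1)), np.empty(0)   # dictionary and coefficients
        mu, lam = 0.2, 1e-3
        for t in range(500):                       # online regression of sin(3x)
            x = rng.uniform(-1.0, 1.0, size=(1,))
            d = np.sin(3.0 * x[0]) + 0.05 * rng.standard_normal()
            k = gk(x, D)
            e = d - k @ alpha                      # prediction error
            alpha = alpha + mu * e * k             # LMS step on coefficients
            D = np.vstack([D, x])                  # add the new atom...
            alpha = np.append(alpha, mu * e)
            alpha = np.sign(alpha) * np.maximum(np.abs(alpha) - lam, 0.0)  # l1 prox
            keep = np.abs(alpha) > 0
            D, alpha = D[keep], alpha[keep]        # ...and prune obsolete atoms
        print(len(D))                              # final dictionary size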
    R. Flamary, A. Rakotomamonjy, Support Vector Machine with spatial regularization for pixel classification, International Workshop on Advances in Regularization, Optimization, Kernel Methods and Support Vector Machines : theory and applications (ROKS), 2013.
    Abstract: We propose in this work to regularize the output of an SVM classifier on pixels in order to promote smoothness in the predicted image. The learning problem can be cast as a semi-supervised SVM with a particular structure encoding pixel neighborhood in the regularization graph. We provide several optimization schemes to solve the problem for linear SVM with l2 or l1 regularization and show the interest of the approach on an image classification example with very few labeled pixels.
    BibTeX:
    @inproceedings{ROKS2013,
    author = {  Flamary, R. and  Rakotomamonjy, A.},
    title = {Support Vector Machine with spatial regularization for pixel classification},
    booktitle = { International Workshop on Advances in Regularization, Optimization, Kernel Methods and Support Vector Machines : theory and applications (ROKS)},
    year = {2013}
    }
    D. Tuia, M. Volpi, M. Dalla Mura, A. Rakotomamonjy, R. Flamary, Create the relevant spatial filterbank in the hyperspectral jungle, IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2013.
    Abstract: The inclusion of spatial information is known to be beneficial to the classification of hyperspectral images. However, given the high dimensionality of the data, it is difficult to know beforehand which bands to filter or which filters to apply. In this paper, we propose an active set algorithm based on an $l_1$ support vector machine that explores the (possibly infinite) space of spatial filters and automatically retrieves the filters that maximize class separation. Experiments on hyperspectral imagery confirm the power of the method, which reaches state-of-the-art performance with small feature sets generated automatically and without prior knowledge.
    BibTeX:
    @inproceedings{IGARSS2013,
    author = {  Tuia, D. and  Volpi, M. and  Dalla Mura, M. and  Rakotomamonjy, A. and  Flamary, R.},
    title = {Create the relevant spatial filterbank in the hyperspectral jungle},
    booktitle = { IEEE International Geoscience and Remote Sensing Symposium (IGARSS)},
    year = {2013}
    }
    A. Rakotomamonjy, R. Flamary, F. Yger, Learning with infinitely many features, Machine Learning, Vol. 91, N. 1, pp 43-66, 2013.
    Abstract: We propose a principled framework for learning with infinitely many features, a situation usually induced by continuously parametrized feature extraction methods. Such cases occur for instance when considering Gabor-based features in computer vision problems or when dealing with Fourier features for kernel approximations. We cast the problem as that of finding a finite subset of features that minimizes a regularized empirical risk. After having analyzed the optimality conditions of such a problem, we propose a simple algorithm which has the flavour of a column-generation technique. We also show that, using Fourier-based features, it is possible to perform approximate infinite kernel learning. Our experimental results on several datasets show the benefits of the proposed approach in several situations, including texture classification and large-scale kernelized problems (involving about 100 thousand examples).
    BibTeX:
    @article{ml2012,
    author = { Rakotomamonjy, A. and Flamary, R. and Yger, F.},
    title = {Learning with infinitely many features},
    journal = { Machine Learning},
    volume = {91},
    number = {1},
    pages = {43-66},
    year = {2013}
    }

    2012

    D. Tuia, R. Flamary, M. Volpi, M. Dalla Mura, A. Rakotomamonjy, Discovering relevant spatial filterbanks for VHR image classification, International Conference on Pattern Recognition (ICPR), 2012.
    Abstract: In very high resolution (VHR) image classification it is common to use spatial filters to enhance the discrimination among landuses related to similar spectral properties but different spatial characteristics. However, the filter types that can be used are numerous (e.g. textural, morphological, Gabor, wavelets, etc.) and the user must pre-select a family of features, as well as their specific parameters. This results in feature spaces that are high-dimensional and redundant, thus requiring long and suboptimal feature selection phases. In this paper, we propose to discover the relevant filters as well as their parameters with a sparsity-promoting regularization and an active set algorithm that iteratively adds to the model the most promising features. This way, we explore the filters/parameters input space efficiently (which is infinitely large for continuous parameters) and construct the optimal filterbank for classification without any other information than the types of filters to be used.
    BibTeX:
    @inproceedings{ICPR2012,
    author = {  Tuia, D. and  Flamary, R. and  Volpi, M. and  Dalla Mura, M. and  Rakotomamonjy, A.},
    title = { Discovering relevant spatial filterbanks for VHR image classification},
    booktitle = { International Conference on Pattern Recognition (ICPR)},
    year = {2012}
    }
    R. Flamary, A. Rakotomamonjy, Decoding finger movements from ECoG signals using switching linear models, Frontiers in Neuroscience, Vol. 6, N. 29, 2012.
    Abstract: One of the most interesting challenges in ECoG-based Brain-Machine Interfaces is movement prediction. Being able to perform such a prediction paves the way to high-precision commands for a machine such as a robotic arm or robotic hands. As a testament to the BCI community's increasing interest in this problem, the fourth BCI Competition provided a dataset whose aim is to predict individual finger movements from ECoG signals. The difficulty of the problem lies in the fact that there is no simple relation between ECoG signals and finger movements. We propose in this paper to estimate and decode these finger flexions using switching models controlled by a hidden state. Switching models can integrate prior knowledge about the decoding problem and help in predicting fine and precise movements. Our model is thus based on a first block which estimates which finger is moving and another block which, knowing which finger is moving, predicts the movements of all other fingers. Numerical results submitted to the Competition show that the model yields high decoding performance when the hidden state is well estimated. This approach achieved second place in the BCI competition with a correlation of 0.42 between real and predicted movements.
    BibTeX:
    @article{frontiers2012,
    author = { Flamary, R. and  Rakotomamonjy, A.},
    title = {Decoding finger movements from ECoG signals using switching linear models},
    journal = { Frontiers in Neuroscience},
    volume = { 6},
    number = { 29},
    year = {2012}
    }
    R. Flamary, D. Tuia, B. Labbé, G. Camps-Valls, A. Rakotomamonjy, Large Margin Filtering, IEEE Transactions Signal Processing, Vol. 60, N. 2, pp 648-659, 2012.
    Abstract: Many signal processing problems are tackled by filtering the signal for subsequent feature classification or regression. Both steps are critical and need to be designed carefully to deal with the particular statistical characteristics of both signal and noise. Optimal design of the filter and the classifier is typically addressed separately, thus leading to suboptimal classification schemes. This paper proposes an efficient methodology to jointly learn an optimal signal filter and a support vector machine (SVM) classifier. In particular, we derive algorithms to solve the optimization problem, prove its theoretical convergence, and discuss different filter regularizers for automated scaling and selection of the feature channels. The latter gives rise to different formulations with the appealing properties of sparseness and noise-robustness. We illustrate the performance of the method on several problems. First, linear and nonlinear toy classification examples, under the presence of both Gaussian and convolutional noise, show the robustness of the proposed methods. The approach is then evaluated on two challenging real-life datasets: BCI time series classification and multispectral image segmentation. In all the examples, large margin filtering shows competitive classification performance while offering the advantage of interpretability of the retrieved filtered channels.
    BibTeX:
    @article{ieeesp2012,
    author = { Flamary, R. and Tuia, D. and Labbé, B. and Camps-Valls, G. and Rakotomamonjy, A.},
    title = {Large Margin Filtering},
    journal = { IEEE Transactions Signal Processing},
    volume = {60},
    number = {2},
    pages = {648-659},
    year = {2012}
    }
    E. Niaf, R. Flamary, S. Canu, O. Rouvière, C. Lartizien, Handling learning samples uncertainties in SVM : application to MRI-based prostate cancer Computer-Aided Diagnosis, IEEE International Symposium on Biomedical Imaging , 2012.
    Abstract: Building an accurate training database is challenging in supervised classification. Radiologists often delineate malignant and benign tissues without access to the ground truth, thus leading to uncertain datasets. We propose to deal with this uncertainty by introducing probabilistic labels in the learning stage. We introduce a probabilistic support vector machine (P-SVM) inspired from the regular C-SVM formulation, allowing class labels to be taken into account through a hinge loss and probability estimates through an epsilon-insensitive cost function, together with a minimum-norm (maximum-margin) objective. The solution is used for both decision and posterior probability estimation.
    BibTeX:
    @inproceedings{isbi2012,
    author = { Niaf, E. and Flamary, R. and Canu, S. and Rouvière, O. and Lartizien, C.},
    title = {Handling learning samples uncertainties in SVM : application to MRI-based prostate cancer Computer-Aided Diagnosis},
    booktitle = { IEEE International Symposium on Biomedical Imaging },
    year = {2012}
    }

    2011

    A. Rakotomamonjy, R. Flamary, G. Gasso, S. Canu, lp-lq penalty for sparse linear and sparse multiple kernel multi-task learning, IEEE Transactions on Neural Networks, Vol. 22, N. 8, pp 1307-1320, 2011.
    Abstract: Recently, there has been a lot of interest in the multi-task learning (MTL) problem with the constraint that tasks should share a common sparsity profile. Such a problem can be addressed through a regularization framework where the regularizer induces a joint-sparsity pattern between task decision functions. We follow this principled framework and focus on $\ell_p-\ell_q$ (with $0 \leq p \leq 1$ and $1 \leq q \leq 2$) mixed norms as sparsity-inducing penalties. Our motivation for addressing such a large class of penalties is to adapt the penalty to the problem at hand, thus leading to better performance and a better sparsity pattern. For solving the problem in the general multiple kernel case, we first derive a variational formulation of the $\ell_1-\ell_q$ penalty which helps us in proposing an alternate optimization algorithm. Although very simple, the latter algorithm provably converges to the global minimum of the $\ell_1-\ell_q$ penalized problem. For the linear case, we extend existing works considering accelerated proximal gradient to this penalty. Our contribution in this context is to provide an efficient scheme for computing the $\ell_1-\ell_q$ proximal operator. Then, for the more general case when $0 < p < 1$, we solve the resulting non-convex problem through a majorization-minimization approach. The resulting algorithm is an iterative scheme which, at each iteration, solves a weighted $\ell_1-\ell_q$ sparse MTL problem. Empirical evidence from toy and real-world datasets dealing with BCI single-trial EEG classification and protein subcellular localization shows the benefit of the proposed approaches and algorithms.
    BibTeX:
    @article{tnn2011,
    author = { Rakotomamonjy, A. and Flamary, R. and Gasso, G. and Canu, S.},
    title = {lp-lq penalty for sparse linear and sparse multiple kernel multi-task learning},
    journal = { IEEE Transactions on Neural Networks},
    volume = {22},
    number = {8},
    pages = {1307-1320},
    year = {2011}
    }
    R. Flamary, Apprentissage statistique pour le signal: applications aux interfaces cerveau-machine, Laboratoire LITIS, Université de Rouen, 2011.
    Abstract: Brain Computer Interfaces (BCI) require the use of statistical learning methods for signal recognition. In this thesis we propose a general approach that uses prior knowledge on the problem at hand through regularization. To this end, we learn jointly the classifier and the feature extraction step in a unique optimization problem. We focus on the problem of sensor selection and propose several regularization terms adapted to the problem. Our first contribution is a filter learning method called large margin filtering. It consists in learning a filter that maximizes the margin between samples of each class so as to adapt to the properties of the features. In addition, this approach is easy to interpret and can lead to the selection of the most relevant sensors. Numerical experiments on a real-life BCI problem and a 2D image classification problem show the good behaviour of our method both in terms of performance and interpretability. The second contribution is a general sparse multitask learning approach. Several classifiers are learned jointly, and discriminant kernels for all the tasks are automatically selected. We propose efficient algorithms, and numerical experiments have shown the interest of our approach. Finally, the third contribution is a direct application of sparse multitask learning to a BCI event-related potential classification problem. We propose an adapted regularization term that promotes both sensor selection and similarity between the classifiers. Numerical experiments show that the calibration time of a BCI can be drastically reduced thanks to the proposed multitask approach.
    BibTeX:
    @phdthesis{thesis2011,
    author = { Flamary, R.},
    title = {Apprentissage statistique pour le signal: applications aux interfaces cerveau-machine},
    school = { Laboratoire LITIS, Université de Rouen},
    year = {2011}
    }
    N. Jrad, M. Congedo, R. Phlypo, S. Rousseau, R. Flamary, F. Yger, A. Rakotomamonjy, sw-SVM: sensor weighting support vector machines for EEG-based brain–computer interfaces, Journal of Neural Engineering, Vol. 8, N. 5, pp 056004, 2011.
    Abstract: In many machine learning applications, like brain–computer interfaces (BCI), high-dimensional sensor array data are available. Sensor measurements are often highly correlated and the signal-to-noise ratio is not homogeneously spread across sensors. Thus, collected data are highly variable and discrimination tasks are challenging. In this work, we focus on sensor weighting as an efficient tool to improve the classification procedure. We present an approach integrating sensor weighting in the classification framework. Sensor weights are considered as hyper-parameters to be learned by a support vector machine (SVM). The resulting sensor weighting SVM (sw-SVM) is designed to satisfy a margin criterion, that is, the generalization error. Experimental studies on two data sets are presented, a P300 data set and an error-related potential (ErrP) data set. For the P300 data set (BCI competition III), for which a large number of trials is available, the sw-SVM proves to perform equivalently to the ensemble SVM strategy that won the competition. For the ErrP data set, for which a small number of trials is available, the sw-SVM shows superior performance as compared to three state-of-the-art approaches. Results suggest that the sw-SVM promises to be useful in event-related potential classification, even with a small number of training trials.
    BibTeX:
    @article{jrad2011swsvm,
    author = {N. Jrad and M. Congedo and R. Phlypo and S. Rousseau and R. Flamary and F. Yger and A. Rakotomamonjy},
    title = {sw-SVM: sensor weighting support vector machines for EEG-based brain–computer interfaces},
    journal = {Journal of Neural Engineering},
    volume = {8},
    number = {5},
    pages = {056004},
    year = {2011}
    }
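    A rough stand-in for the sensor-weighting idea (not the paper's margin-driven algorithm): one non-negative weight per sensor rescales that sensor's features before a standard SVM, and the weights are tuned by cross-validated coordinate search. The synthetic "EEG" data and all names here are assumptions for illustration:
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    n_trials, n_sensors, n_feats = 200, 8, 5

    # Synthetic trials: only the first two sensors carry class information.
    y = rng.choice([-1, 1], size=n_trials)
    X = rng.standard_normal((n_trials, n_sensors, n_feats))
    X[:, 0, :] += 0.8 * y[:, None]
    X[:, 1, :] += 0.5 * y[:, None]

    def score(weights):
        # Rescale each sensor block by its weight, then flatten for the SVM.
        Xw = (X * weights[None, :, None]).reshape(n_trials, -1)
        return cross_val_score(SVC(kernel="linear", C=1.0), Xw, y, cv=5).mean()

    # Naive coordinate search over per-sensor weights (a simple proxy for the
    # paper's optimization of the weights as SVM hyper-parameters).
    weights = np.ones(n_sensors)
    for _ in range(3):
        for s in range(n_sensors):
            candidates = [0.0, 0.5, 1.0, 2.0]
            best = max(candidates, key=lambda c: score(
                np.where(np.arange(n_sensors) == s, c, weights)))
            weights[s] = best

    print("learned sensor weights:", weights)
    Sensors whose weight collapses to zero are effectively removed, which is how sensor weighting doubles as sensor selection.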
    R. Flamary, F. Yger, A. Rakotomamonjy, Selecting from an infinite set of features in SVM, European Symposium on Artificial Neural Networks, 2011.
    Abstract: Dealing with the continuous parameters of a feature extraction method has always been a difficult task, usually solved by cross-validation. In this paper, we propose an active set algorithm for automatically selecting these parameters in an SVM classification context. Our experiments on texture recognition and BCI signal classification show that optimizing the feature parameters in a continuous space while learning the decision function yields better performance than using fixed parameters obtained from a grid sampling.
    BibTeX:
    @inproceedings{ESANN2011,
    author = {Flamary, R. and Yger, F. and Rakotomamonjy, A.},
    title = {Selecting from an infinite set of features in SVM},
    booktitle = {European Symposium on Artificial Neural Networks},
    year = {2011}
    }
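    A hedged sketch of the greedy flavor of such a selection loop (this is a least-squares stand-in, not the paper's SVM active set algorithm; the Gaussian feature family and all names are assumptions): features are indexed by a continuous parameter theta, and at each step we add the theta whose feature best correlates with the current residual, then refit.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 300
    x = rng.uniform(-3, 3, n)
    y = np.sign(np.sin(2 * x) + 0.3 * rng.standard_normal(n))

    def feat(theta):
        # Continuously parameterized feature: a Gaussian bump centered at theta.
        return np.exp(-(x - theta) ** 2)

    selected, Phi = [], np.ones((n, 1))         # start with a bias column
    coef = np.linalg.lstsq(Phi, y, rcond=None)[0]

    for _ in range(5):
        residual = y - Phi @ coef
        # The "infinite" candidate set is approximated by a fine grid here.
        grid = np.linspace(-3, 3, 601)
        scores = [abs(feat(t) @ residual) for t in grid]
        theta = grid[int(np.argmax(scores))]
        selected.append(theta)
        Phi = np.column_stack([Phi, feat(theta)])
        coef = np.linalg.lstsq(Phi, y, rcond=None)[0]

    print("selected feature parameters:", np.round(selected, 2))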
    R. Flamary, X. Anguera, N. Oliver, Spoken WordCloud: Clustering Recurrent Patterns in Speech, International Workshop on Content-Based Multimedia Indexing, 2011.
    Abstract: The automatic summarization of speech recordings is typically carried out as a two-step process: the speech is first decoded using an automatic speech recognition system, and the resulting text transcripts are processed to create the summary. However, this approach may not be suitable under adverse acoustic conditions or for languages with limited training resources. In order to address these limitations, we propose in this paper an automatic speech summarization method based on the automatic discovery of patterns in the speech: recurrent acoustic patterns are first extracted from the audio and then clustered and ranked according to the number of repetitions in the recording. This approach allows us to build what we call a Spoken WordCloud because of its similarity with text-based word clouds. We present an algorithm that achieves a cluster purity of up to 90% and an inverse purity of 71% in preliminary experiments using a small dataset of connected spoken words.
    BibTeX:
    @inproceedings{CBMI2011,
    author = {Flamary, R. and Anguera, X. and Oliver, N.},
    title = {Spoken WordCloud: Clustering Recurrent Patterns in Speech},
    booktitle = {International Workshop on Content-Based Multimedia Indexing},
    year = {2011}
    }
    E. Niaf, R. Flamary, C. Lartizien, S. Canu, Handling uncertainties in SVM classification, IEEE Workshop on Statistical Signal Processing, 2011.
    Abstract: This paper addresses the pattern classification problem arising when available target data include some uncertainty information. The target data considered here are either qualitative (a class label) or quantitative (an estimate of the posterior probability). Our main contribution is an SVM-inspired formulation of this problem that takes class labels into account through a hinge loss and probability estimates through an epsilon-insensitive cost function, together with a minimum-norm (maximum-margin) objective. This formulation admits a dual form leading to a quadratic problem, and allows the use of a representer theorem and the associated kernel. The resulting solution can be used for both decision and posterior probability estimation. Empirically, our method outperforms the regular SVM in terms of both probability prediction and classification performance.
    BibTeX:
    @inproceedings{ssp2011,
    author = {Niaf, E. and Flamary, R. and Lartizien, C. and Canu, S.},
    title = {Handling uncertainties in SVM classification},
    booktitle = {IEEE Workshop on Statistical Signal Processing},
    year = {2011}
    }
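    The combined loss described in this abstract can be written schematically as follows; this is a sketch with assumed notation (the constants C_1 and C_2, the target scores t_j and the index sets are not the paper's symbols):
    \min_{f \in \mathcal{H},\, b}\ \frac{1}{2}\|f\|_{\mathcal{H}}^2
      + C_1 \sum_{i \in \mathcal{L}} \max\bigl(0,\, 1 - y_i\,(f(x_i) + b)\bigr)
      + C_2 \sum_{j \in \mathcal{P}} \max\bigl(0,\, \lvert f(x_j) + b - t_j \rvert - \varepsilon\bigr)
    Here \mathcal{L} indexes samples with class labels y_i and \mathcal{P} indexes samples whose posterior estimates are mapped to target scores t_j; by the representer theorem, f can be expanded over the kernel evaluated at the training points, so the same solution serves for decision and for probability estimation.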

    2010

    R. Flamary, B. Labbé, A. Rakotomamonjy, Large margin filtering for signal sequence labeling, International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2010.
    Abstract: Signal sequence labeling consists in predicting a sequence of labels given an observed sequence of samples. A naive way is to filter the signal in order to reduce the noise and then to apply a classification algorithm to the filtered samples. We propose in this paper to learn the filter jointly with the classifier, leading to large margin filtering for classification. This method learns the optimal cutoff frequency and phase of the filter, which may differ from zero. Two methods are proposed and tested on a toy dataset and on a real-life BCI dataset from BCI Competition III.
    BibTeX:
    @inproceedings{flamaryicassp2010,
    author = {Flamary, R. and Labbé, B. and Rakotomamonjy, A.},
    title = {Large margin filtering for signal sequence labeling},
    booktitle = {International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
    year = {2010}
    }
    R. Flamary, B. Labbé, A. Rakotomamonjy, Filtrage vaste marge pour l'étiquetage séquentiel de signaux, Conférence en Apprentissage (CAp), 2010.
    Abstract: This paper addresses the sequence labeling of signals, that is, the discrimination of temporal samples. In this context, we propose a method for learning a large-margin filtering that best separates the classes: an SVM on the samples and a temporal filtering of these samples are learned jointly. This method enables the online labeling of temporal samples; an optimal offline sequence decoding using the Viterbi algorithm is also proposed. We introduce different regularization terms that allow channels to be weighted or selected automatically with respect to the large-margin criterion. Finally, our approach is tested on a toy example of nonlinear signals as well as on real Brain-Computer Interface data. These experiments show the interest of the supervised learning of a temporal filtering for sequence labeling.
    BibTeX:
    @conference{flamcap2010,
    author = {Flamary, R. and Labbé, B. and Rakotomamonjy, A.},
    title = {Filtrage vaste marge pour l'étiquetage séquentiel de signaux},
    booktitle = {Conférence en Apprentissage (CAp)},
    year = {2010}
    }
    D. Tuia, G. Camps-Valls, R. Flamary, A. Rakotomamonjy, Learning spatial filters for multispectral image segmentation, IEEE Workshop on Machine Learning for Signal Processing (MLSP), 2010.
    Abstract: We present a novel filtering method for multispectral satellite image classification. The proposed method learns a set of spatial filters that maximize the class separability of a binary support vector machine (SVM) through a gradient descent approach. Regularization issues are discussed in detail, and a Frobenius-norm regularization is proposed to efficiently exclude uninformative filter coefficients. Experiments carried out on multiclass one-against-all classification and target detection show the capabilities of the learned spatial filters.
    BibTeX:
    @inproceedings{mlsp10,
    author = {Tuia, D. and Camps-Valls, G. and Flamary, R. and Rakotomamonjy, A.},
    title = {Learning spatial filters for multispectral image segmentation},
    booktitle = {IEEE Workshop on Machine Learning for Signal Processing (MLSP)},
    year = {2010}
    }

    2009

    R. Flamary, B. Labbé, A. Rakotomamonjy, Large margin filtering for signal segmentation, NIPS Workshop on Temporal Segmentation, 2009.
    Abstract:
    BibTeX:
    @conference{nipsworkshop2009,
    author = {Flamary, R. and Labbé, B. and Rakotomamonjy, A.},
    title = {Large margin filtering for signal segmentation},
    booktitle = {NIPS Workshop on Temporal Segmentation},
    year = {2009}
    }
    R. Flamary, A. Rakotomamonjy, G. Gasso, S. Canu, Selection de variables pour l'apprentissage simultanée de tâches, Conférence en Apprentissage (CAp'09), 2009.
    Abstract: This paper addresses variable selection for the simultaneous learning of SVM discrimination tasks. We formulate this problem as a multi-task learning problem whose regularization term is a mixed norm of type ℓp-ℓ2 with p ≤ 1. This norm yields discrimination models that use a common subset of variables across all tasks. We first propose an algorithm solving the learning problem when the mixed norm is convex (p = 1). Then, using DC programming, we address the non-convex case (p < 1) and show that it can be solved by an iterative algorithm in which each iteration solves a problem based on the ℓ1-ℓ2 mixed norm. Our experiments show the interest of the method on several simultaneous discrimination problems.
    BibTeX:
    @conference{cap09,
    author = {Flamary, R. and Rakotomamonjy, A. and Gasso, G. and Canu, S.},
    title = {Selection de variables pour l'apprentissage simultanée de tâches},
    booktitle = {Conférence en Apprentissage (CAp'09)},
    year = {2009}
    }
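    The joint-sparsity penalty at the heart of this formulation can be written schematically as follows (a sketch with assumed notation: W stacks the task-specific weight vectors w^t, d indexes variables and T is the number of tasks):
    \Omega(W) = \sum_{d} \Bigl( \sum_{t=1}^{T} (w_d^t)^2 \Bigr)^{p/2}, \qquad 0 < p \le 1
    For p = 1 this is the convex ℓ1-ℓ2 group penalty that drives the same variables to zero in every task; for p < 1 it is non-convex, which is the case handled above by DC programming through a sequence of reweighted ℓ1-ℓ2 problems.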
    R. Flamary, A. Rakotomamonjy, G. Gasso, S. Canu, SVM Multi-Task Learning and Non convex Sparsity Measure, The Learning Workshop (Snowbird), 2009.
    Abstract:
    BibTeX:
    @conference{snowbird09,
    author = {R. Flamary and A. Rakotomamonjy and G. Gasso and S. Canu},
    title = {SVM Multi-Task Learning and Non convex Sparsity Measure},
    booktitle = {The Learning Workshop (Snowbird)},
    year = {2009}
    }
    R. Flamary, J. Rose, A. Rakotomamonjy, S. Canu, Variational Sequence Labeling, IEEE Workshop on Machine Learning for Signal Processing (MLSP), 2009.
    Abstract: Sequence labeling is concerned with processing an input data sequence and producing an output sequence of discrete labels which characterize it. Common applications include speech recognition, language processing (tagging, chunking) and bioinformatics. Many solutions have been proposed to partially cope with this problem, including probabilistic models (HMMs, CRFs) and machine learning algorithms (SVMs, neural networks). In practice, the best results have been obtained by combining several of these methods. However, fusing different signal segmentation methods is not straightforward, particularly when integrating prior information. In this paper the sequence labeling problem is viewed as a multi-objective optimization task. Each objective targets a different aspect of sequence labeling, such as good classification, temporal stability and change detection. The resulting optimization problem turns out to be non-convex and plagued with numerous local minima. A region growing algorithm is proposed as a method for finding a solution to this multi-objective optimization task. The proposed algorithm is evaluated on both synthetic and real data (a BCI dataset). Results are encouraging and better than those previously reported on these datasets.
    BibTeX:
    @inproceedings{mlsp09,
    author = {R. Flamary and J.L. Rose and A. Rakotomamonjy and S. Canu},
    title = {Variational Sequence Labeling},
    booktitle = {IEEE Workshop on Machine Learning for Signal Processing (MLSP)},
    year = {2009}
    }
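    One way to picture such a multi-objective energy is the following schematic form (assumed notation, not the paper's exact functional), with a per-sample classification cost ℓ, a temporal-stability term penalizing label switches, and a change-detection score c_t that makes switches cheaper where the signal supports them:
    E(y_{1:T}) = \sum_{t=1}^{T} \ell(y_t, x_t) + \sum_{t=1}^{T-1} \bigl(\lambda_1 - \lambda_2\, c_t\bigr)\, \mathbf{1}[y_t \neq y_{t+1}]
    with \lambda_1, \lambda_2 \ge 0 balancing the objectives; the region growing algorithm then searches for a good local minimum of this non-convex energy.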

    2008

    R. Flamary, Filtrage de surfaces obtenues à partir de structures M-Rep (M-Rep obtained surface filtering), Laboratoire CREATIS-LRMN, INSA de Lyon, 2008.
    Abstract:
    BibTeX:
    @mastersthesis{mrep08,
    author = {Flamary, R.},
    title = {Filtrage de surfaces obtenues à partir de structures M-Rep (M-Rep obtained surface filtering)},
    school = {Laboratoire CREATIS-LRMN, INSA de Lyon},
    year = {2008}
    }