Professional website


## Submitted and preprints

T. Vayer, L. Chapel, R. Flamary, R. Tavenard, N. Courty, Fused Gromov-Wasserstein distance for structured objects: theoretical foundations and mathematical properties (Submitted), 2018.

Abstract (submitted on 7 Nov 2018): Optimal transport theory has recently found many applications in machine learning thanks to its capacity to compare various machine learning objects considered as distributions. The Kantorovich formulation, leading to the Wasserstein distance, focuses on the features of the elements of the objects but treats them independently, whereas the Gromov-Wasserstein distance focuses only on the relations between the elements, depicting the structure of the object, yet discarding its features.
In this paper we propose to extend these distances in order to encode both the feature and the structure information simultaneously, resulting in the Fused Gromov-Wasserstein distance. We develop the mathematical framework for this novel distance, prove its metric and interpolation properties and provide a concentration result for the convergence of finite samples. We also illustrate and interpret its use in various contexts where structured objects are involved.
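To make the mix of feature and structure costs concrete, the sketch below evaluates a Fused Gromov-Wasserstein objective for one *fixed* coupling `pi` between two small graphs. It assumes the common formulation in which a trade-off weight `alpha` in [0, 1] mixes a feature distance matrix `M` with the discrepancy between structure matrices `C1` and `C2`, both raised to a power `q`. The function name and arguments are illustrative; the distance itself would further require minimizing this objective over all couplings with the prescribed marginals, which this sketch does not do.

```python
def fgw_objective(M, C1, C2, pi, alpha=0.5, q=2):
    """Fused Gromov-Wasserstein objective for a fixed coupling pi (illustrative).

    M[i][j]  : distance between the feature of node i (graph 1) and node j (graph 2)
    C1, C2   : structure matrices of each graph (e.g. shortest-path distances)
    pi[i][j] : coupling, assumed non-negative with total mass 1
    """
    n, m = len(C1), len(C2)
    # Feature term: sum_ij M_ij^q pi_ij (the extra sum over pi_kl equals 1)
    feature = sum(M[i][j] ** q * pi[i][j] for i in range(n) for j in range(m))
    # Structure term: sum_ijkl |C1_ik - C2_jl|^q pi_ij pi_kl
    structure = 0.0
    for i in range(n):
        for j in range(m):
            if pi[i][j] == 0.0:
                continue  # pairs carrying no mass contribute nothing
            for k in range(n):
                for l in range(m):
                    structure += abs(C1[i][k] - C2[j][l]) ** q * pi[i][j] * pi[k][l]
    return (1 - alpha) * feature + alpha * structure


# Two identical 2-node graphs whose nodes carry features 0.0 and 1.0.
C = [[0.0, 1.0], [1.0, 0.0]]
M = [[0.0, 1.0], [1.0, 0.0]]          # |feature_i - feature_j|
identity = [[0.5, 0.0], [0.0, 0.5]]   # node 0 -> 0, node 1 -> 1
swap = [[0.0, 0.5], [0.5, 0.0]]       # node 0 -> 1, node 1 -> 0
print(fgw_objective(M, C, C, identity))  # 0.0
print(fgw_objective(M, C, C, swap))      # 0.5
```

Because the graph is symmetric, swapping the nodes preserves the structure exactly, so only the feature term (weighted by `1 - alpha`) penalizes the swapped coupling.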

BibTeX:
@article{vayer2018fused, author = {Vayer, Titouan and Chapel, Laetitia and Flamary, Rémi and Tavenard, Romain and Courty, Nicolas}, title = {Fused Gromov-Wasserstein distance for structured objects: theoretical foundations and mathematical properties}, year = {2018}, note = {Submitted} }

B. B. Damodaran, R. Flamary, V. Seguy, N. Courty, An Entropic Optimal Transport Loss for Learning Deep Neural Networks under Label Noise in Remote Sensing Images (Submitted), 2018.

Abstract: Deep neural networks have become established as a powerful tool for large-scale supervised classification tasks. The state-of-the-art performance of deep neural networks is conditioned on the availability of a large number of accurately labeled samples. In practice, collecting large-scale accurately labeled datasets is a challenging and tedious task in most scenarios of remote sensing image analysis, so cheap surrogate procedures are employed to label the dataset. Deep neural networks trained on such datasets with inaccurate labels easily overfit the noisy training labels, which drastically degrades classification performance. To mitigate this effect, we propose an original solution based on entropic optimal transport. It allows deep neural networks to be learned in an end-to-end fashion while being, to some extent, robust to inaccurately labeled samples. We demonstrate this empirically on several remote sensing datasets, where both scene images and pixel-based hyperspectral images are considered for classification. Our method proves to be highly tolerant to significant amounts of label noise and achieves favorable results against state-of-the-art methods.
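Entropic optimal transport, on which the proposed loss builds, is usually computed with Sinkhorn's matrix-scaling iterations. The sketch below is the textbook algorithm, not the paper's loss: it assumes histograms `a` and `b` that each sum to 1, a precomputed cost matrix `C` and a regularization strength `reg`. How the paper builds the cost between deep features and noisy labels is not reproduced here.

```python
import math


def sinkhorn(a, b, C, reg=0.1, n_iter=500):
    """Entropy-regularized OT plan between histograms a and b (Sinkhorn scaling)."""
    n, m = len(a), len(b)
    # Gibbs kernel K = exp(-C / reg)
    K = [[math.exp(-C[i][j] / reg) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale columns and rows so the plan matches b and a
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def entropic_ot_cost(a, b, C, reg=0.1):
    """Transport cost <gamma, C> of the entropic plan, a common loss value."""
    gamma = sinkhorn(a, b, C, reg)
    return sum(gamma[i][j] * C[i][j] for i in range(len(a)) for j in range(len(b)))


a = [0.5, 0.5]
b = [0.5, 0.5]
C = [[0.0, 1.0], [1.0, 0.0]]
gamma = sinkhorn(a, b, C)
print(gamma)                       # almost all mass on the diagonal
print(entropic_ot_cost(a, b, C))   # small positive value, close to 0
```

Smaller values of `reg` give plans closer to unregularized OT but make `exp(-C / reg)` underflow sooner, so costs are often rescaled in practice.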

BibTeX:
@article{damodaran2018entropic, author = {Damodaran, Bharath Bhushan and Flamary, Rémi and Seguy, Vivien and Courty, Nicolas}, title = {An Entropic Optimal Transport Loss for Learning Deep Neural Networks under Label Noise in Remote Sensing Images}, year = {2018}, note = {Submitted} }

R. Mourya, A. Ferrari, R. Flamary, P. Bianchi, C. Richard, Distributed Deblurring of Large Images of Wide Field-Of-View (Submitted), 2017.

Abstract: Image deblurring is an economical way to reduce certain degradations (blur and noise) in acquired images. It has thus become an essential tool in high-resolution imaging in many applications, e.g., astronomy, microscopy or computational photography. In applications such as astronomy and satellite imaging, acquired images can be extremely large (up to gigapixels), covering a wide field of view and suffering from shift-variant blur. Most existing image deblurring techniques are designed and implemented to work efficiently on a centralized computing system with multiple processors and a shared memory. The largest image that can be handled is therefore limited by the size of the physical memory available on the system. In this paper, we propose a distributed non-blind image deblurring algorithm in which several connected processing nodes (with reasonable computational resources) simultaneously process different portions of a large image while maintaining a certain coherency among them, finally obtaining a single crisp image. Unlike the existing centralized techniques, image deblurring in a distributed fashion raises several issues. To tackle these issues, we consider certain approximations that trade off the quality of the deblurred image against the computational resources required to achieve it. The experimental results show that our algorithm produces images of similar quality to those of the existing centralized techniques while allowing distribution, and is thus cost-effective for extremely large images.

BibTeX:
@article{mourya2017distdeblur, author = {Mourya, Rahul and Ferrari, Andre and Flamary, Remi and Bianchi, Pascal and Richard, Cedric}, title = {Distributed Deblurring of Large Images of Wide Field-Of-View}, year = {2017}, note = {Submitted} }

## 2018

I. Harrane, R. Flamary, C. Richard, On reducing the communication cost of the diffusion LMS algorithm, IEEE Transactions on Signal and Information Processing over Networks (SIPN), 2018.

Abstract: The rise of digital and mobile communications has recently made the world more connected and networked, resulting in an unprecedented volume of data flowing between sources, data centers, or processes. While these data may be processed in a centralized manner, it is often more suitable to consider distributed strategies such as diffusion, as they are scalable and can handle large amounts of data by distributing tasks over networked agents. Although it is relatively simple to implement diffusion strategies over a cluster, it appears to be challenging to deploy them in an ad-hoc network with a limited energy budget for communication. In this paper, we introduce a diffusion LMS strategy that significantly reduces communication costs without compromising performance. Then, we analyze the proposed algorithm in the mean and mean-square sense. Next, we conduct numerical experiments to confirm the theoretical findings. Finally, we perform large-scale simulations to test the algorithm's efficiency in a scenario where energy is limited.
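For readers new to diffusion strategies, the sketch below implements the standard adapt-then-combine (ATC) diffusion LMS on a three-node line network. It is the generic scheme that communication-reducing variants start from and does not include the mechanism proposed in the paper; the topology, combination weights `A`, step size `mu` and noise level are arbitrary choices for illustration.

```python
import random


def atc_diffusion_lms(w_true, A, mu=0.05, n_iter=2000, noise_std=0.01, seed=0):
    """Adapt-then-combine diffusion LMS; A[k][l] is the weight node k gives to node l."""
    rng = random.Random(seed)
    n_nodes, dim = len(A), len(w_true)
    w = [[0.0] * dim for _ in range(n_nodes)]
    for _ in range(n_iter):
        # Adapt: each node takes one LMS step on its own noisy measurement
        psi = []
        for k in range(n_nodes):
            u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            d = sum(ui * wi for ui, wi in zip(u, w_true)) + rng.gauss(0.0, noise_std)
            err = d - sum(ui * wi for ui, wi in zip(u, w[k]))
            psi.append([wk + mu * err * ui for wk, ui in zip(w[k], u)])
        # Combine: each node averages its neighbors' intermediate estimates
        # (this exchange is the communication step whose cost the paper targets)
        w = [[sum(A[k][l] * psi[l][i] for l in range(n_nodes)) for i in range(dim)]
             for k in range(n_nodes)]
    return w


# Line network 0 - 1 - 2: each row of A sums to 1 and is zero for non-neighbors.
A = [[0.5, 0.5, 0.0],
     [1 / 3, 1 / 3, 1 / 3],
     [0.0, 0.5, 0.5]]
estimates = atc_diffusion_lms([1.0, -0.5], A)
print(estimates)  # every node should end close to [1.0, -0.5]
```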

BibTeX:
@article{harrane2018reducing, author = {Harrane, Ibrahim and Flamary, R. and Richard, C.}, title = {On reducing the communication cost of the diffusion LMS algorithm}, journal = {IEEE Transactions on Signal and Information Processing over Networks (SIPN)}, year = {2018} }

R. Flamary, M. Cuturi, N. Courty, A. Rakotomamonjy, Wasserstein Discriminant Analysis, Machine Learning, 2018.

Abstract: Wasserstein Discriminant Analysis (WDA) is a new supervised method that can improve classification of high-dimensional data by computing a suitable linear map onto a lower-dimensional subspace. Following the blueprint of classical Linear Discriminant Analysis (LDA), WDA selects the projection matrix that maximizes the ratio of two quantities: the dispersion of projected points coming from different classes, divided by the dispersion of projected points coming from the same class. To quantify dispersion, WDA uses regularized Wasserstein distances, rather than the cross-variance measures that have usually been considered, notably in LDA. Thanks to the underlying principles of optimal transport, WDA is able to capture both global (at distribution scale) and local (at sample scale) interactions between classes. Regularized Wasserstein distances can be computed using the Sinkhorn matrix scaling algorithm; we show that the optimization of WDA can be tackled using automatic differentiation of Sinkhorn iterations. Numerical experiments show promising results both in terms of prediction and visualization on toy examples and real-life datasets such as MNIST, and on deep features obtained from a subset of the Caltech dataset.
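The ratio that WDA maximizes can be sketched for a single projection direction. The code below assumes uniform weights on the samples of each class, a squared Euclidean ground cost, a fixed regularization `reg`, and sums over ordered class pairs; the paper's normalization may differ. It only *evaluates* the criterion for a given direction `p`, whereas WDA optimizes a projection matrix by differentiating through the Sinkhorn iterations. Function names are illustrative.

```python
import math


def sinkhorn_cost(x, y, reg=1.0, n_iter=200):
    """Entropic OT cost between uniform empirical measures on 1-D points x and y."""
    n, m = len(x), len(y)
    C = [[(xi - yj) ** 2 for yj in y] for xi in x]
    K = [[math.exp(-c / reg) for c in row] for row in C]
    a, b = [1.0 / n] * n, [1.0 / m] * m
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
    return sum(u[i] * K[i][j] * v[j] * C[i][j] for i in range(n) for j in range(m))


def wda_ratio(classes, p, reg=1.0):
    """Between-class over within-class regularized Wasserstein dispersion along p."""
    # Project every sample of every class onto the direction p
    proj = [[sum(pk * xk for pk, xk in zip(p, x)) for x in pts] for pts in classes]
    between = within = 0.0
    for c, xc in enumerate(proj):
        for c2, xc2 in enumerate(proj):
            cost = sinkhorn_cost(xc, xc2, reg)
            if c == c2:
                within += cost
            else:
                between += cost
    return between / within


classes = [[(0.0, 0.0), (0.5, 2.0)],   # class 0
           [(3.0, 0.0), (3.5, 2.0)]]   # class 1
print(wda_ratio(classes, (1.0, 0.0)))  # large: this direction separates the classes
print(wda_ratio(classes, (0.0, 1.0)))  # 1.0: both classes project to the same points
```

Note that the within-class term is non-zero only because of the entropic regularization, which spreads mass off the diagonal even when a class is compared with itself.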

BibTeX:
@article{flamary2017wasserstein, author = {Flamary, Remi and Cuturi, Marco and Courty, Nicolas and Rakotomamonjy, Alain}, title = {Wasserstein Discriminant Analysis}, journal = {Machine Learning}, year = {2018} }

## 2017

P. Hartley, R. Flamary, N. Jackson, A. S. Tagore, R. B. Metcalf, Support Vector Machine classification of strong gravitational lenses, Monthly Notices of the Royal Astronomical Society (MNRAS), 2017.

Abstract: The imminent advent of very large-scale optical sky surveys, such as Euclid and LSST, makes it important to find efficient ways of discovering rare objects such as strong gravitational lens systems, where a background object is multiply gravitationally imaged by a foreground mass. As well as finding the lens systems, it is important to reject false positives due to intrinsic structure in galaxies, and much work is in progress with machine learning algorithms such as neural networks in order to achieve both these aims. We present and discuss a Support Vector Machine (SVM) algorithm which makes use of a Gabor filterbank in order to provide learning criteria for separation of lenses and non-lenses, and demonstrate using blind challenges that under certain circumstances it is a particularly efficient algorithm for rejecting false positives. We compare the SVM engine with a large-scale human examination of 100,000 simulated lenses in a challenge dataset, and also apply the SVM method to survey images from the Kilo-Degree Survey.
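The classifier above builds its learning criteria from a Gabor filterbank. The sketch below constructs such a bank with the usual real-valued Gabor formula (a Gaussian envelope times a cosine carrier along a rotated axis). The kernel size, scale `sigma`, wavelengths and number of orientations are arbitrary assumptions, and how the paper turns filter responses into SVM features is not reproduced.

```python
import math


def gabor_kernel(size, sigma, theta, wavelength, gamma=1.0, psi=0.0):
    """Real part of a 2-D Gabor filter sampled on a size x size grid (size odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the carrier oscillates along direction theta
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength + psi))
        kernel.append(row)
    return kernel


def gabor_filterbank(size=11, sigma=2.0, wavelengths=(4.0, 8.0), n_orientations=4):
    """One filter per (wavelength, orientation) pair; parameter values are arbitrary."""
    return [gabor_kernel(size, sigma, k * math.pi / n_orientations, lam)
            for lam in wavelengths for k in range(n_orientations)]


bank = gabor_filterbank()
print(len(bank), len(bank[0]), len(bank[0][0]))  # 8 11 11
print(bank[0][5][5])  # center value: envelope and cosine both equal 1, so 1.0
```

Convolving an image cutout with each kernel gives one response map per filter; summary statistics of these maps are one plausible way to form a feature vector.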

BibTeX:
@article{hartley2017support, author = {Hartley, Philippa and Flamary, Remi and Jackson, Neal and Tagore, A. S. and Metcalf, R. B.}, title = {Support Vector Machine classification of strong gravitational lenses}, journal = {Monthly Notices of the Royal Astronomical Society (MNRAS)}, year = {2017} }

R. Rougeot, R. Flamary, D. Galano, C. Aime, Performance of hybrid externally occulted Lyot solar coronagraph, Application to ASPIICS, Astronomy and Astrophysics, 2017.

Abstract: Context. The future ESA formation-flying mission Proba-3 will fly the solar coronagraph ASPIICS, which couples a 50 mm Lyot coronagraph with an external occulter of 1.42 m diameter placed 144 m in front of it.
Aims. We perform a numerical study of the theoretical performance of a hybrid coronagraph such as ASPIICS. In this system, an internal occulter is set on the image of the external occulter instead of a Lyot mask on the solar image. First, we determine the rejection due to the external occulter alone. Second, the effects of sizing the internal occulter and the Lyot stop are analyzed. This work also applies to the classical Lyot coronagraph alone and to the external solar coronagraph.
Methods. The numerical computation uses the parameters of ASPIICS. First, we take the approach of Aime, C. 2013, A&A 558, A138, to express the wave front from Fresnel diffraction at the entrance aperture of the Lyot coronagraph. From there, each wave front coming from a given point of the Sun is propagated through the Lyot coronagraph in three steps: from the aperture to the image of the external occulter, where the internal occulter is set; from this plane to the image of the entrance aperture, where the Lyot stop is set; and from there to the final observing plane. Making use of the axial symmetry, wave fronts originating from one radius of the Sun are computed and the intensities circularly averaged.
Results. As expected, the image of the external occulter appears as a bright circle, which locally exceeds the brightness of the Sun observed without the external occulter. However, residual sunlight is below 10^-8 outside 1.5 R☉. The Lyot coronagraph effectively complements the external occultation. At the expense of a small reduction in flux and resolution, reducing the Lyot stop allows a clear gain in rejection. Oversizing the internal occulter produces a similar effect but tends to exclude observations very close to the limb. We provide a graph that allows a simple estimate of the performance as a function of the sizes of the internal occulter and the Lyot stop.

BibTeX:
@article{rougeot2016performance, author = {Rougeot, Raphael and Flamary, Remi and Galano, Damien and Aime, Claude}, title = {Performance of hybrid externally occulted Lyot solar coronagraph, Application to ASPIICS}, journal = {Astronomy and Astrophysics}, year = {2017} }

## 2016

N. Courty, R. Flamary, D. Tuia, A. Rakotomamonjy, Optimal transport for domain adaptation, Pattern Analysis and Machine Intelligence, IEEE Transactions on, 2016.

Abstract: Domain adaptation is one of the most challenging tasks of modern data analytics. If the adaptation is done correctly, models built on a specific data representation become more robust when confronted with data depicting the same semantic concepts (the classes), but observed by another observation system with its own specificities. Among the many strategies proposed to adapt one domain to another, finding domain-invariant representations has shown excellent properties, as a single classifier can use labelled samples from the source domain under this representation to predict the unlabelled samples of the target domain. In this paper, we propose a regularized unsupervised optimal transportation model to perform the alignment of the representations in the source and target domains. We learn a transportation plan matching both PDFs, which constrains labelled samples in the source domain to remain close during transport. This way, we exploit at the same time the limited label information in the source domain and the distributions of the input/observation variables observed in both domains. Experiments on toy and challenging real visual adaptation examples show the interest of the method, which consistently outperforms state-of-the-art approaches.
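The core idea, transporting source samples onto the target distribution before training a classifier, can be sketched with plain entropic OT followed by a barycentric mapping, in which each source sample moves to the plan-weighted mean of the target samples it is sent to. This sketch omits the class-based regularization that uses the source labels; the uniform sample weights, the value of `reg` and the function names are assumptions.

```python
import math


def sinkhorn(a, b, C, reg, n_iter=1000):
    """Entropy-regularized OT plan between histograms a and b."""
    n, m = len(a), len(b)
    K = [[math.exp(-C[i][j] / reg) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def ot_adapt(Xs, Xt, reg=0.05):
    """Map source samples into the target domain with a barycentric mapping."""
    ns, nt = len(Xs), len(Xt)
    a, b = [1.0 / ns] * ns, [1.0 / nt] * nt
    # Squared Euclidean ground cost between source and target samples
    C = [[sum((p - q) ** 2 for p, q in zip(xs, xt)) for xt in Xt] for xs in Xs]
    gamma = sinkhorn(a, b, C, reg)
    # Each source sample becomes the gamma-weighted mean of its target matches
    return [[sum(gamma[i][j] * Xt[j][d] for j in range(nt)) / a[i]
             for d in range(len(Xt[0]))] for i in range(ns)]


Xs = [[0.0], [1.0]]   # labelled source samples
Xt = [[0.1], [1.1]]   # unlabelled target samples (shifted domain)
print(ot_adapt(Xs, Xt))  # approximately [[0.1], [1.1]]
```

A classifier trained on the mapped source samples, with their original labels, can then be applied directly to the target samples.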

BibTeX:
@article{courty2016optimal, author = {Courty, N. and Flamary, R. and Tuia, D. and Rakotomamonjy, A.}, title = {Optimal transport for domain adaptation}, journal = {Pattern Analysis and Machine Intelligence, IEEE Transactions on}, year = {2016} }

S. Canu, R. Flamary, D. Mary, Introduction to optimization with applications in astronomy and astrophysics, Mathematical tools for instrumentation and signal processing in astronomy, 2016.

Abstract: This chapter aims at providing an introduction to numerical optimization with some applications in astronomy and astrophysics. We provide important preliminary definitions that will guide the reader towards different optimization procedures. We discuss three families of optimization problems and describe numerical algorithms that, when possible, solve these problems. For each family, we present in detail simple examples and more involved advanced examples. As a final illustration, we focus on two worked-out examples of optimization applied to astronomical data. The first application is a supervised classification of RR-Lyrae stars. The second one is the denoising of galactic spectra formulated by means of sparsity-inducing models in a redundant dictionary.
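Sparse denoising in a dictionary, as in the second application, is typically solved with proximal methods. The sketch below shows ISTA (iterative soft thresholding) for the l1-regularized least-squares problem; it is a generic textbook algorithm, not necessarily the one used in the chapter, and the dictionary `D`, signal `y` and weight `lam` are toy values.

```python
import math


def ista(D, y, lam, n_iter=500):
    """ISTA for min_x 0.5 * ||y - D x||^2 + lam * ||x||_1 (D given as a list of rows)."""
    n_rows, n_atoms = len(D), len(D[0])
    # Step size 1/L with L = squared Frobenius norm, a safe bound on ||D||_2^2
    L = sum(d ** 2 for row in D for d in row)
    x = [0.0] * n_atoms
    for _ in range(n_iter):
        residual = [y[r] - sum(D[r][k] * x[k] for k in range(n_atoms)) for r in range(n_rows)]
        grad = [-sum(D[r][k] * residual[r] for r in range(n_rows)) for k in range(n_atoms)]
        # Gradient step on the quadratic term...
        z = [xk - gk / L for xk, gk in zip(x, grad)]
        # ...followed by soft thresholding, the proximal operator of the l1 norm
        x = [math.copysign(max(abs(zk) - lam / L, 0.0), zk) for zk in z]
    return x


D = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]   # identity dictionary, so the solution is known in closed form
print(ista(D, [3.0, 0.5, -2.0], lam=1.0))  # approximately [2.0, 0.0, -1.0]
```

With an identity dictionary the lasso solution is simply the soft-thresholded signal, which makes the result easy to check; a redundant dictionary would have more columns than rows.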

BibTeX:
@incollection{canu2016introduction, author = {Canu, Stephane and Flamary, Remi and Mary, David}, title = {Introduction to optimization with applications in astronomy and astrophysics}, booktitle = {Mathematical tools for instrumentation and signal processing in astronomy}, editor = {Mary, David and Flamary, Remi and Theys, Celine and Aime, Claude}, year = {2016} }

R. Flamary, A. Rakotomamonjy, M. Sebag, Apprentissage statistique pour les BCI, Les interfaces cerveau-ordinateur 1, fondements et méthodes, pp 197-215, 2016.

Abstract: This chapter introduces statistical learning and its application to brain-computer interfaces. First, the general principle of supervised learning is presented and the difficulties of putting it into practice are discussed, in particular aspects related to sensor selection and multi-subject learning. The chapter also details how to validate a learning approach, including the various performance measures and the optimization of the hyperparameters of the algorithm under consideration. The reader is invited to experiment with the algorithms described: a Matlab/Octave toolbox makes it possible to reproduce the experiments illustrating the chapter and contains the implementation details of the various methods.

BibTeX:
@incollection{flamary2016apprentissage, author = {Flamary, Remi and Rakotomamonjy, Alain and Sebag, Michele}, title = {Apprentissage statistique pour les BCI}, pages = {197-215}, booktitle = {Les interfaces cerveau-ordinateur 1, fondements et méthodes}, editor = {Clerc, Maureen and Bougrain, Laurent and Lotte, Fabien}, publisher = {ISTE Editions}, year = {2016} }

R. Flamary, A. Rakotomamonjy, M. Sebag, Statistical learning for BCIs, Brain Computer Interfaces 1: Fundamentals and Methods, pp 185-206, 2016.

Abstract: This chapter introduces statistical learning and its applications to brain–computer interfaces. We begin by presenting the general principles of supervised learning and discussing the difficulties raised by its implementation, with a particular focus on aspects related to selecting sensors and multisubject learning. This chapter also describes in detail how a learning approach may be validated, including various metrics of performance and optimization of the hyperparameters of the considered algorithms. We invite the reader to experiment with the algorithms described here: the illustrative experiments included in this chapter may be reproduced using a Matlab/Octave toolbox, which contains the implementation details of the various methods.
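Choosing hyperparameters by cross-validation, one standard way of validating a learning approach, can be sketched in a few lines. The example below tunes the number of neighbors `k` of a toy nearest-neighbor classifier on 1-D data; the classifier, the interleaved fold split and the data are assumptions chosen for brevity and are unrelated to the chapter's Matlab/Octave toolbox.

```python
def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (1-D features)."""
    neighbors = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = [label for _, label in neighbors]
    return max(set(votes), key=votes.count)  # odd k avoids ties with two classes


def cross_validate(data, candidates, n_folds=3):
    """Return the candidate with the best k-fold accuracy, and all accuracies."""
    scores = {}
    for k in candidates:
        correct = 0
        for fold in range(n_folds):
            # Interleaved split: every n_folds-th sample goes to the validation fold
            test = [d for i, d in enumerate(data) if i % n_folds == fold]
            train = [d for i, d in enumerate(data) if i % n_folds != fold]
            correct += sum(knn_predict(train, x, k) == y for x, y in test)
        # Each sample is validated exactly once, so this is the pooled accuracy
        scores[k] = correct / len(data)
    best = max(candidates, key=lambda c: scores[c])  # first best on ties
    return best, scores


data = [(i * 0.1, 0) for i in range(6)] + [(10 + i * 0.1, 1) for i in range(6)]
best, scores = cross_validate(data, candidates=(1, 3, 5))
print(best, scores)  # well-separated clusters: every k reaches accuracy 1.0
```

On real BCI data the validation folds would usually respect recording sessions or subjects rather than an interleaved split.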

BibTeX:
@incollection{flamary2016statistical, author = {Flamary, Remi and Rakotomamonjy, Alain and Sebag, Michele}, title = {Statistical learning for BCIs}, pages = {185-206}, booktitle = {Brain Computer Interfaces 1: Fundamentals and Methods}, editor = {Clerc, Maureen and Bougrain, Laurent and Lotte, Fabien}, publisher = {ISTE Ltd and John Wiley and Sons Inc}, year = {2016} }

D. Tuia, R. Flamary, M. Barlaud, Non-convex regularization in remote sensing, Geoscience and Remote Sensing, IEEE Transactions on, 2016.

Abstract: In this paper, we study the effect of different regularizers and their implications in high-dimensional image classification and sparse linear unmixing. Although kernelization or sparse methods are widely accepted solutions for processing data in high dimensions, we present here a study on the impact of the form of regularization used and its parametrization. We consider regularization via traditional squared (l2) and sparsity-promoting (l1) norms, as well as more unconventional non-convex regularizers (lp and Log Sum Penalty). We compare their properties and advantages on several classification and linear unmixing tasks and provide advice on the choice of the best regularizer for the problem at hand. Finally, we also provide a fully functional toolbox for the community.
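For reference, the four families of regularizers compared in the paper can be written as penalties on a weight vector `w`. The exact parametrization used here (squared l2, `p = 0.5`, a Log Sum Penalty with scale `theta`) is an assumption; the paper's toolbox may define them differently.

```python
import math


def l2_penalty(w):
    """Squared l2 norm: smooth, shrinks all weights but never to exactly zero."""
    return sum(x ** 2 for x in w)


def l1_penalty(w):
    """l1 norm: convex and sparsity-promoting."""
    return sum(abs(x) for x in w)


def lp_penalty(w, p=0.5):
    """lp pseudo-norm raised to p; non-convex for 0 < p < 1."""
    return sum(abs(x) ** p for x in w)


def log_sum_penalty(w, theta=1.0):
    """Log Sum Penalty: sum_i log(1 + |w_i| / theta); non-convex."""
    return sum(math.log(1 + abs(x) / theta) for x in w)


w = [1.0, -2.0]
print(l2_penalty(w), l1_penalty(w))        # 5.0 3.0
print(lp_penalty(w), log_sum_penalty(w))   # 1 + sqrt(2) and log(2) + log(3) = log(6)
```

Compared with l1, the last two penalties grow more slowly for large `|w_i|`, so they shrink large coefficients less while still pushing small ones toward zero.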

BibTeX:
@article{tuia2016nonconvex, author = {Tuia, D. and Flamary, R. and Barlaud, M.}, title = {Non-convex regularization in remote sensing}, journal = {Geoscience and Remote Sensing, IEEE Transactions on}, year = {2016} }

A. Rakotomamonjy, R. Flamary, G. Gasso, DC Proximal Newton for Non-Convex Optimization Problems, Neural Networks and Learning Systems, IEEE Transactions on, Vol. 27, N. 3, pp 636-647, 2016.

Abstract: We introduce a novel algorithm for solving learning problems where both the loss function and the regularizer are non-convex but belong to the class of difference of convex (DC) functions. Our contribution is a new general-purpose proximal Newton algorithm that is able to deal with such a situation. The algorithm consists of obtaining a descent direction from an approximation of the loss function and then performing a line search to ensure sufficient descent. A theoretical analysis is provided showing that the limit points of the iterates of the proposed algorithm are stationary points of the DC objective function. Numerical experiments show that our approach is more efficient than the current state of the art for a problem with a convex loss function and a non-convex regularizer. We have also illustrated the benefit of our algorithm in a high-dimensional transductive learning problem where both the loss function and the regularizer are non-convex.
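To see what a DC function looks like in practice, the log penalty (with a scale parameter $\theta > 0$ introduced here for illustration) admits the following standard decomposition into two convex functions $g$ and $h$; this is not necessarily the decomposition used in the paper.

$$
\log\Bigl(1+\frac{|w|}{\theta}\Bigr)
= \underbrace{\frac{|w|}{\theta}}_{g(w)}
- \underbrace{\Bigl(\frac{|w|}{\theta}-\log\Bigl(1+\frac{|w|}{\theta}\Bigr)\Bigr)}_{h(w)=\varphi(|w|)},
\qquad
\varphi'(t)=\frac{t}{\theta(\theta+t)}\ge 0,\quad
\varphi''(t)=\frac{1}{(\theta+t)^{2}}>0 \quad (t\ge 0).
$$

Since $\varphi$ is convex and nondecreasing on $t \ge 0$ and $|w|$ is convex, $h$ is convex, and so is $g$. DC algorithms typically linearize $h$ at the current iterate, which leaves a convex subproblem (here an $\ell_1$ term plus a linear term).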

BibTeX:
@article{rakoto2015dcprox, author = {Rakotomamonjy, A. and Flamary, R. and Gasso, G.}, title = {DC Proximal Newton for Non-Convex Optimization Problems}, journal = {Neural Networks and Learning Systems, IEEE Transactions on}, volume = {27}, number = {3}, pages = {636-647}, year = {2016} }

## 2015

D. Tuia, R. Flamary, N. Courty, Multiclass feature learning for hyperspectral image classification: sparse and hierarchical solutions, ISPRS Journal of Photogrammetry and Remote Sensing, 2015.

Abstract: In this paper, we tackle the question of discovering an effective set of spatial filters to solve hyperspectral classification problems. Instead of fixing a priori the filters and their parameters using expert knowledge, we let the model find them within random draws in the (possibly infinite) space of possible filters. We define an active set feature learner that includes in the model only features that improve the classifier. To this end, we consider a fast and linear classifier, multiclass logistic classification, and show that with a good representation (the filters discovered), such a simple classifier can reach at least state-of-the-art performance. We apply the proposed active set learner to four hyperspectral image classification problems, including agricultural and urban classification at different resolutions, as well as multimodal data. We also propose a hierarchical setting, which allows generating more complex banks of features that can better describe the nonlinearities present in the data.

BibTeX:
@article{tuia2015multiclass, author = {Tuia, D. and Flamary, R. and Courty, N.}, title = {Multiclass feature learning for hyperspectral image classification: sparse and hierarchical solutions}, journal = {ISPRS Journal of Photogrammetry and Remote Sensing}, year = {2015} }

R. Flamary, M. Fauvel, M. Dalla Mura, S. Valero, Analysis of multi-temporal classification techniques for forecasting image times series, Geoscience and Remote Sensing Letters (GRSL), Vol. 12, N. 5, pp 953-957, 2015.

Abstract: The classification of an annual time series by using data from past years is investigated in this paper. Several classification schemes based on data fusion, sparse learning and semi-supervised learning are proposed to address the problem. Numerical experiments are performed on a MODIS image time series and show that, while several approaches have statistically equivalent performance, SVM with l1 regularization leads to a better interpretation of the results due to its inherent sparsity in the temporal domain.

BibTeX:
@article{flamary2014analysis, author = {Flamary, R. and Fauvel, M. and Dalla Mura, M. and Valero, S.}, title = {Analysis of multi-temporal classification techniques for forecasting image times series}, journal = {Geoscience and Remote Sensing Letters (GRSL)}, volume = {12}, number = {5}, pages = {953-957}, year = {2015} }

## 2014

R. Flamary, A. Rakotomamonjy, G. Gasso, Learning Constrained Task Similarities in Graph-Regularized Multi-Task Learning, Regularization, Optimization, Kernels, and Support Vector Machines, 2014.

Abstract: This chapter addresses the problem of learning constrained task relatedness in a graph-regularized multi-task learning framework. In such a context, the weighted adjacency matrix of a graph encodes the knowledge on task similarities, and each entry of this matrix can be interpreted as a hyperparameter of the learning problem. This task relation matrix is learned via a bilevel optimization procedure where the outer level optimizes a proxy of the generalization errors over all tasks with respect to the similarity matrix, and the inner level estimates the parameters of the tasks knowing this similarity matrix. Constraints on task similarities are also taken into account in this optimization framework, and they make the task similarity matrix more interpretable, for instance by imposing a sparse similarity matrix. Since the global problem is non-convex, we propose a non-convex proximal algorithm that provably converges to a stationary point of the problem. Empirical evidence shows that the approach is competitive with existing methods that also learn task relations, and that it enhances the interpretability of the learned task similarity matrix.
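In a graph-regularized multi-task model, the adjacency matrix `A` of task similarities typically enters through a penalty that pulls the parameter vectors of related tasks together. The sketch below assumes the common form sum over ordered task pairs of `A[i][j] * ||w_i - w_j||^2` with a symmetric `A`, and checks it against the equivalent graph-Laplacian form; the chapter's exact penalty and the bilevel procedure that learns `A` are not reproduced.

```python
def graph_penalty(W, A):
    """sum_{i,j} A[i][j] * ||w_i - w_j||^2 over all ordered task pairs."""
    T = len(W)
    return sum(A[i][j] * sum((p - q) ** 2 for p, q in zip(W[i], W[j]))
               for i in range(T) for j in range(T))


def laplacian_form(W, A):
    """Same quantity as 2 * trace(W^T L W), with Laplacian L = D - A (A symmetric)."""
    T = len(W)
    deg = [sum(A[i]) for i in range(T)]
    L = [[(deg[i] if i == j else 0.0) - A[i][j] for j in range(T)] for i in range(T)]
    return 2 * sum(L[i][j] * sum(p * q for p, q in zip(W[i], W[j]))
                   for i in range(T) for j in range(T))


W = [[1.0, 0.0],   # parameters of task 0
     [0.0, 1.0],   # task 1
     [1.0, 0.0]]   # task 2
A = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 0.5],
     [0.0, 0.5, 0.0]]
print(graph_penalty(W, A), laplacian_form(W, A))  # 6.0 6.0
```

A large weight `A[i][j]` makes it costly for tasks i and j to disagree, which is why each entry behaves like a hyperparameter of the learning problem.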

BibTeX:
@incollection{flamary2014learning, author = {Flamary, R. and Rakotomamonjy, A. and Gasso, G.}, title = {Learning Constrained Task Similarities in Graph-Regularized Multi-Task Learning}, booktitle = {Regularization, Optimization, Kernels, and Support Vector Machines}, editor = {Suykens, J. A. K. and Signoretto, M. and Argyriou, A.}, year = {2014} }

R. Flamary, C. Aime, Optimization of starshades: focal plane versus pupil plane, Astronomy and Astrophysics, Vol. 569, N. A28, pp 10, 2014.

Abstract: We search for the best possible transmission for an external occulter coronagraph that is dedicated to the direct observation of terrestrial exoplanets. We show that better observation conditions are obtained when the flux in the focal plane is minimized in the zone in which the exoplanet is observed, instead of the total flux received by the telescope. We describe the transmission of the occulter as a sum of basis functions. For each element of the basis, we numerically computed the Fresnel diffraction at the aperture of the telescope and the complex amplitude at its focus. The basis functions are circular disks that are linearly apodized over a few centimeters (truncated cones). We complemented the numerical calculation of the Fresnel diffraction for these functions by a comparison with pure circular discs (cylinder) for which an analytical expression, based on a decomposition in Lommel series, is available. The technique of deriving the optimal transmission for a given spectral bandwidth is a classical regularized quadratic minimization of intensities, but linear optimizations can be used as well. Minimizing the integrated intensity on the aperture of the telescope or for selected regions of the focal plane leads to slightly different transmissions for the occulter. For the focal plane optimization, the resulting residual intensity is concentrated behind the geometrical image of the occulter, in a blind region for the observation of an exoplanet, and the level of background residual starlight becomes very low outside this image. Finally, we provide a tolerance analysis for the alignment of the occulter to the telescope which also favors the focal plane optimization. This means that telescope offsets of a few decimeters do not strongly reduce the efficiency of the occulter.

BibTeX:
@article{flamary2014starshade, author = {Flamary, Remi and Aime, Claude}, title = {Optimization of starshades: focal plane versus pupil plane}, journal = {Astronomy and Astrophysics}, volume = {569}, number = {A28}, pages = {10}, year = {2014} }

R. Flamary, N. Jrad, R. Phlypo, M. Congedo, A. Rakotomamonjy, Mixed-Norm Regularization for Brain Decoding, Computational and Mathematical Methods in Medicine, Vol. 2014, N. 1, pp 1-13, 2014.

Abstract: This work investigates the use of mixed-norm regularization for sensor selection in event-related potential (ERP) based brain-computer interfaces (BCI). The classification problem is cast as a discriminative optimization framework where sensor selection is induced through the use of mixed norms. This framework is extended to the multitask learning situation where several similar classification tasks related to different subjects are learned simultaneously. In this case, multitask learning helps mitigate the data scarcity issue, yielding more robust classifiers. For this purpose, we have introduced a regularizer that induces both sensor selection and classifier similarities. The different regularization approaches are compared on three ERP datasets, showing the interest of mixed-norm regularization in terms of sensor selection. The multitask approaches are evaluated when a small number of learning examples are available, yielding significant performance improvements, especially for subjects performing poorly.
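Sensor selection through a mixed norm can be illustrated by grouping all the classifier weights attached to one sensor (e.g. its weights over time samples) in one row of a matrix `W`. The sketch below assumes an l1-l2 mixed norm, one member of this family, and shows its proximal operator, the step that zeroes out whole sensors in proximal algorithms. The other mixed norms and the multitask regularizer of the paper are not shown.

```python
import math


def mixed_norm(W):
    """l1-l2 mixed norm: sum over sensors (rows of W) of each row's Euclidean norm."""
    return sum(math.sqrt(sum(w ** 2 for w in row)) for row in W)


def prox_mixed_norm(W, t):
    """Proximal operator of t * mixed_norm: group soft thresholding of each row."""
    out = []
    for row in W:
        norm = math.sqrt(sum(w ** 2 for w in row))
        # Rows with norm <= t are zeroed, i.e. the corresponding sensor is discarded
        scale = max(0.0, 1.0 - t / norm) if norm > 0 else 0.0
        out.append([scale * w for w in row])
    return out


W = [[3.0, 4.0],   # sensor with strong weights (norm 5)
     [0.3, 0.4]]   # sensor with weak weights (norm 0.5)
print(mixed_norm(W))            # 5.5 (up to rounding)
print(prox_mixed_norm(W, 1.0))  # first row scaled by 0.8, second row set to zero
```

Unlike a plain l1 penalty, which zeroes individual weights, the group penalty removes or keeps all the weights of a sensor together.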

BibTeX:
@article{flamary2014mixed, author = {Flamary, R. and Jrad, N. and Phlypo, R. and Congedo, M. and Rakotomamonjy, A.}, title = {Mixed-Norm Regularization for Brain Decoding}, journal = {Computational and Mathematical Methods in Medicine}, volume = {2014}, number = {1}, pages = {1-13}, year = {2014} }

E. Niaf, R. Flamary, O. Rouvière, C. Lartizien, S. Canu, Kernel-Based Learning From Both Qualitative and Quantitative Labels: Application to Prostate Cancer Diagnosis Based on Multiparametric MR Imaging, Image Processing, IEEE Transactions on, Vol. 23, N. 3, pp 979-991, 2014.

Abstract: Building an accurate training database is challenging in supervised classification. For instance, in medical imaging, radiologists often delineate malignant and benign tissues without access to the histological ground truth, leading to uncertain data sets. This paper addresses the pattern classification problem arising when available target data include some uncertainty information. Target data considered here are either qualitative (a class label) or quantitative (an estimation of the posterior probability). In this context, usual discriminative methods, such as the support vector machine (SVM), fail either to learn a robust classifier or to predict accurate probability estimates. We generalize the regular SVM by introducing a new formulation of the learning problem to take into account class labels as well as class probability estimates. This original reformulation into a probabilistic SVM (P-SVM) can be efficiently solved by adapting existing flexible SVM solvers. Furthermore, this framework allows deriving a unique learned prediction function for both decision and posterior probability estimation, providing qualitative and quantitative predictions. The method is first tested on synthetic data sets to evaluate its properties as compared with the classical SVM and fuzzy-SVM. It is then evaluated on a clinical data set of multiparametric prostate magnetic resonance images to assess its performance in discriminating benign from malignant tissues. P-SVM is shown to outperform the classical SVM as well as the fuzzy-SVM in terms of probability predictions and classification performance, and demonstrates its potential for the design of an efficient computer-aided decision system for prostate cancer diagnosis based on multiparametric magnetic resonance (MR) imaging.

BibTeX:
@article{niaf2014kernel, author = {Niaf, E. and Flamary, R. and Rouvière, O. and Lartizien, C. and Canu, S.}, title = {Kernel-Based Learning From Both Qualitative and Quantitative Labels: Application to Prostate Cancer Diagnosis Based on Multiparametric MR Imaging}, journal = {Image Processing, IEEE Transactions on}, volume = {23}, number = {3}, pages = {979-991}, year = {2014} }

D. Tuia, M. Volpi, M. Dalla Mura, A. Rakotomamonjy, R. Flamary, Automatic Feature Learning for Spatio-Spectral Image Classification With Sparse SVM, Geoscience and Remote Sensing, IEEE Transactions on, Vol. 52, N. 10, pp 6062-6074, 2014.

Abstract: Including spatial information is a key step for successful remote sensing image classification. In particular, when dealing with high spatial resolution, if local variability is strongly reduced by spatial filtering, classification performance is boosted. In this paper, we consider the triple objective of designing a spatial/spectral classifier, which is compact (uses as few features as possible), discriminative (enhances class separation), and robust (works well in small sample situations). We achieve this triple objective by discovering the relevant features in the (possibly infinite) space of spatial filters by optimizing a margin-maximization criterion. Instead of imposing a filter bank with predefined filter types and parameters, we let the model figure out which set of filters is optimal for class separation. To do so, we randomly generate spatial filter banks and use an active-set criterion to rank the candidate features according to their benefits to margin maximization (and, thus, to generalization) if added to the model. Experiments on multispectral very high spatial resolution (VHR) and hyperspectral VHR data show that the proposed algorithm, which is sparse and linear, finds discriminative features and achieves at least the same performance as models using a large filter bank defined in advance by prior knowledge.

BibTeX:
@article{tuia2014automatic, author = {Tuia, D. and Volpi, M. and Dalla Mura, M. and Rakotomamonjy, A. and Flamary, R.}, title = {Automatic Feature Learning for Spatio-Spectral Image Classification With Sparse SVM}, journal = {Geoscience and Remote Sensing, IEEE Transactions on}, volume = {52}, number = {10}, pages = {6062-6074}, year = {2014} } |
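The abstract above ranks randomly generated candidate filters with an active-set criterion tied to margin maximization. As a rough illustration only, the Python sketch below assumes an ℓ1-regularized linear model trained with the squared hinge loss; under that assumption, a feature currently outside the model can only lower the objective if the absolute partial derivative of the loss with respect to its (zero) weight exceeds the regularization parameter `lam`. The function names, the `margins` input, and the toy values are hypothetical and not taken from the paper.

```python
def squared_hinge_gradient(feature_col, y, margins):
    """Partial derivative of sum_i max(0, 1 - y_i f(x_i))^2 with respect to the
    weight of a candidate feature whose weight is currently 0.
    margins[i] = y_i * f(x_i) for the current model f (assumed precomputed)."""
    return sum(-2.0 * yi * fi * max(0.0, 1.0 - mi)
               for fi, yi, mi in zip(feature_col, y, margins))


def rank_candidates(candidates, y, margins, lam):
    """Score each candidate column by |gradient| and keep, best first, only the
    ones with |gradient| > lam, i.e. those that violate the l1 optimality
    condition and would decrease the regularized objective if added."""
    scores = [(abs(squared_hinge_gradient(col, y, margins)), k)
              for k, col in enumerate(candidates)]
    scores.sort(reverse=True)
    return [(k, s) for s, k in scores if s > lam]
```

For instance, with labels `[1.0, -1.0]` and an empty model (all margins 0), a candidate column `[1.0, -1.0]` aligned with the labels gets score 4.0 and is retained for `lam=1.0`, while the uninformative column `[1.0, 1.0]` scores 0 and is discarded.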

L. Laporte, R. Flamary, S. Canu, S. Déjean, J. Mothe, Nonconvex Regularizations for Feature Selection in Ranking With Sparse SVM, Neural Networks and Learning Systems, IEEE Transactions on, Vol. 25, N. 6, pp 1118-1130, 2014. |

Abstract: Feature selection in learning to rank has recently emerged as a crucial issue. Whereas several preprocessing approaches have been proposed, only a few works have focused on integrating feature selection into the learning process. In this work, we propose a general framework for feature selection in learning to rank using SVM with a sparse regularization term. We investigate both classical convex regularizations, such as l1 or weighted l1, and non-convex regularization terms, such as the log penalty, the Minimax Concave Penalty (MCP) or the lp pseudo-norm with p < 1. Two algorithms are proposed: first, an accelerated proximal approach for solving the convex problems; second, a reweighted l1 scheme to address the non-convex regularizations. We conduct extensive experiments on nine datasets from the LETOR 3.0 and LETOR 4.0 corpora. Numerical results show that the non-convex regularizations we propose lead to sparser models while preserving prediction performance. The number of features is decreased by up to a factor of six compared to l1 regularization. In addition, the software is publicly available on the web. |

BibTeX:
@article{tnnls2014, author = { Laporte, L. and Flamary, R. and Canu, S. and Déjean, S. and Mothe, J.}, title = {Nonconvex Regularizations for Feature Selection in Ranking With Sparse SVM}, journal = { Neural Networks and Learning Systems, IEEE Transactions on}, volume = {25}, number = {6}, pages = {1118-1130}, year = {2014} } |
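The reweighted l1 scheme mentioned in the abstract above can be illustrated with a minimal sketch under stated assumptions; it is not the paper's ranking-SVM solver. It swaps in a least-squares loss, handles only the log penalty lam * sum_j log(|w_j| + eps), and solves each weighted l1 subproblem with plain proximal gradient (ISTA). Each outer step majorizes the concave log penalty by a weighted l1 term with weights 1/(|w_j| + eps). The helper names, the smoothing constant `eps`, and the iteration counts are hypothetical choices.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |.|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def weighted_lasso_ista(X, y, weights, lam, n_iter=500):
    """ISTA for 0.5 * ||Xw - y||^2 + lam * sum_j weights[j] * |w_j|."""
    n, d = len(X), len(X[0])
    # Step 1/L, with L = squared Frobenius norm of X, an upper bound on the
    # largest eigenvalue of X^T X (the gradient's Lipschitz constant).
    L = sum(x_ij ** 2 for row in X for x_ij in row)
    step = 1.0 / L
    w = [0.0] * d
    for _ in range(n_iter):
        r = [sum(X[i][j] * w[j] for j in range(d)) - y[i] for i in range(n)]  # residual Xw - y
        g = [sum(X[i][j] * r[i] for i in range(n)) for j in range(d)]        # gradient X^T r
        w = [soft_threshold(w[j] - step * g[j], step * lam * weights[j]) for j in range(d)]
    return w


def reweighted_l1_log_penalty(X, y, lam, eps=1e-3, n_outer=5):
    """Majorization-minimization for the log penalty: each outer pass solves a
    weighted l1 problem, then sets weights[j] = 1 / (|w_j| + eps)."""
    d = len(X[0])
    weights = [1.0] * d  # first pass: plain l1 (Lasso) as a warm start
    w = [0.0] * d
    for _ in range(n_outer):
        w = weighted_lasso_ista(X, y, weights, lam)
        weights = [1.0 / (abs(wj) + eps) for wj in w]
    return w
```

On a toy problem where the target depends only on the first feature (rows `[1, 0], [0, 1], [1, 1], [2, 1]`, targets `[2, 0, 2, 4]`, `lam=0.1`), the second weight ends exactly at zero and the first stays close to 2. The reweighting also shrinks the surviving coefficient less than plain l1 would, which is the usual motivation for non-convex penalties.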

## 2013 |

A. Rakotomamonjy, R. Flamary, F. Yger, Learning with infinitely many features, Machine Learning, Vol. 91, N. 1, pp 43-66, 2013. |

Abstract: We propose a principled framework for learning with infinitely many features, situations that are usually induced by continuously parametrized feature extraction methods. Such cases occur, for instance, when considering Gabor-based features in computer vision problems or when dealing with Fourier features for kernel approximations. We cast the problem as that of finding a finite subset of features that minimizes a regularized empirical risk. After analyzing the optimality conditions of such a problem, we propose a simple algorithm which has the flavour of a column-generation technique. We also show that, using Fourier-based features, it is possible to perform approximate infinite kernel learning. Our experimental results on several datasets show the benefits of the proposed approach in several situations, including texture classification and large-scale kernelized problems (involving about 100 thousand examples). |

BibTeX:
@article{ml2012, author = { Rakotomamonjy, A. and Flamary, R. and Yger, F.}, title = {Learning with infinitely many features}, journal = { Machine Learning}, volume = {91}, number = {1}, pages = {43-66}, year = {2013} } |
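The abstract above mentions Fourier features for kernel approximation. The Python sketch below shows the standard random Fourier feature map for the Gaussian kernel, assumed here as the textbook construction with randomly drawn frequencies; it does not reproduce the paper's column-generation algorithm, which instead selects a finite set of features from the continuous family. The function names and parameters (`n_features`, `sigma`, `seed`) are illustrative.

```python
import math
import random


def random_fourier_features(X, n_features, sigma, seed=0):
    """Map each row x of X to z(x) = sqrt(2/D) * cos(W x + b), with D = n_features,
    rows of W drawn from N(0, I / sigma^2) and b from U[0, 2*pi].
    In expectation, z(x) . z(y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    rng = random.Random(seed)
    d = len(X[0])
    W = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(d)] for _ in range(n_features)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)
    return [[scale * math.cos(sum(w_k[j] * x[j] for j in range(d)) + b_k)
             for w_k, b_k in zip(W, b)]
            for x in X]


def gaussian_kernel(x, y, sigma):
    """Exact Gaussian kernel, for comparison with the feature-map inner product."""
    return math.exp(-sum((a - c) ** 2 for a, c in zip(x, y)) / (2 * sigma ** 2))
```

With a few thousand features, the inner product of two mapped points typically matches the exact kernel value to within a few hundredths, and the approximation error shrinks roughly as one over the square root of `n_features`.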

## 2012 |

R. Flamary, A. Rakotomamonjy, Decoding finger movements from ECoG signals using switching linear models, Frontiers in Neuroscience, Vol. 6, N. 29, 2012. |

Abstract: One of the most interesting challenges in ECoG-based brain-machine interfaces is movement prediction. Being able to perform such a prediction paves the way to high-precision control of a machine such as a robotic arm or robotic hand. As evidence of the BCI community's increasing interest in this problem, the fourth BCI Competition provides a dataset whose aim is to predict individual finger movements from ECoG signals. The difficulty of the problem lies in the fact that there is no simple relation between ECoG signals and finger movements. In this paper, we propose to estimate and decode these finger flexions using switching models controlled by a hidden state. Switching models can integrate prior knowledge about the decoding problem and help predict fine and precise movements. Our model is thus based on a first block, which estimates which finger is moving, and a second block which, knowing which finger is moving, predicts the movements of all other fingers. Numerical results submitted to the competition show that the model yields high decoding performance when the hidden state is well estimated. This approach achieved second place in the BCI competition, with a correlation of 0.42 between real and predicted movements. |

BibTeX:
@article{frontiers2012, author = { Flamary, R. and Rakotomamonjy, A.}, title = {Decoding finger movements from ECoG signals using switching linear models}, journal = { Frontiers in Neuroscience}, volume = { 6}, number = { 29}, year = {2012} } |

R. Flamary, D. Tuia, B. Labbé, G. Camps-Valls, A. Rakotomamonjy, Large Margin Filtering, IEEE Transactions on Signal Processing, Vol. 60, N. 2, pp 648-659, 2012. |

Abstract: Many signal processing problems are tackled by filtering the signal for subsequent feature classification or regression. Both steps are critical and need to be designed carefully to deal with the particular statistical characteristics of both signal and noise. The filter and the classifier are typically designed separately, which leads to suboptimal classification schemes. This paper proposes an efficient methodology to learn an optimal signal filter and a support vector machine (SVM) classifier jointly. In particular, we derive algorithms to solve the optimization problem, prove its theoretical convergence, and discuss different filter regularizers for automated scaling and selection of the feature channels. The latter give rise to different formulations with the appealing properties of sparseness and noise robustness. We illustrate the performance of the method in several problems. First, linear and nonlinear toy classification examples, in the presence of both Gaussian and convolutional noise, show the robustness of the proposed methods. The approach is then evaluated on two challenging real-life datasets: BCI time series classification and multispectral image segmentation. In all the examples, large margin filtering shows competitive classification performance while offering the advantage of interpretability of the retrieved filtered channels. |

BibTeX:
@article{ieeesp2012, author = { Flamary, R. and Tuia, D. and Labbé, B. and Camps-Valls, G. and Rakotomamonjy, A.}, title = {Large Margin Filtering}, journal = {IEEE Transactions on Signal Processing}, volume = {60}, number = {2}, pages = {648-659}, year = {2012} } |

## 2011 |

A. Rakotomamonjy, R. Flamary, G. Gasso, S. Canu, lp-lq penalty for sparse linear and sparse multiple kernel multi-task learning, IEEE Transactions on Neural Networks, Vol. 22, N. 8, pp 1307-1320, 2011. |

Abstract: Recently, there has been a lot of interest in the multi-task learning (MTL) problem with the constraint that tasks should share a common sparsity profile. Such a problem can be addressed through a regularization framework where the regularizer induces a joint-sparsity pattern between task decision functions. We follow this principled framework and focus on $\ell_p-\ell_q$ (with $0 \leq p \leq 1$ and $1 \leq q \leq 2$) mixed norms as sparsity-inducing penalties. Our motivation for addressing this larger class of penalties is to adapt the penalty to the problem at hand, thus leading to better performance and better sparsity patterns. For solving the problem in the general multiple kernel case, we first derive a variational formulation of the $\ell_1-\ell_q$ penalty, which helps us propose an alternating optimization algorithm. Although very simple, the latter algorithm provably converges to the global minimum of the $\ell_1-\ell_q$ penalized problem. For the linear case, we extend existing works considering accelerated proximal gradient to this penalty. Our contribution in this context is to provide an efficient scheme for computing the $\ell_1-\ell_q$ proximal operator. Then, for the more general case when $0 < p < 1$, we solve the resulting non-convex problem through a majorization-minimization approach. The resulting algorithm is an iterative scheme which, at each iteration, solves a weighted $\ell_1-\ell_q$ sparse MTL problem. Empirical evidence from a toy dataset and real-world datasets dealing with BCI single-trial EEG classification and protein subcellular localization shows the benefit of the proposed approaches and algorithms. |

BibTeX:
@article{tnn2011, author = { Rakotomamonjy, A. and Flamary, R. and Gasso, G. and Canu, S.}, title = {lp-lq penalty for sparse linear and sparse multiple kernel multi-task learning}, journal = { IEEE Transactions on Neural Networks}, volume = {22}, number = {8}, pages = {1307-1320}, year = {2011} } |
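The abstract above highlights an efficient scheme for computing the $\ell_1-\ell_q$ proximal operator. For the special case $q = 2$, this operator has a well-known closed form, row-wise group soft-thresholding, sketched below in Python. The general-$q$ scheme of the paper is not reproduced here, and the function name is hypothetical. Each row of `W` is assumed to hold the coefficients of one feature across all tasks, so zeroing a row drops that feature from every task at once (joint sparsity).

```python
import math


def prox_l1_l2(W, lam):
    """Proximal operator of lam * sum_j ||W[j]||_2 (l1 over rows, l2 within a row).
    Each row is scaled by max(0, 1 - lam / ||row||), so rows whose Euclidean
    norm is <= lam are set exactly to zero."""
    out = []
    for row in W:
        norm = math.sqrt(sum(v * v for v in row))
        factor = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        out.append([factor * v for v in row])
    return out
```

For example, with `lam=1.0`, the row `[3.0, 4.0]` (norm 5) is scaled by 0.8 to approximately `[2.4, 3.2]`, while the row `[0.3, 0.4]` (norm 0.5) is set to zero. In an accelerated proximal gradient method, this operator would be applied after each gradient step on the smooth multi-task loss.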

N. Jrad, M. Congedo, R. Phlypo, S. Rousseau, R. Flamary, F. Yger, A. Rakotomamonjy, sw-SVM: sensor weighting support vector machines for EEG-based brain–computer interfaces, Journal of Neural Engineering, Vol. 8, N. 5, pp 056004, 2011. |

Abstract: In many machine learning applications, like brain–computer interfaces (BCI), high-dimensional sensor array data are available. Sensor measurements are often highly correlated and the signal-to-noise ratio is not homogeneously spread across sensors. Thus, collected data are highly variable and discrimination tasks are challenging. In this work, we focus on sensor weighting as an efficient tool to improve the classification procedure. We present an approach integrating sensor weighting in the classification framework. Sensor weights are considered as hyper-parameters to be learned by a support vector machine (SVM). The resulting sensor weighting SVM (sw-SVM) is designed to satisfy a margin criterion, that is, the generalization error. Experimental studies on two data sets are presented: a P300 data set and an error-related potential (ErrP) data set. For the P300 data set (BCI competition III), for which a large number of trials is available, the sw-SVM performs on par with the ensemble SVM strategy that won the competition. For the ErrP data set, for which only a small number of trials is available, the sw-SVM outperforms three state-of-the-art approaches. Results suggest that the sw-SVM is promising for event-related potential classification, even with a small number of training trials. |

BibTeX:
@article{jrad2011swsvm, author = {N. Jrad and M. Congedo and R. Phlypo and S. Rousseau and R. Flamary and F. Yger and A. Rakotomamonjy}, title = {sw-SVM: sensor weighting support vector machines for EEG-based brain–computer interfaces}, journal = {Journal of Neural Engineering}, volume = {8}, number = {5}, pages = {056004}, year = {2011} } |