**Navigation**

**Contact us**

- Scientific Leader
- tel: +33 3 20 43 68 76

- Team Assistant
- tel: +33 3 59 57 78 45

**Research Organizations**

**Current Collaborations**

**Related Inria teams**

*© 2016-2018 Modal-Team. All rights reserved.*


**Seminars**

**Usual day**: Tuesday at 11.00.

**Place**: Inria Lille - Nord Europe.

**How to get there**: in French, in English.

**Organizers**: Alain Celisse and Hemant Tyagi

**Calendar feed**: iCalendar (hosted by the seminars platform of University of Lille)

Most slides are available: check past sessions and archives.

**Archives**: 2018-2019, 2017-2018, 2016-2017, 2015-2016, 2014-2015, 2013-2014.

**Date**: May 5, 2020 (Tuesday) at 11.00 (Room B21)
**Affiliation**: ENS Paris
**Webpage**: Link
**Title**: Naive Feature Selection: Sparsity in Naive Bayes
**Abstract**: Due to its linear complexity, naive Bayes classification remains an attractive supervised learning method, especially in very large-scale settings. We propose a sparse version of naive Bayes, which can be used for feature selection. This leads to a combinatorial maximum-likelihood problem, for which we provide an exact solution in the case of binary data, or a bound in the multinomial case. We prove that our bound becomes tight as the marginal contribution of additional features decreases. Both binary and multinomial sparse models are solvable in time almost linear in problem size, representing a very small extra relative cost compared to classical naive Bayes. Numerical experiments on text data show that the naive Bayes feature selection method is as statistically effective as state-of-the-art feature selection methods such as recursive feature elimination, l1-penalized logistic regression and LASSO, while being orders of magnitude faster. For a large data set, with more than 1.6 million training points and about 12 million features, and with a non-optimized CPU implementation, our sparse naive Bayes model can be trained in less than 15 seconds.
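As a rough illustration of the idea (not the authors' exact combinatorial maximum-likelihood algorithm), one can score binary features by the gap between their class-conditional frequencies and keep only the top k:

```python
import numpy as np

def select_features(X, y, k):
    """Toy sparsity-in-naive-Bayes scorer: rank binary features by the
    absolute gap between their class-conditional frequencies and keep
    the top k. Illustrative only -- the talk's method solves a
    combinatorial maximum-likelihood problem exactly."""
    X = np.asarray(X, dtype=float)
    p1 = X[y == 1].mean(axis=0)  # feature frequency in class 1
    p0 = X[y == 0].mean(axis=0)  # feature frequency in class 0
    return np.argsort(np.abs(p1 - p0))[::-1][:k]

# feature 0 perfectly separates the classes; features 1-2 are noise
X = np.array([[1, 0, 1], [1, 1, 0], [1, 0, 0],
              [0, 1, 1], [0, 0, 0], [0, 1, 0]])
y = np.array([1, 1, 1, 0, 0, 0])
kept = select_features(X, y, 1)
```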

**Date**: April 28, 2020 (Tuesday) at 11.00 (Plenary room)
**Affiliation**: ENSAE CREST
**Webpage**: Link
**Title**: TBA
**Abstract**: TBA

**Date**: April 14, 2020 (Tuesday) at 11.00 (Plenary room)
**Affiliation**: Inria Lille
**Webpage**:
**Title**: TBA
**Abstract**: TBA

**Date**: April 7, 2020 (Tuesday) at 11.00 (Plenary room)
**Affiliation**: Inria Lille
**Webpage**:
**Title**: TBA
**Abstract**: TBA

**Date**: March 24, 2020 (Tuesday) at 11.00 (Plenary room)
**Affiliation**: University of Lyon 2
**Webpage**: Link
**Title**: A partial overview of the recent trends in the mathematical analysis of deep learning
**Abstract**: Deep learning is a new field at the intersection of computer science and mathematics whose main goal is to solve the problem of classifying objects given some training data (and many generalisations of this problem). Despite the extensive activity around devising new algorithms that recently led to fascinating practical breakthroughs, a complete understanding of the mathematical underpinnings is still lacking. In this talk, I will give an overview of the recent and exciting trends in the mathematics of deep learning and summarise the main open problems in the field.

**Date**: March 10, 2020 (Tuesday) at 11.00 (Plenary room)
**Affiliation**: HU Berlin
**Webpage**: Link
**Title**: Smoothed residual stopping for statistical inverse problems via truncated SVD estimation
**Abstract**: This work examines under what circumstances adaptivity for truncated SVD estimation can be achieved by an early stopping rule based on the smoothed residuals \| (A A^T)^{\alpha/2} (Y - A \hat{\mu}^{(m)}) \|^2. Lower and upper bounds for the risk are derived, which show that moderate smoothing of the residuals can be used to adapt over classes of signals with varying smoothness, while oversmoothing yields suboptimal convergence rates. The theoretical results are illustrated by Monte Carlo simulations.
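In code, the smoothed-residual rule amounts to increasing the truncation level until the weighted residual energy falls below a threshold. A minimal sketch, with the smoothing exponent alpha and threshold kappa supplied by the user (the talk is about how to choose them so that the rule adapts):

```python
import numpy as np

def smoothed_residual_stop(A, Y, alpha, kappa):
    """Pick the truncation level m for the truncated SVD estimator by the
    smoothed-residual rule: stop at the first m such that
    ||(A A^T)^(alpha/2) (Y - A mu_hat^(m))||^2 <= kappa.
    Sketch only: alpha and kappa are hand-chosen here."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    c = U.T @ Y              # data coefficients in the left singular basis
    w = s ** (2 * alpha)     # spectral weights of the smoothing operator
    for m in range(len(s) + 1):
        if np.sum(w[m:] * c[m:] ** 2) <= kappa:  # smoothed residual norm^2
            return m
    return len(s)

A = np.diag([3.0, 2.0, 1.0])
Y = np.array([3.0, 0.0, 0.0])   # signal only in the first component
m = smoothed_residual_stop(A, Y, alpha=1.0, kappa=0.5)
```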

**Date**: February 4, 2020 (Tuesday) at 11.00 (Plenary room)
**Affiliation**: University of Lille and Inria Lille
**Webpage**: Link
**Title**: How smoothing can improve on the discrepancy principle for early stopping
**Abstract**: Early stopping rules refer to new strategies for choosing when to interrupt an iterative learning process (gradient descent, …). This question of deciding when to stop the iterations is of primary importance, since a bad choice would result in poor statistical properties and/or additional unnecessary computations. The purpose of this talk is to describe the early-stopping challenge, illustrate the behavior of classical discrepancy-based stopping rules, and also discuss some of their deficiencies. This motivates considering smoothing-based strategies such as the one inspired by Tikhonov regularization. For such rules, we prove several theoretical (non-)asymptotic guarantees, and also illustrate their practical behavior on simulation experiments carried out by means of spectral filter algorithms.
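The classical discrepancy principle the talk starts from can be sketched in a few lines: run gradient descent on a least-squares objective and stop once the squared residual drops to the noise level. This is the baseline rule (with an assumed known noise variance sigma2), not the smoothed variant proposed in the talk:

```python
import numpy as np

def gd_discrepancy(A, Y, sigma2, lr, max_iter=1000):
    """Gradient descent on least squares ||Y - A b||^2, interrupted by the
    classical discrepancy principle: stop once the squared residual drops
    below n * sigma2 (assumed known noise level)."""
    n, d = A.shape
    b = np.zeros(d)
    for t in range(max_iter):
        r = Y - A @ b
        if r @ r <= n * sigma2:   # discrepancy principle: residual ~ noise
            return b, t
        b = b + lr * (A.T @ r)    # gradient step
    return b, max_iter

A = np.eye(2)
Y = np.array([1.0, 1.0])
b, t = gd_discrepancy(A, Y, sigma2=1e-6, lr=0.5)
```

With the identity design the residual shrinks geometrically, so the rule fires after a small, predictable number of steps.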

**Date**: January 28, 2020 (Tuesday) at 11.00 (Room A11)
**Affiliation**: Inria Lille - Nord Europe
**Webpage**:
**Title**: Early stopping: some new ideas towards optimality
**Abstract**: Early stopping is a regularization technique based on choosing a stopping time for an iterative algorithm. The main advantages of this technique are a significant reduction of computational resources and possibly better statistical performance. In this talk we will focus on the nonparametric regression setting in a reproducing kernel Hilbert space (RKHS). We perform an empirical risk minimisation procedure in the RKHS with two iterative learning algorithms: gradient descent and (iterative) kernel ridge regression. In order to avoid the overfitting phenomenon for these two algorithms, we will consider the so-called discrepancy principle early stopping rule. We quantify its performance by means of an oracle-type inequality and show that, under some mild conditions, this stopping rule is not appropriate for various kernel classes. To solve this problem, we propose a new stopping rule, called the smoothed discrepancy principle, and show that this rule can be proved optimal in the minimax sense under some conditions. At the end of the talk there will be simulation experiments to validate the theoretical results.

**Date**: January 21, 2020 (Tuesday) at 11.00 (Plenary room)
**Affiliation**: University of Lorraine
**Webpage**: Link
**Title**: Decision Tree for uncertainty measures
**Abstract**: Ensemble methods are popular machine learning techniques which are powerful for both classification and prediction problems. A set of classifiers is constructed, and the classification or prediction of a new data instance is obtained by taking a weighted vote. These classifiers can be regression or classification trees. Practically, a tree is a piece-wise constant estimator on partitions obtained from the data. These partitions are induced by recursive dyadic splits of the set of input variables. For example, CART (Classification And Regression Trees) is an algorithm which constructs such a tree. The goal is to partition the space of input variable values into the K most “homogeneous” disjoint regions possible. More precisely, each partitioning value has to minimize a risk function. However, in practice, experimental measures may be observed with uncertainty. This work proposes to extend the CART algorithm to this kind of data. We present an induced model adapted to uncertain data, together with a prediction rule and a split rule for tree construction that take into account the uncertainty of each quantitative observation in the database. Joint work with Sami ALKHOURY, Myriam TAMI, Emilie DEVIJVER, Eric GAUSSIER and Total SA.
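The risk-minimizing split the abstract refers to can be illustrated for plain CART regression (not the uncertainty-aware extension presented in the talk): choose the threshold on one feature that minimizes the summed squared error of the two constant-fit regions.

```python
def best_split(xs, ys):
    """One step of standard CART regression-tree splitting: pick the
    threshold minimizing the summed squared error of the two resulting
    constant-fit regions. Plain CART, shown for a single feature."""
    def sse(vals):
        if not vals:
            return 0.0
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)
    pairs = sorted(zip(xs, ys))
    best_risk, best_thr = float("inf"), None
    for i in range(1, len(pairs)):
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint candidate
        left = [y for x, y in pairs if x <= thr]
        right = [y for x, y in pairs if x > thr]
        risk = sse(left) + sse(right)              # risk to minimize
        if risk < best_risk:
            best_risk, best_thr = risk, thr
    return best_thr

# a step function with the jump between x = 2 and x = 3
thr = best_split([1, 2, 3, 4], [0.0, 0.0, 1.0, 1.0])
```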

**Date**: January 14, 2020 (Tuesday) at 11.00 (Room A11)
**Affiliation**: Inria Lille - Nord Europe
**Webpage**:
**Title**: Optimal Transport and its Application to Domain Adaptation
**Abstract**: Optimal transport was first formalized by Monge in the 18th century. The initial problem consists of finding the best plan to move a mass of dirt into a given hole while minimizing the effort. Recently, the problem and the associated Wasserstein distance have attracted a lot of attention and have been used in different machine learning tasks. This talk will focus on the application of optimal transport to domain adaptation problems. In such a scenario, one is given a set of points sampled from a source distribution and wants to adapt them to match the distribution of the target domain. Optimal transport will be compared with other domain adaptation methods, and some computational details will also be discussed.
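One standard way to compute a transport plan in practice is entropy-regularized optimal transport via Sinkhorn iterations. The sketch below is a minimal illustration of the computational side (production domain-adaptation code would typically use a dedicated library such as POT):

```python
import numpy as np

def sinkhorn(a, b, C, reg=0.05, n_iter=500):
    """Entropy-regularized optimal transport via Sinkhorn iterations:
    compute a coupling with marginals a and b that approximately
    minimizes the transport cost given by the matrix C."""
    K = np.exp(-C / reg)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)         # scale to match the target marginal
        u = a / (K @ v)           # scale to match the source marginal
    return u[:, None] * K * v[None, :]  # transport plan

a = np.array([0.5, 0.5])                  # source weights
b = np.array([0.5, 0.5])                  # target weights
C = np.array([[0.0, 1.0], [1.0, 0.0]])    # pairwise transport costs
P = sinkhorn(a, b, C)                     # mass stays on the cheap matches
```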

**Date**: December 3, 2019 (Tuesday) at 11.00 (Plenary Room)
**Affiliation**: University of Lyon 2
**Webpage**: Link
**Title**: A new and simpler approach to the analysis of Robust PCA
**Abstract**: In standard PCA, one collects n observed vectors x_1, …, x_n in R^d and forms the matrix X = [x_1, …, x_n]. Then, one finds the best rank-r approximation by thresholding the singular values as prescribed by the Eckart–Young theorem. One of the main drawbacks of PCA is that the singular vectors are very sensitive to outliers. Robust PCA (RPCA) is a kind of matrix decomposition which allows one to recover a low rank matrix corrupted by both noise and outliers with potentially very large values. RPCA has been used extensively in a very wide range of applications, from genetics to video processing. One of the main ideas behind Robust PCA is to reconstruct the matrices L_0 and S_0 by solving the convex programming problem

min ||L||_{∗} + λ ||S||_1 s. t. ||X − L − S|| ≤ η.

The original theoretical analysis of this estimator’s accuracy is well known to be technically very involved. The goal of the present work is to propose an elementary analysis of Robust PCA using the descent cone approach of Amelunxen et al. and a null-space-type property on the eigenvectors of the original low rank matrix.
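The L + S decomposition behind this program can be illustrated with a simple heuristic that alternates the two proximal maps involved: singular value thresholding for the nuclear norm and entrywise soft-thresholding for the l1 norm. This is a hand-tuned sketch of the idea, not the exact convex solver analysed in the talk:

```python
import numpy as np

def soft(M, t):
    """Entrywise soft-thresholding (proximal map of the l1 norm)."""
    return np.sign(M) * np.maximum(np.abs(M) - t, 0.0)

def svt(M, t):
    """Singular value thresholding (proximal map of the nuclear norm)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - t, 0.0)) @ Vt

def rpca_alternating(X, lam=0.5, tau=1.0, n_iter=50):
    """Alternate the low-rank and sparse proximal steps. Heuristic sketch
    with hand-picked thresholds lam and tau; not the exact RPCA solver."""
    L = np.zeros_like(X)
    S = np.zeros_like(X)
    for _ in range(n_iter):
        L = svt(X - S, tau)   # low-rank update
        S = soft(X - L, lam)  # sparse-outlier update
    return L, S

X = np.ones((5, 5))    # rank-one background ...
X[0, 0] = 11.0         # ... plus one large sparse outlier
L, S = rpca_alternating(X)
```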

**Date**: October 15, 2019 (Tuesday) at 11.00 (Plenary Room)
**Affiliation**: Indian Institute of Science (IISc)
**Webpage**:
**Title**: Euclidean k-Means with Center Proximity
**Abstract**: Clustering is an important tool in any data science toolkit. Most popular clustering algorithms partition the given data into disjoint clusters by optimizing a certain global objective, such as the k-means objective. The implicit assumption in doing so is that an optimal solution for this objective would recover the underlying ground truth clustering. However, many such objectives are NP-hard to optimize in the worst case, e.g., k-centre, k-median, k-means. Moreover, an optimal solution need not satisfy certain properties desired from the ground truth clustering, e.g., balance, stability. In practice, however, Lloyd’s algorithm, k-means++, and their variants perform well on most real-world data sets. This dichotomy between theoretical intractability and empirically observed efficiency has led to the CDNM thesis: “Clustering is Difficult only when it does Not Matter”! In most real-world data sets, the underlying ground-truth clustering is unambiguous and stable under small perturbations of the data. We will highlight these issues with the example of k-means clustering. In this talk, we will explore a notion of stability called centre proximity and give an algorithm and lower bounds for stable instances. This is joint work with Dr Amit Deshpande (Microsoft Research, India) and Dr Anand Louis (Indian Institute of Science, India).
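Lloyd's algorithm, mentioned above as the empirically effective baseline, is short enough to sketch in full: alternate nearest-centre assignment and centre re-estimation. This is the plain version, without k-means++ seeding or the stability notions discussed in the talk:

```python
import numpy as np

def lloyd(X, k, n_iter=20, seed=0):
    """Plain Lloyd's algorithm for k-means: alternate assignment of each
    point to its nearest centre and re-estimation of each centre as the
    mean of its assigned points."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)                        # assignment step
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)  # update step
    return centers, labels

# two well-separated blobs: the ground-truth clustering is unambiguous
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centers, labels = lloyd(X, k=2)
```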

**Date**: October 8, 2019 (Tuesday) at 11.00 (Plenary Room)
**Affiliation**: SINTEF
**Webpage**: Link
**Title**: Machine learning in the real world
**Abstract**: Machine learning algorithms are flexible and powerful, but their data requirements are high and rarely met by the available data. Real-world data is often medium-sized (relative to problem size), noisy, and full of missing values. At the same time, in order to deploy machine learning in industrial settings, models must be robust, explainable, and have quantified uncertainties. I will show practical examples of these challenges from our recent projects and some case-by-case solutions, but also highlight remaining issues.

**Date**: September 23, 2019 (Monday) at 11.00 (Room A00)
**Affiliation**: University of Tokyo
**Webpage**: Link
**Title**: Unsupervised Domain Adaptation Based on Source-guided Discrepancy
**Abstract**: Unsupervised domain adaptation is the problem setting where data generating distributions in the source and target domains are different, and labels in the target domain are unavailable. One important question in unsupervised domain adaptation is how to measure the difference between the source and target domains. A previously proposed discrepancy that does not use the source domain labels requires high computational cost to estimate and may lead to a loose generalization error bound in the target domain. To mitigate these problems, we propose a novel discrepancy called source-guided discrepancy (S-disc), which exploits labels in the source domain. As a consequence, S-disc can be computed efficiently with a finite-sample convergence guarantee. In addition, we show that S-disc can provide a tighter generalization error bound than the one based on an existing discrepancy. Finally, we report experimental results that demonstrate the advantages of S-disc over the existing discrepancies.
**Slides**: Link

**Date**: September 11, 2019 (Wednesday) at 14.00 (Plenary Room)
**Affiliation**: KU Leuven
**Webpage**: Link
**Title**: Landmark sampling, diversity and kernel methods
**Abstract**: In machine learning, there is a revived interest in kernel methods, e.g. for designing interpretable convolutional networks or in the context of Gaussian processes. More generally, in kernel-based learning, a central question concerns large-scale approximation of the kernel matrix. A popular method for finding a low-rank approximation of kernel matrices is the so-called Nystrom method, which relies on the sampling of 'good' landmark points in a dataset. We will discuss an approach for selecting 'diverse' landmarks with some theoretical guarantees. Our work makes a connection between kernelized Christoffel functions, ridge leverage scores and determinantal point processes.
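The Nystrom approximation itself is a one-liner once the landmarks are chosen: K ≈ C W⁺ Cᵀ, where C is the cross-kernel between all points and the landmarks and W is the landmark kernel. In this sketch the landmarks are simply given indices; the talk is precisely about choosing them diversely (leverage scores, determinantal point processes):

```python
import numpy as np

def rbf(A, B, gamma=0.5):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def nystrom(X, idx, gamma=0.5):
    """Nystrom low-rank approximation of the kernel matrix built from the
    landmark points X[idx]: K ~= C W^+ C^T."""
    C = rbf(X, X[idx], gamma)        # n x m cross-kernel
    W = rbf(X[idx], X[idx], gamma)   # m x m landmark kernel
    return C @ np.linalg.pinv(W) @ C.T

X = np.array([[0.0], [1.0], [2.0], [3.0]])
K_approx = nystrom(X, np.array([0, 2]))  # two landmarks
K_full = rbf(X, X)
```

Note that when every point is a landmark the approximation is exact, since K K⁺ K = K.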


seminars.txt · Last modified: 2020/03/24 11:21 by tyagi