Machine Learning for NeuroImaging Workshop
Marseille, France
November 8-9 2011

Program

Tuesday Nov. 8, 2011

12:00 - 13:30: Welcome Lunch

Session I: Introduction 

13:30 - 14:00: Introduction to neuroimaging for machine learners

Session II: Sparsity and feature selection

14:00 - 14:30: Bertrand Thirion (Parietal / Inria, Gif-sur-Yvette, France)

  • Title: Spatial regularization and sparsity for brain mapping
  • Abstract: Reverse inference, or “brain reading”, is a recent paradigm for analyzing functional magnetic resonance imaging (fMRI) data, based on pattern recognition tools. This approach aims at decoding brain activity by predicting some cognitive variables related to brain activation maps. Reverse inference takes into account the multivariate information between voxels and is currently the only way to assess how precisely some cognitive information is encoded by the activity of neural populations within the whole brain. However, it relies on a prediction function that is plagued by the curse of dimensionality, as we have far more features than samples, i.e., more voxels than fMRI volumes. To address this problem, different methods have been proposed. Among them are univariate feature selection, feature agglomeration and regularization techniques. We will give an overview of recently developed techniques to impose sparsity or compactness priors on predictive maps that seem particularly well suited to neuroimaging. In particular, when generalization across individuals is of interest, greater robustness to cross-individual spatial variability may be achieved with adapted regularization or agglomeration methods. We will focus on the well-posedness of the estimation procedures and related optimization problems, then present tests of our algorithms on real datasets, showing that they yield better prediction accuracy than reference methods.
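
As a toy illustration of this regime (far more voxels than volumes), here is a minimal sparse decoder with an l1 penalty on simulated data; a generic sketch assuming scikit-learn, not the speaker's method:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.RandomState(0)
    n_volumes, n_voxels = 80, 5000          # far more voxels than fMRI volumes
    X = rng.randn(n_volumes, n_voxels)      # stand-in activation maps
    y = rng.randint(0, 2, n_volumes)        # stand-in cognitive variable
    X[y == 1, :10] += 1.0                   # only 10 voxels carry information

    # The l1 penalty imposes sparsity on the predictive map
    decoder = LogisticRegression(penalty="l1", C=0.1, solver="liblinear")
    print(cross_val_score(decoder, X, y, cv=5).mean())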

14:30 - 15:00: Francis Bach (Sierra / Inria, Paris, France)

  • Title: Structured sparsity and convex optimization
  • Abstract: The concept of parsimony is central in many scientific domains. In the context of statistics, signal processing or machine learning, it takes the form of variable or feature selection problems, and is commonly used in two situations: First, to make the model or the prediction more interpretable or cheaper to use, i.e., even if the underlying problem does not admit sparse solutions, one looks for the best sparse approximation. Second, sparsity can also be used given prior knowledge that the model should be sparse. In these two situations, reducing parsimony to finding models with low cardinality turns out to be limiting, and structured parsimony has emerged as a fruitful practical extension, with applications to image processing, text processing or bioinformatics. In this talk, I will review recent results on structured sparsity, as it applies to machine learning and signal processing. (joint work with R. Jenatton, J. Mairal and G. Obozinski)
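
For concreteness, one building block behind many structured-sparsity solvers is the proximal operator of the (non-overlapping) group lasso penalty, i.e. blockwise soft-thresholding; a small numpy sketch with made-up groups:

    import numpy as np

    def prox_group_lasso(w, groups, t):
        """Blockwise soft-thresholding: prox of t * sum_g ||w_g||_2."""
        w = w.copy()
        for g in groups:
            norm = np.linalg.norm(w[g])
            w[g] *= max(0.0, 1.0 - t / norm) if norm > 0 else 0.0
        return w

    w = np.array([0.1, 0.2, 3.0, 4.0])
    groups = [np.array([0, 1]), np.array([2, 3])]
    print(prox_group_lasso(w, groups, t=1.0))  # the first group is zeroed entirely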

15:00 - 15:30: Alain Rakotomamonjy (LITIS / Université de Rouen, Rouen, France)

  • Title: Selecting from an infinite set of features
  • Abstract: This talk introduces a principled framework for learning with an infinite set of features or from continuously parametrized features. Such a situation occurs for instance when considering Gabor-based features in computer vision problems or when dealing with Fourier features for kernel approximations. We cast the problem as that of finding a finite subset of features that minimizes a regularized empirical risk. After analyzing the optimality conditions of such a problem, we propose a simple algorithm which has the flavour of a column-generation technique. Our experimental results on several datasets show the benefits of the proposed approach in several situations, including texture classification and large-scale kernelized problems (involving about 100,000 examples).
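
One of the cases mentioned, continuously parametrized Fourier features for kernel approximation, can be sketched as follows (random Fourier features for an RBF kernel; parameter choices are illustrative):

    import numpy as np

    def random_fourier_features(X, n_features=500, gamma=1.0, seed=0):
        """Map X so that Z @ Z.T approximates exp(-gamma * ||x - x'||^2)."""
        rng = np.random.RandomState(seed)
        W = rng.normal(scale=np.sqrt(2 * gamma), size=(X.shape[1], n_features))
        b = rng.uniform(0, 2 * np.pi, n_features)
        return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

    X = np.random.RandomState(1).randn(5, 3)
    Z = random_fourier_features(X)
    K_true = np.exp(-((X[:, None] - X[None]) ** 2).sum(-1))
    print(np.abs(Z @ Z.T - K_true).max())   # small approximation error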

15:30 - 16:00: Discussion

16:00 - 16:15: Coffee break

Session III: From priors to models

16:15 - 16:45: Marie Szafranski (ENSIIE, IBISC, Evry, France)

  • Title: Learning from different sources
  • Abstract: Many popular methods in machine learning rely on the concept of kernel and its underlying theory. With kernel functions, data can be represented as similarities between objects. When data are collected from different and possibly heterogeneous sources, kernels provide homogeneous representations of this multiple information. The framework of Multiple Kernel Learning (MKL) makes it possible to learn a general kernel from an ensemble of kernels, whose combination is optimized within the learning process. However, MKL is not meant to address problems where several kernels pertain to a single source. Composite Kernel Learning (CKL) considers problems where we have a set of kernels, partitioned into groups defined by prior information, which may correspond to subsets of sources or, more generally, distinct families of similarity measures between examples. In this talk, I will give an insight into the frameworks of MKL and CKL and illustrate their behaviour on Brain-Computer Interface experiments.
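
A hedged sketch of this setting: several kernels over heterogeneous sources are combined into one kernel for an SVM. Real MKL/CKL learns the combination weights inside the training problem; here they are fixed by hand for illustration (scikit-learn assumed):

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.metrics.pairwise import rbf_kernel, linear_kernel

    rng = np.random.RandomState(0)
    X1, X2 = rng.randn(60, 10), rng.randn(60, 4)   # two heterogeneous sources
    y = rng.randint(0, 2, 60)

    kernels = [rbf_kernel(X1), linear_kernel(X1), rbf_kernel(X2)]
    eta = np.array([0.5, 0.2, 0.3])                # MKL would optimize these
    K = sum(e * Km for e, Km in zip(eta, kernels))

    clf = SVC(kernel="precomputed").fit(K, y)
    print(clf.score(K, y))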

16:45 - 17:15: Jean-Philippe Vert (Institut Curie / Mines ParisTech, Paris, France)

  • Title: Including prior knowledge in machine learning from genomic data
  • Abstract: Estimating predictive models from high-dimensional and structured genomic data measured on a small number of samples is one of the most challenging statistical problems raised by current needs in computational biology. Popular tools in statistics and machine learning to address this issue are so-called shrinkage estimators, which minimize an empirical risk regularized by a penalty term, and which include for example support vector machines or the LASSO. In this talk I will discuss new penalty functions for shrinkage estimators, including generalizations of the LASSO which lead to particular sparsity patterns, and which can be seen as a way to include problem-specific prior information in the estimator. I will illustrate the approach by several examples such as the classification of gene expression data using gene networks as prior knowledge, or the classification and detection of frequent breakpoints in DNA copy number profiles.
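
As one concrete example of such a penalty (a sketch in the spirit of network-based regularization, not necessarily the talk's exact estimator): ridge regression penalized by a graph Laplacian, so that connected genes receive similar weights. The chain graph below is made up purely for illustration:

    import numpy as np

    rng = np.random.RandomState(0)
    n, p = 40, 100
    X, y = rng.randn(n, p), rng.randn(n)

    # Laplacian of an illustrative chain graph linking gene i to gene i+1
    A = np.diag(np.ones(p - 1), 1) + np.diag(np.ones(p - 1), -1)
    L = np.diag(A.sum(axis=1)) - A

    # w = argmin ||y - X w||^2 + lam * w' L w   (tiny ridge for invertibility)
    lam = 10.0
    w = np.linalg.solve(X.T @ X + lam * L + 1e-8 * np.eye(p), X.T @ y)
    print(w.shape)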

17:15 - 17:45: Marcel Van Gerven (Donders Institute, Nijmegen, The Netherlands)

  • Title: Percept decoding with sparse latent variable models
  • Abstract: Recent advances in machine learning and neuroimaging have made it possible to decode subjective experience from patterns of brain activity. A notable example is the reconstruction of perceived and imagined visual stimuli from BOLD data. After giving a brief introduction to this area of research, I will discuss two alternative sparse latent variable models which are ideally suited for percept decoding. The first (generative) approach combines the elastic net regularizer with deep learning, where latent variables encode increasingly complex stimulus features. The second (discriminative) approach combines the elastic net regularizer with partial least squares, leading to a new sparse orthonormalized partial least squares algorithm. The algorithm learns a sparse low-dimensional mapping between neuroimaging data and perceived stimuli using a small number of latent variables. I will conclude by pointing out some other applications and directions for future research.
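
For orientation, here is plain (non-sparse) partial least squares with a few latent variables, via scikit-learn; the sparse orthonormalized variant in the talk adds an elastic-net-style penalty on top of such a mapping:

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    rng = np.random.RandomState(0)
    X = rng.randn(100, 2000)                                    # BOLD data (voxels)
    Y = X[:, :50] @ rng.randn(50, 5) + 0.1 * rng.randn(100, 5)  # stimulus features

    pls = PLSRegression(n_components=3).fit(X, Y)  # 3 latent variables
    print(pls.predict(X).shape)                    # (100, 5)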

17:45 - 18:15: Discussion

Wednesday Nov. 9, 2011

Session IV: Towards diagnosis tools

9:00 - 9:30: Olivier Colliot (Cogimage / CRICM, Paris, France)

9:30 - 10:00: Janaina Mourao-Miranda (UCL, London, UK)

  • Title: Towards Machine Learning tools for diagnosis
  • Abstract: Pattern recognition approaches have been successfully used to classify groups of individuals (e.g. healthy controls and patients) based on their patterns of brain activity or structure. In the standard framework these approaches focus on finding group differences and are not applicable to situations where one is interested in assessing deviations from a specific class or population. In this presentation I will discuss different strategies to perform fMRI-based diagnosis of psychiatric disorders using pattern recognition approaches: (1) diagnosis and prognosis as a standard two-class problem based on patterns of brain activation for specific stimuli; (2) application of the one-class SVM to treat patient classification as an outlier detection problem.
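
Strategy (2) can be sketched directly with scikit-learn: train a one-class SVM on healthy controls only and flag patients as outliers (data simulated):

    import numpy as np
    from sklearn.svm import OneClassSVM

    rng = np.random.RandomState(0)
    controls = rng.randn(50, 200)            # activation patterns, healthy group
    patients = rng.randn(10, 200) + 0.8      # shifted stand-in for patients

    ocsvm = OneClassSVM(nu=0.1, gamma="scale").fit(controls)
    print(ocsvm.predict(patients))           # -1 marks deviations from controls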

10:00 - 10:30: John Ashburner (FIL / UCL, London, UK)

  • Title: Computational Anatomy and Machine Learning
  • Abstract: Clinical applications of machine learning with neuroimaging data often require modelling inter-subject variability. Much of the inter-subject variability visible in brain images is a result of variability among the relative shapes of the brains. For example, a difference between the positions of white-matter tracts of two populations of “aligned” scans may simply mean that the data have not been well aligned. Similarly, much of the difference between BOLD responses from different populations may simply be a result of poor alignment or some form of volumetric differences. I will present a few possible approaches that have been used for working with measures of shape, with a particular focus on methods based on diffeomorphic metric mapping.
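
As a toy 1-D illustration of one construction used in diffeomorphic methods (far from a full registration algorithm): a stationary velocity field, exponentiated by scaling and squaring, yields an invertible warp:

    import numpy as np

    x = np.linspace(0.0, 1.0, 200)
    v = 0.05 * np.sin(2 * np.pi * x)   # smooth stationary velocity field
    N = 6
    phi = x + v / 2**N                 # small initial deformation
    for _ in range(N):                 # phi <- phi o phi, doubling the flow time
        phi = np.interp(phi, x, phi)

    print(np.all(np.diff(phi) > 0))    # monotone, hence invertible on [0, 1]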

10:30 - 10:45: Coffee break

10:45 - 11:15: Arthur Gretton (Gatsby / UCL, London, UK)

  • Title: Covariate Shift by Kernel Mean Matching
  • Abstract: Assume we are given sets of observations of training and test data, where (unlike in the classical setting) the training and test distributions are allowed to differ. Thus for learning purposes, we face the problem of re-weighting the training data such that its distribution more closely matches that of the test data. We consider specifically the case where the difference in training and test distributions occurs only in the marginal distribution of the covariates: the conditional distribution of the outputs given the covariates is unchanged. We achieve covariate shift correction by matching covariate distributions between training and test sets in a high dimensional feature space (specifically, a reproducing kernel Hilbert space). This approach does not require distribution estimation, making it suited to high dimensions and structured data, where distribution estimates may not be practical. We first describe the general setting of covariate shift correction, and the importance weighting approach. While direct density estimation provides an estimate of the importance weights, this has two potential disadvantages: it may not offer the best bias/variance tradeoff, and density estimation might be difficult on complex, high dimensional domains (such as text). We then describe how distributions may be mapped to reproducing kernel Hilbert spaces (RKHS), and review distances between such mappings. We demonstrate a transfer learning algorithm that reweights the training points such that their RKHS mapping matches that of the (unlabeled) test points. The sample weights are obtained by a simple quadratic programming procedure. Our correction method yields its greatest and most consistent advantages when the learning algorithm returns a classifier/regressor that is "simpler" than the data might suggest. On the other hand, even an ideal sample reweighting may not be of practical benefit given a sufficiently powerful learning algorithm (if available).
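
The core computation can be sketched compactly: choose weights beta on the training points so that the weighted RKHS mean of the training covariates matches the test mean. The sketch below keeps only the box constraint of the full QP and drops its sum constraint (numpy/scipy assumed):

    import numpy as np
    from scipy.optimize import minimize

    def rbf(A, B, gamma=1.0):
        d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d)

    def kmm_weights(X_train, X_test, gamma=1.0, B=10.0):
        n, m = len(X_train), len(X_test)
        K = rbf(X_train, X_train, gamma)
        kappa = (n / m) * rbf(X_train, X_test, gamma).sum(axis=1)
        # minimize 0.5 b'Kb - kappa'b  subject to 0 <= b <= B
        res = minimize(lambda b: 0.5 * b @ K @ b - kappa @ b,
                       np.ones(n), jac=lambda b: K @ b - kappa,
                       bounds=[(0, B)] * n, method="L-BFGS-B")
        return res.x

    rng = np.random.RandomState(0)
    w = kmm_weights(rng.randn(50, 2), rng.randn(30, 2) + 0.5)
    print(w.mean())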

11:15 - 11:45: Edouard Duchesnay (LNAO / Neurospin, Gif-sur-Yvette, France)

  • Title: Methods to bridge the gap between genetics and clinical status using neuroimaging as an intermediate phenotype
  • Abstract: Brain imaging could be crucial as an intermediate phenotype to understand the complex path between genetics and behavioural or cognitive phenotypes. Such a vast problem can be addressed through a two-step approach: (i) searching for the brain imaging phenotypes associated with clinical status; (ii) searching for the genetic variability associated with brain imaging variability. The high-dimensional nature of genetic and neuroimaging data requires machine learning algorithms to be redesigned in order to identify significant associations between blocks of information. We investigate dimension reduction and regularization strategies to (i) identify predictive biomarkers of autism from functional PET brain scans on a dataset with highly imbalanced group sizes; (ii) identify the genetic variability associated with functional MRI variability using latent variable models.
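
Step (ii) with latent variable models can be sketched, for instance, as a canonical correlation analysis between a genetic block and an imaging block (scikit-learn assumed; both blocks simulated):

    import numpy as np
    from sklearn.cross_decomposition import CCA

    rng = np.random.RandomState(0)
    shared = rng.randn(100, 2)                            # shared latent variables
    G = shared @ rng.randn(2, 50) + rng.randn(100, 50)    # SNP-like block
    B = shared @ rng.randn(2, 30) + rng.randn(100, 30)    # imaging block

    cca = CCA(n_components=2).fit(G, B)
    Gc, Bc = cca.transform(G, B)
    print(np.corrcoef(Gc[:, 0], Bc[:, 0])[0, 1])          # first canonical correlation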

11:45 - 12:15: Discussion

12:15 - 13:30: Lunch

Session V: Graphs and connectivity

13:30 - 14:00: Jonas Richiardi (EPFL, Lausanne, Switzerland)

  • Title: Classifying brain connectivity data using graph embeddings
  • Abstract: Beyond classical "brain decoding" (pattern recognition based on BOLD activations), another view of the data can be gained by looking at functional connectivity, which considers temporal correlations between brain regions at rest or during tasks. Such connectivity is typically represented as a graph, and much literature has focused on group-level, post-hoc analysis of such graphs in terms of properties such as local efficiency, clustering coefficient, and so on. However, it is of interest in many applications to be able to perform predictive modelling with connectivity graphs. This talk will focus on an emerging technique we recently proposed, connectivity-based decoding, which can be used both for brain state decoding and for clinical applications such as diagnosis. After a whole-brain regional functional connectivity graph has been established, the problem can be cast as a labelled graph classification task. We will show that the graphs of interest form a restricted class whose properties, in particular the fact that the vertex correspondence problem does not need to be solved, prevent the application of classical graph matching algorithms (which focus on recovering a permutation matrix) to elicit a useful distance or dissimilarity between graphs. We will instead advocate the use of graph embedding methods, which have performed very well in many engineering areas such as computer vision and data mining. Thus, the effort is spent on finding an effective graph representation for discriminative learning. We will present several vector space representations of graphs that are suitable for the class of graphs of interest, and discuss experimental results on neuroimaging data.
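
A minimal sketch of this pipeline, using the simplest embedding (the upper triangle of each subject's regional correlation matrix as a feature vector) and a linear SVM on simulated data:

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.RandomState(0)

    def embed(ts):
        """Vectorize the functional connectivity graph of one subject."""
        C = np.corrcoef(ts.T)                   # regions x regions correlations
        return C[np.triu_indices_from(C, k=1)]  # edge weights as a feature vector

    X = np.array([embed(rng.randn(120, 30)) for _ in range(40)])  # 40 subjects
    y = rng.randint(0, 2, 40)                   # e.g. brain state or diagnosis
    print(SVC(kernel="linear").fit(X, y).score(X, y))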

14:00 - 14:30: Sylvain Takerkart (INCM / INT, Marseille, France)

  • Title: Designing graphs and graph-kernels to characterize cortical representations measured with functional MRI
  • Abstract: Supervised classification algorithms have recently received considerable attention for analyzing spatial patterns of functional MRI data, and thereby performing image-based identification (“decoding”) of mental states. However, most algorithms used to perform this task are blind to the structure underlying the data. We here present how to design graphical representations to model local activation patterns measured with fMRI. We then introduce custom-designed graph kernels that we use to perform supervised classification in graph space, thus taking into account the intrinsic spatial structure of the data.
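
As a hedged illustration (a generic convolution-type kernel, not the talk's specific design), a kernel between two attributed activation graphs can sum, over node pairs, a Gaussian match on node positions times a Gaussian match on activations:

    import numpy as np

    def node_graph_kernel(pos1, act1, pos2, act2, g_pos=1.0, g_act=1.0):
        d_pos = ((pos1[:, None, :] - pos2[None, :, :]) ** 2).sum(-1)
        d_act = (act1[:, None] - act2[None, :]) ** 2
        return np.sum(np.exp(-g_pos * d_pos) * np.exp(-g_act * d_act))

    rng = np.random.RandomState(0)
    pos1, act1 = rng.rand(12, 3), rng.randn(12)   # nodes: coordinates + activation
    pos2, act2 = rng.rand(15, 3), rng.randn(15)
    print(node_graph_kernel(pos1, act1, pos2, act2))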

14:30 - 15:00: Thomas Gärtner (KDML / Universität Bonn, Bonn, Germany)

  • Title: Learning from Structured Data
  • Abstract: Most real-world data is inherently structured: people in a social network are related, movies share the same authors and/or directors, molecules consist of atoms connected by bonds, etc. Most machine learning algorithms, however, cannot directly cope with this inherent structure; instead, many can only deal with data that is represented by a single row in a single table. In this talk I will give examples of how kernel methods and other linear classifiers can be extended to handle structured input as well as output variables. I will concentrate on highlighting conceptual differences and similarities.

15:00 - 15:30: Discussion

Session VI: Concluding Remarks and Farewell

