Multivariate Prediction

Written by Luke Chang

The statistical methods we have discussed in this course so far have primarily been concerned with modeling activation in a single voxel and testing hypotheses in the form of “where in the brain is activation significantly greater in condition A relative condition B”. As you may recall this involved using multilevel modeling with the GLM, where a voxel’s time course was modeled using a first level GLM and then the contrast effect was aggregated across participants in the second level model. This procedure is often referred to as mass univariate testing and requires carefully considering how to correct for the many tests across voxels.

\[\text{voxel} = \beta \cdot \text{task model} + \beta \cdot \text{covariates} + \epsilon\]

A completely different approach to the problem is to reverse the regression equation and identify patterns of voxel activations that predict an outcome. This might be classifying between different conditions of a task, or predicting the intensity of a continuous outcome measure (e.g., emotion, pain, working memory load, etc).

\[\text{outcome} = \sum_{i}^n \beta_i\cdot \text{voxel}_i + \epsilon\]

Here we are learning a model of \(\beta\) values across the brain that when multiplied by new data will predict the intensity of a psychological state or outcome or the probability of being a specific state.

\[\text{predicted outcome} = \text{model} \cdot \text{brain data}\]

This is the general approach behind supervised learning algorithms. The intuition behind this approach is that brain signal might not be functionally localizable to a single region, but instead might be distributed throughout the brain. Patterns of brain activity can thus be used to decode psychological states.

The focus of this supervised learning approach is to accurately predict or classify the outcome, whereas the goal in classical statistics is to test hypotheses about which regressor explains the most independent variance of the dependent variable. These two approaches are complementary, but require thinking carefully about different issues.

In mass-univariate testing, we spent a lot of time thinking carefully about independence of errors (e.g., multi-level models) and correcting for multiple hypothesis tests. In multivariate prediction/classification or multivoxel pattern analysis as it is often called, we need to carefully think about feature selection - which voxels we will include in our model, and cross-validation - how well our model will generalize to new data. In MVPA, we typically have more features (i.e., voxels) then data points (i.e., n < p), so this requires performing feature selection or a data reduction step. The algorithms used to learn patterns come from the field of machine-learning are very good at detecting patterns, but have a tendency to overfit the training data.

In this tutorial, we will cover the basic steps of multivariate prediction/classification:

  1. Data Extraction - what data should we use to train model?

  2. Feature Selection - which features or voxels should we use?

  3. Cross-validation - how do we train and test the model?

  4. Model Interpretation - how should we interpret the model?

Why MVPA?

For most of our tutorials, we have tended to focus on the basics of how to perform a specific type of analysis, and have largely ignored questions about why we might be interested in specific questions. While univariate GLM analyses allow us to localize which regions might be associated with a specific psychological process, MVPA analyses, and specifically multivariate decoding, allows us to identify distributed representations throughout the brain. These might be localizable to a single region, or may be diffusely encompass many different brain systems.

Before we dive into the details of how to conduct these analyses, let’ learn a little bit about the theoretical background motivating this appraoch in two videos from Tor Wager.

from IPython.display import YouTubeVideo

YouTubeVideo('87yKz23sPnE')
YouTubeVideo('FAyPEr7eu4M')

Important MVPA Concepts

Now, we are ready to dive into the details. In this tutorial, we will be using the nltools toolsbox to run these models, but also see (nilearn, brainiak, and pyMPVA) for excellent alternatives.

Running MVPA style analyses using multivariate regression is surprisingly easier and faster than univariate methods. All you need to do is specify the algorithm and cross-validation parameters. Currently, we have several different linear algorithms implemented from scikit-learn in the nltools package.

To make sure you understand all of the key concepts involved in the practical aspects of conducting MVPA, let’s watch two short videos by Martin Lindquist before we dive into the code.

YouTubeVideo('dJIb5bzkQHQ')
YouTubeVideo('zKMsJyiL5Dc')

Data Extraction

The first step in MVPA is to decide what data you want to use to predict your outcome variable. Typically, researchers perform a temporal data reduction step, which involves estimating a standard univariate GLM using a single subject first-level model. This model will specify regressors for a single trial, or model a specific condition type over many trials. Just as in the standard univariate approach, these regressors are convolved with an HRF function. These models also usually include nuisance covariates (e.g., motion parameters, spikes, filters, linear trends, etc.). The estimated beta maps from this temporal reduction step are then used as the input data into the prediction model. Note that it is also possible to learn a spatiotemporal model that includes the voxels from each TR measured during a a given trial, but this is less common in practice.

First, let’s load the modules we need for this analysis.

%matplotlib inline

import os
import glob
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from nltools.data import Brain_Data
from nltools.mask import expand_mask
from bids import BIDSLayout, BIDSValidator
from nilearn.plotting import view_img_on_surf

data_dir = '../data/localizer'
layout = BIDSLayout(data_dir, derivatives=True)

Now let’s load some data to train a model.

In this example, let’s continue to use data from the Pinel Localizer Task that we have been using throughout all of our tutorials. For our first analysis, let’s attempt to classify Left from Right motor activation. We will load a single beta image for each subject that we already estimated in earlier tutorials. We are sorting the files so that subjects are in the same order, then we are stacking all of the images together using .append() such that the data looks like \(Subject_{1, left}, ... Subject_{n, left}, Subject_{1, right}, ... Subject_{n, right}\).

left_file_list = glob.glob(os.path.join(data_dir, 'derivatives','fmriprep', '*','func','*_video_left*.nii.gz'))
left_file_list.sort() 
left = Brain_Data(left_file_list)

right_file_list = glob.glob(os.path.join(data_dir, 'derivatives','fmriprep', '*','func','*_video_right*.nii.gz'))
right_file_list.sort()
right = Brain_Data(right_file_list)

data = left.append(right)

Next, we need to create the labels or outcome variable to train the model. We will make a vector of ones and zeros to indicate left images and right images, respectively.

We assign this vector to the data.Y attribute of the Brain_Data instance.

Y = pd.DataFrame(np.hstack([np.ones(len(left_file_list)), np.zeros(len(left_file_list))]))

data.Y = Y

okay, we are ready to go. Let’s now train our first model. We will use a support vector machine (SVM) to learn a pattern that can discriminate left from right motor responses across all 9 participants.

svm_stats = data.predict(algorithm='svm', **{'kernel':"linear"})
overall accuracy: 1.00
../_images/Multivariate_Prediction_14_1.png

the results of this analysis are stored in a dictionary.

  • Y: training labels

  • yfit_all: predicted labels

  • dist_from_hyperplane_all: how far the prediction is from the classifier hyperplane through feature space, > 0 indicates left, while < 0 indicates right.

  • intercept: scalar value which indicates how much to add to the prediction to get the correct class label.

  • weight_map: multivariate brain model

  • mcr_all: overall model accuracy in classifying training data

print(svm_stats.keys())
dict_keys(['Y', 'yfit_all', 'dist_from_hyperplane_all', 'intercept', 'weight_map', 'mcr_all'])

You can see that that the model can perfectly discriminate between left and right using the training data. This is great, but we definitely shouldn’t get our hopes up as this model is completely being overfit to the training data. To get an unbiased estimate of the accuracy we will need to test the model on independent data.

We can also examine the model weights more thoroughly by plotting it. This shows that we see a very nice expected motor cortex representation, but notice that there are many other regions also contributing to the prediction.

view_img_on_surf(svm_stats['weight_map'].to_nifti())