Representational Similarity Analysis¶
Written by Luke Chang
Representational Similarity Analysis (RSA) is a multivariate technique that allows one to link disparate types of data based on shared structure in their similarity (or distance) matrices. The technique was initially proposed by Nikolaus Kriegeskorte in 2008. Unlike multivariate prediction/classification, RSA does not attempt to directly map brain activity onto a measure; instead, it compares the similarity structure of brain activity with the similarity structure of the measure, a second-order isomorphism. This sounds complicated, but it is conceptually quite simple.
In this tutorial we will cover:
1. how to embed stimuli in a feature space
2. how to compute similarity or distance within this feature space
3. how to map variations in position within this embedding space onto brain processes
For a more in-depth treatment, we recommend reading this excellent tutorial written by Mark Thornton for the 2018 MIND Summer School, or watching his video walkthrough.
Let's work through a quick example to outline the specific steps involved in RSA to make this more concrete.
RSA Overview
Imagine that you view 96 different images in the scanner, and we are interested in learning more about how the brain processes information about these stimuli. Do we categorize these images into different groups? If so, how do we do this? It might depend on which features, or aspects, of the stimuli we consider.

Feature embedding space
Each image can be described by a number of different variables. These variables could be relatively abstract, such as whether the object is animate or inanimate, or they could be lower level, such as what color the object is. We can group different variables together into a feature space and then describe each image using this space. Think of each feature as an axis in a multidimensional space. For example, consider the two dimensions of animacy and softness. Where would a puppy, a rock, and a pillow be positioned within this two-dimensional embedding space?
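As a toy illustration (the feature values below are made up), we can place these three objects in a two-dimensional feature space and compute the distances between them:

import pandas as pd
from sklearn.metrics import pairwise_distances

# Hypothetical feature values: each row is an object, each column a feature axis
features = pd.DataFrame({'animacy': [1.0, 0.0, 0.0],
                         'softness': [0.8, 0.1, 0.9]},
                        index=['puppy', 'rock', 'pillow'])

# Euclidean distance between each pair of objects in this embedding space
feature_distance = pairwise_distances(features, metric='euclidean')
print(pd.DataFrame(feature_distance, index=features.index, columns=features.index))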
Brain embedding space

We can also embed each image in brain space. Suppose we were interested in identifying the relationships among images within the visual cortex. Here the embedding space is each voxel's activation within the visual cortex; voxels are now the axes of the representational space. To calculate this we need to create a brain map for every single image using a single-trial model. We can use a standard first-level GLM to estimate a separate beta map for each image while simultaneously removing noise from the signal by including nuisance covariates (e.g., head motion, spikes, discrete cosine transform filters, etc.).
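In code, this amounts to stacking the single-trial beta maps into an images x voxels matrix and computing the distance between every pair of rows. Here is a minimal sketch with random numbers standing in for the beta estimates:

import numpy as np
from sklearn.metrics import pairwise_distances

n_images, n_voxels = 96, 5000
brain_patterns = np.random.randn(n_images, n_voxels)  # stand-in for single-trial beta maps within visual cortex
brain_rdm = pairwise_distances(brain_patterns, metric='correlation')  # 96 x 96 distance matrix between image patterns
print(brain_rdm.shape)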
Mapping stimuli relationships onto brain

Now we can test different hypotheses about how the brain might be processing information about each image by mapping the representational structure of the brain space onto different feature spaces. To make an inference about what each region of the brain is computing, we can evaluate whether any regions exhibit a similar structure to the feature representational space. Remember, these matrices are typically symmetric (unless there is a directed relationship), so you can extract either the upper or the lower triangle of each matrix. These triangles are then flattened, or vectorized, which makes it easy to evaluate the similarity between the brain and feature matrices. Because these relationships are not necessarily linear, it is common to use a Spearman rank correlation to examine monotonic relationships between different similarity matrices.
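Putting these pieces together, the comparison itself is just a rank correlation between the vectorized triangles of the two matrices. A minimal sketch, with simulated matrices standing in for a real brain and feature matrix:

import numpy as np
from scipy.stats import spearmanr

n_items = 96
rng = np.random.default_rng(0)

# Simulated stand-ins for a brain distance matrix and a feature (model) distance matrix
brain_rdm = rng.random((n_items, n_items))
brain_rdm = (brain_rdm + brain_rdm.T) / 2  # symmetrize
feature_rdm = rng.random((n_items, n_items))
feature_rdm = (feature_rdm + feature_rdm.T) / 2

# Vectorize the upper triangle (excluding the diagonal) and compute a Spearman rank correlation
triu = np.triu_indices(n_items, k=1)
rho, p = spearmanr(brain_rdm[triu], feature_rdm[triu])
print(rho, p)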
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from nltools.data import Brain_Data, Adjacency
from nltools.mask import expand_mask, roi_to_brain
from nltools.stats import fdr, threshold, fisher_r_to_z, one_sample_permutation
from sklearn.metrics import pairwise_distances
from nilearn.plotting import plot_glass_brain, plot_stat_map, view_img_on_surf, view_img
from dartbrains_tools.data import get_subjects, get_file, CONDITIONS
Single Subject Pattern Similarity¶
Recall that in the Single Subject Model Lab we ran single-subject models for 10 different regressors for the Pinel Localizer task. In this tutorial, we will use those results to learn how to conduct RSA-style analyses.
First, let's get a list of all of the subject IDs and load the beta values from each condition for a single subject into a Brain_Data object. We will be using the output of the first-level model we ran for each participant, where we saved a separate file for each task condition. See this code from the group analysis tutorial for reference.
_sub = 'S01'
_file_list = [get_file(_sub, 'betas', cond) for cond in CONDITIONS]
conditions = CONDITIONS
beta = Brain_Data(_file_list)
Next we will compute the pattern similarity across each beta image. We could do this for the whole brain, but it probably makes more sense to look within a single region. Many papers use a searchlight approach, in which they examine the pattern similarity within a small sphere centered on each voxel. The advantage of this approach is that it uses the same number of voxels across searchlights and allows one to investigate the spatial topography at a relatively fine scale. However, this procedure is fairly computationally expensive, as it must be computed over every voxel and, just like univariate analyses, will require stringent correction for multiple tests, as we learned in tutorial 12: Thresholding Group Analyses. Personally, I prefer whole-brain parcellations, as they provide a nice balance between spatial specificity and computational efficiency. In this tutorial, we will continue to use functional regions of interest from our 50-ROI Neurosynth parcellation. This covers the entire brain at a relatively coarse spatial granularity, but requires several orders of magnitude fewer computations than a voxelwise searchlight approach. This means it will run much faster and will require a considerably less stringent statistical threshold to correct for all independent tests. For example, for 50 tests the Bonferroni-corrected threshold is p < 0.001 (i.e., 0.05/50). If we ever wanted finer spatial granularity, we could use increasingly larger parcellations (e.g., 100 or 200 ROIs).
Let's load our parcellation mask so that we can examine the pattern similarity across these conditions for each ROI.
mask = Brain_Data('https://neurovault.org/media/images/8423/k50_2mm.nii.gz')
mask_x = expand_mask(mask)
mask.plot()
Ok, now we will want to calculate the pattern similarity within each ROI across the 10 conditions.
We will loop over each ROI, extract the pattern data across all conditions, and then compute the correlation distance between each pair of conditions. This will give us an Adjacency object, which we discussed in Lab 13: Connectivity. We will temporarily store these in a list.
Notice that for each iteration of the loop we apply the ROI mask to our beta images and then calculate the correlation distance.
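A minimal version of this loop, using the same correlation-distance computation that appears in the group analysis later in this tutorial:

out = []
for _m in mask_x:
    # apply the ROI mask to the 10 beta images and compute correlation distance between conditions
    out.append(beta.apply_mask(_m).distance(metric='correlation'))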
Let's plot an example ROI and its associated distance matrix.
Here is a left motor parcel. Notice how small the distance is between the motor-left auditory and visual beta images, and between the motor-right auditory and visual beta images?
roi = 26  # index of a left motor parcel in the 50-ROI parcellation
plot_glass_brain(mask_x[roi].to_nifti())
out[roi].labels = conditions  # label the rows/columns with the task conditions
f2 = out[roi].plot(vmin=0, vmax=2, cmap='RdBu_r')  # plot the correlation distance matrix
We can also visualize this distance matrix using multidimensional scaling with the plot_mds() method. This method projects the images onto either a 2-D or 3-D plane for ease of visualization. There are many other distance-based projection methods, such as t-SNE and UMAP; we encourage readers to check out the excellent hypertools package, which has a great implementation of all of these methods.
Notice how the motor-right visual and auditory conditions are near each other, while the motor-left visual and auditory conditions are grouped together further away?
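For example, a 2-D projection of the left motor ROI's distance matrix (a minimal sketch using plot_mds() with its default settings):

out[roi].plot_mds(n_components=2)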
It's completely fine to continue working with distance values, but to make things slightly more intuitive we will convert them to similarity. For correlation distance, this entails subtracting each value from 1, which yields similarity scores in the form of Pearson correlations. If you are using unbounded metrics (e.g., Euclidean distance), use the distance_to_similarity() Adjacency method instead.
We also add the conditions as labels to each object, which makes the plots easier to read.
out_sim = []
for _d in out:
    _sim = _d.copy()
    _sim = 1 - _sim  # convert correlation distance to similarity (Pearson r)
    _sim.labels = conditions
    out_sim.append(_sim)
Let's look at a heatmap of the similarity matrix, where redder colors indicate greater similarity between patterns within the left motor cortex across conditions. The Adjacency class only stores the lower triangle for similarity and distance matrices. At the moment, it incorrectly sets the diagonal to zero when plotting the similarity matrix; the diagonal should be a vector of ones.
roi_1 = 26
plot_glass_brain(mask_x[roi_1].to_nifti())
_f = out_sim[roi_1].plot(vmin=-1, vmax=1, cmap='RdBu_r')
Testing a Representation Hypothesis¶
Ok, now that we have a sense of what the similarity of patterns looks like in left motor cortex, let's create an adjacency matrix indicating a specific relationship between left-hand finger tapping across the auditory and visual conditions. This type of adjacency matrix is one way in which we can test a specific hypothesis about the representational structure of the data across all images.
As you can see, this only includes edges for the motor-left auditory and motor-left visual conditions.
# Hypothesis matrix: 1 indicates conditions predicted to share a similar pattern
motor = np.zeros((len(conditions), len(conditions)))
motor[np.diag_indices(len(conditions))] = 1
# add symmetric edges linking the related conditions
motor[1, 7] = 1
motor[7, 1] = 1
motor[2, 8] = 1
motor[8, 2] = 1
motor = Adjacency(motor, matrix_type='distance', labels=conditions)
motor.plot()
Now let's search over all ROIs to see whether any match this representational structure of left motor cortex, using the similarity() method from the Adjacency class. This function uses Spearman rank correlations by default, which is probably a good idea since we are most interested in monotonic relationships between the two similarity matrices.
The similarity() method also computes permutations within columns and rows by default. To speed up the analysis, we will set the number of permutations to zero (i.e., n_permute=0).
motor_sim_r = []
for _m in out_sim:
    # Spearman correlation between this ROI's similarity matrix and the motor hypothesis matrix
    _s = _m.similarity(motor, metric='spearman', n_permute=0)
    motor_sim_r.append(_s['correlation'])
This returns a similarity score for each ROI; we can plot the distribution of these 50 \(\rho\) values.
plt.hist(motor_sim_r)
plt.xlabel('Similarity (rho)', fontsize=18)
plt.ylabel('Frequency', fontsize=18)
We can also plot these RSA values back onto the brain to see the spatial distribution.
rsa_motor = roi_to_brain(motor_sim_r, mask_x)
plot_stat_map(rsa_motor.to_nifti(), draw_cross=False, display_mode='z', black_bg=True, cut_coords=np.arange(-30,70, 15))
Notice how left motor cortex is among the ROIs with the highest similarity value? Unfortunately, we can only plot the similarity values and can't threshold them yet because we didn't calculate any p-values.
We could calculate p-values using a permutation test, but this would require us to repeatedly recalculate the similarity between the two matrices and would take a long time (i.e., 5,000 correlations x 50 ROIs). Plus, the inference we want to make isn't really at the single-subject level, but across participants.
Let's now run this same analysis across all participants and run a one-sample t-test across each ROI.
RSA Group Inference¶
Here we calculate the RSA for each ROI for every participant. This will take a little bit of time to run (30 participants X 50 ROIs).
sub_list = get_subjects()
all_sub_similarity = {}
all_sub_motor_rsa = {}
for _sub in sub_list:
    _file_list = [get_file(_sub, 'betas', cond) for cond in CONDITIONS]
    _beta = Brain_Data(_file_list)
    sub_pattern = []
    _motor_sim_r = []
    for _m in mask_x:
        # similarity matrix across conditions within this ROI for this subject
        sub_pattern_similarity = 1 - _beta.apply_mask(_m).distance(metric='correlation')
        sub_pattern_similarity.labels = CONDITIONS
        # compare this subject's ROI similarity matrix to the motor hypothesis matrix
        _s = sub_pattern_similarity.similarity(motor, metric='spearman', n_permute=0)
        sub_pattern.append(sub_pattern_similarity)
        _motor_sim_r.append(_s['correlation'])
    all_sub_similarity[_sub] = sub_pattern
    all_sub_motor_rsa[_sub] = _motor_sim_r
all_sub_motor_rsa = pd.DataFrame(all_sub_motor_rsa).T
Now let's run a one-sample test on each ROI to see which ROIs are consistently different from zero across our sample of participants. Because these are r values, we will first perform a Fisher r-to-z transformation. We will use a non-parametric permutation sign test as our null hypothesis test. This will take a minute to run, as we will be calculating 5,000 permutations for each of the 50 ROIs (though these permutations are parallelized across cores).
rsa_stats = []
for i in all_sub_motor_rsa:
    # permutation sign test on the Fisher z-transformed RSA values for this ROI
    rsa_stats.append(one_sample_permutation(fisher_r_to_z(all_sub_motor_rsa[i])))
We can plot a thresholded map using the FDR-corrected p-value as the threshold.
fdr_p = fdr(np.array([x['p'] for x in rsa_stats]), q=0.05)  # FDR-corrected p-value threshold across the 50 tests
print(fdr_p)
rsa_motor_r = Brain_Data([x*y['mean'] for x, y in zip(mask_x, rsa_stats)]).sum()  # project each ROI's mean z back onto the brain
rsa_motor_p = Brain_Data([x*y['p'] for x, y in zip(mask_x, rsa_stats)]).sum()  # project each ROI's p-value back onto the brain
thresholded = threshold(rsa_motor_r, rsa_motor_p, thr=fdr_p)
plot_glass_brain(thresholded.to_nifti(), cmap='coolwarm')
Looks like nothing survives FDR. Let's try a more liberal uncorrected threshold.
thresholded_1 = threshold(rsa_motor_r, rsa_motor_p, thr=0.01)
plot_glass_brain(thresholded_1.to_nifti(), cmap='coolwarm')
This looks a little better and makes it clear that the ROI that has the highest similarity with our model specifying the representational structure of motor cortex is precisely the left motor cortex. Though this only shows up using a liberal uncorrected threshold, remember that we are only using about 9 subjects for this demo.
Though this was a very simple model using a very simple dataset, hopefully this example has illustrated how straightforward it is to run RSA style analyses using any type of data. Most people use this method to examine representations of much more complicated feature spaces.
It is also possible to combine different hypotheses or models to decompose a brain representational similarity matrix. This approach typically uses regression rather than correlation (see Parkinson et al., 2017).
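A minimal sketch of this idea (hypothetical single-regressor version; in practice you would stack several model matrices as columns of the design matrix):

import numpy as np
from sklearn.linear_model import LinearRegression

# Each Adjacency object stores its vectorized lower triangle in the .data attribute
brain_vec = out_sim[26].data  # similarity matrix for the left motor ROI used above
model_matrix = np.column_stack([motor.data])  # add more model vectors as additional columns
reg = LinearRegression().fit(model_matrix, brain_vec)
print(reg.coef_)  # one coefficient per model matrix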
We have recently used this technique in our own work to map similarity in brain space to individual variability in a computational model of decision-making. If you are interested in seeing an example of intersubject RSA (IS-RSA), check out the paper, and accompanying python code.