Introduction to Neuroimaging Data

In this tutorial we will learn the basics of the organization of data folders, and how to load, plot, and manipulate neuroimaging data in Python.

To introduce the basics of fMRI data structures, watch this short video by Martin Lindquist.

from IPython.display import YouTubeVideo


Software Packages

There are many different software packages to analyze neuroimaging data. Most of them are open source and free to use (with the exception of BrainVoyager). The most popular ones (SPM, FSL, & AFNI) have been around a long time and are where many new methods are developed and distributed. These packages have focused on implementing what they believe are the best statistical methods, ease of use, and computational efficiency. They have very large user bases so many bugs have been identified and fixed over the years. There are also lots of publicly available documentation, listserves, and online tutorials, which makes it very easy to get started using these tools.

There are also many more boutique packages that focus on specific types of preprocessing step and analyses such as spatial normalization with ANTs, connectivity analyses with the conn-toolbox, representational similarity analyses with the rsaToolbox, and prediction/classification with pyMVPA.

Many packages have been developed within proprietary software such as Matlab (e.g., SPM, Conn, RSAToolbox, etc). Unfortunately, this requires that your university has site license for Matlab and many individual add-on toolboxes. If you are not affiliated with a University, you may have to pay for Matlab, which can be fairly expensive. There are free alternatives such as octave, but octave does not include many of the add-on toolboxes offered by matlab that may be required for a specific package. Because of this restrictive licensing, it is difficult to run matlab on cloud computing servers and to use with free online courses such as dartbrains. Other packages have been written in C/C++/C# and need to be compiled to run on your specific computer and operating system. While these tools are typically highly computationally efficient, it can sometimes be challenging to get them to install and work on specific computers and operating systems.

There has been a growing trend to adopt the open source Python framework in the data science and scientific computing communities, which has lead to an explosion in the number of new packages available for statistics, visualization, machine learning, and web development. pyMVPA was an early leader in this trend, and there are many great tools that are being actively developed such as nilearn, brainiak, neurosynth, nipype, fmriprep, and many more. One exciting thing is that these newer developments have built on the expertise of decades of experience with imaging analyses, and leverage changes in high performance computing. There is also a very tight integration with many cutting edge developments in adjacent communities such as machine learning with scikit-learn, tensorflow, and pytorch, which has made new types of analyses much more accessible to the neuroimaging community. There has also been an influx of younger contributors with software development expertise. You might be surprised to know that many of the popular tools being used had core contributors originating from the neuroimaging community (e.g., scikit-learn, seaborn, and many more).

For this course, I have chosen to focus on tools developed in Python as it is an easy to learn programming language, has excellent tools, works well on distributed computing systems, has great ways to disseminate information (e.g., jupyter notebooks, jupyter-book, etc), and is free! If you are just getting started, I would spend some time working with NiLearn and Brainiak, which have a lot of functionality, are very well tested, are reasonably computationally efficient, and most importantly have lots of documentation and tutorials to get started.

We will be using many packages throughout the course such as PyBids to navigate neuroimaging datasets, fmriprep to perform preprocessing, and nltools, which is a package developed in my lab, to do basic data manipulation and analysis. NLtools is built using many other toolboxes such as nibabel and nilearn, and we will also be using these frequently throughout the course.

BIDS: Brain Imaging Dataset Specification

Recently, there has been growing interest to share datasets across labs and even on public repositories such as openneuro. In order to make this a successful enterprise, it is necessary to have some standards in how the data are named and organized. Historically, each lab has used their own idiosyncratic conventions, which can make it difficult for outsiders to analyze. In the past few years, there have been heroic efforts by the neuroimaging community to create a standardized file organization and naming practices. This specification is called BIDS for Brain Imaging Dataset Specification.

As you can imagine, individuals have their own distinct method of organizing their files. Think about how you keep track of your files on your personal laptop (versus your friend). This may be okay in the personal realm, but in science, it’s best if anyone (especially yourself 6 months from now!) can follow your work and know which files mean what by their titles.

Here’s an example of non-Bids versus BIDS dataset found in this paper:


Here are a few major differences between the two datasets:

  1. In BIDS, files are in nifti format (not dicoms).

  2. In BIDS, scans are broken up into separate folders by type of scan(functional versus anatomical versus diffusion weighted) for each subject.

  3. In BIDS, JSON files are included that contain descriptive information about the scans (e.g., acquisition parameters)

Not only can using this specification be useful within labs to have a set way of structuring data, but it can also be useful when collaborating across labs, developing and utilizing software, and publishing data.

In addition, because this is a consistent format, it is possible to have a python package to make it easy to query a dataset. We recommend using pybids.

The dataset we will be working with has already been converted to the BIDS format (see download localizer tutorial).

You may need to install pybids to query the BIDS datasets using following command !pip install pybids.

The BIDSLayout

Pybids is a package to help query and navigate a neuroimaging dataset that is in the BIDs format. At the core of pybids is the BIDSLayout object. A BIDSLayout is a lightweight Python class that represents a BIDS project file tree and provides a variety of helpful methods for querying and manipulating BIDS files. While the BIDSLayout initializer has a large number of arguments you can use to control the way files are indexed and accessed, you will most commonly initialize a BIDSLayout by passing in the BIDS dataset root location as a single argument.

Notice we are setting derivatives=True. This means the layout will also index the derivatives sub folder, which might contain preprocessed data, analyses, or other user generated files.

from bids import BIDSLayout, BIDSValidator
import os

data_dir = '../data/localizer'
layout = BIDSLayout(data_dir, derivatives=True)
BIDS Layout: ...thub/dartbrains/data/localizer | Subjects: 2 | Sessions: 0 | Runs: 0

When we initialize a BIDSLayout, all of the files and metadata found under the specified root folder are indexed. This can take a few seconds (or, for very large datasets, a minute or two). Once initialization is complete, we can start querying the BIDSLayout in various ways. The main query method is .get(). If we call .get() with no additional arguments, we get back a list of all the BIDS files in our dataset.

Let’s return the first 10 files

[<BIDSJSONFile filename='/Users/lukechang/Github/dartbrains/data/localizer/dataset_description.json'>,
 <BIDSFile filename='/Users/lukechang/Github/dartbrains/data/localizer/derivatives/fmriprep/.DS_Store'>,
 <BIDSJSONFile filename='/Users/lukechang/Github/dartbrains/data/localizer/derivatives/fmriprep/dataset_description.json'>,
 <BIDSFile filename='/Users/lukechang/Github/dartbrains/data/localizer/derivatives/fmriprep/logs/CITATION.bib'>,
 <BIDSFile filename='/Users/lukechang/Github/dartbrains/data/localizer/derivatives/fmriprep/logs/CITATION.html'>,
 <BIDSFile filename='/Users/lukechang/Github/dartbrains/data/localizer/derivatives/fmriprep/logs/'>,
 <BIDSFile filename='/Users/lukechang/Github/dartbrains/data/localizer/derivatives/fmriprep/logs/CITATION.tex'>,
 <BIDSFile filename='/Users/lukechang/Github/dartbrains/data/localizer/derivatives/fmriprep/sub-S01.html'>,
 <BIDSFile filename='/Users/lukechang/Github/dartbrains/data/localizer/derivatives/fmriprep/sub-S01/.DS_Store'>,
 <BIDSJSONFile filename='/Users/lukechang/Github/dartbrains/data/localizer/derivatives/fmriprep/sub-S01/anat/sub-S01_desc-brain_mask.json'>]

As you can see, just a generic .get() call gives us all of the files. We will definitely want to be a bit more specific. We can specify the type of data we would like to query. For example, suppose we want to return the first 10 subject ids.

layout.get(target='subject', return_type='id', scope='derivatives')[:10]
['S01', 'S02', 'S03', 'S04', 'S05', 'S06', 'S07', 'S08', 'S09', 'S10']

Or perhaps, we would like to get the file names for the raw bold functional nifti images for the first 10 subjects. We can filter files in the raw or derivatives, using scope keyword.scope='raw', to only query raw bold nifti files.

layout.get(target='subject', scope='raw', suffix='bold', return_type='file')[:10]

When you call .get() on a BIDSLayout, the default returned values are objects of class BIDSFile. A BIDSFile is a lightweight container for individual files in a BIDS dataset.

Here are some of the attributes and methods available to us in a BIDSFile (note that some of these are only available for certain subclasses of BIDSFile; e.g., you can’t call get_image() on a BIDSFile that doesn’t correspond to an image file!):

  • .path: The full path of the associated file

  • .filename: The associated file’s filename (without directory)

  • .dirname: The directory containing the file

  • .get_entities(): Returns information about entities associated with this BIDSFile (optionally including metadata)

  • .get_image(): Returns the file contents as a nibabel image (only works for image files)

  • .get_df(): Get file contents as a pandas DataFrame (only works for TSV files)

  • .get_metadata(): Returns a dictionary of all metadata found in associated JSON files

  • .get_associations(): Returns a list of all files associated with this one in some way

Let’s explore the first file in a little more detail.

f = layout.get()[0]
<BIDSJSONFile filename='/Users/lukechang/Github/dartbrains/data/localizer/dataset_description.json'>

If we wanted to get the path of the file, we can use .path.


Suppose we were interested in getting a list of tasks included in the dataset.


We can query all of the files associated with this task.

layout.get(task='localizer', suffix='bold', scope='raw')[:10]
[<BIDSImageFile filename='/Users/lukechang/Github/dartbrains/data/localizer/sub-S01/func/sub-S01_task-localizer_bold.nii.gz'>,
 <BIDSImageFile filename='/Users/lukechang/Github/dartbrains/data/localizer/sub-S02/func/sub-S02_task-localizer_bold.nii.gz'>,
 <BIDSJSONFile filename='/Users/lukechang/Github/dartbrains/data/localizer/task-localizer_bold.json'>]

Notice that there are nifti and event files. We can get the filename for the first particant’s functional run

f = layout.get(task='localizer')[0].filename

If you want a summary of all the files in your BIDSLayout, but don’t want to have to iterate BIDSFile objects and extract their entities, you can get a nice bird’s-eye view of your dataset using the to_df() method.

entity path datatype extension subject suffix task
0 /Users/lukechang/Github/dartbrains/data/locali... NaN json NaN description NaN
1 /Users/lukechang/Github/dartbrains/data/locali... NaN json NaN participants NaN
2 /Users/lukechang/Github/dartbrains/data/locali... NaN tsv NaN participants NaN
3 /Users/lukechang/Github/dartbrains/data/locali... anat nii.gz S01 T1w NaN
4 /Users/lukechang/Github/dartbrains/data/locali... func nii.gz S01 bold localizer
... ... ... ... ... ... ...
677 /Users/lukechang/Github/dartbrains/data/locali... NaN NaN NaN NaN NaN
678 /Users/lukechang/Github/dartbrains/data/locali... NaN NaN NaN NaN NaN
679 /Users/lukechang/Github/dartbrains/data/locali... NaN NaN NaN NaN NaN
680 /Users/lukechang/Github/dartbrains/data/locali... NaN NaN NaN NaN NaN
681 /Users/lukechang/Github/dartbrains/data/locali... NaN NaN NaN NaN NaN

682 rows × 6 columns

Loading Data with Nibabel

Neuroimaging data is often stored in the format of nifti files .nii which can also be compressed using gzip .nii.gz. These files store both 3D and 4D data and also contain structured metadata in the image header.

There is an very nice tool to access nifti data stored on your file system in python called nibabel. If you don’t already have nibabel installed on your computer it is easy via pip. First, tell the jupyter cell that you would like to access the unix system outside of the notebook and then install nibabel using pip !pip install nibabel. You only need to run this once (unless you would like to update the version).

nibabel objects can be initialized by simply pointing to a nifti file even if it is compressed through gzip. First, we will import the nibabel module as nib (short and sweet so that we don’t have to type so much when using the tool). I’m also including a path to where the data file is located so that I don’t have to constantly type this. It is easy to change this on your own computer.

We will be loading an anatomical image from subject S01 from the localizer dataset. See this paper for more information about this dataset.

We will use pybids to grab subject S01’s T1 image.

import nibabel as nib

data = nib.load(layout.get(subject='S01', scope='derivatives', suffix='T1w', return_type='file', extension='nii.gz')[1])

If we want to get more help on how to work with the nibabel data object we can either consult the documentation or add a ?.


The imaging data is stored in either a 3D or 4D numpy array. Just like numpy, it is easy to get the dimensions of the data using shape.

(193, 229, 193)

Looks like there are 3 dimensions (x,y,z) that is the number of voxels in each dimension. If we know the voxel size, we could convert this into millimeters.

We can also directly access the data and plot a single slice using standard matplotlib functions.

%matplotlib inline

import matplotlib.pyplot as plt

<matplotlib.image.AxesImage at 0x7fad486d8e90>

Try slicing different dimensions (x,y,z) yourself to get a feel for how the data is represented in this anatomical image.

We can also access data from the image header. Let’s assign the header of an image to a variable and print it to view it’s contents.

header = data.header
<class 'nibabel.nifti1.Nifti1Header'> object, endian='<'
sizeof_hdr      : 348
data_type       : b''
db_name         : b''
extents         : 0
session_error   : 0
regular         : b'r'
dim_info        : 0
dim             : [  3 193 229 193   1   1   1   1]
intent_p1       : 0.0
intent_p2       : 0.0
intent_p3       : 0.0
intent_code     : none
datatype        : float32
bitpix          : 32
slice_start     : 0
pixdim          : [1. 1. 1. 1. 0. 0. 0. 0.]
vox_offset      : 0.0
scl_slope       : nan
scl_inter       : nan
slice_end       : 0
slice_code      : unknown
xyzt_units      : 2
cal_max         : 0.0
cal_min         : 0.0
slice_duration  : 0.0
toffset         : 0.0
glmax           : 0
glmin           : 0
descrip         : b'xform matrices modified by FixHeaderApplyTransforms (niworkflows v1.1.12).'
aux_file        : b''
qform_code      : mni
sform_code      : mni
quatern_b       : 0.0
quatern_c       : 0.0
quatern_d       : 0.0
qoffset_x       : -96.0
qoffset_y       : -132.0
qoffset_z       : -78.0
srow_x          : [  1.   0.   0. -96.]
srow_y          : [   0.    1.    0. -132.]
srow_z          : [  0.   0.   1. -78.]
intent_name     : b''
magic           : b'n+1'

Some of the important information in the header is information about the orientation of the image in space. This can be represented as the affine matrix, which can be used to transform images between different spaces.

array([[   1.,    0.,    0.,  -96.],
       [   0.,    1.,    0., -132.],
       [   0.,    0.,    1.,  -78.],
       [   0.,    0.,    0.,    1.]])

We will dive deeper into affine transformations in the preprocessing tutorial.

Plotting Data with Nilearn

There are many useful tools from the nilearn library to help manipulate and visualize neuroimaging data. See their documentation for an example.

In this section, we will explore a few of their different plotting functions, which can work directly with nibabel instances.

%matplotlib inline

from nilearn.plotting import view_img, plot_glass_brain, plot_anat, plot_epi
<nilearn.plotting.displays.OrthoSlicer at 0x7fad5aeb0c10>

Nilearn plotting functions are very flexible and allow us to easily customize our plots

plot_anat(data, draw_cross=False, display_mode='z')
<nilearn.plotting.displays.ZSlicer at 0x7fad3114af10>

try to get more information how to use the function with ? and try to add different commands to change the plot.

nilearn also has a neat interactive viewer called view_img for examining images directly in the notebook.