Written by Luke Chang & Sasha Brietzke
In this tutorial we will learn the basics of the organization of data folders, and how to load, plot, and manipulate neuroimaging data in Python.
To introduce the basics of fMRI data structures, watch this short video by Martin Lindquist.
from IPython.display import YouTubeVideo YouTubeVideo('OuRdQJMU5ro')
There are many different software packages to analyze neuroimaging data. Most of them are open source and free to use (with the exception of BrainVoyager). The most popular ones (SPM, FSL, & AFNI) have been around a long time and are where many new methods are developed and distributed. These packages have focused on implementing what they believe are the best statistical methods, ease of use, and computational efficiency. They have very large user bases so many bugs have been identified and fixed over the years. There are also lots of publicly available documentation, listserves, and online tutorials, which makes it very easy to get started using these tools.
There are also many more boutique packages that focus on specific types of preprocessing step and analyses such as spatial normalization with ANTs, connectivity analyses with the conn-toolbox, representational similarity analyses with the rsaToolbox, and prediction/classification with pyMVPA.
Many packages have been developed within proprietary software such as Matlab (e.g., SPM, Conn, RSAToolbox, etc). Unfortunately, this requires that your university has site license for Matlab and many individual add-on toolboxes. If you are not affiliated with a University, you may have to pay for Matlab, which can be fairly expensive. There are free alternatives such as octave, but octave does not include many of the add-on toolboxes offered by matlab that may be required for a specific package. Because of this restrictive licensing, it is difficult to run matlab on cloud computing servers and to use with free online courses such as dartbrains. Other packages have been written in C/C++/C# and need to be compiled to run on your specific computer and operating system. While these tools are typically highly computationally efficient, it can sometimes be challenging to get them to install and work on specific computers and operating systems.
There has been a growing trend to adopt the open source Python framework in the data science and scientific computing communities, which has lead to an explosion in the number of new packages available for statistics, visualization, machine learning, and web development. pyMVPA was an early leader in this trend, and there are many great tools that are being actively developed such as nilearn, brainiak, neurosynth, nipype, fmriprep, and many more. One exciting thing is that these newer developments have built on the expertise of decades of experience with imaging analyses, and leverage changes in high performance computing. There is also a very tight integration with many cutting edge developments in adjacent communities such as machine learning with scikit-learn, tensorflow, and pytorch, which has made new types of analyses much more accessible to the neuroimaging community. There has also been an influx of younger contributors with software development expertise. You might be surprised to know that many of the popular tools being used had core contributors originating from the neuroimaging community (e.g., scikit-learn, seaborn, and many more).
For this course, I have chosen to focus on tools developed in Python as it is an easy to learn programming language, has excellent tools, works well on distributed computing systems, has great ways to disseminate information (e.g., jupyter notebooks, jupyter-book, etc), and is free! If you are just getting started, I would spend some time working with NiLearn and Brainiak, which have a lot of functionality, are very well tested, are reasonably computationally efficient, and most importantly have lots of documentation and tutorials to get started.
We will be using many packages throughout the course such as PyBids to navigate neuroimaging datasets, fmriprep to perform preprocessing, and nltools, which is a package developed in my lab, to do basic data manipulation and analysis. NLtools is built using many other toolboxes such as nibabel and nilearn, and we will also be using these frequently throughout the course.
Recently, there has been growing interest to share datasets across labs and even on public repositories such as openneuro. In order to make this a succesful enterprise, it is necessary to have some standards in how the data are named and organized. Historically, each lab has used their own idiosyncratic conventions, which can make it difficult for outsiders to analyze. In the past few years, there have been heroic efforts by the neuroimaging community to create a standardized file organization and naming practices. This specification is called BIDS for Brain Imaging Dataset Specification.
As you can imagine, individuals have their own distinct method of organizing their files. Think about how you keep track of your files on your personal laptop (versus your friend). This may be okay in the personal realm, but in science, it's best if anyone (especially yourself 6 months from now!) can follow your work and know which files mean what by their titles.
Here's an example of non-Bids versus BIDS dataset found in this paper:
Here are a few major differences between the two datasets:
- In BIDS, files are in nifti format (not dicoms).
- In BIDS, scans are broken up into separate folders by type of scan(functional versus anatomical versus diffusion weighted) for each subject.
- In BIDS, JSON files are included that contain descriptive information about the scans (e.g., acquisition parameters)
Not only can using this specification be useful within labs to have a set way of structuring data, but it can also be useful when collaborating across labs, developing and utilizing software, and publishing data.
In addition, because this is a consistent format, it is possible to have a python package to make it easy to query a dataset. We recommend using pybids.
The dataset we will be working with has already been converted to the BIDS format (see download localizer tutorial).
You may need to install pybids to query the BIDS datasets using following command
!pip install pybids.
Pybids is a package to help query and navigate a neurogimaging dataset that is in the BIDs format. At the core of pybids is the
BIDSLayout object. A
BIDSLayout is a lightweight Python class that represents a BIDS project file tree and provides a variety of helpful methods for querying and manipulating BIDS files. While the BIDSLayout initializer has a large number of arguments you can use to control the way files are indexed and accessed, you will most commonly initialize a BIDSLayout by passing in the BIDS dataset root location as a single argument.
Notice we are setting
derivatives=True. This means the layout will also index the derivatives sub folder, which might contain preprocessed data, analyses, or other user generated files.
from bids import BIDSLayout, BIDSValidator import os data_dir = '../data/localizer' layout = BIDSLayout(data_dir, derivatives=True) layout
BIDS Layout: ...pbox/Dartbrains/Data/localizer | Subjects: 94 | Sessions: 0 | Runs: 0
When we initialize a BIDSLayout, all of the files and metadata found under the specified root folder are indexed. This can take a few seconds (or, for very large datasets, a minute or two). Once initialization is complete, we can start querying the BIDSLayout in various ways. The main query method is
.get(). If we call .
get() with no additional arguments, we get back a list of all the BIDS files in our dataset.
Let's return the first 10 files
[<BIDSJSONFile filename='/Users/lukechang/Dropbox/Dartbrains/Data/localizer/dataset_description.json'>, <BIDSFile filename='/Users/lukechang/Dropbox/Dartbrains/Data/localizer/derivatives/fmriprep/.DS_Store'>, <BIDSJSONFile filename='/Users/lukechang/Dropbox/Dartbrains/Data/localizer/derivatives/fmriprep/dataset_description.json'>, <BIDSFile filename='/Users/lukechang/Dropbox/Dartbrains/Data/localizer/derivatives/fmriprep/logs/CITATION.bib'>, <BIDSFile filename='/Users/lukechang/Dropbox/Dartbrains/Data/localizer/derivatives/fmriprep/logs/CITATION.html'>, <BIDSFile filename='/Users/lukechang/Dropbox/Dartbrains/Data/localizer/derivatives/fmriprep/logs/CITATION.md'>, <BIDSFile filename='/Users/lukechang/Dropbox/Dartbrains/Data/localizer/derivatives/fmriprep/logs/CITATION.tex'>, <BIDSFile filename='/Users/lukechang/Dropbox/Dartbrains/Data/localizer/derivatives/fmriprep/sub-S01.html'>, <BIDSFile filename='/Users/lukechang/Dropbox/Dartbrains/Data/localizer/derivatives/fmriprep/sub-S01/.DS_Store'>, <BIDSJSONFile filename='/Users/lukechang/Dropbox/Dartbrains/Data/localizer/derivatives/fmriprep/sub-S01/anat/sub-S01_desc-brain_mask.json'>]
As you can see, just a generic
.get() call gives us all of the files. We will definitely want to be a bit more specific. We can specify the type of data we would like to query. For example, suppose we want to return the first 10 subject ids.
['S01', 'S02', 'S03', 'S04', 'S05', 'S06', 'S07', 'S08', 'S09', 'S10']
Or perhaps, we would like to get the file names for the raw bold functional nifti images for the first 10 subjects. We can filter files in the
scope='raw', to only query raw bold nifti files.
layout.get(target='subject', scope='raw', suffix='bold', return_type='file')[:10]
['/Users/lukechang/Dropbox/Dartbrains/Data/localizer/sub-S01/func/sub-S01_task-localizer_bold.nii.gz', '/Users/lukechang/Dropbox/Dartbrains/Data/localizer/sub-S02/func/sub-S02_task-localizer_bold.nii.gz', '/Users/lukechang/Dropbox/Dartbrains/Data/localizer/sub-S03/func/sub-S03_task-localizer_bold.nii.gz', '/Users/lukechang/Dropbox/Dartbrains/Data/localizer/sub-S04/func/sub-S04_task-localizer_bold.nii.gz', '/Users/lukechang/Dropbox/Dartbrains/Data/localizer/sub-S05/func/sub-S05_task-localizer_bold.nii.gz', '/Users/lukechang/Dropbox/Dartbrains/Data/localizer/sub-S06/func/sub-S06_task-localizer_bold.nii.gz', '/Users/lukechang/Dropbox/Dartbrains/Data/localizer/sub-S07/func/sub-S07_task-localizer_bold.nii.gz', '/Users/lukechang/Dropbox/Dartbrains/Data/localizer/sub-S08/func/sub-S08_task-localizer_bold.nii.gz', '/Users/lukechang/Dropbox/Dartbrains/Data/localizer/sub-S09/func/sub-S09_task-localizer_bold.nii.gz', '/Users/lukechang/Dropbox/Dartbrains/Data/localizer/sub-S10/func/sub-S10_task-localizer_bold.nii.gz']
When you call .get() on a BIDSLayout, the default returned values are objects of class BIDSFile. A BIDSFile is a lightweight container for individual files in a BIDS dataset.
Here are some of the attributes and methods available to us in a BIDSFile (note that some of these are only available for certain subclasses of BIDSFile; e.g., you can't call get_image() on a BIDSFile that doesn't correspond to an image file!):
- .path: The full path of the associated file
- .filename: The associated file's filename (without directory)
- .dirname: The directory containing the file
- .get_entities(): Returns information about entities associated with this BIDSFile (optionally including metadata)
- .get_image(): Returns the file contents as a nibabel image (only works for image files)
- .get_df(): Get file contents as a pandas DataFrame (only works for TSV files)
- .get_metadata(): Returns a dictionary of all metadata found in associated JSON files
- .get_associations(): Returns a list of all files associated with this one in some way
Let's explore the first file in a little more detail.
f = layout.get() f
If we wanted to get the path of the file, we can use
Suppose we were interested in getting a list of tasks included in the dataset.
We can query all of the files associated with this task.
layout.get(task='localizer', suffix='bold', scope='raw')[:10]
[<BIDSImageFile filename='/Users/lukechang/Dropbox/Dartbrains/Data/localizer/sub-S01/func/sub-S01_task-localizer_bold.nii.gz'>, <BIDSImageFile filename='/Users/lukechang/Dropbox/Dartbrains/Data/localizer/sub-S02/func/sub-S02_task-localizer_bold.nii.gz'>, <BIDSImageFile filename='/Users/lukechang/Dropbox/Dartbrains/Data/localizer/sub-S03/func/sub-S03_task-localizer_bold.nii.gz'>, <BIDSImageFile filename='/Users/lukechang/Dropbox/Dartbrains/Data/localizer/sub-S04/func/sub-S04_task-localizer_bold.nii.gz'>, <BIDSImageFile filename='/Users/lukechang/Dropbox/Dartbrains/Data/localizer/sub-S05/func/sub-S05_task-localizer_bold.nii.gz'>, <BIDSImageFile filename='/Users/lukechang/Dropbox/Dartbrains/Data/localizer/sub-S06/func/sub-S06_task-localizer_bold.nii.gz'>, <BIDSImageFile filename='/Users/lukechang/Dropbox/Dartbrains/Data/localizer/sub-S07/func/sub-S07_task-localizer_bold.nii.gz'>, <BIDSImageFile filename='/Users/lukechang/Dropbox/Dartbrains/Data/localizer/sub-S08/func/sub-S08_task-localizer_bold.nii.gz'>, <BIDSImageFile filename='/Users/lukechang/Dropbox/Dartbrains/Data/localizer/sub-S09/func/sub-S09_task-localizer_bold.nii.gz'>, <BIDSImageFile filename='/Users/lukechang/Dropbox/Dartbrains/Data/localizer/sub-S10/func/sub-S10_task-localizer_bold.nii.gz'>]
Notice that there are nifti and event files. We can get the filename for the first particant's functional run
f = layout.get(task='localizer').filename f
If you want a summary of all the files in your BIDSLayout, but don't want to have to iterate BIDSFile objects and extract their entities, you can get a nice bird's-eye view of your dataset using the
289 rows × 6 columns
Neuroimaging data is often stored in the format of nifti files
.nii which can also be compressed using gzip
.nii.gz. These files store both 3D and 4D data and also contain structured metadata in the image header.
There is an very nice tool to access nifti data stored on your file system in python called nibabel. If you don't already have nibabel installed on your computer it is easy via
pip. First, tell the jupyter cell that you would like to access the unix system outside of the notebook and then install nibabel using pip
!pip install nibabel. You only need to run this once (unless you would like to update the version).
nibabel objects can be initialized by simply pointing to a nifti file even if it is compressed through gzip. First, we will import the nibabel module as
nib (short and sweet so that we don't have to type so much when using the tool). I'm also including a path to where the data file is located so that I don't have to constantly type this. It is easy to change this on your own computer.
We will use pybids to grab subject S01's T1 image.
import nibabel as nib data = nib.load(layout.get(subject='S01', scope='derivatives', suffix='T1w', return_type='file', extension='nii.gz'))
The imaging data is stored in either a 3D or 4D numpy array. Just like numpy, it is easy to get the dimensions of the data using
(193, 229, 193)
Looks like there are 3 dimensions (x,y,z) that is the number of voxels in each dimension. If we know the voxel size, we could convert this into millimeters.
We can also directly access the data and plot a single slice using standard matplotlib functions.
%matplotlib inline import matplotlib.pyplot as plt plt.imshow(data.get_fdata()[:,:,50])
<matplotlib.image.AxesImage at 0x7fadc1706e50>
Try slicing different dimensions (x,y,z) yourself to get a feel for how the data is represented in this anatomical image.
We can also access data from the image header. Let's assign the header of an image to a variable and print it to view it's contents.
header = data.header print(header)
<class 'nibabel.nifti1.Nifti1Header'> object, endian='<' sizeof_hdr : 348 data_type : b'' db_name : b'' extents : 0 session_error : 0 regular : b'r' dim_info : 0 dim : [ 3 193 229 193 1 1 1 1] intent_p1 : 0.0 intent_p2 : 0.0 intent_p3 : 0.0 intent_code : none datatype : float32 bitpix : 32 slice_start : 0 pixdim : [1. 1. 1. 1. 0. 0. 0. 0.] vox_offset : 0.0 scl_slope : nan scl_inter : nan slice_end : 0 slice_code : unknown xyzt_units : 2 cal_max : 0.0 cal_min : 0.0 slice_duration : 0.0 toffset : 0.0 glmax : 0 glmin : 0 descrip : b'xform matrices modified by FixHeaderApplyTransforms (niworkflows v1.1.12).' aux_file : b'' qform_code : mni sform_code : mni quatern_b : 0.0 quatern_c : 0.0 quatern_d : 0.0 qoffset_x : -96.0 qoffset_y : -132.0 qoffset_z : -78.0 srow_x : [ 1. 0. 0. -96.] srow_y : [ 0. 1. 0. -132.] srow_z : [ 0. 0. 1. -78.] intent_name : b'' magic : b'n+1'
Some of the important information in the header is information about the orientation of the image in space. This can be represented as the affine matrix, which can be used to transform images between different spaces.
array([[ 1., 0., 0., -96.], [ 0., 1., 0., -132.], [ 0., 0., 1., -78.], [ 0., 0., 0., 1.]])
We will dive deeper into affine transformations in the preprocessing tutorial.
%matplotlib inline from nilearn.plotting import view_img, plot_glass_brain, plot_anat, plot_epi
<nilearn.plotting.displays.OrthoSlicer at 0x7fadd13c4c50>
Nilearn plotting functions are very flexible and allow us to easily customize our plots
plot_anat(data, draw_cross=False, display_mode='z')
<nilearn.plotting.displays.ZSlicer at 0x7fadc157fd50>
try to get more information how to use the function with
? and try to add different commands to change the plot.
nilearn also has a neat interactive viewer called
view_img for examining images directly in the notebook.