Preprocessing

Written by Luke Chang

Being able to study brain activity associated with cognitive processes in humans is an amazing achievement. However, BOLD imaging contains an extraordinary amount of noise and relatively low levels of signal, which makes it difficult to make inferences about brain function. A critical step before any analysis is therefore to remove as much noise as possible. The series of steps used to remove this noise makes up our neuroimaging data preprocessing pipeline. See the slides from our preprocessing lecture here.
In this lab, we will go over the basics of preprocessing fMRI data using the fmriprep preprocessing pipeline. We will cover:
  • Image transformations (rigid body and affine)
  • Cost functions for image registration
  • Head motion correction (realignment)
  • Spatial normalization
  • Spatial smoothing
  • fMRIPrep automated preprocessing pipeline
There are other preprocessing steps that are also common but not necessarily performed by all labs, such as slice-timing correction and distortion correction. We will not be discussing these in depth outside of the videos. Let's start by watching a short video by Martin Lindquist to get a general overview of the main steps of preprocessing and the basics of how to transform images and register them to other images.
Translation: shifting the image along the x, y, or z axis.
Rotation: rotating the image about the x, y, or z axis (pitch, roll, and yaw).
Scale: stretching or shrinking the image along each dimension.
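To make these transformations concrete, here is a minimal sketch (not part of the original lab) showing how translation, rotation, and scaling can each be written as a 4x4 affine matrix and applied to a 3D volume with NumPy and SciPy. The input file name is a placeholder; any anatomical NIfTI image would work.

```python
import numpy as np
import nibabel as nib
from scipy.ndimage import affine_transform

# Load a 3D anatomical image (placeholder file name).
img = nib.load("sub-01_T1w.nii.gz")
data = img.get_fdata()

# Translation: shift the image 10 voxels along x.
translation = np.eye(4)
translation[:3, 3] = [10, 0, 0]

# Rotation: rotate 15 degrees around the z-axis.
theta = np.deg2rad(15)
rotation = np.array([[np.cos(theta), -np.sin(theta), 0, 0],
                     [np.sin(theta),  np.cos(theta), 0, 0],
                     [0,              0,             1, 0],
                     [0,              0,             0, 1]])

# Scale: stretch the image by 20% along x and y.
scale = np.diag([1.2, 1.2, 1.0, 1.0])

# Compose the transforms into a single affine and resample the volume.
# affine_transform maps output coordinates back to input coordinates,
# so we pass the inverse of the forward transform.
affine = translation @ rotation @ scale
inverse = np.linalg.inv(affine)
transformed = affine_transform(data, inverse[:3, :3], offset=inverse[:3, 3])
```

A registration algorithm searches over parameters like these to optimize a cost function measuring how well the transformed image matches a reference image.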

Realignment

Now let's put everything we learned together to understand how we can correct for head motion that occurred during a scanning session. It is extremely important to make sure that a specific voxel has the same 3D coordinate across all time points in order to model neural processes. This, of course, is made difficult by the fact that participants move during a scanning session and also in between runs. Realignment is the preprocessing step in which a rigid-body transformation is applied to each volume to align it to a common reference. One typically needs to choose a reference volume, which might be the first, middle, or last volume, or the mean of all volumes. Let's look at an example of the translation and rotation parameters after running realignment on our first subject.
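As a sketch of what inspecting these parameters might look like in Python, you could plot the six realignment parameters over time. The file name below is a placeholder for an fMRIPrep confounds file, and the column names assume a recent fMRIPrep version.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Load the confounds file written by fMRIPrep for one run (placeholder path).
confounds = pd.read_csv("sub-01_task-rest_desc-confounds_timeseries.tsv", sep="\t")

# Plot the three translation and three rotation parameters across volumes.
fig, axes = plt.subplots(2, 1, sharex=True, figsize=(8, 5))
confounds[["trans_x", "trans_y", "trans_z"]].plot(ax=axes[0])
axes[0].set_ylabel("Translation (mm)")
confounds[["rot_x", "rot_y", "rot_z"]].plot(ax=axes[1])
axes[1].set_ylabel("Rotation (radians)")
axes[1].set_xlabel("Volume")
plt.tight_layout()
plt.show()
```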
Don't forget that even though realignment can put each volume back into approximately the same position, head motion also distorts the magnetic field and can lead to nonlinear changes in signal intensity that will not be addressed by this procedure. In the resting-state literature, where many analyses are based on functional connectivity, head motion can lead to spurious correlations. Some researchers choose to exclude any subject who moved more than a certain amount. Others choose to remove the impact of these time points either by deleting the offending volumes entirely (scrubbing) or by modeling each one with a dummy-coded regressor in their first-level general linear models.
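For illustration, here is a minimal sketch of how one might flag high-motion volumes from an fMRIPrep confounds file and build dummy-coded spike regressors for a first-level model. The file path is a placeholder, and the 0.5 mm framewise displacement threshold is just an example, not a recommendation.

```python
import numpy as np
import pandas as pd

# Load the confounds file written by fMRIPrep (placeholder path).
confounds = pd.read_csv("sub-01_task-rest_desc-confounds_timeseries.tsv", sep="\t")
fd = confounds["framewise_displacement"].fillna(0).to_numpy()

# Flag volumes whose framewise displacement exceeds an example threshold.
threshold = 0.5  # mm; an illustrative choice
spike_idx = np.where(fd > threshold)[0]

# Build one dummy-coded regressor per flagged volume to add to the design matrix.
spikes = np.zeros((len(fd), len(spike_idx)))
spikes[spike_idx, np.arange(len(spike_idx))] = 1

print(f"Flagged {len(spike_idx)} of {len(fd)} volumes")
```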
Spatial Normalization

There are many different steps involved in the spatial normalization process, and the details vary widely across imaging software packages. We will briefly discuss some of the steps involved in the anatomical preprocessing pipeline implemented by fMRIPrep and show example figures from the output generated by the pipeline. First, the brain is extracted from the skull and surrounding dura mater. You can check how well the algorithm performed by examining the red outline.

Next, the anatomical images are segmented into different tissue types. These tissue maps are used for various types of analyses, including providing a grey matter mask to reduce the computational time needed to estimate statistics. In addition, they provide masks to aid in extracting the average activity in CSF or white matter, which might be used as covariates in statistical analyses to account for physiological noise.
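As a sketch of how these tissue maps might be used (file names are placeholders standing in for fMRIPrep outputs in the same space as the functional data), you could threshold a white-matter probability map and extract its average time course with nilearn:

```python
from nilearn.image import math_img
from nilearn.maskers import NiftiMasker

# Placeholder paths to a preprocessed functional run and a tissue probability map.
func = "sub-01_task-rest_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz"
wm_probseg = "sub-01_space-MNI152NLin2009cAsym_label-WM_probseg.nii.gz"

# Threshold the probability map to get a conservative binary white-matter mask.
wm_mask = math_img("img > 0.9", img=wm_probseg)

# Average the functional signal across all voxels in the mask at each time point.
masker = NiftiMasker(mask_img=wm_mask)
wm_signal = masker.fit_transform(func).mean(axis=1)
```

The resulting time course could then be entered as a nuisance covariate alongside the motion parameters described above.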

fMRIPrep

Throughout this lab and course, you have frequently heard about fMRIPrep, a functional magnetic resonance imaging (fMRI) data preprocessing pipeline developed by a team at the Center for Reproducible Neuroscience led by Russ Poldrack and Chris Gorgolewski. fMRIPrep was designed to provide an easily accessible, state-of-the-art interface that is robust to variations in scan acquisition protocols, requires minimal user input, and provides easily interpretable and comprehensive error and output reporting. It performs the basic processing steps (coregistration, normalization, unwarping, noise component extraction, segmentation, skull stripping, etc.), providing outputs that are ready for data analysis.

fMRIPrep was built on top of Nipype, a tool for building preprocessing pipelines in Python using graphs. Nipype provides a completely flexible way to create custom pipelines using any type of software, while also making it easy to parallelize steps across the pipeline on high performance computing platforms. This flexibility comes with a fairly steep learning curve, so Nipype is best for researchers who have strong opinions about how they want to preprocess their data, or who are working with nonstandard data that might require adjusting the preprocessing steps or parameters. In practice, most researchers use similar preprocessing steps and do not need to tweak their pipelines very often. In addition, many researchers do not fully understand how each preprocessing step will impact their results and would prefer that somebody else picked sensible defaults based on current best practices in the literature.

The fMRIPrep pipeline uses a combination of tools from well-known software packages, including FSL, ANTs, FreeSurfer, and AFNI. The pipeline was designed to use the best software implementation for each stage of preprocessing, and it is quickly updated as methods evolve and bugs are discovered by a growing user base. This tool allows you to easily do the following:
  • Take fMRI data from raw to fully preprocessed form.
  • Implement tools from different software packages.
  • Achieve optimal data processing quality by using the best tools available.
  • Generate preprocessing quality reports, with which the user can easily identify outliers.
  • Receive verbose output concerning the stage of preprocessing for each subject, including meaningful errors.
  • Automate and parallelize processing steps, which provides a significant speed-up from typical linear, manual processing.
More information and documentation can be found at https://fmriprep.readthedocs.io/
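To give a sense of what running the pipeline looks like, here is a minimal sketch of launching fMRIPrep on a single subject using the fmriprep-docker wrapper from Python. All paths and the subject label are placeholders, and this assumes Docker, the fmriprep-docker package, and a FreeSurfer license are already in place.

```python
import subprocess

# Placeholder paths: a BIDS-formatted dataset, an output directory,
# and a FreeSurfer license file.
cmd = [
    "fmriprep-docker",
    "/data/bids_dataset",
    "/data/derivatives",
    "participant",                      # run at the single-participant level
    "--participant-label", "01",
    "--output-spaces", "MNI152NLin2009cAsym",
    "--fs-license-file", "/data/license.txt",
]
subprocess.run(cmd, check=True)
```

Because each subject is preprocessed independently, this step is straightforward to parallelize across a computing cluster, which is the topic of the next section.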

Quick primer on High Performance Computing

We could run fMRIPrep on our own computer, but this could take a long time if we have many participants. Because our laptops have a limited amount of computational resources (e.g., CPUs and memory), we would have to run each participant sequentially. For example, if we had 50 participants, it would take 50 times longer to run all participants than a single one. Imagine instead that you had 50 computers and ran each participant separately, at the same time, in parallel across all of the computers. This would allow us to run 50 participants in the same amount of time as a single participant. This is the basic idea behind high performance computing, which uses a cluster of many computers installed in racks. Below is a picture of what Dartmouth's Discovery cluster looks like:
A cluster is simply a collection of nodes. A node can be thought of as an individual computer. Each node contains processors, which in turn encompass multiple cores. Discovery contains 3000+ cores, which is certainly a lot more than your laptop! To submit a job, you can create a Portable Batch System (PBS) script that sets up the parameters (e.g., how long your script should run, which directory to run in, etc.) and submits your job to a queue. NOTE: If you end up working in a lab in the future, you will likely need to request access to a system like Discovery using this type of link. For a detailed walkthrough of running fMRIPrep on Dartmouth's Discovery cluster (SLURM scripts, data access on Rolando, and environment setup), see the companion tutorial: Running fMRIPrep on HPC.