Download Data

Written by Luke Chang

Many of the imaging tutorials throughout this course will use open data from the Pinel Localizer task.

The Pinel Localizer task was designed to probe several different types of basic cognitive processes, such as visual perception, finger tapping, language, and math. Several of the tasks are cued by reading text on the screen (i.e., visual modality) and also by hearing auditory instructions (i.e., auditory modality). The trials are randomized across conditions and have been optimized to maximize efficiency for a rapid event-related design. There are 100 trials in total over a 5-minute scanning session. Read the original paper for more specific details about the task, and the dataset paper for details about the data.

This dataset is well suited for these tutorials as it is (a) publicly available to anyone in the world, (b) relatively small (only about 5 minutes of scanning per subject), and (c) provides many options to create different types of contrasts.

There are a total of 94 subjects available, but we will primarily be working with a smaller subset of about 15.

Though the data is being shared on the OSF website, we recommend downloading it from our g-node repository as we have fixed a few issues with BIDS formatting and have also performed preprocessing using fmriprep.

In this notebook, we will walk through how to access the dataset using DataLad. Note that the entire dataset is fairly large (~42 GB), but the tutorials will mostly only be working with a small portion of the data (5.8 GB), so there is no need to download the entire thing. If you are taking the Psych60 course at Dartmouth, we have already made the data available on the jupyterhub server.

DataLad

The easiest way to access the data is using DataLad, which is an open source version control system for data built on top of git-annex. Think of it like git for data. It provides a handy command line interface for downloading data, tracking changes, and sharing it with others.

While DataLad offers a number of useful features for working with datasets, there are three in particular that we think make it worth the effort to install for this course.

  1. Cloning a DataLad Repository can be completed with a single line of code datalad clone <repository> and provides the full directory structure in the form of symbolic links. This allows you to explore all of the files in the dataset, without having to download the entire dataset at once.

  2. Specific files can be easily downloaded using datalad get <filename>, and files can be removed from your computer at any time using datalad drop <filename>. As these datasets are large, this will allow you to only work with the data that you need for a specific tutorial and you can drop the rest when you are done with it.

  3. All of the DataLad commands can be run within Python using the DataLad Python API.

We will only be covering a few basic DataLad functions to get and drop data. We encourage the interested reader to read the very comprehensive DataLad User Handbook for more details and troubleshooting.

Installing Datalad

DataLad can be easily installed using pip.

pip install datalad

Unfortunately, it currently requires manually installing the git-annex dependency, which is not automatically installed using pip.

If you are using OSX, we recommend installing git-annex using the Homebrew package manager.

brew install git-annex

If you are on Debian/Ubuntu, we recommend enabling the NeuroDebian repository and installing with apt-get.

sudo apt-get install datalad

For more installation options, we recommend reading the DataLad installation instructions.
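Once everything is installed, it can help to confirm that both pieces are visible from Python before moving on. The following is a minimal sketch that uses only the standard library. The helper name check_datalad_setup is ours and is not part of DataLad.

```python
import importlib.util
import shutil


def check_datalad_setup():
    """Report whether the datalad package and the tools it relies on can be found."""
    return {
        # True if `import datalad` would find the package in this environment
        "datalad_package": importlib.util.find_spec("datalad") is not None,
        # True if a git-annex executable is on the PATH
        "git_annex": shutil.which("git-annex") is not None,
        # True if git itself is on the PATH
        "git": shutil.which("git") is not None,
    }


status = check_datalad_setup()
for name, found in status.items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

If git_annex is reported as missing, revisit the git-annex installation step for your operating system.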

!pip install datalad
Requirement already satisfied: datalad in /Users/lukechang/anaconda3/lib/python3.7/site-packages (0.12.6)
Requirement already satisfied: msgpack in /Users/lukechang/anaconda3/lib/python3.7/site-packages (from datalad) (1.0.0)
Requirement already satisfied: appdirs in /Users/lukechang/anaconda3/lib/python3.7/site-packages (from datalad) (1.4.3)
Requirement already satisfied: chardet>=3.0.4 in /Users/lukechang/anaconda3/lib/python3.7/site-packages (from datalad) (3.0.4)
Requirement already satisfied: keyring>=8.0 in /Users/lukechang/anaconda3/lib/python3.7/site-packages (from datalad) (21.4.0)
Requirement already satisfied: GitPython>=2.1.12 in /Users/lukechang/anaconda3/lib/python3.7/site-packages (from datalad) (3.1.0)
Requirement already satisfied: fasteners in /Users/lukechang/anaconda3/lib/python3.7/site-packages (from datalad) (0.15)
Requirement already satisfied: jsmin in /Users/lukechang/anaconda3/lib/python3.7/site-packages (from datalad) (2.2.2)
Requirement already satisfied: iso8601 in /Users/lukechang/anaconda3/lib/python3.7/site-packages (from datalad) (0.1.12)
Requirement already satisfied: keyrings.alt in /Users/lukechang/anaconda3/lib/python3.7/site-packages (from datalad) (3.4.0)
Requirement already satisfied: patool>=1.7 in /Users/lukechang/anaconda3/lib/python3.7/site-packages (from datalad) (1.12)
Requirement already satisfied: wrapt in /Users/lukechang/anaconda3/lib/python3.7/site-packages (from datalad) (1.11.2)
Requirement already satisfied: tqdm in /Users/lukechang/anaconda3/lib/python3.7/site-packages (from datalad) (4.48.2)
Requirement already satisfied: whoosh in /Users/lukechang/anaconda3/lib/python3.7/site-packages (from datalad) (2.7.4)
Requirement already satisfied: boto in /Users/lukechang/anaconda3/lib/python3.7/site-packages (from datalad) (2.49.0)
Requirement already satisfied: simplejson in /Users/lukechang/anaconda3/lib/python3.7/site-packages (from datalad) (3.17.0)
Requirement already satisfied: PyGithub in /Users/lukechang/anaconda3/lib/python3.7/site-packages (from datalad) (1.47)
Requirement already satisfied: humanize in /Users/lukechang/anaconda3/lib/python3.7/site-packages (from datalad) (2.4.0)
Requirement already satisfied: requests>=1.2 in /Users/lukechang/anaconda3/lib/python3.7/site-packages (from datalad) (2.24.0)
Requirement already satisfied: importlib-metadata; python_version < "3.8" in /Users/lukechang/anaconda3/lib/python3.7/site-packages (from keyring>=8.0->datalad) (1.7.0)
Requirement already satisfied: gitdb<5,>=4.0.1 in /Users/lukechang/anaconda3/lib/python3.7/site-packages (from GitPython>=2.1.12->datalad) (4.0.2)
Requirement already satisfied: monotonic>=0.1 in /Users/lukechang/anaconda3/lib/python3.7/site-packages (from fasteners->datalad) (1.5)
Requirement already satisfied: six in /Users/lukechang/anaconda3/lib/python3.7/site-packages (from fasteners->datalad) (1.15.0)
Requirement already satisfied: pyjwt in /Users/lukechang/anaconda3/lib/python3.7/site-packages (from PyGithub->datalad) (1.7.1)
Requirement already satisfied: deprecated in /Users/lukechang/anaconda3/lib/python3.7/site-packages (from PyGithub->datalad) (1.2.9)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /Users/lukechang/anaconda3/lib/python3.7/site-packages (from requests>=1.2->datalad) (1.25.10)
Requirement already satisfied: certifi>=2017.4.17 in /Users/lukechang/anaconda3/lib/python3.7/site-packages (from requests>=1.2->datalad) (2020.6.20)
Requirement already satisfied: idna<3,>=2.5 in /Users/lukechang/anaconda3/lib/python3.7/site-packages (from requests>=1.2->datalad) (2.10)
Requirement already satisfied: zipp>=0.5 in /Users/lukechang/anaconda3/lib/python3.7/site-packages (from importlib-metadata; python_version < "3.8"->keyring>=8.0->datalad) (3.1.0)
Requirement already satisfied: smmap<4,>=3.0.1 in /Users/lukechang/anaconda3/lib/python3.7/site-packages (from gitdb<5,>=4.0.1->GitPython>=2.1.12->datalad) (3.0.1)

Download Data with DataLad

The Pinel localizer dataset can be accessed at the following location: https://gin.g-node.org/ljchang/Localizer/. To download the Localizer dataset, run datalad install https://gin.g-node.org/ljchang/Localizer in a terminal in the location where you would like to install the dataset. Don’t forget to change the directory to a folder on your local computer. The full dataset is approximately 42 GB.

You can run this from the notebook by prefixing the command with !, which passes it to the shell. The %cd magic changes the working directory first.

%cd ~/Dropbox/Dartbrains/data

!datalad install https://gin.g-node.org/ljchang/Localizer
/Users/lukechang/Dropbox/Dartbrains/data

Datalad Basics

You might be surprised to find that, after cloning, the dataset barely takes up any space (you can check with du -sh). This is because cloning only downloads the dataset's metadata, which records what files are included.

You can check how big the entire dataset would be if you downloaded everything using datalad status --annex.
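For readers who prefer to check this from Python rather than with du -sh, here is a rough sketch. It sums file sizes without following symbolic links, so annexed files whose content has not been downloaded contribute almost nothing. The function name local_size_bytes is hypothetical, and the result will not match du exactly, since du reports disk blocks rather than file lengths.

```python
import os


def local_size_bytes(root):
    """Sum the size of every file under root without following symlinks."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            # lstat measures the link itself, not the file it points to
            total += os.lstat(os.path.join(dirpath, name)).st_size
    return total
```

For example, local_size_bytes('.') / 1e6 gives the approximate size of the current folder in megabytes.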

%cd ~/Dropbox/Dartbrains/data/Localizer

!datalad status --annex
/Users/lukechang/Dropbox/Dartbrains/data/Localizer
1794 annex'd files (42.1 GB recorded total size)

Getting Data

One of the really nice features of datalad is that you can see all of the data without actually storing it on your computer. When you want a specific file, you use datalad get <filename> to download that specific file. Importantly, you do not need to download all of the data at once, only when you need it.

Now that we have cloned the repository, we can grab individual files. For example, suppose we wanted to grab the participants.tsv file at the top level of the dataset.

!datalad get participants.tsv

Now we can check how much of the total dataset we have downloaded using datalad status --annex all.

!datalad status --annex all
1794 annex'd files (0.0 B/42.1 GB present/total size)

If you would like to download all of the files, you can use datalad get . (the trailing dot refers to the current directory). Depending on the size of the dataset and the speed of your internet connection, this might take a while. One really nice thing about datalad is that if your connection is interrupted, you can simply run datalad get . again, and it will resume where it left off.

You can also install the dataset and download all of the files with a single command datalad install -g https://gin.g-node.org/ljchang/Localizer. You may want to do this if you have a lot of storage available and a fast internet connection. For most people, we recommend only downloading the files you need for a specific tutorial.

Dropping Data

Most people do not have unlimited space on their hard drives and are constantly looking for ways to free up space when they are no longer actively working with files. The content of any file in a dataset can be removed using datalad drop. Importantly, this does not delete the file from the dataset; it only removes the local copy of its content from your computer. You will still be able to see the file's metadata after it has been dropped in case you want to download it again in the future.

As an example, let’s drop the Localizer participants.tsv file.

!datalad drop participants.tsv

Datalad has a Python API!

One particularly nice aspect of datalad is that it has a Python API, which means that anything you would like to do with datalad on the command line can also be run in Python. See the details of the datalad Python API.

For example, suppose you would like to clone a data repository, such as the Localizer dataset. You can run dl.clone(source=url, path=location). Make sure you set localizer_path to the location where you would like the Localizer repository installed.

import os
import glob
import datalad.api as dl
import pandas as pd

localizer_path = '/Users/lukechang/Dropbox/Dartbrains/data/Localizer'

dl.clone(source='https://gin.g-node.org/ljchang/Localizer', path=localizer_path)
[WARNING] realpath of PWD=/ is / whenever os.getcwd()=/Users/lukechang/Dropbox/Dartbrains/data/Localizer. From now on will be returning os.getcwd(). Directory symlinks in the paths will be resolved 
<Dataset path=/Users/lukechang/Dropbox/Dartbrains/data/Localizer>

We can now create a dataset instance using dl.Dataset(path_to_data).

ds = dl.Dataset(localizer_path)
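If you rerun this notebook later, cloning again into the same folder is unnecessary. One possible guard, sketched below, checks for the .datalad folder that DataLad keeps at the root of a dataset and only calls dl.clone when it is absent. The helper name clone_if_missing and its return strings are assumptions made for illustration.

```python
import os


def clone_if_missing(source, path):
    """Clone a DataLad dataset to path unless one already appears to be there."""
    # A DataLad dataset keeps its configuration in a .datalad folder at its root
    if os.path.isdir(os.path.join(path, ".datalad")):
        return "already present"
    # Imported here so the check above also works where datalad is not installed
    import datalad.api as dl
    dl.clone(source=source, path=path)
    return "cloned"
```

With the variables defined above, this would be called as clone_if_missing('https://gin.g-node.org/ljchang/Localizer', localizer_path).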

How much of the dataset have we downloaded? We can check the status of the annex using ds.status(annex='all').

results = ds.status(annex='all')
1794 annex'd files (0.0 B/42.1 GB present/total size)

Looks like it’s empty, which makes sense since we only cloned the dataset.
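The list returned by ds.status can also be summarized directly instead of reading the printed line. The sketch below assumes, based on our reading of the DataLad documentation, that with annex='all' each record for an annexed file is a dictionary carrying a bytesize entry and a has_content flag. Please check these key names against your DataLad version. summarize_status is a hypothetical helper, and the records in the example are made up.

```python
def summarize_status(results):
    """Return (present_bytes, total_bytes) over records that report a size."""
    present = 0
    total = 0
    for record in results:
        size = record.get("bytesize")
        if size is None:
            # Not an annexed file, or no size recorded: skip it
            continue
        total += size
        if record.get("has_content"):
            present += size
    return present, total


# Made-up records standing in for real ds.status(annex='all') output
example = [
    {"path": "a.nii.gz", "bytesize": 100, "has_content": True},
    {"path": "b.nii.gz", "bytesize": 300, "has_content": False},
    {"path": "README", "state": "clean"},
]
print(summarize_status(example))  # (100, 400)
```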

Now we need to get some data. Let’s start with something small to play with first.

Let’s use glob to find all of the tab-delimited confound data generated by fmriprep.

file_list = glob.glob(os.path.join(localizer_path, '*', 'fmriprep', '*', 'func', '*tsv'))
file_list.sort()
file_list[:10]
['/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S01/func/sub-S01_task-localizer_desc-confounds_regressors.tsv',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S02/func/sub-S02_task-localizer_desc-confounds_regressors.tsv',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S03/func/sub-S03_task-localizer_desc-confounds_regressors.tsv',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S04/func/sub-S04_task-localizer_desc-confounds_regressors.tsv',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S05/func/sub-S05_task-localizer_desc-confounds_regressors.tsv',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S06/func/sub-S06_task-localizer_desc-confounds_regressors.tsv',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S07/func/sub-S07_task-localizer_desc-confounds_regressors.tsv',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S08/func/sub-S08_task-localizer_desc-confounds_regressors.tsv',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S09/func/sub-S09_task-localizer_desc-confounds_regressors.tsv',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S10/func/sub-S10_task-localizer_desc-confounds_regressors.tsv']

glob can search the filetree and see all of the relevant data even though none of it has been downloaded yet.
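This works because the files in a fresh clone are typically symbolic links into the annex, and a link whose target has not been downloaded still shows up in a directory listing. A rough way to tell the two cases apart from Python is sketched below. It assumes the default symlink-based layout, which may not hold on some filesystems (for example on Windows). The helper names are ours.

```python
import os


def content_is_present(path):
    """True if path exists and, when it is a symlink, its target exists too."""
    # os.path.exists follows symlinks, so a dangling link returns False
    return os.path.exists(path)


def is_listed_but_missing(path):
    """True for a dangling symlink: visible in listings, but no content behind it."""
    # os.path.lexists looks at the link itself without following it
    return os.path.lexists(path) and not os.path.exists(path)
```

For instance, sum(content_is_present(f) for f in file_list) counts how many of the listed files have actually been downloaded.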

Let’s now download the first subject’s confound regressor file and load it using pandas.

result = ds.get(file_list[0])

confounds = pd.read_csv(file_list[0], sep='\t')
confounds.head()
csf csf_derivative1 csf_derivative1_power2 csf_power2 white_matter white_matter_derivative1 white_matter_power2 white_matter_derivative1_power2 global_signal global_signal_derivative1 ... rot_x_derivative1_power2 rot_x_power2 rot_y rot_y_derivative1 rot_y_derivative1_power2 rot_y_power2 rot_z rot_z_derivative1 rot_z_derivative1_power2 rot_z_power2
0 5164.630182 NaN NaN 2.667340e+07 4006.007667 NaN 1.604810e+07 NaN 3753.537871 NaN ... NaN 4.016403e-07 0.000344 NaN NaN 1.180596e-07 -0.000701 NaN NaN 4.914346e-07
1 5178.481411 13.851229 191.856548 2.681667e+07 4011.819383 5.811716 1.609469e+07 33.776043 3760.408417 6.870546 ... 8.622980e-09 2.925631e-07 0.000569 0.000225 5.063355e-08 3.233253e-07 -0.000776 -0.000075 5.666476e-09 6.026417e-07
2 5161.040643 -17.440768 304.180395 2.663634e+07 4006.766409 -5.052974 1.605418e+07 25.532548 3756.426086 -3.982332 ... 6.975673e-08 6.480347e-07 0.000655 0.000086 7.409422e-09 4.286255e-07 -0.000524 0.000253 6.390582e-08 2.740564e-07
3 5150.604178 -10.436465 108.919794 2.652872e+07 4008.586021 1.819612 1.606876e+07 3.310987 3751.566090 -4.859996 ... 1.673784e-07 1.567265e-07 0.000554 -0.000101 1.011674e-08 3.070412e-07 -0.000605 -0.000082 6.722360e-09 3.666230e-07
4 5172.441161 21.836983 476.853810 2.675415e+07 4007.189291 -1.396730 1.605757e+07 1.950854 3746.298200 -5.267890 ... 2.102616e-08 2.925631e-07 0.000997 0.000443 1.959195e-07 9.934926e-07 -0.000840 -0.000235 5.510428e-08 7.059982e-07

5 rows × 136 columns

What if we wanted to drop that file? Just like the CLI, we can use ds.drop(file_name).

result = ds.drop(file_list[0])

To confirm that it is actually removed, let’s try to load it again with pandas. If the content is gone, pandas should fail to read the file (typically with a FileNotFoundError).

confounds = pd.read_csv(file_list[0], sep='\t')

Looks like it was successfully removed.
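The get, use, drop cycle shown here can be bundled so that a file is always dropped once you are finished with it, even if the analysis raises an error. A minimal sketch using a context manager follows. The name fetched is hypothetical, and ds is assumed to be any object with get and drop methods, such as the dl.Dataset instance created earlier.

```python
from contextlib import contextmanager


@contextmanager
def fetched(ds, path):
    """Download path on entry and drop it again on exit."""
    ds.get(path)
    try:
        yield path
    finally:
        # Runs even if the body raises, so the content does not linger on disk
        ds.drop(path)


# Example usage with the objects defined in this notebook:
# with fetched(ds, file_list[0]) as f:
#     confounds = pd.read_csv(f, sep='\t')
```

The trade-off is that the file is downloaded again each time the block runs, so this pattern suits large files that are only needed briefly.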

We can also download the entire dataset in one command if we want, using ds.get('.', recursive=True). We are not going to do it right now, as this will take a while and require lots of free hard disk space.

Let’s actually download one of the files we will be using in the tutorial. First, let’s use glob to get a list of all of the functional data that has been preprocessed by fmriprep.

file_list = glob.glob(os.path.join(localizer_path, 'derivatives', 'fmriprep', '*', 'func', '*task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz'))
file_list.sort()
file_list
['/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S01/func/sub-S01_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S02/func/sub-S02_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S03/func/sub-S03_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S04/func/sub-S04_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S05/func/sub-S05_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S06/func/sub-S06_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S07/func/sub-S07_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S08/func/sub-S08_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S09/func/sub-S09_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S10/func/sub-S10_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S11/func/sub-S11_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S12/func/sub-S12_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S13/func/sub-S13_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S14/func/sub-S14_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S15/func/sub-S15_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S16/func/sub-S16_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S17/func/sub-S17_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S18/func/sub-S18_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S19/func/sub-S19_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S20/func/sub-S20_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S21/func/sub-S21_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S22/func/sub-S22_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S23/func/sub-S23_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S24/func/sub-S24_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S25/func/sub-S25_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S26/func/sub-S26_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S27/func/sub-S27_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S28/func/sub-S28_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S29/func/sub-S29_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S30/func/sub-S30_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S31/func/sub-S31_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S32/func/sub-S32_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S33/func/sub-S33_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S34/func/sub-S34_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S35/func/sub-S35_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S36/func/sub-S36_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S37/func/sub-S37_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S38/func/sub-S38_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S39/func/sub-S39_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S40/func/sub-S40_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S41/func/sub-S41_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S42/func/sub-S42_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S43/func/sub-S43_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S44/func/sub-S44_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S45/func/sub-S45_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S46/func/sub-S46_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S47/func/sub-S47_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S48/func/sub-S48_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S49/func/sub-S49_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S50/func/sub-S50_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S51/func/sub-S51_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S52/func/sub-S52_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S53/func/sub-S53_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S54/func/sub-S54_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S55/func/sub-S55_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S56/func/sub-S56_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S57/func/sub-S57_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S58/func/sub-S58_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S59/func/sub-S59_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S60/func/sub-S60_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S61/func/sub-S61_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S62/func/sub-S62_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S63/func/sub-S63_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S64/func/sub-S64_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S65/func/sub-S65_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S66/func/sub-S66_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S67/func/sub-S67_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S68/func/sub-S68_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S69/func/sub-S69_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S70/func/sub-S70_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S71/func/sub-S71_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S72/func/sub-S72_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S73/func/sub-S73_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S74/func/sub-S74_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S75/func/sub-S75_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S76/func/sub-S76_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S77/func/sub-S77_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S78/func/sub-S78_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S79/func/sub-S79_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S80/func/sub-S80_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S81/func/sub-S81_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S82/func/sub-S82_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S83/func/sub-S83_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S84/func/sub-S84_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S85/func/sub-S85_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S86/func/sub-S86_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S87/func/sub-S87_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S88/func/sub-S88_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S89/func/sub-S89_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S90/func/sub-S90_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S91/func/sub-S91_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S92/func/sub-S92_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S93/func/sub-S93_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 '/Users/lukechang/Dropbox/Dartbrains/data/Localizer/derivatives/fmriprep/sub-S94/func/sub-S94_task-localizer_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz']

Now let’s download the first subject’s file using ds.get(). This file is 825mb, so this might take a few minutes depending on your internet speed.

result = ds.get(file_list[0])

How much of the dataset have we downloaded? We can check the status of the annex using ds.status(annex='all').

result = ds.status(annex='all')
1794 annex'd files (106.9 MB/42.1 GB present/total size)

Download Data for Course

Now let’s download the data we will use for the course. We will download:

  • sub-S01’s raw data

  • experimental metadata

  • preprocessed data for the first 20 subjects including the fmriprep QC reports.

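# Raw data for the first subject
result = ds.get(os.path.join(localizer_path, 'sub-S01'))
# Experimental metadata stored at the top level of the dataset
result = ds.get(glob.glob(os.path.join(localizer_path, '*.json')))
result = ds.get(glob.glob(os.path.join(localizer_path, '*.tsv')))
result = ds.get(glob.glob(os.path.join(localizer_path, 'phenotype')))
# Preprocessed data: 'sub*' matches every entry in the fmriprep folder whose name starts with 'sub'
file_list = glob.glob(os.path.join(localizer_path, '*', 'fmriprep', 'sub*'))
file_list.sort()
# Download the first 20 entries of the sorted list
for f in file_list[:20]:
    result = ds.get(f)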
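A pattern such as sub* matches every entry whose name starts with sub. If the fmriprep folder holds both a folder and a report file for each subject, slicing the sorted list by position selects fewer subjects than the slice length suggests. One way to select by subject rather than by position is sketched below. It assumes entries are named like sub-S01 and sub-S01.html, which is our guess at the layout, and the helper name paths_for_first_subjects is hypothetical.

```python
import os
import re


def paths_for_first_subjects(paths, n):
    """Keep every path belonging to the first n subject labels, in sorted order."""
    by_subject = {}
    for p in sorted(paths):
        # Pull the subject label (e.g. sub-S01) from the start of the file or folder name
        match = re.match(r"(sub-[A-Za-z0-9]+)", os.path.basename(p))
        if match:
            by_subject.setdefault(match.group(1), []).append(p)
    keep = sorted(by_subject)[:n]
    return [p for label in keep for p in by_subject[label]]


# Made-up paths for illustration
example = [
    "fmriprep/sub-S02",
    "fmriprep/sub-S01.html",
    "fmriprep/sub-S01",
    "fmriprep/sub-S02.html",
    "fmriprep/sub-S03",
]
selected = paths_for_first_subjects(example, 2)
print(selected)
```

Each returned path could then be passed to ds.get in a loop, as in the cell above.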

To get the Python packages for the course, be sure to read the installation instructions in the Introduction to JupyterHub tutorial.

Preprocessing

The data has already been preprocessed using fmriprep, which is a robust but opinionated automated preprocessing pipeline developed by Russ Poldrack’s group at Stanford University. The developers have made a number of choices about how to preprocess your fMRI data using best practices and have created an automated pipeline using multiple software packages that are all distributed via a Docker container.

Though you are welcome to just start working right away with the preprocessed data, here are the steps to run it yourself:

    1. Install Docker and download image

    docker pull poldracklab/fmriprep:<latest-version>

    2. Run a single command in the terminal using the fmriprep-docker wrapper (installable with pip install fmriprep-docker), specifying the location of the data, the location of the output, the participant id, and a few flags that depend on how you want to run the preprocessing.

    fmriprep-docker /Users/lukechang/Dropbox/Dartbrains/Data/localizer /Users/lukechang/Dropbox/Dartbrains/Data/preproc participant --participant_label sub-S01 --write-graph --fs-no-reconall --notrack --fs-license-file ~/Dropbox/Dartbrains/License/license.txt --work-dir /Users/lukechang/Dropbox/Dartbrains/Data/work
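To preprocess several participants, the same command can be assembled and launched from Python. The sketch below only reuses the flags from the command above. The function name and the placeholder paths are illustrative, and the call that would actually start Docker is left commented out.

```python
import subprocess


def build_fmriprep_command(bids_dir, out_dir, work_dir, license_file, participant):
    """Assemble the fmriprep-docker call shown above as an argument list."""
    return [
        "fmriprep-docker", bids_dir, out_dir, "participant",
        "--participant_label", participant,
        "--write-graph",
        "--fs-no-reconall",
        "--notrack",
        "--fs-license-file", license_file,
        "--work-dir", work_dir,
    ]


# Placeholder paths: use absolute paths, since "~" is not expanded without a shell
for participant in ["sub-S01", "sub-S02"]:
    cmd = build_fmriprep_command(
        "/path/to/Localizer", "/path/to/preproc", "/path/to/work",
        "/path/to/license.txt", participant,
    )
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually run the preprocessing
```

Passing the arguments as a list avoids quoting problems when paths contain spaces.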

In practice, it’s always a little bit finicky to get everything set up on a particular system. Sometimes you might run into issues with a specific missing file, like the freesurfer license, even if you’re not using it. You might also run into issues with the format of the data that conflict with the bids-validator. In our experience, there are always some frustrations getting this to work, but it’s very nice once it’s done.