3 Preparation
In this section, we show all code that was used to process the data.
3.1 MRI data to BIDS
3.1.1 Overview
3.1.1.1 Data availability
The data is freely available from https://github.com/lnnrtwttkhn/highspeed-bids and https://gin.g-node.org/lnnrtwttkhn/highspeed-bids.
3.1.1.2 License
The dataset is licensed under Creative Commons Attribution-ShareAlike 4.0. Please see https://creativecommons.org/licenses/by-sa/4.0/ for details.
3.1.2 Step 1: Conversion of MRI data to the Brain Imaging Data Structure (BIDS) using HeuDiConv
3.1.2.1 Overview
After MRI data were acquired at the MRI scanner, we converted all data to adhere to the Brain Imaging Data Structure (BIDS) standard (see Gorgolewski et al., 2016, Scientific Data, for details). In short, BIDS is a community standard for organizing and describing MRI datasets.
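For illustration, BIDS filenames are built from key-value pairs ("entities") joined by underscores; a minimal sketch, in which the subject, session, task, and run labels are hypothetical:

```python
# Compose a BIDS-style functional filename from key-value entities
# (hypothetical labels; see the BIDS specification for the full rules):
entities = {"sub": "01", "ses": "01", "task": "highspeed", "run": "01"}
filename = "_".join(f"{key}-{value}" for key, value in entities.items()) + "_bold.nii.gz"
print(filename)  # sub-01_ses-01_task-highspeed_run-01_bold.nii.gz
```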
3.1.2.2 Container: heudiconv, version 0.6.0
We used HeuDiConv, version 0.6.0, to convert our MRI DICOM data to the BIDS structure. First, we created a Singularity container with HeuDiConv in our cluster environment at the Max Planck Institute for Human Development, Berlin, Germany. The Singularity container is separately available at https://github.com/lnnrtwttkhn/tools and was created using:
singularity pull docker://nipy/heudiconv:0.6.0
For the conversion of DICOM data acquired at the MRI scanner to BIDS-converted NIfTI files, the following scripts were used (these scripts can be found in the code/heudiconv/ directory):
3.1.2.3 Mapping raw DICOMS to BIDS: highspeed-heudiconv-heuristic.py
highspeed-heudiconv-heuristic.py is a Python script that creates a mapping between the DICOMs and the NIfTI-converted files in the BIDS structure.
#!/usr/bin/env python


def create_key(template, outtype=('nii.gz',), annotation_classes=None):
    if template is None or not template:
        raise ValueError('Template must be a valid format string')
    return template, outtype, annotation_classes


def infotodict(seqinfo):
    # paths in BIDS format:
    anat = create_key('sub-{subject}/{session}/anat/sub-{subject}_{session}_rec-{rec}_T1w')
    rest_pre = create_key('sub-{subject}/{session}/func/sub-{subject}_{session}_task-rest_rec-{rec}_run-pre_bold')
    rest_post = create_key('sub-{subject}/{session}/func/sub-{subject}_{session}_task-rest_rec-{rec}_run-post_bold')
    task = create_key('sub-{subject}/{session}/func/sub-{subject}_{session}_task-highspeed_rec-{rec}_run-{item:02d}_bold')
    fmap_topup = create_key('sub-{subject}/{session}/fmap/sub-{subject}_{session}_rec-{rec}_dir-{dir}_epi')
    info = {anat: [], rest_pre: [], rest_post: [], task: [], fmap_topup: []}
    for s in seqinfo:
        if 'NORM' in s.image_type:
            rec = 'prenorm'
        else:
            rec = 'nonorm'
        if ('t1' in s.series_description):
            info[anat].append({'item': s.series_id, 'rec': rec})
        if ('FM_' in s.series_description) and ('prenorm' in rec):
            info[fmap_topup].append({'item': s.series_id, 'rec': rec, 'dir': 'AP'})
        if ('FMInv_' in s.series_description) and ('prenorm' in rec):
            info[fmap_topup].append({'item': s.series_id, 'rec': rec, 'dir': 'PA'})
        if ('Rest_Pre' in s.series_description) and ('prenorm' in rec):
            info[rest_pre].append({'item': s.series_id, 'rec': rec})
        if ('Rest_Post' in s.series_description) and ('prenorm' in rec):
            info[rest_post].append({'item': s.series_id, 'rec': rec})
        # some participants have one post resting state labelled "Rest":
        if ('Rest' in s.series_description) and ('prenorm' in rec):
            info[rest_post].append({'item': s.series_id, 'rec': rec})
        if ('Run' in s.series_description) and ('prenorm' in rec):
            info[task].append({'item': s.series_id, 'rec': rec})
    return info
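To see how such a heuristic template turns into a concrete path, the placeholders can be filled with Python's standard string formatting. This is only a sketch with made-up values; HeuDiConv performs the actual substitution internally:

```python
# Fill a heuristic template with hypothetical values to preview the output path:
template = ('sub-{subject}/{session}/func/'
            'sub-{subject}_{session}_task-highspeed_rec-{rec}_run-{item:02d}_bold')
path = template.format(subject='01', session='ses-01', rec='prenorm', item=3)
print(path)  # sub-01/ses-01/func/sub-01_ses-01_task-highspeed_rec-prenorm_run-03_bold
```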
3.1.2.4 Changing participant IDs: highspeed-heudiconv-anonymizer.py
highspeed-heudiconv-anonymizer.py is an executable Python script that is used in combination with the --anon-cmd flag of the heudiconv command. It turns the original participant IDs (that we used when we ran the study in the lab) into consecutive zero-padded numbers (see this thread on neurostars.org for details).
#!/usr/bin/env python
# ==============================================================================
# SCRIPT INFORMATION:
# ==============================================================================
# SCRIPT: ANONYMIZE PARTICIPANT IDS DURING BIDS-CONVERSION WITH HEUDICONV
# PROJECT: HIGHSPEED
# WRITTEN BY LENNART WITTKUHN, 2018 - 2020
# CONTACT: WITTKUHN AT MPIB HYPHEN BERLIN DOT MPG DOT DE
# MAX PLANCK RESEARCH GROUP NEUROCODE
# MAX PLANCK INSTITUTE FOR HUMAN DEVELOPMENT
# MAX PLANCK UCL CENTRE FOR COMPUTATIONAL PSYCHIATRY AND AGEING RESEARCH
# LENTZEALLEE 94, 14195 BERLIN, GERMANY
# ==============================================================================
# import relevant packages:
import sys
import os
# define paths depending on the operating system:
if 'linux' in sys.platform:
    # define the path to the text file containing the subject IDs:
    path_sublist = os.path.join("/code", "highspeed-participant-list.txt")
    # retrieve the user input:
    ids_orig = open(path_sublist, "r").read().splitlines()
    # define the number of subjects:
    ids_new = ["%02d" % t for t in range(1, len(ids_orig)+1)]
    # create a dictionary mapping original ids to anonymized ids:
    subj_map = dict(zip(ids_orig, ids_new))
    # replace the original ids with zero-padded numbers:
    sid = sys.argv[-1]
    if sid in subj_map:
        print(subj_map[sid])
As a side note, the last step is not really necessary since zero-padded numbers are not required by the BIDS standard.
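The core of the anonymizer is the dictionary that maps original IDs to consecutive zero-padded numbers; a sketch of that logic with made-up original IDs:

```python
# Map (hypothetical) original participant IDs to consecutive zero-padded
# numbers, mirroring the logic of highspeed-heudiconv-anonymizer.py:
ids_orig = ["ABC123", "DEF456", "GHI789"]
ids_new = ["%02d" % t for t in range(1, len(ids_orig) + 1)]
subj_map = dict(zip(ids_orig, ids_new))
print(subj_map["DEF456"])  # 02
```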
3.1.2.5 Running heudiconv on the cluster: highspeed-heudiconv-cluster.sh
highspeed-heudiconv-cluster.sh is a bash script that parallelizes the HeuDiConv command across participants on the high-performance cluster of the Max Planck Institute for Human Development, Berlin, Germany.
Note that, due to privacy concerns, we only saved the BIDS-converted data in the repository after running the defacing (see details below).
#!/usr/bin/bash
# ==============================================================================
# SCRIPT INFORMATION:
# ==============================================================================
# SCRIPT: PARALLELIZE BIDS CONVERSION USING HEUDICONV ON THE MPIB CLUSTER
# PROJECT NAME: HIGHSPEED
# WRITTEN BY LENNART WITTKUHN, 2018 - 2020
# CONTACT: WITTKUHN AT MPIB HYPHEN BERLIN DOT MPG DOT DE
# MAX PLANCK RESEARCH GROUP NEUROCODE
# MAX PLANCK INSTITUTE FOR HUMAN DEVELOPMENT
# MAX PLANCK UCL CENTRE FOR COMPUTATIONAL PSYCHIATRY AND AGEING RESEARCH
# LENTZEALLEE 94, 14195 BERLIN, GERMANY
# ==============================================================================
# DEFINE ALL PATHS:
# ==============================================================================
PATH_BASE="${HOME}"
# path to the project root directory
PATH_ROOT="${PATH_BASE}/highspeed"
# define the name of the project:
PROJECT_NAME="highspeed-bids"
# define the path to the project folder:
PATH_PROJECT="${PATH_ROOT}/${PROJECT_NAME}"
# define the path to the input directory:
PATH_INPUT="${PATH_PROJECT}/input/mri"
# define the path to the output directory
PATH_OUTPUT="${PATH_PROJECT}"
# define the path to the singularity container:
PATH_CONTAINER="${PATH_PROJECT}/tools/heudiconv/heudiconv_0.6.0.sif"
# define the path to the code main directory:
PATH_CODE="${PATH_PROJECT}/code/heudiconv"
# path to the heudiconv heuristic file:
HEURISTIC_FILE="highspeed-heudiconv-heuristic.py"
# define path to the python executable file that anonymizes the subject ids:
ANON_FILE="highspeed-heudiconv-anonymizer.py"
# make the anonymizer file executable:
chmod +x "${PATH_CODE}/$ANON_FILE"
# path to the directory where error and output logs of cluster jobs are saved:
PATH_LOGS="${PATH_PROJECT}/logs/heudiconv/$(date '+%Y%m%d_%H%M%S')"
# path to the text file with all subject ids:
PATH_SUB_LIST="${PATH_CODE}/highspeed-participant-list.txt"
# ==============================================================================
# CREATE RELEVANT DIRECTORIES:
# ==============================================================================
# create directory for log files:
if [ ! -d ${PATH_LOGS} ]; then
mkdir -p ${PATH_LOGS}
echo "created ${PATH_LOGS}"
fi
# ==============================================================================
# DEFINE PARAMETERS:
# ==============================================================================
# maximum number of cpus per process:
N_CPUS=1
# memory demand in *GB*
MEM_GB=6
# memory demand in *MB*
MEM_MB="$((${MEM_GB} * 1000))"
# read subject ids from the list of the text file
SUB_LIST=$(cat ${PATH_SUB_LIST} | tr '\n' ' ')
# ==============================================================================
# RUN HEUDICONV:
# ==============================================================================
# initialize a subject counter:
SUB_COUNT=0
# loop over all subjects:
for SUB in ${SUB_LIST}; do
# update the subject counter:
let SUB_COUNT=SUB_COUNT+1
# get the subject number with zero padding:
SUB_PAD=$(printf "%02d\n" $SUB_COUNT)
# loop over all sessions:
for SES in `seq 1 2`; do
# get the session number with zero padding:
SES_PAD=$(printf "%02d\n" $SES)
# define the dicom template for the heudiconv command:
DICOM_DIR_TEMPLATE="HIGHSPEED_{subject}_HIGHSPEED_{subject}_${SES}*/*/*/*IMA"
# check the existence of the input files and continue if data is missing:
if [ ! -d ${PATH_INPUT}/HIGHSPEED_${SUB}_HIGHSPEED_${SUB}_${SES}_* ]; then
echo "No data input available for sub-${SUB} ses-${SES_PAD}!"
continue
fi
# start slurm job file:
echo "#!/bin/bash" > job
# name of the job:
echo "#SBATCH --job-name heudiconv_sub-${SUB_PAD}_ses-${SES_PAD}" >> job
# select partition:
echo "#SBATCH --partition gpu" >> job
# set the expected maximum running time for the job:
echo "#SBATCH --time 12:00:00" >> job
# determine how much RAM your operation needs:
echo "#SBATCH --mem ${MEM_GB}GB" >> job
# request multiple cpus
echo "#SBATCH --cpus-per-task ${N_CPUS}" >> job
# write output and error log to log folder:
echo "#SBATCH --output ${PATH_LOGS}/slurm-heudiconv-%j.out" >> job
# email notification on abort/end, use 'n' for no notification:
echo "#SBATCH --mail-type NONE" >> job
# set working directory
echo "#SBATCH --workdir ." >> job
# define the heudiconv command:
echo "singularity run --contain -B ${PATH_INPUT}:/input:ro \
-B ${PATH_OUTPUT}:/output:rw -B ${PATH_CODE}:/code:ro \
${PATH_CONTAINER} -d /input/${DICOM_DIR_TEMPLATE} -s ${SUB} \
--ses ${SES_PAD} -o /output -f /code/${HEURISTIC_FILE} \
--anon-cmd /code/${ANON_FILE} -c dcm2niix -b --overwrite" >> job
# submit job to cluster queue and remove it to avoid confusion:
sbatch job
rm -f job
done
done
We acquired both pre-normalized and non-normalized MRI data (see e.g., here for more information on pre-normalization). All analyses reported in the paper were based on the pre-normalized data. Only the pre-normalized dataset is published, because uploading the dataset in two versions (with and without pre-normalization) would cause interference when running fMRIPrep (see here).
3.1.2.6 Resources
The following resources helped along the way (thank you, people on the internet!):
- Heudiconv documentation
- “Heudiconv: Example Usage” - Tutorial by James Kent
- “Heudiconv on multiple subjects” - Discussion on neurostars.org
- “Heudiconv across multiple sessions” - Discussion on neurostars.org
- “Heudiconv: How to turn subject IDs into consecutive zero-padded numbers as required by BIDS?” - Discussion on neurostars.org
- “DICOM to BIDS conversion” - A YouTube tutorial
- “BIDS Tutorial Series: HeuDiConv Walkthrough” - A tutorial by the Stanford Center for Reproducible Neuroscience
3.1.3 Step 2: Removal of facial features using pydeface
3.1.3.1 Overview
Facial features need to be removed from structural images before sharing the data online. See the statement from openfmri.org below regarding the importance of defacing:
To protect the privacy of the individuals who have been scanned we require that all subjects be de-identified before publishing a dataset. For the purposes of fMRI, de-facing is the preferred method of de-identification of scan data. Skull-stripped data will not be accepted for publication.
and a second statement from the openneuro.org FAQs:
Yes. We recommend using pydeface. Defacing is strongly preferred over skullstripping, because the process is more robust and yields a lower chance of accidentally removing brain tissue.
3.1.3.2 Container: pydeface, version 2.0.0
Defacing of all structural images was performed using pydeface, version 2.0.0 (with nipype, version 1.3.0-rc1).
To ensure robustness of the defacing procedure, we used a Singularity container for pydeface, which we installed as follows:
singularity pull docker://poldracklab/pydeface:37-2e0c2d
All scripts that we used for defacing can be found in the code/defacing directory.
3.1.3.3 Defacing structural images: highspeed-defacing-cluster.sh
First, we ran highspeed-defacing-cluster.sh to deface all structural images in the corresponding BIDS dataset.
Note that this script was optimized to run on the high-performance cluster of the Max Planck Institute for Human Development, Berlin.
#!/bin/bash
# ==============================================================================
# SCRIPT INFORMATION:
# ==============================================================================
# SCRIPT: DEFACING ANATOMICAL MRI DATA IN A BIDS DATASET
# PROJECT: HIGHSPEED
# WRITTEN BY LENNART WITTKUHN, 2018 - 2020
# CONTACT: WITTKUHN AT MPIB HYPHEN BERLIN DOT MPG DOT DE
# MAX PLANCK RESEARCH GROUP NEUROCODE
# MAX PLANCK INSTITUTE FOR HUMAN DEVELOPMENT
# MAX PLANCK UCL CENTRE FOR COMPUTATIONAL PSYCHIATRY AND AGEING RESEARCH
# LENTZEALLEE 94, 14195 BERLIN, GERMANY
# ==============================================================================
# DEFINE ALL PATHS:
# ==============================================================================
# define home directory:
PATH_BASE="${HOME}"
# define the name of the current task:
TASK_NAME="pydeface"
# define the name of the project:
PATH_ROOT="${PATH_BASE}/highspeed"
# define the name of the project:
PROJECT_NAME="highspeed-bids"
# define the path to the project folder:
PATH_PROJECT="${PATH_ROOT}/${PROJECT_NAME}"
# define the path to the singularity container:
PATH_CONTAINER="${PATH_PROJECT}/tools/${TASK_NAME}/${TASK_NAME}_37-2e0c2d.sif"
# path to the log directory:
PATH_LOG="${PATH_PROJECT}/logs/${TASK_NAME}"
# path to the data directory (in BIDS format):
PATH_BIDS="${PATH_PROJECT}"
# ==============================================================================
# CREATE RELEVANT DIRECTORIES:
# ==============================================================================
# create directory for log files:
if [ ! -d ${PATH_LOG} ]; then
mkdir -p ${PATH_LOG}
else
# remove old log files inside the log container:
rm -r ${PATH_LOG}/*
fi
# ==============================================================================
# DEFINE PARAMETERS:
# ==============================================================================
# maximum number of cpus per process:
N_CPUS=1
# memory demand in *MB*
MEM_MB=500
# memory demand in *KB*
MEM_KB="$((${MEM_MB} * 1000))"
# ==============================================================================
# RUN PYDEFACE:
# ==============================================================================
for FILE in ${PATH_BIDS}/*/*/anat/*T1w.nii.gz; do
# to just get filename from a given path:
FILE_BASENAME="$(basename -- $FILE)"
# get the parent directory:
FILE_PARENT="$(dirname "$FILE")"
# create cluster job:
echo "#!/bin/bash" > job
# name of the job
echo "#SBATCH --job-name pydeface_${FILE_BASENAME}" >> job
# set the expected maximum running time for the job:
echo "#SBATCH --time 1:00:00" >> job
# determine how much RAM your operation needs:
echo "#SBATCH --mem ${MEM_MB}MB" >> job
# email notification on abort/end, use 'n' for no notification:
echo "#SBATCH --mail-type NONE" >> job
# write log to log folder:
echo "#SBATCH --output ${PATH_LOG}/slurm-%j.out" >> job
# request multiple cpus
echo "#SBATCH --cpus-per-task ${N_CPUS}" >> job
# define the main command:
echo "singularity run --contain -B ${FILE_PARENT}:/input:rw ${PATH_CONTAINER} \
pydeface /input/${FILE_BASENAME} --force" >> job
# submit job to cluster queue and remove it to avoid confusion:
sbatch job
rm -f job
done
3.1.3.4 Replacing defaced with original images: highspeed-defacing-cleanup.sh
pydeface creates a new file ending in T1w_defaced.nii.gz.
As fMRIPrep, MRIQC and other tools need to use the defaced instead of the original image, we need to replace the original with the defaced image.
This can be done separately after pydeface was run.
In order to replace the original structural images with the defaced ones (as recommended), we ran highspeed-defacing-cleanup.sh.
#!/bin/bash
# ==============================================================================
# SCRIPT INFORMATION:
# ==============================================================================
# SCRIPT: REPLACING ORIGINAL STRUCTURAL IMAGES WITH DEFACED STRUCTURAL IMAGES
# PROJECT: HIGHSPEED
# WRITTEN BY LENNART WITTKUHN, 2018 - 2020
# CONTACT: WITTKUHN AT MPIB HYPHEN BERLIN DOT MPG DOT DE
# MAX PLANCK RESEARCH GROUP NEUROCODE
# MAX PLANCK INSTITUTE FOR HUMAN DEVELOPMENT
# MAX PLANCK UCL CENTRE FOR COMPUTATIONAL PSYCHIATRY AND AGEING RESEARCH
# LENTZEALLEE 94, 14195 BERLIN, GERMANY
# ==============================================================================
# DEFINE ALL PATHS:
# ==============================================================================
# define home directory
PATH_BASE="${HOME}"
# define the name of the current task:
TASK_NAME="pydeface"
# define the name of the project:
PROJECT_NAME="highspeed"
# path to the data directory (in bids format):
PATH_BIDS="${PATH_BASE}/${PROJECT_NAME}/highspeed-bids"
# ==============================================================================
# REMOVE ORIGINAL T1W IMAGES AND REPLACE WITH DEFACED ONES:
# ==============================================================================
for FILE in ${PATH_BIDS}/*/*/anat/*T1w_defaced.nii.gz; do
# to just get filename from a given path:
FILE_BASENAME="$(basename -- $FILE)"
# get the parent path of directories:
FILE_PARENT="$(dirname "$FILE")"
# get the file name without the _defaced extension:
FILE_NEW="${FILE_BASENAME//_defaced}"
# remove the undefaced T1w file:
rm -rf ${FILE_PARENT}/${FILE_NEW}
echo "removed ${FILE_PARENT}/${FILE_NEW}"
# replace the original T1w image with the defaced version:
mv ${FILE} ${FILE_PARENT}/${FILE_NEW}
echo "replaced with ${FILE}"
done
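The renaming step relies on bash parameter expansion: ${FILE_BASENAME//_defaced} removes every occurrence of "_defaced" from the filename. The same transformation in Python, applied to a hypothetical filename:

```python
# Strip the "_defaced" infix from a (hypothetical) filename, as the
# bash expansion ${FILE_BASENAME//_defaced} does:
file_basename = "sub-01_ses-01_rec-prenorm_T1w_defaced.nii.gz"
file_new = file_basename.replace("_defaced", "")
print(file_new)  # sub-01_ses-01_rec-prenorm_T1w.nii.gz
```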
3.1.3.5 Resources
- “Pydeface defaces structural data only?!” - Discussion on neurostars.org on whether any data other than structural acquisitions should be defaced (short answer: no!)
- “Is/how much fmriprep (freesurfer et al) is resilient to “defacing”?” - Discussion on neurostars.org on whether fMRIPrep works well with defaced data (short answer: yes!)
3.1.4 Step 3: Adding BIDS *events.tsv files
3.1.4.1 Overview
According to the BIDS specification …
The purpose of this file is to describe timing and other properties of events recorded during the scan.
Thus, we transformed all behavioral data that was acquired during the scanning sessions to run-specific *events.tsv
files that are compatible with the BIDS standard.
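For illustration, an events file is a tab-separated table whose first columns must be onset and duration (in seconds); the rows below are made-up examples, not actual study data:

```python
import csv
import io

# Write a minimal, hypothetical *events.tsv: BIDS requires tab separation
# with 'onset' and 'duration' (in seconds) as the first columns.
rows = [
    {"onset": 0.0, "duration": 0.5, "trial_type": "stimulus"},
    {"onset": 2.0, "duration": 0.5, "trial_type": "stimulus"},
]
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["onset", "duration", "trial_type"],
                        delimiter="\t")
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```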
3.1.4.2 Code and software
We mainly used two MATLAB files that were run in MATLAB version R2017b. The code takes the original behavioral data files acquired during scanning as input and returns run-specific BIDS-compatible *events.tsv
files.
%% SCRIPT: CREATE EVENT.TSV FILES FROM THE BEHAVIORAL DATA FOR BIDS
% =========================================================================
% PROJECT: HIGHSPEED
% WRITTEN BY LENNART WITTKUHN 2018 - 2020
% CONTACT: WITTKUHN AT MPIB HYPHEN BERLIN DOT MPG DOT DE
% MAX PLANCK RESEARCH GROUP NEUROCODE
% MAX PLANCK INSTITUTE FOR HUMAN DEVELOPMENT
% MAX PLANCK UCL CENTRE FOR COMPUTATIONAL PSYCHIATRY AND AGEING RESEARCH
% LENTZEALLEE 94, 14195 BERLIN, GERMANY
% =========================================================================
%% DEFINE PATHS AND IMPORTANT VARIABLES:
% clear the workspace and command window:
clear variables; clc;
% define the data root path
= strsplit(pwd, 'code');
path_root = path_root{1};
path_root % define the input path:
= fullfile(path_root, 'input', 'behavior', 'main');
path_input % define the output path:
= path_root;
path_output % get the contents of the output directory:
= dir(path_output);
path_output_dir % check how many subjects are in the root directory:
= sum(contains({path_output_dir.name},'sub'));
num_subs_found % extended output path used to check for old files:
= fullfile(path_output,'*','*','func');
path_old_files % find all existing events.tsv files in the output directory:
= dir(fullfile(path_old_files,'*events.tsv'));
prev_files % delete all previous events files:
for old_file = 1:length(prev_files)
delete(fullfile(prev_files(old_file).folder,prev_files(old_file).name))
end
% define the script path:
= fullfile(path_root, 'code');
path_script % read the text file containing a list of subject ids:
= dlmread(fullfile(path_script, 'heudiconv', 'highspeed-participant-list.txt'));
sub_list % turn the array with ids into a strings in a cell array:
= cellstr(num2str(sub_list));
sub_list %check if the number of subjects in the list matches the target directory
if numel(sub_list) ~= num_subs_found
warning(['Number of subjects in the data dir does not match ' ...
'number of subjects in the subject text file!']);
= cellfun(@num2str,num2cell(1:length(sub_list)),'un',0);
sub_alt_list else
= sub_list;
sub_alt_list = cellfun(@num2str,num2cell(1:num_subs_found),'un',0);
sub_alt_list end
% determine the number of study sessions:
= 2;
num_ses % determine the number of task runs per study session:
= 4;
num_run % define a cell array containing the stimulus labels in german:
= {'Gesicht','Haus','Katze','Schuh','Stuhl'};
key_set % define a cell array containing the stimulus labels in english:
= {'Face','House','Cat','Shoe','Chair'};
value_set % create a dictionary that translates the stimulus labels:
= containers.Map(key_set,value_set);
label_dict % create a 2d-array of run indices ordered by run (row) and sessions (col):
= reshape(1:num_run * num_ses, num_run, num_ses);
run_array % define the names of the four different task conditions:
= {'oddball','sequence','repetition','repetition'};
task_names %%
for sub = 1:length(sub_alt_list)
%for sub = 1:1
% initialize the maximum repetition trial index:
= 0;
max_rep % get the current subject id:
= sub_list{sub};
sub_id % print progress:
fprintf('Running sub %d of %d\n', sub, length(sub_alt_list))
% define a template string that takes subject, session and run id:
= '*sub_%s_session_%d*run_%d*';
template_string % put in the current subject, session and run id:
= sprintf(template_string,sub_id,num_ses,num_run);
file_string % read behavioral data files of all participants:
= dir(fullfile(path_input,file_string));
path_file % load the behavioral data into the workspace:
load(fullfile(path_input,path_file.name));
for session = 1:num_ses
% create a subject identifier (in bids format):
= sprintf('sub-%02d',str2double(sub_alt_list{sub}));
pad_sub % create a session identififer (in bids format):
= ['ses-0', num2str(session)];
pad_ses % combine the two identifiers as the first part of file names:
= strcat(pad_sub,'_',pad_ses);
sub_file_name % create the subject output path:
= (fullfile(path_output,pad_sub,pad_ses,'func'));
path_output_sub % create the subject directory if it does not exist yet:
if ~exist(path_output_sub,'dir')
system(sprintf('mkdir -p %s',path_output_sub));
end
for run = 1:num_run
events = table;
for cond = 1:4
= extract_bids_events(Data, Basics, Sets, pad_sub, run, session, cond);
event_all events = [events;event_all];
end
% sort by event onset (i.e., in chronological order):
events = sortrows(events,{'onset'});
% make two copies of the repetition trials:
= events.trial(contains(events.condition, 'repetition'));
rep_trials_old = rep_trials_old;
rep_trials_new % get the old trial indices while maintaining their order:
= unique(rep_trials_old, 'stable');
trial_old % get the number of repetition trials in the current run:
= length(trial_old);
n_rep_trials % create new trial indices depending on the running number of
% repetition trials:
= max_rep+1:max_rep+n_rep_trials;
trial_new % change the old trial indices
for i = 1:n_rep_trials
== trial_old(i)) = trial_new(i);
rep_trials_new(rep_trials_old end
% update the repetition trials of the events files:
events.trial(contains(events.condition, 'repetition')) = rep_trials_new;
% update the counter of the maximum repetition trial index:
= max(unique(events.trial(contains(events.condition, 'repetition'))));
max_rep % create template string file for data output (tsv format):
= '_task-highspeed_rec-prenorm_run-0%d_events';
string_template % write conditon and run information into the string:
= sprintf(string_template,run);
string_written % create the full filenames:
= strcat(sub_file_name,string_written);
outfile_name % create paths of the tsv and csv files:
= fullfile(path_output_sub,strcat(outfile_name,'.tsv'));
path_tsv = fullfile(path_output_sub,strcat(outfile_name,'.csv'));
path_csv % write the events table as csv file:
events,path_csv,'Delimiter','\t');
writetable(% copy the created file from csv to tsv file:
copyfile(path_csv,path_tsv)
% delete the csv file:
delete(path_csv);
end
end
end
function event_all = extract_bids_events(Data, Basics, Sets, pad_sub, run, ses, cond)
%% CONVERT DATASET TO TABLE AND GET THE VARIABLE NAMES:
% convert data table from dataset to table type:
data = dataset2table(Data(cond).data);
% get the variable names of the current data table:
var_names = data.Properties.VariableNames;
% get all events we want to convert:
all_events = var_names(contains(var_names, {'tFlip'}));
%% BASIC TASK STATS
% determine the number of study sessions:
num_ses = 2;
% determine the number of task runs per study session:
num_run = 4;
% get the indices of the current session (as booleans):
idx_session = Basics.runInfo.session == ses;
% get the indices of the current run (as booleans):
idx_run = Basics.runInfo.run == run;
% get the timestamp of the first scanner trigger:
t_trigger = Basics.runInfo.tTrigger(idx_session & idx_run);
% get the data indices of the current session:
idx_data_ses = data.session == ses;
% get the data indices of the current run within session:
idx_data_run = data.run == run;
% combine the indices to get the correct data indices:
index = idx_data_ses & idx_data_run;
% create a 2d-array of run indices ordered by run (row) and sessions (col):
run_array = reshape(1:num_run * num_ses, num_run, num_ses);
% define the names of the four different task conditions:
task_names = {'oddball','sequence','repetition','repetition'};
%% DEFINE DICTIONARY FOR THE STIMULUS LABELS:
% define a cell array containing the stimulus labels in german:
keys_stim = {'Gesicht','Haus','Katze','Schuh','Stuhl'};
% define a cell array containing the stimulus labels in english:
value_stim = {'face','house','cat','shoe','chair'};
% create a dictionary that translates the stimulus labels:
dict_stim = containers.Map(keys_stim,value_stim);
%% DEFINE DICTIONARY FOR THE EVENTS:
% define a cell array containing the event labels:
keys_type = {'tFlipCue','tFlipBlank','tFlipFix','tFlipStim','tFlipITI','tFlipDelay','tFlipResp','tResponse'};
% define a cell array containing the event names:
value_type = {'cue','blank','fixation','stimulus','interval','delay','choice','response'};
% create a dictionary that translates the event labels:
dict_type = containers.Map(keys_type,value_type);
%% LOOP OVER ALL EVENTS AND GATHER THE EVENT INFORMATION
event_all = table;
for i = 1:length(all_events)
    % get the current event:
    event_type = all_events{i};
    % get the number of sequential stimuli of that event:
    num_seq_stim = size(data{:,event_type},2);
    % number of trials of cond in the current run and session:
    num_events = sum(index) * num_seq_stim;
    % initialize empty events struct:
    event = struct;
    % onsets, in seconds from first trigger:
    event.onset = data{index, event_type} - t_trigger;
    event.onset = reshape(transpose(event.onset),[],1);
    % duration, in seconds:
    if strcmp(event_type,'tFlipCue')
        event.duration = repmat(Basics.tTargetCue,num_events,1);
    elseif strcmp(event_type, 'tFlipBlank')
        event.duration = repmat(Basics.tPreFixation,num_events,1);
    elseif strcmp(event_type, 'tFlipFix')
        event.duration = repmat(Basics.tFixation,num_events,1);
    elseif strcmp(event_type, 'tFlipStim')
        event.duration = repmat(Sets(cond).set.tStim,num_events,1);
    elseif strcmp(event_type, 'tFlipITI')
        event.duration = repelem(data.tITI(index,:),num_seq_stim,1);
    elseif strcmp(event_type, 'tFlipDelay')
        event.duration = (data{index, 'tFlipResp'} - t_trigger) - event.onset;
    elseif strcmp(event_type, 'tFlipResp')
        event.duration = repmat(Basics.tResponseLimit,num_events,1);
    end
    % participant id:
    event.subject = repmat({pad_sub},num_events,1);
    % add column that contains the session identifier:
    event.session = repmat(ses,num_events,1);
    % run within session:
    event.run_session = repmat(run,num_events,1);
    % run across the entire experiment:
    event.run_study = repmat(run_array(run,ses),num_events,1);
    % add column that contains the trial counter:
    if cond == 4
        trial_indices = 41:1:45;
        event.trial = repelem(trial_indices(index)',num_seq_stim,1);
    else
        event.trial = repelem(find(index),num_seq_stim,1);
    end
    % add column that contains the condition:
    event.condition = repmat(task_names(cond),num_events,1);
    % add column that contains the trial type:
    event.trial_type = (repmat({dict_type(event_type)},num_events,1));
    % initialize all other event information:
    event.serial_position = nan(num_events,1);
    event.interval_time = nan(num_events,1);
    event.stim_orient = nan(num_events,1);
    event.stim_index = nan(num_events,1);
    event.stim_label = event.trial_type;
    %event.stim_file = strcat('images/',event.stim_label,'.jpg');
    event.target = nan(num_events,1);
    event.nontarget = nan(num_events,1);
    event.key_down = nan(num_events,1);
    event.key_id = repmat({NaN},num_events,1);
    event.key_target = repmat({NaN},num_events,1);
    event.accuracy = nan(num_events,1);
    event.response_time = nan(num_events,1);
    if strcmp(event_type, 'tFlipStim')
        % add column that contains the sequential position:
        event.serial_position = repmat(1:num_seq_stim,1,sum(index))';
        % add column that contains the inter-stimulus interval:
        event.interval_time = repelem(data.tITI(index,:),num_seq_stim,1);
        % add column that contains the stimulus orientation:
        event.stim_orient = repelem(data.orient(index,:),num_seq_stim,1);
        % get stimulus labels of the current run:
        event.stim_index = data.stimIndex(index,:);
        event.stim_index = reshape(transpose(event.stim_index),[],1);
        % add column that contains the translated stimulus label:
        event.stim_label = transpose(value_stim(event.stim_index));
        %event.stim_file = strcat('images/',event.stim_label,'.jpg');
        % add column that indicates whether stimulus is a target:
        if cond == 1
            event.target = double(event.stim_orient == 180);
            event.nontarget = nan(sum(index) * num_seq_stim,1);
        elseif cond == 2 || cond == 3 || cond == 4
            A = data.stimIndex(index,:);
            V = data.targetPos(index,:);
            W = data.targetPosAlt(index,:);
            event.target = bsxfun(@eq, cumsum(ones(size(A)), 2), V);
            event.target = reshape(transpose(event.target),[],1);
            event.nontarget = bsxfun(@eq, cumsum(ones(size(A)), 2), W);
            event.nontarget = reshape(transpose(event.nontarget),[],1);
        end
    end
    % add participant responses:
    if (strcmp(event_type, 'tFlipStim') && strcmp(task_names{cond}, 'oddball')) || ...
            (strcmp(event_type, 'tFlipResp') && ~strcmp(task_names{cond}, 'oddball'))
        % key press:
        event.key_down = repelem(data.keyIsDown(index,:),num_seq_stim,1);
        % key identity:
        event.key_id = repelem(data.keyIndex(index,:),num_seq_stim,1);
        if ~isempty(event.key_id)
            event.key_id = cellstr(num2str(event.key_id));
            event.key_id(strcmp(strrep(event.key_id,' ',''),'90')) = {'left'};
            event.key_id(strcmp(strrep(event.key_id,' ',''),'71')) = {'right'};
            event.key_id(~strcmp(event.key_id,'left') & ...
                ~strcmp(event.key_id,'right')) = {NaN};
        end
        % key target:
        if ismember('keyTarget',data.Properties.VariableNames)
            event.key_target = repelem(data.keyTarget(index,:),num_seq_stim,1);
        else
            event.key_target = repmat({NaN},sum(index) * num_seq_stim,1);
        end
        % accuracy:
        event.accuracy = repelem(data.acc(index,:),num_seq_stim,1);
        % response time:
        event.response_time = repelem(data.rt(index,:),num_seq_stim,1);
    end
    events = struct2table(event);
    event_all = [event_all;events];
end
% remove all events that have no onset:
event_all(isnan(event_all.onset),:) = [];
end
3.1.5 Step 4: Adding additional information to the BIDS dataset
3.1.5.1 Code and software
First, we created a virtual environment using Python version 3.8.5 to install all dependencies and required packages.
A list of all required packages can be found in requirements.txt (created using pip freeze > requirements.txt) and installed via pip install -r requirements.txt.
mkvirtualenv highspeed-bids -p $(which python3)
python --version
Python 3.8.5
3.1.5.2 Updating dataset_description.json: python highspeed-bids-description.py
highspeed-bids-description.py is a short Python script that loads the dataset_description.json file that is pre-generated by HeuDiConv and populates it with the relevant study information.
datalad run -m "create dataset_description.json" --output "dataset_description.json" "python3 code/bids_conversion/highspeed-bids-description.py"
# ======================================================================
# SCRIPT INFORMATION:
# ======================================================================
# SCRIPT: UPDATE OF BIDS DIRECTORY
# PROJECT: HIGHSPEED
# WRITTEN BY LENNART WITTKUHN, 2018 - 2020
# CONTACT: WITTKUHN AT MPIB HYPHEN BERLIN DOT MPG DOT DE
# MAX PLANCK RESEARCH GROUP NEUROCODE
# MAX PLANCK INSTITUTE FOR HUMAN DEVELOPMENT
# MAX PLANCK UCL CENTRE FOR COMPUTATIONAL PSYCHIATRY AND AGEING RESEARCH
# LENTZEALLEE 94, 14195 BERLIN, GERMANY
# ======================================================================
# IMPORT RELEVANT PACKAGES
# ======================================================================
import json
import os
# ======================================================================
# DEFINE PATHS
# ======================================================================
# path to the project root:
project_name = 'highspeed-bids'
path_root = os.getcwd().split(project_name)[0] + project_name
path_desc = os.path.join(path_root, 'dataset_description.json')
# ======================================================================
# UPDATE DATA-SET DESCRIPTION FILE
# ======================================================================
# open the dataset_description.json file:
with open(path_desc) as json_file:
    json_desc = json.load(json_file)
json_file.close()
# update fields of the json file:
json_desc["Acknowledgements"] = "This work was funded by a research group grant awarded to NWS by the Max Planck Society (M.TN.A.BILD0004). We thank Eran Eldar, Sam Hall-McMaster and Ondrej Zika for helpful comments on a previous version of this manuscript, Gregor Caregnato for help with participant recruitment and data collection, Anika Loewe, Sonali Beckmann and Nadine Taube for assistance with MRI data acquisition, Lion Schulz for help with behavioral data analysis, Michael Krause for support with cluster computing and all participants for their participation. Lennart Wittkuhn is a pre-doctoral fellow of the International Max Planck Research School on Computational Methods in Psychiatry and Ageing Research (IMPRS COMP2PSYCH). The participating institutions are the Max Planck Institute for Human Development, Berlin, Germany, and University College London, London, UK. For more information, see https://www.mps-ucl-centre.mpg.de/en/comp2psych."
json_desc["Authors"] = ["Lennart Wittkuhn", "Nicolas W. Schuck"]
json_desc["Funding"] = ["M.TN.A.BILD0004"]
json_desc["DatasetDOI"] = "https://gin.g-node.org/lnnrtwttkhn/highspeed-bids/"
json_desc["License"] = "Creative Commons Attribution-NonCommercial-ShareAlike 4.0"
json_desc["Name"] = "Faster than thought: Detecting sub-second activation sequences with sequential fMRI pattern analysis"
json_desc["ReferencesAndLinks"] = ["Wittkuhn, L. and Schuck, N. W. (2020). Faster than thought: Detecting sub-second activation sequences with sequential fMRI pattern analysis. bioRxiv. doi: 10.1101/2020.02.15.950667"]
json_desc["HowToAcknowledge"] = "Please cite: Wittkuhn, L. and Schuck, N. W. (2020). Faster than thought: Detecting sub-second activation sequences with sequential fMRI pattern analysis. bioRxiv. doi: 10.1101/2020.02.15.950667"
json_desc["EthicsApprovals"] = ["The research protocol was approved by the ethics commission of the German Psychological Society (DPGs), reference number: NS 012018"]
# save updated dataset_description.json file:
with open(path_desc, 'w') as outfile:
    json.dump(json_desc, outfile, indent=4)
outfile.close()
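The fields set by this script extend the skeleton dataset_description.json that HeuDiConv generates. As a stand-alone illustration (all values below are placeholders, not the study's actual metadata), a minimal BIDS-valid file contains the required Name and BIDSVersion fields plus any optional ones:

```python
# Minimal dataset_description.json sketch; all values are placeholders.
import json

desc = {
    "Name": "Example dataset",                # required by BIDS
    "BIDSVersion": "1.1.0",                   # required by BIDS
    "Authors": ["Author One", "Author Two"],  # optional
    "License": "CC BY-SA 4.0",                # optional
}
print(json.dumps(desc, indent=4))
```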
3.1.5.3 Updating the .json files of fieldmap data: python highspeed-bids-fieldmaps.py
During later pre-processing of the MRI data with fMRIPrep, we want to use the fieldmap data for distortion correction.
To ensure that fMRIPrep detects and uses the fieldmap data, we need to add the IntendedFor field to the .json files of the fieldmap data, providing relative paths to the functional task data (see here for more information).
This is done using the code/bids_conversion/highspeed-bids-fieldmaps.py file.
To record the provenance of the run command in our DataLad dataset, we run:
datalad run -m "create IntendedFor fields in fieldmap .json files" \
--input "./*/*/*/*.nii.gz" --output "./*/*/fmap/*.json" \
"python3 code/bids_conversion/highspeed_bids_fieldmaps.py"
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# ======================================================================
# SCRIPT INFORMATION:
# ======================================================================
# SCRIPT: UPDATE THE FIELDMAP JSON FILES
# PROJECT: HIGHSPEED
# WRITTEN BY LENNART WITTKUHN, 2018 - 2020
# CONTACT: WITTKUHN AT MPIB HYPHEN BERLIN DOT MPG DOT DE
# MAX PLANCK RESEARCH GROUP NEUROCODE
# MAX PLANCK INSTITUTE FOR HUMAN DEVELOPMENT
# MAX PLANCK UCL CENTRE FOR COMPUTATIONAL PSYCHIATRY AND AGEING RESEARCH
# LENTZEALLEE 94, 14195 BERLIN, GERMANY
# ======================================================================
# IMPORT RELEVANT PACKAGES
# ======================================================================
import os
import glob
import json
import stat
# ======================================================================
# DEFINE PATHS
# ======================================================================
# to run type python3 bids_fieldmaps_info.py $PATH_BIDS
# where $PATH_BIDS is the path to your BIDS directory
# path to the project root:
project_name = 'highspeed-bids'
path_root = os.getcwd().split(project_name)[0] + project_name
path_fmap = os.path.join(path_root, '*', '*', 'fmap', '*.json')
path_func = os.path.join(path_root, '*', '*', 'func', '*.nii.gz')
# ======================================================================
# UPDATE FIELDMAP JSON FILES
# ======================================================================
# get all fieldmap files in the data-set:
files_fmap = glob.glob(path_fmap)
# loop over all field-map files:
for file_path in files_fmap:
    # open the .json file of the fieldmap acquisition:
    with open(file_path, 'r') as in_file:
        json_info = json.load(in_file)
    in_file.close()
    # get the path to the session folder of a specific participant:
    file_base = os.path.dirname(os.path.dirname(file_path))
    # get the path to all functional acquisitions in that session:
    files_func = glob.glob(os.path.join(file_base, 'func', '*nii.gz'))
    session = os.path.basename(file_base)
    up_dirs = os.path.join(session, 'func')
    intended_for = [os.path.join(up_dirs, os.path.basename(file)) for file in files_func]
    json_info["IntendedFor"] = sorted(intended_for)
    # change file permissions to allow writing:
    permissions = os.stat(file_path).st_mode
    os.chmod(path=file_path, mode=permissions | stat.S_IWUSR)
    # save updated fieldmap json-file:
    with open(file_path, 'w') as out_file:
        json.dump(json_info, out_file, indent=2, sort_keys=True)
    out_file.close()
    # change file permissions back to read-only:
    os.chmod(path=file_path, mode=permissions)
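To illustrate what the script produces, here is a sketch of the same path logic applied to a hypothetical file layout (the subject, session, and file names are invented for illustration; the real script globs the actual dataset):

```python
# Sketch: how the relative IntendedFor paths are assembled.
# Paths are expressed relative to the subject folder, as fMRIPrep expects.
import os

def intended_for_paths(fmap_json_path, func_filenames):
    # session directory, e.g. .../sub-01/ses-01
    file_base = os.path.dirname(os.path.dirname(fmap_json_path))
    session = os.path.basename(file_base)    # 'ses-01'
    up_dirs = os.path.join(session, 'func')  # 'ses-01/func'
    return sorted(os.path.join(up_dirs, f) for f in func_filenames)

paths = intended_for_paths(
    '/data/bids/sub-01/ses-01/fmap/sub-01_ses-01_magnitude1.json',
    ['sub-01_ses-01_task-highspeed_run-01_bold.nii.gz'])
print(paths)
# ['ses-01/func/sub-01_ses-01_task-highspeed_run-01_bold.nii.gz']
```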
3.1.5.4 Creating the participants.tsv and participants.json files
According to the BIDS specification, version 1.1.0 …
[…] the purpose of this file is to describe properties of participants such as age, handedness, sex, etc. In case of single session studies this file has one compulsory column participant_id that consists of sub-&lt;participant_label&gt;, followed by a list of optional columns describing participants. Each participant needs to be described by one and only one row.
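To make the expected format concrete, here is a stand-alone sketch that writes such a table with invented example values (tab-separated, one row per participant):

```python
# Sketch of the participants.tsv format; the two rows are invented examples.
import csv
import io

rows = [
    {'participant_id': 'sub-01', 'age': 25, 'sex': 'f', 'handedness': 'right'},
    {'participant_id': 'sub-02', 'age': 28, 'sex': 'm', 'handedness': 'right'},
]
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(rows[0]),
                        delimiter='\t', lineterminator='\n')
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```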
In our case, this information was saved in the original behavioral data files that were created when participants performed the task during scanning.
To extract this information from the behavioral files into the participants.tsv file, we use the code/events/highspeed_bids_participants.m file.
Since wrapping the execution of this script into a datalad run command appeared challenging, we ran the script with paths that were not self-contained and only saved the resulting changes to the dataset.
First, we had to unlock the participants.tsv file using datalad unlock participants.tsv.
Then, we ran highspeed_bids_participants.m using MATLAB version R2017b and saved the changes using datalad save.
%% HIGHSPEED: GET DATA OF THE HIGHSPEED TASK
clear variables; clc; % clear workspace and command window
path_base = strsplit(pwd,'code');
path_base = path_base{1};
path_input = fullfile(path_base, 'input', 'behavior', 'main');
path_output = path_base;
path_digitspan = fullfile(...
    path_base, 'input', 'behavior', 'digitspan');
allID = dlmread(fullfile(...
    path_base, 'code', 'heudiconv', 'highspeed-participant-list.txt'));
num_subs = length(allID);

% get data
dirData = dir(path_input);
dirData = {dirData.name};
dataFiles = dirData(...
    contains(dirData,'session_1_run_4') & ...
    contains(dirData,cellstr(num2str(allID))));

covariates = table;
covariates.participant_id = cell(num_subs,1);
covariates.age = nan(num_subs,1);
covariates.sex = cell(num_subs,1);
covariates.handedness = cell(num_subs,1);
covariates.digit_span = nan(num_subs,1);
covariates.randomization = nan(num_subs,1);
covariates.session_interval = nan(num_subs,1);

% study intervals, ordered by participant ids:
intervals = {
    1, 13, 4, 4, 17, 8, 14, 6, 7, 10, ...
    7, 6, 18, 4, 8, 5, 23, 3, 1, 12, ...
    9, 8, 24, 21, 17, 21, 14, 4, 4, 9, ...
    7, 7, 11, 7, 14, 2, 1, 5, 3, 3};
% create a dictionary that maps IDs to intervals:
interval_dict = containers.Map(allID,intervals);

filetemplate = 'highspeed_task_mri_sub_%d_session_%d_run_%d.mat';
fprintf('List of missing data:\n')
for sub = 1:num_subs
    % get correct ids:
    id_orig = allID(sub);
    id_new = sprintf('sub-%02d', sub);
    % load task statistics
    session = 1; run = 4;
    filename = sprintf(filetemplate,allID(sub),session,run);
    dataframe = dirData(contains(dirData,filename));
    if ~isempty(dataframe)
        load(fullfile(path_input,filename));
        covariates.participant_id{sub} = id_new;
        covariates.age(sub) = Parameters.subjectInfo.age;
        covariates.sex{sub} = Parameters.subjectInfo.gender;
        covariates.handedness{sub} = 'right';
        covariates.randomization(sub) = Parameters.subjectInfo.cbal;
        covariates.session_interval(sub) = interval_dict(id_orig);
    else
        str = strcat(str,'- all behavioral data\n');
    end
    % digit span
    digitspan_file = sprintf('DigitSpan_%d.mat',allID(sub));
    digitspan_dir = dir(fullfile(path_digitspan));
    if any(contains({digitspan_dir.name},digitspan_file))
        load(fullfile(path_digitspan,digitspan_file))
        covariates.digit_span(sub) = nansum(Data.acc);
    end
end
% WRITE DATA
writetable(covariates,fullfile(path_output,'participants.csv'), ...
    'Delimiter','\t','WriteRowNames',true, ...
    'QuoteStrings',true,'WriteVariableNames',true)
copyfile(fullfile(path_output,'participants.csv'), ...
    fullfile(path_output,'participants.tsv'));
delete(fullfile(path_output,'participants.csv'));
Finally, we create the participants.json
file using:
datalad run -m "create participants.json" \
--output "participants.json" \
"python3 code/bids_conversion/highspeed-bids-participants.py"
# ======================================================================
# SCRIPT INFORMATION:
# ======================================================================
# SCRIPT: UPDATE PARTICIPANTS .JSON FILE
# PROJECT: HIGHSPEED
# WRITTEN BY LENNART WITTKUHN, 2018 - 2019
# CONTACT: WITTKUHN AT MPIB HYPHEN BERLIN DOT MPG DOT DE
# MAX PLANCK RESEARCH GROUP NEUROCODE
# MAX PLANCK INSTITUTE FOR HUMAN DEVELOPMENT
# MAX PLANCK UCL CENTRE FOR COMPUTATIONAL PSYCHIATRY AND AGEING RESEARCH
# LENTZEALLEE 94, 14195 BERLIN, GERMANY
# ======================================================================
# IMPORT RELEVANT PACKAGES
# ======================================================================
import json
import os
# ======================================================================
# DEFINE PATHS
# ======================================================================
# path to the project root:
project_name = 'highspeed-bids'
path_root = os.getcwd().split(project_name)[0] + project_name
path_desc = os.path.join(path_root, 'participants.json')
# ======================================================================
# UPDATE DATA-SET DESCRIPTION FILE
# ======================================================================
# update fields of the json file:
json_desc = dict()
json_desc["participant_id"] = {
    "Description": "Participant identifier"
}
json_desc["age"] = {
    "Description": "Age, in years as in the first session",
    "Units": "years"
}
json_desc["sex"] = {
    "Description": "Sex, self-rated by participant",
    "Levels": {
        "m": "male",
        "f": "female",
        "o": "other"
    }
}
json_desc["handedness"] = {
    "Description": "Handedness, self-rated by participant; note that participants were required to be right-handed",
    "Levels": {
        "right": "right",
        "left": "left"
    }
}
json_desc["digit_span"] = {
    "Description": "Total score in Digit-Span Test (Petermann & Wechsler, 2012), assessing working memory capacity",
    "Units": "total scores"
}
json_desc["randomization"] = {
    "Description": "Pseudo-randomized group assignment for selection of sequences in sequence trials"
}
json_desc["session_interval"] = {
    "Description": "Interval in days between the two experimental sessions",
    "Units": "days"
}
# save updated participants.json file:
with open(path_desc, 'w') as outfile:
    json.dump(json_desc, outfile, indent=4)
outfile.close()
3.2 MRI quality control: Running MRIQC
3.2.1 Overview
MRIQC extracts no-reference IQMs (image quality metrics) from structural (T1w and T2w) and functional MRI (magnetic resonance imaging) data.
Please see the official MRIQC documentation for details and refer to the paper listed in the References section.
3.2.1.1 Data availability
The data is freely available from https://github.com/lnnrtwttkhn/highspeed-mriqc and https://gin.g-node.org/lnnrtwttkhn/highspeed-mriqc.
3.2.1.2 License
The dataset is licensed under Creative Commons Attribution-ShareAlike 4.0. Please see https://creativecommons.org/licenses/by-sa/4.0/ for details.
3.2.2 Container: mriqc container, version 0.15.2rc1
MRIQC quality control was performed using mriqc, version 0.15.2rc1.
To run MRIQC, we created a Singularity container based on the MRIQC docker image:
singularity pull docker://poldracklab/mriqc:0.15.2rc1
3.2.3 MRIQC subject-level reports: highspeed-mriqc-subject-level.sh
First, MRIQC has to be run at the level of individual subjects (and, if available, sessions), modalities, and runs.
This is achieved by running the highspeed-mriqc-subject-level.sh shell script.
The script parallelizes across all subjects and sessions on the cluster.
#!/usr/bin/bash
# ==============================================================================
# SCRIPT INFORMATION:
# ==============================================================================
# SCRIPT: CREATE PARTICIPANT-LEVEL MRIQC REPORTS FOR A BIDS-STRUCTURED DATASET
# PROJECT: HIGHSPEED
# WRITTEN BY LENNART WITTKUHN, 2018 - 2020
# CONTACT: WITTKUHN AT MPIB HYPHEN BERLIN DOT MPG DOT DE
# MAX PLANCK RESEARCH GROUP NEUROCODE
# MAX PLANCK INSTITUTE FOR HUMAN DEVELOPMENT (MPIB)
# MAX PLANCK UCL CENTRE FOR COMPUTATIONAL PSYCHIATRY AND AGEING RESEARCH
# LENTZEALLEE 94, 14195 BERLIN, GERMANY
# ACKNOWLEDGEMENTS: THANKS TO ALEXANDER SKOWRON AND NIR MONETA @ MPIB FOR HELP
# ==============================================================================
# DEFINE ALL PATHS:
# ==============================================================================
# path to the base directory:
PATH_BASE="${HOME}"
# path to the project root directory
PATH_ROOT="${PATH_BASE}/highspeed"
# define the name of the project:
PROJECT_NAME="highspeed-mriqc"
# define the path to the project folder:
PATH_PROJECT="${PATH_ROOT}/${PROJECT_NAME}"
# define the name of the current task:
TASK_NAME="mriqc"
# define the path to the script main directory:
PATH_CODE="${PATH_PROJECT}/code/${TASK_NAME}"
# cd into the directory of the current task:
cd "${PATH_CODE}"
# define the path to the singularity container:
PATH_CONTAINER="${PATH_PROJECT}/tools/${TASK_NAME}/${TASK_NAME}_0.15.2rc1.sif"
# define the path for the templateflow cache
PATH_TEMPLATEFLOW="${PATH_BASE}/.cache/templateflow"
# path to the data directory (in bids format):
PATH_INPUT="${PATH_PROJECT}/bids"
# path to the output directory:
PATH_OUTPUT=${PATH_PROJECT}/${TASK_NAME}
# path to the working directory:
PATH_WORK=${PATH_PROJECT}/work
# path to the log directory:
PATH_LOG=${PATH_PROJECT}/logs/$(date '+%Y%m%d_%H%M%S')
# path to the text file with all subject ids:
PATH_SUB_LIST="${PATH_CODE}/highspeed-participant-list.txt"
# ==============================================================================
# CREATE RELEVANT DIRECTORIES:
# ==============================================================================
# create working directory:
if [ ! -d ${PATH_WORK} ]
then
mkdir -p ${PATH_WORK}
fi
# create directory for log files:
if [ ! -d ${PATH_LOG} ]
then
mkdir -p ${PATH_LOG}
else
# remove old log files inside the log container:
rm -r ${PATH_LOG}/*
fi
# ==============================================================================
# DEFINE PARAMETERS:
# ==============================================================================
# maximum number of cpus per process:
N_CPUS=5
# memory demand in *GB*
MEM_GB=10
# read subject ids from the list of the text file:
SUB_LIST=$(cat ${PATH_SUB_LIST} | tr '\n' ' ')
# declare an array with sessions you want to run:
declare -a SESSIONS=("01" "02")
#for SUB in ${SUB_LIST}; do
# ==============================================================================
# RUN MRIQC:
# ==============================================================================
# initialize a subject counter:
SUB_COUNT=0
for SUB in ${SUB_LIST}; do
# update the subject counter:
let SUB_COUNT=SUB_COUNT+1
# create the subject number with zero-padding:
SUB_PAD=$(printf "%02d\n" ${SUB_COUNT})
# loop over all sessions:
for SES in ${SESSIONS[@]}; do
# create a new job file:
echo "#!/bin/bash" > job
# name of the job
echo "#SBATCH --job-name mriqc_sub-${SUB_PAD}_ses-${SES}" >> job
# add partition to job
echo "#SBATCH --partition gpu" >> job
# set the expected maximum running time for the job:
echo "#SBATCH --time 24:00:00" >> job
# determine how much RAM your operation needs:
echo "#SBATCH --mem ${MEM_GB}GB" >> job
# email notification on abort/end, use 'n' for no notification:
echo "#SBATCH --mail-type NONE" >> job
# write log to log folder:
echo "#SBATCH --output ${PATH_LOG}/slurm-mriqc-%j.out" >> job
# request multiple cpus:
echo "#SBATCH --cpus-per-task ${N_CPUS}" >> job
# export template flow environment variable:
echo "export SINGULARITYENV_TEMPLATEFLOW_HOME=/templateflow" >> job
# define the main command:
echo "singularity run --contain -B ${PATH_INPUT}:/input:ro \
-B ${PATH_OUTPUT}:/output:rw -B ${PATH_WORK}:/work:rw \
-B ${PATH_TEMPLATEFLOW}:/templateflow:rw \
${PATH_CONTAINER} /input/ /output/ participant --participant-label ${SUB_PAD} \
--session-id ${SES} -w /work/ --verbose-reports --write-graph \
--n_cpus ${N_CPUS} --mem_gb ${MEM_GB} --no-sub" >> job
# submit job to cluster queue and remove it to avoid confusion:
sbatch job
rm -f job
done
done
3.2.4 MRIQC group-level reports: highspeed-mriqc-group-level.sh
Afterwards, one can run the highspeed-mriqc-group-level.sh
script to acquire group statistics of the quality metrics.
#!/usr/bin/bash
# ==============================================================================
# SCRIPT INFORMATION:
# ==============================================================================
# SCRIPT: CREATE GROUP-LEVEL MRIQC REPORTS FOR A BIDS-STRUCTURED DATASET
# ONLY RUN AFTER ALL PARTICIPANT-LEVEL REPORTS ARE FINISHED!
# PROJECT: HIGHSPEED
# WRITTEN BY LENNART WITTKUHN, 2018 - 2020
# CONTACT: WITTKUHN AT MPIB HYPHEN BERLIN DOT MPG DOT DE
# MAX PLANCK RESEARCH GROUP NEUROCODE
# MAX PLANCK INSTITUTE FOR HUMAN DEVELOPMENT (MPIB)
# MAX PLANCK UCL CENTRE FOR COMPUTATIONAL PSYCHIATRY AND AGEING RESEARCH
# LENTZEALLEE 94, 14195 BERLIN, GERMANY
# ACKNOWLEDGEMENT: THANKS TO ALEXANDER SKOWRON AND NIR MONETA @ MPIB FOR HELP
# ==============================================================================
# DEFINE ALL PATHS:
# ==============================================================================
# path to the base directory:
PATH_BASE="${HOME}"
# path to the project root directory
PATH_ROOT="${PATH_BASE}/highspeed"
# define the name of the project:
PROJECT_NAME="highspeed-mriqc"
# define the path to the project folder:
PATH_PROJECT="${PATH_ROOT}/${PROJECT_NAME}"
# define the name of the current task:
TASK_NAME="mriqc"
# define the path to the script main directory:
PATH_CODE="${PATH_PROJECT}/code"
# cd into the directory of the current task:
cd "${PATH_CODE}"
# define the path to the singularity container:
PATH_CONTAINER="${PATH_PROJECT}/tools/${TASK_NAME}/${TASK_NAME}_0.15.2rc1.sif"
# define the path for the templateflow cache
PATH_TEMPLATEFLOW="${PATH_BASE}/.cache/templateflow"
# path to the data directory (in bids format):
PATH_INPUT="${PATH_PROJECT}/bids"
# path to the output directory:
PATH_OUTPUT=${PATH_PROJECT}/${TASK_NAME}
# path to the working directory:
PATH_WORK=${PATH_PROJECT}/work
# path to the log directory:
PATH_LOG=${PATH_PROJECT}/logs
# ==============================================================================
# CREATE RELEVANT DIRECTORIES:
# ==============================================================================
# create working directory:
if [ ! -d ${PATH_WORK} ]
then
mkdir -p ${PATH_WORK}
fi
# create directory for log files:
if [ ! -d ${PATH_LOG} ]
then
mkdir -p ${PATH_LOG}
fi
# ==============================================================================
# RUN MRIQC TO CREATE THE GROUP REPORTS:
# ==============================================================================
# create group reports for the functional data:
singularity run -B ${PATH_INPUT}:/input:ro \
-B ${PATH_OUTPUT}:/output:rw -B ${PATH_WORK}:/work:rw \
-B ${PATH_TEMPLATEFLOW}:/templateflow:rw \
${PATH_CONTAINER} /input/ /output/ group --no-sub
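The group-level run writes tab-separated IQM tables (e.g. group_bold.tsv) into the output directory. As a sketch of how such a table can be screened with the Python standard library (the two rows and the 0.5 mm cutoff below are invented for illustration; fd_mean is one of MRIQC's functional IQMs):

```python
# Sketch: flag runs whose mean framewise displacement exceeds a cutoff.
# The table contents are fabricated; a real group_bold.tsv has many columns.
import csv
import io

fake_tsv = (
    "bids_name\tfd_mean\n"
    "sub-01_task-x_bold\t0.12\n"
    "sub-02_task-x_bold\t0.61\n"
)
reader = csv.DictReader(io.StringIO(fake_tsv), delimiter='\t')
flagged = [row['bids_name'] for row in reader if float(row['fd_mean']) > 0.5]
print(flagged)  # ['sub-02_task-x_bold']
```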
3.2.5 References
Esteban, O., Birman, D., Schaer, M., Koyejo, O. O., Poldrack, R. A., & Gorgolewski, K. J. (2017). MRIQC: Advancing the automatic prediction of image quality in MRI from unseen sites. PLoS ONE, 12(9), e0184661. doi: 10.1371/journal.pone.0184661
3.3 MRI pre-processing: Running fMRIPrep
3.3.1 Overview
We used fMRIPrep, version 1.2.2, to pre-process the BIDS-converted MRI data.
According to the fMRIPrep documentation …
fMRIPrep is a functional magnetic resonance imaging (fMRI) data preprocessing pipeline that is designed to provide an easily accessible, state-of-the-art interface that is robust to variations in scan acquisition protocols and that requires minimal user input, while providing easily interpretable and comprehensive error and output reporting. It performs basic processing steps (coregistration, normalization, unwarping, noise component extraction, segmentation, skullstripping etc.) providing outputs that can be easily submitted to a variety of group level analyses, including task-based or resting-state fMRI, graph theory measures, surface or volume-based statistics, etc.
Please see the fMRIPrep documentation for details and refer to the paper listed in the References section.
3.3.1.1 Data availability
The data is freely available from https://github.com/lnnrtwttkhn/highspeed-fmriprep and https://gin.g-node.org/lnnrtwttkhn/highspeed-fmriprep.
3.3.1.2 License
The dataset is licensed under Creative Commons Attribution-ShareAlike 4.0. Please see https://creativecommons.org/licenses/by-sa/4.0/ for details.
3.3.2 Software: fmriprep container, version 1.2.2
We first generated a Singularity container based on the fMRIPrep docker image:
singularity pull docker://poldracklab/fmriprep:1.2.2
3.3.3 Run fMRIPrep on HPC
We then ran fMRIPrep on the high-performance cluster (HPC) of the Max Planck Institute for Human Development, Berlin, Germany using the code in highspeed-fmriprep-cluster.sh
:
#!/usr/bin/bash
# ==============================================================================
# SCRIPT INFORMATION:
# ==============================================================================
# SCRIPT: RUN FMRIPREP ON THE MPIB CLUSTER (TARDIS)
# PROJECT: HIGHSPEED
# WRITTEN BY LENNART WITTKUHN, 2018 - 2020
# CONTACT: WITTKUHN AT MPIB HYPHEN BERLIN DOT MPG DOT DE
# MAX PLANCK RESEARCH GROUP NEUROCODE
# MAX PLANCK INSTITUTE FOR HUMAN DEVELOPMENT
# MAX PLANCK UCL CENTRE FOR COMPUTATIONAL PSYCHIATRY AND AGEING RESEARCH
# LENTZEALLEE 94, 14195 BERLIN, GERMANY
# ACKNOWLEDGEMENTS: THANKS HRVOJE STOJIC (UCL) AND ALEX SKOWRON (MPIB) FOR HELP
# ==============================================================================
# DEFINE ALL PATHS:
# ==============================================================================
# path to the base directory:
PATH_BASE="${HOME}"
# path to the project root directory
PATH_ROOT="${PATH_BASE}/highspeed"
# define the name of the project:
PROJECT_NAME="highspeed-fmriprep"
# define the path to the project folder:
PATH_PROJECT="${PATH_ROOT}/${PROJECT_NAME}"
# define the name of the current task:
TASK_NAME="fmriprep"
# path to the current shell script:
PATH_CODE=${PATH_PROJECT}/code
# cd into the directory of the current script:
cd "${PATH_CODE}/${TASK_NAME}"
# path to the fmriprep resources folder:
PATH_FMRIPREP=${PATH_PROJECT}/tools/${TASK_NAME}
# path to the fmriprep singularity image:
PATH_CONTAINER=${PATH_FMRIPREP}/${TASK_NAME}_1.2.2.sif
# path to the freesurfer license file on tardis:
PATH_FS_LICENSE=${PATH_FMRIPREP}/fs_600_license.txt
# path to the data directory (in bids format):
PATH_BIDS=${PATH_PROJECT}/bids
# path to the output directory:
PATH_OUT=${PATH_PROJECT}
# path to the freesurfer output directory:
PATH_FREESURFER="${PATH_OUT}/freesurfer"
# path to the working directory:
PATH_WORK=${PATH_PROJECT}/work
# path to the log directory:
PATH_LOG=${PATH_PROJECT}/logs/$(date '+%Y%m%d_%H%M%S')
# path to the text file with all subject ids:
PATH_SUB_LIST="${PATH_CODE}/fmriprep/highspeed-participant-list.txt"
# ==============================================================================
# CREATE RELEVANT DIRECTORIES:
# ==============================================================================
# create output directory:
if [ ! -d ${PATH_OUT} ]; then
mkdir -p ${PATH_OUT}
fi
# create freesurfer output directory
if [ ! -d ${PATH_FREESURFER} ]; then
mkdir -p ${PATH_FREESURFER}
fi
# create directory for work:
if [ ! -d ${PATH_WORK} ]; then
mkdir -p ${PATH_WORK}
fi
# create directory for log files:
if [ ! -d ${PATH_LOG} ]; then
mkdir -p ${PATH_LOG}
fi
# ==============================================================================
# DEFINE PARAMETERS:
# ==============================================================================
# maximum number of cpus per process:
N_CPUS=8
# maximum number of threads per process:
N_THREADS=8
# memory demand in *GB*
MEM_GB=35
# memory demand in *MB*
MEM_MB="$((${MEM_GB} * 1000))"
# user-defined subject list
PARTICIPANTS=$1
# check if user input was supplied:
if [ -z "$PARTICIPANTS" ]; then
echo "No participant label supplied."
# read subject ids from the list of the text file
SUB_LIST=$(cat ${PATH_SUB_LIST} | tr '\n' ' ')
else
SUB_LIST=$PARTICIPANTS
fi
# ==============================================================================
# RUN FMRIPREP:
# ==============================================================================
# initialize a subject counter:
SUB_COUNT=0
# loop over all subjects:
for SUB in ${SUB_LIST}; do
# update the subject counter:
let SUB_COUNT=SUB_COUNT+1
# get the subject number with zero padding:
SUB_PAD=$(printf "%02d\n" $SUB_COUNT)
# create participant-specific working directory:
PATH_WORK_SUB="${PATH_WORK}/sub-${SUB_PAD}"
if [ ! -d ${PATH_WORK_SUB} ]; then
mkdir -p ${PATH_WORK_SUB}
fi
# create a new job file:
echo "#!/bin/bash" > job
# name of the job
echo "#SBATCH --job-name fmriprep_sub-${SUB_PAD}" >> job
# add partition to job
echo "#SBATCH --partition gpu" >> job
# set the expected maximum running time for the job:
echo "#SBATCH --time 40:00:00" >> job
# determine how much RAM your operation needs:
echo "#SBATCH --mem ${MEM_GB}GB" >> job
# email notification on abort/end, use 'n' for no notification:
echo "#SBATCH --mail-type NONE" >> job
# write log to log folder
echo "#SBATCH --output ${PATH_LOG}/slurm-fmriprep-%j.out" >> job
# request multiple cpus
echo "#SBATCH --cpus-per-task ${N_CPUS}" >> job
# define the fmriprep command:
echo "singularity run --cleanenv -B ${PATH_BIDS}:/input:ro \
-B ${PATH_OUT}:/output:rw -B ${PATH_FMRIPREP}:/utilities:ro \
-B ${PATH_FREESURFER}:/output/freesurfer:rw \
-B ${PATH_WORK_SUB}:/work:rw ${PATH_CONTAINER} \
--fs-license-file /utilities/fs_600_license.txt \
/input/ /output/ participant --participant_label ${SUB_PAD} -w /work/ \
--mem_mb ${MEM_MB} --nthreads ${N_CPUS} --omp-nthreads $N_THREADS \
--write-graph --stop-on-first-crash \
--output-space T1w fsnative template fsaverage \
--notrack --verbose --resource-monitor" >> job
# submit job to cluster queue and remove it to avoid confusion:
sbatch job
rm -f job
done
3.3.4 References
Esteban, O., Markiewicz, C. J., Blair, R. W., Moodie, C. A., Isik, A. I., Erramuzpe, A., Kent, J. D., Goncalves, M., DuPre, E., Snyder, M., and et al. (2019). fMRIPrep 1.2.2. doi:10.5281/zenodo.852659
Esteban, O., Markiewicz, C. J., Blair, R. W., Moodie, C. A., Isik, A. I., Erramuzpe, A., Kent, J. D., Goncalves, M., DuPre, E., Snyder, M., and et al. (2018). fMRIPrep: A robust preprocessing pipeline for functional MRI. Nature Methods, 16(1):111–116. doi:10.1038/s41592-018-0235-4
Esteban, O., Ciric, R., Finc, K., Blair, R. W., Markiewicz, C. J., Moodie, C. A., Kent, J. D., Goncalves, M., DuPre, E., Gomez, D. E. P., and et al. (2020). Analysis of task-based functional mri data preprocessed with fmriprep. Nature Protocols. doi:10.1038/s41596-020-0327-3
3.4 Feature selection: Anatomical masks
3.4.1 Overview
As described in the paper, we used a feature selection approach that combined binarized anatomical ROIs with functional ROIs based on first-level GLMs.
3.4.1.1 Data availability
The data is freely available from https://github.com/lnnrtwttkhn/highspeed-masks and https://gin.g-node.org/lnnrtwttkhn/highspeed-masks.
3.4.1.2 License
The dataset is licensed under Creative Commons Attribution-ShareAlike 4.0. Please see https://creativecommons.org/licenses/by-sa/4.0/ for details.
3.4.2 Creating binary anatomical masks using highspeed-masks.py
We created binarized anatomical masks of occipito-temporal cortex and hippocampus based on the participant-specific Freesurfer parcellation using a Nipype workflow:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# ======================================================================
# SCRIPT INFORMATION:
# ======================================================================
# SCRIPT: CREATE BINARIZED MASKS FROM SEGMENTED FUNCTIONAL IMAGES
# PROJECT: HIGHSPEED
# WRITTEN BY LENNART WITTKUHN, 2018 - 2020
# CONTACT: WITTKUHN AT MPIB HYPHEN BERLIN DOT MPG DOT DE
# MAX PLANCK RESEARCH GROUP NEUROCODE
# MAX PLANCK INSTITUTE FOR HUMAN DEVELOPMENT
# MAX PLANCK UCL CENTRE FOR COMPUTATIONAL PSYCHIATRY AND AGEING RESEARCH
# LENTZEALLEE 94, 14195 BERLIN, GERMANY
# ======================================================================
# IMPORT RELEVANT PACKAGES
# ======================================================================
# import basic libraries:
import os
import sys
import glob
import warnings
from os.path import join as opj
# import nipype libraries:
from nipype.interfaces.utility import IdentityInterface
from nipype.interfaces.io import SelectFiles, DataSink
from nipype.pipeline.engine import Workflow, Node, MapNode
# import libraries for bids interaction:
from bids.layout import BIDSLayout
# import freesurfer interfaces:
from nipype.interfaces.freesurfer import Binarize
# import fsl interfaces:
from niflow.nipype1.workflows.fmri.fsl import create_susan_smooth
import datalad.api as dl
# ======================================================================
# ENVIRONMENT SETTINGS (DEALING WITH ERRORS AND WARNINGS):
# ======================================================================
# set the fsl output type environment variable:
os.environ['FSLOUTPUTTYPE'] = 'NIFTI_GZ'
# set the freesurfer subject directory:
os.environ['SUBJECTS_DIR'] = '/opt/software/freesurfer/6.0.0/subjects'
# deal with nipype-related warnings:
os.environ['TF_CPP_MIN_LOG_LEVEL'] = "3"
# filter out warnings related to the numpy package:
warnings.filterwarnings("ignore", message="numpy.dtype size changed*")
warnings.filterwarnings("ignore", message="numpy.ufunc size changed*")
# ======================================================================
# DEFINE PBS CLUSTER JOB TEMPLATE (NEEDED WHEN RUNNING ON THE CLUSTER):
# ======================================================================
job_template = """
#PBS -l walltime=5:00:00
#PBS -j oe
#PBS -o $HOME/highspeed/highspeed-masks/logs
#PBS -m n
#PBS -v FSLOUTPUTTYPE=NIFTI_GZ
source /etc/bash_completion.d/virtualenvwrapper
workon highspeed-masks
module load fsl/5.0
module load freesurfer/6.0.0
"""
# ======================================================================
# DEFINE PATHS AND SUBJECTS
# ======================================================================
path_root = None
sub_list = None
# path to the project root:
project_name = 'highspeed-masks'
path_root = os.getenv('PWD').split(project_name)[0] + project_name
# grab the list of subjects from the bids data set:
path_bids = opj(path_root, 'bids')
dl.get(opj('bids', 'participants.json'))
dl.get(glob.glob(opj(path_root, 'bids', 'sub-*', 'ses-*', '*', '*.json')))
dl.get(glob.glob(opj(path_root, 'bids', 'sub-*', 'ses-*', '*.json')))
dl.get(glob.glob(opj(path_root, 'bids', '*.json')))
layout = BIDSLayout(path_bids)
# get all subject ids:
sub_list = sorted(layout.get_subjects())
# create a template to add the "sub-" prefix to the ids
sub_template = ['sub-'] * len(sub_list)
# add the prefix to all ids:
sub_list = ["%s%s" % t for t in zip(sub_template, sub_list)]
# if user defined to run specific subject
sub_list = sub_list[int(sys.argv[1]):int(sys.argv[2])]
# ======================================================================
# DEFINE NODE: INFOSOURCE
# ======================================================================
# define the infosource node that collects the data:
infosource = Node(IdentityInterface(
    fields=['subject_id']), name='infosource')
# let the node iterate (parallelize) over all subjects:
infosource.iterables = [('subject_id', sub_list)]
# ======================================================================
# DEFINE SELECTFILES NODE
# ======================================================================
path_func = opj(path_root, 'fmriprep', 'fmriprep', '*', '*',
                'func', '*space-T1w*preproc_bold.nii.gz')
path_func_parc = opj(path_root, 'fmriprep', 'fmriprep', '*',
                     '*', 'func', '*space-T1w*aparcaseg_dseg.nii.gz')
path_wholemask = opj(path_root, 'fmriprep', 'fmriprep', '*',
                     '*', 'func', '*space-T1w*brain_mask.nii.gz')
dl.get(glob.glob(path_func))
dl.get(glob.glob(path_func_parc))
dl.get(glob.glob(path_wholemask))
# define all relevant files paths:
templates = dict(
    func=opj(path_root, 'fmriprep', 'fmriprep', '{subject_id}', '*',
             'func', '*space-T1w*preproc_bold.nii.gz'),
    func_parc=opj(path_root, 'fmriprep', 'fmriprep', '{subject_id}',
                  '*', 'func', '*space-T1w*aparcaseg_dseg.nii.gz'),
    wholemask=opj(path_root, 'fmriprep', 'fmriprep', '{subject_id}',
                  '*', 'func', '*space-T1w*brain_mask.nii.gz'),
)
# define the selectfiles node:
selectfiles = Node(SelectFiles(templates, sort_filelist=True),
                   name='selectfiles')
# set expected thread and memory usage for the node:
selectfiles.interface.num_threads = 1
selectfiles.interface.estimated_memory_gb = 0.1
# selectfiles.inputs.subject_id = 'sub-20'
# selectfiles_results = selectfiles.run()
# ======================================================================
# DEFINE CREATE_SUSAN_SMOOTH WORKFLOW NODE
# ======================================================================
# define the susan smoothing node and specify the smoothing fwhm:
susan = create_susan_smooth()
# set the smoothing kernel:
susan.inputs.inputnode.fwhm = 4
# set expected thread and memory usage for the nodes:
susan.get_node('inputnode').interface.num_threads = 1
susan.get_node('inputnode').interface.estimated_memory_gb = 0.1
susan.get_node('median').interface.num_threads = 1
susan.get_node('median').interface.estimated_memory_gb = 3
susan.get_node('mask').interface.num_threads = 1
susan.get_node('mask').interface.estimated_memory_gb = 3
susan.get_node('meanfunc2').interface.num_threads = 1
susan.get_node('meanfunc2').interface.estimated_memory_gb = 3
susan.get_node('merge').interface.num_threads = 1
susan.get_node('merge').interface.estimated_memory_gb = 3
susan.get_node('multi_inputs').interface.num_threads = 1
susan.get_node('multi_inputs').interface.estimated_memory_gb = 3
susan.get_node('smooth').interface.num_threads = 1
susan.get_node('smooth').interface.estimated_memory_gb = 3
susan.get_node('outputnode').interface.num_threads = 1
susan.get_node('outputnode').interface.estimated_memory_gb = 0.1
# ======================================================================
# DEFINE BINARIZE NODE
# ======================================================================
mask_visual_labels = [
    1005, 2005,  # cuneus
    1011, 2011,  # lateral occipital
    1021, 2021,  # pericalcarine
    1029, 2029,  # superioparietal
    1013, 2013,  # lingual
    1008, 2008,  # inferioparietal
    1007, 2007,  # fusiform
    1009, 2009,  # inferiotemporal
    1016, 2016,  # parahippocampal
    1015, 2015,  # middle temporal
]
mask_hippocampus_labels = [
    17, 53,  # left and right hippocampus
]
mask_mtl_labels = [
    17, 53,  # left and right hippocampus
    1016, 2016,  # parahippocampal
    1006, 2006,  # ctx-entorhinal
]
# function: use freesurfer mri_binarize to threshold an input volume
mask_visual = MapNode(
    interface=Binarize(), name='mask_visual', iterfield=['in_file'])
# input: match instead of threshold
mask_visual.inputs.match = mask_visual_labels
# optimize the efficiency of the node:
mask_visual.plugin_args = {'qsub_args': '-l nodes=1:ppn=1', 'overwrite': True}
mask_visual.plugin_args = {'qsub_args': '-l mem=100MB', 'overwrite': True}
# function: use freesurfer mri_binarize to threshold an input volume
mask_hippocampus = MapNode(
    interface=Binarize(), name='mask_hippocampus', iterfield=['in_file'])
# input: match instead of threshold
mask_hippocampus.inputs.match = mask_hippocampus_labels
# optimize the efficiency of the node:
mask_hippocampus.plugin_args = {
    'qsub_args': '-l nodes=1:ppn=1', 'overwrite': True}
mask_hippocampus.plugin_args = {
    'qsub_args': '-l mem=100MB', 'overwrite': True}
# function: use freesurfer mri_binarize to threshold an input volume
mask_mtl = MapNode(
    interface=Binarize(), name='mask_mtl', iterfield=['in_file'])
# input: match instead of threshold
mask_mtl.inputs.match = mask_mtl_labels
# optimize the efficiency of the node:
mask_mtl.plugin_args = {'qsub_args': '-l nodes=1:ppn=1', 'overwrite': True}
mask_mtl.plugin_args = {'qsub_args': '-l mem=100MB', 'overwrite': True}
# ======================================================================
# CREATE DATASINK NODE
# ======================================================================
# create a node of the function:
datasink = Node(DataSink(), name='datasink')
# assign the path to the base directory:
datasink.inputs.base_directory = opj(path_root, 'masks')
# create a list of substitutions to adjust the filepaths of datasink:
substitutions = [('_subject_id_', '')]
# assign the substitutions to the datasink command:
datasink.inputs.substitutions = substitutions
# determine whether to store output in parameterized form:
datasink.inputs.parameterization = True
# set expected thread and memory usage for the node:
datasink.interface.num_threads = 1
datasink.interface.estimated_memory_gb = 0.2
# ======================================================================
# DEFINE WORKFLOW PIPELINE:
# ======================================================================
# initiation of the 1st-level analysis workflow:
wf = Workflow(name='masks')
# stop execution of the workflow if an error is encountered:
wf.config = {'execution': {'stop_on_first_crash': True}}
# define the base directory of the workflow:
wf.base_dir = opj(path_root, 'work')
# connect infosource to selectfiles node:
wf.connect(infosource, 'subject_id', selectfiles, 'subject_id')
# connect functional files to smoothing workflow:
wf.connect(selectfiles, 'func', susan, 'inputnode.in_files')
wf.connect(selectfiles, 'wholemask', susan, 'inputnode.mask_file')
wf.connect(susan, 'outputnode.smoothed_files', datasink, 'smooth')
# connect segmented functional files to visual mask node
wf.connect(selectfiles, 'func_parc', mask_visual, 'in_file')
wf.connect(mask_visual, 'binary_file', datasink, 'mask_visual.@binary')
wf.connect(mask_visual, 'count_file', datasink, 'mask_visual.@count')
# connect segmented functional files to hippocampus node
wf.connect(selectfiles, 'func_parc', mask_hippocampus, 'in_file')
wf.connect(mask_hippocampus, 'binary_file', datasink, 'mask_hippocampus.@binary')
wf.connect(mask_hippocampus, 'count_file', datasink, 'mask_hippocampus.@count')
# connect segmented functional files to mtl node
wf.connect(selectfiles, 'func_parc', mask_mtl, 'in_file')
wf.connect(mask_mtl, 'binary_file', datasink, 'mask_mtl.@binary')
wf.connect(mask_mtl, 'count_file', datasink, 'mask_mtl.@count')
# ======================================================================
# WRITE GRAPH AND EXECUTE THE WORKFLOW
# ======================================================================
# write the graph:
wf.write_graph(graph2use='colored', simple_form=True)
# set the maximum resources the workflow can utilize:
# args_dict = {'status_callback' : log_nodes_cb}
# execute the workflow depending on the operating system:
if 'darwin' in sys.platform:
    # will execute the workflow using all available cpus:
    wf.run(plugin='MultiProc')
elif 'linux' in sys.platform:
    wf.run(plugin='PBS', plugin_args=dict(template=job_template))
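The Binarize nodes in the workflow above call FreeSurfer's mri_binarize with the match input, which keeps exactly those voxels whose aparc+aseg label is in a given list (e.g., 17 and 53 for the left and right hippocampus). Stripped of the Nipype machinery, the operation amounts to the following sketch (a hypothetical NumPy helper, not part of the project code):

```python
import numpy as np

def binarize_labels(label_volume, labels):
    """Return a binary mask that is 1 where the segmentation label
    is in `labels` (e.g., [17, 53] for left/right hippocampus)."""
    # np.isin checks each voxel's label against the list of wanted labels:
    return np.isin(label_volume, labels).astype(np.uint8)
```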
3.4.3 Plotting masked data using highspeed-masks-plot.py
We generated plots of the masked data for visual inspection using the following code:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# ======================================================================
# SCRIPT INFORMATION:
# ======================================================================
# SCRIPT: PLOT MASKS FOR ILLUSTRATION
# PROJECT: HIGHSPEED
# WRITTEN BY LENNART WITTKUHN, 2020
# CONTACT: WITTKUHN AT MPIB HYPHEN BERLIN DOT MPG DOT DE
# MAX PLANCK RESEARCH GROUP NEUROCODE
# MAX PLANCK INSTITUTE FOR HUMAN DEVELOPMENT
# MAX PLANCK UCL CENTRE FOR COMPUTATIONAL PSYCHIATRY AND AGEING RESEARCH
# LENTZEALLEE 94, 14195 BERLIN, GERMANY
# ======================================================================
# IMPORT RELEVANT PACKAGES
# ======================================================================
from nilearn import plotting
from nilearn import image
import os
from os.path import join as opj
import glob
import re
import matplotlib.pyplot as plt
import statistics
sub = 'sub-01'
path_root = os.getcwd()
path_anat = opj(path_root, 'data', 'bids', sub, '*', 'anat', '*.nii.gz')
anat = sorted(glob.glob(path_anat))[0]
path_patterns = opj(path_root, 'data', 'decoding', sub, 'data', '*.nii.gz')
path_patterns = sorted(glob.glob(path_patterns))
path_patterns = [x for x in path_patterns if 'hpc' not in x]
path_union = opj(path_root, 'data', 'decoding', sub, 'masks', '*union.nii.gz')
path_union = glob.glob(path_union)

# path_fmriprep = opj(path_root, 'derivatives', 'fmriprep')
# path_masks = opj(path_root, 'derivatives', 'decoding', sub)
# path_anat = opj(path_fmriprep, sub, 'anat', sub + '_desc-preproc_T1w.nii.gz')
# path_visual = opj(path_masks, 'masks', '*', '*.nii.gz')
# vis_mask = glob.glob(path_visual)[0]
# vis_mask_smooth = image.smooth_img(vis_mask, 4)

plotting.plot_roi(path_union[0], bg_img=anat,
                  cut_coords=[30, 10, 15],
                  title="Region-of-interest", black_bg=True,
                  display_mode='ortho', cmap='red_transparent',
                  draw_cross=False)

# calculate average patterns across all trials:
mean_patterns = [image.mean_img(i) for i in path_patterns]
# check the shape of the mean patterns (should be 3D):
[print(image.get_data(i).shape) for i in mean_patterns]
# extract labels of patterns
labels = [re.search('union_(.+?).nii.gz', i).group(1) for i in path_patterns]

# function used to plot individual patterns:
def plot_patterns(pattern, name):
    display = plotting.plot_stat_map(
        pattern,
        bg_img=anat,
        #cut_coords=[30, 29, -6],
        title=name,
        black_bg=True,
        colorbar=True,
        display_mode='ortho',
        draw_cross=False
    )
    path_save = opj(path_root, 'figures', 'pattern_{}.pdf').format(name)
    display.savefig(filename=path_save)
    display.close()
    return(display)

# plot individual patterns and save coordinates:
coords = []
for pattern, name in zip(mean_patterns, labels):
    display = plot_patterns(pattern, name)
    coords.append(display.cut_coords)
# mean_coords = [sum(x)/len(x) for x in zip(*coords)]
mean_coords = [statistics.median(x) for x in zip(*coords)]

# create subplot with all patterns using mean coordinates:
fig, axes = plt.subplots(nrows=len(path_patterns), ncols=1, figsize=(14, 20))
for pattern, name, ax in zip(mean_patterns, labels, axes):
    display = plotting.plot_stat_map(
        pattern, bg_img=anat, cut_coords=mean_coords, title=name,
        black_bg=True, colorbar=True, display_mode='ortho',
        draw_cross=False, axes=ax, symmetric_cbar=True, vmax=1)
fig.subplots_adjust(wspace=0, hspace=0)
fig.savefig(opj(path_root, 'figures', 'pattern_all.pdf'), bbox_inches='tight')
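The subplot code above places all patterns at one common set of cut coordinates, computed as the coordinate-wise median of the cut coordinates that nilearn chose automatically for the individual plots. The same computation in isolation, with made-up coordinates:

```python
import statistics

# each tuple is the (x, y, z) cut coordinate chosen for one pattern plot:
coords = [(30, 10, 15), (28, 12, 9), (34, 8, 21)]
# zip(*coords) regroups the tuples by axis; take the median per axis:
mean_coords = [statistics.median(x) for x in zip(*coords)]
```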
3.4.4 Software: Required packages
The requirements.txt file lists the required packages, which can be installed, e.g., using pip install -r requirements.txt:
annexremote==1.4.3
appdirs==1.4.4
bids-validator==1.3.12
boto==2.49.0
certifi==2019.11.28
cffi==1.14.3
chardet==3.0.4
Click==7.0
cryptography==3.2.1
datalad==0.13.5
decorator==4.4.1
Deprecated==1.2.10
docopt==0.6.2
etelemetry==0.1.2
fasteners==0.15
filelock==3.0.12
future==0.18.2
humanize==2.6.0
idna==2.8
importlib-metadata==2.0.0
iso8601==0.1.13
isodate==0.6.0
jeepney==0.4.3
joblib==0.14.1
jsmin==2.2.2
keyring==20.0.1
keyrings.alt==3.2.0
lxml==4.4.2
monotonic==1.5
msgpack==1.0.0
networkx==2.4
neurdflib==5.0.1
nibabel==3.0.0
niflow-nipype1-workflows==0.0.4
nilearn==0.6.0
nipype==1.4.0
num2words==0.5.10
numpy==1.18.0
packaging==20.0
pandas==0.25.3
patool==1.12
patsy==0.5.1
pkg-resources==0.0.0
prov==1.5.3
pybids==0.10.0
pycparser==2.20
pydot==1.4.1
pydotplus==2.0.2
PyGithub==1.53
PyJWT==1.7.1
pyparsing==2.4.6
python-dateutil==2.8.1
pytz==2019.3
rdflib==4.2.2
requests==2.22.0
scikit-learn==0.22.1
scipy==1.4.1
SecretStorage==3.2.0
simplejson==3.17.0
six==1.13.0
sklearn==0.0
SQLAlchemy==1.3.12
tqdm==4.51.0
traits==5.2.0
urllib3==1.25.7
Whoosh==2.7.4
wrapt==1.12.1
zipp==1.2.0
3.5 Feature selection: First-level GLMs
3.5.1 Overview
As described above, we used a feature selection approach that combined binarized anatomical ROIs with functional ROIs based on first-level GLMs. Below we show which code was used to run the first-level GLMs based on Nipype:
3.5.1.1 Data availability
The data is freely available from https://github.com/lnnrtwttkhn/highspeed-glm and https://gin.g-node.org/lnnrtwttkhn/highspeed-glm.
3.5.1.2 License
The dataset is licensed under Creative Commons Attribution-ShareAlike 4.0. Please see https://creativecommons.org/licenses/by-sa/4.0/ for details.
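The main workflow below relies on a custom leave_one_out function (imported from highspeed_glm_functions, not shown in this section) to estimate one GLM per held-out run. The underlying split logic can be sketched as follows (a hypothetical helper, not the project's actual implementation):

```python
def leave_one_run_out_splits(n_runs):
    """Yield (train_runs, test_run) pairs: each run is held out once."""
    for test_run in range(n_runs):
        # all runs except the held-out one enter the GLM training set:
        train_runs = [r for r in range(n_runs) if r != test_run]
        yield train_runs, test_run
```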
3.5.2 Main GLM workflow (highspeed-glm-main.py)
The code below shows the main Nipype workflow to run first-level GLMs used for feature selection:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# ======================================================================
# SCRIPT INFORMATION:
# ======================================================================
# SCRIPT: FIRST LEVEL GLM
# PROJECT: HIGHSPEED
# WRITTEN BY LENNART WITTKUHN, 2018 - 2020
# CONTACT: WITTKUHN AT MPIB HYPHEN BERLIN DOT MPG DOT DE
# MAX PLANCK RESEARCH GROUP NEUROCODE
# MAX PLANCK INSTITUTE FOR HUMAN DEVELOPMENT
# MAX PLANCK UCL CENTRE FOR COMPUTATIONAL PSYCHIATRY AND AGEING RESEARCH
# LENTZEALLEE 94, 14195 BERLIN, GERMANY
# ACKNOWLEDGEMENTS: THANKS TO HRVOJE STOJIC (UCL) FOR HELP
# ======================================================================
# IMPORT RELEVANT PACKAGES
# ======================================================================
# import basic libraries:
import os
import glob
import sys
import warnings
from os.path import join as opj
# import nipype libraries:
from nipype.interfaces.utility import Function, IdentityInterface
from nipype.interfaces.io import SelectFiles, DataSink
from nipype.pipeline.engine import Workflow, Node, MapNode
from nipype.utils.profiler import log_nodes_cb
from nipype import config, logging
# import spm and matlab interfaces:
from nipype.algorithms.modelgen import SpecifySPMModel
from nipype.interfaces.spm.model import (
Level1Design, EstimateModel, EstimateContrast, ThresholdStatistics,
    Threshold)
from nipype.interfaces.matlab import MatlabCommand
from nipype.interfaces import spm
# import fsl interfaces:
from nipype.workflows.fmri.fsl import create_susan_smooth
from nipype.interfaces.fsl.utils import ExtractROI
# import libraries for bids interaction:
from bids.layout import BIDSLayout
# import freesurfer interfaces:
# import custom functions:
from highspeed_glm_functions import (
    get_subject_info, plot_stat_maps, leave_one_out)
import datalad.api as dl
# ======================================================================
# ENVIRONMENT SETTINGS (DEALING WITH ERRORS AND WARNINGS):
# ======================================================================
# set the fsl output type environment variable:
os.environ['FSLOUTPUTTYPE'] = 'NIFTI_GZ'
# deal with nipype-related warnings:
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
# inhibit CTF lock
os.environ['MCR_INHIBIT_CTF_LOCK'] = '1'
# filter out warnings related to the numpy package:
warnings.filterwarnings("ignore", message="numpy.dtype size changed*")
warnings.filterwarnings("ignore", message="numpy.ufunc size changed*")
# ======================================================================
# SET PATHS AND SUBJECTS
# ======================================================================
# define paths depending on the operating system (OS) platform:
project = 'highspeed'
# initialize empty paths:
path_root = None
sub_list = None
# path to the project root:
project_name = 'highspeed-glm'
path_root = os.getenv('PWD').split(project_name)[0] + project_name
if 'darwin' in sys.platform:
    path_spm = '/Users/Shared/spm12'
    path_matlab = '/Applications/MATLAB_R2017a.app/bin/matlab -nodesktop -nosplash'
    # set paths for spm:
    spm.SPMCommand.set_mlab_paths(paths=path_spm, matlab_cmd=path_matlab)
    MatlabCommand.set_default_paths(path_spm)
    MatlabCommand.set_default_matlab_cmd(path_matlab)
    sub_list = ['sub-01']
elif 'linux' in sys.platform:
    # path_matlab = '/home/mpib/wittkuhn/spm12.simg eval \$SPMMCRCMD'
    # path_matlab = opj('/home', 'beegfs', 'wittkuhn', 'tools', 'spm', 'spm12.simg eval \$SPMMCRCMD')
    singularity_cmd = 'singularity run -B /home/mpib/wittkuhn -B /mnt/beegfs/home/wittkuhn /home/mpib/wittkuhn/highspeed/highspeed-glm/tools/spm/spm12.simg'
    singularity_spm = 'eval \$SPMMCRCMD'
    path_matlab = ' '.join([singularity_cmd, singularity_spm])
    spm.SPMCommand.set_mlab_paths(matlab_cmd=path_matlab, use_mcr=True)
# grab the list of subjects from the bids data set:
layout = BIDSLayout(opj(path_root, 'bids'))
# get all subject ids:
sub_list = sorted(layout.get_subjects())
# create a template to add the "sub-" prefix to the ids
sub_template = ['sub-'] * len(sub_list)
# add the prefix to all ids:
sub_list = ["%s%s" % t for t in zip(sub_template, sub_list)]
# if user defined to run specific subject
sub_list = sub_list[int(sys.argv[1]):int(sys.argv[2])]
# print the SPM version:
print('using SPM version %s' % spm.SPMCommand().version)
# ======================================================================
# DEFINE PBS CLUSTER JOB TEMPLATE (NEEDED WHEN RUNNING ON THE CLUSTER):
# ======================================================================
= """
job_template #PBS -l walltime=10:00:00
#PBS -j oe
#PBS -o /home/mpib/wittkuhn/highspeed/highspeed-glm/logs/glm
#PBS -m n
#PBS -v FSLOUTPUTTYPE=NIFTI_GZ
source /etc/bash_completion.d/virtualenvwrapper
workon highspeed-glm
module load fsl/5.0
module load matlab/R2017b
module load freesurfer/6.0.0
"""
# ======================================================================
# SETTING UP LOGGING
# ======================================================================
#path_log = opj(path_root, 'logs', 'l1analyis')
# enable debug mode for logging and configuration:
#config.enable_debug_mode()
# enable logging to file and provide path to the logging file:
#config.update_config({'logging': {'log_directory': path_log,
# 'log_to_file': True},
# 'execution': {'stop_on_first_crash': False,
# 'keep_unnecessary_outputs': 'false'},
# 'monitoring': {'enabled': True}
# })
# update the global logging settings:
# logging.update_logging(config)
# callback_log_path = opj(path_log,'ressources.log')
# logger = logging.getLogger('callback')
# logger.setLevel(logging.DEBUG)
# handler = logging.FileHandler(callback_log_path)
# logger.addHandler(handler)
# ======================================================================
# SETTING UP LOGGING
# ======================================================================
#path_log = opj(path_root, 'logs', 'l1analyis')
## create directory to save the log files if it does not exist yet:
#if not os.path.exists(path_log):
# os.makedirs(path_log)
## configure logging:
#logging.basicConfig(
# filename=opj(path_log,'log_l1analysis.log'),
# level=logging.DEBUG,
# filemode = "a+",
# format='%(asctime)s - %(levelname)s - %(message)s',
# datefmt='%d/%m/%Y %H:%M:%S')
#logging.getLogger().addHandler(logging.StreamHandler())
# ======================================================================
# LOG INFORMATION ABOUT THE EXECUTION
# ======================================================================
# print the loaded SPM version:
#analysis_name = "Level 1 GLM analysis"
#logging.info("--------------------------------------------------------")
#logging.info("Analysis: " + analysis_name)
#logging.info("SPM version: " + (spm.SPMCommand().version))
#logging.info("List of subjects:")
#logging.info(sub_list)
# ======================================================================
# DEFINE SETTINGS
# ======================================================================
# time of repetition, in seconds:
time_repetition = 1.25
# total number of runs:
num_runs = 8
# smoothing kernel, in mm:
fwhm = 4
# number of dummy variables to remove from each run:
num_dummy = 0
# ======================================================================
# DEFINE NODE: INFOSOURCE
# ======================================================================
# define the infosource node that collects the data:
infosource = Node(IdentityInterface(
    fields=['subject_id']), name='infosource')
# let the node iterate (parallelize) over all subjects:
infosource.iterables = [('subject_id', sub_list)]
# ======================================================================
# DEFINE SELECTFILES NODE
# ======================================================================
path_confounds = glob.glob(opj(
    path_root, 'fmriprep', 'fmriprep', '*', '*',
    'func', '*highspeed*confounds_regressors.tsv'))
path_events = glob.glob(opj(
    path_root, 'bids', '*', '*', 'func', '*events.tsv'))
path_func = glob.glob(opj(
    path_root, 'fmriprep', 'fmriprep', '*', '*',
    'func', '*highspeed*space-T1w*preproc_bold.nii.gz'))
path_anat = glob.glob(opj(
    path_root, 'fmriprep', 'fmriprep', '*',
    'anat', '*_desc-preproc_T1w.nii.gz'))
path_wholemask = glob.glob(opj(
    path_root, 'fmriprep', 'fmriprep', '*', '*',
    'func', '*highspeed*space-T1w*brain_mask.nii.gz'))
dl.get(path_confounds)
dl.get(path_events)
dl.get(path_func)
dl.get(path_anat)
dl.get(path_wholemask)
# define all relevant files paths:
templates = dict(
    confounds=opj(path_root, 'fmriprep', 'fmriprep', '{subject_id}',
                  '*', 'func', '*highspeed*confounds_regressors.tsv'),
    events=opj(path_root, 'bids', '{subject_id}', '*', 'func',
               '*events.tsv'),
    func=opj(path_root, 'fmriprep', 'fmriprep', '{subject_id}', '*',
             'func', '*highspeed*space-T1w*preproc_bold.nii.gz'),
    anat=opj(path_root, 'fmriprep', 'fmriprep', '{subject_id}',
             'anat', '{subject_id}_desc-preproc_T1w.nii.gz'),
    wholemask=opj(path_root, 'fmriprep', 'fmriprep', '{subject_id}',
                  '*', 'func', '*highspeed*space-T1w*brain_mask.nii.gz'),
)
# define the selectfiles node:
selectfiles = Node(SelectFiles(templates, sort_filelist=True),
                   name='selectfiles')
# set expected thread and memory usage for the node:
selectfiles.interface.num_threads = 1
selectfiles.interface.mem_gb = 0.1
# selectfiles.inputs.subject_id = 'sub-20'
# selectfiles_results = selectfiles.run()
# ======================================================================
# DEFINE CREATE_SUSAN_SMOOTH WORKFLOW NODE
# ======================================================================
# define the susan smoothing node and specify the smoothing fwhm:
susan = create_susan_smooth()
# set the smoothing kernel:
susan.inputs.inputnode.fwhm = fwhm
# set expected thread and memory usage for the nodes:
susan.get_node('inputnode').interface.num_threads = 1
susan.get_node('inputnode').interface.mem_gb = 0.1
susan.get_node('median').interface.num_threads = 1
susan.get_node('median').interface.mem_gb = 3
susan.get_node('mask').interface.num_threads = 1
susan.get_node('mask').interface.mem_gb = 3
susan.get_node('meanfunc2').interface.num_threads = 1
susan.get_node('meanfunc2').interface.mem_gb = 3
susan.get_node('merge').interface.num_threads = 1
susan.get_node('merge').interface.mem_gb = 3
susan.get_node('multi_inputs').interface.num_threads = 1
susan.get_node('multi_inputs').interface.mem_gb = 3
susan.get_node('smooth').interface.num_threads = 1
susan.get_node('smooth').interface.mem_gb = 3
susan.get_node('outputnode').interface.num_threads = 1
susan.get_node('outputnode').interface.mem_gb = 0.1
# ======================================================================
# DEFINE NODE: FUNCTION TO GET THE SUBJECT-SPECIFIC INFORMATION
# ======================================================================
subject_info = MapNode(Function(
    input_names=['events', 'confounds'],
    output_names=['subject_info', 'event_names'],
    function=get_subject_info),
    name='subject_info', iterfield=['events', 'confounds'])
# set expected thread and memory usage for the node:
subject_info.interface.num_threads = 1
subject_info.interface.mem_gb = 0.1
# subject_info.inputs.events = selectfiles_results.outputs.events
# subject_info.inputs.confounds = selectfiles_results.outputs.confounds
# subject_info_results = subject_info.run()
# subject_info_results.outputs.subject_info
# ======================================================================
# DEFINE NODE: REMOVE DUMMY VARIABLES (USING FSL ROI)
# ======================================================================
# function: extract region of interest (ROI) from an image
trim = MapNode(ExtractROI(), name='trim', iterfield=['in_file'])
# define index of the first selected volume (i.e., minimum index):
trim.inputs.t_min = num_dummy
# define the number of volumes selected starting at the minimum index:
trim.inputs.t_size = -1
# define the fsl output type:
trim.inputs.output_type = 'NIFTI'
# set expected thread and memory usage for the node:
trim.interface.num_threads = 1
trim.interface.mem_gb = 3
# ======================================================================
# DEFINE NODE: LEAVE-ONE-RUN-OUT SELECTION OF DATA
# ======================================================================
leave_one_run_out = Node(Function(
    input_names=['subject_info', 'event_names', 'data_func', 'run'],
    output_names=['subject_info', 'data_func', 'contrasts'],
    function=leave_one_out),
    name='leave_one_run_out')
# define the number of rows as an iterable:
leave_one_run_out.iterables = ('run', range(num_runs))
# ======================================================================
# DEFINE NODE: SPECIFY SPM MODEL (GENERATE SPM-SPECIFIC MODEL)
# ======================================================================
# function: makes a model specification compatible with spm designers
# adds SPM specific options to SpecifyModel
l1model = Node(SpecifySPMModel(), name="l1model")
# input: concatenate runs to a single session (boolean, default: False):
l1model.inputs.concatenate_runs = False
# input: units of event onsets and durations (secs or scans):
l1model.inputs.input_units = 'secs'
# input: units of design event onsets and durations (secs or scans):
l1model.inputs.output_units = 'secs'
# input: time of repetition (a float):
l1model.inputs.time_repetition = time_repetition
# high-pass filter cutoff in secs (a float, default = 128 secs):
l1model.inputs.high_pass_filter_cutoff = 128
# ======================================================================
# DEFINE NODE: LEVEL 1 DESIGN (GENERATE AN SPM DESIGN MATRIX)
# ======================================================================
# function: generate an SPM design matrix
l1design = Node(Level1Design(), name="l1design")
# input: (a dictionary with keys which are 'hrf' or 'fourier' or
# 'fourier_han' or 'gamma' or 'fir' and with values which are any value)
l1design.inputs.bases = {'hrf': {'derivs': [0, 0]}}
# input: units for specification of onsets ('secs' or 'scans'):
l1design.inputs.timing_units = 'secs'
# input: interscan interval / repetition time in secs (a float):
l1design.inputs.interscan_interval = time_repetition
# input: Model serial correlations AR(1), FAST or none:
l1design.inputs.model_serial_correlations = 'AR(1)'
# input: number of time-bins per scan in secs (an integer):
l1design.inputs.microtime_resolution = 16
# input: the onset/time-bin in seconds for alignment (a float):
l1design.inputs.microtime_onset = 1
# set expected thread and memory usage for the node:
l1design.interface.num_threads = 1
l1design.interface.mem_gb = 2
# ======================================================================
# DEFINE NODE: ESTIMATE MODEL (ESTIMATE THE PARAMETERS OF THE MODEL)
# ======================================================================
# function: use spm_spm to estimate the parameters of a model
l1estimate = Node(EstimateModel(), name="l1estimate")
# input: (a dictionary with keys which are 'Classical' or 'Bayesian2'
# or 'Bayesian' and with values which are any value)
l1estimate.inputs.estimation_method = {'Classical': 1}
# set expected thread and memory usage for the node:
l1estimate.interface.num_threads = 1
l1estimate.interface.mem_gb = 2
# ======================================================================
# DEFINE NODE: ESTIMATE CONTRASTS (ESTIMATES THE CONTRASTS)
# ======================================================================
# function: use spm_contrasts to estimate contrasts of interest
l1contrasts = Node(EstimateContrast(), name="l1contrasts")
# input: list of contrasts with each contrast being a list of the form:
# [('name', 'stat', [condition list], [weight list], [session list])]:
# l1contrasts.inputs.contrasts = l1contrasts_list
# node input: overwrite previous results:
l1contrasts.overwrite = True
# set expected thread and memory usage for the node:
l1contrasts.interface.num_threads = 1
l1contrasts.interface.mem_gb = 1.5
# ======================================================================
# DEFINE NODE: FUNCTION TO PLOT CONTRASTS
# ======================================================================
plot_contrasts = MapNode(Function(
    input_names=['anat', 'stat_map', 'thresh'],
    output_names=['out_path'],
    function=plot_stat_maps),
    name='plot_contrasts', iterfield=['thresh'])
# input: plot data with set of different thresholds:
plot_contrasts.inputs.thresh = [None, 1, 2, 3]
# set expected thread and memory usage for the node:
plot_contrasts.interface.num_threads = 1
plot_contrasts.interface.mem_gb = 0.2
# ======================================================================
# DEFINE NODE: THRESHOLD
# ======================================================================
# function: Topological FDR thresholding based on cluster extent/size.
# Smoothness is estimated from GLM residuals but is assumed to be the
# same for all of the voxels.
thresh = Node(Threshold(), name="thresh")
# input: whether to use FWE (Bonferroni) correction for initial threshold
# (a boolean, nipype default value: True):
thresh.inputs.use_fwe_correction = False
# input: whether to use FDR over cluster extent probabilities (boolean)
= True
thresh.inputs.use_topo_fdr # input: value for initial thresholding (defining clusters):
= 0.05
thresh.inputs.height_threshold # input: is the cluster forming threshold a stat value or p-value?
# ('p-value' or 'stat', nipype default value: p-value):
= 'p-value'
thresh.inputs.height_threshold_type # input: which contrast in the SPM.mat to use (an integer):
= 1
thresh.inputs.contrast_index # input: p threshold on FDR corrected cluster size probabilities (float):
= 0.05
thresh.inputs.extent_fdr_p_threshold # input: minimum cluster size in voxels (an integer, default = 0):
= 0
thresh.inputs.extent_threshold # set expected thread and memory usage for the node:
= 1
thresh.interface.num_threads = 0.2
thresh.interface.mem_gb # ======================================================================
# DEFINE NODE: THRESHOLD STATISTICS
# ======================================================================
# function: Given height and cluster size threshold calculate
# theoretical probabilities concerning false positives
= Node(ThresholdStatistics(), name="thresh_stat")
thresh_stat # input: which contrast in the SPM.mat to use (an integer):
= 1
thresh_stat.inputs.contrast_index # ======================================================================
# CREATE DATASINK NODE (OUTPUT STREAM):
# ======================================================================
# create a node of the function:
= Node(DataSink(), name='datasink')
l1datasink # assign the path to the base directory:
= opj(path_root, 'l1pipeline')
l1datasink.inputs.base_directory # create a list of substitutions to adjust the file paths of datasink:
= [('_subject_id_', '')]
substitutions # assign the substitutions to the datasink command:
= substitutions
l1datasink.inputs.substitutions # determine whether to store output in parameterized form:
= True
l1datasink.inputs.parameterization # set expected thread and memory usage for the node:
= 1
l1datasink.interface.num_threads = 0.2
l1datasink.interface.mem_gb # ======================================================================
# DEFINE THE LEVEL 1 ANALYSIS SUB-WORKFLOW AND CONNECT THE NODES:
# ======================================================================
# initiation of the 1st-level analysis workflow:
= Workflow(name='l1analysis')
l1analysis # connect the 1st-level analysis components
connect(l1model, 'session_info', l1design, 'session_info')
l1analysis.connect(l1design, 'spm_mat_file', l1estimate, 'spm_mat_file')
l1analysis.connect(l1estimate, 'spm_mat_file', l1contrasts, 'spm_mat_file')
l1analysis.connect(l1estimate, 'beta_images', l1contrasts, 'beta_images')
l1analysis.connect(l1estimate, 'residual_image', l1contrasts, 'residual_image')
l1analysis.# ======================================================================
# DEFINE META-WORKFLOW PIPELINE:
# ======================================================================
# initiation of the 1st-level analysis workflow:
= Workflow(name='l1pipeline')
l1pipeline # stop execution of the workflow if an error is encountered:
= {'execution': {'stop_on_first_crash': True,
l1pipeline.config 'hash_method': 'timestamp'}}
# define the base directory of the workflow:
= opj(path_root, 'work')
l1pipeline.base_dir # ======================================================================
# ENABLE LOGGING:
# ======================================================================
# enable logging to file:
#config.enable_debug_mode()
#config.update_config({'logging': {'log_directory': os.getcwd(),
# 'log_to_file': True}})
#logging.update_logging(config)
# ======================================================================
# CONNECT WORKFLOW NODES:
# ======================================================================
# connect infosource to selectfiles node:
connect(infosource, 'subject_id', selectfiles, 'subject_id')
l1pipeline.# generate subject specific events and regressors to subject_info:
connect(selectfiles, 'events', subject_info, 'events')
l1pipeline.connect(selectfiles, 'confounds', subject_info, 'confounds')
l1pipeline.# connect functional files to smoothing workflow:
connect(selectfiles, 'func', susan, 'inputnode.in_files')
l1pipeline.connect(selectfiles, 'wholemask', susan, 'inputnode.mask_file')
l1pipeline.connect(susan, 'outputnode.smoothed_files', l1datasink, 'smooth')
l1pipeline.# connect smoothed functional data to the trimming node:
connect(susan, 'outputnode.smoothed_files', trim, 'in_file')
l1pipeline.# ======================================================================
# INPUT AND OUTPUT STREAM FOR THE LEVEL 1 SPM ANALYSIS SUB-WORKFLOW:
# ======================================================================
# connect regressors to the subsetting node::
connect(subject_info, 'subject_info', leave_one_run_out, 'subject_info')
l1pipeline.# connect event_names to the subsetting node:
connect(subject_info, 'event_names', leave_one_run_out, 'event_names')
l1pipeline.# connect smoothed and trimmed data to subsetting node:
connect(trim, 'roi_file', leave_one_run_out, 'data_func')
l1pipeline.# connect regressors to the level 1 model specification node:
connect(leave_one_run_out, 'subject_info', l1analysis, 'l1model.subject_info')
l1pipeline.# connect smoothed and trimmed data to the level 1 model specification:
connect(leave_one_run_out, 'data_func', l1analysis, 'l1model.functional_runs')
l1pipeline.# connect l1 contrast specification to contrast estimation
connect(leave_one_run_out, 'contrasts', l1analysis, 'l1contrasts.contrasts')
l1pipeline.# connect the anatomical image to the plotting node:
connect(selectfiles, 'anat', plot_contrasts, 'anat')
l1pipeline.# connect spm t-images to the plotting node:
connect(l1analysis, 'l1contrasts.spmT_images', plot_contrasts, 'stat_map')
l1pipeline.# connect the t-images and spm mat file to the threshold node:
connect(l1analysis, 'l1contrasts.spmT_images', thresh, 'stat_image')
l1pipeline.connect(l1analysis, 'l1contrasts.spm_mat_file', thresh, 'spm_mat_file')
l1pipeline.# connect all output results of the level 1 analysis to the datasink:
connect(l1analysis, 'l1estimate.beta_images', l1datasink, 'estimates.@beta_images')
l1pipeline.connect(l1analysis, 'l1estimate.residual_image', l1datasink, 'estimates.@residual_image')
l1pipeline.connect(l1analysis, 'l1contrasts.spm_mat_file', l1datasink, 'contrasts.@spm_mat')
l1pipeline.connect(l1analysis, 'l1contrasts.spmT_images', l1datasink, 'contrasts.@spmT')
l1pipeline.connect(l1analysis, 'l1contrasts.con_images', l1datasink, 'contrasts.@con')
l1pipeline.connect(plot_contrasts, 'out_path', l1datasink, 'contrasts.@out_path')
l1pipeline.connect(thresh, 'thresholded_map', l1datasink, 'thresh.@threshhold_map')
l1pipeline.connect(thresh, 'pre_topo_fdr_map', l1datasink, 'thresh.@pre_topo_fdr_map')
l1pipeline.# ======================================================================
# WRITE GRAPH AND EXECUTE THE WORKFLOW
# ======================================================================
# write the graph:
='colored', simple_form=True)
l1pipeline.write_graph(graph2use# set the maximum resources the workflow can utilize:
# args_dict = {'status_callback' : log_nodes_cb}
# execute the workflow depending on the operating system:
if 'darwin' in sys.platform:
# will execute the workflow using all available cpus:
='MultiProc')
l1pipeline.run(pluginelif 'linux' in sys.platform:
='PBS', plugin_args=dict(template=job_template)) l1pipeline.run(plugin
3.5.3 Extra GLM functions (highspeed-glm-functions.py)
The main GLM Nipype workflow shown above uses the following custom functions:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# ======================================================================
# DEFINE FUNCTION: FUNCTION TO GET THE SUBJECT-SPECIFIC INFORMATION
# ======================================================================
def get_subject_info(events, confounds):
    """
    FUNCTION TO GET THE SUBJECT-SPECIFIC INFORMATION
    :param events: list with paths to events files
    :param confounds: list with paths to confounds files
    :return: Bunch object with event onsets, durations and regressors
    """
    # import libraries (needed to be done in the function):
    import pandas as pd
    from nipype.interfaces.base import Bunch
    # event types we consider:
    event_spec = {
        'correct_rejection': {'target': 0, 'key_down': 0},
        'hit': {'target': 1, 'key_down': 1},
        'false_alarm': {'target': 0, 'key_down': 1},
        'miss': {'target': 1, 'key_down': 0},
    }
    #event_names = ['correct_rejection']
    # read the events and confounds files of the current run:
    #events = selectfiles_results.outputs.events[0]
    #confounds = selectfiles_results.outputs.confounds[0]
    run_events = pd.read_csv(events, sep="\t")
    run_confounds = pd.read_csv(confounds, sep="\t")
    # define confounds to include as regressors:
    confounds = ['trans', 'rot', 'a_comp_cor', 'framewise_displacement']
    # search for confounds of interest in the confounds data frame:
    regressor_names = [col for col in run_confounds.columns if
                       any([conf in col for conf in confounds])]

    def replace_nan(regressor_values):
        # calculate the mean value of the regressor:
        mean_value = regressor_values.mean(skipna=True)
        # replace all values containing nan with the mean value:
        regressor_values[regressor_values.isnull()] = mean_value
        # return list of the regressor values:
        return list(regressor_values)
    # create a nested list with regressor values
    regressors = [replace_nan(run_confounds[conf]) for conf in regressor_names]

    onsets = []
    durations = []
    event_names = []

    for event in event_spec:
        onset_list = list(
            run_events['onset']
            [(run_events['condition'] == 'oddball') &
             (run_events['target'] == event_spec[event]['target']) &
             (run_events['key_down'] == event_spec[event]['key_down'])])

        duration_list = list(
            run_events['duration']
            [(run_events['condition'] == 'oddball') &
             (run_events['target'] == event_spec[event]['target']) &
             (run_events['key_down'] == event_spec[event]['key_down'])])

        if (onset_list != []) & (duration_list != []):
            event_names.append(event)
            onsets.append(onset_list)
            durations.append(duration_list)
    # create a bunch for each run:
    subject_info = Bunch(
        conditions=event_names, onsets=onsets, durations=durations,
        regressor_names=regressor_names, regressors=regressors)

    return subject_info, sorted(event_names)
# ======================================================================
# DEFINE FUNCTION: FUNCTION TO PLOT THE CONTRASTS AGAINST ANATOMICAL
# ======================================================================
def plot_stat_maps(anat, stat_map, thresh):
    """
    :param anat:
    :param stat_map:
    :param thresh:
    :return:
    """
    # import libraries (needed to be done in the function):
    from nilearn.plotting import plot_stat_map
    from os import path as op
    out_path = op.join(op.dirname(stat_map), 'contrast_thresh_%s.png' % thresh)
    plot_stat_map(
        stat_map, title=('Threshold: %s' % thresh),
        bg_img=anat, threshold=thresh, dim=-1, display_mode='ortho',
        output_file=out_path)

    return out_path
def leave_one_out(subject_info, event_names, data_func, run):
    """
    Select subsets of lists in leave-one-out fashion:
    :param subject_info: list of subject info bunch objects
    :param data_func: list of functional runs
    :param run: current run
    :return: return list of subject info and data excluding the current run
    """
    # create new list with event_names of all runs except current run:
    event_names = [info for i, info in enumerate(event_names) if i != run]
    num_events = [len(i) for i in event_names]
    max_events = event_names[num_events.index(max(num_events))]

    # create list of contrasts:
    stim = 'correct_rejection'
    contrast1 = (stim, 'T', max_events, [1 if stim in s else 0 for s in max_events])
    contrasts = [contrast1]

    # create new list with subject info of all runs except current run:
    subject_info = [info for i, info in enumerate(subject_info) if i != run]
    # create new list with functional data of all runs except current run:
    data_func = [info for i, info in enumerate(data_func) if i != run]

    # return the new lists
    return subject_info, data_func, contrasts
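To make the leave-one-out subsetting concrete, here is a minimal, self-contained sketch of the same list logic on toy inputs. The run labels, file names, and the helper name leave_one_out_sketch are made up for illustration and are not part of the pipeline:

```python
def leave_one_out_sketch(subject_info, event_names, data_func, run):
    # drop the held-out run from every list (same logic as leave_one_out above):
    event_names = [e for i, e in enumerate(event_names) if i != run]
    # pick the longest event-name list to define the contrast conditions:
    num_events = [len(e) for e in event_names]
    max_events = event_names[num_events.index(max(num_events))]
    # build a t-contrast weighting the stimulus of interest with 1, all else 0:
    stim = 'correct_rejection'
    contrasts = [(stim, 'T', max_events, [1 if stim in s else 0 for s in max_events])]
    subject_info = [s for i, s in enumerate(subject_info) if i != run]
    data_func = [d for i, d in enumerate(data_func) if i != run]
    return subject_info, data_func, contrasts

# toy example with three runs, leaving out the run with index 1:
info, func, contrasts = leave_one_out_sketch(
    subject_info=['bunch_run1', 'bunch_run2', 'bunch_run3'],
    event_names=[['correct_rejection', 'hit'], ['correct_rejection'],
                 ['correct_rejection', 'miss']],
    data_func=['run1.nii.gz', 'run2.nii.gz', 'run3.nii.gz'],
    run=1)
print(info)       # ['bunch_run1', 'bunch_run3']
print(func)       # ['run1.nii.gz', 'run3.nii.gz']
print(contrasts)  # [('correct_rejection', 'T', ['correct_rejection', 'hit'], [1, 0])]
```

The contrast tuple follows the Nipype/SPM form ('name', 'stat', [condition list], [weight list]) referenced in the workflow comments above.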
3.5.4 Software: Required packages
The requirements.txt file lists the required packages, which can be installed, e.g., using pip install -r requirements.txt:
alabaster==0.7.10
annexremote==1.4.3
apipkg==1.4
appdirs==1.4.4
asn1crypto==0.24.0
atomicwrites==1.1.5
attrs==18.1.0
Babel==2.6.0
bcrypt==3.1.4
boto==2.49.0
certifi==2018.4.16
cffi==1.11.5
chardet==3.0.4
citeproc-py==0.4.0
click==6.7
codecov==2.0.15
configparser==3.5.0
coverage==4.5.1
cryptography==2.2.2
cycler==0.10.0
datalad==0.13.5
decorator==4.3.0
Deprecated==1.2.10
dipy==0.14.0
docutils==0.14
duecredit==0.6.3
execnet==1.5.0
fasteners==0.15
funcsigs==1.0.2
future==0.16.0
futures==3.1.1
grabbit==0.2.5
h5py==2.8.0
humanize==2.6.0
idna==2.6
imagesize==1.0.0
importlib-metadata==2.0.0
iso8601==0.1.13
isodate==0.6.0
jeepney==0.4.3
Jinja2==2.10
jsmin==2.2.2
keyring==20.0.1
keyrings.alt==3.2.0
kiwisolver==1.0.1
lxml==4.2.1
MarkupSafe==1.0
matplotlib==2.2.2
mock==2.0.0
monotonic==1.5
more-itertools==4.2.0
mpmath==1.0.0
msgpack==1.0.0
networkx==2.1
neurdflib==5.0.0.post1
nibabel==2.2.1
nilearn==0.4.1
nipy==0.4.2
nipype==1.1.5
nitime==0.7
num2words==0.5.6
numpy==1.14.4
numpydoc==0.8.0
packaging==17.1
pandas==0.23.4
paramiko==2.4.1
patool==1.12
patsy==0.5.0
pbr==4.0.4
pkg-resources==0.0.0
pluggy==0.6.0
prov==1.5.2
psutil==5.4.6
py==1.5.3
pyasn1==0.4.3
pybids==0.6.5
pycparser==2.18
pydeface==2.0.0
pydot==1.2.4
pydotplus==2.0.2
PyGithub==1.53
Pygments==2.2.0
PyJWT==1.7.1
PyNaCl==1.2.1
pyparsing==2.2.0
pytest==3.6.1
pytest-cov==2.5.1
pytest-env==0.6.2
pytest-forked==0.2
pytest-xdist==1.22.2
python-dateutil==2.7.3
pytz==2018.4
rdflib==4.2.2
requests==2.18.4
scikit-learn==0.20.3
scipy==1.1.0
SecretStorage==3.2.0
simplejson==3.15.0
six==1.11.0
sklearn==0.0
snowballstemmer==1.2.1
Sphinx==1.7.5
sphinxcontrib-websupport==1.1.0
sympy==1.1.1
tqdm==4.52.0
traits==4.6.0
urllib3==1.22
Whoosh==2.7.4
wrapt==1.12.1
xvfbwrapper==0.2.9
yapf==0.22.0
zipp==1.2.0
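As a sketch, the pinned packages can be installed into a fresh virtual environment to avoid clashing with system packages; the environment directory name .venv is arbitrary and the last command assumes requirements.txt sits in the current directory:

```shell
# create and activate an isolated virtual environment (name is arbitrary):
python3 -m venv .venv
. .venv/bin/activate
# install the pinned requirements, if the file is present in this directory:
if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
```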
3.6 Multivariate decoding
3.6.1 Overview
3.6.1.1 Data availability
The data is freely available from https://github.com/lnnrtwttkhn/highspeed-decoding and https://gin.g-node.org/lnnrtwttkhn/highspeed-decoding.
3.6.1.2 License
The dataset is licensed under Creative Commons Attribution-ShareAlike 4.0. Please see https://creativecommons.org/licenses/by-sa/4.0/ for details.
3.6.2 Main decoding pipeline
Here, we show the Python code used to run the decoding pipeline:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# ======================================================================
# SCRIPT INFORMATION:
# ======================================================================
# SCRIPT: DECODING ANALYSIS
# PROJECT: HIGHSPEED
# WRITTEN BY LENNART WITTKUHN, 2018 - 2019
# CONTACT: WITTKUHN AT MPIB HYPHEN BERLIN DOT MPG DOT DE
# MAX PLANCK RESEARCH GROUP NEUROCODE
# MAX PLANCK INSTITUTE FOR HUMAN DEVELOPMENT
# MAX PLANCK UCL CENTRE FOR COMPUTATIONAL PSYCHIATRY AND AGEING RESEARCH
# LENTZEALLEE 94, 14195 BERLIN, GERMANY
'''
========================================================================
IMPORT RELEVANT PACKAGES:
========================================================================
'''
import glob
import os
import yaml
import logging
import time
from os.path import join as opj
import sys
import copy
from pprint import pformat
import numpy as np
from nilearn.input_data import NiftiMasker
from nilearn import plotting, image, masking
import pandas as pd
from matplotlib import pyplot as plt
import warnings
from nilearn.signal import clean
from sklearn.svm import SVC
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.model_selection import cross_val_score, cross_val_predict
from sklearn.preprocessing import StandardScaler
from collections import Counter
import datalad.api as dl
'''
========================================================================
DEAL WITH ERRORS, WARNING ETC.
========================================================================
'''
'ignore', message='numpy.dtype size changed*')
warnings.filterwarnings('ignore', message='numpy.ufunc size changed*')
warnings.filterwarnings(= logging.getLogger('matplotlib')
mpl_logger
mpl_logger.setLevel(logging.WARNING)'''
========================================================================
SETUP NECESSARY PATHS ETC:
========================================================================
'''
# get start time of the script:
= time.time()
start # name of the current project:
= 'highspeed'
project # initialize empty variables:
= None
sub = None
path_tardis = None
path_server = None
path_local # define paths depending on the operating system (OS) platform:
if 'darwin' in sys.platform:
# define the path to the cluster:
= opj(os.environ['HOME'], 'Volumes', 'tardis_beegfs', project)
path_tardis # define the path to the server:
= opj('/Volumes', 'MPRG-Neurocode', 'Data', project)
path_server # define the path to the local computer:
= opj(os.environ['HOME'], project, '*', '*')
path_local # define the subject id:
= 'sub-14'
sub elif 'linux' in sys.platform:
# path to the project root:
= 'highspeed-decoding'
project_name = os.getenv('PWD').split(project_name)[0] + project_name
path_root # define the path to the cluster:
= path_root
path_tardis # define the path to the server:
= path_tardis
path_server # define the path to the local computer:
= opj(path_tardis, 'code', 'decoding')
path_local # define the subject id:
= 'sub-%s' % sys.argv[1]
sub '''
========================================================================
LOAD PROJECT PARAMETERS:
========================================================================
'''
= glob.glob(opj(path_local, '*parameters.yaml'))[0]
path_params with open(path_params, 'rb') as f:
= yaml.load(f, Loader=yaml.FullLoader)
params
f.close()'''
========================================================================
CREATE PATHS TO OUTPUT DIRECTORIES:
========================================================================
'''
= opj(path_tardis, 'decoding')
path_decoding = opj(path_decoding, sub)
path_out = opj(path_out, 'plots')
path_out_figs = opj(path_out, 'data')
path_out_data = opj(path_out, 'logs')
path_out_logs = opj(path_out, 'masks')
path_out_masks '''
========================================================================
CREATE OUTPUT DIRECTORIES IF THE DO NOT EXIST YET:
========================================================================
'''
for path in [path_out_figs, path_out_data, path_out_logs, path_out_masks]:
if not os.path.exists(path):
os.makedirs(path)'''
========================================================================
SETUP LOGGING:
========================================================================
'''
# remove all handlers associated with the root logger object:
for handler in logging.root.handlers[:]:
logging.root.removeHandler(handler)# get current data and time as a string:
= time.strftime("%Y-%m-%d-%H-%M-%S")
timestr # create path for the logging file
= opj(path_out_logs, '%s-%s.log' % (timestr, sub))
log # start logging:
logging.basicConfig(=log, level=logging.DEBUG, format='%(asctime)s %(message)s',
filename='%d/%m/%Y %H:%M:%S')
datefmt'''
========================================================================
DEFINE DECODING SPECIFIC PARAMETERS:
========================================================================
'''
# define the mask to be used:
= 'visual' # visual or whole
mask # applied time-shift to account for the BOLD delay, in seconds:
= 4 # 4, 5 or 6 secs
bold_delay # define the degree of smoothing of the functional data
= 4
smooth '''
========================================================================
ADD BASIC SCRIPT INFORMATION TO THE LOGGER:
========================================================================
'''
'Running decoding script')
logging.info('operating system: %s' % sys.platform)
logging.info('project name: %s' % project)
logging.info('participant: %s' % sub)
logging.info('mask: %s' % mask)
logging.info('bold delay: %d secs' % bold_delay)
logging.info('smoothing kernel: %d mm' % smooth)
logging.info('''
========================================================================
DEFINE RELEVANT VARIABLES:
========================================================================
'''
# time of repetition (TR), in seconds:
= params['mri']['tr']
t_tr # number of volumes (TRs) for each functional task run:
= 530
n_tr_run # acquisition time window of one sequence trial, in seconds:
= 16
t_win # number of measurements that are considered per sequence time window:
= round(t_win / t_tr)
n_tr_win # number of oddball trials in the experiment:
= 600
n_tr_odd # number of sequence trials in the experiment:
= 75
n_tr_seq # number of repetition trials in the experiment:
= 45
n_tr_rep # number of scanner triggers before the experiment starts:
= params['mri']['num_trigger']
n_tr_wait # number of functional task runs in total:
= params['mri']['num_runs']
n_run # number of experimental sessions in total:
= params['mri']['num_sessions']
n_ses # number of functional task runs per session:
= int(n_run / n_ses)
n_run_ses '''
========================================================================
LOAD BEHAVIORAL DATA (THE EVENTS.TSV FILES OF THE SUBJECT):
========================================================================
Load the events files for the subject. The 'oddball' data serves as the
training set of the classifier. The 'sequence' data constitutes the
test set.
'''
# paths to all events files of the current subject:
= opj(path_tardis, 'bids', sub, 'ses-*', 'func', '*tsv')
path_events
dl.get(glob.glob(path_events))= sorted(glob.glob(path_events), key=lambda f: os.path.basename(f))
path_events 'found %d event files' % len(path_events))
logging.info('paths to events files (sorted):\n%s' % pformat(path_events))
logging.info(# import events file and save data in dataframe:
= pd.concat((pd.read_csv(f, sep='\t') for f in path_events),
df_events =True)
ignore_index'''
========================================================================
CREATE PATHS TO THE MRI DATA:
========================================================================
'''
# define path to input directories:
= opj(path_tardis, 'fmriprep', 'fmriprep', sub)
path_fmriprep = opj(path_tardis, 'glm', 'l1pipeline')
path_level1 = opj(path_tardis, 'masks', 'masks')
path_masks
'path to fmriprep files: %s' % path_fmriprep)
logging.info('path to level 1 files: %s' % path_level1)
logging.info('path to mask files: %s' % path_masks)
logging.info(= {
paths 'fmriprep': opj(path_tardis, 'fmriprep', 'fmriprep', sub),
'level1': opj(path_tardis, 'glm', 'l1pipeline'),
'masks': opj(path_tardis, 'masks', 'masks')
}
# load the visual mask task files:
= opj(path_masks, 'mask_visual', sub, '*', '*task-highspeed*.nii.gz')
path_mask_vis_task = sorted(glob.glob(path_mask_vis_task), key=lambda f: os.path.basename(f))
path_mask_vis_task 'found %d visual mask task files' % len(path_mask_vis_task))
logging.info('paths to visual mask task files:\n%s' % pformat(path_mask_vis_task))
logging.info(
dl.get(path_mask_vis_task)
# load the hippocampus mask task files:
= opj(path_masks, 'mask_hippocampus', sub, '*', '*task-highspeed*.nii.gz')
path_mask_hpc_task = sorted(glob.glob(path_mask_hpc_task), key=lambda f: os.path.basename(f))
path_mask_hpc_task 'found %d hpc mask files' % len(path_mask_hpc_task))
logging.info('paths to hpc mask task files:\n%s' % pformat(path_mask_hpc_task))
logging.info(
dl.get(path_mask_hpc_task)
# load the whole brain mask files:
= opj(path_fmriprep, '*', 'func', '*task-highspeed*T1w*brain_mask.nii.gz')
path_mask_whole_task = sorted(glob.glob(path_mask_whole_task), key=lambda f: os.path.basename(f))
path_mask_whole_task 'found %d whole-brain masks' % len(path_mask_whole_task))
logging.info('paths to whole-brain mask files:\n%s' % pformat(path_mask_whole_task))
logging.info(
dl.get(path_mask_whole_task)
# load the functional mri task files:
= opj(path_level1, 'smooth', sub, '*', '*task-highspeed*nii.gz')
path_func_task = sorted(glob.glob(path_func_task), key=lambda f: os.path.basename(f))
path_func_task 'found %d functional mri task files' % len(path_func_task))
logging.info('paths to functional mri task files:\n%s' % pformat(path_func_task))
logging.info(
dl.get(path_func_task)
# define path to the functional resting state runs:
= opj(path_tardis, 'masks', 'masks', 'smooth', sub, '*', '*task-rest*nii.gz')
path_rest = sorted(glob.glob(path_rest), key=lambda f: os.path.basename(f))
path_rest 'found %d functional mri rest files' % len(path_rest))
logging.info('paths to functional mri rest files:\n%s' % pformat(path_rest))
logging.info(
dl.get(path_rest)
# load the anatomical mri file:
= opj(path_fmriprep, 'anat', '%s_desc-preproc_T1w.nii.gz' % sub)
path_anat = sorted(glob.glob(path_anat), key=lambda f: os.path.basename(f))
path_anat 'found %d anatomical mri file' % len(path_anat))
logging.info('paths to anatoimical mri files:\n%s' % pformat(path_anat))
logging.info(
dl.get(path_anat)
# load the confounds files:
= opj(path_fmriprep, '*', 'func', '*task-highspeed*confounds_regressors.tsv')
path_confs_task = sorted(glob.glob(path_confs_task), key=lambda f: os.path.basename(f))
path_confs_task 'found %d confounds files' % len(path_confs_task))
logging.info('found %d confounds files' % len(path_confs_task))
logging.info('paths to confounds files:\n%s' % pformat(path_confs_task))
logging.info(
dl.get(path_confs_task)
# load the spm.mat files:
= opj(path_level1, 'contrasts', sub, '*', 'SPM.mat')
path_spm_mat = sorted(glob.glob(path_spm_mat), key=lambda f: os.path.dirname(f))
path_spm_mat 'found %d spm.mat files' % len(path_spm_mat))
logging.info('paths to spm.mat files:\n%s' % pformat(path_spm_mat))
logging.info(
dl.get(path_spm_mat)
# load the t-maps of the first-level glm:
= opj(path_level1, 'contrasts', sub, '*', 'spmT*.nii')
path_tmap = sorted(glob.glob(path_tmap), key=lambda f: os.path.dirname(f))
path_tmap 'found %d t-maps' % len(path_tmap))
logging.info('paths to t-maps files:\n%s' % pformat(path_tmap))
logging.info(
dl.get(path_tmap)'''
========================================================================
LOAD THE MRI DATA:
========================================================================
'''
= image.load_img(path_anat[0])
anat 'successfully loaded %s' % path_anat[0])
logging.info(# load visual mask:
= image.load_img(path_mask_vis_task[0]).get_data().astype(int)
mask_vis 'successfully loaded one visual mask file!')
logging.info(# load tmap data:
= [image.load_img(i).get_data().astype(float) for i in copy.deepcopy(path_tmap)]
tmaps 'successfully loaded the tmaps!')
logging.info(# load hippocampus mask:
= [image.load_img(i).get_data().astype(int) for i in copy.deepcopy(path_mask_hpc_task)]
mask_hpc 'successfully loaded one visual mask file!')
logging.info('''
FEATURE SELECTION
'''
# plot raw-tmaps on an anatomical background:
'plotting raw tmaps with anatomical as background:')
logging.info(for i, path in enumerate(path_tmap):
'plotting raw tmap %s (%d of %d)' % (path, i+1, len(path_tmap)))
logging.info(= opj(path_out_figs, '%s_run-%02d_tmap_raw.png' % (sub, i+1))
path_save =os.path.basename(path_save),
plotting.plot_roi(path, anat, title=path_save, colorbar=True)
output_file
'''
========================================================================
FEATURE SELECTION: MASKS THE T-MAPS WITH THE ANATOMICAL MASKS
We create a combination of the t-maps and the anatomical mask.
To this end, we multiply the anatomical mask with each t-map.
As the anatomical conists of binary values (zeros and ones) a
multiplication results in t-map masked by the anatomical ROI.
========================================================================
'''
#v= tmaps_masked[0]
#fig = plt.figure()
#ax = fig.add_subplot(111, projection='3d')
#ax.scatter(v[:,0],v[:,1],v[:,2], zdir='z', c= 'red')
#plt.show()
# check if any value in the supposedly binary mask is bigger than 1:
if np.any(mask_vis > 1):
'WARNING: detected values > 1 in the anatomical ROI!')
logging.info("Values > 1 in the anatomical ROI!")
sys.exit(# get combination of anatomical mask and t-map
= [np.multiply(mask_vis, i) for i in copy.deepcopy(tmaps)]
tmaps_masked # masked tmap into image like object:
= [image.new_img_like(i, j) for (i, j) in zip(path_tmap, copy.deepcopy(tmaps_masked))]
tmaps_masked_img
for i, path in enumerate(tmaps_masked_img):
= opj(path_out_masks, '%s_run-%02d_tmap_masked.nii.gz' % (sub, i + 1))
path_save
path.to_filename(path_save)
# plot masked t-maps
'plotting masked tmaps with anatomical as background:')
logging.info(for i, path in enumerate(tmaps_masked_img):
'plotting masked tmap %d of %d' % (i+1, len(tmaps_masked_img)))
logging.info(= opj(path_out_figs, '%s_run-%02d_tmap_masked.png' % (sub, i+1))
path_save =os.path.basename(path_save),
plotting.plot_roi(path, anat, title=path_save, colorbar=True)
output_file'''
========================================================================
FEATURE SELECTION: THRESHOLD THE MASKED T-MAPS
We threshold the masked t-maps, selecting only voxels above AND
below this threshold. We then extract these data and count how
many voxels were above and / or below the threshold in total.
========================================================================
'''
# set the threshold:
threshold = params['mri']['thresh']
logging.info('thresholding t-maps with a threshold of %s' % str(threshold))
# threshold the masked tmap image:
tmaps_masked_thresh_img = [image.threshold_img(i, threshold) for i in copy.deepcopy(tmaps_masked_img)]

logging.info('plotting thresholded tmaps with anatomical as background:')
for i, path in enumerate(tmaps_masked_thresh_img):
    path_save = opj(path_out_figs, '%s_run-%02d_tmap_masked_thresh.png' % (sub, i+1))
    logging.info('plotting masked tmap %s (%d of %d)'
                 % (path_save, i + 1, len(tmaps_masked_thresh_img)))
    plotting.plot_roi(path, anat, title=os.path.basename(path_save),
                      output_file=path_save, colorbar=True)

# extract data from the thresholded images
tmaps_masked_thresh = [image.load_img(i).get_data().astype(float) for i in tmaps_masked_thresh_img]

# calculate the number of tmap voxels:
num_tmap_voxel = [np.size(i) for i in copy.deepcopy(tmaps_masked_thresh)]
logging.info('number of feature selected voxels: %s' % pformat(num_tmap_voxel))

num_above_voxel = [np.count_nonzero(i) for i in copy.deepcopy(tmaps_masked_thresh)]
logging.info('number of voxels above threshold: %s' % pformat(num_above_voxel))

num_below_voxel = [np.count_nonzero(i == 0) for i in copy.deepcopy(tmaps_masked_thresh)]
logging.info('number of voxels below threshold: %s' % pformat(num_below_voxel))

# plot the distribution of t-values:
for i, run_mask in enumerate(tmaps_masked_thresh):
    masked_tmap_flat = run_mask.flatten()
    masked_tmap_flat = masked_tmap_flat[~np.isnan(masked_tmap_flat)]
    masked_tmap_flat = masked_tmap_flat[~np.isnan(masked_tmap_flat) & ~(masked_tmap_flat == 0)]
    path_save = opj(path_out_figs, '%s_run-%02d_tvalue_distribution.png' % (sub, i+1))
    logging.info('plotting thresholded t-value distribution %s (%d of %d)'
                 % (path_save, i+1, len(tmaps_masked_thresh)))
    fig = plt.figure()
    plt.hist(masked_tmap_flat, bins='auto')
    plt.xlabel('t-values')
    plt.ylabel('number')
    plt.title('t-value distribution (%s, run-%02d)' % (sub, i+1))
    plt.savefig(path_save)

# create a dataframe with the number of voxels
df_thresh = pd.DataFrame({
    'id': [sub] * n_run,
    'run': np.arange(1, n_run+1),
    'n_total': num_tmap_voxel,
    'n_above': num_above_voxel,
    'n_below': num_below_voxel
})
file_name = opj(path_out_data, '%s_thresholding.csv' % (sub))
df_thresh.to_csv(file_name, sep=',', index=False)
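As a sanity check on the voxel bookkeeping above: after thresholding, suprathreshold voxels keep their t-value and all other voxels are set to zero, so the "above" and "below" counts must sum to the total. A minimal sketch on a toy array (the values are made up for illustration):

```python
import numpy as np

# a toy "thresholded t-map": suprathreshold values survive, the rest are 0
tmap = np.array([0.0, 2.3, 0.0, 4.1, 0.0, 3.7])

n_total = np.size(tmap)                # all voxels inside the anatomical mask
n_above = np.count_nonzero(tmap)       # voxels surviving the threshold
n_below = np.count_nonzero(tmap == 0)  # voxels removed by the threshold

assert n_total == n_above + n_below
print(n_total, n_above, n_below)  # 6 3 3
```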
'''
========================================================================
FEATURE SELECTION: BINARIZE THE THRESHOLDED MASKED T-MAPS
We set all voxels above and below the threshold to 1 and all voxels
that were not selected to 0.
========================================================================
'''
# replace all NaNs with 0:
tmaps_masked_thresh_bin = [np.where(np.isnan(i), 0, i) for i in copy.deepcopy(tmaps_masked_thresh)]
# replace all other values with 1:
tmaps_masked_thresh_bin = [np.where(i > 0, 1, i) for i in copy.deepcopy(tmaps_masked_thresh_bin)]
# turn the 3D-array into booleans:
tmaps_masked_thresh_bin = [i.astype(bool) for i in copy.deepcopy(tmaps_masked_thresh_bin)]
# create image like object:
masks_final = [image.new_img_like(path_func_task[0], i.astype(np.int)) for i in copy.deepcopy(tmaps_masked_thresh_bin)]

logging.info('plotting final masks with anatomical as background:')
for i, path in enumerate(masks_final):
    filename = '%s_run-%02d_tmap_masked_thresh.nii.gz'\
               % (sub, i + 1)
    path_save = opj(path_out_masks, filename)
    logging.info('saving final mask %s (%d of %d)'
                 % (path_save, i+1, len(masks_final)))
    path.to_filename(path_save)
    path_save = opj(path_out_figs, '%s_run-%02d_visual_final_mask.png'
                    % (sub, i + 1))
    logging.info('plotting final mask %s (%d of %d)'
                 % (path_save, i + 1, len(masks_final)))
    plotting.plot_roi(path, anat, title=os.path.basename(path_save),
                      output_file=path_save, colorbar=True)
'''
========================================================================
LOAD SMOOTHED FMRI DATA FOR ALL FUNCTIONAL TASK RUNS:
========================================================================
'''
# load smoothed functional mri data for all eight task runs:
logging.info('loading %d functional task runs ...' % len(path_func_task))
data_task = [image.load_img(i) for i in path_func_task]
logging.info('loading successful!')
'''
========================================================================
DEFINE THE CLASSIFIERS
========================================================================
'''
class_labels = ['cat', 'chair', 'face', 'house', 'shoe']
# create a dictionary with all values as independent instances:
# see here: https://bit.ly/2J1DvZm
clf_set = {key: LogisticRegression(
    C=1., penalty='l2', multi_class='ovr', solver='lbfgs',
    max_iter=4000, class_weight='balanced', random_state=42) for key in class_labels}
classifiers = {
    'log_reg': LogisticRegression(
        C=1., penalty='l2', multi_class='multinomial', solver='lbfgs',
        max_iter=4000, class_weight='balanced', random_state=42)}
clf_set.update(classifiers)
'''
========================================================================
1. SPLIT THE EVENTS DATAFRAME FOR EACH TASK CONDITION
2. RESET THE INDICES OF THE DATAFRAMES
3. SORT THE ROWS OF ALL DATAFRAMES IN CHRONOLOGICAL ORDER
4. PRINT THE NUMBER OF TRIALS OF EACH TASK CONDITION
========================================================================
'''
class TaskData:
    """
    """
    def __init__(
            self, events, condition, name, trial_type, bold_delay=0,
            interval=1, t_tr=1.25, num_vol_run=530):
        import pandas as pd
        # define name of the task data subset:
        self.name = name
        # define the task condition the task data subset is from:
        self.condition = condition
        # define the delay (in seconds) by which onsets are moved:
        self.bold_delay = bold_delay
        # define the number of TRs from event onset that should be selected:
        self.interval = interval
        # define the repetition time (TR) of the mri data acquisition:
        self.t_tr = t_tr
        # define the number of volumes per task run:
        self.num_vol_run = num_vol_run
        # select events: upright stimulus, correct answer trials only:
        if trial_type == 'stimulus':
            self.events = events.loc[
                (events['condition'] == condition) &
                (events['trial_type'] == trial_type) &
                (events['stim_orient'] == 0) &
                (events['serial_position'] == 1) &
                (events['accuracy'] != 0),
                :]
        elif trial_type == 'cue':
            self.events = events.loc[
                (events['condition'] == condition) &
                (events['trial_type'] == trial_type),
                :]
        # reset the indices of the data frame:
        self.events.reset_index()
        # sort all values by session and run:
        self.events.sort_values(by=['session', 'run_session'])
        # call further functions upon initialization:
        self.define_trs()
        self.get_stats()

    def define_trs(self):
        # import relevant functions:
        import numpy as np
        # select all event onsets:
        self.event_onsets = self.events['onset']
        # add the selected delay to account for the peak of the hrf:
        self.bold_peaks = self.events['onset'] + self.bold_delay
        # divide the expected time-point of bold peaks by the repetition time:
        self.peak_trs = self.bold_peaks / self.t_tr
        # calculate the volume offset of each run:
        run_volumes = (self.events['run_study'] - 1) * self.num_vol_run
        # add the number of volumes of each run:
        trs = round(self.peak_trs + run_volumes)
        # copy the relevant trs as often as specified by the interval:
        a = np.transpose(np.tile(trs, (self.interval, 1)))
        # create same-sized matrix with trs to be added:
        b = np.full((len(trs), self.interval), np.arange(self.interval))
        # assign the final list of trs:
        self.trs = np.array(np.add(a, b).flatten(), dtype=int)
        # save the TRs of the stimulus presentations:
        self.stim_trs = round(self.event_onsets / self.t_tr + run_volumes)

    def get_stats(self):
        import numpy as np
        self.num_trials = len(self.events)
        self.runs = np.repeat(np.array(self.events['run_study'], dtype=int), self.interval)
        self.trials = np.repeat(np.array(self.events['trial'], dtype=int), self.interval)
        self.sess = np.repeat(np.array(self.events['session'], dtype=int), self.interval)
        self.stim = np.repeat(np.array(self.events['stim_label'], dtype=object), self.interval)
        self.itis = np.repeat(np.array(self.events['interval_time'], dtype=float), self.interval)
        self.stim_trs = np.repeat(np.array(self.stim_trs, dtype=int), self.interval)

    def zscore(self, signals, run_list, t_tr=1.25):
        from nilearn.signal import clean
        import numpy as np
        # get boolean indices for all run indices in the run list:
        run_indices = np.isin(self.runs, list(run_list))
        # standardize the data of all runs in the run list:
        self.data_zscored = clean(
            signals=signals[self.trs[run_indices]],
            sessions=self.runs[run_indices],
            t_r=t_tr,
            detrend=False,
            standardize=True)

    def predict(self, clf, run_list):
        # import packages:
        import pandas as pd
        # get classifier class predictions:
        pred_class = clf.predict(self.data_zscored)
        # get classifier probabilistic predictions:
        pred_proba = clf.predict_proba(self.data_zscored)
        # get the classes of the classifier:
        classes_names = clf.classes_
        # get boolean indices for all run indices in the run list:
        run_indices = np.isin(self.runs, list(run_list))
        # create a dataframe with the probabilities of each class:
        df = pd.DataFrame(pred_proba, columns=classes_names)
        # get the number of predictions made:
        num_pred = len(df)
        # get the number of trials in the test set:
        num_trials = int(num_pred / self.interval)
        # add the predicted class label to the dataframe:
        df['pred_label'] = pred_class
        # add the true stimulus label to the dataframe:
        df['stim'] = self.stim[run_indices]
        # add the volume number (TR) to the dataframe:
        df['tr'] = self.trs[run_indices]
        # add the sequential TR to the dataframe:
        df['seq_tr'] = np.tile(np.arange(1, self.interval + 1), num_trials)
        # add the counter of trs on which the stimulus was presented:
        df['stim_tr'] = self.stim_trs[run_indices]
        # add the trial number to the dataframe:
        df['trial'] = self.trials[run_indices]
        # add the run number to the dataframe:
        df['run_study'] = self.runs[run_indices]
        # add the session number to the dataframe:
        df['session'] = self.sess[run_indices]
        # add the inter trial interval to the dataframe:
        df['tITI'] = self.itis[run_indices]
        # add the participant id to the dataframe:
        df['id'] = np.repeat(self.events['subject'].unique(), num_pred)
        # add the name of the test set to the dataframe:
        df['test_set'] = np.repeat(self.name, num_pred)
        # return the dataframe:
        return df
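The TR bookkeeping in define_trs (convert onsets to volume indices, then expand each index across the selection interval) can be reproduced in isolation. A minimal sketch with made-up onsets; t_tr, interval, and the onset values are assumptions for illustration only:

```python
import numpy as np

t_tr = 1.25       # repetition time in seconds
interval = 3      # number of TRs to select per event
onsets = np.array([10.0, 25.0])  # hypothetical event onsets in seconds

# convert onsets to (rounded) volume indices:
trs = np.round(onsets / t_tr).astype(int)
# repeat each TR index `interval` times ...
a = np.transpose(np.tile(trs, (interval, 1)))
# ... and add the offsets 0 .. interval-1:
b = np.full((len(trs), interval), np.arange(interval))
tr_indices = np.add(a, b).flatten()
print(tr_indices)  # [ 8  9 10 20 21 22]
```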
train_odd_peak = TaskData(
    events=df_events, condition='oddball', trial_type='stimulus',
    bold_delay=4, interval=1, name='train-odd_peak')
test_odd_peak = TaskData(
    events=df_events, condition='oddball', trial_type='stimulus',
    bold_delay=4, interval=1, name='test-odd_peak')
test_odd_long = TaskData(
    events=df_events, condition='oddball', trial_type='stimulus',
    bold_delay=0, interval=7, name='test-odd_long')
test_seq_long = TaskData(
    events=df_events, condition='sequence', trial_type='stimulus',
    bold_delay=0, interval=13, name='test-seq_long')
test_rep_long = TaskData(
    events=df_events, condition='repetition', trial_type='stimulus',
    bold_delay=0, interval=13, name='test-rep_long')
test_seq_cue = TaskData(
    events=df_events, condition='sequence', trial_type='cue',
    bold_delay=0, interval=5, name='test-seq_cue')
test_rep_cue = TaskData(
    events=df_events, condition='repetition', trial_type='cue',
    bold_delay=0, interval=5, name='test-rep_cue')
test_sets = [
    test_odd_peak, test_odd_long, test_seq_long, test_rep_long,
    test_seq_cue, test_rep_cue
]
'''
========================================================================
SEPARATE RUN-WISE STANDARDIZATION (Z-SCORING) OF ALL TASK CONDITIONS
Standardize features by removing the mean and scaling to unit variance.
Centering and scaling happen independently on each feature by computing
the relevant statistics on the samples in the training set. Here,
we standardize the features of all trials run-wise but separately for
each task condition (oddball, sequence, and repetition condition).
========================================================================
'''
def detrend(data, t_tr=1.25):
    from nilearn.signal import clean
    data_detrend = clean(
        signals=data, t_r=t_tr, detrend=True, standardize=False)
    return data_detrend

def show_weights(array):
    # https://stackoverflow.com/a/50154388
    import numpy as np
    import seaborn as sns
    n_samples = array.shape[0]
    classes, bins = np.unique(array, return_counts=True)
    n_classes = len(classes)
    weights = n_samples / (n_classes * bins)
    sns.barplot(classes, weights)
    plt.xlabel('class label')
    plt.ylabel('weight')
    plt.show()

def melt_df(df, melt_columns):
    # save the column names of the dataframe in a list:
    column_names = df.columns.tolist()
    # remove the stimulus classes from the column names:
    id_vars = [x for x in column_names if x not in melt_columns]
    # melt the dataframe creating one column with value_name and var_name:
    df_melt = pd.melt(
        df, value_name='probability', var_name='class', id_vars=id_vars)
    # return the melted dataframe:
    return df_melt
data_list = []
runs = list(range(1, n_run+1))
#mask_label = 'cv'
logging.info('starting leave-one-run-out cross-validation')
for mask_label in ['cv', 'cv_hpc']:
    logging.info('testing in mask %s' % (mask_label))
    for run in runs:
        logging.info('testing on run %d of %d ...' % (run, len(runs)))
        # define the run indices for the training and test set:
        train_runs = [x for x in runs if x != run]
        test_runs = [x for x in runs if x == run]
        # get the feature selection mask of the current run:
        if mask_label == 'cv':
            mask_run = masks_final[runs.index(run)]
        elif mask_label == 'cv_hpc':
            mask_run = path_mask_hpc_task[runs.index(run)]
        # extract smoothed fMRI data from the mask for the cross-validation fold:
        masked_data = [masking.apply_mask(i, mask_run) for i in data_task]
        # detrend the masked fMRI data separately for each run:
        data_detrend = [detrend(i) for i in masked_data]
        # combine the detrended data of all runs:
        data_detrend = np.vstack(data_detrend)
        # loop through all classifiers in the classifier set:
        for clf_name, clf in clf_set.items():
            # print classifier:
            logging.info('classifier: %s' % clf_name)
            # fit the classifier to the training data:
            train_odd_peak.zscore(signals=data_detrend, run_list=train_runs)
            # get the example labels:
            train_stim = copy.deepcopy(train_odd_peak.stim[train_odd_peak.runs != run])
            # replace labels for single-label classifiers:
            if clf_name in class_labels:
                # replace all other labels with other
                train_stim = ['other' if x != clf_name else x for x in train_stim]
                # turn into a numpy array
                train_stim = np.array(train_stim, dtype=object)
            # check weights:
            #show_weights(array=train_stim)
            # train the classifier
            clf.fit(train_odd_peak.data_zscored, train_stim)
            # classifier prediction: predict on test data and save the data:
            for test_set in test_sets:
                logging.info('testing on test set %s' % test_set.name)
                test_set.zscore(signals=data_detrend, run_list=test_runs)
                # skip the test set if it contains no data:
                if test_set.data_zscored.size == 0:
                    continue
                # create dataframe containing classifier predictions:
                df_pred = test_set.predict(clf=clf, run_list=test_runs)
                # add the current classifier as a new column:
                df_pred['classifier'] = np.repeat(clf_name, len(df_pred))
                # add a label that indicates the mask / training regime:
                df_pred['mask'] = np.repeat(mask_label, len(df_pred))
                # melt the data frame:
                df_pred_melt = melt_df(df=df_pred, melt_columns=train_stim)
                # append dataframe to list of dataframe results:
                data_list.append(df_pred_melt)
'''
========================================================================
DECODING ON RESTING STATE DATA:
========================================================================
'''
logging.info('Loading fMRI data of %d resting state runs ...' % len(path_rest))
data_rest = [image.load_img(i) for i in path_rest]
logging.info('loading successful!')

# combine all masks from the feature selection by intersection:
def multimultiply(arrays):
    import copy
    # start with the first array:
    array_union = copy.deepcopy(arrays[0].astype(np.int))
    # loop through all arrays
    for i in range(len(arrays)):
        # multiply every array with all previous arrays
        array_union = np.multiply(array_union, copy.deepcopy(arrays[i].astype(np.int)))
    # return the union of all arrays:
    return(array_union)

for mask_label in ['union', 'union_hpc']:
    # calculate the union of all masks by multiplication:
    if mask_label == 'union':
        masks_union = multimultiply(tmaps_masked_thresh_bin).astype(int).astype(bool)
    elif mask_label == 'union_hpc':
        masks_union = multimultiply(mask_hpc).astype(int).astype(bool)
    # masked tmap into image like object:
    masks_union_nii = image.new_img_like(path_func_task[0], masks_union)
    path_save = opj(path_out_masks, '{}_task-rest_mask-{}.nii.gz'.format(sub, mask_label))
    masks_union_nii.to_filename(path_save)
    # mask all resting state runs with the averaged feature selection masks:
    data_rest_masked = [masking.apply_mask(i, masks_union_nii) for i in data_rest]
    # detrend and standardize each resting state run separately:
    data_rest_final = [clean(i, detrend=True, standardize=True) for i in data_rest_masked]
    # mask all functional task runs separately:
    data_task_masked = [masking.apply_mask(i, masks_union_nii) for i in data_task]
    # detrend each task run separately:
    data_task_masked_detrend = [clean(i, detrend=True, standardize=False) for i in data_task_masked]
    # combine the detrended data of all runs:
    data_task_masked_detrend = np.vstack(data_task_masked_detrend)
    # select only oddball data and standardize:
    train_odd_peak.zscore(signals=data_task_masked_detrend, run_list=runs)
    # write session and run labels:
    ses_labels = [i.split(sub + "_")[1].split("_task")[0] for i in path_rest]
    run_labels = [i.split("prenorm_")[1].split("_space")[0] for i in path_rest]
    file_names = ['_'.join([a, b]) for (a, b) in zip(ses_labels, run_labels)]
    rest_interval = 1
    # save the voxel patterns:
    num_voxels = len(train_odd_peak.data_zscored[0])
    voxel_labels = ['v' + str(x) for x in range(num_voxels)]
    df_patterns = pd.DataFrame(train_odd_peak.data_zscored, columns=voxel_labels)
    # add the stimulus labels to the dataframe:
    df_patterns['label'] = copy.deepcopy(train_odd_peak.stim)
    # add the participant id to the dataframe:
    df_patterns['id'] = np.repeat(df_events['subject'].unique(), len(train_odd_peak.stim))
    # add the mask label:
    df_patterns['mask'] = np.repeat(mask_label, len(train_odd_peak.stim))
    # split pattern dataframe by label:
    df_pattern_list = [df_patterns[df_patterns['label'] == i] for i in df_patterns['label'].unique()]
    # create file paths to save the dataframes:
    file_paths = [opj(path_out_data, '{}_voxel_patterns_{}_{}'.format(sub, mask_label, i)) for i in df_patterns['label'].unique()]
    # save the final dataframes to .csv-files:
    [i.to_csv(j + '.csv', sep=',', index=False) for (i, j) in zip(df_pattern_list, file_paths)]
    # save only the voxel patterns as niimg-like objects
    [masking.unmask(X=i.loc[:, voxel_labels].to_numpy(), mask_img=masks_union_nii).to_filename(j + '.nii.gz') for (i, j) in zip(df_pattern_list, file_paths)]
    #[image.new_img_like(path_func_task[0], i.loc[:, voxel_labels].to_numpy()).to_filename(j + '.nii.gz') for (i, j) in zip(df_pattern_list, file_paths)]
    # decoding on resting state data:
    for clf_name, clf in clf_set.items():
        # print classifier name:
        logging.info('classifier: %s' % clf_name)
        # get the example labels for all slow trials:
        train_stim = copy.deepcopy(train_odd_peak.stim)
        # replace labels for single-label classifiers:
        if clf_name in class_labels:
            # replace all other labels with other
            train_stim = ['other' if x != clf_name else x for x in train_stim]
            # turn into a numpy array
            train_stim = np.array(train_stim, dtype=object)
        # train the classifier
        clf.fit(train_odd_peak.data_zscored, train_stim)
        # classifier prediction: predict on test data and save the data:
        pred_rest_class = [clf.predict(i) for i in data_rest_final]
        pred_rest_proba = [clf.predict_proba(i) for i in data_rest_final]
        # get the class names of the classifier:
        classes_names = clf.classes_
        # save classifier predictions on resting state scans
        for t, name in enumerate(pred_rest_proba):
            # create a dataframe with the probabilities of each class:
            df_rest_pred = pd.DataFrame(
                pred_rest_proba[t], columns=classes_names)
            # get the number of predictions made:
            num_pred = len(df_rest_pred)
            # get the number of trials in the test set:
            num_trials = int(num_pred / rest_interval)
            # add the predicted class label to the dataframe:
            df_rest_pred['pred_label'] = pred_rest_class[t]
            # add the true stimulus label to the dataframe:
            df_rest_pred['stim'] = np.full(num_pred, np.nan)
            # add the volume number (TR) to the dataframe:
            df_rest_pred['tr'] = np.arange(1, num_pred + 1)
            # add the sequential TR to the dataframe:
            df_rest_pred['seq_tr'] = np.arange(1, num_pred + 1)
            # add the trial number to the dataframe:
            df_rest_pred['trial'] = np.tile(
                np.arange(1, rest_interval + 1), num_trials)
            # add the run number to the dataframe:
            df_rest_pred['run_study'] = np.repeat(run_labels[t], num_pred)
            # add the session number to the dataframe:
            df_rest_pred['session'] = np.repeat(ses_labels[t], len(df_rest_pred))
            # add the inter trial interval to the dataframe:
            df_rest_pred['tITI'] = np.tile('rest', num_pred)
            # add the participant id to the dataframe:
            df_rest_pred['id'] = np.repeat(df_events['subject'].unique(), num_pred)
            # add the name of the test set to the dataframe:
            df_rest_pred['test_set'] = np.repeat('rest', num_pred)
            # add the name of the classifier to the dataframe:
            df_rest_pred['classifier'] = np.repeat(clf_name, num_pred)
            # add a label that indicates the mask / training regime:
            df_rest_pred['mask'] = np.repeat(mask_label, len(df_rest_pred))
            # melt the data frame:
            df_pred_melt = melt_df(df=df_rest_pred, melt_columns=train_stim)
            # append dataframe to list of dataframe results:
            data_list.append(df_pred_melt)
        # run classifier trained on all runs across all test sets:
        for test_set in test_sets:
            logging.info('testing on test set %s' % test_set.name)
            test_set.zscore(signals=data_task_masked_detrend, run_list=runs)
            # skip the test set if it contains no data:
            if test_set.data_zscored.size == 0:
                continue
            # create dataframe containing classifier predictions:
            df_pred = test_set.predict(clf=clf, run_list=runs)
            # add the current classifier as a new column:
            df_pred['classifier'] = np.repeat(clf_name, len(df_pred))
            # add a label that indicates the mask / training regime:
            df_pred['mask'] = np.repeat(mask_label, len(df_pred))
            # melt the data frame:
            df_pred_melt = melt_df(df=df_pred, melt_columns=train_stim)
            # append dataframe to list of dataframe results:
            data_list.append(df_pred_melt)

# concatenate all decoding dataframes in one final dataframe:
df_all = pd.concat(data_list, sort=False)
# create file path to save the dataframe:
file_path = opj(path_out_data, '{}_decoding.csv'.format(sub))
# save the final dataframe to a .csv-file:
df_all.to_csv(file_path, sep=',', index=False)
'''
========================================================================
STOP LOGGING:
========================================================================
'''
end = time.time()
total_time = (end - start)/60
logging.info('total running time: %0.2f minutes' % total_time)
logging.shutdown()
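To summarize the cross-validation logic above: for each held-out run, the remaining runs form the training set, and for each of the five one-vs-rest classifiers all non-target labels are collapsed into 'other' before fitting. A minimal sketch with toy data (the run numbers and stimulus labels are made up for illustration):

```python
import numpy as np

runs = list(range(1, 9))  # eight task runs
run = 3                   # hypothetical held-out run
train_runs = [x for x in runs if x != run]
test_runs = [x for x in runs if x == run]

# one-vs-rest relabeling as used for the single-label classifiers:
clf_name = 'face'
train_stim = ['cat', 'face', 'shoe', 'face', 'house']
train_stim = ['other' if x != clf_name else x for x in train_stim]
train_stim = np.array(train_stim, dtype=object)

print(train_runs)  # [1, 2, 4, 5, 6, 7, 8]
print(train_stim)  # ['other' 'face' 'other' 'face' 'other']
```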
3.6.3 Parallelization on HPC
The decoding pipeline was parallelized on the high-performance computing (HPC) cluster of the Max Planck Institute for Human Development, Berlin, Germany, using the following code:
#!/usr/bin/bash
# ==============================================================================
# SCRIPT INFORMATION:
# ==============================================================================
# SCRIPT: PARALLELIZE THE DECODING SCRIPT ON THE MPIB CLUSTER (TARDIS)
# PROJECT: HIGHSPEED
# WRITTEN BY LENNART WITTKUHN, 2018 - 2020
# CONTACT: WITTKUHN AT MPIB HYPHEN BERLIN DOT MPG DOT DE
# MAX PLANCK RESEARCH GROUP NEUROCODE
# MAX PLANCK INSTITUTE FOR HUMAN DEVELOPMENT (MPIB)
# MAX PLANCK UCL CENTRE FOR COMPUTATIONAL PSYCHIATRY AND AGEING RESEARCH
# LENTZEALLEE 94, 14195 BERLIN, GERMANY
# ==============================================================================
# DEFINE ALL PATHS:
# ==============================================================================
# define the name of the current task:
TASK_NAME="highspeed-decoding"
# define the name of the project:
PROJECT_NAME="highspeed"
# path to the home directory:
PATH_HOME=/home/mpib/wittkuhn
# path to the current code folder:
PATH_CODE=${PATH_HOME}/${PROJECT_NAME}/${TASK_NAME}/code/decoding
# cd into the directory of the current script:
cd ${PATH_CODE}
# path to the current script:
PATH_SCRIPT=${PATH_CODE}/highspeed_decoding.py
# path to the output directory:
PATH_OUT=${PATH_HOME}/${PROJECT_NAME}/${TASK_NAME}/decoding
# path to the working directory:
PATH_WORK=${PATH_HOME}/${PROJECT_NAME}/${TASK_NAME}/work
# path to the log directory:
PATH_LOG=${PATH_HOME}/${PROJECT_NAME}/${TASK_NAME}/logs/$(date '+%Y%m%d_%H%M%S')
# ==============================================================================
# CREATE RELEVANT DIRECTORIES:
# ==============================================================================
# create output directory:
if [ ! -d ${PATH_OUT} ]; then
mkdir -p ${PATH_OUT}
fi
# create working directory:
if [ ! -d ${PATH_WORK} ]; then
mkdir -p ${PATH_WORK}
fi
# create directory for log files:
if [ ! -d ${PATH_LOG} ]; then
mkdir -p ${PATH_LOG}
fi
# ==============================================================================
# DEFINE PARAMETERS:
# ==============================================================================
# maximum number of cpus per process:
N_CPUS=1
# memory demand in *GB*
MEM=20GB
# user-defined subject list
PARTICIPANTS=$1
# ==============================================================================
# RUN THE DECODING:
# ==============================================================================
# initialize a counter for the subjects:
SUB_COUNT=0
for i in {1..40}; do
# turn the subject id into a zero-padded number:
SUB=$(printf "%02d\n" ${i})
# start slurm job file:
echo "#!/bin/bash" > job
# name of the job
echo "#SBATCH --job-name highspeed-decoding-sub-${SUB}" >> job
# set the expected maximum running time for the job:
echo "#SBATCH --time 12:00:00" >> job
# determine how much RAM your operation needs:
echo "#SBATCH --mem ${MEM}" >> job
# request multiple cpus:
echo "#SBATCH --cpus-per-task ${N_CPUS}" >> job
# write (output) log to log folder:
echo "#SBATCH --output ${PATH_LOG}/highspeed-decoding-sub-${SUB}-%j.out" >> job
# email notification on abort/end, use 'n' for no notification:
echo "#SBATCH --mail-type NONE" >> job
# start in the current directory:
echo "#SBATCH --workdir ." >> job
# activate virtual environment on cluster:
echo "source /etc/bash_completion.d/virtualenvwrapper" >> job
echo "workon highspeed-decoding" >> job
# define the main command:
echo "python3 ${PATH_SCRIPT} ${SUB}" >> job
# submit job to cluster queue and remove it to avoid confusion:
sbatch job
rm -f job
done
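The zero-padded subject IDs in the job names above come from printf; this can be checked in isolation (a minimal standalone sketch, not part of the pipeline):

```shell
# printf "%02d" left-pads single-digit subject numbers with a zero:
SUB=$(printf "%02d" 7)
echo "highspeed-decoding-sub-${SUB}"  # highspeed-decoding-sub-07
```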
3.6.4 Software: Required packages
The requirements.txt
file lists the required packages, which can be installed, e.g., using pip install -r requirements.txt:
annexremote==1.4.4
appdirs==1.4.4
attrs==20.3.0
boto==2.49.0
certifi==2020.11.8
cffi==1.14.3
chardet==3.0.4
Counter==1.0.0
cryptography==3.2.1
cycler==0.10.0
datalad==0.13.5
Deprecated==1.2.10
distro==1.5.0
fasteners==0.15
future==0.18.2
humanize==3.1.0
idna==2.10
iniconfig==1.1.1
iso8601==0.1.13
jeepney==0.6.0
joblib==0.17.0
jsmin==2.2.2
keyring==21.5.0
keyrings.alt==4.0.1
kiwisolver==1.3.1
matplotlib==3.3.3
monotonic==1.5
msgpack==1.0.0
nibabel==3.2.0
nilearn==0.7.0
numpy==1.19.4
packaging==20.4
pandas==1.1.4
patool==1.12
Pillow==8.0.1
pkg-resources==0.0.0
pluggy==0.13.1
py==1.9.0
pycparser==2.20
PyGithub==1.53
PyJWT==1.7.1
pyparsing==2.4.7
pytest==6.1.2
python-dateutil==2.8.1
pytz==2020.4
PyYAML==5.3.1
requests==2.25.0
scikit-learn==0.23.2
scipy==1.5.4
SecretStorage==3.2.0
simplejson==3.17.2
six==1.15.0
sklearn==0.0
threadpoolctl==2.1.0
toml==0.10.2
tqdm==4.53.0
urllib3==1.26.2
Whoosh==2.7.4
wrapt==1.12.1