Step 1: Create a .tsv file for an assessment tool

Prerequisite

It is assumed that the collected phenotypic data is available in .csv format. Survey platforms like LimeSurvey provide export options for this.

If your .csv file contains data for more than one assessment tool, create one .csv file per assessment tool. Each .csv file must contain a column with the participant_id.

Export/Digitisation of data

Questionnaire data is typically collected through an online survey tool (such as LimeSurvey) and can be exported in a variety of formats (e.g. .csv, .sav). Data that was collected with pen and paper has to be digitized first.

Example content of exported data

full_survey_export_V2.csv:

| VPCode | BDI_Item1 | BDI_Item2 | BDI_ItemX | PHQ9_Item1 | PHQ_Item2 | PHQ_ItemX |
| ------ | ------ | ------ | ------ | ------ | ------ | ------ |
| PHWEE | 2 | 2 | 3 | not at all | 3 | fully agree |
| AWEEE | 1 | 1 | 2 | 4 | not at all | 2 |

Clean data

Before you start annotating, remove all faulty entries from the data. Exports often include survey test runs or subjects who answered only one question and then dropped out. All such entries have to be removed.
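The cleaning step can be sketched in Python as follows. This is only an illustration: the helper names, the test identifiers in `TEST_IDS`, and the `min_answers` threshold are assumptions that have to be adapted to your own export.

```python
# Hypothetical cleaning helpers; the threshold and the test IDs are assumptions.
TEST_IDS = {"TEST", "DUMMY"}  # identifiers used for survey test runs (assumed)


def is_valid_row(row, id_column="VPCode", min_answers=2):
    """Keep a row only if it is not a test entry and has enough answers."""
    identifier = (row.get(id_column) or "").strip().upper()
    if identifier in TEST_IDS:
        return False
    answers = [
        value for key, value in row.items()
        if key != id_column and value is not None and value.strip() != ""
    ]
    return len(answers) >= min_answers


def clean_rows(rows, **kwargs):
    """Return only the rows that pass is_valid_row."""
    return [row for row in rows if is_valid_row(row, **kwargs)]


rows = [
    {"VPCode": "PHWEE", "BDI_Item1": "2", "BDI_Item2": "2"},
    {"VPCode": "TEST", "BDI_Item1": "1", "BDI_Item2": "1"},  # survey test run
    {"VPCode": "XYZAB", "BDI_Item1": "3", "BDI_Item2": ""},  # dropped out early
]
cleaned = clean_rows(rows)
print([row["VPCode"] for row in cleaned])
```

What counts as a dropout depends on the study, so the threshold should be chosen deliberately rather than taken from this sketch.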

Separation of different questionnaires / question groups

In order for the questionnaire data to be BIDS compliant, each group of questions must be available in a separate .tsv file. Often, many different questionnaires are collected (e.g., Beck Depression Inventory-II (BDI-II) and Patient Health Questionnaire 9 (PHQ9)). Typically, exporting a survey results in one large file containing all the collected questionnaires, so the exported data has to be split into separate files. For non-standardized questionnaires, it is recommended to split the large data file into semantically related chunks (e.g. all questions that assess a subject's current mood, all questions that assess a specific concept). Each .tsv file contains the participant_id (i.e. the subject identifier) as the first column.
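A minimal sketch of such a split, assuming that the items of each questionnaire share a column-name prefix and that the identifier column of the export is called VPCode. The `PREFIXES` mapping and the `split_by_prefix` helper are hypothetical and only serve as an illustration.

```python
import csv
import io

# Assumed mapping from column-name prefix to output file stem; adapt to your data.
PREFIXES = {
    "BDI": "beck_depression_inventory",
    "PHQ": "patient_health_questionnaire",
}


def split_by_prefix(rows, id_column="VPCode"):
    """Group columns by prefix; participant_id becomes the first column."""
    tables = {stem: [] for stem in PREFIXES.values()}
    for row in rows:
        for prefix, stem in PREFIXES.items():
            record = {"participant_id": row[id_column]}
            record.update(
                {key: value for key, value in row.items() if key.startswith(prefix)}
            )
            tables[stem].append(record)
    return tables


# In-memory stand-in for the exported .csv file.
export = io.StringIO(
    "VPCode,BDI_Item1,BDI_Item2,PHQ9_Item1,PHQ_Item2\n"
    "PHWEE,2,2,not at all,3\n"
    "AWEEE,1,1,4,not at all\n"
)
tables = split_by_prefix(list(csv.DictReader(export)))
for stem, records in tables.items():
    print(stem + ".tsv", list(records[0].keys()))
```

If the item names do not follow a common prefix, an explicit list of column names per questionnaire is the safer choice.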

Example .tsv files structure

├── autism_quotient_10.tsv
├── beck_depression_inventory.tsv
├── patient_health_questionnaire.tsv
├── empathy_quotient_10.tsv
├── light_triad.tsv
├── psychopathy_personality_inventory_revised.tsv
└── toronto_alexithymia_scale.tsv

Naming convention

Please make sure that there is one .tsv file per questionnaire and the file name matches the questionnaire name.

Save as .tsv files (tab separated values)

Once all files have been created according to these instructions, save them as <measurement_tool>.tsv files. If you need help with this, please have a look here for LibreOffice.
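As an alternative to a spreadsheet program, the files can also be written with Python's standard `csv` module. The sketch below is only an illustration: the `write_tsv` helper is hypothetical, and an in-memory buffer stands in for the real output file.

```python
import csv
import io


def write_tsv(records, handle):
    """Write a list of dictionaries as tab-separated values with a header row."""
    fieldnames = list(records[0].keys())
    writer = csv.DictWriter(
        handle, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)


records = [
    {"participant_id": "PHWEE", "BDI_Item1": 2, "BDI_Item2": 2},
    {"participant_id": "AWEEE", "BDI_Item1": 1, "BDI_Item2": 1},
]

# The buffer stands in for open("beck_depression_inventory.tsv", "w", newline="").
buffer = io.StringIO()
write_tsv(records, buffer)
tsv_text = buffer.getvalue()
print(tsv_text)
```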

Example of BIDS compliant questionnaire data

beck_depression_inventory.tsv:

| participant_id | BDI_Item1 | BDI_Item2 | BDI_ItemX |
| ------ | ------ | ------ | ------ |
| PHWEE | 2 | 2 | 3 |
| AWEEE | 1 | 1 | 2 |

patient_health_questionnaire.tsv:

| participant_id | PHQ9_Item1 | PHQ_Item2 | PHQ_ItemX |
| ------ | ------ | ------ | ------ |
| PHWEE | not at all | 3 | fully agree |
| AWEEE | 4 | not at all | 2 |

Change all values to numerical values

Exception: free responses given by the subjects

Note: It is recommended to work through this step and the following Data annotation step back to back for each data file.

A core principle of BIDS is that each data file is accompanied by a file that describes its contents in detail (see section below), so that no information is lost. All values that are currently represented as phrases (e.g. not at all, agree) must be available in their numerical expression. It is assumed that such entries stem from, for example, a Likert scale where only the poles are labelled (e.g. [not at all] - 1 - 2 - 3 - 4 - [fully agree]).

Example changes of the patient_health_questionnaire.tsv introduced above:

participant_id PHQ9_Item1 PHQ_Item2 PHQ_ItemX
PHWEE 0 3 5
AWEEE 4 0 2

Please make sure that the <measurement_tool>.tsv contains only numeric values or free text responses. If you have, for example, responses like "Strongly Agree" or "not at all" in your data, please replace them with numeric values. The information about the distinct levels of non-numeric responses will be added in the next step.
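The replacement of labelled answers can be sketched as follows. The `LABEL_TO_CODE` mapping mirrors the example above (not at all = 0, fully agree = 5) but is an assumption; the actual codes must come from your survey definition. The `recode` helper is hypothetical, and columns with free text responses would have to be excluded from it.

```python
# Assumed coding for a scale whose poles are labelled; check your survey definition.
LABEL_TO_CODE = {"not at all": 0, "fully agree": 5}


def recode(value):
    """Return the numeric code for a labelled answer, or the value as an int."""
    text = value.strip()
    if text.lower() in LABEL_TO_CODE:
        return LABEL_TO_CODE[text.lower()]
    return int(text)  # raises ValueError for unexpected labels


row = {
    "participant_id": "PHWEE",
    "PHQ9_Item1": "not at all",
    "PHQ_Item2": "3",
    "PHQ_ItemX": "fully agree",
}
# Leave the identifier untouched and recode every item column.
recoded = {
    key: value if key == "participant_id" else recode(value)
    for key, value in row.items()
}
print(recoded)
```

Letting unexpected labels raise an error, instead of silently passing them through, makes it easier to spot answers that are missing from the mapping.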

Naming conventions

Please follow these naming conventions:

  • Each file contains the subject identifier in the participant_id column.
  • The name of the additional columns (i.e. the assessment tool items) can be chosen freely.
  • All participants listed in the participants.tsv file must be listed in the <measurement_tool>.tsv as well. If a participant did not complete the questionnaire, fill the respective row with n/a (1).
  1. More information about how to handle issues like drop-outs or the same questionnaires administered in different sessions can be found here.
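The last convention can be illustrated with a short Python sketch. The `add_missing_participants` helper and the participant KLMNO are made up for this example; in practice the list of identifiers would be read from participants.tsv.

```python
def add_missing_participants(records, participant_ids, columns):
    """Return one row per participant, filling absent participants with n/a."""
    by_id = {record["participant_id"]: record for record in records}
    completed = []
    for participant_id in participant_ids:
        if participant_id in by_id:
            completed.append(by_id[participant_id])
        else:
            filler = {"participant_id": participant_id}
            filler.update({column: "n/a" for column in columns})
            completed.append(filler)
    return completed


# IDs as they would be listed in participants.tsv (KLMNO is a made-up example).
participants = ["PHWEE", "AWEEE", "KLMNO"]
records = [
    {"participant_id": "PHWEE", "BDI_Item1": 2, "BDI_Item2": 2},
    {"participant_id": "AWEEE", "BDI_Item1": 1, "BDI_Item2": 1},
]
completed = add_missing_participants(records, participants, ["BDI_Item1", "BDI_Item2"])
for record in completed:
    print(record)
```

The rows follow the order of the participant list, which keeps the questionnaire file aligned with participants.tsv.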