Step 1: Create a .tsv file for an assessment tool
Prerequisite
It is assumed that the collected phenotypic data is available in .csv format. Survey platforms like LimeSurvey provide export options for this.
If your .csv file contains data for more than one assessment tool, create one .csv file per assessment tool. Each .csv file must contain a column with the participant_id.
Export/Digitisation of data
Questionnaire data is typically collected through an online survey tool (such as LimeSurvey) and can be exported in a variety of formats (e.g. .csv, .sav).
If the data was collected with pen and paper, it has to be digitized first.
Example content of exported data
full_survey_export_V2.csv:
| VPCode | BDI_Item1 | BDI_Item2 | BDI_ItemX | PHQ9_Item1 | PHQ_Item2 | PHQ_ItemX |
| ------ | ------ | ------ | ------ | ------ | ------ | ------ |
| PHWEE | 2 | 2 | 3 | not at all | 3 | fully agree |
| AWEEE | 1 | 1 | 2 | 4 | not at all | 2 |
Clean data
Before you start annotating, make sure the data contains no faulty entries. Exports often include survey test runs or subjects who answered only one question and then dropped out. All such entries have to be removed.
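A minimal sketch of this cleaning step, using only Python's standard csv module. The criteria are assumptions for illustration: identifiers starting with a hypothetical TEST prefix are treated as survey tests, and rows with fewer than two answered items are treated as drop-outs. Adapt both to your own export.

```python
import csv
import io

# Hypothetical export: one test entry and one drop-out among valid rows.
RAW_EXPORT = """VPCode,BDI_Item1,BDI_Item2,BDI_ItemX
PHWEE,2,2,3
TEST01,1,1,1
AWEEE,1,1,2
QXYZZ,3,,
"""

def clean_rows(rows, id_column="VPCode", test_prefix="TEST", min_answers=2):
    """Drop survey-test entries and participants who answered too few items."""
    kept = []
    for row in rows:
        # Assumption: test runs are marked by a prefix in the identifier.
        if row[id_column].upper().startswith(test_prefix):
            continue
        # Count non-empty answers, ignoring the identifier column.
        answered = sum(
            1 for key, value in row.items()
            if key != id_column and value is not None and value.strip() != ""
        )
        if answered < min_answers:
            continue
        kept.append(row)
    return kept

rows = list(csv.DictReader(io.StringIO(RAW_EXPORT)))
cleaned = clean_rows(rows)
print([row["VPCode"] for row in cleaned])
```

It is worth keeping a note of which entries were removed and why, so the cleaning can be reproduced later.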
Separation of different questionnaires / question groups
In order for the questionnaire data to be BIDS compliant, each group of questions must be available in a separate .tsv file. Often, many different questionnaires are collected (e.g., Beck-Depression Inventory-II (BDI-II) and Patient Health Questionnaire 9 (PHQ9)). Typically, exporting a survey results in one large file containing all the collected questionnaires. It is necessary to split the exported data into separate files. In case of nonstandardized questionnaires, it is recommended to split the large data file into semantically related data chunks (e.g. all questions that assess a subject's current mood, all questions that assess a specific concept).
Each .tsv file contains the participant_id (i.e. the subject identifier) as first column.
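The split can be sketched as follows, assuming the item columns of each questionnaire share a prefix (BDI and PHQ, as in the example export above) and that VPCode holds the subject identifier. The prefix-to-file mapping and the helper name split_by_prefix are hypothetical and only serve the illustration.

```python
import csv
import io

EXPORT = """VPCode,BDI_Item1,BDI_Item2,PHQ9_Item1,PHQ_Item2
PHWEE,2,2,0,3
AWEEE,1,1,4,0
"""

# Assumed mapping from column prefix to output file name.
PREFIX_TO_FILE = {
    "BDI": "beck_depression_inventory.tsv",
    "PHQ": "patient_health_questionnaire.tsv",
}

def split_by_prefix(rows, id_column, prefix_to_file):
    """Return {file name: list of rows}, each row starting with participant_id."""
    tables = {}
    for prefix, file_name in prefix_to_file.items():
        # Select the item columns that belong to this questionnaire.
        columns = [c for c in rows[0] if c != id_column and c.startswith(prefix)]
        tables[file_name] = [
            {"participant_id": row[id_column], **{c: row[c] for c in columns}}
            for row in rows
        ]
    return tables

rows = list(csv.DictReader(io.StringIO(EXPORT)))
tables = split_by_prefix(rows, "VPCode", PREFIX_TO_FILE)

for file_name, table in tables.items():
    print(file_name, list(table[0].keys()))
```

For nonstandardized questionnaires without a common prefix, the mapping would have to list the columns of each semantic chunk explicitly.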
Example .tsv file structure
├── autism_quotient_10.tsv
├── beck_depression_inventory.tsv
├── patient_health_questionnaire.tsv
├── empathy_quotient_10.tsv
├── light_triad.tsv
├── psychopathy_personality_inventory_revised.tsv
└── toronto_alexithymia_scale.tsv
Naming convention
Please make sure that there is one .tsv file per questionnaire and that the file name matches the questionnaire name.
Save as .tsv files (tab separated values)
Once all files have been created according to these instructions, save them as <measurement_tool>.tsv files. If you need help with this, please have a look here for LibreOffice.
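If you prefer a script over a spreadsheet program, Python's standard csv module can write tab-separated files directly. This is a minimal sketch; the rows, the helper name write_tsv, and the output location are example values, not part of the original instructions.

```python
import csv
import os
import tempfile

rows = [
    {"participant_id": "PHWEE", "BDI_Item1": "2", "BDI_Item2": "2"},
    {"participant_id": "AWEEE", "BDI_Item1": "1", "BDI_Item2": "1"},
]

def write_tsv(path, rows):
    """Write a list of dicts as tab-separated values with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(
            handle,
            fieldnames=list(rows[0].keys()),
            delimiter="\t",
            lineterminator="\n",
        )
        writer.writeheader()
        writer.writerows(rows)

# Written to a temporary directory here; use your dataset folder instead.
out_path = os.path.join(tempfile.mkdtemp(), "beck_depression_inventory.tsv")
write_tsv(out_path, rows)

with open(out_path, encoding="utf-8") as handle:
    content = handle.read()
print(content)
```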
Example of BIDS compliant questionnaire data
beck_depression_inventory.tsv:
| participant_id | BDI_Item1 | BDI_Item2 | BDI_ItemX |
| ------ | ------ |------ | ------ |
| PHWEE | 2 | 2 | 3 |
| AWEEE | 1 | 1 | 2 |
patient_health_questionnaire.tsv:
| participant_id | PHQ9_Item1 | PHQ_Item2 | PHQ_ItemX |
| ------ |------ | ------ |------ |
| PHWEE | not at all | 3 | fully agree |
| AWEEE | 4 | not at all | 2 |
Change all values to numerical values
Exception: free responses given by the subjects
Note: It is recommended to work through this step and the following Data annotation step in one go for each data file.
A core principle of BIDS is that each data file is accompanied by a file that describes the contents of the data file in detail (see section below) so none of the information will be lost.
It is important that all values currently represented as phrases (e.g. not at all, agree) are converted to their numerical expression. It is assumed that such entries stem from, for example, a Likert scale where only the poles are labeled (e.g. [not at all] - 1 - 2 - 3 - 4 - [fully agree]).
Example changes to the patient_health_questionnaire.tsv introduced above:
| participant_id | PHQ9_Item1 | PHQ_Item2 | PHQ_ItemX |
| ------ | ------ | ------ | ------ |
| PHWEE | 0 | 3 | 5 |
| AWEEE | 4 | 0 | 2 |
Please make sure that the <measurement_tool>.tsv contains only numeric values or free text responses.
If you have, for example, responses like "Strongly Agree" or "not at all" in your data, please replace them with numeric values. The information about the distinct levels of non-numeric responses will be added in the next step.
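The recoding can be sketched as a lookup table applied to every cell outside the identifier column. The mapping below follows the example above (not at all becomes 0, fully agree becomes 5); the actual numeric codes depend on your questionnaire and are an assumption here, as is the helper name recode.

```python
# Assumed coding of the scale poles, taken from the example table.
LABEL_TO_VALUE = {"not at all": "0", "fully agree": "5"}

rows = [
    {"participant_id": "PHWEE", "PHQ9_Item1": "not at all",
     "PHQ_Item2": "3", "PHQ_ItemX": "fully agree"},
    {"participant_id": "AWEEE", "PHQ9_Item1": "4",
     "PHQ_Item2": "not at all", "PHQ_ItemX": "2"},
]

def recode(rows, label_to_value, skip_columns=("participant_id",)):
    """Replace verbal labels with numeric codes; leave other cells untouched."""
    recoded = []
    for row in rows:
        new_row = {}
        for column, value in row.items():
            if column in skip_columns:
                new_row[column] = value
            else:
                # Compare case-insensitively and ignore surrounding spaces.
                new_row[column] = label_to_value.get(value.strip().lower(), value)
        recoded.append(new_row)
    return recoded

for row in recode(rows, LABEL_TO_VALUE):
    print(row)
```

Columns holding free text responses can be added to skip_columns so they are left exactly as the subjects wrote them.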
Naming conventions
Please follow these naming conventions:
- Each file contains the subject identifier in the participant_id column.
- The names of the additional columns (i.e. the assessment tool items) can be chosen freely.
- All participants listed in the participants.tsv file must be listed in the <measurement_tool>.tsv as well. If a participant did not take the questionnaire, you can fill the respective row with n/a.
- More information about how to handle issues like drop-outs or the same questionnaires administered in different sessions can be found here.
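The rule about listing every participant can be sketched as follows. The participant identifiers, the item columns, and the helper name fill_missing are example values chosen for illustration; in practice the identifiers would be read from participants.tsv.

```python
# Identifiers as they would appear in participants.tsv (example values).
participants = ["PHWEE", "AWEEE", "KLMNO"]

rows = [
    {"participant_id": "PHWEE", "BDI_Item1": "2", "BDI_Item2": "2"},
    {"participant_id": "AWEEE", "BDI_Item1": "1", "BDI_Item2": "1"},
]

def fill_missing(rows, participants, columns):
    """Return one row per participant, padding absent ones with n/a."""
    by_id = {row["participant_id"]: row for row in rows}
    filled = []
    for participant_id in participants:
        if participant_id in by_id:
            filled.append(by_id[participant_id])
        else:
            # Participant did not take this questionnaire.
            empty = {column: "n/a" for column in columns}
            filled.append({"participant_id": participant_id, **empty})
    return filled

complete = fill_missing(rows, participants, ["BDI_Item1", "BDI_Item2"])
print(complete[-1])
```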