Participant Metadata
Demographic or participant data is stored in the BIDS participants files, specifically participants.tsv and participants.json, which are located at the top level of a BIDS dataset. The ANC also supports Neurobagel annotations, which make your data findable by participant characteristics via the Neurobagel query interface.
participants.tsv
The default participants.tsv file contains three columns: participant_id, age, and sex. These three variables are the minimum requirement. Adding further columns for a more extensive participant description is recommended. Whenever data from a new participant is added to a dataset, add a new row to this file.
The participants.tsv file has the following rules:
- Use a tab character as the column separator.
- Use lowercase m, f, and o for the sex values. According to BIDS, these values refer to phenotypical sex.
- Age should be an integer.
- Use 89 to indicate ages over 89 to prevent participant identification. Do not indicate age as 89+ or any other string value.
- Missing values are always indicated via n/a.
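The rules above can be checked automatically before a dataset is uploaded. The following is a minimal sketch using only the Python standard library. The function name `check_participants_tsv` is hypothetical and not part of any ANC or BIDS tool, and the sketch covers only the rules listed here rather than full BIDS validation.

```python
import csv
import io

ALLOWED_SEX = {"m", "f", "o"}
MISSING = "n/a"


def check_participants_tsv(text):
    """Return a list of human-readable problems found in participants.tsv content."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = reader.fieldnames or []
    # A file that is not tab-separated ends up with a single merged header
    # field, so it is reported here as missing columns.
    problems = [
        f"missing column: {name}"
        for name in ("participant_id", "age", "sex")
        if name not in header
    ]
    if problems:
        return problems
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        age = row["age"] or ""
        sex = row["sex"] or ""
        if age != MISSING:
            if not (age.isascii() and age.isdigit()):
                problems.append(f"line {line_no}: age {age!r} is not an integer")
            elif int(age) > 89:
                problems.append(f"line {line_no}: age {age} must be capped at 89")
        if sex != MISSING and sex not in ALLOWED_SEX:
            problems.append(f"line {line_no}: sex {sex!r} is not one of m, f, o")
    return problems
```

An empty list means that none of the listed rules were violated. A value such as 89+ is reported as a non-integer age, which matches the rule that string values must not be used.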
Example
| participant_id | age | sex | deafness | hearing_months | level_of_education |
|---|---|---|---|---|---|
| sub-450207866cba | 28 | m | no | n/a | 1 |
| sub-9fb5e49bc1b9 | 28 | m | no | n/a | 2 |
| sub-51d9a4b8abed | 38 | m | yes | n/a | 3 |
| sub-fa7f791ac1f8 | 60 | m | yes | 3 | 1 |
| sub-c06323af4171 | 19 | f | yes | n/a | 1 |
| sub-29863894b750 | 55 | m | ci_pre | 24 | 1 |
| sub-a415d9dc6d83 | 39 | m | no | n/a | 3 |
| sub-09d43edda56a | 23 | f | no | n/a | 2 |
| sub-503a5c6607c5 | 28 | m | no | n/a | n/a |
| sub-a7cee54229be | 58 | m | yes | n/a | 1 |
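Because a new row has to be added for every new participant, it can help to append rows programmatically so that the column order stays intact and unknown values are filled with n/a. The sketch below assumes the file content is available as a string. The helper name `append_participant` and the participant IDs in the usage note are hypothetical.

```python
import csv
import io


def append_participant(tsv_text, new_row):
    """Return tsv_text with new_row appended; columns absent from new_row become n/a."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    fieldnames = reader.fieldnames
    rows = list(reader)
    # Refuse to add the same participant twice.
    if any(r["participant_id"] == new_row["participant_id"] for r in rows):
        raise ValueError(f"{new_row['participant_id']} is already listed")
    rows.append(new_row)
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=fieldnames,
        delimiter="\t",
        restval="n/a",  # value written for columns missing from a row
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

For example, calling `append_participant(text, {"participant_id": "sub-02", "age": "38", "sex": "f"})` on a file that also has a deafness column writes n/a into that column for the new participant. Keys in `new_row` that are not existing columns raise a `ValueError` from `csv.DictWriter`, so new columns have to be added to the header deliberately.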
participants.json
We highly recommend using the Neurobagel Annotation Tool to create the annotations for your .tsv file, because the annotations are partly redundant.
The default participants.json describes the columns of participants.tsv. Its default annotations match those required for querying summarized participant-level information via Neurobagel.
Currently, the Neurobagel Data Model annotates the variables participant_id, sex, age, diagnosis and specific assessment_tools (such as standardized psychological and clinical questionnaires).
Example
The following example shows BIDS standard annotations augmented with Neurobagel annotations. The Neurobagel annotations are found under the Annotations entry of each column.
{
"participant_id": {
"Description": "Unique participant identifier",
"Annotations": {
"IsAbout": {
"TermURL": "nb:ParticipantID",
"Label": "Participant ID"
},
"VariableType": "Identifier"
}
},
"age": {
"Description": "Age of the participant in years",
"Units": "years",
"Annotations": {
"IsAbout": {
"TermURL": "nb:Age",
"Label": "Age"
},
"VariableType": "Continuous",
"MissingValues": [
"n/a"
],
"Format": {
"TermURL": "nb:FromFloat",
"Label": "float"
}
}
},
"sex": {
"Description": "Sex as indicated by the participant",
"Levels": {
"f": {
"Description": "female",
"TermURL": "snomed:248152002"
},
"m": {
"Description": "male",
"TermURL": "snomed:248153007"
},
"o": {
"Description": "other",
"TermURL": "snomed:32570681000036106"
}
},
"Annotations": {
"IsAbout": {
"TermURL": "nb:Sex",
"Label": "Sex"
},
"VariableType": "Categorical",
"Levels": {
"f": {
"TermURL": "snomed:248152002",
"Label": "Female"
},
"m": {
"TermURL": "snomed:248153007",
"Label": "Male"
},
"o": {
"TermURL": "snomed:32570681000036106",
"Label": "Other"
}
},
"MissingValues": [
"n/a"
]
}
}
}
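A quick consistency check between the two files can catch columns that were added to participants.tsv but never described in participants.json. The following sketch assumes the header line and the JSON content are passed in as strings. The function name `check_sidecar` is hypothetical.

```python
import json


def check_sidecar(tsv_header, sidecar_text):
    """Compare a participants.tsv header line with participants.json content.

    Returns a tuple (undocumented, annotated):
    - undocumented: columns in the header without an entry in the JSON file
    - annotated: mapping of column name to its Neurobagel IsAbout TermURL
    """
    columns = tsv_header.rstrip("\r\n").split("\t")
    sidecar = json.loads(sidecar_text)
    undocumented = [c for c in columns if c not in sidecar]
    annotated = {
        col: entry["Annotations"]["IsAbout"]["TermURL"]
        for col, entry in sidecar.items()
        if "Annotations" in entry
    }
    return undocumented, annotated
```

Columns that appear in `undocumented` still need a description. Columns that are missing from `annotated` carry only BIDS annotations and, based on the structure of the example above, have no Neurobagel entry.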
An additional group column can describe the participant's diagnosis. Use the annotation tool provided by Neurobagel to generate the column description. The deafness column below is an example of such a categorical annotation.
{
"deafness": {
"Description": "Grouping we used in this study as we investigated different levels of audiovisual listening experience",
"Levels": {
"yes": {
"Description": "Congenitally deaf",
"TermURL": "snomed:95828007"
},
"ci_pre": {
"Description": "Did hear something but turned deaf (bilaterally or single-sided) at some point",
"TermURL": "snomed:343087000"
},
"no": {
"Description": "No hearing problems are reported",
"TermURL": "ncit:C94342"
}
},
"Annotations": {
"IsAbout": {
"TermURL": "nb:Diagnosis",
"Label": "Diagnosis"
},
"VariableType": "Categorical",
"Levels": {
"yes": {
"TermURL": "snomed:95828007",
"Label": "Congenital deafness"
},
"ci_pre": {
"TermURL": "snomed:343087000",
"Label": "Partial deafness"
},
"no": {
"TermURL": "ncit:C94342",
"Label": "Healthy Control"
}
},
"MissingValues": [
"n/a"
]
}
}
}
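For categorical columns such as deafness, every value used in participants.tsv should be either a declared level or a declared missing value. The sketch below checks this for a single column. It assumes the column entry has already been loaded from participants.json into a dictionary, and the function name `unexpected_values` is hypothetical.

```python
def unexpected_values(values, column_entry):
    """Return the sorted column values that are neither a level nor a missing value."""
    annotations = column_entry.get("Annotations", {})
    known = set(column_entry.get("Levels", {}))
    known |= set(annotations.get("Levels", {}))
    known |= set(annotations.get("MissingValues", []))
    return sorted(set(values) - known)
```

With the deafness entry above, the values no, yes, ci_pre, and n/a from the example table are all covered, so the function returns an empty list. A misspelled value such as CI_pre would be returned, because the comparison is case-sensitive.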
Format age
Per Neurobagel documentation, nb:FromFloat is the correct format for all numeric age values, including integers. A raw value of 31 is treated the same as 31.0. Other supported formats (e.g. nb:FromEuro for 31,5, nb:FromISO8601 for 31Y6M, nb:FromBounded for 89+) are listed in the Neurobagel age documentation.
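To illustrate how the raw values of the formats mentioned above differ, the following sketch converts each example value to a number of years. This is not Neurobagel's implementation. The function name `parse_age` is hypothetical, the accepted syntax for each format is an assumption based only on the example values given here, and returning the lower bound for a value such as 89+ is likewise an assumption.

```python
import re


def parse_age(raw, fmt):
    """Convert a raw age string to years as a float, for illustration only."""
    if fmt == "nb:FromFloat":
        return float(raw)  # "31" and "31.0" both give 31.0
    if fmt == "nb:FromEuro":
        return float(raw.replace(",", "."))  # decimal comma, e.g. "31,5"
    if fmt == "nb:FromBounded":
        return float(raw.rstrip("+"))  # assumption: keep the lower bound of "89+"
    if fmt == "nb:FromISO8601":
        # Assumption: only years and months, e.g. "31Y6M", optional leading "P".
        match = re.fullmatch(r"P?(?:(\d+)Y)?(?:(\d+)M)?", raw)
        if not match or not any(match.groups()):
            raise ValueError(f"cannot parse {raw!r} as an ISO 8601 age")
        years = int(match.group(1) or 0)
        months = int(match.group(2) or 0)
        return years + months / 12
    raise ValueError(f"unsupported format: {fmt}")
```

The sketch shows why the Format entry matters: the same age can appear as 31.5, 31,5, or 31Y6M in the .tsv file, and only the declared format tells a reader how to interpret the string.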
Age Range
Age ranges can also be represented using the nb:FromRange format. With this format, .tsv values are written as min-max (e.g. 30-35). No Levels entry is needed, because Neurobagel parses the bounds directly from the range string.
{
"age_range": {
"Description": "Age range of the participant.",
"Units": "years",
"Annotations": {
"IsAbout": {
"TermURL": "nb:Age",
"Label": "Age"
},
"VariableType": "Continuous",
"Format": {
"TermURL": "nb:FromRange",
"Label": "range"
},
"MissingValues": [
"n/a"
]
}
}
}
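Before annotating a column with nb:FromRange, it can be useful to confirm that every value really follows the min-max pattern. The sketch below splits a range string into its two bounds. It is not Neurobagel's parser, and the function name `parse_age_range` is hypothetical.

```python
def parse_age_range(raw):
    """Split a 'min-max' age range string into a (low, high) tuple of floats."""
    low_text, separator, high_text = raw.partition("-")
    if not separator:
        raise ValueError(f"{raw!r} is not written as min-max")
    low, high = float(low_text), float(high_text)
    if low > high:
        raise ValueError(f"{raw!r} has a lower bound above its upper bound")
    return low, high
```

Values such as n/a are expected to be handled through MissingValues and should be filtered out before calling this function.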
Legacy age range documentation (levels-based)
An older approach listed each range bin as an explicit level. This is no longer recommended but kept here for reference.
{
"age_range": {
"Description": "Age range of the participant.",
"Units": "years",
"Annotations": {
"IsAbout": {
"TermURL": "nb:Age",
"Label": "Age"
},
"VariableType": "Continuous",
"Format": {
"TermURL": "nb:FromRange",
"Label": "range"
}
},
"Levels": {
"15-20": "15-20 years",
"21-25": "21-25 years",
"26-30": "26-30 years",
"31-35": "31-35 years",
"36-40": "36-40 years"
}
}
}
Additional columns (beyond participant_id, age, sex, diagnosis and assessment_tool)
Your participants.tsv file may contain additional numerical or categorical columns beyond the variables currently annotated by Neurobagel. For these columns we rely on the standard BIDS annotations: a description, units (where applicable), and levels (for categorical variables).
Additional columns MUST NOT contain personal data.
Examples
Below are the annotations for a numerical column (hearing_months) and a categorical column (level_of_education).
{
"hearing_months": {
"Description": "Calculated months that participants were exposed to audiovisual listening experience",
"Units": "months"
},
"level_of_education": {
"Description": "Highest reached degree",
"Levels": {
"1": {
"Description": "No degree",
"TermURL": "Please insert a termURL here if you have one available"
},
"2": {
"Description": "Bachelor",
"TermURL": "Please insert a termURL here if you have one available"
},
"3": {
"Description": "Master",
"TermURL": "Please insert a termURL here if you have one available"
}
}
}
}
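Writing such entries by hand is repetitive, so a skeleton can be generated from the column values and then completed manually. The following sketch assumes the caller states whether a column is categorical, since numerically coded columns such as level_of_education cannot be told apart from numerical columns automatically. The function name `column_skeleton` and the TODO placeholder texts are hypothetical.

```python
import json


def column_skeleton(values, categorical, missing="n/a"):
    """Build a BIDS-style annotation stub for one additional column."""
    entry = {"Description": "TODO: describe this column"}
    if categorical:
        # One stub per distinct value; missing values are not levels.
        levels = sorted({v for v in values if v != missing})
        entry["Levels"] = {
            level: {"Description": "TODO: describe this level"} for level in levels
        }
    else:
        entry["Units"] = "TODO: unit of measurement"
    return entry


if __name__ == "__main__":
    stub = {
        "hearing_months": column_skeleton(["n/a", "3", "24"], categorical=False),
        "level_of_education": column_skeleton(["1", "2", "3", "n/a"], categorical=True),
    }
    print(json.dumps(stub, indent=2))
```

The generated stub has the same shape as the example above, without the TermURL entries. All TODO texts must be replaced with real descriptions before the file is used.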