Demographic data

Demographic data, or participant data, is stored in the BIDS participants files, specifically participants.tsv and participants.json, which sit at the top level of a BIDS dataset. The ANC also supports Neurobagel annotations, which make your data findable by participant characteristics via the Neurobagel query interface.

participants.tsv

The default participants.tsv file contains three columns: participant_id, age, and sex. These three variables are the minimum requirement. Adding further columns for a more extensive subject description is recommended. Whenever data belonging to a new participant is added to a dataset, a new row should be added to this file.

The participants.tsv file has the following rules:

  • Use a tab as the column separator.
  • Use lowercase m, f, and o for the sex values. According to BIDS, these values refer to phenotypical sex.
  • Age should be an integer.
  • Use 89 for all ages over 89 to prevent participant identification. Do not write 89+ or any other string value.
  • Missing values are always indicated via n/a.
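The rules above can be checked mechanically before upload. The following is a minimal sketch using only the Python standard library; the function name check_participants_tsv and the wording of its messages are hypothetical and not part of BIDS, Neurobagel, or any ANC tooling.

```python
import csv
import io
import re

ALLOWED_SEX = {"m", "f", "o"}  # lowercase values required by the rules above
MISSING = "n/a"                # the only accepted missing-value marker


def check_participants_tsv(text):
    """Return a list of problems found in the content of a participants.tsv file."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = reader.fieldnames or []
    problems = [
        f"missing required column: {name}"
        for name in ("participant_id", "age", "sex")
        if name not in header
    ]
    if problems:
        return problems
    # The header is line 1, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        age = row["age"] or ""  # short rows yield None; treat as empty
        sex = row["sex"] or ""
        if age != MISSING:
            if not re.fullmatch(r"[0-9]+", age):
                problems.append(f"line {line_no}: age '{age}' is not an integer")
            elif int(age) > 89:
                problems.append(f"line {line_no}: age {age} should be capped at 89")
        if sex != MISSING and sex not in ALLOWED_SEX:
            problems.append(f"line {line_no}: sex '{sex}' is not one of m, f, o")
    return problems


sample = (
    "participant_id\tage\tsex\n"
    "sub-01\t28\tm\n"
    "sub-02\t89+\tF\n"
    "sub-03\tn/a\tn/a\n"
)
for problem in check_participants_tsv(sample):
    print(problem)
```

For the sample, the second data row is reported twice: once for the string age 89+ and once for the uppercase sex value. Empty cells are also reported, because missing values have to be written as n/a.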

Example

participant_id age sex deafness hearing_months level_of_education
sub-450207866cba 28 m no n/a 1
sub-9fb5e49bc1b9 28 m no n/a 2
sub-51d9a4b8abed 38 m yes n/a 3
sub-fa7f791ac1f8 60 m yes 3 1
sub-c06323af4171 19 f yes n/a 1
sub-29863894b750 55 m ci_pre 24 1
sub-a415d9dc6d83 39 m no n/a 3
sub-09d43edda56a 23 f no n/a 2
sub-503a5c6607c5 28 m no n/a n/a
sub-a7cee54229be 58 m yes n/a 1

participants.json

We highly recommend using the Neurobagel Annotation Tool to create the annotations for your .tsv file, because the annotations are somewhat redundant.

The default participants.json describes the columns of participants.tsv. The default annotations match those required for querying summarized participant-level information via Neurobagel.

Currently, the Neurobagel Data Model annotates the variables participant_id, sex, age, diagnosis and specific assessment_tools (such as standardized psychological and clinical questionnaires).

Example

The example below shows BIDS standard annotations together with Neurobagel augmentations. The Neurobagel annotations are found under the Annotations entry of each column.

{
  "participant_id": {
    "Description": "",
    "Annotations": {
      "IsAbout": {
        "TermURL": "nb:ParticipantID",
        "Label": "Participant ID"
      },
      "VariableType": "Identifier"
    }
  },
  "age": {
    "Description": "Age of the participant in years",
    "Units": "years",
    "Annotations": {
      "IsAbout": {
        "TermURL": "nb:Age",
        "Label": "Age"
      },
      "VariableType": "Continuous",
      "MissingValues": [
        "n/a"
      ],
      "Format": {
        "TermURL": "nb:FromFloat",
        "Label": "float"
      }
    }
  },
  "sex": {
    "Description": "Sex as indicated by the participant",
    "Levels": {
      "f": {
        "Description": "female",
        "TermURL": "snomed:248152002"
      },
      "m": {
        "Description": "male",
        "TermURL": "snomed:248153007"
      }
    },
    "Annotations": {
      "IsAbout": {
        "TermURL": "nb:Sex",
        "Label": "Sex"
      },
      "VariableType": "Categorical",
      "Levels": {
        "f": {
          "TermURL": "snomed:248152002",
          "Label": "Female"
        },
        "m": {
          "TermURL": "snomed:248153007",
          "Label": "Male"
        }
      },
      "MissingValues": [
        "n/a"
      ]
    }
  }
}
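The keys under Levels are plain strings, so they only describe a value in participants.tsv if they match it exactly, including case. The sketch below, which assumes nothing beyond the Python standard library, lists the values of each categorical column that have no matching key; the function name undocumented_values is hypothetical.

```python
import csv
import io
import json


def undocumented_values(tsv_text, json_text, missing="n/a"):
    """Map each categorical column to the TSV values that have no entry under Levels."""
    sidecar = json.loads(json_text)
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    report = {}
    for column, entry in sidecar.items():
        levels = entry.get("Levels")
        if not levels:
            continue  # only categorical columns list their levels
        # Collect the distinct values actually used in this column.
        seen = {row[column] for row in rows if row.get(column) is not None}
        unknown = sorted(seen - set(levels) - {missing})
        if unknown:
            report[column] = unknown
    return report


tsv_text = "participant_id\tsex\nsub-01\tm\nsub-02\tf\nsub-03\tn/a\n"
json_text = json.dumps({
    "participant_id": {"Description": ""},
    "sex": {"Levels": {"F": {"Description": "female"}, "M": {"Description": "male"}}},
})
print(undocumented_values(tsv_text, json_text))
```

In this usage example the sidecar keys are uppercase while the table uses lowercase values, so both f and m are reported for the sex column. With lowercase keys the report is empty.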

An additional group column (deafness in the example below) describes the participant's diagnosis. Use the annotation tool provided by Neurobagel to generate the column description. The categorical example below shows what the result looks like.

{
  "deafness": {
    "Description": "Grouping we used in this study as we investigated different levels of audiovisual listening experience",
    "Levels": {
      "yes": {
        "Description": "Congenitally deaf",
        "TermURL": "snomed:95828007"
      },
      "ci_pre": {
        "Description": "Did hear something but turned deaf (bilaterally or single-sided) at some point",
        "TermURL": "snomed:343087000"
      },
      "no": {
        "Description": "No hearing problems are reported",
        "TermURL": "ncit:C94342"
      }
    },
    "Annotations": {
      "IsAbout": {
        "TermURL": "nb:Diagnosis",
        "Label": "Diagnosis"
      },
      "VariableType": "Categorical",
      "Levels": {
        "yes": {
          "Label": "Congenital deafness",
          "TermURL": "snomed:95828007"
        },
        "ci_pre": {
          "Label": "Partial deafness",
          "TermURL": "snomed:343087000"
        },
        "no": {
          "Label": "Healthy Control",
          "TermURL": "ncit:C94342"
        }
      },
      "MissingValues": [
        "n/a"
      ]
    }
  }
}

Age formats other than float

The template assumes that the age column contains float values. If your age values use a different format, indicate this in the Format entry of the age annotations in participants.json (nb:FromFloat in the template), choosing from the age formats supported by Neurobagel.

Age Range

Age ranges can also be represented, for example for .tsv entries such as 30-40:

{
  "age_range": {
    "Description": "Age range of the participant.",
    "Units": "years",
    "Annotations": {
      "IsAbout": {
        "TermURL": "nb:Age",
        "Label": "Age"
      },
      "VariableType": "Continuous",
      "Format": {
        "TermURL": "nb:FromRange",
        "Label": "range"
      }
    },
    "Levels": {
      "15-20": "15-20 years",
      "21-25": "21-25 years",
      "26-30": "26-30 years",
      "31-35": "31-35 years",
      "36-40": "36-40 years"
    }
  }
}
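To make sure every range entry follows the lower-upper pattern, the values can be parsed before they are written to the file. This is only an illustrative sketch: parse_age_range is a hypothetical helper and does not reproduce how Neurobagel itself interprets nb:FromRange values.

```python
def parse_age_range(value):
    """Split a range string such as '30-40' into integer lower and upper bounds."""
    lower, sep, upper = value.strip().partition("-")
    lower, upper = lower.strip(), upper.strip()
    if not sep or not lower.isdecimal() or not upper.isdecimal():
        raise ValueError(f"not an age range: {value!r}")
    low, high = int(lower), int(upper)
    if low > high:
        raise ValueError(f"lower bound exceeds upper bound: {value!r}")
    return low, high


# The range keys of the example above, checked for overlap between neighbours.
levels = ["15-20", "21-25", "26-30", "31-35", "36-40"]
bounds = [parse_age_range(key) for key in levels]
no_overlap = all(prev[1] < nxt[0] for prev, nxt in zip(bounds, bounds[1:]))
print(bounds, no_overlap)
```

A value such as 89+ or a single number raises a ValueError, which flags entries that do not belong in a range column.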

Additional columns (beyond participant_id, age, sex, diagnosis and assessment_tool)

Your participants.tsv file may contain additional columns with numerical or categorical content beyond the variables currently annotated by Neurobagel. In such cases we rely on the standard BIDS annotations, which contain a description, units (where applicable), and levels (for categorical variables).

Additional columns MUST NOT contain personal data.

Examples

Below you find the annotations for a numerical column (hearing_months) and a categorical column (level_of_education):

{
  "hearing_months": {
    "Description": "Calculated months that participants were exposed to audiovisual listening experience",
    "Units": "months"
  },
  "level_of_education": {
    "Description": "Highest reached degree",
    "Levels": {
      "1": {
        "Description": "No degree",
        "TermURL": "Please insert a termURL here if you have one available"
      },
      "2": {
        "Description": "Bachelor",
        "TermURL": "Please insert a termURL here if you have one available"
      },
      "3": {
        "Description": "Master",
        "TermURL": "Please insert a termURL here if you have one available"
      }
    }
  }
}
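Writing such entries by hand gets repetitive when there are many additional columns. As a rough sketch, an empty skeleton can be generated from the column values and then filled in manually. The helper skeleton_entry is hypothetical, and the descriptions, units, and term URLs still have to be written by you.

```python
import json


def skeleton_entry(values, categorical, missing="n/a"):
    """Build an empty BIDS-style description for one additional column."""
    entry = {"Description": ""}
    if categorical:
        # One empty level per distinct value, ignoring the missing-value marker.
        levels = sorted({value for value in values if value != missing})
        entry["Levels"] = {level: {"Description": ""} for level in levels}
    else:
        entry["Units"] = ""
    return entry


sidecar = {
    "hearing_months": skeleton_entry(["n/a", "3", "24"], categorical=False),
    "level_of_education": skeleton_entry(["1", "2", "3", "1", "n/a"], categorical=True),
}
print(json.dumps(sidecar, indent=2))
```

The printed structure has the same shape as the example above, with empty strings where the description, the units, and the level descriptions belong.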