
Data Format Requirements

History

Version 1.0.0. Previous versions can be found in the Agreement history.

The Austrian NeuroCloud (ANC) repository accepts submitted datasets only in the Brain Imaging Data Structure (BIDS) format. BIDS is an internationally recognized standard that provides a well-defined framework for consistent, well-organized, and usable neuroimaging data, which facilitates reproducibility and transparency across studies.

While BIDS compliance is a minimum requirement, the Austrian NeuroCloud imposes additional data format requirements to ensure that datasets are of the highest integrity and are ready for reuse in a wide range of research applications.

General requirements

All datasets submitted to the repository must fully comply with the latest BIDS specification.

Raw data

ANC currently accepts only raw data, which the BIDS specification defines as "unprocessed or minimally processed due to file format conversion".

Source data

Source imaging data, as defined in the BIDS specification, refers to "data before harmonization, reconstruction, and/or file format conversion", and should not be uploaded to ANC. This includes, for example, DICOM images. Due to disk space limitations, we recommend storing source data outside ANC.

Original log files, for example E-Prime or PsychoPy event logs, are an exception to this rule.

Derived data

The storage of derivatives is currently under development. At present, ANC does not accept derived data.

Supported data types

Any BIDS-formatted dataset can be stored in ANC.

ANC actively supports MEG and MRI data types. Beyond BIDS, no additional format requirements are imposed for these types of data.

Additionally, ANC accepts phenotypic data and purely phenotypic datasets. See Questionnaire data for more information.

BIDS validation

Every change made to a dataset file triggers a continuous validation process to ensure that all files maintain BIDS compliance throughout the dataset’s lifecycle. Datasets are validated using the latest version of BIDS Validator.

All datasets must pass BIDS validation. If validation errors cannot be resolved, exceptions will be evaluated by ANC Data Stewards on a case-by-case basis.

To ensure high dataset quality, ANC changes the severity of the following BIDS Validator issues:

Participants age 89 or higher (code: 56)

Severity before and after change: warning / error

Original issue reason: As per section 164.514(C) of "The De-identification Standard" under HIPAA guidelines, participants with age 89 or higher should be tagged as 89+. More information can be found at https://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/#standard

Change motivation: To prevent the potential identification of older participants.

Change implications: If a participant's age is 89 years or higher, it must be represented as 89.
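The rule above can be applied before upload with a few lines of Python. This is a minimal sketch, not an ANC tool: it assumes the age column is named age and holds plain numbers or n/a, and the helper names cap_age and cap_ages_in_tsv are hypothetical.

```python
import csv
import io

AGE_CAP = 89  # per the ANC rule above: ages of 89 or higher are stored as 89


def cap_age(value: str) -> str:
    """Return the age capped at AGE_CAP; leave non-numeric values such as 'n/a' untouched."""
    try:
        age = float(value)
    except ValueError:
        return value
    return str(AGE_CAP) if age >= AGE_CAP else value


def cap_ages_in_tsv(tsv_text: str, column: str = "age") -> str:
    """Cap one column of a participants.tsv-style table that is passed in as text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        row[column] = cap_age(row[column])
        writer.writerow(row)
    return out.getvalue()
```

Working on text rather than on a file path keeps the sketch easy to try out; reading and writing ./participants.tsv around it is left to the caller.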

EVENTS_TSV_MISSING (code: 25)

Severity before and after change: warning / error

Original issue reason: Task scans should have a corresponding events.tsv file. If this is a resting state scan you can ignore this warning or rename the task to include the word rest.

Change motivation: Event files provide essential information for understanding the experiment.

Change implications: Event TSV files must be provided when injecting participant data; otherwise, the task name must be changed to include the word rest.

CUSTOM_COLUMN_WITHOUT_DESCRIPTION (code: 82)

Severity before and after change: warning / error

Original issue reason: Tabular file contains custom columns not described in a data dictionary.

Change motivation: Custom columns in tabular files are often unclear without their descriptions.

Change implications: Each custom column in a TSV file must be described with at least a free-text entry in the corresponding JSON file.
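Missing column descriptions can be spotted locally before the validator reports them. The sketch below is an illustration under stated assumptions, not the BIDS Validator's own logic: it treats every column as custom unless it is listed in known_columns (a hypothetical parameter), and it accepts a column as described when the JSON file has a non-empty "Description" entry for it.

```python
import csv
import json
from pathlib import Path


def undescribed_columns(
    tsv_path: Path, json_path: Path, known_columns: frozenset = frozenset()
) -> list[str]:
    """Return TSV columns that lack a non-empty "Description" in the JSON file."""
    with tsv_path.open(newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle, delimiter="\t"), [])
    # A missing JSON file means that no column is described at all.
    sidecar = {}
    if json_path.is_file():
        sidecar = json.loads(json_path.read_text(encoding="utf-8"))
    missing = []
    for column in header:
        if column in known_columns:
            continue  # assumed to be defined by BIDS itself, e.g. participant_id
        entry = sidecar.get(column)
        described = isinstance(entry, dict) and str(entry.get("Description", "")).strip()
        if not described:
            missing.append(column)
    return missing
```

A call such as undescribed_columns(Path("participants.tsv"), Path("participants.json"), frozenset({"participant_id"})) returns the column names that still need a description.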

Dataset metadata

The Metadata Specification outlines ANC's metadata policy, including sources and dissemination channels. To ensure automated metadata dissemination and to meet DOI generation requirements, certain minimum metadata requirements must be met.

The following files must be present:

  • ./README.md: Describing the dataset.
  • ./CITATION.cff: Including at least title and authors.
  • ./dataset_description.json: Including the keys "Name", "BIDSVersion", "HEDVersion", and "DatasetType".

Templates and guidelines for these files are available in each dataset project upon creation.
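The file and key requirements listed above can be checked with a short script. This is a minimal sketch with a hypothetical function name, metadata_problems; it only verifies that the files exist and that the four keys are present, and it does not parse CITATION.cff, so the title and authors fields are not checked.

```python
import json
from pathlib import Path

REQUIRED_FILES = ("README.md", "CITATION.cff", "dataset_description.json")
REQUIRED_KEYS = ("Name", "BIDSVersion", "HEDVersion", "DatasetType")


def metadata_problems(root: Path) -> list[str]:
    """List required metadata files and dataset_description.json keys that are missing."""
    problems = [
        f"missing file: {name}" for name in REQUIRED_FILES if not (root / name).is_file()
    ]
    description = root / "dataset_description.json"
    if description.is_file():
        content = json.loads(description.read_text(encoding="utf-8"))
        problems += [f"missing key: {key}" for key in REQUIRED_KEYS if key not in content]
    return problems
```

An empty list means the minimum requirements named in this section are met; whether the values themselves are sensible still needs a human check.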

Session directories

Every dataset must include at least one session directory for each subject, even if no separate sessions exist. This structure simplifies software development and enhances automated dataset handling processes.
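As an illustration, the following sketch lists subjects that have no session directory. It assumes the standard BIDS naming of sub-<label> and ses-<label> directories directly under the dataset root; the function name subjects_without_sessions is hypothetical.

```python
from pathlib import Path


def subjects_without_sessions(root: Path) -> list[str]:
    """Return the names of sub-* directories that contain no ses-* directory."""
    missing = []
    for subject in sorted(root.glob("sub-*")):
        if not subject.is_dir():
            continue  # skip stray files whose names happen to start with sub-
        has_session = any(child.is_dir() for child in subject.glob("ses-*"))
        if not has_session:
            missing.append(subject.name)
    return missing
```

For a single-session study, a layout such as sub-01/ses-01/anat/ satisfies the requirement, whereas sub-01/anat/ does not.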

Subject information

Datasets must include ./participants.tsv and ./participants.json files, containing at least the participant_id column to store identifiers for all participants in the study. Demographic data such as age and sex should also be stored in these files, as this information is used to index and query datasets by demographic attributes.

Templates and guidelines for these files are available in each dataset project upon creation.
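The subject information requirements can be checked along the following lines. This is a sketch, not an official ANC check: the function name participants_report is hypothetical, and the column names age and sex are assumed to be the ones used for the recommended demographic data.

```python
import csv
from pathlib import Path


def participants_report(root: Path) -> dict[str, list[str]]:
    """Report missing participants files or columns under the dataset root."""
    report = {"errors": [], "recommendations": []}
    tsv, sidecar = root / "participants.tsv", root / "participants.json"
    for path in (tsv, sidecar):
        if not path.is_file():
            report["errors"].append(f"missing file: {path.name}")
    if tsv.is_file():
        with tsv.open(newline="", encoding="utf-8") as handle:
            header = next(csv.reader(handle, delimiter="\t"), [])
        if "participant_id" not in header:
            report["errors"].append("missing column: participant_id")
        # Demographic columns are recommended rather than required.
        for column in ("age", "sex"):
            if column not in header:
                report["recommendations"].append(f"consider adding column: {column}")
    return report
```

Separating errors from recommendations mirrors the wording of this section: the two files and the participant_id column are mandatory, while demographic columns are strongly encouraged.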

Task experiments

Datasets from task-based experiments across all modalities must include detailed event files.

Additionally, in the root directory (./) of the dataset, there must be one JSON file for each task, which provides detailed descriptions of all columns in the corresponding event TSV files.
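A quick way to find tasks that still lack a root-level JSON file is sketched below. The file name task-<label>_events.json is an assumption based on the BIDS inheritance principle, since this page does not state the expected name, and the function name tasks_missing_root_json is hypothetical.

```python
import re
from pathlib import Path

# Matches the task entity in BIDS file names, e.g. "_task-nback_".
TASK_PATTERN = re.compile(r"_task-([a-zA-Z0-9]+)_")


def tasks_missing_root_json(root: Path) -> list[str]:
    """Return task labels that have events TSV files but no root-level events JSON."""
    tasks = set()
    for events in root.rglob("*_events.tsv"):
        match = TASK_PATTERN.search(events.name)
        if match:
            tasks.add(match.group(1))
    # Assumed naming: one task-<label>_events.json per task in the dataset root.
    return sorted(
        task for task in tasks if not (root / f"task-{task}_events.json").is_file()
    )
```

The sketch only checks that the JSON file exists; whether it describes every column of the event files is a separate question.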

For more details on formatting and planning studies with comprehensive event files, refer to the task experiment guidelines.

To increase interoperability, ANC strongly recommends annotating the events using the Hierarchical Event Descriptor (HED) standard.

Questionnaire data

Phenotypic data must adhere to the BIDS specification.

Some common use cases are not covered by BIDS. The table below provides ANC-recommended solutions for these cases. Adhering to these recommendations ensures that any future standard developments can be applied automatically.

| Issue | Error cause | Solution |
|---|---|---|
| Pre-screening data | BIDS validator errors occur if any participant_id in the <measurement-tool>.tsv file does not match one of the sub- directories. | Store phenotypic data for participants without imaging data in the ./phenotype/extra subdirectory and add this directory to the .bidsignore file. |
| Varying questionnaires across subjects (e.g., study dropouts) | BIDS validator errors occur if the dataset subjects (based on ./sub- directories) do not match the participant_id in the <measurement-tool>.tsv file. | Add missing participant_id entries to the <measurement-tool>.tsv file, with n/a values. |
| Longitudinal data | BIDS specification does not cover repeated phenotypic measures. | Add the _ses- label to the <measurement-tool>.tsv file name (e.g., <measurement-tool>_ses-1.tsv, <measurement-tool>_ses-2.tsv). |
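The solution for varying questionnaires, adding missing participant_id entries with n/a values, can be automated. The following is a minimal sketch with a hypothetical function name, pad_missing_participants; it works on the table as text and assumes the first row is a header that contains a participant_id column.

```python
import csv
import io


def pad_missing_participants(tsv_text: str, participant_ids: list[str]) -> str:
    """Append an n/a row for every listed participant that is absent from the table."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    fieldnames = reader.fieldnames or ["participant_id"]
    rows = list(reader)
    present = {row["participant_id"] for row in rows}
    for participant_id in participant_ids:
        if participant_id not in present:
            row = {name: "n/a" for name in fieldnames}
            row["participant_id"] = participant_id
            rows.append(row)
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

The list of participant identifiers would typically come from the sub- directory names or from ./participants.tsv. Existing rows are left unchanged and new rows are appended at the end.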

Further developments related to storing phenotypic data in BIDS are being tracked in BIDS Extension Proposal 36 (BEP 36): Phenotypic Data Guidelines.