Metadata Specification

History

Version 1.0.0. Previous versions can be found in the Agreement history.

Sharing metadata is a critical practice in scientific research and data management. High-quality metadata facilitates the understanding and proper utilization of datasets, enhancing the reproducibility of scientific results and fostering collaboration across disciplines. To uphold these principles, the Austrian NeuroCloud adheres to the FAIR (Findable, Accessible, Interoperable, Reusable) data management guidelines. By following the FAIR principles, the ANC ensures that the shared metadata meets the highest standards of data stewardship, promoting greater transparency and efficiency in research.

The metadata specification is based on the Brain Imaging Data Structure (BIDS), the only data format currently supported in the Austrian NeuroCloud. BIDS is a standardized format that organizes and describes neuroimaging and related data, enabling seamless sharing and analysis. By adopting the BIDS format, the ANC ensures compatibility with a wide range of tools and platforms used in the neuroimaging community, further enhancing the utility and impact of the shared metadata.

The ANC is committed to protecting the privacy of individuals by ensuring that no personal data is ever shared or published. All metadata shared will be anonymized and stripped of any personally identifiable information, in compliance with ethical standards and legal regulations. This commitment safeguards the privacy of research participants while allowing the scientific community to benefit from shared data.

Additionally, the ANC will only share metadata that does not reveal any specific scientific contributions, thereby securing the intellectual property of the scientists.

Metadata sources

The metadata is categorized into four levels: dataset, participant, measurement, and event. For each level, the specific files in a BIDS-formatted dataset that serve as the source of the metadata are listed. Files are not published as they are; in most cases only a summary of the information they contain is distributed. In the remainder of this document, POSIX path syntax and glob patterns specify the files and their paths relative to the BIDS dataset root directory `./`.

Dataset level

The metadata provides a comprehensive description of the dataset as a whole, including its name, authors, identifiers, and related publications.

File pattern	Metadata definition
`./README.md`	Entire content of the file.
`./CITATION.cff`	Entire content of the file.
`./dataset_description.json`	Entire content of the file.

Participant level

The metadata provides basic information about participant demographics and completed assessment tools. Participant-level metadata is published only in summarized form and is never disclosed for individual participants. Data of individual participants may be stored in a database, but query results reveal only summarized information about participants.

File pattern	Metadata definition
`./participants.tsv`	Columns and their values representing demographic data of the participants, including but not limited to age, sex, and diagnosis. Columns indicating whether an assessment tool was completed and the resulting score.
`./participants.json`	Definitions of the columns representing demographic data and the availability of a completed assessment tool.
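
As a minimal sketch of what "summarized form" could look like at the participant level, the following Python snippet reduces a `participants.tsv` table to counts and ranges and returns no per-participant rows. The function name `summarize_participants`, the column names `age` and `sex`, the chosen statistics, and the example values are illustrative assumptions, not the ANC implementation.

```python
import csv
import io
from collections import Counter

MISSING = (None, "", "n/a")  # BIDS tabular files mark missing values as "n/a"


def summarize_participants(tsv_text):
    """Aggregate a participants.tsv table; no per-participant rows are returned."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    # Skip missing values when computing the age range and the sex counts.
    ages = [float(r["age"]) for r in rows if r.get("age") not in MISSING]
    sexes = Counter(r["sex"] for r in rows if r.get("sex") not in MISSING)
    return {
        "n_participants": len(rows),
        "age_min": min(ages) if ages else None,
        "age_max": max(ages) if ages else None,
        "sex_counts": dict(sexes),
    }


# Hypothetical example table (made-up values, for illustration only).
example = (
    "participant_id\tage\tsex\n"
    "sub-01\t25\tF\n"
    "sub-02\t31\tM\n"
    "sub-03\tn/a\tF\n"
)
summary = summarize_participants(example)
print(summary)
```

A real summary would likely also need a minimum group size before any count is released, so that small groups cannot be traced back to individuals; that threshold is left out here because the specification does not state one.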

Measurement level

The metadata provides information about the data modalities available in a dataset, their specific acquisition parameters, and the number of acquisitions within and between multiple sessions. Acquisition times and dates are not published.

File pattern	Metadata definition
`./sub-*/*sessions.tsv`	The number of different acquisition sessions.
`./sub-*/ses-*/*scans.tsv`	Data modalities within a session.
`./sub-*/ses-*/*/*.json`	Acquisition parameters of any modality.
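
The following Python sketch illustrates how glob patterns relative to the dataset root could be resolved to derive measurement-level summaries. It is only an illustration: the patterns `sub-*/*sessions.tsv` and `sub-*/ses-*/*scans.tsv` are an assumed reading of the table above, the function names `read_tsv` and `summarize_measurements` are hypothetical, and the tiny dataset is created in a temporary directory with made-up values. The `acq_time` column is read from the files but deliberately never returned, in line with the rule that acquisition times and dates are not published.

```python
import csv
import tempfile
from pathlib import Path


def read_tsv(path):
    """Read a tab-separated file into a list of row dictionaries."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def summarize_measurements(root):
    """Count sessions and collect modalities without exposing acquisition times."""
    root = Path(root)
    # Sessions: one row per session in each subject's sessions file.
    session_counts = [
        len(read_tsv(p)) for p in sorted(root.glob("sub-*/*sessions.tsv"))
    ]
    # Modalities: first path component of each "filename" entry in a scans file.
    modalities = set()
    for scans_file in root.glob("sub-*/ses-*/*scans.tsv"):
        for row in read_tsv(scans_file):
            modalities.add(row["filename"].split("/")[0])
    return {
        "n_subjects_with_sessions": len(session_counts),
        "max_sessions_per_subject": max(session_counts, default=0),
        "modalities": sorted(modalities),
    }


# Build a tiny hypothetical dataset so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    ses = root / "sub-01" / "ses-01"
    ses.mkdir(parents=True)
    (root / "sub-01" / "sub-01_sessions.tsv").write_text(
        "session_id\tacq_time\nses-01\t1900-01-01T00:00:00\n", encoding="utf-8"
    )
    (ses / "sub-01_ses-01_scans.tsv").write_text(
        "filename\tacq_time\n"
        "anat/sub-01_ses-01_T1w.nii.gz\t1900-01-01T00:00:00\n"
        "func/sub-01_ses-01_task-rest_bold.nii.gz\t1900-01-01T00:10:00\n",
        encoding="utf-8",
    )
    measurement_summary = summarize_measurements(root)

print(measurement_summary)
```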

Event level

The metadata lists and describes the events that occurred during the data acquisition, in particular the stimuli to which the subjects were exposed. This metadata is essential for semantic interoperability of the datasets, especially for finding similar experiments. This metadata will be published in a summarized form, and the entire event logs will not be redistributed.

File pattern	Metadata definition
`./sub-*/ses-*/*/*events.tsv`	Stimuli and subject actions.
`./*events.json`	Descriptions of events and their HED annotations.
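
One way to picture a summarized event log is a count of how often each event type occurred, with onsets and durations dropped. The Python sketch below assumes the events file has a `trial_type` column (optional in BIDS) and uses a hypothetical function name `summarize_events` with made-up example rows; the actual summary published by the ANC may differ.

```python
import csv
import io
from collections import Counter


def summarize_events(tsv_text):
    """Return event-type counts; the timing of individual events is discarded."""
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Keep only how often each event type occurred, not when it occurred.
    return dict(Counter(row["trial_type"] for row in rows))


# Hypothetical example log (made-up values, for illustration only).
example_events = (
    "onset\tduration\ttrial_type\n"
    "0.0\t1.5\tface\n"
    "2.0\t1.5\thouse\n"
    "4.0\t1.5\tface\n"
)
event_counts = summarize_events(example_events)
print(event_counts)
```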

Dissemination channels

Metadata is disseminated through several channels, as outlined in the following table. These channels are designed to ensure broad and effective access to the metadata and the underlying datasets, and to enhance their findability within the scientific community.

Channel	Description	Metadata levels
DOI registration	The Austrian NeuroCloud uses DataCite as DOI registration service via the Library of the Paris Lodron University of Salzburg. Certain minimal dataset-level metadata is required for registering a DOI with DataCite. All applicable dataset-level metadata will be shared with DataCite.	Dataset
DataCite Commons	DataCite Commons exposes the metadata of the DOIs registered with DataCite for querying. The metadata of each DOI registered with the Austrian NeuroCloud will be exposed in DataCite Commons.	Dataset
Dataset website	Each dataset has an automatically generated website. This website is associated with the dataset DOI. The metadata is rendered on the website and embedded in its source code, complying with the FAIR assessment guidelines and aiming for a high FAIRness score.	Dataset, Participant, Measurement
ANC querying interface (work in progress)	To increase findability, the metadata will be stored in a database and exposed in a custom-built ANC querying interface.	All levels
Neurobagel node	To enable interoperability with other datasets at a participant level, the metadata will be stored in a database and exposed for querying using Neurobagel.	Dataset, Participant