New Dataset Onboarding

This workflow describes what the data steward does when a new dataset project is initialized in the ANC.

1. Initial assignment

When a new project is created, a Dataset Status issue is opened automatically and assigned to Monique. Monique reassigns the dataset to a data steward based on:

  • Incoming data type (MRI, MEG, EEG, physiological, other)
  • Data steward availability and current workload

The assigned data steward updates the Dataset Status label from Initialized to Formation.

2. First contact with the data owner

Reach out to the data owner to:

  • Confirm the data type and expected scope (number of subjects, sessions, tasks)
  • Confirm whether data collection is ongoing or complete
  • Point them to the relevant onboarding route in the handbook
  • Clarify who is responsible for what (researcher fills in metadata, data steward reviews and merges)
  • Ask whether they need a DOI at the end of collection

3. Project setup check

Check that:

  • The correct project template was used (MRI Salzburg, MEG Salzburg, BIDS-basic, etc.)
  • The project is in the correct research unit sub-group under bids-datasets; if no suitable group exists, create one following the group-level settings guide
  • CI/CD pipelines are running (BIDS validator component is included in .gitlab-ci.yml)
  • For MEG Salzburg: config.yaml has been filled in by the researcher
  • For EEG using the ANC pipeline: .eeg_conversion_config.yml has been filled in
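
Parts of this check can be scripted against a local checkout of the project. The following is a minimal sketch, not an ANC tool: the function name, the modality keys ("meg-salzburg", "eeg-anc"), and the `ci_marker` string searched for in .gitlab-ci.yml are assumptions made for illustration. It only confirms that the files exist and are non-empty; whether the researcher filled them in correctly still needs a manual look.

```python
from pathlib import Path


def check_project_setup(repo: Path, modality: str, ci_marker: str = "bids-validator") -> list[str]:
    """Return a list of human-readable problems found in a local checkout."""
    problems = []

    # The CI file should exist and mention the validator component.
    # `ci_marker` is an assumed search string, not the component's real name.
    ci_file = repo / ".gitlab-ci.yml"
    if not ci_file.is_file():
        problems.append(".gitlab-ci.yml is missing")
    elif ci_marker not in ci_file.read_text(encoding="utf-8"):
        problems.append(f".gitlab-ci.yml does not mention '{ci_marker}'")

    # Config files named in the checklist, keyed by hypothetical modality labels.
    required_config = {
        "meg-salzburg": "config.yaml",
        "eeg-anc": ".eeg_conversion_config.yml",
    }
    config_name = required_config.get(modality)
    if config_name is not None:
        config = repo / config_name
        if not config.is_file() or not config.read_text(encoding="utf-8").strip():
            problems.append(f"{config_name} is missing or empty")

    return problems
```

An empty list means none of the scripted checks failed. The template and sub-group checks are not covered and remain manual.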

4. Pilot subject

Guide the researcher through depositing a first (pilot) subject:

  • Researcher follows the self-deposit guide and marks their first MR as ready
  • Data steward reviews the pilot MR using the relevant injection checklist (MRI, MEG, EEG, phenotype)
  • Any structural issues (wrong session layout, missing sidecar fields, incorrect event labels) are resolved before further data is added
  • Once the first MR is merged, update the Dataset Status label: set First sample arrived to True

5. Ongoing data collection

  • Subsequent MRs follow the same review process
  • Monitor for consistency across subjects (same sequences, same event structure, same participant columns)
  • Keep the Dataset Status labels up to date as milestones are reached
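
Consistency of the event structure can be spot-checked with a short script. The sketch below is one possible approach, assuming a standard BIDS layout in which each subject has `*_events.tsv` files with a `trial_type` column. The function names are hypothetical. It covers event labels only, not sequences or participant columns.

```python
import csv
from collections import defaultdict
from pathlib import Path


def event_labels_by_subject(bids_root: Path) -> dict[str, set[str]]:
    """Collect the set of trial_type values seen in each subject's events files."""
    labels = defaultdict(set)
    for events_file in sorted(bids_root.glob("sub-*/**/*_events.tsv")):
        subject = events_file.relative_to(bids_root).parts[0]
        with events_file.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle, delimiter="\t"):
                label = row.get("trial_type")
                if label:
                    labels[subject].add(label)
    return dict(labels)


def missing_event_labels(labels: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each subject to the labels it lacks relative to the union over all subjects."""
    all_labels = set().union(*labels.values())
    return {
        subject: all_labels - seen
        for subject, seen in labels.items()
        if all_labels - seen
    }
```

A subject that appears in the result of `missing_event_labels` is not necessarily wrong (a task may legitimately differ), but it is worth a look before merging the MR.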

6. Metadata completion

Once all data is collected:

  • Researcher fills in CITATION.cff, dataset_description.json, and README.md
  • Data steward reviews metadata completeness against the general metadata guide
  • Data steward checks that participants.tsv and participants.json are complete and Neurobagel-compatible
  • Data steward checks that HED annotations are present where applicable
  • Update Dataset Status: set Author metadata complete and Metadata complete to True
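
A quick presence check can precede the manual metadata review. The following is a rough sketch under stated assumptions: the function and constant names are hypothetical, and the only content check is for the `Name` and `BIDSVersion` fields that the BIDS specification requires in dataset_description.json. Neurobagel compatibility and HED annotations are not checked here.

```python
import json
from pathlib import Path

# Files named in the metadata step above.
REQUIRED_FILES = [
    "CITATION.cff",
    "dataset_description.json",
    "README.md",
    "participants.tsv",
    "participants.json",
]
# Fields that BIDS requires in dataset_description.json.
REQUIRED_DESCRIPTION_FIELDS = ["Name", "BIDSVersion"]


def missing_metadata(bids_root: Path) -> list[str]:
    """Return a list of missing files and missing required description fields."""
    problems = [
        f"missing file: {name}"
        for name in REQUIRED_FILES
        if not (bids_root / name).is_file()
    ]
    description = bids_root / "dataset_description.json"
    if description.is_file():
        try:
            fields = json.loads(description.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            problems.append("dataset_description.json is not valid JSON")
        else:
            if not isinstance(fields, dict):
                problems.append("dataset_description.json is not a JSON object")
            else:
                problems += [
                    f"dataset_description.json lacks {key}"
                    for key in REQUIRED_DESCRIPTION_FIELDS
                    if not fields.get(key)
                ]
    return problems
```

An empty result only shows that the files are in place; completeness still has to be judged against the general metadata guide.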

7. DOI (if requested)

Follow the DOI assignment workflow. Update Dataset Status: set DOI assigned to True and Dataset status to Maintenance.
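
The label updates across the workflow can be summarized as a small state record. This is an illustrative sketch only: the class, the function, and the returned strings are hypothetical, and the real labels live on the Dataset Status issue. Steps without a label of their own (first contact, project setup check) are not represented.

```python
from dataclasses import dataclass


@dataclass
class DatasetStatus:
    """Mirror of the Dataset Status labels mentioned in this workflow."""
    stage: str = "Initialized"  # later "Formation", finally "Maintenance"
    first_sample_arrived: bool = False
    author_metadata_complete: bool = False
    metadata_complete: bool = False
    doi_assigned: bool = False


def next_step(status: DatasetStatus, doi_requested: bool) -> str:
    """Suggest the next onboarding step implied by the current labels."""
    if status.stage == "Initialized":
        return "assign a data steward and set the stage to Formation"
    if not status.first_sample_arrived:
        return "review and merge the pilot MR"
    if not (status.author_metadata_complete and status.metadata_complete):
        return "complete and review the metadata"
    if doi_requested and not status.doi_assigned:
        return "follow the DOI assignment workflow"
    return "no open onboarding step"
```

For example, `next_step(DatasetStatus(stage="Formation"), doi_requested=True)` points to the pilot MR review, because no first sample has arrived yet.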