New Dataset Onboarding

This workflow describes what the data steward does when a new dataset project is initialized in the ANC.

1. Initial assignment

When a new project is created, a Dataset Status issue is opened automatically and assigned to Monique. Monique reassigns the dataset to a data steward based on:

Incoming data type (MRI, MEG, EEG, physiological, other)
Data steward availability and current workload

The assigned data steward updates the Dataset Status label from Initialized to Formation.

2. First contact with the data owner

Reach out to the data owner to:

Confirm the data type and expected scope (number of subjects, sessions, tasks)
Confirm whether data collection is ongoing or complete
Point them to the relevant onboarding route in the handbook
Clarify who is responsible for what (researcher fills in metadata, data steward reviews and merges)
Ask whether they need a DOI at the end of collection

3. Project setup check

Verify that:

The correct project template was used (MRI Salzburg, MEG Salzburg, BIDS-basic, etc.)
The project is in the correct research unit sub-group under bids-datasets; if no suitable group exists, create one following the group-level settings guide
CI/CD pipelines are running (BIDS validator component is included in .gitlab-ci.yml)
For MEG Salzburg: config.yaml has been filled in by the researcher
For EEG using the ANC pipeline: .eeg_conversion_config.yml has been filled in
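
Part of this check can be scripted against a local checkout of the project. The following is a minimal sketch, not the ANC's own tooling: the function name `setup_problems`, the template keys in `REQUIRED_CONFIG`, and the assumption that the BIDS validator component can be recognized by the word "validator" in .gitlab-ci.yml are all hypothetical and should be adapted to the actual templates.

```python
from pathlib import Path

# Hypothetical mapping: which config file the researcher must fill in per template.
REQUIRED_CONFIG = {
    "MEG Salzburg": "config.yaml",
    "EEG (ANC pipeline)": ".eeg_conversion_config.yml",
}


def setup_problems(repo: Path, template: str) -> list[str]:
    """Return a list of human-readable problems found in a local checkout."""
    problems = []

    ci_file = repo / ".gitlab-ci.yml"
    if not ci_file.is_file():
        problems.append(".gitlab-ci.yml is missing")
    elif "validator" not in ci_file.read_text(encoding="utf-8").lower():
        # Assumption: the validator component has "validator" in its name.
        problems.append(".gitlab-ci.yml does not mention a validator component")

    config_name = REQUIRED_CONFIG.get(template)
    if config_name is not None:
        config = repo / config_name
        # An empty file counts as "not filled in".
        if not config.is_file() or not config.read_text(encoding="utf-8").strip():
            problems.append(f"{config_name} is missing or empty")

    return problems
```

An empty list means nothing was flagged. It does not replace checking in the GitLab UI that the pipeline actually ran, or that the project sits in the correct sub-group.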

4. Pilot subject

Guide the researcher through depositing a first (pilot) subject:

Researcher follows the self-deposit guide and marks their first merge request (MR) as ready
Data steward reviews the pilot MR using the relevant injection checklist (MRI, MEG, EEG, phenotype)
Any structural issues (wrong session layout, missing sidecar fields, incorrect event labels) are resolved before further data is added
Once the first MR is merged, update the Dataset Status label: set First sample arrived to True

5. Ongoing data collection

Subsequent MRs follow the same review process
Monitor for consistency across subjects (same sequences, same event structure, same participant columns)
Keep the Dataset Status labels up to date as milestones are reached
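
The consistency check across subjects can be partly automated by comparing which files each subject folder contains. The sketch below assumes a local BIDS-style checkout with top-level sub-* directories. The helper names are hypothetical, and it compares file names only, so differences in sequence parameters, event labels, or participant columns still need a manual look.

```python
from collections import Counter
from pathlib import Path


def subject_signature(subject_dir: Path) -> frozenset[str]:
    """Relative file paths of one subject, with its own label masked as sub-X."""
    label = subject_dir.name  # e.g. "sub-01"
    return frozenset(
        path.relative_to(subject_dir).as_posix().replace(label, "sub-X")
        for path in subject_dir.rglob("*")
        if path.is_file()
    )


def inconsistent_subjects(dataset_root: Path) -> dict[str, dict[str, list[str]]]:
    """Report subjects whose file set differs from the most common file set."""
    signatures = {
        d.name: subject_signature(d)
        for d in sorted(dataset_root.glob("sub-*"))
        if d.is_dir()
    }
    if not signatures:
        return {}

    # Use the most frequent layout as the reference.
    reference, _ = Counter(signatures.values()).most_common(1)[0]

    report = {}
    for name, signature in signatures.items():
        if signature != reference:
            report[name] = {
                "missing": sorted(reference - signature),
                "extra": sorted(signature - reference),
            }
    return report
```

A subject listed in the report is not necessarily wrong (a session may legitimately be absent), but each entry is worth raising with the researcher before the MR is merged.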

6. Metadata completion

Once all data is collected:

Researcher fills in CITATION.cff, dataset_description.json, and README.md
Data steward reviews metadata completeness against the general metadata guide
Confirm that participants.tsv and participants.json are complete and Neurobagel-compatible
Confirm that HED annotations are present where applicable
Update Dataset Status: set Author metadata complete and Metadata complete to True
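
A first pass over metadata completeness can be scripted before the manual review. The sketch below is an illustration under stated assumptions: `metadata_gaps` is a hypothetical helper, it checks only the two fields that the BIDS specification requires in dataset_description.json (Name and BIDSVersion), and it treats every participants.tsv column without an entry in participants.json as a gap. It does not test Neurobagel compatibility or HED annotations.

```python
import csv
import json
from pathlib import Path

REQUIRED_FILES = (
    "CITATION.cff",
    "dataset_description.json",
    "README.md",
    "participants.tsv",
    "participants.json",
)
# Fields the BIDS specification marks as required in dataset_description.json.
REQUIRED_DESCRIPTION_FIELDS = ("Name", "BIDSVersion")


def metadata_gaps(root: Path) -> list[str]:
    """Return a list of gaps found in the top-level metadata files."""
    gaps = [f"{name} is missing" for name in REQUIRED_FILES if not (root / name).is_file()]
    if gaps:
        return gaps  # content checks below need all files to exist

    description = json.loads((root / "dataset_description.json").read_text(encoding="utf-8"))
    gaps += [
        f"dataset_description.json lacks {field}"
        for field in REQUIRED_DESCRIPTION_FIELDS
        if not description.get(field)
    ]

    # The first row of participants.tsv holds the column names.
    with open(root / "participants.tsv", newline="", encoding="utf-8") as handle:
        columns = next(csv.reader(handle, delimiter="\t"), [])

    sidecar = json.loads((root / "participants.json").read_text(encoding="utf-8"))
    # Assumption: every column should be described in the sidecar.
    gaps += [
        f"participants.json does not describe column {column}"
        for column in columns
        if column not in sidecar
    ]
    return gaps
```

An empty result only means the files exist and the basic fields are present. The review against the general metadata guide is still needed.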

7. DOI (if requested)

Follow the DOI assignment workflow. Update Dataset Status: set DOI assigned to True and Dataset status to Maintenance.
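
The label updates mentioned across the steps can be summarized as a small state model. This is only a sketch of one reading of the workflow: the class and attribute names are hypothetical, and the real labels live on the Dataset Status issue rather than in code.

```python
from dataclasses import dataclass


@dataclass
class DatasetStatus:
    """Mirror of the Dataset Status labels named in this workflow."""
    status: str = "Initialized"  # later "Formation", then "Maintenance"
    first_sample_arrived: bool = False
    author_metadata_complete: bool = False
    metadata_complete: bool = False
    doi_assigned: bool = False


def next_step(s: DatasetStatus) -> str:
    """Map the current labels to the workflow step that is still open."""
    if s.status == "Initialized":
        return "1. Initial assignment"
    if not s.first_sample_arrived:
        return "2-4. First contact, project setup check, pilot subject"
    if not (s.author_metadata_complete and s.metadata_complete):
        return "5-6. Ongoing data collection, metadata completion"
    if not s.doi_assigned:
        return "7. DOI (if requested)"
    return "No open onboarding step"
```

For example, a dataset in Formation whose first MR has been merged maps to steps 5 and 6 until both metadata labels are set to True.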