New Dataset Onboarding
This workflow describes what the data steward does when a new dataset project is initialized in the ANC.
1. Initial assignment
When a new project is created, a Dataset Status issue is opened automatically and assigned to Monique. Monique reassigns the dataset to a data steward based on:
- Incoming data type (MRI, MEG, EEG, physiological, other)
- Data steward availability and current workload
The assigned data steward updates the Dataset Status label from Initialized to Formation.
2. First contact with the data owner
Reach out to the data owner to:
- Confirm the data type and expected scope (number of subjects, sessions, tasks)
- Confirm whether data collection is ongoing or complete
- Point them to the relevant onboarding route in the handbook
- Clarify who is responsible for what (researcher fills in metadata, data steward reviews and merges)
- Ask whether they need a DOI at the end of collection
3. Project setup check
Verify that:
- The correct project template was used (MRI Salzburg, MEG Salzburg, BIDS-basic, etc.)
- The project is in the correct research unit sub-group under bids-datasets; if no suitable group exists, create one following the group-level settings guide
- CI/CD pipelines are running (BIDS validator component is included in .gitlab-ci.yml)
- For MEG Salzburg: config.yaml has been filled in by the researcher
- For EEG using the ANC pipeline: .eeg_conversion_config.yml has been filled in
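Parts of this check can be scripted against a local clone of the project. The following is a minimal sketch, not an ANC tool: the function name, the template keys in REQUIRED_CONFIG, and the idea of spotting the BIDS validator component by searching for the word "validator" are all assumptions made for illustration. A non-empty file is only a rough proxy for "filled in", so a manual look at the config is still needed.

```python
from pathlib import Path

# Hypothetical mapping from project template to the config file the
# researcher is expected to fill in; only the two cases named above.
REQUIRED_CONFIG = {
    "MEG Salzburg": "config.yaml",
    "EEG (ANC pipeline)": ".eeg_conversion_config.yml",
}


def check_project_setup(repo: Path, template: str) -> list[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems = []

    ci_file = repo / ".gitlab-ci.yml"
    if not ci_file.is_file():
        problems.append(".gitlab-ci.yml is missing")
    elif "validator" not in ci_file.read_text(encoding="utf-8").lower():
        # Assumption: the validator component can be recognised by name.
        problems.append(".gitlab-ci.yml does not mention a validator component")

    config_name = REQUIRED_CONFIG.get(template)
    if config_name is not None:
        config = repo / config_name
        # An empty file counts as "not filled in"; content is not validated.
        if not config.is_file() or not config.read_text(encoding="utf-8").strip():
            problems.append(f"{config_name} is missing or empty")

    return problems
```

Calling check_project_setup(Path("path/to/clone"), "MEG Salzburg") returns a list that can be pasted into the Dataset Status issue as open items. Templates without an entry in REQUIRED_CONFIG only get the CI check.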
4. Pilot subject
Guide the researcher through depositing a first (pilot) subject:
- Researcher follows the self-deposit guide and marks their first merge request (MR) as ready
- Data steward reviews the pilot MR using the relevant injection checklist (MRI, MEG, EEG, phenotype)
- Any structural issues (wrong session layout, missing sidecar fields, incorrect event labels) are resolved before further data is added
- First MR is merged; update the Dataset Status label: set First sample arrived to True
5. Ongoing data collection
- Subsequent MRs follow the same review process
- Monitor for consistency across subjects (same sequences, same event structure, same participant columns)
- Keep the Dataset Status labels up to date as milestones are reached
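One part of the consistency check, whether every subject has the same set of tasks, can be approximated from file names alone. The sketch below assumes a BIDS layout with sub-* directories and task-<label> entities in file names; the function names are hypothetical, and the "most common task set is the reference" rule is a choice of this sketch, not an ANC policy. It does not compare sequences, event structure, or participant columns.

```python
import re
from collections import Counter
from pathlib import Path

# BIDS file names carry the task as a "task-<label>" entity.
TASK_PATTERN = re.compile(r"_task-([A-Za-z0-9]+)")


def tasks_per_subject(dataset: Path) -> dict[str, frozenset[str]]:
    """Map each sub-* directory name to the set of task labels found in it."""
    result = {}
    for subject_dir in sorted(dataset.glob("sub-*")):
        if not subject_dir.is_dir():
            continue
        tasks = set()
        for path in subject_dir.rglob("*"):
            match = TASK_PATTERN.search(path.name)
            if match:
                tasks.add(match.group(1))
        result[subject_dir.name] = frozenset(tasks)
    return result


def inconsistent_subjects(dataset: Path) -> dict[str, frozenset[str]]:
    """Return subjects whose task set differs from the most common one."""
    per_subject = tasks_per_subject(dataset)
    if not per_subject:
        return {}
    reference, _ = Counter(per_subject.values()).most_common(1)[0]
    return {sub: tasks for sub, tasks in per_subject.items() if tasks != reference}
```

A non-empty result is a prompt to ask the researcher, not proof of an error: a subject may legitimately lack a task, for example after an aborted session.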
6. Metadata completion
Once all data is collected:
- Researcher fills in CITATION.cff, dataset_description.json, and README.md
- Data steward reviews metadata completeness against the general metadata guide
- participants.tsv and participants.json are complete and Neurobagel-compatible
- HED annotations are present where applicable
- Update Dataset Status: set Author metadata complete and Metadata complete to True
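A quick pre-review pass over the files listed above might look like the following sketch. It only checks that the three metadata files exist and that every column in participants.tsv has an entry in participants.json; it does not validate file contents, Neurobagel compatibility, or HED annotations. The function names are hypothetical, and requiring an entry for every column (including participant_id) is a choice of this sketch.

```python
import csv
import json
from pathlib import Path

REQUIRED_FILES = ("CITATION.cff", "dataset_description.json", "README.md")


def missing_metadata_files(dataset: Path) -> list[str]:
    """List required top-level metadata files that are not present."""
    return [name for name in REQUIRED_FILES if not (dataset / name).is_file()]


def undescribed_participant_columns(dataset: Path) -> list[str]:
    """List participants.tsv columns that have no entry in participants.json."""
    tsv_path = dataset / "participants.tsv"
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        # Only the header row is needed; an empty file yields no columns.
        header = next(csv.reader(handle, delimiter="\t"), [])
    json_path = dataset / "participants.json"
    described = json.loads(json_path.read_text(encoding="utf-8"))
    return [column for column in header if column not in described]
```

Both functions return empty lists when nothing is flagged, which makes them easy to combine into a single checklist comment on the merge request.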
7. DOI (if requested)
Follow the DOI assignment workflow.
Update Dataset Status: set DOI assigned to True and Dataset status to Maintenance.
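The Dataset Status updates across this workflow can be summarised as a small state model. This is an illustrative sketch only: the class and method names are hypothetical, the fields mirror the labels named in this workflow, and only the three stages mentioned here are listed, so the real label set may contain others.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    # Only the stages named in this workflow; others may exist.
    INITIALIZED = "Initialized"
    FORMATION = "Formation"
    MAINTENANCE = "Maintenance"


@dataclass
class DatasetStatus:
    stage: Stage = Stage.INITIALIZED
    first_sample_arrived: bool = False
    author_metadata_complete: bool = False
    metadata_complete: bool = False
    doi_assigned: bool = False

    def next_action(self) -> str:
        """Return the next onboarding step implied by the current labels."""
        if self.stage is Stage.INITIALIZED:
            return "assign a data steward and set the label to Formation"
        if not self.first_sample_arrived:
            return "review and merge the pilot subject"
        if not (self.author_metadata_complete and self.metadata_complete):
            return "complete and review metadata"
        if not self.doi_assigned:
            return "follow the DOI assignment workflow if a DOI was requested"
        return "no open onboarding step"
```

For a freshly created project, DatasetStatus().next_action() points at the initial assignment step; flipping the flags in order walks through sections 4 to 7.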