Commit 9af55c7

enh: add datalad & containers section
1 parent 5fb9d15 commit 9af55c7

2 files changed: +176 -0 lines changed

docs/apps/datalad.md

Lines changed: 175 additions & 0 deletions
@@ -0,0 +1,175 @@
Some apps can detect whether the input dataset is managed with
*DataLad* or *Git-Annex*, and pull down linked data that has not
been fetched yet.
One such application is *MRIQC*, and all the examples
on this documentation page will refer to it.
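
As a rough illustration of what "not fetched yet" means on disk: assuming the dataset is in *Git-Annex*'s default (locked) mode, files whose content is missing appear as dangling symbolic links. The helper below (`list_unfetched` is a hypothetical name, not part of *DataLad* or *MRIQC*) lists them:

```shell
# List annexed files whose content has not been fetched yet.
# Assumption: default (locked) git-annex mode, where such files are
# symbolic links pointing to a missing object under .git/annex/objects.
list_unfetched() {
    # With -L, find follows symlinks, so only links that cannot be
    # resolved (dangling links) still report type "l".
    find -L "$1" -name .git -prune -o -type l -print
}
```

For example, `list_unfetched "$HOME/ds002785"` would print the paths that a later `datalad get` needs to retrieve.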

!!! important "Summary"

    Executing *BIDS-Apps* on *DataLad*-controlled datasets
    within containers can be tricky.
    In particular, one of our general recommendations involves mounting
    or binding folders into the container in **read-only mode**, which
    prevents *DataLad* from writing to the dataset tree.
    Similarly, and depending on the specific runtime settings of the
    container framework, *DataLad* may encounter issues with file ownership too.
    This section guides users through ensuring smooth execution of
    *BIDS-Apps* on *DataLad*/*Git-Annex*-managed datasets.

## *DataLad* and *Docker*

When executing *MRIQC* within *Docker* on a *DataLad* dataset
(for instance, installed from [*OpenNeuro*](https://openneuro.org)),
we need to ensure the following settings are observed:

* the user id (uid) of the user who *installed* the *DataLad* dataset
  must match the uid *executing MRIQC* within the container runtime
* the uid *executing MRIQC* within the container must
  have sufficient permissions to write in the tree.
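
Both requirements can be checked on the host before launching the container. The following is a minimal sketch, assuming GNU coreutils (`stat -c`) and that the container will run with the current user's uid; `preflight` is a hypothetical helper name:

```shell
# Pre-flight check for the two requirements above (GNU stat assumed).
# Usage: preflight <path-to-dataset>
preflight() {
    dataset=$1
    owner=$(stat -c '%u' "$dataset")   # numeric uid owning the dataset folder
    if [ "$owner" != "$(id -u)" ]; then
        echo "uid mismatch: $dataset is owned by uid $owner" >&2
        return 1
    fi
    if [ ! -w "$dataset" ]; then
        echo "no write permission on $dataset" >&2
        return 2
    fi
    echo "ok"
}
```

Running `preflight "$HOME/ds002785"` prints `ok` when the current user owns the dataset folder and can write to it.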

### Setting execution uid

If the uid is not correct, we will likely encounter the following error:

```
datalad.runner.exception.CommandError: CommandError: 'git -c diff.ignoreSubmodules=none -c core.quotepath=false -c annex.merge-annex-branches=false annex find --not --in . --json --json-error-messages -c annex.dotfiles=true -- sub-0001/func/sub-0001_task-restingstate_acq-mb3_bold.nii.gz sub-0002/func/sub-0002_task-emomatching_acq-seq_bold.nii.gz sub-0002/func/sub-0002_task-restingstate_acq-mb3_bold.nii.gz sub-0001/func/sub-0001_task-emomatching_acq-seq_bold.nii.gz sub-0001/func/sub-0001_task-faces_acq-mb3_bold.nii.gz sub-0001/dwi/sub-0001_dwi.nii.gz sub-0002/func/sub-0002_task-workingmemory_acq-seq_bold.nii.gz sub-0001/anat/sub-0001_T1w.nii.gz sub-0002/anat/sub-0002_T1w.nii.gz sub-0001/func/sub-0001_task-gstroop_acq-seq_bold.nii.gz sub-0002/func/sub-0002_task-faces_acq-mb3_bold.nii.gz sub-0002/func/sub-0002_task-anticipation_acq-seq_bold.nii.gz sub-0002/dwi/sub-0002_dwi.nii.gz sub-0001/func/sub-0001_task-anticipation_acq-seq_bold.nii.gz sub-0001/func/sub-0001_task-workingmemory_acq-seq_bold.nii.gz sub-0002/func/sub-0002_task-gstroop_acq-seq_bold.nii.gz' failed with exitcode 1 under /data [info keys: stdout_json] [err: 'git-annex: Git refuses to operate in this repository, probably because it is owned by someone else.

To add an exception for this directory, call:
git config --global --add safe.directory /data

git-annex: automatic initialization failed due to above problems']
```

Confusingly, running the command suggested in the error message
(`git config --global --add safe.directory /data`) directly on the host
will not work in this case, because it must be executed within the container.

Instead, we can override the default user executing within the container
(which is `root`, or uid = 0).
This can be achieved with
[*Docker*'s `-u`/`--user` option](https://docs.docker.com/engine/containers/run/#user):

```
--user=[ user | user:group | uid | uid:gid | user:gid | uid:group ]
```

We can combine this option with the `id` command to set the current user's uid and group id (gid).
Let's update the last example in the previous
[*Docker* execution section](docker.md#running-a-niprep-directly-interacting-with-the-docker-engine):

``` {.shell hl_lines="6"}
# The highlighted line sets the execution uid:gid
$ docker run -ti --rm \
    -v $HOME/ds002785:/data:ro \
    -v $HOME/ds002785/derivatives:/out \
    -v $HOME/tmp/ds002785-workdir:/work \
    -u $(id -u):$(id -g) \
    nipreps/mriqc:<latest-version> \
    \
    /data /out/mriqc-<latest-version> \
    participant \
    -w /work
```

The above command line ensures that *MRIQC* is executed with the current
uid and gid, which will match the filesystem's permissions if the dataset
was installed by the same user.

!!! danger "Match uid and gid with those corresponding to the user who installed the dataset"

    When the dataset is installed by one user and the application is
    executed by another, *Docker* must be executed with the
    uid and gid corresponding to the user who installed the dataset.
    The uid corresponding to a given username (for instance `janedoe`)
    can be obtained as follows:

    ```
    getent passwd "janedoe" | cut -f 3 -d ":"
    ```

    and her gid:

    ```
    getent passwd "janedoe" | cut -f 4 -d ":"
    ```
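
When the installing user is not known by name, the same pair can be read from the dataset folder itself. The sketch below assumes GNU `stat` (the `-c` option differs on macOS/BSD) and that the top-level folder's owner is the user who installed the dataset; `owner_ids` is a hypothetical helper name:

```shell
# Print the "uid:gid" pair owning a dataset folder (GNU stat syntax).
# Usage: owner_ids <path-to-dataset>
owner_ids() {
    stat -c '%u:%g' "$1"
}
```

The output has the `uid:gid` form accepted by `-u`, so it could be passed as `-u "$(owner_ids "$HOME/ds002785")"` in place of `-u $(id -u):$(id -g)` in the `docker run` call above.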

### Mounting the dataset folder without *read-only* permissions

If the dataset is mounted in *read-only* mode, then *MRIQC*
will hit the following error
([see `nipreps/mriqc#1363`](https://github.com/nipreps/mriqc/issues/1363)):

```
get(error): sub-0001/func/sub-0001_task-restingstate_acq-mb3_bold.nii.gz (file) [git-annex: .git/annex/tmp: createDirectory: permission denied (Read-only file system)]
action summary:
  get (error: 1)
Traceback (most recent call last):
  File "/opt/conda/bin/mriqc", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/mriqc/cli/run.py", line 43, in main
    parse_args(argv)
  File "/opt/conda/lib/python3.11/site-packages/mriqc/cli/parser.py", line 658, in parse_args
    initialize_meta_and_data()
  File "/opt/conda/lib/python3.11/site-packages/mriqc/utils/misc.py", line 447, in initialize_meta_and_data
    _datalad_get(dataset)
  File "/opt/conda/lib/python3.11/site-packages/mriqc/utils/misc.py", line 282, in _datalad_get
    return get(
           ^^^^
  File "/opt/conda/lib/python3.11/site-packages/datalad/interface/base.py", line 773, in eval_func
    return return_func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/datalad/interface/base.py", line 763, in return_func
    results = list(results)
              ^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/datalad_next/patches/interface_utils.py", line 287, in _execute_command_
    raise IncompleteResultsError(
datalad.support.exceptions.IncompleteResultsError: Command did not complete successfully. 1 failed:
[{'action': 'get',
  'annexkey': 'MD5E-s76037251--344f061a3165c71e36b98ad1649c3c8c.nii.gz',
  'error_message': 'git-annex: .git/annex/tmp: createDirectory: permission '
                   'denied (Read-only file system)',
  'path': '/data/sub-0001/func/sub-0001_task-restingstate_acq-mb3_bold.nii.gz',
  'refds': '/data',
  'status': 'error',
  'type': 'file'}]
```

Reaching this error indicates that the container is already executed with
the appropriate uid and gid pair.
In this case, we need to ensure *DataLad* can write
to the dataset installation when obtaining new data.
This is easily achieved by **removing the read-only flag** (`:ro`) from the
mount option:

``` {.shell hl_lines="3 6"}
# Highlighted lines: /data is mounted WITHOUT :ro, and -u sets the execution uid:gid
$ docker run -ti --rm \
    -v $HOME/ds002785:/data \
    -v $HOME/ds002785/derivatives:/out \
    -v $HOME/tmp/ds002785-workdir:/work \
    -u $(id -u):$(id -g) \
    nipreps/mriqc:<latest-version> \
    \
    /data /out/mriqc-<latest-version> \
    participant \
    -w /work
```

## *DataLad* and *Singularity*/*Apptainer*

In the case of *Singularity* and *Apptainer*, controlling the uid that
executes the container [involves using user namespace mappings](https://apptainer.org/docs/admin/1.0/user_namespace.html#user-namespace-requirements).
Therefore, you will need to contact your system administrator to figure
out a convenient solution to the problem.

Since most *Singularity*/*Apptainer* deployments automatically bind
the user's `$HOME` directory (and, with it, the user's global *Git*
configuration), the command suggested in the error message may
work:

```
git config --global --add safe.directory <path-to-dataset-in-host>
```

Allowing the container to write to the dataset's tree is straightforward
and analogous to *Docker*: remove the `:ro` setting from the binding
option (`-B`).
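
To make the difference concrete, the sketch below assembles the value passed to `-B`, whose general form is `src[:dest[:opts]]`. The helper name `bind_spec` is hypothetical, and the image file `mriqc.sif` mentioned afterwards is an assumption about how the container was built locally:

```shell
# Build a bind specification for Apptainer/Singularity's -B option.
# Usage: bind_spec <host-path> <container-path> [ro]
bind_spec() {
    if [ "${3:-}" = "ro" ]; then
        # read-only bind: DataLad cannot fetch new data into it
        printf '%s:%s:ro\n' "$1" "$2"
    else
        # default (writable) bind: what DataLad needs
        printf '%s:%s\n' "$1" "$2"
    fi
}
```

A writable dataset bind would then look like `apptainer run -B "$(bind_spec "$HOME/ds002785" /data)" mriqc.sif /data /out participant`, with the remaining binds and arguments as in the *Docker* examples.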

mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -29,6 +29,7 @@ nav:
 - Introduction: apps/framework.md
 - Executing with Docker: apps/docker.md
 - Executing with Singularity: apps/singularity.md
+- Git-Annex and DataLad within containers: apps/datalad.md
 - Presentations: users/talks.md
 - Educational: users/educational.md
 - Documentation (devs):
