Description
For more complex settings, dealing with multi-modal inputs is an important requirement. This requires specifying multiple inputs, which are usually passed either in a named (dictionary) or an unnamed (list) form. We have already included a few changes to allow passing dictionaries to summary networks (`Adapter.group`, `Adapter.ungroup`, handling of dictionary inputs in the new `Standardization` layer), but @LarsKue voiced concerns that the current approach leads to a cluttered/inconsistent user interface. Therefore, we want to open the discussion again to gather the requirements and needs, as well as ideas and directions for the implementation.
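To make the dictionary-based option concrete, here is a minimal, hypothetical sketch in plain Keras (not the current BayesFlow API) of a summary network that accepts named modalities and concatenates per-modality embeddings. The class `MultiModalSummary`, the modality names, and the embedding sizes are illustrative assumptions only.

```python
import keras


class MultiModalSummary(keras.layers.Layer):
    """Hypothetical sketch: embeds each named modality separately, then fuses them."""

    def __init__(self, modalities, summary_dim=16, **kwargs):
        super().__init__(**kwargs)
        # One small embedding network per modality; real modalities would likely
        # use dedicated encoders (e.g., set- or sequence-based summary networks).
        self.embedders = {
            name: keras.layers.Dense(32, activation="relu") for name in modalities
        }
        self.head = keras.layers.Dense(summary_dim)

    def call(self, inputs):
        # inputs: {"modality_name": tensor, ...}
        embeddings = [self.embedders[name](inputs[name]) for name in self.embedders]
        return self.head(keras.ops.concatenate(embeddings, axis=-1))


# Usage with two named modalities of different dimensionality (dummy data):
summary_net = MultiModalSummary(modalities=["reaction_times", "eeg_features"])
batch = {
    "reaction_times": keras.ops.ones((8, 10)),
    "eeg_features": keras.ops.ones((8, 64)),
}
print(summary_net(batch).shape)  # (8, 16)
```

The open question is less about whether such a network can be written and more about how the surrounding pipeline (adapter, standardization, approximator) should route and validate the named inputs.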
I see mainly the following discussion points, feel free to add more if you see other aspects:
- Importance: Is this an edge case that only experienced users need, or do we want it to be a first-class feature, on par with single-input networks?
- Which parts of the pipeline should handle the increased complexity?
Application areas for models with multi-modal data include psychology (see e.g. [1]) and physical models, where the same simulator can produce data for different experimental settings that can then be combined for more precise inference.
Some related discussion has already taken place in #503, but I think it would be good to restart it here.
I will provide my current thoughts below, also tagging @LarsKue and @stefanradev93 for input.
[1] https://link.springer.com/article/10.1007/s42113-023-00167-4