Volume: 5
Multimodal learning with graphs
Article
Keywords: PREDICTION; NETWORK; MODEL; REPRESENTATION; PHYSICS
DOI: 10.1038/s42256-023-00624-6
Source: SCIE
【Abstract】
Artificial intelligence for graphs has achieved remarkable success in modelling complex systems, ranging from dynamic networks in biology to interacting particle systems in physics. However, increasingly heterogeneous graph datasets call for multimodal methods that can combine different inductive biases, the assumptions that algorithms use to make predictions for inputs they have not encountered during training. Learning on multimodal datasets is challenging because the inductive biases can vary by data modality and graphs might not be explicitly given in the input. To address these challenges, graph artificial intelligence methods combine different modalities while leveraging cross-modal dependencies through geometric relationships. Diverse datasets are combined using graphs and fed into sophisticated multimodal architectures, categorized as image-intensive, knowledge-grounded and language-intensive models. Using this categorization, we introduce a blueprint for multimodal graph learning, use it to study existing methods, and provide guidelines for designing new models.

One of the main advances in deep learning in the past five years has been graph representation learning, which has enabled applications to problems with underlying geometric relationships. Increasingly, such problems involve multiple data modalities; examining over 160 studies in this area, Ektefaie et al. propose a general framework for multimodal graph learning for image-intensive, knowledge-grounded and language-intensive problems.
【License】
Free
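To make the abstract's core idea concrete (per-modality features combined on a shared graph and refined through message passing), here is a minimal sketch. Everything in it, including the dimensions, the sum-fusion rule, and the mean-aggregation update, is an illustrative assumption and not the blueprint of Ektefaie et al.

```python
# Minimal, self-contained sketch (NumPy only) of one multimodal graph
# learning idea: project each modality into a shared space, fuse the
# projections at every node, then refine node states by message passing
# over the graph. All names, shapes and the fusion rule are illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Toy undirected graph on 4 nodes, given as an adjacency matrix.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# Two modalities per node, e.g. image-derived and text-derived embeddings.
x_img = rng.normal(size=(4, 8))   # image features, dimension 8
x_txt = rng.normal(size=(4, 16))  # text features, dimension 16

# Projections (random here; learned in a real model) that map each
# modality into a shared d-dimensional space so that heterogeneous
# inductive biases can be combined.
d = 6
W_img = rng.normal(size=(8, d)) / np.sqrt(8)
W_txt = rng.normal(size=(16, d)) / np.sqrt(16)

# Early fusion: sum the projected modalities at each node.
h = x_img @ W_img + x_txt @ W_txt             # node states, shape (4, d)

# One round of mean-aggregation message passing: each node averages its
# neighbours' states, then mixes the result with its own state.
deg = A.sum(axis=1, keepdims=True)
messages = (A @ h) / np.clip(deg, 1, None)    # neighbour mean
h = np.tanh(h + messages)                     # simple residual update

print(h.shape)  # (4, 6): fused, graph-refined node representations
```

A real system would learn the projection and update weights end to end and stack several message-passing layers; the point of the sketch is only how a graph lets cross-modal dependencies flow between nodes that carry features from different modalities.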