Xinyi Wang
A survey of deep-learning-based radiology report generation using multimodal inputs
Wang, Xinyi; Figueredo, Grazziela; Li, Ruizhe; Zhang, Wei Emma; Chen, Weitong; Chen, Xin
Authors
Dr Grazziela Figueredo (G.Figueredo@nottingham.ac.uk), Associate Professor
Dr Ruizhe Li (Ruizhe.Li@nottingham.ac.uk), Research Fellow
Wei Emma Zhang
Weitong Chen
Dr Xin Chen (Xin.Chen@nottingham.ac.uk), Associate Professor
Abstract
Automatic radiology report generation can alleviate the workload of physicians and minimize regional disparities in medical resources, making it an important topic in the medical image analysis field. It is a challenging task, as the computational model needs to mimic physicians in extracting information from multi-modal input data (i.e., medical images, clinical information, medical knowledge, etc.) and producing comprehensive and accurate reports. Recently, numerous works have emerged to address this problem using deep-learning-based methods, such as transformers, contrastive learning, and knowledge-base construction. This survey summarizes the key techniques developed in the most recent works and proposes a general workflow for deep-learning-based report generation with five main components: multi-modality data acquisition, data preparation, feature learning, feature fusion and interaction, and report generation. The state-of-the-art methods for each of these components are highlighted. Additionally, we summarize the latest developments in large-model-based methods and model explainability, along with public datasets, evaluation methods, current challenges, and future directions in this field. We have also conducted a quantitative comparison between different methods under the same experimental setting. This is the most up-to-date survey that focuses on multi-modality inputs and data fusion for radiology report generation. The aim is to provide comprehensive and rich information for researchers interested in automatic clinical report generation and medical image analysis, especially when using multimodal inputs, and to assist them in developing new algorithms to advance the field.
Citation
Wang, X., Figueredo, G., Li, R., Zhang, W. E., Chen, W., & Chen, X. (2025). A survey of deep-learning-based radiology report generation using multimodal inputs. Medical Image Analysis, 103, Article 103627. https://doi.org/10.1016/j.media.2025.103627
| Journal Article Type | Article |
| --- | --- |
| Acceptance Date | Apr 24, 2025 |
| Online Publication Date | May 13, 2025 |
| Publication Date | 2025-07 |
| Deposit Date | May 20, 2025 |
| Publicly Available Date | May 20, 2025 |
| Journal | Medical Image Analysis |
| Print ISSN | 1361-8415 |
| Electronic ISSN | 1361-8423 |
| Publisher | Elsevier |
| Peer Reviewed | Peer Reviewed |
| Volume | 103 |
| Article Number | 103627 |
| DOI | https://doi.org/10.1016/j.media.2025.103627 |
| Public URL | https://nottingham-repository.worktribe.com/output/49266693 |
| Publisher URL | https://www.sciencedirect.com/science/article/pii/S1361841525001744?via%3Dihub |
Files
ReportGenerationSurveyMedicalImageAnalysis (4.6 MB, PDF)
Publisher Licence URL: https://creativecommons.org/licenses/by/4.0/