MedRAX: Medical Reasoning Agent for Chest X-ray
1Department of Computer Science, University of Toronto, Toronto, Canada
2Vector Institute, Toronto, Canada
3University Health Network, Toronto, Canada
4Cohere For AI, Toronto, Canada
5Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Canada
Chest X-rays (CXRs) play an integral role in driving critical decisions in disease management and patient care. While recent innovations have led to specialized models for various CXR interpretation tasks, these solutions often operate in isolation, limiting their practical utility in clinical practice.
We present MedRAX, the first versatile AI agent that seamlessly integrates state-of-the-art CXR analysis tools and multimodal large language models into a unified framework. MedRAX dynamically leverages these models to address complex medical queries without requiring additional training.
To rigorously evaluate its capabilities, we introduce ChestAgentBench, a comprehensive benchmark containing 2,500 complex medical queries across 7 diverse categories. Our experiments demonstrate that MedRAX achieves state-of-the-art performance compared to both open-source and proprietary models, representing a significant step toward the practical deployment of automated CXR interpretation systems.
We present MedRAX, an open-source agent-based framework that can dynamically reason, plan, and execute multi-step CXR workflows. MedRAX combines multimodal reasoning with structured tool-based decision-making, enabling real-time CXR interpretation without unnecessary computational overhead. The framework integrates heterogeneous machine learning models, from lightweight classifiers to large multimodal models, each specialized for a different downstream task, allowing it to decompose complex medical queries and solve them by reasoning across multiple analytical skills.
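To make this tool-based orchestration concrete, below is a minimal Python sketch of a ReAct-style loop in which a planning LLM repeatedly picks a tool, observes its output, and decides when to answer. This is an illustrative assumption only, not MedRAX's actual implementation: the Tool interface, the planner contract, and the CALL/FINISH protocol are all hypothetical.

```python
# Illustrative sketch of tool-based orchestration (NOT MedRAX's actual code).
# The Tool/CXRAgent names and the CALL/FINISH protocol are hypothetical.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Tool:
    name: str
    description: str           # shown to the planning LLM so it can pick tools
    run: Callable[[str], str]  # takes an image path, returns textual findings

class CXRAgent:
    def __init__(self, tools: List[Tool], planner: Callable[[str], str]):
        self.tools: Dict[str, Tool] = {t.name: t for t in tools}
        self.planner = planner  # an LLM call mapping context -> next action

    def answer(self, query: str, image: str, max_steps: int = 5) -> str:
        context = f"Question: {query}\nImage: {image}"
        for _ in range(max_steps):
            # Planner emits either "CALL <tool_name>" or "FINISH <answer>"
            action = self.planner(context)
            if action.startswith("FINISH"):
                return action.removeprefix("FINISH").strip()
            tool_name = action.removeprefix("CALL").strip()
            if tool_name in self.tools:
                observation = self.tools[tool_name].run(image)
                context += f"\n{tool_name}: {observation}"  # accumulate evidence
        return "No answer within step budget."
```

Because the planner only sees tool descriptions and accumulated observations, new specialist models can be added without retraining, which is the property the framework emphasizes.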
ChestAgentBench is a medical VQA benchmark that offers several distinctive advantages:
We established seven core competencies alongside reasoning that are essential for CXR interpretation:
We evaluate MedRAX against four models: LLaVA-Med, a LLaVA-13B model fine-tuned for biomedical visual question answering (Li et al. 2024); CheXagent, a Vicuna-13B vision-language model trained for CXR interpretation (Chen et al. 2024); and GPT-4o and Llama-3.2-90B Vision as popular closed-source and open-source multimodal LLMs, respectively.
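For context on how such comparisons are scored, here is a hedged sketch of an accuracy loop over multiple-choice queries like those in ChestAgentBench. The JSON field names ("question", "options", "answer", "image") and the model.answer interface are assumptions for illustration, not the released benchmark schema.

```python
# Hypothetical evaluation loop for a multiple-choice medical VQA benchmark.
# Field names below are assumed for illustration, not the actual schema.
import json

def evaluate(model, benchmark_path: str) -> float:
    with open(benchmark_path) as f:
        questions = json.load(f)

    correct = 0
    for q in questions:
        # Render the question with lettered answer options.
        prompt = q["question"] + "\n" + "\n".join(
            f"{letter}. {opt}" for letter, opt in zip("ABCDEF", q["options"])
        )
        prediction = model.answer(prompt, q["image"])  # expected: a letter choice
        correct += prediction.strip().upper().startswith(q["answer"])
    return correct / len(questions)
```

Any model exposing an answer(prompt, image) interface, such as the agent sketched earlier, could be plugged into this loop.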
We evaluate models on two complementary benchmarks:
We present two representative cases that compare MedRAX to GPT-4o.
This question asks the model to determine the type of tube present in the CXR. GPT-4o incorrectly suggests an endotracheal tube based solely on the tube's central position. MedRAX integrates findings from multiple tools, such as report generation and visual QA, and correctly identifies a chest tube despite one tool (LLaVA-Med) suggesting otherwise. This demonstrates MedRAX's ability to resolve conflicting tool outputs through systematic reasoning.
This question asks about diagnosing the predominant disease and comparing its severity across the lungs. GPT-4o misinterprets the CXR as showing pneumonia with right-lung predominance. MedRAX, by sequentially applying report generation for disease identification and segmentation for lung opacity analysis, correctly determines that a left pneumothorax is the main finding. This demonstrates MedRAX's ability to break down complex queries into targeted analytical steps.
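The two-step workflow in this case can be mimicked with the hypothetical agent sketched earlier. The tool outputs below are mocked stand-ins for the trained models MedRAX actually wraps, and the stub planner simply replays the report-then-segmentation sequence that a real LLM planner would decide on.

```python
# Mock reconstruction of the pneumothorax case using the earlier sketch.
# Tool outputs are hard-coded stand-ins; a real deployment wraps trained models.
report_tool = Tool(
    name="report_generator",
    description="Generates a free-text radiology report for a CXR.",
    run=lambda img: "Findings: large left-sided pneumothorax.",
)
seg_tool = Tool(
    name="segmenter",
    description="Segments lung fields and scores opacity per lung.",
    run=lambda img: "Left lung opacity 0.12, right lung opacity 0.55.",
)

def toy_planner(context: str) -> str:
    # A real planner is an LLM; this stub replays the two-step workflow.
    if "report_generator" not in context:
        return "CALL report_generator"
    if "segmenter" not in context:
        return "CALL segmenter"
    return "FINISH Left pneumothorax is the predominant finding."

agent = CXRAgent([report_tool, seg_tool], toy_planner)
print(agent.answer("What is the predominant disease?", "cxr_001.png"))
```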
MedRAX sets a new standard in AI-driven CXR interpretation by integrating structured tool orchestration with large-scale reasoning. Our evaluation on ChestAgentBench demonstrates its superiority over both general-purpose and domain-specific models, reinforcing the advantages of explicit stepwise reasoning in medical AI. These findings highlight the potential of combining foundation models with specialized tools, a principle that could extend to broader domains in healthcare and beyond. Future work should focus on optimizing tool selection, uncertainty-aware reasoning, and expanding MedRAX's capabilities to multimodal medical imaging for greater clinical impact.
@misc{fallahpour2025medraxmedicalreasoningagent,
title={MedRAX: Medical Reasoning Agent for Chest X-ray},
author={Adibvafa Fallahpour and Jun Ma and Alif Munim and Hongwei Lyu and Bo Wang},
year={2025},
eprint={2502.02673},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2502.02673},
}