Medical AI has reached an inflection point. While vision-language models (VLMs) have shown promise in medical imaging, they've lacked the systematic, transparent reasoning that clinicians need to trust AI-assisted diagnoses. Changing that is NVIDIA Clara, a family of models, tools, and recipes built for accelerating scientific discovery, analyzing medical images, and providing a foundational understanding of human health, biology, and chemistry.
Specifically, Clara Reason introduces multimodal chain-of-thought models that mirror radiologists' thinking, providing step-by-step diagnostic reasoning with explanations that clinicians can validate and trust.
NVIDIA is expanding beyond traditional image analysis to create a medical AI reasoning ecosystem that combines foundational datasets with multimodal models to deliver interpretable decision support.
This post details the technical implementation of Clara NV-Reason-CXR-3B, a 3-billion-parameter VLM specialized in chest x-ray analysis. We cover the dataset creation methodology that captures radiologist thought processes through voice annotations, the two-stage training pipeline combining supervised fine-tuning with Group Relative Policy Optimization, and validation results from clinical institutions.
Traditional medical AI approaches lack transparent reasoning
Today's medical AI models often operate as black boxes, providing diagnoses without explaining their reasoning. This creates a trust barrier for clinicians, who need to understand and validate AI recommendations before incorporating them into patient care decisions.
Traditional approaches to medical AI have focused on improving accuracy metrics without addressing the fundamental need for explainability. A radiologist doesn't simply spot an abnormality—they systematically review anatomical structures, consider differential diagnoses, and articulate their thought process. The final diagnosis is more than just a label; it's the product of the radiologist's internal thought process, grounded in years of experience.
Reasoning AI models have demonstrated significant improvements in solving math, programming, and logic problems. By thinking step-by-step before answering, they're able to break tasks into subgoals and solve complex multi-step problems. Similarly, in medical AI, following the radiologist's thought process allows the model to go deeper into each step and tackle complex medical problems.
How does Clara Reason provide transparent medical AI reasoning?
Clara Reason addresses the explainability challenge through an architecture that combines multimodal perception with structured reasoning capabilities.
NVIDIA researchers bring reasoning capabilities to Clara Reason through the Clara NV-Reason-CXR-3B model, a VLM specialized in chest x-ray analysis. It's designed to think like a radiologist when analyzing chest radiographs and to produce a full chain-of-thought that mimics the physician's internal thinking.
This enables the AI to explain its diagnostic reasoning and provide detailed expert thoughts. It's designed to respond in the style of a teacher, a senior radiologist explaining the problem and the answer, and offers:
- Chain-of-thought processing: The reasoning engine generates step-by-step diagnostic analysis
  - Systematic anatomical review
  - Identification of normal and abnormal findings
  - Differential diagnosis consideration
- Clinical output generation:
  - Most important findings
  - Step-by-step reasoning pathway
  - Differential diagnoses and their likelihood
  - Recommendations for follow-up or clinical correlation
- Clarification multi-step follow-up chat
- Structured report generation
According to Dr. Mariam Aboian, Assistant Professor at Children's Hospital of Philadelphia (CHOP), "For the first time, generative AI is describing what is happening in radiologists' heads and their chain-of-thought as they're thinking through the study, identifying the findings, and organizing them to determine the diagnosis. This provides innovation in explainability, which is critically needed for clinical implementation of AI and communication with physicians and medical providers across healthcare."
Creating a dataset that captures how radiologists think
Through collaboration with the National Institutes of Health (NIH), Children's Hospital of Philadelphia (CHOP), and VinBrain, NVIDIA researchers created the first dataset that captures radiologists' thought processes. Unlike traditional datasets that focus on labels or reports, this collected data includes 1-2 pages of detailed radiological thinking per image, dictated by radiologists to capture their thought processes.
Systematic examination protocol
Radiologists were asked to dictate all their thoughts, deliberations, and uncertainties while reading a chest x-ray, loosely following this order:
Quality Assessment → Medical Devices → Airways → Lungs (R/L) → Mediastinum → Heart → Abdomen → Bones → Summary
Each annotation takes 7-15 minutes and is broken down into 10-20 detailed, distinct observations and thoughts such as, "I see haziness in the right lower lobe, which makes me consider…"
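To make the structure concrete, here is a hypothetical record layout for one such annotation. The field names, section labels, and ROI format are illustrative assumptions, not the team's actual schema:

from dataclasses import dataclass
from typing import Optional

# Examination order from the protocol above (labels are illustrative)
EXAM_ORDER = [
    "quality_assessment", "medical_devices", "airways", "right_lung",
    "left_lung", "mediastinum", "heart", "abdomen", "bones", "summary",
]

@dataclass
class Observation:
    section: str                  # one of EXAM_ORDER
    text: str                     # e.g., "I see haziness in the right lower lobe..."
    roi: Optional[tuple] = None   # hypothetical (x, y, w, h) image region

@dataclass
class CaseAnnotation:
    image_id: str
    observations: list            # typically 10-20 Observation instances per study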
Modern data collection
The team developed an annotation tool that captures authentic radiologist thinking. The key insight is the simplicity of implementation, including:
- Voice recordings with speech-to-text capture natural clinical reasoning
- Basic ROI tools link observations to image regions
- Multi-language transcription enables global collaboration (transcribe and translate into English)
- Raw audio/text files can be formatted for training—no proprietary tools required
Teams can implement a similar approach using existing viewers with basic annotation capabilities, or simply collect voice recordings alongside image reviews. The main goal is to capture the radiologist's thought process, not the specific tooling.
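As a rough illustration, a minimal capture pipeline might pair an off-the-shelf speech-to-text model with the image being reviewed. The sketch below uses the open source Whisper library as a stand-in transcription engine; the post doesn't specify which tools the team used, and the file names and record fields are hypothetical:

import whisper  # pip install openai-whisper

# Load a small general-purpose speech-to-text model (stand-in choice)
stt = whisper.load_model("base")

# Transcribe the dictation; task="translate" also renders non-English
# speech into English, supporting multi-language collaboration
result = stt.transcribe("dictation_case_001.wav", task="translate")

# Pair the transcript with the reviewed image; the rois field is a
# hypothetical link between observations and image regions
annotation = {
    "image": "case_001.png",
    "transcript": result["text"],
    "rois": [],  # filled in by the viewer's basic ROI tools
}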
Annotation focus areas include:
- Differential diagnoses: Include uncertainties and clinical reasoning
- Negative findings: Explicitly state what's normal/absent to provide a complete clinical picture
In addition, the training dataset has been expanded with synthetic data distilled from GPT-OSS 120B based on chest x-ray reports (MIMIC-CXR, Open-I), with the radiologist reasoning data serving as examples. The synthetic dataset contains roughly 100K data points.
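The distillation prompt itself isn't published, but a plausible setup is few-shot prompting: each request to the teacher model pairs a handful of radiologist-dictated reasoning traces with a plain report for it to expand. A sketch under those assumptions:

def build_distillation_prompt(report: str, examples: list) -> list:
    """Build a chat prompt asking a teacher LLM (e.g., GPT-OSS 120B) to
    expand a plain chest x-ray report into step-by-step reasoning."""
    system = (
        "You are a senior radiologist. Given a chest x-ray report, write the "
        "full internal reasoning that would lead to it: review quality, "
        "devices, airways, lungs, mediastinum, heart, abdomen, and bones, "
        "stating both normal and abnormal findings."
    )
    messages = [{"role": "system", "content": system}]
    # Few-shot examples drawn from the radiologist-dictated dataset
    for ex in examples:
        messages.append({"role": "user", "content": ex["report"]})
        messages.append({"role": "assistant", "content": ex["reasoning"]})
    messages.append({"role": "user", "content": report})
    return messages

The returned message list can then be sent to any chat-style inference endpoint serving the teacher model.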
NV-Reason-CXR-3B training pipeline
The NV-Reason-CXR-3B model uses the Qwen2.5-VL-3B-Instruct VLM as a starting point and follows the approach popularized by DeepSeek-R1.
Stage 1: Supervised fine-tuning (SFT)
The initial stage trains the model on expert radiologist reasoning data, using roughly 100K reasoning examples that combine the original annotations with synthetic data. Training runs on four nodes with eight NVIDIA H100 GPUs each (32 GPUs total) for 4 hours. The objective is to teach the model to generate structured diagnostic reasoning that follows authentic radiologist thought patterns.
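The exact SFT data format isn't published, but given the THINK/ANSWER layout visible in the model's output (see the example later in this post), each training sample plausibly looks like a single chat exchange whose target response carries the reasoning followed by the final labels. A hypothetical formatter:

def to_sft_example(image_path: str, reasoning: str, labels: list) -> list:
    """Format one training sample as a chat exchange whose target response
    follows the THINK/ANSWER layout seen in the model's output."""
    target = f"THINK: {reasoning}\n\nANSWER: {', '.join(labels)}"
    return [
        {"role": "user", "content": [
            {"type": "image", "image": image_path},
            {"type": "text", "text": "Find abnormalities and support devices."},
        ]},
        {"role": "assistant", "content": [{"type": "text", "text": target}]},
    ]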
Stage 2: Group Relative Policy Optimization (GRPO)
The second stage uses reinforcement learning to refine reasoning quality on larger datasets without requiring explicit reasoning annotations. Training uses an expanded chest x-ray dataset with verified diagnostic labels and a reward function based on the percentage of correctly identified abnormalities and diagnoses. This differs from traditional GRPO applications in math and logic tasks, which typically use binary rewards.
Training uses the same infrastructure as Stage 1 and runs for 4 days. This approach allows the model to learn from a broader dataset while preserving the structured thinking patterns established in the supervised fine-tuning stage.
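A minimal sketch of such a fractional reward, assuming the model's final labels follow an ANSWER: marker and simple string matching (the post states only that the reward is the percentage of correctly identified abnormalities and diagnoses):

def abnormality_reward(generated: str, ground_truth: set) -> float:
    """Fractional GRPO reward: the share of verified labels recovered in
    the model's ANSWER line. The matching rules here are assumptions."""
    # Keep only the text after the final ANSWER: marker
    answer = generated.rsplit("ANSWER:", 1)[-1]
    predicted = {lbl.strip().lower() for lbl in answer.split(",") if lbl.strip()}
    if not ground_truth:
        # Nothing to find: reward an empty prediction
        return 1.0 if not predicted else 0.0
    hits = sum(1 for label in ground_truth if label.lower() in predicted)
    return hits / len(ground_truth)

Note that this scores recall over the verified labels; whether and how false positives were penalized isn't specified in the post.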
What’s the clinical validation and impact of Clara Reason?
Clara Reason acts as an AI co-pilot for radiologists, saving time while enhancing diagnostic confidence through transparent reasoning. The model demonstrates strong alignment with clinical thinking, validated by board-certified radiologists.
Key advantages include:
- Time savings: Acts as a co-pilot, explains decisions, and can write a structured report if needed
- Enhanced accuracy: Following the radiologist's internal thought process helps with complex medical decisions
- Built-in trust: Transparent explanation of reasoning pathways
- Teaching assistance: Explainability of decisions provides confidence and educational value
Core capabilities include:
- Radiologist-aligned chain-of-thought: Captures actual internal thinking processes, not generic AI reasoning
- Systematic examination patterns: Follows clinical protocols
- Transparent decision-making: Every diagnosis includes explainable reasoning pathways
- Confidence estimation: Calibrated uncertainty with clinical context
"The CXR reasoning model is a tremendous opportunity for assisting not only referring doctors but also patients who would like to learn more about the thought process of establishing differential diagnoses using imaging findings from all anatomic structures covered in the field of view, along with patients' clinical information and symptoms," said Ismail Baris Turkbey, M.D., F.S.A.R., Senior Clinician, NCI/CCR/MIB, National Institutes of Health. "Moreover, this novel tool has significant potential to serve as an educational assistant for trainees in radiology and medicine."
How does Clara Reason transform clinical workflows?
Clara Reason is designed for the following primary use cases:
- Clinical decision support: Radiologists use Clara Reason as a "second reader" that provides detailed reasoning they can quickly validate. The transparent thought process allows clinicians to identify where they agree or disagree with the AI's assessment, enhancing diagnostic confidence.
- Medical education: Medical schools and residency programs integrate Clara Reason to help trainees develop systematic diagnostic thinking. The model's detailed reasoning serves as an always-available teaching assistant that demonstrates expert-level analysis.
- Research applications: Researchers use Clara Reason to analyze large imaging datasets with consistent, documented reasoning—enabling new insights into disease patterns and diagnostic variations across populations.
How to integrate Clara Reason into a medical AI application
Use the following quick start example to integrate Clara Reason into your medical AI application:
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor
from PIL import Image
# Load the Clara Reason model
model_name = "nvidia/NV-Reason-CXR-3B"
model = AutoModelForImageTextToText.from_pretrained(
model_name,
torch_dtype=torch.float16,
).eval().to("cuda")
processor = AutoProcessor.from_pretrained(model_name)
# Load chest x-ray image
image = Image.open("chest_xray.png")
# Prepare input with clinical context
messages = [
{
"role": "user",
"content": [
{
"type": "image",
"image": image,
},
{
"type": "text",
"text": "Find abnormalities and support devices."
}
]
}
]
# Create prompt using chat template
text = processor.apply_chat_template(
messages,
add_generation_prompt=True
)
# Process inputs
inputs = processor(
text=text,
images=[image],
return_tensors="pt"
).to(model.device)
# Generate reasoning with chain-of-thought
generated_ids = model.generate(
**inputs,
max_new_tokens=2048
)
# Trim the generated ids to remove the input ids
trimmed_generated_ids = [
out_ids[len(in_ids):]
for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
# Decode the response
generated_text = processor.batch_decode(
trimmed_generated_ids,
skip_special_tokens=True,
clean_up_tokenization_spaces=False
)[0]
print("Output:")
print(generated_text)
Example output:
THINK: We'll begin with the quality assessment of this AP chest x-ray. As you can see, the lung fields are relatively well visualized, although the patient's position and the fact that this is a portable AP view may limit the evaluation to some extent. I think the quality is acceptable for this type of examination.
Now, we're looking at the medical devices and instrumentation. I don't see any medical devices such as central venous catheters, endotracheal tubes, or chest tubes. However, I do notice that the patient is likely in a clinical setting, given the portable nature of the examination.
The central airways appear patent. The trachea is visible and not deviated, and the right and left main bronchi appear unremarkable. I don't see any foreign objects or abnormalities within the central airways.
Moving on to the right lung, we're evaluating the upper lobe first. The aeration of the right upper lobe appears reduced, which could be suggestive of some volume loss or atelectasis. I don't see any distinct masses or nodules, but there are some subtle opacities that could represent infiltrates or edema.
Next, we will evaluate the right middle and lower lobes. The lung volumes appear low, which could be indicative of some atelectasis or volume loss. I see some patchy opacities in the right lower lobe, which may represent pulmonary edema or infiltrates. The costophrenic angle is slightly blunted, which could be significant for a small pleural effusion.
Now, let's examine the left lung. The left upper lobe appears similar to the right, with reduced aeration and some subtle opacities that could represent infiltrates or edema. The left lower lobe also shows some patchy opacities and a slightly blunted costophrenic angle, suggestive of a small pleural effusion.
We're looking at the pleura now, and as I mentioned before, the costophrenic angles are slightly blunted bilaterally, which is concerning for small bilateral pleural effusions. I don't see any evidence of pneumothorax.
The mediastinum appears enlarged, which could be indicative of cardiomegaly. The cardiac silhouette is prominent, and I think this finding may be significant for moderate cardiomegaly.
The hila appear unremarkable, without any significant adenopathy or masses. The lung vasculature appears prominent, which could be suggestive of pulmonary edema.
The heart, as I mentioned before, appears enlarged, with a cardiothoracic ratio that's elevated. This finding is concerning for cardiomegaly.
Now, we're looking at the upper abdomen, and I see some normal-appearing bowel gas patterns. There is no free air under the diaphragm, which is important to note.
Finally, we're looking at the bony structures. The clavicles, scapulae, and ribs appear unremarkable, without any focal lesions or fractures. The vertebral bodies are well visualized and appear normal.
In summary, the most significant findings in this patient are the presence of mild pulmonary edema, small bilateral pleural effusions, and moderate cardiomegaly. These findings could represent a range of clinical conditions, including heart failure or other cardiopulmonary diseases.
ANSWER: Cardiomegaly, Edema, Enlarged Cardiomediastinum, Lung Opacity, Pleural Effusion
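Because the response keeps the chain-of-thought under a THINK: marker and the final labels under ANSWER:, downstream code can separate the two. A small helper, continuing from the generated_text variable in the quick start above:

def split_reasoning(output: str):
    """Separate the chain-of-thought from the final label list."""
    think, _, answer = output.partition("ANSWER:")
    reasoning = think.replace("THINK:", "", 1).strip()
    labels = [label.strip() for label in answer.split(",") if label.strip()]
    return reasoning, labels

reasoning, labels = split_reasoning(generated_text)
print(labels)  # e.g., ['Cardiomegaly', 'Edema', 'Enlarged Cardiomediastinum', ...]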
Get started with Clara Reason
Clara Reason introduces chain-of-thought models that mirror radiologists' thinking—providing step-by-step diagnostic reasoning with explanations that clinicians can validate and trust. More specifically:
- NV-Reason-CXR-3B generates step-by-step diagnostic reasoning for chest x-ray analysis, producing detailed thought processes rather than diagnostic labels alone.
- Dataset methodology captures radiologist thought processes through voice recordings during image analysis, creating 1-2 pages of detailed reasoning per chest x-ray.
- Two-stage training with GRPO enables reasoning with minimal annotated data by first learning from expert reasoning examples, then using reinforcement learning to refine reasoning quality on larger datasets without requiring reasoning annotations.
This breakthrough in medical AI is powered by collaboration.
Ready to get started?
Stay up to date by subscribing to NVIDIA news, and follow NVIDIA Healthcare on LinkedIn, X, and YouTube.
