Theme Correction
Clinician-AI Performance Comparison on Contour Corrections
Evaluating how deep learning models compare to human experts in predicting clinical impact of segmentation variations
Abstract
In radiation therapy, segmentation accuracy directly impacts treatment outcomes. While automated segmentation methods promise efficiency and consistency, evaluating their quality remains challenging. Traditional geometric metrics (Dice, Hausdorff distance) fail to capture what matters clinically: how segmentation errors affect radiation dose delivery.
This research addresses this gap by developing and validating methods that assess segmentation quality through clinical dosimetry. We demonstrate that deep learning models can predict dosimetric impact more accurately and consistently than expert radiation oncologists. We also identify concerning patterns in human expert assessment, including weak-to-moderate inter-rater agreement (Cohen’s κ: 0.33–0.74) and a systematic conservative bias.
Our work establishes the foundation for AI-based quality assurance systems that provide objective, dose-informed segmentation evaluation for safer, more efficient radiotherapy planning.
Introduction
The Problem with Geometric Metrics
Medical image segmentation has traditionally relied on geometric metrics like Dice Similarity Coefficient and Hausdorff Distance. These metrics offer mathematical precision and computational efficiency, making them attractive for algorithm development. However, they suffer from an inherent limitation: they assume all spatial errors carry equal clinical significance.
In radiotherapy, this assumption is fundamentally flawed. A small contour variation in a high-dose region may have profound clinical consequences, while a larger error in a low-dose area might be insignificant. The relationship between geometric accuracy and clinical impact is complex, context-dependent, and influenced by treatment technique, anatomical structure, and dose distribution.
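To make the limitation concrete, here is a minimal sketch on a synthetic 1-D grid. The dose values, mask positions, and variable names are illustrative assumptions, not data from the studies described here. Two contour variations of an organ-at-risk have the same Dice score against the reference, yet only one of them extends into the high-dose region.

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient between two boolean masks."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

# Synthetic dose grid (assumption): 60 Gy in the first 5 voxels, 5 Gy elsewhere.
idx = np.arange(20)
dose = np.where(idx < 5, 60.0, 5.0)

# Reference organ-at-risk contour, lying entirely in the low-dose region.
reference = np.zeros(20, dtype=bool)
reference[5:15] = True

# Two variations with the same size of error (2 extra voxels each).
toward_high = np.zeros(20, dtype=bool)
toward_high[3:15] = True   # grows into the high-dose region
toward_low = np.zeros(20, dtype=bool)
toward_low[5:17] = True    # grows into the low-dose region

for name, mask in [("toward_high", toward_high), ("toward_low", toward_low)]:
    print(name,
          "Dice:", round(dice(reference, mask), 3),
          "max dose (Gy):", dose[mask].max(),
          "mean dose (Gy):", round(dose[mask].mean(), 2))
```

Both variations score the same Dice, but the maximum dose inside the contour is 60 Gy for one and 5 Gy for the other, which is the kind of difference a geometric metric alone cannot report.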
The Clinical Challenge
Our research revealed a striking finding: radiation oncologists themselves struggle to visually predict which contour variations will cause clinically significant dosimetric changes. This underscores the urgent need for automated computational approaches that can accurately assess clinical implications of segmentation errors.
Key Research Contributions
1. Clinician vs AI Performance Study
Our MIDL 2024 study (Kamath et al., 2024) and the extended journal article in Radiotherapy and Oncology (Willmann et al., 2025) provided the first systematic comparison of deep learning dose prediction with clinical expert assessment.
Key Findings:
- Deep learning models outperformed radiation oncologists in estimating dosimetric impact
- Higher correlation with ground truth dose distributions
- Substantially reduced variability compared to human experts
- Completed assessments in seconds vs minutes/hours
Expert Variability Analysis:
- Cohen’s Kappa values: 0.33–0.74 (weak to moderate agreement)
- 46% false positive rate: Equivalent variations incorrectly classified as “worse”
- Conservative bias: No expert identified objectively superior variations
- Clinical implication: these patterns risk unnecessary contour corrections and wasted resources
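The two statistics in this list can be computed as in the following sketch. The ratings are invented for illustration, and the false positive rate is assumed here to mean the fraction of dosimetrically equivalent variations that a rater labelled "worse". The exact definitions used in the studies may differ.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical labels to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / n ** 2
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)

def false_positive_rate(truth, rating, negative="equivalent", positive="worse"):
    """Fraction of truly equivalent variations that were rated as worse (assumed definition)."""
    rated = [r for t, r in zip(truth, rating) if t == negative]
    if not rated:
        raise ValueError("no equivalent variations in the ground truth")
    return sum(r == positive for r in rated) / len(rated)

# Hypothetical ratings of 8 contour variations.
W, E = "worse", "equivalent"
truth   = [W, E, E, E, W, E, W, E]   # dose-based ground truth
rater_a = [W, W, E, E, W, E, W, E]
rater_b = [W, E, E, E, W, W, W, E]

print("kappa(a, b):", cohens_kappa(rater_a, rater_b))
print("FPR rater a:", false_positive_rate(truth, rater_a))
```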
2. AutoDoseRank Framework
AutoDoseRank (Mercado et al., 2024) introduced automated dosimetry-informed segmentation ranking for radiotherapy quality assurance.
Innovation:
- Deep learning-based dose predictor eliminating need for full dose recalculation
- Clinical priority integration considering organ-specific importance
- Patient-level assessment across all organs-at-risk
Performance:
- Outperformed 3 of 4 radiation oncology experts in ranking accuracy
- Better inter-rater agreement (Kendall’s Tau) than human experts
- Sub-second inference enabling real-time clinical integration
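As an illustration of how organ-specific priorities can change a patient-level ranking, consider the sketch below. It is not the AutoDoseRank implementation: the weighted-sum score, the organ names, the priority weights, and the dose deviations are all hypothetical and chosen only to show the idea.

```python
def rank_segmentations(dose_deviation, priority):
    """Rank candidate segmentations from least to most priority-weighted dose impact.

    dose_deviation: {candidate: {organ: dose change in Gy}}
    priority:       {organ: weight}, larger means clinically more important
    """
    scores = {
        candidate: sum(priority[organ] * abs(delta) for organ, delta in organs.items())
        for candidate, organs in dose_deviation.items()
    }
    return sorted(scores, key=scores.get), scores

# Hypothetical priorities and per-organ dose deviations for three candidates.
priority = {"brainstem": 3.0, "optic_chiasm": 2.0, "cochlea": 1.0}
deviations = {
    "A": {"brainstem": 0.2, "optic_chiasm": 0.1, "cochlea": 2.0},
    "B": {"brainstem": 1.5, "optic_chiasm": 0.0, "cochlea": 0.1},
    "C": {"brainstem": 0.1, "optic_chiasm": 0.3, "cochlea": 0.2},
}

ranking, scores = rank_segmentations(deviations, priority)
print("best to worst:", ranking)
```

In this toy case candidate A has the largest unweighted total deviation, yet it ranks ahead of B because B's deviation falls on the organ with the highest priority.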
3. Automated Quality Assurance Integration
Our work establishes frameworks for integrating AI-based dose prediction into clinical workflows:
- Objective, dose-based assessment replacing dose-agnostic geometric metrics and subjective visual review
- Workflow efficiency through instant, automated evaluation
- Improved consistency reducing harmful inter-evaluator variability
- Enhanced safety through accurate identification of significant errors
- Cost-effectiveness via reduced unnecessary corrections
Methods and Approach
Deep Learning Dose Prediction
Our approach employs deep learning models trained on clinical treatment planning data to predict dose distributions without requiring full physics-based simulation. This enables:
- Rapid assessment: Dose prediction in seconds vs hours
- Sensitivity to contour changes: Captures the dosimetric impact of contour variations
- Clinical validation: Evaluated against ground truth treatment plans
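The workflow can be sketched as follows: a dose predictor is called once for the reference contour and once for the variation, and the change in organ-at-risk dose is read off the two predictions. The predictor below is a toy stand-in (an exponential fall-off on a 1-D grid), not a trained model, and the function names, prescription dose, and mask positions are assumptions made for the example.

```python
import numpy as np

def toy_dose_predictor(target_mask, prescription_gy=60.0, falloff_voxels=3.0):
    """Stand-in for a trained dose predictor (hypothetical): full dose inside
    the target, exponential fall-off with distance outside it."""
    if not target_mask.any():
        raise ValueError("target mask is empty")
    idx = np.arange(target_mask.size)
    target_idx = idx[target_mask]
    # Distance from every voxel to its nearest target voxel.
    dist = np.abs(idx[:, None] - target_idx[None, :]).min(axis=1)
    return prescription_gy * np.exp(-dist / falloff_voxels)

def dosimetric_impact(predict_dose, reference_target, variant_target, oar_mask):
    """Change in organ-at-risk mean and max dose caused by a target contour variation."""
    ref_dose = predict_dose(reference_target)
    var_dose = predict_dose(variant_target)
    return {
        "delta_mean_gy": float(var_dose[oar_mask].mean() - ref_dose[oar_mask].mean()),
        "delta_max_gy": float(var_dose[oar_mask].max() - ref_dose[oar_mask].max()),
    }

def mask(start, stop, size=30):
    m = np.zeros(size, dtype=bool)
    m[start:stop] = True
    return m

reference_target = mask(10, 15)
oar = mask(18, 22)
toward_oar = mask(10, 17)    # target grows 2 voxels toward the organ-at-risk
away_from_oar = mask(8, 15)  # target grows 2 voxels in the opposite direction

impact_toward = dosimetric_impact(toy_dose_predictor, reference_target, toward_oar, oar)
impact_away = dosimetric_impact(toy_dose_predictor, reference_target, away_from_oar, oar)
print("toward OAR:", impact_toward)
print("away from OAR:", impact_away)
```

Passing the predictor in as a callable keeps the impact calculation independent of how the dose is obtained, so a trained network or a full recalculation could be substituted without changing the rest of the code.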
Clinical Validation Studies
Multi-Institutional Survey:
- 4 radiation oncologists and 3 medical physicists
- 54 glioblastoma target volume variations across 14 patients
- Statistical analysis: Cohen’s Kappa, correlation analysis, pattern analysis
Performance Benchmarking:
- Head-to-head comparison: AI vs expert clinicians
- Kendall’s Tau ranking correlation
- Normalized Distance-based Performance Measure (NDPM)
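Both ranking measures can be computed from pairwise comparisons, as in this sketch. It uses Kendall's tau-a (no tie correction) and the usual NDPM form in which a contradicted reference preference counts fully and a tie in the prediction counts half. The exact variants used in the benchmarking are not stated here, so treat these choices as assumptions.

```python
from itertools import combinations

def _sign(x):
    return (x > 0) - (x < 0)

def kendall_tau_a(reference, predicted):
    """Kendall's tau-a: (concordant - discordant) pairs divided by all pairs."""
    pairs = list(combinations(range(len(reference)), 2))
    if not pairs:
        raise ValueError("need at least two items")
    total = sum(
        _sign(reference[i] - reference[j]) * _sign(predicted[i] - predicted[j])
        for i, j in pairs
    )
    return total / len(pairs)

def ndpm(reference, predicted):
    """NDPM: 0 for a ranking that respects every reference preference, 1 for a full reversal."""
    contradicted = tied = preferred = 0
    for i, j in combinations(range(len(reference)), 2):
        ref = _sign(reference[i] - reference[j])
        if ref == 0:
            continue  # the reference expresses no preference for this pair
        preferred += 1
        pred = _sign(predicted[i] - predicted[j])
        if pred == 0:
            tied += 1            # prediction fails to separate the pair
        elif pred != ref:
            contradicted += 1    # prediction reverses the reference order
    if preferred == 0:
        raise ValueError("reference ranking contains no preferences")
    return (2 * contradicted + tied) / (2 * preferred)

# Hypothetical ranks of four contour variations (1 = least dosimetric impact).
ground_truth = [1, 2, 3, 4]
model_ranks = [1, 3, 2, 4]   # one adjacent pair swapped
print("tau:", kendall_tau_a(ground_truth, model_ranks))
print("NDPM:", ndpm(ground_truth, model_ranks))
```

Higher tau and lower NDPM both indicate closer agreement with the reference ranking.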
Quality Assurance Framework
- Organ-specific priority integration
- Patient-level comprehensive assessment
- Uncertainty quantification for clinical confidence
- Real-time feedback capabilities
Conclusion
Key Achievements
This research establishes that AI-based dose prediction provides more accurate and consistent dosimetric impact assessment than expert clinicians, addressing a critical gap in radiotherapy quality assurance. Key achievements include:
- First systematic clinician-AI comparison, in which the deep learning model outperformed radiation oncologists in estimating dosimetric impact
- Identification of concerning patterns in expert assessment (weak-to-moderate agreement, systematic conservative bias)
- Practical framework for automated, dose-informed quality assurance
- Clinical validation across multiple institutions and expert clinicians
Clinical Impact
The integration of AI-based dose prediction into radiotherapy workflows offers:
- Objective, dose-based evaluation replacing dose-agnostic geometric metrics and subjective visual assessment
- Improved efficiency through instant automated feedback
- Enhanced safety via consistent, accurate dosimetric evaluation
- Resource optimization by reducing unnecessary corrections
- Clinical decision support with quantitative dose-based insights
Future Directions
Promising future research directions include:
- Multi-modal integration: Incorporating functional imaging and temporal data
- Real-time contouring feedback: Interactive dose-aware editing tools
- Outcome-based validation: Linking segmentation quality to patient outcomes
- Adaptive quality assurance: Personalized evaluation based on patient characteristics
- Uncertainty quantification: Enhanced confidence estimation for clinical trust
By continuing to bridge computational methods with clinical needs, we advance toward safer, more efficient, and more effective radiation therapy through intelligent quality assurance systems.
References
2025
- Willmann et al. Predicting the impact of target volume contouring variations on the organ at risk dose: results of a qualitative survey. Radiotherapy and Oncology, 2025
2024
- Kamath et al. Comparing the Performance of Radiation Oncologists versus a Deep Learning Dose Predictor to Estimate Dosimetric Impact of Segmentation Variations for Radiotherapy. In Medical Imaging with Deep Learning, 2024
- Mercado et al. AutoDoseRank: Automated Dosimetry-Informed Segmentation Ranking for Radiotherapy. In MICCAI Workshop on Cancer Prevention through Early Detection, 2024