Explainable Diabetic Retinopathy Screening System

Diabetic Retinopathy Screening Hero Image

Field	Details
Domain	Healthcare
Assurance Goal	Explainability

Overview

The NHS Midlands Diabetic Eye Screening Programme has deployed an AI-powered system to assist in screening retinal images for signs of diabetic retinopathy (DR)—a complication of diabetes that damages blood vessels in the retina and can lead to vision loss if untreated. The system analyses fundus photographs (i.e. images of the back of the eye) to detect early signs of the condition, processing approximately 200,000 images annually across 50 screening locations.

Given that screening results directly affect patient care pathways, and determine whether patients are referred for specialist treatment or continue routine monitoring, the Programme has commissioned an assurance case. The focus of this case is to demonstrate that the system provides explanations that enable clinicians to understand, verify, and appropriately act on AI recommendations.

System Description

What the System Does

The Diabetic Retinopathy Screening System (DRSS) supports the NHS screening programme by:

Analysing digital fundus photographs for signs of diabetic retinopathy
Classifying images according to the NHS grading scheme (e.g. R0-R3 for retinopathy severity, M0-M1 for maculopathy)
Generating visual explanations highlighting areas of concern
Providing confidence scores and uncertainty estimates
Flagging cases requiring urgent specialist review

How It Works

During a routine diabetic eye screening appointment:

Image Capture: A trained screener captures fundus photographs of both eyes using a digital retinal camera
Image Quality Assessment: The system first evaluates whether image quality is sufficient for reliable analysis
Feature Detection: A deep learning model identifies relevant clinical features: microaneurysms (tiny bulges in blood vessels), haemorrhages (bleeding), exudates (fatty deposits), and neovascularisation (abnormal new blood vessel growth)
Severity Classification: Based on detected features, the system assigns a grade according to NHS grading criteria
Explanation Generation: The system produces visual attention maps (highlighting which image regions influenced the decision) and natural language explanations
Clinical Review: A qualified grader reviews the AI output alongside the images and makes the final grading decision

Key Technical Details

Aspect	Details
Model Architecture	EfficientNet-B4 (a neural network optimised for image classification) with attention mechanism (highlighting influential image regions) for explanation generation
Training Data	500,000 graded retinal images from UK screening programmes with expert consensus labels
Input	Two fundus photographs per eye (macula-centred and disc-centred views)
Output	Grade (R0-R3, M0-M1), confidence score (0-100%), attention heatmap, feature-level predictions, natural language summary
Performance	Sensitivity: 95.2%, Specificity: 89.7% for referable retinopathy (validated on held-out UK dataset)
Explainability Methods	Grad-CAM (gradient-weighted class activation mapping, showing which image regions most influenced the classification), attention visualisation, prototype-based reasoning (comparing to similar known cases)
Validation	Prospective clinical validation; quarterly performance monitoring; annual external audit

Deployment Context

Coverage: Multiple screening programmes across NHS Midlands, serving populations in Birmingham, Coventry, Leicester, Nottingham, and Derby
Volume: ~200,000 screening episodes annually
Patient Population: Adults with Type 1 or Type 2 diabetes
Workflow Integration: Embedded within existing grading workflow; all cases receive human review
Operational Since: September 2023

Stakeholders

Stakeholder	Interest	Concern
Patients	Accurate screening and understandable results	Need to understand why they’re being referred (or not)
Screening Graders	Reliable AI assistance that supports their expertise	Must understand AI reasoning to make informed decisions
Ophthalmologists	Appropriate referrals with relevant clinical context	Need explanations that inform treatment planning
Programme Managers	Efficient screening with maintained quality	Balance throughput with clinical safety
MHRA	Safe and effective medical device operation	Regulatory compliance and post-market surveillance
NHS England	National screening programme integrity	Consistent standards across all centres

Regulatory Context

The system operates within several regulatory frameworks:

UK Medical Devices Regulations 2002: DRSS is classified as a Class IIa medical device requiring CE/UKCA marking
MHRA Guidance on AI as a Medical Device: Requirements for clinical evidence, post-market surveillance, and transparency
NHS Diabetic Eye Screening Programme Standards: National standards for screening quality and grading accuracy
NICE Guidance: Evidence requirements for AI in diagnostic pathways
UK GDPR: Special category health data processing requirements; data protection impact assessment required
Duty of Candour: Healthcare providers must be open with patients about their care

Explainability Considerations

Several aspects of this system require careful attention to explainability:

Clinical Decision Support

Graders must understand why the AI has assigned a particular grade to make informed decisions. A simple classification without reasoning could lead to over-reliance or inappropriate dismissal of AI recommendations.

Multiple Audiences

Explanations must serve different users with different needs:

Graders need technical detail about detected features and their locations
Ophthalmologists need clinically relevant information for treatment planning
Patients need accessible explanations of their results

Uncertainty Quantification and Communication

The system must clearly communicate when it is uncertain, enabling graders to apply additional scrutiny to borderline cases rather than treating all AI outputs with equal confidence.

Feature Attribution Accuracy

Visual explanations (heatmaps) must accurately reflect the regions the model used for its decision. Misleading explanations could be worse than no explanation at all.

Edge Cases and Limitations

The system must clearly indicate when images or cases fall outside its reliable operating envelope, such as unusual pathology, poor image quality, or rare presentations.

Assurance Focus

The assurance case should demonstrate that:

The Diabetic Retinopathy Screening System provides explanations that enable qualified graders to understand, verify, and appropriately act on AI recommendations for patient care.

Deliberative Prompts

What makes an explanation “good enough” for a clinical decision, and who decides?
How do you balance explanation detail against the time pressures of high-volume screening?
When an explanation is technically accurate but clinically misleading, whose responsibility is the resulting harm?
How should uncertainty be communicated without undermining trust in the system or encouraging graders to ignore it?
What do patients need to understand about AI involvement in their care, and when should this be disclosed?

Suggested Strategies

When developing your assurance case, consider these potential approaches:

Strategy 1: Explanation Fidelity

Ensure that visual and textual explanations accurately represent the model’s actual reasoning process, so that clinicians are not misled about why a particular recommendation was made.

Strategy 2: Clinical Workflow Integration

Design explanations that fit naturally within graders’ existing decision-making processes, providing the right information at the right time without creating cognitive overload or workflow disruption.

Strategy 3: Uncertainty and Limitation Communication

Develop clear, calibrated ways of communicating model confidence and indicating when the system is operating outside its reliable envelope, enabling graders to appropriately modulate their scrutiny and know when to exercise additional caution or seek alternative assessment.

Strategy 4: Audience-Appropriate Explanations

Create explanation formats tailored to the distinct needs of graders, ophthalmologists, and patients, ensuring each group receives information they can understand and act upon.

Recommended Techniques for Evidence

The following techniques from the TEA Techniques library may be useful when gathering evidence for this assurance case:

Gradient-weighted Class Activation Mapping (Grad-CAM) - Generate visual heatmaps showing which regions of the retinal image most influenced the classification decision
Saliency Maps - Provide pixel-level attribution showing which parts of the image the model considers most relevant for its prediction
Conformal Prediction - Produce calibrated confidence intervals for predictions, enabling graders to understand when the model is uncertain