# Transparent Clinical GenAI System with Legacy Data

| Field | Details |
|---|---|
| Domain | Healthcare AI / Data Governance |
| Assurance Goal | Transparency |
## Overview
The Greater Manchester Integrated Care System (ICS) has deployed a Generative AI-powered clinical decision support system to assist clinicians in managing patients with complex, multi-morbid conditions. The system—called ClinAssist—analyses patient records and generates natural language summaries, identifies potential care gaps, suggests evidence-based interventions, and drafts clinical correspondence.
ClinAssist was trained on 15 years of electronic health records from across Greater Manchester, comprising over 3 million patient records. However, these records were collected under consent frameworks that permitted data use for “improving the quality of care” and “healthcare research”—terms that predate the emergence of large language models (LLMs) and generative AI (GenAI). Patients did not specifically consent to their data being used to train GenAI systems.
Following concerns raised by patient advocacy groups and the local Healthwatch organisation, the Greater Manchester ICS has commissioned an assurance case to demonstrate that ClinAssist operates transparently, with appropriate accountability for its use of patient data and clear communication to patients about how their information contributes to AI-assisted care.
## System Description

### What the System Does
ClinAssist supports clinical workflows across primary and secondary care by:
- Generating comprehensive patient summaries from fragmented health records across multiple care settings
- Identifying potential care gaps by comparing patient records against clinical guidelines
- Suggesting evidence-based interventions tailored to the patient’s specific circumstances
- Drafting clinical correspondence (referral letters, discharge summaries, GP communications)
- Answering clinician queries about patient history and relevant clinical evidence
- Flagging potential drug interactions, contraindications, and safety concerns
### How It Works
When a clinician accesses ClinAssist for a patient consultation:
1. Record Retrieval: The system accesses the patient’s integrated care record, pulling data from GP records, hospital episodes, community care, mental health services, and social care where relevant
2. Context Assembly: A retrieval-augmented generation (RAG) architecture identifies the most relevant portions of the patient’s history for the current clinical context
3. Query Processing: The clinician’s request (e.g., “summarise this patient’s diabetes management over the past two years”) is processed by the language model
4. Response Generation: The LLM generates a response drawing on the patient’s specific records, clinical guidelines, and its training knowledge
5. Citation and Grounding: The system provides citations to specific records that informed its response, enabling clinician verification
6. Audit Logging: All queries and responses are logged for clinical governance and audit purposes
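The citation-and-grounding and audit-logging steps can be sketched as follows. All names and data shapes here are illustrative assumptions, not the actual ClinAssist implementation; the sketch only shows how a mandatory-citation guardrail and an audit entry might fit together.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class GroundedSpan:
    record_id: str  # source record supporting part of the response
    snippet: str    # the passage the model drew on

def finalise_response(patient_id, query, text, spans, audit_log):
    """Apply the mandatory-citation guardrail, then write the audit entry.

    Hypothetical shapes for illustration, not the real ClinAssist API.
    """
    if not spans:
        # Guardrail: a response with no grounding in the patient's record is blocked
        raise ValueError("ungrounded response blocked by citation guardrail")
    citations = sorted({s.record_id for s in spans})
    # Audit entry supports clinical governance and retrospective review
    audit_log.append({
        "ts": datetime.now(timezone.utc).isoformat(),
        "patient_id": patient_id,
        "query": query,
        "citations": citations,
    })
    return {"text": text, "citations": citations}
```

The key design point is that grounding is enforced, not optional: a response that cannot cite any source record never reaches the clinician.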
### Key Technical Details
| Aspect | Details |
|---|---|
| Model Architecture | Fine-tuned large language model (based on open-source foundation model) with retrieval-augmented generation; 70 billion parameters |
| Training Data | Foundation model pre-trained on general text corpora; fine-tuned on 3.2 million patient records (2008-2023) from GM ICS; RAG knowledge base includes clinical guidelines from NICE, royal colleges, specialist societies, and medical literature |
| Fine-tuning Approach | Supervised fine-tuning on clinician-verified summaries and responses; reinforcement learning from human feedback (RLHF) with clinical experts |
| Input | Patient record context (retrieved), clinician query, current clinical guidelines |
| Output | Natural language response with citations to source records, confidence indicators, and safety flags |
| Guardrails | Content filtering for harmful outputs; mandatory citation requirements; escalation triggers for high-risk scenarios |
| Validation | Clinical accuracy assessment by specialty panels; ongoing monitoring of clinician acceptance rates and corrections |
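The validation row mentions ongoing monitoring of clinician acceptance rates and corrections. A minimal sketch of such a monitor, assuming a simple event log where each entry records whether the clinician accepted the AI output without correction (the event shape and threshold are illustrative):

```python
def acceptance_rate(events):
    """Share of AI responses accepted by clinicians without correction.

    `events` is an assumed log of dicts with an 'accepted' boolean;
    the real governance pipeline will differ.
    """
    if not events:
        return None  # no data yet
    accepted = sum(1 for e in events if e["accepted"])
    return accepted / len(events)

def needs_review(events, threshold=0.85):
    """Flag for specialty-panel review when acceptance falls below threshold."""
    rate = acceptance_rate(events)
    return rate is not None and rate < threshold
```

In practice such a monitor would be computed per specialty and query type, since a drop in acceptance in one clinical area can be masked by high acceptance elsewhere.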
### Deployment Context
- Coverage: All localities within Greater Manchester ICS, serving 2.8 million residents
- Care Settings: Primary care (GP practices), acute hospitals, community services, mental health trusts
- Users: ~12,000 clinicians including GPs, hospital doctors, nurses, pharmacists, and allied health professionals
- Query Volume: ~8,000 clinical queries daily (growing as adoption increases)
- Patient Records Accessed: 3.2 million patient records used for fine-tuning; ~50,000 individual records accessed daily for inference
- Operational Since: September 2024
## Stakeholders
| Stakeholder | Interest | Concern |
|---|---|---|
| Patients (Past) | Data used appropriately and respectfully | Did not consent to GenAI training; may feel data was used beyond reasonable expectations |
| Patients (Current) | Benefit from AI-assisted care; understand AI involvement | Need to know how AI uses their data and contributes to their care |
| Clinicians | Efficient, accurate clinical support | Must understand data provenance and system limitations; professional accountability |
| GM ICS Leadership | Improved care quality and efficiency | Legal and reputational risk from data governance concerns; public trust |
| Information Governance Teams | Lawful, fair data processing | Ensuring compliance with evolving interpretation of consent and data protection law |
| Healthwatch Greater Manchester | Patient voice in healthcare decisions | Ensuring patients have meaningful say in how their data is used |
| ICO | Data protection compliance | Interpretation of consent, purpose limitation, and automated decision-making provisions |
| NHS England | Safe, effective AI deployment nationally | Setting precedents for GenAI data governance across the NHS |
## Regulatory Context
The system operates within a complex and evolving regulatory landscape:
- UK GDPR and Data Protection Act 2018: Lawful basis for processing (consent, legitimate interests, or public task); purpose limitation principle; transparency requirements; Article 22 rights regarding solely automated decision-making (noting ClinAssist is decision support with clinician oversight, so Article 22 may not directly apply, though its principles inform system design)
- Common Law Duty of Confidentiality: Patients’ reasonable expectations about how their health information will be used
- NHS Act 2006 (as amended): Section 251 provisions for using patient data without consent for medical purposes in certain circumstances (noting that Confidentiality Advisory Group approval for GenAI training represents novel and contested regulatory ground)
- Caldicott Principles: Framework for handling patient-identifiable information, including the principle that patient data should only be used for the purpose for which it was collected
- National Data Guardian Guidance: Recommendations on patient data use, including the importance of transparency and public engagement
- ICO Guidance on AI and Data Protection: Emerging guidance on fairness, transparency, and accountability in AI systems
- EU AI Act (for reference): While not directly applicable post-Brexit, provides influential framework for high-risk AI classification
- National Data Opt-Out: Patients who have registered a national data opt-out must have their data excluded from uses beyond their direct care; ClinAssist excludes opted-out patients from both training data and inference
- UK Medical Devices Regulations: ClinAssist is classified as a Class IIa medical device (clinical decision support software) requiring UKCA marking and conformity with MHRA guidance on AI as a Medical Device
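The National Data Opt-Out exclusion described above can be made auditable with a simple filter applied before both fine-tuning and retrieval. The record shape and the `nhs_number` field name are assumptions for illustration:

```python
def exclude_opted_out(records, opted_out_nhs_numbers):
    """Drop records for patients with a registered national data opt-out.

    Applied to both the fine-tuning corpus and the RAG retrieval layer,
    so opted-out patients are excluded from uses beyond their direct
    care. Field names are illustrative.
    """
    opted_out = set(opted_out_nhs_numbers)
    kept = [r for r in records if r["nhs_number"] not in opted_out]
    # Returning the count supports audit reporting on how many records
    # were excluded at each processing stage
    excluded_count = len(records) - len(kept)
    return kept, excluded_count
```

Logging the excluded count at each stage gives governance teams verifiable evidence that the opt-out was honoured, rather than relying on a policy statement alone.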
## Transparency Considerations
Several aspects of this system require careful attention to transparency:
### Consent Scope and Evolution
The original consent frameworks under which patient data was collected could not have anticipated generative AI:
- “Improving quality of care” was typically understood to mean service audits and quality improvement, not training AI models
- “Healthcare research” traditionally implied statistical analysis, not systems that generate novel content
- Patients could not have formed reasonable expectations about uses that did not exist when they consented
### Data Provenance in Generated Content
When ClinAssist generates a clinical summary or recommendation:
- Which specific patient records influenced the output?
- How does training data (from other patients) influence responses about a specific patient?
- Can patients understand and contest how their data contributed to the model’s capabilities?
### Distinguishing Patient Data Roles
Patient data plays multiple distinct roles in the system:
- Training data: Historical records used to train the underlying model (one-time, aggregated)
- Inference data: Current patient’s records used to generate specific responses (real-time, individual)
- These different uses have different privacy implications and may warrant different transparency approaches
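The training/inference distinction can be made explicit in code, which keeps audit records and transparency notices role-aware. The mapping below is an illustrative policy sketch, not the deployed one:

```python
from enum import Enum

class DataRole(Enum):
    # Historical records absorbed into model weights (one-time, aggregated)
    TRAINING = "training"
    # The current patient's own record retrieved at query time (real-time, individual)
    INFERENCE = "inference"

def transparency_approach(role):
    """Illustrative mapping from data role to transparency mechanism."""
    if role is DataRole.INFERENCE:
        return "patient-specific notice with citations to the records used"
    return "public model documentation and aggregate disclosure"
```

Making the role explicit in every data-use record also simplifies responding to subject access requests, since inference uses can be reported per patient while training use is disclosed in aggregate.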
### Attribution and Accountability
When the system provides clinical advice:
- How should the relative contributions of training data, retrieved records, and model reasoning be communicated?
- Who is accountable if advice informed by another patient’s data leads to harm?
- How can patients exercise rights over data that has been absorbed into model weights?
### Public Trust and Social Licence
Beyond legal compliance, the system’s legitimacy depends on public trust:
- Do patients believe their data should be used this way?
- Has there been adequate public engagement about AI uses of health data?
- How should the system’s existence and data practices be communicated to the public?
## Assurance Focus
The assurance case should demonstrate that:
ClinAssist operates transparently, with clear accountability for its use of patient data, meaningful communication to patients about AI involvement in their care, and appropriate mechanisms for patients to understand and exercise their data rights.
## Deliberative Prompts
- Can consent given for “healthcare research” in 2010 legitimately cover training generative AI in 2024? Where should the line be drawn on evolving interpretations of historical consent?
- What is the ethical balance between using data for patient benefit and respecting patient autonomy when explicit consent for AI training was not obtained?
- How should transparency obligations differ for training data (where specific attribution may be impossible) versus inference data (where the patient is identifiable)?
- If a patient exercises their right to erasure, what does this mean for data that has been absorbed into model weights? Is meaningful erasure even possible?
- Who bears the burden of ensuring patients understand AI involvement in their care—the technology provider, the healthcare organisation, or individual clinicians?
- Does providing better care through AI-assisted insights justify using data in ways patients may not have anticipated? How do we weigh collective benefit against individual autonomy?
## Suggested Strategies
When developing your assurance case, consider these potential approaches:
### Strategy 1: Layered Transparency
Develop transparency mechanisms appropriate to different audiences and contexts:
- Public-facing communications about the system’s existence and general data practices
- Patient-specific notifications when AI contributes to their individual care
- Detailed technical documentation for governance bodies and regulators
- Clinician guidance on explaining AI involvement to patients
### Strategy 2: Data Lineage and Attribution
Implement technical capabilities to trace how patient data influences system outputs:
- Clear distinction between training data contributions and inference data use
- Citation mechanisms linking generated content to source records
- Audit trails enabling retrospective analysis of data influence
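Given that the system logs all queries and responses, retrospective lineage analysis reduces to simple queries over the audit trail. The entry shape below (a dict with `patient_id` and `citations` keys) is an assumption for illustration:

```python
def outputs_citing_record(audit_log, record_id):
    """All logged responses that cited a given source record; supports
    retrospective analysis of how one record influenced outputs."""
    return [e for e in audit_log if record_id in e.get("citations", [])]

def records_used_for_patient(audit_log, patient_id):
    """Every source record cited in responses about one patient; the
    basis for a patient-facing 'how was my data used' report."""
    used = set()
    for e in audit_log:
        if e.get("patient_id") == patient_id:
            used.update(e.get("citations", []))
    return sorted(used)
```

Note that these queries cover inference-time use only; influence via training data absorbed into model weights cannot be recovered this way, which is why the strategy distinguishes the two.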
### Strategy 3: Meaningful Consent Refresh
Develop mechanisms for patients to express preferences about their data use:
- Prospective consent options for new patients and new data collection
- Retrospective engagement with patients whose historical data was used
- Opt-out mechanisms that are practically meaningful (even if complete erasure from trained models is infeasible)
### Strategy 4: Public Engagement and Social Licence
Establish ongoing dialogue with the public about AI use of health data:
- Citizen panels and deliberative engagement on data governance
- Transparent reporting on system performance, benefits, and limitations
- Responsive governance that adapts to public concerns
### Strategy 5: Accountability Frameworks
Clarify accountability for AI-influenced clinical decisions:
- Clear allocation of responsibility between technology provider, healthcare organisation, and individual clinicians
- Governance structures for investigating adverse events involving AI
- Mechanisms for patients to raise concerns and seek redress
## Recommended Techniques for Evidence
The following techniques from the TEA Techniques library may be useful when gathering evidence for this assurance case:
- Model Cards - Create standardised documentation describing the model’s training data, intended use, and limitations
- Datasheets for Datasets - Document the provenance, composition, and intended uses of training datasets to enable transparency about data origins
- Retrieval-Augmented Generation Evaluation - For RAG systems, evaluate and document how retrieved content influences generated responses
- Membership Inference Attack Testing - Assess privacy risks by testing whether specific patient records can be identified as having been in training data
- Model Development Audit Trails - Maintain comprehensive records of data processing, model training decisions, and validation results for governance and accountability
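As one concrete shape for the membership-inference testing above: score each record with the fine-tuned model (for example, average per-token loss, where lower loss suggests memorisation) and measure how well that score separates known training records from held-out records. An AUC near 0.5 means the attack cannot distinguish members; values well above 0.5 signal memorisation and privacy risk. The scoring hook is assumed; the AUC computation itself is the standard pairwise (Mann-Whitney) form:

```python
def membership_auc(member_scores, nonmember_scores):
    """AUC for a membership-inference attack where *lower* scores
    (e.g., per-record language-model loss) indicate likely training-set
    membership. Pairwise form of the Mann-Whitney U statistic."""
    wins = ties = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m < n:    # member scored lower than non-member: attack succeeds
                wins += 1
            elif m == n:
                ties += 1
    return (wins + 0.5 * ties) / (len(member_scores) * len(nonmember_scores))
```

Because memorisation risk concentrates in rare or distinctive records, such a test should be repeated per subgroup (for example, patients with rare conditions), not only on the corpus as a whole.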