Stanford Tests AI Tool to Help Clinicians Explain Imaging and Lab Results
A new pilot program at Stanford Health Care suggests that artificial intelligence may help primary care providers better communicate test results to patients while reducing administrative burden. Early feedback shows strong enthusiasm for the tool, though concerns about accuracy and completeness remain. The findings were published in JAMA Network Open.
The initiative centers on an electronic health record–integrated application that uses generative AI to draft plain-language explanations of lab, imaging, and pathology results. Physicians can review and edit these drafts before sending them to patients via the portal. Stanford researchers chose Claude 3.5 Sonnet (Anthropic) as the underlying large language model, citing its fast response time, adherence to prompts, and ability to mimic clinician writing styles.
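The study does not publish the tool's code or prompts, but as a rough illustration of the draft-then-review pattern described above, a minimal sketch using Anthropic's public Python SDK might look like the following. The prompt wording, the draft_result_explanation helper, and the example lab value are assumptions for illustration, not details from the Stanford system.

```python
# Minimal sketch (an assumption, not Stanford's implementation): drafting a
# plain-language explanation of a test result with Anthropic's Python SDK.
# The prompt text, helper name, and example result are illustrative only.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def draft_result_explanation(result_text: str, clinician_style_note: str) -> str:
    """Return an editable plain-language draft for a clinician to review."""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",  # a publicly available Claude 3.5 Sonnet release
        max_tokens=400,
        system=(
            "You draft plain-language explanations of medical test results for "
            "patients, avoiding jargon. "
            f"Match this clinician's writing style: {clinician_style_note}"
        ),
        messages=[
            {
                "role": "user",
                "content": f"Draft a patient-friendly explanation of this result:\n{result_text}",
            }
        ],
    )
    # The draft is never sent automatically; a physician reviews and edits it first.
    return response.content[0].text


# Example usage with a made-up lab value
print(
    draft_result_explanation(
        result_text="Hemoglobin A1c: 6.1% (reference range 4.0-5.6%)",
        clinician_style_note="warm, concise, reassuring where appropriate",
    )
)
```

In the workflow the article describes, a draft like this would appear inside the EHR for the physician to review, edit, and release to the patient portal rather than being sent directly.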
The program builds on other AI-based features already in use at Stanford, such as tools for composing responses to patient messages. This new application specifically addresses the challenge of distilling technical test findings into language that patients can easily understand. “This study demonstrated the utility of a generative AI tool for drafting test result explanations, highlighting ease of use, improved efficiency, and higher-quality explanations,” wrote lead author Shreya J. Shah, MD, and colleagues.
The pilot invited primary care providers (PCPs) to test the tool over several weeks. Of 244 clinicians who tried it at least once, 93 completed follow-up surveys. Nearly 85% of respondents rated the tool as straightforward and user-friendly. Many described it as particularly useful for explaining lab results (72%) and imaging findings (63%).
Survey responses also indicated that clinicians saw improvements in efficiency (71%) and in the quality of patient-facing explanations (72%). More than half reported using the tool frequently, and 83% anticipated continuing long-term. About 54% believed the program was ready for broader rollout across the health system. Average perceived time savings was modest—roughly 1.1 minutes per task—but individual experiences ranged from significant time saved to slight additional time spent.
Clinicians also commented on how the tool might improve patient engagement by making test results more understandable and accessible. However, barriers to adoption centered on concerns about the accuracy and completeness of AI-generated content. Some suggested that outputs would be more useful if they incorporated patient-specific context from visit notes, or if workflow integration were streamlined further.
In their analysis, Shah and colleagues emphasized that while AI-generated draft comments show promise, further refinements are necessary: “Additional improvements should focus on optimizing prompts, updating the LLM, incorporating patient-specific context, and streamlining workflow integration. Future evaluations should quantify impacts on clinician inbox burden (time spent, message volume) and consider patient perspectives.”
Stanford also highlighted a regulatory backdrop for the project. In a January news release, leaders noted that the tool supports compliance with the 21st Century Cures Act, which requires healthcare organizations to release results quickly to patients. By automating draft explanations, the program could help physicians meet this mandate while easing the communication gap that often arises with technical medical data.
The early success of Stanford’s pilot underscores both the potential and the challenges of embedding AI into frontline clinical workflows. For now, the study points toward a hybrid model—AI to generate initial drafts, with physicians ensuring accuracy and tailoring messages to individual patients.