AI systems capable of clinical reasoning and dialogue have the potential to dramatically increase access to medical expertise and care while giving physicians back time with their patients where it truly matters. However, developing these technologies responsibly requires a rigorous, evidence-based approach. Over the past few years, our teams have explored the “art of the possible” through research systems that demonstrate clinician-level capabilities in simulated settings. While we have begun testing the safety and feasibility of these systems in clinical settings, the next stage of assessment requires additional rigor and scale: studying the utility and impact of AI in virtual care with more patients, across an array of geographies and conditions, and with controlled comparisons.
Today, we are announcing a significant step in that ongoing research journey: in partnership with Included Health, a leading US healthcare provider, we will be launching, pending Institutional Review Board (IRB) approval, a prospective, consented, nationwide randomized study to assess AI in a real-world virtual care setting. This new research will build upon our foundational research on the use of AI for diagnostic and management reasoning, personalized health insights, and navigating health information.
This work represents a significant evolution in our research. Early studies published in Nature first assessed our AI system’s diagnostic reasoning capabilities, including its assistive effect for physicians. We then compared the system’s conversational diagnostic capabilities to those of primary care physicians in simulated settings with patient actors. In addition to understanding capabilities, we also explored a physician-centered paradigm with asynchronous oversight of AI. Our initial step toward testing conversational AI in real-world clinical settings was a single-center feasibility study in partnership with Beth Israel Deaconess Medical Center. The study’s goal was to demonstrate the system’s safety based on outcome measures such as the number of interruptions by the safety supervisor in response to safety concerns. We have observed strong indications of safety in this initial study and look forward to sharing results once the study is complete.

