No Roads, No Revolution: Why Healthcare’s Broken Data Infrastructure Is Undermining AI Innovation
In the age of generative models and multimodal clinical assistants, healthcare AI is often heralded as the next frontier, one poised to ease physician burnout, accelerate diagnosis, and improve health system efficiency. Yet beneath the optimism lies a sobering reality: the infrastructure needed to power scalable, clinically meaningful AI simply does not exist in most of U.S. healthcare.
This is the warning issued by Mitesh Rao, MD, MHS, CEO of OMNY Health, in a recent interview with MedCity News. According to Rao, without interoperable and representative data at scale, even the most advanced AI algorithms will struggle to deliver value beyond surface-level automation.
“We haven’t built the roads—we’re trying to put Ferraris on dirt,” Rao said. “Until we address the fragmented, inaccessible nature of healthcare data, innovation in AI will hit a ceiling.”
AI Growth, Infrastructure Stagnation
Investment in healthcare AI has surged. In 2024 alone, U.S. digital health startups raised more than $4.3 billion for AI-driven platforms, according to Rock Health. Companies like Abridge (ambient documentation), Ambience (AI scribe systems), and Hippocratic AI (LLMs for clinical communication) have crossed the billion-dollar valuation mark. Federal policymakers, including the White House, are actively promoting frameworks for integrating AI into critical infrastructure sectors such as healthcare.
But the explosion in funding and hype has outpaced progress in solving the most fundamental constraint: access to clean, standardized, and interoperable data.
Most successful healthcare AI applications today operate in narrow domains like revenue cycle automation, clinical documentation, and administrative workflows, precisely because the data they rely on (claims records, billing codes, or the audio of a single patient visit) is relatively self-contained and easy to access. When it comes to more ambitious use cases, such as predictive diagnostics, AI-assisted clinical decision-making, or population-level disease modeling, innovation consistently runs into a wall: incompatible systems, siloed records, and a lack of common data standards.
Silos, Standards, and Status Quo Incentives
The root of the problem is the fragmented nature of healthcare’s IT ecosystem. Patient data is scattered across a multitude of proprietary EHRs, claims systems, and clinical repositories—each with its own taxonomy, access controls, and institutional ownership.
Despite federal efforts such as the 21st Century Cures Act and CMS’ Interoperability and Patient Access Rule, progress remains slow. Standards-based APIs built on FHIR (Fast Healthcare Interoperability Resources) have made inroads, but adoption is uneven and real-world implementations are patchy.
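To make the FHIR point concrete: a FHIR server exposes clinical data as typed resources (Patient, Observation, and so on) behind a REST interface, so a client can request a patient's recent lab results with a standard search query. The Python sketch below only builds such a query URL and makes no network call. The server address, patient ID, and function name are hypothetical, and a real deployment would also require authorization (for example, the OAuth 2.0 flow defined by SMART on FHIR).

```python
from urllib.parse import urlencode

# Hypothetical FHIR R4 server base URL; not a real endpoint.
FHIR_BASE = "https://fhir.example-hospital.org/R4"


def build_observation_search(patient_id: str, loinc_code: str, count: int = 10) -> str:
    """Build a FHIR search URL for one patient's lab Observations.

    Uses standard FHIR search parameters:
      patient - reference to the Patient resource
      code    - token in "system|code" form (here, a LOINC code)
      _sort   - "-date" requests descending date order (newest first)
      _count  - page-size hint for the server
    """
    params = {
        "patient": f"Patient/{patient_id}",
        "code": f"http://loinc.org|{loinc_code}",
        "_sort": "-date",
        "_count": str(count),
    }
    # urlencode percent-escapes reserved characters such as "|", "/" and ":".
    return f"{FHIR_BASE}/Observation?{urlencode(params)}"
```

Calling `build_observation_search("12345", "2345-7")` would ask for that patient's serum or plasma glucose results (LOINC 2345-7), newest first. The query has the same shape for any server that implements the standard, which is precisely the portability that uneven, vendor-specific implementation undermines.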
“The incentives are not aligned,” Rao emphasized. “Large incumbents don’t gain financially by making data more portable. In many cases, the business model depends on vendor lock-in.”
Vendors like Epic and Oracle Health (formerly Cerner), which together hold a majority of the U.S. hospital EHR market, have faced longstanding criticism for limiting data portability, imposing high interface fees, or offering only partial access through proprietary APIs. While both have made public commitments to interoperability, data liquidity often remains tightly controlled in practice.
Policy without Penalties Falls Flat
Although CMS and ONC continue to introduce interoperability initiatives, including TEFCA (the Trusted Exchange Framework and Common Agreement), many experts argue that current policies lack teeth. Without strict enforcement or meaningful financial consequences for noncompliance, health systems and vendors have little motivation to change.
The issue isn't only technical; it's also economic and political. Without reimbursement models that reward data sharing or penalize information blocking, the burden falls on innovators to “work around” systemic fragmentation.
“If you’re building AI in retail, you get access to millions of rows of clean transaction data. In healthcare, you get 20 Excel files from five systems in three different formats, none of which talk to each other,” noted a senior data engineer at a health tech startup, who requested anonymity due to contractual restrictions.
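The engineer's complaint describes a harmonization problem: the same clinical fact arrives from each system under different field names, date formats, and units. A minimal Python sketch of the usual remedy, mapping every source onto one canonical record before any model sees it, might look like this. The two export layouts, their column names, and the choice of mg/dL as the canonical unit are hypothetical, and the sketch assumes patient IDs already agree across systems, which in practice is a hard problem of its own.

```python
import csv
import io
import json
from dataclasses import dataclass
from datetime import date

# Approximate factor: glucose molar mass is about 180.16 g/mol,
# so 1 mmol/L corresponds to about 18.016 mg/dL.
GLUCOSE_MMOL_TO_MG_DL = 18.016


@dataclass(frozen=True)
class GlucoseResult:
    """Canonical record: one glucose value in mg/dL (hashable, so it can be deduplicated)."""
    patient_id: str
    taken_on: date
    mg_dl: float


def from_csv_export(text: str) -> list[GlucoseResult]:
    """Hypothetical System A: CSV with MRN, M/D/YYYY dates, values already in mg/dL."""
    results = []
    for row in csv.DictReader(io.StringIO(text)):
        month, day, year = (int(part) for part in row["DOS"].split("/"))
        results.append(GlucoseResult(row["MRN"], date(year, month, day), float(row["GLU_MGDL"])))
    return results


def from_json_export(text: str) -> list[GlucoseResult]:
    """Hypothetical System B: JSON list with ISO dates and values in mmol/L."""
    results = []
    for rec in json.loads(text):
        mg_dl = round(rec["glucose_mmol_l"] * GLUCOSE_MMOL_TO_MG_DL, 1)
        results.append(GlucoseResult(rec["patient"], date.fromisoformat(rec["collected"]), mg_dl))
    return results


def harmonize(*batches: list[GlucoseResult]) -> list[GlucoseResult]:
    """Merge batches, drop exact duplicates, and sort by patient, then date."""
    merged = {result for batch in batches for result in batch}
    return sorted(merged, key=lambda r: (r.patient_id, r.taken_on))
```

Every new source format needs another hand-written adapter like `from_csv_export` or `from_json_export`; multiplied across dozens of systems, that adapter work is the "work around" burden that falls on innovators.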
The Clinical Cost of Inaccessible Data
The consequences of this data fragmentation are not merely operational. They are clinical.
AI models trained on limited, non-representative, or outdated datasets risk entrenching bias, missing rare conditions, or failing to generalize beyond a narrow setting. A 2022 JAMA Network Open study found that most published clinical AI models were developed using data from fewer than five healthcare systems, raising serious concerns about generalizability and safety.
Moreover, a lack of real-time data access impairs AI tools designed for dynamic clinical settings, such as inpatient monitoring or predictive deterioration scoring. When models can’t “see” the latest labs, medications, or care plan changes, their clinical utility diminishes sharply.
Interoperability as National Infrastructure
What’s needed, Rao and other experts argue, is to treat healthcare data infrastructure as national critical infrastructure, akin to roads, energy grids, or air traffic control.
Building such an ecosystem would require:
Standardized, enforced data formats across EHR vendors and health systems
Mandatory data access APIs with usage transparency
Real, enforced penalties for information blocking, as provided for in the Cures Act
Neutral data intermediaries to facilitate secure, de-identified exchange
Public-private partnerships to maintain infrastructure across regions
The U.K.’s NHS Spine and Estonia’s national health record system offer models where centralized data exchange, real-time patient access, and regulatory coordination have enabled greater digital innovation without compromising security.
From Hype to High Ground
The promise of AI in healthcare is not unfounded. But translating that promise into measurable outcomes will require more than technical advances or venture capital. It will demand infrastructure reform, policy resolve, and a willingness to prioritize the invisible foundation behind every clinical algorithm: trustworthy, connected, and actionable data.
“We’re trying to build the future of healthcare on a foundation that can barely support email attachments,” Rao quipped. “Until that changes, the revolution will remain stalled on the runway.”
Sources:
MedCity News, “Healthcare’s Poor Data Infrastructure Is Hindering AI Innovation,” Katie Adams, August 2025.
JAMA Network Open, “External Validity of Published Clinical Machine Learning Models,” 2022.
Centers for Medicare & Medicaid Services (CMS), Interoperability and Patient Access Final Rule, 2020.
Office of the National Coordinator for Health IT (ONC), Trusted Exchange Framework and Common Agreement (TEFCA).
Rock Health Funding Database, 2024.
OMNY Health, corporate communications.