AI & Data Infrastructure in Healthcare

No Roads, No Revolution: Why Healthcare’s Broken Data Infrastructure Is Undermining AI Innovation 

In the age of generative models and multimodal clinical assistants, healthcare AI is often heralded as the next frontier, one poised to solve physician burnout, accelerate diagnosis, and optimize health system efficiency. Yet beneath the optimism lies a sobering reality: the infrastructure needed to power scalable, clinically meaningful AI simply does not exist in most of U.S. healthcare.

This is the warning issued by Mitesh Rao, MD, MHS, CEO of OMNY Health, in a recent interview with MedCity News. According to Rao, without interoperable and representative data at scale, even the most advanced AI algorithms will struggle to deliver value beyond surface-level automation. 

“We haven’t built the roads; we’re trying to put Ferraris on dirt,” Rao said. “Until we address the fragmented, inaccessible nature of healthcare data, innovation in AI will hit a ceiling.”

AI Growth, Infrastructure Stagnation 

Investment in healthcare AI has surged. In 2024 alone, U.S. digital health startups raised more than $4.3 billion for AI-driven platforms, according to Rock Health. Companies like Abridge (ambient documentation), Ambience (AI scribe systems), and Hippocratic AI (LLMs for clinical communication) have crossed the billion-dollar valuation mark. Federal policymakers, including the White House, are actively promoting frameworks to integrate AI into critical infrastructure sectors like healthcare.

But the explosion in funding and hype has outpaced progress in solving the most fundamental constraint: access to clean, standardized, and interoperable data. 

Most successful healthcare AI applications today operate in narrow domains (revenue cycle automation, clinical documentation, or administrative workflows) precisely because they depend on structured data that's relatively easy to access. When it comes to more ambitious use cases, such as predictive diagnostics, AI-assisted clinical decision-making, or population-level disease modeling, innovation consistently runs into a wall: incompatible systems, siloed records, and a lack of common data standards.

Siloes, Standards, and Status Quo Incentives 

The root of the problem is the fragmented nature of healthcare’s IT ecosystem. Patient data is scattered across a multitude of proprietary EHRs, claims systems, and clinical repositories, each with its own taxonomy, access controls, and institutional ownership.

Despite federal efforts such as the 21st Century Cures Act and CMS’ Interoperability and Patient Access Rule, progress remains slow. While API standards like FHIR (Fast Healthcare Interoperability Resources) have made inroads, adoption is uneven, and real-world implementation is patchy.
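To make the standardization problem concrete, here is a minimal sketch of what mapping two incompatible exports onto a shared, FHIR-style Patient shape might look like. The two source formats and their field names are hypothetical and invented for illustration; only the target fields (resourceType, name, birthDate, gender) follow the FHIR Patient resource. A real integration would also need identifiers, validation, and terminology mapping.

```python
from datetime import datetime

# Hypothetical exports from two source systems (field names are invented).
SYSTEM_A_ROW = {"last_name": "Doe", "first_name": "Jane", "dob": "03/14/1980", "sex": "F"}
SYSTEM_B_ROW = {"patientName": "DOE^JANE", "birthDate": "19800314", "gender": "female"}

# Map local gender codes onto FHIR's administrative gender values.
GENDER_MAP = {"f": "female", "m": "male", "female": "female", "male": "male"}


def to_fhir_patient(family, given, birth_date, gender):
    """Build a minimal FHIR-style Patient dictionary from normalized inputs."""
    return {
        "resourceType": "Patient",
        "name": [{"family": family.title(), "given": [given.title()]}],
        "birthDate": birth_date.isoformat(),  # FHIR dates use YYYY-MM-DD
        "gender": GENDER_MAP.get(gender.strip().lower(), "unknown"),
    }


def from_system_a(row):
    """Hypothetical system A: separate name fields, US-style dates."""
    dob = datetime.strptime(row["dob"], "%m/%d/%Y").date()
    return to_fhir_patient(row["last_name"], row["first_name"], dob, row["sex"])


def from_system_b(row):
    """Hypothetical system B: caret-delimited name, compact dates."""
    family, given = row["patientName"].split("^", 1)
    dob = datetime.strptime(row["birthDate"], "%Y%m%d").date()
    return to_fhir_patient(family, given, dob, row["gender"])


patient_a = from_system_a(SYSTEM_A_ROW)
patient_b = from_system_b(SYSTEM_B_ROW)

# Once both records share one shape, they can be compared or merged directly.
print(patient_a == patient_b)
```

Each new source system needs its own adapter like the two above, which is one reason uneven adoption of a common standard translates directly into integration cost.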

“The incentives are not aligned,” Rao emphasized. “Large incumbents don’t gain financially by making data more portable. In many cases, the business model depends on vendor lock-in.” 

Vendors like Epic and Oracle Cerner, which together hold the majority of the U.S. hospital EHR market, have faced longstanding criticism for limiting data portability, imposing high interface fees, or offering incomplete access through proprietary APIs. While both have made public commitments to interoperability, data liquidity often remains tightly controlled in practice.

Policy without Penalties Falls Flat 

Although CMS and ONC continue to introduce interoperability initiatives, including TEFCA (Trusted Exchange Framework and Common Agreement), many experts argue that current policies lack teeth. Without strict enforcement mechanisms or direct financial consequences for noncompliance, health systems and vendors have little motivation to change.

The issue isn't only technical; it's also economic and political. Without reimbursement models that reward data sharing or penalize information blocking, the burden falls on innovators to “work around” systemic fragmentation.

“If you’re building AI in retail, you get access to millions of rows of clean transaction data. In healthcare, you get 20 Excel files from five systems in three different formats, none of which talk to each other,” noted a senior data engineer at a health tech startup, who requested anonymity due to contractual restrictions.

The Clinical Cost of Inaccessible Data 

The consequences of this data fragmentation are not merely operational; they are clinical.

AI models trained on limited, non-representative, or outdated datasets risk entrenching bias, missing rare conditions, or failing to generalize beyond a narrow setting. A 2022 JAMA Network Open study found that most clinical AI models in use were developed using data from fewer than five healthcare systems, raising serious concerns about generalizability and safety.

Moreover, the lack of real-time data access impairs AI tools designed for dynamic clinical settings, such as inpatient monitoring or predictive deterioration scoring. When models can’t “see” the latest labs, medications, or care plan changes, their clinical utility diminishes sharply.

Interoperability as National Infrastructure 

What’s needed, Rao and other experts argue, is a reframing of healthcare data infrastructure as a form of national critical infrastructure, akin to roads, energy grids, or air traffic control.

Building such an ecosystem would require: 

  • Standardized, enforced data formats across EHR vendors and health systems 

  • Mandatory data access APIs with usage transparency 

  • Real penalties for information blocking, as outlined in the Cures Act 

  • Neutral data intermediaries to facilitate secure, de-identified exchange 

  • Public-private partnerships to maintain infrastructure across regions 

The U.K.’s NHS Spine and Estonia’s national health record system offer models where centralized data exchange, real-time patient access, and regulatory coordination have enabled greater digital innovation without compromising security. 

From Hype to High Ground 

The promise of AI in healthcare is not unfounded. But translating that promise into measurable outcomes will require more than technical advances or venture capital. It will demand infrastructure reform, policy resolve, and a willingness to prioritize the invisible foundation behind every clinical algorithm: trustworthy, connected, and actionable data.

“We’re trying to build the future of healthcare on a foundation that can barely support email attachments,” Rao quipped. “Until that changes, the revolution will remain stalled on the runway.”

Sources: 

  • MedCity News, “Healthcare’s Poor Data Infrastructure Is Hindering AI Innovation,” Katie Adams, August 2025. 

  • JAMA Network Open, “External Validity of Published Clinical Machine Learning Models,” 2022. 

  • CMS Interoperability and Patient Access Final Rule, 2020. 

  • Office of the National Coordinator for Health IT (ONC), TEFCA Framework. 

  • Rock Health Funding Database, 2024. 

  • OMNY Health, corporate communications. 
