One man’s opinion on the state of the market
AI looms as the newest refinery for real-world data, but much work remains before it delivers its key advantages.

Over the past few months, I’ve had more conversations than I can count trying to make sense of where the real-world data (RWD) space is headed. I’ve learned a lot in a short while and now I know I have a lot more to learn.
After working to consolidate my notes and come up with a cogent piece of thought leadership, I think I have more questions than when I started.
Reader beware: this may not be cogent and is far from thought leadership. I have not come away with any more clarity, only some observations and a creeping sense that we’re on the cusp of a market inflection point. Whatever is coming, it certainly feels like we are positioned for potentially seismic change.
The pace of unpredictable change
Change is reverberating through this industry with frightening speed and has become wholly unpredictable. While my trusty crystal ball can’t tell me what the next year, or even the next few months, will look like, I have come away with a sense that opportunities are emerging from the ether.
We have a laundry list of long-standing problems, and some of the technical resources needed to solve them are becoming more widely available. Advances in cloud infrastructure, natural language processing and the speedy evolution of language/reasoning models all help to accelerate these opportunities.
Many of these are in areas that we have historically referred to as “last mile problems.” In layman’s terms: let’s solve all the really hard stuff later.
Well, welcome to later.
The healthcare house of cards
The data we use sits downstream of source data collection, processing and the outdated, underinvested technology infrastructure that underpins the U.S. healthcare system.
As technology- and data-driven thinkers, it’s easy for us to forget the house of cards that the U.S. healthcare technology infrastructure is built upon. This lack of awareness is the key reason why many enter the healthcare industry believing that technology alone will solve the problem, or that solutions transfer easily from other industries, only to fail or, worse, compound the problem.
By most accounts, America spends the most on healthcare among developed countries, but has the worst outcomes. As we go about our daily business of “innovating” these problems away, we should remind ourselves of the stark reality of the industry.
Here are a few grim paragraphs from a recent article from the Georgetown Journal of International Affairs.
“The Center for Medicare and Medicaid Services recently released data showing that US health expenditure rose by 7.5 percent in 2023 to $4.867 trillion in 2023, representing 17.6 percent of (gross domestic product) and $14,570 per capita. Per capita health expenditure in the United States was 89 percent higher than the average for developed countries and 56 percent higher than Switzerland, the second-highest spender.
“Despite this seismic spending, the United States lags behind all OECD countries in several key health metrics, including infant mortality, the prevalence of chronic conditions and life expectancy. Thus, the US healthcare system — characterized by exceptionally high financial investment and poor outcomes — delivers a strikingly disappointing ROI as compared to the developed world.”
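As a rough sanity check on the figures quoted above, the short sketch below backs out what they imply: the developed-country average, Switzerland's per capita spend, and the GDP and population behind the totals. The variable names are mine, and the outputs are estimates derived from the rounded figures in the quote, not numbers reported in the article.

```python
# Figures as quoted in the Georgetown Journal of International Affairs excerpt.
total_spend_usd = 4.867e12      # 2023 US health expenditure
share_of_gdp = 0.176            # 17.6 percent of GDP
per_capita_usd = 14_570         # US per capita spend
pct_above_average = 0.89        # 89 percent higher than the developed-country average
pct_above_switzerland = 0.56    # 56 percent higher than Switzerland

# "X percent higher than" means us = other * (1 + X), so other = us / (1 + X).
implied_average = per_capita_usd / (1 + pct_above_average)
implied_switzerland = per_capita_usd / (1 + pct_above_switzerland)

# Back out the GDP and population consistent with the quoted totals.
implied_gdp_usd = total_spend_usd / share_of_gdp
implied_population = total_spend_usd / per_capita_usd

print(f"Implied developed-country average: ${implied_average:,.0f} per capita")
print(f"Implied Switzerland spend:         ${implied_switzerland:,.0f} per capita")
print(f"Implied US GDP:                    ${implied_gdp_usd / 1e12:.2f} trillion")
print(f"Implied US population:             {implied_population / 1e6:.1f} million")
```

On these rounded inputs, the implied comparison points come out to roughly $7,700 per capita for the developed-country average and roughly $9,300 for Switzerland, which puts the size of the gap in dollar terms.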
The challenges to be faced
The industry has a variety of large problems that it needs to confront.
The AI morass. It’s an epic adventure wading through the quagmire of advanced analytics and artificial intelligence in the healthcare space. As many of you have heard me say, let’s make AI RPA (robotic process automation) again. It’s funny until you think about it, and it very often becomes dark satire when you ask some of these companies to explain their business model.
Follow the money. Plenty of venture dollars have been used with varying degrees of responsibility, and some of the wells are quickly running dry, if not collapsing in on themselves. Consolidation, attrition and well-marketed recaps are no longer specters on the horizon. They are here and accompanied by large investors watching the space with enough dry powder to alter the landscape.
Data: The crude reality. Everyone likes to say “data is the new oil.” And like oil, data’s value depends not just on its existence, but on how (and whether) it gets refined, transported and used.
Conflicting forces, rising pressure. Right now, I believe there are several forces pulling the healthcare data market in different directions.
Vertical integration. As a producer and refiner, you have much more control over your own destiny (and use rights). The fact that many current producers also have well-established annual recurring revenue (ARR) and sticky SaaS models makes them highly attractive acquisition targets. Why wouldn’t they be? An acquirer gets the ARR and the ability to capture the value of the exhaust.
The oil rush. New entrants are driving data proliferation, and new data sources seem to be appearing in the market in droves. Everyone who touches data in any capacity believes they have found El Dorado. A minority will have a well that produces; an even smaller fraction will have high-quality extracts. The reality is that the majority are sitting on a dry well, or they didn’t draft the right fine print and didn’t get the mineral rights.
Inflated expectations. Overinflated expectations about the value of the data have historically been a challenge. This is still an issue, but it is becoming less so as more types of data become commoditized. Pricing in this space is still an art, not a science, and at the end of the day, the data is worth what a buyer is willing to pay for it. The data exhaust that is created leads to very high-margin revenue. So, if you have a free-flowing well, make hay while the sun shines.
Supply chain disruption. Ongoing supply disruptions and the ripple effects of last year’s data breach continue, pushing both buyers and suppliers into new territory. Legacy data providers have re-evaluated distribution channels and partnerships, forcing many to find new sources. This ongoing uncertainty has led buyers to build direct relationships with producers while also pushing some into vertical integrations.
New refineries emerging. Secondary resellers are working with producers to refine higher grades of data as they dive deep into clinical notes and multi-modal data sources to drive insights into specific therapeutic areas. It’s becoming apparent that while some producers excel at drilling the wells and extracting the raw crude, they may lack the clinical, technical and financial expertise to further refine that into something combustible.
Private equity consolidation. Private equity rollups are inflating prices at the pump, and scaled-up players are edging out smaller competitors. This aggregation of wells is a trend that continues to build momentum. When it comes to price increases in this space, a rising tide lifts all boats.
Cloud infrastructure evolution. Cloud infrastructure and its adoption are evolving quickly, and the ease of data access via clean rooms will enable wider access and faster consolidation of wells. Clean rooms bring the promise of processing data in place, enabling streamlined access, query, cohort building, privacy work, data harmonization and data delivery. Assuming data providers adopt this approach, this may further disrupt the supply chain and bring new assets to market faster than before.
Quality remains king. Data quality remains a foundational issue. Even when data is available, the effort to clean, harmonize and ready it for analytics varies wildly across producers, refiners, use cases and therapeutic areas.
Crude vs. refined: The AI overlay
In oil and gas, crude has limited utility. Its value multiplies only after refining. The same is true with data. Raw real-world data might check a box for coverage, but without tools to structure it — natural language processing, tokenization, annotation and the like — its usability is limited.
AI is the newest refinery. But even that isn’t a cure-all; AI models are only as good as the data they’re trained on. “Analytics-ready” is the new holy grail, but getting there takes real infrastructure, tooling and expertise.
The companies that understand both the limitations of raw data and the potential of properly refined information are positioning themselves to thrive in this evolving marketplace. The challenge isn’t just having data or even AI capabilities — it’s having the right fusion of domain expertise, technology and commercial vision.
Finding the opportunity
If we follow the oil analogy, the winning plays may lie in the following.
Infrastructure platforms that streamline privacy, access, query, cohort building and data logistics (inclusive of data delivery) are becoming increasingly valuable. As data sources proliferate, the ability to efficiently manage and deliver this information becomes a competitive advantage.
Infrastructure is not just technology. The commercial models, data contracting and governance processes are complex, time consuming and serviced by a limited number of people (and lawyers) who truly understand the dynamics. Seamless commercial enablement may prove to be the master key within this infrastructure, overlooked at one’s own peril.
These platforms will serve as the pipelines and transportation networks of our data economy — essential infrastructure that enables everything downstream. The ability to flow data and insights to end buyers through these pipelines will be critical to data consumers. Ease of use at the pump will be a clear competitive advantage.
Data refineries
Overall data quality, harmonization, annotation and data abstraction/enhancement, all needed to serve up analytics-ready data across diverse data types, are still lacking. Nobody wants to take accountability for data quality, and oftentimes the issues start at the source where the data is created, which further complicates matters.
Given the secondhand nature of de-identified data, it’s challenging to go back to the source and correct data issues. Targeted data abstraction and deep insight creation from unstructured clinical data can provide some relief; however, it’s not without additional tools and effort.
There are some examples of secondary resellers deploying tools and clinical expertise across large unstructured data assets to refine the highest grades of data in targeted therapeutic areas. These specialized refineries represent a significant opportunity in the market.
Wellhead aggregation
Stitching fragmented sources together in a way that’s usable and building economies of scale across a large array of smaller wells remains a key opportunity.
While all data is valuable, small sources of commoditized data are hard to commercialize without some level of aggregation. Additionally, for small, commoditized data assets, the costs of commercialization (privacy, tokenization and more) can quickly outweigh the commercial benefit.
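To make that trade-off concrete, here is a minimal sketch of the break-even logic. It assumes each source carries a fixed commercialization cost and that an aggregator can spread part of that cost across many sources. The function names, the cost-sharing rule and every dollar figure are hypothetical, chosen only for illustration.

```python
def standalone_margin(annual_revenue: float, commercialization_cost: float) -> float:
    """Net annual value of commercializing a single source on its own."""
    return annual_revenue - commercialization_cost


def aggregated_margin(annual_revenues: list[float],
                      shared_cost: float,
                      per_source_cost: float) -> float:
    """Net annual value when sources share one fixed cost (e.g. privacy review,
    tokenization setup) and each adds only a smaller incremental cost."""
    total_revenue = sum(annual_revenues)
    total_cost = shared_cost + per_source_cost * len(annual_revenues)
    return total_revenue - total_cost


# Hypothetical numbers: five small sources, each earning 40k a year.
revenues = [40_000.0] * 5

# Alone, each source pays the full (hypothetical) 60k commercialization cost.
solo = [standalone_margin(r, commercialization_cost=60_000.0) for r in revenues]

# Pooled, the 60k is paid once and each source adds 10k of incremental cost.
pooled = aggregated_margin(revenues, shared_cost=60_000.0, per_source_cost=10_000.0)

print("Each source alone:", solo)
print("Pooled:", pooled)
```

Under these made-up inputs every source loses money on its own, while the pooled set clears its costs. Real cost structures will differ, but the shape of the argument is the same: the fixed costs are what make small wells uneconomic in isolation.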
For example, there is still an entire universe of untapped lab data under the surface that has yet to be explored. The same is true for unstructured data across various data types and sources.
Resource exploration
Companies with data “exhaust” still have a real opportunity, if they know how to harness it. The environment is constantly changing, and understanding how to produce and commercialize data takes the right partners to maximize the value while ensuring compliance with regulatory and privacy requirements.
The real-world data landscape is at a critical juncture. As traditional sources face challenges, new technologies create possibilities and market pressures reshape business models, we’re likely to see winners and losers emerge at an accelerated pace.
Those who understand both the technical and commercial realities of healthcare data — who can extract, refine, and deliver true value rather than just raw information — will find themselves riding the next wave of industry growth.
For the rest? Those wells might run dry sooner than expected.
Scott Ponder is the founder of Channel Mark Advisors, a boutique consultancy specializing in data commercialization and strategic data sourcing in the real-world data space. He has built and commercialized multiple RWD products, driving significant revenue in the space. He acknowledges the help of Su Huang, founder and managing director of RWD Advisory LLC, for edits and moral support.