How to Maximize HEDIS Scores with Synthetic Data for Software Development

In the U.S. healthcare industry, the Healthcare Effectiveness Data and Information Set (HEDIS) serves as the primary report card for health plans. Developed and maintained by the National Committee for Quality Assurance (NCQA), HEDIS is a standardized suite of performance measures used by more than 90% of health plans to track the quality of care and service provided to members.

The stakes for these scores are high. For payers, HEDIS performance is a core component of Medicare Advantage Star Ratings. These ratings determine eligibility for Quality Bonus Payments (QBPs)—which can amount to millions of dollars annually—as well as a plan's ability to offer competitive rebates. Poor performance can lead to financial penalties and loss of enrollment.

However, organizations face a significant data roadblock. Improving HEDIS scores requires analyzing massive volumes of sensitive patient data, including claims, lab results, and clinical notes. Accessing this data for development and testing is often blocked by stringent HIPAA compliance requirements. This creates friction: engineers need data to build tools that close care gaps, but legal restrictions prevent them from using real Protected Health Information (PHI).

Tonic.ai provides a solution to this deadlock by generating high-fidelity synthetic data. This allows developers to work with data that maintains the utility of production datasets without the inherent privacy risks.

Understanding HEDIS and the quest for 5-star ratings

HEDIS measures evaluate specific clinical processes and outcomes. A typical measure asks: “Of the members diagnosed with diabetes, what percentage received a hemoglobin A1c (HbA1c) test in the last year?” To maximize a score, a plan must prove that the numerator (patients who received the service) is as close to the denominator (eligible patients) as possible.
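The numerator and denominator arithmetic can be sketched in a few lines. This is a simplified illustration, not an NCQA specification: the `Member` fields and sample members are hypothetical, and real measures add continuous-enrollment, age, and exclusion rules that are omitted here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Member:
    member_id: str
    has_diabetes: bool   # hypothetical flag: member is in the eligible population
    hba1c_tested: bool   # hypothetical flag: a test was found in the measurement year


def measure_rate(members: list[Member]) -> float:
    """Return numerator / denominator for a simplified HbA1c testing measure."""
    denominator = [m for m in members if m.has_diabetes]
    numerator = [m for m in denominator if m.hba1c_tested]
    if not denominator:
        return 0.0  # avoid dividing by zero when nobody is eligible
    return len(numerator) / len(denominator)


members = [
    Member("M1", True, True),
    Member("M2", True, False),   # eligible but untested: lowers the rate
    Member("M3", False, False),  # not diabetic: excluded from the denominator
    Member("M4", True, True),
]
rate = measure_rate(members)
print(f"{rate:.1%}")  # 66.7%
```

Two of the three eligible members were tested, so the rate is two thirds. Finding evidence for M2 would raise it to 100%.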

The domains of care

HEDIS includes more than 90 measures across several domains:

Preventive screening: Breast cancer screening, immunizations, and wellness visits.
Chronic condition management: Controlling high blood pressure and diabetes care.
Behavioral health: Follow-up after hospitalization for mental illness and antidepressant medication management.
Access/availability of care: Timeliness of prenatal and postpartum care.

The financial incentive

HEDIS scores are a major input to the CMS Star Rating system, and the financial implications hinge largely on a single threshold:

4 stars and above: Plans typically qualify for significant bonus payments and have their rebate percentage increased.
Below 4 stars: Plans lose access to these bonuses, making it difficult to offer the supplemental benefits required to attract and retain members.

Identifying care gaps

The most effective way to boost scores is through proactive care gap analysis. A care gap occurs when a patient is eligible for a service but hasn't received it. Closing these gaps requires sophisticated software that can ingest disparate data feeds and alert providers or members in real time. Building these tools, however, is where the data bottleneck begins.
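At its simplest, a care gap check compares the eligible population against the services recorded inside the measurement window. The sketch below assumes hypothetical inputs (a set of eligible member IDs and a map of last screening dates) and a calendar-year window. Production logic would follow the measure's actual lookback rules.

```python
from datetime import date

# Hypothetical inputs for illustration only.
eligible = {"M1", "M2", "M3"}
last_screening = {"M1": date(2024, 3, 14), "M3": date(2022, 11, 2)}


def find_care_gaps(eligible, last_screening, window_start, window_end):
    """Return eligible members with no screening inside the measurement window."""
    gaps = []
    for member_id in sorted(eligible):
        screened_on = last_screening.get(member_id)
        if screened_on is None or not (window_start <= screened_on <= window_end):
            gaps.append(member_id)
    return gaps


gaps = find_care_gaps(eligible, last_screening, date(2024, 1, 1), date(2024, 12, 31))
print(gaps)  # ['M2', 'M3']
```

M2 has no screening on record and M3's screening falls outside the window, so both are flagged for outreach.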

The HEDIS data logjam: complexity vs compliance

Software engineers and data scientists in the healthcare industry tasked with building HEDIS reporting engines face three primary hurdles:

1. Heterogeneous data sources

HEDIS reporting is not a single-source operation. It requires data from:

Claims databases: Billing codes that indicate procedures and diagnoses.
Electronic health records (EHRs): Detailed clinical results that claims might miss.
Lab feeds: Specific values (like blood glucose levels) necessary for outcome-based measures.
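To make the multi-source problem concrete, here is a minimal sketch that folds three feeds into one per-member view. The record shapes and field names are hypothetical and far simpler than real claims, EHR, or lab schemas. CPT code 83036 is used as an example of a claim line for an HbA1c test.

```python
from collections import defaultdict

# Hypothetical, simplified feeds; real feeds differ in schema and coding systems.
claims = [{"member_id": "M1", "code": "83036"}]
ehr_results = [{"member_id": "M2", "note": "HbA1c drawn in clinic"}]
lab_feed = [{"member_id": "M1", "test": "HbA1c", "value": 7.2}]


def build_member_view(claims, ehr_results, lab_feed):
    """Group rows from every feed under the member they belong to."""
    view = defaultdict(lambda: {"claims": [], "ehr": [], "labs": []})
    for source, rows in (("claims", claims), ("ehr", ehr_results), ("labs", lab_feed)):
        for row in rows:
            view[row["member_id"]][source].append(row)
    return dict(view)


member_view = build_member_view(claims, ehr_results, lab_feed)
for member_id, record in sorted(member_view.items()):
    print(member_id, {source: len(rows) for source, rows in record.items()})
```

In this example, M2 has clinical evidence in the EHR but no claim, which is exactly the kind of record a claims-only pipeline would miss.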

2. The risks of using real PHI

Using production data in development or staging environments is a high-risk practice. A single breach of a database containing patient histories can lead to HIPAA fines and irreparable reputational damage. Consequently, access to real-world data is typically restricted.

3. Development friction

When developers are denied access to production-grade data, they often resort to manual dummy data that lacks the complexity of real clinical records. Alternatively, they wait months for legal de-identification approvals. This red tape slows the development of AI-driven tools meant to identify care gaps, leaving potential HEDIS points—and revenue—on the table.

Transforming HEDIS reporting with Tonic Structural

Tonic Structural solves the structured data bottleneck by providing high-fidelity masking and synthesis for relational databases.

High-fidelity masking

Structural transforms sensitive claims and lab data into synthetic versions that look and act like the original. It preserves referential integrity. If a patient record is linked to five different lab results in the source database, the synthetic version will maintain those same links. This allows engineers to test complex joins and queries without seeing real patient names or Social Security numbers.
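One generic way to keep links intact while hiding identifiers is deterministic pseudonymization: the same input always maps to the same token, so foreign keys still join. The sketch below uses a keyed hash from the Python standard library. It illustrates the general idea only and is not a description of how Structural works internally. The key, table shapes, and values are hypothetical.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-managed-secret"  # hypothetical key for this sketch


def pseudonymize(value: str) -> str:
    """Map an identifier to a stable token using HMAC-SHA256."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "P" + digest[:12]


patients = [{"patient_id": "123-45-6789", "name": "Jane Doe"}]
labs = [
    {"patient_id": "123-45-6789", "test": "HbA1c", "value": 7.2},
    {"patient_id": "123-45-6789", "test": "LDL", "value": 110},
]

masked_patients = [
    {"patient_id": pseudonymize(p["patient_id"]), "name": "REDACTED"} for p in patients
]
masked_labs = [{**lab, "patient_id": pseudonymize(lab["patient_id"])} for lab in labs]

# Every masked lab row still points at a masked patient row.
patient_keys = {p["patient_id"] for p in masked_patients}
print(all(lab["patient_id"] in patient_keys for lab in masked_labs))  # True
```

Because the mapping is deterministic, joins between the masked tables return the same row counts as joins between the originals.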

Note on Digital Standards: While Structural supports traditional SQL-based claims and clinical data, it also provides native support for de-identifying FHIR (Fast Healthcare Interoperability Resources) data. Its JSON Document View allows developers to mask nested patient resources while keeping the schema valid for testing digital quality measures (dQMs).
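To show what masking a nested resource while keeping it structurally valid can mean, here is a hand-rolled sketch over a FHIR Patient resource expressed as a Python dictionary. It is not Structural's API. The `mask_patient` helper and the sample resource are hypothetical, and the choice of fields to redact is only an example.

```python
import copy

patient = {
    "resourceType": "Patient",
    "id": "example",
    "name": [{"family": "Doe", "given": ["Jane"]}],
    "birthDate": "1970-04-12",
    "address": [
        {"line": ["1 Main St"], "city": "Springfield", "state": "IL", "postalCode": "62701"}
    ],
}


def mask_patient(resource: dict) -> dict:
    """Return a masked copy; the original resource is left untouched."""
    masked = copy.deepcopy(resource)
    for name in masked.get("name", []):
        name["family"] = "REDACTED"
        name["given"] = ["REDACTED" for _ in name.get("given", [])]
    if "birthDate" in masked:
        # The FHIR date type permits year-only precision, so this stays valid.
        masked["birthDate"] = masked["birthDate"][:4]
    for address in masked.get("address", []):
        for field in ("line", "city", "postalCode"):
            address.pop(field, None)  # keep only the state
    return masked


masked = mask_patient(patient)
print(masked["birthDate"], masked["address"])  # 1970 [{'state': 'IL'}]
```

The masked resource keeps the same `resourceType` and nesting, so code that parses Patient resources continues to work against it.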

Maintaining statistical utility

For HEDIS math to be useful, synthetic data must retain the statistical distribution of the original set. If 15% of your members have Type 2 Diabetes in production, the synthetic set should reflect that same 15%. This ensures that when developers test scoring logic or dashboards, the results mirror what will happen in production. Structural offers a comprehensive library of data generators to ensure that statistical distributions and relationships within production data are maintained.
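A quick sanity check on a synthetic dataset is to compare condition prevalence against production within a tolerance. The sketch below assumes hypothetical row shapes and an arbitrary two-percentage-point tolerance. A real validation would cover many more distributions and relationships.

```python
def prevalence(rows, condition):
    """Share of rows whose condition set contains the given condition."""
    if not rows:
        return 0.0
    return sum(condition in row["conditions"] for row in rows) / len(rows)


def within_tolerance(expected, actual, tolerance=0.02):
    """True when two rates differ by no more than the tolerance."""
    return abs(expected - actual) <= tolerance


# Hypothetical datasets: 15% Type 2 Diabetes in production, 14% in synthetic.
production = [{"conditions": {"T2D"}}] * 15 + [{"conditions": set()}] * 85
synthetic = [{"conditions": {"T2D"}}] * 14 + [{"conditions": set()}] * 86

prod_rate = prevalence(production, "T2D")
synth_rate = prevalence(synthetic, "T2D")
print(within_tolerance(prod_rate, synth_rate))  # True
```

If the check fails, scoring logic tested on the synthetic set may report rates that will not hold in production.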

Accelerating the SDLC

Structural offers features like subsetting, allowing developers to create smaller, portable versions of massive healthcare databases. These targeted datasets let engineers iterate quickly in isolated environments, accelerating the software development life cycle (SDLC) for quality improvement tools.
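The core idea behind subsetting is to sample a parent table and then keep only the child rows that reference the sampled parents. The following sketch shows that idea on two in-memory tables. The `subset_with_children` helper and the table shapes are hypothetical, and this is not how Structural implements the feature.

```python
import random


def subset_with_children(patients, labs, fraction, seed=0):
    """Sample a fraction of patients and keep only their lab rows."""
    rng = random.Random(seed)  # seeded so the subset is reproducible
    k = min(len(patients), max(1, round(len(patients) * fraction)))
    chosen = rng.sample(patients, k)
    chosen_ids = {p["patient_id"] for p in chosen}
    return chosen, [lab for lab in labs if lab["patient_id"] in chosen_ids]


# Hypothetical data: 10 patients with 3 lab rows each.
patients = [{"patient_id": f"P{i}"} for i in range(10)]
labs = [{"patient_id": f"P{i % 10}", "test": "HbA1c"} for i in range(30)]

sub_patients, sub_labs = subset_with_children(patients, labs, fraction=0.2, seed=42)
print(len(sub_patients), len(sub_labs))  # 2 6
```

No lab row in the subset references a patient that was left out, so joins in the smaller database behave like joins in the full one.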

Unlocking clinical insights with Tonic Textual

While administrative claims provide some data, a significant portion of HEDIS proof is buried in unstructured doctor's notes or clinical narratives. This is the hidden data problem.

The value of unstructured data

In many cases, a patient may have received a screening, but the claim was never filed or was coded incorrectly. The only evidence exists in the narrative text of an EHR. To capture this for HEDIS reporting, organizations use Natural Language Processing (NLP) to scan notes for proof of care.
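As a toy stand-in for that NLP step, the sketch below scans notes for mentions of an HbA1c test and skips sentences with an obvious negation. It uses plain regular expressions rather than a trained model, and the term lists are hypothetical and far from complete. Real clinical NLP must also handle abbreviations, context, and dates.

```python
import re

# Hypothetical term lists for illustration only.
EVIDENCE = re.compile(r"\b(hba1c|hemoglobin a1c|a1c)\b", re.IGNORECASE)
NEGATION = re.compile(r"\b(declined|refused|not done|no)\b", re.IGNORECASE)


def has_test_evidence(note: str) -> bool:
    """True if any sentence mentions the test without an obvious negation."""
    for sentence in re.split(r"(?<=[.!?])\s+", note):
        if EVIDENCE.search(sentence) and not NEGATION.search(sentence):
            return True
    return False


notes = {
    "N1": "Pt seen for follow-up. HbA1c 7.1% drawn today.",
    "N2": "Patient declined A1c testing.",
    "N3": "Discussed diet and exercise.",
}
found = {note_id: has_test_evidence(text) for note_id, text in notes.items()}
print(found)  # {'N1': True, 'N2': False, 'N3': False}
```

Only N1 counts as evidence of care. N2 mentions the test but records that it was declined.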

Safeguarding clinical narratives

Training an NLP model requires access to thousands of clinical notes. However, these notes are saturated with PHI (names, addresses, specific dates). Tonic Textual uses proprietary Named Entity Recognition (NER) models to automatically detect and scrub these identifiers from clinical text.
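To illustrate the detect-and-replace pattern, here is a deliberately simple sketch that swaps a few well-structured identifiers for labeled placeholders. It uses regular expressions, not NER, and it is not Tonic Textual's API. Patterns like these cannot find names or free-form addresses, which is why model-based detection is needed in practice.

```python
import re

# Hypothetical patterns covering only rigidly formatted identifiers.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def redact(text: str) -> str:
    """Replace each matched identifier with a placeholder such as [SSN]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


note = "Seen on 03/14/2024. SSN 123-45-6789. Call 555-867-5309."
print(redact(note))  # Seen on [DATE]. SSN [SSN]. Call [PHONE].
```

Keeping a typed placeholder instead of deleting the text preserves sentence structure, which helps when the redacted notes are later used for model training.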

AI training for HEDIS

By using Textual, data scientists can safely train Large Language Models (LLMs) or Retrieval-Augmented Generation (RAG) systems to analyze clinical notes. These models can then be deployed to identify care evidence that would otherwise have been missed. This directly boosts the numerator of HEDIS measures by surfacing completions that never appeared in claims, so scores rise to reflect care that was already delivered.

The future of digital HEDIS and synthetic data

The NCQA is currently transitioning toward Digital Quality Measures (dQMs) and the use of Electronic Clinical Data Systems (ECDS).

Automation and interoperability: The industry is moving away from manual chart reviews toward fully automated, interoperable systems based on the FHIR (Fast Healthcare Interoperability Resources) standard.
Predictive analytics: Future HEDIS success will rely on AI to predict which members are at the highest risk of missing a screening before the measurement year ends.
Future-proofing with Tonic.ai: As these digital standards evolve, teams can use Tonic Fabricate to generate fully synthetic clinical scenarios. This allows for testing edge cases and new dQM logic long before real-world data is available, ensuring the system is ready for the audit season.
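Edge-case testing with fully synthetic records can be as simple as hand-building members that sit on the boundaries of a rule and asserting the expected outcome. The sketch below does this for a calendar-year measurement window. The records, field names, and `is_compliant` rule are hypothetical, and it does not use Tonic Fabricate.

```python
from datetime import date, timedelta

YEAR_START, YEAR_END = date(2024, 1, 1), date(2024, 12, 31)


def edge_case_members():
    """Synthetic members placed on and around the window boundaries."""
    one_day = timedelta(days=1)
    return [
        {"member_id": "SYN-1", "test_date": YEAR_START, "expected": True},
        {"member_id": "SYN-2", "test_date": YEAR_END, "expected": True},
        {"member_id": "SYN-3", "test_date": YEAR_START - one_day, "expected": False},
        {"member_id": "SYN-4", "test_date": YEAR_END + one_day, "expected": False},
        {"member_id": "SYN-5", "test_date": None, "expected": False},  # never tested
    ]


def is_compliant(test_date):
    """Simplified rule under test: the test must fall inside the window."""
    return test_date is not None and YEAR_START <= test_date <= YEAR_END


failures = [
    m["member_id"]
    for m in edge_case_members()
    if is_compliant(m["test_date"]) != m["expected"]
]
print(failures)  # []
```

An empty failure list means the rule handles every boundary case. An off-by-one error in the window comparison would surface as SYN-1 or SYN-2 in the list.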

Conclusion: build the tools to win the HEDIS race

Maximizing HEDIS scores is a data engineering challenge as much as a clinical one. While the goal is improved patient outcomes and higher Star Ratings, the primary hurdle is safe, rapid access to high-quality data.

Tonic.ai provides the infrastructure to streamline compliance and mitigate security risks. By using high-fidelity synthetic data, healthcare organizations can build, test, and deploy the AI-driven tools necessary to close care gaps and secure quality bonus payments.

Stop waiting for data access. Book a demo to start building with safe data and explore the Tonic product suite.

Frequently asked questions

Can synthetic data be used for official HEDIS audits?

No. Official HEDIS reporting and audits must be conducted using actual, audited real-world data. However, synthetic data is the industry standard for developing and testing the scoring engines, NLP models, and dashboards that produce those final reports. It ensures the systems work correctly before they touch sensitive production data.

How does Tonic ensure HIPAA compliance?

Tonic.ai provides solutions for meeting HIPAA Safe Harbor and Expert Determination standards in data de-identification. It removes the 18 specific identifiers (names, geographic subdivisions smaller than a state, dates, etc.) and uses advanced mathematical transformations to ensure that the resulting data cannot be re-identified to an individual.

What is the difference between administrative and hybrid HEDIS measures?

Administrative Measures: Use only electronic data from claims and encounters.
Hybrid Measures: Combine administrative data with a sample of medical record reviews. Because hybrid measures require clinical detail, they are much harder to calculate. Using Tonic Textual to safely process clinical notes is particularly valuable for improving scores in hybrid measures.
