
December 22, 2025

Breaking Health Data Barriers: Pitt Medical Researcher Collaborates with Health Sciences and Sports Analytics Cloud Innovation Center


By Shannon Turgeon

In electronic health records (EHRs), every piece of information (such as a lab test result) has a proprietary reference code associated with it. These reference codes are generally specific to individual hospitals and are critical to identifying data.

However, because each hospital uses different proprietary codes, reconciling them poses a significant burden, in terms of both time and resources, for medical researchers.

Christopher Horvat is very familiar with this problem.

Horvat is an associate professor of critical care medicine, of pediatrics and of biomedical informatics, School of Medicine, and associate professor of clinical and translational science at the University of Pittsburgh. He is also the senior director of clinical informatics, Pitt Department of Critical Care Medicine and UPMC ICU Service Center.

“Take something as simple as a sodium level. If you search the EHR, you’ll find 329 different variables with ‘sodium’ in the name,” he said.

“One might look right, but unless you actually query and validate it, you could be looking at a urine sodium, a sodium supplement, or finally the blood sodium you meant to find,” he explained. “This complexity is the single biggest barrier to meaningful use of EHR data, and a major reason why hospitals still struggle to share data reliably.”

Horvat brought this issue to the attention of the team at Pitt’s Health Sciences and Sports Analytics Cloud Innovation Center, powered by Amazon Web Services (AWS). This center, which opened in April, is intended to leverage cloud technology to drive advancements in the health care and sports industries.

Horvat worked with Cloud Innovation Center (CIC) intern Gary Farrell, an undergraduate computer science and data science major, and Maciej Zukowski, an Amazon Web Services Solutions Architect, to come up with a solution.

There are a few “standard” dictionaries of EHR codes, including LOINC, SNOMED CT and RxNorm. The team sought to develop a tool that could automate the initial EHR code-mapping process and match proprietary codes to these standards.

“The idea was to build a model that can identify and map data on its own. When you receive a massive EHR export with only minimal context, the model should be able to tell you exactly what each variable represents, without sending someone back into charts to manually verify everything,” said Horvat.

To accomplish this, Farrell used a process called embedding to search through standard codes in the LOINC and SNOMED databases and select the 30 code displays whose meaning is most similar to the proprietary code display. He then gave a large language model (LLM), Claude Sonnet 4.5, the proprietary code, the corresponding average value or set of categories, and the 30 preselected options. From there, the LLM selected the top three matches. Once those steps had been completed, Farrell used synthetic data sets to extensively test the model and confirm that it would apply the same language and statistical analysis to new data.
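The shortlisting step described above can be sketched in a few lines of Python. This is a minimal illustration, not the CIC team's code: the article does not say which embedding model was used, so a toy character-trigram embedding stands in for it, and the `STD-00x` codes, the `shortlist` function, and the local display string are all hypothetical.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model: character-trigram counts."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def shortlist(local_display: str, standard: dict, k: int = 30) -> list:
    """Return the k standard codes whose displays are most similar to the local one."""
    query = embed(local_display)
    scored = [(cosine(query, embed(display)), code, display)
              for code, display in standard.items()]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(code, display, round(score, 3)) for score, code, display in scored[:k]]

# Hypothetical placeholder catalog; a real run would load LOINC/SNOMED displays.
STANDARD = {
    "STD-001": "Sodium in serum or plasma",
    "STD-002": "Sodium in urine",
    "STD-003": "Potassium in serum or plasma",
    "STD-004": "Glucose in blood",
}

candidates = shortlist("SODIUM SERUM LVL", STANDARD, k=2)
for code, display, score in candidates:
    print(code, display, score)
```

In the workflow the article describes, a shortlist like `candidates` (with k set to 30) would then be handed to the LLM along with the proprietary code and its average value or categories, and the LLM would narrow it to three.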

Caption: CIC intern Gary Farrell, who worked closely with Horvat throughout the project, demonstrating the EHR code mapping tool to CIC intern Varun Shelke.

Farrell, who worked between 15 and 20 hours per week on the project, also incorporated additional features into the tool that will save medical researchers time, while still ensuring that their system is trustworthy. His model provides multiple options for what the correct code might be, ranked by how likely each is to be the match and with an explanation of why, and it achieves an accuracy of 92.7%. Medical researchers can then review what the system came up with, ensuring that a human will make the final determination on which result is correct.
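One way to picture that review step is as a small record that holds the ranked suggestions and stays unresolved until a person picks one. The sketch below is an assumption about how such a record could look, not the tool's actual data model: the class names, field names, codes, and rationales are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Suggestion:
    rank: int        # 1 = most likely match
    code: str        # standard code proposed by the model (placeholder values here)
    display: str
    rationale: str   # the model's explanation for this ranking

@dataclass
class MappingReview:
    local_code: str
    suggestions: List[Suggestion]
    accepted: Optional[Suggestion] = None  # stays None until a human decides

    def accept(self, rank: int) -> Suggestion:
        """Record the reviewer's final choice by rank."""
        for suggestion in self.suggestions:
            if suggestion.rank == rank:
                self.accepted = suggestion
                return suggestion
        raise ValueError(f"no suggestion with rank {rank}")

review = MappingReview(
    local_code="LAB_NA_01",  # hypothetical proprietary code
    suggestions=[
        Suggestion(1, "STD-001", "Sodium in serum or plasma", "name and value range fit blood sodium"),
        Suggestion(2, "STD-002", "Sodium in urine", "name fits, value range less typical"),
        Suggestion(3, "STD-003", "Potassium in serum or plasma", "specimen fits, analyte differs"),
    ],
)

chosen = review.accept(1)  # the human reviewer confirms the top-ranked option
print(chosen.code, "-", chosen.rationale)
```

Keeping `accepted` empty by default reflects the point in the text: the model only proposes, and nothing is final until a reviewer makes the call.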

This method also transforms and outputs the data into the standard format used to exchange health care information, known as Fast Healthcare Interoperability Resources (FHIR). Having the data formatted in FHIR makes it easy to share.
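To give a sense of what FHIR-formatted data looks like, here is a minimal sketch of a single lab result expressed as a FHIR Observation resource. The article does not say which FHIR resources the tool emits, so the choice of Observation, the helper name `to_fhir_observation`, and the sample value are assumptions; the LOINC code shown is meant as an illustration and should be verified against the LOINC database before any real use.

```python
import json

def to_fhir_observation(loinc_code: str, display: str, value: float, unit: str) -> dict:
    """Build a minimal FHIR Observation for one numeric lab result (illustrative only)."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {
            "coding": [{
                "system": "http://loinc.org",  # identifies the code as a LOINC code
                "code": loinc_code,
                "display": display,
            }]
        },
        "valueQuantity": {
            "value": value,
            "unit": unit,
            "system": "http://unitsofmeasure.org",  # UCUM units
            "code": unit,
        },
    }

# Illustrative values: a blood sodium result mapped to a LOINC code.
obs = to_fhir_observation("2951-2", "Sodium [Moles/volume] in Serum or Plasma", 140, "mmol/L")
print(json.dumps(obs, indent=2))
```

Because the standard code and units travel with the value, a receiving system does not need to know the sending hospital's proprietary code to interpret the result.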

“Having a tool that can automatically map variables and convert formats smoothly would be a game-changer,” Horvat said. “If you can automate the mapping, you can take on virtually any predictive analytics project, like adult heart failure, pediatric neurologic deterioration, sepsis across all ages, and cut the development time dramatically.”

The CIC team has packaged its tool so that anyone can run it with only a few commands.

For Farrell, who had no prior experience working with electronic health record data, this project has been a hands-on way to see the possibilities at the intersection of health science and technology.

“When I first started, it was kind of overwhelming to hear LOINC and SNOMED and RxNorm and all these other terms,” Farrell said. “But once we dove into it, it became a lot more clear. I think it’s neat to see the real-world impact that you can have in the health sciences.”

Last Updated: May 7, 2026
