Electronic health records (EHRs) contain important information about patients’ health outlook and the care they receive, but the records are not always precise. A new study describes an approach that uses machine learning, a type of artificial intelligence, to follow the order of events in patients’ EHRs over time and predict their likelihood of having or developing different diseases. The study was led by researchers at Massachusetts General Hospital and is published in Cell Patterns.

“Over the past decade, billions of dollars have been spent to institute meaningful use of EHR systems. For a multitude of reasons, however, EHR data are still complex and have ample quality issues, which make it difficult to leverage these data to address pressing health issues, especially during pandemics such as COVID-19, when rapid responses are needed,” said lead author Hossein Estiri, PhD, of the Mass General Laboratory of Computer Science. “In this paper, we propose an algorithm for exploiting the temporal information in the EHRs that is distorted by layers of administrative and health care system processes.”

The strategy connects information from EHRs on patients’ medications and diagnoses over time, rather than treating each record as an independent entry. Analyses revealed that this sequential approach can accurately estimate the likelihood that a patient actually has an underlying disease.

“Our study doesn’t rely on single diagnostic codes but instead relies on sequences of codes with the expectation that a sequence of relevant characteristics over time is more likely to represent reality than a single element,” Dr. Estiri said. “Additionally, the computer sorts through thousands of patients and can find sequences that a physician would likely never identify on their own as relevant, but actually are associated with the disease.”

As an example, a record of coronary artery disease followed by chest pain was more useful for predicting the development of heart failure than either factor on its own or the two factors in the reverse order.
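To make the idea of ordered sequences concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical list of (patient, date, code) records with made-up code labels, keeps only the first date each code appears for a patient (a simplification chosen for this illustration), and lists every pair of codes in which one came before the other. Counting those ordered pairs across patients shows how "coronary artery disease, then chest pain" becomes a different feature from the reverse order.

```python
from collections import Counter
from datetime import date
from itertools import combinations

# Hypothetical toy records: (patient_id, date, code). The codes are
# illustrative labels, not real diagnostic or medication codes.
records = [
    ("p1", date(2019, 1, 5), "CAD"),
    ("p1", date(2019, 3, 2), "chest_pain"),
    ("p2", date(2019, 2, 1), "chest_pain"),
    ("p2", date(2019, 6, 9), "CAD"),
    ("p3", date(2018, 11, 20), "CAD"),
    ("p3", date(2019, 1, 15), "statin"),
    ("p3", date(2019, 4, 1), "chest_pain"),
]


def ordered_pairs(records):
    """Map each patient to the set of (earlier_code, later_code) pairs.

    Assumption for this sketch: only the first occurrence of each code
    per patient is used, and same-day pairs are skipped.
    """
    first_seen = {}
    for pid, day, code in records:
        per_patient = first_seen.setdefault(pid, {})
        if code not in per_patient or day < per_patient[code]:
            per_patient[code] = day

    pairs = {}
    for pid, codes in first_seen.items():
        # Sort codes by first date so each combination is (earlier, later).
        timeline = sorted(codes.items(), key=lambda item: item[1])
        pairs[pid] = {
            (a, b)
            for (a, day_a), (b, day_b) in combinations(timeline, 2)
            if day_a < day_b
        }
    return pairs


pairs_by_patient = ordered_pairs(records)

# Number of patients in whom each ordered pair occurs.
pair_counts = Counter(
    pair for pairs in pairs_by_patient.values() for pair in pairs
)
```

In a fuller pipeline, each ordered pair could serve as a yes/no feature per patient for a predictive model. How the study itself selects and weights sequences is described in the paper and is not reproduced here.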

The method can therefore identify disease markers that are interpretable by clinicians. This could lead to new computational models for identifying and validating new disease markers and for advancing medical discoveries. The proposed way of thinking about medical records could also help identify patients in a community who are at risk of developing a variety of other diseases and recommend their evaluation by health care providers.

Paper cited: Estiri H, Strasser ZH, Klann JG, McCoy TH, Wagholikar KB, Vasey S, Castro VM, Murphy ME, Murphy SN. Transitive Sequencing Medical Records for Mining Predictive and Interpretable Temporal Representations. Cell Patterns. 18 Jun 2020.

Funding: This work was funded through the U.S. National Human Genome Research Institute.

About Massachusetts General Hospital

Massachusetts General Hospital, founded in 1811, is the original and largest teaching hospital of Harvard Medical School. The Mass General Research Institute conducts the largest hospital-based research program in the nation, with annual research operations of more than $1 billion, and comprises more than 9,500 researchers working across more than 30 institutes, centers and departments. In August 2019, Mass General was named #2 in the U.S. News & World Report list of "America’s Best Hospitals."