Clinical Question: Can an AI model accurately diagnose ST-segment elevation myocardial infarction (STEMI) using electrocardiograms (ECGs), and how does its performance compare to that of clinical physicians and ECG machine algorithms?
Paper: Lee SH, Jeon KL, Lee YJ, Ko YG, Choi D, Hong MK, et al. Development of Clinically Validated Artificial Intelligence Model for Detecting ST-segment Elevation Myocardial Infarction. Annals of Emergency Medicine. 2024. PMID: 39066765
What They Did:
The investigators developed a deep ensemble AI model using ECG data from a prospective percutaneous coronary intervention registry in Korea. Two board-certified cardiologists reviewed and classified the ECGs, and their consensus served as the gold standard. The AI model was trained on a large dataset, tested, and clinically validated, and its performance was then compared with ECG machine interpretations and clinical physicians’ diagnoses.
Population:
- Model Development Data Set: ECGs from patients who underwent percutaneous coronary intervention at Severance Hospital, drawn from the KOMATE registry.
  - The 2006-2019 ECGs were randomly split 9:1 into a training set and an internal validation set, while the 2020 ECGs served as the test set.
  - 300 ECGs were randomly selected from the 2020 test set and provided to three PGY2 internal medicine residents for the physician comparison.
- Clinical Validation Set
  - ECGs from patients with chest pain who visited the emergency department of Severance Hospital in 2020, regardless of whether percutaneous coronary intervention was performed
- Critical Pathway Cohort
  - Patients for whom the critical pathway (activation requiring the agreement of two physicians) was triggered at Severance Hospital between 2007 and 2020
- External validation using the Physikalisch-Technische Bundesanstalt (PTB)-XL dataset
  - A publicly available ECG data set
  - Two cardiologists reviewed the ECGs of patients with acute MI; those judged to show STEMI were included in this external validation set.
Exclusion:
- Absence of coronary angiography data
- Absence of an adequate ECG performed within 24 hours of percutaneous coronary intervention
- Inadequate ECG quality
Intervention:
- Application of the AI model to ECGs to diagnose STEMI.
Comparator:
- AI model’s diagnostic performance compared to:
- ECG machine interpretations.
- PGY2 internal medicine residents (based on 300 randomly selected ECGs from the 2020 test set).
Outcomes:
- Primary Outcome: Performance characteristics of the AI model in detecting STEMI.
Results:
- Model Development Data Set: ECGs from patients who underwent percutaneous coronary intervention at Severance Hospital from the KOMATE registry; 18,697 ECGs were eligible, of which 1,745 (9.3%) were classified as STEMI.
  - Training set: N=15,641
  - Internal validation set: N=1,738
  - Test set: N=1,318
- Clinical Validation Set
  - ECGs from patients with chest pain who visited the emergency department of Severance Hospital in 2020, regardless of whether percutaneous coronary intervention was performed (N=2,699)
- Critical Pathway Cohort
  - Patients for whom the critical pathway (activation requiring the agreement of two physicians) was triggered at Severance Hospital between 2007 and 2020 (N=3,307)
- External validation using the Physikalisch-Technische Bundesanstalt (PTB)-XL dataset
  - 5,991 ECGs with a normal annotation and 79 STEMI ECGs
Strengths:
- Clinical Validation Using Diverse Patient Data: The study validated the model using real-world ECG data from chest pain patients, enhancing its real-world applicability.
- Explainability through Grad-CAM: Grad-CAM visualizations highlight the model’s focus areas, increasing clinician trust and reducing the “black box” perception.
- External Validation on the PTB-XL Dataset: Testing on the PTB-XL dataset confirmed the model’s reliability across diverse patient populations and data sources.
- Deep Ensemble Model Design: The use of a deep ensemble model (a collection of multiple neural networks whose predictions are combined) reduced overfitting risk and improved the model’s reliability; see the sketch after this list.
- Classification Standards by Cardiologists: Two board-certified cardiologists classified ECGs, boosting the dataset’s diagnostic accuracy and reliability for model training.
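To illustrate what a deep ensemble does, here is a minimal sketch (our illustration, not the authors’ architecture): several independently trained networks score the same ECG and their softmax probabilities are averaged, which typically lowers variance compared with any single member. The toy model, names, and input shape below are assumptions.

```python
# Minimal deep-ensemble sketch in PyTorch; the toy CNN stands in for the
# paper's member networks, which are not specified here.
import torch
import torch.nn as nn

def make_member(n_leads=12, n_classes=2):
    # One illustrative ensemble member; real members would be deeper
    # and trained from different random seeds.
    return nn.Sequential(
        nn.Conv1d(n_leads, 16, kernel_size=7, padding=3), nn.ReLU(),
        nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        nn.Linear(16, n_classes),
    )

@torch.no_grad()
def ensemble_predict(members, ecg):
    """Average the members' softmax outputs into one probability per class."""
    probs = torch.stack([m(ecg).softmax(dim=-1) for m in members])
    return probs.mean(dim=0)  # shape: (batch, n_classes)

members = [make_member() for _ in range(5)]  # in practice: independently trained
ecg = torch.randn(1, 12, 5000)               # synthetic 12-lead ECG, 10 s at 500 Hz
print(ensemble_predict(members, ecg))        # averaged [non-STEMI, STEMI] probabilities
```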
Limitations:
- Single-Center Data: The study used data from a single center in Korea, limiting generalizability. Broader studies in diverse settings are needed.
- Inexperienced Clinician Comparison: Relatively inexperienced providers (PGY2 internal medicine residents) were used for comparison, raising questions about how the model would perform against more experienced clinicians.
- No STEMI-Equivalent Features Included: Key ECG patterns such as de Winter T waves and the Smith-modified Sgarbossa criteria were not considered. This omission may reduce sensitivity and lead to missed STEMI-equivalent presentations, increasing false negatives in critical cases.
- Lack of Clinical Outcome Analysis: The study did not analyze the effect of AI-driven diagnoses on patient outcomes.
- Confirmation Bias and Variability in Diagnostic Standards:
- The development dataset relied on PCI-confirmed cases, introducing confirmation bias as classification decisions were influenced by knowledge of prior PCI.
- Cases without PCI were excluded, potentially missing undiagnosed/mismanaged STEMIs.
- In the validation dataset, some cases lacked coronary angiography and instead relied on less definitive modalities such as CT or echocardiography, adding variability and reducing the reliability of the training and validation processes.
- Clinical Context in STEMI Identification: The AI model lacked clinical data such as patient history, symptoms, or biomarkers, limiting its ability to mimic real-world STEMI diagnosis, which requires integrating ECG and clinical findings.
- Lack of Transparency in Deriving the External Validation Set: The selection of ECGs in the external validation set was not explained, making it unclear how representative or reliable this dataset was for assessing model performance.
DISCUSSION:
Simplifying AUPRC: AUPRC (Area Under the Precision-Recall Curve) is often preferred over AUC (Area Under the ROC Curve) when there is class imbalance, such as when STEMI cases are rare in a dataset. While AUC measures overall discrimination, it can give overly optimistic results with imbalanced data. AUPRC, on the other hand, focuses specifically on the positive class (STEMI) and is more sensitive to how well the model balances precision (avoiding false positives) and recall (catching the true positives). This makes it a more realistic indicator of the model’s performance in these scenarios.
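To make the distinction concrete, the snippet below (synthetic data, not from the study; assumes NumPy and scikit-learn) builds a rare-positive dataset in which a mediocre classifier still earns a reassuring AUROC while its AUPRC reveals a much weaker precision-recall tradeoff.

```python
# AUROC vs AUPRC on an imbalanced, synthetic dataset (illustration only).
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(42)

n_neg, n_pos = 9_700, 300  # ~3% prevalence, mimicking a rare positive class
y_true = np.concatenate([np.zeros(n_neg), np.ones(n_pos)])

# A mediocre classifier: positives score only slightly higher than negatives.
scores = np.concatenate([
    rng.normal(0.0, 1.0, n_neg),
    rng.normal(1.0, 1.0, n_pos),
])

print(f"AUROC: {roc_auc_score(y_true, scores):.3f}")            # looks reassuring
print(f"AUPRC: {average_precision_score(y_true, scores):.3f}")  # much lower
# A random classifier's AUPRC baseline equals the prevalence (~0.03),
# whereas its AUROC baseline is 0.5 regardless of class balance.
```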
Simplifying Grad-CAM (Gradient-Weighted Class Activation Mapping): This tool makes the AI model more understandable by showing which parts of the ECG contributed most to the model’s decision. It highlights important segments, such as the ST segment and T wave, which are key indicators of STEMI. By visualizing this, clinicians can verify that the AI model is focusing on relevant areas, improving confidence in its recommendations. This is best represented in Figure 3 (see below), where the orange color overlies the ST segment and T wave, indicating that the AI model weighted that segment most heavily in its decision (as opposed to a section of random artifact).
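For readers curious about the mechanics, here is a generic 1D Grad-CAM sketch on a toy ECG network (a PyTorch illustration with a hypothetical TinyECGNet, not the authors’ deep ensemble): the class score’s gradients are pooled over time to weight the convolutional feature maps, and the ReLU of their weighted sum marks which samples of the tracing drove the prediction.

```python
# Generic 1D Grad-CAM over an ECG signal (illustrative toy model).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyECGNet(nn.Module):
    def __init__(self, n_leads=12, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_leads, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=7, padding=3), nn.ReLU(),
        )
        self.head = nn.Linear(64, n_classes)

    def forward(self, x):                      # x: (batch, leads, samples)
        fmap = self.features(x)                # (batch, 64, samples)
        logits = self.head(fmap.mean(dim=-1))  # global average pooling over time
        return logits, fmap

def grad_cam_1d(model, ecg, target_class):
    """Return a normalized importance curve over the ECG time axis."""
    logits, fmap = model(ecg)
    fmap.retain_grad()                               # keep gradients of the feature maps
    logits[0, target_class].backward()
    weights = fmap.grad.mean(dim=-1, keepdim=True)   # pool gradients over time
    cam = F.relu((weights * fmap).sum(dim=1))        # weighted sum of feature maps
    return (cam / (cam.max() + 1e-8)).detach()       # scale to [0, 1]

model = TinyECGNet()
ecg = torch.randn(1, 12, 5000)                 # synthetic 12-lead ECG, 10 s at 500 Hz
cam = grad_cam_1d(model, ecg, target_class=1)
print(cam.shape)  # (1, 5000): high values mark the samples driving the STEMI score
```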
However, the model is inherently constrained by the types of inputs it was trained on. Grad-CAM adds value by highlighting the ECG segments, such as the ST segment and T wave, that contribute most to the model’s decisions, but this interpretability does not compensate for the limited diversity of the model’s training inputs.
Comparison to Queen of Hearts Model: While both the current study and the Queen of Hearts model aim to enhance ECG-based myocardial infarction detection, their vastly different methodologies make direct comparisons based on sensitivity, specificity, or ROC curves inherently difficult. The Queen of Hearts study, however, stands out for its robust design and diagnostic rigor, particularly its use of well-defined outcome measures.
Unlike the current study, which relied on a single-center dataset from Korea and excluded STEMI-equivalent patterns such as de Winter T waves and the Smith-modified Sgarbossa criteria, the Queen of Hearts model incorporated subtler ECG patterns indicative of coronary occlusion, ensuring broader applicability to diverse ECG findings. Moreover, the current study compared its AI model against relatively inexperienced PGY2 internal medicine residents, limiting the clinical relevance of its findings. In contrast, the Queen of Hearts model validated its performance against two highly regarded ECG experts, providing a more rigorous and clinically meaningful benchmark.
By focusing on OMI/NOMI detection rather than traditional STEMI classification, the Queen of Hearts model moves beyond conventional criteria to address advancements in nuanced ECG interpretation. This distinction highlights the need for AI models, including the one in the current study, to evolve toward broader detection capabilities that encompass subtler, high-risk patterns.
The Promise of AI in STEMI Diagnosis: The findings highlight both the promise and limitations of using AI for STEMI diagnosis. The AI model demonstrated impressive accuracy (92.1%) and sensitivity (95.4%) in identifying STEMI, outperforming traditional ECG machine interpretations and inexperienced clinicians, particularly in sensitivity. This suggests that AI models could be essential in supporting clinicians by reducing the likelihood of missed STEMI diagnoses, a critical advancement given the high stakes associated with timely intervention.
However, the model’s real-world applicability remains in question. One major limitation is the study’s reliance on data from a single center in Korea, which may not fully capture the diversity in patient presentations across global populations. Furthermore, the model was not trained on all comers in the emergency department, many of whom do not have ACS. As a result, its performance in broader ED populations, including those without chest pain, remains unknown.
The authors appropriately stress the importance of multi-center, prospective validations to determine whether the AI model can maintain its accuracy and clinical utility across diverse settings. Without such validation, integrating AI-based decision support tools into emergency practice should proceed cautiously, with clinicians ensuring that AI outputs are verified against clinical judgment and comprehensive patient evaluations.
Authors Conclusion: “…The developed deep ensemble model for the diagnosis of STEMI achieved outstanding and well-balanced performance in both a percutaneous coronary intervention registry and a symptom-based ECG set. Grad-CAM also enhanced the explainability of the AI model and its alignment with real-world practice. Further studies with prospective validation regarding clinical benefit in a real-world setting are warranted.”
Our Conclusion:
While this AI model shows promise, several key limitations hinder its readiness for clinical use. Its reliance on single-center data and on traditional STEMI criteria rather than OMI criteria raises questions about real-world applicability. Additionally, the lack of comparison with experienced clinicians weakens confidence in its performance over current standards. Prospective validation in diverse settings is crucial before this or similar AI tools can achieve widespread adoption. Time and rigorous testing will ultimately determine which models, if any, demonstrate lasting clinical utility.
References:
- Lee SH, Jeon KL, Lee YJ, Ko YG, Choi D, Hong MK, et al. Development of Clinically Validated Artificial Intelligence Model for Detecting ST-segment Elevation Myocardial Infarction. Annals of Emergency Medicine. 2024. PMID: 39066765
- Herman R, Meyers HP, Smith SW, et al. International evaluation of an artificial intelligence-powered electrocardiogram model detecting acute coronary occlusion myocardial infarction. Eur Heart J Digit Health. 2023;5(2):123-133. Published 2023 Nov 28. PMID: 38505483
Peer Review: Anand Swaminathan MD, MPH & Marco Propersi, DO