Wearable patch ECG monitoring enables continuous long-term monitoring outside of the clinic. During a monitoring study, service providers rely on human technicians and algorithms to analyze raw data and distill clinically relevant metrics into daily and end-of-study reports for the prescribing clinician. Detection of atrial fibrillation (AFib) onset and offset and quantification of AFib burden are important aspects of this reporting and must be performed with high sensitivity and precision to support clinical decision making.
New deep learning algorithms have demonstrated impressive performance when applied to AFib detection. However, many of these algorithms do not denote AFib onset/offset, and very few have been validated on large, diverse, real-world datasets. Rigorous validation is particularly important for deep learning algorithms because of their capacity to “memorize” training data. Memorization results in algorithms that do not perform well when presented with ECG that differs significantly from the training data. For example, algorithms that fail to generalize may perform well on ECG that contains 100% AFib or 100% normal sinus rhythm but perform poorly when presented with rhythm transitions.
The following describes the Preventice BeatLogic™ deep learning platform for detecting and classifying cardiac arrhythmias. Real-world validation demonstrates performance of the algorithm for sinus rhythm and AFib using data that contains transitions into and out of AFib with varying durations.
Two deep learning models form the foundation of the Preventice BeatLogic™ platform. The first detects and classifies beats and noise; the second detects and classifies rhythms. Both models use a convolutional neural network with residual connections and incorporate a repeating series of layers: 1-D convolution, batch normalization, nonlinear activation, and dropout (Figure 1). Dropout is excluded following the first and last convolutions, and classification is performed by a fully connected layer followed by a softmax layer. The rhythm classification model produces a prediction every one second, and the beat detection/classification model produces a prediction every 0.125 seconds. Predictions from each model are generated within the context of a one-minute slice of ECG, enabling each 1 or 0.125 second classification to gather context from the entire slice. Results from the two models are merged in post-processing to create contiguous rhythm and artifact annotations.
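The prediction grid and the step from per-second outputs to contiguous annotations can be illustrated with a minimal sketch. The step sizes and slice length come from the description above; the function names, the label strings, and the use of simple run-length grouping are assumptions made for illustration and are not the platform's actual post-processing.

```python
from itertools import groupby

RHYTHM_STEP_S = 1.0   # one rhythm prediction per second (from the text)
BEAT_STEP_S = 0.125   # one beat/noise prediction per 0.125 s (from the text)
SLICE_S = 60          # one-minute context slice (from the text)


def predictions_per_slice(step_s, slice_s=SLICE_S):
    """Number of predictions a model emits for one slice."""
    return round(slice_s / step_s)


def to_episodes(labels, step_s=RHYTHM_STEP_S):
    """Collapse a per-step label sequence into (label, onset_s, offset_s) runs.

    Hypothetical helper: consecutive identical labels become one episode.
    """
    episodes = []
    index = 0
    for label, run in groupby(labels):
        length = len(list(run))
        episodes.append((label, index * step_s, (index + length) * step_s))
        index += length
    return episodes


# Hypothetical per-second rhythm output for a 10-second excerpt.
labels = ["SINUS"] * 4 + ["AFIB"] * 5 + ["SINUS"]
print(predictions_per_slice(RHYTHM_STEP_S))  # 60
print(predictions_per_slice(BEAT_STEP_S))    # 480
print(to_episodes(labels))
# [('SINUS', 0.0, 4.0), ('AFIB', 4.0, 9.0), ('SINUS', 9.0, 10.0)]
```

Under these assumptions, AFib onset and offset fall on the boundaries of the grouped runs, so their temporal resolution is bounded by the one-second rhythm step.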
ECG training data was gathered from more than 10,000 patients who were monitored using the BodyGuardian® Heart (BGH) device. Rhythm labels were annotated and adjudicated by 3 certified ECG technicians, each with more than 5 years of experience. ECG records were captured from the mobile cardiac telemetry platform, which receives ECG segments that are generally 1 to 4 minutes in duration. Continuous and discontinuous atrial fibrillation were represented approximately equally in the dataset.
ECG validation data was gathered from 512 patients who were monitored using the BodyGuardian® Heart (BGH) device. From the mobile cardiac telemetry platform, more than 2,500 ECG records were pseudorandomly captured to ensure rhythm and patient diversity within the validation data set. The large pool of candidate ECG records was filtered down so that the final validation data set contained at least 20 examples of each rhythm called by the Preventice BeatLogic™ platform. Rhythm labels were adjudicated by 3 board-certified electrophysiologists. No patient crossover was allowed between the training and validation data sets.
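One common way to guarantee that no patient appears in both data sets is to partition by patient identifier rather than by record. The sketch below is illustrative only: the source does not describe how the partition was made, and the function name, the identifier format, and the validation fraction are all hypothetical.

```python
import random


def split_by_patient(records, validation_fraction=0.2, seed=0):
    """Split records so that each patient lands in exactly one partition.

    `records` is a list of (patient_id, record_id) pairs; both names are
    hypothetical stand-ins for whatever identifiers a real data set uses.
    """
    patient_ids = sorted({patient_id for patient_id, _ in records})
    rng = random.Random(seed)  # seeded so the split is reproducible
    rng.shuffle(patient_ids)
    n_validation = max(1, round(len(patient_ids) * validation_fraction))
    validation_patients = set(patient_ids[:n_validation])
    train = [r for r in records if r[0] not in validation_patients]
    validation = [r for r in records if r[0] in validation_patients]
    return train, validation


# Hypothetical data: 10 patients with 4 ECG records each.
records = [(f"patient-{i % 10}", f"record-{i}") for i in range(40)]
train, validation = split_by_patient(records)
train_patients = {p for p, _ in train}
validation_patients = {p for p, _ in validation}
assert train_patients.isdisjoint(validation_patients)  # no patient crossover
```

Splitting at the record level instead would allow records from the same patient on both sides, which can inflate validation performance when a model has memorized patient-specific morphology.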
Algorithm validation was performed in accordance with the EC57 guidelines for assessing cardiac rhythm measurement algorithms. This includes quantification of sensitivity and precision at the episode level (event detection) and at the duration level (detected event overlap).
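The difference between episode and duration statistics can be made concrete with a simplified sketch. It assumes that episodes are given as (onset, offset) pairs in seconds, that any nonzero overlap counts as a match, and that episodes within one list do not overlap each other. It omits the detailed rules of the EC57 standard, and the function names and example intervals are hypothetical.

```python
def overlap_s(a, b):
    """Overlap in seconds between two (start_s, end_s) intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def episode_metrics(truth, detected):
    """Episode sensitivity and precision.

    An episode counts as matched if it overlaps any episode in the other list.
    """
    hit_truth = sum(any(overlap_s(t, d) > 0 for d in detected) for t in truth)
    hit_detected = sum(any(overlap_s(d, t) > 0 for t in truth) for d in detected)
    sensitivity = hit_truth / len(truth) if truth else None
    precision = hit_detected / len(detected) if detected else None
    return sensitivity, precision


def duration_metrics(truth, detected):
    """Duration sensitivity and precision from total overlapping seconds."""
    shared = sum(overlap_s(t, d) for t in truth for d in detected)
    truth_total = sum(end - start for start, end in truth)
    detected_total = sum(end - start for start, end in detected)
    sensitivity = shared / truth_total if truth_total else None
    precision = shared / detected_total if detected_total else None
    return sensitivity, precision


# Hypothetical AFib episodes in seconds, as (onset, offset).
truth = [(10.0, 50.0), (100.0, 110.0)]
detected = [(12.0, 50.0), (200.0, 205.0)]
print(episode_metrics(truth, detected))   # (0.5, 0.5)
print(duration_metrics(truth, detected))  # 38 s shared of 50 s truth, 43 s detected
```

In this example the missed 10-second episode halves episode sensitivity but costs only 10 of 50 seconds at the duration level, which is why the two views are reported separately.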
The Preventice BeatLogic™ platform achieved state-of-the-art AFib and sinus rhythm classification performance. Sensitivity was evaluated for AFib truth durations longer than 0, 30, and 60 seconds, and precision was evaluated for AFib algorithm detections with durations longer than 0, 30, and 60 seconds. Episode sensitivity and precision both reached 100% for AFib events with duration longer than 1 minute.
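The effect of a minimum-duration threshold on episode sensitivity can be sketched as follows. The intervals are invented for illustration and are not study data; the function names and the any-overlap matching rule are likewise assumptions.

```python
def overlaps(a, b):
    """True if two (start_s, end_s) intervals share any time."""
    return min(a[1], b[1]) > max(a[0], b[0])


def episode_sensitivity(truth, detected, min_duration_s=0.0):
    """Episode sensitivity over truth episodes longer than min_duration_s."""
    kept = [t for t in truth if t[1] - t[0] > min_duration_s]
    if not kept:
        return None  # no truth episodes exceed the threshold
    found = sum(any(overlaps(t, d) for d in detected) for t in kept)
    return found / len(kept)


# Hypothetical truth episodes of 12 s, 45 s, and 120 s; the 12 s one is missed.
truth = [(0.0, 12.0), (100.0, 145.0), (300.0, 420.0)]
detected = [(101.0, 144.0), (298.0, 421.0)]
for threshold in (0, 30, 60):
    print(threshold, episode_sensitivity(truth, detected, threshold))
```

Raising the threshold removes short truth episodes from the denominator, so a detector that misses only brief bouts scores higher at 30 and 60 seconds than at 0. Precision is thresholded the same way, but on the durations of the detected episodes.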
The Preventice BeatLogic™ platform achieved state-of-the-art performance for detection and classification of both sinus rhythm and AFib, denoting onset and offset to within a few heartbeats. AFib sensitivity increased as the minimum duration for true AFib was increased to 30 and 60 seconds, indicating that while the platform performs well on short bouts of AFib, it is, much like humans, better at correctly classifying longer durations. Within the academic literature, most deep learning algorithms for AFib detection (1) fail to validate using real-world data, (2) do not demonstrate robust generalization by testing on a large, unique patient data set, and/or (3) do not follow the standard EC57 guidelines for validation. These failures undermine published measures of performance and are addressed in this work.