Discussion
In this study, we determined that the participating health systems had used adequate numbers of ThermoCool catheters and could obtain data of sufficient quality from their electronic information systems to evaluate the safety and effectiveness of cardiac ablation catheters. This feasibility study yielded several important lessons that will inform not only our investigations of the ablation catheters in question but also the use of RWD generally for medical device-specific studies.
A key strength was the successful use of a decentralised model for research.8 All three health systems retained their data behind their individual firewalls, but data were collected using common definitions that will enable research using distributed analytics. This will provide a much larger sample size than would be available from a single health system. The infrastructure was additionally built for two similar studies (focused on persistent AF and ischaemic VT); these commonalities suggest that a reusable infrastructure can be created for answering multiple research questions about real-world medical device safety and effectiveness.
Several challenges were also encountered in the feasibility stage of this research, along with insights that have led us to conclude that the challenges are all addressable. The first challenge was that diagnostic codes lacked sufficient resolution for creating patient cohorts, which may necessitate expanding the data types used for computed phenotypes. ICD-9-CM codes lack specificity for subtypes of AF. Because our ablation indication of interest was persistent AF, this lack of specificity prevents analysis of specific phenotypes of atrial fibrillation and would require alternative extraction approaches. Fortunately, since the ICD-10-CM transition (1 October 2015), diagnosis codes have provided additional detail on AF subtypes. However, many patients had non-specific ICD-10-CM codes recorded; if the specific AF subtype is not documented, these patients may be dropped because they cannot be determined to have persistent AF. One option to address this challenge is to create and validate an algorithm (eg, one that includes the use of antiarrhythmic medications) that identifies patients with a higher probability of having persistent AF. We can also use the Mayo Clinic cardiac ablation registry, a quality improvement database that captures most ablations with nurse-abstracted data, to validate our coding algorithms for identifying persistent AF. Additionally, Mercy has previously validated natural language processing algorithms to identify patients with persistent AF in Mercy’s notes text. To ensure a consistent definition of AF, we started with the ICD-10-CM diagnosis code recorded at the time of the ablation procedure. If there were multiple codes, we considered refining the definition based on disease progression as part of future decision-making.
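As an illustration of the coding problem described above, the following minimal Python sketch groups ICD-10-CM codes recorded at the time of ablation by AF subtype and flags patients whose codes are non-specific or conflicting. The subtype prefixes follow ICD-10-CM category I48; the function names and the status labels are hypothetical and are not the algorithm used in this study.

```python
from typing import Iterable, Optional

# ICD-10-CM category I48 prefixes (dots removed) mapped to an AF subtype.
# Flutter codes (I48.3, I48.4, I48.92) are deliberately left unmapped.
AF_SUBTYPE_BY_PREFIX = {
    "I480": "paroxysmal",
    "I481": "persistent",            # covers I48.1, I48.11 and I48.19
    "I482": "chronic_or_permanent",  # covers I48.2, I48.20 and I48.21
    "I4891": "unspecified",
}


def af_subtype(code: str) -> Optional[str]:
    """Return the AF subtype for one ICD-10-CM code, or None if not an AF code."""
    normalised = code.strip().upper().replace(".", "")
    for prefix, subtype in AF_SUBTYPE_BY_PREFIX.items():
        if normalised.startswith(prefix):
            return subtype
    return None


def persistent_af_status(codes_at_ablation: Iterable[str]) -> str:
    """Summarise the codes at the ablation encounter (hypothetical labels)."""
    subtypes = {af_subtype(code) for code in codes_at_ablation} - {None}
    if not subtypes:
        return "no_af_code"
    specific = subtypes - {"unspecified"}
    if not specific:
        return "unspecified_only"   # candidate for medication- or NLP-based rules
    if specific == {"persistent"}:
        return "persistent"
    if "persistent" in specific:
        return "conflicting"        # candidate for refinement by disease progression
    return "other_subtype"
```

Patients returned as "unspecified_only" or "conflicting" are the ones for whom a validated supplementary algorithm, registry linkage or notes-based review would be needed before cohort assignment.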
Similarly, while there are codes for identification of VT ablation, there are no specific ICD-9-CM or ICD-10-CM codes for identification of ablation of ischaemic VT. Patients were determined to have ischaemic VT using ICD-9-CM and ICD-10-CM diagnosis codes for ischaemic heart disease,17 on the assumption that these codes in a patient’s history would indicate an ischaemic aetiology of VT. Both the accuracy of these diagnostic codes for ischaemic heart disease and the appropriate look-back period are unclear.
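The effect of the look-back period can be made explicit in code. The sketch below is an assumption-laden illustration: it treats any ICD-10-CM code in the I20 to I25 block (ischaemic heart diseases) as evidence of ischaemic aetiology, which may differ from the code list in reference 17, and the function names and the `lookback_days` parameter are hypothetical.

```python
from datetime import date
from typing import Iterable, Optional, Tuple

# ICD-10-CM block I20-I25 (ischaemic heart diseases); an assumed code list.
IHD_CATEGORIES = {"I20", "I21", "I22", "I23", "I24", "I25"}


def is_ischaemic_heart_disease(code: str) -> bool:
    """True if the code's three-character category falls in I20-I25."""
    return code.strip().upper()[:3] in IHD_CATEGORIES


def has_ischaemic_history(
    diagnoses: Iterable[Tuple[str, date]],
    ablation_date: date,
    lookback_days: Optional[int] = None,
) -> bool:
    """Check for an ischaemic heart disease code on or before the ablation date.

    lookback_days=None uses all available history; otherwise only diagnoses
    recorded within that many days before the ablation are considered.
    """
    for code, diagnosis_date in diagnoses:
        days_before = (ablation_date - diagnosis_date).days
        if days_before < 0:
            continue  # recorded after the procedure, so not part of the history
        if lookback_days is not None and days_before > lookback_days:
            continue  # outside the chosen look-back window
        if is_ischaemic_heart_disease(code):
            return True
    return False
```

Running the same cohort through several values of `lookback_days` shows how sensitive the ischaemic VT classification is to this choice, which is one way to inform the unresolved question of which window to use.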
Second, using health system data to create a cohort of patients who underwent the procedure of interest with a specific device presents challenges. Ascertainment of pertinent covariates using prior diagnosis codes may be limited unless patients were cared for routinely within a health system prior to the procedure. Missing data for this reason has been termed EHR-discontinuity and can risk biasing studies through misclassification of confounding variables and outcomes.18 A solution is to use an algorithm to identify patients with high data completeness in the EHR.18 For patients receiving ablation within a referral centre, chart histories of covariates (eg, prior ablation) may not be in discrete fields and may therefore be incompletely ascertained if only automated extraction is used, which would limit accurate risk adjustment. However, these data may be available to a manual abstractor. Another potential solution is linkage with insurer claims data, such as Medicare fee-for-service claims, or registry data. (There are currently multiple national AF ablation registries and sometimes local registries, such as the Mayo Clinic cardiac ablation registry.) Another challenge is that other covariates that describe how the catheter was used may not be in the EHR or captured by current coding algorithms; this is particularly important as different operators may use different ablation techniques, and ablation techniques, such as the use of high-power short-duration ablation, may evolve over time. A solution is to use procedural data captured by the ablation technology during the procedure, which records numerous parameters employed by the operator, including power, contact force, lesion duration, continuous versus point-by-point ablation and lesion sets, none of which are captured with ICD codes. Additionally, important covariates such as centre ablation volume and operator experience can be obtained from health system data.
Third, ensuring that patients have sufficient follow-up within the given health system is important for long-term outcome ascertainment, particularly at regional referral centres. In-hospital complications will nearly always be captured. However, postdischarge, patients may receive care at outside health systems. If they experience a complication, they may present to the nearest emergency department, which may be geographically distant from the centre where the ablation procedure was performed. Follow-up decreased progressively from 7 days to 30 days to 6 months and then to 1 year. We found that at least two-thirds of patients had follow-up at 1 year at all three participating health systems. Follow-up beyond 1 year is likely to be challenging. However, because we did not capture care at outside health systems, our data may undercount pertinent clinical conditions or events, including hospitalisations; although possible, we do not think that such missed events are likely to occur often when patients continue to see clinicians within the same health system. Low follow-up data capture rates in individual health system EHRs must, nevertheless, be identified. Strategies to improve data capture, such as through patient-centred digital health data sharing platforms that track outcomes in multiple EHR systems,19 could increase the amount of available follow-up data. Another possible solution is identification of a subpopulation that likely has follow-up within the health system where the ablation is performed (eg, within a close geographic radius or receiving primary and/or cardiology care within the health system). Finally, linkage to claims or registry data could provide additional follow-up.
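Follow-up capture of the kind described above can be summarised per health system with a short calculation. The sketch below assumes one operational definition that the study text does not specify: a patient counts as followed at a given window if their last in-system encounter falls on or after that many days postablation. The window lengths in days and all names are illustrative assumptions.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

# Assumed day counts for the four follow-up windows discussed in the text.
WINDOWS_DAYS = {"7 days": 7, "30 days": 30, "6 months": 182, "1 year": 365}


def follow_up_proportions(
    patients: List[Tuple[date, Optional[date]]],
) -> Dict[str, float]:
    """Proportion of patients with in-system follow-up at each window.

    Each patient is (ablation_date, last_encounter_date); the second element
    is None when no postablation encounter exists in the EHR.
    """
    if not patients:
        raise ValueError("at least one patient is required")
    proportions = {}
    for label, days in WINDOWS_DAYS.items():
        retained = sum(
            1
            for ablation_date, last_encounter in patients
            if last_encounter is not None
            and (last_encounter - ablation_date).days >= days
        )
        proportions[label] = retained / len(patients)
    return proportions
```

Under this definition the proportions cannot increase with longer windows, which matches the progressive decline reported here; restricting `patients` to a subpopulation (eg, those living near the centre) allows the same summary to be compared across candidate inclusion rules.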
Fourth, although only a small number of patients were identified with the four outcomes of interest, the PPVs of diagnosis codes for identifying the safety outcomes varied across outcomes and across healthcare systems, despite filtering on primary discharge diagnosis codes. For example, the PPVs for identification of stroke at one health system were relatively low, which may have resulted from the inability of the algorithm to distinguish patients with acute strokes from those with a history of prior stroke. Using primary discharge diagnosis codes for hospitalisations for serious events like stroke may be less subject to misclassification than codes from outpatient visits. However, some patients with peri-procedural safety events may be missed by reliance solely on primary diagnosis codes, since safety events that occur during the procedure may be entered as secondary diagnoses while the primary diagnosis is used for the morbidity that led to the procedure. Both primary and secondary diagnosis codes should likely be used. In mid-term and long-term evaluation, events like heart failure that have a high background rate in the target population may be difficult to classify as either a prior event or a treatment-related occurrence using diagnostic codes alone. Finally, additional clinically important outcomes, such as atrio-oesophageal fistula and pulmonary vein stenosis, which are expected to be identified only at longer duration of follow-up, also need to be identified.
There are multiple possible solutions for improving accuracy of outcome ascertainment, which are similar to the strategies to address cohort identification. For example, prior studies have validated a variety of diagnosis codes against chart review in electronic health record studies.11 Additionally, tools for phenotype development and evaluation using machine learning approaches are increasingly being made available to assist these efforts.20 Ultimately, comparing the diagnostic codes or algorithms (composed of several codes for presence, absence, timing, setting and coincident diagnoses and treatments) with clinician review of electronic health records to determine the extent of concordance between codes and clinical judgement may be necessary. Our study had the resources to examine only PPVs, and it sometimes identified smaller numbers of adverse events than would be expected based on the peer-reviewed literature. A limitation is that we did not assess negative predictive values, sensitivity, specificity and accuracy of outcome ascertainment among patients with adequate follow-up within health systems to ensure that outcomes are not missed. If coding algorithms fail to perform adequately in this regard, an alternative approach to safety event identification will be considered. A group of patients at high risk of the outcome of interest who test negative on the detection algorithm will require manual review, which can be time-consuming. Such reviews should examine specific sections of a patient’s EHR data, such as clinical notes, where a given clinical event would be recorded if it had occurred. For example, cardiology or cardiac electrophysiology notes should capture ablation-related complications. This approach has been used successfully in validating algorithms to detect opioid overdose21 and addiction22 in postmarketing studies overseen by the FDA.
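The distinction between PPV and the measures not assessed here follows directly from the two-by-two table of algorithm result against chart review. The sketch below is a generic illustration with hypothetical names: PPV needs only algorithm-positive patients (true and false positives), whereas sensitivity, specificity and negative predictive value also require chart review of algorithm-negative patients.

```python
from typing import Dict, Optional


def validation_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, Optional[float]]:
    """Standard validation measures from chart-review-confirmed counts.

    tp/fp: algorithm-positive patients confirmed / not confirmed on review.
    fn/tn: algorithm-negative patients with / without the event on review.
    A measure is None when its denominator is zero.
    """

    def ratio(numerator: int, denominator: int) -> Optional[float]:
        return numerator / denominator if denominator else None

    total = tp + fp + fn + tn
    return {
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, total),
    }
```

If only algorithm-positive charts are reviewed, `fn` and `tn` are unknown, so PPV is the only one of these measures that can be estimated; this is why reviewing a sample of high-risk, algorithm-negative patients is needed to check that outcomes are not missed.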
In our study, the clinicians who performed the chart review may, in some cases, have been the same as those who performed the procedures, which could bias results towards undercounting of adverse events. While this potential source of bias is not critical in a feasibility study of this type, efforts will be needed to ensure that methods are unbiased and consistent across different sites in a study of device effectiveness and safety.
Finally, effectiveness outcomes need to be carefully chosen to ensure that they are meaningful patient-oriented metrics. While we did not evaluate efficacy end points in this study, for future research, we decided to use clinical outcomes that reflect sequelae of arrhythmia recurrence: for AF, this includes rehospitalisation for AF or interventions addressing a new atrial tachyarrhythmia, including cardioversion, repeat ablation or new antiarrhythmic drug prescription; for ischaemic VT, this includes hospitalisation for VT or repeat ablation as well as heart failure, since that can be a sequela of VT. These are meaningful measures of effectiveness since the goal of ablation is to prevent additional treatment for an arrhythmia or serious consequences thereof, primarily heart failure, recurrent VT and mortality. Arrhythmia ICD diagnosis and procedure codes alone (ie, AF or VT codes) may have low PPVs for outcome identification because clinicians may add codes for past events to follow-up visits to document medical history or support reimbursement. In the case of AF, for example, physicians may maintain the diagnosis for clinic visits if patients are continued on therapeutic anticoagulation for thromboembolism prophylaxis. Even relying on primary diagnosis codes may not be sufficient; manual chart review may be necessary to improve PPV, so a high PPV may not be easily achievable. We found that PPVs using this approach for arrhythmia-related hospitalisation varied across health systems (27%, 43% and 84%); however, because the numbers assessed were small, the CIs around the point estimates of several outcomes overlapped.
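The point about small numbers and overlapping CIs can be illustrated with a standard interval for a proportion. The sketch below uses the Wilson score interval, which is one common choice and not necessarily the method used in this study; the counts in the usage note are hypothetical and are not study data.

```python
import math
from typing import Tuple


def wilson_interval(confirmed: int, reviewed: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a PPV of confirmed / reviewed (z=1.96 for ~95%)."""
    if reviewed <= 0:
        raise ValueError("reviewed must be positive")
    if not 0 <= confirmed <= reviewed:
        raise ValueError("confirmed must lie between 0 and reviewed")
    p = confirmed / reviewed
    denominator = 1 + z**2 / reviewed
    centre = (p + z**2 / (2 * reviewed)) / denominator
    half_width = (
        z * math.sqrt(p * (1 - p) / reviewed + z**2 / (4 * reviewed**2)) / denominator
    )
    return centre - half_width, centre + half_width


def intervals_overlap(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]
```

With hypothetical counts of 3 of 10 and 6 of 10 confirmed events, the two intervals overlap despite a 30-percentage-point difference in the point estimates; with 30 of 100 and 60 of 100 they do not. Differences in PPV between sites should therefore be interpreted cautiously until larger numbers of charts have been reviewed.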
Additionally, for patients with AF, a blanking period (ie, a time period during which arrhythmia recurrences are not counted) needs to be considered, given the high recurrence rate during this 60-day or 90-day period postablation.23 Other data sources that could be helpful for outcome ascertainment, such as results from electrocardiograms, outpatient rhythm monitors or cardiac implantable electronic device (ie, pacemaker, implantable cardioverter defibrillator or implantable loop recorder) interrogations, were evaluated but could not feasibly be obtained from the EHR databases used in our study; however, methods are being developed to import these data in standardised formats.
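Applying the blanking period described above amounts to a date filter on candidate recurrence events. The following minimal sketch assumes that events on the final day of the blanking period are still excluded, a boundary convention the text does not specify; the function name and the `blanking_days` parameter are hypothetical.

```python
from datetime import date
from typing import Iterable, List


def recurrences_after_blanking(
    ablation_date: date,
    event_dates: Iterable[date],
    blanking_days: int = 90,
) -> List[date]:
    """Return recurrence dates that fall after the blanking period, sorted.

    Events on or before ablation_date + blanking_days are not counted
    (assumed convention); events dated before the ablation are also dropped.
    """
    if blanking_days < 0:
        raise ValueError("blanking_days must be non-negative")
    return sorted(
        event_date
        for event_date in event_dates
        if (event_date - ablation_date).days > blanking_days
    )
```

Running the same event list with `blanking_days=60` and `blanking_days=90` gives a simple sensitivity analysis of how the choice of blanking period changes the number of counted recurrences.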