Background Systematic reviews (SRs) of computer-assisted (CA) total knee arthroplasty (TKA) and total hip arthroplasty (THA) report conflicting evidence on its superiority over conventional surgery. Little is known about the quality of these SRs; variability in their methodological quality may be a contributing factor. We evaluated the methodological quality of all SRs published to date, and summarized and examined the consistency of the evidence generated by these SRs.
Methods We searched four databases through December 31, 2018. A MeaSurement Tool to Assess systematic Reviews 2 (AMSTAR 2) was applied to assess the methodological quality. Evidence from included meta-analyses on functional, radiological and patient-safety outcomes was summarized. The corrected covered area was calculated to assess the overlap between SRs in including the primary studies.
Results Based on AMSTAR 2, confidence was critically low in 39 of the 42 included SRs and low in 3 SRs. Low rating was mainly due to failure in developing a review protocol (90.5%); providing a list of excluded studies (81%); accounting for risk of bias when discussing the results (67%); using a comprehensive search strategy (50%); and investigating publication bias (50%). Despite inconsistency between SR findings comparing functional, radiological and patient safety outcomes for CA and conventional procedures, most TKA meta-analyses favored CA TKA, whereas most THA meta-analyses showed no difference. Moderate overlap was observed among TKA SRs and high overlap among THA SRs.
Conclusions Despite conclusions of meta-analyses favoring CA arthroplasty, decision makers adopting this technology should be aware of the low confidence in the results of the included SRs. To improve confidence in future SRs, journals should consider using a methodological assessment tool to evaluate the SRs prior to making a publication decision.
- orthopedic devices
- technology assessment, biomedical
- health care quality, access, and evaluation
- robotic surgical procedures
- health technology
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
What is already known on the subject?
Systematic reviews of computer-assisted total knee and hip arthroplasty report conflicting evidence on its superiority over conventional arthroplasty.
Little is known about the quality of these systematic reviews (SRs); variability in their methodological quality may be a contributing factor.
What are the new findings?
Most of the SRs showed that computer-assisted (CA) arthroplasty is equivalent to or better than conventional knee and hip arthroplasty; however, the confidence in the included SRs ranges from critically low to low.
There is a plethora of outcomes measures and inconsistency in reporting outcomes in SRs.
How might these results affect future research or surgical practice?
They highlight the need to conduct a high-quality SR to inform the decision on adopting CA knee and hip arthroplasty.
Journals should consider using a methodological assessment tool (eg, A MeaSurement Tool to Assess systematic Reviews 2) to assess the quality of SRs.
To strengthen evidence synthesis related to Total Knee Arthroplasty and Total Hip Arthroplasty outcomes, standardized outcome measures such as those recommended by the Outcome Measures in Rheumatology Trials Total Joint Replacement Working Group should be used and reported.
Instability and loosening of the implant are among the most common reasons for revisions of total knee arthroplasty (TKA),1 and total hip arthroplasty (THA),2 and are mainly due to inaccurate positioning of the implant and malalignment of the limb.3 Computer-assisted (CA) arthroplasty, whether navigation or robotic systems, is proposed as an alternative to improve the accuracy of implant positioning and reduce malalignment4 by providing intraoperative feedback to the surgeons before cutting the bones.5 The navigation system guides the surgeon during the operation,6 whereas the robot system operates on patients to ensure precise cutting of the bones.6
Utilization of CA arthroplasty has been steadily increasing over the past few years in the USA. For example, CA TKA increased from 0.37% in 2005 to 2.32% in 2012, with an average increase of 0.26% per year.7 CA arthroplasty is associated with a steep learning curve (10–20 cases) for the surgeon, and significant costs for equipment and continuous maintenance for hospitals.8 9 With concerns about overutilization of joint replacement,10 investment in new technologies should be supported by high-quality evidence to justify the use of societal resources.11
Multiple systematic reviews (SRs) have been conducted to compare CA TKA and THA to conventional approaches; however, the results of these SRs are conflicting.12 For example, Shi and colleagues conducted a meta-analysis on the alignment outcomes of conventional versus CA TKA and suggested no difference,13 whereas Rebal and colleagues found improved alignment outcomes with CA TKA.14 Both were published in the same year, suggesting potential inconsistency in the methodology of conducting these SRs.
SRs and meta-analyses provide the highest level of evidence15 and should be of high quality. However, little is known about the quality of SRs comparing CA and conventional approaches. We conducted an umbrella review to (1) evaluate the methodological quality of these SRs and (2) summarize and examine the consistency of the evidence generated by these SRs.
Structure of the umbrella review
An umbrella review systematically evaluates and collects evidence from multiple SRs on all outcomes for which these have been conducted.16 To develop our umbrella review, we followed the steps outlined in the Cochrane Handbook and other methodological papers on conducting umbrella reviews.17–19 A protocol was developed before conducting this review. We developed a comprehensive search strategy to include all SRs and meta-analyses comparing CA to conventional TKA and THA. We included both TKA and THA SRs because both procedures are elective orthopaedic procedures on the rise,20 and provide long-lasting joints that are effective in alleviating pain and regaining function for patients with end-stage osteoarthritis.21 22 Moreover, TKA and THA are often considered together in reimbursement policies. However, we summarized the results separately for TKA and THA because surgical outcomes may differ by joint type. We executed the study selection, data extraction and quality assessment of the SRs in duplicate. We used the validated AMSTAR 2 tool to assess the methodological quality of the included SRs.23 To summarize and examine the consistency of the evidence, we compared conclusions from meta-analyses for outcomes common across more than one meta-analysis. We also calculated the corrected covered area (CCA)24 to assess the level of overlap between meta-analyses in including the same pool of primary studies, since high levels of overlap should produce more consistent conclusions. We used Covidence SR software (Veritas Health Innovation, Melbourne, Australia; available at www.covidence.org).
We searched MEDLINE, EMBASE, the Cochrane Database and Epistemonikos to identify SRs published through May 2017 comparing CA-TKA and THA versus conventional TKA and THA. The search strategy combined keywords (eg, knee arthroplasty, hip arthroplasty) with subject heading terms (eg, surgery, CA, arthroplasty, replacement, knee, hip), and specialized clinical queries for SRs. We also searched the gray literature (eg, conference proceedings, reports, and doctoral theses). We reran the search strategy to include the rest of 2017 and the whole year of 2018. See online supplementary appendix A for details.
Screening and selection of SRs
To exclude irrelevant citations, one reviewer (MMH) screened the titles and abstracts of all citations. Full-text articles of the remaining citations were retrieved and assessed independently by two reviewers (MMH and MZ). Included reviews satisfied the following inclusion criteria: they were SRs as defined by the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) statement,25 26 and explicitly compared CA to conventional procedures. To identify any potential studies not identified in the database searches, we contacted authors of the included studies, and searched the bibliographies. Two reviewers (MMH and MZ) independently extracted data from the included SRs. The extracted data were general information about the SR (eg, year of publication, journal, and sources of funding) as well as details about the interventions, design, and main findings of the studies included in the reviews.
Assessment of methodological quality
After agreement on study inclusion, two reviewers (MMH, MZ) independently assessed the methodological quality of the included reviews using ‘A MeaSurement Tool to Assess systematic Reviews 2’ (AMSTAR 2). In case of disagreement, consensus was reached by discussions mediated by the senior author (HMKG).
First developed in 2007 (as AMSTAR) to evaluate only the methodological quality of SRs that synthesize evidence from randomized trials, this appraisal tool was further developed in 2017, as AMSTAR 2, to expand its use to SRs of randomized trials and non-randomized studies.23 Since its release, AMSTAR 2 has been used widely in umbrella reviews.27–33 AMSTAR 2 comprises 16 domains, 7 of which are critical because flaws in them strongly undermine confidence in the conclusions of an SR: 1 domain is related to protocol registration, 2 are related to the search strategy (adequacy and justifying studies’ exclusion), 2 are related to the assessment of risk of bias of the included studies and its effect on SR conclusions, 1 is related to the method of evidence synthesis, and 1 is related to publication bias (table 1).23 The overall confidence in the results of an SR is rated in four categories: high (no or one non-critical weakness), moderate (more than one non-critical weakness), low (one critical flaw with or without non-critical weaknesses), and critically low (more than one critical flaw with or without non-critical weaknesses).23 AMSTAR 2 is a valid and reliable instrument, similar to other appraisal tools of SRs.34 35
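The rating rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the tool's official implementation: the function name `amstar2_confidence` and its two integer inputs are assumptions introduced here, and the lowest category is labelled "critically low", the wording used in the Results. It also assumes the reviewer has already counted the critical flaws and non-critical weaknesses for a given SR.

```python
def amstar2_confidence(critical_flaws: int, noncritical_weaknesses: int) -> str:
    """Map counts of flaws to an overall confidence category.

    Hypothetical helper: the inputs are the number of critical domains
    judged flawed and the number of non-critical domains judged weak.
    """
    if critical_flaws < 0 or noncritical_weaknesses < 0:
        raise ValueError("counts must be non-negative")
    if critical_flaws > 1:
        # More than one critical flaw, with or without non-critical weaknesses
        return "critically low"
    if critical_flaws == 1:
        # Exactly one critical flaw, with or without non-critical weaknesses
        return "low"
    if noncritical_weaknesses > 1:
        # No critical flaw, more than one non-critical weakness
        return "moderate"
    # No critical flaw and at most one non-critical weakness
    return "high"


# Example: an SR with two critical flaws and three non-critical weaknesses
print(amstar2_confidence(2, 3))
```

Because critical flaws are checked first, any number of non-critical weaknesses cannot raise or lower a rating once a critical flaw is present, which mirrors the "with or without non-critical weaknesses" wording of the categories.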
Summary and assessment of the consistency of the evidence
First, we summarized the evidence from the SRs that conducted meta-analyses. We categorized the outcomes reported by SRs into functional, radiological, and patient safety related and others. Then, we assessed the overlap between those meta-analyses in using the same primary studies by calculating the CCA.24 CCA assesses over-representation bias induced by using the same primary studies in different meta-analyses. As such, a higher CCA suggests that the evidence summarized in an umbrella review is more likely to support the results of the primary studies included in multiple meta-analyses.24 CCA uses the number of the included meta-analyses, the number of the primary publications including the duplications, and the number of the primary publications after removing the duplications.24 A CCA value of ≤5% indicates slight overlap, 6%–10% indicates moderate overlap, 11%–15% indicates high overlap, and >15% indicates very high overlap.24
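As a worked illustration of the three inputs named above, the sketch below computes the CCA and maps it to the overlap categories. The text does not spell out the formula; the code assumes the form in which the CCA is commonly stated, (N − r) / (r × c − r), where N is the number of included publications counting duplicates, r the number of unique primary publications, and c the number of meta-analyses. The function names and the example counts are hypothetical and are not taken from this review.

```python
def corrected_covered_area(n_inclusions: int, n_unique: int, n_reviews: int) -> float:
    """Return the CCA as a proportion, assuming CCA = (N - r) / (r*c - r).

    n_inclusions: primary publications counted with duplicates (N)
    n_unique:     primary publications after removing duplicates (r)
    n_reviews:    number of meta-analyses (c)
    """
    if n_reviews < 2 or n_unique < 1:
        raise ValueError("need at least two reviews and one primary publication")
    if not n_unique <= n_inclusions <= n_unique * n_reviews:
        raise ValueError("N must lie between r and r*c")
    return (n_inclusions - n_unique) / (n_unique * n_reviews - n_unique)


def overlap_category(cca_percent: float) -> str:
    """Map a CCA percentage to the overlap categories given in the text.

    Values falling between the stated bands (e.g. 5.5) are assigned to the
    next higher band here; that tie-breaking choice is an assumption.
    """
    if cca_percent <= 5:
        return "slight"
    if cca_percent <= 10:
        return "moderate"
    if cca_percent <= 15:
        return "high"
    return "very high"


# Hypothetical example: 5 meta-analyses, 20 unique primary studies,
# 30 inclusions in total once duplicates are counted.
cca = corrected_covered_area(n_inclusions=30, n_unique=20, n_reviews=5)
print(f"{cca * 100:.1f}% -> {overlap_category(cca * 100)}")  # 12.5% -> high
```

The denominator r × c − r is the number of additional inclusions that would occur if every meta-analysis included every primary study, so the CCA is 0 when no study appears twice and 1 when all meta-analyses share the same pool.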
After deduplication, our initial search yielded 442 citations (figure 1). After screening the titles and abstracts, we excluded 330 citations not meeting the inclusion criteria. We retrieved the full texts of the remaining 112 citations for detailed full-text screening. After examining the full texts, we excluded 73 articles for not meeting our inclusion criteria (online supplementary appendix B). We also searched the gray literature and screened the references of the included studies and added two articles not captured by the original search strategy. Also, we contacted experts in the field resulting in one more article eligible for inclusion. As a result, we included 42 SRs.12–14 36–74
Description of the included SRs
The publication years of the included SRs ranged from 2004 to 2018, with most of the SRs (78%) published between 2011 and 2018. Four SRs were published in languages other than English: one German,40 one Korean,62 and two Mandarin.54 68 Of the 42 SRs, 3 compared conventional to CA modalities of both TKA and THA,38 52 71 9 addressed THA,36 42 49 53 55 59 63 70 74 and the rest addressed TKA. The approach to evidence synthesis was as follows: 7 SRs synthesized the evidence qualitatively,38 40 44 52 56 63 66 7 SRs conducted meta-analysis and qualitative evidence synthesis,37 45 67 70 71 73 74 and the remaining 28 SRs conducted only meta-analysis. Regarding the intervention, four SRs compared minimally invasive (MI) CA TKA to MI TKA,39 64–66 one SR compared MI THA, CA THA and conventional THA,63 four SRs compared robotic THA to conventional THA,38 52 70 71 and five SRs compared robotic TKA to conventional TKA.38 52 66 71 73 The remaining SRs compared CA navigation arthroplasty versus conventional surgery.
Methodological quality of the included SRs
Based on AMSTAR 2, confidence was rated critically low in the results of 39 SRs and low in 3 SRs. Low confidence was attributed to reasons such as 38 (90.5%) SRs not reporting development of a protocol, and 34 (81%) SRs not providing a list of the excluded studies and not justifying the exclusion. In addition, of the 28 SRs that included non-randomized primary studies, 24 (85.7%) did not account for the risk of bias when interpreting the results. Figure 2 shows the prevalence of critical flaws and non-critical weaknesses across the included SRs. Table 2 shows a detailed rating of the critical flaws and non-critical weaknesses for each SR.
Summary and consistency of the evidence
Consistency of the conclusions from meta-analyses.
Three functional outcomes were compared for TKA.14 45 60 64 67 69 71 72 Knee Society Scores: two out of eight SRs showed that CA TKA had superior scores,14 60 while six out of eight SRs showed no difference.45 64 67 69 71 72 Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC): one out of three SRs showed slightly improved scores with CA TKA,60 while two out of three SRs showed no difference.71 72 Range of motion: two out of two SRs showed postoperative improvement with CA TKA.64 69 Two SRs reported meta-analysis results for THA and found no significant difference in the functional scores (Harris Hip Score (HHS), Merle d'Aubigne Hip Score, and Japanese Orthopedic Association Score) between CA and conventional THA (figure 3).70 71
Six radiological outcomes were compared for TKA (figure 3). Mechanical axis malalignment: 12/15 SRs showed significantly less malalignment with CA TKA12 14 37 43 46–48 51 54 57 60 66 73 whereas 3/15 SRs showed no significant difference.13 64 65 Coronal plane femoral malalignments: six out of eight SRs showed significantly less malalignments with CA TKA,12 14 43 47 57 60 whereas two out of eight SRs showed no difference.13 64 Coronal plane tibial component outliers: five out of six SRs showed significantly fewer outliers,12 14 43 57 60 while one out of six SRs showed no difference.13 Sagittal femoral component malalignment: two out of two SRs showed significantly less malalignment with CA TKA.12 57 Femoral slope malalignment: one out of one SR showed significantly less slope with CA TKA.60 Tibial slope malalignment: two out of three SRs showed significant difference in favor of CA TKA,12 60 while one out of three SRs showed no difference.57
Seven radiological outcomes were compared for THA. Cup positioning outside the safe zone: significantly reduced with CA THA in five out of five SRs.36 42 55 59 70 Number of outliers of acetabular cups outside the desired alignment range: two out of two SRs showed no significant difference.49 55 Cup inclination: one out of four SRs showed improved inclination in the navigated group,74 while three out of four SRs reported no significant difference.36 42 55 Cup anteversion: one out of four SRs reported improved anteversion,74 while three out of four SRs reported no difference.36 42 55 Postoperative dislocation: one out of two SRs reported significant reduction with CA THA,59 while one out of two SRs reported no significant difference.36 Reduction in the leg length discrepancy: one out of two SRs showed significant reduction within the navigated group,36 while one out of two SRs showed no significant difference.70 Heterotopic ossification: one out of one SR reported a higher rate in patients who underwent conventional THA.70
Patient safety related and other outcomes
Six patient safety related outcomes were compared for TKA (figure 3). Complications and adverse events: two out of three SRs showed no difference,39 46 while one out of three SRs showed that CA TKA is associated with fewer complications and adverse events.60 Postoperative blood loss and calculated blood loss: one out of one SR showed less blood loss with CA TKA.50 Allogenic blood transfusion rate: one out of one SR showed no difference.50 Operative blood loss: two out of two SRs showed no difference.60 69 Hematocrit value after surgery: one out of one SR showed no difference.50 Tourniquet time: one out of one SR showed decreased tourniquet time with conventional TKA.60 Table 3 provides details on TKA outcome measures.
Four patient safety related outcome measures were compared for THA (figure 3). Operative time: one out of two SRs reported a significantly longer time with the navigated procedures,36 while the other SR showed no significant difference.70 Deep venous thrombosis (DVT): one out of one SR concluded no significant difference.36 Joint infection: one out of one SR concluded no difference.59 Total complication rate: one out of one SR showed a higher rate in patients who underwent conventional THA.70 Table 4 provides details on THA outcome measures.
Overlap between SRs
For TKA, the number of primary studies included in the meta-analyses was 180 (468 before removal of duplicates) resulting in a CCA of 8%, which indicates moderate overlap (online supplementary appendix C). For THA, the number of primary studies included in the meta-analyses was 23 (36 before removal of duplicates), resulting in a CCA of 13.2%, which indicates high overlap (online supplementary appendix D).
We aimed to evaluate the methodological quality of SRs comparing CA and conventional arthroplasty using the AMSTAR 2 tool and summarize and examine the consistency of the evidence generated by these SRs. Our umbrella review identified 42 SRs. We found low confidence in the evidence provided by 3 SRs, and critically low confidence in the evidence provided by the remaining 39 SRs. Most SRs concluded that CA procedures had generally better radiological and similar functional outcomes compared with conventional procedures. However, depending on the outcome, discrepancy in the conclusions of the SRs varied significantly. Patient safety related outcomes were infrequently reported in the included SRs. Over-representation of the primary studies was shown by the moderate overlap among TKA SRs, and high overlap among THA SRs. These conclusions have implications for policy makers evaluating and adopting this technology, and for journals considering future SRs for publication.
We found that most of the included SRs showed that CA procedures are equivalent to or better than conventional ones, which may have been used to support the increase in utilization of CA THA and TKA.7 However, given the critically low confidence in the conclusions of these SRs, we caution that these findings should not be used to support further adoption of this technology. Moreover, the published SRs included little data on patient safety related outcomes, which creates a major gap in the assessment of the technology, especially knowing that THA and TKA are among the top seven orthopaedic procedures with the highest complication rates.75 While the US Food and Drug Administration approved the use of navigation systems, postmarket surveillance is still needed to minimize unintended consequences, as was the case with metal-on-metal hip resurfacing, which proved costly and unsafe.76 77
There is a plethora of outcome measures and inconsistency in reporting outcomes in SRs. This finding highlights the need to standardize the outcomes reported by both the primary studies and SRs,78–80 in order to synthesize the evidence more comprehensively and meaningfully for technology assessment and guidelines development. To address this, core domains for clinical trials of TKA and THA have been developed by the Outcome Measures in Rheumatology Trials (OMERACT) Total Joint Replacement Working Group;81 82 however, those core domains are not yet fully represented in trials and SRs of CA TKA and THA. For example, the included SRs in this review did not report measures related to the patient satisfaction, revision, and death domains, and only a few reported on the adverse events domain.
By synthesizing evidence from RCTs and other comparative non-randomized studies,83 SRs provide much needed data for the evaluation of medical devices. The Idea, Development, Exploration, Assessment and Long-term study (IDEAL) framework allows robust evaluation of surgical innovations based on their stage of development.84 85 Although 35 of the 42 SRs were published after the publication of the IDEAL framework in 2009, none reported the IDEAL stage of the primary studies. We attribute the under-reporting of the IDEAL framework in the included SRs mainly to the lack of awareness of its existence and its value, but partly also because SRs are perceived as outside of the scope of the framework.86 We suggest including SRs in the IDEAL framework as they have the potential to inform the evaluation and assessment phases depending on the robustness of the SRs and the quality of the primary studies.
Our findings also have important implications for journals considering SRs, in general, and on this topic in particular. Since the availability of the Quality of Reporting of Meta-analyses (QUOROM) statement in 1999 and the PRISMA statement in 2009,25 87 most journals require adherence to these guidelines to improve the reporting quality of SRs. Despite the enforcement of these reporting requirements, confidence was low or critically low in all included SRs in our study. Therefore, to enhance the confidence in the evidence synthesized by SRs, journals may consider requiring authors to abide by a methodological assessment tool (eg, AMSTAR 2) in addition to PRISMA guidelines. We suspect that many of these additional requirements will not be burdensome to authors. In our umbrella review, many of the unfulfilled requirements for AMSTAR 2 were administrative in nature (eg, presence of a protocol, availability of a list of excluded studies and reasons for exclusion) and can easily be addressed to increase the transparency and raise the confidence level in future SRs.
To our knowledge, this is the first umbrella review evaluating all published SRs comparing CA and conventional total joint arthroplasty. We followed the umbrella review guidelines strictly, and conducted the study selection, data extraction, and quality appraisal in duplicate. We then summarized the evidence in a structured way. We also assessed the overlap bias, an important step usually under-reported by umbrella reviews.24 Nevertheless, we must mention the limitations of our study. First, although we developed a protocol to help plan for our review, we did not register it, a step that would have provided more methodological strength for our review. Second, despite extensive efforts to identify all relevant SRs without language restrictions, it is still possible that we missed some SRs. Third, due to the absence of a reliable method of quantitatively synthesizing the evidence from multiple meta-analyses, we narratively summarized the evidence. Fourth, our extraction and assessment relied on the available manuscripts and supplemental materials. While we tried to contact the journals and authors to request specific missing information, not all of them responded with clarification or additional information. Therefore, we cannot eliminate the possibility of underestimating the methodological quality for some studies because of the lack of access to relevant information.
Based on the findings of this review, we call for high quality SRs that can be used with great confidence to inform the decision on using CA TKA and THA. In addition, we encourage journals publishing SRs to use a methodological assessment tool to assess the quality of SRs. Finally, we advocate for standardization of the reported outcome measures for CA TKA and THA to facilitate evidence synthesis and outcome research.
Our umbrella review of 42 SRs found low methodological quality of the SRs undermining the confidence in the evidence synthesized by those reviews. Despite fairly high levels of overlap between the SRs in the primary studies examined, we found inconsistency in the results of the SRs tackling TKA and THA. Our findings suggest the need to improve the methodological quality of studies synthesizing evidence in this area to better inform clinical practice.
Contributors MMH and HMKG conceived and designed the study. MMH and MZ acquired the data. MZ and MMH performed the analyses. MMH, MZ and HMKG drafted the manuscript. All authors critically reviewed the manuscript and approved the final submitted version.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests MB reports personal fees and other from Medacta, other from Zimmer Biomet, outside the submitted work. All other authors have no competing interests to declare.
Patient consent for publication Not required.
Provenance and peer review Not commissioned; externally peer reviewed.