Underpowered randomized clinical trials (RCTs), which have been criticized as unethical and as inadequate tests of an underlying hypothesis, accounted for approximately 21% of phase III RCTs of adult rheumatologic diseases published in English during 2001–2002, according to a new literature review in the November issue of the Journal of Rheumatology.1
In fact, only half of the negative or indeterminate phase III rheumatology RCTs (and 79% of trials overall) published during this period had an adequate sample size, according to lead researcher Helen Keen, MD, of Queen Elizabeth Hospital in Adelaide, Australia.
"A significant number of RCTs in rheumatic disease are of limited clinical value, with concomitant ethical issues," Dr. Keen and colleagues conclude. "Both investigators and institutional review boards have a responsibility to prospective study participants to determine the adequate sample size that will answer their research question, and to be realistic about their ability to recruit and retain enough participants in a given time frame prior to study commencement."
Moreover, the authors assert, "journal editors have a responsibility to ensure that authors adequately address power issues in their reports so the wider rheumatology community can assess the likelihood of a type II error in an RCT with negative results."
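For readers less familiar with these calculations, the sketch below illustrates the kind of prospective sample-size estimate investigators and review boards are being asked to make. It is not drawn from the Keen et al. study; the effect size, significance level, and power threshold are assumed, conventional values, and the calculation uses the Python statsmodels library.

```python
# Illustrative sketch only -- not a calculation from the Keen et al. study.
# It shows a standard prospective sample-size estimate with assumed inputs:
# a moderate standardized effect (Cohen's d = 0.5), a two-sided alpha of 0.05,
# and the customary 80% power threshold.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_arm = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80,
                                 alternative='two-sided')
print(f"Participants needed per arm: {n_per_arm:.0f}")  # roughly 64 per arm
```

Because the required sample grows roughly with the inverse square of the expected effect, halving the assumed effect size to d = 0.25 would roughly quadruple the number of participants needed, which is why optimistic effect-size assumptions so often translate into underpowered trials.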
The researchers reviewed the sample sizes of 205 phase III trials in rheumatology. Of these, 119 were positive, 81 were negative, and the remaining 5 were indeterminate. Of the 86 negative or indeterminate RCTs, 37 reported sample size calculations, and all but 4 of those were adequately powered, the study showed.
Of the 49 remaining negative or indeterminate trials that did not report power calculations, only 10 were adequately powered. Taken together, these 43 underpowered trials account for the approximately 21% of the 205 RCTs reviewed. The majority of the underpowered negative studies involved rheumatoid arthritis and osteoarthritis; few concerned rare rheumatologic diseases.
While underpowered trials can be useful in meta-analyses, "a meta-analysis is only as good as its individual trials," says Vibeke Strand, MD, biopharmaceutical consultant and adjunct clinical professor in the division of immunology at Stanford University School of Medicine in Palo Alto, California. "The intra-study group comparisons need to be made first, and then across groups; thus, underpowered studies will remain problematic."
Solutions to this issue can include "study design, conservative sample size calculations, and conservative designs that don't ask too many questions," she suggests, adding that obstacles include patient accrual, unanticipated changes in medical practice, and ascertainment bias.
Results likely underestimate true prevalence of underpowered RCTs
In an editorial accompanying the new study,2 Scott D. Halpern, MD, PhD, of the Center for Clinical Epidemiology and Biostatistics at the University of Pennsylvania School of Medicine in Philadelphia, points out that the new findings probably vastly underestimate the true prevalence of underpowered RCTs.
For one, Dr. Halpern writes, underpowered trials are more likely to yield negative findings, and negative findings are less likely to be published. Moreover, the authors calculated power only for negative and indeterminate RCTs, but assumed that all positive trials were appropriately powered. However, "positive RCTs might also be underpowered and simply get lucky by finding an unexpectedly large treatment difference, thereby yielding statistically significant results," he writes.
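Dr. Halpern's point can be made concrete with a simple simulation (our illustration, not an analysis from either paper): when a true but modest treatment effect is studied with only 30 patients per arm, most trials come up negative, and the minority that do reach statistical significance tend to do so by overestimating the treatment difference.

```python
# Minimal simulation of the "lucky" underpowered trial scenario.
# All numbers are assumptions chosen for illustration: a true effect of
# d = 0.3, 30 patients per arm, and 10,000 simulated two-arm trials.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_d, n_per_arm, n_trials = 0.3, 30, 10_000

observed_diff, significant = [], []
for _ in range(n_trials):
    treated = rng.normal(true_d, 1.0, n_per_arm)   # treated arm outcomes
    control = rng.normal(0.0, 1.0, n_per_arm)      # control arm outcomes
    _, p = stats.ttest_ind(treated, control)
    observed_diff.append(treated.mean() - control.mean())
    significant.append(p < 0.05)

observed_diff = np.array(observed_diff)
significant = np.array(significant)

print(f"Share of trials reaching p < 0.05 (power): {significant.mean():.0%}")
print(f"True standardized effect: {true_d:.2f}")
print(f"Mean observed effect among significant trials: "
      f"{observed_diff[significant].mean():.2f}")
```

In a typical run of this sketch, only about one trial in five reaches significance, and those that do overstate the true effect by roughly a factor of two on average, which is precisely the kind of "lucky" positive result that a review assuming all positive trials were adequately powered would miss.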
"The fact that even the best-case scenario is unfavorable suggests a continued need for exploration into, and public notification of, the problem of underpowered clinical trials," Dr. Halpern continues. "The issue may appear to be a dead horse, but I believe it is still worth beating."
Dr. Strand urges caution in interpreting results from RCTs, "especially in major journals, which print post-hoc safety analyses and not the primary predefined efficacy analysis."
The publication of underpowered trials carries many dangers, Dr. Strand tells CIAOMed, pointing to the Celecoxib Long-term Arthritis Safety Study (CLASS) and the Vioxx® GI Outcomes Research Trial (VIGOR), both of which were underpowered to detect cardiovascular events.
Lee S. Simon, MD, rheumatologist and associate clinical professor of medicine at Harvard Medical School in Boston, Massachusetts, says that the new findings are "likely the tip of the iceberg, [as] very rarely do you actually see the presentation of the powering of the trial."
In fact, Dr. Simon asserts, "99% of the people who read these articles don't know the implication of powering." Medical journals play a role in perpetuating this problem, Dr. Simon says, noting that journals unfortunately have become an arm of the popular press. "At one time, [clinical trials data were] published in the journals for us, the scientific and clinical community, to assess, but now it is as if they are one arm of marketing. Thus, misinterpretation of data is rampant, expression of partial data de rigueur, and worse, the evidence is 'peer-reviewed' and thus supposedly OK."
However, he notes, "peer review can only review what is submitted, so wrong conclusions are often reached for the wrong reasons."
References
1. Keen HI, Pile K, Hill CI. The prevalence of underpowered randomized clinical trials in rheumatology. J Rheumatol. 2005;32:2083-2088.
2. Halpern SD. Adding nails to the coffin of underpowered trials. J Rheumatol. 2005;32:2065-2066.