Meta-Analysis and Reproducibility in Economics Research

Paper Session

Sunday, Jan. 8, 2017 1:00 PM – 3:00 PM

Hyatt Regency Chicago, Toronto
Hosted By: American Economic Association
  • Chair: Edward Miguel, University of California-Berkeley

How Often Should We Believe Positive Results?

Eva Vivalt, Australian National University
Aidan Coville, World Bank

Abstract

High false positive and false negative reporting probabilities (FPRP and FNRP) reduce the veracity of the available research in a particular field, undermining the value of evidence to inform policy. However, we rarely have good estimates of false positive and false negative rates since both the prior and study power are required for their calculation, and these are not typically available or directly knowable. We leverage AidGrade’s data set of 647 impact evaluations in development economics and complement this by gathering estimates of priors and reasonable minimum detectable effects of various intervention-outcome combinations from policymakers, development practitioners and researchers in order to generate estimates of the FPRP and FNRP rates in development economics.
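
To make the calculation concrete, here is a minimal Python sketch assuming the standard definitions: the FPRP is the probability that a statistically significant result reflects no true effect, and the FNRP is the probability that a non-significant result misses a true effect, each computable from exactly the two inputs the abstract highlights (a prior and study power) plus a significance level. The numeric values below are illustrative placeholders, not estimates from the paper.

```python
# Minimal sketch of the standard false positive / false negative report
# probability calculations. Prior and power values are illustrative
# placeholders, not the paper's estimates.

def fprp(prior, power, alpha=0.05):
    """P(no true effect | significant result)."""
    return alpha * (1 - prior) / (alpha * (1 - prior) + power * prior)

def fnrp(prior, power, alpha=0.05):
    """P(true effect | non-significant result)."""
    beta = 1 - power
    return beta * prior / (beta * prior + (1 - alpha) * (1 - prior))

# Example: a 30% prior that the intervention works, 50% power.
print(fprp(prior=0.3, power=0.5))  # ~0.19
print(fnrp(prior=0.3, power=0.5))  # ~0.18
```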

External Validity in United States Education Research

Sean Tanner, University of California-Berkeley

Abstract

As methods for internal validity improve, methodological concerns have shifted toward assessing how well the research community can extrapolate from individual studies. Under recent federal granting initiatives, over $1 billion has been awarded to education programs that have been validated by a single randomized or natural experiment. If these experiments have weak external validity, scientific advancement is delayed and federal education funding might be squandered. By analyzing trials clustered within interventions, this research describes how well a single study’s results are predicted by additional studies of the same intervention, and how well study samples match the interventions’ target populations. I find that U.S. education trials are conducted on samples of students who are systematically less white and more socioeconomically disadvantaged than the overall student population. Moreover, I find that effect sizes tend to decay in the second and third trials of interventions.
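
As a purely illustrative aside, one simple way to examine effect-size decay across replications is to regress effect sizes on trial order with intervention fixed effects. The sketch below is hypothetical: the column names and data are invented for illustration, and it is not the paper's specification.

```python
# Hypothetical sketch of one way to check for effect-size decay across
# replications: regress effect sizes on trial order within intervention.
# Data and column names are illustrative, not from the paper.
import pandas as pd
import statsmodels.formula.api as smf

trials = pd.DataFrame({
    "intervention": ["A", "A", "A", "B", "B", "B"],
    "trial_order":  [1, 2, 3, 1, 2, 3],
    "effect_size":  [0.30, 0.22, 0.15, 0.25, 0.20, 0.18],
})

# Intervention fixed effects absorb level differences; the slope on
# trial_order measures average decay from one trial to the next.
fit = smf.ols("effect_size ~ trial_order + C(intervention)", data=trials).fit()
print(fit.params["trial_order"])  # negative => later trials find smaller effects
```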

Aggregating Distributional Treatment Effects: A Bayesian Hierarchical Analysis of the Microcredit Literature

Rachael Meager, Massachusetts Institute of Technology

Abstract

This paper develops methods to aggregate evidence on distributional treatment effects from multiple studies conducted in different settings, and applies them to the microcredit literature. Several randomized trials of expanding access to microcredit found substantial effects on the tails of household outcome distributions, but the extent to which these findings generalize to future settings was not known. Aggregating evidence on sets of quantile effects poses challenges beyond those of average effects, because the aggregated effects must imply monotonic quantile functions and efficient aggregation requires passing information across quantiles. Using a Bayesian hierarchical framework, I develop new models to aggregate distributional effects and assess their generalizability. For continuous outcome variables, the methodological challenges are addressed by applying transforms to the unknown parameters. For partially discrete variables such as business profits, I use contextual economic knowledge to build tailored parametric aggregation models. I find generalizable evidence that microcredit has a negligible impact on the distribution of various household outcomes below the 75th percentile, but above this point there is no generalizable prediction. Thus, while microcredit typically does not lead to worse outcomes at the group level, there is no generalizable evidence on whether it improves group outcomes. Households with previous business experience account for the majority of the impact and see large increases in the right tail of the consumption distribution.
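
One standard device for the monotonicity issue the abstract mentions is to transform the unknown parameters so that any unconstrained values imply an increasing quantile function, for example by modelling log-increments. The sketch below illustrates only that transform idea under simplifying assumptions; it is not the paper's hierarchical model.

```python
# Minimal sketch of one standard device for keeping aggregated quantile
# functions monotone: parameterize the first quantile directly and the
# remaining levels through exponentiated increments. Illustrative only.
import numpy as np

def quantiles_from_params(q0, log_increments):
    """Map unconstrained parameters to a monotone quantile vector."""
    increments = np.exp(log_increments)       # strictly positive steps
    return q0 + np.concatenate([[0.0], np.cumsum(increments)])

# Unconstrained parameters (e.g., draws from an MCMC routine) always
# yield monotone quantiles, so pooling can happen on this scale.
theta = np.array([-1.2, 0.4, 0.1])            # illustrative draws
print(quantiles_from_params(q0=2.0, log_increments=theta))
# -> increasing vector, approximately [2.00, 2.30, 3.79, 4.90]
```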

Why Economics is Weak and Biased

Tom Stanley, Hendrix College
John P. A. Ioannidis, Stanford University
Chris Doucouliagos, Deakin University

Abstract

This paper investigates two critical dimensions of the credibility of empirical economics research: statistical power and bias. We survey 64,076 economic estimates from 159 empirical economics literatures, drawn from more than 6,700 empirical studies, and use this extensive quantitative survey to calculate statistical power and likely bias. Taking a ‘conservative’ approach (that is, one prone to over-estimate power), the median of these 159 median powers is no more than 18% and likely closer to 10%. Furthermore, in half of these areas of research, 90% of reported findings are underpowered (relative to the widely accepted 80% convention), and 20% of the areas consist entirely of underpowered studies. Other disciplines are similarly underpowered: for example, the median power among 14,886 meta-analyses of medical research is only 8% (Turner et al., 2013).
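
For reference, surveys of this kind typically compute retrospective power with a normal approximation: the power of an estimate with a given standard error to detect a presumed true effect at the 5% level. A minimal sketch under that standard approximation (the specific numbers are illustrative):

```python
# Normal-approximation power of a study to detect a presumed true effect
# at the two-sided 5% level.
from scipy.stats import norm

def power(true_effect, se, alpha=0.05):
    z = norm.ppf(1 - alpha / 2)                 # 1.96 for alpha = 0.05
    ncp = abs(true_effect) / se                 # noncentrality parameter
    return norm.cdf(ncp - z) + norm.cdf(-ncp - z)

# Example: a study whose standard error equals the presumed true effect
# has only ~17% power, well below the 80% convention.
print(power(true_effect=1.0, se=1.0))           # ~0.17
```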

Low power makes economic findings mixed and predictably unreliable. “Not only do underpowered studies lead to a confusing literature but they also create a literature that contains biased estimates of effect” (Maxwell, 2004, p. 161). Focusing on statistical power leads to a new empirical estimator of effect: the weighted average of the adequately powered (WAAP). WAAP uses optimal WLS weights and reduces reporting bias without making any assumption about the cause, distribution, or model of selection bias. Lastly, we employ this adequately powered weighted average to assess the overall magnitude of bias in economics. Typically, reported economic effects are exaggerated by a factor of two, with one-third inflated by a factor of four or more.
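
A rough sketch of the WAAP idea as the abstract describes it: an inverse-variance (WLS) weighted average restricted to adequately powered estimates. Using the all-study WLS average as the reference effect for the power screen is an assumption of this sketch, since the abstract does not spell that step out.

```python
# Illustrative WAAP sketch: WLS-average only the estimates with >= 80%
# power to detect a reference effect. The reference effect here is the
# WLS average of all estimates (an assumption of this sketch).
import numpy as np
from scipy.stats import norm

def wls_average(effects, ses):
    w = 1.0 / ses**2
    return np.sum(w * effects) / np.sum(w)

def waap(effects, ses, alpha=0.05, power_floor=0.80):
    ref = wls_average(effects, ses)             # provisional true effect
    z = norm.ppf(1 - alpha / 2)
    pwr = norm.cdf(np.abs(ref) / ses - z)       # approximate power
    adequate = pwr >= power_floor
    if not adequate.any():                      # no study clears the bar
        return ref
    return wls_average(effects[adequate], ses[adequate])

effects = np.array([0.9, 0.6, 0.25, 0.2, 0.22])
ses     = np.array([0.40, 0.30, 0.05, 0.06, 0.05])
print(waap(effects, ses))   # dominated by the three precise studies
```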

References:

Maxwell, S.E. (2004). ‘The persistence of underpowered studies in psychological research: causes, consequences, and remedies’, Psychological Methods, vol. 9, pp. 147-63.

Turner, R.M., Bird, S.M. and Higgins, J.P.T. (2013). ‘The impact of study size on meta-analyses: Examination of underpowered studies in Cochrane reviews’, PLoS ONE, vol. 8(3), e59202. doi:10.1371/journal.pone.0059202.

Discussant(s)
  • Rachel Glennerster, Massachusetts Institute of Technology
  • Eva Vivalt, Australian National University
  • Solomon Hsiang, University of California-Berkeley
  • Edward Miguel, University of California-Berkeley
JEL Classifications
  • A3 - Collective Works
  • C1 - Econometric and Statistical Methods and Methodology: General