Abstract
In many branches of modern science, researchers first study or mine large datasets, and then select the parameters they estimate and the data they use and publish. Such data-based selection complicates formal statistical inference. An example, discussed here for the purpose of illustration, is that of pharmaceutical companies, which typically conduct many experiments but may publish only selected data. The selection often depends on the outcomes of the experiments, since interest naturally lies in potentially useful drugs, and it is in general unclear how this selection should affect inference. Is the effect the same for the company and the public? Does it matter whether they are Bayesian or frequentist? Should the company reveal all the experiments it conducts and, if so, how should this change the conclusions? This note discusses these questions in terms of a simple example of a sequence of binomial experiments conducted by a pharmaceutical company, where results are published only if the number of "failures" is small. We do not suggest that this example corresponds to reality in the pharmaceutical industry, or in science in general; our goal is to elaborate on the importance and difficulties of taking selection into account when performing statistical analysis.
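To make the selection effect concrete, the following is a minimal sketch and not the analysis from the paper. It assumes a single experiment with `n` trials, a true failure probability `p`, and a hypothetical publication rule "publish only if the number of failures is at most `c`"; the values of `n`, `p`, and `c` are illustrative assumptions. It computes exactly the expected value of the naive estimate (failures divided by trials) among published experiments.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k failures in n trials with failure probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def published_mean_failure_rate(n, p, c):
    """Return (E[X/n | X <= c], P(X <= c)) for X ~ Binomial(n, p).

    The rule 'publish only if X <= c' is a hypothetical selection rule
    chosen for illustration.
    """
    probs = [binom_pmf(k, n, p) for k in range(c + 1)]
    publish_prob = sum(probs)
    mean_failures = sum(k * q for k, q in enumerate(probs)) / publish_prob
    return mean_failures / n, publish_prob


# Hypothetical values, not taken from the paper.
n, p, c = 20, 0.3, 3
rate, publish_prob = published_mean_failure_rate(n, p, c)
print(f"true failure probability:        {p:.3f}")
print(f"mean published failure rate:     {rate:.3f}")
print(f"probability of being published:  {publish_prob:.3f}")
```

Under these assumptions the mean published failure rate falls below the true `p`, so a reader who sees only published results and ignores the selection rule would tend to judge the drug too favorably. Setting `c = n` removes the selection and recovers `p`.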
Original language | English |
---|---|
Pages (from-to) | 211-217 |
Number of pages | 7 |
Journal | American Statistician |
Volume | 63 |
Issue number | 3 |
State | Published - 2009 |
Keywords
- Binomial model
- Confidence interval
- Credible set
- Decision theory
- Meta analysis
- Publication bias