Abstract
We present a discipline for verifiable computational scientific research. Our discipline revolves around three simple new concepts: the verifiable computational result (VCR), the VCR repository and the Verifiable Result Identifier (VRI). These are web- and cloud-computing oriented concepts, which exploit today's web infrastructure to achieve standard, simple and automatic reproducibility in computational scientific research. The VCR discipline requires very slight modifications to the way researchers already conduct their computational research and authoring, and to the way publishers manage their content. In return, the discipline marks a significant step towards delivering on the long-anticipated promise of making scientific computation truly reproducible. A researcher practicing this discipline in everyday work produces computational scripts and word processor files that look very much like those they already produce today, but in which a few lines change very subtly and naturally. Those scripts produce a stream of verifiable results, which are the same tables, figures, charts and datasets the researcher traditionally would have produced, but which are watermarked for permanent identification by a VRI, and are automatically and permanently stored in a VCR repository. In a scientific community practicing Verifiable Computational Research, exchange of both ideas and data involves exchanging result identifiers (VRIs) rather than exchanging files. These identifiers are controlled, trusted and automatically generated strings that point to a publicly available result as it was originally created by the computational process itself. When a verifiable result is included in a publication, its identifier can be used by any reader with a web browser to locate, browse and, where appropriate, re-execute the computation that produced the result.
Journal readers can therefore scrutinize, dispute, understand and eventually trust these computational results, all to an extent impossible through the textual explanations that constitute the core of scientific publications to date. In addition, the result identifier can be used by subsequent computations to locate and retrieve both the published result (in graphical or numerical form) and the original datasets used by its generating computation. Colleagues can thus cite and import data into their own computations, just as traditional publications allow them to cite and import ideas. We describe an existing software implementation of the Verifiable Computational Research discipline, and argue that it solves many of the crucial problems commonly facing computer-based and computer-aided research in various scientific fields. Our system is secure, naturally adapted to large-scale and cloud computations and to modern massive data analysis, yet places effectively no additional workload on either the researcher or the publisher.
Field | Value |
---|---|
Original language | English |
Pages (from-to) | 637-647 |
Number of pages | 11 |
Journal | Procedia Computer Science |
Volume | 4 |
State | Published - 2011 |
Externally published | Yes |
Event | 11th International Conference on Computational Science, ICCS 2011 - Singapore, Singapore. Duration: 1 Jun 2011 → 3 Jun 2011 |
Bibliographical note
Funding Information: MG is supported by a William R. and Sara Hart Kimball Stanford Graduate Fellowship and would like to thank Balasubramanian Narasimhan, Alon Shalita and Omer Tamuz for their helpful suggestions.
Keywords
- Computation chronicle
- Reproducible research
- VCR repository
- Verifiable computational research
- Verifiable result
- Verifiable result identifier