Computer-Oriented Geoscience Lab

Open science

We have an open-by-default policy: we assume that software and data will be made openly available under permissive licenses (CC-BY, BSD, etc.) unless there is a very good reason not to, such as personally identifiable information (PII) or other restrictions.

General guidelines on openness

  • Work on software should be conducted in public repositories.
  • Papers, dissertations, and research projects may be kept private until the time of first submission, when they must be made public for peer review.
  • Because most repositories will be made publicly accessible, including their entire history, lab members are required to behave professionally in them (including in commit messages, issues, pull requests, code comments, etc.).
  • Many journals consider the content of referee reports to be confidential. Unless the report is explicitly open, it should not be added to repositories as text, commit messages, or comments in papers.
  • Grant proposals developed internally with no external PIs or Co-PIs should be made open unless there is confidential information.
  • Grant proposals developed in collaboration with external PIs or Co-PIs may be kept confidential at the external collaborators' discretion, although our preference is for them to be open.

Research ethics

Our actions should be guided by the ethics of participating in the global scientific community. This means:

  • When other researchers request assistance with software developed in the lab, we should make a best effort to assist them. It is not unreasonable to ask for authorship, particularly if the collaboration is extensive.
  • Provide citations to all software that assisted in the development of the scholarly work. In general, it is acceptable to cite the layers of software in the analysis stack (e.g., NumPy, Matplotlib, IPython/Jupyter, Fatiando).
  • Provide citations to data sources (DOIs and publications) wherever possible. Where a formal citation is not possible, acknowledge the data source in a footnote.
  • Plagiarism is unacceptable in any form. This includes “first pass” text included in papers or proposals. If including text from an external source, it must be clearly marked as such to ensure it is not accidentally included in the final product.
  • Prioritize our professional obligations over the fear of being “scooped.” For instance, it is unacceptable to interfere with the peer-review process for a paper out of concern for protecting one’s own work (e.g., “sitting” on a review, making unreasonable requests to delay publication, and so on).

Data and code availability

  • All data and models generated by the lab will be made available in both raw and processed forms under CC0 or CC-BY licenses.
  • All software will be made available in source form under permissive, BSD-style licenses, unless constrained by external copyleft licensing.
  • Source code will be made publicly available on GitHub (or similar), with additional archives on longer-term preservation platforms like figshare or Zenodo.
  • Every effort will be made to preserve integrity and accessibility of data.

Note: The text above is based on the GBMF DDD Data Sharing Plan by Matthew Turk (licensed CC-BY).
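One possible way to support the commitment to preserving data integrity is to archive a checksum manifest alongside each dataset. The following is a minimal sketch, not a lab-mandated tool: the function names, the manifest file name (SHA256SUMS.txt), and the directory layout are all hypothetical and chosen only for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(data_dir, manifest_name="SHA256SUMS.txt"):
    """Write one 'digest  relative/path' line per file under data_dir.

    The manifest name and layout are assumptions made for this sketch.
    """
    data_dir = Path(data_dir)
    manifest = data_dir / manifest_name
    lines = []
    for path in sorted(data_dir.rglob("*")):
        if path.is_file() and path != manifest:
            relative = path.relative_to(data_dir).as_posix()
            lines.append(f"{sha256_of_file(path)}  {relative}")
    manifest.write_text("\n".join(lines) + "\n")
    return manifest


def verify_manifest(data_dir, manifest_name="SHA256SUMS.txt"):
    """Return relative paths that are missing or whose checksum changed."""
    data_dir = Path(data_dir)
    mismatched = []
    for line in (data_dir / manifest_name).read_text().splitlines():
        expected, relative = line.split("  ", 1)
        target = data_dir / relative
        if not target.is_file() or sha256_of_file(target) != expected:
            mismatched.append(relative)
    return mismatched
```

Running write_manifest before uploading an archive (for example to figshare or Zenodo) and verify_manifest after downloading it gives a quick check that the files were not altered or corrupted in transit.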

Reproducibility

  • All components needed to reproduce a scholarly work should be made available in a publicly accessible location, as part of the scientific record (e.g., by including DOIs to data archives in papers), and distributed under an appropriate open license.
  • The only exception is when this is strongly prohibited by external concerns (private or sensitive data, etc.).
  • Method development papers must include:
    1. The source code that implements the methodology.
    2. Analysis code that generated plots and results from the paper.
  • The additional overhead of making work reproducible should not be onerous compared to the other expectations, and can actually reduce the overall effort of developing papers and workflows.
  • We will endeavor to respond to requests to reproduce our results by providing necessary technology and data, allowing for reasonable commitments of time and effort.
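As one illustration of keeping the overhead of reproducibility low, an analysis script can save a small record of the software environment next to its results. This is a hedged sketch rather than a required workflow: the function names and the JSON layout are assumptions made for this example, and the package list passed in is up to the author of the analysis.

```python
import json
import platform
import sys
from importlib import metadata


def describe_environment(packages):
    """Collect the Python version, platform, and versions of the named packages."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
    }


def save_environment(path, packages):
    """Write the environment record to a JSON file (hypothetical layout)."""
    record = describe_environment(packages)
    with open(path, "w") as handle:
        json.dump(record, handle, indent=2, sort_keys=True)
    return record
```

For example, calling save_environment("environment.json", ["numpy", "matplotlib"]) at the end of an analysis script stores the versions used to generate the figures, which can then be archived together with the code and data.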

Credit and terms of reuse: This manual is based on the excellent Lab Carpentry blueprints, with material adapted from the Data Intensive Biology Lab and the Data Exploration Lab. The manual contents are available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.