Reproducibility

Goal

This session aims to highlight the importance of reproducibility in AI-driven Arctic research. Participants will learn about the challenges and best practices for ensuring that AI models and their results can be reproduced by other researchers, a cornerstone for building trust and advancing the field. The discussion will cover strategies for documenting experiments, sharing data and code, and using version control systems.

Introduction

Reproducibility is not a new topic in artificial intelligence and machine learning for science, but it is more important than ever: AI research is often criticized for not being reproducible, which becomes particularly problematic when validating a model requires reproducing it.

Key Elements & Terminology

Key elements: the importance of reproducibility, challenges to achieving it, best practices for reproducible research, and tools and techniques to enhance reproducibility.

Key terms:

  • Hyperparameter: a parameter whose value controls the learning process and thereby determines the model parameters that the learning algorithm ends up learning. The prefix “hyper” indicates their role as top-level parameters controlling training. Hyperparameters are external to the model and cannot be estimated from the data; the developer sets their values before model training begins (see the short sketch after this list).

  • Modeling code: code used to implement the model

  • Implementation code: code used for inference
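To make this distinction concrete, here is a minimal sketch contrasting a hyperparameter with learned model parameters. It assumes scikit-learn purely for illustration (the session does not prescribe any particular library), and `alpha` is just an example hyperparameter.

```python
# Minimal sketch: a hyperparameter is chosen by the developer before training,
# while model parameters (here, the coefficients) are learned from the data.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=0)

# Hyperparameter: set externally, before fitting begins.
model = Ridge(alpha=1.0)

# Model parameters: estimated from the data during fitting.
model.fit(X, y)
print("learned coefficients:", model.coef_)
```

Here `alpha` is fixed before `fit` is called, while `coef_` only exists after training.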

Outline

The Reproducibility Checklist

The Reproducibility Checklist was created by Canadian computer scientist Joelle Pineau with the goal of facilitating reproducible machine learning algorithms that can be tested and replicated. Her checklist is as follows:

For all models and algorithms, check that you include:

  • A clear description of the mathematical setting, algorithm, and/or model

  • An analysis of the complexity (time, space, sample size) of any algorithm

  • A link to downloadable source code*, with a specification of all dependencies, including external libraries

For any theoretical claim, check that you include:

  • A statement of the result

  • A clear explanation of each assumption

  • A complete proof of the claim

For all figures and tables that include empirical results, check that you include:

  • A complete description of the data collection process, including sample size

  • A link to a downloadable version of the dataset or simulation environment

  • An explanation of any data that was excluded and a description of any preprocessing step

  • An explanation of how samples were allocated for training, validation, and testing

  • The range of hyperparameters considered, method to select the best hyperparameter configuration, and specification of each hyperparameter used to generate results

  • The exact number of evaluation runs

  • A description of how experiments were run

  • A clear definition of the specific measure or statistics used to report results

  • Clearly defined error bars

  • A description of results with central tendency (e.g., mean) and variation (e.g., standard deviation); a short reporting sketch follows this checklist

  • A description of the computing infrastructure used

*With sensitive data or proprietary code, scientists may not wish to release all of their code and data. In such cases, data can be anonymized and/or partial code can be released that will not run on its own but can still be read and reproduced.
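Several of the empirical-results items above (the exact number of evaluation runs, the hyperparameters used, and reporting with central tendency and variation) can be captured by a small script. The sketch below is only one possible pattern, not part of Pineau's checklist; the experiment function, hyperparameter values, and metric are placeholders.

```python
# Sketch: run the experiment several times with fixed seeds, then report the
# hyperparameters, the exact number of runs, and mean +/- standard deviation.
import random
import statistics

HYPERPARAMS = {"learning_rate": 0.01, "n_epochs": 20}   # placeholder values
SEEDS = [0, 1, 2, 3, 4]                                  # one entry per evaluation run

def run_experiment(seed, learning_rate, n_epochs):
    """Placeholder for training and evaluation; returns a test-set score."""
    rng = random.Random(seed)
    return 0.85 + rng.uniform(-0.02, 0.02)               # stand-in for a real metric

scores = [run_experiment(seed, **HYPERPARAMS) for seed in SEEDS]

print("hyperparameters:", HYPERPARAMS)
print("number of evaluation runs:", len(scores))
print(f"score: {statistics.mean(scores):.3f} +/- {statistics.stdev(scores):.3f}")
```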

Sharing Code

The first step to solving the problem of reproducibility is sharing the code that was used to generate the model. This allows other researchers to:

  • Validate the model

  • Track code construction and see any author annotations

  • Expand on published work

Despite this, sharing code does not always mean that models are fully reproducible: many machine learning models are trained on restricted datasets or require extensive computing power to train. Because of this, a few additional criteria improve reproducibility, including:

  • Data and metadata availability (must be included without question)

  • Transparency of the code you’re using and dependencies needed to run the code

  • Easily installable computational analysis tools and pipelines

  • Installed software should behave the same on every machine and have the same runtime (see the environment-recording sketch below)

Figure: The Reproducibility Spectrum, by Altuna Akalin in Towards Data Science
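One way to work toward software that behaves the same on every machine is to record the computing environment that actually produced each result. The sketch below uses only the Python standard library; the package list is a placeholder and should name your real dependencies.

```python
# Sketch: record the computing environment alongside experiment results so
# others can see exactly which software versions produced them.
import json
import platform
import sys
from importlib.metadata import PackageNotFoundError, version

packages = ["numpy", "torch", "scikit-learn"]   # placeholder: list your real dependencies

environment = {
    "python": sys.version,
    "platform": platform.platform(),
    "packages": {},
}
for name in packages:
    try:
        environment["packages"][name] = version(name)
    except PackageNotFoundError:
        environment["packages"][name] = "not installed"

# Save next to the experiment outputs so the environment travels with the results.
with open("environment.json", "w") as f:
    json.dump(environment, f, indent=2)
```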

Model Repositories

  • PyTorch Hub is a pre-trained model repository designed to facilitate reproducibility and enable new research. It is easy to use with Colab and Papers with Code, but models must be trained on openly accessible data (a minimal loading example follows this list).

  • Papers with Code is an open-source hub for publications that include direct links to the accompanying GitHub code; no account is needed to access datasets.

Figure: Example paper with attached code, from Papers with Code
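As a concrete illustration of using a model repository, a pre-trained model can be pulled from PyTorch Hub in a few lines. The repository tag and model name below follow the standard PyTorch Hub example; note that recent torchvision releases prefer a `weights=` argument over `pretrained=True`.

```python
# Sketch: load a pre-trained ResNet-18 from PyTorch Hub and run it in inference mode.
import torch

model = torch.hub.load("pytorch/vision:v0.10.0", "resnet18", pretrained=True)
model.eval()

# Pass a dummy input through the network to confirm it loads and runs.
dummy = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    output = model(dummy)
print(output.shape)  # torch.Size([1, 1000]) -- one logit per ImageNet class
```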

Version Control

Figure: images/VersionControl.png

Version control is the process of keeping track of every individual change made by each contributor, saved in a version control framework (a special database). Keeping a history of these changes to track model performance relative to model parameters saves the time you would otherwise spend retraining the model. Using a version control system makes the following easier (a minimal run-logging sketch follows this list):

  • Collaboration

  • Versioning

    • If your model breaks, you’ll have a log of any changes that were made, allowing you or others to revert to a stable version
  • Dependency tracking

    • You can test more than one model on different branches or repositories, tune the model parameters and hyperparameters, and monitor the accuracy of each implemented change
  • Model updates

    • Version control allows for incrementally released versions while continuing the development of the next release
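A lightweight way to connect each result to a specific code version is to log the current Git commit together with the run's hyperparameters and metrics. Dedicated tools (Git hosting platforms, DVC, MLflow, and similar) handle this more thoroughly; the sketch below only illustrates the idea, and the hyperparameter and metric values are placeholders.

```python
# Sketch: append one record per training run, linking metrics and hyperparameters
# to the exact Git commit that produced them (assumes the code lives in a Git repo).
import json
import subprocess
from datetime import datetime, timezone

def current_git_commit():
    """Return the hash of the currently checked-out commit."""
    return subprocess.run(
        ["git", "rev-parse", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()

record = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "commit": current_git_commit(),
    "hyperparameters": {"learning_rate": 0.01, "n_epochs": 20},  # placeholders
    "metrics": {"accuracy": 0.87},                               # placeholder
}

# One JSON line per run keeps the full history easy to diff and inspect.
with open("runs.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")
```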

References & Resources

  1. Gundersen, Odd Erik, and Sigbjørn Kjensmo. “State of the Art: Reproducibility in Artificial Intelligence.” Proceedings of the AAAI Conference on Artificial Intelligence 32, no. 1 (2018). https://doi.org/10.1609/aaai.v32i1.11503.
  2. Gundersen, Odd Erik, Yolanda Gil, and David W. Aha. “On Reproducible AI: Towards Reproducible Research, Open Science, and Digital Scholarship in AI Publications.” AI Magazine 39, no. 3 (September 28, 2018): 56–68. https://doi.org/10.1609/aimag.v39i3.2816.
  3. “How the AI Community Can Get Serious about Reproducibility.” Accessed September 18, 2024. https://ai.meta.com/blog/how-the-ai-community-can-get-serious-about-reproducibility/.
  4. Abid, Areeba. “Addressing ML’s Reproducibility Crisis.” Medium, January 7, 2021. https://towardsdatascience.com/addressing-mls-reproducibility-crisis-7d59e9ed050.
  5. PyTorch. “Towards Reproducible Research with PyTorch Hub.” Accessed September 18, 2024. https://pytorch.org/blog/towards-reproducible-research-with-pytorch-hub/.
  6. Stojnic, Robert. “ML Code Completeness Checklist.” PapersWithCode (blog), April 8, 2020. https://medium.com/paperswithcode/ml-code-completeness-checklist-e9127b168501.
  7. Akalin, Altuna. “Scientific Data Analysis Pipelines and Reproducibility.” Medium, July 5, 2021. https://towardsdatascience.com/scientific-data-analysis-pipelines-and-reproducibility-75ff9df5b4c5.
  8. Hashesh, Ahmed. “Version Control for ML Models: What It Is and How To Implement It.” neptune.ai, July 22, 2022. https://neptune.ai/blog/version-control-for-ml-models.