Reproducibility
Goal
This session aims to highlight the importance of reproducibility in AI-driven Arctic research. Participants will learn about the challenges and best practices for ensuring that AI models and their results can be reproduced by other researchers, a cornerstone for building trust and advancing the field. The discussion will cover strategies for documenting experiments, sharing data and code, and using version control systems.
Introduction
Reproducibility is not a new topic when it comes to artificial intelligence and machine learning in science, but is more important than ever as AI research is often criticized for not being reproducible1. This becomes particularly problematic when validation of a model requires reproducing it.
Key Elements & Terminology
Importance of reproducibility, challenges to achieving reproducibility, best practices for reproducible research, tools and techniques to enhance reproducibility
Key terms:
Hyperparameter: parameters whose values control the learning process and determine the values of model parameters that a learning algorithm ends up learning. The prefix “hyper” in the term indicates their role as top-level parameters controlling the learning process. Hyperparameters exist externally in the model and cannot be estimated using the data because the developer sets these values before the model training begins.
Modeling code: code used to implement the model
Implementation code: code used for inference
Outline
The Reproducibility Checklist
The Reproducibility Checklist was created by Canadian computer scientist, Joelle Pineau, with the goal of facilitating reproducible machine learning algorithms that can be tested and replicated. Their checklist is as follows:
For all models and algorithms, check that you include:
A clear description of the mathematical setting, algorithm, and/or model
An analysis of the complexity (time, space, sample size) of any algorithm
A link to a downloadable source code*, with specification of all dependencies, including external libraries
For any theoretical claim, check that you include:
A statement of the result
A clear explanation of each assumption
A complete proof of the claim
For all figures and tables that include empirical results, check that you include:
A complete description of the data collection process, including sample size
A link to a downloadable version of the dataset or simulation environment
An explanation of any data that was excluded and a description of any preprocessing step
An explanation of how samples were allocated for training, validation, and testing
The range of hyperparameters considered, method to select the best hyperparameter configuration, and specification of each hyperparameter used to generate results
The exact number of evaluation runs
A description of how experiments were run
A clear definition of the specific measure of statistics used to report results
Clearly defined error bars
A description of results with central tendency (e.g., mean) and variation (e.g., standard deviation)
A description of the computing infrastructure used
*With sensitive data or proprietary code, scientists may not wish to release all of its code and data. In this case, data can be anonymized and/or partial code can be released that won’t run but can be read and reproduced
Model Repositories
PyTorch Hub is a pre-trained model repository designed to facilitate reproducibility and enable new research. It is easily usable with Colab and Papers with Code, but models must be trained on openly accessible data.
- Papers with Code is an open source hub for publications that include direct links to GitHub code, no account needed to access datasets.
Version Control
Version control is the process of keeping track of every individual change by each contributor that’s saved in a version control framework, or a special database. Keeping a history of these changes to track model performance relative to model parameters saves the time you’d spend retraining the model. Using a version control system ensures easier:
Collaboration
Versioning
- If your model breaks, you’ll have a log of any changes that were made, allowing you or others to revert back to a stable version
Dependency tracking
- You can test more than one model on different branches or repositories, tune the model parameters and hyperparameters, and monitor the accuracy of each implemented change
Model updates
- Version control allows for incrementally released versions while continuing the development of the next release
References & Resources
- Gundersen, Odd Erik, and Sigbjørn Kjensmo. 2018. “State of the Art: Reproducibility in Artificial Intelligence”. Proceedings of the AAAI Conference on Artificial Intelligence 32 (1). https://doi.org/10.1609/aaai.v32i1.11503.
- Gundersen, Odd Erik, Yolanda Gil, and David W. Aha. “On Reproducible AI: Towards Reproducible Research, Open Science, and Digital Scholarship in AI Publications.” AI Magazine 39, no. 3 (September 28, 2018): 56–68. https://doi.org/10.1609/aimag.v39i3.2816.
- “How the AI Community Can Get Serious about Reproducibility.” Accessed September 18, 2024. https://ai.meta.com/blog/how-the-ai-community-can-get-serious-about-reproducibility/.
- Abid, Areeba. “Addressing ML’s Reproducibility Crisis.” Medium, January 7, 2021. https://towardsdatascience.com/addressing-mls-reproducibility-crisis-7d59e9ed050.
- PyTorch. “Towards Reproducible Research with PyTorch Hub.” Accessed September 18, 2024. https://pytorch.org/blog/towards-reproducible-research-with-pytorch-hub/.
- Stojnic, Robert. “ML Code Completeness Checklist.” PapersWithCode (blog), April 8, 2020. https://medium.com/paperswithcode/ml-code-completeness-checklist-e9127b168501.
- Akalin, Altuna. “Scientific Data Analysis Pipelines and Reproducibility.” Medium, July 5, 2021. https://towardsdatascience.com/scientific-data-analysis-pipelines-and-reproducibility-75ff9df5b4c5.
- Hashesh, Ahmed. “Version Control for ML Models: What It Is and How To Implement It.” neptune.ai, July 22, 2022. https://neptune.ai/blog/version-control-for-ml-models.