nbdiff: “git diff” for Jupyter Notebooks Version Control


Owner: Vadim Rudakov, lefthand67@gmail.com
Version: 0.3.1
Birth: 2025-12-28
Last Modified: 2025-12-31


This guide explains how to use nbdime (specifically nbdiff) to manage Jupyter Notebooks effectively. Standard git diffs are notoriously difficult to read because notebooks are stored as dense JSON files containing metadata, cell IDs, and execution counts that obscure actual logic changes.

1. What is nbdiff and Why We Use It

Jupyter Notebooks (.ipynb) are JSON documents. When you change a single line of code or simply re-run a cell, a standard git diff shows a mess of structural changes.
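To make the noise concrete, here is a minimal Python sketch (not part of nbdime) that compares two hand-written notebook dicts. They differ only in execution count, which is the kind of change a re-run produces. The helper names and the notebook contents are illustrative assumptions, not taken from a real project.

```python
import difflib
import json


def make_notebook(execution_count):
    """Build a minimal notebook-like dict (hypothetical example content)."""
    return {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": execution_count,
                "metadata": {},
                "outputs": [
                    {
                        "output_type": "execute_result",
                        "execution_count": execution_count,
                        "data": {"text/plain": ["4"]},
                        "metadata": {},
                    }
                ],
                "source": ["x = 2 + 2\n", "x"],
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 5,
    }


def changed_lines(old, new):
    """Count added/removed lines in a line-based diff of two JSON dumps."""
    old_lines = json.dumps(old, indent=1, sort_keys=True).splitlines()
    new_lines = json.dumps(new, indent=1, sort_keys=True).splitlines()
    diff = difflib.unified_diff(old_lines, new_lines, lineterm="")
    return sum(
        1
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )


def sources(notebook):
    """Extract only the cell sources, the part a reviewer cares about."""
    return ["".join(cell["source"]) for cell in notebook["cells"]]


before = make_notebook(execution_count=1)
after = make_notebook(execution_count=2)  # same code, cell re-run

raw_changes = changed_lines(before, after)
source_changed = sources(before) != sources(after)
print(f"raw JSON diff lines: {raw_changes}, source changed: {source_changed}")
```

The line-based diff reports changes even though no source line was edited, which is the gap a content-aware diff closes.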

We use nbdiff because it is content-aware. It understands the notebook structure and filters out the “noise” so that the diff shows only what matters.

The Strategy: “Keep the Data, Filter the View”

We treat .ipynb files as flat files. In standard software engineering, generated “artifacts” are stripped before committing; in research, the outputs are the evidence, so we keep them.

We store the full notebook (metadata and plots) in Git, but we use nbdiff externally to avoid being blinded by JSON noise during manual reviews or when prompting AI tools like Aider.

2. Installation with uv

We use uv for fast, reliable Python tool management. To install the nbdime suite (which includes nbdiff) as a globally accessible tool:

uv tool install nbdime

This installs the nbdime CLI tools, including nbdiff, nbdiff-web, nbmerge, nbmerge-web, and nbshow.

3. How to Use nbdiff

Common CLI Commands

| Command | Description |
| --- | --- |
| nbdiff HEAD path/to/notebook.ipynb | Compare current changes to the last commit. |
| nbdiff 351b33e HEAD -- path/to/nb.ipynb | Compare two specific commits. |
| nbdiff-web 351b33e HEAD -- path/to/nb.ipynb | View changes in a rich browser UI (highly recommended). |

Integrating with aider

aider is a pair-programming tool that works best when it can see clean diffs. To help the AI write meaningful commit messages or understand your logic, feed it a “clean” text stream:

  1. Check the diff: If you want the AI to review your work, run:

/run nbdiff --ignore-metadata HEAD path/to/notebook.ipynb

or even cleaner:

# Within aider, use /run to provide context
/run nbdiff --ignore-metadata --ignore-outputs HEAD path/to/notebook.ipynb

Why? This tells the AI: “Ignore the 5,000 lines of base64 image data and focus on the 5 lines of Python code I changed.”

  2. Targeted Edits: If Aider struggles to find a cell, use nbdiff-web to find the cell index and prompt:

“In cell index 14, please update the LaTeX formula to include the bias term .”
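If opening nbdiff-web is inconvenient, cell indices can also be read straight from the notebook JSON. The following is a minimal Python sketch, not an nbdime feature: the function name list_cells and the demo notebook are hypothetical, and the indices are zero-based on the assumption that they line up with the /cells/N paths shown in nbdiff's text output.

```python
import json
import tempfile
from pathlib import Path


def list_cells(notebook_path):
    """Return (index, cell_type, first_source_line) for every cell."""
    notebook = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    rows = []
    for index, cell in enumerate(notebook.get("cells", [])):
        source = cell.get("source", "")
        # nbformat stores source either as a list of lines or a single string.
        text = "".join(source) if isinstance(source, list) else source
        first_line = text.splitlines()[0] if text.strip() else ""
        rows.append((index, cell.get("cell_type", "?"), first_line))
    return rows


# Hypothetical two-cell notebook, written to a temporary file for the demo.
demo = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Linear model\n"]},
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "outputs": [],
            "source": ["import numpy as np\n", "w = np.ones(3)"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.ipynb"
    demo_path.write_text(json.dumps(demo), encoding="utf-8")
    rows = list_cells(demo_path)

for index, cell_type, first_line in rows:
    print(f"{index:>3}  {cell_type:<8}  {first_line}")
```

Pointing list_cells at a real notebook path gives a quick index to quote in a prompt.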

4. Advanced Setup: “Documentation-First” Aliases

If you are working on a research paper or documentation, you likely want to hide all code execution noise (output blobs and run counts) to focus on the text and logic.

Create a “Clean” Alias

Add a “clean” version of nbdiff to your ~/.bashrc:

# Add the alias
echo 'alias nbclean="nbdiff --ignore-metadata --ignore-outputs"' >> ~/.bashrc
# Refresh your shell
source ~/.bashrc

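Now, running nbclean will hide output data (including images) and notebook metadata, showing only the code and markdown changes. If cell execution counts still appear in the diff, nbdime's --ignore-details flag covers them.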
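For scripts that cannot rely on a shell alias, the same flag combination can be assembled in Python. This is a minimal sketch under stated assumptions: the function names build_nbclean_command and run_nbclean are hypothetical, and the only nbdiff options used are the two flags already shown in the alias.

```python
import shutil
import subprocess


def build_nbclean_command(path, ref="HEAD"):
    """Assemble the same command the nbclean alias expands to."""
    return ["nbdiff", "--ignore-metadata", "--ignore-outputs", ref, path]


def run_nbclean(path, ref="HEAD"):
    """Run nbdiff and return its text output, or None if nbdiff is missing."""
    if shutil.which("nbdiff") is None:
        return None  # nbdime is not installed or not on PATH
    result = subprocess.run(
        build_nbclean_command(path, ref),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


# Only the command is built here; nothing is executed in this demo.
command = build_nbclean_command("path/to/notebook.ipynb")
print(" ".join(command))
```

Keeping command construction separate from execution makes the flags easy to inspect before anything runs.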

When to use nbclean:

  1. Before Committing: Run nbclean HEAD to ensure you didn’t leave a “print(x)” or a temporary debug variable in your code.

  2. Aider Context: Use it to copy-paste clean code diffs into your chat if /run is unavailable.

5. Comparison of Workflows

| Method | Best for... | Why? |
| --- | --- | --- |
| git diff | Data Integrity | Shows the “Truth.” Confirms if binary data/metadata was actually changed. |
| nbdiff | Manual Review | Shows the “Logic.” Filters out execution counts so you can read the code. |
| nbdiff-web | Visual Audit | Shows the “Result.” Best for checking if a LaTeX formula or Plot changed. |
| nbclean | AI / Aider | Provides “Pure Context.” Strips everything but code for the LLM. |

Summary Reference

| Task | Command |
| --- | --- |
| Install | uv tool install nbdime |
| List configuration | nbdime --config |
| Simple Diff | nbdiff old.ipynb new.ipynb |
| Visual/Web Diff | nbdiff-web <hash> HEAD <file> |
| Ignore Noise | nbdiff --ignore-metadata --ignore-outputs |
| Git Integration | nbdime config-git --enable |

Appendix A. If you want Git integration

1. Configuration: Native Git Integration

Instead of running a separate command, you can configure Git to use nbdiff automatically whenever you run git diff on a .ipynb file.

To enable global Git integration:

nbdime config-git --enable --global

Now, a standard git diff will output a clean, readable notebook summary instead of raw JSON.

You can always see the nbdime configuration like this:

nbdime --config

2. Configure “No-Noise” Git Defaults

To keep your diffs focused on code changes, configure the nbdime Git driver to ignore noise globally. This affects only how diffs are displayed; the committed files and the Git history are unchanged.

Run this command to tell the Git driver to skip metadata and outputs:

git config --global diff.jupyternotebook.command "git-nbdiffdriver diff --ignore-metadata --ignore-outputs"

To verify your settings are active, run:

nbdime --config

Why it is not a good idea to use ignore flags for daily work

While adding --ignore-metadata and --ignore-outputs to your global Git configuration makes everyday diffs much cleaner, it also introduces several “blind spots” and technical risks.

1. The “False Clean” Security Risk

When you ignore metadata/outputs in git diff, you are only changing what you see in the terminal or browser. You are not cleaning the file itself.
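One way to see what is physically stored, regardless of what the filtered diff displays, is to inspect the notebook JSON directly. The sketch below is a minimal illustration rather than an nbdime feature: the function name stored_output_report and the demo notebook (including its placeholder image payload) are hypothetical.

```python
import json


def stored_output_report(notebook):
    """Summarize the outputs physically stored in each code cell."""
    report = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        outputs = cell.get("outputs", [])
        if not outputs:
            continue
        # Stream outputs carry no "data" key, so default to an empty dict.
        mime_types = sorted(
            {mime for out in outputs for mime in out.get("data", {})}
        )
        report.append(
            {
                "cell": index,
                "json_chars": len(json.dumps(outputs)),
                "mime_types": mime_types,
            }
        )
    return report


# Hypothetical notebook with a placeholder base64 payload in one output.
demo = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Results"]},
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {},
            "source": ["plot(data)"],
            "outputs": [
                {
                    "output_type": "display_data",
                    "metadata": {},
                    "data": {"image/png": "A" * 64, "text/plain": ["<Figure>"]},
                }
            ],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

report = stored_output_report(demo)
for entry in report:
    print(entry)
```

A non-empty report means the outputs are still in the file that gets committed, even when the diff view hides them.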

2. Loss of “Result Provenance”

In many data science contexts, the output is the evidence.

3. Merge Conflict “Shadows”

Merging is more dangerous than diffing with these flags.

4. Broken Interactive Workflows (git add -p)

If you use interactive staging (git add -p or git add -e), Git expects a “dumb” line-by-line diff.