Skip to content

Ensure that git filters are applied when reading blobs #13428

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

joshuaspence
Copy link

@joshuaspence joshuaspence commented Jul 8, 2025

Motivation

Nix 2.20 introduced a regression (a breaking change) in which git repositories do not properly have filters applied (specifically filters that are configured via .gitattributes). This manifests as "NAR hash mismatch" errors when users are upgrading from earlier versions and they have a dependency that uses .gitattributes to manipulate line endings. curl is the most notable example I am aware of which, for example, specifies in .gitattributes that *.bat files should have CRLF line endings.

Note that although this change is technically a breaking change, it is undeniably fixing some very broken behaviour within Nix and, therefore, I believe that it is worthwhile. Additionally, I suspect that there are very few dependencies in the wild that are actually affected by this bug and so hopefully the blast radius is small.

Context

Fixes #11428.

The current nix behaviour seems completely broken when it comes to handling git filters.

#!/bin/bash

set -o errexit
set -o nounset
set -o pipefail

readonly REPO="https://github.com/curl/curl.git"
readonly COMMIT="6b951a6928811507d493303b2878e848c077b471"

readonly FILES=(
  # This file does not contain CRLF but has `eol=crlf` in `.gitattributes`.
  buildconf.bat

  # This file does not contain CRLF.
  README.md

  # This file contains CRLF and isn't matched by `.gitattributes`.
  winbuild/README.md
)

# Don't respect user/system configuration.
export GIT_CONFIG_GLOBAL=/dev/null
export GIT_CONFIG_NOSYSTEM=true

function check_crlf() {
  OUTPUT_DIR="curl-$(echo "$@" | sha256sum | awk '{print $1}')"

  if ! test -d "${OUTPUT_DIR}"; then
    git $@ clone --quiet "${REPO}" "${OUTPUT_DIR}"
    git $@ -C "${OUTPUT_DIR}" checkout --quiet "${COMMIT}"
  fi

  for FILE in ${FILES[@]}; do
    if grep --quiet $'\r' "${OUTPUT_DIR}/${FILE}"; then
      echo "git $@ => ${FILE} contains CRLF"
    else
      echo "git $@ => ${FILE} does not contain CRLF"
    fi
  done

  echo
}

nix --version
for FILE in ${FILES[@]}; do
  if grep --quiet $'\r' "$(nix eval --raw --expr "(builtins.fetchGit { url = \"${REPO}\"; ref = \"master\"; rev = \"${COMMIT}\"; }).outPath")/${FILE}"; then
    echo "nix $(nix --version) => ${FILE} contains CRLF"
  else
    echo "nix $(nix --version) => ${FILE} does not contain CRLF"
  fi
done
echo

check_crlf
check_crlf -c core.eol=lf
check_crlf -c core.eol=crlf
check_crlf -c core.eol=native
check_crlf -c core.autocrlf=false
check_crlf -c core.autocrlf=input
check_crlf -c core.autocrlf=true

The output of the above script on my machine is as follows:

nix (Nix) 2.29.1
nix nix (Nix) 2.29.1 => buildconf.bat does not contain CRLF
nix nix (Nix) 2.29.1 => README.md does not contain CRLF
nix nix (Nix) 2.29.1 => winbuild/README.md contains CRLF

git  => buildconf.bat contains CRLF
git  => README.md does not contain CRLF
git  => winbuild/README.md contains CRLF

git -c core.eol=lf => buildconf.bat contains CRLF
git -c core.eol=lf => README.md does not contain CRLF
git -c core.eol=lf => winbuild/README.md contains CRLF

git -c core.eol=crlf => buildconf.bat contains CRLF
git -c core.eol=crlf => README.md does not contain CRLF
git -c core.eol=crlf => winbuild/README.md contains CRLF

git -c core.eol=native => buildconf.bat contains CRLF
git -c core.eol=native => README.md does not contain CRLF
git -c core.eol=native => winbuild/README.md contains CRLF

git -c core.autocrlf=false => buildconf.bat contains CRLF
git -c core.autocrlf=false => README.md does not contain CRLF
git -c core.autocrlf=false => winbuild/README.md contains CRLF

git -c core.autocrlf=input => buildconf.bat contains CRLF
git -c core.autocrlf=input => README.md does not contain CRLF
git -c core.autocrlf=input => winbuild/README.md contains CRLF

git -c core.autocrlf=true => buildconf.bat contains CRLF
git -c core.autocrlf=true => README.md contains CRLF
git -c core.autocrlf=true => winbuild/README.md contains CRLF

This shows that the way Nix is handling git repositories is inconsistent with git, irrespective of the user and/or system git configuration.

Another manifestation of this bug is that the NAR hash for a git repository can change depending on the order of evaluations. This can be demonstrated by the following script:

#!/bin/bash

set -o errexit
set -o nounset
set -o pipefail

nix --version
git clone --quiet https://github.com/joshuaspence/nix-crlf-test.git

rm -rf ~/.cache/nix
nix eval --impure --expr "builtins.fetchGit \"file://$(readlink -f nix-crlf-test)\""
nix eval --impure --expr "builtins.fetchGit { url = \"file://$(readlink -f nix-crlf-test)\"; ref = \"$(git -C nix-crlf-test branch --show-current)\"; rev = \"$(git -C nix-crlf-test rev-parse HEAD)\"; }"

rm -rf ~/.cache/nix
nix eval --impure --expr "builtins.fetchGit { url = \"file://$(readlink -f nix-crlf-test)\"; ref = \"$(git -C nix-crlf-test branch --show-current)\"; rev = \"$(git -C nix-crlf-test rev-parse HEAD)\"; }"
nix eval --impure --expr "builtins.fetchGit \"file://$(readlink -f nix-crlf-test)\""

rm -rf nix-crlf-test

The output of this script is as follows:

nix (Nix) 2.29.1
{ lastModified = 946684800; lastModifiedDate = "20000101000000"; narHash = "sha256-k7u7RAaF+OvrbtT3KCCDQA8e9uOdflUo5zSgsosoLzA="; outPath = "/nix/store/pbm7g5wjg44d1z7byaivhcs9rrv58fqf-source"; rev = "27fcdeab9b5edc4095160b6d9a15a5c5260bca38"; revCount = 2; shortRev = "27fcdea"; submodules = false; }
{ lastModified = 946684800; lastModifiedDate = "20000101000000"; narHash = "sha256-k7u7RAaF+OvrbtT3KCCDQA8e9uOdflUo5zSgsosoLzA="; outPath = "/nix/store/pbm7g5wjg44d1z7byaivhcs9rrv58fqf-source"; rev = "27fcdeab9b5edc4095160b6d9a15a5c5260bca38"; revCount = 2; shortRev = "27fcdea"; submodules = false; }
{ lastModified = 946684800; lastModifiedDate = "20000101000000"; narHash = "sha256-BBhuj+vOnwCUnk5az22PwAnF32KE1aulWAVfCQlbW7U="; outPath = "/nix/store/9vi7nc2507l5fjyd0cg6fgbrikncpjmw-source"; rev = "27fcdeab9b5edc4095160b6d9a15a5c5260bca38"; revCount = 2; shortRev = "27fcdea"; submodules = false; }
{ lastModified = 946684800; lastModifiedDate = "20000101000000"; narHash = "sha256-BBhuj+vOnwCUnk5az22PwAnF32KE1aulWAVfCQlbW7U="; outPath = "/nix/store/9vi7nc2507l5fjyd0cg6fgbrikncpjmw-source"; rev = "27fcdeab9b5edc4095160b6d9a15a5c5260bca38"; revCount = 2; shortRev = "27fcdea"; submodules = false; }

Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

@github-actions github-actions bot added the fetching Networking with the outside (non-Nix) world, input locking label Jul 8, 2025
@edolstra
Copy link
Member

edolstra commented Jul 8, 2025

What decides whether CR/LF conversion gets done? For reproducibility, it should not depend on the user's configuration or on the system type.

@joshuaspence
Copy link
Author

What decides whether CR/LF conversion gets done? For reproducibility, it should not depend on the user's configuration or on the system type.

It is determined by the .gitattributes file within the repo. At least, that is the intention.

@github-actions github-actions bot added the with-tests Issues related to testing. PRs with tests have some priority label Jul 9, 2025
@joshuaspence joshuaspence marked this pull request as ready for review July 9, 2025 08:10
@joshuaspence joshuaspence requested a review from edolstra as a code owner July 9, 2025 08:10
@edolstra edolstra added this to Nix team Jul 9, 2025
@github-project-automation github-project-automation bot moved this to Triage in Nix team Jul 9, 2025
@edolstra edolstra added this to the fetch-tree stabilisation milestone Jul 9, 2025
@edolstra
Copy link
Member

edolstra commented Jul 9, 2025

Team discussion:

  • Should this be controlled by a fetcher attribute (similar to submodules and lfs)? That way it's opt-in and doesn't risk changing hashes. If we make this the default behavior, we should make that decision before stabilizing fetchTree.
  • This restores pre-libgit behaviour but might break hashes for recent lock files.
  • It's not entirely clear what the use case is for CR/LF conversion that doesn't depend on the user's system / configuration... (I.e. if you want everybody to see the same linefeed characters, then why not just commit those as the raw data.)

@edolstra edolstra removed this from Nix team Jul 9, 2025
@GrahamDennis
Copy link

I just want to highlight this piece of @joshuaspence's description:

Another manifestation of this bug is that the NAR hash for a git repository can change depending on the order of evaluations.

This implies that the current behaviour of nix is impure / dependent on an order of operations, which suggests that this code needs to change in some way regardless to address this impurity. It should be acknowledged that this impurity occurs in a particular edge case, but it is an impurity.

@tomberek
Copy link
Contributor

tomberek commented Jul 10, 2025

The order of evaluation matters because the fetcher-cache is being polluted and keyed by only the rev, but not by all the attributes describing how the repo is fetched.

using cache entry 'fetchToStore:{"fingerprint":"27fcdeab9b5edc4095160b6d9a15a5c5260bca38;e","method":"nar","name":"source","path":"/","store":"/nix/store"}' -> '{"storePath":"9vi7nc2507l5fjyd0cg6fgbrikncpjmw-source"}'

Then, the two mechanisms do the fetch with different settings:

copying '«git+file:///home/tom/nixpkgs/t3/nix-crlf-test?exportIgnore=1&ref=master&rev=27fcdeab9b5edc4095160b6d9a15a5c5260bca38»/' to the store...

copying '/home/tom/nixpkgs/t3/nix-crlf-test/' to the store...

It is documented that:

To fetch the content of a checked-out work directory:
builtins.fetchGit ./work-dir

which seems to imply that this uses the contents of the work directory, not the contents of the git objects, hence after filters are run.

It's odd and surprising that the attribute format behaves differently than the plain file URL format, but at least a portion of the problem is due to the fact that the fetcher cache is seeing them as equivalent. This leads to the order-of-operations impurity.

Edit: looks like it was attempted to include this information into the fetcher cache with the ;e suffix to the key. But the file URL format seems to assume that setting by default, even though that is not what is implemented.

Edit2: The problem would be fixed if the workdir format correctly used the attribute that reflected that is post-export - setting exportIgnore to false.

@roberth
Copy link
Member

roberth commented Jul 14, 2025

Just for context, Nix is currently inconsistent about smudge/clean filters as well.

We'll need similar calls in the source access for git workdirs, but we could make that the scope of a future PR.

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2025-07-09-nix-team-meeting-minutes-233/66474/1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fetching Networking with the outside (non-Nix) world, input locking with-tests Issues related to testing. PRs with tests have some priority
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NAR hash mismatch cloning git repository with crlfs introduced in 2.20
6 participants