Skip to content

Conversation

Liozou
Copy link
Member

@Liozou Liozou commented Sep 23, 2025

On Windows, during precompilation of package A, a DLL file is generated and may replace the existing one. If A is already loaded in the julia session however, the corresponding soon-to-be-obsolete DLL cannot be removed while julia is running. Currently, this problem is solved by moving the old DLL in a special "julia_delayed_deletes" folder, which will be cleaned in a later julia session by Pkg.gc().
However, moving an in-use DLL is only permitted on the same drive, and the "julia_delayed_deletes" is located in the tempdir, a.k.a. on a fixed drive, say C:. Thus, if the DEPOT_PATH points to a ".julia" in another drive, say D:, any precompilation occuring on an already loaded package will fail. This is what happens in #59589, actually resulting in an infinite loop that bloats up both memory and disk. @staticfloat had actually predicted that such an issue could occur in #53456 (comment).

This PR fixes #59589 by changing the "delayed deleting" mechanism: instead of moving the old DLLs to a fixed folder, they are renamed in their initial folder and recorded in a list stored at a fixed location. Upon Pkg.gc(), the listed obsolete files will be deleted (JuliaLang/Pkg.jl#4392).

It also prevents any similar infinite loops by cutting the rm -> mv -> rm cycle into rm -> rename. I also removed the private argument allow_delayed_delete from rm, which is only called in by Pkg but does not appear to be useful.

EDIT: current state is #59635 (comment)

@Liozou Liozou added system:windows Affects only Windows backport 1.12 Change should be backported to release-1.12 labels Sep 23, 2025
@KristofferC KristofferC mentioned this pull request Sep 24, 2025
24 tasks
@Liozou
Copy link
Member Author

Liozou commented Sep 24, 2025

The CI failures come from the loading.jl test that attempt to remove the depot at the end of the test. This fails, because the DLL is still present (albeit renamed) within... So I'm reverting the design of this PR to something closer to the current state. In the current version of the PR:

  • When an in-use DLL is being rm-ed, rename it to a unique name in the same folder.
  • Move it to a special directory is either joinpath(tempdir(), "julia_delayed_deletes") if it is on the same drive as the DLL, else joinpath(drive, "julia_delayed_deletes") with the correct drive. This way, the depot can be removed during the same julia session. (removed it 8badff9)
  • The path to the now obsolete DLL is written in a file stored in joinpath(tempdir(), "julia_delayed_deletes_ref"). This file has a unique name generated by mktemp, so that multiple julia processes can attempt to rm the same in-use DLL without corruption.
  • Upon Pkg.gc(), the files in the "julia_delayed_deletes_ref" folder are read, each containing a single line with the path to an obsolete DLL to remove. If all DLLs in a "julia_delayed_deletes" folder have been removed, the folder itself is removed; if all files referenced in the "julia_delayed_deletes_ref" folder are removed, that folder is also removed.
  • Like currently, the rm called by Pkg.gc() in this particular setting is the only one that unsets the allow_delayed_delete kwarg. By default, rm attempts this strategy upon trying to delete an in-use DLL.

base/file.jl Outdated
Comment on lines 330 to 338
drive = first(splitdrive(path))
tmpdrive = first(splitdrive(tempdir()))
# in-use DLL must be kept on the same drive
deletedir = if drive == tmpdrive
joinpath(tempdir(), "julia_delayed_deletes")
else
joinpath(drive, "julia_delayed_deletes")
end
mkpath(deletedir)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is generally a very unreliable way to go about finding a safe folder to move a file to. I suggest just keeping it in the same folder

Suggested change
drive = first(splitdrive(path))
tmpdrive = first(splitdrive(tempdir()))
# in-use DLL must be kept on the same drive
deletedir = if drive == tmpdrive
joinpath(tempdir(), "julia_delayed_deletes")
else
joinpath(drive, "julia_delayed_deletes")
end
mkpath(deletedir)
path = abspath(path)
deletedir = dirname(path)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

n.b. that when Pkg.gc goes to clean up these "empty" folders it should remember to consider applying any remappings and try both the old and new (renamed) paths

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Actually that's what I did at first, but it messes with the current state of the tests in loading.jl, which call rm(depot) during the same process that creates the in-use DLL, and thus fail. See CI for the previous commit: https://buildkite.com/julialang/julia-master/builds/50766/steps/canvas?sid=019976ae-d4af-457c-b0d1-9cc06fba875d
I agree that the solution here is not quite ideal... but I don't know how to find a better "safe" folder on a given drive. The joinpath(tempdir(), "julia_delayed_deletes") one is the current solution on master, but there is no equivalent on any drive.

@vtjnash
Copy link
Member

vtjnash commented Sep 24, 2025

Thanks for working on this!

@Liozou
Copy link
Member Author

Liozou commented Sep 26, 2025

Thanks again for the reviews! In the current solution, I partially reverted the last changes so that the renamed obsolete DLLs remain in their folder. This required modifying the loadings.jl tests: instead of making the test depots with mktempdir(), which inevitably fails at cleanup because it encounters in-use DLLs, they are now made with mktempdir(; cleanup=false). Then, in an atexit hook, a special julia process is spawned, whose sole job is to wait for the test process to die, and then to cleanup these depots. This only affects Windows, nothing is changed for the other systems.

@Liozou Liozou force-pushed the rmdll branch 3 times, most recently from 6e8770a to 2e10a27 Compare September 26, 2025 14:48
end
"""
cmd = Cmd(`$(Base.julia_cmd()) --startup-file=no -e $rmcmd`; ignorestatus=true, detach=true)
run(cmd; wait=false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
run(cmd; wait=false)
rd, wr = Base.link_pipe(true, true)
run(cmd, rd, devnull, stderr; wait=false)
Base.close_pipe_sync(rd)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
run(cmd; wait=false)
p = Base.PipeEndpoint()
run(cmd, p, devnull, stderr; wait=false)
join(p, DEPOTS_TOREMOVE, "\n")
Base.dup(Base.rawhandle(p)) # intentionally leak a reference, until the process exits
close(p)

Comment on lines +14 to +17
for _ in 1:3600 # wait up to 1h from atexit() until the julia testing process dies
sleep(1)
ccall(:uv_kill, Cint, (Cuint, Cint), $(getpid()), 0) == Base.UV_ESRCH && break
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for _ in 1:3600 # wait up to 1h from atexit() until the julia testing process dies
sleep(1)
ccall(:uv_kill, Cint, (Cuint, Cint), $(getpid()), 0) == Base.UV_ESRCH && break
end
eof(stdin)

Comment on lines +330 to +331
drive = first(splitdrive(path))
tmpdrive = first(splitdrive(tempdir()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused variable cleanup

Suggested change
drive = first(splitdrive(path))
tmpdrive = first(splitdrive(tempdir()))

sleep(1)
ccall(:uv_kill, Cint, (Cuint, Cint), $(getpid()), 0) == Base.UV_ESRCH && break
end
for path in $DEPOTS_TOREMOVE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Windows command line is very limited in how much text it can reliably transfer. Sometimes as low as 2k characters. This list should be passed via stdin to avoid the coammnd getting truncated

@vtjnash
Copy link
Member

vtjnash commented Sep 26, 2025

The special julia process to be spawned is very clever. Nice job. We might even want to move this code into temp_cleanup_purge_all if the TEMP_CLEANUP set is non-empty after attempting in-process cleanup?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 1.12 Change should be backported to release-1.12 system:windows Affects only Windows
Projects
None yet
Development

Successfully merging this pull request may close these issues.

rm of an in-use DLL on a separate drive from tempdir on Windows will infinitely recurse
5 participants