dlclose() does not behave properly on Mac
This report will reference this repository which reproduces the issue: https://github.com/dradtke/rust-dylib-issues
The Issue
The repository contains an application library, built as a dylib, and two example main programs, one in Rust and one in C. Each main program runs in a loop, loading the library with dlopen(), calling a function, and then closing it with dlclose(). The expectation is that any changes to the library will be picked up by the main program as soon as the library is recompiled.
However, the behavior between the two programs differs. If I run the two main programs side-by-side, then make a change to the returned message and recompile the library, only the C program immediately reflects the change. The Rust main program won't reflect any changes until it is fully restarted.
It appears that this is Mac-specific behavior. When the same test is run on Debian, the two main programs behave identically.
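For readers without the repository at hand, the load/call/unload cycle described above can be sketched roughly as follows. This is not the repository's code: the helper names call_once and last_dl_error are made up for illustration, the dl* functions are declared by hand instead of going through a bindings crate, and get_message is assumed to return a NUL-terminated C string that the caller may read but does not free.

```rust
use std::ffi::{c_void, CStr, CString};
use std::os::raw::{c_char, c_int};

// POSIX dynamic-loading API, declared by hand to keep the sketch dependency-free.
extern "C" {
    fn dlopen(filename: *const c_char, flag: c_int) -> *mut c_void;
    fn dlsym(handle: *mut c_void, symbol: *const c_char) -> *mut c_void;
    fn dlclose(handle: *mut c_void) -> c_int;
    fn dlerror() -> *mut c_char;
}

// RTLD_NOW has the value 2 on both Linux and macOS.
const RTLD_NOW: c_int = 2;

/// Copies the current dlerror() text. The buffer belongs to the loader,
/// so it is borrowed with CStr and never freed here.
fn last_dl_error() -> String {
    unsafe {
        let msg = dlerror();
        if msg.is_null() {
            String::from("unknown dl error")
        } else {
            CStr::from_ptr(msg).to_string_lossy().into_owned()
        }
    }
}

/// Hypothetical helper: open the library, call one symbol, close the library.
fn call_once(lib_path: &str, symbol: &str) -> Result<String, String> {
    let path = CString::new(lib_path).map_err(|e| e.to_string())?;
    let name = CString::new(symbol).map_err(|e| e.to_string())?;
    unsafe {
        let handle = dlopen(path.as_ptr(), RTLD_NOW);
        if handle.is_null() {
            return Err(last_dl_error());
        }
        let sym = dlsym(handle, name.as_ptr());
        if sym.is_null() {
            let err = last_dl_error();
            dlclose(handle);
            return Err(err);
        }
        // Assumed signature of the exported function.
        let get_message: extern "C" fn() -> *const c_char = std::mem::transmute(sym);
        // Copy the text before closing, since the pointer may point into the library.
        let message = CStr::from_ptr(get_message()).to_string_lossy().into_owned();
        dlclose(handle);
        Ok(message)
    }
}

fn main() {
    // A bounded loop for the sketch; the programs in the report loop indefinitely.
    for _ in 0..3 {
        match call_once("../app/target/debug/libapp.dylib", "get_message") {
            Ok(msg) => println!("Message: {}", msg),
            Err(err) => eprintln!("load failed: {}", err),
        }
        std::thread::sleep(std::time::Duration::from_secs(1));
    }
}
```

If dlclose() really unloads the image, the next iteration maps the rebuilt file and the new message appears; if the image stays resident, the old code keeps running until the process restarts.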
The Environment
Operating System: macOS Sierra 10.12.6
Rust Version:
rustc 1.23.0 (766bd11c8 2018-01-01)
binary: rustc
commit-hash: 766bd11c8a3c019ca53febdcd77b2215379dd67d
commit-date: 2018-01-01
host: x86_64-apple-darwin
release: 1.23.0
LLVM version: 4.0
C Compiler:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 8.0.0 (clang-800.0.38)
Target: x86_64-apple-darwin16.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
Your main.rs includes extern crate app; -- it may be that the linker on Linux is trimming the unused dependency but macOS is keeping it linked. If the library is loaded at startup, then dlopen/dlclose will just be bumping the reference count up and down.
Josh Stone at 2018-02-02 21:41:54
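One way to test the hypothesis above is to ask the loader whether the library is already resident before the program has called dlopen() itself. The sketch below does that with RTLD_NOLOAD. The helper name is_already_loaded is invented for illustration, and the flag values are the ones used by macOS and by glibc/musl on Linux; other targets are an assumption.

```rust
use std::ffi::{c_void, CString};
use std::os::raw::{c_char, c_int};

extern "C" {
    fn dlopen(filename: *const c_char, flag: c_int) -> *mut c_void;
    fn dlclose(handle: *mut c_void) -> c_int;
}

const RTLD_LAZY: c_int = 0x1;
#[cfg(target_os = "macos")]
const RTLD_NOLOAD: c_int = 0x10;
#[cfg(not(target_os = "macos"))]
const RTLD_NOLOAD: c_int = 0x4; // glibc and musl value; assumed for other targets

/// Hypothetical helper: true if the loader already has this image mapped.
fn is_already_loaded(lib_path: &str) -> bool {
    let path = match CString::new(lib_path) {
        Ok(p) => p,
        Err(_) => return false,
    };
    unsafe {
        // With RTLD_NOLOAD, dlopen never maps a new image; it returns NULL
        // unless the image is already resident.
        let handle = dlopen(path.as_ptr(), RTLD_LAZY | RTLD_NOLOAD);
        if handle.is_null() {
            return false;
        }
        // A successful probe takes a reference of its own, so release it again.
        dlclose(handle);
        true
    }
}

fn main() {
    let path = "../app/target/debug/libapp.dylib";
    println!("{} resident at startup: {}", path, is_already_loaded(path));
}
```

If this reports true at the top of main, the library was linked in at startup, and a later dlopen()/dlclose() pair only moves the reference count between 1 and 2.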
Ah, that's a good call. Unfortunately, it looks like removing extern crate app; causes it to segfault, which also doesn't happen on Linux.
Damien Radtke at 2018-02-02 22:03:32
Can you capture any information about the segfault? Perhaps a debugger backtrace?
Josh Stone at 2018-02-03 01:30:03
Likely a duplicate of https://github.com/rust-lang/rust/issues/28794.
Simonas Kazlauskas at 2018-02-03 16:55:47
A quick look at your Rust code reveals it invoking undefined behaviour. You use CString to null-terminate your literals, however CString::new(&symbol[..]).unwrap().into_raw() will immediately free the buffer CString allocates so the C code reads an invalid pointer.
This could also be a cause for different behaviour.
Simonas Kazlauskas at 2018-02-03 16:58:27
Here's what the debugger says when I run it:
Process 12004 launched: '/Users/dradtke/Workspace/rust/dylib/main/target/debug/main' (x86_64)
Message: hello there world
Process 12004 stopped
* thread #1: tid = 0x78d98d, 0x00007fff9932dca9 libsystem_platform.dylib`OSSpinLockLock + 7, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x00007fff9932dca9 libsystem_platform.dylib`OSSpinLockLock + 7
libsystem_platform.dylib`OSSpinLockLock:
-> 0x7fff9932dca9 <+7>: lock
   0x7fff9932dcaa <+8>: cmpxchgl %ecx, (%rdi)
   0x7fff9932dcad <+11>: jne 0x7fff9932dcb0 ; <+14>
   0x7fff9932dcaf <+13>: retq
And the full backtrace:
warning: could not load any Objective-C class information. This will significantly reduce the quality of type information available.
* thread #1: tid = 0x78d98d, 0x00007fff9932dca9 libsystem_platform.dylib`OSSpinLockLock + 7, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
  * frame #0: 0x00007fff9932dca9 libsystem_platform.dylib`OSSpinLockLock + 7
    frame #1: 0x00000001000250c6 main`je_arena_dalloc_large [inlined] je_malloc_mutex_lock + 38 at mutex.h:99 [opt]
    frame #2: 0x00000001000250ba main`je_arena_dalloc_large(tsdn=0x000000010060d008, arena=0x4d746c7561666544, chunk=0x0000000100200000, ptr=0x00000001003002c0) + 26 at arena.c:3075 [opt]
    frame #3: 0x0000000100026625 main`je_arena_ralloc [inlined] je_arena_sdalloc(slow_path=true) + 12 at arena.h:1516 [opt]
    frame #4: 0x0000000100026619 main`je_arena_ralloc [inlined] je_isdalloct(slow_path=true) + 164 at jemalloc_internal.h:1195 [opt]
    frame #5: 0x0000000100026575 main`je_arena_ralloc [inlined] je_isqalloc(slow_path=true) at jemalloc_internal.h:1205 [opt]
    frame #6: 0x0000000100026575 main`je_arena_ralloc(tsd=0x000000010060d008, arena=0x0000000000000000, ptr=<unavailable>, oldsize=<unavailable>, size=<unavailable>, alignment=<unavailable>, zero=<unavailable>, tcache=<unavailable>) + 2037 at arena.c:3376 [opt]
    frame #7: 0x000000010001cc79 main`je_rallocx [inlined] je_iralloct(ptr=<unavailable>, oldsize=<unavailable>, alignment=0, tcache=<unavailable>, arena=0x0000000000000000) + 263 at jemalloc_internal.h:1259 [opt]
    frame #8: 0x000000010001cb72 main`je_rallocx(ptr=0x00000001003002c0, size=33, flags=<unavailable>) + 674 at jemalloc.c:2414 [opt]
    frame #9: 0x0000000100019ee1 main`alloc_jemalloc::contents::__rde_realloc + 81 at lib.rs:170 [opt]
    frame #10: 0x0000000100002d68 main`alloc::vec::{{impl}}::reserve_exact<u8> [inlined] alloc::heap::{{impl}}::realloc + 19 at heap.rs:127 [opt]
    frame #11: 0x0000000100002d55 main`alloc::vec::{{impl}}::reserve_exact<u8> [inlined] alloc::raw_vec::{{impl}}::reserve_exact<u8,alloc::heap::Heap> + 28 at raw_vec.rs:429 [opt]
    frame #12: 0x0000000100002d39 main`alloc::vec::{{impl}}::reserve_exact<u8> + 25 at vec.rs:486 [opt]
    frame #13: 0x0000000100006bee main`std::ffi::c_str::{{impl}}::from_vec_unchecked + 30 at c_str.rs:360 [opt]
    frame #14: 0x0000000100006ba2 main`std::ffi::c_str::{{impl}}::_new + 114 at c_str.rs:335 [opt]
    frame #15: 0x00000001000020fc main`std::ffi::c_str::{{impl}}::new<&str>(t=(data_ptr = "../app/target/debug/libapp.dylibget_message", length = 32)) + 60 at c_str.rs:329
    frame #16: 0x0000000100002246 main`main::main + 102 at main.rs:19
    frame #17: 0x000000010003fc0f main`panic_unwind::__rust_maybe_catch_panic + 31 at lib.rs:101 [opt]
    frame #18: 0x000000010000fab9 main`std::rt::lang_start [inlined] std::panicking::try<(),closure> + 51 at panicking.rs:459 [opt]
    frame #19: 0x000000010000fa86 main`std::rt::lang_start [inlined] std::panic::catch_unwind<closure,()> at panic.rs:365 [opt]
    frame #20: 0x000000010000fa86 main`std::rt::lang_start + 422 at rt.rs:58 [opt]
    frame #21: 0x0000000100002705 main`main + 37
    frame #22: 0x00007fff9911f235 libdyld.dylib`start + 1
    frame #23: 0x00007fff9911f235 libdyld.dylib`start + 1
Damien Radtke at 2018-02-05 16:02:22
@nagisa
"however CString::new(&symbol[..]).unwrap().into_raw() will immediately free the buffer"
That's not true -- CString::into_raw() relinquishes ownership, and that will just leak unless you pass the memory back to CString::from_raw() later.
But that does highlight to me that the other from_raw() calls are problematic. Especially CString::from_raw(dlerror()), as dlerror()'s return value is not meant to be freed by the caller. That should probably be CStr::from_ptr() instead.
The other CString::from_raw(func()) might be OK, when you're absolutely sure that func() is returning memory that came from CString::into_raw(). Plus, those CStrings need to be using the same allocator, which is what I suspect broke after removing extern crate app, since the crash is in jemalloc.
Generally speaking, allocating in one domain and freeing in another is fraught with danger.
Josh Stone at 2018-02-05 23:05:44
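The ownership rules from the comment above can be illustrated with a small self-contained sketch. The function names are invented for this example, and the static byte string merely stands in for a loader-owned buffer such as the one dlerror() returns.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hands ownership of a freshly allocated C string to the caller.
/// into_raw() does not free the buffer; it leaks until reclaimed.
fn export_message(text: &str) -> *mut c_char {
    CString::new(text).expect("no interior NUL").into_raw()
}

/// Takes ownership back. Only valid for pointers that came from
/// CString::into_raw() under the same allocator.
unsafe fn reclaim_message(ptr: *mut c_char) -> String {
    unsafe { CString::from_raw(ptr) }
        .into_string()
        .expect("valid UTF-8")
}

/// Copies a string we do not own, without freeing it.
unsafe fn copy_borrowed(ptr: *const c_char) -> String {
    unsafe { CStr::from_ptr(ptr) }.to_string_lossy().into_owned()
}

fn main() {
    let owned = export_message("hello world");
    // Reading through CStr leaves ownership untouched.
    let copy = unsafe { copy_borrowed(owned) };
    // The buffer is freed exactly once, by the code that allocated it.
    let back = unsafe { reclaim_message(owned) };
    println!("{} / {}", copy, back);

    // Stand-in for memory owned by someone else: borrow it, never from_raw() it.
    static FOREIGN: &[u8] = b"symbol not found\0";
    let err = unsafe { copy_borrowed(FOREIGN.as_ptr() as *const c_char) };
    println!("{}", err);
}
```

The pattern that does produce a dangling pointer is calling as_ptr() on a temporary CString, because the temporary is dropped at the end of the statement; into_raw() avoids that by giving up ownership.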
I have a somewhat similar problem with my library, so I tried the repository above. My Rust version is the same, but I'm on High Sierra 10.13.3.
I ran it with DYLD_PRINT_APIS=1 to see dyld log.
It (reloading) actually worked correctly.
dlopen(../app/target/debug/libapp.dylib, 0x00000002)
dyld_image_path_containing_address(0x1019ce000)
dlopen(../app/target/debug/libapp.dylib) ==> 0x10261b000
dlsym(0x10261b000, get_message)
dlsym(0x10261b000, get_message) ==> 0x1019cf620
Message: hello world
dlclose(0x10261b000)
dlclose(), found unused image 0x10261b000 libapp.dylib
dlclose(), deleting 0x10261b000 libapp.dylib
dlopen(../app/target/debug/libapp.dylib, 0x00000002)
dyld_image_path_containing_address(0x1019ce000)
dlopen(../app/target/debug/libapp.dylib) ==> 0x10261b000
dlsym(0x10261b000, get_message)
dlsym(0x10261b000, get_message) ==> 0x1019cf620
Message: hello world
dlclose(0x10261b000)
dlclose(), found unused image 0x10261b000 libapp.dylib
dlclose(), deleting 0x10261b000 libapp.dylib
When I changed it to crate-type = ["cdylib"], dlclose no longer unloaded the lib (and the program either segfaulted or returned with error).
dlopen(../app/target/debug/libapp.dylib, 0x00000002)
dyld_image_path_containing_address(0x10e772000)
dlopen(../app/target/debug/libapp.dylib) ==> 0x7fe1e0700000
dlsym(0x7fe1e0700000, get_message)
dlsym(0x7fe1e0700000, get_message) ==> 0x10e773460
Message: hello world
dlclose(0x7fe1e0700000)
dlopen(../app/target/debug/libapp.dylib, 0x00000002)
dlopen(../app/target/debug/libapp.dylib) ==> 0x7fe1e0700000
dlsym(0x7fe1e0700000, get_mess)
dlsym(0x7fe1e0700000, get_mess) ==> NULL
dlerror()
Failed to retrieve get_message symbol: dlsym(0x7fe1e0700000, get_mess): symbol not found
This is quite weird, since the problem I have with my library is the opposite: unloading worked in Sierra, but stopped working in High Sierra (regardless of crate-type).
Tuấn-Anh Nguyễn at 2018-02-09 16:43:38
I'm experiencing this problem as well, on High Sierra (10.13.4). I noticed the following:
When dlcloseing a library written in C (clang -shared), the dylib gets unloaded as expected.
dlclose(0x7fc3eaf8b000)
dlclose(), found unused image 0x7fc3eaf8b000 libhsgame.dylib
dlclose(), deleting 0x7fc3eaf8b000 libhsgame.dylib
When I try the same exact thing again with an identical cdylib written in Rust, dlclose does not unload the library. A refcount > 0 can't be the problem, because even when I dlclose the cdylib 100 times in a loop, dlclose still refuses to release the library (DYLD_PRINT_APIS=1 confirms it only gets opened once and closed 100 times).
To my knowledge, dlclose only refuses to release a dylib when, other than a refcount > 0, the dylib is still being used somewhere (pointers holding addresses of the lib's symbols still exist), to avoid dangling pointers.
If that's the case, then the question is, where do these pointers come from? If not, what the hell else is going on?
Daniel Hauser at 2018-04-28 14:53:45
Ok, I think I found something - I tried two more things with my Rust cdylib:
- Switching to the system allocator, no effect.
- Switching to the system allocator and turning the cdylib into a no_std crate, this fixes the problem - dlclose releases the lib.
rustc --version
rustc 1.27.0-nightly (ac3c2288f 2018-04-18)
Daniel Hauser at 2018-04-28 15:25:34
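For reference, opting a crate into the system allocator looks roughly like the sketch below. It uses the stable std::alloc::System API, which postdates the nightly named in the comment above, so the exact spelling used at the time may have differed. The function build_message is a made-up placeholder.

```rust
use std::alloc::System;

// Route every heap allocation in this crate through the platform allocator
// (malloc/free on Unix) instead of a bundled one such as jemalloc.
#[global_allocator]
static GLOBAL: System = System;

/// Placeholder function whose String buffer is now allocated by System.
fn build_message() -> String {
    String::from("hello world")
}

fn main() {
    println!("{}", build_message());
}
```

As the comment reports, this step alone did not change the unload behavior; it only took effect together with dropping libstd.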
There’s a recent change in OS X that has "improved" dlclose to not actually unload libraries if some conditions are satisfied. See this comment. Perhaps that’s the reason your library wasn’t unloaded?
Simonas Kazlauskas at 2018-04-28 21:15:09
Thanks for the link @nagisa, this definitely seems to be related. Do you know of a page that lists all the cases that make a dylib un-unloadable? @nanotech's comment only lists a few and I'd like to figure out the exact reason why Rust cdylibs fall into that category.
Daniel Hauser at 2018-04-28 22:09:16
Historically, thread local storage (__thread) is what causes rust dylibs to not get unloaded or generally not work right with dlclose.
I don’t know of a full list though.
Simonas Kazlauskas at 2018-04-28 23:03:25
This is very likely to be related to the issues described in https://github.com/rust-lang/rust/issues/88737 and https://github.com/rust-lang/rust/issues/88737#issuecomment-1178525208. Fixing this will likely not result in the behavior the user wants though -- the fact that dlclose works anywhere is kind of a bug; when it fails to unload, that is actually macOS doing the right thing.
In general you should not dlclose rust libraries that use libstd. There's no way for us to support this on many targets (and we don't quite support it correctly on all the platforms where we could, which is why sometimes it will unload, which can be unsound).
Unfortunately, dlclose is just not really coherent in programs which have thread local storage (in particular if destructors need to be run on that TLS data). See that issue for an explanation of why.
Thom Chiovoloni at 2022-09-23 02:03:52
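To make the thread-local point concrete, here is a minimal sketch of a TLS value with a destructor, the kind of state libstd-using code can carry. All names are hypothetical. The assumption being illustrated is that once a thread has touched such a value, a destructor whose code lives in the library is pending until that thread exits, so unmapping the library before then would leave that destructor pointing at unloaded code.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Counts how many times the TLS destructor has run.
static DROPS: AtomicUsize = AtomicUsize::new(0);

struct Guard;

impl Drop for Guard {
    fn drop(&mut self) {
        // In a dylib, this code would live inside the library's image.
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

thread_local! {
    // Lazily initialized per thread; the destructor is registered on first use.
    static GUARD: Guard = Guard;
}

/// Touches the TLS value on a fresh thread and returns the number of
/// destructor runs observed after that thread has been joined.
fn touch_tls_on_new_thread() -> usize {
    thread::spawn(|| {
        GUARD.with(|_| {}); // first access registers the destructor
    })
    .join()
    .expect("worker thread panicked");
    DROPS.load(Ordering::SeqCst)
}

fn main() {
    println!("destructor runs so far: {}", touch_tls_on_new_thread());
}
```

The destructor only runs when the owning thread exits, which is a point in time the code calling dlclose() does not control.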
Musl libc doesn't even implement dlclose at all. It returns without doing anything.
bjorn3 at 2023-01-16 19:27:32