Reference to a #[thread_local] is dereferenced too early
I'm working on the dynamic library for elfmalloc for Mac, and one of the caveats that we have to work with is that, during dynamic library loading, TLS is unavailable (if you try to access a `#[thread_local]`, the program simply aborts).
In order to work around this, we have a mechanism which checks to make sure that dynamic loading is finished before we access any thread-local storage (see the `with` method).
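The `with` guard itself isn't shown in the issue; as a rough sketch of the idea, it might look like the following. (The `DYNAMIC_LOADING_DONE` flag and this simplified `TLSSlot` are illustrative, not the actual alloc-tls implementation, which additionally has to make the slot a real `#[thread_local]`.)

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicBool, Ordering};

// Illustrative flag: flipped once dynamic library loading has finished.
static DYNAMIC_LOADING_DONE: AtomicBool = AtomicBool::new(false);

struct TLSSlot<T> {
    inner: UnsafeCell<T>,
}

impl<T> TLSSlot<T> {
    // Run `f` on the slot only if dynamic loading is finished;
    // otherwise return None so the caller can take a fallback path.
    fn with<R, F: FnOnce(&UnsafeCell<T>) -> R>(&self, f: F) -> Option<R> {
        if !DYNAMIC_LOADING_DONE.load(Ordering::Acquire) {
            return None;
        }
        Some(f(&self.inner))
    }
}

fn main() {
    let slot = TLSSlot { inner: UnsafeCell::new(7u32) };
    // Before loading is "done", the fallback path is taken.
    assert!(slot.with(|c| unsafe { *c.get() }).is_none());
    DYNAMIC_LOADING_DONE.store(true, Ordering::Release);
    assert_eq!(slot.with(|c| unsafe { *c.get() }), Some(7));
    println!("ok");
}
```

The point of the guard is that the slot must never be touched before the check passes; the bug described below is that the generated code touched it anyway.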
Recently, we encountered a strange bug: We were spuriously accessing thread-local values while dynamic loading was still happening. The code in question looks like this:
```rust
pub unsafe fn alloc(size: usize) -> *mut u8 {
    LOCAL_ELF_HEAP.with(|h| (*h.get()).alloc.alloc(size))
        .unwrap_or_else(|| super::large_alloc::alloc(size))
}
```
`LOCAL_ELF_HEAP` is a `#[thread_local]` variable, and in the `with` method, we make sure that dynamic loading is done before we dereference `&self`. To make a long story short, it appears as though the generated code has reordered the instructions to dereference `LOCAL_ELF_HEAP` before we check to make sure that dynamic loading is done. You can see that in the disassembly of the `alloc` method (output from lldb):
```
0x10f1b7670 <+0>:  pushq %rbp
0x10f1b7671 <+1>:  movq %rsp, %rbp
0x10f1b7674 <+4>:  subq $0x40, %rsp
0x10f1b7678 <+8>:  leaq -0x20(%rbp), %rax
0x10f1b767c <+12>: leaq -0x28(%rbp), %rcx
0x10f1b7680 <+16>: movq %rdi, -0x28(%rbp)
0x10f1b7684 <+20>: movq %rcx, -0x10(%rbp)
0x10f1b7688 <+24>: movq -0x10(%rbp), %rdx
0x10f1b768c <+28>: leaq 0xaf5f5(%rip), %rdi ; elfmalloc::general::global::LOCAL_ELF_HEAP::h7d8e035078da16c9
0x10f1b7693 <+35>: movq %rax, -0x30(%rbp)
0x10f1b7697 <+39>: movq %rdx, -0x38(%rbp)
0x10f1b769b <+43>: callq *(%rdi)
-> 0x10f1b769d <+45>: leaq -0x20(%rbp), %rdi
0x10f1b76a1 <+49>: movq %rax, %rsi
0x10f1b76a4 <+52>: movq -0x38(%rbp), %rdx
0x10f1b76a8 <+56>: callq 0x10f1e2ac0 ; _$LT$alloc_tls..TLSSlot$LT$T$GT$$GT$::with::h2a59e6dd7499a672 at lib.rs:209
0x10f1b76ad <+61>: leaq -0x20(%rbp), %rdi
0x10f1b76b1 <+65>: leaq -0x28(%rbp), %rax
0x10f1b76b5 <+69>: movq %rax, -0x8(%rbp)
0x10f1b76b9 <+73>: movq -0x8(%rbp), %rsi
0x10f1b76bd <+77>: callq 0x10f1eb9a0 ; _$LT$core..option..Option$LT$T$GT$$GT$::unwrap_or_else::h09d8edc34df58ca5 at option.rs:373
0x10f1b76c2 <+82>: movq %rax, -0x40(%rbp)
0x10f1b76c6 <+86>: movq -0x40(%rbp), %rax
0x10f1b76ca <+90>: addq $0x40, %rsp
0x10f1b76ce <+94>: popq %rbp
0x10f1b76cf <+95>: retq
```
The `->` points to the next instruction to be executed at the time the program aborted. The `callq` just before it is to libdyld.dylib's `_tlv_bootstrap`, which is simply implemented as `abort();`.
The key thing to note is that the call to `_tlv_bootstrap` (which attempts, and fails, to access the `#[thread_local]`) comes before the call to `with` a couple of instructions later. I'm not positive, but my hypothesis is that the compiler reordered the accesses because immutable references are supposed to guarantee that the referent exists, and so it thought the reordering was valid.
I think `compiler_fence` may be a way to enforce the ordering you want?

Steven Fackler at 2018-03-08 05:59:38
Perhaps; I'll see if that fixes it. However, I feel like this should be something that the compiler fixes, as it seems like a soundness issue to me (namely, it lets you create an immutable reference to something that doesn't exist).
Joshua Liebow-Feeser at 2018-03-08 06:07:09
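The `compiler_fence` suggestion might look like the following sketch (illustrative names, not the actual elfmalloc code; and note that, as reported later in the thread, this did not end up fixing the bug):

```rust
use std::sync::atomic::{compiler_fence, AtomicBool, Ordering};

// Illustrative flag standing in for the "dynamic loading done" check.
static LOADING_DONE: AtomicBool = AtomicBool::new(true);

// Sketch: place a fence between the guard check and the dereference,
// hoping to stop the compiler from hoisting the access above the check.
fn guarded_read(value: &u64) -> Option<u64> {
    if !LOADING_DONE.load(Ordering::SeqCst) {
        return None;
    }
    // compiler_fence only constrains compiler reordering (not the CPU),
    // and it does not override the compiler's assumption that a `&T`
    // referent is dereferenceable.
    compiler_fence(Ordering::SeqCst);
    Some(*value)
}

fn main() {
    let v = 5u64;
    assert_eq!(guarded_read(&v), Some(5));
    println!("ok");
}
```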
You can also produce immutable references to things that don't exist if you `munmap` chunks of your address space. The language makes some assumptions about the state of the world that it expects. One of those assumptions is that you are not running while the dynamic loader is stitching the world together :)

Steven Fackler at 2018-03-08 06:11:41
Unfortunately it looks like `compiler_fence` doesn't fix it. I'm not really sure why.

> You can also produce immutable references to things that don't exist if you `munmap` chunks of your address space. The language makes some assumptions about the state of the world that it expects. One of those assumptions is that you are not running while the dynamic loader is stitching the world together :)

But in order to do that, you need to call `munmap`, which is unsafe. My point is that you can create a `#[thread_local]` in safe code, which means that you can violate Rust's memory safety guarantees in safe Rust.

Joshua Liebow-Feeser at 2018-03-09 08:39:51
What do you propose the solution to this is at the language level, then? Should we make `#[thread_local]` access unsafe?

Steven Fackler at 2018-03-09 17:19:53
> But in order to do that, you need to call `munmap`, which is unsafe.

You don't need to do that; something that can affect your process needs to do that.

Steven Fackler at 2018-03-09 17:42:13
> What do you propose the solution to this is at the language level, then? Should we make `#[thread_local]` access unsafe?

I'm not sure whether this is simple or not, but I think the ideal solution would simply be to somehow disallow that optimization from taking place when it comes to accessing thread locals on Mac. I agree that technically my argument implies that accessing `#[thread_local]`s should be unsafe, but I also understand that that's pretty draconian given how niche of a problem this is.

> But in order to do that, you need to call `munmap`, which is unsafe.
>
> You don't need to do that; something that can affect your process needs to do that.

I have a philosophical disagreement with this point, but seeing as I've already ceded that it's fine to have `#[thread_local]` access be safe even though it technically shouldn't be, I'm not sure us agreeing on this is important for this discussion.

Joshua Liebow-Feeser at 2018-03-09 18:55:38