Unions interacting with Enum layout optimization

Opened by Mark Rousskov at 2024-11-01 19:11:38

IRC (#rust) short discussion log, since I don't actually know much about this:

19:30 Amanieu: bluss: I had a look at nodrop-union, shouldn't you add another field to union with type () to inhibit the enum layout optimization?
19:31 Amanieu: admittedly the exact interactions between enums and unions are still somewhat poorly specified
19:31 bluss: Amanieu: good question. As it is now, that optimization is not used for unions
19:31 bluss: Amanieu: but I haven't checked the RFC discussions to see how it's going to be
19:32 Amanieu: I don't think this point was covered in the discussions
19:33 Amanieu: well, someone should probably open an issue about it
19:33 Amanieu: it's 2am here so I'll let someone else do that :P
19:34 bluss: it's 3am here, so I'm passing it on

cc @Amanieu @bluss

  1. Some specific examples:

    union A {
        x: Box<i32>,
        y: Box<i64>,
    }
    
    union B {
        x: Box<i32>,
        y: Box<i64>,
        z: (),
    }
    

    The question here is essentially: what is the size of Option<A>? Should the enum layout optimization apply, in which case it is just one pointer wide? Or should it act like Option<B>, where the () inhibits this optimization and forces the use of a separate enum tag? The key point is that the optimization requires every union variant to be NonZero, which is not the case for ().

    Amanieu d'Antras at 2016-09-11 01:41:19
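
    The Option<A> versus Option<B> question from comment 1 can be checked empirically. This is a minimal sketch, not a statement of guaranteed layout: the Box fields are wrapped in ManuallyDrop (an addition that is not in the original example) because stable Rust only accepts union fields that are Copy or ManuallyDrop, and the printed sizes only describe what the compiler in use happens to do for repr(Rust) types.

    ```rust
    use std::mem::{size_of, ManuallyDrop};

    // Same shape as union A above; ManuallyDrop is added so this compiles on stable.
    #[allow(dead_code)]
    union A {
        x: ManuallyDrop<Box<i32>>,
        y: ManuallyDrop<Box<i64>>,
    }

    // Same shape as union B above: identical to A plus a zero-sized () field.
    #[allow(dead_code)]
    union B {
        x: ManuallyDrop<Box<i32>>,
        y: ManuallyDrop<Box<i64>>,
        z: (),
    }

    fn main() {
        println!("usize:            {}", size_of::<usize>());
        println!("A:                {}", size_of::<A>());
        println!("B:                {}", size_of::<B>());
        println!("Option<A>:        {}", size_of::<Option<A>>());
        println!("Option<B>:        {}", size_of::<Option<B>>());

        // Documented guarantee for comparison: Option<Box<T>> uses the null
        // pointer as None and is the same size as Box<T>.
        assert_eq!(size_of::<Option<Box<i32>>>(), size_of::<Box<i32>>());
        println!("Option<Box<i32>>: {}", size_of::<Option<Box<i32>>>());
    }
    ```

    If Option<A> prints larger than A, the compiler stored a separate tag and did not look inside the union for a forbidden bit pattern. The sizes of A and B themselves are expected to match, since the () field contributes zero bytes.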

  2. Previous discussion (11 April - 28 May) https://github.com/rust-lang/rfcs/pull/1444#issuecomment-208268447

    bluss at 2016-09-11 01:55:02

  3. union B could use the layout optimization just as well as union A, since B's () member has padding bytes where the discriminant would want to be placed.

    bluss at 2016-09-11 11:04:48

  4. Layout of non-#[repr(C)] unions is unspecified, just like the layout of any other ADT. If you want any layout guarantees, I’d recommend sticking to #[repr(C)] for now.

    Simonas Kazlauskas at 2016-09-11 11:31:25

  5. This is another open question for the tracking issue: #32836

    bluss at 2016-09-11 11:53:43

  6. @nagisa While repr(C) fixes the representation of a type Foo, it does not say anything (that I know to be documented) about the representation of None::<Foo>, i.e. types composed of Foo.

    bluss at 2016-10-08 18:26:37

  7. @Amanieu unions don't have a discriminant, though, so all three unions should have the size of a pointer.

    Adding a () to a union should never add any size to the union, because it adds zero bits of information. You can always interpret a union as any type you want, and you need 0 bits to identify the 1 possible value of type ().

    I don't think the enum layout optimization needs to apply here at all; the size of a union should always match the maximum size of any field type.

    Josh Triplett at 2017-01-02 22:49:59

  8. > While repr(C) fixes the representation of a type Foo, it does not say anything (that I know to be documented) about the representation of None::<Foo>, i.e. types composed of Foo.

    Why do you care about the representation of Option<T>? It is repr(Rust) and therefore unspecified.

    Simonas Kazlauskas at 2017-02-02 23:28:03

  9. I think that this discussion has come to the conclusion that unions interact "as all other things" with enum layout optimization, and I'm going to close.

    Mark Rousskov at 2017-05-20 13:06:44

  10. I think this issue should be re-opened.

    It is originally about the nodrop-union crate, which is pretty much the same as std::mem::ManuallyDrop. The problem is that both of these types:

    • Are generic. They may or may not contain something that implements Drop. They may or may not contain NonZero<_> (or something else that might qualify for future enum layout optimizations).
    • One of their use cases is holding uninitialized data, for example in arrayvec.

    They need not only a way to inhibit automatic drop glue (which union is already guaranteed to do) but also stop enum layout optimization “from the outside” from peeking “inside” them (which union happens to do in the current implementation, but maintaining that doesn’t seem agreed-upon).

    Concretely, ManuallyDrop needs to be written in some way (whether that’s #[repr(C)], adding a () variant, or something else) such that this code is guaranteed not to read uninitialized memory:

    Some(ManuallyDrop::new(uninitialized::<[String; 100]>())).is_some()
    

    More generally, we need to have principles for what to do to use std::mem::uninitialized safely in a generic context. std::mem::ManuallyDrop is probably part of that story.

    Simon Sapin at 2017-05-20 14:28:01
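
    The generic concern in comment 10 can be sketched with today's standard library. Two assumptions go beyond the thread: std::mem::uninitialized has since been deprecated, so the sketch uses MaybeUninit::uninit() in its place, and tag_of_uninit is a hypothetical helper name introduced only for illustration. The sketch also contrasts the current ManuallyDrop, which is a transparent wrapper that exposes the inner type's invalid bit patterns to Option, with MaybeUninit, which is a union.

    ```rust
    use std::mem::{size_of, ManuallyDrop, MaybeUninit};

    /// Hypothetical helper: wraps an uninitialized T in Some and inspects only
    /// the enum tag. The payload bytes are never read or dropped.
    fn tag_of_uninit<T>() -> bool {
        let slot: Option<MaybeUninit<T>> = Some(MaybeUninit::uninit());
        slot.is_some()
    }

    fn main() {
        // The case from the comment above, with MaybeUninit instead of
        // ManuallyDrop::new(uninitialized()).
        assert!(tag_of_uninit::<[String; 100]>());
        assert!(tag_of_uninit::<Box<i32>>());

        // ManuallyDrop<T> has the same layout as T, so Option can reuse the
        // null pointer of the inner Box as its None value.
        println!(
            "Option<ManuallyDrop<Box<i32>>>: {}",
            size_of::<Option<ManuallyDrop<Box<i32>>>>()
        );
        // MaybeUninit<T> is a union; print its Option size for comparison.
        println!(
            "Option<MaybeUninit<Box<i32>>>:  {}",
            size_of::<Option<MaybeUninit<Box<i32>>>>()
        );
    }
    ```

    The design point is that is_some() is only safe to call here if the tag lives outside the possibly uninitialized payload, which is the property the comment asks to have guaranteed.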

  11. Hm, okay. I'll reopen; this was (or is? Was it merged?) an unresolved question in the RFC.

    Mark Rousskov at 2017-05-20 14:38:12

  12. Could you point to a specific part of the RFC? I can’t find it in https://github.com/rust-lang/rfcs/pull/1897.

    Simon Sapin at 2017-05-20 14:48:23

  13. I don't think the original union RFC (RFC 1444) mentioned it, but there was some discussion in the thread itself: https://github.com/rust-lang/rfcs/pull/1444#issuecomment-208268447 and https://github.com/rust-lang/rfcs/pull/1444#issuecomment-222315797 (but also just search for "enum layout")

    Mark Rousskov at 2017-05-20 14:52:47

  14. Ah, found https://github.com/rust-lang/rfcs/pull/1444#issuecomment-222315797 which concludes that adding a () field/variant inhibits enum layout optimization. But then https://github.com/rust-lang/rust/issues/36394#issuecomment-246174598 says the opposite.

    Simon Sapin at 2017-05-20 14:52:48

  15. > Ah, found rust-lang/rfcs#1444 (comment) which concludes that adding a () field/variant inhibits enum layout optimization. But then #36394 (comment) says the opposite.

    Just an update: at present MaybeUninit<T> depends on () inhibiting enum layout optimization to make constructs like the following safe:

    assert!(Some(MaybeUninit::<&u8>::zeroed()).is_some());
    

    Peter Todd at 2019-05-07 16:30:31
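
    The zeroed-reference case from comment 15 can be made concrete as follows. This is a sketch under the assumption that the compiler in use keeps the Option tag outside the MaybeUninit payload; zeroed_ref_is_some is a hypothetical helper name, and the payload is deliberately never passed to assume_init because an all-zero &u8 is not a valid reference.

    ```rust
    use std::mem::{size_of, MaybeUninit};

    /// Hypothetical helper: stores an all-zero "reference" inside Some and
    /// checks the tag. The zeroed payload is never turned into a real &u8.
    fn zeroed_ref_is_some() -> bool {
        let slot = Some(MaybeUninit::<&u8>::zeroed());
        slot.is_some()
    }

    fn main() {
        // If Option used the all-zero pattern of the payload as None,
        // this would report false.
        assert!(zeroed_ref_is_some());

        // Contrast: a plain Option<&u8> does use null as None, which is why
        // it is guaranteed to be the same size as &u8.
        assert_eq!(size_of::<Option<&u8>>(), size_of::<&u8>());

        println!("&u8:                     {}", size_of::<&u8>());
        println!("Option<&u8>:             {}", size_of::<Option<&u8>>());
        println!("Option<MaybeUninit<&u8>>: {}", size_of::<Option<MaybeUninit<&u8>>>());
    }
    ```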

  16. Also see https://github.com/rust-lang/unsafe-code-guidelines/issues/73: discussion of the invariant that a union has to satisfy at all times. This directly informs which layout optimizations are possible.

    Ralf Jung at 2019-05-18 22:37:11