RawVec stores a capacity field even if T is zero-sized

cfdb857
Opened by gnzlbg at 2025-02-14 03:59:23

When std::mem::size_of::<T>() == 0 RawVec<T> has always a capacity equal to std::usize::MAX elements. It currently stores a cap: usize that always contains this value but doing so is unnecessary in this case.

@withoutboats this would be a not so far fetched case for allowing struct specialization like @arielb1 proposed.

  1. Time to interpret what https://doc.rust-lang.org/nightly/std/vec/struct.Vec.html#guarantees actually means for this

    Most fundamentally, Vec is and always will be a (pointer, capacity, length) triplet. No more, no less. The order of these fields is completely unspecified, and you should use the appropriate methods to modify these. The pointer will never be null, so this type is null-pointer-optimized.

    Plain reading says that the capacity field must be there. It's a bit silly, since the only interface with the representation should be with Vec::from_raw_parts, where this "triplet" guarantee isn't needed.

    bluss at 2017-10-21 15:16:44

  2. @bluss I am finishing overhauling the growth-strategy for vectors (#29931, #27627, I'd like to find a mentor to discuss it before submitting a PR) but I'd like to improve on this afterwards.

    My proposal would be to amend those guarantees as follows:

    Most fundamentally, Vec<T> where T is not zero-sized is and always will be a (pointer, capacity, length) triplet. No more, no less. The order of these fields is completely unspecified, and you should use the appropriate methods to modify these. The pointer will never be null, so this type is null-pointer-optimized. When T is zero-sized the layout of Vec<T> is unspecified.

    And then to replace the cap field of RawVec with a zero-sized type when T is zero-sized.

    gnzlbg at 2017-10-21 15:23:16

  3. FWIW I think that the standard library should make more guarantees than it currently does, in particular with respect to the space and time complexity of operations.

    Yet I think this is an example of something that the standard library should never guarantee. The wording kills me:

    Vec is and always will be a (pointer, capacity, length) triplet. No more, no less. The order of these fields is completely unspecified

    What is this guarantee for? It sounds like I can rely on this for serialization purposes, but then the Vec type has different sizes in 32 and 64-bit architectures. This wording also prevents other equally-sized implementations, e.g., a triple of pointers, that might optimize better on a different backend, and the current topic, optimizations for zero-sized types.

    Changing this wording would be a breaking change, but if we can't change this, this should become an example of how the types of guarantees that standard library types should not be making.

    gnzlbg at 2017-10-21 15:38:18

  4. Cool. Does it include using usable_size as well? (Don't worry if not, I think that's fine to address as a separate little project too.)

    I think the guarantees are definitely overreaching there, but since they are not well defined enough to actually be used for much of anything, maybe the whole triplet thing can be reworded — the Vec has a representation equivalent to pointer, capacity and length, it's just that the capacity doesn't need to be stored for some element types.

    By the way, do RawVec and Vec need to store a pointer field for ZST?

    For questions, I'll certainly answer anything I can if you find me here or on IRC. Then I suppose @Gankro can as well, if he has time.

    I haven't heard about struct field specialization, and would love to read more about that.

    bluss at 2017-10-21 15:44:06

  5. I agree it could also forgo the pointer. All a ZST Vec should need to store is a length. I guess it only depends on whether from_raw_parts followed by as_ptr needs to preserve arbitrary pointer values.

    Josh Stone at 2017-10-21 18:21:02

  6. @cuviper I am talking about RawVec here, but I agree, Vec only needs to store a length, RawVec doesn't need to store anything.

    gnzlbg at 2017-10-24 15:29:00

  7. We can't implement this until some kind of specialization allows it in a backwards compatible way. Aside from doubts about associated type specialization, the problem of keeping Vec covariant in T., Which is not the case with a.t.spec. now AFAIU.

    bluss at 2017-10-24 15:50:12

  8. @bluss can you elaborate about how this kind of specialization would change the variance of Vec ? AFAIK the only way in which a user of Vec<T> can tell whether this specialization is applied is by observing the mem::size_of::<Vec<T>>. We would need some "hacks" to make iteration on these vectors work, but that should be doable (e.g. using a static variable and returning references to that).

    gnzlbg at 2017-10-25 15:15:36

  9. @gnzlbg Yeah, I was meaning to come back to you with an example.

    Here's some code that shows that code that compiles for current Vec<T> does not compile if we instead use an associated type (that depends on T) in the representation.

    (playground link)

    <details>
    struct CurrentVec<T> {
        elements: Box<T>
    }
    
    struct SmartRepresentationVec<T> {
        elements: <T as Smart>::Representation,
    }
    
    trait Smart {
        type Representation;
    }
    
    impl<T> Smart for T {
        type Representation = Box<T>;
    }
    
    // This compiles!
    fn change_lifetime1<'a, T>(input: CurrentVec<&'static T>) -> CurrentVec<&'a T> {
        input
    }
    
    // This does not! The SmartRepresentationVec is invariant in its parameter.
    fn change_lifetime2<'a, T>(input: SmartRepresentationVec<&'static T>) -> SmartRepresentationVec<&'a T> {
        input
    }
    
    </details>

    bluss at 2017-10-27 17:49:28

  10. @bluss why is that?

    gnzlbg at 2017-10-28 14:56:55

  11. I don't know exactly why the rule is like this.

    It looks like niko explains the rule here: https://github.com/rust-lang/rust/issues/21726#issuecomment-71949910 by showing how the old ~~"covariant"~~ rule (for us here, "covariant in T" means "be like Vec<T>") was unsound in some situations.

    bluss at 2017-10-28 15:11:19

  12. I would like to see an implementation of this in a PR.

    My reading of the Guarantees section seems like the "no more, no less" claim is just a generalization of the following claims, not a further claim of its own. It just means to say that Vec can't be a rope, can't do small-vector optimization, must support efficient conversion to Box<[T]> when len == capacity, etc all the other following claims. I agree with https://github.com/rust-lang/rust/issues/45431#issuecomment-338411384 that the "no more, no less" claim is not actionable by itself and so there is no problem rewording it.

    David Tolnay at 2017-11-18 22:09:26

  13. Notice that in https://github.com/rust-lang/unsafe-code-guidelines/issues/224 a different extension is discussed. That is, adding a mem::CompressedNonNull<T> type that's a ZST if T is a ZST, in which case the address is always just mem::align_of::<T>().

    With this and that optimizations, Vec<T> would "semantically" still have 3 fields (ptr, size, cap), but ptr and cap would be ZSTs when T is a ZST, making Vec<ZST> 1-word wide.

    A way to provide wording that guarantees that these optimizations will never happen would be to, e.g., guarantee that the size_of::<Vec<T>>() == size_of::<usize>() * 3 for all Ts. The current wording is a bit ambiguous on what's allowed or not.

    gnzlbg at 2020-01-10 12:32:59

  14. Is there a tangible use case for making Vec smaller for some types?

    bluss at 2020-04-13 14:34:16

  15. As noted in #117763, fixing this could allow giving Cow<[T]> a more optimal layout for T of all sizes. Special-casing the type of the capacity field when T is a ZST, would allow capping it to isize::MAX in all other cases. This would create a new niche, which Cow's layout could exploit to good effect.

    @rustbot label A-layout A-specialization I-heavy I-slow

    Jules Bertholet at 2023-11-09 19:52:12