Allocator traits and std::heap

Opened by Niko Matsakis at 2025-04-08 11:02:09

FCP proposal: https://github.com/rust-lang/rust/issues/32838#issuecomment-336957415 FCP checkboxes: https://github.com/rust-lang/rust/issues/32838#issuecomment-336980230


Tracking issue for rust-lang/rfcs#1398 and the std::heap module.

  • [x] land struct Layout, trait Allocator, and default implementations in alloc crate (https://github.com/rust-lang/rust/pull/42313)
  • [x] decide where parts should live (e.g. default impls has dependency on alloc crate, but Layout/Allocator could be in libcore...) (https://github.com/rust-lang/rust/pull/42313)
  • [ ] fixme from source code: audit default implementations in Layout for overflow errors (potentially switching to overflowing_add and overflowing_mul as necessary).
  • [x] decide if realloc_in_place should be replaced with grow_in_place and shrink_in_place (comment) (https://github.com/rust-lang/rust/pull/42313)
  • [ ] review arguments for/against associated error type (see subthread here)
  • [ ] determine what the requirements are on the alignment provided to fn dealloc. (See discussion on allocator rfc and global allocator rfc and trait Alloc PR.)
    • Is it required to deallocate with the exact align that you allocate with? Concerns have been raised that allocators like jemalloc don't require this, and it's difficult to envision an allocator that does require this. (more discussion). @ruuda and @rkruppe look like they've got the most thoughts so far on this.
  • [ ] should AllocErr be Error instead? (comment)
  • [x] Is it required to deallocate with the exact size that you allocate with? With the usable_size business we may wish to allow, for example, that if you allocate with (size, align) you may deallocate with a size anywhere in the range size...usable_size(size, align). It appears that jemalloc is totally ok with this (it doesn't require you to deallocate with the precise size you allocated with), and this would also allow Vec to naturally take advantage of the excess capacity jemalloc gives it when it does an allocation (although actually doing this is somewhat orthogonal to this decision; we're just empowering Vec). So far @Gankro has most of the thoughts on this. (@alexcrichton believes this was settled in https://github.com/rust-lang/rust/pull/42313 due to the definition of "fits")
  • [ ] similar to previous question: Is it required to deallocate with the exact alignment that you allocated with? (See comment from 5 June 2017)
  • [x] OSX/alloc_system is buggy on huge alignments (e.g. an align of 1 << 32) https://github.com/rust-lang/rust/issues/30170 #43217
  • [ ] should Layout provide a fn stride(&self) method? (See also https://github.com/rust-lang/rfcs/issues/1397, https://github.com/rust-lang/rust/issues/17027 )
  • [ ] Allocator::owns as a method? https://github.com/rust-lang/rust/issues/44302

State of std::heap after https://github.com/rust-lang/rust/pull/42313:

pub struct Layout { /* ... */ }

impl Layout {
    pub fn new<T>() -> Self;
    pub fn for_value<T: ?Sized>(t: &T) -> Self;
    pub fn array<T>(n: usize) -> Option<Self>;
    pub fn from_size_align(size: usize, align: usize) -> Option<Layout>;
    pub unsafe fn from_size_align_unchecked(size: usize, align: usize) -> Layout;

    pub fn size(&self) -> usize;
    pub fn align(&self) -> usize;
    pub fn align_to(&self, align: usize) -> Self;
    pub fn padding_needed_for(&self, align: usize) -> usize;
    pub fn repeat(&self, n: usize) -> Option<(Self, usize)>;
    pub fn extend(&self, next: Self) -> Option<(Self, usize)>;
    pub fn repeat_packed(&self, n: usize) -> Option<Self>;
    pub fn extend_packed(&self, next: Self) -> Option<(Self, usize)>;
}

pub enum AllocErr {
    Exhausted { request: Layout },
    Unsupported { details: &'static str },
}

impl AllocErr {
    pub fn invalid_input(details: &'static str) -> Self;
    pub fn is_memory_exhausted(&self) -> bool;
    pub fn is_request_unsupported(&self) -> bool;
    pub fn description(&self) -> &str;
}

pub struct CannotReallocInPlace;

pub struct Excess(pub *mut u8, pub usize);

pub unsafe trait Alloc {
    // required
    unsafe fn alloc(&mut self, layout: Layout) -> Result<*mut u8, AllocErr>;
    unsafe fn dealloc(&mut self, ptr: *mut u8, layout: Layout);

    // provided
    fn oom(&mut self, _: AllocErr) -> !;
    fn usable_size(&self, layout: &Layout) -> (usize, usize);
    unsafe fn realloc(&mut self,
                      ptr: *mut u8,
                      layout: Layout,
                      new_layout: Layout) -> Result<*mut u8, AllocErr>;
    unsafe fn alloc_zeroed(&mut self, layout: Layout) -> Result<*mut u8, AllocErr>;
    unsafe fn alloc_excess(&mut self, layout: Layout) -> Result<Excess, AllocErr>;
    unsafe fn realloc_excess(&mut self,
                             ptr: *mut u8,
                             layout: Layout,
                             new_layout: Layout) -> Result<Excess, AllocErr>;
    unsafe fn grow_in_place(&mut self,
                            ptr: *mut u8,
                            layout: Layout,
                            new_layout: Layout) -> Result<(), CannotReallocInPlace>;
    unsafe fn shrink_in_place(&mut self,
                              ptr: *mut u8,
                              layout: Layout,
                              new_layout: Layout) -> Result<(), CannotReallocInPlace>;

    // convenience
    fn alloc_one<T>(&mut self) -> Result<Unique<T>, AllocErr>
        where Self: Sized;
    unsafe fn dealloc_one<T>(&mut self, ptr: Unique<T>)
        where Self: Sized;
    fn alloc_array<T>(&mut self, n: usize) -> Result<Unique<T>, AllocErr>
        where Self: Sized;
    unsafe fn realloc_array<T>(&mut self,
                               ptr: Unique<T>,
                               n_old: usize,
                               n_new: usize) -> Result<Unique<T>, AllocErr>
        where Self: Sized;
    unsafe fn dealloc_array<T>(&mut self, ptr: Unique<T>, n: usize) -> Result<(), AllocErr>
        where Self: Sized;
}

/// The global default allocator
pub struct Heap;

impl Alloc for Heap {
    // ...
}

impl<'a> Alloc for &'a Heap {
    // ...
}

/// The "system" allocator
pub struct System;

impl Alloc for System {
    // ...
}

impl<'a> Alloc for &'a System {
    // ...
}
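To make the Layout half of the API above concrete, here is a small example using the methods that later stabilized in std::alloc::Layout. Note one difference from the listing above: the stabilized extend returns Result rather than Option.

```rust
use std::alloc::Layout;

fn main() {
    // Layout for a single u32: size 4, align 4.
    let int = Layout::new::<u32>();
    assert_eq!((int.size(), int.align()), (4, 4));

    // Layout for a struct { a: u8, b: u32 } built by hand:
    // extend() appends a field, inserting padding as needed, and
    // returns the new layout plus the field's offset.
    let (s, offset_b) = Layout::new::<u8>()
        .extend(Layout::new::<u32>())
        .unwrap();
    assert_eq!(offset_b, 4); // the u32 must start at an aligned offset
    let s = s.pad_to_align(); // round size up to a multiple of align
    assert_eq!((s.size(), s.align()), (8, 4));

    // Layout for [u64; 10]: 10 * 8 bytes.
    let arr = Layout::array::<u64>(10).unwrap();
    assert_eq!(arr.size(), 80);
}
```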
  1. šŸ“¢ This feature has a dedicated working group; please direct comments and concerns to the working group's repo.

    The original content of this post is no longer an accurate summary of the current state; see the dedicated working group instead.


    Aria Desires at 2019-05-06 17:47:28

  2. I unfortunately wasn't paying close enough attention to mention this in the RFC discussion, but I think that realloc_in_place should be replaced by two functions, grow_in_place and shrink_in_place, for two reasons:

    • I can't think of a single use case (short of implementing realloc or realloc_in_place) where it is unknown whether the size of the allocation is increasing or decreasing. Using more specialized methods makes it slightly more clear what is going on.
    • The code paths for growing and shrinking allocations tend to be radically different - growing involves testing whether adjacent blocks of memory are free and claiming them, while shrinking involves carving off properly sized subblocks and freeing them. While the cost of a branch inside realloc_in_place is quite small, using grow and shrink better captures the distinct tasks that an allocator needs to perform.

    Note that these can be added backwards-compatibly next to realloc_in_place, but this would constrain which functions would be implemented by default in terms of which others.

    For consistency, realloc would probably also want to be split into grow and shrink, but the only advantage to having an overloadable realloc function that I know of is to be able to use mmap's remap option, which does not have such a distinction.

    Jonathan S at 2016-04-11 03:07:19

  3. Additionally, I think that the default implementations of realloc and realloc_in_place should be slightly adjusted - instead of checking against the usable_size, realloc should just first try to realloc_in_place. In turn, realloc_in_place should by default check against the usable size and return success in the case of a small change instead of universally returning failure.

    This makes it easier to produce a high-performance implementation of realloc: all that is required is improving realloc_in_place. However, the default performance of realloc does not suffer, as the check against the usable_size is still performed.

    Jonathan S at 2016-04-11 03:12:08
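The layering suggested in the two comments above can be sketched with a simplified, self-contained toy trait (the names and signatures here are illustrative, not the proposal's actual API): realloc first attempts grow_in_place, and grow_in_place's default succeeds only when the new size still fits within the block's usable size.

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::ptr;

unsafe trait ToyAlloc {
    unsafe fn alloc(&mut self, layout: Layout) -> *mut u8;
    unsafe fn dealloc(&mut self, ptr: *mut u8, layout: Layout);

    // Conservative default: no excess capacity beyond what was requested.
    fn usable_size(&self, layout: &Layout) -> usize {
        layout.size()
    }

    unsafe fn grow_in_place(&mut self, _ptr: *mut u8, old: Layout, new: Layout) -> bool {
        // Default: succeed without moving only when the block already has room.
        new.size() <= self.usable_size(&old)
    }

    unsafe fn realloc(&mut self, ptr: *mut u8, old: Layout, new: Layout) -> *mut u8 {
        // Fast path: try to grow in place (an allocator with real excess
        // capacity only needs to improve grow_in_place / usable_size).
        if new.size() >= old.size() && self.grow_in_place(ptr, old, new) {
            return ptr;
        }
        // Slow path: allocate, copy the smaller of the two sizes, free.
        let new_ptr = self.alloc(new);
        if !new_ptr.is_null() {
            ptr::copy_nonoverlapping(ptr, new_ptr, old.size().min(new.size()));
            self.dealloc(ptr, old);
        }
        new_ptr
    }
}

// Back the toy trait with the global allocator for demonstration.
struct SystemToy;

unsafe impl ToyAlloc for SystemToy {
    unsafe fn alloc(&mut self, layout: Layout) -> *mut u8 {
        alloc(layout)
    }
    unsafe fn dealloc(&mut self, ptr: *mut u8, layout: Layout) {
        dealloc(ptr, layout)
    }
}

fn main() {
    unsafe {
        let mut a = SystemToy;
        let old = Layout::from_size_align(16, 8).unwrap();
        let new = Layout::from_size_align(32, 8).unwrap();
        let p = a.alloc(old);
        assert!(!p.is_null());
        p.write(42);
        let p = a.realloc(p, old, new);
        assert!(!p.is_null());
        assert_eq!(p.read(), 42); // contents preserved across realloc
        a.dealloc(p, new);
    }
}
```

With this layering, the default performance of realloc does not regress (the usable_size check still happens inside the default grow_in_place), while a high-performance implementation only has to override grow_in_place.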

  4. Another issue: The doc for fn realloc_in_place says that if it returns Ok, then one is assured that ptr now "fits" new_layout.

    To me this implies that it must check that the alignment of the given address matches any constraint implied by new_layout.

    However, I don't think the spec for the underlying fn reallocate_inplace function implies that it will perform any such check.

    • Furthermore, it seems reasonable that any client diving into using fn realloc_in_place will themselves be ensuring that the alignments work (in practice I suspect it means that the same alignment is required everywhere for the given use case...)

    So, should the implementation of fn realloc_in_place really be burdened with checking that the alignment of the given ptr is compatible with that of new_layout? It is probably better in this case (of this one method) to push that requirement back to the caller...

    Felix S Klock II at 2016-10-26 13:04:27

  5. @gereeter you make good points; I will add them to the check list I am accumulating in the issue description.

    Felix S Klock II at 2016-10-26 13:05:05

  6. (at this point I am waiting for #[may_dangle] support to ride the train into the beta channel so that I will then be able to use it for std collections as part of allocator integration)

    Felix S Klock II at 2016-10-31 17:38:52

  7. I'm new to Rust, so forgive me if this has been discussed elsewhere.

    Is there any thought on how to support object-specific allocators? Some allocators such as slab allocators and magazine allocators are bound to a particular type, and do the work of constructing new objects, caching constructed objects which have been "freed" (rather than actually dropping them), returning already-constructed cached objects, and dropping objects before freeing the underlying memory to an underlying allocator when required.

    Currently, this proposal doesn't include anything along the lines of ObjectAllocator<T>, but it would be very helpful. In particular, I'm working on an implementation of a magazine allocator object-caching layer (link above), and while I can have this only wrap an Allocator and do the work of constructing and dropping objects in the caching layer itself, it'd be great if I could also have this wrap other object allocators (like a slab allocator) and truly be a generic caching layer.

    Where would an object allocator type or trait fit into this proposal? Would it be left for a future RFC? Something else?

    Joshua Liebow-Feeser at 2017-01-04 20:12:58

  8. I don't think this has been discussed yet.

    You could write your own ObjectAllocator<T>, and then do impl<T: Allocator, U> ObjectAllocator<U> for T { .. }, so that every regular allocator can serve as an object-specific allocator for all objects.

    Future work would be modifying collections to use your trait for their nodes, instead of plain ole' (generic) allocators directly.

    John Ericson at 2017-01-04 20:22:53
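The blanket-impl idea above can be sketched as follows. All trait and method names here are hypothetical (ByteAlloc stands in for the proposal's Allocator trait): any byte-level allocator can serve as an object allocator by allocating room for a T, moving the value in, and dropping the value in place before releasing the memory.

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::ptr;

// Stand-in for a byte-level allocator trait.
unsafe trait ByteAlloc {
    unsafe fn alloc(&mut self, layout: Layout) -> *mut u8;
    unsafe fn dealloc(&mut self, ptr: *mut u8, layout: Layout);
}

// Hypothetical object-allocator trait.
trait ObjectAllocator<T> {
    fn alloc_obj(&mut self, value: T) -> *mut T;
    unsafe fn free_obj(&mut self, ptr: *mut T);
}

// Blanket impl: every byte-level allocator is an object allocator.
// (Caveat: a real version would need to handle zero-sized T, which
// this sketch does not.)
impl<A: ByteAlloc, T> ObjectAllocator<T> for A {
    fn alloc_obj(&mut self, value: T) -> *mut T {
        unsafe {
            let p = self.alloc(Layout::new::<T>()) as *mut T;
            assert!(!p.is_null(), "allocation failed");
            p.write(value); // move the value into the allocation
            p
        }
    }
    unsafe fn free_obj(&mut self, ptr: *mut T) {
        ptr::drop_in_place(ptr); // run T's destructor first
        self.dealloc(ptr as *mut u8, Layout::new::<T>());
    }
}

// Back the byte-level trait with the global allocator for demonstration.
struct Global;
unsafe impl ByteAlloc for Global {
    unsafe fn alloc(&mut self, layout: Layout) -> *mut u8 { alloc(layout) }
    unsafe fn dealloc(&mut self, ptr: *mut u8, layout: Layout) { dealloc(ptr, layout) }
}

fn main() {
    let mut a = Global;
    let p = a.alloc_obj(String::from("hello"));
    unsafe {
        assert_eq!(*p, "hello");
        a.free_obj(p); // drops the String, then frees the memory
    }
}
```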

  9. @pnkfelix

    (at this point I am waiting for #[may_dangle] support to ride the train into the beta channel so that I will then be able to use it for std collections as part of allocator integration)

    I guess this has happened?

    Niko Matsakis at 2017-01-04 20:25:22

  10. @Ericson2314 Yeah, writing my own is definitely an option for experimental purposes, but I think there'd be much more benefit to it being standardized in terms of interoperability (for example, I plan on also implementing a slab allocator, but it would be nice if a third-party user of my code could use somebody else's slab allocator with my magazine caching layer). My question is simply whether an ObjectAllocator<T> trait or something like it is worth discussing. Although it seems that it might be best for a different RFC? I'm not terribly familiar with the guidelines for how much belongs in a single RFC and when things belong in separate RFCs...

    Joshua Liebow-Feeser at 2017-01-04 20:27:20

  11. @joshlf

    Where would an object allocator type or trait fit into this proposal? Would it be left for a future RFC? Something else?

    Yes, it would be another RFC.

    I'm not terribly familiar with the guidelines for how much belongs in a single RFC and when things belong in separate RFCs...

    that depends on the scope of the RFC itself, which is decided by the person who writes it, and then feedback is given by everyone.

    But really, as this is a tracking issue for this already-accepted RFC, thinking about extensions and design changes isn't really for this thread; you should open a new one over on the RFCs repo.

    Steve Klabnik at 2017-01-04 20:42:32

    @joshlf Ah, I thought ObjectAllocator<T> was supposed to be a trait. I meant prototype the trait, not a specific allocator. Yes, that trait would merit its own RFC, as @steveklabnik says.


    @steveklabnik yeah now discussion would be better elsewhere. But @joshlf was also raising the issue lest it expose a hitherto unforeseen flaw in the accepted but unimplemented API design. In that sense it matches the earlier posts in this thread.

    John Ericson at 2017-01-04 21:01:36

  13. @Ericson2314 Yeah, I thought that was what you meant. I think we're on the same page :)

    @steveklabnik Sounds good; I'll poke around with my own implementation and submit an RFC if it ends up seeming like a good idea.

    Joshua Liebow-Feeser at 2017-01-04 21:27:36

    @joshlf I don't see any reason why custom allocators would go into the compiler or standard library. Once this RFC lands, you could easily publish your own crate that does an arbitrary sort of allocation (even a fully-fledged allocator like jemalloc could be custom-implemented!).

    Alexander Regueiro at 2017-01-04 21:54:03

  15. @alexreg This isn't about a particular custom allocator, but rather a trait that specifies the type of all allocators which are parametric on a particular type. So just like RFC 1398 defines a trait (Allocator) that is the type of any low-level allocator, I'm asking about a trait (ObjectAllocator<T>) that is the type of any allocator which can allocate/deallocate and construct/drop objects of type T.

    Joshua Liebow-Feeser at 2017-01-04 21:58:58

  16. @alexreg See my early point about using standard library collections with custom object-specific allocators.

    John Ericson at 2017-01-04 22:01:33

  17. Sure, but I’m not sure that would belong in the standard library. Could easily go into another crate, with no loss of functionality or usability.


    Alexander Regueiro at 2017-01-04 22:02:54

  18. I think you’d want to use standard-library collections (any heap-allocated value) with an arbitrary custom allocator; i.e. not limited to object-specific ones.


    Alexander Regueiro at 2017-01-04 22:03:49

  19. Sure, but I’m not sure that would belong in the standard library. Could easily go into another crate, with no loss of functionality or usability.

    Yes but you probably want some standard library functionality to rely on it (such as what @Ericson2314 suggested).

    I think you’d want to use standard-library collections (any heap-allocated value) with an arbitrary custom allocator; i.e. not limited to object-specific ones.

    Ideally you'd want both - to accept either type of allocator. There are very significant benefits to using object-specific caching; for example, both slab allocation and magazine caching give very significant performance benefits - take a look at the papers I linked to above if you're curious.

    Joshua Liebow-Feeser at 2017-01-04 22:13:00

  20. But the object allocator trait could simply be a subtrait of the general allocator trait. It’s as simple as that, as far as I’m concerned. Sure, certain types of allocators can be more efficient than general-purpose allocators, but neither the compiler nor the standard really need to (or indeed should) know about this.


    Alexander Regueiro at 2017-01-04 22:16:15

  21. But the object allocator trait could simply be a subtrait of the general allocator trait. It’s as simple as that, as far as I’m concerned. Sure, certain types of allocators can be more efficient than general-purpose allocators, but neither the compiler nor the standard really need to (or indeed should) know about this.

    Ah, so the problem is that the semantics are different. Allocator allocates and frees raw byte blobs. ObjectAllocator<T>, on the other hand, would allocate already-constructed objects and would also be responsible for dropping these objects (including being able to cache constructed objects which could be handed out later in lieu of constructing a newly-allocated object, which is expensive). The trait would look something like this:

    trait ObjectAllocator<T> {
        fn alloc(&mut self) -> T;
        fn free(&mut self, t: T);
    }
    

    This is not compatible with Allocator, whose methods deal with raw pointers and have no notion of type. Additionally, with Allocators, it is the caller's responsibility to drop the object being freed first. This is really important - knowing about the type T allows ObjectAllocator<T> to do things like call T's drop method, and since free(t) moves t into free, the caller cannot drop t first - it is instead the ObjectAllocator<T>'s responsibility. Fundamentally, these two traits are incompatible with one another.

    Joshua Liebow-Feeser at 2017-01-04 22:28:41
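The point above about free taking ownership can be illustrated with a toy caching allocator (hypothetical, not part of the proposal): because the allocator receives the freed value by move, it can cache it and hand it back later instead of dropping it, and the caller cannot have dropped it first.

```rust
// A toy object cache for byte buffers.
struct VecCache {
    cache: Vec<Vec<u8>>,
}

impl VecCache {
    fn new() -> Self {
        VecCache { cache: Vec::new() }
    }

    // "alloc": reuse a cached buffer if one exists.
    fn alloc(&mut self) -> Vec<u8> {
        match self.cache.pop() {
            Some(mut v) => {
                v.clear(); // empty it, but keep its heap capacity
                v
            }
            None => Vec::new(),
        }
    }

    // "free": the value moves into the allocator, which caches it
    // rather than letting it drop.
    fn free(&mut self, v: Vec<u8>) {
        self.cache.push(v);
    }
}

fn main() {
    let mut a = VecCache::new();
    let mut v = a.alloc();
    v.extend_from_slice(&[1, 2, 3, 4]);
    let cap = v.capacity();
    a.free(v);

    let v2 = a.alloc(); // the same buffer comes back, capacity intact
    assert_eq!(v2.len(), 0);
    assert_eq!(v2.capacity(), cap);
}
```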

  22. Ah right, I see. I thought this proposal already included something like that, i.e. a ā€œhigher-levelā€ allocator over the byte level. In that case, a perfectly fair proposal!


    Alexander Regueiro at 2017-01-05 00:16:00

  23. @alexreg Ah yes, I was hoping so too :) Oh well - it'll have to wait for another RFC.

    Joshua Liebow-Feeser at 2017-01-05 00:53:18

  24. Yes, do kick start that RFC, I’m sure it would get plenty of support! And thanks for the clarification (I hadn’t kept up with the details of this RFC at all).


    Alexander Regueiro at 2017-01-05 00:55:05

  25. A crate for testing custom allocators would be useful.

    Jeff Burdges at 2017-01-12 18:13:02

    Forgive me if I'm missing something obvious, but is there a reason for the Layout type described in this RFC not to implement Copy as well as Clone, since it's just POD?

    Eliza Weisman at 2017-02-07 17:35:14

  27. I can't think of any.

    John Ericson at 2017-03-20 19:59:46
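For what it's worth, the stabilized std::alloc::Layout did end up implementing Copy; the property is easy to check:

```rust
use std::alloc::Layout;

// A function that only compiles if T: Copy.
fn assert_copy<T: Copy>() {}

fn main() {
    // Layout is plain-old-data (a size and an alignment), and the
    // stabilized std::alloc::Layout implements Copy and Clone.
    assert_copy::<Layout>();

    let a = Layout::new::<u64>();
    let b = a; // a copy, not a move...
    assert_eq!(a, b); // ...so `a` is still usable afterwards
}
```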

  28. Sorry to be bringing this up so late in the process, but...

    Might it be worth adding support for a dealloc-like function that isn't a method but rather a free function? The idea would be to use alignment to infer from a pointer where in memory its parent allocator is, and thus be able to free without needing an explicit allocator reference.

    This could be a big win for data structures that use custom allocators. This would allow them to not keep a reference to the allocator itself, but rather only need to be parametric on the type of the allocator to be able to call the right dealloc function. For example, if Box is eventually modified to support custom allocators, then it'd be able to keep being only a single word (just the pointer) as opposed to having to be expanded to two words to store a reference to the allocator as well.

    On a related note, it might also be useful to support a non-method alloc function to allow for global allocators. This would compose nicely with a non-method dealloc function - for global allocators, there'd be no need to do any kind of pointer-to-allocator inference since there'd only be a single static instance of the allocator for the whole program.

    Joshua Liebow-Feeser at 2017-04-05 23:15:35

  29. @joshlf The current design allows you to get that by just having your allocator be a (zero-sized) unit type - i.e. struct MyAlloc; that you then implement the Allocator trait on. Storing references or nothing at all, always, is less general than storing the allocator by-value.

    Eduard-Mihai Burtescu at 2017-04-06 15:20:24

  30. I could see that being true for a directly-embedded type, but what if a data structure decides to keep a reference instead? Does a reference to a zero-sized type take up zero space? That is, if I have:

    struct Foo;
    
    struct Blah<'a> {
        foo: &'a Foo,
    }
    

    Does Blah have zero size?

    Joshua Liebow-Feeser at 2017-04-06 16:17:36
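
A quick, compilable check of the question above (a language fact, not anything allocator-specific): a reference to a zero-sized type is still a full pointer; only the pointee occupies zero bytes, so Blah is one word, not zero.

```rust
use std::mem::size_of;

// `Foo` occupies no space, but a reference to it is an ordinary pointer.
struct Foo; // zero-sized

struct Blah<'a> {
    foo: &'a Foo,
}

// size_of::<Foo>() == 0, but size_of::<Blah>() == size_of::<usize>().
```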

  31. Actually, even if it's possible, you might not want your allocator to have zero size. For example, you might have an allocator with a non-zero size that you allocate from, but that has the ability to free objects w/o knowing about the original allocator. This would still be useful for making a Box take only a word. You'd have something like Box::new_from_allocator which would have to take an allocator as an argument - and it might be a nonzero-sized allocator - but if the allocator supported freeing without the original allocator reference, the returned Box<T> could avoid storing a reference to the allocator that was passed in the original Box::new_from_allocator call.

    Joshua Liebow-Feeser at 2017-04-06 16:21:08

  32. For example, you might have an allocator with a non-zero size that you allocate from, but that has the ability to free objects w/o knowing about the original allocator.

    I recall long, long ago proposing factoring out separate allocator and deallocator traits (with associated types connecting the two) for basically this reason.

    John Ericson at 2017-04-06 17:49:19

  33. Is/should the compiler be allowed to optimize away allocations with these allocators?

    Zoxc at 2017-04-06 18:14:37

  34. Is/should the compiler be allowed to optimize away allocations with these allocators?

    @Zoxc What do you mean?

    I recall long, long ago proposing factoring out separate allocator and deallocator traits (with associated types connecting the two) for basically this reason.

    For posterity, let me clarify this statement (I talked to @Ericson2314 about it offline): The idea is that a Box could be parametric just on a deallocator. So you could have the following implementation:

    trait Allocator {
        type D: Deallocator;
        
        fn get_deallocator(&self) -> Self::D;
    }
    
    trait Deallocator {}
    
    struct Box<T, D: Deallocator> {
        ptr: *mut T,
        d: D,
    }
    
    impl<T, D: Deallocator> Box<T, D> {
        fn new_from_allocator<A: Allocator>(x: T, a: A) -> Box<T, A::D> {
            ...
            Box {
                ptr: ptr,
                d: a.get_deallocator()
            }
        }
    }
    

    This way, when calling new_from_allocator, if A::D is a zero-sized type, then the d field of Box<T, A::D> takes up zero size, and so the size of the resulting Box<T, A::D> is a single word.

    Joshua Liebow-Feeser at 2017-04-06 20:30:49
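
The split sketched above can be reduced to something compilable. The names below are illustrative, not from the RFC; the point is that when the deallocator type is zero-sized, the box-like handle stays one word.

```rust
use std::mem::size_of;

// A deallocator trait; implementors that need no runtime state are zero-sized.
trait Deallocator {
    unsafe fn dealloc(&self, ptr: *mut u8);
}

struct GlobalDealloc; // zero-sized: finding the allocator needs no stored data

impl Deallocator for GlobalDealloc {
    unsafe fn dealloc(&self, _ptr: *mut u8) {
        // would hand the pointer back to the process-global allocator
    }
}

// A box-like handle parametric only on the deallocator.
struct MyBox<T, D: Deallocator> {
    ptr: *mut T,
    d: D,
}
```

With `D = GlobalDealloc`, `MyBox<T, GlobalDealloc>` is exactly pointer-sized, which is the one-word Box the comment is after.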

  35. Is there a timeline for when this will land? I'm working on some allocator stuff, and it'd be nice if this stuff were there for me to build off of.

    If there's interest, I'd be happy to lend some cycles to this, but I'm relatively new to Rust, so that might just create more work for the maintainers in terms of having to code review a newbie's code. I don't want to step on anybody's toes, and I don't want to make more work for people.

    Joshua Liebow-Feeser at 2017-05-05 17:40:24

  36. Ok we've recently met to evaluate the state of allocators and I think there's some good news for this as well! It looks like support has not yet landed in libstd for these APIs, but everyone's still happy with them landing at any time!

    One thing we discussed is that changing over all the libstd types may be a bit premature due to possible inference issues, but regardless of that it seems like a good idea to land the Allocator trait and the Layout type in the proposed std::heap module for experimentation elsewhere in the ecosystem!

    @joshlf if you'd like to help out here I think that'd be more than welcome! The first piece will likely be landing the basic type/trait from this RFC into the standard library, and then from there we can start experimenting and playing around with collections in libstd as well.

    Alex Crichton at 2017-05-19 16:52:25

  37. @alexcrichton I think your link is broken? It points back here.

    One thing we discussed is that changing over all the libstd types may be a bit premature due to possible inference issues

    Adding the trait is a good first step, but without refactoring existing APIs to use it they won't see much usage. In https://github.com/rust-lang/rust/issues/27336#issuecomment-300721558 I propose we can refactor the crates behind the facade immediately, but add newtype wrappers in std. Annoying to do, but allows us to make progress.

    John Ericson at 2017-05-19 21:55:02

  38. @alexcrichton What would be the process for getting object allocators in? My experiments so far (soon to be public; I can add you to the private GH repo if you're curious) and the discussion here have led me to believe that there's going to be a near-perfect symmetry between the allocator traits and object allocator traits. E.g., you'll have something like (I changed Address to *mut u8 for symmetry with *mut T from ObjectAllocator<T>; we'd probably end up with Address<T> or something like that):

    unsafe trait Allocator {
        unsafe fn alloc(&mut self, layout: Layout) -> Result<*mut u8, AllocErr>;
        unsafe fn dealloc(&mut self, ptr: *mut u8, layout: Layout);
    }
    
    unsafe trait ObjectAllocator<T> {
        unsafe fn alloc(&mut self) -> Result<*mut T, AllocErr>;
        unsafe fn dealloc(&mut self, ptr: *mut T);
    }
    

    Thus, I think experimenting with both allocators and object allocators at the same time could be useful. I'm not sure if this is the right place for that, though, or whether there should be another RFC or, at the very least, a separate PR.

    Joshua Liebow-Feeser at 2017-05-19 22:37:18
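
The symmetry described above can be demonstrated by deriving an object allocator from a byte-level one. This sketch uses today's stable std::alloc free functions in place of the unstable Alloc trait, and names like alloc_one are hypothetical; zero-sized T is deliberately rejected, since zero-sized allocations are not allowed.

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::mem::size_of;

// Hypothetical object allocator built on the process-global byte allocator.
struct GlobalObjectAlloc;

impl GlobalObjectAlloc {
    unsafe fn alloc_one<T>(&mut self) -> *mut T {
        assert!(size_of::<T>() > 0); // zero-sized allocations are illegal
        alloc(Layout::new::<T>()) as *mut T
    }

    unsafe fn dealloc_one<T>(&mut self, ptr: *mut T) {
        dealloc(ptr as *mut u8, Layout::new::<T>())
    }
}
```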

  39. Oh I meant to link here which also has information about the global allocator. @joshlf is that what you're thinking?

    Alex Crichton at 2017-05-20 05:02:56

  40. It sounds like @alexcrichton wants a PR that provides the Allocator trait and Layout type, even if it's not integrated into any collection in libstd.

    If I understand that correctly, then I can put up a PR for that. I had not done so because I keep trying to get at least integration with RawVec and Vec prototyped. (At this point I have RawVec done, but Vec is a bit more challenging due to the many other structures that build off of it, like Drain and IntoIter etc...)

    Felix S Klock II at 2017-05-30 14:35:49

  41. Actually, my current branch seems like it might build (and the one test for integration with RawVec passed), so I went ahead and posted it: #42313

    Felix S Klock II at 2017-05-30 14:50:29

  42. @hawkw asked:

    Forgive me if I'm missing something obvious, but is there a reason for the Layout trait described in this RFC not to implement Copy as well as Clone, since it's just POD?

    The reason I made Layout only implement Clone and not Copy is that I wanted to leave open the possibility of adding more structure to the Layout type. In particular, I am still interested in trying to have the Layout track any type structure used to construct it (e.g. a 16-element array of struct { x: u8, y: [char; 215] }), so that allocators would have the option of exposing instrumentation routines that report on what types their current contents are composed from.

    This would almost certainly have to be an optional feature, i.e. it seems like the tide is firmly against forcing developers to use the type-enriched Layout constructors, so any instrumentation of this form would need to include something like an "unknown memory blocks" category to handle allocations that do not have the type information.

    But nonetheless, features like this were the main reason why I did not opt to make Layout implement Copy; I basically figured that an implementation of Copy would be a premature constraint on Layout itself.

    Felix S Klock II at 2017-05-30 14:58:05

  43. @alexcrichton @pnkfelix

    Looks like @pnkfelix has this one covered, and that PR is getting traction, so let's just go with that. I'm looking over it and making comments now, and it looks great!

    Joshua Liebow-Feeser at 2017-06-06 04:00:12

  44. Currently the signature for Allocator::oom is:

        fn oom(&mut self, _: AllocErr) -> ! {
            unsafe { ::core::intrinsics::abort() }
        }
    

    It was brought to my attention, though, that Gecko at least likes to know the allocation size as well on OOM. We may wish to consider this when stabilizing to perhaps add context like Option<Layout> for why OOM is happening.

    Alex Crichton at 2017-06-06 18:59:51

  45. @alexcrichton Might it be worth having either multiple oom_xxx variants or an enum of different argument types? There are a couple of different signatures for methods that could fail (e.g., alloc takes a layout, realloc takes a pointer, an original layout, and a new layout, etc), and there might be cases in which an oom-like method would want to know about all of them.

    Joshua Liebow-Feeser at 2017-06-06 19:13:53

  46. @joshlf that's true, yes, but I'm not sure if it's useful. I wouldn't want to just add features because we can, they should continue to be well motivated.


    A point for stabilization here is also "determine what the requirements are on the alignment provided to fn dealloc", and the current implementation of dealloc on Windows uses align to determine how to correctly free. @ruuda you may be interested in this fact.

    Alex Crichton at 2017-06-06 22:21:32

  47. A point for stabilization here is also "determine what the requirements are on the alignment provided to fn dealloc", and the current implementation of dealloc on Windows uses align to determine how to correctly free.

    Yes, I think this is how I initially ran into this; my program crashed on Windows because of this. As HeapAlloc makes no alignment guarantees, allocate allocates a bigger region and stores the original pointer in a header, but as an optimization this is avoided if the alignment requirements would be satisfied anyway. I wonder if there is a way to convert HeapAlloc into an alignment-aware allocator that does not require alignment on free, without losing this optimization.

    Ruud van Asseldonk at 2017-06-07 10:41:16

  48. @ruuda

    As HeapAlloc makes no alignment guarantees

    It does provide a minimum alignment guarantee of 8 bytes on 32-bit or 16 bytes on 64-bit; it just doesn't provide any way to guarantee alignment higher than that.

    The _aligned_malloc provided by the CRT on Windows can provide allocations of higher alignment, but notably it must be paired with _aligned_free, using free is illegal. So if you don't know whether an allocation was done via malloc or _aligned_malloc then you're stuck in the same conundrum that alloc_system is in on Windows if you don't know the alignment for deallocate. The CRT does not provide the standard aligned_alloc function which can be paired with free, so even Microsoft hasn't been able to solve this problem. (Although it is a C11 function and Microsoft doesn't support C11 so that's a weak argument.)

    Do note that deallocate only cares about the alignment to know whether it is overaligned, the actual value itself is irrelevant. If you wanted a deallocate that was truly alignment independent, you could simply treat all allocations as overaligned, but you'd waste a lot of memory on small allocations.

    Peter Atashian at 2017-06-07 11:23:18
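
The header scheme described above (over-allocate, then stash the pointer the underlying allocator returned just below the aligned address) can be sketched as follows. This is an illustration built on today's stable std::alloc, not the actual alloc_system code, and it assumes align is a power of two.

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::mem::size_of;

// Over-allocate enough for the payload, worst-case padding, and a
// one-word header, then store the original pointer in the header.
unsafe fn overaligned_alloc(size: usize, align: usize) -> *mut u8 {
    debug_assert!(align.is_power_of_two());
    let total = size + align + size_of::<usize>();
    let raw = alloc(Layout::from_size_align(total, size_of::<usize>()).unwrap());
    if raw.is_null() {
        return raw;
    }
    // Leave room for the header, then round up to the requested alignment.
    let addr = raw as usize + size_of::<usize>();
    let aligned = (addr + align - 1) & !(align - 1);
    // Header: the pointer the underlying allocator actually handed out.
    (aligned as *mut usize).sub(1).write_unaligned(raw as usize);
    aligned as *mut u8
}

// Deallocation recovers the original pointer from the header; the caller's
// size and alignment are only needed to recompute the over-allocated layout.
unsafe fn overaligned_dealloc(ptr: *mut u8, size: usize, align: usize) {
    let raw = (ptr as *const usize).sub(1).read_unaligned() as *mut u8;
    let total = size + align + size_of::<usize>();
    dealloc(raw, Layout::from_size_align(total, size_of::<usize>()).unwrap());
}
```

The optimization alloc_system applies is to skip the header entirely when the underlying allocator's guaranteed minimum alignment already satisfies the request; that is exactly what makes deallocation depend on knowing the alignment.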

  49. @alexcrichton wrote:

    Currently the signature for Allocator::oom is:

        fn oom(&mut self, _: AllocErr) -> ! {
            unsafe { ::core::intrinsics::abort() }
        }
    

    It was brought to my attention, though, that Gecko at least likes to know the allocation size as well on OOM. We may wish to consider this when stabilizing to perhaps add context like Option<Layout> for why OOM is happening.

    The AllocErr already carries the Layout in the AllocErr::Exhausted variant. We could just add the Layout to the AllocErr::Unsupported variant as well, which I think would be simplest in terms of client expectations. (It does have the drawback of slightly increasing the size of the AllocErr enum itself, but maybe we shouldn't worry about that...)

    Felix S Klock II at 2017-06-07 15:42:17

  50. Oh I suspect that's all that's needed, thanks for the correction @pnkfelix!

    Alex Crichton at 2017-06-07 15:45:12

  51. I'm going to start repurposing this issue for the tracking issue for std::heap in general as it will be after https://github.com/rust-lang/rust/pull/42727 lands. I'll be closing a few other related issues in favor of this.

    Alex Crichton at 2017-06-20 14:37:59

  52. Is there a tracking issue for converting collections? Now that the PRs are merged, I'd like to

    • Discuss the associated error type
    • Discuss converting collections to use any local allocator, (especially leveraging associated error type)

    John Ericson at 2017-06-20 14:52:39

  53. I've opened https://github.com/rust-lang/rust/issues/42774 to track integration of Alloc into std collections. With historical discussion in the libs team that's likely to be on a separate track of stabilization than an initial pass of the std::heap module.

    Alex Crichton at 2017-06-20 15:11:43

  54. While reviewing allocator-related issues I also came across https://github.com/rust-lang/rust/issues/30170, which @pnkfelix filed a while back. It looks like the OSX system allocator is buggy with high alignments, and when running that program with jemalloc it's segfaulting during deallocation on Linux at least. Worth considering during stabilization!

    Alex Crichton at 2017-06-20 17:22:16

  55. I've opened #42794 as a place to discuss the specific question of whether zero-sized allocations need to match their requested alignment.

    Felix S Klock II at 2017-06-21 11:49:42

  56. (oh wait, zero-sized allocations are illegal in user allocators!)

    Felix S Klock II at 2017-06-21 11:50:34

  57. Since the alloc::heap::allocate function and friends are now gone in Nightly, I’ve updated Servo to use this new API. This is part of the diff:

    -use alloc::heap;
    +use alloc::allocator::{Alloc, Layout};
    +use alloc::heap::Heap;
    
    -        let ptr = heap::allocate(req_size as usize, FT_ALIGNMENT) as *mut c_void;
    +        let layout = Layout::from_size_align(req_size as usize, FT_ALIGNMENT).unwrap();
    +        let ptr = Heap.alloc(layout).unwrap() as *mut c_void;
    

    I feel the ergonomics are not great. We went from importing one item to importing three from two different modules.

    • Would it make sense to have a convenience method for allocator.alloc(Layout::from_size_align(…))?
    • Would it make sense to make <Heap as Alloc>::_ methods available as free functions or inherent methods? (To have one fewer item to import, the Alloc trait.)

    Simon Sapin at 2017-07-08 08:33:00
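
A convenience method of the kind floated above could look like the following. This is written against today's stable std::alloc for illustration (the Heap type and unstable Alloc trait under discussion are not reproduced), and the helper name alloc_bytes is hypothetical; the layout-construction failure and the allocation failure are folded into one Option.

```rust
use std::alloc::{alloc, Layout};

// Hypothetical helper: build the Layout and allocate in one call.
fn alloc_bytes(size: usize, align: usize) -> Option<*mut u8> {
    let layout = Layout::from_size_align(size, align).ok()?; // bad size/align
    if layout.size() == 0 {
        return None; // zero-sized allocations are not allowed
    }
    let ptr = unsafe { alloc(layout) };
    if ptr.is_null() { None } else { Some(ptr) }
}
```

Note that there are now three failure conditions behind one return value: an invalid size/align pair, a zero-sized request, and allocator exhaustion; a real API would likely want to distinguish them.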

  58. Alternatively, could the Alloc trait be in the prelude or is it too niche of a use case?

    Simon Sapin at 2017-07-08 08:33:54

  59. @SimonSapin IMO there isn't much of a point in optimizing the ergonomics of such a low-level API.

    Eduard-Mihai Burtescu at 2017-07-08 08:38:10

  60. @SimonSapin

    I feel the ergonomics are not great. We went from importing one item to importing three from two different modules.

    I had the exact same feeling with my codebase - it's pretty clunky now.

    Would it make sense to have a convenience method for allocator.alloc(Layout::from_size_align(…))?

    Do you mean in the Alloc trait, or just for Heap? One thing to consider here is that there's now a third error condition: Layout::from_size_align returns an Option, so it could return None in addition to the normal errors you can get when allocating.

    Alternatively, could the Alloc trait be in the prelude or is it too niche of a use case?

    IMO there isn't much of a point in optimizing the ergonomics of such a low-level API.

    I agree that it's probably too low-level to put in the prelude, but I still think that there's value in optimizing the ergonomics (selfishly, at least - that was a really annoying refactor šŸ˜ ).

    Joshua Liebow-Feeser at 2017-07-08 16:26:54

  61. @SimonSapin did you not handle OOM before? Also in std all three types are available in the std::heap module (they're supposed to be in one module). Also did you not handle the case of overflowing sizes before? Or zero-sized types?

    Alex Crichton at 2017-07-08 18:27:59

  62. did you not handle OOM before?

    When it existed the alloc::heap::allocate function returned a pointer without a Result and did not leave a choice in OOM handling. I think it aborted the process. Now I’ve added .unwrap() to panic the thread.

    they're supposed to be in one module

    I see now that heap.rs contains pub use allocator::*;. But when I clicked Alloc in the impl listed on the rustdoc page for Heap I was sent to alloc::allocator::Alloc.

    As to the rest, I haven’t looked into it. I’m porting to a new compiler a big pile of code that was written years ago. I think these are callbacks for FreeType, a C library.

    Simon Sapin at 2017-07-08 18:55:37

  63. When it existed the alloc::heap::allocate function returned a pointer without a Result and did not leave a choice in OOM handling.

    It did give you a choice. The pointer it returned could have been a null pointer which would indicate the heap allocator failed to allocate. This is why I'm so glad it switched to Result so people don't forget to handle that case.

    Peter Atashian at 2017-07-09 00:20:27

  64. Oh well, maybe the FreeType ended up doing a null check, I don’t know. Anyway, yes, returning a Result is good.

    Simon Sapin at 2017-07-09 06:24:48

  65. Given #30170 and #43097, I am tempted to resolve the OS X issue with ridiculously big alignments by simply specifying that users cannot request alignments >= 1 << 32.

    One very easy way to enforce this: Change the Layout interface so that align is denoted by a u32 instead of a usize.

    @alexcrichton do you have thoughts on this? Should I just make a PR that does this?

    Felix S Klock II at 2017-07-13 11:05:53

  66. @pnkfelix Layout::from_size_align would still take usize and return an error on u32 overflow, right?

    Simon Sapin at 2017-07-13 11:22:47

  67. @SimonSapin what reason is there to have it continue taking usize align, if a static precondition is that it is unsafe to pass a value >= 1 << 32 ?

    Felix S Klock II at 2017-07-13 11:29:14

  68. and if the answer is "well some allocators might support an alignment >= 1 << 32", then we're back to the status quo and you can disregard my suggestion. The point of my suggestion is basically a "+1" to comments like this one

    Felix S Klock II at 2017-07-13 11:31:30

  69. Because std::mem::align_of returns usize

    Simon Sapin at 2017-07-13 11:40:38

  70. @SimonSapin ah, good old stable API's... sigh.

    Felix S Klock II at 2017-07-13 11:42:57

  71. @pnkfelix limiting to 1 << 32 seems reasonable to me!

    Alex Crichton at 2017-07-13 14:22:58

  72. @rfcbot fcp merge

    Ok, this trait and its types have baked for a while now and have also been the underlying implementation of the standard collections since its inception. I would propose starting out with a particularly conservative initial offering, namely only stabilizing the following interface:

    pub struct Layout { /* ... */ }
    
    extern {
        pub type void;
    }
    
    impl Layout {
        pub fn from_size_align(size: usize, align: usize) -> Option<Layout>;
        pub unsafe fn from_size_align_unchecked(size: usize, align: usize) -> Layout;
    
        pub fn size(&self) -> usize;
        pub fn align(&self) -> usize;
    }
    
    pub unsafe trait Alloc {
        unsafe fn alloc(&mut self, layout: Layout) -> *mut void;
        unsafe fn alloc_zeroed(&mut self, layout: Layout) -> *mut void;
        unsafe fn dealloc(&mut self, ptr: *mut void, layout: Layout);
    
        // all other methods are default and unstable
    }
    
    /// The global default allocator
    pub struct Heap;
    
    impl Alloc for Heap {
        // ...
    }
    
    impl<'a> Alloc for &'a Heap {
        // ...
    }
    
    /// The "system" allocator
    pub struct System;
    
    impl Alloc for System {
        // ...
    }
    
    impl<'a> Alloc for &'a System {
        // ...
    }
    
    <details><summary>Original proposal</summary><p>
    pub struct Layout { /* ... */ }
    
    impl Layout {
        pub fn from_size_align(size: usize, align: usize) -> Option<Layout>;
        pub unsafe fn from_size_align_unchecked(size: usize, align: usize) -> Layout;
    
        pub fn size(&self) -> usize;
        pub fn align(&self) -> usize;
    }
    
    // renamed from AllocErr today
    pub struct Error {
        // ...
    }
    
    impl Error {
        pub fn oom() -> Self;
    }
    
    pub unsafe trait Alloc {
        unsafe fn alloc(&mut self, layout: Layout) -> Result<*mut u8, Error>;
        unsafe fn dealloc(&mut self, ptr: *mut u8, layout: Layout);
    
        // all other methods are default and unstable
    }
    
    /// The global default allocator
    pub struct Heap;
    
    impl Alloc for Heap {
        // ...
    }
    
    impl<'a> Alloc for &'a Heap {
        // ...
    }
    
    /// The "system" allocator
    pub struct System;
    
    impl Alloc for System {
        // ...
    }
    
    impl<'a> Alloc for &'a System {
        // ...
    }
    
    </p></details>

    Notably:

    • Only stabilizing alloc, alloc_zeroed, and dealloc methods on the Alloc trait for now. I think this solves the most pressing issue we have today, defining a custom global allocator.
    • Remove the Error type in favor of just using raw pointers.
    • Change the u8 type in the interface to void
    • A stripped down version of the Layout type.

    There are still open questions such as what to do with dealloc and alignment (precise alignment? fits? unsure?), but I'm hoping we can resolve them during FCP as it likely won't be an API-breaking change.

    Alex Crichton at 2017-10-16 17:16:25

  73. +1 to getting something stabilized!

    Renames AllocErr to Error and moves the interface to be a bit more conservative.

    Does this eliminate the option for allocators to specify Unsupported? At the risk of harping more on something I've been harping on a lot, I think that #44557 is still an issue.

    Layout

    It looks like you've removed some of the methods from Layout. Did you mean to have the ones you left out actually removed, or just left as unstable?

    Joshua Liebow-Feeser at 2017-10-16 17:51:18

  74. impl Error {
        pub fn oom() -> Self;
    }
    

    Is this a constructor for what’s today AllocErr::Exhausted? If so, shouldn’t it have a Layout parameter?

    Simon Sapin at 2017-10-16 18:15:37

  75. Team member @alexcrichton has proposed to merge this. The next step is review by the rest of the tagged teams:

    • [x] @BurntSushi
    • [x] @Kimundi
    • [x] @alexcrichton
    • [x] @aturon
    • [x] @cramertj
    • [x] @dtolnay
    • [x] @eddyb
    • [x] @nikomatsakis
    • [x] @nrc
    • [x] @pnkfelix
    • [x] @sfackler
    • [x] @withoutboats

    Concerns:

    • ~~*mut u8~~ resolved by https://github.com/rust-lang/rust/issues/32838#issuecomment-342622672

    Once these reviewers reach consensus, this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

    See this document for info about what commands tagged team members can give me.

    Rust RFC bot at 2017-10-16 18:16:41

  76. I'm really excited about getting to stabilize some of this work!

    One question: in the above thread, @joshlf and @Ericson2314 raised an interesting point about the possibility of separating the Alloc and Dealloc traits in order to optimize for cases in which alloc requires some data, but dealloc requires no extra information, so the Dealloc type can be zero-sized.

    Was this question ever resolved? What are the disadvantages of separating the two traits?

    Taylor Cramer at 2017-10-16 19:00:13

  77. @joshlf

    Does this eliminate the option for allocators to specify Unsupported?

    Yes and no, it would mean that such an operation is not supported in stable rust immediately, but we could continue to support it in unstable Rust.

    Did you mean to have the ones you left out actually removed, or just left as unstable?

    Indeed! Again though I'm just proposing a stable API surface area, we can leave all the other methods as unstable. Over time we can continue to stabilize more of the functionality. I think it's best to start as conservative as we can.


    @SimonSapin

    Is this a constructor for what’s today AllocErr::Exhausted? If so, shouldn’t it have a Layout parameter?

    Aha good point! I sort of wanted to leave the possibility though to make Error a zero-sized type if we really needed it, but we can of course keep the layout-taking methods in unstable and stabilize them if necessary. Or do you think that the layout-preserving Error should be stabilized in the first pass?


    @cramertj

    I hadn't personally seen such a question/concern yet (I think I missed it!), but I wouldn't personally see it as being worth it. Two traits is twice the boilerplate in general, as now everyone would have to type Alloc + Dealloc in collections for example. I would expect that such a specialized use would not want to inform the interface all other users end up using, personally.

    Alex Crichton at 2017-10-16 19:05:14

  78. @cramertj @alexcrichton

    I hadn't personally seen such a question/concern yet (I think I missed it!), but I wouldn't personally see it as being worth it.

    In general I agree that it's not worth it with one glaring exception: Box. Box<T, A: Alloc> would, given the current definition of Alloc, have to be at least two words large (the pointer that it already has and a reference to an Alloc at the very least) except in the case of global singletons (which can be implemented as ZSTs). A 2x (or more) blowup in the space required to store such a common and fundamental type is concerning to me.

    Joshua Liebow-Feeser at 2017-10-16 19:41:52

  79. @alexcrichton

    as now everyone would have to type Alloc + Dealloc in collections for example

    We could add something like this:

    trait Allocator: Alloc + Dealloc {}
    impl<T> Allocator for T where T: Alloc + Dealloc {}
    

    Taylor Cramer at 2017-10-16 20:38:13

  80. A 2x (or more) blowup in the space required to store such a common and fundamental type

    Only when you use a custom allocator that is not process-global. std::heap::Heap (the default) is zero-sized.

    Simon Sapin at 2017-10-16 21:03:19

  81. Or do you think that the layout-preserving Error should be stabilized in the first pass?

    @alexcrichton I don’t really understand why this proposed first pass is as it is at all. There’s barely more than what could already be done by abusing Vec, and not enough, for example, to use https://crates.io/crates/jemallocator.

    What still needs to be resolved to stabilize the whole thing?

    Simon Sapin at 2017-10-16 21:07:44

    Only when you use a custom allocator that is not process-global. std::heap::Heap (the default) is zero-sized.

    That seems like the primary use case of having parametric allocators, no? Imagine the following simple definition of a tree:

    struct Node<T, A: Alloc> {
        t: T,
        left: Option<Box<Node<T, A>>>,
        right: Option<Box<Node<T, A>>>,
    }
    

    A tree constructed from those with a 1-word Alloc would have a ~1.7x blowup in size for the whole data structure compared to a ZST Alloc. That seems pretty bad to me, and these kinds of applications are sort of the whole point of having Alloc be a trait.

    Joshua Liebow-Feeser at 2017-10-16 21:31:45
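
The blowup can be measured directly on a simplified stand-in (this sketch only counts the allocator handle stored per node, not the real Box<T, A> layout):

```rust
use std::mem::size_of;

// One-word payload plus two child links; `A` is the allocator handle.
struct Node<A> {
    t: u64,
    left: Option<Box<Node<A>>>,
    right: Option<Box<Node<A>>>,
    a: A,
}

// With a zero-sized handle (`A = ()`) the node is three 64-bit words;
// with a one-word handle (e.g. a reference) every node grows by a word.
```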

  83. @cramertj

    We could add something like this:

    We're also going to have actual trait aliases :) https://github.com/rust-lang/rust/issues/41517

    GƔbor Lehel at 2017-10-16 21:40:32

  84. @glaebhoerl Yeah, but stabilization still seems a ways off since there isn't an implementation yet. If we disable manual impls of Allocator I think we can switch to trait-aliases backwards-compatibly when they arrive ;)

    Taylor Cramer at 2017-10-17 00:05:02

  85. @joshlf

    A 2x (or more) blowup in the space required to store such a common and fundamental type is concerning to me.

    I'd imagine all implementations today are either a zero-sized type or pointer-sized, right? Isn't the possible optimization that some pointer-sized types can be zero-sized? (or something like that?)


    @cramertj

    We could add something like this:

    Indeed! We've then taken one trait to three though. In the past we've never had a great experience with such traits. For example Box<Both> does not cast to Box<OnlyOneTrait>. I'm sure we could wait for language features to smooth all this out, but it seems like those are a long way off, at best.


    @SimonSapin

    What still needs to be resolved to stabilize the whole thing?

    I don't know. I wanted to start with the absolute smallest thing so there would be less debate.

    Alex Crichton at 2017-10-17 20:29:40

  86. I'd imagine all implementations today are just a zero-size type or a pointer large, right? Isn't the possible optimization that some pointer-size-types can be zero sized? (or something like that?)

    Yeah, the idea is that, given a pointer to an object allocated from your type of allocator, you can figure out which instance it came from (e.g., using inline metadata). Thus, the only information you need to deallocate is type information, not runtime information.

    Joshua Liebow-Feeser at 2017-10-17 20:33:18

  87. To come back to alignment on deallocate, I see two ways forward:

    • Stabilize as proposed (with alignment on deallocate). Giving away ownership of manually allocated memory would be impossible unless the Layout is included. In particular, it is impossible to build a Vec or Box or String or other std container with a stricter alignment than required (for instance because you don't want the boxed element to straddle a cache line), without deconstructing and deallocating it manually later (which is not always an option). Another example of something that would be impossible, is filling a Vec using simd operations, and then giving it away.

    • Do not require alignment on deallocate, and remove the small-allocation optimization from Windows’ HeapAlloc-based alloc_system. Always store the alignment. @alexcrichton, as you committed that code, do you remember why it was put there in the first place? Do we have any evidence that it saves a significant amount of memory for real-world applications? (With microbenchmarks it is possible to make the results come out either way depending on the allocation size — unless HeapAlloc was rounding up sizes anyway.)

    In any case, this is a very difficult trade off to make; the memory and performance impact will depend highly on the type of application, and which one to optimize for is application-specific as well.

    Ruud van Asseldonk at 2017-10-17 20:39:28

  88. I think we may actually be Just Fine (TM). Quoting the Alloc docs:

    Some of the methods require that a layout fit a memory block. What it means for a layout to "fit" a memory block (or equivalently, for a memory block to "fit" a layout) is that the following two conditions must hold:

    1. The block's starting address must be aligned to layout.align().

    2. The block's size must fall in the range [use_min, use_max], where:

      • use_min is self.usable_size(layout).0, and

      • use_max is the capacity that was (or would have been) returned when (if) the block was allocated via a call to alloc_excess or realloc_excess.

    Note that:

    • the size of the layout most recently used to allocate the block is guaranteed to be in the range [use_min, use_max], and

    • a lower-bound on use_max can be safely approximated by a call to usable_size.

    • if a layout k fits a memory block (denoted by ptr) currently allocated via an allocator a, then it is legal to use that layout to deallocate it, i.e. a.dealloc(ptr, k);.

    Note that last bullet. If I allocate with a layout with alignment a, then it should be legal for me to deallocate with alignment b < a because an object which is aligned to a is also aligned to b, and thus a layout with alignment b fits an object allocated with a layout with alignment a (and with the same size).

    What this means is that you should be able to allocate with an alignment which is greater than the minimum alignment required for a particular type and then allow some other code to deallocate with the minimum alignment, and it should work.

    Joshua Liebow-Feeser at 2017-10-17 20:51:17

  89. Isn't the possible optimization that some pointer-sized types can be zero-sized? (or something like that?)

    There was an RFC for this recently and it seems very unlikely that it could be done due to compatibility concerns: https://github.com/rust-lang/rfcs/pull/2040

    For example Box<Both> does not cast to Box<OnlyOneTrait>. I'm sure we could wait for language features to smooth all this out, but it seems like those are a long way off, at best.

    Trait object upcasting on the other hand seems uncontroversially desirable, and mostly a question of effort / bandwidth / willpower to get it implemented. There was a thread recently: https://internals.rust-lang.org/t/trait-upcasting/5970

    GƔbor Lehel at 2017-10-17 23:19:57

  90. @ruuda I was the one who wrote that alloc_system implementation originally. alexcrichton merely moved it around during the great allocator refactors of <time period>.

    The current implementation requires that you deallocate with the same alignment specified that you allocated a given memory block with. Regardless of what the documentation may claim, this is the current reality that everyone must abide by until alloc_system on Windows is changed.

    Allocations on Windows always use a multiple of MEMORY_ALLOCATION_ALIGNMENT (although they remember the size you allocated them with, down to the byte). MEMORY_ALLOCATION_ALIGNMENT is 8 on 32-bit and 16 on 64-bit. For overaligned types, because the alignment is greater than MEMORY_ALLOCATION_ALIGNMENT, the overhead caused by alloc_system is consistently the amount of alignment specified, so a 64-byte-aligned allocation would have 64 bytes of overhead.

    If we decided to extend that overaligned trick to all allocations (which would get rid of the requirement to deallocate with the same alignment that you specified when allocating), then more allocations would have overhead. Allocations whose alignments are identical to MEMORY_ALLOCATION_ALIGNMENT will suffer a constant overhead of MEMORY_ALLOCATION_ALIGNMENT bytes. Allocations whose alignments are less than MEMORY_ALLOCATION_ALIGNMENT will suffer an overhead of MEMORY_ALLOCATION_ALIGNMENT bytes approximately half of the time. If the size of the allocation rounded up to MEMORY_ALLOCATION_ALIGNMENT is greater than or equal to the size of the allocation plus the size of a pointer, then there is no overhead, otherwise there is. Considering 99.99% of allocations will not be overaligned, do you really want to incur that sort of overhead on all those allocations?
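
    For illustration, the over-alignment trick described above can be sketched roughly as follows (illustrative code, not the actual alloc_system implementation; MIN_ALIGN stands in for MEMORY_ALLOCATION_ALIGNMENT). Note how the deallocation path only knows a header exists because it is told the alignment:

```rust
use std::alloc::{alloc, dealloc, Layout};

// Stand-in for MEMORY_ALLOCATION_ALIGNMENT on a 64-bit target.
const MIN_ALIGN: usize = 16;

unsafe fn aligned_alloc(size: usize, align: usize) -> *mut u8 {
    unsafe {
        let header = std::mem::size_of::<*mut u8>();
        // Over-allocate so there is always room for the header plus rounding.
        let layout = Layout::from_size_align(size + align + header, MIN_ALIGN).unwrap();
        let base = alloc(layout);
        if base.is_null() {
            return base;
        }
        // Round up past the header to the requested alignment (power of two).
        let aligned = (base as usize + header + align - 1) & !(align - 1);
        // Stash the base pointer just below the aligned block so the
        // deallocation path can recover it.
        *(aligned as *mut *mut u8).offset(-1) = base;
        aligned as *mut u8
    }
}

unsafe fn aligned_dealloc(ptr: *mut u8, size: usize, align: usize) {
    unsafe {
        // Without being told `align`, we could not know whether a header
        // exists at all; that is exactly the trade-off discussed above.
        let base = *(ptr as *mut *mut u8).offset(-1);
        let header = std::mem::size_of::<*mut u8>();
        dealloc(base, Layout::from_size_align(size + align + header, MIN_ALIGN).unwrap());
    }
}

fn main() {
    unsafe {
        let p = aligned_alloc(100, 64);
        assert!(!p.is_null());
        assert_eq!(p as usize % 64, 0);
        p.write_bytes(0xAB, 100); // the whole block is usable
        aligned_dealloc(p, 100, 64);
    }
    println!("ok");
}
```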

    Peter Atashian at 2017-10-18 03:13:40

  91. @ruuda

    I personally feel that the implementation of alloc_system today on Windows is a bigger benefit than having the ability to relinquish ownership of an allocation to another container like Vec. AFAIK though there's no data to measure the impact of always padding with the alignment and not requiring an alignment on deallocation.

    @joshlf

    I think that comment is wrong, as pointed out alloc_system on Windows relies on the same alignment being passed to deallocation as was passed on allocation.

    Alex Crichton at 2017-10-18 14:28:20

  92. Considering 99.99% of allocations will not be overaligned, do you really want to incur that sort of overhead on all those allocations?

    It depends on the application whether the overhead is significant, and whether to optimize for memory or performance. My suspicion is that for most applications either is fine, but a small minority cares deeply about memory, and they really cannot afford those extra bytes. And another small minority needs control over alignment, and they really need it.

    Ruud van Asseldonk at 2017-10-18 17:44:28

  93. @alexcrichton

    I think that comment is wrong, as pointed out alloc_system on Windows relies on the same alignment being passed to deallocation as was passed on allocation.

    Doesn't that imply that alloc_system on Windows doesn't actually properly implement the Alloc trait (and thus maybe we ought to change the requirements of the Alloc trait)?


    @retep998

    If I'm reading your comment correctly, isn't that alignment overhead present for all allocations regardless of whether we need to be able to deallocate with a different alignment? That is, if I allocate 64 bytes with 64 byte alignment and I also deallocate with 64 byte alignment, the overhead you described is still present. Thus, it's not a feature of being able to deallocate with different alignments so much as it is a feature of requesting larger-than-normal alignments.

    Joshua Liebow-Feeser at 2017-10-18 18:00:16

  94. @joshlf The overhead caused by alloc_system currently is due to requesting larger-than-normal alignments. If your alignment is less than or equal to MEMORY_ALLOCATION_ALIGNMENT, then there is no overhead caused by alloc_system.

    However, if we changed the implementation to allow deallocating with different alignments, then the overhead would apply to nearly all allocations, regardless of alignment.

    Peter Atashian at 2017-10-18 23:38:43

  95. Ah I see; makes sense.

    Joshua Liebow-Feeser at 2017-10-18 23:46:55

  96. What is the meaning of implementing Alloc for both Heap and &Heap? In what cases would the user use one of those impls vs the other?

    Is this the first standard library API in which *mut u8 would mean "pointer to whatever"? There is String::from_raw_parts but that one really does mean pointer to bytes. I am not a fan of *mut u8 meaning "pointer to whatever" -- even C does better. What are some other options? Maybe a pointer to opaque type would be more meaningful.

    @rfcbot concern *mut u8

    David Tolnay at 2017-10-19 05:39:20

  97. @dtolnay Alloc for Heap is sort of "standard" and Alloc for &Heap is like Write for &T where the trait requires &mut self but the implementation does not. Notably that means that types like Heap and System are threadsafe and do not need to be synchronized when allocating.

    More importantly, though, usage of #[global_allocator] requires that the static it's attached to, which has type T, have an implementation of Alloc for &T (i.e. all global allocators must be threadsafe).

    For *mut u8 I think that *mut () may be interesting, but I don't personally feel too compelled to "get this right" here per se.
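
    The Write-for-&T pattern mentioned above can be sketched with ordinary types (BumpAlloc and Heap here are illustrative, not the real liballoc definitions): a trait whose methods take &mut self gets a second impl for &T, so the allocator is usable through a shared reference, which is what a static global allocator requires:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Illustrative stand-in for the Alloc trait: methods take &mut self.
trait BumpAlloc {
    fn alloc(&mut self, size: usize) -> usize; // returns a fake "address"
}

struct Heap {
    next: AtomicUsize,
}

// The "normal" impl: requires exclusive access to the allocator.
impl BumpAlloc for Heap {
    fn alloc(&mut self, size: usize) -> usize {
        self.next.fetch_add(size, Ordering::Relaxed)
    }
}

// The `&Heap` impl: here `self` is `&mut &Heap`, so a shared reference is
// enough, meaning the allocator promises it is internally thread-safe.
impl<'a> BumpAlloc for &'a Heap {
    fn alloc(&mut self, size: usize) -> usize {
        self.next.fetch_add(size, Ordering::Relaxed)
    }
}

fn main() {
    static HEAP: Heap = Heap { next: AtomicUsize::new(0) };
    let mut h = &HEAP; // no &mut access to the static is needed
    let first = h.alloc(8);
    let second = h.alloc(8);
    assert_eq!(first, 0);
    assert_eq!(second, 8);
    println!("ok");
}
```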

    Alex Crichton at 2017-10-19 14:01:19

  98. The main advantage of *mut u8 is that it is very convenient to use .offset with byte offsets.

    Amanieu d'Antras at 2017-10-19 15:44:28

  99. For *mut u8 I think that *mut () may be interesting, but I don't personally feel too compelled to "get this right" here per se.

    If we go with *mut u8 in a stable interface, aren't we locking ourselves in? In other words, once we stabilize this, we won't have a chance to "get this right" in the future.

    Also, *mut () seems a bit dangerous to me in case we ever make an optimization like RFC 2040 in the future.

    Joshua Liebow-Feeser at 2017-10-19 20:30:31

  100. The main advantage of *mut u8 is that it is very convenient to use .offset with byte offsets.

    True, but you could easily do let ptr = (foo as *mut u8) and then go about your merry way. That doesn't seem like enough of a motivation to stick with *mut u8 in the API if there are compelling alternatives (which, to be fair, I'm not sure there are).

    Joshua Liebow-Feeser at 2017-10-19 20:33:21

  101. Also, *mut () seems a bit dangerous to me in case we ever make an optimization like RFC 2040 in the future.

    That optimization will already probably never happen-- it'd break too much existing code. Even if it did, it would be applied to &() and &mut (), not *mut ().

    Taylor Cramer at 2017-10-19 20:36:41

  102. If RFC 1861 were close to being implemented/stabilized, I'd suggest using it:

    extern { pub type void; }
    
    pub unsafe trait Alloc {
        unsafe fn alloc(&mut self, layout: Layout) -> Result<*mut void, Error>;
        unsafe fn dealloc(&mut self, ptr: *mut void, layout: Layout);
        // ...
    }
    

    It's probably too far away though, right?

    Joshua Liebow-Feeser at 2017-10-19 20:47:42

  103. @joshlf I thought I saw a PR open about them, the remaining unknown is DynSized.

    Eduard-Mihai Burtescu at 2017-10-20 11:50:03

  104. Will this work for struct hack like objects? Say I have a Node<T> that looks like so:

    struct Node<T> {
       size: u32,
       data: T,
       // followed by `size` bytes
    }
    

    and a value type:

    struct V {
      a: u32,
      b: bool,
    }
    

    Now I want to allocate Node<V> with a string of size 7 in a single allocation. Ideally I want to make an allocation of size 16 align 4 and fit everything in it: 4 for u32, 5 for V, and 7 for the string bytes. This works because the last member of V has alignment 1 and the string bytes also have alignment 1.

    Note that this is not allowed in C/C++ if the types are composed like above, as writing to packed storage is undefined behavior. I think this is a hole in the C/C++ standard that unfortunately cannot be fixed. I can expand on why this is broken, but let's focus on Rust instead. Can this work? :-)

    Alkis Evlogimenos at 2017-10-24 14:32:17

  105. With respect to the size and alignment of the Node<V> structure itself, you're pretty much at the whim of the Rust compiler. It's UB (undefined behavior) to allocate with any size or alignment smaller than what Rust requires since Rust may make optimizations based on the assumption that any Node<V> object - on the stack, on the heap, behind a reference, etc - has a size and alignment matching what is expected at compile time.

    In practice, it looks like the answer is unfortunately no: I ran this program and found that, at least on the Rust Playground, Node<V> is given a size of 12 and an alignment of 4, meaning that any objects after the Node<V> must be offset by at least 12 bytes. It looks like the offset of the data.b field within the Node<V> is 8 bytes, implying that bytes 9-11 are trailing padding. Unfortunately, even though those padding bytes "aren't used" in some sense, the compiler still treats them as part of the Node<V>, and reserves the right to do anything it likes with them (most importantly, including writing to them when you assign to a Node<V>, implying that if you try to squirrel away extra data there, it may get overwritten).
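
    A minimal, runnable version of that check (the numbers asserted below are what current compilers produce; repr(Rust) does not formally guarantee a layout):

```rust
use std::mem;

// Same shapes as in the discussion above.
struct V {
    a: u32,
    b: bool,
}

struct Node<T> {
    size: u32,
    data: T,
}

fn main() {
    // Fields u32 + u32 + bool need at least 9 bytes, the alignment is 4,
    // so the struct is padded out to 12 bytes total.
    assert_eq!(mem::align_of::<Node<V>>(), 4);
    assert_eq!(mem::size_of::<Node<V>>(), 12);
    println!("ok");
}
```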

    (note, btw: you can't treat a type as packed that the Rust compiler doesn't think is packed. However, you can tell the Rust compiler that something is packed, which will change the type's layout (removing padding), using repr(packed))

    However, with respect to laying out one object after another without having them both be part of the same Rust type, I'm almost 100% sure this is valid - after all, it's what Vec does. You can use the methods of the Layout type to dynamically calculate how much space is needed for the total allocation:

    let node_layout = Layout::new::<Node<V>>();
    // NOTE: This is only valid if the node_layout.align() is at least as large as mem::align_of_val("a")!
    // NOTE: I'm assuming that the alignment of all strings is the same (since str is unsized, you can't do mem::align_of::<str>())
    let padding = node_layout.padding_needed_for(mem::align_of_val("a"));
    let total_size = node_layout.size() + padding + 7;
    let total_layout = Layout::from_size_align(total_size, node_layout.align()).unwrap();
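
    The same padding arithmetic can be expressed with Layout::extend, which appends a second layout (inserting any padding the second alignment requires) and returns the combined layout together with the offset of the appended part. A runnable sketch, assuming the same Node and V shapes:

```rust
use std::alloc::Layout;

struct V { a: u32, b: bool }
struct Node<T> { size: u32, data: T }

fn main() {
    let node = Layout::new::<Node<V>>();
    let bytes = Layout::array::<u8>(7).unwrap();
    // `extend` pads `node` up to the alignment of `bytes` and appends it,
    // returning the combined layout and the offset where `bytes` begins.
    let (total, offset) = node.extend(bytes).unwrap();
    // The trailing bytes have alignment 1, so they start right after the
    // Node<V> (including its trailing padding).
    assert_eq!(offset, node.size());
    assert_eq!(total.size(), node.size() + 7);
    println!("total size: {}, align: {}", total.size(), total.align());
}
```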
    

    Joshua Liebow-Feeser at 2017-10-24 15:19:55

  106. Would something like this work?

    #[repr(C)]
    struct Node<T> {
       size: u32,
       data: T,
       bytes: [u8; 0],
    }
    

    … then allocate with a larger size, and use slice::from_raw_parts_mut(node.bytes.as_mut_ptr(), size) ?
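
    A runnable sketch of that approach (field types simplified to concrete u32s, and the trailing bytes addressed from the allocation's base pointer rather than through the zero-length array, to sidestep pointer-provenance questions):

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::slice;

#[repr(C)]
struct Node {
    size: u32,
    data: u32,
    bytes: [u8; 0], // marks where the trailing storage begins
}

fn main() {
    let n = 7usize;
    // One allocation that holds the Node followed by `n` bytes.
    let (layout, offset) = Layout::new::<Node>()
        .extend(Layout::array::<u8>(n).unwrap())
        .unwrap();
    unsafe {
        let base = alloc(layout);
        assert!(!base.is_null());
        let node = base as *mut Node;
        (*node).size = n as u32;
        (*node).data = 42;
        // The trailing bytes live at `offset` from the base of the allocation.
        let trailing = slice::from_raw_parts_mut(base.add(offset), n);
        trailing.copy_from_slice(b"example");
        assert_eq!((*node).size, 7);
        assert_eq!(&trailing[..], b"example");
        dealloc(base, layout);
    }
    println!("ok");
}
```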

    Simon Sapin at 2017-10-24 15:35:01

  107. Thanks @joshlf for the detailed answer! The TLDR for my usecase is that I can get a Node<V> of size 16 but only if V is repr(packed). Otherwise the best I can do is size 19 (12 + 7).

    @SimonSapin not sure; I will try.

    Alkis Evlogimenos at 2017-10-24 15:49:48

  108. Haven't really caught up with this thread, but I am dead set against stabilizing anything yet. We've not made progress on the hard problems yet:

    1. Allocator-polymorphic collections
      • not even non-bloated box!
    2. Fallible collections

    I think the design of the fundamental traits will affect the solutions to those problems: I've had little time for Rust for the past few months, but have argued this at times. I doubt I will have time to fully make my case here either, so I can only hope we first at least write up a complete solution to all of those: somebody prove me wrong by showing it is possible to be rigorous (force correct usage), flexible, and ergonomic with the current traits. Or even just finish checking the boxes at the top.

    John Ericson at 2017-10-24 16:16:22

  109. Re: @Ericson2314's comment

    I think that a relevant question related to the conflict between that perspective and @alexcrichton's desire to stabilize something is: How much benefit do we get from stabilizing a minimal interface? In particular, very few consumers will call Alloc methods directly (even most collections will probably use Box or some other similar container), so the real question becomes: what does stabilizing buy for users who will not be calling Alloc methods directly? Honestly the only serious use case I can think of is that it paves a path for allocator-polymorphic collections (which will likely be used by a much broader set of users), but it seems like that's blocked on #27336, which is far from being resolved. Maybe there are other large use cases I'm missing, but based on that quick analysis, I'm inclined to lean away from stabilization as having only marginal benefits at the cost of locking us into a design that we might later find to be suboptimal.

    Joshua Liebow-Feeser at 2017-10-24 16:41:42

  110. @joshlf it allows people to define and use their own global allocators.

    Steven Fackler at 2017-10-24 16:42:55

  111. Hmmm good point. would it be possible to stabilize specifying the global allocator without stabilizing Alloc? I.e., the code that implements Alloc would have to be unstable, but that would probably be encapsulated in its own crate, and the mechanism to mark that allocator as the global allocator would itself be stable. Or am I misunderstanding how stable/unstable and the stable compiler/nightly compiler interact?

    Joshua Liebow-Feeser at 2017-10-24 16:49:19

  112. Ah @joshlf, remember that #27336 is a distraction, as per https://github.com/rust-lang/rust/issues/42774#issuecomment-317279035. I'm pretty sure we'll run into other problems: problems with the traits as they exist, which is why I want to commence work on that now. It's a lot easier to discuss those problems once they arrive for all to see than to debate predicted futures post-#27336.

    John Ericson at 2017-10-24 17:21:59

  113. @joshlf But you can't compile the crate that defines the global allocator with a stable compiler.

    Steven Fackler at 2017-10-24 17:49:04

  114. @sfackler Ah yes, there's that misunderstanding I was afraid of :P

    Joshua Liebow-Feeser at 2017-10-24 18:18:52

  115. I find the name Excess(ptr, usize) a bit confusing because the usize is not the excess in size of the requested allocation (as in the extra size allocated), but the total size of the allocation.

    IMO Total, Real, Usable, or any name that conveys that the size is the total size or real size of the allocation is better than "excess", which I find misleading. The same applies for the _excess methods.

    gnzlbg at 2017-10-25 11:58:51

  116. I agree with @gnzlbg above, I think a plain (ptr, usize) tuple would be just fine.

    Arthur Silva at 2017-10-25 14:22:39

  117. Note that Excess is not proposed to be stabilized in the first pass, however.

    Alex Crichton at 2017-10-25 14:49:48

  118. Posted this thread for discussion on reddit, which has some people with concerns: https://www.reddit.com/r/rust/comments/78dabn/custom_allocators_are_on_the_verge_of_being/

    bstrie at 2017-10-25 21:47:17

  119. After further discussion with @rust-lang/libs today, I'd like to make a few tweaks to the stabilization proposal which can be summarized with:

    • Add alloc_zeroed to the set of stabilized methods, otherwise having the same signature as alloc.
    • Change *mut u8 to *mut void in the API using extern { type void; } support, solving @dtolnay's concern and providing a way forward for unifying c_void across the ecosystem.
    • Change the return type of alloc to *mut void, removing the Result and the Error

    Perhaps the most contentious is the last point so I want to elaborate on that as well. This came out of discussion with the libs team today and specifically revolved around how (a) the Result-based interface has a less efficient ABI than a pointer-returning one and (b) almost no "production" allocators today provide the ability to learn anything more than "this just OOM'd". For performance we can mostly paper over it with inlining and such but it remains that the Error is an extra payload that's difficult to remove at the lowest layers.

    The thinking for returning payloads of errors is that allocators can provide implementation-specific introspection to learn why an allocation failed and otherwise almost all consumers should only need to know whether the allocation succeeded or failed. Additionally this is intended to be a very low-level API which isn't actually called that often (instead, the typed APIs which wrap things up nicely should be called instead). In that sense it's not paramount that we have the most usable and ergonomic API for this location, but rather it's more important about enabling use cases without sacrificing performance.

    Alex Crichton at 2017-10-31 22:22:28

  120. The main advantage of *mut u8 is that it is very convenient to use .offset with byte offsets.

    In the libs meeting we also suggested impl *mut void { fn offset } which does not conflict with the existing offset defined for T: Sized. Could also be byte_offset.
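
    A byte_offset could be sketched today as an extension trait (hypothetical names; this is not the impl block proposed in the meeting), round-tripping through *mut u8 so the caller never has to spell the cast:

```rust
// Hypothetical extension trait: offset any raw pointer by a byte count.
trait ByteOffset {
    unsafe fn byte_offset(self, count: isize) -> Self;
}

impl<T> ByteOffset for *mut T {
    unsafe fn byte_offset(self, count: isize) -> Self {
        // Cast to a byte pointer, offset, cast back.
        unsafe { (self as *mut u8).offset(count) as *mut T }
    }
}

fn main() {
    let mut buf = [0u8; 16];
    let p = buf.as_mut_ptr() as *mut (); // a "pointer to whatever"
    // Called via the trait path to make it explicit which method runs.
    let q = unsafe { ByteOffset::byte_offset(p, 8) };
    assert_eq!(q as usize - p as usize, 8);
    println!("ok");
}
```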

    David Tolnay at 2017-10-31 23:20:28

  121. +1 for using *mut void and byte_offset. Is there going to be an issue with stabilization of the extern types feature, or can we sidestep that issue because only the definition is unstable (and liballoc can do unstable things internally) and not the use (e.g., let a: *mut void = ... isn't unstable)?

    Joshua Liebow-Feeser at 2017-10-31 23:59:47

  122. Yep, we don't need to block on extern type stabilization. Even if extern type support gets deleted, the void we define for this can always be a magical type worst case.

    Steven Fackler at 2017-11-01 00:28:19

  123. Was there any further discussion in the libs meeting about whether or not Alloc and Dealloc should be separate traits?

    Taylor Cramer at 2017-11-01 00:33:54

  124. We didn't touch on that specifically, but we generally felt that we shouldn't be diverting from prior art unless we have a particularly compelling reason to do so. In particular, C++'s Allocator concept does not have a similar split.

    Steven Fackler at 2017-11-01 00:42:53

  125. I'm not sure that's an apt comparison in this case. In C++, everything's explicitly freed, so there's no equivalent to Box that needs to store a copy of (or a reference to) its own allocator. That's what causes the large size blowup for us.

    Joshua Liebow-Feeser at 2017-11-01 00:46:06

  126. std::unique_ptr is generic on a "Deleter".

    Taylor Cramer at 2017-11-01 00:48:25

  127. @joshlf unique_ptr is the equivalent of Box, vector is the equivalent of Vec, unordered_map is the equivalent of HashMap, etc.

    @cramertj Ah, interesting, I was only looking at collections types. It seems like that might be a thing to do then. We can always add it in later via blanket impls but it'd probably be cleaner to avoid that.

    Steven Fackler at 2017-11-01 01:25:02

  128. The blanket impl approach might be cleaner, actually:

    pub trait Dealloc {
        fn dealloc(&mut self, ptr: *mut void, layout: Layout);
    }
    
    impl<T> Dealloc for T
    where
        T: Alloc
    {
        fn dealloc(&mut self, ptr: *mut void, layout: Layout) {
            <T as Alloc>::dealloc(self, ptr, layout)
        }
    }
    

    One less trait to worry about for the majority of use cases.

    Steven Fackler at 2017-11-01 02:23:04

    • Change the return type of alloc to *mut void, removing the Result and the Error

    Perhaps the most contentious is the last point so I want to elaborate on that as well. This came out of discussion with the libs team today and specifically revolved around how (a) the Result-based interface has a less efficient ABI than a pointer-returning one and (b) almost no "production" allocators today provide the ability to learn anything more than "this just OOM'd". For performance we can mostly paper over it with inlining and such but it remains that the Error is an extra payload that's difficult to remove at the lowest layers.

    I’m concerned that this would make it very easy to use the returned pointer without checking for null. It seems that the overhead could also be removed without adding this risk by returning Result<NonZeroPtr<void>, AllocErr> and making AllocErr zero-size?

    (NonZeroPtr is a merged ptr::Shared and ptr::Unique as proposed in https://github.com/rust-lang/rust/issues/27730#issuecomment-316236397.)
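
    The zero-overhead claim is checkable with NonNull, the std type in this niche: the compiler can use the null value to represent the None / Err variant, so the whole value stays one pointer wide (the sizes asserted below are what current compilers produce):

```rust
use std::mem::size_of;
use std::ptr::NonNull;

// Zero-sized error payload, so Err carries no data.
#[derive(Debug)]
struct AllocErr;

fn main() {
    // Null serves as the niche for None / Err: no extra flag word is needed.
    assert_eq!(size_of::<Option<NonNull<u8>>>(), size_of::<*mut u8>());
    assert_eq!(size_of::<Result<NonNull<u8>, AllocErr>>(), size_of::<*mut u8>());
    println!("ok");
}
```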

    Simon Sapin at 2017-11-01 07:03:37

  129. @SimonSapin something like Result<NonZeroPtr<void>, AllocErr> requires three types to be stabilized, all of which are brand new and some of which have historically been languishing quite a bit for stabilization. Something like void isn't even required and is a nice-to-have (in my opinion).

    I agree it's "easy to use it without checking for null", but this is, again, a very low-level API that's not intended to be heavily used, so I don't think we should optimize for caller ergonomics.

    Alex Crichton at 2017-11-01 15:38:07

  130. People can also build higher level abstractions like alloc_one on top of the low level alloc that could have more complex return types like Result<NonZeroPtr<void>, AllocErr>.

    Steven Fackler at 2017-11-01 15:42:01

  131. I agree that AllocErr would not be useful in practice, but how about just Option<NonZeroPtr<void>>? APIs that are impossible to accidentally misuse, without overhead, is one of the things that set Rust apart from C, and returning to C-style null pointers feels like a step backwards to me. Saying it is ā€œa very low-level API that's not intended to be heavily usedā€ is like saying that we should not care about memory safety on uncommon microcontroller architectures because they are very low-level and not heavily used.

    Ruud van Asseldonk at 2017-11-01 20:06:55

  132. Every interaction with the allocator involves unsafe code regardless of the return type of this function. Low level allocation APIs are possible to misuse whether the return type is Option<NonZeroPtr<void>> or *mut void.

    Alloc::alloc in particular is the API that is low level and not intended to be heavily used. Methods like Alloc::alloc_one<T> or Alloc::alloc_array<T> are the alternatives that would be more heavily used and have a "nicer" return type.

    A stateful AllocError is not worth it, but a zero sized type that implements Error and has a Display of allocation failure is nice to have. If we go the NonZeroPtr<void> route, I see Result<NonZeroPtr<void>, AllocError> as preferable to Option<NonZeroPtr<void>>.
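
    Such a zero-sized error type could look like this (a sketch; the AllocError name and its message are illustrative):

```rust
use std::error::Error;
use std::fmt;

// No payload: the type occupies zero bytes, yet is a proper Error.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AllocError;

impl fmt::Display for AllocError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        f.write_str("memory allocation failed")
    }
}

impl Error for AllocError {}

fn main() {
    assert_eq!(std::mem::size_of::<AllocError>(), 0);
    assert_eq!(AllocError.to_string(), "memory allocation failed");
    println!("ok");
}
```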

    Steven Fackler at 2017-11-01 20:19:44

  133. Why the rush to stabilize :(!! Result<NonZeroPtr<void>, AllocErr> is indisputably nicer for clients to use. Saying this is a "very low-level API" that need not be nice is just depressingly unambitious. Code at all levels should be as safe and maintainable as possible; obscure code that isn't constantly edited (and thus paged into people's short-term memories) all the more so!

    Furthermore, if we are to have user-written allocator-polymorphic collections, which I certainly hope for, that's an open amount of fairly complex code using allocators directly.

    John Ericson at 2017-11-03 21:22:20

  134. Re deallocation: operationally, we almost certainly want to reference/clone the allocator just once per tree-based collection. That means passing in the allocator to each custom-allocator box being destroyed. But it's an open problem how best to do this in Rust without linear types. Contrary to my previous comment, I'd be OK with some unsafe code in collections implementations for this, because the ideal use case changes the implementation of Box, not the implementation of split allocator and deallocator traits. I.e. we can make stabilizable progress without blocking on linearity.

    @sfackler I think we need some associated types connecting the deallocator to the allocator; it may be not possible to retrofit them.

    John Ericson at 2017-11-03 21:46:02

  135. @Ericson2314 There is a "rush" to stabilize because people want to use allocators for real things in the real world. This is not a science project.

    What would that associated type be used for?

    Steven Fackler at 2017-11-03 21:52:42

  136. @sfackler People can still pin a nightly, and the type of people who care about this sort of advanced feature should be comfortable doing that. [If the issue is unstable rustc vs unstable Rust, that's a different problem that needs a policy fix.] Baking in lousy APIs, on the contrary, hamstrings us forever, unless we want to split the ecosystem with a new 2.0 std.

    The associated types would relate the deallocator to the allocator. Each needs to know about that other for this to work. [There's still the issue of using the wrong (de)allocator of the right type, but I accept that no one has remotely proposed a solution to that.]

    John Ericson at 2017-11-03 22:02:45

  137. If people can just pin to a nightly, why do we have stable builds at all? The set of people who are directly interacting with allocator APIs is much smaller than the people who want to take advantage of those APIs by e.g. replacing the global allocator.

    Can you write some code that shows why a deallocator needs to know the type of its associated allocator? Why doesn't C++'s allocator API need a similar mapping?

    Steven Fackler at 2017-11-03 22:11:22

  138. If people can just pin to a nightly, why do we have stable builds at all?

    To indicate language stability. Code you write against this version of things will never break on a newer compiler. You pin a nightly when you need something so badly that it's not worth waiting for the final iteration of the feature deemed of quality worthy of that guarantee.

    The set of people who are directly interacting with allocator APIs is much smaller than the people who want to take advantage of those APIs by e.g. replacing the global allocator.

    Aha! This would be for moving jemalloc out of tree, etc? No one has proposed stabilizing the awful hacks that allow choosing the global allocator, just the heap static itself? Or did I read the proposal wrong?

    John Ericson at 2017-11-03 22:43:17

  139. The awful hacks that allow for choosing the global allocator are proposed to be stabilized, which is half of what allows us to move jemalloc out of tree. This issue is the other half.

    Steven Fackler at 2017-11-03 22:48:31

  140. #[global_allocator] attribute stabilization: https://github.com/rust-lang/rust/issues/27389#issuecomment-336955367

    Simon Sapin at 2017-11-03 22:48:57

  141. Yikes

    John Ericson at 2017-11-03 22:49:26

  142. @Ericson2314 What do you think would be a non-awful way to select the global allocator?

    Simon Sapin at 2017-11-04 07:18:16

  143. (Responded in https://github.com/rust-lang/rust/issues/27389#issuecomment-342285805)

    John Ericson at 2017-11-06 20:44:02

  144. The proposal has been amended to use *mut void.

    @rfcbot resolved *mut u8

    David Tolnay at 2017-11-07 21:08:33

  145. @rfcbot reviewed

    After some discussion on IRC, I'm approving this with the understanding that we do not intend to stabilize a Box generic on Alloc, but instead on some Dealloc trait with an appropriate blanket impl, as suggested by @sfackler here. Please let me know if I've misunderstood the intention.

    Taylor Cramer at 2017-11-07 21:21:00

  146. @cramertj Just to clarify, it's possible to add that blanket impl after the fact and not break the Alloc definition that we're stabilizing here?

    Joshua Liebow-Feeser at 2017-11-07 21:30:00

  147. @joshlf yep, it'd look like this: https://github.com/rust-lang/rust/issues/32838#issuecomment-340959804

    Steven Fackler at 2017-11-07 21:36:35

  148. How will we specify the Dealloc for a given Alloc? I'd imagine something like this?

    pub unsafe trait Alloc {
        type Dealloc: Dealloc = Self;
        ...
    }
    

    Joshua Liebow-Feeser at 2017-11-07 21:41:32

  149. I guess that puts us in thorny territory WRT https://github.com/rust-lang/rust/issues/29661.

    Taylor Cramer at 2017-11-07 21:43:26

  150. Yeah, I don't think there's a way to have the addition of Dealloc be backwards-compatible with existing definitions of Alloc (which don't have that associated type) without having a default.

    Joshua Liebow-Feeser at 2017-11-07 21:44:59

  151. If you wanted to automatically be able to grab the deallocator corresponding to an allocator, you'd need more than just an associated type, but a function to produce a deallocator value.

    But, this can be handled in the future with that stuff being attached to a separate subtrait of Alloc I think.

    Steven Fackler at 2017-11-07 22:02:40

  152. @sfackler i'm not sure I understand. Can you write out the signature of Box::new under your design?

    Taylor Cramer at 2017-11-07 22:05:46

  153. This is ignoring placement syntax and all of that, but one way you could do it would be

    pub struct Box<T, D>(NonZeroPtr<T>, D);
    
    impl<T, D> Box<T, D>
    where
        D: Dealloc
    {
        fn new<A>(alloc: A, value: T) -> Box<T, D>
        where
            A: Alloc<Dealloc = D>
        {
            let ptr = alloc.alloc_one().unwrap_or_else(|_| alloc.oom());
        ptr::write(ptr.as_ptr(), value);
            let deallocator = alloc.deallocator();
            Box(ptr, deallocator)
        }
    }
    

    Notably, we need to actually be able to produce an instance of the deallocator, not just know its type. You could also parameterize the Box over the Alloc and store A::Dealloc instead, which might help with type inference. We can make this work after this stabilization by moving Dealloc and deallocator to a separate trait:

    pub trait SplitAlloc: Alloc {
        type Dealloc;
    
        fn deallocator(&self) -> Self::Dealloc;
    }
    

    Steven Fackler at 2017-11-07 22:43:24

  154. But what would the impl of Drop look like?

    Joshua Liebow-Feeser at 2017-11-07 23:00:01

  155. impl<T, D> Drop for Box<T, D>
    where
        D: Dealloc
    {
        fn drop(&mut self) {
            unsafe {
                ptr::drop_in_place(self.0);
                self.1.dealloc_one(self.0);
            }
        }
    }
    

    Steven Fackler at 2017-11-07 23:30:57

  156. But assuming we stabilize Alloc first, then not all Allocs will implement Dealloc, right? And I thought impl specialization was still a ways off? In other words, in theory, you'd want to do something like the following, but I don't think it works yet?

    impl<T, D> Drop for Box<T, D> where D: Dealloc { ... }
    impl<T, A> Drop for Box<T, A> where A: Alloc { ... }
    

    Joshua Liebow-Feeser at 2017-11-07 23:36:41

  157. If anything, we'd have a

    default impl<T> SplitAlloc for T
    where
        T: Alloc { ... }
    

    But I don't think that'd really be necessary. The use cases for custom allocators and global allocators are distinct enough that I wouldn't assume there'd be a ton of overlap between them.

    Steven Fackler at 2017-11-07 23:40:22

  158. I suppose that could work. It seems much cleaner to me, though, to just have Dealloc right off the bat so we can have the simpler interface. I imagine we could have a pretty simple, uncontroversial interface that would require no change to existing code that already implements Alloc:

    unsafe trait Dealloc {
        fn dealloc(&mut self, ptr: *mut void, layout: Layout);
    }
    
    impl<T> Dealloc for T
    where
        T: Alloc
    {
        fn dealloc(&mut self, ptr: *mut void, layout: Layout) {
            <T as Alloc>::dealloc(self, ptr, layout)
        }
    }
    
    unsafe trait Alloc {
        type Dealloc: Dealloc = &mut Self;
        fn deallocator(&mut self) -> Self::Dealloc { self }
        ...
    }
    

    Joshua Liebow-Feeser at 2017-11-07 23:49:26

  159. I thought associated type defaults were problematic?

    Steven Fackler at 2017-11-07 23:52:03

  160. A Dealloc that's a mutable reference to the allocator seems not all that useful - you can only allocate one thing at a time, right?

    Steven Fackler at 2017-11-07 23:54:11

  161. I thought associated type defaults were problematic?

    Oh I guess associated type defaults are far enough away that we can't rely on them.

    Still, we could have the simpler:

    unsafe trait Dealloc {
        fn dealloc(&mut self, ptr: *mut void, layout: Layout);
    }
    
    impl<T> Dealloc for T
    where
        T: Alloc
    {
        fn dealloc(&mut self, ptr: *mut void, layout: Layout) {
            <T as Alloc>::dealloc(self, ptr, layout)
        }
    }
    
    unsafe trait Alloc {
        type Dealloc: Dealloc;
        fn deallocator(&mut self) -> Self::Dealloc;
        ...
    }
    

    and just require the implementor to write a bit of boilerplate.

    A Dealloc that's a mutable reference to the allocator seems not all that useful - you can only allocate one thing at a time, right?

    Yeah, good point. Probably a moot point anyway given your other comment.

    Joshua Liebow-Feeser at 2017-11-07 23:56:26
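    [Editor's note: the boilerplate described above can be sketched as a small, compilable toy. The trait shapes are simplified stand-ins for the snippets in this thread (using `*mut u8` instead of `*mut void` and omitting error handling), and the `Global` type is hypothetical:]

    ```rust
    use std::alloc::{alloc, dealloc, Layout};

    // Simplified stand-ins for the split traits discussed above.
    unsafe trait Dealloc {
        unsafe fn dealloc(&mut self, ptr: *mut u8, layout: Layout);
    }

    unsafe trait Alloc {
        type Dealloc: Dealloc;
        unsafe fn alloc(&mut self, layout: Layout) -> *mut u8;
        // The "boilerplate": name the deallocator type and produce a value of it.
        fn deallocator(&mut self) -> Self::Dealloc;
    }

    // A stateless allocator backed by the system allocator: its deallocator
    // is just another copy of itself.
    #[derive(Copy, Clone)]
    struct Global;

    unsafe impl Dealloc for Global {
        unsafe fn dealloc(&mut self, ptr: *mut u8, layout: Layout) {
            dealloc(ptr, layout)
        }
    }

    unsafe impl Alloc for Global {
        type Dealloc = Global;
        unsafe fn alloc(&mut self, layout: Layout) -> *mut u8 {
            alloc(layout)
        }
        fn deallocator(&mut self) -> Global {
            Global
        }
    }

    fn main() {
        unsafe {
            let layout = Layout::new::<u64>();
            let mut a = Global;
            let p = a.alloc(layout);
            assert!(!p.is_null());
            p.cast::<u64>().write(42);
            // The deallocator is an owned value, independent of the allocator.
            let mut d = a.deallocator();
            d.dealloc(p, layout);
            println!("ok");
        }
    }
    ```

    For a stateless allocator the cost is two lines; a stateful allocator would need to decide what state its deallocator carries.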

  162. Should deallocator take self, &self, or &mut self?

    Steven Fackler at 2017-11-07 23:59:44

  163. Probably &mut self to be consistent with the other methods.

    Joshua Liebow-Feeser at 2017-11-08 00:03:04

  164. Are there any allocators that would prefer to take self by value so they don't have to e.g. clone state?

    Steven Fackler at 2017-11-08 00:07:23

  165. The problem with taking self by value is that it precludes getting a Dealloc and then continuing to allocate.

    Joshua Liebow-Feeser at 2017-11-08 00:08:13

  166. I'm thinking of a hypothetical "oneshot" allocator, though I don't know how much of a real thing that is.

    Steven Fackler at 2017-11-08 00:11:33

  167. Such an allocator might exist, but taking self by value would require that all allocators work that way, and would preclude any allocators allowing allocation after deallocator has been called.

    Joshua Liebow-Feeser at 2017-11-08 00:16:55

  168. I would still like to see some of this implemented and used in collections before we think about stabilizing it.

    Steven Fackler at 2017-11-08 00:19:15

  169. Do you think https://github.com/rust-lang/rust/issues/27336 or the points discussed in https://github.com/rust-lang/rust/issues/32838#issuecomment-339066870 will allow us to move forward on collections?

    Joshua Liebow-Feeser at 2017-11-08 01:27:14

  170. I'm worried about the type alias approach's impact on documentation readability. A (very verbose) way to allow progress would be to wrap types:

    pub struct Vec<T>(alloc::Vec<T, Heap>);
    
    impl<T> Vec<T> {
        // forwarding impls for everything
    }
    

    Steven Fackler at 2017-11-08 01:35:06
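    [Editor's note: a minimal sketch of the wrapper-with-forwarding-impls approach mentioned above. `AllocVec` stands in for the hypothetical `alloc::Vec<T, A>` and is backed by std's `Vec` purely for illustration:]

    ```rust
    struct Heap; // stand-in for the default allocator

    // Stand-in for the allocator-parameterized alloc::Vec<T, A>.
    struct AllocVec<T, A> {
        items: Vec<T>,
        _alloc: A,
    }

    impl<T, A> AllocVec<T, A> {
        fn new_in(alloc: A) -> Self {
            AllocVec { items: Vec::new(), _alloc: alloc }
        }
        fn push(&mut self, v: T) {
            self.items.push(v)
        }
        fn len(&self) -> usize {
            self.items.len()
        }
    }

    // The public wrapper hides the allocator parameter from the docs.
    pub struct MyVec<T>(AllocVec<T, Heap>);

    impl<T> MyVec<T> {
        pub fn new() -> Self {
            MyVec(AllocVec::new_in(Heap))
        }
        // One forwarding method per wrapped method:
        pub fn push(&mut self, v: T) {
            self.0.push(v)
        }
        pub fn len(&self) -> usize {
            self.0.len()
        }
    }

    fn main() {
        let mut v = MyVec::new();
        v.push(1);
        v.push(2);
        assert_eq!(v.len(), 2);
        println!("ok");
    }
    ```

    The verbosity is clear: every method of the inner type needs a forwarding impl on the wrapper, which is exactly the maintenance cost being weighed here.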

  171. I know it's a pain, but it seems like the changes we're discussing here are big enough that if we decide to go forward with split alloc/dealloc traits, we should try them out in std first and then re-FCP.

    Taylor Cramer at 2017-11-08 01:36:44

  172. What is the timeline on waiting on this stuff to get implemented?

    Steven Fackler at 2017-11-08 01:41:33

  173. The grow_in_place method doesn't return any kind of excess capacity. It currently calls usable_size with a layout and extends the allocation to at least fit that layout, but if the allocation is extended beyond it, users have no way to know.

    gnzlbg at 2017-11-10 09:33:21

  174. I am having a hard time understanding the advantage of the alloc and realloc methods over alloc_excess and realloc_excess.

    An allocator needs to find a suitable memory block to perform an allocation: this requires knowing the size of the memory block. Whether the allocator then returns a pointer, or the tuple "pointer and size of the memory block", does not make any measurable performance difference.

    So alloc and realloc just increase the API surface and seem to encourage writing less performant code. Why do we have them in the API at all? What's their advantage?


    EDIT: or in other words: all potentially allocating functions in the API should return Excess, which basically removes the need for all the _excess methods.

    gnzlbg at 2017-11-10 09:42:26

  175. Excess is only interesting for use cases that involve arrays which can grow. It's not useful or relevant for Box or BTreeMap, for example. There may be some cost in computing what the excess is, and there's certainly a more complex API, so it doesn't seem to me like code that doesn't care about excess capacity should be forced to pay for it.

    Steven Fackler at 2017-11-10 18:26:24

  176. There may be some cost in computing what the excess is

    Can you give an example? I don't know, and cannot imagine, an allocator that is able to allocate memory but that does not know how much memory it is actually allocating (which is what Excess is: real amount of memory allocated; we should rename it).

    The only commonly used Allocator where this might be slightly controversial is POSIX malloc, which even though it always computes the Excess internally, does not expose it as part of its C API. However, returning the requested size as the Excess is ok, portable, simple, incurs no cost at all, and is what everybody using POSIX malloc is already assuming anyways.

    jemalloc and basically any other Allocator out there provide APIs that return the Excess without incurring any costs, so for those allocators, returning the Excess is zero cost as well.

    There may be some cost in computing what the excess is, and there's certainly a more complex API, so it doesn't seem to me like code that doesn't care about excess capacity should be forced to pay for it.

    Right now everybody is already paying the price of the allocator trait having two APIs for allocating memory. And while one can build an Excess-less API on top of an Excess-full one, the opposite is not true. So I wonder why it isn't done like this:

    • Alloc trait methods always return Excess
    • add an ExcessLessAlloc trait that just drops off the Excess from Alloc methods for all users that 1) care enough to use Alloc but 2) don't care about the real amount of memory currently being allocated (looks like a niche to me, but I still think that such an API is nice to have)
    • if one day somebody discovers a way to implement Allocators with fast-paths for Excess-less methods, we can always provide a custom implementation of ExcessLessAlloc for it.

    FWIW I just landed on this thread again because I can't implement what I want on top of Alloc. I mentioned that it is missing grow_in_place_excess before, but I just got stuck again because it is also missing alloc_zeroed_excess (and who knows what else).

    I'd be more comfortable if the stabilization here would focus on stabilizing an Excess-full API first. Even if its API is not the most ergonomic for all uses, such an API would at least allow all uses which is a necessary condition to show that the design is not flawed.

    gnzlbg at 2017-11-13 13:02:25

  177. Can you give an example? I don't know, and cannot imagine, an allocator that is able to allocate memory but that does not know how much memory it is actually allocating (which is what Excess is: real amount of memory allocated; we should rename it).

    Most allocators today use size classes, where each size class allocates only objects of a particular fixed size, and allocation requests that don't fit a particular size class are rounded up to the smallest size class they fit inside. In this scheme, it's common to do things like having an array of size class objects and then doing classes[size / SIZE_QUANTUM].alloc(). In that world, figuring out what size class is used takes extra instructions: e.g., let excess = classes[size / SIZE_QUANTUM].size. It may not be a lot, but the performance of high-performance allocators (like jemalloc) is measured in single clock cycles, so it could represent meaningful overhead, especially if that size ends up getting passed through a chain of function returns.

    Joshua Liebow-Feeser at 2017-11-14 21:12:57
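    [Editor's note: the size-class scheme described above can be made concrete with a small runnable toy. `SizeClass` and `SIZE_QUANTUM` are illustrative names, not any real allocator's API; real allocators manage memory pools per class rather than calling the system allocator:]

    ```rust
    use std::alloc::{alloc, dealloc, Layout};

    const SIZE_QUANTUM: usize = 8;

    struct SizeClass {
        size: usize,
    }

    impl SizeClass {
        unsafe fn alloc(&self) -> *mut u8 {
            // Toy backing store: just forward to the system allocator.
            alloc(Layout::from_size_align(self.size, SIZE_QUANTUM).unwrap())
        }
    }

    fn main() {
        // Classes of 8, 16, ..., 64 bytes.
        let classes: Vec<SizeClass> =
            (1..=8).map(|i| SizeClass { size: i * SIZE_QUANTUM }).collect();

        let size = 20usize;
        // Round the request up to the smallest class that fits.
        let class = &classes[(size + SIZE_QUANTUM - 1) / SIZE_QUANTUM - 1];
        let ptr = unsafe { class.alloc() };
        assert!(!ptr.is_null());

        // Reporting the excess is one extra field read on the class.
        let excess = class.size;
        assert_eq!(excess, 24);

        unsafe { dealloc(ptr, Layout::from_size_align(class.size, SIZE_QUANTUM).unwrap()) };
        println!("ok");
    }
    ```

    The extra cost being debated is that one `class.size` load plus returning an extra `usize` up the call chain.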

  178. Can you give an example?

    At least going off of your PR to alloc_jemalloc, alloc_excess is pretty clearly running more code than alloc: https://github.com/rust-lang/rust/pull/45514/files.

    Steven Fackler at 2017-11-14 21:17:30

  179. In this scheme, it's common to do things like having an array of size class objects and then doing classes[size / SIZE_QUANTUM].alloc(). In that world, figuring out what size class is used takes extra instructions: e.g., let excess = classes[size / SIZE_QUANTUM].size

    So let me see if I follow properly:

    // This happens in both cases:
    let size_class = classes[size / SIZE_QUANTUM];
    let ptr = size_class.alloc(); 
    // This would happen only if you need to return the size:
    let size = size_class.size;
    return (ptr, size);
    

    Is that it?


    At least going off of your PR to alloc_jemalloc, alloc_excess is pretty clearly running more code than alloc

    That PR was a bugfix (not a perf fix), there are many things wrong with the current state of our jemalloc layer perf-wise but since that PR it at least returns what it should:

    • nallocx is a const function in the GCC sense, that is, a true pure function. This means it has no side-effects, its result depends on its arguments only, it accesses no global state, its arguments are not pointers (so the function cannot access global state through them), and for C/C++ programs LLVM can use this information to elide the call if the result is not used. AFAIK Rust currently cannot mark FFI C functions as const fn or similar. So this is the first thing that could be fixed and that would make realloc_excess zero-cost for those that don't use the excess, as long as inlining and optimizations work properly.
    • nallocx is always computed for aligned allocations inside mallocx, that is, all code is already computing it, but mallocx throws its result away, so here we are actually computing it twice, and in some cases nallocx is almost as expensive as mallocx... I have a fork of jemallocator that has some benchmarks for things like this in its branches, but this must be fixed upstream by jemalloc by providing an API that does not throw this away. This fix, however, only affects those that are currently using the Excess.
    • and then is the issue that we are computing the align flags twice, but that is something that LLVM can optimize on our side (and trivial to fix).

    So yes, it looks like more code, but this extra code is code we are already executing anyway; the first time we call it we just throw the results away. It is not impossible to fix, but I haven't found the time to do this yet.


    EDIT: @sfackler I managed to free some time for it today and was able to make alloc_excess "free" with respect to alloc in jemallocs slow path, and have only an overhead of ~1ns in jemallocs' fast path. I haven't really looked at the fast path in much detail, but it might be possible to improve this further. The details are here: https://github.com/jemalloc/jemalloc/issues/1074#issuecomment-345040339

    gnzlbg at 2017-11-15 19:16:53

  180. Is that it?

    Yes.

    Joshua Liebow-Feeser at 2017-11-15 19:44:23

  181. So this is the first thing that could be fixed and that would make realloc_excess zero-cost for those that don't use the excess as long as inlining and optimizations work properly.

    When used as the global allocator, none of this can be inlined.

    Even if its API is not the most ergonomic for all uses, such an API would at least allow all uses which is a necessary condition to show that the design is not flawed.

    There is literally zero code on Github that calls alloc_excess. If this is such a crucially important feature, why has no one ever used it? C++'s allocation APIs do not provide access to excess capacity. It seems incredibly straightforward to add/stabilize these features in the future in a backwards compatible way if there is actual concrete evidence that they improve performance and anyone actually cares enough to use them.

    Steven Fackler at 2017-11-21 03:55:14

  182. When used as the global allocator, none of this can be inlined.

    Then that is a problem that we should try to solve, at least for LTO builds, because global allocators like jemalloc rely on this: nallocx is the way it is by design, and the first recommendation jemalloc's devs made us regarding alloc_excess performance is that we should have those calls inlined, and we should propagate C attributes properly, so that the compiler removes the nallocx calls from the call-sites that do not use the Excess, like C and C++ compilers do.

    Even if we can't do that, the Excess API can still be made zero-cost by patching the jemalloc API (I have an initial implementation of such a patch in my rust-lang/jemalloc fork). We could either maintain that API ourselves, or try to land it upstream, but for that to land upstream we must make a good case about why these other languages can perform these optimizations and Rust cannot. Or we must have another argument, like this new API is significantly faster than mallocx + nallocx for those users that do need the Excess.

    If this is such a crucially important feature, why has no one ever used it?

    That's a good question. std::Vec is the poster-child for using the Excess API, but it currently does not use it, and all my previous comments stating "this and that are missing from the Excess API" were me trying to make Vec use it. The Excess API:

    • was not returning the Excess at all: https://github.com/rust-lang/rust/pull/45514
    • is missing functionality like grow_in_place_excess and alloc_zeroed_excess,
    • cannot propagate C attributes in FFI properly: https://github.com/rust-lang/rust/issues/46046
    • ... (I doubt it ends here)

    I cannot know why nobody is using this API. But given that not even the std library can use it for the data-structure it is best suited for (Vec), if I had to guess, I would say that the main reason is that this API is currently broken.

    If I had to guess even further, I would say that not even those who designed this API have used it, mainly because no single std collection uses it (which is where I expect this API to be tested at first), and also because using _excess and Excess everywhere to mean usable_size/allocation_size is extremely confusing/annoying to program with.

    This is probably because more work was put into the Excess-less APIs, and when you have two APIs, it is hard to keep them in sync, it is hard for users to discover both and know which to use, and finally, it is easy for users to prefer convenience over doing the right thing.

    gnzlbg at 2017-11-21 09:29:31

  183. Or in other words, if I have two competing APIs, and I put 100% of the work into improving one, and 0% of the work into improving the other, it isn't surprising to reach the conclusion that one is in practice significantly better than the other.

    gnzlbg at 2017-11-21 09:34:32

  184. As far as I can tell, these are the only two calls to nallocx outside of jemalloc tests on Github:

    https://github.com/facebook/folly/blob/f2925b23df8d85ebca72d62a69f1282528c086de/folly/detail/ThreadLocalDetail.cpp#L182 https://github.com/louishust/mysql5.6.14_tokudb/blob/4897660dee3e8e340a1e6c8c597f3b2b7420654a/storage/tokudb/ft-index/ftcxx/malloc_utils.hpp#L91

    Neither of them resembles the current alloc_excess API; rather, each is used standalone to compute an allocation size before it's made.

    Apache Arrow looked into using nallocx in their implementation but found things did not work out well:

    https://issues.apache.org/jira/browse/ARROW-464

    These are basically the only references to nallocx I can find. Why is it important that the initial implementation of allocator APIs support such an obscure feature?

    Steven Fackler at 2017-11-21 18:16:15

  185. As far as I can tell, these are the only two calls to nallocx outside of jemalloc tests on Github:

    From the top of my head I know that at least facebook's vector type is using it via facebook's malloc implementation (malloc and fbvector growth policy; that is a big chunk of C++'s vectors at facebook use this) and also that Chapel used it to improve the performance of their String type (here and the tracking issue). So maybe today wasn't Github's best day?

    Why is it important that the initial implementation of allocator APIs support such an obscure feature?

    The initial implementation of an allocator API does not need to support this feature.

    But good support for this feature should block the stabilization of such an API.

    gnzlbg at 2017-11-21 22:14:53

  186. Why should it block stabilization if it can be added backwards-compatibly later?

    Hanna Kruppe at 2017-11-21 22:17:27

  187. Why should it block stabilization if it can be added backwards-compatibly later?

    Because for me at least it means that only half of the design space has been sufficiently explored.

    gnzlbg at 2017-11-21 22:19:31

  188. Do you expect the non-excess related portions of the API will be affected by the design of the excess-related functionality? I admit I've only followed that discussion half-heartedly but it seems unlikely to me.

    Hanna Kruppe at 2017-11-21 22:22:08

  189. If we can't make this API:

    fn alloc(...) -> (*mut u8, usize) { 
       // worst case system API:
       let ptr = malloc(...);
       let excess = malloc_excess(...);
       (ptr, excess)
    }
    let (ptr, _) = alloc(...); // drop the excess
    

    as efficient as this one:

    fn alloc(...) -> *mut u8 { 
       // worst case system API:
       malloc(...)
    }
    let ptr = alloc(...);
    

    then we have bigger problems.

    Do you expect the non-excess related portions of the API will be affected by the design of the excess-related functionality?

    So yes, I expect a good excess-API to have a huge effect on the design of the non-excess related functionality: it would completely remove it.

    That would prevent the current situation of having two APIs that are out of sync, and in which the excess-api has less functionality than the excess-less one. While one can build an excess-less API on top of an excess-full one, the opposite is not true.

    Those who want to drop the Excess should just drop it.

    gnzlbg at 2017-11-21 22:39:34

  190. To clarify, if there were some way of adding an alloc_excess method after the fact in a backwards-compatible way, then you'd be OK with it? (but of course, stabilizing without alloc_excess means that adding it later would be a breaking change; I'm just asking so I understand your reasoning)

    Joshua Liebow-Feeser at 2017-11-21 22:43:39

  191. @joshlf It is very straightforward to do that.

    Steven Fackler at 2017-11-21 22:48:19

  192. :bell: This is now entering its final comment period, as per the review above. :bell:

    Rust RFC bot at 2017-11-21 22:48:21

  193. Those who want to drop the Excess should just drop it.

    Alternatively, the 0.01% of people that care about excess capacity can use another method.

    Steven Fackler at 2017-11-21 22:49:37

  194. @sfackler This is what I get for taking a two-week break from rust - I forget about default method impls :)

    Joshua Liebow-Feeser at 2017-11-21 23:04:40

  195. Alternatively, the 0.01% of people that care about excess capacity can use another method.

    Where are you getting this number?

    All of my Rust data-structures are flat in memory. The ability to do that is the only reason I use Rust; if I could just Box everything I would be using a different language. So I don't care about the Excess the 0.01% of the time, I care about it all the time.

    I understand that this is domain specific, and that in other domains people would never care about the Excess, but I doubt that only 0.01% of Rust users care about this (I mean, a lot of people use Vec and String, which are the poster-child data-structures for Excess).

    gnzlbg at 2017-11-21 23:34:25

  196. I am getting that number from the fact that there are approx 4 things in total that use nallocx, compared to the set of things that use malloc.

    Steven Fackler at 2017-11-21 23:49:53

  197. @gnzlbg

    Are you suggesting that if we did it "right" from the start, we'd have just fn alloc(layout) -> (ptr, excess) and no fn alloc(layout) -> ptr at all? That seems far from obvious to me. Even if excess is available, it seems natural to have the latter API for the use cases where excess doesn't matter (e.g., most tree structures), even if it's implemented as alloc_excess(layout).0.

    Hanna Kruppe at 2017-11-22 00:12:51

  198. @rkruppe

    That seems far from obvious to me. Even if excess is available, it seems natural to have the latter API for the use cases where excess doesn't matter (e.g., most tree structures), even if it's implemented as alloc_excess(layout).0.

    Currently, the excess-full API is implemented on top of the excess-less one. Implementing Alloc for an excess-less allocator requires the user to provide the alloc and dealloc methods.

    However, if I want to implement Alloc for an excess-full allocator, I need to provide more methods (at least alloc_excess, but this grows if we go into realloc_excess, alloc_zeroed_excess, grow_in_place_excess, ...).

    If we were to do it the other way around, that is, implement the excess-less API as a nicety on top of the excess-full one, then implementing alloc_excess and dealloc suffices for supporting both types of allocators.

    The users that don't care or can't return or query the excess can just return the input size or layout (which is a tiny inconvenience), but the users that can handle and want to handle the excess don't need to implement any more methods.


    @sfackler

    I am getting that number from the fact that there are approx 4 things in total that use nallocx, compared to the set of things that use malloc.

    Given these facts about _excess usage in the Rust ecosystem:

    • 0 things in total use _excess in the rust ecosystem
    • 0 things in total use _excess in the rust std library
    • not even Vec and String can use the _excess API properly in the rust std library
    • the _excess API is unstable, out-of-sync with the excess-less API, buggy until very recently (did not even return the excess at all), ...

    and given these facts about the usage of _excess in other languages:

    • jemalloc's API is not natively supported by C or C++ programs due to backwards compatibility
    • C and C++ programs that want to use jemalloc's excess API need to go way out of their way to use it by:
      • opting out of the system allocator and into jemalloc (or tcmalloc)
      • re-implement their language's std library (in the case of C++, implement an incompatible std library)
      • write their whole stack on top of this incompatible std library
    • some communities (firefox uses it, facebook reimplements the collections in the C++ standard library to be able to use it, ...) still go out of their way to use it.

    These two arguments look plausible to me:

    • The excess API in std is not usable, therefore the std library cannot use it, therefore nobody can, which is why it isn't used even once in the Rust ecosystem.
    • Even though C and C++ make it close to impossible to use this API, big projects with manpower go to great lengths to use it, therefore at least some potentially tiny community of people care a lot about it.

    Your argument:

    • Nobody uses the _excess API, therefore only 0.01% of the people care about it.

    does not.

    gnzlbg at 2017-11-22 10:40:44
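    [Editor's note: the inversion proposed above — make the excess-full API primary and derive the excess-less one from it — can be sketched as follows. The `ExcessAlloc` trait and `System` type are hypothetical simplifications (`*mut u8`, no error handling), not the actual std API:]

    ```rust
    use std::alloc::{alloc, dealloc, Layout};

    // Excess-full trait is primary: implementors provide alloc_excess + dealloc.
    unsafe trait ExcessAlloc {
        unsafe fn alloc_excess(&mut self, layout: Layout) -> (*mut u8, usize);
        unsafe fn dealloc(&mut self, ptr: *mut u8, layout: Layout);

        // Excess-less convenience derived from the excess-full method:
        // callers that don't care simply drop the second tuple element.
        unsafe fn alloc(&mut self, layout: Layout) -> *mut u8 {
            self.alloc_excess(layout).0
        }
    }

    struct System;

    unsafe impl ExcessAlloc for System {
        unsafe fn alloc_excess(&mut self, layout: Layout) -> (*mut u8, usize) {
            // An allocator that cannot report the real usable size just
            // echoes the requested size, as suggested for POSIX malloc above.
            (alloc(layout), layout.size())
        }
        unsafe fn dealloc(&mut self, ptr: *mut u8, layout: Layout) {
            dealloc(ptr, layout)
        }
    }

    fn main() {
        unsafe {
            let layout = Layout::from_size_align(24, 8).unwrap();
            let mut a = System;

            // Excess-aware caller:
            let (p, excess) = a.alloc_excess(layout);
            assert!(!p.is_null());
            assert!(excess >= layout.size());
            a.dealloc(p, layout);

            // Excess-oblivious caller uses the derived method:
            let q = a.alloc(layout);
            assert!(!q.is_null());
            a.dealloc(q, layout);
            println!("ok");
        }
    }
    ```

    With this shape there is only one required allocation method, so the two APIs cannot drift out of sync.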

  199. @alexcrichton The decision to switch from -> Result<*mut u8, AllocErr> to -> *mut void may come as a significant surprise to people who followed the original development of the allocator RFCs.

    I don't disagree with the points you make, but it nonetheless seemed like a fair number of people would have been willing to live with the "heavy-weightness" of Result over the increased likelihood of missing a null-check on the returned value.

    • I am ignoring the runtime efficiency issues imposed by the ABI because I, like @alexcrichton, assume that we could deal with those in some way via compiler tricks.

    Is there some way we could get increased visibility on that late change on its own?

    One way (off the top of my head): Change the signature now, in a PR on its own, on the master branch, while Allocator is still unstable. And then see who complains on the PR (and who celebrates!).

    • Is this too heavy-handed? Seems like it is by definitions less heavy-handed than coupling such a change with stabilization...

    Felix S Klock II at 2017-11-22 11:59:30

  200. On the subject of whether to return *mut void or to return Result<*mut void, AllocErr>: It's possible that we should be revisiting the idea of separate "high-level" and "low-level" allocator traits, as discussed in take II of the Allocator RFC.

    Felix S Klock II at 2017-11-22 12:05:33

  201. (Obviously if I had a serious objection to the *mut void return value, then I would file it as a concern via the fcpbot. But at this point I'm pretty much trusting the judgement of the libs team, perhaps in some part due to fatigue over this allocator saga.)

    Felix S Klock II at 2017-11-22 12:13:08

  202. @pnkfelix

    The decision to switch from -> Result<*mut u8, AllocErr> to -> *mut void may come as a significant surprise to people who followed the original development of the allocator RFC's.

    The latter implies that, as discussed, the only error we care to express is OOM. Thus, a slightly lighter-weight in-between that still has the benefit of protection against accidental failure to check for errors is -> Option<*mut void>.

    Joshua Liebow-Feeser at 2017-11-22 15:08:25
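    [Editor's note: a sketch of that lighter-weight middle ground, using `Option<NonNull<u8>>` in place of `Option<*mut void>`. The `try_alloc` function is a hypothetical illustration, not a proposed std signature:]

    ```rust
    use std::alloc::{alloc, dealloc, Layout};
    use std::ptr::NonNull;

    // NonNull::new performs the null check once, at the API boundary;
    // callers are then forced by the type to handle the None case.
    fn try_alloc(layout: Layout) -> Option<NonNull<u8>> {
        NonNull::new(unsafe { alloc(layout) })
    }

    fn main() {
        // Thanks to NonNull's niche, the Option costs no extra space.
        assert_eq!(
            std::mem::size_of::<Option<NonNull<u8>>>(),
            std::mem::size_of::<*mut u8>()
        );

        let layout = Layout::new::<u32>();
        match try_alloc(layout) {
            Some(p) => {
                unsafe {
                    p.as_ptr().cast::<u32>().write(7);
                    dealloc(p.as_ptr(), layout);
                }
                println!("ok");
            }
            None => println!("oom"),
        }
    }
    ```

    Unlike a bare pointer return, forgetting the failure check here is a compile error rather than a latent null dereference.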

  203. @gnzlbg

    The excess API in std is not usable, therefore the std library cannot use it, therefore nobody can, which is why it isn't used even once in the Rust ecosystem.

    Then go fix it.

    @pnkfelix

    On the subject of whether to return *mut void or to return Result<*mut void, AllocErr>: It's possible that we should be revisiting the idea of separate "high-level" and "low-level" allocator traits, as discussed in take II of the Allocator RFC.

    Those were basically our thoughts, except that the high level API would be in Alloc itself as alloc_one, alloc_array etc. We can even let those develop in the ecosystem first as extension traits to see what APIs people converge on.

    Steven Fackler at 2017-11-22 16:20:32

  204. @pnkfelix

    The reason I made Layout only implement Clone and not Copy is that I wanted to leave open the possibility of adding more structure to the Layout type. In particular, I still am interested in trying to have the Layout attempt to track any type structure used to construct it (e.g. 16-array of struct { x: u8, y: [char; 215] }), so that allocators would have the option of exposing instrumentation routines that report on what types their current contents are composed from.

    Has this been experimented with somewhere?

    gnzlbg at 2017-11-25 09:57:55

  205. @sfackler I have done most of it already, and all of it can be done with the duplicated API (no excess + _excess methods). I would be fine with having two APIs and not having a complete _excess API right now.

    The only thing that still worries me a bit is that to implement an allocator right now one needs to implement alloc + dealloc, but alloc_excess + dealloc should work as well. Would it be possible to give alloc a default implementation in terms of alloc_excess later, or is that impossible or a breaking change? In practice, most allocators are going to implement most methods anyways, so this is not a big deal, but more like a wish.


    jemallocator implements Alloc twice (for Jemalloc and &Jemalloc), where the Jemalloc implementation for some method is just a (&*self).method(...) that forwards the method call to the &Jemalloc implementation. This means that one must manually keep both implementations of Alloc for Jemalloc in sync. Whether getting different behaviors for the &/_ implementations can be tragic or not, I don't know.


    I've found it very hard to find out what people are actually doing with the Alloc trait in practice. The only projects that I've found that are using it are going to stay using nightly anyways (servo, redox), and are only using it to change the global allocator. It worries me a lot that I could not find any project that uses it as a collection type parameter (maybe I was just unlucky and there are some?). I was particularly looking for examples of implementing SmallVec and ArrayVec on top of a Vec-like type (since std::Vec doesn't have an Alloc type parameter yet), and also was wondering how cloning between these types (Vecs with a different Allocator) would work (the same probably applies to cloning Boxes with different Allocs). Are there examples somewhere of what these implementations would look like?

    gnzlbg at 2017-11-29 09:56:12

  206. The only projects that I've found that are using it are going to stay using nightly anyways (servo, redox)

    For what it’s worth, Servo is trying to move off of unstable features where possible: https://github.com/servo/servo/issues/5286

    This is also a chicken-and-egg problem. Many projects don’t use Alloc yet because it’s still unstable.

    Simon Sapin at 2017-11-29 10:04:21

  207. It's not really clear to me why we should have a full complement of _excess APIs in the first place. They originally existed to mirror jemalloc's experimental *allocm API, but those were removed in 4.0 several years ago in favor of not duplicating their entire API surface. It seems like we could follow their lead?

    Would it be possible to give alloc a default implementation in terms of alloc_excess later or is that a not possible or a breaking change?

    We can add a default implementation of alloc in terms of alloc_excess, but alloc_excess will need to have a default implementation in terms of alloc. Everything works fine if you implement one or both, but if you don't implement either, your code will compile but infinitely recurse. This has come up before (maybe for Rand?), and we could have some way of saying that you need to implement at least one of those functions but we don't care which.

    It worries me a lot that I could not find any project that uses it as a collection type parameter (maybe I was just unlucky and there are some?).

    I don't know of anyone that is doing that.

    Steven Fackler at 2017-11-29 17:27:08

  208. The only projects that I've found that are using it are going to stay using nightly anyways (servo, redox)

    One big thing preventing this from moving forward is that stdlib collections don't support parametric allocators yet. That pretty much precludes most other crates as well, since most external collections use internal ones under the hood (Box, Vec, etc).

    Joshua Liebow-Feeser at 2017-11-29 22:04:32

  209. The only projects that I've found that are using it are going to stay using nightly anyways (servo, redox)

    One big thing preventing this from moving forward is that stdlib collections don't support parametric allocators yet. That pretty much precludes most other crates as well, since most external collections use internal ones under the hood (Box, Vec, etc).

    This applies to me -- I've got a toy kernel, and if I could I'd be using Vec<T, A>, but instead I have to have an interior-mutable global allocator facade, which is gross.

    remexre at 2017-11-30 03:46:48

  210. @remexre how is parameterizing your data structures going to avoid global state with interior mutability?

    Steven Fackler at 2017-11-30 03:50:23

  211. There will still be interior-mutable global state I suppose, but it feels a lot safer to have a setup where the global allocator is unusable until memory is fully mapped than to have a global set_allocator function.


    EDIT: Just realized I didn't answer the question. Right now, I've got something like:

    struct BumpAllocator { ... }
    struct RealAllocator { ... }
    struct LinkedAllocator<A: 'static + AreaAllocator> {
        head: Mutex<Option<Cons<A>>>,
    }
    #[global_allocator]
    static KERNEL_ALLOCATOR: LinkedAllocator<&'static mut (AreaAllocator + Send + Sync)> =
        LinkedAllocator::new();
    

    where AreaAllocator is a trait that lets me (at runtime) verify that the allocators aren't accidentally "overlapping" (in terms of the address ranges they allocate into). BumpAllocator is only used very early on, for scratch space when mapping the rest of memory to create the RealAllocators.

    Ideally, I'd like to have a Mutex<Option<RealAllocator>> (or a wrapper that makes it "insert-only") be the only allocator, and have everything allocated early on be parameterized by the early-boot BumpAllocator. This'd also let me ensure that the BumpAllocator doesn't get used after early-boot, since the stuff I allocate couldn't outlive it.

    remexre at 2017-11-30 03:55:14

  212. @sfackler

    It's not really clear to me why we should have a full complement of _excess APIs in the first place. They originally existed to mirror jemalloc's experimental *allocm API, but those were removed in 4.0 several years ago in favor of not duplicating their entire API surface. It seems like we could follow their lead?

    Currently shrink_in_place calls xallocx which returns the real allocation size. Because shrink_in_place_excess does not exist, it throws this size away, and users must call nallocx to recompute it, whose cost really depends on how big the allocation is.

    So at least some jemalloc allocation functions that we are already using are returning us the usable size, but the current API does not allow us to use it.

    gnzlbg at 2017-11-30 08:40:34

  213. @remexre

    When I was working on my toy kernel, avoiding the global allocator to ensure no allocation happened until an allocator was set up was a goal of mine too. Glad to hear I'm not the only one!

    John Ericson at 2017-11-30 16:14:45

  214. I do not like the word Heap for the default global allocator. Why not Default?

    Another point of clarification: RFC 1974 puts all this stuff in std::alloc but it's currently in std::heap. Which location is being proposed for stabilization?

    jethrogb at 2017-11-30 18:56:47

  215. @jethrogb "Heap" is a pretty canonical term for "that thing malloc gives you pointers to" - what are your concerns with the term?

    Steven Fackler at 2017-11-30 19:20:33

  216. @sfackler

    "that thing malloc gives you pointers to"

    Except in my mind that is what System is.

    jethrogb at 2017-11-30 19:58:09

  217. Ah sure. Global is another name then maybe? Since you use #[global_allocator] to select it.

    Steven Fackler at 2017-11-30 20:02:56

  218. There can be multiple heap allocators (e.g. libc and prefixed jemalloc). How about renaming std::heap::Heap to std::heap::Default and #[global_allocator] to #[default_allocator]?

    The fact that it’s what you get if you don’t specify otherwise (presumably when for example Vec gains an extra type parameter / field for the allocator) is more important than the fact that it doesn’t have "per-instance" state (or instances really).

    Simon Sapin at 2017-11-30 22:31:18

  219. The final comment period is now complete.

    Rust RFC bot at 2017-12-01 22:58:28

  220. Regarding FCP, I think that the API subset that was proposed for stabilization is of very limited use. For example, it does not support the jemallocator crate.

    Simon Sapin at 2017-12-01 23:52:35

    In what way? jemallocator may have to gate some of the impls of unstable methods behind a feature flag, but that's it.

    Steven Fackler at 2017-12-02 00:30:10

  222. If jemallocator on stable Rust cannot implement for example Alloc::realloc by calling je_rallocx but needs to rely on the default alloc + copy + dealloc impl, then it’s not an acceptable replacement for the standard library’s alloc_jemalloc crate IMO.

    Simon Sapin at 2017-12-02 18:17:25

  223. Sure, you could get something to compile, but it’s not a particularly useful thing.

    Simon Sapin at 2017-12-02 18:18:04

  224. Why? C++ doesn't have any concept of realloc at all in its allocator API and that doesn't appear to have crippled the language. It's obviously not ideal, but I don't understand why it would be unacceptable.

    Steven Fackler at 2017-12-02 23:48:58

    C++ collections generally don’t use realloc because C++ move constructors can run arbitrary code, not because realloc is not useful.

    And the comparison is not with C++, it’s with the current Rust standard library with built-in jemalloc support. Switching to an out-of-std allocator with only this subset of the Alloc API would be a regression.

    And realloc is an example. jemallocator currently also implements alloc_zeroed, alloc_excess, usable_size, grow_in_place, etc.

    Simon Sapin at 2017-12-03 08:39:42

    alloc_zeroed is proposed to be stabilized. As far as I can tell (look upthread), there are literally zero uses of alloc_excess in existence. Could you show some code that will regress if that falls back to a default implementation?

    More generally, though, I don't see why this is an argument against stabilizing a portion of these APIs. If you don't want to use jemallocator, you can continue not using it.

    Steven Fackler at 2017-12-03 16:48:49

  227. Could Layout::array<T>() be made a const fn?

    Mikhail Zabaluev at 2017-12-04 12:29:18

  228. It can panic, so not at the moment.

    Steven Fackler at 2017-12-04 16:57:14

  229. It can panic, so not at the moment.

    I see... I'd settle for const fn Layout::array_elem<T>() that would be a non-panicking equivalent of Layout::<T>::repeat(1).0.

    Mikhail Zabaluev at 2017-12-04 17:57:45

  230. @mzabaluev I think what you're describing is equivalent to Layout::new<T>(). It can currently panic, but that's just because it's implemented using Layout::from_size_align and then .unwrap(), and I expect it could be done differently.

    Joshua Liebow-Feeser at 2017-12-04 18:48:44

  231. @joshlf I think this struct has a size of 5, while as elements of an array its instances are placed every 8 bytes due to alignment:

    struct Foo {
        bar: u32,
        baz: u8
    }
    

    I'm not sure that an array of Foo would include the padding of the last element for its size calculation, but that's my strong expectation.

    Mikhail Zabaluev at 2017-12-04 19:21:34

  232. In Rust, the size of an object is always a multiple of its alignment so that the address of the nth element of an array is always array_base_pointer + n * size_of<T>(). So the size of an object in an array is always the same as the size of that object on its own. See the Rustonomicon page on repr(Rust) for more details.

    Joshua Liebow-Feeser at 2017-12-04 19:29:29

  233. OK, it turns out that a struct is padded to its alignment, but AFAIK this is not a stable guarantee except in #[repr(C)]. Anyway, making Layout::new a const fn would be welcome as well.

    Mikhail Zabaluev at 2017-12-04 19:29:35

  234. This is the documented (and so guaranteed) behavior of a stable function:

    https://doc.rust-lang.org/std/mem/fn.size_of.html

    Returns the size of a type in bytes.

    More specifically, this is the offset in bytes between successive elements in an array with that item type including alignment padding. Thus, for any type T and length n, [T; n] has a size of n * size_of::<T>().

    Simon Sapin at 2017-12-04 22:04:05

  235. Thanks. I just realized that any const fn that multiplies the result of Layout::new would be inherently panicky in turn (unless it's done with saturating_mul or some such), so I'm back to square one. Continuing with a question about panics in the const fn tracking issue.

    Mikhail Zabaluev at 2017-12-05 05:13:14

  236. The panic!() macro is currently not supported in constant expressions, but panics from checked arithmetic are generated by the compiler and not affected by that limitation:

    error[E0080]: constant evaluation error
     --> a.rs:1:16
      |
    1 | const A: i32 = i32::max_value() * 2;
      |                ^^^^^^^^^^^^^^^^^^^^ attempt to multiply with overflow
    
    error: aborting due to previous error
    

    Simon Sapin at 2017-12-05 09:06:30

  237. This is related to Alloc::realloc but not to stabilization of the minimal interface (realloc is not part of it):

    Currently, because Vec::reserve/double call RawVec::reserve/double which call Alloc::realloc, the default impl of Alloc::realloc copies dead vector elements (in the [len(), capacity()) range). In the absurd case of a huge empty vector that wants to insert capacity() + 1 elements and thus reallocates, the cost of touching all that memory is not insignificant.

    In theory, if the default Alloc::realloc implementation would also take a "bytes_used" range, it could just copy the relevant part on reallocation. In practice, at least jemalloc overrides Alloc::realloc default impl with a call to rallocx. Whether doing an alloc/dealloc dance copying only the relevant memory is faster or slower than a rallocx call will probably depend on many things (does rallocx manage to expand the block in place? how much unnecessary memory will rallocx copy? etc.).

    gnzlbg at 2017-12-08 16:25:15

    https://github.com/QuiltOS/rust/tree/allocator-error I've started to demonstrate how I think an associated error type solves our collections and error-handling problems by doing the generalization itself. In particular, note how in the modules I change, I

    • Always reuse the Result<T, A::Err> implementation for the T implementation
    • Never unwrap or anything else partial
    • No oom(e) outside of AbortAdapter.

    This means that the changes I'm making are quite safe, and quite mindless too! Working with both the error-returning and error-aborting variants should not require extra effort to maintain mental invariants---the type checker does all the work.

    I recall---I think in @Gankro's RFC? or the pre-rfc thread---Gecko/Servo people saying it was nice to not have the fallibility of collections be part of their type. Well, I can add a #[repr(transparent)] to AbortAdapter so that collections can safely be transmuted between Foo<T, A> and Foo<T, AbortAdapter<A>> (within safe wrappers), allowing one to freely switch back and forth without duplicating every method. [For back-compat, the standard library collections will need to be duplicated in any event, but user methods need not be as Result<T, !> is these days quite easy to work with.]

    ~~Unfortunately the code won't fully type check because changing the type params of a lang item (box) confuses the compiler (surprise!). But hopefully what I have so far gives a flavor of what I am doing. The ICE-causing box commit is the last one---everything before it is good.~~ @eddyb fixed rustc in #47043!

    edit @joshlf I was informed of your https://github.com/rust-lang/rust/pull/45272, and incorporated that in here. Thanks!

    John Ericson at 2017-12-28 03:44:25

  239. Persistent Memory (eg http://pmem.io) is the next big thing, and Rust needs to be positioned to work well with it.

    I've been working recently on a Rust wrapper for a persistent memory allocator (specifically, libpmemcto). Whatever decisions are made regarding the stabilisation of this API, it needs to:-

    • Be able to support a performant wrapper around a persistent memory allocator like libpmemcto;
    • Be able to specify (parameterize) collection types by allocator (at the moment, one needs to duplicate Box, Rc, Arc, etc)
    • Be able to clone data across allocators
    • Be able to support having persistent memory stored structs with fields that are re-initialized on instantiation of a persistent memory pool, ie, some persistent memory structs needs to have fields that are only stored temporarily on the heap. My current use cases are a reference to the persistent memory pool used for allocation and transient data used for locks.

    As an aside, the pmem.io development (Intel's PMDK) makes heavy use of a modified jemalloc allocator under the covers, so using jemalloc as an example API consumer seems prudent.

    Raphael Cohn at 2018-01-04 18:36:09

  240. Would it be possible to reduce the scope of this to cover only GlobalAllocators first until we gain more experience with using Allocators in collections?

    IIUC this already would serve servo's needs and would allow us to experiment with parametrizing containers in parallel. In the future we can either move collections to use GlobalAllocator instead or just add a blanket impl of Alloc for GlobalAllocator so that these can be used for all collections.

    Thoughts?

    gnzlbg at 2018-01-16 12:10:01

  241. @gnzlbg For the #[global_allocator] attribute to be useful (beyond selecting heap::System) the Alloc trait needs to be stable too, so that it can be implemented by crates like https://crates.io/crates/jemallocator. There is no type or trait named GlobalAllocator at the moment, are you proposing some new API?

    Simon Sapin at 2018-01-16 12:41:03

    There is no type or trait named GlobalAllocator at the moment, are you proposing some new API?

    What I suggested is renaming the "minimal" API that @alexcrichton suggested to stabilize here from Alloc to GlobalAllocator to represent only global allocators, and leaving the door open for collections to be parametrized by a different allocator trait in the future (which does not mean that we can't parametrize them by the GlobalAllocator trait).

    IIUC servo currently only needs to be able to switch the global allocator (as opposed to also being able to parametrize some collections by an allocator). So maybe instead of trying to stabilize a solution that should be future proofed for both use-cases, we can address only the global allocator issue now, and figure out how to parametrize collections by allocators later.

    Don't know if that makes sense.

    gnzlbg at 2018-01-16 13:01:47

  243. IIUC servo currently only needs to be able to switch the global allocator (as opposed to also being able to parametrize some collections by an allocator).

    That is correct, but:

    • If a trait and its methods are stable so that it can be implemented, then it can also be called directly without going through std::heap::Heap. So it’s not only a trait for the global allocator, it’s a trait for allocators (even if we end up making a different one for collections generic over allocators), and GlobalAllocator is not a particularly good name.
    • The jemallocator crate currently implements alloc_excess, realloc, realloc_excess, usable_size, grow_in_place, and shrink_in_place, which are not part of the proposed minimal API. These can be more efficient than the default impls, so removing them would be a performance regression.

    Simon Sapin at 2018-01-16 13:26:42

  244. Both points make sense. I just thought that the only way to significantly accelerate the stabilization of this feature was to cut out a dependency on it also being a good trait for parametrizing collections over it.

    gnzlbg at 2018-01-16 13:39:15

  245. [It would be nice if Servo could be like (stable | official mozilla crate), and cargo could enforce this, to remove a little pressure here.]

    John Ericson at 2018-01-16 18:23:07

  246. @Ericson2314 servo is not the only project that wants to use these APIs.

    Steven Fackler at 2018-01-16 18:32:29

  247. @Ericson2314 I don’t understand what this means, could you rephrase?

    Simon Sapin at 2018-01-17 07:24:04

    For context: Servo currently uses a number of unstable features (including #[global_allocator]), but we’re trying to slowly move away from that (either by updating to a compiler that has stabilized some features, or by finding stable alternatives). This is tracked at https://github.com/servo/servo/issues/5286. So stabilizing #[global_allocator] would be nice, but it’s not blocking any Servo work.

    Firefox relies on the fact that Rust std defaults to the system allocator when compiling a cdylib, and that mozjemalloc, which ends up being linked into the same binary, defines symbols like malloc and free that "shadow" (I don’t know the proper linker terminology) those from libc. So allocations from Rust code in Firefox end up using mozjemalloc. (This is on Unix; I don’t know how it works on Windows.) This works out, but it feels fragile to me. Firefox uses stable Rust, and I’d like it to use #[global_allocator] to explicitly select mozjemalloc to make the whole setup more robust.

    Simon Sapin at 2018-01-17 08:09:24

    @SimonSapin the more that I play with allocators and collections, the more I tend to think that we don't want to parametrize the collections by Alloc yet, because depending on the allocator, a collection might want to offer a different API, the complexity of some operations changes, some collection details actually depend on the allocator, etc.

    So I would like to suggest a way in which we can make progress here.

    Step 1: Heap allocator

    We could restrict ourselves at first to try to let users select the allocator for the heap (or the system/platform/global/free-store allocator, or however you prefer to name it) in stable Rust.

    The only thing that we initially parametrize by it is Box, which only needs to allocate (new) and deallocate (drop) memory.

    This allocator trait could initially have the API that @alexcrichton proposed (or somewhat extended), and this allocator trait could, on nightly, still have a slightly extended API to support the std:: collections.

    Once we are there, users that want to migrate to stable will be able to do so, but might get a performance hit, because of the unstable API.

    Step 2: Heap allocator without performance hit

    At that point, we can re-evaluate the users that can't move to stable because of a performance hit, and decide how to extend this API and stabilize that.

    Steps 3 to N: supporting custom allocators in std collections.

    First, this is hard, so it might never happen, and I think it never happening isn't a bad thing.

    When I want to parametrize a collection with a custom allocator I have either a performance problem or a usability problem.

    If I have a usability problem I typically want a different collection API that exploits features of my custom allocator, like for example my SliceDeque crate does. Parametrizing a collection by a custom allocator won't help me here.

    If I have a performance problem, it would still be very hard for a custom allocator to help me. I am going to consider Vec in the next sections only, because it is the collection I've reimplemented most often.

    Reduce the number of system allocator calls (Small Vector Optimization)

    If I want to allocate some elements inside the Vec object to reduce the number of calls to the system allocator, today I just use SmallVec<[T; M]>. However, a SmallVec is not a Vec:

    • moving a Vec is O(1) in the number of elements, but moving a SmallVec<[T; M]> is O(N) for N < M and O(1) afterwards,

    • pointers to the Vec elements are invalidated on move if len() <= M but not otherwise; that is, if len() <= M, operations like into_iter need to move the elements into the iterator object itself instead of just taking pointers.

    Could we make Vec generic over an allocator to support this? Everything is possible, but I think that the most important costs are:

    • doing so makes the implementation of Vec more complex, which might impact users not using this feature
    • the documentation of Vec would become more complex, because the behavior of some operations would depend on the allocator.

    I think these costs are non-negligible.

    Make use of allocation patterns

    The growth-factor of a Vec is tailored to a particular allocator. In std we can tailor it to the common ones jemalloc/malloc/... but if you are using a custom allocator, chances are that the growth-factor we choose by default won't be the best for your use case. Should every allocator be able to specify a growth-factor for vec-like allocation patterns? I don't know but my gut feeling tells me: probably not.

    Exploit extra features of your system allocator

    For example, an over-committing allocator is available on most of the Tier 1 and Tier 2 targets. On Linux-like and macOS systems the heap allocator overcommits by default, while the Windows API exposes VirtualAlloc, which can be used to reserve memory (e.g. on Vec::reserve/with_capacity) and commit memory on push.

    Currently the Alloc trait doesn't expose a way to implement such an allocator on Windows, because it doesn't separate the concepts of committing and reserving memory (on Linux a non-over-committing allocator can be hacked in by just touching each page once). It also doesn't expose a way for an allocator to state whether it over-commits by default on alloc.

    That is, we would need to extend the Alloc API to support this for Vec, and IMO for little gain, because when you have such an allocator, Vec semantics change again:

    • Vec doesn't need to grow ever again, so operations like reserve make little sense
    • push is O(1) instead of amortized O(1).
    • iterators to live objects are never invalidated as long as the object is alive, which allows some optimizations

    Exploit more extra features of your system allocator

    Some system allocators like cudaMalloc/cudaMemcpy/... differentiate between pinned and non-pinned memory, and allow you to allocate memory in disjoint address spaces (so we would need an associated Pointer type in the Alloc trait), ...

    But using these on collections like Vec does again change the semantics of some operations in subtle ways, like whether indexing a vector suddenly invokes undefined behavior or not, depending on whether you do so from a GPU kernel or from the host.

    Wrapping up

    I think that trying to come up with an Alloc API that can be used to parametrize all collections (or only even Vec) is hard, probably too hard.

    Maybe after we get global/system/platform/heap/free-store allocators right, and Box, we can rethink the collections. Maybe we can reuse Alloc, maybe we need a VecAlloc, VecDequeAlloc, HashMapAlloc, ... or maybe we just say, "you know what, if you really need this, just copy-paste the standard collection into a crate and mold it to your allocator". Maybe the best solution is to just make this easier, by having the std collections in their own crate (or crates) in the nursery, using only stable features, maybe implemented as a set of building blocks.

    Anyways, I think that trying to tackle all of these problems at once and come up with an Alloc trait that is good for everything is too hard. We are at step 0. I think that the best way to get to steps 1 and 2 quickly is to leave collections out of the picture until we are there.

    gnzlbg at 2018-01-17 10:47:45

  250. Once we are there, users that want to migrate to stable will be able to do so, but might get a performance hit, because of the unstable API.

    Picking a custom allocator is usually about improving performance, so I don’t know who this initial stabilization would serve.

    Simon Sapin at 2018-01-17 14:12:40

  251. Picking a custom allocator is usually about improving performance, so I don’t know who this initial stabilization would serve.

    Everybody? At least right now. ~~Most~~ Some of the methods you complain are missing from the initial stabilization proposal (alloc_excess, for example) are AFAIK not used by anything in the standard library yet. Or did this change recently?

    gnzlbg at 2018-01-17 14:18:10

  252. Vec (and other users of RawVec) use realloc in push

    Simon Sapin at 2018-01-17 14:24:50

  253. @SimonSapin

    The jemallocator crate currently implements alloc_excess, realloc, realloc_excess, usable_size, grow_in_place, and shrink_in_place

    From these methods, AFAIK realloc, grow_in_place, and shrink_in_place are used, but grow_in_place is only a naive wrapper over shrink_in_place (for jemalloc at least), so if we implemented the default unstable impl of grow_in_place in terms of shrink_in_place in the Alloc trait, that would cut it down to two methods: realloc and shrink_in_place.

    Picking a custom allocator is usually about improving performance,

    While this is true, you might get more performance from using a more suited allocator without these methods, than a bad allocator that has them.

    IIUC the main use case for servo was to use Firefox jemalloc instead of having a second jemalloc around, was that right?

    gnzlbg at 2018-01-17 14:25:45

  254. Even if we add realloc and shrink_in_place to the Alloc trait in an initial stabilization, that would only delay the performance complaints.

    For example, the moment we add any unstable API to the Alloc trait that ends up being used by the std collections, you wouldn't be able to get the same performance on stable as on nightly. That is, if we add realloc_excess and shrink_in_place_excess to the Alloc trait and make Vec/String/... use them, then having stabilized realloc and shrink_in_place wouldn't have helped you a single bit.

    gnzlbg at 2018-01-17 14:34:20

  255. IIUC the main use case for servo was to use Firefox jemalloc instead of having a second jemalloc around, was that right?

    Although they share some code, Firefox and Servo are two separate projects/applications.

    Firefox use mozjemalloc, which is a fork of an old version of jemalloc with a bunch of features added. I think that some unsafe FFI code relies for correctness and soundness on mozjemalloc being used by Rust std.

    Servo uses jemalloc which happens to be Rust’s default for executables at the moment, but there are plans to change that default to the system’s allocator. Servo also has some unsafe memory usage reporting code that relies for soundness on jemalloc being used indeed. (Passing Vec::as_ptr() to je_malloc_usable_size.)

    Simon Sapin at 2018-01-17 14:40:39

  256. Servo uses jemalloc which happens to be Rust’s default for executables at the moment, but there are plans to change that default to the system’s allocator.

    It would be good to know whether the system allocators on the systems that servo targets provide optimized realloc and shrink_to_fit APIs like jemalloc does. realloc (and calloc) are very common, but shrink_to_fit (xallocx) is AFAIK specific to jemalloc. Maybe the best solution would be to stabilize realloc and alloc_zeroed (calloc) in the initial implementation, and leave shrink_to_fit for later. That should allow servo to work with system allocators on most platforms without performance issues.

    Servo also has some unsafe memory usage reporting code that relies for soundness on jemalloc being used indeed. (Passing Vec::as_ptr() to je_malloc_usable_size.)

    As you know, the jemallocator crate has APIs for this. I expect that crates similar to the jemallocator crate will pop up for other allocators offering similar APIs as the global allocator story begins getting stabilized. I haven't thought about whether these APIs belong in the Alloc trait at all.

    gnzlbg at 2018-01-17 14:50:26

  257. I don’t think malloc_usable_size needs to be in the Alloc trait. Using #[global_allocator] to be confident what allocator is used by Vec<T> and separately using a function from the jemallocator crate is fine.

    Simon Sapin at 2018-01-17 14:55:49

    @SimonSapin once the Alloc trait becomes stable, we'll probably have crates like jemallocator for Linux malloc and for Windows. These crates could have an extra feature implementing what they can of the unstable Alloc API (like, e.g., usable_size on top of malloc_usable_size) and some other things that are not part of the Alloc API, like memory reporting on top of mallinfo. Once there are usable crates for the systems that servo targets, it would be easier to know which parts of the Alloc trait to prioritize for stabilization, and we'll probably discover newer APIs that should at least be experimented with for some allocators.

    gnzlbg at 2018-01-17 15:07:34

    @gnzlbg I'm a bit skeptical of the things in https://github.com/rust-lang/rust/issues/32838#issuecomment-358267292. Leaving out all that system-specific stuff, it's not hard to generalize collections over alloc--I've done it. Trying to incorporate that seems like a separate challenge.

    @SimonSapin Does Firefox have a no-unstable-Rust policy? I think I was getting confused: Firefox and Servo both want this, but if so it's Firefox's use case that would be adding pressure to stabilize.

    @sfackler See that ^. I was trying to make a distinction between projects that need vs want this stable, but Servo is on the other side of that divide.

    John Ericson at 2018-01-17 21:11:50

  260. I have a project that wants this and requires it to be stable. There is nothing particularly magical about either Servo or Firefox as consumers of Rust.

    Steven Fackler at 2018-01-17 21:15:13

    @Ericson2314 Correct, Firefox uses stable: https://wiki.mozilla.org/Rust_Update_Policy_for_Firefox. As I explained though, there’s a working solution today, so this is not a real blocker for anything. It’d be nicer / more robust to use #[global_allocator], that’s all.

    Servo does use some unstable features, but as mentioned we’re trying to change that.

    Simon Sapin at 2018-01-17 21:26:05

  262. FWIW, parametric allocators are very useful to implement allocators. A lot of the less performance-sensitive bookkeeping becomes much easier if you can use various data structures internally and parametrize them by some simpler allocator (like bsalloc). Currently, the only way to do that in a std environment is to have a two-phase compilation in which the first phase is used to set your simpler allocator as the global allocator and the second phase is used to compile the larger, more complicated allocator. In no-std, there's no way to do it at all.

    Joshua Liebow-Feeser at 2018-01-17 21:31:15

  263. @Ericson2314

    Leaving out all those system-specific stuff, it's not hard to generalize collections for alloc--I've done it. Trying to incorporate that seems like a separate challenge.

    Do you have an implementation of ArrayVec or SmallVec on top of Vec + custom allocators that I could look at? That was the first point I mentioned, and it's not system-specific at all. Arguably those would be the two simplest allocators imaginable: one is just a raw array used as storage, and the other can be built on top of the first by adding a fallback to the heap once the array runs out of capacity. The main difference is that those allocators are not "global"; rather, each of the Vecs has its own allocator independent of all others, and these allocators are stateful.

    Also, I am not arguing to never do this. I am just stating that this is very hard; C++ has been trying for 30 years with only partial success: GPU allocators and GC allocators work thanks to the generic pointer types, but implementing ArrayVec and SmallVec on top of Vec does not result in a zero-cost abstraction in C++ land (P0843r1 discusses some of the issues for ArrayVec in detail).

    So I'd just prefer that we pursue this after we stabilize the pieces that do deliver something useful, as long as those don't preclude pursuing custom collection allocators in the future.


    I talked a bit with @SimonSapin on IRC and if we were to extend the initial stabilization proposal with realloc and alloc_zeroed, then Rust in Firefox (which only uses stable Rust) would be able to use mozjemalloc as a global allocator in stable Rust without the need for any extra hacks. As mentioned by @SimonSapin, Firefox currently has a workable solution for this today, so while this would be nice, it doesn't seem to be very high priority.

    Still, we could start there, and once we are there, move servo to stable #[global_allocator] without a performance loss.


    @joshlf

    FWIW, parametric allocators are very useful to implement allocators.

    Could you elaborate a bit more on what you mean? Is there a reason why you can't parametrize your custom allocators with the Alloc trait? Or your own custom allocator trait and just implement the Alloc trait on the final allocators (these two traits do not necessarily need to be equal)?

    gnzlbg at 2018-01-18 09:10:46

  264. I don't understand where the use case of "SmallVec = Vec + special allocator" comes from. It's not something I've seen mentioned much before (in Rust or in other contexts), precisely because it has many serious problems. When I think of "improving performance with a specialized allocator", that's not at all what I think of.

    Hanna Kruppe at 2018-01-18 10:49:10

  265. Looking over the Layout API, I was wondering about the differences in error handling between from_size_align and align_to, where the former returns None in case of an error, while the latter panics (!).

    Wouldn't it be more helpful and consistent to add a suitably defined and informative LayoutErr enum and return a Result<Layout, LayoutErr> in both cases (and perhaps use it for the other functions that currently return an Option as well)?

    Roland Steiner at 2018-01-18 11:09:54

  266. @rkruppe

    I don't understand where the use case of "SmallVec = Vec + special allocator" comes from. It's not something I've seen mentioned a lot before (neither in Rust nor in other contexts), precisely because it has many serious problems. When I think of "improving performance with a specialized allocator", that's not at all what I think of.

    There are two independent ways of using allocators in Rust and C++: as the system allocator, used by all allocations by default, and as a type argument for a collection parametrized by some allocator trait, as a way to create an object of that particular collection that uses a particular allocator (which may or may not be the system allocator).

    What follows focuses only on this second use case: using a collection and an allocator type to create an object of that collection that uses a particular allocator.

    In my experience with C++, parametrizing a collection with an allocator serves two use cases:

    • improve performance of a collection object by making the collection use a custom allocator targeted at a specific allocation pattern, and/or
    • add a new feature to a collection allowing it to do something that it couldn't do before.

    Adding new features to collections

    This is the use case of allocators that I see in C++ code-bases 99% of the time. The fact that adding a new feature to a collection improves performance is, in my opinion, coincidental. In particular, none of the following allocators improves performance by targeting an allocation pattern. They do so by adding features that, in some cases, as @Ericson2314 mentions, can be considered "system-specific". These are some examples:

    • stack allocators for doing small buffer optimizations (see Howard Hinnant's stack_alloc paper). They let you use std::vector or flat_{map,set,multimap,...}, and by passing in a custom allocator you add a small buffer optimization with (SmallVec) or without (ArrayVec) heap fallback. This allows, for example, putting a collection and its elements on the stack or in static memory (where it would otherwise have used the heap).

    • segmented memory architectures (like 16-bit-wide-pointer x86 targets and GPGPUs). For example, the C++17 parallel STL was, during C++14, the Parallelism Technical Specification. Its precursor library, by the same author, is NVIDIA's Thrust library, which includes allocators that allow container classes to use GPGPU memory (e.g. thrust::device_malloc_allocator) or pinned memory (e.g. thrust::pinned_allocator; pinned memory allows faster host-device transfers in some cases).

    • allocators to solve parallelism-related issues, like false sharing (e.g. Intel Thread Building Blocks cache_aligned_allocator) or over-alignment requirements of SIMD types (e.g. Eigen3's aligned_allocator).

    • interprocess shared memory: Boost.Interprocess has allocators that allocate the collection's memory using OS interprocess shared-memory facilities (e.g. System V shared memory). This makes it possible to use a std container directly to manage memory used to communicate between different processes.

    • garbage collection: Herb Sutter's deferred memory allocation library uses a user-defined pointer type to implement allocators that garbage-collect memory, so that, for example, when a vector grows, the old chunk of memory is kept alive until all pointers into it have been destroyed, avoiding iterator invalidation.

    • instrumented allocators: Bloomberg Standard Library's bslma_testallocator lets you log the memory allocation/deallocation (and C++-specific object construction/destruction) patterns of the objects you use it with. Don't know whether a Vec allocates after reserve? Plug in such an allocator, and it will tell you if that happens. Some of these allocators let you name them, so that you can use them on multiple objects and get logs saying which object is doing what.

    These are the types of allocators that I see most often in the wild in C++. As I mentioned before, the fact that they improve performance in some cases, is, in my opinion, coincidental. The important part is that none of them tries to target a particular allocation pattern.

    Targeting allocation patterns to improve performance.

    AFAIK there aren't any widely used C++ allocators that do this, and I'll explain why I think that is in a second. The libraries that target this use case (e.g. Boost.Pool, foonathan/memory) do not really provide a single allocator for a particular use case. Instead, they provide allocator building blocks that you can use to build custom allocators targeted at the particular allocation pattern in a particular part of an application.

    The general advice that I recall from my C++ days was to just "don't use them" (they are the very last resort) because:

    • matching the system allocator's performance is very hard, beating it is very very hard,
    • the chances of someone else's application memory allocation pattern matching yours is slim, so you really need to know your allocation pattern and know what allocator building blocks you need to match it
    • they are not portable because different vendors have different C++ standard library implementations which do use different allocation patterns; vendors typically target their implementation at their system allocators. That is, a solution tailored to one vendor might perform horribly (worse than the system allocator) in another.
    • there are many alternatives one can exhaust before trying to use these: using a different collection, reserving memory, ... Most of the alternatives are lower effort and can deliver larger wins.

    This does not mean that libraries for this use case aren't useful. They are, which is why libraries like foonathan/memory are blooming. But at least in my experience they are far less used in the wild than allocators that "add extra features", because to deliver a win you must beat the system allocator, which requires more time than most users are willing to invest (Stack Overflow is full of questions of the type "I used Boost.Pool and my performance got worse, what can I do?", to which the answer is: don't use Boost.Pool).

    Wrapping up

    IMO it is great that the C++ allocator model, though far from perfect, supports both use cases, and I think that if Rust's std collections are to be parametrized by allocators, they should support both use cases as well, because at least in C++ allocators of both kinds have turned out to be useful.

    I just think this problem is slightly orthogonal to being able to customize the global/system/platform/default/heap/free-store allocator of a particular application, and that trying to solve both problems at the same time might delay the solution for one of them unnecessarily.

    What some users want to do with a collection parametrized by an allocator might be way different from what other users want to do. If @rkruppe starts from "matching allocation patterns" and I start from "preventing false sharing" or "using a small buffer optimization with heap fallback", it's just going to be hard, first, to understand each other's needs, and second, to arrive at a solution that works for both.

    gnzlbg at 2018-01-18 13:40:18

  267. @gnzlbg Thanks for the comprehensive write-up. Most of it doesn't address my original question and I disagree with some of it, but it's good to have it spelled out so we don't talk past each other.

    My question was specifically about this application:

    stack allocators for doing small buffer optimizations (see Howard Hinnant's stack_alloc paper). They let you use std::vector or flat_{map,set,multimap,...} and by passing it a custom allocator you add in a small buffer optimization with (SmallVec) or without (ArrayVec) heap fall back. This allows, for example, putting a collection with its elements on the stack or static memory (where it would have otherwise used the heap).

    Reading about stack_alloc, I realize now how that can work. It's not what people usually mean by SmallVec (where the buffer is stored inline in the collection), which is why I missed that option, but it sidesteps the problem of having to update pointers when the collection moves (and also makes those moves cheaper). Also note that short_alloc allows multiple collections to share one arena, which makes it even more unlike typical SmallVec types. It's more like a linear/bump-pointer allocator with graceful fallback to heap allocation when running out of allotted space.

    I disagree that this sort of allocator and cache_aligned_allocator are fundamentally adding new features. They are used differently, and depending on your definition of "allocation pattern" they may not optimize for a specific allocation pattern. However, they certainly optimize for specific use cases and they don't have any significant behavioral differences from general-purpose heap allocators.

    I do however agree that use cases like Sutter's deferred memory allocation, which substantially change what a "pointer" even means, are a separate application that may need a separate design if we want to support it at all.

    Hanna Kruppe at 2018-01-18 14:06:51

  268. Reading about stack_alloc, I realize now how that can work. It's not what people usually mean by SmallVec (where the buffer is stored inline in the collection), which is why I missed that option, but it side steps the problem of having to update pointers when the collection moves (and also makes those moves cheaper).

    I mentioned stack_alloc because it is the only such allocator with "a paper", but it was released in 2009 and precedes C++11 (C++03 did not support stateful allocators in collections).

    The way this works in C++11 (which supports stateful allocators) is, in a nutshell:

    • std::vector stores an Allocator object inside it just like Rust RawVec does.
    • the Allocator interface has an obscure property called Allocator::propagate_on_container_move_assignment (POCMA from now on) that user-defined allocators can customize; this property is true by default. If this property is false, on move assignment, the allocator cannot be propagated, so a collection is required by the standard to move each of its elements to the new storage manually.

    So when a vector with the system allocator is moved, first the storage for the new vector on the stack is allocated, then the allocator is moved (which is zero-sized), and then the 3 pointers are moved, which are still valid. Such moves are O(1).

    OTOH, when a vector with a POCMA == false allocator is moved, first the storage for the new vector on the stack is allocated and initialized with an empty vector; then the old collection is drained into the new one, so that the old one ends up empty and the new one full. This moves each element of the collection individually, using its move assignment operator. This step is O(N) and fixes up internal pointers of the elements. Finally, the original, now empty, collection is dropped. Note that this looks like a clone, but isn't, because the elements themselves aren't cloned but moved (in the C++ sense).

    Does that make sense?

    The main problem with this approach in C++ are that:

    • the vector growth-policy is implementation defined
    • the allocator API does not have _excess methods
    • the combination of the two issues above means that if you know your vector can at most hold 9 elements, you can't have a stack allocator that can hold 9 elements, because your vector might try to grow when it has 8 with a growth factor of 1.5 so you need to pessimize and allocate space for 18 elements.
    • the complexity of a vector operation changes depending on allocator properties (POCMA is just one of many properties in the C++ Allocator API; writing C++ allocators is non-trivial). This makes specifying the API of vector a pain, because sometimes copying or moving elements between different allocators of the same type has extra costs, which change the complexity of the operations. It also makes reading the spec a huge pain. Many online sources of documentation, like cppreference, put the general case up front and the obscure details of what changes when one allocator property is true or false in fine print, to avoid bothering 99% of users with them.

    There are many people working on improving C++'s allocator API to fix these issues, for example, by adding _excess methods and guaranteeing that standard conforming collections use them.

    I disagree that this sort of allocator and cache_aligned_allocator are fundamentally adding new features.

    Maybe what I meant is that they allow you to use std collections in situations or for types for which you couldn't use them before. For example, in C++ you can't put the elements of a vector in the static memory segment of your binary without something like a stack allocator (yet you can write your own collection that does it). OTOH, the C++ standard does not support over-aligned types like SIMD types, and if you try to heap allocate one with new you will invoke undefined behavior (you need to use posix_memalign or similar). Using the object typically manifests the undefined behavior via a segfault (*). Things like aligned_allocator allow you to heap allocate these types, and even put them in std collections, without invoking undefined behavior, by using a different allocator. Sure the new allocator will have different allocation patterns (these allocators basically overalign all memory btw...), but what people use them for is to be able to do something that they couldn't do before.

    Obviously, Rust is not C++. And C++ has problems that Rust doesn't have (and vice-versa). An allocator that adds a new feature in C++ might be unnecessary in Rust, which, for example, doesn't have any problems with SIMD types.

    (*) Users of Eigen3 suffer this deeply, because to avoid undefined behavior when using C++ and STL containers, you need to guard the containers against SIMD types, or types containing SIMD types (Eigen3 docs) and also you need to guard your self from ever using new on your types by overloading operator new for them (more Eigen3 docs).

    gnzlbg at 2018-01-18 15:21:14

  269. @gnzlbg thanks, I was also confused by the smallvec example. That would require non-moveable types and some sort of alloca in Rust---two RFCs in review and then more follow-up work---so I have no qualms punting on that for now. The existing smallvec strategy of always using all the stack space you'll need seems fine for now.

    I also agree with @rkruppe that in your revised list, the new capabilities of the allocator need not be known to the collection using the allocator. Sometimes the full Collection<Allocator> has new properties (existing entirely in pinned memory, lets say) but that's just a natural consequence of using the allocator.

    The one exception here I see is allocators that only allocate a single size/type (the NVidia ones do this, as do slab allocators). We could have a separate ObjAlloc<T> trait that's blanket implemented for normal allocators: impl<A: Alloc, T> ObjAlloc<T> for A. Then, collections would use ObjAlloc bounds if they just needed to allocate a few items. But, I feel somewhat silly even bringing this up as it should be doable backwards compatibly later.

    John Ericson at 2018-01-18 16:59:19

  270. Does that make sense?

    Sure but it's not really relevant to Rust since we have no move constructors. So a (movable) allocator that directly contains the memory it hands out pointers to just isn't possible, period.

    For example, in C++ you can't put the elements of a vector in the static memory segment of your binary without something like a stack allocator (yet you can write your own collection that does it).

    This is not a behavioral change. There are many valid reasons to control where collections get their memory, but all of them are related to "externalities" like performance, linker scripts, control over the whole program's memory layout, etc.

    Things like aligned_allocator allow you to heap allocate these types, and even put them in std collections, without invoking undefined behavior, by using a different allocator.

    This is why I specifically mentioned TBB's cache_aligned_allocator and not Eigen's aligned_allocator. cache_aligned_allocator does not seem to guarantee any specific alignment in its documentation (it just says that it's "typically" 128 bytes), and even if it did it usually wouldn't be used for this purpose (because its alignment is probably too large for common SIMD types and too small for things like page-aligned DMA). Its purpose, as you state, is to avoid false sharing.

    Hanna Kruppe at 2018-01-18 17:08:21

  271. @gnzlbg

    FWIW, parametric allocators are very useful to implement allocators.

    Could you elaborate a bit more on what you mean? Is there a reason why you can't parametrize your custom allocators with the Alloc trait? Or your own custom allocator trait and just implement the Alloc trait on the final allocators (these two traits do not necessarily need to be equal)?

    I think I wasn't clear; let me try to explain better. Let's say I'm implementing an allocator that I expect to use either:

    • As the global allocator
    • In a no-std environment

    And let's say that I'd like to use Vec under the hood to implement this allocator. I can't just use Vec directly as it exists today because

    • If I'm the global allocator, then using it will just introduce a recursive dependency on myself
    • If I'm in a no-std environment, there is no Vec as it exists today

    Thus, what I need is to be able to use a Vec that is parametrized on another allocator that I use internally for simple internal bookkeeping. This is the goal of bsalloc (and the source of the name - it is used to bootstrap other allocators).

    In elfmalloc, we are still able to be a global allocator by:

    • When compiling ourselves, statically compile jemalloc as the global allocator
    • Produce a shared object file that can be dynamically loaded by other programs

    Note that in this case, it's important that we don't compile ourselves using the system allocator as the global allocator because then, once loaded, we'd re-introduce the recursive dependency because, at that point, we are the system allocator.

    But it doesn't work when:

    • Somebody wants to use us as the global allocator in Rust in the "official" way (as opposed to by creating a shared object file first)
    • We're in a no-std environment

    OTOH, the C++ standard does not support over-aligned types like SIMD types, and if you try to heap allocate one with new you will invoke undefined behavior (you need to use posix_memalign or similar).

    Since our current Alloc trait takes alignment as a parameter, I assume this class of problem (the "I can't work without a different alignment" problem) goes away for us?

    Joshua Liebow-Feeser at 2018-01-18 17:16:54

  272. @gnzlbg - a comprehensive write-up (thank you) but none of the use cases cover persistent memory*.

    This use case must be considered. In particular, it strongly influences what the right thing to do is:-

    • More than one allocator will be in use, and in particular, an allocator used for persistent memory would never be the system allocator (indeed, there could be multiple persistent-memory allocators)
    • The cost of 're-implementing' the standard collections is high, and leads to incompatible code with third-party libraries.
    • The allocator's lifetime is not necessarily 'static.
    • Objects stored in persistent memory need additional state that must be populated from the heap, i.e. they need state reinitialized. This is particularly so for mutexes and the like. What was once disposable is no longer disposed of.

    Rust has a superb opportunity to seize the initiative here, and make it a first-class platform for what will replace HDs, SSDs and even PCI-attached storage.

    *Not a surprise, really, because until very recently it's been a bit special. It's now widely supported in Linux, FreeBSD and Windows.

    Raphael Cohn at 2018-01-19 11:32:13

  273. @raphaelcohn

    This really isn't the place to work out persistent memory. Yours is not the only school of thought regarding the interface to persistent memory; for instance, it may turn out that the prevailing approach is simply to treat it like faster disk, for data integrity reasons.

    If you have a use case for using persistent memory this way it would probably be better to make that case elsewhere somehow first. Prototype it, come up with some more concrete changes to the allocator interface, and ideally make the case that those changes are worth the impact they would have on the average case.

    Russell Johnston at 2018-01-19 17:54:39

  274. @rpjohnst

    I disagree. This is exactly the sort of place it belongs. I want to avoid a decision being made that produces a design resulting from too narrow a focus and too narrow a search for evidence.

    The current Intel PMDK - which is where a lot of effort for low-level user space support is focused - approaches it far more as allocated, regular memory with pointers - memory that is similar to that via mmap, say. Indeed, if one wants to work with persistent memory on Linux, then, I believe it's pretty much your only port of call at the moment. In essence, one of the most advanced toolkits for using it - the prevailing one if you will - treats it as allocated memory.

    As for prototyping it - well, that's exactly what I said I have done:-

    I've been working recently on a Rust wrapper for a persistent memory allocator (specifically, libpmemcto).

    (You can use an early-days version of my crate at https://crates.io/crates/nvml. There's a lot more experimentation in source control in the cto_pool module).

    My prototype is built with what is needed to replace a data storage engine in a real-world, large-scale system in mind. A similar mindset is behind many of my open source projects. I've found over many years that the best libraries, like the best standards, are the ones which derive from real usage.

    Nothing like trying to fit a real-world allocator to the current interface. Frankly, the experience of using the Alloc interface, then copying the whole of Vec, then tweaking it, was painful. A lot of places assume that allocators aren't passed in, e.g. Vec::new().

    In doing it, I made some observations in my original comment about what would be required of an allocator, and what would be required of an user of such an allocator. I think those are very valid on a discussion thread about an allocator interface.

    Raphael Cohn at 2018-01-20 18:05:47

  275. The good news is your first 3 bullet points from https://github.com/rust-lang/rust/issues/32838#issuecomment-358940992 are shared by other use-cases.

    John Ericson at 2018-01-21 16:36:21

  276. I just wanted to add that I did not add non-volatile memory to the list because the list covered use cases of containers parametrized by allocators that are "widely" used in the C++ world, at least in my experience (the allocators I mentioned are mostly from very popular libraries used by many). While I know of the efforts of the Intel SDK (some of their libraries target C++), I don't personally know any projects using them (do they have an allocator that can be used with std::vector? I don't know). This doesn't mean that they aren't used or important. I'd be interested in knowing about these, but the main point of my post was that parametrizing containers by allocators is very complex, and we should try to make progress with system allocators without closing any doors for containers (but we should tackle that later).


    gnzlbg at 2018-01-21 17:34:02

  277. I tried to read most of what has been written already so this may be here already and in that case I'm sorry if I missed it but here goes:

    Something that is fairly common for games (in C/C++) is to use "per-frame scratch allocation". What this means is that there is a linear/bump allocator used for allocations that are alive for a certain period of time (a game frame) and then "destroyed".

    Destroyed in this case meaning that you reset the allocator back to its starting position. There is no "destruction" of objects at all, as these objects have to be of POD type (thus no destructors are executed)

    I wonder if something like this will fit with the current allocator design in Rust?

    (edit: It should be there is NO destruction of objects)

    Daniel Collin at 2018-01-31 18:44:23

  278. @emoon

    Something that is fairly common for games (in C/C++) is to use "per-frame scratch allocation". What this means is that there is a linear/bump allocator used for allocations that are alive for a certain period of time (a game frame) and then "destroyed".

    Destroyed in this case meaning that you reset the allocator back to it's starting position. There is "destruction" of objects at all as these objects has to be of POD type (thus no destructors are being executed)

    Should be doable. Off the top of my head, you'd need one object for the arena itself and another object that is a per-frame handle on the arena. Then, you could implement Alloc for that handle, and assuming you were using high-level safe wrappers for allocation (e.g., imagine that Box becomes parametric on Alloc), the lifetimes would ensure that all of the allocated objects were dropped before the per-frame handle was dropped. Note that dealloc would still be called for each object, but if dealloc was a no-op, then the entire drop-and-deallocate logic might be completely or mostly optimized away.

    Joshua Liebow-Feeser at 2018-01-31 19:10:11

  279. You could also use a custom smart pointer type that doesn't implement Drop, which would make a lot of things easier elsewhere.

    Russell Johnston at 2018-01-31 20:51:03

  280. Thanks! I made a typo in my original post. It's to say that there is no destruction of objects.

    Daniel Collin at 2018-02-01 10:07:18

  281. For people who aren't expert in allocators, and can't follow this thread, what's the current consensus: do we plan to support custom allocators for the stdlib collection types?

    Alexander Regueiro at 2018-02-09 18:29:17

  282. @alexreg I'm not sure what the final plan is, but it's been confirmed there are no technical difficulties in doing so. OTOH we don't have a good way to then expose that in std, because default type variables are suspect, but I have no problem with just making it an alloc-only thing for now so we can make progress on the lib side unimpeded.

    John Ericson at 2018-02-09 22:59:31

  283. @Ericson2314 Okay, good to hear. Are default type variables implemented yet? Or at RFC stage perhaps? As you say, if they're just restricted to things related to alloc / std::heap, it should all be okay.

    Alexander Regueiro at 2018-02-10 00:42:01

  284. @alexreg See https://github.com/rust-lang/rfcs/pull/2321

    Taylor Cramer at 2018-02-10 01:13:16

  285. I really think AllocErr should be Error. It would be more consistent with other modules (e.g. io).

    brunoczim at 2018-02-20 18:35:19

  286. impl Error for AllocErr probably makes sense and doesn’t hurt, but I’ve personally found the Error trait to be useless.

    Simon Sapin at 2018-02-20 20:43:48

  287. I was looking at at the Layout::from_size_align function today, and the "align must not exceed 2^31 (i.e. 1 << 31)," limitation did not make sense to me. And git blame pointed to #30170.

    I must say that was quite a deceptive commit message, talking about align fitting in a u32, which is only incidental, when the actual thing being "fixed" (more accurately, worked around) is a system allocator misbehaving.

    Which leads me to this note: the "OSX/alloc_system is buggy on huge alignments" item here should not be checked. While the direct issue has been dealt with, I don't think the fix is right for the long term: the fact that a system allocator misbehaves shouldn't prevent implementing an allocator that behaves. And the arbitrary limitation on Layout::from_size_align does exactly that.

    Mike Hommey at 2018-02-25 21:53:47

  288. @glandium Is it useful to request alignment to a multiple of 4 gigabytes or more?

    Simon Sapin at 2018-02-25 22:00:06

  289. I can imagine cases where one may want to have an allocation of 4GiB aligned at 4GiB, which is not possible currently, but hardly anything beyond that. But I don't think arbitrary limitations should be added just because we can't think of such reasons now.

    Mike Hommey at 2018-02-25 22:10:32

  290. I can imagine cases where one may want to have an allocation of 4GiB aligned at 4GiB

    What are those cases?

    Steven Fackler at 2018-02-25 22:57:27

  291. I can imagine cases where one may want to have an allocation of 4GiB aligned at 4GiB

    What are those cases?

    Concretely, I just added support for arbitrarily large alignments in mmap-alloc in order to support allocating large, aligned slabs of memory for use in elfmalloc. The idea is to have the slab of memory be aligned to its size so that, given a pointer to an object allocated from that slab, you merely need to mask off the low bits to find the containing slab. We don't currently use slabs that are 4GB in size (for objects that large, we go directly to mmap), but there's no reason that we couldn't, and I could totally imagine an application with large RAM requirements that wanted to do that (that is, if it allocated multi-GB objects frequently enough that it didn't want to accept the overhead of mmap).
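The mask-off-the-low-bits trick can be sketched as follows (the slab size and addresses are hypothetical; the only requirement is that slabs are power-of-two sized and aligned to their own size):

```rust
// Hypothetical 64 KiB slab, aligned to its size.
const SLAB_SIZE: usize = 1 << 16;

// Given any object address inside a slab, masking the low bits
// recovers the containing slab's base address.
fn containing_slab(obj: usize) -> usize {
    obj & !(SLAB_SIZE - 1)
}

fn main() {
    let slab_base = 0x7f00_0000_usize; // assume the slab starts here
    let obj = slab_base + 0x1234;      // an object somewhere inside it
    assert_eq!(containing_slab(obj), slab_base);
}
```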

    Joshua Liebow-Feeser at 2018-02-25 23:11:53

  292. Here's a possible use case for > 4GiB alignment: alignment to a large page boundary. There already are platforms that support > 4 GiB pages. This IBM document says "the POWER5+ processor supports four virtual memory page sizes: 4 KB, 64 KB, 16 MB, and 16 GB." Even x86-64 isn't far off: "huge pages" typically are 2 MiB, but it also supports 1 GiB.

    Scott Lamb at 2018-02-26 05:18:31

  293. All the non-typed functions in the Alloc trait are dealing with *mut u8. Which means they could take or return null pointers, and all hell would break loose. Should they use NonNull instead?

    Mike Hommey at 2018-03-04 11:56:28

  294. There are many pointers that they could return from which all hell would break loose.

    Steven Fackler at 2018-03-04 17:53:56

  295. A more compelling reason to use NonNull is that it would allow the Results currently returned from Alloc methods (or Options, if we switch to that in the future) to be smaller.
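For reference, the size effect of the `NonNull` niche can be observed directly on stable Rust (a quick sketch):

```rust
use std::mem::size_of;
use std::ptr::NonNull;

fn main() {
    // NonNull can never be null, so Option reuses the null value as the
    // None representation: no extra discriminant word is needed.
    assert_eq!(size_of::<Option<NonNull<u8>>>(), size_of::<*mut u8>());
    // A plain raw pointer has no such niche, so Option<*mut u8> is bigger.
    assert!(size_of::<Option<*mut u8>>() > size_of::<*mut u8>());
}
```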

    Joshua Liebow-Feeser at 2018-03-04 21:12:42

  296. A more compelling reason to use NonNull is that it would allow the Results currently returned from Alloc methods (or Options, if we switch to that in the future) to be smaller.

    I don't think it would because AllocErr has two variants.

    There are many pointers that they could return from which all hell would break loose.

    But a null pointer is clearly more wrong than any other pointer.

    I like to think that the Rust type system helps with footguns, and is used to encode invariants. The documentation for alloc clearly says "If this method returns an Ok(addr), then the addr returned will be a non-null address", but its return type doesn't encode that. As things are, Ok(malloc(layout.size())) would be a valid implementation, when it clearly isn't.

    Note, there are also notes about Layout size needing to be non-zero, so I'd also argue it should encode that as a NonZero<usize>.

    It's not because all those functions are inherently unsafe that we shouldn't have some footgun prevention.

    Mike Hommey at 2018-03-04 22:07:11

  297. Out of all the possible errors in using (edit: and implementing) allocators, passing a null pointer is one of the easiest to track down (you always get a clean segfault on dereference, at least if you have an MMU and didn't do very weird things with it), and usually one of the most trivial ones to fix as well. It's true that unsafe interfaces can try to prevent footguns, but this footgun seems disproportionally small (compared to the other possible errors, and to the verbosity of encoding this invariant in the type system).

    Besides, it seems likely that allocator implementations would just use the unchecked constructor of NonNull "for performance": since a correct allocator would never return null anyway, it would want to skip the NonNull::new(...).unwrap(). In that case you won't actually get any tangible footgun prevention, just more boilerplate. (The Result size benefits, if real, may still be a compelling reason for it.)

    Hanna Kruppe at 2018-03-04 22:24:01

  298. allocator implementations would just use the unchecked constructor of NonNull

    The point is less to help allocator implementations than to help their users. If MyVec contains a NonNull<T> and Heap.alloc() already returns a NonNull, that's one less checked or unsafe-unchecked call that I need to make.

    Simon Sapin at 2018-03-04 22:55:26

  299. Note that pointers are not only return types, they are also input types to e.g. dealloc and realloc. Are those functions supposed to harden against their input being possibly null or not? The documentation would tend to say no, but the type system would tend to say yes.

    Quite similarly with layout.size(). Are allocation functions supposed to handle the requested size being 0 somehow, or not?

    (The Result size benefits, if real, may still be a compelling reason for it.)

    I doubt there are size benefits, but with something like #48741, there would be codegen benefits.

    Mike Hommey at 2018-03-04 23:30:52

  300. If we continue that principle of being more flexible for users of the API, pointers should be NonNull in return types but not in arguments. (This doesn’t mean that those arguments should be null-checked at runtime.)

    Simon Sapin at 2018-03-05 06:59:37

  301. I think a Postel's law approach is the wrong one to take here. Is there any case in which passing a null pointer to an Alloc method is valid? If not, then that flexibility is basically just giving the footgun a slightly more sensitive trigger.

    Joshua Liebow-Feeser at 2018-03-05 10:05:18

  302. The point is less to help allocator implementations than to help their users. If MyVec contains a NonNull<T> and Heap.alloc() already returns a NonNull, that's one less checked or unsafe-unchecked call that I need to make.

    Ah this makes sense. Doesn't fix the footgun, but centralizes the responsibility for it.

    Note that pointers are not only return types, they are also input types to e.g. dealloc and realloc. Are those functions supposed to harden against their input being possibly null or not? The documentation would tend to say no, but the type system would tend to say yes.

    Is there any case in which passing a null pointer to an Alloc method is valid? If not, then that flexibility is basically just giving the footgun a slightly more sensitive trigger.

    The user absolutely has to read the documentation and keep the invariants in mind. Many invariants can't be enforced via type system at all -- if they could, the function wouldn't be unsafe to begin with. So this is solely a question of whether putting NonNull in any given interface will actually help users by

    • reminding them to read the docs and think about the invariants
    • offering convenience (@SimonSapin's point wrt alloc's return value)
    • giving some material advantage (e.g., layout optimizations)

    I don't see any strong advantage in making e.g., the argument of dealloc into NonNull. I see roughly two classes of uses of this API:

    1. Relatively trivial use, where you call alloc, store the returned pointer somewhere, and after a while pass the stored pointer to dealloc.
    2. Complicated scenarios involving FFI, lots of pointer arithmetic, etc. where there's significant logic involved in ensuring you pass the right thing to dealloc at the end.

    Taking NonNull here basically only helps the first kind of use case, because those will store the NonNull in some nice place and just pass it to dealloc unaltered. Theoretically it could prevent some typos (passing foo when you meant bar) if you're juggling multiple pointers and only one of them is NonNull, but this doesn't seem too common or important. The disadvantage of dealloc taking a raw pointer (assuming alloc returns NonNull, which @SimonSapin has convinced me should happen) would be that it requires an as_ptr in the dealloc call, which is potentially annoying but doesn't impact safety either way.

    The second kind of use case is not helped because it likely can't keep using NonNull throughout the whole process, so it would have to manually re-create a NonNull from the raw pointer it got by whatever means. As I argued earlier, this would likely become an unchecked/unsafe assertion rather than an actual run time check, so no footguns are prevented.

    This is not to say I am in favor of dealloc taking a raw pointer. I just don't see the claimed advantages wrt footguns. Consistency of the types probably just wins by default.

    Hanna Kruppe at 2018-03-05 11:53:39

  303. I'm sorry but I read this as "Many invariants can't be enforced via type system at all...therefore let's not even try". Don't let the perfect be the enemy of the good!

    John Ericson at 2018-03-05 20:59:10

  304. I think it's more about the tradeoffs between the guarantees provided by NonNull and the ergonomics lost from having to transition back and forth between NonNull and raw pointers. I don't have a particularly strong opinion either way-- neither side seems unreasonable.

    Taylor Cramer at 2018-03-05 21:06:16

  305. @cramertj Yeah, but I don't really buy the premise of that sort of argument. People say Alloc is for obscure, hidden, and largely unsafe use-cases. Well, in obscure, hard-to-read code, I would like to have as much safety as possible---precisely because it is so rarely touched, it's likely the original author won't be around. Conversely, if the code is being read years later, screw ergonomics. If anything, it is counter-productive. The code should strive to be very explicit so an unfamiliar reader can better figure out what on earth is going on. Less noise < clearer invariants.

    John Ericson at 2018-03-05 21:14:36

  306. The second kind of use case is not helped because it likely can't keep using NonNull throughout the whole process, so it would have to manually re-create a NonNull from the raw pointer it got by whatever means.

    This is simply a coordination failure, not a technical inevitability. Sure, right now many unsafe APIs might use raw pointers. So something has to lead the way switching to a superior interface using NonNull or other wrappers. Then other code can more easily follow suit. I see 0 reason to constantly fall back on hard-to-read, uninformative raw-pointers in greenfield, all-Rust, unsafe code.

    John Ericson at 2018-03-05 21:17:13

  307. Hi!

    I just want to say that, as the author/maintainer of a Rust custom allocator, I am in favor of NonNull. Pretty much for all the reasons that have already been laid out in this thread.

    Also, I'd like to point out that @glandium is the maintainer of firefox's fork of jemalloc, and has lots of experience hacking on allocators as well.

    Nick Fitzgerald at 2018-03-05 21:24:35

  308. I could write much much more responding to @Ericson2314 and others, but it's quickly becoming a very detached and philosophical debate, so I'm cutting it short here. I was arguing against what I believe is an over-statement of the safety benefits of NonNull in this sort of API (there are other benefits, of course). That is not to say there are no safety benefits, but as @cramertj said, there are trade offs and I think the "pro" side is overstated. Regardless, I have already said I lean towards using NonNull in various places for other reasons -- for the reason @SimonSapin gave in alloc, in dealloc for consistency. So let's do that and not go off on any more tangents.

    Hanna Kruppe at 2018-03-05 21:30:22

  309. If there's a few NonNull use-cases that everyone is on board with, that's a great start.

    John Ericson at 2018-03-05 22:47:19

  310. We'll probably want to update Unique and friends to use NonNull instead of NonZero to keep friction at least within liballoc low. But this does look like something we are already doing anyways at one abstraction level over the allocator, and I can't think of any reasons for not doing this at the allocator level as well. Looks like a reasonable change to me.

    gnzlbg at 2018-03-06 11:48:09

  311. (There’s already two implementations of the From trait that convert safely between Unique<T> and NonNull<T>.)

    Simon Sapin at 2018-03-06 12:50:31

  312. Considering I need something very much like the allocator API in stable rust, I extracted the code from the rust repo and put it in a separate crate:

    • https://crates.io/crates/allocator_api
    • https://github.com/glandium/allocator_api

    This /could/ be used to iterate on experimental changes to the API, but as of now, it's a plain copy of what is in the rust repo.

    Mike Hommey at 2018-03-28 07:46:27

  313. [Off topic] Yeah it would be nice if std could use out of tree stable code crates like that, so that we can experiment with unstable interfaces in stable code. This is one of the reasons I like having std a facade.

    John Ericson at 2018-03-28 18:06:00

  314. std could depend on a copy of a crate from crates.io, but if your program also depends on the same crate it wouldn’t "look like" the same crate/types/traits to rustc anyway so I don’t see how it would help. Anyway, regardless of the facade, making unstable features unavailable on the stable channel is a very deliberate choice, not an accident.

    Simon Sapin at 2018-03-28 18:24:56

  315. It seems we have some agreement wrt using NonNull. What's the way forward for this to actually happen? just a PR doing it? a RFC?

    Unrelatedly, I've been looking at some assembly generated from Boxing things, and the error paths are rather large. Should something be done about it?

    Examples:

    pub fn bar() -> Box<[u8]> {
        vec![0; 42].into_boxed_slice()
    }
    
    pub fn qux() -> Box<[u8]> {
        Box::new([0; 42])
    }
    

    compiles to:

    example::bar:
      sub rsp, 56
      lea rdx, [rsp + 8]
      mov edi, 42
      mov esi, 1
      call __rust_alloc_zeroed@PLT
      test rax, rax
      je .LBB1_1
      mov edx, 42
      add rsp, 56
      ret
    .LBB1_1:
      mov rax, qword ptr [rsp + 8]
      movups xmm0, xmmword ptr [rsp + 16]
      movaps xmmword ptr [rsp + 32], xmm0
      mov qword ptr [rsp + 8], rax
      movaps xmm0, xmmword ptr [rsp + 32]
      movups xmmword ptr [rsp + 16], xmm0
      lea rdi, [rsp + 8]
      call __rust_oom@PLT
      ud2
    
    example::qux:
      sub rsp, 104
      xorps xmm0, xmm0
      movups xmmword ptr [rsp + 58], xmm0
      movaps xmmword ptr [rsp + 48], xmm0
      movaps xmmword ptr [rsp + 32], xmm0
      lea rdx, [rsp + 8]
      mov edi, 42
      mov esi, 1
      call __rust_alloc@PLT
      test rax, rax
      je .LBB2_1
      movups xmm0, xmmword ptr [rsp + 58]
      movups xmmword ptr [rax + 26], xmm0
      movaps xmm0, xmmword ptr [rsp + 32]
      movaps xmm1, xmmword ptr [rsp + 48]
      movups xmmword ptr [rax + 16], xmm1
      movups xmmword ptr [rax], xmm0
      mov edx, 42
      add rsp, 104
      ret
    .LBB2_1:
      movups xmm0, xmmword ptr [rsp + 16]
      movaps xmmword ptr [rsp + 80], xmm0
      movaps xmm0, xmmword ptr [rsp + 80]
      movups xmmword ptr [rsp + 16], xmm0
      lea rdi, [rsp + 8]
      call __rust_oom@PLT
      ud2
    

    That's a rather large amount of code to add to any place creating boxes. Compare with 1.19, which didn't have the allocator api:

    example::bar:
      push rax
      mov edi, 42
      mov esi, 1
      call __rust_allocate_zeroed@PLT
      test rax, rax
      je .LBB1_2
      mov edx, 42
      pop rcx
      ret
    .LBB1_2:
      call alloc::oom::oom@PLT
    
    example::qux:
      sub rsp, 56
      xorps xmm0, xmm0
      movups xmmword ptr [rsp + 26], xmm0
      movaps xmmword ptr [rsp + 16], xmm0
      movaps xmmword ptr [rsp], xmm0
      mov edi, 42
      mov esi, 1
      call __rust_allocate@PLT
      test rax, rax
      je .LBB2_2
      movups xmm0, xmmword ptr [rsp + 26]
      movups xmmword ptr [rax + 26], xmm0
      movaps xmm0, xmmword ptr [rsp]
      movaps xmm1, xmmword ptr [rsp + 16]
      movups xmmword ptr [rax + 16], xmm1
      movups xmmword ptr [rax], xmm0
      mov edx, 42
      add rsp, 56
      ret
    .LBB2_2:
      call alloc::oom::oom@PLT
    

    Mike Hommey at 2018-03-29 02:09:36

  316. If this is actually significant, then it's indeed annoying. However maybe LLVM optimizes this out for larger programs?

    Pierre Krieger at 2018-03-29 13:19:51

  317. There are 1439 calls to __rust_oom in the latest Firefox nightly. Firefox doesn't use rust's allocator, though, so we get direct calls to malloc/calloc, followed by a null check that then jumps to the oom preparation code, which is usually two movq and a lea, filling the AllocErr and getting its address to pass it to __rust_oom. That's the best case scenario, essentially, but that's still 20 bytes of machine code for the two movq and the lea.

    If I look at ripgrep, there are 85, and they are all in identical _ZN61_$LT$alloc..heap..Heap$u20$as$u20$alloc..allocator..Alloc$GT$3oom17h53c76bda50c6b65aE.llvm.nnnnnnnnnnnnnnn functions. All of them are 16 bytes long. There are 685 calls to those wrapper functions, most of which are preceded by code similar to what I pasted in https://github.com/rust-lang/rust/issues/32838#issuecomment-377097485.

    Mike Hommey at 2018-03-29 13:57:15

  318. @nox was looking today at enabling the mergefunc llvm pass, I wonder if that makes any difference here.

    Emilio Cobos Ɓlvarez at 2018-03-29 15:17:20

  319. mergefunc apparently doesn't get rid of the multiple identical _ZN61_$LT$alloc..heap..Heap$u20$as$u20$alloc..allocator..Alloc$GT$3oom17h53c76bda50c6b65aE.llvm.nnnnnnnnnnnnnnn functions (tried with -C passes=mergefunc in RUSTFLAGS).

    But what makes a big difference is LTO, which is actually what makes Firefox call malloc directly, leaving the creation of the AllocErr to right before calling __rust_oom. That also makes the creation of the Layout unnecessary before calling the allocator, leaving it to when filling the AllocErr.

    This makes me think the allocation functions, except __rust_oom should probably be marked inline.

    BTW, having looked at the generated code for Firefox, I'm thinking it would ideally be desirable to use moz_xmalloc instead of malloc. This is not possible without a combination of the Allocator traits and being able to replace the global heap allocator, but brings the possible need for a custom error type for the Allocator trait: moz_xmalloc is infallible and never returns in case of failure. IOW, it handles OOM itself, and the rust code wouldn't need to call __rust_oom in that case. Which would make it desirable for the allocator functions to optionally return ! instead of AllocErr.

    Mike Hommey at 2018-03-29 21:15:30

  320. We’ve discussed making AllocErr a zero-size struct, which might also help here. With the pointer also made NonNull, the entire return value could be pointer-sized.
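A sketch of that layout (AllocErr here is a stand-in zero-size struct; the single-word size is an observed rustc layout optimization, not a documented guarantee):

```rust
use std::mem::size_of;
use std::ptr::NonNull;

// Hypothetical zero-size error type, as discussed in the comment above.
struct AllocErr;

fn main() {
    // With a NonNull payload and a zero-size error, the null value can
    // serve as the Err representation, so the whole Result fits in one
    // pointer-sized word on current rustc.
    assert_eq!(size_of::<Result<NonNull<u8>, AllocErr>>(), size_of::<usize>());
}
```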

    Simon Sapin at 2018-03-29 23:40:43

  321. https://github.com/rust-lang/rust/pull/49669 makes a number of changes to these APIs, with the goal of stabilizing a subset covering global allocators. Tracking issue for that subset: https://github.com/rust-lang/rust/issues/49668. In particular, a new GlobalAlloc trait is introduced.

    Simon Sapin at 2018-04-04 21:45:50

  322. Will this PR allow us to do things like Vec::new_with_alloc(alloc) where alloc: Alloc soon?

    Alexander Regueiro at 2018-04-04 22:10:35

  323. @alexreg no

    Steven Fackler at 2018-04-04 22:13:17

  324. @sfackler Hmm, why not? What do we need before we can do that? I don't really get the point of this PR otherwise, unless it's for simply changing the global allocator.

    Alexander Regueiro at 2018-04-04 22:24:44

  325. @alexreg

    I don't really get the point of this PR otherwise, unless it's for simply changing the global allocator.

    I think it's simply for changing the global allocator.

    Taylor Cramer at 2018-04-04 22:37:15

  326. @alexreg If you mean on stable, there are a number of unresolved design question that we’re not ready to stabilize. On Nightly, this is supported by RawVec and probably fine to add as #[unstable] for Vec for anyone who feels like working on that.

    And yes, as mentioned in the PR, its point is to allow changing the global allocator, or allocating (e.g. in a custom collection type) without abusing Vec::with_capacity.

    Simon Sapin at 2018-04-04 22:49:35

  327. FWIW, the allocator_api crate mentioned in https://github.com/rust-lang/rust/issues/32838#issuecomment-376793369 has RawVec<T, A> and Box<T, A> on the master branch (not released yet). I'm thinking of it as an incubator for what collections generic over the allocation type could look like (plus the fact that I do need a Box<T, A> type for stable rust). I haven't started porting vec.rs to add Vec<T, A> just yet, but PRs are welcome. vec.rs is large.

    Mike Hommey at 2018-04-04 23:00:00

  328. I'll note that the codegen "issues" mentioned in https://github.com/rust-lang/rust/issues/32838#issuecomment-377097485 should be gone with the changes in #49669.

    Now, with some more thought given to using the Alloc trait to help implement an allocator in layers, there are two things that I think would be useful (at least to me):

    • as mentioned earlier, optionally being able to specify a different AllocErr type. This can be useful to make it !, or, now that AllocErr is empty, to optionally have it convey more information than "failed".
    • optionally being able to specify a different Layout type. Imagine you have two layers of allocators: one for page allocations, and one for larger regions. The latter can rely on the former, but if they both take the same Layout type, then both layers need to do their own validation: at the lowest level, that size and alignment are multiples of the page size, and at the higher level, that size and alignment match the requirements for the larger regions. But those checks are redundant. With specialized Layout types, the validation could be delegated to the Layout creation instead of the allocator itself, and conversions between the Layout types would allow skipping the redundant checks.
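The second bullet could look something like this sketch (PageLayout and PAGE_SIZE are hypothetical names): the invariants are checked once, at construction, so a page-level allocator accepting a PageLayout never has to re-validate them.

```rust
use std::alloc::Layout;

const PAGE_SIZE: usize = 4096; // hypothetical page size

// A specialized Layout: constructing one proves the page-allocator
// invariants, so the allocator itself can skip the checks.
struct PageLayout(Layout);

impl PageLayout {
    fn from_layout(layout: Layout) -> Option<PageLayout> {
        if layout.size() % PAGE_SIZE == 0 && layout.align() % PAGE_SIZE == 0 {
            Some(PageLayout(layout))
        } else {
            None
        }
    }
}

fn main() {
    let ok = Layout::from_size_align(2 * PAGE_SIZE, PAGE_SIZE).unwrap();
    assert!(PageLayout::from_layout(ok).is_some());

    let bad = Layout::from_size_align(100, 8).unwrap();
    assert!(PageLayout::from_layout(bad).is_none());
}
```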

    Mike Hommey at 2018-04-04 23:03:22

  329. @cramertj @SimonSapin @glandium Okay, thanks for clarifying. I may just submit a PR for some of the other collections-prime types. Is it best to do this against your allocator-api repo/crate, @glandium, or rust master?

    Alexander Regueiro at 2018-04-04 23:05:09

  330. @alexreg considering the amount of breaking changes to the Alloc trait in #49669, it's probably better to wait for it to merge first.

    Mike Hommey at 2018-04-04 23:08:03

  331. @glandium Fair enough. That doesn't seem too far away from landing. I just noticed the https://github.com/pnkfelix/collections-prime repo too... what's that in relation to yours?

    Alexander Regueiro at 2018-04-04 23:47:18

  332. I would add one more open question:

    • Is Alloc::oom allowed to panic? Currently the docs say that this method must abort the process. This has implications for code that uses allocators since they must then be designed to handle unwinding properly without leaking memory.

    I think that we should allow panicking since a failure in a local allocator does not necessarily mean that the global allocator will fail as well. In the worst case, the global allocator's oom will be called which will abort the process (doing otherwise would break existing code).

    Amanieu d'Antras at 2018-04-04 23:57:27

  333. @alexreg It's not. It just seems to be a plain copy of what's in std/alloc/collections. Well, a two-year old copy of it. My crate is much more limited in scope (the published version only has the Alloc trait as of a few weeks ago, the master branch only has RawVec and Box on top of that), and one of my goals is to keep it building with stable rust.

    Mike Hommey at 2018-04-04 23:58:16

  334. @glandium Okay, in that case it probably makes sense for me to wait until that PR lands, then create a PR against rust master and tag you, so you know when it gets merged into master (and can then merge it into your crate), fair?

    Alexander Regueiro at 2018-04-05 00:55:12

  335. @alexreg makes sense. You /could/ start working on it now, but that would likely induce some churn on your end if/when bikeshedding changes things in that PR.

    Mike Hommey at 2018-04-05 00:59:20

  336. @glandium I've got other things to keep me busy with Rust for now, but I'll be onto it when that PR gets approved. It will be great to get allocator-generic heap allocation / collections on both nightly and stable soon. :-)

    Alexander Regueiro at 2018-04-05 01:30:51

  337. Is Alloc::oom allowed to panic? Currently the docs say that this method must abort the process. This has implications for code that uses allocators since they must then be designed to handle unwinding properly without leaking memory.

    @Amanieu This RFC was merged: https://github.com/rust-lang/rfcs/pull/2116 The docs and implementation might just not have been updated yet.

    gnzlbg at 2018-04-05 06:39:22

  338. There is one change to the API that I'm considering to submit a PR for:

    Split the Alloc trait in two parts: "implementation" and "helpers". The former would be functions like alloc, dealloc, realloc, etc. and the latter, alloc_one, dealloc_one, alloc_array, etc. While there are some hypothetical benefits from being able to have custom implementation for the latter, it's far from the most common need, and when you need to implement generic wrappers (which I've found to be incredibly common, to the point I've actually started to write a custom derive for that), you still need to implement all of them because the wrappee might be customizing them.

    OTOH, if an Alloc trait implementer does try to do fancy things in e.g. alloc_one, they're not guaranteed that dealloc_one will be called for that allocation. There are multiple reasons for this:

    • The helpers are not used consistently. Just one example, raw_vec uses a mix of alloc_array, alloc/alloc_zeroed, but only uses dealloc.
    • Even with consistent use of e.g. alloc_array/dealloc_array, one can still safely convert a Vec into a Box, which would then use dealloc.
    • Then there are some parts of the API that just don't exist (no zeroed version of alloc_one/alloc_array)

    So even though there are actual use cases for specialization of e.g. alloc_one (and as a matter of fact, I do have such a need for mozjemalloc), one is better off using a specialized allocator instead.

    Actually, it's worse than that: in the rust repo, there's exactly one use of alloc_array, and no use of alloc_one, dealloc_one, realloc_array, or dealloc_array. Not even box syntax uses alloc_one; it uses exchange_malloc, which takes a size and align. So those functions are more meant as a convenience for clients than for implementers.

    With something like impl<A: Alloc> AllocHelpers for A (or AllocExt, whatever name is chosen), we'd still have the convenience of those functions for clients, while not allowing implementers to shoot themselves in the foot if they thought they'd do fancy things by overriding them (and making it easier on people implementing proxy allocators).

    Mike Hommey at 2018-05-03 21:53:45

  339. There is one change to the API that I'm considering to submit a PR for

    Did so in #50436

    Mike Hommey at 2018-05-03 23:41:55

  340. @glandium

    (and as a matter of fact, I do have such a need for mozjemalloc),

    Could you elaborate on this use case?

    gnzlbg at 2018-05-04 09:44:08

  341. mozjemalloc has a base allocator that purposefully leaks. Except for one kind of object, where it keeps a free list. I can do that by layering allocators rather than doing tricks with alloc_one.

    Mike Hommey at 2018-05-04 09:48:56

  342. Is it required to deallocate with the exact alignment that you allocated with?

    Just to reinforce that the answer to this question is YES, I have this lovely quote from Microsoft themselves:

    aligned_alloc() will probably never be implemented, as C11 specified it in a way that’s incompatible with our implementation (namely, that free() must be able to handle highly aligned allocations)

    Using the system allocator on Windows will always require knowing the alignment when deallocating in order to correctly deallocate highly aligned allocations, so can we please just mark that question as resolved?

    Peter Atashian at 2018-05-07 20:27:05

  343. Using the system allocator on Windows will always require knowing the alignment when deallocating in order to correctly deallocate highly aligned allocations, so can we please just mark that question as resolved?

    It’s a shame, but it is the way it is. Let’s give up on overaligned vectors then. :confused:

    Ruud van Asseldonk at 2018-05-07 21:32:42

  344. Let’s give up on overaligned vectors then

    How come? You just need Vec<T, OverAlignedAlloc<U16>> that both allocates and deallocates with overalignment.

    gnzlbg at 2018-05-07 21:35:19

  345. How come? You just need Vec<T, OverAlignedAlloc<U16>> that both allocates and deallocates with overalignment.

    I should have been more specific. I meant moving overaligned vectors into an API outside of your control, i.e. one that takes a Vec<T> and not Vec<T, OverAlignedAlloc<U16>>. (For example CString::new().)

    Ruud van Asseldonk at 2018-05-07 22:01:19

  346. You should rather use

    #[repr(align(16))]
    struct OverAligned16<T>(T);
    

    and then Vec<OverAligned16<T>>.

    Mike Hommey at 2018-05-07 22:20:55

  347. You should rather use

    That depends. Suppose you want to use AVX intrinsics (256 bit wide, 32-byte alignment requirement) on a vector of f32s:

    • Vec<T, OverAlignedAlloc<U32>> solves the problem, one can use AVX intrinsics directly on the vector elements (in particular, aligned memory loads), and the vector still derefs into a &[f32] slice making it ergonomic to use.
    • Vec<OverAligned32<f32>> does not really solve the problem. Each f32 takes 32 bytes of space due to the alignment requirement. The padding introduced prevents the direct use of AVX operations, since the f32s are no longer in contiguous memory. And I personally find the deref to &[OverAligned32<f32>] a bit tedious to deal with.

    For a single element in a Box, Box<T, OverAligned<U32>> vs Box<OverAligned32<T>>, the two approaches are more or less equivalent, and the second might indeed be preferable. In any case it's nice to have both options.

    gnzlbg at 2018-05-07 22:32:33
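    The padding effect described in the last comment is easy to verify; a minimal sketch (OverAligned32 is the hypothetical wrapper type from the discussion):

```rust
use std::mem::{align_of, size_of};

// Hypothetical wrapper from the discussion: forcing 32-byte alignment on
// each element also inflates each element's size to 32 bytes.
#[repr(align(32))]
struct OverAligned32(f32);

fn main() {
    assert_eq!(size_of::<f32>(), 4);
    // The wrapper's size is rounded up to its alignment, so a
    // Vec<OverAligned32> is no longer a dense array of f32s.
    assert_eq!(size_of::<OverAligned32>(), 32);
    assert_eq!(align_of::<OverAligned32>(), 32);
}
```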

  348. Posted this wrt changes to the Alloc trait: https://internals.rust-lang.org/t/pre-rfc-changing-the-alloc-trait/7487

    Mike Hommey at 2018-05-10 10:08:20

  349. The tracking post at the top of this issue is horribly out of date (was last edited in 2016). We need an updated list of active concerns to continue the discussion productively.

    Amanieu d'Antras at 2018-05-29 11:12:18

  350. The discussion would also significantly benefit from an up-to-date design document, containing the current unresolved questions, and the rationale for the design decisions.

    There are multiple threads of diffs from "what's currently implemented on nightly" to "what was proposed in the original Alloc RFC" spawning thousands of comments on different channels (rfc repo, rust-lang tracking issue, global alloc RFC, internal posts, many huge PRs, etc.), and what's being stabilized in the GlobalAlloc RFC does not look much like what was proposed in the original RFC.

    This is something that we need anyways to finish updating the docs and the reference, and would be helpful in the current discussions as well.

    gnzlbg at 2018-05-29 11:34:26

  351. I think that before we even think about stabilizing the Alloc trait, we should first try implementing allocator support in all of the standard library collections. This should give us some experience with how this trait will be used in practice.

    Amanieu d'Antras at 2018-05-29 20:15:03

  352. I think that before we even think about stabilizing the Alloc trait, we should first try implementing allocator support in all of the standard library collections. This should give us some experience with how this trait will be used in practice.

    Yes, absolutely. Especially Box, since we don't yet know how to avoid having Box<T, A> take up two words.

    Joshua Liebow-Feeser at 2018-05-29 20:21:23

  353. Yes, absolutely. Especially Box, since we don't yet know how to avoid having Box<T, A> take up two words.

    I don't think we should worry about the size of Box<T, A> for the initial implementation, but this is something that can be added later in a backward-compatible way by adding a DeAlloc trait which only supports deallocation.

    Example:

    trait DeAlloc {
        fn dealloc(&mut self, ptr: NonNull<Opaque>, layout: Layout);
    }
    
    trait Alloc {
        // In addition to the existing trait items
        type DeAlloc: DeAlloc = Self;
        fn into_dealloc(self) -> Self::DeAlloc {
            self
        }
    }
    
    impl<T: Alloc> DeAlloc for T {
        fn dealloc(&mut self, ptr: NonNull<Opaque>, layout: Layout) {
            Alloc::dealloc(self, ptr, layout);
        }
    }
    

    Amanieu d'Antras at 2018-05-29 20:37:49

  354. I think that before we even think about stabilizing the Alloc trait, we should first try implementing allocator support in all of the standard library collections. This should give us some experience with how this trait will be used in practice.

    I think @Ericson2314 has been working on this, per https://github.com/rust-lang/rust/issues/42774. Would be nice to get an update from him.

    Alexander Regueiro at 2018-05-29 20:39:12

  355. I don't think we should worry about the size of Box<T, A> for the initial implementation, but this is something that can be added later in a backward-compatible way by adding a DeAlloc trait which only supports deallocation.

    That's one approach, but it's not at all clear to me that it's the best one. It has distinct disadvantages: for example, a) it only works when a pointer -> allocator lookup is possible (this isn't true of, e.g., most arena allocators), and b) it adds significant overhead to dealloc (namely, to do the reverse lookup). It may end up being the case that the best solution to this problem is a more general-purpose effect or context system like this proposal or this proposal. Or perhaps something different altogether. So I don't think we should assume that this will be easy to solve in a manner which is backwards-compatible with the current incarnation of the Alloc trait.

    Joshua Liebow-Feeser at 2018-05-29 20:49:45

  356. @joshlf Considering the fact that Box<T, A> only has access to itself when it is dropped, this is the best thing that we can do with safe code only. Such a pattern might be useful for arena-like allocators which have a no-op dealloc and just free memory when the allocator is dropped.

    For more complicated systems where the allocator is owned by a container (e.g. LinkedList) and manages multiple allocations, I expect that Box will not be used internally. Instead, the LinkedList internals will use raw pointers which are allocated and freed with the Alloc instance that is contained in the LinkedList object. This will avoid doubling the size of every pointer.

    Amanieu d'Antras at 2018-05-29 22:07:53

  357. Considering the fact that Box<T, A> only has access to itself when it is dropped, this is the best thing that we can do with safe code only. Such a pattern might be useful for arena-like allocators which have a no-op dealloc and just free memory when the allocator is dropped.

    Right, but Box doesn't know that dealloc is a no-op.

    For more complicated systems where the allocator is owned by a container (e.g. LinkedList) and manages multiple allocations, I expect that Box will not be used internally. Instead, the LinkedList internals will use raw pointers which are allocated and freed with the Alloc instance that is contained in the LinkedList object. This will avoid doubling the size of every pointer.

    I think it would really be a shame to require people to use unsafe code in order to write any collections at all. If the goal is to make all collections (presumably including those outside of the standard library) optionally parametric on an allocator, and Box isn't allocator-parametric, then a collections author must either not use Box at all or use unsafe code (and keep in mind that remembering to always free things is one of the most common types of memory unsafety in C and C++, so it's difficult-to-get-right unsafe code at that). That seems like an unfortunate bargain.

    Joshua Liebow-Feeser at 2018-05-29 22:19:11

  358. Right, but Box doesn't know that dealloc is a no-op.

    Why not adapt what C++'s unique_ptr does? That is: store a pointer to the allocator if it's "stateful", and don't store it if the allocator is "stateless" (e.g. a global wrapper around malloc or mmap). This would require splitting the current Alloc trait into two traits: StatefulAlloc and StatelessAlloc. I realize that this is rather crude and inelegant (and probably someone has already proposed it in previous discussions). Despite its inelegance, this solution is simple and backward compatible (without performance penalties).

    I think it would really be a shame to require people to use unsafe code in order to write any collections at all. If the goal is to make all collections (presumably including those outside of the standard library) optionally parametric on an allocator, and Box isn't allocator-parametric, then a collections author must either not use Box at all or use unsafe code (and keep in mind that remembering to always free things is one of the most common types of memory unsafety in C and C++, so it's difficult-to-get-right unsafe code at that). That seems like an unfortunate bargain.

    I'm afraid that implementing an effect or context system that would allow one to write node-based containers like lists, trees, etc. in a safe manner might take too much time (if it's possible in principle). I haven't seen any papers or academic languages that tackle this problem (please correct me if such work exists).

    So resorting to unsafe in implementation of node-based containers might be a necessary evil, at least in short-term perspective.

    Evgeniy Moiseenko at 2018-06-12 19:43:24

  359. @eucpp Note that unique_ptr doesn't store an allocator-- it stores a Deleter:

    Deleter must be FunctionObject or lvalue reference to a FunctionObject or lvalue reference to function, callable with an argument of type unique_ptr<T, Deleter>::pointer

    I see this as roughly equivalent to us providing split Alloc and Dealloc traits.

    Taylor Cramer at 2018-06-12 19:56:44

  360. @cramertj Yes, you are right. Still, two traits are required - stateful and stateless Dealloc.

    Evgeniy Moiseenko at 2018-06-12 20:07:49

  361. Wouldn't a ZST Dealloc be sufficient?

    remexre at 2018-06-12 20:09:50

  362. Wouldn't a ZST Dealloc be sufficient?

    @remexre I suppose it would :)

    I didn't know that the Rust compiler supports ZSTs out of the box. In C++ it would require at least some tricks around the empty base optimization. I'm pretty new to Rust, so sorry for any obvious mistakes.

    Evgeniy Moiseenko at 2018-06-12 20:39:11

  363. I don’t think we need separate traits for stateful v.s. stateless.

    With Box augmented with an A type parameter, it would contain a value of A directly, not a reference or pointer to A. That type can be zero-sized for a stateless (de)allocator. Or A itself can be something like a reference or handle to a stateful allocator that can be shared between multiple allocated objects. So instead of impl Alloc for MyAllocator, you might want to do something like impl<'r> Alloc for &'r MyAllocator.

    Simon Sapin at 2018-06-12 20:59:38
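    A by-value allocator field costs nothing when the handle is zero-sized; a sketch under illustrative names (MyBox, Global, and Arena are stand-ins, not the real std types):

```rust
use std::mem::size_of;

struct Global; // zero-sized handle to a stateless allocator
struct Arena;  // stand-in for some stateful allocator

// A hypothetical Box that stores its allocator handle by value.
struct MyBox<T, A> {
    ptr: *mut T,
    alloc: A,
}

fn main() {
    // A zero-sized handle adds no size: MyBox stays one pointer wide.
    assert_eq!(size_of::<MyBox<u8, Global>>(), size_of::<*mut u8>());
    // A reference to a stateful allocator makes it two words.
    assert_eq!(
        size_of::<MyBox<u8, &'static Arena>>(),
        2 * size_of::<*mut u8>()
    );
}
```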

  364. By the way, a Box that only knows how to deallocate and not how to allocate would not implement Clone.

    Simon Sapin at 2018-06-12 21:03:07

  365. @SimonSapin I'd expect that Cloneing would require specifying an allocator again, the same way that creating a new Box would (that is, it wouldn't be done using the Clone trait).

    Taylor Cramer at 2018-06-12 21:13:02

  366. @cramertj Wouldn't that be inconsistent with Vec and the other containers that implement Clone? What are the downsides of storing an instance of Alloc inside Box rather than Dealloc? Then Box could implement Clone as well as clone_with_alloc.

    Evgeniy Moiseenko at 2018-06-13 07:21:12

  367. I don't think the split traits really affect Clone in a huge way - the impl would just look like impl<T, A> Clone for Box<T, A> where A: Alloc + Dealloc + Clone { ... }.

    Steven Fackler at 2018-06-13 16:00:20

  368. @sfackler I wouldn't be opposed to that impl, but I would also expect to have a clone_into or something that uses a provided allocator.

    Taylor Cramer at 2018-06-13 16:46:48

  369. Would it make sense to add an alloc_copy method to Alloc? This could be used to provide faster memcpy (Copy/Clone) implementations for large allocations, e.g. by doing copy-on-write clones of pages.

    the8472 at 2018-07-04 21:07:59

  370. That would be pretty cool, and trivial to provide a default implementation for.

    Joshua Liebow-Feeser at 2018-07-04 21:51:41

  371. What would be using such an alloc_copy function? impl Clone for Box<T, A>?

    Mike Hommey at 2018-07-04 21:57:42

  372. Yeah, ditto for Vec.

    the8472 at 2018-07-04 22:03:24

  373. Having looked into it some more, it seems that approaches to creating copy-on-write pages within the same process range from hacky to impossible, at least if you want to do it more than one level deep. So alloc_copy wouldn't be a huge benefit.

    Instead a more general escape hatch that allows future virtual memory shenanigans might be of some use. I.e. if an allocation is large, backed by mmap anyway and stateless then the allocator could promise to be oblivious about future changes to the allocation. The user could then move that memory to a pipe, unmap it or similar things. Alternatively there could be a dumb mmap-all-the-things allocator and a try-transfer function.

    the8472 at 2018-07-05 18:34:27

  374. Instead a more general escape hatch that allows future virtual memory

    Memory allocators (malloc, jemalloc, ...) do not generally let you steal any kind of memory from them, and they do not generally let you query or change the properties of the memory they own. So what does this general escape hatch have to do with memory allocators?

    Also, virtual memory support differs greatly between platforms, so much so that using virtual memory effectively often requires different algorithms per platform, often with completely different guarantees. I have seen some portable abstractions over virtual memory, but I haven't seen one yet that wasn't crippled to the point of being useless in some situations due to its "portability".

    gnzlbg at 2018-07-06 12:01:33

  375. You're right. Any such use-case (I was mostly thinking of platform-specific optimizations) is probably best served by using a custom allocator in the first place.

    the8472 at 2018-07-06 18:58:20

  376. Any thoughts on the Composable Allocator API described by Andrei Alexandrescu in his CppCon presentation? The video is available on YouTube here: https://www.youtube.com/watch?v=LIb3L4vKZ7U (he starts to describe his proposed design around 26:00, but the talk is entertaining enough you may prefer to watch it through).

    It sounds like the inevitable conclusion of all this is that collections libraries should be generic WRT allocation, and the app programmer himself should be able to compose allocators and collections freely at the construction site.

    shanemikel at 2018-09-06 00:53:14

  377. Any thoughts on the Composable Allocator API described by Andrei Alexandrescu in his CppCon presentation?

    The current Alloc API allows writing composable allocators (e.g. MyAlloc<Other: Alloc>), and you can use traits and specialization to achieve pretty much everything that's achieved in Andrei's talk. However, beyond the "idea" that one should be able to do that, pretty much nothing from Andrei's talk can apply to Rust, since Andrei builds the API on unconstrained generics + SFINAE/static if from the very start, and Rust's generics system is completely different from that.

    gnzlbg at 2018-09-06 09:27:24

  378. I would like to propose stabilizing the rest of the Layout methods. These are already useful with the current global allocator API.

    Amanieu d'Antras at 2018-10-25 17:33:39

  379. Are these all the methods that you mean?

    • pub fn align_to(&self, align: usize) -> Layout
    • pub fn padding_needed_for(&self, align: usize) -> usize
    • pub fn repeat(&self, n: usize) -> Result<(Layout, usize), LayoutErr>
    • pub fn extend(&self, next: Layout) -> Result<(Layout, usize), LayoutErr>
    • pub fn repeat_packed(&self, n: usize) -> Result<Layout, LayoutErr>
    • pub fn extend_packed(&self, next: Layout) -> Result<(Layout, usize), LayoutErr>
    • pub fn array<T>(n: usize) -> Result<Layout, LayoutErr>

    gnzlbg at 2018-10-25 17:44:59
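    For reference, the composition these methods provide can be sketched with the parts that compile on today's std (extend, array, and pad_to_align); the offsets assume the usual 4-byte size and alignment of u32:

```rust
use std::alloc::Layout;

fn main() {
    // extend: layout of a (u8, u32) pair plus the offset of the second field.
    let (pair, offset) = Layout::new::<u8>()
        .extend(Layout::new::<u32>())
        .unwrap();
    assert_eq!(offset, 4); // u32 must start at its 4-byte alignment
    assert_eq!(pair.pad_to_align().size(), 8); // size including tail padding
    assert_eq!(pair.align(), 4);

    // array: layout of [u32; 3].
    let arr = Layout::array::<u32>(3).unwrap();
    assert_eq!(arr.size(), 12);
    assert_eq!(arr.align(), 4);
}
```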

  380. @gnzlbg Yes.

    Amanieu d'Antras at 2018-10-25 17:46:29

  381. @Amanieu Sounds ok to me, but this issue is already huge. Consider filing a separate issue (or even a stabilization PR) that we could FCP separately?

    Simon Sapin at 2018-10-25 22:06:52

  382. from Allocators and lifetimes:

    1. (for allocator impls): moving an allocator value must not invalidate its outstanding memory blocks.

      All clients can assume this in their code.

      So if a client allocates a block from an allocator (call it a1) and then a1 moves to a new place (e.g. via let a2 = a1;), then it remains sound for the client to deallocate that block via a2.

    Does this imply that an Allocator must be Unpin?

    Tim Diekmann at 2019-02-24 21:15:40

  383. Good catch!

    Since the Alloc trait is still unstable, I think we still get to change the rules if we want to and modify this part of the RFC. But it is indeed something to keep in mind.

    Simon Sapin at 2019-02-24 23:20:27

  384. @gnzlbg Yes, I'm aware of the huge differences in the generics systems, and that not everything he details is implementable in the same way in Rust. I've been working on the library on-and-off since posting, though, and I'm making good progress.

    shanemikel at 2019-02-25 00:09:07

  385. Does this imply that an Allocator must be Unpin?

    It doesn't. Unpin is about the behavior of a type when wrapped in a Pin; there's no particular connection to this API.

    srrrse at 2019-02-25 02:12:22

  386. But can't Unpin be used to enforce the mentioned constraint?

    Tim Diekmann at 2019-02-25 09:05:07

  387. Another question regarding dealloc_array: Why does the function return Result? In the current implementation, this may fail in two cases:

    • n is zero
    • capacity overflow for n * size_of::<T>()

    For the first we have two cases (as in the documentation, the implementer can choose between these):

    • The allocation returns Ok on zeroed n => dealloc_array should also return Ok.
    • The allocation returns Err on zeroed n => there is no pointer that can be passed to dealloc_array.

    The second is ensured by the following safety constraint:

    the layout of [T; n] must fit that block of memory.

    This means that we must call dealloc_array with the same n as in the allocation. If an array with n elements could be allocated, n is valid for T. Otherwise, the allocation would have failed.

    Edit: Regarding the last point: Even if usable_size returns a higher value than n * size_of::<T>(), this is still valid. Otherwise the implementation violates this trait constraint:

    The block's size must fall in the range [use_min, use_max], where:

    • [...]
    • use_max is the capacity that was (or would have been) returned when (if) the block was allocated via a call to alloc_excess or realloc_excess.

    This only holds, as the trait requires an unsafe impl.

    Tim Diekmann at 2019-02-25 13:24:39

  388. For the first we have two cases (as in the documentation, the implementer can choose between these):

    • The allocation returns Ok on zeroed n

    Where did you get this information from?

    All Alloc::alloc_ methods in the docs specify that the behavior of zero-sized allocations is undefined under their "Safety" clause.

    gnzlbg at 2019-02-25 16:56:49

  389. Docs of core::alloc::Alloc (highlighted relevant parts):

    A note regarding zero-sized types and zero-sized layouts: many methods in the Alloc trait state that allocation requests must be non-zero size, or else undefined behavior can result.

    • However, some higher-level allocation methods (alloc_one, alloc_array) are well-defined on zero-sized types and can optionally support them: it is left up to the implementor whether to return Err, or to return Ok with some pointer.

    • If an Alloc implementation chooses to return Ok in this case (i.e. the pointer denotes a zero-sized inaccessible block) then that returned pointer must be considered "currently allocated". On such an allocator, all methods that take currently-allocated pointers as inputs must accept these zero-sized pointers, without causing undefined behavior.

    • In other words, if a zero-sized pointer can flow out of an allocator, then that allocator must likewise accept that pointer flowing back into its deallocation and reallocation methods.

    Tim Diekmann at 2019-02-25 17:07:38

  390. So one of the error conditions of dealloc_array is definitely suspicious:

    /// # Safety
    ///
    /// * the layout of `[T; n]` must *fit* that block of memory.
    ///
    /// # Errors
    ///
    /// Returning `Err` indicates that either `[T; n]` or the given
    /// memory block does not meet allocator's size or alignment
    /// constraints.
    

    If [T; N] does not meet the allocator size or alignment constraints, then AFAICT it does not fit the block of memory of the allocation, and the behavior is undefined (per the safety clause).

    The other error condition is "Always returns Err on arithmetic overflow.", which is pretty generic. It's hard to tell whether it is a useful error condition. For each Alloc trait implementation one might be able to come up with a different one that could do some arithmetic that could in theory wrap, so šŸ¤·ā€ā™‚ļø


    Docs of core::alloc::Alloc (highlighted relevant parts):

    Indeed. I find it weird that so many methods (e.g. Alloc::alloc) state that zero-sized allocations are undefined behavior, but then we just provide Alloc::alloc_array(0) with implementation-defined behavior. In some sense Alloc::alloc_array(0) is a litmus test to check whether an allocator supports zero-sized allocations or not.

    gnzlbg at 2019-02-25 17:15:39

  391. If [T; N] does not meet the allocator size or alignment constraints, then AFAICT it does not fit the block of memory of the allocation, and the behavior is undefined (per the safety clause).

    Yup, I think this error condition can be dropped as it's redundant. Either we need the safety clause or an error condition, but not both.

    The other error condition is "Always returns Err on arithmetic overflow." which is pretty generic. It's hard to tell whether it is a useful error condition.

    IMO, it is guarded by the same safety clause as above; if the capacity of [T; N] would overflow, it does not fit that memory block to deallocate. Maybe @pnkfelix could elaborate on this?

    In some sense Alloc::alloc_array(1) is a litmus test to check whether an allocator supports zero-sized allocations or not.

    Did you mean Alloc::alloc_array(0)?

    Tim Diekmann at 2019-02-25 17:28:56

  392. IMO, it is guarded by the same safety clause as above; if the capacity of [T; N] would overflow, it does not fit that memory block to deallocate.

    Note that this trait can be implemented by users for their own custom allocators, and that these users can override the default implementations of these methods. So when considering whether this should return Err for arithmetic overflow or not, one should not only focus on what the current default implementation of the default method does, but also consider what it might make sense for users implementing these for other allocators.

    Did you mean Alloc::alloc_array(0)?

    Yes, sorry.

    gnzlbg at 2019-02-25 18:10:01

  393. Note that this trait can be implemented by users for their own custom allocators, and that these users can override the default implementations of these methods. So when considering whether this should return Err for arithmetic overflow or not, one should not only focus on what the current default implementation of the default method does, but also consider what it might make sense for users implementing these for other allocators.

    I see, but implementing Alloc requires an unsafe impl, and implementors have to follow the safety rules mentioned in https://github.com/rust-lang/rust/issues/32838#issuecomment-467093527.

    Tim Diekmann at 2019-02-25 18:36:58

  394. Every API still pointing here as its tracking issue is the Alloc trait or something related to it. @rust-lang/libs, do you feel it’s useful to keep this open in addition to https://github.com/rust-lang/rust/issues/42774?

    Simon Sapin at 2019-03-10 19:40:42

  395. Simple background question: What is the motivation behind the flexibility with ZSTs? It seems to me that, given that we know at compile-time that a type is a ZST, we can completely optimize out both the allocation (to return a constant value) and the deallocation. Given that, it seems to me that we should say one of the following:

    • It's always up to the implementor to support ZSTs, and they can't return Err for ZSTs
    • It's always UB to allocate ZSTs, and it's the caller's responsibility to short-circuit in this case
    • There's some sort of alloc_inner method that callers implement, and an alloc method with a default implementation which does the short circuiting; alloc must support ZSTs, but alloc_inner may NOT be called for a ZST (this is just so that we can add the short-circuiting logic in a single place - in the trait definition - in order to save implementors some boilerplate)

    Is there a reason that the flexibility we have with the current API is needed?

    Joshua Liebow-Feeser at 2019-03-11 07:15:10

  396. Is there a reason that the flexibility we have with the current API is needed?

    It's a trade-off. Arguably, the Alloc trait is used more often than it is implemented, so it might make sense to make using Alloc as easy as possible by providing built-in support for ZSTs.

    This would mean that implementers of the Alloc trait will need to take care of this, but more importantly to me that those trying to evolve the Alloc trait will need to keep ZSTs in mind on every API change. It also complicates the docs of the API by explaining how ZSTs are (or could be if it is "implementation defined") handled.

    C++ allocators pursue this approach, where the allocator tries to solve many different problems. This not only made them harder to implement and harder to evolve, but also harder for users to actually use, because of how all these problems interact in the API.

    I think that handling ZSTs, and allocating/deallocating raw memory are two orthogonal and different problems, and therefore we should keep the Alloc trait API simple by just not handling them.

    Users of Alloc like libstd will need to handle ZSTs, e.g., in each collection. That's definitely a problem worth solving, but I don't think the Alloc trait is the place for that. I'd expect a utility to solve this problem to emerge within libstd out of necessity, and when that happens, we can maybe try to RFC such a utility and expose it in std::heap.

    gnzlbg at 2019-03-11 08:04:22

  397. That all sounds reasonable.

    I think that handling ZSTs, and allocating/deallocating raw memory are two orthogonal and different problems, and therefore we should keep the Alloc trait API simple by just not handling them.

    Doesn't that imply that we should have the API explicitly not handle ZSTs rather than be implementation-defined? IMO, an "unsupported" error is not very helpful at runtime since the vast majority of callers will not be able to define a fallback path, and will therefore have to assume that ZSTs are unsupported anyway. Seems cleaner to just simplify the API and declare that they're never supported.

    Joshua Liebow-Feeser at 2019-03-11 15:44:08

  398. Would specialization be used by alloc users to handle ZST? Or just if size_of::<T>() == 0 checks?

    Jeff Burdges at 2019-03-11 16:14:53

  399. Would specialization be used by alloc users to handle ZST? Or just if size_of::<T>() == 0 checks?

    The latter should be sufficient; the appropriate code paths would be trivially removed at compile time.

    Joshua Liebow-Feeser at 2019-03-11 16:17:10
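    Concretely: after monomorphization, size_of::<T>() is a constant, so the dead branch is removed by the compiler. A minimal illustration (element_bytes is a hypothetical helper, not an existing API):

```rust
use std::mem::size_of;

// `size_of::<T>() == 0` is a compile-time constant per instantiation,
// so one of the two branches disappears entirely from the generated code.
fn element_bytes<T>(n: usize) -> usize {
    if size_of::<T>() == 0 {
        0 // ZST path: no real allocation would be needed
    } else {
        n * size_of::<T>()
    }
}

fn main() {
    assert_eq!(element_bytes::<()>(100), 0);
    assert_eq!(element_bytes::<u32>(100), 400);
}
```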

  400. Doesn't that imply that we should have the API explicitly not handle ZSTs rather than be implementation-defined?

    For me, an important constraint is that if we ban zero-sized allocations, the Alloc methods should be able to assume that the Layout passed to them is not zero-sized.

    There are multiple ways to achieve this. One would be to add another Safety clause to all Alloc methods stating that if the Layout is zero-sized the behavior is undefined.

    Alternatively, we could ban zero-sized Layouts, then Alloc doesn't need to say anything about zero-sized allocations since these cannot safely happen, but doing that would have some downsides.

    For example, some types like HashMap build up the Layout from multiple Layouts, and while the final Layout might not be zero-sized, the intermediate ones might be (e.g. in HashSet). So these types would need to use "something else" (e.g. a LayoutBuilder type) to build up their final Layouts, and pay for a "non-zero-sized" check (or use an _unchecked method) when converting to Layout.

    Would specialization be used by alloc users to handle ZST? Or just if size_of::<T>() == 0 checks?

    We can't specialize on ZSTs yet. Right now all code uses size_of::<T>() == 0.

    gnzlbg at 2019-03-11 16:17:24

  401. There are multiple ways to achieve this. One would be to add another Safety clause to all Alloc methods stating that if the Layout is zero-sized the behavior is undefined.

    It'd be interesting to consider whether there are ways to make this a compile-time guarantee, but even a debug_assert that the layout is non-zero-sized should be sufficient to catch 99% of bugs.

    Joshua Liebow-Feeser at 2019-03-11 16:21:26
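    Such a guard could be a thin wrapper around the allocation call; a sketch (checked_alloc is hypothetical, not part of any proposed trait):

```rust
use std::alloc::{alloc, dealloc, Layout};

// Hypothetical wrapper: reject zero-sized layouts in debug builds before
// they reach an allocator whose behavior for them is undefined.
unsafe fn checked_alloc(layout: Layout) -> *mut u8 {
    debug_assert!(layout.size() != 0, "zero-sized allocation");
    alloc(layout)
}

fn main() {
    let layout = Layout::new::<u64>();
    unsafe {
        let p = checked_alloc(layout);
        assert!(!p.is_null());
        dealloc(p, layout);
    }
}
```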

  402. I haven't paid any attention to discussions about allocators, so sorry about that. But I've long wished that the allocator had access to the type of the value it's allocating. There may be allocator designs that could use it.

    Brian Anderson at 2019-04-04 21:40:44

  403. Then we'd probably have the same issues as C++ and its allocator API.

    Tim Diekmann at 2019-04-04 21:54:49

  404. But I've long wished that the allocator had access to the type of the value it's allocating.

    What do you need this for?

    gnzlbg at 2019-04-05 14:07:41

  405. @gnzlbg @brson Today I had a potential use case for knowing something about the type of the value being allocated.

    I'm working on a global allocator which can be switched between three underlying allocators - a thread local one, a global one with locks, and one for coroutines to use - the idea being I can constrain a coroutine representing a network connection to a maximum amount of dynamic memory usage (in the absence of being able to control allocators in collections, esp in third party code)*.

    It'd possibly be handy to know if I'm allocating a value that might move across threads (e.g. Arc) vs one that won't. Or it might not. But it is a possible scenario. At the moment the global allocator has a switch which the user uses to tell it from which allocator to make allocations (not needed for realloc or free; we can just look at the memory address for that).

    *[It also allows me to use NUMA local memory wherever possible without any locking, and, with a 1 core 1 thread model, to cap total memory usage].

    Raphael Cohn at 2019-04-12 07:14:34

  406. @raphaelcohn

    I'm working on a global allocator

    I don't think any of this would (or could) apply to the GlobalAlloc trait, and the Alloc trait already has generic methods that can make use of type information (e.g. alloc_array<T>(1) allocates a single T, where T is the actual type, so the allocator can take the type into account while performing the allocation). I think it would be more useful for the purposes of this discussion to actually see code implementing allocators that make use of type information. I haven't heard any good argument about why these methods need to be part of some generic allocator trait, as opposed to just being part of the allocator API, or some other allocator trait.

    gnzlbg at 2019-04-12 08:47:24

  407. I think it would also be very interesting to know which of the types parameterized by Alloc you intend to combine with allocators that use type information, and what you expect the outcome to be.

    AFAICT, the only interesting type for that would be Box because it allocates a T directly. Pretty much all other types in std never allocate a T, but some private internal type that your allocator can't know anything about. For example, Rc and Arc could allocate (InternalRefCounts, T), List / BTreeSet / etc. allocate internal node types, Vec/Deque/... allocate arrays of Ts, but not Ts themselves, etc.

    For Box and Vec we could add, in backwards-compatible ways, a BoxAlloc and an ArrayAlloc trait with blanket impls for Alloc that allocators could specialize to hijack how those behave, if there is ever a need to attack these problems in a generic way. But is there a reason why providing your own MyAllocBox and MyAllocVec types that conspire with your allocator to exploit type information isn't a viable solution?

    gnzlbg at 2019-04-12 09:36:51

  408. As we now have a dedicated repository for the allocators WG, and the list in the OP is out of date, this issue may be closed to keep discussions and tracking of this feature in one place?

    Tim Diekmann at 2019-05-04 12:40:01

  409. A good point @TimDiekmann! I'm going to go ahead and close this in favor of discussion threads in that repository.

    Alex Crichton at 2019-05-06 15:16:48

  410. This is still the tracking issue that some #[unstable] attributes point to. I think it should not be closed until these features have been either stabilized or deprecated. (Or we could change the attributes to point to a different issue.)

    Simon Sapin at 2019-05-06 15:50:46

  411. Yeah unstable features referenced in git master should definitely have an open tracking issue.

    jethrogb at 2019-05-06 17:30:56

  412. Agreed. Also added a notice and link to the OP.

    Aria Desires at 2019-05-06 17:47:57

  413. Figured I would put this in here. I ran into a compiler bug when using the allocator_api feature with Box: #90911

    adsnaider at 2021-11-23 03:27:56

  414. Figured I would put this in here. I ran into a compiler bug when using the allocator_api feature with Box: https://github.com/rust-lang/rust/issues/90911

    In fact this feature is littered with ICEs. That's because it fits quite badly with MIR and the MIR-consuming parts of the compiler -- codegen, Miri all need to do lots of special-casing for Box, and adding an allocator field made it all a lot worse. See https://github.com/rust-lang/rust/issues/95453 for more details. IMO we shouldn't stabilize Box<T, A> until these design issues are resolved.

    Ralf Jung at 2022-07-04 03:30:17

  415. Box being #[fundamental] means that one can implement core traits for Box<i32, MyCustomAllocator>, and that probably ought to be prevented before stabilizing the type parameter.

    scottmcm at 2022-09-15 05:32:58

  416. I have a lot of concerns about the current API -- it's not a good fit for existing allocators, leading to perf losses, and its story is quite poor when used with dynamic dispatch (a common use case for allocators).

    It's a bit surprising that this apparently has finished FCP with merge disposition, but IMO there's more than just the issue around #[fundamental] that needs work here.

    Thom Chiovoloni at 2022-09-15 05:43:48

  417. I'd also prefer if we had at least a solid plan for how to properly solve the problems arising from Box being both a primitive type (for purposes like noalias and align attributes) and having arbitrary user-defined data inside it. It looks like the ICE whack-a-mole has slowed down for now, but architecturally it is still somewhat messy in the compiler.

    Things did get a lot better since my previous message thanks to the new "derefer" MIR pass, but there are a bunch of remaining rough edges.

    Ralf Jung at 2022-09-15 05:58:43

  418. I'd also chime in that there was, fairly recently, discussion on zulip about a storages proposal. Stabilizing the current API would likely render implementing such a proposal in a backwards-compatible way impossible. I personally would like to at least have time for such a proposal to be made and decided on before stabilizing allocators as-is.

    Rune Tynan at 2022-09-16 00:38:57

  419. the FCP labels should probably be removed since they appear to be out of date

    Jacob Lifshay at 2023-01-18 18:00:33

  420. 2016

    Deleted user at 2023-03-24 00:21:43

  421. We discussed this in the libs meeting yesterday and would like to, as a first step, stabilize just the Allocator trait without stabilizing its use in standard library containers just yet.

    There are several blockers that need to be resolved before this can happen:

    • Issues around the exact lifetime of allocations for allocators with lifetimes needs to be resolved. Specifically:
      • #94069
      • #90822
    • Add a method to the Allocator trait to determine whether 2 allocator instances are equivalent:
      • https://github.com/rust-lang/wg-allocators/issues/109
    • Deprecate GlobalAlloc and change #[global_allocator] to require Allocator instead.
      • https://github.com/rust-lang/wg-allocators/issues/43
      • https://github.com/rust-lang/wg-allocators/issues/21

    We are specifically choosing to defer the following features:

    Amanieu d'Antras at 2023-05-19 16:47:09

  422. Is the thinking that we don't think adding allocator parameters to collections will teach us anything about the allocator trait itself? Not necessarily agreeing or disagreeing, just want to be explicit about it.

    John Ericson at 2023-05-19 16:57:17

  423. Sort of. We think the Allocator trait is mostly ready, but we would like to get a better idea of how it is used with collections in the ecosystems so that we can apply this when adding support to the types in the standard library.

    Amanieu d'Antras at 2023-05-19 21:34:18

  424. There is also the issue that you don't want to change the stdlib collections in a way that would make it hard to support Storage later.

    Jules Bertholet at 2023-05-19 21:55:07

  425. Is the allocator trait really ready? I've tried to use it a few times in the past and it felt really awkward and painful (I've shared a writeup before, earlier in this thread, focused around the pain of trying to use dyn Allocator: https://hackmd.io/ld8wv8kHQaKF_YsU6r8CKQ).

    It also doesn't map well to existing system allocator APIs, or to our existing GlobalAlloc trait. And relevant to even just the standard library usage in particular, the stuff around excess is almost unusable due to not being able to distinguish between cases that desire excess capacity and cases that don't.

    I get the desire to push for stabilization on a long-unstable feature that many users want, but I don't know that this is actually a trait in a position to be stabilized without us regretting it later.

    Thom Chiovoloni at 2023-05-19 21:59:48

  426. Is the allocator trait really ready? I've tried to use it a few times in the past and it felt really awkward and painful (I've shared a writeup before, earlier in this thread, focused around the pain of trying to use dyn Allocator: hackmd.io/ld8wv8kHQaKF_YsU6r8CKQ).

    I did read that, but you don't propose any concrete way to improve upon what we currently have. It's also difficult for me to understand how well the solutions that you propose apply since I am not familiar with your code base.

    Amanieu d'Antras at 2023-05-19 22:04:49

  427. That's true, I think my concrete suggestion for that case would be to follow more along the design of RawWaker, which manages to allow dynamic dispatch without requiring an allocator.

    Thom Chiovoloni at 2023-05-19 22:18:27

  428. (NOT A CONTRIBUTION)

    Is the allocator trait really ready? I've tried to use it a few times in the past and it felt really awkward and painful (I've shared a writeup before, earlier in this thread, focused around the pain of trying to use dyn Allocator: https://hackmd.io/ld8wv8kHQaKF_YsU6r8CKQ).

    I don't see any way an API like waker would improve the situation. You wrote:

    … Especially because in the majority of cases, you shouldn’t even need to allocate anything for this, since it’s a ZST anyway. And for that matter, for these you shouldn’t need to maintain a refcount)…

    For completeness, there's also Vec<T, &'static dyn Allocator>. This works, but is either dangerous or inflexible.

    If all you care about are ZST allocators that live the length of the program and you want dynamic dispatch, &'static dyn Allocator is absolutely the right type for you. They're ZSTs, you can put one in a static and use its allocator implementation.

    If you want them to be able to hold data and live for less than the length of your program (like an arena), then you need &'a dyn Allocator and, yea, you need to manage that lifetime. Them's the breaks.

    A waker-like API would force dynamic dispatch on everyone, and the only thing it would allow is using raw pointers as the receiver type.


    Allocator should probably be implemented for all of the pointer types to a type that implements Allocator, not just shared references.


    I'm personally not very convinced AllocError pulls its weight and would weigh the option of just using Option. You know you're never going to add fields to AllocError because it would pessimize calls to malloc. What are the benefits of AllocError?

    srrrse at 2023-05-19 22:28:34

  429. FooLibError can impl From<AllocError> and then ? will Just Work.

    Lokathor at 2023-05-19 22:31:48

  430. Allocator should probably be implemented for all of the pointer types to a type that implements Allocator, not just shared reference.

    FWIW that shared reference impl is a perf footgun, not sure if we want to keep it: https://github.com/rust-lang/rust/issues/98232.

    Ralf Jung at 2023-05-20 07:05:05

  431. If all you care about are ZST allocators that live the length of the program and you want dynamic dispatch, &'static dyn Allocator is absolutely the right type for you. They're ZSTs, you can put one in a static and use its allocator implementation.

    If you want them to be able to hold data and live for less than the length of your program (like an arena), then you need &'a dyn Allocator and, yea, you need to manage that lifetime. Them's the breaks.

    @thomcc Not sure if helpful to you, but in my engine I have built an abstraction for "allocator + data allocated in it".

    https://gist.github.com/yanchith/113bca60ccf86b79ad7e4b0ddec98c64

    This lets me pretend that 'a is 'static and stops the lifetime contagion. Of course, this is unsafe, but the tradeoff I made is to make resetting the allocator unsafe so that everything else, even leaking, can be safe. The invariant for resetting is that the allocator must not have been used for any data that's not tracked by the abstraction.

    This mostly works very well, but there are still a few annoyances, like the data inside the abstraction always having to be initialized and usable, so in a couple of places I have to pass in a constructor function that creates the data from the allocator.

    (The code itself is tailored to work with reference to one specific allocator, but I can imagine making it work with &dyn Allocator too)

    Ján Tóth at 2023-06-09 10:51:56

  432. So, I still think the support for dyn is not great but honestly it's been long enough since I wrote that that I don't care to dig up the details (they weren't static ZSTs, nor were they borrowed, they were reference counted but allocating memory for the Rc requires an allocator... Anyway, it's not important), and I think it can be solved later anyway.

    More broadly it is by far not the only issue I hit. Here's one that's easy for me to argue for, that I expect nearly everybody to hit as they use the API for the first time (even now when writing this I messed up multiple times, despite thinking very intently about the issue). And the only benefit we get for it is a mild theoretical one:

    Supporting zero-sized allocations

    Allocator allows users to call it with zero-sized allocation requests, since this allows allocate to be a safe function. This is a change from GlobalAlloc, which forbids this via the constraint that its input Layout must not be zero-sized.

    This has been discussed in the past. The discussions focused on performance[^perf], and suggested the concerns about the difficulty of checking were solved by adding an extra method (which is insufficient for reasons I'll get into, and no longer exists anyway).

    [^perf]: The performance measurements and arguments seem to have failed to account for dynamic dispatch, but perhaps Allocator was not object safe at that point. (It's also really hard to measure branch prediction overhead in a microbenchmark, due to branch predictors being more sophisticated than the benchmarks most folks write). Either way this isn't a part of my argument, although I do think it matters.

    There are two reasons I believe this to be a mistake:

    1. Making Allocator::allocate safe is useless. There is no way to use the memory it returns without unsafe code, and realistic uses will want to deallocate that memory later, which requires calling an unsafe method.

    2. The impact of this decision is that every single method on Allocator must now be certain that it is handling zero-sized layouts correctly. This ends up being really awkward -- suddenly when implementing Allocator::shrink, you must start by determining "is this actually shrink, or is this a round-about way of calling deallocate" (e.g. is it being shrunk to zero-sized layout), similarly "grow" needs to determine if it's actually being asked to do the initial allocation.

    Concretely, a correct[^1] implementation of Allocator that overrides all methods (obviously if you use our defaults it will be fine, but they may be significantly slower than what is possible) is very likely[^2] to need to end up handling zero-sized allocations in the following manner, or something close:

    [^1]: I'm considering "returns error on all zero-sized allocation" to be a form of incorrectness, as none of the stdlib allocators behave this way. It's obviously sound to do so, however.

    [^2]: The main case where you can get away without these checks is a bump allocator, which has no deallocation or resizing, and can be written to avoid the branch on alloc size==0, but often will have to trade a small amount of wasted space (to align the zero-sized allocation) in order to do so. Either way such an allocator is not harmed by changing this design (if anything, their design argues that we should care about adding branching into these functions). Besides, most allocator authors will not be calling bump allocators, so I don't think it should be considered representative.

    • In allocate and allocate_zeroed, handle the case where the layout is zero-sized, and return the correct dangling pointer.
      if layout.size() == 0 {
          return Ok(NonNull::slice_from_raw_parts(layout.dangling(), 0));
      }
      
    • In deallocate, handle the case where the layout is zero-sized, and do nothing:
      if layout.size() == 0 {
          return;
      }
      
    • In grow, handle the case where old is zero-sized, and the case where both old and new are zero-sized (although the "both zero" case is handled inside self.allocate already):
      if old.size() == 0 {
          // Note: `self.allocate` handles `new.size() == 0`
          return self.allocate(new);
      }
      
    • grow_zeroed is the same as grow, but with allocate_zeroed instead of allocate:
      if old.size() == 0 {
          // Note: `self.allocate_zeroed` handles `new.size() == 0`
          return self.allocate_zeroed(new);
      }
      
    • And finally, in shrink: new may be zero-sized, and if it is, you need to call deallocate and then return dangling pointer (based on new.dangling(), not old -- I got this wrong when writing this example):
      if new.size() == 0 {
          // Note: `self.deallocate` handles `old.size() == 0`
          self.deallocate(ptr, old);
          return Ok(NonNull::slice_from_raw_parts(new.dangling(), 0));
      }
      

    (A complete version is here)

    Individually, none of these seem that bad, but they're actually fairly tricky to get right, and took me a few tries. Online, I wasn't able to find folks who got it right either, although usually they failed to check at all.

    I think there are cases where we'd say this is fine for a low-level unsafe trait, but we really get no benefit from it, and even have already made the decision once to have it be unsafe.


    Anyway, I'll leave it there for now. Basically, I think we actually got it right with GlobalAlloc::alloc being unsafe -- allocate being safe is useless, and it's not worth complicating (nearly) every implementation this way.

    (I started a zulip thread for this https://rust-lang.zulipchat.com/#narrow/stream/197181-t-libs.2Fwg-allocators/topic/Forbidding.20Zero-sized.20layouts.20in.20Allocator.3A.3Aallocate.2E)

    Thom Chiovoloni at 2023-06-10 14:16:41

  433. I wrote up another issue that has bugged me with Vec for a while. Basically it's unfortunate that grow() is forced to copy the whole capacity of the vec rather than just the part in use.

    Zulip: https://rust-lang.zulipchat.com/#narrow/stream/197181-t-libs.2Fwg-allocators/topic/Allocator.20growth.20method.20and.20length.20param

    I mention this because while it's minor, it also seems like adding it after stabilization would be a lot worse[^1], and it can show up in performance profiles.

    [^1]: Either we default the impl to calling the existing grow method (bad because we don't get to use it to fix Vec's problem), or we default it to allocate new+copy+dealloc old, which hurts most custom allocators far more than this problem.

    Thom Chiovoloni at 2023-06-10 15:33:34

  434. Oh, the current zero-sized semantics are even more problematic than I realized. @Amanieu pointed out that you could handle zero-sized allocations in your Allocator by bumping them up to some non-zero value.

    I had forgotten this but yeah, it's intended to work -- allocators are supposed to be able to round/bump up their allocations, and so long as they do so consistently and don't require that callers reflect that increase in the Layouts they use later on, everything is supposed to be fine.

    Unfortunately, this means zero-sized allocations are:

    1. no longer free in terms of time or space.
    2. something you must deallocate.

    The first would be terribly annoying for the end user, since now Box::new_in needs to lose its guarantees, but... the second is a bigger problem.

    Concretely, it's terrible for container authors as well, since it means that if they decide not to handle zero sized types (and let it be handled entirely by the allocator), they can no longer use something like capacity == 0 as the constraint to determine whether or not they need to free the memory.

    Consider the following situation (which is a specific instance of a more general problem, rather than a bug that only is in Vec):

    1. An allocator BumpZeroAllocator exists that implements the "bump up zero-sized allocation" strategy.
    2. You have a Vec<T, BumpZeroAllocator>, which has a zero length but not a zero capacity.
    3. You call shrink_to_fit() on the vec.
    4. The vec requests that an allocator.shrink call happen to reduce the size from whatever it was into a zero-sized layout (because the Vec had len==0).
    5. The allocator bumps the zero-sized target layout to be nonzero, but otherwise services the allocation exactly, returning a NonNull<[u8]> with a non-zero length (e.g. a nonzero excess).
    6. The vec (well, its rawvec) ignores the excess, because it's allowed to do so. It stores zero as the capacity, and the result pointer as the pointer.
    7. Later, when Vec is dropped, it checks if the capacity was zero to know if the allocation needs to be freed. It is zero, so it does not get freed.
    8. Unfortunately, the allocation did need to be freed, and didn't get freed.

    This is a real bug we have today: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=f47a46ddf21ef83a3146197af79574e5. Note that it's not specific to shrink, as it could happen with any zero-sized allocation request. Also very important to point out: this is not a nefarious Allocator impl, and it's not misusing or incorrectly implementing any part of the API as far as I know -- this is intended to be a legal Allocator implementation.

    That said, I don't see a good way to fix this without making all our collection types check for zero-sized allocations prior to calling their allocator. Or start carving out ways zero-sized layouts need to be handled.


    TLDR: If the allocator is responsible for handling zero-sized layouts, it becomes very very hard to implement correct container types without handling zero-sized layouts on the container side of the API (in addition to what I said previously about the allocator-side needing to handle them).

    Vec has a leak caused by this from a completely legal (indeed, intended by design) allocator implementation, and to fix it we'd need to basically always call Allocator with non-zero layouts, meaning often both sides of the API would have the checks...

    Additionally, allowing this would probably make us have to give up the comment on Box::new_in that says "This doesn’t actually allocate if T is zero-sized.", because the underlying allocator might.

    In general, the semantics are hard to work with for implementors of allocators (in a more obvious and annoying way), hard to work with for users of allocator (in a much more subtle way), and push both sides of the API to need to implement the same checks. And they also give us almost no practical benefit, since the API is still unsafe in practice. So, I get that the design has a ton of inertia, but... yeah.

    Thom Chiovoloni at 2023-06-10 21:47:46

  435. I think that both Box and Vec need to be fixed to check for a capacity of 0 and not allocate memory in this case. In particular, RawVec::shrink needs to check if it is shrinking to a capacity of 0 and free its backing memory instead of shrinking it.

    In essence, it is the allocator caller's responsibility to properly handle ZSTs before calling into the allocator, if they wish to guarantee no allocation for ZSTs. Allocators are required to properly handle zero-sized layouts in that the returned allocation can later be deallocated, but they are not required to ensure that zero-sized allocations do not consume memory. In fact, allocators should avoid special-casing zero-sized layouts and let the caller deal with it.

    This is not an uncommon stance. For example, most malloc implementations in libc (including glibc, dlmalloc, musl) will allocate memory for malloc(0) so that the resulting pointer is non-null and can be passed to free.

    Amanieu d'Antras at 2023-06-11 00:17:51

  436. In particular, RawVec::shrink needs to check if it is shrinking to a capacity of 0 and free its backing memory instead of shrinking it.

    At the point where every call-site has to special-case this, we should just admit that allowing zero-sized layouts is a bad part of the allocator interface. But more broadly this seems extremely unfortunate to me.

    The problem is that you're basically suggesting that it's fine if Box/Vec have a convention where they special case zero-sized layouts, and that this can be an implementation detail of their use of allocators, which other container authors can either decide to adopt, or not.

    I don't know that that works, I think it's very surprising and confusing, but concretely: Box and Vec promise that you're able to allocate memory using the allocator separately and construct them from the memory it gives back. This promise currently applies only to Global but is quite nice, and it would be an unfortunate limitation if the allocator interface lost this. However, this means that everybody using Allocator pretty much has to do it like Box/Vec do it.

    In essence, it is the allocator caller's responsibility to properly handle ZSTs before calling into the allocator, if they wish to guarantee no allocation for ZSTs

    Yeah, so I don't get the point of this argument. What do we get out of keeping the API the way it is now? It's clearly error-prone to use (std got it wrong); I can't speak as concretely, but I'm telling you that I've found it very error-prone to write allocators with the current API.

    The status quo will end up with us checking before calling the allocator, and then the allocator checking after being called. This seems like clearly the worst of all possible situations.


    I mean, the easy way to solve all of these problems IMO is just to forbid zero-sized layouts in the allocator interface, and document that the process for "allocating" a zero-sized block of memory is layout.dangling(). That fixes all of it, and makes it very clear to the users of the allocator trait what their responsibilities are.

    Is it possible to have an allocator trait without that limitation? Yeah, it is, and it would look like the one we have. It's just very hard to use, easy to misuse, error-prone, and gives very little benefits.

    If you're in a case where you really need to call some custom allocator and don't want to pay for a zero-size check on either side, I think it's fine to just have that call not go through the Allocator interface.

    Thom Chiovoloni at 2023-06-11 01:25:14

  437. We can still keep the allocate function safe by adding a NonZeroLayout.

    Gary Guo at 2023-06-11 11:43:15

  438. I'm in favor of that. It makes what to do clear and gives a place to put the documentation about how to "allocate" a zero-sized layout.

    Thom Chiovoloni at 2023-06-12 05:24:05

  439. Concretely, it's terrible for container authors as well, since it means that if they decide not to handle zero sized types (and let it be handled entirely by the allocator), they can no longer use something like capacity == 0 as the constraint to determine whether or not they need to free the memory.

    That seems quite unsurprising? You have to either consistently handle size zero in your container or let it be handled by the allocator; mix-and-match cannot work.

    Arguably RawVec is at fault here for not properly tracking if it actually has any memory that needs to be given back to the allocator. If zero-sized allocation was UB, presumably shrink-to-0 would also be UB, so the RawVec code would still be wrong, but for different reasons?

    Ralf Jung at 2023-06-16 13:28:26

  440. Hey guys, I just created a new issue to discuss. https://github.com/rust-lang/wg-allocators/issues/114

    I am not actually pushing towards splitting the trait, but I would like this issue to be addressed.

    Victor at 2023-06-16 14:01:55

  441. If zero-sized allocation was UB, presumably shrink-to-0 would also be UB, so the RawVec code would still be wrong, but for different reasons?

    Yes, we'd need to use NonZeroLayout (or whatever) there too. I think we'd need to use it everywhere?

    Thom Chiovoloni at 2023-06-23 06:37:35

  442. My question was: isn't the RawVec code wrong either way in how it handles shrink-to-0, both with the current Allocator trait and with your proposed "non zero" variant? You said the current trait is a footgun, but it seems your proposal doesn't actually resolve that footgun.

    Ralf Jung at 2023-06-25 07:11:19

  443. It solves it by forcing you to use deallocation rather than shrinking to a zero-sized allocation, because creating the zero-sized layout would fail (or be UB if done improperly).

    Thom Chiovoloni at 2023-06-25 07:18:16

  444. So that's the same solution as with an allocator that does support zero size.

    Is the point that supporting zero size doesn't bring any benefit for types like Vec that aim to not invoke the allocator in new? You seemed to say that the current Allocator contract makes things worse for Vec (compared to a non-zero-size Allocator) and I don't see how.

    Ralf Jung at 2023-06-25 13:18:13

  445. Yes, but that's highly non-obvious[^1], and will be a subtle footgun that's hard to get right -- specifically, this is hard to get right both for the user of the Allocator API and for the implementer of the Allocator API. In concrete terms, you will often end up with checks on both sides of the API boundary, which is definitely a sign that something is wrong.

    I didn't say that the API cannot be made to work in the way it currently is, just that it's a footgun on both ends.

    [^1]: Your earlier comment that it's unsurprising is counter to my experience that almost everybody using this API has gotten it wrong, including the stdlib.

    Thom Chiovoloni at 2023-06-26 10:15:14

  446. IMO that the source of the problem here is Vec insisting on not interacting with the allocator in Vec::new. This means it must always know whether it has something to give back to the allocator or not. That's an important piece of state to track. If Vec used a capacity of Option<usize> where None indicates "not allocated" this would be trivial, but Vec chose to represent None as 0 and that's where things start to become subtle. The shrinking logic seems to currently get that wrong, but I wouldn't blame the allocator API for that -- it's Vec that chose to be "clever" in its state representation here.

    In most cases it is probably perfectly fine to have a data structure invariant saying that the data pointer is always something that was returned by alloc, and hence can be freed by dealloc. Then grow/shrink can be implemented in the obvious way without any checks on the data structure side. That would be a Vec that just always calls alloc in new and dealloc in drop. However, the real Vec chooses to not follow that path and instead has an invariant saying "if the capacity is non-zero, then there is some memory to give back to the allocator, else there is not". Vec then has to live with the consequences of that choice, in particular it uses capacity 0 as a sentinel itself and hence has to always special-case this situation everywhere. Why would we punish every user of Allocator (having them special case capacity 0) for this somewhat special choice Vec made?

    Or is this common enough that we want to tailor the entire allocator trait towards data structures that use capacity 0 as a sentinel themselves to avoid calling alloc in new?

    Ralf Jung at 2023-06-26 11:46:53

  447. Or is this common enough that we want to tailor the entire allocator trait towards data structures that use capacity 0 as a sentinel themselves to avoid calling alloc in new?

    Most collections do this, yeah. To enable an empty collection to be about as cheap as an Option::None, to make it const fn new() and to make construction panic-free.

    Though as @thomcc already mentioned Box is inconsistent in when it calls the allocator or uses a dangling pointer.

    the8472 at 2023-06-26 14:41:50

  448. I've addressed the issues with ZST handling in Box and Vec in #113113.

    As a side note, all the logic for handling zero-sized allocations could also be implemented as a type that wraps an Allocator and causes zero-sized allocations to become no-ops. We don't have to provide this in the standard library, it can just be a crate.

    Amanieu d'Antras at 2023-06-28 01:21:22

  449. How would such a wrapper type handle dealloc? An inconsistent abstraction could be a footgun.

    Lilith River at 2023-06-28 02:39:02

  450. The wrapper would return NonNull::dangling for all zero-sized allocations and ignore all zero-sized deallocations. Non-zero allocations are passed on to the underlying allocator.

    Amanieu d'Antras at 2023-06-28 02:40:15

  451. That makes sense; I forgot size was passed to dealloc.

    Lilith River at 2023-06-28 02:42:39

  452. As a side note, all the logic for handling zero-sized allocations could also be implemented as a type that wraps an Allocator and causes zero-sized allocations to become no-ops.

    This is true, but you end up with code performing the same checks many times. For example, both sides of the interface end up doing the check if your underlying allocator doesn't handle zero-sized allocations well, and in my experience most (or at least very many) allocator designs do not.

    Given that forbidding it is consistent with the earlier design (GlobalAlloc), I don't really see what we get by allowing it at this point when it clearly causes trouble both for users and implementers of the API.

    Thom Chiovoloni at 2023-06-28 07:43:33

  453. I don't really see what we get by allowing it at this point when it clearly causes trouble both for users and implementers of the API.

    It can't possibly cause trouble for users -- every user that is correct wrt the more restricted API (that is UB on zero-sized allocs) is also correct wrt the current API.

    Ralf Jung at 2023-06-28 11:05:03

  454. Another reason to have the collection check for zero size: for e.g. Box<T>, the collection knows statically whether T is a ZST, so the checks can be optimized out.
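    A sketch of why the check is free for Box-like containers: whether T is zero-sized is a compile-time constant, so a branch like the one below folds away entirely after monomorphization (`needs_alloc` is a hypothetical helper, not std API):

```rust
// For Box<T>, whether T is a ZST is known statically, so this branch
// is resolved at compile time and costs nothing at runtime.
fn needs_alloc<T>() -> bool {
    std::mem::size_of::<T>() != 0
}

fn main() {
    struct Zst;
    assert!(!needs_alloc::<Zst>()); // ZST: no allocator call needed
    assert!(needs_alloc::<u32>()); // sized type: allocator call needed
}
```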

    It can't possibly cause trouble for users -- every user that is correct wrt the more restricted API (that is UB on zero-sized allocs) is also correct wrt the current API.

    Correctness trouble is the worst kind of trouble, but not the only kind. Users also want performance, which the current API pessimizes.

    Jules Bertholet at 2023-06-28 12:42:59

  455. I also don't necessarily agree that the current API avoids correctness issues. We clearly had one in the stdlib code, which would have almost certainly been avoided if we made it clear that zero-sized allocations were forbidden (especially if we forbid it by making things take a NonZeroLayout or such).

    My experience is that the sort of optimizations that the stdlib does here are quite common, and not a special case. This is likely because of how long we've documented that zero-sized allocations are free and infallible (something which definitely won't be universally true if we delegate this to the underlying allocator).

    Thom Chiovoloni at 2023-06-28 14:25:36

  456. FWIW, if I read https://github.com/rust-lang/rust/commit/56cbf2f22aeb6448acd7eb49e9b2554c80bdbf79 correctly, then RawVec actually did this correctly when the Allocator (AllocRef back then) trait was changed to allow zero-sized allocations. The bug was introduced in https://github.com/rust-lang/rust/commit/2526accdd35c564eee80b6453a0b4965e6a76afd, a later commit in the same PR (https://github.com/rust-lang/rust/pull/70362).

    I also don't necessarily agree that the current API avoids correctness issues.

    I agree that for data structures that have a special zero-size state themselves where they don't own any memory that must be given back to the allocator, the current API does not help at all.

    The question is, is that case common enough to justify hurting the alternative case where data structures would be completely fine with always owning some allocated memory, even when the size is 0 -- basically leaving it to allocators to make size 0 efficient, instead of having to implement that optimization for each data structure? Naively it seems much better to do this in the allocator once rather than in every single data structure, but clearly Vec disagrees since it doesn't trust the allocator to do this right. Interesting.

    Here is a possible alternative to just making Allocator require non-ZST: we could say that passing the result of NonNull::dangling() to deallocate/grow/shrink with a size-zero layout must always be allowed and not attempt to actually deallocate that pointer. Then Vec::new could still use NonNull::dangling, but all the rest of the Vec code could freely treat the backing buffer as if it was created by the allocator, and drop could unconditionally call deallocate.

    That would make the buggy shrink actually legal. It would avoid having to re-implement the "size 0 optimization" in each and every data structure, instead having it implemented once in the allocator. So just from a code structure perspective, it seems to me like that is the API that types like Vec actually want: a const-compatible, zero-cost fn new without having to worry about size 0 anywhere. allocate would be safe without the need for a NonZeroLayout.

    What I don't know is whether this API is something allocators can reasonably implement. @thomcc what do you think?

    Ralf Jung at 2023-06-28 15:22:01

  457. Here is a possible alternative to just making Allocator require non-ZST: we could say that passing the result of NonNull::dangling() to deallocate/grow/shrink with a size-zero layout must always be allowed and not attempt to actually deallocate that pointer.

    How would you tell apart dangling() from a valid pointer?

    Gary Guo at 2023-06-28 16:12:24

  458. Vec does it, so clearly it is possible. It would be up to the allocator to ensure that for size 0, it can never confuse dangling with a valid pointer.

    Ralf Jung at 2023-06-28 16:18:36

  459. Vec checks for capacity == 0 to determine whether its pointer is dangling.

    Jules Bertholet at 2023-06-28 16:32:35

  460. a const-compatible, zero-cost fn new without having to worry about size 0 anywhere

    I'm not sure how it's const-compatible unless the allocator is a const trait/impl/something (or at least allocation is const), which doesn't seem likely to be the case most of the time (since usually allocation requires a number of operations which are problematic for const). I have to think about the rest of your comment though.

    Thom Chiovoloni at 2023-06-28 21:02:09

  461. NonNull::dangling is a const fn, which is all that is needed for Vec::new. (The function would stay unchanged compared to what the stdlib does right now.)

    To be clear, I have no idea if this proposal makes sense. The part where the allocator, in dealloc, has to handle dangling and therefore has to ensure it never actually puts a real allocation (that must be freed) of size 0 at that address is certainly somewhat suspicious. I arrived at this purely from the perspective of "what would it take for Vec to not need to special-case capacity 0".

    Ralf Jung at 2023-06-28 21:06:29

  462. Oh, I see. I misinterpreted your proposal then.

    I think the problem here is that now dealloc can't use the allocation size alone to tell the two cases apart: it needs to know whether the pointer came from a call to alloc with a zero-sized layout, or from NonNull::dangling(), which can return a valid pointer (and in practice often will on 32-bit targets if the alignment is sufficiently high).

    Thom Chiovoloni at 2023-06-28 21:10:14

  463. So sounds like allocators would basically be forced to not have an actual allocation for size 0, so that they can make 'deallocate' always a NOP when the size is 0.

    Ralf Jung at 2023-06-29 06:04:12

    Yeah. We'd have to document that a zero-sized allocation needs to be equivalent to dangling (or at least some kind of no-op), which seems a bit odd to me, but it would work. Basically, instead of telling users of Allocator that they can't give allocators a zero-sized layout and should allocate a zero-sized layout in a certain way, we're saying allocators must behave in a specific manner when given zero-sized layouts.

    The downside here is that the checks couldn't always be removed in cases where Allocator is a trait object. It also plausibly adds a branch into the allocation path that could otherwise be avoided. It also is a bit error-prone if not documented properly, as I go over in https://github.com/rust-lang/rust/issues/32838#issuecomment-1585684243 (although I've softened on this proposal since the issues I hit could possibly be addressed with documentation).

    This would make the "handle zero-sized layout by rounding up" approach suggested elsewhere in this thread invalid though (but I don't see a way to keep it without many other downsides).

    Thom Chiovoloni at 2023-06-29 07:36:42

  465. I wrote a blog post announcing that I intend to work on the Allocator design, and I wrote down (roughly) a set of things I'm thinking about: https://shift.click/blog/allocator-trait-talk.

    Largely speaking, my feeling that Allocator needs a more comprehensive rework/consideration is why I haven't filed a PR for the extra parameter for grow, for any changes around zero-sized allocations[^1], etc.

    [^1]: I've started to have second thoughts on zero-sized allocations, which is one of the things I'm hoping to work through. I think that perhaps allocators which use resources for zero-sized allocations are a little analogous to iterators which return None in the middle -- IOW, could an approach more similar to FusedIterator/Fuse work? I'm not sure, maybe.

    Anyway, all of this is tricky because Allocator is trying to serve so many roles, so it's hard to find a design that doesn't end up making a trade-off that negatively impacts something or other, and it takes a lot of experimentation to play around with different implementations of the trait and code using it.

    Thom Chiovoloni at 2023-08-07 03:09:18

  466. I'm sorry, but don't both new_in and with_capacity_in have very minor mistakes in the example sections of their source documentation? Or am I missing something? https://doc.rust-lang.org/std/collections/struct.VecDeque.html

    Abhishek Khare at 2024-01-23 18:47:52

  467. @wallgeek No, you're right. They don't exemplify the APIs at all. It should be fixed!

    John-John Tedro at 2024-01-23 19:26:39

  468. I've checked all new_in methods in https://doc.rust-lang.org/std/index.html?search=new_in&filter-crate=std:

    • VecDeque - a blunt copy from new
    • BTreeMap
      • Makes a new empty BTreeMap with a reasonable choice for B.

      • what is B exactly?
    • BTreeSet - ditto with BTreeMap
    • LinkedList
      • Constructs an empty LinkedList<T, A>.

      • the description can be improved

    The rest is okay.

    pravic at 2024-01-24 07:14:35

    • what is B exactly?

    Currently in std:

    const B: usize = 6;
    

    The struct-level docs explain this and compare a B-tree with a binary tree that would have individual allocations for each item:

    https://doc.rust-lang.org/std/collections/struct.BTreeMap.html

    A B-Tree instead makes each node contain B-1 to 2B-1 elements in a contiguous array. By doing this, we reduce the number of allocations by a factor of B, and improve cache efficiency in searches.

    It’s probably not relevant for the docs of a constructor to talk about the "choice" of B, since that choice is a compile-time constant in the current implementation. (As opposed to something users could influence, like Vec::with_capacity.)
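    The node-capacity arithmetic from the quoted docs, using the current internal constant B = 6 (an implementation detail, not a guarantee):

```rust
// Internal branching factor currently used by std's BTreeMap.
const B: usize = 6;

fn main() {
    let min_elems = B - 1; // minimum elements in a non-root node
    let max_elems = 2 * B - 1; // maximum elements per node
    assert_eq!((min_elems, max_elems), (5, 11));
    // One node allocation holds up to 11 elements, versus 11 separate
    // allocations in a naive binary tree: roughly a factor-of-B reduction.
}
```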

    Simon Sapin at 2024-01-24 21:16:33

  469. With allocator_api, we now have safe memory allocation methods like try_new() for types like Box and Arc, and some collections (Vec, HashMap) support try_reserve as a workaround. However, collections like BTreeMap lack equivalent allocation-checked methods for operations like insert. To build a more consistent API, could we introduce allocation-checked variants across std::collections::* with a clear prefix like checked_ or safe_ (e.g., safe_insert, checked_push)? These methods would return Result<T, AllocError> on allocation failure, streamlining safe memory allocation handling.

    vmolsa at 2024-10-31 04:20:51

  470. I added a small guide on the current state of allocator_api to one of my projects' documentation. Search engines don't seem to index it, but it may be useful for some new people looking around.

    In any case, some general feedback on the current state from own experiences is below.

    General Purpose Feedback

    When trying allocator_api for the first time a while back, while I was still relatively new to Rust, I found it to be a tiny bit challenging to use.

    The semantics weren't immediately clear at first, e.g.

    • How to consume an allocator.
    • Allocator reference vs zero sized allocator (2 ways to design an allocator)
    • Overheads (if any) on the heap

    Even with a blog post or two around, not everything was clear, so I added another resource (above). I think this can be resolved with just a few more examples/docs.

    Allocate & Remaining Methods

    One thing I also found a bit weird at first is allocate returns a NonNull<[u8]>, but the other APIs take a NonNull<u8>.

    It may not be immediately clear to people newer to Rust that the representation of a slice reference in Rust is ptr + len (a fat pointer); someone might instead assume it is a pointer to len + data in the same memory allocation, and that assumption may confuse a user.

    I was temporarily in that camp, until I learned that [T] is actually unsized (a dynamically sized type, or DST) and that references to the data are always ptr + len. The key is the phrase from the DST docs: [Pointer types](https://doc.rust-lang.org/reference/types/pointer.html) to DSTs are sized but have twice the size of pointers to sized types.

    Providing a blanket implementation here may be useful, so user can just pass whatever they received from allocate to the other functions.
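    A sketch of the fat-pointer point on stable Rust: a slice reference is ptr + len, and going from a NonNull<[u8]> (as returned by allocate) back to a thin NonNull<u8> (as taken by the other methods) is a cast that simply drops the length:

```rust
use std::ptr::NonNull;

fn main() {
    // A slice reference is a fat pointer: data pointer + length.
    assert_eq!(std::mem::size_of::<&[u8]>(), 2 * std::mem::size_of::<&u8>());

    // Recovering the thin pointer from a fat one is just a cast.
    let mut buf = [0u8; 4];
    let fat: NonNull<[u8]> = NonNull::from(&mut buf[..]);
    let thin: NonNull<u8> = fat.cast();
    assert_eq!(thin.as_ptr(), buf.as_mut_ptr());
}
```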

    allocator_api2

    For now we have allocator_api2 to provide re-exports.

    It's generally not so hard to use, however some 'best practices' could be noted somewhere, given how long actual design of allocator_api has been taking; these things that come to mind:

    • There's no coercion to unsized types without the actual std types, so you have to rely on the undocumented unsize_box hack (and similar caveats).
    • You have to write no_std to avoid the std prelude (you can still use the std crate via extern crate); otherwise it's easy to mix types such as std's Vec and allocator_api2's Vec.
    • Code duplication and cache inefficiency, since compiled programs will now contain multiple copies of the regular containers.

    In any case, since the thread has died down for over a year, does anyone know the future plans/state for allocator_api?

    There's a lot of talk above, as usual, but it's hard to draw conclusions given the long passage of time since the thread was alive; conversations may have been happening in the working group chats, for example, so I figured I'd ask.

    Sewer. at 2024-12-05 22:28:47

  471. @Sewer56 Thanks for the summary and your small guide. It's super useful. There have been some discussions happening related to this on Rust Zulip. https://rust-lang.zulipchat.com/#narrow/channel/219381-t-libs/topic/no_global_oom_handling.20removal/near/498629183

    Vaishali Thakkar at 2025-02-10 06:32:43

  472. As someone who just finished up implementing Allocator for my new crate (https://crates.io/crates/stalloc), I think I'm qualified enough to comment on this API.

    And... it's pretty bad. Many of the design decisions have forced me to make performance sacrifices for no reason:

    • allocate() should absolutely not allow zero-sized allocations. In my code, this ends up being special-cased to return a dangling pointer, resulting in additional overhead on every single allocation.
    • deallocate() also needs a special case to handle user code trying to deallocate a dangling pointer.
    • grow(): this one is horrendous. It stuffs together four separate tasks into one function: trying to grow, allocating, copying, freeing. That should be four separate function calls! This questionable design decision means that growing a Vec often results in the allocator pointlessly copying uninitialized memory because it doesn't realize that we only care about the parts up to len. Not only that, you're also allowed to pointlessly call it with old_layout == new_layout, which has to be special cased. And not only that: you're allowed to call it with a different alignment than your original allocation (why...?) which has to be, once again, checked for and special cased. And not only that, you're also allowed to call this function with a dangling pointer, in case you want to "grow" your zero-size allocation! My proposal is to kill this one with fire. As an alternative, I would like to propose grow_in_place() with the signature:
    /// Tries to grow an allocation in-place. If that isn’t possible, this function is a no-op.
    /// # Safety
    ///
    /// `ptr` must point to a valid allocation of `old_layout.size()` bytes, which was
    /// originally allocated with an alignment of `old_layout.align()`, and `new_size > old_layout.size()`.
    pub unsafe fn grow_in_place(
        &self,
        ptr: NonNull<u8>,
        old_layout: Layout,
        new_size: usize,
    ) -> Result<NonNull<[u8]>, AllocError>;
    

    If a user calls this function and gets AllocError, they can decide whether they want to reallocate, copy, and free — it's not foisted upon them.
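    The caller-side fallback this proposal enables can be sketched with the stable std::alloc API. realloc_by_hand below is a hypothetical helper, and the point of interest is that it copies only the initialized len bytes rather than the whole old capacity:

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::ptr;

/// Hypothetical caller-side fallback after a failed in-place grow:
/// allocate a new block, copy only the `len` initialized bytes, free the old.
unsafe fn realloc_by_hand(
    old: *mut u8,
    old_layout: Layout,
    new_size: usize,
    len: usize,
) -> *mut u8 {
    debug_assert!(len <= old_layout.size() && old_layout.size() <= new_size);
    let new_layout = Layout::from_size_align(new_size, old_layout.align()).unwrap();
    let new = alloc(new_layout);
    if !new.is_null() {
        // Only the initialized prefix needs to be preserved.
        ptr::copy_nonoverlapping(old, new, len);
        dealloc(old, old_layout);
    }
    new
}

fn main() {
    unsafe {
        let layout = Layout::from_size_align(4, 1).unwrap();
        let p = alloc(layout);
        assert!(!p.is_null());
        p.write(42);
        // Grow 4 -> 8 bytes, with only 1 byte initialized so far.
        let q = realloc_by_hand(p, layout, 8, 1);
        assert!(!q.is_null());
        assert_eq!(q.read(), 42);
        dealloc(q, Layout::from_size_align(8, 1).unwrap());
    }
}
```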

    • shrink(): This one has all the same pitfalls and special cases as grow(). Implementing Allocator is so tricky and exhausting because you have to handle absurd situations like user code allocating 1 byte, "shrinking" it to 1 byte (with a higher alignment — time to reallocate!), shrinking it to zero bytes (wait, isn't that just deallocate()...?), shrinking it again to zero bytes but with a higher alignment (wait, is it bad if the dangling pointer isn't aligned enough...?), and then "growing" it to zero bytes with an even higher alignment (you get the picture). Also, what happens if the user calls shrink() with a size of 0 and an alignment of 2^29 and the resulting "dangling" pointer happens to end up right in the middle of your allocator? Admittedly, I'm pretty sure that would never happen, but hopefully you weren't about to try identifying a dangling pointer at runtime. Anyway, to match grow_in_place, I propose replacing shrink() with shrink_in_place:
    /// Shrinks the allocation. This function always succeeds and never reallocates.
    /// # Safety
    ///
    /// `ptr` must point to a valid allocation of `old_layout.size()` bytes, which was originally
    /// allocated with an alignment of `old_layout.align()`, and `new_size` must be in `1..old_layout.size()`.
    pub unsafe fn shrink_in_place(
        &self,
        ptr: NonNull<u8>,
        old_layout: Layout,
        new_size: usize,
    ) -> NonNull<[u8]>
    

    In general, the Allocator methods should be as unsafe as possible, because it's easy to create a wrapper with extra safety checks, but doing the reverse is impossible.

    On an unrelated note, we really need String::new_in(). That might be for another thread, though...

    abgros at 2025-03-28 06:09:59

  473. Something else I thought of: it might be useful to have a method like grow_up_to(), which tries to grow an allocation as much as it can, even if it can't completely fulfill the request:

    /// Tries growing an allocation in-place. If that isn't possible, the allocator will grow
    /// as much as it is able to. This function always succeeds and never reallocates.
    /// # Safety
    ///
    /// `ptr` must point to a valid allocation of `old_layout.size()` bytes, which was
    /// originally allocated with an alignment of `old_layout.align()`, and `new_size > old_layout.size()`.
    unsafe fn grow_up_to(
        &self,
        ptr: NonNull<u8>,
        old_layout: Layout,
        new_size: usize,
    ) -> NonNull<[u8]>
    

    But I'm not sure whether the stdlib collection types would be able to use this as an optimization.

    abgros at 2025-03-28 06:39:04

    • grow(): this one is horrendous. It stuffs together four separate tasks into one function: trying to grow, allocating, copying, freeing.

    We need the semantics to be such that you can implement grow by calling C's realloc; hence it does all the reallocating/copying/freeing if it can't resize in place.
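    A sketch of that implementation strategy using the stable std::alloc::realloc, which itself may resize in place or allocate-copy-free behind the scenes (grow_via_realloc is a hypothetical helper, not the trait method):

```rust
use std::alloc::{alloc, dealloc, realloc, Layout};

/// Hypothetical grow that defers entirely to C-style realloc.
/// Note: std::alloc::realloc keeps the original alignment; requesting a
/// different alignment could not be expressed through this interface.
unsafe fn grow_via_realloc(ptr: *mut u8, old_layout: Layout, new_size: usize) -> *mut u8 {
    assert!(new_size >= old_layout.size());
    realloc(ptr, old_layout, new_size)
}

fn main() {
    unsafe {
        let layout = Layout::from_size_align(16, 8).unwrap();
        let p = alloc(layout);
        assert!(!p.is_null());
        p.write(7);
        let q = grow_via_realloc(p, layout, 32);
        assert!(!q.is_null());
        assert_eq!(q.read(), 7); // contents preserved across the grow
        dealloc(q, Layout::from_size_align(32, 8).unwrap());
    }
}
```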

    Jacob Lifshay at 2025-03-28 06:50:56

    • grow(): this one is horrendous. It stuffs together four separate tasks into one function: trying to grow, allocating, copying, freeing.

    we need the semantics to be such that you can implement it by calling C's realloc, hence why it does all the reallocating/copying/freeing if it can't resize in place.

    In principle, I guess you could implement grow_in_place() with C's realloc. It might end up moving and copying your data against your wishes, but user code should have no way of finding this out (the old pointer is always invalidated).

    abgros at 2025-03-28 07:11:00

  474. For the case of zero-sized allocations, the intent is that the allocator is not required to special case it to avoid allocating memory. Instead it is the caller's job to avoid invoking the allocator in the first place when it wants to avoid allocating memory for zero-sized allocations (for example, Vec does this). It's therefore perfectly fine for the allocator to implement ZST allocation in an inefficient manner. Admittedly the Allocator trait documentation could be improved to make this clearer.

    Also, what happens if the user calls shrink() with a size of 0 and an alignment of 2^29 and the resulting "dangling" pointer happens to end up right in the middle of your allocator?

    Zero-sized "allocations" are allowed to be in the middle of other allocations and aren't considered to overlap them. See https://github.com/rust-lang/reference/pull/1657 which discusses this.

    Amanieu d'Antras at 2025-03-28 11:31:00

  475. Zero-sized "allocations" are allowed to be in the middle of other allocations and aren't considered to overlap them. See https://github.com/rust-lang/reference/pull/1657 which discusses this.

    I don't think we reached a consensus on this question there, and it is certainly not reflected in the PR.

    So no, until we have explicitly decided otherwise, zero-sized allocations cannot be in the middle of other allocations, not for the purpose of the Abstract Machine. That said, the Allocator trait does not have to return AM allocations, it just has to return pointers that satisfy certain properties, so the question for Allocator implementations is somewhat independent.

    Ralf Jung at 2025-03-28 12:56:25

  476. you're allowed to call it with a different alignment than your original allocation (why...?)

    It’s not the common case, but I’ve written code that makes use of this (in https://github.com/Jules-Bertholet/unsized-vec/).

    Jules Bertholet at 2025-03-28 13:32:16

  477. When discussing some things on Zulip, one of the things I saw was the desire for Allocators to work with, or be part of, the proposed store API, so that the store API does not make the Allocator API redundant in a lot of cases. To make this API forward-compatible, so it can be stabilized while the details of the store API are worked on, I came up with this: https://github.com/Keith-Cancel/allocator_api_forward_compat

    Keith at 2025-04-07 07:00:26

  478. @abgros re: realloc/grow, I don't know if any allocation libraries actually do this in the wild, but for larger allocations allocators may implement realloc more efficiently by remapping the original pages to new virtual memory locations using the OS's virtual memory API (e.g. mremap(2) on Linux). In this case, realloc/grow would still need to return a new pointer, but the physical memory pages wouldn't actually have been copied.

    Obviously, this is only worth it for large allocations as changing virtual memory mappings has a cost.

    Peter Todd at 2025-04-08 11:02:09