Provide a way to const-initialize vendor-specific vector types

724cc36
Opened by Steven Fackler at 2024-12-01 15:03:24

The platform independent simd types have constructors which are const fns, so they can be used in things like lookup tables: https://raw.githubusercontent.com/sfackler/stream-vbyte64/ea0d5b0afdf97f31473fafbf33d460fbbb313785/src/tables.rs.

However, the same is not the case for the vendor-specific types like __m256i. There are platform intrinsics which initialize those types (e.g. _mm256_setr_epi32 is equivalent to i32x8::new) but those are not const fns.

One way to work around this is by type-punning through unions:

#![feature(stdsimd)]

use std::arch::x86_64::*;

#[repr(C)]
union Pun {
    a: __m256i,
    b: [u32; 8],
}

static FOO: Pun = Pun { b: [1, 2, 3, 4, 5, 6, 7, 8] };

fn main() {
    let x = unsafe { _mm256_extract_epi32(FOO.a, 3) };
    println!("{}", x);
}

This kind of union is pretty common in SIMD-related C code I've seen, and I think well defined behavior from what the unions RFC describes. Is this the "right" way to do this? Can/should we make functions like _mm256_set_epi32 const?

cc @alexcrichton

  1. cc @gnzlbg

    @sfackler to confirm, is this something that's expected from C/C++? Or is this a "nice to have" in Rust?

    Alex Crichton at 2018-03-05 10:32:12

  2. The following should work in nightly as soon as https://github.com/rust-lang-nursery/stdsimd/pull/338 is merged:

    const A: u32x8 = u32x8::new(1, 2, 3, 4, 5, 6, 7, 8);
    let b: __m256i = A.into_bits(); 
    

    However, IIUC @sfackler correctly, the problem is that we can't make b const because into_bits is not a const fn. @sfackler did I understand you correctly?

    If so, this has two causes:

    • mem::transmute is not a const fn
    • into_bits is a trait method

    First, we need to be able to do transmutes in const fns somehow (*). The plan still is to have zero-cost bitwise conversions between vector types of the same size. That is, between architecture specific ones like __m256 and __m256i, between portable vector types like u32x8 and u64x4, and between portable vector types and architecture specific ones (e.g. _m256 and u32x8).

    Currently, these conversions have zero runtime cost, but as @sfackler mentions, they have an usability cost if they aren't const fns. We should eliminate this cost.

    Once we can perform const mem::transmutes, @sfackler could write:

    const B: __m256i = mem::transmute(u32x8::new(1, 2, 3, 4, 5, 6, 7, 8)); 
    

    In the longer term I would prefer if the following would just work:

    const B: __m256i = u32x8::new(1, 2, 3, 4, 5, 6, 7, 8).into_bits(); 
    

    But we would need const trait methods for this, or at least I don't see a way around that.


    (*) EDIT: We could provide our own simd::transmute function that uses an union internally to perform const transmutes. . . like this:

    #![feature(const_fn, untagged_unions, stdsimd)]
    use std::arch::x86_64::*;
    use std::simd::u32x8;
    
    // Marker trait for safe bitwise conversions
    pub unsafe trait SafeFromBits<Other> {}
    unsafe impl SafeFromBits<u32x8> for __m256i {}
    
    mod simd {
        #[allow(unions_with_drop_fields)]
        union U<A, B> {
            a: A,
            b: B,
        }
        pub const fn transmute<A, B: ::SafeFromBits<A>>(x: A) -> B {
            unsafe { U::<A, B> { a: x }.b }
        }
    }
    
    static FOO: __m256i = simd::transmute(u32x8::new(1, 2, 3, 4, 5, 6, 7, 8));
    
    fn main() {
        println!("{:?}", FOO);
    }
    

    Just because we can does not mean we should. I'd still prefer if into_bits() would just be const fn.

    gnzlbg at 2018-03-05 10:58:31

  3. The transmute is trivial.

    Trait methods are currently beyond the horizon (https://github.com/rust-lang/rfcs/pull/2237 has been closed and is currently brainstormed out of band in https://github.com/Centril/rfc-effects)

    Oli Scherer at 2018-04-20 10:13:53

  4. Why not make SIMD intrinsics const fns if it's applicable? IIUC _mm256_set_epi32 internally boils down to mem::transmute(i32x8::new(e0, e1, e2, e3, e4, e5, e6, e7)), so if we'll make core::mem::transmute const, or will use @gnzlbg's workaround (which can be left private) for the time being, then I think it should be relatively easy to pull off.

    Artyom Pavlov at 2018-10-19 16:44:57

  5. I don't see any reason against doing that, at least for the intrinsics that initialize vectors.

    gnzlbg at 2018-10-19 16:49:02

  6. BTW it's already possible to use transmute as const fn.

    That do you think about other "pure" intrinsics? It could be left for later, but I think we should consider making them constant as well in future.

    Artyom Pavlov at 2018-10-19 17:02:52

  7. That do you think about other "pure" intrinsics?

    Many intrinsics are pure, but most of them call llvm intrinsics that cannot be called from const fn, so I don't know whether there is anything that can be done about these. In any case, I think its ok to decide here on a case-by-case basis. The vector constructors look like a good place to start.

    gnzlbg at 2018-10-19 17:38:40

  8. transmute is const since 1.56. This issue seems to be solved now.

    https://rust.godbolt.org/z/s7bEEj3W9

    use std::arch::x86_64::*;
    use std::mem::transmute;
    
    pub const FOO: __m256i = unsafe { transmute([1, 2, 3, 4, 5, 6, 7, 8]) };
    
    #[target_feature(enable = "avx2")]
    pub unsafe fn test() -> u32 {
        _mm256_extract_epi32(FOO, 3) as u32
    }
    
    example::test:
            mov     eax, 4
            ret
    

    Nugine at 2022-09-25 01:47:44