#[repr(C)] enum bit width determination algo does not match that of the C compiler
In C, support for enum determinants that cannot be represented by an int is implementation-defined. Many compilers choose a different basic type for representation, based on the types or the values of the determinants. This is where the algorithm currently used by Rust for #[repr(C)] enums may differ from the one used by the C compiler, although the difference can be observed in fairly marginal cases.
GCC uses the types of the determinant expressions to find the best fit, so this is a 32-bit type on 64-bit Linux:
typedef enum {
A = 0x80000001,
B = -0x80000000,
} C;
Yet this is 64-bit:
typedef enum {
A = 0x80000001,
B = -2147483648 /* -0x80000000 */
} C;
This is because the type of integer constant 0x80000000 is determined as unsigned int due to its hexadecimal notation, and the negation preserves the type.
Rust first coerces all discriminants to isize and then apparently works out the best fitting representation type from the value range, where fitting means that negative values are preserved as such. So this ends up being 64-bit:
#[repr(C)]
enum C {
A = 0x80000001,
B = -0x80000000,
}
fn main() {
println!("{}", std::mem::size_of::<C>());
}
I'm uncertain as to which approach is the best for fixing this. Trying to match the behavior of C compilers quirk-for-quirk does not seem to be feasible: there may be more than one compiler per target, and in fact the behavior even changes with compiler options, as e.g. the type of an integer constant is determined differently pre-C99 and post-C99. Perhaps a more conservative solution would be to lint on discriminant values that are out of the libc::c_int domain and suggest using fixed-width representations such as #[repr(u32)].
Credit goes out to @retep998 for discovering this.
Mikhail Zabaluev at 2015-10-09 07:17:05
See also
-fshort-enums. Some ARM ABIs require-fshort-enumsand oithers require-fno-short-enums.Brian Smith at 2015-10-22 04:08:38
@briansmith: I think
-fshort-enumsneeds to be represented by#[repr(u16)]and the like. The concern here is about not doing what the C compiler does by default.Mikhail Zabaluev at 2015-10-22 19:52:21
@briansmith: I think -fshort-enums needs to be represented by #[repr(u16)] and the like. The concern here is about not doing what the C compiler does by default.
I think what the C compiler does by default is sometimes determined by the ABI. For example, I believe sometimes GCC's default for that flag differs depending on how GCC was configured when it was built.
IMO, instead of matching the default of "the" C compiler (in fact, there is often more than one C compiler, each of which may have a different default), Rust should default to matching the ABI.
Brian Smith at 2016-05-27 23:51:53
I tripped over this while trying to add
waitidto libc. This function is specified by POSIX to take one argument of typeidtype_t, andidtype_tis specified to be a C enum. (As far as I know, this is the only such case within POSIX.) At present, because of this (and arguably also #34641) there is no way to be sure that a Rust shim for a C function that takes or returns a C enum type is ABI-correct.Zack Weinberg at 2017-01-06 20:08:57
Rust first coerces all discriminants to
isizeThis by itself is arguably a bug, even on boring platforms like x86 Linux. Simple constructive proof:
#include <stdio.h> enum E { L = 0, H = 0x100000000 }; int main() { printf("%zu > %zu\n", sizeof(enum E), sizeof(void*)); return 0; }$ cc -m32 -Wall -W -O1 big-enum.c && ./a.out 8 > 4That isn't portable ISO C, but GCC and Clang accept it without warnings unless
-pedanticis used, and the second example in #34641 is basically the same thing but with fewer bits.The original discriminant sizing (and
#[repr(C)]) would coerce values to bothu64andi64to try out the various int types, because constant evaluation was… different back then, and “what type do#[repr(C)]initializers have?” wasn't really a well-formed question. But it is now.
So, trying to summarize the problems here:
#[repr(C)]on Windows considers 64-bit types and should not; this looks relatively easy to fix.- C numerics don't match Rust (
-0x80000000vs.-2147483648); this is hard. I don't know if there's any solution besides requiring users to constant-fold according to C rules before using the values in Rust, or maybe writing a (procedural?) macro to do that. Is there some way to at least lint for this? isizemay be too small for some C enums; this can be worked around by manually applying a fixed-sizerepr, assuming there aren't any cases that are larger than some target'sisizebut small enough for ABI differences to differ.- Somewhere along the line the target architecture changed from an
enumto a string, so thei32lower bound became the default, and it's probably no longer correct for all targets (MSP430?), and then there's-fshort-enums. Fixing this is probably tedious but not conceptually difficult.
As for
idtype_t, it's probably not using huge values, so the main problem there would be # 4.Jed Davis at 2017-11-23 07:14:25