backtraces broken on the Android bot

eca6213
Opened by Daniel Micay at 2021-10-14 06:44:57

PIE is now required by Android (#17437) but it breaks backtrace support on the Android bot. It may just need to be updated to the current NDK / Android versions.

  1. I've locally enabled PIE for Android and tested this with the latest NDK (r10d), generated for android-21, on an Android emulator running the latest API (android-22). It still fails with:

    test [run-pass] run-pass/backtrace.rs ... FAILED
    
    failures:
    
    ---- [run-pass] run-pass/backtrace.rs stdout ----
    
    error: test run failed!
    status: exit code: 101
    command: x86_64-apple-darwin/test/run-pass/backtrace.stage2-arm-linux-androideabi
    stdout:
    ------------------------------------------
    
    ------------------------------------------
    stderr:
    ------------------------------------------
    thread '<main>' panicked at 'bad output: thread '<main>' panicked at 'explicit panic', /Users/tamird/src/rust/src/test/run-pass/backtrace.rs:23
    stack backtrace:
       1: 0xb6c53da7 - <unknown>
       2: 0xb6c5ea8b - <unknown>
       3: 0xb6c0df37 - <unknown>
       4: 0xb6f87fb7 - <unknown>
       5: 0xb6f87d73 - <unknown>
       6: 0xb6f8b913 - <unknown>
       7: 0xb6d0dfe3 - <unknown>
       8: 0xb6d0dfbb - <unknown>
       9: 0xb6d0dfbb - <unknown>
      10: 0xb6d0dfbb - <unknown>
      11: 0xb6d0dfbb - <unknown>
      12: 0xb6d0dfbb - <unknown>
      13: 0xb6d0dfbb - <unknown>
      14: 0xb6d0dfbb - <unknown>
      15: 0xb6d0dfbb - <unknown>
      16: 0xb6d0dfbb - <unknown>
      17: 0xb6d0dfbb - <unknown>
      18: 0xb6d0dfbb - <unknown>
      19: 0xb6d0dfbb - <unknown>
      20: 0xb6d0dfbb - <unknown>
      21: 0xb6d0dfbb - <unknown>
      22: 0xb6d0dfbb - <unknown>
      23: 0xb6d0dfbb - <unknown>
      24: 0xb6d0dfbb - <unknown>
      25: 0xb6d0dfbb - <unknown>
      26: 0xb6d0dfbb - <unknown>
      27: 0xb6d0dfbb - <unknown>
      28: 0xb6d0dfbb - <unknown>
      29: 0xb6d0dfbb - <unknown>
      30: 0xb6d0dfbb - <unknown>
      31: 0xb6d0dfbb - <unknown>
      32: 0xb6d0dfbb - <unknown>
      33: 0xb6d0dfbb - <unknown>
      34: 0xb6d0dfbb - <unknown>
      35: 0xb6d0dfbb - <unknown>
      36: 0xb6d0dfbb - <unknown>
      37: 0xb6d0dfbb - <unknown>
      38: 0xb6d0dfbb - <unknown>
      39: 0xb6d0dfbb - <unknown>
      40: 0xb6d0dfbb - <unknown>
      41: 0xb6d0dfbb - <unknown>
      42: 0xb6d0dfbb - <unknown>
      43: 0xb6d0dfbb - <unknown>
      44: 0xb6d0dfbb - <unknown>
      45: 0xb6d0dfbb - <unknown>
      46: 0xb6d0dfbb - <unknown>
      47: 0xb6d0dfbb - <unknown>
      48: 0xb6d0dfbb - <unknown>
      49: 0xb6d0dfbb - <unknown>
      50: 0xb6d0dfbb - <unknown>
      51: 0xb6d0dfbb - <unknown>
      52: 0xb6d0dfbb - <unknown>
      53: 0xb6d0dfbb - <unknown>
      54: 0xb6d0dfbb - <unknown>
      55: 0xb6d0dfbb - <unknown>
      56: 0xb6d0dfbb - <unknown>
      57: 0xb6d0dfbb - <unknown>
      58: 0xb6d0dfbb - <unknown>
      59: 0xb6d0dfbb - <unknown>
      60: 0xb6d0dfbb - <unknown>
      61: 0xb6d0dfbb - <unknown>
      62: 0xb6d0dfbb - <unknown>
      63: 0xb6d0dfbb - <unknown>
      64: 0xb6d0dfbb - <unknown>
      65: 0xb6d0dfbb - <unknown>
      66: 0xb6d0dfbb - <unknown>
      67: 0xb6d0dfbb - <unknown>
      68: 0xb6d0dfbb - <unknown>
      69: 0xb6d0dfbb - <unknown>
      70: 0xb6d0dfbb - <unknown>
      71: 0xb6d0dfbb - <unknown>
      72: 0xb6d0dfbb - <unknown>
      73: 0xb6d0dfbb - <unknown>
      74: 0xb6d0dfbb - <unknown>
      75: 0xb6d0dfbb - <unknown>
      76: 0xb6d0dfbb - <unknown>
      77: 0xb6d0dfbb - <unknown>
      78: 0xb6d0dfbb - <unknown>
      79: 0xb6d0dfbb - <unknown>
      80: 0xb6d0dfbb - <unknown>
      81: 0xb6d0dfbb - <unknown>
      82: 0xb6d0dfbb - <unknown>
      83: 0xb6d0dfbb - <unknown>
      84: 0xb6d0dfbb - <unknown>
      85: 0xb6d0dfbb - <unknown>
      86: 0xb6d0dfbb - <unknown>
      87: 0xb6d0dfbb - <unknown>
      88: 0xb6d0dfbb - <unknown>
      89: 0xb6d0dfbb - <unknown>
      90: 0xb6d0dfbb - <unknown>
      91: 0xb6d0dfbb - <unknown>
      92: 0xb6d0dfbb - <unknown>
      93: 0xb6d0dfbb - <unknown>
      94: 0xb6d0dfbb - <unknown>
      95: 0xb6d0dfbb - <unknown>
      96: 0xb6d0dfbb - <unknown>
      97: 0xb6d0dfbb - <unknown>
      98: 0xb6d0dfbb - <unknown>
      99: 0xb6d0dfbb - <unknown>
      100: 0xb6d0dfbb - <unknown>
     ... <frames omitted>
    ', /Users/tamird/src/rust/src/test/run-pass/backtrace.rs:54
    
    ------------------------------------------
    
    thread '[run-pass] run-pass/backtrace.rs' panicked at 'explicit panic', /Users/tamird/src/rust/src/compiletest/runtest.rs:1525
    

    Tamir Duberstein at 2015-04-29 20:49:17

  2. Triage: we're in the process of moving all of the bots to Travis/AppVeyor, so I'm not sure whether this (1) is still true or (2) will persist there.

    Steve Klabnik at 2017-01-03 17:34:26

  3. So after a bit of investigation, I found that Android internally uses libunwind to handle its unwinding. It might be a good idea to try having Rust use that. If so, I might start implementing it.

    Robin Lambertz at 2017-07-22 18:04:27

  4. It's an option. We only used libunwind for a handful of releases. We've started writing our own unwinder since none of the off-the-shelf options were reliable or performant enough for our needs. That's getting close to ready, and we will be including it in the NDK.

    Dan Albert at 2017-07-22 18:35:21

  5. @DanAlbert Oh, OK. So, if I make FFI bindings to libunwindstack (when it's in the NDK) and use them in libstd, would I need to link to it statically?

    Robin Lambertz at 2017-07-22 20:03:08

  6. Either statically linked or included in the APK. The former is probably simpler.

    Dan Albert at 2017-07-22 20:17:13

  7. Right. Thanks for the insights :D. I'll try to figure out how libunwindstack works and hopefully draft a PR over the next few days.

    EDIT: Just found out libunwindstack is written in C++. Hopefully that won't hinder me too much, but it is going to be a bit more painful than I had anticipated at first.

    Robin Lambertz at 2017-07-22 20:22:09

  8. So after some unfruitful attempts at making bindgen and libunwindstack work together (I couldn't get bindgen to generate constructors for unwindstack::Maps), I decided to investigate other options.

    In doing so, I found out that backtrace-rs has no trouble getting a correct stacktrace. Furthermore, Rust's tracing part appears to be correct (comparing the two with RUST_BACKTRACE=full, the stack depth and pc values look right), so the printing module seems to be at fault.

    I'm going to be looking into the differences between backtrace-rs and Rust's built-in backtracing. I already have a few ideas.

    Robin Lambertz at 2017-07-23 13:49:15

  9. I've been investigating this over the last few days and I think the issue is a missing dl_iterate_phdr function. It looks like that function was removed from Android at some point.

    Andy Lowry at 2017-11-16 09:06:01

  10. Found this in the NDK which might be the issue:

    #if __ANDROID_API__ >= 21
    int dl_iterate_phdr(int (*)(struct dl_phdr_info*, size_t, void*), void*) __INTRODUCED_IN(21);
    #endif /* __ANDROID_API__ >= 21 */
    

    Andy Lowry at 2017-11-16 10:45:29

  11. You're missing a small amount of scope there: that's wrapped in an #ifdef __arm__. It's more obvious what's going on if you look at the unprocessed source in bionic.

    dl_iterate_phdr is not available for ARM Android (for whatever reason) until android-21 (Lollipop). I don't know the current state of libunwindstack, but the goal is for it to be usable for the NDK (so supporting ICS). If the dl_iterate_phdr call is coming from LLVM's libunwind, there was a patch that was submitted recently to add a fallback implementation if it's not available: https://reviews.llvm.org/D39468.

    Dan Albert at 2017-11-16 18:10:36
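
The availability rule described in the comment above can be written down as a small predicate. This is only a sketch of that rule as stated in the thread: the `Arch` enum and the function name are illustrative, and the assumption that non-ARM architectures are unaffected by the guard is inferred from the comment rather than checked against bionic.

```rust
/// CPU families an Android target might use (illustrative names only).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Arch {
    Arm,
    Arm64,
    X86,
    X86_64,
}

/// Mirrors the header guard discussed above: on 32-bit ARM the declaration
/// of `dl_iterate_phdr` is only visible from API level 21 (Lollipop) on.
pub fn has_dl_iterate_phdr(arch: Arch, api_level: u32) -> bool {
    match arch {
        Arch::Arm => api_level >= 21,
        // Assumption: the guard is ARM-specific, so other architectures
        // are treated as always having the function.
        _ => true,
    }
}
```

Under this rule, an ARM build targeting an API level below 21 would need a fallback such as the one proposed in the LLVM patch linked above, while the android-21 toolchain from the first comment would pass the check.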

  12. Thanks. I was trying to work out exactly where that function came from.

    I was ignoring non-arm androids, as they are not that common atm.

    Is there anything we need to do to test and then consume this fix? Or is it just a case of waiting and re-enabling the tests once the change propagates to rust?

    Andy Lowry at 2017-11-17 16:59:05

  13. > I was ignoring non-arm androids, as they are not that common atm.

    idk about rust developers, but most of the time Android developers will use x86 emulators since they are much faster than an arm emulator.

    > Is there anything we need to do to test and then consume this fix? Or is it just a case of waiting and re-enabling the tests once the change propagates to rust?

    Should be automatic once you get the update. Do you periodically update from upstream, or do you track AOSP's copy (a fair number of projects do that for Android libraries) or something? If you're tracking AOSP lmk and I'll get it updated.

    Dan Albert at 2017-11-17 19:19:42

  14. We have our own fork of AOSP that we need to bring back towards the official. We would need to bring any change into our own fork.

    Andy Lowry at 2017-11-28 10:46:26

  15. Okay, I'll try to find some time this week to get AOSP's unwinder updated so you at least get this fix when you sync up with upstream.

    Dan Albert at 2017-11-28 21:35:16

  16. Two weeks late, but better than never: https://android-review.googlesource.com/c/platform/external/libunwind_llvm/+/567913

    Dan Albert at 2017-12-13 22:13:57

  17. Is there any progress here?

    Kayo Phoenix at 2019-12-24 11:33:29

  18. I bumped into this issue myself today. I built the binary with this command:

    cargo build --target armv7-linux-androideabi --release
    

    Rust compiler version:

    $ rustc --version
    rustc 1.41.0 (5e1a79984 2020-01-27)
    

    This is what the backtrace looks like:

    I/flutter (22222): ⛔  main - stderr:
    I/flutter (22222): thread '<unnamed>' panicked at 'internal error: entered unreachable code', components/funder/src/handler/canceler.rs:309:43
    I/flutter (22222): stack backtrace:
    I/flutter (22222):    0: <unknown>
    I/flutter (22222):    1: <unknown>
    I/flutter (22222):    2: <unknown>
    I/flutter (22222):    3: <unknown>
    I/flutter (22222):    4: <unknown>
    I/flutter (22222):    5: <unknown>
    I/flutter (22222):    6: <unknown>
    I/flutter (22222):    7: <unknown>
    I/flutter (22222):    8: <unknown>
    I/flutter (22222):    9: <unknown>
    I/flutter (22222):   10: <unknown>
    I/flutter (22222):   11: <unknown>
    I/flutter (22222):   12: <unknown>
    I/flutter (22222):   13: <unknown>
    I/flutter (22222):   14: <unknown>
    I/flutter (22222):   15: <unknown>
    I/flutter (22222):   16: <unknown>
    I/flutter (22222):   17: <unknown>
    I/flutter (22222):   18: <unknown>
    I/flutter (22222):   19: <unknown>
    I/flutter (22222): 
    I/flutter (22222): ⛔  main - stderr:
    I/flutter (22222):   20: <unknown>
    I/flutter (22222):   21: <unknown>
    I/flutter (22222): note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
    I/flutter (22222): [2020-03-17T12:50:00Z ERROR stcompact::server_loop] node() error: IndexClientError(AppServerClosed)
    I/flutter (22222): 
    

    real at 2020-03-17 13:25:39

  19. Latest output inside the emulator. This seems somewhat promising, as at least some symbols are showing up. It's not clear to me why we don't have them for the test case but do for std.

    thread 'main' panicked at 'bad output: thread 'main' panicked at 'explicit panic', /checkout/src/test/ui/backtrace.rs:17:9
    stack backtrace:
       0: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
       1: core::fmt::write
       2: rust_metadata_std_1b375e3cf1aedcda5e72be7c39da1f61
       3: rust_metadata_std_1b375e3cf1aedcda5e72be7c39da1f61
       4: rust_metadata_std_1b375e3cf1aedcda5e72be7c39da1f61
       5: std::panicking::rust_panic_with_hook
       6: <unknown>
       7: <unknown>
       8: <unknown>
       9: <unknown>
      10: std::rt::lang_start_internal
      11: <unknown>
      12: __libc_init
    note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
    ', /checkout/src/test/ui/backtrace.rs:67:5
    

    Mark Rousskov at 2020-04-30 15:52:52
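
To compare runs like the one above, it can help to count how many frames were symbolized. The following is a minimal sketch that assumes the two plain-text frame layouts seen in this thread (`N: symbol` and the older `N: 0xADDR - symbol`). The type and function names are made up for illustration, and logcat-prefixed lines such as the `I/flutter` output earlier are not handled.

```rust
/// Counts of symbolized and unsymbolized frames in a printed backtrace.
#[derive(Debug, Default, PartialEq)]
pub struct FrameSummary {
    pub resolved: usize,
    pub unknown: usize,
}

/// Returns the symbol column of a frame line, or `None` if the line is
/// not a frame (for example the panic message or the trailing note).
fn frame_symbol(line: &str) -> Option<&str> {
    let (index, rest) = line.trim().split_once(": ")?;
    // A frame line starts with a decimal frame index.
    index.parse::<usize>().ok()?;
    // The older layout prints the program counter before the symbol.
    let symbol = match rest.split_once(" - ") {
        Some((pc, sym)) if pc.starts_with("0x") => sym,
        _ => rest,
    };
    Some(symbol.trim())
}

/// Tallies resolved and `<unknown>` frames in a backtrace dump.
pub fn summarize(backtrace: &str) -> FrameSummary {
    let mut summary = FrameSummary::default();
    for symbol in backtrace.lines().filter_map(frame_symbol) {
        if symbol == "<unknown>" {
            summary.unknown += 1;
        } else {
            summary.resolved += 1;
        }
    }
    summary
}
```

Passing the captured stderr text to `summarize` gives a pair of counts that can be tracked across NDK or toolchain updates instead of comparing dumps by eye.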

  20. I'm encountering this issue on an Android 11 device... Any luck resolving it? I only get <unknown>.

    s1341 at 2021-04-07 12:23:07

  21. backtrace-rs doesn't work either. unwind at least produces a list of addresses, but crashes with a segfault when it hits a bad address.

    s1341 at 2021-04-07 13:28:01

  22. Hi, after many years, is there any solution? This is a critical issue, because stack traces are quite important to debug issues in production environments!

    At a minimum I need a list of addresses (without a segfault when a bad address is hit!) so that I can copy those addresses and symbolize them on my computer.

    fzyzcjy at 2021-10-12 12:46:34
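
Since PIE binaries are loaded at a different base on each run, raw addresses are only useful offline once they are made relative to the module that contains them. The sketch below shows one way to do that, assuming the process can read its own `/proc/self/maps`. The `Mapping` type and both function names are hypothetical. The result is a file offset, which matches the ELF virtual address only when the segment's offset and address coincide.

```rust
/// One executable mapping parsed from a `/proc/<pid>/maps` line.
#[derive(Debug, PartialEq)]
pub struct Mapping {
    pub start: u64,
    pub end: u64,
    pub file_offset: u64,
    pub path: String,
}

/// Parses a maps line such as
/// `b6c00000-b6d00000 r-xp 00001000 b3:19 1234 /system/lib/libexample.so`.
/// Returns `None` for non-executable or anonymous mappings.
/// Paths containing spaces are truncated at the first space.
pub fn parse_maps_line(line: &str) -> Option<Mapping> {
    let mut fields = line.split_whitespace();
    let (start, end) = fields.next()?.split_once('-')?;
    let perms = fields.next()?;
    let offset = fields.next()?;
    let _device = fields.next()?;
    let _inode = fields.next()?;
    let path = fields.next()?; // absent for anonymous mappings
    if !perms.contains('x') {
        return None;
    }
    Some(Mapping {
        start: u64::from_str_radix(start, 16).ok()?,
        end: u64::from_str_radix(end, 16).ok()?,
        file_offset: u64::from_str_radix(offset, 16).ok()?,
        path: path.to_string(),
    })
}

/// Converts an absolute address into (module path, offset within the file).
pub fn relativize(addr: u64, maps: &[Mapping]) -> Option<(&str, u64)> {
    maps.iter()
        .find(|m| m.start <= addr && addr < m.end)
        .map(|m| (m.path.as_str(), addr - m.start + m.file_offset))
}
```

The resulting module and offset pairs can be logged on the device and later passed to a host-side symbolizer together with an unstripped copy of the same library.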

  23. @s1341 Have you solved it? Thanks!

    fzyzcjy at 2021-10-13 10:11:36

  24. I think backtraces are working for me... using backtrace-rs.

    s1341 at 2021-10-14 05:27:13

  25. @s1341 Hi, do you use it in a production environment, i.e. built with --release, or only in debug builds? I sometimes run into problems there...

    fzyzcjy at 2021-10-14 06:21:40

  26. I use it in release. What issues are you facing? Can you be more specific about when you have problems?

    s1341 at 2021-10-14 06:42:50

  27. @s1341 Hi thanks for your reply! My issue: https://github.com/rust-lang/backtrace-rs/issues/445

    fzyzcjy at 2021-10-14 06:44:57