Using ToSocketAddrs seems to remember EMFILE on the same thread

39120c1
Opened by Sean McArthur at 2025-03-11 19:37:27

This was noticed in https://github.com/hyperium/hyper/issues/1422, where a user tried to open more connections than their maximum number of file descriptors allowed, and saw the EMFILE error. After that, every call to to_socket_addrs that required a DNS lookup kept failing on that thread, even once the sockets had been closed. However, trying the same DNS lookup on a new thread would work fine.
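
Since a new thread is not affected, one possible stopgap is to run each lookup on a short-lived thread, so that whatever per-thread resolver state got stuck is thrown away with it. This is only a sketch under the assumption that the sticky state is thread-local; `resolve_on_fresh_thread` is a hypothetical helper name, not something provided by std or hyper.

```rust
use std::io;
use std::net::{SocketAddr, ToSocketAddrs};
use std::thread;

/// Hypothetical helper: resolve `host` (for example "localhost:3000") on a
/// freshly spawned thread and hand the addresses back to the caller.
fn resolve_on_fresh_thread(host: &str) -> io::Result<Vec<SocketAddr>> {
    let host = host.to_owned();
    thread::spawn(move || {
        // The lookup runs here, on the new thread, not on the caller's thread.
        host.to_socket_addrs().map(|addrs| addrs.collect::<Vec<_>>())
    })
    .join()
    // `join` only fails if the spawned thread panicked.
    .map_err(|_| io::Error::new(io::ErrorKind::Other, "resolver thread panicked"))?
}
```

The resulting addresses can be passed to `TcpStream::connect(&addrs[..])`. Spawning a thread per lookup is expensive, so this would only make sense as a fallback after a lookup has already failed.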

I was able to reproduce this using just the standard library here:

use std::net::TcpStream;

fn main() {
    let cnt = 30_000; // adjust for your system
    let host = "localhost:3000"; // using "127.0.0.1:3000" doesn't have the same problem

    let mut sockets = Vec::with_capacity(cnt);
    for i in 0..cnt {
        match TcpStream::connect(host) {
            Ok(tcp) => sockets.push(tcp),
            Err(e) => {
                println!("error {} after {} connects", e, i);
                break;
            }
        }
    }

    drop(sockets);
    println!("closing all sockets");

    // sleep because why not
    ::std::thread::sleep(::std::time::Duration::from_secs(5));

    TcpStream::connect(host).unwrap();

    println!("end");
}

Just start up a local server, and try to run this program against it. Also, notice that if you change from "localhost" to "127.0.0.1", the issue doesn't show up.
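
The difference between the two host strings can be checked without any network access: a string that already parses as a `SocketAddr` is an IP literal and needs no resolver, while a name such as "localhost" does. A small sketch (the helper name `is_ip_literal` is made up for illustration):

```rust
use std::net::SocketAddr;

/// Hypothetical helper: returns true when `addr` is an "ip:port" literal,
/// meaning it can be turned into a SocketAddr without a DNS lookup.
fn is_ip_literal(addr: &str) -> bool {
    addr.parse::<SocketAddr>().is_ok()
}
```

Here `is_ip_literal("127.0.0.1:3000")` is true and `is_ip_literal("localhost:3000")` is false, which matches which of the two inputs triggers the problem.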

  1. Using 127.0.0.1 doesn't touch DNS at all, so it makes sense that you wouldn't see the issue with it.

    Steven Fackler at 2018-02-02 00:27:59

  2. Yea, I was providing some instructions on how we isolated it to likely being related to DNS.

    Sean McArthur at 2018-02-02 00:32:40

  3. FWIW, it appears to work fine for me on Fedora 27 (glibc 2.26). I started the hyper hello example under ulimit -n 4096, then ran the reproducer under the default limit 1024:

    error Device or resource busy (os error 16) after 1021 connects
    closing all sockets
    end
    

    That's 16 == EBUSY, but strace shows me that the resolver indeed gets EMFILE trying to open /etc/hosts and the like. Still, I may not be reproducing the same problem or error recovery.

    Also note that on_resolver_failure() has special behavior for glibc < 2.26. That behavior, or related bugs lurking in glibc, may be what hurts this case.

    Josh Stone at 2018-02-02 22:33:41

  4. I have hit the same issue, and to test it I put together a small stand-alone reproducer: https://gist.github.com/miquels/c47316f7b19a0af3d9927bafef94de35

    If I build and run this on debian 9/stretch, it shows the buggy behaviour. If I then run the same binary on debian 10/buster (aka "testing", not released yet) it works as expected.

    debian 9/stretch glibc version: 2.24
    debian 10/buster glibc version: 2.28

    I ported the same reproducer to C, and sure enough, it shows the same behaviour. This is a bug in glibc that was fixed between 2.25 and 2.28.

    If I set h_errno = 0 after I get an error, the problem goes away. As expected, it is some global state that is not reset.

    I can fix this in the rust reproducer as well, by adding:

        extern { fn __h_errno_location() -> *mut i32; }
        unsafe { *__h_errno_location() = 0 }
    

    So if on_resolver_failure() added this workaround, it would probably solve this issue.

    Miquel van Smoorenburg at 2019-02-05 14:46:21

  5. Triage: Debian 9 EOL was June 2022, so I suggest we close this as obsolete. Or can anyone reproduce this on a still-maintained OS version?

    Martin Nordholts at 2024-11-08 18:57:11

  6. RHEL 7 ELS is ongoing, which matches Rust's own minimum support of glibc 2.17, but I can't reproduce the problem there.

    Josh Stone at 2024-11-08 19:31:52
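
For anyone still stuck on an affected glibc, the h_errno reset from comment 4 could be wrapped around a lookup roughly as below. This is a sketch only, not what std's on_resolver_failure() does: `reset_h_errno` and `resolve_with_reset` are hypothetical names, `__h_errno_location` is a glibc-internal symbol, and the cfg gate assumes a Linux/glibc target (elsewhere the reset is a no-op). In the 2024 edition the block has to be written `unsafe extern "C"`.

```rust
use std::io;
use std::net::{SocketAddr, ToSocketAddrs};

#[cfg(all(target_os = "linux", target_env = "gnu"))]
extern "C" {
    // glibc accessor returning a pointer to the calling thread's h_errno.
    fn __h_errno_location() -> *mut std::os::raw::c_int;
}

/// Hypothetical helper: clear the calling thread's h_errno on glibc.
#[cfg(all(target_os = "linux", target_env = "gnu"))]
fn reset_h_errno() {
    // The pointer refers to thread-local storage owned by glibc and is
    // valid for the lifetime of the calling thread.
    unsafe { *__h_errno_location() = 0 }
}

/// On other targets there is nothing to reset.
#[cfg(not(all(target_os = "linux", target_env = "gnu")))]
fn reset_h_errno() {}

/// Hypothetical wrapper: resolve `host`, and after a failure clear h_errno
/// so that the next lookup on this thread starts from a clean state.
fn resolve_with_reset(host: &str) -> io::Result<Vec<SocketAddr>> {
    match host.to_socket_addrs() {
        Ok(addrs) => Ok(addrs.collect()),
        Err(e) => {
            reset_h_errno();
            Err(e)
        }
    }
}
```

The original error is still returned to the caller; only the leftover state is cleared, so a later retry on the same thread is not poisoned by the earlier EMFILE.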