Thread leak in netavark-dhcp-proxy #811

Open
jsonn opened this issue Sep 18, 2023 · 12 comments

@jsonn

jsonn commented Sep 18, 2023

Using SuSE MicroOS with a bunch of macvlan-using containers, I see netavark-dhcp-proxy hanging every few days. From journalctl:

netavark[14606]: thread 'tokio-runtime-worker' panicked at 'failed to spawn thread: Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }', /home/abuild/rpmbuild/BUILD/rustc-1.71.1-src/library/std/src/thread/mod.rs:686:29

Even with RUST_BACKTRACE=1 set, it doesn't give a backtrace. Last time this happened, ps reported over 4000 threads for the PID.
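For context, the panic text matches what Rust's std::thread::spawn reports when the operating system refuses to create another thread: spawn calls std::thread::Builder::spawn and panics with "failed to spawn thread" if that returns an error. OS error 11 is EAGAIN on Linux, which Rust maps to ErrorKind::WouldBlock, consistent with the process having hit its thread limit. A minimal sketch, assuming a caller that would rather log and back off than panic, is to call Builder::spawn directly. The helper name try_spawn_worker is hypothetical and is not part of netavark.

```rust
use std::io;
use std::thread::{self, JoinHandle};

/// Hypothetical helper: spawn a named worker thread and return the OS error
/// (e.g. EAGAIN, surfaced as ErrorKind::WouldBlock) instead of panicking
/// the way `thread::spawn` does.
fn try_spawn_worker<F>(name: &str, f: F) -> io::Result<JoinHandle<()>>
where
    F: FnOnce() + Send + 'static,
{
    thread::Builder::new().name(name.to_string()).spawn(f)
}

fn main() {
    match try_spawn_worker("lease-renewal", || println!("renewing lease")) {
        Ok(handle) => handle.join().expect("worker panicked"),
        // When the process has hit its thread limit, this arm runs instead of a panic.
        Err(e) if e.kind() == io::ErrorKind::WouldBlock => {
            eprintln!("thread limit reached: {e}");
        }
        Err(e) => eprintln!("failed to spawn worker: {e}"),
    }
}
```

Handling the error only avoids the crash; it does not fix a leak that keeps consuming the thread budget.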

@Luap99
Member

Luap99 commented Sep 18, 2023

How many macvlan containers are we talking about? Do you know how long your DHCP lease time is?

@jsonn
Author

jsonn commented Sep 18, 2023

16 containers at the moment; the lease time is 10 minutes.

@Luap99
Member

Luap99 commented Sep 18, 2023

OK, I think that explains why it leaks so fast. I think we spawn a new thread for each lease, but somehow the code does not clean up the old one, so we leak the old thread.
I'll take a look.
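The hypothesis above is that each lease gets a new thread while the previous one is never stopped. The usual remedy for that pattern is to keep a handle to each worker and stop and join it before starting its replacement. This is a sketch of that general technique under stated assumptions, not netavark's code: netavark's proxy runs on tokio, whereas this uses plain std threads with an mpsc stop channel, and Worker, LeaseWorkers, and renew are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, RecvTimeoutError, Sender};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical per-container lease worker: the thread plus a channel used to stop it.
struct Worker {
    stop: Sender<()>,
    handle: JoinHandle<()>,
}

/// Hypothetical registry that keeps at most one live thread per container.
#[derive(Default)]
struct LeaseWorkers {
    workers: HashMap<String, Worker>,
}

impl LeaseWorkers {
    /// Start (or restart) the worker for `container`. Any previous worker is
    /// told to stop and joined first, so renewals do not accumulate threads.
    fn renew(&mut self, container: &str) {
        if let Some(old) = self.workers.remove(container) {
            let _ = old.stop.send(()); // ignore error: the thread may already have exited
            let _ = old.handle.join();
        }
        let (stop, rx) = mpsc::channel::<()>();
        let handle = thread::spawn(move || loop {
            // Wake up periodically (stand-in for lease bookkeeping) until told to stop.
            match rx.recv_timeout(Duration::from_millis(50)) {
                Err(RecvTimeoutError::Timeout) => continue,
                // A stop message or a dropped sender both end the thread.
                Ok(()) | Err(RecvTimeoutError::Disconnected) => break,
            }
        });
        self.workers.insert(container.to_string(), Worker { stop, handle });
    }

    /// Number of workers currently tracked (one per container).
    fn live_workers(&self) -> usize {
        self.workers.len()
    }

    /// Stop and join every worker.
    fn shutdown(&mut self) {
        for (_, w) in self.workers.drain() {
            let _ = w.stop.send(());
            let _ = w.handle.join();
        }
    }
}

fn main() {
    let mut workers = LeaseWorkers::default();
    for _ in 0..5 {
        workers.renew("container-a"); // five renewals, but the old thread is joined each time
    }
    workers.renew("container-b");
    println!("live workers: {}", workers.live_workers());
    workers.shutdown();
}
```

Joining before respawning trades a short wait on renewal for a hard bound of one thread per container.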

@jsonn
Author

jsonn commented Mar 25, 2024

Any news?

@Luap99
Member

Luap99 commented Apr 2, 2024

No, I haven't found the time to reproduce this issue.

@Jackbaude

I can take a look at this issue. Can someone point me in the right direction to reproduce this?

@jsonn
Author

jsonn commented May 7, 2024

Use macvlan and a DHCP server with as short a lease as reasonable, e.g. a minute. Observe the number of threads?

@Luap99
Member

Luap99 commented May 8, 2024

Yes, checking ls /proc/$pidOfProxy/task/ over time should show the leak, I guess.
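The same check can be scripted. A minimal Rust sketch, assuming Linux procfs, where /proc/<pid>/task holds one entry per thread of the process; the program and its argument handling are hypothetical and not a netavark tool. Running it periodically (for example once per lease period) while leases renew should show whether the count keeps climbing.

```rust
use std::fs;
use std::io;

/// Count the threads of `pid` by listing /proc/<pid>/task (Linux only):
/// the kernel exposes one subdirectory per thread (task) of the process.
fn thread_count(pid: u32) -> io::Result<usize> {
    let mut count = 0;
    for entry in fs::read_dir(format!("/proc/{pid}/task"))? {
        entry?; // surface read errors instead of silently skipping entries
        count += 1;
    }
    Ok(count)
}

fn main() -> io::Result<()> {
    // Hypothetical CLI: the first argument is the dhcp-proxy PID; with no
    // (or an unparsable) argument, fall back to this program's own PID.
    let pid = std::env::args()
        .nth(1)
        .and_then(|s| s.parse::<u32>().ok())
        .unwrap_or_else(std::process::id);
    println!("pid {pid}: {} threads", thread_count(pid)?);
    Ok(())
}
```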

@baude
Member

baude commented Jun 24, 2024

I am now able to replicate this. I started 10 containers on a network where the lease is only 60 seconds. In my case, the netavark dhcp-proxy PID is 6808, and after a short while:

Threads:	552
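The Threads: line appears to be the field of that name in /proc/<pid>/status. A small Rust sketch for reading it programmatically, assuming Linux procfs; parse_threads and threads_of are hypothetical helpers, and the sample status text only reuses the 552 figure reported here (the other fields are illustrative).

```rust
use std::fs;

/// Extract the "Threads:" field from the text of /proc/<pid>/status.
fn parse_threads(status: &str) -> Option<u32> {
    status.lines().find_map(|line| {
        line.strip_prefix("Threads:")
            .and_then(|rest| rest.trim().parse().ok())
    })
}

/// Read /proc/<pid>/status and return its thread count (Linux only).
fn threads_of(pid: u32) -> Option<u32> {
    let status = fs::read_to_string(format!("/proc/{pid}/status")).ok()?;
    parse_threads(&status)
}

fn main() {
    // Illustrative status excerpt; only the Threads value comes from the report above.
    let sample = "Name:\tnetavark\nState:\tS (sleeping)\nThreads:\t552\n";
    assert_eq!(parse_threads(sample), Some(552));
    match threads_of(std::process::id()) {
        Some(n) => println!("this process has {n} thread(s)"),
        None => eprintln!("could not read /proc status"),
    }
}
```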

@jjzazuet

Ah, just noticed this issue. Could this be related? My DHCP lease time is 30 mins.

#1024

Thanks!

@thecubic

I definitely have this thread leak: there were 13708 threads for ~15 containers after 3 days of running, and I was also seeing #618 as a symptom (of thread starvation, I assume). I have the underlying pattern (IPv6 multicast on an IPv4 network).

I updated past the fix for that specific symptom, and I'm watching how many threads it creates long-term.

@thecubic

My thread leak seems "better, but not totally fixed". I have 1497 threads after 6 days (post #1022) versus the 13708 after 3 days.

Importantly, the dhcp-proxy is not spinning the CPU right now, and my core symptom (restarting containers sometimes hit DHCP task aborts) is gone.


6 participants