Hi @darius, thanks for your input!
> From what it sounds like, it doesn't sound like a race condition. Instead, it sounds like rust-libp2p is disconnecting due to an idle connection, which happens right away because the idle timeout is set to zero by default.
I called it a race condition because, from go-libp2p’s perspective, rust-libp2p’s idle timeout is racing against go-libp2p’s own efforts to open a new stream. Sometimes rust wins and go gets disconnected; sometimes go wins and opens a stream, which keeps the connection open.
This obviously applies not only to the rust ↔ go interaction but to libp2p connections in general. I’m mentioning go-libp2p here because its default BasicHost explicitly waits until the identify exchange has completed before the user can open a new stream, which amplifies this issue.
And, as mentioned above, because Prysm uses the BasicHost under the hood, I believe this could affect Prysm → Lighthouse connectivity in the Ethereum network. I haven’t been able to confirm this with João from Lighthouse yet, though. Lighthouse nodes usually have many open connections to Prysm nodes, but I hypothesise that they are mostly outbound (rather than inbound) from Lighthouse’s perspective. That is the part we haven’t confirmed yet.
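To make the race concrete, here is a toy timing model. It is a hypothetical sketch, not libp2p code: `race`, `Outcome`, and the millisecond timings are made up for illustration. It assumes rust-libp2p closes a connection once it has been idle for the configured timeout, and that go-libp2p’s BasicHost opens its stream only after identify completes, which may land before or after the moment rust-libp2p considers the connection idle.

```rust
use std::time::Duration;

/// Which side "wins" the race (hypothetical model, not libp2p code).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Outcome {
    /// go-libp2p's stream arrived in time, so the connection stays open.
    StreamOpened,
    /// rust-libp2p's idle timer expired first, so it closed the connection.
    Disconnected,
}

/// `rust_idle_at`: when the connection becomes idle on the rust side
///   (e.g. its identify exchange finished and no other streams are open).
/// `go_stream_at`: when go-libp2p's new stream reaches the rust side
///   (the BasicHost only opens it after its own identify completes).
/// Both are measured from connection establishment.
fn race(idle_timeout: Duration, rust_idle_at: Duration, go_stream_at: Duration) -> Outcome {
    // The connection is closed at `rust_idle_at + idle_timeout` unless a
    // stream has arrived strictly before that deadline.
    if go_stream_at < rust_idle_at + idle_timeout {
        Outcome::StreamOpened
    } else {
        Outcome::Disconnected
    }
}

fn main() {
    let ms = Duration::from_millis;
    // Hypothetical timings: rust goes idle at 50 ms; go's stream arrives at 40 ms or 60 ms.
    for (label, go_at) in [("go first", ms(40)), ("rust first", ms(60))] {
        let zero = race(Duration::ZERO, ms(50), go_at);
        let fifteen = race(Duration::from_secs(15), ms(50), go_at);
        println!("{label}: timeout 0 s -> {zero:?}, timeout 15 s -> {fifteen:?}");
    }
}
```

With a zero timeout the outcome depends entirely on which event happens first, matching the "sometimes rust wins, sometimes go wins" behaviour. Any timeout longer than the usual gap between rust going idle and go’s stream arriving makes go-libp2p win reliably in this model.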
> There have been discussions about increasing the idle timeout, but in this case you might want to set it yourself via `Config::with_idle_connection_timeout` in `SwarmBuilder::with_swarm_config` (see the ping example). I find 10 to 15 seconds to be a reasonable default, but it could be smaller or larger depending on your use case.
I wasn’t aware of the discussion, thanks for the pointer! I think changing the default idle timeout in rust-libp2p to something > 0 is the right solution here. In the meantime, manually setting it works as well, of course.
I just wanted to raise the general issue (which, it seems, is already being discussed).
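For reference, manually setting the timeout via `Config::with_idle_connection_timeout` in `SwarmBuilder::with_swarm_config` could look roughly like the following, adapted from the shape of the rust-libp2p ping example. This is a sketch that assumes libp2p 0.53 (or a later release with the same `SwarmBuilder` API) with the `tokio`, `tcp`, `noise`, `yamux` and `ping` features enabled, plus `tokio` with `macros` and `rt-multi-thread`; the 15-second value is simply the upper end of the 10 to 15 seconds mentioned above.

```rust
// Assumed Cargo.toml dependencies (not verified against every release):
// libp2p = { version = "0.53", features = ["tokio", "tcp", "noise", "yamux", "ping"] }
// tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
use std::{error::Error, time::Duration};

use libp2p::{noise, ping, tcp, yamux};

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    let swarm = libp2p::SwarmBuilder::with_new_identity()
        .with_tokio()
        .with_tcp(
            tcp::Config::default(),
            noise::Config::new,
            yamux::Config::default,
        )?
        .with_behaviour(|_| ping::Behaviour::default())?
        // Keep idle connections open for 15 s instead of closing them
        // immediately, giving the remote (e.g. a go-libp2p BasicHost) time
        // to open a stream after identify completes.
        .with_swarm_config(|cfg| cfg.with_idle_connection_timeout(Duration::from_secs(15)))
        .build();

    println!("local peer id: {}", swarm.local_peer_id());
    Ok(())
}
```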