Libp2p connections return stream reset but no disconnections notified

Howdy,

We’ve been seeing an elevated number of stream resets that do not originate from our application-layer code on top of libp2p. To be clear, the protocol messages never seem to reach the stream handlers that could reject them with a stream reset; the streams appear to get reset before the messages trickle up to our stack. To make matters worse, the connection then seems to hang in an asymmetric way: one peer believes it is still connected to the other, while the other does not share that view.

Things appeared to get worse after the update to libp2p v0.16.0, and with v0.17.0 the resets are now pervasive, causing higher-level protocols that are meant to enforce SLAs to go haywire and impose all sorts of peer sanctions. We never get a disconnection notification from the Host's Network. These nodes sit together on the same k8s cluster, so this is quite peculiar behavior that we haven’t seen before.

Could you point out what can lead to this sort of situation? Have you seen this sort of behavior before?

I am assuming you are using go-libp2p?

Maybe @marten or @vyzo can help here.

That is correct, we are using go-libp2p.

  1. In v0.17.0, we decreased the concurrent stream limit in yamux from 1000 to 256 (the limit applies per yamux session).
  2. This probably points to a bug / leak in how you use streams. You might want to check that you’re properly closing streams once you’re done with them, otherwise you’ll leak resources (and run into the limit). See the sketch after this list.
  3. In v0.18.0, we will introduce a resource manager (see the release notes for rc1 for more details), which will (dynamically) limit the number of streams. We just released rc4, which lifts the yamux limit in favor of the limits imposed by the resource manager. These limits can be adjusted (see the resource manager documentation for details). Note that if there’s a leak in your code, you’ll eventually run into those limits as well, so it would pay off to investigate the leak regardless.
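A minimal sketch of the pattern, assuming a request/response style protocol (the function name, protocol ID, and wire format here are placeholders, not your actual code): every return path either fully closes the stream or resets it, so no stream slot in the yamux session is leaked.

```go
package main

import (
	"context"
	"fmt"
	"io"

	"github.com/libp2p/go-libp2p-core/host"
	"github.com/libp2p/go-libp2p-core/peer"
	"github.com/libp2p/go-libp2p-core/protocol"
)

// sendRequest opens a transient stream, writes a request, reads the reply,
// and always releases the stream, so the per-session yamux stream limit
// isn't exhausted by leaked streams.
func sendRequest(ctx context.Context, h host.Host, p peer.ID, proto protocol.ID, req []byte) ([]byte, error) {
	s, err := h.NewStream(ctx, p, proto)
	if err != nil {
		return nil, fmt.Errorf("opening stream: %w", err)
	}

	if _, err := s.Write(req); err != nil {
		s.Reset() // abort on error: frees the stream on both sides
		return nil, err
	}
	// Signal that we're done writing so the remote sees EOF.
	if err := s.CloseWrite(); err != nil {
		s.Reset()
		return nil, err
	}

	resp, err := io.ReadAll(s)
	if err != nil {
		s.Reset()
		return nil, err
	}
	// Fully close the stream once done; forgetting this (or the Reset calls
	// above) is what leaks stream slots in the yamux session.
	if err := s.Close(); err != nil {
		return nil, err
	}
	return resp, nil
}
```

The important part is that every code path ends in either Close or Reset; a stream that is neither closed nor reset keeps occupying a slot until the session is torn down.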

We do use a lot of transient streams, but I am not sure how many we use concurrently. We definitely do not multiplex a protocol over a single long-lived stream as some suggest (although, as far as I remember, the eth2 implementation examples don’t do that either and use transient, disposable streams).
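In case it helps others hitting the same thing, here is a rough sketch of how one can count the currently open streams per connection via the host's Network API, to see how close a session gets to the limit (the grouping by protocol is just for debugging):

```go
package main

import (
	"fmt"

	"github.com/libp2p/go-libp2p-core/host"
)

// logOpenStreams prints how many streams are currently open on each
// connection, grouped by protocol ID. Streams that are never closed
// show up here and creep toward the per-session yamux limit.
func logOpenStreams(h host.Host) {
	for _, c := range h.Network().Conns() {
		byProto := make(map[string]int)
		for _, s := range c.GetStreams() {
			byProto[string(s.Protocol())]++
		}
		fmt.Printf("peer %s: %d open streams %v\n",
			c.RemotePeer().Pretty(), len(c.GetStreams()), byProto)
	}
}
```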

Can you elaborate on what the expected behavior is when the stream limit is reached? Is the data just dropped silently, are we supposed to get a stream reset, or is there another explicit error that is expected to bubble up the stack? It might be useful to return an explicit error in that case (just a suggestion).