We’ve been seeing elevated amounts of stream resets that do not originate from our application layer code that utilizes libp2p. To reiterate, the protocol messages do not appear to land in the stream handlers that might reject them with a stream reset. It appears that the streams get reset before the messages trickle up to our stack, and to make things even worse, the underlying symptom is that it looks like the connection hangs in a way that one peer assumes it is connected to another, while the other does not share the same view of things.
It appears that things got worse with the update to libp2p
v0.16.0 but now with
0.17.0 this stuff is just all over the place and it causes higher level protocols that are meant to enforce SLAs to go haywire and impose all sorts of peer sanctions. We don’t get any disconnection notification from the
Network. These nodes sit together on the same k8s cluster, so it is quite a peculiar behavior that we haven’t seen before.
Could you point out to what can lead to this sort of situation? Any experience with this sort of behavior?