Dialing a node from >5 nodes stopped working between 0.39.5 and 0.42.2

In 0.39.5 we had a simple test that would initialize 10 nodes, have the last 9 nodes dial the first one, publish a message from a random node, and see if it would propagate.

This test started failing when we upgraded to 0.42.2, as the first node would start rejecting connections after it had already received 5 connections (i.e. when the 7th node dialed it). Here’s a small jest test to reproduce the error:

import { noise } from '@chainsafe/libp2p-noise';
import { mplex } from '@libp2p/mplex';
import { tcp } from '@libp2p/tcp';
import { createLibp2p, Libp2p } from 'libp2p';

const createNode = async () => {
  const node = await createLibp2p({
    addresses: {
      listen: ['/ip4/0.0.0.0/tcp/0'],
    },
    transports: [tcp()],
    streamMuxers: [mplex()],
    connectionEncryption: [noise()],
  });

  return node;
};

const connectAddr = async (source: Libp2p, target: Libp2p) => {
  for (const address of target.getMultiaddrs()) {
    try {
      const conn = await source.dial(address);
      if (conn) {
        return true;
      }
    } catch (error: any) {
      console.log('error', error);
      return false;
    }
  }
  return false;
};

describe('123456', () => {
  test('123456', async () => {
    const nodes = await Promise.all([
      createNode(),
      createNode(),
      createNode(),
      createNode(),
      createNode(),
      createNode(),
      createNode(), // does not fail if you comment this out
    ]);

    for (const node of nodes.slice(1)) {
      const res = await connectAddr(node, nodes[0]);
      expect(res).toBeTruthy();
    }
  });
});

The error received is:

      error [AggregateError: All promises were rejected] {
        [errors]: [
          Error: read ECONNRESET
              at TCP.onStreamRead (node:internal/stream_base_commons:217:20) {
            errno: -54,
            code: 'ERR_ENCRYPTION_FAILED',
            syscall: 'read'
          }
        ]
      }

This can be resolved if we add the peer id to the address book first and then call dial. But I couldn’t find anything obvious in the changelog that points to why dialing the addrs directly should fail after 5.
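For reference, the workaround could look roughly like the sketch below. The `NodeLike` interface and the `connectViaAddressBook` helper are hypothetical names introduced here; the `peerStore.addressBook.add` and `dial(peerId)` calls are assumed to match the 0.42 API and should be checked against the installed version.

```typescript
// Minimal structural types covering only what this sketch uses.
// They approximate the js-libp2p 0.42 node API and are assumptions,
// not copied from the library's type definitions.
interface NodeLike<PeerId, Addr> {
  peerId: PeerId;
  getMultiaddrs(): Addr[];
  peerStore: {
    addressBook: { add(peer: PeerId, addrs: Addr[]): Promise<void> };
  };
  dial(peer: PeerId): Promise<unknown>;
}

// Hypothetical helper: store the target's addresses first, then dial by peer id.
async function connectViaAddressBook<P, A>(
  source: NodeLike<P, A>,
  target: NodeLike<P, A>,
): Promise<boolean> {
  try {
    // Register every address the target advertises under its peer id.
    await source.peerStore.addressBook.add(target.peerId, target.getMultiaddrs());
    // Dial the peer id instead of an individual multiaddr.
    await source.dial(target.peerId);
    return true;
  } catch (error) {
    console.log('error', error);
    return false;
  }
}
```

In the test above this would replace the call to `connectAddr(node, nodes[0])`.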

Any ideas what might be causing this?

Not sure, but it might be the auto-dial feature that changed between 0.39 and 0.40: "fix!: do not auto-dial peers" by achingbrain (Pull Request #1397, libp2p/js-libp2p).

Perhaps you might want to configure the connection manager to keep more connections?

You can see more information by running your test with the env var DEBUG=libp2p*

If you do that you will see "libp2p:connection-manager connection from 127.0.0.1 refused - inboundConnectionThreshold exceeded by host /ip4/127.0.0.1/tcp/59306 +31ms"

What’s happening is libp2p is protecting itself from malicious peers who try to open lots and lots of connections very quickly.
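As a rough illustration of that kind of protection, here is a minimal sketch of a per-host limiter that refuses connections once too many arrive from the same host within one second. This is not js-libp2p's implementation: the `InboundRateLimiter` class, its method names, and the one-second window are assumptions made for the example, and the threshold of 5 is chosen to match the behavior reported in this thread.

```typescript
// Illustrative only: a per-host sliding-window limiter for inbound connections.
// Class, field, and method names are hypothetical.
class InboundRateLimiter {
  private readonly threshold: number;
  private readonly windowMs: number;
  private readonly arrivals = new Map<string, number[]>();

  constructor(threshold: number, windowMs: number = 1000) {
    this.threshold = threshold; // max accepted connections per host per window
    this.windowMs = windowMs;
  }

  // Returns true if a connection from `host` arriving at `nowMs` is accepted.
  accept(host: string, nowMs: number): boolean {
    // Keep only the accepted arrivals that are still inside the window.
    const recent = (this.arrivals.get(host) ?? []).filter(
      (t) => nowMs - t < this.windowMs,
    );
    const allowed = recent.length < this.threshold;
    if (allowed) {
      recent.push(nowMs);
    }
    this.arrivals.set(host, recent);
    return allowed;
  }
}

// Six dials from the same host a few milliseconds apart, as in the test above.
const limiter = new InboundRateLimiter(5);
const results = [0, 5, 10, 15, 20, 25].map((t) => limiter.accept('127.0.0.1', t));
// The first five are accepted and the sixth is refused.
console.log(results);
```

Because every node in the test runs on the same machine, all dials appear to come from the same host, which is why the limit is reached so quickly.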

To get your test to pass either introduce a delay between connection attempts, or set the connection manager’s inboundConnectionThreshold config option to be Infinity.
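Both options could be sketched as follows. The `relaxedLimits` object and the `sleep` and `dialWithDelay` helpers are hypothetical names introduced for this example; only the `inboundConnectionThreshold` option comes from the advice above, and it should be set on the node being dialed.

```typescript
// Option 1: lift the inbound limit on the node that receives the dials.
// Merge this into the options passed to createLibp2p in createNode.
const relaxedLimits = {
  connectionManager: {
    inboundConnectionThreshold: Infinity,
  },
};

// Option 2: space out the dial attempts.
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Calls `dial` for each dialer in order, pausing `delayMs` after each attempt.
async function dialWithDelay<T>(
  dialers: T[],
  dial: (dialer: T) => Promise<boolean>,
  delayMs: number,
): Promise<boolean[]> {
  const outcomes: boolean[] = [];
  for (const dialer of dialers) {
    outcomes.push(await dial(dialer));
    await sleep(delayMs);
  }
  return outcomes;
}
```

In the test from the original post, the second option might be used as `await dialWithDelay(nodes.slice(1), (node) => connectAddr(node, nodes[0]), 250)`, where 250 ms is an arbitrary illustrative delay rather than a value recommended by the library.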

Further reading: LIMITS.md in the libp2p/js-libp2p repository (master branch).


Thanks, that resolved the issue. I missed this change in the 0.40 release notes.