Error on dial: system: cannot reserve connection: resource limit exceeded

I am posting this for future reference and for anyone experiencing the same issue.

I added logging to the DHT in a side project that connects to the IPFS DHT and performs some requests. Some dials occasionally fail with the following error: system: cannot reserve connection: resource limit exceeded

At first it looked like an issue with the limit on the number of file descriptors, but the problem persisted even after increasing that limit.
The program was running on a server in Azure with a ulimit of 64000.
I also tried running it on a MacBook Pro with a ulimit of 10240.
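
To rule out the file descriptor limit from inside the program rather than from the shell, something like the following could be used. It is a minimal sketch using only the standard library on Linux or macOS; the helper name fdLimit is my own. Note that recent Go versions may raise the soft limit at startup, so the value can differ from what `ulimit -n` prints in the shell.

```go
package main

import (
	"fmt"
	"syscall"
)

// fdLimit returns the soft and hard limits on open file descriptors
// currently in effect for this process.
func fdLimit() (soft, hard uint64, err error) {
	var lim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		return 0, 0, err
	}
	return uint64(lim.Cur), uint64(lim.Max), nil
}

func main() {
	soft, hard, err := fdLimit()
	if err != nil {
		panic(err)
	}
	fmt.Printf("file descriptors: soft=%d hard=%d\n", soft, hard)
}
```

If the soft limit is already far above the number of connections the node opens, the error is probably coming from somewhere else.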

We run the DHT with default parameters, and configure a libp2p host with tcp, ws, and quic transports:

transports := libp2p.ChainOptions(
	libp2p.Transport(tcp.NewTCPTransport),
	libp2p.Transport(ws.New),
	libp2p.Transport(quic.NewTransport),
)

listenAddrs := libp2p.ListenAddrStrings(
	"/ip4/0.0.0.0/tcp/0",
	"/ip4/0.0.0.0/tcp/0/ws",
	"/ip4/0.0.0.0/udp/0/quic",
)

opts := []libp2p.Option{
	libp2p.Identity(priv),
	transports,
	listenAddrs,
	libp2p.DefaultConnectionManager,
}
h, err := libp2p.New(opts...)
if err != nil {
	panic(err)
}

dht, err := kad.New(ctx, h,
	kad.Mode(1),
	kad.BootstrapPeers(kad.GetDefaultBootstrapPeerAddrInfos()...),
	kad.LogToFile(cfg.Agent.DHTLogs, os.O_APPEND|os.O_WRONLY|os.O_CREATE),
)
if err != nil {
	panic(err)
}

...

After that point we provide keys on the DHT, and then search for a random key on the DHT every second.
The issue seems to occur during these searches.

I don’t know exactly how many connections it had (I am also not sure how to get that information), but I can try to find out if it helps.
From what I can gather from my logs, connections are still possible after the first error, but they are infrequent: most of the subsequent dial attempts fail with the same error.

I was advised to try increasing the resources available to the DHT in the libp2p network resource manager; see go-libp2p-resource-manager/README.md at master in the libp2p/go-libp2p-resource-manager repository on GitHub.

I’ll try this and report back.

So I managed to solve this by giving the libp2p node more resources.
It was a bit difficult to find out which limit the program was hitting, though: neither the error message nor the code is much help.

The libp2p resource manager documentation specifies limits for inbound connections and outbound connections. I increased both, to no avail… because the limit I was hitting was the overall system connections limit (Conns). I must say that having an overall connections limit that is set independently of the inbound and outbound connection limits is a bit weird… (e.g., ConnsInbound + ConnsOutbound != Conns, when I would expect them to be equal)
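
To illustrate why raising only the directional limits did not help, here is a toy model of a scope that checks a total limit in addition to the per-direction ones. It is a simplified sketch, not the resource manager's implementation; the types, the function names, and the numbers are all made up for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// connLimits is a simplified stand-in for the connection limits of one scope.
type connLimits struct {
	Conns, ConnsInbound, ConnsOutbound int
}

// scope tracks open connections against its limits (toy model only).
type scope struct {
	limit             connLimits
	inbound, outbound int
}

var errLimit = errors.New("cannot reserve connection: resource limit exceeded")

// reserve admits a connection only if both the total limit and the
// limit for its direction still have room.
func (s *scope) reserve(inbound bool) error {
	if s.inbound+s.outbound+1 > s.limit.Conns {
		return errLimit
	}
	if inbound {
		if s.inbound+1 > s.limit.ConnsInbound {
			return errLimit
		}
		s.inbound++
		return nil
	}
	if s.outbound+1 > s.limit.ConnsOutbound {
		return errLimit
	}
	s.outbound++
	return nil
}

// dialUntilRefused reserves outbound connections until one is refused
// and returns how many were admitted.
func dialUntilRefused(s *scope) int {
	n := 0
	for s.reserve(false) == nil {
		n++
	}
	return n
}

func main() {
	// Hypothetical numbers: a generous outbound limit, a small total limit.
	s := &scope{limit: connLimits{Conns: 100, ConnsInbound: 50, ConnsOutbound: 1000}}
	fmt.Println("outbound dials admitted:", dialUntilRefused(s)) // prints 100
}
```

In this model the outbound limit of 1000 is never reached, because the total limit of 100 refuses the dial first. That matches the behaviour described above.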

Nevertheless… here is the code showing how I managed to get around the issue:

limiterCfg, err := os.Open("limiterCfg.json")
if err != nil {
	panic(err)
}
defer limiterCfg.Close()
limiter, err := rcmgr.NewDefaultLimiterFromJSON(limiterCfg)
if err != nil {
	panic(err)
}
rcm, err := rcmgr.NewResourceManager(limiter)
if err != nil {
	panic(err)
}

opts := []libp2p.Option{
	libp2p.Identity(priv),
	transports,
	listenAddrs,
	libp2p.DefaultConnectionManager,
	libp2p.ResourceManager(rcm),
}

h, err := libp2p.New(opts...)
if err != nil {
	panic(err)
}

// For debugging: once a minute, print resource usage for the system,
// transient, and DHT protocol scopes.
go func() {
	for {
		<-time.After(1 * time.Minute)
		rcm.ViewSystem(func(scope network.ResourceScope) error {
			stat := scope.Stat()
			fmt.Println("System:",
				"\n\t memory:", stat.Memory,
				"\n\t numFD:", stat.NumFD,
				"\n\t connsIn:", stat.NumConnsInbound,
				"\n\t connsOut:", stat.NumConnsOutbound,
				"\n\t streamIn:", stat.NumStreamsInbound,
				"\n\t streamOut:", stat.NumStreamsOutbound)
			return nil
		})
		rcm.ViewTransient(func(scope network.ResourceScope) error {
			stat := scope.Stat()
			fmt.Println("Transient:",
				"\n\t memory:", stat.Memory,
				"\n\t numFD:", stat.NumFD,
				"\n\t connsIn:", stat.NumConnsInbound,
				"\n\t connsOut:", stat.NumConnsOutbound,
				"\n\t streamIn:", stat.NumStreamsInbound,
				"\n\t streamOut:", stat.NumStreamsOutbound)
			return nil
		})
		rcm.ViewProtocol(kad.ProtocolDHT, func(scope network.ProtocolScope) error {
			stat := scope.Stat()
			fmt.Println(kad.ProtocolDHT,
				"\n\t memory:", stat.Memory,
				"\n\t numFD:", stat.NumFD,
				"\n\t connsIn:", stat.NumConnsInbound,
				"\n\t connsOut:", stat.NumConnsOutbound,
				"\n\t streamIn:", stat.NumStreamsInbound,
				"\n\t streamOut:", stat.NumStreamsOutbound)
			return nil
		})
	}
}()

And the limiterCfg.json file, which contains the limits (I might have gone a bit overboard, but oh well…):

{
  "System":  {
    "StreamsInbound": 4096,
    "StreamsOutbound": 32768,
    "Conns": 64000,
    "ConnsInbound": 512,
    "ConnsOutbound": 32768,
    "FD": 64000
  },
  "Transient": {
    "StreamsInbound": 4096,
    "StreamsOutbound": 32768,
    "ConnsInbound": 512,
    "ConnsOutbound": 32768,
    "FD": 64000
  },

  "ProtocolDefault":{
    "StreamsInbound": 1024,
    "StreamsOutbound": 32768
  },

  "ServiceDefault":{
    "StreamsInbound": 2048,
    "StreamsOutbound": 32768
  }
}
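
Since the overall Conns limit was the one that bit me, a quick sanity check on the config file is to list which scopes do not set it explicitly. The sketch below uses only the standard library and assumes the flat scope-to-limits layout shown above; scopesMissing is a name I made up. My understanding is that values left out of the file fall back to the library defaults, but that is worth verifying against the version in use.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// scopesMissing returns, in sorted order, the top-level scopes of a limiter
// config that do not set the given key (for example "Conns").
// It assumes every scope is a flat object of integer limits.
func scopesMissing(cfg string, key string) ([]string, error) {
	var scopes map[string]map[string]int64
	if err := json.Unmarshal([]byte(cfg), &scopes); err != nil {
		return nil, err
	}
	var missing []string
	for name, limits := range scopes {
		if _, ok := limits[key]; !ok {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing)
	return missing, nil
}

func main() {
	// Abbreviated copy of the System and Transient scopes from the file above.
	cfg := `{
	  "System":    {"Conns": 64000, "ConnsInbound": 512, "ConnsOutbound": 32768},
	  "Transient": {"ConnsInbound": 512, "ConnsOutbound": 32768}
	}`
	missing, err := scopesMissing(cfg, "Conns")
	if err != nil {
		panic(err)
	}
	fmt.Println("scopes without an explicit Conns limit:", missing)
}
```

For the abbreviated config this reports the Transient scope, which sets the directional limits but no overall one.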