I’m trying to better understand how components of a rust-libp2p node interact, especially how network behaviours do. I found this thread very helpful already, but some questions remain open.
As context, I’m currently trying to build a file sharing application that should work well on a large number of consumer devices that are likely behind NATs and firewalls. I expect to need at least:
- peer discovery → kademlia (+ identify?!)
- message exchange → request_response (for now)
- direct connection establishment → dcutr, upnp (other options?)
As explained in the post already mentioned above, the identify protocol should enhance peer discovery (as I understand it, by increasing the amount of knowledge that nodes in the network have about each other), but it’s not required. So far so good. I also understand that this enhancement (a sort of interaction between kademlia and identify) roughly happens as follows:
- kademlia opens a bunch of connections to walk the DHT
- identify “notices” new connections by listening to network events
- identify exchanges peer info with peers behind those new connections if it deems that beneficial
- identify updates the routing table (global state), which all (or many) network behaviours use, improving the node’s performance
Please correct me if any of that is (very) wrong.
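To make that mental model concrete, here is a minimal sketch in plain Rust of how I picture it. It deliberately does not use the rust-libp2p API: `ToyBehaviour`, `RoutingTable`, `ConnectionEstablished`, `ToyKademlia`, `ToyIdentify` and `dispatch` are all hypothetical names I made up, and the shared routing table is exactly the assumption I’d like to have confirmed or corrected.

```rust
use std::collections::HashMap;

// Hypothetical toy types; none of these are rust-libp2p APIs.
type PeerId = u32;

/// Stand-in for the shared routing state I assume exists.
#[derive(Default)]
struct RoutingTable {
    addrs: HashMap<PeerId, Vec<String>>,
}

/// Stand-in for a network event that every behaviour gets to see.
struct ConnectionEstablished {
    peer: PeerId,
}

trait ToyBehaviour {
    fn on_connection(&mut self, event: &ConnectionEstablished, table: &mut RoutingTable);
}

/// Opens connections while walking the DHT; here it only counts them.
#[derive(Default)]
struct ToyKademlia {
    connections_seen: usize,
}

/// "Notices" new connections and records what the remote reports.
struct ToyIdentify {
    /// What each remote would report about itself (made-up data).
    reported: HashMap<PeerId, Vec<String>>,
}

impl ToyBehaviour for ToyKademlia {
    fn on_connection(&mut self, _event: &ConnectionEstablished, _table: &mut RoutingTable) {
        self.connections_seen += 1;
    }
}

impl ToyBehaviour for ToyIdentify {
    fn on_connection(&mut self, event: &ConnectionEstablished, table: &mut RoutingTable) {
        if let Some(addrs) = self.reported.get(&event.peer) {
            table
                .addrs
                .entry(event.peer)
                .or_default()
                .extend(addrs.iter().cloned());
        }
    }
}

/// Every behaviour sees the same event; the order does not matter in this toy.
fn dispatch(
    behaviours: &mut [&mut dyn ToyBehaviour],
    event: &ConnectionEstablished,
    table: &mut RoutingTable,
) {
    for behaviour in behaviours.iter_mut() {
        behaviour.on_connection(event, table);
    }
}
```

In this toy, neither behaviour calls the other: both only react to the same event, which is what I mean by “very async” below.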
I would characterize this behaviour as “very async”. kademlia and identify do not strictly depend on each other in the sense that a specific sequence of events is particularly important to one or the other. Is that right?
I have noticed two interesting things in the file-sharing and dcutr (can’t link, only allowed 2 links as new user) examples:
- file-sharing does not use dcutr. I suppose that means it only works between peers with publicly available addresses?!
- dcutr actually consists of two protocols: dcutr and relay. What I kind of expected to find but didn’t (though I could have missed it) is a sort of on_dial_attempt_first_hole_punch functionality.
I want to focus on the second point. If such a functionality does not exist, I see two ways in which this behaviour could fulfill its function:
- It is also totally async, meaning that when a dial attempt is made to a node behind a NAT, traffic would simply pass through the relay (though I believe I have read or heard that relays aren’t that “general purpose”, but idk). This would just work, though perhaps slowly. While the nodes already communicate, dcutr could do its magic, punch a hole, and then upgrade the connection (I suppose again by manipulating the routing table in some way). The nodes would then eventually switch smoothly to direct communication.
- It actually hooks into the dialing process, delaying it until hole punching is done. If this is the case, given the event-driven nature of rust-libp2p, some fairly intricate inter-network-behaviour communication has to occur for other tasks to wait until dcutr has done its thing.
Could you clarify which of the two is close to the truth, what I still didn’t get right, and any additional details you think would be important to understand?
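To pin down what I mean by the first (async) option, here is a small sketch in plain Rust. It is only my hypothesis, not how rust-libp2p or dcutr is implemented: `Path`, `HolePunch` and `ToyConnection` are hypothetical names, and the “upgrade” is modelled as simply flipping the path that later messages use.

```rust
// Hypothetical toy model of option 1; not a rust-libp2p API.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Path {
    Relayed,
    Direct,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum HolePunch {
    Succeeded,
    Failed,
}

struct ToyConnection {
    path: Path,
    /// Log of (path used, message) pairs, to see when the switch happened.
    sent: Vec<(Path, String)>,
}

impl ToyConnection {
    /// The dial succeeds immediately, but only through the relay.
    fn via_relay() -> Self {
        ToyConnection { path: Path::Relayed, sent: Vec::new() }
    }

    /// Application traffic never waits for hole punching.
    fn send(&mut self, msg: &str) {
        self.sent.push((self.path, msg.to_string()));
    }

    /// Called whenever the background hole punch attempt finishes.
    fn on_hole_punch(&mut self, outcome: HolePunch) {
        if outcome == HolePunch::Succeeded {
            self.path = Path::Direct;
        }
        // On failure the connection just stays relayed.
    }
}
```

In the second option, by contrast, `send` would have to block (or be queued) until `on_hole_punch` has been called, which is the coordination I find hard to picture.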
I also don’t yet know how the composability of NetworkBehaviours factors into this picture. Could you explain how the view of the network and the general capabilities of Behaviour B differ in the following two cases?
Swarm
|- top level Behaviour
   |- Behaviour A
   |- Behaviour B
vs
Swarm
|- top level Behaviour
   |- Behaviour A
      |- Behaviour B
or in words: When would I nest B in A instead of making B a sibling of A?
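Here is how I currently imagine the difference, as a plain Rust sketch. The `ToyBehaviour` trait and the structs `A`, `B`, `Siblings` and `AWithNestedB` are hypothetical and much simpler than the real NetworkBehaviour trait; in particular, the idea that a wrapping behaviour can hide or summarise what its inner behaviour emits is my assumption.

```rust
// Hypothetical toy model; this is not the rust-libp2p NetworkBehaviour trait.
trait ToyBehaviour {
    /// Called for every swarm-level event; returns the events this behaviour emits.
    fn on_swarm_event(&mut self, event: &str) -> Vec<String>;
}

struct A;
struct B;

impl ToyBehaviour for A {
    fn on_swarm_event(&mut self, event: &str) -> Vec<String> {
        vec![format!("A saw {}", event)]
    }
}

impl ToyBehaviour for B {
    fn on_swarm_event(&mut self, event: &str) -> Vec<String> {
        vec![format!("B saw {}", event)]
    }
}

/// Case 1: A and B are siblings under the top-level behaviour.
/// Both see every event and both report directly upwards.
struct Siblings {
    a: A,
    b: B,
}

impl ToyBehaviour for Siblings {
    fn on_swarm_event(&mut self, event: &str) -> Vec<String> {
        let mut out = self.a.on_swarm_event(event);
        out.extend(self.b.on_swarm_event(event));
        out
    }
}

/// Case 2: B is nested inside A.
/// A decides what to forward to B and what to do with B's output.
struct AWithNestedB {
    inner: B,
}

impl ToyBehaviour for AWithNestedB {
    fn on_swarm_event(&mut self, event: &str) -> Vec<String> {
        let from_b = self.inner.on_swarm_event(event);
        // A consumes B's events and only reports a summary upwards.
        vec![format!(
            "A saw {} and handled {} event(s) from B",
            event,
            from_b.len()
        )]
    }
}
```

If this picture is roughly right, nesting would only make sense when A needs to control or interpret what B does, and siblings would be the default.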
And since we are on the topic of kademlia and hole punching, the final piece I’m not clear on is how kademlia acts without any sort of hole punching mechanism. Its purpose is peer discovery. Let’s look at the case where I’m looking for a specific PeerId. It’s supposed to give me back a list of network addresses under which I can (probably) reach that peer, right?! For these addresses to be of use, they need to be dialable. So, assuming that the peer I’m querying for is behind a firewall and no NAT traversal mechanism is enabled, would kademlia simply return no info on that peer, or would it return addresses that I can never dial successfully? And if e.g. dcutr is active, is it correct that it would return an address of the relay the peer has a reservation with, which I could then use to perform hole punching?
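As far as I can tell, a relayed address is recognisable by a p2p-circuit component in the multiaddr, e.g. /ip4/198.51.100.7/tcp/4001/p2p/RELAY_ID/p2p-circuit/p2p/TARGET_ID. So what I imagine doing with a lookup result is something like the sketch below. It works on plain strings rather than the real Multiaddr type, the helper names `is_relayed` and `split_by_kind` are my own, and RELAY_ID / TARGET_ID are placeholders rather than valid peer IDs.

```rust
/// Returns true if the textual multiaddr contains a `p2p-circuit` component.
fn is_relayed(addr: &str) -> bool {
    addr.split('/').any(|component| component == "p2p-circuit")
}

/// Splits a list of textual multiaddrs into (direct, relayed).
fn split_by_kind<'a>(addrs: &[&'a str]) -> (Vec<&'a str>, Vec<&'a str>) {
    addrs.iter().copied().partition(|addr| !is_relayed(addr))
}
```

My question is then whether the “relayed” half is what I should expect to get back for a peer behind a firewall, and whether the “direct” half would be empty or full of undialable addresses.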
I realize this is a huge topic and I’m sorry to cram all that into a single post. I just really hope that someone can connect the dots between these interrelated topics.