Forming a mental model of how the different protocols relate to each other

Background

  1. I’ve written permissionless blockchain software using rust-libp2p that uses Kademlia and Gossipsub and these seem to work fine.
  2. I have adequate theoretical understanding of the Kademlia and Gossipsub algorithms.

So my questions are not about how to use libp2p, and not about wanting a high-level overview of what Kademlia and Gossipsub are as algorithms.

Instead, my questions are about how the different libp2p protocols work with and relate to each other under the umbrella of libp2p. E.g., do they sit in a layered architecture? Do they interact through events? Do they directly call each other? How much of one protocol’s internal state is visible to another protocol?

I have two specific questions (below), but the general qualm that I’m expressing in this post is that libp2p really needs an architecture and a set of docs that help non-maintainers form an intuitive and accurate mental model of what libp2p is.

I hope someone with knowledge of libp2p internals can respond to my specific questions. Pointers to further reading and info-dumping about architecture would also be appreciated.

Specific questions

How does Kademlia depend on Identify?

The rust-libp2p docs page for Kademlia says that add_address must be called on receiving an Identify message to add the sender to the receiver’s routing table.

A specific question is then: how does Kademlia in the sender trigger the sending of the Identify message in the first place?

The libp2p Kademlia specification has only a single, brief mention of Identify, and none of the descriptions of DHT operations reference Identify or call for sending an Identify message as a step. Also, the RPC protobuf Message definition contains a nested “Peer” message whose fields mostly duplicate what is contained in an Identify message. This makes sending an Identify message for Kademlia peering seem redundant.

Additionally, the fact that Kademlia can ostensibly trigger Identify messages confuses the mental model I’ve been trying to form of libp2p as a “layered architecture”. I’m tempted to imagine Kademlia and Identify as two protocols in the same layer, and Kademlia depending on Identify in this manner feels like a violation of this relationship.

How does Gossipsub use Kademlia for peer discovery?

Multiple articles in the docs state that Kademlia can be used for peer discovery, and the article on pubsub explicitly states that Gossipsub needs a separate protocol for peer discovery, with DHTs mentioned as an option.

Say I choose Kademlia for peer discovery. An obvious thing I’ll look for is some kind of interface that I can use to “plug in” Kademlia into Gossipsub. I would initially expect to find some kind of method that’s generic over a peer discovery mechanism and which I can “pass” Kademlia into.

However, searching through the rust-libp2p Gossipsub codebase, there are no non-comment matches for “discovery”.

My best hunch about how Gossipsub uses Kademlia, after reading and reflecting on libp2p’s written materials, is that every time Kademlia (or any other protocol) opens a connection to a new peer, Gossipsub somehow finds out. When it finds out, it decides whether to also open a substream to the new peer for itself. But how and where exactly this happens, I don’t know.

rust-libp2p’s libp2p-kad does not directly depend on libp2p-identify. Nor does it depend on any libp2p-identify-like protocol running next to it. That said, the health of libp2p-kad’s routing table can be improved through additional information, e.g. from libp2p-identify. Such information can be the external addresses of a node to which one only has an incoming connection, or the fact that the remote speaks the Kademlia protocol in the first place.

The only thing you as a user have to do is include both libp2p-kad’s NetworkBehaviour and libp2p-identify’s NetworkBehaviour in your derived NetworkBehaviour:

#[derive(NetworkBehaviour)]
pub struct Behaviour {
    identify: identify::Behaviour,
    kademlia: Kademlia<MemoryStore>,
    // ...
}

libp2p-kad cannot trigger libp2p-identify messages. In fact, libp2p-kad is not even aware of the libp2p-identify protocol. The latter can improve the former’s performance, but it is not required.

That is exactly right. libp2p-kad will establish connections to remote peers, i.e. discover remote peers. libp2p-gossipsub will be notified of these new connections and then decide whether to include the peer in its mesh or not.

In case you want to see the inner workings, I suggest reading the on_connection_established method in the libp2p-gossipsub NetworkBehaviour implementation.
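To make the notification pattern concrete, here is a minimal, std-only sketch. It is not the rust-libp2p API: `ToySwarm`, `ToyKademlia`, `ToyGossipsub`, the simplified `Behaviour` trait and the `u64` peer ID are all hypothetical stand-ins. The only idea it borrows from the thread is that every behaviour is told about a new connection, and no behaviour calls another one directly.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for libp2p's `PeerId`.
type PeerId = u64;

/// Simplified stand-in for a network behaviour: a protocol only reacts to
/// events delivered by the swarm and never calls into another protocol.
trait Behaviour {
    fn on_connection_established(&mut self, peer: PeerId);
}

/// Toy Kademlia: remembers which peers it is connected to.
#[derive(Default)]
struct ToyKademlia {
    connected: HashSet<PeerId>,
}

/// Toy Gossipsub: remembers peers it could consider for its mesh.
#[derive(Default)]
struct ToyGossipsub {
    mesh_candidates: HashSet<PeerId>,
}

impl Behaviour for ToyKademlia {
    fn on_connection_established(&mut self, peer: PeerId) {
        self.connected.insert(peer);
    }
}

impl Behaviour for ToyGossipsub {
    fn on_connection_established(&mut self, peer: PeerId) {
        // A real implementation would decide here whether to open its own
        // substream; this sketch only records the candidate.
        self.mesh_candidates.insert(peer);
    }
}

/// Toy swarm owning both behaviours side by side, as in the derived struct.
#[derive(Default)]
struct ToySwarm {
    kademlia: ToyKademlia,
    gossipsub: ToyGossipsub,
}

impl ToySwarm {
    /// Whoever caused the dial, every behaviour hears about the connection.
    fn connection_established(&mut self, peer: PeerId) {
        self.kademlia.on_connection_established(peer);
        self.gossipsub.on_connection_established(peer);
    }
}
```

In this model, "plugging Kademlia into Gossipsub" needs no dedicated interface: placing both behaviours in the same swarm is what connects them.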

Let me know if the above is of some help @strelkaalice !

Thanks for replying, @mxinden!

Your answer to my question on how Gossipsub uses Kademlia is helpful, and I’m gonna look deeper into the on_connection_established method.

How do responding peers discover the listening address of requesting peer?

However, I’m still not satisfied with my understanding of how Identify helps Kademlia. The context here is that I want to explore whether we can remove Identify from our software, so I want to know how “optional” it really is.

This section of the rust-libp2p docs says that “existing nodes in Kademlia cannot discover the listen address of nodes querying them without Identify.” I interpret this as concurring with your answer to this GitHub issue here, as well as with your direct reply to me in this thread, where you say that “Such information can be external addresses of a node where one only has an incoming connection for”.

I understand the problem as follows (tell me if I’m wrong): when a peer sends a FIND_NODE request to another peer, the requesting peer gets the listen addresses of nodes in the response, but the responding peer only sees the requesting peer’s dial address. As a result, the requesting peer is able to update its routing table, but the responding peer is not.

This seems a severe impediment to routing table completeness, and if this is really what happens when Identify is not in NetworkBehaviour, then I think Identify is essentially compulsory when using Kademlia for peer discovery.

Now I want to understand how exactly Identify helps. Identify messages do contain listening addresses, so if the requesting peer also sends an Identify message to the responding peer in the scenario I describe above, and the responding peer calls add_address upon receiving it, then the issue is solved.

My question is then: how does adding Identify to NetworkBehaviour trigger the sending of Identify messages? You’ve said that Kademlia does not trigger the sending of Identify messages, so are (just guessing here) Identify messages automatically sent during the establishment of substreams?

I suggest keeping it. It enhances libp2p-kad’s routing table and is useful for debugging. See, e.g., mxinden/libp2p-lookup on GitHub, a tool to look up a peer by its ID or address.

It is not about the direction of the request, but the direction of the connection. If I get an inbound connection, I cannot be sure that I can reach the remote at the remote address observed on that inbound connection. E.g. maybe they don’t use TCP port-reuse, or they are behind a firewall or NAT.

Conversely, if I established an outbound connection, I have confirmed that I can reach the remote under that address, so it is safe to add that address to my routing table.

In the case of the former, i.e. an inbound connection, I can use libp2p-identify on the existing inbound connection to have the remote send me their listening addresses, i.e. the addresses that I can use to reach out to them (connections in the other direction). These addresses I can then safely add to my routing table.
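The rule described above can be sketched as a small, std-only function. This is an illustrative model, not rust-libp2p code: `Direction`, `routable_addresses` and the plain-string multiaddrs are hypothetical names chosen for the example, and the sketch deliberately ignores that Identify may also supply extra addresses on outbound connections.

```rust
/// Direction of a connection from the local node's point of view.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Direction {
    Inbound,
    Outbound,
}

/// Returns the addresses that are safe to add to the routing table.
///
/// * Outbound: the dialed address was just proven reachable, so keep it.
/// * Inbound: the observed remote address may be an ephemeral port or a NAT
///   mapping, so only listen addresses reported via Identify (if any) count.
fn routable_addresses(
    direction: Direction,
    remote_addr: &str,
    identify_listen_addrs: Option<&[String]>,
) -> Vec<String> {
    match direction {
        Direction::Outbound => vec![remote_addr.to_string()],
        Direction::Inbound => identify_listen_addrs
            .map(|addrs| addrs.to_vec())
            .unwrap_or_default(),
    }
}
```

Under this model, an inbound connection without Identify contributes nothing to the routing table, which matches the concern raised earlier in the thread about the responding peer.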

automatically

Ah. Got it.

So the peer which initiates the connection automatically sends an Identify message to the other peer, and that way the other peer finds out the address the initiating peer is always reachable at.

Both peers periodically send Identify messages to their counterpart.

You can find more details of the protocol here: https://github.com/libp2p/specs/tree/master/identify