Peerstore pruning

The JS Peerstore (PeerBook) is pretty basic at the moment. Go has TTLs for various address types in its Peerstore. For peers themselves, are there any pruning mechanisms in Go for removing stale peers?

Yeah, both the memory peerstore and the datastore-backed peerstore are garbage collected. go-libp2p uses TTLs (*) to set expiries for individual address book entries.

Here’s the description of the algorithm we use for the datastore-backed peerstore. We recently moved away from strict database-enforced TTLs because they amplified the entry count and made range queries inefficient.

  • Rationale: It is wasteful to traverse the entire KV store on every GC cycle.
  • Proposal: keep a lookahead window of peers that need to be visited shortly.
    • Purges operate on the lookahead window only, thus avoiding full DB range scans.
    • Default lookahead period: 12 hours.
    • This is not a sliding window, but rather a jumping window. It jumps every 12 hours instead of sliding smoothly.
    • The lookahead window lives in a dedicated namespace in the KV store. Entries have a nil value and all data is contained in the key: /peers/gc/addrs/<ts>/<peer id in b32>, where <ts> is the Unix timestamp at which the record needs to be visited next. This pattern makes time-slice range scans very efficient.
  • Algorithmically, this GC runs two kinds of cycles.
    • Lookahead cycles populate the lookahead window by performing a full DB scan and adding the entries that need to be visited in this period to the window.
    • Purge cycles perform a range scan over the /peers/gc/addrs namespace until they hit a key whose timestamp is > now; then they stop. Each entry is refreshed, re-persisted, and removed from the window. It is re-added if its new “next visit” timestamp happens to fall within the current period.
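The two-cycle scheme above can be sketched in a few lines. This is a minimal in-memory TypeScript sketch, not the go-libp2p-peerstore code: a Set of key strings stands in for the KV namespace, sorting stands in for the store's ordered range scan, each peer's record is assumed to be a map from address to expiry time, and timestamps are zero-padded (an assumption of the sketch) so that lexicographic key order matches chronological order.

```typescript
// Minimal in-memory sketch of the lookahead-window GC (not the go-libp2p code).
// Assumptions: a Set of keys stands in for the /peers/gc/addrs namespace,
// sorting stands in for an ordered range scan, and timestamps are Unix
// seconds zero-padded to 12 digits so key order matches time order.
const GC_PREFIX = "/peers/gc/addrs/";
const LOOKAHEAD_SECS = 12 * 60 * 60; // default lookahead period: 12 hours

type AddrRecord = Map<string, number>; // address -> expiry (Unix seconds)

function gcKey(ts: number, peerId: string): string {
  return `${GC_PREFIX}${String(ts).padStart(12, "0")}/${peerId}`;
}

// Earliest expiry among a peer's addresses, or undefined if none are left.
function nextVisit(rec: AddrRecord): number | undefined {
  let min: number | undefined;
  for (const exp of rec.values()) {
    if (min === undefined || exp < min) min = exp;
  }
  return min;
}

class LookaheadGC {
  readonly records = new Map<string, AddrRecord>(); // the "full DB": peerId -> record
  readonly window = new Set<string>(); // lookahead entries: keys only, no values
  private windowEnd = 0;

  // Lookahead cycle: one full scan that fills the window for the next period.
  lookahead(now: number): void {
    this.window.clear();
    this.windowEnd = now + LOOKAHEAD_SECS;
    for (const [peerId, rec] of this.records) {
      const ts = nextVisit(rec);
      if (ts !== undefined && ts < this.windowEnd) this.window.add(gcKey(ts, peerId));
    }
  }

  // Purge cycle: visit window keys in time order and stop at the first ts > now.
  purge(now: number): void {
    for (const key of Array.from(this.window).sort()) {
      const rest = key.slice(GC_PREFIX.length);
      const slash = rest.indexOf("/");
      const ts = Number(rest.slice(0, slash));
      const peerId = rest.slice(slash + 1);
      if (ts > now) break;
      this.window.delete(key);
      const rec = this.records.get(peerId);
      if (rec === undefined) continue;
      // Refresh: drop expired addresses (re-persisting is just the in-place mutation here).
      for (const [addr, exp] of rec) {
        if (exp <= now) rec.delete(addr);
      }
      const next = nextVisit(rec);
      if (next === undefined) {
        this.records.delete(peerId); // no addresses left: the peer is pruned
      } else if (next < this.windowEnd) {
        this.window.add(gcKey(next, peerId)); // due again within this period
      }
    }
  }
}
```

Because the purge loop breaks at the first key whose timestamp is later than now, each purge touches only the entries that are actually due; the full scan happens only once per lookahead period.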

I think this algo makes efficient use of resources and minimises read amplification in the context of GC.

(extracted from https://github.com/libp2p/go-libp2p-peerstore/pull/47#issuecomment-439509360).

(*) TTLs are the wrong metric. We should be modelling address confidence, observations and decay.
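One way to read this footnote: instead of a fixed expiry, each address carries a confidence score that successful observations raise and that decays over time, and the address is pruned once the score falls below a threshold. The sketch below is purely hypothetical (the class name, half-life, blending weight and threshold are illustrative choices, not anything go-libp2p or js-libp2p implements) and uses exponential decay as one possible decay model.

```typescript
// Hypothetical address-confidence model (not a libp2p API): observations move
// the score toward 1 (success) or 0 (failure), and the score decays
// exponentially with the time since it was last updated.
class AddrConfidence {
  private score = 0;
  private lastUpdate: number;
  private readonly halfLifeSecs: number;

  constructor(halfLifeSecs: number, now: number) {
    this.halfLifeSecs = halfLifeSecs;
    this.lastUpdate = now;
  }

  // Score at time `now`, after applying decay since the last observation.
  value(now: number): number {
    const elapsed = Math.max(0, now - this.lastUpdate);
    return this.score * Math.pow(0.5, elapsed / this.halfLifeSecs);
  }

  // Blend the decayed score toward the observation's target by `weight`.
  observe(now: number, success: boolean, weight = 0.5): void {
    const current = this.value(now);
    const target = success ? 1 : 0;
    this.score = current + weight * (target - current);
    this.lastUpdate = now;
  }

  // Prune once confidence has decayed (or been pushed) below the threshold.
  shouldPrune(now: number, threshold = 0.1): boolean {
    return this.value(now) < threshold;
  }
}
```

Unlike a TTL, an address that keeps being observed never expires, while one that stops being seen fades out gradually.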

Hello, is there any mechanism for pruning stale peers in the JS implementation of libp2p?
I can see a note in js-libp2p/README.md (at commit 707bd7843c5b05a70916055015e3f483cc385759) about adding it in the future, so I am wondering whether it has been added yet (and maybe the README wasn’t updated)? Looking at the code I didn’t see it, but perhaps I missed something.

I noticed that we have a lot of stale peers in the address book (e.g. 5 active peers but 65 in the address book), probably because we have one constantly running node which collects all of them.

@vasco-santos — do you have any thoughts on what the best solution would be here?

I work with EmiM on this, and we’re seeing a huge performance cost because libp2p endlessly retries offline peers. We’re using Tor, so each of those attempts is somewhat costly, and it adds up.

All of our peer addresses are persistent, since they’re Tor onion URLs, and we already have a store of them all, so we don’t really need the peer discovery piece at all. (Or at least, where it would help is gossiping about which peers are currently online, so we only connect to those!)

Is there a way we can disable redialing peers entirely? Like, drop them from the peerstore as soon as we fail to connect?

Hey folks,

Unfortunately the js PeerStore still does not have TTLs or any peer scoring in place that would enable PeerStore pruning. We did not implement TTLs like Go because we would rather use address confidence as a better indicator of the real value of these addresses. Some issues tracking this that you can watch:

In the meantime, and taking your use case into consideration, I think the best solution is to disable autoDial (see js-libp2p/PEER_DISCOVERY.md at commit 707bd7843c5b05a70916055015e3f483cc385759). With this, to start out you can iterate over the PeerStore at the application level and attempt to dial each peer according to your needs. As you suggested, if a dial fails, I would recommend removing the peer from the AddressBook.
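A minimal sketch of that application-level loop, assuming autoDial is already disabled. The PeerBook interface and DialFn type are hypothetical stand-ins, since the exact js-libp2p PeerStore and dial signatures vary between versions; you would wire them to your node's peer store and dial method.

```typescript
// Hypothetical stand-ins: the real js-libp2p PeerStore and dial signatures
// vary between versions, so wire these to your node yourself.
interface PeerBook {
  peerIds(): string[];
  delete(peerId: string): void;
}
type DialFn = (peerId: string) => Promise<void>;

// Dial every known peer once (sequentially, to limit concurrent Tor circuits)
// and remove any peer whose dial fails. Returns the removed peer IDs.
async function sweepPeers(book: PeerBook, dial: DialFn): Promise<string[]> {
  const removed: string[] = [];
  for (const peerId of book.peerIds()) {
    try {
      await dial(peerId);
    } catch {
      book.delete(peerId);
      removed.push(peerId);
    }
  }
  return removed;
}
```

Dialing sequentially keeps the number of simultaneous Tor connection attempts low at the cost of a slower sweep; a small concurrency limit would be a reasonable middle ground.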

Once we have address confidence in place, as well as address garbage collection, you can move to that.


Thanks, this is helpful!

@vasco-santos thank you for the hints.

I turned off autoDial. I wasn’t sure where the best place to iterate over the address book is (since the address book is empty at the beginning), so instead I attached to the ‘peer:discovery’ event, and for each discovered peer I try to dial it. When dialing fails, the peer is removed from the books (libp2p.peerStore.delete).
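The event-driven approach described here can be sketched as a handler factory. Dial and DeletePeer are hypothetical stand-ins for the node's dial call and libp2p.peerStore.delete, and the in-flight set is an addition not mentioned in the post: it only keeps repeated discovery events for the same peer from starting parallel dials.

```typescript
// Hypothetical stand-ins for the node's dial and libp2p.peerStore.delete calls.
type Dial = (peerId: string) => Promise<void>;
type DeletePeer = (peerId: string) => void;

// Builds a listener for 'peer:discovery': dial the discovered peer and drop it
// from the books if the dial fails. The in-flight set stops repeated discovery
// events for one peer from starting parallel dials.
function makeDiscoveryHandler(dial: Dial, deletePeer: DeletePeer) {
  const inFlight = new Set<string>();
  return async (peerId: string): Promise<void> => {
    if (inFlight.has(peerId)) return;
    inFlight.add(peerId);
    try {
      await dial(peerId);
    } catch {
      deletePeer(peerId);
    } finally {
      inFlight.delete(peerId);
    }
  };
}
```

You would register the returned function as the ‘peer:discovery’ listener, converting the emitted peer ID to whatever form your stand-ins expect. On its own this does not stop a deleted peer from coming back: a discovery source that keeps re-emitting it triggers a fresh dial each time, which is the loop described below.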

This is the test I performed:

I created a small network consisting of a few peers. I let them connect to each other and then disconnected one of them. The problem is that the peer is still discovered after being deleted, and the loop of ‘discovering and deleting’ continues. I guess this is because at any given time at least one other peer still has information about the deleted peer (since the cleanup happens at a different time on each peer)?

I even added peer deletion to the peer:disconnect handler, just to be sure, but it didn’t help.

Example - disconnected node is discovered over and over again:

peer_1          | Discovered QmRbkBkhTt2DbLMF8kAaf1oxpfKQuEfLKFzVCDzQhabwkw: 40
peer_1          | addressbook before deleting [Map Iterator] { 'QmRbkBkhTt2DbLMF8kAaf1oxpfKQuEfLKFzVCDzQhabwkw' }
peer_1          | Aborting /dns4/<address>/tcp/7788/ws/p2p/QmRbkBkhTt2DbLMF8kAaf1oxpfKQuEfLKFzVCDzQhabwkw: 40

The “Aborting (…)” log comes from the _connect method of our WebSockets transport module. In production we get lots of noise here because it tries to dial even old peers, and does so many times per peer. 40 is how many times the code reached this line for a single peer.

Do you have any idea what I’m doing wrong? Maybe I should be using a persistent datastore?

NOTES:

  • The peer I disconnected manually was the bootstrap node, so every peer had information about it from the beginning. Perhaps that’s why other peers still knew about it and kept discovering it again and again?
  • If I disconnect a regular peer that is known by only one or two peers, it is not discovered anymore.

UPDATE: It’s working now. A regular peer deleted from the address book is no longer seen in the network, but a deleted bootstrap node keeps being discovered.

Thanks for reporting this @EmiM

So, the bootstrap module continuously emits the discovery event (see js-libp2p-bootstrap/index.js on master), which ends up causing what you are experiencing. It is probably worth forking bootstrap so that it emits only once, or just dialing the bootstrap peers manually instead of using it.
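A sketch of the “emit only once” option: a discovery source that announces each configured bootstrap address a single time, however often announce() is called (for example from a timer like the one the bootstrap module uses). This is not the js-libp2p-bootstrap code; the class, the listener signature and the forget() escape hatch are hypothetical, and the peer ID is assumed to be the part of the multiaddr after /p2p/.

```typescript
// Hypothetical "emit once" bootstrap source (not the js-libp2p-bootstrap code).
// Peer IDs are assumed to be the part of each multiaddr after "/p2p/".
type DiscoveryListener = (peerId: string, multiaddr: string) => void;

class OnceBootstrap {
  private readonly announced = new Set<string>();
  private readonly list: string[];
  private readonly listener: DiscoveryListener;

  constructor(list: string[], listener: DiscoveryListener) {
    this.list = list;
    this.listener = listener;
  }

  // Safe to call on every timer tick: only peers not yet announced are emitted.
  announce(): number {
    let emitted = 0;
    for (const addr of this.list) {
      const idx = addr.lastIndexOf("/p2p/");
      if (idx === -1) continue; // no peer ID in this address: skip it
      const peerId = addr.slice(idx + "/p2p/".length);
      if (this.announced.has(peerId)) continue;
      this.announced.add(peerId);
      this.listener(peerId, addr);
      emitted++;
    }
    return emitted;
  }

  // Let a peer be announced again on the next tick (e.g. to retry it later).
  forget(peerId: string): void {
    this.announced.delete(peerId);
  }
}
```

The trade-off is that a bootstrap peer whose first dial fails (say, because it was briefly offline) is never re-announced unless forget() is called or the peer is dialed manually; re-emitting on an interval gives that retry for free, at the cost of the rediscovery loop seen in this thread.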

Thank you for the answer.

  • Does the transport module (in our case WebSocket + Tor) depend on the address book, or only on discovered peers? Maybe deleting inactive peers from the peer book is not a good approach?
    In _maybeConnect (src/index.js) I added a check for whether connecting to the peer fails or succeeds, by parsing the error message (HostUnreachable). Do you think this would be a good place to remove the peer from the book?

  • What would the consequences be of emitting peer:discovery only once? I suppose the interval was added for a reason 🙂

Update: I parse the error caught in _maybeConnect, but somehow I stopped getting “HostUnreachable” (even though it is definitely thrown by the WebSocket transport), and the only errors I get are aggregated “The operation was aborted” and/or “already aborted” errors. I am not sure whether I can assume that an aborted dial == a failed dial.
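Whether an aborted dial should count as a failed one is the open question here, so the sketch below deliberately treats it as inconclusive: a peer is dropped immediately only when every underlying error indicates the host is unreachable, and otherwise only after a configurable number of consecutive inconclusive failures. It assumes aggregated errors expose their causes on an errors array (as the standard AggregateError does); the matched patterns (HostUnreachable, EHOSTUNREACH), the class name and the threshold are all hypothetical and should be adapted to what your transport actually throws.

```typescript
// Hypothetical dial-failure policy. Assumes aggregated dial errors expose their
// causes on an `errors` array (as the standard AggregateError does); the
// patterns below are assumptions to adjust to what your transport throws.
const UNREACHABLE = /HostUnreachable|EHOSTUNREACH/;

// Flatten (possibly nested) aggregated errors into their leaf errors.
function leafErrors(err: unknown): unknown[] {
  if (typeof err === "object" && err !== null) {
    const nested = (err as { errors?: unknown }).errors;
    if (Array.isArray(nested) && nested.length > 0) return nested.flatMap(leafErrors);
  }
  return [err];
}

// Name, message and (if present) Node-style code, for pattern matching.
function errorText(err: unknown): string {
  if (err instanceof Error) {
    const code = (err as { code?: unknown }).code;
    return `${err.name} ${err.message} ${code ?? ""}`;
  }
  return String(err);
}

// True only if every underlying error says the host is unreachable.
function isDefinitelyUnreachable(err: unknown): boolean {
  return leafErrors(err).every((e) => UNREACHABLE.test(errorText(e)));
}

// Drops a peer at once on a definite failure; treats aborted/other failures as
// inconclusive and drops the peer only after `maxInconclusive` in a row.
class DialFailurePolicy {
  private readonly counts = new Map<string, number>();
  private readonly maxInconclusive: number;

  constructor(maxInconclusive: number) {
    this.maxInconclusive = maxInconclusive;
  }

  // Returns true when the caller should remove the peer from the books.
  onFailure(peerId: string, err: unknown): boolean {
    if (isDefinitelyUnreachable(err)) {
      this.counts.delete(peerId);
      return true;
    }
    const n = (this.counts.get(peerId) ?? 0) + 1;
    if (n >= this.maxInconclusive) {
      this.counts.delete(peerId);
      return true;
    }
    this.counts.set(peerId, n);
    return false;
  }

  // A successful dial resets the peer's inconclusive-failure streak.
  onSuccess(peerId: string): void {
    this.counts.delete(peerId);
  }
}
```

Counting consecutive inconclusive failures avoids dropping a peer over a single slow Tor circuit while still bounding how long an offline peer is retried.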