Is libp2p secure against memory DoS attacks by malicious peers?

Hi, I'm not sure whether this has been discussed before, but I could not find much info, so I'm asking here.
Is libp2p (specifically the Rust implementation, which I'm considering using) resilient against memory DoS attacks?
I mean, are there any data structures that can grow indefinitely? Are there any means by which they could be bounded (e.g., by guaranteeing that no more than X messages may be in flight simultaneously)?
Are there any ways malicious peers could cause other peers to run out of memory? Even in a theoretical edge case…

Any pointer to existing information would also be useful.

Thank you!

Yes. Nothing should grow unbounded. See DoS Mitigation - libp2p for general thoughts on the matter.

If you do find something to the contrary, please reach out directly to either me or Max Inden (rust-libp2p maintainer).


Thanks for reaching out directly. Important topic.

Adding to Marco’s comment above: since the rust-libp2p CVE we have released a series of patches hardening a rust-libp2p node against memory DoS attacks.

Since then we have not been able, either in practice or in theory, to bring down a node with a reasonable amount of attack resources.

Most communication across components is backpressured, thus ensuring a bounded number of e.g. “messages” in flight. See Backpressure between components · Issue #3078 · libp2p/rust-libp2p · GitHub for outstanding work.
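
To illustrate the pattern with a simplified sketch (not rust-libp2p internals): with a bounded channel, a producer can never have more than the channel’s capacity of messages in flight, because `send().await` suspends once the channel is full.

```rust
// Bounded-channel backpressure in miniature: at most 8 messages are
// ever queued, so memory held by in-flight messages stays bounded.
use tokio::sync::mpsc;

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::channel::<u64>(8);

    tokio::spawn(async move {
        for i in 0..1_000u64 {
            // Suspends whenever 8 messages are already queued; a slow
            // consumer automatically paces this producer.
            tx.send(i).await.expect("receiver alive");
        }
    });

    while let Some(i) = rx.recv().await {
        let _ = i; // pretend to do slow per-message work here
    }
}
```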

As of today, rust-libp2p deployments depend on setting reasonable connection limits. Going forward, limits will also be enforceable dynamically, e.g. based on available memory.
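
For example, static limits look roughly like this (a sketch; where the limits are installed has moved between rust-libp2p releases, from a `SwarmBuilder` option to the dedicated `libp2p-connection-limits` behaviour, so check the docs of the version you use):

```rust
// Sketch of static connection limits via libp2p-swarm's
// `ConnectionLimits`. Where these are plugged in depends on the
// rust-libp2p release.
use libp2p::swarm::ConnectionLimits;

fn main() {
    let limits = ConnectionLimits::default()
        // Cap half-open inbound connections (cheap for an attacker).
        .with_max_pending_incoming(Some(16))
        // Cap established inbound connections overall ...
        .with_max_established_incoming(Some(64))
        // ... and per peer, so one peer cannot exhaust the budget.
        .with_max_established_per_peer(Some(2));
    let _ = limits;
}
```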

@yotam let us know if this is of some help. Happy to answer any questions here or in the next community call. As Marco said above, in case you do find a vulnerability, please disclose it via a private medium.

Also worth mentioning are our rust-libp2p coding guidelines, containing multiple entries on the topic of backpressure (memory DoS) and local work prioritization (CPU DoS).

Thanks a lot for the answers.

I have another question w.r.t. backpressure then: if libp2p blocks and applies backpressure, then a single (malicious) peer could potentially slow a (benign) sender down. Are there any mechanisms to protect against such behavior? Where can I read about this blocking behavior?

Thank you very much!

We use one channel per connection. Thus a single connection (e.g. a malicious peer) cannot disproportionately slow down all other connections.

Note that we spawn one task per connection and “block” (as in preempt) per connection task, not for the entire rust-libp2p process. (rust-libp2p is written using Rust’s async/await and Future::poll, thus the word “blocking” is misleading here.)
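
To make that concrete, here is a hypothetical sketch (again not rust-libp2p internals) of the one-task-and-one-bounded-channel-per-connection shape; a stalled peer only ever fills its own channel:

```rust
// Hypothetical "one task + one bounded channel per connection" shape.
use std::time::Duration;
use tokio::sync::mpsc;

async fn connection_task(id: usize, mut rx: mpsc::Receiver<Vec<u8>>) {
    while let Some(frame) = rx.recv().await {
        // Write `frame` to this connection's socket (elided). If this
        // peer stalls, only this task and this channel back up.
        println!("conn {id}: {} bytes", frame.len());
    }
}

#[tokio::main]
async fn main() {
    let mut connections = Vec::new();
    for id in 0..3 {
        let (tx, rx) = mpsc::channel::<Vec<u8>>(32);
        tokio::spawn(connection_task(id, rx));
        connections.push(tx);
    }

    // The main state machine fans out without awaiting any single
    // peer: a full channel (slow or malicious peer) is skipped rather
    // than allowed to stall the loop.
    for tx in &connections {
        let _ = tx.try_send(b"ping".to_vec());
    }

    // Give the connection tasks a moment to drain their channels.
    tokio::time::sleep(Duration::from_millis(100)).await;
}
```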

@mxinden thank you for the replies,

I have a few follow-up questions.

> Note that we spawn one task per connection and “block” (as in preempt) per connection task, not for the entire rust-libp2p process.

I would assume that the publish call in GossipSub can be I/O bound. I guess what @yotam was asking is: how can an adversarial peer impact the publish call?

Another question I have is about the protocol bandwidth overhead. If we publish, say, 1 MB/s of data, how much bandwidth capacity will be required per peer so that the 1 MB reaches all peers? Assume we have a central place where we know all peers and there are no adversarial peers. Let’s assume the best possible configuration as well.

I have looked through the [Gossipsub-v1.1 Evaluation Report](https://research.protocol.ai/publications/gossipsub-v1.1-evaluation-report/vyzovitis2020.pdf)

And please correct me if I am wrong in my reasoning.

Looking at page 10, Table 2, row 4: if I understand correctly, the network overall carries 2 tx/sec and each tx is 2 KB in size. This more or less means that at any given second we have a single publisher which publishes 2 messages, each 2 KB in size. And each peer is connected to 24 other peers.
So 248.8 GB/month is equal to ~96 KB/s, for an effective publish rate of 4 KB/s.
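
Spelling out that conversion (assuming a 30-day month):

$$\frac{248.8 \times 10^9\ \text{B}}{30 \times 24 \times 3600\ \text{s}} \approx 96{,}000\ \text{B/s} \approx 96\ \text{KB/s}$$

That is about 24× the effective 4 KB/s publish rate, which lines up with each peer being connected to 24 others.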

Thank you,
Rosti

You can ignore my second question about the overhead. Reading more carefully through the docs, it is written that the redundancy is proportional to the degree of the network.

The publish call on the main state machine (NetworkBehaviour & Swarm) dispatches to the individual connection tasks. An adversary can slow down a connection task but should not be able to slow down the main state machine.
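
As a usage-level sketch (assuming an already-constructed `swarm` whose behaviour is gossipsub; type and method names vary a bit across libp2p-gossipsub releases):

```rust
// Sketch: publishing via the gossipsub behaviour of an existing
// `swarm` (construction elided). `publish` runs on the main state
// machine: it queues the message towards the per-connection tasks and
// returns without waiting on any peer's socket, so a slow peer cannot
// stall this call.
use libp2p::gossipsub::IdentTopic;

let topic = IdentTopic::new("example-topic");
match swarm.behaviour_mut().publish(topic, b"payload".to_vec()) {
    Ok(message_id) => println!("published {message_id:?}"),
    Err(e) => eprintln!("publish failed: {e:?}"),
}
```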

For others, I am referring to the gossipsub paper here: https://arxiv.org/pdf/2007.02754.pdf

Let me know in case you have more questions @yotam and @rumenov. Great job digging into the details here. Also happy to pair-program on a first proof-of-concept in case that would help you hit the ground running.

Sharing some additional data points/information for completeness:

Ref: a blog post that has links to the first two bulleted items

Thanks a lot for the new pointers @pshahi !

@mxinden, I have another question w.r.t. the protocol which I could not find an answer to.

What happens when a subscriber falls off the network? More specifically, imagine we have 1 subscriber in a datacenter, the datacenter has an outage, and there is no internet connectivity for a prolonged period of time. The subscriber did not leave the topic. Are there any message guarantees for this subscriber? I personally assume that there is some TTL past which messages just won’t be delivered to this subscriber once it comes back up. If this is the case, is there a way to detect when a message is TTLed without being delivered to a subscriber, or is there really no good way to know this?

@rumenov jumping in late here. There isn’t a way to deliver a message to a subscriber that has disappeared for a prolonged period of time. What happens here is that peers will try to gossip messages for three consecutive heartbeats (i.e., 3 seconds). If the peer has come back online by that point and connected to some peers in the network, then it has some chance of getting the message through gossip; not sure what value I would attach to “some chance” here though 🙂
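
For reference, those windows are configurable. A sketch using libp2p-gossipsub’s config builder (the builder was called `GossipsubConfigBuilder` in older releases and `ConfigBuilder` in newer ones; method names may differ slightly):

```rust
// Sketch of tuning the gossip window discussed above, using the
// libp2p-gossipsub config builder. Defaults shown: 1 s heartbeat,
// messages advertised via gossip for 3 heartbeats and kept in the
// cache for 5.
use std::time::Duration;
use libp2p::gossipsub::ConfigBuilder;

fn main() {
    let config = ConfigBuilder::default()
        .heartbeat_interval(Duration::from_secs(1))
        .history_length(5) // heartbeats a message stays cached
        .history_gossip(3) // heartbeats it is advertised to peers
        .build()
        .expect("valid gossipsub config");
    let _ = config;
}
```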

The assumption here is that it is a P2P network so peers might come and go and there is no good way to keep track at the protocol level. In a blockchain scenario that would mean that the node has fallen out of sync, and would therefore need to re-sync and get caught up. This is outside the protocol logic and would have to be implemented on top.

cc’ing @vyzo as I might have forgotten some details by now 🙂