Gossipsub Stops Propagating Messages After Some Time, Only Resumes After Server Restart

I’m running into an issue with my libp2p Gossipsub setup: message propagation stops after some time, on both my local machine and my test servers, and it only resumes after I restart the servers.

Code Configuration

Here’s the relevant code for the swarm setup and behavior:

network.rs

pub async fn setup_swarm_network(
    keypair: Option<Keypair>,
    bootstrap_addresses: Option<Vec<(PeerId, Multiaddr)>>,
    port: String,
) -> Result<Swarm<SwarmBehaviour>, Box<dyn Error>> {
    let builder = if let Some(keypair) = keypair {
        SwarmBuilder::with_existing_identity(keypair)
    } else {
        SwarmBuilder::with_new_identity()
    };

    let mut swarm = builder
        .with_tokio()
        .with_tcp(
            tcp::Config::default(),
            noise::Config::new,
            yamux::Config::default,
        )?
        .with_behaviour(|keypair| {
            if bootstrap_addresses.is_none() {
                info!("Bootstrap Peer ID :{}", keypair.public().to_peer_id());
            }
            SwarmBehaviour::new(keypair.clone()).unwrap()
        })?
        .with_swarm_config(|c| {
            c.with_idle_connection_timeout(Duration::from_secs(60))
        })
        .build();

    if let Some(ref bootstrap_addresses) = bootstrap_addresses {
        for (peer_id, multi_addr) in bootstrap_addresses {
            swarm
                .behaviour_mut()
                .kademlia
                .add_address(peer_id, multi_addr.clone());
            swarm.dial(multi_addr.clone())?;
            swarm.behaviour_mut().kademlia.bootstrap()?;
        }
    }

    swarm
        .behaviour_mut()
        .gossipsub
        .subscribe(&IdentTopic::new(NETWORK_TOPIC))?;

    let listen_address = format!("/ip4/0.0.0.0/tcp/{}", port);
    swarm.listen_on(listen_address.parse()?)?;

    Ok(swarm)
}

behaviour.rs

#[derive(NetworkBehaviour)]
pub struct SwarmBehaviour {
    pub gossipsub: gossipsub::Behaviour,
    pub kademlia: kad::Behaviour<MemoryStore>,
}

impl SwarmBehaviour {
    pub fn new(key: Keypair) -> Result<Self, Box<dyn std::error::Error>> {
        let peer_id = key.public().to_peer_id();

        let message_id_fn =
            |message: &gossipsub::Message| gossipsub::MessageId::from(digest(&message.data));

        let gossipsub_config = gossipsub::ConfigBuilder::default()
            .heartbeat_interval(Duration::from_secs(HEARTBEAT_INTERVAL))
            .validation_mode(gossipsub::ValidationMode::Strict)
            .duplicate_cache_time(Duration::from_secs(DUPLICATE_CACHE_DURATION))
            .message_id_fn(message_id_fn)
            .max_messages_per_rpc(Some(MAX_MESSAGES_PER_RPC))
            .mesh_n_low(4)
            .mesh_n_high(10)
            .mesh_n(8)
            .build()
            .map_err(|e| io::Error::new(io::ErrorKind::Other, e))?;

        let gossipsub = gossipsub::Behaviour::new(
            gossipsub::MessageAuthenticity::Signed(key.clone()),
            gossipsub_config,
        )?;

        let mut kad_config = kad::Config::new(StreamProtocol::new("/ducat/kad/1.0.0"));
        kad_config.set_query_timeout(Duration::from_secs(60));
        kad_config.set_replication_factor(std::num::NonZero::new(4).unwrap());

        let store = kad::store::MemoryStore::new(peer_id);
        let kademlia = kad::Behaviour::with_config(peer_id, store, kad_config);

        Ok(Self {
            gossipsub,
            kademlia,
        })
    }
}
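
One interaction in this configuration that may be worth ruling out: message_id_fn derives the ID from message.data alone, so two publishes with byte-identical payloads get the same MessageId, and the second can be treated as a duplicate while the first is still within duplicate_cache_time. The sketch below models that with the standard library only. It is not libp2p's implementation: `content_id`, `DuplicateCache`, and the use of `DefaultHasher` in place of `digest` are hypothetical stand-ins, and timestamps are passed in explicitly to keep it deterministic.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::time::Duration;

/// Hypothetical stand-in for `digest(&message.data)`:
/// the ID depends on the payload bytes and nothing else.
fn content_id(data: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    data.hash(&mut hasher);
    hasher.finish()
}

/// Toy model of a time-bounded duplicate cache (assumption, not libp2p's type).
struct DuplicateCache {
    ttl: Duration,
    seen: HashMap<u64, Duration>, // message ID -> time it was first seen
}

impl DuplicateCache {
    fn new(ttl: Duration) -> Self {
        Self { ttl, seen: HashMap::new() }
    }

    /// Returns true if the ID is new (message would be propagated),
    /// false if an unexpired entry with the same ID already exists.
    fn insert(&mut self, id: u64, now: Duration) -> bool {
        // Expire entries that are at least `ttl` old before checking.
        let ttl = self.ttl;
        self.seen
            .retain(|_, first_seen| now.saturating_sub(*first_seen) < ttl);
        if self.seen.contains_key(&id) {
            return false;
        }
        self.seen.insert(id, now);
        true
    }
}

fn main() {
    let mut cache = DuplicateCache::new(Duration::from_secs(60));
    let id = content_id(b"status: ok");

    // First publish of this payload is accepted.
    println!("t=0s  accepted: {}", cache.insert(id, Duration::from_secs(0)));
    // Identical payload inside the cache window is treated as a duplicate.
    println!("t=10s accepted: {}", cache.insert(id, Duration::from_secs(10)));
    // Once the entry has expired, the same payload is accepted again.
    println!("t=61s accepted: {}", cache.insert(id, Duration::from_secs(61)));
}
```

If the application ever republishes identical payloads, a model like this suggests checking whether publish calls are returning a duplicate error, or including something unique per message (such as a sequence number) in the hashed bytes.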

Logs

 peer: 16Uiu2HAmT4FjyydhhSYgLoGjNJEFGHDexiaH6UxWM1VCW1LT5o1X
[2025-05-27T21:50:30.288Z] DEBUG: DUCAT_NODE/16057 on Vinays-MacBook-Pro.local: [POOL::POLL - END] (elapsed_milliseconds=0,line=556,target=libp2p_swarm::connection::pool)
    file: /Users/vinay10949/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/libp2p-swarm-0.45.1/src/connection/pool.rs
[2025-05-27T21:50:30.288Z] TRACE: DUCAT_NODE/16057 on Vinays-MacBook-Pro.local: [STREAMMUXER::POLL - EVENT] 4df596e6: write: (WriteState::Init) (file=null,id=2,line=null,log.line=89,log.module_path=yamux::frame::io,log.target=yamux::frame::io,target=log)
    log.file: /Users/vinay10949/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/yamux-0.13.4/src/frame/io.rs
    --
    peer: 16Uiu2HAmT4FjyydhhSYgLoGjNJEFGHDexiaH6UxWM1VCW1LT5o1X
    --
    remote_addr: /ip4/127.0.0.1/tcp/7070/p2p/16Uiu2HAmT4FjyydhhSYgLoGjNJEFGHDexiaH6UxWM1VCW1LT5o1X
[2025-05-27T21:50:30.288Z] DEBUG: DUCAT_NODE/16057 on Vinays-MacBook-Pro.local: [SWARM::POLL - EVENT] Request to peer in query failed with Io(Custom { kind: ConnectionRefused, error: "protocol not supported" }) (line=2358,query=QueryId(0),target=libp2p_kad::behaviour)
    file: /Users/vinay10949/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/libp2p-kad-0.46.2/src/behaviour.rs
    --
    peer: 16Uiu2HAmT4FjyydhhSYgLoGjNJEFGHDexiaH6UxWM1VCW1LT5o1X
[2025-05-27T21:50:30.288Z] TRACE: DUCAT_NODE/16057 on Vinays-MacBook-Pro.local: [CONNECTIONHANDLER::POLL - EVENT] a6164add/2: read 15 bytes (file=null,id=1,line=null,log.line=336,log.module_path=yamux::connection::stream,log.target=yamux::connection::stream,remote_addr=/ip4/127.0.0.1/tcp/7070,target=log)
    log.file: /Users/vinay10949/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/yamux-0.13.4/src/connection/stream.rs
    --
    peer: 16Uiu2HAmT4FjyydhhSYgLoGjNJEFGHDexiaH6UxWM1VCW1LT5o1X
[2025-05-27T21:50:30.288Z] TRACE: DUCAT_NODE/16057 on Vinays-MacBook-Pro.local: [STREAMMUXER::POLL - EVENT] 4df596e6: read: (ReadState::Init) (file=null,id=2,line=null,log.line=181,log.module_path=yamux::frame::io,log.target=yamux::frame::io,target=log)
    log.file: /Users/vinay10949/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/yamux-0.13.4/src/frame/io.rs
    --
    peer: 16Uiu2HAmT4FjyydhhSYgLoGjNJEFGHDexiaH6UxWM1VCW1LT5o1X
    --
    remote_addr: /ip4/127.0.0.1/tcp/7070/p2p/16Uiu2HAmT4FjyydhhSYgLoGjNJEFGHDexiaH6UxWM1VCW1LT5o1X
[2025-05-27T21:50:30.288Z] TRACE: DUCAT_NODE/16057 on Vinays-MacBook-Pro.local: [NETWORKBEHAVIOUR::POLL - START] (line=3149,target=libp2p_gossipsub::behaviour)
    file: /Users/vinay10949/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/libp2p-gossipsub-0.47.0/src/behaviour.rs
[2025-05-27T21:50:30.288Z] TRACE: DUCAT_NODE/16057 on Vinays-MacBook-Pro.local: [STREAMMUXER::POLL - EVENT] 4df596e6: read: (ReadState::Header (offset 0)) (file=null,id=2,line=null,log.line=181,log.module_path=yamux::frame::io,log.target=yamux::frame::io,target=log)
    log.file: /Users/vinay10949/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/yamux-0.13.4/src/frame/io.rs
    --
    peer: 16Uiu2HAmT4FjyydhhSYgLoGjNJEFGHDexiaH6UxWM1VCW1LT5o1X
    --
    remote_addr: /ip4/127.0.0.1/tcp/7070/p2p/16Uiu2HAmT4FjyydhhSYgLoGjNJEFGHDexiaH6UxWM1VCW1LT5o1X
[2025-05-27T21:50:30.288Z] TRACE: DUCAT_NODE/16057 on Vinays-MacBook-Pro.local: [NETWORKBEHAVIOUR::POLL - END] (elapsed_milliseconds=0,line=3149,target=libp2p_gossipsub::behaviour)
    file: /Users/vinay10949/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/libp2p-gossipsub-0.47.0/src/behaviour.rs
[2025-05-27T21:50:30.288Z] TRACE: DUCAT_NODE/16057 on Vinays-MacBook-Pro.local: [CONNECTIONHANDLER::POLL - END] (elapsed_milliseconds=0,id=1,line=430,remote_addr=/ip4/127.0.0.1/tcp/7070,target=libp2p_gossipsub::handler)
    file: /Users/vinay10949/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/libp2p-gossipsub-0.47.0/src/handler.rs
    --
    peer: 16Uiu2HAmT4FjyydhhSYgLoGjNJEFGHDexiaH6UxWM1VCW1LT5o1X
[2025-05-27T21:50:30.288Z] TRACE: DUCAT_NODE/16057 on Vinays-MacBook-Pro.local: [NETWORKBEHAVIOUR::POLL - START] (line=2506,target=libp2p_kad::behaviour)
    file: /Users/vinay10949/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/libp2p-kad-0.46.2/src/behaviour.rs
[2025-05-27T21:50:30.288Z] TRACE: DUCAT_NODE/16057 on Vinays-MacBook-Pro.local: [STREAMMUXER::POLL - END] (elapsed_milliseconds=0,id=2,line=125,target=libp2p_yamux)
    file: /Users/vinay10949/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/libp2p-yamux-0.46.0/src/lib.rs
    --
    peer: 16Uiu2HAmT4FjyydhhSYgLoGjNJEFGHDexiaH6UxWM1VCW1LT5o1X
    --
    remote_addr: /ip4/127.0.0.1/tcp/7070/p2p/16Uiu2HAmT4FjyydhhSYgLoGjNJEFGHDexiaH6UxWM1VCW1LT5o1X
[2025-05-27T21:50:30.288Z] DEBUG: DUCAT_NODE/16057 on Vinays-MacBook-Pro.local: [CONNECTION::POLL - END] (elapsed_milliseconds=0,id=1,line=247,remote_addr=/ip4/127.0.0.1/tcp/7070,target=libp2p_swarm::connection)
    file: /Users/vinay10949/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/libp2p-swarm-0.45.1/src/connection.rs
    --
    peer: 16Uiu2HAmT4FjyydhhSYgLoGjNJEFGHDexiaH6UxWM1VCW1LT5o1X
[2025-05-27T21:50:30.288Z] TRACE: DUCAT_NODE/16057 on Vinays-MacBook-Pro.local: [STREAMMUXER::POLL_INBOUND - START] (id=2,line=84,target=libp2p_yamux)
    file: /Users/vinay10949/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/libp2p-yamux-0.46.0/src/lib.rs
    --
    peer: 16Uiu2HAmT4FjyydhhSYgLoGjNJEFGHDexiaH6UxWM1VCW1LT5o1X
    --
    remote_addr: /ip4/127.0.0.1/tcp/7070/p2p/16Uiu2HAmT4FjyydhhSYgLoGjNJEFGHDexiaH6UxWM1VCW1LT5o1X
[2025-05-27T21:50:30.288Z] DEBUG: DUCAT_NODE/16057 on Vinays-MacBook-Pro.local: [CONNECTION::POLL - START] (id=1,line=247,remote_addr=/ip4/127.0.0.1/tcp/7070,target=libp2p_swarm::connection)
    file: /Users/vinay10949/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/libp2p-swarm-0.45.1/src/connection.rs
    --
    peer: 16Uiu2HAmT4FjyydhhSYgLoGjNJEFGHDexiaH6UxWM1VCW1LT5o1X
[2025-05-27T21:50:30.288Z] TRACE: DUCAT_NODE/16057 on Vinays-MacBook-Pro.local: [STREAMMUXER::POLL_INBOUND - EVENT] 4df596e6: write: (WriteState::Init) (file=null,id=2,line=null,log.line=89,log.module_path=yamux::frame::io,log.target=yamux::frame::io,target=log)
    log.file: /Users/vinay10949/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/yamux-0.13.4/src/frame/io.rs
    --
    peer: 16Uiu2HAmT4

There are more logs like these, for example:


[2025-05-27T21:50:26.347Z] DEBUG: DUCAT_NODE/16048 on Vinays-MacBook-Pro.local: [SWARM::POLL - EVENT] Request to peer in query failed with Io(Custom { kind: ConnectionRefused, error: "protocol not supported" }) (line=2358,query=QueryId(0),target=libp2p_kad::behaviour)
    file: /Users/vinay10949/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/libp2p-kad-0.46.2/src/behaviour.rs

Questions

  1. What could be causing Gossipsub to stop propagating messages after some time? Are there specific configurations (e.g., heartbeat_interval, duplicate_cache_time, or mesh parameters) that might lead to this behavior?
  2. Could the issue be related to the Kademlia DHT or peer discovery? If so, how can I debug this interaction?
  3. Are there any known issues with the libp2p Gossipsub implementation that could cause this behavior, or is something in my configuration wrong?
  4. How can I effectively debug this issue? Are there specific logs or metrics I should monitor to identify why gossiping halts?
  5. Could the idle_connection_timeout (set to 60 seconds) or other network configurations (e.g., TCP, Noise, Yamux) be contributing to this issue?
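
For question 4, one low-effort signal would be the time since the last Gossipsub message was received on the topic, logged alongside the current peer count. A minimal sketch using only the standard library follows. `StallDetector` and its methods are hypothetical names, and the 30-second threshold is an arbitrary assumption. In a real node, `record_message` would be called from whichever branch of the swarm event loop handles incoming Gossipsub messages.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: flags when no message has arrived for too long.
struct StallDetector {
    max_silence: Duration,
    last_message: Instant,
}

impl StallDetector {
    fn new(max_silence: Duration, now: Instant) -> Self {
        Self { max_silence, last_message: now }
    }

    /// Call whenever a gossip message is received.
    fn record_message(&mut self, now: Instant) {
        self.last_message = now;
    }

    /// True once the silence since the last message exceeds the threshold.
    fn is_stalled(&self, now: Instant) -> bool {
        now.saturating_duration_since(self.last_message) > self.max_silence
    }
}

fn main() {
    let start = Instant::now();
    // Assumed threshold: treat 30s without messages as a stall.
    let mut detector = StallDetector::new(Duration::from_secs(30), start);

    detector.record_message(start + Duration::from_secs(5));
    for secs in [10u64, 20, 40] {
        let now = start + Duration::from_secs(secs);
        println!("t={}s stalled: {}", secs, detector.is_stalled(now));
    }
}
```

Checking this on a timer and dumping connection and subscription state at the moment it first fires would show whether the halt coincides with connections closing or with messages being dropped on an otherwise healthy connection.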